Mirror of https://github.com/josegonzalez/python-github-backup.git (synced 2025-12-05 16:18:02 +01:00)
Compare commits
22 Commits
| SHA1 |
|---|
| ff2681e196 |
| 745b05a63f |
| 83ff0ae1dd |
| 6ad1959d43 |
| 5739ac0745 |
| 8b7512c8d8 |
| 995b7ede6c |
| 7840528fe2 |
| 6fb0d86977 |
| 9f6b401171 |
| bf638f7aea |
| c3855a94f1 |
| c3f4bfde0d |
| d3edef0622 |
| 9ef496efad |
| 42bfe6f79d |
| 5af522a348 |
| 6dfba7a783 |
| 7551829677 |
| 72d35a9b94 |
| 3eae9d78ed |
| 90ba839c7d |
.github/workflows/automatic-release.yml (vendored, 2 changed lines)

```diff
@@ -18,7 +18,7 @@ jobs:
     runs-on: ubuntu-24.04
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
           fetch-depth: 0
           ssh-key: ${{ secrets.DEPLOY_PRIVATE_KEY }}
```
.github/workflows/docker.yml (vendored, 2 changed lines)

```diff
@@ -38,7 +38,7 @@ jobs:
 
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
          persist-credentials: false
 
```
.github/workflows/lint.yml (vendored, 2 changed lines)

```diff
@@ -21,7 +21,7 @@ jobs:
 
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Setup Python
```
.github/workflows/test.yml (vendored, 2 changed lines)

```diff
@@ -21,7 +21,7 @@ jobs:
 
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Setup Python
```
CHANGES.rst (153 changed lines)

```diff
@@ -1,10 +1,161 @@
 Changelog
 =========
 
-0.51.1 (2025-11-16)
+0.53.0 (2025-11-30)
 -------------------
 
+Fix
+~~~
+- Case-sensitive username filtering causing silent backup failures.
+  [Rodos]
+
+  GitHub's API accepts usernames in any case but returns canonical case.
+  The case-sensitive comparison in filter_repositories() filtered out all
+  repositories when user-provided case didn't match GitHub's canonical case.
+
+  Changed to case-insensitive comparison.
+
+  Fixes #198
+
+Other
+~~~~~
+- Avoid rewriting unchanged JSON files for labels, milestones, releases,
+  hooks, followers, and following. [Rodos]
+
+  This change reduces unnecessary writes when backing up metadata that changes
+  infrequently. The implementation compares existing file content before writing
+  and skips the write if the content is identical, preserving file timestamps.
+
+  Key changes:
+  - Added json_dump_if_changed() helper that compares content before writing
+  - Uses atomic writes (temp file + rename) for all metadata files
+  - NOT applied to issues/pulls (they use incremental_by_files logic)
+  - Made log messages consistent and past tense ("Saved" instead of "Saving")
+  - Added informative logging showing skip counts
+
+  Fixes #133
+
+
+0.52.0 (2025-11-28)
+-------------------
+- Skip DMCA'd repos which return a 451 response. [Rodos]
+
+  Log a warning and the link to the DMCA notice. Continue backing up
+  other repositories instead of crashing.
+
+  Closes #163
+- Chore(deps): bump restructuredtext-lint in the python-packages group.
+  [dependabot[bot]]
+
+  Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint).
+
+
+  Updates `restructuredtext-lint` from 1.4.0 to 2.0.2
+  - [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst)
+  - [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2)
+
+  ---
+  updated-dependencies:
+  - dependency-name: restructuredtext-lint
+    dependency-version: 2.0.2
+    dependency-type: direct:production
+    update-type: version-update:semver-major
+    dependency-group: python-packages
+  ...
+- Chore(deps): bump actions/checkout from 5 to 6. [dependabot[bot]]
+
+  Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
+  - [Release notes](https://github.com/actions/checkout/releases)
+  - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
+  - [Commits](https://github.com/actions/checkout/compare/v5...v6)
+
+  ---
+  updated-dependencies:
+  - dependency-name: actions/checkout
+    dependency-version: '6'
+    dependency-type: direct:production
+    update-type: version-update:semver-major
+  ...
+- Chore(deps): bump the python-packages group with 3 updates.
+  [dependabot[bot]]
+
+  Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring).
+
+
+  Updates `click` from 8.3.0 to 8.3.1
+  - [Release notes](https://github.com/pallets/click/releases)
+  - [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
+  - [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1)
+
+  Updates `pytest` from 8.3.3 to 9.0.1
+  - [Release notes](https://github.com/pytest-dev/pytest/releases)
+  - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
+  - [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1)
+
+  Updates `keyring` from 25.6.0 to 25.7.0
+  - [Release notes](https://github.com/jaraco/keyring/releases)
+  - [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst)
+  - [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0)
+
+  ---
+  updated-dependencies:
+  - dependency-name: click
+    dependency-version: 8.3.1
+    dependency-type: direct:production
+    update-type: version-update:semver-patch
+    dependency-group: python-packages
+  - dependency-name: pytest
+    dependency-version: 9.0.1
+    dependency-type: direct:production
+    update-type: version-update:semver-major
+    dependency-group: python-packages
+  - dependency-name: keyring
+    dependency-version: 25.7.0
+    dependency-type: direct:production
+    update-type: version-update:semver-minor
+    dependency-group: python-packages
+  ...
+
+
+0.51.3 (2025-11-18)
+-------------------
+- Test: Add pagination tests for cursor and page-based Link headers.
+  [Rodos]
+- Use cursor based pagination. [Helio Machado]
+
+
+0.51.2 (2025-11-16)
+-------------------
+
+Fix
+~~~
+- Improve CA certificate detection with fallback chain. [Rodos]
+
+  The previous implementation incorrectly assumed empty get_ca_certs()
+  meant broken SSL, causing false failures in GitHub Codespaces and other
+  directory-based cert systems where certificates exist but aren't pre-loaded.
+  It would then attempt to import certifi as a workaround, but certifi wasn't
+  listed in requirements.txt, causing the fallback to fail with ImportError
+  even though the system certificates would have worked fine.
+
+  This commit replaces the naive check with a layered fallback approach that
+  checks multiple certificate sources. First it checks for pre-loaded system
+  certs (file-based systems). Then it verifies system cert paths exist
+  (directory-based systems like Ubuntu/Debian/Codespaces). Finally it attempts
+  to use certifi as an optional fallback only if needed.
+
+  This approach eliminates hard dependencies (certifi is now optional), works
+  in GitHub Codespaces without any setup, and fails gracefully with clear hints
+  for resolution when SSL is actually broken rather than failing with
+  ModuleNotFoundError.
+
+  Fixes #444
+
+
+0.51.1 (2025-11-16)
+-------------------
 
 Fix
 ~~~
 - Prevent duplicate attachment downloads. [Rodos]
```
```diff
@@ -1 +1 @@
-__version__ = "0.51.1"
+__version__ = "0.53.0"
```
```diff
@@ -37,22 +37,42 @@ FNULL = open(os.devnull, "w")
 FILE_URI_PREFIX = "file://"
 logger = logging.getLogger(__name__)
 
 
+class RepositoryUnavailableError(Exception):
+    """Raised when a repository is unavailable due to legal reasons (e.g., DMCA takedown)."""
+
+    def __init__(self, message, dmca_url=None):
+        super().__init__(message)
+        self.dmca_url = dmca_url
+
+
+# Setup SSL context with fallback chain
 https_ctx = ssl.create_default_context()
-if not https_ctx.get_ca_certs():
-    import warnings
-
-    warnings.warn(
-        "\n\nYOUR DEFAULT CA CERTS ARE EMPTY.\n"
-        + "PLEASE POPULATE ANY OF:"
-        + "".join(
-            ["\n - " + x for x in ssl.get_default_verify_paths() if type(x) is str]
-        )
-        + "\n",
-        stacklevel=2,
-    )
-    import certifi
-
-    https_ctx = ssl.create_default_context(cafile=certifi.where())
+if https_ctx.get_ca_certs():
+    # Layer 1: Certificates pre-loaded from system (file-based)
+    pass
+else:
+    paths = ssl.get_default_verify_paths()
+    if (paths.cafile and os.path.exists(paths.cafile)) or (
+        paths.capath and os.path.exists(paths.capath)
+    ):
+        # Layer 2: Cert paths exist, will be lazy-loaded on first use (directory-based)
+        pass
+    else:
+        # Layer 3: Try certifi package as optional fallback
+        try:
+            import certifi
+
+            https_ctx = ssl.create_default_context(cafile=certifi.where())
+        except ImportError:
+            # All layers failed - no certificates available anywhere
+            sys.exit(
+                "\nERROR: No CA certificates found. Cannot connect to GitHub over SSL.\n\n"
+                "Solutions you can explore:\n"
+                " 1. pip install certifi\n"
+                " 2. Alpine: apk add ca-certificates\n"
+                " 3. Debian/Ubuntu: apt-get install ca-certificates\n\n"
+            )
 
 
 def logging_subprocess(
```
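For context, a minimal sketch of how a module-level context like `https_ctx` is typically handed to `urllib`; the endpoint and the exact call site are illustrative assumptions, not part of this hunk.

```python
# Illustrative only: a context like https_ctx (built by the fallback chain
# above) is what gets passed to urlopen so requests verify GitHub's certs.
import ssl
from urllib.request import Request, urlopen

https_ctx = ssl.create_default_context()  # stand-in for the module-level context above
req = Request("https://api.github.com/rate_limit")  # hypothetical endpoint for this sketch
with urlopen(req, context=https_ctx) as resp:
    print(resp.status)  # 200 if the TLS handshake and request succeed
```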
```diff
@@ -581,27 +601,39 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
     auth = get_auth(args, encode=not args.as_app)
     query_args = get_query_args(query_args)
     per_page = 100
-    page = 0
+    next_url = None
 
     while True:
         if single_request:
-            request_page, request_per_page = None, None
+            request_per_page = None
         else:
-            page = page + 1
-            request_page, request_per_page = page, per_page
+            request_per_page = per_page
 
         request = _construct_request(
             request_per_page,
-            request_page,
             query_args,
-            template,
+            next_url or template,
             auth,
             as_app=args.as_app,
             fine=True if args.token_fine is not None else False,
         )  # noqa
-        r, errors = _get_response(request, auth, template)
+        r, errors = _get_response(request, auth, next_url or template)
 
         status_code = int(r.getcode())
 
+        # Handle DMCA takedown (HTTP 451) - raise exception to skip entire repository
+        if status_code == 451:
+            dmca_url = None
+            try:
+                response_data = json.loads(r.read().decode("utf-8"))
+                dmca_url = response_data.get("block", {}).get("html_url")
+            except Exception:
+                pass
+            raise RepositoryUnavailableError(
+                "Repository unavailable due to legal reasons (HTTP 451)",
+                dmca_url=dmca_url
+            )
+
         # Check if we got correct data
         try:
             response = json.loads(r.read().decode("utf-8"))
```
```diff
@@ -633,15 +665,14 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
             retries += 1
             time.sleep(5)
             request = _construct_request(
-                per_page,
-                page,
+                request_per_page,
                 query_args,
-                template,
+                next_url or template,
                 auth,
                 as_app=args.as_app,
                 fine=True if args.token_fine is not None else False,
             )  # noqa
-            r, errors = _get_response(request, auth, template)
+            r, errors = _get_response(request, auth, next_url or template)
 
             status_code = int(r.getcode())
             try:
```
```diff
@@ -671,7 +702,16 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
         if type(response) is list:
             for resp in response:
                 yield resp
-            if len(response) < per_page:
+            # Parse Link header for next page URL (cursor-based pagination)
+            link_header = r.headers.get("Link", "")
+            next_url = None
+            if link_header:
+                # Parse Link header: <https://api.github.com/...?per_page=100&after=cursor>; rel="next"
+                for link in link_header.split(","):
+                    if 'rel="next"' in link:
+                        next_url = link[link.find("<") + 1:link.find(">")]
+                        break
+            if not next_url:
                 break
         elif type(response) is dict and single_request:
             yield response
```
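To make the parsing above concrete, here is a standalone sketch with a made-up Link header shaped like GitHub's cursor-based responses; the URLs and cursor value are invented for illustration.

```python
# Example Link header (values invented) and the same extraction logic as above.
link_header = (
    '<https://api.github.com/repositories/1/issues?per_page=100&after=Y3Vyc29y>; rel="next", '
    '<https://api.github.com/repositories/1/issues?per_page=100>; rel="prev"'
)

next_url = None
for link in link_header.split(","):
    if 'rel="next"' in link:
        # Take everything between the angle brackets
        next_url = link[link.find("<") + 1:link.find(">")]
        break

print(next_url)
# https://api.github.com/repositories/1/issues?per_page=100&after=Y3Vyc29y
```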
```diff
@@ -724,22 +764,27 @@ def _get_response(request, auth, template):
 
 
 def _construct_request(
-    per_page, page, query_args, template, auth, as_app=None, fine=False
+    per_page, query_args, template, auth, as_app=None, fine=False
 ):
-    all_query_args = {}
-    if per_page:
-        all_query_args["per_page"] = per_page
-    if page:
-        all_query_args["page"] = page
-    if query_args:
-        all_query_args.update(query_args)
-
-    request_url = template
-    if all_query_args:
-        querystring = urlencode(all_query_args)
-        request_url = template + "?" + querystring
-    else:
-        querystring = ""
+    # If template is already a full URL with query params (from Link header), use it directly
+    if "?" in template and template.startswith("http"):
+        request_url = template
+        # Extract query string for logging
+        querystring = template.split("?", 1)[1]
+    else:
+        # Build URL with query parameters
+        all_query_args = {}
+        if per_page:
+            all_query_args["per_page"] = per_page
+        if query_args:
+            all_query_args.update(query_args)
+
+        request_url = template
+        if all_query_args:
+            querystring = urlencode(all_query_args)
+            request_url = template + "?" + querystring
+        else:
+            querystring = ""
 
     request = Request(request_url)
     if auth is not None:
```
```diff
@@ -755,7 +800,7 @@ def _construct_request(
             "Accept", "application/vnd.github.machine-man-preview+json"
         )
 
-    log_url = template
+    log_url = template if "?" not in template else template.split("?")[0]
     if querystring:
         log_url += "?" + querystring
     logger.info("Requesting {}".format(log_url))
```
```diff
@@ -832,8 +877,7 @@ def download_file(url, path, auth, as_app=False, fine=False):
         return
 
     request = _construct_request(
-        per_page=100,
-        page=1,
+        per_page=None,
         query_args={},
         template=url,
         auth=auth,
```
```diff
@@ -1543,7 +1587,9 @@ def filter_repositories(args, unfiltered_repositories):
     repositories = []
     for r in unfiltered_repositories:
         # gists can be anonymous, so need to safely check owner
-        if r.get("owner", {}).get("login") == args.user or r.get("is_starred"):
+        # Use case-insensitive comparison to match GitHub's case-insensitive username behavior
+        owner_login = r.get("owner", {}).get("login", "")
+        if owner_login.lower() == args.user.lower() or r.get("is_starred"):
             repositories.append(r)
 
     name_regex = None
```
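A tiny standalone illustration of the behaviour this hunk fixes; the login values are made up.

```python
# Hypothetical values: the user passed "OctoCat" on the CLI, while GitHub's
# API returns the canonical login "octocat".
repo = {"owner": {"login": "octocat"}}
requested_user = "OctoCat"

# Old comparison: case-sensitive, so the repository was silently filtered out.
old_match = repo.get("owner", {}).get("login") == requested_user
# New comparison: case-insensitive, matching GitHub's username semantics.
new_match = repo.get("owner", {}).get("login", "").lower() == requested_user.lower()

print(old_match, new_match)  # False True
```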
```diff
@@ -1646,40 +1692,47 @@ def backup_repositories(args, output_directory, repositories):
 
             continue  # don't try to back anything else for a gist; it doesn't exist
 
-        download_wiki = args.include_wiki or args.include_everything
-        if repository["has_wiki"] and download_wiki:
-            fetch_repository(
-                repository["name"],
-                repo_url.replace(".git", ".wiki.git"),
-                os.path.join(repo_cwd, "wiki"),
-                skip_existing=args.skip_existing,
-                bare_clone=args.bare_clone,
-                lfs_clone=args.lfs_clone,
-                no_prune=args.no_prune,
-            )
-        if args.include_issues or args.include_everything:
-            backup_issues(args, repo_cwd, repository, repos_template)
-
-        if args.include_pulls or args.include_everything:
-            backup_pulls(args, repo_cwd, repository, repos_template)
-
-        if args.include_milestones or args.include_everything:
-            backup_milestones(args, repo_cwd, repository, repos_template)
-
-        if args.include_labels or args.include_everything:
-            backup_labels(args, repo_cwd, repository, repos_template)
-
-        if args.include_hooks or args.include_everything:
-            backup_hooks(args, repo_cwd, repository, repos_template)
-
-        if args.include_releases or args.include_everything:
-            backup_releases(
-                args,
-                repo_cwd,
-                repository,
-                repos_template,
-                include_assets=args.include_assets or args.include_everything,
-            )
+        try:
+            download_wiki = args.include_wiki or args.include_everything
+            if repository["has_wiki"] and download_wiki:
+                fetch_repository(
+                    repository["name"],
+                    repo_url.replace(".git", ".wiki.git"),
+                    os.path.join(repo_cwd, "wiki"),
+                    skip_existing=args.skip_existing,
+                    bare_clone=args.bare_clone,
+                    lfs_clone=args.lfs_clone,
+                    no_prune=args.no_prune,
+                )
+            if args.include_issues or args.include_everything:
+                backup_issues(args, repo_cwd, repository, repos_template)
+
+            if args.include_pulls or args.include_everything:
+                backup_pulls(args, repo_cwd, repository, repos_template)
+
+            if args.include_milestones or args.include_everything:
+                backup_milestones(args, repo_cwd, repository, repos_template)
+
+            if args.include_labels or args.include_everything:
+                backup_labels(args, repo_cwd, repository, repos_template)
+
+            if args.include_hooks or args.include_everything:
+                backup_hooks(args, repo_cwd, repository, repos_template)
+
+            if args.include_releases or args.include_everything:
+                backup_releases(
+                    args,
+                    repo_cwd,
+                    repository,
+                    repos_template,
+                    include_assets=args.include_assets or args.include_everything,
+                )
+        except RepositoryUnavailableError as e:
+            logger.warning(f"Repository {repository['full_name']} is unavailable (HTTP 451)")
+            if e.dmca_url:
+                logger.warning(f"DMCA notice: {e.dmca_url}")
+            logger.info(f"Skipping remaining resources for {repository['full_name']}")
+            continue
 
         if args.incremental:
             if last_update == "0000-00-00T00:00:00Z":
```
```diff
@@ -1847,11 +1900,21 @@ def backup_milestones(args, repo_cwd, repository, repos_template):
     for milestone in _milestones:
         milestones[milestone["number"]] = milestone
 
-    logger.info("Saving {0} milestones to disk".format(len(list(milestones.keys()))))
+    written_count = 0
     for number, milestone in list(milestones.items()):
         milestone_file = "{0}/{1}.json".format(milestone_cwd, number)
-        with codecs.open(milestone_file, "w", encoding="utf-8") as f:
-            json_dump(milestone, f)
+        if json_dump_if_changed(milestone, milestone_file):
+            written_count += 1
+
+    total = len(milestones)
+    if written_count == total:
+        logger.info("Saved {0} milestones to disk".format(total))
+    elif written_count == 0:
+        logger.info("{0} milestones unchanged, skipped write".format(total))
+    else:
+        logger.info("Saved {0} of {1} milestones to disk ({2} unchanged)".format(
+            written_count, total, total - written_count
+        ))
 
 
 def backup_labels(args, repo_cwd, repository, repos_template):
```
```diff
@@ -1904,19 +1967,17 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
             reverse=True,
         )
         releases = releases[: args.number_of_latest_releases]
-        logger.info("Saving the latest {0} releases to disk".format(len(releases)))
-    else:
-        logger.info("Saving {0} releases to disk".format(len(releases)))
 
     # for each release, store it
+    written_count = 0
     for release in releases:
         release_name = release["tag_name"]
         release_name_safe = release_name.replace("/", "__")
         output_filepath = os.path.join(
             release_cwd, "{0}.json".format(release_name_safe)
         )
-        with codecs.open(output_filepath, "w+", encoding="utf-8") as f:
-            json_dump(release, f)
+        if json_dump_if_changed(release, output_filepath):
+            written_count += 1
 
         if include_assets:
             assets = retrieve_data(args, release["assets_url"])
```
```diff
@@ -1933,6 +1994,17 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
                     fine=True if args.token_fine is not None else False,
                 )
 
+    # Log the results
+    total = len(releases)
+    if written_count == total:
+        logger.info("Saved {0} releases to disk".format(total))
+    elif written_count == 0:
+        logger.info("{0} releases unchanged, skipped write".format(total))
+    else:
+        logger.info("Saved {0} of {1} releases to disk ({2} unchanged)".format(
+            written_count, total, total - written_count
+        ))
+
 
 def fetch_repository(
     name,
```
```diff
@@ -2057,9 +2129,10 @@ def _backup_data(args, name, template, output_file, output_directory):
     mkdir_p(output_directory)
     data = retrieve_data(args, template)
 
-    logger.info("Writing {0} {1} to disk".format(len(data), name))
-    with codecs.open(output_file, "w", encoding="utf-8") as f:
-        json_dump(data, f)
+    if json_dump_if_changed(data, output_file):
+        logger.info("Saved {0} {1} to disk".format(len(data), name))
+    else:
+        logger.info("{0} {1} unchanged, skipped write".format(len(data), name))
 
 
 def json_dump(data, output_file):
```
@@ -2071,3 +2144,57 @@ def json_dump(data, output_file):
|
|||||||
indent=4,
|
indent=4,
|
||||||
separators=(",", ": "),
|
separators=(",", ": "),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def json_dump_if_changed(data, output_file_path):
|
||||||
|
"""
|
||||||
|
Write JSON data to file only if content has changed.
|
||||||
|
|
||||||
|
Compares the serialized JSON data with the existing file content
|
||||||
|
and only writes if different. This prevents unnecessary file
|
||||||
|
modification timestamp updates and disk writes.
|
||||||
|
|
||||||
|
Uses atomic writes (temp file + rename) to prevent corruption
|
||||||
|
if the process is interrupted during the write.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
data: The data to serialize as JSON
|
||||||
|
output_file_path: The path to the output file
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if file was written (content changed or new file)
|
||||||
|
False if write was skipped (content unchanged)
|
||||||
|
"""
|
||||||
|
# Serialize new data with consistent formatting matching json_dump()
|
||||||
|
new_content = json.dumps(
|
||||||
|
data,
|
||||||
|
ensure_ascii=False,
|
||||||
|
sort_keys=True,
|
||||||
|
indent=4,
|
||||||
|
separators=(",", ": "),
|
||||||
|
)
|
||||||
|
|
||||||
|
# Check if file exists and compare content
|
||||||
|
if os.path.exists(output_file_path):
|
||||||
|
try:
|
||||||
|
with codecs.open(output_file_path, "r", encoding="utf-8") as f:
|
||||||
|
existing_content = f.read()
|
||||||
|
if existing_content == new_content:
|
||||||
|
logger.debug(
|
||||||
|
"Content unchanged, skipping write: {0}".format(output_file_path)
|
||||||
|
)
|
||||||
|
return False
|
||||||
|
except (OSError, UnicodeDecodeError) as e:
|
||||||
|
# If we can't read the existing file, write the new one
|
||||||
|
logger.debug(
|
||||||
|
"Error reading existing file {0}, will overwrite: {1}".format(
|
||||||
|
output_file_path, e
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Write the file atomically using temp file + rename
|
||||||
|
temp_file = output_file_path + ".temp"
|
||||||
|
with codecs.open(temp_file, "w", encoding="utf-8") as f:
|
||||||
|
f.write(new_content)
|
||||||
|
os.rename(temp_file, output_file_path) # Atomic on POSIX systems
|
||||||
|
return True
|
||||||
|
|||||||
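A hedged usage sketch of the helper defined above, roughly how a caller such as backup_labels() would use it; the path and label data are invented, and the target directory is assumed to already exist (the real callers run mkdir_p first).

```python
# Invented example data and path; the parent directory must already exist.
label = {"name": "bug", "color": "d73a4a"}
label_file = "/tmp/github-backup-example/labels/bug.json"

if json_dump_if_changed(label, label_file):
    print("label written")             # first run: file created atomically
else:
    print("label unchanged, skipped")  # identical data later: no write, mtime preserved
```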
```diff
@@ -3,16 +3,16 @@ black==25.11.0
 bleach==6.3.0
 certifi==2025.11.12
 charset-normalizer==3.4.4
-click==8.3.0
+click==8.3.1
 colorama==0.4.6
 docutils==0.22.3
 flake8==7.3.0
 gitchangelog==3.0.4
-pytest==8.3.3
+pytest==9.0.1
 idna==3.11
 importlib-metadata==8.7.0
 jaraco.classes==3.4.0
-keyring==25.6.0
+keyring==25.7.0
 markdown-it-py==4.0.0
 mccabe==0.7.0
 mdurl==0.1.2
@@ -28,7 +28,7 @@ Pygments==2.19.2
 readme-renderer==44.0
 requests==2.32.5
 requests-toolbelt==1.0.0
-restructuredtext-lint==1.4.0
+restructuredtext-lint==2.0.2
 rfc3986==2.0.0
 rich==14.2.0
 setuptools==80.9.0
```
```diff
@@ -1 +0,0 @@
-
```
tests/test_http_451.py (new file, 143 lines)

```python
"""Tests for HTTP 451 (DMCA takedown) handling."""

import json
from unittest.mock import Mock, patch

import pytest

from github_backup import github_backup


class TestHTTP451Exception:
    """Test suite for HTTP 451 DMCA takedown exception handling."""

    def test_repository_unavailable_error_raised(self):
        """HTTP 451 should raise RepositoryUnavailableError with DMCA URL."""
        # Create mock args
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        # Mock HTTPError 451 response
        mock_response = Mock()
        mock_response.getcode.return_value = 451

        dmca_data = {
            "message": "Repository access blocked",
            "block": {
                "reason": "dmca",
                "created_at": "2024-11-12T14:38:04Z",
                "html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
            }
        }
        mock_response.read.return_value = json.dumps(dmca_data).encode("utf-8")
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Unavailable For Legal Reasons"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
            with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))

        # Check exception has DMCA URL
        assert exc_info.value.dmca_url == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
        assert "451" in str(exc_info.value)

    def test_repository_unavailable_error_without_dmca_url(self):
        """HTTP 451 without DMCA details should still raise exception."""
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        mock_response = Mock()
        mock_response.getcode.return_value = 451
        mock_response.read.return_value = b'{"message": "Blocked"}'
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Unavailable For Legal Reasons"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
            with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))

        # Exception raised even without DMCA URL
        assert exc_info.value.dmca_url is None
        assert "451" in str(exc_info.value)

    def test_repository_unavailable_error_with_malformed_json(self):
        """HTTP 451 with malformed JSON should still raise exception."""
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        mock_response = Mock()
        mock_response.getcode.return_value = 451
        mock_response.read.return_value = b"invalid json {"
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Unavailable For Legal Reasons"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
            with pytest.raises(github_backup.RepositoryUnavailableError):
                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))

    def test_other_http_errors_unchanged(self):
        """Other HTTP errors should still raise generic Exception."""
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        mock_response = Mock()
        mock_response.getcode.return_value = 404
        mock_response.read.return_value = b'{"message": "Not Found"}'
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Not Found"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
            # Should raise generic Exception, not RepositoryUnavailableError
            with pytest.raises(Exception) as exc_info:
                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/notfound/issues"))

        assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError)
        assert "404" in str(exc_info.value)


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
```
tests/test_json_dump_if_changed.py (new file, 198 lines)

```python
"""Tests for json_dump_if_changed functionality."""

import codecs
import json
import os
import tempfile

import pytest

from github_backup import github_backup


class TestJsonDumpIfChanged:
    """Test suite for json_dump_if_changed function."""

    def test_writes_new_file(self):
        """Should write file when it doesn't exist."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {"key": "value", "number": 42}

            result = github_backup.json_dump_if_changed(test_data, output_file)

            assert result is True
            assert os.path.exists(output_file)

            # Verify content matches expected format
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                content = f.read()
            loaded = json.loads(content)
            assert loaded == test_data

    def test_skips_unchanged_file(self):
        """Should skip write when content is identical."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {"key": "value", "number": 42}

            # First write
            result1 = github_backup.json_dump_if_changed(test_data, output_file)
            assert result1 is True

            # Get the initial mtime
            mtime1 = os.path.getmtime(output_file)

            # Second write with same data
            result2 = github_backup.json_dump_if_changed(test_data, output_file)
            assert result2 is False

            # File should not have been modified
            mtime2 = os.path.getmtime(output_file)
            assert mtime1 == mtime2

    def test_writes_when_content_changed(self):
        """Should write file when content has changed."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data1 = {"key": "value1"}
            test_data2 = {"key": "value2"}

            # First write
            result1 = github_backup.json_dump_if_changed(test_data1, output_file)
            assert result1 is True

            # Second write with different data
            result2 = github_backup.json_dump_if_changed(test_data2, output_file)
            assert result2 is True

            # Verify new content
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                loaded = json.load(f)
            assert loaded == test_data2

    def test_uses_consistent_formatting(self):
        """Should use same JSON formatting as json_dump."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {"z": "last", "a": "first", "m": "middle"}

            github_backup.json_dump_if_changed(test_data, output_file)

            with codecs.open(output_file, "r", encoding="utf-8") as f:
                content = f.read()

            # Check for consistent formatting:
            # - sorted keys
            # - 4-space indent
            # - comma-colon-space separator
            expected = json.dumps(
                test_data,
                ensure_ascii=False,
                sort_keys=True,
                indent=4,
                separators=(",", ": "),
            )
            assert content == expected

    def test_atomic_write_always_used(self):
        """Should always use temp file and rename for atomic writes."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {"key": "value"}

            result = github_backup.json_dump_if_changed(test_data, output_file)

            assert result is True
            assert os.path.exists(output_file)

            # Temp file should not exist after atomic write
            temp_file = output_file + ".temp"
            assert not os.path.exists(temp_file)

            # Verify content
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                loaded = json.load(f)
            assert loaded == test_data

    def test_handles_unicode_content(self):
        """Should correctly handle Unicode content."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {
                "emoji": "🚀",
                "chinese": "你好",
                "arabic": "مرحبا",
                "cyrillic": "Привет",
            }

            result = github_backup.json_dump_if_changed(test_data, output_file)
            assert result is True

            # Verify Unicode is preserved
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                loaded = json.load(f)
            assert loaded == test_data

            # Second write should skip
            result2 = github_backup.json_dump_if_changed(test_data, output_file)
            assert result2 is False

    def test_handles_complex_nested_data(self):
        """Should handle complex nested data structures."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {
                "users": [
                    {"id": 1, "name": "Alice", "tags": ["admin", "user"]},
                    {"id": 2, "name": "Bob", "tags": ["user"]},
                ],
                "metadata": {"version": "1.0", "nested": {"deep": {"value": 42}}},
            }

            result = github_backup.json_dump_if_changed(test_data, output_file)
            assert result is True

            # Verify structure is preserved
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                loaded = json.load(f)
            assert loaded == test_data

    def test_overwrites_on_unicode_decode_error(self):
        """Should overwrite if existing file has invalid UTF-8."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {"key": "value"}

            # Write invalid UTF-8 bytes
            with open(output_file, "wb") as f:
                f.write(b"\xff\xfe invalid utf-8")

            # Should catch UnicodeDecodeError and overwrite
            result = github_backup.json_dump_if_changed(test_data, output_file)
            assert result is True

            # Verify new content was written
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                loaded = json.load(f)
            assert loaded == test_data

    def test_key_order_independence(self):
        """Should treat differently-ordered dicts as same if keys/values match."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")

            # Write first dict
            data1 = {"z": 1, "a": 2, "m": 3}
            github_backup.json_dump_if_changed(data1, output_file)

            # Try to write same data but different order
            data2 = {"a": 2, "m": 3, "z": 1}
            result = github_backup.json_dump_if_changed(data2, output_file)

            # Should skip because content is the same (keys are sorted)
            assert result is False


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
```
tests/test_pagination.py (new file, 153 lines)

```python
"""Tests for Link header pagination handling."""

import json
from unittest.mock import Mock, patch

import pytest

from github_backup import github_backup


class MockHTTPResponse:
    """Mock HTTP response for paginated API calls."""

    def __init__(self, data, link_header=None):
        self._content = json.dumps(data).encode("utf-8")
        self._link_header = link_header
        self._read = False
        self.reason = "OK"

    def getcode(self):
        return 200

    def read(self):
        if self._read:
            return b""
        self._read = True
        return self._content

    def get_header(self, name, default=None):
        """Mock method for headers.get()."""
        return self.headers.get(name, default)

    @property
    def headers(self):
        headers = {"x-ratelimit-remaining": "5000"}
        if self._link_header:
            headers["Link"] = self._link_header
        return headers


@pytest.fixture
def mock_args():
    """Mock args for retrieve_data_gen."""
    args = Mock()
    args.as_app = False
    args.token_fine = None
    args.token_classic = "fake_token"
    args.username = None
    args.password = None
    args.osx_keychain_item_name = None
    args.osx_keychain_item_account = None
    args.throttle_limit = None
    args.throttle_pause = 0
    return args


def test_cursor_based_pagination(mock_args):
    """Link header with 'after' cursor parameter works correctly."""

    # Simulate issues endpoint behavior: returns cursor in Link header
    responses = [
        # Issues endpoint returns 'after' cursor parameter (not 'page')
        MockHTTPResponse(
            data=[{"issue": i} for i in range(1, 101)],  # Page 1 contents
            link_header='<https://api.github.com/repos/owner/repo/issues?per_page=100&after=ABC123&page=2>; rel="next"',
        ),
        MockHTTPResponse(
            data=[{"issue": i} for i in range(101, 151)],  # Page 2 contents
            link_header=None,  # No Link header - signals end of pagination
        ),
    ]
    requests_made = []

    def mock_urlopen(request, *args, **kwargs):
        url = request.get_full_url()
        requests_made.append(url)
        return responses[len(requests_made) - 1]

    with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
        results = list(
            github_backup.retrieve_data_gen(
                mock_args, "https://api.github.com/repos/owner/repo/issues"
            )
        )

    # Verify all items retrieved and cursor was used in second request
    assert len(results) == 150
    assert len(requests_made) == 2
    assert "after=ABC123" in requests_made[1]


def test_page_based_pagination(mock_args):
    """Link header with 'page' parameter works correctly."""

    # Simulate pulls/repos endpoint behavior: returns page numbers in Link header
    responses = [
        # Pulls endpoint uses traditional 'page' parameter (not cursor)
        MockHTTPResponse(
            data=[{"pull": i} for i in range(1, 101)],  # Page 1 contents
            link_header='<https://api.github.com/repos/owner/repo/pulls?per_page=100&page=2>; rel="next"',
        ),
        MockHTTPResponse(
            data=[{"pull": i} for i in range(101, 181)],  # Page 2 contents
            link_header=None,  # No Link header - signals end of pagination
        ),
    ]
    requests_made = []

    def mock_urlopen(request, *args, **kwargs):
        url = request.get_full_url()
        requests_made.append(url)
        return responses[len(requests_made) - 1]

    with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
        results = list(
            github_backup.retrieve_data_gen(
                mock_args, "https://api.github.com/repos/owner/repo/pulls"
            )
        )

    # Verify all items retrieved and page parameter was used (not cursor)
    assert len(results) == 180
    assert len(requests_made) == 2
    assert "page=2" in requests_made[1]
    assert "after" not in requests_made[1]


def test_no_link_header_stops_pagination(mock_args):
    """Pagination stops when Link header is absent."""

    # Simulate endpoint with results that fit in a single page
    responses = [
        MockHTTPResponse(
            data=[{"label": i} for i in range(1, 51)],  # Page contents
            link_header=None,  # No Link header - signals end of pagination
        )
    ]
    requests_made = []

    def mock_urlopen(request, *args, **kwargs):
        requests_made.append(request.get_full_url())
        return responses[len(requests_made) - 1]

    with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
        results = list(
            github_backup.retrieve_data_gen(
                mock_args, "https://api.github.com/repos/owner/repo/labels"
            )
        )

    # Verify pagination stopped after first request
    assert len(results) == 50
    assert len(requests_made) == 1
```