Compare commits

...

32 Commits

Author SHA1 Message Date
GitHub Action
ff2681e196 Release version 0.53.0 2025-11-30 04:30:48 +00:00
Jose Diaz-Gonzalez
745b05a63f Merge pull request #456 from Iamrodos/fix-case
fix: case-sensitive username filtering causing silent backup failures
2025-11-29 23:30:07 -05:00
Jose Diaz-Gonzalez
83ff0ae1dd Merge pull request #455 from Iamrodos/fix-133
Avoid rewriting unchanged JSON files for labels, milestones, releases…
2025-11-29 23:29:30 -05:00
Rodos
6ad1959d43 fix: case-sensitive username filtering causing silent backup failures
GitHub's API accepts usernames in any case but returns canonical case.
The case-sensitive comparison in filter_repositories() filtered out all
repositories when user-provided case didn't match GitHub's canonical case.

Changed to case-insensitive comparison.

Fixes #198
2025-11-29 21:16:22 +11:00
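A minimal sketch of the fix described above; filter_repositories() in the github_backup.py hunk below contains the real change, and the repo dict here is illustrative:

    def matches_owner(repo, user):
        # GitHub usernames are case-insensitive, so compare case-folded values
        owner_login = repo.get("owner", {}).get("login", "")
        return owner_login.lower() == user.lower()

    repo = {"owner": {"login": "Iamrodos"}}      # canonical case from the API
    print(repo["owner"]["login"] == "iamrodos")  # False: the old case-sensitive check
    print(matches_owner(repo, "iamrodos"))       # True: the fixed check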
Rodos
5739ac0745 Avoid rewriting unchanged JSON files for labels, milestones, releases, hooks, followers, and following
This change reduces unnecessary writes when backing up metadata that changes
infrequently. The implementation compares existing file content before writing
and skips the write if the content is identical, preserving file timestamps.

Key changes:
- Added json_dump_if_changed() helper that compares content before writing
- Uses atomic writes (temp file + rename) for all metadata files
- NOT applied to issues/pulls (they use incremental_by_files logic)
- Made log messages consistent and past tense ("Saved" instead of "Saving")
- Added informative logging showing skip counts

Fixes #133
2025-11-29 17:21:14 +11:00
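Condensed sketch of the compare-before-write pattern described above; the full helper, json_dump_if_changed(), appears in the github_backup.py hunk below:

    import json
    import os

    def write_if_changed(data, path):
        new_content = json.dumps(data, ensure_ascii=False, sort_keys=True, indent=4)
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                if f.read() == new_content:
                    return False  # identical content: skip the write, preserve mtime
        tmp = path + ".temp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(new_content)
        os.rename(tmp, path)  # atomic replace on POSIX systems
        return True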
GitHub Action
8b7512c8d8 Release version 0.52.0 2025-11-28 23:39:09 +00:00
Jose Diaz-Gonzalez
995b7ede6c Merge pull request #454 from Iamrodos/http-451
Skip DMCA'd repos which return a 451 response
2025-11-28 18:38:32 -05:00
Rodos
7840528fe2 Skip DMCA'd repos which return a 451 response
Log a warning and the link to the DMCA notice. Continue backing up
other repositories instead of crashing.

Closes #163
2025-11-29 09:52:02 +11:00
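The handler relies on the 451 response body shape below (taken from the tests added later in this diff); a sketch of extracting the notice link:

    import json

    body = json.loads(
        '{"message": "Repository access blocked",'
        ' "block": {"reason": "dmca",'
        ' "html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"}}'
    )
    dmca_url = body.get("block", {}).get("html_url")  # None when the API omits it
    # On HTTP 451 the new code raises RepositoryUnavailableError(dmca_url=...);
    # the backup loop logs the warning and continues with the next repository.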
Jose Diaz-Gonzalez
6fb0d86977 Merge pull request #453 from josegonzalez/dependabot/pip/python-packages-42260fba7a
chore(deps): bump restructuredtext-lint from 1.4.0 to 2.0.2 in the python-packages group
2025-11-24 15:07:08 -05:00
dependabot[bot]
9f6b401171 chore(deps): bump restructuredtext-lint in the python-packages group
Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint).


Updates `restructuredtext-lint` from 1.4.0 to 2.0.2
- [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2)

---
updated-dependencies:
- dependency-name: restructuredtext-lint
  dependency-version: 2.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-24 14:58:52 +00:00
Jose Diaz-Gonzalez
bf638f7aea Merge pull request #452 from josegonzalez/dependabot/github_actions/actions/checkout-6
chore(deps): bump actions/checkout from 5 to 6
2025-11-24 04:42:52 -05:00
dependabot[bot]
c3855a94f1 chore(deps): bump actions/checkout from 5 to 6
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-24 04:09:25 +00:00
Jose Diaz-Gonzalez
c3f4bfde0d Merge pull request #451 from josegonzalez/dependabot/pip/python-packages-63544ef561
chore(deps): bump the python-packages group with 3 updates
2025-11-18 11:44:02 -05:00
dependabot[bot]
d3edef0622 chore(deps): bump the python-packages group with 3 updates
Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring).


Updates `click` from 8.3.0 to 8.3.1
- [Release notes](https://github.com/pallets/click/releases)
- [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1)

Updates `pytest` from 8.3.3 to 9.0.1
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1)

Updates `keyring` from 25.6.0 to 25.7.0
- [Release notes](https://github.com/jaraco/keyring/releases)
- [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst)
- [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0)

---
updated-dependencies:
- dependency-name: click
  dependency-version: 8.3.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
- dependency-name: pytest
  dependency-version: 9.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python-packages
- dependency-name: keyring
  dependency-version: 25.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-18 13:24:06 +00:00
GitHub Action
9ef496efad Release version 0.51.3 2025-11-18 06:55:36 +00:00
Jose Diaz-Gonzalez
42bfe6f79d Merge pull request #450 from Iamrodos/test/add-pagination-tests
test: Add pagination tests for cursor and page-based Link headers
2025-11-18 01:54:54 -05:00
Rodos
5af522a348 test: Add pagination tests for cursor and page-based Link headers 2025-11-17 17:14:29 +11:00
Jose Diaz-Gonzalez
6dfba7a783 Merge pull request #449 from 0x2b3bfa0/patch-1
Use cursor based pagination
2025-11-17 00:31:25 -05:00
Helio Machado
7551829677 Use cursor based pagination 2025-11-17 02:09:29 +01:00
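A minimal sketch of the new approach: rather than incrementing a page counter, follow the rel="next" URL from the Link header verbatim, which works for both cursor- and page-based endpoints. The same parsing appears in retrieve_data_gen() in the hunk below; the header value here is illustrative:

    link_header = (
        '<https://api.github.com/repos/owner/repo/issues'
        '?per_page=100&after=ABC123>; rel="next"'
    )
    next_url = None
    for link in link_header.split(","):
        if 'rel="next"' in link:
            next_url = link[link.find("<") + 1:link.find(">")]
            break
    print(next_url)
    # https://api.github.com/repos/owner/repo/issues?per_page=100&after=ABC123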
GitHub Action
72d35a9b94 Release version 0.51.2 2025-11-16 23:55:36 +00:00
Jose Diaz-Gonzalez
3eae9d78ed Merge pull request #447 from Iamrodos/master
fix: Improve CA certificate detection with fallback chain
2025-11-16 18:54:58 -05:00
Rodos
90ba839c7d fix: Improve CA certificate detection with fallback chain
The previous implementation incorrectly assumed empty get_ca_certs()
meant broken SSL, causing false failures in GitHub Codespaces and other
directory-based cert systems where certificates exist but aren't pre-loaded.
It would then attempt to import certifi as a workaround, but certifi wasn't
listed in requirements.txt, causing the fallback to fail with ImportError
even though the system certificates would have worked fine.

This commit replaces the naive check with a layered fallback approach that
checks multiple certificate sources. First it checks for pre-loaded system
certs (file-based systems). Then it verifies system cert paths exist
(directory-based systems like Ubuntu/Debian/Codespaces). Finally it attempts
to use certifi as an optional fallback only if needed.

This approach eliminates hard dependencies (certifi is now optional), works
in GitHub Codespaces without any setup, and fails gracefully with clear hints
for resolution when SSL is actually broken rather than failing with
ModuleNotFoundError.

Fixes #444
2025-11-16 16:33:10 +11:00
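Condensed sketch of the three-layer chain described above; the shipped version, with its logging and install hints, is in the github_backup.py hunk below:

    import os
    import ssl

    https_ctx = ssl.create_default_context()
    if https_ctx.get_ca_certs():
        pass  # Layer 1: certs pre-loaded from a system bundle file
    else:
        paths = ssl.get_default_verify_paths()  # stdlib cafile/capath locations
        if (paths.cafile and os.path.exists(paths.cafile)) or (
            paths.capath and os.path.exists(paths.capath)
        ):
            pass  # Layer 2: certs exist on disk and load lazily (directory-based)
        else:
            try:
                import certifi  # Layer 3: optional fallback bundle
                https_ctx = ssl.create_default_context(cafile=certifi.where())
            except ImportError:
                raise SystemExit(
                    "No CA certificates found; install ca-certificates or certifi."
                )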
GitHub Action
1ec0820936 Release version 0.51.1 2025-11-16 02:01:39 +00:00
Jose Diaz-Gonzalez
ca463e5cd4 Merge pull request #446 from josegonzalez/dependabot/pip/python-packages-4ff811fbf7
chore(deps): bump certifi from 2025.10.5 to 2025.11.12 in the python-packages group
2025-11-15 21:01:01 -05:00
Jose Diaz-Gonzalez
1750d0eff1 Merge pull request #448 from Iamrodos/fix/attachment-duplicate-downloads
fix: Prevent duplicate attachment downloads (with tests)
2025-11-15 21:00:00 -05:00
Rodos
e4d1c78993 test: Add pytest infrastructure and attachment tests
In making my last fix to attachments, I found it challenging not
having tests to ensure there was no regression.

Added pytest with minimal setup and isolated configuration. Created
a separate test workflow to keep tests isolated from linting.

Tests cover the key elements of the attachment logic:
- URL extraction from issue bodies
- Filename extraction from different URL types
- Filename collision resolution
- Manifest duplicate prevention
2025-11-14 10:28:30 +11:00
Rodos
7a9455db88 fix: Prevent duplicate attachment downloads
Fixes bug where attachments were downloaded multiple times with
incremented filenames (file.mov, file_1.mov, file_2.mov) when
running backups without --skip-existing flag.

I should not have used the --skip-existing flag for attachments;
it did not do what I thought it did.

The correct approach is to always use the manifest to guide what
has already been downloaded and what now needs to be done.
2025-11-14 10:28:30 +11:00
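Sketch of the manifest-driven selection described above, with illustrative URLs; the real logic lives in download_attachments() in the hunk below. Successful downloads and permanent failures (404/410) are skipped, while transient failures such as 503 are retried:

    manifest = {
        "attachments": [
            {"url": "https://example.com/a.pdf", "success": True},
            {"url": "https://example.com/b.pdf", "success": False, "http_status": 404},
            {"url": "https://example.com/c.pdf", "success": False, "http_status": 503},
        ]
    }
    skip_urls = {
        entry["url"]
        for entry in manifest["attachments"]
        if entry["success"] or entry.get("http_status") in (404, 410)
    }
    found_urls = [
        "https://example.com/a.pdf",
        "https://example.com/b.pdf",
        "https://example.com/c.pdf",
    ]
    to_download = [u for u in found_urls if u not in skip_urls]
    print(to_download)  # ['https://example.com/c.pdf'] (only the transient failure)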
dependabot[bot]
a98ff7f23d chore(deps): bump certifi in the python-packages group
Bumps the python-packages group with 1 update: [certifi](https://github.com/certifi/python-certifi).


Updates `certifi` from 2025.10.5 to 2025.11.12
- [Commits](https://github.com/certifi/python-certifi/compare/2025.10.05...2025.11.12)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2025.11.12
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-12 13:11:06 +00:00
Jose Diaz-Gonzalez
7b78f06a68 Merge pull request #445 from josegonzalez/dependabot/pip/python-packages-499fb03faa
chore(deps): bump black from 25.9.0 to 25.11.0 in the python-packages group
2025-11-10 12:45:25 -05:00
dependabot[bot]
56db3ff0e8 chore(deps): bump black in the python-packages group
Bumps the python-packages group with 1 update: [black](https://github.com/psf/black).


Updates `black` from 25.9.0 to 25.11.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.9.0...25.11.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.11.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-10 13:59:47 +00:00
Jose Diaz-Gonzalez
5c9c20f6ee Merge pull request #443 from josegonzalez/dependabot/pip/python-packages-7fb8ba35da
chore(deps): bump docutils from 0.22.2 to 0.22.3 in the python-packages group
2025-11-07 15:56:55 -05:00
dependabot[bot]
c8c585cbb5 chore(deps): bump docutils in the python-packages group
Bumps the python-packages group with 1 update: [docutils](https://github.com/rtfd/recommonmark).


Updates `docutils` from 0.22.2 to 0.22.3
- [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rtfd/recommonmark/commits)

---
updated-dependencies:
- dependency-name: docutils
  dependency-version: 0.22.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-06 13:09:51 +00:00
15 changed files with 1351 additions and 107 deletions

(modified workflow file)

@@ -18,7 +18,7 @@ jobs:
runs-on: ubuntu-24.04
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
fetch-depth: 0
ssh-key: ${{ secrets.DEPLOY_PRIVATE_KEY }}

(modified workflow file)

@@ -38,7 +38,7 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
persist-credentials: false

(modified workflow file)

@@ -21,7 +21,7 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Setup Python

.github/workflows/test.yml (new file, +33 lines)

@@ -0,0 +1,33 @@
---
name: "test"
# yamllint disable-line rule:truthy
on:
pull_request:
branches:
- "*"
push:
branches:
- "main"
- "master"
jobs:
test:
name: test
runs-on: ubuntu-24.04
strategy:
matrix:
python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
steps:
- name: Checkout repository
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Setup Python
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python-version }}
cache: "pip"
- run: pip install -r release-requirements.txt
- run: pytest tests/ -v

(modified file: project changelog)

@@ -1,10 +1,249 @@
Changelog
=========
0.51.0 (2025-11-06)
0.53.0 (2025-11-30)
-------------------
Fix
~~~
- Case-sensitive username filtering causing silent backup failures.
[Rodos]
GitHub's API accepts usernames in any case but returns canonical case.
The case-sensitive comparison in filter_repositories() filtered out all
repositories when user-provided case didn't match GitHub's canonical case.
Changed to case-insensitive comparison.
Fixes #198
Other
~~~~~
- Avoid rewriting unchanged JSON files for labels, milestones, releases,
hooks, followers, and following. [Rodos]
This change reduces unnecessary writes when backing up metadata that changes
infrequently. The implementation compares existing file content before writing
and skips the write if the content is identical, preserving file timestamps.
Key changes:
- Added json_dump_if_changed() helper that compares content before writing
- Uses atomic writes (temp file + rename) for all metadata files
- NOT applied to issues/pulls (they use incremental_by_files logic)
- Made log messages consistent and past tense ("Saved" instead of "Saving")
- Added informative logging showing skip counts
Fixes #133
0.52.0 (2025-11-28)
-------------------
- Skip DMCA'd repos which return a 451 response. [Rodos]
Log a warning and the link to the DMCA notice. Continue backing up
other repositories instead of crashing.
Closes #163
- Chore(deps): bump restructuredtext-lint in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint).
Updates `restructuredtext-lint` from 1.4.0 to 2.0.2
- [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2)
---
updated-dependencies:
- dependency-name: restructuredtext-lint
dependency-version: 2.0.2
dependency-type: direct:production
update-type: version-update:semver-major
dependency-group: python-packages
...
- Chore(deps): bump actions/checkout from 5 to 6. [dependabot[bot]]
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)
---
updated-dependencies:
- dependency-name: actions/checkout
dependency-version: '6'
dependency-type: direct:production
update-type: version-update:semver-major
...
- Chore(deps): bump the python-packages group with 3 updates.
[dependabot[bot]]
Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring).
Updates `click` from 8.3.0 to 8.3.1
- [Release notes](https://github.com/pallets/click/releases)
- [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1)
Updates `pytest` from 8.3.3 to 9.0.1
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1)
Updates `keyring` from 25.6.0 to 25.7.0
- [Release notes](https://github.com/jaraco/keyring/releases)
- [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst)
- [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0)
---
updated-dependencies:
- dependency-name: click
dependency-version: 8.3.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
- dependency-name: pytest
dependency-version: 9.0.1
dependency-type: direct:production
update-type: version-update:semver-major
dependency-group: python-packages
- dependency-name: keyring
dependency-version: 25.7.0
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
...
0.51.3 (2025-11-18)
-------------------
- Test: Add pagination tests for cursor and page-based Link headers.
[Rodos]
- Use cursor based pagination. [Helio Machado]
0.51.2 (2025-11-16)
-------------------
Fix
~~~
- Improve CA certificate detection with fallback chain. [Rodos]
The previous implementation incorrectly assumed empty get_ca_certs()
meant broken SSL, causing false failures in GitHub Codespaces and other
directory-based cert systems where certificates exist but aren't pre-loaded.
It would then attempt to import certifi as a workaround, but certifi wasn't
listed in requirements.txt, causing the fallback to fail with ImportError
even though the system certificates would have worked fine.
This commit replaces the naive check with a layered fallback approach that
checks multiple certificate sources. First it checks for pre-loaded system
certs (file-based systems). Then it verifies system cert paths exist
(directory-based systems like Ubuntu/Debian/Codespaces). Finally it attempts
to use certifi as an optional fallback only if needed.
This approach eliminates hard dependencies (certifi is now optional), works
in GitHub Codespaces without any setup, and fails gracefully with clear hints
for resolution when SSL is actually broken rather than failing with
ModuleNotFoundError.
Fixes #444
0.51.1 (2025-11-16)
-------------------
Fix
~~~
- Prevent duplicate attachment downloads. [Rodos]
Fixes bug where attachments were downloaded multiple times with
incremented filenames (file.mov, file_1.mov, file_2.mov) when
running backups without --skip-existing flag.
I should not have used the --skip-existing flag for attachments;
it did not do what I thought it did.
The correct approach is to always use the manifest to guide what
has already been downloaded and what now needs to be done.
Other
~~~~~
- Chore(deps): bump certifi in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [certifi](https://github.com/certifi/python-certifi).
Updates `certifi` from 2025.10.5 to 2025.11.12
- [Commits](https://github.com/certifi/python-certifi/compare/2025.10.05...2025.11.12)
---
updated-dependencies:
- dependency-name: certifi
dependency-version: 2025.11.12
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
...
- Test: Add pytest infrastructure and attachment tests. [Rodos]
In making my last fix to attachments, I found it challenging not
having tests to ensure there was no regression.
Added pytest with minimal setup and isolated configuration. Created
a separate test workflow to keep tests isolated from linting.
Tests cover the key elements of the attachment logic:
- URL extraction from issue bodies
- Filename extraction from different URL types
- Filename collision resolution
- Manifest duplicate prevention
- Chore(deps): bump black in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [black](https://github.com/psf/black).
Updates `black` from 25.9.0 to 25.11.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.9.0...25.11.0)
---
updated-dependencies:
- dependency-name: black
dependency-version: 25.11.0
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
...
- Chore(deps): bump docutils in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [docutils](https://github.com/rtfd/recommonmark).
Updates `docutils` from 0.22.2 to 0.22.3
- [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rtfd/recommonmark/commits)
---
updated-dependencies:
- dependency-name: docutils
dependency-version: 0.22.3
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
0.51.0 (2025-11-06)
-------------------
Fix
~~~
- Remove Python 3.8 and 3.9 from CI matrix. [Rodos]

(modified file: package version)

@@ -1 +1 @@
__version__ = "0.51.0"
__version__ = "0.53.0"

(modified file: github_backup/github_backup.py)

@@ -37,22 +37,42 @@ FNULL = open(os.devnull, "w")
FILE_URI_PREFIX = "file://"
logger = logging.getLogger(__name__)
class RepositoryUnavailableError(Exception):
"""Raised when a repository is unavailable due to legal reasons (e.g., DMCA takedown)."""
def __init__(self, message, dmca_url=None):
super().__init__(message)
self.dmca_url = dmca_url
# Setup SSL context with fallback chain
https_ctx = ssl.create_default_context()
if not https_ctx.get_ca_certs():
import warnings
if https_ctx.get_ca_certs():
# Layer 1: Certificates pre-loaded from system (file-based)
pass
else:
paths = ssl.get_default_verify_paths()
if (paths.cafile and os.path.exists(paths.cafile)) or (
paths.capath and os.path.exists(paths.capath)
):
# Layer 2: Cert paths exist, will be lazy-loaded on first use (directory-based)
pass
else:
# Layer 3: Try certifi package as optional fallback
try:
import certifi
warnings.warn(
"\n\nYOUR DEFAULT CA CERTS ARE EMPTY.\n"
+ "PLEASE POPULATE ANY OF:"
+ "".join(
["\n - " + x for x in ssl.get_default_verify_paths() if type(x) is str]
)
+ "\n",
stacklevel=2,
)
import certifi
https_ctx = ssl.create_default_context(cafile=certifi.where())
https_ctx = ssl.create_default_context(cafile=certifi.where())
except ImportError:
# All layers failed - no certificates available anywhere
sys.exit(
"\nERROR: No CA certificates found. Cannot connect to GitHub over SSL.\n\n"
"Solutions you can explore:\n"
" 1. pip install certifi\n"
" 2. Alpine: apk add ca-certificates\n"
" 3. Debian/Ubuntu: apt-get install ca-certificates\n\n"
)
def logging_subprocess(
@@ -581,27 +601,39 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
auth = get_auth(args, encode=not args.as_app)
query_args = get_query_args(query_args)
per_page = 100
page = 0
next_url = None
while True:
if single_request:
request_page, request_per_page = None, None
request_per_page = None
else:
page = page + 1
request_page, request_per_page = page, per_page
request_per_page = per_page
request = _construct_request(
request_per_page,
request_page,
query_args,
template,
next_url or template,
auth,
as_app=args.as_app,
fine=True if args.token_fine is not None else False,
) # noqa
r, errors = _get_response(request, auth, template)
r, errors = _get_response(request, auth, next_url or template)
status_code = int(r.getcode())
# Handle DMCA takedown (HTTP 451) - raise exception to skip entire repository
if status_code == 451:
dmca_url = None
try:
response_data = json.loads(r.read().decode("utf-8"))
dmca_url = response_data.get("block", {}).get("html_url")
except Exception:
pass
raise RepositoryUnavailableError(
"Repository unavailable due to legal reasons (HTTP 451)",
dmca_url=dmca_url
)
# Check if we got correct data
try:
response = json.loads(r.read().decode("utf-8"))
@@ -633,15 +665,14 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
retries += 1
time.sleep(5)
request = _construct_request(
per_page,
page,
request_per_page,
query_args,
template,
next_url or template,
auth,
as_app=args.as_app,
fine=True if args.token_fine is not None else False,
) # noqa
r, errors = _get_response(request, auth, template)
r, errors = _get_response(request, auth, next_url or template)
status_code = int(r.getcode())
try:
@@ -671,7 +702,16 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
if type(response) is list:
for resp in response:
yield resp
if len(response) < per_page:
# Parse Link header for next page URL (cursor-based pagination)
link_header = r.headers.get("Link", "")
next_url = None
if link_header:
# Parse Link header: <https://api.github.com/...?per_page=100&after=cursor>; rel="next"
for link in link_header.split(","):
if 'rel="next"' in link:
next_url = link[link.find("<") + 1:link.find(">")]
break
if not next_url:
break
elif type(response) is dict and single_request:
yield response
@@ -724,22 +764,27 @@ def _get_response(request, auth, template):
def _construct_request(
per_page, page, query_args, template, auth, as_app=None, fine=False
per_page, query_args, template, auth, as_app=None, fine=False
):
all_query_args = {}
if per_page:
all_query_args["per_page"] = per_page
if page:
all_query_args["page"] = page
if query_args:
all_query_args.update(query_args)
request_url = template
if all_query_args:
querystring = urlencode(all_query_args)
request_url = template + "?" + querystring
# If template is already a full URL with query params (from Link header), use it directly
if "?" in template and template.startswith("http"):
request_url = template
# Extract query string for logging
querystring = template.split("?", 1)[1]
else:
querystring = ""
# Build URL with query parameters
all_query_args = {}
if per_page:
all_query_args["per_page"] = per_page
if query_args:
all_query_args.update(query_args)
request_url = template
if all_query_args:
querystring = urlencode(all_query_args)
request_url = template + "?" + querystring
else:
querystring = ""
request = Request(request_url)
if auth is not None:
@@ -755,7 +800,7 @@ def _construct_request(
"Accept", "application/vnd.github.machine-man-preview+json"
)
log_url = template
log_url = template if "?" not in template else template.split("?")[0]
if querystring:
log_url += "?" + querystring
logger.info("Requesting {}".format(log_url))
@@ -832,8 +877,7 @@ def download_file(url, path, auth, as_app=False, fine=False):
return
request = _construct_request(
per_page=100,
page=1,
per_page=None,
query_args={},
template=url,
auth=auth,
@@ -919,12 +963,6 @@ def download_attachment_file(url, path, auth, as_app=False, fine=False):
"error": None,
}
if os.path.exists(path):
metadata["success"] = True
metadata["http_status"] = 200 # Assume success if already exists
metadata["size_bytes"] = os.path.getsize(path)
return metadata
# Create simple request (no API query params)
request = Request(url)
request.add_header("Accept", "application/octet-stream")
@@ -1337,10 +1375,10 @@ def download_attachments(
attachments_dir = os.path.join(item_cwd, "attachments", str(number))
manifest_path = os.path.join(attachments_dir, "manifest.json")
# Load existing manifest if skip_existing is enabled
# Load existing manifest to prevent duplicate downloads
existing_urls = set()
existing_metadata = []
if args.skip_existing and os.path.exists(manifest_path):
if os.path.exists(manifest_path):
try:
with open(manifest_path, "r") as f:
existing_manifest = json.load(f)
@@ -1395,9 +1433,6 @@ def download_attachments(
filename = get_attachment_filename(url)
filepath = os.path.join(attachments_dir, filename)
# Check for collision BEFORE downloading
filepath = resolve_filename_collision(filepath)
# Download and get metadata
metadata = download_attachment_file(
url,
@@ -1552,7 +1587,9 @@ def filter_repositories(args, unfiltered_repositories):
repositories = []
for r in unfiltered_repositories:
# gists can be anonymous, so need to safely check owner
if r.get("owner", {}).get("login") == args.user or r.get("is_starred"):
# Use case-insensitive comparison to match GitHub's case-insensitive username behavior
owner_login = r.get("owner", {}).get("login", "")
if owner_login.lower() == args.user.lower() or r.get("is_starred"):
repositories.append(r)
name_regex = None
@@ -1655,40 +1692,47 @@ def backup_repositories(args, output_directory, repositories):
continue # don't try to back anything else for a gist; it doesn't exist
download_wiki = args.include_wiki or args.include_everything
if repository["has_wiki"] and download_wiki:
fetch_repository(
repository["name"],
repo_url.replace(".git", ".wiki.git"),
os.path.join(repo_cwd, "wiki"),
skip_existing=args.skip_existing,
bare_clone=args.bare_clone,
lfs_clone=args.lfs_clone,
no_prune=args.no_prune,
)
if args.include_issues or args.include_everything:
backup_issues(args, repo_cwd, repository, repos_template)
try:
download_wiki = args.include_wiki or args.include_everything
if repository["has_wiki"] and download_wiki:
fetch_repository(
repository["name"],
repo_url.replace(".git", ".wiki.git"),
os.path.join(repo_cwd, "wiki"),
skip_existing=args.skip_existing,
bare_clone=args.bare_clone,
lfs_clone=args.lfs_clone,
no_prune=args.no_prune,
)
if args.include_issues or args.include_everything:
backup_issues(args, repo_cwd, repository, repos_template)
if args.include_pulls or args.include_everything:
backup_pulls(args, repo_cwd, repository, repos_template)
if args.include_pulls or args.include_everything:
backup_pulls(args, repo_cwd, repository, repos_template)
if args.include_milestones or args.include_everything:
backup_milestones(args, repo_cwd, repository, repos_template)
if args.include_milestones or args.include_everything:
backup_milestones(args, repo_cwd, repository, repos_template)
if args.include_labels or args.include_everything:
backup_labels(args, repo_cwd, repository, repos_template)
if args.include_labels or args.include_everything:
backup_labels(args, repo_cwd, repository, repos_template)
if args.include_hooks or args.include_everything:
backup_hooks(args, repo_cwd, repository, repos_template)
if args.include_hooks or args.include_everything:
backup_hooks(args, repo_cwd, repository, repos_template)
if args.include_releases or args.include_everything:
backup_releases(
args,
repo_cwd,
repository,
repos_template,
include_assets=args.include_assets or args.include_everything,
)
if args.include_releases or args.include_everything:
backup_releases(
args,
repo_cwd,
repository,
repos_template,
include_assets=args.include_assets or args.include_everything,
)
except RepositoryUnavailableError as e:
logger.warning(f"Repository {repository['full_name']} is unavailable (HTTP 451)")
if e.dmca_url:
logger.warning(f"DMCA notice: {e.dmca_url}")
logger.info(f"Skipping remaining resources for {repository['full_name']}")
continue
if args.incremental:
if last_update == "0000-00-00T00:00:00Z":
@@ -1856,11 +1900,21 @@ def backup_milestones(args, repo_cwd, repository, repos_template):
for milestone in _milestones:
milestones[milestone["number"]] = milestone
logger.info("Saving {0} milestones to disk".format(len(list(milestones.keys()))))
written_count = 0
for number, milestone in list(milestones.items()):
milestone_file = "{0}/{1}.json".format(milestone_cwd, number)
with codecs.open(milestone_file, "w", encoding="utf-8") as f:
json_dump(milestone, f)
if json_dump_if_changed(milestone, milestone_file):
written_count += 1
total = len(milestones)
if written_count == total:
logger.info("Saved {0} milestones to disk".format(total))
elif written_count == 0:
logger.info("{0} milestones unchanged, skipped write".format(total))
else:
logger.info("Saved {0} of {1} milestones to disk ({2} unchanged)".format(
written_count, total, total - written_count
))
def backup_labels(args, repo_cwd, repository, repos_template):
@@ -1913,19 +1967,17 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
reverse=True,
)
releases = releases[: args.number_of_latest_releases]
logger.info("Saving the latest {0} releases to disk".format(len(releases)))
else:
logger.info("Saving {0} releases to disk".format(len(releases)))
# for each release, store it
written_count = 0
for release in releases:
release_name = release["tag_name"]
release_name_safe = release_name.replace("/", "__")
output_filepath = os.path.join(
release_cwd, "{0}.json".format(release_name_safe)
)
with codecs.open(output_filepath, "w+", encoding="utf-8") as f:
json_dump(release, f)
if json_dump_if_changed(release, output_filepath):
written_count += 1
if include_assets:
assets = retrieve_data(args, release["assets_url"])
@@ -1942,6 +1994,17 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
fine=True if args.token_fine is not None else False,
)
# Log the results
total = len(releases)
if written_count == total:
logger.info("Saved {0} releases to disk".format(total))
elif written_count == 0:
logger.info("{0} releases unchanged, skipped write".format(total))
else:
logger.info("Saved {0} of {1} releases to disk ({2} unchanged)".format(
written_count, total, total - written_count
))
def fetch_repository(
name,
@@ -2066,9 +2129,10 @@ def _backup_data(args, name, template, output_file, output_directory):
mkdir_p(output_directory)
data = retrieve_data(args, template)
logger.info("Writing {0} {1} to disk".format(len(data), name))
with codecs.open(output_file, "w", encoding="utf-8") as f:
json_dump(data, f)
if json_dump_if_changed(data, output_file):
logger.info("Saved {0} {1} to disk".format(len(data), name))
else:
logger.info("{0} {1} unchanged, skipped write".format(len(data), name))
def json_dump(data, output_file):
@@ -2080,3 +2144,57 @@ def json_dump(data, output_file):
indent=4,
separators=(",", ": "),
)
def json_dump_if_changed(data, output_file_path):
"""
Write JSON data to file only if content has changed.
Compares the serialized JSON data with the existing file content
and only writes if different. This prevents unnecessary file
modification timestamp updates and disk writes.
Uses atomic writes (temp file + rename) to prevent corruption
if the process is interrupted during the write.
Args:
data: The data to serialize as JSON
output_file_path: The path to the output file
Returns:
True if file was written (content changed or new file)
False if write was skipped (content unchanged)
"""
# Serialize new data with consistent formatting matching json_dump()
new_content = json.dumps(
data,
ensure_ascii=False,
sort_keys=True,
indent=4,
separators=(",", ": "),
)
# Check if file exists and compare content
if os.path.exists(output_file_path):
try:
with codecs.open(output_file_path, "r", encoding="utf-8") as f:
existing_content = f.read()
if existing_content == new_content:
logger.debug(
"Content unchanged, skipping write: {0}".format(output_file_path)
)
return False
except (OSError, UnicodeDecodeError) as e:
# If we can't read the existing file, write the new one
logger.debug(
"Error reading existing file {0}, will overwrite: {1}".format(
output_file_path, e
)
)
# Write the file atomically using temp file + rename
temp_file = output_file_path + ".temp"
with codecs.open(temp_file, "w", encoding="utf-8") as f:
f.write(new_content)
os.rename(temp_file, output_file_path) # Atomic on POSIX systems
return True

pytest.ini (new file, +6 lines)

@@ -0,0 +1,6 @@
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = -v

(modified file: release-requirements.txt)

@@ -1,17 +1,18 @@
autopep8==2.3.2
black==25.9.0
black==25.11.0
bleach==6.3.0
certifi==2025.10.5
certifi==2025.11.12
charset-normalizer==3.4.4
click==8.3.0
click==8.3.1
colorama==0.4.6
docutils==0.22.2
docutils==0.22.3
flake8==7.3.0
gitchangelog==3.0.4
pytest==9.0.1
idna==3.11
importlib-metadata==8.7.0
jaraco.classes==3.4.0
keyring==25.6.0
keyring==25.7.0
markdown-it-py==4.0.0
mccabe==0.7.0
mdurl==0.1.2
@@ -27,7 +28,7 @@ Pygments==2.19.2
readme-renderer==44.0
requests==2.32.5
requests-toolbelt==1.0.0
restructuredtext-lint==1.4.0
restructuredtext-lint==2.0.2
rfc3986==2.0.0
rich==14.2.0
setuptools==80.9.0

(modified file)

@@ -1 +0,0 @@

tests/__init__.py (new file, +1 line)

@@ -0,0 +1 @@
"""Tests for python-github-backup."""

tests/test_attachments.py (new file, +353 lines)

@@ -0,0 +1,353 @@
"""Behavioral tests for attachment functionality."""
import json
import os
import tempfile
from pathlib import Path
from unittest.mock import Mock
import pytest
from github_backup import github_backup
@pytest.fixture
def attachment_test_setup(tmp_path):
"""Fixture providing setup and helper for attachment download tests."""
from unittest.mock import patch
issue_cwd = tmp_path / "issues"
issue_cwd.mkdir()
# Mock args
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.user = "testuser"
args.repository = "testrepo"
repository = {"full_name": "testuser/testrepo"}
def call_download(issue_data, issue_number=123):
"""Call download_attachments with mocked HTTP downloads.
Returns list of URLs that were actually downloaded.
"""
downloaded_urls = []
def mock_download(url, path, auth, as_app, fine):
downloaded_urls.append(url)
return {
"success": True,
"saved_as": os.path.basename(path),
"url": url,
}
with patch(
"github_backup.github_backup.download_attachment_file",
side_effect=mock_download,
):
github_backup.download_attachments(
args, str(issue_cwd), issue_data, issue_number, repository
)
return downloaded_urls
return {
"issue_cwd": str(issue_cwd),
"args": args,
"repository": repository,
"call_download": call_download,
}
class TestURLExtraction:
"""Test URL extraction with realistic issue content."""
def test_mixed_urls(self):
issue_data = {
"body": """
## Bug Report
When uploading files, I see this error. Here's a screenshot:
https://github.com/user-attachments/assets/abc123def456
The logs show: https://github.com/user-attachments/files/789/error-log.txt
This is similar to https://github.com/someorg/somerepo/issues/42 but different.
You can also see the video at https://user-images.githubusercontent.com/12345/video-demo.mov
Here's how to reproduce:
```bash
# Don't extract this example URL:
curl https://github.com/user-attachments/assets/example999
```
More info at https://docs.example.com/guide
Also see this inline code `https://github.com/user-attachments/files/111/inline.pdf` should not extract.
Final attachment: https://github.com/user-attachments/files/222/report.pdf.
""",
"comment_data": [
{
"body": "Here's another attachment: https://private-user-images.githubusercontent.com/98765/secret.png?jwt=token123"
},
{
"body": """
Example code:
```python
url = "https://github.com/user-attachments/assets/code-example"
```
But this is real: https://github.com/user-attachments/files/333/actual.zip
"""
},
],
}
# Extract URLs
urls = github_backup.extract_attachment_urls(issue_data)
expected_urls = [
"https://github.com/user-attachments/assets/abc123def456",
"https://github.com/user-attachments/files/789/error-log.txt",
"https://user-images.githubusercontent.com/12345/video-demo.mov",
"https://github.com/user-attachments/files/222/report.pdf",
"https://private-user-images.githubusercontent.com/98765/secret.png?jwt=token123",
"https://github.com/user-attachments/files/333/actual.zip",
]
assert set(urls) == set(expected_urls)
def test_trailing_punctuation_stripped(self):
"""URLs with trailing punctuation should have punctuation stripped."""
issue_data = {
"body": """
See this file: https://github.com/user-attachments/files/1/doc.pdf.
And this one (https://github.com/user-attachments/files/2/image.png).
Check it out! https://github.com/user-attachments/files/3/data.csv!
"""
}
urls = github_backup.extract_attachment_urls(issue_data)
expected = [
"https://github.com/user-attachments/files/1/doc.pdf",
"https://github.com/user-attachments/files/2/image.png",
"https://github.com/user-attachments/files/3/data.csv",
]
assert set(urls) == set(expected)
def test_deduplication_across_body_and_comments(self):
"""Same URL in body and comments should only appear once."""
duplicate_url = "https://github.com/user-attachments/assets/abc123"
issue_data = {
"body": f"First mention: {duplicate_url}",
"comment_data": [
{"body": f"Second mention: {duplicate_url}"},
{"body": f"Third mention: {duplicate_url}"},
],
}
urls = github_backup.extract_attachment_urls(issue_data)
assert set(urls) == {duplicate_url}
class TestFilenameExtraction:
"""Test filename extraction from different URL types."""
def test_modern_assets_url(self):
"""Modern assets URL returns UUID."""
url = "https://github.com/user-attachments/assets/abc123def456"
filename = github_backup.get_attachment_filename(url)
assert filename == "abc123def456"
def test_modern_files_url(self):
"""Modern files URL returns filename."""
url = "https://github.com/user-attachments/files/12345/report.pdf"
filename = github_backup.get_attachment_filename(url)
assert filename == "report.pdf"
def test_legacy_cdn_url(self):
"""Legacy CDN URL returns filename with extension."""
url = "https://user-images.githubusercontent.com/123456/abc-def.png"
filename = github_backup.get_attachment_filename(url)
assert filename == "abc-def.png"
def test_private_cdn_url(self):
"""Private CDN URL returns filename."""
url = "https://private-user-images.githubusercontent.com/98765/secret.png?jwt=token123"
filename = github_backup.get_attachment_filename(url)
assert filename == "secret.png"
def test_repo_files_url(self):
"""Repo-scoped files URL returns filename."""
url = "https://github.com/owner/repo/files/789/document.txt"
filename = github_backup.get_attachment_filename(url)
assert filename == "document.txt"
class TestFilenameCollision:
"""Test filename collision resolution."""
def test_collision_behavior(self):
"""Test filename collision resolution with real files."""
with tempfile.TemporaryDirectory() as tmpdir:
# No collision - file doesn't exist
result = github_backup.resolve_filename_collision(
os.path.join(tmpdir, "report.pdf")
)
assert result == os.path.join(tmpdir, "report.pdf")
# Create the file, now collision exists
Path(os.path.join(tmpdir, "report.pdf")).touch()
result = github_backup.resolve_filename_collision(
os.path.join(tmpdir, "report.pdf")
)
assert result == os.path.join(tmpdir, "report_1.pdf")
# Create report_1.pdf too
Path(os.path.join(tmpdir, "report_1.pdf")).touch()
result = github_backup.resolve_filename_collision(
os.path.join(tmpdir, "report.pdf")
)
assert result == os.path.join(tmpdir, "report_2.pdf")
def test_manifest_reserved(self):
"""manifest.json is always treated as reserved."""
with tempfile.TemporaryDirectory() as tmpdir:
# Even if manifest.json doesn't exist, should get manifest_1.json
result = github_backup.resolve_filename_collision(
os.path.join(tmpdir, "manifest.json")
)
assert result == os.path.join(tmpdir, "manifest_1.json")
class TestManifestDuplicatePrevention:
"""Test that manifest prevents duplicate downloads (the bug fix)."""
def test_manifest_filters_existing_urls(self, attachment_test_setup):
"""URLs in manifest are not re-downloaded."""
setup = attachment_test_setup
# Create manifest with existing URLs
attachments_dir = os.path.join(setup["issue_cwd"], "attachments", "123")
os.makedirs(attachments_dir)
manifest_path = os.path.join(attachments_dir, "manifest.json")
manifest = {
"attachments": [
{
"url": "https://github.com/user-attachments/assets/old1",
"success": True,
"saved_as": "old1.pdf",
},
{
"url": "https://github.com/user-attachments/assets/old2",
"success": True,
"saved_as": "old2.pdf",
},
]
}
with open(manifest_path, "w") as f:
json.dump(manifest, f)
# Issue data with 2 old URLs and 1 new URL
issue_data = {
"body": """
Old: https://github.com/user-attachments/assets/old1
Old: https://github.com/user-attachments/assets/old2
New: https://github.com/user-attachments/assets/new1
"""
}
downloaded_urls = setup["call_download"](issue_data)
# Should only download the NEW URL (old ones filtered by manifest)
assert len(downloaded_urls) == 1
assert downloaded_urls[0] == "https://github.com/user-attachments/assets/new1"
def test_no_manifest_downloads_all(self, attachment_test_setup):
"""Without manifest, all URLs should be downloaded."""
setup = attachment_test_setup
# Issue data with 2 URLs
issue_data = {
"body": """
https://github.com/user-attachments/assets/url1
https://github.com/user-attachments/assets/url2
"""
}
downloaded_urls = setup["call_download"](issue_data)
# Should download ALL URLs (no manifest to filter)
assert len(downloaded_urls) == 2
assert set(downloaded_urls) == {
"https://github.com/user-attachments/assets/url1",
"https://github.com/user-attachments/assets/url2",
}
def test_manifest_skips_permanent_failures(self, attachment_test_setup):
"""Manifest skips permanent failures (404, 410) but retries transient (503)."""
setup = attachment_test_setup
# Create manifest with different failure types
attachments_dir = os.path.join(setup["issue_cwd"], "attachments", "123")
os.makedirs(attachments_dir)
manifest_path = os.path.join(attachments_dir, "manifest.json")
manifest = {
"attachments": [
{
"url": "https://github.com/user-attachments/assets/success",
"success": True,
"saved_as": "success.pdf",
},
{
"url": "https://github.com/user-attachments/assets/notfound",
"success": False,
"http_status": 404,
},
{
"url": "https://github.com/user-attachments/assets/gone",
"success": False,
"http_status": 410,
},
{
"url": "https://github.com/user-attachments/assets/unavailable",
"success": False,
"http_status": 503,
},
]
}
with open(manifest_path, "w") as f:
json.dump(manifest, f)
# Issue data has all 4 URLs
issue_data = {
"body": """
https://github.com/user-attachments/assets/success
https://github.com/user-attachments/assets/notfound
https://github.com/user-attachments/assets/gone
https://github.com/user-attachments/assets/unavailable
"""
}
downloaded_urls = setup["call_download"](issue_data)
# Should only retry 503 (transient failure)
# Success, 404, and 410 should be skipped
assert len(downloaded_urls) == 1
assert (
downloaded_urls[0]
== "https://github.com/user-attachments/assets/unavailable"
)

tests/test_http_451.py (new file, +143 lines)

@@ -0,0 +1,143 @@
"""Tests for HTTP 451 (DMCA takedown) handling."""
import json
from unittest.mock import Mock, patch
import pytest
from github_backup import github_backup
class TestHTTP451Exception:
"""Test suite for HTTP 451 DMCA takedown exception handling."""
def test_repository_unavailable_error_raised(self):
"""HTTP 451 should raise RepositoryUnavailableError with DMCA URL."""
# Create mock args
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
# Mock HTTPError 451 response
mock_response = Mock()
mock_response.getcode.return_value = 451
dmca_data = {
"message": "Repository access blocked",
"block": {
"reason": "dmca",
"created_at": "2024-11-12T14:38:04Z",
"html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
}
}
mock_response.read.return_value = json.dumps(dmca_data).encode("utf-8")
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Unavailable For Legal Reasons"
def mock_get_response(request, auth, template):
return mock_response, []
with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
# Check exception has DMCA URL
assert exc_info.value.dmca_url == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
assert "451" in str(exc_info.value)
def test_repository_unavailable_error_without_dmca_url(self):
"""HTTP 451 without DMCA details should still raise exception."""
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
mock_response = Mock()
mock_response.getcode.return_value = 451
mock_response.read.return_value = b'{"message": "Blocked"}'
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Unavailable For Legal Reasons"
def mock_get_response(request, auth, template):
return mock_response, []
with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
# Exception raised even without DMCA URL
assert exc_info.value.dmca_url is None
assert "451" in str(exc_info.value)
def test_repository_unavailable_error_with_malformed_json(self):
"""HTTP 451 with malformed JSON should still raise exception."""
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
mock_response = Mock()
mock_response.getcode.return_value = 451
mock_response.read.return_value = b"invalid json {"
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Unavailable For Legal Reasons"
def mock_get_response(request, auth, template):
return mock_response, []
with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
with pytest.raises(github_backup.RepositoryUnavailableError):
list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
def test_other_http_errors_unchanged(self):
"""Other HTTP errors should still raise generic Exception."""
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
mock_response = Mock()
mock_response.getcode.return_value = 404
mock_response.read.return_value = b'{"message": "Not Found"}'
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Not Found"
def mock_get_response(request, auth, template):
return mock_response, []
with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
# Should raise generic Exception, not RepositoryUnavailableError
with pytest.raises(Exception) as exc_info:
list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/notfound/issues"))
assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError)
assert "404" in str(exc_info.value)
if __name__ == "__main__":
pytest.main([__file__, "-v"])

(new file, +198 lines: tests for json_dump_if_changed)

@@ -0,0 +1,198 @@
"""Tests for json_dump_if_changed functionality."""
import codecs
import json
import os
import tempfile
import pytest
from github_backup import github_backup
class TestJsonDumpIfChanged:
"""Test suite for json_dump_if_changed function."""
def test_writes_new_file(self):
"""Should write file when it doesn't exist."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {"key": "value", "number": 42}
result = github_backup.json_dump_if_changed(test_data, output_file)
assert result is True
assert os.path.exists(output_file)
# Verify content matches expected format
with codecs.open(output_file, "r", encoding="utf-8") as f:
content = f.read()
loaded = json.loads(content)
assert loaded == test_data
def test_skips_unchanged_file(self):
"""Should skip write when content is identical."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {"key": "value", "number": 42}
# First write
result1 = github_backup.json_dump_if_changed(test_data, output_file)
assert result1 is True
# Get the initial mtime
mtime1 = os.path.getmtime(output_file)
# Second write with same data
result2 = github_backup.json_dump_if_changed(test_data, output_file)
assert result2 is False
# File should not have been modified
mtime2 = os.path.getmtime(output_file)
assert mtime1 == mtime2
def test_writes_when_content_changed(self):
"""Should write file when content has changed."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data1 = {"key": "value1"}
test_data2 = {"key": "value2"}
# First write
result1 = github_backup.json_dump_if_changed(test_data1, output_file)
assert result1 is True
# Second write with different data
result2 = github_backup.json_dump_if_changed(test_data2, output_file)
assert result2 is True
# Verify new content
with codecs.open(output_file, "r", encoding="utf-8") as f:
loaded = json.load(f)
assert loaded == test_data2
def test_uses_consistent_formatting(self):
"""Should use same JSON formatting as json_dump."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {"z": "last", "a": "first", "m": "middle"}
github_backup.json_dump_if_changed(test_data, output_file)
with codecs.open(output_file, "r", encoding="utf-8") as f:
content = f.read()
# Check for consistent formatting:
# - sorted keys
# - 4-space indent
# - comma-colon-space separator
expected = json.dumps(
test_data,
ensure_ascii=False,
sort_keys=True,
indent=4,
separators=(",", ": "),
)
assert content == expected
def test_atomic_write_always_used(self):
"""Should always use temp file and rename for atomic writes."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {"key": "value"}
result = github_backup.json_dump_if_changed(test_data, output_file)
assert result is True
assert os.path.exists(output_file)
# Temp file should not exist after atomic write
temp_file = output_file + ".temp"
assert not os.path.exists(temp_file)
# Verify content
with codecs.open(output_file, "r", encoding="utf-8") as f:
loaded = json.load(f)
assert loaded == test_data
def test_handles_unicode_content(self):
"""Should correctly handle Unicode content."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {
"emoji": "🚀",
"chinese": "你好",
"arabic": "مرحبا",
"cyrillic": "Привет",
}
result = github_backup.json_dump_if_changed(test_data, output_file)
assert result is True
# Verify Unicode is preserved
with codecs.open(output_file, "r", encoding="utf-8") as f:
loaded = json.load(f)
assert loaded == test_data
# Second write should skip
result2 = github_backup.json_dump_if_changed(test_data, output_file)
assert result2 is False
def test_handles_complex_nested_data(self):
"""Should handle complex nested data structures."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {
"users": [
{"id": 1, "name": "Alice", "tags": ["admin", "user"]},
{"id": 2, "name": "Bob", "tags": ["user"]},
],
"metadata": {"version": "1.0", "nested": {"deep": {"value": 42}}},
}
result = github_backup.json_dump_if_changed(test_data, output_file)
assert result is True
# Verify structure is preserved
with codecs.open(output_file, "r", encoding="utf-8") as f:
loaded = json.load(f)
assert loaded == test_data
def test_overwrites_on_unicode_decode_error(self):
"""Should overwrite if existing file has invalid UTF-8."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {"key": "value"}
# Write invalid UTF-8 bytes
with open(output_file, "wb") as f:
f.write(b"\xff\xfe invalid utf-8")
# Should catch UnicodeDecodeError and overwrite
result = github_backup.json_dump_if_changed(test_data, output_file)
assert result is True
# Verify new content was written
with codecs.open(output_file, "r", encoding="utf-8") as f:
loaded = json.load(f)
assert loaded == test_data
def test_key_order_independence(self):
"""Should treat differently-ordered dicts as same if keys/values match."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
# Write first dict
data1 = {"z": 1, "a": 2, "m": 3}
github_backup.json_dump_if_changed(data1, output_file)
# Try to write same data but different order
data2 = {"a": 2, "m": 3, "z": 1}
result = github_backup.json_dump_if_changed(data2, output_file)
# Should skip because content is the same (keys are sorted)
assert result is False
if __name__ == "__main__":
pytest.main([__file__, "-v"])

tests/test_pagination.py (new file, +153 lines)

@@ -0,0 +1,153 @@
"""Tests for Link header pagination handling."""
import json
from unittest.mock import Mock, patch
import pytest
from github_backup import github_backup
class MockHTTPResponse:
"""Mock HTTP response for paginated API calls."""
def __init__(self, data, link_header=None):
self._content = json.dumps(data).encode("utf-8")
self._link_header = link_header
self._read = False
self.reason = "OK"
def getcode(self):
return 200
def read(self):
if self._read:
return b""
self._read = True
return self._content
def get_header(self, name, default=None):
"""Mock method for headers.get()."""
return self.headers.get(name, default)
@property
def headers(self):
headers = {"x-ratelimit-remaining": "5000"}
if self._link_header:
headers["Link"] = self._link_header
return headers
@pytest.fixture
def mock_args():
"""Mock args for retrieve_data_gen."""
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
return args
def test_cursor_based_pagination(mock_args):
"""Link header with 'after' cursor parameter works correctly."""
# Simulate issues endpoint behavior: returns cursor in Link header
responses = [
# Issues endpoint returns 'after' cursor parameter (not 'page')
MockHTTPResponse(
data=[{"issue": i} for i in range(1, 101)], # Page 1 contents
link_header='<https://api.github.com/repos/owner/repo/issues?per_page=100&after=ABC123&page=2>; rel="next"',
),
MockHTTPResponse(
data=[{"issue": i} for i in range(101, 151)], # Page 2 contents
link_header=None, # No Link header - signals end of pagination
),
]
requests_made = []
def mock_urlopen(request, *args, **kwargs):
url = request.get_full_url()
requests_made.append(url)
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
results = list(
github_backup.retrieve_data_gen(
mock_args, "https://api.github.com/repos/owner/repo/issues"
)
)
# Verify all items retrieved and cursor was used in second request
assert len(results) == 150
assert len(requests_made) == 2
assert "after=ABC123" in requests_made[1]
def test_page_based_pagination(mock_args):
"""Link header with 'page' parameter works correctly."""
# Simulate pulls/repos endpoint behavior: returns page numbers in Link header
responses = [
# Pulls endpoint uses traditional 'page' parameter (not cursor)
MockHTTPResponse(
data=[{"pull": i} for i in range(1, 101)], # Page 1 contents
link_header='<https://api.github.com/repos/owner/repo/pulls?per_page=100&page=2>; rel="next"',
),
MockHTTPResponse(
data=[{"pull": i} for i in range(101, 181)], # Page 2 contents
link_header=None, # No Link header - signals end of pagination
),
]
requests_made = []
def mock_urlopen(request, *args, **kwargs):
url = request.get_full_url()
requests_made.append(url)
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
results = list(
github_backup.retrieve_data_gen(
mock_args, "https://api.github.com/repos/owner/repo/pulls"
)
)
# Verify all items retrieved and page parameter was used (not cursor)
assert len(results) == 180
assert len(requests_made) == 2
assert "page=2" in requests_made[1]
assert "after" not in requests_made[1]
def test_no_link_header_stops_pagination(mock_args):
"""Pagination stops when Link header is absent."""
# Simulate endpoint with results that fit in a single page
responses = [
MockHTTPResponse(
data=[{"label": i} for i in range(1, 51)], # Page contents
link_header=None, # No Link header - signals end of pagination
)
]
requests_made = []
def mock_urlopen(request, *args, **kwargs):
requests_made.append(request.get_full_url())
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
results = list(
github_backup.retrieve_data_gen(
mock_args, "https://api.github.com/repos/owner/repo/labels"
)
)
# Verify pagination stopped after first request
assert len(results) == 50
assert len(requests_made) == 1