Compare commits

...

14 Commits

Author SHA1 Message Date
GitHub Action
ff2681e196 Release version 0.53.0 2025-11-30 04:30:48 +00:00
Jose Diaz-Gonzalez
745b05a63f Merge pull request #456 from Iamrodos/fix-case
fix: case-sensitive username filtering causing silent backup failures
2025-11-29 23:30:07 -05:00
Jose Diaz-Gonzalez
83ff0ae1dd Merge pull request #455 from Iamrodos/fix-133
Avoid rewriting unchanged JSON files for labels, milestones, releases…
2025-11-29 23:29:30 -05:00
Rodos
6ad1959d43 fix: case-sensitive username filtering causing silent backup failures
GitHub's API accepts usernames in any case but returns canonical case.
The case-sensitive comparison in filter_repositories() filtered out all
repositories when user-provided case didn't match GitHub's canonical case.

Changed to case-insensitive comparison.

Fixes #198
2025-11-29 21:16:22 +11:00
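The fix described in this commit can be sketched as follows. This is a simplified, hypothetical shape of `filter_repositories()` (the real function takes an argparse namespace and applies further name filters); it shows only the case-insensitive owner comparison the commit introduces:

```python
def filter_repositories(user, unfiltered_repositories):
    """Keep repos owned by `user`, comparing logins case-insensitively.

    GitHub treats usernames as case-insensitive but returns the canonical
    casing, so a user-supplied "IAMRODOS" must still match an owner login
    of "Iamrodos".
    """
    repositories = []
    for r in unfiltered_repositories:
        # gists can be anonymous, so the owner key may be missing
        owner_login = r.get("owner", {}).get("login", "")
        if owner_login.lower() == user.lower() or r.get("is_starred"):
            repositories.append(r)
    return repositories


repos = [
    {"owner": {"login": "Iamrodos"}, "name": "a"},      # canonical case from API
    {"owner": {"login": "someoneelse"}, "name": "b"},   # different owner, dropped
    {"owner": {"login": "other"}, "name": "c", "is_starred": True},
]
print([r["name"] for r in filter_repositories("IAMRODOS", repos)])  # → ['a', 'c']
```

With the old case-sensitive comparison, the first repo would have been silently dropped whenever the supplied username's case differed from GitHub's canonical case, leaving nothing to back up.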
Rodos
5739ac0745 Avoid rewriting unchanged JSON files for labels, milestones, releases, hooks, followers, and following
This change reduces unnecessary writes when backing up metadata that changes
infrequently. The implementation compares existing file content before writing
and skips the write if the content is identical, preserving file timestamps.

Key changes:
- Added json_dump_if_changed() helper that compares content before writing
- Uses atomic writes (temp file + rename) for all metadata files
- NOT applied to issues/pulls (they use incremental_by_files logic)
- Made log messages consistent and past tense ("Saved" instead of "Saving")
- Added informative logging showing skip counts

Fixes #133
2025-11-29 17:21:14 +11:00
GitHub Action
8b7512c8d8 Release version 0.52.0 2025-11-28 23:39:09 +00:00
Jose Diaz-Gonzalez
995b7ede6c Merge pull request #454 from Iamrodos/http-451
Skip DMCA'd repos which return a 451 response
2025-11-28 18:38:32 -05:00
Rodos
7840528fe2 Skip DMCA'd repos which return a 451 response
Log a warning and the link to the DMCA notice. Continue backing up
other repositories instead of crashing.

Closes #163
2025-11-29 09:52:02 +11:00
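A condensed sketch of the approach in this commit (in the actual change, the status check lives inside `retrieve_data_gen()` and the caller catches the exception per repository):

```python
import json


class RepositoryUnavailableError(Exception):
    """Raised when a repository is blocked for legal reasons (e.g., DMCA takedown)."""

    def __init__(self, message, dmca_url=None):
        super().__init__(message)
        self.dmca_url = dmca_url


def raise_if_dmca_blocked(status_code, body):
    """Raise RepositoryUnavailableError for HTTP 451, extracting the notice URL."""
    if status_code != 451:
        return
    dmca_url = None
    try:
        # GitHub's 451 body carries a "block" object linking to the DMCA notice
        dmca_url = json.loads(body).get("block", {}).get("html_url")
    except Exception:
        pass  # malformed body: still raise, just without a notice link
    raise RepositoryUnavailableError(
        "Repository unavailable due to legal reasons (HTTP 451)", dmca_url=dmca_url
    )
```

The backup loop then catches `RepositoryUnavailableError`, logs a warning (plus the DMCA notice URL when present), and `continue`s to the next repository instead of crashing the whole run.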
Jose Diaz-Gonzalez
6fb0d86977 Merge pull request #453 from josegonzalez/dependabot/pip/python-packages-42260fba7a
chore(deps): bump restructuredtext-lint from 1.4.0 to 2.0.2 in the python-packages group
2025-11-24 15:07:08 -05:00
dependabot[bot]
9f6b401171 chore(deps): bump restructuredtext-lint in the python-packages group
Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint).


Updates `restructuredtext-lint` from 1.4.0 to 2.0.2
- [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2)

---
updated-dependencies:
- dependency-name: restructuredtext-lint
  dependency-version: 2.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-24 14:58:52 +00:00
Jose Diaz-Gonzalez
bf638f7aea Merge pull request #452 from josegonzalez/dependabot/github_actions/actions/checkout-6
chore(deps): bump actions/checkout from 5 to 6
2025-11-24 04:42:52 -05:00
dependabot[bot]
c3855a94f1 chore(deps): bump actions/checkout from 5 to 6
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-24 04:09:25 +00:00
Jose Diaz-Gonzalez
c3f4bfde0d Merge pull request #451 from josegonzalez/dependabot/pip/python-packages-63544ef561
chore(deps): bump the python-packages group with 3 updates
2025-11-18 11:44:02 -05:00
dependabot[bot]
d3edef0622 chore(deps): bump the python-packages group with 3 updates
Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring).


Updates `click` from 8.3.0 to 8.3.1
- [Release notes](https://github.com/pallets/click/releases)
- [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1)

Updates `pytest` from 8.3.3 to 9.0.1
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1)

Updates `keyring` from 25.6.0 to 25.7.0
- [Release notes](https://github.com/jaraco/keyring/releases)
- [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst)
- [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0)

---
updated-dependencies:
- dependency-name: click
  dependency-version: 8.3.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
- dependency-name: pytest
  dependency-version: 9.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python-packages
- dependency-name: keyring
  dependency-version: 25.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-18 13:24:06 +00:00
10 changed files with 613 additions and 51 deletions


@@ -18,7 +18,7 @@ jobs:
     runs-on: ubuntu-24.04
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
           fetch-depth: 0
           ssh-key: ${{ secrets.DEPLOY_PRIVATE_KEY }}


@@ -38,7 +38,7 @@ jobs:
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
        with:
           persist-credentials: false


@@ -21,7 +21,7 @@ jobs:
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
           fetch-depth: 0
       - name: Setup Python


@@ -21,7 +21,7 @@ jobs:
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
           fetch-depth: 0
       - name: Setup Python


@@ -1,9 +1,125 @@
 Changelog
 =========
 
-0.51.3 (2025-11-18)
+0.53.0 (2025-11-30)
 -------------------
+
+Fix
+~~~
+- Case-sensitive username filtering causing silent backup failures.
+  [Rodos]
+
+  GitHub's API accepts usernames in any case but returns canonical case.
+  The case-sensitive comparison in filter_repositories() filtered out all
+  repositories when user-provided case didn't match GitHub's canonical case.
+
+  Changed to case-insensitive comparison.
+
+  Fixes #198
+
+Other
+~~~~~
+- Avoid rewriting unchanged JSON files for labels, milestones, releases,
+  hooks, followers, and following. [Rodos]
+
+  This change reduces unnecessary writes when backing up metadata that changes
+  infrequently. The implementation compares existing file content before writing
+  and skips the write if the content is identical, preserving file timestamps.
+
+  Key changes:
+  - Added json_dump_if_changed() helper that compares content before writing
+  - Uses atomic writes (temp file + rename) for all metadata files
+  - NOT applied to issues/pulls (they use incremental_by_files logic)
+  - Made log messages consistent and past tense ("Saved" instead of "Saving")
+  - Added informative logging showing skip counts
+
+  Fixes #133
+
+0.52.0 (2025-11-28)
+-------------------
+- Skip DMCA'd repos which return a 451 response. [Rodos]
+
+  Log a warning and the link to the DMCA notice. Continue backing up
+  other repositories instead of crashing.
+
+  Closes #163
+- Chore(deps): bump restructuredtext-lint in the python-packages group.
+  [dependabot[bot]]
+
+  Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint).
+
+  Updates `restructuredtext-lint` from 1.4.0 to 2.0.2
+  - [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst)
+  - [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2)
+
+  ---
+  updated-dependencies:
+  - dependency-name: restructuredtext-lint
+    dependency-version: 2.0.2
+    dependency-type: direct:production
+    update-type: version-update:semver-major
+    dependency-group: python-packages
+  ...
+- Chore(deps): bump actions/checkout from 5 to 6. [dependabot[bot]]
+
+  Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
+  - [Release notes](https://github.com/actions/checkout/releases)
+  - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
+  - [Commits](https://github.com/actions/checkout/compare/v5...v6)
+
+  ---
+  updated-dependencies:
+  - dependency-name: actions/checkout
+    dependency-version: '6'
+    dependency-type: direct:production
+    update-type: version-update:semver-major
+  ...
+- Chore(deps): bump the python-packages group with 3 updates.
+  [dependabot[bot]]
+
+  Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring).
+
+  Updates `click` from 8.3.0 to 8.3.1
+  - [Release notes](https://github.com/pallets/click/releases)
+  - [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
+  - [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1)
+
+  Updates `pytest` from 8.3.3 to 9.0.1
+  - [Release notes](https://github.com/pytest-dev/pytest/releases)
+  - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
+  - [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1)
+
+  Updates `keyring` from 25.6.0 to 25.7.0
+  - [Release notes](https://github.com/jaraco/keyring/releases)
+  - [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst)
+  - [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0)
+
+  ---
+  updated-dependencies:
+  - dependency-name: click
+    dependency-version: 8.3.1
+    dependency-type: direct:production
+    update-type: version-update:semver-patch
+    dependency-group: python-packages
+  - dependency-name: pytest
+    dependency-version: 9.0.1
+    dependency-type: direct:production
+    update-type: version-update:semver-major
+    dependency-group: python-packages
+  - dependency-name: keyring
+    dependency-version: 25.7.0
+    dependency-type: direct:production
+    update-type: version-update:semver-minor
+    dependency-group: python-packages
+  ...
+
+0.51.3 (2025-11-18)
+-------------------
 - Test: Add pagination tests for cursor and page-based Link headers.
   [Rodos]
 - Use cursor based pagination. [Helio Machado]


@@ -1 +1 @@
-__version__ = "0.51.3"
+__version__ = "0.53.0"


@@ -37,6 +37,15 @@ FNULL = open(os.devnull, "w")
 FILE_URI_PREFIX = "file://"
 
 logger = logging.getLogger(__name__)
 
+
+class RepositoryUnavailableError(Exception):
+    """Raised when a repository is unavailable due to legal reasons (e.g., DMCA takedown)."""
+
+    def __init__(self, message, dmca_url=None):
+        super().__init__(message)
+        self.dmca_url = dmca_url
+
+
 # Setup SSL context with fallback chain
 https_ctx = ssl.create_default_context()
 if https_ctx.get_ca_certs():
@@ -612,6 +621,19 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
         status_code = int(r.getcode())
 
+        # Handle DMCA takedown (HTTP 451) - raise exception to skip entire repository
+        if status_code == 451:
+            dmca_url = None
+            try:
+                response_data = json.loads(r.read().decode("utf-8"))
+                dmca_url = response_data.get("block", {}).get("html_url")
+            except Exception:
+                pass
+            raise RepositoryUnavailableError(
+                "Repository unavailable due to legal reasons (HTTP 451)",
+                dmca_url=dmca_url
+            )
+
         # Check if we got correct data
         try:
             response = json.loads(r.read().decode("utf-8"))
@@ -1565,7 +1587,9 @@ def filter_repositories(args, unfiltered_repositories):
     repositories = []
     for r in unfiltered_repositories:
         # gists can be anonymous, so need to safely check owner
-        if r.get("owner", {}).get("login") == args.user or r.get("is_starred"):
+        # Use case-insensitive comparison to match GitHub's case-insensitive username behavior
+        owner_login = r.get("owner", {}).get("login", "")
+        if owner_login.lower() == args.user.lower() or r.get("is_starred"):
             repositories.append(r)
 
     name_regex = None
@@ -1668,40 +1692,47 @@ def backup_repositories(args, output_directory, repositories):
             continue  # don't try to back anything else for a gist; it doesn't exist
 
-        download_wiki = args.include_wiki or args.include_everything
-        if repository["has_wiki"] and download_wiki:
-            fetch_repository(
-                repository["name"],
-                repo_url.replace(".git", ".wiki.git"),
-                os.path.join(repo_cwd, "wiki"),
-                skip_existing=args.skip_existing,
-                bare_clone=args.bare_clone,
-                lfs_clone=args.lfs_clone,
-                no_prune=args.no_prune,
-            )
-        if args.include_issues or args.include_everything:
-            backup_issues(args, repo_cwd, repository, repos_template)
-        if args.include_pulls or args.include_everything:
-            backup_pulls(args, repo_cwd, repository, repos_template)
-        if args.include_milestones or args.include_everything:
-            backup_milestones(args, repo_cwd, repository, repos_template)
-        if args.include_labels or args.include_everything:
-            backup_labels(args, repo_cwd, repository, repos_template)
-        if args.include_hooks or args.include_everything:
-            backup_hooks(args, repo_cwd, repository, repos_template)
-        if args.include_releases or args.include_everything:
-            backup_releases(
-                args,
-                repo_cwd,
-                repository,
-                repos_template,
-                include_assets=args.include_assets or args.include_everything,
-            )
+        try:
+            download_wiki = args.include_wiki or args.include_everything
+            if repository["has_wiki"] and download_wiki:
+                fetch_repository(
+                    repository["name"],
+                    repo_url.replace(".git", ".wiki.git"),
+                    os.path.join(repo_cwd, "wiki"),
+                    skip_existing=args.skip_existing,
+                    bare_clone=args.bare_clone,
+                    lfs_clone=args.lfs_clone,
+                    no_prune=args.no_prune,
+                )
+            if args.include_issues or args.include_everything:
+                backup_issues(args, repo_cwd, repository, repos_template)
+            if args.include_pulls or args.include_everything:
+                backup_pulls(args, repo_cwd, repository, repos_template)
+            if args.include_milestones or args.include_everything:
+                backup_milestones(args, repo_cwd, repository, repos_template)
+            if args.include_labels or args.include_everything:
+                backup_labels(args, repo_cwd, repository, repos_template)
+            if args.include_hooks or args.include_everything:
+                backup_hooks(args, repo_cwd, repository, repos_template)
+            if args.include_releases or args.include_everything:
+                backup_releases(
+                    args,
+                    repo_cwd,
+                    repository,
+                    repos_template,
+                    include_assets=args.include_assets or args.include_everything,
+                )
+        except RepositoryUnavailableError as e:
+            logger.warning(f"Repository {repository['full_name']} is unavailable (HTTP 451)")
+            if e.dmca_url:
+                logger.warning(f"DMCA notice: {e.dmca_url}")
+            logger.info(f"Skipping remaining resources for {repository['full_name']}")
+            continue
 
         if args.incremental:
             if last_update == "0000-00-00T00:00:00Z":
@@ -1869,11 +1900,21 @@ def backup_milestones(args, repo_cwd, repository, repos_template):
     for milestone in _milestones:
         milestones[milestone["number"]] = milestone
 
-    logger.info("Saving {0} milestones to disk".format(len(list(milestones.keys()))))
+    written_count = 0
     for number, milestone in list(milestones.items()):
         milestone_file = "{0}/{1}.json".format(milestone_cwd, number)
-        with codecs.open(milestone_file, "w", encoding="utf-8") as f:
-            json_dump(milestone, f)
+        if json_dump_if_changed(milestone, milestone_file):
+            written_count += 1
+
+    total = len(milestones)
+    if written_count == total:
+        logger.info("Saved {0} milestones to disk".format(total))
+    elif written_count == 0:
+        logger.info("{0} milestones unchanged, skipped write".format(total))
+    else:
+        logger.info("Saved {0} of {1} milestones to disk ({2} unchanged)".format(
+            written_count, total, total - written_count
+        ))
 
 
 def backup_labels(args, repo_cwd, repository, repos_template):
@@ -1926,19 +1967,17 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
             reverse=True,
         )
         releases = releases[: args.number_of_latest_releases]
-        logger.info("Saving the latest {0} releases to disk".format(len(releases)))
-    else:
-        logger.info("Saving {0} releases to disk".format(len(releases)))
 
     # for each release, store it
+    written_count = 0
     for release in releases:
         release_name = release["tag_name"]
         release_name_safe = release_name.replace("/", "__")
         output_filepath = os.path.join(
             release_cwd, "{0}.json".format(release_name_safe)
         )
-        with codecs.open(output_filepath, "w+", encoding="utf-8") as f:
-            json_dump(release, f)
+        if json_dump_if_changed(release, output_filepath):
+            written_count += 1
 
         if include_assets:
             assets = retrieve_data(args, release["assets_url"])
@@ -1955,6 +1994,17 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
                 fine=True if args.token_fine is not None else False,
             )
 
+    # Log the results
+    total = len(releases)
+    if written_count == total:
+        logger.info("Saved {0} releases to disk".format(total))
+    elif written_count == 0:
+        logger.info("{0} releases unchanged, skipped write".format(total))
+    else:
+        logger.info("Saved {0} of {1} releases to disk ({2} unchanged)".format(
+            written_count, total, total - written_count
+        ))
+
 
 def fetch_repository(
     name,
@@ -2079,9 +2129,10 @@ def _backup_data(args, name, template, output_file, output_directory):
         mkdir_p(output_directory)
     data = retrieve_data(args, template)
 
-    logger.info("Writing {0} {1} to disk".format(len(data), name))
-    with codecs.open(output_file, "w", encoding="utf-8") as f:
-        json_dump(data, f)
+    if json_dump_if_changed(data, output_file):
+        logger.info("Saved {0} {1} to disk".format(len(data), name))
+    else:
+        logger.info("{0} {1} unchanged, skipped write".format(len(data), name))
 
 
 def json_dump(data, output_file):
@@ -2093,3 +2144,57 @@ def json_dump(data, output_file):
         indent=4,
         separators=(",", ": "),
     )
+
+
+def json_dump_if_changed(data, output_file_path):
+    """
+    Write JSON data to file only if content has changed.
+
+    Compares the serialized JSON data with the existing file content
+    and only writes if different. This prevents unnecessary file
+    modification timestamp updates and disk writes.
+
+    Uses atomic writes (temp file + rename) to prevent corruption
+    if the process is interrupted during the write.
+
+    Args:
+        data: The data to serialize as JSON
+        output_file_path: The path to the output file
+
+    Returns:
+        True if file was written (content changed or new file)
+        False if write was skipped (content unchanged)
+    """
+    # Serialize new data with consistent formatting matching json_dump()
+    new_content = json.dumps(
+        data,
+        ensure_ascii=False,
+        sort_keys=True,
+        indent=4,
+        separators=(",", ": "),
+    )
+
+    # Check if file exists and compare content
+    if os.path.exists(output_file_path):
+        try:
+            with codecs.open(output_file_path, "r", encoding="utf-8") as f:
+                existing_content = f.read()
+            if existing_content == new_content:
+                logger.debug(
+                    "Content unchanged, skipping write: {0}".format(output_file_path)
+                )
+                return False
+        except (OSError, UnicodeDecodeError) as e:
+            # If we can't read the existing file, write the new one
+            logger.debug(
+                "Error reading existing file {0}, will overwrite: {1}".format(
+                    output_file_path, e
+                )
+            )
+
+    # Write the file atomically using temp file + rename
+    temp_file = output_file_path + ".temp"
+    with codecs.open(temp_file, "w", encoding="utf-8") as f:
+        f.write(new_content)
+    os.rename(temp_file, output_file_path)  # Atomic on POSIX systems
+    return True


@@ -3,16 +3,16 @@ black==25.11.0
 bleach==6.3.0
 certifi==2025.11.12
 charset-normalizer==3.4.4
-click==8.3.0
+click==8.3.1
 colorama==0.4.6
 docutils==0.22.3
 flake8==7.3.0
 gitchangelog==3.0.4
-pytest==8.3.3
+pytest==9.0.1
 idna==3.11
 importlib-metadata==8.7.0
 jaraco.classes==3.4.0
-keyring==25.6.0
+keyring==25.7.0
 markdown-it-py==4.0.0
 mccabe==0.7.0
 mdurl==0.1.2
@@ -28,7 +28,7 @@ Pygments==2.19.2
 readme-renderer==44.0
 requests==2.32.5
 requests-toolbelt==1.0.0
-restructuredtext-lint==1.4.0
+restructuredtext-lint==2.0.2
 rfc3986==2.0.0
 rich==14.2.0
 setuptools==80.9.0

tests/test_http_451.py (new file, 143 lines)

@@ -0,0 +1,143 @@
+"""Tests for HTTP 451 (DMCA takedown) handling."""
+
+import json
+from unittest.mock import Mock, patch
+
+import pytest
+
+from github_backup import github_backup
+
+
+class TestHTTP451Exception:
+    """Test suite for HTTP 451 DMCA takedown exception handling."""
+
+    def test_repository_unavailable_error_raised(self):
+        """HTTP 451 should raise RepositoryUnavailableError with DMCA URL."""
+        # Create mock args
+        args = Mock()
+        args.as_app = False
+        args.token_fine = None
+        args.token_classic = None
+        args.username = None
+        args.password = None
+        args.osx_keychain_item_name = None
+        args.osx_keychain_item_account = None
+        args.throttle_limit = None
+        args.throttle_pause = 0
+
+        # Mock HTTPError 451 response
+        mock_response = Mock()
+        mock_response.getcode.return_value = 451
+        dmca_data = {
+            "message": "Repository access blocked",
+            "block": {
+                "reason": "dmca",
+                "created_at": "2024-11-12T14:38:04Z",
+                "html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
+            }
+        }
+        mock_response.read.return_value = json.dumps(dmca_data).encode("utf-8")
+        mock_response.headers = {"x-ratelimit-remaining": "5000"}
+        mock_response.reason = "Unavailable For Legal Reasons"
+
+        def mock_get_response(request, auth, template):
+            return mock_response, []
+
+        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
+            with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
+                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
+
+        # Check exception has DMCA URL
+        assert exc_info.value.dmca_url == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
+        assert "451" in str(exc_info.value)
+
+    def test_repository_unavailable_error_without_dmca_url(self):
+        """HTTP 451 without DMCA details should still raise exception."""
+        args = Mock()
+        args.as_app = False
+        args.token_fine = None
+        args.token_classic = None
+        args.username = None
+        args.password = None
+        args.osx_keychain_item_name = None
+        args.osx_keychain_item_account = None
+        args.throttle_limit = None
+        args.throttle_pause = 0
+
+        mock_response = Mock()
+        mock_response.getcode.return_value = 451
+        mock_response.read.return_value = b'{"message": "Blocked"}'
+        mock_response.headers = {"x-ratelimit-remaining": "5000"}
+        mock_response.reason = "Unavailable For Legal Reasons"
+
+        def mock_get_response(request, auth, template):
+            return mock_response, []
+
+        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
+            with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
+                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
+
+        # Exception raised even without DMCA URL
+        assert exc_info.value.dmca_url is None
+        assert "451" in str(exc_info.value)
+
+    def test_repository_unavailable_error_with_malformed_json(self):
+        """HTTP 451 with malformed JSON should still raise exception."""
+        args = Mock()
+        args.as_app = False
+        args.token_fine = None
+        args.token_classic = None
+        args.username = None
+        args.password = None
+        args.osx_keychain_item_name = None
+        args.osx_keychain_item_account = None
+        args.throttle_limit = None
+        args.throttle_pause = 0
+
+        mock_response = Mock()
+        mock_response.getcode.return_value = 451
+        mock_response.read.return_value = b"invalid json {"
+        mock_response.headers = {"x-ratelimit-remaining": "5000"}
+        mock_response.reason = "Unavailable For Legal Reasons"
+
+        def mock_get_response(request, auth, template):
+            return mock_response, []
+
+        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
+            with pytest.raises(github_backup.RepositoryUnavailableError):
+                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
+
+    def test_other_http_errors_unchanged(self):
+        """Other HTTP errors should still raise generic Exception."""
+        args = Mock()
+        args.as_app = False
+        args.token_fine = None
+        args.token_classic = None
+        args.username = None
+        args.password = None
+        args.osx_keychain_item_name = None
+        args.osx_keychain_item_account = None
+        args.throttle_limit = None
+        args.throttle_pause = 0
+
+        mock_response = Mock()
+        mock_response.getcode.return_value = 404
+        mock_response.read.return_value = b'{"message": "Not Found"}'
+        mock_response.headers = {"x-ratelimit-remaining": "5000"}
+        mock_response.reason = "Not Found"
+
+        def mock_get_response(request, auth, template):
+            return mock_response, []
+
+        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
+            # Should raise generic Exception, not RepositoryUnavailableError
+            with pytest.raises(Exception) as exc_info:
+                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/notfound/issues"))
+
+        assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError)
+        assert "404" in str(exc_info.value)
+
+
+if __name__ == "__main__":
+    pytest.main([__file__, "-v"])


@@ -0,0 +1,198 @@
+"""Tests for json_dump_if_changed functionality."""
+
+import codecs
+import json
+import os
+import tempfile
+
+import pytest
+
+from github_backup import github_backup
+
+
+class TestJsonDumpIfChanged:
+    """Test suite for json_dump_if_changed function."""
+
+    def test_writes_new_file(self):
+        """Should write file when it doesn't exist."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            output_file = os.path.join(tmpdir, "test.json")
+            test_data = {"key": "value", "number": 42}
+
+            result = github_backup.json_dump_if_changed(test_data, output_file)
+
+            assert result is True
+            assert os.path.exists(output_file)
+
+            # Verify content matches expected format
+            with codecs.open(output_file, "r", encoding="utf-8") as f:
+                content = f.read()
+            loaded = json.loads(content)
+            assert loaded == test_data
+
+    def test_skips_unchanged_file(self):
+        """Should skip write when content is identical."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            output_file = os.path.join(tmpdir, "test.json")
+            test_data = {"key": "value", "number": 42}
+
+            # First write
+            result1 = github_backup.json_dump_if_changed(test_data, output_file)
+            assert result1 is True
+
+            # Get the initial mtime
+            mtime1 = os.path.getmtime(output_file)
+
+            # Second write with same data
+            result2 = github_backup.json_dump_if_changed(test_data, output_file)
+            assert result2 is False
+
+            # File should not have been modified
+            mtime2 = os.path.getmtime(output_file)
+            assert mtime1 == mtime2
+
+    def test_writes_when_content_changed(self):
+        """Should write file when content has changed."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            output_file = os.path.join(tmpdir, "test.json")
+            test_data1 = {"key": "value1"}
+            test_data2 = {"key": "value2"}
+
+            # First write
+            result1 = github_backup.json_dump_if_changed(test_data1, output_file)
+            assert result1 is True
+
+            # Second write with different data
+            result2 = github_backup.json_dump_if_changed(test_data2, output_file)
+            assert result2 is True
+
+            # Verify new content
+            with codecs.open(output_file, "r", encoding="utf-8") as f:
+                loaded = json.load(f)
+            assert loaded == test_data2
+
+    def test_uses_consistent_formatting(self):
+        """Should use same JSON formatting as json_dump."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            output_file = os.path.join(tmpdir, "test.json")
+            test_data = {"z": "last", "a": "first", "m": "middle"}
+
+            github_backup.json_dump_if_changed(test_data, output_file)
+
+            with codecs.open(output_file, "r", encoding="utf-8") as f:
+                content = f.read()
+
+            # Check for consistent formatting:
+            # - sorted keys
+            # - 4-space indent
+            # - comma-colon-space separator
+            expected = json.dumps(
+                test_data,
+                ensure_ascii=False,
+                sort_keys=True,
+                indent=4,
+                separators=(",", ": "),
+            )
+            assert content == expected
+
+    def test_atomic_write_always_used(self):
+        """Should always use temp file and rename for atomic writes."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            output_file = os.path.join(tmpdir, "test.json")
+            test_data = {"key": "value"}
+
+            result = github_backup.json_dump_if_changed(test_data, output_file)
+            assert result is True
+            assert os.path.exists(output_file)
+
+            # Temp file should not exist after atomic write
+            temp_file = output_file + ".temp"
+            assert not os.path.exists(temp_file)
+
+            # Verify content
+            with codecs.open(output_file, "r", encoding="utf-8") as f:
+                loaded = json.load(f)
+            assert loaded == test_data
+
+    def test_handles_unicode_content(self):
+        """Should correctly handle Unicode content."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            output_file = os.path.join(tmpdir, "test.json")
+            test_data = {
+                "emoji": "🚀",
+                "chinese": "你好",
+                "arabic": "مرحبا",
+                "cyrillic": "Привет",
+            }
+
+            result = github_backup.json_dump_if_changed(test_data, output_file)
+            assert result is True
+
+            # Verify Unicode is preserved
+            with codecs.open(output_file, "r", encoding="utf-8") as f:
+                loaded = json.load(f)
+            assert loaded == test_data
+
+            # Second write should skip
+            result2 = github_backup.json_dump_if_changed(test_data, output_file)
+            assert result2 is False
+
+    def test_handles_complex_nested_data(self):
+        """Should handle complex nested data structures."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            output_file = os.path.join(tmpdir, "test.json")
+            test_data = {
+                "users": [
+                    {"id": 1, "name": "Alice", "tags": ["admin", "user"]},
+                    {"id": 2, "name": "Bob", "tags": ["user"]},
+                ],
+                "metadata": {"version": "1.0", "nested": {"deep": {"value": 42}}},
+            }
+
+            result = github_backup.json_dump_if_changed(test_data, output_file)
+            assert result is True
+
+            # Verify structure is preserved
+            with codecs.open(output_file, "r", encoding="utf-8") as f:
+                loaded = json.load(f)
+            assert loaded == test_data
+
+    def test_overwrites_on_unicode_decode_error(self):
+        """Should overwrite if existing file has invalid UTF-8."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            output_file = os.path.join(tmpdir, "test.json")
+            test_data = {"key": "value"}
+
+            # Write invalid UTF-8 bytes
+            with open(output_file, "wb") as f:
+                f.write(b"\xff\xfe invalid utf-8")
+
+            # Should catch UnicodeDecodeError and overwrite
+            result = github_backup.json_dump_if_changed(test_data, output_file)
+            assert result is True
+
+            # Verify new content was written
+            with codecs.open(output_file, "r", encoding="utf-8") as f:
+                loaded = json.load(f)
+            assert loaded == test_data
+
+    def test_key_order_independence(self):
+        """Should treat differently-ordered dicts as same if keys/values match."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            output_file = os.path.join(tmpdir, "test.json")
+
+            # Write first dict
+            data1 = {"z": 1, "a": 2, "m": 3}
+            github_backup.json_dump_if_changed(data1, output_file)
+
+            # Try to write same data but different order
+            data2 = {"a": 2, "m": 3, "z": 1}
+            result = github_backup.json_dump_if_changed(data2, output_file)
+
+            # Should skip because content is the same (keys are sorted)
+            assert result is False
+
+
+if __name__ == "__main__":
+    pytest.main([__file__, "-v"])