Mirror of https://github.com/josegonzalez/python-github-backup.git (synced 2025-12-05 16:18:02 +01:00)
Compare commits
22 Commits
| SHA1 |
|---|
| ff2681e196 |
| 745b05a63f |
| 83ff0ae1dd |
| 6ad1959d43 |
| 5739ac0745 |
| 8b7512c8d8 |
| 995b7ede6c |
| 7840528fe2 |
| 6fb0d86977 |
| 9f6b401171 |
| bf638f7aea |
| c3855a94f1 |
| c3f4bfde0d |
| d3edef0622 |
| 9ef496efad |
| 42bfe6f79d |
| 5af522a348 |
| 6dfba7a783 |
| 7551829677 |
| 72d35a9b94 |
| 3eae9d78ed |
| 90ba839c7d |
.github/workflows/automatic-release.yml (vendored, 2 changed lines)

```diff
@@ -18,7 +18,7 @@ jobs:
     runs-on: ubuntu-24.04
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
           fetch-depth: 0
           ssh-key: ${{ secrets.DEPLOY_PRIVATE_KEY }}
```
.github/workflows/docker.yml (vendored, 2 changed lines)

```diff
@@ -38,7 +38,7 @@ jobs:
 
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
          persist-credentials: false
 
```
.github/workflows/lint.yml (vendored, 2 changed lines)

```diff
@@ -21,7 +21,7 @@ jobs:
 
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Setup Python
```
.github/workflows/test.yml (vendored, 2 changed lines)

```diff
@@ -21,7 +21,7 @@ jobs:
 
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Setup Python
```
CHANGES.rst (153 changed lines)

```diff
@@ -1,10 +1,161 @@
 Changelog
 =========
 
-0.51.1 (2025-11-16)
+0.53.0 (2025-11-30)
 -------------------
 
+Fix
+~~~
+- Case-sensitive username filtering causing silent backup failures.
+  [Rodos]
+
+  GitHub's API accepts usernames in any case but returns canonical case.
+  The case-sensitive comparison in filter_repositories() filtered out all
+  repositories when user-provided case didn't match GitHub's canonical case.
+
+  Changed to case-insensitive comparison.
+
+  Fixes #198
+
+Other
+~~~~~
+- Avoid rewriting unchanged JSON files for labels, milestones, releases,
+  hooks, followers, and following. [Rodos]
+
+  This change reduces unnecessary writes when backing up metadata that changes
+  infrequently. The implementation compares existing file content before writing
+  and skips the write if the content is identical, preserving file timestamps.
+
+  Key changes:
+  - Added json_dump_if_changed() helper that compares content before writing
+  - Uses atomic writes (temp file + rename) for all metadata files
+  - NOT applied to issues/pulls (they use incremental_by_files logic)
+  - Made log messages consistent and past tense ("Saved" instead of "Saving")
+  - Added informative logging showing skip counts
+
+  Fixes #133
+
+
+0.52.0 (2025-11-28)
+-------------------
+- Skip DMCA'd repos which return a 451 response. [Rodos]
+
+  Log a warning and the link to the DMCA notice. Continue backing up
+  other repositories instead of crashing.
+
+  Closes #163
+- Chore(deps): bump restructuredtext-lint in the python-packages group.
+  [dependabot[bot]]
+
+  Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint).
+
+
+  Updates `restructuredtext-lint` from 1.4.0 to 2.0.2
+  - [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst)
+  - [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2)
+
+  ---
+  updated-dependencies:
+  - dependency-name: restructuredtext-lint
+    dependency-version: 2.0.2
+    dependency-type: direct:production
+    update-type: version-update:semver-major
+    dependency-group: python-packages
+  ...
+- Chore(deps): bump actions/checkout from 5 to 6. [dependabot[bot]]
+
+  Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
+  - [Release notes](https://github.com/actions/checkout/releases)
+  - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
+  - [Commits](https://github.com/actions/checkout/compare/v5...v6)
+
+  ---
+  updated-dependencies:
+  - dependency-name: actions/checkout
+    dependency-version: '6'
+    dependency-type: direct:production
+    update-type: version-update:semver-major
+  ...
+- Chore(deps): bump the python-packages group with 3 updates.
+  [dependabot[bot]]
+
+  Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring).
+
+
+  Updates `click` from 8.3.0 to 8.3.1
+  - [Release notes](https://github.com/pallets/click/releases)
+  - [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
+  - [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1)
+
+  Updates `pytest` from 8.3.3 to 9.0.1
+  - [Release notes](https://github.com/pytest-dev/pytest/releases)
+  - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
+  - [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1)
+
+  Updates `keyring` from 25.6.0 to 25.7.0
+  - [Release notes](https://github.com/jaraco/keyring/releases)
+  - [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst)
+  - [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0)
+
+  ---
+  updated-dependencies:
+  - dependency-name: click
+    dependency-version: 8.3.1
+    dependency-type: direct:production
+    update-type: version-update:semver-patch
+    dependency-group: python-packages
+  - dependency-name: pytest
+    dependency-version: 9.0.1
+    dependency-type: direct:production
+    update-type: version-update:semver-major
+    dependency-group: python-packages
+  - dependency-name: keyring
+    dependency-version: 25.7.0
+    dependency-type: direct:production
+    update-type: version-update:semver-minor
+    dependency-group: python-packages
+  ...
+
+
+0.51.3 (2025-11-18)
+-------------------
+- Test: Add pagination tests for cursor and page-based Link headers.
+  [Rodos]
+- Use cursor based pagination. [Helio Machado]
+
+
+0.51.2 (2025-11-16)
+-------------------
+
+Fix
+~~~
+- Improve CA certificate detection with fallback chain. [Rodos]
+
+  The previous implementation incorrectly assumed empty get_ca_certs()
+  meant broken SSL, causing false failures in GitHub Codespaces and other
+  directory-based cert systems where certificates exist but aren't pre-loaded.
+  It would then attempt to import certifi as a workaround, but certifi wasn't
+  listed in requirements.txt, causing the fallback to fail with ImportError
+  even though the system certificates would have worked fine.
+
+  This commit replaces the naive check with a layered fallback approach that
+  checks multiple certificate sources. First it checks for pre-loaded system
+  certs (file-based systems). Then it verifies system cert paths exist
+  (directory-based systems like Ubuntu/Debian/Codespaces). Finally it attempts
+  to use certifi as an optional fallback only if needed.
+
+  This approach eliminates hard dependencies (certifi is now optional), works
+  in GitHub Codespaces without any setup, and fails gracefully with clear hints
+  for resolution when SSL is actually broken rather than failing with
+  ModuleNotFoundError.
+
+  Fixes #444
+
+
+0.51.1 (2025-11-16)
+-------------------
 
 Fix
 ~~~
 - Prevent duplicate attachment downloads. [Rodos]
```
```diff
@@ -1 +1 @@
-__version__ = "0.51.1"
+__version__ = "0.53.0"
```
```diff
@@ -37,22 +37,42 @@ FNULL = open(os.devnull, "w")
 FILE_URI_PREFIX = "file://"
 logger = logging.getLogger(__name__)
 
 
+class RepositoryUnavailableError(Exception):
+    """Raised when a repository is unavailable due to legal reasons (e.g., DMCA takedown)."""
+
+    def __init__(self, message, dmca_url=None):
+        super().__init__(message)
+        self.dmca_url = dmca_url
+
+
+# Setup SSL context with fallback chain
 https_ctx = ssl.create_default_context()
-if not https_ctx.get_ca_certs():
-    import warnings
-
-    warnings.warn(
-        "\n\nYOUR DEFAULT CA CERTS ARE EMPTY.\n"
-        + "PLEASE POPULATE ANY OF:"
-        + "".join(
-            ["\n - " + x for x in ssl.get_default_verify_paths() if type(x) is str]
-        )
-        + "\n",
-        stacklevel=2,
-    )
-    import certifi
-
-    https_ctx = ssl.create_default_context(cafile=certifi.where())
+if https_ctx.get_ca_certs():
+    # Layer 1: Certificates pre-loaded from system (file-based)
+    pass
+else:
+    paths = ssl.get_default_verify_paths()
+    if (paths.cafile and os.path.exists(paths.cafile)) or (
+        paths.capath and os.path.exists(paths.capath)
+    ):
+        # Layer 2: Cert paths exist, will be lazy-loaded on first use (directory-based)
+        pass
+    else:
+        # Layer 3: Try certifi package as optional fallback
+        try:
+            import certifi
+
+            https_ctx = ssl.create_default_context(cafile=certifi.where())
+        except ImportError:
+            # All layers failed - no certificates available anywhere
+            sys.exit(
+                "\nERROR: No CA certificates found. Cannot connect to GitHub over SSL.\n\n"
+                "Solutions you can explore:\n"
+                " 1. pip install certifi\n"
+                " 2. Alpine: apk add ca-certificates\n"
+                " 3. Debian/Ubuntu: apt-get install ca-certificates\n\n"
+            )
 
 
 def logging_subprocess(
```
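For context, a minimal sketch of how a module-level context like `https_ctx` is typically handed to `urllib`; the endpoint and the exact call site are illustrative assumptions, not part of this hunk.

```python
# Illustrative only: a context like https_ctx (built by the fallback chain
# above) is what gets passed to urlopen so requests verify GitHub's certs.
import ssl
from urllib.request import Request, urlopen

https_ctx = ssl.create_default_context()  # stand-in for the module-level context above
req = Request("https://api.github.com/rate_limit")  # hypothetical endpoint for this sketch
with urlopen(req, context=https_ctx) as resp:
    print(resp.status)  # 200 if the TLS handshake and request succeed
```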
```diff
@@ -581,27 +601,39 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
     auth = get_auth(args, encode=not args.as_app)
     query_args = get_query_args(query_args)
     per_page = 100
-    page = 0
+    next_url = None
 
     while True:
         if single_request:
-            request_page, request_per_page = None, None
+            request_per_page = None
         else:
-            page = page + 1
-            request_page, request_per_page = page, per_page
+            request_per_page = per_page
 
         request = _construct_request(
             request_per_page,
-            request_page,
             query_args,
-            template,
+            next_url or template,
             auth,
             as_app=args.as_app,
             fine=True if args.token_fine is not None else False,
         )  # noqa
-        r, errors = _get_response(request, auth, template)
+        r, errors = _get_response(request, auth, next_url or template)
 
         status_code = int(r.getcode())
 
+        # Handle DMCA takedown (HTTP 451) - raise exception to skip entire repository
+        if status_code == 451:
+            dmca_url = None
+            try:
+                response_data = json.loads(r.read().decode("utf-8"))
+                dmca_url = response_data.get("block", {}).get("html_url")
+            except Exception:
+                pass
+            raise RepositoryUnavailableError(
+                "Repository unavailable due to legal reasons (HTTP 451)",
+                dmca_url=dmca_url
+            )
+
         # Check if we got correct data
         try:
             response = json.loads(r.read().decode("utf-8"))
```
```diff
@@ -633,15 +665,14 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
             retries += 1
             time.sleep(5)
             request = _construct_request(
-                per_page,
-                page,
+                request_per_page,
                 query_args,
-                template,
+                next_url or template,
                 auth,
                 as_app=args.as_app,
                 fine=True if args.token_fine is not None else False,
             )  # noqa
-            r, errors = _get_response(request, auth, template)
+            r, errors = _get_response(request, auth, next_url or template)
 
             status_code = int(r.getcode())
             try:
```
```diff
@@ -671,7 +702,16 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
         if type(response) is list:
             for resp in response:
                 yield resp
-            if len(response) < per_page:
+            # Parse Link header for next page URL (cursor-based pagination)
+            link_header = r.headers.get("Link", "")
+            next_url = None
+            if link_header:
+                # Parse Link header: <https://api.github.com/...?per_page=100&after=cursor>; rel="next"
+                for link in link_header.split(","):
+                    if 'rel="next"' in link:
+                        next_url = link[link.find("<") + 1:link.find(">")]
+                        break
+            if not next_url:
                 break
         elif type(response) is dict and single_request:
             yield response
```
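To make the parsing above concrete, here is a standalone sketch with a made-up Link header shaped like GitHub's cursor-based responses; the URLs and cursor value are invented for illustration.

```python
# Example Link header (values invented) and the same extraction logic as above.
link_header = (
    '<https://api.github.com/repositories/1/issues?per_page=100&after=Y3Vyc29y>; rel="next", '
    '<https://api.github.com/repositories/1/issues?per_page=100>; rel="prev"'
)

next_url = None
for link in link_header.split(","):
    if 'rel="next"' in link:
        # Take everything between the angle brackets
        next_url = link[link.find("<") + 1:link.find(">")]
        break

print(next_url)
# https://api.github.com/repositories/1/issues?per_page=100&after=Y3Vyc29y
```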
```diff
@@ -724,22 +764,27 @@ def _get_response(request, auth, template):
 
 
 def _construct_request(
-    per_page, page, query_args, template, auth, as_app=None, fine=False
+    per_page, query_args, template, auth, as_app=None, fine=False
 ):
-    all_query_args = {}
-    if per_page:
-        all_query_args["per_page"] = per_page
-    if page:
-        all_query_args["page"] = page
-    if query_args:
-        all_query_args.update(query_args)
-
-    request_url = template
-    if all_query_args:
-        querystring = urlencode(all_query_args)
-        request_url = template + "?" + querystring
-    else:
-        querystring = ""
+    # If template is already a full URL with query params (from Link header), use it directly
+    if "?" in template and template.startswith("http"):
+        request_url = template
+        # Extract query string for logging
+        querystring = template.split("?", 1)[1]
+    else:
+        # Build URL with query parameters
+        all_query_args = {}
+        if per_page:
+            all_query_args["per_page"] = per_page
+        if query_args:
+            all_query_args.update(query_args)
+
+        request_url = template
+        if all_query_args:
+            querystring = urlencode(all_query_args)
+            request_url = template + "?" + querystring
+        else:
+            querystring = ""
 
     request = Request(request_url)
     if auth is not None:
```
```diff
@@ -755,7 +800,7 @@ def _construct_request(
             "Accept", "application/vnd.github.machine-man-preview+json"
         )
 
-    log_url = template
+    log_url = template if "?" not in template else template.split("?")[0]
     if querystring:
         log_url += "?" + querystring
     logger.info("Requesting {}".format(log_url))
```
```diff
@@ -832,8 +877,7 @@ def download_file(url, path, auth, as_app=False, fine=False):
         return
 
     request = _construct_request(
-        per_page=100,
-        page=1,
+        per_page=None,
         query_args={},
         template=url,
         auth=auth,
```
```diff
@@ -1543,7 +1587,9 @@ def filter_repositories(args, unfiltered_repositories):
     repositories = []
     for r in unfiltered_repositories:
         # gists can be anonymous, so need to safely check owner
-        if r.get("owner", {}).get("login") == args.user or r.get("is_starred"):
+        # Use case-insensitive comparison to match GitHub's case-insensitive username behavior
+        owner_login = r.get("owner", {}).get("login", "")
+        if owner_login.lower() == args.user.lower() or r.get("is_starred"):
             repositories.append(r)
 
     name_regex = None
```
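A tiny standalone illustration of the behaviour this hunk fixes; the login values are made up.

```python
# Hypothetical values: the user passed "OctoCat" on the CLI, while GitHub's
# API returns the canonical login "octocat".
repo = {"owner": {"login": "octocat"}}
requested_user = "OctoCat"

# Old comparison: case-sensitive, so the repository was silently filtered out.
old_match = repo.get("owner", {}).get("login") == requested_user
# New comparison: case-insensitive, matching GitHub's username semantics.
new_match = repo.get("owner", {}).get("login", "").lower() == requested_user.lower()

print(old_match, new_match)  # False True
```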
```diff
@@ -1646,40 +1692,47 @@ def backup_repositories(args, output_directory, repositories):
 
             continue  # don't try to back anything else for a gist; it doesn't exist
 
-        download_wiki = args.include_wiki or args.include_everything
-        if repository["has_wiki"] and download_wiki:
-            fetch_repository(
-                repository["name"],
-                repo_url.replace(".git", ".wiki.git"),
-                os.path.join(repo_cwd, "wiki"),
-                skip_existing=args.skip_existing,
-                bare_clone=args.bare_clone,
-                lfs_clone=args.lfs_clone,
-                no_prune=args.no_prune,
-            )
-        if args.include_issues or args.include_everything:
-            backup_issues(args, repo_cwd, repository, repos_template)
-
-        if args.include_pulls or args.include_everything:
-            backup_pulls(args, repo_cwd, repository, repos_template)
-
-        if args.include_milestones or args.include_everything:
-            backup_milestones(args, repo_cwd, repository, repos_template)
-
-        if args.include_labels or args.include_everything:
-            backup_labels(args, repo_cwd, repository, repos_template)
-
-        if args.include_hooks or args.include_everything:
-            backup_hooks(args, repo_cwd, repository, repos_template)
-
-        if args.include_releases or args.include_everything:
-            backup_releases(
-                args,
-                repo_cwd,
-                repository,
-                repos_template,
-                include_assets=args.include_assets or args.include_everything,
-            )
+        try:
+            download_wiki = args.include_wiki or args.include_everything
+            if repository["has_wiki"] and download_wiki:
+                fetch_repository(
+                    repository["name"],
+                    repo_url.replace(".git", ".wiki.git"),
+                    os.path.join(repo_cwd, "wiki"),
+                    skip_existing=args.skip_existing,
+                    bare_clone=args.bare_clone,
+                    lfs_clone=args.lfs_clone,
+                    no_prune=args.no_prune,
+                )
+            if args.include_issues or args.include_everything:
+                backup_issues(args, repo_cwd, repository, repos_template)
+
+            if args.include_pulls or args.include_everything:
+                backup_pulls(args, repo_cwd, repository, repos_template)
+
+            if args.include_milestones or args.include_everything:
+                backup_milestones(args, repo_cwd, repository, repos_template)
+
+            if args.include_labels or args.include_everything:
+                backup_labels(args, repo_cwd, repository, repos_template)
+
+            if args.include_hooks or args.include_everything:
+                backup_hooks(args, repo_cwd, repository, repos_template)
+
+            if args.include_releases or args.include_everything:
+                backup_releases(
+                    args,
+                    repo_cwd,
+                    repository,
+                    repos_template,
+                    include_assets=args.include_assets or args.include_everything,
+                )
+        except RepositoryUnavailableError as e:
+            logger.warning(f"Repository {repository['full_name']} is unavailable (HTTP 451)")
+            if e.dmca_url:
+                logger.warning(f"DMCA notice: {e.dmca_url}")
+            logger.info(f"Skipping remaining resources for {repository['full_name']}")
+            continue
 
         if args.incremental:
             if last_update == "0000-00-00T00:00:00Z":
```
```diff
@@ -1847,11 +1900,21 @@ def backup_milestones(args, repo_cwd, repository, repos_template):
     for milestone in _milestones:
         milestones[milestone["number"]] = milestone
 
-    logger.info("Saving {0} milestones to disk".format(len(list(milestones.keys()))))
+    written_count = 0
     for number, milestone in list(milestones.items()):
         milestone_file = "{0}/{1}.json".format(milestone_cwd, number)
-        with codecs.open(milestone_file, "w", encoding="utf-8") as f:
-            json_dump(milestone, f)
+        if json_dump_if_changed(milestone, milestone_file):
+            written_count += 1
+
+    total = len(milestones)
+    if written_count == total:
+        logger.info("Saved {0} milestones to disk".format(total))
+    elif written_count == 0:
+        logger.info("{0} milestones unchanged, skipped write".format(total))
+    else:
+        logger.info("Saved {0} of {1} milestones to disk ({2} unchanged)".format(
+            written_count, total, total - written_count
+        ))
 
 
 def backup_labels(args, repo_cwd, repository, repos_template):
```
```diff
@@ -1904,19 +1967,17 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
             reverse=True,
         )
         releases = releases[: args.number_of_latest_releases]
-        logger.info("Saving the latest {0} releases to disk".format(len(releases)))
-    else:
-        logger.info("Saving {0} releases to disk".format(len(releases)))
 
     # for each release, store it
+    written_count = 0
     for release in releases:
         release_name = release["tag_name"]
         release_name_safe = release_name.replace("/", "__")
         output_filepath = os.path.join(
             release_cwd, "{0}.json".format(release_name_safe)
         )
-        with codecs.open(output_filepath, "w+", encoding="utf-8") as f:
-            json_dump(release, f)
+        if json_dump_if_changed(release, output_filepath):
+            written_count += 1
 
         if include_assets:
             assets = retrieve_data(args, release["assets_url"])
```
```diff
@@ -1933,6 +1994,17 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
                     fine=True if args.token_fine is not None else False,
                 )
 
+    # Log the results
+    total = len(releases)
+    if written_count == total:
+        logger.info("Saved {0} releases to disk".format(total))
+    elif written_count == 0:
+        logger.info("{0} releases unchanged, skipped write".format(total))
+    else:
+        logger.info("Saved {0} of {1} releases to disk ({2} unchanged)".format(
+            written_count, total, total - written_count
+        ))
+
 
 def fetch_repository(
     name,
```
```diff
@@ -2057,9 +2129,10 @@ def _backup_data(args, name, template, output_file, output_directory):
     mkdir_p(output_directory)
     data = retrieve_data(args, template)
 
-    logger.info("Writing {0} {1} to disk".format(len(data), name))
-    with codecs.open(output_file, "w", encoding="utf-8") as f:
-        json_dump(data, f)
+    if json_dump_if_changed(data, output_file):
+        logger.info("Saved {0} {1} to disk".format(len(data), name))
+    else:
+        logger.info("{0} {1} unchanged, skipped write".format(len(data), name))
 
 
 def json_dump(data, output_file):
```
@@ -2071,3 +2144,57 @@ def json_dump(data, output_file):
|
|||||||
indent=4,
|
indent=4,
|
||||||
separators=(",", ": "),
|
separators=(",", ": "),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def json_dump_if_changed(data, output_file_path):
|
||||||
|
"""
|
||||||
|
Write JSON data to file only if content has changed.
|
||||||
|
|
||||||
|
Compares the serialized JSON data with the existing file content
|
||||||
|
and only writes if different. This prevents unnecessary file
|
||||||
|
modification timestamp updates and disk writes.
|
||||||
|
|
||||||
|
Uses atomic writes (temp file + rename) to prevent corruption
|
||||||
|
if the process is interrupted during the write.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
data: The data to serialize as JSON
|
||||||
|
output_file_path: The path to the output file
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if file was written (content changed or new file)
|
||||||
|
False if write was skipped (content unchanged)
|
||||||
|
"""
|
||||||
|
# Serialize new data with consistent formatting matching json_dump()
|
||||||
|
new_content = json.dumps(
|
||||||
|
data,
|
||||||
|
ensure_ascii=False,
|
||||||
|
sort_keys=True,
|
||||||
|
indent=4,
|
||||||
|
separators=(",", ": "),
|
||||||
|
)
|
||||||
|
|
||||||
|
# Check if file exists and compare content
|
||||||
|
if os.path.exists(output_file_path):
|
||||||
|
try:
|
||||||
|
with codecs.open(output_file_path, "r", encoding="utf-8") as f:
|
||||||
|
existing_content = f.read()
|
||||||
|
if existing_content == new_content:
|
||||||
|
logger.debug(
|
||||||
|
"Content unchanged, skipping write: {0}".format(output_file_path)
|
||||||
|
)
|
||||||
|
return False
|
||||||
|
except (OSError, UnicodeDecodeError) as e:
|
||||||
|
# If we can't read the existing file, write the new one
|
||||||
|
logger.debug(
|
||||||
|
"Error reading existing file {0}, will overwrite: {1}".format(
|
||||||
|
output_file_path, e
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Write the file atomically using temp file + rename
|
||||||
|
temp_file = output_file_path + ".temp"
|
||||||
|
with codecs.open(temp_file, "w", encoding="utf-8") as f:
|
||||||
|
f.write(new_content)
|
||||||
|
os.rename(temp_file, output_file_path) # Atomic on POSIX systems
|
||||||
|
return True
|
||||||
|
|||||||
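A hedged usage sketch of the helper defined above, roughly how a caller such as backup_labels() would use it; the path and label data are invented, and the target directory is assumed to already exist (the real callers run mkdir_p first).

```python
# Invented example data and path; the parent directory must already exist.
label = {"name": "bug", "color": "d73a4a"}
label_file = "/tmp/github-backup-example/labels/bug.json"

if json_dump_if_changed(label, label_file):
    print("label written")             # first run: file created atomically
else:
    print("label unchanged, skipped")  # identical data later: no write, mtime preserved
```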
```diff
@@ -3,16 +3,16 @@ black==25.11.0
 bleach==6.3.0
 certifi==2025.11.12
 charset-normalizer==3.4.4
-click==8.3.0
+click==8.3.1
 colorama==0.4.6
 docutils==0.22.3
 flake8==7.3.0
 gitchangelog==3.0.4
-pytest==8.3.3
+pytest==9.0.1
 idna==3.11
 importlib-metadata==8.7.0
 jaraco.classes==3.4.0
-keyring==25.6.0
+keyring==25.7.0
 markdown-it-py==4.0.0
 mccabe==0.7.0
 mdurl==0.1.2
@@ -28,7 +28,7 @@ Pygments==2.19.2
 readme-renderer==44.0
 requests==2.32.5
 requests-toolbelt==1.0.0
-restructuredtext-lint==1.4.0
+restructuredtext-lint==2.0.2
 rfc3986==2.0.0
 rich==14.2.0
 setuptools==80.9.0
```
```diff
@@ -1 +0,0 @@
-
```
tests/test_http_451.py (new file, 143 lines)

```python
"""Tests for HTTP 451 (DMCA takedown) handling."""

import json
from unittest.mock import Mock, patch

import pytest

from github_backup import github_backup


class TestHTTP451Exception:
    """Test suite for HTTP 451 DMCA takedown exception handling."""

    def test_repository_unavailable_error_raised(self):
        """HTTP 451 should raise RepositoryUnavailableError with DMCA URL."""
        # Create mock args
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        # Mock HTTPError 451 response
        mock_response = Mock()
        mock_response.getcode.return_value = 451

        dmca_data = {
            "message": "Repository access blocked",
            "block": {
                "reason": "dmca",
                "created_at": "2024-11-12T14:38:04Z",
                "html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
            }
        }
        mock_response.read.return_value = json.dumps(dmca_data).encode("utf-8")
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Unavailable For Legal Reasons"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
            with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))

        # Check exception has DMCA URL
        assert exc_info.value.dmca_url == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
        assert "451" in str(exc_info.value)

    def test_repository_unavailable_error_without_dmca_url(self):
        """HTTP 451 without DMCA details should still raise exception."""
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        mock_response = Mock()
        mock_response.getcode.return_value = 451
        mock_response.read.return_value = b'{"message": "Blocked"}'
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Unavailable For Legal Reasons"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
            with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))

        # Exception raised even without DMCA URL
        assert exc_info.value.dmca_url is None
        assert "451" in str(exc_info.value)

    def test_repository_unavailable_error_with_malformed_json(self):
        """HTTP 451 with malformed JSON should still raise exception."""
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        mock_response = Mock()
        mock_response.getcode.return_value = 451
        mock_response.read.return_value = b"invalid json {"
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Unavailable For Legal Reasons"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
            with pytest.raises(github_backup.RepositoryUnavailableError):
                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))

    def test_other_http_errors_unchanged(self):
        """Other HTTP errors should still raise generic Exception."""
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        mock_response = Mock()
        mock_response.getcode.return_value = 404
        mock_response.read.return_value = b'{"message": "Not Found"}'
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Not Found"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
            # Should raise generic Exception, not RepositoryUnavailableError
            with pytest.raises(Exception) as exc_info:
                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/notfound/issues"))

        assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError)
        assert "404" in str(exc_info.value)


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
```
tests/test_json_dump_if_changed.py (new file, 198 lines)

```python
"""Tests for json_dump_if_changed functionality."""

import codecs
import json
import os
import tempfile

import pytest

from github_backup import github_backup


class TestJsonDumpIfChanged:
    """Test suite for json_dump_if_changed function."""

    def test_writes_new_file(self):
        """Should write file when it doesn't exist."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {"key": "value", "number": 42}

            result = github_backup.json_dump_if_changed(test_data, output_file)

            assert result is True
            assert os.path.exists(output_file)

            # Verify content matches expected format
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                content = f.read()
            loaded = json.loads(content)
            assert loaded == test_data

    def test_skips_unchanged_file(self):
        """Should skip write when content is identical."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {"key": "value", "number": 42}

            # First write
            result1 = github_backup.json_dump_if_changed(test_data, output_file)
            assert result1 is True

            # Get the initial mtime
            mtime1 = os.path.getmtime(output_file)

            # Second write with same data
            result2 = github_backup.json_dump_if_changed(test_data, output_file)
            assert result2 is False

            # File should not have been modified
            mtime2 = os.path.getmtime(output_file)
            assert mtime1 == mtime2

    def test_writes_when_content_changed(self):
        """Should write file when content has changed."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data1 = {"key": "value1"}
            test_data2 = {"key": "value2"}

            # First write
            result1 = github_backup.json_dump_if_changed(test_data1, output_file)
            assert result1 is True

            # Second write with different data
            result2 = github_backup.json_dump_if_changed(test_data2, output_file)
            assert result2 is True

            # Verify new content
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                loaded = json.load(f)
            assert loaded == test_data2

    def test_uses_consistent_formatting(self):
        """Should use same JSON formatting as json_dump."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {"z": "last", "a": "first", "m": "middle"}

            github_backup.json_dump_if_changed(test_data, output_file)

            with codecs.open(output_file, "r", encoding="utf-8") as f:
                content = f.read()

            # Check for consistent formatting:
            # - sorted keys
            # - 4-space indent
            # - comma-colon-space separator
            expected = json.dumps(
                test_data,
                ensure_ascii=False,
                sort_keys=True,
                indent=4,
                separators=(",", ": "),
            )
            assert content == expected

    def test_atomic_write_always_used(self):
        """Should always use temp file and rename for atomic writes."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {"key": "value"}

            result = github_backup.json_dump_if_changed(test_data, output_file)

            assert result is True
            assert os.path.exists(output_file)

            # Temp file should not exist after atomic write
            temp_file = output_file + ".temp"
            assert not os.path.exists(temp_file)

            # Verify content
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                loaded = json.load(f)
            assert loaded == test_data

    def test_handles_unicode_content(self):
        """Should correctly handle Unicode content."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {
                "emoji": "🚀",
                "chinese": "你好",
                "arabic": "مرحبا",
                "cyrillic": "Привет",
            }

            result = github_backup.json_dump_if_changed(test_data, output_file)
            assert result is True

            # Verify Unicode is preserved
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                loaded = json.load(f)
            assert loaded == test_data

            # Second write should skip
            result2 = github_backup.json_dump_if_changed(test_data, output_file)
            assert result2 is False

    def test_handles_complex_nested_data(self):
        """Should handle complex nested data structures."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {
                "users": [
                    {"id": 1, "name": "Alice", "tags": ["admin", "user"]},
                    {"id": 2, "name": "Bob", "tags": ["user"]},
                ],
                "metadata": {"version": "1.0", "nested": {"deep": {"value": 42}}},
            }

            result = github_backup.json_dump_if_changed(test_data, output_file)
            assert result is True

            # Verify structure is preserved
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                loaded = json.load(f)
            assert loaded == test_data

    def test_overwrites_on_unicode_decode_error(self):
        """Should overwrite if existing file has invalid UTF-8."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")
            test_data = {"key": "value"}

            # Write invalid UTF-8 bytes
            with open(output_file, "wb") as f:
                f.write(b"\xff\xfe invalid utf-8")

            # Should catch UnicodeDecodeError and overwrite
            result = github_backup.json_dump_if_changed(test_data, output_file)
            assert result is True

            # Verify new content was written
            with codecs.open(output_file, "r", encoding="utf-8") as f:
                loaded = json.load(f)
            assert loaded == test_data

    def test_key_order_independence(self):
        """Should treat differently-ordered dicts as same if keys/values match."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_file = os.path.join(tmpdir, "test.json")

            # Write first dict
            data1 = {"z": 1, "a": 2, "m": 3}
            github_backup.json_dump_if_changed(data1, output_file)

            # Try to write same data but different order
            data2 = {"a": 2, "m": 3, "z": 1}
            result = github_backup.json_dump_if_changed(data2, output_file)

            # Should skip because content is the same (keys are sorted)
            assert result is False


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
```
tests/test_pagination.py (new file, 153 lines)

```python
"""Tests for Link header pagination handling."""

import json
from unittest.mock import Mock, patch

import pytest

from github_backup import github_backup


class MockHTTPResponse:
    """Mock HTTP response for paginated API calls."""

    def __init__(self, data, link_header=None):
        self._content = json.dumps(data).encode("utf-8")
        self._link_header = link_header
        self._read = False
        self.reason = "OK"

    def getcode(self):
        return 200

    def read(self):
        if self._read:
            return b""
        self._read = True
        return self._content

    def get_header(self, name, default=None):
        """Mock method for headers.get()."""
        return self.headers.get(name, default)

    @property
    def headers(self):
        headers = {"x-ratelimit-remaining": "5000"}
        if self._link_header:
            headers["Link"] = self._link_header
        return headers


@pytest.fixture
def mock_args():
    """Mock args for retrieve_data_gen."""
    args = Mock()
    args.as_app = False
    args.token_fine = None
    args.token_classic = "fake_token"
    args.username = None
    args.password = None
    args.osx_keychain_item_name = None
    args.osx_keychain_item_account = None
    args.throttle_limit = None
    args.throttle_pause = 0
    return args


def test_cursor_based_pagination(mock_args):
    """Link header with 'after' cursor parameter works correctly."""

    # Simulate issues endpoint behavior: returns cursor in Link header
    responses = [
        # Issues endpoint returns 'after' cursor parameter (not 'page')
        MockHTTPResponse(
            data=[{"issue": i} for i in range(1, 101)],  # Page 1 contents
            link_header='<https://api.github.com/repos/owner/repo/issues?per_page=100&after=ABC123&page=2>; rel="next"',
        ),
        MockHTTPResponse(
            data=[{"issue": i} for i in range(101, 151)],  # Page 2 contents
            link_header=None,  # No Link header - signals end of pagination
        ),
    ]
    requests_made = []

    def mock_urlopen(request, *args, **kwargs):
        url = request.get_full_url()
        requests_made.append(url)
        return responses[len(requests_made) - 1]

    with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
        results = list(
            github_backup.retrieve_data_gen(
                mock_args, "https://api.github.com/repos/owner/repo/issues"
            )
        )

    # Verify all items retrieved and cursor was used in second request
    assert len(results) == 150
    assert len(requests_made) == 2
    assert "after=ABC123" in requests_made[1]


def test_page_based_pagination(mock_args):
    """Link header with 'page' parameter works correctly."""

    # Simulate pulls/repos endpoint behavior: returns page numbers in Link header
    responses = [
        # Pulls endpoint uses traditional 'page' parameter (not cursor)
        MockHTTPResponse(
            data=[{"pull": i} for i in range(1, 101)],  # Page 1 contents
            link_header='<https://api.github.com/repos/owner/repo/pulls?per_page=100&page=2>; rel="next"',
        ),
        MockHTTPResponse(
            data=[{"pull": i} for i in range(101, 181)],  # Page 2 contents
            link_header=None,  # No Link header - signals end of pagination
        ),
    ]
    requests_made = []

    def mock_urlopen(request, *args, **kwargs):
        url = request.get_full_url()
        requests_made.append(url)
        return responses[len(requests_made) - 1]

    with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
        results = list(
            github_backup.retrieve_data_gen(
                mock_args, "https://api.github.com/repos/owner/repo/pulls"
            )
        )

    # Verify all items retrieved and page parameter was used (not cursor)
    assert len(results) == 180
    assert len(requests_made) == 2
    assert "page=2" in requests_made[1]
    assert "after" not in requests_made[1]


def test_no_link_header_stops_pagination(mock_args):
    """Pagination stops when Link header is absent."""

    # Simulate endpoint with results that fit in a single page
    responses = [
        MockHTTPResponse(
            data=[{"label": i} for i in range(1, 51)],  # Page contents
            link_header=None,  # No Link header - signals end of pagination
        )
    ]
    requests_made = []

    def mock_urlopen(request, *args, **kwargs):
        requests_made.append(request.get_full_url())
        return responses[len(requests_made) - 1]

    with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
        results = list(
            github_backup.retrieve_data_gen(
                mock_args, "https://api.github.com/repos/owner/repo/labels"
            )
        )

    # Verify pagination stopped after first request
    assert len(results) == 50
    assert len(requests_made) == 1
```