Compare commits

..

126 Commits

Author SHA1 Message Date
GitHub Action
f8cdf55050 Release version 0.62.0 2026-04-29 12:10:11 +00:00
Jose Diaz-Gonzalez
b59f719f10 Merge pull request #505 from mrexodia/redundant-fetches
Reduce redundant fetches
2026-04-29 08:09:20 -04:00
Duncan Ogilvie
014eff395a Skip checkpoint-equal incremental items 2026-04-29 12:56:24 +02:00
Duncan Ogilvie
9d0cfdb61d Avoid redundant release asset list requests 2026-04-29 12:56:23 +02:00
Duncan Ogilvie
6cd0ab3633 Reduce unnecessary pull requests with incremental fetching 2026-04-29 12:56:23 +02:00
Jose Diaz-Gonzalez
02e833e40a Merge pull request #504 from mrexodia/per-resource-last-update
Implement per-resource last_update timestamps
2026-04-29 06:25:05 -04:00
Duncan Ogilvie
b3a8241c9a Implement per-resource last_update timestamps
Closes #62
2026-04-29 12:06:33 +02:00
Jose Diaz-Gonzalez
d19e2ad9c5 Merge pull request #503 from mrexodia/pr-reviews
Add support for pull request reviews
2026-04-29 05:52:50 -04:00
Duncan Ogilvie
24b3fdb4f3 Add support for pull request reviews
Closes #124
2026-04-29 11:43:30 +02:00
Jose Diaz-Gonzalez
013b27208e Merge pull request #502 from mrexodia/discussions
Add support for discussions
2026-04-29 00:42:53 -04:00
Duncan Ogilvie
4d022d94d0 Add support for discussions
Closes #290
2026-04-28 14:32:27 +02:00
Jose Diaz-Gonzalez
ed29a917ca Merge pull request #501 from mrexodia/gh-cli-token
Add --token-from-gh authentication option
2026-04-27 17:22:10 -04:00
Duncan Ogilvie
f4117990b2 Add --token-from-gh authentication option 2026-04-27 15:52:55 +02:00
Jose Diaz-Gonzalez
4c1f21a306 Merge pull request #499 from josegonzalez/dependabot/pip/python-packages-590e9db7b9
chore(deps): bump pytest from 9.0.2 to 9.0.3 in the python-packages group
2026-04-08 12:46:47 -04:00
dependabot[bot]
9fde6ed1ff chore(deps): bump pytest in the python-packages group
Bumps the python-packages group with 1 update: [pytest](https://github.com/pytest-dev/pytest).


Updates `pytest` from 9.0.2 to 9.0.3
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.2...9.0.3)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-08 13:05:48 +00:00
Jose Diaz-Gonzalez
9a9b069e14 Merge pull request #497 from josegonzalez/dependabot/pip/python-packages-b7f5c28099
chore(deps): bump black from 26.3.0 to 26.3.1 in the python-packages group
2026-03-19 18:05:25 -04:00
dependabot[bot]
f85c759e5d chore(deps): bump black in the python-packages group
Bumps the python-packages group with 1 update: [black](https://github.com/psf/black).


Updates `black` from 26.3.0 to 26.3.1
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/26.3.0...26.3.1)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 26.3.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-12 13:05:24 +00:00
Jose Diaz-Gonzalez
26a6e1df1b Merge pull request #491 from josegonzalez/dependabot/github_actions/docker/login-action-4
chore(deps): bump docker/login-action from 3 to 4
2026-03-09 13:30:22 -04:00
dependabot[bot]
3d961d1118 chore(deps): bump docker/login-action from 3 to 4
Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-09 17:26:41 +00:00
Jose Diaz-Gonzalez
20f9542063 Merge pull request #494 from josegonzalez/dependabot/github_actions/docker/setup-qemu-action-4
chore(deps): bump docker/setup-qemu-action from 3 to 4
2026-03-09 13:26:23 -04:00
Jose Diaz-Gonzalez
bbf76e70eb Merge pull request #495 from josegonzalez/dependabot/github_actions/docker/build-push-action-7
chore(deps): bump docker/build-push-action from 6 to 7
2026-03-09 13:26:11 -04:00
Jose Diaz-Gonzalez
ca70725449 Merge pull request #493 from josegonzalez/dependabot/github_actions/docker/setup-buildx-action-4
chore(deps): bump docker/setup-buildx-action from 3 to 4
2026-03-09 13:25:54 -04:00
Jose Diaz-Gonzalez
653ceb1e12 Merge pull request #492 from josegonzalez/dependabot/github_actions/docker/metadata-action-6
chore(deps): bump docker/metadata-action from 5 to 6
2026-03-09 13:25:43 -04:00
Jose Diaz-Gonzalez
ba1575538b Merge pull request #496 from josegonzalez/dependabot/pip/python-packages-898938d50a
chore(deps): bump the python-packages group with 2 updates
2026-03-09 13:25:35 -04:00
dependabot[bot]
d5be07ec80 chore(deps): bump the python-packages group with 2 updates
Bumps the python-packages group with 2 updates: [black](https://github.com/psf/black) and [setuptools](https://github.com/pypa/setuptools).


Updates `black` from 26.1.0 to 26.3.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/26.1.0...26.3.0)

Updates `setuptools` from 82.0.0 to 82.0.1
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v82.0.0...v82.0.1)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 26.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
- dependency-name: setuptools
  dependency-version: 82.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-09 13:28:37 +00:00
dependabot[bot]
5758e489e8 chore(deps): bump docker/build-push-action from 6 to 7
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v6...v7)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-09 04:33:58 +00:00
dependabot[bot]
cceef92346 chore(deps): bump docker/setup-qemu-action from 3 to 4
Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3 to 4.
- [Release notes](https://github.com/docker/setup-qemu-action/releases)
- [Commits](https://github.com/docker/setup-qemu-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: docker/setup-qemu-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-09 04:33:55 +00:00
dependabot[bot]
7f1807aaf8 chore(deps): bump docker/setup-buildx-action from 3 to 4
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](https://github.com/docker/setup-buildx-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-09 04:33:53 +00:00
dependabot[bot]
8a0553a5b1 chore(deps): bump docker/metadata-action from 5 to 6
Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6.
- [Release notes](https://github.com/docker/metadata-action/releases)
- [Commits](https://github.com/docker/metadata-action/compare/v5...v6)

---
updated-dependencies:
- dependency-name: docker/metadata-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-09 04:33:49 +00:00
GitHub Action
68af1d406a Release version 0.61.5 2026-02-18 21:04:32 +00:00
Jose Diaz-Gonzalez
b112b43a08 Merge pull request #490 from Iamrodos/fix/489-empty-repo-none-comparison
Fix empty repository crash due to None timestamp comparison (#489)
2026-02-18 16:03:57 -05:00
Rodos
f54a5458f6 Fix empty repository crash due to None timestamp comparison (#489)
Empty repositories have None for pushed_at/updated_at, causing a
TypeError when compared to the last_update string. Use .get() with
truthiness check to skip None timestamps in incremental tracking.
2026-02-18 20:10:48 +11:00
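The truthiness-based skip this commit describes can be sketched as follows (a minimal illustration; `newest_timestamp` and the simplified repo dicts are illustrative names, not the project's actual code):

```python
def newest_timestamp(repos, last_update):
    """Pick the newest push/update timestamp, skipping empty repos.

    Empty repositories report None for pushed_at/updated_at; comparing
    None against an ISO-8601 string raises TypeError, so .get() plus a
    truthiness check filters None out before the comparison. ISO-8601
    strings in the same zone compare correctly as plain strings.
    """
    newest = last_update
    for repo in repos:
        for key in ("pushed_at", "updated_at"):
            value = repo.get(key)
            if value and (newest is None or value > newest):
                newest = value
    return newest
```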
GitHub Action
60067650b0 Release version 0.61.4 2026-02-16 05:46:39 +00:00
Jose Diaz-Gonzalez
655886fa80 Merge pull request #488 from Iamrodos/fix/487-dmca-regression
Fix HTTP 451 DMCA and 403 TOS handling regression (#487)
2026-02-16 00:46:05 -05:00
Rodos
0162f7ed46 Fix HTTP 451 DMCA and 403 TOS handling regression (#487)
The DMCA handling added in PR #454 had a bug: make_request_with_retry()
raises HTTPError before retrieve_data() could check the status code via
getcode(), making the case 451 handler dead code. This also affected
HTTP 403 TOS violations (e.g. jumoog/MagiskOnWSA).

Fix by catching HTTPError in retrieve_data() and converting 451 and
blocked 403 responses (identified by "block" key in response body) to
RepositoryUnavailableError. Non-block 403s (permissions, scopes) still
propagate as HTTPError. Also handle RepositoryUnavailableError in
retrieve_repositories() for the --repository case.

Rewrote tests to mock urlopen (not make_request_with_retry) to exercise
the real code path that was previously untested.

Closes #487
2026-02-16 10:16:33 +11:00
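The 451/403 classification described in this commit can be sketched like so (`classify_http_error` is an illustrative helper, not the project's function; it only assumes the response body rules stated above):

```python
import json


class RepositoryUnavailableError(Exception):
    """Raised when a repository is legally or administratively blocked."""


def classify_http_error(err):
    """Map an HTTPError-like object to RepositoryUnavailableError.

    451 (DMCA) always converts; 403 converts only when the response
    body carries a "block" key (TOS/DMCA blocks). Permission/scope
    403s return None so the caller re-raises the original error.
    """
    if err.code == 451:
        return RepositoryUnavailableError("DMCA takedown")
    if err.code == 403:
        try:
            body = json.loads(err.read().decode())
        except (ValueError, AttributeError):
            body = {}
        if "block" in body:
            return RepositoryUnavailableError(body.get("message", "blocked"))
    return None  # not a block: caller re-raises the HTTPError
```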
Jose Diaz-Gonzalez
8c1a13475a Merge pull request #485 from josegonzalez/dependabot/pip/python-packages-906bf77f00
chore(deps): bump setuptools from 80.10.2 to 82.0.0 in the python-packages group
2026-02-11 15:26:03 -05:00
dependabot[bot]
6268a4c5c6 chore(deps): bump setuptools in the python-packages group
Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools).


Updates `setuptools` from 80.10.2 to 82.0.0
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v80.10.2...v82.0.0)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-version: 82.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-02-09 14:31:40 +00:00
Jose Diaz-Gonzalez
4b2295db0d Merge pull request #484 from josegonzalez/dependabot/pip/python-packages-e903f47b53
chore(deps): bump setuptools from 80.10.1 to 80.10.2 in the python-packages group
2026-01-26 10:54:29 -05:00
dependabot[bot]
be900d1f3f chore(deps): bump setuptools in the python-packages group
Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools).


Updates `setuptools` from 80.10.1 to 80.10.2
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v80.10.1...v80.10.2)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-version: 80.10.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-26 14:08:53 +00:00
GitHub Action
9be6282719 Release version 0.61.3 2026-01-24 05:45:42 +00:00
Jose Diaz-Gonzalez
1102990af0 Merge pull request #482 from Iamrodos/fix-481-private-key-typo
Fix KeyError: 'Private' when using --all flag (#481)
2026-01-24 00:45:01 -05:00
Jose Diaz-Gonzalez
311ffb40cd Merge pull request #483 from josegonzalez/dependabot/pip/python-packages-d4f9607e9b
chore(deps): bump setuptools from 80.9.0 to 80.10.1 in the python-packages group
2026-01-24 00:44:01 -05:00
dependabot[bot]
2f5e7c2dcf chore(deps): bump setuptools in the python-packages group
Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools).


Updates `setuptools` from 80.9.0 to 80.10.1
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v80.9.0...v80.10.1)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-version: 80.10.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-21 13:05:17 +00:00
Rodos
0d8a504b02 Fix KeyError: 'Private' when using --all flag (#481)
The repository dictionary uses lowercase "private" key. Use .get() with
the correct case to match the pattern used elsewhere in the codebase.

The bug only affects --all users since --security-advisories short-circuits
before the key access.
2026-01-21 21:12:03 +11:00
GitHub Action
712d22d124 Release version 0.61.2 2026-01-19 17:40:27 +00:00
Jose Diaz-Gonzalez
e0c9d65225 Merge pull request #480 from josegonzalez/dependabot/pip/python-packages-65ea79b78d
chore(deps): bump black from 25.12.0 to 26.1.0 in the python-packages group
2026-01-19 12:39:54 -05:00
Jose Diaz-Gonzalez
52d996f784 Merge pull request #479 from lukasbestle/fix/security-advisories-private
Fixes to `--security-advisories` option
2026-01-19 12:39:48 -05:00
dependabot[bot]
e6283f9384 chore(deps): bump black in the python-packages group
Bumps the python-packages group with 1 update: [black](https://github.com/psf/black).


Updates `black` from 25.12.0 to 26.1.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.12.0...26.1.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 26.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-19 14:50:28 +00:00
Lukas Bestle
1181f811b7 docs: Explain security advisories in README 2026-01-16 08:52:45 +01:00
Lukas Bestle
856ad5db41 fix: Skip security advisories for private repos unless explicitly requested 2026-01-14 21:10:12 +01:00
Lukas Bestle
c6fa8c7695 feat: Only make security advisory dir if successful
Avoids empty directories for private repos
2026-01-14 21:02:51 +01:00
Lukas Bestle
93e505c07d fix: Handle 404 errors on security advisories 2026-01-14 21:01:59 +01:00
GitHub Action
6780d3ad6c Release version 0.61.1 2026-01-13 23:10:05 +00:00
Jose Diaz-Gonzalez
65bacc27f0 Merge pull request #478 from Iamrodos/fix-477-fine-grained-pat-attachments
Fix 477 fine grained pat attachments
2026-01-13 18:09:27 -05:00
Rodos
ab0eebb175 Refactor test fixtures to use shared create_args helper
Uses the real parse_args() function to get CLI defaults, so when
new arguments are added they're automatically available to all tests.

Changes:
- Add tests/conftest.py with create_args fixture
- Update 8 test files to use shared fixture
- Remove duplicate _create_mock_args methods
- Remove redundant @pytest.fixture mock_args definitions

This eliminates the need to update multiple test files when
adding new CLI arguments.
2026-01-13 13:47:33 +11:00
Rodos
fce4abb74a Fix fine-grained PAT attachment downloads for private repos (#477)
Fine-grained personal access tokens cannot download attachments from
private repositories directly due to a GitHub platform limitation.

This adds a workaround for image attachments (/assets/ URLs) using
GitHub's Markdown API to convert URLs to JWT-signed URLs that can be
downloaded without authentication.

Changes:
- Add get_jwt_signed_url_via_markdown_api() function
- Detect fine-grained token + private repo + /assets/ URL upfront
- Use JWT workaround for those cases, mark success with jwt_workaround flag
- Skip download with skipped_at when workaround fails
- Add startup warning when using --attachments with fine-grained tokens
- Document limitation in README (file attachments still fail)
- Add 6 unit tests for JWT workaround logic
2026-01-13 13:15:38 +11:00
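The workaround renders a Markdown image reference through GitHub's Markdown API; the rendered HTML then contains a JWT-signed `src` URL that downloads without authentication. A sketch of the extraction step only (the helper name and HTML shape are assumptions; the real function is `get_jwt_signed_url_via_markdown_api()`):

```python
import re


def extract_signed_src(rendered_html):
    """Pull the signed URL out of HTML returned by the Markdown API.

    Assumes the rendered markup embeds the asset as <img src="...">;
    returns None when no image is present (workaround failed).
    """
    match = re.search(r'src="([^"]+)"', rendered_html)
    return match.group(1) if match else None
```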
GitHub Action
c63fb37d30 Release version 0.61.0 2026-01-12 16:30:28 +00:00
Jose Diaz-Gonzalez
94b08d06c9 Merge pull request #476 from lukasbestle/patch-1
docs: Add missing `--retries` argument to README
2026-01-12 11:29:56 -05:00
Jose Diaz-Gonzalez
54a9872e47 Merge pull request #475 from lukasbestle/feat/security-advisories
feat: Backup of repository security advisories
2026-01-11 14:26:39 -05:00
Lukas Bestle
b3d35f9d9f docs: Add missing --retries argument to README 2026-01-10 15:44:37 +01:00
Lukas Bestle
a175ac3ed9 test: Adapt tests to new argument 2026-01-10 11:12:42 +01:00
Lukas Bestle
9a6f0b4c21 feat: Backup of repository security advisories 2026-01-09 21:04:21 +01:00
GitHub Action
858731ebbd Release version 0.60.0 2025-12-24 00:45:01 +00:00
Jose Diaz-Gonzalez
2e999d0d3c Merge pull request #474 from mwtzzz/retry_logic
update retry logic and logging
2025-12-23 19:44:32 -05:00
michaelmartinez
44b0003ec9 updates to the tests, and fixes to the retry 2025-12-23 14:07:38 -08:00
michaelmartinez
5ab3852476 rm max_retries.py 2025-12-23 08:57:57 -08:00
michaelmartinez
8b21e2501c readme 2025-12-23 08:55:52 -08:00
michaelmartinez
f9827da342 don't use a global variable, pass the args instead 2025-12-23 08:53:54 -08:00
michaelmartinez
1f2ec016d5 readme, simplify the logic a bit 2025-12-22 16:13:12 -08:00
michaelmartinez
8b1b632d89 max_retries 5 2025-12-22 14:47:26 -08:00
michaelmartinez
89502c326d update retry logic and logging
### What
1. configurable retry count
2. additional logging

### Why
1. pass retry count as a command line arg; default 5
2. show details when api requests fail

### Testing before merge
compiles cleanly

### Validation after merge
compile and test

### Issue addressed by this PR
https://github.com/stellar/ops/issues/2039
2025-12-22 14:23:02 -08:00
GitHub Action
81a72ac8af Release version 0.59.0 2025-12-21 23:48:36 +00:00
Jose Diaz-Gonzalez
3edbfc777c Merge pull request #472 from Iamrodos/feature/108-starred-skip-size-over
Add --starred-skip-size-over flag to limit starred repo size (#108)
2025-12-21 18:47:58 -05:00
Rodos
3c43e0f481 Add --starred-skip-size-over flag to limit starred repo size (#108)
Allow users to skip starred repositories exceeding a size threshold
when using --all-starred. Size is specified in MB and checked against
the GitHub API's repository size field.

- Only affects starred repos; user's own repos always included
- Logs each skipped repo with name and size

Closes #108
2025-12-21 22:18:09 +11:00
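The size check can be sketched as below. Note one assumption not stated in the commit: the GitHub API's `size` field is reported in kilobytes, so the MB threshold is converted before comparing.

```python
def skip_for_size(repo, limit_mb):
    """True when a starred repo exceeds the size threshold.

    `repo` is the API's repository dict; `size` is assumed to be in KB.
    A None limit (flag not given) never skips anything.
    """
    if limit_mb is None:
        return False
    return repo.get("size", 0) > limit_mb * 1024
```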
Jose Diaz-Gonzalez
875f09eeaf Merge pull request #473 from Iamrodos/chore/remove-password-auth
chore: remove deprecated -u/-p password authentication options
2025-12-21 01:36:35 -05:00
Rodos
db36c3c137 chore: remove deprecated -u/-p password authentication options 2025-12-20 19:16:11 +11:00
GitHub Action
c70cc43f57 Release version 0.58.0 2025-12-16 15:17:23 +00:00
Jose Diaz-Gonzalez
27d3fcdafa Merge pull request #471 from Iamrodos/fix/retry-logic
Fix retry logic for HTTP 5xx errors and network failures
2025-12-16 10:16:48 -05:00
Rodos
46140b0ff1 Fix retry logic for HTTP 5xx errors and network failures
Refactors error handling to retry all 5xx errors (not just 502), network errors (URLError, socket.error, IncompleteRead), and JSON parse errors with exponential backoff and jitter. Respects retry-after and rate limit headers per GitHub API requirements.

Consolidates retry logic into make_request_with_retry() wrapper and adds clear logging for retry attempts and failures. Removes dead code from 2016 (errors list, _request_http_error, _request_url_error) that was intentionally disabled in commit 1e5a9048 to fix #29.

Fixes #140, #110, #138
2025-12-16 21:55:47 +11:00
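The backoff strategy the commit describes (exponential with jitter, Retry-After wins) can be sketched as (illustrative names; not the project's `make_request_with_retry()`):

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to sleep before retry number `attempt` (0-based).

    A server-supplied Retry-After header takes precedence; otherwise
    use exponential backoff with full jitter, capped at `cap`.
    """
    if retry_after is not None:
        return float(retry_after)
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```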
Jose Diaz-Gonzalez
02dd902b67 Merge pull request #470 from Iamrodos/chore/cleanup-release-requirements
chore: remove transitive deps from release-requirements.txt
2025-12-12 21:51:24 -05:00
Rodos
241949137d chore: remove transitive deps from release-requirements.txt 2025-12-13 11:22:53 +11:00
Jose Diaz-Gonzalez
1155da849d Merge pull request #469 from josegonzalez/dependabot/pip/python-packages-3c63e8caab
chore(deps): bump urllib3 from 2.6.1 to 2.6.2 in the python-packages group
2025-12-12 16:39:50 -05:00
dependabot[bot]
59a70ff11a chore(deps): bump urllib3 in the python-packages group
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).


Updates `urllib3` from 2.6.1 to 2.6.2
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.1...2.6.2)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-12 13:09:29 +00:00
GitHub Action
ba852b5830 Release version 0.57.0 2025-12-12 11:07:14 +00:00
Jose Diaz-Gonzalez
934ee4b14b Merge pull request #467 from Iamrodos/docs/187-189-auth-docs
Add GitHub Apps documentation and stdin token example
2025-12-12 06:06:30 -05:00
Jose Diaz-Gonzalez
37a0c5c123 Merge pull request #468 from Iamrodos/feature/135-skip-assets-on
Add --skip-assets-on flag to skip release asset downloads (#135)
2025-12-12 06:05:47 -05:00
Rodos
f6e2f40b09 Add --skip-assets-on flag to skip release asset downloads (#135)
Allow users to skip downloading release assets for specific repositories
while still backing up release metadata. Useful for starred repos with
large assets (e.g. syncthing with 27GB+).

Usage: --skip-assets-on repo1 repo2 owner/repo3

Features:
- Space-separated repos (consistent with --exclude)
- Case-insensitive matching
- Supports both repo name and owner/repo format
2025-12-12 16:21:52 +11:00
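The matching rules listed above (case-insensitive, bare name or owner/repo) can be sketched as (`should_skip_assets` is an illustrative name):

```python
def should_skip_assets(repo_full_name, skip_list):
    """Case-insensitively match "owner/name" against the skip list.

    Entries may be a bare repo name ("repo1") or fully qualified
    ("owner/repo3"), mirroring the flag's documented usage.
    """
    owner_repo = repo_full_name.lower()
    name = owner_repo.split("/", 1)[1]
    skips = {entry.lower() for entry in skip_list}
    return owner_repo in skips or name in skips
```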
Rodos
ef990483e2 Add GitHub Apps documentation and remove outdated header
- Add GitHub Apps authentication section with setup steps
  and CI/CD workflow example using actions/create-github-app-token
- Remove outdated machine-man-preview header (graduated 2020)

Closes #189
2025-12-12 10:25:49 +11:00
Rodos
3a513b6646 docs: add stdin token example to README
Add example showing how to pipe a token from stdin using
file:///dev/stdin to avoid storing tokens in environment
variables or command history.

Closes #187
2025-12-12 09:55:13 +11:00
GitHub Action
2bb83d6d8b Release version 0.56.0 2025-12-11 16:50:28 +00:00
Jose Diaz-Gonzalez
8fcc142621 Merge pull request #465 from Iamrodos/fix/379-lfs-clone-deprecated
fix: replace deprecated git lfs clone with git clone + git lfs fetch --all
2025-12-11 11:49:53 -05:00
Jose Diaz-Gonzalez
7615ce6102 Merge pull request #464 from Iamrodos/fix/246-restore-docs
docs: clarify no inbuilt restore and GitHub API limitations
2025-12-11 11:49:39 -05:00
Jose Diaz-Gonzalez
3f1ef821c3 Merge pull request #466 from Iamrodos/fix/112-windows-support
fix: add Windows support with entry_points and os.replace
2025-12-11 11:48:59 -05:00
Rodos
3684756eaa fix: add Windows support with entry_points and os.replace
- Replace os.rename() with os.replace() for atomic file operations
  on Windows (os.rename fails if destination exists on Windows)
- Add entry_points console_scripts for proper .exe generation on Windows
- Create github_backup/cli.py with main() entry point
- Add github_backup/__main__.py for python -m github_backup support
- Keep bin/github-backup as thin wrapper for backwards compatibility

Closes #112
2025-12-11 22:03:45 +11:00
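The os.rename()/os.replace() distinction driving this change, in miniature (`save_over` is an illustrative name):

```python
import os


def save_over(tmp, dest):
    """Move a finished temp file into place, overwriting dest.

    os.rename() raises FileExistsError on Windows when dest already
    exists; os.replace() overwrites atomically on both POSIX and
    Windows, which is why the commit swaps one for the other.
    """
    os.replace(tmp, dest)
```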
Rodos
e745b55755 fix: replace deprecated git lfs clone with git clone + git lfs fetch --all
git lfs clone is deprecated - modern git clone handles LFS automatically.
Using git lfs fetch --all ensures all LFS objects across all refs are
backed up, matching the existing bare clone behavior and providing
complete LFS backups.

Closes #379
2025-12-11 20:55:38 +11:00
Rodos
75e6f56773 docs: add "Restoring from Backup" section to README
Clarifies that this tool is backup-only with no inbuilt restore.
Documents that git repos can be pushed back, but issues/PRs have
GitHub API limitations affecting all backup tools.

Closes #246
2025-12-11 20:35:08 +11:00
Jose Diaz-Gonzalez
b991c363a0 Merge pull request #463 from josegonzalez/dependabot/pip/python-packages-9e0978b55f
chore(deps): bump urllib3 from 2.6.0 to 2.6.1 in the python-packages group
2025-12-10 09:39:07 -05:00
dependabot[bot]
6d74af9126 chore(deps): bump urllib3 in the python-packages group
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).


Updates `urllib3` from 2.6.0 to 2.6.1
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.0...2.6.1)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-09 13:10:12 +00:00
Jose Diaz-Gonzalez
381d67af96 Merge pull request #462 from josegonzalez/dependabot/pip/python-packages-3a01b12ef5
chore(deps): bump the python-packages group with 3 updates
2025-12-08 16:00:24 -05:00
dependabot[bot]
2fbe8d272c chore(deps): bump the python-packages group with 3 updates
Bumps the python-packages group with 3 updates: [black](https://github.com/psf/black), [pytest](https://github.com/pytest-dev/pytest) and [platformdirs](https://github.com/tox-dev/platformdirs).


Updates `black` from 25.11.0 to 25.12.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...25.12.0)

Updates `pytest` from 9.0.1 to 9.0.2
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.1...9.0.2)

Updates `platformdirs` from 4.5.0 to 4.5.1
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.5.0...4.5.1)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.12.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
- dependency-name: pytest
  dependency-version: 9.0.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
- dependency-name: platformdirs
  dependency-version: 4.5.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-08 13:09:32 +00:00
GitHub Action
eb5779ac23 Release version 0.55.0 2025-12-07 13:59:35 +00:00
Jose Diaz-Gonzalez
5b52931ebf Merge pull request #461 from Iamrodos/fix-cli-ux-and-cleanup
fix: CLI UX improvements and cleanup
2025-12-07 08:58:59 -05:00
Rodos
1d6d474408 fix: improve error messages for inaccessible repos and empty wikis 2025-12-07 21:50:49 +11:00
Rodos
b80049e96e test: add missing test coverage for case sensitivity fix 2025-12-07 21:21:37 +11:00
Rodos
58ad1c2378 docs: fix RST formatting in Known blocking errors section 2025-12-07 21:21:26 +11:00
Rodos
6e2a7e521c fix: --all-starred now clones repos without --repositories 2025-12-07 21:21:14 +11:00
Rodos
aba048a3e9 fix: warn when --private used without authentication 2025-12-07 21:20:54 +11:00
Jose Diaz-Gonzalez
9f7c08166f Merge pull request #460 from josegonzalez/dependabot/pip/urllib3-2.6.0
chore(deps): bump urllib3 from 2.5.0 to 2.6.0
2025-12-06 22:23:09 -05:00
dependabot[bot]
fdfaaec1ba chore(deps): bump urllib3 from 2.5.0 to 2.6.0
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-06 04:51:42 +00:00
Jose Diaz-Gonzalez
8f9cf7ff89 Merge pull request #459 from Iamrodos/issue-93-starred-gists-warning
fix: warn and skip when --starred-gists used for different user
2025-12-03 23:07:29 -05:00
Rodos
899ab5fdc2 fix: warn and skip when --starred-gists used for different user
GitHub's API only allows retrieving starred gists for the authenticated
user. Previously, using --starred-gists when backing up a different user
would silently return no relevant data.

Now warns and skips the retrieval entirely when the target user differs
from the authenticated user. Uses case-insensitive comparison to match
GitHub's username handling.

Fixes #93
2025-12-04 10:07:43 +11:00
GitHub Action
2a9d86a6bf Release version 0.54.0 2025-12-03 02:17:59 +00:00
Jose Diaz-Gonzalez
4fd3ea9e3c Merge pull request #457 from Iamrodos/readme-updates
docs: update README testing section and add fetch vs pull explanation
2025-12-02 21:15:33 -05:00
Jose Diaz-Gonzalez
041dc013f9 Merge pull request #458 from Iamrodos/fix-logging
fix: send INFO/DEBUG to stdout, WARNING/ERROR to stderr
2025-12-02 21:14:49 -05:00
Rodos
12802103c4 fix: send INFO/DEBUG to stdout, WARNING/ERROR to stderr
Fixes #182
2025-12-01 16:11:11 +11:00
Rodos
bf28b46954 docs: update README testing section and add fetch vs pull explanation 2025-12-01 15:55:00 +11:00
GitHub Action
ff2681e196 Release version 0.53.0 2025-11-30 04:30:48 +00:00
Jose Diaz-Gonzalez
745b05a63f Merge pull request #456 from Iamrodos/fix-case
fix: case-sensitive username filtering causing silent backup failures
2025-11-29 23:30:07 -05:00
Jose Diaz-Gonzalez
83ff0ae1dd Merge pull request #455 from Iamrodos/fix-133
Avoid rewriting unchanged JSON files for labels, milestones, releases…
2025-11-29 23:29:30 -05:00
Rodos
6ad1959d43 fix: case-sensitive username filtering causing silent backup failures
GitHub's API accepts usernames in any case but returns canonical case.
The case-sensitive comparison in filter_repositories() filtered out all
repositories when user-provided case didn't match GitHub's canonical case.

Changed to case-insensitive comparison.

Fixes #198
2025-11-29 21:16:22 +11:00
Rodos
5739ac0745 Avoid rewriting unchanged JSON files for labels, milestones, releases, hooks, followers, and following
This change reduces unnecessary writes when backing up metadata that changes
infrequently. The implementation compares existing file content before writing
and skips the write if the content is identical, preserving file timestamps.

Key changes:
- Added json_dump_if_changed() helper that compares content before writing
- Uses atomic writes (temp file + rename) for all metadata files
- NOT applied to issues/pulls (they use incremental_by_files logic)
- Made log messages consistent and past tense ("Saved" instead of "Saving")
- Added informative logging showing skip counts

Fixes #133
2025-11-29 17:21:14 +11:00
GitHub Action
8b7512c8d8 Release version 0.52.0 2025-11-28 23:39:09 +00:00
Jose Diaz-Gonzalez
995b7ede6c Merge pull request #454 from Iamrodos/http-451
Skip DMCA'd repos which return a 451 response
2025-11-28 18:38:32 -05:00
Rodos
7840528fe2 Skip DMCA'd repos which return a 451 response
Log a warning and the link to the DMCA notice. Continue backing up
other repositories instead of crashing.

Closes #163
2025-11-29 09:52:02 +11:00
Jose Diaz-Gonzalez
6fb0d86977 Merge pull request #453 from josegonzalez/dependabot/pip/python-packages-42260fba7a
chore(deps): bump restructuredtext-lint from 1.4.0 to 2.0.2 in the python-packages group
2025-11-24 15:07:08 -05:00
dependabot[bot]
9f6b401171 chore(deps): bump restructuredtext-lint in the python-packages group
Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint).


Updates `restructuredtext-lint` from 1.4.0 to 2.0.2
- [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2)

---
updated-dependencies:
- dependency-name: restructuredtext-lint
  dependency-version: 2.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-24 14:58:52 +00:00
27 changed files with 5502 additions and 561 deletions


@@ -43,13 +43,13 @@ jobs:
persist-credentials: false
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
uses: docker/setup-qemu-action@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
uses: docker/setup-buildx-action@v4
- name: Log in to the Container registry
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
@@ -57,7 +57,7 @@ jobs:
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v5
uses: docker/metadata-action@v6
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
@@ -68,7 +68,7 @@ jobs:
type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', 'main') }}
- name: Build and push Docker image
uses: docker/build-push-action@v6
uses: docker/build-push-action@v7
with:
context: .
push: true


@@ -1,9 +1,690 @@
Changelog
=========
0.51.3 (2025-11-18)
0.62.0 (2026-04-29)
-------------------
------------------------
- Skip checkpoint-equal incremental items. [Duncan Ogilvie]
- Avoid redundant release asset list requests. [Duncan Ogilvie]
- Reduce unnecessary pull requests with incremental fetching. [Duncan
Ogilvie]
- Implement per-resource last_update timestamps. [Duncan Ogilvie]
Closes #62
- Add support for pull request reviews. [Duncan Ogilvie]
Closes #124
- Add support for discussions. [Duncan Ogilvie]
Closes #290
- Add --token-from-gh authentication option. [Duncan Ogilvie]
- Chore(deps): bump pytest in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [pytest](https://github.com/pytest-dev/pytest).
Updates `pytest` from 9.0.2 to 9.0.3
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.2...9.0.3)
---
updated-dependencies:
- dependency-name: pytest
dependency-version: 9.0.3
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
- Chore(deps): bump black in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [black](https://github.com/psf/black).
Updates `black` from 26.3.0 to 26.3.1
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/26.3.0...26.3.1)
---
updated-dependencies:
- dependency-name: black
dependency-version: 26.3.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
- Chore(deps): bump docker/login-action from 3 to 4. [dependabot[bot]]
Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/v3...v4)
---
updated-dependencies:
- dependency-name: docker/login-action
dependency-version: '4'
dependency-type: direct:production
update-type: version-update:semver-major
...
- Chore(deps): bump docker/setup-qemu-action from 3 to 4.
[dependabot[bot]]
Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3 to 4.
- [Release notes](https://github.com/docker/setup-qemu-action/releases)
- [Commits](https://github.com/docker/setup-qemu-action/compare/v3...v4)
---
updated-dependencies:
- dependency-name: docker/setup-qemu-action
dependency-version: '4'
dependency-type: direct:production
update-type: version-update:semver-major
...
- Chore(deps): bump docker/build-push-action from 6 to 7.
[dependabot[bot]]
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v6...v7)
---
updated-dependencies:
- dependency-name: docker/build-push-action
dependency-version: '7'
dependency-type: direct:production
update-type: version-update:semver-major
...
- Chore(deps): bump docker/setup-buildx-action from 3 to 4.
[dependabot[bot]]
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](https://github.com/docker/setup-buildx-action/compare/v3...v4)
---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
dependency-version: '4'
dependency-type: direct:production
update-type: version-update:semver-major
...
- Chore(deps): bump docker/metadata-action from 5 to 6.
[dependabot[bot]]
Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6.
- [Release notes](https://github.com/docker/metadata-action/releases)
- [Commits](https://github.com/docker/metadata-action/compare/v5...v6)
---
updated-dependencies:
- dependency-name: docker/metadata-action
dependency-version: '6'
dependency-type: direct:production
update-type: version-update:semver-major
...
- Chore(deps): bump the python-packages group with 2 updates.
[dependabot[bot]]
Bumps the python-packages group with 2 updates: [black](https://github.com/psf/black) and [setuptools](https://github.com/pypa/setuptools).
Updates `black` from 26.1.0 to 26.3.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/26.1.0...26.3.0)
Updates `setuptools` from 82.0.0 to 82.0.1
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v82.0.0...v82.0.1)
---
updated-dependencies:
- dependency-name: black
dependency-version: 26.3.0
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
- dependency-name: setuptools
dependency-version: 82.0.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
0.61.5 (2026-02-18)
-------------------
- Fix empty repository crash due to None timestamp comparison (#489)
[Rodos]
Empty repositories have None for pushed_at/updated_at, causing a
TypeError when compared to the last_update string. Use .get() with
truthiness check to skip None timestamps in incremental tracking.
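The None-safe check described above can be sketched as follows (the function name and field handling are illustrative, not the project's exact code):

```python
def newest_timestamp(repositories, last_update):
    """Return the newest repository timestamp, skipping None values.

    Empty repositories report pushed_at/updated_at as None, which
    raises TypeError when compared against an ISO-8601 string. Using
    .get() with a truthiness check skips them entirely.
    """
    newest = last_update
    for repo in repositories:
        ts = repo.get("pushed_at") or repo.get("updated_at")
        if ts and ts > newest:  # ISO-8601 strings sort lexicographically
            newest = ts
    return newest
```

ISO-8601 timestamps in UTC compare correctly as plain strings, which is why no datetime parsing is needed here.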
0.61.4 (2026-02-16)
-------------------
- Fix HTTP 451 DMCA and 403 TOS handling regression (#487) [Rodos]
The DMCA handling added in PR #454 had a bug: make_request_with_retry()
raises HTTPError before retrieve_data() could check the status code via
getcode(), making the case 451 handler dead code. This also affected
HTTP 403 TOS violations (e.g. jumoog/MagiskOnWSA).
Fix by catching HTTPError in retrieve_data() and converting 451 and
blocked 403 responses (identified by "block" key in response body) to
RepositoryUnavailableError. Non-block 403s (permissions, scopes) still
propagate as HTTPError. Also handle RepositoryUnavailableError in
retrieve_repositories() for the --repository case.
Rewrote tests to mock urlopen (not make_request_with_retry) to exercise
the real code path that was previously untested.
Closes #487
- Chore(deps): bump setuptools in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools).
Updates `setuptools` from 80.10.2 to 82.0.0
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v80.10.2...v82.0.0)
---
updated-dependencies:
- dependency-name: setuptools
dependency-version: 82.0.0
dependency-type: direct:production
update-type: version-update:semver-major
dependency-group: python-packages
...
- Chore(deps): bump setuptools in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools).
Updates `setuptools` from 80.10.1 to 80.10.2
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v80.10.1...v80.10.2)
---
updated-dependencies:
- dependency-name: setuptools
dependency-version: 80.10.2
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
0.61.3 (2026-01-24)
-------------------
- Fix KeyError: 'Private' when using --all flag (#481) [Rodos]
The repository dictionary uses lowercase "private" key. Use .get() with
the correct case to match the pattern used elsewhere in the codebase.
The bug only affects --all users since --security-advisories short-circuits
before the key access.
- Chore(deps): bump setuptools in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools).
Updates `setuptools` from 80.9.0 to 80.10.1
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v80.9.0...v80.10.1)
---
updated-dependencies:
- dependency-name: setuptools
dependency-version: 80.10.1
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
...
0.61.2 (2026-01-19)
-------------------
Fix
~~~
- Skip security advisories for private repos unless explicitly
requested. [Lukas Bestle]
- Handle 404 errors on security advisories. [Lukas Bestle]
Other
~~~~~
- Chore(deps): bump black in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [black](https://github.com/psf/black).
Updates `black` from 25.12.0 to 26.1.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.12.0...26.1.0)
---
updated-dependencies:
- dependency-name: black
dependency-version: 26.1.0
dependency-type: direct:production
update-type: version-update:semver-major
dependency-group: python-packages
...
- Docs: Explain security advisories in README. [Lukas Bestle]
- Feat: Only make security advisory dir if successful. [Lukas Bestle]
Avoids empty directories for private repos
0.61.1 (2026-01-13)
-------------------
- Refactor test fixtures to use shared create_args helper. [Rodos]
Uses the real parse_args() function to get CLI defaults, so when
new arguments are added they're automatically available to all tests.
Changes:
- Add tests/conftest.py with create_args fixture
- Update 8 test files to use shared fixture
- Remove duplicate _create_mock_args methods
- Remove redundant @pytest.fixture mock_args definitions
This eliminates the need to update multiple test files when
adding new CLI arguments.
- Fix fine-grained PAT attachment downloads for private repos (#477)
[Rodos]
Fine-grained personal access tokens cannot download attachments from
private repositories directly due to a GitHub platform limitation.
This adds a workaround for image attachments (/assets/ URLs) using
GitHub's Markdown API to convert URLs to JWT-signed URLs that can be
downloaded without authentication.
Changes:
- Add get_jwt_signed_url_via_markdown_api() function
- Detect fine-grained token + private repo + /assets/ URL upfront
- Use JWT workaround for those cases, mark success with jwt_workaround flag
- Skip download with skipped_at when workaround fails
- Add startup warning when using --attachments with fine-grained tokens
- Document limitation in README (file attachments still fail)
- Add 6 unit tests for JWT workaround logic
0.61.0 (2026-01-12)
-------------------
- Docs: Add missing `--retries` argument to README. [Lukas Bestle]
- Test: Adapt tests to new argument. [Lukas Bestle]
- Feat: Backup of repository security advisories. [Lukas Bestle]
0.60.0 (2025-12-24)
-------------------
- Rm max_retries.py. [michaelmartinez]
- Readme. [michaelmartinez]
- Don't use a global variable, pass the args instead. [michaelmartinez]
- Readme, simplify the logic a bit. [michaelmartinez]
- Max_retries 5. [michaelmartinez]
0.59.0 (2025-12-21)
-------------------
- Add --starred-skip-size-over flag to limit starred repo size (#108)
[Rodos]
Allow users to skip starred repositories exceeding a size threshold
when using --all-starred. Size is specified in MB and checked against
the GitHub API's repository size field.
- Only affects starred repos; user's own repos always included
- Logs each skipped repo with name and size
Closes #108
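The size check described above can be sketched as follows (names are illustrative; the GitHub API reports the repository `size` field in kilobytes):

```python
def filter_starred_by_size(starred, own_names, limit_mb):
    """Skip starred repositories whose API-reported size exceeds the
    threshold; the user's own repositories are never skipped."""
    kept = []
    for repo in starred:
        if repo["full_name"] in own_names or repo.get("size", 0) <= limit_mb * 1024:
            kept.append(repo)
        # a real implementation would also log each skipped repo's
        # name and size, as the changelog entry describes
    return kept
```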
- Chore: remove deprecated -u/-p password authentication options.
[Rodos]
0.58.0 (2025-12-16)
-------------------
- Fix retry logic for HTTP 5xx errors and network failures. [Rodos]
Refactors error handling to retry all 5xx errors (not just 502),
network errors (URLError, socket.error, IncompleteRead), and JSON
parse errors with exponential backoff and jitter. Respects
retry-after and rate limit headers per GitHub API requirements.
Consolidates retry logic into make_request_with_retry() wrapper and
adds clear logging for retry attempts and failures. Removes dead
code from 2016 (errors list, _request_http_error,
_request_url_error) that was intentionally disabled in commit
1e5a9048 to fix #29.
Fixes #140, #110, #138
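The backoff policy described above can be sketched like this (the constants are illustrative; the real wrapper also honors retry-after and rate-limit headers before falling back to this delay):

```python
import random


def backoff_delay(attempt, base_delay=1.0):
    # exponential backoff with up to one second of random jitter,
    # so simultaneous clients don't retry in lockstep
    return base_delay * (2 ** attempt) + random.uniform(0, 1)


def should_retry(status_code):
    # retry every 5xx response, not just 502
    return 500 <= status_code <= 599
```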
- Chore: remove transitive deps from release-requirements.txt. [Rodos]
- Chore(deps): bump urllib3 in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).
Updates `urllib3` from 2.6.1 to 2.6.2
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.1...2.6.2)
---
updated-dependencies:
- dependency-name: urllib3
dependency-version: 2.6.2
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
0.57.0 (2025-12-12)
-------------------
- Add GitHub Apps documentation and remove outdated header. [Rodos]
- Add GitHub Apps authentication section with setup steps
and CI/CD workflow example using actions/create-github-app-token
- Remove outdated machine-man-preview header (graduated 2020)
Closes #189
- Docs: add stdin token example to README. [Rodos]
Add example showing how to pipe a token from stdin using
file:///dev/stdin to avoid storing tokens in environment
variables or command history.
Closes #187
- Add --skip-assets-on flag to skip release asset downloads (#135)
[Rodos]
Allow users to skip downloading release assets for specific repositories
while still backing up release metadata. Useful for starred repos with
large assets (e.g. syncthing with 27GB+).
Usage: --skip-assets-on repo1 repo2 owner/repo3
Features:
- Space-separated repos (consistent with --exclude)
- Case-insensitive matching
- Supports both repo name and owner/repo format
0.56.0 (2025-12-11)
-------------------
Fix
~~~
- Replace deprecated git lfs clone with git clone + git lfs fetch --all.
[Rodos]
git lfs clone is deprecated - modern git clone handles LFS automatically.
Using git lfs fetch --all ensures all LFS objects across all refs are
backed up, matching the existing bare clone behavior and providing
complete LFS backups.
Closes #379
- Add Windows support with entry_points and os.replace. [Rodos]
- Replace os.rename() with os.replace() for atomic file operations
on Windows (os.rename fails if destination exists on Windows)
- Add entry_points console_scripts for proper .exe generation on Windows
- Create github_backup/cli.py with main() entry point
- Add github_backup/__main__.py for python -m github_backup support
- Keep bin/github-backup as thin wrapper for backwards compatibility
Closes #112
Other
~~~~~
- Docs: add "Restoring from Backup" section to README. [Rodos]
Clarifies that this tool is backup-only with no inbuilt restore.
Documents that git repos can be pushed back, but issues/PRs have
GitHub API limitations affecting all backup tools.
Closes #246
- Chore(deps): bump urllib3 in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).
Updates `urllib3` from 2.6.0 to 2.6.1
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.0...2.6.1)
---
updated-dependencies:
- dependency-name: urllib3
dependency-version: 2.6.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
- Chore(deps): bump the python-packages group with 3 updates.
[dependabot[bot]]
Bumps the python-packages group with 3 updates: [black](https://github.com/psf/black), [pytest](https://github.com/pytest-dev/pytest) and [platformdirs](https://github.com/tox-dev/platformdirs).
Updates `black` from 25.11.0 to 25.12.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...25.12.0)
Updates `pytest` from 9.0.1 to 9.0.2
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.1...9.0.2)
Updates `platformdirs` from 4.5.0 to 4.5.1
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.5.0...4.5.1)
---
updated-dependencies:
- dependency-name: black
dependency-version: 25.12.0
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
- dependency-name: pytest
dependency-version: 9.0.2
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
- dependency-name: platformdirs
dependency-version: 4.5.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
0.55.0 (2025-12-07)
-------------------
Fix
~~~
- Improve error messages for inaccessible repos and empty wikis. [Rodos]
- --all-starred now clones repos without --repositories. [Rodos]
- Warn when --private used without authentication. [Rodos]
- Warn and skip when --starred-gists used for different user. [Rodos]
GitHub's API only allows retrieving starred gists for the authenticated
user. Previously, using --starred-gists when backing up a different user
would silently return no relevant data.
Now warns and skips the retrieval entirely when the target user differs
from the authenticated user. Uses case-insensitive comparison to match
GitHub's username handling.
Fixes #93
Other
~~~~~
- Test: add missing test coverage for case sensitivity fix. [Rodos]
- Docs: fix RST formatting in Known blocking errors section. [Rodos]
- Chore(deps): bump urllib3 from 2.5.0 to 2.6.0. [dependabot[bot]]
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0)
---
updated-dependencies:
- dependency-name: urllib3
dependency-version: 2.6.0
dependency-type: direct:production
...
0.54.0 (2025-12-03)
-------------------
Fix
~~~
- Send INFO/DEBUG to stdout, WARNING/ERROR to stderr. [Rodos]
Fixes #182
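The routing described above can be sketched with two handlers on the root logger (a minimal illustration, not the project's exact setup): records below WARNING go to stdout, WARNING and above to stderr, so piped output stays machine-readable while problems still surface on the terminal.

```python
import logging
import sys


def configure_logging(level=logging.INFO):
    stdout_handler = logging.StreamHandler(sys.stdout)
    # only let records below WARNING through to stdout
    stdout_handler.addFilter(lambda record: record.levelno < logging.WARNING)
    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(logging.WARNING)
    root = logging.getLogger()
    root.setLevel(level)
    root.handlers[:] = [stdout_handler, stderr_handler]
    return stdout_handler, stderr_handler
```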
Other
~~~~~
- Docs: update README testing section and add fetch vs pull explanation.
[Rodos]
0.53.0 (2025-11-30)
-------------------
Fix
~~~
- Case-sensitive username filtering causing silent backup failures.
[Rodos]
GitHub's API accepts usernames in any case but returns canonical case.
The case-sensitive comparison in filter_repositories() filtered out all
repositories when user-provided case didn't match GitHub's canonical case.
Changed to case-insensitive comparison.
Fixes #198
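The fix can be sketched as follows (filter_repositories() is named in the entry above, but its signature and the repository fields used here are illustrative):

```python
def filter_repositories(repositories, username):
    # GitHub accepts usernames in any case but returns canonical case,
    # so an exact == comparison silently drops every repository when
    # the user-supplied case differs. Compare case-insensitively.
    wanted = username.casefold()
    return [
        repo for repo in repositories
        if repo["owner"]["login"].casefold() == wanted
    ]
```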
Other
~~~~~
- Avoid rewriting unchanged JSON files for labels, milestones, releases,
hooks, followers, and following. [Rodos]
This change reduces unnecessary writes when backing up metadata that changes
infrequently. The implementation compares existing file content before writing
and skips the write if the content is identical, preserving file timestamps.
Key changes:
- Added json_dump_if_changed() helper that compares content before writing
- Uses atomic writes (temp file + rename) for all metadata files
- NOT applied to issues/pulls (they use incremental_by_files logic)
- Made log messages consistent and past tense ("Saved" instead of "Saving")
- Added informative logging showing skip counts
Fixes #133
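The helper described above might look like this (a minimal sketch of the compare-before-write and temp-file-plus-rename pattern; serialization options and the exact name are assumptions):

```python
import json
import os
import tempfile


def json_dump_if_changed(data, path):
    """Skip the write when serialized content is unchanged (preserving
    the file timestamp); otherwise write atomically via temp file +
    os.replace. Returns True if the file was written."""
    new_content = json.dumps(data, indent=2, sort_keys=True)
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == new_content:
                return False  # identical content: leave the file alone
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(new_content)
        os.replace(tmp, path)  # atomic rename into place
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

Writing to a temp file in the same directory and renaming means a crash mid-write can never leave a truncated JSON file behind.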
0.52.0 (2025-11-28)
-------------------
- Skip DMCA'd repos which return a 451 response. [Rodos]
Log a warning and the link to the DMCA notice. Continue backing up
other repositories instead of crashing.
Closes #163
- Chore(deps): bump restructuredtext-lint in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint).
Updates `restructuredtext-lint` from 1.4.0 to 2.0.2
- [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2)
---
updated-dependencies:
- dependency-name: restructuredtext-lint
dependency-version: 2.0.2
dependency-type: direct:production
update-type: version-update:semver-major
dependency-group: python-packages
...
- Chore(deps): bump actions/checkout from 5 to 6. [dependabot[bot]]
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)
---
updated-dependencies:
- dependency-name: actions/checkout
dependency-version: '6'
dependency-type: direct:production
update-type: version-update:semver-major
...
- Chore(deps): bump the python-packages group with 3 updates.
[dependabot[bot]]
Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring).
Updates `click` from 8.3.0 to 8.3.1
- [Release notes](https://github.com/pallets/click/releases)
- [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1)
Updates `pytest` from 8.3.3 to 9.0.1
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1)
Updates `keyring` from 25.6.0 to 25.7.0
- [Release notes](https://github.com/jaraco/keyring/releases)
- [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst)
- [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0)
---
updated-dependencies:
- dependency-name: click
dependency-version: 8.3.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
- dependency-name: pytest
dependency-version: 9.0.1
dependency-type: direct:production
update-type: version-update:semver-major
dependency-group: python-packages
- dependency-name: keyring
dependency-version: 25.7.0
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
...
0.51.3 (2025-11-18)
-------------------
- Test: Add pagination tests for cursor and page-based Link headers.
[Rodos]
- Use cursor based pagination. [Helio Machado]


@@ -4,7 +4,7 @@ github-backup
|PyPI| |Python Versions|
The package can be used to backup an *entire* `Github <https://github.com/>`_ organization, repository or user account, including starred repos, issues and wikis in the most appropriate format (clones for wikis, json files for issues).
The package can be used to backup an *entire* `Github <https://github.com/>`_ organization, repository or user account, including starred repos, issues, discussions and wikis in the most appropriate format (clones for wikis, json files for issues and discussions).
Requirements
============
@@ -36,23 +36,28 @@ Show the CLI help output::
CLI Help output::
github-backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN_CLASSIC]
[-f TOKEN_FINE] [--as-app] [-o OUTPUT_DIRECTORY]
[-l LOG_LEVEL] [-i] [--starred] [--all-starred]
[--watched] [--followers] [--following] [--all] [--issues]
[--issue-comments] [--issue-events] [--pulls]
[--pull-comments] [--pull-commits] [--pull-details]
[--labels] [--hooks] [--milestones] [--repositories]
[--bare] [--lfs] [--wikis] [--gists] [--starred-gists]
[--skip-archived] [--skip-existing] [-L [LANGUAGES ...]]
[-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY]
[-P] [-F] [--prefer-ssh] [-v]
github-backup [-h] [-t TOKEN_CLASSIC] [-f TOKEN_FINE] [--token-from-gh]
[-q] [--as-app] [-o OUTPUT_DIRECTORY] [-l LOG_LEVEL] [-i]
[--incremental-by-files]
[--starred] [--all-starred] [--starred-skip-size-over MB]
[--watched] [--followers] [--following] [--all]
[--issues] [--issue-comments] [--issue-events] [--pulls]
[--pull-comments] [--pull-reviews] [--pull-commits]
[--pull-details]
[--labels] [--hooks] [--milestones] [--security-advisories]
[--discussions] [--repositories] [--bare] [--no-prune]
[--lfs] [--wikis] [--gists] [--starred-gists]
[--skip-archived] [--skip-existing]
[-L [LANGUAGES ...]] [-N NAME_REGEX] [-H GITHUB_HOST]
[-O] [-R REPOSITORY] [-P] [-F] [--prefer-ssh] [-v]
[--keychain-name OSX_KEYCHAIN_ITEM_NAME]
[--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT]
[--releases] [--latest-releases NUMBER_OF_LATEST_RELEASES]
[--skip-prerelease] [--assets] [--attachments]
[--exclude [REPOSITORY [REPOSITORY ...]]
[--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE]
[--skip-prerelease] [--assets]
[--skip-assets-on [SKIP_ASSETS_ON ...]] [--attachments]
[--throttle-limit THROTTLE_LIMIT]
[--throttle-pause THROTTLE_PAUSE]
[--exclude [EXCLUDE ...]] [--retries MAX_RETRIES]
USER
Backup a github account
@@ -60,29 +65,30 @@ CLI Help output::
positional arguments:
USER github username
optional arguments:
options:
-h, --help show this help message and exit
-u USERNAME, --username USERNAME
username for basic auth
-p PASSWORD, --password PASSWORD
password for basic auth. If a username is given but
not a password, the password will be prompted for.
-f TOKEN_FINE, --token-fine TOKEN_FINE
fine-grained personal access token or path to token
(file://...)
-t TOKEN_CLASSIC, --token TOKEN_CLASSIC
-t, --token TOKEN_CLASSIC
personal access, OAuth, or JSON Web token, or path to
token (file://...)
-f, --token-fine TOKEN_FINE
fine-grained personal access token (github_pat_....),
or path to token (file://...)
--token-from-gh read token from GitHub CLI (gh auth token)
-q, --quiet suppress log messages less severe than warning, e.g.
info
--as-app authenticate as github app instead of as a user.
-o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
-o, --output-directory OUTPUT_DIRECTORY
directory at which to backup the repositories
-l LOG_LEVEL, --log-level LOG_LEVEL
-l, --log-level LOG_LEVEL
log level to use (default: info, possible levels:
debug, info, warning, error, critical)
-i, --incremental incremental backup
--incremental-by-files incremental backup using modified time of files
--incremental-by-files
incremental backup based on modification date of files
--starred include JSON output of starred repositories in backup
--all-starred include starred repositories in backup [*]
--starred-skip-size-over MB
skip starred repositories larger than this size in MB
--watched include JSON output of watched repositories in backup
--followers include JSON output of followers in backup
--following include JSON output of following users in backup
@@ -92,28 +98,34 @@ CLI Help output::
--issue-events include issue events in backup
--pulls include pull requests in backup
--pull-comments include pull request review comments in backup
--pull-reviews include pull request reviews in backup
--pull-commits include pull request commits in backup
--pull-details include more pull request details in backup [*]
--labels include labels in backup
--hooks include hooks in backup (works only when
authenticated)
--milestones include milestones in backup
--security-advisories
include security advisories in backup
--discussions include discussions in backup
--repositories include repository clone in backup
--bare clone bare repositories
--no-prune disable prune option for git fetch
--lfs clone LFS repositories (requires Git LFS to be
installed, https://git-lfs.github.com) [*]
--wikis include wiki clone in backup
--gists include gists in backup [*]
--starred-gists include starred gists in backup [*]
--skip-archived skip project if it is archived
--skip-existing skip project if a backup directory exists
-L [LANGUAGES [LANGUAGES ...]], --languages [LANGUAGES [LANGUAGES ...]]
-L, --languages [LANGUAGES ...]
only allow these languages
-N NAME_REGEX, --name-regex NAME_REGEX
-N, --name-regex NAME_REGEX
python regex to match names against
-H GITHUB_HOST, --github-host GITHUB_HOST
-H, --github-host GITHUB_HOST
GitHub Enterprise hostname
-O, --organization whether or not this is an organization user
-R REPOSITORY, --repository REPOSITORY
-R, --repository REPOSITORY
name of repository to limit backup to
-P, --private include private repositories [*]
-F, --fork include forked repositories [*]
@@ -128,16 +140,16 @@ CLI Help output::
--releases include release information, not including assets or
binaries
--latest-releases NUMBER_OF_LATEST_RELEASES
include certain number of the latest releases;
only applies if including releases
--skip-prerelease skip prerelease and draft versions; only applies if including releases
include certain number of the latest releases; only
applies if including releases
--skip-prerelease skip prerelease and draft versions; only applies if
including releases
--assets include assets alongside release information; only
applies if including releases
--exclude [REPOSITORY [REPOSITORY ...]]
names of repositories to exclude from backup.
--skip-assets-on [SKIP_ASSETS_ON ...]
skip asset downloads for these repositories
--attachments download user-attachments from issues, pull requests,
and discussions
--throttle-limit THROTTLE_LIMIT
start throttling of GitHub API requests after this
amount of API requests remain
wait this amount of seconds when API request
throttling is active (default: 30.0, requires
--throttle-limit to be set)
--retries MAX_RETRIES
maximum number of retries for API calls (default: 5)
Usage Details
=============
Authentication
--------------
GitHub requires token-based authentication for API access. Password authentication was `removed in November 2020 <https://developer.github.com/changes/2020-02-14-deprecating-password-auth/>`_.
The positional argument ``USER`` specifies the user or organization account you wish to back up.
**Fine-grained tokens** (``-f TOKEN_FINE``) are recommended for most use cases, especially long-running backups (e.g. cron jobs), as they provide precise permission control.
**Classic tokens** (``-t TOKEN``) are `slightly less secure <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#personal-access-tokens-classic>`_ as they provide very coarse-grained permissions.
If you already authenticate with the `GitHub CLI <https://cli.github.com/>`_, you can use ``--token-from-gh`` to read the token with ``gh auth token`` instead of passing a token directly. This avoids placing the token in shell history or process arguments. When ``--github-host`` is set, the token is read with ``gh auth token --hostname HOST``.
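The command construction behind ``--token-from-gh`` can be sketched as follows. ``gh_token_command`` is a hypothetical helper name for illustration, not the tool's actual function; the real implementation runs the command via a subprocess and reads the token from stdout, keeping it out of shell history and process arguments.

```python
def gh_token_command(github_host=None):
    """Build the `gh auth token` invocation used by --token-from-gh.

    When --github-host is set, the token is requested for that
    GitHub Enterprise hostname instead of github.com.
    """
    cmd = ["gh", "auth", "token"]
    if github_host:
        cmd += ["--hostname", github_host]
    return cmd
```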
Fine Tokens
~~~~~~~~~~~

Customise the permissions for your use case, but for a personal account full backup:
**User permissions**: Read access to followers, starring, and watching.
**Repository permissions**: Read access to contents, discussions, issues, metadata, pull requests, and webhooks.
GitHub Apps
~~~~~~~~~~~
GitHub Apps are ideal for organization backups in CI/CD. Tokens are scoped to specific repositories and expire after 1 hour.
**One-time setup:**
1. Create a GitHub App at *Settings -> Developer Settings -> GitHub Apps -> New GitHub App*
2. Set a name and homepage URL (can be any URL)
3. Uncheck "Webhook > Active" (not needed for backups)
4. Set permissions (same as fine-grained tokens above)
5. Click "Create GitHub App", then note the **App ID** shown on the next page
6. Under "Private keys", click "Generate a private key" and save the downloaded file
7. Go to *Install App* in your app's settings
8. Select the account/organization and which repositories to back up
**CI/CD usage with GitHub Actions:**
Store the App ID as a repository variable and the private key contents as a secret, then use ``actions/create-github-app-token``::
   - uses: actions/create-github-app-token@v1
     id: app-token
     with:
       app-id: ${{ vars.APP_ID }}
       private-key: ${{ secrets.APP_PRIVATE_KEY }}

   - run: github-backup myorg -t ${{ steps.app-token.outputs.token }} --as-app -o ./backup --all
Note: Installation tokens expire after 1 hour. For long-running backups, use a fine-grained personal access token instead.
Prefer SSH
When you use the ``--lfs`` option, you will need to make sure you have Git LFS installed.
Instructions on how to do this can be found on https://git-lfs.github.com.
LFS objects are fetched for all refs, not just the current checkout, ensuring a complete backup of all LFS content across all branches and history.
About Attachments
-----------------
When you use the ``--attachments`` option with ``--issues``, ``--pulls`` or ``--discussions``, the tool will download user-uploaded attachments (images, videos, documents, etc.) from issue, pull request and discussion descriptions and comments. In some circumstances attachments contain valuable data related to the topic, and without their backup important information or context might be lost inadvertently.
Attachments are saved to ``issues/attachments/{issue_number}/``, ``pulls/attachments/{pull_number}/`` and ``discussions/attachments/{discussion_number}/`` directories, where ``{issue_number}`` is the GitHub issue number (e.g., issue #123 saves to ``issues/attachments/123/``). Each attachment directory contains:
- The downloaded attachment files (named by their GitHub identifier with appropriate file extensions)
- If multiple attachments have the same filename, conflicts are resolved with numeric suffixes (e.g., ``report.pdf``, ``report_1.pdf``, ``report_2.pdf``)
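The suffixing scheme above can be sketched like this. ``resolve_conflict`` is a hypothetical helper name for illustration, not the tool's actual function:

```python
import os

def resolve_conflict(filename, taken):
    """Return a filename that does not collide with already-saved attachments.

    `taken` is the set of names already used in the attachment directory;
    conflicts get numeric suffixes: report.pdf, report_1.pdf, report_2.pdf, ...
    """
    if filename not in taken:
        return filename
    stem, ext = os.path.splitext(filename)
    counter = 1
    while f"{stem}_{counter}{ext}" in taken:
        counter += 1
    return f"{stem}_{counter}{ext}"
```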
The tool automatically extracts file extensions from HTTP headers.
**Repository filtering** for repo files/assets handles renamed and transferred repositories gracefully. URLs are included if they either match the current repository name directly, or redirect to it (e.g., ``willmcgugan/rich`` redirects to ``Textualize/rich`` after transfer).
**Fine-grained token limitation:** Due to a GitHub platform limitation, fine-grained personal access tokens (``github_pat_...``) cannot download attachments from private repositories directly. This affects both ``/assets/`` (images) and ``/files/`` (documents) URLs. The tool implements a workaround for image attachments using GitHub's Markdown API, which converts URLs to temporary JWT-signed URLs that can be downloaded. However, this workaround only works for images - document attachments (PDFs, text files, etc.) will fail with 404 errors when using fine-grained tokens on private repos. For full attachment support on private repositories, use a classic token (``-t``) instead of a fine-grained token (``-f``). See `#477 <https://github.com/josegonzalez/python-github-backup/issues/477>`_ for details.
About Discussions
-----------------
GitHub Discussions are backed up with GitHub's GraphQL API because the REST API does not expose discussions. Use ``--discussions`` to save each discussion as JSON under ``repositories/{repo}/discussions/{number}.json``. Discussion backups include the discussion body and metadata, category information, comments, and comment replies.
``--discussions`` is included in ``--all``. Unlike most REST API-backed resources, discussions require authentication because GitHub's GraphQL API requires a token. Fine-grained personal access tokens and GitHub Apps need read access to the repository's Discussions permission.
Incremental backups use a per-repository checkpoint at ``repositories/{repo}/discussions/last_update`` based on discussion ``updatedAt`` timestamps. This is separate from the repository-level ``last_update`` file so discussion activity is not missed if the repository's own update timestamp does not change. If you enable ``--discussions`` on an existing incremental backup, the first run performs a full discussions backup for each repository and creates the discussions checkpoint for future runs.
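The checkpoint comparison can be sketched as below. ``needs_backup`` is a hypothetical helper name; the strict comparison means items whose ``updatedAt`` exactly equals the checkpoint are skipped rather than re-fetched:

```python
def needs_backup(discussion_updated_at, checkpoint):
    """Return True if a discussion changed since the stored checkpoint.

    Both values are ISO-8601 UTC strings (e.g. "2026-04-29T12:00:00Z"),
    which compare correctly as plain strings. A missing checkpoint means
    a full discussions backup is required.
    """
    if checkpoint is None:
        return True
    return discussion_updated_at > checkpoint
```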
About security advisories
-------------------------
GitHub security advisories are only available in public repositories. GitHub does not provide the respective API endpoint for private repositories.
Therefore the logic is implemented as follows:
- Security advisories are included in the ``--all`` option.
- If only the ``--all`` option was provided, backups of security advisories are skipped for private repositories.
- If the ``--security-advisories`` option is provided (on its own or in addition to ``--all``), a backup of security advisories is attempted for all repositories, with graceful handling if the GitHub API doesn't return any.
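The decision logic above can be sketched as a small predicate. ``should_backup_advisories`` is a hypothetical helper name, not the tool's actual function:

```python
def should_backup_advisories(include_all, include_security_advisories, repo_private):
    """Decide whether to attempt a security-advisories backup for one repo.

    An explicit --security-advisories flag always attempts the backup;
    --all on its own skips private repositories, since GitHub offers no
    advisories endpoint for them.
    """
    if include_security_advisories:
        return True
    if include_all:
        return not repo_private
    return False
```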
Run in Docker container
-----------------------
All is not everything
---------------------
The ``--all`` argument does not include: cloning private repos (``-P, --private``), cloning forks (``-F, --fork``), cloning starred repositories (``--all-starred``), ``--pull-details``, cloning LFS repositories (``--lfs``), cloning gists (``--gists``) or cloning starred gist repos (``--starred-gists``). See examples for more.
Starred repository size
-----------------------
Using the ``--all-starred`` argument to clone all starred repositories may use a large amount of storage space.
To see your starred repositories sorted by size (requires `GitHub CLI <https://cli.github.com>`_)::
gh api user/starred --paginate --jq 'sort_by(-.size)[]|"\(.full_name) \(.size/1024|round)MB"'
To limit which starred repositories are cloned, use ``--starred-skip-size-over SIZE`` where SIZE is in MB. For example, ``--starred-skip-size-over 500`` will skip any starred repository where the git repository size (code and history) exceeds 500 MB. Note that this size limit only applies to the repository itself, not issues, release assets or other metadata. This filter only affects starred repositories; your own repositories are always included regardless of size.
For finer control, avoid using ``--assets`` with starred repos, or use ``--skip-assets-on`` for specific repositories with large release binaries.
Alternatively, consider just storing links to starred repos in JSON format with ``--starred``.
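The size filter can be sketched as below. ``skip_starred_by_size`` is a hypothetical helper name; the GitHub API reports repository ``size`` in kilobytes (which is why the ``gh`` one-liner above divides by 1024), and the limit never applies to your own repositories:

```python
def skip_starred_by_size(repo, limit_mb):
    """Return True when a starred repo exceeds --starred-skip-size-over.

    `repo` is a repository dict from the GitHub API with `size` in KB;
    only repos flagged as starred are ever skipped.
    """
    if limit_mb is None or not repo.get("is_starred"):
        return False
    return repo.get("size", 0) / 1024.0 > limit_mb
```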
About pull request reviews
--------------------------
Use ``--pull-reviews`` with ``--pulls`` to include GitHub pull request review metadata under each pull request's ``review_data`` key. Reviews are separate from review comments: ``--pull-comments`` backs up inline review comments via ``comment_data`` and regular PR conversation comments via ``comment_regular_data``, while ``--pull-reviews`` backs up review state, submitted time, commit ID, and the top-level review body.
``--pull-reviews`` is included in ``--all``. Incremental backups use a per-repository checkpoint at ``repositories/{repo}/pulls/reviews_last_update``. If ``--pull-reviews`` is enabled on an existing incremental backup, the first run performs a one-time backfill for pull request reviews so older PRs are not skipped by the existing pull request checkpoint. Existing ``comment_data``, ``comment_regular_data`` and ``commit_data`` fields are preserved when only review data is being added.
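The field-preserving merge during the backfill can be sketched like this. ``merge_review_data`` is a hypothetical helper name, not the tool's actual function; it only attaches ``review_data`` and leaves previously backed-up fields untouched:

```python
def merge_review_data(existing_pull, reviews):
    """Attach review_data to a stored pull request record without
    clobbering existing comment_data, comment_regular_data or commit_data."""
    merged = dict(existing_pull)  # copy so the original record is untouched
    merged["review_data"] = reviews
    return merged
```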
Incremental Backup
------------------
Using (``-i, --incremental``) will only request new data from the API **since the last successful resource backup**. e.g. only request issues from the API since the last issue backup for that repository.
Incremental checkpoints for issue and pull request API backups are stored per resource in that repository's backup directory (for example ``repositories/{repo}/issues/last_update``, ``repositories/{repo}/pulls/last_update`` or ``starred/{owner}/{repo}/pulls/last_update``). Older versions stored a single global ``last_update`` file in the output directory root. During migration, the legacy global checkpoint is used as a fallback only for resource directories that already contain backup data but do not yet have their own checkpoint. New repositories or newly enabled resources with no existing data get a full backup instead of inheriting an unrelated global checkpoint.
After all existing issue and pull request resource directories have per-resource checkpoints, the legacy global ``last_update`` file is removed automatically.
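The migration fallback can be sketched as a pure function. ``resolve_checkpoint`` is a hypothetical helper name for illustration; the real implementation reads the checkpoint files from disk:

```python
def resolve_checkpoint(own_checkpoint, has_existing_data, legacy_checkpoint):
    """Pick the incremental checkpoint for one resource directory.

    Order: the resource's own last_update value wins; the legacy global
    checkpoint is used only when the directory already holds backup data;
    otherwise None, which forces a full backup for that resource.
    """
    if own_checkpoint is not None:
        return own_checkpoint
    if has_existing_data and legacy_checkpoint is not None:
        return legacy_checkpoint
    return None
```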
This means any blocking errors on previous runs can cause missing data in backups for the affected repository resource.
Using (``--incremental-by-files``) will request new data from the API **based on when the file was modified on the filesystem**; e.g. if you modify a backup file yourself, the next run may miss data.
Known blocking errors
---------------------
Some errors will block the backup run by exiting the script, e.g. receiving a 403 Forbidden error from the GitHub API.
If the incremental argument is used, per-resource checkpoints are only advanced after that resource's backup work completes. A blocking error can still abort the overall run, but repositories and resources that were not processed will keep their previous checkpoints.
It's therefore recommended to only use the incremental argument if the output/result is being actively monitored, or complemented with periodic full non-incremental runs, to avoid unexpected missing data in regular backup runs.
**Starred public repo hooks blocking**
Since the ``--all`` argument includes ``--hooks``, if you use ``--all`` and ``--all-starred`` together to clone a user's starred public repositories, the backup will likely error and block the backup from continuing.

This is due to needing the correct permission for ``--hooks`` on public repos.
"bare" is actually "mirror"
---------------------------

Starred gists vs starred repo behaviour
---------------------------------------
The starred normal repo cloning (``--all-starred``) argument stores starred repos separately from the user's own repositories. However, using ``--starred-gists`` will store starred gists within the same directory as the user's own gists ``--gists``. Also, all gist repo directory names are IDs, not the gist's name.
Note: ``--starred-gists`` only retrieves starred gists for the authenticated user, not the target user, due to a GitHub API limitation.
Skip existing on incomplete backups
-----------------------------------
The ``--skip-existing`` argument will skip a backup if the directory already exists, even if the backup in that directory failed (perhaps due to a blocking error). This may result in unexpected missing data in a regular backup.
Updates use fetch, not pull
---------------------------
When updating an existing repository backup, ``github-backup`` uses ``git fetch`` rather than ``git pull``. This is intentional - a backup tool should reliably download data without risk of failure. Using ``git pull`` would require handling merge conflicts, which adds complexity and could cause backups to fail unexpectedly.
With fetch, **all branches and commits are downloaded** safely into remote-tracking branches. The working directory files won't change, but your backup is complete.
If you look at files directly (e.g., ``cat README.md``), you'll see the old content. The new data is in the remote-tracking branches (confusingly named "remote" but stored locally). To view or use the latest files::
git show origin/main:README.md # view a file
git merge origin/main # update working directory
All branches are backed up as remote refs (``origin/main``, ``origin/feature-branch``, etc.).
If you want to browse files directly without merging, consider using ``--bare`` which skips the working directory entirely - the backup is just the git data.
See `#269 <https://github.com/josegonzalez/python-github-backup/issues/269>`_ for more discussion.
Github Backup Examples
======================
Quietly and incrementally backup useful Github user data (public and private repos)::
export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
GH_USER=YOUR-GITHUB-USER
github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-reviews --pull-commits --labels --milestones --security-advisories --discussions --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER
Debug an error/block or incomplete backup into a temporary directory. Omit "incremental" to fill a previous incomplete backup. ::
export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
GH_USER=YOUR-GITHUB-USER
github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-reviews --pull-commits --labels --milestones --discussions --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER
Pipe a token from stdin to avoid storing it in environment variables or command history (Unix-like systems only)::
my-secret-manager get github-token | github-backup user -t file:///dev/stdin -o /backup --repositories
Restoring from Backup
=====================
This tool creates backups only; there is no inbuilt restore command.
**Git repositories, wikis, and gists** can be restored by pushing them back to GitHub as you would any git repository. For example, to restore a bare repository backup::
cd /tmp/white-house/repositories/petitions/repository
git push --mirror git@github.com:WhiteHouse/petitions.git
**Issues, pull requests, discussions, comments, and other metadata** are saved as JSON files for archival purposes. The GitHub API does not support recreating this data faithfully; creating issues via the API has limitations:
- New issue/PR numbers are assigned (original numbers cannot be set)
- Timestamps reflect creation time (original dates cannot be set)
- The API caller becomes the author (original authors cannot be set)
- Cross-references between issues and PRs will break
These are GitHub API limitations that affect all backup and migration tools, not just this one. Recreating issues with these limitations via the GitHub API is an exercise for the reader. The JSON backups remain useful for searching, auditing, or manual reference.
Development
A huge thanks to all the contributors!
Testing
-------
To run the test suite::
pip install pytest
pytest
To run linting::
pip install flake8
flake8 --ignore=E501
#!/usr/bin/env python
"""
Backwards-compatible wrapper script.

The recommended way to run github-backup is via the installed command
(pip install github-backup) or python -m github_backup.

This script is kept for backwards compatibility with existing installations
that may reference this path directly.
"""
import sys

from github_backup.cli import main
from github_backup.github_backup import logger

if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        logger.error(str(e))
        sys.exit(1)
__version__ = "0.62.0"
github_backup/__main__.py
"""Allow running as: python -m github_backup"""
import sys

from github_backup.cli import main
from github_backup.github_backup import logger

if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        logger.error(str(e))
        sys.exit(1)
github_backup/cli.py
#!/usr/bin/env python
"""Command-line interface for github-backup."""
import logging
import os
import sys

from github_backup.github_backup import (
    backup_account,
    backup_repositories,
    check_git_lfs_install,
    filter_repositories,
    get_auth,
    get_authenticated_user,
    logger,
    mkdir_p,
    parse_args,
    retrieve_repositories,
)

# INFO and DEBUG go to stdout, WARNING and above go to stderr
log_format = logging.Formatter(
    fmt="%(asctime)s.%(msecs)03d: %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
)
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setLevel(logging.DEBUG)
stdout_handler.addFilter(lambda r: r.levelno < logging.WARNING)
stdout_handler.setFormatter(log_format)

stderr_handler = logging.StreamHandler(sys.stderr)
stderr_handler.setLevel(logging.WARNING)
stderr_handler.setFormatter(log_format)

logging.basicConfig(level=logging.INFO, handlers=[stdout_handler, stderr_handler])


def main():
    """Main entry point for github-backup CLI."""
    args = parse_args()

    if args.private and not get_auth(args):
        logger.warning(
            "The --private flag has no effect without authentication. "
            "Use -t/--token or -f/--token-fine to authenticate."
        )

    # Issue #477: Fine-grained PATs cannot download all attachment types from
    # private repos. Image attachments will be retried via Markdown API workaround.
    if args.include_attachments and args.token_fine:
        logger.warning(
            "Using --attachments with fine-grained token. Due to GitHub platform "
            "limitations, file attachments (PDFs, etc.) from private repos may fail. "
            "Image attachments will be retried via workaround. For full attachment "
            "support, use --token-classic instead."
        )

    if args.quiet:
        logger.setLevel(logging.WARNING)

    output_directory = os.path.realpath(args.output_directory)
    if not os.path.isdir(output_directory):
        logger.info("Create output directory {0}".format(output_directory))
        mkdir_p(output_directory)

    if args.lfs_clone:
        check_git_lfs_install()

    if args.log_level:
        log_level = logging.getLevelName(args.log_level.upper())
        if isinstance(log_level, int):
            logger.root.setLevel(log_level)

    if not args.as_app:
        logger.info("Backing up user {0} to {1}".format(args.user, output_directory))
        authenticated_user = get_authenticated_user(args)
    else:
        authenticated_user = {"login": None}

    repositories = retrieve_repositories(args, authenticated_user)
    repositories = filter_repositories(args, repositories)
    backup_repositories(args, output_directory, repositories)
    backup_account(args, output_directory)


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        logger.error(str(e))
        sys.exit(1)
File diff suppressed because it is too large
"""GraphQL query templates used by github-backup."""
DISCUSSION_PAGE_SIZE = 100
DISCUSSION_LIST_QUERY = """
query($owner: String!, $name: String!, $after: String, $pageSize: Int!) {
repository(owner: $owner, name: $name) {
hasDiscussionsEnabled
discussions(
first: $pageSize,
after: $after,
orderBy: {field: UPDATED_AT, direction: DESC}
) {
totalCount
nodes {
id
number
title
updatedAt
}
pageInfo {
hasNextPage
endCursor
}
}
}
}
"""
DISCUSSION_DETAIL_QUERY = """
query(
$owner: String!,
$name: String!,
$number: Int!,
$commentsCursor: String,
$pageSize: Int!
) {
repository(owner: $owner, name: $name) {
discussion(number: $number) {
activeLockReason
answer {
id
databaseId
url
}
answerChosenAt
answerChosenBy {
...ActorFields
}
author {
...ActorFields
}
authorAssociation
body
bodyHTML
bodyText
category {
createdAt
description
emoji
emojiHTML
id
isAnswerable
name
slug
updatedAt
}
closed
closedAt
createdAt
createdViaEmail
databaseId
editor {
...ActorFields
}
id
includesCreatedEdit
isAnswered
labels(first: 100) {
totalCount
nodes {
id
name
color
description
}
}
lastEditedAt
locked
number
poll {
id
question
totalVoteCount
options(first: 100) {
totalCount
nodes {
id
option
totalVoteCount
}
}
}
publishedAt
reactionGroups {
...ReactionGroupFields
}
resourcePath
stateReason
title
updatedAt
upvoteCount
url
comments(first: $pageSize, after: $commentsCursor) {
totalCount
nodes {
...DiscussionCommentFields
replies(first: $pageSize) {
totalCount
nodes {
...DiscussionReplyFields
}
pageInfo {
hasNextPage
endCursor
}
}
}
pageInfo {
hasNextPage
endCursor
}
}
}
}
}
fragment ActorFields on Actor {
avatarUrl
login
resourcePath
url
}
fragment ReactionGroupFields on ReactionGroup {
content
reactors {
totalCount
}
}
fragment DiscussionCommentFields on DiscussionComment {
author {
...ActorFields
}
authorAssociation
body
bodyHTML
bodyText
createdAt
createdViaEmail
databaseId
deletedAt
editor {
...ActorFields
}
id
includesCreatedEdit
isAnswer
isMinimized
lastEditedAt
minimizedReason
publishedAt
reactionGroups {
...ReactionGroupFields
}
replyTo {
id
databaseId
url
}
resourcePath
updatedAt
upvoteCount
url
}
fragment DiscussionReplyFields on DiscussionComment {
author {
...ActorFields
}
authorAssociation
body
bodyHTML
bodyText
createdAt
createdViaEmail
databaseId
deletedAt
editor {
...ActorFields
}
id
includesCreatedEdit
isAnswer
isMinimized
lastEditedAt
minimizedReason
publishedAt
reactionGroups {
...ReactionGroupFields
}
replyTo {
id
databaseId
url
}
resourcePath
updatedAt
upvoteCount
url
}
"""
DISCUSSION_REPLIES_QUERY = """
query($commentId: ID!, $repliesCursor: String, $pageSize: Int!) {
node(id: $commentId) {
... on DiscussionComment {
replies(first: $pageSize, after: $repliesCursor) {
totalCount
nodes {
...DiscussionReplyFields
}
pageInfo {
hasNextPage
endCursor
}
}
}
}
}
fragment ActorFields on Actor {
avatarUrl
login
resourcePath
url
}
fragment ReactionGroupFields on ReactionGroup {
content
reactors {
totalCount
}
}
fragment DiscussionReplyFields on DiscussionComment {
author {
...ActorFields
}
authorAssociation
body
bodyHTML
bodyText
createdAt
createdViaEmail
databaseId
deletedAt
editor {
...ActorFields
}
id
includesCreatedEdit
isAnswer
isMinimized
lastEditedAt
minimizedReason
publishedAt
reactionGroups {
...ReactionGroupFields
}
replyTo {
id
databaseId
url
}
resourcePath
updatedAt
upvoteCount
url
}
"""
# Linting & Formatting
autopep8==2.3.2
black==26.3.1
flake8==7.3.0

# Testing
pytest==9.0.3

# Release & Publishing
twine==6.2.0
gitchangelog==3.0.4
setuptools==82.0.1

# Documentation
restructuredtext-lint==2.0.2
setup(
    author="Jose Diaz-Gonzalez",
    author_email="github-backup@josediazgonzalez.com",
    packages=["github_backup"],
    entry_points={
        "console_scripts": [
            "github-backup=github_backup.cli:main",
        ],
    },
    url="http://github.com/josegonzalez/python-github-backup",
    license="MIT",
    classifiers=[
tests/conftest.py
"""Shared pytest fixtures for github-backup tests."""
import pytest

from github_backup.github_backup import parse_args


@pytest.fixture
def create_args():
    """Factory fixture that creates args with real CLI defaults.

    Uses the actual argument parser so new CLI args are automatically
    available with their defaults - no test updates needed.

    Usage:
        def test_something(self, create_args):
            args = create_args(include_releases=True, user="myuser")
    """
    def _create(**overrides):
        # Use real parser to get actual defaults
        args = parse_args(["testuser"])
        for key, value in overrides.items():
            setattr(args, key, value)
        return args

    return _create
tests/test_all_starred.py
"""Tests for --all-starred flag behavior (issue #225)."""
import pytest
from unittest.mock import patch

from github_backup import github_backup


class TestAllStarredCloning:
    """Test suite for --all-starred repository cloning behavior.

    Issue #225: --all-starred should clone starred repos without requiring --repositories.
    """

    @patch('github_backup.github_backup.fetch_repository')
    @patch('github_backup.github_backup.get_github_repo_url')
    def test_all_starred_clones_without_repositories_flag(self, mock_get_url, mock_fetch, create_args):
        """--all-starred should clone starred repos without --repositories flag.

        This is the core fix for issue #225.
        """
        args = create_args(all_starred=True)
        mock_get_url.return_value = "https://github.com/otheruser/awesome-project.git"

        # A starred repository (is_starred flag set by retrieve_repositories)
        starred_repo = {
            "name": "awesome-project",
            "full_name": "otheruser/awesome-project",
            "owner": {"login": "otheruser"},
            "private": False,
            "fork": False,
            "has_wiki": False,
            "is_starred": True,  # This flag is set for starred repos
        }

        with patch('github_backup.github_backup.mkdir_p'):
            github_backup.backup_repositories(args, "/tmp/backup", [starred_repo])

        # fetch_repository should be called for the starred repo
        assert mock_fetch.called, "--all-starred should trigger repository cloning"
        mock_fetch.assert_called_once()
        call_args = mock_fetch.call_args
        assert call_args[0][0] == "awesome-project"  # repo name

    @patch('github_backup.github_backup.fetch_repository')
    @patch('github_backup.github_backup.get_github_repo_url')
    def test_starred_repo_not_cloned_without_all_starred_flag(self, mock_get_url, mock_fetch, create_args):
        """Starred repos should NOT be cloned if --all-starred is not set."""
        args = create_args(all_starred=False)
        mock_get_url.return_value = "https://github.com/otheruser/awesome-project.git"

        starred_repo = {
            "name": "awesome-project",
            "full_name": "otheruser/awesome-project",
            "owner": {"login": "otheruser"},
            "private": False,
            "fork": False,
            "has_wiki": False,
            "is_starred": True,
        }

        with patch('github_backup.github_backup.mkdir_p'):
            github_backup.backup_repositories(args, "/tmp/backup", [starred_repo])

        # fetch_repository should NOT be called
        assert not mock_fetch.called, "Starred repos should not be cloned without --all-starred"

    @patch('github_backup.github_backup.fetch_repository')
    @patch('github_backup.github_backup.get_github_repo_url')
    def test_non_starred_repo_not_cloned_with_only_all_starred(self, mock_get_url, mock_fetch, create_args):
        """Non-starred repos should NOT be cloned when only --all-starred is set."""
        args = create_args(all_starred=True)
        mock_get_url.return_value = "https://github.com/testuser/my-project.git"

        # A regular (non-starred) repository
        regular_repo = {
            "name": "my-project",
            "full_name": "testuser/my-project",
            "owner": {"login": "testuser"},
            "private": False,
            "fork": False,
            "has_wiki": False,
            # No is_starred flag
        }

        with patch('github_backup.github_backup.mkdir_p'):
github_backup.backup_repositories(args, "/tmp/backup", [regular_repo])
# fetch_repository should NOT be called for non-starred repos
assert not mock_fetch.called, "Non-starred repos should not be cloned with only --all-starred"
@patch('github_backup.github_backup.fetch_repository')
@patch('github_backup.github_backup.get_github_repo_url')
def test_repositories_flag_still_works(self, mock_get_url, mock_fetch, create_args):
"""--repositories flag should still clone repos as before."""
args = create_args(include_repository=True)
mock_get_url.return_value = "https://github.com/testuser/my-project.git"
regular_repo = {
"name": "my-project",
"full_name": "testuser/my-project",
"owner": {"login": "testuser"},
"private": False,
"fork": False,
"has_wiki": False,
}
with patch('github_backup.github_backup.mkdir_p'):
github_backup.backup_repositories(args, "/tmp/backup", [regular_repo])
# fetch_repository should be called
assert mock_fetch.called, "--repositories should trigger repository cloning"
if __name__ == "__main__":
pytest.main([__file__, "-v"])
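The gating these four tests describe can be condensed into a small predicate. This is a sketch of the expected behavior, not the actual `backup_repositories` implementation:

```python
def should_clone(all_starred, include_repository, repo):
    """Decide whether a repository should be cloned.

    Starred repos (marked with is_starred by retrieve_repositories) are
    cloned only under --all-starred; all other repos only under
    --repositories.
    """
    if repo.get("is_starred"):
        return all_starred
    return include_repository
```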


@@ -4,7 +4,7 @@ import json
import os
import tempfile
from pathlib import Path
from unittest.mock import Mock
from unittest.mock import Mock, patch
import pytest
@@ -12,24 +12,13 @@ from github_backup import github_backup
@pytest.fixture
def attachment_test_setup(tmp_path):
def attachment_test_setup(tmp_path, create_args):
"""Fixture providing setup and helper for attachment download tests."""
from unittest.mock import patch
issue_cwd = tmp_path / "issues"
issue_cwd.mkdir()
# Mock args
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.user = "testuser"
args.repository = "testrepo"
# Create args using shared fixture
args = create_args(user="testuser", repository="testrepo")
repository = {"full_name": "testuser/testrepo"}
@@ -351,3 +340,146 @@ class TestManifestDuplicatePrevention:
downloaded_urls[0]
== "https://github.com/user-attachments/assets/unavailable"
)
class TestJWTWorkaround:
"""Test JWT workaround for fine-grained tokens on private repos (issue #477)."""
def test_markdown_api_extracts_jwt_url(self):
"""Markdown API response with JWT URL is extracted correctly."""
html_response = (
'<p><a href="https://private-user-images.githubusercontent.com'
'/123/abc.png?jwt=eyJhbGciOiJ"><img src="https://private-user-'
'images.githubusercontent.com/123/abc.png?jwt=eyJhbGciOiJ" '
'alt="img"></a></p>'
)
mock_response = Mock()
mock_response.read.return_value = html_response.encode("utf-8")
with patch("github_backup.github_backup.urlopen", return_value=mock_response):
result = github_backup.get_jwt_signed_url_via_markdown_api(
"https://github.com/user-attachments/assets/abc123",
"github_pat_token",
"owner/repo"
)
expected = (
"https://private-user-images.githubusercontent.com"
"/123/abc.png?jwt=eyJhbGciOiJ"
)
assert result == expected
def test_markdown_api_returns_none_on_http_error(self):
"""HTTP errors return None."""
from urllib.error import HTTPError
error = HTTPError("http://test", 403, "Forbidden", {}, None)
with patch("github_backup.github_backup.urlopen", side_effect=error):
result = github_backup.get_jwt_signed_url_via_markdown_api(
"https://github.com/user-attachments/assets/abc123",
"github_pat_token",
"owner/repo"
)
assert result is None
def test_markdown_api_returns_none_when_no_jwt_url(self):
"""Response without JWT URL returns None."""
mock_response = Mock()
mock_response.read.return_value = b"<p>No image here</p>"
with patch("github_backup.github_backup.urlopen", return_value=mock_response):
result = github_backup.get_jwt_signed_url_via_markdown_api(
"https://github.com/user-attachments/assets/abc123",
"github_pat_token",
"owner/repo"
)
assert result is None
def test_needs_jwt_only_for_fine_grained_private_assets(self):
"""needs_jwt is True only for fine-grained + private + /assets/ URL."""
assets_url = "https://github.com/user-attachments/assets/abc123"
files_url = "https://github.com/user-attachments/files/123/doc.pdf"
token_fine = "github_pat_test"
private = True
public = False
# Fine-grained + private + assets = True
needs_jwt = (
token_fine is not None
and private
and "github.com/user-attachments/assets/" in assets_url
)
assert needs_jwt is True
# Fine-grained + private + files = False
needs_jwt = (
token_fine is not None
and private
and "github.com/user-attachments/assets/" in files_url
)
assert needs_jwt is False
# Fine-grained + public + assets = False
needs_jwt = (
token_fine is not None
and public
and "github.com/user-attachments/assets/" in assets_url
)
assert needs_jwt is False
def test_jwt_workaround_sets_manifest_flag(self, attachment_test_setup):
"""Successful JWT workaround sets jwt_workaround flag in manifest."""
setup = attachment_test_setup
setup["args"].token_fine = "github_pat_test"
setup["repository"]["private"] = True
issue_data = {"body": "https://github.com/user-attachments/assets/abc123"}
jwt_url = "https://private-user-images.githubusercontent.com/123/abc.png?jwt=token"
with patch(
"github_backup.github_backup.get_jwt_signed_url_via_markdown_api",
return_value=jwt_url
), patch(
"github_backup.github_backup.download_attachment_file",
return_value={"success": True, "http_status": 200, "url": jwt_url}
):
github_backup.download_attachments(
setup["args"], setup["issue_cwd"], issue_data, 123, setup["repository"]
)
manifest_path = os.path.join(setup["issue_cwd"], "attachments", "123", "manifest.json")
with open(manifest_path) as f:
manifest = json.load(f)
assert manifest["attachments"][0]["jwt_workaround"] is True
assert manifest["attachments"][0]["url"] == "https://github.com/user-attachments/assets/abc123"
def test_jwt_workaround_failure_uses_skipped_at(self, attachment_test_setup):
"""Failed JWT workaround uses skipped_at instead of downloaded_at."""
setup = attachment_test_setup
setup["args"].token_fine = "github_pat_test"
setup["repository"]["private"] = True
issue_data = {"body": "https://github.com/user-attachments/assets/abc123"}
with patch(
"github_backup.github_backup.get_jwt_signed_url_via_markdown_api",
return_value=None # Markdown API failed
):
github_backup.download_attachments(
setup["args"], setup["issue_cwd"], issue_data, 123, setup["repository"]
)
manifest_path = os.path.join(setup["issue_cwd"], "attachments", "123", "manifest.json")
with open(manifest_path) as f:
manifest = json.load(f)
attachment = manifest["attachments"][0]
assert attachment["success"] is False
assert "skipped_at" in attachment
assert "downloaded_at" not in attachment
assert "Use --token-classic" in attachment["error"]

tests/test_auth.py (new file, 75 lines)

@@ -0,0 +1,75 @@
"""Tests for authentication helpers."""
from unittest.mock import patch
import pytest
from github_backup import github_backup
def test_token_from_gh_flag_parses():
args = github_backup.parse_args(["--token-from-gh", "testuser"])
assert args.token_from_gh is True
def test_get_auth_reads_token_from_gh_cli(create_args):
args = create_args(token_from_gh=True)
with patch(
"github_backup.github_backup.subprocess.check_output",
return_value=b"gho_test_token\n",
) as mock_check_output:
auth = github_backup.get_auth(args, encode=False)
assert auth == "gho_test_token:x-oauth-basic"
mock_check_output.assert_called_once_with(
["gh", "auth", "token"], stderr=github_backup.subprocess.PIPE
)
def test_get_auth_reads_token_from_gh_cli_for_enterprise_host(create_args):
args = create_args(token_from_gh=True, github_host="ghe.example.com")
with patch(
"github_backup.github_backup.subprocess.check_output",
return_value=b"gho_enterprise_token\n",
) as mock_check_output:
auth = github_backup.get_auth(args, encode=False)
assert auth == "gho_enterprise_token:x-oauth-basic"
mock_check_output.assert_called_once_with(
["gh", "auth", "token", "--hostname", "ghe.example.com"],
stderr=github_backup.subprocess.PIPE,
)
def test_token_from_gh_is_cached(create_args):
args = create_args(token_from_gh=True)
with patch(
"github_backup.github_backup.subprocess.check_output",
return_value=b"gho_cached_token\n",
) as mock_check_output:
assert github_backup.get_auth(args, encode=False) == "gho_cached_token:x-oauth-basic"
assert github_backup.get_auth(args, encode=False) == "gho_cached_token:x-oauth-basic"
mock_check_output.assert_called_once()
def test_graphql_auth_strips_basic_auth_suffix_for_gh_cli_token(create_args):
args = create_args(token_from_gh=True)
with patch(
"github_backup.github_backup.subprocess.check_output",
return_value=b"gho_graphql_token\n",
):
assert github_backup.get_graphql_auth(args) == "gho_graphql_token"
def test_token_from_gh_rejects_as_app(create_args):
args = create_args(token_from_gh=True, as_app=True)
with pytest.raises(Exception) as exc_info:
github_backup.get_auth(args, encode=False)
assert "--token-from-gh cannot be used with --as-app" in str(exc_info.value)


@@ -0,0 +1,84 @@
"""Tests for case-insensitive username/organization filtering."""
import pytest
from github_backup import github_backup
class TestCaseSensitivity:
"""Test suite for case-insensitive username matching in filter_repositories."""
def test_filter_repositories_case_insensitive_user(self, create_args):
"""Should filter repositories case-insensitively for usernames.
Reproduces issue #198 where typing 'iamrodos' fails to match
repositories with owner.login='Iamrodos' (the canonical case from GitHub API).
"""
# Simulate user typing lowercase username
args = create_args(user="iamrodos")
# Simulate GitHub API returning canonical case
repos = [
{
"name": "repo1",
"owner": {"login": "Iamrodos"}, # Capital I (canonical from API)
"private": False,
"fork": False,
},
{
"name": "repo2",
"owner": {"login": "Iamrodos"},
"private": False,
"fork": False,
},
]
filtered = github_backup.filter_repositories(args, repos)
# Should match despite case difference
assert len(filtered) == 2
assert filtered[0]["name"] == "repo1"
assert filtered[1]["name"] == "repo2"
def test_filter_repositories_case_insensitive_org(self, create_args):
"""Should filter repositories case-insensitively for organizations.
Tests the example from issue #198 where 'prai-org' doesn't match 'PRAI-Org'.
"""
args = create_args(user="prai-org")
repos = [
{
"name": "repo1",
"owner": {"login": "PRAI-Org"}, # Different case (canonical from API)
"private": False,
"fork": False,
},
]
filtered = github_backup.filter_repositories(args, repos)
# Should match despite case difference
assert len(filtered) == 1
assert filtered[0]["name"] == "repo1"
def test_filter_repositories_case_variations(self, create_args):
"""Should handle various case combinations correctly."""
args = create_args(user="TeSt-UsEr")
repos = [
{"name": "repo1", "owner": {"login": "test-user"}, "private": False, "fork": False},
{"name": "repo2", "owner": {"login": "TEST-USER"}, "private": False, "fork": False},
{"name": "repo3", "owner": {"login": "TeSt-UsEr"}, "private": False, "fork": False},
{"name": "repo4", "owner": {"login": "other-user"}, "private": False, "fork": False},
]
filtered = github_backup.filter_repositories(args, repos)
# Should match first 3 (all case variations of same user)
assert len(filtered) == 3
assert set(r["name"] for r in filtered) == {"repo1", "repo2", "repo3"}
if __name__ == "__main__":
pytest.main([__file__, "-v"])
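The matching rule these cases pin down is a plain case-insensitive comparison between the typed username and the canonical `owner.login` from the GitHub API. A minimal sketch (hypothetical helper, not the real `filter_repositories`):

```python
def owner_matches(repository, user):
    """Case-insensitively compare a typed username against the canonical
    owner login returned by the GitHub API."""
    login = repository.get("owner", {}).get("login", "")
    return login.lower() == user.lower()
```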

tests/test_discussions.py (new file, 257 lines)

@@ -0,0 +1,257 @@
"""Tests for GitHub Discussions backup support."""
import json
import os
from unittest.mock import patch
from github_backup import github_backup
def test_parse_args_discussions_flag():
args = github_backup.parse_args(["--discussions", "testuser"])
assert args.include_discussions is True
def test_retrieve_discussion_summaries_stops_at_incremental_since(create_args):
args = create_args()
repository = {"full_name": "owner/repo"}
page = {
"repository": {
"hasDiscussionsEnabled": True,
"discussions": {
"totalCount": 3,
"nodes": [
{"number": 3, "title": "new", "updatedAt": "2026-02-01T00:00:00Z"},
{"number": 2, "title": "also new", "updatedAt": "2026-01-10T00:00:00Z"},
{"number": 1, "title": "old", "updatedAt": "2025-12-01T00:00:00Z"},
],
"pageInfo": {"hasNextPage": True, "endCursor": "NEXT"},
},
}
}
with patch(
"github_backup.github_backup.retrieve_graphql_data", return_value=page
) as mock_retrieve:
summaries, newest, enabled, total = github_backup.retrieve_discussion_summaries(
args, repository, since="2026-01-01T00:00:00Z"
)
assert enabled is True
assert total == 3
assert newest == "2026-02-01T00:00:00Z"
assert [item["number"] for item in summaries] == [3, 2]
# The old discussion stops pagination, so the next page is not requested.
assert mock_retrieve.call_count == 1
assert (
mock_retrieve.call_args.kwargs["log_context"]
== "discussion summaries owner/repo page 1"
)
def test_retrieve_discussion_summaries_excludes_checkpoint_timestamp(create_args):
args = create_args()
repository = {"full_name": "owner/repo"}
page = {
"repository": {
"hasDiscussionsEnabled": True,
"discussions": {
"totalCount": 1,
"nodes": [
{
"number": 1,
"title": "already backed up",
"updatedAt": "2026-01-01T00:00:00Z",
},
],
"pageInfo": {"hasNextPage": True, "endCursor": "NEXT"},
},
}
}
with patch(
"github_backup.github_backup.retrieve_graphql_data", return_value=page
) as mock_retrieve:
summaries, newest, enabled, total = github_backup.retrieve_discussion_summaries(
args, repository, since="2026-01-01T00:00:00Z"
)
assert enabled is True
assert total == 1
assert newest == "2026-01-01T00:00:00Z"
assert summaries == []
assert mock_retrieve.call_count == 1
def test_retrieve_discussion_summaries_disabled_discussions(create_args):
args = create_args()
repository = {"full_name": "owner/repo"}
with patch(
"github_backup.github_backup.retrieve_graphql_data",
return_value={"repository": {"hasDiscussionsEnabled": False}},
):
summaries, newest, enabled, total = github_backup.retrieve_discussion_summaries(
args, repository
)
assert summaries == []
assert newest is None
assert enabled is False
assert total == 0
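The incremental cutoff behavior in the two tests above can be sketched as a filter over the newest-first summary nodes. Items at or before the checkpoint are excluded, and the first such item stops the scan (and hence further pagination); ISO-8601 UTC "Z" timestamps compare correctly as strings. This is a behavioral sketch, not the library's code:

```python
def filter_new_summaries(nodes, since=None):
    """Keep only summaries strictly newer than the checkpoint.

    Nodes arrive newest-first, so the first item at or before `since`
    ends the scan.
    """
    fresh = []
    for node in nodes:
        if since is not None and node["updatedAt"] <= since:
            break
        fresh.append(node)
    return fresh
```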
def _comment(comment_id, body, replies=None, replies_has_next=False):
replies = replies or []
return {
"id": comment_id,
"body": body,
"replies": {
"totalCount": len(replies) + (1 if replies_has_next else 0),
"nodes": replies,
"pageInfo": {
"hasNextPage": replies_has_next,
"endCursor": "REPLIES2" if replies_has_next else None,
},
},
}
def _discussion_page(comment_nodes, has_next=False):
return {
"repository": {
"discussion": {
"number": 42,
"title": "Discussion title",
"updatedAt": "2026-02-01T00:00:00Z",
"comments": {
"totalCount": 2,
"nodes": comment_nodes,
"pageInfo": {
"hasNextPage": has_next,
"endCursor": "COMMENTS2" if has_next else None,
},
},
}
}
}
def test_retrieve_discussion_paginates_comments_and_replies(create_args):
args = create_args()
repository = {"full_name": "owner/repo"}
reply_1 = {"id": "reply-1", "body": "first reply"}
reply_2 = {"id": "reply-2", "body": "second reply"}
comment_1 = _comment("comment-1", "first comment", [reply_1], replies_has_next=True)
comment_2 = _comment("comment-2", "second comment")
responses = [
_discussion_page([comment_1], has_next=True),
{
"node": {
"replies": {
"totalCount": 2,
"nodes": [reply_2],
"pageInfo": {"hasNextPage": False, "endCursor": None},
}
}
},
_discussion_page([comment_2], has_next=False),
]
with patch(
"github_backup.github_backup.retrieve_graphql_data", side_effect=responses
) as mock_retrieve:
discussion = github_backup.retrieve_discussion(args, repository, 42)
assert discussion["number"] == 42
assert discussion["comment_count"] == 2
assert len(discussion["comment_data"]) == 2
assert discussion["comment_data"][0]["body"] == "first comment"
assert discussion["comment_data"][0]["reply_count"] == 2
assert [r["body"] for r in discussion["comment_data"][0]["reply_data"]] == [
"first reply",
"second reply",
]
assert discussion["comment_data"][1]["body"] == "second comment"
assert mock_retrieve.call_count == 3
assert [
call.kwargs["log_context"] for call in mock_retrieve.call_args_list
] == [
"discussion owner/repo#42 details/comments page 1",
"discussion owner/repo#42 comment comment-1 replies page 2",
"discussion owner/repo#42 details/comments page 2",
]
def test_backup_discussions_uses_incremental_checkpoint(create_args, tmp_path):
args = create_args(token_classic="fake_token", include_discussions=True, incremental=True)
repository = {"full_name": "owner/repo"}
discussions_dir = tmp_path / "discussions"
discussions_dir.mkdir()
(discussions_dir / "last_update").write_text("2026-01-01T00:00:00Z")
def fake_summaries(passed_args, passed_repository, since=None):
assert passed_args is args
assert passed_repository == repository
assert since == "2026-01-01T00:00:00Z"
return (
[{"number": 7, "title": "updated", "updatedAt": "2026-02-01T00:00:00Z"}],
"2026-02-01T00:00:00Z",
True,
1,
)
with patch(
"github_backup.github_backup.retrieve_discussion_summaries",
side_effect=fake_summaries,
), patch(
"github_backup.github_backup.retrieve_discussion",
return_value={"number": 7, "title": "updated"},
):
github_backup.backup_discussions(args, tmp_path, repository)
with open(discussions_dir / "7.json", encoding="utf-8") as f:
assert json.load(f) == {"number": 7, "title": "updated"}
assert (discussions_dir / "last_update").read_text() == "2026-02-01T00:00:00Z"
def test_backup_discussions_does_not_advance_checkpoint_on_discussion_error(
create_args, tmp_path
):
args = create_args(token_classic="fake_token", include_discussions=True, incremental=True)
repository = {"full_name": "owner/repo"}
discussions_dir = tmp_path / "discussions"
discussions_dir.mkdir()
(discussions_dir / "last_update").write_text("2026-01-01T00:00:00Z")
with patch(
"github_backup.github_backup.retrieve_discussion_summaries",
return_value=(
[{"number": 7, "title": "updated", "updatedAt": "2026-02-01T00:00:00Z"}],
"2026-02-01T00:00:00Z",
True,
1,
),
), patch(
"github_backup.github_backup.retrieve_discussion",
side_effect=Exception("temporary GraphQL error"),
):
github_backup.backup_discussions(args, tmp_path, repository)
assert (discussions_dir / "last_update").read_text() == "2026-01-01T00:00:00Z"
assert not os.path.exists(discussions_dir / "7.json")
def test_backup_discussions_skips_without_auth(create_args, tmp_path):
args = create_args(include_discussions=True)
repository = {"full_name": "owner/repo"}
with patch("github_backup.github_backup.retrieve_discussion_summaries") as mock_retrieve:
github_backup.backup_discussions(args, tmp_path, repository)
assert not mock_retrieve.called
assert not os.path.exists(tmp_path / "discussions")

tests/test_http_451.py (new file, 203 lines)

@@ -0,0 +1,203 @@
"""Tests for HTTP 451 (DMCA takedown) and HTTP 403 (TOS) handling."""
import io
import json
from unittest.mock import patch
from urllib.error import HTTPError
import pytest
from github_backup import github_backup
def _make_http_error(code, body_bytes, msg="Error", headers=None):
"""Create an HTTPError with a readable body (like a real urllib response)."""
if headers is None:
headers = {"x-ratelimit-remaining": "5000"}
return HTTPError(
url="https://api.github.com/repos/test/repo",
code=code,
msg=msg,
hdrs=headers,
fp=io.BytesIO(body_bytes),
)
class TestHTTP451Exception:
"""Test suite for HTTP 451 DMCA takedown exception handling."""
def test_repository_unavailable_error_raised(self, create_args):
"""HTTP 451 should raise RepositoryUnavailableError with DMCA URL."""
args = create_args()
dmca_data = {
"message": "Repository access blocked",
"block": {
"reason": "dmca",
"created_at": "2024-11-12T14:38:04Z",
"html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md",
},
}
body = json.dumps(dmca_data).encode("utf-8")
def mock_urlopen(*a, **kw):
raise _make_http_error(451, body, msg="Unavailable For Legal Reasons")
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/dmca/issues"
)
assert (
exc_info.value.legal_url
== "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
)
assert "451" in str(exc_info.value)
def test_repository_unavailable_error_without_legal_url(self, create_args):
"""HTTP 451 without DMCA details should still raise exception."""
args = create_args()
def mock_urlopen(*a, **kw):
raise _make_http_error(451, b'{"message": "Blocked"}')
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/dmca/issues"
)
assert exc_info.value.legal_url is None
assert "451" in str(exc_info.value)
def test_repository_unavailable_error_with_malformed_json(self, create_args):
"""HTTP 451 with malformed JSON should still raise exception."""
args = create_args()
def mock_urlopen(*a, **kw):
raise _make_http_error(451, b"invalid json {")
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(github_backup.RepositoryUnavailableError):
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/dmca/issues"
)
class TestHTTP403TOS:
"""Test suite for HTTP 403 TOS violation handling."""
def test_403_tos_raises_repository_unavailable(self, create_args):
"""HTTP 403 (non-rate-limit) should raise RepositoryUnavailableError."""
args = create_args()
tos_data = {
"message": "Repository access blocked",
"block": {
"reason": "tos",
"html_url": "https://github.com/contact/tos-violation",
},
}
body = json.dumps(tos_data).encode("utf-8")
def mock_urlopen(*a, **kw):
raise _make_http_error(
403,
body,
msg="Forbidden",
headers={"x-ratelimit-remaining": "5000"},
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/blocked/issues"
)
assert (
exc_info.value.legal_url == "https://github.com/contact/tos-violation"
)
assert "403" in str(exc_info.value)
def test_403_permission_denied_not_converted(self, create_args):
"""HTTP 403 without 'block' in body should propagate as HTTPError, not RepositoryUnavailableError."""
args = create_args()
body = json.dumps({"message": "Must have admin rights to Repository."}).encode(
"utf-8"
)
def mock_urlopen(*a, **kw):
raise _make_http_error(
403,
body,
msg="Forbidden",
headers={"x-ratelimit-remaining": "5000"},
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(HTTPError) as exc_info:
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/private/issues"
)
assert exc_info.value.code == 403
def test_403_rate_limit_not_converted(self, create_args):
"""HTTP 403 with rate limit exhausted should NOT become RepositoryUnavailableError."""
args = create_args()
call_count = 0
def mock_urlopen(*a, **kw):
nonlocal call_count
call_count += 1
raise _make_http_error(
403,
b'{"message": "rate limit"}',
msg="Forbidden",
headers={"x-ratelimit-remaining": "0"},
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
with pytest.raises(HTTPError) as exc_info:
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/ratelimit/issues"
)
assert exc_info.value.code == 403
# Should have retried (not raised immediately as RepositoryUnavailableError)
assert call_count > 1
class TestRetrieveRepositoriesUnavailable:
"""Test that retrieve_repositories handles RepositoryUnavailableError gracefully."""
def test_unavailable_repo_returns_empty_list(self, create_args):
"""retrieve_repositories should return [] when the repo is unavailable."""
args = create_args(repository="blocked-repo")
def mock_urlopen(*a, **kw):
raise _make_http_error(
451,
json.dumps(
{
"message": "Blocked",
"block": {"html_url": "https://example.com/dmca"},
}
).encode("utf-8"),
msg="Unavailable For Legal Reasons",
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
repos = github_backup.retrieve_repositories(args, {"login": None})
assert repos == []
if __name__ == "__main__":
pytest.main([__file__, "-v"])
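Taken together, the suite encodes a classification rule for blocked responses: 451 is always a legal block, 403 is a legal block only when the body carries a "block" object and the rate limit is not exhausted. A rough sketch of that rule (hypothetical helper, not the library's retry logic):

```python
def classify_blocked(code, body_json, ratelimit_remaining):
    """Classify an error response as ('unavailable', legal_url),
    ('rate_limited', None), or ('other', None)."""
    if code == 451:
        return "unavailable", body_json.get("block", {}).get("html_url")
    if code == 403:
        if ratelimit_remaining == "0":
            return "rate_limited", None  # retry, do not convert
        if "block" in body_json:
            return "unavailable", body_json.get("block", {}).get("html_url")
    return "other", None  # e.g. plain permission denied, propagate as-is
```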


@@ -0,0 +1,189 @@
"""Tests for per-resource incremental checkpoints."""
import json
import os
from github_backup import github_backup
def _repo(name, updated_at, pushed_at=None):
return {
"name": name,
"full_name": "owner/{0}".format(name),
"owner": {"login": "owner"},
"clone_url": "https://github.com/owner/{0}.git".format(name),
"private": False,
"fork": False,
"has_wiki": False,
"updated_at": updated_at,
"pushed_at": pushed_at,
}
def test_incremental_uses_per_resource_last_update(
create_args, tmp_path, monkeypatch
):
args = create_args(incremental=True, include_issues=True)
repositories = [
_repo("repo-one", "2026-02-01T00:00:00Z"),
_repo("repo-two", "2026-03-01T00:00:00Z"),
]
repo_one_issues = tmp_path / "repositories" / "repo-one" / "issues"
repo_two_issues = tmp_path / "repositories" / "repo-two" / "issues"
repo_one_issues.mkdir(parents=True)
repo_two_issues.mkdir(parents=True)
(repo_one_issues / "last_update").write_text("2026-01-01T00:00:00Z")
(repo_two_issues / "last_update").write_text("2025-01-01T00:00:00Z")
seen_since = []
def fake_backup_issues(passed_args, repo_cwd, repository, repos_template):
seen_since.append((repository["name"], passed_args.since))
monkeypatch.setattr(github_backup, "backup_issues", fake_backup_issues)
github_backup.backup_repositories(args, tmp_path, repositories)
assert seen_since == [
("repo-one", "2026-01-01T00:00:00Z"),
("repo-two", "2025-01-01T00:00:00Z"),
]
assert (repo_one_issues / "last_update").read_text() == "2026-02-01T00:00:00Z"
assert (repo_two_issues / "last_update").read_text() == "2026-03-01T00:00:00Z"
assert not os.path.exists(tmp_path / "last_update")
def test_incremental_uses_independent_issue_and_pull_checkpoints(
create_args, tmp_path, monkeypatch
):
args = create_args(incremental=True, include_issues=True, include_pulls=True)
repository = _repo("repo-one", "2026-02-01T00:00:00Z")
repo_dir = tmp_path / "repositories" / "repo-one"
issues_dir = repo_dir / "issues"
pulls_dir = repo_dir / "pulls"
issues_dir.mkdir(parents=True)
pulls_dir.mkdir(parents=True)
(issues_dir / "last_update").write_text("2026-01-01T00:00:00Z")
(pulls_dir / "last_update").write_text("2025-01-01T00:00:00Z")
seen_since = []
def fake_backup_issues(passed_args, repo_cwd, repository, repos_template):
seen_since.append(("issues", passed_args.since))
def fake_backup_pulls(passed_args, repo_cwd, repository, repos_template):
seen_since.append(("pulls", passed_args.since))
monkeypatch.setattr(github_backup, "backup_issues", fake_backup_issues)
monkeypatch.setattr(github_backup, "backup_pulls", fake_backup_pulls)
github_backup.backup_repositories(args, tmp_path, [repository])
assert seen_since == [
("issues", "2026-01-01T00:00:00Z"),
("pulls", "2025-01-01T00:00:00Z"),
]
assert (issues_dir / "last_update").read_text() == "2026-02-01T00:00:00Z"
assert (pulls_dir / "last_update").read_text() == "2026-02-01T00:00:00Z"
def test_incremental_uses_legacy_global_last_update_for_existing_resource_backup(
create_args, tmp_path, monkeypatch
):
args = create_args(incremental=True, include_issues=True)
repository = _repo("repo-one", "2026-02-01T00:00:00Z")
(tmp_path / "last_update").write_text("2026-01-01T00:00:00Z")
issues_dir = tmp_path / "repositories" / "repo-one" / "issues"
issues_dir.mkdir(parents=True)
with open(issues_dir / "1.json", "w", encoding="utf-8") as f:
json.dump({"number": 1}, f)
seen_since = []
def fake_backup_issues(passed_args, repo_cwd, repository, repos_template):
seen_since.append(passed_args.since)
monkeypatch.setattr(github_backup, "backup_issues", fake_backup_issues)
github_backup.backup_repositories(args, tmp_path, [repository])
assert seen_since == ["2026-01-01T00:00:00Z"]
assert (issues_dir / "last_update").read_text() == "2026-02-01T00:00:00Z"
assert not os.path.exists(tmp_path / "last_update")
def test_incremental_does_not_use_legacy_global_last_update_for_new_resource_backup(
create_args, tmp_path, monkeypatch
):
args = create_args(incremental=True, include_issues=True)
repository = _repo("repo-one", "2026-02-01T00:00:00Z")
(tmp_path / "last_update").write_text("2099-01-01T00:00:00Z")
seen_since = []
def fake_backup_issues(passed_args, repo_cwd, repository, repos_template):
seen_since.append(passed_args.since)
monkeypatch.setattr(github_backup, "backup_issues", fake_backup_issues)
github_backup.backup_repositories(args, tmp_path, [repository])
assert seen_since == [None]
assert (
tmp_path / "repositories" / "repo-one" / "issues" / "last_update"
).read_text() == "2026-02-01T00:00:00Z"
assert not os.path.exists(tmp_path / "last_update")
def test_incremental_keeps_legacy_global_last_update_until_all_existing_resources_migrated(
create_args, tmp_path, monkeypatch
):
args = create_args(incremental=True, include_issues=True)
repository = _repo("repo-one", "2026-02-01T00:00:00Z")
(tmp_path / "last_update").write_text("2026-01-01T00:00:00Z")
repo_one_issues = tmp_path / "repositories" / "repo-one" / "issues"
repo_two_issues = tmp_path / "repositories" / "repo-two" / "issues"
repo_one_issues.mkdir(parents=True)
repo_two_issues.mkdir(parents=True)
with open(repo_one_issues / "1.json", "w", encoding="utf-8") as f:
json.dump({"number": 1}, f)
with open(repo_two_issues / "2.json", "w", encoding="utf-8") as f:
json.dump({"number": 2}, f)
def fake_backup_issues(passed_args, repo_cwd, repository, repos_template):
pass
monkeypatch.setattr(github_backup, "backup_issues", fake_backup_issues)
github_backup.backup_repositories(args, tmp_path, [repository])
assert (repo_one_issues / "last_update").read_text() == "2026-02-01T00:00:00Z"
assert not os.path.exists(repo_two_issues / "last_update")
assert (tmp_path / "last_update").read_text() == "2026-01-01T00:00:00Z"
def test_incremental_does_not_remove_legacy_checkpoint_without_resource_work(
create_args, tmp_path
):
args = create_args(incremental=True, include_repository=True)
repository = _repo("repo-one", "2026-02-01T00:00:00Z")
(tmp_path / "last_update").write_text("2026-01-01T00:00:00Z")
github_backup.backup_repositories(args, tmp_path, [repository])
assert (tmp_path / "last_update").read_text() == "2026-01-01T00:00:00Z"
assert not os.path.exists(
tmp_path / "repositories" / "repo-one" / "issues" / "last_update"
)
def test_repository_checkpoint_time_uses_newest_available_repo_timestamp():
repository = _repo(
"repo-one",
updated_at="2026-02-01T00:00:00Z",
pushed_at="2026-03-01T00:00:00Z",
)
assert github_backup.get_repository_checkpoint_time(repository) == (
"2026-03-01T00:00:00Z"
)
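
The checkpoint-selection behavior this test pins down can be sketched as follows. This is an illustrative re-implementation, not the real helper in `github_backup`: it picks the newest of a repository's `updated_at` and `pushed_at` timestamps.

```python
def get_repository_checkpoint_time(repository):
    """Return the newest available repo timestamp, or None if absent.

    Illustrative sketch only; the real implementation lives in
    github_backup.github_backup.
    """
    candidates = [
        repository.get(key)
        for key in ("updated_at", "pushed_at")
        if repository.get(key)
    ]
    # ISO-8601 UTC strings ("2026-03-01T00:00:00Z") sort lexicographically,
    # so max() picks the most recent timestamp without date parsing.
    return max(candidates, default=None)
```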


@@ -0,0 +1,198 @@
"""Tests for json_dump_if_changed functionality."""
import codecs
import json
import os
import tempfile
import pytest
from github_backup import github_backup
class TestJsonDumpIfChanged:
"""Test suite for json_dump_if_changed function."""
def test_writes_new_file(self):
"""Should write file when it doesn't exist."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {"key": "value", "number": 42}
result = github_backup.json_dump_if_changed(test_data, output_file)
assert result is True
assert os.path.exists(output_file)
# Verify content matches expected format
with codecs.open(output_file, "r", encoding="utf-8") as f:
content = f.read()
loaded = json.loads(content)
assert loaded == test_data
def test_skips_unchanged_file(self):
"""Should skip write when content is identical."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {"key": "value", "number": 42}
# First write
result1 = github_backup.json_dump_if_changed(test_data, output_file)
assert result1 is True
# Get the initial mtime
mtime1 = os.path.getmtime(output_file)
# Second write with same data
result2 = github_backup.json_dump_if_changed(test_data, output_file)
assert result2 is False
# File should not have been modified
mtime2 = os.path.getmtime(output_file)
assert mtime1 == mtime2
def test_writes_when_content_changed(self):
"""Should write file when content has changed."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data1 = {"key": "value1"}
test_data2 = {"key": "value2"}
# First write
result1 = github_backup.json_dump_if_changed(test_data1, output_file)
assert result1 is True
# Second write with different data
result2 = github_backup.json_dump_if_changed(test_data2, output_file)
assert result2 is True
# Verify new content
with codecs.open(output_file, "r", encoding="utf-8") as f:
loaded = json.load(f)
assert loaded == test_data2
def test_uses_consistent_formatting(self):
"""Should use same JSON formatting as json_dump."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {"z": "last", "a": "first", "m": "middle"}
github_backup.json_dump_if_changed(test_data, output_file)
with codecs.open(output_file, "r", encoding="utf-8") as f:
content = f.read()
# Check for consistent formatting:
# - sorted keys
# - 4-space indent
# - comma-colon-space separator
expected = json.dumps(
test_data,
ensure_ascii=False,
sort_keys=True,
indent=4,
separators=(",", ": "),
)
assert content == expected
def test_atomic_write_always_used(self):
"""Should always use temp file and rename for atomic writes."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {"key": "value"}
result = github_backup.json_dump_if_changed(test_data, output_file)
assert result is True
assert os.path.exists(output_file)
# Temp file should not exist after atomic write
temp_file = output_file + ".temp"
assert not os.path.exists(temp_file)
# Verify content
with codecs.open(output_file, "r", encoding="utf-8") as f:
loaded = json.load(f)
assert loaded == test_data
def test_handles_unicode_content(self):
"""Should correctly handle Unicode content."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {
"emoji": "🚀",
"chinese": "你好",
"arabic": "مرحبا",
"cyrillic": "Привет",
}
result = github_backup.json_dump_if_changed(test_data, output_file)
assert result is True
# Verify Unicode is preserved
with codecs.open(output_file, "r", encoding="utf-8") as f:
loaded = json.load(f)
assert loaded == test_data
# Second write should skip
result2 = github_backup.json_dump_if_changed(test_data, output_file)
assert result2 is False
def test_handles_complex_nested_data(self):
"""Should handle complex nested data structures."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {
"users": [
{"id": 1, "name": "Alice", "tags": ["admin", "user"]},
{"id": 2, "name": "Bob", "tags": ["user"]},
],
"metadata": {"version": "1.0", "nested": {"deep": {"value": 42}}},
}
result = github_backup.json_dump_if_changed(test_data, output_file)
assert result is True
# Verify structure is preserved
with codecs.open(output_file, "r", encoding="utf-8") as f:
loaded = json.load(f)
assert loaded == test_data
def test_overwrites_on_unicode_decode_error(self):
"""Should overwrite if existing file has invalid UTF-8."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
test_data = {"key": "value"}
# Write invalid UTF-8 bytes
with open(output_file, "wb") as f:
f.write(b"\xff\xfe invalid utf-8")
# Should catch UnicodeDecodeError and overwrite
result = github_backup.json_dump_if_changed(test_data, output_file)
assert result is True
# Verify new content was written
with codecs.open(output_file, "r", encoding="utf-8") as f:
loaded = json.load(f)
assert loaded == test_data
def test_key_order_independence(self):
"""Should treat differently-ordered dicts as same if keys/values match."""
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "test.json")
# Write first dict
data1 = {"z": 1, "a": 2, "m": 3}
github_backup.json_dump_if_changed(data1, output_file)
# Try to write same data but different order
data2 = {"a": 2, "m": 3, "z": 1}
result = github_backup.json_dump_if_changed(data2, output_file)
# Should skip because content is the same (keys are sorted)
assert result is False
if __name__ == "__main__":
pytest.main([__file__, "-v"])


@@ -1,9 +1,7 @@
"""Tests for Link header pagination handling."""
import json
from unittest.mock import Mock, patch
import pytest
from unittest.mock import patch
from github_backup import github_backup
@@ -38,24 +36,9 @@ class MockHTTPResponse:
return headers
@pytest.fixture
def mock_args():
"""Mock args for retrieve_data_gen."""
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
return args
def test_cursor_based_pagination(mock_args):
def test_cursor_based_pagination(create_args):
"""Link header with 'after' cursor parameter works correctly."""
args = create_args(token_classic="fake_token")
# Simulate issues endpoint behavior: returns cursor in Link header
responses = [
@@ -77,10 +60,8 @@ def test_cursor_based_pagination(mock_args):
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
results = list(
github_backup.retrieve_data_gen(
mock_args, "https://api.github.com/repos/owner/repo/issues"
)
results = github_backup.retrieve_data(
args, "https://api.github.com/repos/owner/repo/issues"
)
# Verify all items retrieved and cursor was used in second request
@@ -89,8 +70,9 @@ def test_cursor_based_pagination(mock_args):
assert "after=ABC123" in requests_made[1]
def test_page_based_pagination(mock_args):
def test_page_based_pagination(create_args):
"""Link header with 'page' parameter works correctly."""
args = create_args(token_classic="fake_token")
# Simulate pulls/repos endpoint behavior: returns page numbers in Link header
responses = [
@@ -112,10 +94,8 @@ def test_page_based_pagination(mock_args):
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
results = list(
github_backup.retrieve_data_gen(
mock_args, "https://api.github.com/repos/owner/repo/pulls"
)
results = github_backup.retrieve_data(
args, "https://api.github.com/repos/owner/repo/pulls"
)
# Verify all items retrieved and page parameter was used (not cursor)
@@ -125,8 +105,9 @@ def test_page_based_pagination(mock_args):
assert "after" not in requests_made[1]
def test_no_link_header_stops_pagination(mock_args):
def test_no_link_header_stops_pagination(create_args):
"""Pagination stops when Link header is absent."""
args = create_args(token_classic="fake_token")
# Simulate endpoint with results that fit in a single page
responses = [
@@ -142,10 +123,8 @@ def test_no_link_header_stops_pagination(mock_args):
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
results = list(
github_backup.retrieve_data_gen(
mock_args, "https://api.github.com/repos/owner/repo/labels"
)
results = github_backup.retrieve_data(
args, "https://api.github.com/repos/owner/repo/labels"
)
# Verify pagination stopped after first request


@@ -0,0 +1,131 @@
"""Tests for incremental pull request pagination."""
import json
import os
from unittest.mock import patch
from github_backup import github_backup
class MockHTTPResponse:
def __init__(self, data, link_header=None):
self._content = json.dumps(data).encode("utf-8")
self._link_header = link_header
self._read = False
self.reason = "OK"
def getcode(self):
return 200
def read(self):
if self._read:
return b""
self._read = True
return self._content
@property
def headers(self):
headers = {"x-ratelimit-remaining": "5000"}
if self._link_header:
headers["Link"] = self._link_header
return headers
def test_backup_pulls_incremental_excludes_checkpoint_timestamp(create_args, tmp_path):
args = create_args(include_pulls=True, incremental=True)
args.since = "2026-04-26T08:13:46Z"
repository = {"full_name": "owner/repo"}
responses = [
MockHTTPResponse([]),
MockHTTPResponse(
[
{
"number": 1,
"title": "already backed up",
"updated_at": "2026-04-26T08:13:46Z",
},
],
link_header='<https://api.github.com/repos/owner/repo/pulls?per_page=100&state=closed&page=2>; rel="next"',
),
MockHTTPResponse(
[
{
"number": 0,
"title": "older pull on page 2",
"updated_at": "2026-04-25T07:00:00Z",
}
]
),
]
requests_made = []
def mock_urlopen(request, *args, **kwargs):
requests_made.append(request.get_full_url())
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
github_backup.backup_pulls(
args, tmp_path, repository, "https://api.github.com/repos"
)
assert len(requests_made) == 2
assert "state=open" in requests_made[0]
assert "state=closed" in requests_made[1]
assert all("page=2" not in url for url in requests_made)
assert not os.path.exists(tmp_path / "pulls" / "1.json")
assert not os.path.exists(tmp_path / "pulls" / "0.json")
def test_backup_pulls_incremental_stops_before_fetching_old_pages(
create_args, tmp_path
):
args = create_args(include_pulls=True, incremental=True)
args.since = "2026-04-26T08:13:46Z"
repository = {"full_name": "owner/repo"}
responses = [
MockHTTPResponse([]),
MockHTTPResponse(
[
{
"number": 2,
"title": "new pull",
"updated_at": "2026-04-26T09:00:00Z",
},
{
"number": 1,
"title": "old pull",
"updated_at": "2026-04-26T07:00:00Z",
},
],
link_header='<https://api.github.com/repos/owner/repo/pulls?per_page=100&state=closed&page=2>; rel="next"',
),
MockHTTPResponse(
[
{
"number": 0,
"title": "older pull on page 2",
"updated_at": "2026-04-25T07:00:00Z",
}
]
),
]
requests_made = []
def mock_urlopen(request, *args, **kwargs):
requests_made.append(request.get_full_url())
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
github_backup.backup_pulls(
args, tmp_path, repository, "https://api.github.com/repos"
)
assert len(requests_made) == 2
assert "state=open" in requests_made[0]
assert "state=closed" in requests_made[1]
assert all("page=2" not in url for url in requests_made)
assert os.path.exists(tmp_path / "pulls" / "2.json")
assert not os.path.exists(tmp_path / "pulls" / "1.json")
assert not os.path.exists(tmp_path / "pulls" / "0.json")
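
The early-stop logic these two tests pin down can be sketched as a simple filter. Closed pulls arrive newest-first, so once an item is at or before the `since` checkpoint, every later item (and every later page) is already backed up. The function name is illustrative, not the real API:

```python
def take_updated_since(items, since):
    """Yield items strictly newer than `since`, stopping at the first stale one.

    Sketch of the incremental cutoff the tests exercise; checkpoint-equal
    items were captured by the previous run, so "<=" (not "<") is the stop
    condition.
    """
    for item in items:
        if since is not None and item["updated_at"] <= since:
            break
        yield item
```

When the underlying pagination is lazy (a generator that only fetches the next page on demand), breaking out of this loop is what prevents the `page=2` request the tests assert never happens.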

tests/test_pull_reviews.py (new file, 237 lines)

@@ -0,0 +1,237 @@
"""Tests for pull request review backups."""
import json
import os
from github_backup import github_backup
def test_parse_args_pull_reviews_flag():
args = github_backup.parse_args(["--pull-reviews", "testuser"])
assert args.include_pull_reviews is True
def test_backup_pulls_includes_review_data(create_args, tmp_path, monkeypatch):
args = create_args(include_pulls=True, include_pull_reviews=True)
repository = {"full_name": "owner/repo"}
calls = []
def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs):
calls.append((template, query_args))
if template == "https://api.github.com/repos/owner/repo/pulls":
if query_args["state"] == "open":
return [
{
"number": 1,
"updated_at": "2026-02-01T00:00:00Z",
"title": "Add feature",
}
]
return []
if template == "https://api.github.com/repos/owner/repo/pulls/1/reviews":
return [
{
"id": 123,
"state": "APPROVED",
"body": "Looks good",
"submitted_at": "2026-02-01T00:00:00Z",
}
]
raise AssertionError("Unexpected template: {0}".format(template))
monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data)
github_backup.backup_pulls(
args, tmp_path, repository, "https://api.github.com/repos"
)
with open(tmp_path / "pulls" / "1.json", encoding="utf-8") as f:
pull = json.load(f)
assert pull["review_data"] == [
{
"body": "Looks good",
"id": 123,
"state": "APPROVED",
"submitted_at": "2026-02-01T00:00:00Z",
}
]
assert (
"https://api.github.com/repos/owner/repo/pulls/1/reviews",
None,
) in calls
def test_pull_reviews_backfill_ignores_repository_checkpoint(
create_args, tmp_path, monkeypatch
):
args = create_args(
include_pulls=True,
include_pull_reviews=True,
incremental=True,
)
args.since = "2026-01-01T00:00:00Z"
repository = {"full_name": "owner/repo"}
def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs):
if template == "https://api.github.com/repos/owner/repo/pulls":
if query_args["state"] == "open":
return [
{
"number": 1,
"updated_at": "2025-01-01T00:00:00Z",
"title": "Old pull request",
}
]
return []
if template == "https://api.github.com/repos/owner/repo/pulls/1/reviews":
return [{"id": 123, "state": "APPROVED"}]
raise AssertionError("Unexpected template: {0}".format(template))
monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data)
github_backup.backup_pulls(
args, tmp_path, repository, "https://api.github.com/repos"
)
with open(tmp_path / "pulls" / "1.json", encoding="utf-8") as f:
pull = json.load(f)
assert pull["review_data"] == [{"id": 123, "state": "APPROVED"}]
assert (tmp_path / "pulls" / "reviews_last_update").read_text() == (
"2025-01-01T00:00:00Z"
)
def test_pull_reviews_uses_review_checkpoint_when_older_than_repository_checkpoint(
create_args, tmp_path, monkeypatch
):
args = create_args(
include_pulls=True,
include_pull_reviews=True,
incremental=True,
)
args.since = "2026-01-01T00:00:00Z"
repository = {"full_name": "owner/repo"}
pulls_dir = tmp_path / "pulls"
pulls_dir.mkdir()
(pulls_dir / "reviews_last_update").write_text("2025-01-01T00:00:00Z")
def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs):
if template == "https://api.github.com/repos/owner/repo/pulls":
if query_args["state"] == "open":
return [
{
"number": 1,
"updated_at": "2025-06-01T00:00:00Z",
"title": "Review changed while feature was disabled",
},
{
"number": 2,
"updated_at": "2024-12-01T00:00:00Z",
"title": "Too old",
},
]
return []
if template == "https://api.github.com/repos/owner/repo/pulls/1/reviews":
return [{"id": 123, "state": "COMMENTED"}]
raise AssertionError("Unexpected template: {0}".format(template))
monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data)
github_backup.backup_pulls(
args, tmp_path, repository, "https://api.github.com/repos"
)
assert os.path.exists(tmp_path / "pulls" / "1.json")
assert not os.path.exists(tmp_path / "pulls" / "2.json")
assert (tmp_path / "pulls" / "reviews_last_update").read_text() == (
"2025-06-01T00:00:00Z"
)
def test_pull_reviews_preserves_existing_optional_pull_data(
create_args, tmp_path, monkeypatch
):
args = create_args(include_pulls=True, include_pull_reviews=True)
repository = {"full_name": "owner/repo"}
pulls_dir = tmp_path / "pulls"
pulls_dir.mkdir()
with open(pulls_dir / "1.json", "w", encoding="utf-8") as f:
json.dump(
{
"number": 1,
"updated_at": "2026-01-01T00:00:00Z",
"comment_data": [{"id": 10, "body": "inline comment"}],
"comment_regular_data": [{"id": 11, "body": "regular comment"}],
"commit_data": [{"sha": "abc"}],
},
f,
)
def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs):
if template == "https://api.github.com/repos/owner/repo/pulls":
if query_args["state"] == "open":
return [
{
"number": 1,
"updated_at": "2026-02-01T00:00:00Z",
"title": "Add reviews",
}
]
return []
if template == "https://api.github.com/repos/owner/repo/pulls/1/reviews":
return [{"id": 123, "state": "APPROVED"}]
raise AssertionError("Unexpected template: {0}".format(template))
monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data)
github_backup.backup_pulls(
args, tmp_path, repository, "https://api.github.com/repos"
)
with open(pulls_dir / "1.json", encoding="utf-8") as f:
pull = json.load(f)
assert pull["review_data"] == [{"id": 123, "state": "APPROVED"}]
assert pull["comment_data"] == [{"id": 10, "body": "inline comment"}]
assert pull["comment_regular_data"] == [{"id": 11, "body": "regular comment"}]
assert pull["commit_data"] == [{"sha": "abc"}]
def test_pull_reviews_does_not_advance_checkpoint_on_review_error(
create_args, tmp_path, monkeypatch
):
args = create_args(
include_pulls=True,
include_pull_reviews=True,
incremental=True,
)
args.since = "2026-01-01T00:00:00Z"
repository = {"full_name": "owner/repo"}
pulls_dir = tmp_path / "pulls"
pulls_dir.mkdir()
(pulls_dir / "reviews_last_update").write_text("2025-01-01T00:00:00Z")
def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs):
if template == "https://api.github.com/repos/owner/repo/pulls":
if query_args["state"] == "open":
return [
{
"number": 1,
"updated_at": "2025-06-01T00:00:00Z",
"title": "Review retrieval fails",
}
]
return []
if template == "https://api.github.com/repos/owner/repo/pulls/1/reviews":
raise Exception("temporary API failure")
raise AssertionError("Unexpected template: {0}".format(template))
monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data)
github_backup.backup_pulls(
args, tmp_path, repository, "https://api.github.com/repos"
)
assert (pulls_dir / "reviews_last_update").read_text() == "2025-01-01T00:00:00Z"
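
The checkpoint selection implied by the tests above can be sketched as follows (names are illustrative). When reviews were never backed up, the repository-level checkpoint is ignored so older pulls get their reviews backfilled; otherwise the older of the two checkpoints wins, covering reviews that changed while the feature was disabled:

```python
def effective_reviews_since(repo_since, reviews_checkpoint):
    """Pick the cutoff for fetching pull request reviews.

    Sketch only; the real logic lives in github_backup.backup_pulls.
    """
    if reviews_checkpoint is None:
        return None  # backfill: fetch reviews for every pull
    if repo_since is None:
        return reviews_checkpoint
    # ISO-8601 UTC strings compare lexicographically, so min() is the older.
    return min(repo_since, reviews_checkpoint)
```

The last test above adds the complementary guarantee: `reviews_last_update` is only advanced after review retrieval succeeds, so a transient API failure never skips reviews on the next run.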

tests/test_releases.py (new file, 95 lines)

@@ -0,0 +1,95 @@
"""Tests for release backup behavior."""
from github_backup import github_backup
def test_backup_releases_uses_embedded_assets_without_extra_asset_list_request(
create_args, tmp_path, monkeypatch
):
args = create_args(include_releases=True, include_assets=True)
repository = {"full_name": "owner/repo", "name": "repo"}
calls = []
downloads = []
def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs):
calls.append(template)
if template == "https://api.github.com/repos/owner/repo/releases":
return [
{
"tag_name": "v1.0.0",
"created_at": "2026-01-01T00:00:00Z",
"updated_at": "2026-01-01T00:00:00Z",
"prerelease": False,
"draft": False,
"assets_url": "https://api.github.com/repos/owner/repo/releases/1/assets",
"assets": [
{
"name": "artifact.zip",
"url": "https://api.github.com/repos/owner/repo/releases/assets/1",
}
],
}
]
raise AssertionError("Unexpected API request: {0}".format(template))
def fake_download_file(url, path, auth, as_app=False, fine=False):
downloads.append((url, path))
monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data)
monkeypatch.setattr(github_backup, "download_file", fake_download_file)
github_backup.backup_releases(
args,
tmp_path,
repository,
"https://api.github.com/repos",
include_assets=True,
)
assert calls == ["https://api.github.com/repos/owner/repo/releases"]
assert downloads == [
(
"https://api.github.com/repos/owner/repo/releases/assets/1",
str(tmp_path / "releases" / "v1.0.0" / "artifact.zip"),
)
]
def test_backup_releases_falls_back_to_assets_url_when_assets_missing(
create_args, tmp_path, monkeypatch
):
args = create_args(include_releases=True, include_assets=True)
repository = {"full_name": "owner/repo", "name": "repo"}
calls = []
def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs):
calls.append(template)
if template == "https://api.github.com/repos/owner/repo/releases":
return [
{
"tag_name": "v1.0.0",
"created_at": "2026-01-01T00:00:00Z",
"updated_at": "2026-01-01T00:00:00Z",
"prerelease": False,
"draft": False,
"assets_url": "https://api.github.com/repos/owner/repo/releases/1/assets",
}
]
if template == "https://api.github.com/repos/owner/repo/releases/1/assets":
return []
raise AssertionError("Unexpected API request: {0}".format(template))
monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data)
github_backup.backup_releases(
args,
tmp_path,
repository,
"https://api.github.com/repos",
include_assets=True,
)
assert calls == [
"https://api.github.com/repos/owner/repo/releases",
"https://api.github.com/repos/owner/repo/releases/1/assets",
]
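
The asset-listing behavior these two tests verify reduces to one decision: the releases list endpoint already embeds each release's assets, so a separate request to `assets_url` is only needed when that key is absent. A sketch, with `fetch` standing in for the real `retrieve_data` call:

```python
def release_assets(release, fetch):
    """Return a release's assets, avoiding a redundant API request.

    Illustrative sketch; `fetch` is a hypothetical callable taking a URL.
    """
    if "assets" in release:
        return release["assets"]  # embedded in the list response: no extra call
    return fetch(release["assets_url"])  # fallback for responses without it
```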

tests/test_retrieve_data.py (new file, 529 lines)

@@ -0,0 +1,529 @@
"""Tests for retrieve_data function."""
import json
import logging
import socket
from unittest.mock import Mock, patch
from urllib.error import HTTPError, URLError
import pytest
from github_backup import github_backup
from github_backup.github_backup import (
calculate_retry_delay,
make_request_with_retry,
)
# Default retry count used in tests (matches argparse default)
# With max_retries=5, total attempts = 6 (1 initial + 5 retries)
DEFAULT_MAX_RETRIES = 5
class TestCalculateRetryDelay:
def test_respects_retry_after_header(self):
headers = {"retry-after": "30"}
assert calculate_retry_delay(0, headers) == 30
def test_respects_rate_limit_reset(self):
import time
import calendar
# Set reset time 60 seconds in the future
future_reset = calendar.timegm(time.gmtime()) + 60
headers = {"x-ratelimit-remaining": "0", "x-ratelimit-reset": str(future_reset)}
delay = calculate_retry_delay(0, headers)
# Should be approximately 60 seconds (with some tolerance for execution time)
assert 55 <= delay <= 65
def test_exponential_backoff(self):
delay_0 = calculate_retry_delay(0, {})
delay_1 = calculate_retry_delay(1, {})
delay_2 = calculate_retry_delay(2, {})
# Base delay is 1s, so delays should be roughly 1, 2, 4 (plus jitter)
assert 0.9 <= delay_0 <= 1.2 # ~1s + up to 10% jitter
assert 1.8 <= delay_1 <= 2.4 # ~2s + up to 10% jitter
assert 3.6 <= delay_2 <= 4.8 # ~4s + up to 10% jitter
def test_max_delay_cap(self):
# Very high attempt number should not exceed 120s + jitter
delay = calculate_retry_delay(100, {})
assert delay <= 120 * 1.1 # 120s max + 10% jitter
def test_minimum_rate_limit_delay(self):
import time
import calendar
# Set reset time in the past (already reset)
past_reset = calendar.timegm(time.gmtime()) - 100
headers = {"x-ratelimit-remaining": "0", "x-ratelimit-reset": str(past_reset)}
delay = calculate_retry_delay(0, headers)
# Should be minimum 10 seconds even if reset time is in past
assert delay >= 10
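
The delay policy the class above tests can be sketched as an illustrative re-implementation. The constants (1 s base, 120 s cap, 10 s rate-limit floor, 10% jitter) are read off the assertions; the real function is in `github_backup`:

```python
import calendar
import random
import time


def calculate_retry_delay(attempt, headers):
    """Seconds to wait before retry `attempt` (0-based). Sketch only."""
    if "retry-after" in headers:
        return int(headers["retry-after"])  # server said exactly how long
    if headers.get("x-ratelimit-remaining") == "0":
        reset = int(headers.get("x-ratelimit-reset", "0"))
        # Wait until the rate-limit window resets, never less than 10 s,
        # even if the reset time is already in the past.
        return max(reset - calendar.timegm(time.gmtime()), 10)
    # Exponential backoff: 1, 2, 4, ... seconds, capped at 120 s,
    # plus up to 10% random jitter to avoid thundering herds.
    base = min(2 ** attempt, 120)
    return base + base * random.uniform(0, 0.1)
```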
class TestRetrieveDataRetry:
"""Tests for retry behavior in retrieve_data."""
def test_json_parse_error_retries_and_fails(self, create_args):
"""HTTP 200 with invalid JSON should retry and eventually fail."""
args = create_args(token_classic="fake_token")
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = b"not valid json {"
mock_response.headers = {"x-ratelimit-remaining": "5000"}
call_count = 0
def mock_make_request(*a, **kw):
nonlocal call_count
call_count += 1
return mock_response
with patch(
"github_backup.github_backup.make_request_with_retry",
side_effect=mock_make_request,
):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
): # No delay in tests
with pytest.raises(Exception) as exc_info:
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/repo/issues"
)
assert "Failed to read response after" in str(exc_info.value)
assert (
call_count == DEFAULT_MAX_RETRIES + 1
) # 1 initial + 5 retries = 6 attempts
def test_json_parse_error_recovers_on_retry(self, create_args):
"""HTTP 200 with invalid JSON should succeed if retry returns valid JSON."""
args = create_args(token_classic="fake_token")
bad_response = Mock()
bad_response.getcode.return_value = 200
bad_response.read.return_value = b"not valid json {"
bad_response.headers = {"x-ratelimit-remaining": "5000"}
good_response = Mock()
good_response.getcode.return_value = 200
good_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8")
good_response.headers = {"x-ratelimit-remaining": "5000", "Link": ""}
responses = [bad_response, bad_response, good_response]
call_count = 0
def mock_make_request(*a, **kw):
nonlocal call_count
result = responses[call_count]
call_count += 1
return result
with patch(
"github_backup.github_backup.make_request_with_retry",
side_effect=mock_make_request,
):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
result = github_backup.retrieve_data(
args, "https://api.github.com/repos/test/repo/issues"
)
assert result == [{"id": 1}]
assert call_count == 3 # Failed twice, succeeded on third
def test_http_error_raises_exception(self, create_args):
"""Non-success HTTP status codes should raise Exception."""
args = create_args(token_classic="fake_token")
mock_response = Mock()
mock_response.getcode.return_value = 404
mock_response.read.return_value = b'{"message": "Not Found"}'
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Not Found"
with patch(
"github_backup.github_backup.make_request_with_retry",
return_value=mock_response,
):
with pytest.raises(Exception) as exc_info:
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/notfound/issues"
)
assert not isinstance(
exc_info.value, github_backup.RepositoryUnavailableError
)
assert "404" in str(exc_info.value)
class TestMakeRequestWithRetry:
"""Tests for HTTP error retry behavior in make_request_with_retry."""
def test_502_error_retries_and_succeeds(self):
"""HTTP 502 should retry and succeed if subsequent request works."""
good_response = Mock()
good_response.read.return_value = b'{"ok": true}'
call_count = 0
fail_count = DEFAULT_MAX_RETRIES # Fail all retries, succeed on last attempt
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count <= fail_count:
raise HTTPError(
url="https://api.github.com/test",
code=502,
msg="Bad Gateway",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert (
call_count == DEFAULT_MAX_RETRIES + 1
) # 1 initial + 5 retries = 6 attempts
def test_503_error_retries_until_exhausted(self):
"""HTTP 503 should make 1 initial + DEFAULT_MAX_RETRIES retry attempts then raise."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise HTTPError(
url="https://api.github.com/test",
code=503,
msg="Service Unavailable",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 503
assert (
call_count == DEFAULT_MAX_RETRIES + 1
) # 1 initial + 5 retries = 6 attempts
def test_404_error_not_retried(self):
"""HTTP 404 should not be retried - raise immediately."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise HTTPError(
url="https://api.github.com/test",
code=404,
msg="Not Found",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 404
assert call_count == 1 # No retries
def test_rate_limit_403_retried_when_remaining_zero(self):
"""HTTP 403 with x-ratelimit-remaining=0 should retry."""
good_response = Mock()
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count == 1:
raise HTTPError(
url="https://api.github.com/test",
code=403,
msg="Forbidden",
hdrs={"x-ratelimit-remaining": "0"},
fp=None,
)
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert call_count == 2
def test_403_not_retried_when_remaining_nonzero(self):
"""HTTP 403 with x-ratelimit-remaining>0 should not retry (permission error)."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise HTTPError(
url="https://api.github.com/test",
code=403,
msg="Forbidden",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 403
assert call_count == 1 # No retries
def test_451_error_not_retried(self):
"""HTTP 451 should not be retried - raise immediately."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise HTTPError(
url="https://api.github.com/test",
code=451,
msg="Unavailable For Legal Reasons",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 451
assert call_count == 1 # No retries
def test_connection_error_retries_and_succeeds(self):
"""URLError (connection error) should retry and succeed if subsequent request works."""
good_response = Mock()
call_count = 0
fail_count = DEFAULT_MAX_RETRIES # Fail all retries, succeed on last attempt
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count <= fail_count:
raise URLError("Connection refused")
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert (
call_count == DEFAULT_MAX_RETRIES + 1
) # 1 initial + 5 retries = 6 attempts
def test_socket_error_retries_until_exhausted(self):
"""socket.error should make 1 initial + DEFAULT_MAX_RETRIES retry attempts then raise."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise socket.error("Connection reset by peer")
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
with pytest.raises(socket.error):
make_request_with_retry(Mock(), None)
assert (
call_count == DEFAULT_MAX_RETRIES + 1
) # 1 initial + 5 retries = 6 attempts
class TestRetrieveGraphqlDataLogging:
"""Tests for GraphQL request logging."""
def test_logs_graphql_context(self, create_args, caplog):
args = create_args(token_classic="fake_token")
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = json.dumps({"data": {}}).encode("utf-8")
mock_response.headers = {"x-ratelimit-remaining": "5000"}
caplog.set_level(logging.INFO, logger="github_backup.github_backup")
with patch(
"github_backup.github_backup.make_request_with_retry",
return_value=mock_response,
):
github_backup.retrieve_graphql_data(
args,
"query { viewer { login } }",
log_context="discussion owner/repo#1",
)
assert (
"Requesting https://api.github.com/graphql (discussion owner/repo#1)"
in caplog.text
)
class TestRetrieveDataThrottling:
"""Tests for throttling behavior in retrieve_data."""
def test_throttling_pauses_when_rate_limit_low(self, create_args):
"""Should pause when x-ratelimit-remaining is at or below throttle_limit."""
args = create_args(
token_classic="fake_token",
throttle_limit=10,
throttle_pause=5,
)
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8")
mock_response.headers = {
"x-ratelimit-remaining": "5",
"Link": "",
} # Below throttle_limit
with patch(
"github_backup.github_backup.make_request_with_retry",
return_value=mock_response,
):
with patch("github_backup.github_backup.time.sleep") as mock_sleep:
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/repo/issues"
)
mock_sleep.assert_called_once_with(5) # throttle_pause value
class TestRetrieveDataSingleItem:
"""Tests for single item (dict) responses in retrieve_data."""
def test_dict_response_returned_as_list(self, create_args):
"""Single dict response should be returned as a list with one item."""
args = create_args(token_classic="fake_token")
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = json.dumps(
{"login": "testuser", "id": 123}
).encode("utf-8")
mock_response.headers = {"x-ratelimit-remaining": "5000", "Link": ""}
with patch(
"github_backup.github_backup.make_request_with_retry",
return_value=mock_response,
):
result = github_backup.retrieve_data(
args, "https://api.github.com/user"
)
assert result == [{"login": "testuser", "id": 123}]
class TestRetriesCliArgument:
"""Tests for --retries CLI argument validation and behavior."""
def test_retries_argument_accepted(self):
"""--retries flag should be accepted and parsed correctly."""
args = github_backup.parse_args(["--retries", "3", "testuser"])
assert args.max_retries == 3
def test_retries_default_value(self):
"""--retries should default to 5 if not specified."""
args = github_backup.parse_args(["testuser"])
assert args.max_retries == 5
def test_retries_zero_is_valid(self):
"""--retries 0 should be valid and mean 1 attempt (no retries)."""
args = github_backup.parse_args(["--retries", "0", "testuser"])
assert args.max_retries == 0
def test_retries_negative_rejected(self):
"""--retries with negative value should be rejected by argparse."""
with pytest.raises(SystemExit):
github_backup.parse_args(["--retries", "-1", "testuser"])
def test_retries_non_integer_rejected(self):
"""--retries with non-integer value should be rejected by argparse."""
with pytest.raises(SystemExit):
github_backup.parse_args(["--retries", "abc", "testuser"])
def test_retries_one_with_transient_error_succeeds(self):
"""--retries 1 should allow one retry after initial failure."""
good_response = Mock()
good_response.read.return_value = b'{"ok": true}'
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count == 1:
raise HTTPError(
url="https://api.github.com/test",
code=502,
msg="Bad Gateway",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
result = make_request_with_retry(Mock(), None, max_retries=1)
assert result == good_response
assert call_count == 2 # 1 initial + 1 retry = 2 attempts
def test_custom_retry_count_limits_attempts(self, create_args):
"""Custom --retries value should limit actual retry attempts."""
args = create_args(
token_classic="fake_token",
max_retries=2, # 2 retries = 3 total attempts (1 initial + 2 retries)
)
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = b"not valid json {"
mock_response.headers = {"x-ratelimit-remaining": "5000"}
call_count = 0
def mock_make_request(*args, **kwargs):
nonlocal call_count
call_count += 1
return mock_response
with patch(
"github_backup.github_backup.make_request_with_retry",
side_effect=mock_make_request,
):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
with pytest.raises(Exception) as exc_info:
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/repo/issues"
)
assert "Failed to read response after 3 attempts" in str(exc_info.value)
assert call_count == 3 # 1 initial + 2 retries = 3 attempts

View File

@@ -0,0 +1,272 @@
"""Tests for --skip-assets-on flag behavior (issue #135)."""
import pytest
from unittest.mock import patch
from github_backup import github_backup
class TestSkipAssetsOn:
"""Test suite for --skip-assets-on flag.
Issue #135: Allow skipping asset downloads for specific repositories
while still backing up release metadata.
"""
def _create_mock_repository(self, name="test-repo", owner="testuser"):
"""Create a mock repository object."""
return {
"name": name,
"full_name": f"{owner}/{name}",
"owner": {"login": owner},
"private": False,
"fork": False,
"has_wiki": False,
}
def _create_mock_release(self, tag="v1.0.0"):
"""Create a mock release object."""
return {
"tag_name": tag,
"name": tag,
"prerelease": False,
"draft": False,
"assets_url": f"https://api.github.com/repos/testuser/test-repo/releases/{tag}/assets",
}
def _create_mock_asset(self, name="asset.zip"):
"""Create a mock asset object."""
return {
"name": name,
"url": f"https://api.github.com/repos/testuser/test-repo/releases/assets/{name}",
}
class TestSkipAssetsOnArgumentParsing(TestSkipAssetsOn):
"""Tests for --skip-assets-on argument parsing."""
def test_skip_assets_on_not_set_defaults_to_none(self):
"""When --skip-assets-on is not specified, it should default to None."""
args = github_backup.parse_args(["testuser"])
assert args.skip_assets_on is None
def test_skip_assets_on_single_repo(self):
"""Single --skip-assets-on should create list with one item."""
args = github_backup.parse_args(["testuser", "--skip-assets-on", "big-repo"])
assert args.skip_assets_on == ["big-repo"]
def test_skip_assets_on_multiple_repos(self):
"""Multiple repos can be specified space-separated (like --exclude)."""
args = github_backup.parse_args(
[
"testuser",
"--skip-assets-on",
"big-repo",
"another-repo",
"owner/third-repo",
]
)
assert args.skip_assets_on == ["big-repo", "another-repo", "owner/third-repo"]
class TestSkipAssetsOnBehavior(TestSkipAssetsOn):
"""Tests for --skip-assets-on behavior in backup_releases."""
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_assets_downloaded_when_not_skipped(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args
):
"""Assets should be downloaded when repo is not in skip list."""
args = create_args(skip_assets_on=[])
repository = self._create_mock_repository(name="normal-repo")
release = self._create_mock_release()
asset = self._create_mock_asset()
mock_json_dump.return_value = True
mock_retrieve.side_effect = [
[release], # First call: get releases
[asset], # Second call: get assets
]
with patch("os.path.join", side_effect=lambda *args: "/".join(args)):
github_backup.backup_releases(
args,
"/tmp/backup/repositories/normal-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should have been called for the asset
mock_download.assert_called_once()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_assets_skipped_when_repo_name_matches(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args
):
"""Assets should be skipped when repo name is in skip list."""
args = create_args(skip_assets_on=["big-repo"])
repository = self._create_mock_repository(name="big-repo")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_assets_skipped_when_full_name_matches(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args
):
"""Assets should be skipped when owner/repo format matches."""
args = create_args(skip_assets_on=["otheruser/big-repo"])
repository = self._create_mock_repository(name="big-repo", owner="otheruser")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_case_insensitive_matching(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args
):
"""Skip matching should be case-insensitive."""
# User types uppercase, repo name is lowercase
args = create_args(skip_assets_on=["BIG-REPO"])
repository = self._create_mock_repository(name="big-repo")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called (case-insensitive match)
assert not mock_download.called
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_multiple_skip_repos(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args
):
"""Multiple repos in skip list should all be skipped."""
args = create_args(skip_assets_on=["repo1", "repo2", "repo3"])
repository = self._create_mock_repository(name="repo2")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/repo2",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_release_metadata_still_saved_when_assets_skipped(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args
):
"""Release JSON should still be saved even when assets are skipped."""
args = create_args(skip_assets_on=["big-repo"])
repository = self._create_mock_repository(name="big-repo")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# json_dump_if_changed should have been called for release metadata
mock_json_dump.assert_called_once()
# But download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_non_matching_repo_still_downloads_assets(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args
):
"""Repos not in skip list should still download assets."""
args = create_args(skip_assets_on=["other-repo"])
repository = self._create_mock_repository(name="normal-repo")
release = self._create_mock_release()
asset = self._create_mock_asset()
mock_json_dump.return_value = True
mock_retrieve.side_effect = [
[release], # First call: get releases
[asset], # Second call: get assets
]
with patch("os.path.join", side_effect=lambda *args: "/".join(args)):
github_backup.backup_releases(
args,
"/tmp/backup/repositories/normal-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file SHOULD have been called
mock_download.assert_called_once()
if __name__ == "__main__":
pytest.main([__file__, "-v"])

View File

@@ -0,0 +1,201 @@
"""Tests for --starred-skip-size-over flag behavior (issue #108)."""
import pytest
from github_backup import github_backup
class TestStarredSkipSizeOverArgumentParsing:
"""Tests for --starred-skip-size-over argument parsing."""
def test_starred_skip_size_over_not_set_defaults_to_none(self):
"""When --starred-skip-size-over is not specified, it should default to None."""
args = github_backup.parse_args(["testuser"])
assert args.starred_skip_size_over is None
def test_starred_skip_size_over_accepts_integer(self):
"""--starred-skip-size-over should accept an integer value."""
args = github_backup.parse_args(["testuser", "--starred-skip-size-over", "500"])
assert args.starred_skip_size_over == 500
def test_starred_skip_size_over_rejects_non_integer(self):
"""--starred-skip-size-over should reject non-integer values."""
with pytest.raises(SystemExit):
github_backup.parse_args(["testuser", "--starred-skip-size-over", "abc"])
class TestStarredSkipSizeOverFiltering:
"""Tests for --starred-skip-size-over filtering behavior.
Issue #108: Allow restricting size of starred repositories before cloning.
The size is based on the GitHub API's 'size' field (in KB), but the CLI
argument accepts MB for user convenience.
"""
def test_starred_repo_under_limit_is_kept(self, create_args):
"""Starred repos under the size limit should be kept."""
args = create_args(starred_skip_size_over=500)
repos = [
{
"name": "small-repo",
"owner": {"login": "otheruser"},
"size": 100 * 1024, # 100 MB in KB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert result[0]["name"] == "small-repo"
def test_starred_repo_over_limit_is_filtered(self, create_args):
"""Starred repos over the size limit should be filtered out."""
args = create_args(starred_skip_size_over=500)
repos = [
{
"name": "huge-repo",
"owner": {"login": "otheruser"},
"size": 600 * 1024, # 600 MB in KB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 0
def test_own_repo_over_limit_is_kept(self, create_args):
"""User's own repos should not be affected by the size limit."""
args = create_args(starred_skip_size_over=500)
repos = [
{
"name": "my-huge-repo",
"owner": {"login": "testuser"},
"size": 600 * 1024, # 600 MB in KB
# No is_starred flag - this is the user's own repo
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert result[0]["name"] == "my-huge-repo"
def test_starred_repo_at_exact_limit_is_kept(self, create_args):
"""Starred repos at exactly the size limit should be kept."""
args = create_args(starred_skip_size_over=500)
repos = [
{
"name": "exact-limit-repo",
"owner": {"login": "otheruser"},
"size": 500 * 1024, # Exactly 500 MB in KB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert result[0]["name"] == "exact-limit-repo"
def test_mixed_repos_filtered_correctly(self, create_args):
"""Mix of own and starred repos should be filtered correctly."""
args = create_args(starred_skip_size_over=500)
repos = [
{
"name": "my-huge-repo",
"owner": {"login": "testuser"},
"size": 1000 * 1024, # 1 GB - own repo, should be kept
},
{
"name": "starred-small",
"owner": {"login": "otheruser"},
"size": 100 * 1024, # 100 MB - under limit
"is_starred": True,
},
{
"name": "starred-huge",
"owner": {"login": "anotheruser"},
"size": 2000 * 1024, # 2 GB - over limit
"is_starred": True,
},
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 2
names = [r["name"] for r in result]
assert "my-huge-repo" in names
assert "starred-small" in names
assert "starred-huge" not in names
def test_no_size_limit_keeps_all_starred(self, create_args):
"""When no size limit is set, all starred repos should be kept."""
args = create_args(starred_skip_size_over=None)
repos = [
{
"name": "huge-starred-repo",
"owner": {"login": "otheruser"},
"size": 10000 * 1024, # 10 GB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
def test_repo_without_size_field_is_kept(self, create_args):
"""Repos without a size field should be kept (size defaults to 0)."""
args = create_args(starred_skip_size_over=500)
repos = [
{
"name": "no-size-repo",
"owner": {"login": "otheruser"},
"is_starred": True,
# No size field
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
def test_zero_value_warns_and_is_ignored(self, create_args, caplog):
"""Zero value should warn and keep all repos."""
args = create_args(starred_skip_size_over=0)
repos = [
{
"name": "huge-starred-repo",
"owner": {"login": "otheruser"},
"size": 10000 * 1024, # 10 GB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert "must be greater than 0" in caplog.text
def test_negative_value_warns_and_is_ignored(self, create_args, caplog):
"""Negative value should warn and keep all repos."""
args = create_args(starred_skip_size_over=-5)
repos = [
{
"name": "huge-starred-repo",
"owner": {"login": "otheruser"},
"size": 10000 * 1024, # 10 GB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert "must be greater than 0" in caplog.text
if __name__ == "__main__":
pytest.main([__file__, "-v"])