Compare commits

40 Commits

Author SHA1 Message Date
GitHub Action
c70cc43f57 Release version 0.58.0 2025-12-16 15:17:23 +00:00
Jose Diaz-Gonzalez
27d3fcdafa Merge pull request #471 from Iamrodos/fix/retry-logic
Fix retry logic for HTTP 5xx errors and network failures
2025-12-16 10:16:48 -05:00
Rodos
46140b0ff1 Fix retry logic for HTTP 5xx errors and network failures
Refactors error handling to retry all 5xx errors (not just 502), network errors (URLError, socket.error, IncompleteRead), and JSON parse errors with exponential backoff and jitter. Respects retry-after and rate limit headers per GitHub API requirements. Consolidates retry logic into make_request_with_retry() wrapper and adds clear logging for retry attempts and failures. Removes dead code from 2016 (errors list, _request_http_error, _request_url_error) that was intentionally disabled in commit 1e5a9048 to fix #29.

Fixes #140, #110, #138
2025-12-16 21:55:47 +11:00
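The retry behaviour described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's `make_request_with_retry()`: the names `backoff_delay` and `retry_call`, the `retry_on` tuple, and the injectable `sleep` parameter are all hypothetical, and the 1s base, 120s cap, and 10% jitter are taken from the comment in the `calculate_retry_delay` hunk of this diff.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=120.0, jitter=0.1):
    """Exponential delay for a zero-based attempt, capped, plus up to `jitter` fraction extra."""
    delay = min(base * (2 ** attempt), cap)
    return delay + random.uniform(0, delay * jitter)


def retry_call(func, max_retries=5, retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a retryable exception, wait and try again, up to max_retries attempts."""
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(backoff_delay(attempt))
```

`OSError` is used as the default here because `urllib.error.URLError` and `socket.error` are both `OSError` subclasses in Python 3; a real wrapper would also need to honour the `retry-after` and rate limit headers mentioned above.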
Jose Diaz-Gonzalez
02dd902b67 Merge pull request #470 from Iamrodos/chore/cleanup-release-requirements
chore: remove transitive deps from release-requirements.txt
2025-12-12 21:51:24 -05:00
Rodos
241949137d chore: remove transitive deps from release-requirements.txt 2025-12-13 11:22:53 +11:00
Jose Diaz-Gonzalez
1155da849d Merge pull request #469 from josegonzalez/dependabot/pip/python-packages-3c63e8caab
chore(deps): bump urllib3 from 2.6.1 to 2.6.2 in the python-packages group
2025-12-12 16:39:50 -05:00
dependabot[bot]
59a70ff11a chore(deps): bump urllib3 in the python-packages group
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).


Updates `urllib3` from 2.6.1 to 2.6.2
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.1...2.6.2)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-12 13:09:29 +00:00
GitHub Action
ba852b5830 Release version 0.57.0 2025-12-12 11:07:14 +00:00
Jose Diaz-Gonzalez
934ee4b14b Merge pull request #467 from Iamrodos/docs/187-189-auth-docs
Add GitHub Apps documentation and stdin token example
2025-12-12 06:06:30 -05:00
Jose Diaz-Gonzalez
37a0c5c123 Merge pull request #468 from Iamrodos/feature/135-skip-assets-on
Add --skip-assets-on flag to skip release asset downloads (#135)
2025-12-12 06:05:47 -05:00
Rodos
f6e2f40b09 Add --skip-assets-on flag to skip release asset downloads (#135)
Allow users to skip downloading release assets for specific repositories
while still backing up release metadata. Useful for starred repos with
large assets (e.g. syncthing with 27GB+).

Usage: --skip-assets-on repo1 repo2 owner/repo3

Features:
- Space-separated repos (consistent with --exclude)
- Case-insensitive matching
- Supports both repo name and owner/repo format
2025-12-12 16:21:52 +11:00
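The matching rules listed in this commit message (case-insensitive, accepting either a bare repo name or `owner/repo`) could look roughly like the sketch below. The function name `should_skip_assets` and its arguments are hypothetical and chosen for illustration; the project's own matching code is not shown on this page.

```python
def should_skip_assets(repo_full_name, skip_list):
    """Return True if the repository matches an entry by name or owner/name, ignoring case."""
    if not skip_list:
        return False
    full = repo_full_name.lower()
    name = full.split("/", 1)[-1]  # part after the owner, or the whole string if no slash
    wanted = {entry.lower() for entry in skip_list}
    return full in wanted or name in wanted
```

With this approach a bare entry such as `repo1` matches that repository name under any owner, while an `owner/repo3` entry only matches that exact owner.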
Rodos
ef990483e2 Add GitHub Apps documentation and remove outdated header
- Add GitHub Apps authentication section with setup steps
  and CI/CD workflow example using actions/create-github-app-token
- Remove outdated machine-man-preview header (graduated 2020)

Closes #189
2025-12-12 10:25:49 +11:00
Rodos
3a513b6646 docs: add stdin token example to README
Add example showing how to pipe a token from stdin using
file:///dev/stdin to avoid storing tokens in environment
variables or command history.

Closes #187
2025-12-12 09:55:13 +11:00
GitHub Action
2bb83d6d8b Release version 0.56.0 2025-12-11 16:50:28 +00:00
Jose Diaz-Gonzalez
8fcc142621 Merge pull request #465 from Iamrodos/fix/379-lfs-clone-deprecated
fix: replace deprecated git lfs clone with git clone + git lfs fetch --all
2025-12-11 11:49:53 -05:00
Jose Diaz-Gonzalez
7615ce6102 Merge pull request #464 from Iamrodos/fix/246-restore-docs
docs: clarify no inbuilt restore and GitHub API limitations
2025-12-11 11:49:39 -05:00
Jose Diaz-Gonzalez
3f1ef821c3 Merge pull request #466 from Iamrodos/fix/112-windows-support
fix: add Windows support with entry_points and os.replace
2025-12-11 11:48:59 -05:00
Rodos
3684756eaa fix: add Windows support with entry_points and os.replace
- Replace os.rename() with os.replace() for atomic file operations
  on Windows (os.rename fails if destination exists on Windows)
- Add entry_points console_scripts for proper .exe generation on Windows
- Create github_backup/cli.py with main() entry point
- Add github_backup/__main__.py for python -m github_backup support
- Keep bin/github-backup as thin wrapper for backwards compatibility

Closes #112
2025-12-11 22:03:45 +11:00
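As a rough illustration of why `os.replace()` matters here: `os.rename()` raises an error on Windows when the destination already exists, whereas `os.replace()` overwrites it on every platform. The helper below is a sketch only; `write_atomically` is a hypothetical name, and the write-to-temp-then-swap pattern is an assumption about how such a call is typically used, not a copy of the project's code.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to a temporary file in the target directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Overwrites an existing destination, including on Windows
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the swap
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Creating the temporary file in the same directory keeps the swap on a single filesystem, which is what lets the replacement happen in one step.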
Rodos
e745b55755 fix: replace deprecated git lfs clone with git clone + git lfs fetch --all
git lfs clone is deprecated - modern git clone handles LFS automatically.
Using git lfs fetch --all ensures all LFS objects across all refs are
backed up, matching the existing bare clone behavior and providing
complete LFS backups.

Closes #379
2025-12-11 20:55:38 +11:00
Rodos
75e6f56773 docs: add "Restoring from Backup" section to README
Clarifies that this tool is backup-only with no inbuilt restore.
Documents that git repos can be pushed back, but issues/PRs have
GitHub API limitations affecting all backup tools.

Closes #246
2025-12-11 20:35:08 +11:00
Jose Diaz-Gonzalez
b991c363a0 Merge pull request #463 from josegonzalez/dependabot/pip/python-packages-9e0978b55f
chore(deps): bump urllib3 from 2.6.0 to 2.6.1 in the python-packages group
2025-12-10 09:39:07 -05:00
dependabot[bot]
6d74af9126 chore(deps): bump urllib3 in the python-packages group
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).


Updates `urllib3` from 2.6.0 to 2.6.1
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.0...2.6.1)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-09 13:10:12 +00:00
Jose Diaz-Gonzalez
381d67af96 Merge pull request #462 from josegonzalez/dependabot/pip/python-packages-3a01b12ef5
chore(deps): bump the python-packages group with 3 updates
2025-12-08 16:00:24 -05:00
dependabot[bot]
2fbe8d272c chore(deps): bump the python-packages group with 3 updates
Bumps the python-packages group with 3 updates: [black](https://github.com/psf/black), [pytest](https://github.com/pytest-dev/pytest) and [platformdirs](https://github.com/tox-dev/platformdirs).


Updates `black` from 25.11.0 to 25.12.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...25.12.0)

Updates `pytest` from 9.0.1 to 9.0.2
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.1...9.0.2)

Updates `platformdirs` from 4.5.0 to 4.5.1
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.5.0...4.5.1)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.12.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
- dependency-name: pytest
  dependency-version: 9.0.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
- dependency-name: platformdirs
  dependency-version: 4.5.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-08 13:09:32 +00:00
GitHub Action
eb5779ac23 Release version 0.55.0 2025-12-07 13:59:35 +00:00
Jose Diaz-Gonzalez
5b52931ebf Merge pull request #461 from Iamrodos/fix-cli-ux-and-cleanup
fix: CLI UX improvements and cleanup
2025-12-07 08:58:59 -05:00
Rodos
1d6d474408 fix: improve error messages for inaccessible repos and empty wikis 2025-12-07 21:50:49 +11:00
Rodos
b80049e96e test: add missing test coverage for case sensitivity fix 2025-12-07 21:21:37 +11:00
Rodos
58ad1c2378 docs: fix RST formatting in Known blocking errors section 2025-12-07 21:21:26 +11:00
Rodos
6e2a7e521c fix: --all-starred now clones repos without --repositories 2025-12-07 21:21:14 +11:00
Rodos
aba048a3e9 fix: warn when --private used without authentication 2025-12-07 21:20:54 +11:00
Jose Diaz-Gonzalez
9f7c08166f Merge pull request #460 from josegonzalez/dependabot/pip/urllib3-2.6.0
chore(deps): bump urllib3 from 2.5.0 to 2.6.0
2025-12-06 22:23:09 -05:00
dependabot[bot]
fdfaaec1ba chore(deps): bump urllib3 from 2.5.0 to 2.6.0
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-06 04:51:42 +00:00
Jose Diaz-Gonzalez
8f9cf7ff89 Merge pull request #459 from Iamrodos/issue-93-starred-gists-warning
fix: warn and skip when --starred-gists used for different user
2025-12-03 23:07:29 -05:00
Rodos
899ab5fdc2 fix: warn and skip when --starred-gists used for different user
GitHub's API only allows retrieving starred gists for the authenticated
user. Previously, using --starred-gists when backing up a different user
would silently return no relevant data.

Now warns and skips the retrieval entirely when the target user differs
from the authenticated user. Uses case-insensitive comparison to match
GitHub's username handling.

Fixes #93
2025-12-04 10:07:43 +11:00
GitHub Action
2a9d86a6bf Release version 0.54.0 2025-12-03 02:17:59 +00:00
Jose Diaz-Gonzalez
4fd3ea9e3c Merge pull request #457 from Iamrodos/readme-updates
docs: update README testing section and add fetch vs pull explanation
2025-12-02 21:15:33 -05:00
Jose Diaz-Gonzalez
041dc013f9 Merge pull request #458 from Iamrodos/fix-logging
fix: send INFO/DEBUG to stdout, WARNING/ERROR to stderr
2025-12-02 21:14:49 -05:00
Rodos
12802103c4 fix: send INFO/DEBUG to stdout, WARNING/ERROR to stderr
Fixes #182
2025-12-01 16:11:11 +11:00
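The stream split described in this commit can be tried in isolation with a small sketch. It mirrors the handler-plus-filter approach visible in the `github_backup/cli.py` diff on this page, but `build_split_logger` and its stream parameters are hypothetical names added so the routing can be checked without touching the real `sys.stdout` and `sys.stderr`.

```python
import io
import logging


def build_split_logger(name, out_stream, err_stream):
    """Route records below WARNING to out_stream and WARNING or above to err_stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records away from any root handlers

    low = logging.StreamHandler(out_stream)
    low.addFilter(lambda record: record.levelno < logging.WARNING)

    high = logging.StreamHandler(err_stream)
    high.setLevel(logging.WARNING)

    logger.handlers = [low, high]
    return logger


out, err = io.StringIO(), io.StringIO()
log = build_split_logger("split-demo", out, err)
log.info("progress message")
log.error("failure message")
```

After these calls `out` holds only the INFO line and `err` only the ERROR line, which is what lets a caller redirect or silence progress output while still seeing warnings and errors.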
Rodos
bf28b46954 docs: update README testing section and add fetch vs pull explanation 2025-12-01 15:55:00 +11:00
15 changed files with 1650 additions and 402 deletions

View File

@@ -1,9 +1,217 @@
Changelog
=========
0.53.0 (2025-11-30)
0.58.0 (2025-12-16)
-------------------
------------------------
- Fix retry logic for HTTP 5xx errors and network failures. [Rodos]
Refactors error handling to retry all 5xx errors (not just 502), network errors (URLError, socket.error, IncompleteRead), and JSON parse errors with exponential backoff and jitter. Respects retry-after and rate limit headers per GitHub API requirements. Consolidates retry logic into make_request_with_retry() wrapper and adds clear logging for retry attempts and failures. Removes dead code from 2016 (errors list, _request_http_error, _request_url_error) that was intentionally disabled in commit 1e5a9048 to fix #29.
Fixes #140, #110, #138
- Chore: remove transitive deps from release-requirements.txt. [Rodos]
- Chore(deps): bump urllib3 in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).
Updates `urllib3` from 2.6.1 to 2.6.2
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.1...2.6.2)
---
updated-dependencies:
- dependency-name: urllib3
dependency-version: 2.6.2
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
0.57.0 (2025-12-12)
-------------------
- Add GitHub Apps documentation and remove outdated header. [Rodos]
- Add GitHub Apps authentication section with setup steps
and CI/CD workflow example using actions/create-github-app-token
- Remove outdated machine-man-preview header (graduated 2020)
Closes #189
- Docs: add stdin token example to README. [Rodos]
Add example showing how to pipe a token from stdin using
file:///dev/stdin to avoid storing tokens in environment
variables or command history.
Closes #187
- Add --skip-assets-on flag to skip release asset downloads (#135)
[Rodos]
Allow users to skip downloading release assets for specific repositories
while still backing up release metadata. Useful for starred repos with
large assets (e.g. syncthing with 27GB+).
Usage: --skip-assets-on repo1 repo2 owner/repo3
Features:
- Space-separated repos (consistent with --exclude)
- Case-insensitive matching
- Supports both repo name and owner/repo format
0.56.0 (2025-12-11)
-------------------
Fix
~~~
- Replace deprecated git lfs clone with git clone + git lfs fetch --all.
[Rodos]
git lfs clone is deprecated - modern git clone handles LFS automatically.
Using git lfs fetch --all ensures all LFS objects across all refs are
backed up, matching the existing bare clone behavior and providing
complete LFS backups.
Closes #379
- Add Windows support with entry_points and os.replace. [Rodos]
- Replace os.rename() with os.replace() for atomic file operations
on Windows (os.rename fails if destination exists on Windows)
- Add entry_points console_scripts for proper .exe generation on Windows
- Create github_backup/cli.py with main() entry point
- Add github_backup/__main__.py for python -m github_backup support
- Keep bin/github-backup as thin wrapper for backwards compatibility
Closes #112
Other
~~~~~
- Docs: add "Restoring from Backup" section to README. [Rodos]
Clarifies that this tool is backup-only with no inbuilt restore.
Documents that git repos can be pushed back, but issues/PRs have
GitHub API limitations affecting all backup tools.
Closes #246
- Chore(deps): bump urllib3 in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).
Updates `urllib3` from 2.6.0 to 2.6.1
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.0...2.6.1)
---
updated-dependencies:
- dependency-name: urllib3
dependency-version: 2.6.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
- Chore(deps): bump the python-packages group with 3 updates.
[dependabot[bot]]
Bumps the python-packages group with 3 updates: [black](https://github.com/psf/black), [pytest](https://github.com/pytest-dev/pytest) and [platformdirs](https://github.com/tox-dev/platformdirs).
Updates `black` from 25.11.0 to 25.12.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...25.12.0)
Updates `pytest` from 9.0.1 to 9.0.2
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.1...9.0.2)
Updates `platformdirs` from 4.5.0 to 4.5.1
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.5.0...4.5.1)
---
updated-dependencies:
- dependency-name: black
dependency-version: 25.12.0
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
- dependency-name: pytest
dependency-version: 9.0.2
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
- dependency-name: platformdirs
dependency-version: 4.5.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
0.55.0 (2025-12-07)
-------------------
Fix
~~~
- Improve error messages for inaccessible repos and empty wikis. [Rodos]
- --all-starred now clones repos without --repositories. [Rodos]
- Warn when --private used without authentication. [Rodos]
- Warn and skip when --starred-gists used for different user. [Rodos]
GitHub's API only allows retrieving starred gists for the authenticated
user. Previously, using --starred-gists when backing up a different user
would silently return no relevant data.
Now warns and skips the retrieval entirely when the target user differs
from the authenticated user. Uses case-insensitive comparison to match
GitHub's username handling.
Fixes #93
Other
~~~~~
- Test: add missing test coverage for case sensitivity fix. [Rodos]
- Docs: fix RST formatting in Known blocking errors section. [Rodos]
- Chore(deps): bump urllib3 from 2.5.0 to 2.6.0. [dependabot[bot]]
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0)
---
updated-dependencies:
- dependency-name: urllib3
dependency-version: 2.6.0
dependency-type: direct:production
...
0.54.0 (2025-12-03)
-------------------
Fix
~~~
- Send INFO/DEBUG to stdout, WARNING/ERROR to stderr. [Rodos]
Fixes #182
Other
~~~~~
- Docs: update README testing section and add fetch vs pull explanation.
[Rodos]
0.53.0 (2025-11-30)
-------------------
Fix
~~~

View File

@@ -50,8 +50,8 @@ CLI Help output::
[--keychain-name OSX_KEYCHAIN_ITEM_NAME]
[--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT]
[--releases] [--latest-releases NUMBER_OF_LATEST_RELEASES]
[--skip-prerelease] [--assets] [--attachments]
[--exclude [REPOSITORY [REPOSITORY ...]]
[--skip-prerelease] [--assets] [--skip-assets-on [REPO ...]]
[--attachments] [--exclude [REPOSITORY [REPOSITORY ...]]
[--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE]
USER
@@ -133,6 +133,9 @@ CLI Help output::
--skip-prerelease skip prerelease and draft versions; only applies if including releases
--assets include assets alongside release information; only
applies if including releases
--skip-assets-on [REPO ...]
skip asset downloads for these repositories (e.g.
--skip-assets-on repo1 owner/repo2)
--attachments download user-attachments from issues and pull requests
to issues/attachments/{issue_number}/ and
pulls/attachments/{pull_number}/ directories
@@ -174,6 +177,37 @@ Customise the permissions for your use case, but for a personal account full bac
**Repository permissions**: Read access to contents, issues, metadata, pull requests, and webhooks.
GitHub Apps
~~~~~~~~~~~
GitHub Apps are ideal for organization backups in CI/CD. Tokens are scoped to specific repositories and expire after 1 hour.
**One-time setup:**
1. Create a GitHub App at *Settings -> Developer Settings -> GitHub Apps -> New GitHub App*
2. Set a name and homepage URL (can be any URL)
3. Uncheck "Webhook > Active" (not needed for backups)
4. Set permissions (same as fine-grained tokens above)
5. Click "Create GitHub App", then note the **App ID** shown on the next page
6. Under "Private keys", click "Generate a private key" and save the downloaded file
7. Go to *Install App* in your app's settings
8. Select the account/organization and which repositories to back up
**CI/CD usage with GitHub Actions:**
Store the App ID as a repository variable and the private key contents as a secret, then use ``actions/create-github-app-token``::

    - uses: actions/create-github-app-token@v1
      id: app-token
      with:
        app-id: ${{ vars.APP_ID }}
        private-key: ${{ secrets.APP_PRIVATE_KEY }}

    - run: github-backup myorg -t ${{ steps.app-token.outputs.token }} --as-app -o ./backup --all

Note: Installation tokens expire after 1 hour. For long-running backups, use a fine-grained personal access token instead.
Prefer SSH
~~~~~~~~~~
@@ -215,6 +249,8 @@ When you use the ``--lfs`` option, you will need to make sure you have Git LFS i
Instructions on how to do this can be found on https://git-lfs.github.com.
LFS objects are fetched for all refs, not just the current checkout, ensuring a complete backup of all LFS content across all branches and history.
About Attachments
-----------------
@@ -281,7 +317,7 @@ If the incremental argument is used, this will result in the next backup only re
It's therefore recommended to only use the incremental argument if the output/result is being actively monitored, or complemented with periodic full non-incremental runs, to avoid unexpected missing data in regular backup runs.
1. **Starred public repo hooks blocking**
**Starred public repo hooks blocking**
Since the ``--all`` argument includes ``--hooks``, if you use ``--all`` and ``--all-starred`` together to clone a user's starred public repositories, the backup will likely fail with an error that blocks it from continuing.
@@ -301,6 +337,8 @@ Starred gists vs starred repo behaviour
The starred normal repo cloning (``--all-starred``) argument stores starred repos separately from the user's own repositories. However, using ``--starred-gists`` will store starred gists within the same directory as the user's own gists (``--gists``). Also, all gist repo directory names are IDs, not the gist's name.
Note: ``--starred-gists`` only retrieves starred gists for the authenticated user, not the target user, due to a GitHub API limitation.
Skip existing on incomplete backups
-----------------------------------
@@ -308,6 +346,25 @@ Skip existing on incomplete backups
The ``--skip-existing`` argument will skip a backup if the directory already exists, even if the backup in that directory failed (perhaps due to a blocking error). This may result in unexpected missing data in a regular backup.
Updates use fetch, not pull
---------------------------
When updating an existing repository backup, ``github-backup`` uses ``git fetch`` rather than ``git pull``. This is intentional - a backup tool should reliably download data without risk of failure. Using ``git pull`` would require handling merge conflicts, which adds complexity and could cause backups to fail unexpectedly.
With fetch, **all branches and commits are downloaded** safely into remote-tracking branches. The working directory files won't change, but your backup is complete.
If you look at files directly (e.g., ``cat README.md``), you'll see the old content. The new data is in the remote-tracking branches (confusingly named "remote" but stored locally). To view or use the latest files::

    git show origin/main:README.md   # view a file
    git merge origin/main            # update working directory

All branches are backed up as remote refs (``origin/main``, ``origin/feature-branch``, etc.).
If you want to browse files directly without merging, consider using ``--bare`` which skips the working directory entirely - the backup is just the git data.
See `#269 <https://github.com/josegonzalez/python-github-backup/issues/269>`_ for more discussion.
Github Backup Examples
======================
@@ -338,6 +395,28 @@ Debug an error/block or incomplete backup into a temporary directory. Omit "incr
github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER
Pipe a token from stdin to avoid storing it in environment variables or command history (Unix-like systems only)::

    my-secret-manager get github-token | github-backup user -t file:///dev/stdin -o /backup --repositories

Restoring from Backup
=====================
This tool creates backups only; there is no inbuilt restore command.
**Git repositories, wikis, and gists** can be restored by pushing them back to GitHub as you would any git repository. For example, to restore a bare repository backup::

    cd /tmp/white-house/repositories/petitions/repository
    git push --mirror git@github.com:WhiteHouse/petitions.git

**Issues, pull requests, comments, and other metadata** are saved as JSON files for archival purposes. The GitHub API does not support recreating this data faithfully. Creating issues via the API has limitations:
- New issue/PR numbers are assigned (original numbers cannot be set)
- Timestamps reflect creation time (original dates cannot be set)
- The API caller becomes the author (original authors cannot be set)
- Cross-references between issues and PRs will break
These are GitHub API limitations that affect all backup and migration tools, not just this one. Recreating issues with these limitations via the GitHub API is an exercise for the reader. The JSON backups remain useful for searching, auditing, or manual reference.
Development
@@ -357,7 +436,12 @@ A huge thanks to all the contibuters!
Testing
-------
This project currently contains no unit tests. To run linting::
To run the test suite::
pip install pytest
pytest
To run linting::
pip install flake8
flake8 --ignore=E501

View File

@@ -1,58 +1,18 @@
#!/usr/bin/env python
"""
Backwards-compatible wrapper script.
The recommended way to run github-backup is via the installed command
(pip install github-backup) or python -m github_backup.
This script is kept for backwards compatibility with existing installations
that may reference this path directly.
"""
import logging
import os
import sys
from github_backup.github_backup import (
backup_account,
backup_repositories,
check_git_lfs_install,
filter_repositories,
get_authenticated_user,
logger,
mkdir_p,
parse_args,
retrieve_repositories,
)
logging.basicConfig(
format="%(asctime)s.%(msecs)03d: %(message)s",
datefmt="%Y-%m-%dT%H:%M:%S",
level=logging.INFO,
)
def main():
args = parse_args()
if args.quiet:
logger.setLevel(logging.WARNING)
output_directory = os.path.realpath(args.output_directory)
if not os.path.isdir(output_directory):
logger.info("Create output directory {0}".format(output_directory))
mkdir_p(output_directory)
if args.lfs_clone:
check_git_lfs_install()
if args.log_level:
log_level = logging.getLevelName(args.log_level.upper())
if isinstance(log_level, int):
logger.root.setLevel(log_level)
if not args.as_app:
logger.info("Backing up user {0} to {1}".format(args.user, output_directory))
authenticated_user = get_authenticated_user(args)
else:
authenticated_user = {"login": None}
repositories = retrieve_repositories(args, authenticated_user)
repositories = filter_repositories(args, repositories)
backup_repositories(args, output_directory, repositories)
backup_account(args, output_directory)
from github_backup.cli import main
from github_backup.github_backup import logger
if __name__ == "__main__":
try:

View File

@@ -1 +1 @@
__version__ = "0.53.0"
__version__ = "0.58.0"

github_backup/__main__.py (new file, 13 lines)

@@ -0,0 +1,13 @@
"""Allow running as: python -m github_backup"""

import sys

from github_backup.cli import main
from github_backup.github_backup import logger

if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        logger.error(str(e))
        sys.exit(1)

github_backup/cli.py (new file, 82 lines)

@@ -0,0 +1,82 @@
#!/usr/bin/env python
"""Command-line interface for github-backup."""

import logging
import os
import sys

from github_backup.github_backup import (
    backup_account,
    backup_repositories,
    check_git_lfs_install,
    filter_repositories,
    get_auth,
    get_authenticated_user,
    logger,
    mkdir_p,
    parse_args,
    retrieve_repositories,
)

# INFO and DEBUG go to stdout, WARNING and above go to stderr
log_format = logging.Formatter(
    fmt="%(asctime)s.%(msecs)03d: %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
)

stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setLevel(logging.DEBUG)
stdout_handler.addFilter(lambda r: r.levelno < logging.WARNING)
stdout_handler.setFormatter(log_format)

stderr_handler = logging.StreamHandler(sys.stderr)
stderr_handler.setLevel(logging.WARNING)
stderr_handler.setFormatter(log_format)

logging.basicConfig(level=logging.INFO, handlers=[stdout_handler, stderr_handler])


def main():
    """Main entry point for github-backup CLI."""
    args = parse_args()

    if args.private and not get_auth(args):
        logger.warning(
            "The --private flag has no effect without authentication. "
            "Use -t/--token, -f/--token-fine, or -u/--username to authenticate."
        )

    if args.quiet:
        logger.setLevel(logging.WARNING)

    output_directory = os.path.realpath(args.output_directory)
    if not os.path.isdir(output_directory):
        logger.info("Create output directory {0}".format(output_directory))
        mkdir_p(output_directory)

    if args.lfs_clone:
        check_git_lfs_install()

    if args.log_level:
        log_level = logging.getLevelName(args.log_level.upper())
        if isinstance(log_level, int):
            logger.root.setLevel(log_level)

    if not args.as_app:
        logger.info("Backing up user {0} to {1}".format(args.user, output_directory))
        authenticated_user = get_authenticated_user(args)
    else:
        authenticated_user = {"login": None}

    repositories = retrieve_repositories(args, authenticated_user)
    repositories = filter_repositories(args, repositories)
    backup_repositories(args, output_directory, repositories)
    backup_account(args, output_directory)


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        logger.error(str(e))
        sys.exit(1)

View File

@@ -12,6 +12,7 @@ import json
import logging
import os
import platform
import random
import re
import select
import socket
@@ -19,6 +20,7 @@ import ssl
import subprocess
import sys
import time
from collections.abc import Generator
from datetime import datetime
from http.client import IncompleteRead
from urllib.error import HTTPError, URLError
@@ -74,6 +76,9 @@ else:
" 3. Debian/Ubuntu: apt-get install ca-certificates\n\n"
)
# Retry configuration
MAX_RETRIES = 5
def logging_subprocess(
popenargs, stdout_log_level=logging.DEBUG, stderr_log_level=logging.ERROR, **kwargs
@@ -440,6 +445,12 @@ def parse_args(args=None):
dest="include_assets",
help="include assets alongside release information; only applies if including releases",
)
parser.add_argument(
"--skip-assets-on",
dest="skip_assets_on",
nargs="*",
help="skip asset downloads for these repositories",
)
parser.add_argument(
"--attachments",
action="store_true",
@@ -597,175 +608,181 @@ def get_github_repo_url(args, repository):
return repo_url
def retrieve_data_gen(args, template, query_args=None, single_request=False):
def calculate_retry_delay(attempt, headers):
    """Calculate delay before next retry with exponential backoff."""
    # Respect retry-after header if present
    if retry_after := headers.get("retry-after"):
        return int(retry_after)

    # Respect rate limit reset time
    if int(headers.get("x-ratelimit-remaining", 1)) < 1:
        reset_time = int(headers.get("x-ratelimit-reset", 0))
        return max(10, reset_time - calendar.timegm(time.gmtime()))

    # Exponential backoff with jitter for server errors (1s base, 120s max)
    delay = min(1.0 * (2**attempt), 120.0)
    return delay + random.uniform(0, delay * 0.1)
def retrieve_data(args, template, query_args=None, paginated=True):
"""
Fetch the data from GitHub API.
Handle both single requests and pagination with yield of individual dicts.
Handles throttling, retries, read errors, and DMCA takedowns.
"""
query_args = query_args or {}
auth = get_auth(args, encode=not args.as_app)
query_args = get_query_args(query_args)
per_page = 100
def _extract_next_page_url(link_header):
for link in link_header.split(","):
if 'rel="next"' in link:
return link[link.find("<") + 1:link.find(">")]
return None
def fetch_all() -> Generator[dict, None, None]:
next_url = None
while True:
if single_request:
request_per_page = None
else:
request_per_page = per_page
# FIRST: Fetch response
for attempt in range(MAX_RETRIES):
request = _construct_request(
request_per_page,
query_args,
next_url or template,
auth,
per_page=per_page if paginated else None,
query_args=query_args,
template=next_url or template,
auth=auth,
as_app=args.as_app,
fine=True if args.token_fine is not None else False,
) # noqa
r, errors = _get_response(request, auth, next_url or template)
fine=args.token_fine is not None,
)
http_response = make_request_with_retry(request, auth)
status_code = int(r.getcode())
match http_response.getcode():
case 200:
# Success - Parse JSON response
try:
response = json.loads(http_response.read().decode("utf-8"))
break # Exit retry loop and handle the data returned
except (
IncompleteRead,
json.decoder.JSONDecodeError,
TimeoutError,
) as e:
logger.warning(f"{type(e).__name__} reading response")
if attempt < MAX_RETRIES - 1:
delay = calculate_retry_delay(attempt, {})
logger.warning(
f"Retrying in {delay:.1f}s (attempt {attempt + 1}/{MAX_RETRIES})"
)
time.sleep(delay)
continue # Next retry attempt
# Handle DMCA takedown (HTTP 451) - raise exception to skip entire repository
if status_code == 451:
case 451:
# DMCA takedown - extract URL if available, then raise
dmca_url = None
try:
response_data = json.loads(r.read().decode("utf-8"))
response_data = json.loads(
http_response.read().decode("utf-8")
)
dmca_url = response_data.get("block", {}).get("html_url")
except Exception:
pass
raise RepositoryUnavailableError(
"Repository unavailable due to legal reasons (HTTP 451)",
dmca_url=dmca_url
dmca_url=dmca_url,
)
# Check if we got correct data
try:
response = json.loads(r.read().decode("utf-8"))
except IncompleteRead:
logger.warning("Incomplete read error detected")
read_error = True
except json.decoder.JSONDecodeError:
logger.warning("JSON decode error detected")
read_error = True
except TimeoutError:
logger.warning("Tiemout error detected")
read_error = True
case _:
raise Exception(
f"API request returned HTTP {http_response.getcode()}: {http_response.reason}"
)
else:
read_error = False
# be gentle with API request limit and throttle requests if remaining requests getting low
limit_remaining = int(r.headers.get("x-ratelimit-remaining", 0))
if args.throttle_limit and limit_remaining <= args.throttle_limit:
logger.info(
"API request limit hit: {} requests left, pausing further requests for {}s".format(
limit_remaining, args.throttle_pause
logger.error(
f"Failed to read response after {MAX_RETRIES} attempts for {next_url or template}"
)
raise Exception(
f"Failed to read response after {MAX_RETRIES} attempts for {next_url or template}"
)
# SECOND: Process and paginate
# Pause before next request if rate limit is low
if (
remaining := int(http_response.headers.get("x-ratelimit-remaining", 0))
) <= (args.throttle_limit or 0):
if args.throttle_limit:
logger.info(
f"Throttling: {remaining} requests left, pausing {args.throttle_pause}s"
)
time.sleep(args.throttle_pause)
retries = 0
while retries < 3 and (status_code == 502 or read_error):
logger.warning("API request failed. Retrying in 5 seconds")
retries += 1
time.sleep(5)
request = _construct_request(
request_per_page,
query_args,
next_url or template,
auth,
as_app=args.as_app,
fine=True if args.token_fine is not None else False,
) # noqa
r, errors = _get_response(request, auth, next_url or template)
status_code = int(r.getcode())
try:
response = json.loads(r.read().decode("utf-8"))
read_error = False
except IncompleteRead:
logger.warning("Incomplete read error detected")
read_error = True
except json.decoder.JSONDecodeError:
logger.warning("JSON decode error detected")
read_error = True
except TimeoutError:
logger.warning("Tiemout error detected")
read_error = True
if status_code != 200:
template = "API request returned HTTP {0}: {1}"
errors.append(template.format(status_code, r.reason))
raise Exception(", ".join(errors))
if read_error:
template = "API request problem reading response for {0}"
errors.append(template.format(request))
raise Exception(", ".join(errors))
if len(errors) == 0:
if type(response) is list:
for resp in response:
yield resp
# Parse Link header for next page URL (cursor-based pagination)
link_header = r.headers.get("Link", "")
next_url = None
if link_header:
# Parse Link header: <https://api.github.com/...?per_page=100&after=cursor>; rel="next"
for link in link_header.split(","):
if 'rel="next"' in link:
next_url = link[link.find("<") + 1:link.find(">")]
break
if not next_url:
break
elif type(response) is dict and single_request:
# Yield results
if isinstance(response, list):
yield from response
elif isinstance(response, dict):
yield response
if len(errors) > 0:
raise Exception(", ".join(errors))
if single_request:
break
def retrieve_data(args, template, query_args=None, single_request=False):
return list(retrieve_data_gen(args, template, query_args, single_request))
def get_query_args(query_args=None):
if not query_args:
query_args = {}
return query_args
def _get_response(request, auth, template):
retry_timeout = 3
errors = []
# We'll make requests in a loop so we can
# delay and retry in the case of rate-limiting
while True:
should_continue = False
try:
r = urlopen(request, context=https_ctx)
except HTTPError as exc:
errors, should_continue = _request_http_error(exc, auth, errors) # noqa
r = exc
except URLError as e:
logger.warning(e.reason)
should_continue, retry_timeout = _request_url_error(template, retry_timeout)
if not should_continue:
raise
except socket.error as e:
logger.warning(e.strerror)
should_continue, retry_timeout = _request_url_error(template, retry_timeout)
if not should_continue:
raise
if should_continue:
continue
break
return r, errors
def _construct_request(
per_page, query_args, template, auth, as_app=None, fine=False
# Check for more pages
if not paginated or not (
next_url := _extract_next_page_url(
http_response.headers.get("Link", "")
)
):
break # No more data
return list(fetch_all())
def make_request_with_retry(request, auth):
"""Make HTTP request with automatic retry for transient errors."""
def is_retryable_status(status_code, headers):
# Server errors are always retryable
if status_code in (500, 502, 503, 504):
return True
# Rate limit (403/429) is retryable if limit exhausted
if status_code in (403, 429):
return int(headers.get("x-ratelimit-remaining", 1)) < 1
return False
for attempt in range(MAX_RETRIES):
try:
return urlopen(request, context=https_ctx)
except HTTPError as exc:
# HTTPError can be used as a response-like object
if not is_retryable_status(exc.code, exc.headers):
raise # Non-retryable error
if attempt >= MAX_RETRIES - 1:
logger.error(f"HTTP {exc.code} failed after {MAX_RETRIES} attempts")
raise
delay = calculate_retry_delay(attempt, exc.headers)
logger.warning(
f"HTTP {exc.code}, retrying in {delay:.1f}s "
f"(attempt {attempt + 1}/{MAX_RETRIES})"
)
if auth is None and exc.code in (403, 429):
logger.info("Hint: Authenticate to raise your GitHub rate limit")
time.sleep(delay)
except (URLError, socket.error) as e:
if attempt >= MAX_RETRIES - 1:
logger.error(f"Connection error failed after {MAX_RETRIES} attempts: {e}")
raise
delay = calculate_retry_delay(attempt, {})
logger.warning(
f"Connection error: {e}, retrying in {delay:.1f}s "
f"(attempt {attempt + 1}/{MAX_RETRIES})"
)
time.sleep(delay)
raise Exception(f"Request failed after {MAX_RETRIES} attempts") # pragma: no cover
def _construct_request(per_page, query_args, template, auth, as_app=None, fine=False):
# If template is already a full URL with query params (from Link header), use it directly
if "?" in template and template.startswith("http"):
request_url = template
@@ -796,9 +813,6 @@ def _construct_request(
else:
auth = auth.encode("ascii")
request.add_header("Authorization", "token ".encode("ascii") + auth)
request.add_header(
"Accept", "application/vnd.github.machine-man-preview+json"
)
log_url = template if "?" not in template else template.split("?")[0]
if querystring:
@@ -807,52 +821,6 @@ def _construct_request(
return request
def _request_http_error(exc, auth, errors):
# HTTPError behaves like a Response so we can
# check the status code and headers to see exactly
# what failed.
should_continue = False
headers = exc.headers
limit_remaining = int(headers.get("x-ratelimit-remaining", 0))
if exc.code == 403 and limit_remaining < 1:
# The X-RateLimit-Reset header includes a
# timestamp telling us when the limit will reset
# so we can calculate how long to wait rather
# than inefficiently polling:
gm_now = calendar.timegm(time.gmtime())
reset = int(headers.get("x-ratelimit-reset", 0)) or gm_now
# We'll never sleep for less than 10 seconds:
delta = max(10, reset - gm_now)
limit = headers.get("x-ratelimit-limit")
logger.warning(
"Exceeded rate limit of {} requests; waiting {} seconds to reset".format(
limit, delta
)
) # noqa
if auth is None:
logger.info("Hint: Authenticate to raise your GitHub rate limit")
time.sleep(delta)
should_continue = True
return errors, should_continue
def _request_url_error(template, retry_timeout):
# In case of a connection timing out, we can retry a few time
# But we won't crash and not back-up the rest now
logger.info("'{}' timed out".format(template))
retry_timeout -= 1
if retry_timeout >= 0:
return True, retry_timeout
raise Exception("'{}' timed out to much, skipping!".format(template))
class S3HTTPRedirectHandler(HTTPRedirectHandler):
"""
A subclassed redirect handler for downloading Github assets from S3.
@@ -1038,7 +1006,7 @@ def download_attachment_file(url, path, auth, as_app=False, fine=False):
bytes_downloaded += len(chunk)
# Atomic rename to final location
os.rename(temp_path, path)
os.replace(temp_path, path)
metadata["size_bytes"] = bytes_downloaded
metadata["success"] = True
@@ -1459,7 +1427,7 @@ def download_attachments(
# Rename to add extension (already atomic from download)
try:
os.rename(filepath, final_filepath)
os.replace(filepath, final_filepath)
metadata["saved_as"] = os.path.basename(final_filepath)
except Exception as e:
logger.warning(
@@ -1480,9 +1448,11 @@ def download_attachments(
manifest = {
"issue_number": number,
"issue_type": item_type,
"repository": f"{args.user}/{args.repository}"
"repository": (
f"{args.user}/{args.repository}"
if hasattr(args, "repository") and args.repository
else args.user,
else args.user
),
"manifest_updated_at": datetime.now(timezone.utc).isoformat(),
"attachments": attachment_metadata_list,
}
@@ -1490,7 +1460,7 @@ def download_attachments(
manifest_path = os.path.join(attachments_dir, "manifest.json")
with open(manifest_path + ".temp", "w") as f:
json.dump(manifest, f, indent=2)
os.rename(manifest_path + ".temp", manifest_path) # Atomic write
os.replace(manifest_path + ".temp", manifest_path) # Atomic write
logger.debug(
"Wrote manifest for {0} #{1}: {2} attachments".format(
item_type_display, number, len(attachment_metadata_list)
@@ -1500,7 +1470,7 @@ def download_attachments(
def get_authenticated_user(args):
template = "https://{0}/user".format(get_github_api_host(args))
data = retrieve_data(args, template, single_request=True)
data = retrieve_data(args, template, paginated=False)
return data[0]
@@ -1514,7 +1484,7 @@ def check_git_lfs_install():
def retrieve_repositories(args, authenticated_user):
logger.info("Retrieving repositories")
single_request = False
paginated = True
if args.user == authenticated_user["login"]:
# we must use the /user/repos API to be able to access private repos
template = "https://{0}/user/repos".format(get_github_api_host(args))
@@ -1537,18 +1507,16 @@ def retrieve_repositories(args, authenticated_user):
repo_path = args.repository
else:
repo_path = "{0}/{1}".format(args.user, args.repository)
single_request = True
template = "https://{0}/repos/{1}".format(
get_github_api_host(args), repo_path
)
paginated = False
template = "https://{0}/repos/{1}".format(get_github_api_host(args), repo_path)
repos = retrieve_data(args, template, single_request=single_request)
repos = retrieve_data(args, template, paginated=paginated)
if args.all_starred:
starred_template = "https://{0}/users/{1}/starred".format(
get_github_api_host(args), args.user
)
starred_repos = retrieve_data(args, starred_template, single_request=False)
starred_repos = retrieve_data(args, starred_template)
# flag each repo as starred for downstream processing
for item in starred_repos:
item.update({"is_starred": True})
@@ -1558,19 +1526,26 @@ def retrieve_repositories(args, authenticated_user):
gists_template = "https://{0}/users/{1}/gists".format(
get_github_api_host(args), args.user
)
gists = retrieve_data(args, gists_template, single_request=False)
gists = retrieve_data(args, gists_template)
# flag each repo as a gist for downstream processing
for item in gists:
item.update({"is_gist": True})
repos.extend(gists)
if args.include_starred_gists:
if (
not authenticated_user.get("login")
or args.user.lower() != authenticated_user["login"].lower()
):
logger.warning(
"Cannot retrieve starred gists for '%s'. GitHub only allows access to the authenticated user's starred gists.",
args.user,
)
else:
starred_gists_template = "https://{0}/gists/starred".format(
get_github_api_host(args)
)
starred_gists = retrieve_data(
args, starred_gists_template, single_request=False
)
starred_gists = retrieve_data(args, starred_gists_template)
# flag each repo as a starred gist for downstream processing
for item in starred_gists:
item.update({"is_gist": True, "is_starred": True})
@@ -1666,8 +1641,11 @@ def backup_repositories(args, output_directory, repositories):
repo_url = get_github_repo_url(args, repository)
include_gists = args.include_gists or args.include_starred_gists
if (args.include_repository or args.include_everything) or (
include_gists and repository.get("is_gist")
include_starred = args.all_starred and repository.get("is_starred")
if (
(args.include_repository or args.include_everything)
or (include_gists and repository.get("is_gist"))
or include_starred
):
repo_name = (
repository.get("name")
@@ -1728,7 +1706,9 @@ def backup_repositories(args, output_directory, repositories):
include_assets=args.include_assets or args.include_everything,
)
except RepositoryUnavailableError as e:
logger.warning(f"Repository {repository['full_name']} is unavailable (HTTP 451)")
logger.warning(
f"Repository {repository['full_name']} is unavailable (HTTP 451)"
)
if e.dmca_url:
logger.warning(f"DMCA notice: {e.dmca_url}")
logger.info(f"Skipping remaining resources for {repository['full_name']}")
@@ -1788,7 +1768,11 @@ def backup_issues(args, repo_cwd, repository, repos_template):
modified = os.path.getmtime(issue_file)
modified = datetime.fromtimestamp(modified).strftime("%Y-%m-%dT%H:%M:%SZ")
if modified > issue["updated_at"]:
logger.info("Skipping issue {0} because it wasn't modified since last backup".format(number))
logger.info(
"Skipping issue {0} because it wasn't modified since last backup".format(
number
)
)
continue
if args.include_issue_comments or args.include_everything:
@@ -1804,7 +1788,7 @@ def backup_issues(args, repo_cwd, repository, repos_template):
with codecs.open(issue_file + ".temp", "w", encoding="utf-8") as f:
json_dump(issue, f)
os.rename(issue_file + ".temp", issue_file) # Unlike json_dump, this is atomic
os.replace(issue_file + ".temp", issue_file) # Atomic write
def backup_pulls(args, repo_cwd, repository, repos_template):
@@ -1830,14 +1814,14 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
pull_states = ["open", "closed"]
for pull_state in pull_states:
query_args["state"] = pull_state
_pulls = retrieve_data_gen(args, _pulls_template, query_args=query_args)
_pulls = retrieve_data(args, _pulls_template, query_args=query_args)
for pull in _pulls:
if args.since and pull["updated_at"] < args.since:
break
if not args.since or pull["updated_at"] >= args.since:
pulls[pull["number"]] = pull
else:
_pulls = retrieve_data_gen(args, _pulls_template, query_args=query_args)
_pulls = retrieve_data(args, _pulls_template, query_args=query_args)
for pull in _pulls:
if args.since and pull["updated_at"] < args.since:
break
@@ -1845,7 +1829,7 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
pulls[pull["number"]] = retrieve_data(
args,
_pulls_template + "/{}".format(pull["number"]),
single_request=True,
paginated=False,
)[0]
logger.info("Saving {0} pull requests to disk".format(len(list(pulls.keys()))))
@@ -1862,7 +1846,11 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
modified = os.path.getmtime(pull_file)
modified = datetime.fromtimestamp(modified).strftime("%Y-%m-%dT%H:%M:%SZ")
if modified > pull["updated_at"]:
logger.info("Skipping pull request {0} because it wasn't modified since last backup".format(number))
logger.info(
"Skipping pull request {0} because it wasn't modified since last backup".format(
number
)
)
continue
if args.include_pull_comments or args.include_everything:
template = comments_regular_template.format(number)
@@ -1879,7 +1867,7 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
with codecs.open(pull_file + ".temp", "w", encoding="utf-8") as f:
json_dump(pull, f)
os.rename(pull_file + ".temp", pull_file) # Unlike json_dump, this is atomic
os.replace(pull_file + ".temp", pull_file) # Atomic write
def backup_milestones(args, repo_cwd, repository, repos_template):
@@ -1912,9 +1900,11 @@ def backup_milestones(args, repo_cwd, repository, repos_template):
elif written_count == 0:
logger.info("{0} milestones unchanged, skipped write".format(total))
else:
logger.info("Saved {0} of {1} milestones to disk ({2} unchanged)".format(
logger.info(
"Saved {0} of {1} milestones to disk ({2} unchanged)".format(
written_count, total, total - written_count
))
)
)
def backup_labels(args, repo_cwd, repository, repos_template):
@@ -1968,6 +1958,20 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
)
releases = releases[: args.number_of_latest_releases]
# Check if this repo should skip asset downloads (case-insensitive)
skip_assets = False
if include_assets:
repo_name = repository.get("name", "").lower()
repo_full_name = repository.get("full_name", "").lower()
skip_repos = [r.lower() for r in (args.skip_assets_on or [])]
skip_assets = repo_name in skip_repos or repo_full_name in skip_repos
if skip_assets:
logger.info(
"Skipping assets for {0} ({1} releases) due to --skip-assets-on".format(
repository.get("name"), len(releases)
)
)
# for each release, store it
written_count = 0
for release in releases:
@@ -1979,7 +1983,7 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
if json_dump_if_changed(release, output_filepath):
written_count += 1
if include_assets:
if include_assets and not skip_assets:
assets = retrieve_data(args, release["assets_url"])
if len(assets) > 0:
# give release asset files somewhere to live & download them (not including source archives)
@@ -2001,9 +2005,11 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
elif written_count == 0:
logger.info("{0} releases unchanged, skipped write".format(total))
else:
logger.info("Saved {0} of {1} releases to disk ({2} unchanged)".format(
logger.info(
"Saved {0} of {1} releases to disk ({2} unchanged)".format(
written_count, total, total - written_count
))
)
)
def fetch_repository(
@@ -2037,9 +2043,14 @@ def fetch_repository(
"git ls-remote " + remote_url, stdout=FNULL, stderr=FNULL, shell=True
)
if initialized == 128:
if ".wiki.git" in remote_url:
logger.info(
"Skipping {0} ({1}) since it's not initialized".format(
name, masked_remote_url
"Skipping {0} wiki (wiki is enabled but has no content)".format(name)
)
else:
logger.info(
"Skipping {0} (repository not accessible - may be empty, private, or credentials invalid)".format(
name
)
)
return
@@ -2082,12 +2093,14 @@ def fetch_repository(
if no_prune:
git_command.pop()
logging_subprocess(git_command, cwd=local_dir)
else:
if lfs_clone:
git_command = ["git", "lfs", "clone", remote_url, local_dir]
else:
git_command = ["git", "clone", remote_url, local_dir]
logging_subprocess(git_command)
if lfs_clone:
git_command = ["git", "lfs", "fetch", "--all", "--prune"]
if no_prune:
git_command.pop()
logging_subprocess(git_command, cwd=local_dir)
def backup_account(args, output_directory):
@@ -2196,5 +2209,5 @@ def json_dump_if_changed(data, output_file_path):
temp_file = output_file_path + ".temp"
with codecs.open(temp_file, "w", encoding="utf-8") as f:
f.write(new_content)
os.rename(temp_file, output_file_path) # Atomic on POSIX systems
os.replace(temp_file, output_file_path) # Atomic write
return True
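
The `os.rename` to `os.replace` changes in this file all use the same write-to-temp-then-swap pattern. A minimal sketch of that pattern follows. The helper name `write_json_atomically` is hypothetical and not part of github-backup; the `.temp` suffix is taken from the diff. `os.replace` silently overwrites an existing destination on every platform, whereas `os.rename` raises `FileExistsError` on Windows when the target already exists.

```python
import json
import os
import tempfile


def write_json_atomically(data, output_file_path):
    """Hypothetical helper: write JSON to a sibling temp file, then swap it in."""
    temp_file = output_file_path + ".temp"
    with open(temp_file, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    # os.replace overwrites an existing destination on POSIX and Windows alike;
    # on POSIX the swap is atomic when both paths are on the same filesystem.
    os.replace(temp_file, output_file_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "manifest.json")
        write_json_atomically({"version": 1}, target)
        write_json_atomically({"version": 2}, target)  # second write overwrites the first
        with open(target, encoding="utf-8") as f:
            print(json.load(f))
```

Readers of the target path see either the old file or the new one, never a partially written file, because the temp file is fully written and closed before the swap.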


@@ -1,40 +1,15 @@
# Linting & Formatting
autopep8==2.3.2
black==25.11.0
bleach==6.3.0
certifi==2025.11.12
charset-normalizer==3.4.4
click==8.3.1
colorama==0.4.6
docutils==0.22.3
black==25.12.0
flake8==7.3.0
gitchangelog==3.0.4
pytest==9.0.1
idna==3.11
importlib-metadata==8.7.0
jaraco.classes==3.4.0
keyring==25.7.0
markdown-it-py==4.0.0
mccabe==0.7.0
mdurl==0.1.2
more-itertools==10.8.0
mypy-extensions==1.1.0
packaging==25.0
pathspec==0.12.1
pkginfo==1.12.1.2
platformdirs==4.5.0
pycodestyle==2.14.0
pyflakes==3.4.0
Pygments==2.19.2
readme-renderer==44.0
requests==2.32.5
requests-toolbelt==1.0.0
restructuredtext-lint==2.0.2
rfc3986==2.0.0
rich==14.2.0
setuptools==80.9.0
six==1.17.0
tqdm==4.67.1
# Testing
pytest==9.0.2
# Release & Publishing
twine==6.2.0
urllib3==2.5.0
webencodings==0.5.1
zipp==3.23.0
gitchangelog==3.0.4
setuptools==80.9.0
# Documentation
restructuredtext-lint==2.0.2


@@ -33,7 +33,11 @@ setup(
author="Jose Diaz-Gonzalez",
author_email="github-backup@josediazgonzalez.com",
packages=["github_backup"],
scripts=["bin/github-backup"],
entry_points={
"console_scripts": [
"github-backup=github_backup.cli:main",
],
},
url="http://github.com/josegonzalez/python-github-backup",
license="MIT",
classifiers=[

tests/test_all_starred.py Normal file

@@ -0,0 +1,161 @@
"""Tests for --all-starred flag behavior (issue #225)."""
import pytest
from unittest.mock import Mock, patch
from github_backup import github_backup
class TestAllStarredCloning:
"""Test suite for --all-starred repository cloning behavior.
Issue #225: --all-starred should clone starred repos without requiring --repositories.
"""
def _create_mock_args(self, **overrides):
"""Create a mock args object with sensible defaults."""
args = Mock()
args.user = "testuser"
args.output_directory = "/tmp/backup"
args.include_repository = False
args.include_everything = False
args.include_gists = False
args.include_starred_gists = False
args.all_starred = False
args.skip_existing = False
args.bare_clone = False
args.lfs_clone = False
args.no_prune = False
args.include_wiki = False
args.include_issues = False
args.include_issue_comments = False
args.include_issue_events = False
args.include_pulls = False
args.include_pull_comments = False
args.include_pull_commits = False
args.include_pull_details = False
args.include_labels = False
args.include_hooks = False
args.include_milestones = False
args.include_releases = False
args.include_assets = False
args.include_attachments = False
args.incremental = False
args.incremental_by_files = False
args.github_host = None
args.prefer_ssh = False
args.token_classic = None
args.token_fine = None
args.username = None
args.password = None
args.as_app = False
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
for key, value in overrides.items():
setattr(args, key, value)
return args
@patch('github_backup.github_backup.fetch_repository')
@patch('github_backup.github_backup.get_github_repo_url')
def test_all_starred_clones_without_repositories_flag(self, mock_get_url, mock_fetch):
"""--all-starred should clone starred repos without --repositories flag.
This is the core fix for issue #225.
"""
args = self._create_mock_args(all_starred=True)
mock_get_url.return_value = "https://github.com/otheruser/awesome-project.git"
# A starred repository (is_starred flag set by retrieve_repositories)
starred_repo = {
"name": "awesome-project",
"full_name": "otheruser/awesome-project",
"owner": {"login": "otheruser"},
"private": False,
"fork": False,
"has_wiki": False,
"is_starred": True, # This flag is set for starred repos
}
with patch('github_backup.github_backup.mkdir_p'):
github_backup.backup_repositories(args, "/tmp/backup", [starred_repo])
# fetch_repository should be called for the starred repo
assert mock_fetch.called, "--all-starred should trigger repository cloning"
mock_fetch.assert_called_once()
call_args = mock_fetch.call_args
assert call_args[0][0] == "awesome-project" # repo name
@patch('github_backup.github_backup.fetch_repository')
@patch('github_backup.github_backup.get_github_repo_url')
def test_starred_repo_not_cloned_without_all_starred_flag(self, mock_get_url, mock_fetch):
"""Starred repos should NOT be cloned if --all-starred is not set."""
args = self._create_mock_args(all_starred=False)
mock_get_url.return_value = "https://github.com/otheruser/awesome-project.git"
starred_repo = {
"name": "awesome-project",
"full_name": "otheruser/awesome-project",
"owner": {"login": "otheruser"},
"private": False,
"fork": False,
"has_wiki": False,
"is_starred": True,
}
with patch('github_backup.github_backup.mkdir_p'):
github_backup.backup_repositories(args, "/tmp/backup", [starred_repo])
# fetch_repository should NOT be called
assert not mock_fetch.called, "Starred repos should not be cloned without --all-starred"
@patch('github_backup.github_backup.fetch_repository')
@patch('github_backup.github_backup.get_github_repo_url')
def test_non_starred_repo_not_cloned_with_only_all_starred(self, mock_get_url, mock_fetch):
"""Non-starred repos should NOT be cloned when only --all-starred is set."""
args = self._create_mock_args(all_starred=True)
mock_get_url.return_value = "https://github.com/testuser/my-project.git"
# A regular (non-starred) repository
regular_repo = {
"name": "my-project",
"full_name": "testuser/my-project",
"owner": {"login": "testuser"},
"private": False,
"fork": False,
"has_wiki": False,
# No is_starred flag
}
with patch('github_backup.github_backup.mkdir_p'):
github_backup.backup_repositories(args, "/tmp/backup", [regular_repo])
# fetch_repository should NOT be called for non-starred repos
assert not mock_fetch.called, "Non-starred repos should not be cloned with only --all-starred"
@patch('github_backup.github_backup.fetch_repository')
@patch('github_backup.github_backup.get_github_repo_url')
def test_repositories_flag_still_works(self, mock_get_url, mock_fetch):
"""--repositories flag should still clone repos as before."""
args = self._create_mock_args(include_repository=True)
mock_get_url.return_value = "https://github.com/testuser/my-project.git"
regular_repo = {
"name": "my-project",
"full_name": "testuser/my-project",
"owner": {"login": "testuser"},
"private": False,
"fork": False,
"has_wiki": False,
}
with patch('github_backup.github_backup.mkdir_p'):
github_backup.backup_repositories(args, "/tmp/backup", [regular_repo])
# fetch_repository should be called
assert mock_fetch.called, "--repositories should trigger repository cloning"
if __name__ == "__main__":
pytest.main([__file__, "-v"])
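
The four tests above pin down one boolean condition. As a rough sketch, that condition can be pulled out into a standalone predicate. The name `should_clone` is hypothetical; in the diff the expression lives inline in `backup_repositories`. `SimpleNamespace` stands in for the parsed arguments.

```python
from types import SimpleNamespace


def should_clone(args, repository):
    """Hypothetical predicate mirroring the clone condition in backup_repositories."""
    include_gists = args.include_gists or args.include_starred_gists
    # Starred repos are cloned only when --all-starred is set AND the repo
    # carries the is_starred flag added by retrieve_repositories.
    include_starred = args.all_starred and repository.get("is_starred")
    return bool(
        (args.include_repository or args.include_everything)
        or (include_gists and repository.get("is_gist"))
        or include_starred
    )


if __name__ == "__main__":
    args = SimpleNamespace(
        include_repository=False,
        include_everything=False,
        include_gists=False,
        include_starred_gists=False,
        all_starred=True,
    )
    print(should_clone(args, {"name": "awesome-project", "is_starred": True}))
    print(should_clone(args, {"name": "my-project"}))
```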


@@ -0,0 +1,112 @@
"""Tests for case-insensitive username/organization filtering."""
import pytest
from unittest.mock import Mock
from github_backup import github_backup
class TestCaseSensitivity:
"""Test suite for case-insensitive username matching in filter_repositories."""
def test_filter_repositories_case_insensitive_user(self):
"""Should filter repositories case-insensitively for usernames.
Reproduces issue #198 where typing 'iamrodos' fails to match
repositories with owner.login='Iamrodos' (the canonical case from GitHub API).
"""
# Simulate user typing lowercase username
args = Mock()
args.user = "iamrodos" # lowercase (what user typed)
args.repository = None
args.name_regex = None
args.languages = None
args.exclude = None
args.fork = False
args.private = False
args.public = False
args.all = True
# Simulate GitHub API returning canonical case
repos = [
{
"name": "repo1",
"owner": {"login": "Iamrodos"}, # Capital I (canonical from API)
"private": False,
"fork": False,
},
{
"name": "repo2",
"owner": {"login": "Iamrodos"},
"private": False,
"fork": False,
},
]
filtered = github_backup.filter_repositories(args, repos)
# Should match despite case difference
assert len(filtered) == 2
assert filtered[0]["name"] == "repo1"
assert filtered[1]["name"] == "repo2"
def test_filter_repositories_case_insensitive_org(self):
"""Should filter repositories case-insensitively for organizations.
Tests the example from issue #198 where 'prai-org' doesn't match 'PRAI-Org'.
"""
args = Mock()
args.user = "prai-org" # lowercase (what user typed)
args.repository = None
args.name_regex = None
args.languages = None
args.exclude = None
args.fork = False
args.private = False
args.public = False
args.all = True
repos = [
{
"name": "repo1",
"owner": {"login": "PRAI-Org"}, # Different case (canonical from API)
"private": False,
"fork": False,
},
]
filtered = github_backup.filter_repositories(args, repos)
# Should match despite case difference
assert len(filtered) == 1
assert filtered[0]["name"] == "repo1"
def test_filter_repositories_case_variations(self):
"""Should handle various case combinations correctly."""
args = Mock()
args.user = "TeSt-UsEr" # Mixed case
args.repository = None
args.name_regex = None
args.languages = None
args.exclude = None
args.fork = False
args.private = False
args.public = False
args.all = True
repos = [
{"name": "repo1", "owner": {"login": "test-user"}, "private": False, "fork": False},
{"name": "repo2", "owner": {"login": "TEST-USER"}, "private": False, "fork": False},
{"name": "repo3", "owner": {"login": "TeSt-UsEr"}, "private": False, "fork": False},
{"name": "repo4", "owner": {"login": "other-user"}, "private": False, "fork": False},
]
filtered = github_backup.filter_repositories(args, repos)
# Should match first 3 (all case variations of same user)
assert len(filtered) == 3
assert set(r["name"] for r in filtered) == {"repo1", "repo2", "repo3"}
if __name__ == "__main__":
pytest.main([__file__, "-v"])
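
The body of `filter_repositories` is not part of this diff, so the following is only a sketch of the behaviour these tests expect, not the project's implementation. The helper name `filter_by_owner` is hypothetical. It compares the requested user against `owner.login` after case folding both sides.

```python
def filter_by_owner(user, repositories):
    """Hypothetical helper: keep repos whose owner login matches `user`, ignoring case."""
    wanted = user.casefold()
    return [
        repo
        for repo in repositories
        if repo.get("owner", {}).get("login", "").casefold() == wanted
    ]


if __name__ == "__main__":
    repos = [
        {"name": "repo1", "owner": {"login": "test-user"}},
        {"name": "repo2", "owner": {"login": "TEST-USER"}},
        {"name": "repo3", "owner": {"login": "other-user"}},
    ]
    print([r["name"] for r in filter_by_owner("TeSt-UsEr", repos)])
```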


@@ -13,7 +13,6 @@ class TestHTTP451Exception:
def test_repository_unavailable_error_raised(self):
"""HTTP 451 should raise RepositoryUnavailableError with DMCA URL."""
# Create mock args
args = Mock()
args.as_app = False
args.token_fine = None
@@ -25,7 +24,6 @@ class TestHTTP451Exception:
args.throttle_limit = None
args.throttle_pause = 0
# Mock HTTPError 451 response
mock_response = Mock()
mock_response.getcode.return_value = 451
@@ -41,14 +39,10 @@ class TestHTTP451Exception:
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Unavailable For Legal Reasons"
def mock_get_response(request, auth, template):
return mock_response, []
with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues")
# Check exception has DMCA URL
assert exc_info.value.dmca_url == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
assert "451" in str(exc_info.value)
@@ -71,14 +65,10 @@ class TestHTTP451Exception:
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Unavailable For Legal Reasons"
-def mock_get_response(request, auth, template):
-return mock_response, []
-with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
+with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
-list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
+github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues")
# Exception raised even without DMCA URL
assert exc_info.value.dmca_url is None
assert "451" in str(exc_info.value)
@@ -101,42 +91,9 @@ class TestHTTP451Exception:
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Unavailable For Legal Reasons"
-def mock_get_response(request, auth, template):
-return mock_response, []
-with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
+with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with pytest.raises(github_backup.RepositoryUnavailableError):
-list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
-def test_other_http_errors_unchanged(self):
-"""Other HTTP errors should still raise generic Exception."""
-args = Mock()
-args.as_app = False
-args.token_fine = None
-args.token_classic = None
-args.username = None
-args.password = None
-args.osx_keychain_item_name = None
-args.osx_keychain_item_account = None
-args.throttle_limit = None
-args.throttle_pause = 0
-mock_response = Mock()
-mock_response.getcode.return_value = 404
-mock_response.read.return_value = b'{"message": "Not Found"}'
-mock_response.headers = {"x-ratelimit-remaining": "5000"}
-mock_response.reason = "Not Found"
-def mock_get_response(request, auth, template):
-return mock_response, []
-with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
-# Should raise generic Exception, not RepositoryUnavailableError
-with pytest.raises(Exception) as exc_info:
-list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/notfound/issues"))
-assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError)
-assert "404" in str(exc_info.value)
+github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues")
if __name__ == "__main__":


@@ -40,7 +40,7 @@ class MockHTTPResponse:
@pytest.fixture
def mock_args():
-"""Mock args for retrieve_data_gen."""
+"""Mock args for retrieve_data."""
args = Mock()
args.as_app = False
args.token_fine = None
@@ -77,11 +77,9 @@ def test_cursor_based_pagination(mock_args):
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
-results = list(
-github_backup.retrieve_data_gen(
+results = github_backup.retrieve_data(
mock_args, "https://api.github.com/repos/owner/repo/issues"
)
-)
# Verify all items retrieved and cursor was used in second request
assert len(results) == 150
@@ -112,11 +110,9 @@ def test_page_based_pagination(mock_args):
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
-results = list(
-github_backup.retrieve_data_gen(
+results = github_backup.retrieve_data(
mock_args, "https://api.github.com/repos/owner/repo/pulls"
)
-)
# Verify all items retrieved and page parameter was used (not cursor)
assert len(results) == 180
@@ -142,11 +138,9 @@ def test_no_link_header_stops_pagination(mock_args):
return responses[len(requests_made) - 1]
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
-results = list(
-github_backup.retrieve_data_gen(
+results = github_backup.retrieve_data(
mock_args, "https://api.github.com/repos/owner/repo/labels"
)
-)
# Verify pagination stopped after first request
assert len(results) == 50
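
Review note: the three pagination tests above rely on the `Link` response header to decide whether another request is needed. As a rough illustration only (this is not the code under test), the "follow `rel="next"` until it disappears" rule could be sketched as below. `next_page_url` is a hypothetical helper name, and splitting on commas is a simplification that assumes the URLs themselves contain no commas.

```python
import re


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None if absent."""
    if not link_header:
        # No Link header (or an empty one) means there is nothing more to fetch.
        return None
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match and match.group(2) == "next":
            return match.group(1)
    return None
```

A caller would keep requesting `next_page_url(response.headers.get("Link"))` until it returns `None`, which covers both the cursor-style and page-style URLs exercised by the tests.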

tests/test_retrieve_data.py (new file, 365 lines)

@@ -0,0 +1,365 @@
"""Tests for retrieve_data function."""
import json
import socket
from unittest.mock import Mock, patch
from urllib.error import HTTPError, URLError
import pytest
from github_backup import github_backup
from github_backup.github_backup import (
MAX_RETRIES,
calculate_retry_delay,
make_request_with_retry,
)
class TestCalculateRetryDelay:
def test_respects_retry_after_header(self):
headers = {'retry-after': '30'}
assert calculate_retry_delay(0, headers) == 30
def test_respects_rate_limit_reset(self):
import time
import calendar
# Set reset time 60 seconds in the future
future_reset = calendar.timegm(time.gmtime()) + 60
headers = {
'x-ratelimit-remaining': '0',
'x-ratelimit-reset': str(future_reset)
}
delay = calculate_retry_delay(0, headers)
# Should be approximately 60 seconds (with some tolerance for execution time)
assert 55 <= delay <= 65
def test_exponential_backoff(self):
delay_0 = calculate_retry_delay(0, {})
delay_1 = calculate_retry_delay(1, {})
delay_2 = calculate_retry_delay(2, {})
# Base delay is 1s, so delays should be roughly 1, 2, 4 (plus jitter)
assert 0.9 <= delay_0 <= 1.2 # ~1s + up to 10% jitter
assert 1.8 <= delay_1 <= 2.4 # ~2s + up to 10% jitter
assert 3.6 <= delay_2 <= 4.8 # ~4s + up to 10% jitter
def test_max_delay_cap(self):
# Very high attempt number should not exceed 120s + jitter
delay = calculate_retry_delay(100, {})
assert delay <= 120 * 1.1 # 120s max + 10% jitter
def test_minimum_rate_limit_delay(self):
import time
import calendar
# Set reset time in the past (already reset)
past_reset = calendar.timegm(time.gmtime()) - 100
headers = {
'x-ratelimit-remaining': '0',
'x-ratelimit-reset': str(past_reset)
}
delay = calculate_retry_delay(0, headers)
# Should be minimum 10 seconds even if reset time is in past
assert delay >= 10
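
Review note: taken together, the `TestCalculateRetryDelay` cases pin down a delay policy: honour `retry-after`, otherwise wait for the rate-limit reset with a 10 s floor, otherwise back off exponentially from 1 s up to a 120 s cap with up to 10% jitter. The sketch below is one way to satisfy those assertions. It is an assumption-laden illustration, not the body of `calculate_retry_delay`; the function name and the three constants are hypothetical and merely inferred from the test comments.

```python
import calendar
import random
import time

BASE_DELAY = 1.0             # assumed: first backoff step, in seconds
MAX_DELAY = 120.0            # assumed: cap on the exponential term
MIN_RATE_LIMIT_DELAY = 10.0  # assumed: floor when waiting on a rate-limit reset


def sketch_retry_delay(attempt, headers):
    """Return the seconds to wait before retry number `attempt` (0-based)."""
    # An explicit retry-after header takes priority over everything else.
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        return int(retry_after)

    # Rate limit exhausted: wait until the reset time, but never less than the floor.
    if headers.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in headers:
        now = calendar.timegm(time.gmtime())
        return max(int(headers["x-ratelimit-reset"]) - now, MIN_RATE_LIMIT_DELAY)

    # Otherwise: capped exponential backoff plus up to 10% random jitter.
    delay = min(BASE_DELAY * (2 ** attempt), MAX_DELAY)
    return delay + random.uniform(0, delay * 0.1)
```

The jitter is added after the cap is applied, which is why the cap test allows `120 * 1.1` rather than exactly 120.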
class TestRetrieveDataRetry:
"""Tests for retry behavior in retrieve_data."""
@pytest.fixture
def mock_args(self):
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
return args
def test_json_parse_error_retries_and_fails(self, mock_args):
"""HTTP 200 with invalid JSON should retry and eventually fail."""
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = b"not valid json {"
mock_response.headers = {"x-ratelimit-remaining": "5000"}
call_count = 0
def mock_make_request(*args, **kwargs):
nonlocal call_count
call_count += 1
return mock_response
with patch("github_backup.github_backup.make_request_with_retry", side_effect=mock_make_request):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): # No delay in tests
with pytest.raises(Exception) as exc_info:
github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues")
assert "Failed to read response after" in str(exc_info.value)
assert call_count == MAX_RETRIES
def test_json_parse_error_recovers_on_retry(self, mock_args):
"""HTTP 200 with invalid JSON should succeed if retry returns valid JSON."""
bad_response = Mock()
bad_response.getcode.return_value = 200
bad_response.read.return_value = b"not valid json {"
bad_response.headers = {"x-ratelimit-remaining": "5000"}
good_response = Mock()
good_response.getcode.return_value = 200
good_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8")
good_response.headers = {"x-ratelimit-remaining": "5000", "Link": ""}
responses = [bad_response, bad_response, good_response]
call_count = 0
def mock_make_request(*args, **kwargs):
nonlocal call_count
result = responses[call_count]
call_count += 1
return result
with patch("github_backup.github_backup.make_request_with_retry", side_effect=mock_make_request):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
result = github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues")
assert result == [{"id": 1}]
assert call_count == 3 # Failed twice, succeeded on third
def test_http_error_raises_exception(self, mock_args):
"""Non-success HTTP status codes should raise Exception."""
mock_response = Mock()
mock_response.getcode.return_value = 404
mock_response.read.return_value = b'{"message": "Not Found"}'
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Not Found"
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with pytest.raises(Exception) as exc_info:
github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/notfound/issues")
assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError)
assert "404" in str(exc_info.value)
class TestMakeRequestWithRetry:
"""Tests for HTTP error retry behavior in make_request_with_retry."""
def test_502_error_retries_and_succeeds(self):
"""HTTP 502 should retry and succeed if subsequent request works."""
good_response = Mock()
good_response.read.return_value = b'{"ok": true}'
call_count = 0
fail_count = MAX_RETRIES - 1 # Fail all but last attempt
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count <= fail_count:
raise HTTPError(
url="https://api.github.com/test",
code=502,
msg="Bad Gateway",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert call_count == MAX_RETRIES
def test_503_error_retries_until_exhausted(self):
"""HTTP 503 should retry MAX_RETRIES times then raise."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise HTTPError(
url="https://api.github.com/test",
code=503,
msg="Service Unavailable",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 503
assert call_count == MAX_RETRIES
def test_404_error_not_retried(self):
"""HTTP 404 should not be retried - raise immediately."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise HTTPError(
url="https://api.github.com/test",
code=404,
msg="Not Found",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 404
assert call_count == 1 # No retries
def test_rate_limit_403_retried_when_remaining_zero(self):
"""HTTP 403 with x-ratelimit-remaining=0 should retry."""
good_response = Mock()
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count == 1:
raise HTTPError(
url="https://api.github.com/test",
code=403,
msg="Forbidden",
hdrs={"x-ratelimit-remaining": "0"},
fp=None,
)
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert call_count == 2
def test_403_not_retried_when_remaining_nonzero(self):
"""HTTP 403 with x-ratelimit-remaining>0 should not retry (permission error)."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise HTTPError(
url="https://api.github.com/test",
code=403,
msg="Forbidden",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 403
assert call_count == 1 # No retries
def test_connection_error_retries_and_succeeds(self):
"""URLError (connection error) should retry and succeed if subsequent request works."""
good_response = Mock()
call_count = 0
fail_count = MAX_RETRIES - 1 # Fail all but last attempt
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count <= fail_count:
raise URLError("Connection refused")
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert call_count == MAX_RETRIES
def test_socket_error_retries_until_exhausted(self):
"""socket.error should retry MAX_RETRIES times then raise."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise socket.error("Connection reset by peer")
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
with pytest.raises(socket.error):
make_request_with_retry(Mock(), None)
assert call_count == MAX_RETRIES
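
Review note: the `TestMakeRequestWithRetry` cases describe which failures are retried (any 5xx, a 403 whose `x-ratelimit-remaining` is `0`, `URLError`, `socket.error`) and which are raised at once (404, a 403 with quota left). A minimal sketch of that decision logic follows. It is not the real `make_request_with_retry`: the function names, the `send` callable, the `delay_for` parameter and the `MAX_ATTEMPTS` value are all hypothetical, since the actual `MAX_RETRIES` value and signature are not shown in this diff.

```python
import time
from urllib.error import HTTPError

MAX_ATTEMPTS = 5  # hypothetical value; the real MAX_RETRIES is not visible here


def is_retryable_http_error(err):
    """Retry every 5xx; retry a 403 only when the rate limit is exhausted."""
    if 500 <= err.code < 600:
        return True
    remaining = err.headers.get("x-ratelimit-remaining") if err.headers else None
    return err.code == 403 and remaining == "0"


def sketch_request_with_retry(send, delay_for=lambda attempt: 0.0):
    """Call send() until it succeeds, hits a non-retryable error, or runs out of attempts."""
    for attempt in range(MAX_ATTEMPTS):
        last_attempt = attempt == MAX_ATTEMPTS - 1
        try:
            return send()
        except HTTPError as err:
            # HTTPError must be handled first: it is itself a subclass of OSError.
            if last_attempt or not is_retryable_http_error(err):
                raise
        except OSError:
            # URLError and socket.error are both OSError subclasses (network failures).
            if last_attempt:
                raise
        time.sleep(delay_for(attempt))
```

Re-raising the original exception on the final attempt matches what the tests expect (`pytest.raises(HTTPError)` and `pytest.raises(socket.error)`), so callers still see the underlying failure rather than a wrapper.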
class TestRetrieveDataThrottling:
"""Tests for throttling behavior in retrieve_data."""
@pytest.fixture
def mock_args(self):
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = 10 # Throttle when remaining <= 10
args.throttle_pause = 5 # Pause 5 seconds
return args
def test_throttling_pauses_when_rate_limit_low(self, mock_args):
"""Should pause when x-ratelimit-remaining is at or below throttle_limit."""
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8")
mock_response.headers = {"x-ratelimit-remaining": "5", "Link": ""} # Below throttle_limit
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with patch("github_backup.github_backup.time.sleep") as mock_sleep:
github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues")
mock_sleep.assert_called_once_with(5) # throttle_pause value
class TestRetrieveDataSingleItem:
"""Tests for single item (dict) responses in retrieve_data."""
@pytest.fixture
def mock_args(self):
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
return args
def test_dict_response_returned_as_list(self, mock_args):
"""Single dict response should be returned as a list with one item."""
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = json.dumps({"login": "testuser", "id": 123}).encode("utf-8")
mock_response.headers = {"x-ratelimit-remaining": "5000", "Link": ""}
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
result = github_backup.retrieve_data(mock_args, "https://api.github.com/user")
assert result == [{"login": "testuser", "id": 123}]


@@ -0,0 +1,320 @@
"""Tests for --skip-assets-on flag behavior (issue #135)."""
import pytest
from unittest.mock import Mock, patch
from github_backup import github_backup
class TestSkipAssetsOn:
"""Test suite for --skip-assets-on flag.
Issue #135: Allow skipping asset downloads for specific repositories
while still backing up release metadata.
"""
def _create_mock_args(self, **overrides):
"""Create a mock args object with sensible defaults."""
args = Mock()
args.user = "testuser"
args.output_directory = "/tmp/backup"
args.include_repository = False
args.include_everything = False
args.include_gists = False
args.include_starred_gists = False
args.all_starred = False
args.skip_existing = False
args.bare_clone = False
args.lfs_clone = False
args.no_prune = False
args.include_wiki = False
args.include_issues = False
args.include_issue_comments = False
args.include_issue_events = False
args.include_pulls = False
args.include_pull_comments = False
args.include_pull_commits = False
args.include_pull_details = False
args.include_labels = False
args.include_hooks = False
args.include_milestones = False
args.include_releases = True
args.include_assets = True
args.skip_assets_on = []
args.include_attachments = False
args.incremental = False
args.incremental_by_files = False
args.github_host = None
args.prefer_ssh = False
args.token_classic = "test-token"
args.token_fine = None
args.username = None
args.password = None
args.as_app = False
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.skip_prerelease = False
args.number_of_latest_releases = None
for key, value in overrides.items():
setattr(args, key, value)
return args
def _create_mock_repository(self, name="test-repo", owner="testuser"):
"""Create a mock repository object."""
return {
"name": name,
"full_name": f"{owner}/{name}",
"owner": {"login": owner},
"private": False,
"fork": False,
"has_wiki": False,
}
def _create_mock_release(self, tag="v1.0.0"):
"""Create a mock release object."""
return {
"tag_name": tag,
"name": tag,
"prerelease": False,
"draft": False,
"assets_url": f"https://api.github.com/repos/testuser/test-repo/releases/{tag}/assets",
}
def _create_mock_asset(self, name="asset.zip"):
"""Create a mock asset object."""
return {
"name": name,
"url": f"https://api.github.com/repos/testuser/test-repo/releases/assets/{name}",
}
class TestSkipAssetsOnArgumentParsing(TestSkipAssetsOn):
"""Tests for --skip-assets-on argument parsing."""
def test_skip_assets_on_not_set_defaults_to_none(self):
"""When --skip-assets-on is not specified, it should default to None."""
args = github_backup.parse_args(["testuser"])
assert args.skip_assets_on is None
def test_skip_assets_on_single_repo(self):
"""Single --skip-assets-on should create list with one item."""
args = github_backup.parse_args(["testuser", "--skip-assets-on", "big-repo"])
assert args.skip_assets_on == ["big-repo"]
def test_skip_assets_on_multiple_repos(self):
"""Multiple repos can be specified space-separated (like --exclude)."""
args = github_backup.parse_args(
[
"testuser",
"--skip-assets-on",
"big-repo",
"another-repo",
"owner/third-repo",
]
)
assert args.skip_assets_on == ["big-repo", "another-repo", "owner/third-repo"]
class TestSkipAssetsOnBehavior(TestSkipAssetsOn):
"""Tests for --skip-assets-on behavior in backup_releases."""
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_assets_downloaded_when_not_skipped(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Assets should be downloaded when repo is not in skip list."""
args = self._create_mock_args(skip_assets_on=[])
repository = self._create_mock_repository(name="normal-repo")
release = self._create_mock_release()
asset = self._create_mock_asset()
mock_json_dump.return_value = True
mock_retrieve.side_effect = [
[release], # First call: get releases
[asset], # Second call: get assets
]
with patch("os.path.join", side_effect=lambda *args: "/".join(args)):
github_backup.backup_releases(
args,
"/tmp/backup/repositories/normal-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should have been called for the asset
mock_download.assert_called_once()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_assets_skipped_when_repo_name_matches(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Assets should be skipped when repo name is in skip list."""
args = self._create_mock_args(skip_assets_on=["big-repo"])
repository = self._create_mock_repository(name="big-repo")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_assets_skipped_when_full_name_matches(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Assets should be skipped when owner/repo format matches."""
args = self._create_mock_args(skip_assets_on=["otheruser/big-repo"])
repository = self._create_mock_repository(name="big-repo", owner="otheruser")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_case_insensitive_matching(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Skip matching should be case-insensitive."""
# User types uppercase, repo name is lowercase
args = self._create_mock_args(skip_assets_on=["BIG-REPO"])
repository = self._create_mock_repository(name="big-repo")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called (case-insensitive match)
assert not mock_download.called
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_multiple_skip_repos(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Multiple repos in skip list should all be skipped."""
args = self._create_mock_args(skip_assets_on=["repo1", "repo2", "repo3"])
repository = self._create_mock_repository(name="repo2")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/repo2",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_release_metadata_still_saved_when_assets_skipped(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Release JSON should still be saved even when assets are skipped."""
args = self._create_mock_args(skip_assets_on=["big-repo"])
repository = self._create_mock_repository(name="big-repo")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# json_dump_if_changed should have been called for release metadata
mock_json_dump.assert_called_once()
# But download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_non_matching_repo_still_downloads_assets(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Repos not in skip list should still download assets."""
args = self._create_mock_args(skip_assets_on=["other-repo"])
repository = self._create_mock_repository(name="normal-repo")
release = self._create_mock_release()
asset = self._create_mock_asset()
mock_json_dump.return_value = True
mock_retrieve.side_effect = [
[release], # First call: get releases
[asset], # Second call: get assets
]
with patch("os.path.join", side_effect=lambda *args: "/".join(args)):
github_backup.backup_releases(
args,
"/tmp/backup/repositories/normal-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file SHOULD have been called
mock_download.assert_called_once()
if __name__ == "__main__":
pytest.main([__file__, "-v"])
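
Review note: the `--skip-assets-on` tests imply a matching rule: an entry matches a repository by bare name or by `owner/name`, case-insensitively, and an unset flag (`None`) or empty list skips nothing. The sketch below shows one way that rule could be expressed. `should_skip_assets` is a hypothetical helper name used for illustration; the diff does not show how `backup_releases` actually performs the check.

```python
def should_skip_assets(repository, skip_assets_on):
    """Return True when the repository is listed by name or by owner/name."""
    if not skip_assets_on:
        # Covers both None (flag not given) and an empty list.
        return False
    wanted = {entry.lower() for entry in skip_assets_on}
    name = repository.get("name", "").lower()
    full_name = repository.get("full_name", "").lower()
    return name in wanted or full_name in wanted
```

With a repository dict shaped like the one from `_create_mock_repository`, `should_skip_assets(repo, ["BIG-REPO"])` is true for a repo named `big-repo`, while release metadata handling would be unaffected because only the asset download step consults this check.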