Compare commits


18 Commits

Author SHA1 Message Date
GitHub Action
81a72ac8af Release version 0.59.0 2025-12-21 23:48:36 +00:00
Jose Diaz-Gonzalez
3edbfc777c Merge pull request #472 from Iamrodos/feature/108-starred-skip-size-over
Add --starred-skip-size-over flag to limit starred repo size (#108)
2025-12-21 18:47:58 -05:00
Rodos
3c43e0f481 Add --starred-skip-size-over flag to limit starred repo size (#108)
Allow users to skip starred repositories exceeding a size threshold
when using --all-starred. Size is specified in MB and checked against
the GitHub API's repository size field.

- Only affects starred repos; user's own repos always included
- Logs each skipped repo with name and size

Closes #108
2025-12-21 22:18:09 +11:00
Jose Diaz-Gonzalez
875f09eeaf Merge pull request #473 from Iamrodos/chore/remove-password-auth
chore: remove deprecated -u/-p password authentication options
2025-12-21 01:36:35 -05:00
Rodos
db36c3c137 chore: remove deprecated -u/-p password authentication options 2025-12-20 19:16:11 +11:00
GitHub Action
c70cc43f57 Release version 0.58.0 2025-12-16 15:17:23 +00:00
Jose Diaz-Gonzalez
27d3fcdafa Merge pull request #471 from Iamrodos/fix/retry-logic
Fix retry logic for HTTP 5xx errors and network failures
2025-12-16 10:16:48 -05:00
Rodos
46140b0ff1 Fix retry logic for HTTP 5xx errors and network failures
Refactors error handling to retry all 5xx errors (not just 502), network errors (URLError, socket.error, IncompleteRead), and JSON parse errors with exponential backoff and jitter. Respects retry-after and rate limit headers per GitHub API requirements. Consolidates retry logic into make_request_with_retry() wrapper and adds clear logging for retry attempts and failures. Removes dead code from 2016 (errors list, _request_http_error, _request_url_error) that was intentionally disabled in commit 1e5a9048 to fix #29.

Fixes #140, #110, #138
2025-12-16 21:55:47 +11:00
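The backoff behaviour this commit describes can be sketched standalone. `backoff_delay` is an illustrative stand-in (the merged change consolidates this into `make_request_with_retry()`), assuming a 1 s base, a 120 s cap, and up to 10% jitter, with a `Retry-After` header honoured when present.

```python
import random


def backoff_delay(attempt: int, headers: dict) -> float:
    """Delay before retry `attempt` (0-based).

    Honours a retry-after header when the server sends one; otherwise
    exponential backoff (1 s base, capped at 120 s) plus up to 10%
    jitter so concurrent clients do not retry in lockstep.
    """
    if "retry-after" in headers:
        return float(headers["retry-after"])
    base = min(1.0 * (2 ** attempt), 120.0)
    return base + random.uniform(0, base * 0.1)
```

Jitter is the design point here: without it, many clients that failed together would all retry at the same instant and hit the same 5xx again.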
Jose Diaz-Gonzalez
02dd902b67 Merge pull request #470 from Iamrodos/chore/cleanup-release-requirements
chore: remove transitive deps from release-requirements.txt
2025-12-12 21:51:24 -05:00
Rodos
241949137d chore: remove transitive deps from release-requirements.txt 2025-12-13 11:22:53 +11:00
Jose Diaz-Gonzalez
1155da849d Merge pull request #469 from josegonzalez/dependabot/pip/python-packages-3c63e8caab
chore(deps): bump urllib3 from 2.6.1 to 2.6.2 in the python-packages group
2025-12-12 16:39:50 -05:00
dependabot[bot]
59a70ff11a chore(deps): bump urllib3 in the python-packages group
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).


Updates `urllib3` from 2.6.1 to 2.6.2
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.1...2.6.2)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-12 13:09:29 +00:00
GitHub Action
ba852b5830 Release version 0.57.0 2025-12-12 11:07:14 +00:00
Jose Diaz-Gonzalez
934ee4b14b Merge pull request #467 from Iamrodos/docs/187-189-auth-docs
Add GitHub Apps documentation and stdin token example
2025-12-12 06:06:30 -05:00
Jose Diaz-Gonzalez
37a0c5c123 Merge pull request #468 from Iamrodos/feature/135-skip-assets-on
Add --skip-assets-on flag to skip release asset downloads (#135)
2025-12-12 06:05:47 -05:00
Rodos
f6e2f40b09 Add --skip-assets-on flag to skip release asset downloads (#135)
Allow users to skip downloading release assets for specific repositories
while still backing up release metadata. Useful for starred repos with
large assets (e.g. syncthing with 27GB+).

Usage: --skip-assets-on repo1 repo2 owner/repo3

Features:
- Space-separated repos (consistent with --exclude)
- Case-insensitive matching
- Supports both repo name and owner/repo format
2025-12-12 16:21:52 +11:00
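The matching rules listed above (case-insensitive, bare repo name or owner/repo) can be sketched with a hypothetical helper; the function name and shape are illustrative, not the project's actual implementation.

```python
def skip_assets_for(repo_full_name: str, skip_list: list[str]) -> bool:
    """True when release assets should be skipped for this repository.

    Matching is case-insensitive and accepts either a bare repository
    name ("repo1") or the fully qualified "owner/repo" form.
    """
    full = repo_full_name.lower()
    _owner, _, name = full.partition("/")
    entries = {e.lower() for e in skip_list}
    return full in entries or name in entries
```

With this shape, `--skip-assets-on syncthing` skips assets for any starred repo named `syncthing` regardless of owner, while `owner/repo` entries pin a single repository.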
Rodos
ef990483e2 Add GitHub Apps documentation and remove outdated header
- Add GitHub Apps authentication section with setup steps
  and CI/CD workflow example using actions/create-github-app-token
- Remove outdated machine-man-preview header (graduated 2020)

Closes #189
2025-12-12 10:25:49 +11:00
Rodos
3a513b6646 docs: add stdin token example to README
Add example showing how to pipe a token from stdin using
file:///dev/stdin to avoid storing tokens in environment
variables or command history.

Closes #187
2025-12-12 09:55:13 +11:00
14 changed files with 1364 additions and 408 deletions

View File

@@ -1,9 +1,86 @@
 Changelog
 =========

-0.56.0 (2025-12-11)
+0.59.0 (2025-12-21)
 -------------------
 ------------------------
+- Add --starred-skip-size-over flag to limit starred repo size (#108)
+  [Rodos]
+
+  Allow users to skip starred repositories exceeding a size threshold
+  when using --all-starred. Size is specified in MB and checked against
+  the GitHub API's repository size field.
+
+  - Only affects starred repos; user's own repos always included
+  - Logs each skipped repo with name and size
+
+  Closes #108
+- Chore: remove deprecated -u/-p password authentication options.
+  [Rodos]
+
+0.58.0 (2025-12-16)
+-------------------
+- Fix retry logic for HTTP 5xx errors and network failures. [Rodos]
+
+  Refactors error handling to retry all 5xx errors (not just 502), network errors (URLError, socket.error, IncompleteRead), and JSON parse errors with exponential backoff and jitter. Respects retry-after and rate limit headers per GitHub API requirements. Consolidates retry logic into make_request_with_retry() wrapper and adds clear logging for retry attempts and failures. Removes dead code from 2016 (errors list, _request_http_error, _request_url_error) that was intentionally disabled in commit 1e5a9048 to fix #29.
+
+  Fixes #140, #110, #138
+- Chore: remove transitive deps from release-requirements.txt. [Rodos]
+- Chore(deps): bump urllib3 in the python-packages group.
+  [dependabot[bot]]
+
+  Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).
+
+  Updates `urllib3` from 2.6.1 to 2.6.2
+  - [Release notes](https://github.com/urllib3/urllib3/releases)
+  - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
+  - [Commits](https://github.com/urllib3/urllib3/compare/2.6.1...2.6.2)
+
+  ---
+  updated-dependencies:
+  - dependency-name: urllib3
+    dependency-version: 2.6.2
+    dependency-type: direct:production
+    update-type: version-update:semver-patch
+    dependency-group: python-packages
+  ...
+
+0.57.0 (2025-12-12)
+-------------------
+- Add GitHub Apps documentation and remove outdated header. [Rodos]
+
+  - Add GitHub Apps authentication section with setup steps
+    and CI/CD workflow example using actions/create-github-app-token
+  - Remove outdated machine-man-preview header (graduated 2020)
+
+  Closes #189
+- Docs: add stdin token example to README. [Rodos]
+
+  Add example showing how to pipe a token from stdin using
+  file:///dev/stdin to avoid storing tokens in environment
+  variables or command history.
+
+  Closes #187
+- Add --skip-assets-on flag to skip release asset downloads (#135)
+  [Rodos]
+
+  Allow users to skip downloading release assets for specific repositories
+  while still backing up release metadata. Useful for starred repos with
+  large assets (e.g. syncthing with 27GB+).
+
+  Usage: --skip-assets-on repo1 repo2 owner/repo3
+
+  Features:
+  - Space-separated repos (consistent with --exclude)
+  - Case-insensitive matching
+  - Supports both repo name and owner/repo format
+
+0.56.0 (2025-12-11)
+-------------------

 Fix
 ~~~

View File

@@ -36,23 +36,26 @@ Show the CLI help output::

 CLI Help output::

-    github-backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN_CLASSIC]
-                  [-f TOKEN_FINE] [--as-app] [-o OUTPUT_DIRECTORY]
-                  [-l LOG_LEVEL] [-i] [--starred] [--all-starred]
-                  [--watched] [--followers] [--following] [--all] [--issues]
-                  [--issue-comments] [--issue-events] [--pulls]
+    github-backup [-h] [-t TOKEN_CLASSIC] [-f TOKEN_FINE] [-q] [--as-app]
+                  [-o OUTPUT_DIRECTORY] [-l LOG_LEVEL] [-i]
+                  [--incremental-by-files]
+                  [--starred] [--all-starred] [--starred-skip-size-over MB]
+                  [--watched] [--followers] [--following] [--all]
+                  [--issues] [--issue-comments] [--issue-events] [--pulls]
                   [--pull-comments] [--pull-commits] [--pull-details]
                   [--labels] [--hooks] [--milestones] [--repositories]
-                  [--bare] [--lfs] [--wikis] [--gists] [--starred-gists]
-                  [--skip-archived] [--skip-existing] [-L [LANGUAGES ...]]
-                  [-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY]
-                  [-P] [-F] [--prefer-ssh] [-v]
+                  [--bare] [--no-prune] [--lfs] [--wikis] [--gists]
+                  [--starred-gists] [--skip-archived] [--skip-existing]
+                  [-L [LANGUAGES ...]] [-N NAME_REGEX] [-H GITHUB_HOST]
+                  [-O] [-R REPOSITORY] [-P] [-F] [--prefer-ssh] [-v]
                   [--keychain-name OSX_KEYCHAIN_ITEM_NAME]
                   [--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT]
                   [--releases] [--latest-releases NUMBER_OF_LATEST_RELEASES]
-                  [--skip-prerelease] [--assets] [--attachments]
-                  [--exclude [REPOSITORY [REPOSITORY ...]]
-                  [--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE]
+                  [--skip-prerelease] [--assets]
+                  [--skip-assets-on [SKIP_ASSETS_ON ...]] [--attachments]
+                  [--throttle-limit THROTTLE_LIMIT]
+                  [--throttle-pause THROTTLE_PAUSE]
+                  [--exclude [EXCLUDE ...]]
                   USER

     Backup a github account
@@ -60,29 +63,29 @@ CLI Help output::

     positional arguments:
       USER                  github username

-    optional arguments:
+    options:
       -h, --help            show this help message and exit
-      -u USERNAME, --username USERNAME
-                            username for basic auth
-      -p PASSWORD, --password PASSWORD
-                            password for basic auth. If a username is given but
-                            not a password, the password will be prompted for.
-      -f TOKEN_FINE, --token-fine TOKEN_FINE
-                            fine-grained personal access token or path to token
-                            (file://...)
-      -t TOKEN_CLASSIC, --token TOKEN_CLASSIC
+      -t, --token TOKEN_CLASSIC
                             personal access, OAuth, or JSON Web token, or path to
                             token (file://...)
+      -f, --token-fine TOKEN_FINE
+                            fine-grained personal access token (github_pat_....),
+                            or path to token (file://...)
+      -q, --quiet           supress log messages less severe than warning, e.g.
+                            info
       --as-app              authenticate as github app instead of as a user.
-      -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
+      -o, --output-directory OUTPUT_DIRECTORY
                             directory at which to backup the repositories
-      -l LOG_LEVEL, --log-level LOG_LEVEL
+      -l, --log-level LOG_LEVEL
                             log level to use (default: info, possible levels:
                             debug, info, warning, error, critical)
       -i, --incremental     incremental backup
-      --incremental-by-files incremental backup using modified time of files
+      --incremental-by-files
+                            incremental backup based on modification date of files
       --starred             include JSON output of starred repositories in backup
       --all-starred         include starred repositories in backup [*]
+      --starred-skip-size-over MB
+                            skip starred repositories larger than this size in MB
       --watched             include JSON output of watched repositories in backup
       --followers           include JSON output of followers in backup
       --following           include JSON output of following users in backup
@@ -100,20 +103,22 @@ CLI Help output::
       --milestones          include milestones in backup
       --repositories        include repository clone in backup
       --bare                clone bare repositories
+      --no-prune            disable prune option for git fetch
       --lfs                 clone LFS repositories (requires Git LFS to be
                             installed, https://git-lfs.github.com) [*]
       --wikis               include wiki clone in backup
       --gists               include gists in backup [*]
       --starred-gists       include starred gists in backup [*]
       --skip-archived       skip project if it is archived
       --skip-existing       skip project if a backup directory exists
-      -L [LANGUAGES [LANGUAGES ...]], --languages [LANGUAGES [LANGUAGES ...]]
+      -L, --languages [LANGUAGES ...]
                             only allow these languages
-      -N NAME_REGEX, --name-regex NAME_REGEX
+      -N, --name-regex NAME_REGEX
                             python regex to match names against
-      -H GITHUB_HOST, --github-host GITHUB_HOST
+      -H, --github-host GITHUB_HOST
                             GitHub Enterprise hostname
       -O, --organization    whether or not this is an organization user
-      -R REPOSITORY, --repository REPOSITORY
+      -R, --repository REPOSITORY
                             name of repository to limit backup to
       -P, --private         include private repositories [*]
       -F, --fork            include forked repositories [*]
@@ -128,16 +133,16 @@ CLI Help output::
       --releases            include release information, not including assets or
                             binaries
       --latest-releases NUMBER_OF_LATEST_RELEASES
-                            include certain number of the latest releases;
-                            only applies if including releases
-      --skip-prerelease     skip prerelease and draft versions; only applies if including releases
+                            include certain number of the latest releases; only
+                            applies if including releases
+      --skip-prerelease     skip prerelease and draft versions; only applies if
+                            including releases
       --assets              include assets alongside release information; only
                             applies if including releases
-      --attachments         download user-attachments from issues and pull requests
-                            to issues/attachments/{issue_number}/ and
-                            pulls/attachments/{pull_number}/ directories
-      --exclude [REPOSITORY [REPOSITORY ...]]
-                            names of repositories to exclude from backup.
+      --skip-assets-on [SKIP_ASSETS_ON ...]
+                            skip asset downloads for these repositories
+      --attachments         download user-attachments from issues and pull
+                            requests
       --throttle-limit THROTTLE_LIMIT
                             start throttling of GitHub API requests after this
                             amount of API requests remain
@@ -145,6 +150,8 @@ CLI Help output::
                             wait this amount of seconds when API request
                             throttling is active (default: 30.0, requires
                             --throttle-limit to be set)
+      --exclude [EXCLUDE ...]
+                            names of repositories to exclude

 Usage Details
@@ -153,13 +160,13 @@ Usage Details

 Authentication
 --------------

-**Password-based authentication** will fail if you have two-factor authentication enabled, and will `be deprecated <https://github.blog/2023-03-09-raising-the-bar-for-software-security-github-2fa-begins-march-13/>`_ by 2023 EOY.
+GitHub requires token-based authentication for API access. Password authentication was `removed in November 2020 <https://developer.github.com/changes/2020-02-14-deprecating-password-auth/>`_.

-``--username`` is used for basic password authentication and separate from the positional argument ``USER``, which specifies the user account you wish to back up.
+The positional argument ``USER`` specifies the user or organization account you wish to back up.

-**Classic tokens** are `slightly less secure <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#personal-access-tokens-classic>`_ as they provide very coarse-grained permissions.
-If you need authentication for long-running backups (e.g. for a cron job) it is recommended to use **fine-grained personal access token** ``-f TOKEN_FINE``.
+**Fine-grained tokens** (``-f TOKEN_FINE``) are recommended for most use cases, especially long-running backups (e.g. cron jobs), as they provide precise permission control.
+**Classic tokens** (``-t TOKEN``) are `slightly less secure <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#personal-access-tokens-classic>`_ as they provide very coarse-grained permissions.

 Fine Tokens
@@ -174,6 +181,37 @@ Customise the permissions for your use case, but for a personal account full bac

 **Repository permissions**: Read access to contents, issues, metadata, pull requests, and webhooks.

+GitHub Apps
+~~~~~~~~~~~
+
+GitHub Apps are ideal for organization backups in CI/CD. Tokens are scoped to specific repositories and expire after 1 hour.
+
+**One-time setup:**
+
+1. Create a GitHub App at *Settings -> Developer Settings -> GitHub Apps -> New GitHub App*
+2. Set a name and homepage URL (can be any URL)
+3. Uncheck "Webhook > Active" (not needed for backups)
+4. Set permissions (same as fine-grained tokens above)
+5. Click "Create GitHub App", then note the **App ID** shown on the next page
+6. Under "Private keys", click "Generate a private key" and save the downloaded file
+7. Go to *Install App* in your app's settings
+8. Select the account/organization and which repositories to back up
+
+**CI/CD usage with GitHub Actions:**
+
+Store the App ID as a repository variable and the private key contents as a secret, then use ``actions/create-github-app-token``::
+
+    - uses: actions/create-github-app-token@v1
+      id: app-token
+      with:
+        app-id: ${{ vars.APP_ID }}
+        private-key: ${{ secrets.APP_PRIVATE_KEY }}
+
+    - run: github-backup myorg -t ${{ steps.app-token.outputs.token }} --as-app -o ./backup --all
+
+Note: Installation tokens expire after 1 hour. For long-running backups, use a fine-grained personal access token instead.
+
 Prefer SSH
 ~~~~~~~~~~
@@ -256,10 +294,20 @@ All is not everything

 The ``--all`` argument does not include: cloning private repos (``-P, --private``), cloning forks (``-F, --fork``), cloning starred repositories (``--all-starred``), ``--pull-details``, cloning LFS repositories (``--lfs``), cloning gists (``--gists``) or cloning starred gist repos (``--starred-gists``). See examples for more.

-Cloning all starred size
-------------------------
+Starred repository size
+-----------------------

-Using the ``--all-starred`` argument to clone all starred repositories may use a large amount of storage space, especially if ``--all`` or more arguments are used. e.g. commonly starred repos can have tens of thousands of issues, many large assets and the repo itself etc. Consider just storing links to starred repos in JSON format with ``--starred``.
+Using the ``--all-starred`` argument to clone all starred repositories may use a large amount of storage space.
+
+To see your starred repositories sorted by size (requires `GitHub CLI <https://cli.github.com>`_)::
+
+    gh api user/starred --paginate --jq 'sort_by(-.size)[]|"\(.full_name) \(.size/1024|round)MB"'
+
+To limit which starred repositories are cloned, use ``--starred-skip-size-over SIZE`` where SIZE is in MB. For example, ``--starred-skip-size-over 500`` will skip any starred repository where the git repository size (code and history) exceeds 500 MB. Note that this size limit only applies to the repository itself, not issues, release assets or other metadata. This filter only affects starred repositories; your own repositories are always included regardless of size.
+
+For finer control, avoid using ``--assets`` with starred repos, or use ``--skip-assets-on`` for specific repositories with large release binaries.
+
+Alternatively, consider just storing links to starred repos in JSON format with ``--starred``.

 Incremental Backup
 ------------------
@@ -361,6 +409,9 @@ Debug an error/block or incomplete backup into a temporary directory. Omit "incr

     github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER

+Pipe a token from stdin to avoid storing it in environment variables or command history (Unix-like systems only)::
+
+    my-secret-manager get github-token | github-backup user -t file:///dev/stdin -o /backup --repositories

 Restoring from Backup
 =====================

View File

@@ -1 +1 @@
-__version__ = "0.56.0"
+__version__ = "0.59.0"

View File

@@ -43,7 +43,7 @@ def main():
     if args.private and not get_auth(args):
         logger.warning(
             "The --private flag has no effect without authentication. "
-            "Use -t/--token, -f/--token-fine, or -u/--username to authenticate."
+            "Use -t/--token or -f/--token-fine to authenticate."
         )

     if args.quiet:

View File

@@ -7,11 +7,11 @@ import base64
 import calendar
 import codecs
 import errno
-import getpass
 import json
 import logging
 import os
 import platform
+import random
 import re
 import select
 import socket
@@ -19,10 +19,10 @@ import ssl
 import subprocess
 import sys
 import time
+from collections.abc import Generator
 from datetime import datetime
 from http.client import IncompleteRead
 from urllib.error import HTTPError, URLError
-from urllib.parse import quote as urlquote
 from urllib.parse import urlencode, urlparse
 from urllib.request import HTTPRedirectHandler, Request, build_opener, urlopen
@@ -74,6 +74,9 @@ else:
         " 3. Debian/Ubuntu: apt-get install ca-certificates\n\n"
     )

+# Retry configuration
+MAX_RETRIES = 5
+

 def logging_subprocess(
     popenargs, stdout_log_level=logging.DEBUG, stderr_log_level=logging.ERROR, **kwargs
@@ -144,17 +147,6 @@ def mask_password(url, secret="*****"):

 def parse_args(args=None):
     parser = argparse.ArgumentParser(description="Backup a github account")
     parser.add_argument("user", metavar="USER", type=str, help="github username")
-    parser.add_argument(
-        "-u", "--username", dest="username", help="username for basic auth"
-    )
-    parser.add_argument(
-        "-p",
-        "--password",
-        dest="password",
-        help="password for basic auth. "
-        "If a username is given but not a password, the "
-        "password will be prompted for.",
-    )
     parser.add_argument(
         "-t",
         "--token",
@@ -219,6 +211,13 @@ def parse_args(args=None):
         dest="all_starred",
         help="include starred repositories in backup [*]",
     )
+    parser.add_argument(
+        "--starred-skip-size-over",
+        type=int,
+        metavar="MB",
+        dest="starred_skip_size_over",
+        help="skip starred repositories larger than this size in MB",
+    )
     parser.add_argument(
         "--watched",
         action="store_true",
@@ -440,6 +439,12 @@ def parse_args(args=None):
         dest="include_assets",
         help="include assets alongside release information; only applies if including releases",
     )
+    parser.add_argument(
+        "--skip-assets-on",
+        dest="skip_assets_on",
+        nargs="*",
+        help="skip asset downloads for these repositories",
+    )
     parser.add_argument(
         "--attachments",
         action="store_true",
@@ -522,16 +527,6 @@ def get_auth(args, encode=True, for_git_cli=False):
             auth = args.token_classic
         else:
             auth = "x-access-token:" + args.token_classic
-    elif args.username:
-        if not args.password:
-            args.password = getpass.getpass()
-        if encode:
-            password = args.password
-        else:
-            password = urlquote(args.password)
-        auth = args.username + ":" + password
-    elif args.password:
-        raise Exception("You must specify a username for basic auth")

     if not auth:
         return None
@@ -561,7 +556,7 @@ def get_github_host(args):

 def read_file_contents(file_uri):
-    return open(file_uri[len(FILE_URI_PREFIX):], "rt").readline().strip()
+    return open(file_uri[len(FILE_URI_PREFIX) :], "rt").readline().strip()


 def get_github_repo_url(args, repository):
@@ -597,175 +592,181 @@ def get_github_repo_url(args, repository):
return repo_url return repo_url
def retrieve_data_gen(args, template, query_args=None, single_request=False): def calculate_retry_delay(attempt, headers):
"""Calculate delay before next retry with exponential backoff."""
# Respect retry-after header if present
if retry_after := headers.get("retry-after"):
return int(retry_after)
# Respect rate limit reset time
if int(headers.get("x-ratelimit-remaining", 1)) < 1:
reset_time = int(headers.get("x-ratelimit-reset", 0))
return max(10, reset_time - calendar.timegm(time.gmtime()))
# Exponential backoff with jitter for server errors (1s base, 120s max)
delay = min(1.0 * (2**attempt), 120.0)
return delay + random.uniform(0, delay * 0.1)
def retrieve_data(args, template, query_args=None, paginated=True):
"""
Fetch the data from GitHub API.
Handle both single requests and pagination with yield of individual dicts.
Handles throttling, retries, read errors, and DMCA takedowns.
"""
query_args = query_args or {}
auth = get_auth(args, encode=not args.as_app) auth = get_auth(args, encode=not args.as_app)
query_args = get_query_args(query_args)
per_page = 100 per_page = 100
def _extract_next_page_url(link_header):
for link in link_header.split(","):
if 'rel="next"' in link:
return link[link.find("<") + 1:link.find(">")]
return None
def fetch_all() -> Generator[dict, None, None]:
next_url = None next_url = None
while True: while True:
if single_request: # FIRST: Fetch response
request_per_page = None
else:
request_per_page = per_page
for attempt in range(MAX_RETRIES):
request = _construct_request( request = _construct_request(
request_per_page, per_page=per_page if paginated else None,
query_args, query_args=query_args,
next_url or template, template=next_url or template,
auth, auth=auth,
as_app=args.as_app, as_app=args.as_app,
fine=True if args.token_fine is not None else False, fine=args.token_fine is not None,
) # noqa )
r, errors = _get_response(request, auth, next_url or template) http_response = make_request_with_retry(request, auth)
status_code = int(r.getcode()) match http_response.getcode():
case 200:
# Success - Parse JSON response
try:
response = json.loads(http_response.read().decode("utf-8"))
break # Exit retry loop and handle the data returned
except (
IncompleteRead,
json.decoder.JSONDecodeError,
TimeoutError,
) as e:
logger.warning(f"{type(e).__name__} reading response")
if attempt < MAX_RETRIES - 1:
delay = calculate_retry_delay(attempt, {})
logger.warning(
f"Retrying in {delay:.1f}s (attempt {attempt + 1}/{MAX_RETRIES})"
)
time.sleep(delay)
continue # Next retry attempt
# Handle DMCA takedown (HTTP 451) - raise exception to skip entire repository case 451:
if status_code == 451: # DMCA takedown - extract URL if available, then raise
dmca_url = None dmca_url = None
try: try:
response_data = json.loads(r.read().decode("utf-8")) response_data = json.loads(
http_response.read().decode("utf-8")
)
dmca_url = response_data.get("block", {}).get("html_url") dmca_url = response_data.get("block", {}).get("html_url")
except Exception: except Exception:
pass pass
raise RepositoryUnavailableError( raise RepositoryUnavailableError(
"Repository unavailable due to legal reasons (HTTP 451)", "Repository unavailable due to legal reasons (HTTP 451)",
dmca_url=dmca_url dmca_url=dmca_url,
) )
-        # Check if we got correct data
-        try:
-            response = json.loads(r.read().decode("utf-8"))
-        except IncompleteRead:
-            logger.warning("Incomplete read error detected")
-            read_error = True
-        except json.decoder.JSONDecodeError:
-            logger.warning("JSON decode error detected")
-            read_error = True
-        except TimeoutError:
-            logger.warning("Tiemout error detected")
-            read_error = True
-        else:
-            read_error = False
-
-        # be gentle with API request limit and throttle requests if remaining requests getting low
-        limit_remaining = int(r.headers.get("x-ratelimit-remaining", 0))
-        if args.throttle_limit and limit_remaining <= args.throttle_limit:
-            logger.info(
-                "API request limit hit: {} requests left, pausing further requests for {}s".format(
-                    limit_remaining, args.throttle_pause
-                )
-            )
-            time.sleep(args.throttle_pause)
-
-        retries = 0
-        while retries < 3 and (status_code == 502 or read_error):
-            logger.warning("API request failed. Retrying in 5 seconds")
-            retries += 1
-            time.sleep(5)
-            request = _construct_request(
-                request_per_page,
-                query_args,
-                next_url or template,
-                auth,
-                as_app=args.as_app,
-                fine=True if args.token_fine is not None else False,
-            )  # noqa
-            r, errors = _get_response(request, auth, next_url or template)
-            status_code = int(r.getcode())
-            try:
-                response = json.loads(r.read().decode("utf-8"))
-                read_error = False
-            except IncompleteRead:
-                logger.warning("Incomplete read error detected")
-                read_error = True
-            except json.decoder.JSONDecodeError:
-                logger.warning("JSON decode error detected")
-                read_error = True
-            except TimeoutError:
-                logger.warning("Tiemout error detected")
-                read_error = True
-
-        if status_code != 200:
-            template = "API request returned HTTP {0}: {1}"
-            errors.append(template.format(status_code, r.reason))
-            raise Exception(", ".join(errors))
-
-        if read_error:
-            template = "API request problem reading response for {0}"
-            errors.append(template.format(request))
-            raise Exception(", ".join(errors))
-
-        if len(errors) == 0:
-            if type(response) is list:
-                for resp in response:
-                    yield resp
-                # Parse Link header for next page URL (cursor-based pagination)
-                link_header = r.headers.get("Link", "")
-                next_url = None
-                if link_header:
-                    # Parse Link header: <https://api.github.com/...?per_page=100&after=cursor>; rel="next"
-                    for link in link_header.split(","):
-                        if 'rel="next"' in link:
-                            next_url = link[link.find("<") + 1:link.find(">")]
-                            break
-                if not next_url:
-                    break
-            elif type(response) is dict and single_request:
-                yield response
-
-        if len(errors) > 0:
-            raise Exception(", ".join(errors))
-
-        if single_request:
-            break
+                    case _:
+                        raise Exception(
+                            f"API request returned HTTP {http_response.getcode()}: {http_response.reason}"
+                        )
+            else:
+                logger.error(
+                    f"Failed to read response after {MAX_RETRIES} attempts for {next_url or template}"
+                )
+                raise Exception(
+                    f"Failed to read response after {MAX_RETRIES} attempts for {next_url or template}"
+                )
+
+            # SECOND: Process and paginate
+            # Pause before next request if rate limit is low
+            if (
+                remaining := int(http_response.headers.get("x-ratelimit-remaining", 0))
+            ) <= (args.throttle_limit or 0):
+                if args.throttle_limit:
+                    logger.info(
+                        f"Throttling: {remaining} requests left, pausing {args.throttle_pause}s"
+                    )
+                    time.sleep(args.throttle_pause)
+
+            # Yield results
+            if isinstance(response, list):
+                yield from response
+            elif isinstance(response, dict):
+                yield response
+
+            # Check for more pages
+            if not paginated or not (
+                next_url := _extract_next_page_url(
+                    http_response.headers.get("Link", "")
+                )
+            ):
+                break  # No more data
+
+    return list(fetch_all())
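The new code delegates next-page detection to `_extract_next_page_url`, whose body falls outside this diff. A minimal sketch of such a parser, assuming GitHub's standard `Link: <url>; rel="next"` header format; the name and exact behavior here are illustrative, not the project's actual helper:

```python
import re


def extract_next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None.

    Hypothetical stand-in for _extract_next_page_url; handles both
    page-based (?page=2) and cursor-based (?after=...) next links,
    since it only cares about the rel="next" tag.
    """
    for part in link_header.split(","):
        match = re.search(r'<([^>]+)>\s*;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None
```

Because the parser keys only on `rel="next"`, the same loop terminates naturally on the last page, where GitHub omits that relation from the header.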
-def retrieve_data(args, template, query_args=None, single_request=False):
-    return list(retrieve_data_gen(args, template, query_args, single_request))
-
-
-def get_query_args(query_args=None):
-    if not query_args:
-        query_args = {}
-    return query_args
-
-
-def _get_response(request, auth, template):
-    retry_timeout = 3
-    errors = []
-    # We'll make requests in a loop so we can
-    # delay and retry in the case of rate-limiting
-    while True:
-        should_continue = False
-        try:
-            r = urlopen(request, context=https_ctx)
-        except HTTPError as exc:
-            errors, should_continue = _request_http_error(exc, auth, errors)  # noqa
-            r = exc
-        except URLError as e:
-            logger.warning(e.reason)
-            should_continue, retry_timeout = _request_url_error(template, retry_timeout)
-            if not should_continue:
-                raise
-        except socket.error as e:
-            logger.warning(e.strerror)
-            should_continue, retry_timeout = _request_url_error(template, retry_timeout)
-            if not should_continue:
-                raise
-        if should_continue:
-            continue
-        break
-    return r, errors
+def make_request_with_retry(request, auth):
+    """Make HTTP request with automatic retry for transient errors."""
+
+    def is_retryable_status(status_code, headers):
+        # Server errors are always retryable
+        if status_code in (500, 502, 503, 504):
+            return True
+        # Rate limit (403/429) is retryable if limit exhausted
+        if status_code in (403, 429):
+            return int(headers.get("x-ratelimit-remaining", 1)) < 1
+        return False
+
+    for attempt in range(MAX_RETRIES):
+        try:
+            return urlopen(request, context=https_ctx)
+        except HTTPError as exc:
+            # HTTPError can be used as a response-like object
+            if not is_retryable_status(exc.code, exc.headers):
+                raise  # Non-retryable error
+            if attempt >= MAX_RETRIES - 1:
+                logger.error(f"HTTP {exc.code} failed after {MAX_RETRIES} attempts")
+                raise
+            delay = calculate_retry_delay(attempt, exc.headers)
+            logger.warning(
+                f"HTTP {exc.code}, retrying in {delay:.1f}s "
+                f"(attempt {attempt + 1}/{MAX_RETRIES})"
+            )
+            if auth is None and exc.code in (403, 429):
+                logger.info("Hint: Authenticate to raise your GitHub rate limit")
+            time.sleep(delay)
+        except (URLError, socket.error) as e:
+            if attempt >= MAX_RETRIES - 1:
+                logger.error(f"Connection error failed after {MAX_RETRIES} attempts: {e}")
+                raise
+            delay = calculate_retry_delay(attempt, {})
+            logger.warning(
+                f"Connection error: {e}, retrying in {delay:.1f}s "
+                f"(attempt {attempt + 1}/{MAX_RETRIES})"
+            )
+            time.sleep(delay)
+    raise Exception(f"Request failed after {MAX_RETRIES} attempts")  # pragma: no cover
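`calculate_retry_delay` is called above but defined outside this hunk. The tests added in this PR pin down its contract: an explicit `retry-after` header wins, an exhausted rate limit waits until `x-ratelimit-reset` with a 10-second floor, and everything else gets exponential backoff capped at 120 seconds plus up to 10% jitter. A sketch consistent with those tests, not the project's exact implementation:

```python
import calendar
import random
import time


def retry_delay(attempt, headers):
    """Illustrative calculate_retry_delay-style helper, inferred from
    this PR's tests; details of the real function may differ."""
    # An explicit Retry-After header wins outright
    retry_after = headers.get("retry-after")
    if retry_after:
        return float(retry_after)
    # Exhausted rate limit: sleep until the reset timestamp, at least 10s
    if headers.get("x-ratelimit-remaining") == "0":
        reset = int(headers.get("x-ratelimit-reset", 0))
        return max(10, reset - calendar.timegm(time.gmtime()))
    # Otherwise exponential backoff (1s, 2s, 4s, ...) capped at 120s,
    # with up to 10% random jitter to avoid synchronized retries
    base = min(120, 2 ** attempt)
    return base * (1 + random.random() * 0.1)
```

The jitter matters when many backup jobs hit the same outage: without it, every client would retry on the same schedule and hammer the API in lockstep.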
-def _construct_request(
-    per_page, query_args, template, auth, as_app=None, fine=False
-):
+def _construct_request(per_page, query_args, template, auth, as_app=None, fine=False):
     # If template is already a full URL with query params (from Link header), use it directly
     if "?" in template and template.startswith("http"):
         request_url = template
@@ -796,9 +797,6 @@ def _construct_request(
         else:
             auth = auth.encode("ascii")
         request.add_header("Authorization", "token ".encode("ascii") + auth)
-        request.add_header(
-            "Accept", "application/vnd.github.machine-man-preview+json"
-        )

     log_url = template if "?" not in template else template.split("?")[0]
     if querystring:
@@ -807,52 +805,6 @@ def _construct_request(
     return request
-def _request_http_error(exc, auth, errors):
-    # HTTPError behaves like a Response so we can
-    # check the status code and headers to see exactly
-    # what failed.
-    should_continue = False
-    headers = exc.headers
-    limit_remaining = int(headers.get("x-ratelimit-remaining", 0))
-
-    if exc.code == 403 and limit_remaining < 1:
-        # The X-RateLimit-Reset header includes a
-        # timestamp telling us when the limit will reset
-        # so we can calculate how long to wait rather
-        # than inefficiently polling:
-        gm_now = calendar.timegm(time.gmtime())
-        reset = int(headers.get("x-ratelimit-reset", 0)) or gm_now
-        # We'll never sleep for less than 10 seconds:
-        delta = max(10, reset - gm_now)
-
-        limit = headers.get("x-ratelimit-limit")
-        logger.warning(
-            "Exceeded rate limit of {} requests; waiting {} seconds to reset".format(
-                limit, delta
-            )
-        )  # noqa
-
-        if auth is None:
-            logger.info("Hint: Authenticate to raise your GitHub rate limit")
-
-        time.sleep(delta)
-        should_continue = True
-    return errors, should_continue
-
-
-def _request_url_error(template, retry_timeout):
-    # In case of a connection timing out, we can retry a few time
-    # But we won't crash and not back-up the rest now
-    logger.info("'{}' timed out".format(template))
-    retry_timeout -= 1
-
-    if retry_timeout >= 0:
-        return True, retry_timeout
-
-    raise Exception("'{}' timed out to much, skipping!".format(template))
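The contrast with the removed helpers is the heart of this fix: the old path retried only 502s, read errors, and exhausted 403 rate limits, while the new `is_retryable_status` covers all 5xx errors and treats 403/429 as retryable only when the rate limit is actually exhausted. A standalone restatement of that classification, for clarity:

```python
def is_retryable(status_code, headers):
    """Mirrors the is_retryable_status logic in this PR: all 5xx server
    errors retry; 403/429 retry only when x-ratelimit-remaining is 0;
    everything else (404, 451, ...) fails fast."""
    if status_code in (500, 502, 503, 504):
        return True
    if status_code in (403, 429):
        # A 403 with budget remaining is a real permission error, not
        # rate limiting, so retrying it would never help
        return int(headers.get("x-ratelimit-remaining", 1)) < 1
    return False
```

Failing fast on non-retryable codes is what lets HTTP 451 surface immediately as `RepositoryUnavailableError` instead of burning retry attempts.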
 class S3HTTPRedirectHandler(HTTPRedirectHandler):
     """
     A subclassed redirect handler for downloading Github assets from S3.
@@ -1480,9 +1432,11 @@ def download_attachments(
         manifest = {
             "issue_number": number,
             "issue_type": item_type,
-            "repository": f"{args.user}/{args.repository}"
-            if hasattr(args, "repository") and args.repository
-            else args.user,
+            "repository": (
+                f"{args.user}/{args.repository}"
+                if hasattr(args, "repository") and args.repository
+                else args.user
+            ),
             "manifest_updated_at": datetime.now(timezone.utc).isoformat(),
             "attachments": attachment_metadata_list,
         }
@@ -1500,7 +1454,7 @@ def download_attachments(
 def get_authenticated_user(args):
     template = "https://{0}/user".format(get_github_api_host(args))
-    data = retrieve_data(args, template, single_request=True)
+    data = retrieve_data(args, template, paginated=False)
     return data[0]
@@ -1514,7 +1468,7 @@ def check_git_lfs_install():
 def retrieve_repositories(args, authenticated_user):
     logger.info("Retrieving repositories")
-    single_request = False
+    paginated = True
     if args.user == authenticated_user["login"]:
         # we must use the /user/repos API to be able to access private repos
         template = "https://{0}/user/repos".format(get_github_api_host(args))
@@ -1537,18 +1491,16 @@ def retrieve_repositories(args, authenticated_user):
             repo_path = args.repository
         else:
             repo_path = "{0}/{1}".format(args.user, args.repository)
-        single_request = True
-        template = "https://{0}/repos/{1}".format(
-            get_github_api_host(args), repo_path
-        )
+        paginated = False
+        template = "https://{0}/repos/{1}".format(get_github_api_host(args), repo_path)

-    repos = retrieve_data(args, template, single_request=single_request)
+    repos = retrieve_data(args, template, paginated=paginated)

     if args.all_starred:
         starred_template = "https://{0}/users/{1}/starred".format(
             get_github_api_host(args), args.user
         )
-        starred_repos = retrieve_data(args, starred_template, single_request=False)
+        starred_repos = retrieve_data(args, starred_template)
         # flag each repo as starred for downstream processing
         for item in starred_repos:
             item.update({"is_starred": True})
@@ -1558,14 +1510,17 @@ def retrieve_repositories(args, authenticated_user):
         gists_template = "https://{0}/users/{1}/gists".format(
             get_github_api_host(args), args.user
         )
-        gists = retrieve_data(args, gists_template, single_request=False)
+        gists = retrieve_data(args, gists_template)
         # flag each repo as a gist for downstream processing
         for item in gists:
             item.update({"is_gist": True})
         repos.extend(gists)

     if args.include_starred_gists:
-        if not authenticated_user.get("login") or args.user.lower() != authenticated_user["login"].lower():
+        if (
+            not authenticated_user.get("login")
+            or args.user.lower() != authenticated_user["login"].lower()
+        ):
             logger.warning(
                 "Cannot retrieve starred gists for '%s'. GitHub only allows access to the authenticated user's starred gists.",
                 args.user,
@@ -1574,9 +1529,7 @@ def retrieve_repositories(args, authenticated_user):
             starred_gists_template = "https://{0}/gists/starred".format(
                 get_github_api_host(args)
             )
-            starred_gists = retrieve_data(
-                args, starred_gists_template, single_request=False
-            )
+            starred_gists = retrieve_data(args, starred_gists_template)
             # flag each repo as a starred gist for downstream processing
             for item in starred_gists:
                 item.update({"is_gist": True, "is_starred": True})
@@ -1624,6 +1577,25 @@ def filter_repositories(args, unfiltered_repositories):
         ]
     if args.skip_archived:
         repositories = [r for r in repositories if not r.get("archived")]
+    if args.starred_skip_size_over is not None:
+        if args.starred_skip_size_over <= 0:
+            logger.warning(
+                "--starred-skip-size-over must be greater than 0, ignoring"
+            )
+        else:
+            size_limit_kb = args.starred_skip_size_over * 1024
+            filtered = []
+            for r in repositories:
+                if r.get("is_starred") and r.get("size", 0) > size_limit_kb:
+                    size_mb = r.get("size", 0) / 1024
+                    logger.info(
+                        "Skipping starred repo {0} ({1:.0f} MB) due to --starred-skip-size-over {2}".format(
+                            r.get("full_name", r.get("name")), size_mb, args.starred_skip_size_over
+                        )
+                    )
+                else:
+                    filtered.append(r)
+            repositories = filtered
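The unit conversion in the block above is the subtle part: the flag is given in MB, but GitHub's API reports repository `size` in KB, so the threshold is multiplied by 1024 before comparing, and a repo's own (non-starred) entries are never skipped. The same logic as a small pure function, separated out here only for illustration:

```python
def partition_starred_by_size(repos, limit_mb):
    """Illustrative restatement of the --starred-skip-size-over filter:
    GitHub reports `size` in KB, the flag is in MB, and only repos
    flagged is_starred are ever dropped."""
    limit_kb = limit_mb * 1024
    kept, skipped = [], []
    for repo in repos:
        if repo.get("is_starred") and repo.get("size", 0) > limit_kb:
            skipped.append(repo)  # starred and over the threshold
        else:
            kept.append(repo)  # own repos pass through regardless of size
    return kept, skipped
```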
     if args.exclude:
         repositories = [
             r for r in repositories if "name" not in r or r["name"] not in args.exclude
@@ -1673,9 +1645,11 @@ def backup_repositories(args, output_directory, repositories):
         include_gists = args.include_gists or args.include_starred_gists
         include_starred = args.all_starred and repository.get("is_starred")

-        if (args.include_repository or args.include_everything) or (
-            include_gists and repository.get("is_gist")
-        ) or include_starred:
+        if (
+            (args.include_repository or args.include_everything)
+            or (include_gists and repository.get("is_gist"))
+            or include_starred
+        ):
             repo_name = (
                 repository.get("name")
                 if not repository.get("is_gist")
@@ -1735,7 +1709,9 @@ def backup_repositories(args, output_directory, repositories):
                 include_assets=args.include_assets or args.include_everything,
             )
         except RepositoryUnavailableError as e:
-            logger.warning(f"Repository {repository['full_name']} is unavailable (HTTP 451)")
+            logger.warning(
+                f"Repository {repository['full_name']} is unavailable (HTTP 451)"
+            )
             if e.dmca_url:
                 logger.warning(f"DMCA notice: {e.dmca_url}")
             logger.info(f"Skipping remaining resources for {repository['full_name']}")
@@ -1795,7 +1771,11 @@ def backup_issues(args, repo_cwd, repository, repos_template):
             modified = os.path.getmtime(issue_file)
             modified = datetime.fromtimestamp(modified).strftime("%Y-%m-%dT%H:%M:%SZ")
             if modified > issue["updated_at"]:
-                logger.info("Skipping issue {0} because it wasn't modified since last backup".format(number))
+                logger.info(
+                    "Skipping issue {0} because it wasn't modified since last backup".format(
+                        number
+                    )
+                )
                 continue

         if args.include_issue_comments or args.include_everything:
@@ -1837,14 +1817,14 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
         pull_states = ["open", "closed"]
         for pull_state in pull_states:
             query_args["state"] = pull_state
-            _pulls = retrieve_data_gen(args, _pulls_template, query_args=query_args)
+            _pulls = retrieve_data(args, _pulls_template, query_args=query_args)
             for pull in _pulls:
                 if args.since and pull["updated_at"] < args.since:
                     break
                 if not args.since or pull["updated_at"] >= args.since:
                     pulls[pull["number"]] = pull
     else:
-        _pulls = retrieve_data_gen(args, _pulls_template, query_args=query_args)
+        _pulls = retrieve_data(args, _pulls_template, query_args=query_args)
         for pull in _pulls:
             if args.since and pull["updated_at"] < args.since:
                 break
@@ -1852,7 +1832,7 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
                 pulls[pull["number"]] = retrieve_data(
                     args,
                     _pulls_template + "/{}".format(pull["number"]),
-                    single_request=True,
+                    paginated=False,
                 )[0]

     logger.info("Saving {0} pull requests to disk".format(len(list(pulls.keys()))))
@@ -1869,7 +1849,11 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
             modified = os.path.getmtime(pull_file)
             modified = datetime.fromtimestamp(modified).strftime("%Y-%m-%dT%H:%M:%SZ")
             if modified > pull["updated_at"]:
-                logger.info("Skipping pull request {0} because it wasn't modified since last backup".format(number))
+                logger.info(
+                    "Skipping pull request {0} because it wasn't modified since last backup".format(
+                        number
+                    )
+                )
                 continue

         if args.include_pull_comments or args.include_everything:
             template = comments_regular_template.format(number)
@@ -1919,9 +1903,11 @@ def backup_milestones(args, repo_cwd, repository, repos_template):
     elif written_count == 0:
         logger.info("{0} milestones unchanged, skipped write".format(total))
     else:
-        logger.info("Saved {0} of {1} milestones to disk ({2} unchanged)".format(
-            written_count, total, total - written_count
-        ))
+        logger.info(
+            "Saved {0} of {1} milestones to disk ({2} unchanged)".format(
+                written_count, total, total - written_count
+            )
+        )


 def backup_labels(args, repo_cwd, repository, repos_template):
@@ -1975,6 +1961,20 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
         )
         releases = releases[: args.number_of_latest_releases]

+    # Check if this repo should skip asset downloads (case-insensitive)
+    skip_assets = False
+    if include_assets:
+        repo_name = repository.get("name", "").lower()
+        repo_full_name = repository.get("full_name", "").lower()
+        skip_repos = [r.lower() for r in (args.skip_assets_on or [])]
+        skip_assets = repo_name in skip_repos or repo_full_name in skip_repos
+        if skip_assets:
+            logger.info(
+                "Skipping assets for {0} ({1} releases) due to --skip-assets-on".format(
+                    repository.get("name"), len(releases)
+                )
+            )
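The match in the added block above is case-insensitive and accepts either form of a repo's name. Factored into a predicate for illustration (the helper name is hypothetical; the real code inlines this logic):

```python
def should_skip_assets(repository, skip_assets_on):
    """Illustrative restatement of the --skip-assets-on match: a repo is
    matched by its short name or its owner/name full name, ignoring
    case, against the user-supplied list."""
    targets = {entry.lower() for entry in (skip_assets_on or [])}
    return (
        repository.get("name", "").lower() in targets
        or repository.get("full_name", "").lower() in targets
    )
```

Accepting both `name` and `full_name` means `--skip-assets-on bigrepo` and `--skip-assets-on owner/bigrepo` behave the same for an unambiguous repo.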
     # for each release, store it
     written_count = 0
     for release in releases:
@@ -1986,7 +1986,7 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
         if json_dump_if_changed(release, output_filepath):
             written_count += 1

-        if include_assets:
+        if include_assets and not skip_assets:
             assets = retrieve_data(args, release["assets_url"])
             if len(assets) > 0:
                 # give release asset files somewhere to live & download them (not including source archives)
@@ -2008,9 +2008,11 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
     elif written_count == 0:
         logger.info("{0} releases unchanged, skipped write".format(total))
     else:
-        logger.info("Saved {0} of {1} releases to disk ({2} unchanged)".format(
-            written_count, total, total - written_count
-        ))
+        logger.info(
+            "Saved {0} of {1} releases to disk ({2} unchanged)".format(
+                written_count, total, total - written_count
+            )
+        )


 def fetch_repository(
@@ -2024,9 +2026,12 @@ def fetch_repository(
 ):
     if bare_clone:
         if os.path.exists(local_dir):
-            clone_exists = subprocess.check_output(
-                ["git", "rev-parse", "--is-bare-repository"], cwd=local_dir
-            ) == b"true\n"
+            clone_exists = (
+                subprocess.check_output(
+                    ["git", "rev-parse", "--is-bare-repository"], cwd=local_dir
+                )
+                == b"true\n"
+            )
         else:
             clone_exists = False
     else:
@@ -2047,7 +2052,9 @@ def fetch_repository(
         )
     else:
         logger.info(
-            "Skipping {0} (repository not accessible - may be empty, private, or credentials invalid)".format(name)
+            "Skipping {0} (repository not accessible - may be empty, private, or credentials invalid)".format(
+                name
+            )
         )
         return
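The bare-clone check reformatted above relies on `git rev-parse --is-bare-repository` printing `true` or `false` followed by a newline, which is why the output is compared against `b"true\n"`. A self-contained sketch of the same probe, with the failure case handled explicitly (the standalone function is for illustration; the real code inlines the call):

```python
import subprocess


def is_bare_repository(local_dir):
    """Return True if local_dir is a bare git repository.

    `git rev-parse --is-bare-repository` prints "true" or "false" plus
    a trailing newline, so the raw bytes are compared to b"true" + b"\n".
    """
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "--is-bare-repository"], cwd=local_dir
        )
    except (subprocess.CalledProcessError, OSError):
        # Not a git repository at all (or git is missing)
        return False
    return out == b"true\n"
```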


@@ -1,40 +1,15 @@
+# Linting & Formatting
 autopep8==2.3.2
 black==25.12.0
-bleach==6.3.0
-certifi==2025.11.12
-charset-normalizer==3.4.4
-click==8.3.1
-colorama==0.4.6
-docutils==0.22.3
 flake8==7.3.0
-gitchangelog==3.0.4
+
+# Testing
 pytest==9.0.2
-idna==3.11
-importlib-metadata==8.7.0
-jaraco.classes==3.4.0
-keyring==25.7.0
-markdown-it-py==4.0.0
-mccabe==0.7.0
-mdurl==0.1.2
-more-itertools==10.8.0
-mypy-extensions==1.1.0
-packaging==25.0
-pathspec==0.12.1
-pkginfo==1.12.1.2
-platformdirs==4.5.1
-pycodestyle==2.14.0
-pyflakes==3.4.0
-Pygments==2.19.2
-readme-renderer==44.0
-requests==2.32.5
-requests-toolbelt==1.0.0
-restructuredtext-lint==2.0.2
-rfc3986==2.0.0
-rich==14.2.0
-setuptools==80.9.0
-six==1.17.0
-tqdm==4.67.1
+
+# Release & Publishing
 twine==6.2.0
-urllib3==2.6.1
-webencodings==0.5.1
-zipp==3.23.0
+gitchangelog==3.0.4
+setuptools==80.9.0
+
+# Documentation
+restructuredtext-lint==2.0.2


@@ -46,8 +46,6 @@ class TestAllStarredCloning:
         args.prefer_ssh = False
         args.token_classic = None
         args.token_fine = None
-        args.username = None
-        args.password = None
         args.as_app = False
         args.osx_keychain_item_name = None
         args.osx_keychain_item_account = None


@@ -24,8 +24,6 @@ def attachment_test_setup(tmp_path):
     args.as_app = False
     args.token_fine = None
     args.token_classic = None
-    args.username = None
-    args.password = None
     args.osx_keychain_item_name = None
     args.osx_keychain_item_account = None
     args.user = "testuser"


@@ -26,6 +26,8 @@ class TestCaseSensitivity:
         args.private = False
         args.public = False
         args.all = True
+        args.skip_archived = False
+        args.starred_skip_size_over = None

         # Simulate GitHub API returning canonical case
         repos = [
@@ -65,6 +67,8 @@ class TestCaseSensitivity:
         args.private = False
         args.public = False
         args.all = True
+        args.skip_archived = False
+        args.starred_skip_size_over = None

         repos = [
             {
@@ -93,6 +97,8 @@ class TestCaseSensitivity:
         args.private = False
         args.public = False
         args.all = True
+        args.skip_archived = False
+        args.starred_skip_size_over = None

         repos = [
             {"name": "repo1", "owner": {"login": "test-user"}, "private": False, "fork": False},


@@ -13,19 +13,15 @@ class TestHTTP451Exception:
     def test_repository_unavailable_error_raised(self):
         """HTTP 451 should raise RepositoryUnavailableError with DMCA URL."""
-        # Create mock args
         args = Mock()
         args.as_app = False
         args.token_fine = None
         args.token_classic = None
-        args.username = None
-        args.password = None
         args.osx_keychain_item_name = None
         args.osx_keychain_item_account = None
         args.throttle_limit = None
         args.throttle_pause = 0

-        # Mock HTTPError 451 response
         mock_response = Mock()
         mock_response.getcode.return_value = 451
@@ -41,14 +37,10 @@ class TestHTTP451Exception:
         mock_response.headers = {"x-ratelimit-remaining": "5000"}
         mock_response.reason = "Unavailable For Legal Reasons"

-        def mock_get_response(request, auth, template):
-            return mock_response, []
-
-        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
+        with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
             with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
-                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
+                github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues")

-        # Check exception has DMCA URL
         assert exc_info.value.dmca_url == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
         assert "451" in str(exc_info.value)
@@ -58,8 +50,6 @@ class TestHTTP451Exception:
         args.as_app = False
         args.token_fine = None
         args.token_classic = None
-        args.username = None
-        args.password = None
         args.osx_keychain_item_name = None
         args.osx_keychain_item_account = None
         args.throttle_limit = None
@@ -71,14 +61,10 @@ class TestHTTP451Exception:
         mock_response.headers = {"x-ratelimit-remaining": "5000"}
         mock_response.reason = "Unavailable For Legal Reasons"

-        def mock_get_response(request, auth, template):
-            return mock_response, []
-
-        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
+        with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
             with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
-                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
+                github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues")

-        # Exception raised even without DMCA URL
         assert exc_info.value.dmca_url is None
         assert "451" in str(exc_info.value)
@@ -88,8 +74,6 @@ class TestHTTP451Exception:
         args.as_app = False
         args.token_fine = None
         args.token_classic = None
-        args.username = None
-        args.password = None
         args.osx_keychain_item_name = None
         args.osx_keychain_item_account = None
         args.throttle_limit = None
@@ -101,42 +85,9 @@ class TestHTTP451Exception:
         mock_response.headers = {"x-ratelimit-remaining": "5000"}
         mock_response.reason = "Unavailable For Legal Reasons"

-        def mock_get_response(request, auth, template):
-            return mock_response, []
-
-        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
+        with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
             with pytest.raises(github_backup.RepositoryUnavailableError):
-                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues"))
+                github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues")

-    def test_other_http_errors_unchanged(self):
-        """Other HTTP errors should still raise generic Exception."""
-        args = Mock()
-        args.as_app = False
-        args.token_fine = None
-        args.token_classic = None
-        args.username = None
-        args.password = None
-        args.osx_keychain_item_name = None
-        args.osx_keychain_item_account = None
-        args.throttle_limit = None
-        args.throttle_pause = 0
-
-        mock_response = Mock()
-        mock_response.getcode.return_value = 404
-        mock_response.read.return_value = b'{"message": "Not Found"}'
-        mock_response.headers = {"x-ratelimit-remaining": "5000"}
-        mock_response.reason = "Not Found"
-
-        def mock_get_response(request, auth, template):
-            return mock_response, []
-
-        with patch("github_backup.github_backup._get_response", side_effect=mock_get_response):
-            # Should raise generic Exception, not RepositoryUnavailableError
-            with pytest.raises(Exception) as exc_info:
-                list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/notfound/issues"))
-
-            assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError)
-            assert "404" in str(exc_info.value)


 if __name__ == "__main__":


@@ -40,13 +40,11 @@ class MockHTTPResponse:

 @pytest.fixture
 def mock_args():
-    """Mock args for retrieve_data_gen."""
+    """Mock args for retrieve_data."""
     args = Mock()
     args.as_app = False
     args.token_fine = None
     args.token_classic = "fake_token"
-    args.username = None
-    args.password = None
     args.osx_keychain_item_name = None
     args.osx_keychain_item_account = None
     args.throttle_limit = None
@@ -77,11 +75,9 @@ def test_cursor_based_pagination(mock_args):
         return responses[len(requests_made) - 1]

     with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
-        results = list(
-            github_backup.retrieve_data_gen(
-                mock_args, "https://api.github.com/repos/owner/repo/issues"
-            )
-        )
+        results = github_backup.retrieve_data(
+            mock_args, "https://api.github.com/repos/owner/repo/issues"
+        )

     # Verify all items retrieved and cursor was used in second request
     assert len(results) == 150
@@ -112,11 +108,9 @@ def test_page_based_pagination(mock_args):
         return responses[len(requests_made) - 1]

     with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
-        results = list(
-            github_backup.retrieve_data_gen(
-                mock_args, "https://api.github.com/repos/owner/repo/pulls"
-            )
-        )
+        results = github_backup.retrieve_data(
+            mock_args, "https://api.github.com/repos/owner/repo/pulls"
+        )

     # Verify all items retrieved and page parameter was used (not cursor)
     assert len(results) == 180
@@ -142,11 +136,9 @@ def test_no_link_header_stops_pagination(mock_args):
         return responses[len(requests_made) - 1]

     with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
-        results = list(
-            github_backup.retrieve_data_gen(
-                mock_args, "https://api.github.com/repos/owner/repo/labels"
-            )
-        )
+        results = github_backup.retrieve_data(
+            mock_args, "https://api.github.com/repos/owner/repo/labels"
+        )

     # Verify pagination stopped after first request
     assert len(results) == 50

tests/test_retrieve_data.py (new file, 359 lines)

@@ -0,0 +1,359 @@
"""Tests for retrieve_data function."""
import json
import socket
from unittest.mock import Mock, patch
from urllib.error import HTTPError, URLError
import pytest
from github_backup import github_backup
from github_backup.github_backup import (
MAX_RETRIES,
calculate_retry_delay,
make_request_with_retry,
)
class TestCalculateRetryDelay:
def test_respects_retry_after_header(self):
headers = {'retry-after': '30'}
assert calculate_retry_delay(0, headers) == 30
def test_respects_rate_limit_reset(self):
import time
import calendar
# Set reset time 60 seconds in the future
future_reset = calendar.timegm(time.gmtime()) + 60
headers = {
'x-ratelimit-remaining': '0',
'x-ratelimit-reset': str(future_reset)
}
delay = calculate_retry_delay(0, headers)
# Should be approximately 60 seconds (with some tolerance for execution time)
assert 55 <= delay <= 65
def test_exponential_backoff(self):
delay_0 = calculate_retry_delay(0, {})
delay_1 = calculate_retry_delay(1, {})
delay_2 = calculate_retry_delay(2, {})
# Base delay is 1s, so delays should be roughly 1, 2, 4 (plus jitter)
assert 0.9 <= delay_0 <= 1.2 # ~1s + up to 10% jitter
assert 1.8 <= delay_1 <= 2.4 # ~2s + up to 10% jitter
assert 3.6 <= delay_2 <= 4.8 # ~4s + up to 10% jitter
def test_max_delay_cap(self):
# Very high attempt number should not exceed 120s + jitter
delay = calculate_retry_delay(100, {})
assert delay <= 120 * 1.1 # 120s max + 10% jitter
def test_minimum_rate_limit_delay(self):
import time
import calendar
# Set reset time in the past (already reset)
past_reset = calendar.timegm(time.gmtime()) - 100
headers = {
'x-ratelimit-remaining': '0',
'x-ratelimit-reset': str(past_reset)
}
delay = calculate_retry_delay(0, headers)
# Should be minimum 10 seconds even if reset time is in past
assert delay >= 10
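Taken together, these tests pin down the whole delay policy: an explicit `retry-after` header wins, an exhausted rate limit waits for the reset timestamp with a 10-second floor, and everything else gets capped exponential backoff with up to 10% jitter. A minimal sketch that satisfies the assertions above — the constant names (`BASE_DELAY`, `MAX_DELAY`, `MIN_RATE_LIMIT_DELAY`) are assumptions for illustration, not the project's actual code:

```python
import calendar
import random
import time

BASE_DELAY = 1            # seconds; doubled on each attempt (assumed)
MAX_DELAY = 120           # backoff cap before jitter (assumed)
MIN_RATE_LIMIT_DELAY = 10  # floor when the reset time is already past (assumed)


def calculate_retry_delay(attempt, headers):
    # An explicit Retry-After header takes precedence over everything.
    if 'retry-after' in headers:
        return int(headers['retry-after'])
    # When the rate limit is exhausted, wait until the reset timestamp,
    # but never less than the floor (the reset may already be in the past).
    if headers.get('x-ratelimit-remaining') == '0':
        reset = int(headers.get('x-ratelimit-reset', 0))
        now = calendar.timegm(time.gmtime())
        return max(reset - now, MIN_RATE_LIMIT_DELAY)
    # Otherwise: capped exponential backoff plus up to 10% jitter.
    delay = min(BASE_DELAY * (2 ** attempt), MAX_DELAY)
    return delay + random.uniform(0, delay * 0.1)
```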
class TestRetrieveDataRetry:
"""Tests for retry behavior in retrieve_data."""
@pytest.fixture
def mock_args(self):
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
return args
def test_json_parse_error_retries_and_fails(self, mock_args):
"""HTTP 200 with invalid JSON should retry and eventually fail."""
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = b"not valid json {"
mock_response.headers = {"x-ratelimit-remaining": "5000"}
call_count = 0
def mock_make_request(*args, **kwargs):
nonlocal call_count
call_count += 1
return mock_response
with patch("github_backup.github_backup.make_request_with_retry", side_effect=mock_make_request):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): # No delay in tests
with pytest.raises(Exception) as exc_info:
github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues")
assert "Failed to read response after" in str(exc_info.value)
assert call_count == MAX_RETRIES
def test_json_parse_error_recovers_on_retry(self, mock_args):
"""HTTP 200 with invalid JSON should succeed if retry returns valid JSON."""
bad_response = Mock()
bad_response.getcode.return_value = 200
bad_response.read.return_value = b"not valid json {"
bad_response.headers = {"x-ratelimit-remaining": "5000"}
good_response = Mock()
good_response.getcode.return_value = 200
good_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8")
good_response.headers = {"x-ratelimit-remaining": "5000", "Link": ""}
responses = [bad_response, bad_response, good_response]
call_count = 0
def mock_make_request(*args, **kwargs):
nonlocal call_count
result = responses[call_count]
call_count += 1
return result
with patch("github_backup.github_backup.make_request_with_retry", side_effect=mock_make_request):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
result = github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues")
assert result == [{"id": 1}]
assert call_count == 3 # Failed twice, succeeded on third
def test_http_error_raises_exception(self, mock_args):
"""Non-success HTTP status codes should raise Exception."""
mock_response = Mock()
mock_response.getcode.return_value = 404
mock_response.read.return_value = b'{"message": "Not Found"}'
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Not Found"
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with pytest.raises(Exception) as exc_info:
github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/notfound/issues")
assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError)
assert "404" in str(exc_info.value)
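The two JSON-parse tests above describe a read loop in which an HTTP 200 response whose body fails to parse is retried with backoff and eventually surfaced as an error. A self-contained sketch of that loop, assuming a hypothetical `read_json_with_retry` helper and a `MAX_RETRIES` value of 5 (the real constant lives in `github_backup`):

```python
import json
import time

MAX_RETRIES = 5  # assumed value for illustration


def read_json_with_retry(make_request, calculate_retry_delay):
    """Retry a request whose 200 body fails to parse as JSON."""
    for attempt in range(MAX_RETRIES):
        response = make_request()
        try:
            return json.loads(response.read().decode("utf-8"))
        except ValueError:
            # Invalid JSON from a nominally successful response: back off
            # and re-issue the request.
            time.sleep(calculate_retry_delay(attempt, response.headers))
    raise Exception(f"Failed to read response after {MAX_RETRIES} attempts")
```

With a zero-delay calculator this reproduces the recovery case: two bad bodies followed by a good one yields the parsed payload on the third call.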
class TestMakeRequestWithRetry:
"""Tests for HTTP error retry behavior in make_request_with_retry."""
def test_502_error_retries_and_succeeds(self):
"""HTTP 502 should retry and succeed if subsequent request works."""
good_response = Mock()
good_response.read.return_value = b'{"ok": true}'
call_count = 0
fail_count = MAX_RETRIES - 1 # Fail all but last attempt
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count <= fail_count:
raise HTTPError(
url="https://api.github.com/test",
code=502,
msg="Bad Gateway",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert call_count == MAX_RETRIES
def test_503_error_retries_until_exhausted(self):
"""HTTP 503 should retry MAX_RETRIES times then raise."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise HTTPError(
url="https://api.github.com/test",
code=503,
msg="Service Unavailable",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 503
assert call_count == MAX_RETRIES
def test_404_error_not_retried(self):
"""HTTP 404 should not be retried - raise immediately."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise HTTPError(
url="https://api.github.com/test",
code=404,
msg="Not Found",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 404
assert call_count == 1 # No retries
def test_rate_limit_403_retried_when_remaining_zero(self):
"""HTTP 403 with x-ratelimit-remaining=0 should retry."""
good_response = Mock()
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count == 1:
raise HTTPError(
url="https://api.github.com/test",
code=403,
msg="Forbidden",
hdrs={"x-ratelimit-remaining": "0"},
fp=None,
)
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert call_count == 2
def test_403_not_retried_when_remaining_nonzero(self):
"""HTTP 403 with x-ratelimit-remaining>0 should not retry (permission error)."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise HTTPError(
url="https://api.github.com/test",
code=403,
msg="Forbidden",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 403
assert call_count == 1 # No retries
def test_connection_error_retries_and_succeeds(self):
"""URLError (connection error) should retry and succeed if subsequent request works."""
good_response = Mock()
call_count = 0
fail_count = MAX_RETRIES - 1 # Fail all but last attempt
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count <= fail_count:
raise URLError("Connection refused")
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert call_count == MAX_RETRIES
def test_socket_error_retries_until_exhausted(self):
"""socket.error should retry MAX_RETRIES times then raise."""
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
raise socket.error("Connection reset by peer")
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
with pytest.raises(socket.error):
make_request_with_retry(Mock(), None)
assert call_count == MAX_RETRIES
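The HTTP-error tests above encode a retry classification: any 5xx is treated as transient, 404 fails immediately, and 403 is retried only when `x-ratelimit-remaining` is exhausted (otherwise it is a permission error). A hypothetical predicate capturing that rule — an illustration of the policy these tests pin down, not the library's actual code:

```python
def is_retryable(status_code, headers):
    """Decide whether an HTTP error status should be retried."""
    # All server-side errors (500, 502, 503, ...) are considered transient.
    if status_code >= 500:
        return True
    # 403 is only a rate-limit response when the remaining quota is zero;
    # with quota left, it is a permission error and retrying cannot help.
    if status_code == 403 and headers.get("x-ratelimit-remaining") == "0":
        return True
    return False
```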
class TestRetrieveDataThrottling:
"""Tests for throttling behavior in retrieve_data."""
@pytest.fixture
def mock_args(self):
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = 10 # Throttle when remaining <= 10
args.throttle_pause = 5 # Pause 5 seconds
return args
def test_throttling_pauses_when_rate_limit_low(self, mock_args):
"""Should pause when x-ratelimit-remaining is at or below throttle_limit."""
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8")
mock_response.headers = {"x-ratelimit-remaining": "5", "Link": ""} # Below throttle_limit
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with patch("github_backup.github_backup.time.sleep") as mock_sleep:
github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues")
mock_sleep.assert_called_once_with(5) # throttle_pause value
class TestRetrieveDataSingleItem:
"""Tests for single item (dict) responses in retrieve_data."""
@pytest.fixture
def mock_args(self):
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
return args
def test_dict_response_returned_as_list(self, mock_args):
"""Single dict response should be returned as a list with one item."""
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = json.dumps({"login": "testuser", "id": 123}).encode("utf-8")
mock_response.headers = {"x-ratelimit-remaining": "5000", "Link": ""}
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
result = github_backup.retrieve_data(mock_args, "https://api.github.com/user")
assert result == [{"login": "testuser", "id": 123}]


@@ -0,0 +1,318 @@
"""Tests for --skip-assets-on flag behavior (issue #135)."""
import pytest
from unittest.mock import Mock, patch
from github_backup import github_backup
class TestSkipAssetsOn:
"""Test suite for --skip-assets-on flag.
Issue #135: Allow skipping asset downloads for specific repositories
while still backing up release metadata.
"""
def _create_mock_args(self, **overrides):
"""Create a mock args object with sensible defaults."""
args = Mock()
args.user = "testuser"
args.output_directory = "/tmp/backup"
args.include_repository = False
args.include_everything = False
args.include_gists = False
args.include_starred_gists = False
args.all_starred = False
args.skip_existing = False
args.bare_clone = False
args.lfs_clone = False
args.no_prune = False
args.include_wiki = False
args.include_issues = False
args.include_issue_comments = False
args.include_issue_events = False
args.include_pulls = False
args.include_pull_comments = False
args.include_pull_commits = False
args.include_pull_details = False
args.include_labels = False
args.include_hooks = False
args.include_milestones = False
args.include_releases = True
args.include_assets = True
args.skip_assets_on = []
args.include_attachments = False
args.incremental = False
args.incremental_by_files = False
args.github_host = None
args.prefer_ssh = False
args.token_classic = "test-token"
args.token_fine = None
args.as_app = False
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.skip_prerelease = False
args.number_of_latest_releases = None
for key, value in overrides.items():
setattr(args, key, value)
return args
def _create_mock_repository(self, name="test-repo", owner="testuser"):
"""Create a mock repository object."""
return {
"name": name,
"full_name": f"{owner}/{name}",
"owner": {"login": owner},
"private": False,
"fork": False,
"has_wiki": False,
}
def _create_mock_release(self, tag="v1.0.0"):
"""Create a mock release object."""
return {
"tag_name": tag,
"name": tag,
"prerelease": False,
"draft": False,
"assets_url": f"https://api.github.com/repos/testuser/test-repo/releases/{tag}/assets",
}
def _create_mock_asset(self, name="asset.zip"):
"""Create a mock asset object."""
return {
"name": name,
"url": f"https://api.github.com/repos/testuser/test-repo/releases/assets/{name}",
}
class TestSkipAssetsOnArgumentParsing(TestSkipAssetsOn):
"""Tests for --skip-assets-on argument parsing."""
def test_skip_assets_on_not_set_defaults_to_none(self):
"""When --skip-assets-on is not specified, it should default to None."""
args = github_backup.parse_args(["testuser"])
assert args.skip_assets_on is None
def test_skip_assets_on_single_repo(self):
"""Single --skip-assets-on should create list with one item."""
args = github_backup.parse_args(["testuser", "--skip-assets-on", "big-repo"])
assert args.skip_assets_on == ["big-repo"]
def test_skip_assets_on_multiple_repos(self):
"""Multiple repos can be specified space-separated (like --exclude)."""
args = github_backup.parse_args(
[
"testuser",
"--skip-assets-on",
"big-repo",
"another-repo",
"owner/third-repo",
]
)
assert args.skip_assets_on == ["big-repo", "another-repo", "owner/third-repo"]
class TestSkipAssetsOnBehavior(TestSkipAssetsOn):
"""Tests for --skip-assets-on behavior in backup_releases."""
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_assets_downloaded_when_not_skipped(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Assets should be downloaded when repo is not in skip list."""
args = self._create_mock_args(skip_assets_on=[])
repository = self._create_mock_repository(name="normal-repo")
release = self._create_mock_release()
asset = self._create_mock_asset()
mock_json_dump.return_value = True
mock_retrieve.side_effect = [
[release], # First call: get releases
[asset], # Second call: get assets
]
with patch("os.path.join", side_effect=lambda *args: "/".join(args)):
github_backup.backup_releases(
args,
"/tmp/backup/repositories/normal-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should have been called for the asset
mock_download.assert_called_once()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_assets_skipped_when_repo_name_matches(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Assets should be skipped when repo name is in skip list."""
args = self._create_mock_args(skip_assets_on=["big-repo"])
repository = self._create_mock_repository(name="big-repo")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_assets_skipped_when_full_name_matches(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Assets should be skipped when owner/repo format matches."""
args = self._create_mock_args(skip_assets_on=["otheruser/big-repo"])
repository = self._create_mock_repository(name="big-repo", owner="otheruser")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_case_insensitive_matching(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Skip matching should be case-insensitive."""
# User types uppercase, repo name is lowercase
args = self._create_mock_args(skip_assets_on=["BIG-REPO"])
repository = self._create_mock_repository(name="big-repo")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called (case-insensitive match)
assert not mock_download.called
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_multiple_skip_repos(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Multiple repos in skip list should all be skipped."""
args = self._create_mock_args(skip_assets_on=["repo1", "repo2", "repo3"])
repository = self._create_mock_repository(name="repo2")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/repo2",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_release_metadata_still_saved_when_assets_skipped(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Release JSON should still be saved even when assets are skipped."""
args = self._create_mock_args(skip_assets_on=["big-repo"])
repository = self._create_mock_repository(name="big-repo")
release = self._create_mock_release()
mock_json_dump.return_value = True
mock_retrieve.return_value = [release]
github_backup.backup_releases(
args,
"/tmp/backup/repositories/big-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# json_dump_if_changed should have been called for release metadata
mock_json_dump.assert_called_once()
# But download_file should NOT have been called
mock_download.assert_not_called()
@patch("github_backup.github_backup.download_file")
@patch("github_backup.github_backup.retrieve_data")
@patch("github_backup.github_backup.mkdir_p")
@patch("github_backup.github_backup.json_dump_if_changed")
def test_non_matching_repo_still_downloads_assets(
self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
):
"""Repos not in skip list should still download assets."""
args = self._create_mock_args(skip_assets_on=["other-repo"])
repository = self._create_mock_repository(name="normal-repo")
release = self._create_mock_release()
asset = self._create_mock_asset()
mock_json_dump.return_value = True
mock_retrieve.side_effect = [
[release], # First call: get releases
[asset], # Second call: get assets
]
with patch("os.path.join", side_effect=lambda *args: "/".join(args)):
github_backup.backup_releases(
args,
"/tmp/backup/repositories/normal-repo",
repository,
"https://api.github.com/repos/{owner}/{repo}",
include_assets=True,
)
# download_file SHOULD have been called
mock_download.assert_called_once()
if __name__ == "__main__":
pytest.main([__file__, "-v"])
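The behavior tests above fix the matching rules for `--skip-assets-on`: a repository's assets are skipped when either its short name or its `owner/repo` full name appears in the skip list, compared case-insensitively. A minimal standalone sketch of that check — `should_skip_assets` is a hypothetical helper name, not necessarily what `backup_releases` calls internally:

```python
def should_skip_assets(repository, skip_assets_on):
    """Return True if this repo's release assets should not be downloaded."""
    if not skip_assets_on:
        return False
    # Case-insensitive match against either the bare name or owner/name.
    skip = {entry.lower() for entry in skip_assets_on}
    return (
        repository["name"].lower() in skip
        or repository["full_name"].lower() in skip
    )
```

Release metadata is still written either way; only the `download_file` step is gated on this check.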


@@ -0,0 +1,224 @@
"""Tests for --starred-skip-size-over flag behavior (issue #108)."""
import pytest
from unittest.mock import Mock
from github_backup import github_backup
class TestStarredSkipSizeOver:
"""Test suite for --starred-skip-size-over flag.
Issue #108: Allow restricting size of starred repositories before cloning.
The size is based on the GitHub API's 'size' field (in KB), but the CLI
argument accepts MB for user convenience.
"""
def _create_mock_args(self, **overrides):
"""Create a mock args object with sensible defaults."""
args = Mock()
args.user = "testuser"
args.repository = None
args.name_regex = None
args.languages = None
args.fork = False
args.private = False
args.skip_archived = False
args.starred_skip_size_over = None
args.exclude = None
for key, value in overrides.items():
setattr(args, key, value)
return args
class TestStarredSkipSizeOverArgumentParsing(TestStarredSkipSizeOver):
"""Tests for --starred-skip-size-over argument parsing."""
def test_starred_skip_size_over_not_set_defaults_to_none(self):
"""When --starred-skip-size-over is not specified, it should default to None."""
args = github_backup.parse_args(["testuser"])
assert args.starred_skip_size_over is None
def test_starred_skip_size_over_accepts_integer(self):
"""--starred-skip-size-over should accept an integer value."""
args = github_backup.parse_args(["testuser", "--starred-skip-size-over", "500"])
assert args.starred_skip_size_over == 500
def test_starred_skip_size_over_rejects_non_integer(self):
"""--starred-skip-size-over should reject non-integer values."""
with pytest.raises(SystemExit):
github_backup.parse_args(["testuser", "--starred-skip-size-over", "abc"])
class TestStarredSkipSizeOverFiltering(TestStarredSkipSizeOver):
"""Tests for --starred-skip-size-over filtering behavior."""
def test_starred_repo_under_limit_is_kept(self):
"""Starred repos under the size limit should be kept."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "small-repo",
"owner": {"login": "otheruser"},
"size": 100 * 1024, # 100 MB in KB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert result[0]["name"] == "small-repo"
def test_starred_repo_over_limit_is_filtered(self):
"""Starred repos over the size limit should be filtered out."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "huge-repo",
"owner": {"login": "otheruser"},
"size": 600 * 1024, # 600 MB in KB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 0
def test_own_repo_over_limit_is_kept(self):
"""User's own repos should not be affected by the size limit."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "my-huge-repo",
"owner": {"login": "testuser"},
"size": 600 * 1024, # 600 MB in KB
# No is_starred flag - this is the user's own repo
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert result[0]["name"] == "my-huge-repo"
def test_starred_repo_at_exact_limit_is_kept(self):
"""Starred repos at exactly the size limit should be kept."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "exact-limit-repo",
"owner": {"login": "otheruser"},
"size": 500 * 1024, # Exactly 500 MB in KB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert result[0]["name"] == "exact-limit-repo"
def test_mixed_repos_filtered_correctly(self):
"""Mix of own and starred repos should be filtered correctly."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "my-huge-repo",
"owner": {"login": "testuser"},
"size": 1000 * 1024, # 1 GB - own repo, should be kept
},
{
"name": "starred-small",
"owner": {"login": "otheruser"},
"size": 100 * 1024, # 100 MB - under limit
"is_starred": True,
},
{
"name": "starred-huge",
"owner": {"login": "anotheruser"},
"size": 2000 * 1024, # 2 GB - over limit
"is_starred": True,
},
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 2
names = [r["name"] for r in result]
assert "my-huge-repo" in names
assert "starred-small" in names
assert "starred-huge" not in names
def test_no_size_limit_keeps_all_starred(self):
"""When no size limit is set, all starred repos should be kept."""
args = self._create_mock_args(starred_skip_size_over=None)
repos = [
{
"name": "huge-starred-repo",
"owner": {"login": "otheruser"},
"size": 10000 * 1024, # 10 GB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
def test_repo_without_size_field_is_kept(self):
"""Repos without a size field should be kept (size defaults to 0)."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "no-size-repo",
"owner": {"login": "otheruser"},
"is_starred": True,
# No size field
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
def test_zero_value_warns_and_is_ignored(self, caplog):
"""Zero value should warn and keep all repos."""
args = self._create_mock_args(starred_skip_size_over=0)
repos = [
{
"name": "huge-starred-repo",
"owner": {"login": "otheruser"},
"size": 10000 * 1024, # 10 GB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert "must be greater than 0" in caplog.text
def test_negative_value_warns_and_is_ignored(self, caplog):
"""Negative value should warn and keep all repos."""
args = self._create_mock_args(starred_skip_size_over=-5)
repos = [
{
"name": "huge-starred-repo",
"owner": {"login": "otheruser"},
"size": 10000 * 1024, # 10 GB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert "must be greater than 0" in caplog.text
if __name__ == "__main__":
pytest.main([__file__, "-v"])
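These filtering tests nail down the size rule: the GitHub API reports `size` in KB while the flag takes MB, only starred repos are affected, a repo at exactly the limit is kept (the comparison is strictly greater-than), a missing `size` defaults to 0, and a non-positive limit logs a warning and is ignored. A standalone sketch of that logic, assuming a hypothetical `starred_size_filter` helper rather than the real `filter_repositories`:

```python
import logging

logger = logging.getLogger(__name__)


def starred_size_filter(limit_mb, repos):
    """Drop starred repos strictly larger than limit_mb; keep everything else."""
    if limit_mb is not None and limit_mb <= 0:
        # Mirror the warning the tests look for, then disable the filter.
        logger.warning("--starred-skip-size-over must be greater than 0; ignoring")
        limit_mb = None
    if limit_mb is None:
        return list(repos)
    limit_kb = limit_mb * 1024  # API size field is in KB, flag is in MB
    return [
        repo for repo in repos
        if not (repo.get("is_starred") and repo.get("size", 0) > limit_kb)
    ]
```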