Compare commits

14 Commits

Author SHA1 Message Date
GitHub Action
858731ebbd Release version 0.60.0 2025-12-24 00:45:01 +00:00
Jose Diaz-Gonzalez
2e999d0d3c Merge pull request #474 from mwtzzz/retry_logic
update retry logic and logging
2025-12-23 19:44:32 -05:00
michaelmartinez
44b0003ec9 updates to the tests, and fixes to the retry 2025-12-23 14:07:38 -08:00
michaelmartinez
5ab3852476 rm max_retries.py 2025-12-23 08:57:57 -08:00
michaelmartinez
8b21e2501c readme 2025-12-23 08:55:52 -08:00
michaelmartinez
f9827da342 don't use a global variable, pass the args instead 2025-12-23 08:53:54 -08:00
michaelmartinez
1f2ec016d5 readme, simplify the logic a bit 2025-12-22 16:13:12 -08:00
michaelmartinez
8b1b632d89 max_retries 5 2025-12-22 14:47:26 -08:00
michaelmartinez
89502c326d update retry logic and logging
### What
1. configurable retry count
2. additional logging

### Why
1. pass retry count as a command line arg; default 5
2. show details when api requests fail

### Testing before merge
compiles cleanly

### Validation after merge
compile and test

### Issue addressed by this PR
https://github.com/stellar/ops/issues/2039
2025-12-22 14:23:02 -08:00
GitHub Action
81a72ac8af Release version 0.59.0 2025-12-21 23:48:36 +00:00
Jose Diaz-Gonzalez
3edbfc777c Merge pull request #472 from Iamrodos/feature/108-starred-skip-size-over
Add --starred-skip-size-over flag to limit starred repo size (#108)
2025-12-21 18:47:58 -05:00
Rodos
3c43e0f481 Add --starred-skip-size-over flag to limit starred repo size (#108)
Allow users to skip starred repositories exceeding a size threshold
when using --all-starred. Size is specified in MB and checked against
the GitHub API's repository size field.

- Only affects starred repos; user's own repos always included
- Logs each skipped repo with name and size

Closes #108
2025-12-21 22:18:09 +11:00
Jose Diaz-Gonzalez
875f09eeaf Merge pull request #473 from Iamrodos/chore/remove-password-auth
chore: remove deprecated -u/-p password authentication options
2025-12-21 01:36:35 -05:00
Rodos
db36c3c137 chore: remove deprecated -u/-p password authentication options 2025-12-20 19:16:11 +11:00
13 changed files with 617 additions and 162 deletions

@@ -1,9 +1,35 @@
Changelog
=========
0.58.0 (2025-12-16)
0.60.0 (2025-12-24)
-------------------
------------------------
- Rm max_retries.py. [michaelmartinez]
- Readme. [michaelmartinez]
- Don't use a global variable, pass the args instead. [michaelmartinez]
- Readme, simplify the logic a bit. [michaelmartinez]
- Max_retries 5. [michaelmartinez]
0.59.0 (2025-12-21)
-------------------
- Add --starred-skip-size-over flag to limit starred repo size (#108)
[Rodos]
Allow users to skip starred repositories exceeding a size threshold
when using --all-starred. Size is specified in MB and checked against
the GitHub API's repository size field.
- Only affects starred repos; user's own repos always included
- Logs each skipped repo with name and size
Closes #108
- Chore: remove deprecated -u/-p password authentication options.
[Rodos]
0.58.0 (2025-12-16)
-------------------
- Fix retry logic for HTTP 5xx errors and network failures. [Rodos]
Refactors error handling to retry all 5xx errors (not just 502), network errors (URLError, socket.error, IncompleteRead), and JSON parse errors with exponential backoff and jitter. Respects retry-after and rate limit headers per GitHub API requirements. Consolidates retry logic into make_request_with_retry() wrapper and adds clear logging for retry attempts and failures. Removes dead code from 2016 (errors list, _request_http_error, _request_url_error) that was intentionally disabled in commit 1e5a9048 to fix #29.
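
As an illustration of the backoff behaviour this entry describes, here is a minimal sketch. It is not the project's actual ``calculate_retry_delay``: the name ``sketch_retry_delay``, the 1-second base, the 120-second cap, and the 10% jitter range are all assumptions chosen for the example. Only the handling of the ``retry-after`` header mirrors what the changelog states.

```python
import random


def sketch_retry_delay(attempt, headers, base=1.0, cap=120.0):
    """Hypothetical delay calculation: honour retry-after, else back off."""
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        # The server said how long to wait; use that value directly.
        return int(retry_after)
    # Exponential backoff: base * 2^attempt, capped at `cap` seconds.
    delay = min(cap, base * (2 ** attempt))
    # Jitter: add up to 10% so parallel clients do not retry in lockstep.
    return delay + random.uniform(0, delay * 0.1)
```

With these assumed constants, attempt 3 without headers waits between 8.0 and 8.8 seconds, while a ``retry-after: 30`` header always yields 30.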

@@ -36,23 +36,26 @@ Show the CLI help output::
CLI Help output::
github-backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN_CLASSIC]
[-f TOKEN_FINE] [--as-app] [-o OUTPUT_DIRECTORY]
[-l LOG_LEVEL] [-i] [--starred] [--all-starred]
[--watched] [--followers] [--following] [--all] [--issues]
[--issue-comments] [--issue-events] [--pulls]
github-backup [-h] [-t TOKEN_CLASSIC] [-f TOKEN_FINE] [-q] [--as-app]
[-o OUTPUT_DIRECTORY] [-l LOG_LEVEL] [-i]
[--incremental-by-files]
[--starred] [--all-starred] [--starred-skip-size-over MB]
[--watched] [--followers] [--following] [--all]
[--issues] [--issue-comments] [--issue-events] [--pulls]
[--pull-comments] [--pull-commits] [--pull-details]
[--labels] [--hooks] [--milestones] [--repositories]
[--bare] [--lfs] [--wikis] [--gists] [--starred-gists]
[--skip-archived] [--skip-existing] [-L [LANGUAGES ...]]
[-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY]
[-P] [-F] [--prefer-ssh] [-v]
[--bare] [--no-prune] [--lfs] [--wikis] [--gists]
[--starred-gists] [--skip-archived] [--skip-existing]
[-L [LANGUAGES ...]] [-N NAME_REGEX] [-H GITHUB_HOST]
[-O] [-R REPOSITORY] [-P] [-F] [--prefer-ssh] [-v]
[--keychain-name OSX_KEYCHAIN_ITEM_NAME]
[--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT]
[--releases] [--latest-releases NUMBER_OF_LATEST_RELEASES]
[--skip-prerelease] [--assets] [--skip-assets-on [REPO ...]]
[--attachments] [--exclude [REPOSITORY [REPOSITORY ...]]
[--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE]
[--skip-prerelease] [--assets]
[--skip-assets-on [SKIP_ASSETS_ON ...]] [--attachments]
[--throttle-limit THROTTLE_LIMIT]
[--throttle-pause THROTTLE_PAUSE]
[--exclude [EXCLUDE ...]]
USER
Backup a github account
@@ -60,29 +63,29 @@ CLI Help output::
positional arguments:
USER github username
optional arguments:
options:
-h, --help show this help message and exit
-u USERNAME, --username USERNAME
username for basic auth
-p PASSWORD, --password PASSWORD
password for basic auth. If a username is given but
not a password, the password will be prompted for.
-f TOKEN_FINE, --token-fine TOKEN_FINE
fine-grained personal access token or path to token
(file://...)
-t TOKEN_CLASSIC, --token TOKEN_CLASSIC
-t, --token TOKEN_CLASSIC
personal access, OAuth, or JSON Web token, or path to
token (file://...)
-f, --token-fine TOKEN_FINE
fine-grained personal access token (github_pat_....),
or path to token (file://...)
-q, --quiet suppress log messages less severe than warning, e.g.
info
--as-app authenticate as github app instead of as a user.
-o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
-o, --output-directory OUTPUT_DIRECTORY
directory at which to backup the repositories
-l LOG_LEVEL, --log-level LOG_LEVEL
-l, --log-level LOG_LEVEL
log level to use (default: info, possible levels:
debug, info, warning, error, critical)
-i, --incremental incremental backup
--incremental-by-files incremental backup using modified time of files
--incremental-by-files
incremental backup based on modification date of files
--starred include JSON output of starred repositories in backup
--all-starred include starred repositories in backup [*]
--starred-skip-size-over MB
skip starred repositories larger than this size in MB
--watched include JSON output of watched repositories in backup
--followers include JSON output of followers in backup
--following include JSON output of following users in backup
@@ -100,20 +103,22 @@ CLI Help output::
--milestones include milestones in backup
--repositories include repository clone in backup
--bare clone bare repositories
--no-prune disable prune option for git fetch
--lfs clone LFS repositories (requires Git LFS to be
installed, https://git-lfs.github.com) [*]
--wikis include wiki clone in backup
--gists include gists in backup [*]
--starred-gists include starred gists in backup [*]
--skip-archived skip project if it is archived
--skip-existing skip project if a backup directory exists
-L [LANGUAGES [LANGUAGES ...]], --languages [LANGUAGES [LANGUAGES ...]]
-L, --languages [LANGUAGES ...]
only allow these languages
-N NAME_REGEX, --name-regex NAME_REGEX
-N, --name-regex NAME_REGEX
python regex to match names against
-H GITHUB_HOST, --github-host GITHUB_HOST
-H, --github-host GITHUB_HOST
GitHub Enterprise hostname
-O, --organization whether or not this is an organization user
-R REPOSITORY, --repository REPOSITORY
-R, --repository REPOSITORY
name of repository to limit backup to
-P, --private include private repositories [*]
-F, --fork include forked repositories [*]
@@ -128,19 +133,16 @@ CLI Help output::
--releases include release information, not including assets or
binaries
--latest-releases NUMBER_OF_LATEST_RELEASES
include certain number of the latest releases;
only applies if including releases
--skip-prerelease skip prerelease and draft versions; only applies if including releases
include certain number of the latest releases; only
applies if including releases
--skip-prerelease skip prerelease and draft versions; only applies if
including releases
--assets include assets alongside release information; only
applies if including releases
--skip-assets-on [REPO ...]
skip asset downloads for these repositories (e.g.
--skip-assets-on repo1 owner/repo2)
--attachments download user-attachments from issues and pull requests
to issues/attachments/{issue_number}/ and
pulls/attachments/{pull_number}/ directories
--exclude [REPOSITORY [REPOSITORY ...]]
names of repositories to exclude from backup.
--skip-assets-on [SKIP_ASSETS_ON ...]
skip asset downloads for these repositories
--attachments download user-attachments from issues and pull
requests
--throttle-limit THROTTLE_LIMIT
start throttling of GitHub API requests after this
amount of API requests remain
@@ -148,7 +150,10 @@ CLI Help output::
wait this amount of seconds when API request
throttling is active (default: 30.0, requires
--throttle-limit to be set)
--exclude [EXCLUDE ...]
names of repositories to exclude
--retries MAX_RETRIES
maximum number of retries for API calls (default: 5)
Usage Details
=============
@@ -156,13 +161,13 @@ Usage Details
Authentication
--------------
**Password-based authentication** will fail if you have two-factor authentication enabled, and will `be deprecated <https://github.blog/2023-03-09-raising-the-bar-for-software-security-github-2fa-begins-march-13/>`_ by 2023 EOY.
GitHub requires token-based authentication for API access. Password authentication was `removed in November 2020 <https://developer.github.com/changes/2020-02-14-deprecating-password-auth/>`_.
``--username`` is used for basic password authentication and separate from the positional argument ``USER``, which specifies the user account you wish to back up.
The positional argument ``USER`` specifies the user or organization account you wish to back up.
**Classic tokens** are `slightly less secure <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#personal-access-tokens-classic>`_ as they provide very coarse-grained permissions.
**Fine-grained tokens** (``-f TOKEN_FINE``) are recommended for most use cases, especially long-running backups (e.g. cron jobs), as they provide precise permission control.
If you need authentication for long-running backups (e.g. for a cron job) it is recommended to use **fine-grained personal access token** ``-f TOKEN_FINE``.
**Classic tokens** (``-t TOKEN``) are `slightly less secure <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#personal-access-tokens-classic>`_ as they provide very coarse-grained permissions.
Fine Tokens
@@ -290,10 +295,20 @@ All is not everything
The ``--all`` argument does not include: cloning private repos (``-P, --private``), cloning forks (``-F, --fork``), cloning starred repositories (``--all-starred``), ``--pull-details``, cloning LFS repositories (``--lfs``), cloning gists (``--gists``) or cloning starred gist repos (``--starred-gists``). See examples for more.
Cloning all starred size
------------------------
Starred repository size
-----------------------
Using the ``--all-starred`` argument to clone all starred repositories may use a large amount of storage space, especially if ``--all`` or more arguments are used. e.g. commonly starred repos can have tens of thousands of issues, many large assets and the repo itself etc. Consider just storing links to starred repos in JSON format with ``--starred``.
Using the ``--all-starred`` argument to clone all starred repositories may use a large amount of storage space.
To see your starred repositories sorted by size (requires `GitHub CLI <https://cli.github.com>`_)::
gh api user/starred --paginate --jq 'sort_by(-.size)[]|"\(.full_name) \(.size/1024|round)MB"'
To limit which starred repositories are cloned, use ``--starred-skip-size-over SIZE`` where SIZE is in MB. For example, ``--starred-skip-size-over 500`` will skip any starred repository where the git repository size (code and history) exceeds 500 MB. Note that this size limit only applies to the repository itself, not issues, release assets or other metadata. This filter only affects starred repositories; your own repositories are always included regardless of size.
For finer control, avoid using ``--assets`` with starred repos, or use ``--skip-assets-on`` for specific repositories with large release binaries.
Alternatively, consider just storing links to starred repos in JSON format with ``--starred``.
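
The size filter described in this section can be pictured with the short sketch below. It follows the logic of the ``filter_repositories`` hunk in this change but is a standalone illustration: the function name ``skip_large_starred`` and the sample repository dictionaries are made up for the example. The GitHub API reports ``size`` in KB, so the MB threshold is multiplied by 1024 before comparing.

```python
def skip_large_starred(repositories, limit_mb):
    """Drop starred repos whose reported size exceeds limit_mb (hypothetical helper)."""
    limit_kb = limit_mb * 1024  # the API "size" field is in KB
    kept = []
    for repo in repositories:
        if repo.get("is_starred") and repo.get("size", 0) > limit_kb:
            continue  # too large: skip this starred repository
        kept.append(repo)  # non-starred repos are kept regardless of size
    return kept


# Sample data invented for the example: sizes are in KB.
repos = [
    {"name": "own-big", "size": 900 * 1024},
    {"name": "starred-big", "size": 900 * 1024, "is_starred": True},
    {"name": "starred-small", "size": 10 * 1024, "is_starred": True},
]
kept_names = [r["name"] for r in skip_large_starred(repos, 500)]
```

With a 500 MB limit, only the large starred repository is dropped; the equally large repository that is not starred stays in the list.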
Incremental Backup
------------------

@@ -1 +1 @@
__version__ = "0.58.0"
__version__ = "0.60.0"

@@ -43,7 +43,7 @@ def main():
if args.private and not get_auth(args):
logger.warning(
"The --private flag has no effect without authentication. "
"Use -t/--token, -f/--token-fine, or -u/--username to authenticate."
"Use -t/--token or -f/--token-fine to authenticate."
)
if args.quiet:

@@ -7,7 +7,6 @@ import base64
import calendar
import codecs
import errno
import getpass
import json
import logging
import os
@@ -24,7 +23,6 @@ from collections.abc import Generator
from datetime import datetime
from http.client import IncompleteRead
from urllib.error import HTTPError, URLError
from urllib.parse import quote as urlquote
from urllib.parse import urlencode, urlparse
from urllib.request import HTTPRedirectHandler, Request, build_opener, urlopen
@@ -76,9 +74,6 @@ else:
" 3. Debian/Ubuntu: apt-get install ca-certificates\n\n"
)
# Retry configuration
MAX_RETRIES = 5
def logging_subprocess(
popenargs, stdout_log_level=logging.DEBUG, stderr_log_level=logging.ERROR, **kwargs
@@ -146,20 +141,20 @@ def mask_password(url, secret="*****"):
return url.replace(parsed.password, secret)
def non_negative_int(value):
"""Argparse type validator for non-negative integers."""
try:
ivalue = int(value)
except ValueError:
raise argparse.ArgumentTypeError(f"'{value}' is not a valid integer")
if ivalue < 0:
raise argparse.ArgumentTypeError(f"{value} must be 0 or greater")
return ivalue
def parse_args(args=None):
parser = argparse.ArgumentParser(description="Backup a github account")
parser.add_argument("user", metavar="USER", type=str, help="github username")
parser.add_argument(
"-u", "--username", dest="username", help="username for basic auth"
)
parser.add_argument(
"-p",
"--password",
dest="password",
help="password for basic auth. "
"If a username is given but not a password, the "
"password will be prompted for.",
)
parser.add_argument(
"-t",
"--token",
@@ -224,6 +219,13 @@ def parse_args(args=None):
dest="all_starred",
help="include starred repositories in backup [*]",
)
parser.add_argument(
"--starred-skip-size-over",
type=int,
metavar="MB",
dest="starred_skip_size_over",
help="skip starred repositories larger than this size in MB",
)
parser.add_argument(
"--watched",
action="store_true",
@@ -474,6 +476,13 @@ def parse_args(args=None):
parser.add_argument(
"--exclude", dest="exclude", help="names of repositories to exclude", nargs="*"
)
parser.add_argument(
"--retries",
dest="max_retries",
type=non_negative_int,
default=5,
help="maximum number of retries for API calls (default: 5)",
)
return parser.parse_args(args)
@@ -533,16 +542,6 @@ def get_auth(args, encode=True, for_git_cli=False):
auth = args.token_classic
else:
auth = "x-access-token:" + args.token_classic
elif args.username:
if not args.password:
args.password = getpass.getpass()
if encode:
password = args.password
else:
password = urlquote(args.password)
auth = args.username + ":" + password
elif args.password:
raise Exception("You must specify a username for basic auth")
if not auth:
return None
@@ -647,7 +646,7 @@ def retrieve_data(args, template, query_args=None, paginated=True):
while True:
# FIRST: Fetch response
for attempt in range(MAX_RETRIES):
for attempt in range(args.max_retries + 1):
request = _construct_request(
per_page=per_page if paginated else None,
query_args=query_args,
@@ -656,7 +655,7 @@ def retrieve_data(args, template, query_args=None, paginated=True):
as_app=args.as_app,
fine=args.token_fine is not None,
)
http_response = make_request_with_retry(request, auth)
http_response = make_request_with_retry(request, auth, args.max_retries)
match http_response.getcode():
case 200:
@@ -670,10 +669,10 @@ def retrieve_data(args, template, query_args=None, paginated=True):
TimeoutError,
) as e:
logger.warning(f"{type(e).__name__} reading response")
if attempt < MAX_RETRIES - 1:
if attempt < args.max_retries:
delay = calculate_retry_delay(attempt, {})
logger.warning(
f"Retrying in {delay:.1f}s (attempt {attempt + 1}/{MAX_RETRIES})"
f"Retrying read in {delay:.1f}s (attempt {attempt + 1}/{args.max_retries + 1})"
)
time.sleep(delay)
continue # Next retry attempt
@@ -699,10 +698,10 @@ def retrieve_data(args, template, query_args=None, paginated=True):
)
else:
logger.error(
f"Failed to read response after {MAX_RETRIES} attempts for {next_url or template}"
f"Failed to read response after {args.max_retries + 1} attempts for {next_url or template}"
)
raise Exception(
f"Failed to read response after {MAX_RETRIES} attempts for {next_url or template}"
f"Failed to read response after {args.max_retries + 1} attempts for {next_url or template}"
)
# SECOND: Process and paginate
@@ -734,7 +733,7 @@ def retrieve_data(args, template, query_args=None, paginated=True):
return list(fetch_all())
def make_request_with_retry(request, auth):
def make_request_with_retry(request, auth, max_retries=5):
"""Make HTTP request with automatic retry for transient errors."""
def is_retryable_status(status_code, headers):
@@ -746,40 +745,49 @@ def make_request_with_retry(request, auth):
return int(headers.get("x-ratelimit-remaining", 1)) < 1
return False
for attempt in range(MAX_RETRIES):
for attempt in range(max_retries + 1):
try:
return urlopen(request, context=https_ctx)
except HTTPError as exc:
# HTTPError can be used as a response-like object
if not is_retryable_status(exc.code, exc.headers):
logger.error(
f"API Error: {exc.code} {exc.reason} for {request.full_url}"
)
raise # Non-retryable error
if attempt >= MAX_RETRIES - 1:
logger.error(f"HTTP {exc.code} failed after {MAX_RETRIES} attempts")
if attempt >= max_retries:
logger.error(
f"HTTP {exc.code} failed after {max_retries + 1} attempts for {request.full_url}"
)
raise
delay = calculate_retry_delay(attempt, exc.headers)
logger.warning(
f"HTTP {exc.code}, retrying in {delay:.1f}s "
f"(attempt {attempt + 1}/{MAX_RETRIES})"
f"HTTP {exc.code} ({exc.reason}), retrying in {delay:.1f}s "
f"(attempt {attempt + 1}/{max_retries + 1}) for {request.full_url}"
)
if auth is None and exc.code in (403, 429):
logger.info("Hint: Authenticate to raise your GitHub rate limit")
time.sleep(delay)
except (URLError, socket.error) as e:
if attempt >= MAX_RETRIES - 1:
logger.error(f"Connection error failed after {MAX_RETRIES} attempts: {e}")
if attempt >= max_retries:
logger.error(
f"Connection error failed after {max_retries + 1} attempts: {e} for {request.full_url}"
)
raise
delay = calculate_retry_delay(attempt, {})
logger.warning(
f"Connection error: {e}, retrying in {delay:.1f}s "
f"(attempt {attempt + 1}/{MAX_RETRIES})"
f"(attempt {attempt + 1}/{max_retries + 1}) for {request.full_url}"
)
time.sleep(delay)
raise Exception(f"Request failed after {MAX_RETRIES} attempts") # pragma: no cover
raise Exception(
f"Request failed after {max_retries + 1} attempts"
) # pragma: no cover
def _construct_request(per_page, query_args, template, auth, as_app=None, fine=False):
@@ -1593,6 +1601,25 @@ def filter_repositories(args, unfiltered_repositories):
]
if args.skip_archived:
repositories = [r for r in repositories if not r.get("archived")]
if args.starred_skip_size_over is not None:
if args.starred_skip_size_over <= 0:
logger.warning("--starred-skip-size-over must be greater than 0, ignoring")
else:
size_limit_kb = args.starred_skip_size_over * 1024
filtered = []
for r in repositories:
if r.get("is_starred") and r.get("size", 0) > size_limit_kb:
size_mb = r.get("size", 0) / 1024
logger.info(
"Skipping starred repo {0} ({1:.0f} MB) due to --starred-skip-size-over {2}".format(
r.get("full_name", r.get("name")),
size_mb,
args.starred_skip_size_over,
)
)
else:
filtered.append(r)
repositories = filtered
if args.exclude:
repositories = [
r for r in repositories if "name" not in r or r["name"] not in args.exclude
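
To make the new counting explicit: with ``--retries N`` the retry loops in this file run ``N + 1`` times, one initial attempt plus ``N`` retries. The sketch below reproduces only that counting. ``run_with_retries`` and ``flaky`` are hypothetical names, and the delay, logging, and HTTP handling of the real code are deliberately left out.

```python
def run_with_retries(operation, max_retries=5):
    """Call operation until it succeeds or max_retries retries are used up."""
    for attempt in range(max_retries + 1):  # 1 initial attempt + max_retries retries
        try:
            return operation()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error


attempts = []  # records one entry per call to flaky()


def flaky():
    """Always fails, recording each invocation."""
    attempts.append(1)
    raise ConnectionError("simulated failure")


try:
    run_with_retries(flaky, max_retries=5)
except ConnectionError:
    pass  # expected: all six attempts failed
```

After this runs, ``attempts`` holds six entries, matching the "1 initial + 5 retries = 6 attempts" comment in the updated tests. Passing ``max_retries=0`` gives exactly one attempt and no retries.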

@@ -46,8 +46,6 @@ class TestAllStarredCloning:
args.prefer_ssh = False
args.token_classic = None
args.token_fine = None
args.username = None
args.password = None
args.as_app = False
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None

@@ -24,8 +24,6 @@ def attachment_test_setup(tmp_path):
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.user = "testuser"

@@ -26,6 +26,8 @@ class TestCaseSensitivity:
args.private = False
args.public = False
args.all = True
args.skip_archived = False
args.starred_skip_size_over = None
# Simulate GitHub API returning canonical case
repos = [
@@ -65,6 +67,8 @@ class TestCaseSensitivity:
args.private = False
args.public = False
args.all = True
args.skip_archived = False
args.starred_skip_size_over = None
repos = [
{
@@ -93,6 +97,8 @@ class TestCaseSensitivity:
args.private = False
args.public = False
args.all = True
args.skip_archived = False
args.starred_skip_size_over = None
repos = [
{"name": "repo1", "owner": {"login": "test-user"}, "private": False, "fork": False},

@@ -17,12 +17,11 @@ class TestHTTP451Exception:
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
args.max_retries = 5
mock_response = Mock()
mock_response.getcode.return_value = 451
@@ -32,18 +31,26 @@ class TestHTTP451Exception:
"block": {
"reason": "dmca",
"created_at": "2024-11-12T14:38:04Z",
"html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
}
"html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md",
},
}
mock_response.read.return_value = json.dumps(dmca_data).encode("utf-8")
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Unavailable For Legal Reasons"
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with patch(
"github_backup.github_backup.make_request_with_retry",
return_value=mock_response,
):
with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues")
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/dmca/issues"
)
assert exc_info.value.dmca_url == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
assert (
exc_info.value.dmca_url
== "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
)
assert "451" in str(exc_info.value)
def test_repository_unavailable_error_without_dmca_url(self):
@@ -52,12 +59,11 @@ class TestHTTP451Exception:
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
args.max_retries = 5
mock_response = Mock()
mock_response.getcode.return_value = 451
@@ -65,9 +71,14 @@ class TestHTTP451Exception:
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Unavailable For Legal Reasons"
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with patch(
"github_backup.github_backup.make_request_with_retry",
return_value=mock_response,
):
with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues")
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/dmca/issues"
)
assert exc_info.value.dmca_url is None
assert "451" in str(exc_info.value)
@@ -78,12 +89,11 @@ class TestHTTP451Exception:
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
args.max_retries = 5
mock_response = Mock()
mock_response.getcode.return_value = 451
@@ -91,9 +101,14 @@ class TestHTTP451Exception:
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Unavailable For Legal Reasons"
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with patch(
"github_backup.github_backup.make_request_with_retry",
return_value=mock_response,
):
with pytest.raises(github_backup.RepositoryUnavailableError):
github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues")
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/dmca/issues"
)
if __name__ == "__main__":

@@ -45,12 +45,11 @@ def mock_args():
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
args.max_retries = 5
return args

@@ -9,26 +9,27 @@ import pytest
from github_backup import github_backup
from github_backup.github_backup import (
MAX_RETRIES,
calculate_retry_delay,
make_request_with_retry,
)
# Default retry count used in tests (matches argparse default)
# With max_retries=5, total attempts = 6 (1 initial + 5 retries)
DEFAULT_MAX_RETRIES = 5
class TestCalculateRetryDelay:
def test_respects_retry_after_header(self):
headers = {'retry-after': '30'}
headers = {"retry-after": "30"}
assert calculate_retry_delay(0, headers) == 30
def test_respects_rate_limit_reset(self):
import time
import calendar
# Set reset time 60 seconds in the future
future_reset = calendar.timegm(time.gmtime()) + 60
headers = {
'x-ratelimit-remaining': '0',
'x-ratelimit-reset': str(future_reset)
}
headers = {"x-ratelimit-remaining": "0", "x-ratelimit-reset": str(future_reset)}
delay = calculate_retry_delay(0, headers)
# Should be approximately 60 seconds (with some tolerance for execution time)
assert 55 <= delay <= 65
@@ -50,12 +51,10 @@ class TestCalculateRetryDelay:
def test_minimum_rate_limit_delay(self):
import time
import calendar
# Set reset time in the past (already reset)
past_reset = calendar.timegm(time.gmtime()) - 100
headers = {
'x-ratelimit-remaining': '0',
'x-ratelimit-reset': str(past_reset)
}
headers = {"x-ratelimit-remaining": "0", "x-ratelimit-reset": str(past_reset)}
delay = calculate_retry_delay(0, headers)
# Should be minimum 10 seconds even if reset time is in past
assert delay >= 10
@@ -70,12 +69,11 @@ class TestRetrieveDataRetry:
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
args.max_retries = DEFAULT_MAX_RETRIES
return args
def test_json_parse_error_retries_and_fails(self, mock_args):
@@ -92,13 +90,22 @@ class TestRetrieveDataRetry:
call_count += 1
return mock_response
with patch("github_backup.github_backup.make_request_with_retry", side_effect=mock_make_request):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): # No delay in tests
with patch(
"github_backup.github_backup.make_request_with_retry",
side_effect=mock_make_request,
):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
): # No delay in tests
with pytest.raises(Exception) as exc_info:
github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues")
github_backup.retrieve_data(
mock_args, "https://api.github.com/repos/test/repo/issues"
)
assert "Failed to read response after" in str(exc_info.value)
assert call_count == MAX_RETRIES
assert (
call_count == DEFAULT_MAX_RETRIES + 1
) # 1 initial + 5 retries = 6 attempts
def test_json_parse_error_recovers_on_retry(self, mock_args):
"""HTTP 200 with invalid JSON should succeed if retry returns valid JSON."""
@@ -121,9 +128,16 @@ class TestRetrieveDataRetry:
call_count += 1
return result
with patch("github_backup.github_backup.make_request_with_retry", side_effect=mock_make_request):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
result = github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues")
with patch(
"github_backup.github_backup.make_request_with_retry",
side_effect=mock_make_request,
):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
result = github_backup.retrieve_data(
mock_args, "https://api.github.com/repos/test/repo/issues"
)
assert result == [{"id": 1}]
assert call_count == 3 # Failed twice, succeeded on third
@@ -136,11 +150,18 @@ class TestRetrieveDataRetry:
mock_response.headers = {"x-ratelimit-remaining": "5000"}
mock_response.reason = "Not Found"
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with patch(
"github_backup.github_backup.make_request_with_retry",
return_value=mock_response,
):
with pytest.raises(Exception) as exc_info:
github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/notfound/issues")
github_backup.retrieve_data(
mock_args, "https://api.github.com/repos/test/notfound/issues"
)
assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError)
assert not isinstance(
exc_info.value, github_backup.RepositoryUnavailableError
)
assert "404" in str(exc_info.value)
@@ -153,7 +174,7 @@ class TestMakeRequestWithRetry:
good_response.read.return_value = b'{"ok": true}'
call_count = 0
fail_count = MAX_RETRIES - 1 # Fail all but last attempt
fail_count = DEFAULT_MAX_RETRIES # Fail all retries, succeed on last attempt
def mock_urlopen(*args, **kwargs):
nonlocal call_count
@@ -169,14 +190,18 @@ class TestMakeRequestWithRetry:
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert call_count == MAX_RETRIES
assert (
call_count == DEFAULT_MAX_RETRIES + 1
) # 1 initial + 5 retries = 6 attempts
def test_503_error_retries_until_exhausted(self):
"""HTTP 503 should retry MAX_RETRIES times then raise."""
"""HTTP 503 should make 1 initial + DEFAULT_MAX_RETRIES retry attempts then raise."""
call_count = 0
def mock_urlopen(*args, **kwargs):
@@ -191,12 +216,16 @@ class TestMakeRequestWithRetry:
)
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
with pytest.raises(HTTPError) as exc_info:
make_request_with_retry(Mock(), None)
assert exc_info.value.code == 503
assert call_count == MAX_RETRIES
assert (
call_count == DEFAULT_MAX_RETRIES + 1
) # 1 initial + 5 retries = 6 attempts
def test_404_error_not_retried(self):
"""HTTP 404 should not be retried - raise immediately."""
@@ -239,7 +268,9 @@ class TestMakeRequestWithRetry:
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
result = make_request_with_retry(Mock(), None)
assert result == good_response
@@ -271,7 +302,7 @@ class TestMakeRequestWithRetry:
"""URLError (connection error) should retry and succeed if subsequent request works."""
good_response = Mock()
call_count = 0
fail_count = MAX_RETRIES - 1 # Fail all but last attempt
fail_count = DEFAULT_MAX_RETRIES # Fail the first DEFAULT_MAX_RETRIES attempts; the final retry succeeds
def mock_urlopen(*args, **kwargs):
nonlocal call_count
@@ -281,14 +312,18 @@ class TestMakeRequestWithRetry:
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
result = make_request_with_retry(Mock(), None)
assert result == good_response
assert call_count == MAX_RETRIES
assert (
call_count == DEFAULT_MAX_RETRIES + 1
) # 1 initial + 5 retries = 6 attempts
def test_socket_error_retries_until_exhausted(self):
"""socket.error should retry MAX_RETRIES times then raise."""
"""socket.error should make 1 initial + DEFAULT_MAX_RETRIES retry attempts then raise."""
call_count = 0
def mock_urlopen(*args, **kwargs):
@@ -297,11 +332,15 @@ class TestMakeRequestWithRetry:
raise socket.error("Connection reset by peer")
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch("github_backup.github_backup.calculate_retry_delay", return_value=0):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
with pytest.raises(socket.error):
make_request_with_retry(Mock(), None)
assert call_count == MAX_RETRIES
assert (
call_count == DEFAULT_MAX_RETRIES + 1
) # 1 initial + 5 retries = 6 attempts
class TestRetrieveDataThrottling:
@@ -313,12 +352,11 @@ class TestRetrieveDataThrottling:
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = 10 # Throttle when remaining <= 10
args.throttle_pause = 5 # Pause 5 seconds
args.max_retries = DEFAULT_MAX_RETRIES
return args
def test_throttling_pauses_when_rate_limit_low(self, mock_args):
@@ -326,11 +364,19 @@ class TestRetrieveDataThrottling:
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8")
mock_response.headers = {"x-ratelimit-remaining": "5", "Link": ""} # Below throttle_limit
mock_response.headers = {
"x-ratelimit-remaining": "5",
"Link": "",
} # Below throttle_limit
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
with patch(
"github_backup.github_backup.make_request_with_retry",
return_value=mock_response,
):
with patch("github_backup.github_backup.time.sleep") as mock_sleep:
github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues")
github_backup.retrieve_data(
mock_args, "https://api.github.com/repos/test/repo/issues"
)
mock_sleep.assert_called_once_with(5) # throttle_pause value
@@ -344,22 +390,125 @@ class TestRetrieveDataSingleItem:
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
args.max_retries = DEFAULT_MAX_RETRIES
return args
def test_dict_response_returned_as_list(self, mock_args):
"""Single dict response should be returned as a list with one item."""
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = json.dumps({"login": "testuser", "id": 123}).encode("utf-8")
mock_response.read.return_value = json.dumps(
{"login": "testuser", "id": 123}
).encode("utf-8")
mock_response.headers = {"x-ratelimit-remaining": "5000", "Link": ""}
with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response):
result = github_backup.retrieve_data(mock_args, "https://api.github.com/user")
with patch(
"github_backup.github_backup.make_request_with_retry",
return_value=mock_response,
):
result = github_backup.retrieve_data(
mock_args, "https://api.github.com/user"
)
assert result == [{"login": "testuser", "id": 123}]
class TestRetriesCliArgument:
"""Tests for --retries CLI argument validation and behavior."""
def test_retries_argument_accepted(self):
"""--retries flag should be accepted and parsed correctly."""
args = github_backup.parse_args(["--retries", "3", "testuser"])
assert args.max_retries == 3
def test_retries_default_value(self):
"""--retries should default to 5 if not specified."""
args = github_backup.parse_args(["testuser"])
assert args.max_retries == 5
def test_retries_zero_is_valid(self):
"""--retries 0 should be valid and mean 1 attempt (no retries)."""
args = github_backup.parse_args(["--retries", "0", "testuser"])
assert args.max_retries == 0
def test_retries_negative_rejected(self):
"""--retries with negative value should be rejected by argparse."""
with pytest.raises(SystemExit):
github_backup.parse_args(["--retries", "-1", "testuser"])
def test_retries_non_integer_rejected(self):
"""--retries with non-integer value should be rejected by argparse."""
with pytest.raises(SystemExit):
github_backup.parse_args(["--retries", "abc", "testuser"])
def test_retries_one_with_transient_error_succeeds(self):
"""--retries 1 should allow one retry after initial failure."""
good_response = Mock()
good_response.read.return_value = b'{"ok": true}'
call_count = 0
def mock_urlopen(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count == 1:
raise HTTPError(
url="https://api.github.com/test",
code=502,
msg="Bad Gateway",
hdrs={"x-ratelimit-remaining": "5000"},
fp=None,
)
return good_response
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
result = make_request_with_retry(Mock(), None, max_retries=1)
assert result == good_response
assert call_count == 2 # 1 initial + 1 retry = 2 attempts
def test_custom_retry_count_limits_attempts(self):
"""Custom --retries value should limit actual retry attempts."""
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = "fake_token"
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.throttle_limit = None
args.throttle_pause = 0
args.max_retries = 2 # 2 retries = 3 total attempts (1 initial + 2 retries)
mock_response = Mock()
mock_response.getcode.return_value = 200
mock_response.read.return_value = b"not valid json {"
mock_response.headers = {"x-ratelimit-remaining": "5000"}
call_count = 0
def mock_make_request(*args, **kwargs):
nonlocal call_count
call_count += 1
return mock_response
with patch(
"github_backup.github_backup.make_request_with_retry",
side_effect=mock_make_request,
):
with patch(
"github_backup.github_backup.calculate_retry_delay", return_value=0
):
with pytest.raises(Exception) as exc_info:
github_backup.retrieve_data(
args, "https://api.github.com/repos/test/repo/issues"
)
assert "Failed to read response after 3 attempts" in str(exc_info.value)
assert call_count == 3 # 1 initial + 2 retries = 3 attempts
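Reviewer note: the updated assertions all rely on the same counting rule, namely that total attempts equal 1 initial request plus `max_retries` retries. The sketch below illustrates that rule in isolation. It is not the project's `make_request_with_retry`; `TransientError`, `call_with_retries`, and `make_flaky` are hypothetical names introduced only for this illustration, and the delay function is an assumed placeholder for the backoff the real code computes.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (e.g. HTTP 502/503)."""


def call_with_retries(send, max_retries=5, delay=lambda attempt: 0):
    """Call send() once, then retry up to max_retries more times.

    Total attempts = 1 initial + max_retries. The last failure is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the final error
            time.sleep(delay(attempt))


def make_flaky(failures):
    """Return (send, calls): send fails `failures` times, then returns "ok"."""
    calls = []

    def send():
        calls.append(1)
        if len(calls) <= failures:
            raise TransientError("temporary failure")
        return "ok"

    return send, calls
```

With `max_retries=0` the loop body runs exactly once, which matches the docstring of `test_retries_zero_is_valid` ("1 attempt (no retries)").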


@@ -48,8 +48,6 @@ class TestSkipAssetsOn:
args.prefer_ssh = False
args.token_classic = "test-token"
args.token_fine = None
args.username = None
args.password = None
args.as_app = False
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None


@@ -0,0 +1,224 @@
"""Tests for --starred-skip-size-over flag behavior (issue #108)."""
import pytest
from unittest.mock import Mock
from github_backup import github_backup
class TestStarredSkipSizeOver:
"""Test suite for --starred-skip-size-over flag.
Issue #108: Allow restricting size of starred repositories before cloning.
The size is based on the GitHub API's 'size' field (in KB), but the CLI
argument accepts MB for user convenience.
"""
def _create_mock_args(self, **overrides):
"""Create a mock args object with sensible defaults."""
args = Mock()
args.user = "testuser"
args.repository = None
args.name_regex = None
args.languages = None
args.fork = False
args.private = False
args.skip_archived = False
args.starred_skip_size_over = None
args.exclude = None
for key, value in overrides.items():
setattr(args, key, value)
return args
class TestStarredSkipSizeOverArgumentParsing(TestStarredSkipSizeOver):
"""Tests for --starred-skip-size-over argument parsing."""
def test_starred_skip_size_over_not_set_defaults_to_none(self):
"""When --starred-skip-size-over is not specified, it should default to None."""
args = github_backup.parse_args(["testuser"])
assert args.starred_skip_size_over is None
def test_starred_skip_size_over_accepts_integer(self):
"""--starred-skip-size-over should accept an integer value."""
args = github_backup.parse_args(["testuser", "--starred-skip-size-over", "500"])
assert args.starred_skip_size_over == 500
def test_starred_skip_size_over_rejects_non_integer(self):
"""--starred-skip-size-over should reject non-integer values."""
with pytest.raises(SystemExit):
github_backup.parse_args(["testuser", "--starred-skip-size-over", "abc"])
class TestStarredSkipSizeOverFiltering(TestStarredSkipSizeOver):
"""Tests for --starred-skip-size-over filtering behavior."""
def test_starred_repo_under_limit_is_kept(self):
"""Starred repos under the size limit should be kept."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "small-repo",
"owner": {"login": "otheruser"},
"size": 100 * 1024, # 100 MB in KB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert result[0]["name"] == "small-repo"
def test_starred_repo_over_limit_is_filtered(self):
"""Starred repos over the size limit should be filtered out."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "huge-repo",
"owner": {"login": "otheruser"},
"size": 600 * 1024, # 600 MB in KB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 0
def test_own_repo_over_limit_is_kept(self):
"""User's own repos should not be affected by the size limit."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "my-huge-repo",
"owner": {"login": "testuser"},
"size": 600 * 1024, # 600 MB in KB
# No is_starred flag - this is the user's own repo
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert result[0]["name"] == "my-huge-repo"
def test_starred_repo_at_exact_limit_is_kept(self):
"""Starred repos at exactly the size limit should be kept."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "exact-limit-repo",
"owner": {"login": "otheruser"},
"size": 500 * 1024, # Exactly 500 MB in KB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert result[0]["name"] == "exact-limit-repo"
def test_mixed_repos_filtered_correctly(self):
"""Mix of own and starred repos should be filtered correctly."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "my-huge-repo",
"owner": {"login": "testuser"},
"size": 1000 * 1024, # 1 GB - own repo, should be kept
},
{
"name": "starred-small",
"owner": {"login": "otheruser"},
"size": 100 * 1024, # 100 MB - under limit
"is_starred": True,
},
{
"name": "starred-huge",
"owner": {"login": "anotheruser"},
"size": 2000 * 1024, # 2 GB - over limit
"is_starred": True,
},
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 2
names = [r["name"] for r in result]
assert "my-huge-repo" in names
assert "starred-small" in names
assert "starred-huge" not in names
def test_no_size_limit_keeps_all_starred(self):
"""When no size limit is set, all starred repos should be kept."""
args = self._create_mock_args(starred_skip_size_over=None)
repos = [
{
"name": "huge-starred-repo",
"owner": {"login": "otheruser"},
"size": 10000 * 1024, # 10 GB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
def test_repo_without_size_field_is_kept(self):
"""Repos without a size field should be kept (size defaults to 0)."""
args = self._create_mock_args(starred_skip_size_over=500)
repos = [
{
"name": "no-size-repo",
"owner": {"login": "otheruser"},
"is_starred": True,
# No size field
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
def test_zero_value_warns_and_is_ignored(self, caplog):
"""Zero value should warn and keep all repos."""
args = self._create_mock_args(starred_skip_size_over=0)
repos = [
{
"name": "huge-starred-repo",
"owner": {"login": "otheruser"},
"size": 10000 * 1024, # 10 GB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert "must be greater than 0" in caplog.text
def test_negative_value_warns_and_is_ignored(self, caplog):
"""Negative value should warn and keep all repos."""
args = self._create_mock_args(starred_skip_size_over=-5)
repos = [
{
"name": "huge-starred-repo",
"owner": {"login": "otheruser"},
"size": 10000 * 1024, # 10 GB
"is_starred": True,
}
]
result = github_backup.filter_repositories(args, repos)
assert len(result) == 1
assert "must be greater than 0" in caplog.text
if __name__ == "__main__":
pytest.main([__file__, "-v"])
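Reviewer note: the behaviour these tests pin down can be summarised in a small standalone sketch. It assumes, as the test docstrings state, that the API `size` field is in KB while the limit is given in MB, that a repo exactly at the limit is kept, that a missing `size` counts as 0, and that a zero or negative limit is ignored with a warning. `skip_oversized_starred` is a hypothetical helper, not the project's `filter_repositories`, and the log wording is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def skip_oversized_starred(repos, limit_mb):
    """Drop starred repos whose API 'size' (KB) exceeds limit_mb (MB).

    Repos without the is_starred flag (the user's own) are always kept.
    """
    if limit_mb is None:
        return list(repos)
    if limit_mb <= 0:
        # Assumed wording; the tests only check for "must be greater than 0".
        logger.warning("size limit must be greater than 0, ignoring it")
        return list(repos)

    limit_kb = limit_mb * 1024  # convert the MB limit to the API's KB unit
    kept = []
    for repo in repos:
        size_kb = repo.get("size", 0)  # missing size is treated as 0
        if repo.get("is_starred") and size_kb > limit_kb:
            logger.info(
                "Skipping starred repo %s (%d MB)",
                repo.get("name"),
                size_kb // 1024,
            )
            continue
        kept.append(repo)
    return kept
```

The strict `>` comparison is what keeps a repo of exactly 500 MB when the limit is 500, mirroring `test_starred_repo_at_exact_limit_is_kept`.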