Compare commits

17 Commits

Author SHA1 Message Date
GitHub Action
ba852b5830 Release version 0.57.0 2025-12-12 11:07:14 +00:00
Jose Diaz-Gonzalez
934ee4b14b Merge pull request #467 from Iamrodos/docs/187-189-auth-docs
Add GitHub Apps documentation and stdin token example
2025-12-12 06:06:30 -05:00
Jose Diaz-Gonzalez
37a0c5c123 Merge pull request #468 from Iamrodos/feature/135-skip-assets-on
Add --skip-assets-on flag to skip release asset downloads (#135)
2025-12-12 06:05:47 -05:00
Rodos
f6e2f40b09 Add --skip-assets-on flag to skip release asset downloads (#135)
Allow users to skip downloading release assets for specific repositories
while still backing up release metadata. Useful for starred repos with
large assets (e.g. syncthing with 27GB+).

Usage: --skip-assets-on repo1 repo2 owner/repo3

Features:
- Space-separated repos (consistent with --exclude)
- Case-insensitive matching
- Supports both repo name and owner/repo format
2025-12-12 16:21:52 +11:00
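The matching rules described in this commit can be sketched as follows. This is a minimal illustration rather than the project's code: the option name, `dest`, and `nargs="*"` follow the `parse_args` hunk later in this diff, while `build_parser` and `should_skip_assets` are hypothetical helper names introduced here.

```python
import argparse


def build_parser():
    # Hypothetical stand-alone parser; only the --skip-assets-on option
    # mirrors the diff (zero or more space-separated repository names).
    parser = argparse.ArgumentParser(prog="github-backup-sketch")
    parser.add_argument("--skip-assets-on", dest="skip_assets_on", nargs="*")
    return parser


def should_skip_assets(repository, skip_assets_on):
    """Return True if the repo name or owner/repo is listed (case-insensitive)."""
    skip_repos = [r.lower() for r in (skip_assets_on or [])]
    repo_name = repository.get("name", "").lower()
    repo_full_name = repository.get("full_name", "").lower()
    return repo_name in skip_repos or repo_full_name in skip_repos


args = build_parser().parse_args(["--skip-assets-on", "Syncthing", "owner/Repo3"])
by_name = {"name": "syncthing", "full_name": "syncthing/syncthing"}
by_full_name = {"name": "repo3", "full_name": "owner/repo3"}
unlisted = {"name": "other", "full_name": "owner/other"}
```

Release metadata is still written for a skipped repository; only the asset download step is bypassed.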
Rodos
ef990483e2 Add GitHub Apps documentation and remove outdated header
- Add GitHub Apps authentication section with setup steps
  and CI/CD workflow example using actions/create-github-app-token
- Remove outdated machine-man-preview header (graduated 2020)

Closes #189
2025-12-12 10:25:49 +11:00
Rodos
3a513b6646 docs: add stdin token example to README
Add example showing how to pipe a token from stdin using
file:///dev/stdin to avoid storing tokens in environment
variables or command history.

Closes #187
2025-12-12 09:55:13 +11:00
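A rough sketch of how a `file://` token argument might be resolved, modelled on the `read_file_contents` helper visible later in this diff. The value of `FILE_URI_PREFIX` and the `startswith` check are assumptions, since neither appears in the diff, and `read_token` is a hypothetical name.

```python
import os
import tempfile

FILE_URI_PREFIX = "file://"  # assumption: the real constant's value is not shown


def read_token(token):
    """Return the token itself, or the first line of the file a file:// URI names."""
    if token.startswith(FILE_URI_PREFIX):
        # Strip the prefix to get a filesystem path, e.g. /dev/stdin
        with open(token[len(FILE_URI_PREFIX):], "rt") as f:
            return f.readline().strip()
    return token


# Demonstrate with a temporary file standing in for /dev/stdin.
with tempfile.TemporaryDirectory() as tmp:
    token_path = os.path.join(tmp, "token.txt")
    with open(token_path, "w") as f:
        f.write("example-token\n")
    from_file = read_token(FILE_URI_PREFIX + token_path)

literal = read_token("example-token")
```

With `file:///dev/stdin`, the path after the prefix is `/dev/stdin`, so the token is read from whatever is piped into the process.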
GitHub Action
2bb83d6d8b Release version 0.56.0 2025-12-11 16:50:28 +00:00
Jose Diaz-Gonzalez
8fcc142621 Merge pull request #465 from Iamrodos/fix/379-lfs-clone-deprecated
fix: replace deprecated git lfs clone with git clone + git lfs fetch --all
2025-12-11 11:49:53 -05:00
Jose Diaz-Gonzalez
7615ce6102 Merge pull request #464 from Iamrodos/fix/246-restore-docs
docs: clarify no inbuilt restore and GitHub API limitations
2025-12-11 11:49:39 -05:00
Jose Diaz-Gonzalez
3f1ef821c3 Merge pull request #466 from Iamrodos/fix/112-windows-support
fix: add Windows support with entry_points and os.replace
2025-12-11 11:48:59 -05:00
Rodos
3684756eaa fix: add Windows support with entry_points and os.replace
- Replace os.rename() with os.replace() for atomic file operations
  on Windows (os.rename fails if destination exists on Windows)
- Add entry_points console_scripts for proper .exe generation on Windows
- Create github_backup/cli.py with main() entry point
- Add github_backup/__main__.py for python -m github_backup support
- Keep bin/github-backup as thin wrapper for backwards compatibility

Closes #112
2025-12-11 22:03:45 +11:00
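The write-then-replace pattern this commit switches to can be illustrated with a small sketch. `atomic_write_json` is a hypothetical helper, not a function from the project; the `.temp` suffix follows the convention used in the diff.

```python
import json
import os
import tempfile


def atomic_write_json(data, path):
    """Write JSON to a temp file, then move it over the destination."""
    temp_path = path + ".temp"
    with open(temp_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    # os.replace overwrites an existing destination on POSIX and Windows,
    # whereas os.rename raises on Windows when the destination exists.
    os.replace(temp_path, path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "issue.json")
    atomic_write_json({"number": 1}, target)
    atomic_write_json({"number": 2}, target)  # destination already exists
    with open(target, encoding="utf-8") as f:
        result = json.load(f)
    leftover = os.path.exists(target + ".temp")
```

The second call is the case that matters on Windows: the destination file is already present when the temp file is moved into place.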
Rodos
e745b55755 fix: replace deprecated git lfs clone with git clone + git lfs fetch --all
git lfs clone is deprecated - modern git clone handles LFS automatically.
Using git lfs fetch --all ensures all LFS objects across all refs are
backed up, matching the existing bare clone behavior and providing
complete LFS backups.

Closes #379
2025-12-11 20:55:38 +11:00
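As a sketch of the command sequence this commit describes for a fresh clone, the function below only builds the argument lists and does not run git. `lfs_clone_commands` is a hypothetical name; the flags and the `no_prune` handling follow the `fetch_repository` hunk later in this diff.

```python
def lfs_clone_commands(remote_url, local_dir, lfs_clone=False, no_prune=False):
    """Return (command, cwd) pairs for a fresh clone, optionally followed by an LFS fetch."""
    commands = [(["git", "clone", remote_url, local_dir], None)]
    if lfs_clone:
        fetch = ["git", "lfs", "fetch", "--all", "--prune"]
        if no_prune:
            fetch.pop()  # drop the trailing --prune
        # The fetch runs inside the freshly cloned directory.
        commands.append((fetch, local_dir))
    return commands


plain = lfs_clone_commands("git@example.com:o/r.git", "/backup/r")
with_lfs = lfs_clone_commands("git@example.com:o/r.git", "/backup/r", lfs_clone=True)
keep_all = lfs_clone_commands(
    "git@example.com:o/r.git", "/backup/r", lfs_clone=True, no_prune=True
)
```

Splitting the work into a plain clone plus `git lfs fetch --all` is what lets the backup cover LFS objects on every ref, not only the checked-out branch.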
Rodos
75e6f56773 docs: add "Restoring from Backup" section to README
Clarifies that this tool is backup-only with no inbuilt restore.
Documents that git repos can be pushed back, but issues/PRs have
GitHub API limitations affecting all backup tools.

Closes #246
2025-12-11 20:35:08 +11:00
Jose Diaz-Gonzalez
b991c363a0 Merge pull request #463 from josegonzalez/dependabot/pip/python-packages-9e0978b55f
chore(deps): bump urllib3 from 2.6.0 to 2.6.1 in the python-packages group
2025-12-10 09:39:07 -05:00
dependabot[bot]
6d74af9126 chore(deps): bump urllib3 in the python-packages group
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).


Updates `urllib3` from 2.6.0 to 2.6.1
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.0...2.6.1)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-09 13:10:12 +00:00
Jose Diaz-Gonzalez
381d67af96 Merge pull request #462 from josegonzalez/dependabot/pip/python-packages-3a01b12ef5
chore(deps): bump the python-packages group with 3 updates
2025-12-08 16:00:24 -05:00
dependabot[bot]
2fbe8d272c chore(deps): bump the python-packages group with 3 updates
Bumps the python-packages group with 3 updates: [black](https://github.com/psf/black), [pytest](https://github.com/pytest-dev/pytest) and [platformdirs](https://github.com/tox-dev/platformdirs).


Updates `black` from 25.11.0 to 25.12.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...25.12.0)

Updates `pytest` from 9.0.1 to 9.0.2
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.1...9.0.2)

Updates `platformdirs` from 4.5.0 to 4.5.1
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.5.0...4.5.1)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.12.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
- dependency-name: pytest
  dependency-version: 9.0.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
- dependency-name: platformdirs
  dependency-version: 4.5.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-08 13:09:32 +00:00
10 changed files with 706 additions and 121 deletions

@@ -1,9 +1,134 @@
Changelog
=========
0.55.0 (2025-12-07)
0.57.0 (2025-12-12)
-------------------
------------------------
- Add GitHub Apps documentation and remove outdated header. [Rodos]
- Add GitHub Apps authentication section with setup steps
and CI/CD workflow example using actions/create-github-app-token
- Remove outdated machine-man-preview header (graduated 2020)
Closes #189
- Docs: add stdin token example to README. [Rodos]
Add example showing how to pipe a token from stdin using
file:///dev/stdin to avoid storing tokens in environment
variables or command history.
Closes #187
- Add --skip-assets-on flag to skip release asset downloads (#135)
[Rodos]
Allow users to skip downloading release assets for specific repositories
while still backing up release metadata. Useful for starred repos with
large assets (e.g. syncthing with 27GB+).
Usage: --skip-assets-on repo1 repo2 owner/repo3
Features:
- Space-separated repos (consistent with --exclude)
- Case-insensitive matching
- Supports both repo name and owner/repo format
0.56.0 (2025-12-11)
-------------------
Fix
~~~
- Replace deprecated git lfs clone with git clone + git lfs fetch --all.
[Rodos]
git lfs clone is deprecated - modern git clone handles LFS automatically.
Using git lfs fetch --all ensures all LFS objects across all refs are
backed up, matching the existing bare clone behavior and providing
complete LFS backups.
Closes #379
- Add Windows support with entry_points and os.replace. [Rodos]
- Replace os.rename() with os.replace() for atomic file operations
on Windows (os.rename fails if destination exists on Windows)
- Add entry_points console_scripts for proper .exe generation on Windows
- Create github_backup/cli.py with main() entry point
- Add github_backup/__main__.py for python -m github_backup support
- Keep bin/github-backup as thin wrapper for backwards compatibility
Closes #112
Other
~~~~~
- Docs: add "Restoring from Backup" section to README. [Rodos]
Clarifies that this tool is backup-only with no inbuilt restore.
Documents that git repos can be pushed back, but issues/PRs have
GitHub API limitations affecting all backup tools.
Closes #246
- Chore(deps): bump urllib3 in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).
Updates `urllib3` from 2.6.0 to 2.6.1
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.0...2.6.1)
---
updated-dependencies:
- dependency-name: urllib3
dependency-version: 2.6.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
- Chore(deps): bump the python-packages group with 3 updates.
[dependabot[bot]]
Bumps the python-packages group with 3 updates: [black](https://github.com/psf/black), [pytest](https://github.com/pytest-dev/pytest) and [platformdirs](https://github.com/tox-dev/platformdirs).
Updates `black` from 25.11.0 to 25.12.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...25.12.0)
Updates `pytest` from 9.0.1 to 9.0.2
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.1...9.0.2)
Updates `platformdirs` from 4.5.0 to 4.5.1
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.5.0...4.5.1)
---
updated-dependencies:
- dependency-name: black
dependency-version: 25.12.0
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
- dependency-name: pytest
dependency-version: 9.0.2
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
- dependency-name: platformdirs
dependency-version: 4.5.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
0.55.0 (2025-12-07)
-------------------
Fix
~~~

@@ -50,8 +50,8 @@ CLI Help output::
[--keychain-name OSX_KEYCHAIN_ITEM_NAME]
[--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT]
[--releases] [--latest-releases NUMBER_OF_LATEST_RELEASES]
[--skip-prerelease] [--assets] [--attachments]
[--exclude [REPOSITORY [REPOSITORY ...]]
[--skip-prerelease] [--assets] [--skip-assets-on [REPO ...]]
[--attachments] [--exclude [REPOSITORY [REPOSITORY ...]]
[--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE]
USER
@@ -133,6 +133,9 @@ CLI Help output::
--skip-prerelease skip prerelease and draft versions; only applies if including releases
--assets include assets alongside release information; only
applies if including releases
--skip-assets-on [REPO ...]
skip asset downloads for these repositories (e.g.
--skip-assets-on repo1 owner/repo2)
--attachments download user-attachments from issues and pull requests
to issues/attachments/{issue_number}/ and
pulls/attachments/{pull_number}/ directories
@@ -174,6 +177,37 @@ Customise the permissions for your use case, but for a personal account full bac
**Repository permissions**: Read access to contents, issues, metadata, pull requests, and webhooks.
GitHub Apps
~~~~~~~~~~~
GitHub Apps are ideal for organization backups in CI/CD. Tokens are scoped to specific repositories and expire after 1 hour.
**One-time setup:**
1. Create a GitHub App at *Settings -> Developer Settings -> GitHub Apps -> New GitHub App*
2. Set a name and homepage URL (can be any URL)
3. Uncheck "Webhook > Active" (not needed for backups)
4. Set permissions (same as fine-grained tokens above)
5. Click "Create GitHub App", then note the **App ID** shown on the next page
6. Under "Private keys", click "Generate a private key" and save the downloaded file
7. Go to *Install App* in your app's settings
8. Select the account/organization and which repositories to back up
**CI/CD usage with GitHub Actions:**
Store the App ID as a repository variable and the private key contents as a secret, then use ``actions/create-github-app-token``::
- uses: actions/create-github-app-token@v1
id: app-token
with:
app-id: ${{ vars.APP_ID }}
private-key: ${{ secrets.APP_PRIVATE_KEY }}
- run: github-backup myorg -t ${{ steps.app-token.outputs.token }} --as-app -o ./backup --all
Note: Installation tokens expire after 1 hour. For long-running backups, use a fine-grained personal access token instead.
Prefer SSH
~~~~~~~~~~
@@ -215,6 +249,8 @@ When you use the ``--lfs`` option, you will need to make sure you have Git LFS i
Instructions on how to do this can be found on https://git-lfs.github.com.
LFS objects are fetched for all refs, not just the current checkout, ensuring a complete backup of all LFS content across all branches and history.
About Attachments
-----------------
@@ -359,6 +395,28 @@ Debug an error/block or incomplete backup into a temporary directory. Omit "incr
github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER
Pipe a token from stdin to avoid storing it in environment variables or command history (Unix-like systems only)::
my-secret-manager get github-token | github-backup user -t file:///dev/stdin -o /backup --repositories
Restoring from Backup
=====================
This tool creates backups only, there is no inbuilt restore command.
**Git repositories, wikis, and gists** can be restored by pushing them back to GitHub as you would any git repository. For example, to restore a bare repository backup::
cd /tmp/white-house/repositories/petitions/repository
git push --mirror git@github.com:WhiteHouse/petitions.git
**Issues, pull requests, comments, and other metadata** are saved as JSON files for archival purposes. The GitHub API does not support recreating this data faithfully, creating issues via the API has limitations:
- New issue/PR numbers are assigned (original numbers cannot be set)
- Timestamps reflect creation time (original dates cannot be set)
- The API caller becomes the author (original authors cannot be set)
- Cross-references between issues and PRs will break
These are GitHub API limitations that affect all backup and migration tools, not just this one. Recreating issues with these limitations via the GitHub API is an exercise for the reader. The JSON backups remain useful for searching, auditing, or manual reference.
Development

@@ -1,76 +1,18 @@
#!/usr/bin/env python
"""
Backwards-compatible wrapper script.
The recommended way to run github-backup is via the installed command
(pip install github-backup) or python -m github_backup.
This script is kept for backwards compatibility with existing installations
that may reference this path directly.
"""
import logging
import os
import sys
from github_backup.github_backup import (
backup_account,
backup_repositories,
check_git_lfs_install,
filter_repositories,
get_auth,
get_authenticated_user,
logger,
mkdir_p,
parse_args,
retrieve_repositories,
)
# INFO and DEBUG go to stdout, WARNING and above go to stderr
log_format = logging.Formatter(
fmt="%(asctime)s.%(msecs)03d: %(message)s",
datefmt="%Y-%m-%dT%H:%M:%S",
)
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setLevel(logging.DEBUG)
stdout_handler.addFilter(lambda r: r.levelno < logging.WARNING)
stdout_handler.setFormatter(log_format)
stderr_handler = logging.StreamHandler(sys.stderr)
stderr_handler.setLevel(logging.WARNING)
stderr_handler.setFormatter(log_format)
logging.basicConfig(level=logging.INFO, handlers=[stdout_handler, stderr_handler])
def main():
args = parse_args()
if args.private and not get_auth(args):
logger.warning(
"The --private flag has no effect without authentication. "
"Use -t/--token, -f/--token-fine, or -u/--username to authenticate."
)
if args.quiet:
logger.setLevel(logging.WARNING)
output_directory = os.path.realpath(args.output_directory)
if not os.path.isdir(output_directory):
logger.info("Create output directory {0}".format(output_directory))
mkdir_p(output_directory)
if args.lfs_clone:
check_git_lfs_install()
if args.log_level:
log_level = logging.getLevelName(args.log_level.upper())
if isinstance(log_level, int):
logger.root.setLevel(log_level)
if not args.as_app:
logger.info("Backing up user {0} to {1}".format(args.user, output_directory))
authenticated_user = get_authenticated_user(args)
else:
authenticated_user = {"login": None}
repositories = retrieve_repositories(args, authenticated_user)
repositories = filter_repositories(args, repositories)
backup_repositories(args, output_directory, repositories)
backup_account(args, output_directory)
from github_backup.cli import main
from github_backup.github_backup import logger
if __name__ == "__main__":
try:

@@ -1 +1 @@
__version__ = "0.55.0"
__version__ = "0.57.0"

github_backup/__main__.py (new file, 13 lines)

@@ -0,0 +1,13 @@
"""Allow running as: python -m github_backup"""
import sys
from github_backup.cli import main
from github_backup.github_backup import logger
if __name__ == "__main__":
try:
main()
except Exception as e:
logger.error(str(e))
sys.exit(1)

github_backup/cli.py (new file, 82 lines)

@@ -0,0 +1,82 @@
#!/usr/bin/env python
"""Command-line interface for github-backup."""
import logging
import os
import sys
from github_backup.github_backup import (
backup_account,
backup_repositories,
check_git_lfs_install,
filter_repositories,
get_auth,
get_authenticated_user,
logger,
mkdir_p,
parse_args,
retrieve_repositories,
)
# INFO and DEBUG go to stdout, WARNING and above go to stderr
log_format = logging.Formatter(
fmt="%(asctime)s.%(msecs)03d: %(message)s",
datefmt="%Y-%m-%dT%H:%M:%S",
)
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setLevel(logging.DEBUG)
stdout_handler.addFilter(lambda r: r.levelno < logging.WARNING)
stdout_handler.setFormatter(log_format)
stderr_handler = logging.StreamHandler(sys.stderr)
stderr_handler.setLevel(logging.WARNING)
stderr_handler.setFormatter(log_format)
logging.basicConfig(level=logging.INFO, handlers=[stdout_handler, stderr_handler])
def main():
"""Main entry point for github-backup CLI."""
args = parse_args()
if args.private and not get_auth(args):
logger.warning(
"The --private flag has no effect without authentication. "
"Use -t/--token, -f/--token-fine, or -u/--username to authenticate."
)
if args.quiet:
logger.setLevel(logging.WARNING)
output_directory = os.path.realpath(args.output_directory)
if not os.path.isdir(output_directory):
logger.info("Create output directory {0}".format(output_directory))
mkdir_p(output_directory)
if args.lfs_clone:
check_git_lfs_install()
if args.log_level:
log_level = logging.getLevelName(args.log_level.upper())
if isinstance(log_level, int):
logger.root.setLevel(log_level)
if not args.as_app:
logger.info("Backing up user {0} to {1}".format(args.user, output_directory))
authenticated_user = get_authenticated_user(args)
else:
authenticated_user = {"login": None}
repositories = retrieve_repositories(args, authenticated_user)
repositories = filter_repositories(args, repositories)
backup_repositories(args, output_directory, repositories)
backup_account(args, output_directory)
if __name__ == "__main__":
try:
main()
except Exception as e:
logger.error(str(e))
sys.exit(1)

@@ -440,6 +440,12 @@ def parse_args(args=None):
dest="include_assets",
help="include assets alongside release information; only applies if including releases",
)
parser.add_argument(
"--skip-assets-on",
dest="skip_assets_on",
nargs="*",
help="skip asset downloads for these repositories",
)
parser.add_argument(
"--attachments",
action="store_true",
@@ -561,7 +567,7 @@ def get_github_host(args):
def read_file_contents(file_uri):
return open(file_uri[len(FILE_URI_PREFIX):], "rt").readline().strip()
return open(file_uri[len(FILE_URI_PREFIX) :], "rt").readline().strip()
def get_github_repo_url(args, repository):
@@ -631,7 +637,7 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
pass
raise RepositoryUnavailableError(
"Repository unavailable due to legal reasons (HTTP 451)",
dmca_url=dmca_url
dmca_url=dmca_url,
)
# Check if we got correct data
@@ -709,7 +715,7 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
# Parse Link header: <https://api.github.com/...?per_page=100&after=cursor>; rel="next"
for link in link_header.split(","):
if 'rel="next"' in link:
next_url = link[link.find("<") + 1:link.find(">")]
next_url = link[link.find("<") + 1 : link.find(">")]
break
if not next_url:
break
@@ -763,9 +769,7 @@ def _get_response(request, auth, template):
return r, errors
def _construct_request(
per_page, query_args, template, auth, as_app=None, fine=False
):
def _construct_request(per_page, query_args, template, auth, as_app=None, fine=False):
# If template is already a full URL with query params (from Link header), use it directly
if "?" in template and template.startswith("http"):
request_url = template
@@ -796,9 +800,6 @@ def _construct_request(
else:
auth = auth.encode("ascii")
request.add_header("Authorization", "token ".encode("ascii") + auth)
request.add_header(
"Accept", "application/vnd.github.machine-man-preview+json"
)
log_url = template if "?" not in template else template.split("?")[0]
if querystring:
@@ -1038,7 +1039,7 @@ def download_attachment_file(url, path, auth, as_app=False, fine=False):
bytes_downloaded += len(chunk)
# Atomic rename to final location
os.rename(temp_path, path)
os.replace(temp_path, path)
metadata["size_bytes"] = bytes_downloaded
metadata["success"] = True
@@ -1459,7 +1460,7 @@ def download_attachments(
# Rename to add extension (already atomic from download)
try:
os.rename(filepath, final_filepath)
os.replace(filepath, final_filepath)
metadata["saved_as"] = os.path.basename(final_filepath)
except Exception as e:
logger.warning(
@@ -1480,9 +1481,11 @@ def download_attachments(
manifest = {
"issue_number": number,
"issue_type": item_type,
"repository": f"{args.user}/{args.repository}"
if hasattr(args, "repository") and args.repository
else args.user,
"repository": (
f"{args.user}/{args.repository}"
if hasattr(args, "repository") and args.repository
else args.user
),
"manifest_updated_at": datetime.now(timezone.utc).isoformat(),
"attachments": attachment_metadata_list,
}
@@ -1490,7 +1493,7 @@ def download_attachments(
manifest_path = os.path.join(attachments_dir, "manifest.json")
with open(manifest_path + ".temp", "w") as f:
json.dump(manifest, f, indent=2)
os.rename(manifest_path + ".temp", manifest_path) # Atomic write
os.replace(manifest_path + ".temp", manifest_path) # Atomic write
logger.debug(
"Wrote manifest for {0} #{1}: {2} attachments".format(
item_type_display, number, len(attachment_metadata_list)
@@ -1538,9 +1541,7 @@ def retrieve_repositories(args, authenticated_user):
else:
repo_path = "{0}/{1}".format(args.user, args.repository)
single_request = True
template = "https://{0}/repos/{1}".format(
get_github_api_host(args), repo_path
)
template = "https://{0}/repos/{1}".format(get_github_api_host(args), repo_path)
repos = retrieve_data(args, template, single_request=single_request)
@@ -1565,7 +1566,10 @@ def retrieve_repositories(args, authenticated_user):
repos.extend(gists)
if args.include_starred_gists:
if not authenticated_user.get("login") or args.user.lower() != authenticated_user["login"].lower():
if (
not authenticated_user.get("login")
or args.user.lower() != authenticated_user["login"].lower()
):
logger.warning(
"Cannot retrieve starred gists for '%s'. GitHub only allows access to the authenticated user's starred gists.",
args.user,
@@ -1673,9 +1677,11 @@ def backup_repositories(args, output_directory, repositories):
include_gists = args.include_gists or args.include_starred_gists
include_starred = args.all_starred and repository.get("is_starred")
if (args.include_repository or args.include_everything) or (
include_gists and repository.get("is_gist")
) or include_starred:
if (
(args.include_repository or args.include_everything)
or (include_gists and repository.get("is_gist"))
or include_starred
):
repo_name = (
repository.get("name")
if not repository.get("is_gist")
@@ -1735,7 +1741,9 @@ def backup_repositories(args, output_directory, repositories):
include_assets=args.include_assets or args.include_everything,
)
except RepositoryUnavailableError as e:
logger.warning(f"Repository {repository['full_name']} is unavailable (HTTP 451)")
logger.warning(
f"Repository {repository['full_name']} is unavailable (HTTP 451)"
)
if e.dmca_url:
logger.warning(f"DMCA notice: {e.dmca_url}")
logger.info(f"Skipping remaining resources for {repository['full_name']}")
@@ -1795,7 +1803,11 @@ def backup_issues(args, repo_cwd, repository, repos_template):
modified = os.path.getmtime(issue_file)
modified = datetime.fromtimestamp(modified).strftime("%Y-%m-%dT%H:%M:%SZ")
if modified > issue["updated_at"]:
logger.info("Skipping issue {0} because it wasn't modified since last backup".format(number))
logger.info(
"Skipping issue {0} because it wasn't modified since last backup".format(
number
)
)
continue
if args.include_issue_comments or args.include_everything:
@@ -1811,7 +1823,7 @@ def backup_issues(args, repo_cwd, repository, repos_template):
with codecs.open(issue_file + ".temp", "w", encoding="utf-8") as f:
json_dump(issue, f)
os.rename(issue_file + ".temp", issue_file) # Unlike json_dump, this is atomic
os.replace(issue_file + ".temp", issue_file) # Atomic write
def backup_pulls(args, repo_cwd, repository, repos_template):
@@ -1869,7 +1881,11 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
modified = os.path.getmtime(pull_file)
modified = datetime.fromtimestamp(modified).strftime("%Y-%m-%dT%H:%M:%SZ")
if modified > pull["updated_at"]:
logger.info("Skipping pull request {0} because it wasn't modified since last backup".format(number))
logger.info(
"Skipping pull request {0} because it wasn't modified since last backup".format(
number
)
)
continue
if args.include_pull_comments or args.include_everything:
template = comments_regular_template.format(number)
@@ -1886,7 +1902,7 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
with codecs.open(pull_file + ".temp", "w", encoding="utf-8") as f:
json_dump(pull, f)
os.rename(pull_file + ".temp", pull_file) # Unlike json_dump, this is atomic
os.replace(pull_file + ".temp", pull_file) # Atomic write
def backup_milestones(args, repo_cwd, repository, repos_template):
@@ -1919,9 +1935,11 @@ def backup_milestones(args, repo_cwd, repository, repos_template):
elif written_count == 0:
logger.info("{0} milestones unchanged, skipped write".format(total))
else:
logger.info("Saved {0} of {1} milestones to disk ({2} unchanged)".format(
written_count, total, total - written_count
))
logger.info(
"Saved {0} of {1} milestones to disk ({2} unchanged)".format(
written_count, total, total - written_count
)
)
def backup_labels(args, repo_cwd, repository, repos_template):
@@ -1975,6 +1993,20 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
)
releases = releases[: args.number_of_latest_releases]
# Check if this repo should skip asset downloads (case-insensitive)
skip_assets = False
if include_assets:
repo_name = repository.get("name", "").lower()
repo_full_name = repository.get("full_name", "").lower()
skip_repos = [r.lower() for r in (args.skip_assets_on or [])]
skip_assets = repo_name in skip_repos or repo_full_name in skip_repos
if skip_assets:
logger.info(
"Skipping assets for {0} ({1} releases) due to --skip-assets-on".format(
repository.get("name"), len(releases)
)
)
# for each release, store it
written_count = 0
for release in releases:
@@ -1986,7 +2018,7 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
if json_dump_if_changed(release, output_filepath):
written_count += 1
if include_assets:
if include_assets and not skip_assets:
assets = retrieve_data(args, release["assets_url"])
if len(assets) > 0:
# give release asset files somewhere to live & download them (not including source archives)
@@ -2008,9 +2040,11 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F
elif written_count == 0:
logger.info("{0} releases unchanged, skipped write".format(total))
else:
logger.info("Saved {0} of {1} releases to disk ({2} unchanged)".format(
written_count, total, total - written_count
))
logger.info(
"Saved {0} of {1} releases to disk ({2} unchanged)".format(
written_count, total, total - written_count
)
)
def fetch_repository(
@@ -2024,9 +2058,12 @@ def fetch_repository(
):
if bare_clone:
if os.path.exists(local_dir):
clone_exists = subprocess.check_output(
["git", "rev-parse", "--is-bare-repository"], cwd=local_dir
) == b"true\n"
clone_exists = (
subprocess.check_output(
["git", "rev-parse", "--is-bare-repository"], cwd=local_dir
)
== b"true\n"
)
else:
clone_exists = False
else:
@@ -2047,7 +2084,9 @@ def fetch_repository(
)
else:
logger.info(
"Skipping {0} (repository not accessible - may be empty, private, or credentials invalid)".format(name)
"Skipping {0} (repository not accessible - may be empty, private, or credentials invalid)".format(
name
)
)
return
@@ -2090,11 +2129,13 @@ def fetch_repository(
git_command.pop()
logging_subprocess(git_command, cwd=local_dir)
else:
if lfs_clone:
git_command = ["git", "lfs", "clone", remote_url, local_dir]
else:
git_command = ["git", "clone", remote_url, local_dir]
git_command = ["git", "clone", remote_url, local_dir]
logging_subprocess(git_command)
if lfs_clone:
git_command = ["git", "lfs", "fetch", "--all", "--prune"]
if no_prune:
git_command.pop()
logging_subprocess(git_command, cwd=local_dir)
def backup_account(args, output_directory):
@@ -2203,5 +2244,5 @@ def json_dump_if_changed(data, output_file_path):
temp_file = output_file_path + ".temp"
with codecs.open(temp_file, "w", encoding="utf-8") as f:
f.write(new_content)
os.rename(temp_file, output_file_path) # Atomic on POSIX systems
os.replace(temp_file, output_file_path) # Atomic write
return True

@@ -1,5 +1,5 @@
autopep8==2.3.2
black==25.11.0
black==25.12.0
bleach==6.3.0
certifi==2025.11.12
charset-normalizer==3.4.4
@@ -8,7 +8,7 @@ colorama==0.4.6
docutils==0.22.3
flake8==7.3.0
gitchangelog==3.0.4
pytest==9.0.1
pytest==9.0.2
idna==3.11
importlib-metadata==8.7.0
jaraco.classes==3.4.0
@@ -21,7 +21,7 @@ mypy-extensions==1.1.0
packaging==25.0
pathspec==0.12.1
pkginfo==1.12.1.2
platformdirs==4.5.0
platformdirs==4.5.1
pycodestyle==2.14.0
pyflakes==3.4.0
Pygments==2.19.2
@@ -35,6 +35,6 @@ setuptools==80.9.0
six==1.17.0
tqdm==4.67.1
twine==6.2.0
urllib3==2.6.0
urllib3==2.6.1
webencodings==0.5.1
zipp==3.23.0

@@ -33,7 +33,11 @@ setup(
author="Jose Diaz-Gonzalez",
author_email="github-backup@josediazgonzalez.com",
packages=["github_backup"],
scripts=["bin/github-backup"],
entry_points={
"console_scripts": [
"github-backup=github_backup.cli:main",
],
},
url="http://github.com/josegonzalez/python-github-backup",
license="MIT",
classifiers=[

@@ -0,0 +1,320 @@
"""Tests for --skip-assets-on flag behavior (issue #135)."""

import pytest
from unittest.mock import Mock, patch

from github_backup import github_backup


class TestSkipAssetsOn:
    """Test suite for --skip-assets-on flag.

    Issue #135: Allow skipping asset downloads for specific repositories
    while still backing up release metadata.
    """

    def _create_mock_args(self, **overrides):
        """Create a mock args object with sensible defaults."""
        args = Mock()
        args.user = "testuser"
        args.output_directory = "/tmp/backup"
        args.include_repository = False
        args.include_everything = False
        args.include_gists = False
        args.include_starred_gists = False
        args.all_starred = False
        args.skip_existing = False
        args.bare_clone = False
        args.lfs_clone = False
        args.no_prune = False
        args.include_wiki = False
        args.include_issues = False
        args.include_issue_comments = False
        args.include_issue_events = False
        args.include_pulls = False
        args.include_pull_comments = False
        args.include_pull_commits = False
        args.include_pull_details = False
        args.include_labels = False
        args.include_hooks = False
        args.include_milestones = False
        args.include_releases = True
        args.include_assets = True
        args.skip_assets_on = []
        args.include_attachments = False
        args.incremental = False
        args.incremental_by_files = False
        args.github_host = None
        args.prefer_ssh = False
        args.token_classic = "test-token"
        args.token_fine = None
        args.username = None
        args.password = None
        args.as_app = False
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.skip_prerelease = False
        args.number_of_latest_releases = None

        for key, value in overrides.items():
            setattr(args, key, value)

        return args

    def _create_mock_repository(self, name="test-repo", owner="testuser"):
        """Create a mock repository object."""
        return {
            "name": name,
            "full_name": f"{owner}/{name}",
            "owner": {"login": owner},
            "private": False,
            "fork": False,
            "has_wiki": False,
        }

    def _create_mock_release(self, tag="v1.0.0"):
        """Create a mock release object."""
        return {
            "tag_name": tag,
            "name": tag,
            "prerelease": False,
            "draft": False,
            "assets_url": f"https://api.github.com/repos/testuser/test-repo/releases/{tag}/assets",
        }

    def _create_mock_asset(self, name="asset.zip"):
        """Create a mock asset object."""
        return {
            "name": name,
            "url": f"https://api.github.com/repos/testuser/test-repo/releases/assets/{name}",
        }


class TestSkipAssetsOnArgumentParsing(TestSkipAssetsOn):
    """Tests for --skip-assets-on argument parsing."""

    def test_skip_assets_on_not_set_defaults_to_none(self):
        """When --skip-assets-on is not specified, it should default to None."""
        args = github_backup.parse_args(["testuser"])
        assert args.skip_assets_on is None

    def test_skip_assets_on_single_repo(self):
        """Single --skip-assets-on should create list with one item."""
        args = github_backup.parse_args(["testuser", "--skip-assets-on", "big-repo"])
        assert args.skip_assets_on == ["big-repo"]

    def test_skip_assets_on_multiple_repos(self):
        """Multiple repos can be specified space-separated (like --exclude)."""
        args = github_backup.parse_args(
            [
                "testuser",
                "--skip-assets-on",
                "big-repo",
                "another-repo",
                "owner/third-repo",
            ]
        )
        assert args.skip_assets_on == ["big-repo", "another-repo", "owner/third-repo"]


class TestSkipAssetsOnBehavior(TestSkipAssetsOn):
    """Tests for --skip-assets-on behavior in backup_releases."""

    @patch("github_backup.github_backup.download_file")
    @patch("github_backup.github_backup.retrieve_data")
    @patch("github_backup.github_backup.mkdir_p")
    @patch("github_backup.github_backup.json_dump_if_changed")
    def test_assets_downloaded_when_not_skipped(
        self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
    ):
        """Assets should be downloaded when repo is not in skip list."""
        args = self._create_mock_args(skip_assets_on=[])
        repository = self._create_mock_repository(name="normal-repo")
        release = self._create_mock_release()
        asset = self._create_mock_asset()

        mock_json_dump.return_value = True
        mock_retrieve.side_effect = [
            [release],  # First call: get releases
            [asset],  # Second call: get assets
        ]

        with patch("os.path.join", side_effect=lambda *args: "/".join(args)):
            github_backup.backup_releases(
                args,
                "/tmp/backup/repositories/normal-repo",
                repository,
                "https://api.github.com/repos/{owner}/{repo}",
                include_assets=True,
            )

        # download_file should have been called for the asset
        mock_download.assert_called_once()

    @patch("github_backup.github_backup.download_file")
    @patch("github_backup.github_backup.retrieve_data")
    @patch("github_backup.github_backup.mkdir_p")
    @patch("github_backup.github_backup.json_dump_if_changed")
    def test_assets_skipped_when_repo_name_matches(
        self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
    ):
        """Assets should be skipped when repo name is in skip list."""
        args = self._create_mock_args(skip_assets_on=["big-repo"])
        repository = self._create_mock_repository(name="big-repo")
        release = self._create_mock_release()

        mock_json_dump.return_value = True
        mock_retrieve.return_value = [release]

        github_backup.backup_releases(
            args,
            "/tmp/backup/repositories/big-repo",
            repository,
            "https://api.github.com/repos/{owner}/{repo}",
            include_assets=True,
        )

        # download_file should NOT have been called
        mock_download.assert_not_called()

    @patch("github_backup.github_backup.download_file")
    @patch("github_backup.github_backup.retrieve_data")
    @patch("github_backup.github_backup.mkdir_p")
    @patch("github_backup.github_backup.json_dump_if_changed")
    def test_assets_skipped_when_full_name_matches(
        self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
    ):
        """Assets should be skipped when owner/repo format matches."""
        args = self._create_mock_args(skip_assets_on=["otheruser/big-repo"])
        repository = self._create_mock_repository(name="big-repo", owner="otheruser")
        release = self._create_mock_release()

        mock_json_dump.return_value = True
        mock_retrieve.return_value = [release]

        github_backup.backup_releases(
            args,
            "/tmp/backup/repositories/big-repo",
            repository,
            "https://api.github.com/repos/{owner}/{repo}",
            include_assets=True,
        )

        # download_file should NOT have been called
        mock_download.assert_not_called()

    @patch("github_backup.github_backup.download_file")
    @patch("github_backup.github_backup.retrieve_data")
    @patch("github_backup.github_backup.mkdir_p")
    @patch("github_backup.github_backup.json_dump_if_changed")
    def test_case_insensitive_matching(
        self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
    ):
        """Skip matching should be case-insensitive."""
        # User types uppercase, repo name is lowercase
        args = self._create_mock_args(skip_assets_on=["BIG-REPO"])
        repository = self._create_mock_repository(name="big-repo")
        release = self._create_mock_release()

        mock_json_dump.return_value = True
        mock_retrieve.return_value = [release]

        github_backup.backup_releases(
            args,
            "/tmp/backup/repositories/big-repo",
            repository,
            "https://api.github.com/repos/{owner}/{repo}",
            include_assets=True,
        )

        # download_file should NOT have been called (case-insensitive match)
        assert not mock_download.called

    @patch("github_backup.github_backup.download_file")
    @patch("github_backup.github_backup.retrieve_data")
    @patch("github_backup.github_backup.mkdir_p")
    @patch("github_backup.github_backup.json_dump_if_changed")
    def test_multiple_skip_repos(
        self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
    ):
        """Multiple repos in skip list should all be skipped."""
        args = self._create_mock_args(skip_assets_on=["repo1", "repo2", "repo3"])
        repository = self._create_mock_repository(name="repo2")
        release = self._create_mock_release()

        mock_json_dump.return_value = True
        mock_retrieve.return_value = [release]

        github_backup.backup_releases(
            args,
            "/tmp/backup/repositories/repo2",
            repository,
            "https://api.github.com/repos/{owner}/{repo}",
            include_assets=True,
        )

        # download_file should NOT have been called
        mock_download.assert_not_called()

    @patch("github_backup.github_backup.download_file")
    @patch("github_backup.github_backup.retrieve_data")
    @patch("github_backup.github_backup.mkdir_p")
    @patch("github_backup.github_backup.json_dump_if_changed")
    def test_release_metadata_still_saved_when_assets_skipped(
        self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
    ):
        """Release JSON should still be saved even when assets are skipped."""
        args = self._create_mock_args(skip_assets_on=["big-repo"])
        repository = self._create_mock_repository(name="big-repo")
        release = self._create_mock_release()

        mock_json_dump.return_value = True
        mock_retrieve.return_value = [release]

        github_backup.backup_releases(
            args,
            "/tmp/backup/repositories/big-repo",
            repository,
            "https://api.github.com/repos/{owner}/{repo}",
            include_assets=True,
        )

        # json_dump_if_changed should have been called for release metadata
        mock_json_dump.assert_called_once()
        # But download_file should NOT have been called
        mock_download.assert_not_called()

    @patch("github_backup.github_backup.download_file")
    @patch("github_backup.github_backup.retrieve_data")
    @patch("github_backup.github_backup.mkdir_p")
    @patch("github_backup.github_backup.json_dump_if_changed")
    def test_non_matching_repo_still_downloads_assets(
        self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download
    ):
        """Repos not in skip list should still download assets."""
        args = self._create_mock_args(skip_assets_on=["other-repo"])
        repository = self._create_mock_repository(name="normal-repo")
        release = self._create_mock_release()
        asset = self._create_mock_asset()

        mock_json_dump.return_value = True
        mock_retrieve.side_effect = [
            [release],  # First call: get releases
            [asset],  # Second call: get assets
        ]

        with patch("os.path.join", side_effect=lambda *args: "/".join(args)):
            github_backup.backup_releases(
                args,
                "/tmp/backup/repositories/normal-repo",
                repository,
                "https://api.github.com/repos/{owner}/{repo}",
                include_assets=True,
            )

        # download_file SHOULD have been called
        mock_download.assert_called_once()


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
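
The tests above pin down three behaviours: the flag defaults to `None`, it accepts space-separated values, and matching is case-insensitive against either the repo name or `owner/repo`. A minimal sketch that would satisfy those expectations follows. The project's actual argument definition and the check inside `backup_releases` are not shown in this diff, so `build_parser` and `should_skip_assets` are hypothetical names and the logic is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the pieces the tests exercise."""
    parser = argparse.ArgumentParser(prog="backup-sketch")
    parser.add_argument("user")
    # nargs="+" collects space-separated values into a list; default stays None.
    parser.add_argument(
        "--skip-assets-on", dest="skip_assets_on", nargs="+", default=None
    )
    return parser


def should_skip_assets(repository, skip_assets_on):
    """Hypothetical check: True if the repo matches an entry in the skip list."""
    if not skip_assets_on:  # covers None (flag absent) and an empty list
        return False
    skip = {entry.lower() for entry in skip_assets_on}
    name = repository.get("name", "").lower()
    full_name = repository.get("full_name", "").lower()
    # Match on the bare repo name or on the owner/repo form.
    return name in skip or full_name in skip


if __name__ == "__main__":
    args = build_parser().parse_args(["testuser", "--skip-assets-on", "BIG-REPO"])
    repo = {"name": "big-repo", "full_name": "testuser/big-repo"}
    print(should_skip_assets(repo, args.skip_assets_on))
```

Lowercasing both sides keeps the comparison simple. A real implementation may normalise differently.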