mirror of
https://github.com/josegonzalez/python-github-backup.git
synced 2025-12-05 08:08:02 +01:00
312 lines
17 KiB
ReStructuredText
312 lines
17 KiB
ReStructuredText
=============
|
|
github-backup
|
|
=============
|
|
|
|
|PyPI| |Python Versions|
|
|
|
|
The package can be used to backup an *entire* `Github <https://github.com/>`_ organization, repository or user account, including starred, issues and wikis in the most appropriate format (clones for wikis, json files for issues).
|
|
|
|
Requirements
|
|
============
|
|
|
|
- GIT 1.9+
|
|
- Python
|
|
|
|
Installation
|
|
============
|
|
|
|
Using PIP via PyPI::
|
|
|
|
pip install github-backup
|
|
|
|
Using PIP via Github (more likely the latest version)::
|
|
|
|
pip install git+https://github.com/josegonzalez/python-github-backup.git#egg=github-backup
|
|
|
|
*Note for Python newcomers: even after you've installed pip and python etc, (e.g. debian based: ``sudo apt install pip``), an installed python scripts are unlikely to be included in your ``$PATH`` by default, this means it cannot be run directly in terminal with ``$ github-backup ...``, you can either add pythons install path to your environments ``$PATH`` or call the script directly e.g. `$ ~/.local/bin/github-backup`.*
|
|
|
|
|
|
|
|
Usage Details
|
|
=============
|
|
|
|
Authentication
|
|
--------------
|
|
|
|
**Password-based authentication** will fail if you have two-factor authentication enabled, and will `be deprecated <https://github.blog/2023-03-09-raising-the-bar-for-software-security-github-2fa-begins-march-13/>`_ by 2023 EOY.
|
|
|
|
``--username`` is used for basic password authentication and separate from the position argument ``USER``, which specifies the user account you wish to backing up.
|
|
|
|
**Classic tokens** are `slightly less secure <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#personal-access-tokens-classic>`_ as they provide very coarse-grained permissions.
|
|
|
|
If you need authentication for long-running backups (i.e. for private repositories etc) it is therefore recommended to use **fine-grained personal access token** ``-f TOKEN_FINE``.
|
|
|
|
|
|
Fine Tokens
|
|
~~~~~~~~~~~
|
|
|
|
Under Settings -> Developer Settings -> Personal access tokens -> Fine-grained Tokens. You can "generate new token" and choose the repository scope, either specific repos or all repos.
|
|
|
|
You can customise the permissions for use case, but for a personal account full backup you'll need to enable the following permissions:
|
|
|
|
**User permissions**: Read access to followers, starring, and watching.
|
|
|
|
**Repository permissions**: Read access to actions, code, commit statuses, environments, issues, merge queues, metadata, pages, pull requests, repository advisories, and repository hooks.
|
|
|
|
|
|
Prefer SSH
|
|
~~~~~~~~~~
|
|
|
|
Using the ``-prefer-ssh`` argument will use ssh for cloning the git repos. If cloning repos is enabled with ``--repositories``, ``--all-starred``, ``--wikis``, ``--gists``, ``--starred-gists``.
|
|
|
|
To clone with SSH, you'll need SSH authentication setup `as usual with Github <https://docs.github.com/en/authentication/connecting-to-github-with-ssh>`_, e.g. via SSH public and private keys.
|
|
|
|
All other connections will still use their own protocol, e.g. API requests for issues using HTTPS.
|
|
|
|
|
|
Using the Keychain on Mac OSX
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
Note: On Mac OSX the token can be stored securely in the user's keychain. To do this:
|
|
|
|
1. Open Keychain from "Applications -> Utilities -> Keychain Access"
|
|
2. Add a new password item using "File -> New Password Item"
|
|
3. Enter a name in the "Keychain Item Name" box. You must provide this name to github-backup using the --keychain-name argument.
|
|
4. Enter an account name in the "Account Name" box, enter your Github username as set above. You must provide this name to github-backup using the --keychain-account argument.
|
|
5. Enter your Github personal access token in the "Password" box
|
|
|
|
Note: When you run github-backup, you will be asked whether you want to allow "security" to use your confidential information stored in your keychain. You have two options:
|
|
|
|
1. **Allow:** In this case you will need to click "Allow" each time you run `github-backup`
|
|
2. **Always Allow:** In this case, you will not be asked for permission when you run `github-backup` in future. This is less secure, but is required if you want to schedule `github-backup` to run automatically
|
|
|
|
|
|
Github Rate-limit and Throttling
|
|
--------------------------------
|
|
|
|
``github-backup`` will automatically throttle itself based on feedback from the Github API. The API is usually rate-limited to 5000 calls per hour, and it tells github-backup when to pause and wait until the limit is reset in the next hour.
|
|
|
|
On a fast connection this can result in safe (~20 min) pauses and bursts of API calls and downloading periodically maxing our your connection, is this is not suitable `it has been observed <https://github.com/josegonzalez/python-github-backup/issues/76#issuecomment-636158717>`_ under real-world conditions that overriding the throttle with ``--throttle-limit 5000 --throttle-pause 0.6`` provides a smooth rate across the hour, although a ``--throttle-pause 0.72`` (3600 seconds [1 hour] / 5000 limit) is theoretically safer.
|
|
|
|
|
|
About Git LFS
|
|
-------------
|
|
|
|
When you use the ``--lfs`` option, you will need to make sure you have Git LFS installed.
|
|
|
|
Instructions on how to do this can be found on https://git-lfs.github.com.
|
|
|
|
|
|
Gotchas / Known-issues
|
|
======================
|
|
|
|
All is not all
|
|
--------------
|
|
|
|
The ``--all`` argument does not include; cloning private repos (``-P, --private``), cloning forks (``-F, --fork``) cloning starred repositories (``--all-starred``), ``--pull-details``, cloning LFS repositories (``--lfs ``), cloning gists (``--starred-gists``) or starred gist repos (``--starred-gists``).
|
|
|
|
All Starred can be very large
|
|
------------------------
|
|
|
|
Using the ``--all-starred`` argument to clone all starred repositories may use a large amount of storage space, especially if ``--all`` or more arguments are used. e.g. thousands of JSON issues files.
|
|
|
|
Incremental Backup
|
|
-------------------
|
|
|
|
Incremental (``-i, --incremental``) backups in this context means, requesting only parts since the last run (successful or not). e.g. only request issues from the API since the last run. This means any blocking errors on previous runs can cause large missing chucks of data.
|
|
|
|
Known blocking errors
|
|
---------------------
|
|
|
|
Some errors will block the backup and exit the script, such as receiving a 403 Forbidden error from the Github API. If the incremental argument is used, this will result in the next backup only requesting data from the API since the last blocked/failed run. It's therefore recommended to only use the incremental argument if the output/result is being actively monitored to avoid unexpected missing data in your backup.
|
|
|
|
Starred public repo blocking with all argument
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Since the ``--all`` argument includes ``--hooks``, if you use ``--all`` and ``--all-starred`` to clone a users starred public repositories, the backup will error and block the backup continuing. This is due to needing the correct permission for ``-hooks`` on public repos.
|
|
|
|
Releases blocking error
|
|
~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
A ``--releases`` (required for ``--assets``) error will sometimes block the backup. If you're backing up a lot of repositories with releases e.g. an organisation or ``--starred-gists`` you may need to remove ``--releases`` (and therefore ``--assets``) to complete a backup. Documented in `issue 209 <https://github.com/josegonzalez/python-github-backup/issues/209>`_.
|
|
|
|
|
|
Bare is actually mirror
|
|
-----------------------
|
|
|
|
Using the bare clone argument (``--bare``) will actually call git's ``clone --mirror`` command. There's a subtle difference between `bare <https://www.git-scm.com/docs/git-clone#Documentation/git-clone.txt---bare>`_ and `mirror <https://www.git-scm.com/docs/git-clone#Documentation/git-clone.txt---mirror>`_ clone.::
|
|
|
|
Compared to --bare, --mirror not only maps local branches of the source to local branches of the target, it maps all refs (including remote-tracking branches, notes etc.) and sets up a refspec configuration such that all these refs are overwritten by a git remote update in the target repository.
|
|
|
|
|
|
Starred gists stored with your gists
|
|
------------------------------------
|
|
|
|
The starred repo cloning (``--all-starred``) argument stores starred repos under a separate directory to your own repositories. Using ``--starred-gists`` will store them within the same directory as your own gists ``--gists``.
|
|
|
|
|
|
Skipping existing may leave you with incomplete backups
|
|
-------------------------------------------------------
|
|
|
|
The ``--skip-existing`` argument will skip any existing backup if the directory exists, if the backup in that directory was successfully completed or not (perhaps due to a blocking error).
|
|
|
|
|
|
Basic Help
|
|
===========
|
|
|
|
Show the CLI help output::
|
|
github-backup -h
|
|
|
|
CLI Help output::
|
|
github-backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN_CLASSIC]
|
|
[-f TOKEN_FINE] [--as-app] [-o OUTPUT_DIRECTORY]
|
|
[-l LOG_LEVEL] [-i] [--starred] [--all-starred]
|
|
[--watched] [--followers] [--following] [--all] [--issues]
|
|
[--issue-comments] [--issue-events] [--pulls]
|
|
[--pull-comments] [--pull-commits] [--pull-details]
|
|
[--labels] [--hooks] [--milestones] [--repositories]
|
|
[--bare] [--lfs] [--wikis] [--gists] [--starred-gists]
|
|
[--skip-archived] [--skip-existing] [-L [LANGUAGES ...]]
|
|
[-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY]
|
|
[-P] [-F] [--prefer-ssh] [-v]
|
|
[--keychain-name OSX_KEYCHAIN_ITEM_NAME]
|
|
[--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT]
|
|
[--releases] [--assets] [--exclude [REPOSITORY [REPOSITORY ...]]
|
|
[--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE]
|
|
USER
|
|
|
|
Backup a github account
|
|
|
|
positional arguments:
|
|
USER github username
|
|
|
|
optional arguments:
|
|
-h, --help show this help message and exit
|
|
-u USERNAME, --username USERNAME
|
|
username for basic auth
|
|
-p PASSWORD, --password PASSWORD
|
|
password for basic auth. If a username is given but
|
|
not a password, the password will be prompted for.
|
|
-f TOKEN_FINE, --token-fine TOKEN_FINE
|
|
fine-grained personal access token or path to token
|
|
(file://...)
|
|
-t TOKEN_CLASSIC, --token TOKEN_CLASSIC
|
|
personal access, OAuth, or JSON Web token, or path to
|
|
token (file://...)
|
|
--as-app authenticate as github app instead of as a user.
|
|
-o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
|
|
directory at which to backup the repositories
|
|
-l LOG_LEVEL, --log-level LOG_LEVEL
|
|
log level to use (default: info, possible levels:
|
|
debug, info, warning, error, critical)
|
|
-i, --incremental incremental backup
|
|
--starred include JSON output of starred repositories in backup
|
|
--all-starred include starred repositories in backup [*]
|
|
--watched include JSON output of watched repositories in backup
|
|
--followers include JSON output of followers in backup
|
|
--following include JSON output of following users in backup
|
|
--all include everything in backup (not including [*])
|
|
--issues include issues in backup
|
|
--issue-comments include issue comments in backup
|
|
--issue-events include issue events in backup
|
|
--pulls include pull requests in backup
|
|
--pull-comments include pull request review comments in backup
|
|
--pull-commits include pull request commits in backup
|
|
--pull-details include more pull request details in backup [*]
|
|
--labels include labels in backup
|
|
--hooks include hooks in backup (works only when
|
|
authenticated)
|
|
--milestones include milestones in backup
|
|
--repositories include repository clone in backup
|
|
--bare clone bare repositories
|
|
--lfs clone LFS repositories (requires Git LFS to be
|
|
installed, https://git-lfs.github.com) [*]
|
|
--wikis include wiki clone in backup
|
|
--gists include gists in backup [*]
|
|
--starred-gists include starred gists in backup [*]
|
|
--skip-existing skip project if a backup directory exists
|
|
-L [LANGUAGES [LANGUAGES ...]], --languages [LANGUAGES [LANGUAGES ...]]
|
|
only allow these languages
|
|
-N NAME_REGEX, --name-regex NAME_REGEX
|
|
python regex to match names against
|
|
-H GITHUB_HOST, --github-host GITHUB_HOST
|
|
GitHub Enterprise hostname
|
|
-O, --organization whether or not this is an organization user
|
|
-R REPOSITORY, --repository REPOSITORY
|
|
name of repository to limit backup to
|
|
-P, --private include private repositories [*]
|
|
-F, --fork include forked repositories [*]
|
|
--prefer-ssh Clone repositories using SSH instead of HTTPS
|
|
-v, --version show program's version number and exit
|
|
--keychain-name OSX_KEYCHAIN_ITEM_NAME
|
|
OSX ONLY: name field of password item in OSX keychain
|
|
that holds the personal access or OAuth token
|
|
--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT
|
|
OSX ONLY: account field of password item in OSX
|
|
keychain that holds the personal access or OAuth token
|
|
--releases include release information, not including assets or
|
|
binaries
|
|
--assets include assets alongside release information; only
|
|
applies if including releases
|
|
--exclude [REPOSITORY [REPOSITORY ...]]
|
|
names of repositories to exclude from backup.
|
|
--throttle-limit THROTTLE_LIMIT
|
|
start throttling of GitHub API requests after this
|
|
amount of API requests remain
|
|
--throttle-pause THROTTLE_PAUSE
|
|
wait this amount of seconds when API request
|
|
throttling is active (default: 30.0, requires
|
|
--throttle-limit to be set)
|
|
|
|
|
|
Github Backup Examples
|
|
========
|
|
|
|
Backup all repositories, including private ones using a classic token::
|
|
|
|
export ACCESS_TOKEN=SOME-GITHUB-TOKEN
|
|
github-backup WhiteHouse --token $ACCESS_TOKEN --organization --output-directory /tmp/white-house --repositories --private
|
|
|
|
Use a fine-grained access token to backup a single organization repository with everything else (wiki, pull requests, comments, issues etc)::
|
|
|
|
export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
|
|
ORGANIZATION=docker
|
|
REPO=cli
|
|
# e.g. git@github.com:docker/cli.git
|
|
github-backup $ORGANIZATION -P -f $FINE_ACCESS_TOKEN -o . --all -O -R $REPO
|
|
|
|
Quietly and incrementally backup most useful Github data for a user (public and private) with SSH cloning including; all issues, pulls, all public starred repos and gists (omitting "hooks", "releases" and therefore "assets" to prevent blocking) into an output folder in your home directory. *Great for a cron job. Omit "incremental" to fix a previous incomplete backup.*::
|
|
|
|
export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
|
|
GH_USER=YOUR-GITHUB-USER
|
|
|
|
github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER
|
|
|
|
Debug an erroring/blocking or incomplete backup into a temporary directory::
|
|
|
|
export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
|
|
GH_USER=YOUR-GITHUB-USER
|
|
|
|
github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER
|
|
|
|
|
|
|
|
|
|
Development
|
|
===========
|
|
|
|
This project is considered feature complete for the primary maintainer. If you would like a bugfix or enhancement and can not sponsor the work, pull requests are welcome. Feel free to contact the maintainer for consulting estimates if desired.
|
|
|
|
Testing
|
|
-------
|
|
|
|
This project currently contains no unit tests. To run linting::
|
|
|
|
pip install flake8
|
|
flake8 --ignore=E501
|
|
|
|
|
|
.. |PyPI| image:: https://img.shields.io/pypi/v/github-backup.svg
|
|
:target: https://pypi.python.org/pypi/github-backup/
|
|
.. |Python Versions| image:: https://img.shields.io/pypi/pyversions/github-backup.svg
|
|
:target: https://github.com/josegonzalez/python-github-backup
|