Compare commits

62 Commits

Author SHA1 Message Date
Jose Diaz-Gonzalez
498d9eba32 Release version 0.27.0 2020-01-21 21:29:44 -05:00
Jose Diaz-Gonzalez
0f82b1717c Merge pull request #142 from einsteinx2/issue/141-import-error-version
Fixed script fails if not installed from pip
2020-01-21 21:28:22 -05:00
Ben Baron
4d5126f303 Fixed script fails if not installed from pip
At the top of the script, the line `from github_backup import __version__` gets the script's version number to use when the script is called with the `-v` or `--version` flag. The problem is that if the script hasn't been installed via pip (for example, I cloned the repo directly to my backup server), the script fails with an import exception.

Also presumably it will always use the version number from pip even if running a modified version from git or a fork or something, though this does not fix that as I have no idea how to check if it's running the pip installed version or not. But at least the script will now work fine if cloned from git or just copied to another machine.

closes https://github.com/josegonzalez/python-github-backup/issues/141
2020-01-21 21:15:57 -05:00
Jose Diaz-Gonzalez
98919c82c9 Merge pull request #136 from einsteinx2/issue/88-macos-keychain-broken-python3
Fixed macOS keychain access when using Python 3
2020-01-07 11:44:36 -05:00
Jose Diaz-Gonzalez
045eacbf18 Merge pull request #137 from einsteinx2/issue/134-only-use-auth-token-when-needed
Public repos no longer include the auth token
2020-01-07 11:44:23 -05:00
Jose Diaz-Gonzalez
7a234ba7ed Merge pull request #130 from einsteinx2/issue/129-fix-crash-on-release-asset-download-error
Crash when an release asset doesn't exist
2020-01-07 11:44:00 -05:00
Ben Baron
e8a255b450 Public repos no longer include the auth token
When backing up repositories using an auth token and https, the GitHub personal auth token is leaked in each backed up repository. It is included in the URL of each repository's git remote url.

The token is not needed for public repositories, which can be accessed without it, and storing it can cause issues later if the token is ever changed, so I think it makes more sense not to have the token stored in each repo backup. I think the token should only be "leaked" like this out of necessity, e.g. it's a private repository and the --prefer-ssh option was not chosen, so https with auth token was required to perform the clone.
2020-01-06 21:25:54 -05:00
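
Illustrative sketch (not taken from the commit): the rule described above, embedding the token only when the repository is private, could look roughly like this. The function name `build_clone_url` and the use of the API's `full_name` field are assumptions made for the example; the script's own `get_github_repo_url` takes different arguments.

```python
def build_clone_url(repository, auth, host="github.com"):
    """Return an HTTPS clone URL, embedding credentials only for private repos.

    `repository` is a dict shaped like a GitHub API repository object
    (hypothetical input for this sketch); `auth` is a string such as
    "token:x-oauth-basic", or None when no credentials are configured.
    """
    path = repository["full_name"]
    if auth and repository.get("private"):
        # Private repository over HTTPS: the token is required to clone.
        return "https://{0}@{1}/{2}.git".format(auth, host, path)
    # Public repository: no credentials end up in the git remote URL.
    return "https://{0}/{1}.git".format(host, path)
```

With this shape, a backup of a public repository keeps a plain remote URL, so rotating the token later does not invalidate anything stored in that clone.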
Ben Baron
81a2f762da Fixed macOS keychain access when using Python 3
Python 3 is returning bytes rather than a string, so the string concatenation to create the auth variable was throwing an exception which the script was interpreting to mean it couldn't find the password. Adding a conversion to string first fixed the issue.
2020-01-06 21:10:50 -05:00
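
Illustrative sketch (not taken from the commit): under Python 3, `subprocess.check_output` returns `bytes`, and concatenating `bytes` with a `str` raises `TypeError`, which a bare `except:` would then report as a missing password. The helper `to_text` and the sample token below are hypothetical stand-ins for the keychain lookup.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it first if it is a bytes object."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-in for subprocess.check_output([...]).strip(), which yields bytes
# on Python 3; the token value here is made up for the example.
raw_token = b"example-token\n".strip()

# Decoding first makes the concatenation valid on Python 3.
auth = to_text(raw_token) + ":" + "x-oauth-basic"
```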
Ben Baron
cb0293cbe5 Fixed comment typo 2020-01-06 14:15:41 -05:00
Jose Diaz-Gonzalez
252c25461f Merge pull request #132 from einsteinx2/issue/126-prevent-overwriting-release-assets
Separate release assets and skip re-downloading
2020-01-06 13:12:33 -05:00
Jose Diaz-Gonzalez
e8ed03fd06 Merge pull request #131 from einsteinx2/improve-gitignore
Improved gitignore, macOS files and IDE configs
2020-01-06 13:11:06 -05:00
Ben Baron
38010d7c39 Switched log_info to log_warning in download_file 2020-01-06 13:06:22 -05:00
Ben Baron
71b4288e6b Added newline to end of file 2020-01-06 13:04:40 -05:00
Ben Baron
ba4fa9fa2d Moved asset downloading loop inside the if block 2020-01-06 12:50:33 -05:00
Ben Baron
869f761c90 Separate release assets and skip re-downloading
Currently the script puts all release assets into the same folder called `releases`. So any time 2 release files have the same name, only the last one downloaded is actually saved. A particularly bad example of this is MacDownApp/macdown where all of their releases are named `MacDown.app.zip`. So even though they have 36 releases and all 36 are downloaded, only the last one is actually saved.

With this change, each release's assets are now stored in a subfolder inside `releases` named after the release name. There could still be edge cases if two releases have the same name, but this is still much safer than the previous behavior.

This change also now checks if the asset file already exists on disk and skips downloading it. This drastically speeds up additional syncs as it no longer downloads every single release every single time. It will now only download new releases, which I believe is the expected behavior.

closes https://github.com/josegonzalez/python-github-backup/issues/126
2020-01-06 12:40:47 -05:00
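
Illustrative sketch (not taken from the commit): the two behaviors described above, one subfolder per release and skipping assets already on disk, might be expressed as below. The function names and the replacement of path separators in the release name are assumptions for this example, not the script's actual code.

```python
import os


def asset_path(releases_dir, release_name, asset_name):
    """Build the destination path for an asset inside a per-release subfolder."""
    # Assumption: path separators in the release name are replaced so the
    # name always maps to a single folder.
    safe_name = release_name.replace(os.sep, "_").replace("/", "_")
    return os.path.join(releases_dir, safe_name, asset_name)


def needs_download(path):
    """Return True only if the asset has not been saved to disk yet."""
    return not os.path.exists(path)
```

Two releases that both ship `MacDown.app.zip` now resolve to different paths, and a later sync can call `needs_download` before fetching each asset.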
Ben Baron
195e700128 Improved gitignore, macOS files and IDE configs
Ignores the annoying hidden macOS files .DS_Store and ._* as well as the IDE configuration folders for contributors using the popular Visual Studio Code and Atom IDEs (more can be added later as needed).
2020-01-06 11:26:06 -05:00
Ben Baron
27441b71b6 Crash when an release asset doesn't exist
Currently, the script crashes whenever a release asset is unable to download (for example a 404 response). This change instead logs the failure and allows the script to continue. No retry logic is enabled, but at least it prevents the crash and allows the backup to complete. Retry logic can be implemented later if wanted.

closes https://github.com/josegonzalez/python-github-backup/issues/129
2020-01-06 11:13:25 -05:00
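
Illustrative sketch (not taken from the commit): logging a failed download and continuing, with no retry, could be structured as follows. `download_or_warn`, `fetch`, and `warn` are hypothetical names; in the script the warning would presumably go through its own logging helper.

```python
import sys


def download_or_warn(fetch, url, warn=None):
    """Call fetch(url); on failure, log a warning and return None instead of raising."""
    if warn is None:
        # Default: write the warning to stderr.
        warn = lambda msg: sys.stderr.write("{0}\n".format(msg))
    try:
        return fetch(url)
    except Exception as exc:  # e.g. urllib.error.HTTPError for a 404 response
        warn("Skipping download of {0}: {1}".format(url, exc))
        return None
```

The caller can then loop over all assets and finish the backup even when one of them is missing.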
Jose Diaz-Gonzalez
cfeaee7309 Update ISSUE_TEMPLATE.md 2020-01-06 10:20:07 -05:00
Jose Diaz-Gonzalez
fac8e4274f Release version 0.26.0 2019-09-23 11:45:01 -04:00
Jose Diaz-Gonzalez
17fee66f31 Merge pull request #128 from Snawoot/master
Workaround gist clone in `--prefer-ssh` mode
2019-09-23 11:44:21 -04:00
Vladislav Yarmak
a56d27dd8b workaround gist clone in --prefer-ssh mode 2019-09-21 19:22:27 +03:00
Jose Diaz-Gonzalez
e57873b6dd Create PULL_REQUEST.md 2019-08-14 17:51:19 -04:00
Jose Diaz-Gonzalez
2658b039a1 Create ISSUE_TEMPLATE.md 2019-08-14 17:47:47 -04:00
Jose Diaz-Gonzalez
fd684a71fb Update README.rst 2019-07-11 13:40:25 -07:00
Jose Diaz-Gonzalez
bacd77030b Update README.rst 2019-07-11 13:39:41 -07:00
Jose Diaz-Gonzalez
b73079daf2 Release version 0.25.0 2019-07-03 17:46:12 -04:00
Jose Diaz-Gonzalez
eca8a70666 Merge pull request #120 from 8h2a/patch-1
Issue 119: Change retrieve_data to be a generator
2019-07-03 17:45:40 -04:00
2a
e74765ba7f Issue 119: Change retrieve_data to be a generator
See issue #119.
2019-07-03 23:01:00 +02:00
Jose Diaz-Gonzalez
6db5bd731b Release version 0.24.0 2019-06-27 11:24:43 -04:00
Jose Diaz-Gonzalez
7305871c20 Merge pull request #117 from QuicketSolutions/master
Add option for Releases
2019-06-27 11:15:02 -04:00
Ethan Timm
baf7b1a9b4 Merge pull request #5 from QuicketSolutions/QKT-45
QKT-45: include assets - update readme
2019-06-25 15:41:11 -05:00
Ethan Timm
121fa68294 QKT-45: include assets - update readme
update readme with flag information for including assets alongside their respective releases
2019-06-25 15:41:02 -05:00
Ethan Timm
44dfc79edc Merge pull request #4 from whwright/wip-releases
Download github assets
2019-06-25 15:35:39 -05:00
Harrison Wright
89f59cc7a2 Make assets it's own flag 2019-06-24 15:49:19 -05:00
Jose Diaz-Gonzalez
ad8c5b8768 Merge pull request #118 from whwright/115-fix-pull-details
Fix pull details
2019-06-24 14:51:10 -04:00
Harrison Wright
921aab3729 Fix pull details 2019-06-22 13:19:45 -05:00
Harrison Wright
ea4c3d0f6f Fix super call for python2 2019-06-22 13:05:54 -05:00
Harrison Wright
9b6400932d Fix redirect to s3 2019-06-22 13:00:42 -05:00
Harrison Wright
de0c3f46c6 WIP: download assets 2019-06-21 20:03:14 -05:00
Ethan Timm
73b069f872 Merge pull request #3 from QuicketSolutions/QKT-42
QKT-42: releases - add readme info
2019-06-21 16:54:28 -05:00
ethan
3d3f512074 QKT-42: releases - add readme info 2019-06-21 16:53:40 -05:00
Ethan Timm
1c3078992d Merge pull request #2 from QuicketSolutions/QKT-42
QKT-42 update: shorter command flag
2019-06-21 16:49:40 -05:00
ethan
4b40ae94d7 QKT-42 update: shorter command flag 2019-06-21 16:48:25 -05:00
Ethan Timm
a18fda9faf Merge pull request #1 from QuicketSolutions/QKT-42
QKT-42: support saving release information
2019-06-21 16:43:48 -05:00
ethan
41130fc8b0 QKT-42: support saving release information 2019-06-21 11:20:32 -05:00
Jose Diaz-Gonzalez
2340a02fc6 Release version 0.23.0 2019-06-04 14:43:32 -04:00
Jose Diaz-Gonzalez
cafff4ae80 Merge pull request #113 from kleag/master
Avoid to crash in case of HTTP 502 error
2019-06-04 14:43:10 -04:00
Gael de Chalendar
3193d120e5 Avoid to crash in case of HTTP 502 error
Also survive socket.error on connections, as is already done for HTTPError and URLError.

This should solve issue #110.
2019-06-04 18:53:58 +02:00
Jose Diaz-Gonzalez
da4b29a2d6 Release version 0.22.2 2019-02-21 15:41:11 -05:00
Jose Diaz-Gonzalez
d05c96ecef Merge pull request #107 from josegonzalez/patch-1
fix: warn instead of error
2019-02-21 15:40:59 -05:00
Jose Diaz-Gonzalez
c86163bfe6 fix: warn instead of error
Refs #106
2019-02-21 15:40:39 -05:00
Jose Diaz-Gonzalez
eff6e36974 Release version 0.22.1 2019-02-21 15:13:31 -05:00
Jose Diaz-Gonzalez
63e458bafb Merge pull request #106 from jstetic/master
Log URL error
2019-02-21 15:13:02 -05:00
JOHN STETIC
57ab5ce1a2 Log URL error https://github.com/josegonzalez/python-github-backup/issues/105 2019-02-20 20:43:00 -05:00
Jose Diaz-Gonzalez
d148f9b900 Release version 0.22.0 2019-02-01 09:50:42 -05:00
Jose Diaz-Gonzalez
89ee22c2be Merge pull request #103 from whwright/98-better-logging
Fix accidental system exit with better logging strategy
2018-12-27 15:12:26 -05:00
W. Harrison Wright
9e472b74e6 Remove unnecessary sys.exit call 2018-12-27 13:07:13 -06:00
W. Harrison Wright
4b459f9af8 Add org check to avoid incorrect log output 2018-12-27 12:58:57 -06:00
W. Harrison Wright
b70ea87db7 Fix accidental system exit with better logging strategy 2018-12-27 12:53:21 -06:00
Jose Diaz-Gonzalez
f8be34562b Release version 0.21.1 2018-12-25 06:28:28 -05:00
Jose Diaz-Gonzalez
ec05204aa9 Merge pull request #101 from ecki/patch-2
Mark options which are not included in --all
2018-12-25 06:27:58 -05:00
Bernd
628f2cbf73 Mark options which are not included in --all
As discussed in Issue #100
2018-12-24 04:19:29 +01:00
7 changed files with 334 additions and 140 deletions

9
.gitignore vendored

@@ -25,3 +25,12 @@ doc/_build
# Generated man page
doc/aws_hostname.1
# Annoying macOS files
.DS_Store
._*
# IDE configuration files
.vscode
.atom


@@ -1,49 +1,152 @@
Changelog
=========
0.21.0 (2018-11-28)
0.27.0 (2020-01-21)
-------------------
------------------------
- Fixed script fails if not installed from pip. [Ben Baron]
At the top of the script, the line from github_backup import __version__ gets the script's version number to use if the script is called with the -v or --version flags. The problem is that if the script hasn't been installed via pip (for example I cloned the repo directly to my backup server), the script will fail due to an import exception.
Also presumably it will always use the version number from pip even if running a modified version from git or a fork or something, though this does not fix that as I have no idea how to check if it's running the pip installed version or not. But at least the script will now work fine if cloned from git or just copied to another machine.
closes https://github.com/josegonzalez/python-github-backup/issues/141
- Fixed macOS keychain access when using Python 3. [Ben Baron]
Python 3 is returning bytes rather than a string, so the string concatenation to create the auth variable was throwing an exception which the script was interpreting to mean it couldn't find the password. Adding a conversion to string first fixed the issue.
- Public repos no longer include the auth token. [Ben Baron]
When backing up repositories using an auth token and https, the GitHub personal auth token is leaked in each backed up repository. It is included in the URL of each repository's git remote url.
This is not needed as they are public and can be accessed without the token and can cause issues in the future if the token is ever changed, so I think it makes more sense not to have the token stored in each repo backup. I think the token should only be "leaked" like this out of necessity, e.g. it's a private repository and the --prefer-ssh option was not chosen so https with auth token was required to perform the clone.
- Fixed comment typo. [Ben Baron]
- Switched log_info to log_warning in download_file. [Ben Baron]
- Crash when an release asset doesn't exist. [Ben Baron]
Currently, the script crashes whenever a release asset is unable to download (for example a 404 response). This change instead logs the failure and allows the script to continue. No retry logic is enabled, but at least it prevents the crash and allows the backup to complete. Retry logic can be implemented later if wanted.
closes https://github.com/josegonzalez/python-github-backup/issues/129
- Moved asset downloading loop inside the if block. [Ben Baron]
- Separate release assets and skip re-downloading. [Ben Baron]
Currently the script puts all release assets into the same folder called `releases`. So any time 2 release files have the same name, only the last one downloaded is actually saved. A particularly bad example of this is MacDownApp/macdown where all of their releases are named `MacDown.app.zip`. So even though they have 36 releases and all 36 are downloaded, only the last one is actually saved.
With this change, each release's assets are now stored in a subfolder inside `releases` named after the release name. There could still be edge cases if two releases have the same name, but this is still much safer than the previous behavior.
This change also now checks if the asset file already exists on disk and skips downloading it. This drastically speeds up additional syncs as it no longer downloads every single release every single time. It will now only download new releases, which I believe is the expected behavior.
closes https://github.com/josegonzalez/python-github-backup/issues/126
- Added newline to end of file. [Ben Baron]
- Improved gitignore, macOS files and IDE configs. [Ben Baron]
Ignores the annoying hidden macOS files .DS_Store and ._* as well as the IDE configuration folders for contributors using the popular Visual Studio Code and Atom IDEs (more can be added later as needed).
0.26.0 (2019-09-23)
-------------------
- Workaround gist clone in `--prefer-ssh` mode. [Vladislav Yarmak]
- Create PULL_REQUEST.md. [Jose Diaz-Gonzalez]
- Create ISSUE_TEMPLATE.md. [Jose Diaz-Gonzalez]
0.25.0 (2019-07-03)
-------------------
- Issue 119: Change retrieve_data to be a generator. [2a]
See issue #119.
0.24.0 (2019-06-27)
-------------------
- QKT-45: include assets - update readme. [Ethan Timm]
update readme with flag information for including assets alongside their respective releases
- Make assets it's own flag. [Harrison Wright]
- Fix super call for python2. [Harrison Wright]
- Fix redirect to s3. [Harrison Wright]
- WIP: download assets. [Harrison Wright]
- QKT-42: releases - add readme info. [ethan]
- QKT-42 update: shorter command flag. [ethan]
- QKT-42: support saving release information. [ethan]
- Fix pull details. [Harrison Wright]
0.23.0 (2019-06-04)
-------------------
- Avoid to crash in case of HTTP 502 error. [Gael de Chalendar]
Survive also on socket.error connections like on HTTPError or URLError.
This should solve issue #110.
0.22.2 (2019-02-21)
-------------------
Fix
~~~
- Warn instead of error. [Jose Diaz-Gonzalez]
Refs #106
0.22.1 (2019-02-21)
-------------------
- Log URL error https://github.com/josegonzalez/python-github-
backup/issues/105. [JOHN STETIC]
0.22.0 (2019-02-01)
-------------------
- Remove unnecessary sys.exit call. [W. Harrison Wright]
- Add org check to avoid incorrect log output. [W. Harrison Wright]
- Fix accidental system exit with better logging strategy. [W. Harrison
Wright]
0.21.1 (2018-12-25)
-------------------
- Mark options which are not included in --all. [Bernd]
As discussed in Issue #100
0.21.0 (2018-11-28)
-------------------
- Correctly download repos when user arg != authenticated user. [W.
Harrison Wright]
0.20.1 (2018-09-29)
-------------------
- Clone the specified user's gists, not the authenticated user. [W.
Harrison Wright]
- Clone the specified user's starred repos, not the authenticated user.
[W. Harrison Wright]
0.20.0 (2018-03-24)
-------------------
- Chore: drop Python 2.6. [Jose Diaz-Gonzalez]
- Feat: simplify release script. [Jose Diaz-Gonzalez]
0.19.2 (2018-03-24)
-------------------
Fix
~~~
- Cleanup pep8 violations. [Jose Diaz-Gonzalez]
0.19.0 (2018-03-24)
-------------------
- Add additional output for the current request. [Robin Gloster]
This is useful to have some progress indication for huge repositories.
- Add option to backup additional PR details. [Robin Gloster]
Some payload is only included when requesting a single pull request
- Mark string as binary in comparison for skip_existing. [Johannes
Bornhold]
@@ -54,66 +157,53 @@ Fix
0.18.0 (2018-02-22)
-------------------
- Add option to fetch followers/following JSON data. [Stephen Greene]
0.17.0 (2018-02-20)
-------------------
- Short circuit gists backup process. [W. Harrison Wright]
- Formatting. [W. Harrison Wright]
- Add ability to backup gists. [W. Harrison Wright]
0.16.0 (2018-01-22)
-------------------
- Change option to --all-starred. [W. Harrison Wright]
- JK don't update documentation. [W. Harrison Wright]
- Put starred clone repoistories under a new option. [W. Harrison
Wright]
- Add comment. [W. Harrison Wright]
- Add ability to clone starred repos. [W. Harrison Wright]
0.14.1 (2017-10-11)
-------------------
- Fix arg not defined error. [Edward Pfremmer]
Ref: https://github.com/josegonzalez/python-github-backup/issues/69
0.14.0 (2017-10-11)
-------------------
- Added a check to see if git-lfs is installed when doing an LFS clone.
[pieterclaerhout]
- Added support for LFS clones. [pieterclaerhout]
- Add pypi info to readme. [Albert Wang]
- Explicitly support python 3 in package description. [Albert Wang]
- Add couple examples to help new users. [Yusuf Tran]
0.13.2 (2017-05-06)
-------------------
- Fix remotes while updating repository. [Dima Gerasimov]
0.13.1 (2017-04-11)
-------------------
- Fix error when repository has no updated_at value. [Nicolai Ehemann]
0.13.0 (2017-04-05)
-------------------
- Add OS check for OSX specific keychain args. [Martin O'Reilly]
Keychain arguments are only supported on Mac OSX.
@@ -122,8 +212,6 @@ Fix
error message rather than a "No password item matching the
provided name and account could be found in the osx keychain"
error message
- Add support for storing PAT in OSX keychain. [Martin O'Reilly]
Added additional optional arguments and README guidance for storing
@@ -133,62 +221,48 @@ Fix
0.12.1 (2017-03-27)
-------------------
- Avoid remote branch name churn. [Chris Adams]
This avoids the backup output having lots of "[new branch]" messages
because removing the old remote name removed all of the existing branch
references.
- Fix detection of bare git directories. [Andrzej Maczuga]
0.12.0 (2016-11-22)
-------------------
Fix
~~~
- Properly import version from github_backup package. [Jose Diaz-
Gonzalez]
- Support alternate git status output. [Jose Diaz-Gonzalez]
Other
~~~~~
- Pep8: E501 line too long (83 > 79 characters) [Jose Diaz-Gonzalez]
- Pep8: E128 continuation line under-indented for visual indent. [Jose
Diaz-Gonzalez]
- Support archivization using bare git clones. [Andrzej Maczuga]
- Fix typo, 3x. [Terrell Russell]
0.11.0 (2016-10-26)
-------------------
- Support --token file:///home/user/token.txt (fixes gh-51) [Björn
Dahlgren]
- Fix some linting. [Albert Wang]
- Fix byte/string conversion for python 3. [Albert Wang]
- Support python 3. [Albert Wang]
- Encode special characters in password. [Remi Rampin]
- Don't pretend program name is "Github Backup" [Remi Rampin]
- Don't install over insecure connection. [Remi Rampin]
The git:// protocol is unauthenticated and unencrypted, and no longer advertised by GitHub. Using HTTPS shouldn't impact performance.
0.10.3 (2016-08-21)
-------------------
- Fixes #29. [Jonas Michel]
Reporting an error when the user's rate limit is exceeded causes
@@ -196,8 +270,6 @@ Other
sleep. Instead of generating an explicit error we just want to
inform the user that the script is going to sleep until their rate
limit count resets.
- Fixes #29. [Jonas Michel]
The errors list was not being cleared out after resuming a backup
@@ -208,14 +280,13 @@ Other
0.10.2 (2016-08-21)
-------------------
- Add a note regarding git version requirement. [Jose Diaz-Gonzalez]
Closes #37
0.10.0 (2016-08-18)
-------------------
- Implement incremental updates. [Robert Bradshaw]
Guarded with an --incremental flag.
@@ -228,12 +299,11 @@ Other
0.9.0 (2016-03-29)
------------------
- Fix cloning private repos with basic auth or token. [Kazuki Suda]
0.8.0 (2016-02-14)
------------------
- Don't store issues which are actually pull requests. [Enrico Tröger]
This prevents storing pull requests twice since the Github API returns
@@ -244,43 +314,31 @@ Other
0.7.0 (2016-02-02)
------------------
- Softly fail if not able to read hooks. [Albert Wang]
- Add note about 2-factor auth. [Albert Wang]
- Make user repository search go through endpoint capable of reading
private repositories. [Albert Wang]
- Prompt for password if only username given. [Alex Hall]
0.6.0 (2015-11-10)
------------------
- Force proper remote url. [Jose Diaz-Gonzalez]
- Improve error handling in case of HTTP errors. [Enrico Tröger]
In case of a HTTP status code 404, the returned 'r' was never assigned.
In case of URL errors which are not timeouts, we probably should bail
out.
- Add --hooks to also include web hooks into the backup. [Enrico Tröger]
- Create the user specified output directory if it does not exist.
[Enrico Tröger]
Fixes #17.
- Add missing auth argument to _get_response() [Enrico Tröger]
When running unauthenticated and Github starts rate-limiting the client,
github-backup crashes because the used auth variable in _get_response()
was not available. This change should fix it.
- Add repository URL to error message for non-existing repositories.
[Enrico Tröger]
@@ -291,40 +349,28 @@ Other
0.5.0 (2015-10-10)
------------------
- Add release script. [Jose Diaz-Gonzalez]
- Refactor to both simplify codepath as well as follow PEP8 standards.
[Jose Diaz-Gonzalez]
- Retry 3 times when the connection times out. [Mathijs Jonker]
- Made unicode output defalut. [Kirill Grushetsky]
- Import alphabetised. [Kirill Grushetsky]
- Preserve Unicode characters in the output file. [Kirill Grushetsky]
Added option to preserve Unicode characters in the output file
- Josegonzales/python-github-backup#12 Added backup of labels and
milestones. [aensley]
- Fixed indent. [Mathijs Jonker]
- Skip unitialized repo's. [mjonker-embed]
These gave me errors which caused mails from crontab.
- Added prefer-ssh. [mjonker-embed]
Was needed for my back-up setup, code includes this but readme wasn't updated
- Retry API requests which failed due to rate-limiting. [Chris Adams]
This allows operation to continue, albeit at a slower pace,
if you have enough data to trigger the API rate limits
- Logging_subprocess: always log when a command fails. [Chris Adams]
Previously git clones could fail without any indication
@@ -334,21 +380,15 @@ Other
Now a non-zero return code will always output a message to
stderr and will display the executed command so it can be
rerun for troubleshooting.
- Switch to using ssh_url. [Chris Adams]
The previous commit used the wrong URL for a private repo. This was
masked by the lack of error loging in logging_subprocess (which will be
in a separate branch)
- Add an option to prefer checkouts over SSH. [Chris Adams]
This is really useful with private repos to avoid being nagged
for credentials for every repository
- Add pull request support. [Kevin Laude]
Back up reporitory pull requests by passing the --include-pulls
@@ -360,8 +400,6 @@ Other
Pull requests are automatically backed up when the --all argument is
uesd.
- Add GitHub Enterprise support. [Kevin Laude]
Pass the -H or --github-host argument with a GitHub Enterprise hostname
@@ -371,35 +409,21 @@ Other
0.2.0 (2014-09-22)
------------------
- Add support for retrieving repositories. Closes #1. [Jose Diaz-
Gonzalez]
- Fix PEP8 violations. [Jose Diaz-Gonzalez]
- Add authorization to header only if specified by user. [Ioannis
Filippidis]
- Fill out readme more. [Jose Diaz-Gonzalez]
- Fix import. [Jose Diaz-Gonzalez]
- Properly name readme. [Jose Diaz-Gonzalez]
- Create MANIFEST.in. [Jose Diaz-Gonzalez]
- Create .gitignore. [Jose Diaz-Gonzalez]
- Create setup.py. [Jose Diaz-Gonzalez]
- Create requirements.txt. [Jose Diaz-Gonzalez]
- Create __init__.py. [Jose Diaz-Gonzalez]
- Create LICENSE.txt. [Jose Diaz-Gonzalez]
- Create README.md. [Jose Diaz-Gonzalez]
- Create github-backup. [Jose Diaz-Gonzalez]

13
ISSUE_TEMPLATE.md Normal file

@@ -0,0 +1,13 @@
# Important notice regarding filed issues
This project already fills my needs, and as such I have no real reason to continue its development. This project is otherwise provided as is, and no support is given.
If pull requests implementing bug fixes or enhancements are pushed, I am happy to review and merge them (time permitting).
If you wish to have a bug fixed, you have a few options:
- Fix it yourself and file a pull request.
- File a bug and hope someone else fixes it for you.
- Pay me to fix it (my rate is $200 an hour, minimum 1 hour, contact me via my [github email address](https://github.com/josegonzalez) if you want to go this route).
In all cases, feel free to file an issue, they may be of help to others in the future.

7
PULL_REQUEST.md Normal file

@@ -0,0 +1,7 @@
# Important notice regarding filed pull requests
This project already fills my needs, and as such I have no real reason to continue its development. This project is otherwise provided as is, and no support is given.
I will attempt to review pull requests at _my_ earliest convenience. If I am unable to get to your pull request in a timely fashion, it is what it is. This repository does not pay any bills, and I am not required to merge any pull request from any individual.
If you wish to jump my personal priority queue, you may pay me for my time to review. My rate is $200 an hour - minimum 1 hour - feel free to contact me via my github email address if you want to go this route.


@@ -4,6 +4,8 @@ github-backup
|PyPI| |Python Versions|
This project is considered feature complete for the primary maintainer. If you would like a bugfix or enhancement and cannot sponsor the work, pull requests are welcome. Feel free to contact the maintainer for consulting estimates if desired.
backup a github user or organization
Requirements
@@ -32,8 +34,9 @@ CLI Usage is as follows::
[--watched] [--followers] [--following] [--all]
[--issues] [--issue-comments] [--issue-events] [--pulls]
[--pull-comments] [--pull-commits] [--labels] [--hooks]
[--milestones] [--repositories] [--bare] [--lfs]
[--wikis] [--gists] [--starred-gists] [--skip-existing]
[--milestones] [--repositories] [--releases] [--assets]
[--bare] [--lfs] [--wikis] [--gists] [--starred-gists]
[--skip-existing]
[-L [LANGUAGES [LANGUAGES ...]]] [-N NAME_REGEX]
[-H GITHUB_HOST] [-O] [-R REPOSITORY] [-P] [-F]
[--prefer-ssh] [-v]
@@ -76,6 +79,8 @@ CLI Usage is as follows::
authenticated)
--milestones include milestones in backup
--repositories include repository clone in backup
--releases include repository releases' information without assets or binaries
--assets include assets alongside release information; only applies if including releases
--bare clone bare repositories
--lfs clone LFS repositories (requires Git LFS to be
installed, https://git-lfs.github.com)


@@ -1,6 +1,7 @@
#!/usr/bin/env python
from __future__ import print_function
import socket
import argparse
import base64
@@ -17,6 +18,7 @@ import subprocess
import sys
import time
import platform
PY2 = False
try:
# python 3
from urllib.parse import urlparse
@@ -25,31 +27,41 @@ try:
from urllib.error import HTTPError, URLError
from urllib.request import urlopen
from urllib.request import Request
from urllib.request import HTTPRedirectHandler
from urllib.request import build_opener
except ImportError:
# python 2
PY2 = True
from urlparse import urlparse
from urllib import quote as urlquote
from urllib import urlencode
from urllib2 import HTTPError, URLError
from urllib2 import urlopen
from urllib2 import Request
from urllib2 import HTTPRedirectHandler
from urllib2 import build_opener
from github_backup import __version__
try:
from github_backup import __version__
VERSION = __version__
except ImportError:
VERSION = 'unknown'
FNULL = open(os.devnull, 'w')
def log_error(message):
if type(message) == str:
message = [message]
for msg in message:
sys.stderr.write("{0}\n".format(msg))
"""
Log message (str) or messages (List[str]) to stderr and exit with status 1
"""
log_warning(message)
sys.exit(1)
def log_info(message):
"""
Log message (str) or messages (List[str]) to stdout
"""
if type(message) == str:
message = [message]
@@ -57,6 +69,17 @@ def log_info(message):
sys.stdout.write("{0}\n".format(msg))
def log_warning(message):
"""
Log message (str) or messages (List[str]) to stderr
"""
if type(message) == str:
message = [message]
for msg in message:
sys.stderr.write("{0}\n".format(msg))
def logging_subprocess(popenargs,
logger,
stdout_log_level=logging.DEBUG,
@@ -163,11 +186,11 @@ def parse_args():
parser.add_argument('--all-starred',
action='store_true',
dest='all_starred',
help='include starred repositories in backup')
help='include starred repositories in backup [*]')
parser.add_argument('--watched',
action='store_true',
dest='include_watched',
help='include watched repositories in backup')
help='include JSON output of watched repositories in backup')
parser.add_argument('--followers',
action='store_true',
dest='include_followers',
@@ -179,7 +202,7 @@ def parse_args():
parser.add_argument('--all',
action='store_true',
dest='include_everything',
help='include everything in backup')
help='include everything in backup (not including [*])')
parser.add_argument('--issues',
action='store_true',
dest='include_issues',
@@ -207,7 +230,7 @@ def parse_args():
parser.add_argument('--pull-details',
action='store_true',
dest='include_pull_details',
help='include more pull request details in backup')
help='include more pull request details in backup [*]')
parser.add_argument('--labels',
action='store_true',
dest='include_labels',
@@ -231,7 +254,7 @@ def parse_args():
parser.add_argument('--lfs',
action='store_true',
dest='lfs_clone',
help='clone LFS repositories (requires Git LFS to be installed, https://git-lfs.github.com)')
help='clone LFS repositories (requires Git LFS to be installed, https://git-lfs.github.com) [*]')
parser.add_argument('--wikis',
action='store_true',
dest='include_wiki',
@@ -239,11 +262,11 @@ def parse_args():
parser.add_argument('--gists',
action='store_true',
dest='include_gists',
help='include gists in backup')
help='include gists in backup [*]')
parser.add_argument('--starred-gists',
action='store_true',
dest='include_starred_gists',
help='include starred gists in backup')
help='include starred gists in backup [*]')
parser.add_argument('--skip-existing',
action='store_true',
dest='skip_existing',
@@ -273,23 +296,32 @@ def parse_args():
parser.add_argument('-P', '--private',
action='store_true',
dest='private',
help='include private repositories')
help='include private repositories [*]')
parser.add_argument('-F', '--fork',
action='store_true',
dest='fork',
help='include forked repositories')
help='include forked repositories [*]')
parser.add_argument('--prefer-ssh',
action='store_true',
help='Clone repositories using SSH instead of HTTPS')
parser.add_argument('-v', '--version',
action='version',
version='%(prog)s ' + __version__)
version='%(prog)s ' + VERSION)
parser.add_argument('--keychain-name',
dest='osx_keychain_item_name',
help='OSX ONLY: name field of password item in OSX keychain that holds the personal access or OAuth token')
parser.add_argument('--keychain-account',
dest='osx_keychain_item_account',
help='OSX ONLY: account field of password item in OSX keychain that holds the personal access or OAuth token')
parser.add_argument('--releases',
action='store_true',
dest='include_releases',
help='include release information, not including assets or binaries'
)
parser.add_argument('--assets',
action='store_true',
dest='include_assets',
help='include assets alongside release information; only applies if including releases')
return parser.parse_args()
@@ -309,6 +341,8 @@ def get_auth(args, encode=True):
'-s', args.osx_keychain_item_name,
'-a', args.osx_keychain_item_account,
'-w'], stderr=devnull).strip())
if not PY2:
token = token.decode('utf-8')
auth = token + ':' + 'x-oauth-basic'
except:
log_error('No password item matching the provided name and account could be found in the osx keychain.')
@@ -359,14 +393,14 @@ def get_github_host(args):
def get_github_repo_url(args, repository):
if args.prefer_ssh:
return repository['ssh_url']
if repository.get('is_gist'):
return repository['git_pull_url']
if args.prefer_ssh:
return repository['ssh_url']
auth = get_auth(args, False)
if auth:
if auth and repository['private'] == True:
repo_url = 'https://{0}@{1}/{2}/{3}.git'.format(
auth,
get_github_host(args),
@@ -378,12 +412,11 @@ def get_github_repo_url(args, repository):
return repo_url
def retrieve_data(args, template, query_args=None, single_request=False):
def retrieve_data_gen(args, template, query_args=None, single_request=False):
auth = get_auth(args)
query_args = get_query_args(query_args)
per_page = 100
page = 0
data = []
while True:
page = page + 1
@@ -392,6 +425,16 @@ def retrieve_data(args, template, query_args=None, single_request=False):
status_code = int(r.getcode())
retries = 0
while retries < 3 and status_code == 502:
print('API request returned HTTP 502: Bad Gateway. Retrying in 5 seconds')
retries += 1
time.sleep(5)
request = _construct_request(per_page, page, query_args, template, auth) # noqa
r, errors = _get_response(request, auth, template)
status_code = int(r.getcode())
if status_code != 200:
template = 'API request returned HTTP {0}: {1}'
errors.append(template.format(status_code, r.reason))
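Note on the hunk above: the retry loop re-issues the request at most three times while the API keeps answering HTTP 502, pausing five seconds between attempts. A minimal standalone sketch of that pattern follows. The helper name `fetch_with_retry`, the `do_request` callable, and the injectable `sleep` parameter are assumptions made for illustration and are not part of the script.

```python
import time


def fetch_with_retry(do_request, max_retries=3, delay=5,
                     retry_status=502, sleep=time.sleep):
    """Call do_request() and repeat it while it returns retry_status.

    do_request is a hypothetical zero-argument callable returning a
    (status_code, body) tuple. At most max_retries extra attempts are
    made, with `delay` seconds of sleep before each one.
    """
    status, body = do_request()
    retries = 0
    while retries < max_retries and status == retry_status:
        retries += 1
        sleep(delay)
        status, body = do_request()
    # The caller still has to check the final status: after the last
    # retry the response may still be a 502.
    return status, body
```

As in the diff, the final status is handed back unchanged, so a persistent 502 still reaches the `status_code != 200` error path.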
@@ -400,11 +443,12 @@ def retrieve_data(args, template, query_args=None, single_request=False):
response = json.loads(r.read().decode('utf-8'))
if len(errors) == 0:
if type(response) == list:
data.extend(response)
for resp in response:
yield resp
if len(response) < per_page:
break
elif type(response) == dict and single_request:
data.append(response)
yield response
if len(errors) > 0:
log_error(errors)
@@ -412,8 +456,8 @@ def retrieve_data(args, template, query_args=None, single_request=False):
if single_request:
break
return data
def retrieve_data(args, template, query_args=None, single_request=False):
return list(retrieve_data_gen(args, template, query_args, single_request))
def get_query_args(query_args=None):
if not query_args:
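Note on the hunks above: `retrieve_data_gen` yields items page by page, and `retrieve_data` becomes a thin `list(...)` wrapper around it. The practical effect is that a caller which stops iterating early never triggers the remaining page requests. The sketch below illustrates this under assumptions: `paginate`, `collect`, and the `fetch_page(page, per_page)` callable are hypothetical names, not the script's API.

```python
def paginate(fetch_page, per_page=100):
    """Lazily yield items from a paged source.

    fetch_page is a hypothetical callable taking (page, per_page) and
    returning a list of items for that 1-based page.
    """
    page = 0
    while True:
        page += 1
        items = fetch_page(page, per_page)
        for item in items:
            yield item
        # A short page means there is nothing left to fetch.
        if len(items) < per_page:
            break


def collect(fetch_page, per_page=100):
    """Eager variant: drain the generator into a list."""
    return list(paginate(fetch_page, per_page))
```

A consumer that breaks out of `for item in paginate(...)` after the first few items only pays for the pages it actually reached, which is what makes the early `break` on `updated_at` in `backup_pulls` worthwhile.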
@@ -433,7 +477,13 @@ def _get_response(request, auth, template):
except HTTPError as exc:
errors, should_continue = _request_http_error(exc, auth, errors) # noqa
r = exc
except URLError:
except URLError as e:
log_warning(e.reason)
should_continue = _request_url_error(template, retry_timeout)
if not should_continue:
raise
except socket.error as e:
log_warning(e.strerror)
should_continue = _request_url_error(template, retry_timeout)
if not should_continue:
raise
@@ -503,6 +553,55 @@ def _request_url_error(template, retry_timeout):
return False
class S3HTTPRedirectHandler(HTTPRedirectHandler):
"""
A subclassed redirect handler for downloading Github assets from S3.
urllib will add the Authorization header to the redirected request to S3, which will result in a 400,
so we should remove said header on redirect.
"""
def redirect_request(self, req, fp, code, msg, headers, newurl):
if PY2:
# HTTPRedirectHandler is an old style class
request = HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, newurl)
else:
request = super(S3HTTPRedirectHandler, self).redirect_request(req, fp, code, msg, headers, newurl)
del request.headers['Authorization']
return request
def download_file(url, path, auth):
# Skip downloading release assets if they already exist on disk so we don't redownload on every sync
if os.path.exists(path):
return
request = Request(url)
request.add_header('Accept', 'application/octet-stream')
request.add_header('Authorization', 'Basic '.encode('ascii') + auth)
opener = build_opener(S3HTTPRedirectHandler)
try:
response = opener.open(request)
chunk_size = 16 * 1024
with open(path, 'wb') as f:
while True:
chunk = response.read(chunk_size)
if not chunk:
break
f.write(chunk)
except HTTPError as exc:
# Gracefully handle 404 responses (and others) when downloading from S3
log_warning('Skipping download of asset {0} due to HTTPError: {1}'.format(url, exc.reason))
except URLError as e:
# Gracefully handle other URL errors
log_warning('Skipping download of asset {0} due to URLError: {1}'.format(url, e.reason))
except socket.error as e:
# Gracefully handle socket errors
# TODO: Implement retry logic
log_warning('Skipping download of asset {0} due to socket error: {1}'.format(url, e.strerror))
def get_authenticated_user(args):
template = 'https://{0}/user'.format(get_github_api_host(args))
data = retrieve_data(args, template, single_request=True)
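Note on `S3HTTPRedirectHandler` in the hunk above: its docstring explains that urllib copies the Authorization header onto the redirected request and that S3 rejects it. A Python 3 only sketch of the same idea follows. The class name `StripAuthRedirectHandler` is an assumption for illustration, and it uses `pop` with a default instead of `del`, so a request that carries no Authorization header does not raise `KeyError`.

```python
from urllib.request import HTTPRedirectHandler, Request, build_opener


class StripAuthRedirectHandler(HTTPRedirectHandler):
    """Drop the Authorization header when following a redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None:
            # Request stores header names capitalized, so the key is
            # 'Authorization'. pop() tolerates a missing header.
            new_req.headers.pop('Authorization', None)
        return new_req


# An opener built this way follows redirects without forwarding credentials.
opener = build_opener(StripAuthRedirectHandler)
```

Other headers set on the original request, such as `Accept: application/octet-stream`, are still carried over to the redirected request.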
@@ -513,7 +612,6 @@ def check_git_lfs_install():
exit_code = subprocess.call(['git', 'lfs', 'version'])
if exit_code != 0:
log_error('The argument --lfs requires you to have Git LFS installed.\nYou can get it from https://git-lfs.github.com.')
sys.exit(1)
def retrieve_repositories(args, authenticated_user):
@@ -524,8 +622,8 @@ def retrieve_repositories(args, authenticated_user):
template = 'https://{0}/user/repos'.format(
get_github_api_host(args))
else:
if args.private:
log_error('Authenticated user is different from user being backed up, thus private repositories cannot be accessed')
if args.private and not args.organization:
log_warning('Authenticated user is different from user being backed up, thus private repositories cannot be accessed')
template = 'https://{0}/users/{1}/repos'.format(
get_github_api_host(args),
args.user)
@@ -671,6 +769,10 @@ def backup_repositories(args, output_directory, repositories):
if args.include_hooks or args.include_everything:
backup_hooks(args, repo_cwd, repository, repos_template)
if args.include_releases or args.include_everything:
backup_releases(args, repo_cwd, repository, repos_template,
include_assets=args.include_assets or args.include_everything)
if args.incremental:
open(last_update_path, 'w').write(last_update)
@@ -756,24 +858,27 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
pull_states = ['open', 'closed']
for pull_state in pull_states:
query_args['state'] = pull_state
# It'd be nice to be able to apply the args.since filter here...
_pulls = retrieve_data(args,
_pulls = retrieve_data_gen(args,
_pulls_template,
query_args=query_args)
for pull in _pulls:
if args.since and pull['updated_at'] < args.since:
break
if not args.since or pull['updated_at'] >= args.since:
pulls[pull['number']] = pull
else:
_pulls = retrieve_data(args,
_pulls = retrieve_data_gen(args,
_pulls_template,
query_args=query_args)
for pull in _pulls:
if args.since and pull['updated_at'] < args.since:
break
if not args.since or pull['updated_at'] >= args.since:
pulls[pull['number']] = retrieve_data(
args,
_pulls_template + '/{}'.format(pull['number']),
single_request=True
)
)[0]
log_info('Saving {0} pull requests to disk'.format(
len(list(pulls.keys()))))
@@ -852,6 +957,37 @@ def backup_hooks(args, repo_cwd, repository, repos_template):
log_info("Unable to read hooks, skipping")
def backup_releases(args, repo_cwd, repository, repos_template, include_assets=False):
repository_fullname = repository['full_name']
# give release files somewhere to live & log intent
release_cwd = os.path.join(repo_cwd, 'releases')
log_info('Retrieving {0} releases'.format(repository_fullname))
mkdir_p(repo_cwd, release_cwd)
query_args = {}
release_template = '{0}/{1}/releases'.format(repos_template, repository_fullname)
releases = retrieve_data(args, release_template, query_args=query_args)
# for each release, store it
log_info('Saving {0} releases to disk'.format(len(releases)))
for release in releases:
release_name = release['tag_name']
output_filepath = os.path.join(release_cwd, '{0}.json'.format(release_name))
with codecs.open(output_filepath, 'w+', encoding='utf-8') as f:
json_dump(release, f)
if include_assets:
assets = retrieve_data(args, release['assets_url'])
if len(assets) > 0:
# give release asset files somewhere to live & download them (not including source archives)
release_assets_cwd = os.path.join(release_cwd, release_name)
mkdir_p(release_assets_cwd)
for asset in assets:
download_file(asset['url'], os.path.join(release_assets_cwd, asset['name']), get_auth(args))
def fetch_repository(name,
remote_url,
local_dir,


@@ -1 +1 @@
__version__ = '0.21.0'
__version__ = '0.27.0'