Compare commits


33 Commits
0.2.0 ... 0.5.0

Author SHA1 Message Date
Jose Diaz-Gonzalez
050f5f1c17 Release version 0.5.0 2015-10-10 00:19:45 -04:00
Jose Diaz-Gonzalez
348a238770 Add release script 2015-10-10 00:19:31 -04:00
Jose Diaz-Gonzalez
708b377918 Refactor to both simplify codepath as well as follow PEP8 standards 2015-10-10 00:16:30 -04:00
Jose Diaz-Gonzalez
6193efb798 Merge pull request #19 from Embed-Engineering/retry-timeout
Retry 3 times when the connection times out
2015-09-04 10:36:57 -04:00
Mathijs Jonker
4b30aaeef3 Retry 3 times when the connection times out 2015-09-04 14:07:45 +02:00
Jose Diaz-Gonzalez
762059d1a6 Merge pull request #15 from kromkrom/master
Preserve Unicode characters in the output file
2015-05-04 14:13:11 -04:00
Kirill Grushetsky
a440bc1522 Update github-backup 2015-05-04 19:16:23 +03:00
Kirill Grushetsky
43793c1e5e Update github-backup 2015-05-04 19:15:55 +03:00
Kirill Grushetsky
24fac46459 Made unicode output defalut 2015-05-04 19:12:47 +03:00
Kirill Grushetsky
c9916e28a4 Import alphabetised 2015-05-04 13:45:39 +03:00
Kirill Grushetsky
ab4b28cdd4 Preserve Unicode characters in the output file
Added option to preserve Unicode characters in the output file
2015-05-04 13:38:28 +03:00
Jose Diaz-Gonzalez
6feb409fc2 Merge pull request #14 from aensley/master
Added backup of labels and milestones.
2015-04-23 15:37:14 -04:00
aensley
8bdbc2cee2 josegonzales/python-github-backup#12 Added backup of labels and milestones. 2015-04-23 14:05:48 -05:00
Jose Diaz-Gonzalez
a4d6272b50 Merge pull request #11 from Embed-Engineering/master
Added test for uninitialized repo's (or wiki's)
2015-04-15 11:03:25 -04:00
Mathijs Jonker
7ce61202e5 Fixed indent 2015-04-15 12:21:58 +02:00
mjonker-embed
3e82d829e4 Update github-backup 2015-04-15 12:14:55 +02:00
mjonker-embed
339ad96876 Skip unitialized repo's
These gave me errors which caused mails from crontab.
2015-04-15 12:10:53 +02:00
Jose Diaz-Gonzalez
b2a942eb43 Merge pull request #10 from Embed-Engineering/master
Added prefer-ssh
2015-03-20 10:54:42 -04:00
mjonker-embed
e8aa38f395 Added prefer-ssh
Was needed for my back-up setup, code includes this but readme wasn't updated
2015-03-20 14:22:53 +01:00
Jose Diaz-Gonzalez
86bdb1420c Merge pull request #9 from acdha/ratelimit-retries
Retry API requests which failed due to rate-limiting
2015-03-13 17:55:25 -04:00
Chris Adams
2e7f325475 Retry API requests which failed due to rate-limiting
This allows operation to continue, albeit at a slower pace,
if you have enough data to trigger the API rate limits
2015-03-13 17:37:01 -04:00
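This change waits out the rate limit instead of failing. A minimal sketch of the wait calculation it describes (the `seconds_until_reset` helper name is illustrative, not from the source): read the `X-RateLimit-Reset` epoch timestamp from the response headers and sleep until then, never less than 10 seconds.

```python
import calendar
import time


def seconds_until_reset(headers, now=None):
    """How long to sleep after a rate-limited response.

    Sketch of the approach in this change: use the X-RateLimit-Reset
    epoch timestamp rather than polling, with a 10-second floor.
    """
    if now is None:
        now = calendar.timegm(time.gmtime())
    reset = int(headers.get('x-ratelimit-reset', 0)) or now
    return max(10, reset - now)
```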
Jose Diaz-Gonzalez
8bf62cd932 Release 0.4.0 2015-03-13 16:33:28 -04:00
Jose Diaz-Gonzalez
63bf7267a6 Merge pull request #7 from acdha/repo-backup-overhaul
Repo backup overhaul
2015-03-13 16:32:49 -04:00
Chris Adams
5612e51153 Update repository back up handling for wikis
* Now wikis will follow the same logic as the main repo
  checkout for --prefer-ssh.
* The regular repository and wiki paths both use the same
  function to handle either cloning or updating a local copy
  of the remote repo
* All git updates will now use “git fetch --all --tags”
  to ensure that tags and branches other than master will
  also be backed up
2015-03-13 15:50:30 -04:00
Chris Adams
c81bf98627 logging_subprocess: always log when a command fails
Previously git clones could fail without any indication 
unless you edited the source to change `logger=None` to use
a configured logger.

Now a non-zero return code will always output a message to
stderr and will display the executed command so it can be
rerun for troubleshooting.
2015-03-13 15:50:04 -04:00
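The behaviour this commit adds can be sketched as follows (a simplified stand-in for `logging_subprocess`, with the hypothetical name `run_logged`): any non-zero return code is reported to stderr along with the exact command, so it can be rerun by hand.

```python
import subprocess
import sys


def run_logged(cmd):
    """Run a command; on failure, echo the return code and the exact
    command to stderr so it can be rerun for troubleshooting."""
    rc = subprocess.call(cmd,
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
    if rc != 0:
        print('{} returned {}:'.format(cmd[0], rc), file=sys.stderr)
        print('\t' + ' '.join(cmd), file=sys.stderr)
    return rc
```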
Chris Adams
040516325a Switch to using ssh_url
The previous commit used the wrong URL for a private repo. This was
masked by the lack of error loging in logging_subprocess (which will be
in a separate branch)
2015-03-13 15:39:35 -04:00
Jose Diaz-Gonzalez
dca9f8051b Merge pull request #6 from acdha/allow-clone-over-ssh
Add an option to prefer checkouts over SSH
2015-03-12 17:15:19 -04:00
Chris Adams
3bc23473b8 Add an option to prefer checkouts over SSH
This is really useful with private repos to avoid being nagged
for credentials for every repository
2015-03-12 16:10:46 -04:00
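The option boils down to choosing between the two clone URLs the GitHub API returns for each repository. A minimal sketch (the `choose_clone_url` helper is illustrative):

```python
def choose_clone_url(repository, prefer_ssh=False):
    """Pick the clone URL the way --prefer-ssh does: the repository
    dict from the GitHub API carries both clone_url and ssh_url."""
    if prefer_ssh:
        return repository['ssh_url']
    return repository['clone_url']
```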
Jose Diaz-Gonzalez
2c9eb80cf2 Release 0.3.0 2015-02-20 12:41:25 -05:00
Jose Diaz-Gonzalez
bb86f0582e Merge pull request #4 from klaude/pull_request_support
Add pull request support
2015-01-16 11:06:01 -05:00
Kevin Laude
e8387f9a7f Add pull request support
Back up reporitory pull requests by passing the --include-pulls
argument. Pull requests are saved to
repositories/<repository name>/pulls/<pull request number>.json. Include
the --pull-request-comments argument to add review comments to the pull
request backup and pass the --pull-request-commits argument to add
commits to the pull request backup.

Pull requests are automatically backed up when the --all argument is
uesd.
2015-01-16 09:57:05 -06:00
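The on-disk layout described above can be sketched as a small path builder (the `pull_request_path` helper name is hypothetical):

```python
import os


def pull_request_path(output_directory, repo_name, number):
    """Where a backed-up pull request lands, per this commit:
    repositories/<repository name>/pulls/<pull request number>.json"""
    return os.path.join(output_directory, 'repositories', repo_name,
                        'pulls', '{0}.json'.format(number))
```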
Jose Diaz-Gonzalez
39b173f173 Merge pull request #5 from klaude/github-enterprise-support
Add GitHub Enterprise Support
2015-01-15 22:05:33 -05:00
Kevin Laude
883c92753d Add GitHub Enterprise support
Pass the -H or --github-host argument with a GitHub Enterprise hostname
to backup from that GitHub enterprise host. If no argument is passed
then back up from github.com.
2015-01-15 20:20:33 -06:00
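The host selection this commit describes can be sketched as (illustrative helper name; GitHub Enterprise serves its REST API under `/api/v3`, while github.com uses `api.github.com`):

```python
def get_api_host(github_host=None):
    """Resolve the API host: a GitHub Enterprise hostname gets the
    /api/v3 suffix; otherwise fall back to api.github.com."""
    if github_host:
        return github_host + '/api/v3'
    return 'api.github.com'
```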
5 changed files with 730 additions and 131 deletions

CHANGES.rst (new file, +117 lines)

@@ -0,0 +1,117 @@
Changelog
=========
0.5.0 (2015-10-10)
------------------
- Add release script. [Jose Diaz-Gonzalez]
- Refactor to both simplify codepath as well as follow PEP8 standards.
[Jose Diaz-Gonzalez]
- Retry 3 times when the connection times out. [Mathijs Jonker]
- Made unicode output defalut. [Kirill Grushetsky]
- Import alphabetised. [Kirill Grushetsky]
- Preserve Unicode characters in the output file. [Kirill Grushetsky]
Added option to preserve Unicode characters in the output file
- Josegonzales/python-github-backup#12 Added backup of labels and
milestones. [aensley]
- Fixed indent. [Mathijs Jonker]
- Skip unitialized repo's. [mjonker-embed]
These gave me errors which caused mails from crontab.
- Added prefer-ssh. [mjonker-embed]
Was needed for my back-up setup, code includes this but readme wasn't updated
- Retry API requests which failed due to rate-limiting. [Chris Adams]
This allows operation to continue, albeit at a slower pace,
if you have enough data to trigger the API rate limits
- Logging_subprocess: always log when a command fails. [Chris Adams]
Previously git clones could fail without any indication
unless you edited the source to change `logger=None` to use
a configured logger.
Now a non-zero return code will always output a message to
stderr and will display the executed command so it can be
rerun for troubleshooting.
- Switch to using ssh_url. [Chris Adams]
The previous commit used the wrong URL for a private repo. This was
masked by the lack of error loging in logging_subprocess (which will be
in a separate branch)
- Add an option to prefer checkouts over SSH. [Chris Adams]
This is really useful with private repos to avoid being nagged
for credentials for every repository
- Add pull request support. [Kevin Laude]
Back up reporitory pull requests by passing the --include-pulls
argument. Pull requests are saved to
repositories/<repository name>/pulls/<pull request number>.json. Include
the --pull-request-comments argument to add review comments to the pull
request backup and pass the --pull-request-commits argument to add
commits to the pull request backup.
Pull requests are automatically backed up when the --all argument is
uesd.
- Add GitHub Enterprise support. [Kevin Laude]
Pass the -H or --github-host argument with a GitHub Enterprise hostname
to backup from that GitHub enterprise host. If no argument is passed
then back up from github.com.
0.2.0 (2014-09-22)
------------------
- Add support for retrieving repositories. Closes #1. [Jose Diaz-
Gonzalez]
- Fix PEP8 violations. [Jose Diaz-Gonzalez]
- Add authorization to header only if specified by user. [Ioannis
Filippidis]
- Fill out readme more. [Jose Diaz-Gonzalez]
- Fix import. [Jose Diaz-Gonzalez]
- Properly name readme. [Jose Diaz-Gonzalez]
- Create MANIFEST.in. [Jose Diaz-Gonzalez]
- Create .gitignore. [Jose Diaz-Gonzalez]
- Create setup.py. [Jose Diaz-Gonzalez]
- Create requirements.txt. [Jose Diaz-Gonzalez]
- Create __init__.py. [Jose Diaz-Gonzalez]
- Create LICENSE.txt. [Jose Diaz-Gonzalez]
- Create README.md. [Jose Diaz-Gonzalez]
- Create github-backup. [Jose Diaz-Gonzalez]


@@ -22,12 +22,15 @@ CLI Usage is as follows::

     Github Backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN]
                   [-o OUTPUT_DIRECTORY] [--starred] [--watched] [--all]
-                  [--issues] [--issue-comments] [--issue-events]
-                  [--repositories] [--wikis] [--skip-existing]
-                  [-L [LANGUAGES [LANGUAGES ...]]] [-N NAME_REGEX] [-O]
-                  [-R REPOSITORY] [-P] [-F] [-v]
+                  [--issues] [--issue-comments] [--issue-events] [--pulls]
+                  [--pull-comments] [--pull-commits] [--repositories]
+                  [--wikis] [--skip-existing]
+                  [-L [LANGUAGES [LANGUAGES ...]]] [-N NAME_REGEX]
+                  [-H GITHUB_HOST] [-O] [-R REPOSITORY] [-P] [-F]
+                  [--prefer-ssh] [-v]
                   USER

     Backup a github users account

     positional arguments:
@@ -49,6 +52,9 @@ CLI Usage is as follows::
       --issues              include issues in backup
       --issue-comments      include issue comments in backup
       --issue-events        include issue events in backup
+      --pulls               include pull requests in backup
+      --pull-comments       include pull request review comments in backup
+      --pull-commits        include pull request commits in backup
       --repositories        include repository clone in backup
       --wikis               include wiki clone in backup
       --skip-existing       skip project if a backup directory exists
@@ -56,11 +62,15 @@ CLI Usage is as follows::
                             only allow these languages
       -N NAME_REGEX, --name-regex NAME_REGEX
                             python regex to match names against
+      -H GITHUB_HOST, --github-host GITHUB_HOST
+                            GitHub Enterprise hostname
       -O, --organization    whether or not this is a query for an organization
       -R REPOSITORY, --repository REPOSITORY
                             name of repository to limit backup to
       -P, --private         include private repositories
       -F, --fork            include forked repositories
+      --prefer-ssh          Clone repositories using SSH instead of HTTPS
       -v, --version         show program's version number and exit

 The package can be used to backup an *entire* organization or repository, including issues and wikis in the most appropriate format (clones for wikis, json files for issues).

bin/github-backup (Normal file → Executable file, +597 lines)

@@ -1,7 +1,11 @@
 #!/usr/bin/env python
+from __future__ import print_function
+
 import argparse
 import base64
+import calendar
+import codecs
 import errno
 import json
 import logging
@@ -10,11 +14,14 @@ import re
 import select
 import subprocess
 import sys
+import time
 import urllib
 import urllib2

 from github_backup import __version__

+FNULL = open(os.devnull, 'w')
+

 def log_error(message):
     if type(message) == str:
@@ -34,7 +41,11 @@ def log_info(message):
     sys.stdout.write("{0}\n".format(msg))

-def logging_subprocess(popenargs, logger, stdout_log_level=logging.DEBUG, stderr_log_level=logging.ERROR, **kwargs):
+def logging_subprocess(popenargs,
+                       logger,
+                       stdout_log_level=logging.DEBUG,
+                       stderr_log_level=logging.ERROR,
+                       **kwargs):
     """
     Variant of subprocess.call that accepts a logger instead of stdout/stderr,
     and logs stdout messages via logger.debug and stderr messages via
@@ -47,7 +58,10 @@ def logging_subprocess(popenargs, logger, stdout_log_level=logging.DEBUG, stderr
         child.stderr: stderr_log_level}

     def check_io():
-        ready_to_read = select.select([child.stdout, child.stderr], [], [], 1000)[0]
+        ready_to_read = select.select([child.stdout, child.stderr],
+                                      [],
+                                      [],
+                                      1000)[0]
         for io in ready_to_read:
             line = io.readline()
             if not logger:
@@ -61,7 +75,13 @@ def logging_subprocess(popenargs, logger, stdout_log_level=logging.DEBUG, stderr
     check_io()  # check again to catch anything after the process exits

-    return child.wait()
+    rc = child.wait()
+    if rc != 0:
+        print(u'{} returned {}:'.format(popenargs[0], rc), file=sys.stderr)
+        print('\t', u' '.join(popenargs), file=sys.stderr)
+    return rc


 def mkdir_p(*args):
@@ -76,28 +96,121 @@ def mkdir_p(*args):

 def parse_args():
-    parser = argparse.ArgumentParser(description='Backup a github users account', prog='Github Backup')
-    parser.add_argument('user', metavar='USER', type=str, help='github username')
-    parser.add_argument('-u', '--username', dest='username', help='username for basic auth')
-    parser.add_argument('-p', '--password', dest='password', help='password for basic auth')
-    parser.add_argument('-t', '--token', dest='token', help='personal access or OAuth token')
-    parser.add_argument('-o', '--output-directory', default='.', dest='output_directory', help='directory at which to backup the repositories')
-    parser.add_argument('--starred', action='store_true', dest='include_starred', help='include starred repositories in backup')
-    parser.add_argument('--watched', action='store_true', dest='include_watched', help='include watched repositories in backup')
-    parser.add_argument('--all', action='store_true', dest='include_everything', help='include everything in backup')
-    parser.add_argument('--issues', action='store_true', dest='include_issues', help='include issues in backup')
-    parser.add_argument('--issue-comments', action='store_true', dest='include_issue_comments', help='include issue comments in backup')
-    parser.add_argument('--issue-events', action='store_true', dest='include_issue_events', help='include issue events in backup')
-    parser.add_argument('--repositories', action='store_true', dest='include_repository', help='include repository clone in backup')
-    parser.add_argument('--wikis', action='store_true', dest='include_wiki', help='include wiki clone in backup')
-    parser.add_argument('--skip-existing', action='store_true', dest='skip_existing', help='skip project if a backup directory exists')
-    parser.add_argument('-L', '--languages', dest='languages', help='only allow these languages', nargs='*')
-    parser.add_argument('-N', '--name-regex', dest='name_regex', help='python regex to match names against')
-    parser.add_argument('-O', '--organization', action='store_true', dest='organization', help='whether or not this is a query for an organization')
-    parser.add_argument('-R', '--repository', dest='repository', help='name of repository to limit backup to')
-    parser.add_argument('-P', '--private', action='store_true', dest='private', help='include private repositories')
-    parser.add_argument('-F', '--fork', action='store_true', dest='fork', help='include forked repositories')
-    parser.add_argument('-v', '--version', action='version', version='%(prog)s ' + __version__)
+    parser = argparse.ArgumentParser(description='Backup a github account',
+                                     prog='Github Backup')
+    parser.add_argument('user',
+                        metavar='USER',
+                        type=str,
+                        help='github username')
+    parser.add_argument('-u',
+                        '--username',
+                        dest='username',
+                        help='username for basic auth')
+    parser.add_argument('-p',
+                        '--password',
+                        dest='password',
+                        help='password for basic auth')
+    parser.add_argument('-t',
+                        '--token',
+                        dest='token',
+                        help='personal access or OAuth token')
+    parser.add_argument('-o',
+                        '--output-directory',
+                        default='.',
+                        dest='output_directory',
+                        help='directory at which to backup the repositories')
parser.add_argument('--starred',
action='store_true',
dest='include_starred',
help='include starred repositories in backup')
parser.add_argument('--watched',
action='store_true',
dest='include_watched',
help='include watched repositories in backup')
parser.add_argument('--all',
action='store_true',
dest='include_everything',
help='include everything in backup')
parser.add_argument('--issues',
action='store_true',
dest='include_issues',
help='include issues in backup')
parser.add_argument('--issue-comments',
action='store_true',
dest='include_issue_comments',
help='include issue comments in backup')
parser.add_argument('--issue-events',
action='store_true',
dest='include_issue_events',
help='include issue events in backup')
parser.add_argument('--pulls',
action='store_true',
dest='include_pulls',
help='include pull requests in backup')
parser.add_argument('--pull-comments',
action='store_true',
dest='include_pull_comments',
help='include pull request review comments in backup')
parser.add_argument('--pull-commits',
action='store_true',
dest='include_pull_commits',
help='include pull request commits in backup')
parser.add_argument('--labels',
action='store_true',
dest='include_labels',
help='include labels in backup')
parser.add_argument('--milestones',
action='store_true',
dest='include_milestones',
help='include milestones in backup')
parser.add_argument('--repositories',
action='store_true',
dest='include_repository',
help='include repository clone in backup')
parser.add_argument('--wikis',
action='store_true',
dest='include_wiki',
help='include wiki clone in backup')
parser.add_argument('--skip-existing',
action='store_true',
dest='skip_existing',
help='skip project if a backup directory exists')
parser.add_argument('-L',
'--languages',
dest='languages',
help='only allow these languages',
nargs='*')
parser.add_argument('-N',
'--name-regex',
dest='name_regex',
help='python regex to match names against')
parser.add_argument('-H',
'--github-host',
dest='github_host',
help='GitHub Enterprise hostname')
parser.add_argument('-O',
'--organization',
action='store_true',
dest='organization',
help='whether or not this is an organization user')
parser.add_argument('-R',
'--repository',
dest='repository',
help='name of repository to limit backup to')
parser.add_argument('-P', '--private',
action='store_true',
dest='private',
help='include private repositories')
parser.add_argument('-F', '--fork',
action='store_true',
dest='fork',
help='include forked repositories')
parser.add_argument('--prefer-ssh',
action='store_true',
help='Clone repositories using SSH instead of HTTPS')
parser.add_argument('-v', '--version',
action='version',
version='%(prog)s ' + __version__)
    return parser.parse_args()
@@ -108,45 +221,48 @@ def get_auth(args):
     elif args.username and args.password:
         auth = base64.b64encode(args.username + ':' + args.password)
     elif args.username and not args.password:
-        log_error('You must specify a password for basic auth when specifying a username')
+        log_error('You must specify a password for basic auth')
     elif args.password and not args.username:
-        log_error('You must specify a username for basic auth when specifying a password')
+        log_error('You must specify a username for basic auth')
     return auth
def get_github_api_host(args):
if args.github_host:
host = args.github_host + '/api/v3'
else:
host = 'api.github.com'
return host
def get_github_ssh_host(args):
if args.github_host:
host = args.github_host
else:
host = 'github.com'
return host
 def retrieve_data(args, template, query_args=None, single_request=False):
     auth = get_auth(args)
+    query_args = get_query_args(query_args)
     per_page = 100
     page = 0
     data = []
-    if not query_args:
-        query_args = {}

     while True:
         page = page + 1
-        querystring = urllib.urlencode(dict({
-            'per_page': per_page,
-            'page': page
-        }.items() + query_args.items()))
-
-        request = urllib2.Request(template + '?' + querystring)
-        if auth is not None:
-            request.add_header('Authorization', 'Basic ' + auth)
-
-        r = urllib2.urlopen(request)
-        errors = []
-        if int(r.getcode()) != 200:
-            errors.append('Bad response from api')
-        if 'X-RateLimit-Limit' in r.headers and int(r.headers['X-RateLimit-Limit']) == 0:
-            ratelimit_error = 'No more requests remaining'
-            if auth is None:
-                ratelimit_error = ratelimit_error + ', specify username/password or token to raise your github ratelimit'
-            errors.append(ratelimit_error)
-        if int(r.getcode()) != 200:
+        request = _construct_request(per_page, page, query_args, template, auth)  # noqa
+        r, errors = _get_response(request, template)
+
+        status_code = int(r.getcode())
+        if status_code != 200:
+            template = 'API request returned HTTP {0}: {1}'
+            errors.append(template.format(status_code, r.reason))
             log_error(errors)

         response = json.loads(r.read())
@@ -167,16 +283,108 @@ def retrieve_data(args, template, query_args=None, single_request=False):
     return data
def get_query_args(query_args=None):
if not query_args:
query_args = {}
return query_args
def _get_response(request, template):
retry_timeout = 3
errors = []
# We'll make requests in a loop so we can
# delay and retry in the case of rate-limiting
while True:
should_continue = False
try:
r = urllib2.urlopen(request)
except urllib2.HTTPError as exc:
errors, should_continue = _request_http_error(exc, auth, errors) # noqa
except urllib2.URLError:
should_continue = _request_url_error(template, retry_timeout)
if should_continue:
continue
break
return r, errors
def _construct_request(per_page, page, query_args, template, auth):
querystring = urllib.urlencode(dict({
'per_page': per_page,
'page': page
}.items() + query_args.items()))
request = urllib2.Request(template + '?' + querystring)
if auth is not None:
request.add_header('Authorization', 'Basic ' + auth)
return request
def _request_http_error(exc, auth, errors):
# HTTPError behaves like a Response so we can
# check the status code and headers to see exactly
# what failed.
should_continue = False
headers = exc.headers
limit_remaining = int(headers.get('x-ratelimit-remaining', 0))
if exc.code == 403 and limit_remaining < 1:
# The X-RateLimit-Reset header includes a
# timestamp telling us when the limit will reset
# so we can calculate how long to wait rather
# than inefficiently polling:
gm_now = calendar.timegm(time.gmtime())
reset = int(headers.get('x-ratelimit-reset', 0)) or gm_now
# We'll never sleep for less than 10 seconds:
delta = max(10, reset - gm_now)
limit = headers.get('x-ratelimit-limit')
print('Exceeded rate limit of {} requests; waiting {} seconds to reset'.format(limit, delta), # noqa
file=sys.stderr)
ratelimit_error = 'No more requests remaining'
if auth is None:
ratelimit_error += '; authenticate to raise your GitHub rate limit' # noqa
errors.append(ratelimit_error)
time.sleep(delta)
should_continue = True
return errors, should_continue
def _request_url_error(template, retry_timeout):
# Incase of a connection timing out, we can retry a few time
# But we won't crash and not back-up the rest now
log_info('{} timed out'.format(template))
retry_timeout -= 1
if retry_timeout >= 0:
return True
log_error('{} timed out to much, skipping!')
return False
 def retrieve_repositories(args):
     log_info('Retrieving repositories')
     single_request = False
-    template = 'https://api.github.com/users/{0}/repos'.format(args.user)
+    template = 'https://{0}/users/{1}/repos'.format(
+        get_github_api_host(args),
+        args.user)
     if args.organization:
-        template = 'https://api.github.com/orgs/{0}/repos'.format(args.user)
+        template = 'https://{0}/orgs/{1}/repos'.format(
+            get_github_api_host(args),
+            args.user)

     if args.repository:
         single_request = True
-        template = 'https://api.github.com/repos/{0}/{1}'.format(args.user, args.repository)
+        template = 'https://{0}/repos/{1}/{2}'.format(
+            get_github_api_host(args),
+            args.user,
+            args.repository)

     return retrieve_data(args, template, single_request=single_request)
@@ -196,7 +404,7 @@ def filter_repositories(args, repositories):
     if not args.private:
         repositories = [r for r in repositories if not r['private']]
     if languages:
-        repositories = [r for r in repositories if r['language'] and r['language'].lower() in languages]
+        repositories = [r for r in repositories if r['language'] and r['language'].lower() in languages]  # noqa
     if name_regex:
         repositories = [r for r in repositories if name_regex.match(r['name'])]
@@ -205,101 +413,237 @@ def filter_repositories(args, repositories):
 def backup_repositories(args, output_directory, repositories):
     log_info('Backing up repositories')
-    issue_template = "https://api.github.com/repos"
-    wiki_template = "git@github.com:{0}.wiki.git"
-    issue_states = ['open', 'closed']
+    repos_template = 'https://{0}/repos'.format(get_github_api_host(args))
     for repository in repositories:
         backup_cwd = os.path.join(output_directory, 'repositories')
         repo_cwd = os.path.join(backup_cwd, repository['name'])
+        repo_dir = os.path.join(repo_cwd, 'repository')
+
+        if args.prefer_ssh:
+            repo_url = repository['ssh_url']
+        else:
+            repo_url = repository['clone_url']

         if args.include_repository or args.include_everything:
-            mkdir_p(backup_cwd, repo_cwd)
-            exists = os.path.isdir('{0}/repository/.git'.format(repo_cwd))
-            if args.skip_existing and exists:
-                continue
-            if exists:
-                log_info('Updating {0} repository'.format(repository['full_name']))
-                git_command = ["git", "pull", 'origin', 'master']
-                logging_subprocess(git_command, logger=None, cwd=os.path.join(repo_cwd, 'repository'))
-            else:
-                log_info('Cloning {0} repository'.format(repository['full_name']))
-                git_command = ["git", "clone", repository['clone_url'], 'repository']
-                logging_subprocess(git_command, logger=None, cwd=repo_cwd)
-
-        if repository['has_wiki'] and (args.include_wiki or args.include_everything):
-            mkdir_p(backup_cwd, repo_cwd)
-            exists = os.path.isdir('{0}/wiki/.git'.format(repo_cwd))
-            if args.skip_existing and exists:
-                continue
-            if exists:
-                log_info('Updating {0} wiki'.format(repository['full_name']))
-                git_command = ["git", "pull", 'origin', 'master']
-                logging_subprocess(git_command, logger=None, cwd=os.path.join(repo_cwd, 'wiki'))
-            else:
-                log_info('Cloning {0} wiki'.format(repository['full_name']))
-                git_command = ["git", "clone", wiki_template.format(repository['full_name']), 'wiki']
-                logging_subprocess(git_command, logger=None, cwd=repo_cwd)
+            fetch_repository(repository['name'],
+                             repo_url,
+                             repo_dir,
+                             skip_existing=args.skip_existing)
+
+        download_wiki = (args.include_wiki or args.include_everything)
+        if repository['has_wiki'] and download_wiki:
+            fetch_repository(repository['name'],
+                             repo_url.replace('.git', '.wiki.git'),
+                             os.path.join(repo_cwd, 'wiki'),
+                             skip_existing=args.skip_existing)

         if args.include_issues or args.include_everything:
-            if args.skip_existing and os.path.isdir('{0}/issues/.git'.format(repo_cwd)):
-                continue
-            log_info('Retrieving {0} issues'.format(repository['full_name']))
-            issue_cwd = os.path.join(repo_cwd, 'issues')
-            mkdir_p(backup_cwd, repo_cwd, issue_cwd)
-            issues = {}
-            _issue_template = '{0}/{1}/issues'.format(issue_template, repository['full_name'])
-            for issue_state in issue_states:
-                query_args = {
-                    'filter': 'all',
-                    'state': issue_state
-                }
-                _issues = retrieve_data(args, _issue_template, query_args=query_args)
-                for issue in _issues:
-                    issues[issue['number']] = issue
-
-            log_info('Saving {0} issues to disk'.format(len(issues.keys())))
-            for number, issue in issues.iteritems():
-                comments_template = _issue_template + '/{0}/comments'
-                events_template = _issue_template + '/{0}/events'
-                if args.include_issue_comments or args.include_everything:
-                    issues[number]['comment_data'] = retrieve_data(args, comments_template.format(number))
-                if args.include_issue_events or args.include_everything:
-                    issues[number]['event_data'] = retrieve_data(args, events_template.format(number))
-                with open('{0}/{1}.json'.format(issue_cwd, number), 'w') as issue_file:
-                    json.dump(issue, issue_file, sort_keys=True, indent=4, separators=(',', ': '))
+            backup_issues(args, repo_cwd, repository, repos_template)
+
+        if args.include_pulls or args.include_everything:
+            backup_pulls(args, repo_cwd, repository, repos_template)
+
+        if args.include_milestones or args.include_everything:
+            backup_milestones(args, repo_cwd, repository, repos_template)
+
+        if args.include_labels or args.include_everything:
+            backup_labels(args, repo_cwd, repository, repos_template)
+
+
+def backup_issues(args, repo_cwd, repository, repos_template):
+    has_issues_dir = os.path.isdir('{0}/issues/.git'.format(repo_cwd))
+    if args.skip_existing and has_issues_dir:
+        return
+
+    log_info('Retrieving {0} issues'.format(repository['full_name']))
+    issue_cwd = os.path.join(repo_cwd, 'issues')
+    mkdir_p(repo_cwd, issue_cwd)
issues = {}
_issue_template = '{0}/{1}/issues'.format(repos_template,
repository['full_name'])
issue_states = ['open', 'closed']
for issue_state in issue_states:
query_args = {
'filter': 'all',
'state': issue_state
}
_issues = retrieve_data(args,
_issue_template,
query_args=query_args)
for issue in _issues:
issues[issue['number']] = issue
log_info('Saving {0} issues to disk'.format(len(issues.keys())))
comments_template = _issue_template + '/{0}/comments'
events_template = _issue_template + '/{0}/events'
for number, issue in issues.iteritems():
if args.include_issue_comments or args.include_everything:
template = comments_template.format(number)
issues[number]['comment_data'] = retrieve_data(args, template)
if args.include_issue_events or args.include_everything:
template = events_template.format(number)
issues[number]['event_data'] = retrieve_data(args, template)
issue_file = '{0}/{1}.json'.format(issue_cwd, number)
with codecs.open(issue_file, 'w', encoding='utf-8') as f:
json_dump(issue, f)
def backup_pulls(args, repo_cwd, repository, repos_template):
has_pulls_dir = os.path.isdir('{0}/pulls/.git'.format(repo_cwd))
if args.skip_existing and has_pulls_dir:
return
log_info('Retrieving {0} pull requests'.format(repository['full_name'])) # noqa
pulls_cwd = os.path.join(repo_cwd, 'pulls')
mkdir_p(repo_cwd, pulls_cwd)
pulls = {}
_pulls_template = '{0}/{1}/pulls'.format(repos_template,
repository['full_name'])
pull_states = ['open', 'closed']
for pull_state in pull_states:
query_args = {
'filter': 'all',
'state': pull_state
}
_pulls = retrieve_data(args,
_pulls_template,
query_args=query_args)
for pull in _pulls:
pulls[pull['number']] = pull
log_info('Saving {0} pull requests to disk'.format(len(pulls.keys())))
comments_template = _pulls_template + '/{0}/comments'
commits_template = _pulls_template + '/{0}/commits'
for number, pull in pulls.iteritems():
if args.include_pull_comments or args.include_everything:
template = comments_template.format(number)
pulls[number]['comment_data'] = retrieve_data(args, template)
if args.include_pull_commits or args.include_everything:
template = commits_template.format(number)
pulls[number]['commit_data'] = retrieve_data(args, template)
pull_file = '{0}/{1}.json'.format(pulls_cwd, number)
with codecs.open(pull_file, 'w', encoding='utf-8') as f:
json_dump(pull, f)
def backup_milestones(args, repo_cwd, repository, repos_template):
milestone_cwd = os.path.join(repo_cwd, 'milestones')
if args.skip_existing and os.path.isdir(milestone_cwd):
return
log_info('Retrieving {0} milestones'.format(repository['full_name']))
mkdir_p(repo_cwd, milestone_cwd)
template = '{0}/{1}/milestones'.format(repos_template,
repository['full_name'])
query_args = {
'state': 'all'
}
_milestones = retrieve_data(args, template, query_args=query_args)
milestones = {}
for milestone in _milestones:
milestones[milestone['number']] = milestone
log_info('Saving {0} milestones to disk'.format(len(milestones.keys())))
for number, milestone in milestones.iteritems():
milestone_file = '{0}/{1}.json'.format(milestone_cwd, number)
with codecs.open(milestone_file, 'w', encoding='utf-8') as f:
json_dump(milestone, f)
def backup_labels(args, repo_cwd, repository, repos_template):
label_cwd = os.path.join(repo_cwd, 'labels')
output_file = '{0}/labels.json'.format(label_cwd)
template = '{0}/{1}/labels'.format(repos_template,
repository['full_name'])
_backup_data(args,
'labels',
template,
output_file,
label_cwd)
def fetch_repository(name, remote_url, local_dir, skip_existing=False):
clone_exists = os.path.exists(os.path.join(local_dir, '.git'))
if clone_exists and skip_existing:
return
    initialized = subprocess.call('git ls-remote ' + remote_url,
                                  stdout=FNULL,
                                  stderr=FNULL,
                                  shell=True)
    if initialized == 128:
        log_info("Skipping {0} since it's not initialized".format(name))
return
if clone_exists:
log_info('Updating {0} in {1}'.format(name, local_dir))
git_command = ['git', 'fetch', '--all', '--tags', '--prune']
logging_subprocess(git_command, None, cwd=local_dir)
else:
log_info('Cloning {0} repository from {1} to {2}'.format(name,
remote_url,
local_dir))
git_command = ['git', 'clone', remote_url, local_dir]
logging_subprocess(git_command, None)
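The `git ls-remote` probe above treats exit status 128 as "remote not initialized", which is how empty repositories and wikis are skipped instead of raising errors from cron. A minimal standalone sketch of that check (the helper name `remote_initialized` and the use of Python 3's `subprocess.DEVNULL` instead of the script's `FNULL` handle are assumptions for illustration):

```python
import subprocess


def remote_initialized(remote_url):
    # `git ls-remote` exits with status 128 when the remote cannot be
    # read (e.g. an empty, uninitialized repository or wiki)
    status = subprocess.call(['git', 'ls-remote', remote_url],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    return status != 128
```

Checking the exit status up front keeps the later `git clone`/`git fetch` calls from failing noisily on repositories that have nothing to back up yet.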
def backup_account(args, output_directory):
    account_cwd = os.path.join(output_directory, 'account')

    if args.include_starred or args.include_everything:
        output_file = '{0}/starred.json'.format(account_cwd)
        template = "https://{0}/users/{1}/starred"
        template = template.format(get_github_api_host(args), args.user)
        _backup_data(args,
                     'starred repositories',
                     template,
                     output_file,
                     account_cwd)

    if args.include_watched or args.include_everything:
        output_file = '{0}/watched.json'.format(account_cwd)
        template = "https://{0}/users/{1}/subscriptions"
        template = template.format(get_github_api_host(args), args.user)
        _backup_data(args,
                     'watched repositories',
                     template,
                     output_file,
                     account_cwd)


def _backup_data(args, name, template, output_file, output_directory):
    skip_existing = args.skip_existing
    if not skip_existing or not os.path.exists(output_file):
        log_info('Retrieving {0} {1}'.format(args.user, name))
        mkdir_p(output_directory)
        data = retrieve_data(args, template)
        log_info('Writing {0} {1} to disk'.format(len(data), name))
        with codecs.open(output_file, 'w', encoding='utf-8') as f:
            json_dump(data, f)
def json_dump(data, output_file):
json.dump(data,
output_file,
ensure_ascii=False,
sort_keys=True,
indent=4,
separators=(',', ': '))
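The `ensure_ascii=False` flag in `json_dump` is what makes the "Preserve Unicode characters in the output file" change work: non-ASCII characters are written literally instead of as `\uXXXX` escapes, which is why the files are opened through `codecs.open(..., encoding='utf-8')`. A Python 3 sketch of the same helper against an in-memory buffer:

```python
import io
import json


def json_dump(data, output_file):
    json.dump(data,
              output_file,
              ensure_ascii=False,
              sort_keys=True,
              indent=4,
              separators=(',', ': '))


buf = io.StringIO()
json_dump({'title': 'r\u00e9sum\u00e9'}, buf)
# the accented characters appear literally in the output,
# not as \u00e9 escape sequences
print(buf.getvalue())
```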
@@ -307,7 +651,8 @@ def main():
    output_directory = os.path.realpath(args.output_directory)
    if not os.path.isdir(output_directory):
        log_error('Specified output directory is not a directory: {0}'.format(
            output_directory))
    log_info('Backing up user {0} to {1}'.format(args.user, output_directory))


@@ -1 +1 @@
-__version__ = '0.2.0'
+__version__ = '0.5.0'

release Executable file

@@ -0,0 +1,127 @@
#!/usr/bin/env bash
set -eo pipefail; [[ $RELEASE_TRACE ]] && set -x
PACKAGE_NAME='github-backup'
INIT_PACKAGE_NAME='github_backup'
PUBLIC="true"
# Colors
COLOR_OFF="\033[0m" # unsets color to term fg color
RED="\033[0;31m" # red
GREEN="\033[0;32m" # green
YELLOW="\033[0;33m" # yellow
MAGENTA="\033[0;35m" # magenta
CYAN="\033[0;36m" # cyan
# ensure wheel is available
pip install wheel > /dev/null
command -v gitchangelog >/dev/null 2>&1 || {
echo -e "${RED}WARNING: Missing gitchangelog binary, please run: pip install gitchangelog==2.2.0${COLOR_OFF}\n"
exit 1
}
command -v rst-lint > /dev/null || {
echo -e "${RED}WARNING: Missing rst-lint binary, please run: pip install restructuredtext_lint${COLOR_OFF}\n"
exit 1
}
if [[ "$@" != "major" ]] && [[ "$@" != "minor" ]] && [[ "$@" != "patch" ]]; then
echo -e "${RED}WARNING: Invalid release type, must specify 'major', 'minor', or 'patch'${COLOR_OFF}\n"
exit 1
fi
echo -e "\n${GREEN}STARTING RELEASE PROCESS${COLOR_OFF}\n"
set +e;
git status | grep "working directory clean" &> /dev/null
if [ ! $? -eq 0 ]; then # working directory is NOT clean
echo -e "${RED}WARNING: You have uncommitted changes, you may have forgotten something${COLOR_OFF}\n"
exit 1
fi
set -e;
echo -e "${YELLOW}--->${COLOR_OFF} Updating local copy"
git pull -q origin master
echo -e "${YELLOW}--->${COLOR_OFF} Retrieving release versions"
current_version=$(cat ${INIT_PACKAGE_NAME}/__init__.py |grep '__version__ ='|sed 's/[^0-9.]//g')
major=$(echo $current_version | awk '{split($0,a,"."); print a[1]}')
minor=$(echo $current_version | awk '{split($0,a,"."); print a[2]}')
patch=$(echo $current_version | awk '{split($0,a,"."); print a[3]}')
if [[ "$@" == "major" ]]; then
major=$(($major + 1));
minor="0"
patch="0"
elif [[ "$@" == "minor" ]]; then
minor=$(($minor + 1));
patch="0"
elif [[ "$@" == "patch" ]]; then
patch=$(($patch + 1));
fi
next_version="${major}.${minor}.${patch}"
echo -e "${YELLOW} >${COLOR_OFF} ${MAGENTA}${current_version}${COLOR_OFF} -> ${MAGENTA}${next_version}${COLOR_OFF}"
echo -e "${YELLOW}--->${COLOR_OFF} Ensuring readme passes lint checks (if this fails, run rst-lint)"
rst-lint README.rst > /dev/null
echo -e "${YELLOW}--->${COLOR_OFF} Creating necessary temp file"
tempfoo=$(basename $0)
TMPFILE=$(mktemp /tmp/${tempfoo}.XXXXXX) || {
echo -e "${RED}WARNING: Cannot create temp file using mktemp in /tmp dir ${COLOR_OFF}\n"
exit 1
}
find_this="__version__ = '$current_version'"
replace_with="__version__ = '$next_version'"
echo -e "${YELLOW}--->${COLOR_OFF} Updating ${INIT_PACKAGE_NAME}/__init__.py"
sed "s/$find_this/$replace_with/" ${INIT_PACKAGE_NAME}/__init__.py > $TMPFILE && mv $TMPFILE ${INIT_PACKAGE_NAME}/__init__.py
find_this="${PACKAGE_NAME}.git@$current_version"
replace_with="${PACKAGE_NAME}.git@$next_version"
echo -e "${YELLOW}--->${COLOR_OFF} Updating README.rst"
sed "s/$find_this/$replace_with/" README.rst > $TMPFILE && mv $TMPFILE README.rst
if [ -f docs/conf.py ]; then
echo -e "${YELLOW}--->${COLOR_OFF} Updating docs"
find_this="version = '${current_version}'"
replace_with="version = '${next_version}'"
sed "s/$find_this/$replace_with/" docs/conf.py > $TMPFILE && mv $TMPFILE docs/conf.py
find_this="release = '${current_version}'"
replace_with="release = '${next_version}'"
sed "s/$find_this/$replace_with/" docs/conf.py > $TMPFILE && mv $TMPFILE docs/conf.py
fi
echo -e "${YELLOW}--->${COLOR_OFF} Updating CHANGES.rst for new release"
version_header="$next_version ($(date +%F))"
set +e; dashes=$(yes '-'|head -n ${#version_header}|tr -d '\n') ; set -e
gitchangelog |sed "4s/.*/$version_header/"|sed "5s/.*/$dashes/" > $TMPFILE && mv $TMPFILE CHANGES.rst
echo -e "${YELLOW}--->${COLOR_OFF} Adding changed files to git"
git add CHANGES.rst README.rst ${INIT_PACKAGE_NAME}/__init__.py
if [ -f docs/conf.py ]; then git add docs/conf.py; fi
echo -e "${YELLOW}--->${COLOR_OFF} Creating release"
git commit -q -m "Release version $next_version"
echo -e "${YELLOW}--->${COLOR_OFF} Tagging release"
git tag -a $next_version -m "Release version $next_version"
echo -e "${YELLOW}--->${COLOR_OFF} Pushing release and tags to github"
git push -q origin master && git push -q --tags
if [[ "$PUBLIC" == "true" ]]; then
echo -e "${YELLOW}--->${COLOR_OFF} Creating python release"
cp README.rst README
python setup.py sdist bdist_wheel upload > /dev/null
rm README
fi
echo -e "\n${CYAN}RELEASED VERSION ${next_version}!${COLOR_OFF}\n"
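The awk/sed version arithmetic in the script above (split `major.minor.patch`, bump the requested component, zero everything to its right) can be sketched compactly in Python; the `bump` helper is hypothetical, not part of the release script:

```python
def bump(version, release_type):
    # split a 'major.minor.patch' string, bump the requested component,
    # and zero every component to its right (mirrors the script's logic)
    major, minor, patch = (int(part) for part in version.split('.'))
    if release_type == 'major':
        return '{0}.0.0'.format(major + 1)
    if release_type == 'minor':
        return '{0}.{1}.0'.format(major, minor + 1)
    if release_type == 'patch':
        return '{0}.{1}.{2}'.format(major, minor, patch + 1)
    raise ValueError("release type must be 'major', 'minor', or 'patch'")


print(bump('0.2.0', 'minor'))  # → 0.3.0
```

Parsing the components as integers also avoids the string-rollover trap (e.g. `0.4.9` → `0.4.10`, not `0.4.0`).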