Compare commits

..

27 Commits

Author SHA1 Message Date
Jose Diaz-Gonzalez
f0b28567b9 Release version 0.11.0 2016-10-26 14:14:00 -06:00
Jose Diaz-Gonzalez
77ede50b19 Merge pull request #52 from bjodah/fix-gh-51
Support --token file:///home/user/token.txt (fixes gh-51)
2016-10-26 14:13:35 -06:00
Björn Dahlgren
97e4fbbacb Support --token file:///home/user/token.txt (fixes gh-51) 2016-10-26 01:57:33 +02:00
Jose Diaz-Gonzalez
03604cc654 Merge pull request #48 from albertyw/python3
Support Python 3
2016-10-25 17:38:05 -06:00
Albert Wang
73a62fdee1 Fix some linting 2016-09-11 01:14:36 -07:00
Albert Wang
94e1d62ad5 Fix byte/string conversion for python 3 2016-09-11 01:14:31 -07:00
Albert Wang
54cef11ce7 Support python 3 2016-09-11 01:14:19 -07:00
Jose Diaz-Gonzalez
56397eba1c Merge pull request #46 from remram44/encode-password
Encode special characters in password
2016-09-06 14:31:35 -04:00
Remi Rampin
9f861efccf Encode special characters in password 2016-09-06 14:27:47 -04:00
Jose Diaz-Gonzalez
c1c9ce6dca Merge pull request #45 from remram44/cli-programname
Fix program name
2016-09-06 12:42:45 -04:00
Jose Diaz-Gonzalez
ab18d8aee0 Merge pull request #44 from remram44/readme-git-https
Don't install over insecure connection
2016-09-06 12:42:30 -04:00
Remi Rampin
9d7d98b19e Update README.rst 2016-09-06 12:28:44 -04:00
Remi Rampin
0233bff696 Don't pretend program name is "Github Backup" 2016-09-06 12:24:51 -04:00
Remi Rampin
6154ceda15 Don't install over insecure connection
The git:// protocol is unauthenticated and unencrypted, and no longer advertised by GitHub. Using HTTPS shouldn't impact performance.
2016-09-06 12:11:29 -04:00
Jose Diaz-Gonzalez
9023052e9c Release version 0.10.3 2016-08-20 20:50:29 -04:00
Jose Diaz-Gonzalez
874c235ba5 Merge pull request #30 from jonasrmichel/master
Fixes #29
2016-08-20 20:50:25 -04:00
Jose Diaz-Gonzalez
b7b234d8a5 Release version 0.10.2 2016-08-20 20:49:46 -04:00
Jose Diaz-Gonzalez
ed160eb0ca Add a note regarding git version requirement
Closes #37
2016-08-20 20:49:42 -04:00
Jose Diaz-Gonzalez
1d11d62b73 Release version 0.10.1 2016-08-20 20:45:27 -04:00
Jose Diaz-Gonzalez
9e1cba9817 Release version 0.10.0 2016-08-18 14:20:46 -04:00
Jose Diaz-Gonzalez
3859a80b7a Merge pull request #42 from robertwb/master
Implement incremental updates
2016-08-18 14:20:05 -04:00
Robert Bradshaw
8c12d54898 Implement incremental updates
Guarded with an --incremental flag.

Stores the time of the last update and only downloads issue and
pull request data since this time.  All other data is relatively
small (likely fetched with a single request) and so is simply
re-populated from scratch as before.
2016-08-17 21:31:59 -07:00
Jose Diaz-Gonzalez
b6b6605acd Release version 0.9.0 2016-03-29 13:23:45 -04:00
Jose Diaz-Gonzalez
ff5e0aa89c Merge pull request #36 from zlabjp/fix-cloning-private-repos
Fix cloning private repos with basic auth or token
2016-03-29 13:21:57 -04:00
Kazuki Suda
79726c360d Fix cloning private repos with basic auth or token 2016-03-29 15:23:54 +09:00
Jonas Michel
1e5a90486c Fixes #29
Reporting an error when the user's rate limit is exceeded causes
the script to terminate after resuming execution from a rate limit
sleep. Instead of generating an explicit error we just want to
inform the user that the script is going to sleep until their rate
limit count resets.
2016-01-20 14:48:02 -06:00
Jonas Michel
9b74aff20b Fixes #29
The errors list was not being cleared out after resuming a backup
from a rate limit sleep. When the backup was resumed, the non-empty
errors list caused the backup to quit after the next `retrieve_data`
request.
2016-01-17 11:10:28 -06:00
4 changed files with 203 additions and 55 deletions

View File

@@ -1,6 +1,71 @@
Changelog
=========
0.11.0 (2016-10-26)
-------------------
- Support --token file:///home/user/token.txt (fixes gh-51) [Björn
Dahlgren]
- Fix some linting. [Albert Wang]
- Fix byte/string conversion for python 3. [Albert Wang]
- Support python 3. [Albert Wang]
- Encode special characters in password. [Remi Rampin]
- Don't pretend program name is "Github Backup" [Remi Rampin]
- Don't install over insecure connection. [Remi Rampin]
The git:// protocol is unauthenticated and unencrypted, and no longer advertised by GitHub. Using HTTPS shouldn't impact performance.
0.10.3 (2016-08-21)
-------------------
- Fixes #29. [Jonas Michel]
Reporting an error when the user's rate limit is exceeded causes
the script to terminate after resuming execution from a rate limit
sleep. Instead of generating an explicit error we just want to
inform the user that the script is going to sleep until their rate
limit count resets.
- Fixes #29. [Jonas Michel]
The errors list was not being cleared out after resuming a backup
from a rate limit sleep. When the backup was resumed, the non-empty
errors list caused the backup to quit after the next `retrieve_data`
request.
0.10.2 (2016-08-21)
-------------------
- Add a note regarding git version requirement. [Jose Diaz-Gonzalez]
Closes #37
0.10.0 (2016-08-18)
-------------------
- Implement incremental updates. [Robert Bradshaw]
Guarded with an --incremental flag.
Stores the time of the last update and only downloads issue and
pull request data since this time. All other data is relatively
small (likely fetched with a single request) and so is simply
re-populated from scratch as before.
0.9.0 (2016-03-29)
------------------
- Fix cloning private repos with basic auth or token. [Kazuki Suda]
0.8.0 (2016-02-14)
------------------

View File

@@ -4,6 +4,11 @@ github-backup
backup a github user or organization
Requirements
============
- GIT 1.9+
Installation
============
@@ -13,22 +18,22 @@ Using PIP via PyPI::
Using PIP via Github::
pip install git+git://github.com/josegonzalez/python-github-backup.git#egg=github-backup
pip install git+https://github.com/josegonzalez/python-github-backup.git#egg=github-backup
Usage
=====
CLI Usage is as follows::
Github Backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN]
[-o OUTPUT_DIRECTORY] [--starred] [--watched] [--all]
[--issues] [--issue-comments] [--issue-events] [--pulls]
[--pull-comments] [--pull-commits] [--labels] [--hooks]
[--milestones] [--repositories] [--wikis]
[--skip-existing] [-L [LANGUAGES [LANGUAGES ...]]]
[-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY]
[-P] [-F] [--prefer-ssh] [-v]
USER
github-backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN]
[-o OUTPUT_DIRECTORY] [-i] [--starred] [--watched]
[--all] [--issues] [--issue-comments] [--issue-events]
[--pulls] [--pull-comments] [--pull-commits] [--labels]
[--hooks] [--milestones] [--repositories] [--wikis]
[--skip-existing] [-L [LANGUAGES [LANGUAGES ...]]]
[-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY]
[-P] [-F] [--prefer-ssh] [-v]
USER
Backup a github account
@@ -46,6 +51,7 @@ CLI Usage is as follows::
personal access or OAuth token
-o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
directory at which to backup the repositories
-i, --incremental incremental backup
--starred include starred repositories in backup
--watched include watched repositories in backup
--all include everything in backup

View File

@@ -16,10 +16,25 @@ import select
import subprocess
import sys
import time
import urllib
import urllib2
try:
# python 3
from urllib.parse import urlparse
from urllib.parse import quote as urlquote
from urllib.parse import urlencode
from urllib.error import HTTPError, URLError
from urllib.request import urlopen
from urllib.request import Request
except ImportError:
# python 2
from urlparse import urlparse
from urllib import quote as urlquote
from urllib import urlencode
from urllib2 import HTTPError, URLError
from urllib2 import urlopen
from urllib2 import Request
from github_backup import __version__
__version__='asdf'
# from github_backup import __version__
FNULL = open(os.devnull, 'w')
@@ -79,8 +94,8 @@ def logging_subprocess(popenargs,
rc = child.wait()
if rc != 0:
print(u'{} returned {}:'.format(popenargs[0], rc), file=sys.stderr)
print('\t', u' '.join(popenargs), file=sys.stderr)
print('{} returned {}:'.format(popenargs[0], rc), file=sys.stderr)
print('\t', ' '.join(popenargs), file=sys.stderr)
return rc
@@ -96,9 +111,19 @@ def mkdir_p(*args):
raise
def mask_password(url, secret='*****'):
parsed = urlparse(url)
if not parsed.password:
return url
elif parsed.password == 'x-oauth-basic':
return url.replace(parsed.username, secret)
return url.replace(parsed.password, secret)
def parse_args():
parser = argparse.ArgumentParser(description='Backup a github account',
prog='Github Backup')
parser = argparse.ArgumentParser(description='Backup a github account')
parser.add_argument('user',
metavar='USER',
type=str,
@@ -116,12 +141,17 @@ def parse_args():
parser.add_argument('-t',
'--token',
dest='token',
help='personal access or OAuth token')
help='personal access or OAuth token, or path to token (file://...)')
parser.add_argument('-o',
'--output-directory',
default='.',
dest='output_directory',
help='directory at which to backup the repositories')
parser.add_argument('-i',
'--incremental',
action='store_true',
dest='incremental',
help='incremental backup')
parser.add_argument('--starred',
action='store_true',
dest='include_starred',
@@ -221,19 +251,33 @@ def parse_args():
return parser.parse_args()
def get_auth(args):
if args.token:
return base64.b64encode(args.token + ':' + 'x-oauth-basic')
def get_auth(args, encode=True):
auth = None
if args.username:
if args.token:
_path_specifier = 'file://'
if args.token.startswith(_path_specifier):
args.token = open(args.token[len(_path_specifier):],
'rt').readline().strip()
auth = args.token + ':' + 'x-oauth-basic'
elif args.username:
if not args.password:
args.password = getpass.getpass()
return base64.b64encode(args.username + ':' + args.password)
if args.password:
if encode:
password = args.password
else:
password = urlquote(args.password)
auth = args.username + ':' + password
elif args.password:
log_error('You must specify a username for basic auth')
return None
if not auth:
return None
if not encode:
return auth
return base64.b64encode(auth.encode('ascii'))
def get_github_api_host(args):
@@ -245,7 +289,7 @@ def get_github_api_host(args):
return host
def get_github_ssh_host(args):
def get_github_host(args):
if args.github_host:
host = args.github_host
else:
@@ -254,6 +298,23 @@ def get_github_ssh_host(args):
return host
def get_github_repo_url(args, repository):
if args.prefer_ssh:
return repository['ssh_url']
auth = get_auth(args, False)
if auth:
repo_url = 'https://{0}@{1}/{2}/{3}.git'.format(
auth,
get_github_host(args),
args.user,
repository['name'])
else:
repo_url = repository['clone_url']
return repo_url
def retrieve_data(args, template, query_args=None, single_request=False):
auth = get_auth(args)
query_args = get_query_args(query_args)
@@ -273,7 +334,7 @@ def retrieve_data(args, template, query_args=None, single_request=False):
errors.append(template.format(status_code, r.reason))
log_error(errors)
response = json.loads(r.read())
response = json.loads(r.read().decode('utf-8'))
if len(errors) == 0:
if type(response) == list:
data.extend(response)
@@ -305,11 +366,11 @@ def _get_response(request, auth, template):
while True:
should_continue = False
try:
r = urllib2.urlopen(request)
except urllib2.HTTPError as exc:
r = urlopen(request)
except HTTPError as exc:
errors, should_continue = _request_http_error(exc, auth, errors) # noqa
r = exc
except urllib2.URLError:
except URLError:
should_continue = _request_url_error(template, retry_timeout)
if not should_continue:
raise
@@ -322,14 +383,14 @@ def _get_response(request, auth, template):
def _construct_request(per_page, page, query_args, template, auth):
querystring = urllib.urlencode(dict({
querystring = urlencode(dict(list({
'per_page': per_page,
'page': page
}.items() + query_args.items()))
}.items()) + list(query_args.items())))
request = urllib2.Request(template + '?' + querystring)
request = Request(template + '?' + querystring)
if auth is not None:
request.add_header('Authorization', 'Basic ' + auth)
request.add_header('Authorization', 'Basic '.encode('ascii') + auth)
return request
@@ -356,10 +417,9 @@ def _request_http_error(exc, auth, errors):
print('Exceeded rate limit of {} requests; waiting {} seconds to reset'.format(limit, delta), # noqa
file=sys.stderr)
ratelimit_error = 'No more requests remaining'
if auth is None:
ratelimit_error += '; authenticate to raise your GitHub rate limit' # noqa
errors.append(ratelimit_error)
print('Hint: Authenticate to raise your GitHub rate limit',
file=sys.stderr)
time.sleep(delta)
should_continue = True
@@ -428,15 +488,21 @@ def backup_repositories(args, output_directory, repositories):
log_info('Backing up repositories')
repos_template = 'https://{0}/repos'.format(get_github_api_host(args))
if args.incremental:
last_update = max(repository['updated_at'] for repository in repositories)
last_update_path = os.path.join(output_directory, 'last_update')
if os.path.exists(last_update_path):
args.since = open(last_update_path).read().strip()
else:
args.since = None
else:
args.since = None
for repository in repositories:
backup_cwd = os.path.join(output_directory, 'repositories')
repo_cwd = os.path.join(backup_cwd, repository['name'])
repo_dir = os.path.join(repo_cwd, 'repository')
if args.prefer_ssh:
repo_url = repository['ssh_url']
else:
repo_url = repository['clone_url']
repo_url = get_github_repo_url(args, repository)
if args.include_repository or args.include_everything:
fetch_repository(repository['name'],
@@ -466,6 +532,9 @@ def backup_repositories(args, output_directory, repositories):
if args.include_hooks or args.include_everything:
backup_hooks(args, repo_cwd, repository, repos_template)
if args.incremental:
open(last_update_path, 'w').write(last_update)
def backup_issues(args, repo_cwd, repository, repos_template):
has_issues_dir = os.path.isdir('{0}/issues/.git'.format(repo_cwd))
@@ -488,6 +557,8 @@ def backup_issues(args, repo_cwd, repository, repos_template):
'filter': 'all',
'state': issue_state
}
if args.since:
query_args['since'] = args.since
_issues = retrieve_data(args,
_issue_template,
@@ -503,10 +574,10 @@ def backup_issues(args, repo_cwd, repository, repos_template):
if issues_skipped:
issues_skipped_message = ' (skipped {0} pull requests)'.format(issues_skipped)
log_info('Saving {0} issues to disk{1}'.format(len(issues.keys()), issues_skipped_message))
log_info('Saving {0} issues to disk{1}'.format(len(list(issues.keys())), issues_skipped_message))
comments_template = _issue_template + '/{0}/comments'
events_template = _issue_template + '/{0}/events'
for number, issue in issues.iteritems():
for number, issue in list(issues.items()):
if args.include_issue_comments or args.include_everything:
template = comments_template.format(number)
issues[number]['comment_data'] = retrieve_data(args, template)
@@ -536,19 +607,23 @@ def backup_pulls(args, repo_cwd, repository, repos_template):
for pull_state in pull_states:
query_args = {
'filter': 'all',
'state': pull_state
'state': pull_state,
'sort': 'updated',
'direction': 'desc',
}
# It'd be nice to be able to apply the args.since filter here...
_pulls = retrieve_data(args,
_pulls_template,
query_args=query_args)
for pull in _pulls:
pulls[pull['number']] = pull
if not args.since or pull['updated_at'] >= args.since:
pulls[pull['number']] = pull
log_info('Saving {0} pull requests to disk'.format(len(pulls.keys())))
log_info('Saving {0} pull requests to disk'.format(len(list(pulls.keys()))))
comments_template = _pulls_template + '/{0}/comments'
commits_template = _pulls_template + '/{0}/commits'
for number, pull in pulls.iteritems():
for number, pull in list(pulls.items()):
if args.include_pull_comments or args.include_everything:
template = comments_template.format(number)
pulls[number]['comment_data'] = retrieve_data(args, template)
@@ -582,8 +657,8 @@ def backup_milestones(args, repo_cwd, repository, repos_template):
for milestone in _milestones:
milestones[milestone['number']] = milestone
log_info('Saving {0} milestones to disk'.format(len(milestones.keys())))
for number, milestone in milestones.iteritems():
log_info('Saving {0} milestones to disk'.format(len(list(milestones.keys()))))
for number, milestone in list(milestones.items()):
milestone_file = '{0}/{1}.json'.format(milestone_cwd, number)
with codecs.open(milestone_file, 'w', encoding='utf-8') as f:
json_dump(milestone, f)
@@ -609,7 +684,7 @@ def backup_hooks(args, repo_cwd, repository, repos_template):
hook_cwd = os.path.join(repo_cwd, 'hooks')
output_file = '{0}/hooks.json'.format(hook_cwd)
template = '{0}/{1}/hooks'.format(repos_template,
repository['full_name'])
repository['full_name'])
try:
_backup_data(args,
'hooks',
@@ -626,12 +701,14 @@ def fetch_repository(name, remote_url, local_dir, skip_existing=False):
if clone_exists and skip_existing:
return
masked_remote_url = mask_password(remote_url)
initalized = subprocess.call('git ls-remote ' + remote_url,
stdout=FNULL,
stderr=FNULL,
shell=True)
if initalized == 128:
log_info("Skipping {0} ({1}) since it's not initalized".format(name, remote_url))
log_info("Skipping {0} ({1}) since it's not initalized".format(name, masked_remote_url))
return
if clone_exists:
@@ -644,7 +721,7 @@ def fetch_repository(name, remote_url, local_dir, skip_existing=False):
logging_subprocess(git_command, None, cwd=local_dir)
else:
log_info('Cloning {0} repository from {1} to {2}'.format(name,
remote_url,
masked_remote_url,
local_dir))
git_command = ['git', 'clone', remote_url, local_dir]
logging_subprocess(git_command, None)

View File

@@ -1 +1 @@
__version__ = '0.8.0'
__version__ = '0.11.0'