Compare commits


27 Commits

Author SHA1 Message Date
GitHub Action
8b7512c8d8 Release version 0.52.0 2025-11-28 23:39:09 +00:00
Jose Diaz-Gonzalez
995b7ede6c Merge pull request #454 from Iamrodos/http-451
Skip DMCA'd repos which return a 451 response
2025-11-28 18:38:32 -05:00
Rodos
7840528fe2 Skip DMCA'd repos which return a 451 response
Log a warning and the link to the DMCA notice. Continue backing up
other repositories instead of crashing.

Closes #163
2025-11-29 09:52:02 +11:00
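The approach this commit describes — detect the 451 status, pull the DMCA notice link out of the response body, and raise so the caller can skip the repository — can be sketched as follows. The class and function names here are illustrative, not necessarily the project's actual API:

```python
# Minimal sketch of skipping a DMCA-blocked repository (HTTP 451).
# GitHub's 451 payload includes a "block" object linking to the DMCA notice.
import json


class RepositoryUnavailableError(Exception):
    """Raised when a repo is unavailable for legal reasons (e.g., DMCA takedown)."""

    def __init__(self, message, dmca_url=None):
        super().__init__(message)
        self.dmca_url = dmca_url


def check_for_dmca(status_code, body):
    """Raise RepositoryUnavailableError on HTTP 451; otherwise return None."""
    if status_code != 451:
        return None
    dmca_url = None
    try:
        # Extract the notice URL if the body is the expected JSON shape
        dmca_url = json.loads(body).get("block", {}).get("html_url")
    except Exception:
        pass  # The body may not be JSON; the URL is optional
    raise RepositoryUnavailableError(
        "Repository unavailable due to legal reasons (HTTP 451)", dmca_url=dmca_url
    )
```

The caller catches `RepositoryUnavailableError`, logs a warning with the notice link, and `continue`s to the next repository instead of crashing.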
Jose Diaz-Gonzalez
6fb0d86977 Merge pull request #453 from josegonzalez/dependabot/pip/python-packages-42260fba7a
chore(deps): bump restructuredtext-lint from 1.4.0 to 2.0.2 in the python-packages group
2025-11-24 15:07:08 -05:00
dependabot[bot]
9f6b401171 chore(deps): bump restructuredtext-lint in the python-packages group
Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint).


Updates `restructuredtext-lint` from 1.4.0 to 2.0.2
- [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2)

---
updated-dependencies:
- dependency-name: restructuredtext-lint
  dependency-version: 2.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-24 14:58:52 +00:00
Jose Diaz-Gonzalez
bf638f7aea Merge pull request #452 from josegonzalez/dependabot/github_actions/actions/checkout-6
chore(deps): bump actions/checkout from 5 to 6
2025-11-24 04:42:52 -05:00
dependabot[bot]
c3855a94f1 chore(deps): bump actions/checkout from 5 to 6
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-24 04:09:25 +00:00
Jose Diaz-Gonzalez
c3f4bfde0d Merge pull request #451 from josegonzalez/dependabot/pip/python-packages-63544ef561
chore(deps): bump the python-packages group with 3 updates
2025-11-18 11:44:02 -05:00
dependabot[bot]
d3edef0622 chore(deps): bump the python-packages group with 3 updates
Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring).


Updates `click` from 8.3.0 to 8.3.1
- [Release notes](https://github.com/pallets/click/releases)
- [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1)

Updates `pytest` from 8.3.3 to 9.0.1
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1)

Updates `keyring` from 25.6.0 to 25.7.0
- [Release notes](https://github.com/jaraco/keyring/releases)
- [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst)
- [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0)

---
updated-dependencies:
- dependency-name: click
  dependency-version: 8.3.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
- dependency-name: pytest
  dependency-version: 9.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: python-packages
- dependency-name: keyring
  dependency-version: 25.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-18 13:24:06 +00:00
GitHub Action
9ef496efad Release version 0.51.3 2025-11-18 06:55:36 +00:00
Jose Diaz-Gonzalez
42bfe6f79d Merge pull request #450 from Iamrodos/test/add-pagination-tests
test: Add pagination tests for cursor and page-based Link headers
2025-11-18 01:54:54 -05:00
Rodos
5af522a348 test: Add pagination tests for cursor and page-based Link headers 2025-11-17 17:14:29 +11:00
Jose Diaz-Gonzalez
6dfba7a783 Merge pull request #449 from 0x2b3bfa0/patch-1
Use cursor based pagination
2025-11-17 00:31:25 -05:00
Helio Machado
7551829677 Use cursor based pagination 2025-11-17 02:09:29 +01:00
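Cursor-based pagination follows the `rel="next"` URL from the `Link` response header instead of incrementing a `page` query parameter, so it works for both page-number and cursor-style APIs. A minimal sketch (function name illustrative):

```python
# Extract the rel="next" URL from a GitHub-style Link header, e.g.
#   <https://api.github.com/...?per_page=100&after=cursor>; rel="next"
def next_page_url(link_header):
    """Return the rel="next" URL from a Link header, or None if absent."""
    if not link_header:
        return None
    for link in link_header.split(","):
        if 'rel="next"' in link:
            # The URL sits between the angle brackets of each entry
            return link[link.find("<") + 1 : link.find(">")]
    return None
```

When `next_page_url` returns None, the last page has been reached and the loop stops.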
GitHub Action
72d35a9b94 Release version 0.51.2 2025-11-16 23:55:36 +00:00
Jose Diaz-Gonzalez
3eae9d78ed Merge pull request #447 from Iamrodos/master
fix: Improve CA certificate detection with fallback chain
2025-11-16 18:54:58 -05:00
Rodos
90ba839c7d fix: Improve CA certificate detection with fallback chain
The previous implementation incorrectly assumed empty get_ca_certs()
meant broken SSL, causing false failures in GitHub Codespaces and other
directory-based cert systems where certificates exist but aren't pre-loaded.
It would then attempt to import certifi as a workaround, but certifi wasn't
listed in requirements.txt, causing the fallback to fail with ImportError
even though the system certificates would have worked fine.

This commit replaces the naive check with a layered fallback approach that
checks multiple certificate sources. First it checks for pre-loaded system
certs (file-based systems). Then it verifies system cert paths exist
(directory-based systems like Ubuntu/Debian/Codespaces). Finally it attempts
to use certifi as an optional fallback only if needed.

This approach eliminates hard dependencies (certifi is now optional), works
in GitHub Codespaces without any setup, and fails gracefully with clear hints
for resolution when SSL is actually broken rather than failing with
ModuleNotFoundError.

Fixes #444
2025-11-16 16:33:10 +11:00
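The three layers described above can be condensed into one function; this is a sketch of the commit's approach, with illustrative names:

```python
# Layered CA-certificate detection: pre-loaded certs, then on-disk cert
# paths (directory-based stores), then certifi as an optional fallback.
import os
import ssl


def build_https_context():
    ctx = ssl.create_default_context()
    if ctx.get_ca_certs():
        # Layer 1: certs already loaded from a file-based system store
        return ctx
    paths = ssl.get_default_verify_paths()
    if (paths.cafile and os.path.exists(paths.cafile)) or (
        paths.capath and os.path.exists(paths.capath)
    ):
        # Layer 2: cert paths exist on disk (Ubuntu/Debian/Codespaces);
        # they are lazy-loaded on first use
        return ctx
    try:
        # Layer 3: optional certifi fallback
        import certifi

        return ssl.create_default_context(cafile=certifi.where())
    except ImportError:
        # All layers failed: fail with an actionable hint, not ModuleNotFoundError
        raise SystemExit(
            "No CA certificates found; try `pip install certifi` "
            "or install your distro's ca-certificates package"
        )
```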
GitHub Action
1ec0820936 Release version 0.51.1 2025-11-16 02:01:39 +00:00
Jose Diaz-Gonzalez
ca463e5cd4 Merge pull request #446 from josegonzalez/dependabot/pip/python-packages-4ff811fbf7
chore(deps): bump certifi from 2025.10.5 to 2025.11.12 in the python-packages group
2025-11-15 21:01:01 -05:00
Jose Diaz-Gonzalez
1750d0eff1 Merge pull request #448 from Iamrodos/fix/attachment-duplicate-downloads
fix: Prevent duplicate attachment downloads (with tests)
2025-11-15 21:00:00 -05:00
Rodos
e4d1c78993 test: Add pytest infrastructure and attachment tests
In making my last fix to attachments, I found it challenging not
having tests to ensure there was no regression.

Added pytest with minimal setup and isolated configuration. Created
a separate test workflow to keep tests isolated from linting.

Tests cover the key elements of the attachment logic:
- URL extraction from issue bodies
- Filename extraction from different URL types
- Filename collision resolution
- Manifest duplicate prevention
2025-11-14 10:28:30 +11:00
Rodos
7a9455db88 fix: Prevent duplicate attachment downloads
Fixes bug where attachments were downloaded multiple times with
incremented filenames (file.mov, file_1.mov, file_2.mov) when
running backups without --skip-existing flag.

I should not have used the --skip-existing flag for attachments,
it did not do what I thought it did.

The correct approach is to always use the manifest to guide what
has already been downloaded and what now needs to be done.
2025-11-14 10:28:30 +11:00
dependabot[bot]
a98ff7f23d chore(deps): bump certifi in the python-packages group
Bumps the python-packages group with 1 update: [certifi](https://github.com/certifi/python-certifi).


Updates `certifi` from 2025.10.5 to 2025.11.12
- [Commits](https://github.com/certifi/python-certifi/compare/2025.10.05...2025.11.12)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2025.11.12
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-12 13:11:06 +00:00
Jose Diaz-Gonzalez
7b78f06a68 Merge pull request #445 from josegonzalez/dependabot/pip/python-packages-499fb03faa
chore(deps): bump black from 25.9.0 to 25.11.0 in the python-packages group
2025-11-10 12:45:25 -05:00
dependabot[bot]
56db3ff0e8 chore(deps): bump black in the python-packages group
Bumps the python-packages group with 1 update: [black](https://github.com/psf/black).


Updates `black` from 25.9.0 to 25.11.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.9.0...25.11.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.11.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-10 13:59:47 +00:00
Jose Diaz-Gonzalez
5c9c20f6ee Merge pull request #443 from josegonzalez/dependabot/pip/python-packages-7fb8ba35da
chore(deps): bump docutils from 0.22.2 to 0.22.3 in the python-packages group
2025-11-07 15:56:55 -05:00
dependabot[bot]
c8c585cbb5 chore(deps): bump docutils in the python-packages group
Bumps the python-packages group with 1 update: [docutils](https://github.com/rtfd/recommonmark).


Updates `docutils` from 0.22.2 to 0.22.3
- [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rtfd/recommonmark/commits)

---
updated-dependencies:
- dependency-name: docutils
  dependency-version: 0.22.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-06 13:09:51 +00:00
14 changed files with 1030 additions and 95 deletions


@@ -18,7 +18,7 @@ jobs:
     runs-on: ubuntu-24.04
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
           fetch-depth: 0
           ssh-key: ${{ secrets.DEPLOY_PRIVATE_KEY }}


@@ -38,7 +38,7 @@ jobs:
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
           persist-credentials: false


@@ -21,7 +21,7 @@ jobs:
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
           fetch-depth: 0
       - name: Setup Python

.github/workflows/test.yml (new file, +33 lines)

@@ -0,0 +1,33 @@
---
name: "test"

# yamllint disable-line rule:truthy
on:
  pull_request:
    branches:
      - "*"
  push:
    branches:
      - "main"
      - "master"

jobs:
  test:
    name: test
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Setup Python
        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      - run: pip install -r release-requirements.txt
      - run: pytest tests/ -v


@@ -1,9 +1,213 @@
Changelog Changelog
========= =========
0.51.0 (2025-11-06) 0.52.0 (2025-11-28)
------------------- -------------------
------------------------ ------------------------
- Skip DMCA'd repos which return a 451 response. [Rodos]
Log a warning and the link to the DMCA notice. Continue backing up
other repositories instead of crashing.
Closes #163
- Chore(deps): bump restructuredtext-lint in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint).
Updates `restructuredtext-lint` from 1.4.0 to 2.0.2
- [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2)
---
updated-dependencies:
- dependency-name: restructuredtext-lint
dependency-version: 2.0.2
dependency-type: direct:production
update-type: version-update:semver-major
dependency-group: python-packages
...
- Chore(deps): bump actions/checkout from 5 to 6. [dependabot[bot]]
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)
---
updated-dependencies:
- dependency-name: actions/checkout
dependency-version: '6'
dependency-type: direct:production
update-type: version-update:semver-major
...
- Chore(deps): bump the python-packages group with 3 updates.
[dependabot[bot]]
Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring).
Updates `click` from 8.3.0 to 8.3.1
- [Release notes](https://github.com/pallets/click/releases)
- [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1)
Updates `pytest` from 8.3.3 to 9.0.1
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1)
Updates `keyring` from 25.6.0 to 25.7.0
- [Release notes](https://github.com/jaraco/keyring/releases)
- [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst)
- [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0)
---
updated-dependencies:
- dependency-name: click
dependency-version: 8.3.1
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
- dependency-name: pytest
dependency-version: 9.0.1
dependency-type: direct:production
update-type: version-update:semver-major
dependency-group: python-packages
- dependency-name: keyring
dependency-version: 25.7.0
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
...
0.51.3 (2025-11-18)
-------------------
- Test: Add pagination tests for cursor and page-based Link headers.
[Rodos]
- Use cursor based pagination. [Helio Machado]
0.51.2 (2025-11-16)
-------------------
Fix
~~~
- Improve CA certificate detection with fallback chain. [Rodos]
The previous implementation incorrectly assumed empty get_ca_certs()
meant broken SSL, causing false failures in GitHub Codespaces and other
directory-based cert systems where certificates exist but aren't pre-loaded.
It would then attempt to import certifi as a workaround, but certifi wasn't
listed in requirements.txt, causing the fallback to fail with ImportError
even though the system certificates would have worked fine.
This commit replaces the naive check with a layered fallback approach that
checks multiple certificate sources. First it checks for pre-loaded system
certs (file-based systems). Then it verifies system cert paths exist
(directory-based systems like Ubuntu/Debian/Codespaces). Finally it attempts
to use certifi as an optional fallback only if needed.
This approach eliminates hard dependencies (certifi is now optional), works
in GitHub Codespaces without any setup, and fails gracefully with clear hints
for resolution when SSL is actually broken rather than failing with
ModuleNotFoundError.
Fixes #444
0.51.1 (2025-11-16)
-------------------
Fix
~~~
- Prevent duplicate attachment downloads. [Rodos]
Fixes bug where attachments were downloaded multiple times with
incremented filenames (file.mov, file_1.mov, file_2.mov) when
running backups without --skip-existing flag.
I should not have used the --skip-existing flag for attachments,
it did not do what I thought it did.
The correct approach is to always use the manifest to guide what
has already been downloaded and what now needs to be done.
Other
~~~~~
- Chore(deps): bump certifi in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [certifi](https://github.com/certifi/python-certifi).
Updates `certifi` from 2025.10.5 to 2025.11.12
- [Commits](https://github.com/certifi/python-certifi/compare/2025.10.05...2025.11.12)
---
updated-dependencies:
- dependency-name: certifi
dependency-version: 2025.11.12
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
...
- Test: Add pytest infrastructure and attachment tests. [Rodos]
In making my last fix to attachments, I found it challenging not
having tests to ensure there was no regression.
Added pytest with minimal setup and isolated configuration. Created
a separate test workflow to keep tests isolated from linting.
Tests cover the key elements of the attachment logic:
- URL extraction from issue bodies
- Filename extraction from different URL types
- Filename collision resolution
- Manifest duplicate prevention
- Chore(deps): bump black in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [black](https://github.com/psf/black).
Updates `black` from 25.9.0 to 25.11.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.9.0...25.11.0)
---
updated-dependencies:
- dependency-name: black
dependency-version: 25.11.0
dependency-type: direct:production
update-type: version-update:semver-minor
dependency-group: python-packages
...
- Chore(deps): bump docutils in the python-packages group.
[dependabot[bot]]
Bumps the python-packages group with 1 update: [docutils](https://github.com/rtfd/recommonmark).
Updates `docutils` from 0.22.2 to 0.22.3
- [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rtfd/recommonmark/commits)
---
updated-dependencies:
- dependency-name: docutils
dependency-version: 0.22.3
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: python-packages
...
0.51.0 (2025-11-06)
-------------------
 Fix
 ~~~


@@ -1 +1 @@
-__version__ = "0.51.0"
+__version__ = "0.52.0"


@@ -37,22 +37,42 @@ FNULL = open(os.devnull, "w")
 FILE_URI_PREFIX = "file://"

 logger = logging.getLogger(__name__)

-https_ctx = ssl.create_default_context()
-if not https_ctx.get_ca_certs():
-    import warnings
-
-    warnings.warn(
-        "\n\nYOUR DEFAULT CA CERTS ARE EMPTY.\n"
-        + "PLEASE POPULATE ANY OF:"
-        + "".join(
-            ["\n - " + x for x in ssl.get_default_verify_paths() if type(x) is str]
-        )
-        + "\n",
-        stacklevel=2,
-    )
-    import certifi
-
-    https_ctx = ssl.create_default_context(cafile=certifi.where())
+
+class RepositoryUnavailableError(Exception):
+    """Raised when a repository is unavailable due to legal reasons (e.g., DMCA takedown)."""
+
+    def __init__(self, message, dmca_url=None):
+        super().__init__(message)
+        self.dmca_url = dmca_url
+
+
+# Setup SSL context with fallback chain
+https_ctx = ssl.create_default_context()
+if https_ctx.get_ca_certs():
+    # Layer 1: Certificates pre-loaded from system (file-based)
+    pass
+else:
+    paths = ssl.get_default_verify_paths()
+    if (paths.cafile and os.path.exists(paths.cafile)) or (
+        paths.capath and os.path.exists(paths.capath)
+    ):
+        # Layer 2: Cert paths exist, will be lazy-loaded on first use (directory-based)
+        pass
+    else:
+        # Layer 3: Try certifi package as optional fallback
+        try:
+            import certifi
+
+            https_ctx = ssl.create_default_context(cafile=certifi.where())
+        except ImportError:
+            # All layers failed - no certificates available anywhere
+            sys.exit(
+                "\nERROR: No CA certificates found. Cannot connect to GitHub over SSL.\n\n"
+                "Solutions you can explore:\n"
+                "  1. pip install certifi\n"
+                "  2. Alpine: apk add ca-certificates\n"
+                "  3. Debian/Ubuntu: apt-get install ca-certificates\n\n"
+            )


 def logging_subprocess(
@@ -581,27 +601,39 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
     auth = get_auth(args, encode=not args.as_app)
     query_args = get_query_args(query_args)
     per_page = 100
-    page = 0
+    next_url = None
     while True:
         if single_request:
-            request_page, request_per_page = None, None
+            request_per_page = None
         else:
-            page = page + 1
-            request_page, request_per_page = page, per_page
+            request_per_page = per_page
         request = _construct_request(
             request_per_page,
-            request_page,
             query_args,
-            template,
+            next_url or template,
             auth,
             as_app=args.as_app,
             fine=True if args.token_fine is not None else False,
         )  # noqa
-        r, errors = _get_response(request, auth, template)
+        r, errors = _get_response(request, auth, next_url or template)
         status_code = int(r.getcode())

+        # Handle DMCA takedown (HTTP 451) - raise exception to skip entire repository
+        if status_code == 451:
+            dmca_url = None
+            try:
+                response_data = json.loads(r.read().decode("utf-8"))
+                dmca_url = response_data.get("block", {}).get("html_url")
+            except Exception:
+                pass
+            raise RepositoryUnavailableError(
+                "Repository unavailable due to legal reasons (HTTP 451)",
+                dmca_url=dmca_url
+            )
+
         # Check if we got correct data
         try:
             response = json.loads(r.read().decode("utf-8"))
@@ -633,15 +665,14 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
             retries += 1
             time.sleep(5)
             request = _construct_request(
-                per_page,
-                page,
+                request_per_page,
                 query_args,
-                template,
+                next_url or template,
                 auth,
                 as_app=args.as_app,
                 fine=True if args.token_fine is not None else False,
             )  # noqa
-            r, errors = _get_response(request, auth, template)
+            r, errors = _get_response(request, auth, next_url or template)
             status_code = int(r.getcode())

             try:
@@ -671,7 +702,16 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False):
         if type(response) is list:
             for resp in response:
                 yield resp
-            if len(response) < per_page:
+            # Parse Link header for next page URL (cursor-based pagination)
+            link_header = r.headers.get("Link", "")
+            next_url = None
+            if link_header:
+                # Parse Link header: <https://api.github.com/...?per_page=100&after=cursor>; rel="next"
+                for link in link_header.split(","):
+                    if 'rel="next"' in link:
+                        next_url = link[link.find("<") + 1:link.find(">")]
+                        break
+            if not next_url:
                 break
         elif type(response) is dict and single_request:
             yield response
@@ -724,13 +764,18 @@ def _get_response(request, auth, template):
 def _construct_request(
-    per_page, page, query_args, template, auth, as_app=None, fine=False
+    per_page, query_args, template, auth, as_app=None, fine=False
 ):
-    all_query_args = {}
-    if per_page:
-        all_query_args["per_page"] = per_page
-    if page:
-        all_query_args["page"] = page
-    if query_args:
-        all_query_args.update(query_args)
+    # If template is already a full URL with query params (from Link header), use it directly
+    if "?" in template and template.startswith("http"):
+        request_url = template
+        # Extract query string for logging
+        querystring = template.split("?", 1)[1]
+    else:
+        # Build URL with query parameters
+        all_query_args = {}
+        if per_page:
+            all_query_args["per_page"] = per_page
+        if query_args:
+            all_query_args.update(query_args)
@@ -755,7 +800,7 @@ def _construct_request(
             "Accept", "application/vnd.github.machine-man-preview+json"
         )

-    log_url = template
+    log_url = template if "?" not in template else template.split("?")[0]
     if querystring:
         log_url += "?" + querystring
     logger.info("Requesting {}".format(log_url))
@@ -832,8 +877,7 @@ def download_file(url, path, auth, as_app=False, fine=False):
         return

     request = _construct_request(
-        per_page=100,
-        page=1,
+        per_page=None,
         query_args={},
         template=url,
         auth=auth,
@@ -919,12 +963,6 @@ def download_attachment_file(url, path, auth, as_app=False, fine=False):
         "error": None,
     }

-    if os.path.exists(path):
-        metadata["success"] = True
-        metadata["http_status"] = 200  # Assume success if already exists
-        metadata["size_bytes"] = os.path.getsize(path)
-        return metadata
-
     # Create simple request (no API query params)
     request = Request(url)
     request.add_header("Accept", "application/octet-stream")
@@ -1337,10 +1375,10 @@ def download_attachments(
     attachments_dir = os.path.join(item_cwd, "attachments", str(number))
     manifest_path = os.path.join(attachments_dir, "manifest.json")

-    # Load existing manifest if skip_existing is enabled
+    # Load existing manifest to prevent duplicate downloads
     existing_urls = set()
     existing_metadata = []
-    if args.skip_existing and os.path.exists(manifest_path):
+    if os.path.exists(manifest_path):
         try:
             with open(manifest_path, "r") as f:
                 existing_manifest = json.load(f)
@@ -1395,9 +1433,6 @@ def download_attachments(
         filename = get_attachment_filename(url)
         filepath = os.path.join(attachments_dir, filename)

-        # Check for collision BEFORE downloading
-        filepath = resolve_filename_collision(filepath)
-
         # Download and get metadata
         metadata = download_attachment_file(
             url,
@@ -1655,6 +1690,7 @@ def backup_repositories(args, output_directory, repositories):
             continue  # don't try to back anything else for a gist; it doesn't exist

+        try:
             download_wiki = args.include_wiki or args.include_everything
             if repository["has_wiki"] and download_wiki:
                 fetch_repository(
@@ -1689,6 +1725,12 @@ def backup_repositories(args, output_directory, repositories):
                 repos_template,
                 include_assets=args.include_assets or args.include_everything,
             )
+        except RepositoryUnavailableError as e:
+            logger.warning(f"Repository {repository['full_name']} is unavailable (HTTP 451)")
+            if e.dmca_url:
+                logger.warning(f"DMCA notice: {e.dmca_url}")
+            logger.info(f"Skipping remaining resources for {repository['full_name']}")
+            continue

         if args.incremental:
             if last_update == "0000-00-00T00:00:00Z":

pytest.ini (new file, +6 lines)

@@ -0,0 +1,6 @@
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = -v


@@ -1,17 +1,18 @@
 autopep8==2.3.2
-black==25.9.0
+black==25.11.0
 bleach==6.3.0
-certifi==2025.10.5
+certifi==2025.11.12
 charset-normalizer==3.4.4
-click==8.3.0
+click==8.3.1
 colorama==0.4.6
-docutils==0.22.2
+docutils==0.22.3
 flake8==7.3.0
 gitchangelog==3.0.4
+pytest==9.0.1
 idna==3.11
 importlib-metadata==8.7.0
 jaraco.classes==3.4.0
-keyring==25.6.0
+keyring==25.7.0
 markdown-it-py==4.0.0
 mccabe==0.7.0
 mdurl==0.1.2
@@ -27,7 +28,7 @@ Pygments==2.19.2
 readme-renderer==44.0
 requests==2.32.5
 requests-toolbelt==1.0.0
-restructuredtext-lint==1.4.0
+restructuredtext-lint==2.0.2
 rfc3986==2.0.0
 rich==14.2.0
 setuptools==80.9.0


@@ -1 +0,0 @@

tests/__init__.py (new file, +1 line)

@@ -0,0 +1 @@
"""Tests for python-github-backup."""

tests/test_attachments.py (new file, +353 lines)

@@ -0,0 +1,353 @@
"""Behavioral tests for attachment functionality."""
import json
import os
import tempfile
from pathlib import Path
from unittest.mock import Mock
import pytest
from github_backup import github_backup
@pytest.fixture
def attachment_test_setup(tmp_path):
"""Fixture providing setup and helper for attachment download tests."""
from unittest.mock import patch
issue_cwd = tmp_path / "issues"
issue_cwd.mkdir()
# Mock args
args = Mock()
args.as_app = False
args.token_fine = None
args.token_classic = None
args.username = None
args.password = None
args.osx_keychain_item_name = None
args.osx_keychain_item_account = None
args.user = "testuser"
args.repository = "testrepo"
repository = {"full_name": "testuser/testrepo"}
def call_download(issue_data, issue_number=123):
"""Call download_attachments with mocked HTTP downloads.
Returns list of URLs that were actually downloaded.
"""
downloaded_urls = []
def mock_download(url, path, auth, as_app, fine):
downloaded_urls.append(url)
return {
"success": True,
"saved_as": os.path.basename(path),
"url": url,
}
with patch(
"github_backup.github_backup.download_attachment_file",
side_effect=mock_download,
):
github_backup.download_attachments(
args, str(issue_cwd), issue_data, issue_number, repository
)
return downloaded_urls
return {
"issue_cwd": str(issue_cwd),
"args": args,
"repository": repository,
"call_download": call_download,
}

class TestURLExtraction:
    """Test URL extraction with realistic issue content."""

    def test_mixed_urls(self):
        issue_data = {
            "body": """
            ## Bug Report
            When uploading files, I see this error. Here's a screenshot:
            https://github.com/user-attachments/assets/abc123def456
            The logs show: https://github.com/user-attachments/files/789/error-log.txt
            This is similar to https://github.com/someorg/somerepo/issues/42 but different.
            You can also see the video at https://user-images.githubusercontent.com/12345/video-demo.mov
            Here's how to reproduce:
            ```bash
            # Don't extract this example URL:
            curl https://github.com/user-attachments/assets/example999
            ```
            More info at https://docs.example.com/guide
            Also see this inline code `https://github.com/user-attachments/files/111/inline.pdf` should not extract.
            Final attachment: https://github.com/user-attachments/files/222/report.pdf.
            """,
            "comment_data": [
                {
                    "body": "Here's another attachment: https://private-user-images.githubusercontent.com/98765/secret.png?jwt=token123"
                },
                {
                    "body": """
                    Example code:
                    ```python
                    url = "https://github.com/user-attachments/assets/code-example"
                    ```
                    But this is real: https://github.com/user-attachments/files/333/actual.zip
                    """
                },
            ],
        }

        # Extract URLs
        urls = github_backup.extract_attachment_urls(issue_data)

        expected_urls = [
            "https://github.com/user-attachments/assets/abc123def456",
            "https://github.com/user-attachments/files/789/error-log.txt",
            "https://user-images.githubusercontent.com/12345/video-demo.mov",
            "https://github.com/user-attachments/files/222/report.pdf",
            "https://private-user-images.githubusercontent.com/98765/secret.png?jwt=token123",
            "https://github.com/user-attachments/files/333/actual.zip",
        ]
        assert set(urls) == set(expected_urls)

    def test_trailing_punctuation_stripped(self):
        """URLs with trailing punctuation should have the punctuation stripped."""
        issue_data = {
            "body": """
            See this file: https://github.com/user-attachments/files/1/doc.pdf.
            And this one (https://github.com/user-attachments/files/2/image.png).
            Check it out! https://github.com/user-attachments/files/3/data.csv!
            """
        }
        urls = github_backup.extract_attachment_urls(issue_data)
        expected = [
            "https://github.com/user-attachments/files/1/doc.pdf",
            "https://github.com/user-attachments/files/2/image.png",
            "https://github.com/user-attachments/files/3/data.csv",
        ]
        assert set(urls) == set(expected)

    def test_deduplication_across_body_and_comments(self):
        """The same URL in body and comments should only appear once."""
        duplicate_url = "https://github.com/user-attachments/assets/abc123"
        issue_data = {
            "body": f"First mention: {duplicate_url}",
            "comment_data": [
                {"body": f"Second mention: {duplicate_url}"},
                {"body": f"Third mention: {duplicate_url}"},
            ],
        }
        urls = github_backup.extract_attachment_urls(issue_data)
        assert set(urls) == {duplicate_url}
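The extraction behavior these tests pin down (match GitHub attachment hosts, strip trailing sentence punctuation, dedupe while preserving order) can be sketched as follows. This is an illustrative regex-based version inferred from the fixtures above; `ATTACHMENT_URL_RE` and `extract_urls` are hypothetical names, not github_backup's actual implementation, and this sketch does not replicate the code-fence/inline-code exclusion that `test_mixed_urls` exercises.

```python
import re

# Simplified pattern for GitHub attachment URLs, inferred from the fixtures
# above; the real extract_attachment_urls may use different rules.
ATTACHMENT_URL_RE = re.compile(
    r"https://(?:github\.com/\S*?user-attachments/\S+"
    r"|(?:private-)?user-images\.githubusercontent\.com/\S+)"
)


def extract_urls(text):
    """Find attachment URLs, strip trailing punctuation, dedupe in order."""
    urls = [m.group(0).rstrip(".,!?;:)") for m in ATTACHMENT_URL_RE.finditer(text)]
    # dict.fromkeys keeps first-seen order while removing duplicates
    return list(dict.fromkeys(urls))
```

In the real code the same pass would run over the issue body and every comment body, with a single deduplicated result list.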

class TestFilenameExtraction:
    """Test filename extraction from different URL types."""

    def test_modern_assets_url(self):
        """A modern assets URL returns the UUID."""
        url = "https://github.com/user-attachments/assets/abc123def456"
        filename = github_backup.get_attachment_filename(url)
        assert filename == "abc123def456"

    def test_modern_files_url(self):
        """A modern files URL returns the filename."""
        url = "https://github.com/user-attachments/files/12345/report.pdf"
        filename = github_backup.get_attachment_filename(url)
        assert filename == "report.pdf"

    def test_legacy_cdn_url(self):
        """A legacy CDN URL returns the filename with its extension."""
        url = "https://user-images.githubusercontent.com/123456/abc-def.png"
        filename = github_backup.get_attachment_filename(url)
        assert filename == "abc-def.png"

    def test_private_cdn_url(self):
        """A private CDN URL returns the filename without query parameters."""
        url = "https://private-user-images.githubusercontent.com/98765/secret.png?jwt=token123"
        filename = github_backup.get_attachment_filename(url)
        assert filename == "secret.png"

    def test_repo_files_url(self):
        """A repo-scoped files URL returns the filename."""
        url = "https://github.com/owner/repo/files/789/document.txt"
        filename = github_backup.get_attachment_filename(url)
        assert filename == "document.txt"

class TestFilenameCollision:
    """Test filename collision resolution."""

    def test_collision_behavior(self):
        """Test filename collision resolution with real files."""
        with tempfile.TemporaryDirectory() as tmpdir:
            # No collision - the file doesn't exist yet
            result = github_backup.resolve_filename_collision(
                os.path.join(tmpdir, "report.pdf")
            )
            assert result == os.path.join(tmpdir, "report.pdf")

            # Create the file; now a collision exists
            Path(os.path.join(tmpdir, "report.pdf")).touch()
            result = github_backup.resolve_filename_collision(
                os.path.join(tmpdir, "report.pdf")
            )
            assert result == os.path.join(tmpdir, "report_1.pdf")

            # Create report_1.pdf too
            Path(os.path.join(tmpdir, "report_1.pdf")).touch()
            result = github_backup.resolve_filename_collision(
                os.path.join(tmpdir, "report.pdf")
            )
            assert result == os.path.join(tmpdir, "report_2.pdf")

    def test_manifest_reserved(self):
        """manifest.json is always treated as reserved."""
        with tempfile.TemporaryDirectory() as tmpdir:
            # Even if manifest.json doesn't exist, manifest_1.json is returned
            result = github_backup.resolve_filename_collision(
                os.path.join(tmpdir, "manifest.json")
            )
            assert result == os.path.join(tmpdir, "manifest_1.json")

class TestManifestDuplicatePrevention:
    """Test that the manifest prevents duplicate downloads (the bug fix)."""

    def test_manifest_filters_existing_urls(self, attachment_test_setup):
        """URLs in the manifest are not re-downloaded."""
        setup = attachment_test_setup

        # Create a manifest with existing URLs
        attachments_dir = os.path.join(setup["issue_cwd"], "attachments", "123")
        os.makedirs(attachments_dir)
        manifest_path = os.path.join(attachments_dir, "manifest.json")
        manifest = {
            "attachments": [
                {
                    "url": "https://github.com/user-attachments/assets/old1",
                    "success": True,
                    "saved_as": "old1.pdf",
                },
                {
                    "url": "https://github.com/user-attachments/assets/old2",
                    "success": True,
                    "saved_as": "old2.pdf",
                },
            ]
        }
        with open(manifest_path, "w") as f:
            json.dump(manifest, f)

        # Issue data with two old URLs and one new URL
        issue_data = {
            "body": """
            Old: https://github.com/user-attachments/assets/old1
            Old: https://github.com/user-attachments/assets/old2
            New: https://github.com/user-attachments/assets/new1
            """
        }
        downloaded_urls = setup["call_download"](issue_data)

        # Only the NEW URL should be downloaded (old ones filtered by the manifest)
        assert len(downloaded_urls) == 1
        assert downloaded_urls[0] == "https://github.com/user-attachments/assets/new1"

    def test_no_manifest_downloads_all(self, attachment_test_setup):
        """Without a manifest, all URLs should be downloaded."""
        setup = attachment_test_setup

        # Issue data with two URLs
        issue_data = {
            "body": """
            https://github.com/user-attachments/assets/url1
            https://github.com/user-attachments/assets/url2
            """
        }
        downloaded_urls = setup["call_download"](issue_data)

        # ALL URLs should be downloaded (no manifest to filter them)
        assert len(downloaded_urls) == 2
        assert set(downloaded_urls) == {
            "https://github.com/user-attachments/assets/url1",
            "https://github.com/user-attachments/assets/url2",
        }

    def test_manifest_skips_permanent_failures(self, attachment_test_setup):
        """The manifest skips permanent failures (404, 410) but retries transient ones (503)."""
        setup = attachment_test_setup

        # Create a manifest with different failure types
        attachments_dir = os.path.join(setup["issue_cwd"], "attachments", "123")
        os.makedirs(attachments_dir)
        manifest_path = os.path.join(attachments_dir, "manifest.json")
        manifest = {
            "attachments": [
                {
                    "url": "https://github.com/user-attachments/assets/success",
                    "success": True,
                    "saved_as": "success.pdf",
                },
                {
                    "url": "https://github.com/user-attachments/assets/notfound",
                    "success": False,
                    "http_status": 404,
                },
                {
                    "url": "https://github.com/user-attachments/assets/gone",
                    "success": False,
                    "http_status": 410,
                },
                {
                    "url": "https://github.com/user-attachments/assets/unavailable",
                    "success": False,
                    "http_status": 503,
                },
            ]
        }
        with open(manifest_path, "w") as f:
            json.dump(manifest, f)

        # Issue data contains all four URLs
        issue_data = {
            "body": """
            https://github.com/user-attachments/assets/success
            https://github.com/user-attachments/assets/notfound
            https://github.com/user-attachments/assets/gone
            https://github.com/user-attachments/assets/unavailable
            """
        }
        downloaded_urls = setup["call_download"](issue_data)

        # Only the 503 (transient failure) should be retried;
        # the success, 404, and 410 entries should be skipped
        assert len(downloaded_urls) == 1
        assert (
            downloaded_urls[0]
            == "https://github.com/user-attachments/assets/unavailable"
        )

tests/test_http_451.py (new file, 143 lines)
@@ -0,0 +1,143 @@
"""Tests for HTTP 451 (DMCA takedown) handling."""

import json
from unittest.mock import Mock, patch

import pytest

from github_backup import github_backup


class TestHTTP451Exception:
    """Test suite for HTTP 451 DMCA takedown exception handling."""

    def test_repository_unavailable_error_raised(self):
        """HTTP 451 should raise RepositoryUnavailableError with the DMCA URL."""
        # Create mock args
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        # Mock an HTTP 451 response
        mock_response = Mock()
        mock_response.getcode.return_value = 451
        dmca_data = {
            "message": "Repository access blocked",
            "block": {
                "reason": "dmca",
                "created_at": "2024-11-12T14:38:04Z",
                "html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md",
            },
        }
        mock_response.read.return_value = json.dumps(dmca_data).encode("utf-8")
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Unavailable For Legal Reasons"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch(
            "github_backup.github_backup._get_response",
            side_effect=mock_get_response,
        ):
            with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
                list(
                    github_backup.retrieve_data_gen(
                        args, "https://api.github.com/repos/test/dmca/issues"
                    )
                )

        # The exception carries the DMCA URL
        assert (
            exc_info.value.dmca_url
            == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md"
        )
        assert "451" in str(exc_info.value)

    def test_repository_unavailable_error_without_dmca_url(self):
        """HTTP 451 without DMCA details should still raise the exception."""
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        mock_response = Mock()
        mock_response.getcode.return_value = 451
        mock_response.read.return_value = b'{"message": "Blocked"}'
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Unavailable For Legal Reasons"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch(
            "github_backup.github_backup._get_response",
            side_effect=mock_get_response,
        ):
            with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info:
                list(
                    github_backup.retrieve_data_gen(
                        args, "https://api.github.com/repos/test/dmca/issues"
                    )
                )

        # The exception is raised even without a DMCA URL
        assert exc_info.value.dmca_url is None
        assert "451" in str(exc_info.value)

    def test_repository_unavailable_error_with_malformed_json(self):
        """HTTP 451 with malformed JSON should still raise the exception."""
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        mock_response = Mock()
        mock_response.getcode.return_value = 451
        mock_response.read.return_value = b"invalid json {"
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Unavailable For Legal Reasons"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch(
            "github_backup.github_backup._get_response",
            side_effect=mock_get_response,
        ):
            with pytest.raises(github_backup.RepositoryUnavailableError):
                list(
                    github_backup.retrieve_data_gen(
                        args, "https://api.github.com/repos/test/dmca/issues"
                    )
                )

    def test_other_http_errors_unchanged(self):
        """Other HTTP errors should still raise a generic Exception."""
        args = Mock()
        args.as_app = False
        args.token_fine = None
        args.token_classic = None
        args.username = None
        args.password = None
        args.osx_keychain_item_name = None
        args.osx_keychain_item_account = None
        args.throttle_limit = None
        args.throttle_pause = 0

        mock_response = Mock()
        mock_response.getcode.return_value = 404
        mock_response.read.return_value = b'{"message": "Not Found"}'
        mock_response.headers = {"x-ratelimit-remaining": "5000"}
        mock_response.reason = "Not Found"

        def mock_get_response(request, auth, template):
            return mock_response, []

        with patch(
            "github_backup.github_backup._get_response",
            side_effect=mock_get_response,
        ):
            # Should raise a generic Exception, not RepositoryUnavailableError
            with pytest.raises(Exception) as exc_info:
                list(
                    github_backup.retrieve_data_gen(
                        args, "https://api.github.com/repos/test/notfound/issues"
                    )
                )

        assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError)
        assert "404" in str(exc_info.value)


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
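The 451 behavior asserted above (raise a dedicated exception, pull the DMCA notice URL out of the JSON body when present, tolerate malformed bodies) can be sketched as a standalone helper. Both `RepositoryUnavailableError` here and `raise_for_451` are illustrative stand-ins inferred from the tests, not github_backup's real implementation.

```python
import json


class RepositoryUnavailableError(Exception):
    """Illustrative stand-in for github_backup's exception class."""

    def __init__(self, message, dmca_url=None):
        super().__init__(message)
        self.dmca_url = dmca_url


def raise_for_451(status_code, body_bytes):
    """Turn an HTTP 451 response into RepositoryUnavailableError.

    The DMCA notice URL is taken from the JSON body when present;
    malformed JSON still produces the exception, just without a link.
    Non-451 status codes are left for other error handling.
    """
    if status_code != 451:
        return
    dmca_url = None
    try:
        payload = json.loads(body_bytes.decode("utf-8"))
        dmca_url = payload.get("block", {}).get("html_url")
    except (ValueError, UnicodeDecodeError):
        pass  # malformed body: raise without a DMCA link
    raise RepositoryUnavailableError(
        "HTTP 451: repository unavailable for legal reasons", dmca_url
    )
```

The caller (the backup loop) would catch this exception, log a warning with `dmca_url`, and continue with the next repository instead of crashing.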

tests/test_pagination.py (new file, 153 lines)
@@ -0,0 +1,153 @@
"""Tests for Link header pagination handling."""

import json
from unittest.mock import Mock, patch

import pytest

from github_backup import github_backup


class MockHTTPResponse:
    """Mock HTTP response for paginated API calls."""

    def __init__(self, data, link_header=None):
        self._content = json.dumps(data).encode("utf-8")
        self._link_header = link_header
        self._read = False
        self.reason = "OK"

    def getcode(self):
        return 200

    def read(self):
        if self._read:
            return b""
        self._read = True
        return self._content

    def get_header(self, name, default=None):
        """Mock method for headers.get()."""
        return self.headers.get(name, default)

    @property
    def headers(self):
        headers = {"x-ratelimit-remaining": "5000"}
        if self._link_header:
            headers["Link"] = self._link_header
        return headers


@pytest.fixture
def mock_args():
    """Mock args for retrieve_data_gen."""
    args = Mock()
    args.as_app = False
    args.token_fine = None
    args.token_classic = "fake_token"
    args.username = None
    args.password = None
    args.osx_keychain_item_name = None
    args.osx_keychain_item_account = None
    args.throttle_limit = None
    args.throttle_pause = 0
    return args


def test_cursor_based_pagination(mock_args):
    """A Link header with an 'after' cursor parameter works correctly."""
    # Simulate the issues endpoint: it returns an 'after' cursor
    # parameter (not 'page') in the Link header
    responses = [
        MockHTTPResponse(
            data=[{"issue": i} for i in range(1, 101)],  # page 1 contents
            link_header='<https://api.github.com/repos/owner/repo/issues?per_page=100&after=ABC123&page=2>; rel="next"',
        ),
        MockHTTPResponse(
            data=[{"issue": i} for i in range(101, 151)],  # page 2 contents
            link_header=None,  # no Link header signals the end of pagination
        ),
    ]

    requests_made = []

    def mock_urlopen(request, *args, **kwargs):
        url = request.get_full_url()
        requests_made.append(url)
        return responses[len(requests_made) - 1]

    with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
        results = list(
            github_backup.retrieve_data_gen(
                mock_args, "https://api.github.com/repos/owner/repo/issues"
            )
        )

    # Verify all items were retrieved and the cursor was used in the second request
    assert len(results) == 150
    assert len(requests_made) == 2
    assert "after=ABC123" in requests_made[1]


def test_page_based_pagination(mock_args):
    """A Link header with a 'page' parameter works correctly."""
    # Simulate the pulls/repos endpoints: they return page numbers
    # (not cursors) in the Link header
    responses = [
        MockHTTPResponse(
            data=[{"pull": i} for i in range(1, 101)],  # page 1 contents
            link_header='<https://api.github.com/repos/owner/repo/pulls?per_page=100&page=2>; rel="next"',
        ),
        MockHTTPResponse(
            data=[{"pull": i} for i in range(101, 181)],  # page 2 contents
            link_header=None,  # no Link header signals the end of pagination
        ),
    ]

    requests_made = []

    def mock_urlopen(request, *args, **kwargs):
        url = request.get_full_url()
        requests_made.append(url)
        return responses[len(requests_made) - 1]

    with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
        results = list(
            github_backup.retrieve_data_gen(
                mock_args, "https://api.github.com/repos/owner/repo/pulls"
            )
        )

    # Verify all items were retrieved using the page parameter (not a cursor)
    assert len(results) == 180
    assert len(requests_made) == 2
    assert "page=2" in requests_made[1]
    assert "after" not in requests_made[1]


def test_no_link_header_stops_pagination(mock_args):
    """Pagination stops when the Link header is absent."""
    # Simulate an endpoint whose results fit in a single page
    responses = [
        MockHTTPResponse(
            data=[{"label": i} for i in range(1, 51)],  # page contents
            link_header=None,  # no Link header signals the end of pagination
        )
    ]

    requests_made = []

    def mock_urlopen(request, *args, **kwargs):
        requests_made.append(request.get_full_url())
        return responses[len(requests_made) - 1]

    with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen):
        results = list(
            github_backup.retrieve_data_gen(
                mock_args, "https://api.github.com/repos/owner/repo/labels"
            )
        )

    # Verify pagination stopped after the first request
    assert len(results) == 50
    assert len(requests_made) == 1
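All three pagination tests rely on one rule: follow the `rel="next"` URL from the `Link` header verbatim (whether it carries an `after` cursor or a `page` number), and stop when no `Link` header is present. A minimal sketch of that parsing, with the hypothetical name `next_link` (github_backup's real implementation may differ):

```python
import re


def next_link(link_header):
    """Extract the rel="next" URL from a Link header, or None.

    Following the returned URL verbatim handles both cursor-based
    ('after=...') and page-based ('page=N') pagination, since the
    server encodes whichever scheme it uses into the URL itself.
    """
    if not link_header:
        return None  # absent header: no more pages
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None
```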