feat: Add attachment download support for issues and pull requests

Adds a new --attachments flag that downloads user-uploaded files from
issue and PR bodies and comments. Key features:

- Detects attachment URLs in issue and PR bodies and comments
- Tracks downloads in manifest.json with metadata
- Supports --skip-existing to avoid re-downloading
- Resolves filename collisions with a numeric counter suffix
- Retries downloads, distinguishing transient from permanent failures
- Uses Content-Disposition for correct file extensions
Rodos
2025-11-03 13:36:15 +11:00
parent 8f859be355
commit a194fa48ce
2 changed files with 637 additions and 3 deletions

@@ -50,7 +50,7 @@ CLI Help output::
[--keychain-name OSX_KEYCHAIN_ITEM_NAME]
[--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT]
[--releases] [--latest-releases NUMBER_OF_LATEST_RELEASES]
-[--skip-prerelease] [--assets]
+[--skip-prerelease] [--assets] [--attachments]
[--exclude [REPOSITORY [REPOSITORY ...]]
[--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE]
USER
@@ -133,6 +133,9 @@ CLI Help output::
--skip-prerelease skip prerelease and draft versions; only applies if including releases
--assets include assets alongside release information; only
applies if including releases
--attachments download user-attachments from issues and pull requests
to issues/attachments/{issue_number}/ and
pulls/attachments/{pull_number}/ directories
--exclude [REPOSITORY [REPOSITORY ...]]
names of repositories to exclude from backup.
--throttle-limit THROTTLE_LIMIT
@@ -213,6 +216,29 @@ When you use the ``--lfs`` option, you will need to make sure you have Git LFS i
Instructions on how to do this can be found on https://git-lfs.github.com.

About Attachments
-----------------

When you use the ``--attachments`` option with ``--issues`` or ``--pulls``, the tool downloads user-uploaded attachments (images, videos, documents, etc.) from issue and pull request descriptions and comments. Attachments can carry data or context that the surrounding text does not, so a backup without them may silently lose important information.

Attachments are saved to ``issues/attachments/{issue_number}/`` and ``pulls/attachments/{pull_number}/`` directories, where ``{issue_number}`` and ``{pull_number}`` are the GitHub issue and pull request numbers (e.g., issue #123 saves to ``issues/attachments/123/``). Each attachment directory contains:

- The downloaded attachment files (named by their GitHub identifier with appropriate file extensions). If multiple attachments have the same filename, conflicts are resolved with numeric suffixes (e.g., ``report.pdf``, ``report_1.pdf``, ``report_2.pdf``)
- A ``manifest.json`` file documenting all downloads, including URLs, file metadata, and download status
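
The numeric-suffix scheme described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the helper name ``resolve_collision`` and the ``taken`` set are hypothetical, and the only behavior taken from this README is the ``report.pdf``, ``report_1.pdf``, ``report_2.pdf`` naming pattern.

```python
import os


def resolve_collision(filename, taken):
    """Return filename unchanged, or with a _1, _2, ... suffix if already taken.

    `taken` is a hypothetical set of names already used in the directory.
    """
    if filename not in taken:
        return filename
    stem, ext = os.path.splitext(filename)
    counter = 1
    # Increase the counter until an unused name is found.
    while f"{stem}_{counter}{ext}" in taken:
        counter += 1
    return f"{stem}_{counter}{ext}"


taken = set()
names = []
for original in ["report.pdf", "report.pdf", "report.pdf"]:
    name = resolve_collision(original, taken)
    taken.add(name)
    names.append(name)

print(names)  # ['report.pdf', 'report_1.pdf', 'report_2.pdf']
```

Placing the suffix before the extension keeps the file type intact, so the renamed files still open normally.
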
The tool automatically extracts file extensions from HTTP headers such as ``Content-Disposition``, so that your operating system can open the downloaded files with the appropriate application.
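
One way to derive an extension from a ``Content-Disposition`` header is sketched below. It assumes the response headers are available as a plain dictionary with that exact key and uses Python's standard ``email.message.Message`` parser. The function name ``extension_from_headers`` is hypothetical, and the tool's own implementation may differ.

```python
import os
from email.message import Message


def extension_from_headers(headers):
    """Return the extension (e.g. '.pdf') named in Content-Disposition, or ''.

    `headers` is assumed to be a plain dict keyed by 'Content-Disposition'.
    """
    disposition = headers.get("Content-Disposition")
    if not disposition:
        return ""
    # Let the standard library parse the header parameters, including quoting.
    message = Message()
    message["Content-Disposition"] = disposition
    filename = message.get_filename()
    if not filename:
        return ""
    return os.path.splitext(filename)[1]


print(extension_from_headers(
    {"Content-Disposition": 'attachment; filename="report.pdf"'}
))  # .pdf
print(repr(extension_from_headers({})))  # ''
```

Returning an empty string when no filename is present leaves the caller free to fall back to another source for the extension.
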

**Supported URL formats:**

- Modern: ``github.com/user-attachments/{assets,files}/*``
- Legacy: ``user-images.githubusercontent.com/*`` and ``private-user-images.githubusercontent.com/*``
- Repo files: ``github.com/{owner}/{repo}/files/*`` (filtered to current repository)
- Repo assets: ``github.com/{owner}/{repo}/assets/*`` (filtered to current repository)
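
The four URL families listed above could be recognized with regular expressions along these lines. This is a sketch under stated assumptions: the ``https://`` scheme, the exact pattern shapes, and the names ``ATTACHMENT_PATTERNS`` and ``find_attachment_urls`` are illustrative and are not taken from the tool's source.

```python
import re

# Assumed: a URL ends at whitespace, a closing parenthesis, a quote or an angle bracket.
TAIL = r"""[^\s)"'<>]+"""
REPO = r"(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)"

ATTACHMENT_PATTERNS = {
    "modern": re.compile(
        r"https://github\.com/user-attachments/(?:assets|files)/" + TAIL
    ),
    "legacy": re.compile(
        r"https://(?:private-)?user-images\.githubusercontent\.com/" + TAIL
    ),
    "repo_files": re.compile(r"https://github\.com/" + REPO + r"/files/" + TAIL),
    "repo_assets": re.compile(r"https://github\.com/" + REPO + r"/assets/" + TAIL),
}


def find_attachment_urls(text):
    """Return (kind, url) pairs for every attachment-style URL found in text."""
    found = []
    for kind, pattern in ATTACHMENT_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group(0)))
    return found


body = (
    "Screenshot: ![img](https://github.com/user-attachments/assets/abc-123)\n"
    "Log: https://github.com/octocat/hello-world/files/42/log.txt\n"
    "Unrelated: https://example.com/files/1/x.txt\n"
)
for kind, url in find_attachment_urls(body):
    print(kind, url)
```

The ``owner`` and ``repo`` named groups in the two repo patterns are what a later filtering step would compare against the repository being backed up.
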
**Repository filtering** for repo files/assets handles renamed and transferred repositories gracefully. URLs are included if they either match the current repository name directly, or redirect to it (e.g., ``willmcgugan/rich`` redirects to ``Textualize/rich`` after transfer).
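
The filtering rule can be illustrated with a small predicate. This is a sketch only: ``belongs_to_repo`` and ``resolve_redirect`` are hypothetical names, the redirect lookup is stubbed with a dictionary so the example runs offline, and case-insensitive comparison of repository names is an assumption.

```python
def belongs_to_repo(url_owner, url_repo, current_full_name, resolve_redirect):
    """Return True if owner/repo from a URL refers to the current repository.

    `resolve_redirect` is a hypothetical callable that maps an old
    "owner/repo" name to the name it now redirects to, or None.
    """
    candidate = f"{url_owner}/{url_repo}".lower()
    current = current_full_name.lower()
    if candidate == current:
        return True  # direct match on the current name
    resolved = resolve_redirect(candidate)
    return resolved is not None and resolved.lower() == current


# Stub standing in for a real redirect lookup.
known_redirects = {"willmcgugan/rich": "Textualize/rich"}

print(belongs_to_repo("Textualize", "rich", "Textualize/rich", known_redirects.get))   # True
print(belongs_to_repo("willmcgugan", "rich", "Textualize/rich", known_redirects.get))  # True
print(belongs_to_repo("octocat", "hello-world", "Textualize/rich", known_redirects.get))  # False
```

Checking the direct match first means the redirect lookup is only needed for URLs that name a different repository.
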
Run in Docker container
-----------------------
@@ -303,7 +329,7 @@ Quietly and incrementally backup useful Github user data (public and private rep
export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
GH_USER=YOUR-GITHUB-USER
-github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER
+github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER
Debug an error/block or incomplete backup into a temporary directory. Omit "incremental" to fill a previous incomplete backup. ::