::install_github("matt-dray/ghdump")
remoteslibrary(ghdump)
tl;dr
Run ghd_copy()
from the {ghdump} package to either clone or download all the GitHub repositories for a given user. Intended for archival purposes or setting up a new computer.
The package comes with no guarantees and will likely be in a perpetual work-in-progress state. Please submit issues or pull requests.
Clone army
Situation:
- Sometimes I get a new computer and want to clone all my repos to it
- Sometimes I want to be able to archive my repos so I’m not dependent on GitHub nor any given computer
- it would be tedious to download or clone the repos one-by-one from the GitHub interface
Wants:
- To clone (with HTTPS or SSH) or download all of my repos with one command
- Be able to unzip downloaded repos en masse if I want to
- Do all this from within R, mostly for the learning experience, but also to allow for user interactivity
Observations:
- I don’t know of a specific R function that automates mass-downloading or mass-cloning of GitHub repos
- the {gh} package provides a lightweight GitHub API wrapper for R that’s likely to be helpful
- R has many file-handling functions that will be helpful
{ghdump}
The result is that I wrote a function, ghd_copy()
, that copies (clones or downloads) all the repos for a given user to a specified location. You can get it in the tiny {ghdump} package.
The function interacts with the GitHub API thanks to the {gh} package by Gábor Csárdi, Jenny Bryan and Hadley Wickham, while iterating over repos comes thanks to the {purrr} package by Lionel Henry and Hadley Wickham.
Update
As of May 2022 there’s also a handy rOpenSci package called {gitcellar}, by Maëlle Salmon and Jeroen Ooms, which is for downloading an organisation’s repos for archival purposes.
Get and use
Install with:
To use the package, you’ll need a GitHub account and a GitHub Personal Access Token (PAT) stored in your .Renviron file. You can do this with the following steps:
::create_github_token() # opens browser to generate token
usethis::edit_r_environ() # add your token to the .Renviron
usethis# then restart R
You can use {ghdump} to download the repos for a specified user:
ghd_copy(
gh_user = "matt-dray", # download repos for this user
dest_dir = "~/Documents/repos", # full local file path to copy to
copy_type = "download" # "download" or "clone" the repos
)
Or clone them:
ghd_copy(
gh_user = "matt-dray",
dest_dir = "~/Documents/repos",
copy_type = "clone",
protocol = "https" # specify "https" or "ssh"
)
If you want to use the SSH protocol when cloning, you need to make sure that you’ve set up your keys.
Interactivity
My expectation is to use ghd_copy()
infrequently and in a non-programmatic way, so I’ve made it quite interactive. This means user input is required; you’ll get some yes/no questions in the console that will affect how the function runs.
Here’s an imaginary demo of the output from ghd_copy()
when copy_type = "download"
:
ghd_copy("made-up-user", "~/Desktop/test-download", "download")
Fetching GitHub repos for user made-up-user... 3 repos found
Create new directory at path ~/Desktop/test-download? y/n: y
Definitely download all 3 repos? y/n: y
Downloading zipped repositories to ~/Desktop/test-download
trying URL 'https://github.com/made-up-user/fake-repo-1/archive/master.zip'
Content type 'application/zip' length 100 bytes
==================================================
downloaded 100 bytes
trying URL 'https://github.com/made-up-user/fake-repo-2/archive/master.zip'
Content type 'application/zip' length 100 bytes
==================================================
downloaded 100 bytes
trying URL 'https://github.com/made-up-user/fake-repo-3/archive/master.zip'
Content type 'application/zip' length 100 bytes
==================================================
downloaded 100 bytes
Unzip all folders? y/n: y
Unzipping repositories
Retain the zip files? y/n: y
Keeping zipped folders.
Remove '-master' suffix from unzipped directory names? y/n: y
Renaming files to remove '-master' suffix
Finished downloading
And now imaginary demo of the output from ghd_copy()
when copy_type = "clone"
:
ghd_copy("made-up-user", "~/Desktop/test-clone", "clone", "ssh")
Fetching GitHub repos for user made-up-user... 3 repos found
Create new directory at path ~/Desktop/test-clone? y/n: y
Definitely clone all 3 repos? y/n: y
Cloning repositories to ~/Desktop/test-clone
Cloning into 'fake-repo-1'...
Cloning into 'fake-repo-2'...
Cloning into 'fake-repo-3'...
Finished cloning
Note that cloning has only been tested on my own Mac OS machine at this point (June 2020) and is not guaranteed to work elsewhere yet. Please submit issues or pull requests to help improve this.
Under the hood
What are the steps to downloading repos with ghdump::ghd_copy()
? Each of the functions in this section are not exported from the package, but you can access them by prefacing with ghdump:::
(the rare triple-colon operator) if you want to see their code.
First, to get repo info:
ghd_get_repos()
passes a GitHub username togh::gh()
, which contacts the GitHub API to return a gh_response object that contains info about each of that user’s reposghd_extract_names()
takes the gh_response object fromghd_get_repos()
and extracts the names into a character vector
Then to download (if copy_type = "download"
):
ghd_enframe_urls()
turns the character vector of repo names into a data.frame, with a corresponding column that contains the URL to a zip file for that repoghd_copy_zips()
takes each zip file URL from that data frame and downloads them to the file path provided by the userghd_unzip()
unzips the zipped repos
You can, of course, use these intermediate functions if you have slightly different needs. Maybe you want to limit the repos that are downloaded; do this by filtering the vector output from ghd_extract_names()
for example.
Or to clone (if copy_type = "clone"
):
ghd_clone_multi()
that iterates cloning over the repos, itself callingghd_clone_one()
Why bother?
What did I learn from doing this? As if I have to explain myuself to you, lol.
1. Iteration
Aside from {gh}, the package also depends on {purrr} for iterative programming.
For example, the gh_response object output from ghdump:::ghd_get_repos()
is passed to map()
with the pluck()
function to extract the repo names.
Another example is the use of walk()
, which is like map()
, except we use it when the output is some ‘side effect’. By ‘side effect’, we mean that it doesn’t return an R object. For example, we can walk()
the unzip()
function over the path to each zip file. This doesn’t return anything in R; it results in some local files being manipulated.
2. File manipulation
R can be used to interact with files on your computer. There’s a number of these base R functions in the package:
dir.create()
to create a new folderfile.remove()
to remove a file or folderlist.files()
andlist.dirs()
to return a character vector files and folders at some pathfile.rename
to change the name of a file or folderunzip()
to unpack a zipped folder
3. User input
How do you ask questions of your user and get answers? This interactivity is made possible by readline()
. You pass it a string to prompt the user, whose return value can be stored.
For example, this is how it looks in the console:
<- readline("Do you like pizza? ") answer
Do you like pizza? yes
answer
[1] "yes"
Where a user has written yes
after the prompt on the second line.
4. Stickers
I’ve designed a few hex stickers with the {hexSticker} package; you can see them in my ‘stickers’ GitHub repo. This time I made the sticker for {ghdump} using Dmytro Perepolkin’s {bunny} package, which is a helper for the {magick} package from Jeroen Ooms. It’s a very smooth process with much flexibility.
This belongs in a dump
Yeah, maybe. It’s not sophisticated, but I’ve found it useful for my own specific purposes.
Environment
Session info
Last rendered: 2024-04-11 21:36:11 BST
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.2 compiler_4.3.1 fastmap_1.1.1 cli_3.6.2
[5] tools_4.3.1 htmltools_0.5.8 rstudioapi_0.16.0 yaml_2.3.8
[9] rmarkdown_2.26 knitr_1.45 jsonlite_1.8.8 xfun_0.43
[13] digest_0.6.35 rlang_1.1.3 fontawesome_0.5.2 evaluate_0.23