# Attach packages
library(tidyverse) # CRAN v1.3.0
library(rvest)     # CRAN v1.0.0
library(lubridate) # for ymd(); not attached by tidyverse 1.3.0

# Scrape the rostrum.blog home page
html <- read_html("https://rostrum.blog/")

# Extract the post titles
title <- html %>%
  html_nodes(".archive-item-link") %>%  # extract title nodes
  html_text()                           # extract text

# Extract the post URLs
link <- html %>%
  html_nodes(".archive-item-link") %>%  # extract title nodes
  html_attr("href")                     # extract href attribute

# Extract the post dates
date <- html %>%
  html_nodes(".archive-item-date") %>%  # extract date nodes only
  html_text() %>%                       # extract text
  str_replace_all("[:space:]", "")      # remove newlines/spaces

# Dataframe of post numbers, dates, titles and links
posts <- tibble(date, title, link) %>%
  transmute(
    n = nrow(.):1,                                   # number starting from first post
    publish_date = ymd(date),                        # convert to date class
    title,                                           # title text
    link = paste0("https://www.rostrum.blog", link)  # create full URL
  )
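If you want to see what the cleaning steps do without hitting the live site, here's a minimal base-R sketch run on a couple of made-up strings. The dates, paths and whitespace are hypothetical stand-ins for what `html_text()` and `html_attr()` might return. Here `gsub()` and `as.Date()` stand in for `str_replace_all()` and `ymd()`. The sketch also assumes that the archive lists the newest post first, which is what the `nrow(.):1` numbering implies.

```r
# Hypothetical strings standing in for what html_text()/html_attr() return
raw_dates <- c("\n  2021-04-01\n", "\n  2019-12-08\n")
raw_links <- c("/2021/04/01/post-b/", "/2019/12/08/post-a/")

# Strip all whitespace, like str_replace_all("[:space:]", "")
clean_dates <- gsub("[[:space:]]", "", raw_dates)

# Convert ISO-style strings to Date (ymd() is more forgiving of formats)
publish_date <- as.Date(clean_dates)

# Assuming newest-first order, count down so the oldest post is number 1
n <- length(publish_date):1

# Prefix relative paths with the domain to build full URLs
link <- paste0("https://www.rostrum.blog", raw_links)

posts <- data.frame(n, publish_date, link)
print(posts)
```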
tl;dr
You can use a scheduled GitHub Action to render up-to-date stats about your blog into its README.
Happy blogday
This blog has been knocking around for three years now. I wrote a post on its first birthday with a simple, interactive 2D plot of the posts to date.
Only now, two years later, have I thought to put this info into the blog’s README on GitHub, along with some other little stats like the total number of posts, and to have it update automatically on a schedule using a GitHub Action.[1]
This is useful for me so I can keep track of things without counting on my fingers, but it also signals activity on the blog to any curious visitors. I may change its content at some point, but it does what I want it to do for now.
Unwrap your GitHub Action
I’ve scheduled a GitHub Action for the early hours of each day. The YAML file for it reads like ‘at the specified time[2], set up a remote environment with R and some dependencies, then render the R Markdown file and push the changes to GitHub.’
I’ve modified r-lib’s pre-written YAML for this, which can be generated in the correct location in your project with usethis::use_github_action("render-rmarkdown.yaml").
name: Render README

on:
  schedule:
    - cron: '09 05 * * *'

jobs:
  render:
    name: Render README
    runs-on: macOS-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v2
      - uses: r-lib/actions/setup-r@v1
      - uses: r-lib/actions/setup-pandoc@v1
      - name: Install CRAN packages
        run: Rscript -e 'install.packages(c("remotes", "rmarkdown", "knitr", "tidyverse"))'
      - name: Install GitHub packages
        run: Rscript -e 'remotes::install_github("hadley/emo")'
      - name: Render README
        run: Rscript -e 'rmarkdown::render("README.Rmd")'
      - name: Commit results
        run: |
          git config --local user.email "actions@github.com"
          git config --local user.name "GitHub Actions"
          git commit README.md README_files/ -m 'Re-build README.Rmd' || echo "No changes to commit"
          git push origin || echo "No changes to commit"
Basically, the action knits the repo’s README.Rmd (R Markdown format containing R code) to a counterpart README.md (GitHub-flavoured markdown), which is displayed when you visit the repo.
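The ‘specified time’ in the workflow is the cron string '09 05 * * *'. Its five space-separated fields are minute, hour, day of month, month and day of week, where `*` means ‘any’. GitHub evaluates scheduled workflows in UTC, so this one fires at 05:09 UTC every day. Here's a toy base-R sketch that labels the fields and describes only the simple daily case. For proper translation, the {dialga} package mentioned in the footnotes is a better bet.

```r
# Schedule string copied from the workflow's `on: schedule` block
cron <- "09 05 * * *"

# Split on whitespace and label the five standard cron fields
fields <- strsplit(cron, "[[:space:]]+")[[1]]
names(fields) <- c("minute", "hour", "day_of_month", "month", "day_of_week")

# Only describe the simple case where all the date fields are wildcards
is_daily <- all(fields[c("day_of_month", "month", "day_of_week")] == "*")
description <- if (is_daily) {
  sprintf("Runs daily at %s:%s UTC", fields[["hour"]], fields[["minute"]])
} else {
  "Not a simple daily schedule"
}

print(fields)
print(description)
```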
PaRty time
The real magic is in the R code chunks at the top of the README.Rmd file itself. They use {rvest} to scrape the archive page of the blog and create a dataframe of the titles, links and publish dates of each post.
That information can be cajoled to show some basic stats. The README includes inline R code that renders to show:
- the total number of posts
- posting rates (posts per month and days per post)
- the number of days since the last post and a link to it
- a clickable details block containing a table of all the posts to date
- a simple 2D plot showing the distribution of posts over time[3] (preview below)
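Here's a minimal base-R sketch of how the first few of those numbers could be derived from a vector of publish dates. The dates and the render date are made up. Treating a month as 365.25 / 12 days is my assumption, and the real README may calculate the rates differently. In README.Rmd, values like these would be dropped into the prose with inline R code.

```r
# Hypothetical publish dates standing in for posts$publish_date
publish_date <- as.Date(c("2018-04-12", "2019-01-05", "2020-06-30", "2021-04-01"))

# Hypothetical render date; the README would use lubridate::today()
as_of <- as.Date("2021-04-12")

# Total number of posts
n_posts <- length(publish_date)

# Days from the first post up to the render date
days_active <- as.numeric(as_of - min(publish_date), units = "days")

# Posting rates, treating a month as 365.25 / 12 days (an assumption)
days_per_post   <- days_active / n_posts
posts_per_month <- n_posts / (days_active / (365.25 / 12))

# Days since the most recent post
days_since_last <- as.numeric(as_of - max(publish_date), units = "days")

cat(
  "Posts:", n_posts, "\n",
  "Days per post:", round(days_per_post, 1), "\n",
  "Posts per month:", round(posts_per_month, 2), "\n",
  "Days since last post:", days_since_last, "\n"
)
```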
# Create plot object
p <- posts %>%
  ggplot(aes(x = publish_date, y = 1)) +
  geom_point(shape = "|", size = 10, stroke = 1, color = "#1D8016") +
  theme_void()
I also added a call to lubridate::today() at the bottom of the README.Rmd so it’s obvious when the stats were last updated.
Until next year
Finally, and most importantly, I included a tiny Easter egg: an emoji balloon 🎈 will appear on the page when the README is rendered on the anniversary of the blog’s inception.[4]
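The check itself only needs to compare month and day. Here's a minimal sketch of one way to do it, not necessarily how the README does it. The launch date is hypothetical; in practice you could take the earliest publish_date from the scraped posts. The workflow installs {emo}, so the original may well fetch the balloon with that. This sketch sticks to a plain Unicode escape to avoid the dependency.

```r
# Hypothetical launch date; could instead be min(posts$publish_date)
launch_date <- as.Date("2018-04-12")

# TRUE when `date` falls on the launch date's month and day, in any year
is_blogday <- function(date, launch) {
  format(date, "%m-%d") == format(launch, "%m-%d")
}

# In the README, `render_date` would be lubridate::today() or Sys.Date()
render_date <- as.Date("2021-04-12")

# U+1F388 is the balloon emoji; show nothing on other days
balloon <- if (is_blogday(render_date, launch_date)) "\U0001F388" else ""
cat(balloon, "\n")
```

Comparing formatted "%m-%d" strings keeps the logic year-agnostic. The catch is that a 29 February launch would only get its balloon in leap years.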
Environment
Session info
Last rendered: 2023-07-17 20:34:40 BST
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.2 compiler_4.3.1 fastmap_1.1.1 cli_3.6.1
[5] tools_4.3.1 htmltools_0.5.5 rstudioapi_0.15.0 yaml_2.3.7
[9] rmarkdown_2.23 knitr_1.43.1 jsonlite_1.8.7 xfun_0.39
[13] digest_0.6.31 rlang_1.1.1 evaluate_0.21
Footnotes
[1] I’ve written before about GitHub Actions to create a Twitter bot and for continuous integration of R packages.
[2] I wrote about scheduling with cron strings in an earlier post, which details the {dialga} package for translating from R to cron to English.
[3] The original chart was made with {plotly}, so you could hover over the points to see the post titles and publishing dates. Plotly isn’t supported in GitHub Markdown, so I included a static chart instead. I used a similar ‘barcode’ format in a recent post about health data.
[4] That’s today if you’re reading this on the day it was published.