suppressPackageStartupMessages({
library(tidyverse)
library(rtweet)
library(sf)
library(geojsonio)
library(PostcodesioR)
library(leaflet)
})
tl;dr
I recently made a Twitter bot with R, {rtweet}, MapBox and GitHub Actions – londonmapbot – that tweets images of random coordinates in London. I decided to explore them interactively by creating a simple {leaflet} map. You can jump directly to the map).
Note
Twitter changed its API terms in 2023. As a result, you probably can’t re-run the code in this blog. Read about how I moved londonmapbot to Mastodon at botsin.space/@londonmapbot because of these changes.
The bot
I built the londonmapbot Twitter bot as a fun little project to get to grips with GitHub Actions. An action is scheduled every 30 minutes to run some R code that (1) selects random coordinates in London, (2) fetches a satellite image from the MapBox API, (3) generates an OpenStreetMap URL, all of which are (4) passed to {rtweet} to post to the londonmapbot account.
The outputs have been compelling so far. The composition is usually ‘accidentally’ pleasing. Sometimes landmarks are captured, like The Shard, The Natural History Museum and V&A and Heathrow.
I was wondering whether the bot has ‘found’ other landmarks that I hadn’t noticed or whether it’s found my house. The londonmapbot source code doesn’t have a log file for all the coordinates it’s generated, so I figured the easiest way to get this information and explore it would be to grab all the tweets – which contain the coordinates as text – and then map the results.
Packages
I’m loading the tidyverse for data manipulation with {dplyr}, {tidyr} and {stringr}. {rtweet} greatly simplifies the Twitter API and the objects it returns. We’ll use it to fetch all the tweets from londonmapbot.
I’m using a few geography-related packages:
- {sf} for tidy dataframes with spatial information
- {geojsonio} to read spatial files in geojson format
- {PostcodesioR} to fetch additional geographic data given our x-y information
- {leaflet} to build interactive maps from spatial data.
A particular shoutout to rOpenSci for this post: {rtweet}, {geojsonio} and {PostcodesioR} have all passed muster to become part of the rOpenSci suite of approved packages.
Fetch tweets
{rtweet} does all the legwork to fetch and parse information from the Twitter API, saving you loads of effort.
The rtweet::get_timeline()
function is amazing in its user-side simplicity. Pass the account name from which to fetch tweets, along with the number of tweets to get (3200 is the maximum).
<- get_timeline("londonmapbot", n = 3200)
lmb_tweets 1:5, c("created_at", "text")] # limited preview lmb_tweets[
# A tibble: 5 x 2
created_at text
<dttm> <chr>
1 2020-12-29 09:36:20 "51.5519, -0.33\nhttps://t.co/ica9ZypBLS https://t.co/5gS…
2 2020-12-29 08:59:46 "51.4392, 0.1636\nhttps://t.co/2DsbbLYIDG https://t.co/KV…
3 2020-12-29 06:28:48 "51.6773, -0.2555\nhttps://t.co/1hu2VoxCBF https://t.co/b…
4 2020-12-29 05:56:15 "51.674, -0.4042\nhttps://t.co/HMhmUVVrIn https://t.co/mP…
5 2020-12-29 05:31:03 "51.4451, 0.1058\nhttps://t.co/nWuqy4s7am https://t.co/qv…
{rtweet} has a function to quick-plot tweets over time. There’s meant to be a tweet every half-hour from londonmapbot, but GitHub Actions has been a little inconsistent and sometimes fails to post.
::ts_plot(lmb_tweets) + # plot daily tweets
rtweetlabs(
title = "@londonmapbot tweets per day",
x = "", y = "", caption = "Data collected via {rtweet}"
)
Extract tweet information
The dataframe returned by {rtweet} contains nearly 100 columns. For our purposes we can minimise to:
- the unique tweet identifier,
status_id
, which we can use to build a URL back to the tweet - the datetime the tweet was
created_at
- the tweet text content, from which we can isolate the latitude and longitude values
- the
media_url
to the MapBox image attached to each tweet - the full OpenStreetMap link in each tweet via
urls_expanded_urls
<- lmb_tweets %>%
lmb_simple filter(str_detect(text, "^\\d")) %>% # must start with a digit
separate( # break column into new columns given separator
# column to separate
text, into = c("lat", "lon"), # names to split into
sep = "\\s", # separate on spaces
extra = "drop" # discard split elements
%>%
) mutate( # tidy up variables
lat = str_remove(lat, ","),
across(c(lat, lon), as.numeric)
%>%
) select( # focus variables
status_id, created_at, lat, lon,osm_url = urls_expanded_url, media_url
)
1:5, c("status_id", "lat", "lon")] # limited preview lmb_simple[
# A tibble: 5 x 3
status_id lat lon
<chr> <dbl> <dbl>
1 1343853346478841862 51.6 -0.33
2 1343844145518026752 51.4 0.164
3 1343806154397409280 51.7 -0.256
4 1343797964645523456 51.7 -0.404
5 1343791619049480192 51.4 0.106
Reverse geocode
Tweets from londonmapbot are really simple by design; they only have the latitude and longitude, a link to OpenStreetMap and a satellite image pulled from the MapBox API. It might be interesting to provide additional geographic information.
{PostcodesioR} can perform a ‘reverse geocode’1 of our coordinates. Give latitude and longitude to PostcodesioR::reverse_geocoding()
and it returns a list with various administrative geographies for that point.
<-lmb_simple %>%
lmb_geocode mutate(
reverse_geocode = map2(
.x = lon, .y = lat,
~reverse_geocoding(.x, .y, limit = 1) # limit to first result
)%>%
) unnest(cols = reverse_geocode) %>% # unpack listcol
hoist(reverse_geocode, "postcode") %>% # pull out postcode into a
hoist(reverse_geocode, "admin_district") # pull out borough
1:5, c("lat", "lon", "postcode", "admin_district")] # limited preview lmb_geocode[
# A tibble: 5 × 4
lat lon postcode admin_district
<dbl> <dbl> <chr> <chr>
1 51.6 -0.33 UB6 7QT Ealing
2 51.4 0.164 DA5 2DJ Bexley
3 51.7 -0.256 WD6 5PL Hertsmere
4 51.7 -0.404 WD24 5TU Watford
5 51.4 0.106 DA15 9BQ Bexley
The object returned from the reverse geocode is a nested list that we can tidyr::hoist()
the geographic information from. Here we grabbed the postcode and ‘administrative district’, which for our purposes is the London borough that the point is in.
Convert to spatial object
Right now we have a dataframe where the geographic information is stored as numeric values. We can use the {sf} package to convert and handle this information as spatial information instead.
Basically we can use {sf} to ‘geographise’ our dataframe. It can add geometry (points in our case), dimensions (XY, meaning 2D), the maximum geographic extent (a ‘bounding box’ that roughly covers London) and recognition of the coordinate reference system (‘4236’ for latitude-longitude).
The sf::st_as_sf()
function performs the magic of converting our tidy dataframe into a tidy spatial dataframe. You’ll see that the print method provides us the extra spatial metadata and that our geographic information has been stored in a special geometry
column with class sfc_POINT
.
<- lmb_geocode %>%
lmb_sf st_as_sf(
coords = c("lon", "lat"), # xy columns
crs = 4326, # coordinate reference system code
remove = FALSE # retain the xy columns
)
1:5, c("status_id", "geometry")] # limited preview lmb_sf[
Simple feature collection with 5 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -0.4042 ymin: 51.4392 xmax: 0.1636 ymax: 51.6773
Geodetic CRS: WGS 84
# A tibble: 5 × 2
status_id geometry
<chr> <POINT [°]>
1 1343853346478841862 (-0.33 51.5519)
2 1343844145518026752 (0.1636 51.4392)
3 1343806154397409280 (-0.2555 51.6773)
4 1343797964645523456 (-0.4042 51.674)
5 1343791619049480192 (0.1058 51.4451)
Map it
You can build up layers in {leaflet} in a similar kind of way to a {ggplot2} graphic. The base map is applied with addProviderTiles()
and points added as circle-shaped markers with addCircleMarkers()
. I used addRectangles()
to show the bounding box from within which points are randomly sampled.
<- leaflet(lmb_sf, width = '100%') %>%
lmb_map addProviderTiles("CartoDB.Positron") %>%
addRectangles(
lng1 = -0.489, lat1 = 51.28,
lng2 = 0.236, lat2 = 51.686,
fillColor = "transparent"
%>%
) addCircleMarkers( # locations as points
lng = lmb_sf$lon, lat = lmb_sf$lat, # xy
radius = 5, stroke = FALSE, # marker design
fillOpacity = 0.5, fillColor = "#0000FF", # marker colours
clusterOptions = markerClusterOptions(), # bunch-up markers
popup = ~paste0( # dynamic HTML-creation for popup content
::ji("round_pushpin"), " ", lmb_sf$lat, ", ", lmb_sf$lon, "<br>",
emo::ji("postbox"), lmb_sf$admin_district,
emo", ", lmb_sf$postcode, "<br>",
::ji("bird"), " <a href='https://twitter.com/londonmapbot/status/",
emo$status_id, "'>Tweet</a><br>",
lmb_sf::ji("world_map"), " ", "<a href='",
emo$osm_url, "' width='100%'>OpenStreetMap</a><br><br>",
lmb_sf"<img src='", lmb_sf$media_url, "' width='200'>"
) )
The markers, which are blue dots, have rich pop-ups when clicked. The information is generated dynamically for each point by pasting HTML strings with the content of the dataframe. Props to Matt Kerlogue’s narrowbotR, which uses this emoji-info layout in its automated tweets.
To keep the design simple and uncluttered, I’ve intentionally used a muted base map (‘Positron’ from CartoDB) and limited the amount of pop-up content.
In the pop-up you’ll see information from the tweet, including the satellite image and printed coordinates; URLs to the original tweet and OpenStreetMap; plus the reverse-geocoded info we got from {PostcodesioR}.
Since there are thousands of points, it makes sense to cluster them with markerClusterOptions()
to avoid graphical and navigational troubles. Click a cluster to expand until you reach a marker.
The map
lmb_map
If you can’t see the satellite photos in each pop-up you may need to change browser.
Development
I made this for my own amusement and as an excuse to use {PostcodesioR} and reacquaint myself with {leaflet}. If I were going to develop it, I would make a Shiny app that continuously refreshes with the latest tweet information. I may revisit londonmapbot in future, or create a new bot; in which case the reverse geocoding capabilities of {PostcodesioR} could come in handy for providing more content in tweet text.
Environment
Session info
Last rendered: 2023-08-20 22:35:11 BST
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] leaflet_2.1.2 PostcodesioR_0.3.1 geojsonio_0.11.1 sf_1.0-14
[5] rtweet_1.1.0 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[9] dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[13] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] gtable_0.3.3 xfun_0.39 htmlwidgets_1.6.2
[4] lattice_0.21-8 tzdb_0.4.0 leaflet.providers_1.9.0
[7] crosstalk_1.2.0 vctrs_0.6.3 tools_4.3.1
[10] generics_0.1.3 curl_5.0.1 proxy_0.4-27
[13] fansi_1.0.4 pkgconfig_2.0.3 KernSmooth_2.23-22
[16] assertthat_0.2.1 lifecycle_1.0.3 compiler_4.3.1
[19] munsell_0.5.0 fontawesome_0.5.1 jqr_1.2.3
[22] htmltools_0.5.5 class_7.3-22 yaml_2.3.7
[25] lazyeval_0.2.2 crayon_1.5.2 pillar_1.9.0
[28] ellipsis_0.3.2 classInt_0.4-9 tidyselect_1.2.0
[31] digest_0.6.33 stringi_1.7.12 fastmap_1.1.1
[34] grid_4.3.1 colorspace_2.1-0 cli_3.6.1
[37] magrittr_2.0.3 emo_0.0.0.9000 crul_1.4.0
[40] utf8_1.2.3 e1071_1.7-13 withr_2.5.0
[43] scales_1.2.1 sp_2.0-0 timechange_0.2.0
[46] httr_1.4.6 rmarkdown_2.23 hms_1.1.3
[49] evaluate_0.21 knitr_1.43.1 V8_4.3.3
[52] geojson_0.3.4 rlang_1.1.1 Rcpp_1.0.11
[55] glue_1.6.2 DBI_1.1.3 httpcode_0.3.0
[58] geojsonsf_2.0.3 rstudioapi_0.15.0 jsonlite_1.8.7
[61] R6_2.5.1 units_0.8-2
Footnotes
I’m going to assume this is also a wrestling move.↩︎