index.qmd

::: {.callout-note collapse=true appearance="simple"}

# Click here to see the R code for creating the below

Repository with the source code: <https://github.com/wolfganghuber/tweets>

```{r}
#| label: pkgs
#| message: false
library("jsonlite")
library("dplyr")
library("stringr")
```

The file data/tweets.js is in the Twitter archive (zip file) that I downloaded from X. Adapt path to whatever you have.

```{r}
#| label: fromjson
#| cache: true
#| warning: false
archivepath = "/Users/whuber/twitter/data" 
tweets = readLines(file.path(archivepath, "tweets.js")) |>
    sub("^window.YTD.tweets.part0 = ", "", x = _) |>
    fromJSON(flatten = TRUE) 
```

Select tweets and select relevant columns. Here, I chose to drop all retweets and keep all others. Adapt this to your liking.

```{r}
#| label: noRT
isrt = grepl("^RT", tweets$tweet.full_text)
out = dplyr::select(tweets[!isrt, ], all_of(c(
  date = "tweet.created_at", 
  text = "tweet.full_text", 
  id = "tweet.id", 
  retweets = "tweet.retweet_count",
  likes = "tweet.favorite_count",
  mediadf = "tweet.entities.media")
))
```

Some cleanup and prettification: add hyperlinks to URLs and tweet IDs, and sort by date (default: ascending). 

```{r}
#| label: prettify
out = mutate(out, 
    text = str_replace_all(text, "(https?://\\S+)", "<a href='\\1'>\\1</a>"),
    idhtml = sprintf('<a href="https://x.com/wolfgangkhuber/status/%s">%s</a>', id, id),
    date = strptime(out$date, "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
) |> arrange(date)
```

Deal with media. Tweets that have media associated (images, movies) come with a `data.frame` in the `tweet.entities.media` column. We also just go and find all media whose filename contains the tweet ID (see code for `ip` below) and check consistency.

```{r}
#| label: media
indir  = file.path(archivepath, "tweets_media")
outdir = "media"
mediafiles = dir(indir)
out$media = character(nrow(out))

if (file.exists(outdir))
  unlink(outdir, recursive = TRUE)
dir.create(outdir)

for (i in seq_len(nrow(out))) {
  m = out$mediadf[[i]]
  if (!is.null(m)) {
    stopifnot(is.data.frame(m), nrow(m) == 1)
    key = tools::file_path_sans_ext(basename(m$media_url))
    im = grep(key, mediafiles)
    ip = grep(paste0("^", out$id[i]), mediafiles)
    if (length(ip) == 0) {
      message(sprintf("%s from tweet #%d not found",  key, i))
    } else {
      stopifnot(im %in% ip)
      file.copy(file.path(indir, mediafiles[ip]), outdir)
      out$media[i] = paste(
        "::: {.tweet-media}",
        paste(sprintf('![](%s){.lightbox .resized-image}', file.path(outdir, mediafiles[ip])), collapse = "\n"),
        ":::", sep = "\n") 
    }
  }
}
```

Create the markdown text for each tweet. The main work here is done by the [CSS file](tweetarchive.css).

```{r}
#| label: createtweets
tweetsmd = with(out, sprintf(
'::: {#%s .tweet}
::: {.tweet-header}
<span class="tweet-timestamp">%s Retweets: %s Likes: %s</span>
<span class="tweet-handle">%s</span>
:::  
::: {.tweet-content}
%s
:::  
%s
:::
', id, as.character(date), retweets, likes, idhtml, text, media
))
```

Inject into the document.

```{r}
#| label: showtweetsmock
#| eval: false
cat(tweetsmd, "\n", sep = "")
```
::: 

```{r}
#| label: showtweets
#| output: "asis"
#| echo: false
cat(tweetsmd, "\n", sep = "")
```