::: {.callout-note collapse=true appearance="simple"}
# Click here to see the R code that creates the tweet archive below
Repository with the source code: <https://github.com/wolfganghuber/tweets>
```{r}
#| label: pkgs
#| message: false
library("jsonlite")
library("dplyr")
library("stringr")
```
The file `tweets.js` is in the `data` directory of the Twitter archive (zip file) that I downloaded from X. Adapt the path to wherever you unpacked yours.
```{r}
#| label: fromjson
#| cache: true
#| warning: false
archivepath = "/Users/whuber/twitter/data"
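# Strip the JavaScript assignment so that plain JSON remains (dots escaped to match literally)
tweets = readLines(file.path(archivepath, "tweets.js")) |>
  sub("^window\\.YTD\\.tweets\\.part0 = ", "", x = _) |>
  fromJSON(flatten = TRUE)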
```
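To see what the prefix stripping and `flatten = TRUE` do, here is a minimal sketch on a made-up, two-tweet stand-in for `tweets.js`. The sample content is hypothetical and much smaller than a real archive. Only the `window.YTD.tweets.part0 = ` prefix and the nested `tweet` object mirror what the code above expects.

```r
library("jsonlite")

# Hypothetical miniature of tweets.js: a JavaScript assignment wrapping a JSON array
lines = c(
  'window.YTD.tweets.part0 = [',
  '  {"tweet": {"id": "1", "full_text": "hello", "favorite_count": "3"}},',
  '  {"tweet": {"id": "2", "full_text": "RT @someone: hi", "favorite_count": "0"}}',
  ']'
)

# Remove the assignment from the first line; the other lines are left untouched
json = sub("^window\\.YTD\\.tweets\\.part0 = ", "", lines)

# Parse as one JSON string; flatten = TRUE turns the nested "tweet" object
# into top-level columns with a "tweet." prefix
demo = fromJSON(paste(json, collapse = "\n"), flatten = TRUE)
names(demo)
```

The dotted column names produced here are the reason the later chunks refer to columns such as `tweet.full_text`.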
Select tweets and keep the relevant columns. Here, I drop all retweets and keep everything else. Adapt this to your liking.
```{r}
#| label: noRT
isrt = grepl("^RT", tweets$tweet.full_text)
out = dplyr::select(tweets[!isrt, ], all_of(c(
    date     = "tweet.created_at",
    text     = "tweet.full_text",
    id       = "tweet.id",
    retweets = "tweet.retweet_count",
    likes    = "tweet.favorite_count",
    mediadf  = "tweet.entities.media")
))
```
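The chunk above selects and renames in one step by passing a named character vector to `all_of()`. A minimal sketch of that idiom on a hypothetical two-row table (the data and the `demo`, `keep` and `small` names are made up for illustration):

```r
library("dplyr")

# Hypothetical stand-in for the flattened tweets table
demo = data.frame(
  tweet.id        = c("1", "2"),
  tweet.full_text = c("hello", "RT @someone: hi"),
  tweet.lang      = c("en", "en")
)

# The names of the vector become the new column names;
# all_of() raises an error if any listed column is missing
keep = c(id = "tweet.id", text = "tweet.full_text")

isrt  = grepl("^RT", demo$tweet.full_text)   # TRUE for the second row only
small = select(demo[!isrt, ], all_of(keep))
names(small)
```

Because `all_of()` is strict, a column missing from an archive fails loudly instead of being dropped silently.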
Some cleanup and prettification: turn URLs and tweet IDs into hyperlinks, parse the timestamps, and sort by date (ascending by default).
```{r}
#| label: prettify
out = mutate(out,
    text   = str_replace_all(text, "(https?://\\S+)", "<a href='\\1'>\\1</a>"),
    idhtml = sprintf('<a href="https://x.com/wolfgangkhuber/status/%s">%s</a>', id, id),
    # Convert to POSIXct: strptime() returns POSIXlt, which is awkward as a data frame column
    date   = as.POSIXct(strptime(date, "%a %b %d %H:%M:%S %z %Y", tz = "UTC"))
  ) |> arrange(date)
```
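As a quick check of the two transformations, here is a small base-R sketch on made-up values. The timestamp and URL are hypothetical, and `gsub()` stands in for `str_replace_all()` so that the snippet needs no packages. Note that `%a` and `%b` are matched against the day and month names of the current locale, so this assumes an English (or C) locale.

```r
# Made-up timestamp in the layout that the format string above expects
stamp  = "Wed Oct 10 20:19:24 +0000 2018"
parsed = as.POSIXct(strptime(stamp, "%a %b %d %H:%M:%S %z %Y", tz = "UTC"))
shown  = format(parsed, "%Y-%m-%d %H:%M:%S")

# Wrap every URL in an anchor tag; \\1 refers to the captured URL
txt    = "Slides are at https://example.org/talk.pdf"
linked = gsub("(https?://\\S+)", "<a href='\\1'>\\1</a>", txt, perl = TRUE)

print(shown)
print(linked)
```

If `shown` comes out as `NA`, the locale is the likely cause, not the format string.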
Deal with media. Tweets with associated media (images, videos) carry a `data.frame` in the `tweet.entities.media` column (renamed to `mediadf` above). In addition, we look up all media files whose filename starts with the tweet ID (see the code for `ip` below) and check that the two are consistent.
```{r}
#| label: media
indir  = file.path(archivepath, "tweets_media")
outdir = "media"
mediafiles = dir(indir)
out$media = character(nrow(out))
# Start from an empty output directory
if (dir.exists(outdir))
  unlink(outdir, recursive = TRUE)
dir.create(outdir)
for (i in seq_len(nrow(out))) {
  m = out$mediadf[[i]]
  if (!is.null(m)) {
    stopifnot(is.data.frame(m), nrow(m) == 1)
    key = tools::file_path_sans_ext(basename(m$media_url))
    im = grep(key, mediafiles, fixed = TRUE)        # files matching the media key (literal match)
    ip = grep(paste0("^", out$id[i]), mediafiles)   # files whose name starts with the tweet ID
    if (length(ip) == 0) {
      message(sprintf("%s from tweet #%d not found", key, i))
    } else {
      stopifnot(im %in% ip)                         # consistency check between both lookups
      file.copy(file.path(indir, mediafiles[ip]), outdir)
      out$media[i] = paste(
        "::: {.tweet-media}",
        paste(sprintf('![](%s){.lightbox .resized-image}', file.path(outdir, mediafiles[ip])), collapse = "\n"),
        ":::", sep = "\n")
    }
  }
}
```
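The matching logic in the loop can be tried out in isolation. The sketch below uses a hypothetical file listing and a made-up media URL. The naming pattern of the files (tweet ID first, then the media key) is an assumption for illustration, consistent with the two `grep()` calls above.

```r
# Hypothetical directory listing; names are assumed to start with the tweet ID
mediafiles = c("111-AbCdEf.jpg", "111-GhIjKl.jpg", "222-MnOpQr.mp4")
id  = "111"

# Media key = file name of the (made-up) media URL without its extension
key = tools::file_path_sans_ext(basename("https://example.org/media/AbCdEf.jpg"))

im = grep(key, mediafiles, fixed = TRUE)   # files matching the media key
ip = grep(paste0("^", id), mediafiles)     # files whose name starts with the tweet ID
stopifnot(im %in% ip)                      # the key match must be among the ID matches

# One markdown image per matched file, wrapped in a fenced div
block = paste(
  "::: {.tweet-media}",
  paste(sprintf("![](%s){.lightbox .resized-image}", file.path("media", mediafiles[ip])),
        collapse = "\n"),
  ":::", sep = "\n")
cat(block, "\n")
```

Here the tweet ID lookup finds two files while the key lookup finds one, which is why the loop copies everything in `ip` and not only `im`.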
Create the markdown text for each tweet. The main work here is done by the [CSS file](tweetarchive.css).
```{r}
#| label: createtweets
tweetsmd = with(out, sprintf(
'::: {#%s .tweet}
::: {.tweet-header}
<span class="tweet-timestamp">%s Retweets: %s Likes: %s</span>
<span class="tweet-handle">%s</span>
:::
::: {.tweet-content}
%s
:::
%s
:::
', id, as.character(date), retweets, likes, idhtml, text, media
))
```
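The chunk above relies on `sprintf()` being vectorized: given columns of length n, it returns n filled-in copies of the template. A reduced sketch with a hypothetical two-row table and a shortened template:

```r
# Hypothetical two-tweet table with a subset of the columns used above
demo = data.frame(id = c("1", "2"), text = c("first", "second"), likes = c(3, 0))

# One filled-in template per row; with() lets the column names be used directly
md = with(demo, sprintf(
'::: {#%s .tweet}
%s (Likes: %s)
:::
', id, text, likes))

length(md)
cat(md, sep = "\n")
```

Each element of `md` is a complete fenced div, so the elements can be concatenated in any order or filtered before printing.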
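Inject the markdown into the document. The chunk shown here is not evaluated; an identical hidden chunk with `output: "asis"`, placed after this callout, does the actual work.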
```{r}
#| label: showtweetsmock
#| eval: false
cat(tweetsmd, "\n", sep = "")
```
:::
```{r}
#| label: showtweets
#| output: "asis"
#| echo: false
cat(tweetsmd, "\n", sep = "")
```