Releases: echen102/ukraine-russia

Release v1.5

22 Feb 03:04

The repository contains an ongoing collection of Tweet IDs associated with the current conflict in Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 2/22/22 - 2/17/23.

Twitter is changing its policies around the free API, and we are unsure how this will affect academic access to it. We will continue to collect Tweets and update this repository for as long as we can.

Please refer to the README for more details on the data, its organization, and the data usage agreement.
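Because only Tweet IDs are released, anyone who needs the Tweet content has to look the IDs up again through Twitter's API. The sketch below shows one way to group IDs into request-sized batches before such a lookup. It rests on assumptions not stated in this release: that the IDs arrive as text with one ID per line, and that the lookup endpoint accepts up to 100 IDs per request (the limit documented for the Twitter API v2 Tweet lookup at the time of writing). The function name `batch_ids` and the sample IDs are hypothetical placeholders, and no network call is made.

```python
from typing import Iterable, Iterator, List


def batch_ids(lines: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield lists of at most `batch_size` Tweet IDs read from text lines.

    Assumes one ID per line; blank lines are skipped and non-numeric
    entries raise ValueError so that malformed input is noticed early.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for line in lines:
        tweet_id = line.strip()
        if not tweet_id:
            continue  # skip blank lines
        if not tweet_id.isdigit():
            raise ValueError(f"not a numeric Tweet ID: {tweet_id!r}")
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


# Placeholder IDs for illustration only; they are not taken from the dataset.
sample_lines = ["1001\n", "1002\n", "\n", "1003\n"]
for group in batch_ids(sample_lines, batch_size=2):
    # A comma-joined string is the form an ID-list query parameter usually takes.
    print(",".join(group))
```

Keeping the IDs as strings avoids any loss of precision if they are later passed through tools that treat large integers as floating-point numbers.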

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488

BibTeX:

@misc{chen2022tweets,
      title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia}, 
      author={Emily Chen and Emilio Ferrara},
      year={2022},
      eprint={2203.07488},
      archivePrefix={arXiv},
      primaryClass={cs.SI}
}

Statistics Summary (v1.5)

Number of Tweets: 620,510,853

Language breakdown of the 10 most prevalent languages:

Language    ISO  No. of Tweets  % of total Tweets
English     en     440,353,617             70.97%
French      fr      26,249,721              4.23%
Spanish     es      23,284,103              3.75%
German      de      20,754,440              3.34%
Italian     it      15,342,240              2.47%
Russian     ru      13,520,722              2.18%
Undefined   und     12,902,561              2.08%
Japanese    ja      11,419,369              1.84%
Ukrainian   uk      11,163,902               1.8%
Turkish     tr       8,088,613               1.3%
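The percentage column appears to be each language's Tweet count divided by the release total, rounded to two decimal places. A minimal sketch of that arithmetic, using three rows from the v1.5 table, follows. The helper name `percent_share` is hypothetical, and the rounding rule is inferred from the published figures rather than confirmed by the authors.

```python
def percent_share(count: int, total: int) -> float:
    """Return `count` as a percentage of `total`, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 2)


# Figures copied from the v1.5 statistics summary.
TOTAL_V1_5 = 620_510_853
counts = {"en": 440_353_617, "fr": 26_249_721, "uk": 11_163_902}

for iso, count in counts.items():
    print(f"{iso}: {percent_share(count, TOTAL_V1_5):.2f}%")
```

The ten listed languages do not sum to 100% because the table covers only the most prevalent ones.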

Known Gaps

Date Time

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.4

07 Feb 03:10

The repository contains an ongoing collection of Tweet IDs associated with the current conflict in Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 2/22/22 - 2/01/23.

Twitter is changing its policies around the free API, and we are unsure how this will affect academic access to it. We will continue to collect Tweets and update this repository for as long as we can.

Please refer to the README for more details on the data, its organization, and the data usage agreement.

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488

BibTeX:

@misc{chen2022tweets,
      title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia}, 
      author={Emily Chen and Emilio Ferrara},
      year={2022},
      eprint={2203.07488},
      archivePrefix={arXiv},
      primaryClass={cs.SI}
}

Statistics Summary (v1.4)

Number of Tweets: 601,848,471

Language breakdown of the 10 most prevalent languages:

Language    ISO  No. of Tweets  % of total Tweets
English     en     427,556,217             71.04%
French      fr      25,059,327              4.16%
Spanish     es      22,843,274               3.8%
German      de      19,766,768              3.28%
Italian     it      14,773,605              2.45%
Russian     ru      13,157,263              2.19%
Undefined   und     12,735,856              2.12%
Japanese    ja      11,190,426              1.86%
Ukrainian   uk      10,829,828               1.8%
Turkish     tr       7,812,365               1.3%

Known Gaps

Date Time

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.3

13 Jan 12:31

The repository contains an ongoing collection of Tweet IDs associated with the current conflict in Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 2/22/22 - 1/08/23.

Please refer to the README for more details on the data, its organization, and the data usage agreement.

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488

BibTeX:

@misc{chen2022tweets,
      title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia}, 
      author={Emily Chen and Emilio Ferrara},
      year={2022},
      eprint={2203.07488},
      archivePrefix={arXiv},
      primaryClass={cs.SI}
}

Statistics Summary (v1.3)

Number of Tweets: 571,558,960

Language breakdown of the 10 most prevalent languages:

Language    ISO  No. of Tweets  % of total Tweets
English     en     406,376,392              71.1%
French      fr      23,371,718              4.09%
Spanish     es      22,154,023              3.88%
German      de      18,154,279              3.18%
Italian     it      14,011,497              2.45%
Russian     ru      12,575,979               2.2%
Undefined   und     12,468,591              2.18%
Japanese    ja      10,845,286               1.9%
Ukrainian   uk      10,243,136              1.79%
Turkish     tr       7,346,366              1.29%

Known Gaps

Date Time

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.2

07 Oct 00:08

The repository contains an ongoing collection of Tweet IDs associated with the current conflict in Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 2/22/22 - 10/01/22.

Please refer to the README for more details on the data, its organization, and the data usage agreement.

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488

BibTeX:

@misc{chen2022tweets,
      title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia}, 
      author={Emily Chen and Emilio Ferrara},
      year={2022},
      eprint={2203.07488},
      archivePrefix={arXiv},
      primaryClass={cs.SI}
}

Statistics Summary (v1.2)

Number of Tweets: 454,488,445

Language breakdown of the 15 most prevalent languages:

Language    ISO  No. of Tweets  % of total Tweets
English     en     321,088,619             70.65%
Spanish     es      18,358,931              4.04%
French      fr      17,857,397              3.93%
German      de      14,533,854               3.2%
Italian     it      11,589,565              2.55%
Undefined   und     11,473,234              2.52%
Russian     ru       9,968,421              2.19%
Japanese    ja       9,113,466              2.01%
Ukrainian   uk       8,016,384              1.76%
Turkish     tr       6,219,988              1.37%
Portuguese  pt       3,897,544              0.86%
Polish      pl       3,411,167              0.75%
Dutch       nl       1,837,698               0.4%
Indonesian  in       1,607,514              0.35%
Chinese     zh       1,430,735              0.31%

Known Gaps

Date Time

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.1

06 Apr 22:02

The repository contains an ongoing collection of Tweet IDs associated with the current conflict in Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 2/22/22 - 03/27/22.

Please refer to the README for more details on the data, its organization, and the data usage agreement.

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488

BibTeX:

@misc{chen2022tweets,
      title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia}, 
      author={Emily Chen and Emilio Ferrara},
      year={2022},
      eprint={2203.07488},
      archivePrefix={arXiv},
      primaryClass={cs.SI}
}

Statistics Summary (v1.1)

Number of Tweets: 141,084,354

Language breakdown of the 15 most prevalent languages:

Language    ISO  No. of Tweets  % of total Tweets
English     en     103,148,176             73.11%
Spanish     es       6,824,764              4.84%
French      fr       5,322,756              3.77%
Undefined   und      4,742,414              3.36%
Japanese    ja       2,911,591              2.06%
German      de       2,873,438              2.04%
Italian     it       2,650,310              1.88%
Russian     ru       1,914,236              1.36%
Turkish     tr       1,668,966              1.18%
Portuguese  pt       1,518,027              1.08%
Ukrainian   uk       1,451,532              1.03%
Polish      pl       1,134,799               0.8%
Thai        th         715,638              0.51%
Indonesian  in         540,813              0.38%
Dutch       nl         437,723              0.31%

Known Gaps

Date Time

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.0

16 Mar 06:34

The repository contains an ongoing collection of Tweet IDs associated with the current conflict in Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 2/22/22 - 03/08/22.

Please refer to the README for more details on the data, its organization, and the data usage agreement.

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488

BibTeX:

@misc{chen2022tweets,
      title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia}, 
      author={Emily Chen and Emilio Ferrara},
      year={2022},
      eprint={2203.07488},
      archivePrefix={arXiv},
      primaryClass={cs.SI}
}

Statistics Summary (v1.0)

Number of Tweets: 63,417,299

Language breakdown of the 15 most prevalent languages:

Language    ISO  No. of Tweets  % of total Tweets
English     en      46,027,619             72.58%
Spanish     es       3,568,412              5.63%
French      fr       2,388,366              3.77%
Undefined   und      2,055,267              3.24%
German      de       1,289,814              2.03%
Japanese    ja       1,232,532              1.94%
Turkish     tr       1,006,548              1.59%
Italian     it         903,205              1.42%
Portuguese  pt         899,671              1.42%
Russian     ru         622,920              0.98%
Thai        th         503,889              0.79%
Polish      pl         465,238              0.73%
Ukrainian   uk         406,535              0.64%
Indonesian  in         302,191              0.48%
Hindi       hi         272,851              0.43%

Known Gaps

Date Time

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.