Releases: echen102/ukraine-russia
Release v1.5
This repository contains an ongoing collection of Tweet IDs associated with the current conflict between Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 2/22/22 - 2/17/23.
Twitter is changing its policies around the free API, and we are unsure how these changes will affect academic access. We will continue to collect Tweets and update this repository for as long as we can.
Please refer to the README for more details regarding the data, its organization, and the data usage agreement.
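Because only Tweet IDs are released, it can help to sanity-check that an ID falls inside a release's collection window before spending API quota on it. The sketch below relies on the fact that Tweet IDs issued since late 2010 are Snowflake IDs whose upper bits encode a millisecond timestamp. The function names are illustrative and not part of the dataset's tooling, and treating the release dates as UTC calendar days is an assumption, since the release notes do not state a time zone.

```python
from datetime import datetime, timedelta, timezone

# Offset of the Snowflake epoch from the Unix epoch, in milliseconds.
TWITTER_EPOCH_MS = 1288834974657


def tweet_id_to_datetime(tweet_id: int) -> datetime:
    """Recover the creation time (UTC) encoded in a Snowflake Tweet ID."""
    # The low 22 bits hold worker and sequence numbers; the rest is the timestamp.
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


def in_release_window(tweet_id: int, start: datetime, end: datetime) -> bool:
    """Return True if the Tweet was created in the half-open interval [start, end)."""
    return start <= tweet_id_to_datetime(tweet_id) < end


# Assumed window for v1.5: 2/22/22 through the end of 2/17/23, in UTC.
V1_5_START = datetime(2022, 2, 22, tzinfo=timezone.utc)
V1_5_END = datetime(2023, 2, 18, tzinfo=timezone.utc)
```

IDs that land slightly outside the window are not necessarily errors: the stated dates describe when Tweets were collected, which may differ from when they were created.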
Data Usage Agreement / How to Cite
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:
Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488
BibTeX:
@misc{chen2022tweets,
title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia},
author={Emily Chen and Emilio Ferrara},
year={2022},
eprint={2203.07488},
archivePrefix={arXiv},
primaryClass={cs.SI}
}
Statistics Summary (v1.5)
Number of Tweets : 620,510,853
Language breakdown of top 10 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 440,353,617 | 70.97% |
French | fr | 26,249,721 | 4.23% |
Spanish | es | 23,284,103 | 3.75% |
German | de | 20,754,440 | 3.34% |
Italian | it | 15,342,240 | 2.47% |
Russian | ru | 13,520,722 | 2.18% |
Undefined | und | 12,902,561 | 2.08% |
Japanese | ja | 11,419,369 | 1.84% |
Ukrainian | uk | 11,163,902 | 1.8% |
Turkish | tr | 8,088,613 | 1.3% |
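The percentage column can be reproduced from the raw counts. The short sketch below recomputes each language's share of the v1.5 total; the variable names are illustrative, and rounding to two decimal places is an assumption inferred from how the table is printed.

```python
# Counts copied from the v1.5 table above.
TOTAL_TWEETS = 620_510_853
COUNTS = {
    "en": 440_353_617,
    "fr": 26_249_721,
    "es": 23_284_103,
    "de": 20_754_440,
    "it": 15_342_240,
    "ru": 13_520_722,
    "und": 12_902_561,
    "ja": 11_419_369,
    "uk": 11_163_902,
    "tr": 8_088_613,
}

# Share of all collected Tweets per language, as a percentage.
SHARES = {code: round(100 * n / TOTAL_TWEETS, 2) for code, n in COUNTS.items()}

for code, share in SHARES.items():
    print(f"{code}: {share}%")
```

The same calculation applies to the other releases by swapping in their totals and counts.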
Known Gaps
Date | Time |
---|---|
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.4
This repository contains an ongoing collection of Tweet IDs associated with the current conflict between Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 2/22/22 - 2/01/23.
Twitter is changing its policies around the free API, and we are unsure how these changes will affect academic access. We will continue to collect Tweets and update this repository for as long as we can.
Please refer to the README for more details regarding the data, its organization, and the data usage agreement.
Data Usage Agreement / How to Cite
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:
Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488
BibTeX:
@misc{chen2022tweets,
title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia},
author={Emily Chen and Emilio Ferrara},
year={2022},
eprint={2203.07488},
archivePrefix={arXiv},
primaryClass={cs.SI}
}
Statistics Summary (v1.4)
Number of Tweets : 601,848,471
Language breakdown of top 10 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 427,556,217 | 71.04% |
French | fr | 25,059,327 | 4.16% |
Spanish | es | 22,843,274 | 3.8% |
German | de | 19,766,768 | 3.28% |
Italian | it | 14,773,605 | 2.45% |
Russian | ru | 13,157,263 | 2.19% |
Undefined | und | 12,735,856 | 2.12% |
Japanese | ja | 11,190,426 | 1.86% |
Ukrainian | uk | 10,829,828 | 1.8% |
Turkish | tr | 7,812,365 | 1.3% |
Known Gaps
Date | Time |
---|---|
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.3
This repository contains an ongoing collection of Tweet IDs associated with the current conflict between Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 2/22/22 - 1/08/23.
Please refer to the README for more details regarding the data, its organization, and the data usage agreement.
Data Usage Agreement / How to Cite
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:
Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488
BibTeX:
@misc{chen2022tweets,
title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia},
author={Emily Chen and Emilio Ferrara},
year={2022},
eprint={2203.07488},
archivePrefix={arXiv},
primaryClass={cs.SI}
}
Statistics Summary (v1.3)
Number of Tweets : 571,558,960
Language breakdown of top 10 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 406,376,392 | 71.1% |
French | fr | 23,371,718 | 4.09% |
Spanish | es | 22,154,023 | 3.88% |
German | de | 18,154,279 | 3.18% |
Italian | it | 14,011,497 | 2.45% |
Russian | ru | 12,575,979 | 2.2% |
Undefined | und | 12,468,591 | 2.18% |
Japanese | ja | 10,845,286 | 1.9% |
Ukrainian | uk | 10,243,136 | 1.79% |
Turkish | tr | 7,346,366 | 1.29% |
Known Gaps
Date | Time |
---|---|
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.2
This repository contains an ongoing collection of Tweet IDs associated with the current conflict between Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 2/22/22 - 10/01/22.
Please refer to the README for more details regarding the data, its organization, and the data usage agreement.
Data Usage Agreement / How to Cite
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:
Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488
BibTeX:
@misc{chen2022tweets,
title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia},
author={Emily Chen and Emilio Ferrara},
year={2022},
eprint={2203.07488},
archivePrefix={arXiv},
primaryClass={cs.SI}
}
Statistics Summary (v1.2)
Number of Tweets : 454,488,445
Language breakdown of top 15 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 321,088,619 | 70.65% |
Spanish | es | 18,358,931 | 4.04% |
French | fr | 17,857,397 | 3.93% |
German | de | 14,533,854 | 3.2% |
Italian | it | 11,589,565 | 2.55% |
Undefined | und | 11,473,234 | 2.52% |
Russian | ru | 9,968,421 | 2.19% |
Japanese | ja | 9,113,466 | 2.01% |
Ukrainian | uk | 8,016,384 | 1.76% |
Turkish | tr | 6,219,988 | 1.37% |
Portuguese | pt | 3,897,544 | 0.86% |
Polish | pl | 3,411,167 | 0.75% |
Dutch | nl | 1,837,698 | 0.4% |
Indonesian | in | 1,607,514 | 0.35% |
Chinese | zh | 1,430,735 | 0.31% |
Known Gaps
Date | Time |
---|---|
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.1
This repository contains an ongoing collection of Tweet IDs associated with the current conflict between Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 2/22/22 - 03/27/22.
Please refer to the README for more details regarding the data, its organization, and the data usage agreement.
Data Usage Agreement / How to Cite
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:
Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488
BibTeX:
@misc{chen2022tweets,
title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia},
author={Emily Chen and Emilio Ferrara},
year={2022},
eprint={2203.07488},
archivePrefix={arXiv},
primaryClass={cs.SI}
}
Statistics Summary (v1.1)
Number of Tweets : 141,084,354
Language breakdown of top 15 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 103,148,176 | 73.11% |
Spanish | es | 6,824,764 | 4.84% |
French | fr | 5,322,756 | 3.77% |
Undefined | und | 4,742,414 | 3.36% |
Japanese | ja | 2,911,591 | 2.06% |
German | de | 2,873,438 | 2.04% |
Italian | it | 2,650,310 | 1.88% |
Russian | ru | 1,914,236 | 1.36% |
Turkish | tr | 1,668,966 | 1.18% |
Portuguese | pt | 1,518,027 | 1.08% |
Ukrainian | uk | 1,451,532 | 1.03% |
Polish | pl | 1,134,799 | 0.8% |
Thai | th | 715,638 | 0.51% |
Indonesian | in | 540,813 | 0.38% |
Dutch | nl | 437,723 | 0.31% |
Known Gaps
Date | Time |
---|---|
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.0
This repository contains an ongoing collection of Tweet IDs associated with the current conflict between Ukraine and Russia, which we began collecting on February 22, 2022. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 2/22/22 - 03/08/22.
Please refer to the README for more details regarding the data, its organization, and the data usage agreement.
Data Usage Agreement / How to Cite
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:
Emily Chen and Emilio Ferrara. 2022. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia. arXiv:cs.SI/2203.07488
BibTeX:
@misc{chen2022tweets,
title={Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia},
author={Emily Chen and Emilio Ferrara},
year={2022},
eprint={2203.07488},
archivePrefix={arXiv},
primaryClass={cs.SI}
}
Statistics Summary (v1.0)
Number of Tweets : 63,417,299
Language breakdown of top 15 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 46,027,619 | 72.58% |
Spanish | es | 3,568,412 | 5.63% |
French | fr | 2,388,366 | 3.77% |
Undefined | und | 2,055,267 | 3.24% |
German | de | 1,289,814 | 2.03% |
Japanese | ja | 1,232,532 | 1.94% |
Turkish | tr | 1,006,548 | 1.59% |
Italian | it | 903,205 | 1.42% |
Portuguese | pt | 899,671 | 1.42% |
Russian | ru | 622,920 | 0.98% |
Thai | th | 503,889 | 0.79% |
Polish | pl | 465,238 | 0.73% |
Ukrainian | uk | 406,535 | 0.64% |
Indonesian | in | 302,191 | 0.48% |
Hindi | hi | 272,851 | 0.43% |
Known Gaps
Date | Time |
---|---|
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.