Monitoring Events with Twarc

Back to Home Page

Monitoring Events Using twarc Filter and Search

This is a narrative guide outlining how to start a search and a filter, then combine the results once the event is over. We're going to run this on a recent news event about the Governor of Florida, but any topic will work.

Table of Contents

Before You Start
Filter and Search
Dehydrate
Combine
Rehydrate
Deduplicate
Analysis

Before starting this guide, make sure you have twarc installed and set up.

Next, run twarc filter, which collects tweets from the Twitter stream as they are posted and match the filter criteria, and twarc search, which collects tweets from the past seven days that match the search criteria. There are a couple of ways to do this, but the simplest is to run each command in its own command-line window.

twarc filter desantis > desantis_filter.jsonl

twarc search desantis > desantis_search.jsonl

The search command finishes on its own, while the filter keeps running until you stop it manually. Once the search has finished and you have stopped the filter at the end of the event, we can combine the two JSONL files.
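If you would rather not keep two windows open, the two commands can also be launched from one script. The following is only a sketch of that idea, not part of twarc: the helper names `build_commands`, `start`, and `collect` are made up for this example, and it assumes the `twarc` command is on your PATH and already configured.

```python
import subprocess


def build_commands(keyword):
    """Return the two twarc command lines from this guide, keyed by output file."""
    return {
        f"{keyword}_filter.jsonl": ["twarc", "filter", keyword],
        f"{keyword}_search.jsonl": ["twarc", "search", keyword],
    }


def start(command, output_path):
    """Launch a command in the background, redirecting its stdout to output_path."""
    out = open(output_path, "w")
    return subprocess.Popen(command, stdout=out), out


def collect(keyword):
    """Run filter and search side by side; press Ctrl+C to stop the filter."""
    running = [start(cmd, path) for path, cmd in build_commands(keyword).items()]
    try:
        for proc, _ in running:
            # search exits on its own; filter only exits when interrupted
            proc.wait()
    except KeyboardInterrupt:
        for proc, _ in running:
            proc.terminate()
    finally:
        for proc, out in running:
            proc.wait()
            out.close()
```

Calling `collect("desantis")` would write `desantis_filter.jsonl` and `desantis_search.jsonl`, the same file names used in the commands above.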

We will start by dehydrating the two collected datasets.

twarc dehydrate desantis_filter.jsonl > desantis_filter.txt

twarc dehydrate desantis_search.jsonl > desantis_search.txt
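To make it clearer what dehydrating does, here is a rough Python equivalent. It is a sketch rather than twarc's own code: the function name `dehydrate` is chosen for this example, and it assumes each line of the JSONL file is one tweet object with an `id_str` field.

```python
import json


def dehydrate(jsonl_path, ids_path):
    """Write the ID of every tweet in jsonl_path to ids_path, one per line."""
    count = 0
    with open(jsonl_path, encoding="utf-8") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweet = json.loads(line)
            # assumption: the tweet ID is stored as a string under "id_str"
            dst.write(tweet["id_str"] + "\n")
            count += 1
    return count
```

The resulting text file holds nothing but tweet IDs, which is why the merged file has to be rehydrated later to get the full tweets back.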

Now that the datasets have been dehydrated, we can use the Python program combine.py (linked here) to combine them.

python utils/combine.py 

And enter the input requests as follows:

Enter the name of your filter txt: desantis_filter.txt
Enter the name of your search txt: desantis_search.txt
Enter the name of your output txt: desantis_fs.txt    
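The source of combine.py is not reproduced in this guide, so the following is only a guess at the simplest thing such a step could do: append the search IDs to the filter IDs in a single file. The function name `combine_ids` is hypothetical, and duplicates are deliberately left in place because the deduplication step comes later.

```python
def combine_ids(filter_path, search_path, output_path):
    """Concatenate two files of tweet IDs into output_path, one ID per line."""
    written = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in (filter_path, search_path):
            with open(path, encoding="utf-8") as src:
                for line in src:
                    tweet_id = line.strip()
                    if tweet_id:  # ignore blank lines
                        out.write(tweet_id + "\n")
                        written += 1
    return written
```

With the file names from this guide, the call would be `combine_ids("desantis_filter.txt", "desantis_search.txt", "desantis_fs.txt")`.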

Now that we have the merged list of tweet IDs, we can rehydrate it into full tweets.

twarc hydrate desantis_fs.txt > desantis_fs.jsonl

Then, we can run deduplicate.py to remove any overlap from the merging of the two datasets.

python utils/deduplicate.py desantis_fs.jsonl > desantis.jsonl
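The idea behind the deduplication step can be sketched in a few lines of Python. This is an illustration and not the actual deduplicate.py script: the function name `deduplicate` is chosen for this example, and it assumes every line is a tweet object with an `id_str` field.

```python
import json


def deduplicate(lines):
    """Yield each JSON line whose tweet ID has not been seen before."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # assumption: the tweet ID is stored as a string under "id_str"
        tweet_id = json.loads(line)["id_str"]
        if tweet_id not in seen:
            seen.add(tweet_id)
            yield line
```

Because it keeps the first copy of each tweet and drops later ones, the order of the remaining tweets is unchanged.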

The full command-line session is shown in these screenshots:

[Screenshot: DESANTIS1]

[Screenshot: DESANTIS2]

Now that we have our merged dataset without duplicate IDs, we can perform analysis using the Python utilities provided with twarc. See the twarc page for more information and links to the repository.

You can download the DeSantis files from the twitter repo.

Back To Top