fastforwardlabs/tweetratio

Twitter reply-to-retweet ratio scraping code

This is the scraping and front-end code used to acquire and visualize the data discussed in A Quick Look at the Reply-to-Retweet Ratio.

Installation

Requirements: Python 3.6+ (f-strings!)

$ git clone git@github.com:fastforwardlabs/tweetratio.git
$ cd tweetratio
$ python3 -m virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ mkdir -p raw minified csv  # for output

Usage

To download the last 3200 tweets from realDonaldTrump as JSON and add a reply_count field to each tweet, run

>>> import tweetratio
>>> tweetratio.get_user('realDonaldTrump')

Because this code has to scrape pages as well as make API calls, it takes 30-60 minutes, depending on the speed of your internet connection.

The tweets can then be found in raw/realDonaldTrump.json.
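Once the download finishes, the ratio itself is simple to compute. The following is a minimal sketch, not code from this repository. It assumes the raw file is a JSON list of tweet objects, that each one carries the reply_count field described above, and that the retweet count is stored under the Twitter API's usual retweet_count key. The function names are hypothetical.

```python
import json
from pathlib import Path


def reply_to_retweet_ratio(tweet):
    """Return reply_count / retweet_count, or None if it cannot be computed.

    Assumes `tweet` is a dict with `reply_count` and `retweet_count` keys.
    """
    replies = tweet.get("reply_count")
    retweets = tweet.get("retweet_count")
    # Skip tweets with a missing reply count or with zero/missing retweets,
    # since the ratio is undefined for them.
    if replies is None or not retweets:
        return None
    return replies / retweets


def load_ratios(path):
    """Load tweets from `path` (assumed: a JSON list of dicts) and compute ratios."""
    tweets = json.loads(Path(path).read_text(encoding="utf-8"))
    return [reply_to_retweet_ratio(t) for t in tweets]
```

For example, `load_ratios("raw/realDonaldTrump.json")` would return one ratio (or None) per tweet, provided the file has the layout assumed here.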

To produce a minified copy of the tweets, which contains only the keys necessary for the visualization, along with the same data as a CSV file, run

>>> import analysis
>>> analysis.process('realDonaldTrump')

The minified JSON is saved to minified/realDonaldTrump.json. The CSV is saved to csv/realDonaldTrump.csv.
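To illustrate what "minified" means here, the sketch below keeps a fixed subset of keys from each tweet and writes the result as CSV text. It is not the implementation in analysis.py. The key list is an assumption chosen for illustration (the real set is whatever the visualization needs), and the function names are hypothetical.

```python
import csv
import io

# Assumed subset of keys; the actual keys kept by analysis.py may differ.
KEEP_KEYS = ("id_str", "created_at", "retweet_count", "reply_count")


def minify(tweets, keys=KEEP_KEYS):
    """Return copies of the tweets containing only `keys` (missing keys become None)."""
    return [{key: tweet.get(key) for key in keys} for tweet in tweets]


def to_csv(rows, keys=KEEP_KEYS):
    """Serialize minified rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=keys)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Dropping unused keys keeps the files served to the browser small, which matters when each account contributes thousands of tweets.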

Frontend

To run the visualization locally, download and minify the data for realDonaldTrump, BernieSanders, BarackObama, HillaryClinton, GovMikeHuckabee, dril and SpeakerRyan (see above). If you'd like to plot other accounts, download those and change web/app.js.

Then move the minified files into web/data/ and start a local web server:

$ mv minified/* web/data/
$ cd web
$ python3 -m http.server

Finally, visit localhost:8000 in your browser.

Analysis

analysis.py contains simple code to load the tweets as a pandas DataFrame. For example:

>>> import analysis
>>> tweets = analysis.load_df()
>>> analysis.plot_trend(tweets)

U.S. Senators

batch_download.py demonstrates how to download the tweets for a list of users (e.g. the U.S. senators as of June 2017).
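Since each account can take 30-60 minutes, a batch run benefits from carrying on when one account fails. The sketch below shows one way to do that. It is not the contents of batch_download.py. `download_all` is a hypothetical helper, and the `download` argument stands in for a function such as tweetratio.get_user.

```python
def download_all(usernames, download):
    """Call download(username) for each user, collecting failures instead of aborting.

    Returns a dict mapping each failed username to the exception it raised.
    """
    failed = {}
    for name in usernames:
        try:
            download(name)
        except Exception as exc:  # keep going so one bad account doesn't stop the batch
            failed[name] = exc
    return failed
```

With the package importable, a call might look like `download_all(["BernieSanders", "SpeakerRyan"], tweetratio.get_user)`. Any usernames in the returned dict can then be retried.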