The Easiest Way to Webscrape Tweets with Python
Before quarantine, I started a Tableau Public dashboard that contained an analysis of tweets from Twitter personality @caucasianjames. Did I finish this dashboard? Nope, but I did figure out the laziest way to gather all of the tweets I needed based on a set of criteria! To get all of @caucasianjames’ tweets, I used a package called GetOldTweets3, which I consider to be the easiest way to webscrape tweets for a Python beginner.
Python and I were good friends back in college: I took CS classes with Python and occasionally did some tutoring for those classes. However, I haven’t had a chance to use it since college, so this blog post will be from the perspective of a might-as-well-be-a-beginner. In other words, Python beginners welcome here.
Before we move forward, I want to give a shout out to the post by Martin Beck that directed me toward GetOldTweets3. This post does a great job of outlining the differences between using Tweepy and GetOldTweets3, as well as how to do each. I used it as the base of my learning, but the below only expands on what he already has written. In case you were wondering, I specifically chose to use GetOldTweets3 over Tweepy for my lazy script because this way, I don’t need to deal with OAuth with the Twitter API.
So let’s get into the how-to: how to scrape tweets from a user using GetOldTweets3.
To install the package using pip, open your command line and enter the following:
pip install GetOldTweets3
This is done in the command line and not in the Python shell directly. If you attempt to do this in Python, the commands will not be recognized.
When the install is finished, you will get a notification that it was successful.
Creating the Python File and Importing the Package
To create a new Python file, open up the shell (IDLE) and navigate to File > New File.
To use GetOldTweets3, we’ll have to import it at the top of our file. Later on, we’ll also use the csv package to write the tweets we scrape into a csv file, so I will import that package as well.
To import a package, use the keyword import followed by the package name. You can assign an alias to the package by using the keyword as followed by the alias you would like to refer to the package as. In this case, I am going to create an alias for GetOldTweets3 because it is a little long to type out every time I want to use something from the package.
import GetOldTweets3 as got
import csv
Establishing our Variables
Next, we are going to create the criteria for the tweets we will be scraping. First, I’m going to establish my variables. In this case, I will be setting up two: the Twitter handle I want to scrape from, and the number of tweets I want to collect.
Declaring variables in Python is very simple. We’ll write the following:
twitterHandle = 'caucasianjames'
tweetCount = 200
This means that from now on when I type “twitterHandle” it will refer to “caucasianjames”, and when I type “tweetCount”, it will refer to 200. I’m wrapping the value for twitterHandle in quotation marks, above, because it is a string. I’m starting with only 200 tweets as it is a relatively small sample size to use while we test.
Creating the Tweet Criteria
We are going to use the variables we just created to set the criteria with the TweetCriteria object’s setter methods, chaining them together into a single statement.
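As a sketch, the criteria setup looks like this. setUsername and setMaxTweets are GetOldTweets3’s setter methods, and the variable names are the ones declared above:

```python
import GetOldTweets3 as got

twitterHandle = 'caucasianjames'
tweetCount = 200

# Chain the setter methods together to build one criteria object
tweetCriteria = got.manager.TweetCriteria().setUsername(twitterHandle) \
                                           .setMaxTweets(tweetCount)
```

Each setter returns the criteria object itself, which is what lets us chain the calls like this instead of writing them on separate lines.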
Create a list of tweets
Now that we have our criteria, we can use it to get a list of all of the tweets meeting that criteria using
got.manager.TweetManager.getTweets(tweetCriteria). This gets me everything in the Tweet object class. I’m only interested in certain fields for my data source, so I’m creating another list that pulls out just the fields I want from each tweet.
Here are all of my options as far as fields go (the attributes on a GetOldTweets3 Tweet object): id, permalink, username, to, text, date, retweets, favorites, mentions, hashtags, and geo.
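Putting the fetch and the field selection together might look like the sketch below. The four fields I pull out here (date, text, retweets, favorites) are my own picks for the data source; the list-comprehension shape is just one way to build the second list:

```python
import GetOldTweets3 as got

tweetCriteria = got.manager.TweetCriteria().setUsername('caucasianjames') \
                                           .setMaxTweets(200)

# getTweets returns a list of Tweet objects matching the criteria
tweets = got.manager.TweetManager.getTweets(tweetCriteria)

# Keep only the fields I want in my data source, one sub-list per tweet
tweetFields = [[tweet.date, tweet.text, tweet.retweets, tweet.favorites]
               for tweet in tweets]
```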
Write tweets to a CSV for use in Tableau!
Finally, we get to output! We’re going to use the csv package we imported at the beginning. We’ll do this by opening the file (mine is currently blank), giving it a name, and creating a writer for it. Then, I can use a
for loop to iterate through my list, writing a new row in the csv file for each item in our list of tweet fields.
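As a self-contained sketch of that loop, here is the csv-writing step with a couple of hypothetical stand-in rows. In the real script, each row comes from the list of tweet fields built from the scraped tweets, and the filename james_tweets.csv is just my choice:

```python
import csv

# Hypothetical stand-in rows; in practice each row is
# [tweet.date, tweet.text, tweet.retweets, tweet.favorites]
tweetFields = [
    ['2020-01-01', 'first tweet text', 12, 340],
    ['2020-01-02', 'second tweet text', 5, 98],
]

# Open (and create) the csv file, then write one row per tweet
with open('james_tweets.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['date', 'text', 'retweets', 'favorites'])  # header row
    for row in tweetFields:
        writer.writerow(row)
```

The newline='' argument matters on Windows: without it, the csv module inserts blank lines between rows.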