Access to Twitter APIs
- nikolaos giannakoulis

- Nov 27, 2019
Updated: Nov 1, 2021
1. ---
2. title: "**Getting Data from Twitter**"
3. author: "NGiannakoulis"
4. date: '`r Sys.Date()`'
5. output:
6.   html_document:
7.     toc: yes
8.     number_sections: yes
9.     theme: cosmo
10.     highlight: tango
11. ---
12.
13. <center><img
14. src="https://i.imgur.com/bROptKf.png">
15. </center>
16.
17. # **Introduction**
18.
19. Hi! In this kernel we are going to learn step by step how to use the Twitter API to capture data, using three different R packages: **twitteR**, **streamR** and **rtweet**.
20. Before starting the tutorial, take a look at the following basic prerequisites:
21.
22. **1**. You have already installed [**R**](https://www.r-project.org/) and are using [**RStudio**](https://www.rstudio.com/).
23.
24. **2**. You need a [**Twitter application**](https://apps.twitter.com/) and hence a [**Twitter account**](https://twitter.com/). Don't worry if you don't have a Twitter application yet;
25. this kernel explains how to create one.
26.
27. I hope that once you have read this tutorial you will be able to easily capture data from Twitter. So let's begin!
28.
29. # **Create a Twitter Application**
30.
31. Using the Twitter API requires an authorized Twitter App and authenticated requests. Let's start by creating the Twitter App.
32.
33. **1**. Sign in using your Twitter account and open the following link: https://dev.twitter.com/apps
34.
35. **2**. Click on the button "Create an app". The process consists of the following steps: user profile, account details, use case details, terms of service and email verification.
36.
37. **3**. Select your user profile to associate. This @username will be the admin of this developer account. For example, in my case, the user profile associated is @Xavier91vg.
38.
39. **4**. Select the option "I am requesting access for my own personal use" and add your account details (account name and primary country of operation).
40.
41. **5**. Fill out the form about your project. Here you have to describe what you would like to build with Twitter's APIs (minimum characters: 300).
42.
43. **6**. Read and agree to the Terms of Service.
44.
45. **7**. To complete your application, check your inbox to confirm your email address.
46.
47. **8**. Wait while the application is under review. You'll receive an email when the review is complete.
48.
49. # **Generating access tokens**
50.
51. Follow the steps below to generate access tokens for an existing Twitter app:
52.
53. **1**. Log in to your Twitter account on developer.twitter.com.
54.
55. **2**. Navigate to the Twitter app dashboard and open the Twitter app for which you would like to generate access tokens.
56.
57. **3**. Navigate to the "Keys and Tokens" page.
58.
59. **4**. Select "Create" under the "Access token & access token secret" section.
60.
61. If you have difficulties or doubts creating the Twitter Application and generating the access tokens, you can view this [simple tutorial](https://www.youtube.com/watch?v=M_gGUqhCJoU).
62.
63. # **RStudio Set Up: twitteR and streamR packages**
64.
65. Install the following required packages:
66.
67. - [**ROAuth**](https://cran.r-project.org/web/packages/ROAuth/index.html). Provides an interface to the OAuth 1.0 specification allowing users to authenticate via OAuth to the
68. server of their choice.
69.
70. - [**twitteR**](https://cran.r-project.org/web/packages/twitteR/twitteR.pdf). Provides access to the Twitter API. Most functionality of the API is supported, with a bias towards
71. API calls that are more useful in data analysis as opposed to daily interaction.
72.
73. - [**streamR**](https://cran.r-project.org/web/packages/streamR/index.html). Access to Twitter Streaming API via R. Functions to access Twitter's filter, sample, and user streams,
74. and to parse the output into data frames.
75.
76. ```{r eval=FALSE}
77. # Install required packages
78. install.packages("ROAuth")
79. install.packages("twitteR")
80. install.packages("streamR")
81. ```
82.
83. Once you have installed the packages, run the following script with the API keys and access tokens as input parameters.
84.
85. ```{r eval=FALSE}
86. # Load packages
87. library(twitteR)
88. library(ROAuth)
89.
90. # Parameters configuration
91. reqURL <- "https://api.twitter.com/oauth/request_token"
92. accessURL <- "https://api.twitter.com/oauth/access_token"
93. authURL <- "https://api.twitter.com/oauth/authorize"
94.
95. options(httr_oauth_cache=TRUE)
96.
97. # Keys and tokens
98. consumer_key <- ""
99. consumer_secret <- ""
100. access_token <- ""
101. access_secret <- ""
102.
103. # twitteR authentication
104. setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
105.
106. # streamR authentication
107. credentials_file <- "my_oauth.Rdata"
108. if (file.exists(credentials_file)) {
109.   load(credentials_file)
110. } else {
111.   cred <- OAuthFactory$new(consumerKey=consumer_key, consumerSecret=consumer_secret, requestURL=reqURL, accessURL=accessURL, authURL=authURL)
112.   cred$handshake(cainfo=system.file("CurlSSL", "cacert.pem", package="RCurl"))
113.   save(cred, file=credentials_file)
114. }
115. ```
116.
117. We use the [`setup_twitter_oauth()`](https://www.rdocumentation.org/packages/twitteR/versions/1.1.9/topics/setup_twitter_oauth) function to set up our authentication.
118. This function takes in the four Twitter credentials that we have generated from the API.
119.
120. <center>
121.
122. | **Arguments** | **Explanation** |
123. |:--------------------|:--------------------------------------------|
124. |**consumer_key** | The consumer key supplied by Twitter |
125. |**consumer_secret** | The consumer secret supplied by Twitter |
126. |**access_token** | The access token supplied by Twitter |
127. |**access_secret** | The access secret supplied by Twitter |
128.
129. </center>
130.
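Hard-coding the four credentials in a script makes it easy to leak them when the file is shared. A minimal sketch of one alternative, assuming the values are stored in environment variables (for example in `~/.Renviron`). The variable names `TWITTER_CONSUMER_KEY`, etc. and the helper `read_twitter_credentials()` are hypothetical and not part of twitteR:

```r
# Hypothetical helper: read the four credentials from environment variables
# instead of hard-coding them. The variable names are an assumption.
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  creds <- Sys.getenv(unname(vars))   # unset variables come back as ""
  names(creds) <- names(vars)
  missing <- names(creds)[!nzchar(creds)]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }
  creds
}
```

You could then pass `creds[["consumer_key"]]` and the other three values of `creds <- read_twitter_credentials()` to `setup_twitter_oauth()`.
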
131. We are ready to capture some data!
132.
133. # **Capturing Twitter data: twitteR and streamR packages**
134.
135. There are different ways to obtain Twitter data. Two of the main ones are the APIs called REST and Streaming:
136.
137. - **REST API**. Returns tweets that match the search criteria. This search API searches against a sampling of recent Tweets published in the past 7 days. You can use the
138. [`searchTwitter()`](https://www.rdocumentation.org/packages/twitteR/versions/1.1.9/topics/searchTwitter) R function from the twitteR package.
139.
140. - **Streaming API**. Opens a connection to Twitter's Streaming API that will return public statuses that match one or more filter predicates. In other words, with this API you can capture
141. Tweets in real time. Tweets can be filtered by keywords, users, language, and location.
142. You can use the [`filterStream()`](https://www.rdocumentation.org/packages/streamR/versions/0.4.5/topics/filterStream) R function from the streamR package.
143.
144. Let's view some examples.
145.
146. ## REST API examples
147.
148. In this first example, the function returns up to 20 of the most recent Spanish-language tweets containing the hashtag #Obama.
149.
150. ```{r eval=FALSE}
151. # Load library
152. library(twitteR)
153.
154. # Capturing Twitter data
155. tweets <- searchTwitter("#Obama", n=20, lang="es")
156. ```
157.
158. In the following example we obtain up to 200 of the most recent tweets containing the keyword "kaggle".
159.
160. ```{r eval=FALSE}
161. # Load library
162. library(twitteR)
163.
164. # Capturing Twitter data
165. tweets <- searchTwitter("kaggle", n=200)
166. ```
167.
168. You can use other parameters to further filter the results. For instance, you can search for Tweets between two dates:
169.
170. ```{r eval=FALSE}
171. # Load library
172. library(twitteR)
173.
174. # Capturing Twitter data
175. tweets <- searchTwitter("kaggle", since='2019-05-09', until='2019-05-10')
176. ```
177.
178. Keep in mind that the search index has a 7-day limit!
179.
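Because of the 7-day limit, a fixed `since` date stops returning results once it falls out of the search index. A minimal sketch that computes the date relative to today and converts the result to a data frame with twitteR's `twListToDF()`. The helpers `since_date()` and `search_recent()` and the parameter `days_back` are hypothetical names introduced here, and `search_recent()` assumes `setup_twitter_oauth()` has already been called:

```r
# Hypothetical helper: a `since` date that stays inside the 7-day window.
since_date <- function(days_back = 6, today = Sys.Date()) {
  stopifnot(days_back >= 0, days_back <= 7)
  format(today - days_back)  # Dates format as "YYYY-MM-DD" by default
}

# Hypothetical wrapper: search recent tweets and return them as a data frame.
# Needs the twitteR package and a prior call to setup_twitter_oauth().
search_recent <- function(query, n = 100, days_back = 6) {
  tweets <- twitteR::searchTwitter(query, n = n, since = since_date(days_back))
  twitteR::twListToDF(tweets)  # one row per tweet
}
```

For example, `search_recent("kaggle", n = 200)` would look for up to 200 tweets from roughly the last six days.
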
180. ## Streaming API examples
181.
182. The following example captures Tweets in real time containing the hashtag #NBA for 60 seconds.
183.
184. ```{r eval=FALSE}
185. # Load library
186. library(streamR)
187.
188. # Connect to Twitter stream and get messages
189. filterStream("tweets.json", track="#NBA", timeout=60, oauth=cred)
190. ```
191.
192. This API provides the captured data encoded using JavaScript Object Notation (JSON). JSON is based on key-value pairs, with named attributes and associated values. These attributes
193. and their states are used to describe objects. If you want more information about Tweet JSON, check
194. [here](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json.html).
195.
196. You may want to load the JSON file into RStudio to perform some data analysis. For this purpose you can use two functions from the `streamR` package:
197.
198. - [`parseTweets()`](https://www.rdocumentation.org/packages/streamR/versions/0.4.5/topics/parseTweets). Parses tweets downloaded using the `filterStream()`, `sampleStream()`
199. or `userStream()` functions and returns a **data frame** where each row corresponds to one tweet and each column represents a different field for each
200. tweet (id, text, created_at, etc.).
201.
202. - [`readTweets()`](https://www.rdocumentation.org/packages/streamR/versions/0.4.5/topics/readTweets). This function parses tweets downloaded using `filterStream()`, `sampleStream()` or
203. `userStream()` and returns a **list**.
204.
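The two functions above can be sketched together as follows. This is only an illustration: the helper `load_stream_file()` is a hypothetical name, and it assumes the streamR package is installed and that `tweets.json` was written by `filterStream()` as in the earlier example.

```r
# Hypothetical helper: load a file written by filterStream() in both forms.
load_stream_file <- function(path = "tweets.json") {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  list(
    df  = streamR::parseTweets(path, verbose = FALSE),  # data frame, one row per tweet
    raw = streamR::readTweets(path, verbose = FALSE)    # list, one element per tweet
  )
}
```

After `captured <- load_stream_file("tweets.json")`, `head(captured$df$text)` would show the text of the first tweets, while `captured$raw[[1]]` keeps the full nested structure of the first tweet.
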
205. # **Another approach: rtweet package**
206.
207. <center><img
208. src="https://i.imgur.com/1un9dg8.png">
209. </center>
210.
211. There are several R packages for interacting with Twitter’s APIs. In this section we are going to discover the [rtweet](https://cran.r-project.org/web/packages/rtweet/index.html) package,
212. with which we can capture Twitter data more easily than with the previous ones. In fact, using this package it's no longer necessary to obtain a developer account and create your own
213. Twitter application. All you need is a Twitter account! See how rtweet compares to twitteR and streamR in the chart below.
214.
215. <center><img
216. src="https://i.imgur.com/Xctc7zH.png">
217. </center>
218.
219. <div style="text-align: right"> **Reference**: https://rtweet.info/ </div>
220.
221. Not bad! Let's see some examples.
222.
223. ## REST API examples
224.
225. Search for up to 15,000 tweets containing the #rstats hashtag.
226.
227. ```{r eval=FALSE}
228. # Load library
229. library(rtweet)
230.
231. # Capturing Twitter data
232. tweets <- search_tweets("#rstats", n=15000)
233. ```
234.
235. In this second example, we are going to search for 10,000 English-language tweets (excluding retweets) sent from the US.
236.
237. ```{r eval=FALSE}
238. # Load library
239. library(rtweet)
240.
241. # Capturing Twitter data
242. tweets <- search_tweets("lang:en", geocode=lookup_coords("usa"), n=10000, include_rts=FALSE)
243. ```
244.
245. The [`search_tweets()`](https://rtweet.info/reference/search_tweets.html) function returns a data frame where each observation (row) is a different tweet.
246.
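Since the search index only covers recent tweets, you may want to keep a copy of each data frame you capture. A minimal sketch using base R only; the helper `save_capture()` and its arguments are hypothetical names, and it works on any data frame, not just one returned by `search_tweets()`:

```r
# Hypothetical helper: save a captured data frame under a timestamped name.
save_capture <- function(tweets, dir = ".", prefix = "tweets") {
  stopifnot(is.data.frame(tweets))
  stamp <- format(Sys.time(), "%Y%m%d-%H%M%S")
  path <- file.path(dir, paste0(prefix, "-", stamp, ".rds"))
  saveRDS(tweets, path)  # RDS keeps column types intact, including list-columns
  invisible(path)
}
```

A saved capture can be restored later with `readRDS()`, for example `old_tweets <- readRDS(path)`.
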
247. ## Streaming API examples
248.
249. Stream all geo-enabled tweets from London for 60 seconds.
250.
251. ```{r eval=FALSE}
252. # Load library
253. library(rtweet)
254.
255. # Capturing Twitter data
256. tweets <- stream_tweets(lookup_coords("london, uk"), timeout=60)
257. ```
258. Stream all tweets mentioning "cats" for 60 seconds.
259.
260. ```{r eval=FALSE}
261. # Load library
262. library(rtweet)
263.
264. # Capturing Twitter data
265. tweets <- stream_tweets("cats", timeout=60)
266. ```
267.
268. The [`stream_tweets()`](https://rtweet.info/reference/stream_tweets.html) function returns the tweet data as a data frame, with the users data as an attribute.
269.
270. ## Other interesting functions
271.
272. * [`get_friends()`](https://rtweet.info/reference/get_friends.html). Returns a list of user IDs for the accounts followed by one or more specified users.
273.
274. * [`get_followers()`](https://rtweet.info/reference/get_followers.html). Returns a list of user IDs for the accounts following the specified user.
275.
276. * [`get_timelines()`](https://rtweet.info/reference/get_timeline.html). Returns up to 3,200 statuses posted to the timelines of each of one or more specified Twitter users.
277.
278. * [`get_favorites()`](https://rtweet.info/reference/get_favorites.html). Returns up to 3,000 statuses favorited by each of one or more specific Twitter users.
279.
280. * [`get_trends()`](https://rtweet.info/reference/get_trends.html). Get Twitter trends data.
281.
282. * [`get_mentions()`](https://www.rdocumentation.org/packages/rtweet/versions/0.6.9/topics/get_mentions). Returns data on up to 200 of the most recent mentions of the authenticating user.
283.
284. * [`get_retweets()`](https://www.rdocumentation.org/packages/rtweet/versions/0.6.9/topics/get_retweets). Returns a collection of the 100 most recent retweets of a given status.
285.
286. You can check all the other rtweet functions [here](https://cran.r-project.org/web/packages/rtweet/rtweet.pdf).
287.
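A few of the functions listed above can be combined into a small sketch. The wrapper `user_snapshot()` is a hypothetical name, and the code assumes rtweet is installed and authenticated; argument names and return columns may differ between rtweet versions, so treat this as an illustration rather than a reference:

```r
# Hypothetical wrapper: collect basic data about one account.
# Needs the rtweet package and a valid token; nothing is fetched until it is called.
user_snapshot <- function(user, n = 100) {
  list(
    timeline  = rtweet::get_timeline(user, n = n),   # most recent statuses
    followers = rtweet::get_followers(user, n = n),  # IDs of accounts following `user`
    friends   = rtweet::get_friends(user)            # IDs of accounts `user` follows
  )
}
```

For example, `snapshot <- user_snapshot("kaggle")` would return a list of three data frames for that account.
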
288. # **Examples of captured data sets**
289.
290. - [**Tweets during Real Madrid vs Liverpool (2018 UEFA Champions League Final)**](https://www.kaggle.com/xvivancos/tweets-during-r-madrid-vs-liverpool-ucl-2018). JSON file containing
291. Tweets captured during the 2018 UEFA Champions League Final between Real Madrid and Liverpool. I used the `filterStream()` function to open a connection to Twitter's Streaming API,
292. using the keyword #UCLFinal. The capture started on Saturday, May 26th 6:45 pm UTC (beginning of the match) and finished on Saturday, May 26th 8:45 pm UTC.
293.
294. <center><img
295. src="https://i.imgur.com/UH2yKBH.png">
296. </center>
297.
298. - [**Tweets during Nintendo E3 2018 Conference**](https://www.kaggle.com/xvivancos/tweets-during-nintendo-e3-2018-conference). JSON file containing Tweets captured during the Nintendo E3 2018
299. Conference. I used the `filterStream()` function to open a connection to Twitter's Streaming API, using the keywords #NintendoE3 and #NintendoDirect. The capture started on Tuesday, June 12th 04:00
300. am UTC and finished on Tuesday, June 12th 05:00 am UTC.
301.
302. <center><img
303. src="https://i.imgur.com/tFEPslS.png">
304. </center>
305.
306. - [**Tweets during Cavaliers vs Warriors (3rd game of the 2018 NBA Finals)**](https://www.kaggle.com/xvivancos/tweets-during-cavaliers-vs-warriors). JSON file containing
307. Tweets captured during the 3rd game of the 2018 NBA Finals between Cleveland Cavaliers and Golden State Warriors. I used the `filterStream()` function to open a connection to Twitter's Streaming API,
308. using the keyword #NBAFinals. The capture started on Thursday, June 7th 01:13 am UTC and finished on Thursday, June 7th 01:58 am UTC.
309.
310. <center><img
311. src="https://i.imgur.com/LEscSOo.png">
312. </center>
313.
314. # **Additional documentation**
315.
316. - [**Get started with the Twitter developer platform**](https://developer.twitter.com/en/docs/basics/getting-started)
317.
318. - [**Twitter developer apps**](https://developer.twitter.com/en/docs/basics/apps/overview)
319.
320. - [**Developer portal**](https://developer.twitter.com/en/docs/basics/developer-portal/overview)
321.
322. - [**Authentication**](https://developer.twitter.com/en/docs/basics/authentication/overview/oauth)
323.
324. - [**Tweet objects**](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json.html)
325.
326. - [**Rate limits**](https://developer.twitter.com/en/docs/basics/rate-limits)
327.
328. - [**Security**](https://developer.twitter.com/en/docs/basics/security-best-practices)
329.




