Access to Twitter APIs

Updated: Nov 1, 2021

---
title: "**Getting Data from Twitter**"
author: "NGiannakoulis"
date: '`r Sys.Date()`'
output:
  html_document:
    toc: yes
    number_sections: yes
    theme: cosmo
    highlight: tango
---

<center><img
src="https://i.imgur.com/bROptKf.png">
</center>

# **Introduction**

Hi! In this kernel we are going to learn, step by step, how to use the Twitter API to capture data using three different R packages: **twitteR**, **streamR** and **rtweet**.
Before starting the tutorial, take a look at the following basic prerequisites:

**1**. You have already installed [**R**](https://www.r-project.org/) and are using [**RStudio**](https://www.rstudio.com/).

**2**. You need a [**Twitter application**](https://apps.twitter.com/) and hence a [**Twitter account**](https://twitter.com/). Don't worry if you don't have a Twitter application:
in this kernel we are going to explain how to make one.

I hope that once you have read this tutorial you will be able to easily capture data from Twitter. So let's begin!

# **Create a Twitter Application**

Using the Twitter API requires an authorized Twitter App and authenticated requests. Let's start by creating the Twitter App.

**1**. Sign in using your Twitter account and open the following link: https://dev.twitter.com/apps

**2**. Click on the button "Create an app". The process consists of the following steps: user profile, account details, use case details, terms of service and email verification.

**3**. Select the user profile to associate. This @username will be the admin of this developer account. For example, in my case, the associated user profile is @Xavier91vg.

**4**. Select the option "I am requesting access for my own personal use" and add your account details (account name and primary country of operation).

**5**. Fill out the form about your project. Here you have to describe what you would like to build with Twitter's APIs (minimum 300 characters).

**6**. Read and agree to the Terms of Service.

**7**. To complete your application, check your inbox and confirm your email address.

**8**. Wait while the application is under review. You'll receive an email when the review is complete.

# **Generating access tokens**

Follow the steps below to generate access tokens for an existing Twitter app:

**1**. Log in to your Twitter account on developer.twitter.com.

**2**. Navigate to the Twitter app dashboard and open the Twitter app for which you would like to generate access tokens.

**3**. Navigate to the "Keys and Tokens" page.

**4**. Select "Create" under the "Access token & access token secret" section.

If you have difficulties or doubts about creating the Twitter Application and generating the access tokens, you can watch this [simple tutorial](https://www.youtube.com/watch?v=M_gGUqhCJoU).

# **RStudio Set Up: twitteR and streamR packages**

Install the following required packages:

- [**ROAuth**](https://cran.r-project.org/web/packages/ROAuth/index.html). Provides an interface to the OAuth 1.0 specification, allowing users to authenticate via OAuth to the
server of their choice.

- [**twitteR**](https://cran.r-project.org/web/packages/twitteR/twitteR.pdf). Provides access to the Twitter API. Most functionality of the API is supported, with a bias towards
API calls that are more useful in data analysis as opposed to daily interaction.

- [**streamR**](https://cran.r-project.org/web/packages/streamR/index.html). Provides access to the Twitter Streaming API via R, with functions to access Twitter's filter, sample, and user streams,
and to parse the output into data frames.

```{r eval=FALSE}
# Install required packages
install.packages("ROAuth")
install.packages("twitteR")
install.packages("streamR")
```

Once you have installed the packages, run the following script, filling in your API keys and access tokens.

```{r eval=FALSE}
# Load packages
library(twitteR)
library(ROAuth)

# Parameters configuration
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"

# Cache the OAuth token in a local file
options(httr_oauth_cache = TRUE)

# Keys and tokens (paste the values from the "Keys and Tokens" page)
consumer_key <- ""
consumer_secret <- ""
access_token <- ""
access_secret <- ""

# twitteR authentication
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

# streamR authentication: reuse saved credentials if they exist,
# otherwise run the OAuth handshake once and save the result
credentials_file <- "my_oauth.Rdata"
if (file.exists(credentials_file)) {
  load(credentials_file)
} else {
  cred <- OAuthFactory$new(consumerKey = consumer_key, consumerSecret = consumer_secret,
                           requestURL = reqURL, accessURL = accessURL, authURL = authURL)
  cred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
  save(cred, file = credentials_file)
}
```

We use the [`setup_twitter_oauth()`](https://www.rdocumentation.org/packages/twitteR/versions/1.1.9/topics/setup_twitter_oauth) function to set up our authentication.
This function takes the four Twitter credentials that we generated for our Twitter app.

<center>

| **Arguments** | **Explanation** |
|:--------------------|:--------------------------------------------|
|**consumer_key** | The consumer key supplied by Twitter |
|**consumer_secret** | The consumer secret supplied by Twitter |
|**access_token** | The access token supplied by Twitter |
|**access_secret** | The access secret supplied by Twitter |

</center>
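
If you plan to share your script, you may prefer not to paste the four credentials directly into it. Here is a minimal sketch of one alternative, assuming the credentials are stored as environment variables (for example in your `~/.Renviron` file). The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `read_twitter_credentials()` are hypothetical choices for this illustration; they are not part of twitteR.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables instead of hard-coding them in the script.
# The variable names below are an assumption; use whatever names you set.
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  values <- Sys.getenv(vars, unset = "")

  # Stop early with a clear message if any credential is not set
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }

  # Return a list named after the setup_twitter_oauth() arguments
  as.list(setNames(unname(values), names(vars)))
}

# Usage (needs the variables to be set and the twitteR package loaded):
# creds <- read_twitter_credentials()
# setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
#                     creds$access_token, creds$access_secret)
```

The helper only reads and validates the values; the authentication itself is still done by `setup_twitter_oauth()`, exactly as in the script above.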

We are ready to capture some data!

# **Capturing Twitter data: twitteR and streamR packages**

There are different ways to obtain Twitter data. Two of the main ones are the REST and Streaming APIs:

- **REST API**. Returns any authorized tweets that match the search criteria. This search API searches against a sampling of recent Tweets published in the past 7 days. You can use the
[`searchTwitter()`](https://www.rdocumentation.org/packages/twitteR/versions/1.1.9/topics/searchTwitter) R function from the twitteR package.

- **Streaming API**. Opens a connection to Twitter's Streaming API that will return public statuses that match one or more filter predicates. In other words, with this API you can capture
Tweets in real time. Tweets can be filtered by keywords, users, language, and location.
You can use the [`filterStream()`](https://www.rdocumentation.org/packages/streamR/versions/0.4.5/topics/filterStream) R function from the streamR package.

Let's view some examples.

## REST API examples

In this first example, the function returns the last 20 Spanish tweets containing the hashtag #Obama.

```{r eval=FALSE}
# Load library
library(twitteR)

# Capturing Twitter data
tweets <- searchTwitter("#Obama", n=20, lang="es")
```

In the following example we obtain the last 200 tweets containing the keyword "kaggle".

```{r eval=FALSE}
# Load library
library(twitteR)

# Capturing Twitter data
tweets <- searchTwitter("kaggle", n=200)
```

You can use other parameters to further filter the results. For instance, you can search for Tweets between two dates:

```{r eval=FALSE}
# Load library
library(twitteR)

# Capturing Twitter data
tweets <- searchTwitter("kaggle", since='2019-05-09', until='2019-05-10')
```

Keep in mind that the search index has a 7-day limit!
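
Because of this limit, a date range that starts more than a week ago will not return the older tweets. The check can be sketched in base R before calling `searchTwitter()`. This is only an illustration: the helper name `within_search_window()` and the way the boundary day is counted are assumptions, not part of twitteR.

```r
# Hypothetical helper: check that a since/until range starts inside the
# search window (assumed here to be the last 7 days, counted from `today`).
within_search_window <- function(since, until, today = Sys.Date(), window_days = 7) {
  since <- as.Date(since)
  until <- as.Date(until)
  today <- as.Date(today)
  since <= until && since >= today - window_days
}

# The range from the example above, checked three days after it starts
within_search_window("2019-05-09", "2019-05-10", today = "2019-05-12")  # TRUE

# The same range checked a few weeks later falls outside the window
within_search_window("2019-05-09", "2019-05-10", today = "2019-06-01")  # FALSE
```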

## Streaming API examples

The following example captures Tweets containing the hashtag #NBA in real time for 60 seconds.

```{r eval=FALSE}
# Load library
library(streamR)

# Connect to the Twitter stream and get messages
filterStream("tweets.json", track="#NBA", timeout=60, oauth=cred)
```

This API provides the captured data encoded in JavaScript Object Notation (JSON). JSON is based on key-value pairs, with named attributes and associated values. These attributes
and their state are used to describe objects. If you want more information about Tweet JSON, check
[here](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json.html).

You may want to load the JSON file into RStudio to perform some data analysis. For this purpose you can use two functions from the `streamR` package:

- [`parseTweets()`](https://www.rdocumentation.org/packages/streamR/versions/0.4.5/topics/parseTweets). Parses tweets downloaded using the `filterStream()`, `sampleStream()`
or `userStream()` functions and returns a **data frame** where each row corresponds to one tweet and each column represents a different field of the
tweet (id, text, created_at, etc.).

- [`readTweets()`](https://www.rdocumentation.org/packages/streamR/versions/0.4.5/topics/readTweets). Parses tweets downloaded using `filterStream()`, `sampleStream()` or
`userStream()` and returns a **list**.
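
As a small illustration of the first option, the capture file from the #NBA example can be loaded with `parseTweets()`. This is a sketch under a few assumptions: the wrapper `load_captured_tweets()` is a hypothetical helper (not part of streamR), and it simply returns `NULL` when the file or the package is not available, so that the snippet does not fail on a fresh machine.

```r
# Hypothetical helper: load a capture file written by filterStream() as a
# data frame, or return NULL when the file or streamR is not available.
load_captured_tweets <- function(path = "tweets.json") {
  if (!file.exists(path)) {
    message("No capture file found at: ", path)
    return(NULL)
  }
  if (!requireNamespace("streamR", quietly = TRUE)) {
    message("Package 'streamR' is not installed")
    return(NULL)
  }
  # parseTweets() returns a data frame with one row per tweet
  streamR::parseTweets(path, verbose = FALSE)
}

tweets_df <- load_captured_tweets("tweets.json")
if (!is.null(tweets_df)) {
  print(nrow(tweets_df))       # number of captured tweets
  print(head(tweets_df$text))  # text of the first few tweets
}
```

If you need the full nested structure of each tweet rather than a flat table, `readTweets()` can be called on the same file instead.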

# **Another approach: rtweet package**

<center><img
src="https://i.imgur.com/1un9dg8.png">
</center>

There are several R packages for interacting with Twitter’s APIs. In this section we are going to discover the [rtweet](https://cran.r-project.org/web/packages/rtweet/index.html) package,
with which we can capture Twitter data more easily than with the previous ones. In fact, with this package it's no longer necessary to obtain a developer account and create your own
Twitter application. All you need is a Twitter account! See how rtweet compares to twitteR and streamR in the chart below.

<center><img
src="https://i.imgur.com/Xctc7zH.png">
</center>

<div style="text-align: right"> **Reference**: https://rtweet.info/ </div>

Not bad! Let's see some examples.

## REST API examples

Search for up to 15,000 tweets containing the #rstats hashtag.

```{r eval=FALSE}
# Load library
library(rtweet)

# Capturing Twitter data
tweets <- search_tweets("#rstats", n=15000)
```

In this second example, we are going to search for 10,000 tweets (excluding retweets) in English sent from the US.

```{r eval=FALSE}
# Load library
library(rtweet)

# Capturing Twitter data
tweets <- search_tweets("lang:en", geocode=lookup_coords("usa"), n=10000, include_rts=FALSE)
```

The [`search_tweets()`](https://rtweet.info/reference/search_tweets.html) function returns a data frame where each observation (row) is a different tweet.

## Streaming API examples

Stream all geo-enabled tweets from London for 60 seconds.

```{r eval=FALSE}
# Load library
library(rtweet)

# Capturing Twitter data
tweets <- stream_tweets(lookup_coords("london, uk"), timeout=60)
```

Stream all tweets mentioning "cats" for 60 seconds.

```{r eval=FALSE}
# Load library
library(rtweet)

# Capturing Twitter data
tweets <- stream_tweets("cats", timeout=60)
```

The [`stream_tweets()`](https://rtweet.info/reference/stream_tweets.html) function returns the tweets as a data frame, with the users data as an attribute.

## Other interesting functions

* [`get_friends()`](https://rtweet.info/reference/get_friends.html). Returns a list of user IDs for the accounts followed by one or more specified users.

* [`get_followers()`](https://rtweet.info/reference/get_followers.html). Returns a list of user IDs for the accounts following the specified user.

* [`get_timelines()`](https://rtweet.info/reference/get_timeline.html). Returns up to 3,200 statuses posted to the timelines of each of one or more specified Twitter users.

* [`get_favorites()`](https://rtweet.info/reference/get_favorites.html). Returns up to 3,000 statuses favorited by each of one or more specified Twitter users.

* [`get_trends()`](https://rtweet.info/reference/get_trends.html). Gets Twitter trends data.

* [`get_mentions()`](https://www.rdocumentation.org/packages/rtweet/versions/0.6.9/topics/get_mentions). Returns data on up to 200 of the most recent mentions of the authenticating user.

* [`get_retweets()`](https://www.rdocumentation.org/packages/rtweet/versions/0.6.9/topics/get_retweets). Returns a collection of the 100 most recent retweets of a given status.

You can check all the other rtweet functions [here](https://cran.r-project.org/web/packages/rtweet/rtweet.pdf).

# **Examples of captured data sets**

- [**Tweets during Real Madrid vs Liverpool (2018 UEFA Champions League Final)**](https://www.kaggle.com/xvivancos/tweets-during-r-madrid-vs-liverpool-ucl-2018). JSON file containing
Tweets captured during the 2018 UEFA Champions League Final between Real Madrid and Liverpool. I used the `filterStream()` function to open a connection to Twitter's Streaming API,
using the keyword #UCLFinal. The capture started on Saturday, May 26th 6:45 pm UTC (beginning of the match) and finished on Saturday, May 26th 8:45 pm UTC.

<center><img
src="https://i.imgur.com/UH2yKBH.png">
</center>

- [**Tweets during Nintendo E3 2018 Conference**](https://www.kaggle.com/xvivancos/tweets-during-nintendo-e3-2018-conference). JSON file containing Tweets captured during the Nintendo E3 2018
Conference. I used the `filterStream()` function to open a connection to Twitter's Streaming API, using the keywords #NintendoE3 and #NintendoDirect. The capture started on Tuesday, June 12th 04:00 am UTC
and finished on Tuesday, June 12th 05:00 am UTC.

<center><img
src="https://i.imgur.com/tFEPslS.png">
</center>

- [**Tweets during Cavaliers vs Warriors (3rd game of the 2018 NBA Finals)**](https://www.kaggle.com/xvivancos/tweets-during-cavaliers-vs-warriors). JSON file containing
Tweets captured during the 3rd game of the 2018 NBA Finals between the Cleveland Cavaliers and the Golden State Warriors. I used the `filterStream()` function to open a connection to Twitter's Streaming API,
using the keyword #NBAFinals. The capture started on Thursday, June 7th 01:13 am UTC and finished on Thursday, June 7th 01:58 am UTC.

<center><img
src="https://i.imgur.com/LEscSOo.png">
</center>

# **Additional documentation**

- [**Get started with the Twitter developer platform**](https://developer.twitter.com/en/docs/basics/getting-started)

- [**Twitter developer apps**](https://developer.twitter.com/en/docs/basics/apps/overview)

- [**Developer portal**](https://developer.twitter.com/en/docs/basics/developer-portal/overview)

- [**Authentication**](https://developer.twitter.com/en/docs/basics/authentication/overview/oauth)

- [**Tweet objects**](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json.html)

- [**Rate limits**](https://developer.twitter.com/en/docs/basics/rate-limits)

- [**Security**](https://developer.twitter.com/en/docs/basics/security-best-practices)



©2019 by  NGiannakoulis 
