File: json-apis.Rmd.orig

package info (click to toggle)
r-cran-jsonlite 1.9.1%2Bdfsg-1
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 1,340 kB
sloc: ansic: 3,792; sh: 9; makefile: 6
file content (140 lines) | stat: -rw-r--r-- 4,926 bytes
parent folder | download | duplicates (2)
---
title: "Fetching JSON data from REST APIs"
date: "`r Sys.Date()`"
output:
  html_document
vignette: >
  %\VignetteIndexEntry{Fetching JSON data from REST APIs}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r echo=FALSE}
library(knitr)
opts_chunk$set(comment="")

#this replaces tabs by spaces because latex-verbatim doesn't like tabs
#no longer needed with yajl
#toJSON <- function(...){
#  gsub("\t", "  ", jsonlite::toJSON(...), fixed=TRUE);
#}
```

This section lists some examples of public HTTP APIs that publish data in JSON format. These are great to get a sense of the complex structures that are encountered in real world JSON data. All services are free, but some require registration/authentication. Each example returns lots of data, therefore not all output is printed in this document.

```{r message=FALSE}
library(jsonlite)
```

## Github

Github is an online code repository and has APIs to get live data on almost all activity. Below some examples from a well known R package and author:

```{r}
hadley_orgs <- fromJSON("https://api.github.com/users/hadley/orgs")
hadley_repos <- fromJSON("https://api.github.com/users/hadley/repos")
gg_commits <- fromJSON("https://api.github.com/repos/hadley/ggplot2/commits")
gg_issues <- fromJSON("https://api.github.com/repos/hadley/ggplot2/issues")

#latest issues
paste(format(gg_issues$user$login), ":", gg_issues$title)
```

## CitiBike NYC

A single public API that shows location, status and current availability for all stations in the New York City bike sharing imitative.

```{r}
citibike <- fromJSON("https://gbfs.citibikenyc.com/gbfs/en/station_information.json")
stations <- citibike$data$stations
colnames(stations)
nrow(stations)
```

## Ergast

The Ergast Developer API is an experimental web service which provides a historical record of motor racing data for non-commercial purposes.

```{r}
res <- fromJSON('http://ergast.com/api/f1/2004/1/results.json')
drivers <- res$MRData$RaceTable$Races$Results[[1]]$Driver
colnames(drivers)
drivers[1:10, c("givenName", "familyName", "code", "nationality")]
```


## ProPublica

Below an example from the [ProPublica Nonprofit Explorer API](https://projects.propublica.org/nonprofits/api) where we retrieve the first 10 pages of tax-exempt organizations in the USA, ordered by revenue. The `rbind_pages` function is used to combine the pages into a single data frame.


```{r, message=FALSE}
#store all pages in a list first
baseurl <- "https://projects.propublica.org/nonprofits/api/v2/search.json?order=revenue&sort_order=desc"
pages <- list()
for(i in 0:10){
  mydata <- fromJSON(paste0(baseurl, "&page=", i), flatten=TRUE)
  message("Retrieving page ", i)
  pages[[i+1]] <- mydata$organizations
}

#combine all into one
organizations <- rbind_pages(pages)

#check output
nrow(organizations)
organizations[1:10, c("name", "city", "strein")]
```


## New York Times

The New York Times has several APIs as part of the NYT developer network. These interface to data from various departments, such as news articles, book reviews, real estate, etc. Registration is required (but free) and a key can be obtained at [here](http://developer.nytimes.com/signup). The code below includes some example keys for illustration purposes.

```{r}
#search for articles
article_key <- "&api-key=b75da00e12d54774a2d362adddcc9bef"
url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=obamacare+socialism"
req <- fromJSON(paste0(url, article_key))
articles <- req$response$docs
colnames(articles)

#search for best sellers
books_key <- "&api-key=76363c9e70bc401bac1e6ad88b13bd1d"
url <- "http://api.nytimes.com/svc/books/v2/lists/overview.json?published_date=2013-01-01"
req <- fromJSON(paste0(url, books_key))
bestsellers <- req$results$list
category1 <- bestsellers[[1, "books"]]
subset(category1, select = c("author", "title", "publisher"))
```

## Twitter

The twitter API requires OAuth2 authentication. Some example code:

```{r}
#Create your own appication key at https://dev.twitter.com/apps
consumer_key = "EZRy5JzOH2QQmVAe9B4j2w";
consumer_secret = "OIDC4MdfZJ82nbwpZfoUO4WOLTYjoRhpHRAWj6JMec";

#Use basic auth
secret <- jsonlite::base64_enc(paste(consumer_key, consumer_secret, sep = ":"))
req <- httr::POST("https://api.twitter.com/oauth2/token",
  httr::add_headers(
    "Authorization" = paste("Basic", gsub("\n", "", secret)),
    "Content-Type" = "application/x-www-form-urlencoded;charset=UTF-8"
  ),
  body = "grant_type=client_credentials"
);

#Extract the access token
httr::stop_for_status(req, "authenticate with twitter")
token <- paste("Bearer", httr::content(req)$access_token)

#Actual API call
url <- "https://api.twitter.com/1.1/statuses/user_timeline.json?count=10&screen_name=Rbloggers"
req <- httr::GET(url, httr::add_headers(Authorization = token))
json <- httr::content(req, as = "text")
tweets <- fromJSON(json)
substring(tweets$text, 1, 100)
```