Importing Data into R from Google Spreadsheet

Importing data into R from google spreadsheet

Use the googlesheets4 package, a Google Sheets R API by Jenny Bryan. It is the best way to analyze and edit Google Sheets data in R. Not only can it pull data from Google Sheets, but you can edit the data in Google Sheets, create new sheets, etc.

The package can be installed with install.packages("googlesheets4").

There's a vignette for getting started; see her GitHub repository for more. And you also can install the latest development version of the package from that GitHub page, if desired.

Import Google sheet data directly into R

With the package gsheet you can read in a sheet from the shareable link.
Example given in the docs:

library(gsheet)
url <- 'docs.google.com/spreadsheets/d/1I9mJsS5QnXF2TNNntTy-HrcdHmIF9wJ8ONYvEJTXSNo'
a <- gsheet2tbl(url)

It's quite nice.

Import Google Spreadsheet with formula into R

All the GoogleSheets -> R packages export a csv from GoogleSheets, so it will only show the values.

The best way sounds like to copy-paste the forumlas of your Google Sheet into text page:

first do a find-replace of (equals)

= 

to (single quote-equals)

'= 

This will get rid of the formulas and change them back into strings, then copy-paste back into Google sheets or R.

Hacky :)

How to download all sheets in a google sheet in R

UPDATE 2021-01-31: updated code to use new functions that replaced sheets_find() and sheets_sheets() as of googlesheets4 version 0.2.0.

The googlesheets4 package includes a function to list all sheets associated with an account's Google Drive: sheets_find(). From the list of sheets one can use the sheet IDs to read the sheets into R.

library(googlesheets4)
sheets_auth()
theSheets <- gs4_find()
theSheets

My test account on Google has one Google sheet, a spreadsheet of Pokémon Stats.

> theSheets
# A tibble: 1 x 3
name id drive_resource
* <chr> <chr> <list>
1 PokemonStats 13rGxY7ScDUl7bFJ9NipO7QUafEACYTH4MagFjcj4pVw <named list [34]>

We can use the ID field to download the sheet.

pokemonData <- sheets_read(theSheets$id[1])
head(pokemonData)

> head(pokemonData)
# A tibble: 6 x 13
Number Name Type1 Type2 Total HP Attack Defense SpecialAtk SpecialDef Speed
<dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 Bulb… Grass Pois… 318 45 49 49 65 65 45
2 2 Ivys… Grass Pois… 405 60 62 63 80 80 60
3 3 Venu… Grass Pois… 525 80 82 83 100 100 80
4 3 Venu… Grass Pois… 625 80 100 123 122 120 80
5 4 Char… Fire NA 309 39 52 43 60 50 65
6 5 Char… Fire NA 405 58 64 58 80 65 80
# … with 2 more variables: Generation <dbl>, Legendary <lgl>
>

One could use the vector theSheets$id with lapply() to read a group of sheets from Google Drive as follows:

sheetList <- lapply(theSheets$id,sheet_read)

To read multiple worksheets within a Google Sheets spreadsheet, we add the sheet= argument to sheet_read(). Here we read the Pokémon Types from the second worksheet in the Pokémon Stats spreadsheet.

pokemonTypes <- sheets_read(theSheets$id[1],sheet = 2)
head(pokemonTypes)

...and the output:

> head(pokemonTypes)
# A tibble: 6 x 1
Type
<chr>
1 Fire
2 Grass
3 Poison
4 Water
5 Bug
6 Fighting
>

Reading all worksheets in a spreadsheet

We can automate the process of reading multiple tabs from a single spreadsheet. The sheets_sheets() function is useful for this purpose.

# technique where we read multiple worksheets by worksheet name
# using functions from googlesheets4 version 0.2.0.
theSheets <-gs4_find()
# get metadata from first sheet
sheetMetadata <- gs4_get(theSheets$id[1])
# get worksheet tab names
sheetNames <- sheet_names(theSheets$id[1])
sheetNames

At this point we can see that there are two worksheet tabs in the Pokémon Stats spreadsheet. We use the vector sheetNames with lapply() to read all the worksheets within the main spreadsheet.

theWorksheets <- lapply(sheetNames, function(x){             
sheets_read(theSheets$id[1],sheet = x)
})
# use the `names()` function to name the data frames stored in the list
names(theWorksheets) <- sheetNames
lapply(theWorksheets,head)

...and the output:

> lapply(theWorksheets,head)
$Pokemon
# A tibble: 6 x 13
Number Name Type1 Type2 Total HP Attack Defense SpecialAtk SpecialDef Speed
<dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 Bulb… Grass Pois… 318 45 49 49 65 65 45
2 2 Ivys… Grass Pois… 405 60 62 63 80 80 60
3 3 Venu… Grass Pois… 525 80 82 83 100 100 80
4 3 Venu… Grass Pois… 625 80 100 123 122 120 80
5 4 Char… Fire NA 309 39 52 43 60 50 65
6 5 Char… Fire NA 405 58 64 58 80 65 80
# … with 2 more variables: Generation <dbl>, Legendary <lgl>

$Metadata
# A tibble: 6 x 1
Type
<chr>
1 Fire
2 Grass
3 Poison
4 Water
5 Bug
6 Fighting

>

At this point individual worksheets can be accessed with the $ form of the extract operator, as theWorksheets$Pokemon or theWorksheets$Metadata.

How can I read a Google spreadsheet in a Team Drive in R?

This is currently a work in progress for a new reboot of the googlesheets package, called googlesheets4, which is not yet released on CRAN.

However, you can currently achieve what you want with a combination of the googlesheets package and the googledrive package, as you eluded to above.

First, you have to get the metadata for a particular spreadsheet using drive_get, and specify the team drive, either by name or ID or URL. Then, register the spreadsheet with any of the functions for this from the googlesheets package, i.e. gs_title, gs_key, or gs_url.

The workaround here is to set the arguments lookup = FALSE and visibility = "private" in gs_key.

I prefer to use team drive IDs and spreadsheets keys, as opposed to names. You can use team_drive_find to get the IDs of your team drives.

See below:

library(dplyr)   # For pipe operator, %>%
library(googledrive)
library(googlesheets)

data <-
drive_get(path = "example-googlesheet-name",
team_drive = as_id("0ABCDefGH2jK_Lm3NPA")) %>%
select(id) %>%
combine() %>%
gs_key(lookup = FALSE,
visibility = "private") %>%
gs_read_csv()

The above code reads in the example spreadsheet from a team drive, via the team drive's ID. From the metadata returned by drive_get, it selects the ID, combines it into a vector, and passes the ID to gs_key to register the spreadsheet. gs_read_csv then reads in the register spreadsheet and assigns the result to data.

See these Github issues for more information:

https://github.com/jennybc/googlesheets/issues/327

https://github.com/jennybc/googlesheets/pull/318



Related Topics



Leave a reply



Submit