Importing data into R from google spreadsheet
Use the googlesheets4 package, a Google Sheets R API by Jenny Bryan. It is the best way to analyze and edit Google Sheets data in R. Not only can it pull data from Google Sheets, but you can edit the data in Google Sheets, create new sheets, etc.
The package can be installed with install.packages("googlesheets4")
.
There's a vignette for getting started; see her GitHub repository for more. And you also can install the latest development version of the package from that GitHub page, if desired.
Import Google sheet data directly into R
With the package gsheet
you can read in a sheet from the shareable link.
Example given in the docs:
library(gsheet)
url <- 'docs.google.com/spreadsheets/d/1I9mJsS5QnXF2TNNntTy-HrcdHmIF9wJ8ONYvEJTXSNo'
a <- gsheet2tbl(url)
It's quite nice.
Import Google Spreadsheet with formula into R
All the GoogleSheets -> R packages export a csv from GoogleSheets, so it will only show the values.
The best way sounds like to copy-paste the forumlas of your Google Sheet into text page:
first do a find-replace of (equals)
=
to (single quote-equals)
'=
This will get rid of the formulas and change them back into strings, then copy-paste back into Google sheets or R.
Hacky :)
How to download all sheets in a google sheet in R
UPDATE 2021-01-31: updated code to use new functions that replaced sheets_find()
and sheets_sheets()
as of googlesheets4
version 0.2.0.
The googlesheets4
package includes a function to list all sheets associated with an account's Google Drive: sheets_find()
. From the list of sheets one can use the sheet IDs to read the sheets into R.
library(googlesheets4)
sheets_auth()
theSheets <- gs4_find()
theSheets
My test account on Google has one Google sheet, a spreadsheet of Pokémon Stats.
> theSheets
# A tibble: 1 x 3
name id drive_resource
* <chr> <chr> <list>
1 PokemonStats 13rGxY7ScDUl7bFJ9NipO7QUafEACYTH4MagFjcj4pVw <named list [34]>
We can use the ID field to download the sheet.
pokemonData <- sheets_read(theSheets$id[1])
head(pokemonData)
> head(pokemonData)
# A tibble: 6 x 13
Number Name Type1 Type2 Total HP Attack Defense SpecialAtk SpecialDef Speed
<dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 Bulb… Grass Pois… 318 45 49 49 65 65 45
2 2 Ivys… Grass Pois… 405 60 62 63 80 80 60
3 3 Venu… Grass Pois… 525 80 82 83 100 100 80
4 3 Venu… Grass Pois… 625 80 100 123 122 120 80
5 4 Char… Fire NA 309 39 52 43 60 50 65
6 5 Char… Fire NA 405 58 64 58 80 65 80
# … with 2 more variables: Generation <dbl>, Legendary <lgl>
>
One could use the vector theSheets$id
with lapply()
to read a group of sheets from Google Drive as follows:
sheetList <- lapply(theSheets$id,sheet_read)
To read multiple worksheets within a Google Sheets spreadsheet, we add the sheet=
argument to sheet_read()
. Here we read the Pokémon Types from the second worksheet in the Pokémon Stats spreadsheet.
pokemonTypes <- sheets_read(theSheets$id[1],sheet = 2)
head(pokemonTypes)
...and the output:
> head(pokemonTypes)
# A tibble: 6 x 1
Type
<chr>
1 Fire
2 Grass
3 Poison
4 Water
5 Bug
6 Fighting
>
Reading all worksheets in a spreadsheet
We can automate the process of reading multiple tabs from a single spreadsheet. The sheets_sheets()
function is useful for this purpose.
# technique where we read multiple worksheets by worksheet name
# using functions from googlesheets4 version 0.2.0.
theSheets <-gs4_find()
# get metadata from first sheet
sheetMetadata <- gs4_get(theSheets$id[1])
# get worksheet tab names
sheetNames <- sheet_names(theSheets$id[1])
sheetNames
At this point we can see that there are two worksheet tabs in the Pokémon Stats spreadsheet. We use the vector sheetNames
with lapply()
to read all the worksheets within the main spreadsheet.
theWorksheets <- lapply(sheetNames, function(x){
sheets_read(theSheets$id[1],sheet = x)
})
# use the `names()` function to name the data frames stored in the list
names(theWorksheets) <- sheetNames
lapply(theWorksheets,head)
...and the output:
> lapply(theWorksheets,head)
$Pokemon
# A tibble: 6 x 13
Number Name Type1 Type2 Total HP Attack Defense SpecialAtk SpecialDef Speed
<dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 Bulb… Grass Pois… 318 45 49 49 65 65 45
2 2 Ivys… Grass Pois… 405 60 62 63 80 80 60
3 3 Venu… Grass Pois… 525 80 82 83 100 100 80
4 3 Venu… Grass Pois… 625 80 100 123 122 120 80
5 4 Char… Fire NA 309 39 52 43 60 50 65
6 5 Char… Fire NA 405 58 64 58 80 65 80
# … with 2 more variables: Generation <dbl>, Legendary <lgl>
$Metadata
# A tibble: 6 x 1
Type
<chr>
1 Fire
2 Grass
3 Poison
4 Water
5 Bug
6 Fighting
>
At this point individual worksheets can be accessed with the $
form of the extract operator, as theWorksheets$Pokemon
or theWorksheets$Metadata
.
How can I read a Google spreadsheet in a Team Drive in R?
This is currently a work in progress for a new reboot of the googlesheets package, called googlesheets4, which is not yet released on CRAN.
However, you can currently achieve what you want with a combination of the googlesheets package and the googledrive package, as you eluded to above.
First, you have to get the metadata for a particular spreadsheet using drive_get
, and specify the team drive, either by name or ID or URL. Then, register the spreadsheet with any of the functions for this from the googlesheets package, i.e. gs_title
, gs_key
, or gs_url
.
The workaround here is to set the arguments lookup = FALSE
and visibility = "private"
in gs_key
.
I prefer to use team drive IDs and spreadsheets keys, as opposed to names. You can use team_drive_find
to get the IDs of your team drives.
See below:
library(dplyr) # For pipe operator, %>%
library(googledrive)
library(googlesheets)
data <-
drive_get(path = "example-googlesheet-name",
team_drive = as_id("0ABCDefGH2jK_Lm3NPA")) %>%
select(id) %>%
combine() %>%
gs_key(lookup = FALSE,
visibility = "private") %>%
gs_read_csv()
The above code reads in the example spreadsheet from a team drive, via the team drive's ID. From the metadata returned by drive_get
, it selects the ID, combines it into a vector, and passes the ID to gs_key
to register the spreadsheet. gs_read_csv
then reads in the register spreadsheet and assigns the result to data
.
See these Github issues for more information:
https://github.com/jennybc/googlesheets/issues/327
https://github.com/jennybc/googlesheets/pull/318
Related Topics
What Is a Good Way to Read Line-By-Line in R
How to Have Conditional Markdown Chunk Execution in Rmarkdown
How to Break Out of a Foreach Loop
Passing List of Named Parameters to Function
Implementation of Parallel Coordinates
One-Class Classification with Svm in R
How to Change Font Family in a Legend in an R-Plot
Names of R's Available Packages
Using Lapply and Read.CSV on Multiple Files (In R)
Plotting Normal Curve Over Histogram Using Ggplot2: Code Produces Straight Line at 0
Recommended Package for Very Large Dataset Processing and MAChine Learning in R
How to Change X-Axis Tick Label Names, Order and Boxplot Colour Using R Ggplot
Force Ggplot2 Scatter Plot to Be Square Shaped
Population Pyramid Density Plot in R
How to Suppress Automatic Table Name and Number in an .Rmd File Using Xtable or Knitr::Kable