Read CSV File Hosted on Google Drive

Pandas: How to read CSV file from google drive public?

Using pandas

import pandas as pd

url='https://drive.google.com/file/d/0B6GhBwm5vaB2ekdlZW5WZnppb28/view?usp=sharing'
file_id=url.split('/')[-2]
dwn_url='https://drive.google.com/uc?id=' + file_id
df = pd.read_csv(dwn_url)
print(df.head())
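
The url.split('/')[-2] step above only works when the link has exactly the /file/d/<id>/view shape. A slightly more defensive version is sketched below. The helper name drive_download_url and the two link shapes it accepts are assumptions made for illustration; they are not part of the original answer.

```python
import re


def drive_download_url(share_url: str) -> str:
    """Build a direct-download URL from a Google Drive sharing link.

    Hypothetical helper: it accepts links of the form .../file/d/<id>/...
    and links that carry the ID as an id=<id> query parameter.
    """
    match = (re.search(r"/d/([A-Za-z0-9_-]+)", share_url)
             or re.search(r"[?&]id=([A-Za-z0-9_-]+)", share_url))
    if match is None:
        raise ValueError("no Drive file ID found in: " + share_url)
    return "https://drive.google.com/uc?export=download&id=" + match.group(1)


share = "https://drive.google.com/file/d/0B6GhBwm5vaB2ekdlZW5WZnppb28/view?usp=sharing"
print(drive_download_url(share))
```

The returned string can then be passed to pd.read_csv() in the same way as dwn_url above.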

Using pandas and requests

import pandas as pd
import requests
from io import StringIO

url='https://drive.google.com/file/d/0B6GhBwm5vaB2ekdlZW5WZnppb28/view?usp=sharing'

file_id = url.split('/')[-2]
dwn_url='https://drive.google.com/uc?export=download&id=' + file_id
url2 = requests.get(dwn_url).text
csv_raw = StringIO(url2)
df = pd.read_csv(csv_raw)
print(df.head())

Output:

      sex   age state  cheq_balance  savings_balance  credit_score  special_offer
0  Female  10.0    FL       7342.26          5482.87           774           True
1  Female  14.0    CA        870.39         11823.74           770           True
2    Male   0.0    TX       3282.34          8564.79           605           True
3  Female  37.0    TX       4645.99         12826.76           608           True
4    Male   NaN    FL           NaN          3493.08           551          False

Read csv file hosted on Google Drive

You could try it like this

id <- "0B-wuZ2XMFIBUd09Ob0pKVkRzQTA" # google file ID
read.csv(sprintf("https://docs.google.com/uc?id=%s&export=download", id))

Read CSV file from Google Drive or any cloud service with Python Pandas

Your data is UTF-16 encoded. You can read it by specifying the encoding:

pd.read_csv(dwn_url, encoding='utf16')

Result:

           email first_name     last_name
0            NaN        NaN           NaN
1  uno@gmail.com       Luca         Rossi
2  due@gmail.com     Daniel       Bianchi
3  tre@gmail.com    Gabriel  Domeneghetti
4  qua@gmail.com  Christian          Bona
5  cin@gmail.com     Simone      Marsango

(read_csv can directly read from a url, no need for requests and StringIO.)
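
If you don't know the encoding in advance, one option is to look at the first bytes of the file for a byte-order mark (BOM). The sketch below uses only the standard library. The function name sniff_bom_encoding and the sample data are made up for illustration, and the check only covers UTF-8 and UTF-16 BOMs, so treat it as a hint rather than a reliable detector.

```python
import codecs


def sniff_bom_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Guess an encoding name from a leading byte-order mark.

    Hypothetical helper. Only UTF-8 and UTF-16 BOMs are checked; a
    UTF-32 little-endian file would be misreported as UTF-16.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return default


# Made-up sample: Python's "utf-16" codec writes a BOM when encoding.
sample = "email,first_name\nuno@gmail.com,Luca\n".encode("utf-16")
encoding = sniff_bom_encoding(sample)
print(encoding)
print(sample.decode(encoding).splitlines()[0])
```

The name it returns can be passed as the encoding argument of pd.read_csv().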

Error reading csv with pandas from google drive url

Short answer: you can't pass the Google Drive URL to pd.read_csv(). You have to download the CSV file and use the actual path to it.

The Google Drive URL looks as if it points to a CSV file, but in reality it serves a web page (HTML content) with some information about the CSV file that Google is hosting. That is the <!DOCTYPE html>.... you see in the error.
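
A quick way to confirm this is to look at the start of the response body before giving it to pandas. The check below is a rough sketch: the function name looks_like_html and the sample strings are assumptions for illustration, and it only tests for an HTML doctype or an opening <html> tag.

```python
def looks_like_html(body: str) -> bool:
    """Return True if the text starts like an HTML page, not CSV data.

    Hypothetical helper; it is a heuristic, not a full content check.
    """
    head = body.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# Made-up samples standing in for two possible response bodies.
page = "<!DOCTYPE html><html><head><title>Google Drive</title></head></html>"
data = "sex,age,state\nFemale,10.0,FL\n"

print(looks_like_html(page))
print(looks_like_html(data))
```

If the check returns True, the URL is serving the preview page and not the file itself.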

Locally, this works because you use an actual file system path that pandas can read. If you want to do this with a remote file, you have to fetch the file first so that it's available in a local file system. In general, you can use a command such as wget or curl, but this is not straightforward with Google Drive because you need to be authenticated with your Google account to access the file. There are some ideas on how to do that here and here.

The best way to download a file in Python / Jupyter notebook is to use gdown. You can install it via pip and provide your URL and it will download it for you.

# install gdown in terminal
pip install gdown

# download your file
gdown 'https://drive.google.com/uc?id=1iE1nHPJvglklttBEqX92_Mfg6421CtMq'

Notice the URL that we're providing to gdown: it uses the uc?id=<file ID> form. Once the file is downloaded, read it with pandas:

import pandas as pd
pd.read_csv('/path/to/file.csv')

I created an example notebook for you in Deepnote. You can do the same in a local Python REPL, in VS Code, in a Jupyter notebook, or in Google Colab.

In Colab, there is also a special way to connect to Drive: mounting it. More on that here.

Get CSV from google drive and then load to pandas

I believe your goal and situation are as follows.

  • You want to download the CSV data from the CSV file on Google Drive.
  • You can get values from Google Spreadsheet using googleapis for python.

Pattern 1:

In this pattern, the CSV data is downloaded with googleapis and saved as a file. The data is retrieved with the "Files: get" method of Drive API v3.

Sample script:

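import io

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

file_id = "###"  # Please set the file ID of the CSV file.

# creds: your already-authorized credentials object.
service = build('drive', 'v3', credentials=creds)
request = service.files().get_media(fileId=file_id)
fh = io.FileIO("sample.csv", mode='wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while not done:
    # Fetch the next chunk; done becomes True after the last one.
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))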
  • In this case, the CSV data can be converted to the dataframe with df = pd.read_csv("sample.csv").

Pattern 2:

In this pattern, as a simpler method, the access token from creds is used directly. The downloaded CSV data is not saved as a file. The data is retrieved with the "Files: get" method of Drive API v3.

Sample script:

import requests

file_id = "###"  # Please set the file ID of the CSV file.

# creds: your already-authorized credentials object.
access_token = creds.token
url = "https://www.googleapis.com/drive/v3/files/" + file_id + "?alt=media"
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
print(res.text)
  • In this case, the CSV data can be directly converted to the dataframe with df = pd.read_csv(io.StringIO(res.text)).
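
If you prefer not to depend on requests, the same call can be prepared with the standard library. The sketch below swaps in urllib for requests and only builds the request object without sending it. The function name build_media_request and the placeholder values "FILE_ID" and "TOKEN" are assumptions for illustration.

```python
from urllib.parse import quote
from urllib.request import Request


def build_media_request(file_id: str, access_token: str) -> Request:
    """Prepare a Drive API v3 "Files: get" request with alt=media.

    Hypothetical helper: it builds the request but does not send it.
    """
    url = ("https://www.googleapis.com/drive/v3/files/"
           + quote(file_id, safe="") + "?alt=media")
    return Request(url, headers={"Authorization": "Bearer " + access_token})


# Placeholder values; use your real file ID and creds.token here.
req = build_media_request("FILE_ID", "TOKEN")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing req to urllib.request.urlopen() would send it. The response body is bytes, so decode it before wrapping it in io.StringIO for pd.read_csv().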

Note:

  • For the scripts above, please include the scope https://www.googleapis.com/auth/drive.readonly and/or https://www.googleapis.com/auth/drive. If you modify the scopes, please reauthorize so that the new scopes are included in the access token. Please be careful about this.

Reference:

  • Download files

Getting a csv read into R though a shareable google drive link

The answer was indicated in the post you linked. Namely,

id <- "0B5V8AyEFBTmXM1VIYUYxSG5tSjQ"
stuff <- read.csv(sprintf("https://docs.google.com/uc?id=%s&export=download", id))

