pandas read_html ValueError: No tables found
You can use requests and avoid opening a browser.
You can get the current conditions by requesting:
https://stationdata.wunderground.com/cgi-bin/stationlookup?station=KMAHADLE7&units=both&v=2.0&format=json&callback=jQuery1720724027235122559_1542743885014&_=15
and stripping 'jQuery1720724027235122559_1542743885014(' from the left and ')' from the right of the response. Then parse the remaining JSON string.
You can get the summary and history by calling the API at:
https://api-ak.wunderground.com/api/606f3f6977348613/history_20170201null/units:both/v:2.0/q/pws:KMAHADLE7.json?callback=jQuery1720724027235122559_1542743885015&_=1542743886276
You then need to strip 'jQuery1720724027235122559_1542743885015(' from the front and ');' from the end. You then have a JSON string you can parse.
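Both responses are JSONP, i.e. JSON wrapped in a callback call. A small helper can remove any such wrapper without hard-coding the callback name (a minimal sketch, assuming the payload is a single JSON object wrapped as `callback({...})` or `callback({...});`; the sample string below is a dummy in the same shape as the Wunderground responses, not real station data):

```python
import json

def strip_jsonp(text: str) -> dict:
    """Remove a JSONP wrapper like 'callbackName({...});' and parse the JSON.

    Assumes a single JSON object wrapped in one callback call.
    """
    start = text.index('(') + 1   # first '(' opens the callback call
    end = text.rindex(')')        # last ')' closes it (ignores a trailing ';')
    return json.loads(text[start:end])

# Dummy wrapper in the same shape as the Wunderground responses:
sample = 'jQuery1720724027235122559_1542743885015({"stations": {"KMAHADLE7": {"temperature": 68}}});'
data = strip_jsonp(sample)
print(data["stations"]["KMAHADLE7"]["temperature"])  # 68
```

The same helper works for both the current-conditions and the history endpoints, since only the callback name differs.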
You can find these URLs by using the F12 dev tools in your browser and inspecting the network tab for the traffic created during page load.
An example for the current conditions (note there seems to be a problem with null values in the JSON, so I am replacing them with "placeholder"):
import json
import requests
from pandas.io.json import json_normalize

url = 'https://stationdata.wunderground.com/cgi-bin/stationlookup?station=KMAHADLE7&units=both&v=2.0&format=json&callback=jQuery1720724027235122559_1542743885014&_=15'
res = requests.get(url)
# Remove the JSONP callback wrapper to leave plain JSON
s = res.text.strip()
s = s[s.index('(') + 1:s.rindex(')')]
# Work around bare nulls in the response
s = s.replace('null', '"placeholder"')
data = json.loads(s)
df = json_normalize(data)  # json_normalize already returns a DataFrame
print(df)
Python Pandas - read_html No tables Found
There's no table, but you're in luck because the data comes from a fetch:
https://datacrunch.9c9media.ca/statsapi/sports/hockey/leagues/nhl/sortablePlayerSeasonStats/skater?brand=tsn&type=json&seasonType=regularSeason&season=2021
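Since that endpoint returns plain JSON, it can go straight into pandas without read_html. A sketch (the live call is commented out because the endpoint layout and season parameter are assumptions that may have changed; `records_to_frame` and the dummy records are names introduced here for illustration):

```python
import pandas as pd

def records_to_frame(records):
    """Turn a list of per-player JSON records into a DataFrame."""
    return pd.DataFrame(records)

# Live usage (uncomment; requires network access and the endpoint still serving JSON):
# import requests
# url = ("https://datacrunch.9c9media.ca/statsapi/sports/hockey/leagues/nhl/"
#        "sortablePlayerSeasonStats/skater?brand=tsn&type=json"
#        "&seasonType=regularSeason&season=2021")
# df = records_to_frame(requests.get(url).json())

# Offline demonstration with dummy records in the assumed shape:
df = records_to_frame([
    {"playerName": "Player A", "goals": 5, "assists": 7},
    {"playerName": "Player B", "goals": 2, "assists": 9},
])
print(df.shape)  # (2, 3)
```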
pandas read_html - no tables found
The page is dynamic, which means you'll need to render it first. So you would need to use something like Selenium to render the page, then you can pull the table using pandas .read_html():
from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get("https://www.wunderground.com/history/daily/us/wi/milwaukee/KMKE/date/2013-6-26")

# Grab the rendered HTML, then let pandas find the tables in it
html = driver.page_source
tables = pd.read_html(html)
data = tables[1]  # the observations table
driver.quit()
Output:
print(data)
Time Temperature ... Precip Accum Condition
0 6:52 PM 68 F ... 0.0 in Mostly Cloudy
1 7:52 PM 69 F ... 0.0 in Mostly Cloudy
2 8:52 PM 70 F ... 0.0 in Mostly Cloudy
3 9:52 PM 67 F ... 0.0 in Cloudy
4 10:52 PM 65 F ... 0.0 in Partly Cloudy
5 11:42 PM 66 F ... 0.0 in Mostly Cloudy
6 11:52 PM 68 F ... 0.0 in Mostly Cloudy
7 12:08 AM 68 F ... 0.0 in Cloudy
8 12:52 AM 68 F ... 0.0 in Mostly Cloudy
9 1:52 AM 70 F ... 0.0 in Cloudy
10 2:13 AM 70 F ... 0.0 in Cloudy
11 2:52 AM 71 F ... 0.0 in Cloudy
12 3:52 AM 70 F ... 0.0 in Mostly Cloudy
13 4:19 AM 70 F ... 0.0 in Cloudy
14 4:29 AM 70 F ... 0.0 in Cloudy
15 4:52 AM 70 F ... 0.0 in Cloudy
16 5:25 AM 70 F ... 0.0 in Mostly Cloudy
17 5:52 AM 71 F ... 0.0 in Cloudy
18 6:52 AM 73 F ... 0.0 in Cloudy
19 7:52 AM 74 F ... 0.0 in Cloudy
20 8:52 AM 73 F ... 0.0 in Cloudy
21 9:52 AM 71 F ... 0.0 in Cloudy
22 10:52 AM 71 F ... 0.0 in Cloudy
23 11:52 AM 70 F ... 0.0 in Cloudy
24 12:52 PM 72 F ... 0.0 in Mostly Cloudy
25 1:52 PM 70 F ... 0.0 in Mostly Cloudy
26 2:52 PM 71 F ... 0.0 in Mostly Cloudy
27 3:52 PM 71 F ... 0.0 in Partly Cloudy
28 4:52 PM 68 F ... 0.0 in Mostly Cloudy
29 5:52 PM 66 F ... 0.0 in Mostly Cloudy
[30 rows x 11 columns]
pd.read_html - ValueError: No tables found
pd.read_html() is for processing HTML documents; it reads data contained within <table> tags in the document. To process CSV files, you need pd.read_csv(), which will accept a URL as an argument, so the following should work for you:
import pandas as pd
url = "https://raw.githubusercontent.com/hadley/data-baby-names/master/baby-names.csv"
df = pd.read_csv(url)
print(df.head())