Get page generated with Javascript in Python
You could use Selenium Webdriver:
#!/usr/bin/env python
from contextlib import closing
from selenium.webdriver import Firefox # pip install selenium
from selenium.webdriver.support.ui import WebDriverWait
# use firefox to get page with javascript generated content
with closing(Firefox()) as browser:
browser.get(url)
button = browser.find_element_by_name('button')
button.click()
# wait for the page to load
WebDriverWait(browser, timeout=10).until(
lambda x: x.find_element_by_id('someId_that_must_be_on_new_page'))
# store it to string variable
page_source = browser.page_source
print(page_source)
scrape html generated by javascript with python
In Python, I think Selenium 1.0 is the way to go. It’s a library that allows you to control a real web browser from your language of choice.
You need to have the web browser in question installed on the machine your script runs on, but it looks like the most reliable way to programmatically interrogate websites that use a lot of JavaScript.
Python Scraping JavaScript page without the need of an installed browser
Aside from automating a browser your other 2 options are as follows:
try find the backend query that loads the data via javascript. It's not a guarantee that it will exist but open your browser's Developer Tools - Network tab - fetch/Xhr and then refresh the page, hopefully you'll see requests to a backend api that loads the data you want. If you do find a request click on it and explore the endpoint, headers and possibly the payload that is sent to get the response you are looking for, these can all be recreated in python using requests to that hidden endpoint.
the other possiblility is that the data hidden in the HTML within a script tag possibly in a json file... Open the Elements tab of your developer tools where you can see the HTML of the page, right click on the tag and click "expand recursively" this will open every tag (it might take a second) and you'll be able to scroll down and search for the data you want. Ignore the regular HTML tags, we know it is loaded by javascript so look through any "script" tag. If you do find it then you can hopefully find it in your script with a combination of Beautiful Soup to get the script tag and string slicing to just get out the json.
If neither of those produce results then try requests_html package, and specifically the "render" method. It automatically installs a headless browser when you first run the render method in your script.
What site is it, perhaps I can offer more help if I can see it?
Scraping elements generated by javascript queries using python
Working example :
import urllib
import requests
import json
url = "https://daphnecaruanagalizia.com/2017/10/crook-schembri-court-today-pleading-not-crook/"
encoded = urllib.parse.quote_plus(url)
# encoded = urllib.quote_plus(url) # for python 2 replace previous line by this
j = requests.get('https://count-server.sharethis.com/v2.0/get_counts?url=%s' % encoded).text
obj = json.loads(j)
print(obj['clicks']['twitter'] + obj['shares']['twitter'])
# => 5008
Explanation :
Inspecting the webpage, you can see that it does a request to this :
https://count-server.sharethis.com/v2.0/get_counts?url=https%3A%2F%2Fdaphnecaruanagalizia.com%2F2017%2F10%2Fcrook-schembri-court-today-pleading-not-crook%2F&cb=stButtons.processCB&wd=true
If you paste it in your browser you'll have all your answers. Then playing a bit with the url, you can see that removing extra parameters will give you a nice json.
So as you can see, you just have to replace the url
parameter of the request with the url of the page you want to get the twitter counts.
How to scrape a javascript website in Python?
You can access data via API (check out the Network tab):
For example,
import requests
url = "https://www.example.com/api/v3/news_feed/7"
data = requests.get(url).json()
How to get Javascript generated content
Try changing
html = driver.execute_script("return document.getElementsByTagName('table')[38].innerHTML")
print(html)
To:
target = driver.find_element_by_xpath('//table[@class="ACA_GridView ACA_Grid_Caption"]')
print(target.text)
Output:
Showing 1-13 of 13
License Type
License Number
First Name
Middle Initial
Last Name
Organization Name
DBA/Trade Name
License Status
License Expiration Date
Pharmacist
5302017621
Arthur
James
etc.
Using python Requests with javascript pages
You are going to have to make the same request (using the Requests library) that the javascript is making. You can use any number of tools (including those built into Chrome and Firefox) to inspect the http request that is coming from javascript and simply make this request yourself from Python.
Related Topics
How to Have an Element With an Id That Starts With a Number
Page Content Is Loaded With JavaScript and Jsoup Doesn't See It
How to Wait For the 'End' of 'Resize' Event and Only Then Perform an Action
How to Render an 'Atmosphere' Over a Rendering of the Earth in Three.Js
How to Get Current Formatted Date Dd/Mm/Yyyy in JavaScript and Append It to an Input
Innertext VS Innerhtml VS Label VS Text VS Textcontent VS Outertext
How to Create a Link Using JavaScript
JavaScript: How to Find Out If the User Browser Is Chrome
How to Apply CSS to Half of a Character
How to Turn Off Antialiasing on an HTML ≪Canvas≫ Element
How to Get Selected Text from a Textbox Control With JavaScript
Html5 Canvas Resize (Downscale) Image High Quality
Onchange Event on Input Type=Range Is Not Triggering in Firefox While Dragging
How to Set the Form Action Through JavaScript
Html5 Canvas Todataurl Returns Blank