Python Library for Rendering HTML and JavaScript

How can I render JavaScript HTML to HTML in python?

You can pip install selenium from a command line, and then run something like:

from selenium import webdriver
from urllib.request import urlopen  # urllib2 on Python 2

url = 'http://www.google.com'
file_name = 'C:/Users/Desktop/test.txt'

# Download the raw (unrendered) HTML and save it to disk
conn = urlopen(url)
data = conn.read()
conn.close()

# urlopen() returns bytes, so write in binary mode
file = open(file_name, 'wb')
file.write(data)
file.close()

# Open the saved file in Firefox so its JavaScript can run,
# then read back the rendered page source
browser = webdriver.Firefox()
browser.get('file:///' + file_name)
html = browser.page_source
browser.quit()
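The download-and-save half of that round trip, updated for Python 3 (urllib2 was split into urllib.request), could be sketched like this; the function name fetch_and_save is just an illustration, and the demo uses a data: URL so it runs without network access:

```python
import tempfile
from urllib.request import urlopen


def fetch_and_save(url, path):
    """Download raw (unrendered) HTML from url and write it to path."""
    with urlopen(url) as conn:
        data = conn.read()  # bytes, so the file must be opened in binary mode
    with open(path, 'wb') as f:
        f.write(data)
    return data


# Demo: a data: URL is handled by urlopen() without touching the network
tmp = tempfile.NamedTemporaryFile(suffix='.html', delete=False)
tmp.close()
fetch_and_save('data:text/html,<p>hello</p>', tmp.name)
print(open(tmp.name, 'rb').read())  # → b'<p>hello</p>'
```

From there you would point the Selenium browser at 'file:///' + tmp.name exactly as in the snippet above.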

How to render HTML in python?

I have fixed this by using the tkinterweb library.

Code:

import tkinter
from tkinterweb import HtmlFrame

screen = tkinter.Tk()
screen.geometry("700x700")
frame = HtmlFrame(screen, horizontal_scrollbar="auto")
urlInput = tkinter.Entry(screen)

def search():
    frame.load_website(urlInput.get())

button = tkinter.Button(screen, text="search", command=search)
urlInput.grid(row=0, column=0, columnspan=2)
button.grid(row=1, column=0)
frame.grid(row=2, column=0)
screen.mainloop()

This is for anyone who wants to know how I solved it.

Trouble getting the trade-price using Requests-HTML library

You have several errors. The first is a 'navigation' timeout, showing that the page didn’t complete rendering:

Exception in callback NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49
handle: <Handle NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49>
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\asyncio\events.py", line 145, in _run
    self._callback(*self._args)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 52, in watchdog_cb
    self._timeout)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 40, in _raise_error
    raise error
concurrent.futures._base.TimeoutError: Navigation Timeout Exceeded: 3000 ms exceeded

This traceback is not raised in the main thread, so your code was not aborted by it. Your page may or may not be complete; you may want to set a longer timeout or introduce a sleep cycle to give the browser time to process AJAX responses.

Next, the response.html.render() method returns None. It loads the HTML into a headless Chromium browser, leaves the JavaScript rendering to that browser, then copies the rendered page HTML back into the response.html data structure in place, so nothing needs to be returned. That means js is set to None, not a new HTML instance, which causes your next traceback.

Use the existing response.html object to search, after rendering:

r.html.render()
item = r.html.find('.MarketInfo_market-num_1lAXs', first=True)

There is most likely no such CSS class, because the last 5 characters of the name are generated on each page build, after the JSON data is loaded over AJAX. That makes it hard to use a fixed CSS selector to find the element in question.
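If you prefer not to depend on requests-html's search helpers, the same suffix hunt can be done with a plain regular expression over the rendered page source. This is a minimal sketch: the HTML string below is a stand-in for r.html.html, and the class prefix is the one from the question:

```python
import re

# Stand-in for the rendered page source (r.html.html in requests-html)
html = '''
<span class="MarketInfo_market-num_1lAXs">169.81 EUR</span>
<span class="MarketInfo_market-num_2bQzf">18,420 LTC</span>
'''

# The trailing characters of the class name change per build, so match
# the stable prefix and capture whatever suffix follows it.
suffixes = re.findall(r'MarketInfo_market-num_(\w+)', html)
print(suffixes)  # → ['1lAXs', '2bQzf']
```

Each captured suffix can then be plugged back into a selector such as '.MarketInfo_market-num_{}'.format(suffix).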

Moreover, I found that without a sleep cycle, the browser has no time to fetch AJAX resources and render the information you wanted to load. Give it, say, 10 seconds of sleep to do some work before copying back the HTML. Set a longer timeout (the default is 8 seconds) if you see network timeouts:

r.html.render(timeout=10, sleep=10)

You could also set the timeout to 0 to remove it entirely and wait indefinitely until the page has loaded.

Hopefully a future API update also provides features to wait for network activity to cease.

You can use the included parse library to find the matching CSS classes:

# search for CSS suffixes
suffixes = [r[0] for r in r.html.search_all('MarketInfo_market-num_{:w}')]
for suffix in suffixes:
    # for each suffix, find all matching elements with that class
    items = r.html.find('.MarketInfo_market-num_{}'.format(suffix))
    for item in items:
        print(item.text)

That produces the following output:

169.81 EUR
+
1.01 %
18,420 LTC
169.81 EUR
+
1.01 %
18,420 LTC
169.81 EUR
+
1.01 %
18,420 LTC
169.81 EUR
+
1.01 %
18,420 LTC

Your last traceback shows that the Chromium user data path could not be cleaned up. The underlying Pyppeteer library configures the headless Chromium browser with a temporary user data path, and in your case that directory still contains a locked resource. You can ignore the error, although you may want to remove any leftover files in the .pyppeteer folder at a later time.
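A later cleanup of such a leftover directory can be done with shutil.rmtree and ignore_errors=True, so still-locked files don't abort the whole pass. This is a hypothetical sketch: the directory below is created under the system temp location purely as a stand-in, since the real pyppeteer data path varies by platform and version:

```python
import shutil
import tempfile
from pathlib import Path

# Stand-in for a leftover Chromium user data directory; the real path
# of pyppeteer's temp profiles depends on your platform (assumption).
leftover = Path(tempfile.mkdtemp(prefix='pyppeteer_leftover_'))
(leftover / 'lockfile').write_text('stale')

# ignore_errors=True keeps the cleanup from failing on locked files
shutil.rmtree(leftover, ignore_errors=True)
print(leftover.exists())  # → False
```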


