Jsoup Java HTML Parser: Executing JavaScript Events

Jsoup Java HTML parser : Executing Javascript events

Jsoup is just an HTML parser and tidier, not a browser emulator. To interact with a page (execute JavaScript, fill out forms, and so on) you should use a tool like HtmlUnit or Selenium.

JSoup doesn't load the whole HTML

It seems the element with id=tournamentTable is generated dynamically by JavaScript. Jsoup does not evaluate JavaScript, so you would have to use a library like HtmlUnit. For example:

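import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(true); // enable JavaScript
webClient.getOptions().setThrowExceptionOnScriptError(false); // keep going even if a script throws an error
HtmlPage page = webClient.getPage(url);
webClient.waitForBackgroundJavaScript(5000); // important: call this after getPage, so it waits (up to 5 s) for the page's scripts to finish rendering

DomElement table = page.getElementById("tournamentTable");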

Fetching the website with Jsoup - page view source and Jsoup shows different content

Short answer: Jsoup can't execute JavaScript.

Long answer

http://www.yelp.com/search?find_desc=restaurant&find_loc=willowbrook%2C+IL&ns=1#l=p:IL:Willowbrook::&sortby=rating&rpp=40

The page you are requesting accepts an HTTP GET with query parameters. A normal browser sends those parameters and loads the page, but at that point Willowbrook is not yet checked (in your example). The JavaScript runs only after the page has loaded, and it is the JavaScript that ticks the checkbox and filters the search results. So when you use Jsoup you get more results, because it loads the page for 'state=IL' without the 'willowbrook' filter applied.
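One way to see what actually goes over the wire is to split the URL above with the standard java.net.URI class. This is only a sketch: the class and method names (FragmentDemo, requestTarget, clientOnlyPart) are made up for illustration. It relies on the general rule that an HTTP client sends the path and query but keeps everything after the # (the fragment) to itself, so only script running in a browser can act on it.

```java
import java.net.URI;

public class FragmentDemo {
    // Path plus query: the part of the URL that goes into the HTTP request.
    static String requestTarget(String url) {
        URI uri = URI.create(url);
        String query = uri.getRawQuery();
        return uri.getRawPath() + (query == null ? "" : "?" + query);
    }

    // The fragment (everything after '#'), or null if there is none.
    // It is never sent to the server; only client-side script can read it.
    static String clientOnlyPart(String url) {
        return URI.create(url).getRawFragment();
    }

    public static void main(String[] args) {
        String url = "http://www.yelp.com/search?find_desc=restaurant"
                + "&find_loc=willowbrook%2C+IL&ns=1"
                + "#l=p:IL:Willowbrook::&sortby=rating&rpp=40";
        System.out.println("sent to server: " + requestTarget(url));
        System.out.println("kept by client: " + clientOnlyPart(url));
    }
}
```

The Willowbrook filter, the sort order and the page size all sit in the second part, which is why a plain fetch of this URL comes back unfiltered.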

Can't access some DOM elements using Jsoup

Jsoup is only an HTML parser. Unfortunately it cannot see content produced by JavaScript or Ajax, because it cannot execute scripts.

You can check this by disabling JavaScript in your browser and reloading the page: you will see that the div.matchlist-header element doesn't exist.
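The same check can be sketched in code. The HTML string below is a hypothetical stand-in for the real page (the actual markup is not shown in the question), and StaticParseDemo and countHeaders are illustrative names. The example assumes jsoup is on the classpath and uses only Jsoup.parse and select.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticParseDemo {
    // Hypothetical page: the header div only exists after a browser runs the script.
    static final String HTML =
            "<html><body><div id=\"content\"></div>"
            + "<script>document.getElementById('content').innerHTML ="
            + " '<div class=\"matchlist-header\">Matches</div>';</script>"
            + "</body></html>";

    // Counts the div.matchlist-header elements Jsoup finds in the given HTML.
    static int countHeaders(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div.matchlist-header").size();
    }

    public static void main(String[] args) {
        // Jsoup keeps the script body as inert text and never runs it,
        // so the element the script would create is not in the parsed tree.
        System.out.println("headers found: " + countHeaders(HTML));
    }
}
```

If the count is zero for the HTML the server returns, the element is being built client-side and a tool such as HtmlUnit or Selenium is needed to obtain it.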


