Getting Jsoup to Support Dynamically Generated HTML by JavaScript

Getting Jsoup to support dynamically generated HTML by JavaScript

Jsoup does not support JavaScript, and it does not emulate a browser. If you need scripts to execute, Jsoup is the wrong tool. In my experience HtmlUnit, which is a headless browser, has given me the best results (speaking only of Java frameworks).

One thing worth trying in HtmlUnit is changing the BrowserVersion (Chrome / Internet Explorer / Firefox) when creating the WebClient instance. Some sites respond differently depending on the browser they detect, and sometimes changing that value alone gives you the results you expect.

Using jsoup to parse a dynamic page

You can't find it in the page source because the page runs JavaScript that populates it dynamically from the JSON returned by this link:

https://en.mygon.com/MGMDW/REST/web/client/shops/getShops?startIndex=0&pageSize=30&hourInterval=0&onlyPromotions=false&categoryId=0&day=5&searchWords=sushi&languageCode=en_EN&originMygon=true&capital=portugal%2C+portugal&_=1410373067855

Jsoup won't help you in this case.

Request the link I posted, replacing searchWords=sushi with the term you want to search for; the response is JSON, which is easy to parse.
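
As a rough sketch of that approach using only the JDK (Java 11+): the endpoint and query parameters are copied from the URL above, while the class and method names (ShopSearch, buildUrl, fetchJson) are made up for illustration. The trailing `_` parameter looks like a cache-busting timestamp and is left out here; that is an assumption, and the service may behave differently without it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Jsoup or of the site's own API.
class ShopSearch {
    // Endpoint taken from the URL quoted in the answer.
    static final String ENDPOINT =
            "https://en.mygon.com/MGMDW/REST/web/client/shops/getShops";

    // Rebuilds the query string with a caller-supplied, URL-encoded search term.
    static String buildUrl(String searchWords) {
        String term = URLEncoder.encode(searchWords, StandardCharsets.UTF_8);
        return ENDPOINT
                + "?startIndex=0&pageSize=30&hourInterval=0&onlyPromotions=false"
                + "&categoryId=0&day=5&searchWords=" + term
                + "&languageCode=en_EN&originMygon=true"
                + "&capital=portugal%2C+portugal";
    }

    // Downloads the raw JSON body as a string; parsing is left to a JSON library.
    static String fetchJson(String searchWords) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildUrl(searchWords)))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling ShopSearch.fetchJson("ramen") would then return the JSON text for that search, provided the endpoint still answers as it did when the answer was written.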

Scrape a dynamically-produced page on Android

Selenium would be a good option for web scraping: https://www.selenium.dev/ It drives a real browser, so it has access to the website's DOM after the scripts have run. In my experience, a dynamically generated web page can be difficult to scrape. Regular expressions will be your friend for pulling values out of the rendered markup: https://regexone.com/
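
To illustrate the regular-expression part, here is a minimal sketch in plain Java. It assumes you already have the rendered HTML as a string (for example from Selenium's WebDriver.getPageSource()); the LinkExtractor class and the sample markup are hypothetical. Regexes on HTML are brittle, so treat this as a quick extraction trick for simple, predictable markup rather than a general parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for pulling link targets out of already-rendered HTML.
class LinkExtractor {
    // Matches the quoted href value (single or double quotes) inside an <a ...> tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every href value found, in document order.
    static List<String> extractLinks(String renderedHtml) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(renderedHtml);
        while (m.find()) {
            links.add(m.group(2)); // group 1 is the quote character, group 2 the URL
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for a page after its scripts have run.
        String html = "<ul><li><a class=\"shop\" href=\"/shops/1\">One</a></li>"
                + "<li><a href='/shops/2'>Two</a></li></ul>";
        System.out.println(extractLinks(html));
    }
}
```

If the markup gets more complicated than this, handing the rendered string to an HTML parser is usually more reliable than extending the pattern.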


