Getting Jsoup to support dynamically generated html by JavaScript
Jsoup does not support JavaScript and does not emulate a browser, so forget about it if you are planning to execute JavaScript. In my experience HtmlUnit, which is a headless browser, has given me the best results (speaking only of Java frameworks).
One thing worth trying in HtmlUnit is changing the BrowserVersion (Chrome / Internet Explorer / Firefox) when creating the WebClient instance. Some sites respond differently depending on the browser, and sometimes just changing that value gives you the results you expect.
How to read/parse dynamically generated client side content in Android using Java
Jsoup can't execute JavaScript, so it can't be used here.
It can be done with Selenium WebDriver or, on Android, with Selendroid.
Get full HTML using Jsoup
Most likely the elements you see are added to the DOM dynamically by some JavaScript code. That means they are not present in the body of the response that Jsoup receives.
Jsoup Scraping HTML dynamic content
You can use the .select(String cssQuery) method: doc.select("h1") gives you all h1 Elements.
If you need the actual text in these tags, use .text() on each Element.
If you need an attribute like class or id, use .attr(String attributeKey) on an Element, e.g.:
doc.getElementsByClass("hover_item_name").first().attr("id")
gives you "iteminfo0_item_name".
But if you need to perform clicks on a website, you can't do that with Jsoup, because Jsoup is an HTML parser and not a browser alternative. Jsoup can't handle dynamic content.
What you could do is first scrape the relevant data from your h1 tags and then send a new .post() request yourself, i.e. reproduce the AJAX call the page would make.
If you would rather use a real WebDriver, have a look at Selenium.
Jsoup parse dynamically loading webpage in Java
EDIT - After a few comments from the OP, I understood exactly what he wants to achieve. I've changed my original solution a bit and tested it.
You can do it with Jsoup. After the first page, getting the next one requires you to send a POST request with some form parameters. The parameters contain (among others) the start number and how many records to get. If you send an illegal number (e.g. you ask for the page that contains game number 700 but the results contain only 600 games), you get the first page again. You can loop through the pages until you get a result that you already have.
Sometimes the server returns 600 results and sometimes only 540; I could not figure out why.
The full code for this approach:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Connection;
import org.jsoup.Connection.Method;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HelloWorld {
    public static void main(String[] args) {
        Connection.Response res;
        Document doc;
        boolean ok = true;
        int start = 0;
        String query;
        ArrayList<String> tempList = new ArrayList<>();
        ArrayList<String> games = new ArrayList<>();
        Pattern r = Pattern.compile("title=\"(.*)\" a");

        try { // first connection with GET request
            res = Jsoup.connect("https://play.google.com/store/apps/category/GAME_ACTION/collection/topselling_free")
                    .method(Method.GET)
                    .execute();
            doc = res.parse();
        } catch (Exception ex) {
            // Do some exception handling here; without the first page there is nothing to parse
            ex.printStackTrace();
            return;
        }

        for (int i = 1; i <= 60; i++) { // parse the result and add it to the list
            query = "div.card:nth-child(" + i + ") > div:nth-child(1) > div:nth-child(3) > h2:nth-child(2) > a:nth-child(1)";
            tempList.add(doc.select(query).toString());
        }

        while (ok) { // loop until you get the same results again
            start += 60;
            System.out.println("now at number " + start);
            try { // send a POST request for each new page, reusing the cookies of the first response
                doc = Jsoup.connect("https://play.google.com/store/apps/category/GAME_ACTION/collection/topselling_free?authuser=0")
                        .cookies(res.cookies())
                        .data("start", String.valueOf(start))
                        .data("num", "60")
                        .data("numChildren", "0")
                        .data("ipf", "1")
                        .data("xhr", "1")
                        .post();
            } catch (Exception ex) {
                // Do some exception handling here; stop paging if a request fails
                ex.printStackTrace();
                break;
            }

            for (int i = 1; i <= 60; i++) { // parse the result and add it to the list
                query = "div.card:nth-child(" + i + ") > div:nth-child(1) > div:nth-child(3) > h2:nth-child(2) > a:nth-child(1)";
                String element = doc.select(query).toString();
                if (!tempList.contains(element)) {
                    tempList.add(element);
                } else { // we've seen these games before, time to quit
                    ok = false;
                    break;
                }
            }
        }

        for (int i = 0; i < tempList.size(); i++) { // remove all redundant info
            Matcher m = r.matcher(tempList.get(i));
            if (m.find()) {
                games.add(m.group(1));
                // number by position in games, since entries without a title are skipped
                System.out.println(games.size() + " " + m.group(1));
            }
        }
    }
}
The code can be further improved (for example, by handling all the lists in a separate method), so it's up to you.
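As one possible step in that direction, the title extraction could move into its own method. This is a sketch under assumptions rather than part of the tested solution: TitleExtractor and extractTitles are hypothetical names, the sample anchor markup is made up for illustration, and the pattern is a stricter variant of the one used above that stops at the first closing quote instead of relying on a greedy match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleExtractor {

    // Captures the value of the first title="..." attribute in a snippet.
    // [^"]* cannot run past the closing quote, unlike a greedy (.*).
    private static final Pattern TITLE = Pattern.compile("title=\"([^\"]*)\"");

    // Returns the title of every snippet that has one; snippets without a
    // title attribute (including empty strings) are skipped.
    static List<String> extractTitles(List<String> htmlSnippets) {
        List<String> titles = new ArrayList<>();
        for (String snippet : htmlSnippets) {
            Matcher m = TITLE.matcher(snippet);
            if (m.find()) {
                titles.add(m.group(1));
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up markup, only for illustration.
        List<String> snippets = Arrays.asList(
                "<a href=\"/a\" title=\"First Game\" aria-hidden=\"true\">First Game</a>",
                "",
                "<a href=\"/b\" title=\"Second Game\" aria-hidden=\"true\">Second Game</a>");
        List<String> titles = extractTitles(snippets);
        for (int i = 0; i < titles.size(); i++) {
            System.out.println((i + 1) + " " + titles.get(i));
        }
    }
}
```

Because the numbering is taken from the returned list, it stays consistent even when some snippets have no title.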
I hope this works for you.