Jsoup Cookies for HTTPS Scraping

jsoup posting and cookie

When you log in to the site, it probably sets an authorised session cookie that must be sent on subsequent requests to maintain the session.

You can get the cookie like this:

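import org.jsoup.Connection;
import org.jsoup.Connection.Method;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Submit the login form and keep the full response, not just the parsed page
Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
    .data("username", "myUsername", "password", "myPassword")
    .method(Method.POST)
    .execute();

Document doc = res.parse();
String sessionId = res.cookie("SESSIONID"); // you will need to check what the right cookie name is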

And then send it on the next request like:

Document doc2 = Jsoup.connect("http://www.example.com/otherPage")
    .cookie("SESSIONID", sessionId)
    .get();

Jsoup cookie authentication from CookieSyncManager to scrape from HTTPS site

I'm not an Android developer, but you could try something like this:

final String url = "https://need.authentication.com";

// -- Android Cookie part here --
CookieSyncManager.getInstance().sync();
CookieManager cm = CookieManager.getInstance();

String cookie = cm.getCookie(url); // returns cookie for url

// ...

// -- jsoup part here --
// The string from CookieManager holds all cookies for the URL, so send it as a raw Cookie header
Document doc = Jsoup.connect(url).header("Cookie", cookie).get();

// ...

I hope this helps a bit, but as I said, I'm not an Android developer, and this code is untested.

Here's some documentation:

  • CookieManager
  • CookieSyncManager
  • Jsoup Connection

jsoup does not send cookies from previous requests - bug?

Since the second-to-last response apparently does not return any cookies, you can't use it as the source of the cookies for the final request. jsoup does not automatically handle cookies for you: in each request you need to specify the cookies to send along, as you already do. But you also overwrite the variable res with a new response each time, so unless you save the cookies in a map, the old cookies are lost together with the old responses. Your approach with the map is therefore perfectly valid, and I would keep using this pattern.
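To make the map pattern a bit more concrete, here is a minimal sketch of a cookie holder that survives overwriting res. SessionCookies is a hypothetical helper of my own, not part of jsoup; it only assumes that each response's cookies arrive as a Map<String, String>, which is what Connection.Response.cookies() returns.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a jsoup class): accumulates cookies across responses.
final class SessionCookies {
    private final Map<String, String> cookies = new LinkedHashMap<>();

    // Merge the cookies of one response. A response without cookies changes nothing;
    // a cookie that is sent again replaces the older value of the same name.
    void update(Map<String, String> responseCookies) {
        cookies.putAll(responseCookies);
    }

    // Read-only view, intended to be passed to Connection.cookies(Map).
    Map<String, String> asMap() {
        return Collections.unmodifiableMap(cookies);
    }
}

// Intended use with jsoup (untested sketch, the URLs are placeholders):
//   SessionCookies jar = new SessionCookies();
//   Connection.Response res = Jsoup.connect(loginUrl).method(Connection.Method.POST).execute();
//   jar.update(res.cookies());
//   res = Jsoup.connect(nextUrl).cookies(jar.asMap()).execute();
//   jar.update(res.cookies());
```

Because the holder lives outside res, a response that sets no cookies no longer wipes out the ones collected earlier.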

If you want more automatic cookie management, I would suggest using the Apache HttpClient library.

Jsoup can't login to establish cookies

Your code gives me

HTTP error fetching URL. Status=404, URL=https://www.rbauction.com/home/auth&

If you change only

.data("_58_redirect", "%2Fhome%2Fauth&")

to

.data("_58_redirect", "%2Fhome%2Fauth")

(without the trailing ampersand), it works fine!

If it doesn't, check your user/pass.
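A quick way to see where the stray & in the 404 URL comes from is to percent-decode the redirect value. This is only a sketch under the assumption that the server decodes _58_redirect and redirects to the result; RedirectParamDemo is a made-up name for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes the redirect value the way a server presumably would.
final class RedirectParamDemo {
    static String decode(String encoded) {
        // %2F becomes '/', while a literal '&' is left untouched
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }
}
```

With the original value, decode("%2Fhome%2Fauth&") yields /home/auth&, which matches the path in the error message; without the ampersand it yields /home/auth.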

How can I pass a cookie held by CookieManager to Jsoup in Android?

cookie(name, value) expects the name of the cookie, not the URL it belongs to.

Try this instead:

doc = Jsoup
    .connect("https://need.authentication.com")
    .header("Cookie", cookie)
    .get();
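If you would rather pass individual name/value pairs than a raw header, you could split the cookie string yourself. The sketch below assumes the string from CookieManager has the usual "name=value; name2=value2" shape; CookieHeaderParser is a hypothetical helper, not an Android or jsoup class, and I have not tried it on a device.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits a Cookie header string into name/value pairs.
final class CookieHeaderParser {
    static Map<String, String> parse(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (header == null || header.trim().isEmpty()) {
            return cookies; // nothing stored for this URL
        }
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip fragments without a name
            }
            String name = pair.substring(0, eq).trim();
            // Split on the first '=' only, so values that contain '=' stay intact
            String value = pair.substring(eq + 1).trim();
            if (!name.isEmpty()) {
                cookies.put(name, value);
            }
        }
        return cookies;
    }
}
```

The resulting map can then go to .cookies(map) on the connection, or a single entry to .cookie(name, value).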

