Java - How to Find the Redirected Url of a Url

Java - How to find the redirected url of a url?

You need to cast the URLConnection to HttpURLConnection and tell it not to follow redirects by calling HttpURLConnection#setInstanceFollowRedirects(false). You can also change this globally with the static HttpURLConnection#setFollowRedirects(false).

You then have to handle the redirects yourself: check the response code with HttpURLConnection#getResponseCode(), read the Location header with URLConnection#getHeaderField("Location"), and fire a new HTTP request at that URL.
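A minimal sketch of one such hop, assuming plain HttpURLConnection and no authentication. The class and method names (RedirectProbe, isRedirect, resolveLocation, findRedirectTarget) are made up for illustration, and resolving a relative Location against the request URL is my addition, not something the answer spells out:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectProbe {

    /** True for the status codes that normally carry a Location header to follow. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a possibly relative Location value against the URL that returned it. */
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    /** Returns the redirect target of url, or null if the server did not redirect. */
    static String findRedirectTarget(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setInstanceFollowRedirects(false); // we handle the redirect ourselves
        try {
            int status = con.getResponseCode(); // this call sends the request
            String location = con.getHeaderField("Location");
            if (isRedirect(status) && location != null) {
                return resolveLocation(url, location);
            }
            return null;
        } finally {
            con.disconnect();
        }
    }
}
```

Calling RedirectProbe.findRedirectTarget(...) again on the returned value follows the next hop; you would normally cap the number of hops to avoid redirect loops.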

How to find url which caused redirection from my redirected page?

The request attribute javax.servlet.forward.request_uri helped me get the URI:

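// The attribute is only set on forwarded requests, so guard against null
Object forwardUri = request.getAttribute("javax.servlet.forward.request_uri");
String requestUri = (forwardUri != null) ? forwardUri.toString() : request.getRequestURI();
String sourceUrl = <domain_url> + requestUri;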

This lets me build the entire URL, because I know the base domain name. When the request is redirected from some external site, though, I'm not sure how to handle it.

How to get redirected URL and content using HttpURLConnection

Actually, we can use HttpClient and turn on redirect following (something like HttpClient.followRedirect(true); the exact call depends on the HttpClient library and version).
HttpClient will then handle the redirects for us.
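As one possible illustration, assuming Java 11 or later: the JDK's own java.net.http.HttpClient can follow redirects and report both the final URL and the content. This may not be the HttpClient library the answer had in mind, and the names RedirectingClient, newClient and fetch are only placeholders:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RedirectingClient {

    /** Builds a client that follows redirects, except from HTTPS to HTTP. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Fetches url; response.uri() is the final URL and response.body() the content. */
    static HttpResponse<String> fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return newClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

With this, RedirectingClient.fetch(someUrl).uri() should give the redirected URL and .body() the page content, without any manual Location handling.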

Get the redirected URL of a very specific URL (in Java)

I played a bit with your URL with telnet, wget, and curl and I noticed that in some cases the server returns response 200 OK, and sometimes 302 Moved Temporarily. The main difference seems to be the request User-agent header. Your code works if you add the following before con1.connect():

con1.setRequestProperty("User-Agent","");

That is, with an empty User-Agent (or if the header is not present at all), the server issues a redirect. With the Java User-Agent (in my case User-Agent: Java/1.7.0_45) and with the default curl User-Agent (User-Agent: curl/7.32.0), the server responds with 200 OK.

In some cases you might need to also set:

System.setProperty("http.agent", "");

See Setting user agent of a java URLConnection

The server running the site is the Adtech Adserver, and apparently it is doing user agent sniffing, a practice with a long history. So one thing that works is to set the user agent to Mozilla:

con1.setRequestProperty("User-Agent","Mozilla"); //works with your code for your URL

The safest option is probably to send the User-Agent string of one of the popular web browsers.
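To compare how a server reacts to different user agents, something like the following sketch could be used. The class UserAgentProbe and its methods are hypothetical helpers, not part of the code in the question, and the URL you pass in is up to you:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class UserAgentProbe {

    /** Prepares (but does not yet send) a request with the given User-Agent. */
    static HttpURLConnection open(String url, String userAgent) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setInstanceFollowRedirects(false); // we want to see the 302 itself
        con.setRequestProperty("User-Agent", userAgent);
        return con;
    }

    /** Sends the request and returns "<status> <Location or null>". */
    static String describe(String url, String userAgent) throws IOException {
        HttpURLConnection con = open(url, userAgent);
        try {
            return con.getResponseCode() + " " + con.getHeaderField("Location");
        } finally {
            con.disconnect();
        }
    }
}
```

Calling describe with "Mozilla" and then with another user agent for the same URL shows whether the server answers them differently.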

Retrieve redirected URL with Java / HttpURLConnection

Conceptual problems:

0.) Can one URLConnection or HttpURLConnection object be reused?

No, you cannot reuse such an object. You can use it to fetch the content of one URL just once. You cannot use it to retrieve another URL, nor to fetch the content twice (speaking on the network level).

If you want to fetch another URL, or fetch the same URL a second time, you have to call the openConnection() method of the URL class again to instantiate a new connection object.
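A small sketch of that rule, with hypothetical names (FreshConnections, newConnection, fetchStatusTwice): every request gets its own connection object from openConnection().

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

class FreshConnections {

    /** Each call returns a new, not yet connected connection object. */
    static HttpURLConnection newConnection(URL url) throws IOException {
        return (HttpURLConnection) url.openConnection();
    }

    /** Fetches the same URL twice, using a separate connection object each time. */
    static int[] fetchStatusTwice(String address) throws IOException {
        URL url = URI.create(address).toURL();
        int[] statuses = new int[2];
        for (int i = 0; i < statuses.length; i++) {
            HttpURLConnection con = newConnection(url); // never reuse the previous one
            try {
                statuses[i] = con.getResponseCode();
            } finally {
                con.disconnect();
            }
        }
        return statuses;
    }
}
```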

1.) When is the URLConnection actually connected?

The method name openConnection() is misleading. It only instantiates the connection object. It does not do anything on the network level.

The interaction on the network level starts in this line, which implicitly connects the connection (= the TCP socket under the hood is opened and data is sent and received):

int responseType = con.getResponseCode()/100;

Alternatively, you can use HttpURLConnection.connect() to explicitly connect the connection.
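To make the difference visible, here is a rough sketch (LazyConnect and its methods are illustrative names). open() succeeds even for a host name that cannot be resolved, because nothing touches the network until connect() or getResponseCode() is called:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class LazyConnect {

    /** Only creates the connection object; no socket is opened here. */
    static HttpURLConnection open(String address) throws IOException {
        return (HttpURLConnection) URI.create(address).toURL().openConnection();
    }

    /** Connects explicitly, then reads the status code. */
    static int connectAndGetStatus(HttpURLConnection con) throws IOException {
        con.connect();                // explicit connect; network errors surface here
        return con.getResponseCode(); // would have connected implicitly otherwise
    }
}
```

Request properties and setInstanceFollowRedirects() have to be set between open() and the first call that connects.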

2.) How does setInstanceFollowRedirects work?

setInstanceFollowRedirects(true) causes the URLs to be fetched "under the hood" again and again until there is a non-redirect response. The response code of the non-redirect response is returned by your call to getResponseCode().

UPDATE:

Yes, this lets you write simple code if you do not want to deal with the redirects yourself. Simply switch on redirect following, then read the final response from the location you were redirected to as if no redirect had taken place.
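A sketch of that approach follows; AutoFollow, open and finalUrl are made-up names. It relies on two things I am assuming rather than taking from the answer: that getURL() reflects the last URL fetched once the response has been read, which is how the JDK's built-in implementation is commonly observed to behave, and that the redirect stays on the same protocol, since HttpURLConnection does not automatically follow a redirect from http to https or back.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class AutoFollow {

    /** Prepares a connection that follows redirects on its own. */
    static HttpURLConnection open(String address) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(address).toURL().openConnection();
        con.setInstanceFollowRedirects(true);
        return con;
    }

    /** Returns the URL the connection ended up at after any redirects. */
    static String finalUrl(String address) throws IOException {
        HttpURLConnection con = open(address);
        try {
            con.getResponseCode(); // connects and follows redirects under the hood
            return con.getURL().toString();
        } finally {
            con.disconnect();
        }
    }
}
```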

How to check programmatically if url of page is redirecting?

In Groovy, you could do what Joachim suggests like this:

String location = "url-of-webpage-A"
boolean wasRedirected = false
String pageContent = null

while( location ) {
    new URL( location ).openConnection().with { con ->
        // We'll do redirects ourselves
        con.instanceFollowRedirects = false

        // Get the location to jump to (null if the response has no Location header)
        location = con.getHeaderField( "Location" )
        if( !wasRedirected && location ) {
            wasRedirected = true
        }

        // Read the HTML and close the inputstream
        pageContent = con.inputStream.withReader { it.text }
    }
}

println "wasRedirected:$wasRedirected contentLength:${pageContent.length()}"

If you don't want to be redirected, and want the contents of the first page, you simply need to do:

String location = "url-of-webpage-A"
String pageContent = new URL( location ).openConnection().with { con ->
    // Don't follow redirects: we want the first page itself
    con.instanceFollowRedirects = false

    // Get the location to jump to (in case of a redirect)
    location = con.getHeaderField( "Location" )

    // Read the HTML and close the inputstream
    con.inputStream.withReader { it.text }
}

if( location ) {
    println "Page wanted to redirect to $location"
}
println "Content was:"
println pageContent
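For plain Java, the same loop could be sketched roughly as below. RedirectChain, follow and httpNextHop are illustrative names of my own; the hop limit and the resolution of relative Location values are additions that the Groovy version does not have. The lookup is passed in as a function so the loop can be tried without network access.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class RedirectChain {

    /** Collects every URL visited, starting with startUrl; follows at most maxHops redirects. */
    static List<String> follow(String startUrl, int maxHops,
                               Function<String, Optional<String>> nextHop) {
        List<String> chain = new ArrayList<>();
        chain.add(startUrl);
        String current = startUrl;
        for (int hop = 0; hop < maxHops; hop++) {
            Optional<String> next = nextHop.apply(current);
            if (!next.isPresent()) {
                break; // not a redirect: current is the final URL
            }
            current = next.get();
            chain.add(current);
        }
        return chain;
    }

    /** One real request without automatic redirects; returns the resolved Location, if any. */
    static Optional<String> httpNextHop(String url) {
        try {
            HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
            con.setInstanceFollowRedirects(false);
            try {
                int status = con.getResponseCode();
                String location = con.getHeaderField("Location");
                if (status / 100 == 3 && location != null) {
                    return Optional.of(URI.create(url).resolve(location).toString());
                }
                return Optional.empty();
            } finally {
                con.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as RedirectChain.follow("url-of-webpage-A", 5, RedirectChain::httpNextHop) returns the list of URLs visited; the page was redirected if that list has more than one entry.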

