How to Programmatically Download a Webpage in Java

How do you programmatically download a webpage in Java?

Here's some tested code using Java's URL class. I'd recommend doing a better job than I do here of handling the exceptions, or passing them up the call stack.

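import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

// The class name is only a placeholder so the snippet compiles on its own
public class PageDownloader {
    public static void main(String[] args) {
        URL url;
        InputStream is = null;
        BufferedReader br;
        String line;

        try {
            url = new URL("http://stackoverflow.com/");
            is = url.openStream(); // throws an IOException
            br = new BufferedReader(new InputStreamReader(is));

            // Print the page one line at a time
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        } catch (MalformedURLException mue) {
            mue.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } finally {
            // Always release the stream, even if reading failed
            try {
                if (is != null) is.close();
            } catch (IOException ioe) {
                // nothing to see here
            }
        }
    }
}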
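
One way to act on the advice about passing exceptions up the call stack is to let the method declare IOException and use try-with-resources for cleanup. The following is only a sketch: the class and method names (PageFetcher, fetch) are made up for illustration, and it assumes the page is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    /** Reads everything behind the URL into a String, assuming UTF-8 text. */
    public static String fetch(URL url) throws IOException {
        StringBuilder page = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream)
        // whether or not reading succeeds
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java PageFetcher <url>");
            return;
        }
        System.out.print(fetch(URI.create(args[0]).toURL()));
    }
}
```

The caller now decides what to do with a failed download instead of the failure being printed and swallowed inside the method.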

How to programmatically download website sources?

You can use Selenium WebDriver to get a String listing all the sources the page loaded, then search that string for what you're looking for. It can look like this.

First start ChromeDriver and navigate to the page you wish to scrape.

WebDriver driver = new ChromeDriver();
driver.get("http://www.oddsportal.com/soccer/argentina/copa-argentina/rosario-central-racing-club-hnmq7gEQ/");

Then download the sources into a string:

String scriptToExecute = "var performance = window.performance || window.mozPerformance || window.msPerformance || window.webkitPerformance || {}; var network = performance.getEntries() || {}; return network;";
String netData = ((JavascriptExecutor) driver).executeScript(scriptToExecute).toString();

And finally search the string for the desired link:

// Note: substring() throws StringIndexOutOfBoundsException if "fb.oddsportal" is not found
netData = netData.substring(netData.indexOf("fb.oddsportal"), netData.indexOf(".dat")+4);
System.out.println(netData);
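
The substring() call above assumes both markers are present and that only one matching resource was loaded. If that is not guaranteed, a regular expression can collect every match and simply return an empty list when there is none. This is a rough sketch using only java.util.regex; the class name, the method name, and the sample string in main are invented for illustration and are not real output from the page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkFinder {

    /**
     * Returns every substring that starts with startMarker and ends with the
     * nearest following endMarker, without crossing whitespace.
     */
    public static List<String> findAll(String text, String startMarker, String endMarker) {
        // Pattern.quote treats the markers as literal text, so the dots in
        // "fb.oddsportal" and ".dat" are not regex wildcards
        Pattern pattern = Pattern.compile(
                Pattern.quote(startMarker) + "\\S*?" + Pattern.quote(endMarker));
        Matcher matcher = pattern.matcher(text);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the netData string
        String netData = "{name=http://fb.oddsportal.com/feed/match/1-1-abc.dat, type=xhr} "
                + "{name=http://www.oddsportal.com/res/x/global.js, type=script}";
        System.out.println(findAll(netData, "fb.oddsportal", ".dat"));
    }
}
```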

How to download a pdf file programmatically from a webpage with .html extension?

For downloading a file, perhaps you could try something like this:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public final class FileDownloader {

    private FileDownloader() {}

    public static void main(String[] args) throws IOException {
        download("http://pdfobject.com/pdf/sample.pdf", new File("sample.pdf"));
    }

    public static void download(final String url, final File destination) throws IOException {
        final URLConnection connection = new URL(url).openConnection();
        connection.setConnectTimeout(60000); // milliseconds
        connection.setReadTimeout(60000);
        // Some servers refuse requests that carry Java's default User-Agent
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        final byte[] buffer = new byte[2048];
        int read;
        // try-with-resources closes both streams even if the copy fails
        try (InputStream input = connection.getInputStream();
             FileOutputStream output = new FileOutputStream(destination, false)) {
            while ((read = input.read(buffer)) > -1) {
                output.write(buffer, 0, read);
            }
        }
    }
}
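
Since the question is about a URL ending in .html, it may be worth checking that the saved file really is a PDF and not an HTML page. A well-formed PDF starts with the bytes "%PDF-", so a minimal check could look like the sketch below. The class and method names are hypothetical, and the check only inspects the first five bytes; it does not validate the rest of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class PdfCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file starts with the PDF signature "%PDF-". */
    public static boolean looksLikePdf(Path file) throws IOException {
        byte[] header = new byte[PDF_MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int total = 0;
            // read() may return fewer bytes than requested, so loop until
            // the header is full or the file ends
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                total += n;
            }
        }
        return Arrays.equals(header, PDF_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java PdfCheck <file>");
            return;
        }
        System.out.println(looksLikePdf(Paths.get(args[0])));
    }
}
```

Calling looksLikePdf(destination.toPath()) after download() would flag the case where the server answered with an HTML page instead of the document.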

How to programmatically download all contents of webpage, not only the source code in Java

I would suggest using the jsoup library, as it is a pretty good HTML parser. You can parse the HTML and then iterate over the resources it references (images, stylesheets, scripts) to download them. I am not sure, but there should be an existing example on this topic.
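
To give an idea of the "iterate over resources" step, here is a dependency-free sketch that uses a regular expression in place of jsoup to list what a page references through src and href attributes. It is a rough approximation: a real parser such as jsoup copes with unquoted attributes, comments, and malformed markup, which this pattern does not, and href also picks up ordinary links to other pages. The class name, method name, and sample HTML are made up for illustration.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResourceLister {

    // Matches quoted src="..." or href='...' attribute values
    private static final Pattern ATTR = Pattern.compile(
            "(?:src|href)\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URLs of everything the HTML references via src or href. */
    public static Set<String> listResources(String html, URI baseUri) {
        Set<String> urls = new LinkedHashSet<>(); // keeps order, drops duplicates
        Matcher m = ATTR.matcher(html);
        while (m.find()) {
            // resolve() turns relative paths such as "style.css" into absolute URLs;
            // it throws IllegalArgumentException if the value is not a valid URI
            urls.add(baseUri.resolve(m.group(1)).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<link href=\"style.css\"><img src='/img/logo.png'>"
                + "<script src=\"https://cdn.example.com/app.js\"></script>";
        for (String url : listResources(html, URI.create("http://example.com/docs/index.html"))) {
            System.out.println(url);
        }
    }
}
```

Each URL in the returned set can then be passed to whichever download routine you prefer.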

How can I download and save a file from the Internet using Java?

Give Java NIO a try:

URL website = new URL("http://www.website.com/information.asp");
// try-with-resources closes the channel and the file even if the transfer fails
try (ReadableByteChannel rbc = Channels.newChannel(website.openStream());
     FileOutputStream fos = new FileOutputStream("information.html")) {
    fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
}

Using transferFrom() is potentially much more efficient than a simple loop that reads from the source channel and writes to this channel. Many operating systems can transfer bytes directly from the source channel into the filesystem cache without actually copying them.

See the Javadoc for FileChannel.transferFrom() for details.

Note: The third parameter of transferFrom() is the maximum number of bytes to transfer. Integer.MAX_VALUE will transfer at most 2^31 - 1 bytes (just under 2 GiB); Long.MAX_VALUE allows at most 2^63 - 1 bytes (larger than any file in existence).
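
Because transferFrom() is allowed to move fewer bytes than requested, a more defensive variant calls it in a loop until nothing more arrives. The sketch below shows that idea; the class name, method name, and the 16 MiB chunk size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class NioDownloader {

    // Upper bound per transferFrom() call; the value is an arbitrary choice
    private static final long CHUNK = 16L * 1024 * 1024;

    /** Copies the content behind source into target and returns the byte count. */
    public static long download(URL source, Path target) throws IOException {
        try (ReadableByteChannel in = Channels.newChannel(source.openStream());
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long transferred;
            // Keep calling transferFrom() until it reports that nothing
            // more was transferred, advancing the write position each time
            while ((transferred = out.transferFrom(in, position, CHUNK)) > 0) {
                position += transferred;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java NioDownloader <url> <target-file>");
            return;
        }
        long bytes = download(URI.create(args[0]).toURL(), Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes");
    }
}
```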

Downloading a website to a string

You can get the text using an InputStreamReader like this.

try {
    URL url = new URL("http://yourwebpage.com");
    // Read all the text returned by the server
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    String str;
    while ((str = in.readLine()) != null) {
        // str is one line of text; readLine() strips the newline character(s)
        // You can use the contains() method here.
        if (str.contains(editText.getText().toString())) {
            // You can perform your logic here
        }
    }
    in.close();
} catch (MalformedURLException e) {
    // The URL string is not valid
} catch (IOException e) {
    // The page could not be read
}

Also add an additional permission in your app's manifest file:

<uses-permission android:name="android.permission.INTERNET" />
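
The read-and-search loop can also be pulled into a small helper that stops reading at the first matching line. This is a plain-Java sketch with no Android dependencies; the class and method names are hypothetical, and it assumes the page is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageSearch {

    /** Returns true as soon as any line behind the URL contains needle. */
    public static boolean pageContains(URL url, String needle) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            // lines() is lazy, so anyMatch stops reading at the first hit.
            // A read error inside the stream surfaces as UncheckedIOException.
            return in.lines().anyMatch(line -> line.contains(needle));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java PageSearch <url> <text>");
            return;
        }
        System.out.println(pageContains(URI.create(args[0]).toURL(), args[1]));
    }
}
```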

//============================EDIT================================//

if (group.length() > 0) {
    mProgressDialog = new ProgressDialog(this);
    mProgressDialog.setMessage("Checking for schedule changes...");
    mProgressDialog.show();
    try {
        URL url = new URL("http://www.augustinianum.eu/roosterwijzigingen/14062012.pdf");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String str;
        // Scan the response line by line for the group the user entered
        while ((str = in.readLine()) != null) {
            if (str.contains(mEtxtGroup.getText().toString())) {
                if (mProgressDialog.isShowing()) {
                    mProgressDialog.dismiss();
                }
                Toast.makeText(this, "You have a schedule change.", Toast.LENGTH_LONG).show();
                break;
            }
        }
        in.close();
    } catch (MalformedURLException e) {
        Toast.makeText(this, "An error occurred, please try again.", Toast.LENGTH_LONG).show();
    } catch (IOException e) {
        Toast.makeText(this, "An error occurred, please try again.", Toast.LENGTH_LONG).show();
    }

    if (mProgressDialog.isShowing()) {
        mProgressDialog.dismiss();
    }
} else {
    Toast.makeText(this, "Enter a class", Toast.LENGTH_LONG).show();
}
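
One caveat with the code above: on current Android versions, network access on the main (UI) thread throws NetworkOnMainThreadException, so the download has to run on a worker thread. The sketch below shows the general shape in plain Java with an ExecutorService. The class name, method name, and the stand-in check are invented for illustration; in an app, the result would be handed back to the UI thread (for example with runOnUiThread) before touching the dialog or showing a Toast.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundCheck {

    /** Runs the (possibly slow) check on a worker thread and returns a handle to its result. */
    public static Future<Boolean> checkInBackground(ExecutorService executor,
                                                    Callable<Boolean> check) {
        return executor.submit(check);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for the real network check
            Callable<Boolean> fakeCheck = () -> "class 4B: room change".contains("4B");
            Future<Boolean> result = checkInBackground(executor, fakeCheck);
            // get() blocks the calling thread, so an app would not call it
            // from the UI thread; it is used here only to show the result
            System.out.println("Schedule change found: " + result.get());
        } finally {
            executor.shutdown();
        }
    }
}
```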

