How to Read PDF Files Using Java

How to read PDF files using Java?

PDFBox is the best library I've found for this purpose. It's comprehensive and quite easy to use if you're just doing basic text extraction. Examples can be found on the Apache PDFBox website.

The page explains this, but one thing to watch out for is that the page numbers passed to setStartPage() and setEndPage() are both inclusive (and pages are numbered from 1). I skipped over that explanation the first time round, and it took me a while to realise why I was getting more than one page back with each call!
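To make the inclusive bounds concrete, here is a small stand-in that works on a plain list of page texts rather than on PDFBox itself. The PageRangeDemo class and its extractPages method are hypothetical. They only mimic the 1-based, inclusive-at-both-ends behaviour described above, so the arithmetic can be checked without a PDF on hand.

```java
import java.util.Arrays;
import java.util.List;

public class PageRangeDemo {

    // Hypothetical stand-in for a text stripper: pages are numbered from 1,
    // and both startPage and endPage are included in the result.
    public static String extractPages(List<String> pages, int startPage, int endPage) {
        StringBuilder text = new StringBuilder();
        int last = Math.min(endPage, pages.size());
        for (int page = Math.max(startPage, 1); page <= last; page++) {
            // Convert the 1-based page number to a 0-based list index.
            text.append(pages.get(page - 1));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("[page 1]", "[page 2]", "[page 3]");

        // Treating the end as exclusive (start = 1, end = 2) returns two pages, not one.
        System.out.println(extractPages(pages, 1, 2));

        // To get exactly one page per call, set both bounds to the same number.
        for (int page = 1; page <= pages.size(); page++) {
            System.out.println(extractPages(pages, page, page));
        }
    }
}
```

With PDFBox the same pattern would mean calling setStartPage(n) and setEndPage(n) with the same n before each getText() call.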

iText is another alternative that is also available for C#, though I've personally never used it. It's lower-level than PDFBox, so it's less suited to the job if all you need is basic text extraction.

How to read a PDF file online and save it on the local machine using Java

Your PDF link actually redirects to https://www.gnostice.com/downloads.asp, so there is no PDF directly behind the link.

Try another link: first check in a browser of your choice that opening the PDF's URL actually renders a PDF.

The code below is practically the same as yours except for the PDF's URL and the output path. I have also added a throws clause to the main method's signature and print the content type.

It works as expected:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.net.URL;
import java.net.URLConnection;

public class PdfFileReader {
    public static void main(String[] args) throws IOException {
        URL pdfUrl = new URL("http://www.crdp-strasbourg.fr/je_lis_libre/livres/Anonyme_LesMilleEtUneNuits1.pdf");
        byte[] ba1 = new byte[1024];
        int baLength;

        // Open one connection and reuse it for both the header and the body.
        URLConnection urlConn = pdfUrl.openConnection();
        System.out.println("The content type is: " + urlConn.getContentType());

        // try-with-resources closes both streams, even if the copy fails.
        try (InputStream is1 = urlConn.getInputStream();
             FileOutputStream fos1 = new FileOutputStream("c:\\mybook.pdf")) {
            // read() returns -1 at end of stream; write only the bytes actually read.
            while ((baLength = is1.read(ba1)) != -1) {
                fos1.write(ba1, 0, baLength);
            }
        } catch (ConnectException ce) {
            System.out.println("FAILED.\n[" + ce.getMessage() + "]\n");
        }
    }
}

Output:

The content type is: application/pdf
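A redirect to an HTML page, like the one mentioned above, can also be caught after the fact by looking at the first bytes of whatever was downloaded, since a PDF file starts with the marker %PDF-. The sketch below assumes a hypothetical helper class, PdfSniffer, and is deliberately strict: some PDF readers are reported to tolerate a few stray bytes before the marker, so treat a false result as a warning rather than proof.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PdfSniffer {

    private static final byte[] PDF_MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the stream begins with the "%PDF-" marker.
    // Consumes up to five bytes from the stream.
    public static boolean startsWithPdfMarker(InputStream in) throws IOException {
        for (byte expected : PDF_MARKER) {
            // read() returns -1 at end of stream, which never matches a marker byte.
            if (in.read() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // In-memory samples so the example runs without network access.
        byte[] pdfLike = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] htmlLike = "<!DOCTYPE html>".getBytes(StandardCharsets.US_ASCII);

        System.out.println(startsWithPdfMarker(new ByteArrayInputStream(pdfLike)));  // true
        System.out.println(startsWithPdfMarker(new ByteArrayInputStream(htmlLike))); // false
    }
}
```

To check a saved download, pass a FileInputStream opened on the output file instead of the in-memory samples.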

How to read a PDF file and write it to an OutputStream

import java.io.*;

public class FileRead {

    public static void main(String[] args) throws IOException {
        File f = new File("C:\\Documents and Settings\\abc\\Desktop\\abc.pdf");
        byte[] buf = new byte[8192];

        // try-with-resources closes both streams even if the copy throws.
        try (InputStream is = new FileInputStream(f);
             OutputStream oos = new FileOutputStream("test.pdf")) {
            int c;
            // read() returns -1 at end of stream; write only the bytes actually read.
            while ((c = is.read(buf, 0, buf.length)) != -1) {
                oos.write(buf, 0, c);
            }
        }
        System.out.println("stop");
    }
}

The easiest way so far. Hope this helps.
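If the manual buffer loop is not needed, the standard library can do the same copy in one call: Files.copy(Path, OutputStream) has been available since Java 7. The sketch below is one way to use it. The class name PdfToStream and the temporary stand-in file are assumptions made so that the example runs without a real PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PdfToStream {

    // Writes every byte of the file at source to out and returns the byte count.
    // The caller stays responsible for closing out.
    public static long writeTo(Path source, OutputStream out) throws IOException {
        return Files.copy(source, out);
    }

    public static void main(String[] args) throws IOException {
        // A temporary stand-in file; replace with the path of a real PDF.
        Path source = Files.createTempFile("sample", ".pdf");
        Files.write(source, "%PDF-1.4 demo".getBytes(StandardCharsets.US_ASCII));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = writeTo(source, out);
        System.out.println(copied + " bytes copied");

        Files.delete(source);
    }
}
```

Any OutputStream works as the target, for example a FileOutputStream or a servlet response stream.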

How to read a PDF file in Java

Yes, it is possible. To read a PDF file from Java, use Apache PDFBox. PDFBox allows the creation of new PDF documents, the manipulation of existing documents, and the extraction of content from documents. Apache PDFBox also includes several command-line utilities.


