How to read PDF files using Java?
PDFBox is the best library I've found for this purpose. It's comprehensive and quite easy to use if you're just doing basic text extraction. Examples can be found in the PDFBox documentation.
The documentation explains this, but one thing to watch out for is that the start and end indexes passed to setStartPage() and setEndPage() are both inclusive. I skipped over that explanation the first time round, and it then took me a while to realise why I was getting more than one page back with each call!
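To make the inclusive behaviour concrete, here is a minimal sketch that only computes the page numbers you would pass to setStartPage() and setEndPage(); it does not call PDFBox itself. The class name PageRanges and the method inclusiveChunks are hypothetical names made up for this illustration, and the sketch assumes page numbers start at 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of PDFBox, it only computes page numbers.
class PageRanges {

    /**
     * Returns inclusive, 1-based {start, end} pairs that cover pages
     * 1..totalPages in chunks of at most pagesPerChunk pages.
     */
    static List<int[]> inclusiveChunks(int totalPages, int pagesPerChunk) {
        if (totalPages < 0 || pagesPerChunk < 1) {
            throw new IllegalArgumentException("totalPages >= 0 and pagesPerChunk >= 1 required");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 1; start <= totalPages; start += pagesPerChunk) {
            // Both ends are inclusive, so a chunk of n pages ends at start + n - 1.
            int end = Math.min(start + pagesPerChunk - 1, totalPages);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // With one page per chunk, start and end are the same number.
        for (int[] range : inclusiveChunks(3, 1)) {
            System.out.println("setStartPage(" + range[0] + "), setEndPage(" + range[1] + ")");
        }
    }
}
```

The point to take away is that a single page means start and end are equal; using end = start + 1 would return two pages per call.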
iText is another alternative that also works with C#, though I've personally never used it. It's lower level than PDFBox, so it's less suited to the job if all you need is basic text extraction.
How to read a PDF file online and save it on the local machine using Java
Your PDF link actually redirects to https://www.gnostice.com/downloads.asp, so there is no PDF directly behind the link.
Try another link: first check in a browser of your choice that opening the PDF's URL renders a real PDF.
The code below is practically the same as yours except for the PDF's URL and the output path. I have also added a throws clause to the main method's signature and print the content type.
It works as expected:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.net.URL;
import java.net.URLConnection;

public class PdfFileReader {
    public static void main(String[] args) throws IOException {
        URL pdfUrl = new URL("http://www.crdp-strasbourg.fr/je_lis_libre/livres/Anonyme_LesMilleEtUneNuits1.pdf");
        byte[] ba1 = new byte[1024];
        int baLength;
        try {
            URLConnection urlConn = pdfUrl.openConnection();
            System.out.println("The content type is: " + urlConn.getContentType());
            // Read from the same connection; try-with-resources closes both streams,
            // even if the copy fails part way through.
            try (InputStream is1 = urlConn.getInputStream();
                 FileOutputStream fos1 = new FileOutputStream("c:\\mybook.pdf")) {
                while ((baLength = is1.read(ba1)) != -1) {
                    fos1.write(ba1, 0, baLength);
                }
            }
        } catch (ConnectException ce) {
            System.out.println("FAILED.\n[" + ce.getMessage() + "]\n");
        }
    }
}
Output:
The content type is: application/pdf
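The content type is only what the server claims. As an extra guard against a link that silently redirects to an HTML page, you could also peek at the first bytes of the response, since a PDF file normally begins with the characters %PDF-. The following is a minimal sketch of that check; the class name PdfSignatureCheck and the method looksLikePdf are hypothetical, it assumes Java 11 or later (for readNBytes), and it uses in-memory bytes instead of a real download.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not taken from the original answer.
class PdfSignatureCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Peeks at the first bytes of the stream and reports whether they match
     * the PDF header. The stream is reset afterwards, so the caller can still
     * read the complete content from the beginning.
     */
    static boolean looksLikePdf(BufferedInputStream in) throws IOException {
        in.mark(PDF_MAGIC.length);
        byte[] head = in.readNBytes(PDF_MAGIC.length); // Java 11+
        in.reset();
        return Arrays.equals(head, PDF_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        byte[] pdfBytes = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] htmlBytes = "<!DOCTYPE html><html></html>".getBytes(StandardCharsets.US_ASCII);

        System.out.println(looksLikePdf(new BufferedInputStream(new ByteArrayInputStream(pdfBytes))));
        System.out.println(looksLikePdf(new BufferedInputStream(new ByteArrayInputStream(htmlBytes))));
    }
}
```

In a download like the one above, you would wrap the connection's input stream in a BufferedInputStream, call the check once, and only start writing the file if it returns true.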
How to read pdf file and write it to outputStream
import java.io.*;

public class FileRead {
    public static void main(String[] args) throws IOException {
        File f = new File("C:\\Documents and Settings\\abc\\Desktop\\abc.pdf");
        byte[] buf = new byte[8192];
        // try-with-resources closes both streams, even if the copy fails
        try (InputStream is = new FileInputStream(f);
             OutputStream oos = new FileOutputStream("test.pdf")) {
            int c;
            while ((c = is.read(buf, 0, buf.length)) > 0) {
                oos.write(buf, 0, c);
            }
        }
        System.out.println("stop");
    }
}
The easiest way so far. Hope this helps.
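If you are on Java 7 or later, the manual buffer loop can also be replaced by Files.copy(Path, OutputStream) from java.nio.file, which does the copying for you and returns the number of bytes written. Here is a minimal sketch; the class name PdfToStream and the method copyTo are hypothetical, and the demo writes a small temporary file rather than using a real PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not taken from the original answer.
class PdfToStream {

    /** Copies every byte of the file to the given stream and returns the byte count. */
    static long copyTo(Path source, OutputStream out) throws IOException {
        return Files.copy(source, out);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real PDF: a temporary file holding a few bytes.
        Path tmp = Files.createTempFile("sample", ".pdf");
        try {
            Files.write(tmp, "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            long copied = copyTo(tmp, out);
            System.out.println("Copied " + copied + " bytes");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The same call works with any OutputStream, for example a FileOutputStream or a servlet response stream. Files.copy does not close the output stream, so the caller remains responsible for that.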
How to read pdf file in java
Yes, it is possible. To read a PDF file from Java, take a look at Apache PDFBox. PDFBox allows the creation of new PDF documents, the manipulation of existing documents, and the extraction of content from documents. Apache PDFBox also includes several command-line utilities.