Read Input Stream Twice

Read input stream twice

You can use org.apache.commons.io.IOUtils.copy to copy the contents of the InputStream to a byte array, and then repeatedly read from the byte array using a ByteArrayInputStream. E.g.:

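import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.apache.commons.io.IOUtils;

// Copy the whole InputStream into memory once
ByteArrayOutputStream baos = new ByteArrayOutputStream();
IOUtils.copy(in, baos);
byte[] bytes = baos.toByteArray();

// either: wrap the same bytes in a fresh stream for every pass
while (needToReadAgain) {
    ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
    yourReadMethodHere(bais);
}

// or: reuse one stream and rewind it before every pass; reset() returns
// to the mark, which is the start of the array unless mark() was called
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
while (needToReadAgain) {
    bais.reset();
    yourReadMethodHere(bais);
}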
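If Commons IO is not on the classpath, the same idea can be sketched with the JDK alone, assuming Java 9+ for InputStream.readAllBytes(). The class name ReadTwiceJdk and the helpers slurp and countBytes are hypothetical; countBytes stands in for yourReadMethodHere.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadTwiceJdk {
    // Replaces the IOUtils.copy step: buffers the whole stream in memory once (Java 9+).
    static byte[] slurp(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    // Hypothetical stand-in for yourReadMethodHere: consumes one pass and counts the bytes.
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Pretend this is a one-shot stream such as a socket or an HTTP request body.
        InputStream once = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        byte[] bytes = slurp(once);

        ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
        for (int pass = 1; pass <= 3; pass++) {
            bais.reset(); // back to position 0, since mark() was never called
            System.out.println("pass " + pass + ": " + countBytes(bais) + " bytes");
        }
    }
}
```

The memory cost is the full size of the stream, so this suits inputs you know are reasonably small.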

Read part of an InputStream twice

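BufferedInputStream.markSupported() returns true (see the Javadoc). Simply wrap your stream in a BufferedInputStream, call mark() with a read limit bigger than X (the number of bytes you want to read twice), read those bytes, and then call reset() to return to the marked position and read them again.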
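A minimal sketch of that approach, assuming Java 11+ (for InputStream.readNBytes(int)). The class PeekWithMark and its peek helper are hypothetical names; X from the question becomes the parameter x. After peek() returns, the next read starts again from the marked position.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class PeekWithMark {
    // Reads up to x bytes, then rewinds so that later reads return the same bytes again.
    static byte[] peek(BufferedInputStream in, int x) throws IOException {
        in.mark(x + 1);                 // mark limit bigger than x
        byte[] head = in.readNBytes(x); // Java 11+; may return fewer bytes at end of stream
        in.reset();                     // jump back to the marked position
        return head;
    }

    public static void main(String[] args) throws IOException {
        // Any stream works here; BufferedInputStream adds mark/reset support on top of it.
        BufferedInputStream in = new BufferedInputStream(
                new ByteArrayInputStream("HEADER:body".getBytes(StandardCharsets.US_ASCII)));

        byte[] head = peek(in, 7);      // first read of the first 7 bytes
        byte[] all = in.readAllBytes(); // second read starts again from byte 0
        System.out.println(new String(head, StandardCharsets.US_ASCII)); // HEADER:
        System.out.println(new String(all, StandardCharsets.US_ASCII));  // HEADER:body
    }
}
```

Keep reading from the BufferedInputStream afterwards, not from the stream it wraps, because the buffered bytes live only in the wrapper.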

Reading an input stream twice without storing it in memory

It depends on the source of the stream.

If it's a local file, you can likely re-open and re-read the stream as many times as you want.

If it's dynamically generated by a process, a remote service, etc., you might not be free to re-generate it. In that case, you need to store it, either in memory or in some more persistent (and slow) storage like a file system or storage service.
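For the non-file case, here is a sketch of storing the stream in a file rather than in memory: it copies the one-shot stream to a temporary file with Files.copy, then re-opens that file once per pass, so only a small buffer is held in RAM at any time. The class and method names are hypothetical, and cleanup is left to the caller (here, Files.deleteIfExists in a finally block).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SpoolToTempFile {
    // Copies a one-shot stream to a temporary file; only a small buffer is held in RAM.
    static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spool-", ".bin");
        // createTempFile already created the file, so allow Files.copy to overwrite it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // One independent pass over the spooled copy, streaming through an 8 KiB buffer.
    static long countBytes(Path file) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream once = new ByteArrayInputStream("generated data".getBytes(StandardCharsets.UTF_8));
        Path copy = spool(once);
        try {
            System.out.println(countBytes(copy)); // first pass
            System.out.println(countBytes(copy)); // second pass re-opens the file
        } finally {
            Files.deleteIfExists(copy);
        }
    }
}
```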

Maybe an analogy would help. Suppose your friend is speaking to you at length. You listen carefully without interruption, but when they are done, you realize you didn't understand something they said near the beginning, and want to review that portion.

At this point, there are a few possibilities.

Perhaps your friend was actually reading aloud from a book. You can simply re-read the book.

Or perhaps you had the foresight to record their monologue. You can replay the recording.

However, since neither you nor your friend has perfect and unlimited recall, simply repeating verbatim what was said ten minutes ago from memory alone is not an option.

An InputStream is like your friend speaking. Neither of you has a good enough memory to remember exactly, word-for-word, what is said. In the same way, neither the process generating the data nor your program can be assumed to have enough RAM to hold the entire stream byte-for-byte. To scale, your program has to rely on its "short-term memory" (RAM), working with just a small portion of the whole stream at any given time, and "taking notes" (writing to a persistent store) as it encounters important points.

If the source of the stream is a local file, then it's like your friend reading a book. Either of you can re-read that content easily enough.

If you copy the stream to some persistent storage, that's like recording your friend's speech. You can replay it as often as you like.

Consider a scenario where a browser is uploading a large file, but the server is busy and not able to read that stream for some time. Where is that data stored during the delay?

Because the receiver can't always respond immediately to input, TCP and many other protocols allocate a small buffer to store some data from a sender. But they also have a way to tell the sender to wait because it is sending data too fast; this is called flow control. Going back to the analogy, it's like telling your friend to pause a moment while you catch up with your note-taking.

As the browser uploads the file, the buffer fills up at first. But if the server can't keep up, the browser will be instructed to pause its upload until there is more room in the buffer. (This generally happens at the OS and TCP level; the client and server applications don't manage this directly.) The upload speed depends on how fast the browser can read the file from disk, how fast the network link is, and how fast the server can process the uploaded data. Even a fast network and client will be limited by the weakest link in this chain.
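TCP flow control happens inside the OS, so it can't be shown directly from Java code, but a PipedInputStream with a deliberately small buffer behaves in a loosely similar way within one process: when the buffer is full, the writer thread blocks in write() until the reader makes room. This is an in-process analogy only, not how TCP is implemented; the class name and the sizes are illustrative.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

class BackpressureDemo {
    // Sends `total` bytes through a pipe whose buffer holds only `pipeSize` bytes.
    // Whenever the buffer is full, the writer thread blocks inside write() until the
    // reader drains it, so the fast producer is paced by the consumer.
    static int transfer(int total, int pipeSize) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream(pipeSize);
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            try (out) {                  // closing signals end-of-stream to the reader
                for (int i = 0; i < total; i++) {
                    out.write(i & 0xFF); // blocks while the pipe buffer is full
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        int received = 0;
        int b;
        while ((b = in.read()) != -1) {  // the reader sets the pace
            if (b != (received & 0xFF)) {
                throw new IllegalStateException("bytes arrived out of order");
            }
            received++;
        }
        writer.join();
        in.close();
        return received;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transfer(10_000, 16)); // 10000
    }
}
```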

How to read same inputstream twice without both inputstreams stopping from working

It sounds like you are trying to display the data from an input stream in two distinct views. You can't really read a plain InputStream twice; instead, you need some kind of buffering. So rather than trying to read it twice, I suggest modifying your code in one of two ways, both of which involve using only a single read loop for each stream:

1. Modify your read loop to update an internal buffer instead of calling logarea.setText(logarea.getText() + addtext). Whenever the read loop updates the buffer, any interested views should be notified (via some sort of observer pattern that you implement) that the contents have changed.
2. Modify the read loop to update all interested views instead of having a separate read loop for each JFrame. The read loop would need access to a list of log areas instead of a single logarea; it would loop through the list and update each log area using the same logic you now use for logarea.

The key thing is to never have two read loops accessing the same input stream.
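A minimal sketch combining both suggestions, with plain Consumer<String> callbacks standing in for the log areas (the class and method names are hypothetical). In a real Swing application, each callback that touches a component should hand the update to the event dispatch thread, for example with SwingUtilities.invokeLater, because the read loop normally runs on a background thread.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

class SingleReadLoop {
    // Shared buffer of everything read so far (the first suggestion).
    private final StringBuilder buffer = new StringBuilder();
    // Interested views (the second suggestion); each Consumer stands in for one log area.
    private final List<Consumer<String>> views = new CopyOnWriteArrayList<>();

    void addView(Consumer<String> view) {
        views.add(view);
    }

    // The only loop that ever reads this stream.
    void pump(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String addText = line + "\n";
                synchronized (buffer) {
                    buffer.append(addText);
                }
                for (Consumer<String> view : views) {
                    view.accept(addText); // notify every interested view of the new text
                }
            }
        }
    }

    String contents() {
        synchronized (buffer) {
            return buffer.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        SingleReadLoop loop = new SingleReadLoop();
        StringBuilder viewA = new StringBuilder();
        StringBuilder viewB = new StringBuilder();
        loop.addView(viewA::append);
        loop.addView(viewB::append);
        loop.pump(new ByteArrayInputStream("first\nsecond\n".getBytes(StandardCharsets.UTF_8)));
        System.out.println(viewA.toString().equals(viewB.toString())); // true
    }
}
```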

Can't read the same InputStream twice

I found out that an InputStream can't be read twice, which is what my old code tried to do by passing the same stream to both Tika and Boilerpipe. So I figured I could read fileStream into a String, pass that to Boilerpipe, then convert the String to a ByteArrayInputStream and pass that to Tika.
This is my new code.

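import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import de.l3s.boilerpipe.extractors.DefaultExtractor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// getFile() method returns the input stream of a local or online file
InputStream fileStream = getFile(source);

// Read the value of the InputStream and pass it to the
// Boilerpipe DefaultExtractor in order to extract the text
// (readFromStream(), not shown, reads the whole stream into a String)
String html = readFromStream(fileStream);
String text = DefaultExtractor.INSTANCE.getText(html);

// fileStream has now been consumed, so convert the value read from it
// to a new ByteArrayInputStream for the second reader
fileStream = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));

// Extract text and metadata via Apache Tika
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
AutoDetectParser parser = new AutoDetectParser();
parser.parse(fileStream, handler, metadata, context);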
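Going through a String works when the input is text in the charset used for decoding, such as UTF-8 HTML; whether it is safe here also depends on how readFromStream (not shown) decodes the bytes. For arbitrary or binary input (a PDF, say), decoding and re-encoding can change bytes, so a safer variant keeps the raw byte[] and gives each consumer its own ByteArrayInputStream. The sketch below, assuming Java 9+, shows both routes without Tika or Boilerpipe; the class and method names are hypothetical, and in practice the stream passed to readOnce would be the one-shot fileStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripCheck {
    // Reads the stream once, keeping raw bytes rather than a decoded String (Java 9+).
    static byte[] readOnce(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    // What a decode-to-String-then-getBytes round trip does to the data.
    static byte[] viaUtf8String(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // 0xFF never appears in valid UTF-8; it stands in for arbitrary binary content.
        byte[] original = {'%', 'P', 'D', 'F', (byte) 0xFF};
        byte[] bytes = readOnce(new ByteArrayInputStream(original));

        // Each consumer gets its own fresh stream over the same, unmodified bytes.
        byte[] forFirstConsumer = readOnce(new ByteArrayInputStream(bytes));
        byte[] forSecondConsumer = readOnce(new ByteArrayInputStream(bytes));

        System.out.println(Arrays.equals(original, forFirstConsumer));     // true
        System.out.println(Arrays.equals(original, forSecondConsumer));    // true
        System.out.println(Arrays.equals(original, viaUtf8String(bytes))); // false: 0xFF became U+FFFD
    }
}
```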

Getting an InputStream to read more than once, regardless of markSupported()

You can't necessarily read an InputStream more than once. Some implementations support it, some don't. You are checking the markSupported() method, which does indicate whether the stream supports mark() and reset() (and therefore whether you can go back and read the same data again), but then you ignore the result. You have to act on what that method returns: if it returns false, you can't re-read the stream and need to make other arrangements.
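One way to act on the result, sketched under the assumption that buffering part of the stream in memory is acceptable: return the stream unchanged when markSupported() is true, and otherwise wrap it in a BufferedInputStream, which supports mark/reset within the read limit passed to mark(). The class name MarkAware is hypothetical. This only helps for data that fits within that read limit; it does not make a live file re-readable from the start.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

class MarkAware {
    // Acts on markSupported(): returns the stream as-is if mark/reset works,
    // otherwise wraps it in a BufferedInputStream, which supports mark/reset.
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        // SequenceInputStream does not support mark/reset, so it gets wrapped.
        InputStream raw = new SequenceInputStream(
                new ByteArrayInputStream("ab".getBytes(StandardCharsets.US_ASCII)),
                new ByteArrayInputStream("cd".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(raw.markSupported()); // false

        InputStream in = markable(raw);
        in.mark(16); // read limit: bytes that may be read before the mark can be invalidated
        String first = new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        in.reset();
        String second = new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        System.out.println(first + " " + second); // abcd abcd
    }
}
```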

Edit (in response to comment): When I wrote my answer, the "other arrangements" I had in mind were getting a fresh InputStream. However, having read your comments on the question about what you want to do, I'm not sure that is possible. For the basics of the operation, you probably want RandomAccessFile (at least that would be my first guess, and if it worked, it would be the easiest option). However, you will have file access issues: if one application is actively writing to a file while another reads it, you will run into problems, and exactly which problems depends on the OS, so any solution would need more testing. I suggest asking a separate question on SO focused on that point; someone who has tried it may be able to give you more insight.
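The RandomAccessFile part can be sketched as follows: seek() moves the file pointer, so the same region can be read again without reopening the file. The class and helper names are hypothetical, and the sketch does not address the concurrent-writer problems described above; if another process is writing to the file, its length and contents may change between reads.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RereadRegion {
    // Seeks to `pos` and reads exactly `len` bytes; can be called again for the same region.
    static byte[] readAt(RandomAccessFile raf, long pos, int len) throws IOException {
        byte[] buf = new byte[len];
        raf.seek(pos);      // move the file pointer; no need to reopen the file
        raf.readFully(buf); // throws EOFException if fewer than len bytes remain
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("reread-", ".log");
        try {
            Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
                String first = new String(readAt(raf, 3, 4), StandardCharsets.US_ASCII);
                String second = new String(readAt(raf, 3, 4), StandardCharsets.US_ASCII);
                System.out.println(first + " " + second); // 3456 3456
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```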

How to read a DataInputStream twice or more than twice?

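I would use a ByteArrayInputStream or something else that you can reset. That involves reading all of the data from the original stream into a byte array first, and then creating the ByteArrayInputStream (wrapped in a new DataInputStream if you need its methods) from that array.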

InputStream has a markSupported() method that you can check on both the original stream and the ByteArrayInputStream to find one that supports mark():

https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#markSupported()
https://docs.oracle.com/javase/7/docs/api/java/io/ByteArrayInputStream.html
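A sketch of that approach for DataInputStream, assuming Java 9+ for readAllBytes(): drain the original stream once, wrap the bytes in a ByteArrayInputStream, and put a new DataInputStream on top. DataInputStream delegates mark() and reset() to the stream it wraps, so markSupported() returns true here. The class name and the rewindable helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

class DataStreamTwice {
    // Drains the original one-shot stream into memory (Java 9+) and returns a
    // DataInputStream over a ByteArrayInputStream, which supports mark/reset.
    static DataInputStream rewindable(InputStream original) throws IOException {
        byte[] bytes = original.readAllBytes();
        return new DataInputStream(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws IOException {
        // Build some sample data in DataOutputStream format.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(42);
            out.writeUTF("hello");
        }

        DataInputStream in = rewindable(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(in.markSupported()); // true: delegated to the ByteArrayInputStream
        in.mark(Integer.MAX_VALUE);             // the limit is ignored by ByteArrayInputStream
        for (int pass = 1; pass <= 2; pass++) {
            System.out.println(in.readInt() + " " + in.readUTF()); // 42 hello
            in.reset();                         // back to the mark for the next pass
        }
    }
}
```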
