Why Do You Have to Call Urlconnection#Getinputstream to Be Able to Write Out to Urlconnection#Getoutputstream

Why do you have to call URLConnection#getInputStream to be able to write out to URLConnection#getOutputStream?

The API for URLConnection and HttpURLConnection are (for better or worse) designed for the user to follow a very specific sequence of events:

Set Request Properties
(Optional) getOutputStream(), write to the stream, close the stream
getInputStream(), read from the stream, close the stream

If your request is a POST or PUT, you need the optional step #2.

To the best of my knowledge, the OutputStream is not like a socket, it is not directly connected to an InputStream on the server. Instead, after you close or flush the stream, AND call getInputStream(), your output is built into a Request and sent. The semantics are based on the assumption that you will want to read the response. Every example that I've seen shows this order of events. I would certainly agree with you and others that this API is counterintuitive when compared to the normal stream I/O API.

The tutorial you link to states that "URLConnection is an HTTP-centric class". I interpret that to mean that the methods are designed around a Request-Response model, and make the assumption that is how they will be used.

For what it's worth, I found this bug report that explains the intended operation of the class better than the javadoc documentation. The evaluation of the report states "The only way to send out the request is by calling getInputStream."

Why when I use HttpURLConnection to post it requires to have both OutputSteam and InputStream in order to make the post request?

The OutputStream is what you (as the client) are going to write your HTTP POST data to. The InputStream, in this case, will be the response from the server. Presumably, the server doesn't commit the transaction until the response completes.

Can you explain the HttpURLConnection connection process?

String message = URLEncoder.encode("my message", "UTF-8");

try {
    // instantiate the URL object with the target URL of the resource to
    // request
    URL url = new URL("http://www.example.com/comment");

    // instantiate the HttpURLConnection with the URL object - A new
    // connection is opened every time by calling the openConnection
    // method of the protocol handler for this URL.
    // 1. This is the point where the connection is opened.
    HttpURLConnection connection = (HttpURLConnection) url
            .openConnection();
    // set connection output to true
    connection.setDoOutput(true);
    // instead of a GET, we're going to send using method="POST"
    connection.setRequestMethod("POST");

    // instantiate OutputStreamWriter using the output stream, returned
    // from getOutputStream, that writes to this connection.
    // 2. This is the point where you'll know if the connection was
    // successfully established. If an I/O error occurs while creating
    // the output stream, you'll see an IOException.
    OutputStreamWriter writer = new OutputStreamWriter(
            connection.getOutputStream());

    // write data to the connection. This is data that you are sending
    // to the server
    // 3. No. Sending the data is conducted here. We established the
    // connection with getOutputStream
    writer.write("message=" + message);

    // Closes this output stream and releases any system resources
    // associated with this stream. At this point, we've sent all the
    // data. Only the outputStream is closed at this point, not the
    // actual connection
    writer.close();
    // if there is a response code AND that response code is 200 OK, do
    // stuff in the first if block
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
        // OK

        // otherwise, if any other status code is returned, or no status
        // code is returned, do stuff in the else block
    } else {
        // Server returned HTTP error code.
    }
} catch (MalformedURLException e) {
    // ...
} catch (IOException e) {
    // ...
}

The first 3 answers to your questions are listed as inline comments, beside each method, in the example HTTP POST above.

From getOutputStream:

Returns an output stream that writes to this connection.

Basically, I think you have a good understanding of how this works, so let me just reiterate in layman's terms. getOutputStream basically opens a ~~connection~~ stream, with the intention of writing data to the server. In the above code example "message" could be a comment that we're sending to the server that represents a comment left on a post. When you see getOutputStream, you're opening the ~~connection~~ stream for writing, but you don't actually write any data until you call writer.write("message=" + message);.

From getInputStream():

Returns an input stream that reads from this open connection. A SocketTimeoutException can be thrown when reading from the returned input stream if the read timeout expires before data is available for read.

getInputStream does the opposite. Like getOutputStream, it also opens a ~~connection~~ stream, but the intent is to read data from the server, not write to it. If the connection or stream-opening fails, you'll see a SocketTimeoutException.

How about the getInputStream? Since I'm only able to get the response at getInputStream, then does it mean that I didn't send any request at getOutputStream yet but simply establishes a connection?

Keep in mind that sending a request and sending data are two different operations. When you invoke ~~getOutputStream or getInputStream~~ url.openConnection(), you send a request to the server to establish a connection. There is a handshake that occurs where the server sends back an acknowledgement to you that the connection is established. It is then at that point in time that you're prepared to send or receive data. Thus, you do not need to call getOutputStream to ~~establish a connection~~ open a stream, unless your purpose for making the request is to send data.

In layman's terms, making a getInputStream request is the equivalent of making a phone call to your friend's house to say "Hey, is it okay if I come over and borrow that pair of vice grips?" and your friend establishes the handshake by saying, "Sure! Come and get it". Then, at that point, the connection is made, you walk to your friend's house, knock on the door, request the vice grips, and walk back to your house.

Using a similar example for getOutputStream would involve calling your friend and saying "Hey, I have that money I owe you, can I send it to you"? Your friend, needing money and sick inside that you kept it for so long, says "Sure, come on over you cheap bastard". So you walk to your friend's house and "POST" the money to him. He then kicks you out and you walk back to your house.

Now, continuing with the layman's example, let's look at some Exceptions. If you called your friend and he wasn't home, that could be a 500 error. If you called and got a disconnected number message because your friend is tired of you borrowing money all the time, that's a 404 page not found. If your phone is dead because you didn't pay the bill, that could be an IOException. (NOTE: This section may not be 100% correct. It's intended to give you a general idea of what's happening in layman's terms.)

Question #5:

~~Yes, you are correct that openConnection simply creates a new connection object but does not establish it. The connection is established when you call either getInputStream or getOutputStream.~~

openConnection creates a new connection object. From the URL.openConnection javadocs:

A new connection is opened every time by calling the openConnection method of the protocol handler for this URL.

The connection is established when you call openConnection, and the InputStream, OutputStream, or both, are called when you instantiate them.

Question #6:

To measure the overhead, I generally wrap some very simple timing code around the entire connection block, like so:

long start = System.currentTimeMillis();
log.info("Time so far = " + new Long(System.currentTimeMillis() - start) );

// run the above example code here
log.info("Total time to send/receive data = " + new Long(System.currentTimeMillis() - start) );

I'm sure there are more advanced methods for measuring the request time and overhead, but this generally is sufficient for my needs.

For information on closing connections, which you didn't ask about, see In Java when does a URL connection close?.

Why do I need getInputStream for HttpUrlConnection to send request?

You don't, but you do have to call either getInputStream() or getResponseCode(). Otherwise nothing is sent, but also otherwise you don't have any way of knowing whether the call succeeded or not.

Safe use of HttpURLConnection

is it safe to close an InputStream
before all of it's content has been
read

You need to read all of the data in the input stream before you close it so that the underlying TCP connection gets cached. I have read that it should not be required in latest Java, but it was always mandated to read the whole response for connection re-use.

Check this post: keep-alive in java6

URLConnection.getInputStream() uses too much memory

Your application is reading and writing using a byte array as a buffer. This could be allocated once and re-used for all of the files. (In fact, you are probably doing that already ... though you haven't shown us the actual code.)

If you read and write using a large byte[] as a buffer (as you are currently doing), then there is no need to use BufferedInputStream. (Using BufferedInputStream won't improve performance relative to using a buffer explicitly.) And since each time you create a new BufferedInputStream it is allocating a new byte array as the internal buffer, you will find that reading directly from the InputStream (i.e. us) should save memory, and not cost you any performance.

Your ideas were:

Reusing such streams

You can't do that with the standard Java APIs.

Explicitly pointing the size

I assume that you mean creating buffers whose size exactly match the size of the input streams's content.

That won't help if you recycle the buffer (as I suggested)
It probably wouldn't help anyway. At the base level, your code will be reading from a socket stream, and the reads typically won't fill the buffer anyway. (Reading from a socket will deliver the data that is currently available in the local TCP protocol stack .... not the entire stream content ... in one read call`.)
Beyond a few Kbytes, increasing the buffer size has little performance benefit. (Your existing 64 KB buffer size is probably not helping throughput.)

Why Do You Have to Call Urlconnection#Getinputstream to Be Able to Write Out to Urlconnection#Getoutputstream