Persistent TCP Connection in Rails App

Persistent/Keep-Alive HTTP Connection Using POST In Rails

I think the Net::HTTP::Persistent library is what you are looking for. There's also this library, which goes one step further by implementing connection pools on top of persistent connections. But since it sounds like you just have one API endpoint, that might be overkill.
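
A minimal sketch of what that can look like with the net-http-persistent gem (the endpoint URL and JSON payload are placeholders, and the constructor call follows the gem's 3.x keyword API):

require "net/http/persistent"
require "json"

http = Net::HTTP::Persistent.new(name: "my_app")
uri  = URI("https://example.com/api/endpoint")

post = Net::HTTP::Post.new(uri.path)
post["Content-Type"] = "application/json"
post.body = { event: "signup", user_id: 42 }.to_json

# subsequent calls to http.request reuse the same TCP connection
response = http.request(uri, post)
puts response.code

http.shutdown  # cleanly close the pooled connections when done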

Some additional thoughts: if you are really after raw speed, it might be worth sending a single multipart POST request to reduce the overhead even further. This would come down to implementing a reverse server push.

For this to work, your Rails app would need to accept a chunk-encoded request. This is important, as we are continuously streaming data to the server without knowing in advance how long the resulting message body will ultimately be. HTTP/1.1 requires all messages (that is, responses and requests) to either be chunk-encoded or have their body size specified by a Content-Length header (cf. RFC 2616, section 4.4). However, most clients prefer the latter option, which results in some web servers not handling chunk-encoded requests well (e.g. nginx did not implement this before v1.3.9).
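
On the Rails side, a controller that accepts such a request could read the raw body roughly like this (just a sketch; the controller name is made up, and whether the body arrives as a true stream or fully buffered depends on the web server in front of the app):

class IngestController < ApplicationController
  # POST /api/endpoint
  def create
    # request.body is the Rack input stream; with a chunk-encoded
    # request there is no Content-Length to rely on, so read until
    # the stream is exhausted.
    payload = request.body.read
    Rails.logger.info("received #{payload.bytesize} bytes")
    head :ok
  end
end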

As a serialization format, I can safely recommend JSON, which is fast to generate and widely accepted. An implementation for RoR can be found here. You might also want to have a look at this implementation, as it works natively with streams and might thus be better suited. If you find that JSON doesn't suit your needs, give MessagePack a try.
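
For reference, a quick sketch of both options using the json and msgpack gems (the sample record is arbitrary):

require "json"
require "msgpack"

record = { name: "Widget", maker: "ACME", tags: %w[small blue] }

json_payload    = record.to_json     # human-readable, universally accepted
msgpack_payload = record.to_msgpack  # binary, smaller and faster to parse

puts json_payload.bytesize
puts msgpack_payload.bytesize        # typically noticeably smaller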

If you hit network saturation, it could be worth investigating the possibilities for request compression.
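
In Ruby that is essentially a one-liner with the standard library's Zlib (a sketch; whether it pays off depends on how compressible your payloads actually are):

require "zlib"

body = '{"name":"Widget","maker":"ACME"}' * 100

deflated = Zlib::Deflate.deflate(body)
puts "#{body.bytesize} -> #{deflated.bytesize} bytes"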

Everything put together, your request could look like this (compression and chunk-encoding stripped for the sake of legibility):

POST /api/endpoint HTTP/1.1
Host: example.com
Content-Type: multipart/mixed; boundary="-boundary-"
Transfer-Encoding: chunked
Content-Encoding: deflate

---boundary-
Content-Type: application/json

{...}
---boundary-
Content-Type: application/json

{...}
---boundary---

The MIME type is multipart/mixed, as it felt like the most appropriate one. It technically implies that the message parts are of different content types, but as far as I can see this is not enforced anywhere, so multipart/mixed is safe to use here. deflate is chosen over gzip as the compression method because it doesn't need to generate a CRC32 checksum. This allows for a small speed boost (and saves a few bytes).
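
Putting the client side together, here is a rough sketch using plain Net::HTTP with a streamed body; Net::HTTP requires Transfer-Encoding: chunked when a body stream is given without a Content-Length. The endpoint, boundary and payloads are placeholders, and the deflate step is left out for brevity:

require "net/http"
require "stringio"
require "json"

uri      = URI("https://example.com/api/endpoint")
boundary = "-boundary-"

# Build the multipart body; in a real setup each part could be
# produced lazily as events come in rather than joined up front.
parts = [{ event: "one" }, { event: "two" }].map do |obj|
  "--#{boundary}\r\nContent-Type: application/json\r\n\r\n#{obj.to_json}\r\n"
end
body_io = StringIO.new(parts.join + "--#{boundary}--\r\n")

request = Net::HTTP::Post.new(uri.path)
request["Content-Type"]      = %(multipart/mixed; boundary="#{boundary}")
request["Transfer-Encoding"] = "chunked"
request.body_stream = body_io  # no Content-Length, so the body is sent chunked

Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  response = http.request(request)
  puts response.code
end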

HTTP::persistent and proxy?

On the wiki page of the HTTP gem there is mention of .persistent. There's also a page dedicated to proxy support.
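
A small sketch combining the two (host names, ports and paths are placeholders):

require "http"

# route requests through a proxy and keep the connection to the API host open
client = HTTP
  .via("proxy.example.com", 8080)
  .persistent("https://api.example.com")

3.times do |i|
  response = client.post("/events", json: { seq: i })
  puts response.status
  response.flush  # read the body so the persistent connection can be reused
end

client.close  # shut the persistent connection down when finished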

Load balance active TCP sessions to AWS Aurora RDS

Finally I've figured this out: the fix had to be applied at the application level in order to stop keeping long-lived TCP connections.

Particularly, in Ruby I had to set the idle_timeout and reaping_frequency parameters. Here you can find the equivalent parameters for other programming languages.
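
For reference, a sketch of those two settings expressed as an ActiveRecord connection config (the same keys normally live in config/database.yml; the adapter, credentials and values below are purely illustrative):

ActiveRecord::Base.establish_connection(
  adapter:           "mysql2",
  host:              ENV["AURORA_ENDPOINT"],
  username:          ENV["DB_USER"],
  password:          ENV["DB_PASSWORD"],
  database:          "app_production",
  pool:              5,
  idle_timeout:      60, # seconds a connection may sit unused before it is removed from the pool
  reaping_frequency: 10  # how often (in seconds) the reaper checks for idle or dead connections
)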

Hope it helps someone in the future. Cheers!

How to retain one million simultaneous TCP connections?

What operating systems are you considering for this?

If you are using a Windows OS later than Vista, you shouldn't have a problem with many thousands of connections on a single machine. I've run tests (here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html) with a low-spec Windows Server 2003 machine and easily achieved more than 70,000 active TCP connections. Some of the resource limits that affect the number of connections possible have been lifted considerably on Vista (see here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html), so you could probably achieve your goal with a small cluster of machines. I don't know what you'd need in front of those to route the connections.

Windows provides a facility called I/O Completion Ports (see: http://msdn.microsoft.com/en-us/magazine/cc302334.aspx) which allow you to service many thousands of concurrent connections with very few threads (I was running tests yesterday with 5000 connections saturating a link to a server with 2 threads to process the I/O...). Thus the basic architecture is very scalable.

If you want to run some tests, then I have some freely available tools on my blog that allow you to thrash a simple echo server using many thousands of connections (1) and (2), and some free code which you could use to get you started (3).

The second part of your question, from your comments, is trickier. If the client's IP address keeps changing and there's nothing between you and them providing NAT to give you a consistent IP address, then their connections will, no doubt, be terminated and need to be re-established. If the clients detect this connection tear-down when their IP address changes, they can simply reconnect to the server; if they can't, then I would suggest that the clients need to poll the server every so often so that they can detect the connection loss and reconnect. There's nothing the server can do here, as it can't predict the new IP address, and it will only discover that the old connection has failed when it tries to send data.

And remember, your problems are only just beginning once you get your system to scale to this level...

Client Server API pattern in REST (unreliable network use case)

If it isn't reasonable for duplicate resources to be created (e.g. products with identical titles, descriptions, etc.), then unique identifiers can be generated on the server which can be tracked against created resources to prevent duplicate requests from being processed. Unlike Darrel's suggestion of generating unique IDs on the client, this would also prevent separate users from creating duplicate resources (which you may or may not find desirable). Clients will be able to distinguish between "created" responses and "duplicate" responses by their response codes (201 and 303 respectively, in my example below).

A sketch for generating such an identifier, in this case a hash of a canonical representation of the request (shown here as Ruby rather than pure pseudocode; the request accessors, the products store and the http_201/http_303 helpers stand in for whatever your framework provides):

require "digest"

def product_post(request)
  # the canonical representation need not contain every field in
  # the request, just those which contribute to its "identity"
  tags      = request.tags.sort.join(",")
  canonical = [request.name, request.maker, tags, request.desc].join("|")
  id        = Digest::SHA256.hexdigest(canonical)

  if products.key?(id)
    http_303 products[id]                        # duplicate: point at the existing resource
  else
    products[id] = create_product_from(request)
    http_201 products[id]                        # created
  end
end

This ID may or may not be part of the created resources' URIs. Personally, I'd be inclined to track them separately — at the cost of an extra lookup table — if the URIs were going to be exposed to users, as hashes tend to be ugly and difficult for humans to remember.

In many cases, it also makes sense to "expire" these unique hashes after some time. For example, if you were to make a money transfer API, a user transferring the same amount of money to the same person a few minutes apart probably indicates that the client never received the "success" response. If a user transfers the same amount of money to the same person once a month, on the other hand, they're probably paying their rent. ;-)
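
A minimal sketch of that expiry idea, assuming a Redis instance and the redis-rb gem (the key prefix and the 15-minute window are just illustrative):

require "redis"
require "digest"

def duplicate_within_window?(redis, canonical, window_seconds = 15 * 60)
  key = "dedup:#{Digest::SHA256.hexdigest(canonical)}"
  # SET with nx: only succeeds if the key does not exist yet; ex: makes the
  # marker expire after the window, so the same request repeated much later
  # is treated as a new one.
  first_time = redis.set(key, "1", nx: true, ex: window_seconds)
  !first_time
end

redis = Redis.new
puts duplicate_within_window?(redis, "alice|bob|100.00")  #=> false on the first call
puts duplicate_within_window?(redis, "alice|bob|100.00")  #=> true shortly afterwards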

Listen for multiple responses produced by one HTTP request

We ended up building a solution on top of EventMachine. EM allows us to receive an event every time a piece of data arrives on the connection. The data is then passed to http_parser.rb to parse the incoming HTTP messages, and finally we enqueue each message on Sidekiq.
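
A rough sketch of how those pieces can fit together, assuming the eventmachine, http_parser.rb and sidekiq gems (the host, port and worker class are placeholders):

require "eventmachine"
require "http/parser"
require "sidekiq"

# Hypothetical Sidekiq worker that processes one parsed HTTP message body.
class ResponseWorker
  include Sidekiq::Worker

  def perform(body)
    # handle the message here
  end
end

module ResponseStream
  def post_init
    @parser = Http::Parser.new
    @body   = +""
    @parser.on_body = proc { |chunk| @body << chunk }
    @parser.on_message_complete = proc do
      ResponseWorker.perform_async(@body.dup)  # enqueue the message on Sidekiq
      @body.clear
    end
  end

  # EM calls this every time a piece of data is received on the connection
  def receive_data(data)
    @parser << data
  end
end

EM.run do
  EM.connect("upstream.example.com", 80, ResponseStream)
end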

Is TCP Reset (RST) two way?

If the RST is sent by the client, it can be seen on it using a packet sniffer such as wireshark. However, it won't show up in any user-level sockets since it's sent by the OS as a response to various erroneous inputs (such as connection attempts to a closed port).

If the RST is sent by the network (e.g. by some middlebox), then it is forged to look as if it came from the other endpoint in order to sever the connection. It can do so in one direction, or in both of them. In that case, the client might not see anything except a RST sent by the actual server once the client continues sending data on a connection it perceives as open while the server already sees it as closed.

Try capturing the traffic on both the server and the client to see where the resets are coming from.


