Long Polling/Http Streaming General Questions

Long Polling/HTTP Streaming General Questions

Yeah, the Comet-like techniques usually blowing up the brain in the beginning -- just making you think in a different way. And another problem is there are not that much resources available for PHP, cuz everyone's doing their Comet in node.js, Python, Java, etc.

I'll try to answer your questions, hope it would shed some light on this topic for people.

How will the server know when an update have been sent? will it need to query the databse continually or is there a better way?

The answer is: in the most general case you should use a message queue (MQ). RabbitMQ or the Pub/Sub functionality built into the Redis store may be a good choices, though there are many competing solutions on the market available such as ZeroMQ, Beanstalkd, etc.

So instead of continuous querying your database, you can just subscribe for an MQ-event and just hang until someone else will publish a message you subscribed for and MQ will wake you up and send a message. The chat app is a very good use case to understand this functionality.

Also I have to mention that if you would search for Comet-chat implementations in other languages, you might notice simple ones not using MQ. So how do they exchange the information then? The thing is such solutions are usually implemented as standalone single-threaded asynchronous servers, so they can store all connections in a thread local array (or something similar), handle many connections in a single loop and just pick a one and notify when needed. Such asynchronous server implementations are a modern approach that fits Comet-technique really great. However you're most likely implementing your Comet on top of mod_php or FastCGI, in this case this simple approach is not an option for you and you should use MQ.

This could still be very useful to understand how to implement a standalone asynchronous Comet-server to handle many connections in a single thread. Recent versions of PHP support Libevent and Socket Streams, so it is possible to implement such kind of server in PHP as well. There's also an example available in PHP documentation.

How do I check for the results during the Ajax connection is still active? I'm aware of jQuery's success function for ajax calls, but how do I check the data while the connection is still ongoing?

If you're doing your long-running polls with a usual Ajax technique such as plain XHR, jQuery Ajax, etc. you don't have an easy way to transmit several responses in a single Ajax request. As you mentioned you only have 'success' handler to deal with the response in whole and not with its part. As a workaround people send only a single response per request and process it in a 'success' handler, after that they just open a new long-poll request. This is just how HTTP-protocol works.

Also should be mentioned that actually there are workaround to implement streaming-like functionality using various techniques using techniques such as infinitely long page in a hidden IFRAME or using multipart HTTP-responses. Both of those methods are certain drawbacks (the former one is considered unreliable and sometimes could produce unwanted browser behavior such as infinite loading indicator and the latter one leaks consistent and straightforward cross-browser support, however certain applications still are known to successfully rely on that mechanism falling back to long-polling when the browser can't properly handle multipart responses).

If you'd like to handle multiple responses per single request/connection in a reliable way you should consider using a more advanced technology such as WebSocket which is supported by the most current browsers or on any platform that supports raw sockets (such as Flash or if you develop for a mobile app for instance).

Could you please elaborate more on message queues?

Message Queue is a term that describes a standalone (or built-in) implementation of the Observer pattern (also known as 'Publish/Subscribe' or simply PubSub). If you develop a big application, having one is very useful -- it allows you to decouple different parts of your system, implement event-driven asynchronous design and make your life much easier, especially in a heterogeneous systems. It has many applications to the real-world systems, I'll mention just couple of them:

  • Task queues. Let's say we're writing our own YouTube and need to convert users' video files in the background. We should obviously have a webapp with the UI to upload a movie and some fixed number of worker processes to convert the video files (maybe we would even need a number of dedicated servers where our workers only will leave). Also we would probably have to write our workers in C to ensure better performance. All we have to do is just setup a message queue server to collect and deliver video-conversion tasks from the webapp to our workers. When the worker spawns it connects to the MQ and goes idle waiting for a new tasks. When someone uploads a video file the webapp connects to the MQ and publishes a message with a new job. Powerful MQs such as RabbitMQ can equally distribute tasks among number of workers connected, keep track of what tasks had been completed, ensure nothing will get lost and will provide fail-over and even admin UI to browse current tasks pending and stats.
  • Asynchronous behavior. Our Comet-chat is a good example. Obviously we don't want to periodically poll our database all time (what's the use of Comet then? -- Not big difference of doing periodical Ajax-requests). We would rather need someone to notify us when a new chat-message appears. And a message queue is that someone. Let's say we're using Redis key/value store -- this is a really great tool that provides PubSub implementation among its data store features. The simplest scenario may look like following:
    1. After someone enters the chat room a new Ajax long poll request is being made.
    2. Request handler on the server side issues the command to Redis to subscribe a 'newmessage' channel.
    3. Once someone enters a message into his chat the server-side handler publishes a message into the Redis' 'newmessage' topic.
    4. Once a message is published, Redis will immediately notify all those pending handlers which subscribed to that channel before.
    5. Upon notification PHP-code that keeps long-poll request open, can return the request with a new chat message, so all users will be notified. They can read new messages from the database at that moment, or the messages may be transmitted directly inside message payload.

I hope my illustration is easy to understand, however message queues is a very broad topic, so refer to the resources mentioned above for further reading.

My Understanding of HTTP Polling, Long Polling, HTTP Streaming and WebSockets

There are more differences than the ones you have identified.

Duplex/directional:

  • Uni-directional: HTTP poll, long poll, streaming.
  • Bi-direcitonal: WebSockets, plugin networking

In order of increasing latency (approximate):

  • WebSockets
  • Plugin networking
  • HTTP streaming
  • HTTP long-poll
  • HTTP polling

CORS (cross-origin support):

  • WebSockets: yes
  • Plugin networking: Flash via policy request (not sure about others)
  • HTTP * (some recent support)

Native binary data (typed arrays, blobs):

  • WebSockets: yes
  • Plugin networking: not with Flash (requires URL encoding across ExternalInterface)
  • HTTP *: recent proposal to enable binary type support

Bandwidth in decreasing efficiency:

  • Plugin networking: Flash sockets are raw except for initial policy request
  • WebSockets: connection setup handshake and a few bytes per frame
  • HTTP streaming (re-use of server connection)
  • HTTP long-poll: connection for every message
  • HTTP poll: connection for every message + no data messages

Mobile device support:

  • WebSocket: iOS 4.2 and up. Some Android via Flash emulation or using Firefox for Android or Google Chrome for Android which both provide native WebSocket support.
  • Plugin networking: some Android. Not on iOS
  • HTTP *: mostly yes

Javascript usage complexity (from simplest to most complicated). Admittedly complexity measures are somewhat subjective.

  • WebSockets
  • HTTP poll
  • Plugin networking
  • HTTP long poll, streaming

Also note that there is a W3C proposal for standardizing HTTP streaming called Server-Sent Events. It is currently fairly early in it's evolution and is designed to provide a standard Javascript API with comparable simplicity to WebSockets.

long polling vs streaming for about 1 update/second

Do you want the process to be client- or server-driven? In other words, do you want to push new data to the clients as soon as it's available, or would you rather that the clients request new data whenever they see fit, even though that might not be once/second? What is the likelyhood that the client will be able to stick around to wait for an answer? Even though you expect the events to occur once/second, how long does it take between a request from a client and the return from the server? If it's longer than a second, I'd expect you to lean towards pushing the events to the clients, though the other way around, I'd expect polling to be okay. If the response takes longer than the interval, then you're essentially streaming anyway, since there's a new event ready by the time the client receives the last one, so the client could essentially poll continually and always receive events - in this case, streaming the data would actually be more lightweight, since you're removing the connection/negotiation overhead from the process.

I would suspect that server load to be higher for a client-based (pull) subscription, instead of a streaming configuration, since the client would have to re-negotiate the connection each time, instead of leaving a connection open, but each open connection in a streaming model would require server resources as well. It depends on what the trade-off is between how aggressive your negotiation process is vs. how much memory/processing is required for each open connection. I'm no expert, though, so there may be other factors.

UPDATE: This guy talks about the trade-offs between long-polling and streaming, and he seems to say that with HTTP/1.1, the connection re-negotiation process is trivial, so that's not as much of an issue.

Is this chat using long polling or http streaming?

It's not anything that simple. It uses http://www.mibbit.com/chat, which is a full IRC client written in Javascript and Java. Blog at http://blog.mibbit.com/.

Edit: Here's your answer.

The first part I got working was the communications between browser and server. That’s done using 2 XMLHttpRequests. The first one is simply to send data from browser to server. It utilizes keep-alive, to minimise new connections.

The second XHR is the ‘receive lazy polling’ one. It connects to the server, and the server holds it open until there are messages available, or a timeout expires. This one is also keep-alive, so the next request goes down the same connection.

What you end up with is 2 connections held open to the server, with packets (json in this case), and some http headers from time to time.
To make sure the server would scale, I wrote a custom webserver in java using nio. It handles all of the connections in a single thread and as I say, scales to tens of thousands of connections.

If the client requests a new connection, it sends a request to the webserver, which then connects out, and starts proxying etc. It also runs an ident server in the case of irc connections so that an irc server can identify individual browsers. I looked at existing frameworks etc to do this sort of thing, but I valued learning how it all works, and thought that my use case may be specific enough to be able to optimise more than general frameworks can.

IE Ajax Streaming/long-polling without folding the server

I found how to detect the close event. You have to use the onload method.

So the code will look like that



var ajaxRequest = new XDomainRequest();

ajaxRequest.onload = function() {
//alert("[XDR-onload]. responseText: " + ajaxRequest.responseText + "");
};

ajaxRequest.onerror = function() { alert("[XDR-onerror] Fatal Error."); };

ajaxRequest.ontimeout = function() {
alert("[XDR-ontimeout] Timeout Error.");
};
ajaxRequest.onprogress = function() {
//alert("[XDR-onprogress] responseText so far: " + ajaxRequest.responseText + "");
};

and don't forget to add the Header in the response (server's side)

response.setHeader("Access-Control-Allow-Origin","*");

polling vs long polling

The difference is this: long polling allows for some kind of event-driven notifying, so the server is able to actively send data to the client. Normal polling is a periodical checking for data to fetch, so to say. Wikipedia is quite detailed about that:

With long polling, the client requests information from the server in a way similar to a normal polling; however, if the server does not have any information available for the client, then instead of sending an empty response, the server holds the request and waits for information to become available (or for a suitable timeout event), after which a complete response is finally sent to the client.

Long polling reduces the amount of data that needs to be sent because the server only sends data when there really IS data, hence the client does not need to check at every interval x.

If you need a more performant (and imho more elegant) way of full duplex client/server communication, consider using the WebSocket protocol, it's great!



Related Topics



Leave a reply



Submit