Low Latency (< 2s) Live Video Streaming HTML5 Solutions

Low latency (< 2s) live video streaming HTML5 solutions?

Technologies and Requirements

The only web-based technology set really geared toward low latency is WebRTC. It's built for video conferencing. Codecs are tuned for low latency over quality. Bitrates are usually variable, opting for a stable connection over quality.

However, you don't necessarily need this low latency optimization for all of your users. In fact, from what I can gather from your requirements, low latency for everyone would hurt the user experience. While your users in control of the robot definitely need low latency video so they can reasonably control it, the users not in control don't have this requirement and can instead opt for reliable, higher quality video.

How to Set it Up

[Diagram: robot live streaming setup]

In-Control Users to Robot Connection

Users controlling the robot will load a page that utilizes some WebRTC components for connecting to the camera and control server. To facilitate WebRTC connections, you need some sort of STUN server. To get around NAT and other firewall restrictions, you may need a TURN server. Both of these are usually built into Node.js-based WebRTC frameworks.
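To make that concrete, here's a minimal browser-side sketch; the server URLs and credentials are placeholders for whatever you actually deploy:

// Browser-side: tell the peer connection which STUN/TURN servers to use.
// stun.example.com / turn.example.com are placeholders, not real servers.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.example.com:3478" },
    {
      urls: "turn:turn.example.com:3478",
      username: "robot-operator",
      credential: "changeme"
    }
  ]
});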

The cam/control server will also need to connect via WebRTC. Honestly, the easiest way to do this is to make your controlling application somewhat web based. Since you're using Node.js already, check out NW.js or Electron. Both can take advantage of the WebRTC capabilities already built into Chromium, while still giving you the flexibility to do whatever you'd like with Node.js.

The in-control users and the cam/control server will make a peer-to-peer connection via WebRTC (or TURN server if required). From there, you'll want to open up a media channel as well as a data channel. The data side can be used to send your robot commands. The media channel will of course be used for the low latency video stream being sent back to the in-control users.
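Roughly, the in-control client side looks like this (signaling/offer-answer is omitted, and the channel name and command format are just placeholders):

// One data channel for robot commands; the media channel arrives as a track.
const commands = pc.createDataChannel("robot-commands", { ordered: true });
commands.onopen = () => {
  commands.send(JSON.stringify({ type: "drive", speed: 0.5, turn: 0 }));
};

// Attach the low latency video coming back from the cam/control server.
pc.ontrack = (event) => {
  document.querySelector("video").srcObject = event.streams[0];
};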

Again, it's important to note that the video that will be sent back will be optimized for latency, not quality. This sort of connection also ensures a fast response to your commands.

Video for Viewing Users

Users that are simply viewing the stream and not controlling the robot can use normal video distribution methods. It is actually very important for you to use an existing CDN and transcoding services, since you will have 10k-15k people watching the stream. With that many users, you're probably going to want your video in a couple different codecs, and certainly a whole array of bitrates. Distribution with DASH or HLS is easiest to work with at the moment, and frees you of Flash requirements.

You will probably also want to send your stream to social media services. This is another reason why it's important to start with a high quality HD stream. Those services will transcode your video again, reducing quality. If you start with good quality first, you'll end up with better quality in the end.

Metadata (chat, control signals, etc.)

It isn't clear from your requirements what sort of metadata you need, but for small message-based data, you can use a web socket library such as Socket.IO. As you scale this up to a few instances, you can use pub/sub, such as Redis, to distribute messages across your server instances.
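A minimal sketch of that, assuming Socket.IO v4 and its Redis adapter package (@socket.io/redis-adapter):

// server.js - fan metadata out across multiple Socket.IO instances via Redis pub/sub.
const { createServer } = require("http");
const { Server } = require("socket.io");
const { createClient } = require("redis");
const { createAdapter } = require("@socket.io/redis-adapter");

const io = new Server(createServer().listen(3000));
const pubClient = createClient({ url: "redis://localhost:6379" });
const subClient = pubClient.duplicate();

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  io.adapter(createAdapter(pubClient, subClient));
  io.on("connection", (socket) => {
    // Anything emitted here reaches clients connected to every instance.
    socket.on("chat", (msg) => io.emit("chat", msg));
  });
});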

How you synchronize the metadata to the video depends a bit on what's in that metadata and, specifically, what the synchronization requirement is. Generally speaking, you can assume that there will be a reasonable but unpredictable delay between the source video and the clients. After all, you cannot control how long they will buffer. Each device is different, each connection variable. What you can assume is that playback will begin with the first segment the client downloads. In other words, if a client starts buffering a video and begins playing it 2 seconds later, the video is 2 seconds behind the point at which the first request was made.

Detecting when playback actually begins client-side is possible. Since the server knows the timestamp of the video it sent to the client, it can inform the client of its offset relative to the beginning of video playback. Since you'll probably be using DASH or HLS, and you need to use MSE with AJAX to get the data anyway, you can use the response headers in the segment response to indicate the timestamp for the beginning of the segment. The client can then synchronize itself. Let me break this down step-by-step (a client-side sketch follows the list):

  1. Client starts receiving metadata messages from application server.
  2. Client requests the first video segment from the CDN.
  3. CDN server replies with video segment. In the response headers, the Date: header can indicate the exact date/time for the start of the segment.
  4. Client reads the response Date: header (let's say 2016-06-01 20:31:00). Client continues buffering the segments.
  5. Client starts buffering/playback as normal.
  6. Playback starts. Client can detect this state change on the player and knows that 00:00:00 on the video player is actualy 2016-06-01 20:31:00.
  7. Client displays metadata synchronized with the video, dropping any messages from previous times and buffering any for future times.
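A rough client-side sketch of steps 2 through 6 (the segment URL and how you fetch it depend on your player; this assumes you're driving MSE yourself):

// Map player time 00:00:00 to the wall-clock time of the first segment.
let streamEpoch = null;

async function fetchSegment(url) {
  const response = await fetch(url);
  if (streamEpoch === null) {
    // The Date: header on the segment response marks the start of this segment.
    streamEpoch = new Date(response.headers.get("Date")).getTime();
  }
  return response.arrayBuffer(); // hand off to your SourceBuffer as usual
}

// Once playback has started, convert player time to wall-clock time so that
// metadata messages (each stamped with a date) can be shown at the right moment.
function wallClockAt(video) {
  return streamEpoch + video.currentTime * 1000;
}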

This should meet your needs and give you the flexibility to do whatever you need to with your video going forward.

Why not [magic-technology-here]?

  • When you choose low latency, you lose quality. Quality comes from available bandwidth. Bandwidth efficiency comes from being able to buffer and optimize entire sequences of images when encoding. If you wanted perfect quality (lossless for each image) you would need a ton (gigabits per viewer) of bandwidth. That's why we have these lossy codecs to begin with.
  • Since you don't actually need low latency for most of your viewers, it's better to optimize for quality for them.
  • For the 2 users out of 15,000 that do need low latency, we can optimize for low latency for them. They will get substandard video quality, but will be able to actively control a robot, which is awesome!
  • Always remember that the internet is a hostile place where nothing works quite as well as it should. System resources and bandwidth are constantly variable. That's actually why WebRTC auto-adjusts (as best as reasonable) to changing conditions.
  • Not all connections can keep up with low latency requirements. That's why every single low latency connection will experience drop-outs. The internet is packet-switched, not circuit-switched. There is no real dedicated bandwidth available.
  • Having a large buffer (a couple seconds) allows clients to survive momentary losses of connections. It's why CD players with anti-skip buffers were created, and sold very well. It's a far better user experience for those 15,000 users if the video works correctly. They don't have to know that they are 5-10 seconds behind the main stream, but they will definitely know if the video drops out every other second.

There are tradeoffs in every approach. I think what I have outlined here separates the concerns and gives you the best tradeoffs in each area. Please feel free to ask for clarification or ask follow-up questions in the comments.

low-latency html5 video on a LAN

You won't achieve low latency with any of the segmented distribution methods (HLS, DASH, or similar). The very nature of these protocols is that the data is chunked into relatively large pieces. 4 seconds with HLS is amazingly low, and with chunks that small you have quite a bit of overhead... a waste of bandwidth, and not really what HLS and DASH are good for.

The first one is working under the assumption that the content consumers will not be on site.

My answer there (https://stackoverflow.com/a/37475943/362536) doesn't assume that the consumers will not be on your site... that's not the case at all. What I'm suggesting there is that you take advantage of YouTube and embed their viewer when low latency isn't needed, saving you mountains of money.

If all of your viewers require low latency video to make this work, you're going to have to get crafty on the server side. If you told us what sort of scale you were working with, perhaps we could suggest something more specific. Since you didn't, let's focus on the possibilities client-side.

WebRTC is one of the best options. Everything in the whole WebRTC stack is built with low latency in mind. With WebRTC, you can get those sub-second latencies in normal operation. Note that there aren't a lot of good choices for streaming servers that support WebRTC today.

You can also use Media Source Extensions and Web Sockets. This gives you quite a bit of control and allows very fast streaming of data to the clients, at a slightly higher cost of latency. It's much easier to do this than it is to implement your own server-side WebRTC that supports media streams.
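The client side of that approach looks roughly like this (the WebSocket URL and codec string are assumptions that must match whatever your server actually sends):

// Push fragmented MP4 (or WebM) chunks arriving over a WebSocket into MSE.
const video = document.querySelector("video");
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener("sourceopen", () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E,mp4a.40.2"');
  const queue = [];

  sb.addEventListener("updateend", () => {
    if (queue.length && !sb.updating) sb.appendBuffer(queue.shift());
  });

  const ws = new WebSocket("wss://streams.example.com/live");
  ws.binaryType = "arraybuffer";
  ws.onmessage = (event) => {
    // A SourceBuffer only takes one append at a time, so queue while it's busy.
    if (sb.updating || queue.length) queue.push(event.data);
    else sb.appendBuffer(event.data);
  };
});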

I strongly recommend reading over my answer on that other question again as well. There are a lot of considerations here... make very sure that this low latency is actually worth the reduction in quality and the financial costs involved. This is rarely the case, especially for tens of thousands of users or more.

Low Latency (50ms) Video Streaming with NODE.JS and html5

I’d like suggestions for NODE.JS packages or other solutions to provide a UDP H264 video stream that can be decoded by an HTML5 video tag with a target latency of 50ms.

That's almost certainly not possible in that configuration.

If you drop the video tag requirement, and use just straight WebRTC in the browser, you may be able to get down to about 150ms.

Broadcasting solutions / encoders / decoders

If you're absolutely sure you need latency that low, you need WebRTC. While it is possible to achieve latency this low over regular HTTP Progressive streaming, your application will benefit from being able to drop chunks if they don't come in time, encode in a low quality low latency mode, decode in a low latency mode, and everything all the way through.

You're going to need a provider that supports WebRTC distribution, and they aren't cheap.

WebRTC vs WebSockets server to client/s (one to many) live video streaming from IP camera

Some notes and other things to consider:

…from what I've seen webRTC server implementations don't support media channels anyways, so I will need to use Data Channels…

You can run WebRTC media channels server-side, but you're right in that there is limited software available for doing so. I actually often end up using headless Chromium on the server because it's easy, but this doesn't work for your use case since your streams are coming in via RTSP.

If you go the WebRTC route, I'd recommend using GStreamer on the server side. It has its own implementation of everything needed for WebRTC. You can use it to take your existing stream and remux and transcode it as necessary for WebRTC.

I could just do the same using WebSockets

You could, but I would recommend using just regular HTTP at that point. Your stream is just unidirectional, from the server to the client. There's no need for the overhead and hassle of Web Sockets. In fact, if you do this right, you don't even need anything special on the client side. Just a video element:

<video src="https://streams.example.com/your-stream-id" preload="none" controls></video>

The server would need to set up all the video initialization data and then drop into the live stream. The client will just play back the stream no problem.

I've gone this route using a lightweight Node.js server, wrapping FFmpeg. This way it's trivial to get the video from the source. When I did this, I actually used WebM. All data before the first Cluster element can be treated as initialization data. And then, assuming each Cluster starts with a keyframe (which is usually the case), you can drop into any part of the stream later. (See also: https://stackoverflow.com/a/45172617/362536)

In other words, take the WebM/Matroska output from FFmpeg and buffer it until you see 0x1F43B675. Everything before that, hang on to it as initialization data. When a client connects, send that initialization data, and then start the "live" stream as soon as you see the next 0x1F43B675. (This is a quick summary to get you started, but if you get stuck implementing, please post a new question.)
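To make that concrete, here's a rough Node.js sketch of the idea (the FFmpeg invocation and source URL are assumptions, error handling is omitted, and it doesn't handle a Cluster ID split across chunk boundaries):

const http = require("http");
const { spawn } = require("child_process");

const CLUSTER_ID = Buffer.from([0x1f, 0x43, 0xb6, 0x75]); // Matroska Cluster element ID

// Assumed FFmpeg invocation: pull the camera feed and emit live WebM on stdout.
const ffmpeg = spawn("ffmpeg", [
  "-i", "rtsp://camera.example/stream",
  "-c:v", "libvpx", "-deadline", "realtime",
  "-f", "webm", "-"
]);

let pending = Buffer.alloc(0);
let initSegment = null;      // everything before the first Cluster
const waiting = new Set();   // clients waiting for the next Cluster boundary
const active = new Set();    // clients already receiving the live stream

ffmpeg.stdout.on("data", (chunk) => {
  if (initSegment === null) {
    pending = Buffer.concat([pending, chunk]);
    const idx = pending.indexOf(CLUSTER_ID);
    if (idx === -1) return;                // still reading the header
    initSegment = pending.slice(0, idx);   // EBML header, Segment info, Tracks
    chunk = pending.slice(idx);
  }
  for (const res of active) res.write(chunk);
  const clusterIdx = chunk.indexOf(CLUSTER_ID);
  if (clusterIdx !== -1) {
    // New clients join the live stream at a Cluster (keyframe) boundary.
    for (const res of waiting) {
      res.write(chunk.slice(clusterIdx));
      active.add(res);
    }
    waiting.clear();
  }
});

http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "video/webm" });
  // Send the initialization data first. (A complete server would hold the
  // client until the header has actually been read from FFmpeg.)
  if (initSegment !== null) res.write(initSegment);
  waiting.add(res);
  req.on("close", () => { waiting.delete(res); active.delete(res); });
}).listen(8080);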

Now, what should you do?

This comes down to some tradeoffs.

  • If you need low latency end-to-end (<2 seconds), you must use WebRTC.
    The whole stack, while complicated, is built around the lowest possible latency. Tradeoffs are made in the encoding, decoding, network, everywhere. This means lower media quality. It means that when packets are lost, everything is done to skip the client forward rather than buffering to try to get lost data. But, all this needs to be done if you require low latency.

  • If you want the simplest implementation, have a high number of clients per-source, or want to use existing CDNs, and you don't mind higher latency, consider HLS.
    With a simple FFmpeg command per-source, you can have live streams of all your inputs running all the time, and when clients connect they just receive the playlists and media segments. It's a great way to isolate the source end from the serving and the clients, and allows you to reuse a lot of existing infrastructure. The downsides of course are the added latency, and that you really should have the source streams running all the time. Otherwise, there will be a relatively long delay when starting the streams initially. Also, HLS gets you adaptive bitrate very easily, costing you only some more CPU for transcoding. (There's a minimal FFmpeg sketch after this list.)

  • If you have few clients per-source and don't require ABR, consider an HTTP progressive streaming proxy.
    This can be basically a ~10 line Node.js server that receives a request for a stream from clients. When a request comes in, it immediately executes FFmpeg to connect to the source, and FFmpeg outputs the WebM stream. This is similar to what I was talking about above, but since there is a separate FFmpeg process per-client, you don't need to buffer until Cluster elements or anything. Simply pipe the FFmpeg output directly to the client. This actually gets you pretty low latency. I've gotten it as low as ~300ms glass-to-glass latency. The downside is that the client will definitely try to buffer if packets are lost, and then will be behind live. You can always skip the player ahead client-side by looking at the buffered time ranges and deciding whether to seek or increase playback speed. (This is exactly what HLS players do when they get too far behind live.) The client-side in this is otherwise just a video element. (A sketch of this proxy also follows the list.)
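As a sketch of the HLS option above, one long-running FFmpeg process per source (invoked here from Node.js; the input URL, encoder settings, and output directory are all assumptions) keeps a rolling playlist and segments on disk for any web server or CDN to serve:

const { spawn } = require("child_process");

// One FFmpeg process per source, continuously writing HLS segments and a playlist.
function startHls(sourceUrl, outDir) {
  return spawn("ffmpeg", [
    "-i", sourceUrl,
    "-c:v", "libx264", "-preset", "veryfast", "-g", "60",
    "-c:a", "aac",
    "-f", "hls",
    "-hls_time", "4",                 // ~4-second segments
    "-hls_list_size", "6",            // keep a short rolling playlist
    "-hls_flags", "delete_segments",  // discard segments that fall off the playlist
    `${outDir}/live.m3u8`
  ], { stdio: "inherit" });
}

startHls("rtsp://camera.example/stream", "/var/www/hls");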

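And a sketch of the progressive streaming proxy option; it's similar to the WebM relay above, but with a separate FFmpeg per client there's no Cluster buffering to worry about (the source URL is an assumption, and error handling is omitted):

const http = require("http");
const { spawn } = require("child_process");

// Spawn one FFmpeg per request and pipe live WebM straight to the response.
http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "video/webm" });
  const ffmpeg = spawn("ffmpeg", [
    "-i", "rtsp://camera.example/stream",
    "-c:v", "libvpx", "-deadline", "realtime",
    "-f", "webm", "-"
  ]);
  ffmpeg.stdout.pipe(res);
  req.on("close", () => ffmpeg.kill("SIGKILL"));
}).listen(8080);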
This is a pretty broad topic, so hopefully this answer gives you some more options to consider so that you can decide what's most appropriate for your specific use case. There is no one right answer, but there are definitely tradeoffs that are both technical, and in ease-of-development.


