Learners on Coursera spend the majority of their time on the site watching videos, a format that allows Coursera’s partner universities to educate at scale. These videos range from thirty-second snippets to hour-long debates, and they presented interesting challenges as we prepared for the launch of Coursera’s mobile apps. Here is what we learned!
HTTP Live Streaming
Our first step was to understand iOS’s HTTP Live Streaming protocol. According to Section 9.4 of the iOS application requirements:
Video streaming content over a cellular network longer than 10 minutes must use HTTP Live Streaming and include a baseline 64 kbps HTTP Live stream.
It’s easy to find a more technical description of what HTTP Live Streaming (HLS) is and how it works, but here’s a simple breakdown:
1. Start with a video. Any video.
2. Encode it in 3-5 different bit rates (read: resolutions).
3. Using a segmenter, cut each encoding into 10-second clips.
4. Create a playlist to keep track of all these segments. (Every bit rate gets its own playlist.)
5. Create a master playlist that points to each bit-rate playlist, along with bandwidth cutoffs for moving between bit rates.
6. Point the client to the master playlist. The client will play the first playlist in the master playlist.
7. The client determines whether it has enough bandwidth to play a higher bit rate, and moves up or down the master playlist based on that calculation. The segment it chooses is played next.
8. Repeat step 7 until the video is complete!
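To make the steps above concrete, here is what one of the per-bit-rate playlists might look like. The segment file names and durations are hypothetical; the tags are standard M3U8:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXTINF:10.0,
segment2.ts
#EXT-X-ENDLIST
```

Each `#EXTINF` line declares one 10-second clip, and the client simply fetches them in order.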
The process is simple enough: HLS is basically a two-dimensional array of video segments, with an algorithm to choose the ones the client can comfortably play.
But there are some gotchas:
Video Segmentation is not as easy as it sounds.
When a video gets compressed, most of the frames are not actual pictures; they store only the differences from the frame before. This “group of pictures” format helps save space, but it means that if you were to cut the video in the middle of such a group, the first frame of the new clip would be just a difference between pictures, not a full picture. To properly segment compressed video, you need to force the encoder to emit full-picture frames (keyframes) at least every 10 seconds so that each clip can start with a full picture.
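One way to force those keyframes is ffmpeg’s `-force_key_frames` expression together with its built-in HLS muxer. A minimal sketch, with hypothetical input and output paths:

```python
# Sketch: build an ffmpeg invocation that forces a full keyframe at every
# 10-second boundary, so each HLS segment can start with a full picture.
segment_seconds = 10

cmd = [
    "ffmpeg", "-i", "lecture.mp4",          # hypothetical input file
    # Emit a keyframe whenever t crosses the next 10-second boundary:
    "-force_key_frames", f"expr:gte(t,n_forced*{segment_seconds})",
    "-c:v", "libx264", "-c:a", "aac",
    # ffmpeg's HLS muxer cuts the segments and writes the playlist:
    "-f", "hls",
    "-hls_time", str(segment_seconds),
    "-hls_list_size", "0",                  # keep every segment in the playlist
    "out.m3u8",
]
print(" ".join(cmd))
```

Because the keyframe interval matches `-hls_time`, every segment boundary lands on a full picture.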
Playlist bandwidths are incredibly important.
The HLS protocol makes judicious use of playlists to keep track of where everything is and what it should do next. The master playlist uses a “bandwidth” parameter to help the client determine when to move between different streams of video. These parameters need to match the actual bit rates of the playlists they point to as closely as possible; if they are over- or underestimated, the client will not switch streams as expected!
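For illustration, a master playlist might look like the sketch below; the BANDWIDTH values (in bits per second), resolutions, and paths are hypothetical, and it is these numbers that must track the real bit rates of the streams they describe:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.2"
audio/prog_index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360
low/prog_index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=1280x720
mid/prog_index.m3u8
```

If the 640x360 stream actually averaged 800 kbps but was labeled 400000, a client on a 500 kbps connection would pick it and stall.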
The docs are a little inconsistent (and sometimes out of date).
Going through the documentation (even the HLS spec itself), we found many comments and suggestions for implementing HLS, but some have held up better than others in this dynamic ecosystem. We’ve tried to keep the information here as general as possible so that it’s still relevant in a year or two, but always be on the lookout for publish dates and draft revision times to make sure you’re not being led down the wrong path.
Now that we’ve gotten the obstacles out of the way, here are some of the things that we really appreciate about the entire HLS process:
ffmpeg is awesome.
ffmpeg is a great solution to the video encoding problem: it has HLS-specific settings as well as a generic segmenter built in, it can deal with all sorts of video codecs, and the community and documentation around it are solid.
Additionally, the many companies in the video encoding space all use ffmpeg, so figuring out the problems and nuances of it on your own machine will translate nicely to encoding at scale.
Audio-only means access for all.
Part of the HLS requirement was that we needed a baseline 64 kbps stream. It’s pretty hard to fit proper video into only 64 kbps, so we just put the audio track in there. This substitution turns out to be incredibly useful for our learners in places where internet bandwidth is hard to come by; having access to educational content, even if it’s only the audio, is better than nothing at all.
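Producing that audio-only rendition is a matter of dropping the video track and capping the audio bit rate. A sketch of the ffmpeg invocation, again with hypothetical paths:

```python
# Sketch: build the ffmpeg command for a 64 kbps audio-only HLS rendition.
cmd = [
    "ffmpeg", "-i", "lecture.mp4",   # hypothetical input file
    "-vn",                           # drop the video track entirely
    "-c:a", "aac", "-b:a", "64k",    # AAC audio at the 64 kbps baseline
    "-f", "hls",
    "-hls_time", "10",
    "-hls_list_size", "0",
    "audio/prog_index.m3u8",         # hypothetical output playlist
]
print(" ".join(cmd))
```

The resulting playlist then becomes the 64000-bandwidth entry in the master playlist.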
And of course, we couldn’t have figured out how to get HLS to work without the following tools and services (which we highly recommend!):
Charles Proxy
If there’s one thing you should get out of this post, it’s that this particular tool is an absolute joy to use. You can proxy your devices through Charles and see which bit rates the HLS algorithm is choosing, which is helpful when you’re determining bandwidth cutoffs or testing video segment encodings.
Network Link Conditioner
Both Mac OS and iOS have a built-in developer tool called the “Network Link Conditioner” that allows you to test your application in varying bandwidth conditions, on 3G/Edge/LTE, or even with significant packet loss. This tool, in conjunction with Charles, meant we could more easily test our HLS implementation in a number of environments to make sure we got it right.
Transloadit
When we were working on our HLS implementation, we had some 70,000 to 100,000 different videos on Coursera that had to be re-processed with HLS in mind. Not only did Transloadit help us with this immense task, but we also worked closely with them to figure out an optimal HLS implementation from both the client (us) and the service (Transloadit) side. Their continuous support is what makes all the videos on Coursera possible.
By sharing this information, we hope that you can better understand HTTP Live Streaming, how it works, and how it can benefit you and the people who watch your videos. Please do not hesitate to reach out if you have questions or suggestions for us!