
Codecs and Containers Explained

What are codecs and containers? What’s the difference between them, and how do they work?

In the multimedia world, a codec encodes and decodes video or audio signals, while a container wraps the encoded video and audio alongside additional information used to identify, interleave, and synchronize them.

There is quite a lot of complexity when it comes to multimedia. Sometimes the same term means different things, so it’s very easy to mix up concepts and get lost. There are also a lot of different codecs, containers, and file formats. Part of the reason for this ambiguity is a lack of standardization; another part is that multimedia itself is simply not a trivial subject.

ℹ️ In this post, I’m going to dive into video and audio codecs and containers, and describe everything you need to know about them. We’ll understand why you need both to watch videos on the Web or to publish Instagram stories. We’ll also take a look at modern codecs and containers that are used on the Internet.



What is a Codec?

Generally speaking, a codec is hardware or software, or even a combination of both, that transforms a signal or data stream from its original form into a format convenient for transmission or storage (encoding), and back (decoding).

There is no strict definition of a codec, and we can apply the term to both analog and digital signals.

Let’s take a look at both and see how they differ.

Coding Analog Signals

If we talk about analog signals, for example, our voice, we could call a codec (though this is not so common) the hardware circuit that performs analog-to-digital (ADC) and digital-to-analog (DAC) conversion.

For example, such a device may implement the PCM (pulse-code modulation) method for encoding analog signals into a RAW digital data stream in order to digitally record our voice. Then, if we want to play it back, we decode our digital voice into analog form so we can hear it.
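To make PCM more concrete, here is a minimal sketch in Python (purely illustrative, not a real ADC) that samples a 440 Hz sine wave and quantizes each value to a 16-bit integer, which is essentially what PCM encoding does:

```python
import math
import struct

SAMPLE_RATE = 8000   # samples per second (telephony-grade PCM)
DURATION = 0.01      # seconds of "voice" to record
FREQUENCY = 440.0    # Hz, the analog tone we pretend to capture

# Sampling: measure the continuous signal at discrete points in time.
# Quantization: map each measurement to a 16-bit signed integer.
samples = []
for n in range(int(SAMPLE_RATE * DURATION)):
    t = n / SAMPLE_RATE
    analog_value = math.sin(2 * math.pi * FREQUENCY * t)  # range [-1.0, 1.0]
    samples.append(int(analog_value * 32767))             # 16-bit signed range

# RAW PCM is just these integers packed back to back (little-endian here).
raw_pcm = struct.pack("<%dh" % len(samples), *samples)
print(len(samples), "samples,", len(raw_pcm), "bytes of RAW PCM")
```

Decoding (DAC) is the reverse: the integers are turned back into a continuous voltage that drives a speaker.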

For video, after analog-to-digital conversion we get a series of RAW images, for example, in RGB 4:4:4 or YUV 4:2:2.


This may sound strange, but although RAW digital data is usually considered uncompressed, it may actually be “compressed” in a sense, because some data is lost during the conversion. In many cases, we can choose the conversion parameters so that there is no data loss, or so little that we can neglect it.

For audio, we have a sample size, which determines how many bits are used to represent each audio sample, and a sampling rate, which is how many times per second the audio is measured. The lower these values, the lower the quality and the less space the digital data occupies.

For video, we have basically the same characteristics as a still image: a resolution (usually width × height in pixels), a bit depth (the amount of color information stored in an image), and a color space (for example, RGB or YUV), plus a frame rate, which is the number of images captured per second.
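These parameters translate directly into the size of the RAW data. A quick back-of-the-envelope calculation (the numbers are just common examples):

```python
# Uncompressed CD-quality audio: 44.1 kHz sampling rate,
# 16-bit samples, 2 channels (stereo).
audio_bps = 44_100 * 16 * 2
print("RAW audio:", audio_bps / 1_000_000, "Mbit/s")  # ~1.4 Mbit/s

# Uncompressed 1080p video: 1920x1080 pixels, 8-bit RGB
# (3 bytes per pixel), 30 frames per second.
video_bps = 1920 * 1080 * 3 * 8 * 30
print("RAW video:", video_bps / 1_000_000, "Mbit/s")  # ~1493 Mbit/s
```

Numbers like these are exactly why RAW data is rarely stored or transmitted directly, and why the more sophisticated encoding described next exists.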

In summary, after analog-to-digital conversion we get a digital signal, so-called RAW data, which we consider uncompressed. In fact, we may get a kind of “compression” during the analog-to-digital conversion itself, but we don’t call it that. We can then encode the RAW data using more sophisticated algorithms to achieve higher compression levels.

Coding Digital Signals

Now, if we talk about an already digitized signal, audio or video, the codec is a hardware circuit or software program that does compression/decompression (and sometimes even encryption/decryption) of a digital data stream. There is a wide variety of coding algorithms that differ in their implementation details and compression levels.

Returning to our voice digitized with the analog-to-digital encoder from the previous section, we could compress the resulting data to reduce transmission bandwidth or required storage space, for example, by encoding it in the MP3 or AAC format and storing it in a file. Video could be encoded, for example, in VP9.
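In practice you rarely implement an encoder yourself; you call an existing one. Here is a minimal sketch using Python to drive the ffmpeg command-line tool (assuming ffmpeg is installed; input.wav and input.mp4 are hypothetical input files):

```python
import subprocess

# Compress RAW/PCM audio into AAC.
subprocess.run(
    ["ffmpeg", "-i", "input.wav", "-c:a", "aac", "voice.aac"],
    check=True,
)

# Compress video into VP9 (libvpx-vp9 is ffmpeg's VP9 encoder);
# Opus is used here for the audio track of the resulting WebM file.
subprocess.run(
    ["ffmpeg", "-i", "input.mp4",
     "-c:v", "libvpx-vp9", "-c:a", "libopus", "video.webm"],
    check=True,
)
```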

A codec is an encoder and a decoder in one. If hardware or software implements only one of the functions (either encoding or decoding), we call it, respectively, an encoder or a decoder. For example, we can say that your digital camera contains an H265 encoder, while your smartphone contains both an H265 encoder and a decoder.

Lossless and Lossy Compression

Digital compression can be either lossless or lossy.

Lossless compression reduces data size in such a way that the data can later be reconstructed without any quality loss.

There are almost no lossless video codecs in common use, but lossless audio is quite popular.

Lossy compression sacrifices quality to achieve a higher level of compression and thus reduces data size even more.

Almost all video coding formats are lossy. For audio, both lossy and lossless formats exist.
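Here is a tiny illustration of the difference. zlib is a general-purpose lossless compressor, and dropping the low bits of every byte crudely stands in for what a lossy codec does in a far smarter way:

```python
import os
import zlib

data = os.urandom(100_000)  # stand-in for a hard-to-compress sample buffer

# Lossless: after compress + decompress we get back the exact same bytes.
compressed = zlib.compress(data)
assert zlib.decompress(compressed) == data
print("lossless:", len(compressed), "bytes (nothing lost, little gained)")

# Lossy (crude simulation): throw away the 4 low bits of every byte.
# Half of the information is gone for good, but now it compresses well.
lossy = bytes(b & 0xF0 for b in data)
print("lossy:", len(zlib.compress(lossy)), "bytes")
```

Real lossy codecs don’t just drop bits, of course; they throw away the information our eyes and ears are least likely to notice.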

Since you can encode a digital signal in many different ways, there are lots of options for how exactly to do it. The exact way of coding is described by a coding format specification.


What is a Coding Format?

A coding format (or, sometimes, compression format) is a specification that describes exactly how the encoding and decoding process should work. Naturally, there are many different methods for compressing video and audio.

Many coding formats were developed to be standards. For example, H264 is a video coding standard and a coding format, and AAC is an audio coding standard. There are quite a lot of coding formats; the best-known and most widely used ones are listed below.

Video Coding Formats

  • H264 – AVC, Advanced Video Coding, or MPEG-4 Part 10. As of 2019, the most commonly used and widely supported video coding standard. First published in 2003 and still a key technology. Supported by all major browsers.
  • H265 – HEVC, High-Efficiency Video Coding, or MPEG-H Part 2. It’s the successor to H264 and offers 25% to 50% better compression. It was first published in 2013. Supported in Safari, but not in Chrome.
  • VP9 – an open and royalty-free format developed by Google. It mainly competes with H265 but has more widespread support in web browsers. Supported in Chrome, but not in Safari.
  • AV1 – an open and royalty-free format designed to be the new video standard for the Internet. It can compress more than 20% better than VP9 and H265 and more than 50% better than H264. AV1 adoption is at an early stage; it should be supported by all major browsers in the near future.
  • H266 – VVC, Versatile Video Coding, or MPEG-I Part 3. This is a future standard intended to be the successor to H265.

Audio Coding Formats

  • MP3 – MPEG-1/MPEG-2 Audio Layer III. A popular but quite old audio coding format, initially released in 1993. Supported by all major browsers.
  • AAC – Advanced Audio Coding. The successor to MP3, created in 1997; it generally achieves better quality than MP3 at the same bit rate. Supported by all major browsers.
  • FLAC – Free Lossless Audio Codec, provides lossless audio compression.
  • Vorbis – a free and open-source format created by the Xiph.Org Foundation.
  • Opus – a free audio coding format with latency low enough for real-time interactive communication. It has wide support in web browsers.
  • G.722 – an old ITU-T audio coding standard approved in 1988. Its patents have expired, so it is freely available. Some real-time conferencing applications may still use it; for example, WebRTC implementations are required to support this audio codec.

Difference Between Codec and Coding Format

There is some confusion around the terms “codec” and “coding format”. People tend to call a coding format a codec, which is not strictly correct. For example, H264 is a coding format (a specification), while x264 and OpenH264 are codecs (software implementations). There may be many implementations of the same spec, both hardware and software.

Still, if you keep in mind that codecs and coding formats are different things, it’s usually safe to refer to one by the other’s name.


What is a Container?

A container (media container, or digital container format) is a wrapper for an encoded data bitstream alongside metadata used to identify the coding format of the contained data and to do a couple of other things.

If you just take the binary output (called a bitstream) of, say, an H264 encoder and put it into a file, another user won’t know what it is or how to open it. So it’s very handy to include this information alongside the video and audio data. That’s what a container does: it wraps the bitstream and adds meta-information.

Another important function of a container is to store several encoded data streams, or tracks. For example, when we watch a movie, we usually have both video and audio playing at the same time. So a video file or video stream should contain both video and audio, plus information used to synchronize them. Usually, video and audio are split into packets (chunks, atoms, or segments) with assigned timestamps. That way, the video player knows exactly when it should decode and play each piece of audio or video to keep them synchronized and smooth.
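Conceptually, a container is just metadata plus a sequence of tagged, timestamped packets. Here is a toy sketch of the idea in Python (real containers like MP4 are far more elaborate, with headers, indexes, and so on):

```python
from dataclasses import dataclass

@dataclass
class Packet:
    track: str         # which stream the chunk belongs to: "video" or "audio"
    timestamp_ms: int  # when the player should present it
    payload: bytes     # a piece of the encoded bitstream (H264, AAC, ...)

# Metadata identifies the coding format of each track,
# and packets are interleaved in timestamp order.
metadata = {"video": "h264", "audio": "aac"}
packets = sorted(
    [
        Packet("video", 0, b"<keyframe>"),
        Packet("audio", 0, b"<aac frame>"),
        Packet("audio", 21, b"<aac frame>"),
        Packet("video", 33, b"<frame>"),
    ],
    key=lambda p: p.timestamp_ms,
)

for p in packets:
    print(f"t={p.timestamp_ms:3d}ms: decode {metadata[p.track]} ({p.track})")
```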

In most cases, a container holds one video stream and one audio stream. Sometimes it also contains an additional data stream with closed captions. Most containers are capable of handling many streams, say, several audio streams with audio tracks in different languages, so a user can select which audio track to listen to while watching the same video track.

Media container formats differ a lot: some can handle only certain kinds of data or certain coding formats, while others can handle almost everything. Some are widely adopted on the Internet, and others are not.

Well-known Media Containers

  • MP4 – MPEG-4 Part 14, with the ‘.mp4’ filename extension. Very popular and widely used on the Internet and in HTML5 video. It usually contains H264 or H265 video and AAC audio, but is not limited to them. You can find a full list of “registered” codecs here, but most software does not support all of them.
  • Matroska – a free and open media container format, usually with the ‘.mkv’ extension. Designed to hold any type of audio and video codec.
  • WebM – a royalty-free container format derived from Matroska, with the ‘.webm’ extension. It is sponsored by Google and intended for use in HTML5 as an alternative to MP4. It usually holds VP8/VP9 or AV1 video and Vorbis or Opus audio.
  • MPEG-TS – MPEG transport stream, usually with the ‘.ts’ extension. Used in DVB and IPTV broadcast systems. Usually holds H264/H265 and AAC, but is not limited to them.
  • FLV – Adobe’s media container for Flash video. Since Flash is now considered deprecated, FLV is becoming obsolete as well. FLV may be used as a file with the ‘.flv’ extension or to stream media via the RTMP protocol; read more about RTMP and how it works in this blog post. FLV usually holds H264 and AAC streams.
  • AVI – Audio Video Interleave, created by Microsoft in 1992. AVI files have the ‘.avi’ extension. This is an old container format with many problems that the newer containers listed above have solved.
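To check for yourself which container and codecs a particular file uses, you can ask ffprobe, which ships with ffmpeg. A small sketch (assuming ffprobe is installed and movie.mp4 is a hypothetical file):

```python
import json
import subprocess

# Ask ffprobe to dump the container and stream information as JSON.
result = subprocess.run(
    ["ffprobe", "-v", "quiet", "-print_format", "json",
     "-show_format", "-show_streams", "movie.mp4"],
    capture_output=True, text=True, check=True,
)
info = json.loads(result.stdout)

print("container:", info["format"]["format_name"])
for stream in info["streams"]:
    print(stream["codec_type"], "->", stream["codec_name"])  # e.g. video -> h264
```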

Streaming Media

We can use almost any coding format for streaming. A media container specification, on the other hand, usually describes the binary format of a file. But it may also support streaming, so the same container can be suitable for carrying data over the network as a continuous stream. Some examples:

  • MPEG-TS container data can be saved as a .ts file or live-streamed over the network using TCP, UDP, or SRT.
  • MP4 or MPEG-TS segments may be transferred over HTTP using the MPEG-DASH or HLS protocols.
  • FLV container data may be saved as a .flv file or live-streamed using the RTMP protocol.
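Note that switching containers does not require re-encoding. For example, re-wrapping the H264/AAC streams of an MP4 file into FLV and live-streaming them over RTMP can be done with ffmpeg (a sketch assuming ffmpeg is installed, with a hypothetical RTMP endpoint):

```python
import subprocess

# "-c copy" keeps the encoded bitstreams as-is and only changes the
# container: MP4 on disk becomes FLV over RTMP on the wire.
subprocess.run(
    [
        "ffmpeg",
        "-re",            # read input at its native frame rate (live pacing)
        "-i", "movie.mp4",
        "-c", "copy",     # no re-encoding, just re-wrapping
        "-f", "flv",      # output container format
        "rtmp://example.com/live/stream-key",  # hypothetical endpoint
    ],
    check=True,
)
```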

While you can send any file over the network, that may not be optimal for low-delay live streaming or conferencing. In those cases, we usually use specialized streaming protocols like RTMP, SRT, RTP, and others.

Sometimes multimedia can be transferred even without a container. A good example is the WebRTC protocol. It uses the RTP protocol under the hood to transfer video and audio in separate connections, and it does not require a container. Instead, the RTP protocol has specification extensions that describe how to transfer each concrete video or audio data format. For example, RFC6184 describes the H264 payload format for RTP, and RFC7587 describes the one for Opus.
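In this mode, the fixed 12-byte RTP header takes over some of the container’s duties: identifying the stream, ordering packets, and carrying timestamps. A minimal sketch of parsing it (per RFC 3550; the example packet is fabricated):

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed 12-byte RTP header (RFC 3550)."""
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,          # always 2 for modern RTP
        "marker": (second >> 7) & 1,    # e.g. marks the last packet of a frame
        "payload_type": second & 0x7F,  # which coding format the stream carries
        "sequence": seq,                # for ordering and loss detection
        "timestamp": ts,                # media clock, used for synchronization
        "ssrc": ssrc,                   # identifies the stream source
    }

# Fabricated example: version 2, dynamic payload type 96, sequence number 1.
example = struct.pack("!BBHII", 0x80, 96, 1, 3000, 0xDEADBEEF) + b"<payload>"
print(parse_rtp_header(example))
```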

You may be interested in how to implement video/audio synchronization in this case. It’s an interesting question and I hope to cover it in a separate blog post soon.


Codecs, Containers, and Licensing

As you can see from the previous sections, some containers were created to hold only a subset of coding formats, so although there is a close relation between codecs and containers, they are totally different things. Despite the large number of different codecs and containers, one main dividing line can be traced. Currently, there are two big “camps”:

  • Patented MPEG coding standards: H264, H265, and AAC (though with the open container standard MP4). Anyone who wants to use these coding formats in their product must pay licensing royalties to MPEG LA, and these licensing questions are fairly complex.
  • Royalty-free open coding standards: VP8/VP9/AV1 and Opus/Vorbis (with the accompanying containers Matroska/WebM). Anyone can use them for free.

Future of Codecs and Containers on the Web

Currently, codec support across browsers and devices is quite fragmented. For example, Chrome supports H264 and VP8/VP9, but not HEVC. Safari supports H264 and HEVC, but not VP8/VP9. It’s almost the same story for audio with the AAC, Vorbis, and Opus codecs. Here and here are nice tables of all major browsers and the codecs/containers they support.

As a result, companies have to provide the same media content in different formats to ensure every Internet user can view it. That’s hard and requires a lot of extra work, and it’s exactly the problem the Alliance for Open Media was created to solve.

So the future of multimedia on the Web seems pretty clear. The Alliance for Open Media is developing the royalty-free, state-of-the-art AV1 codec, aimed to be supported in all Web technologies, i.e., HTML5 video and WebRTC. It has serious members on board: Amazon, Apple, Google, IBM, Microsoft, Cisco, Facebook, and others; just take a look here.

The broadcasting industry may well continue to use MPEG standards, while the Internet switches to open ones.


Summary

We’ve seen how video and audio get encoded and then packaged into a container to be stored or transmitted over the network. We discussed the difference between a codec and a coding format, and between a codec and a container. We also took a brief look at the most popular codecs and containers.

I hope this topic is clearer for you now and that you’ll feel more comfortable when dealing with coding formats and containers.


Thank you for reading!

If you have suggestions on how to improve this post, have any questions, or need a consultation, please feel free to send me a message.

You can find my contacts in the main page header.