File compression is a core part of how the web works. It allows us to transfer files that would otherwise take too much bandwidth and time. Whenever you access ZIP files or view JPEG images, you’re benefiting from file compression.
At some point, you've probably wondered how file compression actually works. Here's a basic look at the process.
What Does Compression Mean?
Simply put, file compression (or data compression) is the act of reducing the size of a file while preserving the original data. Doing so allows the file to take up less space on a storage device, in addition to making it easier to transfer over the internet or otherwise.
It’s important to note that compression is not infinite. While compressing a file into a ZIP reduces its size, you can’t keep compressing the file to further reduce the size to nothing.
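To see this limit for yourself, here is a minimal sketch using Python's built-in zlib module, which implements the same DEFLATE method that ZIP files commonly use. The sample text and the choice of four rounds are assumptions made purely for illustration.

```python
import zlib

# Hypothetical sample data: a short sentence repeated many times.
original = b"File compression removes redundancy. " * 200

sizes = [len(original)]
data = original
for _ in range(4):
    data = zlib.compress(data)  # compress the previous round's output again
    sizes.append(len(data))

# Undo the four rounds to confirm nothing was lost along the way.
restored = data
for _ in range(4):
    restored = zlib.decompress(restored)

print(sizes)                 # size in bytes: original, then after each round
print(restored == original)  # True: every round is reversible
```

The first round should shrink the data dramatically. Later rounds typically stop helping, and can even add a few bytes, because the compressed output has little redundancy left to remove.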
Generally, file compression is split into two main types: lossy and lossless. Let’s look at how both of these work in turn.
How File Compression Works: Lossy Compression
Lossy compression reduces file size by removing unnecessary bits of information. It’s most common in image, video, and audio formats, where a perfect representation of the source media isn’t necessary. Many common formats for these types of media use lossy compression; MP3 and JPEG are two popular examples.
An MP3 doesn’t contain all the audio information from the original recording—instead, it throws out some sounds that humans can’t hear. You wouldn’t notice them missing anyway, so removing that info results in a lower file size with basically no drawbacks.
Similarly, JPEGs remove non-vital parts of images. For instance, in a picture containing a blue sky, JPEG compression might change all the sky pixels to one or two shades of blue, instead of using dozens of different shades.
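The blue sky idea can be sketched in a few lines of Python. This is only an illustration of throwing away fine detail, not how JPEG really works: real JPEG encoders operate on blocks of frequency data rather than individual pixels. The quantize function, the pixel values, and the step size of 16 are all made up for the example.

```python
def quantize(shades, step):
    """Snap each 0-255 shade down to a multiple of `step` (a lossy operation)."""
    return [(value // step) * step for value in shades]

# Hypothetical row of "sky" pixels: 24 slightly different blue intensities.
sky = list(range(200, 224))
flattened = quantize(sky, 16)

print(len(set(sky)), "distinct shades before")
print(len(set(flattened)), "distinct shades after")
```

With fewer distinct shades, the row becomes far more repetitive and therefore easier to store compactly. The catch is that you can't get the original 24 shades back from the flattened version.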
However, the more heavily you compress a file, the more noticeable the drop in quality becomes. You've probably experienced this with muddy MP3 files uploaded to YouTube, where a heavily compressed version of a song sounds clearly worse than a high-quality copy of the same track.
Lossy compression is suitable when a file contains more information than you need for your purposes. For instance, let’s say you have a huge RAW image file. While you probably want to preserve that quality when printing the image onto a large banner, it’s pointless to upload the RAW file to Facebook.
The picture contains a lot of data that isn't noticeable when viewed on social media sites. Compressing the image to a high-quality JPEG throws out some information, but the image looks almost the same to the naked eye.
Lossy Compression in General Usage
As we've mentioned, lossy compression is great for most forms of media. Because of this, it's vital for companies like Spotify and Netflix that constantly transmit massive amounts of information. Reducing the file size as much as possible, while still preserving quality, makes their operation more efficient. Can you imagine if every video on YouTube were stored and transmitted in its original uncompressed format?
But lossy compression doesn’t work so well for files where all the information is crucial. For instance, using lossy compression on a text file or a spreadsheet would result in garbled output. You really can’t throw anything out without severely harming the final product.
When saving in a lossy format, you can often set the level of quality. For instance, many image editors have a slider to choose the quality of a JPEG from 0-100.
Saving at something like 90 or 80 percent reduces the file size quite a bit, with little difference to the eye. But saving in poor quality or repeatedly saving the same file in a lossy format will degrade it.
Below you can see an example of this. On the left is the original image downloaded from Pixabay as a JPEG. The middle image is the result of saving this as a JPEG at 50 percent quality. And the rightmost image shows the original image saved instead as a 10 percent quality JPEG.
At a quick glance, the middle image doesn’t look too bad. You can only notice the artifacts around the edges of the boxes if you zoom in. Of course, the rightmost image immediately looks terrible.
Before cropping for upload, the file sizes were 874KB, 310KB, and 100KB respectively.
How File Compression Works: Lossless Compression
Lossless compression is a way of reducing file size so that you can perfectly reconstruct the original file. Contrary to lossy compression, it doesn’t throw any information out. Instead, lossless compression essentially works by removing redundancy.
Let's take a basic example to show what this means. Below is a stack of 10 bricks: two blue, five yellow, and three red. This stack is a simple way to illustrate those bricks, but there's another way to do so.
Instead of showing all 10 bricks, we can remove all but one of each color. Then, if we use numbers to show how many bricks of each color there were, we've represented the exact same bit of information using far fewer bricks. Instead of 10 bricks, we now only need three.
This is a simple illustration of how lossless compression is possible. It stores the same information in a more efficient way by removing redundancy. Consider an actual file, where the below string:
mmmmmuuuuuuuoooooooooooo
Can “compress” to the following, much shorter form:
m5u7o12
This allows us to use seven characters instead of 24 to represent the same data, which is a significant saving.
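This trick of counting repeated characters is known as run-length encoding. Here is a minimal Python sketch of it. The function names are our own, and the decoder assumes the original text contains no digits, since a digit in the data would be confused with a count.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of a repeated character with the character and its count."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))

def rle_decode(encoded):
    """Expand character/count pairs back into the original runs."""
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(char * int(count) for char, count in pairs)

original = "m" * 5 + "u" * 7 + "o" * 12
encoded = rle_encode(original)

print(encoded)                          # m5u7o12
print(rle_decode(encoded) == original)  # True
```

Note that this only pays off when the data has long runs. A string like "abc" would encode to "a1b1c1", which is longer than the original.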
Lossless Compression in Everyday Use
As we mentioned above, lossless compression is important in cases where you can’t remove any of the original file. If you’ve been curious as to how ZIP files work, this is the answer.
When you create a ZIP file from a program executable in Windows, it uses lossless compression. The ZIP file compression is a more efficient way to store the program, but when you unzip (decompress) it, all the original information is present. If you used lossy compression to compress executables, the unzipped version would be damaged and unusable.
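You can check this round trip yourself with Python's built-in zipfile module. The sketch below builds a ZIP archive in memory, so nothing is written to disk. The bytes and the file name program.bin are hypothetical stand-ins for a real executable.

```python
import io
import zipfile

# Hypothetical stand-in for a program executable: any bytes will do.
original = bytes(range(256)) * 64

# Write the data into an in-memory ZIP archive using DEFLATE compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("program.bin", original)

# Reopen the archive and extract the file again.
with zipfile.ZipFile(buffer) as archive:
    restored = archive.read("program.bin")

print(restored == original)  # True: every byte survives the round trip
```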
Common lossless formats include PNG for images, FLAC for audio, and ZIP. Lossless formats for video are rare, because they would take up massive amounts of space.
When to Use Lossy vs. Lossless Compression
Now that we’ve looked at both forms of file compression, you might wonder when you should use one or the other. As it turns out, there is no “better” form of compression—it all depends on what you’re using the files for.
In general, you should use lossless compression when you want a perfect copy of the source material, and lossy compression when an imperfect copy is good enough. Let’s look at another example to see how they can work in harmony.
Say that you’ve just dug up your old CD collection and want to digitize it so you have all your music on your computer. When you rip your CDs, it makes sense to use a format like FLAC, which is lossless. This lets you have a master copy on your computer that’s as good as the original CD.
Later, perhaps you want to put some music on your phone or an old MP3 player so you can listen on-the-go. You probably don’t care about your music being in perfect quality for this, so you can convert the FLAC files to MP3. This gives you an audio file that’s still perfectly listenable, but doesn’t take up as much space on your mobile device. The quality of the MP3 converted from the FLAC will be as good as if you’d created a compressed MP3 right from the original CD.
The type of data represented in a file can also dictate which type of compression is best. Because PNG images use lossless compression, they offer small file sizes for images with lots of uniform space, like computer screenshots. However, you’ll notice that PNGs take up much more space when they represent the jumble of colors in real-world photos.
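PNG compresses its pixel data with DEFLATE, so Python's zlib module gives a rough feel for this difference. In the sketch below, a run of identical bytes stands in for a flat screenshot background, and seeded random bytes stand in for the jumble of a photo. Both are simplifying assumptions, since real photos are not pure noise and real PNG files apply extra filtering first.

```python
import random
import zlib

size = 50_000

# Stand-in for uniform space: the same byte value repeated.
uniform = bytes([255]) * size

# Stand-in for a busy photo: pseudo-random bytes (fixed seed for repeatability).
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(size))

print("uniform:", len(zlib.compress(uniform)), "bytes")
print("noisy:  ", len(zlib.compress(noisy)), "bytes")
```

The uniform data should shrink to a tiny fraction of its size, while the noisy data barely shrinks at all because there is almost no redundancy to remove.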
Concerns During File Compression
As we've seen, converting lossless formats to lossy is fine, as is converting one lossless format to another. However, you should never convert a lossy format to lossless, and you should be wary of converting one lossy format to another.
Converting lossy formats to lossless is simply a waste of space. Remember that lossy formats throw data out; it’s impossible to recover that data.
Say you have a 3MB MP3 file. Converting that to FLAC might result in a 30MB file, but those 30MB contain the exact sounds that the much smaller MP3 did. Converting back to a lossless format doesn’t “recover” the information that the MP3 compression threw out.
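The same thing can be shown with a toy example in Python. Here the lossy step simply clears the low four bits of each value, which is an invented stand-in for MP3 encoding and nothing like the real algorithm. The lossless step uses the built-in zlib module in place of FLAC.

```python
import zlib

# Hypothetical "recording": 1,000 sample values between 0 and 255.
original = bytes((i * 37) % 256 for i in range(1000))

# Lossy step (toy stand-in for MP3): discard the low 4 bits of every sample.
lossy = bytes(sample & 0xF0 for sample in original)

# Lossless step (zlib standing in for FLAC): pack and unpack the lossy data.
packed = zlib.compress(lossy)
restored = zlib.decompress(packed)

print(restored == lossy)     # True: the lossless step preserves what it was given
print(restored == original)  # False: the discarded detail does not come back
```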
Finally, as mentioned earlier, converting one lossy format to another (or repeatedly saving in the same format) will degrade the quality further. Every time you apply the lossy compression, you lose more detail. This becomes more and more noticeable until the file is essentially ruined.
How Does Compression Work? Now You Know
We’ve taken a look at both lossy and lossless compression to see how they work. Now you know how it’s possible to store a file at a smaller size than its original form, and how to choose the best method for your needs.
Of course, the algorithms that decide what data gets thrown out in lossy methods and how to best store redundant data in lossless compression are much more complicated than we’ve explained here. There’s a lot more to discover on this topic if you’re interested.