How to Compress and Decompress Files in Python

Learn how to compress and decompress files, folders and symbolic links in Python using gzip compression with the built-in tarfile module.
Abdou Rockikz · 5 min read · Updated May 2020 · Python Standard Library


A compressed file is a sort of archive that contains one or more files that have been reduced in size. Compressing files in modern operating systems is usually pretty simple. However, in this tutorial, you will learn how to compress and decompress files using the Python programming language.

You may ask: why would I learn to compress files in Python when there are already tools out there? Well, decompressing files programmatically without any manual clicks is extremely useful. For example, when downloading machine learning datasets, you may want a piece of code to download, extract and load them into memory automatically.

This tutorial can also help if you want to add a compression/decompression feature to your application, or if you have thousands of compressed files and want to decompress them in one go.

Related: How to Encrypt and Decrypt Files in Python.

Let's get started. We will be using the built-in tarfile module, so we don't have to install anything. You can optionally install tqdm just for printing progress bars:

pip3 install tqdm

Open up a new Python file and import the modules:

import tarfile
from tqdm import tqdm # pip3 install tqdm

Let's start with compression. The following function is responsible for compressing a list of files/folders (pass a one-element list to compress a single file or folder):

def compress(tar_file, members):
    """
    Adds files (`members`) to a tar_file and compress it
    """
    # open file for gzip compressed writing
    tar = tarfile.open(tar_file, mode="w:gz")
    # wrap the members with tqdm to show a progress bar
    progress = tqdm(members)
    for member in progress:
        # add file/folder/link to the tar file (compress)
        tar.add(member)
        # set the progress description of the progress bar
        progress.set_description(f"Compressing {member}")
    # close the file
    tar.close()

I called these files/folders members because that's what the documentation calls them.

First, we created and opened a new tar file for gzip-compressed writing (that's what mode="w:gz" stands for). Then we added each member to the archive, and finally we closed the tar file.
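The "w:gz" mode string is one of several that tarfile.open() accepts. As a rough sketch, the helper below picks a write mode from the archive's file extension. The names write_mode_for and WRITE_MODES are made up for illustration (they are not part of tarfile), and the mapping assumes you follow the usual .tar.gz / .tar.bz2 / .tar.xz naming:

```python
import tarfile

# Hypothetical mapping (not part of tarfile): archive suffix -> write mode.
WRITE_MODES = {
    ".tar.gz": "w:gz",    # gzip compression
    ".tgz": "w:gz",
    ".tar.bz2": "w:bz2",  # bzip2 compression
    ".tar.xz": "w:xz",    # lzma compression
    ".tar": "w",          # plain tar, no compression
}


def write_mode_for(tar_file):
    """Return the tarfile write mode that matches the archive's suffix."""
    for suffix, mode in WRITE_MODES.items():
        if tar_file.endswith(suffix):
            return mode
    raise ValueError(f"Unrecognized archive suffix: {tar_file}")
```

With this in place, tarfile.open(name, mode=write_mode_for(name)) would open the archive with the compression implied by its name.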

I've optionally wrapped members with tqdm to print a progress bar, which is useful when compressing a lot of files in one go.

That's it for compression. Now let's dive into decompression.

The function below decompresses a given archive file:

def decompress(tar_file, path, members=None):
    """
    Extracts `tar_file` and puts the `members` to `path`.
    If members is None, all members on `tar_file` will be extracted.
    """
    tar = tarfile.open(tar_file, mode="r:gz")
    if members is None:
        members = tar.getmembers()
    # wrap the members with tqdm to show a progress bar
    progress = tqdm(members)
    for member in progress:
        tar.extract(member, path=path)
        # set the progress description of the progress bar
        progress.set_description(f"Extracting {member.name}")
    # or use this
    # tar.extractall(members=members, path=path)
    # close the file
    tar.close()

First, we opened the archive file for reading with gzip compression. I added an optional parameter, members, in case we want to extract specific files rather than the whole archive. If members isn't specified, we get all files in the archive using the getmembers() method, which returns all the members of the archive as a Python list.
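To illustrate extracting only part of an archive, here is a minimal sketch that filters the list returned by getmembers() by file name. The function name extract_matching and the suffix-based selection are assumptions for the example, not something the tutorial's code defines:

```python
import tarfile


def extract_matching(tar_file, path, suffix):
    """
    Extract only the members of `tar_file` whose names end with `suffix`
    into `path`, and return the names that were extracted.
    (Hypothetical helper, shown for illustration only.)
    """
    with tarfile.open(tar_file, mode="r:gz") as tar:
        # getmembers() returns a list of TarInfo objects; keep the matches
        selected = [m for m in tar.getmembers() if m.name.endswith(suffix)]
        for member in selected:
            tar.extract(member, path=path)
    return [m.name for m in selected]
```

For example, extract_matching("compressed.tar.gz", "extracted", ".txt") would extract only the .txt members and leave everything else in the archive untouched.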

Then we extract each member using the extract() method, which extracts a member from the archive to the path directory we specified.

Note that we can alternatively use extractall() for that (which is preferred in the official documentation).
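As a sketch of the extractall() route, the function below extracts a whole archive in one call. It also passes filter="data" when the running Python supports extraction filters (added in Python 3.12 and in security updates of some earlier versions), which rejects members that would be written outside the target directory. The function name safe_extract_all is an assumption for this example, and the original tutorial does not use filters:

```python
import tarfile


def safe_extract_all(tar_file, path):
    """
    Extract every member of `tar_file` into `path` using extractall().
    (Hypothetical helper, shown for illustration only.)
    """
    with tarfile.open(tar_file, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # extraction filters are available: refuse unsafe members
            tar.extractall(path=path, filter="data")
        else:
            # older Python without filters: plain extractall()
            tar.extractall(path=path)
```

The filter is worth considering whenever the archive comes from a source you don't control, such as a downloaded dataset.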

Let's test this:

compress("compressed.tar.gz", ["test.txt", "folder"])

This will compress the test.txt file and the folder named folder in the current directory into a new tar archive called compressed.tar.gz, as shown in the following figure:

[Figure: Compressed file using tarfile module in Python]

If you want to decompress:

decompress("compressed.tar.gz", "extracted")

This will decompress the previous archive we just compressed to a new folder called extracted:

[Figure: Decompressed tar archive file using tarfile module in Python]

Okay, we are done! You can be creative with this.

In this tutorial, we explored compression and decompression using the tarfile module. You can also use the zipfile module to work with ZIP archives, the bz2 module for bzip2 compression, and the gzip or zlib modules for gzip files.
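For comparison, here is a rough sketch of what the same two operations might look like with zipfile. The function names zip_compress and zip_decompress are made up for this example. Unlike tar.add(), ZipFile.write() does not recurse into folders, so the sketch walks them with os.walk() (and, as written, skips empty folders):

```python
import os
import zipfile


def zip_compress(zip_file, members):
    """Write files and folders (`members`) into a deflate-compressed ZIP."""
    with zipfile.ZipFile(zip_file, mode="w",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        for member in members:
            if os.path.isdir(member):
                # ZipFile.write() does not recurse, so walk the folder
                for root, _dirs, files in os.walk(member):
                    for name in files:
                        zf.write(os.path.join(root, name))
            else:
                zf.write(member)


def zip_decompress(zip_file, path):
    """Extract every entry of `zip_file` into the `path` directory."""
    with zipfile.ZipFile(zip_file) as zf:
        zf.extractall(path)
```

Usage mirrors the tar version: zip_compress("compressed.zip", ["test.txt", "folder"]) followed by zip_decompress("compressed.zip", "extracted").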

Finally, if you're a beginner and want to learn Python, I suggest you take Master Python in 5 Online Courses from University of Michigan, in which you'll learn a lot about Python. Good luck!

Learn Also: How to Generate and Read QR Code in Python.

Happy Coding ♥
