How to Get the Size of Directories in Python

Calculating the size of a directory in bytes in Python and plotting a pie chart with matplotlib to see which subdirectory takes up the most space.
Abdou Rockikz · 4 min read · Updated Jan 2020 · General Python Topics


Have you ever wondered how to get a folder's size in bytes using Python? As you may already know, the os.path.getsize() function only returns the correct size for regular files, not for folders. In this quick tutorial, you will learn how to write a simple function that calculates the total size of a directory in Python.

Let's get started. Open up a new Python file and import the os module:

import os

The core function below calculates the total size of a directory, given its relative or absolute path:

def get_directory_size(directory):
    """Returns the `directory` size in bytes."""
    total = 0
    try:
        # print("[+] Getting the size of", directory)
        for entry in os.scandir(directory):
            if entry.is_file():
                # if it's a file, use stat() function
                total += entry.stat().st_size
            elif entry.is_dir():
                # if it's a directory, recursively call this function
                total += get_directory_size(entry.path)
    except NotADirectoryError:
        # if `directory` isn't a directory, get the file size then
        return os.path.getsize(directory)
    except PermissionError:
        # if for whatever reason we can't open the folder, return 0
        return 0
    return total

Notice that I used the os.scandir() function, which returns an iterator of entries (files or directories) in the given directory.

os.scandir() raises NotADirectoryError if the given path isn't a folder (for example, a regular file or a link to one), which is why we catch that exception and return the actual size of that file instead.

It also raises PermissionError if it cannot open the folder (such as protected system folders). In that case, we simply return 0.
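
A quick way to check this behavior is to run the function on a small tree with known file sizes. The snippet below is only a sketch: it repeats the function from above so it runs on its own, and build_sample_tree() is a hypothetical helper introduced here for illustration, not part of the tutorial.

```python
import os
import tempfile

def get_directory_size(directory):
    """Same logic as the tutorial function, repeated so this snippet is self-contained."""
    total = 0
    try:
        for entry in os.scandir(directory):
            if entry.is_file():
                total += entry.stat().st_size
            elif entry.is_dir():
                total += get_directory_size(entry.path)
    except NotADirectoryError:
        # `directory` is actually a file, so return its own size
        return os.path.getsize(directory)
    except PermissionError:
        return 0
    return total

def build_sample_tree(root):
    """Hypothetical helper: writes a 100-byte file in `root` and a 250-byte file in a subfolder."""
    file_path = os.path.join(root, "a.bin")
    with open(file_path, "wb") as f:
        f.write(b"x" * 100)
    nested = os.path.join(root, "sub")
    os.mkdir(nested)
    with open(os.path.join(nested, "b.bin"), "wb") as f:
        f.write(b"y" * 250)
    return file_path

with tempfile.TemporaryDirectory() as tmp:
    file_path = build_sample_tree(tmp)
    tree_size = get_directory_size(tmp)        # both files, found recursively
    file_size = get_directory_size(file_path)  # NotADirectoryError branch
    print(tree_size, file_size)  # 350 100
```

Passing a file path exercises the NotADirectoryError branch, while passing the temporary folder exercises the recursive branch.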

The function above returns the size in bytes, which is hard to read for large directories. So let's write a function that scales these bytes to kilobytes, megabytes, gigabytes, and so on:

def get_size_format(b, factor=1024, suffix="B"):
    """
    Scale bytes to its proper byte format
    e.g:
        1253656 => '1.20MB'
        1253656678 => '1.17GB'
    """
    for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
        if b < factor:
            return f"{b:.2f}{unit}{suffix}"
        b /= factor
    return f"{b:.2f}Y{suffix}"
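
To see how the factor parameter changes the result, here is a small sketch that calls the function with the default binary factor (1024) and with a decimal factor (1000). The function is repeated so the snippet runs on its own, and the variable names are illustrative.

```python
def get_size_format(b, factor=1024, suffix="B"):
    """Same function as above, repeated so this snippet is self-contained."""
    for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
        if b < factor:
            return f"{b:.2f}{unit}{suffix}"
        b /= factor
    return f"{b:.2f}Y{suffix}"

binary = get_size_format(1253656)                # divides by 1024 at each step
decimal = get_size_format(1253656, factor=1000)  # divides by 1000 at each step
small = get_size_format(512)                     # already below one kilobyte
print(binary, decimal, small)  # 1.20MB 1.25MB 512.00B
```

Operating systems and disk vendors differ on which convention they report, so it is worth knowing which factor you are using when comparing numbers.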

Alright, I'm gonna test this on my C drive (I know it's large):

get_size_format(get_directory_size("C:\\"))

This took about a minute and returned the following:

'100.91GB'

Now, what if I want to know which subdirectories are taking up most of this space? The following code not only calculates the size of each subdirectory, but also plots a pie chart using the matplotlib library (which you can install with pip3 install matplotlib) showing the size of each of them:

import matplotlib.pyplot as plt

def plot_pie(sizes, names):
    """Plots a pie chart where `sizes` are the wedge sizes and `names` are the wedge labels."""
    plt.pie(sizes, labels=names, autopct=lambda pct: f"{pct:.2f}%")
    plt.title("Different Sub-directory sizes in bytes")
    plt.show()

if __name__ == "__main__":
    import sys
    folder_path = sys.argv[1]

    directory_sizes = []
    names = []
    # iterate over all the entries (files and subdirectories) inside this path
    for directory in os.listdir(folder_path):
        directory = os.path.join(folder_path, directory)
        # get the size of this directory (folder)
        directory_size = get_directory_size(directory)
        if directory_size == 0:
            continue
        directory_sizes.append(directory_size)
        names.append(os.path.basename(directory) + ": " + get_size_format(directory_size))

    print("[+] Total directory size:", get_size_format(sum(directory_sizes)))
    plot_pie(directory_sizes, names)

The script takes the directory as a command-line argument:

python get_directory_size.py C:\
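
Note that reading sys.argv[1] directly raises IndexError when no argument is given. One possible alternative, shown here only as a sketch, is to use argparse from the standard library and fall back to the current directory. The parse_args() helper and the "." default are assumptions made for illustration, not part of the original script.

```python
import argparse

def parse_args(argv=None):
    """Hypothetical helper: read the folder path, defaulting to the current directory."""
    parser = argparse.ArgumentParser(description="Report sub-directory sizes.")
    parser.add_argument(
        "folder_path",
        nargs="?",      # makes the positional argument optional
        default=".",    # assumed default: the current directory
        help="directory to analyze (default: current directory)",
    )
    return parser.parse_args(argv)

# Explicit argument lists are passed here so the sketch does not depend on sys.argv
explicit = parse_args(["some/folder"])
fallback = parse_args([])
print(explicit.folder_path, fallback.folder_path)  # some/folder .
```

In the script itself, calling parse_args() with no argument would read the real command line, and folder_path would then come from the returned namespace.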

This will show a nice pie that looks something like this:

[Figure: Subdirectory Sizes in Python]

After seeing this chart, I know that the Users and Windows folders are taking up most of my C drive!
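
If matplotlib isn't available, the same sizes can also be read from a plain text report sorted from largest to smallest. The sketch below assumes two parallel lists like the directory_sizes and names lists built in the script. rank_by_size() is a hypothetical helper, and the sample values are made up for illustration.

```python
def rank_by_size(sizes, names):
    """Hypothetical helper: pair each name with its size and sort largest first."""
    return sorted(zip(names, sizes), key=lambda pair: pair[1], reverse=True)

# Made-up sizes in bytes, for illustration only
sample_names = ["docs", "media", "cache"]
sample_sizes = [200, 700, 100]

ranked = rank_by_size(sample_sizes, sample_names)
total = sum(sample_sizes)
for name, size in ranked:
    # left-align the name, right-align the size, show the share as a percentage
    print(f"{name:<10} {size:>6} bytes {size / total:>7.1%}")
```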

Alright, that's it for this tutorial. If you want to learn more about handling files and directories in Python, check this tutorial.

Read also: How to Transfer Files in the Network using Sockets in Python.

Happy Coding ♥
