An application that can read your emails and automatically download attachments is a handy tool. In this tutorial, you will learn how to use the built-in imaplib module to list and read your emails in Python; for that, we will rely on the IMAP protocol.
IMAP is an Internet standard protocol used by email clients to retrieve email messages from a mail server. Unlike the POP3 protocol, which typically downloads emails and deletes them from the server (so they are then read offline), IMAP keeps the messages on the server instead of storing them only on the local computer.
If you want to read emails with Python using some sort of API instead of the standard
imaplib, you can check the tutorial on using Gmail API, where we cover that.
To get started, we don't have to install anything. All the modules used in this tutorial are the built-in ones:
import imaplib
import email
from email.header import decode_header
import webbrowser
import os

# account credentials
username = "firstname.lastname@example.org"
password = "yourpassword"
# use your email provider's IMAP server, you can look for your provider's IMAP server on Google
# or check this page: https://www.systoolsgroup.com/imap/
# for office 365, it's this:
imap_server = "outlook.office365.com"

def clean(text):
    # clean text for creating a folder
    return "".join(c if c.isalnum() else "_" for c in text)
We've imported the necessary modules and then specified the credentials of our email account. Since I'm testing this on an Office 365 account, I've used
outlook.office365.com as the IMAP server; you can check this link, which contains a list of IMAP servers for the most commonly used email providers.
We need the
clean() function later to create folders without spaces and special characters.
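To make the effect of clean() concrete, here is a small sketch that applies it to two subject lines. The second subject is made up purely for illustration.

```python
def clean(text):
    # replace every character that is not a letter or a digit with an underscore
    return "".join(c if c.isalnum() else "_" for c in text)

# sample subjects; the second one is hypothetical
subjects = [
    "A Test message with attachment",
    "Re: Invoice #42 (final).pdf",
]
folder_names = [clean(s) for s in subjects]
for subject, folder in zip(subjects, folder_names):
    print(f"{subject!r} -> {folder!r}")
```

Keep in mind that different subjects can end up with the same folder name: "Hi!" and "Hi?" both become "Hi_".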
First, we need to connect to the IMAP server:
# create an IMAP4 class with SSL
imap = imaplib.IMAP4_SSL(imap_server)
# authenticate
imap.login(username, password)
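When the credentials are rejected, imap.login() raises imaplib.IMAP4.error. The sketch below shows one possible way to wrap the connection step so that a failed login produces a readable error. The open_mailbox() helper, its factory parameter, and the environment variable names in the usage comment are my own assumptions and not part of imaplib.

```python
import imaplib

def open_mailbox(server, username, password, factory=imaplib.IMAP4_SSL):
    """Connect and log in. `factory` is a hypothetical hook that makes the
    helper easy to try out without contacting a real server."""
    imap = factory(server)
    try:
        imap.login(username, password)
    except imaplib.IMAP4.error as exc:
        # the login was rejected: end the session and report what happened
        imap.logout()
        raise RuntimeError(f"Login to {server} failed: {exc}") from exc
    return imap

# Possible usage, reading credentials from (hypothetical) environment variables
# instead of hard-coding them in the script:
#   import os
#   imap = open_mailbox("outlook.office365.com",
#                       os.environ["IMAP_USERNAME"],
#                       os.environ["IMAP_PASSWORD"])
```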
Note: From May 30, 2022, Google no longer supports the use of third-party apps or devices which ask you to sign in to your Google Account using only your username and password. Therefore, this code won't work for Gmail accounts. If you want to interact with your Gmail account in Python, I highly encourage you to use the Gmail API tutorial instead.
If everything went okay, then you have successfully logged in to your account. Let's start getting emails:
status, messages = imap.select("INBOX")
# number of top emails to fetch
N = 3
# total number of emails (select returns it as a list with one bytes item)
messages = int(messages[0])
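As a side note on the snippet above, imap.select() returns a pair: a status string and a list whose single bytes item holds the message count. Here is a minimal sketch of converting it, using a made-up response instead of a live server; the message_count() helper is hypothetical.

```python
def message_count(select_response):
    # select_response is the (status, data) pair returned by imap.select()
    status, data = select_response
    if status != "OK":
        raise RuntimeError(f"could not select mailbox: {data!r}")
    # data[0] is a bytes value such as b"42"; int() accepts it directly
    return int(data[0])

sample_response = ("OK", [b"42"])  # illustrative values, not real server output
total = message_count(sample_response)
print(total)
```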
We've used the
imap.select() method, which selects a mailbox (Inbox, spam, etc.); here we've chosen the INBOX folder. You can use the
imap.list() method to see the available mailboxes.
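imap.list() returns a status and a list of raw bytes lines, one per mailbox. Here is a rough sketch of pulling the mailbox names out of such lines. The sample lines are hand-written, and the parse_list_line() helper and its regular expression are my own assumptions; they will not cover every server (for example, names sent as literals or encoded in modified UTF-7).

```python
import re

# One LIST response line roughly looks like: (flags) "delimiter" "mailbox name"
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')

def parse_list_line(line):
    """Return (mailbox_name, flags) for one line, or None if it does not match."""
    match = LIST_LINE.match(line)
    if match is None:
        return None
    name = match.group("name").decode().strip('"')
    flags = match.group("flags").decode().split()
    return name, flags

# illustrative stand-in for the second value returned by imap.list()
sample_lines = [
    b'(\\HasNoChildren) "/" "INBOX"',
    b'(\\HasNoChildren \\Junk) "/" "Junk Email"',
]
mailboxes = [parse_list_line(line) for line in sample_lines]
for name, flags in mailboxes:
    print(name, flags)
```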
The
messages variable holds the total number of messages in that folder (the inbox), returned as a list containing a single bytes value, and
status is just a message that indicates whether the mailbox was selected successfully. We then converted
messages into an integer so we could use it in a for loop. The
N variable is the number of top email messages you want to retrieve; I'm going to use 3 for now. Let's loop over each email message, extract everything we need, and finish our code:
for i in range(messages, messages-N, -1):
    # fetch the email message by ID
    res, msg = imap.fetch(str(i), "(RFC822)")
    for response in msg:
        if isinstance(response, tuple):
            # parse a bytes email into a message object
            msg = email.message_from_bytes(response[1])
            # decode the email subject
            subject, encoding = decode_header(msg["Subject"])[0]
            if isinstance(subject, bytes):
                # if it's a bytes, decode to str (fall back to UTF-8 if no charset is given)
                subject = subject.decode(encoding or "utf-8")
            # decode email sender
            From, encoding = decode_header(msg.get("From"))[0]
            if isinstance(From, bytes):
                From = From.decode(encoding or "utf-8")
            print("Subject:", subject)
            print("From:", From)
            # if the email message is multipart
            if msg.is_multipart():
                # iterate over email parts
                for part in msg.walk():
                    # extract content type of email
                    content_type = part.get_content_type()
                    content_disposition = str(part.get("Content-Disposition"))
                    try:
                        # get the email body
                        body = part.get_payload(decode=True).decode()
                    except (AttributeError, UnicodeDecodeError):
                        # container parts have no payload; binary parts may not decode as text
                        body = None
                    if content_type == "text/plain" and "attachment" not in content_disposition:
                        # print text/plain emails and skip attachments
                        if body is not None:
                            print(body)
                    elif "attachment" in content_disposition:
                        # download attachment
                        filename = part.get_filename()
                        if filename:
                            folder_name = clean(subject)
                            if not os.path.isdir(folder_name):
                                # make a folder for this email (named after the subject)
                                os.mkdir(folder_name)
                            filepath = os.path.join(folder_name, filename)
                            # download attachment and save it
                            with open(filepath, "wb") as f:
                                f.write(part.get_payload(decode=True))
            else:
                # extract content type of email
                content_type = msg.get_content_type()
                # get the email body
                body = msg.get_payload(decode=True).decode()
                if content_type == "text/plain":
                    # print only text email parts
                    print(body)
            if content_type == "text/html" and body is not None:
                # if it's HTML, create a new HTML file and open it in browser
                folder_name = clean(subject)
                if not os.path.isdir(folder_name):
                    # make a folder for this email (named after the subject)
                    os.mkdir(folder_name)
                filename = "index.html"
                filepath = os.path.join(folder_name, filename)
                # write the file
                with open(filepath, "w") as f:
                    f.write(body)
                # open in the default browser
                webbrowser.open(filepath)
            print("="*100)
# close the connection and logout
imap.close()
imap.logout()
There is a lot to cover here. The first thing to notice is that we've used
range(messages, messages-N, -1), which counts down from the top: the newest email message has the highest ID, and the first (oldest) message has an ID of 1. If you want to extract the oldest email messages instead, you can count upward from 1 with something like range(1, N+1).
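One caveat with these ranges: if the mailbox holds fewer than N messages, the countdown reaches 0 or below, which is not a valid message number. The following sketch clamps both directions; newest_ids() and oldest_ids() are hypothetical helper names.

```python
def newest_ids(total, n):
    """Message numbers from newest to oldest, at most n of them."""
    return list(range(total, max(total - n, 0), -1))

def oldest_ids(total, n):
    """Message numbers from oldest to newest, at most n of them."""
    return list(range(1, min(n, total) + 1))

print(newest_ids(10, 3))
print(newest_ids(2, 3))
print(oldest_ids(10, 3))
```

Each number can then be passed to imap.fetch() as str(i), exactly as in the loop above.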
Second, we used the
imap.fetch() method, which fetches the email message by ID using the standard format specified in RFC 822.
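The value returned by imap.fetch() is a (status, data) pair in which data mixes tuples and plain bytes items; the raw message is the second element of each tuple. The sample below is a hand-written imitation of such a response, so treat the exact byte strings as assumptions. extract_messages() is a hypothetical helper.

```python
import email

# hand-written imitation of what imap.fetch("1", "(RFC822)") might return
sample_fetch = (
    "OK",
    [
        (
            b"1 (RFC822 {55}",
            b"From: sender@example.com\r\nSubject: Hello\r\n\r\nHi there!\r\n",
        ),
        b")",
    ],
)

def extract_messages(fetch_response):
    """Parse every tuple item of a fetch response into a Message object."""
    status, data = fetch_response
    return [
        email.message_from_bytes(item[1])
        for item in data
        if isinstance(item, tuple)
    ]

parsed = extract_messages(sample_fetch)
print(parsed[0]["From"], "|", parsed[0]["Subject"])
```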
After that, we parse the bytes returned by the
fetch() method into a proper Message object and use the
decode_header() function from the
email.header module to decode the subject of the email to human-readable Unicode.
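decode_header() returns a list of (fragment, charset) pairs, and a header can consist of more than one fragment. As a small sketch, the standard library's make_header() can join all fragments into one string; decode_mime_header() is a hypothetical wrapper name, and the encoded subject is made up for illustration.

```python
from email.header import decode_header, make_header

def decode_mime_header(value):
    """Decode every fragment of a header value and join them into one str."""
    return str(make_header(decode_header(value)))

encoded = "=?utf-8?q?Caf=C3=A9?="      # made-up encoded subject
fragments = decode_header(encoded)      # list of (bytes_or_str, charset) pairs
print(fragments)
print(decode_mime_header(encoded))
print(decode_mime_header("Hello"))      # plain headers pass through unchanged
```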
After printing the email sender and the subject, we want to extract the message body. We check whether the email message is multipart, which means it contains multiple parts. For instance, an email message can contain both
text/html and
text/plain parts, which means it has the HTML and plain text versions of the message.
It can also contain file attachments. We detect them by the
Content-Disposition header and save each one under a new folder created for that email message and named after its subject.
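To see how the parts and the Content-Disposition header look without touching a mail server, the following sketch builds a sample multipart message locally and walks over it. The message content, build_sample_message(), and describe_parts() are all hypothetical.

```python
import email
from email.message import EmailMessage

def build_sample_message():
    # a locally built message with plain text, HTML, and one attachment
    msg = EmailMessage()
    msg["Subject"] = "A Test message with attachment"
    msg["From"] = "sender@example.com"
    msg.set_content("There you have it!")
    msg.add_alternative("<p>There you have it!</p>", subtype="html")
    msg.add_attachment(b"not a real photo", maintype="application",
                       subtype="octet-stream", filename="photo.bin")
    return msg

def describe_parts(msg):
    """List (content_type, role, filename) for every non-container part."""
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only group other parts
        disposition = str(part.get("Content-Disposition"))
        role = "attachment" if "attachment" in disposition else "body"
        rows.append((part.get_content_type(), role, part.get_filename()))
    return rows

raw = build_sample_message().as_bytes()
parts = describe_parts(email.message_from_bytes(raw))
for row in parts:
    print(row)
```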
The msg object, which is the email module's
Message object, has many other fields to extract. In this example, we used only
From and
Subject. Call msg.keys() to see the available fields. You can, for instance, get the date the message was sent using msg["Date"].
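As a quick illustration of msg.keys() and msg["Date"], the sketch below parses a hand-written message and converts its date with email.utils.parsedate_to_datetime(). The header values are made up.

```python
import email
from email.utils import parsedate_to_datetime

# hand-written message, only for illustration
raw = (
    b"From: Python Code <sender@example.com>\r\n"
    b"Subject: Hello\r\n"
    b"Date: Mon, 30 May 2022 10:15:00 +0000\r\n"
    b"\r\n"
    b"Body text\r\n"
)
msg = email.message_from_bytes(raw)

fields = msg.keys()                           # header names in order of appearance
sent_at = parsedate_to_datetime(msg["Date"])  # a datetime.datetime object
print(fields)
print(sent_at.isoformat())
```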
After I ran the code for my test email account, I got this output:
Subject: Thanks for Subscribing to our Newsletter !
From: email@example.com
====================================================================================================
Subject: An email with a photo as an attachment
From: Python Code <firstname.lastname@example.org>
Get the photo now!
====================================================================================================
Subject: A Test message with attachment
From: Python Code <email@example.com>
There you have it!
====================================================================================================
So the code prints only
text/plain message bodies. It creates a folder for each email, which contains the attachments and the HTML version of the email. It also opens the HTML version in your default browser for each extracted email that has HTML content.
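One thing to be careful about when saving attachments: the file name comes from the sender and may contain path components such as "../". The sketch below is one possible way to keep only the final component before writing the file. save_attachments() is a hypothetical helper, and the sample message is built locally for illustration.

```python
import os
import tempfile
from email.message import EmailMessage

def clean(text):
    # same helper as in the tutorial: keep letters and digits only
    return "".join(c if c.isalnum() else "_" for c in text)

def save_attachments(msg, subject, root):
    """Save every attachment of msg under root/<cleaned subject>/ and return the paths."""
    saved = []
    folder = os.path.join(root, clean(subject))
    for part in msg.walk():
        if "attachment" not in str(part.get("Content-Disposition")):
            continue
        filename = part.get_filename()
        if not filename:
            continue
        # keep only the last path component so the file stays inside the folder
        safe_name = os.path.basename(filename.replace("\\", "/"))
        if not safe_name:
            continue
        os.makedirs(folder, exist_ok=True)
        path = os.path.join(folder, safe_name)
        with open(path, "wb") as f:
            f.write(part.get_payload(decode=True))
        saved.append(path)
    return saved

# locally built sample with a deliberately awkward file name
sample = EmailMessage()
sample["Subject"] = "An email with a photo as an attachment"
sample.set_content("Get the photo now!")
sample.add_attachment(b"not a real photo", maintype="application",
                      subtype="octet-stream", filename="../photo.bin")

with tempfile.TemporaryDirectory() as root:
    saved = save_attachments(sample, sample["Subject"], root)
    relative = [os.path.relpath(p, root) for p in saved]
    with open(saved[0], "rb") as f:
        content = f.read()
print(relative)
```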
Going to my email, I see the same emails that were printed in Python:
Awesome, I also noticed the folders created for each email:
Each folder has the HTML message (if available) and all the files attached to the email.
Awesome, now you can build your own email client using this recipe. For example, instead of opening each email on a new browser tab, you can build a GUI program that reads and parses HTML just like a regular browser, or maybe you want to send notifications whenever a new email is sent to you; the possibilities are endless!
One note, though: we haven't covered everything that the
imaplib module offers. For example, you can search for emails and filter by the sender address, subject, sending date, and more using the imap.search() method.
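As a rough sketch of what such a search could look like, the helper below builds an IMAP search criteria string from a sender, a subject, and a start date. build_criteria() and imap_date() are hypothetical names, and the sketch assumes the values contain no double quotes.

```python
from datetime import date

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def imap_date(day):
    # IMAP expects dates such as 30-May-2022, with English month abbreviations
    return f"{day.day:02d}-{_MONTHS[day.month - 1]}-{day.year}"

def build_criteria(sender=None, subject=None, since=None):
    """Combine the given filters into one criteria string for imap.search()."""
    terms = []
    if sender:
        terms.append(f'FROM "{sender}"')
    if subject:
        terms.append(f'SUBJECT "{subject}"')
    if since:
        terms.append(f"SINCE {imap_date(since)}")
    if not terms:
        return "ALL"
    return "(" + " ".join(terms) + ")"

criteria = build_criteria(sender="sender@example.com", since=date(2022, 5, 30))
print(criteria)

# With a logged-in connection and a selected mailbox, it could be used like this:
#   status, data = imap.search(None, criteria)
#   ids = data[0].split()   # message numbers as bytes, e.g. b"3"
```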
The official Python documentation covers the modules used in this tutorial: imaplib, email, and webbrowser.
Finally, if you're a beginner and want to learn Python, I suggest you take the Python For Everybody Coursera course, in which you'll learn a lot about Python. You can also check our resources and courses page to see the Python resources I recommend!
Happy Coding ♥