Hashing algorithms are mathematical functions that convert data of any size into fixed-length values called hash values, hash codes, digests, or simply hashes. The output acts as a compact fingerprint of the original input. The most important property of a cryptographic hash is that it is one-way: it is computationally infeasible to recover the original input from its hash value alone.
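To make the "fixed-length" part concrete, here is a small sketch using Python's built-in hashlib module (covered in detail later in this tutorial). The sample inputs are arbitrary; the point is that SHA-256 returns the same output length no matter how large the input is, and the same input always produces the same hash.

```python
import hashlib

# Two inputs of very different sizes (arbitrary sample data)
short_input = b"hi"
long_input = b"hi" * 100_000  # 200,000 bytes

short_digest = hashlib.sha256(short_input).hexdigest()
long_digest = hashlib.sha256(long_input).hexdigest()

# SHA-256 always yields 32 bytes, shown as 64 hexadecimal characters
print(len(short_digest), len(long_digest))  # 64 64

# Hashing is deterministic: the same input always gives the same digest
print(hashlib.sha256(b"hi").hexdigest() == short_digest)  # True
```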
Now, you may be thinking: then what's the benefit of using hashing algorithms? Why not just use encryption? Well, even though encryption is important for protecting data (data confidentiality), sometimes it is important to be able to prove that no one has modified the data you're sending. By comparing hash values, you can tell whether a file has been modified since its hash was computed (data integrity).
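A minimal sketch of an integrity check, assuming the sender publishes a SHA-256 digest next to the data. The helper `sha256_hex` and the sample messages are hypothetical. Note that this only detects tampering if the digest itself reaches you through a trusted channel; an attacker who can replace both the data and its hash can defeat a plain hash check.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hypothetical helper: return the SHA-256 digest of data as hex."""
    return hashlib.sha256(data).hexdigest()

original = b"Transfer 100 coins to Alice"
# The sender publishes this digest alongside the data
published_digest = sha256_hex(original)

received_ok = b"Transfer 100 coins to Alice"
received_tampered = b"Transfer 900 coins to Alice"

# Recompute the hash on the receiving side and compare
print(sha256_hex(received_ok) == published_digest)        # True
print(sha256_hex(received_tampered) == published_digest)  # False
```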
That's not all: hash functions have many other uses, including digital signatures, public-key encryption, message authentication, password protection, and many other cryptographic protocols. In fact, whether you're storing files on a cloud storage system, using the Git version control system, connecting to an HTTPS website, connecting to a remote machine over SSH, or even sending a message on your mobile phone, there's a hash function somewhere under the hood.
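As one example of message authentication, the standard library's hmac module combines a secret key with a hash function such as SHA-256. This is a sketch only: the key below is a hypothetical placeholder (a real key should come from something like `secrets.token_bytes()`), and in practice the sender and receiver would compute the tag on different machines.

```python
import hashlib
import hmac

# Hypothetical shared secret key; generate a random one in real use
key = b"shared-secret-key"
message = b"Some text to hash"

# Sender computes an HMAC-SHA256 tag over the message
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# Receiver recomputes the tag and compares in constant time
expected = hmac.new(key, message, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, expected))  # True

# A tag computed with a different key does not match
forged = hmac.new(b"wrong-key", message, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, forged))  # False
```

Unlike the plain hash check, an attacker who alters the message cannot produce a matching tag without knowing the key.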
In this tutorial, we will use Python's built-in hashlib module to work with different hash algorithms. Let's get started:
import hashlib

# encode the string to bytes using UTF-8 (hash functions operate on bytes)
message = "Some text to hash".encode()
We're going to apply different hash algorithms to this message, starting with MD5:
# hash with MD5 (not recommended: MD5 is broken, and collisions can be found easily)
print("MD5:", hashlib.md5(message).hexdigest())
# hash with SHA-2 (SHA-256 & SHA-512)
print("SHA-256:", hashlib.sha256(message).hexdigest())
print("SHA-512:", hashlib.sha512(message).hexdigest())
SHA-256: 7a86e0e93e6aa6cf49f19368ca7242e24640a988ac8e5508dfcede39fa53faa2
SHA-512: 96fa772f72678c85bbd5d23b66d51d50f8f9824a0aba0ded624ab61fe8b602bf4e3611075fe13595d3e74c63c59f7d79241acc97888e9a7a5c791159c85c3ccd
SHA-2 is a family of hash functions that includes SHA-224, SHA-256, SHA-384, and SHA-512 (plus the truncated variants SHA-512/224 and SHA-512/256). You can also use hashlib.sha224() and hashlib.sha384(), although SHA-256 and SHA-512 are the most commonly used.
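A short sketch of the less common SHA-2 members. Besides the named constructors, `hashlib.new()` accepts an algorithm name as a string, which can be handy when the algorithm is chosen at runtime; `digest_size` reports the output length in bytes.

```python
import hashlib

message = "Some text to hash".encode()

# Named constructors for the less common SHA-2 members
print("SHA-224:", hashlib.sha224(message).hexdigest())
print("SHA-384:", hashlib.sha384(message).hexdigest())

# hashlib.new() selects an algorithm by name
for name in ("sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name, message)
    # digest_size is in bytes; multiply by 8 for the output length in bits
    print(f"{name}: {h.digest_size * 8} bits, {len(h.hexdigest())} hex chars")
```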
It's called SHA-2 (Secure Hash Algorithm 2) because it is the successor of SHA-1, which is now outdated: practical collision attacks against SHA-1 have been demonstrated. One motivation behind SHA-2 was to produce longer hashes, which provide higher security levels than SHA-1's 160-bit output.
Although SHA-2 is still widely used and no practical attacks against it are known, some researchers are concerned about its long-term security because its design is similar to that of SHA-1.
# hash with SHA-3
print("SHA-3-256:", hashlib.sha3_256(message).hexdigest())
print("SHA-3-512:", hashlib.sha3_512(message).hexdigest())
SHA-3-256: d7007c1cd52f8168f22fa25ef011a5b3644bcb437efa46de34761d3340187609
SHA-3-512: de6b4c8f7d4fd608987c123122bcc63081372d09b4bc14955bfc828335dec1246b5c6633c5b1c87d2ad2b777d713d7777819263e7ad675a3743bf2a35bc699d0
SHA-3 is unlikely to be broken any time soon. In fact, hundreds of skilled cryptanalysts have studied it without finding a practical attack on the full function.
What do we mean by "secure" in hashing algorithms? Hash functions have several security properties. One is collision resistance: it should be extremely hard for an attacker to find two distinct messages that hash to the same value.
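Collision resistance depends heavily on the output length. The sketch below deliberately weakens SHA-256 by keeping only its first 2 bytes (16 bits); the helper names are hypothetical and exist only for this illustration. By the birthday paradox, a generic collision search needs on the order of 2^(n/2) attempts for an n-bit output, so a 16-bit hash falls after a few hundred tries, while the full 256-bit output would need roughly 2^128.

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, n_bytes: int) -> bytes:
    """Hypothetical helper: keep only the first n_bytes of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_truncated_collision(n_bytes: int = 2):
    """Search for two distinct messages whose truncated digests are equal."""
    seen = {}  # truncated digest -> message that produced it
    for i in count():
        msg = f"message-{i}".encode()  # every candidate message is unique
        t = truncated_sha256(msg, n_bytes)
        if t in seen:
            return seen[t], msg, i + 1  # i + 1 = number of messages hashed
        seen[t] = msg

m1, m2, tries = find_truncated_collision(2)
print(m1, m2, tries)
```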
Pre-image resistance is another key property: given a hash value, it should be hard and time-consuming for an attacker to find any message (including the original one) that produces it.
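Using the same deliberately truncated hash (hypothetical helpers, illustration only), a brute-force pre-image search needs on the order of 2^n attempts for an n-bit output, compared with about 2^(n/2) for a collision. With 16 bits that is tens of thousands of guesses; the message it finds is generally not the original, just one that hashes to the same truncated value.

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, n_bytes: int) -> bytes:
    """Hypothetical helper: keep only the first n_bytes of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_truncated_preimage(target: bytes):
    """Brute-force any message whose truncated digest equals target."""
    for i in count():
        candidate = f"guess-{i}".encode()
        if truncated_sha256(candidate, len(target)) == target:
            return candidate, i + 1  # i + 1 = number of guesses made

# The attacker only sees the first 2 bytes (16 bits) of the digest
target = truncated_sha256(b"Some text to hash", 2)
found, tries = find_truncated_preimage(target)
print(found, tries)
```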
There are few incentives to upgrade to SHA-3: SHA-2 is still secure, and speed matters too, since SHA-3 isn't faster than SHA-2.
What if we want a hash function that is faster than SHA-2 and SHA-3 but, according to its designers, at least as secure as SHA-3? The answer lies in BLAKE2:
# hash with BLAKE2
# 256-bit BLAKE2 (BLAKE2s)
print("BLAKE2s:", hashlib.blake2s(message).hexdigest())
# 512-bit BLAKE2 (BLAKE2b)
print("BLAKE2b:", hashlib.blake2b(message).hexdigest())
BLAKE2s: 6889074426b5454d751547cd33ca4c64cd693f86ce69be5c951223f3af845786
BLAKE2b: 13e2ca8f6a282f27b2022dde683490b1085b3e16a98ee77b44b25bc84a0366afe8d70a4aa47dd10e064f1f772573513d64d56e5ef646fb935c040b32f67e5ab2
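hashlib's BLAKE2 constructors also accept optional parameters. This sketch uses two of them: `digest_size` (in bytes; 1 to 64 for BLAKE2b, 1 to 32 for BLAKE2s) and `key`, which turns BLAKE2 into a keyed hash usable as a MAC without HMAC. The key shown is a hypothetical placeholder.

```python
import hashlib

message = "Some text to hash".encode()

# digest_size (in bytes) sets the output length
short = hashlib.blake2b(message, digest_size=16)
print(short.hexdigest())  # 32 hex characters

# key makes BLAKE2 a keyed hash; BLAKE2b keys can be up to 64 bytes
# (the key below is a hypothetical placeholder, not a real secret)
keyed = hashlib.blake2b(message, key=b"my-secret-key", digest_size=32)
unkeyed = hashlib.blake2b(message, digest_size=32)
print(keyed.hexdigest() != unkeyed.hexdigest())  # True
```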
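BLAKE2 is faster than SHA-1, SHA-2, SHA-3, and even MD5 in software on modern 64-bit CPUs, and its designers claim it is even more secure than SHA-2. It is well suited to modern CPUs, and its parallel variants (BLAKE2bp and BLAKE2sp, which hashlib does not expose) can take advantage of multicore systems.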
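Speed claims are easy to check on your own machine with a rough benchmark like the one below. Treat the numbers as indicative only: they depend on your CPU, your Python version, and the OpenSSL build behind hashlib. On CPUs with dedicated SHA instructions, SHA-256 may well come out ahead of BLAKE2. On FIPS-restricted systems, `hashlib.new("md5")` may raise an error, in which case drop it from the list.

```python
import hashlib
import timeit

data = b"x" * 1_000_000  # 1 MB of arbitrary sample data

algorithms = ["md5", "sha1", "sha256", "sha512", "sha3_256", "blake2b", "blake2s"]

results = {}
for name in algorithms:
    # Time 20 full hashes of the 1 MB buffer; n=name binds the current algorithm
    seconds = timeit.timeit(lambda n=name: hashlib.new(n, data).digest(), number=20)
    results[name] = seconds

# Print fastest first; the ordering will vary between machines
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:>9}: {seconds:.3f} s")
```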
You can also hash an entire file by reading its content as bytes and passing it to any of the functions we covered.
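Reading a whole file into memory works for small files, but for large ones it is safer to feed the data to the hash object in chunks with `update()`, which gives the same result as hashing everything at once. The sketch below assumes a hypothetical `hash_file` helper and a 64 KiB chunk size (an arbitrary choice); it writes a temporary file only to demonstrate the function. On Python 3.11+, `hashlib.file_digest()` offers a built-in alternative.

```python
import hashlib
import os
import tempfile

def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Hypothetical helper: hash a file incrementally to limit memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:  # binary mode: hash the raw bytes
        # Read fixed-size chunks until read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo: write a small temporary file, then compare with a one-shot hash
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Some text to hash")
    tmp_path = tmp.name

chunked = hash_file(tmp_path, chunk_size=4)  # tiny chunks to exercise update()
whole = hashlib.sha256(b"Some text to hash").hexdigest()
print(chunked == whole)  # True
os.remove(tmp_path)
```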
To wrap up, for further reading, check the official Python documentation for the hashlib module. Also, if you're interested in cryptography, the book Serious Cryptography helped me a lot when learning crypto, and I hope it helps you too!
Happy Hashing ♥