A cryptographic hash function is a mathematical algorithm that maps data of arbitrary size to a bit string of a fixed size (a hash function) which is designed to also be one-way function, that is, a function which is infeasible to invert.
The input data is often called the “message”, and the output (the hash value or “hash”) is often called the “message digest” or simply the “digest”.
Cryptographic hash functions have many information-security applications, notably in digital signatures, message authentication codes (MACs), and other forms of authentication. They can also be used as ordinary hash functions, to index data in hash tables, for fingerprinting, to detect duplicate data or uniquely identify files, and as checksums to detect accidental data corruption.
An example application of a hash function is password verification. Storing all user passwords as cleartext can result in a massive security breach if the password file is compromised. One way to reduce this danger is to only store the hash digest of each password. To authenticate a user, the password presented by the user is hashed and compared with the stored hash (note that this approach prevents the original passwords from being retrieved if forgotten or lost, and they have to be replaced with new ones). The password is often concatenated with a random, non-secret salt value before the hash function is applied. The salt is stored with the password hash. Because users have different salts, it is not feasible to store tables of precomputed hash values for common passwords.
md5 -s 'hello world'
md5 /path/to/file
echo hello world | shasum
shasum /path/to/file
echo hello world | shasum -a 256
shasum -a 256 /path/to/file
Note: the flag
-a
default algorithm is1
(i.e. SHA1)Instead of piping a string you can run
shasum
and type string
But you’ll need to add a carriage return/line break and then…
Execute<ctrl-d>
(which registers an EOF on standard input)
echo hello world | openssl dgst -md5
echo hello world | openssl dgst -sha1
echo hello world | openssl dgst -sha256
Note: use
-hmac "some-key"
to convert algorithm into a HMAC
Encryption transforms data from a cleartext to ciphertext and back (given the right keys), and the two texts should roughly correspond to each other in size: big cleartext yields big ciphertext, and so on. “Encryption” is a two-way operation.
Hashes, on the other hand, compile a stream of data into a small digest (a summarized form: think “Reader’s Digest”), and it’s strictly a one way operation.
The Advanced Encryption Standard (AES) is a family of ciphers with different key and block sizes. The algorithm described by AES is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data.
The downside of symmetrical encryption is the key needs to be transported somehow without being compromised. This is the problem asymmetric encryption solves and is primarily used with online communication (SSL/TLS).
A salt is a random, non-secret value which is combined with a password before a hash function is applied.
Salts help combat the use of rainbow tables for cracking passwords. A rainbow table is a large list of pre-computed hashes for commonly used passwords. For a password file without salts, an attacker can go through each entry and look up the hashed password in the rainbow table. If the look-up is considerably faster than the hash function (which it often is), this will considerably speed up cracking the file. However, if the password file is salted, then the rainbow table would have to contain “salt . password” pre-hashed. If the salt is long enough and sufficiently random, this is very unlikely.
A Message Authentication Code (MAC) is a string of bits that is sent alongside a message. The MAC depends on the message itself and a secret key. No one should be able to compute a MAC without knowing the key. This allows two people who share a secret key to send messages to each without fear that someone else will tamper with the messages.
HMAC is a recipe for turning hash functions (such as MD5 or SHA256) into MACs. So HMAC-MD5 and HMAC-SHA256 are specific MAC algorithms, just like QuickSort is a specific sorting algorithm.
A checksum has a special purpose — it verifies or checks the integrity of data. “Good” checksums are easy to compute, and can detect many types of data corruptions (e.g. one, two, three erroneous bits).
A checksum for a string should include each and every bit, and order matters. This means that “aaaaaaaaaaba” would hash the same as “aaaaaaaaaaab” where as a checksum could identify the difference.
A hash “collision” occurs when two different data inputs generate the same resulting hash. The likelihood of this happening depends on which function you use.
For example:
md5
generates 128-bit hashessha1
generates 160-bit hashessha2
generates 224, 256, 384, or 512 bit hashes