Identifying Data
Security professionals often use hashes to represent data - think of a hash as a unique fingerprint or “key” for the data. While there are many ways to make data keys (we could assign them sequentially, or pick them at random), hashes provide a way to build a unique key from the data itself.
The purpose of a key is to allow us to reference a piece of data. Perhaps we need a key to identify movies; we could define a data key as:
- the first letter of each word in the title,
- the director’s initials,
- and the year of release.
So, Indiana Jones and the Temple of Doom, by Steven Spielberg (1984) would have the key: IJATTODSS1984.
This key is pretty simple and easy to reverse. Because we know the key (IJATTODSS1984) and how it’s made, we can identify the movie by searching the Internet for releases in 1984, and directors with the initials S.S. This key is also not guaranteed to be unique, and it’s not secure. If our goal isn’t security, this key could be perfectly cromulent.
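The key recipe above can be sketched in a few lines of Python. The function name `movie_key` and its parameters are made up for illustration; only the three-part rule (title initials, director's initials, year) comes from the text.

```python
def movie_key(title: str, director: str, year: int) -> str:
    """Build the simple movie key described above (neither secure nor unique)."""
    # First letter of each word in the title, upper-cased.
    title_initials = "".join(word[0].upper() for word in title.split())
    # First letter of each word in the director's name.
    director_initials = "".join(name[0].upper() for name in director.split())
    return f"{title_initials}{director_initials}{year}"


key = movie_key("Indiana Jones and the Temple of Doom", "Steven Spielberg", 1984)
print(key)  # IJATTODSS1984
```

Anyone who knows this recipe can run it forwards for every movie they can think of, which is exactly why the key is easy to reverse.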
Secure Hashes
A more thorough way of combining data is to use a secure hash algorithm. A secure hash is similar to baking a cake: we use a predetermined process for combining unique ingredients, and the result is the cake. If we follow the same process with the same ingredients, we get the same cake - every time. It’s also impossible to reverse; we can’t un-bake the cake.
Hashes are even more secure; in the cake analogy, examining a chocolate cake reveals that one of the ingredients is chocolate. With a secure hash, we can’t guess any ingredients based on the output. Furthermore, even a tiny change to an ingredient radically changes the hash output. This means we can also use hashes to detect modification:
Input | Hash (output) |
---|---|
Observe the Capital letter | yQ70NLk9NCoE3JLefr5i1beeaU7R4tAuMalRGj8PMtg |
Observe the capital letter | BlczCpjKFQ66sHZz15pwO1EZIw62rrR4fpbU9tewnpA |
Notice how much the output changed?
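To try this yourself, here is a minimal sketch. It assumes SHA-256 with unpadded, URL-safe base64 output; the article does not say which algorithm or encoding produced the table above, so the values printed here may not match it. The helper name `fingerprint` is hypothetical.

```python
import base64
import hashlib


def fingerprint(text: str) -> str:
    """Hash text with SHA-256 and return unpadded, URL-safe base64 (an assumed encoding)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


a = fingerprint("Observe the Capital letter")
b = fingerprint("Observe the capital letter")
print(a)
print(b)
print(a == b)  # False: one changed letter gives a different hash
```

Running `fingerprint` twice on the same string always returns the same value, which is what makes the output usable as a fingerprint.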
Secure hashes are used as fingerprints to represent other information. They are critical security tools because hash values can be used to identify unique data and verify content. We can hash anything we want, and for all practical purposes each different input produces a different hash output: two inputs that share a hash can exist in theory, but with a secure algorithm nobody can realistically find them.
Brute Force
While hashes are used in security, they aren’t secure alone. Because the process is deterministic, if the inputs are easy to guess, we don’t need to reverse the process; we just make guesses. We don’t un-bake the cake; instead we mix 1,000 different batches until we end up with the cake we want.
For example, if we simply ‘slap a hash on’ our movie key input, the output becomes jumbled:
Indiana Jones and the Temple of Doom:
IJATTDSS1984 - glTDKakoUPbVOd03b0gW7idkUX2l4CNVFK9DMWRIDXo
But it’s not secure, because we can guess the inputs. With a little computing power and a database, we can search all movies, directors and years, to rebuild all known hash combinations - this is called a brute force search. It’s harder than guessing Indiana Jones from the plain key, but still not difficult in terms of security.
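A brute force search over guessable inputs might look like the sketch below. It is illustrative only: the hash is assumed to be SHA-256 in unpadded base64 (the article does not name the algorithm), the helper names are hypothetical, the key is built with the recipe defined earlier, and the tiny candidate lists stand in for a real movie database.

```python
import base64
import hashlib
from itertools import product


def fingerprint(text):
    """SHA-256 as unpadded, URL-safe base64 (assumed algorithm and encoding)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def movie_key(title, director, year):
    """Title initials + director initials + year, as defined earlier."""
    initials = lambda s: "".join(w[0].upper() for w in s.split())
    return f"{initials(title)}{initials(director)}{year}"


def brute_force(target_hash, titles, directors, years):
    """Hash every candidate combination until one matches the target."""
    for title, director, year in product(titles, directors, years):
        if fingerprint(movie_key(title, director, year)) == target_hash:
            return title, director, year
    return None  # nothing in our candidate lists produced this hash


# Stand-in candidate lists; a real attack would load these from a database.
titles = ["Raiders of the Lost Ark", "Indiana Jones and the Temple of Doom", "Gremlins"]
directors = ["Joe Dante", "Steven Spielberg"]
years = range(1980, 1990)

target = fingerprint("IJATTODSS1984")  # the "secret" hash we want to unmask
print(brute_force(target, titles, directors, years))
# ('Indiana Jones and the Temple of Doom', 'Steven Spielberg', 1984)
```

Nothing here reverses the hash. The search only runs the hash forwards over every guess, which works whenever the space of possible inputs is small enough to enumerate.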
For example, IMDb contains 353,866 feature movies, and 441,281 directors. Pairing every movie with every director gives about 156 billion combinations to brute force. While 156 billion sounds like a big number, computers are fast. To re-generate all keys for every IMDb movie, my clunky old laptop would take a day. Or, for a few bucks I could hire a cloud-computing cluster and be done in an hour or two.
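The arithmetic behind that estimate is easy to check. The movie and director counts are the ones quoted above; the hash rate is not a measurement, just what "a day on a laptop" would imply.

```python
movies = 353_866
directors = 441_281

# Every movie paired with every director.
combinations = movies * directors
print(f"{combinations:,}")  # 156,154,342,346

# Hash rate implied by finishing in one day (derived, not benchmarked).
seconds_per_day = 24 * 60 * 60
implied_rate = combinations / seconds_per_day
print(f"{implied_rate:,.0f} hashes per second")  # roughly 1.8 million
```

A couple of million hashes per second is a modest figure for a fast hash on ordinary hardware, which is why the search space needs to be far larger than this before brute force stops being practical.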
The punchline is that hashes are an important security tool, but on their own they provide no security at all. The construction of the hash inputs is what’s important. If I can guess your inputs, I can find your hash.