Tech explained: Hash puzzles and proofs of work

Following my blockchain Computerworld article, I’ve been getting quite a few questions about how the Bitcoin blockchain is protected by ‘difficulty’.  Mining blocks is hard, so, what are miners doing that uses so much time, effort and power? They’re proving their work, by solving hash-puzzles.

See also: Burning coin – estimating the energy use of the Bitcoin network in 2016.

Puzzles as Proofs of Work

Puzzles come in many shapes and forms: from the common jigsaw puzzle, to number puzzles, and even crosswords. Regardless of the type of puzzle, they all have some similarities: puzzles take time to solve, they have varying levels of difficulty, and most solutions are obvious and easy to check.

With most puzzles, it is possible to alter the difficulty by changing parameters. For example an easy jigsaw might have 100 pieces, while a difficult one has 5000 pieces.

The more difficult a puzzle is, the longer it will take to solve.

 Similarly, hash-puzzles take time to solve, vary in difficulty, and are easy to check. However, unlike solving a jigsaw puzzle, logic doesn’t help. Hash-puzzles are much harder, and they can only be solved by trial and error.

How a hash works

A hash is like a fingerprint. It’s a massive, random-looking number that uniquely identifies a piece of information. Given any electronic data, it is possible to generate its hash (which looks like a very long serial number). Hashes are infinitely sensitive to change, even a tiny change to the input results in a wildly different hash output.

See also: Tech explained: What is a hash, what is brute force and are hashes secure?

While hashes appear random and unpredictable, they are deterministic. For each input, only one hash output exists — every time I use the same hash inputs, the same output value will come out.  Hash-puzzles rely on both the random and deterministic properties of hashes to prove and verify work.

Solve this

If I challenge you to roll a dice until you saw a two: for each roll you’d have a 1/6 chance of throwing a 2. If I gave you two dice and wanted double twos, the likelihood of that is 1/6 X 1/6 = 1/36 (and with three dice, 1/216). The difficulty increases exponentially, and the challenge takes longer to complete.

Solving a hash puzzle, making the hash match the puzzle rules

A short python script to solve hash puzzles. Using random values (a nonce) to make the hash result match the rules of the puzzle. The difficulty of the puzzle is affected by how many letters need to match the rules. The more letters, the more difficult the puzzle – A single letter is easy to match, but three letters starts to take a long time!

I can do the same with a hash-puzzle. I can say “roll hashes until you find one starting with the letter ‘A’” (probability 1/64), or roll a hash that starts with something longer e.g.: “Hi” (1/4096) or “Dog” (1/262,144). The only way to solve the puzzle is to try random input values, perform the hash operation, and then look at the output to see if it matched the puzzle rules. If I want to make the puzzle take longer or require more resources, I just make the rule harder to match.

I can also require that you find a solution where the input starts with some pre-defined content. Your task is to append some random value to my starting content, hash, test and repeat. My input doesn’t affect your effort, but it does ensure you start from scratch.

Bitcoin puzzles

Bitcoin uses the transaction data, and ‘chains’ the previous block’s hash value as a starting input for the next block’s. Miners then race to tack on massive random numbers until they find one that produces a hash that matches a specific pattern with a certain level of difficulty.

The blockchain is protected from modification because the hash-puzzle takes (on average) about a one-hundred-billion-billion attempts to find — it’s more work the any individual is capable of producing. But, the miners work in parallel competing to solve the next puzzle — thousands of miners, rolling the hash-puzzle-dice, over a million-million-million times per second.

Hash-puzzles are an elegant solution that provides security to the Bitcoin blockchain.




Tech Brief: Anonymising sensitive data with entropy and salt.

As researchers or programmers, we will often want to protect our data by anonymising sensitive information like names and addresses. To do this, we can combine pieces of user data to make an ’anonymous’ key that can be used in-place of the sensitive information. Instead of referring to “Jane Smith of Drury Lane”, Jane could have a nonsense identifier like “675AF3C”, which can be used throughout our study.

(Want more info? See security brief: Statistical Linkage Keys and Security)

Anonymising data with hashes and entropy

A common method for anonymising fields such as name and date of birth is to combine them with a hash function. But, because secure hash functions are ’deterministic’, they produce the same identifier for the same set of input data. If we have limited hash inputs, we will have a limited range of possible outputs; if we limit things too far, an attacker can run a brute force search to identify our original inputs. Continue reading

Security Brief: The Australian Census and Statistical Linkage Keys

There have been concerns among security professionals and privacy advocates about changes to the Australian 2016 Census. The biggest concern is how the ABS plans to combine your private data. The ABS will link your Census records across multiple products, services and share it with other government departments.

In the past, this has never been a problem because the ABS never used our individual name and address data. Consequently, people could answer uncomfortable questions honestly, with the knowledge that even if data were to leak, there would be no back to them.

The Census Data Statistical Linkage Key (SLK)

This year that has changed, the ABS revealed plans to assign Australians a unique identification number called a Statistical Linkage Key or SLK. Continue reading

Tech explained: What is a hash, what is brute force and are hashes secure?

Identifying Data

Security professionals often use hashes to represent data – think of it like a unique fingerprint or “key” for the data. While there are many ways to make data keys (we could assign them sequentially, or pick them at random) hashes provide a way to build a unique key from the data itself.

The purpose of a key is to allow us to reference a piece of data. Perhaps we need a key to identify movies; we could define a data key as:

- the first letter of each word in the title,
- directors initials
- and the year of release.

So, Indiana Jones and the Temple of Doom, by Steven Speilberg (1984) would have the key: IJATTODSS1984.

This key is pretty simple and easy to reverse. Because we know the key (IJATTODSS1984) and how it’s made, we can identify the movie by searching the Internet for releases in 1984, and directors with the initials S.S. This key is also not guaranteed to be unique, Continue reading