There have been concerns among security professionals and privacy advocates about changes to the Australian 2016 Census. The biggest concern is how the ABS plans to combine your private data. The ABS will link your Census records across multiple products and services, and share them with other government departments.
In the past, this was never a problem because the ABS never used our individual name and address data. Consequently, people could answer uncomfortable questions honestly, with the knowledge that even if data were to leak, there would be no link back to them.
This year that has changed: the ABS revealed plans to assign Australians a unique identification number called a Statistical Linkage Key, or SLK. It is just a way of turning your identity information into a semi-random-looking serial number. While a unique key is useful for tracking people within a study, it introduces security risks.
The government has told us to relax and remove our tinfoil hats because, while the ABS will collect names and addresses, there will be no way to identify personal information, since our identities are going to be scrambled using a secure SLK.
Unfortunately, the ABS hasn’t provided any information about how it plans to construct the SLK. The ABS’s own website gives us some clues: the site describes a standard called SLK581, which turns your name, date of birth and gender into an SLK.
According to the site, Jane Smith (Female) born 01/01/2007 is turned into MIHAN010120072 using components of her name, date of birth and gender.
S MI T H - J AN E, 01 01 2007 2 = MIHAN010120072
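The construction above can be sketched in a few lines of Python. This is a minimal sketch, not the ABS's implementation: the letter positions (2nd, 3rd and 5th of the family name, 2nd and 3rd of the given name) are read off the Jane Smith example, and padding short names with "2" is an assumption about how the standard handles missing letters. The function names `pick_letters` and `slk581` are made up for illustration.

```python
import re


def pick_letters(name: str, positions: tuple) -> str:
    """Return the letters at the given 1-based positions.

    Non-letters are dropped first. Positions past the end of the name
    are padded with '2' (an assumption, see the note above).
    """
    letters = re.sub(r"[^A-Za-z]", "", name).upper()
    return "".join(
        letters[p - 1] if p <= len(letters) else "2" for p in positions
    )


def slk581(family_name: str, given_name: str, dob_ddmmyyyy: str, sex_code: str) -> str:
    """Build an SLK581-style key: 3 + 2 name letters, date of birth, sex code."""
    return (
        pick_letters(family_name, (2, 3, 5))
        + pick_letters(given_name, (2, 3))
        + dob_ddmmyyyy
        + sex_code
    )


print(slk581("Smith", "Jane", "01012007", "2"))  # MIHAN010120072
```

Nothing here is secret: anyone who knows the recipe and a person's details can rebuild the same key.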
At first glance, the SLK MIHAN010120072 offers a bit of privacy from casual snooping by research staff, but it offers zero security. Because we know Jane’s name, we can quickly identify her record by scanning through all the MIHAN IDs.
The ABS says that the Census SLK will be ‘hashed for security’. The claim is that the hashed SLK will be impossible to reverse and will securely protect your identity. But, that’s not necessarily the case.
Hashing is deterministic, so the same input ‘components’ always result in an identical output hash. Even when the SLK is hashed, we can use Jane’s name to ‘brute force’ it; we simply test every combination of the input components (name, date of birth and gender) until one produces a matching hash. (See: “what is hashing”)
For a hashed SLK581, this means creating and hashing about 85,000 combinations. I tested this on my old laptop: in 5 minutes I wrote a program that generated all 85,000 possible Jane Smith records in 200 milliseconds - literally less time than the blink of an eye!
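A brute-force search of this kind might look like the sketch below. It is not the program described above, just an illustration under stated assumptions: SHA-256 stands in for the unspecified hash (the ABS has not said which one it will use), birth dates are assumed to run from 1900 to 2016, and only two gender codes ("1" and "2") are tried. With those assumptions the search space comes to roughly 85,000 candidates. The names `slk_for`, `hash_slk` and `brute_force` are hypothetical.

```python
import hashlib
from datetime import date, timedelta


def slk_for(dob: date, sex_code: str) -> str:
    # "MIHAN" is the name part of Jane Smith's SLK from the example above.
    return "MIHAN" + dob.strftime("%d%m%Y") + sex_code


def hash_slk(slk: str) -> str:
    # SHA-256 is an assumption; the real hash has not been published.
    return hashlib.sha256(slk.encode("ascii")).hexdigest()


def brute_force(target_hash, first=date(1900, 1, 1), last=date(2016, 12, 31),
                sex_codes=("1", "2")):
    """Try every date of birth and sex code; return (matching SLK, attempts)."""
    tried = 0
    day = first
    while day <= last:
        for sex in sex_codes:
            tried += 1
            candidate = slk_for(day, sex)
            if hash_slk(candidate) == target_hash:
                return candidate, tried
        day += timedelta(days=1)
    return None, tried


# Pretend this hash came from a leaked record.
leaked = hash_slk("MIHAN010120072")
found, tried = brute_force(leaked)
print(found, "recovered after", tried, "attempts")
```

Because the hash is deterministic and unkeyed, the attacker never has to reverse it; they only have to guess the input and compare.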
There are suggestions that the ABS will make the SLK harder to break by adding more fields, like your address. Adding data complicates matters somewhat; including an address will make the hash 11 million times harder to break.
That’s still very easy in security terms. For about $25, a cloud computing cluster can complete the search in an hour or two.
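To put those figures in perspective, here is a rough back-of-the-envelope calculation using only the numbers quoted above (85,000 base combinations and an 11-million-fold increase from the address). It works out the hash rate needed to finish in one or two hours. Whether a given cluster reaches that rate depends on the hash function chosen, which is unknown, so treat this as an illustration rather than a benchmark.

```python
base_candidates = 85_000       # date of birth x gender for a known name (from the post)
address_factor = 11_000_000    # extra work from adding an address (from the post)
total = base_candidates * address_factor


def required_rate(total_hashes: int, hours: float) -> float:
    """Hashes per second needed to exhaust the search space in the given time."""
    return total_hashes / (hours * 3600)


print(f"{total:,} candidate records")
for hours in (1, 2):
    rate = required_rate(total, hours) / 1e6
    print(f"{hours} h: {rate:.0f} million hashes per second")
# 935,000,000,000 candidate records
# 1 h: 260 million hashes per second
# 2 h: 130 million hashes per second
```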
There are ways to make more secure identifiers, but history shows us that even the most secure databases and institutions have been breached.
The punchline is that security is hard and very complicated. If someone tells you that they are using a secure hash to create keys, that’s great. But the devil is in the details; a poorly implemented SLK is always going to be easy to break.
That might be fine for large pools of aggregate data, but when the data exists forever and can be used to identify you directly, we need to make sure we truly understand the security implications.
A longer, more detailed version of this post is available on my LinkedIn: