As researchers or programmers, we will often want to protect our data by anonymising sensitive information like names and addresses. To do this, we can combine pieces of user data to make an ’anonymous’ key that can be used in-place of the sensitive information. Instead of referring to “Jane Smith of Drury Lane”, Jane could have a nonsense identifier like “675AF3C”, which can be used throughout our study.
(Want more info? See security brief: Statistical Linkage Keys and Security)
Anonymising data with hashes and entropy
A common method for anonymising fields such as name and date of birth is to combine them with a hash function. But, because secure hash functions are ’deterministic’, they produce the same identifier for the same set of input data. If we have limited hash inputs, we will have a limited range of possible outputs; if we limit things too far, an attacker can run a brute force search to identify our original inputs. Continue reading