De-identifying your data

Data de-identification, anonymisation and pseudonymisation are processes for removing identifying information from datasets, most commonly to protect the privacy of individuals. De-identification may also be used to protect organisations, such as businesses included in statistical surveys, or other sensitive information, such as the spatial location of mineral or archaeological findings or of endangered species. It may be mandated by legislation or by the ethical guidelines governing research.

The National Statement on Ethical Conduct in Human Research (2007, updated May 2015), published by the National Health and Medical Research Council, does not advocate use of the term 'de-identified data', and suggests the term 'non-identified' in preference. It gives the following explanation:

This National Statement avoids the term 'de-identified data', as its meaning is unclear. While it is sometimes used to refer to a record that cannot be linked to an individual ('non-identifiable'), it is also used to refer to a record in which identifying information has been removed but the means still exist to re-identify the individual. When the term 'de-identified data' is used, researchers and those reviewing research need to establish precisely which of these possible meanings is intended.

Techniques

Identifying information such as identifiers, names, addresses, gender and date of birth can be removed from datasets entirely, or coded or encrypted. Information can also be masked by changing data values or by aggregation.

Iain Hrynaszkiewicz, Melissa L Norton, Andrew J Vickers, Douglas G Altman, 'Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers', British Medical Journal, 29 January 2010. doi:10.1136/bmj.c181.
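
The techniques described above (removal, coding and aggregation) can be illustrated with a minimal Python sketch. It is not drawn from the guidance cited here. The field names (participant_id, name, address, date_of_birth), the secret key and the ten-year age bands are all assumptions made for the example, and a real project would choose them to suit its own data and ethics approval.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (an assumption for this sketch).
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Fields removed from the dataset entirely (assumed field names).
DIRECT_IDENTIFIERS = {"name", "address"}


def pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Code an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex digits."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(year_of_birth: int, reference_year: int, width: int = 10) -> str:
    """Aggregate a birth year into an approximate age band such as '40-49'."""
    age = reference_year - year_of_birth
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record: dict, reference_year: int) -> dict:
    """Return a copy of the record with identifiers removed, coded or aggregated."""
    # Removal: drop direct identifiers outright.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Coding: replace the participant identifier with a pseudonym.
    out["participant_id"] = pseudonym(out.pop("participant_id"))
    # Aggregation: date_of_birth is assumed to be an ISO string 'YYYY-MM-DD'.
    out["age_band"] = age_band(int(out.pop("date_of_birth")[:4]), reference_year)
    return out


if __name__ == "__main__":
    record = {
        "participant_id": "P001",
        "name": "Jane Citizen",
        "address": "1 Example St",
        "date_of_birth": "1975-03-14",
        "outcome": 7,
    }
    print(deidentify(record, reference_year=2015))
```

Because the same key always yields the same pseudonym, whoever holds the key can re-create the link between pseudonym and individual. Data coded this way is therefore de-identified only in the second sense described in the National Statement: identifying information has been removed, but the means to re-identify the individual still exist.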

Implications for reuse

The purpose of de-identifying data is to allow it to be used by others without the possibility of individuals being identified. The loss of individual identities, however, means that it will not be possible to incorporate the data into other datasets which may include information about the same individuals.

For an overview of the potential for sharing data without linking it, see the Australian Bureau of Statistics publication 'A good practice guide to sharing your data with others'.

When to de-identify data

The need for data de-identification arises when data is published, shared or reused. Researchers need to consider the legislation, policies and ethical guidelines that apply to them, as well as any undertakings made to funders and the informed consent obtained from research participants.

If data is only being stored in its original form by the researcher who created it, and is not being shared or published, ethics and privacy requirements are usually met through access control and data security, rather than through data de-identification. Identifiers are usually needed for analysis of research data by the original researcher.

Avoiding re-identification

When de-identifying data, it is important to keep in mind the possibility of re-identification. This usually occurs with large datasets, which can be subjected to data mining or other analytical techniques. For a lay guide to some of these issues, see '"Anonymized" data really isn't - and here's why not'.
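
One simple way to gauge re-identification risk before release is to count how many records share each combination of indirectly identifying values. The sketch below uses a k-anonymity style check, a technique this guide does not name and which is offered here only as an illustration. The field names and the threshold k are assumptions, and passing the check does not guarantee that re-identification is impossible.

```python
from collections import Counter


def _key(record: dict, quasi_identifiers: list) -> tuple:
    """Build the combination of quasi-identifier values for one record."""
    return tuple(record[q] for q in quasi_identifiers)


def smallest_group(records: list, quasi_identifiers: list) -> int:
    """Size of the smallest group of records sharing the same combination (0 if empty)."""
    counts = Counter(_key(r, quasi_identifiers) for r in records)
    return min(counts.values(), default=0)


def risky_records(records: list, quasi_identifiers: list, k: int = 2) -> list:
    """Records whose combination of values is shared by fewer than k records."""
    counts = Counter(_key(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[_key(r, quasi_identifiers)] < k]


if __name__ == "__main__":
    # Hypothetical, already de-identified records (assumed field names).
    data = [
        {"age_band": "40-49", "postcode": "2600", "sex": "F"},
        {"age_band": "40-49", "postcode": "2600", "sex": "F"},
        {"age_band": "70-79", "postcode": "0872", "sex": "M"},
    ]
    fields = ["age_band", "postcode", "sex"]
    print(smallest_group(data, fields))
    print(risky_records(data, fields, k=2))
```

Records flagged by such a check are candidates for further aggregation or masking, for example by widening the age bands or shortening the postcode.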

Legislation

De-identification is also affected by legal requirements. In Australia, in addition to the Commonwealth legislation (the Privacy Act 1988 (Cth)), each state and territory has its own privacy legislation. The Office of the Australian Information Commissioner offers links to all this legislation, and to other material.

National guidelines

Other resources

Examples of guidelines, discussion of issues around de-identification and two case studies (this is not a comprehensive list):