Data Pseudonymization and POPIA
By Donrich Thaldar
Just as Zeus, the King of the Gods in Greek mythology, assumed various forms to conceal his true identity, so modern data often undergoes transformations to mask its origins. Zeus could become a swan, a bull, or even golden rain to achieve his purposes, all while retaining his essence. Similarly, pseudonymization techniques aim to alter data enough to protect individual privacy without losing the core information necessary for research or analysis. This entails replacing data subjects’ identifying information in a dataset with unique codes, while keeping a separate dataset that links each data subject’s identifying information to their allocated code. And just as Zeus’ transformations were sometimes seen through by keen eyes, pseudonymized data can be re-identified by anyone with access to the linking dataset.
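To make the mechanics concrete, here is a minimal Python sketch of this kind of split, assuming a simple tabular dataset; the names, identity numbers, and field names are purely illustrative:

```python
import secrets

# Hypothetical source records: each entry contains direct identifiers
# alongside the health data needed for research.
records = [
    {"name": "Thandi Nkosi", "id_number": "8001015009087", "cholesterol": 5.2},
    {"name": "Pieter Botha", "id_number": "7506305012083", "cholesterol": 6.1},
]

pseudonymized = []   # dataset to be used for research or analysis
linking = {}         # separate dataset mapping codes back to identities

for record in records:
    code = secrets.token_hex(8)  # unique random code per data subject
    # Identifying information is kept only in the linking dataset.
    linking[code] = {"name": record["name"], "id_number": record["id_number"]}
    # The research dataset keeps the code and the substantive data only.
    pseudonymized.append({"code": code, "cholesterol": record["cholesterol"]})

# Whoever holds both datasets can re-identify the data subjects;
# whoever holds only `pseudonymized` sees codes and health values alone.
```

The point of the design is that re-identification requires both datasets: the research dataset on its own carries only codes.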
Identifiability within datasets has always been the cornerstone in determining whether a dataset falls under the protective umbrella of data protection laws. But whether context is relevant in making this determination has been controversial. In other words, should the question of whether a dataset is identifiable be answered in a context-agnostic way (nobody anywhere in the world can identify the data subjects in the dataset), or should it be answered with reference to a specific context (in the hands of a specific person, that person cannot identify the data subjects in the dataset)? This question has been at the heart of several debates and has even reached the European courts. Recently, in the landmark case of Single Resolution Board v European Data Protection Supervisor, the European Data Protection Supervisor argued that even with pseudonymization, data subjects are not truly cloaked in anonymity. Why? Because somewhere, tucked away, there exists a linking dataset that could potentially unmask their identity. However, the EU General Court, relying on the earlier case of Breyer v Federal Republic of Germany, held that identifiability should be seen through the lens of the particular context of the party standing before it in court. Thus, if that party does not have lawful access to the linking dataset, the pseudonymized dataset is, in its hands, anonymized data, and European data protection law does not apply to it. Unhappy with this judgment, the European Data Protection Supervisor has filed an appeal, which is yet to be heard by the EU Court of Justice.
All this litigation raises the question: How does South Africa’s Protection of Personal Information Act (POPIA) deal with pseudonymized data? Although POPIA does not explicitly refer to pseudonymized data, I argue that it governs pseudonymized data in a context-specific way, for two reasons. First, POPIA’s test for whether personal information has been de-identified centers on whether there is a “reasonably foreseeable method” to re-identify the information, and reasonableness in South African law is associated with an objective, context-specific inquiry. Second, POPIA itself contemplates scenarios in which the same dataset will be identifiable in one context, but not in another.
A useful way to ground this theoretical discussion is through the example of two hypothetical universities, University X and University Y, that are engaged in collaborative research. University X collects health data from participants but immediately pseudonymizes it, retaining a separate linking dataset that can be used to re-identify the data if needed. While the pseudonymized dataset is in the hands of University X, which also holds the linking dataset, the data qualifies as “personal information” under POPIA. Therefore, any processing of this data, including analysis for research, must comply with POPIA’s conditions.
But what happens when University X shares this pseudonymized dataset with University Y, which does not receive the linking dataset? The dataset is not identifiable in the hands of University Y; in other words, University Y can process the data without falling under POPIA. Here, a conundrum arises. When University X transfers the pseudonymized dataset to University Y, must it adhere to POPIA’s rules for data transfer? After all, at the moment of transfer, the pseudonymized dataset is still under the control of University X. Does this not mean that POPIA should apply to the transfer? I suggest not. Since the act of transfer is oriented towards University Y (the recipient), and the pseudonymized dataset is not identifiable in the hands of University Y, POPIA does not apply to the act of transferring the pseudonymized dataset.
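Continuing the illustrative sketch above (the datasets and values remain hypothetical), the sharing arrangement might look like the following: University X transmits only the pseudonymized dataset, and the linking dataset never leaves its control, so nothing University Y receives can be traced back to a data subject.

```python
import json

# Hypothetical datasets held by University X (see the earlier sketch).
pseudonymized = [{"code": "a3f9c2d4e5b6a7c8", "cholesterol": 5.2}]
linking = {"a3f9c2d4e5b6a7c8": {"name": "Thandi Nkosi", "id_number": "8001015009087"}}

# University X shares only the pseudonymized dataset with University Y;
# the linking dataset is deliberately left out of the payload.
payload = json.dumps(pseudonymized)

# University Y receives codes and health values, but no linking dataset,
# so it has no reasonably foreseeable method of re-identifying the subjects.
received = json.loads(payload)
assert all("name" not in row and "id_number" not in row for row in received)
```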
The context-specific interpretation of identifiability in POPIA opens the door for a more nuanced understanding of data sharing, particularly in research collaborations. On the one hand, the institution that collects, generates and pseudonymizes the data (University X) must adhere to POPIA’s provisions for all internal processing of the data — as long as it retains the capability to re-identify the data. On the other hand, the receiving entity (University Y) is not bound by the same requirements if it lacks the means to re-identify the data. This dual approach seems to offer the best of both worlds: fostering collaborative research while upholding the principles of data privacy.
Navigating the realm of data privacy is as intricate as deciphering the myriad forms of a shape-shifting deity. But, at its core, the goal remains consistent: protecting the essence of identity, whether divine or digital. And as we move forward, the hope is that we will find a balance that respects both the quest for knowledge and the sanctity of identity.