We – the government-appointed expert group – published our final public report last month (informally summarized by me in English here) on the Norwegian COVID-19 app "Smittestopp", ascertaining whether security and privacy are responsibly taken care of.
In a group effort such as this one, there is often compromise – in order to be able to end up with a result everyone involved can justify to themselves, and stand by.
We all agree on the conclusion in our report.
There are, however – in my opinion – certain issues that are not addressed in the final report (and that might be out of scope for it) that I think are important to consider. I will state some of these here, in addition to expanding on issues that do appear in the report.
What I write here is my own professional opinion on the security and privacy aspects of the Norwegian COVID-19 contact tracing app, "Smittestopp". I do not (and cannot) speak on behalf of any other persons, including any other members of the government-appointed expert group. Nothing described herein is covered by NDA or legislation – everything is based entirely on public information and what is described in our public report.
Introduction and Context
Comparatively, Norway was fairly early in rolling out an app, and the app itself is arguably one of the most invasive ones on the market – at least in a European context, where there are few (if any) other countries with the same configuration of privacy-impacting factors.
Smittestopp is a closed-source solution; requires registration and de facto identification of users; collects sensor data from multiple sources (both BLE and GPS); and uploads data from all users, all of the time, to a centralized storage – unless users pause collection, but even then "heartbeats" that contain information about BLE and GPS-activations in the app are sent in the background.
The degree (if any) of data minimization in such a solution has been questioned by experts in public debate from the get-go.
Some of the design choices have been defended by involved parties in the media as a prerequisite for attempting both contact tracing and generating data for monitoring of public movement and other research and analysis purposes (including datasets for long-term use). One might then question the choice of attempting to solve both problems with one application, and what the privacy implications of that choice might be.
Any privacy engineer (and indeed many others with a modicum of technical or practical understanding) will quickly see that these design choices have practical consequences – and, in my assessment, huge privacy implications.
What one is interested in when performing contact tracing is "who met whom". Neither the identity of either party nor the location of the contact is needed to establish that contact took place.
The argument made for the use of location data in the case of Smittestopp is that it compensates for shortcomings in data quality caused by Bluetooth API limitations at the time: Bluetooth wouldn't work reliably in the background on iOS, and Android might kill apps that continuously used Bluetooth or location services in the background.
GPS, on the other hand, has a typical accuracy of 3 to 10 meters under ideal conditions (meaning outdoor usage).
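To make the accuracy gap concrete, here is a small Python sketch of how a 10-meter-per-device GPS error compares against a roughly 2-meter "close contact" distance. The coordinates and the 2 m threshold are illustrative assumptions on my part, not values taken from the app:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Two people standing roughly 2 m apart (a typical "close contact" distance).
true_gap = haversine_m(59.913700, 10.752200, 59.913718, 10.752200)

# With up to 10 m of GPS error on each device, the measured gap can be off
# by up to ~20 m: distant strangers can look like contacts, and genuine
# contacts can look far apart.
print(f"true gap: {true_gap:.1f} m; measured gap may range from 0 to ~{true_gap + 20:.1f} m")
```

In other words, even under ideal outdoor conditions, the per-device error is an order of magnitude larger than the distance one is trying to measure.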
A proper and transparent evaluation of the possibilities available here might then include:
- How big a problem do the current API limitations pose in practice (i.e. "could we get by at all")?
- If workarounds are needed, how do we evaluate alternatives (for instance, non-exhaustively):
- Attempting to "live with" the current limitations
- The Singaporean approach (in practice implementing a faux sleep-mode, necessitating keeping the app in foreground, but dimming screen when device is positioned "upside-down")
- Collecting location data, which is Personally Identifiable Information (PII)
- How do the privacy implications of the respective alternatives size up against each other and the issue at hand?
Though it has been claimed that this data is "anonymous" in several contexts, this is incorrect. By virtue of being personally identifiable information, location data cannot be anonymous by definition. Location data can in itself reveal a person's identity. There is no such thing as "anonymous location data" on an individual basis. In aggregated datasets, one can have certain quantifiable guarantees about degree of privacy (e.g. via k-anonymity, differential privacy), but this gets complicated very quickly for a variety of reasons, such as temporal correlations or re-identification by combining data sources.
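As an illustration of why location traces re-identify people, consider that just two coarse location features – where a device rests at night and during working hours – can act as a quasi-identifier. The data below is entirely made up; the `k_anonymity` helper simply measures the smallest group sharing the same quasi-identifier:

```python
from collections import Counter

# Made-up "pseudonymized" traces: for each device, the grid cell where it
# rests at night (home) and the cell where it sits during work hours (work).
traces = {
    "device_a": ("cell_712", "cell_045"),
    "device_b": ("cell_712", "cell_198"),
    "device_c": ("cell_309", "cell_045"),
    "device_d": ("cell_712", "cell_045"),
}

def k_anonymity(records):
    """Size of the smallest group sharing the same quasi-identifier.
    k == 1 means at least one person is uniquely re-identifiable."""
    return min(Counter(records.values()).values())

print(k_anonymity(traces))  # prints: 1  (device_b's home/work pair is unique)
```

Removing names accomplishes nothing here: anyone who knows where a person lives and works can pick their full trace out of the dataset.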
In practice, location data is only a clear functional requirement (of sorts, not necessarily to this degree of accuracy) in the case of monitoring public movement or other research – the second purpose of the app.
When we talk about centralized storage in the context of contact tracing apps, we usually mean systems that are based on collection that is uploaded to a central server, which holds all data. This is in contrast with decentralized systems, where every user's data is stored on their device – until it is needed. One should also note that most popular decentralized solutions are not distributed, i.e. they still use a central server as a communications channel of some sort (as opposed to purely peer-to-peer communications).
The argument made in favor of data centralization in the case of Smittestopp is that analysis requires augmenting each user's data with data from other users. Centralization is also a prerequisite for looking at movement patterns (to evaluate government actions), or doing further unspecified research on aggregated data – which is also a purpose of the same app.
A centralized datastore is in principle a defining factor when dealing with private data. Its very existence makes misuse, function creep, leakage and so on possible in a way that a decentralized solution just plainly doesn't – as you can't lose or abuse data you don't have.
Alternative sources of aggregated data may already exist, such as the data telcos already provide in aggregate form, which has already been used for the same purposes in Norway. The upside of using this is reusing existing data (rather than collecting, storing and protecting the same data anew) and existing control mechanisms that protect security and privacy. The downside is that this data might not be as precise as location data collected directly from devices, as resolution depends on a host of factors, including cell site density.
The privacy cost of uploading every user's locations and movements, as well as whom they have met, with timestamps for all these events, is undoubtedly much larger than uploading only the data that is needed, when it is needed – e.g. prompting users to upload their movements (or even just BLE-defined contacts) once a person they have been in contact with is positively diagnosed with COVID-19.
The current app is all-or-nothing: users can choose to have their data used for all the app's purposes, or not use the app at all.
It is obviously not ideal to not let users explicitly opt in to either purpose. Nor is it in accordance with ordinary GDPR requirements (though we must remember that this is a major crisis), nor with best practice. A potential consequence of implementing one app that collects a lot of data (a consequence of enabling two purposes), while not giving users a choice, is that user uptake may be hampered.
Data integrity and user traceability
The use and communication of static device identifiers makes it possible to track or impersonate others, trace users in limited/partial leaks, and so on. Just about every other proposed solution (both protocol specifications and existing apps) uses "rolling" identifiers in one form or another.
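For contrast, here is a simplified sketch in the spirit of DP-3T's low-cost design: a secret day key is ratcheted forward with a hash, and short-lived broadcast IDs are derived from the current key. The label `"broadcast-key"` and the parameters are illustrative choices of mine, not the actual spec values:

```python
import hashlib
import hmac
import secrets

def next_day_key(sk: bytes) -> bytes:
    """Ratchet the secret day key forward; old keys are then discarded,
    so compromising a device today does not expose past identifiers."""
    return hashlib.sha256(sk).digest()

def ephemeral_ids(sk: bytes, n: int = 4, size: int = 16) -> list:
    """Derive n short-lived broadcast IDs from the current day key.
    Observers see unlinkable, random-looking IDs instead of one static ID."""
    stream = b""
    counter = 0
    while len(stream) < n * size:
        stream += hmac.new(sk, b"broadcast-key" + counter.to_bytes(4, "big"),
                           hashlib.sha256).digest()
        counter += 1
    return [stream[i * size:(i + 1) * size] for i in range(n)]

sk = secrets.token_bytes(32)   # day 0 secret, generated on-device
ids_today = ephemeral_ids(sk)  # broadcast these in rotation (e.g. every 15 min)
sk = next_day_key(sk)          # tomorrow's key; yesterday's is unrecoverable
```

A passive observer who logs one ID cannot link it to the IDs the same device broadcasts an hour later – which is exactly what a static identifier allows.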
Data was temporarily stored in an unencrypted database on user devices in previous versions of the app, which made it possible to inject or modify data before uploading it to the server.
The application connects to a cloud solution using a connection string that never expires, with no other session handling.
All of this means that data integrity cannot be guaranteed, at least for the parts of the dataset collected before fixes for some of these issues were released.
Identifying users and analytics data
In order to use the application, users have to register their phone number (de facto identifying themselves). Functionally, there is no need to identify any involved party. Even in contact tracing, users could be notified by the application itself when a contact has been diagnosed with COVID-19 by health authorities. One could argue that registration is a mechanism that protects against bogus uploads to some extent – but this protection (in addition to protection of privacy) is in a sense built in to decentralized approaches that demand human intervention before any upload takes place (e.g. distributing upload codes, in the case of DP-3T) – many of which also let users choose specifically what timespans to share.
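A minimal sketch of how such an upload-code mechanism might look. The names and flow are entirely hypothetical and much simplified compared to the actual DP-3T protocol; the point is only that a single-use code gates uploads without identifying anyone:

```python
import secrets

# Server-side state: issued one-time codes and whether each has been used.
issued = {}

def issue_upload_code() -> str:
    """A health authority requests a fresh single-use code for a diagnosed
    patient – short enough to be read out over the phone."""
    code = secrets.token_hex(4)
    issued[code] = False
    return code

def accept_upload(code: str, payload: bytes) -> bool:
    """The server accepts an upload (e.g. the patient's ephemeral-ID seeds)
    only if the code is known and has not been used before."""
    if issued.get(code) is not False:  # unknown code, or already used
        return False
    issued[code] = True
    # ... here the payload would be stored for exposure matching ...
    return True

code = issue_upload_code()
print(accept_upload(code, b"keys"))  # prints: True
print(accept_upload(code, b"keys"))  # prints: False (replay rejected)
```

Bogus uploads are blocked by the human-in-the-loop code distribution, yet the server never needs a phone number or any other identity.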
Note: I am not a lawyer. Read my reflections with this in mind.
The regulation that is the formal basis for processing of the data mentions that health- and location data collected for this purpose can not be shared with law enforcement, etc. Bluetooth-data, however, is not mentioned. I interpret this as sharing of Bluetooth-data being permitted. This would mean that parties the data is shared with could be able to, for instance, build social graphs of the data subjects. Though the regulation puts in place certain limitations (including a sunset-clause), the regulation also states that it can be changed at any time by the government via a new regulation.
The CLOUD Act and the Patriot Act mean that the U.S. government can demand, and secretly obtain data stored on the servers of American providers, even abroad. Most of the backend-services of this solution is hosted by Microsoft (a U.S. company) in Ireland, where there is already precedence for this.
Another consequence of the current Norwegian solution is that Norway will be unable to easily achieve data interoperability and collaboration with other European countries, as most of these have already implemented or will implement solutions based on Apple and Google's new APIs or DP-3T.
Other countries' contact tracing systems will therefore (in theory) be able to register contact events involving citizens of other countries and/or persons using other apps, including apps produced by other countries' health authorities – whereas the Norwegian system will not.
The publicly available DPIA (Norwegian) does not appear to seriously consider alternative implementation approaches, nor malicious use of the data, data breach, or data leakage other than via the security features of the mobile apps. Some of the probabilities stated in the risk assessments seem too good to be true.
Using a static identifier that is never rotated is obviously a bad idea, and makes it possible to track users.
There is something to be desired in terms of transparency: both the purpose(s) of the application and exactly how data is "anonymized" and aggregated should be clearly and specifically communicated to the public. The anonymization process was not finalized during our evaluation, beyond involving various forms of aggregation – which is not necessarily able to make any guarantees with regard to anonymity.
If code were open sourced, the public would be able to verify the functionality, as opposed to depending on "security by obscurity".
The fact that the functionality used to bind phone number to the cloud device ID is implemented using a so-called "preview feature", which the supplier says one should not use to process personal data or any other data that is subject to heightened compliance requirements, is obviously not great.
There were also various logging and compliance issues, such as users not being able to see any data about their Bluetooth contacts, access logs from health authorities, or audit logs after requesting deletion.
The contact analysis code was complicated and complex (i.e. of low quality from a maintainability standpoint), and had weaknesses in both implementation and method.
The app also used SMS to notify users, which is not a secure communications channel, and is easily spoofable.
The group's recommendations (in our final public report) included:
- Clarifying the regulation which serves as basis for processing (changing "anonymized" to "deidentified"), to enable data aggregation in practice.
- Split purposes, and allow users to choose how their data is used (split into several apps, or implement opt-in functionality). This might both protect users' interests and lead to more users.
- Remove all data that is not needed (e.g. delete location data older than 15–16 days; regularly delete location data whose trajectories do not cross those of others) to increase data minimization.
- Implement differential privacy in data aggregation processes, to reduce risk to privacy and increase accuracy of the resulting dataset.
- Consider rewriting to a more distributed solution once the contact tracing criteria have stabilized, as this could be both less invasive and lead to an increase in users.
- Implement local differential privacy before uploading user data, to further decrease privacy impact.
- Make as much source code as possible available as open source, to give the public real insight into how their data is used.
- Regularly evaluate the solution, purpose and effect, to ensure that the solution is still suitable, and the problem is still relevant.
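The two differential-privacy recommendations above can be sketched as follows. This is a minimal illustration, not a deployable design: the epsilon values and the count query are my own illustrative assumptions, and a real system would need careful sensitivity analysis and privacy-budget accounting:

```python
import math
import random

def laplace_count(true_count: int, epsilon: float) -> float:
    """Central DP (aggregation recommendation): add Laplace(1/epsilon) noise
    to a count query. A count has sensitivity 1, since adding or removing
    one person changes it by at most 1."""
    u = random.random() - 0.5
    scale = 1.0 / epsilon
    return true_count - scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def randomized_response(value: bool, epsilon: float) -> bool:
    """Local DP (upload recommendation): each device flips its own bit with a
    calibrated probability before upload, so the server never learns the
    true value with certainty."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return value if random.random() < p_truth else not value

# Example: publish how many app users were in some area, with epsilon = 1.0.
noisy = laplace_count(1234, epsilon=1.0)  # a noisy count near 1234
```

The crucial property is that the noise is calibrated to the influence of any single person, which is what yields a formal, quantifiable privacy guarantee rather than an ad hoc one.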
It is not known whether digital contact tracing is a viable solution to the problem at hand. It is unknown what value it can bring, and whether it is even feasible. Most scientists with direct experience claim digital contact tracing is at best a complement to manual contact tracing.
The Norwegian app is not in accordance with common European guidelines, the EU Commission's recommendations on apps for contact tracing, the EU resolution on coordinated work against COVID-19, nor guidelines from the European Data Protection Board (EDPB).
Surveilling the movements and contacts of all users of the app is an extremely invasive measure, and as the effectiveness and usefulness of the system is not clear, the proportionality of this measure is questionable at best. One would expect some sort of explanation, analysis or Privacy Impact Assessment (i.e. PIA; not a DPIA, or Data Protection Impact Assessment) – but none exists.
Regarding anonymization, FHI has written (Norwegian) this comment on our report (freely translated from Norwegian):
The report also has a recommendation of anonymization of data for analysis purposes, through so-called differential privacy. FHI has at this point already developed an elaborate system for anonymization that in FHI's view will have an equally anonymizing effect as so-called differential privacy, but which is easier to implement and communicate, and doesn't lose any data quality to speak of.
While the statement makes sense syntactically, evaluating the logic is left as an exercise for the reader: Is it possible to somehow deliver anonymization to the same extent as differential privacy (but without the formal guarantees)?
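A minimal illustration (with hypothetical data) of why aggregation alone offers no such guarantee: two noiseless aggregate counts that differ by one person's inclusion reveal that person's presence, which is precisely the difference that differential privacy's noise is calibrated to hide.

```python
# Hypothetical example: two noiseless aggregate counts published for
# overlapping areas, where the areas differ only by one person's home block.
visitors_region = {"alice", "bob", "carol"}            # published count: 3
visitors_region_minus_bobs_block = {"alice", "carol"}  # published count: 2

count_full = len(visitors_region)
count_restricted = len(visitors_region_minus_bobs_block)

# Differencing two "anonymous" aggregates reveals one person's presence.
bob_was_there = (count_full - count_restricted) == 1
print(bob_was_there)  # prints: True
```

Aggregation without noise leaks through exactly this kind of differencing, no matter how elaborate the aggregation pipeline is.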
The app is not open source: while it is acknowledged that open sourcing might lead to better security in the long run, the claim is that it would give would-be attackers a chance to exploit vulnerabilities before anyone sees or fixes them in the short run.
Simula (the supplier) has themselves argued, in the case of whether to open source or not, that even if they acknowledge potential positive sides of doing this, it's about trust vs. real security. It's strange, then, that in discussions of privacy, they seem to believe that certain choices and actions are fine and defendable because the involved parties (themselves, FHI, the Norwegian government, etc.) are "good", as opposed to alternative parties (whether real or hypothetical).
Work on the project apparently started in early March. While things looked ugly back then – we didn't know how many could die, etc. – anyone with serious competency in privacy would have thought (and at least considered implementations) along the lines of the distributed protocols we know today. Still, one might be able to forgive the choice to be more ambitious in data collection; what's worse is the refusal to adjust as the situation changed and as we gained a better technical understanding of the issues.
In aggregate, all of the above points to a flawed decision-making process, in which privacy cannot be said to be "built in" – and the solution itself would seem to be the very antithesis of "privacy by design".
The Norwegian public generally trusts their government to a large extent. This makes it possible for us to take collective action in ways other countries cannot. At the same time, if government actions are more privacy-invasive than they need to be – in and of itself, or even by enabling a leak or misuse – we risk undermining this very trust.
An app that works this way should, in my professional opinion, obviously not be used by lawyers, journalists, or people who work in defense, live at a secret address, hold positions of power, and so on.
Media strategy and public communications: A sidenote
Pretty early on in the process, Simula – the suppliers of the app, under contract from Norwegian Institute of Public Health (Folkehelseinstituttet, FHI) – was given critical feedback from the Norwegian Data Protection Authority.
Simula responded by writing blog posts asserting the normality of the situation, and that this was no cause for concern.
During our evaluation, a joint statement was released by Norwegian technology, security and privacy experts – asking the Norwegian health authorities to change course, which gained a lot of media attention.
While the Norwegian Institute of Public Health remained silent, Simula wrote op-eds describing critics and those that would not use the app as selfish.
After delivering our final public report, which concluded that the solution handled neither security nor privacy responsibly, things got weird. During the press conference at which our group's leader presented our findings, the Norwegian Institute of Public Health commented that both security and privacy were responsibly handled in their opinion. At the same time, Simula wrote a blog post in which they attacked the expert group's integrity – claiming that our conclusions were based on personal opinions, and that our recommendations were politically motivated (see screenshot below, Norwegian).
This part of the post basically translates to:
What about privacy?
The expert group concludes that they "think privacy is not well enough taken care of". Simula would like to point out that this is not justified by reference to any aspect of the app itself. The expert group does not wish that location data be collected, and they therefore conclude that privacy is not handled well enough.
Several of the recommendations from the expert group, on the other hand, bear the mark of being the members' views on some familiar discussions that have surrounded Smittestopp along the way. This especially goes for the members of the group wanting contact tracing done only locally on the phones (recommendations "Go over to a distributed model for collection of data" and "Split the purposes and make it possible to elect to be part of only one") and the members wishing that the source code be made publicly available ("Make available as much source code as possible as open source"). These are familiar subjects of debate, but have little to do with how Smittestopp works.
Their blogpost has since been edited (exchanging "political" for "personal"), but questioning the motives of an impartial external group tasked with evaluating their work in this way is concerning nonetheless.
In addition, Kyrre Lekve (Deputy Managing Director at Simula) said "There are many countries I think should not use the Norwegian solution – precisely because they don't have a well regulated democracy; They don't have strong privacy interests and governments that keep watch" (freely translated from Norwegian) in episode #2 of the Norwegian podcast Waterhouse.
Privacy would then by definition not be handled responsibly, as any privacy guarantees would be contingent on trust.
Key point: Data protection and privacy are different things.
Although Simula and the Norwegian Institute of Public Health have just recently announced that they are experimenting with an app based on Apple and Google's new exposure notification APIs, it's too little, too late...
On June 15th, the Norwegian Data Protection Authority concluded that (in the context of the low Norwegian rate of infection), the degree of privacy-invasiveness in the Norwegian solution for contact tracing COVID-19 is not justified, as it is disproportionately invasive to privacy. They told FHI that they intended to enforce a temporary ban on processing of personal information from Smittestopp by the 23rd of June.
News broke that morning that FHI would stop all data collection from the app and delete all previously collected data – though they would have been able to continue collection if they did it in a more responsible way, according to the DPA. This means that FHI themselves chose to delete all data.
The DPA said they were especially critical of the use of location data, pointing out that this goes against the recommendations of the World Health Organization and the European Data Protection Board.
FHI were given a week to document the usefulness of the app, and to make necessary adjustments.
Then, a day later, Amnesty International announced that they found Smittestopp to be among the most dangerous tracing apps for privacy.
On June 17th there were reports of people who had uninstalled the app many weeks prior receiving text messages informing them about the pause in data collection – even though their data (including phone numbers) should already have been deleted, according to FHI.
FHI delivered a response to the Norwegian DPA, plus some other documents on June 24th, in which they state that they disagree with the DPA.
Update July 7th
On July 7th NRK reported (Norwegian) that the Norwegian DPA had implemented the temporary ban on processing of personal information from Smittestopp for FHI.
FHI stated they were working to follow up the parliament's decision (in line with the recommendations of our report) to split the app in two based on functionality: one part for analysis, and one part for contact tracing.
Norway is hence worst-in-class in contact tracing apps for COVID-19.
This is pretty unexpected – and not something I would have seen coming half a year ago.
To be able to defend the privacy impact and degree of invasiveness, one would need a (probable) beneficial effect; the calculus of necessity includes legality, necessity and proportionality. To claim the measure is necessary, you need a (probable) effect to point to.
Amnesty, the Norwegian DPA, Parliament, EU, Google and Apple, the independent expert group, and 300 professionals in privacy and technology have all warned the involved parties several times. The fact that there has been no change until the entire app was put on hold is very strange given the degree of trust we usually pride ourselves on placing in experts in Norway.
I dare say that most engineers – and those with serious competency in security and/or privacy in particular – would see the issues inherent in the Norwegian model, and would have explored other alternatives to a larger extent.
There are privacy-preserving alternatives (such as the existing protocols and solutions), and they should always be explored first.
As for what the supplier, the producer and responsible party, or even politicians say and think about the privacy-implications of such solutions: When independent third parties are tasked with evaluating them (be it an expert group, the DPA or otherwise), it matters little what they feel about the results.
It's laudable to want to solve this very real, and very big, problem with the means available to us. But we can't excuse bad work and a lack of understanding by claiming that the ends justify the means.
What should be done now is to rewrite the app in a more privacy-preserving way, and to try to learn from the methods, processes and decisions that led to this outcome, so as not to make the same mistakes again.