How to use data for the public interest, even – or especially – in a pandemic | Heinrich Böll Stiftung Hong Kong

Commentary

Governments, the private sector and other organizations increasingly collect massive amounts of data, sometimes for the public good, sometimes not. A consistent framework could ensure data use is both beneficial to the public interest and sufficiently protected against misuse.

The amount of data governments, the private sector and other organizations collect worldwide is growing, and both the practice of collection and the results of it increasingly pervade our everyday lives, from smartphones and networked cars to data-controlled industrial production. Society has a growing expectation that data, on health in particular, can help us better identify problems (e.g. early detection of strokes, detection of influenza waves as they emerge in search queries) and enable new solutions (more personalized cancer drugs, contact tracing in a pandemic).

But how do we make sure that the advantages of that data benefit society as a whole, and not, as is often feared – and is too often the case now – only the small group of organizations that collect the data? To address this issue, we must explore three sub-questions: Who uses data for the benefit of society, how do risks to the public interest arise, and how can we guarantee access to data that benefits the common good? In principle, health-related data is no different from data in other domains, so this article addresses data in general, while adding specifics for data used to improve health outcomes.

Who uses data: serving, promoting, or endangering the public interest?

Many associate the “common good” only with non-commercial activity and demand that research and civil society be given more access to data and that the state make better use of its data. While these actors explicitly serve the public interest, that does not mean the common good cannot also be produced elsewhere. Commercial players, in particular, should not be rashly excluded, as they often generate common good, even if only as a by-product: the benefit of a service or good beyond its commercial price tag. Such an open notion of what constitutes the common good has the advantage that, as a starting point, all interests are considered. This matters because the debate about what benefits society and what furthers the common good is often guided by different, sometimes conflicting values.

There are numerous examples of how social and economic interests are not mutually exclusive: Platforms for the exchange of medical research data, such as YODA, also include corporate partners whose interest is to bring profitable new drugs to market. A faster (while equally safe) development of therapies is certainly also in the public interest. YODA has already helped calculate the effectiveness of individual drugs or improve models for calculating healing prospects.

Similarly, many pharmaceutical companies drive current efforts to find and produce a vaccine for the coronavirus that causes Covid-19, as they anticipate huge global demand and receive government financial support such as advance purchase guarantees to offset some of the costs. But for society, paying an enormous bill for vaccines is still strongly preferable to enduring the alternatives such as restrictions on business or entire shutdowns to contain the pandemic.

Still, data use can at times undermine social interests: In the case of targeted advertising, for example, the harm to consumers’ privacy caused by the required behavioral tracking and profiling might well outweigh the public benefit. These practices are the subject of intense debate within European Union institutions and have also been raised in the context of the proposed acquisition of fitness-tracker maker Fitbit by Alphabet (Google’s parent company). While weighing the non-monetary and economic impacts of data-driven services is a complex task, it is necessary so that policymakers and regulators can make sound decisions on whether or how to impose stricter regulations or even bans on certain forms of data collection and use.

Thus, a broader notion of societal benefits supports allowing a multitude of actors to promote data-driven common good, including civil society and researchers. Companies also often create considerable value via data that they do not (and cannot) fully exploit themselves. Therefore, commercial actors should not be prematurely excluded.

Data use will always involve risk

At the same time, using data always harbors a potential for abuse – even if the data is collected and used with good intentions. An extreme example is police databases, whose primary purpose is to protect individuals. Yet if they are not adequately secured, the data of the very individuals the systems are supposed to protect may fall into the hands of people who pose a danger to them.

There are also various examples of contact tracing with problematic side effects: In South Korea, the open communication about where transmission may have happened put members of the LGBTQ community at risk. In large parts of Europe, concerns about potential abuse of centrally stored contact data have led most member governments to build their infection-alert apps on decentralized data structures (i.e. data stored locally on smartphones). Most countries have also kept the use of apps completely voluntary, but there is no unified approach (for example, Poland has a mandatory app for monitoring quarantine).

Societal risks depend strongly on the context of the data and the situations in which the information is used. A consistent framework is therefore needed to ensure that data use is both beneficial to the public interest and sufficiently protected against misuse. Data protection and non-discrimination are two requirements that help limit the potential harm of data use.

Yet we must also consider further criteria: data security, consumer choice, and an acceptable level of data power, such that no individual firm can overrule decisions by democratic governments. As outlined above, these principles apply not only to private sector actors, but also to civil society, research, and the state.

For example, there is a strong case for making health data accessible not only for basic research, as is planned under the German Patient Data Protection Act, but also for commercial research. A prerequisite for this would be adequate protection of patients – e.g. they must have choice and transparency as to whether and which data is shared, there must be restrictions on which items of a data set can be accessed simultaneously to prevent profiling, and the data infrastructure must be robust against hacker attacks. All this is reasonable and necessary, even if the data is “only” made available to organizations with good intentions.
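To make two of these safeguards more tangible, the following is a minimal sketch of how a data access request might be checked against patient choice and against restrictions on combining data items. It is an illustration only, not a description of any existing system: the field names, the list of forbidden combinations, and the function `is_request_allowed` are all hypothetical.

```python
# Hypothetical groups of fields that must never be released together,
# because the combination would make profiling of individuals easier.
FORBIDDEN_COMBINATIONS = [
    frozenset({"postal_code", "birth_date", "diagnosis"}),
    frozenset({"employer", "diagnosis"}),
]


def is_request_allowed(requested_fields, consented_fields):
    """Return True if a request respects patient choice and combination limits.

    requested_fields: fields a researcher asks to access at the same time.
    consented_fields: fields the patient has agreed to share (assumed input).
    """
    requested = set(requested_fields)

    # Patient choice: every requested field must be covered by consent.
    if not requested <= set(consented_fields):
        return False

    # Profiling guard: reject if any forbidden combination is fully requested.
    return not any(combo <= requested for combo in FORBIDDEN_COMBINATIONS)


consented = {"postal_code", "birth_date", "diagnosis", "medication"}
print(is_request_allowed(["diagnosis", "medication"], consented))
print(is_request_allowed(["postal_code", "birth_date", "diagnosis"], consented))
```

In this sketch the first request passes, while the second is rejected because it asks for a combination that is blocked even though the patient consented to each field individually. A real system would also need the transparency and security measures mentioned above, which this example does not cover.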

A broad understanding of the benefits of data use thus also requires an equally broad understanding of its societal risks. These result not only from intentional misuse, but can also arise from insufficient protection against inadvertent risks, such as discrimination.

More data use for the benefit of society thanks to greater access for all

To increase data use in the public interest, we need more data – which does not always mean collecting even more data, but rather making existing data more accessible. It is now a widely held view that data is still often held by individual companies that cannot exploit its full potential. However, we have yet to master the challenge of how to provide access to public-interest data. A concrete example is the debate on antitrust law in Germany, which has already facilitated (mandatory) access to data necessary for competition in the context of the 10th Amendment to the German Competition Act (GWB).

However, voluntary initiatives are often easier to justify and implement. Many actors can play a role in promoting the common good beyond the narrowly defined cases under antitrust law.

The most obvious starting point for the state is to make public data accessible. Data collected by public authorities is still subject to licensing, which mostly hampers civil society organizations and start-ups because they can least afford to buy it.

There are similar obstacles when it comes to geodata, weather data, and data about the political process – unrestricted access to greater amounts of data in all these areas would create opportunities for projects of public interest, e.g. for more transparency about the use of public spaces and political accountability. At present, there is a lack of political will and incentives for those who collect and could provide such data. With its Findata project, the Finnish Health and Social Services Agency demonstrates how secondary use of public data is possible, even in the sensitive health domain.

Ideally, consumers themselves retain most of the control over sharing their personal data. This requires new, effective mechanisms that empower consumers to take an active role in data sharing. The European Commission, for instance, is considering promoting voluntary data-sharing in the public interest so that citizens can provide data about themselves, for example to train machine-learning models. Such data donations are a reasonable first step, as initial attempts to analyze search-engine personalization and coronavirus research have shown.

However, the challenge remains that data often only becomes valuable once it reaches critical mass. In the future, this will lead to a great interest in new data institutions that can provide collective access to larger amounts of data. Creating a data trust for consumer data is one way of sharing greater amounts of data for the benefit of consumers, but the concept needs to be further developed and tested in practical application.

Data-sharing restrictions could also be eased for companies as privacy-enhancing technologies continue to mature, and the German data strategy aims to further propel the development of such privacy tools. These technologies reduce the risk of unintentional disclosure of confidential data of a personal or business-sensitive nature.

Greater legal certainty about when and what data may be shared can increase companies’ willingness to share data voluntarily. Positive developments include paragraph 32c of the 10th Amendment to the GWB, under which companies can ask the German Federal Competition Authority to evaluate their plans for data exchange. Also, a statement by German Federal Data Protection Commissioner Ulrich Kelber on anonymization makes it clear that personal aspects of anonymized data do not have to be completely removed as long as re-identification is sufficiently difficult.
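How difficult re-identification is can be estimated in several ways. One common yardstick, used here purely as an illustration and not taken from the commissioner's statement, is k-anonymity: every combination of indirectly identifying attributes should be shared by at least k records. The sketch below assumes hypothetical attribute names and a hypothetical helper `is_k_anonymous`.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifiers."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    # An empty data set has no groups; report 0 in that case.
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical, already coarsened records (age bands instead of birth dates).
records = [
    {"age_band": "30-39", "region": "North", "diagnosis": "A"},
    {"age_band": "30-39", "region": "North", "diagnosis": "B"},
    {"age_band": "40-49", "region": "South", "diagnosis": "A"},
]

print(smallest_group_size(records, ["age_band", "region"]))
print(is_k_anonymous(records, ["age_band", "region"], k=2))
```

Here the third record is the only one in its age band and region, so the data set is not 2-anonymous and that person would be comparatively easy to single out. Which threshold counts as "sufficiently difficult" is a legal and contextual judgment that such a check can inform but not settle.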

Such sensible flexibility, with precautions, allows the state, consumers, and companies to help make greater amounts of data available. This creates more opportunities to generate value, both for organizations that explicitly serve the public interest, such as civil society and research, and for those that generate common good merely as a by-product.

At the same time, it is important that all organizations that collect and use data systematically minimize the potential for data misuse. This risk is often particularly high where the potential value is also high – such as health or financial data. There is no silver bullet that magically makes data work better for the public interest. Instead, we can promote data access and data use across many different areas at the same time. As long as we are acutely aware of both benefits and risks, our societal interests stand to gain a lot.
