Putting the "good" in health data as a public good?

Jeni Tennison

Jeni spoke on the RightsCon panel “Putting the ‘good’ in health data as a public good?”.

The COVID-19 pandemic accelerated the collection of health data – underscoring the importance of good governance to uphold the highest standards of data protection and respect human rights. In June 2021, the World Health Organization called for a new global consensus on “health data as a global public good,” with a focus on data sharing for improved health outcomes. But what does health data as a public good mean in real-world contexts?

This panel sets out to explore the rights-based challenges we face when moving from design of data governance frameworks to implementation, such as:

  • What are the benefits and risks of making health data a public good? And how do we determine what data, if any, should be shared?

  • How do we involve people, communities, and the public – who all have different perspectives – in governance decisions? How do we engage government partners?

  • How do we ensure that insights unlocked by health data – and particularly the sharing of data – bring value to the communities from which that data was generated?

Through real-world case studies, we hope to articulate practical considerations for multiple stakeholders – e.g., implementers, policymakers, funders, and community representatives alike – to meaningfully engage in health data governance.

This panel was hosted by the McGovern Foundation and chaired by Rebecca Distler. I was joined by two other amazing panellists: Krystal Tsosie, Ethics and Policy Director at the Native BioData Consortium, and Isaac Hamuli, Country Director at D-Tree International. What follows is a combination of the notes I made prior to the session and notes I made during the session from what Krystal and Isaac talked about.

I was asked to kick off with a description of what data governance is and why it’s so important with health data.

I like to frame “data governance” quite broadly as how we make and implement decisions about data. Let’s unpack that a bit:

  • The decisions we make about data can range from what data we collect, how we collect it, and who from; through how we share data, including whether it’s open or more restricted and what demands we make of those who access it; to what gets done with data – these latter decisions becoming intertwined with aspects of AI, platform and general technology governance.

  • In health, the types of data range from very personal and sensitive data such as patient health records, results from medical trials, genomic data; through summarised and aggregated versions of these datasets – examples from Covid include things like numbers of cases, hospitalisations, deaths and vaccinations – which may be disaggregated by geography or by demographic characteristics such as gender or race; to types of reference data that are often forgotten in these conversations such as where hospital facilities are, or the codes we use for symptoms or data we have about medicines.

  • We make and implement decisions about data in a range of different ways. There are legal mechanisms such as data protection laws, which might delegate those decisions down to us as individuals. There are technical mechanisms such as privacy-enhancing technologies and trusted research environments that limit access to data. And there are institutional mechanisms such as having decision-making bodies like ethics boards within organisations, or even having separate organisations whose sole purpose is stewarding data.

Data governance is particularly important for health data because the toughest kinds of decisions that we make about data are ones that involve a lot of different interests, and this is really salient for health data. On one side we have deeply personal information about each of us, including things even our families might not know. On the other side, that data, especially in aggregate, can help advance medical research, discover new and better treatments, and simply help manage the health system.

The latter benefits are why the World Health Organization is calling for us to see health data as a “public good”. Realising those benefits means sharing data as widely as possible – and we’ve seen how sharing data about the novel coronavirus really helped us understand it and develop treatments for Covid-19. It’s also fair to say that all the public attitudes studies I’ve seen about the use of health data recognise the utility of data for medical research and these kinds of public benefits.

However, there are also contexts and kinds of data use and sharing that people feel a lot less happy about, where “public good” as a term can rightly be criticised, for example:

  • sharing with private sector companies who then charge prohibitive amounts for access to the drugs they develop based on that data

  • sharing data about particular groups, particularly those who have been and are discriminated against, including within the health system, such as indigenous people, Black people, or trans people

  • and at a global scale, sharing data about Global South populations for exploitation by Global North organisations, who then get recognition and funding for further research

So data governance, particularly in a health setting, needs to address these challenges:

  • awareness of public attitudes, fears and concerns and what that might mean in terms of participation in both health data collection – such as increases in opt-outs – and the health system more generally

  • end-to-end involvement, participation, and actual co-design in data and data governance so the interests of communities are embedded into how these systems and processes work

  • data governance systems that feel safe for people who aren’t directly involved in co-design themselves, really demonstrate and communicate care, recognise personal and community sovereignty, and provide accountability and redress when things go wrong

And I’ll note that in all three we have to take into account that there are multiple publics with different risks, risk appetites, privileges and histories.

In summary, data governance is all about how we make the tough decisions about data and make sure everyone’s interests – particularly the people data is about and those who are going to be affected by its use – are taken into account.

Isaac then spoke about his experience in Zanzibar, where they’re putting in place the first nationally led digital health programme, with volunteers collecting health data that is then stewarded by the Ministry of Health. He talked about the data governance framework they have put in place to set norms about things like sharing data externally and internally, and about their activities to increase awareness and capacity inside and outside government.

Krystal spoke eloquently about how indigenous data should not be seen as a public good, but as one that belongs to relevant indigenous nations. She described “helicopter research programmes” where researchers come to collect data and communities don’t understand what they’re signing up to (not least because of language gaps); where researchers leave without real connections, conversations or engagement; and where those indigenous communities feel exploited, particularly when they are promised access to medicine as a result of this research, and it never comes to pass.

Isaac similarly related complaints he’d heard from government officials about development organisations partnering with researchers and implementing programmes which collect data but then publish papers without really involving either the government or the communities they’re gathering data on. He emphasised that these communities need to see how data is going to be used to address their problems, as well as ensuring things like data anonymisation.

Krystal spoke about group risks arising from harmful inferences, stigmas and assumptions around biology and race, and pointed out that increased harmonisation across datasets increases risks of re-identification. She challenged us to think beyond Western approaches to ethics and the ideas of informed consent. She also highlighted that while public research in bio-medicine is highly regulated, industry research is much more of a free-for-all, and that they gather a lot of data from digital services and insurance providers.

I probed Krystal and Isaac a little about the “right to be seen”: a lack of representation within datasets can render already-marginalised people invisible, which is also a risk. While they recognised that risk, they also emphasised that the right to be seen is a right to be exercised on their own terms, not one that acts as an excuse for exploitative data collection by third parties.

Before the session, I’d also made notes on a couple of other topics which helped me structure some of my thinking.

First, I believe that organisations who are stewarding health data – medical researchers, national health systems, individual hospitals – should be actively involving the people who are affected by the use of data in these decisions. There are a range of methods for doing this, all with pros and cons, so good approaches will use a combination of them:

  • Organisations need to include communities in setting up how the data governance system works in the first place, including, for example, putting in place broad ethical guidelines, and understanding how to balance individual and collective interests. Here, they can apply good participative co-design methodologies, for example using citizen juries and deliberative workshops.

  • Organisations also need to include communities in their day-to-day data governance, for example when choosing whether a particular researcher should have access to particular data for a particular purpose. This involvement might look more like a board or panel, with patient and community representatives as members.

  • Organisations also need to provide a route for members of affected communities to raise complaints and make suggestions – detecting where things have gone wrong, holding organisations to account and achieving redress. This engagement needs to always be open to everyone, but be particularly accessible to consumer rights and other civil society organisations who can act collectively. What happens through complaints should feed back into wider decision-making processes.

Second, I think it’s very hard to get meaningful individual-level consent about the use of data, particularly given the way data is used now. The idea of individual consent to manage data originally arose from research methodologies where you would typically be gathering data for a specific study and be looking at a relatively small number of participants. It’s a very different situation now, particularly with “secondary use” of patient health records, where data might be used for lots of different purposes, in ways we can’t predict when the data is collected.

We need to be aware that the primary use of health records – to provide care to an individual patient – is still really important. We really don’t want to prevent people from divulging information when they see a medical professional because they’re worried about how that information might be reused.

We know that human decision making around consent is really sub-optimal. We don’t make fully-considered rational decisions all the time, particularly when we’re desperate for a service (which is often the case when engaging around health) or pressed for time. The whole set of implications of the use of data, particularly in the future with unknown technologies, is really complex so none of us can really be truly informed about it. And we know the context in which we’re asked these questions – the choice architecture we’re presented with – really affects our answers, either deliberately or not. For example, “just sign this form” at the beginning of an appointment isn’t likely to provide a context in which consent is meaningful.

We do need some level of individual flexibility around how data about us is used, because we do carry different levels of risk, different risk appetites, and have different values and levels of commitment to altruism. But we also need some basic protections that provide a safe framework within which those decisions can be made.

Third, it’s worth thinking about how governance changes in the face of innovation. Despite being a technologist, I don’t generally think data governance is something that can be solved with technology. Even the best Trusted Research Environments, such as OpenSAFELY, which provides lots of provable privacy and proactive transparency, can still be used to do studies that are ultimately damaging or unethical. We still need the human side of governance to identify and mitigate some kinds of harms.

My final take-away from the session was that it highlighted the need for communities to be involved at every stage of a project or programme, from the way in which it’s originally shaped (such as the purpose of a research study) through to how the benefits and outcomes are felt by those who take part. Being involved in data governance is just one part of this much larger and longer process of co-design and engagement, but you can’t really have good data governance if you’re not thinking about the whole process.