It’s been a few weeks since my last weeknotes because I’ve been away on leave for the last couple. But things didn’t stop happening just because I was on holiday: in particular it coincided with the publication of the Data Protection and Digital Information Bill (DPDIB), and the news that the Shuttleworth Foundation, which has enabled me to start this work, is going to be closing up shop in 18 months’ time.
The loss of the (potential) third year of the Shuttleworth Foundation fellowship and funding is disappointing but not fatal. I’ve always seen it as a one-year grant with the potential of renewal, rather than a guarantee, and knew that we would have to find other funding to get to the five-year mark that I see as the useful duration of this work. We have other conversations in the pipeline that I’m hopeful about. And I don’t think the supportive Shuttleworth Foundation community is likely to entirely disband. It’s really rare to get the size (~£350k/year) and flexibility of funding that the Shuttleworth Foundation provides, though, and I really feel for the team when this has been their life for 17 years.
Onto Connected by Data itself. This Wednesday we had our second team monthly meetup. We had arranged for this to be in person, but unfortunately the train strikes meant we had to do it online instead, which made it even more of an intensive day than usual. Let me go into some of the things we discussed…
This was a topic I wanted to surface because over the holiday I’d read the critique Against Democratising AI by Johannes Himmelreich at Stanford University. The paper argues against the remedy of the “democratisation” of data and AI governance through naively putting every decision to a popular vote, and for a model that addresses the systemic issues of justice largely through strengthening existing institutions.
I largely agree with the paper and want to highlight a few points in it that I think are important.
Collective, democratic, participatory, deliberative
First, there are a bunch of terms that I’ve previously used somewhat interchangeably which actually need to be made more distinctive and used with more care. This is how I’m thinking about it now:
- Collective data governance is not about how decisions are made, but in whose interests. When we argue for collective data governance we’re saying that our collective interests (eg in systemic impacts such as on equality or our environment) should be considered, not just our individual ones, nor those of powerful corporate or government interests.
- Democratic data governance is about the power of communities in how those decisions are made. We argue for democratic data governance because we think communities should have more power in decisions about data and AI, rather than that being held by the organisations building those systems.
- Participatory data governance is about the mechanics of the exercise of that power in terms of who actually gets involved in the decision making. It is a sliding scale. You can have low participation (eg by communities being given a small opportunity to object, or handing over decision making to representatives) and high participation (eg by communities co-designing the outcome and everyone voting on the outcome) even in democratic data governance. We’d argue that more participation is better, but also resource intensive and seldom perfect. And we’d argue that including those who are marginalised is the most important aspect of good participation.
- Deliberative data governance is about the thoughtfulness of the decision making process. Again there is a sliding scale between quick decisions made on gut reactions and identity allegiances, and decisions made based on weighing often conflicting evidence to reach a nuanced position. Again we’d argue that greater deliberation is better, but there are resource implications around running deliberative exercises.
Unpicking it like this is helpful because it highlights what’s at the centre of our argument, and what’s entailed from that. The core is the need for collective data governance in response to the collective impacts of data. You could achieve collective data governance through technocratic means, but we think technocratic decision making would lack legitimacy and trust, and that these are important for data and AI systems. Therefore we advocate for democratic data governance, with power given to communities affected by these systems. We believe that the more participatory and deliberative these are, the better, but there are lots of models, and putting everything to referendums is not likely to give the best outcomes!
Legitimacy and trust
The second thing I want to pick up is why democratic data governance is important. In his paper, Himmelreich argues that democracy is only needed to fill legitimacy gaps, and that data and AI do not in and of themselves create legitimacy gaps. Rather, he says, those gaps are in the organisations and systems that deploy data and AI. So when scholars and advocates call for democratisation of data and AI their target is wrong: the underlying need is for greater power to the people rather than governments and corporations across their operations.
I can see the argument. Frequently, uses of data and AI are “simply” the automation of capitalism and bureaucracy, and the problematic impacts that arise are manifestations of the underlying systems and institutions. If those same systems were run or decisions made by humans, many of the same problems would remain. But I do think there are features of data-driven and AI-based decision making that are qualitatively different and that lead to distinct legitimacy gaps:
- First, they entail a level of surveillance that would be unthinkable (both practically and morally) if it were done by people. This feels intrusive, reduces our sense of freedom and autonomy, and has a chilling effect on what we choose to do and share. There is a legitimacy gap over the need to know that is a result of the use of data and AI.
- Second, systems that use data and AI for decision making lack humanity, by which I mean both the ability to perceive, understand and account for the depth and complexity of people’s lives; and the empathy to be able to respond with care and compassion. They tend to function much better with the normal and expected than with surprising extremes or minority groups. So it is more likely that people justifiably feel their circumstances haven’t been properly taken into account and question the legitimacy of systems that crudely put us into boxes.
- Third, while various governance systems have been built up over centuries around human decision making, to provide transparency and accountability in their processes, the scale, opacity and pace of introduction of data and AI mean that existing governance has not had time to adapt. This introduces legitimacy gaps in the level of scrutiny applied to these systems.
A final thought on whether targeting data and AI is the right thing: community participation around the creation of data and AI systems could be seen as a way of introducing participation into institutions more generally, similarly to how digital transformation often leads to broader institutional transformations. Even if the legitimacy gaps go beyond data and AI, data and AI systems provide an opportunity and excuse for a process that will naturally raise issues outside of that scope.
As I write this I’m very aware that I’m of course invested in democratic data governance being something that is worth aiming for. I’d love to know from readers whether what I’ve said above makes sense.
A final, briefer, point arising from Himmelreich’s paper is the importance of strengthening (governance around) existing institutions, rather than focusing on new forms. In data, this argues against creating new data trusts, unions, cooperatives and so on, and instead focusing on organisations that already have power and data – governments, corporates, even civil society organisations – and improving the way they work.
I think this is largely right, though I do think it’s likely that our new data environment will require new institutions too (I wrote about this in the FT a few years ago). When I was at ODI, I would often speak about our work there on data institutions as focused on three kinds of organisation:
- existing data institutions – national statistics offices, mapping agencies, credit reference agencies or biobanks for example – which have established governance structures that need to adapt to new demands (such as wider data sharing, and engagement with affected communities)
- new data institutions – data trusts, data unions, data cooperatives and so on – which are being established in response to changes to the data ecosystem and the more active role that people and communities want to play in data stewardship and governance
- existing community-based institutions taking on new data stewardship roles – trade unions, big charities, public bodies etc who find themselves as the custodians of data even when it wasn’t previously part of their institutional role
The same is the case at Connected by Data. Each kind of organisation has strengths and weaknesses. Existing data institutions will have strong data expertise and be established and sustainable, but have a cultural shift to make towards greater openness and engagement. New data institutions may be similarly strong on data expertise and can be built around and with their communities (which will take time to grow), but will struggle in the same way as any startup organisation with capacity and sustainability. Existing community organisations will have good, established communities, and be on a firmer footing financially, but may not have expertise in data and its stewardship and governance.
Anyway, my main point is that at Connected by Data we’re not just interested in new forms of data institution such as data trusts, but in collective power and community involvement being embedded in all kinds of organisations.
Data Protection and Digital Information Bill
The second big topic I wanted to cover in these weeknotes is our emerging plans around the UK Data Protection and Digital Information Bill (DPDIB) (renamed just as I’d got “Data Reform Bill” to roll off my tongue!).
Now, the last few weeks have been… unsettled… politically, and I’m not sure exactly where the priorities of the new Prime Minister and their Government are going to land. I haven’t seen much, if any, attention on matters around data and AI in the priorities of the candidates (though note that Liz Truss was Secretary of State for Defra during their Open Defra days). Julia Lopez, who took over from John Whittingdale as Minister for Data in the middle of the “Data: a new direction” consultation, was one of the many junior ministers who resigned during July, and it’s not clear how long the existing DCMS ministerial team – including Secretary of State Nadine Dorries – will be in post following the selection of the new Conservative leader.
Nevertheless, the second reading of DPDIB is scheduled for 5th September and should give us a better clue to the priority it’ll be given in parliamentary processes over the next year or so.
We’re hoping to both influence changes to its content, and use it as an opportunity to build contacts and awareness with politicians, advisers, journalists and so on, who are going to be important in the long run, with a particular eye to the next General Election, which we think will probably come in 2024.
Working through the Bill itself was really interesting, requiring us to bridge from the vision and principles of what we’d like to see in the world, through what that means in terms of the behaviour of various different organisations, and the legal obligations that would be necessary to require those changes, to what that means in terms of concrete amendments to the Bill.
I expect this will change as we work through everything over the next couple of months, but currently we have a primary focus on making systemic changes that will encourage, enable and enforce collective and participatory data governance; and a secondary focus on ensuring that’s happening in the specific areas where the Bill introduces rules around data in specific sectors such as crime or around vulnerable people.
We want to see an expansion in the harms that the Bill provides protection against:
- consideration of collective impacts on societies and our environment, not just on individual impacts
We think the best way for decision makers to properly consider these harms is through public participation as part of data governance (including in the design of rules, regulations and codes of conduct).
We think the only way of making that participation powerful and meaningful is by improving transparency and accountability around those decisions. This includes:
- open publication of explanations for decisions
- the ability for those affected to raise complaints
Based on this, the kinds of amendments we’re thinking about are:
- Defining the concepts of “decision subject” and “affected stakeholder”, and using them alongside “data subject” wherever “data subject” is being used as a shorthand for those that are affected by the use of data. This is most relevant in the clauses about automated decision making.
- Wherever there are lists of considerations that a decision maker has to “have regard to”, ensuring these include both:
  - having regard to the rights and impacts on decision subjects and affected stakeholders
  - having regard to wider impacts on society, equality, etc.
  This applies to decisions being made by the Secretary of State, the ICO, groups creating codes of practice, and individual organisations for example when carrying out balancing tests for legitimate interest assessments.
- Wherever these decisions are being made – eg in the design of codes of conduct – ensuring there is:
  - an appropriate level of consultation with and participation of affected stakeholders in their creation
  - transparency of the outcome, the rationale, and the process of decision making
  - scope for those affected by the decisions (not just data subjects) to object to them
There are a number of areas within the Bill where it is proposing new rules around the governance of particular types of data. This is particularly the case for:
- some aspects of health and social care data
- biometric data
- birth and death record data
- data that is used for democratic engagement
- data that is used for the set of purposes laid out in Appendix 1 (listed legitimate interests), including:
Many of these have not been through a specific consultation process, so there has been little or no public participation in shaping the rules proposed in the legislation. As such, they are proof points that help to make the broader points about the systemic changes we want to see about considering broader impacts and public consultation and deliberation.
Our goal here, then, is to support other organisations who are campaigning around changes to these sector-specific rules and highlight the wider systemic issues that need to be addressed.
Evidence and stories
One of our challenges, as we look to influence DPDIB, is coming up with the stories – which are best if they’re grounded in evidence – that will illustrate the points we want to make.
The well-known stories about data tend to:
- focus on data harms rather than benefits – understandably as describing dangers tends to prompt action
- describe potential future impacts, rather than evidenced actual impacts – again understandably given that collecting evidence of impact is a lot harder (and actual impacts a lot more complex) than imagined ones
- emphasise individual impacts and agency – after all, a personal story is a lot more compelling than a somewhat abstract collective one
I’m really looking forward to having a researcher join us over the next few months to help us unpack stories of the impact of data, both to give us some more examples and to start to identify some patterns in them which help to highlight collective impacts.
There are a couple of areas of focus for me for the rest of this month just to progress those important-but-not-urgent parts of our work. In particular, we’ve been working on our Theory of Change, which I need to polish off and share, and I want to take the first steps towards getting an Advisory Board into place.
The other thing I’ve been working on (with Tim) is a “Fellowship and Action Learning Set” programme to help introduce collective and participatory data governance into organisations, build a community of practitioners, and learn about what works and what doesn’t. I had a really useful conversation with Julia Kloiber at Superrr Lab this week about how we should design this programme, concluding that supporting existing employees – rather than bringing in people from outside – was more likely to be both effective and affordable.
We still have to get some funding to support this programme, and one of the biggest risks is not identifying any organisations or individuals who want to take part. So if you’re following our work and are interested in taking part in the fellowship (or know someone who might be), please get in touch!
Decision subjects are those about whom individual-level decisions are made, often based on data about other people, for example when an AI system is trained on data about one subset of the population and applied to another subset. I’ve previously described these as victims.
Affected stakeholders are those who are affected by data-based decisions that affect whole communities. For example, if a council plans road improvements based on a survey of drivers, this would include drivers who weren’t directly surveyed, cyclists, pedestrians, homeowners, refuse collectors and so on.