Online Safety Data Initiative: Establishing our priorities

Earlier this month, DCMS launched the Online Safety Data Initiative, a 15-month project designed to test methodologies for improving access to higher-quality data, in support of technology that identifies and removes harmful and illegal content from the internet.

This work is being led by a consortium of experts from Government, the Online Safety Tech Industry Association (OSTIA), Faculty, and PUBLIC.

The initiative aims to determine and prototype new approaches which ensure trusted parties can access the online harms data they need to develop new online safety solutions. Harm data – the media and metadata that relates to harmful and illegal online content and behaviour – is the single most valuable resource in safety tech. All existing moderation models are based on this data, from URL sharing initiatives to hash-matching and other automated tooling. It’s also the most vital dependency for the development and testing of AI models to address particularly challenging and emergent forms of harm.

We’ve spent much of the past fortnight talking to safety tech companies, communications service providers (CSPs), civil society and academia about the data challenges they’ve faced while trying to make the internet safer and more trustworthy.

The two challenges that have been most frequently cited are scarcity of access to data and inconsistency in its quality.

Lack of data access

In most cases, lack of access to harm data is the primary barrier to growth and innovation for safety tech. This data is sensitive by nature and sometimes contains personal information, which necessitates a greater level of protection. Getting the security and privacy principles right, so that this data can be accessed safely, is paramount to the success of this project.

We’re in the process of mapping the current data landscape for a range of online harms – research that we’re keen to share with interested parties after its completion in March 2021. What we’re starting to see already is that online harms data is owned by a range of parties but collective efforts to share that data are easier and more developed for some forms of harm than for others.

Lack of quality data

The challenge of inconsistent data quality stems from the distributed nature of online harms and the range of individual approaches to platform moderation taken by CSPs.

Take extremism (as distinct from terrorism). What constitutes ‘extremism’ varies internationally. Even in the UK there’s no statutory definition. As a result, CSPs have to make their own decisions about the extreme content that is uploaded to their platforms, often going beyond what a government could legally enforce in respect of the removal of harmful extremist material.

The consequence is huge variance in how online harms are classified and how data is labelled and stored. These inconsistencies make it difficult to use this data to train new safety tech or test existing systems.

The team leading the Online Safety Data Initiative has significant experience in developing practical solutions to the legal, ethical and technical challenges of data science for online harms, which we will be bringing to bear as this project progresses.

But this initiative will not succeed without listening to and collaborating with a wide and diverse range of stakeholders across the global community who are working to find solutions to online harms and, crucially, without building on the great work this community has already done and is currently developing.

To do this effectively and to maintain the trust of the community we are representing, our work will be guided by the following three principles:

Transparency in thought, decision making and action. We are committed to the principle of working in the open and, where it is appropriate to do so, making our work and anything we develop available to those working to further develop safety technologies. We will couple this with independent oversight of our aims and activities at every stage of our work.
Diversity of thought and approach. We aim to explore a range of practical solutions for a variety of challenges faced by the safety tech community. This will require a whole-community approach to idea generation and the development of innovative solutions.
Privacy and security. We will hold ourselves to the highest standards in protecting public and proprietary data and in ensuring the security of anything we develop and of the work as a whole. This week we began a work programme to identify the foundational security and privacy standards and measures we require to deliver this work, which we’ll detail in a subsequent blog post later this week. Crucially, though, we won’t be ingesting any data until those standards and principles are agreed.

We are under no illusions about the size of the challenge we’re attempting to address, but we’re excited about the opportunity to test some novel approaches that could deliver measurable improvements in online safety technologies. If you have a view to offer, please get in touch with me or any of the rest of the team. We will make the time to talk to all interested parties.
