Defining Publicly Available Platform Data
Given the central role that private technology platforms play in the dissemination of information in the modern media environment, public access to platform data is essential for advancing the common good. Independent researchers, journalists, members of civil society and the public all rely on platform data to understand and expose critical aspects of information production and dissemination. Increasingly, regulatory requirements like the Digital Services Act require the production of some level of publicly available data from these platforms. But there is no clear agreement on what data should be made available, when and how.
KGI is advancing a project to proactively articulate a consensus definition for Publicly Available Platform Data. The definition will be a framework for what kind of platform data should be made publicly available, under what circumstances, and in what formats. The ultimate goal of our Publicly Available Platform Data work is to articulate a uniform, cross-industry framework that allows for understanding the online information ecosystem as a whole, not in platform-specific silos mediated by ever-dwindling and highly structured researcher access opportunities. Learn more about this project below:
FAQ
Who is KGI?
The Knight-Georgetown Institute (KGI) is dedicated to connecting independent research with technology policy and design. Based at Georgetown University in Washington, D.C., KGI serves as a central hub for the growing network of scholarship that seeks to shape how technology is used to produce, disseminate, and access information. KGI is designed to provide practical resources that policymakers, journalists, and private and public sector leaders can use to tackle information and technology issues in real time.
Why is KGI working to define Publicly Available Platform Data?
Given the central role that private technology platforms play in the dissemination of information in the modern media environment, public access to platform data is essential for advancing the common good. Independent researchers, journalists, members of civil society and the public all must rely on platform data to form informed opinions and identify policy priorities. Increasingly, regulatory requirements like the Digital Services Act require the production of some level of publicly available data from these platforms. But there is no clear agreement on what data should be made available, how and when. KGI's definition will provide a framework for what kind of platform data should be made publicly available, under what circumstances, and in what formats. The ultimate goal of our Publicly Available Platform Data work is to articulate a uniform, cross-industry framework that allows for understanding the online information ecosystem as a whole, not in platform-specific silos mediated by ever-dwindling and highly structured researcher access opportunities.
Why is a definition necessary?
For over a decade, stakeholders both inside and outside platform companies have debated what data platforms should disclose and to whom. Thanks to transparency advocates in diverse fields, various transparency regimes have taken hold: voluntary, self-regulatory, and regulatory. These regimes require platforms to share information about their activities, algorithms, and processes with vetted entities such as researchers, regulators, or business competitors, and sometimes with the broader public. The public interest research community, composed of researchers, advocates, and journalists, is intimately familiar with the benefits and limitations of different data access regimes. This community is therefore uniquely positioned to help articulate the societal needs and benefits of publicly accessible platform data, advocate for effective policies, and develop standardized data schemas and protocols that can support meaningful cross-platform research. Yet despite years of successful advocacy to increase vetted researcher access to platform data, there is still no gold standard for what kind of platform data should be made publicly available, under what circumstances, and in what formats.
How will KGI work to develop a definition?
KGI will convene an expert working group (EWG) of academic researchers, journalists, and civil society representatives and work with it to jointly develop the definition of Publicly Available Platform Data. The EWG will draft and iterate on a set of standards for the types and formats of data that online platforms should make publicly available. It will also proactively seek diverse views from across the research, journalism, and civil society ecosystem.
Who is involved in developing the definition?
The EWG will include academic researchers, journalists, and civil society representatives. Participants are being identified based on their experience, perspectives, and membership in broader transparency and data networks. Developing the definition will be a shared effort, and the EWG expects to use surveys and other forms of outreach to solicit and incorporate a range of perspectives. We intend for this work to lead, over time, to broader and more consistent access to publicly available platform data.
How is publicly available platform data related to other transparency and data access initiatives?
There are important data access efforts already underway. Catalyzed by the EU’s Digital Services Act, vital work is happening to ensure that vetted researcher access regimes deliver their full potential. This project is designed to complement those activities by focusing on public access to public platform data, so that any interested party can better understand the relationships between online platforms and individuals, communities, and societies. We also collaborate with others in this area, including the Coalition for Independent Technology Research and the Mozilla Foundation.
What do the platforms think about the idea of publicly available data?
Platforms have not voluntarily expanded public access to data. Indeed, in the last several years platforms including X/Twitter, Meta (including Facebook and Instagram), and Reddit have further restricted access to data via their APIs. However, we are in a period of regulatory fragmentation in which numerous countries are pursuing related, but at times contradictory, rules on transparency reporting and data access. Articulating a consensus definition can contribute to more effective and predictable standards for data access across platforms. It could level the playing field and help ensure baseline levels of compliance across all platforms.
Do you expect platforms to adopt the definition voluntarily? If not, how will it be required by law?
In the near term, we would not expect most platforms to voluntarily sign on to a consensus definition of publicly available platform data (though we would certainly encourage them to do so!). As regulation proliferates, however, we expect that a single, consistent definition for access to publicly available platform data will be in the interest of both the public and the platforms. A key purpose of establishing a definition will be to engage key policy audiences in the EU, UK, US, and elsewhere on approaches to regulation mandating data access.
What is the timeline for developing the definition?
KGI launched the development of a definition for publicly available platform data in the fourth quarter of 2024. The EWG expects to finalize and launch the definition for publicly available platform data in the second half of 2025.
How will the definition accommodate differences in design of different platforms, and platforms of different sizes?
These questions are central to the EWG's work. We currently envision a modular approach so that the definition can be used across platforms of different kinds and at different scales.
Will the definition be published by a formal standards organization or by the new intermediary body?
We can’t predict how the definition may eventually be used. By articulating a consensus definition, however, KGI and the EWG aim to build broader agreement on what kind of platform data should be made publicly available, under what circumstances, and in what formats. We see this effort as a critical step toward more formal alignment on standards for public access to public data.
Who can I contact to learn more?
Please contact Leticia Bode (Leticia.Bode@georgetown.edu) or Peter Chapman (peter.chapman@georgetown.edu), or find us online.