
From Ambiguity to Accountability: Analyzing Recommender System Audits under the DSA

The first round of Digital Services Act (DSA) recommender system audits is inconsistent in how platforms and auditors define key terms and assess recommender system performance. To ensure meaningful transparency, we need clear definitions, more data, and verifiable system outcomes.

Imagine if financial auditors had no standard processes or definitions to follow – just their own insights and discretion to decide what and how to audit. The results would not be very meaningful. This is where we find ourselves with the first round of audits under the EU’s Digital Services Act (DSA). Without clear standards, the effectiveness of these audits depends on which firm conducts them and how key terms and processes are defined.

Last month, I wrote about how DSA risk assessments and audits are undermined by two gaps: (1) the failure to adequately assess the role of platform design in relation to risk; and (2) a lack of reporting about the data, metrics, and methods platforms and auditors used to evaluate risk and compliance.

This piece analyzes the different approaches platforms and independent auditors took in the first round of DSA audits related to recommender systems (DSA Articles 27 and 38).

Without consistency and comparability, these audits cannot meaningfully assess how effectively platforms are mitigating risk. By clarifying core definitions and identifying effective audit approaches now, the DSA audits can become a more effective accountability tool. To begin to make progress, let’s look at where we are today.

Auditing under the DSA regime

As many others have noted, the DSA introduces new terms and concepts for understanding platform risks, sometimes without a clear definition. This is expected given the novel nature of the DSA regulatory regime. In these early years of the DSA, a range of stakeholders – online platforms, civil society, the European Commission (EC), and national Digital Services Coordinators (DSCs) – must experiment, identify good practices, and share lessons learned. Such iteration is important to ensure an adaptive DSA regime that spurs innovation and responds to shifting technologies, risks, and mitigation strategies.

The need for iteration and flexibility, however, should not mean the audits fail to deliver on their potential as vehicles for transparency and accountability. The first round of independent audits of recommender systems reveals clear areas for immediate improvement.

Because the core definitions and methodologies were developed independently by platforms and auditors, significant inconsistencies exist in both risk assessment and audit processes. When evaluating the same requirements, platform auditors have differing expectations and employ different terminology. Reviewing audit findings related to recommender systems leaves us comparing apples to artichokes.

What do the DSA recommender systems audits assess?

Auditors assessed articles related to recommender systems across the DSA, including Articles 27 and 38. These articles require platforms to:

  • Describe, in “plain and intelligible language,” the “main parameters” of their recommender systems, including the “most significant” criteria for recommending information to users and the “reasons for the relative importance of those parameters.”
  • Enable users “to select and to modify at any time their preferred” recommender system, via an option that is “directly and easily accessible” on the platform.
  • For very large platforms, offer at least one recommender system that is “not based on profiling,” as defined by the EU’s General Data Protection Regulation.

Several core terms – like “plain and intelligible language” or “most significant” recommender system criteria – are not defined in the DSA. In auditing Facebook, for example, Ernst & Young noted that “many of the obligations needed to be supplemented by the audited provider’s own legal determination, benchmark and/or definition of ambiguous terms.” This is a common approach and is an important way for platforms and auditors to clarify expectations. It is also an opportunity for stakeholders to align around a shared understanding of core expectations.

What are the key definitions of recommender systems?

Before your eyes glaze over at the prospect of a thousand words on audit definitions, know that these nuances really matter! There is significant variation in how platforms approach recommender system definitions. Some platforms have defined DSA-related terms, whereas others have chosen not to. Table 1 below summarizes some of the definitions given by platforms.

In many instances, definitions will be where the DSA’s substantive requirements actually become meaningful for users. A requirement for “directly and easily accessible,” for example, will only be meaningful if it is operationalized in a way that empowers platform users to shape their recommender systems.

The audits demonstrate important variation across foundational definitions:

  • Main parameters of a recommender system: What exactly are the main parameters of a recommender system? As others have noted, recommender systems rely on hundreds of components that are constantly refined (not to mention the potential for “millions or billions of learned neural network weights”). The text of the DSA says main parameters are “at least: (a) the criteria which are most significant in determining the information suggested to the recipient of the service; (b) the reasons for the relative importance of those parameters.” Platforms have further defined this as “broad categories of signals” (TikTok), those “most significant in determining” recommendations (Pinterest, closely aligned to the DSA), and the “primary factors determining output” of the recommender system (Snap). Other platforms have simply left the term open to interpretation, as is the case with Google/YouTube and X. Without understanding how a platform interprets and communicates “main parameters,” it is very difficult to understand how these components connect to system outputs. Yet there is work on which to build. The Knight-Georgetown Institute (KGI), where I work, has a forthcoming report offering concrete recommendations for how to publicly disclose information about the specific input data and weights used in the design of recommender systems.
  • Plain and intelligible language: The DSA requires the main parameters of recommender systems to be spelled out in plain and intelligible language. What does this concretely mean in the recommender system context? Is it free of “acronyms or complex/technical terminology” (Pinterest), “straightforward vocabulary and easy to perceive, understand, or interpret” (Snap), or “written for a general audience with varying technical skill levels, inclusive of all users” (TikTok)? There’s a subtle difference in expectations associated with each framing. These terms don’t need to be defined in a vacuum. Platforms, auditors, and the EC should build on important research into effective online disclosures, for example from the CyLab Usable Privacy and Security Laboratory (CUPS) at Carnegie Mellon University or the OECD.
  • Directly and easily accessible: The DSA requires that recommender system selection and modification be “directly and easily accessible” for users. Again, this could mean different things to platforms and auditors. Is it “intuitive, reliable, and easy-to-find entry-points” (Facebook) or merely tools available to “all users” (Snap)? Or must the option be “initially surfaced as a pop-up … where the information is being ‘prioritized,’ and is easily accessible with one click” (Pinterest, emphasis added)? Or might it be whatever TikTok’s redacted definition is? Digital platforms have a long history of conducting user testing to understand concepts like “directly and easily accessible.” The UK Competition & Markets Authority’s evidence review of online choice architecture (OCA) and consumer and competition harm, for example, taxonomizes OCA and summarizes literature on effective practices. A recent Centre on Regulation in Europe (CERRE) report maps the EU choice architecture regulatory environment and articulates principles to address risks. Platforms and auditors need to incorporate these and other benchmarks into the definition of “directly and easily accessible” modification.
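To make the “main parameters” question concrete, here is a deliberately toy sketch in Python. The signal names and weights are invented for illustration and bear no resemblance to any platform’s actual system; the point is the distinction at stake: a handful of named, weighted signals that could plausibly be disclosed in plain language, sitting on top of scores that, in reality, come from large learned models.

```python
# Hypothetical illustration only: a toy ranking function whose
# hand-named signal weights are the kind of "main parameters"
# Article 27 asks platforms to disclose. Real systems combine such
# signals with millions of learned model weights that cannot be
# listed in plain language.

MAIN_PARAMETERS = {
    # signal name -> relative importance (illustrative values only)
    "predicted_watch_time": 0.5,
    "follows_creator": 0.3,
    "topic_match_to_history": 0.2,
}

def score(item_signals: dict) -> float:
    """Combine per-item signals using the disclosed relative weights."""
    return sum(
        weight * item_signals.get(name, 0.0)
        for name, weight in MAIN_PARAMETERS.items()
    )

def describe_main_parameters() -> str:
    """A 'plain language' disclosure derived directly from the weights,
    ordered by relative importance."""
    ranked = sorted(MAIN_PARAMETERS.items(), key=lambda kv: -kv[1])
    return "; ".join(f"{name} (weight {w})" for name, w in ranked)
```

A disclosure duty only bites at the level of these named weights; whether they truly capture the “most significant” criteria, given everything the learned model contributes, is exactly the definitional question the audits answer differently.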

Each of these examples demonstrates consequential decisions the auditors made that significantly affect the outcome of the audits. Not all elements of the DSA recommender system audits need standardized definitions. However, more standardized definitions of key terms, including the three described above, are needed to make the DSA audits meaningful. The EC, DSCs, civil society groups, and platforms should take the opportunity to identify which terms necessitate more specificity and guidance.

Table 1: Summary of Articles 27 and 38 Audit Definitions 
Platform Summary of Article 27 and 38 Definitional Guidance Provided to Auditors
Facebook: “Directly and easily accessible” in Article 27(3) is defined by Facebook as “intuitive, reliable, and easy-to-find entry-points throughout Facebook to access and browse non-profiled recommendations and content.”

Meta did not provide the auditor with supplemental definitional language for Article 38. 

Google/YouTube: Google’s audit does not reference definitional standards for Article 27 or Article 38.
Pinterest: Pinterest provided its auditor with definitions, including:

  • “Plain and intelligible language” under Article 27(1): “without acronyms or complex / technical terminology. …where terms can be complex or have legal implications, a ‘More simply put’ section is added to the policy page to provide a concise summary.”
  • “Main parameters” under Article 27(1): “The criteria that are most significant in determining the information suggested to the recipient of the service.”
  • “Directly and easily accessible” under Article 27(3): “The option that allows recipients of the service to select their preferred option for a recommender system is initially surfaced as a pop-up on the homepage of Pinterest or on the user’s app, which is where the information is being ‘prioritized,’ and is easily accessible with one click. Following that initial selection, EU users can access this option from their settings under ‘Privacy and Data,’ which is directly and easily accessible with one click on the ‘settings’ icon in the app or website and another click on ‘Privacy and Data.’”
  • Pinterest did not provide the auditor with supplemental definitional language for Article 38. 
Snap: Snap provided its auditor with definitions, including:

  • Definition of “plain” under Article 27(1): “Using straightforward vocabulary”
  • Definition of “intelligible” under Article 27(1): “Easy to perceive, understand, or interpret”
  • Definition of “main parameters” under Article 27(1): “Primary factors determining output of Snap’s recommender systems”
  • Definition of “directly” under Article 27(3): “Visible (includes clear headings and keywords and is discoverable)”
  • Definition of “easily accessible” under Article 27(3): “Available to all users”
  • Snap did not provide the auditor with supplemental definitional language for Article 38. 
TikTok: TikTok provided its auditor with definitions, including:

  • “Plain and intelligible language” under Article 27(1): “Clear information, written for a general audience with varying technical skill levels, inclusive of all users, that is helpful and avoids complex words, phrases, jargon, formality, and legalese.”
  • “Main parameters” under Article 27(1): “broad categories of signals that inform recommender systems.”
  • “Options to modify or influence those main parameters” under Article 27(1): “(1)… enable users to influence or modify the ‘main parameters’, as defined by TikTok (user information, content information, interaction information) and therefore affect the information that will be presented to the user; and (2) options must be a tool or a feature which the user engages with, over and above a user interaction (defined as a main parameter and already informing the user in the Help Center article that how they use the app / interact with content, will impact what they see). (3) the following controls are not considered to be ‘options to modify or influence those main parameters’:
    • Limiting the pool of content: These are content controls that TikTok offers to users that restrict certain content from being retrieved by the recommender systems but do not ‘modify or influence’ the main parameters (listed above).
    • Non-personalized options: These offer users separate experiences not based on profiling.
    • Settings: These are user preferences that affect the user’s experience on the Platform generally (e.g., language and translation preferences, and location services).
  • “Directly” under Article 27(3): “This must be directly accessible from the main part of the feed/feature in question … Where the setting is not present in the first layer, we have ensured that it is intuitive for the user to click through to the setting.”
  • “Easily accessible” under Article 27(3): redacted.
  • “Recommender systems in scope that are based on profiling” under Article 38: “Friends, Accounts, Comments, Search, For You Feed (FYF), Following, Live, Notifications and Stories.”
X: X’s audit does not include supplemental definitional standards for Article 27 or Article 38.

What methods were used to audit recommender systems? 

Auditors used varied methodologies. Table 2 below summarizes the methodologies auditors used in assessing select recommender system compliance.

Given that the EC has provided limited guidance, variation in audit methodologies was foreseeable. As with definitions, different approaches early in the DSA compliance regime can allow us to take stock of effective practices and emerging gaps. So, what methodological lessons does this first round of audits offer? There is certainly room for improvement.

  • Outcomes: Audits appear split on whether they confirm that user modification of the recommender system actually changes system outputs. Some audits appear to be desk-based, such as Facebook’s, where the auditor reviewed the system card and “sample changes.” Pinterest’s auditor examined “model documentation and code and ascertained that the main parameters used in recommender systems were impacted by [the] user’s decision on opting out from profiling.” Auditors for Google/YouTube and Snap, however, describe a more involved process. With Snap, the auditor selected a sample of recommender systems from an inventory and assessed whether “the algorithmic systems were tested and approved consistent with the audited providers policies and processes.” For Google/YouTube, the auditor appeared to actually confirm recommender system outcomes: the auditor inspected “the changes to the recommender system outputs before and after modifying the options and determined that the user’s selected options influence the main parameter.” Meanwhile, X’s audit was not able to proceed with planned “substantive testing” due to a lack of necessary audit resources. Looking across these approaches, Google/YouTube’s audit assesses outcomes most robustly, and its approach could be incorporated into other audits.
  • User experience: The audits also differ in how the auditor assessed the user interface for modifying recommender system parameters. The Facebook audit notes that user tools to modify recommender systems were “easily accessible from the specific section of the online platform’s interface where the information was being prioritized.” But the auditor does not describe how it came to this determination or whether any data informed the assessment. Pinterest’s auditor confirmed modification was “direct and easily accessible,” informed by the definition Pinterest provided to the auditor, described above. Google/YouTube’s auditor describes reviewing and assessing user journey processes (in the form of screenshots). Platforms regularly conduct user testing of design changes to assess impact. Were such user testing studies incorporated into the audit process when assessing accessibility for users? Mozilla has found, for example, that subtle design choices influence users’ choices. To meaningfully assess the user interface, auditors must consider the metrics and testing that platforms themselves use to assess their interfaces. Research suggests that small changes can have big impacts.
  • Data: One of the most striking gaps in the audits is the lack of discussion of how they were informed by actual platform data and assessments. When assessing the sufficiency of the disclosed main parameters, what benchmarks did auditors use? Did they consider input data (i.e., sources of raw information), predictions, scores, or the weights assigned to any of these when assessing the most significant criteria in design? When assessing options for users to shape recommender system design, did the auditors assess a combination of granular controls (e.g., over individual pieces of content) and coarser controls over the inclusion of specific topics (e.g., political content)? Did the auditors consider the platform’s use of item- or user-level surveys? Research suggests these are important tools to effectively shape the design of recommender systems. Moving forward, audits can be much clearer about their sources of data and evidence.
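The outcome-focused approach described for Google/YouTube – comparing recommender outputs before and after a user modifies their options – can be sketched as a simple automated check. This is a hypothetical illustration: the `get_recommendations` interface and the “any difference counts” criterion are my assumptions, not any auditor’s actual tooling.

```python
# Hypothetical sketch of an outcome-based audit check: confirm that
# toggling the non-profiling option actually changes what the
# recommender returns, rather than relying on documentation alone.

def audit_modification_changes_outputs(get_recommendations, user_id: str) -> bool:
    """get_recommendations(user_id, profiling) -> ordered list of item ids.

    Returns True if disabling profiling produces a different ranking.
    A real audit would define a similarity threshold and sample many
    users; here any difference at all counts as a change.
    """
    before = get_recommendations(user_id, profiling=True)
    after = get_recommendations(user_id, profiling=False)
    return before != after

# Toy stand-in for a platform API, for demonstration only.
def fake_platform(user_id, profiling):
    personalized = ["video_a", "video_b", "video_c"]
    chronological = ["video_c", "video_b", "video_a"]
    return personalized if profiling else chronological
```

A check of this shape verifies system outcomes directly, which is the gap that desk-based reviews of documentation and screenshots leave open.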
Table 2: Summary of Article 27 and 38 Audit Methodologies 
Platform Audit Approach 
Facebook

Article 27(1): Terms and Conditions; Main Parameters

  • 10 step process; engaged management; reviewed terms and conditions; System Card documentation; and “a sample change to a Facebook recommender system.”

Article 27(2): Criteria and Relative Importance

  • 8 step process; engaged management; asked about non-compliance; inspected semi-annual review of System Card documentation; and “a sample change to a Facebook recommender system.”

Article 27(3): Modification: 

  • 4 step process; engaged management; determined easy access; asked about non-compliance.

Article 38(1): Non-Profiling Option

  • 6 step process; engaged management; asked about non-compliance; checked for non-profiling option.
Google/YouTube

Article 27(1): Terms and Conditions; Main Parameters

  • 4 step process; walk-through; engaged management; reviewed terms and conditions (and listed); reviewed user journey process screenshots; assessed for reasonableness.

Article 27(2): Criteria and Relative Importance

  • 4 step process; walk-through; engaged management; reviewed Transparency Center; assessed for reasonableness.

Article 27(3): Modification: 

  • 7 step process; walk-through; engaged management; inspected user journey process screenshots; inspected the changes to the recommender system outputs before and after modifying the options and determined that the user’s selected options influence the main parameters; inspected model documentation and code; inspected system documentation, dashboards and reports; inspected systemic risk assessment pursuant to Art. 34.

Article 38(1): Non-Profiling Option

  • 6 step process; walk-through; engaged management; inspected platform’s underlying code; inspected user journey screenshots; inspected the results before and after disabling profiling based system settings
Pinterest

Article 27(1): Terms and Conditions; Main Parameters

  • 6 step process; engaged management; reviewed terms and conditions; examined Pinterest’s recommender system model documentation and code; and reviewed a sample change to a Pinterest recommender system.

Article 27(2): Criteria and Relative Importance

  • 6 step process; ascertained that the parameters used in recommender systems include but are not limited to user demographic information and Pinterest search history, and matched with Terms and Conditions information.

Article 27(3): Modification: 

  • 9 step process; engaged management; confirmed modification “was directly and easily accessible;” ascertained that recommender system parameters were in fact impacted by user choice to opt out of profiling.

Article 38(1): Non-Profiling Option

  • 5 step process; engaged management; inspected model documentation and code.
Snap

Article 27(1): Terms and Conditions; Main Parameters

  • 9 step process; engaged management; reviewed terms of service; inspected recommender system model documentation, code and main parameters; inspected documentation, dashboards and reports; tested that recommender system was “tested and approved consistent with the audited providers policies and processes prior to being implemented in production.”

Article 27(2): Criteria and Relative Importance

  • 5 step process; engaged management; reviewed terms of service; confirmed most significant criteria described in terms of service and relevant policies 

Article 27(3): Modification: 

  • 10 step process; engaged management; “inspected the code supporting the recommender system preference functionality in the production environment;” inspected documentation, dashboards and reports; inspected the “recommender system model documentation and code” to conclude Snap’s “policies, processes and controls were followed for the selected samples.”

Article 38(1): Non-Profiling Option

  • 9 step process; engaged management; inspected model documentation and code to confirm “the user’s preference and selection are used as input by the recommender systems.”
TikTok

Article 27(1): Terms and Conditions; Main Parameters

  • 5 step process; some redacted; conducted walk-through; inspected process to identify main parameters; user options to modify parameters in Help Center; given non-compliance, opted to “perform additional substantive procedures;” “requested TikTok to perform an analysis where they evaluated all identified settings and functionalities against their internal benchmarks.”

Article 27(2): Criteria and Relative Importance

  • 3 step process; conducted walk-through; inspected documentation on the most significant criteria and reasons for relative importance.

Article 27(3): Modification: 

  • 6 step process; conducted walk-through; evaluated “the design and implementation of the functionality” of user controls; inspected “the design and implementation” of user modification functionality and “determined that the options did appropriately alter the performance of the underlying recommender systems.”

Article 38(1): Non-Profiling Option

  • 4 step process; conducted walk-through; assessed design and implementation of non-profiling recommender system and identified that TikTok uses location …
X

Article 27(1): Terms and Conditions; Main Parameters

  • Public information and internal documentation; some redacted; “planned substantive testing was reduced due to a lack of the availability of identified … Expert resources that were necessary to inform the testing team.”

Article 27(2): Criteria and Relative Importance

  • Public information and internal documentation; some redacted; “the reasons for the relative importance of parameters may not be static or continuous, due to the nature of the analytical models used by the recommender systems.”

Article 27(3): Modification: 

  • Public information and internal documentation; some redacted.

Article 38(1): Non-Profiling Option

  • Public information and internal documentation; some redacted; “planned substantive testing was reduced due to a lack of the availability of identified … Expert resources that were necessary to assist the testing team conduct some processes.”

The Way Forward

For audits to deliver on their potential, we need (1) a common understanding of key terms and expectations, and (2) minimum standards of data, evidence, and documentation that should be incorporated into the audit (and risk assessment) process.

  • Definitions: Key terms can be further defined and there is concrete work on which to build. 
    • The definition and operationalization of “plain and intelligible language” should leverage behavioral insights and be grounded in existing research into effective online disclosures, including by CUPS, the OECD and other relevant sources. 
    • A definition of “main parameters” should explicitly spell out what is expected, including in relation to input data, values, and weights. This would allow stakeholders to more effectively understand and compare how recommender system design may contribute to risk and advance effective mitigation.
    • The definition and operationalization of “directly and easily accessible” should account for research and evidence related to OCA and consider platform measures of accessibility of tools to influence recommender systems.
  • Data, Evidence, and Documentation: There is a need to clarify that the audit process (and risk assessments) should incorporate a wider range of relevant platform data sources. The recommender system audits did not appear to consider the wide range of existing platform metrics related to specific user behaviors or user experiences. Much of this data already exists within platforms, and there are existing efforts to clarify what minimum levels of data should be expected to assess recommender systems, as well as DSA requirements more broadly.

Clear standards would benefit all involved. Platforms would gain a more predictable and coherent policy environment. European users would have confidence that meaningful standards are in place. Auditors would have a more objective playbook to follow, improving the efficiency of the audit process.

Now is the time to engage to improve the next round of assessments and audits.

This commentary is cross-posted to Tech Policy Press.
