The Tree of Complexity

Table of Contents:

1. Introduction
2. Understanding the data
3. Introducing product features
4.1. Product features comparison
4.2. Product features in detail
5. Conclusion
6. Annex

Analyzing YouTube's Interconnected Components as a Case Study for Comprehensive Platform Audits and Risk Assessment

Executive Summary

The Digital Services Act (DSA), adopted by the European Union, aims to mitigate the risks associated with online platforms. This policy paper underscores the significance of conducting comprehensive audits to guarantee the secure and responsible functioning of these platforms. To illustrate this point, the study focuses on YouTube and uses the data collected during the data donation campaign in summer 2021 by the consortium DataSkop, coordinated by AlgorithmWatch, a non-profit research and advocacy organization.

The analysis centers on three key components of YouTube available in the data: the news playlist, search results, and Up Next recommendations. Averaged metrics were computed to examine how each component operates in terms of personalized content, credible sources, content sharing among users, and differences between logged-in and logged-out statuses. For instance, the findings reveal that Up Next recommendations exhibit a much higher degree of personalization than the other features and differ both across users and across login statuses. However, it is crucial to acknowledge that while news features and search results may appear less personalized, addressing specific risks comprehensively requires considering the interdependence of these components.

Furthermore, the study emphasizes that relying solely on average figures for each component fails to provide a comprehensive understanding of the platform's dynamics. For instance, by delving deeper into the Up Next component, it was discovered that a cluster of users received a disproportionately high number of recommendations for specific content. This discovery could only be made by analyzing the Up Next component in greater detail, and no discernible difference in suggested content was observed for this cluster in the news and search features.

In conclusion, understanding the interconnections among the components of online platforms, such as YouTube, is crucial for comprehensive audits aimed at ensuring compliance with the Digital Services Act (DSA). By analyzing the tree-like structure of these components and their interdependencies, auditors can uncover valuable insights to develop robust auditing processes. The findings of this study emphasize the need to move beyond average metrics and delve into the details of specific components, as they may hold unique risks and dynamics that require careful examination. By adopting a holistic approach to auditing, we can effectively address the risks associated with online platforms and foster a safer and more accountable digital environment.

1. Introduction

The Digital Services Act (DSA) has been introduced by the European Union to address the risks posed by online platforms. In particular, the act calls for auditing the risks associated with these platforms, which is crucial to ensure their safe and responsible operation.[1] However, the DSA does not provide clear guidelines on how these audits should be conducted, leaving room for various interpretations. This gap increases the likelihood of overlooking critical insights, such as the intricacies within each individual platform component and the interconnectedness of the platform as a whole. This limited view can result in a flawed understanding of the platform's impact on users and lead to inadequate recommendations for improving its performance. Therefore, it is crucial to establish a comprehensive auditing framework that considers the intricate details of each component and their interdependencies to ensure that the platform is assessed thoroughly and effectively.

In the past, platforms and in particular very large online platforms (VLOPs) were often perceived as monolithic entities with a single algorithm, which was misleading. In reality, platforms like YouTube are made up of multiple components and different algorithms, all of which might have different functional logics. Understanding the differences between these components is crucial for accurate risk assessments. To analyze the risk posed by a platform, auditors need to identify the specific components of the platform that are involved, understand how they and their interplay contribute to the risk, and determine which components have the highest impact. For example, in the case of YouTube, when a person searches for election information, they might start with the platform's search engine and only then surf the video recommendation engine. Therefore, looking at only one of those components in isolation is not enough.

To underscore the significance of dissecting the platform into its individual components, we have taken YouTube as a case study, using the data collected in the summer of 2021 as part of a data donation project by the consortium DataSkop, coordinated by AlgorithmWatch, a non-profit research and advocacy organization. Our findings highlight the considerable differences between the components of the platform and the varying levels of risk associated with them. By quantifying these differences, the study offers a more nuanced understanding of the platform's ecosystem, including the interconnections and dependencies between its components.

However, the complexity extends beyond this point. While examining the different components is a crucial step in understanding the platform's ecosystem, it is not enough to fully understand the user experience. For example, in 2021 YouTube set a goal to limit views of "borderline" content from recommendations to 0.5% of all views on the platform.[2] Borderline content refers to material that comes close to but does not quite violate the community guidelines. While this may sound like a small number, it is important to understand that this average does not tell us everything. If those views are concentrated among a small group of users, it can still be a significant problem. Therefore, auditors need to go beyond the average metrics and examine the individual components in detail. Thus, the platform can be thought of as a tree with different interconnected components or branches, each of which may branch out further, form clusters, and require more careful analysis.
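The concentration argument can be made concrete with a short calculation. The sketch below is purely illustrative: only the 0.5% target comes from the text, while the number of users, the views per user, and the assumption that all borderline views fall on 1% of users are hypothetical.

```python
# Hypothetical illustration of how a platform-wide average can hide concentration.
# Only the 0.5% target comes from the text; every other number is an assumption.

def borderline_share_for_group(total_users, views_per_user, platform_share, group_fraction):
    """Share of borderline views within a group that receives ALL borderline views.

    Assumes every user watches the same number of videos (views_per_user).
    """
    total_views = total_users * views_per_user
    borderline_views = total_views * platform_share
    group_views = total_users * group_fraction * views_per_user
    return borderline_views / group_views


share = borderline_share_for_group(
    total_users=1_000_000,   # hypothetical
    views_per_user=100,      # hypothetical
    platform_share=0.005,    # the 0.5% goal mentioned in the text
    group_fraction=0.01,     # hypothetical: all borderline views hit 1% of users
)
print(f"Borderline share of views for the affected group: {share:.0%}")
```

Under these assumptions, roughly half of everything the affected group watches would be borderline content, although the platform-wide figure stays at 0.5%.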

In this publication, we aim to contribute to the development of useful auditing processes by analyzing YouTube as a demonstrative case. First, we provide a brief overview of the data at hand. Then, we introduce three YouTube components that are included in our dataset and explore their variations. Next, we examine each component individually in more detail. Finally, the publication concludes with recommendations for implementing a thorough auditing process.

It is important to note that this publication does not aim to provide an extensive definition of risks or exhaustive guidelines for auditing. Rather, we focus on showcasing the complexities of platform analysis and make the case for a more nuanced approach. For a more comprehensive overview of the Digital Services Act (DSA), including its objectives, importance, and related risks, audits, and assessments, we recommend consulting the SNV publication "Auditing Recommender Systems".

2. Understanding the data

The data used in this analysis was gathered as part of a data donation project by the consortium DataSkop, coordinated by AlgorithmWatch, in cooperation with the European New School of Digital Studies (European University Viadrina). DataSkop is funded by the German Federal Ministry of Education and Research. The dataset includes content recommendations from various YouTube components, such as:

news playlist,
search results,
and Up Next recommendations that suggest additional content based on the video a user is currently watching.

Additionally, the dataset includes certain user profile information such as subscriptions and most recent watch history.

As we will explain below in more detail, our data analysis reveals how recommended content differs across different components of the platform, with distinct clusters of users being exposed to specific content solely through certain platform features.

It is important to note that the data was collected from around 5,000 participants. Because technical issues limited data availability, between 2,600 and 4,400 users were included in the analysis at each step of the experiment. Moreover, the sample of data donation campaign participants is not necessarily representative of the broader YouTube user base, and the results cannot be generalized beyond this sample. Therefore, the primary objective of this publication is not to provide a comprehensive understanding of how YouTube operates but rather to illustrate and emphasize the complexities of platform auditing.

3. Introducing product features

Very large online platforms are complex entities that consist of several technical products, each with its own teams, objectives, and sets of algorithms. We will illustrate this using YouTube as an example. Often, even a single webpage holds multiple platform features. For example, from the YouTube homepage users can access several YouTube components, or product features, as defined by the company itself.[3]

It is important to acknowledge that the list of displayed product features in the chart is not comprehensive. Also, these products may be further divided into sub-products. The aim of this presentation is not to provide a comprehensive overview of YouTube but rather to highlight the intricate architecture of any large platform.

Within the scope of our available data, we have chosen to focus on three of YouTube's most prominent product features: News, Search, and Up Next recommendations. For your reference, we will provide a brief introduction to each feature in the following section.

YouTube has developed several news features, some of which are integrated into other features such as search or the homepage. Our analysis pertains to the top news stories playlist feature, which can be found in the "News" section of YouTube (YouTube.com/news), and specifically examines the first five videos of the playlists shown to users who were logged in to their YouTube accounts.

According to YouTube, the news content featured on their platform is sourced from credible and authoritative news outlets.[4] However, the specific list of these sources remains undisclosed. Therefore, we compiled a list of channels that, according to our data, seem to fulfill YouTube's criteria for being a reputable and reliable source of news. Our selection includes all channels featured in the top news stories playlist as well as the channels of public broadcasters. The complete list is available below.[5] Henceforth, we will refer to these channels as “authoritative” or “credible”.

Overall, the news top stories feature on YouTube appears not to be personalized. Unless a user is subscribed to the news channels we previously identified, the suggested videos are not tailored to their individual viewing history or subscriptions. Moreover, a comparison of the news top stories playlist across multiple users who donated their data at the same hour reveals that a vast majority of them are presented with the same content. In fact, over 75% of these users were shown more than 90% of the same news playlist videos.

When a user types something in the search bar on YouTube, the website suggests a list of videos related to the search. For the search function experiment several political or federal election-related terms such as ‘Scholz’, ‘Baerbock’, ‘Laschet’, ‘Gendern’ (i.e. using gender-inclusive language), ‘Hochwasser’ (i.e. flood), ‘Impfpflicht’ (i.e. mandatory vaccination), and ‘Bundestagswahl 2021 wen wählen’ (i.e. federal election 2021, who to vote for) were selected. We looked at the first ten video suggestions for these seven search queries, first when the users were logged in, and then when they were logged out of their account. This enabled us to compare the results across different users, as well as across login/logout statuses for each user.

On average, there is considerable similarity between the content displayed in the login and logout states for each user. However, this could be due to carry-over effects, in which the search results are influenced by dependencies from previous searches, as there was less than a minute interval between different queries. Another possible factor is that the logout phase of the study was performed within the same browser session with active cookies.
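One simple way to quantify this kind of similarity is the share of a user's logged-in results that also appear in the same user's logged-out results. The sketch below is a minimal illustration under that assumption, not necessarily the metric used in the study; the function name and the video IDs are hypothetical.

```python
def overlap_share(logged_in, logged_out):
    """Fraction of logged-in results that also appear in the logged-out results."""
    if not logged_in:
        return 0.0
    logged_out_ids = set(logged_out)
    return sum(1 for vid in logged_in if vid in logged_out_ids) / len(logged_in)


# Hypothetical top-10 video IDs for one user and one search query
logged_in = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
logged_out = ["a", "b", "c", "d", "e", "f", "g", "h", "x", "y"]

print(overlap_share(logged_in, logged_out))  # 8 of 10 results overlap
```

Averaging this value over all users and queries would give a single overlap figure per feature. Note that this variant ignores ranking: two lists with the same videos in a different order count as identical.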

The search results are also relatively consistent among various users, with the majority of users being presented with a large proportion of the same video suggestions. Nevertheless, a few users are provided with uncommon material. This suggests that the degree of personalization in the search engine feature is more pronounced than in news, although the exact extent of this is uncertain.

In general, we found that the top ten search results on YouTube vary more between different users than the newsfeed does. Furthermore, the proportion of content from the "credible" sources was lower, at approximately 74%. However, similar to the newsfeed, the video recommendations in the search results were not significantly influenced by users' subscriptions or watch history.

The Up Next recommendations appear when users are watching a video. They suggest additional content based on the currently watched video, alongside other videos that YouTube thinks a user may be interested in. In our data we had two gateways to the feature: accessing certain preselected videos directly, and starting the five videos from the top news playlist described above in the logged-in and logged-out states. This enabled us to compare how the feature performs when accessed through different pathways, and also how the recommendations differ between the logged-in and logged-out states. We limited our analysis to the first 10 recommended videos in all cases.

Based on our findings, the Up Next recommendations feature on YouTube appears to be highly personalized, with an average of 50% of the suggested content derived from a user's subscriptions or previously viewed channels when logged in. This holds true for both pathways we examined. However, when users are logged out, the recommended videos vary significantly and appear to be unrelated to subscriptions or viewing history. Furthermore, the proportion of "authoritative" recommendations is lower than in the two other features in both the logged-in (32%) and the logged-out (68%) state.

Overall, the Up Next feature shows a significant shift between the logged-in and logged-out states, with high levels of personalization in the former and greater emphasis on authoritative sources in the latter.
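Both shares discussed above follow the same pattern: count how many of the recommended videos come from a given set of channels. The following sketch illustrates that calculation for a single user; the function name, channel names, and sets are hypothetical placeholders rather than data from the study.

```python
def share_from_channels(recommended_channels, channel_set):
    """Fraction of recommended videos whose channel belongs to channel_set."""
    if not recommended_channels:
        return 0.0
    hits = sum(1 for ch in recommended_channels if ch in channel_set)
    return hits / len(recommended_channels)


# Hypothetical data for one logged-in user: the channel of each of the
# first 10 Up Next recommendations.
recommended = ["ch1", "ch2", "ch3", "ch4", "ch5", "ch6", "ch7", "ch8", "ch9", "ch10"]
subscribed_or_watched = {"ch1", "ch2", "ch3", "ch4", "ch5"}  # personalization signal
authoritative = {"ch5", "ch6", "ch7"}                        # "credible" channel list

personalized = share_from_channels(recommended, subscribed_or_watched)
credible = share_from_channels(recommended, authoritative)
print(f"personalized: {personalized:.0%}, credible: {credible:.0%}")
```

A video can count toward both shares at once (here, "ch5"), so the two figures need not add up to 100%.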

4.1. Product features comparison

By providing the brief introduction to each feature above, we intend to highlight their distinct characteristics and the need for tailored approaches. To further underscore this point, we have compiled a table that presents some general information and mean values from our data for each feature. This table offers a visual representation of the unique characteristics of each feature and why a custom-fit approach is required for effective analysis.

For example, while in our data the newsfeed and the search results are not shaped by personal subscriptions and watch history, the Up Next recommendations are heavily influenced by them. Furthermore, as shown in the table, the features differ in how they work with "authoritative" sources, how similar the recommended content is across users, and how similar the recommendations are when a user is logged in versus logged out.

Metric | News | Search | Up Next
Personalized recommendations (from subscriptions or previously watched channels) | No data | |
"Authoritative" recommendations (from channels that appeared in the news, and further public broadcasters) | No data | 73% | 68%
Shared content among users (recommendations shown to at least 25% of users in the data) | No data | 78% | 48%
Overlap: logged-in vs logged-out (consistent recommendations for each user across user login status) | No data | 83% | 14%

In contrast to the Search and Up Next product features, the data from the News feature was only collected when the users were logged into their accounts.

It is essential to emphasize that while the news playlist and the search results may appear more independent of user preferences than the Up Next recommendations, analyzing the components together, considering their interdependence, is necessary to address specific risks. Typically, when exploring news or search features, users click on a video and move to the more personalized Up Next feature. Our analysis has revealed that the point of accessing Up Next recommendations, whether it is from the news playlist or opening a particular video by following a link, does not have a significant impact on the observed patterns. Thus, it is necessary to consider all features holistically when assessing the risks associated with personalized recommendations on online platforms.
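The "shared content among users" metric in the table counts recommendations that were shown to at least 25% of users. A minimal sketch of one way to compute such a figure is given below; the function name and the toy data are hypothetical, and the exact counting rules used in the study may differ.

```python
from collections import Counter


def shared_content_share(recs_by_user, threshold=0.25):
    """Share of all recommendations whose video was shown to >= threshold of users."""
    n_users = len(recs_by_user)
    # For each video, count how many users saw it (at most once per user).
    users_per_video = Counter(v for recs in recs_by_user.values() for v in set(recs))
    widely_shown = {v for v, n in users_per_video.items() if n / n_users >= threshold}
    total = sum(len(recs) for recs in recs_by_user.values())
    shared = sum(1 for recs in recs_by_user.values() for v in recs if v in widely_shown)
    return shared / total if total else 0.0


# Hypothetical recommendations for five users (video IDs are placeholders)
recs_by_user = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b", "d"],
    "u3": ["a", "e", "f"],
    "u4": ["a", "g", "h"],
    "u5": ["a", "i", "j"],
}
print(f"{shared_content_share(recs_by_user):.0%}")
```

In this toy data only videos "a" and "b" reach the 25% threshold, so 7 of the 15 recommendations count as shared content.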

4.2. Product features in detail

While the table provides a valuable snapshot of the general characteristics of the three YouTube features, it only scratches the surface of what we can learn about them. In order to gain a more comprehensive understanding of each feature, we must conduct a deeper analysis of the data. In particular, we need to examine the distribution of content and the clustering of users within each feature.

In the following sections, we will delve into some numbers from the table in greater detail.

4.2.1. News

YouTube claims that their news content comes from credible and authoritative news sources. However, they have not disclosed a list of such sources. This lack of transparency makes it difficult to assess the trustworthiness of YouTube’s sources and to ensure accountability and oversight.

Furthermore, simply providing a list of authoritative sources would not suffice in ensuring transparency and accountability, as such a list alone would not reveal how recommendations are distributed and which channels are given higher visibility. In April 2022, AlgorithmWatch published a comprehensive study of YouTube's news playlist, which revealed the WELT channel's notable dominance compared to other featured channels.[6]

Most frequent channels in the news top stories playlist

As the chart above shows, on average more than half of the first five news playlist videos came from the WELT channel.

4.2.2. Search

For the search feature we were able to compare the number of authoritative sources across seven search queries, namely "Baerbock", "Bundestagswahl 2021 wen wählen" (i.e. federal election 2021, who to vote for), "Gendern" (i.e. using gender-inclusive language), "Hochwasser" (i.e. flood), "Impfpflicht" (i.e. mandatory vaccination), "Laschet", and "Scholz". This allows for an examination of how the results vary across different queries and how they differ from the average number for the entire dataset.

Based on our data analysis, we found a significant variance across search queries, with results sometimes deviating considerably from the average of 74%. For instance, the authoritative sources accounted for only 16%-17% of the results for "Bundestagswahl 2021 wen wählen," while they made up 98% of the results for "Scholz," as illustrated in the chart below.

These results further highlight the necessity to examine the platform's features beyond the average numbers.

Average share of "credible" search results by search query

"Baerbock": 85%
"Bundestagswahl 2021 wen wählen": 17%
"Gendern": 63%
"Hochwasser": 65%
"Impfpflicht": 92%
"Laschet": 93%
"Scholz": 97%

Each number is the share of "credible" sources among the more than 36,000 search results collected for that query (the top 10 results for each of over 3,600 users).
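The gap between the overall average and the per-query figures can be checked with a few lines of arithmetic. The sketch below uses the seven shares from the chart; treating every query as equally weighted is an assumption that seems reasonable here because each query covers a similar number of results.

```python
from statistics import mean

# Share of "credible" results per search query, in percent (values from the chart)
credible_share_by_query = {
    "Baerbock": 85,
    "Bundestagswahl 2021 wen wählen": 17,
    "Gendern": 63,
    "Hochwasser": 65,
    "Impfpflicht": 92,
    "Laschet": 93,
    "Scholz": 97,
}

# Unweighted mean across queries (assumes similar result counts per query)
average = mean(list(credible_share_by_query.values()))
lowest = min(credible_share_by_query, key=credible_share_by_query.get)
highest = max(credible_share_by_query, key=credible_share_by_query.get)
spread = credible_share_by_query[highest] - credible_share_by_query[lowest]

print(f"average: {average:.1f}%")
print(f"lowest: {lowest}, highest: {highest}, spread: {spread} percentage points")
```

The unweighted mean comes out at about 73%, in line with the averages reported for the search feature, while the individual queries span 80 percentage points. A single average therefore says little about what a user sees for any particular query.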

To examine why the numbers are so different across the search queries we looked at the most frequent channels for each search query. Our analysis identified one channel, "Marvin Neumann," that overwhelmingly dominated the search results for the query "Bundestagswahl 2021 wen wählen." It is worth noting that Marvin Neumann had previously participated in YouTube's Creator Program for Independent Journalists, a support program for freelance journalists on YouTube whose participants received a grant, best practices training and specialist support from YouTube.[7]

Most frequent channels in the search results

Percentages are calculated from over 36,000 search results, equivalent to the top 10 results for each of over 3,600 users.

These results underscore the necessity of disclosing a list of authoritative sources to allow for public scrutiny, as well as how the views of such sources are distributed.

4.2.3. Up Next

The Up Next feature of YouTube is the most personalized feature, which means it can vary greatly from one user to another. We started by analyzing the average percentage of recommended videos from channels that the user has subscribed to or previously watched, as well as the average percentage of "credible" sources, that is, the channels from the top news stories playlists and further public broadcasters.

We looked at the Up Next recommendations for each of the seed videos, that is, the videos preselected before the start of the donation campaign:

"7 kritische Fragen zur Impfung" ("7 critical questions about vaccination")
"Armin Laschet erklärt. Der nächste Bundeskanzler?" ("Armin Laschet explains. The next Federal Chancellor?")
"Die Zerstörung der CDU." ("The destruction of the CDU.")
"Hochwasser in NRW: Katastrophenfall im Rhein-Erft-Kreis" ("Flooding in NRW: State of emergency in the Rhein-Erft district")
"KANZLERDUELL auf YouTube mit REZO?" ("CHANCELLOR DEBATE on YouTube with REZO?")
"mRNA-Impfstoffe: Erste Hinweise auf Langzeitfolgen (mit Clemens Arvay)" ("mRNA vaccines: First indications of long-term effects (with Clemens Arvay)", the video is currently removed from the platform)

After the current video has finished, YouTube automatically plays another video. So, each of the seed videos above triggered an auto-play chain of videos, which was collected for seven iterations. We also looked at the Up Next recommendations at each auto-play iteration in each chain.

The chart below presents how many personalized and how many "authoritative" recommendations users saw on average out of 10 recommended videos when they were watching one of the seed videos or one of the auto-play videos this seed video triggered.

Personalized and "credible" content in recommendations to seed videos and subsequent auto-play videos

The share of recommended videos from channels that the user has subscribed to or watched before is more consistent compared to the share of videos recommended from a list of “credible” sources.

In other words, the percentage of “credible” videos in the recommendations varied more (ranging from 13% to 57%) based on the seed video that was played, whereas the share of recommended videos from watched or subscribed channels had a narrower range (between 38% and 52%).

Overall, as the auto-play iterations progressed, the recommendations became more tailored to the individual user, but were less likely to come from our list of “credible” channels.

The trend becomes clearer when comparing the relative percentage change of each auto-play iteration to the initial value of the seed video. However, the auto-play chain triggered by the "mRNA-Impfstoffe: Erste Hinweise auf Langzeitfolgen (mit Clemens Arvay)" seed video is an exception. In this case, the share of "credible" recommendations increased by nearly 40% in the first auto-play iteration.
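The relative percentage change mentioned above expresses each auto-play iteration as a change relative to the seed video's value, not as a difference in percentage points. A minimal sketch of that calculation follows; the function name and the example shares are hypothetical and are not taken from the study's data.

```python
def relative_change(values):
    """Percentage change of each value relative to the first (seed) value."""
    baseline = values[0]
    if baseline == 0:
        raise ValueError("seed value must be non-zero")
    return [(v - baseline) * 100 / baseline for v in values]


# Hypothetical shares of "credible" recommendations in percent:
# the seed video first, followed by three auto-play iterations.
credible_shares = [40, 36, 30, 28]
changes = relative_change(credible_shares)
print(changes)
```

In this example, a drop from 40% to 36% is a difference of 4 percentage points but a relative change of -10%. The same distinction applies to the increase of nearly 40% reported for the mRNA seed video.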

Thus, by examining the data for each seed video, we identified an exception to the overall pattern: "mRNA-Impfstoffe: Erste Hinweise auf Langzeitfolgen (mit Clemens Arvay)" (i.e. "mRNA vaccines: First indications of long-term effects (with Clemens Arvay)"). The video on alleged long-term effects of mRNA vaccination can no longer be found on YouTube. In previous analyses of this data, DataSkop's researchers at the European New School of Digital Studies[8] and the German news magazine Der Spiegel[9] have both highlighted the odd nature of the video's recommended content, including "esoteric and conspiracy-oriented contents."

Analysis of the recommended channels associated with the seed videos reveals the two channels flagged by Der Spiegel: TV.Berlin and LitLounge.tv. These channels sometimes feature right-populist content (such as Tichys Ausblick) or clips from the field of alternative medicine and esotericism. The frequency of these channels in the recommendations is roughly 3%, leading Spiegel to conclude that certain users may encounter content that leads them in an unexpected or unwanted direction, but the probability is not very high.

Most frequent recommended channels (login)

Percentages are calculated from over 26,000 recommendations, equivalent to the top 10 recommendations for each of over 2,600 users.

However, by shifting our focus from the individual channel frequencies to the total number of videos related to alternative medicine, esotericism, and vaccine skepticism that each user is exposed to, we can discover some important insights.

From all videos recommended while watching the mRNA video, we manually compiled a list of 25 videos related to natural healing, esotericism, and vaccine skepticism. [Please have a look at the annex for the list of 25 videos and to read more about our selection process.] We then looked at how many of these videos were among each user's top 10 recommendations after watching the mRNA video. While some of these videos were only recommended once to a single user, it is important to consider the total count of videos related to natural healing, esotericism, and vaccine skepticism that each user was shown. Our analysis found that out of 3506 users, approximately 8%, or 273 users, had three or more videos from this cluster in their top 10 recommendations.

Distribution of videos related to natural healing, esotericism, and vaccine skepticism

This chart presents the user distribution based on the number of videos they were exposed to from a list of 25 videos related to natural healing, esotericism, and vaccine skepticism.

For instance, out of the total of 3506 users, the chart reveals that 2160 users did not encounter any videos from the list among their 10 Up Next recommendations. 152 users had 3 videos from the list, and the highest number of videos shown to any user was 6, observed in 6 users according to our data.

A significant number of users had three or more videos related to natural healing, esotericism, and vaccine skepticism among their top 10 recommended Up Next videos when watching the mRNA seed video. This means that at least 30% of the recommended content for these users was about this subject, increasing the likelihood of further engagement with such content.

Ultimately, out of the total 3506 users analyzed, 3233 were exposed to two or fewer videos related to natural healing, esotericism, and vaccine skepticism, while 273 users, or 8%, were exposed to three or more videos.
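The per-user view described above can be sketched as follows: count, for each user, how many recommended videos come from the manually compiled list, then look at the distribution of those counts. The function names and toy data are hypothetical; only the figures 273 and 3506 are taken from the text.

```python
from collections import Counter


def flagged_count_distribution(recs_by_user, flagged_videos):
    """Map 'number of flagged videos in a user's list' to 'number of users'."""
    return Counter(
        sum(1 for v in recs if v in flagged_videos) for recs in recs_by_user.values()
    )


def share_at_or_above(distribution, threshold):
    """Share of users with at least `threshold` flagged videos."""
    total = sum(distribution.values())
    hits = sum(n for count, n in distribution.items() if count >= threshold)
    return hits / total if total else 0.0


# Toy input with hypothetical video IDs
flagged = {"f1", "f2", "f3"}
recs_by_user = {
    "u1": ["a", "b", "c"],
    "u2": ["f1", "a", "b"],
    "u3": ["f1", "f2", "f3"],
    "u4": ["a", "c", "d"],
}
dist = flagged_count_distribution(recs_by_user, flagged)
toy_share = share_at_or_above(dist, 3)

# Cross-check of the share reported in the text: 273 of 3506 users
reported_share = 273 / 3506
print(f"toy share: {toy_share:.0%}, reported share: {reported_share:.1%}")
```

The reported share works out to about 7.8%, which the text rounds to 8%. A channel-level frequency of roughly 3% and a user-level cluster of this size can coexist because the flagged videos are spread over many channels but concentrated among few users.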

Further statistical analysis, as detailed in the annex, indicated that this distribution is likely to be meaningful and not merely a random occurrence. The analysis provides evidence that the observed differences in video exposure among users have a basis beyond chance alone.

This suggests that there is a group of users who are more likely to be shown this type of content.

These findings suggest that it is crucial to look beyond the average numbers and to examine content distribution for individual users. With this more nuanced approach, we can detect clusters of users exposed to specific types of content even when the individual channel frequencies do not appear to be high.

Now, how can these clusters be explained? Our analysis has shown that there is a correlation between being exposed to content from the specified topic and having a clean(ed) watch history, which includes scenarios where users had not watched any video prior to the donation campaign, deactivated the watch history saving function, or deleted their watch history before donating the data. On average, users with a clean(ed) watch history were recommended three videos from the above list, while users with an existing watch history were not recommended any videos. Further analysis indicated that the two groups of users were similar to each other in terms of their subscriptions and demographic characteristics (age, gender, first two digits of the postal code), and no other factors in our data were found to contribute to this difference. One possible explanation is that, for users with a clean watch history, the currently played video plays a larger role in determining the Up Next recommendations.
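A group comparison of this kind can be sketched in a few lines. The example below computes the mean number of flagged videos for users with and without a watch history; whether the study's "on average" refers to a mean or a median is not stated, so the use of the mean is an assumption, and the function name and data are hypothetical.

```python
from statistics import mean


def average_by_history(users):
    """Mean flagged-video count per group.

    `users` is a list of (has_watch_history, flagged_count) pairs.
    Returns a dict keyed by True (existing history) and False (clean history).
    """
    groups = {True: [], False: []}
    for has_history, flagged_count in users:
        groups[bool(has_history)].append(flagged_count)
    return {key: (mean(vals) if vals else None) for key, vals in groups.items()}


# Hypothetical users: (has_watch_history, number of flagged videos in top 10)
users = [(True, 0), (True, 0), (True, 1), (False, 3), (False, 4), (False, 2)]
averages = average_by_history(users)
print(averages)
```

A difference between the two group averages is only a correlation. As the text notes, the groups also need to be compared on other attributes, such as subscriptions and demographics, before watch history can be considered a plausible explanation.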

Importantly, we found no discernible differences in the suggested content for this cluster of users when it came to the news and search components of the data.

5. Conclusion

The case study on YouTube highlights the importance of a comprehensive approach to auditing digital platforms. Auditors should first identify the specific components of the platform that contribute to the risk in question. Then, when analyzing these components, they should not rely solely on simplistic metrics such as averages but should also examine how the data is distributed. To ensure a thorough audit of digital platforms, we recommend the following steps:

Define which platform components contribute to the risk in question. Auditors need to identify the specific platform components involved and understand how they and their interplay contribute to the risk. The numbers presented in the product features comparison table give us an idea of the differences across various platform components. However, users rarely engage with just a single product feature. Our analysis showed that even though the news playlist is not personalized, the Up Next recommendations that are suggested once users access one of the videos are. This step is also crucial in the risk-scenario-based approach to auditing recommender systems devised by Anna-Katharina Meßmer and Martin Degeling.[10]
Demand metrics such as distributions beyond simple averages. It is essential to look at individual components and their distribution patterns to identify potential clusters of users that may be exposed to harmful content. The analysis of news and search features underscores the importance of visibility for credible channels. While disclosing lists of authoritative sources is important, it is equally essential to ensure that these channels are visible to users. Furthermore, the analysis of Up Next recommendations shows how clusters of users exposed to specific types of content can form, even when the individual channel frequencies do not appear to be high.

Applying these findings to YouTube specifically, to promote transparency and accountability, the platform should disclose its authoritative news sources and criteria for their selection. Additionally, YouTube should provide data on how recommendations are distributed among these sources, including the percentage of recommendations and frequency compared to other channels.

Overall, the complexity of online platforms requires a nuanced approach to analyzing their risks. This includes breaking down the platform into its various components and examining each one in detail. The need for such detailed analysis and auditing becomes even more crucial as platforms continue to introduce new features, like YouTube Shorts, which may utilize a different set of algorithms and render previous research on the platform invalid. The algorithms used by other components of the platform are also constantly evolving, which can change their individual risk profiles. As a result, platform auditing is not a static process: the focus of an audit must be adjusted continuously to keep up with these changing dynamics.

Acknowledgements

We would like to thank the individuals and organizations who have contributed to the development and publication of this paper. Their assistance and support have been extremely valuable.

We are grateful to AlgorithmWatch for providing the data that formed the foundation of this analysis.

We express our gratitude to Anna-Katharina Meßmer for the crucial contributions in transforming the data analysis results into a compelling narrative.

We would like to thank the Data Science Unit of the Stiftung Neue Verantwortung (SNV) for developing ideas, providing guidance and feedback on the paper, supporting the data analysis, and reviewing the code: Pegah Maham, Laurenz Hemmen, Wiebke Denkena, Shannon Reitmeir, Lisa Koeritz. We also acknowledge the exceptional support of the wider SNV team, including Luisa Seeling, Julian Jaursch, Alina Siebert, and Sebastian Rieger, throughout the research and publication process.

The views expressed in the text do not necessarily reflect those of the experts with whom we have exchanged views or their employers, and any remaining errors are our own.

The paper belongs to the SNV programme ‘Data Science’, which is supported by Stiftung Mercator.

Editing: Luisa Seeling

Visualizations consultations: Julia Semenikhina

Design oversight: Alina Siebert

Sources and footnotes

[1] Note: In the context of this publication, the terms "audit/auditors" are used in a general manner without specifically referring to any particular type of audit mentioned in the DSA (Digital Services Act). The purpose is to highlight how data can reveal hidden problems, offering valuable insights for a wide range of readers. However, it is important to note that the analysis presented here may hold particular relevance for external platform auditors and those involved in evaluating internal risk assessments.

[2] Cristos Goodrow, ‘On YouTube’s recommendation system’, YouTube Official Blog, 15 September 2021.

[3] ‘YouTube Product features’

[4] ‘News on Youtube’

[5] Our analysis relied on a list of channels that, according to our data, seem to fulfill YouTube's criteria for being a credible source of news. This included the channels featured in the top news stories playlist, and some further public broadcasters' channels identified through the information provided in the clarification box displayed beneath videos from public broadcasters:
"AFP Deutschland", "ARD", "BR24", "Bayerischer Rundfunk", "DER SPIEGEL", "DIE DA OBEN!", "DW Deutsch", "DW Documentary", "DW News", "DerTagesspiegel", "FOCUS Online", "General-Anzeiger Bonn", "Handelsblatt", "MDR DOK", "MDR Investigativ", "MDR Mitteldeutscher Rundfunk", "MrWissen2go", "MrWissen2go Geschichte", "NDR Doku", "NDR Ratgeber", "Nordwest TV", "PULS Reportage", "Quarks", "SRF DOK", "SRF Dok", "STRG_F", "SWR", "SWR Doku", "Saarländischer Rundfunk", "TRU DOKU", "TV Westsachsen", "TV.Berlin - Der Hauptstadtsender", "Terra X", "Terra X Lesch & Co", "Terra X plus", "WDR", "WDR Doku", "WDR aktuell", "WELT", "WELT Nachrichtensender", "Weltspiegel", "WetterOnline", "Y-Kollektiv", "Y-zwei", "ZAPP - Das Medienmagazin", "ZDF", "ZDFheute Nachrichten", "ZDFinfo Dokus & Reportagen", "ZDFneo", "[W] wie Wissen", "faz", "funk", "hrfernsehen", "maiLab", "maischberger. die woche", "münchen.tv", "ntv Nachrichten", "phoenix", "quer", "rbb", "reporter", "stern", "tagesschau", "wetternet".

[6] Johannes Filter, ‘DIE WELT newspaper dominates YouTube’s featured news playlist’, DataSkop, 14 April 2022.

[7] Sebastian Meineck, ‘Sollten Journalist:innen Geldgeschenke von YouTube nehmen?’, Netzpolitik.org, 17 April 2022.

[8] Peter Kahlert, European New School of Digital Studies (European University Viadrina), DataSkop project, ‘YouTubes Wahl’, 28 February 2022.

[9] Holger Dambeck and Patrick Stotz, ‘Wie YouTube es allen recht machen will’, DER SPIEGEL, 22 September 2021.

[10] Dr. Anna-Katharina Meßmer and Dr. Martin Degeling, 'Auditing Recommender Systems' (Berlin: Stiftung Neue Verantwortung, 2023).

6. Annex

1. How was the list of 25 videos related to alternative medicine, esotericism, and vaccine skepticism compiled?

We started by identifying the channels that appeared exclusively in the Up Next recommendations for the mRNA skeptical video. These channels were not present in any other suggested content, allowing us to locate content specifically associated with vaccine skepticism. Additionally, we included the channels mentioned in the analyses conducted by AlgorithmWatch and Der Spiegel, namely "TV.Berlin - Der Hauptstadtsender" and "OE24.TV". We also incorporated channels that featured video titles containing the term "Impfzwang" (compulsory vaccination), as well as the channel "CGArvay", which originally hosted the mRNA skeptical video.

Then, we manually reviewed all recommended videos from the resulting list of channels. From this process, we handpicked the 25 videos based on our assessment. The final list appears as follows:

Top Tipps zum Heilen mit der Natur | Clemens Arvay & Wolf-Dieter Storl | LitLounge.tv
TICHYS AUSBLICK - "Inflation, die heimliche Enteignung – was kommt auf uns zu?" | TV.Berlin - Der Hauptstadtsender
Livestream und Q&A: Dr. Mark Benecke und Clemens Arvay im Gespräch mit Ulla Atzert | Bastei Lübbe
TICHYS AUSBLICK - „Was ist in diesen Zeiten noch normal?“ | TV.Berlin - Der Hauptstadtsender
Wie gesund werde ich, wenn ich in den Wald gehe? - Clemens G. Arvay | Frank Elstner Menschen
Corona: Grundrechtsverletzung durch 3-G? | OE24.TV
TICHYS AUSBLICK: Bundesnotbremse und verlängerte Notlage – außer Spesen nichts gewesen? | TV.Berlin - Der Hauptstadtsender
Talk im Hangar-7 – Impfzwang und Testpflicht: Alles nur noch Schikane? | Kurzfassung | ServusTV
#ichmachdanichtmit | Fragt Zahnarzt Klottka!
Hans-Georg Maassen trifft Wolfgang Bosbach - "Wie geht es weiter mit Deutschland" | TV.Berlin - Der Hauptstadtsender
Clemens Arvay - Corona-Impfstoffe: Rettung oder Risiko? - Wirkungsweisen, Schutz und Nebenwirkungen | lismio - Deine Hörbücher und Hörspiele
"Beeindruckend, dass Sie noch im Amt sind": Abgeordneter Jan Korte nimmt sich Jens Spahn zur Brust | RT DE
TICHYS AUSBLICK - Corona für immer? Impfzwang, Dauerlockdown und die Arroganz der Macht | TV.Berlin - Der Hauptstadtsender
Epidemiologe Dr. Friedrich Pürner: "Klärt die Leute gut auf, die ihr impft" | RT DE
tv.berlin Spezial - Hans - Georg Maaßen im TV Berlin Interview | TV.Berlin - Der Hauptstadtsender
Talk im Hangar-7 – Diktatur der Denkverbote: Streiten verboten? | Kurzfassung | ServusTV
TICHYS AUSBLICK - "Wie bankrott ist Deutschland?" | TV.Berlin - Der Hauptstadtsender
Der neue Querdenker-Film der ARD ist eine Warnung vor dem Denken | RT DE
Michael Brunner über Aufstand gegen Impfpflicht | OE24.TV
Aufspaltung der USA: Viele Amerikaner sind dafür | RT DE
Brinkmann & König - Das erwartet der Verband der Mittelständler von der neuen Bundesregierung | TV.Berlin - Der Hauptstadtsender
Gerald Grosz über Ergebnisse der Corona-Task-Force | OE24.TV
Nina Proll ruft zum Aufstand gegen Impfpflicht | OE24.TV
Aiwanger zum "Impfzwang": Dürfen uns nicht von den Lauterbachs in die Enge treiben lassen | RT DE
Corona-Impfstoffe: Der „Papa-Talk“ zur Kinderimpfung (Sönnichsen, Schubert, Braun, Arvay) | CGArvay

2. Is it statistically significant that, out of a total of 3506 users, 273 were recommended three or more videos from the given list of 25 videos within their set of ten Up Next suggestions?

We employed two approaches to determine the statistical significance.

In the first approach, we established a control group consisting of 25 videos. These videos were chosen because they belong to a single topic and have overall frequencies comparable to those of the 25 videos listed above. We used climate as the control topic: it was the second most frequently recommended topic, after natural healing, vaccine skepticism, etc., in the Up Next recommendations for the mRNA skeptical video.

The chart below displays the distribution of individual video frequencies within the subset of the 150 most frequently recommended videos for the mRNA skeptical video. Videos belonging to the topics of natural healing, vaccine skepticism, and esotericism are highlighted in orange, while videos from the climate-related control group are shown in dark red. The control videos contain the tag "klima" (climate) or similar terms like "wetter" (weather) in their titles.

The total sum of frequencies for the 25 control videos is 2155, compared to 2379 for the 25 videos in question.

Only 68 of the 3506 users (approximately 2%) were recommended three or more of the 25 control videos within their set of ten recommendations. This is far fewer than the 273 users (approximately 8%) who were recommended three or more of the videos in question. However, one potential confounding factor is that the individual frequencies of the control videos are slightly lower.
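The counting step behind these figures can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name `users_with_min_hits`, the data layout (a dictionary mapping each user to their list of suggestions) and the toy data are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


def users_with_min_hits(
    recommendations: Dict[str, List[str]],
    video_set: Iterable[str],
    min_hits: int = 3,
) -> int:
    """Count users whose suggestions contain at least `min_hits` videos from `video_set`."""
    targets: Set[str] = set(video_set)
    count = 0
    for recs in recommendations.values():
        # Number of distinct listed videos among this user's suggestions
        hits = len(targets.intersection(recs))
        if hits >= min_hits:
            count += 1
    return count


# Toy data (hypothetical): three users with five suggestions each
toy_recs = {
    "user_a": ["v1", "v2", "v3", "x1", "x2"],
    "user_b": ["v1", "x1", "x2", "x3", "x4"],
    "user_c": ["v2", "v3", "v4", "v5", "x1"],
}
videos_in_question = ["v1", "v2", "v3", "v4", "v5"]

n_exposed = users_with_min_hits(toy_recs, videos_in_question)

# Shares implied by the counts reported in the text
share_control = 68 / 3506    # users with 3+ control videos
share_question = 273 / 3506  # users with 3+ videos in question
print(n_exposed, f"{share_control:.1%}", f"{share_question:.1%}")
```

Applied to the real data, the same function would be called once with the 25 videos in question and once with the 25 control videos.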

To address this, we employed a second approach to test the statistical significance of our findings.

This time, we ensured that the individual control video frequencies were always equal to or greater than those of the videos in question. We randomly selected 25 videos from the set of all videos recommended for the mRNA skeptical video, ensuring that each video from the natural healing, vaccine skepticism, etc. group was matched with another video whose overall frequency was equal to or greater than its own. The chart below provides an example of such a random selection, with the videos in question highlighted in red and the control videos in green. These control videos are not necessarily topically related to each other.

We repeated the process of selecting 25 random control videos with the specified condition 10,000 times, resulting in 10,000 different samples of control videos. The first video in these samples always remained the same, as it was the most frequently recommended video in the subset and the only video that satisfied the condition of having a frequency not lower than that of the most frequently recommended video in question.

For each of the 10,000 samples, we calculated the number of users exposed to three or more videos from the sample, resulting in the distribution displayed below.
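The resampling procedure described above could look roughly like the following Python sketch. The exact matching procedure is not spelled out in the text, so several details are assumptions: targets are processed from most to least frequent, each control is drawn uniformly from the eligible videos, controls are drawn without replacement, and the videos in question are excluded as controls. All function names and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Set


def draw_matched_controls(
    freq: Dict[str, int], targets: List[str], rng: random.Random
) -> List[str]:
    """Draw one control per target video, with frequency >= the target's frequency."""
    target_set: Set[str] = set(targets)
    used: Set[str] = set()
    chosen: List[str] = []
    # Most frequent targets first: they have the fewest eligible candidates
    for t in sorted(targets, key=lambda v: freq[v], reverse=True):
        candidates = sorted(
            v for v, f in freq.items()
            if f >= freq[t] and v not in target_set and v not in used
        )
        if not candidates:
            raise ValueError(f"no eligible control video for {t!r}")
        pick = rng.choice(candidates)
        chosen.append(pick)
        used.add(pick)
    return chosen


def count_exposed(recs: Dict[str, List[str]], videos: List[str], min_hits: int = 3) -> int:
    """Number of users with at least `min_hits` of `videos` among their suggestions."""
    wanted = set(videos)
    return sum(1 for r in recs.values() if len(wanted.intersection(r)) >= min_hits)


def null_distribution(
    recs: Dict[str, List[str]], freq: Dict[str, int], targets: List[str],
    n_samples: int, seed: int = 0,
) -> List[int]:
    """Exposed-user counts for `n_samples` random matched control samples."""
    rng = random.Random(seed)
    return [
        count_exposed(recs, draw_matched_controls(freq, targets, rng))
        for _ in range(n_samples)
    ]


# Toy data (hypothetical): overall frequencies and per-user suggestions
toy_freq = {"top": 100, "a": 50, "b": 40, "c": 30, "t1": 60, "t2": 35, "t3": 20}
toy_targets = ["t1", "t2", "t3"]
toy_recs = {
    "u1": ["top", "a", "b", "t1"],
    "u2": ["top", "a", "c", "t2"],
    "u3": ["t1", "t2", "t3", "b"],
    "u4": ["top", "b", "c", "a"],
}

controls = draw_matched_controls(toy_freq, toy_targets, random.Random(1))
null_counts = null_distribution(toy_recs, toy_freq, toy_targets, n_samples=200, seed=42)
```

In the toy data, only "top" is at least as frequent as the most frequent target, so it is always the first control. This mirrors the observation above that the first video stayed the same in every sample.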

The X-axis represents the number of users exposed to three or more videos from the 25 videos (control videos in dark red or videos in question in orange), and the Y-axis represents the number of samples in which this specific number of users was observed out of the 10,000 tested samples. The distribution indicates that in the majority of control samples, between 110 and 160 users were exposed to three or more videos with overall frequencies equal to or greater than those in question. Furthermore, no control sample had a number of such users exceeding 250.

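Since the observed value of 273 users lies above every one of the 10,000 control samples, we believe these results demonstrate that the number of users who received three or more videos related to the specified topic is statistically significant.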
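One common way to turn such a comparison into a single number is an empirical (Monte Carlo) p-value. The sketch below uses the widely used "+1" correction. It is offered as an illustration and is not a calculation reported in this paper; the function name and the toy counts are hypothetical.

```python
from typing import Sequence


def empirical_p_value(null_counts: Sequence[int], observed: int) -> float:
    """One-sided Monte Carlo p-value with the +1 correction."""
    # Control samples that are at least as extreme as the observed value
    at_least = sum(1 for c in null_counts if c >= observed)
    return (at_least + 1) / (len(null_counts) + 1)


# Hypothetical control counts; the real ones would come from the 10,000 samples
toy_null = [120, 135, 150, 110, 160, 142, 128, 155, 131, 147]
p_toy = empirical_p_value(toy_null, observed=273)

# If none of 10,000 control samples reaches the observed value,
# the estimate is 1 / 10,001, roughly 0.0001
p_bound = empirical_p_value([0] * 10_000, observed=273)
```

Under this convention, the annex result (no control sample above 250, observed value 273) would correspond to an empirical p-value of about 0.0001.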