Towards Understanding and Defending Against Algorithmically Curated Misinformation

Prerna Juneja

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Washington
2023

Reading Committee:
Tanushree Mitra, Chair
Chirag Shah
Bill Howe

Program Authorized to Offer Degree: Information School

© Copyright 2023 Prerna Juneja

University of Washington

Abstract

Towards Understanding and Defending Against Algorithmically Curated Misinformation

Prerna Juneja

Chair of the Supervisory Committee: Tanushree Mitra, Department of Information Science

Search engines and online social media platforms have become important sources of information for users worldwide. Despite their popularity and ubiquity, online platforms are not always trustworthy sources of information. The platforms are driven by black-box algorithms that optimize for engagement over the credibility of information. There are increasing concerns that online platforms amplify inaccurate information, making it easily accessible via search results and recommendations. In this thesis, I explore the role of online algorithms in promoting misinformation and design defenses against online misinformation by incorporating human-centered insights from stakeholders such as fact-checking organizations and news agencies. My research recognizes the multifaceted nature of online misinformation and explores the algorithmic, policy, fact-checking, and design aspects of the problem through three distinct research threads.

In the first thread of my research, I investigate and audit online platforms such as YouTube and Amazon to understand the role of the algorithms driving these platforms in surfacing and amplifying misinformative content to users. Through the audits, I found that performing certain real-world actions on misinformative content (e.g., watching a conspiratorial video on YouTube, or adding a misinformative book to the cart on Amazon) could lead users into problematic echo chambers of misinformation. Additionally, I identified vulnerable user populations who could be targets for specific misinformative topics on online platforms.

In the second research thread, I explore ways to support the fact-checking process to combat online misinformation. For this work, I interviewed 14 fact-checking organizations and news agencies across four continents to understand their current fact-checking processes, challenges, and needs. This research establishes the fact-checking process as a socio-technical phenomenon, revealing the collaborative efforts of various stakeholder groups and technological infrastructure in facilitating effective fact-checking endeavors. It also highlights the technical, policy, and informational barriers to fact-checking and emphasizes the need for systematic changes in civic, informational, and technological contexts to improve the overall quality of fact-checking.
In the final thread of my dissertation research, I collaborated with Pesacheck, Africa's largest indigenous fact-checking organization, to design and develop YouCred—a fact-checking system that enables monitoring of algorithmically driven online platforms for misinformation. To create YouCred, I incorporated insights from the previous research threads as well as the expertise and feedback of Pesacheck's fact-checkers throughout the design and development stages. YouCred specifically facilitates misinformation discovery and credibility assessments on the YouTube platform. It automatically generates search queries related to important events and topics of interest to fact-checkers and also offers an intuitive interface for annotating videos for misinformation. Through a nine-month evaluation period at Pesacheck, YouCred demonstrated its practical value and usefulness for fact-checkers, underscoring the importance of ongoing collaboration between fact-checking organizations and technology developers in combating online misinformation.

In conclusion, this thesis adopts a socio-technical approach to understanding and defending against algorithmically curated online misinformation. It also paves the way for future research in designing interventions to counter algorithmic harm and developing socio-technical systems to address the problem of online misinformation.

"There were pages turned with the bridges burned
Everything you lose is a step you take
So make the friendship bracelets
Take the moment and taste it
You've got no reason to be afraid"
— Taylor Swift

DEDICATION

The dedication of this thesis is split three ways: to those who have shown me kindness and provided unwavering support, to TS's songs that have been my constant companion through the good and bad times, and to you, dear reader, if you find value in any aspect of my work.

ACKNOWLEDGMENTS

I find myself humbled and deeply grateful as I reflect upon the journey that has brought me to this moment. Until now, I never fully appreciated the courage it took for me to leave behind a good job, my home, and my country to come to the U.S. for a Ph.D. Spending five years on a degree is a long time, and a lot of life happens. And truth be told, it hasn't been easy. Like every significant decision I've made, choosing this path made me gain some things and lose some. In the last five years, I survived several personal and professional hardships. Nevertheless, I am truly grateful that the journey is coming to an end on a positive note. I want all my students, friends, mentors, and family to know that your kindness, encouragement, and support are what made this possible.

I first want to highlight the most rewarding aspect of this journey—mentoring students. Most days, I eagerly looked forward to meeting with my students, and our interactions were the highlights of my day. I want to thank David Xie, Louis Leng, Hayoung Jung, Vincent Zhiyuan Zhou, Stephanie L. Zhang, Alice Zhang, Ankita Khera, Benjamin Ye, Lee Polla, and Ethan Yee with all my heart. It has been an absolute honor to know you and to work with you. Your enthusiasm, energy, and creativity breathed life into the projects we collaborated on.

My lab mates made this journey a lot more pleasant and memorable. Shruti, thanks to you, I always had a second home in Blacksburg and Seattle. Momen, it was a pleasure to collaborate with you. Brian, thanks for your willingness to help whenever needed. Kristen, I deeply admire your strength, clarity of mind, and independent spirit. Neelesh and Saloni, your presence infused our lab with much-needed life, energy, and colors. The last year was a lot of fun because of you two.

To my friends, I am always grateful for your encouragement. Parul, Shruti Bansal, Arka, and Harry, you all are my biggest cheerleaders. I am also grateful to all my therapists. Seeking therapy has emerged as one of life's truest blessings and a very important part of my personal and professional journey.

I also want to thank several researchers who have made a lasting impact on my academic journey. First, I want to acknowledge Dr.
Mitra who introduced me to an incredible field of research that feels like home, where I truly belong. Your discipline and work ethic have been inspiring. Bringing me to Seattle was like offering me a lifeline, and I am sincerely grateful to you for that. I also want to extend my heartfelt gratitude to Dr. Francisco Servant and Dr. Megan Finn, whose teaching styles have been a deep source of inspiration for me. Additionally, I am grateful to my committee members for their guidance and support. I would also like to express my gratitude to Dr. Isabel Zhang and Dr. Alison Renner for their mentorship, as well as Dr. Eni Mustafaraj, and Dr. Kokil Jaidka, for their support, and words of encouragement over the last couple of years. I also want to acknowledge the huge role art has played these last five years. Especially, you TS. It feels like all these years, you’ve been filling pages, writing all these songs, narrating all these stories that deeply intertwine with my own life. With your songs, I have smiled, hoped, danced, and experienced a rainbow of beautiful emotions. I must also extend my thanks to the captivating TV shows that provided a perfect escape during the last five years. Schitt’s Creek, Atypical, Extraordinary Attorney Woo, Marvelous Mrs. Maisel, and Anne With An E felt like a warm hug. They allowed me to live vicariously through their characters, evoking laughter, and tears, and giving me a lot of amazing moments during my most trying times. In the end, I want to say that I’m proud of myself. And while my grad school journey is ending, a personal journey of self-acceptance and self-compassion is just beginning. iv TABLE OF CONTENTS Dedication ii Acknowledgments iii List of Figures xi List of Tables xxi 1 Introduction 1 1.1 Study Context: Misinformation . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Research Arcs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.1 Auditing online platforms to measure the prevalence of algorith- mically curated misinformation . . . . . . . . . . . . . . . . . . . . 4 1.2.2 Identifying ways to support fact-checking online misinformation 6 1.2.3 Defending against online misinformation via system design . . 6 1.3 Contributions and impact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2 Related Work 9 2.1 Auditing Online Platforms for Misinformation . . . . . . . . . . . . . . . 9 2.1.1 Misinformation in algorithmic platforms . . . . . . . . . . . . . . . 9 2.1.2 Search engine audits . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.1.3 Methodological Challenges in Audit Investigations . . . . . . . . 11 2.2 Fact-checking online-misinformation . . . . . . . . . . . . . . . . . . . . . 13 2.2.1 Fact-checking: Definition, Origin, and Evolution . . . . . . . . . . 13 2.2.2 Invisible Work of Fact-checking . . . . . . . . . . . . . . . . . . . . 14 2.2.3 Current Landscape of Research in Fact-checking . . . . . . . . . 15 2.3 Designing for Mitigating Online Misinformation . . . . . . . . . . . . . . 15 3 Auditing YouTube for perennial and demonstrably false conspiracy theories 18 v TABLE OF CONTENTS 3.1 Research Questions and Hypotheses . . . . . . . . . . . . . . . . . . . . . . 20 3.1.1 Five Misinformative Topics: Demonstrably False and Perennial 22 3.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2.1 Compiling High Impact Topics and Queries . . . . . . . . . . . . . 24 3.2.2 Overview of Audit Experiments . . . . . . . . . . . . . . . . . . . . 
27 3.2.3 Annotating the Data Collection . . . . . . . . . . . . . . . . . . . . . 32 3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.3.1 RQ1: Effect of demographics and geolocation . . . . . . . . . . . 36 3.3.2 RQ2: Effect of watch history . . . . . . . . . . . . . . . . . . . . . . . 38 3.3.3 RQ3: Across topic differences . . . . . . . . . . . . . . . . . . . . . 39 3.3.4 Analyzing Video Length and Popularity . . . . . . . . . . . . . . . 41 3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.4.1 Effect of demographics and geolocation on misinformation . . . 42 3.4.2 Effect of watch history on misinformation . . . . . . . . . . . . . . 43 3.4.3 Tackling search engine enabled misinformation . . . . . . . . . . 44 3.5 Limitation and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4 Auditing YouTube for election misinformation 47 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4.2.1 Developing search queries to measure election fraud based mis- information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4.2.2 Determining popular seed videos to collect up-next video trails 53 4.2.3 Experimental design . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.2.4 Screening and study survey . . . . . . . . . . . . . . . . . . . . . . . 57 4.2.5 Recruitment and study deployment . . . . . . . . . . . . . . . . . . 58 4.2.6 Developing data annotation scheme . . . . . . . . . . . . . . . . . 58 4.2.7 Classifying YouTube videos for election misinformation . . . . . 61 4.2.8 Annotating YouTube channels for partisan bias . . . . . . . . . . . 62 4.3 Ethical considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.4 RQ1 Results: Extent of Personalization . . . . . . . . . . . . . . . . . . . . 64 4.4.1 RQ1a: Personalization in search results . . . . . . . . . . . . . . . . 65 4.4.2 RQ1b: Personalization in up-next trails . . . . . . . . . . . . . . . . 66 4.5 RQ2 Results: Amount of Misinformation . . . . . . . . . . . . . . . . . . . 68 vi TABLE OF CONTENTS 4.5.1 RQ2a: Misinformation in search results . . . . . . . . . . . . . . . . 68 4.5.2 RQ2b: Misinformation in up-next trails . . . . . . . . . . . . . . . . 71 4.5.3 RQ2c: Misinformation in homepages . . . . . . . . . . . . . . . . . 74 4.6 RQ3: Composition and Diversity . . . . . . . . . . . . . . . . . . . . . . . . 74 4.6.1 RQ3a: Diversity in search results . . . . . . . . . . . . . . . . . . . . 75 4.6.2 RQ3b: Diversity in up-next trails . . . . . . . . . . . . . . . . . . . . 77 4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 4.7.1 Standardization of search results . . . . . . . . . . . . . . . . . . . . 79 4.7.2 Scope for improvement in up-next trail recommendations . . . . 80 4.7.3 Participants’ beliefs vs algorithmic reality . . . . . . . . . . . . . . 82 4.8 Limitations and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 5 Auditing e-commerce platforms for health misinformation 85 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
85 5.1.1 Research Questions and Findings . . . . . . . . . . . . . . . . . . . 87 5.1.2 Contributions and Implications . . . . . . . . . . . . . . . . . . . . 88 5.1.3 Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 89 5.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 5.2.1 Health misinformation in online algorithmic systems . . . . . . . 90 5.2.2 Search engine audits . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 5.3 Amazon components and terminology . . . . . . . . . . . . . . . . . . . . 92 5.4 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 5.4.1 Compiling high impact vaccine-related topics and search queries 95 5.4.2 RQ1: Unpersonalized Audit . . . . . . . . . . . . . . . . . . . . . . . 98 5.4.3 RQ2: Personalized Audit . . . . . . . . . . . . . . . . . . . . . . . . . 100 5.4.4 Annotating Amazon data for health misinformation . . . . . . . 108 5.4.5 Quantifying misinformation bias in SERPs: . . . . . . . . . . . . . 112 5.5 RQ1 Results [Unpersonalized audit]: Quantify misinformation bias . . 114 5.5.1 RQ1a: Search results . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.5.2 RQ1b: Product page recommendations . . . . . . . . . . . . . . . . 120 5.6 RQ2 Results [Personalized audit]: Effect of personalization . . . . . . . . 124 5.6.1 RQ2a: Search Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 5.6.2 RQ2b: Recommendations . . . . . . . . . . . . . . . . . . . . . . . . 125 5.6.3 RQ2c: Auto-complete suggestions . . . . . . . . . . . . . . . . . . . 128 vii TABLE OF CONTENTS 5.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.7.1 Amazon: a marketplace of multifaceted health misinformation . 129 5.7.2 Amazon search results: a stockpile of health misinformation . . 130 5.7.3 Amazon recommendations: problematic echo chambers . . . . . 130 5.7.4 Combating health misinformation . . . . . . . . . . . . . . . . . . . 131 5.8 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 5.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 6 Identifying ways to support fact-checking online misinformation 135 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 6.1.1 Research context: Human and Technological Infrastructures . . 138 6.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 6.2.1 Participant Sampling Technique . . . . . . . . . . . . . . . . . . . . 139 6.2.2 Interview Protocol and Data Analysis . . . . . . . . . . . . . . . . 141 6.3 Types of Fact-checking: Short-term Claims and Long-term Advocacy . 142 6.3.1 Short-term Claims Centric Fact-checking . . . . . . . . . . . . . . . 142 6.3.2 Long-term Advocacy Centric Fact-checking . . . . . . . . . . . . . 144 6.4 Infrastructures Supporting Short-term Claims Centric Fact-checking . 145 6.4.1 News Desk Editors—Approving Claims and Guiding Fact-checkers145 6.4.2 Copy Editors—Ensuring Quality of the Fact-checks . . . . . . . . 146 6.4.3 External Fact-checkers—Monitoring, Investigating and Publish- ing Fact-checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 6.4.4 In-house Fact-checkers—Gathering Sources and Verifying Claims151 6.4.5 Social Media Managers—Disseminating Fact-checks, Increasing Engagement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
154 6.5 Infrastructures Supporting Long-term Advocacy Centric Fact-checking 156 6.5.1 Investigators and Researchers—Conducting In-depth Research and Investigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 6.5.2 Advocators—Influencing Policy, Building Coalitions, Conduct- ing Educational Workshops and Literacy Campaigns . . . . . . . 159 6.6 Needs and Challenges of Stakeholder Groups . . . . . . . . . . . . . . . . 162 6.6.1 Skepticism Towards AI and Automation . . . . . . . . . . . . . . . 162 6.6.2 Need For Tools and Limiting Social Media Affordances . . . . . 164 6.6.3 Issues around policy and information infrastructure . . . . . . . 167 6.6.4 Emotional cost of fact-checking . . . . . . . . . . . . . . . . . . . . . 169 viii TABLE OF CONTENTS 6.6.5 Rendering visibility to the human infrastructure of fact-checking 170 6.6.6 Collaborative efforts in the fact-checking process . . . . . . . . . 171 6.6.7 Implications for future research on fact-checking . . . . . . . . . . 173 6.7 Conclusions and Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 7 Defending against online misinformation via system design 178 7.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178 7.2 Formative study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 7.2.1 Participants and Procedures . . . . . . . . . . . . . . . . . . . . . . . 181 7.2.2 Interview protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 7.2.3 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 7.2.4 Design goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 7.2.5 Design process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 7.3 Overview of YouCred . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 7.4 Misinformation discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 7.4.1 Inputting seed videos to YouCred . . . . . . . . . . . . . . . . . . . 186 7.4.2 Formation of search queries . . . . . . . . . . . . . . . . . . . . . . . 188 7.4.3 Viewing and filtering search results . . . . . . . . . . . . . . . . . . 193 7.5 Credibility Assessments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 7.5.1 Video annotation database . . . . . . . . . . . . . . . . . . . . . . . . 200 7.5.2 Video annotation page . . . . . . . . . . . . . . . . . . . . . . . . . . 201 7.5.3 Claims database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202 7.6 Evaluate stakeholders’ acceptance . . . . . . . . . . . . . . . . . . . . . . . 202 7.6.1 Patterns of Usage over Time . . . . . . . . . . . . . . . . . . . . . . . 204 7.6.2 Semi-structured interviews . . . . . . . . . . . . . . . . . . . . . . . 205 7.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 7.7.1 Design Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 7.7.2 Maintainability of socio-technical systems . . . . . . . . . . . . . . 210 7.8 Limitations and Opportunities . . . . . . . . . . . . . . . . . . . . . . . . . . 211 7.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 8 Future Work and Conclusion 214 8.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 8.1.1 Exploring New Horizons in Algorithmic Audit research . . . . . 216 8.1.2 Designing for algorithmic literacy and awareness. . . . . . . . . . 
217 8.1.3 Designing for algorithmic recourse. . . . . . . . . . . . . . . . . . . 218 ix TABLE OF CONTENTS 8.1.4 Studying misinformation, fact-checking, and algorithmic impact beyond the US. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 Bibliography 220 x LIST OF FIGURES FIGURE Page 3.1 (a) Google Trends allows users to specify search query as either a topic search or a term search. (b) Interest over time graph. (c) Popularity of chemtrail conspiracy theory topic in YouTube searches in the United States between January 1st, 2016 and December 31st, 2018. Color intensity in the heatmap is proportional to the topic’s popularity in that region. . . . . . . 24 3.2 (a) YouTube search’s auto-complete suggests 10 trending queries. (b) Google Trends displays the top search queries related to the term or topic entered in the search box. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.3 Three components collected from YouTube: (a) search results from a SERP and (b) Up-Next and Top 5 recommended videos from a video page . . . . . 28 3.4 Steps performed in Search experiments 1 and 2. . . . . . . . . . . . . . . . . . 29 3.5 Steps performed in Watch experiments 3 & 4. These experiments have two phases: (1) watch phase (denoted by →), (2) search phase (denoted by →). 30 3.6 RQ3: Percentages of video stances for each topic. . . . . . . . . . . . . . . . . 40 3.7 Box plots of (a) video length in seconds and (b) video popularity (pm) for each stance under each topic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.1 Figure illustrating the method to curate search queries for audit experiment 51 4.2 List of video tags associated with YouTube video titled Is Voter Fraud Real? (video id: RkLuXvIxFew) that promotes voter fraud misinformation. Video tags are added by content creators while uploading YouTube videos on the platform. The tags can be extracted from videos via YouTube APIs or third-party tools. I use tags associated with videos shared by users promoting voter fraud claims on Twitter as search queries in the audit experiments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.3 Figure illustrating the method to curate seed videos for the audit experiment 53 xi LIST OF FIGURES 4.4 Figure (a) presents an overview of the crowd-sourced audit of YouTube for election misinformation, Figures (b) and (c) show how the extension Tube- Capture collected YouTube components from both standard and incognito windows simultaneously. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.5 Figure illustrating the process of obtaining YouTube video annotations from AMT workers. The workers were screened via a qualification test where they were first trained by providing detailed descriptions of the annotation labels. To test their understanding, they were asked to annotate three YouTube videos whose labels were known in advance. Workers who correctly labeled the three videos proceeded to work on the annotation task. To ensure that the description of the annotation labels and task was clear and comprehensive, I posted on r/mturk—a subreddit community of AMT workers and AMT workers’ unofficial slack channel. I released the qualification test and annotation task after receiving positive feedback from the AMT community. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
60 4.6 RQ1a results: Figure (a) shows participants’ response to survey question: “How much, if at all, do you think YouTube personalizes search results”. Figures (b) and (c) show personalization calculated via jaccard index values and RBO metric values respectively in YouTube’s standard-incognito SERP pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.7 RQ1b results: Figure (a) shows participants’ response to survey question: “How much, if at all, do you think YouTube personalizes up-next recommen- dations”. Figure (b) shows the distribution of the percentage of YouTube videos recommended to the study participants from their subscribed chan- nels. Figures (c) and (d) show personalization calculated via jaccard index values and DL distance metric values respectively in YouTube’s standard- incognito up-next trails pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.8 RQ2: Figure showing participants’ response to survey question: “How much do you trust the credibility of information present in the ” a) search results and b) up-next videos recommended by YouTube. . . . . . . . . . . . 69 4.9 RQ2a results: Mean misinformation bias scores for 88 search queries for all participants. A negative score indicates that SERPs contain more videos opposing election misinformation. . . . . . . . . . . . . . . . . . . . . . . . . . 69 xii LIST OF FIGURES 4.10 RQ2a results: a) Search queries with highest (labeled in red) and lowest (labeled in blue) mean misinformation bias scores. Positive misinformation bias scores indicate a lean towards misinformation where as negative bias scores indicate a lean towards information that opposes misinformation. b) Figure showing the distribution of misinformation bias scores of search queries for democrats, republicans and independents. Note that the bias scores for the participants belonging to the different political leanings coincide indicating that misinformation bias in SERPs remain constant throughout for each participant. . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.11 RQ2b results: Mean misinformation scores of standard up-next trails with seed videos that are supporting (S), neutral (N), or opposing election misin- formation (O) for Democrats, Independents, and Republicans. A positive misinformation score indicates a lean toward misinformative content while a negative score indicates a lean toward content that opposes election mis- information. Statistical tests reveal a significant difference in the amount of misinformation contained in up-next trails. I find that democrats, repub- licans, and independents find more misinformation in supporting trails compared to neutral trails, and more misinformation in neutral trails as compared to opposing trails. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.12 RQ2b results: Mean percentage of various transitions present in the stan- dard up-next trails of democrats, independents and republicans. S repre- sents a video supporting election misinformation, N represents a neutral video and O represents a video opposing election misinformation. Transi- tion S->S denotes that a YouTube video supporting election election misin- formation leads to an up-next video recommendation supporting election misinformation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
73 4.13 RQ2c results: Figure showing the average change in the amount of bias present in homepages because of watching a trail of up-next videos starting with supporting, opposing, and neutral seeds for democrats, republicans, and independents. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 xiii LIST OF FIGURES 4.14 RQ3 results: a) Figure showing Top-10 YouTube channels with impressions in most number of search queries for all study participants. For example, on an average CNN appears in 61.86% of search queries for all study participants. b) Figure showing average number of impressions for Top- 10 YouTube channels that appear in most number of standard up-trails collected for users. For example, on an average, videos from Fox News channel appear 3.27 times in those up-next trails where videos from the channel are observed. is a left-leaning channel, is right-leaning and is center-leaning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.15 RQ3a results: Distribution of Gini coefficients for all search queries (n=88) for a) Democrats, b) Republicans and c) Independents, calculated based on distribution of impressions of YouTube channels appearing in the search results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.16 RQ3b results: Figure showing the top YouTube channels appearing in supporting, neutral, and opposing trails of democrats, republicans, and independents and the percentage of users in whose trails these channels appear. is a left-leaning channel, is right-leaning and is center-leaning. 78 5.1 (a) Amazon homepage recommendations. (b) Pre-purchase recommenda- tions displayed to users after adding a product to cart. (c) Product page recommendations. (d) Table showing 15 recommendation types spread across 3 recommendation pages. . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 5.2 (a) Google Trend’s Related Topics for topic vaccine. People who searched for vaccine topic also searched for these topics. (b) Google Trend’s Related queries for topic vaccine. These are the top search queries searched by people related to vaccine topic. (c) Amazon’s auto-complete suggestions displaying popular and trending search queries. . . . . . . . . . . . . . . . . 95 xiv LIST OF FIGURES 5.3 Figure illustrating the breadth-wise topic discovery approach used to collect vaccine-related topics from Google Trends starting from two seed topics: vaccine and vaccine controversies. Each node in the tree denotes a vaccine- related topic. An edge A→ B indicates that topic B was discovered from the Trends’ Related Topic list of topic A. For example, topics “vaccination” and “andrew wakefield” were obtained from the Trend’s Related Topic list of “vaccine controversies” topic. Then, topic “mmr vaccine and autism” was obtained from topic “andrew wakefield” and so on. indicates the topic was discarded during filtering. Similar colored square brackets indicate similar topics that were merged together. . . . . . . . . . . . . . . . . . . . . . 97 5.4 Eight steps performed in Unpersonalized audit. The steps are described in detail in Section 5.4.2.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 5.5 Steps performed by treatment and control accounts in Personalized audit corresponding to the 6 different features. . . . . . . . . . . . . . . . . . . . . . 104 5.6 Qualitative Coding Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
108 5.7 RQ1a: (a) Number (percentage) of search results belonging to each anno- tation value. While majority of products have a neutral stance (40.81%), products promoting health misinformation (10.47%) are greater than prod- ucts debunking health misinformation (8.99%). (b) Number (percentage) of recommendations belonging to each annotation value. A high percentage of product recommendations promote misinformation (12.95%) while per- centage of recommendations debunking health misinformation is very low (1.99%). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5.8 RQ1a: Figure showing categories of promoting, neutral and debunking Amazon products (search results). All categories occurring less than 5% were combined and are presented as other category. Note that misinforma- tion exists in various forms on Amazon. Products promoting health mis- information include books (Books, Kindle eBooks, Audible Audiobooks), apparel (Amazon Fashion) and dietary supplements (Health & Personal Care). Additionally, proportion of books promoting health misinformation is much greater than proportion of books debunking misinformation. . . . 115 xv LIST OF FIGURES 5.9 RQ1a: Input, rank and output bias for all 10 vaccine-related topics across five search filters. The bias scores are average of scores obtained for each of the 15 days. Input and rank bias is positive (>0) in the search results of majority of topics for filters “featured” and “average customer review”. A bias value greater than 0 indicates a lean towards misinformation. Topics “andrew wakefield” and “mmr vaccine & autism” have a positive input bias across all five filters indicating that search results of these topics contain large number of products promoting health misinformation irrespective of the filter used to sort the search results. Topic “vaccination” has the highest overall bias (output bias) of 0.63 followed by topic “andrew wakefield” that has output bias of 0.53. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 5.10 Input, rank and output bias for all filter types. . . . . . . . . . . . . . . . . . . 117 5.11 Top 20 search query-filter combinations with highest output bias. In other words, these query-filter combinations are the most problematic ones con- taining highest amount of misinformation. . . . . . . . . . . . . . . . . . . . . 118 5.12 Recommendation graphs for 5 different types of recommendations collected from the product pages of top three search-results obtained in response to 48 search queries, sorted by 5 filters over a duration of 15 days during Unpersonalized audit run. denotes products annotated as misinformative, as neutral and as debunking. Node size is proportional to the times the product was recommended in that recommendation type. Large sized red nodes coupled with several interconnections between red nodes indicate a strong filter-bubble effect where recommendations of misinformative products returned more misinformation. . . . . . . . . . . . . . . . . . . . . . 119 5.13 Investigating the presence and amount of personalization due to “following contributors” action by calculating (a) Jaccard index and (b) kendall’s tao metric between search results of treatment and control. M, N and D indicate results for accounts that follow contributors of misinformative, neutral and debunking products respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . 
125 xvi LIST OF FIGURES 5.14 (a) Input bias in homepages of accounts performing actions ‘add to cart”, “search + click” and “mark top rated all positive review” for seven days of experiment run. (b) Input bias in pre-purchase recommendations of accounts for 7 days experiment run. These recommendations are only collected for accounts adding products to their carts. (c) Input bias in product pages of accounts performing actions “add to cart”, “search + click” and “mark top rated all positive review” for 7 days of experiment run. M, N and D indicate that the accounts performed actions on misinformative, neutral and debunking products respectively. . . . . . . . . . . . . . . . . . . 126 6.1 Figure presenting the ecosystem of fact-checking, the whole or part of which could exist in a fact-checking organization or a news publication house. indicates the two types of fact-checking (short-term claims centric and long-term advocacy centric fact-checking) introduced in the study, presents the stakeholder groups involved in the fact-checking process (hu- man infrastructure), shows work done by the stakeholder groups as part of their role, and specifies the tools stakeholders use to mediate their roles (technological infrastructure). The numbers indicate the sequence in which various roles are performed. . . . . . . . . . . . . . . . . . . . . . . . . 143 6.2 (a) A short YouTube video explaining a fact-check using comic like visuals (b) An Instagram post containing a fact-check (c) A “postcard” containing fact-check in Hindi language to be shared on mediums like WhatsApp. The single image contains the false-claim and the debunk. . . . . . . . . . . . . . 155 7.1 (a) A snapshot of the UI widgets implemented in the Jupyter Notebook to demonstrate the search query generation methods, (b) Figure presenting the initial wireframe of the YouCred view-results page, developed in Figma, (c) Figure displaying an example snapshot of one of the initial workflow diagram created for YouCred . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 7.2 Figure illustrating the workflow of YouTube-CSV-Helper extension. . . . . 187 7.3 Snapshot of YouCred’s topic database. . . . . . . . . . . . . . . . . . . . . . . . 188 xvii LIST OF FIGURES 7.4 Figure illustrates YouCred’s query generation method page, which utilizes the YouTube video tags method. The page displays a collection of video tags that can be sorted either by frequency or alphabetically (A). Each tag is accompanied by its frequency of occurrence. When a tag is selected, its corresponding bubble changes color to blue (B). Fact-checkers can choose multiple tags, and as they make their selections, the chosen tags are ap- pended with the topic to form the search query. Importantly, the search query is editable, allowing fact-checkers the agency to modify it as needed (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 7.5 Figure depicts YouCred’s query generation page utilizing the Google Trends (GT) method. Fact-checkers begin by selecting keywords that serve as seed words for extracting GT topics (A). They also have the flexibility to add custom keywords (B). Next, fact-checkers choose the GT topics of interest (C), select the countries and languages (D) they want to focus on, and specify the desired date range (E). The system then extracts the GT search queries (F), which fact-checkers can review and select from. 
The search query generated is editable, allowing fact-checkers to modify it as needed (G).191 7.6 Snapshot of YouCred’s view-results page consisting of multiple columns, with each column representing the search results of a specific query. The column header provides essential information such as the search query generation method (A), the search query itself (B), the applied sorting filter, and the count of search results (C). The page offers functionalities like downloading the search results as a CSV file and removing individual columns (D) as needed. Within each column, there is an interactive graph (E) that visualizes the engagement received by the search result videos and their publication dates. The page also includes sections dedicated to individual videos (H) representing each search result. These video sections provide important metadata such as the video title, channel name, upload date, views, likes, comments, and a thumbnail. If fact-checkers identify a potentially misinformative video, they can add it to the annotation database (F) for tracking and later fact-checking. Additionally, fact-checkers can utilize the block video functionality (G) to prevent a video from appearing in future search results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194 xviii LIST OF FIGURES 7.7 The view-results page in YouCred features an interactive, dynamic, and multifunctional scatter plot graph. This graph showcases the engagement received by the videos in the search results, represented on the y-axis, along with their respective dates of publication on the x-axis. (a) When hovering over a point on the graph, a text box displays detailed information about the video, including its title, engagement metrics such as likes and views, and the date of publication (Figure 7.7a). (b) Fact-checkers have the ability to select a specific cluster or area of interest within the graph (Figure 7.7b), (c) allowing them to zoom in and enabling a more focused analysis of selected videos (Figure 7.7c). The "View Selected Results" button filters the search results, displaying only the videos within the selected area, facilitating a more targeted evaluation. To revert back to the original graph view, fact- checkers can simply click the "Clear Brush" button, resetting the graph and allowing for further exploration and analysis. . . . . . . . . . . . . . . . . . . 196 7.8 Snapshot of YouCred’s preview mode. Fact-checkers can click on any video in the view-results page and can view the video in the system itself. . . . . . 197 7.9 Figure shows the snapshot of YouCred’s video annotation database. Fact- checkers add videos to this database while exploring the view-results page or directly from the browser extension. All videos have a corresponding annotate button which takes the fact-checkers to the video’s annotation page. This database contains the video’s title along with other metadata such as views, likes, upload date of video, channel, etc. Columns conclusion is populated once fact-checkers assign a veracity label to the video on the annotation page. Added date column denotes the date on which the video was added to the database. The page also provides a variety of search and filter options to find or view selected videos. . . . . . . . . . . . . . . . . . . . 198 7.10 Figure showing YouCred’s annotation page that streamlines and facilitates the credibility assessment process. The header corresponds to the video’s title (A). 
The video is embedded towards the left side of the page (B) and the video’s transcript, subtitles, title, and description are shown in the middle in separate tabs (C). Fact-checkers can highlight misinformative claims (D) in any tabs, add corresponding annotations (E) and also assign a veracity label to the video (F). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 xix LIST OF FIGURES 7.11 Snapshot of YouCred’s claim database that stores entries for all the mis- informative claims highlighted by fact-checkers in the videos that they annotated. The database shows the fact-checker name, the misinforma- tive claim highlighted in the video, the veracity label of the claim, tags associated with the claim, and the date when the video was added to the annotation database. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 7.12 Figure (a) illustrates the number of seed videos added to YouCred through manual CSV uploads and the use of the ’YouTube-CSV-Helper’ extension. Figure (b) presents the usage frequency of YouCred for generating search queries using the four proposed methods throughout the 9-month deploy- ment period. Figure (c) provides an overview of the topics monitored using YouCred and the corresponding proportions of query generation meth- ods utilized for each topic. Figure (d) illustrates the number of potentially misinformative videos added by fact-checkers to YouCred’s annotation database. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 8.1 When a conspiratorial video gets recommended on a user’s YouTube home- page, the user is warned about the consequences of watching the video on future video recommendations. . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 xx LIST OF TABLES TABLE Page 3.1 Seed query, hot & cold regions, and sample search queries for the five misinformation search topics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2 List of user features for the audit experiments. . . . . . . . . . . . . . . . . . . 28 3.3 Accounts created to execute Watch experiments for each misinformative topic. In total, I created 120 (24X5) accounts to run experiment 3 and 30 (6X5) accounts for experiment 4. Here 5 denotes the number of topics. . . . 30 3.4 Description of the annotation scale and heuristics along with sample YouTube videos corresponding to each annotation value. I map the 9-point annota- tion scale to 3-point normalized scores with values -1 (Promoting, (P)) , 0 (Neutral, (N)) and 1 (Debunking, (D)). I have shared the list of 2,943 unique videos along with their annotation values in an online dataset.1 . . . . . . . 33 3.5 RQ1b:Watch experiment results for demographics and geolocations, given accounts have built watch history after watching promoting (P), neutral (N) or debunking (D) videos. Mean corresponds to normalized scores for the annotated videos. Higher values indicate that accounts receive more promoting videos. For example, M (50 or older) >F (50 or older) indicates that males who are 50 or older and who watch neutral flat earth videos receive more promoting videos in their Top 5 than females of the same age group. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.6 RQ2: Analyzing watch history effects on the three YouTube components. 
P, N, and D are means of the normalized scores of videos presented (via the YouTube components) to accounts that have built their watch histories by viewing promoting (P), neutral (N), and debunking (D) videos, respectively. For example, P > N indicates that accounts that watched promoting videos received more misinformation (or more promoting videos) compared to accounts that watched neutral videos. . . . . . . . . . . . . . . . . . . . . . . . 38 xxi LIST OF TABLES 4.1 Sample search queries for the YouTube audit . . . . . . . . . . . . . . . . . . . 52 4.2 Sample seed videos curated for the audit experiment. . . . . . . . . . . . . . 53 4.3 A sample of of classifiers and feature set with the progression of performance. 62 4.4 The misinformation bias scores form a bimodal distribution, each consti- tuting a cluster of similar queries. This table describes the clusters and presents sample queries for each cluster. . . . . . . . . . . . . . . . . . . . . . . 69 5.1 Sample search queries for each of the ten vaccine-related search topics. . . 97 5.2 List of user actions employed to build account history. Every action and product type (misinformative, neutral or debunking) combination was performed on two accounts. One account sorted search results by filters “featured” and “average customer review”. The other account built history in the same way but sorted the search results by filters “price low to high” and “newest arrivals”. Overall, I created 40 Amazon accounts (6 actions X 3 tested values X 2 replicates for filters + 2 control accounts + 2 twin accounts).101 5.3 List of contributors selected for building up account history for action “Follow contributors”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 5.4 Books corresponding to each annotation value shortlisted to build account histories in the Personalized audit. S represents the star rating of the product and R denotes the number of ratings received by the book. . . . . . . . . . . 103 5.5 Description of annotation scale, heuristics along with sample products corresponding to each annotation value. . . . . . . . . . . . . . . . . . . . . . 107 5.6 Example illustrating the bias calculations. For a given query, Amazon’s search engine presents users with the following products in the search results i1, i2 and i3. The misinformation bias scores of the products are s1, s2 and s3 respectively. The table has been adopted from previous work [242]. A bias score larger than 0 indicates a lean towards misinformation. . 114 5.7 RQ1b: Analyzing echo chamber effect in product page recommendations. M, N and D are the means of misinformation bias scores of products recom- mended in the product pages of misinformative, neutral and debunking Amazon products respectively. Higher means indicate that recommenda- tions contain more misinformative products. For example, M>D indicates that recommendations of misinformative products have more misinforma- tion than recommendations of debunking products. d, n and m are number of unique products annotated as debunking, neutral and promoting for each recommendation type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 xxii LIST OF TABLES 5.8 RQ2: Table summarizing RQ2 results. IR suggests noise and inconclusive results, i.e search results of control and its twin seldom matched. Thus, difference between treatment and control could either be attributed to noise or personalization, making it impossible to study the impact of personaliza- tion on misinformation. 
NP denotes little to no personalization. A dash (-) indicates that the given activity had no impact on the component. X indicates that the component was not collected for the activity. M, N and D indicate the average per-day bias in the component collected by accounts that built their history by performing actions on misinformative, neutral or debunking products. A higher mean value indicates more misinformation. For example, consider the cell corresponding to the action "search + click & add to cart product" and the "Homepage" recommendation. M>N>D indicates that accounts adding misinformative products to their cart end up with more misinformation in their homepage recommendations than accounts that add neutral or debunking products to their cart. . . . 123

6.1 Table showing the list of participants with their gender and experience (in years) in their current role. Some participants have been associated with fact-checking work for a longer duration; I only report their experience in their current role in the organization. . . . 140

6.2 Table showing the stakeholder groups identified in the study, the participating organizations, and the continents I covered through the interviews. In the organization column, freelance refers to no association with a particular fact-checking organization/team. I aggregated the roles of stakeholders and their association with a fact-checking organization/team to ensure anonymity, as in some cases knowledge of network affiliation and role could potentially reveal the identities of a few participants. Note that the participants I interviewed sometimes provided insights about more than one role. . . . 141

CHAPTER 1

INTRODUCTION

"Google's search algorithm spreads false information with a rightwing bias—Search and autocomplete algorithms prioritize sites with rightwing bias, and far-right groups trick it to boost propaganda and misinformation in search rankings"—The Guardian [3]

"YouTube more likely to recommend election-fraud content to those skeptical of the 2020 election"—The Hill [5]

"YouTube is still suggesting conspiracy videos, hyperpartisan and misogynist videos, pirated videos, and content from hate groups following common news-related searches."—BuzzFeed [4]

"An Anti-Vaccine Book Tops Amazon's COVID Search Results."—NPR [1]

Search engines and social media platforms are an indispensable part of our lives; 92% of the adult population relies on them for information, with 52% doing so on an average day [320]. Despite their increasing popularity, to date, their search, ranking, and recommendation algorithms remain a black box to users. The relevance of results produced by these search engines is mostly driven by market factors and not by the quality (fairness, credibility, and representativeness) of the content of those results [397]—a fact most people are unaware of [262]. There is no guarantee that the information presented to people on online platforms is credible. The repercussions of users' exposure to fabricated information in search results, combined with their unwavering trust in online platforms, could be enormous. Previous research has already demonstrated that online misinformation can manipulate individuals' social, political, and health-related behavior, leading to inaction and detachment [124].
When citizens base their decisions on inaccurate information, it could not only pose a threat to democratic processes [241] but also impact their health and well-being [232, 247, 402]. Consequently, it is crucial to prioritize the investigation of algorithmically curated misinformation and to develop effective long-term defenses against it.

This dissertation research aims to address these concerns through two primary objectives. First, it seeks to understand the role played by the algorithms employed by online platforms in amplifying online misinformation. Second, it aims to design defenses against online misinformation by incorporating human-centered insights from stakeholder groups like fact-checking organizations and news agencies. My work acknowledges the complex, multifaceted nature of online misinformation and recognizes that, in order to design effective defenses against it, it is crucial to leverage the expertise and insights of stakeholder groups who are actively combating misinformation in the real world. To this end, my research delves into three interconnected threads, each focusing on a distinct aspect of the misinformation problem: online algorithms, fact-checking, and the design of systems to combat misinformation. The first thread of my research addresses the algorithmic problem by investigating and auditing online platforms to understand the role of the algorithms driving these platforms in surfacing misinformative content to users (Chapters 3, 4, and 5). The second thread focuses on supporting online fact-checking as a means to combat misinformation. Here, I explore how fact-checking is performed in the real world, identify the stakeholder groups involved in the process, and uncover the technical, policy, and information barriers to fact-checking online misinformation (Chapter 6). The final thread of my research focuses on designing and building an online system that helps fact-checkers monitor online platforms for algorithmically curated misinformation (Chapter 7). In the rest of the introduction, I will elaborate on the definition of online misinformation and then briefly outline each of my research arcs.

1.1 Study Context: Misinformation

The research community has referred to online misinformation by different names and definitions. A few popular characterizations include "fake news" [184, 198], "hoaxes" [243], "rumors" [150, 321], "conspiracy theories" [68, 341], "information credibility" [84, 281] and "perceived accuracy" [78, 308]. In my dissertation, especially the first thread of my research, I focus on the conspiratorial aspect of misinformation and use these terms interchangeably. Conspiracy theories are narratives that embody the belief that secret and influential organizations are behind the occurrence of a particular event [439]. Note that conspiracy theories are not always false. There have been several cases in the past where conspiracy theories turned out to be true (for example, the Watergate scandal [357] and Project MKUltra [421]). To differentiate true conspiracy theories from false ones, I rely on the theory of social constructionism, in which a fact is only considered "true" if its claim is widely cited, replicated, and accepted without contest [246]. For the purpose of my audit research, I focus only on conspiratorial topics whose mainstream view of reality is known—e.g., "vaccines do not cause autism." The mainstream perspective of such theories is either backed by expert authorities or scientific research and is widely accepted by a large number of people. At present, "what the majority of people believe in" is our best available basis for determining the truthfulness of conspiracy theories. I acknowledge that the truth value may change if new information becomes available.

For each of the topics under investigation, I operationalize the credibility assessment task, which involves annotating social media content (e.g., YouTube videos, Amazon products), using social epistemology [166], as done by prior research [279]. According to social epistemology, consensus among individuals can be considered one way of determining truth [166]: if a majority of individuals hold the same belief, it is often seen as an indicator of truth, based on the idea that collective agreement can be a reliable measure of it. Thus, when collecting credibility annotations from multiple individuals, I assigned the credibility label using the majority rule. However, for the third thread of my research, where I built a fact-checking system to assist fact-checkers with misinformation discovery, I leave the determination of veracity labels to the fact-checkers and their organizations' established processes.
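To make the majority-rule aggregation concrete, here is a minimal sketch of how a credibility label could be assigned from multiple annotators' judgments; the label names and the tie-handling choice are illustrative assumptions rather than the exact annotation scheme described in the later chapters.

```python
from collections import Counter

def majority_label(annotations: list[str]) -> str | None:
    """Return the label chosen by most annotators, or None when there is a tie
    (tied cases would need adjudication or another round of annotation)."""
    if not annotations:
        return None
    counts = Counter(annotations).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear majority
    return counts[0][0]

# Example: three independent annotators label the same YouTube video.
print(majority_label(["promoting", "promoting", "neutral"]))  # -> "promoting"
print(majority_label(["promoting", "debunking"]))             # -> None (tie)
```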
The videos cor- responded to three major YouTube components: search results, UpNext, and Top 5 recommendations. I found that demographics, such as gender, age, and geolocation do not have a significant effect on amplifying misinformation in returned search results for users with brand-new accounts. On the other hand, once users develop a watch history, these attributes do affect the extent of misinformation recommended to them. For example, I found YouTube recommending misinformative videos to men watching neutral videos about conspiratorial topics. 4 1.2. RESEARCH ARCS 1.2.1.2 Auditing YouTube for election misinformation In this study (Chapter 4), I conducted a post-hoc audit of election misinformation on the YouTube platform. After the US presidential elections, a lot of conspiracies circulated on YouTube questioning the validity of the election procedures as well as the results of the elections. In response, YouTube established content policies to remove videos promoting election-related falsehoods from its platform and said that such misinformative videos would not prominently surface in its searches and recommen- dations. In this work, I conducted a large-scale crowd-sourced audit of the YouTube platform to determine how effectively YouTube regulated its algorithms—search and recommendation—for election misinformation. To conduct the investigation, I re- cruited 99 participants with different demographics and political affiliations who installed TubeCapture, a browser extension that I built to collect users’ YouTube search results, and recommendations. The extension conducted searches for 88 search queries related to the 2020 US presidential elections and collected up-next recommendation trails—five consecutive up-next recommendation videos—for a set of pre-selected seed videos. I found that YouTube’s search results, irrespective of search query bias, contain more videos that oppose rather than support election misinformation. How- ever, watching misinformative election videos still lead users to a small number of misinformative videos in the up-next trails. 1.2.1.3 Auditing e-commerce platforms for health misinformation In the third study under this research thread, I conducted two sets of algorithmic audits on the Amazon platform to examine the prevalence of vaccine misinformation in its search results and recommendations (Chapter 5). First, I systematically audited search results belonging to vaccine-related search queries without logging into the platform— unpersonalized audits. Second, I analyzed the effects of personalization due to account history, where history is built progressively by performing various real-world user actions, such as clicking a product, adding a product to cart, etc—personalized au- dits. My work provides an elaborate understanding of how Amazon’s algorithm is introducing misinformation bias in the product selection stage and ranking of search results across five Amazon filters for ten impactful vaccine-related topics. My analy- sis of Amazon’s product page recommendations suggests that recommendations of products promoting health misinformation contain more health misinformation when compared to recommendations of neutral and debunking products. Through my audit 5 CHAPTER 1. INTRODUCTION experiments, I empirically establish how certain real-world actions on health misin- formative products on Amazon could drive users into problematic echo chambers of health misinformation. 
1.2.2 Identifying ways to support fact-checking online misinformation
In order to design effective long-term solutions against online misinformation, it is important to understand and support current fact-checking practices. Fact-checking is often treated as a purely technical problem concerning methodologies, tools, and algorithms to detect, assess, and verify the accuracy of claims and information. However, fact-checking is a complex, socially situated technical phenomenon involving collaboration among multiple stakeholders. In this thread of work, I highlight both the social and technical aspects of fact-checking (Chapter 6). First, I foreground the social aspect—the human infrastructure—of fact-checking by revealing the synergistic collaboration that occurs among the various stakeholder groups that work together to accomplish fact-checking work [317]. Second, I highlight the technical aspect—the technological infrastructure—which comprises the tools, technology, processes, and policies that support and enable the work of these stakeholder groups. Foregrounding the infrastructures supporting fact-checking work helped me unravel the technical, policy, and informational barriers to fact-checking. Based on my findings, I suggest that improving the quality of fact-checking requires systematic changes in the civic, informational, and technological contexts. For this work, I interviewed 14 fact-checking organizations across four continents, enabling me to get a global perspective on current fact-checking practices and the needs of fact-checking organizations.
1.2.3 Defending against online misinformation via system design
In my third research thread, I design a fact-checking system that caters to the needs of fact-checking organizations. Through collaboration with these organizations, I discovered that monitoring algorithmically driven online platforms still relies heavily on manual effort by fact-checkers. They spend significant time conducting manual searches on search engines and social media platforms to identify misleading content. Moreover, generating effective search queries to uncover potentially dubious content remains a challenge, often relying on guesswork. This problem is particularly pronounced on video search platforms like YouTube, where the lack of dedicated monitoring tools further exacerbates the issue. To address these challenges, I partnered with Pesacheck, Africa's largest indigenous fact-checking organization, to develop YouCred. YouCred is an online fact-checking system that automatically generates search queries related to important events and topics of interest to fact-checkers and provides an easy interface to analyze and annotate YouTube videos. To ensure the system met the needs of fact-checkers, I regularly gathered feedback from the Pesacheck team, leading to refinements and enhancements in the system's interface, features, and functionality. The finalized version of YouCred was deployed and evaluated at Pesacheck for nine months. The response from the fact-checking community was positive, with consistent usage observed throughout the evaluation period. The development of YouCred serves as an example of how participatory design methods can bridge the "design-reality gap", aligning the needs of fact-checking stakeholders with the technical systems designed to support their work.
1.3 Contributions and impact
Below I discuss the broader impacts of my dissertation research.
• Establishing the phenomenon of algorithmically curated misinformation: My work has resulted in a methodology to audit search engines and social media platforms for misinformation and opened up a new avenue in the domain of algorithmic-audit research. My audit study of the YouTube platform was the first to empirically establish the prevalence of the “misinformation filter bubble effect” revealing how search engines could trap people in echo chambers of misinformation. By conducting an exhaustive list of experiments, my work has demonstrated the distinct characteristics of algorithmically curated misinforma- tion and has quantified its prevalence across multiple attributes such as user features, user actions, search query types, and time variations. • Policy implications: As a sign of direct policy implications of my work, U.S. Representative Adam B. Schiff cited my algorithm audit work on Amazon in his congressional letter where he asked Amazon to address the problem of vaccine misinformation on its platform [347]. Additionally, I was also interviewed by the U.S. House Select Subcommittee staff who wanted to learn about the challenges of COVID-19 and vaccine misinformation in relation to e-commerce platforms. 7 CHAPTER 1. INTRODUCTION • Creation of novel datasets: The comprehensive audits conducted as part of my thesis resulted in the creation of two valuable and novel datasets. First, the YouTube audit study (Chapter 3) produced a novel dataset comprising 56,475 videos, which linked the veracity label of the video (promoting, neutral, or debunking) with the audited personalization attribute. Second, the audit ex- periments on Amazon (Chapter 5) yielded a dataset of 4,997 unique Amazon products distributed across multiple search queries, search filters, recommen- dation types, and user actions collected over an extensive 22-day audit period. These datasets serve as invaluable resources, facilitating further research and analysis in the field of online misinformation, specifically regarding the impact of algorithmic curation on content credibility. • Determining ways to support the online fact-checking process: Through my research, I actively engaged with 26 individuals from 14 fact-checking teams and organizations representing four continents, to determine ways to support the online fact-checking process. This collaborative effort provided invaluable insights into real-world fact-checking practices, shedding light on the often invisible advocacy, policy, and research work carried out by these organizations. Through this research, I also identified the diverse technical, social, informational, and policy needs of fact-checking organizations across the globe, contributing to a better understanding of their challenges and requirements in combating misinformation. • Narrowing the design-reality gap in the development of fact-checking systems: Through my work, I have tried to bridge the design-reality gap in the develop- ment of fact-checking systems. A significant step in this direction is the design and development of YouCred, a fact-checking system that assists fact-checkers in discovering and assessing misinformation on the YouTube platform. This endeavor involved a two-year collaboration with the Pesacheck fact-checking organization, whose members played a pivotal role in shaping the system. 
To demonstrate the practicality and effectiveness of YouCred, I conducted an exten- sive nine-month evaluation, showcasing its applicability in combating misinfor- mation and enhancing the fact-checking process. 8 C H A P T E R 2 RELATED WORK My research focuses on multiple aspects of online misinformation and is informed by a large body of multi-disciplinary research. This chapter provides a comprehensive review of the literature, exploring three primary dimensions through which I inves- tigate the phenomenon of online misinformation: auditing algorithms employed by online platforms, fact-checking online misinformation, and designing for mitigating online misinformation. 2.1 Auditing Online Platforms for Misinformation 2.1.1 Misinformation in algorithmic platforms Search engines are modern-day gatekeepers and curators of information. Their black- box algorithm can shape user behavior, alter beliefs and even affect voting behavior either by impeding or facilitating the flow of certain kinds of information [117, 134, 238]. Despite their importance and the power they exert, to date, their search results and recommendations have mostly been unregulated. The information quality of a search engine’s output is still measured in terms of relevance and it is up to the user to determine the credibility of the information. In recent times, these search engines have been critiqued for promoting misinformative and biased results [411]. For example, researchers found that a significant number of people ended up believing that the Earth is flat after watching recommended videos on Youtube—one of the 9 CHAPTER 2. RELATED WORK most popular video search platforms [297]. Another report outlined that searching for “vitamin K shot” on Google and YouTube returned web pages and videos asking parents to skip the vitamin shot [120]. The top search results also promoted anti- vaccine conspiracies [120]. In another instance, a search for the term “vaccine” on Amazon resulted in pages dominated by anti-vaccination books and movies, some including sponsored posts and ads [145]. All the aforementioned studies provide anecdotal evidence of algorithms playing a role in surfacing misinformation without experimentally quantifying its prevalence. My proposed research aims to fill this gap by applying the audit methodology to empirically establish the conditions under which the algorithms driving online platforms surface misinformation. 2.1.2 Search engine audits In recent times, search engines have been critiqued for promoting misinformative and biased results [411]. One of the key methodologies used to identify, study, and quan- tify such bias, discrimination and misinformation is the audit methodology. An audit comprises of systematic statistical probing of an online platform to uncover societally problematic behavior underlying its algorithms [344]. Scholars have proposed and used a myriad of audit research methods, including code audits, scraping audits, sock puppet audits, and crowd-sourced audits [344]. A code audit requires researchers to get access to the algorithm’s code and design in order to analyze it for problematic or harmful behaviour. Such audits are unfeasible for online platforms since their code is proprietary and is not available for public access [305, 306]. Furthermore, such audits could require human experts to understand and untangle the code logic [79]. 
This method is also only useful in detecting a limited range of problems in algorithmic systems since algorithms do not exist in a vacuum, and they might only show biased behaviour when they act on users’ data [344]. A scraping audit involves researchers collecting data directly from the web page or via an API. This audit method is not useful in situations where user characteristics (such as gender, age, etc.) impact the algorithmic output [423]. In a sock puppet audit, researchers create bot accounts or fake user accounts that impersonate real-life users in order to investigate how an algorithmic system may behave in response to different user characteristics or user actions. This audit method gives researchers the greatest control over experimental variables [423]. However, this method involves injection of false and harmful data into the platform under investigation and necessitates serious ethical considerations. Researchers need to ensure that the actions performed by the 10 2.1. AUDITING ONLINE PLATFORMS FOR MISINFORMATION fake accounts do not negatively impact the real users of the platform. Finally, in a crowdsourced audit, researchers hire crowdworkers to collect data from the platform in order to test the algorithmic system. Just like sock puppet audits, this method could also inject false and harmful data into the platform. High participant recruitment cost is another limitation of this research design [344]. Using these audit techniques, researchers have investigated several issues perti- nent to algorithmically driven online platforms. For example, they have explored the presence of partisan bias in search engine components [202, 275, 330]; investigated representativeness issues, such as racial and gender bias in online freelance market- places [189] and resume search engines [93]; the presence of price discrimination and algorithmic manipulation in e-commerce websites [95, 188]; opacity in price surging algorithms used by ride-sharing services [94]; lack of news source diversity in the information returned by search platforms, [393]; and the extent of personalization and localization used by search engines [187, 235]. Yet, auditing online platforms for algorithmic misinformation is practically non-existing. By focusing on auditing online platforms such as YouTube and Amazon for misinformation, my research takes a first step in the direction of auditing algorithms for misinformation. 2.1.3 Methodological Challenges in Audit Investigations There are numerous methodological challenges while conducting audit investigations. The first roadblock is determining a viable set of search queries that will result in meaningful measurements. Surely, we cannot feed all possible search queries to the system under audit. Researchers have adopted several techniques to compile and shortlist meaningful search queries. For example, to audit Google’s Top stories box, researchers selected Trending topics from Google Trends at a fixed time every day and then manually shortlisted the trending queries related to those topics [393]. An audit conducted on Google, Yahoo, and Bing search engines, during the 2016 United States Congressional elections, used the names of electoral candidates as queries [276]. To investigate gender bias in the resume database, researchers used the most commonly searched job titles [93]. To audit for partisan bias in Google search, scholars compiled autocomplete suggestions for multiple root queries related to Donald Trump’s presi- dential inauguration [330]. 
In my work, I leverage both, queries from Google Trends as well as autocomplete suggestions to ensure that the query set is trending and relevant to the platform under investigation. The second challenge of audit methodologies relates to carefully controlling the 11 CHAPTER 2. RELATED WORK experimental setup for meaningful audit investigations. These comprise decisions on setting the data collection framework, selecting the components to audit, and control- ling for confounding factors or noise. What audit methodologies should one select for conducting the audit? Researchers in the past have used various methods to collect data for the audit experiments [59]. In my work, I have employed both sock-puppet and crowdsourcing methods. For two projects, I manually crafted accounts on online platforms and used automated scripts to collect data so as to have more control over the experiments. For the third project, I recruited individuals who were instructed to install browser extensions to collect data, enabling me to observe algorithmic behavior in response to the complex user histories of real individuals. What components should one select for the audit experiments? Some audit studies focus on one component of the search engine, such as Google’s Top stories box [393] or Google’s search results [187]). Others focus on multiple components combined, such as various Google search page components including people-ask, news-card, twitter, people-search etc. [332]). My audit studies focused on search results and various recommendations specific to the platform under audit. I also leverage previous literature [187] to control for any confounding factors that could possibly affect the outcome of the experiments. The third challenge for conducting search engine audits lies in identifying the attributes and actions that could possibly affect the feature one is auditing. Several audit studies have focused on geolocation-based personalization. For example, to in- vestigate the effects of geolocation on web-based personalization, researchers focused on nation-level (randomly selected states in the USA), state-level (counties within Ohio) and county-level (voting districts in Cuyahoga County) locations. They found that personalization in search results increases with physical distance [236]. Audit studies have also investigated the effects of demographics, search-history, click history, and browsing history on Google’s web search results as well as prices of commodities on e-commerce platforms [187, 188]. Motivated by these studies, I investigate various user attributes and actions relevant to the platform under audit and determine their impact on the amount of misinformation that gets surfaced on online platforms. The last challenge for conducting online audits relates to properly defining how one is measuring the output label of the phenomenon that is being audited. For example, if a study investigates partisan bias, how do you define and label bias in a valid way? For my work on misinformation audits, I label algorithmic content as promoting, debunking, or neutral based on whether it supports, debunks, or presents general information about the topic of the audit study. 12 2.2. FACT-CHECKING ONLINE-MISINFORMATION 2.2 Fact-checking online-misinformation With the presence of a vast amount of information online, it is becoming increasingly difficult to judge what to believe or discredit [66]. One of the most prominent ap- proaches to identifying information accuracies on online platforms is fact-checking. 
Thus, to combat misinformation it is essential to support every aspect—both visible and invisible—of the fact-checking process. In this section, I first present the definition, origin, and evolution of fact-checking (Section 2.2.1). Next, I discuss the literature on the invisible work of fact-checking (Section 2.2.2), and finally the current landscape of research in fact-checking (Section 2.2.3). I show how previous work engages with understanding the fact-checking practices and tools in a limited manner and describe how my research addresses this gap. 2.2.1 Fact-checking: Definition, Origin, and Evolution The American Press Institute defines the process of fact-checking as “re-reporting and researching the purported facts in published/recorded statements made by politicians and anyone whose words impact others’ lives and livelihoods.” [133]. One of the early examples of fact-checking emerging as an integral part of journalism was when the Time magazine set up a separate research department to objectively verify every printed word before releasing the publication, a phenomenon now known as ante hoc, internal, or in-house fact-checking [119]. The last decade also witnessed the emergence of post-hoc or external fact-checking which consists of publishing an evidence based analysis of claims made in any public text (e.g., news report, political speech, social me- dia posts, etc.) after it is released to the world [174]. Today, fact-checking has emerged both as a principal part of news reporting as well as a separate entity [400]. According to Duke Reporters’ Lab, by 2019 there were around 188 active fact-checking initia- tives spread across 60 countries [370]. These initiatives have incorporated a range of methodologies and data-driven journalistic practices to not only hold disinformation- spreading individuals and organizations accountable through their fact-checks, but to also disseminate fact-checks in such a way that increases engagement with the public [58, 99]. With the aim of bringing together these fact-checking initiatives and in order to promote common fact-checking standards through a code of principles, the Inter- national Fact-checking Network (IFCN) was established in 2015 [315]. Major social media companies such as Facebook and Google have since then partnered with IFCN signatories to debunk false claims surfacing on their platforms [175]. While existing 13 CHAPTER 2. RELATED WORK work describes the evolution of fact-checking from journalism to external fact-checking [172, 177], there are still gaps in understanding how fact-checking is actually practiced, the identity and role of the participating stakeholders and the various collaborations and partnerships occurring in the process. Understanding these aspects is essential to support the process of fact-checking online misinformation. Through my research, I deep dive into answering these missing aspects of the fact-checking phenomenon. 2.2.2 Invisible Work of Fact-checking Within CSCW, a lot of attention has also been paid on highlighting the invisible or the overlooked work in a process or within an organization [38, 131, 142, 345]. Invis- ible work can include situations where the person performing the work is visible but some of the work they perform is “functionally invisible or taken for granted” [372]. Such work remains hidden in the background but is essential for the collective functioning of a workplace [289]. 
It often includes informal work practices such as informal conversations, operational and maintenance work, etc [289, 396, 401]. There are also situations where the person performing the work itself is invisible, such as service, design, or domestic work [266, 368]. In certain complex environments (e.g. hospitals), both visible and invisible work practices can take place simultaneously [372]. In a similar vein, fact-checking is a complex ecosystem that includes somewhat visible editorial and investigative work and invisible advocacy, policy, and research work. Most of the prior research has looked at fact-checking as a process to debunk misinformative claims[49, 173, 174, 362]. Scholars have mostly engaged with the role of stakeholder groups such as fact-checkers and editors in supporting the fact-checking work [49, 176, 362]. I add to the existing literature by not only expanding on the previ- ously reported roles of fact-checkers and editors but also identifying other stakeholder groups, such as investigators and researchers, and advocators whose roles remain invisible and unexplored by prior research. I shed light on the invisible work that fact-checking organizations are doing to improve the availability and quality of the in- formation in their country. By rendering visibility to the invisible work in fact-checking, I hope to foreground the challenges faced by stakeholders involved in every step of the fact-checking work. This would in turn open avenues for future research supporting various aspects of the fact-checking process. 14 2.3. DESIGNING FOR MITIGATING ONLINE MISINFORMATION 2.2.3 Current Landscape of Research in Fact-checking Past research studies on fact-checking have primarily focused on automating mul- tiple stages of the fact-checking process [72, 88, 158, 191, 228, 356], determining the perception and believability of fact checks [44, 149, 295] and construction of fact-check databases [239, 385, 410]. Scholars have adopted several approaches to determine the veracity of content, such as use of knowledge graphs [359], crowd-sourcing [193, 233], deep learning models [225], natural language processing techniques coupled with su- pervised learning techniques [191] and combination of human knowledge and AI [292]. Work in the field of multimedia forensics has also led to the development of content verification tools especially for image and video verification such as Tineye, InVID, etc. [380]. Despite the plethora of automated systems and tools available for fact-checking, our understanding of their usefulness in practice is limited. Furthermore, there is a dearth of scholarly work that engages with the limitations of the current fact-checking tools and practices ([118, 171] are a few exceptions). My research addresses this gap by interviewing the various stakeholder groups involved in the fact-checking process to understand the the technological infrastructure supporting their work including the tools that are actually used by the stakeholders in practice, the limitations of the current tools, and the challenges faced by multiple stakeholder groups. 2.3 Designing for Mitigating Online Misinformation In response to the prevalence of online misinformation, scholars have proposed multi- ple solutions to combat online misinformation. 
The solutions span several approaches, including designing interventions on platforms [70, 97, 104, 156, 220], media literacy programs [85, 113, 213, 391], and designing games that help people build resistance to online fake news [61, 170, 260, 336, 337]. Many of these existing approaches aim to help users assess the credibility of the information they see online [398]. However, scholars have argued that while it is important to add design features that help people navigate a large amount of online information, the burden of determining the credibility of online content should not be shifted solely to users [2]. Thus, researchers are also designing tools and systems to support fact-checkers by automating and scaling various aspects of the fact-checking process (see [287] for a review). Since finding misinformation is one of the most challenging aspects of the fact-checking process, several tools have been developed to monitor online platforms. For example, the WhatsApp monitor tool tracks multimodal content (text, image, video) posted and shared on a set of public WhatsApp groups and displays the content shared most often by users [272]. Watch 'n' Check allows fact-checkers to monitor trending topics on Twitter and find tweets containing specific keywords. The tool also displays metadata of the tweet along with details about the communities of users sharing the tweet to assist fact-checkers in making better decisions [88]. CrowdTangle allows monitoring of public Facebook groups and pages and is widely used by fact-checkers to find misinformation on the platform [138]. In addition to the externally available tools, some platforms also have internal tools, accessible only to partner fact-checking organizations, to discover misinformation. For example, Meta has a misinformation monitoring tool, colloquially known as The Queue, that surfaces user-submitted and AI-surfaced potentially misleading content on Facebook and Instagram [138]. While there is a lot of research and many available tools for platforms like Facebook and Twitter, there is a dearth of external or platform-supported internal tools to monitor misinformation on video search platforms like YouTube. My work fills this gap by designing a monitoring system that assists fact-checkers by suggesting search queries that could lead them to potentially misinformative content on the platform. Apart from platform monitoring tools, research has also concentrated on assisting fact-checkers in determining fact-check-worthy claims in a document [192, 218]. Additionally, a plethora of work has been done to determine the veracity of a given claim [84, 152, 193, 438]. Scholars have also built end-to-end fact-checking tools that automate all aspects of the fact-checking process, typically chaining together claim collection from online platforms, stance detection of documents with respect to given claims, evidence extraction, and veracity determination. ClaimBuster, for instance, monitors live discourses and social media platforms to catch factual claims and matches them with a repository of fact-checks to determine their veracity. For previously unchecked claims, the tool queries search engines and databases (Wolfram Alpha) to find more information about the claim [191]. Despite the plethora of available systems designed to facilitate fact-checking procedures, only a fraction have demonstrated real-world impact [171, 317].
This can be attributed, in part, to their limited applicability to diverse real-world scenarios and inadequate consideration of the distinct needs, knowledge, and resources available to fact-checking organizations [317]. To overcome these limitations and to ensure the utility of fact-checking systems within real-world organizations, it becomes imperative to incorporate the specific requirements, knowledge, and expertise unique to fact-checking organizations while designing fact-checking systems. My work is deeply rooted in this approach, integrating the needs and feedback of fact-checking organizations throughout the design, development, and evaluation phases.
CHAPTER 3
AUDITING YOUTUBE FOR PERENNIAL AND DEMONSTRABLY FALSE CONSPIRACY THEORIES
Search engines are an indispensable part of our lives. Despite their importance in selecting, ranking, and recommending what information is considered most relevant for us—a key aspect governing the ability to meaningfully participate in public life [161]—there is no guarantee that the information is credible. Numerous scholars have emphasized the need for systematic statistical investigations, or audits, of search systems so as to uncover societally problematic behavior [344]. For example, multiple studies have audited search engines for the presence of partisan bias [202, 330] and gender bias [93, 117]. Yet, none have empirically audited them for misinformation. Moreover, investigation of video search engines like YouTube is rare (work by Jiang et al. is one exception [216]), despite the popular prediction that, by 2022, 82% of internet traffic would come from videos [101]. YouTube has also faced years of criticism for surfacing misinformative content [82, 120, 415]. Critics have gone as far as calling YouTube a conspiracy ecosystem [45]. Despite such vehement criticisms, there has been little effort toward quantifying the extent of misinformation on video search platforms or investigating user attributes that might have an effect. What is the effect of attributes such as a user's demographics and geolocation on the amount of misinformation returned and recommended on YouTube? How does it change with a user's watch history, where watch history is progressively built by watching videos rife with inaccuracies or videos presenting extensive debunks? This chapter grapples with these questions and sheds light on the phenomenon of algorithmically surfaced misinformation on YouTube and how it is affected by personalization attributes (gender, age, geolocation, and watch history). I study the conspiracy facet of misinformation and perform audits on trending and perennial misinformative topics that are widely known to be false (details in Section 1.1). In particular, I examine five misinformative topics, namely 9/11 conspiracy theories, the chemtrail conspiracy theory, flat earth, moon landing conspiracy theories, and vaccine controversies. I conduct two sets of audit experiments, Search and Watch audits, to examine YouTube's search and recommendation algorithms, respectively. While Search audits are conducted using brand-new user accounts, Watch audits examine user accounts that have built a watch history by systematically watching either all promoting, all neutral, or all debunking videos of potentially misinformative topics. Both audits control for extraneous factors that could introduce errors in the audit data collection. I create more than 150 Google accounts to audit YouTube.
The experiments collect 56,475 YouTube videos, spread across five popular misinformative topics and correspond to three major components of YouTube: videos present in search results, Up-Next, and Top 5 recommendations. I find little evidence to support that users’ age, gender and geolocation play any significant role in amplifying misinformation in search results or recommended videos for brand new accounts. On the other hand, watch history exerts a significant effect on the amount of misinformation present in the search results corresponding to the vaccine controversy topic. Watch history also significantly affects the extent of misinformation in recommended videos (both Up-Next and Top 5) for all five misinformative topics. Interestingly, I observe a filter bubble effect in recommendations, where watching pro- moting misinformative videos lead to more promoting videos in the Up-Next and Top 5 video recommendations. This filter bubble effect for recommended content is observed for all topics, except vaccines controversies. For the vaccine topic, while filter bubble is not observed for the recommended videos, it exists for the search results. Specifically, people who watch anti-vaccination videos are presented with less misinformation in their recommendations but more misinformation in their search results, compared to those who watch neutral or debunking vaccine videos. 19 CHAPTER 3. AUDITING YOUTUBE FOR PERENNIAL AND DEMONSTRABLY FALSE CONSPIRACY THEORIES 3.1 Research Questions and Hypotheses My work is guided by the following main research question: What is the effect of personalization (based on age, gender, geolocation, or watch history) on the amount of misinformation presented to users on YouTube? I formulate the following sub- questions and hypotheses to investigate the effects of each of these personalization attributes. RQ1 [Search &Watch Experiments]: What is the effect of demographics (age, gender) and geolocation on the amount of misinformation returned in various YouTube components? RQ1a [Search Experiments]: How are search results affected for brand new ac- counts? RQ1b [Watch Experiments]: How are search results, Up-Next, and Top 5 recommen- dations affected, given accounts have a watch history? Users provide their demographic information, including age and gender while signing- up for a new Google account. They use the same Google account for accessing YouTube. Prior studies investigating associations between user demographics and engagement with misinformation have found that the likelihoods for sharing misinformation vary across user groups [183]. For example, adults aged 65 or older were seven times more likely to share articles from fake news domains compared to younger age group users. Another study indicated that women have a higher likelihood of sharing misinforma- tion [98]. Different demographics having different likelihoods of sharing misinforma- tion might imply that certain groups are exposed to more misinformative content than others. Thus, given the interplay between demographic differences and engagement with misinformation, I hypothesize that YouTube’s algorithm could indeed be biased, exposing older people and females to more misinformation while presenting content related to the five misinformative topics. H1a. Older people (50 years or older) will be presented with more misinformative content than younger age groups. H1b. Females will be presented with more misinformative content than males. 
Prior studies have also shown that search algorithms, specifically Google search, leverage user’s geolocation information to present personalized search results [187]. 20 3.1. RESEARCH QUESTIONS AND HYPOTHESES Moreover, Google keeps track of the region-based popularity of search topics and search queries through Google Trends data [196]. Hence, I hypothesize that geolocation will exert an effect, which in turn will depend on how popular the misinformative search topic is in that region. H1c. Regions where misinformative topics are popular (hot regions) will be presented with more misinformative content compared to regions where such topics are rarely searched (cold regions). While RQ1 investigates the effect of attributes that are directly connected to a user’s account, RQ2 delves into the second order effect of a user’s accumulated watch history. Hence, in RQ2, I ask: RQ2 [Watch Experiments]: What is the effect of watch history on the stance of misinformative content returned in various YouTube components? Technology critics have raised concerns on search engines’ tendency to create a filter bubble over time by presenting less diverse and more attitude confirming search results and recommendations [304, 373]. Some media reports have gone so far as to claim that YouTube recommendations drive users down the conspiracy rabbit-hole by recommending increasingly more pro-conspiracy theory videos [335]. Hence, I hypothesize: H2. Watching more videos belonging to a particular misinformative stance (promoting, neutral or debunking) leads YouTube’s search and recommendation algorithm to present more videos reflecting that particular stance to users. RQ3 [Search &Watch Experiments]: How does the amount of misinformative content differ across misinformative topics? RQ3a [Search Experiments]: How does misinformative content present in search results of brand new accounts differ across topics? RQ3b [Watch Experiments]: How does misinformative content present in search results, Up-Next, and Top 5 recommendations of accounts having a watch history differ across topics? Some misinformative topics are more popular than others. For example, topics like vaccine controversies have been widely discussed in the popular media. In the last few years, several social media platforms received backlash for harboring anti-vaccination content [154, 269]. At the beginning of 2019, a handful of them, including YouTube, pledged to take measures against vaccine misinformation [351, 379]. Does that indicate 21 CHAPTER 3. AUDITING YOUTUBE FOR PERENNIAL AND DEMONSTRABLY FALSE CONSPIRACY THEORIES that YouTube’s algorithm will present less misinformative content for such topics, given I performed the audit experiments in the middle of 2019? I hypothesize that when attention received by misinformative topics varies, the amount of misinformative content presented by topics will also vary. H3. The amount of misinformative content returned will differ across misinformative topics. 3.1.1 Five Misinformative Topics: Demonstrably False and Perennial In this research, I focus on five topics namely, 9/11 conspiracy theories, chemtrail conspiracy theory, flat earth, moon landing conspiracy theories, and vaccine controversies. All these topics are demonstrably false, perennial, and denied by authoritative sources or backed by scientific research. I now describe each topic and demonstrate how these are demonstrably false and perennial. 
3.1.1.1 9/11 misinformative topic
There are several conspiracy theories surrounding the 9/11 attacks [291]. Some of them claim that authorities had foreknowledge of the attacks and that they deliberately aided the attackers. A few attribute the collapse of the Twin Towers to a controlled demolition or explosives [291]. Possible motives for these theories involve justification of the invasions of Iraq and Afghanistan by the U.S. government. Other theories assert that the attacks were financed by Saudi Arabia's royal family, were orchestrated by the Israeli government, or that the Pentagon was hit by a missile launched under the orders of the U.S. government [237]. All these accounts have been denied by authoritative sources and expert analysts [374]; hence the theory is demonstrably false. Yet, a New York Times poll of 1,042 individuals revealed that 16% of US adults do not believe the government's account of the 9/11 attacks and 56% believe that the government is hiding something from them [387]. These statistics reveal that the theory is still persistent, despite being false.
3.1.1.2 Chemtrails misinformative topic
Chemtrails conspiracy theories claim that the long-lasting condensation trails, also known as contrails, left by aircraft and rockets in the sky are composed of harmful chemicals. The theories blame the United States Air Force (USAF) for spraying these harmful chemicals with the intention of altering the weather, controlling the population, and causing diseases. The National Oceanic and Atmospheric Administration (NOAA) has consistently denied such allegations, citing research that has debunked these false claims [294]. Despite the scientific evidence, a recent study with 1,000 subjects found that 10% and 30% of Americans believe the chemtrails conspiracy to be "completely" and "somewhat" true, respectively [388].
3.1.1.3 Flat earth misinformative topic
The third topic relates to flat earth conspiracies. Flat earth conspiracy theorists claim that the National Aeronautics and Space Administration (NASA) and government agencies are duping the public into believing that the Earth is spherical in shape. Surprisingly, a 2018 survey revealed that only 66% of millennials believed that the Earth is spherical [431].
3.1.1.4 Moon landing misinformative topic
Moon landing conspiracies claim that NASA's Apollo mission moon landing was staged by the agency. The theory has been denied by NASA [290]. A 500-person poll revealed that 1 in 10 Americans still believe that the moon landing never happened [53], satisfying the perennial criterion for topic selection.
3.1.1.5 Vaccine misinformative topic
Conspiracy theories related to vaccines are based on the mistaken belief that vaccines contain harmful ingredients that can cause diseases like autism and sudden infant death syndrome (SIDS). Some theories also claim that childhood diseases can be automatically cured by the human body's immune system and thus vaccination is not required. Such claims are denied by the World Health Organization, among other authoritative sources, and by scientific research [298, 426]. Yet, a recent survey conducted with 2,000 participants revealed that 45% of American adults doubt vaccines [378]. I discuss how I empirically selected these five misinformative topics in detail in Section 3.2.1.
3.2 Methodology
Here, I first present the methodology for compiling high-impact misinformative queries, the design and implementation of the audit experiments, the steps for collecting audit data, including components of YouTube's Search Engine Results Page (SERP) and video pages, and the qualitative coding scheme for determining the stance of the returned videos.
3.2.1 Compiling High Impact Topics and Queries
My selection methodology to identify relevant and impactful misinformation search topics and queries comprises three key steps.
3.2.1.1 Selecting misinformative topics via Wikipedia and related research: I curate a list of relevant misinformative topics (see Table 3.1) by referring to Wikipedia pages on conspiracy theories [6, 419] (e.g., 9/11, chemtrails, Sandy Hook, the Pizzagate conspiracy, etc.). I also refer to past studies that examine misinformation and conspiratorial phenomena in online communities [342, 425]. From this list, I exclude topics whose "truth" value is uncertain, that is, topics for which I was either unable to determine the mainstream perspective or for which the mainstream perspective is not backed by authoritative voices or scientific research. I manually identify and eliminate such topics. For example, I removed the "Malaysian Airlines Flight MH370" topic since official investigations about the flight's disappearance have presented inconclusive reports [64, 245, 420]. Next, I leverage Google Trends to identify the most popular topics—continuously trending, high-interest topics—that are searched on YouTube by a large number of people.
Table 3.1: Seed query, hot and cold regions, and sample search queries for the five misinformation search topics.
Search Topic | Seed Query | Hot | Cold | Sample Search Queries
9/11 conspiracy theories | 9/11 and 9/11 conspiracy | Maryland | Ohio | 9/11 inside job; 9/11 tribute; 9/11 conspiracy
Chemtrail conspiracy theory | chemtrail | Montana | New Jersey | chemtrail; chemtrail flu; chemtrail pilot
Flat Earth | flat earth | Montana | New Jersey | flat earth proof; is the earth flat
Moon landing conspiracy theories | moon landing | Ohio | Georgia | moon; moon hoax; moon landing china
Vaccine controversies | vaccines | Montana | South Carolina | anti vaccine; vaccines; vaccines revealed
Figure 3.1: (a) Google Trends allows users to specify a search query as either a topic search or a term search. (b) Interest-over-time graph. (c) Popularity of the chemtrail conspiracy theory topic in YouTube searches in the United States between January 1, 2016 and December 31, 2018. Color intensity in the heatmap is proportional to the topic's popularity in that region.
3.2.1.2 Selecting high impact misinformation search topics via Google Trends: Google Trends (Trends for short) is a good indicator of real-world activities impacting a large number of people [128]. Trends also provides interest data across different Google search services, including YouTube. Figure 3.1a demonstrates how Trends can be used to search either as a Term or as a Topic. For example, searching for chemtrail conspiracy theory as a Topic will give results for several queries related to the topic (chemtrails, contrails—a common word used to refer to chemtrails), whereas searching as a Term will return results that contain the text strings "chemtrail," "conspiracy," and "theory." I opted to search as a Topic and selected "YouTube search" as the preferred service (refer to Figure 3.1b). This step discarded a few topics for which no Trends data was returned.
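For illustration, the sketch below shows how such a Trends lookup could be scripted, assuming the third-party pytrends client; the topic shortlisting in this chapter was done through the Trends web interface, and the candidate terms here are only examples, not the actual topic list.

```python
# Minimal sketch (assumes the third-party `pytrends` client) of checking whether
# Google Trends returns YouTube-search interest data for a candidate topic.
# The real audit relied on the Trends web interface; terms below are illustrative.
from pytrends.request import TrendReq

CANDIDATE_TOPICS = ["chemtrail conspiracy theory", "flat earth", "moon landing conspiracy"]

def youtube_interest(topic, timeframe="2016-01-01 2018-12-31"):
    """Return the interest-over-time series for a topic on YouTube search, or None if absent."""
    trends = TrendReq(hl="en-US", tz=360)
    trends.build_payload([topic], timeframe=timeframe, geo="US", gprop="youtube")
    frame = trends.interest_over_time()          # DataFrame indexed by date
    return None if frame.empty else frame[topic]

# Discard topics for which Trends returns no YouTube-search data.
kept = {}
for topic in CANDIDATE_TOPICS:
    series = youtube_interest(topic)
    if series is not None:
        kept[topic] = series
        print(f"{topic}: mean interest {series.mean():.1f}")
```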
Next, I compare the interest-over-time plots for all remaining search topics from January 1, 2016 to December 31, 2018 to ensure that the topics have been persistently discussed in the last two years. Then, I select the top five topics, which represent the most searched topics, resulting in a list of highly impactful misinformative topics. Table 3.1 provides the list.
3.2.1.3 Selecting Search Queries
The next step is to generate a set of queries for each of the misinformation search topics which I can use in the subsequent audit experiments and SERP data collection. I need to ensure that the query set comprises both relevant and high-impact or popular queries. I feed seed queries for each search topic into both YouTube and Trends. Since the study audits YouTube, query suggestions on YouTube represent the most trending queries searched on the platform, whereas Trends helps identify the most prevalent and impactful queries. The auto-complete feature of YouTube's search box suggests 10 popular queries once a seed query is fed into the search box (refer to Figure 3.2a). I add those to expand the query set. Searching on Google Trends as a Topic displays the top related queries; the number can vary by topic. I also include those in the query set (refer to Figure 3.2b). Thus, the query set comprises queries suggested by both the YouTube and Trends platforms.
Figure 3.2: (a) YouTube search's auto-complete suggests 10 trending queries. (b) Google Trends displays the top search queries related to the term or topic entered in the search box.
Next, I manually removed duplicates and replaced semantically similar queries with a single relevant query. I retained the most impactful (trending and most searched) queries by keeping the seed query as well as queries that appear both in the top 5 YouTube suggestions and the top 5 related-queries list in Trends. I found that shorter queries (length ≤ 4 keywords) were more representative of the misinformative topic. Queries comprising more than 4 keywords (e.g., "the flat earther's $100,000 challenge" and "moon landing press conference analysis") were overly specific. Hence, I only retained the more representative generic queries, those with a maximum of 4 keywords. The final query set for the 9/11 conspiracy theories and vaccine controversies topics had 11 queries each. Query sets for the chemtrails, flat earth, and moon landing conspiracy theories topics had 10, 8, and 9 queries, respectively. In total, I had 49 queries. Table 3.1 presents a sample.
3.2.2 Overview of Audit Experiments
YouTube utilizes age, gender, geolocation, and watch history as features in its recommendation system [106]. To determine if these features amplify the amount of conspiratorial content returned to users, I conduct a series of four audit experiments. The audits collect three primary YouTube components. I annotate the collected videos with stance values: promoting, debunking, or neutral towards the topic. Finally, I conduct statistical comparison tests on the annotated data. The audit experiments also control for multiple sources of noise. Unfortunately, in search engine audit studies, differences in search results and recommendations cannot be solely attributed to personalization. Confounding factors (or noise), if not controlled, can also influence the results. For example, users' choice of web browser could impact Google's search results and recommendations and hence could lead to noisy inferences.
Thus, following prior search engine audit work [187], I control for browser noise by selecting a single version of the Firefox browser for all experiments. Firefox was selected over Google Chrome to avoid the possibility of the Chrome browser tracking the Google accounts used in my experiments. All interactions with YouTube happened in incognito mode to remove any noise resulting from tracked cookies or browsing history. I also control for temporal effects by performing simultaneous searches. Additionally, all machines used in my experiments had the same architecture, configuration, and version of the operating system (64-bit, Ubuntu 14.04, 3.75 GB RAM). This step ensures that there are no temporal effects due to differences in the machines' speeds. In the remainder of this section, I describe the collected YouTube components and the layout of my experimental setup.
3.2.2.1 YouTube Components
I collect the following components: (a) search results, which consist of the top 20 videos in YouTube's SERP (Search Engine Results Page) returned in response to a search query; (b) Up-Next, which corresponds to the next recommended video that will be played immediately after the current video finishes; and (c) Top 5, which relates to the top five recommended videos on the right of the video page. Figure 3.3 demonstrates the three components.
Figure 3.3: Three components collected from YouTube: (a) search results from a SERP and (b) Up-Next and Top 5 recommended videos from a video page.
3.2.2.2 Search Experiments: Auditing with brand new accounts
For the Search experiments, I conduct two experiments to test whether demographics (age and gender) and geolocation for a new user (with no prior history on YouTube) have a significant effect on the proportion of misinformative content returned by the platform.
Experiment 1: Search & Demographics (age and gender). I consider four age groups (less than 18 years old, 18-34, 35-50, and greater than 50) and two gender values (male and female) (see Table 3.2). I create eight different Google accounts—2 (gender values) x 4 (age group values)—each having a unique combination of gender and age. I manually crafted these accounts by following Google's account setup process of adding profile details (age and gender), and including a recovery email and phone verification.
Table 3.2: List of user features for the audit experiments.
Experiment | Category | Feature | Tested Values
Search (Exp 1) | Demographics | Age | <18, 18-34, 35-50, >50
Search (Exp 1) | Demographics | Gender | Male, Female
Search (Exp 2) | Geolocation | IP address | Georgia, Montana, New Jersey, Ohio, South Carolina
Watch (Exp 3) | Demographics | Age | <18, 18-34, 35-50, >50
Watch (Exp 3) | Demographics | Gender | Male, Female
Watch (Exp 3) | Watch history | Watch history | Promoting, Neutral, Debunking
Watch (Exp 4) | Geolocation | IP address | Georgia, Montana, New Jersey, Ohio, South Carolina
Watch (Exp 4) | Watch history | Watch history | Promoting, Neutral, Debunking
Implementation: Each account is managed by a Selenium bot. The bot runs on a virtual machine created on the Google Cloud Platform (GCP). When testing for demographics, searches across all accounts are performed from the same location (Mountain View, California) to control for the effect of geolocation. Figure 3.4 shows the experimental setup.
Figure 3.4: Steps performed in Search experiments 1 and 2.
Each bot controlling an account opens the Firefox browser in incognito
mode and logs in to YouTube using that account's credentials. Each bot conducts searches on YouTube's homepage by drawing queries from the query sets of all misinformative topics. The searches are done in sequence, similar to Vincent et al.'s approach [406]. The bot sleeps for 20 minutes after every search to neutralize the carry-over effect—noise introduced in search results by dependencies between consecutive searches. Prior audit experiments on Google Web Search showed that the carry-over effect is observed if the interval between two sequential query executions is less than 11 minutes [187]. I use this value as a benchmark and keep a time interval of 20 minutes between two YouTube searches to control for carry-over effects. I collect SERP data for each of the 49 search queries and scrape these HTML-based SERPs to extract the URLs of the top 20 videos present in the search results.
Experiment 2: Search & Geolocation. To study the effect of geolocation, I need to identify physical locations corresponding to each search topic from where automated YouTube searches will be performed. I make use of Google Trends' interest-by-sub-region feature to shortlist locations that have the highest (or lowest) interest corresponding to each topic under audit investigation. I searched Trends 50 times for each of the misinformative search topics with the same parameters (region="US," time="1/1/2016 to 12/31/2018," service="YouTube search"). I calculate the average interest-by-region value for each sub-region (i.e., state), and shortlist the 15 sub-regions with the highest interest scores (referred to as hot regions) and the bottom 15 sub-regions with the lowest scores (cold regions). Intuitively, hot and cold regions are states in the U.S. where the search topic is the most and least popular, respectively. I select one hot and one cold sub-region for each search topic based on its availability in the list of active working nodes of Planet-Lab, a network of geographically dispersed machines [312]. For example, for the flat earth topic, among the 15 hottest sub-regions (e.g., North Dakota, Montana, Oregon), I selected Montana because of its availability among Planet-Lab's active working nodes. Table 3.1 shows the selected hot and cold sub-regions across all topics.
Implementation: For each search topic, I run two Selenium bots, each corresponding to either a hot or a cold geolocation. The bots run on virtual machines created on the GCP. These bots connect to the Planet-Lab machines deployed in the hot and cold regions (refer to Table 3.1) for that misinformative topic through SSH tunneling. Figure 3.4 presents the steps performed in this experiment. After searching every query, every bot saves the SERP. Later, I scrape all the saved SERPs and extract the URLs of the top 20 videos present in them (i.e., search results).
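To make the bots' search loop concrete, the sketch below shows the shape of one sock-puppet routine, assuming Selenium with Firefox in private-browsing mode; the account login, Planet-Lab tunneling, and SERP parser of the actual experiments are omitted, and the queries, selectors, and file names are illustrative rather than the real scripts.

```python
# Simplified sketch of one search bot's loop (assumes Selenium + geckodriver).
# The login step, full query sets, and SERP scraper of the actual audits are omitted.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

QUERIES = ["flat earth proof", "is the earth flat"]   # drawn from the compiled query sets
CARRY_OVER_PAUSE = 20 * 60                            # 20-minute gap to neutralize carry-over

options = Options()
options.set_preference("browser.privatebrowsing.autostart", True)  # incognito-style session
driver = webdriver.Firefox(options=options)

for query in QUERIES:
    driver.get("https://www.youtube.com")
    box = driver.find_element(By.NAME, "search_query")   # YouTube's search box (selector may change)
    box.clear()
    box.send_keys(query)
    box.submit()
    time.sleep(10)                                        # let the SERP render
    with open(f"serp_{query.replace(' ', '_')}.html", "w", encoding="utf-8") as fh:
        fh.write(driver.page_source)                      # save the SERP for offline scraping
    time.sleep(CARRY_OVER_PAUSE)                          # control for carry-over effects

driver.quit()
```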
After completion of both Search experiments (demographics and geolocation), I collected a set of 848 unique videos.

3.2.2.3 Watch Experiments

The goal of the Watch experiments is to examine the effect that a user's watch history exerts on the amount of misinformation presented to the user in both YouTube's search and video pages. I also determine how that effect varies with user demographics and geolocation. The experimental setup comprises two phases: 1) watch and 2) search. The watch phase builds the watch history of every Google account, followed by the search phase that conducts searches on YouTube. During the watch phase, after watching every video, I extract the Up-Next video and the Top 5 recommendation components.

Experiment 3: Watch & Demographics. The aim of this experiment is to test the effects in the presence of a user's watch history. Hence, I first need to build the history of new user accounts by automatically making them watch videos that are either all debunking, neutral, or promoting the particular misinformative topic under audit investigation. I create three sets of 2 (gender values) X 4 (age group values) Google accounts to audit each misinformative topic, where each set watches 20 videos from one of the three stances. I obtain the videos from the Search experiments. I select the 20 most popular videos for each of the misinformative topics. Popularity is calculated as the engagement accumulated by the video at the time of the experimental runs; Popularity metric (pm) = view count + like count + dislike count + favorite count + comment count. I have released all videos corresponding to each stance (promoting, neutral, debunking) that were used to create the watch histories of the Google accounts, along with their popularity values, as part of the online dataset1. Two authors annotated the video collection with stance values: -1 (debunking), 0 (neutral), and 1 (promoting). I describe the qualitative coding scheme and process in Section 3.2.3. Table 3.3 shows the count of accounts created for each misinformative topic for this experiment.

Stance | Accounts (Demographics) | Accounts (Geolocation: Hot) | Accounts (Geolocation: Cold)
Debunking (-1) | 8 | 1 | 1
Neutral (0) | 8 | 1 | 1
Promoting (1) | 8 | 1 | 1
Total accounts | 24 | 6 (Hot and Cold combined)

Table 3.3: Accounts created to execute Watch experiments for each misinformative topic. In total, I created 120 (24 X 5) accounts to run Experiment 3 and 30 (6 X 5) accounts for Experiment 4. Here 5 denotes the number of topics.

Implementation: The Watch experiment for studying the effects of demographics is similar to the Search experiment runs. The only difference is that accounts build their watch history by watching, in their entirety, 20 popular videos from a particular stance set (all having the same stance in a set, either -1, 0, or 1) before conducting any search operation on YouTube. Figure 3.5 presents the steps for the Watch experiment.

Figure 3.5: Steps performed in Watch experiments 3 & 4. These experiments have two phases: (1) a watch phase followed by (2) a search phase.

Experiment 4: Watch & Geolocation. The aim of this experiment is to test the effect of the hot and cold geolocations on the amount of misinformation presented to users on YouTube, given that each user has a watch history. Similar to the previous Watch experiment, the history is created by making each account watch YouTube videos of a particular stance. I create three sets of two Google accounts (see Table 3.3), each corresponding to a hot or cold region (refer to Table 3.1). The three sets build their watch histories following the same steps as in Experiment 3.

Implementation: For each search topic, I run six Selenium bots, three for hot and three for cold geolocations. After building their watch histories, the bots run in a similar fashion as Experiment 2—Search & Geolocation.

1 https://social-comp.github.io/YouTubeAudit-data/
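As a small illustration of the popularity metric defined above, the sketch below computes pm from a video's engagement counts and selects the 20 most popular videos of a given stance; the field names and example records are hypothetical rather than the study's actual data schema.

```python
def popularity(video):
    """pm = view count + like count + dislike count + favorite count + comment count."""
    return (video["view_count"] + video["like_count"] + video["dislike_count"]
            + video["favorite_count"] + video["comment_count"])


def top_videos_for_stance(videos, stance, k=20):
    """Return the k most popular videos annotated with the given stance (-1, 0, or 1)."""
    candidates = [v for v in videos if v["stance"] == stance]
    return sorted(candidates, key=popularity, reverse=True)[:k]


# Hypothetical example records:
videos = [
    {"id": "a1", "stance": 1, "view_count": 120000, "like_count": 900,
     "dislike_count": 40, "favorite_count": 0, "comment_count": 310},
    {"id": "b2", "stance": 1, "view_count": 80000, "like_count": 500,
     "dislike_count": 25, "favorite_count": 0, "comment_count": 150},
]
watch_set = top_videos_for_stance(videos, stance=1)
print([v["id"] for v in watch_set])
```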
After completion of the experimental runs, I collected 2,479 unique videos from both Watch experiments—demographics and geolocation. One author annotated one half of these videos, while the other half was annotated by the second author, using the process described in Section 3.2.3.

3.2.3 Annotating the Data Collection

Through the audit experiments, I collected a total of 56,475 videos, of which 2,943 were unique. I used an iteratively developed qualitative coding scheme to label the video collection. Qualitative coding is a process of interpreting data and labeling it into meaningful categories. First, two researchers randomly selected 25 videos from the Search experiments' data collection, 5 from each topic. Next, six human annotators independently annotated all videos using a basic 3-scale annotation scheme: -1 (debunking), 0 (neutral), and 1 (promoting). All six annotators, including the authors, then discussed their individual annotations and the heuristics followed for the task. After discussions and multiple rounds of iteration, all raters reached a consensus on the annotation heuristics. The process resulted in a scale comprising 9 different annotation values: −1 to 7. This 9-point scale gives a microscopic view of the kinds of videos a user is exposed to when she searches for a misinformative topic (details in the next section). For example, the videos could either promote, discuss, or debunk the misinformative topic being searched, or they could discuss a different misinformative topic—a topic that the user never searched for. Table 3.4 lists the annotation values with descriptions and examples. Please note that, to curate misinformative topics for the study, I only considered demonstrably false conspiracy theories. However, my annotation scheme does not classify videos for veracity; rather, I check whether they promote, debunk, or discuss a conspiratorial view related or unrelated to the search topic under audit.

Annotation Value | Stance | Annotation Heuristics | No. of videos | Normalized Score | Sample Video Title (Video URL, youtu.be/)
-1 | debunking, mocking, disproving related misinformation | narrative of video disputes, mocks, or provides authoritative evidence against conspiracy theories related to the topic under audit | 430 | -1 (D) | Bill Maher Throws Out 9/11 Conspiracy Theorists On Live TV (p80hXaM4QgU)
0 | neutral & related to misinformation | narrative of the video does not take any stance on conspiracy theories related to the topic under audit | 238 | 0 (N) | The Howard Stern Show and WCBS-2 On Sept. 11 (O3LT6FMF2f8)
1 | promoting, supporting, justifying, explaining related misinformation | narrative of video promotes, supports, or substantiates any conspiratorial views related to the topic under audit | 374 | 1 (P) | 9/11 truthers attend Treason in America (2-7GCs-2NUg)
2 | debunking, mocking, disproving unrelated misinformation | narrative of video debunks, mocks, or provides evidence against a conspiratorial view related to a topic different than the one under audit | 64 | -1 (D) | Did the Titanic Really Sink? The Olympic Switch Theory Debunked (_mpLRCqQ620)
3 | neutral & related to another misinformation topic | narrative of the video does not take any stance on conspiracy theories unrelated to the topic under audit | 25 | 0 (N) | JFK coverage 12:30pm-1:40pm 11/22/63 (pDOojsg62O0)
4 | promoting, supporting, justifying, explaining unrelated misinformation | narrative of the video promotes, supports, justifies, or explains any conspiratorial view unrelated to the topic under audit | 66 | 1 (P) | Mafia Boss Tells All - Jimmy Hoffa, JFK Assassination and Much More (__LxwaAEaL8)
5 | not about misinformation | video content does not contain any conspiratorial views | 1667 | 0 (N) | Former Abortionist Dr. Levatino At Virginia Tech (dIRcw45n9RU)
6 | foreign language | video content in a non-English language | 35 | translated & re-annotated | Las voces del 11S, documental en Español del Canal National Geographic (7rMQu2B_3vU)
7 | undefined/unknown | annotators were unable to assign any of the above annotation values to the video | 9 | ignored | Ahmed Mohamed's Dad Pushes 9/11 Conspiracy Theories Online (CTkE0Etkszc)
8 | removed | video removed from the platform at the time of annotation | 35 | ignored | n/a (tpSO7i70LHw)

Table 3.4: Description of the annotation scale and heuristics along with sample YouTube videos corresponding to each annotation value. I map the 9-point annotation scale to 3-point normalized scores with values -1 (Debunking, D), 0 (Neutral, N), and 1 (Promoting, P). I have shared the list of 2,943 unique videos along with their annotation values in an online dataset.2

2 https://social-comp.github.io/YouTubeAudit-data/

3.2.3.1 Annotation heuristics

I annotated videos as "debunking" (-1) when their narrative disputed, derided, or provided scientific evidence against any of the conspiratorial theories related to the particular misinformative topic being audited. For example, the video titled Bill Maher Throws Out 9/11 Conspiracy Theorists On Live TV was present in the Top 5 recommendations while auditing the 9/11 misinformative topic. It mocks people supporting the 9/11 conspiracy theory and hence is annotated as "debunking". Conversely, I annotated videos as "promoting" (1) if they proposed, championed, or substantiated any theory or perspective that promotes inaccurate views related to the topic under audit. For example, the video titled 9/11 truthers attend Treason in America shows interviews with 9/11 truthers—people who believe 9/11 was an inside job—and hence is annotated as "promoting". I annotated videos as "neutral" (0) when the content of the video presented a general discussion on the topic without taking a stance on conspiracy theories. For example, the video titled The Howard Stern Show and WCBS-2 On Sept. 11 shows clips depicting damage done to the World Trade Centre after the 9/11 attacks. I marked it as neutral since there is no discussion for or against 9/11 conspiracies.

Annotation values "2", "3", and "4" are similar to values "-1", "0", and "1", respectively, with the difference that they correspond to videos debunking, containing neutral content about, or promoting conspiratorial information related to a topic different from the one being audited. For example, consider the scenario where the audit experiments for the 9/11 misinformative topic returned videos discussing conspiratorial information corresponding to John F. Kennedy's assassination or those pertaining to the Titanic's demise. To illustrate, I list a few concrete examples here. The video titled Did the Titanic Really Sink?
The Olympic Switch Theory Debunked was returned in the Top 5 recommendations during the Watch audits of the 9/11 misinformative topic. The video content refutes the conspiracy theory claiming that the Titanic never sank. I annotated it as "debunking misinformation not related to the misinformative topic under audit" (annotation value = 2). In another example, a video titled JFK coverage 12:30pm-1:40pm 11/22/63 showed news coverage about JFK's assassination without promoting or debunking any false conspiracies. I annotated that video as "neutral video not related to the misinformative topic under audit" (annotation value = 3). On the other hand, the video Mafia Boss Tells All - Jimmy Hoffa, JFK Assassination and Much More discusses conspiracy theories surrounding JFK's assassination. I annotated that video as "promoting misinformation not related to the misinformative topic under audit" and assigned it an annotation value of 4.

Additionally, I annotated videos as "not related to misinformation" (5) if the content of the video is not related to any misinformative topic. For example, one of the videos in the audit experiment, titled SHOCKINGLY OFFENSIVE AUDITIONS Have Simon Cowell In A Rage! | ANGRY JUDGES | X Factor Global, is about a reality TV show audition. Since the content does not contain any information related to any misinformative topic, I annotated the video as unrelated to misinformation. Moreover, I annotated non-English videos as "foreign language" (annotation value = 6). I later translated the title, description, and the top few comments of these videos using Google Translate3. I then re-annotated them with the appropriate stance value lying between -1 and 5. For example, I re-annotated the Spanish video titled Las voces del 11S, documental en Español del Canal National Geographic as "debunking", since the comments on the video indicated that it debunks the 9/11 conspiracy theory—the misinformative topic being audited. Finally, I annotated videos as "undefined or unknown" (annotation value = 7) when I was unable to assign them any annotation value between -1 and 6. For example, the video titled Ahmed Mohamed's Dad Pushes 9/11 Conspiracy Theories Online mentions a 9/11 conspiracy tweet. Since the video neither discusses the 9/11 events nor takes a stance for or against any conspiracy theory, the coder was unable to decide the annotation value. Because of this ambiguity, it was marked as "unknown". During the annotation phase, I also found that YouTube had taken down 35 unique videos that were captured by the audit experiment. I made an ethical decision not to collect data for, or annotate, content that was removed by the platform.

3 https://translate.google.com/

After converging on the annotation scale and heuristics, two authors independently coded 158 videos to test their inter-rater reliability. A high reliability score (Cohen's Kappa of 0.80) suggested substantial agreement and offered credence to the annotation heuristics. The authors then split the annotation task for the remaining videos evenly between them. I next develop two scoring metrics to quantify the amount of misinformation in videos.
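For the inter-rater reliability check described above, agreement between two coders can be computed with a standard Cohen's Kappa routine; the label lists below are hypothetical stand-ins for the 158 doubly coded videos.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical annotation values (on the -1..8 scale) assigned by the two coders
# to the same videos; in the study, 158 videos were doubly coded.
coder_1 = [-1, 0, 1, 5, 2, 5, 5, 4, 0, 1]
coder_2 = [-1, 0, 1, 5, 2, 5, 3, 4, 0, 1]

kappa = cohen_kappa_score(coder_1, coder_2)
print(f"Cohen's Kappa: {kappa:.2f}")  # values around 0.8 indicate substantial agreement
```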
3.2.3.2 Normalized scores

The key goal of my audit investigation is to determine whether user activities—search and watch activities corresponding to a particular misinformative topic—lead to more misinformative content, either in the returned search result videos or through the recommended videos. Hence, for downstream analysis, I map the 9-point granular scale (−1 to 7) to a 3-point normalized score with values of −1, 0, and 1. The normalization process puts videos that contain any type of misinformation, whether related or unrelated to the searched topic, under the same bucket. For instance, if queries for the 9/11 topic result in a video enumerating conspiracies corresponding to the missing Malaysian flight 370 (an example from the dataset), then I annotate the video as promoting unrelated misinformation (annotation value = 4) with a normalized score of 1. Annotation values of 2, 3, and 4 are mapped to -1, 0, and 1, respectively, while 5 and 6 are treated as neutral (see Table 3.4). I discard videos coded as 7 and 8, since annotators were either unable to identify their stance (value = 7) or the video was removed from the platform (value = 8). In total, I annotated 2,943 unique videos, with 501, 1,980, and 462 videos marked as -1, 0, and 1, respectively.

3.2.3.3 SERP-MS Score

I develop a scoring metric, SERP-MS (SERP Misinformation Score), that captures the amount of misinformation while taking into account the ranking of the search results:

SERP-MS = ( Σ_{r=1}^{n} x_r · (n − r + 1) ) / ( n(n + 1) / 2 ),

where x_r is the normalized score of the search result at rank r, r is the rank of the search result, and n is the number of search results present in the SERP. I only consider the top 10 search results for computing SERP-MS. Thus, SERP-MS is a continuous value ranging from -1 (all top 10 videos are debunking) to +1 (all top 10 videos are promoting).
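The normalized-score mapping and the SERP-MS formula can be summarized in a short sketch that re-expresses the definitions above; it is an illustration, not code from the study.

```python
# Map the granular annotation values to 3-point normalized scores.
# Values 7 (undefined) and 8 (removed) are discarded from the analysis.
NORMALIZED = {-1: -1, 0: 0, 1: 1, 2: -1, 3: 0, 4: 1, 5: 0, 6: 0}


def serp_ms(normalized_scores):
    """SERP Misinformation Score over the top-n results (here the top 10).

    SERP-MS = sum_{r=1..n} x_r * (n - r + 1) / (n * (n + 1) / 2),
    where x_r is the normalized score of the result at rank r.
    """
    scores = normalized_scores[:10]
    n = len(scores)
    weighted = sum(x * (n - r + 1) for r, x in enumerate(scores, start=1))
    return weighted / (n * (n + 1) / 2)


# Example: a SERP whose top results are mostly promoting yields a score close to +1.
example_serp = [1, 1, 1, 0, 1, -1, 1, 0, 1, 1]
print(round(serp_ms(example_serp), 3))
```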
3.3 Results

In this section, I analyze the collected and annotated audit data to investigate my research questions and hypotheses (refer to Section 3.1). The goal is to determine the effects of personalization attributes on the amount of misinformation returned in both the Search and Watch experiments. Recall that, among the three YouTube components (search results, Up-Next, and Top 5 recommendations), I can only collect search results for the Search experiments. On the other hand, I collect all three components for the Watch experiments. A test of normality reveals that the data is not normally distributed and the samples have unequal sizes. Hence, I opt for non-parametric tests. For all pairwise comparisons, I use the Mann-Whitney U test. To perform multiple comparisons, I use Kruskal-Wallis ANOVA followed by post-hoc Tukey HSD4. I report results using both normalized and SERP-MS scores. Note that the SERP-MS score is only calculated for the search results component.

4 Tukey HSD adjusts p-values automatically, thus controlling the family-wise error rate for multiple comparisons.

3.3.1 RQ1: Effect of demographics and geolocation

In the first research question, I investigate the effect of demographics (age and gender) and geolocation on the amount of misinformation returned in various YouTube components, for both brand new accounts and accounts that have built their watch history progressively by watching either promoting, neutral, or debunking misinformative videos.

Feature | Topic | Stance | Component | Statistical Tests | Mean Difference
Age | Flat Earth | N | Top 5 | KW H(3,800)=18.28, p=0.0004 | 50 or older < all other age groups (post-hoc)
Age | Vaccine controversies | N | Top 5 | KW H(3,799)=24.65, p=1.8e-05 | age 18-34 < all other age groups (post-hoc)
Gender | Flat Earth | N | Top 5 | MW U=74659, p=0.004 | M > F
Gender | Flat Earth | N | Top 5 | MW U=3612, p=6.6e-07 | M (50 or older) > F (50 or older)
Gender | Moon landing conspiracy theories | N | Up-Next | MW U=2720, p=0.03 | F > M
Gender | Vaccine controversies | N | Top 5 | MW U=4068, p=0.002 | M (age 35-50) > F (age 35-50)
Gender | Vaccine controversies | N | Top 5 | MW U=76206.5, p=0.02 | M > F
Gender | Vaccine controversies | P | Top 5 | MW U=4443, p=0.01 | M (age 18-34) > F (age 18-34)
Gender | Vaccine controversies | P | Up-Next | MW U=2880, p=0.04 | M > F
Gender | Vaccine controversies | P | Up-Next | MW U=120, p=0.002 | M (age 18-34) > F (age 18-34)
Geolocation | Moon landing conspiracy theories | P | Top 5 | MW U=4137.5, p=0.02 | Hot > Cold

Table 3.5: RQ1b: Watch experiment results for demographics and geolocation, given accounts have built watch history by watching promoting (P), neutral (N), or debunking (D) videos. Means correspond to normalized scores for the annotated videos; higher values indicate that accounts receive more promoting videos. For example, M (50 or older) > F (50 or older) indicates that males who are 50 or older and who watch neutral flat earth videos receive more promoting videos in their Top 5 than females of the same age group.

RQ1a [Search experiments]: How are search results affected for brand new accounts? I find no significant effect for gender (Mann-Whitney U=7247667.0, p>0.48), age (Kruskal-Wallis H(3,7616)=0.00888, p>0.99), or geolocation (Mann-Whitney U=471803.0, p>0.496) when comparing using normalized scores. Use of the SERP-MS score also shows non-significant results. Thus, H1a, H1b, and H1c are not supported, demonstrating that age, gender, and geolocation do not have an impact on the amount of misinformation returned in search results for users who have newly created their YouTube accounts.

RQ1b [Watch experiments]: How are search results, Up-Next, and Top 5 recommendations affected, given accounts have a watch history? I find that age has a significant effect for only two comparisons (refer to Table 3.5). In both cases, older people do not receive more misinformation than the younger age groups. Thus, H1a is rejected. Next, I find that gender has a significant effect across five comparisons involving certain combinations of search topics, watch stance, and YouTube components. Out of the five comparisons, H1b is supported for one case, where female accounts watching neutral moon landing videos receive more misinformation in their Up-Next component than corresponding male accounts watching the same videos.
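The comparisons reported in this section rely on standard non-parametric tests; a minimal scipy-based sketch of the two tests is shown below, with hypothetical score arrays standing in for the annotated data.

```python
from scipy import stats

# Hypothetical normalized scores of videos shown to two groups
# (e.g., male vs. female accounts for one topic and component).
scores_male = [1, 0, 0, 1, -1, 1, 0, 1]
scores_female = [0, 0, -1, 0, 1, -1, 0, 0]

# Pairwise comparison: Mann-Whitney U test.
u_stat, p_pair = stats.mannwhitneyu(scores_male, scores_female, alternative="two-sided")

# Multi-group comparison (e.g., across the four age groups): Kruskal-Wallis H test.
# In the study this is followed by post-hoc Tukey HSD for pairwise differences.
group_a, group_b, group_c, group_d = [0, 1, 0], [1, 1, 0], [-1, 0, 0], [0, 0, 1]
h_stat, p_multi = stats.kruskal(group_a, group_b, group_c, group_d)

print(u_stat, p_pair, h_stat, p_multi)
```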
In all other significant comparisons, men receive more misinformation than women. For example, male accounts that watch neutral vaccination videos receive more promoting videos in their Top 5 recommendations than female accounts that watch the same videos. Table 3.5 presents all the significant results.

I find that H1c holds only for the Top 5 recommendations of the moon landing topic. Accounts that watch promoting moon landing videos from Ohio (the hot geolocation, i.e., the region with the most interest) receive more promoting videos in their Top 5 than those that watch the same videos from Georgia (the cold geolocation, i.e., the region exhibiting the lowest interest in the topic). For other topics, geolocation did not have any significant effect on the amount of misinformation presented in search results, Up-Next, and Top 5 recommendations.

3.3.2 RQ2: Effect of watch history

Next, I explore the effect of watch history on the amount of misinformative content returned in the three YouTube components of interest. Note that RQ2 only applies to the Watch experiments, where an account has already built its watch history. Table 3.6 presents only the significant results; I discuss a handful.

Component | Topic | Test | Mean Difference (post-hoc)
Search Results | Vaccine controversies | KW H(2,6517)=6.2953, p=0.04 | P > N & P > D
Top 5 | All | KW H(2,14740)=9.42, p=0.009 | P > N & P > D
Top 5 | 9/11 conspiracy theories | KW H(2,2911)=186.68, p=2.9e-41 | P > N & P > D
Top 5 | Chemtrail conspiracy theory | KW H(2,2845)=73.20, p=1.31e-16 | P > N & N > D
Top 5 | Flat Earth | KW H(2,2980)=49.18, p=2.18e-11 | N > P & D > P
Top 5 | Moon landing conspiracy theories | KW H(2,3005)=17.18, p=0.0002 | P > N & D > N
Top 5 | Vaccine controversies | KW H(2,2999)=48.54, p=2.9e-11 | N > P & D > P
Up-Next | All | KW H(2,2963)=10.29, p=0.006 | P > N
Up-Next | 9/11 conspiracy theories | KW H(2,487)=60.12, p=8.8e-14 | P > N & P > D
Up-Next | Chemtrail conspiracy theory | KW H(2,570)=16.12, p=0.0003 | P > D
Up-Next | Flat Earth | KW H(2,600)=26.29, p=1.96e-06 | P > D & D > N
Up-Next | Moon landing conspiracy theories | KW H(2,606)=5.99, p=0.049 | D > N
Up-Next | Vaccine controversies | KW H(2,600)=66.86, p=3.0e-15 | D > N > P

Table 3.6: RQ2: Analyzing watch history effects on the three YouTube components. P, N, and D are means of the normalized scores of videos presented (via the YouTube components) to accounts that have built their watch histories by viewing promoting (P), neutral (N), and debunking (D) videos, respectively. For example, P > N indicates that accounts that watched promoting videos received more misinformation (i.e., more promoting videos) compared to accounts that watched neutral videos.

Statistical tests performed using SERP-MS did not give any significant results; note that I apply this metric only to the search results component. Using the normalized score metric, I find that H2 only holds for the search results corresponding to the vaccine controversies topic (Kruskal-Wallis H(2,6517)=6.2953, p=0.0429). This indicates that a user's previous watch history only affects the misinformative stance of videos presented in the search results of the aforementioned topic. Post-hoc tests reveal that accounts that watch promoting anti-vaccination videos receive more promoting videos in their search results compared to those that watch neutral or debunking vaccination videos. Next, I find that watch history has significant effects on the stance of misinformative videos presented in the Top 5 (Kruskal-Wallis H(2,14740)=9.4235, p=0.0089) and Up-Next video recommendations (Kruskal-Wallis H(2,2963)=10.2932, p=0.00581) when all topics are considered together.
Post-hoc tests show that accounts that watch promoting videos receive more promoting results in both Up-Next and Top 5 compared to those that watch either neutral or debunking videos. The effect of watch history on both these components is significant for every topic individually too. Thus, H2 is supported for the Up-Next and Top 5 recommendations of all topics. I discuss the post-hoc test results for the vaccine controversies and chemtrail conspiracy theory topics. Post-hoc tests for the vaccine controversies topic reveal that accounts that watch promoting anti-vaccination videos receive more debunking videos in their Top 5 (Kruskal-Wallis H(2,2999)=48.54, p=2.9e-11) and Up-Next (Kruskal-Wallis H(2,600)=66.86, p=3.0e-15) components. This finding can be attributed to YouTube's initiative to reduce the recommendations of anti-vaccination videos. It is important to note that while recommendations of such videos have decreased, a filter bubble still exists with respect to the search results—people who watch promoting anti-vaccination videos were presented with more promoting content (Kruskal-Wallis H(2,6517)=6.29, p=0.04). Post-hoc tests for the chemtrail conspiracy theory topic demonstrate that accounts that watch videos promoting chemtrail conspiracies receive more promoting videos in their Top 5 (Kruskal-Wallis H(2,2845)=73.20, p=1.3e-16) and Up-Next (Kruskal-Wallis H(2,570)=16.12, p=0.0003) video recommendations than those that watch neutral and debunking videos, respectively. In addition, accounts that watch neutral chemtrail-conspiracy videos receive more promoting videos in their Top 5 compared to those that watch debunking chemtrail videos. Table 3.6 lists the results for the remaining topic comparisons.

3.3.3 RQ3: Across-topic differences

While in RQ1 and RQ2 I studied the effects of personalization attributes on the amount of misinformation presented to users in various YouTube components, in RQ3 I investigate whether the misinformative content presented to users differs across the five misinformative topics.

Figure 3.6: RQ3: Percentages of video stances (debunking, neutral, promoting) for each topic in (a) search results (Search experiments), (b) search results (Watch experiments), (c) Top 5 recommendations (Watch experiments), and (d) Up-Next recommendations (Watch experiments).

RQ3a [Search experiments]: How does misinformative content present in the search results of brand new accounts differ across topics? Figure 3.6a shows the proportion of promoting, neutral, and debunking videos across all topics in the Search experiments. I find that H3 is supported for the search results of brand new accounts. Comparing both normalized scores (Kruskal-Wallis H(4,1943)=467.29, p < 7.9e-100) and SERP-MS (Kruskal-Wallis H(4,98)=51.1, p < 2.1e-10) across topics shows that the amount of misinformation significantly differs among topics. Post-hoc comparisons using Tukey HSD (on both score metrics) reveal that the chemtrail conspiracy theory topic harbors significantly more misinformative search results compared to all other topics.
Figure 3.6a also shows the largest proportion of promoting videos for the chemtrails topic. I discuss the possible reasons for this occurrence in Section 3.4.2.

Figure 3.7: Box plots of (a) video length in seconds and (b) video popularity (pm) for each stance under each topic.

RQ3b [Watch experiments]: How does misinformative content present in the search results, Up-Next, and Top 5 recommendations of accounts having a watch history differ across topics? Figures 3.6b, 3.6d, and 3.6c show the proportion of promoting, neutral, and debunking videos across all topics collected from the search results, Up-Next, and Top 5 recommendations, respectively, in the Watch experiments. H3 is supported for all three YouTube components for accounts having a watch history. Comparing both normalized scores and SERP-MS across topics shows that topics have a significant effect on the amount of misinformation present in search results, Up-Next (Kruskal-Wallis H(4,2963)=375, p < 6.7e-80), and Top 5 recommended videos (Kruskal-Wallis H(4,14740)=390.6, p < 2.9e-83). Recall that SERP-MS is applicable only for the search results component. Post-hoc comparisons using Tukey HSD reveal that the chemtrail conspiracy theory topic has significantly more misinformation in its search results compared to all other topics. Figure 3.6b exhibits the largest amount of promoting videos for that topic. On the other hand, the amount of misinformation present in the Up-Next and Top 5 recommendations for the 9/11 conspiracy theory topic is significantly more than for other topics. This is also evident from Figures 3.6c and 3.6d.

3.3.4 Analyzing Video Length and Popularity

Analyzing video length, I observe that promoting videos are longer than neutral and debunking videos across all misinformative topics, except for chemtrail conspiracies, where they are slightly shorter than neutral videos and longer than debunking ones (see Figure 3.7a). For all topics, debunking videos are the shortest compared to the other stances. I also observe that neutral videos are the most popular (see Figure 3.7b), where popularity is calculated using the popularity metric (pm). The 9/11 topic has more popular videos compared to other topics. On the other hand, for the moon landing topic, the popularity of videos under each stance is almost the same. Although the percentage of videos promoting chemtrail conspiracies is the highest compared to other misinformative topics, they are the least popular videos.

3.4 Discussion

3.4.1 Effect of demographics and geolocation on misinformation

Modern search engines filter, rank, and personalize results before presenting them to a user. These information retrieval systems make decisions about the relevance of results without considering accuracy and credibility—a fact most people are unaware of [262]. Motivated by several media reports pointing out the prevalence of misinformation in search spaces, I audited YouTube to empirically determine the extent of conspiratorial content present in its search and recommended video results.
I investigate the role played by personalization attributes (age, gender, geolocation, and watch history). The analyses show little evidence that user demographics and geolocation play any role in amplifying misinformation in search results for users who have newly started their search journey—those with brand-new accounts. On the contrary, once accounts have a watch history, I find that demographics and geolocation attributes do exert an effect. However, this effect pertains to only certain combinations of personalization attributes and varies with the topic under audit investigation. I saw significant gender differences in 8 comparisons, and in all but one case men (accounts with gender set to "male") were recommended more misinformative videos. Perhaps more surprisingly, in 4 of these cases, the men were watching neutral videos and yet ended up with significantly more misinformative video recommendations. While I do not know why YouTube's algorithm showed this behavior, the observed gender-based differences have important societal implications, especially for certain misinformative topics, such as vaccine controversies. For example, a survey of 2,300 people in the United States revealed that the percentage of male anti-vaxxers is higher than that of females [274]. Therefore, recommending videos that promote misinformative topics to men can inflict more harm by reconfirming their pro-conspiracy beliefs. Moreover, recommending promoting videos to men who are drawn to neutral information and have not yet developed a strong pro-conspiracy belief about the topic is even more problematic, because it might increase their chances of forming pro-conspiracy beliefs.

3.4.2 Effect of watch history on misinformation

One of the goals of the audit investigations was to verify several anecdotal claims criticizing YouTube for surfacing misinformative content in its recommendations [82, 254, 415]. These claims accused the platform of driving users into a misinformation rabbit hole—a phenomenon where people watching videos promoting misinformation are presented with more such videos in the search results and recommendations. Contrary to these blanket claims, I observe variability in YouTube's behavior in presenting recommendations to accounts with a watch history across different misinformative topics. Comparing the stances of the annotated data obtained from the search results of accounts with a watch history shows that YouTube's search algorithm fares well for the flat earth and vaccine topics. On the other hand, I witness a large proportion of videos promoting misinformation for the chemtrails topic (refer to Figure 3.6b). This observation can be attributed to YouTube's recent effort to censor misinformative content belonging to select search topics. In an announcement to the public, the platform pledged to reduce misinformative content belonging to topics like 9/11, flat earth, and medical misinformation [379]. Thus, I believe the percentage of search results promoting these misinformative topics is lower than for other topics like chemtrail conspiracies.

The audits reveal that people who watch promoting videos for certain misinformative topics (for example, 9/11 conspiracies) are recommended more such videos in their Up-Next and Top 5 recommendations compared to those who watch neutral or debunking videos. These findings indicate that the recommendation algorithm is biased towards the stance of videos watched by the user for certain misinformative topics (refer to Table 3.6).
I also find that for users watching videos on the vaccine topic, both Top 5 and Up-Next recommendations return a negligible proportion of videos promoting "vaccine hesitancy" (1.2% and 0.5%, respectively). Statistical tests reveal that people watching promoting anti-vaccination videos receive more debunking videos in their recommendations compared to people who watch neutral or debunking videos. However, a filter bubble effect still exists for the search results component, where people watching anti-vaccination videos are presented with more such results. This variability in YouTube's behavior across search topics suggests that YouTube is modifying its search ranking and recommendation algorithms selectively, handpicking topics that are highlighted by media reports and technology critics (e.g., reports around anti-vaccine video recommendations). These observations are concerning, since all the audited misinformative topics are high-impact, popular, and perennial, and hence are likely to affect a large population of users' search experiences. My findings serve as an important call to action for YouTube to develop a more universal approach that offers a comprehensive solution to the problem of misinformation.

3.4.3 Tackling search engine enabled misinformation

Complete eradication of misinformation from YouTube requires time and significant resources. In the interim, YouTube can take several steps to tackle the problem of misinformation on its platform. It can begin by giving priority to monitoring certain misinformative topics that have a wider negative impact on society. Which misinformative topics are a threat to public well-being? While "vaccine hesitancy" was named one of the top 10 global threats of 2019 [299] and has led four European nations to lose their "measles free" status [62], the seemingly harmless pizzagate conspiracy led a man to fire shots in a pizza parlor [386]. I recommend that YouTube identify high-impact and popular misinformative topics. My work itself suggests a technique to curate such misinformative topics that are perennial, popular, and searched by a large number of people. Misinformative content belonging to the selected impactful topics can be filtered, fact-checked, and accordingly censored from the platform.

But is censoring the misinformative content enough? The audit experiments reveal that YouTube recommendations are still biased towards the misinformative stance of videos watched by a user. Given that almost 500 hours of content is uploaded to YouTube every minute [186], censorship might not be a comprehensive solution to fix this algorithmic bias. There is a need to break the filter bubble effect by recommending debunking videos to people who watch videos promoting misinformative content. YouTube can start by identifying and modifying the recommendations of vulnerable populations who could be targets for certain misinformative topics. The audit experiments revealed one such demographic: for example, I found YouTube recommending promoting videos to men who watched neutral misinformative videos.

The audits also revealed variability in YouTube's behavior toward certain misinformative topics—an indication of a reactive strategy for dealing with misinformation. I recommend the platform also proactively reveal the workings of its algorithm. For example, users can be told "you are recommended video A because you viewed videos C and D".
Given the complexity of the algorithms used by search engines and the interplay between the data and the algorithm, even an expert in the area might not be able to predict algorithmic output [343]. Thus, there is also an inherent need for platforms to conduct audit studies that can help reveal biases present in their algorithms. While I discussed some nascent steps that YouTube can take towards eradicating misinformation from its platform, this feat cannot be achieved without proper content policies and infrastructure in place. Currently, YouTube's community guidelines do not disallow misinformative content [434]. There is a need for appropriate policies that not only prohibit posting misinformative content on the platform but also ensure that posting advertisements on misinformative videos is not financially incentivized. The challenge of having appropriate infrastructure to implement these policies still remains.

3.5 Limitation and Future Work

This study is not without limitations. I do not perform repeated searches of the search queries over several days, which is essential to study the longitudinal effect of personalization. I plan to conduct continuous audit runs with repeated searches in the future. I also tested the effect of the geolocation feature only for regions within the United States, but conducting audits at a global scale is a fertile area for future endeavors. The Search and Watch audit runs had a gap of three months; thus, I do not perform any comparisons between the search result components of the two audits. I do not take into account the stance of a search query and how that affects the search results. I make this conscious choice because my methodology for compiling high-impact search queries, by definition, focuses on realistic searches that were most used by real users on YouTube.

Identifying videos that promote conspiracies and inaccurate content, or those that debunk them, is a challenging task. To make such distinctions with high precision, I used qualitative coding to annotate videos. In addition to the video content, I referred to metadata attributes, such as the video title, description, and user reactions present in the comments section. I found that videos relating to misinformative topics exhibit special characteristics. For example, pro-conspiracy videos are mostly longer, while neutral videos are more popular. I believe that such distinctive features, along with the features used in the manual annotation process, can be leveraged to build machine learning models that can identify the stance of videos.

While I audit three major components of YouTube, other components such as the homepage and trending sections can also be examined. Auditing the search queries presented by YouTube's autocomplete feature for their stance is also left for future investigation. Moreover, understanding how misinformative search results and recommendations affect users' search intent [427, 429] is another compelling avenue for future research.

3.6 Conclusion

In this study, I conducted two sets of sock puppet audit experiments on the YouTube platform to empirically determine the effect of personalization attributes (age, gender, geolocation, and watch history) on the amount of misinformation prevalent in YouTube searches and recommendations.
I created bots to impersonate users with specific personalization attributes and built YouTube account histories by making the bots watch videos of certain stances (promoting, neutral, and debunking). I found that the personalization attributes affect the amount of misinformation in recommendations once the bots develop a watch history. The study also empirically establishes the "misinformation filter bubble effect"—the extent to which personalized search engines could trap people in echo chambers of inaccurate information. I also found that misinformation filter bubbles do not exist equally for all topics. For example, the study suggests that YouTube is modifying its search and recommendation algorithms for the vaccine controversies topic, where the platform recommends scientific videos to users watching promoting videos. These findings also propelled further inquiries. Once YouTube modifies its policies and algorithms for a specific topic, what are the long-term effects of such modifications? How do the algorithms behave with real users with complex user histories? I explore these questions in the next chapter.

CHAPTER 4
AUDITING YOUTUBE FOR ELECTION MISINFORMATION

4.1 Introduction

"Oregon GOP frontrunner for governor embraces claims of election fraud... said he doubted Oregon's vote-by-mail system"—The Texas Tribune, Feb 11, 2022 [360]

"Election Deniers Go Door-to-Door to Confront Voters After Losses (in US primaries)"—Bloomberg, Aug 23, 2022 [63]

"With 10 weeks until midterms, election deniers are hampering some election preparations. Some election deniers have "weaponized" against us, one election official says."—ABC News, Aug 30, 2022 [364]

Skepticism around the legitimacy of the US electoral process, which primarily gained momentum during the 2020 US presidential election, had serious ramifications. For example, endorsement of election conspiracy theories was found to be positively associated with lower turnout in the 2021 US Senate election in Georgia [178]. In 2022, the false narratives around the 2020 elections still persist [257, 261] and continue to threaten democratic participation in the upcoming US midterm elections [257, 261]. In the last two years, 19 US states altered voting procedures and enacted laws to make voting more restrictive, creating information gaps and fresh opportunities for election misinformation to emerge and proliferate in the real and online worlds [261]. Thus, battling election misinformation has never been more important.

Studies show that social media platforms have become important mediums for political discourse [46, 408]. In particular, YouTube—the most popular platform among US adults [310]—has emerged as a political battleground, as demonstrated by the fact that both political parties extensively used the platform for election campaigning [384]. However, the platform came under fire from technology critics for being a hub of electoral conspiracy theories [224, 409]. Given the concern that search engines can play a significant role in shifting voting decisions [134, 135] and can confine users in a filter bubble of misinformation [205], there has been a push for online platforms to enact policies that minimize election misinformation [353].
In response to this push, YouTube introduced content policies to remove videos spreading election-related falsehoods and claimed that misinformative videos would not prominently surface or get recommended on the platform [256, 371, 433, 436]. However, the formulation of policies does not equate to effective enactment [318]. This is evident from the results of two misinformation audits conducted on the platform for the same conspiratorial topics (such as vaccine controversies and 9/11 conspiracies), first in 2019 [205] and again in 2021 [390], both of which found echo chambers of misinformation on the platform. Despite changes to YouTube's misinformation policies in 2020 [382], the authors of the second audit study did not find improvements when compared to the results of the first audit; rather, they found recommendations worsening for topics like vaccination. These findings reiterate the need to continuously audit platforms to investigate how a platform's algorithms fare with respect to problematic content and how effectively a platform's content policies are implemented [361]. While my previous study audited YouTube for misinformation (Chapter 3), it was conducted using sock puppets (bot accounts emulating real users) in conservative settings1 which often do not reflect true user behavior. There is a dearth of crowd-sourced misinformation audits that test the algorithms' behavior with real-world users ([71] is one of the few exceptions). In this study, I fill this gap by conducting a large-scale crowd-sourced audit on YouTube to determine how effectively YouTube has regulated its algorithms—search and recommendation—for election misinformation.

1 For example, a sock puppet building account history by watching videos that only promote misinformation.

To conduct the audit, I recruited 99 participants who filled out a survey and installed TubeCapture, a browser extension built to collect users' YouTube search results and recommendations. The extension conducted searches for 88 search queries related to the 2020 US presidential elections. I also seeded TubeCapture with 45 seed videos with three differing stances on election misinformation—supporting, neutral, and opposing. The extension collected up-next recommendation trails—five consecutive up-next recommendation videos—for each seed video. TubeCapture simultaneously collected YouTube components from both personalized standard and unpersonalized incognito windows, allowing me to measure the extent of personalization. This leads me to the first research question:

RQ1 Extent of personalization: What is the extent of personalization in various YouTube components?
RQ1a: How much are search results personalized for search queries about the 2020 US presidential elections and the surrounding voter fraud claims?
RQ1b: How much are YouTube's up-next recommendation trails personalized for seed videos with different stances on election misinformation—supporting, neutral, and opposing?

I find that while search results show very little personalization, up-next trails are highly personalized. I next venture into determining the amount of election misinformation real users could be exposed to under different conditions, such as following up-next trails for videos supporting or opposing election misinformation.
RQ2 Amount of election misinformation: What is the impact of watching a sequence of YouTube up-next recommendation videos, starting with seed videos with different stances on election misinformation (supporting, neutral, and opposing), on various YouTube components?
RQ2a: How much do search results get contaminated with election misinformation?
RQ2b: What is the amount of misinformation returned in users' up-next recommendation trails?
RQ2c: What is the amount of misinformation that appears in users' homepage video recommendations?

I find that YouTube presents debunking videos in search results for most of the queries. I also observe an echo chamber effect in recommendations, where trails with supporting seeds contain more misinformation than trails with neutral and opposing seeds. Since election misinformation is closely entangled with political beliefs, with several right-leaning news sources amplifying the claims of voter fraud [77, 264], I also study the diversity and composition of the content presented by YouTube in its various components. I ask,

RQ3 Impact on composition and diversity: What is the impact on content diversity when watching a sequence of YouTube up-next recommendation videos starting with seed videos with different stances on election misinformation (supporting, neutral, and opposing)?
RQ3a: How diverse are the search results?
RQ3b: How diverse are the up-next recommendation trails?

I find that YouTube ensures source diversity in its search results. I also find a large number of impressions for left-leaning late-night shows (e.g., Last Week Tonight with John Oliver) and right-leaning Fox News in users' up-next trails. Overall, my work makes the following contributions:

• I conduct a post hoc audit on YouTube to determine how its algorithms fare with respect to election misinformation; post hoc auditing comprises investigating a platform for a past topic or event which could have a significant impact on citizenry in the present and future. In turn, I am able to test the effectiveness of YouTube's content policies enforced to curb election misinformation.

• I extend prior work on misinformation audits by conducting an ethical crowd-sourced audit to see the impact of performing certain actions on the searches and recommendations of real-world people with complex platform histories, instead of the conservative settings of sock puppet audits.

• My audit reveals that YouTube search results contain more videos that oppose election misinformation than videos supporting it, especially for search queries about election fraud in the presidential elections. However, a filter bubble effect still persists in the up-next recommendation trails, where a small number of misinformative videos are presented to users watching videos supporting election misinformation.

4.2 Methodology

4.2.1 Developing search queries to measure election-fraud-based misinformation

The first methodological step in any algorithmic audit is to determine a viable set of relevant search queries that would be used to probe the algorithmic system.
For my study, I identified search queries that satisfy two properties. First, I select high-impact search queries that were used by people to search about the presidential election as well as the voter fraud claims about the 2020 elections. Second, I curate search queries that have a high probability of returning misinformative results, which would result in meaningful measurements of algorithmically curated misinformation about the audit topic. To compile such queries, I used Google Trends and YouTube video tags (refer to Figure 4.1).

Figure 4.1: Method to curate search queries for the audit experiment: high-impact queries are extracted from Google Trends, relevant tags are extracted (via keyword matching) from YouTube videos shared by users promoting voter fraud claims on Twitter, and the two sets are then combined and filtered to obtain the final set of search queries.

4.2.1.1 Curating high-impact queries via Google Trends

First, I leveraged Google Trends, which contains Google's daily and real-time search trends data. As the most popular search service, its trends are a good indicator for understanding the real-world search behavior of a large number of people. Using Election Fraud 2020 and Presidential Election as search topics, United States as the location, April 2020 to Present as the date range, and YouTube search as the search service, I extracted the top 15 most and least popular search queries that people used on YouTube. I chose April 7 as the start date since this was the day when Donald Trump made one of his first fraudulent claims about the security of mail-in ballots [208]. I included the most popular queries since they represent the ones that people mostly use to get information on elections. To explore the data voids [167] associated with the audit topic, I also included the least popular search queries to determine whether those terms have been hijacked by conspiracists to surface misinformation.
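Assuming the unofficial pytrends client for Google Trends, the query-harvesting step could be sketched roughly as follows; the search term, date range, and use of related_queries() are illustrative assumptions rather than the study's exact parameters (the study used Trends topics and drew on both the most and least popular queries).

```python
from pytrends.request import TrendReq

# Unofficial Google Trends client; gprop="youtube" restricts results to YouTube search.
pytrends = TrendReq(hl="en-US", tz=360)
pytrends.build_payload(
    kw_list=["election fraud 2020"],        # illustrative search term
    geo="US",
    timeframe="2020-04-07 2021-03-31",      # illustrative date range
    gprop="youtube",
)

# Related queries come back as 'top' (most popular) and 'rising' tables.
related = pytrends.related_queries()["election fraud 2020"]
top_queries = related["top"]
if top_queries is not None:
    print(top_queries.head(15))             # top 15 related queries
```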
4.2.1.2 Curating misinfo-queries using YouTube video tags

Second, I used YouTube video tags that content creators associated with misinformative videos while uploading them to the YouTube platform (see Figure 4.2 for an example). These tags can be thought of as search words representing how content creators would like their videos to be discovered. To extract video tags associated with election misinformation videos, I leveraged the large-scale Voter Fraud 2020 dataset released by Abilov et al. [37]. The dataset contains 12,002 YouTube video URLs that were shared on Twitter by accounts that tend to refute and promote voter fraud claims.

Figure 4.2: List of video tags (threat to democracy, election meddling, election tampering, ballot harvesting, non-citizen voters, voter fraud) associated with the YouTube video titled Is Voter Fraud Real? (video id: RkLuXvIxFew), which promotes voter fraud misinformation. Video tags are added by content creators while uploading YouTube videos to the platform and can be extracted via YouTube APIs or third-party tools. I use tags associated with videos shared by users promoting voter fraud claims on Twitter as search queries in the audit experiments.

I extracted the YouTube video tags associated with videos shared by accounts promoting voter fraud claims (n=200K) to probe YouTube. To curate a viable number of search queries from the extracted video tags, I employed several steps. First, I manually curated a list of 10 keywords related to elections and fraudulent claims surrounding the elections2 from the list of keywords provided by Abilov et al. [37] as well as the election 2020 misinformation report produced by the Election Integrity Partnership [147]. Then, for each of the keywords, I extracted the 15 most and 15 least frequently occurring video tags containing that term. For example, one of the most frequently occurring tags containing the keyword whistleblower was 'usps whistleblower', while the least frequently occurring tag was 'whistleblower jesse morgan'.

2 steal, fraud, ballot, elect, seal, dominion, sharpiegate, whistleblower, harvest, and sunrise zoom

4.2.1.3 Filtering search queries to obtain the final set

I combined the search queries obtained from both Google Trends and YouTube video tags into a single query set and employed several filtering steps to obtain a reasonable number of relevant search queries. First, I kept only queries related to the 2020 election; for example, I kept 'election fraud 2020' and removed 'election fraud 2016'. I removed duplicate and redundant search queries and replaced them with a single randomly selected query. For example, I replaced the queries 'voter fraud 2020', 'voter fraud', and 'vote fraud' with 'voter fraud 2020'. I removed queries with lengths greater than five words since they were overly specific (e.g., 'we've got pictures of the check stubs paid to people to ballot harvest'). I also removed queries containing names of news channels, news anchors, and presidential candidates because they were too generic and not directly related to the audit topic. However, I kept the search queries where the names of the presidential candidates appeared together with election or election-fraud-related terms (e.g., 'Joe Biden voter fraud'). Finally, I also removed search queries that were in languages other than English. In the end, I had 88 search queries in total. Table 4.1 presents a sample.

presidential election 2020; us elections 2020 latest news; election fraud 2020; rigged election; dominion voting exposed; mail in ballots 2020; stop the steal; joe biden voter fraud; usps whistleblower; voter fraud evidence; trump biden general election; dominion voter fraud

Table 4.1: Sample search queries for the YouTube audit.
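The filtering rules above can be summarized as a small pipeline; the keyword lists are hypothetical, the near-duplicate collapsing and language check are simplified, and several of these rules were applied manually in the actual study.

```python
def filter_queries(raw_queries, candidate_names, fraud_terms):
    """Apply simplified versions of the filtering rules described above."""
    seen = set()
    kept = []
    for q in raw_queries:
        query = q.lower().strip()
        if query in seen:                    # drop exact duplicates (near-duplicates were merged manually)
            continue
        if "2016" in query:                  # keep only 2020-election queries (simplified check)
            continue
        if len(query.split()) > 5:           # drop overly specific long queries
            continue
        has_name = any(name in query for name in candidate_names)
        has_fraud_term = any(term in query for term in fraud_terms)
        if has_name and not has_fraud_term:  # drop candidate-name queries unless tied to election/fraud terms
            continue
        seen.add(query)
        kept.append(query)
    return kept


# Hypothetical inputs:
queries = ["voter fraud 2020", "voter fraud", "election fraud 2016",
           "joe biden voter fraud", "joe biden", "dominion voter fraud"]
print(filter_queries(queries,
                     candidate_names=["biden", "trump"],
                     fraud_terms=["fraud", "ballot", "dominion", "election"]))
```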
Recall that the authors identified clusters of Twitter users who shared tweets either promoting or detracting from voter fraud claims and released the YouTube videos related to election fraud 2020 shared by those users. At the time of analysis, out of the ∼12K videos present in the dataset, 8.9K were still present on YouTube; the remaining videos had either been removed or made private. Out of the videos that were still present, 1K were shared by users in the detracting cluster, 6.5K by users in the promoting cluster, and the rest by users who had been suspended from Twitter. I sampled the 445 videos that had accumulated the maximum number of views from each of the promoting and detracting clusters (890 in total). Since the videos were not annotated by the authors for misinformation, I could not assume that the videos shared by users in the promoting cluster would contain misinformation. Therefore, I conducted an intensive and iterative process to determine the labels and heuristics for annotating the YouTube videos for misinformation; I describe the process in detail in Section 4.2.6. Through the annotation process, I labeled the videos as supporting, neutral, or opposing election misinformation. Out of the 890 videos, 74 were opposing, 16 were neutral, and 101 supported election misinformation, while the remaining videos were irrelevant. I selected the 15 videos that had accumulated the maximum engagement, determined by the number of views, for each stance (except irrelevant) as seeds. Figure 4.3 illustrates the seed video curation method and Table 4.2 presents a sample of seed videos.

4.2.3 Experimental design

To conduct the crowd-sourced audit, I designed a Chrome browser extension named TubeCapture that enabled us to watch videos, conduct searches, and collect various YouTube components from users' browsers. Figure 4.4 presents an overview of the experimental design. To select the study participants, I conducted a screening survey of a large sample of people (details in Section 4.2.4). Next, participants were instructed on how to use TubeCapture and provided with a unique code to activate the extension. Once activated, they used TubeCapture for a period of 9 days. I seeded the extension with 45 seed videos and 88 search queries. For each participant, each day the extension opened YouTube in two browser windows: one standard window and one incognito window. While the personalized results act as the treatment for the experiments, the results obtained from the incognito window act as the control, since YouTube does not personalize content in the incognito browsing window [437]. By comparing the results from the standard and incognito windows, I determine the role of YouTube's personalization algorithms in exposing users to misinformative content. TubeCapture first collected and stored the user's YouTube homepage from the standard and incognito windows. The extension ensured that the user had signed in to their YouTube account in the standard window and remained logged in with the same YouTube account throughout the study period.
Figure 4.4: Panel (a) presents an overview of the crowd-sourced audit of YouTube for election misinformation; panels (b) and (c) show how the extension TubeCapture collected YouTube components from the standard and incognito windows simultaneously.

I also ensured that the homepage from the standard window was stored without the user's email address to preserve the participant's anonymity. Next, the extension opened a previously selected seed video that supports election misinformation, watched it for 2 minutes, saved the video page, clicked on the up-next video, and again saved the video page of the up-next video. This process was repeated until I had collected the video pages of 5 levels of up-next recommendations. I refer to this collection of 5 up-next video recommendations as an up-next trail. Each day I collected up-next trails for five seed videos. Then, the extension again collected the user's homepage, followed by the personalized (via the standard window) and unpersonalized (via the incognito window) search results for the curated search queries. The extension collected the search results for the queries in the same order for every participant to control for carry-over effects of the search queries [187]. For days 1-3, the extension collected up-next trails for seed videos supporting election misinformation. At the beginning of the fourth day, the extension deleted the search and watch history created by the browser extension. According to YouTube, removing an item from the search or watch history removes the impact of consuming that content on future searches and recommendations. This essential step helped in two ways: 1) it ensured that the history created by the extension in the first three days did not impact the rest of the experiment³, and 2) it ensured that the user histories built by the extension did not pollute users' future recommendations and search results after the study period was over. For days 4-6, the extension collected up-next trails for seed videos that were neutral in stance. At the beginning of the seventh day, the search and watch history developed by the extension was again deleted. For days 7-9, the extension collected up-next trails for opposing seed videos.
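For concreteness, the nine-day protocol can be summarized as a simple schedule. The sketch below is purely illustrative; the actual TubeCapture extension was written in JavaScript, so every name here is hypothetical.

```python
# Purely illustrative: the nine-day protocol above expressed as a schedule a
# driver script could follow (not the extension's actual implementation).
SEED_STANCE_BY_DAY = {1: "supporting", 2: "supporting", 3: "supporting",
                      4: "neutral", 5: "neutral", 6: "neutral",
                      7: "opposing", 8: "opposing", 9: "opposing"}
DELETE_HISTORY_BEFORE = {4, 7}   # history wiped at the start of days 4 and 7
DELETE_HISTORY_AFTER = {9}       # and again at the end of day 9

def plan_for_day(day):
    """Actions performed on one study day, following the design described above."""
    return {
        "delete_history_first": day in DELETE_HISTORY_BEFORE,
        "collect_homepage": True,
        "up_next_trails": {"seed_stance": SEED_STANCE_BY_DAY[day],
                           "seeds_per_day": 5, "trail_depth": 5},
        "search_queries": 88,     # personalized and incognito SERPs per query
        "delete_history_last": day in DELETE_HISTORY_AFTER,
    }

print(plan_for_day(4))
```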
Towards the end of the ninth day, the extension again deleted the YouTube history it had developed. All the data collected by the extension was sent to a back-end server. The participants were instructed on how to remove the extension after the study period was over. My mixed design allows me to test how YouTube's algorithm fares under different conditions (watching videos of different stances) for individuals with different political beliefs. Note that I did not opt for randomized assignment in a between-subject design since it would require a large number of participants to test all the conditions (3 political affiliations x 3 misinformation stances).

I built the TubeCapture extension using JavaScript libraries. The back-end server was set up using Flask and Nginx. I load-tested the server using JMeter, ensured that it could simultaneously handle 500 GET and 200 POST requests, and added mechanisms to handle errors and server timeouts. I used a MySQL database to store the data collected by the extension. The communication between the extension and the back-end server was encrypted using SSL. Note that to collect data, TubeCapture opened windows in the background of the currently active browser window, thereby allowing participants to continue working on their device while the extension was running. In case a participant accidentally closed any of the windows opened by the extension, I informed them via a pop-up window and instructed them on how to resume running the extension. After building the TubeCapture extension, I tested it within my research group and conducted three pilot studies. The aim of the pilot studies was to fix technical issues, examine the impact of running the extension on devices with different configurations, RAM, and operating systems, and improve the usability of the extension.

³To test whether the YouTube algorithm discards the deleted history while making recommendations, the first author ran two test runs for two topics: the presidential elections (the audit topic) and OLED TVs. They built the account history of a brand new YouTube account for 3 days using a few videos and search queries related to the topic, after which they deleted the search and watch history. The author then manually inspected the homepage recommendations and the video recommendations of the top five videos present on the homepage and found that the effect of history deletion is almost immediate.

4.2.4 Screening and study survey

In order to select participants for the study, I screened users according to several criteria. To be eligible, users had to 1) be 18 years of age or older, 2) reside in the United States, 3) have a YouTube account, 4) consume content on YouTube primarily in the English language, 5) have the Chrome browser installed, 6) be willing to run a Chrome browser extension for 9 days, and 7) have at least 8GB of RAM on their device to ensure the smooth running of the extension⁴. Users who passed the screening survey were then sent a study survey. The study survey contained questions about users' demographics, political affiliation, YouTube usage, trust in online information, their opinion on personalization and bias in various components of YouTube, and their views on the results of the 2020 presidential elections as well as the conspiracies surrounding the elections. I also included two attention-check questions. The study survey was also used for screening participants.
I disqualified users who 1) answered both attention-check questions incorrectly, 2) did not frequently use YouTube, or 3) did not use YouTube to access news or information about the 2020 presidential elections. I also used the survey responses to obtain a balanced number of participants across the three political affiliations (Democrats, Republicans, and Independents). Later in the recruitment phase, I already had enough Democrats and Independents as participants and thus added being a Republican as a qualifying criterion in the study survey.

⁴I warned users against participating in the study if their device's RAM was less than 8GB and informed them that their device or browser might hang in such a situation.

4.2.5 Recruitment and study deployment

For the pilot studies, I recruited users from a combination of platforms: Reddit⁵, Facebook ads, Twitter, and Amazon Mechanical Turk (AMT). The retention rate was highest for participants recruited from Twitter and AMT, so I used these two platforms to recruit participants for the main study. Out of the 575 users who submitted the screening survey, 400 qualified and 99 participated in the study. Out of the 99 participants, 94 ran the extension for the entire study duration. Overall, my study sample of 99 users consisted of 60.6% males and 39.39% females, was predominantly White/Caucasian (60.6%), and the majority (53.53%) of the participants had a bachelor's degree. Politically, 39.39% of the participants were Democrats, 34.34% Independents, and 26.26% Republicans. Based on the results of the 2020 presidential elections⁶, 66.67% of the participants lived in blue states and 32.32% in red states, while one individual resided in Puerto Rico⁷.

⁵https://www.reddit.com/r/SampleSize/
⁶https://www.politico.com/2020-election/results/president/
⁷Puerto Rico is not a state but an unincorporated territory of the United States.

4.2.6 Developing the data annotation scheme

Developing the qualitative coding scheme to label YouTube videos for election misinformation was hard and time-consuming, requiring four rounds of discussions and consultation with an expert to reach a consensus on the annotation heuristics. In the first round, an undergraduate research assistant and I sampled 196 YouTube videos from Abilov et al.'s YouTube dataset [37] and annotated the videos separately. We used prior work on election misinformation narratives [147] and YouTube's content policy [436] as references to identify election misinformation, and came up with an initial annotation scale and heuristics to classify the videos. We then came together to reach a consensus on the annotation values. However, even after multiple rounds of discussion, the annotations diverged for 33.6% of the videos. I then conducted additional rounds of annotation exercises with seven researchers, five of whom had extensive work experience on online misinformation. In every round, the researchers independently annotated 15 videos and later discussed each video's annotation value and their annotation process. I also reached out to a postdoctoral researcher with extensive research experience on online multi-modal election misinformation for feedback. Based on the insights provided by the external researchers and the postdoc, I refined the annotation criteria and heuristics⁸. Below, I describe the annotation guidelines and heuristics in detail.
4.2.6.1 Annotation guidelines

In order to annotate a YouTube video, the annotators were required to go through several fields present on the video page in the following order: the title and description, the overall premise of the video (which could be determined by going through the video transcript or watching the video content), and the channel bias. I encouraged annotators to perform an online search to gain more contextual information about events or individuals discussed in the video that they were unaware of. This strategy is grounded in the lateral reading technique often used by fact-checkers for credibility assessments [424]. Note that I did not ask annotators to consider video comments because I found during the annotation exercises that comments could be misleading. For example, the video "Dominion Voting Systems representative demonstrates voting machines" (Q7kPSzYsR6Y) contains a demonstration of Dominion voting machines; however, the comments suggest that the video supports misinformation.

4.2.6.2 Annotation heuristics

In this section, I describe the annotation scale and heuristics.

Supporting election misinformation (1): This category includes YouTube videos that support or provide evidence for misleading narratives around the presidential elections. I did not include videos showing incidents of mail dumping, destroyed ballots, etc. in isolation. However, if a video uses these incidents to push a specific narrative or agenda, such as undermining confidence in mail-in voting, then I considered it as supporting misinformation. I also considered live YouTube videos (live press conferences, court hearings, etc.) that highlighted voter fraud claims without giving any additional context in the title, description, or beginning of the video as supporting misinformation. Examples of videos in this category include "NO RETREAT! America Is About To #StopTheSteal | Good Morning #MugClub" (Xqcwzi8Onsk), whose title, description, and content hint at massive voter fraud incidents in the US 2020 presidential elections, and "LIVE: Trump Legal Team Presents CLEAR Evidence of Fraud Before Georgia Senate Committee 12/3/20" (e35f4pUIYOg), which contains live footage capturing the testimony of individuals claiming the occurrence of voter fraud in the 2020 presidential elections; the video's description, title, and beginning contain no statements questioning or contradicting the claims of widespread voter fraud.

⁸It is important to note that all annotators and the postdoctoral researcher are left- and center-left-leaning individuals, which may have affected how the content of the YouTube videos was perceived and how the annotation heuristics were developed.
Neutral (0): I consider videos neutral when they are related to the 2020 elections but do not support or oppose the false narratives surrounding the elections. For example, the video "WATCH: The first 2020 presidential debate" (w3KxBME7DpM) is considered neutral since it covers the first presidential debate of the elections.

Opposing (-1): I annotate videos as opposing when they oppose or debunk the misinformation narratives around the 2020 US presidential elections. I also include satire videos making fun of the misinformative claims in this category. An example is the video "Trump Has Yet To Show Real Evidence Of Fraud, But Getting Him Out Of Office May Be A Bumpy Ride" (7mJwuKhfvqY), whose title and description indicate that Donald Trump made false claims of massive voter fraud.

Other annotations: I mark a video as Irrelevant (2) if its content is not related to the presidential elections, as URL not accessible (3) if the YouTube video was not accessible at the time of annotation, and as Other languages (4) when the content, title, or description of the YouTube video was in a language other than English.

4.2.7 Classifying YouTube videos for election misinformation

The crowd-sourced audit experiment resulted in ∼47K unique YouTube videos and 35 unique YouTube Shorts⁹. Given the large number of videos, I scaled the annotation process using a machine learning classifier. In this section, I present the method of creating the ground truth dataset, a description of the features used in the classification model, the model architecture, and the results of the classification.

4.2.7.1 Creating a ground truth dataset

Two researchers manually annotated 1,196 videos using the guidelines and heuristics described in Section 4.2.6. I obtained annotations for 545 additional videos using AMT; Figure 4.5 illustrates the process of obtaining video annotations from AMT workers.

Figure 4.5: The process of obtaining YouTube video annotations from AMT workers. Workers (approval rating > 90, residing in the US) were screened via a qualification test in which they were first trained with detailed descriptions of the annotation labels and then asked to annotate three YouTube videos whose labels were known in advance. Workers who correctly labeled all three videos proceeded to the annotation task, where they annotated videos and provided a rationale. To ensure that the description of the annotation labels and task was clear and comprehensive, I posted on r/mturk, a subreddit community of AMT workers, and on AMT workers' unofficial Slack channel, and released the qualification test and annotation task after receiving positive feedback from the AMT community.

Overall, the ground truth dataset contained 1,741 videos, of which 124 were supporting¹⁰, 257 opposing, 228 neutral, and 1,132 irrelevant.

4.2.7.2 Feature description

I considered the following features for the classifier.

Snippet (title + description): I concatenated the title of the YouTube video with its description, as done by [303], and used the concatenated string as a feature.

Transcript: The transcript contains the textual content of the video. I use the transcripts auto-generated by YouTube.

Tags: Video tags are words that a content creator associates with their video while uploading it to the platform.

Video statistics: Video statistics include the number of views, likes, and comments, and the date of publication.

Channel bias: Since election misinformation is closely entangled with political beliefs [77, 264], I used the partisan bias of YouTube channels as a feature. Using existing datasets on media bias and manual annotations (described in the next section), I annotated YouTube channels' partisan bias on a 5-point scale from far-left to far-right.

⁹YouTube Shorts are short YouTube videos with lengths of 60 seconds or less.
¹⁰Of these, 67 videos had been removed from the platform at the time of analysis.
Apart from the features listed above, I also tried several other features, such as the LIWC dictionary [376], credibility cues [283], and hashtag matching from the Voter Fraud dataset [37] on the text features, but they did not improve performance; therefore, I do not discuss them in detail. Recall that while manually annotating the videos, I discovered that comments are not a good indicator of the veracity of a video. Therefore, I chose not to include them in the feature set.

Table 4.3: A sample of classifiers and feature sets with the progression of performance (Accuracy / F1).
SVM [Video engagement statistics]: 0.38 / 0.14
SVM [Snippet + FastText]: 0.61 / 0.56
SVM [Transcript + FastText]: 0.58 / 0.51
SVM [Tags + FastText]: 0.59 / 0.53
SVM [Snippet, Transcript, Tag + FastText]: 0.63 / 0.57
SVM [Snippet, Transcript, Tag + Count]: 0.65 / 0.58
SVM [Snippet, Transcript, Tag + TFIDF]: 0.71 / 0.65
SVM [Snippet, Transcript, Tag, Channel Bias + Sentence Transformer]: 0.73 / 0.69
SVM [Snippet, Transcript, Tag, Channel Bias + TFIDF]: 0.74 / 0.70
SGD [Snippet, Transcript, Tag, Channel Bias + TFIDF]: 0.64 / 0.57
KNN [Snippet, Transcript, Tag, Channel Bias + TFIDF]: 0.61 / 0.58
XGB [Snippet, Transcript, Tag, Channel Bias + TFIDF]: 0.74 / 0.68
Voting (SVM + SGD + KNN + XGB) [Snippet, Transcript, Tag, Channel Bias + TFIDF]: 0.75 / 0.71
SVM [Snippet, Tag, Channel Bias + TFIDF + SMOTE + additional training data]: 0.91 / 0.90
XGB [Snippet, Tag, Channel Bias + TFIDF + SMOTE + additional training data]: 0.91 / 0.91

4.2.8 Annotating YouTube channels for partisan bias

The dataset of unique videos came from a large number of YouTube channels (∼17.5K) and comprised channels devoted to both news and non-news content. I coded the leaning of each channel on a 5-point scale (far-left, center-left, neutral, center-right, and far-right) using computational methods and several heuristics. First, to identify news-related channels, I used several pattern-matching techniques (e.g., finding the keyword news in the channel's name) and discovered a total of 802 news channels. Then I used existing datasets on media bias from mediabiasfactcheck.com and allsides.com to annotate the channels. For channels whose annotations were not available in those datasets, I manually went through their title, description, sample videos, and related information from their website, Wikipedia, and/or a Google search to identify their leaning or the leaning of their affiliations. Many local news channels, such as KHOU¹¹ or KPRC¹², are affiliated with national channels. If I did not find bias ratings for such local channels, I assigned them the label of their affiliation; for example, KHOU is associated with the center-left CBS and was thus also assigned a center-left rating. I assigned channels that did not fall under the news category the neutral label. I manually checked a random sample (n=50) of non-news channels and found only one channel with news content. Therefore, this process produced channel bias annotations (to be used as a feature in the classifier) with reasonable accuracy for the study, given that channel bias detection is not the main focus of this work.

¹¹https://www.youtube.com/c/KHOU
¹²https://www.youtube.com/c/KPRC2Click2Houston
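As a rough illustration of these heuristics, the sketch below combines keyword matching for news channels with a bias lookup and an affiliation fallback. The lookup tables are tiny placeholder stand-ins for the mediabiasfactcheck.com and allsides.com data and the manual annotations, not the study's actual tables.

```python
import re

# Placeholder lookup tables; in the study these came from mediabiasfactcheck.com,
# allsides.com, and manual annotation.
NEWS_PATTERN = re.compile(r"\bnews\b|breaking news|current affairs", re.I)
KNOWN_BIAS = {"CNN": "center-left", "Fox News": "center-right",
              "CBS News": "center-left"}          # assumed example entries
AFFILIATIONS = {"KHOU": "CBS News"}               # local channel -> national affiliate

def channel_bias(name, description=""):
    """Assign a partisan-bias label to a YouTube channel using the heuristics above."""
    if not NEWS_PATTERN.search(f"{name} {description}"):
        return "neutral"                          # non-news channels default to neutral
    if name in KNOWN_BIAS:
        return KNOWN_BIAS[name]
    affiliate = AFFILIATIONS.get(name)            # fall back to the affiliate's rating
    if affiliate in KNOWN_BIAS:
        return KNOWN_BIAS[affiliate]
    return "unlabeled"                            # left for manual annotation

print(channel_bias("KHOU", "Houston local news"))  # -> center-left via the CBS affiliation
```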
4.2.8.1 Classifier Selection

To find a classifier that performs well on the dataset, I applied a series of machine learning classifiers to several combinations of feature sets. To create feature vectors, I tested two types of word vectors (count and tf-idf vectors) and two types of sentence vectors (FastText¹³ and BERT [116]). For word vector generation, I cleaned the dataset by removing stop words and lemmatizing, followed by generating up to 3-grams. To deal with the class imbalance in the dataset, I used the Synthetic Minority Over-sampling Technique (SMOTE) [90]. I applied several classifier models to the feature sets, including support vector machines, stochastic gradient descent, decision trees, nearest neighbors, and ensemble models. To find the best model, I performed a grid search with five-fold cross-validation over the standard parameter space of each classifier. For the sake of brevity, I only show a sample of the tested combinations in Table 4.3. Out of all the combinations, both SVM and XGBoost performed best (accuracy = 91%) when trained with the snippet, tags, and channel bias features and a tf-idf text vectorizer¹⁴. Based on the Occam's razor principle [103], I selected SVM as the final classifier, i.e., the simplest model with the maximum accuracy. Using this classifier, I determined the annotation labels for the remaining videos. In total, the dataset consisted of 431 supporting, 1,868 opposing, 1,658 neutral, and 43,041 irrelevant videos.

¹³https://fasttext.cc/
¹⁴If I merge the irrelevant and neutral videos into one class, resulting in a three-class classification problem, the SVM classifier achieves 93% accuracy.
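The winning configuration can be approximated with scikit-learn and imbalanced-learn as sketched below. This is not the study's code: the toy records, field names, and hyperparameters are assumptions, and the actual grid search explored a wider parameter space.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline

# Toy stand-ins for the ground-truth records; field names and values are
# assumptions, not the study's data.
videos = [
    {"snippet": "example title and description", "tags": "example tag",
     "channel_bias": "center-left", "label": -1},   # opposing
    {"snippet": "another title and description", "tags": "another tag",
     "channel_bias": "center-right", "label": 1},   # supporting
]
texts = [" ".join([v["snippet"], v["tags"], v["channel_bias"]]) for v in videos]
labels = [v["label"] for v in videos]

# TF-IDF over the concatenated text features, SMOTE to oversample the minority
# (supporting) class, and a linear SVM, mirroring the best rows of Table 4.3.
classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), stop_words="english"),
    SMOTE(random_state=0),
    LinearSVC(),
)
# classifier.fit(texts, labels)   # in practice, fit on the full 1,741-video ground truth
print(classifier)
```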
4.3 Ethical considerations

The browser extension TubeCapture uses crowd workers' YouTube accounts to watch videos (including videos containing election misinformation) and conduct searches on the platform. It was possible that participants would see more misinformation than they otherwise would have, both during and after the research study, due to the watch and search history built during the audit. In order to eliminate the potential harm of my experiments, I included two essential steps in the experimental design. First, the extension always opened the browser windows in the background so that participants did not actively see the videos being played. Second, the extension deleted the users' search and watch history built during the study period. Note that YouTube allows the deletion of items from the search and watch history for a specific date range. YouTube's website [404, 435] clearly states that "search entries you delete will no longer influence your recommendations. At any time you can (also) remove videos (from watch history) to influence what YouTube recommends to you". I explicitly informed users that their YouTube history from the study period would be deleted. I ensured that the extension expired after the study period so that it would not perform any further actions. In addition, I ensured that the YouTube pages saved by the extension did not contain users' personally identifiable information such as email addresses.

4.4 RQ1 Results: Extent of Personalization

To measure the extent of personalization in YouTube components, I compare the personalized list of video URLs present in the standard window with the baseline unpersonalized videos obtained from the incognito window. Below, I discuss the metrics that I used to quantify personalization.

Measuring personalization in web search: To determine personalization in search results, I employ two metrics: the Jaccard index and rank-biased overlap (RBO). The Jaccard index measures the similarity between two lists and has been used in several previous audit studies to measure personalization in web search [187, 223, 236]. However, the Jaccard index does not take into account the rank of the lists being compared. Thus, I also used the RBO metric introduced by Webber et al. [413], which takes into account the order of elements in the lists. The RBO function includes a parameter p that indicates the top-weightedness of the metric, i.e., how much the metric penalizes differences in the top rankings. A previous audit study used the click-through rate (CTR) of Google search results to estimate the value of p [331]. Because of the lack of CTR statistics for YouTube, I consider the default value of p, which is 1 (prior audit studies such as [249] opted for a similar approach), indicating that differences at all ranks are penalized equally. Both the Jaccard and RBO scores range between 0 and 1, with 1 indicating that the two lists contain the same elements and 0 indicating that the lists are completely different.

Measuring personalization in up-next trails: To measure personalization in up-next trails, I employ the Jaccard index and the Damerau-Levenshtein (DL) distance [110]. DL distance is an enhanced version of edit distance that counts transpositions in addition to the insertions, deletions, and substitutions required to make the treatment list identical to the control list. DL distance has been used in prior audit work as a metric to estimate the ranking differences between two lists [81]. It returns a score from 0 to 1 (identical lists) indicating how similar the two lists are. I refrain from using the RBO metric to determine personalization in up-next trails because RBO is suited to indefinite lists, while the trails collected through the experiments have a known maximum length of five. I also refrain from using the Kendall tau metric since it requires the two ranked lists being compared to be conjoint¹⁵. Given that Jaccard, RBO, and DL distance return similarity values, I define personalization as

    personalization = 1 − similarity_metric(URL_incognito, URL_standard).    (4.1)

¹⁵There are alternative versions of Kendall tau that assume the dissimilar elements are present at the end of the list. However, conceptually, the metric does not fit my collected trail data.
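To make Equation 4.1 concrete, here is a small self-contained sketch of the Jaccard-based and RBO-based personalization scores. Reading RBO at p = 1 as the unweighted average of the overlap at each depth is my simplification; the precise formulation follows Webber et al. [413], and the example lists are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two lists of video URLs (rank ignored)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def rbo_p1(a, b):
    """RBO read at p = 1 as the unweighted average of the overlap at each depth,
    so that differences at all ranks are penalized equally."""
    k = max(len(a), len(b))
    return sum(len(set(a[:d]) & set(b[:d])) / d for d in range(1, k + 1)) / k

def personalization(incognito, standard, similarity=jaccard):
    """Equation 4.1: 1 - similarity(incognito list, standard list)."""
    return 1.0 - similarity(incognito, standard)

# Hypothetical SERPs (video IDs) from the incognito and standard windows.
incognito = ["v1", "v2", "v3", "v4", "v5"]
standard  = ["v1", "v3", "v2", "v6", "v5"]
print(personalization(incognito, standard, jaccard))   # ~0.33
print(personalization(incognito, standard, rbo_p1))    # ~0.19
```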
4.4.1 RQ1a: Personalization in search results

When asked in the study survey how much YouTube personalizes search results (Figure 4.6a), 34.34% of participants believed YouTube personalizes search results to a great extent, while 19.19% believed the extent of personalization to be very little. On quantitatively measuring the extent of personalization in YouTube search results, I found little to no personalization, indicating that the search results present in the standard and incognito windows are highly similar. Figures 4.6b and 4.6c show the extent of personalization in SERPs calculated using the Jaccard index and the RBO metric, respectively, for Democrats, Republicans, and Independents on each day of the experiment run. I did not find any significant difference in the personalization values of SERPs for participants with respect to their political leaning.

Figure 4.6: RQ1a results. Panel (a) shows participants' responses to the survey question "How much, if at all, do you think YouTube personalizes search results" (very little: 19.2%, somewhat: 46.5%, to a great extent: 34.3%). Panels (b) and (c) show personalization calculated via Jaccard index values and RBO metric values, respectively, in YouTube's standard-incognito SERP pairs for each day of the experiment.

4.4.2 RQ1b: Personalization in up-next trails

When asked how much YouTube personalizes up-next recommendations, 51.5% of participants believed that YouTube personalizes up-next recommendations to a great extent (see Figure 4.7a). The quantitative measurements are in line with this belief, showing that up-next trails are highly personalized. Figures 4.7c and 4.7d show the extent of personalization in up-next trails using the Jaccard index and DL distance. The graphs indicate that the up-next trails obtained from the users' standard and incognito windows are highly dissimilar and thus highly personalized. A statistical test revealed that the amount of personalization in trails with supporting, neutral, and opposing seeds is significantly different [F(2)=15.2, p<0.0001]. A post hoc test revealed that up-next trails with seed videos opposing misinformation have less personalization (a higher Jaccard index¹⁶) than up-next trails with supporting and neutral seed videos.

Next, I checked the influence of users' subscriptions on the personalized trails. 81 of the 99 participants had subscribed to at least one YouTube channel (mean=109.4, median=31, SD=207.8). The maximum number of subscriptions for a participant was 1073 and the minimum was 1. The participants had subscribed to 7670 unique channels, of which 79 either no longer existed or had been suspended for violating YouTube's moderation policy; I did not consider these channels for the analysis.

¹⁶The Jaccard index values obtained were highly correlated with the DL distance scores (Pearson correlation coefficient = 0.96). Thus, I used the Jaccard index values to perform the statistical test.
Figure 4.7: RQ1b results. Panel (a) shows participants' responses to the survey question "How much, if at all, do you think YouTube personalizes up-next recommendations" (very little: 9.1%, somewhat: 39.4%, to a great extent: 51.5%). Panel (b) shows the distribution of the percentage of YouTube videos recommended to the study participants from their subscribed channels. Panels (c) and (d) show personalization calculated via Jaccard index values and DL distance metric values, respectively, in YouTube's standard-incognito up-next trail pairs, shown separately for trails with supporting seeds (days 1-3), neutral seeds (days 4-6), and opposing seeds (days 7-9).

To determine how many video recommendations in users' up-next trails came from their subscriptions, I first extracted, for each user, the unique videos recommended across all of the up-next trails collected for that user. I then filtered and counted the videos coming from the user's subscribed channels. Figure 4.7b shows the distribution of the percentage of videos recommended to the participants in up-next trails that come from their subscribed channels. This percentage is moderately correlated with the number of channels subscribed to (r=0.61) and highly correlated with the number of news-related channels subscribed to¹⁷ (r=0.71).

¹⁷To get a rough estimate of YouTube channels that broadcast news, I considered the news sources from mediabiasfactcheck.com and allsides.com. Additionally, I extracted the description of each channel and categorized it as a news channel if the description contained terms such as 'breaking news', 'politic*', 'current affairs', 'government', 'national tv', 'national news', 'international news', 'world news', 'global news', or 'wall street'. These terms were curated by the first author after manually going through the descriptions of 50 national and regional news channels on YouTube. I found that 44 users had subscribed to news- and politics-related channels.

4.5 RQ2 Results: Amount of Misinformation

When asked how much they trust the credibility of videos in search results and recommendations, fewer than 20% of participants reported that they trust the credibility of the content shown to them by YouTube to a great extent (Figure 4.8). To determine how much credible information YouTube actually presents to users, I quantify the misinformation present in the YouTube components by adopting the misinformation bias score developed in Chapter 3. The score quantifies the misinformation in a ranked list and is calculated as

    misinformation bias = ( Σ_{r=1}^{n} x_r · (n − r + 1) ) / ( n(n+1)/2 ),

where x_r is the annotation of the video at rank r and n is the total number of videos present in the SERP or up-next trail. To conform to the video annotation scale developed in Chapter 3, I map the annotation values to a normalized scale of -1, 0, and 1. I assign scores of -1 and 1 to videos opposing and supporting election misinformation, respectively. Videos marked as irrelevant, neutral, belonging to a non-English language, or removed from the platform are assigned a score of 0. Thus, the misinformation bias score of a SERP or trail is a continuous value ranging from -1 (all videos oppose election misinformation) to +1 (all videos support election misinformation). Note that a positive score indicates a lean towards misinformation, while a negative score indicates a lean towards content opposing misinformation.
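A minimal sketch of the misinformation bias score defined above; the example SERP annotations are hypothetical.

```python
def misinformation_bias(annotations):
    """Rank-weighted misinformation bias of a ranked list (SERP or up-next trail).
    `annotations` holds the per-video scores in rank order: +1 supporting,
    -1 opposing, 0 for neutral, irrelevant, removed, or non-English videos."""
    n = len(annotations)
    weighted_sum = sum(x * (n - r + 1) for r, x in enumerate(annotations, start=1))
    return weighted_sum / (n * (n + 1) / 2)

# Hypothetical top-10 SERP: two supporting videos near the top, three opposing
# videos lower down, and the rest neutral or irrelevant.
print(misinformation_bias([1, 1, 0, 0, -1, 0, -1, 0, -1, 0]))  # ~0.13, leans toward misinformation
```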
For the analysis, I consider the top ten search results and the five consecutive videos in the up-next trails.

4.5.1 RQ2a: Misinformation in search results

The results of RQ1 showed that YouTube's SERPs are only very slightly personalized, suggesting that the search results present in the standard and incognito windows are mostly similar. Therefore, to quantify the misinformation bias in SERPs, I only consider the SERPs obtained from the standard YouTube windows of all the participants. I first calculated the average misinformation bias score for each of the 88 search queries over the 9 days of the experiment run across all 99 participants. Figure 4.9 shows the distribution of misinformation bias scores for all the search queries. I observe that the average misinformation bias scores of 84 of the 88 search queries are negative, indicating that the search results contain more videos that oppose election misinformation than videos supporting election misinformation¹⁸.

Figure 4.8: RQ2: Participants' responses to the survey question "How much do you trust the credibility of information present in the (a) search results and (b) up-next videos recommended by YouTube" (search results: not at all 4.0%, very little 16.2%, somewhat 60.6%, to a great extent 19.2%; up-next recommendations: not at all 7.1%, very little 18.2%, somewhat 60.6%, to a great extent 14.1%).

Figure 4.9: RQ2a results: Mean misinformation bias scores for the 88 search queries across all participants. A negative score indicates that SERPs contain more videos opposing election misinformation.

Table 4.4: The misinformation bias scores form a bimodal distribution, each mode constituting a cluster of similar queries; the table describes the clusters and presents sample queries for each.
Cluster 1, search queries containing the keyword fraud in conjunction with the keywords voter, election, and dominion: voter fraud evidence, dominion voter machine scandal, sharpie voter fraud, election fraud 2020, election fraud whistleblower.
Cluster 2, search queries containing the keywords election and 2020: trump biden general election, presidential election 2020, presidential election results 2020, mail in ballots 2020.

Furthermore, I observe in Figure 4.9 that the misinformation bias scores of the SERPs form a bimodal distribution constituting two clusters of search queries (Table 4.4). The cluster 1 search queries have the most negative bias, i.e., they contain more opposing videos. This cluster mostly consists of search queries containing the keyword fraud in conjunction with the keywords voter, election, and dominion.

¹⁸Only four search queries in the query set ('stop the seal', 'voting machine fraud', 'ballots in garbage', and 'ballots thrown out') have a positive misinformation bias.
Figure 4.10: RQ2a results: (a) The search queries with the highest and the lowest mean misinformation bias scores (stop the steal, voting machine fraud, ballots in garbage, ballots thrown out, us elections 2020, pennsylvania voter fraud claims, voter fraud evidence, ballot fraud, electoral fraud, ballot box fraud). Positive misinformation bias scores indicate a lean towards misinformation, whereas negative bias scores indicate a lean towards information that opposes misinformation. (b) The distribution of misinformation bias scores of search queries for Democrats, Republicans, and Independents on each day of the experiment run. The bias scores for participants of different political leanings coincide, indicating that the misinformation bias in SERPs remains constant for each participant.

Cluster 2, on the other hand, consists of search queries with the keywords election and 2020. Overall, cluster 1 contains more search queries biased towards finding misinformation than cluster 2. This indicates that YouTube pays more attention to search queries about election fraud and ensures that users are exposed to opposing videos when searching about the fraudulent claims surrounding the elections. Figure 4.10a shows the five search queries with the highest and the five with the lowest misinformation bias. The search query 'voter fraud claims' has the least misinformation bias, indicating that most of the search results for this query oppose election misinformation. On the other hand, the search query 'stop the seal' has the most videos supporting election fraud claims. Next, I determine how the misinformation bias scores in SERPs vary for Democrats, Independents, and Republicans. Figure 4.10b shows that the bias values for Democrats, Independents, and Republicans coincide on all days, indicating that the amount of misinformation bias is almost constant across all days for all participants, irrespective of their partisanship. Overall, the RQ2 results indicate that YouTube pushes debunking information in search results, more so for search queries about voter fraud claims than for generic queries about the presidential elections.

Figure 4.11: RQ2b results: Mean misinformation scores of standard up-next trails with seed videos that are supporting (S), neutral (N), or opposing (O) election misinformation, for Democrats, Republicans, and Independents. A positive misinformation score indicates a lean toward misinformative content, while a negative score indicates a lean toward content that opposes election misinformation. Statistical tests reveal a significant difference in the amount of misinformation contained in the up-next trails: Democrats, Republicans, and Independents all encounter more misinformation in supporting trails than in neutral trails, and more in neutral trails than in opposing trails. Democrats: S=0.28, N=-0.02, O=-0.49 (F(2,3407)=4035.1, p=0; S>N>O). Republicans: S=0.31, N=0.01, O=-0.49 (F(2,2265)=2981.4, p=0; S>N>O). Independents: S=0.33, N=-0.01, O=-0.51 (F(2,2941)=3593.8, p=0; S>N>O).
4.5.2 RQ2b: Misinformation in up-next trails

The results of RQ1 showed that participants' up-next trails are highly personalized. In other words, the videos in up-next trails obtained from the standard window are different from the videos in trails obtained from the incognito window. Recall that the trails extracted from the incognito window act as the baseline unpersonalized trails, while the trails extracted from the standard window, where users had signed in to their accounts, act as the personalized treatment trails. Therefore, to determine the impact of personalization on the amount of misinformation in up-next trails, I compare the misinformation bias scores of the trails collected in the standard windows with those of the trails collected in the incognito windows. I find that the difference in misinformation bias scores between standard and incognito up-next trails is not significant (t=-0.62, p=0.53). This means that although the standard up-next trails are very different from the incognito up-next trails, there is no difference in the amount of misinformation present in them. To avoid inflating the sample size, for further downstream analysis, I only consider the up-next trails obtained from participants' standard windows. A similar strategy was adopted by Robertson et al. for analyzing bias in Google search results, when they did not see any significant difference in the amount of partisan bias in incognito-standard SERP pairs [331].

4.5.2.1 Misinformation in standard up-next trails for different scenarios

In this section, I determine the amount of misinformation encountered by the study participants in the standard up-next trails for seed videos with different stances on election misinformation (supporting, neutral, and opposing). Figure 4.11 shows the mean misinformation scores of the different up-next trails collected from the standard windows of Democrats, Republicans, and Independents. Recall that a positive misinformation score (>0) indicates a lean towards misinformation, while a negative misinformation score indicates a lean towards information that opposes election misinformation. I conduct within-group statistical tests to determine the difference in misinformation for the three scenarios (following trails for supporting, neutral, and opposing seed videos). The tests indicate a filter bubble effect. If users watch supporting videos, they are led to supporting videos in the trails. If they watch neutral videos, they are led to less misinformation than when they watched supporting videos. And if users watch opposing videos, they are led to more opposing videos in the up-next trails. The same trend is observed for Democrats, Republicans, and Independents.
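The within-group comparisons reported in Figure 4.11, and the between-group comparisons discussed next, can be carried out with standard scipy routines. The score lists below are placeholders, not the study's data.

```python
from scipy.stats import f_oneway, kruskal

# Placeholder misinformation-bias scores for one participant group's trails,
# grouped by the stance of the seed video.
supporting_trails = [0.30, 0.25, 0.40, 0.20]
neutral_trails    = [0.00, -0.05, 0.02, 0.01]
opposing_trails   = [-0.50, -0.45, -0.60, -0.40]

# Within-group comparison across seed stances (as in Figure 4.11).
print(f_oneway(supporting_trails, neutral_trails, opposing_trails))

# The between-group comparisons reported next use the non-parametric
# Kruskal-Wallis test, grouping the same kind of scores by political
# affiliation instead; the call shape is identical.
print(kruskal(supporting_trails, neutral_trails, opposing_trails))
```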
Is the amount of misinformation in trails with different seeds different for Democrats, Republicans, and Independents? Between-group statistical tests reveal that the amount of misinformation in supporting trails (KW H(2)=11.9, p=0.002) and neutral trails (KW H(2)=8.69, p=0.01) for Democrats, Independents, and Republicans is significantly different. I find that Independents receive more misinformation in their supporting trails than Democrats. Additionally, Republicans receive more misinformation in their neutral trails than Democrats. Overall, observing Figure 4.11, I note that the misinformation scores of the supporting trails are positive and those of the opposing trails are negative. However, the magnitude of the misinformation scores of the opposing trails is much larger than that of the supporting trails, indicating that the strength of the filter bubble effect was greater when the study participants watched videos opposing election misinformation.

4.5.2.2 Transitions in standard up-next trails

In this section, I gain more insight into the anatomy of YouTube's up-next trails by studying the various transitions present in them. This allows me to determine how users get pushed towards misinformative or debunking videos in the trails. Since the annotation scale consists of three values, supporting (S), neutral (N), and opposing (O), there are 9 possible transitions in the trails (S->S, S->N, S->O, N->S, N->N, N->O, O->S, O->N, O->O). For each participant, I first individually determine the percentage of each of these transitions in the three types of standard up-next trails collected (those starting with a supporting, a neutral, and an opposing seed video). I then calculate the mean percentage of each of these transitions for Democrats, Independents, and Republicans.

Figure 4.12: RQ2b results: Mean percentage of the various transitions present in the standard up-next trails of Democrats, Independents, and Republicans, shown separately for (a) trails with seed videos supporting election misinformation, (b) trails with neutral seed videos, and (c) trails with seed videos opposing election misinformation. S represents a video supporting election misinformation, N a neutral video, and O a video opposing election misinformation; the transition S->S denotes that a video supporting election misinformation leads to an up-next recommendation that also supports election misinformation.
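To make the computation concrete, the sketch below derives the transition percentages for a single trail from its per-video annotations; the example trail is hypothetical.

```python
from collections import Counter

LABEL = {1: "S", 0: "N", -1: "O"}

def transition_percentages(trail):
    """Percentage of each adjacent-pair transition (e.g. 'S->N') in one
    up-next trail, given the per-video annotations in watch order."""
    pairs = [f"{LABEL[a]}->{LABEL[b]}" for a, b in zip(trail, trail[1:])]
    counts = Counter(pairs)
    return {t: 100 * c / len(pairs) for t, c in counts.items()}

# Hypothetical trail: supporting seed, then neutral, neutral, opposing, opposing.
print(transition_percentages([1, 0, 0, -1, -1]))
# -> {'S->N': 25.0, 'N->N': 25.0, 'N->O': 25.0, 'O->O': 25.0}
```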
From Figure 4.12, I see that the most frequent transition across all participants and all types of up-next trails is N->N. Problematic transitions like S->S and O->S account for less than 2% of transitions in the trails of all users. However, S->S transitions are comparatively more frequent in the supporting up-next trails of Independents (1.78%) than in those of Democrats (0.38%) and Republicans (0.86%). In the neutral up-next trails of Republicans and Independents, N->S transitions dominate (after N->N transitions), indicating that Independents and Republicans are sometimes led to supporting videos in their up-next recommendations even when they are viewing neutral YouTube videos. I also observe that the opposing up-next trails consist mostly of O->N and N->O transitions (after N->N transitions), indicating that once a user watches a video that opposes election misinformation, YouTube pushes more videos that are either neutral or opposing in stance into the up-next trails of all the participants. I also observe that S->O transitions are less frequent than S->N transitions in the supporting trails of Democrats, Republicans, and Independents. Previous work has shown that watching YouTube videos that debunk misinformation helps burst filter bubbles of misinformation [390]. My work also shows that opposing videos can lead to more opposing videos (O->O transitions in opposing trails). Thus, increasing the number of S->O transitions could lead users to trustworthy information on the platform.

4.5.3 RQ2c: Misinformation in homepages

I collected participants' YouTube homepages to determine how the bias on the homepage changes (δ) after watching a trail of videos starting with a seed video that is supporting (δ_S), neutral (δ_N), or opposing (δ_O) in stance with respect to election misinformation. I calculated the impact of a trail as

    δ_stance = Misinfo. score(homepage after the trail) − Misinfo. score(homepage before the trail).

δ_S, δ_N, and δ_O represent the change in the amount of bias present on the homepage as a result of watching a trail of up-next videos starting with a supporting, neutral, or opposing seed. A negative δ indicates that the YouTube homepage collected after the trail contained more opposing videos than the homepage collected before the trail. A positive δ, on the other hand, indicates either more videos supporting election misinformation or fewer opposing videos on the homepage collected after the trail compared to the homepage collected before the trail. I consider the top ten recommendations present on the homepage for the analysis. Figure 4.13 shows the δ values for all three kinds of trails for Democrats, Republicans, and Independents. I discuss a few results. I observe that after following up-next video trails starting from a neutral seed, the homepages of Democrats and Independents contain more supporting videos. However, recall that the average misinformation score of the up-next trails with neutral seeds for both Democrats and Independents was negative (Figure 4.11). This indicates that although the up-next trails with neutral seeds lead users to more opposing videos, the homepages nevertheless contain more misinformation, or fewer opposing videos, after the trail. I also observe that after watching up-next trail videos with a supporting seed, Republicans' homepages contain more opposing videos (Figure 4.13), while the trail itself contained more misinformation (Figure 4.11). However, note that the magnitude of δ is low in all conditions, indicating that few videos supporting or opposing election misinformation appear on the participants' homepages.

Figure 4.13: RQ2c results: The average change in the amount of bias present on homepages as a result of watching a trail of up-next videos starting with supporting (S), neutral (N), and opposing (O) seeds, for Democrats (δS=-0.01, δN=0.002, δO=-0.005), Republicans (δS=-0.009, δN=-0.014, δO=0.006), and Independents (δS=0.004, δN=0.01, δO=-0.06).

4.6 RQ3: Composition and Diversity

In this research question, I characterize the source diversity on YouTube when users search for election misinformation on the platform. Source diversity in searches
and recommendations is an important characterization of fairness [157]. Furthermore, given that the narratives about election misinformation were closely intertwined with news sources and their leanings, it is important to determine what kinds of YouTube channels users are exposed to. News and media diversity can be characterized in multiple ways [221]. One typology characterizes media diversity with respect to source (content providers), content (perspectives), and exposure (the actual consumption of diverse content) [288, 394]. My work analyzed content diversity in RQ2 by analyzing each video's stance on election misinformation. I cannot study exposure diversity since it requires determining the actual content consumed (clicked, watched, etc.) by the study participants in their naturalistic settings. For this study, I focus on source diversity in terms of the identity of the top content providers (YouTube channels) and the distribution and concentration of channels in the standard SERPs and up-next trails. I acknowledge that future studies should also examine the ideological position of news sources and study the filter bubbles of partisan content on the platform.

4.6.1 RQ3a: Diversity in search results

For the analysis, I consider the top ten search results in the standard SERPs. Figure 4.14a shows the top 10 YouTube channels with impressions in the largest number of search queries¹⁹. Here, I define an impression as the occurrence of a channel's video in a SERP. I observe that the left-leaning channel CNN on average appears in the SERPs of more than half (61.86%) of the search queries. Additionally, except for Fox News and 11Alive, all other top channels are left-leaning. I further analyzed which channels were responsible for the most relevant YouTube videos in the collected data. In the standard SERPs, I obtained a total of 4901 unique videos, of which 1940 (39.51%) were relevant, i.e., related to elections (959 opposing, 865 neutral, and 103 supporting). Overall, among these relevant videos, most come from CNN and MSNBC. The most opposing videos come from MSNBC followed by CNN, the most supporting videos come from Fox News followed by Daily Mail, and the most neutral videos come from NBC News followed by CNN. Given that CNN is one of the channels with the most opposing videos, it is encouraging to see that it has the most search query impressions.

¹⁹The top 10 YouTube channels and their mean percentage of total impressions were very similar when calculated separately for Democrats, Republicans, and Independents. Thus, I show the overall distribution for all users combined.

Figure 4.14: RQ3 results: (a) The top-10 YouTube channels with impressions in the most search queries for all study participants (CNN 61.86%, NBC News 39.85%, CBS News 33.64%, CNBC Television 31.36%, 60 Minutes 29.51%, MSNBC 27.79%, Fox News 27.08%, PBS NewsHour 26.27%, 11Alive 24.72%, TODAY 23.88%); for example, on average CNN appears in 61.86% of search queries. (b) The average number of impressions per up-next trail in which the channel was observed, for the top-10 channels appearing in the most standard up-next trails (LastWeekTonight 4.00, Saturday Night Live 3.60, Fox News 3.27, Late Night with Seth Meyers 2.98, The Late Show with Stephen Colbert 2.68, Jimmy Kimmel Live 2.65, NBC News 2.02, Sky News Australia 1.93, Fox Business 1.92, PBS NewsHour 1.71); for example, on average, videos from Fox News appear 3.27 times in those up-next trails where videos from the channel are observed. In the original figure, channels are colored by their left, right, or center leaning.

Next, I determine the source diversity in the SERPs using the Gini coefficient metric [157, 394, 428]. The Gini coefficient measures inequality in a frequency distribution; in this study, I use it to measure the inequality in the distribution of YouTube channel impressions. For a given SERP consisting of videos from n unique channels with a list of impressions [g_1, g_2, ..., g_n], the Gini coefficient is calculated as

    G = ( 1 / (2 ḡ n²) ) Σ_{i=1}^{n} Σ_{j=1}^{n} |g_i − g_j|,

where ḡ is the mean of all impressions. A fairer search engine would have lower values of the Gini coefficient, indicating a uniform distribution of YouTube channel impressions. Figure 4.15 shows the distribution of Gini coefficients of all SERPs for Democrats, Republicans, and Independents.
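A direct transcription of the Gini coefficient formula above into a short sketch; the impression counts are hypothetical.

```python
def gini(impressions):
    """Gini coefficient of per-channel impression counts in one SERP:
    0 means impressions are spread evenly across channels; values closer
    to 1 mean a few channels dominate the results."""
    n = len(impressions)
    mean = sum(impressions) / n
    total_abs_diff = sum(abs(gi - gj) for gi in impressions for gj in impressions)
    return total_abs_diff / (2 * mean * n * n)

# Hypothetical SERPs described by per-channel impression counts.
print(gini([4, 3, 1, 1, 1]))   # a couple of channels dominate -> 0.32
print(gini([2, 2, 2, 2, 2]))   # perfectly even distribution   -> 0.0
```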
I further analyzed which channels were responsible for the most relevant YouTube videos in the collected data. In the standard SERPs, I obtained a total of 4,901 unique videos, out of which 1,940 (39.51%) were relevant, i.e. related to elections (959 opposing, 865 neutral, and 103 supporting). Overall, among these relevant videos, most come from CNN and MSNBC. The most opposing videos come from MSNBC followed by CNN, the most supporting videos come from Fox News followed by Daily Mail, while the most neutral videos come from NBC News followed by CNN. Given that CNN is one of the channels with the most opposing videos, it is encouraging to see that it has the most search query impressions. Next, I determine the source diversity in the SERPs using the Gini coefficient metric [157, 394, 428]. The Gini coefficient measures inequality in a frequency distribution; here, I use it to determine the inequality in the distribution of YouTube channel impressions. For a given SERP consisting of videos from n unique channels with impressions [g_1, g_2, ..., g_n], the Gini coefficient is calculated as

G = (1 / (2·ḡ·n²)) Σ_{i=1}^{n} Σ_{j=1}^{n} |g_i − g_j|

where ḡ is the mean of all impressions. A fairer search engine would have lower values of the Gini coefficient, indicating a uniform distribution of YouTube channel impressions. Figure 4.15 shows the distribution of Gini coefficients for all SERPs for democrats, republicans and independents.

[Figure 4.15: RQ3a results. Distribution of Gini coefficients for all search queries (n=88) for (a) Democrats, (b) Republicans and (c) Independents, calculated based on the distribution of impressions of YouTube channels appearing in the search results. For Democrats and Republicans, the bins 0-0.1, 0.1-0.2, 0.2-0.3, 0.3-0.4 and 0.4-0.5 contain 48 (54.5%), 29 (33.0%), 8 (9.1%), 2 (2.3%) and 1 (1.1%) queries; for Independents, 50 (56.8%), 27 (30.7%), 8 (9.1%), 2 (2.3%) and 1 (1.1%).]

The distributions are similar for users with different political leanings. Furthermore, for approximately 96% of search queries, the Gini coefficient of SERPs is less than 0.3, indicating that YouTube has mostly evenly distributed videos from different channels in its search results.
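A minimal sketch of the Gini computation defined above, applied to the per-channel impression counts of a single SERP; the example values are made up.

```python
# Gini coefficient over the impression counts of the unique channels in one
# SERP, following the formula above. Lower values mean impressions are spread
# more evenly across channels.
def gini(impressions):
    """impressions: list of per-channel impression counts in a single SERP."""
    n = len(impressions)
    mean = sum(impressions) / n
    if mean == 0:
        return 0.0
    pairwise = sum(abs(gi - gj) for gi in impressions for gj in impressions)
    return pairwise / (2 * mean * n ** 2)

# Example: a SERP whose ten results come from four channels.
print(round(gini([4, 3, 2, 1]), 3))  # 0.25
```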
4.6.2 RQ3b: Diversity in up-next trails

Overall, I collected 6,943 videos in standard trails, out of which 1,082 are relevant, i.e. related to elections. The most opposing videos in trails come from MSNBC and Late Night with Seth Meyers*, the most supporting videos come from Fox News* and Fox Business, and the most neutral videos come from Fox News* and NBC News (* indicates that seed videos of the experiments also belonged to these channels). Next, I determine the top ten YouTube channels occurring in the standard trails. Note that I do not consider the seed videos while analyzing the trails. Figure 4.14b shows the average number of impressions of the top 10 channels appearing the most number of times in the trails. Here, an impression indicates the number of occurrences of a channel's videos in a trail, considering only the trails that contain videos from that channel. Note that the top channels are also channels of some of the seed videos in the dataset. The figure reveals that, on average, videos from LastWeekTonight, Saturday Night Live, and Fox News appear more than three times in a trail, when taking into account all the trails where the channel was observed. This finding indicates that videos from these channels lead to more videos from the same channels in the up-next recommendations. Next, to determine the diversity in trails, I determine the proportion of channels that are different from the channel of the seed video in the trails. I find that, on average,
an up-next trail of length five contains 2.07 YouTube channels other than the channel of the seed video. The number of non-seed channels in up-next trails is lowest for trails with seed videos from Saturday Night Live (0.85), LastWeekTonight (0.86), and Late Night with Seth Meyers (1.07). Note that I did not calculate this metric for supporting, neutral, and opposing seeds separately, since the channels of the supporting, opposing, and neutral videos are not unique. For example, I have a supporting as well as a neutral seed from Fox News. Given this scenario, there is no way to determine whether the videos appearing in the trails are due to the channel lean of the seed video or because of other factors. I also refrain from determining the diversity in up-next trails using the Gini coefficient, since several trails had just one or two unique channels (M=3.1, SD=1.46), in which case the Gini coefficient would not give a good representation of diversity. To get a sense of what kinds of channels are presented to users in the up-next trails, I determine the channels appearing in the most number of trails of democrats, republicans, and independents for trails with supporting, neutral, and opposing seeds (Figure 4.16).

[Figure 4.16: RQ3b results. Nine panels, (a)-(i), showing the top YouTube channels appearing in the supporting, neutral, and opposing trails of democrats, republicans, and independents, and the percentage of users in whose trails these channels appear. For example, Fox News appears in the supporting and neutral trails of 100% of users in every group, while Saturday Night Live, LastWeekTonight, and Late Night with Seth Meyers each appear in the opposing trails of over 96% of users. In the original figure, color indicates whether each channel is left-, right-, or center-leaning.]

I observe that Fox News appears in the up-next trails with supporting and neutral seeds of all users. Fox Business and Sky News Australia appear in both the supporting and neutral up-next trails of more than half of the republicans (Figures 4.16b and 4.16e). None of the seed videos belonged to these channels, and they still appear in the up-next trails. Similarly, Sky News Australia also appears in the neutral up-next trails of 44.12% of independents (Figure 4.16f) despite no neutral seed belonging to the channel. Furthermore, PowerfulJRE (Joe Rogan's YouTube channel) did not appear in the neutral up-next trails of every user, even though two neutral seed videos belonged to the channel (Figures 4.16d, 4.16e and 4.16f). On the other hand, the top channels appearing in the up-next trails with opposing seeds of all users (Figures 4.16g, 4.16h and 4.16i) are the channels of the opposing seed videos used in the experiment. Furthermore, three channels out of the top four appear in the trails of more than 96% of the users. This indicates that watching a video belonging to these left-leaning channels will probably lead to one or more videos belonging to the same channel in the up-next recommendation trail.

4.7 Discussion

In this study, I conduct a crowd-sourced audit of the YouTube platform to determine how effectively the platform removed election misinformation from its various components. I discuss the implications of the findings below.

4.7.1 Standardization of search results

I find little to no personalization in the search results. I also did not find any effect of personalization on the amount of misinformation returned in search results. Throughout the study period, the amount of personalization and misinformation remained constant in the searches. On analyzing the standard SERPs, I find that YouTube returns more videos opposing election misinformation in 95% of the search queries that I tested.
Interestingly, I see that the misinformation scores of search queries with a misinformation lean (e.g. dominion voter fraud) are more negative than the misinformation scores of queries that are neutral in stance (e.g. presidential election 2020). This finding implies that YouTube has paid more attention to the queries with a misinformation lean and ensured that users are exposed to more debunking information when they search about the fraudulent claims surrounding the elections. This selective attention is also in line with the results of past audits, which showed YouTube improving the recommendations for topics like vaccination more than for topics like 9/11 conspiracies [205]. My analysis also indicates that the Gini coefficient of 96% of search queries is less than 0.3, with ~54% of queries having a Gini coefficient of less than 0.1. Such low values imply that YouTube is ensuring source diversity in searches by evenly distributing videos from different channels in its SERPs. Furthermore, the distribution of Gini coefficients was similar for all users irrespective of their partisanship. This finding indicates YouTube's attempt to expose users to videos from different channels rather than a select few based on participants' partisanship. Interestingly, in line with a previous audit on Google search [394], I find that CNN is one of the top channels, with videos appearing in 61.8% of search queries. Future studies can test whether this dominance is due to emergent bias or to strategies adopted by the channel to enhance its algorithmic visibility [394]. Overall, my analysis reveals that YouTube's search results are largely unpersonalized and that the platform has had varying levels of success in removing misinformation and presenting videos that debunk election-related falsehoods in different clusters of search queries.

4.7.2 Scope for improvement in up-next trail recommendations

I find that up-next trails are highly personalized. However, for 50% of the users, only up to 10% of videos in the up-next recommendations come from users' subscribed channels. Future audit studies should further investigate the impact of users' channel subscriptions (both news and non-news channels) on the platform's recommendations. I also find that there is no significant difference in the amount of misinformation that users are exposed to in up-next recommendation trails in the signed-in standard window and the unpersonalized incognito window. On examining the standard up-next trails, I do find an echo-chamber effect. Users, irrespective of their partisanship, receive more misinformation in the up-next trails with supporting seeds as compared to the trails with neutral and opposing seeds (Figure 4.11). I also observe that the magnitude of the misinformation scores of trails with opposing seeds is greater than the magnitude of the misinformation scores of trails with supporting seeds. This implies that users are exposed to a small number of misinformative videos when they follow the up-next recommendations of a video supporting election misinformation. On the other hand, users are exposed to a larger number of opposing videos in the opposing up-next trails. This is a key finding, also supported by prior work showing that echo chambers of misinformation can be burst by watching debunking videos [390].
The platform can leverage this phenomenon by making its recommendation engine present more debunking videos to users, which would then expose them to more credible videos in the recommendation trails. I also examine various transitions in the up-next trails to study how users get pushed towards misinformation. Overall, I observe that problematic transitions, where a supporting video is recommended in the up-next recommendations of a supporting (S->S) or opposing video (O->S), make up less than 2%. However, S->S transitions are more frequent in trails with supporting seeds for independents compared to democrats and republicans. Furthermore, N->S transitions are also high in up-next trails with neutral seeds for independents. These findings are problematic. Showing misinformative videos to independents, who might not have developed a strong opinion on the election fraud conspiracies, could increase their chances of forming a pro-conspiracy belief. I also observe that N->S transitions are more frequent for republicans in the up-next trails with neutral seeds (3.78%) compared to trails with supporting seeds (1.61%). This finding is again troublesome. Past studies have indicated that republicans are more susceptible to electoral fake news [300]. Thus, recommending videos supporting election misinformation to republicans watching neutral videos would expose them to more misinformation, which might reinforce or lead to the formation of conspiratorial beliefs. On analyzing the up-next trails for channel diversity, I observe several interesting phenomena. First, the number of impressions for left-leaning late-night show channels on YouTube, such as LastWeekTonight, is very high. On average, approximately 3-4 videos from these channels appear in the up-next trails (of length five) when starting with opposing seed videos. Furthermore, these channels appear in the video recommendations of almost all of the study participants. Similar to the late-night shows, I find that Fox News also appears on average 3.27 times in the up-next trails of all participants. Future studies can look into the reasons behind the strong "algorithmic recognizability" [162] and high amplification of these channels in YouTube recommendations. Overall, I conclude that while YouTube has reduced misinformative videos in its up-next recommendations, there is still scope for improving the recommendation algorithm.

4.7.3 Participants' beliefs vs algorithmic reality

The study survey conducted before the audit experiment provided me with an opportunity to map participants' beliefs about personalization and trust in YouTube's algorithms against the reality of the situation as determined by the audits. The majority of participants believe that YouTube somewhat personalizes search results. However, in reality, they are hardly personalized. On the other hand, only half of the participants believe up-next recommendations to be highly personalized, which is in line with the findings. This mismatch between beliefs and reality indicates users' lack of algorithmic awareness. It also acts as a call to action for the platform to make users aware of the functioning of its algorithms. Users could be made aware of personalization, or the lack of it, by adding design features that promote algorithmic reflection, for example, seeing the search results or recommendations of other users [69].
The survey also showed that 19.2% and 14.1% of users, respectively, trust the credibility of information presented to them by YouTube in the search results and up-next recommendations to a great extent. This belief is problematic and indicates reliance on the platform's algorithms to show credible information. In reality, while I find the majority of YouTube's search results to be credible, up-next recommendations still contained misinformative videos. One way to help people spot misinformation on the platform and not blindly trust YouTube's recommendations could be to provide additional context about the content that the participant is searching for or viewing. While YouTube has started displaying Wikipedia links on the platform [122], additional cues in the form of credibility citations, existing fact-checks or knowledge panels (https://support.google.com/knowledgepanel/answer/9163198?hl=en) could also be helpful [203].

4.8 Limitations and future work

My work is not without limitations. My audit study is observational in nature, i.e. my experiment does not isolate the user attributes that produce the differences in misinformation measurements. I only make observations on the differences in misinformation received in searches and recommendations of users with different political affiliations. I recruited participants who used YouTube extensively to get information about the 2020 elections. However, for ethical reasons, I did not analyze participants' account histories to verify their self-reported data. My participant sample was also not balanced with respect to demographic attributes and political affiliation. I selected YouTube videos that had accumulated the most views as the seed videos for the audit experiments. One potential pitfall of such a sampling strategy is that it reduces the ecological validity of the experiment, since the participants in the study might not have engaged with those videos in the past. Another limitation is that YouTube might have specifically tailored the recommendations of popular misinformative videos. Future studies could consider alternative strategies for sampling videos, such as selecting videos that were more recently published on YouTube or sampling a combination of videos that have accumulated the least and most engagement. The search queries used in the audit also might not be representative of how the study participants formulate queries about the elections. Future studies can survey the study participants to determine how they used YouTube searches in the context of political elections, as well as their information needs about the elections. The classifier I developed to annotate the YouTube videos for election misinformation has an error rate of 9%, which could have affected the downstream analysis that I performed to quantify the amount of misinformation in various YouTube components. Additionally, I assign an annotation value of 0 to all videos that were removed from YouTube after the audit data collection. While the number of such videos is very small (<1%), it would result in a conservative estimate of the misinformation bias present in the search results and recommendations. I use the misinformation bias score adopted from Hussein and Juneja et al.'s study, which captures the amount of misinformation along with the rank of the video [205]. However, this metric does not take into account the relevance of the videos.
Future studies can use metrics that simultaneously measure relevance and credibility in ranked lists, such as the Normalised Weighted Cumulative Score and the Convex Aggregating Measure [259]. In my audit experiment, after testing every condition (watching supporting, neutral, and opposing videos), I performed a step to delete the YouTube history created by the extension so that it would not impact the other experimental conditions. I tested the effect of deletion on users' search and watch history for a few sample queries and videos and found that the effect of such deletion is almost immediate. However, I did not test this scenario for all search queries and videos used in the audit. Future studies can determine how soon the deletion of history impacts users' recommendations and search results across various topics. My study focuses on users' beliefs about the personalization and credibility of content on YouTube as well as the role of YouTube's algorithms in driving users to filter bubbles of problematic content. Future studies can focus on the impact of algorithmic recommendations on the radicalization of users. Several scholars argue that algorithms are not centrally culpable for the polarization or the filter bubbles that users experience on online platforms [75, 76, 417]. Many times, users of social media have a more diverse media diet than non-users [75, 76]. Scholars posit that while algorithms can observe what a user consumes on social media, they cannot determine what the user actually prefers [108]. In other words, a digital choice is not always a true reflection of an individual's preference [108]. Furthermore, users might use different online platforms for different types of content [108]. Thus, to gain a holistic idea of the extent to which algorithms play a role in user polarization, future audit studies can conduct multi-platform crowd-sourced audits for individuals. These audit studies can determine the impact of algorithmic recommendations on users' social and political viewpoints via surveys, while simultaneously monitoring users' patterns of content consumption on the multiple search engines and social media platforms they use.

4.9 Conclusion

In this study, I conducted a crowd-sourced audit on YouTube to determine the effectiveness of its content regulation policies with respect to election misinformation. I find that YouTube returns videos that debunk election misinformation in its searches. I also find that YouTube leads users to a small number of misinformative videos in up-next trails with seed videos that support election misinformation. Overall, my study shows that while YouTube has been largely successful in removing election misinformation from its searches, there is still scope to fix up-next recommendations.

CHAPTER 5
AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION

5.1 Introduction

The period after the arrival of the coronavirus witnessed a wave of dangerous health misinformation on the internet, including anti-vaccine lies and a deluge of fraudulent treatments and cures [57, 143]. The pandemic also brought the focus back to the anti-vaccine movement, which has gained popularity in the recent past, with anti-vax social media accounts seeing a 19% increase in their followers [301]. Health experts worry that vaccine hesitancy could make it difficult to achieve herd immunity against the coronavirus.
Battling health misinformation, especially anti-vaccine misinformation, has never been more important. Statistics show that people are increasingly depending on the internet for health information [323], including information about medical treatments, immunizations, vaccinations and vaccine-related side effects [73, 148]. While internet search is convenient, relying too heavily on it for health information could be dangerous [229]. The algorithms powering search engines are not traditionally designed to take into account the credibility and trustworthiness of information. Thus, there has been growing interest in empirically investigating search engine results for vaccine misinformation. While multiple studies have performed audits on commercial search engines to investigate problematic behaviour [201, 205, 331], e-commerce platforms have received little to no attention ([95, 358] are two exceptions), despite critics calling platforms like Amazon a "dystopian" store for hosting several anti-vaccine books [121]. Amazon specifically has faced criticism from several technology critics for not regulating the content on its platform [65, 328]. Consider the most recent instance. Several medically unverified products for coronavirus treatment, like prayer healing, herbal treatments and antiviral vitamin supplements, proliferated on Amazon [127, 165], so much so that the company had to remove 1 million fake products from its platform after several instances of such treatments were reported by the media [143]. The scale of the problematic content on the platform suggests that Amazon could unintentionally be a great enabler of misinformation, especially health misinformation. It not only hosts problematic health-related content, but its recommendation algorithms drive engagement by pushing potentially problematic content to users [164, 358]. Thus, in this study I investigate Amazon, the world's leading e-retailer, for the most critical form of health misinformation, i.e. vaccine misinformation. What is the amount of misinformation present in Amazon's search results and recommendations? How does personalization due to user history, built progressively by performing real-world user actions such as clicking or browsing certain products, impact the amount of misinformation returned in subsequent search results and recommendations? In this study, I investigate these questions. I conduct two sets of systematic audit experiments: an Unpersonalized audit and a Personalized audit. In the Unpersonalized audit, I adopt Information Retrieval metrics from prior work [242] to determine the amount of health misinformation users are exposed to when searching for vaccine-related queries. In particular, I examine the search results of 48 search queries belonging to 10 popular vaccine-related topics like 'hpv vaccine', 'immunization', 'vaccination', 'MMR vaccine and autism', etc. I collect search results without logging in to Amazon to eliminate the influence of personalization. To gain in-depth insights about the platform's searching and sorting algorithms, the Unpersonalized audits ran for 15 consecutive days, sorting the search results across 5 different Amazon filters each day: "featured", "price low to high", "price high to low", "average customer review" and "newest arrivals".
The first audit resulted in 36,000 search results and 16,815 product page recommendations, which were later annotated for their stance on health misinformation: promoting, neutral or debunking. In the second set of audits, the Personalized audit, I determine the impact of personalization due to user history on the amount of health misinformation returned in search results, recommendations and auto-complete suggestions. The user history is built progressively over 7 days by performing several real-world actions, such as "search", "search + click", "search + click + add to cart", "search + click + mark top-rated all positive review as helpful", "follow contributor" and "search on third party website" (Google.com in my case). I collect several Amazon components in the Personalized audit, like homepages, product pages, pre-purchase pages, search results, etc. These components are explained in detail in Section 5.3. I found Amazon hosting a plethora of health misinformative products belonging to several categories, like Books, Kindle eBooks, Amazon Fashion (apparel, t-shirts, etc.) and Health & Personal Care items (e.g. dietary supplements). Below, I present the formal research questions, findings, contributions and implications of this study, along with its ethical considerations.

5.1.1 Research Questions and Findings

In the first set of audits I ask,

RQ1 [Unpersonalized audit]: What is the amount of health misinformation returned in various Amazon components, given the components are not affected by user personalization?
RQ1a: How much are search results contaminated with misinformation?
RQ1b: How much are recommendations contaminated with misinformation? Is there a filter-bubble effect in the recommendations?

I find a higher percentage of products promoting health misinformation (10.47%) compared to products that debunk misinformation (8.99%) in the unpersonalized search results. I discover that Amazon returns a high number of misinformative search results when users sort their searches by the filter "featured" and a high number of debunking results when they sort results by the filter "newest arrivals". I also find Amazon ranking misinformative results higher than debunking results, especially when results are sorted by the filters "average customer review" and "price low to high". Overall, search results of the topics "vaccination", "andrew wakefield" and "hpv vaccine" contain the highest misinformation bias when sorted by the default filter "featured". My analysis of product page recommendations suggests that recommendations of products promoting health misinformation contain more health misinformation when compared to recommendations of neutral and debunking products. Next, in the second set of audits I ask,

RQ2 [Personalized audit]: What is the effect of personalization due to user history on the amount of health misinformation returned in various Amazon components, where user history is built progressively by performing certain actions?
RQ2a: How are search results affected by various user actions?
RQ2b: How are recommendations affected by various user actions? Is there a filter-bubble effect in the recommendations?
RQ2c: How are auto-complete suggestions affected by various user actions?

The Personalized audits reveal that search results sorted by the filters "average customer review", "price high to low", "price low to high" and "newest arrivals", along with auto-complete suggestions, are not personalized.
Additionally, I find that user actions involving clicking a search product lead to personalized homepages. I found evidence of a filter-bubble effect in various recommendations found on homepages, product pages and pre-purchase pages. Surprisingly, the amount of misinformation present in the homepages of accounts building their history by performing the actions "search + click" and "mark top-rated all positive review as helpful" on a misinformative product was more than the amount of misinformation present in the homepages of accounts that added the same misinformative product to the cart. This finding suggests that Amazon nudges users more towards misinformation once a user shows interest in a misinformative product by clicking on it but has not shown any intention of purchasing it. Overall, the study suggests that Amazon has a severe vaccine/health misinformation problem exacerbated by its search and recommendation algorithms. Yet, the platform has not taken any steps to address this issue.

5.1.2 Contributions and Implications

In the absence of an online regulatory body monitoring the quality of content created, sold and shared, vaccine misinformation is rampant on online platforms. Through this work, I specifically bring the focus to e-commerce platforms, since they have the power to influence the browsing as well as the buying habits of millions of people. I believe this study is the first large-scale systematic audit of an e-commerce platform that investigates the role of its algorithms in surfacing and amplifying vaccine misinformation. My work provides an elaborate understanding of how Amazon's algorithm introduces misinformation bias in the product selection stage and in the ranking of search results across 5 Amazon filters for 10 impactful vaccine-related topics. I found that even the use of different search filters on Amazon can dictate what kind of content a user is exposed to. For example, use of the default filter "featured" leads users to more health misinformation, while sorting search results by the filter "newest arrivals" leads users to products debunking health-related misinformation. This is also the first study to empirically establish how certain real-world actions on health misinformative products on Amazon could drive users into problematic echo chambers of health misinformation. Both audit experiments resulted in a dataset of 4,997 unique Amazon products distributed across 48 search queries, 5 search filters, 15 recommendation types, and 6 user actions, conducted over 22 (15+7) days (the dataset is available at https://social-comp.github.io/AmazonAudit-data/). The findings suggest that traditional recommendation algorithms should not be blindly applied to all topics equally. There is an urgent need for Amazon to treat vaccine-related searches as searches of higher importance and ensure higher quality content for them. Finally, the findings also have several design implications that I discuss in detail in Section 5.7.4.

5.1.3 Ethical Considerations

I took several steps to minimize the potential harm of my experiments to retailers. For example, buying and later returning an Amazon product for the purpose of the project could be deemed unethical, and thus I avoided performing this activity. Similarly, writing a fake positive review about an Amazon product containing misinformation could negatively influence the audience. Therefore, in the Personalized audit I explored other alternatives that could mimic a similar, if not the same, influence as the aforementioned activities.
For example, instead of buying a product, I performed the "add to cart" action, which shows a user's intent to purchase a product. Instead of writing positive reviews for products, I marked the top-rated positive review as helpful. Since the accounts did not have any purchase history, marking a review helpful did not increase the "Helpful" count for that review. Through this activity, the account shows a positive reaction towards the product while avoiding manipulation, and thus eliminates any impact on potential buyers/users. Lastly, I refrained from performing the experiments on real-world users. Performing actions on misinformative products could contaminate users' searches and recommendations. It could potentially have long-term consequences in terms of what types of products are pushed to participants. Thus, in the audit experiments, accounts were managed by bots that emulated the actions of actual users.

5.2 Related work

5.2.1 Health misinformation in online algorithmic systems

The current research on online health misinformation, including vaccine misinformation, spans three broad themes: 1) quantifying the characteristics of anti-vaccine discourse [105, 280, 284], 2) building machine learning models to identify users engaging with health misinformation or instances of health misinformation itself [109, 158, 159], and 3) designing and evaluating effective interventions to ensure that users think critically when presented with health (mis)information [234, 399]. The existing research has a major gap. Most of these studies are post-hoc investigations of health misinformation, i.e. the misinformation has already propagated and is in the wild. The current work neither takes into consideration how the user encountered the misinformation nor investigates the role of the source of the misinformation. With the advent of the internet, search engines have become primary sources of information, with 55% of American adults relying on the web to get medical information [323]. 5.9M people said that web search results influenced their decision to visit a doctor, and 14.7M claimed that online information affected their decision on how to treat a disease [323]. Given how medical information can directly influence one's health and well-being, relying on the internet is not always a good idea. A lot of outlets have emerged that have contaminated online health information. These sources could be conspiracy groups or websites spreading misinformation due to vested interests, or companies with commercial interests in selling herbal cures or fictitious medical treatments [350]. Moreover, online curation algorithms themselves are not built to take into account the credibility of information. Thus, it is of paramount importance to investigate the role of search engines in harboring health misinformation. How can I empirically and systematically probe search engines to investigate problematic behavior like the prevalence of health misinformation? In the next section, I briefly describe an emerging research field called "algorithmic auditing", which is focused on investigating search engines to reveal problematic biases, and I discuss my contribution to this growing area of research.
5.2.2 Search engine audits

Search engines are modern-day gatekeepers and curators of information, controlling "what" content users are exposed to. Their black-box algorithms can shape user behaviour, alter beliefs and even affect voting behaviour by impeding or facilitating the flow of certain kinds of information [117, 134, 238]. Despite their importance and the power they exert, to date, search results and recommendations have mostly been unregulated. The information quality of a search engine's output is still measured in terms of relevance, and it is up to the user to determine the credibility of the information. Thus, researchers have pushed for making algorithms more accountable. One recent method developed to achieve this is to perform systematic audits of search engines. Raji et al. provide one definition of algorithmic audits: an algorithmic audit involves the collection and analysis of outcomes from a fixed algorithm or defined model within a system; through the simulation of a mock user population, these audits can uncover problematic patterns in models of interest [324]. Previous audit studies have investigated search engines for partisan bias [331], gender bias [93] and price discrimination [188]. However, only a few studies have systematically investigated the role of search engines in surfacing misinformation ([205] is the only exception). Moreover, there is a dearth of systematic audits focusing specifically on health misinformation. The past literature mostly consists of small-scale experiments that probe search engines with a handful of search queries. For example, an analysis of the first 30 pages of search results for the query "vaccines autism" revealed that Google.com has 10% fewer anti-vaccine search results compared to other search engines, like Qwant, Swisscows and Bing [160]. Meanwhile, search results present in the first 102 pages for the query "autism vaccine" on Google's Turkey version returned 20% websites with incorrect information [136]. One recently published work, closely related to this study, examined Amazon's first 10 pages of search results in response to the query "vaccine"; it only collected and annotated books appearing in the searches for misinformation [358]. The aforementioned studies probed the search engine with a single query and performed the analysis on multiple search result pages. I, on the other hand, perform the Unpersonalized audit on a curated list of 48 search queries belonging to the 10 most searched vaccine-related topics, spanning various combinations of search filters and recommendation types, over multiple days, an aspect missing in previous work. Additionally, this is the first work to experimentally quantify the prevalence of misinformation across various search queries, topics and filters. Furthermore, instead of just focusing on books, I analyze the platform for products belonging to different categories, resulting in an extensive, all-category-inclusive coding scheme. Another recent study on YouTube audited the platform for various misinformative topics, including vaccine controversies. That work established the effect of personalization due to watching videos on the amount of misinformation present in search results and recommendations on YouTube [205]. However, there are no studies investigating the impact of personalization on misinformation present in the product search engines of e-commerce platforms.
My work fills this gap by conducting a second set of audits, the Personalized audit, where I shortlist several real-world user actions and investigate their role in amplifying misinformation in Amazon's searches and recommendations.

5.3 Amazon components and terminology

For the audit experiments, I collected three major Amazon components and numerous sub-components. I list them below.

1. Search results: These are products present on Amazon's Search Engine Results Page (SERP) returned in response to a search query. SERP results can be sorted using five filters: "featured", "price low to high", "price high to low", "average customer review" and "newest arrivals".

2. Auto-complete suggestions: These are the popular and trending search queries suggested by Amazon when a query is typed into the search box (see Figure 5.2c).

3. Recommendations: Amazon presents several recommendations to users as they navigate through the platform. For the purpose of this project, I collect recommendations present on three different Amazon pages: the homepage, the pre-purchase page and product pages. Each page hosts several types of recommendations. Table 5.1d shows the 15 recommendation types collected across the 3 recommendation pages. I describe all three below.

a) Homepage recommendations: These recommendations are present on the homepage of a user's Amazon account. The homepage recommendations can be of three types: "Related to items you've viewed", "Inspired by your shopping trends" and "Recommended items other customers often buy again" (see Figure 5.1a). Any of the three types, together or separately, could be present on the homepage depending on the actions performed by the user. For example, the "Inspired by your shopping trends" recommendation type appears when a user performs one of two actions: either makes a purchase or adds a product to the cart.

b) Pre-purchase recommendations: These recommendations consist of product suggestions that are presented to users after they add product(s) to the cart. These recommendations could be considered a nudge to purchase other similar products. Figure 5.1b displays the pre-purchase page. The page has several recommendations, like "Frequently bought together", "Customers also bought these highly rated items", "Related to items you've viewed", etc. I collectively call these pre-purchase recommendations.

c) Product recommendations: These are the recommendations present on the product page, also known as the details page (https://sellercentral.amazon.com/gp/help/external/51). The page contains details of an Amazon product, like product title, category (e.g., Amazon Fashion, Books, Health & Personal Care), description, price, star rating, number of reviews, and other metadata. The details page is home to several different types of recommendations. I extracted five: "Frequently bought together", "What other items customers buy after viewing this item", "Customers who viewed this item also viewed", "Sponsored products related to this item" and "Customers who bought this item also bought". Figure 5.1c presents an example of product page recommendations.

5.4 Methodology

Here I present the audit methodology in detail. This section is organized as follows. I start by describing the approach used to compile high-impact vaccine-related topics and associated search queries. Then, I present an overview of each audit experiment, followed by the details of the numerous methodological decisions I took while designing the audits.
Next, I describe the qualitative coding scheme for annotating Amazon products for health misinformation. Finally, I discuss my approach to calculating the misinformation bias in search results.

[Figure 5.1: (a) Amazon homepage recommendations. (b) Pre-purchase recommendations displayed to users after adding a product to cart. (c) Product page recommendations. (d) Table showing the 15 recommendation types spread across 3 recommendation pages. Homepage: "Related to items you've viewed", "Inspired by your shopping trends", "Recommended items other customers often buy again". Pre-purchase page: "Customers also bought these highly rated items", "Customers also shopped these items", "Related to items you've viewed", "Frequently bought together", "Related to items", "Sponsored products related", "Top picks for". Product page: "Frequently bought together", "Customers who bought this item also bought", "Customers who viewed this item also viewed", "Sponsored products related to this item", "What other items customers buy after viewing this item".]

[Figure 5.2: (a) Google Trends' Related Topics for the topic vaccine: people who searched for the vaccine topic also searched for these topics. (b) Google Trends' Related Queries for the topic vaccine: the top search queries searched by people related to the vaccine topic. (c) Amazon's auto-complete suggestions displaying popular and trending search queries.]

5.4.1 Compiling high impact vaccine-related topics and search queries

Here, I present my methodology to curate high-impact vaccine-related topics and search queries.

5.4.1.1 Selecting high impact search topics:

The first step of any audit is to determine the input: a viable set of topics and associated search queries that will be used to query the platform under investigation. I leveraged Google Trends (Trends henceforth) to select and expand vaccine-related search topics. Trends is an optimal choice since it shares past search trends and popular queries searched by people across the world. Since it is not practical to audit all topics present on Trends, I designed a method to curate a reasonable number of high-impact topics and associated search queries, i.e., topics that were searched by a large number of people for the longest period of time. I started with two seed topics and employed a breadth-wise search to expand the topic list. Trends allows searching for any subject matter either as a topic or as a term. Intuitively, a topic can be considered a collection of terms that share a common concept. Searching as a term returns results that include the terms present in the search query, while searching as a topic returns all search terms having the same meaning as the topic (https://support.google.com/trends/answer/4359550?hl=en). For example, searching for "banana" as a term will return results that include terms like banana smoothie, banana, etc. On the other hand, searching for London as a topic will include results containing terms like Capital of UK, Londres (London in Spanish), etc. I began the search with two seed words, namely "vaccine" and "vaccine controversies", and decided to search them as topics.
Starting the topic search with the aforementioned seed topics ensured that the related topics would cover general vaccine-related topics as well as topics related to the controversies surrounding vaccines, offering a holistic view of search interests. I set the location to United States, the date range to 2004-Present (this step was performed in February 2020), categories to "All" and the search service to "Web search". The date range ensured that the topics are perennial and have been popular for a long time (note that Trends data is available from 1/1/2004 onwards). I selected the category setting "All" so as to get a holistic view of the search trends encompassing all categories together. The search service filter has options like 'web search', 'YouTube search', 'Google Shopping', etc. Although Google Shopping is an e-commerce platform like Amazon, its selection returned only a handful of results, or none at all. Thus, I opted for the 'web search' service. I employed Trends' Related Topics feature for breadth-wise expansion of the search topics (see Figure 5.2a). I viewed the Related Topics using the "Top" filter, which presents popular search topics in the selected time range that are related to the searched topic. I manually went through the top 15 Related Topics and retained relevant topics using the following guidelines. All generic topics like Infant, Travel, Side-Effects, Pregnancy, CVS, Virus, etc. were discarded. My focus was to pick only topics representing vaccine information. Thus, I discarded topics that were names of diseases but kept their corresponding vaccines. For example, I discarded the topic Influenza but kept the topic Influenza vaccine. I also discarded temporal topics, such as 2009 flu pandemic vaccine; moreover, the 2009 flu pandemic was an influenza pandemic and I had already included influenza vaccine in my topic list [422]. I kept track of duplicates and discarded them from the search. To further expand the topic list, I again went through the Related Topics of the shortlisted topics and used the aforementioned filtering strategy to shortlist relevant topics. This step allowed me to expand the topic list to a reasonable number. After two levels of breadth-wise search, I obtained a list of 16 vaccine-related search topics (see Figure 5.3). Next, I combined multiple similar topics into a single topic. The idea is to collect search queries for both topics separately and then combine them under one single topic. For example, the topics zoster vaccine and varicella vaccine were combined since both vaccines are used to prevent chickenpox; therefore, the search queries of both topics were later combined under the topic varicella vaccine. All topics enclosed in similarly colored boxes in Figure 5.3 were merged together. 11 topics remained after merging.

[Figure 5.3: Figure illustrating the breadth-wise topic discovery approach used to collect vaccine-related topics from Google Trends, starting from two seed topics: vaccine and vaccine controversies. Each node in the tree denotes a vaccine-related topic (the 16 nodes are: vaccine, hpv vaccine, zoster vaccine, mmr vaccine, hep b vaccine, measles vaccine, influenza vaccine, varicella vaccine, vaccine controversies, vaccination, andrew wakefield, vaccination schedule, rabies vaccine, mmr vaccine and autism, immunization, hep a vaccine). An edge A→B indicates that topic B was discovered from the Trends Related Topic list of topic A. For example, the topics "vaccination" and "andrew wakefield" were obtained from the Related Topic list of the "vaccine controversies" topic, and the topic "mmr vaccine and autism" was obtained from the topic "andrew wakefield", and so on. In the original figure, a marker indicates topics discarded during filtering, and similarly colored boxes indicate similar topics that were merged together.]
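The two-level breadth-wise expansion described above can be sketched in a few lines of Python. The RELATED lookup is a tiny illustrative stub standing in for the manual inspection of Trends' Related Topics, and keep_topic() stands in for the manual filtering guidelines; neither reflects the actual Trends data or the full rule set.

```python
# Simplified sketch of the two-level breadth-wise topic expansion. RELATED is
# assumed example data (not actual Trends output); keep_topic() is a stand-in
# for the manual guidelines (drop generic and temporal topics).
RELATED = {
    "vaccine": ["hpv vaccine", "influenza vaccine", "Infant", "Travel"],
    "vaccine controversies": ["vaccination", "andrew wakefield", "Virus"],
}
GENERIC = {"Infant", "Travel", "Side-Effects", "Pregnancy", "CVS", "Virus"}

def keep_topic(topic):
    return topic not in GENERIC and not topic.startswith("2009")

def expand(seeds, levels=2, top_k=15):
    selected, frontier = list(seeds), list(seeds)
    for _ in range(levels):
        next_frontier = []
        for topic in frontier:
            for related in RELATED.get(topic, [])[:top_k]:
                if keep_topic(related) and related not in selected:
                    selected.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return selected

print(expand(["vaccine", "vaccine controversies"]))
# ['vaccine', 'vaccine controversies', 'hpv vaccine', 'influenza vaccine',
#  'vaccination', 'andrew wakefield']
```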
5.4.1.2 Selecting high impact search queries:

After shortlisting a reasonable number of topics, I next determine the associated search queries per topic, to be later used for querying Amazon's search engine. To compile search queries, I relied on both Trends' and Amazon's auto-complete suggestions: Trends, because it gives a list of popular queries that people searched on Google, the most popular search service, and Amazon, because it is the platform under investigation and it provides popular trending queries specific to the platform. Searching for a topic on Trends displays popular search queries related to the topic (see Figure 5.2b). I obtained the top 3 queries per topic. Next, I collected the top 3 auto-complete suggestions obtained by typing the seed query of each topic into Amazon's search box (see Figure 5.2c). I removed all animal- or pet-related search queries (e.g. "rabies vaccine for dogs") and overly specific queries (e.g. "callous disregard by andrew wakefield"), and replaced redundant and similar queries with a single search query selected at random. For example, the search queries "flu shots" and "flu shot" were replaced with the single search query "flu shot". After these filtering steps, I had 48 search queries corresponding to 10 vaccine-related search topics. Table 5.1 presents sample search queries for all 10 search topics.

[Table 5.1: Sample search queries for each of the ten vaccine-related search topics (vaccine controversies, vaccination, andrew wakefield, hpv vaccine, immunization, mmr vaccine and autism, influenza vaccine, hepatitis vaccine, varicella vaccine, mmr vaccine), along with each topic's seed query. Example search queries include "anti vaccination", "flu shot", "hepatitis b vaccine" and "measles vaccination".]

5.4.2 RQ1: Unpersonalized Audit

5.4.2.1 Overview

The aim of the Unpersonalized audit is to determine the amount of misinformation present in Amazon's search results and recommendations without the influence of personalization. I measure the amount of misinformation by determining the misinformation bias of the returned results. I explain the misinformation bias calculation in detail in Section 5.4.5. Intuitively, the greater the number of highly ranked misinformative results, the higher the overall bias. I ran the Unpersonalized audit for 15 days, from 2 May 2020 to 16 May 2020. I took two important methodological decisions regarding which components to audit and what sources of noise to control for. I present these decisions as well as the implementation details of the audit experiment below.
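The exact bias metric is defined later in Section 5.4.5. Purely to make the intuition above concrete (higher-ranked misinformative results contribute more to the bias), here is a generic rank-discounted score; it is an illustrative stand-in and not the thesis's formula.

```python
# Illustrative only: a generic rank-discounted bias score capturing the idea
# that higher-ranked misinformative results raise the overall bias. This is
# NOT the metric used in the study (that metric is defined in Section 5.4.5).
import math

def rank_discounted_bias(annotations):
    """annotations: ranked list of stances (-1 debunking, 0 neutral, 1 promoting)."""
    weights = [1 / math.log2(rank + 1) for rank in range(1, len(annotations) + 1)]
    return sum(w * a for w, a in zip(weights, annotations)) / sum(weights)

# A promoting result at rank 1 outweighs a debunking result at rank 10.
print(round(rank_discounted_bias([1, 0, 0, 0, 0, 0, 0, 0, 0, -1]), 3))  # ~0.16
```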
5.4.2.2 What components should we collect for the Unpersonalized audits?

I collected SERPs sorted by all five Amazon filters: "featured", "price low to high", "price high to low", "average customer review" and "newest arrivals". For analysis, I extracted the top 10 search results from each SERP. Recent statistics have shown that the first three search results receive 75% of all clicks [114]. Thus, I extracted the recommendations present on the product pages of the first three search results. I collected the following 5 types of product page recommendations: "Frequently bought together", "What other items customers buy after viewing this item", "Customers who viewed this item also viewed", "Sponsored products related to this item" and "Customers who bought this item also bought" (refer to Figure 5.1c for an example). I extracted the first product present in each recommendation type for analysis. Next, I annotated all collected components as promoting, neutral or debunking health misinformation. I describe the annotation scheme in Section 5.4.4.

[Figure 5.4: The eight steps performed in the Unpersonalized audit, described in detail in Section 5.4.2.4. Starting from a newly created VM (no browsing history, cleared browser cache and cookies), the bot opens Chrome in an incognito window, opens amazon.com and searches for a query, selects a filter, saves the SERP, parses it to extract the search results' URLs, opens and saves each product page, clears cookies and cache, and kills the incognito window; the process is repeated for all 5 filters and for 15 days.]

5.4.2.3 How can we control for noise?

I controlled for potential confounding factors that may add noise to the audit measurements. To eliminate the effect of personalization, I ran the experiment on newly created virtual machines (VMs) and freshly installed browsers with empty browsing history, cookies and browser cache. Additionally, I ran search queries from the same version of Google Chrome in incognito mode to ensure that no history was built during the audit runs. To avoid cookie tracking, I erased cookies and cache before and after opening the incognito window and destroyed the window after each search. In sum, I performed searches in newly created incognito windows every day. All VMs operated from the same geolocation so that any effects due to location would affect all machines equally. To prevent machine speeds from affecting the experiment, all VMs had the same architecture and configuration. To control for temporal effects, I searched every single query at one particular time every day for 15 consecutive days. Prior studies have established the presence of a carry-over effect in search engines, where previously executed queries affect the results of the current query when both queries are issued within a small time interval [187]. Since I destroyed browser windows and cleared session cookies and cache after every single search, the carry-over effect did not influence the experiment.

5.4.2.4 Implementation details

Figure 5.4 illustrates the eight steps of the Unpersonalized audits. I used Amazon Web Services (AWS) infrastructure to create all VMs. I created Selenium bots to automate web browser actions. As a first step, each day at a particular time, the bot opened amazon.com in an incognito window. Next, the bot searched for a single query, sorted the results by an Amazon filter and saved the SERPs. The bot then extracted the URLs of the top 10 products present in the results. The sixth step is an iterative step where the bot iteratively opened the product URLs and saved the product pages. In the last two steps, the bot cleared the browser cache and killed the browser window. I repeated steps 1 to 8 to collect search results sorted by all 5 Amazon filters: "featured", "price low to high", "price high to low", "average customer review" and "newest arrivals". I added appropriate wait times after each step to prevent Amazon from detecting the account as a bot and blocking the experiment. I repeated these steps for 15 consecutive days for each of the 48 search queries. After completion of the experiment, I parsed the saved product pages to extract product metadata, like product category, contributors' names (author, editor, etc.), star rating and number of ratings. I extracted product page recommendations for the top 3 search results only.
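A minimal sketch of one iteration of the Selenium loop described above. The CSS selector and the sort-parameter values are assumptions made purely for illustration, since Amazon's markup and URL parameters change frequently; the real bots also added wait times and ran inside fresh VMs.

```python
# Minimal sketch of one Unpersonalized-audit iteration (cf. Section 5.4.2.4).
# The result selector and SORT_PARAM values are assumptions for illustration.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

SORT_PARAM = {"featured": "relevanceblender", "newest arrivals": "date-desc-rank"}  # assumed values

def collect_serp(query, sort_filter, top_k=10):
    options = Options()
    options.add_argument("--incognito")          # fresh, unpersonalized session
    driver = webdriver.Chrome(options=options)
    try:
        url = (f"https://www.amazon.com/s?k={query.replace(' ', '+')}"
               f"&s={SORT_PARAM[sort_filter]}")
        driver.get(url)
        html = driver.page_source                # save the SERP for later parsing
        links = driver.find_elements(By.CSS_SELECTOR, "div.s-result-item h2 a")  # assumed selector
        product_urls = [a.get_attribute("href") for a in links[:top_k]]
        return html, product_urls
    finally:
        driver.quit()                            # kill the incognito window
```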
In the last two steps, the bot cleared the browser cache and killed the browser window. I repeated steps 1 to 8 to collect search results sorted by all 5 Amazon filters ,‘featured”, “average customer review”, “price low to high” and “newest arrivals”. I added appropriate wait times after each step to prevent Amazon from detecting the account as a bot and blocking the experiment. I repeated these steps for 15 consecutive days for each of the 48 search queries. After completion of the experiment, I parsed the saved product pages to extract product metadata, like product category, contributors’ names (author, editor, etc.), star rating and number of ratings. I extracted product page recommendations for the top 3 search results only. 5.4.3 RQ2: Personalized Audit 5.4.3.1 Overview The goal of the Personalization Experiments is twofold. First, I assess whether user actions, such as clicking on a product, adding product to a cart would trigger personal- ization on Amazon. Second, and more importantly, I determine the impact of a user’s account history on the amount of misinformation presented to them in their search results page, recommendations, and auto-complete suggestions; account history is built progressively by performing a particular action for seven consecutive days. I ran the Personalized audit from 12th August, 2020 to 18th August, 2020. I took several methodological decisions while designing this experimental setup. I discuss each of these decisions below. 5.4.3.2 What real-world user actions should we select to build account history? Users’ click history and purchase history trigger personalization and influence the price of commodities on e-commerce websites [188]. They also affect the amount of misinformation present in the personalized results [205]. Informed by the results of these studies, I selected six real-world user actions that could trigger personalization and thus, could potentially impact the amount of misinformation in search results and recommendations. The actions are (1) “search” (2) “search + click” + (3) “search + click + add to cart” + + (4) “search + click + mark top-rated all positive review 100 5.4. METHODOLOGY # User action Type of history Tested values 1 Search product Product search his-tory Product debunks vaccineor other health related misinformation (annotation value -1) Neutral health information (annotation value 0) Product promotes vaccine or other health related misinformation (annotation value 1) 2 Search + click product + Product search andclick history 3 Search + click + add to cart + + Intent to purchasehistory 4 Search + click + mark “Top rated, All positive review” helpful + + Searching, click- ing and marking reviews helpful history 5 Following contributor by clicking follow button on contributor’s page Following history 6 Search product on Google(third party application) Third party search history Table 5.2: List of user actions employed to build account history. Every action and product type (misinformative, neutral or debunking) combination was performed on two accounts. One account sorted search results by filters “featured” and “average customer review”. The other account built history in the same way but sorted the search results by filters “price low to high” and “newest arrivals”. Overall, I created 40 Amazon accounts (6 actions X 3 tested values X 2 replicates for filters + 2 control accounts + 2 twin accounts). as helpful” + + (5) “follow contributor” and (6) “search on third party website” (Google.com in my case) . 
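The account bookkeeping behind this design can be spelled out explicitly. The short sketch below simply enumerates the treatment, control, and twin accounts described in Table 5.2; the labels are placeholders.

```python
from itertools import product

ACTIONS = ["search", "search + click", "search + click + add to cart",
           "search + click + mark top review helpful",
           "follow contributor", "search on Google"]
STANCES = ["debunking", "neutral", "misinformative"]     # tested annotation values -1, 0, 1
FILTER_PAIRS = [("featured", "average customer review"),
                ("price low to high", "newest arrivals")]

# 6 actions x 3 product stances x 2 filter replicates = 36 treatment accounts
treatments = [{"action": a, "history_stance": s, "filters": f}
              for a, s, f in product(ACTIONS, STANCES, FILTER_PAIRS)]

# 2 control accounts (no history) plus their 2 twins, used to estimate baseline noise
controls = [{"role": role, "filters": f}
            for role in ("control", "twin") for f in FILTER_PAIRS]

print(len(treatments) + len(controls))   # 40 accounts in total
```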
Table 5.2 provides an overview. First three actions involve searching for a product and/or clicking on it and/or adding it to cart. Through the fourth action, a user shows positive reaction towards a product by marking its top rated critical review as helpful. In the fifth action, a user follows a contributor. For example, for a product in the Books category, the associated list of contributors include the author and editor of the book. The contributors have dedicated profile pages that a user can follow. This action investigates the impact of following a contributor of a misinformative product. Sixth action investigates the effect of searching for an Amazon product on Google.com. The user logs into Google using the email id used to register the Amazon account. The hypothesis is that Amazon search results could be affected by third party browsing history. After selecting the actions, next I determine the products on which the actions needed to be performed. 101 CHAPTER 5. AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION # Contributors to debunking health products Contributors to neutral health products Contributors to misinformative health products name url code name url code name url code 1 Paul-A-Offit B001ILIGP6 Jason-Soft B078HP6TBD Andrew-J-Wakefield B003JS8YQC 2 Seth-Mnookin B001H6NG7A Joy-For-All-Art B07LDMJ1P4 Mary-Holland B004MZW7HS 3 Michael-Fitzpatrick B001H6L348 Peter-Pauper-Press B00P7QR4RO Kent-Heckenlively B00J08DNE8 4 Ziegler-Prize B00J8VZKBQ Geraldine-Dawson B00QIZY0MA Jenny-McCarthy B001IGJOUC 5 Ben-Goldacre B002C1VRBQ Tina-Payne-Bryson B005O0PL3W Forrest-Maready B0741C9TKH 6 Jennifer-A-Reich B001KDUUHY Vassil-St-Georgiev B001K8I8XC Wendy-Lydall B001K8LNVQ 7 Peter-J-Hotez B001HPIC48 Bryan-Anderson B087RL79G8 Neil-Z-Miller B001JP7UW6 Table 5.3: List of contributors selected for building up account history for action “Follow contributors”. 5.4.3.3 What products and contributors should we select for building account history? To build user history, all user actions except “follow contributor” need to be performed on products. First, I annotated all products collected in the Unpersonalized audit run as debunking (-1), neutral (0) or promoting (1) health misinformation. I present the annotation details in Section 5.4.4. For each annotation value (-1, 0, 1), I selected top- rated products that had received maximum engagement and belonged to the most occurring category—‘Books’. I started by filtering Books belonging to each annotation value and eliminated the ones that did not have an “Add to cart” button on their product page at the time of product selection. Next, I sorted the Books based on the accumulated engagement—number of customer ratings received by the Books. I again sorted the top 10 books obtained from the previous sorting based on star ratings received by the Books . I selected top 7 books from the second sorting for the experiment (see Table 5.4 for the shortlisted books). Action “follow contributor” is the only action that is performed on contributors’ Amazon profile pages 4. I selected contributors who contributed to the most number of debunking (-1), neutral (0) and promoting (1) books. I retained only those who had a profile page on Amazon. Table 5.3 lists the selected contributors. 5.4.3.4 How do we design the experimental setup? I performed all six actions explained in Section 5.4.3.2 and Table 5.2 on Books (or contributors of the books in case of action “follow contributor”) that are either all debunking, neutral or promoting health misinformation. 
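The two-stage product selection described in Section 5.4.3.3 can be sketched in a few lines. The pandas snippet below assumes the annotated products from the Unpersonalized audit sit in a DataFrame; the column names are illustrative, not the audit's actual schema.

```python
import pandas as pd

def select_history_books(products: pd.DataFrame, stance: int, k: int = 7) -> pd.DataFrame:
    """Pick the k Books used to build account history for one annotation value.

    `products` is assumed to have columns: category, annotation (-1/0/1),
    has_add_to_cart (bool), num_ratings, star_rating.
    """
    books = products[(products["category"] == "Books")
                     & (products["annotation"] == stance)
                     & (products["has_add_to_cart"])]
    # First sort: accumulated engagement, i.e. number of customer ratings.
    top_engaged = books.sort_values("num_ratings", ascending=False).head(10)
    # Second sort: star rating among the ten most-engaged books; keep the top k.
    return top_engaged.sort_values("star_rating", ascending=False).head(k)

# e.g. the seven debunking (-1), neutral (0) and misinformative (1) books of Table 5.4:
# shortlist = {s: select_history_books(df, s) for s in (-1, 0, 1)}
```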
Each action and product type combination was acted upon by two treatment accounts. One account built its search history by first performing searches on Amazon and then viewing search results sorted 4The contributors could be authors, editors, people writing foreward of a book, publisher, etc. 102 5.4. METHODOLOGY # Debunking products Neutral products Misinformative productstitle (url code) S R title (url code) S R title (url code) S R 1 Vaccinated: One Man’s Quest to Defeat the World’s Deadliest Diseases (006122796X) 4.7 134 Baby’s Book: The First Five Years (Woodland Friends) 144131976X 4.9 614 Dissolving Illusions: Disease, Vaccines, and The Forgotten History (1480216895) 4.9 953 2 Epidemiology and Prevention of Vaccine- Preventable Dis- eases, 13th Edition (990449114) 4.5 11 My Child’s Health Record Keeper (Log Book) (1441313842) 4.8 983 The Vaccine Book: Making the Right De- cision for Your Child (Sears Parenting Library) (0316180521) 4.8 1013 3 The Panic Virus: The True Story Behind the Vaccine-Autism Con- troversy (1439158657) 4.4 175 Ten Things Every Child with Autism Wishes You Knew, 3rd Edition: Revised and Updated paper- back (1941765882) 4.8 792 The Vaccine-Friendly Plan: Dr. Paul’s Safe and Effective Ap- proach to Immunity and Health-from Pregnancy Through Your Child’s Teen Years (1101884231) 4.8 877 4 Vaccines: Expert Con- sult - Online and Print (Vaccines (Plotkin)) (1455700908) 4.4 18 Baby 411: Your Baby, Birth to Age 1! Ev- erything you wanted to know but were afraid to ask about your newborn: breast- feeding, weaning, calming a fussy baby, milestones and more! Your baby bible! (1889392618)) 4.8 580 How to End the Autism Epidemic (1603588248) 4.8 717 5 Bad Science(865479186) 4.3 967 Uniquely Human: A Different Way of Seeing Autism (1476776245) 4.8 504 How to Raise a Healthy Child in Spite of Your Doctor: One of America’s Leading Pediatricians Puts Parents Back in Control of Their Children’s Health (0345342763) 4.8 598 6 Reasons to Vacci- nate: Proof That Vaccines Save Lives (B086B8MM71) 4.3 232 The Whole-Brain Child: 12 Revolu- tionary Strategies to Nurture Your Child’s Developing Mind (0553386697) 4.7 2347 Miller’s Review of Critical Vaccine Stud- ies: 400 Important Scientific Papers Summarized for Par- ents and Researchers (188121740X) 4.8 473 7 Deadly Choices: How the Anti-Vaccine Movement Threatens Us All (465057969) 4.2 223 We’re Pregnant! The First Time Dad’s Pregnancy Handbook (1939754682) 4.7 862 Herbal Antibiotics, 2nd Edition: Natural Alternatives for Treat- ing Drug-resistant Bacteria (1603429875) 4.7 644 Table 5.4: Books corresponding to each annotation value shortlisted to build account histories in the Personalized audit. S represents the star rating of the product and R denotes the number of ratings received by the book. 103 CHAPTER 5. AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION search amazon product amazon login & acc. verification search amazon product amazon login & acc. verification search amazon product amazon login & acc. verification search amazon product amazon login & acc. verification amazon login & acc. verification amazon login & acc. 
verification click amazon product & save HTML add product to cart and save HTML click amazon product, save HTML & go to review page mark top rated,posi- tive review' helpful click amazon product & save HTML go to amazon homepage & save HTML search query1 select filter1 & save SERP collect auto- complete suggestions repeat for all 48 search queries select filter2 & save SERP go to contributor's webpage and follow login to google.com, search product, save HTML and go to amazon.com go to amazon homepage & save HTML search query1 select filter1 & save SERP collect auto- complete suggestions repeat for all 48 search queries select filter2 & save SERP go to amazon homepage & save HTML search query1 select filter1 & save SERP collect auto- complete suggestions repeat for all 48 search queries select filter2 & save SERP go to amazon homepage & save HTML search query1 select filter1 & save SERP collect auto- complete suggestions repeat for all 48 search queries select filter2 & save SERP go to amazon homepage & save HTML search query1 select filter1 & save SERP collect auto- complete suggestions repeat for all 48 search queries select filter2 & save SERP go to amazon homepage & save HTML search query1 select filter1 & save SERP collect auto- complete suggestions repeat for all 48 search queries select filter2 & save SERP time t time t+90min time t+150min Day 0 Day 1-7 Treatments amazon login & acc. verification Control search query1 select filter1 & save SERP collect auto- complete suggestions repeat for all 48 search queries select filter2 & save SERP Figure 5.5: Steps performed by treatment and control accounts in Personalized audit corresponding to the 6 different features. by filters “featured” and “average customer review” while the other did the same but sorted results by “price low to high” and “newest arrivals”5. I did not use the filter “price high to low” since intuitively it is less likely to be used during searches. I also created 2 control accounts corresponding to 2 treatments that emulated the same actions as the treatment except that they did not build account histories by per- 5Every account created for this experiment was run by a bot. It was not possible for a bot to complete the following order of tasks in 24 hours because of wait times added between every subsequent actions– building history using a particular action, searching for 48 search queries sorted by 4 filters and collect auto-complete suggestions for those queries. Thus, every action-product type combination was performed on two accounts. First account, sorted the search results by two filters and second account sorted results using remaining two filters. I call these two accounts replicates since they built their history in the same way. 104 5.4. METHODOLOGY forming one of the 6 user actions. Like 2 treatment accounts, the first control account searched for 48 queries curated in Section 5.4.1.2 and sorted them by filters “Featured” and “Average customer Review” while the other control sorted them by the remain- ing two filters. Figure 5.5 outlines the experimental steps performed by treatment and control accounts. I also created twins for each of the control accounts. The twins performed the exact same tasks as the corresponding control. Any inconsistencies between a control account and its twin can be attributed to noise, and not personal- ization. Remember, Amazon’s algorithms are a black box. 
Even after controlling for all known possible sources of noise, there could be some sources that I am not aware of or the algorithm itself could be injecting some noise in the results. If the difference between search results of control and treatment is greater than the baseline noise, only then it can be attributed to personalization. Prior audit work have also adopted the strategy of creating a control and its twin to differentiate between the effect due to noise versus personalization [188]. Overall, I created 40 Amazon accounts (6 actions X 3 tested values X 2 replicates for filters + 2 control accounts + 2 twin accounts). Next, I discuss the components collected from each account. 5.4.3.5 What components should we collect for the personalized audits? I collected search results and auto-complete suggestions for all accounts and recom- mendations only for the treatment accounts. Search results were sorted by filters ‘featured”, “average customer review”, “price low to high” and “newest arrivals”. Once a user starts building their account history, Amazon displays several recommen- dations to drive engagement on the platform. I collected various types of recommen- dations spread across three recommendation pages—homepage, product pages and pre-purchase page. Pre-purchase pages were only collected for the accounts that per- form “add to cart” action. Additionally, product pages were collected for accounts that clicked on search results while creating their respective account history. Each of the aforementioned pages consist of several recommendation types, such as “Customers who bought this item also bought”, etc. I collected the first product present in each of these recommendation types from both product pages and pre-purchase pages and two products from each type from the homepages for further analysis. Refer to Table 5.1d and Figures 5.1a, 5.1b and 5.1c for examples of these recommendation types. 105 CHAPTER 5. AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION 5.4.3.6 How do we control for noise? To control for extraneous sources of noise, I adopted a number of measures from previous audit studies [187, 188]. First, all VMs had same configuration, architecture and operating system. Second, I ran all VMs from same geolocation to control for the effect of location. Third, I controlled for demographics by setting the same gender (Female) and Age (birth date 1/1/1995) for newly creating Google accounts. Recall, that these Google accounts were used to sign-up for the Amazon accounts. Since, the VMs were newly created, the browser had no search history that could otherwise hint towards users’ demographics. Fourth, all accounts created their histories at the same time. They also performed the Amazon searches at the same time each day, thus, controlling for temporal effects. I also control for the category of the products used in building account histories. I selected books that have accumulated highest engagement for the experiments. Lastly, I did not account for carry over effects since it affected all the treatment and control accounts equally. 5.4.3.7 Implementation details Figure 5.5 illustrates the experimental steps. I ran 40 selenium bots on 40 VMs. Each selenium bot operated on a single Amazon account. On day 0, I manually logged in to each of the accounts by entering login credentials and performing account verification. Next day, experiment began at time t. All bots controlling treatment accounts started performing various actions to build history. 
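A rough sketch of the daily routine of one treatment bot, following the timings shown in Figure 5.5, is given below; the helper names are placeholders rather than the actual audit code, and control accounts run the same routine without the history-building step.

```python
AUDIT_QUERIES = ["vaccine", "anti vaccination", ...]   # the 48 queries from Section 5.4.1.2

def run_treatment_day(bot, account):
    """One day of a treatment account (cf. Figure 5.5); helpers are placeholders."""
    # time t: build history by acting on a single book (or contributor)
    bot.perform_history_action(account.action, account.todays_product)

    # time t + 90 min: snapshot the homepage recommendations
    bot.save_homepage()

    # time t + 150 min: run the audit queries with this account's two filters,
    # saving each SERP and the auto-complete suggestions for every query
    for query in AUDIT_QUERIES:
        for sort_filter in account.filters:     # e.g. "featured", "average customer review"
            bot.search_and_save_serp(query, sort_filter)
        bot.save_autocomplete(query)
```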
Note, everyday bots built history by performing actions on a single Book/contributor. At time t+90, bots collected and saved Amazon homepage. Later, all 40 accounts (control+treatment) searched for 48 queries with different search filters and saved the SERPs. Next, the bots collected and saved auto-complete suggestions for all 48 queries. I included appropriate wait times between every step to prevent accounts from being recognized as bots and getting banned in the process. I repeated these steps for a week. At the end of the week, for each treatment account I had collected personalized search results, recommendations and auto-complete suggestions. Next, I annotated the collected search results and recommendations to determine their stance on misinformation so that later I could analyze them to study the effect of user actions on the amount of misinformation presented to users in each component. 106 5.4. METHODOLOGY A. Scale Value Annotation Description Annotation Heuristics Sample Amazon Products -1 debunks vac- cine misinfor- mation Product debunks, derides OR provides evidence against the myths/controversies surrounding vaccines OR helps understand anti-vaccination attitude OR promotes use of vaccination OR describes history of a disease and details how its vaccine was developed OR describes scientific facts about vaccines that help users to understand how they work OR debunks other health- related misinforma- tion 0 neutral health related informa- tion All medicines and antibodies OR medical equipment (thermometer, syringes, record-books, etc.) OR dietary supplements that do not violate Amazon’s policy OR products about animal vaccination and diseases OR health-related products not promoting any conspirato- rial views about health and vaccines 1 promotes vac- cine and other health related misinformation Product promotes disuse of vaccines OR promotes anti- vaccine myths, controversies or conspiracy theories sur- rounding the vaccines OR advocates alternatives to vac- cines and/or western medicine (diets, pseudoscience methods like homeopathy, hypnosis, etc.) OR product is a misleading dietary supplement that violates Amazon’s policy on dietary supplements- the supplement states that it can cure, mitigate, treat, OR prevent a disease in hu- mans, but the claim is not approved by the FDA OR it promotes other health-related misinformation OR pro- motes other health-related misinformation 2 unknown Product’s description and metadata is not sufficient to an-notate it as promoting, debunking or neutral information 3 removed Product’s URL is not accessible at the time of annotation - 4 Other language Product’s title and description is in language other thanenglish 5 Unrelated Non-health related products Table 5.5: Description of annotation scale, heuristics along with sample products corresponding to each annotation value. 107 CHAPTER 5. 
AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION

Figure 5.6: Qualitative Coding Process (initial qualitative codification by the expert, multiple rounds of refinement with six researcher annotators, feedback from an external researcher, and finalization of the coding scheme).

5.4.4 Annotating Amazon data for health misinformation

Unlike determining partisan bias, where bias can be determined by training models on features such as news source bias [331], labelling a product for misinformation is hard and time-consuming. There are no pre-determined sources of misinformation, such as a list of sellers or authors of misinformative products on Amazon. Additionally, I found that the annotation process for some categories of products, such as Books and Kindle eBooks, required me to consider the product image, read the book’s preview if available, and even perform an external search about the authors. Therefore, I decided to manually annotate the collected data. I developed a qualitative coding scheme to label the Amazon data collection through an iterative process that required several rounds of discussions to reach an agreement on the annotation scale. In the first round, the first author randomly sampled 200 Amazon products across different topics and categories. After multiple iterations of analyzing and interpreting each product, the author came up with an initial 7-point annotation scale. Then, six researchers with extensive work experience on online misinformation independently annotated 32 products, randomly selected from the 200 products. I discussed every product’s annotation value and the researchers’ annotation process. I refined the scale as well as the scheme based on the feedback. This process was repeated three times, after which all six annotators reached a consensus on the annotation scheme and process. In the fourth round, I gathered additional feedback from an external researcher from the Credibility Coalition group6—an international organization of interdisciplinary researchers and practitioners dedicated to developing standards for news credibility and tackling the problem of online misinformation. The final result of the multi-stage iterative process (see Figure 5.6) is a 7-point annotation scale comprising annotation values ranging from -1 to 5 (see Table 5.5). The scale provides an overview of the scientific quality of the products users are exposed to when they make vaccine-related searches on Amazon.

6https://credibilitycoalition.org/

5.4.4.1 Annotation Guidelines

In order to annotate a product, the annotators were required to go through several fields present on the product’s detail page in the following order: title, description, top critical and top positive reviews about the product, and other metadata present on the detail page, such as editorial reviews, legal disclaimers, etc.
If the product is a book, the annotators are also recommended to do the following three steps: (1) go through the first few pages in the book preview 7 (2) see other books published by the authors, (3) perform a google search on the book and go through the first few links to discover more information about the book. 5.4.4.2 Annotation scale and heuristics: Below I describe each value in my annotation scale. Debunking (-1): Annotation value ‘-1’ indicates that the product debunks vaccine misinformation or derides any vaccine-related myth or conspiracy theory or promotes the use of vaccination. As an example, consider the poster titled Immunization Poster 1979 Vintage Star Wars C-3PO R2-D2 Original (B00TFTS194)8 that encourages parents to vaccinate their children. Products helping users understand anti-vaccination attitude are also included in this category. For example, consider a book titled Health, Risk and News: The MMR Vaccine and the Media (Media and Culture) (0820488380) which explores the controversy surrounding MMR vaccine and autism and investigates how media played a role in panicking the public. Moreover, products are also considered “debunking” if they describe the history about the development of vaccines or the science behind how vaccines work. Promoting (1): Conversely, I annotated a product as ‘1’ if it promotes any kind of vaccine or health-related misinformation. This category includes all products that 7Amazon has introduced a Look Inside feature that allows users to preview few pages from the book. 8Every title of the Amazon product is followed by a URL id. This URL id can be converted into a url using the format: http://www.amazon.com/dp/url_id 109 CHAPTER 5. AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION support or substantiate any vaccine related myth or controversies and encourages parents to raise a vaccine-free child. For example, consider the following books that promote anti-vaccination agenda. In A Summary of the Proofs that Vaccination Does Not Prevent Small-pox but Really Increases It (B01G5QWIFM), the author talks about dangers of large scale vaccination and in Vaccine Epidemic: How Corporate Greed, Biased Science, and Coercive Government Threaten Our Human Rights, Our Health, and Our Children (B00CWSONCE), the authors question vaccine safety and present several narratives of vaccine injuries. I annotated both books as 1. Several Amazon Fashion (B07R6PB2KP) products, Amazon Home (B01HXAB7TM) merchandise and cell phone accessories (B07Z9LDBD5) are also included in this category since they contained anti-vaccine slogans like “Educate before you Vaccinate”, “Jesus wasn’t vaccinated”, etc. I also include all products advocating any alternatives to vaccines in this category. Consider the book Vaccine Free: Prevention and Treatment of Infectious Contagious Disease with Homeopathy (1482789604) that not only encourages people to use homeopathy as a vaccine alternative to treat or prevent diseases but also instills fear in the minds of the public by discussing instances of vaccine injuries. Additionally, I include products that promote other health-related misinformation in this category. For example, the diet book titled Natural Immune Support For People on the Go: Food, Diet, Immune Support Supplements, Colloidal Silver and Many Other Natural Remedies. Learn How to Naturally Boost Your Immune System! (B086C1WT36) includes recipes with colloidal silver as an ingredient. 
According to the US Department of Health and Services, consumption of colloidal silver can be dangerous to health9, and thus, this book was annotated with value ‘1’. Dietary supplements that claim to cure diseases in their description but are not approved by Food and Drug Administration (FDA) are also included in this cate- gory.10 For example, consider the dietary supplement Yinchiao Tablet Herbal Supplement (1586377791) that claims to cure pediatric ear infection, acute bronchitis, tonsillitis, pneumonia, pharyngitis, parotitis, measles, and influenza despite the claims not being approved by FDA. Not just dietary supplements, there are several books that claim to treat health conditions using unproven techniques. For example, the book Weight Loss Hypnosis For Women: How to Lose Weight Quickly Using Meditation, Affirmations, And Other Hypnosis Techniques found during the audit suggests self-hypnosis techniques to 9https://www.nccih.nih.gov/health/colloidal-silver 10Note that for the dietary supplements category, Amazon asks sellers not to state that the products cure, mitigate, treat, or prevent a disease in humans in their details page, unless that statement is approved by the FDA [87] 110 5.4. METHODOLOGY help lose weight (B0881V7RBL). Neutral (-0): I annotated all medical equipment and medicines as neutral (annotation value ‘0’). Note that it is beyond the scope of this project to determine the safety and veracity of the claims of each medicine sold on the Amazon platform. This means that the number of products that I have determined to be promoting (1) serve as the lower bound of the amount of misinformation present on the platform. This category also includes dietary supplements that do not violate Amazon’s policy and pet/animal- related products. Health-related products not advocating a conspiratorial view are also included in this category. Other annotations: I annotated a product as ‘2’ if the product’s description and metadata were not sufficient to determine the stance of the product. I assigned values ‘3’ and ‘4’ to all products whose URL was not accessible at the time of the annotation and whose title and description was in a language other than English, respectively. I annotated ‘all non-health related products (e.g. diary, carpet, electronic products, etc.) with value ‘5’. Table 5.5 presents examples of products belonging to these categories. Both the audits resulted in a dataset of 4,997 Amazon products that were annotated by the first author and Amazon Mechanical Turk workers (MTurks). The first author being the expert annotated majority of products (3,367) to determine what would be a good task representation to obtain high quality annotations for the remaining 1,630 products from novice MTurks. I obtained three Turker ratings for each remaining product and used the majority response to assign the annotation value. My task design worked. For 97.9% of the products, annotation values converged. Only 34 products had diverging responses. The first author then annotated these 34 products to obtain the final set of annotation values. I describe the AMT job in detail in the next section. 5.4.4.3 Amazon Mechanical Turk Job Turk job description: In this section, I describe how I obtained annotations for the study from Amazon Mechanical Turk workers (MTurks). Past research has shown that it is possible to get good data from crowd-sourcing platforms like Amazon Mechanical Turk (AMT) if the workers are screened and trained for the crowd-sourced task [282]. 
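The aggregation rule applied to the Turker responses described above is simple majority voting, with the expert annotation used as a fallback for the 34 products whose three responses all diverged. A minimal sketch of that step, with illustrative inputs:

```python
from collections import Counter

def aggregate_labels(turker_labels: dict[str, list[int]],
                     expert_label: dict[str, int]) -> dict[str, int]:
    """Resolve three MTurk annotations per product into one final label."""
    final = {}
    for product_id, labels in turker_labels.items():
        value, count = Counter(labels).most_common(1)[0]
        # At least two of three responses agree -> majority; otherwise expert decides.
        final[product_id] = value if count >= 2 else expert_label.get(product_id)
    return final

# e.g. aggregate_labels({"B07NQW27VD": [1, 1, 0]}, {}) -> {"B07NQW27VD": 1}
```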
Below I describe the screening process and the annotation task briefly. Screening: To get high quality annotations, I screened MTurks by adding 3 qualifi- cation requirements. First, I required MTurks to be Masters. Second, I required them 111 CHAPTER 5. AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION to have atleast 90% approval rating. And lastly, I required them to get a full score of 100 in a Qualification Test. I introduced a test to ensure that MTurks attempting the annotation job had a good understanding of the annotation scheme. The test had one eligibility question asking them to confirm whether they are affiliated to authors’ University. Other three questions involved Mturks to annotate three Amazon prod- ucts. First author annotated these products and thus, their annotation values were known. To ensure MTurks understood the task and annotation scheme, I gave detailed instructions and described each annotation value in detail with various examples of Amazon products in the qualifying test. Examples were added as visuals. In each ex- ample, I marked the meta data used for the annotation and explained why a particular annotation value was assigned to the product. I took two steps to ensure that instructions and test questions were easy to un- derstand and attempt. First, I posted the test on subreddit r/mturk11—a community of MTurks, to obtain feedback. Second, I did a pilot run by posting ten tasks along with the aforementioned screening requirements. After obtaining positive feedback from the community and successful pilot-run, I released the AMT job titled “Amazon product categorization task”. I paid the Turks according to the United States federal minimum wage ($7.25/hr). Additionally, I did not disapprove any worker’s responses. Amazon product categorization task: I posted 1630 annotations (tasks) in batches of 50 at a time. The job was setup to get three responses for each annotation value. The majority response was selected to label the Amazon product. To avoid any MTurk bias, I did not explicitly reveal that the idea behind the task was to get misinformation annotations. I used the term "Amazon product categorization" to describe the project and task throughout. For 34 products, all three MTurk responses differed. The first author then annotated these products to get annotation values. 5.4.5 Quantifying misinformation bias in SERPs: In this section, I describe the method to determine the amount of misinformation present in search results collected in both sets of audits. How do I estimate the mis- information bias present in Amazon’s SERPs? First, I used the annotation scheme to assign misinformation bias scores (si) to individual products present in SERPs. I converted the 7 point ( -1 to 5) scale to misinformation bias scores with values -1, 0 and 1. I mapped annotation values 2, 3, 4, and 5 to bias score 0. Because of the mapping, the 11https://www.reddit.com/r/mturk/ 112 5.4. METHODOLOGY bias calculations will give a conservative estimate (lower bound) of misinformation bias present in the search results. Now, a product can be assigned one of the three bias scores: -1 suggests that product debunks misinformation, 0 indicates a neutral stance and 1 implies that the product promotes misinformation. Next, to quantify misinformation bias in Amazon’s SERPs, I adopt the framework and metrics proposed in prior work to quantify partisan bias in Twitter search results [242]. 
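The collapse from annotation values to bias scores described above amounts to a one-line mapping; a small sketch:

```python
def to_bias_score(annotation: int) -> int:
    """Map a 7-point annotation value (-1..5) to a bias score in {-1, 0, 1}.
    Values 2-5 (unknown, removed, other language, unrelated) become 0."""
    return annotation if annotation in (-1, 0, 1) else 0
```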
Below I discuss the three kinds of bias proposed by the framework and delineate how I estimate each bias with respect to misinformation. Table 5.6 illustrates how I calculated the bias values.

(i) The input bias (ib) of a list is the mean of the misinformation bias scores of its constituting elements [242]. Therefore, ib = (1/n) ∑_{i=1}^{n} s_i, where n is the length of the list and s_i is the misinformation bias score of the i-th item in the list. Input bias is an unweighted bias, i.e., it is not affected by the rank/ordering of the items. An ib of 1 indicates that all items in the list promote misinformation. By contrast, an ib of -1 indicates that all items in the list debunk misinformation.

(ii) The output bias (ob) of a ranked list is the overall bias present in the SERP and is the sum of the biases introduced by the input and by the ranks of the input. It is computed as the cumulative weighted average of the misinformation bias scores of items in the ranked list [242]. The score assigns more weight to higher ranked items. I first calculate the weighted bias score B(r) of every rank r, which is the average misinformation bias of the results ranked from 1 to r. Thus, B(r) = (1/r) ∑_{i=1}^{r} s_i, where s_i is the misinformation bias score of the i-th item. The output bias (ob) is the average of the weighted bias scores B(r) over all ranks. Thus, by definition, ob = (1/n) ∑_{r=1}^{n} B(r).

(iii) The ranking bias (rb) is the bias introduced by the ranking algorithm of the search engine [242]. It is calculated by subtracting the input bias from the output bias. Thus, rb = ob − ib. A high ranking bias indicates that the search algorithm ranks misinformative products higher than neutral or debunking products.

Why do I need three bias scores? Amazon’s search algorithm is not only selecting the products to be shown in the search results but is also ranking them according to its internal algorithm. Therefore, the overall bias (ob) could be introduced either at the product selection stage (ib), at the ranking stage (rb), or at both. Studying all three biases gives an elaborate understanding of how biases are introduced by the search algorithm. All three bias values (ib, ob and rb) lie between -1 and 1. A bias score larger than 0 indicates a lean towards misinformation. Conversely, a bias score less than 0 indicates a propensity towards debunking information. I only consider the top 10 search results in each SERP. Thus, in the bias calculations, the rank always varies from 1 to 10.

Rank r | Item | Bias of each product | Bias till rank r | Value
1 | i1 | s1 | B(1) | s1
2 | i2 | s2 | B(2) | (1/2)(s1 + s2)
3 | i3 | s3 | B(3) | (1/3)(s1 + s2 + s3)
Input bias (ib) = (1/3)(s1 + s2 + s3)
Output bias (ob) = (1/3)[s1(1 + 1/2 + 1/3) + s2(1/2 + 1/3) + s3(1/3)]
Ranking bias (rb) = ob − ib

Table 5.6: Example illustrating the bias calculations. For a given query, Amazon’s search engine presents users with the products i1, i2 and i3 in its search results. The misinformation bias scores of the products are s1, s2 and s3 respectively. The table has been adapted from previous work [242]. A bias score larger than 0 indicates a lean towards misinformation.

5.5 RQ1 Results [Unpersonalized audit]: Quantify misinformation bias

The aim of the Unpersonalized audit is to determine the amount of bias in search results. Below I present the input, rank, and output bias detected by the audit in the search results of all 10 vaccine-related topics with respect to the 5 search filters.
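As a reference for how the numbers reported below are computed, the following sketch mirrors the Section 5.4.5 formulas for a single ranked result list; it is an illustration rather than the audit implementation.

```python
def bias_scores(s: list[int]) -> tuple[float, float, float]:
    """Input, output and ranking bias of one ranked result list.

    `s` holds the misinformation bias scores (-1, 0 or 1) of the results,
    ordered by rank; in the audit the top 10 results are used.
    """
    n = len(s)
    ib = sum(s) / n                                  # unweighted mean
    # B(r): mean bias of the results ranked 1..r, for every rank r
    B = [sum(s[:r]) / r for r in range(1, n + 1)]
    ob = sum(B) / n                                  # rank-weighted average
    rb = ob - ib                                     # bias added by the ranking
    return ib, ob, rb

# Example with three results: a misinformative product at rank 1 yields a
# positive output bias even though the input bias is zero.
# bias_scores([1, 0, -1]) -> (0.0, 0.5, 0.5)
```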
5.5.1 RQ1a: Search results I collected 36,000 search results from the Unpersonalized audit run, out of which 3,180 were unique. Recall, I collected these products by searching for 48 search queries belonging to vaccine-related topics and sorting results by each of the 5 Amazon filters. I later extracted and annotated top 10 search results from all the collected SERPs resulting in 3,180 annotations. Figure 5.7a shows the number (and percentage) of products corresponding to each annotation value. Through the audits, I find a high percentage (10.47%) of misinformative products in the search results. Moreover, the number of misinformative products outnumbered the debunking products. Figure 5.8 illustrates the distribution of categories of Amazon products annotated as debunking (-1), neutral (0) and promoting (1). Note that the products promoting health misin- formation primarily belong to categories Books (35.43%), Kindle eBooks (28.52%), 114 5.5. RQ1 RESULTS [UNPERSONALIZED AUDIT]: QUANTIFY MISINFORMATION BIAS 30.06% 0.97% 3.23% 5.44% 10.47%8.99% 1200 1000 800 600 400 200 0 debunking neutral promoting unable to annotate unrelatedother language URL not accessible N o. o f s ea rc h re su lts 40.81% (a) Search results 1.99% 37.56% 12.95% 0.48% 2.80% 0.21% 43.98% 0 debunking neutral promoting unable to annotate unrelatedother language URL not accessible 800 N o. o f r ec om m en da tio ns 700 600 500 400 300 200 100 (b) Recommendations Figure 5.7: RQ1a: (a) Number (percentage) of search results belonging to each an- notation value. While majority of products have a neutral stance (40.81%), products promoting health misinformation (10.47%) are greater than products debunking health misinformation (8.99%). (b) Number (percentage) of recommendations belonging to each annotation value. A high percentage of product recommendations promote misinformation (12.95%) while percentage of recommendations debunking health misinformation is very low (1.99%). Amazon Fashion Amazon Home Amazon Audiobooks Books Health & Personal Care Other Office Products Kindle eBooks Product Categories debunking neutral promoting No. of products 0 200 400 600 800 1000 1200 Figure 5.8: RQ1a: Figure showing categories of promoting, neutral and debunking Amazon products (search results). All categories occurring less than 5% were com- bined and are presented as other category. Note that misinformation exists in various forms on Amazon. Products promoting health misinformation include books (Books, Kindle eBooks, Audible Audiobooks), apparel (Amazon Fashion) and dietary supple- ments (Health & Personal Care). Additionally, proportion of books promoting health misinformation is much greater than proportion of books debunking misinformation. [Categories of debunking, neutral and promoting Amazon products] Debunking products mostly belong to categories, Kindle eBooks, Books, Amazon fashion and Amazon home. Neutral products mostly belong to categories Books, Kindle eBooks, Health & Personal care and Amazon home. Promoting products belong to categories Books, Kindle eBooks, Health & Personal care and Amazon fashion. Amazon Fashion (12.61%)—a category that includes t-shirts, apparel, etc. and Health & Personal Care (10.21%)—a category consisting of dietary supplements. Below I discuss the misinformation bias observed across all the vaccine-related topics, the Amazon search filters and search queries. 115 CHAPTER 5. AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION mmr influenza vacc. immunization varicella vacc. 
vaccination hpv vacc. mmr vacc.& autism hepatitis vacc. controversies andrew wak. featured cust. rev. price LtoH price HtoL new arriv. featured cust. rev. price LtoH price HtoL new arriv. featured cust. rev. price LtoH price HtoL new arriv. input bias rank bias output bias 1.0 0.5 0.0 -0.5 -1.0 Figure 5.9: RQ1a: Input, rank and output bias for all 10 vaccine-related topics across five search filters. The bias scores are average of scores obtained for each of the 15 days. Input and rank bias is positive (>0) in the search results of majority of topics for filters “featured” and “average customer review”. A bias value greater than 0 indicates a lean towards misinformation. Topics “andrew wakefield” and “mmr vaccine & autism” have a positive input bias across all five filters indicating that search results of these topics contain large number of products promoting health misinformation irrespective of the filter used to sort the search results. Topic “vaccination” has the highest overall bias (output bias) of 0.63 followed by topic “andrew wakefield” that has output bias of 0.53. 5.5.1.1 Misinformation bias in vaccine related topics I calculate the input, rank and output bias for each of the 10 search topics. All the bias scores presented are average of scores obtained across the 15 days of audit. The bias score for a topic is also the average across each of the constituting search queries. Figure 5.9 shows the bias scores for all the topics, search filters and bias combinations. Input bias: I observe a high input bias (>0) for all topics except “hepatitis” for the “average customer review” filter indicating presence of large number of misinformative books in the SERPs when search results are sorted by this filter. Similarly, input biases for most topics is also positive for “featured” filter. Note, “featured” is the default Amazon filter. Thus, by default Amazon is presenting more misinformative search results to users searching for vaccine related queries. Topics “andrew wakefield”, “vac- cination” and “vaccine controversies” have highest input biases for the both “featured” and “average customer review” filters. Another noteworthy trend is the negative input bias for 7 out of 10 topics with respect to filter “newest arrivals” indicating that there are more debunking products present in the SERP when users look for newly 116 5.5. RQ1 RESULTS [UNPERSONALIZED AUDIT]: QUANTIFY MISINFORMATION BIAS featured avg.customer reviews price low to high price high to low newsest arrival input bias rank bias output bias -1 0 1 Figure 5.10: Input, rank and output bias for all filter types. appearing products on Amazon. “andrew wakefield” and “mmr vaccine & autism” are the only two topics that have the high input bias (>0) across all the five filters. Interestingly, there is no topic that has negative input bias across all filters. Recall, a negative (<0) bias indicates a debunking lean. Topics “mmr” and “hepatitis” have negative bias scores in four out of five filters. Rank bias: 8 out of 10 topics have positive rank bias for filters “price low to high” and “average customer reviews” and 6 out of 10 topics have positive rank bias for filter “featured”. These results suggest that Amazon’s ranking algorithm favors misin- formative products and ranks them higher when customers filter their search results by the aforementioned filters. Some topics have negative input bias but positive rank bias. 
Consider topic “mmr” with respect to filter “price low to high” whose input bias is -0.1 but the rank bias is 0.065. This observation suggests that although the SERPs obtained had more debunking products, a few misinformative products were still ranked higher. Rank bias for 8 out of 10 topics with respect to filter “newest arrivals” was negative, similar to what I observed for input bias. Output bias: Output bias is positive (>0) for most of the topics with respect to filters “featured” and “average customer reviews”. Recall, a bias value greater than 0 indicates a lean towards misinformation. Topic “vaccination” has the highest output bias value of 0.63 for filter “featured”. On the other hand, topic “hepatitis” has least output bias for filter “newest arrivals”. 5.5.1.2 Misinformation bias in search filters Figure 5.10 shows the results for all 5 filters. Bias scores are averaged across all search queries. All 5 filters except “newest arrivals” have positive input, rank, and output 117 CHAPTER 5. AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION ib rb obrbob search query - Amazon filter ib vaccination is not immunization - custReview vaccination is not immunization - featured vaccination is not immunization - priceHtoL autism vaccine - custReview vaccine - custReview anti vaccine books - custReview vaccine friendly plan - featured anti vaccination - custReview anti vaccine - custReview anti vaccine books - featured vaccine free me book - featured andrew wakefield - custReview andrew wakefield - featured wakefield autism - custReview vaccine controversy - custReview wakefield autism - featured vaccine friendly plan - custReview vaccine free me - featured anti vaccine shirt - featured varicella vaccine - custReview 1.0 0.5 0.0 -0.5 -1.0 search query - Amazon filter Figure 5.11: Top 20 search query-filter combinations with highest output bias. In other words, these query-filter combinations are the most problematic ones containing highest amount of misinformation. misinformation bias. Filter “average customer review” has the highest positive bias indicating that misinformative products belonging to vaccine related topics receive higher ratings. I present the implications of these results in the Discussion section. 5.5.1.3 Misinformation bias in search queries Figure 5.11 shows the top 20 search queries and filter combinations with highest output bias. Predictably, filter “newest arrivals” does not appear in any instance. Surprisingly, 9 search query-filter combinations have very high output biases (ob > 0.9). Search query “vaccination is not immunization” has output bias of 1 for three filter types. Most of the search queries in Figure 5.11 have a negative connotation, i.e the queries themselves have a bias (e.g search queries anti vaccine books, vaccination is not immunization indicates an intent to search for misinformation). This observation indicates that if you search for anti vaccine stuff, you will get high amount of vaccine and health misinformation. The most troublesome observation is the presence of high output bias for generic and neutral search queries, “vaccine” (ob = 0.99) and “varicella vaccine” (ob = 0.79). These results indicate that, unlike companies like Pinterest, who have altered their search engines in response to vaccine related queries [383], Amazon has not made any modification to its search algorithm to push less anti vaccine products to users. 118 5.5. 
RQ1 RESULTS [UNPERSONALIZED AUDIT]: QUANTIFY MISINFORMATION BIAS (a) Customers who bought this item also bought (CBB) (b) Customers who viewed this item also viewed (CVV) (c) Frequently bought together (FBT) (d) Sponsored products related to this item (e) What other items customers buy after viewing this item (CBV). Note that the recommenda- tion graph for CBV recommendation type is indeed one figure. It consists of two disconnected components, indicating strong filter bubble effect. Figure 5.12: Recommendation graphs for 5 different types of recommendations col- lected from the product pages of top three search-results obtained in response to 48 search queries, sorted by 5 filters over a duration of 15 days during Unpersonalized audit run. denotes products annotated as misinformative, as neutral and as de- bunking. Node size is proportional to the times the product was recommended in that recommendation type. Large sized red nodes coupled with several interconnections between red nodes indicate a strong filter-bubble effect where recommendations of misinformative products returned more misinformation. 119 CHAPTER 5. AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION 5.5.2 RQ1b: Product page recommendations I extracted the product page recommendations of top 3 search results present in the SERPs. The product page constitutes various types of recommendations. For analy- sis, I consider the first product present in 5 types of recommendations “Customers who bought this item also bought”, “Customers who viewed this item also viewed”, “Frequently bought together”, “Sponsored products related to this item” and “What other items customers buy after viewing this item”. The process resulted in 16,815 recommendations out of which 1,853 were unique. Figure 5.7b shows the number and percentage of recommendations belonging to different annotation values. The percentage of misinformative recommendations (12.95%) is much higher than the de- bunking recommendations (1.95%). The total input bias in all 16,815 recommendations is 0.417 while in all 1,853 unique recommendations is 0.109, indicating a lean towards misinformation. Does filter-bubble effect occur in product page recommendations? To answer, I compared the misinformation bias scores of all types of recommendations considered together (refer Table 5.7). Kruskal Wallis Anova test revealed the difference to be significant (KW H(2, N=16815) = 6,927.6, p=0.0). Post-hoc Tukey HSD test showed that the product page recommendations of misinformative products contain more misin- formation when compared to recommendations of neutral and debunking products. Even more concerning is that the recommendations of debunking products have more misinformation than neutral products. To investigate further I qualitatively studied the recommendation graphs of each of the 5 recommendation types (Figure 5.12). Each node in the graph represents an Amazon product. An edge A→ B indicates that B was recommended in the product page of A. Node size is proportional to the number of times the product was recommended. 5.5.2.1 Recommendation type- Customers who bought this item also bought (CBB) Misinformation bias scores of CBB recommendations are significantly different for debunking, neutral, and promoting products (KW H(2, N=3133) = 2136.03, p=0.0). Post hoc tests reveal that CBB recommendations of misinformative products have more misinformation when compared to CBB recommendations of neutral and debunking products. 
Additionally CBB recommendations of neutral products have more misinfor- mation than CBB recommendations of debunking products. The findings are evident from Figure 5.12a too. For example, there are several instances of red nodes connected 120 5.5. RQ1 RESULTS [UNPERSONALIZED AUDIT]: QUANTIFY MISINFORMATION BIAS Type of product page recom- mendations Kruskal Wallis Anova Test Post hoc Tukey HSD d n m All KW H(2, N=16815) = 6,927.6, p=0.0 M>D & M>N & D>N 37 1576 240 Cust. who bought this item also bought (CBB) KW H(2, N=3133) = 2136.03, p=0.0 M >D & M>N & N>D 11 225 66 Cust. who viewed this item also viewed (CVV) KW H(2, N=6575) = 628.52, p=3.2e-137 M>D & M>N & D>N 18 331 100 Frequently bought together (FBT) KW H(2, N=2234) = 1611.34, p=0.0 M>D & M>N & D>N 1 111 16 Sponsored products related to this item KW H(2, N=388) = 277.08, p=6.8e-61 M>D & M>N 7 953 98 What other items cust. buy after viewing this item (CBV) KW H(2, N=4485) = 2673.95, p=0.0 M>D & M>N & D>N 9 230 57 Table 5.7: RQ1b: Analyzing echo chamber effect in product page recommendations. M, N and D are the means of misinformation bias scores of products recommended in the product pages of misinformative, neutral and debunking Amazon products respectively. Higher means indicate that recommendations contain more misinforma- tive products. For example, M>D indicates that recommendations of misinformative products have more misinformation than recommendations of debunking products. d, n and m are number of unique products annotated as debunking, neutral and promoting for each recommendation type. to each other. In other words, if you click on a misinformative search result, you will get misinformative products in CBB recommendations. Few of the green nodes are attached to red ones indicating that CBB recommendation of a neutral product sometimes contain a misinformative product. The most recommended product present in CBB is a misinformative Kindle book titled Miller’s Review of Critical Vaccine Studies: 400 Important Scientific Papers Summarized for Parents and Researchers (B07NQW27VD). 5.5.2.2 Recommendation type- Customers who viewed this item also viewed (CVV) Misinformation bias scores of CVV recommendations are significantly different for debunking, neutral and promoting products (KW H(2, N=4485) =2673.95, p=0.0) . Post hoc test indicates that CVV recommendations of misinformative products have more misinformation than CVV recommendations of debunking and neutral products. Notably, CVV recommendations of debunking products contain more misinformat- ion than CVV recommendations of neutral products. In the recommendation graph (Figure 5.12b ), I see edges connecting multiple red nodes supporting my finding that CVV recommendations of misinformative products mostly contain other misinfor- mative products. The most recommended product in this recommendation type is a misinformative Kindle book titled Dissolving Illusions (B00E7FOA0U). 121 CHAPTER 5. AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION 5.5.2.3 Recommendation type- Frequently bought together (FBT) Misinformation bias scores of FBT recommendations are significantly different for debunking, neutral and promoting products (KW H(2, N=2234) = 1611.34, p=0.0). Post hoc tests reveal that amount of misinformation in FBB recommendations of misin- formative products is significantly more than the FBB recommendations of neutral and debunking products. The finding is also evident from the graph (Figure 5.12c). 
There are large sized red nodes attached to other red nodes and several green nodes attached together indicating the presence of a strong filter-bubble effect. “Frequently bought together” can be considered an indicator of buying patterns on the platform. The post hoc tests indicate that people buy multiple misinformative products together. The most recommended product in this recommendation type is a misinformative Paperback book titled Dissolving Illusions: Disease, Vaccines, and The Forgotten History (1480216895). 5.5.2.4 Recommendation type- Sponsored products related to this item Most of the sponsored recommendations are either neutral or promoting (Figure 5.12d). Statistical test reveals that the misinformation bias score of sponsored rec- ommendations are significantly different among debunking, neutral and promoting products (KW H(2, N=6575) = 628.52, p=3.2e-137). Post hoc tests reveal same results as for CVV recommendations. There are two most recommended sponsored books. First is a misinformative paperback book titled Vaccine Epidemic: How Corporate Greed, Biased Science, and Coercive Government Threaten Our Human Rights, Our Health, and Our Children (1620872129). Second is a neutral Kindle book titled SPANISH FLU 1918: Data and Reflections on the Consequences of the Deadliest Plague, What History Teaches, How Not to Repeat the Same Mistakes (B08774MCVP). 5.5.2.5 Recommendation type- What other items customers buy after viewing this item (CBV) Misinformation bias scores of CBV recommendations are significantly different for debunking, neutral and promoting products (KW H(2, N=2234) = 1611.34, p=0.0). Post hoc tests reveal a filter-bubble effect in the product recommendations. CBV recommendations of misinformative products contain more misinformation than neutral or debunking products. Furthermore, CBV recommendations of debunking products contain more misinformation than neutral products. This is troubling since 122 5.5. RQ1 RESULTS [UNPERSONALIZED AUDIT]: QUANTIFY MISINFORMATION BIAS RQ2a RQ2b RQ2c Search results Recommendations Auto complete suggestions Featured Avg. customer reviews Price low to High Newest Arrivals Homepage Pre-purchase Product page Actions performed to build account history D N M D N M D N M D N M D N M D N M D N M D N M Search product IR IR IR NP NP NP NP NP NP NP NP NP - - - X X X X X X NP NP NP Search & click product IR IR IR NP NP NP NP NP NP NP NP NP KW H(2, N=42) = 32.07, p = 1.08e-07 M>N>D X X X KW H(2, N=42) = 24.89, p = 3.94e-06 M>D & M>N NP NP NP Search + click & add to cart product IR IR IR NP NP NP NP NP NP NP NP NP KW H(2, N=42) = 33.48, p = 5.38e-08 M>N>D KW H(2, 42) = 32.63, p = 8.19e-08 M>N>D KW H(2, N=42) = 24.05, p = 5.98e-06 M>D & M>N NP NP NP Search + click & mark “Top rated, All positive review” as helpful IR IR IR NP NP NP NP NP NP NP NP NP KW H(2, N=42) = 32.33, p = 9.52e-08 M>N>D X X X KW H(2, 42) = 23.36, p = 8.44e-06 M>N & M>D NP NP NP Following contributor IR IR IR NP NP NP NP NP NP NP NP NP - - - X X X X X X NP NP NP Search product on Google IR IR IR NP NP NP NP NP NP NP NP NP - - - X X X X X X NP NP NP Table 5.8: RQ2: Table summarizing RQ2 results. IR suggests noise and inconclusive results, i.e search results of control and its twin seldom matched. Thus, difference between treatment and control could either be attributed to noise or personalization, making it impossible to study the impact of personalization on misinformation. NP denotes little to no personalization. 
5.6 RQ2 Results [Personalized audit]: Effect of personalization

The aim of the Personalized audit was to determine the effect of personalization due to account history on the amount of misinformation returned in search results and various recommendations. Table 5.8 provides a summary.

Actions performed to build account history | Search results: Featured | Search results: Avg. customer reviews | Search results: Price low to high | Search results: Newest arrivals | Homepage | Pre-purchase | Product page | Auto-complete suggestions
Search product | IR | NP | NP | NP | - | X | X | NP
Search & click product | IR | NP | NP | NP | KW H(2, N=42) = 32.07, p = 1.08e-07; M>N>D | X | KW H(2, N=42) = 24.89, p = 3.94e-06; M>D & M>N | NP
Search + click & add to cart product | IR | NP | NP | NP | KW H(2, N=42) = 33.48, p = 5.38e-08; M>N>D | KW H(2, N=42) = 32.63, p = 8.19e-08; M>N>D | KW H(2, N=42) = 24.05, p = 5.98e-06; M>D & M>N | NP
Search + click & mark "Top rated, All positive review" as helpful | IR | NP | NP | NP | KW H(2, N=42) = 32.33, p = 9.52e-08; M>N>D | X | KW H(2, N=42) = 23.36, p = 8.44e-06; M>N & M>D | NP
Following contributor | IR | NP | NP | NP | - | X | X | NP
Search product on Google | IR | NP | NP | NP | - | X | X | NP

Table 5.8: RQ2: Table summarizing RQ2 results. The search result columns correspond to RQ2a, the homepage, pre-purchase and product page columns to RQ2b, and the auto-complete suggestions column to RQ2c; each cell summarizes the result for the debunking, neutral and misinformative accounts performing that action. IR suggests noise and inconclusive results, i.e., search results of the control and its twin seldom matched. Thus, differences between treatment and control could be attributed either to noise or to personalization, making it impossible to study the impact of personalization on misinformation. NP denotes little to no personalization. - indicates that the given activity had no impact on the component. X indicates that the component was not collected for the activity. M, N and D indicate the average per-day bias in the component collected by accounts that built their history by performing actions on misinformative, neutral or debunking products. A higher mean value indicates more misinformation. For example, consider the cell corresponding to the action "search + click & add to cart product" and the "Homepage" component. M>N>D indicates that accounts adding misinformative products to the cart end up with more misinformation in their homepage recommendations in comparison to accounts that add neutral or debunking products to the cart.

Below, I explain the effect of personalization on each component.

5.6.1 RQ2a: Search Results

I measure personalization in search results for each Amazon filter using two metrics: the Jaccard index and the Kendall τ coefficient. The Jaccard index determines the similarity between two lists. A Jaccard index of 1 indicates that the two lists have the same elements, and zero indicates that the lists are completely different. On the other hand, the Kendall τ coefficient, also known as the Kendall rank correlation coefficient, determines the ordinal correlation between two lists. It can take values in [-1, 1], with -1 indicating that the lists have inverse ordering, 0 signifying no correlation and 1 suggesting that items in the lists have the same ranks. First, I compare the search results of the control account and its twin. Recall that I created twins for the two control accounts in the Personalized audit to establish the baseline noise. Ideally, both should have Jaccard and Kendall rank correlation coefficients close to 1 since the accounts do not build any history, are set up in a similar manner, perform searches at the same time and are in the same geolocation. Next, I compare the search results of the control account with treatment accounts that built account histories by performing different actions. If personalization is occurring, the difference between the search results of treatment and control should be more than the baseline noise (i.e., the Jaccard index and Kendall τ should be lower). Whereas, if the baseline noise itself is large, it indicates inconsistencies and randomness in the search results.
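As a concrete illustration, the two similarity metrics described above could be computed for a pair of ranked search-result lists along the following lines. This is a minimal sketch assuming search results are represented as ordered lists of ASINs; it is not the audit framework's actual implementation, and computing Kendall's τ only over the items common to both lists is one reasonable convention when the lists differ.

```python
# Minimal sketch (assumed data representation; not the audit framework's code).
from scipy.stats import kendalltau

def jaccard(list_a, list_b):
    """Set overlap of two search-result lists (1 = identical item sets)."""
    a, b = set(list_a), set(list_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def kendall_on_common(list_a, list_b):
    """Kendall's tau over the ranks of items appearing in both lists."""
    common = [item for item in list_a if item in set(list_b)]
    if len(common) < 2:
        return float("nan")
    ranks_a = [list_a.index(item) for item in common]
    ranks_b = [list_b.index(item) for item in common]
    tau, _p = kendalltau(ranks_a, ranks_b)
    return tau

control = ["B07NQW27VD", "B00E7FOA0U", "1480216895"]   # hypothetical ASINs
twin    = ["B07NQW27VD", "1480216895", "B00E7FOA0U"]
print(jaccard(control, twin), kendall_on_common(control, twin))
```

The same two numbers computed for the control-twin pair give the noise baseline against which each treatment-control pair is judged.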
Interestingly, I found significant noise in the search results of the control account and its twin for the "featured" filter, with a Jaccard index <0.8 and Kendall's rank correlation coefficient <0.2; that is, the control account and its twin seldom matched. The presence of noise suggests that Amazon is injecting some randomness into the "featured" search results. Unfortunately, this means that I would not be able to study the effect of personalization on the accounts for the "featured" search filter setting. For the other three search filters, "average customer review", "price low to high" and "newest arrivals", I see high (>0.8) Jaccard index and Kendall τ values between the control account and its twin. Additionally, I do not see any personalization for these filters since the metric values for the treatment-control comparison are similar to those of the control-twin comparison.

Figure 5.13: Investigating the presence and amount of personalization due to the "following contributors" action by calculating (a) the Jaccard index and (b) Kendall's tau between the search results of treatment and control. M, N and D indicate results for accounts that follow contributors of misinformative, neutral and debunking products respectively.

Figure 5.13 shows the metric calculations for the control account and the treatments that built their search histories by following contributors of misinformative, neutral and debunking products. I see two minor inconsistencies for the filter "average customer review" in accounts building their history on debunking products. The metric values for the treatment-control comparison were higher than the control-twin values. This means the treatment received results more similar to the control than its twin account did. In any case, the treatment account does not see more inconsistency than the control and its twin, indicating no personalization. Other user actions show similar results; hence, I have omitted their results for brevity.

5.6.2 RQ2b: Recommendations

I investigated the occurrence of personalization and its impact on the amount of misinformation in three different types of recommendations. I discuss each type of recommendation below.

Homepage recommendations: I find that homepages are personalized only when a user performs click actions on the search results. Thus, the actions "add to cart", "search + click" and "mark top rated most positive review helpful" led to homepage personalization. On the other hand, homepages were not personalized for the "follow contributor", "search product" and "Google search" actions.
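The classification of an action as personalizing or not follows the logic laid out above: a component is treated as personalized when the treatment-control similarity falls clearly below the control-twin noise baseline. The snippet below is a simplified heuristic sketch of that decision, not the audit's actual rule; the function name, margin and example values are hypothetical.

```python
# Simplified heuristic sketch (assumed, not the audit's actual decision rule):
# a component is flagged as personalized when its similarity to the control
# drops clearly below the control-twin noise baseline.
def looks_personalized(treatment_vs_control, control_vs_twin, margin=0.1):
    """Both arguments are lists of per-day Jaccard similarities."""
    avg = lambda xs: sum(xs) / len(xs)
    return avg(treatment_vs_control) < avg(control_vs_twin) - margin

# Hypothetical per-day homepage similarities for one account-building action.
print(looks_personalized([0.35, 0.40, 0.30], [0.90, 0.95, 0.92]))  # True
```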
Figure 5.14: (a) Input bias in homepages of accounts performing the actions "add to cart", "search + click" and "mark top rated all positive review helpful" over the seven days of the experiment run. (b) Input bias in pre-purchase recommendations of accounts over the seven-day experiment run; these recommendations are only collected for accounts adding products to their carts. (c) Input bias in product pages of accounts performing the actions "add to cart", "search + click" and "mark top rated all positive review helpful" over the seven days of the experiment run. M, N and D indicate that the accounts performed actions on misinformative, neutral and debunking products respectively.

After identifying the actions leading to personalized homepages, I investigate the impact of personalization on the amount of misinformation. In other words, I investigate how misinformation bias in homepages differs for accounts building their history by performing actions on misinformative, neutral and debunking products. For each action, I had six accounts, two replicates for each action and product type (misinformative, neutral and debunking). For example, for the action "add to cart", two accounts built their history by adding misinformative products to the cart for 7 days, two added neutral products and two accounts added debunking products to their carts. I calculate the per-day input bias (ib) in homepages by averaging the misinformation bias scores of each recommended product present in the homepage. Therefore, for every account I have seven bias values. I consider only the top two products in each recommendation type. Recall that homepages could contain three different types of recommendations: "Inspired by your shopping trends", "Recommended items other customers often buy again" and "Related to items you've viewed". All the different types are considered together for analysis. Statistical tests reveal significant differences in the amount of misinformation present in homepages of accounts that built their histories by performing actions on misinformative, neutral and debunking products (see Table 5.8). This observation holds true for all three activities: "add to cart", "search + click" and "mark top rated most positive review helpful". Post hoc tests reveal an echo chamber effect. The amount of misinformation in homepages of accounts performing actions on misinformative products is more than the amount of misinformation in homepages of accounts performing actions on neutral products, which in turn is more than the misinformation present in homepages of accounts performing actions on debunking products.
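The per-day input bias described above is simply the mean annotated bias score of the recommendations collected that day. Here is a minimal sketch of that aggregation, assuming each recommended product carries an annotated score in [-1, 1] (debunking negative, misinformative positive); the dataframe layout and column names are hypothetical, not the audit framework's actual schema.

```python
# Illustrative sketch of the per-day input bias (ib) computation.
import pandas as pd

# One row per recommended product: day, recommendation type, annotated score
# in [-1, 1] (-1 debunking, 0 neutral, +1 promoting misinformation). Rows are
# assumed to be ordered by on-page rank within each (day, rec_type) group.
homepage = pd.DataFrame({
    "day":      ["2020-08-12", "2020-08-12", "2020-08-13", "2020-08-13"],
    "rec_type": ["shopping_trends", "related_to_viewed"] * 2,
    "bias":     [1, 0, 1, 1],
})

# Keep only the top two products per recommendation type per day, then
# average all remaining scores to get one input-bias value per day.
top2 = homepage.groupby(["day", "rec_type"]).head(2)
ib_per_day = top2.groupby("day")["bias"].mean()
print(ib_per_day)  # e.g. 2020-08-12 -> 0.5, 2020-08-13 -> 1.0
```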
Figure 5.14a shows the per-day input bias of homepages of different accounts performing different actions. I take an average of the replicates for plotting the graph. Surprisingly, performing the actions "mark top rated most positive review helpful" and "search + click" on a misinformative product leads to the highest amount of misinformation in the homepages, even more than the homepages of accounts adding misinformative products to the cart. This means that the amount of misinformation present in the homepage is comparatively less once a user shows an intention to purchase a misinformative product, but high if a user shows interest in the misinformative product without showing an intention to buy it. Figure 5.14a also shows that the amount of misinformation present in homepages of accounts performing the actions "mark top rated most positive review helpful" and "search + click" on misinformative products gradually increases and becomes 1 on day 4 (2020-08-15). A bias value of 1 indicates that all analysed products in the homepages were misinformative. Homepage recommendations of accounts performing actions on neutral products show 0 bias constantly, indicating that all recommendations on all days were neutral. On the other hand, the average bias in homepages of accounts building history on debunking products rises a little above 0 in the first three days but eventually falls below 0, indicating a debunking lean.

Pre-purchase recommendations: These recommendations are only presented to users that add product(s) to their Amazon cart. Therefore, they were collected for 6 accounts, 2 of which added misinformative products to the cart, 2 added neutral products and the other 2 added debunking products. These recommendations could be of several types. See Figure 5.1b for an example of a pre-purchase page. For the analysis, I consider the first product present in each recommendation type. Statistical tests reveal a significant difference in the amount of misinformation present in pre-purchase recommendations of accounts that added misinformative, neutral and debunking products to the cart (KW H(2, N=42) = 32.63, p = 8.19e-08). Pre-purchase recommendations of accounts adding misinformative products to the cart contain more misinformation than those of accounts adding neutral or debunking products. Figure 5.14b shows the input bias in the pre-purchase recommendations for all the accounts. There is no coherent temporal trend, indicating that the input bias in this recommendation type depends on the particular product being added to the cart. However, an echo chamber effect is evident. For example, the bias in pre-purchase recommendations of accounts adding misinformative products to the cart is above 0 for all 7 days.

Product recommendations: I collect product page recommendations for accounts performing the actions "add to cart", "search + click" and "mark top rated most positive review helpful". I find a significant difference in the amount of misinformation present in product page recommendations when accounts perform the aforementioned actions on misinformative, neutral and debunking products (refer to Table 5.8).
Post hoc analysis reveals that product page recommendations of misinformative products contain more misinformation than those of neutral and debunking products. Figure 5.14c shows the input bias present in product pages for the various accounts. The bias for neutral products is constantly 0 across the 7 days, but for misinformative products, it is constantly greater than 0 for all actions. I see an unusually high bias value on the 6th day (2020-08-17) of the experiment for accounts performing actions on the debunking product titled Reasons to Vaccinate: Proof That Vaccines Save Lives (B086B8MM71). I checked the product page recommendations of this particular debunking book and found several misinformative recommendations on its product page.

5.6.3 RQ2c: Auto-complete suggestions

I audited auto-complete suggestions to investigate how personalization affects the change in search query suggestions. My initial hypothesis was that performing actions on misinformative products could increase the auto-complete suggestions of anti-vaccine search queries. However, I found little to no personalization in the auto-complete suggestions, indicating that account history built by performing actions on vaccine-related misinformative, neutral, or debunking products has little to no effect on how the auto-complete suggestions of accounts change. In the interest of brevity, I do not include the results and graphs for this component.

5.7 Discussion

There is a growing concern that e-commerce platforms are becoming hubs of dangerous medical misinformation. Because of a lack of regulatory policies, websites like Amazon are providing a platform to people who are making money by selling misinformation—dangerous anti-vaccine ideas, pseudoscience treatments, or unproven dietary alternatives—some of which could have dangerous effects on people's health and well-being. With a US market share of 49%, Amazon is the leading product search engine in the United States [112]. Thus, any misinformation present in its search results and recommendations could have a far-reaching influence, negatively shaping users' viewing and purchasing patterns. Therefore, in this study, I audited Amazon for the most dangerous form of health misinformation—vaccine misinformation. My work resulted in several critical findings with far-reaching implications. I discuss them below.

5.7.1 Amazon: a marketplace of multifaceted health misinformation

The analysis shows that Amazon hosts a variety of health misinformative products. The largest number of such products belongs to the category Books and Kindle eBooks (Figure 5.8). Despite the enormous amount of information available online, people still turn to books to gain information. A Pew Research survey revealed that 73% of Americans read at least one book in a year [309]. Books are considered "intellectual heft", have more presence than scientific journals and thus leave "a wider long lasting wake" [197]. Thus, anti-vaccine books could have a wider reach and can easily influence the audience negatively. Moreover, it does not help that a large number of anti-vaccine books are written by authors with medical degrees [358]. It is not just anti-vaccine books; there are abundant pseudoscience books on the platform, all suggesting unproven methods to cure diseases. I found diet books suggesting recipes with colloidal silver—an unsafe product—as an ingredient.
Some of the books proposing cures for incurable diseases, like autism and auto immune diseases, can have a huge appeal for people suffering with such diseases [328]. Thus, there is an urgent need to check the quality of health books presented to the users. The next most prominent category of health misinformative products is Amazon Fashion. Numerous apparel are sold on the platform with innovative anti-vaccine slogans, giving tools to the anti-vaccine propagandists to advocate their anti-vaccine agenda and gain visibility, not just in the online world, but in the offline world. During the annotation process, I also found many dietary supplements claiming to treat and cure diseases—a direct violation of Amazon’s policy on dietary supplements. Overall, I find that health misinformation exists on the platform in various forms—books, t-shirts, and other merchandise. Additionally, it is very easy to sell problematic content because of the lack of appropriate quality-control policies and their enforcement. 129 CHAPTER 5. AUDITING E-COMMERCE PLATFORMS FOR HEALTH MISINFORMATION 5.7.2 Amazon search results: a stockpile of health misinformation Analysis of the Unpersonalized audit revealed that 10.47% of search results promote vaccine and other health-related misinformation. Notably, the higher percentage of products promoting misinformation compared to debunking suggests that anti-vaccine and problematic health-related content is churned out more and the attempts to de- bunk the existing misinformation is less. I also found that Amazon’s search algorithm puts more health misinformative products in search results than debunking products leading to high input bias for topics like “vaccination”, “vaccine controversies”, “hpv vaccine”, etc. This is specifically true for search filters “featured” and “average cus- tomer reviews”. Note, that “featured” is the default search filter indicating that by default users will see more misinformation when they search for the aforementioned topics. On the other hand, if users want to make a purchase decision based on product ratings, again users will be presented with more misinformation because it seems misinformative products have higher user ratings on the platform. I also found a ranking bias in Amazon’s search algorithm with misinformative products ranked higher. Past research has shown that people trust higher ranked search results [181]. Thus, more number of higher ranked misinformative products can make problematic ideas in these products appear mainstream. The only positive finding of my analysis was the presence of more debunking products in search results sorted by filter “newest arrivals”. This might indicate that higher quality products are being sold on the plat- form in recent times. However, since there are no studies/surveys indicating which search filters are mostly used by people while making purchase decisions, it is difficult to conclude how beneficial this finding is. 5.7.3 Amazon recommendations: problematic echo chambers Many search engines and social media platforms employ personalization to enhance users’ experience on their platform by recommending them items that the algorithm think they will like based on their past browsing or purchasing history. But on the downside, if not checked, personalization can also lead users into a rabbit hole of prob- lematic content. 
My analysis of the Personalized audit revealed that an echo chamber exists on Amazon where users performing real-world actions on misinformative books are presented with more misinformation in various recommendations. Just a single click on an anti-vaccine book could fill your homepage with several other similar anti-vaccine books. And if you proceed to add that book to your cart, Amazon again presents more anti-vaccine books, nudging you to purchase even more problematic content. The worst discovery is that your homepage gets filled with more misinformation if you just show an interest in a misinformative product (by clicking on it) compared to when you show an intention to buy it by adding the product to your cart. Additionally, on the product page itself, you are presented with 5 different kinds of recommendations, each of them presenting you with equally problematic content. In a nutshell, once you start engaging with misinformative products on the platform, you will be presented with more misinformative content at every point of your Amazon navigation route and at multiple places. These findings would not have been concerning if buying a milk chocolate merely led to recommendations of other chocolates of different brands. The problem is that Amazon is blindly applying its algorithms to all products, including problematic content. Its algorithms do not differentiate or give special significance to vaccine-related topics. Amazon has learnt from users' past viewing and purchasing behaviour and has categorized all the anti-vaccine and other problematic health cures together. It presents the problematic content to users performing actions on any of these products, creating a dangerous recommendation loop in the process. There is an urgent need for the platform to treat vaccine and other health-related topics differently and ensure high-quality search results and recommendations. In the next section, I present a few ways, based on my findings, that could assist the platform in combating health misinformation.

5.7.4 Combating health misinformation

Tackling online health misinformation is a complex problem and there is no easy silver-bullet solution to curb its spread. However, the first step towards addressing it is accepting that there is a problem. Many tech giants have acknowledged their social responsibility in ensuring high quality in health-related content and are actively taking steps to ensure the same. For example, Google's policy "Your Money Or Your Life" classifies medical and health-related search pages as pages of particular importance, whose content should come from reputable websites [270]. Pinterest completely hobbled the search results for some queries such as 'anti-vax' [383] and limited the search results for other vaccine-related queries to content from officially recognized health institutions [206]. Even Facebook—a platform known to have questionable content moderation policies—banned anti-vaccine ads and demoted anti-vaccine content in its search results to make access to it difficult [268]. Therefore, given the massive reach and user base of Amazon—206 million website visits every month [34]—it is disconcerting to see that Amazon has not yet joined the bandwagon. To date, it has not taken any concrete steps toward addressing the problem of anti-vaccine content on its platform. I recommend several short-term and long-term strategies that the platform can adopt.
5.7.4.1 Short-term strategies: design interventions

The simplest short-term solution would be to introduce design interventions. The Unpersonalized audit revealed high misinformation bias in search results. The platform can use interventions as an opportunity to communicate to users the quality of the data presented to them by signalling misinformation bias. The platform could introduce a bias meter or scale that signals the amount of misinformation present in search results every time it detects a vaccine-related query in its search bar (a minimal sketch of such a meter appears at the end of this section). The bias indicators could be coupled with informational interventions like showing Wikipedia and encyclopedia links, which have already been proven effective in reducing traffic to anti-vaccine content [234]. The second intervention strategy could be to recognise and signal source bias. During the massive annotation process, I realized that several health misinformative books have been written by known anti-vaxxers like Andrew Wakefield, Jenny McCarthy, Robert S. Mendelsohn, etc. I also present a list of authors who have contributed to the most misinformative books in Table 5.3. Imagine a design where users are presented with a message such as "The author is a known anti-vaxxer and is known to write books that might contain health misinformation" every time they click a book written by these authors. Another, more extreme, short-term solution could be to either enforce a platform-wide ban prohibiting the sale of any anti-vaccine product or hobble search results for anti-vaccine search queries.

5.7.4.2 Long-term strategies: algorithmic modifications and policy changes

Long-term interventions would include modification of the search, ranking and recommendation algorithms. My investigations revealed that Amazon's algorithm has learnt problematic patterns from consumers' past viewing and buying patterns. It has categorized all products of a similar stance together (see the several edges connecting red nodes—products promoting misinformation—in Figure 5.12). In some cases, it has also associated some misinformative products with neutral and debunking products (refer to Figure 5.12). Amazon needs to "unlearn" this categorization. Additionally, the platform should incorporate misinformation bias in its search and recommendation algorithms to reduce exposure to misinformative content. There is also an urgent need to introduce some policy changes. First and foremost, Amazon should stop promoting health misinformative books by sponsoring them. I found 98 misinformative products in the sponsored recommendations, indicating that today, anti-vaccine outlets can easily promote their products by spending some money. Amazon should also introduce some minimum quality requirements that must be met before a product is allowed to be sponsored or sold on its platform. It can employ search quality raters to rate the quality of search results for various health-related search queries. Google has already set an example with its extensive Search Quality Rating process and guidelines [7, 169]. In recent times, Amazon has introduced several policy and algorithmic changes, including the roll-out of a new feature, "verified purchase", to curb the fake reviews problem on its platform [334]. Similar efforts are required to ensure product quality as well. Amazon can introduce a similar "verified quality" or "verified claims" tag for health-related products once they are evaluated by experts. Having a product base of millions of products can make any kind of review process tedious and challenging. Amazon can start by targeting specific health and vaccine-related topics that are most likely to be searched. My work itself presents a list of the most popular vaccine-related topics that can be used as a starting point. Finally, I hope my work acts as a call to action for Amazon and also inspires other vaccine and health audits on other platforms.
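As a rough illustration of the bias meter proposed in Section 5.7.4.1, the sketch below maps an aggregate search-page bias onto a small set of user-facing labels. It is only one possible design under stated assumptions: the scoring scale, thresholds, labels and function name are hypothetical, and nothing here is implemented by the platform or by this audit.

```python
# Hypothetical sketch of a "bias meter" for a vaccine-related search page.
# Assumes each returned product already carries an annotated or predicted
# bias score in [-1, 1]; thresholds and labels are illustrative only.
def bias_meter(product_scores):
    if not product_scores:
        return "no data"
    page_bias = sum(product_scores) / len(product_scores)
    if page_bias > 0.3:
        return f"high misinformation bias ({page_bias:+.2f})"
    if page_bias < -0.3:
        return f"mostly debunking content ({page_bias:+.2f})"
    return f"mixed/neutral results ({page_bias:+.2f})"

print(bias_meter([1, 1, 0, -1, 1]))  # "high misinformation bias (+0.40)"
```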
5.8 Limitations

This study is not without limitations. First, I only considered the top products in each recommendation type present on a page while determining the bias of the entire page. Annotating and determining the bias of all the recommendations occurring on a page would give a much more accurate picture of the recommendation algorithms. However, past studies have shown that the top results receive the highest number of clicks and thus are more likely to receive attention from users [114]. Second, search queries themselves have inherent bias. For example, the query 'anti vaccine t-shirt' suggests that the user is looking for anti-vax products. Higher bias in search results of neutral queries is much worse than that of biased queries. I did not segregate the analysis based on search query bias, although I did notice two neutral search queries, namely 'vaccine' and 'varicella vaccine', appearing in the list of most problematic search-query and filter combinations. Third, while I audited various recommendations present on the platform, I did not analyse email recommendations—product recommendations delivered outside the platform. A journalistic report pointed out that email recommendations could be contaminated too if a user shows an interest in a misinformative product but leaves the platform without buying it [121]. I leave the investigation of these recommendations to future work. Fourth, in the Personalized audit, accounts only built history for a week. Moreover, experiments were only run on Amazon.com. I plan to continue to run the experiments and explore features such as geolocation in future audits. Fifth, the audit study only targeted results returned in response to vaccine-related queries. Since Amazon is a vast platform that hosts a variety of products and sellers, I cannot claim that my results generalize to other misinformative topics or conspiracy theories. However, my methodology is generic enough to be applied to other misinformative topics. Lastly, another major limitation of the study is that in the Personalized audit, account histories were built in a very conservative setting. Accounts performed actions on only one product each day. Additionally, the actions were only performed on products with the same stance. In the real world, it would be tough to find users who only add misinformative products to their carts for seven days continuously. In spite of this limitation, my study still provides a peek into the workings of Amazon's algorithms and has paved the way for future audits that could use my audit methodology and extensive qualitative coding scheme to perform experiments considering complex real-world settings.

5.9 Conclusion

In this study, I conducted two sets of audit experiments on a popular e-commerce platform, Amazon, to empirically determine the amount of misinformation returned by its search and recommendation algorithms. I also investigated whether personalization due to user history plays any role in amplifying misinformation. My audits resulted in a dataset of 4,997 Amazon products annotated for health misinformation.
I found that search results returned for many vaccine-related queries contain a large number of misinformative products leading to high misinformation bias. Moreover, misinforma- tive products are also ranked higher than debunking products. My study also suggests the presence of a filter-bubble effect in recommendations, where users performing actions on misinformative products are presented with more misinformation on their homepages, product page recommendations, and pre-purchase recommendations. I believe, my proposed methodology to audit vaccine misinformation can be applied to other platforms to investigate health-misinformation bias. Overall, my study brings attention to the need for search engines to ensure high standards and quality of results for health-related queries. 134 C H A P T E R 6 IDENTIFYING WAYS TO SUPPORT FACT-CHECKING ONLINE MISINFORMATION 6.1 Introduction Chapters 3, 4, and 5, focused on auditing online platforms for algorithmically curated misinformation. This chapter focuses on determining ways to support online fact- checking as a way to combat online misinformation. While a lot of research has been done to design scalable technological systems for fact-checking, such systems fail to have an impact on fact-checking in the real world [171] for primarily for two reasons. First, their design is treated as a technical solution to what is often seen as a purely technological problem. But fact-checking is a complex socially-situated technical phenomenon involving collaboration among multiple stakeholder groups at various stages of the process. Yet, current automated fact-checking systems rarely take into account the insights and needs of “the human”—stakeholder groups who are central to this process. Second, automated fact-checking systems are limited in their applicability. For instance, most systems are either restricted to verifying claims about very specific public statistics by matching them against official figures (e.g., unemployment rate, inflation rate, etc.) or they are limited to identifying simple declarative claims to debunk [171]. Hence, automated fact-checking solutions fail to generalize to real-world fact-checking scenarios [171]. In other words, the rigidity of a purely technical system lacks the social flexibility necessary to support an inherent 135 CHAPTER 6. IDENTIFYING WAYS TO SUPPORT FACT-CHECKING ONLINE MISINFORMATION socio-technical process. In the last decade, HCI and CSCW communities have developed a better under- standing of the gap between the social and the technical [39, 129, 163, 395] and thus, are well positioned to develop an understanding of the socio-technical mechanisms underlying fact-checking. Yet, to date, we know very little about how fact-checking is done in practice and what we could do to socially and technically support the fact-checking process. In this work, I elucidate how fact-checking is practiced by lay- ing bare the human and technological infrastructures that facilitate and shape the fact-checking process in a fact-checking team/organization. I attempt to foreground the social by revealing the synergistic collaboration that occurs among human infras- tructure —various stakeholder groups that work together to accomplish fact-checking work. I provide visibility to the stakeholder groups’ roles, needs, and activities, many of which often remain invisible to the external world. 
I also highlight the technological infrastructure—the tools, technology, processes, and policies—that supports and enables the work of the stakeholder groups. The foregrounding of the infrastructures supporting fact-checking work helps us unravel the technical, policy, and information barriers to fact-checking. My hope is that by considering both the human and technological infrastructures underlying the fact-checking process, we might narrow the "design-reality gap" [195]—the gap that exists between the needs of stakeholder groups involved in fact-checking and the design of technical systems for fact-checking. Overall, in this work, I answer the following research questions.

RQ1: What are the various infrastructures supporting fact-checking work?
RQ1a Human infrastructure: Who are the various stakeholder groups involved in the fact-checking process? What roles do they play? How do they evaluate priorities and collaborate together to make decisions?
RQ1b Technological infrastructure: How do tools, technology and policy support stakeholder groups in performing their roles?
RQ2 Barriers to fact-checking: What are the various needs and challenges of stakeholder groups involved in the fact-checking process?

To answer the research questions, I adopt a multi-stakeholder approach and perform semi-structured interviews with 26 participants belonging to 16 fact-checking organizations or fact-checking teams within publication houses. I began this study by interviewing fact-checkers and editors—the stakeholder groups identified in previous works [49, 173, 362]. I discovered the existence of other stakeholder groups through these interviews and expanded the recruitment by reaching out to them using convenience [137] and snowball sampling [168]. The participants had diverse representation from four continents—North America, Europe, Asia, and Africa. I intentionally sampled participants widely across fact-checking teams, organizations, and countries to capture the practices and challenges emerging in this space. My work aims at uncovering the possible human and technological infrastructures supporting fact-checking work in teams/organizations across regions rather than capturing the variability of the fact-checking process across regions. My findings reveal the existence of six distinct stakeholder groups involved in the fact-checking process and the various roles performed by them. The identified stakeholder groups are: (1) Editors, who are responsible for overseeing the fact-checking process, including planning what topics to target and ensuring the integrity of the fact-checks produced, (2) External fact-checkers, who are responsible for monitoring the external world (social media platforms, presidential speeches, etc.), investigating dubious claims and writing fact-checks, (3) In-house fact-checkers, who are responsible for fixing incorrect claims present in the news stories or articles produced internally in the media/news publication house, (4) Investigators and researchers, who conduct in-depth investigation and data analysis of persistently circulating disinformation campaigns (e.g.
investigating coordinated campaigns that used anti-Ruto hashtags1 on Twitter to spread misinformation2), (5) Social media managers who distribute fact-checks across multiple social media platforms and strategize on ways to increase audience engage- ment with the fact-checks, and (6) Advocators who spearhead initiatives to improve policies around the availability of information and statistics in their countries to im- prove the quality of fact-checking. By studying the roles performed by the stakeholder groups (human infrastructure), I establish how fact-checking has evolved from a process to debunk individual pieces of misinformation (short-term claims centric fact-checking) to a multi-step long-term campaign involving research, policy, and advocacy work (long-term advocacy centric fact-checking). I find that stakeholder groups mediate their roles via different tools (technological infrastructure) ranging from third-party social media monitoring tools (e.g. BuzzSumo [80]), public databases, process management 1William Ruto is the current Deputy President of the Republic of Kenya. In May 2020, several anti-Ruto hashtags (e.g. #RutoMustGo, #RutoWantedToKillUhuru, etc.) began trending on Twitter in an attempt to discredit the Deputy President. 2https://investigate.africa/opt-report-post/ 137 CHAPTER 6. IDENTIFYING WAYS TO SUPPORT FACT-CHECKING ONLINE MISINFORMATION tools (e.g. Trello [392]), color coding schemes, to training and educational workshops. The interviews reveal that fact-checkers are skeptical of using fully automated AI- based tools. They desire algorithm explainability and the involvement of humans in the decision-making process as key values in the systems they would use. I also identify several technical, policy, and informational challenges. For example, there are limited tools to monitor and flag content on private messaging platforms and investigate false claims in videos and content in local regional languages. I also find that in some countries, information to investigate claims from government and civic bodies is either unavailable, difficult to obtain, or not updated periodically. 6.1.1 Research context: Human and Technological Infrastructures The foundational work of studying infrastructure as a subject could be credited to Star and Ruhleder [367]. The authors consider infrastructure as “something that emerges for people in practice, connected to activities and structures” [367]. Rather than be- ing a thing to use, Star and Ruhleder refer to infrastructure as a relational concept [367]. Scholars have since advocated for broadening the understanding of infrastruc- ture by also including social practices, processes, and flow of information [340] and have called for investigating the complexities and particularities of infrastructures in practice [83, 226, 401]. In response to this call, several research studies in Computer Supported Cooperative Work (CSCW) and related fields, such as Human Computer Interaction (HCI) have examined the infrastructures—both human and technological— supporting the diverse socio-technical systems [131, 210, 252, 293] in various contexts such as, in health-care [141, 307, 375], e-governance [89], crisis situations [265], etc. For my study, I first focus on human infrastructure which Lee et al define as “orga- nizations and actors that must be brought into alignment in order for work to be accomplished” [252]. 
Scholars have used this concept of human infrastructure to de- note the human partnerships that are necessary for a successful socio-technical system [340]. Drawing on such scholarly work [131, 252, 340], I use the analytical lens of hu- man infrastructure to “magnify the social” by rendering visibility to the stakeholder groups who collaborate to enable the fact-checking work. Highlighting the human infrastructure also allows us to focus on how collaboration and coordination are ac- complished in socio-technical systems [252]. Sustaining online collaboration among various groups in an organization can be challenging [285]. Thus, within CSCW, a lot of attention has also been paid on examining collaborative [219, 244, 252] and coordina- tion efforts [179, 263, 285, 412] in socio-technical systems, determining ways to foster 138 6.2. METHOD collaboration and coordination [155, 286, 339] and designing cooperative work tools [146, 185, 194, 227]. I complement these prior studies on by establishing fact-checking as a distributed problem that requires collaboration and coordination of the human infrastructure supporting the fact-checking process. In a socio-technical system, human infrastructure does not exist in a vacuum [375]. It is intertwined with the technological infrastructure which is the software, hardware, and processes supporting the human actors in performing their roles [325, 375]. Schol- ars argue that technology and human actors are mutually constituting; one mediates the other [333, 375]. My work also borrows the concept of technological infrastructure to shed light on how the use of various tools facilitates the enactment of the stakeholder groups’ roles in the fact-checking process. 6.2 Method To better understand how fact-checking is practiced in real-world, I conducted semi- structured interviews with six stakeholder groups (N=26): (1) Editors (2) External fact-checkers (3) In-house fact-checkers (4) Investigators and researchers (5) Social media managers, and (6) Advocators. All interviews were conducted with the approval of the Institutional Review Board. I started by interviewing fact-checkers and editors employed in fact-checking organizations and publication houses. The qualitative analysis of the initial interviews (with P3, P4, and P5) as well as conversations with our contacts in the fact-checking organizations during the initial recruitment gave us a new perspective on fact-checking revealing the complex workflows that include several other stakeholder groups (apart from editors and fact-checkers) who work together to achieve the end goal of fact-checking. I then expanded my recruitment to interview people belonging to these other stakeholder groups. 6.2.1 Participant Sampling Technique I adopted convenience [137] and snowball sampling [168] to recruit the subjects. First, I employed convenience sampling to identify fact-checkers via Twitter search. I sent out personal recruitment messages to those Twitter users whose Twitter bio revealed them to be fact-checkers and whose accounts allowed direct messaging. The second author had established collaboration with a few fact-checking organizations. I also reached out to individuals working in these organizations. Next, I used snowball 139 CHAPTER 6. 
IDENTIFYING WAYS TO SUPPORT FACT-CHECKING ONLINE MISINFORMATION P# Gender Exp.(yrs) P# Gender Exp.(yrs) P# Gender Exp.(yrs) P# Gender Exp.(yrs) P1 Female 0.67 P8 Male 2 P15 Female 2 P22 Male 3 P2 Male 0.21 P9 Female 1 P16 Female 0.5 P23 Male 1 P3 Female 2.5 P10 Male 0.42 P17 Male 0.83 P24 Male 15 P4 Male 2.25 P11 Female 2 P18 Male 10 P25 Female 17 P5 Male 3 P12 Male 4 P19 Male 4 P26 Female 1 P6 Female 0.75 P13 Female 2.5 P20 Female 7 P7 Female 0.92 P14 Male 0.5 P21 Male 0.75 Table 6.1: Table showing list of participants with their gender and experience (in years) in their current role. Some participants have been associated with fact-checking work for a longer duration. I only report their experience in the current role in the organization. sampling for recruitment. I requested the individuals who participated in the study to further connect us with other individuals belonging to different stakeholder groups in their or other organizations. I interviewed a total of 26 participants from 16 fact- checking teams/organizations including a Pulitzer Prize-winning editor and journalist. Tables 6.1 and 6.2 list the participants’ demographics, experience in their current role, stakeholder groups that I studied, participating organizations, and the continents I covered by interviewing participants based in those continents. Note that the names of the roles of various stakeholder groups were not always self-reported (with the exception of fact-checkers, news editors, and copy editors), but were rather found during qualitative analysis. Therefore, these roles might not match with participants’ designation in the fact-checking team or organization. For example, the professional designations of two participants performing advocacy work in their respective fact- checking organizations were Head of Public Policy & Institutional Development and Partnerships manager. Several of the participants’ roles were fluid and overlapping, i.e. in some fact-checking teams/organizations, at times a single person enacted several roles. For example, participant P12 performs both editorial and advocacy work. In some cases, a participant provided us with insights about more than one role since they either led or managed the entire fact-checking team or worked closely with the other stakeholder groups and thus, were aware of the various roles involved in the fact-checking pipeline. For example, I interviewed an advocator working at Meedan, an organization that provides institutional and programmatic support to partner organizations doing fact-checking work. Their job at Meedan allows them to work closely with people performing various roles at the fact-checking organizations. Thus, the participant was able to describe the responsibilities and tasks conducted by various stakeholder groups such as fact-checkers, advocators and investigators. 140 6.2. 
METHOD Stakeholder group Organization Continent 1) News desk editors and copy editors 2) External fact-checkers 3) Social media managers 4) Investigators and researchers 5) Advocators 6) In house fact-checkers Pesacheck [311], Meedan [271], First Check [250], Full Fact [139], The African Network of Centers for Investigative Reporting’s Investiga- tive Lab [33], AFP [43], Africa Check [91], The New Republic [326], The Quint [322], Al Jazeera [214], The Washington Post [381], DPA [13], Maldita [15], India Today [14], Der Spiegel [12], Fine Tip Research & Editing, Freelance Africa, Asia, Europe, North Amer- ica Table 6.2: Table showing the stakeholder groups identified in the study, the partic- ipating organizations, and the continents I covered through the interviews. In the organization column, freelance refers to no association with a particular fact-checking organization/team. I aggregated the roles of stakeholders and their association with fact-checking organization/team to ensure anonymity as in some cases knowledge of network affiliation and role could potentially reveal the identities of a few participants. Note that the participants that I interviewed sometimes provided insights about more than one role. 6.2.2 Interview Protocol and Data Analysis All interviews were conducted between November 2020 to September 2021. I designed a generic semi-structured interview script for the study that contained a set of broad questions about participants and their organization’s role. Based on participants’ responses to these questions, I inquired them about the specific details of their roles. Thus, recruiting and interviewing different stakeholder groups did not require us to file any changes with my university’s Institutional Review Board. I first asked participants to describe their role within their organization and the function of the organization itself. I encouraged participants to share their screens and describe var- ious aspects of their work using real-world examples. I also asked the participants to demonstrate the tools they use wherever applicable. I probed them about the role of technology in their day-to-day work and how the affordances provided by online platforms facilitate or impede their work. To get insights about how fact-checking is practiced in the participant’s team/organization, I asked them to describe all the steps involved in the fact-checking pipeline. I inquired about the other stakeholder groups working in their team/organization who also contribute towards the fact-checking process and how these various groups collaborate with each other. I also discussed various challenges participants face in their job. The interviews lasted between 60 to 125 minutes and averaged over 90 minutes. All interviews were recorded via Zoom or Google Meet. The first author transcribed all the video and audio recordings. Then, two authors independently went through the transcripts and observation notes taken 141 CHAPTER 6. IDENTIFYING WAYS TO SUPPORT FACT-CHECKING ONLINE MISINFORMATION during the interviews and thematically analyzed them using a mixture of deductive and inductive coding schemes [74]. First, both authors conducted a deductive scan where the transcripts were coded for categories: fact-checking process, stakeholder groups, participant’s role, decision making, use of tools, collaboration within stake- holder groups, and challenges faced by the participant in performing their role. Then within each deductive code, inductive coding was conducted. 
Both authors read the transcripts multiple times to determine the codes. Then, the two authors compared and contrasted their codes with each other to refine the codes and resolve the inconsis- tencies. After several rounds of discussions, both authors converged on a final set of themes. I present these themes with respect to the research questions in the following sections. 6.3 Types of Fact-checking: Short-term Claims and Long-term Advocacy This study aims to identify the infrastructures—both human and technological— supporting the fact-checking work. I identify and present the human infrastructure by elucidating the role of six stakeholder groups that need to come in alignment to accomplish fact-checking. The six stakeholder groups are editors (news desk and copy editors), external fact-checkers, in-house fact-checkers, social media managers, investigators and researchers, and advocators. I show how these stakeholder groups’ roles are supported by technological infrastructure. Through the study of the fact- checking infrastructures, I establish that fact-checking exists as both short-term claims centric and long-term advocacy centric fact-checking. In this section, I first provide an overview of the two types of fact-checking before deep diving into the infrastructures supporting them in Section 6.4 and Section 6.5. Figure 6.1 provides an overview of the fact-checking ecosystem including the stakeholder groups, their roles, and various tools that support them in performing their roles. 6.3.1 Short-term Claims Centric Fact-checking Short-term claims centric fact-checking aims at informing the public by debunking misleading claims circulating on online platforms. It begins with fact-checkers contin- uously monitoring the online spaces for potentially misleading content. They identify the exact claim(s) that they want to fact-check and make a pitch to the editorial team 142 6.3. 
about how they plan to debunk that claim (Section 6.4.3). After gaining the editor's approval (Section 6.4.1), they archive the content, find the source of the claim, and investigate the claim by using online tools, consulting experts and employing authoritative publicly available evidence (e.g., statistics on unemployment, census data, etc.). Based on their investigation, fact-checkers assign a label indicating the veracity of the claim and then write a report declaring all the sources they gathered (Section 6.4.3). The written report goes through a rigorous copy editing pipeline to ensure the integrity of the fact-check (Section 6.4.2). If the claim is false, fact-checking organizations reach out to the person/organization who made the claim for correction(s) (Section 6.4.3). Finally, the social media engagement team publishes the fact-check story/document on the social media pages of the organization and adopts several strategies to increase the audience's engagement with the published fact-check (Section 6.4.5).

Figure 6.1: Figure presenting the ecosystem of fact-checking, the whole or part of which could exist in a fact-checking organization or a news publication house. The figure indicates the two types of fact-checking (short-term claims centric and long-term advocacy centric fact-checking) introduced in the study, presents the stakeholder groups involved in the fact-checking process (human infrastructure), shows the work done by the stakeholder groups as part of their roles, and specifies the tools stakeholders use to mediate their roles (technological infrastructure). The numbers indicate the sequence in which the various roles are performed.
My work also examines the role of in-house fact-checkers doing short-term claims centric fact-checking in news and media publication houses. In-house fact-checkers assist reporters/journalists in verifying and validating their articles by ensuring that the facts and quotes present in the article are correct and backed by authoritative sources (Section 6.4.4). 6.3.2 Long-term Advocacy Centric Fact-checking “Fact checking is more than publishing fact checks. In order to change the informa- tion ecosystem.., we need to spot patterns and try and do something about those patterns.. We try to influence policymakers, information producers, and media to raise their standards and improve the quality of information and public debate. [Our role is] more akin to a campaigning organization. ” - P9 I find that the work of the fact-checking organizations is more than a one-off engagement with misleading claims and fact-checks (short-term claims centric fact- checking). Most organizations also perform long-term advocacy centric fact-checking. They run several investigative projects to study the misinformation ecosystem in their country (Section 6.5.1). They are also actively involved in advocacy and policy work where they try to influence civic bodies and policymakers to improve the quality of data, organize workshops to train organizations and journalists to do fact-checking, and work towards forming coalitions among various fact-checking organizations and internet companies (Section 6.5.2). To demystify the fact-checking process, in the following sections, I present in detail the roles performed by the stakeholder groups involved in both types of fact-checking along with the collaborations occurring in the 144 6.4. INFRASTRUCTURES SUPPORTING SHORT-TERM CLAIMS CENTRIC FACT-CHECKING process. For each stakeholder group, I also discuss how technological infrastructure supports the enactment of their roles. 6.4 Infrastructures Supporting Short-term Claims Centric Fact-checking Short-term claims centric fact-checking is supported by five stakeholder groups—news desk editors, copy editors, external and in-house fact-checkers, and social media managers. I present the roles played, activities performed, decisions made, and tools used by them. 6.4.1 News Desk Editors—Approving Claims and Guiding Fact-checkers News desk editors are one of the most critical stakeholder groups supporting the short-term claims centric fact-checking. They decide what their team/organization is going to fact-check. They approve or reject the claims pitched by the fact-checkers and guide them in their work. 6.4.1.1 Approving claims to fact-check News desk editors are looking for newsworthy claims that impact a lot of people. The first criteria for approving a claim is the popularity and reach of the person making the claim since it increases the chance of the claim spreading far and wide (“it should be newsworthy, said by an important person,... if a popular public figure makes a claim, a lot of people are likely to be exposed to that claim, given the bully pulpit that public figures have”—P18). The second criterion includes a number of people likely to be misled based on the context surrounding the claim. For example, a claim touching upon a communal angle is likely to impact a lot of people in a country that has “many illiterate people..who process information based on their communal experience (P12)”. 
Third, news editors approve content that is gaining a lot of traction on social media platforms by accumulating engagement in the form of likes, comments, and shares. Content that has received less attention from the public is not fact-checked for fear of amplifying the false information by inadvertently bringing attention to the false claims ("Are people believing this or are they taking it just as a joke? Does it just have one share, meaning if I fact checks it, it will just amplify the fake news and not really give the correct information"—P11). Fourth, opinion pieces and claims that cannot be verified using sources are not fact-checked ("we check things that have facts, that can be verified using records and information. We don't want a situation where we see this person says this and this person says this, that will just be hearsay"—P3). Fifth, fact-checkers and news desk editors consider several stages of harm that false information is likely to cause—physical, mental, social, or emotional—and prioritize the claims that are likely to cause maximum damage to the public by affecting their health and well-being.

"We have stages of harm that you have to check. Is it causing physical harm? Is it causing mental harm? Is it causing someone to lose social standing? How much effect will it have to the person.. if I don't fact-check the story? .. We do the ones that actually have greater harm, we give them priority.. We have to get it out before most people see it to reduce the harm that it's causing." - P3

6.4.1.2 Guiding fact-checkers

The job of news editors does not end after approving a claim. They also guide and help fact-checkers in "gathering sources and evidence (P11)" for verifying stories, "understanding concepts, finding best available data [for research],..and connecting with the experts given [editors'] long experience in the media ecosystem (P12)".

6.4.2 Copy Editors—Ensuring Quality of the Fact-checks

Copy editors do quality control of the fact-check story/document through multiple iterative copy-editing cycles. A fact-check story or document is a report written by fact-checkers containing the claim investigated, the sources used for investigation, and a verdict indicating the veracity of the claim. This stakeholder group acts as the first readers who determine whether fact-checkers have accurately interpreted the claim in the fact-check, used multiple primary sources as evidence for the investigation, provided working links to the sources used for investigation, and presented the evidence in such a way that it leads to a logical and correct conclusion. They also work towards making the fact-checks engaging by checking the phrasing and correcting grammatical errors. I discuss the tasks performed by copy editors in detail below.

6.4.2.1 Performing copy-editing cycles

Through the interviews, I realized that a fact-check goes through two to three copy-editing stages. Each stage is supervised by a different copy editor to ensure higher quality. The written fact-check is provided to the copy editor in a shared document (e.g. Google doc) where they leave comments to provide feedback. At the first copy-edit stage, copy editors check the central premise of the fact-check.

"When I get a fact-check, I read through it three times to understand it before I make any change on it, before I ask any questions.
I look for the claim and the debunk. Does it really hold? If there are questions about the debunk then we send it back to the news desk.. At that stage, the fact checker will pick it up and go back and try to sort out any queries that have been raised." - P1

The second copy-edit stage focuses on refining the language and flow of the fact-check. The final copy-edit stage focuses on making the fact-check more engaging and interesting to read. I discuss this aspect briefly.

6.4.2.2 Making fact-checks more engaging

Copy editors try to keep the fact-check short, clear, crisp, and interesting. They ensure that the fact-checks are written in a language that is understood by laymen. P1 spoke most candidly about the engagement aspect of the editorial process. They revealed that they often collaborate with social media managers to get feedback on the engagement aspect of the fact-checks. For example, ensuring that the country relevant to the fact-check is present "in the title or in the blurb (P1)" so that people can quickly determine if it is of interest to them, or making certain that the verdict on the claim is placed high up in the fact-check to get more attention from the public. Copy editors, along with social media managers, also suggest the addition of infographics (engaging visuals, imagery, tables, charts, etc.) to fact-checks in order to attract the eyes of the readers. The infographics are added to quickly communicate "complex information in a visual manner (P2)". They are usually added in long-form fact-checks—the ones that debunk multiple claims and delve into the subject of the claim in greater depth. Copy editors also ensure that the fact-check contains terms that people are most likely to search online. P1, P9, P12, and P18 talked about the ClaimReview schema used by fact-checking organizations to allow internet companies like Google to index their fact-checks [153]. These participants believe that using popular search terms in a fact-check helps increase its visibility since search engines then rank it higher in the search results. I briefly touch upon this collaboration between fact-checking organizations and internet companies later in Section 6.5.2.

6.4.2.3 Use of collaborative systems, labeling, and color coding schemes

In most of the organizations that I interviewed, both news desk and copy editors use Google Docs to comment and provide feedback on the fact-checks. In addition, fact-checking organizations like Pesacheck, Africa Check, etc. also use a project management and collaboration tool called Trello [392]. The tool provides a dashboard with a series of columns that contain cards. The columns are named so that they denote the current stage of the fact-check. For example, P2 showed the Trello board of their organization, which had several stages such as complete fact checks, copy edit 1, copy edit 2, copy edit 3, Q/A & final review, final review done, published live, and amplification done. Trello cards are the basic unit of work in the tool. The cards hold various information about the fact-checks, including due dates, conversations, attachments, etc. The tool allows editors to add colors and labels to the cards. Copy editors use labels and color coding schemes for different purposes. For example, P2 uses labels for prioritizing the fact-checks ("Earlier this year there was a lot of interest in COVID, so I would label fact-check as COVID because we were prioritizing those").
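As a minimal, hypothetical sketch of how such a board could be scripted rather than managed by hand, the snippet below creates a fact-check card in a copy-editing column and attaches a priority label through Trello's REST card-creation endpoint; the API key, token, list ID, and label ID are placeholders, and none of the interviewed organizations reported using such a script.

import requests

# Hypothetical credentials and board identifiers; Trello issues these per user/board.
API_KEY = "YOUR_KEY"
API_TOKEN = "YOUR_TOKEN"
COPY_EDIT_1_LIST = "LIST_ID"   # e.g., the "copy edit 1" column P2 described
PRIORITY_LABEL = "LABEL_ID"    # a colored label used for prioritization (e.g., COVID)

def create_fact_check_card(claim_summary: str, draft_url: str) -> dict:
    """Create a Trello card for a fact-check entering the copy-editing pipeline."""
    response = requests.post(
        "https://api.trello.com/1/cards",
        params={
            "key": API_KEY,
            "token": API_TOKEN,
            "idList": COPY_EDIT_1_LIST,
            "name": claim_summary,
            "desc": f"Draft fact-check: {draft_url}",
            "idLabels": PRIORITY_LABEL,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()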
P10 uses colors to indicate tasks allocated to people ("It's easier for me to see what people are working on.. If it's research, it will be purple, if it's documentation, it will be green.").

6.4.3 External Fact-checkers—Monitoring, Investigating and Publishing Fact-checks

External fact-checkers are the most evident stakeholder group supporting the fact-checking process. Their role is to continuously monitor the external world for potentially false claims, investigate them for veracity, assign a verdict, and publish fact-checks. I discuss these roles below. Alongside the description of each role, I also specify the technological infrastructure supporting it.

6.4.3.1 Monitoring online spaces

Monitoring content is one of the most tedious steps in the fact-checking workflow. Fact-checkers monitor content reactively (in response to user tips), preemptively (before events like elections, presidential speeches, etc.), and in real-time (tracking current affairs via trends and listening to conversations in real-time). They monitor the content via several tools and artifacts. First, they rely on user reports and tiplines, which are useful ways to access misleading content circulating in private groups and on WhatsApp that is otherwise hard to access.

"We have our WhatsApp tipline and emails.. Public who wants to verify a particular piece of information send it on WhatsApp to us.. We verify those queries at our end and then send them the replies with the fact-check story if possible. If we don't have fact check story, then we send people whatever information we have on the query." - P14

Second, fact-checkers create watch lists and track social media accounts, groups, pages, and websites of repeat offenders–those who posted misinformative content multiple times in the past ("We have a database where we've tracked all accounts spreading misinformation.. We go back and check these accounts, see what they've posted on their website and personal account."—P3). Third, fact-checkers rely on manual searches. They follow current events via news or Twitter trends to stay updated about topics that people are talking about and track them on all online platforms. The creation of relevant search queries to track these topics is mostly a tedious "hit-n-trial (P7)" method. Searching for a query can give millions of results on search platforms. Therefore, fact-checkers rely on search query syntax to reduce the number of search results.

"I do not want the news content. I want user generated content. So one of the simplest tools is to write minus news (search_query -news) so that it cancels out the major news content." - P7

"I use keywords like "intitle" [on Google search]. [intitle:coronajihad site:facebook.com].. is showing me every search term, every post on Facebook, which has the title "coronajihad"." - P8

Fourth, fact-checkers use several tools to track content on the internet. For example, they use CrowdTangle [107] to track Facebook's public pages and groups. Organizations that have partnered with Facebook have access to a Facebook proprietary tool colloquially known as the "Facebook Queue". The tool aggregates potentially misleading content that is accumulating engagement on the platform. In some of the organizations, fact-checkers also rely on several third-party tools (e.g. Social searcher [352], Influencer [207], BuzzSumo [80], etc.)
to search and filter content on platforms, since these provide varied search filter options that the original interface of the social media platform lacks. For example, Facebook search allows one to filter content by year but not by months and dates. Lastly, fact-checkers also rely on a network of stringers—"reporters who work for a publication or news agency on a part-time basis" [414]—who inform them about the misinformation "circulating in their region..[and] language (P11)."

6.4.3.2 Making a decision to fact-check and extracting claim(s)

Once fact-checkers have identified potentially false content on the internet, they identify claim(s) in the content to verify. The fact-checkers can decide to do a short-form or a long-form fact-check depending on the number of claims they identify in the content. For their partnership with Facebook, the fact-checkers only do short-form fact-checks where they identify one claim from the body of the content. In long-form fact-checks, fact-checkers will debunk multiple claims present in the content.

"The reason we do [short-forms] is to make sure we are clear, we're not confusing the audience. This is largely in line with the Facebook partnership where we mainly focus on one claim." - P2

Next, fact-checkers prepare a pitch to convince news desk editors why the content needs to be fact-checked, along with a plan of how they would accumulate evidence to debunk the claim. Once the claim is approved, they archive the content using online services like archive.is [52], the Wayback Machine [51], etc. People and organizations often delete their false claims once they are fact-checked. Thus, archiving becomes essential to prove that the content existed.

"Once you identify the claim you archive it because what purveyors of misinformation do mostly, they delete it. So once they delete it you're unable to read that particular claim." - P11

After archiving, fact-checkers find the original source of the claim–"who shared [the claim], context with which it (the claim) was originally shared (P14)", without which the fact-checker would not have a complete picture of the context in which the content was originally shared.

6.4.3.3 Researching

Once a claim is identified, fact-checkers collect multiple primary sources to prove or disprove the claim. They use three ways of gathering sources. First, they use quotes from experts such as doctors, physicians, meteorologists, and academics. Second, they use public authoritative sources like government databases (e.g. Kenya National Bureau of Statistics), mainstream news sources (e.g. CNN), and peer-reviewed scientific research papers and journals. Third, they rely on several tools. For example, fact-checkers use image and video verification tools such as InVid [209] followed by a reverse image search on search engines to collect metadata and a digital trail of the image/video in order to determine its authenticity. Fact-checkers from First Check, Pesacheck, and Africa Check also reported that they depend on third-party tools such as whois [418], Spoonbill [366], the edit history functionality in Facebook, etc. to determine the veracity of a website or post. Through my interviews, I realized the important role played by comments in fact-checking. Comments contain useful clues that help in investigating the claims.

"What we do first is read comments before checking.
We found in comments that one woman said this is not the entire video, here is the link to entire video. So that's what led us to entire video. [In comments] we see some clues, how to look for what really happened." - P16

6.4.3.4 Assigning veracity label and publishing fact-check

After the investigation, fact-checkers assign a label to the claim that reflects its veracity. Through the interviews, I realized fact-checking organizations use a range of labeling conventions, for example, a 5-point scale ranging from completely false to true, a four-point Pinocchio scale [314], etc. There is no commonly accepted standard for labeling misinformation. The fact-check is then published along with the claim, the label, and the sources used in the investigation. By publishing these details, the fact-checker takes the reader through the entire investigation "so that the readers can replicate [the process] themselves (P18)".

6.4.4 In-house Fact-checkers—Gathering Sources and Verifying Claims

In-house fact-checkers are employed by publication houses to fact-check the stories produced by the journalists before they are published and disseminated. They receive the script or the news story from the journalist along with all the source material that the journalist used while researching and writing the piece. The in-house fact-checker then verifies every claim present in the story and delivers a modified story along with a list of proposed changes. Unlike the job of external fact-checkers, the job of this stakeholder group is to fix or remove incorrect claims without publicizing or calling attention to the inaccuracies [172]. The need for in-house fact-checkers arises in a publication house not only to ensure that the stories published report accurate facts to readers but also to "protect the publishing house from any liability or future lawsuit (P13)". I discuss the roles of this stakeholder group below.

6.4.4.1 Identifying and verifying claims

In-house fact-checkers verify every line present in the story. Each line usually has more than one fact to be checked, from how a proper noun is spelled and grammatical idiosyncrasies to every phrase making a claim. Journalists' opinions and arguments, and quotes from reputable experts, are the only phrases in the content that are not verified. However, for opinions, the in-house fact-checkers check the context surrounding the argument to ensure it is "right and mainstream (P13)". All the claims that are vague and without proper sources backing them are modified or removed from the text.

"Vague and broad facts which I.. try to get people to remove.. I recently had a whole debate with someone about a line that said America is more divided today than ever. I was like, how do you check that line.. what are the sources for that kind of statement? [It] is just so broad." - P15

In-house fact-checkers employ a myriad of techniques—a top-down approach, prioritization, and batch processing—to identify and verify claims. The majority of the in-house fact-checkers I interviewed first perform a top-down linear scan of the document to get the most central ideas "which if were incorrect, the entire piece (article) would be called into question" (P13). Second, they prioritize the claims for verification. Different fact-checkers prioritize claims differently. For example, P13, P15, and P25 work with organizations that have strict timelines for publications. Thus, they first prioritize the claims that will take the maximum time to investigate.
On the other hand, P20 does not have strict deadlines and prioritizes claims that they are certain are false.

"[I prioritize] a claim that I know is going to take me a while to nail down. So for example, if there's a claim ... about someone committing a crime and I need to file a FOIA request3 or go through public records or order a document from sort of government agency, I want to do that as early as possible so that I get myself as much time. You know, basically, any claim that relies on other actors in order to meet to verify." - P13

3 https://www.foia.gov/how-to.html

"When I prioritize I start with the things that I know are wrong, and then I look at the things I think are right. And then I check all dates and all names." - P20

P15 revealed that they "process the claims in batches". Every day they work on a batch of claims and send inquiries and suggestions regarding those claims to the journalist. This technique gives journalists ample time to respond and does not inundate them with several queries towards the very end of the schedule.

6.4.4.2 Gathering sources for verification

In-house fact-checkers use "primary reputable sources (P15)" to verify claims. The journalist is expected to give the primary sources that they used while doing research and writing the article. However, if the sources are missing, in-house fact-checkers look for primary documentation (like a death certificate, house deed, etc.), mainstream news sources (like CNN, New York Times, etc.), academic peer-reviewed journals, and relevant experts to verify the claims. In case the source used by the journalist is not reliable, the usual journalistic practice is to gather a total of three to four sources to back up the claim. In-house fact-checkers heavily rely on Google search to hunt for sources. Some of the in-house fact-checkers also use Nexis search [17], a paid service that returns news articles, blogs, and legal documents as search results. They find the latter to be more effective for searching news and documents.

6.4.4.3 Organizing corrections

After identifying and verifying claims, in-house fact-checkers organize and communicate the list of questions and suggested changes to the journalist. According to in-house fact-checkers, there is no standard convention for organizing corrections. However, I found that all fact-checkers use color coding schemes for organization, but in different ways. For example, P13 highlights verifiable and unverifiable claims in the Google doc with different colors and leaves comments containing information about the sources. P15 copies each claim from the doc into a separate row in a spreadsheet and then uses color coding to indicate the state of the claim (pending, verified, incorrect, etc.).

6.4.5 Social Media Managers—Disseminating Fact-checks, Increasing Engagement

Social media managers are responsible for leading all social media initiatives, including publishing and disseminating fact-checks on their organization's social media handles. They determine how to make fact-checks more appealing so that they attract people's eyes. They also measure the engagement received by the posted fact-check(s) and, based on that feedback, continuously update their amplification strategies to attract a larger audience to engage with the fact-checks. I discuss these tasks below.
6.4.5.1 Disseminating fact-checks

The social media manager's primary responsibility is to post fact-checks and educational tip sheets produced by fact-checking organizations on all major social media platforms. A few organizations also have a "WhatsApp number where they broadcast weekly newsletters containing the top fact-checks of the week (P12)". Social media managers want to make their fact-checks accessible to people with disabilities. Since Twitter did not have a captioning feature for its audio tweets, P17 usually posts fact-checking videos with captions. They use tools like Kapwing [26] to add captions to the videos.

"We wanted to see how can we reach, for example, blind and deaf audiences. The audio tweets did not have captioning, they did not have accessibility features for disabled audiences. And so this is why we decided to use the video instead because videos allow us to caption and so people who can't hear the video can still read the information." - P17

6.4.5.2 Adopting strategies to increase engagement

Social media managers adopt several innovative ways to increase their content's reach inorganically (via ads) and organically (via visual storytelling). I discuss some of these strategies below.

• Running advertisements: A few fact-checking organizations that have partnered with Facebook receive free advertisement credits to promote their fact-checks on the platform. P17 delved deep into the ad usage process. They advertise fact-checks that are not bound by time, since those can be promoted via ads for an extended period. To select the audience for ad targeting, they look at three attributes, namely, the audience's education, relevance to the fact-check, and interests. They target audiences who have "graduated from a university" (education), live in "countries that are most relevant to the fact-check" being promoted (relevance), and are "interested in news, advocacy and community issues" (interests).

• Visual storytelling: Social media managers find visual storytelling to be a very effective way of getting people to engage with fact-checking content. They have found that many people prefer "watching their content over reading it (P18)". Therefore, they convert fact-checks into a visual narrative (images or videos) before posting them on social media platforms. To create the visual content, these stakeholder groups rely on video and image editing tools. For example, social media managers in a few of the organizations that I interviewed use multimedia editing tools such as Adobe Illustrator [25], Photoshop [28], etc. Social media managers also create comic strips where a cartoon walks readers through the fact-checks ("I myself had started this small comic strip thing to get more engagement.. it became quite popular and people were liking it, and sharing it"—P7). To engage with the local non-English speaking audience, the comic strips, images, and videos are also converted to regional languages.

"We translate [comics] into Swahili because there's been this type of content, ..
being done in like mainstream languages like English, French, Portuguese. But some of the more widely spoken local languages aren't really a priority. That was a gap that we identified." - P5

Figure 6.2 contains three examples of fact-checks leveraging visual storytelling techniques.

Figure 6.2: (a) A short YouTube video explaining a fact-check using comic-like visuals. (b) An Instagram post containing a fact-check. (c) A "postcard" containing a fact-check in the Hindi language to be shared on mediums like WhatsApp; the single image contains the false claim and the debunk.

6.4.5.3 Measuring engagement and updating strategies

Measuring social media engagement is important to determine what content is gaining more traction and which engagement strategy is working. P17 informed us how they track click-through rates to determine how many people are engaging with the content by clicking on the links posted by them.

"[Bitly] can tell how many people clicked on a particular link, and we can track that.. [It allowed us to] analyze how people engage with these links, how many clicks come from the newsletter, as opposed to Facebook or Twitter, or Instagram? And then that allows us to know whether we need to change how we present the information." - P17

The engagement statistics help social media teams update their engagement strategies. For example, P17 described how they changed the number of stories published in their weekly WhatsApp newsletter from five to three after realizing that the public only clicked on the top URLs.

"Earlier editions [of WhatsApp newsletter] had up to 5 stories per edition. And we were seeing that people were only clicking the top 2 or 3 links, and were ignoring the rest. So a decision was made to reduce the number of stories that we feature from 5 to 3 to make it shorter." - P17

6.5 Infrastructures Supporting Long-term Advocacy Centric Fact-checking

Long-term advocacy centric fact-checking aims at improving the information landscape by conducting research about various aspects of online disinformation, influencing policies surrounding the availability and quality of data and statistics, conducting educational training for aspiring fact-checkers, organizing literacy campaigns for the general public, and forming coalitions with various fact-checking organizations and internet companies. This type of fact-checking is supported by two stakeholder groups—investigators and researchers who conduct long-term investigative projects that include data and network analysis, and advocators who are involved in policy and advocacy work. In this section, I elaborate on the roles of these stakeholder groups.

6.5.1 Investigators and Researchers—Conducting In-depth Research and Investigation

A few fact-checking organizations (e.g. Full Fact, Code for Africa, etc.) have a separate team of investigators and researchers. Unlike fact-checkers who engage with individual pieces of misinformation, this stakeholder group conducts in-depth investigations of persistently circulating misinformation and disinformation campaigns via data and network analysis.

"Part of my work at the moment is creating or developing a framework for misinformation crises. So, as opposed to individual pieces of misinformation.. looking at when responses need to go over and above the day-to-day..everybody who works in the kind of anti-misinformation space gets together and they normally introduce new responses and new policies to manage that." - P9

"If.. we are seeing a certain type of misinformation occurring.. every single day, debunking each individual one will not help. So what they [fact-checking team] do is now they refer that case to us.
And then..[we do an] in-depth investigation to try and see where is this narrative originating from.. The data analytics team is the one that does the sifting through the data sets that we obtained from social media. The forensic team does profiling of key accounts..that we identified." - P10

6.5.1.1 Conducting long-term investigative projects

Investigators and researchers undertake several investigative projects, such as verifying the backfire effect4 of fact-checking (when a claim aligns with a person's beliefs, proving that it is wrong will make them believe it more strongly), studying the long-term effects of conspiracy theories, examining public engagement with political news, determining how to communicate fact-checks effectively, etc.

4 https://fullfact.org/blog/2019/mar/does-backfire-effect-exist/

"You've probably heard about the backfire effect which is a kind of mythical idea that fact-checking.. does more damage than good. I think the original research was repeated and the same effect wasn't found.. We also did a project recently where we looked at.. how conspiracy theories affect people's beliefs in the long term.. who believes and shares misinformation, how to communicate your fact checks [while] presenting them to the audience." - P9

"Projects include doing investigations into Russian disinformation or Russian influence in African countries.. investigating Chinese influence operations into .. African countries,.. human trafficking in four South African countries." - P10

"There's been a lot of research that was done around.. how to design a fact check to make it more engaging, how do you phrase a headline,.. how much of this information can you put into like a video. What is the ideal video length? How to get more people to interact with them and then how to try to make them much more long-lasting and persistent in people's memory." - P5

These stakeholder groups analyze misinformative content that is going viral on social media platforms and then trace it back to the social media accounts that started sharing it. They also conduct survey-based research.

"We have like a survey and we've been trying to find what kind of narratives against immigrants are more popular here in Spain. We're trying to prepare another big survey about how fact-checking works in Spain and what kind of debunk works better for us." - P21

The results of the investigative projects are released as dossiers (e.g. [16], [18]). A dossier provides a background of the issue investigated, the data collection and analysis method(s), the results of the investigation, and conclusions. For example, in [18], investigators study the misinformation influence operations that occurred in Uganda before the January 2021 elections. The dossier first briefly describes the political landscape in Uganda and provides examples of misinformative tweets that acted as the starting point of the investigation. These tweets leveraged trending hashtags such as #StopHooliganism, #UgandaIsBleeding5, etc. to spread false narratives against the opposition party using past events from other countries. The dossier then deep dives into the methods used to collect and analyze relevant tweets and finally attributes the influence operation to supporters of the National Resistance Movement6.

5 In November 2020, Robert Kyagulanyi, a presidential candidate in Uganda, was arrested on two separate occasions. The protests that broke out after his arrests were documented in Twitter posts containing hashtags such as #StopHooliganism, #UgandaIsBleeding, etc.

6 https://en.wikipedia.org/wiki/National_Resistance_Movement

Investigators and researchers use tools like Python libraries for data analysis and Gephi [22] for network visualization in addition to the tools used by fact-checkers.
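To make the tooling concrete, the following is a minimal, illustrative sketch of the kind of retweet-network analysis P10 describes next; it assumes the tweets have already been collected (for example, with tweepy) into simple records, and the field names and file name are hypothetical. The exported GEXF file can then be opened in Gephi to explore influential accounts visually.

import networkx as nx

# Hypothetical input: tweets already collected (e.g., via tweepy), each recording
# who retweeted whom. The field names are placeholders, not a fixed API schema.
tweets = [
    {"user": "account_a", "retweeted_user": "influencer_x"},
    {"user": "account_b", "retweeted_user": "influencer_x"},
    {"user": "account_c", "retweeted_user": "account_a"},
]

graph = nx.DiGraph()
for tweet in tweets:
    # Edge points from the retweeter to the account being amplified
    graph.add_edge(tweet["user"], tweet["retweeted_user"])

# Accounts with many incoming retweet edges are candidates for "influential accounts"
top_amplified = sorted(graph.in_degree(), key=lambda pair: pair[1], reverse=True)[:10]
print(top_amplified)

# Export for visual exploration in Gephi
nx.write_gexf(graph, "retweet_network.gexf")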
"We were actually using an open source tool called tweepy to collect tweets from Twitter.. The network analysis is [done] using a tool called gephi [which] will be able to show relationships between tweets using the retweet function like which are the accounts that have been highly retweeted, which are the influential accounts within the network." - P10

6.5.2 Advocators—Influencing Policy, Building Coalitions, Conducting Educational Workshops and Literacy Campaigns

Several of the study participants revealed that fact-checking does not stop with the generation and distribution of fact-checks and is much more like a sustained campaign. It also includes influencing policymakers and information producers to improve the quality of information (and in turn the quality of fact-checking), building coalitions with other fact-checking organizations, social media companies, and journalistic organizations, as well as providing fact-checking training to newsrooms and organizations. Such initiatives are led by advocators. This stakeholder group identifies and realizes several ways to improve the short-term claims centric fact-checking process. They steward multiple outreach programs, policy initiatives, and advocacy projects locally and globally. All the initiatives started by these stakeholder groups could be considered actions that are performed via technological and informational infrastructures such as workshops, appeals, and training programs. I present the tasks performed by this stakeholder group below.

6.5.2.1 Creating a new generation of fact-checkers and fact-checking organizations

The advocators conduct training, workshops, and fellowship programs for people and organizations all over the world, teaching them the nitty-gritty details of fact-checking along with how to set up and operate a fact-checking organization of their own.

"Ethiopia.. [is] a country where press freedom is very limited, online false information often leads to offline.. And most of the media there are state-controlled. So we were conducting training on how they can set up fact-checking desks and try to be independent." - P2

"In Germany, we only got two IFCN signatories. And that's not enough. So we try to convince traditional media outlets in the regions..to start with fact-checking.. We are training them. [We have created] a community of fact-checkers..and more than 600 journalists and we are doing training, encouraging them to start with fact checks." - P19

Advocators conduct training through webinars or online platforms. Some organizations have set up their own online learning platforms where they provide video tutorials to fact-checkers (e.g. [125]), while others have partnered with academic universities to conduct educational training.

"The advocacy campaigns.. on fact-checking, we conduct them virtually. We have partnerships with Kenyan institutions of higher learning, such as Daystar and Aga Khan university, which allows us to conduct webinars on fact-checking. Those sessions are attended by media professors and their students.
We mainly use Jitsi, Slack, and Google Meet to do the training." - P2

"We are proud of the digital e-learning platform we have built at DPA. We have a lot of videos here, where we explain how to work with the Internet Archive. Those are webinars [where we] train a maximum of 15 people, zoom webinars mean it's live training." - P19

6.5.2.2 Pitching the importance of evidence-based decision-making to policymakers

The advocators are also actively trying to reach their country's policymakers and civil society organizations, informing them about their work and pitching the importance of facts and evidence-based decision-making. For example, P12 attended a parliamentary researchers' conference in Kenya where they pitched the importance of facts in informing the country's policies and intervention programs. Similarly, P9's organization tried to get parliamentary support to attribute electoral imprints to election campaigners during the referendum and elections in the United Kingdom.

"Civil society organizations generate data that then is used by government to put in place policy intervention, say poverty eradication.. interventions in healthcare. So we just want them to understand that as you're doing this, you also have to check your facts, don't just rely on a news story or rely on a document or rely on a statement by a public politician to define your problem.. And so I.. tell them.. their job is to tell the leaders what the data says." - P12

"We sort of started a project, a couple of years ago about imprints. So in the UK if you distribute companion materials on paper, you have to say who is from whereas online that's not the case. And during the referendum and.. election.. where certain information comes from online, it wasn't attributed to campaigners in the same way that it would be offline. And so we tried to get some parliamentary support to change those rules." - P9

6.5.2.3 Organizing literacy campaigns

The advocators actively organize literacy campaigns educating people on how to critically examine information that they find on online platforms. For example, P5 shared how they partner with community radio stations, design MOOC [27] courses, and frequently share tip sheets explaining to people what fake news is and how they can identify it.

"We try to also work with community radio stations .. and talk through what like what fake news is,.. talking about information literacy and information disorder and how it manifests" - P5

6.5.2.4 Spearheading initiatives to improve accessibility and quality of data and information

Better decisions are made when better data is available. Quality data and information is essential because it acts as the source against which facts are verified in short-term claims centric fact-checking. The advocators are actively working to improve codes of practice for releasing data, for example, updating them from the paper age to the digital age.

"Improving the code of practice for official statistics and updating it to the internet age. And we've done lots of individual pieces of work trying to improve specific statistical releases because obviously, they form the basis of a lot of what we do.. we try and get ..the Office for Statistics regulation to be a bit bolder in how they treat misuse of statistics officially." - P9

The interviews also revealed that data in most of the African countries is either old or not accessible.
The advocators there are engaging with government agencies, making them aware of the problem and stressing the importance of having data publicly accessible.

"Before the census in 2019, the last census had been done in 2009. So while there are estimates available on the census data, we would use those estimates. But then the data is just not accurate when it's estimated.. [So we] talk to the people at the National Statistics Office and say we would like this data. [We] talk to people at the ministry, and trade unions telling them that this is how it would be better if you track unemployment." - P12

6.5.2.5 Building collaborations and coalitions

Several advocators stressed the need for all fact-checking organizations to work together, collaborate, and share resources. Such collaborations would be helpful in understanding the common challenges and needs of the fact-checkers. Such a coalition would also better position the organizations when making certain demands of the internet companies. IFCN has played a huge part in forming such a collaboration, and several advocators are actively working to expand this network.

"There's not a lot of us working together. So that's what I'm working on in collaboration with IFCN.. how different organizations and sectors can work better together to complement each other and collaborate and kind of information sharing and resources.., [understand] common challenges. For example, when we work with internet companies, there are certain things we might all want to ask them for which, at the moment, [only] some of us asking." - P9

In addition to a coalition among themselves, a few fact-checking organizations are also actively partnering with companies like Google and Facebook to fight fake news on their platforms [102] "because they have a huge impact on the way people experience misinformation (P23)". P9 and P12 informed us how Google is working with their organizations to make their fact-checks more visible by ranking them higher in Google searches. One of the study participants (P18) was in fact instrumental in starting the initiative.

"When we want what our fact checks to rank higher or to be more visible, so there is a back end tool claim review that we use that is integrated into WordPress.. And so the search engine looks at whatever you post as a fact check and then it ranks it higher in the matrix." - P12

6.6 Needs and Challenges of Stakeholder Groups

In this section, I answer RQ2 by presenting the challenges faced by various stakeholder groups, categorized by emerging themes.

6.6.1 Skepticism Towards AI and Automation

Fact-checkers expressed skepticism and distrust towards artificial intelligence (AI) and automation of the fact-checking process. This finding resonates with prior work that revealed that the black box nature of artificial intelligence techniques and machine learning algorithms makes their inner workings unintelligible to humans, thereby decreasing users' trust in their outputs [302, 348]. P4 divulged that they are skeptical of Facebook's AI-based tool that aggregates potentially misinformative content for fact-checkers to verify and rate on its platform because they think that "algorithms hide a lot of stuff".
They believe that fact-checkers or independent organizations should be responsible for aggregating content to fact-check on the social media platforms rather than "[companies] that are running the platforms." Fact-checkers understand that machine learning models work when there are lots of similar data for training and pattern recognition. Thus, P18 doubts AI's capability to detect falsehood in politicians' statements, which can be very diverse in language and topics. They also doubt AI's capability to differentiate between a true and a false statement, especially when there are only subtle differences between the two. P19 does not believe that fact-checking could be automated since it's a complicated process that requires several humans to discuss and make decisions.

"I think [AI based tools are] going to be less useful for most.. politicians, because the problem is people don't repeat stuff the same way and the addition of a word or two can make a huge difference.. I don't think a computer is ever going to be able to figure that out." - P18

"[Fact-checking is] such a complex process, for example, extracting a claim, what's the underlying meaning of a certain claim, how to understand it. Even [for humans] it's a process of discussing and then deciding. So I think it still will be humans work in a way." - P19

Despite the skepticism towards AI, I found that some fact-checking organizations have indeed adopted AI-based tools and a few others showed a willingness to adopt such tools. However, such tools are only acceptable for low-stakes scenarios of monitoring content on social media as compared to high-stakes scenarios of assigning a veracity label to the content. This observation is in line with recent work that found that the use of AI is more acceptable in low-stakes compared to high-stakes decision-making processes [56, 346].

"I will not trust any algorithm or any AI to flag something as right or wrong, at least not at this stage.. I prefer AI only to give us a curated list, flag us that this is something that we should look in. Make a tool that picks up the signals that [indicate the content is] misleading and makes a curated list." - P22

"If you're serious with fact-checking, you cannot replace it by automatization or things like that. But it could be helpful in [social-media] monitoring." - P19

To increase fact-checkers' acceptance of AI-based tools, P16 and P12 stressed how it's essential to have humans in the fact-checking loop. P12 further said that they would only trust the AI output if the tool is able to explain how it arrived at a particular conclusion. Past studies have also developed human-in-the-loop AI systems [363, 430] and tried to make them more explainable in order to foster trust in them [35, 111, 258, 278, 348].

"I think at a certain point, AI stops and human being needs to come in to verify.. I don't think that AI can really do exactly the same what we can do. Not yet at least." - P16

"The manual process would still have [to be there]. I would be willing to use it [automated tool] to see how it arrives at that conclusion. So if you say this is misinformation, and these are the sources of data that we're using to make that. And we check and find that the algorithm.. is not using them out of context. So at that point, we would be in a position to say let's check it.
" - P12

6.6.2 Need For Tools and Limiting Social Media Affordances

I found stakeholder groups divulging several challenges related to the tools they use and the affordances provided by social media platforms. In the process, I also discovered a few needs with respect to tools and systems. I present them next.

6.6.2.1 Monitoring social media platforms is manual, time consuming, and difficult.

Fact-checkers complained about the information overload problem. With the emergence of a variety of social media platforms, the amount of online information is increasing exponentially. However, for fact-checkers, searching for misleading content mostly remains a manual task. In addition, the generation of search queries that could lead to potentially dubious content is still based on the hit and trial method. As discussed in Section 6.4.3, only Facebook has provided a tool to its partner fact-checking organizations that can aggregate potential misinformation. While a number of social media monitoring tools have been developed to identify and aggregate misinformation, particularly on Twitter (such as [48, 88, 204, 327, 356, 389], etc.), other platforms (e.g. YouTube, Google, Yandex, etc.) where misinformation is equally prolific also need attention.

"The search is not easy to be honest.. it takes a lot of time to actually find the content because we only have a manual method to do that, hit n trial method to do that.. I have even scrolled to the point where YouTube shows me no more results. So that's how manual it gets." - P7

"If there's some.. tool that helps filter..different types of misinformation will help because now you have information overload and you don't know what to choose and what not to choose." - P3

6.6.2.2 Limitations of platform affordances.

Many fact-checkers complained about how online social media platforms' affordances become a hindrance when searching for and filtering content. For example, the inability to search for posts and comments on Facebook, the lack of a search trends feature on platforms (with the exception of Twitter), the unavailability of fine-grained search filters, the inability to download content on platforms like Instagram, and the inability to find the same videos uploaded with different keywords (title, description, etc.) on YouTube are some of the limitations of platform affordances.

"Facebook is actually tricky to be honest, because there's no one way to search content. You can only search people..pages and groups, etc.. but I need user generated content." - P7

"Youtube search engine is not very good. It will just show you only videos which are popular which has no use. Misleading videos won't have too much views, but they have a lot of uploads. What happens is that since a lot of people.. upload videos using different caption, different keywords. So.. [there might be] 10 versions of the same thing." - P8

6.6.2.3 Systems and tools needed to detect misleading claims on private message platforms.

Privacy settings on social media groups (e.g. Facebook) and end-to-end encryption in messaging platforms (e.g. WhatsApp) act as a hindrance in accessing misleading content circulating on these platforms. The fact-checkers informed us that they need tools that allow them to access and flag content on these platforms. Recent research work
has focused on building crowd-sourced WhatsApp tip lines for discovering content to fact-check [230, 273]. However, being able to track or report private messages via tools necessitates serious ethical considerations. Private messaging platforms give users a false sense of security, thereby making them share sensitive information (e.g. clinical records of patients [267]) without anonymizing it [182] and hence expose users to privacy risks [60].

6.6.2.4 Overload of tools.

During the interviews, fact-checkers showed and talked about several tools that they use during fact-checking. One of the participants, P8, showed us around 15 tools. There is a tool overload problem. P3 suggested building a single tool that could provide all the functionalities that fact-checkers need. However, since fact-checking is a complex task involving multiple steps, it is difficult for a single tool to cater to a divergent set of requirements and functionalities [389]. Thus, building a suite of purpose-built tools that cater to the specific needs of fact-checkers in the various steps involved in fact-checking would be more useful for the fact-checkers.

"[If one could] put all these tools in one, .. So I don't have to look for different tools when I'm analyzing a video or an audio. I can do it in one place instead. Like.. I can use 10 tools to analyze the video. But if we can have one..[tool that] gives you the information you need." - P3

6.6.2.5 Need for specific tools

Despite the tool overload problem, existing tools lack task-specific functionalities. For instance, a lot of the steps involved in video fact-checking are manual. While tools like InVid and reverse image search are available to verify a video, they are only useful to check if the video is digitally altered or used in a different context than the original. For all other videos (e.g. videos with conspiracy theories), the claim extraction and claim verification process is manual. For lengthier videos, this process can get even more tedious.

"There is no tool to debunk [conspiracy theory videos]. For example, I cannot do a reverse image search, and I cannot divide the video into keyframes, because it is a narrative that is false.. We have to really go line by line and see what the person is saying, and then all we can do is search quotes from different organizations to [verify the claim]." - P7

Fact-checkers believe that "efforts against misinformation advance much more and much quicker in English" (P23) than in other regional languages. They need tools to transcribe videos in local regional languages.

"The problem with transcribing videos online or using any other software is languages because India has so many languages and we tend to get a video and information in every possible language. So it becomes difficult to have one dedicated tool to transcribe all our videos." - P14

Editors revealed that editorial work is mostly manual. P1 spoke about how editing work is "still very human" and how project management tools like Trello could be further strengthened. For example, currently "there are a lot of issues of accountability" with the Trello board; it lacks control features, because of which "anybody can move a card anywhere."

6.6.2.6 Getting organic engagement for fact-checks without advertisements

P17 revealed that while on some platforms (e.g.
Twitter) it is easier to "get organic reach and engagement" by adopting appropriate strategies, it is "a big challenge to get [same].. attention without advertising" on other platforms (e.g. Facebook).

6.6.3 Issues Around Policy and Information Infrastructure

Stakeholder groups discussed several challenges surrounding information availability and quality. They understand that quality data is essential not only for developing AI-based automated tools but also for investigating claims.

6.6.3.1 Need to improve information quality before automation

AI models are only as good as the data available. If the information against which claims are to be verified is missing or of low quality, the models will never work.

"Automation is tricky because.. in a place like East Africa ..information is not readily available. You cannot say that I'll go to this site and get this information so that when these numbers are presented it can easily be automated." - P12

P9 raised an interesting point along the same lines. They explained how the success of automated fact-checking depends on the accessibility and format of the statistics and information available. The data has to be in the same format for the machine to be able to understand it. Their organization is working with institutes around the world to improve statistics globally.

"[We want] statistics being published in a kind of open accessible and consistent format. So, for example, like any symbols that are used to show a caveat about data needs to be the same so that our machine can understand them each time.. My colleague.. is working with the open data Institute in UK and also globally.. to [determine] whether there are ways of improving statistics so that automated fact-checking can work." - P9

P22 explained how data in their country is not available in a user-friendly or machine-usable format.

"The data, you will find that it is in a PDF... file or a photograph on the website..then how do we use it?...You need the data ..[to be] put into a excel sheet so you can clean it. Data should be in a format which is user friendly so it can be used, [it should be] machine learning friendly so that the machine can pick up and use the data." - P22

The aforementioned concerns elicited by fact-checkers are in line with prior research that has also listed the absence of structured, quality data, as well as the lack of adherence to data standards, as some of the major challenges faced in the field of big data [42, 115]. Lack of harmonization between data sets makes data integration a complicated and time-consuming task [8, 9]. Data integration is necessary in various scenarios related to fact-checking [251], for example, querying different databases originating from different sources to determine the veracity of content, or determining whether a piece of content has already been fact-checked by searching in various fact-checking databases [86, 251]. Thus, scholars have stressed the need to adhere to common data standards to help facilitate data integration and reuse [67]. There have been a number of cross-country initiatives to set common standards for data in various domains. For example, in 2017, the European Medicines Agency held a meeting to discuss the opportunities and challenges in applying a common data model to healthcare data across the countries in Europe to support regulatory decision-making [10, 11].
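As a small, hedged illustration of the kind of tooling that could ease P22's complaint above about statistics trapped in PDFs, the sketch below pulls a table out of a digitally generated PDF report into a CSV file using the pdfplumber library; the file names are hypothetical, and scanned photographs of tables would instead require OCR, which this sketch does not attempt.

import csv
import pdfplumber

# Hypothetical input: an official statistics report whose tables are real text,
# not scanned images (scanned pages would need OCR instead).
with pdfplumber.open("statistics_report.pdf") as pdf:
    first_page = pdf.pages[0]
    table = first_page.extract_table()  # list of rows, each a list of cell strings

if table:
    with open("statistics_report.csv", "w", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(table)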
Furthermore, several advocators (Section 6.5.2) have also been spearheading initiatives to improve the data quality in their respective countries.

6.6.3.2 Lack of information sources.

The interviews with fact-checkers in the Global South revealed that the information needed to investigate claims either does not exist or is not updated periodically.

"There's a lot that we don't cover because of lack of sources." - P2

"In Kenya,.. demographic Health Survey, which .. shows the health situation in the country, the last one was done in 2014, this is 2020. It's just too old, and we can't use it." - P12

6.6.3.3 Difficulty in getting information for research from civic organizations.

In most cases, fact-checkers need multiple sources to debunk a claim. To obtain these resources they rely on public data sets and information from government and civic organizations. Six fact-checkers working in African countries and the Balkan region informed us how information needed for research is not publicly available, and getting it from officials is a long and difficult process.

"It's so hard to get information from the government.. because everybody wants to protect themselves. They don't want to give you the information that you actually need." - P3

6.6.3.4 Algorithmic bias against local content

P5 complained about algorithmic bias in terms of how content indexed in search engines is skewed towards the Global North, making it very difficult to search for local content in regional languages of the Global South.

"The way the algorithms works was very I'd say Euro-centric or like North America [centric].. Trying to find tools that would enable me to find.. stuff that's not in English, like things in local languages was quite challenging.. it would take a while for some of the stuff from from this side to be indexed on.. Google searches and other platforms .. So, there's sort of algorithmic bias when it comes to.. find stuff like that." - P5

6.6.4 Emotional Cost of Fact-checking

In addition to the manual labor involved in fact-checking, fact-checkers also face a significant emotional toll and stress in their jobs. They are often victims of online threats and abuse from users and conspiracy theorists whose posts they were tasked to debunk. Manually scanning through misinformative content about certain topics, such as riots and conspiracy theories, also has adverse effects on their mental health.

"It becomes kind of very stressful job. Seeing all violence and getting into each and every detail, kind of takes a toll on your mental health. And then again you have to listen to abuses. And the situation is worse when you like put this content online and then people attack you" - P8

"We are exposed to threats, these conspiracy theorists are very aggressive and I worked only for like a couple weeks when I saw a CrowdTangle post with my own photo saying, this is your censor and it was very unpleasant feeling to find that." - P16

While previous work has studied emotional labor and psychological symptomatology in content moderation work [54, 123, 231, 318, 329, 369], no study has investigated the emotional cost of fact-checking work. Studying the human costs underlying the fact-checking process and determining wellness interventions for the psychological effects of fact-checking are other fruitful avenues for future research.
In this section, I discuss how this study renders visibility to the human and techno- logical infrastructures supporting the fact-checking work and the collaborative efforts involved in the process. I also discuss the needs of the stakeholder groups and the implications of my findings on future research directions in fact-checking. 6.6.5 Rendering visibility to the human infrastructure of fact-checking To date, the primary objective of fact-checking is considered as debunking mislead- ing claims. Based on this objective, prior work suggests that fact-checking can only influence three constituencies— people, journalists, and political operatives [49]. By identifying and examining the human infrastructure—the stakeholder groups that need to be brought into alignment to accomplish fact-checking, this work provides a means to think about fact-checking as a multidimensional initiative which then assists in understanding 1) the invisible aspects of the process, and 2) the other ways the fact-checking process can have an influence. This work shows that fact-checking is supported by several processes, such as editorial work, social media engagement work, in-depth research and data analysis, as well as advocacy and policy work that might not be visible to the external world. Through the study of these processes, 170 6.6. NEEDS AND CHALLENGES OF STAKEHOLDER GROUPS I establish how fact-checking has evolved to include both short-term claims centric and long-term advocacy centric fact-checking. I make visible the efforts that fact-checking advocators are putting in to improve the availability, accessibility, and quality of data and statistics by aligning the focus and interests of governments and internet companies with fact-checking organizations. Through these efforts, the fact-checking organizations are not only improving the information landscape of their country but in turn are also improving the quality of (short-term claims centric) fact-checking itself. Rendering visibility to the work of the human infrastructure and the invisible processes of fact-checking has helped in uncovering the needs—both social and technical—of the entire fact-checking ecosystem consisting of all the stakeholder groups. The knowledge of the needs of the ecosystem could enable the design and development of tools and policies to support various aspects of the fact-checking process. 6.6.6 Collaborative efforts in the fact-checking process This study highlights fact-checking as a distributed problem where collaboration takes place at multiple stages among people with different skill sets within and outside the fact-checking team/organization. First, collaboration occurs among the stakeholder groups: editors, fact-checkers, social media managers, researchers & investigators, and advocators (refer Figure 6.1 for an overview). Second, collaboration extends to the outside world with experts such as doctors, oncologists, academics, etc. whose expertise is needed to investigate dubious claims. Third, fact-checkers collaborate with civic and government organizations during the investigative stage to access data and statistics related to the claims under investigation. Fourth, in parallel, advocates reach out to policymakers to influence policy by highlighting the challenges faced by fact-checkers. Fifth, several internet companies, like Facebook, collaborate with fact- checking organizations to fact-check and debunk misleading claims on their platform. 
Finally, collaboration occurs between social media users and the stakeholder groups at two stages: 1) at the content monitoring stage where users report dubious claims that they encountered online directly to fact-checkers via tip lines, and 2) when users engage with fact-checks disseminated by social media managers. Rendering visibility to the collaborative efforts in the fact-checking process has multiple benefits. I discuss a few. 171 CHAPTER 6. IDENTIFYING WAYS TO SUPPORT FACT-CHECKING ONLINE MISINFORMATION 6.6.6.1 Increase in efforts to foster collaborations Making visible the collaborative efforts in the fact-checking ecosystem can lead to efforts and policies that foster these collaborations. There has been growing research on how to make users engage with the published fact-checks [50, 96, 132, 151, 432]. For example, fact-checking organization Pesacheck created a Twitter bot named debunk bot that detects tweets containing URL(s) to misinformative content and replies to them with the link to the fact-check that their organization has published [20]. Similar investigations and efforts can be put to support other collaborations, such as between experts and fact-checkers. There has been only a handful of recent efforts in this direc- tion, for example, Meedan’s Digital Health Lab7 and Facebook’s Journalism project8 support fact-checkers in debunking health-related misinformation by connecting them with health experts and providing them with resources on the health topics that they are covering. Imagine a private social media platform consisting of separate com- munities (like subreddits) of fact-checkers, and experts from different fields (from doctors, journalists, meteorologists, to university librarians and professors) to facilitate easy and targeted communication and information sharing. Fact-checkers can post questions in relevant communities, seek quotes from experts, and get suggestions for online and offline resources to support their investigations. Fact-checkers can also share with each other their concerns or information about new tools that they discov- ered. Such a platform could facilitate fact-checking organizations in addressing online misinformation effectively and in a timely manner. 6.6.6.2 Revealing the value of fact-checking work to internet companies In recent times several platforms such as Google search, YouTube, and Google images have started actively using fact-checks produced by fact-checking organizations with their search results to help determine their validity and truthfulness [19, 23, 29]. The study also revealed how fact-checking organizations are collaborating with internet companies and allowing them to index their fact-checks. This finding contributes towards the HCI scholars’ call of making people aware of “the value their data brings to intelligent technologies” [24, 407]. This work highlights the value that fact-checking is bringing to social media companies that regularly use fact-checks to regulate the 7https://meedan.com/digital-health-lab 8https://www.facebook.com/journalismproject/facebook-partners- with-meedan-digital-health-lab-to-help-fact-checkers-debunk-health- misinformation?locale=pa_IN 172 6.6. NEEDS AND CHALLENGES OF STAKEHOLDER GROUPS content on their platforms and provide reliable information to the users. Previous scholarly works have raised questions on whether volunteer-created content such as Wikipedia articles should receive more economic benefits from the internet companies [405, 407]. 
Along similar lines, I want to raise the question of whether fact-checkers and fact-checking organizations should also receive more economic benefits for their work. 6.6.6.3 Revealing power dynamics in collaborations This study also sheds light on the power dynamics of collaborations happening in the fact-checking ecosystem. For example, this work shows how editors have the power over fact-checkers in determining what kinds of misinformative claims should be prioritized. Additionally, in certain situations, fact-checkers depend on civic and government organizations to get data for investigation. Tools released by social media companies, such as Facebook Queue (refer Section 6.4.3.1) also dictate what kind of claims fact-checkers investigate. These companies also have the power to remove the fact-checks from their platform. For example, Facebook removed a fact-check on abor- tion after receiving complaints from Republican senators [21]. Further investigation of the potential consequences of these power dynamics in the fact-checking process is a fruitful avenue of future research. 6.6.7 Implications for future research on fact-checking Taking a multi-stakeholder perspective on the fact-checking process helped us learn the needs of stakeholder groups as well as uncover the challenges that go beyond the technical aspects of fact-checking. I use the findings to discuss and propose various directions that future research on fact-checking can take. In Section 6.6.7.1, I start by discussing the needs of various stakeholder groups and propose solutions for the same. In Section 6.6.7.2, I discuss the values that the stakeholder groups desire in the tools and systems built for them. Next, in Section 6.6.7.3, I discuss how focusing on technical solutions alone is not enough and how the existing automated approaches fail to work in real life since they ignore the social aspect of fact-checking. I also reflect on the social and civic challenges faced by fact-checking organizations by discussing the role of information infrastructure in fact-checking. Finally, in Section 6.6.7.4, I end by stressing how the current research on misinformation has not focused on the Global 173 CHAPTER 6. IDENTIFYING WAYS TO SUPPORT FACT-CHECKING ONLINE MISINFORMATION South countries and how there is a dearth of fact-checking tools built for regional languages of the Global South. 6.6.7.1 Technical needs of fact-checking The study reveals that monitoring social media platforms is the most challenging aspect of the job of a fact-checker. A combination of over-reliance on third-party tools to discover potentially dubious content, limiting platform affordances, and a manual way of going through each search result to determine if it’s potentially misleading makes the process extremely tedious. Access to better search filters on social media platforms, and a community-based approach to reporting misinformation where users of social media platforms are able to report problematic content are some useful ways to assist fact-checkers in finding misleading claims. Fact-checkers also agreed that it’s impossible for them to have access to all corners of the web. For example, they do not have access to content on private messaging platforms like WhatsApp that have become a popular haven for groups interested in sharing misinformation [47]. Given fact-checkers’ skepticism towards AI and automa- tion, user reporting via tip lines appears to be a feasible solution to access content on such platforms. 
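As a rough illustration of the tip-line idea, the following minimal sketch (all names and storage choices are hypothetical and not part of any system described in this thesis) shows a simple intake step in which user-submitted reports are validated, de-duplicated, and queued for human review rather than being judged automatically.

# A minimal sketch (all names hypothetical) of a tip-line intake step: user-submitted
# reports are validated, de-duplicated, and stored for a human fact-checker to review.
import sqlite3
from urllib.parse import urlparse

def record_tip(db_path: str, url: str, note: str = "") -> bool:
    """Store a user-reported URL for later review; return False if invalid or already queued."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False                                   # reject malformed submissions
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS tips (url TEXT PRIMARY KEY, note TEXT, status TEXT)")
    try:
        conn.execute("INSERT INTO tips VALUES (?, ?, 'pending review')", (url, note))
        conn.commit()
        return True
    except sqlite3.IntegrityError:                     # duplicate report of the same URL
        return False
    finally:
        conn.close()

# record_tip("tips.db", "https://example.com/suspicious-post", "forwarded from a WhatsApp group")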
Research on how to motivate people to report problematic content is a fruitful avenue for future research. I also found that fact-checkers end up viewing long videos to extract misleading claims and reading through lengthy comments sections to get clues that would help them investigate the claims. A system that could utilize comments to highlight all the misleading claims present in a video, while leaving the final decision of selecting which claims to verify to fact-checkers, could be useful for the fact-checking community. Furthermore, a tool that highlights credibility indicators (such as comments containing useful information for investigating the claims) would significantly reduce the manual effort put in by fact-checkers. The interviews revealed that when it comes to assigning a verdict to a claim, there is no single standard labeling system shared across fact-checking organizations. Collective agreement on the labels could allow for greater information sharing and interoperability. Future efforts can focus on developing structured ontologies for representing credibility assessment metrics, which would lead to the development of common labels and benchmarks for assigning veracity labels to claims. This work also sheds light on the needs of stakeholder groups other than fact-checkers. There is a lack of editorial and process management tools that could be used by news desk and copy editors. Social media managers need effective strategies to increase users’ engagement with fact-checks. In order for fact-checks to really have an impact, they must be “seen and attended to by audiences” [50]. To accomplish this, it is essential to understand who shares fact-checks on social media platforms and which modality or visual storytelling technique is best suited for which platform. Tools are also needed to translate fact-checks into multiple languages to increase their reach. Platforms can also help make fact-checks more visible and accessible. Google’s effort to prioritize fact-checks in search rankings (e.g. through ClaimReview markup) is one example. Technology critics have called structured journalism, in which fact-checks are produced in a machine-readable form, the future of fact-checking [41]. Recent work has tried to automate the extraction of structured information—claim, claimant, and verdict—from fact-checks to allow search engines to display it in the search results [215]. More such efforts towards structured journalism are needed to integrate fact-checks with online content. 6.6.7.2 Values desired in fact-checking tools and systems The study participants expressed skepticism about automation and AI technology because of its black-box nature. At the same time, they also showed a willingness to adopt automated solutions for low-stakes tasks. Fact-checkers do not want systems that decide the veracity of information; rather, they want tools that could help them with their day-to-day tasks such as monitoring online platforms, checking whether a video is digitally altered, transcribing videos in regional languages, etc. Algorithm explainability combined with tools that keep humans in the loop emerged as key values that fact-checkers desire in the systems built for them. Familiarity with the inner workings of algorithms, along with tools that use both human and machine capabilities for problem-solving, can help increase fact-checkers’ trust in automated systems.
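To illustrate what such a human-in-the-loop arrangement could look like in practice, here is a minimal sketch (the scorer, threshold, and data structures are hypothetical and do not describe any existing tool): the machine only ranks and routes content, the veracity verdict stays with the fact-checker, and the fact-checker's decisions are retained so the underlying model can later be audited or adjusted.

# A minimal human-in-the-loop sketch (hypothetical scorer and thresholds): the machine
# only ranks and routes content, while the veracity decision stays with the fact-checker.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ReviewItem:
    url: str
    score: float                  # model's estimate that the item is worth checking
    verdict: str = "unreviewed"   # filled in by a human, never by the model

@dataclass
class ReviewQueue:
    scorer: Callable[[str], float]
    threshold: float = 0.7
    items: List[ReviewItem] = field(default_factory=list)

    def triage(self, url: str) -> None:
        score = self.scorer(url)
        if score >= self.threshold:              # machine surfaces, human decides
            self.items.append(ReviewItem(url, score))

    def record_verdict(self, url: str, verdict: str) -> None:
        for item in self.items:
            if item.url == url:
                item.verdict = verdict            # human feedback retained for auditing

# queue = ReviewQueue(scorer=lambda u: 0.9)       # plug in any (explainable) model here
# queue.triage("https://www.youtube.com/watch?v=example")
# queue.record_verdict("https://www.youtube.com/watch?v=example", "false")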
Recent times have witnessed a burgeoning interest in the field of human- centered XAI where researchers draw from formal HCI theories to design explanations on how machines reach a particular decision [36, 40, 55]. Scholars are also studying how to design human and machine configurations to operationalize human-in-the- loop systems [180, 430]. Understanding specific needs for explainability with respect to fact-checkers, operationalizing those needs at the conceptual and methodological levels in tools developed for fact-checking, designing systems that could take fact- checker’s feedback and use that to modify the algorithm used by the system are few useful directions for future research. 175 CHAPTER 6. IDENTIFYING WAYS TO SUPPORT FACT-CHECKING ONLINE MISINFORMATION 6.6.7.3 Going beyond the technical: need for socio-technical solutions Would technology-mediated solutions alone lead to improvement in the quality of fact-checking? While automating the entire fact-checking process and developing new tools to increase efficiency and scalability seems promising, it is not a panacea to all the problems faced by the stakeholder groups. A holistic change could only be achieved via systematic changes in the civic, political, and informational contexts. Through this study, I found how accessing data from the civic and government bodies is a difficult task, especially if it portrays the government in a less-than-favorable light. While fact-checking is now a global endeavor, in some countries information is either not publicly available or essential information (e.g. census data, health surveys, etc.) is outdated because of a lack of periodic collection. The interviews revealed how several claims are left unchecked because of a lack of sources. The availability of high-quality up-to-date information is essential for fact-checking—manual or automatic. Addition- ally, good quality data is a precursor to having good machine learning models [355] that would be needed to automate the investigative step in the fact-checking pipeline where publicly available authoritative statistics are used to determine the veracity of claims. Thus, as part of long-term advocacy-centric fact-checking, advocators within the fact-checking organizations have been actively pushing for policy changes to improve the availability and quality of data and statistics. For example, fact-checking organiza- tion Full Fact gave oral evidence to the House of Commons Public Administration and Constitutional Affairs Committee on issues of coherent and accessible health statistics in the United Kingdom [296]. There is a growing interest in the HCI community to en- gage with policymakers as a way to inform policy that could benefit society [248, 365]. Future work in fact-checking could focus on understanding the opportunities and difficulties that advocators face while engaging with civic organizations and determin- ing strategies that advocators can adopt to shape policies surrounding statistics and public data in their respective countries. 6.6.7.4 Fact-checking in the Global South Recent work in the CSCW community stressed on the fact that academic research, to date, has primarily focused on misinformation in Western countries, while not addressing the phenomenon in the Global South [190]. This study reiterates the lack of knowledge and context surrounding the misinformation landscape in the Global South. Local regional languages are under-resourced both by online platforms and 176 6.7. 
CONCLUSIONS AND LIMITATIONS search engines making it difficult for fact-checkers to gain access to local context online and for regional speakers to gain access to reliable information [126]. The current design-based approaches to fact-checking do not take into account the lack of fact- checking resources in regional languages and thus, fail to account for the unevenness of the viability of fact-checking across the globe. Advocators recognize how access to local community-specific knowledge and culture is essential to understand the characteristics of misinformation and why it spreads in a particular region [144]. Thus, they are calling attention to the need to improve the information infrastructure and research in the Global South. This work also acts as a call to action for researchers to study the misinformation landscape in the Global South region. 6.7 Conclusions and Limitations This work sheds light on how fact-checking is practiced in the real world by presenting the infrastructures—both human and technological—supporting the fact-checking work. I interviewed 26 participants belonging to six primary stakeholder groups involved in the fact-checking process namely editors, external fact-checkers, in-house fact-checkers, investigators and researchers, social media managers, and advocators. By studying the various tasks performed by these stakeholder groups, I identified the role of tools, technology, and policy in their work. Finally, I also identified key challenges faced by the stakeholder groups along with opportunities of advancing current tools, policies, and technology for fact-checkers. This work is not without limitations. The majority of the organizations that I interviewed are either IFCN signatories or work closely with IFCN signatories. Thus, the fact-checking model and tools used by the stakeholders presented in this study might not apply to every fact-checking organization. The fact-checking process could also be subjected to various degrees of local and regional variabilities that this work does not capture. I also acknowledge that not all stakeholder roles are present in every fact-checking organization that I interviewed. For example, not every organization is doing long-term investigative or advocacy work. I leave the examination of the factors influencing the fact-checking work in different regions and the variability in fact-checking work across regions to future work. 177 C H A P T E R 7 DEFENDING AGAINST ONLINE MISINFORMATION VIA SYSTEM DESIGN In Chapter 6, I interviewed fact-checkers to understand the process of fact-checking, and the technical, informational, and social needs of the fact-checking organizations. Using the insights from that work, in the last thread of my research, I design and build a system to combat online misinformation. More specifically, I present YouCred—a fact-checking system that I designed with and for fact-checkers to help them with their fact-checking workflow on the YouTube platform. 7.1 Motivation With the growing scale and widespread dissemination of online misinformation, mon- itoring social media platforms has become an ongoing challenge. While there has been an increase in efforts to develop fact-checking systems, especially for platforms like Twitter and Facebook, these systems have struggled to make a substantial impact on the practices and reporting methods of fact-checking organizations [317]. Moreover, there is a noticeable absence of such tools for video search platforms such as YouTube [317]. 
This is particularly noteworthy considering that YouTube is the second-largest search engine and the most popular video-sharing platform [3] and has been fre- quently labeled as a hub for conspiracy theories [45, 82, 416]. To address this issue, I partnered with Pesacheck [311], Africa’s largest indigenous fact-checking organization, 178 7.1. MOTIVATION to develop YouCred—a fact-checking system for the YouTube platform. To ensure that YouCred caters to the actual day-to-day needs of fact-checking orga- nizations and assists them in their fact-checking workflow on the YouTube platform, I first conducted a formative interview study. The objective of the study was to gain insights into the methods employed by fact-checkers for monitoring and fact-checking videos on YouTube, as well as to identify the challenges they encounter and their spe- cific requirements. I interviewed seven participants performing various fact-checking roles (as identified in [317]) at Pesacheck. I also included two fact-checkers outside of Pesacheck to ensure the broader applicability of YouCred beyond a single organization. Through this study, in line with the findings of Chapter 6, I found that fact-checkers heavily rely on manual searches to find misleading content on YouTube. In addition, generating search queries that could lead to potentially dubious content is still based on guesswork, domain knowledge, and experience. As a result, fact-checkers spend countless hours crafting search queries and scanning YouTube in search of potentially misleading information. Furthermore, I also discovered a lack of video annotation tools to aid fact-checkers with credibility assessments on YouTube. Based on these insights, I designed the YouCred system to aid fact-checkers with misinformation dis- covery and credibility assessments on YouTube. The system is a result of a 2-year long collaboration with Pesacheck. I integrated the knowledge and feedback of Pesacheck’s fact-checkers throughout all stages of system development, including requirement elicitation, feature engineering, system design, deployment, and testing. YouCred offers a misinformation discovery feature designed to generate search queries related to significant events and topics of interest to fact-checkers which are likely to yield misinformative results on YouTube. The system also provides an intuitive credibility assessment interface that simplifies video annotation, allowing fact-checkers to high- light misleading claims, add comments, and contribute other relevant information to investigate the veracity of the information presented in the videos. To test the acceptance of YouCred, I deployed the system at Pesacheck for nine months and closely monitored its usage. I also conducted interviews with Pesacheck’s team to gain deeper insights about YouCred’s applicability. My evaluation revealed that YouCred was used until the very end of the deployment period. Fact-checkers found YouCred to be a valuable tool. It provided them with a wealth of information that would otherwise be challenging to obtain manually, enhancing their fact-checking capabilities on the YouTube platform. Overall, YouCred represents a significant step towards combating online misinfor- 179 CHAPTER 7. DEFENDING AGAINST ONLINE MISINFORMATION VIA SYSTEM DESIGN mation on video search engines. 
The collaborative efforts highlight the importance of an ongoing dialogue between fact-checking organizations and technology devel- opers to ensure the relevance and effectiveness of technological solutions in the fight against online misinformation. I show that by incorporating stakeholders’ insights in designing fact-checking systems, I can develop solutions that have a tangible impact on real-world fact-checking practices. My study makes the following contributions: • YouCred serves as an example of how to design systems that bridge the gap between the needs of fact-checking organizations and the development of fact- checking systems. By integrating the knowledge and feedback of Pesacheck’s fact-checkers throughout the entire development process, YouCred is designed to align with the values and fulfill the requirements of fact-checkers in a real-world context. • YouCred introduces an innovative approach to assisting the fact-checking process by automatically generating search queries pertaining to important events and topics of interest. This feature provides structure to the current search query gen- eration method which traditionally relied on guesswork and lacked a systematic approach. • The YouCred system provides an intuitive interface for annotating YouTube videos, allowing fact-checkers to highlight misinformative claims, add comments, and other relevant information. This streamlined annotation process simplifies the identification and flagging of misleading content within videos, enhancing the overall fact-checking workflow. • YouCred was deployed and monitored for a period of 9 months, allowing for extensive evaluation. The continuous usage of YouCred during this period indi- cates its practical value and usefulness for fact-checkers. Additionally, the study also underscores that designing and deploying systems is not enough, we need continuous maintenance and evolution of systems to effectively address the evolving needs of fact-checkers. 7.2 Formative study To inform my work, I conducted a formative interview study to understand the current fact-checking practices to monitor YouTube and opportunities for improving those practices. In this section, I describe the details of the formative study including the participant details (Section 7.2.1), study procedure (Section 7.2.2), and findings 180 7.2. FORMATIVE STUDY (Section 7.2.3). Next, I describe the design goals I identified to guide the development of YouCred (Section 7.2.4). Finally, I describe the collaborative and iterative design process adopted for the study (Section 7.2.5). 7.2.1 Participants and Procedures I conducted in-depth, formative interviews with 9 participants. Among them, 7 par- ticipants were affiliated with Pesacheck (referred to as P1-P7) and held various roles within the organization. These roles included 4 external fact-checkers responsible for monitoring online platforms, investigating dubious claims, and creating fact-check reports, 2 news editors overseeing the fact-checking process, and an advocator en- gaged in long-term investigative research on persistently circulating misinformation. For a detailed understanding of these roles, please refer to [317]. To ensure the sys- tem’s flexibility and adaptability to different organizations’ needs, I also interviewed 2 additional external fact-checkers (referred to as P8-P9) from other fact-checking organizations (DPA [31], and First Check [30]). 
In order to preserve the participants’ anonymity, I refrain from providing their demographic details and specific roles within the organization. 7.2.2 Interview protocol I began the interviews by asking participants how they monitor YouTube and discover misleading videos on the platform. Then, I asked participants to share their screens and guide me through a fact-checking report debunking a false claim made in a YouTube video. I asked them to explain the entire process followed in the report, including how they found the video, identified the problematic claim, conducted investigative work, and wrote the fact-checking report. Next, I asked questions about the tools that fact-checkers employed in the process as well as any disadvantages associated with those tools. Finally, I asked about the challenges encountered throughout the fact-checking process on YouTube and discussed how the affordances of YouTube facilitated or hindered their work. Overall, the formative interview study provided valuable insights into the needs, values, and challenges experienced by the stakeholder groups. I conducted interviews on Zoom and they lasted between 60-90 minutes. The first and second authors went through the transcriptions of the recordings and coded them using thematic analysis. In the next section, I report the findings of the formative study. 181 CHAPTER 7. DEFENDING AGAINST ONLINE MISINFORMATION VIA SYSTEM DESIGN 7.2.3 Findings All participants emphasized the challenge of finding and fact-checking misleading content on YouTube. One participant stated how “they have seen a decline in the YouTube videos that [they] fact-check because of the hardships that [they] face with finding misinforma- tion on the platform” (P5). The interviews revealed that searching for misleading content on YouTube remains a manual and difficult task. The generation of search queries to find potentially misleading content is based on “guesswork and trial-error method” (P3). As one participant explained, “To search about a particular topic/claim on YouTube, fact-checkers start with relevant terms related to certain claims or terms that best describe that person or their situation or history, and [they] keep trying different options until [they] find some results that they’re looking for” (P4). P3 further added that searching for generic queries like “Obama will get both misinformation and non-misinformation content. So it’s up to [the fact-checker] to through all search results to figure out which one is misinformation”. Going through the search results is also challenging because the top search results usually contain videos from mainstream news channels. As one participant elaborated, “It’s sort of hard for you to zero down to exactly what you’re looking for... If you look for, say, Ebola, they’ll give you these videos that are authentic, say, from the BBC, DW, Aljazeera, and CNN, and that is not potential misinformation” (P5). The modality of videos poses additional challenges, as fact-checkers may have to “watch very long videos from start to end” (P2). To save time, some fact-checkers prefer using transcripts and searching for specific keywords. As one participant explained, “ I go to a video that is 1 hour long and if you’re looking for misinformation about COVID-19, I just search for COVID-19, and then I only go through the instance where the person in the video spoke about COVID-19” (P3). 
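The transcript-skimming practice P3 describes can be approximated in a few lines of Python with the open-source youtube-transcript-api package (the same package later used in this thesis for transcript extraction); the video ID and keyword below are placeholders.

# A minimal sketch of the transcript-skimming practice P3 describes, using the
# youtube-transcript-api package (pip install youtube-transcript-api).
from youtube_transcript_api import YouTubeTranscriptApi

def find_mentions(video_id: str, keyword: str):
    """Return (timestamp, text) pairs for transcript segments mentioning a keyword."""
    transcript = YouTubeTranscriptApi.get_transcript(video_id)   # list of {'text', 'start', 'duration'}
    return [(round(segment["start"]), segment["text"])
            for segment in transcript
            if keyword.lower() in segment["text"].lower()]

# Example (placeholder video ID): jump only to the moments where COVID-19 is mentioned.
# for timestamp, text in find_mentions("dQw4w9WgXcQ", "covid-19"):
#     print(f"{timestamp}s: {text}")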
Interviews revealed that the YouTube platform also lacks certain affordances that are useful for fact-checking workflows, for example, “the ability to search for a lot of things at once” (P2), “get the analytics for search results” (P2), ability to download search results (P2, P3), and the capability to filter search results using a combination of search filters, for example, date-range and engagement metrics (P2, P3, P4, P5). The interviews further revealed that the process of annotating a video for veracity is unstructured and they rely on google docs to keep track of all information about the fact-check—“When it comes to video annotation, it is very analog. What we do is take the actual video, download the transcript..figure out what part of the video we want and use google docs to annotate the videos” (P1). When probed about what kind of tool they envision, I repeatedly observed partici- 182 7.2. FORMATIVE STUDY pants expressing to have control over the outcomes of the tool, for example, “I want to be able to choose the keywords”, “I need the ability to decide what videos I want to look into”, “human being needs to come in to verify. I don’t think that [any tool] can really do exactly the same what we can do, not yet at least”. All of these statements underscored the partici- pants’ strong desire for a sense of control and agency when it comes to the tools they would utilize. Participants also expressed a desire for the tool to support searching “in a specific country, a specific region” (P5) and the ability to search in “regional languages of Africa, such as Amharic” (P7). They also desired a unified system that incorporates most elements of their workflow, eliminating the need to switch tabs and providing a seamless experience (P4, P6). 7.2.4 Design goals Drawing from my own research in Chapter 6 and the formative study, I identified five design objectives that inform the design of the YouCred system. These objectives directly address the needs and obstacles fact-checkers encounter when monitoring YouTube and conducting credibility assessments on YouTube videos. • Automated Search Query Generation: YouCred aims to automate the generation of search queries, which is currently a tedious and manual task for fact-checkers. The system aims to eliminate guesswork by suggesting relevant keywords that could lead to misinformative search results. • Data Visualization and Insights: YouCred empowers fact-checkers with intuitive and informative data visualizations that present crucial engagement patterns (such as likes, views, and comment counts) and publication dates in a clear and accessible manner. This feature enables fact-checkers to efficiently prioritize videos for analysis. • Multi-Term Tracking: Fact-checkers expressed a desire to track multiple search terms and channels simultaneously. By providing this functionality, YouCred aims to facilitate efficient monitoring and analysis of content relevant to their topics of interest. • Adaptability to Different Languages and Contexts: YouCred addresses the fact- checkers desire for multilingual content discovery and annotation capabilities by supporting search in various languages and in specific regions. • Agency and Human Involvement: Fact-checkers value their agency in the fact- checking process and consider human involvement crucial for verification and contextual understanding. YouCred aims to strike a balance between automation 183 CHAPTER 7. 
DEFENDING AGAINST ONLINE MISINFORMATION VIA SYSTEM DESIGN and human expertise, allowing fact-checkers to have control over verification capabilities while leveraging the benefits of automated tools. (a) (b) (c) Figure 7.1: (a) A snapshot of the UI widgets implemented in the Jupyter Notebook to demonstrate the search query generation methods, (b) Figure presenting the initial wireframe of the YouCred view-results page, developed in Figma, (c) Figure displaying an example snapshot of one of the initial workflow diagram created for YouCred 7.2.5 Design process I adopted a collaborative and iterative design approach to create the YouCred system. From June to September 2021, I came up with search query generation methods based on prior literature and the findings of the formative study. These methods were implemented using interactive widgets in a Python Jupyter Notebook (e.g. Figure 7.1a). Iterative improvements were made based on feedback received from the fact-checking community at Pesacheck. The feedback was largely positive, and the fact-checking team expressed interest in supporting the design and evaluation of the system. From October to December 2021, the first author collaborated with two undergraduate students who had extensive experience in UX design and prototyping. Together, they created wireframes (e.g. Figure 7.1b) and workflow diagrams (e.g. Figure 7.1c) using Figma1, a widely used prototyping tool. The development of YouCred commenced in January 2022, with features being added incrementally. Regular meetings were conducted with Pesacheck’s team to showcase the built features and gather feedback. 1https://www.figma.com/ 184 7.3. OVERVIEW OF YOUCRED The team graciously shared their time, expertise, and ideas, demonstrating a deep and continuous engagement throughout the project. Between May 2021 and June 2023, I conducted approximately 38 meetings with the Pesacheck team. The meetings were attended by 1-21 members from Pesacheck, with the team lead of the fact-checking team being the sole attendee in four of the meetings. The UX designers and developers who were involved in building YouCred also joined every meeting. These meetings were conducted in English over Zoom and had a duration of approximately 1 to 2.5 hours. Additionally, constant communication was maintained through a Slack channel. In the next section, I describe the YouCred system in detail. 7.3 Overview of YouCred YouCred is a fact-checking system specifically developed to support fact-checkers in combating online misinformation on the YouTube platform. This comprehensive system offers a wide range of functionalities that significantly enhance the efficiency and effectiveness of fact-checking activities. At its core, YouCred features a powerful misinformation discovery module that generates search queries that are of interest to fact-checkers and have a high probability of returning misinformative videos on YouTube. By eliminating the need for manual query formation, this automated search capability saves fact-checkers time and effort. Furthermore, YouCred incorporates a visualization component that provides detailed insights into date of publication of videos, as well as the engagement received by search results, including likes, dislikes, comments, and view counts. This visual representation empowers fact-checkers with valuable information to prioritize their investigation efforts more effectively. 
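For context, engagement metrics of this kind can be retrieved through the official YouTube Data API v3; the sketch below (the API key and video IDs are placeholders, and it illustrates the underlying API call rather than YouCred's actual implementation) fetches the view, like, and comment counts and publication dates that such a visualization would plot.

# A minimal sketch (API key and video IDs are placeholders) of retrieving the engagement
# metrics that YouCred-style visualizations plot, via the YouTube Data API v3
# (pip install google-api-python-client).
from googleapiclient.discovery import build

def fetch_engagement(api_key: str, video_ids: list) -> list:
    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.videos().list(
        part="snippet,statistics",
        id=",".join(video_ids[:50]),           # the API accepts up to 50 IDs per call
    ).execute()
    return [{
        "title": item["snippet"]["title"],
        "published": item["snippet"]["publishedAt"],
        "views": int(item["statistics"].get("viewCount", 0)),
        "likes": int(item["statistics"].get("likeCount", 0)),
        "comments": int(item["statistics"].get("commentCount", 0)),
    } for item in response.get("items", [])]

# fetch_engagement("YOUR_API_KEY", ["dQw4w9WgXcQ"])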
Addi- tionally, YouCred offers a user-friendly credibility assessment interface, facilitating the annotation and analysis of YouTube videos. Fact-checkers can easily highlight misinformative claims, add comments, and provide additional information to delve deeper into the veracity of the presented information. This streamlined interface accel- erates the fact-checking process and empowers fact-checkers to efficiently evaluate the credibility of videos. In the upcoming sections, I provide a detailed description of the design of the YouCred system and its functionalities, specifically focusing on how it facilitates misinformation discovery (Section 7.4) and credibility assessments (Section 7.5). 185 CHAPTER 7. DEFENDING AGAINST ONLINE MISINFORMATION VIA SYSTEM DESIGN 7.4 Misinformation discovery The primary objective of the YouCred system is to help fact-checkers discover misinfor- mation on the YouTube platform. The system accomplishes this objective by generating search queries related to important events and topics that need monitoring and are of interest to fact-checkers. Search queries are generated via four methods that include leveraging YouTube video tags, Google Trends search queries, YouTube’s autocom- plete suggestions, and analyzing frequently occurring words within the transcripts of misinformative videos. However, for each of these methods to work, fact-checkers are required to provide a small, curated list of seed misinformative (or potentially misinformative) videos pertaining to a particular topic as input. While only a mini- mum of one seed video is required for a topic, I recommended fact-checkers upload at least 3 videos for the topic to get better results based on my testing. I explain the ways fact-checkers can provide input to the system in Section 7.4.1 and expand on the search query generation methods in 7.4.2. 7.4.1 Inputting seed videos to YouCred The fact-checkers have two options to input seed videos into the YouCred system: uploading a CSV file for each topic separately or utilizing the ’YouTube-CSV-Helper’ Chrome extension. Initially, when the system was deployed in September, only the CSV method was available for inputting seed videos. However, after a month of testing and usage, fact-checkers provided feedback that creating input CSVs for each topic was a tedious task, hindering their frequent use of the system. To address this issue, I conducted two brainstorming sessions with the fact-checking team and developed the ’YouTube-CSV-Helper’ (YCH) Chrome browser extension. This extension simplifies and automates the process of providing seed misinformative videos to the YouCred system. Once the seed videos for a specific topic are uploaded using either method, they are stored in a database and remain accessible for future use. Fact-checkers can conveniently browse through existing topics without the need for repetitive CSV creation and uploading. In situations where multiple fact-checkers upload CSVs or use the extension to submit videos about the same topic, I merge the videos and eliminate duplicates. The YouCred system also allows users to view, edit, delete, and download all uploaded topics and their corresponding seed videos as a CSV file. Below, I provide a detailed description of both the CSV and extension methods of providing input to the YouCred system. 186 7.4. MISINFORMATION DISCOVERY Figure 7.2: Figure illustrating the workflow of YouTube-CSV-Helper extension. 
7.4.1.1 Manually uploading a CSV YouCred offers fact-checkers the capability to upload a maximum of 30 CSVs simultaneously, with each CSV focusing on videos related to a specific topic. Each CSV must include the standardized column headers ‘Video Title’, ‘Video Link’, and ‘Misinformation Category’. The ‘Misinformation Category’ aligns with the veracity labels currently employed by Pesacheck, encompassing categories such as altered, false headline, hoax, missing context, partly false, satire, false, and likely false. I also accommodate a ‘No conclusion’ category for situations where fact-checkers add potentially misinformative videos as seed videos that have not undergone a formal fact-checking process at the organization. To ensure consistent topic naming, fact-checkers are required to manually enter the topic for each CSV. To aid in this process and promote topic name consistency, I provide fact-checkers with a user-friendly drop-down list containing all existing topics. This prevents fact-checkers from inadvertently using different names for the same topic, such as "Kenya elections" and "Kenyan elections". 7.4.1.2 Via CSV helper extension To simplify the process of gathering seed videos for YouCred, I developed the ‘YouTube CSV Helper’ browser extension. This extension offers fact-checkers a convenient way to store the details of potentially misinformative YouTube videos they come across on social media, user tip-lines, etc. When fact-checkers encounter such videos, they can simply click on the extension, triggering a popup that displays the video’s title and URL. In this popup, fact-checkers can enter the misinformation category and the topic to which the video belongs. To ensure consistency, I provide a drop-down list that contains all existing topics. Additionally, fact-checkers can add their name for proper attribution, although this field is optional. The popup also includes an ‘Edit’ button that directs fact-checkers to a page listing all videos along with their associated topics, misinformation categories, and fact-checker names. Figure 7.3: Snapshot of YouCred’s topic database. Each video entry in the list is equipped with a ‘Delete’ button,
They can choose four methods of query generation including query generation using video tags of seed videos, frequently occurring words in YouTube videos, the Google Trends platform and YouTube’s autocomplete suggestions. For each of the methods to work, fact-checkers have to input their YouTube API key in the tool. All four methods provide fact-checkers with complete agency in terms of what they want to search on YouTube. They are free to add/modify/remove search terms from the generated search queries or add their own search queries which can then be monitored 188 7.4. MISINFORMATION DISCOVERY on YouCred. After finalizing search queries, fact-checkers can select optional region and language parameters2. The region parameter returns search results with videos that can be viewed in the specified countries (default value is ’Worldwide’). Language parameter returns search results that are most relevant to the specified language (default is ’All languages’). Both these parameters were requested by the Pesacheck team since they sometimes monitor videos specific to a region and language. Fact- checkers also have the ability to specify the date range to obtain videos that were published in that timeframe on YouTube. They have to enter the number of search results desired for the search query (minimum 1 and maximum 2003) and select one or more search filter(s) by which they want the search results to be sorted (uploaded date, video rating, relevance, video count, or number of views ). I added an additional feature to the search query generation methods, which allows for the exclusion of videos that have been marked as “blocked” by the fact-checkers. During the process of reviewing the search results (as explained in Section 7.4.3), fact- checkers have the capability to block videos that do not contain misinformation. Once this optional feature is selected, the blocked videos will not appear in the YouCred search results. By blocking non-misinformative videos, fact-checkers would have the option to focus their attention and resources on reviewing videos that are more likely to contain misinformation. This streamlines the fact-checking process and enables fact-checkers to allocate their time and efforts more effectively. Fact-checkers informed me that YouTube’s top search results predominantly feature videos from mainstream channels. They compiled a list of 46 credible mainstream news channels that have a lower likelihood of containing misinformation. Some examples include The Ethiopian Reporter, Nation Africa, KTN News, New York Times, and Politifact. The team requested to exclude videos from these channels when displaying the search results. Although not visible on the system, I internally remove videos from these channels by adding a minus operator [277] before each channel name in the generated search query. This modified query is then used to query YouTube and retrieve the search results. It’s important to note that the capability to exclude blocked videos and remove videos from selected channels is available across all four query generation methods. I will now provide a detailed description of the four methods. 2https://developers.google.com/youtube/v3/docs/search/list 3The maximum number of search results was selected after consulting the Pesacheck’s fact-checking team. 189 CHAPTER 7. DEFENDING AGAINST ONLINE MISINFORMATION VIA SYSTEM DESIGN Figure 7.4: Figure illustrates YouCred’s query generation method page, which utilizes the YouTube video tags method. 
The page displays a collection of video tags that can be sorted either by frequency or alphabetically (A). Each tag is accompanied by its frequency of occurrence. When a tag is selected, its corresponding bubble changes color to blue (B). Fact-checkers can choose multiple tags, and as they make their selections, the chosen tags are appended with the topic to form the search query. Importantly, the search query is editable, allowing fact-checkers the agency to modify it as needed (C). 7.4.2.1 YouTube video tags This method leverages YouTube video tags found in the seed videos uploaded by fact- checkers to YouCred. Video tags are chosen by the channel owners and are typically not visible to viewers. However, they can be accessed through the YouTube API4. These tags can be viewed as search words that content creators use to enhance the discoverability of their videos. They serve as labels that indicate the associated topics, themes, or relevant keywords of the video, helping users find the video more effectively on YouTube’s search engine5. The tags extracted from the seed videos can also be employed as search queries to uncover more misinformative videos on YouTube. Previous research has demonstrated that video tags are highly informative input features for detecting misinformative videos [303]. Moreover, the practice of using tags as search queries, referred to as "misinfo-queries," has been utilized in the literature on misinformation audits, where tags were employed to retrieve misinformative results [222]. In YouCred, I empower fact-checkers to sort tags alphabetically or by frequency. As fact-checkers select the tags, I append them to the search topic using search query operators. In case the seed videos do not have any tags associated with them, fact- checkers would have the option to proceed to the next query generation method that 4https://developers.google.com/youtube/v3/docs/videos 5https://support.google.com/youtube/answer/146402?hl=en 190 7.4. MISINFORMATION DISCOVERY they selected or return to YouCred’s home page. Figure 7.4 shows the interface of YouCred query generation page via tag method. The seed videos have used tags such as ‘enemy of the new world order’ and ‘red pill revolution’, ‘population control’. These selected tags are combined with the topic name ‘ebola’ to form a search query which can be used to find more misinformative videos related to Ebola on YouTube. (a) (b) Figure 7.5: Figure depicts YouCred’s query generation page utilizing the Google Trends (GT) method. Fact-checkers begin by selecting keywords that serve as seed words for extracting GT topics (A). They also have the flexibility to add custom keywords (B). Next, fact-checkers choose the GT topics of interest (C), select the countries and languages (D) they want to focus on, and specify the desired date range (E). The system then extracts the GT search queries (F), which fact-checkers can review and select from. The search query generated is editable, allowing fact-checkers to modify it as needed (G). 7.4.2.2 Google Trends Fact-checkers actively monitor online platforms by tracking current topics and real- time search trends of people [317]. They keep a vigilant eye on popular claims and assertions related to topics of interest [317]. To assist fact-checkers to monitor popular themes about a topic of interest, I leverage Google Trends (GT) platform. GT’s search queries are a good indicator for understanding how people search for a topic on Google-owned platforms including YouTube. 
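For instance, related YouTube search queries for a topic can be pulled from Google Trends programmatically; the sketch below uses the unofficial pytrends package (an assumption, since the thesis does not name the library YouCred relies on), with the seed topic, geography, and timeframe as placeholders.

# A minimal sketch of pulling related YouTube search queries for a topic from Google Trends,
# here via the unofficial pytrends package (pip install pytrends); seed topic, geography,
# and timeframe are placeholders.
from pytrends.request import TrendReq

def youtube_related_queries(seed: str, geo: str = "KE", timeframe: str = "today 3-m"):
    """Return Google Trends' top and rising YouTube search queries for a seed topic."""
    trends = TrendReq(hl="en-US", tz=0)
    trends.build_payload([seed], timeframe=timeframe, geo=geo, gprop="youtube")
    related = trends.related_queries()[seed]       # {'top': DataFrame or None, 'rising': DataFrame or None}
    top = related["top"]["query"].tolist() if related["top"] is not None else []
    rising = related["rising"]["query"].tolist() if related["rising"] is not None else []
    return top, rising

# top_queries, rising_queries = youtube_related_queries("Ebola virus")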
As a result, researchers have extensively used search queries obtained from GT to monitor misinformation and disinformation on online platforms [92, 205, 222, 223, 319, 338]. YouCred also utilizes GT’s search queries to help fact-checkers gain a better understanding of the public’s interest and the prominence of certain claims on the YouTube platform. YouCred extracts and displays both the most popular and least popular search queries related to a topic in a specified region and time period on the YouTube platform. The system showcases the most popular search queries because they represent the terms commonly used by users. Additionally, the system presents the least popular terms, as these could potentially be exploited by conspiracy theorists to spread false information, a phenomenon known as data voids [167]. Figure 7.5 shows how the process of obtaining search queries from GT is automated. When a user manually enters a seed word in the search bar of GT, the platform presents a dropdown list of existing GT topics containing that word. For example, entering the seed word ‘Ebola’ results in GT suggesting GT topics such as ‘Ebola virus’, ‘Ebola disease’, ‘West African Ebola virus epidemic’, etc. To automate this step, I prompt fact-checkers to select a few relevant seed words. To assist in the selection process, I provide a list of the most and least frequently occurring unigrams and bigrams found in the titles and descriptions of seed videos related to the topic. Once the seed words are selected, I curate all the GT topics suggested by the platform. Fact-checkers can then select the relevant GT topics from the list. Then, I ask fact-checkers to select the date range, country, and language of search trends. I utilize these parameters along with the GT topic to extract and present the search queries about the topic. All search queries selected by fact-checkers are appended with the OR (+) operator (a search query of the form search-term-1 + search-term-2 returns videos containing any of the search terms) to ensure that the search results obtained encompass the chosen queries. 7.4.2.3 YouTube transcript YouCred utilizes the keywords occurring in the misinformative seed videos as potential search queries. A misinformative YouTube video can contain multiple false claims about a topic, and the keywords associated with those claims can be used as search queries for finding other misinformative videos related to the topic. To facilitate this, YouCred extracts the transcripts of all the seed videos using the open-source Python package youtube-transcript-api. The transcripts are subjected to the standard text preprocessing steps of stop word removal and lemmatization. Then, I use TfIdf to extract unigram and bigram features. TfIdf assigns higher weights to terms that are frequent within a specific transcript but relatively rare across the entire collection. This helps in distinguishing significant terms that are indicative of the content and themes of the seed videos. The usage of TfIdf features in the analysis of misinformative YouTube videos has been widely adopted and recognized in the literature [140, 199, 222, 354]. Fact-checkers can choose from the top x% of the features (the default value of x is 5% but can be modified). To form the search query, all selected features are joined using the OR operator. Fact-checkers have complete agency to add, remove, or edit any term of the generated query.
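A minimal sketch of this transcript-to-query step is shown below; it assumes scikit-learn's TfidfVectorizer for the TfIdf computation (the thesis does not specify the implementation), limits preprocessing to built-in lowercasing and stop-word removal, and uses '|' as an assumed OR symbol when joining terms.

# A minimal sketch of the transcript-to-query step: TF-IDF over seed-video transcripts with
# unigram and bigram features, keeping the top-weighted terms and joining them into a query.
from sklearn.feature_extraction.text import TfidfVectorizer

def transcripts_to_query(transcripts, topic, top_fraction=0.05):
    """Build a search query from the top TF-IDF terms of seed-video transcripts."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
    tfidf = vectorizer.fit_transform(transcripts)
    weights = tfidf.max(axis=0).toarray().ravel()           # peak weight of each term across videos
    terms = vectorizer.get_feature_names_out()
    k = max(1, int(len(terms) * top_fraction))               # default: top 5% of features
    top_terms = [terms[i] for i in weights.argsort()[::-1][:k]]
    return topic + " " + "|".join(top_terms)                  # '|' assumed here as the OR operator

# transcripts_to_query(["ebola was engineered to ...", "vaccines and depopulation ..."], "ebola")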
7.4.2.4 YouTube autocomplete Autocomplete suggestion is a widely used functionality found in web search engines, designed to aid users in constructing their search queries. It is estimated that about 75% of search queries on Google are influenced by autocomplete suggestions [32]. Prior studies have shown that auto-complete suggestions could lead users to misinformation online [200, 205, 223]. Additionally, previous work has revealed that autocomplete suggestions in languages of the Global South, such as Amharic, Kiswahili, and Somali, could especially expose users to harmful content [100]. Given their impact, misinforma- tion appearing in search results for these terms could impact a large number of people and thus, are of interest to fact-checkers. Therefore, YouCred enables fact-checkers to track autocomplete search suggestions for topics of interest. In order to facilitate the selection of seed words for obtaining search query sug- gestions, I provide fact-checkers with the top and bottom x% of frequently appearing unigrams and bigrams. These are curated from the titles and descriptions of seed videos related to the topic at hand. By default, the value of x is set at 5, but fact-checkers have the freedom to modify this value as they see fit. Additionally, fact-checkers can include their own custom seed words for generating queries. YouCred extracts all search queries for the selected and/or entered keywords and prompts fact-checkers to select the ones that they want to monitor. The selected search queries are joined using OR search operator. 7.4.3 Viewing and filtering search results Once search queries are generated using one or more query generation methods, fact- checkers can access and evaluate the corresponding search results on the view-results page. This page serves as a powerful tool, equipping fact-checkers to monitor, analyze, and track search results in a user-friendly and efficient manner. It provides a centralized hub where fact-checkers can conveniently view and manage multiple search queries 193 CHAPTER 7. DEFENDING AGAINST ONLINE MISINFORMATION VIA SYSTEM DESIGN Figure 7.6: Snapshot of YouCred’s view-results page consisting of multiple columns, with each column representing the search results of a specific query. The column header provides essential information such as the search query generation method (A), the search query itself (B), the applied sorting filter, and the count of search results (C). The page offers functionalities like downloading the search results as a CSV file and removing individual columns (D) as needed. Within each column, there is an interactive graph (E) that visualizes the engagement received by the search result videos and their publication dates. The page also includes sections dedicated to individual videos (H) representing each search result. These video sections provide important metadata such as the video title, channel name, upload date, views, likes, comments, and a thumbnail. If fact-checkers identify a potentially misinformative video, they can add it to the annotation database (F) for tracking and later fact-checking. Additionally, fact-checkers can utilize the block video functionality (G) to prevent a video from appearing in future search results. 194 7.4. MISINFORMATION DISCOVERY simultaneously, enabling comprehensive evaluation and timely response to emerging trends and misinformation. The view-results page offers a range of capabilities and features designed to enhance fact-checkers’ experience and effectiveness. 
The view-results page offers a range of capabilities and features designed to enhance fact-checkers' experience and effectiveness. These include the ability to customize search queries, an interactive graph for visualizing engagement metrics of the search results, access to search result metadata, video preview options, and other customization options to tailor the view and analysis of search results. Each of these features is described in more detail below, showcasing the rich functionality and flexibility provided by the YouCred system.

7.4.3.1 Tracking Multiple Search Queries Simultaneously

The view-results page presents search results corresponding to the search queries as dedicated columns, ensuring a structured and intuitive layout. Fact-checkers can easily navigate between different search queries, enabling them to compare and evaluate various sets of search results efficiently. This organization enhances clarity and streamlines the fact-checking process. Figure 7.6 shows a snapshot of the view-results page with two columns, each corresponding to a different search query. The header of each column denotes the query generation method, followed by the search query, the search filter selected to sort the results, and the number of search results. The visually appealing and intuitive interface of YouCred ensures that fact-checkers can quickly scan and digest large volumes of videos without feeling overwhelmed. Fact-checkers can refresh and get the latest search results by simply double-clicking the query and pressing enter.

7.4.3.2 Customization through Editable Queries, Sorting Options, and Addition of New Columns

Fact-checkers using YouCred have the flexibility to customize their search queries directly within the user interface. By double-clicking on a search query, they can easily edit it according to their specific needs. Once the editing is completed, simply pressing enter triggers YouCred to fetch and display search results for the modified query. Additionally, each column in the view-results page features a Sort Type drop-down button, allowing fact-checkers to further sort the fetched results based on criteria such as date, views, likes, or comments. This sorting feature empowers fact-checkers to focus on videos that are gaining engagement or to target recently published content, enhancing their ability to prioritize fact-checking efforts.

YouCred also provides an option to remove a column by clicking on the corresponding button, enabling fact-checkers to declutter the interface and focus on relevant search queries. Moreover, the search results within a column can be downloaded as a CSV file by clicking on the corresponding button, facilitating easy data management and analysis. To expand their monitoring capabilities, YouCred enables fact-checkers to effortlessly add new columns dedicated to tracking additional search queries. By clicking on the plus symbol located on the left side of the page, fact-checkers can create new columns tailored to different search queries, allowing them to simultaneously track multiple topics or keywords in real time.

Figure 7.7: The view-results page in YouCred features an interactive, dynamic, and multifunctional scatter plot graph. This graph showcases the engagement received by the videos in the search results, represented on the y-axis, along with their respective dates of publication on the x-axis. (a) When hovering over a point on the graph, a text box displays detailed information about the video, including its title, engagement metrics such as likes and views, and the date of publication (Figure 7.7a).
(b) Fact- checkers have the ability to select a specific cluster or area of interest within the graph (Figure 7.7b), (c) allowing them to zoom in and enabling a more focused analysis of selected videos (Figure 7.7c). The "View Selected Results" button filters the search results, displaying only the videos within the selected area, facilitating a more targeted evaluation. To revert back to the original graph view, fact-checkers can simply click the "Clear Brush" button, resetting the graph and allowing for further exploration and analysis. 7.4.3.3 Interactive Graphs for Engagement Metrics Each column in the view-results page of YouCred features a dynamic, multifunctional, and interactive scatter plot graph. This graph showcases the engagement metrics on the y-axis and the date of publication of videos on the x-axis, providing valuable insights into the performance and timeline of the videos present in the search results. By default, the view count is displayed as the metric on the y-axis, providing an 196 7.4. MISINFORMATION DISCOVERY Figure 7.8: Snapshot of YouCred’s preview mode. Fact-checkers can click on any video in the view-results page and can view the video in the system itself. initial understanding of the popularity of the videos. However, fact-checkers have the flexibility to modify this metric based on their preferences. They can simply click on the sort-by drop-down button and select either the number of likes or comments as the desired metric. Each point on the graph is a YouTube video present in the search result. Hovering on a point displays the video details including Title, like and view count, as well as date of publication of video. This information provides fact-checkers an overview of the video and its engagement metrics. The graph also aids in visualization and identification of any correlation or clustering of views based on the age of the videos or around a specific date range. To further explore specific clusters or areas of interest, fact-checkers can zoom in on the graph. By clicking and dragging the cursor around the desired region, they can filter out and focus solely on the zoomed-in version of the selected points, allowing for a more detailed analysis. In case any adjustments need to be made or additional points need to be included, users can easily reset the graph to its original state by clicking the "Clear Brush" button. Additionally, a convenient "View Selected Results" button allows users to exclusively view the videos within the selected area, aiding in focused analysis. Overall, the dynamic and interactive scatter plot graph in YouCred’s view-results page provides fact-checkers with a comprehensive and intuitive tool for visualizing engagement metrics and video publication dates. It enhances their ability to iden- tify patterns, correlations, and clusters within the search results, facilitating efficient analysis and evaluation. 197 CHAPTER 7. DEFENDING AGAINST ONLINE MISINFORMATION VIA SYSTEM DESIGN Figure 7.9: Figure shows the snapshot of YouCred’s video annotation database. Fact- checkers add videos to this database while exploring the view-results page or directly from the browser extension. All videos have a corresponding annotate button which takes the fact-checkers to the video’s annotation page. This database contains the video’s title along with other metadata such as views, likes, upload date of video, channel, etc. Columns conclusion is populated once fact-checkers assign a veracity label to the video on the annotation page. 
Added date column denotes the date on which the video was added to the database. The page also provides a variety of search and filter options to find or view selected videos. 7.4.3.4 Video Metadata Sections Below the interactive graph in each column, fact-checkers have access to detailed video sections. These sections present the search result videos along with their corresponding metadata, including the video title, channel name, upload date, views, likes, comments, and a thumbnail image. By clicking on the video thumbnail, fact-checkers can enter preview mode and view the video within the YouCred system. Based on the video’s metadata and preview, if fact-checkers identify the video as potentially misinformative, they have the option to add it to the annotation database (described in Section 7.5.1) for future fact-checking. To add the video to the annotation database, fact-checkers simply click on the button and can optionally add tags to aid in categorization. Additionally, fact-checkers have the ability to block a video by clicking on the button (refer to Section 7.4.2). 198 7.4. MISINFORMATION DISCOVERY Figure 7.10: Figure showing YouCred’s annotation page that streamlines and facilitates the credibility assessment process. The header corresponds to the video’s title (A). The video is embedded towards the left side of the page (B) and the video’s transcript, subtitles, title, and description are shown in the middle in separate tabs (C). Fact- checkers can highlight misinformative claims (D) in any tabs, add corresponding annotations (E) and also assign a veracity label to the video (F). Figure 7.11: Snapshot of YouCred’s claim database that stores entries for all the misin- formative claims highlighted by fact-checkers in the videos that they annotated. The database shows the fact-checker name, the misinformative claim highlighted in the video, the veracity label of the claim, tags associated with the claim, and the date when the video was added to the annotation database. 199 CHAPTER 7. DEFENDING AGAINST ONLINE MISINFORMATION VIA SYSTEM DESIGN 7.5 Credibility Assessments YouCred plays a crucial role in assisting fact-checkers with credibility assessments by providing a robust platform and valuable resources. First, YouCred offers an anno- tation database (Section 7.5.1) that allows fact-checkers to annotate videos, analyze transcripts, and assign veracity labels, enabling them to accurately assess the credi- bility of the information presented. Second, YouCred also curates a claims database (Section 7.5.3) that serves as a comprehensive repository, storing entries for misinfor- mative claims highlighted by fact-checkers in annotated videos. It offers a snapshot of fact-checked claims, including details such as the fact-checkers name, the misinforma- tive claim, the veracity label, associated tags, and the date of video addition. These centralized and structured databases enhance fact-checkers’ ability to track, analyze, and combat misinformation effectively. 7.5.1 Video annotation database Fact-checkers have multiple convenient methods to add videos to the annotation database: either while exploring the view-results page or directly from the browser extension. They have the flexibility to continuously add videos to the database and return to it later to annotate videos of their choice or prioritize important ones. Figure 7.9 shows a snapshot of the annotation database. 
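The metadata shown for each entry (title, channel, upload date, views, likes, and comments) can be retrieved in bulk from the platform; the sketch below shows one plausible way to do so with the YouTube Data API v3. The function name, returned fields, and the empty 'conclusion' placeholder are illustrative assumptions, not YouCred's actual schema.

```python
# A minimal sketch of retrieving the metadata stored with each annotation-database
# entry, using the YouTube Data API v3. Names and fields are illustrative.
from googleapiclient.discovery import build

def fetch_video_metadata(api_key: str, video_ids: list[str]) -> list[dict]:
    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.videos().list(
        part="snippet,statistics",
        id=",".join(video_ids),  # the API accepts up to 50 ids per request
    ).execute()
    records = []
    for item in response.get("items", []):
        stats = item.get("statistics", {})
        records.append({
            "video_id": item["id"],
            "title": item["snippet"]["title"],
            "channel": item["snippet"]["channelTitle"],
            "upload_date": item["snippet"]["publishedAt"],
            "views": int(stats.get("viewCount", 0)),
            "likes": int(stats.get("likeCount", 0)),
            "comments": int(stats.get("commentCount", 0)),
            "conclusion": None,  # filled in once a veracity label is assigned
        })
    return records
```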
Each video entry in the database includes an ‘annotate’ button, which, upon clicking, takes fact-checkers to the specific video’s annotation page. This allows them to easily annotate the transcript in a new tab and provide accurate assessments. Similarly, clicking on the video title link opens the corresponding YouTube video page in a new tab, enabling fact-checkers to access additional context if needed. The database ensures that essential metadata for each video is stored, including the video’s title, number of views, likes, upload date, and channel information. The ‘conclusion’ column remains empty until fact-checkers assign a veracity label to the video on the annotation page, ensuring that conclusions are accurately reflected. The ‘added date’ column specifies the precise date when the video was added to the database, facilitating tracking and chronological organization of entries. To facilitate efficient navigation and retrieval of specific videos, the page offers a range of search and filter options. Users can enter search keywords in the search box below, allowing for the filtering of videos based on desired criteria, such as keywords present in the video title. Furthermore, users can sort the table based on the selected 200 7.5. CREDIBILITY ASSESSMENTS column by clicking on the headers of "Views," "Likes," and "Upload date." Clicking the headers toggles between not sorting, ascending sorting, descending sorting, and back to not sorting, enabling users to quickly arrange the table according to their preference. 7.5.2 Video annotation page When fact-checkers want to annotate a specific video from the database, they can click on the ’annotate’ button corresponding to that video, which directs them to the annotation page specifically designed for the selected video. Figure 7.10 provides a snapshot of the video annotation page. The title of the page corresponds to the title of the YouTube video. The page consists of four main components. The first component is an embedded YouTube video located on the left side of the page. The second component is the video profile area, which contains different tabs for the video’s textual metadata, including the transcript, subtitle, title, and description. These components were selected by the fact-checkers as they play a crucial role in analyzing the veracity of the video. For example, Pesacheck assigns a veracity label of ’false headline’ when the video title is misleading and does not accurately represent the actual content. These metadata components are computationally extracted using YouTube’s API. If any of these components are missing, I provide an empty text box where fact-checkers can manually add and save the text. In the transcript tab, I display the text along with corresponding timestamps that are hyperlinked to specific time instances in the video. The third component on the page is the annotations. Fact-checking is a multi-step process involving identifying misinformation claims, finding their sources, investi- gating their veracity, and writing fact-check reports. Throughout this process, fact- checkers curate information, consult experts, and make notes for themselves. Typically, fact-checkers use spreadsheets or text documents for this purpose, leaving comments containing information about sources, to-do tasks, and more. To streamline this pro- cess, I have incorporated annotations within the YouCred system. 
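To make the structure of these annotations concrete, the sketch below models a single claim annotation with the fields discussed in this section (the highlighted span, an optional transcript timestamp, comments, sources, a veracity conclusion, and tags). This is a hypothetical data model for exposition only, not YouCred's actual schema.

```python
# A hypothetical model of a single claim annotation, based on the fields described
# in this section. Class and field names are illustrative, not YouCred's schema.
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class ClaimAnnotation:
    video_id: str
    fact_checker: str
    highlighted_text: str               # selected span of text containing the claim
    source_tab: str                     # 'transcript', 'subtitle', 'title', or 'description'
    timestamp_s: Optional[int] = None   # set when the span comes from the transcript
    comments: str = ""                  # working notes, to-do items, expert consultations
    sources: list[str] = field(default_factory=list)
    conclusion: Optional[str] = None    # veracity label assigned to this claim
    tags: list[str] = field(default_factory=list)
    added_on: date = field(default_factory=date.today)

    def video_link(self) -> str:
        """Deep-link to the claim's position in the video when a timestamp is known."""
        base = f"https://www.youtube.com/watch?v={self.video_id}"
        return f"{base}&t={self.timestamp_s}s" if self.timestamp_s is not None else base
```

The timestamped deep-link mirrors how transcript annotations can point back to the exact moment in the video.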
The annotation page offers a comprehensive solution for credibility assessments, eliminating the need for fact-checkers to switch between different documents and web pages. To begin the annotation process, a fact-checker can select the portion of text that contains a misinformation claim from any of the tabs. Once a text is selected, an annotation button appears, allowing fact-checkers to add annotations. The annotation appears as a text form, including a timestamp (in case the highlighted text is part of the transcript) corresponding to the selected text. It also provides fields where fact-checkers can add comments, sources, conclusions indicating the veracity label of the claim, and tags. Tags allow fact-checkers to assign specific labels or keywords to the claims, which assists with categorization, organization, and searchability. Once saved, the annotation appears on the right side of the page, and the corresponding text is highlighted in yellow. All fields in the text form are editable. Fact-checkers can easily locate their annotations by clicking on the highlighted text, which then becomes green. Similarly, clicking on the annotation block leads the fact-checker to the corresponding text. The fourth component of the page is the overall veracity label that fact-checkers can assign to the video. Note that each claim and the video as a whole can have different veracity labels.

7.5.3 Claims database

YouCred's claim database (Figure 7.11) serves as a comprehensive repository, capturing a snapshot of all the misinformative claims identified by fact-checkers within annotated videos. Each entry in the database contains essential information, including the name of the fact-checker who annotated the video, the specific misinformative claim identified, the veracity label assigned to the claim, relevant tags associated with the claim, and the date when the video was added to the annotation database. The database allows for the identification of recurring patterns or trends in misinformation. By analyzing the stored claims and associated tags, researchers can gain insights into common themes or topics prone to misinformation. This information can be used to develop targeted educational campaigns or policies to address specific areas of concern. Fact-checkers can refer to the database to determine if a similar claim has been fact-checked before. This feature helps to avoid duplicating efforts and allows fact-checkers to efficiently utilize their resources.

7.6 Evaluate stakeholders' acceptance

In early September 2022, I successfully completed all major features of YouCred and deployed the system, making it available to Pesacheck. To ensure effective adoption and utilization, I conducted two organization-wide training sessions for the fact-checking community at Pesacheck. The first training session took place on November 7, 2022, followed by the second session on March 13, 2023. Additionally, I created a demo video and provided comprehensive documentation of the system to support the organization
in using YouCred effectively.

Figure 7.12: Figure (a) illustrates the number of seed videos added to YouCred through manual CSV uploads and the use of the 'YouTube-CSV-Helper' extension. Figure (b) presents the usage frequency of YouCred for generating search queries using the four proposed methods throughout the 9-month deployment period. Figure (c) provides an overview of the topics monitored using YouCred and the corresponding proportions of query generation methods utilized for each topic. Figure (d) illustrates the number of potentially misinformative videos added by fact-checkers to YouCred's annotation database.

Over a 9-month period, from September 2022 to May 2023, I closely monitored the usage of YouCred to assess its acceptance and impact within the organization. I employed two evaluation approaches for this assessment: tracking and analyzing the usage patterns of YouCred throughout its deployment and conducting semi-structured interviews with the fact-checking community to understand their overall perception and assessment of the tool's usefulness. In the following sections, I delve into the details and findings of these two evaluation approaches.

7.6.1 Patterns of Usage over Time

YouCred's primary goals are to assist the organization in monitoring YouTube for misinformation discovery and credibility assessments. Therefore, I tracked the system's usage for search query generation and video annotation. It is important to note that following the deployment of YouCred, I observed a limited frequency of usage by the Pesacheck organization in the initial month. During a meeting with Pesacheck, I identified the manual creation and uploading of CSVs containing seed videos as a barrier to frequent usage. To address this issue, I developed and launched a helper extension in late October 2022. The usage of manual CSV uploads and the extension for adding new seed videos to the system is illustrated in Figure 7.12a (note that I began recording the usage of the extension and CSV uploads in November, coinciding with the organization's adoption of the extension). As expected, the usage of the extension for providing seed videos was significantly higher compared to manual CSV uploads.

Throughout the 9-month deployment period, I tracked the frequency of search query generation using YouCred's four query generation methods, as depicted in Figure 7.12b. The usage frequency notably increased in November, aligning with the release of my extension. Among the query generation methods, video tags emerged as the most popular, followed by Google Trends and the transcript method. It is worth noting that Pesacheck's office was closed from December 15th to January 10th, and the system experienced downtime in January due to server issues. To address these challenges, I migrated YouCred from a local server to Microsoft Azure cloud services in mid-February, ensuring improved stability and accessibility. Both the office closure and server downtime impacted the usage of YouCred during that period.
Figure 7.12c showcases the various topics monitored on YouCred, along with the proportion of query generation methods used for each topic. I observed that the topic of Ebola received the highest level of monitoring, followed by the Kenyan elections in August 2022 and the FIFA World Cup. More recently, the system has been used to monitor discussions related to the Personal Data Protection Act (PDPA) (https://www.dataguidance.com/notes/south-africa-data-protection-overview; https://www.dataguidance.com/jurisdiction/africa) and Countering Violent Extremism (CVE) policies (https://www.usaid.gov/policy/countering-violent-extremism). Figure 7.12d demonstrates the number of videos added to YouCred's annotation database for credibility assessments. Fact-checkers have added 45 videos about the various topics shown in Figure 7.12c for credibility assessments. Overall, the quantitative analysis indicates consistent usage of YouCred during the 9-month deployment, despite the temporary downtime and the few bugs that I addressed after the initial deployment.

7.6.2 Semi-structured interviews

To gain a comprehensive understanding of YouCred's adoption and usage, I complemented the quantitative evaluation with semi-structured interviews with the fact-checking team. In this section, I provide a brief overview of the interview protocol and summarize the key findings derived from these interviews.

7.6.2.1 Participants and Interview Procedure

I utilized a purposive sampling technique [255] to select participants for the semi-structured interviews. I reached out to both regular users of YouCred, identified from the video annotation database and extension log, as well as individuals who actively participated in the design process. The sample consisted of six stakeholders from Pesacheck (referred to as I1-I6), including fact-checkers and advocates, all of whom had utilized YouCred during its deployment phase. Notably, four participants were regular attendees in the design process meetings. The semi-structured interviews were conducted remotely over Zoom by the first author, with the consent of the participants. All six interviews lasted approximately 60 minutes and took place between June and July 2023. Throughout the interviews, the first author made detailed notes based on observations.

7.6.2.2 Interview Protocol and Data Analysis

I began the interviews by asking fact-checkers to provide a brief explanation of their previous methods for monitoring the YouTube platform before the deployment of YouCred. This allowed the fact-checkers to elaborate on the limitations they faced with their existing methods and set the stage for discussing how YouCred addressed those limitations. I then focused on how the participants integrated YouCred into their fact-checking workflow and explored specific scenarios and topics where they had utilized the tool. To gain deeper insights into their usage, I selected one of the topics discussed and requested the participants to share their screen and guide me through their process of searching for that topic on YouTube. Subsequently, I asked the participants to demonstrate how they monitored the same topic using YouCred.
I encouraged them to walk me through their step-by-step process and highlight the differences between directly searching on YouTube and utilizing YouCred. Throughout this walk-through, I inquired about the benefits and usefulness of each functionality provided by YouCred. I also prompted the participants to evaluate and discuss the advantages and usefulness of each query generation method available in YouCred. In addition, I conducted an exercise where I displayed the YouTube search results interface and the YouCred view-results page side by side on the screen, enabling direct comparisons. I specifically asked the participants to share their insights on how the two interfaces differed and how these differences influenced their fact-checking process. Throughout the interview, I delved into YouCred’s effectiveness and discussed how the system aligns with their specific needs. I also actively encouraged the interviewees to provide suggestions for improving YouCred and discuss any limitations they may have encountered during their usage. The first author transcribed all the interviews and notes and analyzed them us- ing an iterative qualitative thematic approach. This method allowed me to uncover significant insights and patterns emerging from the participants’ responses. 7.6.2.3 Findings By analyzing interview transcripts and notes, I gained valuable insights into the utility of YouCred. In this section, I delve into the key findings from the interviews and also discuss potential areas for improvement as suggested by the participants. Fact-checkers have integrated YouCred into their day-to-day workflow. Fact-checkers revealed that YouCred “comes in handy because of the lack of available free tools available to look for misinformation online, especially on Youtube” (I3). They revealed various ways in which they have integrated YouCred into their workflow. For example, I3 revealed that most of the time “viral misinformation will go viral on all social media sites”, so when- ever they find a misinformative video, “even if it’s on Facebook” they “try to feed it into the YouCred system”. I5 revealed that they have also used YouCred to monitor some persistently circulating misinformation around health-related topics like COVID and Ebola. YouCred provides rich and meaningful query suggestions. YouCred’s query gener- ation methods proved to be a valuable resource for fact-checkers providing “a pool 206 7.6. EVALUATE STAKEHOLDERS’ ACCEPTANCE of information that would otherwise not be available if [they] were generating queries man- ually” (I4). Fact-checkers also appreciated the flexibility to customize their searches on YouCred. As one participant expressed, “I like YouCred suggesting keywords. So yeah, and also being able to edit those keywords and also adding your own, because that refines your search a lot and leads to a higher possibility of getting what you want with the YouCred tool but with the YouTube keyword search, it’s a hit or miss” (I3). Fact-checkers also highlighted the effectiveness of “YouCred’s integration with Google Trends” which consistently delivered “high-quality search results for them” (I3). In ad- dition, the use of video tags within YouCred unveiled additional insights into the strategies employed by content creators to boost the popularity of misinformative videos. For instance, I4 observed that “content creators use the names of celebrities as tags on their YouTube video so that they can trend”. YouCred facilitates modular assessment of videos. 
Participants revealed that they mostly write short-form fact-check reports which debunk an individual claim. However, if the video contains multiple false claims, they will either write multiple short-form reports, one for each misleading claim, or one long-form article debunking all claims. In both scenarios, fact-checkers have found YouCred's annotation interface to be incredibly valuable, enabling them to break down the video into distinct claims and conduct focused investigations.

"When we look at multiple items within the video that might be false, YouCred allows us to break down the video into multiple claims and investigate each section" - I1

Fact-checkers use YouCred's claim database as an informative resource. Fact-checkers consider YouCred's claim database a powerful resource that goes beyond simply cataloging misinformative claims. For example, I4 suggested that the tool helps them avoid duplication of effort since "you can realize that you know, someone else has had really added the video [in the claim's database]..and is working on it". In addition, they find the addition of channel details on the page useful as "it will let you know the frequent spreaders of misinformation...So you can actually go to their channel and see, if have they shared any more misinformation". Additionally, another participant expressed that they are now able to explore a collection of videos related to a specific topic and claim, enhancing their understanding of the claim's prevalence and context.

"[Tags] on claim database provides an option to see a cluster of videos within one topic, so I believe, probably you will see this video contains this claim And you'd probably find another video that contains the same claim so on the claims database you're able to quickly see which the number of videos that are actually speaking about one specific claim, so I think it's a fantastic feature" - I2

YouCred enhances the efficiency of the fact-checking workflow. Participants revealed that using the "YouCred tool, [they] could spend less time getting misinformation as opposed to when they were searching on YouTube directly" (I5). They also talked about several features that have played a vital role in improving the efficiency of their fact-checking workflow. According to I2, the preview mode on the view-results page is a valuable tool that allows fact-checkers to "quickly preview [the video] on the same page and decide if they even want to annotate a video in the first place without going to Youtube". I3 highlights the usefulness of the ability to track multiple search queries on the view-results page by stating, "they now have everything on one page, and we're able to monitor everything at the same time...instead of doing multiple searches in several tabs..it saves you more time". Additionally, the interactive graphs in YouCred have also garnered positive feedback from participants. As I3 expressed, "The interactive graphs are very helpful. You're able to only focus on what you need. You can just select a particular period, and also even the number of views".

Possible future improvements and enhancements. During the discussions, participants highlighted the need for continuous improvement in systems like YouCred. They made valuable suggestions to enhance the system and meet the evolving needs of fact-checkers. One common complaint was the limited YouTube API quota, which at times hindered the seamless use of the system.
Participants mentioned that they desire the ability to track multiple topics simultaneously, as well as the convenience of viewing all suggested queries directly in the drop-down menu on the view-results page, eliminating the need to navigate back to the methods page to select new keywords. Some participants requested us to further refine search results using AI to only show videos that could contain problematic content. Looking towards the future, one fact- checker envisioned the claims database to be integrated with the YouTube platform itself. I4 proposed that YouTube could extract videos and their associated misleading claims, along with timestamps, and display a disclaimer indicating the presence of misinformation. The aim would be to instill hesitation in viewers when considering sharing such videos. I4 considers this intervention as a potential strategy to promote responsible information sharing and combat the spread of misinformation. 208 7.7. DISCUSSION 7.7 Discussion In this study, I present YouCred, a fact-checking system designed and built to assist fact-checkers in monitoring the YouTube platform. The formative study highlighted the manual nature of the platform monitoring process and the lack of tools to aid credibility assessments of YouTube videos, leading me to develop YouCred as a solution. YouCred serves as a successful case study, demonstrating the importance of continued dialogue and collaboration with fact-checking organizations in designing systems that have a tangible impact on real-world fact-checking practices. In this section, I reflect on the design implications of my work and discuss, how we need to think about the maintenance of socio-technical systems beyond the initial deployment period. 7.7.1 Design Implications 7.7.1.1 Bridging design-reality gap in design of fact-checking systems The design of fact-checking systems often falls short of making a significant impact on real-world fact-checking practices due to a lack of incorporation of insights and needs from the various stakeholder groups involved in the process [317]. Scholars have highlighted this gap, emphasizing the importance of involving key stakeholders to ensure the relevance and effectiveness of such systems [317]. In my work, I aimed to address this issue by adopting participatory design methods that actively engaged key stakeholders from fact-checking organizations throughout the design process. This approach allowed me to foster meaningful dialogue, gain valuable insights, and better understand the perspectives and concerns of fact-checkers. By involving fact-checkers in the design of YouCred, I was able to create a sys- tem that truly catered to their needs. The participatory design process facilitated collaborative decision-making, where fact-checkers’ expertise and experiences directly influenced the system’s functionalities and features. Through continual engagement, I built a deep understanding of the challenges they faced, such as the manual nature of monitoring the YouTube platform and the lack of tools for credibility assessments of videos. As a result, YouCred was purposefully designed to address these specific challenges and provide fact-checkers with effective tools for their daily work. This approach emphasizes the importance of collaboration and knowledge exchange be- tween researchers and practitioners, ensuring that the resulting system is not only technically robust but also practical and impactful in supporting fact-checking efforts. 
Moving forward, such participatory approaches can serve as a valuable framework 209 CHAPTER 7. DEFENDING AGAINST ONLINE MISINFORMATION VIA SYSTEM DESIGN for designing socio-technical systems that address the needs of various stakeholder groups involved in complex real-world domains. 7.7.1.2 Imbibing values desired by fact-checkers in design of fact-checking systems Fact-checking is a socio-technical process that goes beyond the mere application of technology [317]. To ensure the successful adoption and effectiveness of fact-checking systems, it is crucial to incorporate the values and preferences of fact-checkers into the design [317]. The formative interviews with fact-checkers revealed their desire for agency and human involvement in the design of these systems. This value signifies the importance of integrating human expertise and judgment into the decision-making processes of fact-checking systems. Building upon these insights, I developed YouCred to provide fact-checkers with a sense of control and agency over their work. One way this was achieved was by empowering fact-checkers with the ability to customize their search queries on YouTube. The system allows fact-checkers to add, modify, or remove terms from the generated search queries, or even create their own queries to monitor relevant content. By granting this level of flexibility, fact-checkers can tailor their searches to align with their specific needs and interests, enhancing the effectiveness and efficiency of their fact-checking efforts. Imbibing the value ensures that fact-checkers are not merely passive users of the system but active participants in shaping its functionality and outcomes. 7.7.2 Maintainability of socio-technical systems Deploying a fact-checking system in the real world is a complex undertaking that goes beyond the initial development and deployment phase. It requires ongoing maintenance and continuous improvement to ensure its effectiveness and relevance in the ever-evolving technological and information landscape. As I deployed YouCred, I encountered various challenges, including scalability, concurrency, and the need for real-time bug handling to ensure uninterrupted system usage. I also realized that the needs of fact-checkers are not static but evolve over time, emphasizing the need for constant adaptation and evolution of the fact-checking system. Simply building and deploying a system is insufficient; active maintenance and enhancement are essential to meet the evolving requirements of fact-checkers. This underscores the significance of investing resources and efforts into the ongoing maintenance of deployed systems. 210 7.8. LIMITATIONS AND OPPORTUNITIES In recent literature, scholars have raised concerns about sustaining maintainability efforts after the initial project funding ends [240]. Maintenance and repair activities have been recognized as overlooked yet essential aspects of socio-technical initiatives, encompassing creativity, innovation, knowledge, power dynamics, and ethics of care [211]. Efforts have been made to study the challenges of sustaining and maintaining the outcomes of Human-Computer Interaction (HCI) design and systems projects beyond their runtime and beyond the researchers’ role within the project context [130, 211, 212, 240]. For instance, Meurer et al. suggest the importance of nurturing a sense of ownership among research participants from the project’s outset and develop- ing their technological capabilities [377]. 
However, the feasibility of such approaches varies greatly depending on contextual factors, including the technical and human resources available to different researchers and stakeholders for maintaining the designed technological artifacts. Hence, scholars argue that active support from funding bodies is crucial to encourage and enable researchers by allocating time and resources specifically for maintenance activities, ensuring the continuity of these efforts beyond the project's duration [240].

Maintaining a real-world deployed system also requires a collaborative effort between researchers, developers, and other stakeholder groups. While there is an active research area in software engineering focusing on developing strategies, frameworks, and tools that facilitate the maintainability of software systems [253, 349, 403], we also need to explore the collaborative efforts involved in the maintenance and evolution of socio-technical systems. Additionally, a significant investment of time and effort is dedicated to cultivating and sustaining social relationships with stakeholder groups throughout the development and maintenance of such collaborative socio-technical systems projects. Recognizing the importance of these relationships, even beyond the initial stages, is crucial. However, these efforts often fall outside the scope of traditional design or research activities [240]. Encouraging and supporting such endeavors would not only ensure the long-term sustainability of collaborative initiatives but also enhance their overall impact.

7.8 Limitations and Opportunities

While YouCred has demonstrated significant potential in assisting fact-checkers in combating online misinformation on YouTube, there are limitations to address and numerous opportunities for further development and expansion. I discuss a few below.

• Limited YouTube API Quota: The current version of YouCred relies on the YouTube API to extract search results and video metadata. However, the API has a restricted quota, limited to 10,000 quota units per day. Fact-checkers often quickly max out their individual API quota. To address this problem, I filled out an official form requesting increased quota limits but did not receive any response. However, I am hopeful about the opportunities provided by the YouTube Researcher Program, which offers expanded access to the YouTube API. I am actively applying to join this program to address the quota limitations.

• Dependency on YouTube API for Transcripts: YouCred currently depends on the YouTube API to extract video transcripts, which are displayed on the video annotation page. However, this approach limits the availability of transcripts, as not all videos have them. To overcome this limitation, I plan to explore video-to-text conversion tools in the future. By utilizing these tools, I can obtain transcripts for videos that do not have them available, further enhancing the annotation capabilities of YouCred.

• Single-Topic Monitoring: The current version of YouCred allows fact-checkers to monitor YouTube for one topic at a time. This limitation hinders their ability to track multiple topics simultaneously. To address this limitation and provide a more comprehensive monitoring solution, I plan to expand and improve YouCred.
My goal is to develop functionality that enables fact-checkers to easily track and monitor multiple topics concurrently, enhancing their efficiency and effectiveness in combating misinformation. • Deployment to Other Fact-Checking Organizations: As part of my future plans, I aim to deploy YouCred to other fact-checking organizations. By sharing this tool with different organizations, we can create a network where resources, such as the claims database, can be shared among them. This collaborative approach not only enhances fact-checking efforts but also provides an opportunity for researchers to study the misinformation landscape across different countries, furthering our understanding of this global challenge. • Integration with Tiplines and User Reporting: Many fact-checking organizations have established tiplines where users can report potentially misinformative content. To leverage user contributions and expand the reach of YouCred, I plan to integrate the system with these tiplines. By encouraging users to report search queries that lead them to misinformation online, I can enhance the effectiveness 11https://research.youtube/ 212 7.9. CONCLUSION and scope of YouCred in identifying and addressing misinformation on YouTube. • Dependency on seed videos: Youcred requires fact-checkers to curate a few seed videos in order to further monitor that topic. This curation process can be time- consuming and labor-intensive. Integrating the system with user tiplines is one way to address this issue. User-reported videos can not only provide valuable insights into the problematic topics that users are exposed to, but they can serve as alternative seed videos. • Refining search results: As part of my future plans, I aim to enhance the display of search results in YouCred. As fact-checkers fact-check more videos about a topic, the system can utilize machine learning algorithms to identify and prioritize search results with a higher probability of containing misinformation. One potential way is to look for linguistic signals in the videos’ comments to determine if the video could potentially contain misinformation [217]. 7.9 Conclusion In this study, I build a fact-checking system that helps fact-checkers with misinfor- mation discovery and credibility assessments on YouTube. The system is a result of a 2-year collaboration with Pesacheck—Africa’s largest indigenous fact-checking organization. Throughout the entire development process, I actively engaged the fact-checking team, involving them in requirement elicitation, design iterations, and evaluation phases. To evaluate the effectiveness and user acceptance of YouCred, I de- ployed the system at Pesacheck and monitored its usage for a duration of nine months. In addition, I conducted follow-up semi-structured interviews with the fact-checkers to gather their insights and feedback. The results of this comprehensive evaluation revealed a positive reception of YouCred within the fact-checking community. The fact-checkers acknowledged the system’s utility and found it valuable in their daily fact-checking endeavors. Overall, this work validates the effectiveness of participatory approaches in designing fact-checking systems that effectively meet the needs and expectations of fact-checkers. 213 C H A P T E R 8 FUTURE WORK AND CONCLUSION In the current digital landscape, people increasingly turn to search engines and social media platforms for news and information. However, the content presented by online platforms is not always reliable. 
The information can be biased, inaccurate, or even misleading. This situation is further compounded by the risk of users getting trapped in filter bubbles that expose them to even more problematic content. In my thesis work, I delve into this issue, which I term “algorithmically curated problematic content,” with a specific focus on misinformation. I examine how algorithms contribute to presenting and amplifying misleading content and design defenses against such content presented by platforms through three distinct research threads. In the first research thread, I developed audit methodologies to assess how user attributes and activities influence the extent of misinformation that surfaces in search results and recommendations. Applying these methodologies, I conducted a compre- hensive series of meticulously controlled experiments on various social media search interfaces, including YouTube (Chapter 3 and 4) and Amazon (Chapter 5). I found that these platforms amplify the misinformative content to users under certain conditions. I also identified vulnerable user populations who could be targets for certain misinfor- mative topics on online platforms. Given the significant influence of online platforms and the lack of universal policies against harmful online content, I advocate for a shift in responsibility from users to platforms. I believe platforms must take a more proactive role in ensuring the accuracy and reliability of the information presented to their users. This may involve rethinking traditional recommendation algorithms and 214 treating topics that directly impact users’ well-being, health, and happiness with extra scrutiny and ensuring high-quality searches and recommendations for them. The second thread of my research deep dives into the ways we can address the problem of online misinformation that gets surfaced by online platforms. For this work, I turned to fact-checking organizations that are constantly monitoring online platforms to determine the veracity of potentially dubious claims. Including the voices of those battling misinformation is essential, as they offer insights into real-world challenges and needs. Equally important is identifying the human stakeholders within these organizations and their roles. This knowledge empowers us to support all facets of the fact-checking process, both visible and invisible. Consequently, I interviewed individuals from fact-checking organizations across four continents. Through this study, I deep-dived into the process of online fact-checking by foregrounding the human and technological infrastructures of the fact-checking process (Chapter 6). I also unraveled the barriers to fact-checking online misinformation. Based on my findings, I propose that improving the quality of fact-checking necessitates systematic changes within the civic, informational, and technological contexts. Such changes are essential for a comprehensive approach to mitigating misinformation effectively. This research provides a road map for future investigations in fact-checking space, offering valuable insights that can aid and enhance the endeavors of fact-checking organizations. In the final thread of my dissertation, I leverage the insights gained from my previous research to design and build a fact-checking system (Chapter 7). The need for the system emerged from my interview study with fact-checking organizations, which revealed the challenges in monitoring video search engines like YouTube. 
Unlike platforms such as Twitter and Facebook, YouTube lacks trending topics or public groups for information sharing. Therefore, fact-checkers end up manually searching the platform by crafting queries based on guesswork. I solve this problem by building the YouCred fact-checking system. The major contribution of this project is the use of participatory methods, where fact-checking stakeholders were included in all stages of system development, from requirement elicitation to design and evaluation. I build the solution on fact-checkers' existing knowledge and processes to ensure the system's long-term sustainability. This work demonstrates that for effective solutions that can impact fact-checking processes in the real world, collaboration with fact-checkers and their inclusion in the design process is imperative.

In conclusion, my thesis documents an extensive investigation into identifying, measuring, and defending against algorithmically curated online misinformation. My approach adopts a multifaceted perspective, scrutinizing online misinformation through three distinct lenses: algorithms, fact-checking infrastructure, and design. Through this multifaceted approach and close collaboration with diverse stakeholders, I have been successful in developing effective solutions that can make a real difference in the fight against online misinformation. My research not only contributes to academic discourse but also offers practical insights into designing better technology and policies to mitigate the harmful effects of online misinformation.

8.1 Future work

Each research thread of my dissertation paves the way for new avenues of exploration. In this section, I delve into the potential future directions of my work. First, I discuss promising avenues that could be pursued within the realm of algorithmic audit research (Section 8.1.1). Second, I propose strategies to counter algorithmic harm by designing for algorithmic awareness (Section 8.1.2). I propose to seek ways to incorporate algorithmic literacy early in the education system by piloting interventions in university courses. I also propose designing frameworks to generate explanations of how algorithms work and impact users. While awareness about how algorithms function and how they can cause harm is important, we also need pathways to revert undesirable algorithmic behaviors. Thus, my third proposed work focuses on designing for algorithmic recourse, which allows users to change undesirable algorithmic decisions (Section 8.1.3). Finally, my dissertation research mostly focused on US-centric misinformation in the English language. I am eager to extend my investigations to the Global South region (Section 8.1.4). This future research trajectory would enable me to better understand the nuances of the misinformation phenomenon within diverse cultural contexts and languages.

8.1.1 Exploring New Horizons in Algorithmic Audit Research

I think there is a lot to be done in the algorithmic audits space. A recurring pattern in most audit studies (including my own) is the limited timeframe: they tend to assess a platform over a short span. In my view, there is immense value in embracing continuous audits that unfold over months or even years. This prolonged perspective could unveil how platforms' policies and algorithmic behavior evolve over time.
Furthermore, certain platforms like Spotify, ephemeral content generators like Snapchat, and non-Western search engines like Yandex have received limited attention in audit research. Additionally, most audits are focused on the Global North regions. Auditing algorithms in the Global South, and even conducting cross-country or cross-continent audits, could provide invaluable insights into regional variations in algorithmic behavior in two different parts of the world. Multi-platform audits also present an unexplored avenue. Such audits would scrutinize and evaluate the impact, fairness, and overall behavior of algorithms across various digital spaces. The goal would be to gain insights into how these algorithms curate content, make recommendations, and shape user experiences on different platforms. Consider an example: a few extremist groups have a presence on both Twitter and Facebook [313], but there has been no study to test and compare the effects of following these accounts on both platforms. Multi-platform audits can help us understand which platform exacerbates the effect of a problematic user action and to what extent.

The last year of my Ph.D. witnessed an explosive rise in interest in generative AI powered by large language models. Generative AI is now being integrated with search engines like Google and Bing and has brought forth concerns about its potential to propagate harm and amplify biases. I believe there is huge potential in the auditing research space to investigate these platforms for bias. This includes evaluating the factual accuracy of their responses to queries about important topics, as well as understanding their behavior when presented with prompts that are either multilingual or in non-Western languages.

8.1.2 Designing for algorithmic literacy and awareness.

One of the first steps towards redressing algorithmic harm is to make people aware of the presence of algorithmic systems, which would then allow users to question the outcomes of these systems. I want to understand how we can incorporate algorithmic literacy early in the education system by piloting interventions in high school courses as well as university courses in non-IT degrees. I want the interventions to educate people not only about the presence of algorithms but also about how their actions can impact algorithmic behavior. Next, I want to create frameworks to generate clear, meaningful, and useful explanations of how algorithms work and impact users. The vision of this work closely aligns with the White House's AI Bill of Rights, and I'll be applying for federal funding for the same. I've already made headway in this line of inquiry [316]. I have proposed design interventions that act as 'decision aids' to users when algorithms expose them to problematic content [316].

Figure 8.1: When a conspiratorial video gets recommended on a user's YouTube homepage, the user is warned about the consequences of watching the video on future video recommendations. (The mock-up shows a warning that the recommended video promotes a debunked 9/11 conspiracy theory, notes that similar videos could appear in the user's future YouTube recommendations, and suggests deleting the video from the watch history to remove its effect on future recommendations.)
My design presents users with facts (what happened) accompanied by forewarnings (what could happen) to convey the potential risks of action in a comprehensible manner. Figure 8.1 shows a mock-up of my design intervention. 8.1.3 Designing for algorithmic recourse. Awareness of the presence of an algorithm empowers users to question or contest the algorithmic decision. What does a user do when they encounter an undesirable algorithmic outcome? The field of machine learning (ML) has introduced the concept of recourse which is defined as the ability to change decisions of an ML model by changing input variables. I want to take a human-centered approach to recourse. I plan to understand users’ needs for recourse from the online platforms and investigate the current systems for recourse settings (e.g. setting to indicate that a user is not interested in content from a particular source). I plan to redesign existing online platforms for algorithmic recourse so that users have pathways to change the algorithmic decisions by performing particular actions on the system. 218 8.1. FUTURE WORK 8.1.4 Studying misinformation, fact-checking, and algorithmic impact beyond the US. While misinformation is a global crisis, measures to combat it vary with respect to culture, geographic location, language, etc. Most of the academic research, to date, has primarily focused on combating misinformation in Western countries, while not addressing the phenomenon in the Global South. How is fact-checking practiced in the Global South? What are the various barriers to the fact-checking process in the Global South? How can we design tools and technology to assist the stakeholders in the fact-checking process in a resource-constrained context of the Global South region? I want to answer these questions by first interviewing various stakeholder groups involved in the fact-checking process in the Global South and then using the insights to build tools for them. I strongly believe that this research would help provide a global perspective on misinformation and fact-checking. I am already collaborating with several fact-checking organizations in the Global South to make headway in this research direction. In a separate line of inquiry, I want to conduct systematic, effective, and ethical audits on various online platforms for problematic content such as hate speech, ex- tremism, and conspiracy theories specific to the Global South countries. Many online platforms have content moderation policies for the aforementioned problematic con- tent. However, it’s not known whether these policies are being applied uniformly in the Global North and Global South region. There also have been reports of inconsistent support for misinformation and hate speech in non-English languages on social media platforms, which disproportionately affects those in the Global South where such con- tent spreads through local regional languages. I want to investigate how problematic content in non-English languages gets surfaced online and in turn understand, how effectively online platforms have enacted their policies in the Global South. 219 BIBLIOGRAPHY [1] Amazon slammed for promoting false covid cures and anti-vaccine claims : Npr. https://www.npr.org/2021/09/09/1035559330/democrats- slam-amazon-for-promoting-false-covid-cures-and-anti- vaccine-claims. (Accessed on 12/28/2022). [2] Fighting coronavirus misinformation and disinformation - center for american progress. 
https://www.americanprogress.org/article/fighting- coronavirus-misinformation-disinformation/. (Accessed on 01/10/2023). [3] How google’s search algorithm spreads false information with a rightwing bias | google | the guardian. https://www.theguardian.com/technology/2016/dec/16/google- autocomplete-rightwing-bias-algorithm-political- propaganda. (Accessed on 12/28/2022). [4] Youtube is still struggling to rein in its recommendation algorithm. https://www.buzzfeednews.com/article/carolineodonovan/ down-youtubes-recommendation-rabbithole. (Accessed on 12/28/2022). [5] Youtube more likely to recommend election-fraud content to those skeptical of the 2020 election: study – the hill. https://thehill.com/changing-america/enrichment/arts- culture/3625989-youtube-more-likely-to-recommend- election-fraud-content-to-those-skeptical-of-the-2020- election-study/. (Accessed on 12/28/2022). 220 BIBLIOGRAPHY [6] List of conspiracy theories, (2019). [7] Google search help, (2020). https://support.google.com/websearch/answer/9281931?hl=en. [8] 3 challenges of integrating heterogeneous data sources - dzone integration. https://dzone.com/articles/3-challenges-of-integrating- heterogeneous-data-sou, August 2021. (Accessed on 08/03/2021). [9] Challenges of integrating heterogeneous data sources - dataversity. https://www.dataversity.net/challenges-of-integrating- heterogeneous-data-sources/, August 2021. (Accessed on 08/03/2021). [10] A common data model for europe? - why? which? how? - workshop report. https://www.ema.europa.eu/en/documents/report/common-data- model-europe-why-which-how-workshop-report_en.pdf, Aug 2021. (Accessed on 08/03/2021). [11] A common data model in europe? – why? which? how? | european medicines agency. https://www.ema.europa.eu/en/events/common-data-model- europe-why-which-how, Aug 2021. (Accessed on 08/03/2021). [12] Der spiegel | online-nachrichten. https://www.spiegel.de/consent-a-?targetUrl=https%3A%2F% 2Fwww.spiegel.de%2Finternational%2F&ref=https%3A%2F% 2Fwww.google.com%2F, September 2021. (Accessed on 09/14/2021). [13] dpa: en. https://www.dpa.com/en/, August 2021. (Accessed on 08/17/2021). [14] Fact check. https://www.indiatoday.in/fact-check, August 2021. (Accessed on 08/17/2021). 221 BIBLIOGRAPHY [15] Portada · maldita.es - periodismo para que no te la cuelen. https://maldita.es/, August 2021. (Accessed on 08/17/2021). [16] Presentación de powerpoint. https://maldita.es/uploads/public/docs/barometro_ desinformacion_parte_1.pdf, Aug 2021. (Accessed on 08/25/2021). [17] Search results - lexisnexis. https://www.lexisnexis.com/en-us/search.page, April 2021. (Accessed on 04/15/2021). [18] Uganda in crisis – ancir’s ilab. https://investigate.africa/reports/uganda-in-crisis/, Aug 2021. (Accessed on 08/25/2021). [19] Bringing fact check information to google images. https://blog.google/products/search/bringing-fact-check- information-google-images/, January 2022. (Accessed on 01/13/2022). [20] Debunk bot (@debunkbotafrica) / twitter. https://twitter.com/debunkbotafrica, Jan 2022. (Accessed on 01/12/2022). [21] Facebook takes down fact-check of live action, lila rose anti-abortion videos. https://www.buzzfeednews.com/article/claudiakoerner/ facebook-fact-check-abortion-video-doctors-medical, January 2022. (Accessed on 01/05/2022). [22] Gephi - the open graph viz platform. https://gephi.org/, Jan 2022. (Accessed on 01/08/2022). [23] Google fact check feature: What it means for your online efforts - act-on. 
222 BIBLIOGRAPHY https://act-on.com/blog/google-fact-check-feature-what- it-means-for-your-online-efforts/, Jan 2022. (Accessed on 01/05/2022). [24] Hci and the u.s. presidential election: A few thoughts on a research agenda | by brent hecht | medium. https://brenthecht.medium.com/hci-and-the-u-s- presidential-election-a-few-thoughts-on-a-research- agenda-7c1a0a04986, January 2022. (Accessed on 01/05/2022). [25] Industry-leading vector graphics software | adobe illustrator. https://www.adobe.com/products/illustrator.html, January 2022. (Accessed on 01/07/2022). [26] Kapwing: The collaborative online video editor. https://www.kapwing.com/, January 2022. (Accessed on 01/07/2022). [27] Mooc.org | massive open online courses | an edx site. https://www.mooc.org/, January 2022. (Accessed on 01/08/2022). [28] Official adobe photoshop | photo and design software. https://www.adobe.com/products/photoshop.html, January 2022. (Accessed on 01/07/2022). [29] See fact checks in youtube search results - youtube help. https://support.google.com/youtube/answer/9229632?hl=en, Jan 2022. (Accessed on 01/05/2022). [30] Dataleads. https://dataleads.co.in/, 2023. (Accessed on 06/25/2023). [31] Dpa german press agency. https://www.dpa.com/en, 2023. (Accessed on 06/25/2023). 223 BIBLIOGRAPHY [32] Triggering google suggests - fatrank. https://www.fatrank.com/triggering-google-suggests/, June 2023. (Accessed on 06/05/2023). [33] The african network of centers for investigativereporting’s investigative lab, accessed in 2021. https://investigate.africa/. [34] 10UNDER100, 20 eye opening amazon statistics & facts for 2020, (2020). https://10under100.com/amazon-statistics-facts/. [35] A. ABDUL, J. VERMEULEN, D. WANG, B. Y. LIM, AND M. KANKANHALLI, Trends and trajectories for explainable, accountable and intelligible systems: An hci research agenda, in Proceedings of the 2018 CHI conference on human factors in computing systems, 2018, pp. 1–18. [36] A. ABDUL, J. VERMEULEN, D. WANG, B. Y. LIM, AND M. KANKANHALLI, Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda, Association for Computing Machinery, New York, NY, USA, 2018, p. 1–18. [37] A. ABILOV, Y. HUA, H. MATATOV, O. AMIR, AND M. NAAMAN, Voterfraud2020: a multi-modal dataset of election fraud claims on twitter, Proceedings of the International AAAI Conference on Web and Social Media, 15 (2021), pp. 901– 912. [38] J. ABRAHAM AND M. C. REDDY, Re-coordinating activities: an investigation of articulation work in patient transfers, in Proceedings of the 2013 conference on Computer supported cooperative work, 2013, pp. 67–78. [39] M. S. ACKERMAN, The intellectual challenge of cscw: The gap between social re- quirements and technical feasibility, Human–Computer Interaction, 15 (2000), pp. 179–203. [40] A. ADADI AND M. BERRADA, Peeking inside the black-box: A survey on explainable artificial intelligence (xai), IEEE Access, 6 (2018), pp. 52138–52160. [41] B. ADAIR, The future of fact-checking is all about structured data, April 2021. 224 BIBLIOGRAPHY [42] K. ADNAN, R. AKBAR, AND K. S. WANG, Information extraction from multi- faceted unstructured big data, International Journal of Recent Technology and Engineering (IJRTE), 8 (2019), pp. 1398–1404. [43] AFP, Fact check |. https://factcheck.afp.com//, April 2021. (Accessed on 04/15/2021). [44] A. AGADJANIAN, N. BAKHRU, V. CHI, D. GREENBERG, B. HOLLANDER, A. HURT, J. KIND, R. LU, A. MA, B. 
NYHAN, ET AL., Counting the pinocchios: The effect of summary fact-checking data on perceived accuracy and favorability of politicians, Research & Politics, 6 (2019), p. 2053168019870351. [45] J. ALBRIGHT, Untrue tube – youtube's conspiracy ecosystem, 2018. [46] H. ALLCOTT AND M. GENTZKOW, Social media and fake news in the 2016 election, Journal of economic perspectives, 31 (2017), pp. 211–36. [47] J. ALMEIDA, Misinformation dissemination on the web, in Companion Proceedings of the 2019 World Wide Web Conference, 2019, pp. 740–740. [48] M. ALRUBAIAN, M. AL-QURISHI, M. M. HASSAN, AND A. ALAMRI, A credi- bility analysis system for assessing information on twitter, IEEE Transactions on Dependable and Secure Computing, 15 (2016), pp. 661–674. [49] M. A. AMAZEEN, A critical assessment of fact-checking in 2012, 2013. [50] M. A. AMAZEEN, C. J. VARGO, AND T. HOPP, Reinforcing attitudes in a gatewatch- ing news era: Individual-level antecedents to sharing fact-checks on social media, Communication Monographs, 86 (2019), pp. 112–132. [51] I. ARCHIVE, Internet archive: Wayback machine. https://archive.org/web/, April 2021. (Accessed on 04/15/2021). [52] ARCHIVE.TODAY, archive.today - wikipedia. https://en.wikipedia.org/wiki/Archive.today, April 2021. (Accessed on 04/15/2021). [53] R. L. ARMSTRONG, New survey suggests 10% of americans believe the moon landing was fake, (2019). 225 BIBLIOGRAPHY [54] A. ARSHT AND D. ETCOVITCH, The human cost of online content moderation, Harvard Law Review Online, Harvard University, Cambridge, MA, USA. Re- trieved from https://jolt. law. harvard. edu/digest/the-human-cost-ofonline- content-moderation, (2018). [55] V. ARYA, R. K. BELLAMY, P.-Y. CHEN, A. DHURANDHAR, M. HIND, S. C. HOFFMAN, S. HOUDE, Q. V. LIAO, R. LUSS, A. MOJSILOVIC´, ET AL., One explanation does not fit all: A toolkit and taxonomy of ai explainability techniques, arXiv preprint arXiv:1909.03012, (2019). [56] M. ASHOORI AND J. D. WEISZ, In ai we trust? factors that influence trustworthiness of ai-infused decision-making processes, arXiv preprint arXiv:1912.02675, (2019). [57] P. BALL AND A. MAXMEN, The epic battle against coronavirus misinformation and conspiracy theories, (2020). [58] BALLOTPEDIA, The methodologies of fact-checking, accessed in March 2021. [59] J. BANDY, Problematic machine behavior: A systematic literature review of algorithm audits, Proc. ACM Hum.-Comput. Interact., 5 (2021). [60] S. BARBOSA AND S. MILAN, Do not harm in private chat apps: Ethical issues for research on and with whatsapp, Westminster Papers in Communication and Culture, 14 (2019), pp. 49–65. [61] M. BASOL, J. ROOZENBEEK, AND S. VAN DER LINDEN, Good news about bad news: Gamified inoculation boosts confidence and cognitive immunity against fake news, Journal of cognition, 3 (2020). [62] BBC NEWS, Measles: Four european nations lose eradication status, (2019). [63] R. T. BECKWITH, Us primaries: Election deniers go door-to-door to confront voters after losses - bloomberg. https://www.bloomberg.com/news/articles/2022-08-23/ election-deniers-go-door-to-door-to-confront-voters- after-losses?leadSource=uverify%20wall, 2022. (Accessed on 09/07/2022). [64] W. BELLAMY, Malaysia airlines flight 370 final report inconclusive, (2019). [65] J. BELLUZ, Amazon is a giant purveyor of medical quackery, (2016). 226 BIBLIOGRAPHY [66] A. 
BERMES, Information overload and fake news sharing: A transactional stress perspec- tive exploring the mitigating role of consumers’ resilience during covid-19, Journal of Retailing and Consumer Services, 61 (2021), p. 102555. [67] J. C. BERTOT AND H. CHOI, Big data and e-government: Issues, policies, and recom- mendations, in Proceedings of the 14th Annual International Conference on Digital Government Research, dg.o ’13, New York, NY, USA, 2013, Associa- tion for Computing Machinery, p. 1–10. [68] A. BESSI, M. COLETTO, G. A. DAVIDESCU, A. SCALA, G. CALDARELLI, AND W. QUATTROCIOCCHI, Science vs conspiracy: Collective narratives in the age of misinformation, PloS one, 10 (2015), p. e0118093. [69] M. M. BHUIYAN, C. A. BAUTISTA ISAZA, T. MITRA, AND S. W. LEE, Othertube: Facilitating content discovery and reflection by exchanging youtube recommenda- tions with strangers, in CHI Conference on Human Factors in Computing Systems, 2022, pp. 1–17. [70] M. M. BHUIYAN, M. HORNING, S. W. LEE, AND T. MITRA, Nudgecred: Supporting news credibility assessment on social media through nudges, Proceedings of the ACM on Human-Computer Interaction, 5 (2021), pp. 1–30. [71] J. BISBEE, M. BROWN, A. LAI, R. BONNEAU, J. NAGLER, AND J. A. TUCKER, Election fraud, youtube, and public perception of the legitimacy of president biden, Journal of Online Trust and Safety, 1 (2022). [72] A. BONDIELLI AND F. MARCELLONI, A survey on fake news and rumour detection techniques, Information Sciences, 497 (2019), pp. 38–55. [73] N. L. BRAGAZZI, I. BARBERIS, R. ROSSELLI, V. GIANFREDI, D. NUCCI, M. MORETTI, T. SALVATORI, G. MARTUCCI, AND M. MARTINI, How of- ten people google for vaccination: Qualitative and quantitative insights from a systematic search of the web-based activities using google trends, Human Vaccines & Immunotherapeutics, 13 (2017), pp. 464–469. PMID: 27983896. [74] V. BRAUN AND V. CLARKE, Using thematic analysis in psychology, Qualitative research in psychology, 3 (2006), pp. 77–101. [75] A. BRUNS, Are filter bubbles real?, (2019). 227 BIBLIOGRAPHY [76] A. BRUNS, Filter bubble, Internet Policy Review, 8 (2019). [77] P. BUMP, The unique role of fox news in the misinformation universe - the washington post. https://www.washingtonpost.com/politics/2021/11/08/unique- role-fox-news-misinformation-universe/, 2021. (Accessed on 09/10/2022). [78] T. D. BURGESS II AND S. M. SALES, Attitudinal effects of “mere exposure”: A reevaluation, Journal of Experimental Social Psychology, 7 (1971), pp. 461–472. [79] J. BURRELL, How the machine ‘thinks’: Understanding opacity in machine learning algorithms, Big Data & Society, 3 (2016), p. 2053951715622512. [80] BUZZSUMO, Buzzsumo.com. https://buzzsumo.com/, April 2021. (Accessed on 04/15/2021). [81] J. C. DOS SANTOS, S. WM SIQUEIRA, B. PEREIRA NUNES, P. P. BALESTRASSI, AND F. RS PEREIRA, Is there personalization in twitter search? a study on polarized opinions about the brazilian welfare reform, in 12th ACM Conference on Web Science, 2020, pp. 267–276. [82] N. CARNE, 'conspiracies'dominate youtube climate modification videos, (2019). [83] N. CASS, T. SCHWANEN, AND E. SHOVE, Infrastructures, intersections and societal transformations, Technological Forecasting and Social Change, 137 (2018), pp. 160–167. [84] C. CASTILLO, M. MENDOZA, AND B. POBLETE, Information credibility on twitter, in Proceedings of the 20th international conference on World wide web, 2011, pp. 675–684. [85] M. CAULFIELD, Web literacy for student fact-checkers, 2017. [86] S. CAZALENS, P. LAMARRE, J. LEBLAY, I. 
MANOLESCU, AND X. TANNIER, A content management perspective on fact-checking, in Companion Proceedings of the The Web Conference 2018, 2018, pp. 565–574. [87] A. S. CENTRAL, Dietary supplements, (accessed in 2020). 228 BIBLIOGRAPHY [88] A. CERONE, E. NAGHIZADE, F. SCHOLER, D. MALLAL, R. SKELTON, AND D. SPINA, Watch’n’check: Towards a social media monitoring tool to assist fact- checking experts, in 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2020, pp. 607–613. [89] B. CHAUDHURI, Paradoxes of intermediation in aadhaar: Human making of a digital infrastructure, South Asia: Journal of South Asian Studies, 42 (2019), pp. 572– 587. [90] N. V. CHAWLA, K. W. BOWYER, L. O. HALL, AND W. P. KEGELMEYER, Smote: synthetic minority over-sampling technique, Journal of artificial intelligence research, 16 (2002), pp. 321–357. [91] A. CHECK, Africa check | sorting fact from fiction. https://africacheck.org/, April 2021. (Accessed on 04/15/2021). [92] J. M. CHEJFEC-CIOCIANO, J. P. MARTÍNEZ-HERRERA, A. D. PARRA- GUERRA, R. CHEJFEC, F. J. BARBOSA-CAMACHO, J. C. IBARROLA-PEÑA, G. CERVANTES-GUEVARA, G. A. CERVANTES-CARDONA, C. FUENTES- OROZCO, E. CERVANTES-PÉREZ, ET AL., Misinformation about and interest in chlorine dioxide during the covid-19 pandemic in mexico identified using google trends data: infodemiology study, JMIR infodemiology, 2 (2022), p. e29894. [93] L. CHEN, R. MA, A. HANNÁK, AND C. WILSON, Investigating the impact of gender on rank in resume search engines, in Proceedings of the 2018 chi conference on human factors in computing systems, 2018, pp. 1–14. [94] L. CHEN, A. MISLOVE, AND C. WILSON, Peeking beneath the hood of uber, in Proceedings of the 2015 internet measurement conference, 2015, pp. 495–508. [95] L. CHEN, A. MISLOVE, AND C. WILSON, An empirical analysis of algorithmic pric- ing on amazon marketplace, in Proceedings of the 25th International Conference on World Wide Web, 2016, pp. 1339–1349. [96] Q. CHEN, Y. ZHANG, R. EVANS, AND C. MIN, Why do citizens share covid-19 fact-checks posted by chinese government social media accounts? the elaboration likelihood model, International Journal of Environmental Research and Public Health, 18 (2021), p. 10058. 229 BIBLIOGRAPHY [97] X. CHEN, S.-C. J. SIN, Y.-L. THENG, AND C. S. LEE, Deterring the spread of mis- information on social network sites: A social cognitive theory-guided intervention, Proceedings of the Association for Information Science and Technology, 52 (2015), pp. 1–4. [98] X. CHEN, S.-C. J. SIN, Y.-L. THENG, AND C. S. LEE, Why students share mis- information on social media: Motivation, gender, and study-level differences, The Journal of Academic Librarianship, 41 (2015), pp. 583–592. [99] D. CHERUIYOT AND R. FERRER-CONILL, “fact-checking africa”, Digital Journal- ism, 6 (2018), pp. 964–975. [100] P. CHONKA, S. DIEPEVEEN, AND Y. HAILE, Algorithmic power and african indige- nous languages: search engine autocomplete and the global multilingual internet, Media, Culture & Society, 45 (2023), pp. 246–265. [101] CISCO, Cisco visual networking index: Forecast and trends, 2017–2022 white paper, (2019). [102] J. CONDITT, Google partners with fact-checking network to fight fake news, 2017. [103] W. CONTRIBUTORS, Occam’s razor — Wikipedia, the free encyclopedia, 2022. [Online; accessed 13-September-2022]. [104] J. COOK, S. LEWANDOWSKY, AND U. K. ECKER, Neutralizing misinformation through inoculation: Exposing misleading argumentation techniques reduces their influence, PloS one, 12 (2017), p. 
e0175799. [105] A. COSSARD, G. D. F. MORALES, K. KALIMERI, Y. MEJOVA, D. PAOLOTTI, AND M. STARNINI, Falling into the echo chamber: the italian vaccination debate on twitter, in Proceedings of the International AAAI Conference on Web and Social Media, vol. 14, 2020, pp. 130–140. [106] P. COVINGTON, J. ADAMS, AND E. SARGIN, Deep neural networks for youtube rec- ommendations, in Proceedings of the 10th ACM Conference on Recommender Systems, 2016. [107] CROWDTANLE, Crowdtangle | content discovery and social monitoring made easy. https://www.crowdtangle.com/, April 2021. (Accessed on 04/15/2021). 230 BIBLIOGRAPHY [108] P. M. DAHLGREN, A critical review of filter bubbles and a comparison with selective exposure, Nordicom Review, 42 (2021), pp. 15–33. [109] E. DAI, Y. SUN, AND S. WANG, Ginger cannot cure cancer: Battling fake health news with a comprehensive data repository, Proceedings of the International AAAI Conference on Web and Social Media, 14 (2020), pp. 853–862. [110] F. J. DAMERAU, A technique for computer detection and correction of spelling errors, Communications of the ACM, 7 (1964), pp. 171–176. [111] M. DANILEVSKY, K. QIAN, R. AHARONOV, Y. KATSIS, B. KAWAS, AND P. SEN, A survey of the state of explainable AI for natural language processing, in Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Suzhou, China, Dec. 2020, Association for Computational Linguistics, pp. 447–459. [112] E. DAYTON, Amazon statistics you should know: Opportunities to make the most of america’s top online marketplace. [113] S. DE PAOR AND B. HERAVI, Information literacy and fake news: How the field of librarianship can help combat the epidemic of fake news, The Journal of Academic Librarianship, 46 (2020), p. 102218. [114] B. DEAN, Here’s what we learned about organic click through rate, (2019). [115] K. C. DESOUZA AND K. L. SMITH, Big data for social innovation, Stanford Social Innovation Review, 12 (2014), pp. 38–43. [116] J. DEVLIN, M.-W. CHANG, K. LEE, AND K. TOUTANOVA, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, (2018). [117] N. DIAKOPOULOS, D. TRIELLI, J. STARK, AND S. MUSSENDEN, I vote for– how search informs our choice of candidate, Digital Dominance: The Power of Google, Amazon, Facebook, and Apple, M. Moore and D. Tambini (Eds.), 22 (2018). [118] N. DIAS AND A. SIPPITT, Researching fact checking: Present limitations and future opportunities, The Political Quarterly, 91 (2020), pp. 605–613. [119] C. DICKEY, The rise and fall of facts, 2019. 231 BIBLIOGRAPHY [120] R. DIRESTA, The complexity of simply searching for medical advice, 2018. [121] R. DIRESTA, How amazon’s algorithms curated a dystopian bookstore, (2019). https://www.wired.com/story/amazon-and-the-spread-of- health-misinformation/. [122] J. D’ONFRO, Youtube adding wikipedia links debunking conspiracy theories. https://www.cnbc.com/2018/03/13/youtube-wikipedia-links- debunk-conspiracy.html, 2018. (Accessed on 09/12/2022). [123] B. DOSONO AND B. SEMAAN, Moderation practices as emotional labor in sustaining online communities: The case of aapi identity work on reddit, in Proceedings of the 2019 CHI conference on human factors in computing systems, 2019, pp. 1–13. [124] K. M. DOUGLAS, R. M. SUTTON, D. JOLLEY, AND M. J. 
WOOD, The social, political, environmental, and health-related consequences of conspiracy theories, The psychology of conspiracy, (2015), pp. 183–200. [125] DPA, Fact check. https://dps-factify.com, Aug 2021. [126] F. DRAFT, Combating misinformation in under-resourced languages: lessons from around the world, 2020. [127] T. DREISBACH, On amazon, dubious ’antiviral’ supplements proliferate amid pandemic, (2020). [128] A. F. DUGAS, Y.-H. HSIEH, S. R. LEVIN, J. M. PINES, D. P. MAREINISS, A. MO- HAREB, C. A. GAYDOS, T. M. PERL, AND R. E. ROTHMAN, Google flu trends: correlation with emergency department influenza rates and crowding metrics, Clini- cal infectious diseases, 54 (2012), pp. 463–469. [129] C. DWYER, Task technology fit, the social technical gap and social networking sites, AMCIS 2007 Proceedings, (2007), p. 374. [130] M. DYE, D. NEMER, N. KUMAR, AND A. S. BRUCKMAN, If it rains, ask grandma to disconnect the nano: Maintenance & care in havana’s streetnet, Proceedings of the ACM on human-computer interaction, 3 (2019), pp. 1–27. 232 BIBLIOGRAPHY [131] M. DYE, D. NEMER, J. MANGIAMELI, A. S. BRUCKMAN, AND N. KUMAR, El paquete semanal: The week’s internet in havana, in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–12. [132] U. K. ECKER, Z. O’REILLY, J. S. REID, AND E. P. CHANG, The effectiveness of short-format refutational fact-checks, British Journal of Psychology, 111 (2020), pp. 36–54. [133] J. ELIZABETH, Who are you calling a fact checker?, 2014. [134] R. EPSTEIN AND R. E. ROBERTSON, The search engine manipulation effect (seme) and its possible impact on the outcomes of elections, Proceedings of the National Academy of Sciences, 112 (2015), pp. E4512–E4521. [135] R. EPSTEIN, R. E. ROBERTSON, D. LAZER, AND C. WILSON, Suppressing the search engine manipulation effect (seme), Proceedings of the ACM on Human- Computer Interaction, 1 (2017), pp. 1–22. [136] S. ERDEN, K. NALBANT, AND H. FERAHKAYA, Autism and vaccinations: Does google side with science?, Journal of Contemporary Medicine, 9 (2019), pp. 295– 299. [137] I. ETIKAN, S. A. MUSA, AND R. S. ALKASSIM, Comparison of convenience sampling and purposive sampling, American journal of theoretical and applied statistics, 5 (2016), pp. 1–4. [138] F. FACT, coof-2020.pdf. https://fullfact.org/media/uploads/coof-2020.pdf, Dec 2020. (Accessed on 07/27/2023). [139] , Full fact. https://fullfact.org/, April 2021. (Accessed on 04/15/2021). [140] M. FADDOUL, G. CHASLOT, AND H. FARID, A longitudinal analysis of youtube’s promotion of conspiracy videos, arXiv preprint arXiv:2003.03318, (2020). [141] K. P. FERGUSON, Impact of technology on rural appalachian health care providers: Assessment of technological infrastructure, behaviors, and attitudes., (2005). 233 BIBLIOGRAPHY [142] J. FERREIRA, H. SHARP, AND H. ROBINSON, User experience design and agile development: managing cooperation through articulation work, Software: Practice and Experience, 41 (2011), pp. 963–974. [143] T. FINANCIAL, Amazon removed 1 million fake coronavirus cures and overpriced products, (2020). [144] FIRST DRAFT, The importance of local context in taking on misinformation: Lessons from africacheck, 2020. [145] S. FISCHER, Amazon has a misinformation problem, too, (2019). [146] R. S. FISH, R. E. KRAUT, AND M. D. LELAND, Quilt: A collaborative tool for cooperative writing, in Proceedings of the ACM SIGOIS and IEEECS TC-OA 1988 conference on Office information systems, 1988, pp. 30–37. [147] C. FOR AN INFORMED PUBLIC, D. F. R. 
LAB, GRAPHIKA, AND S. I. OBSERVA- TORY, The long fuse: Misinformation and the 2020 election, (2021). [148] S. FOX, Online health search 2006, (2006). [149] K. FRIDKIN, P. J. KENNEY, AND A. WINTERSIECK, Liar, liar, pants on fire: How fact-checking influences citizens’ reactions to negative advertising, Political Com- munication, 32 (2015), pp. 127–151. [150] A. FRIGGERI, L. ADAMIC, D. ECKLES, AND J. CHENG, Rumor cascades, in Eighth International AAAI Conference on Weblogs and Social Media, 2014. [151] A. J. B. FROM, Communicating fact checks online, (2020). [152] C. M. FULLER, D. P. BIROS, AND R. L. WILSON, Decision support for determining veracity via linguistic-based cues, Decision Support Systems, 46 (2009), pp. 695– 703. [153] FULLFACT, Github - fullfact/claim-review-schema-wordpress-plugin: An open source project to create a wordpress plugin for claim review schema. https://github.com/FullFact/claim-review-schema-wordpress- plugin, April 2021. (Accessed on 04/15/2021). [154] E. GAILLARD, Facebook under fire for permitting anti-vax groups, (2019). 234 BIBLIOGRAPHY [155] H. GAO, X. WANG, G. BARBIER, AND H. LIU, Promoting coordination for disaster relief–from crowdsourcing to coordination, in International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, Springer, 2011, pp. 197–204. [156] R. K. GARRETT AND B. E. WEEKS, The promise and peril of real-time corrections to political misperceptions, in Proceedings of the 2013 conference on Computer supported cooperative work, 2013, pp. 1047–1058. [157] Y. GE, S. LIU, R. GAO, Y. XIAN, Y. LI, X. ZHAO, C. PEI, F. SUN, J. GE, W. OU, ET AL., Towards long-term fairness in recommendation, in Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021, pp. 445–453. [158] A. GHENAI AND Y. MEJOVA, Catching zika fever: Application of crowdsourcing and machine learning for tracking health misinformation on twitter, arXiv preprint arXiv:1707.03778, (2017). [159] A. GHENAI AND Y. MEJOVA, Fake cures: user-centric modeling of health misinforma- tion in social media, Proceedings of the ACM on human-computer interaction, 2 (2018), pp. 1–20. [160] P. GHEZZI, P. G. BANNISTER, G. CASINO, A. CATALANI, M. GOLDMAN, J. MOR- LEY, M. NEUNEZ, A. PRADOS-BO, P. R. SMEESTERS, M. TADDEO, ET AL., Online information of vaccines: information quality, not only privacy, is an ethical responsibility of search engines, Frontiers in Medicine, 7 (2020). [161] T. GILLESPIE, The relevance of algorithms, Media technologies: Essays on commu- nication, materiality, and society, 167 (2014). [162] T. GILLESPIE, Algorithmically recognizable: Santorum’s google problem, and google’s santorum problem, Information, communication & society, 20 (2017), pp. 63–80. [163] F. GIRARDIN, Towards Reducing the Social-Technical Gap in Location-Aware Comput- ing, PhD thesis, Citeseer, 2007. [164] A. GLASER, Amazon is suggesting “frequently bought together” items that can make a bomb, (2017). [165] O. GOLDHILL, Amazon is selling coronavirus misinformation, (2020). 235 BIBLIOGRAPHY [166] A. GOLDMAN AND C. O’CONNOR, Social Epistemology, in The Stanford Ency- clopedia of Philosophy, E. N. Zalta, ed., Metaphysics Research Lab, Stanford University, Winter 2021 ed., 2021. [167] M. GOLEBIEWSKI AND D. BOYD, Data voids: Where missing data can easily be exploited, (2019). [168] L. A. GOODMAN, Snowball sampling, The annals of mathematical statistics, (1961), pp. 148–170. [169] GOOGLE, Google’s search quality rating guidelines, (2019). [170] L. GRACE AND B. 
HONE, Factitious: Large scale computer game to fight fake news and improve news literacy, in Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–8. [171] D. GRAVES, Understanding the promise and limits of automated fact-checking, (2018). [172] L. GRAVES, Deciding what’s true: Fact-checking journalism and the new ecology of news, PhD thesis, Columbia University, 2013. [173] L. GRAVES, Anatomy of a fact check: Objective practice and the contested epistemology of fact checking, Communication, Culture & Critique, 10 (2017), pp. 518–537. [174] L. GRAVES AND M. A. AMAZEEN, Fact-checking as idea and practice in journalism, in Oxford Research Encyclopedia of Communication, 2019. [175] L. GRAVES AND C. W. ANDERSON, Discipline and promote: Building infrastructure and managing algorithms in a “structured journalism” project by professional fact-checking groups, New Media & Society, 22 (2020), pp. 342–360. [176] L. GRAVES AND F. CHERUBINI, The rise of fact-checking sites in europe, (2016). [177] L. GRAVES AND T. GLAISYER, The fact-checking universe in spring 2012, New America, (2012). [178] J. GREEN, W. HOBBS, S. MCCABE, AND D. LAZER, Online engagement with 2020 election misinformation and turnout in the 2021 georgia runoff election, Proceed- ings of the National Academy of Sciences, 119 (2022), p. e2115900119. 236 BIBLIOGRAPHY [179] R. E. GRINTER, Using a configuration management tool to coordinate software devel- opment, in Proceedings of conference on Organizational computing systems, 1995, pp. 168–177. [180] T. GRØNSUND AND M. AANESTAD, Augmenting the algorithm: Emerging human- in-the-loop work configurations, The Journal of Strategic Information Systems, 29 (2020), p. 101614. Strategic Perspectives on Digital Work and Organizational Transformation. [181] Z. GUAN AND E. CUTRELL, An eye tracking study of the effect of target rank on web search, in Proceedings of the SIGCHI conference on Human factors in computing systems, 2007, pp. 417–420. [182] F. GUERRA, D. LINZ, R. GARCIA, B. KOMMATA, J. KOSIUK, J. CHUN, S. BOVEDA, AND D. DUNCKER, The use of instant messaging in clinical data sharing: the ehra sms survey, EP Europace, 23 (2021), pp. euab116–515. [183] A. GUESS, J. NAGLER, AND J. TUCKER, Less than you think: Prevalence and predictors of fake news dissemination on facebook, Science advances, 5 (2019), p. eaau4586. [184] A. GUPTA, H. LAMBA, P. KUMARAGURU, AND A. JOSHI, Faking sandy: character- izing and identifying fake images on twitter during hurricane sandy, in Proceed- ings of the 22nd international conference on World Wide Web, ACM, 2013, pp. 729–736. [185] P. H. CARSTENSEN, Modeling coordination work: Lessons learned from analyzing a cooperative work setting, in Symbiosis of Human and Artifact, Y. Anzai, K. Ogawa, and H. Mori, eds., vol. 20 of Advances in Human Factors/Er- gonomics, Elsevier, 1995, pp. 327–332. [186] J. HALE, More than 500 hours of content are now being uploaded to youtube every minute, (2019). [187] A. HANNAK, P. SAPIEZYNSKI, A. MOLAVI KAKHKI, B. KRISHNAMURTHY, D. LAZER, A. MISLOVE, AND C. WILSON, Measuring personalization of web search, in Proceedings of the 22nd international conference on World Wide Web, 2013, pp. 527–538. 237 BIBLIOGRAPHY [188] A. HANNAK, G. SOELLER, D. LAZER, A. MISLOVE, AND C. WILSON, Measuring price discrimination and steering on e-commerce web sites, in Proceedings of the 2014 conference on internet measurement conference, 2014, pp. 305–318. [189] A. HANNÁK, C. WAGNER, D. GARCIA, A. MISLOVE, M. 
STROHMAIER, AND C. WILSON, Bias in online freelance marketplaces: Evidence from taskrabbit and fiverr, in Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW ’17, ACM, 2017, pp. 1914– 1933. [190] M. M. HAQUE, M. YOUSUF, A. S. ALAM, P. SAHA, S. I. AHMED, AND N. HAS- SAN, Combating misinformation in bangladesh: Roles and responsibilities as per- ceived by journalists, fact-checkers, and users, Proceedings of the ACM on Human-Computer Interaction, 4 (2020), pp. 1–32. [191] N. HASSAN, F. ARSLAN, C. LI, AND M. TREMAYNE, Toward automated fact- checking: Detecting check-worthy factual claims by claimbuster, in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 1803–1812. [192] N. HASSAN, C. LI, AND M. TREMAYNE, Detecting check-worthy factual claims in presidential debates, in Proceedings of the 24th acm international on conference on information and knowledge management, 2015, pp. 1835–1838. [193] N. HASSAN, M. YOUSUF, M. MAHFUZUL HAQUE, J. A. SUAREZ RIVAS, AND M. KHADIMUL ISLAM, Examining the roles of automation, crowds and profession- als towards sustainable fact-checking, in Companion Proceedings of The 2019 World Wide Web Conference, 2019, pp. 1001–1006. [194] F. HE AND S. HAN, A method and tool for human–human interaction and instant collaboration in cscw-based cad, Computers in Industry, 57 (2006), pp. 740–751. [195] R. HEEKS, Most egovernment-for-development projects fail: how can risks be reduced?, (2003). [196] G. T. HELP, Explore results by region, (2020). [197] M. HERR, Writing and Publishing Your Book: A Guide for Experts in Every Field, ABC-CLIO, 2017. 238 BIBLIOGRAPHY [198] B. D. HORNE AND S. ADALI, This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news, in Eleventh International AAAI Conference on Web and Social Media, 2017. [199] R. HOU, V. PÉREZ-ROSAS, S. LOEB, AND R. MIHALCEA, Towards automatic detec- tion of misinformation in online medical videos, in 2019 International conference on multimodal interaction, 2019, pp. 235–243. [200] D. HOULI, M. L. RADFORD, AND V. K. SINGH, “covid19 is_”: The perpetuation of coronavirus conspiracy theories via google autocomplete, Proceedings of the Association for Information Science and Technology, 58 (2021), pp. 218–229. [201] D. HU, S. JIANG, R. E. ROBERTSON, AND C. WILSON, Auditing the partisanship of google search snippets, in The World Wide Web Conference, WWW ’19, New York, NY, USA, 2019, Association for Computing Machinery, p. 693–704. [202] D. HU, S. JIANG, R. E. ROBERTSON, AND C. WILSON, Auditing the partisanship of google search snippets, in The World Wide Web Conference, WWW ’19, ACM, 2019, pp. 693–704. [203] E. HUGHES, R. WANG, P. JUNEJA, T. MITRA, AND A. X. ZHANG, Introducing credibility signals and citations to video-sharing platforms, (2021). [204] K. HUNT, P. AGARWAL, AND J. ZHUANG, Monitoring misinformation on twitter during crisis events: a machine learning approach, Risk analysis, (2020). [205] E. HUSSEIN, P. JUNEJA, AND T. MITRA, Measuring misinformation in video search platforms: An audit study on youtube, Proceedings of the ACM on Human- Computer Interaction, 4 (2020), pp. 1–27. [206] A. HUTCHINSONA, Pinterest will limit search results for vaccine-related queries to content from official health outlets, (2019). [207] INFLUENCER, Find everything about youtube on noxinfluencer. https://www.noxinfluencer.com/, April 2021. 
(Accessed on 04/15/2021). [208] S. INSKEEP, Timeline: The false election fraud story trump told for months before jan. 6 : Npr. 239 BIBLIOGRAPHY https://www.npr.org/2021/02/08/965342252/timeline-what- trump-told-supporters-for-months-before-they-attacked, 2022. (Accessed on 04/18/2022). [209] INVID, Invid verification plugin - invid project. https://www.invid-project.eu/tools-and-services/invid- verification-plugin/, April 2021. (Accessed on 04/15/2021). [210] M. JACK, J. CHEN, AND S. J. JACKSON, Infrastructure as creative action: Online buying, selling, and delivery in phnom penh, in Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2017, pp. 6511–6522. [211] S. J. JACKSON, 11 rethinking repair, Media technologies: Essays on communica- tion, materiality, and society, (2014), pp. 221–39. [212] S. J. JACKSON, A. POMPE, AND G. KRIESHOK, Repair worlds: maintenance, repair, and ict for development in rural namibia, in Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work, 2012, pp. 107–116. [213] S. M. JANG AND J. K. KIM, Third person effects of fake news: Fake news regulation and media literacy interventions, Computers in human behavior, 80 (2018), pp. 295–302. [214] A. JAZEERA, Breaking news, world news and video from al jazeera | today’s latest from al jazeera. https://www.aljazeera.com/, April 2021. (Accessed on 04/15/2021). [215] S. JIANG, S. BAUMGARTNER, A. ITTYCHERIAH, AND C. YU, Factoring fact-checks: Structured information extraction from fact-checking articles, in Proceedings of The Web Conference 2020, 2020, pp. 1592–1603. [216] S. JIANG, R. E. ROBERTSON, AND C. WILSON, Bias misperceived: The role of partisanship and misinformation in youtube comment moderation, in Proceedings of the International AAAI Conference on Web and Social Media, vol. 13, 2019, pp. 278–289. 240 BIBLIOGRAPHY [217] S. JIANG AND C. WILSON, Linguistic signals under misinformation and fact-checking: Evidence from user comments on social media, Proceedings of the ACM on Human-Computer Interaction, 2 (2018), pp. 1–23. [218] D. JIMENEZ AND C. LI, An empirical study on identifying sentences with salient factual statements, in 2018 International Joint Conference on Neural Networks (IJCNN), IEEE, 2018, pp. 1–8. [219] M. JIROTKA, R. PROCTER, T. RODDEN, AND G. C. BOWKER, Collaboration in e- research, Computer Supported Cooperative Work (CSCW), 15 (2006), pp. 251– 255. [220] H. M. JOHNSON AND C. M. SEIFERT, Sources of the continued influence effect: When misinformation in memory affects later inferences., Journal of experimental psychology: Learning, memory, and cognition, 20 (1994), p. 1420. [221] G. JORIS, F. DE GROVE, K. VAN DAMME, AND L. DE MAREZ, News diversity reconsidered: A systematic literature review unraveling the diversity in conceptual- izations, Journalism Studies, 21 (2020), pp. 1893–1912. [222] P. JUNEJA, M. M. BHUIYAN, AND T. MITRA, Assessing enactment of content regulation policies: A post hoc crowd-sourced audit of election misinformation on youtube, in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–22. [223] P. JUNEJA AND T. MITRA, Auditing e-commerce platforms for algorithmically curated vaccine misinformation, in Proceedings of the 2021 chi conference on human factors in computing systems, 2021, pp. 1–27. [224] A. KAPLAN, Youtube has allowed conspiracy theories about interference with voting machines to go viral | media matters for america. 
https://www.mediamatters.org/google/youtube-has-allowed- conspiracy-theories-about-interference-voting-machines- go-viral, 2020. (Accessed on 09/08/2022). [225] G. KARADZHOV, P. NAKOV, L. MÀRQUEZ, A. BARRÓN-CEDEÑO, AND I. KOY- CHEV, Fully automated fact checking using external sources, arXiv preprint arXiv:1710.00341, (2017). 241 BIBLIOGRAPHY [226] H. KARASTI AND J. BLOMBERG, Studying infrastructuring ethnographically, Com- puter Supported Cooperative Work (CSCW), 27 (2018), pp. 233–265. [227] A. KARP AND B. PARDO, Hapteq: A collaborative tool for visually impaired audio producers, in Proceedings of the 12th International Audio Mostly Conference on Augmented and Participatory Sound and Music Experiences, 2017, pp. 1– 4. [228] Y. S. KARTAL, B. GUVENEN, AND M. KUTLU, Too many claims to fact-check: Priori- tizing political claims based on check-worthiness, arXiv preprint arXiv:2004.08166, (2020). [229] A. KATA, A postmodern pandora’s box: anti-vaccination misinformation on the internet, Vaccine, 28 (2010), pp. 1709–1716. [230] A. KAZEMI, K. GARIMELLA, G. K. SHAHI, D. GAFFNEY, AND S. A. HALE, Tiplines to combat misinformation on encrypted platforms: A case study of the 2019 indian election on whatsapp, arXiv preprint arXiv:2106.04726, (2021). [231] A. KERR AND J. D. KELLEHER, The recruitment of passion and community in the service of capital: Community managers in the digital games industry, Critical studies in media communication, 32 (2015), pp. 177–192. [232] A. KHARA, Iran: Over 700 dead after drinking alcohol to cure coronavirus | coron- avirus pandemic news | al jazeera, April 2020. (Accessed on 06/21/2023). [233] J. KIM, B. TABIBIAN, A. OH, B. SCHÖLKOPF, AND M. GOMEZ-RODRIGUEZ, Leveraging the crowd to detect and reduce the spread of fake news and misinforma- tion, in Proceedings of the eleventh ACM international conference on web search and data mining, 2018, pp. 324–332. [234] S. KIM, O. F. YALCIN, S. E. BESTVATER, K. MUNGER, B. L. MONROE, AND B. A. DESMARAIS, The effects of an informational intervention on attention to anti-vaccination content on youtube, in Proceedings of the International AAAI Conference on Web and Social Media, vol. 14, 2020, pp. 949–953. [235] C. KLIMAN-SILVER, A. HANNAK, D. LAZER, C. WILSON, AND A. MISLOVE, Location, location, location: The impact of geolocation on web search personalization, 242 BIBLIOGRAPHY in Proceedings of the 2015 Internet Measurement Conference, IMC ’15, ACM, 2015, pp. 121–127. [236] C. KLIMAN-SILVER, A. HANNAK, D. LAZER, C. WILSON, AND A. MISLOVE, Location, location, location: The impact of geolocation on web search personalization, in Proceedings of the 2015 Internet Measurement Conference, ACM, 2015, pp. 121–127. [237] P. KNIGHT, Outrageous conspiracy theories: Popular and official responses to 9/11 in germany and the united states, New German Critique, (2008), pp. 165–193. [238] S. KNOBLOCH-WESTERWICK, B. K. JOHNSON, N. A. SILVER, AND A. WESTER- WICK, Science exemplars in the eye of the beholder: How exposure to online science information affects attitudes, Science Communication, 37 (2015), pp. 575–601. [239] N. KOTONYA AND F. TONI, Explainable automated fact-checking for public health claims, arXiv preprint arXiv:2010.09926, (2020). [240] M. KRÜGER, A. WEIBERT, D. D. C. LEAL, D. RANDALL, AND V. WULF, It takes more than one hand to clap: On the role of ‘care’in maintaining design results., in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–14. [241] J. H. KUKLINSKI, P. J. QUIRK, J. JERIT, D. 
SCHWIEDER, AND R. F. RICH, Misin- formation and the currency of democratic citizenship, The Journal of Politics, 62 (2000), pp. 790–816. [242] J. KULSHRESTHA, M. ESLAMI, J. MESSIAS, M. B. ZAFAR, S. GHOSH, K. P. GUM- MADI, AND K. KARAHALIOS, Quantifying search bias: Investigating sources of bias for political searches in social media, in Proceedings of the 2017 ACM Con- ference on Computer Supported Cooperative Work and Social Computing, CSCW ’17, New York, NY, USA, 2017, Association for Computing Machinery, p. 417–432. [243] S. KUMAR, R. WEST, AND J. LESKOVEC, Disinformation on the web: Impact, char- acteristics, and detection of wikipedia hoaxes, in Proceedings of the 25th inter- national conference on World Wide Web, International World Wide Web Conferences Steering Committee, 2016, pp. 591–602. 243 BIBLIOGRAPHY [244] A. LAMPINEN, V. BELLOTTI, C. CHESHIRE, AND M. GRAY, Cscw and thesharing economy: The future of platforms as sites of work collaboration and trust, in Pro- ceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion, 2016, pp. 491–497. [245] W. LANGEWIESCHE, What really happened to malaysia’s missing airplane, (2019). [246] B. LATOUR AND S. WOOLGAR, Laboratory life: The construction of scientific facts, Princeton University Press, 2013. [247] J. LAXA, The consumption of disinformation as a health crisis, Journal of Public Health, 45 (2023), pp. e161–e161. [248] J. LAZAR, Public policy and hci: Making an impact in the future, Interactions, 22 (2015), p. 69–71. [249] B. LE, D. SPINA, F. SCHOLER, AND H. CHIA, A crowdsourcing methodology to measure algorithmic bias in black-box systems: A case study with covid-related searches, in Proceedings of the Third Workshop on Bias and Social Aspects in Search and Recommendation (Bias@ ECIR 2022), 2022. [250] D. LEADS, Data leads. https://dataleads.co.in/, April 2021. (Accessed on 04/15/2021). [251] J. LEBLAY, I. MANOLESCU, AND X. TANNIER, Computational fact-checking: Prob- lems, state of the art, and perspectives, in The Web Conference, 2018. [252] C. P. LEE, P. DOURISH, AND G. MARK, The human infrastructure of cyberinfrastruc- ture, in Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work, 2006, pp. 483–492. [253] S.-P. LEINO, S. LIND, M. POYADE, S. KIVIRANTA, P. MULTANEN, A. REYES- LECUONA, A. MÄKIRANTA, AND A. MUHAMMAD, Enhanced industrial main- tenance work task planning by using virtual engineering tools and haptic user interfaces., in HCI (13), 2009, pp. 346–354. [254] P. LEWIS AND E. MCCORMICK, How an ex-youtube insider investigated its secret algorithm, (2018). 244 BIBLIOGRAPHY [255] M. LEWIS-BECK, A. E. BRYMAN, AND T. F. LIAO, The Sage encyclopedia of social science research methods, Sage Publications, 2003. [256] C. LIMA, Youtube to remove videos claiming mass fraud changed election results - politico. https://www.politico.com/news/2020/12/09/youtube-videos- mass-fraud-election-results-443925, 2020. (Accessed on 09/08/2022). [257] C. LIMA AND A. SCHAFFER, Study finds social media posts about election fraud still prevalent - the washington post. https://www.washingtonpost.com/politics/2022/08/09/social- media-posts-about-election-fraud-still-prevalent-study- finds/, 2022. (Accessed on 09/06/2022). [258] P. LINARDATOS, V. PAPASTEFANOPOULOS, AND S. KOTSIANTIS, Explainable ai: A review of machine learning interpretability methods, Entropy, 23 (2021), p. 18. [259] C. LIOMA, J. G. SIMONSEN, AND B. 
LARSEN, Evaluation measures for relevance and credibility in ranked lists, in Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, 2017, pp. 91–98. [260] I. LITERAT, Y. K. CHANG, AND S.-Y. HSU, Gamifying fake news: Engaging youth in the participatory design of news literacy games, Convergence, 26 (2020), pp. 503– 516. [261] J. LOCHER, How to fight election misinformation in 2022 – brewminate: A bold blend of news and ideas. https://brewminate.com/how-to-fight-election- misinformation-in-2022/, 2022. (Accessed on 09/08/2022). [262] R. LUDOLPH, A. ALLAM, AND P. SCHULZ, Manipulating google's knowledge graph box to counter biased information processing during an online search on vaccination: Application of a technological debiasing strategy, Journal of Medical Internet Research, 18 (2016), p. e137. [263] N. LUNDBERG AND H. TELLIOG˘LU, Understanding complex coordination processes in health care, Scandinavian Journal of Information Systems, 11 (1999), p. 5. 245 BIBLIOGRAPHY [264] M. MACDONALD AND M. A. BROWN, Republican candidates are spreading more fake news than just two years ago - the washington post. https://www.washingtonpost.com/politics/2022/08/29/ republicans-democrats-misinformation-falsehoods/, 2022. (Accessed on 09/10/2022). [265] G. MARK, B. AL-ANI, AND B. SEMAAN, Repairing human infrastructure in war zones, Proceedings of ISCRAM, (2009), pp. 10–13. [266] R. MARKUSSEN, Politics of intervention in design: Feminist reflections on the scandi- navian tradition, ai & Society, 10 (1996), pp. 127–141. [267] M. MARS AND R. E. SCOTT, Whatsapp in clinical practice: A literature, The Promise of New Technologies in an Age of New Health Challenges, (2016), p. 82. [268] L. MATSAKIS, Facebook will crack down on anti-vaccine content, (2019). [269] L. MCDONALD AND C. O’DONOVAN, Youtube continues to promote anti-vax videos as facebook prepares to fight medical misinformation, (2019). [270] M. MCGEE, In quality raters’ handbook, google adds higher standards for “your money or your life” websites, (2013). [271] MEEDAN, Meedan. https://meedan.com/, April 2021. (Accessed on 04/15/2021). [272] P. MELO, J. MESSIAS, G. RESENDE, K. GARIMELLA, J. ALMEIDA, AND F. BEN- EVENUTO, Whatsapp monitor: A fact-checking system for whatsapp, in Proceed- ings of the International AAAI Conference on Web and Social Media, vol. 13, 2019, pp. 676–677. [273] P. MELO, J. MESSIAS, G. RESENDE, K. GARIMELLA, J. ALMEIDA, AND F. BEN- EVENUTO, Whatsapp monitor: A fact-checking system for whatsapp, Proceedings of the International AAAI Conference on Web and Social Media, 13 (2019), pp. 676–677. [274] A. MERELLI, The average anti-vaxxer is probably not who you think she is, (2015). 246 BIBLIOGRAPHY [275] D. METAXA, J. S. PARK, J. A. LANDAY, AND J. HANCOCK, Search media and elections: A longitudinal investigation of political search results, Proceedings of the ACM on Human-Computer Interaction, 3 (2019), pp. 1–17. [276] P. T. METAXAS AND Y. PRUKSACHATKUN, Manipulation of search engine results during the 2016 us congressional elections, (2017). [277] R. MIHAILA, How to stop a youtube channel from showing up in search results. https://www.makeuseof.com/block-youtube-channel-from- search-results, Feb 2023. (Accessed on 05/30/2023). [278] T. MILLER, Explanation in artificial intelligence: Insights from the social sciences, Artificial intelligence, 267 (2019), pp. 1–38. [279] T. MITRA, Understanding social media credibility, PhD thesis, Georgia Institute of Technology, 2017. [280] T. MITRA, S. 
COUNTS, AND J. W. PENNEBAKER, Understanding anti-vaccination attitudes in social media, in Tenth International AAAI Conference on Web and Social Media, 2016. [281] T. MITRA AND E. GILBERT, Credbank: A large-scale social media corpus with associ- ated credibility annotations, in Ninth International AAAI Conference on Web and Social Media, 2015. [282] T. MITRA, C. J. HUTTO, AND E. GILBERT, Comparing person-and process-centric strategies for obtaining quality data on amazon mechanical turk, in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015, pp. 1345–1354. [283] T. MITRA, G. P. WRIGHT, AND E. GILBERT, A parsimonious language model of social media credibility across disparate events, in Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing, 2017, pp. 126–145. [284] B. MØNSTED AND S. LEHMANN, Algorithmic detection and analysis of vaccine- denialist sentiment clusters in social networks, arXiv preprint arXiv:1905.12908, (2019). 247 BIBLIOGRAPHY [285] J. T. MORGAN, M. GILBERT, D. W. MCDONALD, AND M. ZACHRY, Project talk: Coordination work and group membership in wikiprojects, in Proceedings of the 9th International Symposium on Open Collaboration, 2013, pp. 1–10. [286] D. MOTA, C. V. DE CARVALHO, AND L. P. REIS, Fostering collaborative work between educators in higher education, in 2011 IEEE International Conference on Systems, Man, and Cybernetics, IEEE, 2011, pp. 1286–1291. [287] P. NAKOV, D. CORNEY, M. HASANAIN, F. ALAM, T. ELSAYED, A. BARRÓN- CEDEÑO, P. PAPOTTI, S. SHAAR, AND G. D. S. MARTINO, Automated fact- checking for assisting human fact-checkers, arXiv preprint arXiv:2103.07769, (2021). [288] P. M. NAPOLI, Exposure diversity reconsidered, Journal of information policy, 1 (2011), pp. 246–259. [289] B. A. NARDI AND Y. ENGESTRÖM, A web on the wind: The structure of invisible work, Computer supported cooperative work, 8 (1999), pp. 1–8. [290] NASA, Nasa facts, (2001). [291] B. NEWS, 9/11 conspiracy theories: How they’ve evolved, (2011). [292] A. T. NGUYEN, A. KHAROSEKAR, S. KRISHNAN, S. KRISHNAN, E. TATE, B. C. WALLACE, AND M. LEASE, Believe it or not: Designing a human-ai partnership for mixed-initiative fact-checking, in Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology, 2018, pp. 189–199. [293] L. U. NGUYEN, Infrastructural action in vietnam: Inverting the techno-politics of hacking in the global south, New Media & Society, 18 (2016), pp. 637–652. [294] N. OCEANIC AND A. ADMINISTRATION, Do contrails affect conditions on the surface?, (2016). [295] A. OELDORF-HIRSCH, M. SCHMIERBACH, A. APPELMAN, AND M. P. BOYLE, The ineffectiveness of fact-checking labels on news memes and articles, Mass Com- munication and Society, 23 (2020), pp. 682–704. [296] H. OF COMMONS, Public administration and constitutional affairs committee, oral evidence: Governance of statistics, 2019. 248 BIBLIOGRAPHY [297] A. OLSHANSKY, Conspiracy theorizing and religious motivated reasoning: Why the earth 'must' be flat, (2018). [298] W. H. ORGANIZATION, Mmr and autism, (2019). [299] , Ten threats to global health in 2019, 2019. [300] L. H. OWEN, Republicans seem more susceptible to fake news than democrats (but liberals, don’t feel too comfy yet) | nieman journalism lab. https://www.niemanlab.org/2017/05/republicans-seem-more- susceptible-to-fake-news-than-democrats-but-liberals- dont-feel-too-comfy-yet/, 2017. (Accessed on 09/14/2022). [301] L. H. 
OWEN, One group that’s really benefited from covid-19: Anti-vaxxers, (2020). [302] A. PÁEZ, The pragmatic turn in explainable artificial intelligence (xai), Minds and Machines, 29 (2019), pp. 441–459. [303] K. PAPADAMOU, S. ZANNETTOU, J. BLACKBURN, E. DE CRISTOFARO, G. STRINGHINI, AND M. SIRIVIANOS, “it is just a flu”: Assessing the effect of watch history on youtube’s pseudoscientific video recommendations, in Proceed- ings of the International AAAI Conference on Web and Social Media, vol. 16, 2022, pp. 723–734. [304] E. PARISER, The filter bubble: How the new personalized web is changing what we read and how we think, Penguin, 2011. [305] F. PASQUALE, Beyond innovation and competition: The need for qualified transparency in internet intermediaries, Nw. UL Rev., 104 (2010), p. 105. [306] F. PASQUALE, Restoring transparency to automated authority, J. on Telecomm. & High Tech. L., 9 (2011), p. 235. [307] S. R. PENDSE, F. M. LALANI, M. DE CHOUDHURY, A. SHARMA, AND N. KU- MAR, " like shock absorbers": Understanding the human infrastructures of technology-mediated mental health support, in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–14. 249 BIBLIOGRAPHY [308] G. PENNYCOOK, T. D. CANNON, AND D. G. RAND, Prior exposure increases perceived accuracy of fake news., Journal of experimental psychology: general, (2018). [309] A. PERRIN, Book reading 2016, (2016). [310] A. PERRIN AND M. ANDERSON, Social media usage in the u.s. in 2019 | pew research center. https://www.pewresearch.org/fact-tank/2019/04/10/share- of-u-s-adults-using-social-media-including-facebook-is- mostly-unchanged-since-2018/, 2019. (Accessed on 09/08/2022). [311] PESACHECK, Pesacheck. https://pesacheck.org/, April 2021. (Accessed on 04/15/2021). [312] L. PETERSON, T. ANDERSON, D. CULLER, AND T. ROSCOE, A blueprint for introducing disruptive technology into the internet, SIGCOMM Computer Com- munication Review, 33 (2003), pp. 59–64. [313] S. PHADKE AND T. MITRA, Many faced hate: A cross platform study of content framing and information sharing by online hate groups, in Proceedings of the 2020 CHI conference on human factors in computing systems, 2020, pp. 1–13. [314] T. W. POST, About the fact checker - the washington post. https://www.washingtonpost.com/politics/2019/01/07/about- fact-checker/, April 2021. (Accessed on 04/15/2021). [315] POYNTER, The international fact-checking network, accessed in March, 2021. [316] PRERNA JUNEJA AND T. MITRA, Algorithmic nudge: Using xai frameworks to design interventions, CHI 2021 Workshop on Operationalizing Human-centered Perspectives in Explainable AI, (2021). [317] PRERNA JUNEJA AND T. MITRA, Human and technological infrastructures of fact- checking, Proc. ACM Hum.-Comput. Interact., (2022). 250 BIBLIOGRAPHY [318] PRERNA JUNEJA, D. RAMA SUBRAMANIAN, AND T. MITRA, Through the looking glass: Study of transparency in reddit’s moderation practices, Proc. ACM Hum.- Comput. Interact., 4 (2020). [319] S. PULLAN AND M. DEY, Vaccine hesitancy and anti-vaccination in the time of covid-19: A google trends analysis, Vaccine, 39 (2021), pp. 1877–1881. [320] K. PURCELL, Findings: Search and email remain the top online activities| pew internet & american life project, Pew Research Center’s Internet & American Life Project, (2011). [321] V. QAZVINIAN, E. ROSENGREN, D. R. RADEV, AND Q. 
MEI, Rumor has it: Identify- ing misinformation in microblogs, in Proceedings of the conference on empirical methods in natural language processing, Association for Computational Linguistics, 2011, pp. 1589–1599. [322] T. QUINT, Latest news, breaking news live, top news headlines, viral videos news updates - the quint. https://www.thequint.com/, April 2021. (Accessed on 04/15/2021). [323] L. RAINIE AND S. FOX, The online health care revolution, Pew Research Center, (2000). [324] I. D. RAJI AND J. BUOLAMWINI, Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial ai products, in Proceed- ings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019, pp. 429–435. [325] L. B. RASMUSSEN, From human-centred to human-context centred approach: looking back over ‘the hills’, what has been gained and lost?, Ai & Society, 21 (2007), pp. 471–495. [326] T. N. REPUBLIC, The new republic. https://newrepublic.com/, April 2021. (Accessed on 04/15/2021). [327] P. RESNICK, S. CARTON, S. PARK, Y. SHEN, AND N. ZEFFER, Rumorlens: A system for analyzing the impact of rumors and corrections in social media, in Proc. Computational Journalism Conference, vol. 5, 2014. 251 BIBLIOGRAPHY [328] M. REYNOLDS, Amazon sells ’autism cure’ books that suggest children drink toxic, bleach-like substances, (2019). [329] S. T. ROBERTS, Behind the screen: The hidden digital labor of commercial content moderation, PhD thesis, University of Illinois at Urbana-Champaign, 2014. [330] R. E. ROBERTSON, S. JIANG, K. JOSEPH, L. FRIEDLAND, D. LAZER, AND C. WIL- SON, Auditing partisan audience bias within google search, Proceedings of ACM on Human Computer Interaction, 2 (2018), pp. 148:1–148:22. [331] R. E. ROBERTSON, S. JIANG, K. JOSEPH, L. FRIEDLAND, D. LAZER, AND C. WIL- SON, Auditing partisan audience bias within google search, Proceedings of the ACM on Human-Computer Interaction, 2 (2018), pp. 1–22. [332] R. E. ROBERTSON, D. LAZER, AND C. WILSON, Auditing the personalization and composition of politically-related search engine results pages, in Proceedings of the 2018 World Wide Web Conference, WWW ’18, International World Wide Web Conferences Steering Committee, 2018, pp. 955–965. [333] J. J. ROBINSON, J. MADDOCK, AND K. STARBIRD, Examining the role of human and technical infrastructure during emergency response., in ISCRAM, 2015. [334] S. RODDY, Recent updates to amazon verified purchase reviews, 2019. [335] A. RODRIGUEZ, Youtube’s algorithms can drag you down a rabbit hole of conspiracies, researcher finds, (2018). [336] J. ROOZENBEEK AND S. VAN DER LINDEN, Fake news game confers psychological resistance against online misinformation, Palgrave Communications, 5 (2019), pp. 1–10. [337] J. ROOZENBEEK AND S. VAN DER LINDEN, Breaking harmony square: A game that “inoculates” against political misinformation, The Harvard Kennedy School Misinformation Review, (2020). [338] A. ROVETTA, Infodemic emergency in italy: A longitudinal analysis of the web interest in sources of dis-misinformation, epidemiologically dangerous behaviors, and vaccine hesitancy during covid-19, (2022). 252 BIBLIOGRAPHY [339] N. RUMMEL, H. SPADA, F. HERMANN, F. CASPAR, AND K. SCHORNSTEIN, Promoting the coordination of computer-mediated interdisciplinary collaboration, (2002). [340] N. SAMBASIVAN AND T. SMYTH, The human infrastructure of ictd, in Proceed- ings of the 4th ACM/IEEE international conference on information and communication technologies and development, 2010, pp. 1–9. [341] M. SAMORY AND T. 
[342] M. SAMORY AND T. MITRA, ’the government spies using our webcams’: The language of conspiracy theories in online discussions, Proceedings of the ACM on Human-Computer Interaction, 2 (2018), pp. 1–24.
[343] C. SANDVIG, K. HAMILTON, K. KARAHALIOS, AND C. LANGBORT, An algorithm audit, Data and Discrimination: Collected Essays. Washington, DC: New America Foundation, (2014), pp. 6–10.
[344] C. SANDVIG, K. HAMILTON, K. KARAHALIOS, AND C. LANGBORT, Auditing algorithms: Research methods for detecting discrimination on internet platforms, Data and discrimination: converting critical concerns into productive inquiry, 22 (2014).
[345] S. SAWYER AND A. TAPIA, Always articulating: Theorizing on mobile and wireless technologies, The Information Society, 22 (2006), pp. 311–323.
[346] N. SCHAROWSKI, Transparency and Trust in AI, PhD thesis, Institute of Psychology, 2020.
[347] A. B. SCHIFF, 090821_letter to amazon.pdf. https://schiff.house.gov/imo/media/doc/090821_Letter%20to%20Amazon.pdf, Sep 2021. (Accessed on 06/22/2023).
[348] P. SCHMIDT, F. BIESSMANN, AND T. TEUBNER, Transparency and trust in artificial intelligence systems, Journal of Decision Systems, 29 (2020), pp. 260–278.
[349] N. F. SCHNEIDEWIND, The state of software maintenance, IEEE Transactions on Software Engineering, (1987), pp. 303–310.
[350] G. SCHWITZER, Pollution of health news, 2017.
[351] S. SCUTTI, Facebook to target vaccine misinformation with focus on pages, groups, ads, (2019).
[352] S. SEARCHER, Social searcher - free social media search engine. https://www.social-searcher.com/, April 2021. (Accessed on 04/15/2021).
[353] A. SEITZ, In election misinformation fight, ’2020 changed everything’ | ap news. https://apnews.com/article/2022-midterm-elections-voting-rights-technology-business-social-media-f5ba340c7a98f6f058fb3afac74a26bb, 2022. (Accessed on 09/10/2022).
[354] J. C. M. SERRANO, O. PAPAKYRIAKOPOULOS, AND S. HEGELICH, Nlp-based feature extraction for the detection of covid-19 misinformation videos on youtube, in Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, 2020.
[355] V. SESSIONS AND M. VALTORTA, The effects of data quality on machine learning algorithms, ICIQ, 6 (2006), pp. 485–498.
[356] C. SHAO, G. L. CIAMPAGLIA, A. FLAMMINI, AND F. MENCZER, Hoaxy: A platform for tracking online misinformation, in Proceedings of the 25th international conference companion on world wide web, 2016, pp. 745–750.
[357] G. SHEPARD, The Real Watergate Scandal: Collusion, Conspiracy, and the Plot That Brought Nixon Down, Simon and Schuster, 2015.
[358] J. SHIN AND T. VALENTE, Algorithms and health misinformation: A case study of vaccine books on amazon, Journal of Health Communication, (2020), pp. 1–8.
[359] P. SHIRALKAR, A. FLAMMINI, F. MENCZER, AND G. L. CIAMPAGLIA, Finding streams in knowledge graphs to support fact checking, in 2017 IEEE International Conference on Data Mining (ICDM), IEEE, 2017, pp. 859–864.
[360] J. SHUMWAY, Oregon gop frontrunner for governor embraces claims of election fraud – oregon capital chronicle. https://oregoncapitalchronicle.com/2022/02/01/oregon-gop-frontrunner-for-governor-embraces-claims-of-election-fraud/, 2022. (Accessed on 09/08/2022).
[361] J. SIMKO, M. TOMLEIN, B. PECHER, R. MORO, I. SRBA, E. STEFANCOVA, A. HRCKOVA, M. KOMPAN, J. PODROUZEK, AND M. BIELIKOVA, Towards continuous automatic audits of social media adaptive behavior and its role in misinformation spreading, in Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, 2021, pp. 411–414.
[362] S. C. SIVEK AND S. BLOYD-PESHKIN, Where do facts matter? the digital paradox in magazines’ fact-checking processes, Journalism Practice, 13 (2019), pp. 998–1002.
[363] A. SMITH, V. KUMAR, J. BOYD-GRABER, K. SEPPI, AND L. FINDLATER, Closing the loop: User-centered design and evaluation of a human-in-the-loop topic modeling system, in 23rd International Conference on Intelligent User Interfaces, 2018, pp. 293–304.
[364] P. L. SOO RIN KIM, LAURA ROMERO AND K. HOLLAND, With 10 weeks until midterms, election deniers are hampering some election preparations - abc news. https://abcnews.go.com/US/10-weeks-midterms-election-deniers-hampering-election-preparations/story?id=89007798, 2022. (Accessed on 09/07/2022).
[365] A. SPAA, A. DURRANT, C. ELSDEN, AND J. VINES, Understanding the boundaries between policymaking and hci, in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, New York, NY, USA, 2019, Association for Computing Machinery, pp. 1–15.
[366] SPOONBILL, Spoonbill. https://spoonbill.io/, April 2021. (Accessed on 04/15/2021).
[367] S. L. STAR AND K. RUHLEDER, Steps toward an ecology of infrastructure: Design and access for large information spaces, Information systems research, 7 (1996), pp. 111–134.
[368] S. L. STAR AND A. STRAUSS, Layers of silence, arenas of voice: The ecology of visible and invisible work, Computer supported cooperative work (CSCW), 8 (1999), pp. 9–30.
[369] M. STEIGER, T. J. BHARUCHA, S. VENKATAGIRI, M. J. RIEDL, AND M. LEASE, The psychological well-being of content moderators, (2021).
[370] M. STENCEL, Number of fact-checking outlets surges to 188 in more than 60 countries, 2019.
[371] H. STEPNICK, How will social media platforms respond to election misinformation? it isn’t clear - poynter. https://www.poynter.org/fact-checking/2022/how-will-social-media-platforms-respond-to-election-misinformation-it-isnt-clear/, 2022. (Accessed on 09/08/2022).
[372] A. STISEN, N. VERDEZOTO, H. BLUNCK, M. B. KJÆRGAARD, AND K. GRØNBÆK, Accounting for the invisible work of hospital orderlies: Designing for local and global coordination, in Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, 2016, pp. 980–992.
[373] N. J. STROUD, Polarization and partisan selective exposure, Journal of communication, 60 (2010), pp. 556–576.
[374] C. R. SUNSTEIN, Conspiracy theories and other dangerous ideas, Simon and Schuster, 2014.
[375] C. TANG, Y. CHEN, B. C. SEMAAN, AND J. A. ROBERSON, Restructuring human infrastructure: The impact of ehr deployment in a volunteer-dependent clinic, in Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 2015, pp. 649–661.
[376] Y. R. TAUSCZIK AND J. W. PENNEBAKER, The psychological meaning of words: Liwc and computerized text analysis methods, Journal of language and social psychology, 29 (2010), pp. 24–54.
[377] N. TAYLOR, K. CHEVERST, P. WRIGHT, AND P. OLIVIER, Leaving the wild: lessons from community technology handovers, in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2013, pp. 1549–1558.
[378] A. O. A. M. TEAM, 45% of american adults doubt vaccine safety, according to survey, (2019).
[379] T. Y. TEAM, Continuing our work to improve recommendations on youtube, (2019).
[380] D. TEYSSOU, J.-M. LEUNG, E. APOSTOLIDIS, K. APOSTOLIDIS, S. PAPADOPOULOS, M. ZAMPOGLOU, O. PAPADOPOULOU, AND V. MEZARIS, The invid plug-in: web video verification on the browser, in Proceedings of the first international workshop on multimedia verification, 2017, pp. 23–30.
[381] THE WASHINGTON POST, Fact checker - the washington post. https://www.washingtonpost.com/news/fact-checker/, April 2021. (Accessed on 04/15/2021).
[382] THE YOUTUBE TEAM, Managing harmful conspiracy theories on youtube. https://blog.youtube/news-and-events/harmful-conspiracy-theories-youtube/, 2020. (Accessed on 09/08/2022).
[383] L. THOMAS, Pins and needles: Pinterest tackles spread of vaccine misinformation, (2019).
[384] A. THOMPSON, Trump deploys youtube as his secret weapon in 2020 - politico. https://www.politico.com/news/2020/09/06/trumpyoutube-election-comeback-408576, 2020. (Accessed on 09/08/2022).
[385] J. THORNE, A. VLACHOS, C. CHRISTODOULOPOULOS, AND A. MITTAL, Fever: a large-scale dataset for fact extraction and verification, arXiv preprint arXiv:1803.05355, (2018).
[386] L. A. TIMES, Man inspired by false ‘pizzagate’ rumor on internet pleads guilty to shooting at d.c. restaurant, (2017).
[387] T. N. Y. TIMES, The new york times/cbs news poll, (2004).
[388] D. TINGLEY AND G. WAGNER, Solar geoengineering and the chemtrails conspiracy on social media, Palgrave Communications, 3 (2017), p. 12.
[389] P. TOLMIE, R. PROCTER, D. W. RANDALL, M. ROUNCEFIELD, C. BURGER, G. WONG SAK HOI, A. ZUBIAGA, AND M. LIAKATA, Supporting the use of user generated content in journalistic practice, in Proceedings of the 2017 chi conference on human factors in computing systems, 2017, pp. 3632–3644.
[390] M. TOMLEIN, B. PECHER, J. SIMKO, I. SRBA, R. MORO, E. STEFANCOVA, M. KOMPAN, A. HRCKOVA, J. PODROUZEK, AND M. BIELIKOVA, An audit of misinformation filter bubbles on youtube: Bubble bursting and recent behavior changes, in Fifteenth ACM Conference on Recommender Systems, 2021, pp. 1–11.
[391] J. M. P. TORNERO, S. S. TAYIE, S. TEJEDOR, AND C. PULIDO, How to confront fake news through news literacy? state of the art, Doxa Comunicación, (2018).
[392] TRELLO, Trello. https://trello.com/en-US, April 2021. (Accessed on 04/15/2021).
[393] D. TRIELLI AND N. DIAKOPOULOS, Search as news curator: The role of google in shaping attention to news information, in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, ACM, 2019, pp. 453:1–453:15.
[394] D. TRIELLI AND N. DIAKOPOULOS, Search as news curator: The role of google in shaping attention to news information, in Proceedings of the 2019 CHI Conference on human factors in computing systems, 2019, pp. 1–15.
[395] D. TUFFLEY, Mind the gap, International Journal of Sociotechnology and Knowledge Development (IJSKD), 1 (2009), pp. 58–69.
[396] K. T. UNRUH AND W. PRATT, The invisible work of being a patient and implications for health care: “[the doctor is] my business partner in the most important business in my life, staying alive.”, in Ethnographic Praxis in Industry Conference Proceedings, vol. 2008, Wiley Online Library, 2008, pp. 40–50.
[397] E. VAN COUVERING, Is relevance relevant? market, science, and war: Discourses of search engine quality, Journal of Computer-Mediated Communication, 12 (2007), pp. 866–887.
[398] S. VAN DER LINDEN, Misinformation: susceptibility, spread, and interventions to immunize the public, Nature Medicine, 28 (2022), pp. 460–467.
[399] T. G. VAN DER MEER AND Y. JIN, Seeking formula for misinformation treatment in public health crises: The effects of corrective information type and source, Health Communication, 35 (2020), pp. 560–575.
[400] M. VAN DER MEULEN AND W. G. REIJNIERSE, Factcorp: A corpus of dutch fact-checks and its multiple usages, in Proceedings of The 12th Language Resources and Evaluation Conference, 2020, pp. 1286–1292.
[401] N. VERDEZOTO, N. BAGALKOT, S. Z. AKBAR, S. SHARMA, N. MACKINTOSH, D. HARRINGTON, AND P. GRIFFITHS, The invisible work of maintenance in community health: Challenges and opportunities for digital health to support frontline health workers in karnataka, south india, Proceedings of the ACM on Human-Computer Interaction, 5 (2021), pp. 1–31.
[402] G. VERMA, A. BHARDWAJ, T. ALEDAVOOD, M. DE CHOUDHURY, AND S. KUMAR, Examining the impact of sharing covid-19 misinformation online on mental health, Scientific Reports, 12 (2022), pp. 1–9.
[403] R. VERN AND S. K. DUBEY, Evaluating the maintainability of a software system by using fuzzy logic approach, Int. J. Information Technology and Computer Science, 7 (2014), pp. 67–72.
[404] Y. VIEWERS, Learn about watch history on youtube - youtube. https://www.youtube.com/watch?v=YbWZcgOYHAc&ab_channel=YouTubeViewers, 2022. (Accessed on 02/07/2022).
[405] N. VINCENT, B. HECHT, AND S. SEN, “data strikes”: Evaluating the effectiveness of a new form of collective action against technology companies, in The World Wide Web Conference, 2019, pp. 1931–1943.
[406] N. VINCENT, I. JOHNSON, P. SHEEHAN, AND B. HECHT, Measuring the importance of user-generated content to search engines, Proceedings of the International AAAI Conference on Web and Social Media, 13 (2019), pp. 505–516.
[407] N. VINCENT, I. JOHNSON, P. SHEEHAN, AND B. HECHT, Measuring the importance of user-generated content to search engines, in Proceedings of the International AAAI Conference on Web and Social Media, vol. 13, 2019, pp. 505–516.
[408] J. VITAK, P. ZUBE, A. SMOCK, C. T. CARR, N. ELLISON, AND C. LAMPE, It’s complicated: Facebook users’ political participation in the 2008 election, CyberPsychology, behavior, and social networking, 14 (2011), pp. 107–114.
[409] D. WAKABAYASHI, Election misinformation continues staying up on youtube - the new york times. https://www.nytimes.com/2020/11/10/technology/election-misinformation-continues-staying-up-on-youtube.html. (Accessed on 09/08/2022).
[410] W. Y. WANG, "liar, liar pants on fire": A new benchmark dataset for fake news detection, arXiv preprint arXiv:1705.00648, (2017).
[411] X. WANG, N. GOLBANDI, M. BENDERSKY, D. METZLER, AND M. NAJORK, Position bias estimation for unbiased learning to rank in personal search, in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, ACM, 2018, pp. 610–618.
[412] B. WASSON, Identifying coordination agents for collaborative telelearning, International Journal of Artificial Intelligence in Education (IJAIED), 9 (1998), pp. 275–299.
[413] W. WEBBER, A. MOFFAT, AND J. ZOBEL, A similarity measure for indefinite rankings, ACM Transactions on Information Systems (TOIS), 28 (2010), pp. 1–38.
[414] M. WEBSTER, Stringer definition & meaning - merriam-webster. https://www.merriam-webster.com/dictionary/stringer, 2022. (Accessed on 04/17/2022).
[415] C. G. WEISSMAN, Despite recent crackdown, youtube still promotes plenty of conspiracies, (2019).
[416] C. G. WEISSMAN, Despite recent crackdown, youtube still promotes plenty of conspiracies, (2019).
[417] J. WHITTAKER, S. LOONEY, A. REED, AND F. VOTTA, Recommender systems and the amplification of extremist content, Internet Policy Review, 10 (2021), pp. 1–29.
[418] WHOIS.NET, Whois lookup & ip | whois.net. https://www.whois.net/, April 2021. (Accessed on 04/15/2021).
[419] WIKIPEDIA, Conspiracy theory, (2002).
[420] WIKIPEDIA, Malaysia airlines flight 370, (2019).
[421] WIKIPEDIA, Project mkultra, (2019).
[422] WIKIPEDIA CONTRIBUTORS, 2009 swine flu pandemic, (2020).
[423] C. WILSON, The promise and peril of algorithm audits for increasing transparency and accountability of donated datasets, (2019).
[424] S. WINEBURG AND S. MCGREW, Lateral reading: Reading less and learning more when evaluating digital information, (2017).
[425] M. WOOD, Has the internet been good for conspiracy theorising, PsyPAG Quarterly, 88 (2013), pp. 31–34.
[426] WORLD HEALTH ORGANIZATION, Six common misconceptions about immunization, (2019).
[427] Z. WU, Y. LIU, Q. ZHANG, K. WU, M. ZHANG, AND S. MA, The influence of image search intents on user behavior and satisfaction, in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, ACM, 2019, pp. 645–653.
[428] W. XIAO, H. ZHAO, H. PAN, Y. SONG, V. W. ZHENG, AND Q. YANG, Beyond personalization: Social content recommendation for creator equality and consumer satisfaction, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 235–245.
[429] X. XIE, Y. LIU, M. DE RIJKE, J. HE, M. ZHANG, AND S. MA, Why people search for images using web search engines, in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, ACM, 2018, pp. 655–663.
[430] D. XIN, L. MA, J. LIU, S. MACKE, S. SONG, AND A. PARAMESWARAN, Accelerating human-in-the-loop machine learning: Challenges and opportunities, in Proceedings of the second workshop on data management for end-to-end machine learning, 2018, pp. 1–4.
[431] YOUGOV, Most flat earthers consider themselves very religious, (2018).
[432] D. G. YOUNG, K. H. JAMIESON, S. POULSEN, AND A. GOLDRING, Fact-checking effectiveness as a function of format and tone: Evaluating factcheck.org and flackcheck.org, Journalism & Mass Communication Quarterly, 95 (2018), pp. 49–75.
[433] YOUTUBE, Supporting the 2020 u.s. election, (2020).
[434] YOUTUBE, Youtube community guidelines, (2020).
[435] YOUTUBE, View or delete search history - computer - youtube help. https://support.google.com/youtube/answer/57711?co=GENIE.Platform%3DDesktop&hl=en, 2022. (Accessed on 02/07/2022).
[436] YOUTUBE, Elections misinformation policies - youtube help. https://support.google.com/youtube/answer/10835034?hl=en. (Accessed on 09/08/2022).
[437] YOUTUBE, Browse youtube while incognito on mobile devices - youtube help, (Accessed on 09/14/2022).
[438] Q. ZHANG, A. LIPANI, S. LIANG, AND E. YILMAZ, Reply-aided detection of misinformation via bayesian deep learning, in The world wide web conference, 2019, pp. 2333–2343.
[439] M. ZONIS AND C. M. JOSEPH, Conspiracy thinking in the middle east, Political Psychology, (1994), pp. 443–459.