
Characterizing Misinformed Online Health Communities

Posted on 2020-09-22, 14:06, authored by Shahan Ali Memon
Do vaccines cause autism? Does drinking bleach cure coronavirus? Is polio vaccination a ploy to sterilize and reduce the population? From a medical perspective, the definitive answer to all of these questions is "no". And yet, the web is populated by a cacophony of mixed opinions on these issues, driven by the proliferation of public health misinformation. Health-related misinformation has detrimental effects on public health, and debunking it is a challenging task. Because misinformed sub-communities discourage differing beliefs, public health practitioners and policy makers must grapple with the challenge of penetrating these communities to disseminate facts or conduct any message-based intervention. Combating the spread of false information by differentially promoting or censoring content, or by broadcasting facts, does not work. Instead, there is a need to communicate strategically with misinformed communities, which requires a thorough understanding of what constitutes an effective communication paradigm for debunking such myths.

For an effective message-based intervention, it is imperative to focus on preference-based framing, in which the preferences of the target sub-community are taken into consideration. These preferences can be defined over two main aspects: (i) who should deliver the message, and (ii) what the message should be. Choosing the right messenger(s) requires an understanding of how these online communities interact, which can be gained by analyzing their network structures. Choosing the content of the message, on the other hand, requires a thorough understanding of the language choices these communities make and how those choices reflect their non-negotiable social identities.

In this work, we identify two different online health communities: (i) vaccination sub-communities and (ii) COVID-19 misinformation sub-communities. In the first part of this thesis, we characterize the network structure and sociolinguistic variation of the competing online vaccination sub-communities to understand their linguistic choices and motivations. With the emergence of the COVID-19 pandemic, political and medical misinformation has escalated into what is commonly referred to as the global infodemic. Thus, in the second part, we first introduce a novel Twitter dataset, CMU-MisCov19, annotated for different COVID-19 themes. We then use this dataset to characterize the competing COVID-19 misinformation sub-communities. Our analyses show that the competing sub-communities within each part differ significantly in their communication patterns, and that these differences can be leveraged to design better message interventions. We also make our annotated dataset available for the community to use in further analysis.


CMLH (Center for Machine Learning and Health)




Degree Type

Master's Thesis


Language Technologies Institute

Degree Name

  • Master of Science (MS)


Kathleen M. Carley