Carnegie Mellon University

Computational Models of Identity Presentation in Language

thesis
posted on 2023-01-06, 21:38 authored by Michael Yoder

Researchers in computer science and natural language processing have built models of how language use varies according to the latent, stable identities of language users. However, such a conception of identity is ill-equipped to investigate dynamic, contextual expressions of identity in online communities. This thesis draws on theories from sociolinguistics, linguistic anthropology, and the social sciences that view identity not as fixed and predetermined, but constructed in language and interaction. We pair these theories with techniques from machine learning, statistics, and natural language processing in a new framework for computational investigations of identity presentation in online communities. This framework identifies the role identity presentation plays in online contexts, as well as its relationship to social interaction and outcomes within and outside particular online communities. We demonstrate this framework on datasets of linguistic and social interaction from two online contexts known for identity talk, especially regarding gender and sexuality. The first context is Tumblr, a social media and micro-blogging site, where we examine direct identity presentation of users in textual self-descriptions and communities in network data. The second context is fanfiction, narratives that transform and expand on original media, where we investigate the indirect presentation of character identity in narrative. 

With data from Tumblr, we first examine associations between self-presented identity labels and content sharing through the network of users. We extract both identity categories and specific labels presented by Tumblr users in free-text bio boxes and consider whether similarities or differences in self-presentation affect the propagation of content. To test this, we use self-presentation features in a machine learning task predicting whether a user will share content from another user. We find that identity features provide an informative signal often overlooked in previous work on content propagation. Interpreting the learned feature weights in a linear model, we find that alignment on different “levels” of identity self-presentation (broad categories or exact label matches) has differing effects on content propagation in a social network. Interactions between labels that indicate shared experience or values, such as conceptualizations of gender, are particularly informative. Though we cannot directly observe the construction of social solidarity or alignment that comes from self-presentation in language, in this way we are able to use computational tools to discover its effects.
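As a rough illustration of what alignment at two "levels" could look like as model input, the sketch below builds binary features for a pair of users. The data representation, the function name `alignment_features`, and the feature names are assumptions made for this example; they are not taken from the thesis.

```python
from typing import Dict, Set

# Hypothetical representation: each user's bio is reduced to a mapping from
# an identity category (e.g. "gender") to the set of labels they present.
Identity = Dict[str, Set[str]]


def alignment_features(a: Identity, b: Identity) -> Dict[str, int]:
    """Binary features for category-level and exact-label alignment."""
    features: Dict[str, int] = {}
    for category in sorted(set(a) | set(b)):
        labels_a = a.get(category, set())
        labels_b = b.get(category, set())
        # Category level: both users present some label in this category.
        features[f"both_present:{category}"] = int(bool(labels_a) and bool(labels_b))
        # Label level: the users share at least one identical label.
        features[f"label_match:{category}"] = int(bool(labels_a & labels_b))
    return features


# Toy users, invented for illustration only.
follower = {"gender": {"nonbinary"}, "age": {"22"}}
followee = {"gender": {"nonbinary", "trans"}, "sexuality": {"bi"}}
feats = alignment_features(follower, followee)
```

Features of this kind could be passed to any linear classifier, whose learned weights per feature can then be compared across the category level and the label level.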

Identity alignment plays a role in how content propagates on Tumblr, but communities are also salient on this social media site. From communities emergent from network connections among users, we investigate the effect of community alignment on content propagation. We find a non-random association, yet this effect is quite small compared with the influence of features of the content. The most informative content features relate to communities, however, which points to the importance of community-based identity organized around content rather than direct network connections among users on Tumblr.

We then use this framework to examine how implicit identity positioning in narrative is associated with social change within and beyond online communities. For this we use corpora of fanfiction, stories written by fans that expand or change original narratives from TV shows, comics, books and movies. To extract which characters are present in stories and which text is used to portray them, we first introduce and evaluate a processing pipeline that adapts natural language processing tools for entity coreference and quote attribution to the fanfiction domain of varied, informal narrative.

Using this pipeline, we investigate a suite of computational approaches that use word vectors to represent characters and relationships between them in narrative. In particular, we investigate the ability of such approaches to identify when fanfiction writers change the depiction of a relationship between characters from its portrayal in the original, source narrative as romantic or not. A qualitative analysis reveals that this predictive model picks up on emotionally intense language used around relationships that have been changed from original media. We also construct visualizations of the learned representations for characters and analyze the extent to which contrasts in these representations reflect contrasts in the positioning of characters between original and derived works. However, it remains difficult to separate social phenomena from more surface-level features of language, such as genre.

Fanfiction has been considered a “queer space” for its considerable representation of same-gender relationships. The community of online fanfiction readers and writers has grown at the same time as significant offline action by the LGBTQ social movement for rights. We investigate how the representation of LGBTQ characters, in number and in textual portrayal, varies along with changes in the offline social movement. We find that representations of characters in same-gender marriages in fanfiction increase after US marriage equality in 2015. We also find correlations between the representation of LGBTQ characters in fanfiction and trends in mainstream news coverage of LGBTQ issues from 2010 to 2020. Cultural events such as Pride drive this effect more than legal and law-focused news, suggesting the nature of the relationship between fanfiction and the LGBTQ social movement.
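One simple way to quantify how two such time series move together is a Pearson correlation coefficient. The sketch below is a generic implementation, not the analysis used in the thesis; the function name `pearson` is chosen for this example, and the two series are placeholder values, not data from the thesis.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from each series' mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder series (NOT from the thesis): e.g. a per-period share of stories
# and a per-period count of news articles, aligned by time period.
fic_share = [1.0, 2.0, 3.0, 4.0]
news_count = [10.0, 20.0, 30.0, 40.0]
r = pearson(fic_share, news_count)
```

A coefficient near 1 or -1 indicates a strong linear association between the series; it does not by itself indicate which series drives the other.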

In these computational projects, we operationalize identity as a dynamic, contextual presentation in a new framework for computational analysis of identity in online communities. The findings from these projects demonstrate how this framework can be used to identify the values around identity that are central in online communities, and how these identity presentations relate to social interactions online and offline. These include which combinations of identity labels relate to the flow of content on social media, and which presentations of LGBTQ characters in online narrative relate to the LGBTQ social movement offline. Our goal is for this framework, theory, and set of case studies to be useful to others investigating the effects of identity in online communities at a large scale.

History

Date

2021-09-16

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Carolyn Penstein Rosé
