Carnegie Mellon University
Supporting Global Context under Evolving User Intents during Data Exploration

thesis
posted on 2023-01-06, 21:44 authored by Joseph Chang

Whether it is consumers comparing all the available options on Amazon, novice learners synthesizing information scattered across many online tutorials and discussion boards, or data scientists analyzing datasets to find patterns and themes, users often need to explore quantities of unstructured information that exceed an individual’s capacity to process fully. Typically, users reduce task uncertainty by learning the unknown unknowns as they process individual pieces of information to gain deep qualitative insights. However, the cost of evaluating learned insights (known unknowns) under the global context can be high, preventing users from evaluating their generalizability and whether they lead to high-yield information patches [176]. For example, a consumer who encounters a product recommendation on one webpage may need to search across the Web and consider many other sources to decide whether it is worth adding to their shortlist for deeper comparison. As they read online reviews, they often discover new criteria that fit their personal context and interests, but it can be difficult to determine how well those criteria differentiate the products on their shortlist. Similarly, a team of scientists who observed an interesting phenomenon on a subset of data also needed to spend considerable effort figuring out whether it generalized to the rest of the dataset [51]. Most existing approaches focus either on aggregation techniques for unstructured data (e.g., topic modeling, review summarization, and aspect extraction) or on interaction techniques for exploring structured data (e.g., faceted navigation and multivariate visualizations), and do not support this process of bottom-up exploration and interpretation of unstructured online data.

This thesis explores systems and interaction techniques that support users in exploring large, unstructured data by allowing them to examine each piece of information to gain local insights and, at the same time, evaluate those insights under the global context. I identify and focus on two domains in which addressing this issue can have high impact. The first half of the thesis focuses on crowdsourced sensemaking, in which an individual’s capacity for understanding large datasets is scaled up by segmenting data into microtasks to be processed by a group of crowdworkers. I describe two approaches that allowed crowdworkers, each of whom saw only a small subset of the data, to generate categories that were more globally coherent than those produced by existing crowd-based and computation-based approaches (Chapters 3 and 4). The second half of the thesis focuses on individual sensemaking, in which an individual explores and synthesizes online information scattered across different webpages for their own personal tasks, such as product comparison or trip planning. I describe three systems that allow users to discover important options and criteria from one source and evaluate them across information sources and options, gaining a deeper global understanding at a lower interaction cost (Chapters 5 to 7). Through lab studies and field deployments, I investigated the costs and benefits of these systems for supporting personal online sensemaking.

History

Date

2020-07-15

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Aniket Kittur