Posted on 2009-02-01, 00:00. Authored by Chris Scaffidi, Brad Myers, Mary Shaw.
When users combine data from multiple sources into a
spreadsheet or dataset, the result is often a mishmash of
different formats, since phone numbers, dates, course numbers
and other string-like kinds of data can each be written
in many different ways. Although spreadsheets provide
features for reformatting numbers and a few specific kinds
of string data, they do not provide any support for the wide
range of other kinds of string data encountered by users.
We present a user interface through which a user can describe the
formats of each kind of data. We provide an algorithm that
uses these formats to automatically generate reformatting
rules that transform strings from one format to another. In
effect, our system enables users to create a small expert
system called a “tope” that can recognize and reformat instances
of one kind of data. Later, as the user is working
with a spreadsheet, our system recommends appropriate
topes for validating and reformatting the data. With a recall
of over 80% at a query time of under 1 second, the recommendation
algorithm is accurate enough and fast enough to make useful
recommendations in an interactive setting. A laboratory
experiment shows that, compared to manual typing, users
can reformat sample spreadsheet data more than twice as
fast by creating and using topes.
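The abstract describes topes and tope recommendation only at a high level. As a rough illustration, the Python sketch below models a tope as a list of formats that share named parts: a string is recognized if some format's regular expression matches it in full, and it is reformatted by re-emitting the parsed parts through the target format's template. The names `Format`, `Tope`, `recommend`, the regex-plus-template representation, and the "fraction of cells recognized" ranking score are all hypothetical simplifications chosen for this example; they are not the authors' rule-generation or recommendation algorithms.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Format:
    """One way of writing a kind of data (hypothetical representation).

    pattern:  regex whose named groups capture the parts, e.g. area code.
    template: str.format template that re-emits those parts in this format.
    """
    name: str
    pattern: str
    template: str

    def parse(self, s: str) -> Optional[dict]:
        m = re.fullmatch(self.pattern, s.strip())
        return m.groupdict() if m else None

    def render(self, parts: dict) -> str:
        return self.template.format(**parts)


@dataclass
class Tope:
    """Assumes every format of a tope uses the same named groups."""
    name: str
    formats: list[Format]

    def recognize(self, s: str) -> Optional[Format]:
        # Return the first format that matches s in full, or None.
        for f in self.formats:
            if f.parse(s) is not None:
                return f
        return None

    def format_named(self, name: str) -> Format:
        for f in self.formats:
            if f.name == name:
                return f
        raise KeyError(f"{self.name} has no format named {name!r}")

    def reformat(self, s: str, target_name: str) -> Optional[str]:
        # Parse s with any known format, then re-emit it in the target one.
        target = self.format_named(target_name)
        for f in self.formats:
            parts = f.parse(s)
            if parts is not None:
                return target.render(parts)
        return None  # s is not an instance of this tope


def recommend(topes: list[Tope], column: list[str], top_k: int = 3):
    """Rank topes by the fraction of non-empty cells each one recognizes."""
    values = [v for v in column if v.strip()]
    scored = []
    for tope in topes:
        hits = sum(1 for v in values if tope.recognize(v) is not None)
        score = hits / len(values) if values else 0.0
        if score > 0:
            scored.append((score, tope.name))
    scored.sort(key=lambda p: (-p[0], p[1]))
    return scored[:top_k]


# Two example topes with made-up formats.
phone = Tope("phone", [
    Format("parens", r"\((?P<area>\d{3})\) ?(?P<exchange>\d{3})-(?P<line>\d{4})",
           "({area}) {exchange}-{line}"),
    Format("dashes", r"(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})",
           "{area}-{exchange}-{line}"),
    Format("dots", r"(?P<area>\d{3})\.(?P<exchange>\d{3})\.(?P<line>\d{4})",
           "{area}.{exchange}.{line}"),
])
date = Tope("date", [
    Format("iso", r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
           "{year}-{month}-{day}"),
    Format("us", r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})",
           "{month}/{day}/{year}"),
])
```

For example, `phone.reformat("412.555.1234", "parens")` returns `"(412) 555-1234"`, and `recommend([phone, date], ["(412) 555-1234", "412-555-9876", "n/a", ""])` ranks only the phone tope, with a score of 2/3. Because the tope parses a string into parts before re-emitting it, one definition per format is enough to convert between any pair of formats, which loosely mirrors the abstract's idea of generating reformatting rules from format descriptions.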