Topes: Reusable Abstractions for Validating Data
Programmers often omit input validation when inputs can appear
in many different formats or when validation criteria cannot be
precisely specified. To enable validation in these situations, we
present a new technique that puts valid inputs into a consistent
format and identifies “questionable” inputs, which might be
valid or invalid, so that these values can be double-checked by a
person or a program. Our technique relies on the concept of a
“tope”, which is an application-independent abstraction describing
how to recognize and transform values in a category of data.
We present our definition of topes and describe a development
environment that supports the implementation and use of topes.
Experiments with web application and spreadsheet data indicate
that using our technique improves the accuracy and reusability of
validation code and also improves the effectiveness of subsequent
data cleaning such as duplicate identification.
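
To make the idea of recognizing and transforming values concrete, the following is a minimal Python sketch of a tope-like abstraction for phone numbers. It is an illustration under stated assumptions, not the implementation described in this paper: the `Tope` class, the format names `dashed` and `paren`, the 0-to-1 scoring scale, and the rule that treats a `555` exchange as questionable are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Two hypothetical formats for the same category of data (phone numbers).
DASHED = re.compile(r"(\d{3})-(\d{3})-(\d{4})")
PAREN = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")


def _score(pattern: "re.Pattern[str]", value: str) -> float:
    """Return 0.0 for no match, 1.0 for a clean match, 0.5 for a doubtful one."""
    m = pattern.fullmatch(value)
    if m is None:
        return 0.0
    # Assumed soft rule for illustration: a 555 exchange is suspicious, not wrong.
    return 0.5 if m.group(2) == "555" else 1.0


@dataclass
class Tope:
    # Format name -> recognizer returning a score in [0, 1].
    recognizers: Dict[str, Callable[[str], float]]
    # (source format, target format) -> function that rewrites the value.
    transforms: Dict[Tuple[str, str], Callable[[str], str]]

    def check(self, value: str, target: str) -> Tuple[str, Optional[str]]:
        """Classify value and, if it matches some format, put it into target."""
        name, score = max(
            ((n, r(value)) for n, r in self.recognizers.items()),
            key=lambda pair: pair[1],
        )
        if score == 0.0:
            return ("invalid", None)
        if name == target:
            normalized = value
        else:
            normalized = self.transforms[(name, target)](value)
        return ("valid" if score == 1.0 else "questionable", normalized)


phone = Tope(
    recognizers={
        "dashed": lambda v: _score(DASHED, v),
        "paren": lambda v: _score(PAREN, v),
    },
    transforms={
        ("paren", "dashed"): lambda v: "-".join(PAREN.fullmatch(v).groups()),
        ("dashed", "paren"): lambda v: "({}) {}-{}".format(
            *DASHED.fullmatch(v).groups()
        ),
    },
)

print(phone.check("(312) 246-0199", "dashed"))
print(phone.check("312-555-0100", "dashed"))
print(phone.check("not a phone", "dashed"))
```

In this sketch, the same `phone` object could be reused by any application that handles phone numbers. An input that lands in the "questionable" band is still normalized, and the caller decides whether to accept it or ask a person to confirm it.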