posted on 2008-05-01, 00:00 authored by Chris Scaffidi, Brad Myers, Mary Shaw
<p>Programmers often omit input validation when inputs can appear in many different formats or when validation criteria cannot be precisely specified. To enable validation in these situations, we present a new technique that puts valid inputs into a consistent format and that identifies “questionable” inputs which might be valid or invalid, so that these values can be double-checked by a person or a program. Our technique relies on the concept of a “tope”, which is an application-independent abstraction describing how to recognize and transform values in a category of data. We present our definition of topes and describe a development environment that supports the implementation and use of topes. Experiments with web application and spreadsheet data indicate that using our technique improves the accuracy and reusability of validation code and also improves the effectiveness of subsequent data cleaning such as duplicate identification.</p>
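<p>The idea of recognizing values, flagging questionable ones, and putting accepted values into one consistent format can be illustrated with a minimal Python sketch. This is not the authors' implementation or development environment: the phone-number category, the three-level <code>Verdict</code>, and the function names <code>recognize</code> and <code>transform</code> are all assumptions made for illustration.</p>

```python
import re
from enum import Enum


class Verdict(Enum):
    # Hypothetical three-level result; the abstract only says that some
    # inputs are "questionable" and should be double-checked.
    VALID = "valid"
    QUESTIONABLE = "questionable"
    INVALID = "invalid"


# Assumed familiar formats for one data category (10-digit phone numbers),
# e.g. "(412) 268-1234", "412.268.1234", "4122681234".
_FAMILIAR = re.compile(r"^\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})$")


def recognize(text: str) -> Verdict:
    """Classify an input as valid, questionable, or invalid."""
    text = text.strip()
    if _FAMILIAR.match(text):
        return Verdict.VALID
    # Ten digits but unfamiliar punctuation: flag it for a second look
    # instead of rejecting it outright.
    digits = re.sub(r"\D", "", text)
    if len(digits) == 10:
        return Verdict.QUESTIONABLE
    return Verdict.INVALID


def transform(text: str) -> str:
    """Rewrite an accepted input into one consistent format: 412-268-1234."""
    digits = re.sub(r"\D", "", text)
    if len(digits) != 10:
        raise ValueError(f"cannot normalize {text!r}")
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
```

<p>Because the recognizer and the transformer know nothing about any particular form or spreadsheet, the same pair could in principle be reused across applications, and normalizing values before comparing them makes later steps such as duplicate identification simpler.</p>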