MindReader: Querying Databases Through Multiple Examples

Users often can not easily express their queries. For example, in a multimedia/image by content setting, the user might want photographs with sunsets; in current systems, like QBIC, the user has to give a sample query, and to specify the relative importance of color, shape and texture. Even worse, the user might want correlations between attributes, like, for example, in a traditional, medical record database, a medical researcher might want to find "mildly overweight patients", where the implied query would be "weight/height ~ 4 lb/inch". Our goal is to provide a user-friendly, but theoretically solid method, to handle such queries. We allow the user to give several examples, and, optionally, their 'goodness' scores, and we propose a novel method to "guess" which attributes are important, which correlations are important, and with what weight. Our contributions are twofold: (a) we formalize the problem as a minimization problem and show how to solve for the optimal solution, completely avoiding the ad-hoc heuristics of the past. (b) Moreover, we are the first that can handle 'diagonal' queries (like the 'overweight' query above). Experiments on synthetic and real datasets show that our method estimates quickly and accurately the 'hidden' distance function in the user's mind.