Exploring Massive Structured Data in Argus
journal contribution posted on 2009-01-01, 00:00 authored by Jaime G. Carbonell, Eugene Fink, Chun Jin, Cenk Gazen, Phil Hayes, Ganesh Mani, Dwight Dietrich
Project Argus focuses on helping an analyst explore massive, structured data. This exploration includes exact and partial match queries, hypothesis monitoring, and discovery of new patterns in both static and streaming data. We provide these facilities within a workbench interface called Data Explorer. We support exploration of data organized as a collection of records, each structured as several distinct fields. For instance, financial transfers are typically represented as structured records, with fields such as sending bank, sending account number, currency, amount, date, and receiving account. Most fields are well-defined, like a date, a dollar amount, or the receiving bank. Other fields may be longer and more free-form, like the body of an email message. In Argus, we have focused exclusively on the well-defined, structured data. As previously reported, we have been working on methods to retrieve such data flexibly, accommodating the lack of integrity and consistency in real-world data; to monitor it for watch patterns; and to identify novel and emerging trends as it accumulates over time.
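To make the record model and the two query styles concrete, the following is a minimal illustrative sketch, not the Argus implementation. The `Transfer` class, its field names, the `exact_match` and `partial_match` helpers, the sample records, and the 0.8 similarity threshold are all assumptions introduced here. String similarity from Python's standard `difflib` module stands in for whatever partial-match method Argus actually uses, which the abstract does not describe.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Transfer:
    """Hypothetical structured record with well-defined fields."""
    sending_bank: str
    sending_account: str
    currency: str
    amount: float
    transfer_date: date
    receiving_account: str


def exact_match(records, **criteria):
    """Return records whose named fields equal the given values exactly."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


def partial_match(records, field, value, threshold=0.8):
    """Return (score, record) pairs whose field text is similar to value.

    Similarity is difflib's ratio in [0, 1]; the threshold is an assumed cutoff.
    Results are sorted from most to least similar.
    """
    scored = []
    for r in records:
        text = str(getattr(r, field)).lower()
        score = SequenceMatcher(None, text, value.lower()).ratio()
        if score >= threshold:
            scored.append((score, r))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample data; the second record contains a typo in the bank name,
# the kind of inconsistency found in real-world data.
records = [
    Transfer("First National Bank", "12345", "USD", 9500.0, date(2008, 3, 1), "98765"),
    Transfer("Frist National Bank", "12346", "USD", 12000.0, date(2008, 3, 2), "98765"),
    Transfer("Pacific Trust", "55501", "EUR", 300.0, date(2008, 3, 2), "11111"),
]

exact_hits = exact_match(records, sending_bank="First National Bank")
partial_hits = partial_match(records, "sending_bank", "First National Bank")
```

In this sketch the exact query finds only the correctly spelled record, while the partial query also recovers the misspelled one, which is the kind of flexibility needed when field values lack consistency.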