Using Search Engine Technology for Information Management

For information management, databases offer precise, controlled access to data, but they do not offer the easy-to-use search capabilities that most knowledge workers use daily on sites such as Google. Access to information contained in databases is more difficult and more restricted. One solution to this information bottleneck is to let search engines bear the brunt of the work by offloading information from the database into alternative infrastructures, such as those provided by search engine technology. Many business applications, such as search, report generation, and data analysis, might be performed more efficiently on the replicated data without involving native database mechanisms such as transactions. These offloaded databases, retaining some of their structure, can be recombined and mashed up to create one-off, possibly disposable, databases, while the primary data remains safe in the original database. This workshop will examine the limits and potential of using information retrieval and search engine technology for information management (IM) applications.

Topics to be explored:

– search engine as a database

– business intelligence applications without OLAP

– optimization of relational database search

– affordances of search engine technology for database offloading

– mashups for user applications

– content aggregation systems

– limits of database offloading

– database connectors

– access-optimized databases

– disposable databases

– optimizing access, flexibility, and scalability

This workshop explores the application of information retrieval and search engine technologies and paradigms in the context of traditional information management systems and applications. Traditional systems often use relational database management systems, which offer many advantages for information management: a normative model of the data, transaction security, access control, data protection through versioning and rollbacks, and formal specification of interprocess communication. But with this data-centered security come implementation and operating costs that reduce flexibility, lengthen response time, and limit access in information management systems. Many of the uses of databases might be offloaded by replicating the data in more accessible and scalable technologies such as search engines. For rapid prototyping, mature search engine technology might be a low-cost solution for aggregating, repurposing, and mashing up data in novel ways. There has been a growing tendency to include more and more semantics in information retrieval systems (information extraction) and to use databases to store unstructured (documents) and semi-structured (XML) data. This workshop will examine the opposite problem, exploring the opportunities of moving from structured databases to information retrieval systems that nonetheless recognize and maintain the native semantic structure of the data.

As the information retrieval and information management communities grow closer together, this workshop will provide a meeting point for discussing where their technologies overlap.

Program Committee Chairs

Gregory Grefenstette — Exalead, France

Wolfgang Nejdl — University of Hannover, Germany

David Simmen — IBM Almaden, USA

Tiny URL: http://tinyurl.com/USETIM09

Paper formatting guidelines: http://vldb2009.org/?q=node/5

Paper submissions (4-8 pages): http://www.easychair.org/conferences/?conf=usetim2009

Confirmed Program Committee Members

Rakesh Agrawal — Microsoft, USA

Hannah Bast — Max-Planck Institute, Germany

Lukas Biewald — Dolores Labs, USA

Stefano Ceri — Politecnico di Milano, Italy

Eben Haber — IBM, USA

Donald Kossmann — ETHZ, Switzerland

Pankaj Mehra — HP, Russia

Johannes Meinecke — SAP, Germany

Guillaume Pierre — Vrije Universiteit, Netherlands

Swami Sivasubramanian — Amazon, USA

Qi Su — Aster Data Systems, USA

Øystein Torbjørnsen — FAST, Norway

Ingmar Weber — EPFL, Switzerland