Book Review: Search-Based Applications

Search is moving away from the simple keyword search box with a result list of indexed web pages. Features like faceting and aggregations open up completely new possibilities for data discovery, which makes them relevant for business applications as well.

The Book

Search-Based Applications by Gregory Grefenstette and Laura Wilber describes how search engines, traditionally used for indexing web pages, and databases have changed in recent years, and how these changes make search engines a foundation for business applications.

The short book introduces the motivation for using search engines in business applications, mostly driven by exponential data growth and real-time requirements. Several chapters describe what has changed in the database and search engine worlds, each focusing on one aspect. On the search side, it shows that advanced features like faceted search or natural language processing techniques can be valuable for offering real-time access to data that has traditionally been loaded into a data warehouse. On the database side, it shows that with the advent of non-relational databases, some databases are moving toward the flexible schemas, scalability, and specialized access patterns of search engines.
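To make faceted search a bit more concrete, here is a minimal sketch of the idea, assuming a handful of hypothetical in-memory product documents rather than a real search engine index. A query returns the matching documents together with counts for each value of the facet fields; selecting a facet value then becomes a filter that narrows the result set. The names `DOCS`, `search`, and `facet_counts` are illustrative only, not the API of any product mentioned in the book.

```python
from collections import Counter

# Hypothetical product documents; a real search engine would hold these in an index.
DOCS = [
    {"title": "Red running shoe", "category": "shoes", "brand": "Acme"},
    {"title": "Blue running shoe", "category": "shoes", "brand": "Zeta"},
    {"title": "Running shirt", "category": "shirts", "brand": "Acme"},
    {"title": "Hiking boot", "category": "shoes", "brand": "Acme"},
]


def search(docs, term, filters=None):
    """Return documents whose title contains `term` and that match all facet filters."""
    filters = filters or {}
    return [
        d for d in docs
        if term.lower() in d["title"].lower()
        and all(d.get(field) == value for field, value in filters.items())
    ]


def facet_counts(hits, fields):
    """Count the values of each facet field across the current result set."""
    return {field: Counter(d[field] for d in hits if field in d) for field in fields}


if __name__ == "__main__":
    hits = search(DOCS, "running")
    print(facet_counts(hits, ["category", "brand"]))
    # Drilling down: apply a facet value as a filter and search again.
    narrowed = search(DOCS, "running", {"brand": "Acme"})
    print([d["title"] for d in narrowed])
```

The point is that the counts are computed over the current hits, so every drill-down immediately updates the overview. This is the kind of exploratory access that would otherwise require a precomputed report in a data warehouse.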

Some common themes in the book for using search-based applications are the aggregation of content from different data sources and the reduction of load on databases by offloading traffic to read-optimized search engines. Mixing content from different data sources can provide flexible access to multiple legacy systems and increase the usability of the applications. The document model of search engines and the possibility of incremental indexing lead to applications that provide near-real-time access to data and can be adapted to changing needs more quickly.
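A rough sketch of those two themes, under loudly stated assumptions: two hypothetical legacy sources (`CRM_ROWS` and `ERP_ROWS`) with different record layouts are flattened into one document per customer, and a toy in-memory `ToyIndex` stands in for a real search index. A change from one source is applied as a single-document upsert instead of a full rebuild, which is what makes near-real-time updates feasible. None of these names come from the book or from a real product.

```python
from dataclasses import dataclass, field

# Two hypothetical legacy sources with different record layouts.
CRM_ROWS = [{"cust_id": 7, "name": "Acme Corp", "city": "Berlin"}]
ERP_ROWS = [{"customer": 7, "open_orders": 3}]


def to_documents(crm_rows, erp_rows):
    """Merge records from both sources into one flat document per customer."""
    docs = {}
    for row in crm_rows:
        docs[f"customer-{row['cust_id']}"] = {"name": row["name"], "city": row["city"]}
    for row in erp_rows:
        docs.setdefault(f"customer-{row['customer']}", {})["open_orders"] = row["open_orders"]
    return docs


@dataclass
class ToyIndex:
    """In-memory stand-in for a search index that supports incremental updates."""
    docs: dict = field(default_factory=dict)

    def upsert(self, doc_id, doc):
        # Only this document is (re)indexed; the rest of the index is untouched.
        self.docs[doc_id] = doc

    def find(self, **criteria):
        """Return the ids of documents whose fields equal all given criteria."""
        return [doc_id for doc_id, d in self.docs.items()
                if all(d.get(k) == v for k, v in criteria.items())]


def build_index():
    index = ToyIndex()
    for doc_id, doc in to_documents(CRM_ROWS, ERP_ROWS).items():
        index.upsert(doc_id, doc)
    # Later, a single change from the ERP system is pushed without a full rebuild.
    index.upsert("customer-7", {**index.docs["customer-7"], "open_orders": 4})
    return index


if __name__ == "__main__":
    index = build_index()
    print(index.docs["customer-7"])
    print(index.find(city="Berlin"))
```

Because each document is self-contained, adding a field from a third source only means extending `to_documents`; no shared relational schema has to be migrated first.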

Though most of the book is product-agnostic, one chapter lists some platforms that are available for building search-based applications, mainly focusing on big commercial players like Exalead (the authors' company), Endeca, and Autonomy. The book closes with three case studies that show different aspects of building search-based applications.

Even though the book contains some statements that I don't fully agree with, and a few that are even contradictory, it is a very good book for understanding the reasoning behind building search-based applications. I got some new ideas for applications of search engines, and this alone makes it a worthwhile read.

Open Source Options for Search-Based Applications

Though the book lists quite a few SBA platforms and related technologies, it does not mention Apache Solr at all, which is quite surprising, as Solr offers many of the features the authors define for SBAs. Solr provides the Data Import Handler to connect external data sources and semantic technologies (though probably not as rich as some of the commercial options), and it is complemented by open source projects like carrot² for search result clustering and ManifoldCF as a connector framework.

When the book talks about replacing parts of data warehouse applications with SBAs for real-time analytics, this of course reminds me of use cases for Elasticsearch. Kibana or custom dashboards can make the wealth of information contained in the index easily accessible.
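As an illustration of that kind of dashboard query, here is a sketch that only builds an Elasticsearch search request body; it does not talk to a cluster. It assumes a hypothetical index whose documents have a `status` field mapped as `keyword` and a `@timestamp` date field. The body would be sent to that index's `_search` endpoint. It uses a `terms` aggregation (the Elasticsearch counterpart of a facet) and a `date_histogram` aggregation. The `fixed_interval` parameter assumes a reasonably recent Elasticsearch version.

```python
import json


def dashboard_query(status_field="status", time_field="@timestamp", window="now-15m"):
    """Build an Elasticsearch search body that returns only aggregations (no hits).

    The field names and the time window are hypothetical defaults.
    """
    return {
        "size": 0,  # skip the document hits; the dashboard only needs the aggregates
        "query": {"range": {time_field: {"gte": window}}},
        "aggs": {
            # Count documents per status value (assumes a keyword-mapped field).
            "by_status": {"terms": {"field": status_field}},
            # Bucket the same documents per minute for a time-series chart.
            "per_minute": {
                "date_histogram": {"field": time_field, "fixed_interval": "1m"}
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(dashboard_query(), indent=2))
```

Since the index is updated incrementally, re-running such a query every few seconds gives a near-real-time view, which is the data warehouse replacement scenario the book describes.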