#Open Source #Ideas

Pucene: Vision

26 Mar

Open-source projects often reach the point where they have to decide whether to rely on a third-party product like Elasticsearch to implement search. Sulu reached this point as well.

Back then we decided to implement a simple abstraction layer (MassiveSearchBundle), which lets developers choose between zend-search (pure PHP) and indexing the data in Elasticsearch.

But this abstraction has a big disadvantage: we can only rely on the lowest common denominator of zend-search and Elasticsearch. This is why the bundle only supports searching with Lucene queries and indexing very simple data structures.

Another problem is that zend-search has been quite "sleepy" for a long time. So for a while now we have been looking for an alternative implementation of indexing and searching that is based on Lucene.

"Search is something that any application should have."
Shay Banon - Creator of Elasticsearch

Idea trigger

In January this year I completed the CoreDeveloper training for Elasticsearch. The training was offered by Elastic and held in Munich. After two days of hearing how Elasticsearch works, I was really motivated to see how far I could get with reimplementing Lucene and the additions that Elasticsearch provides.

After a few hours of reading the Lucene and Elasticsearch documentation I started a few tests and saw that the basic analyze, index and search process is quite easy to implement.
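
To give a rough idea of what these three steps involve, here is a minimal sketch in Python. It is not Pucene's code: the names `analyze` and `TinyIndex` are made up for illustration, the analyzer only lowercases and splits on non-alphanumeric characters, and the search simply requires every query term to match. Real Lucene analyzers, postings lists and queries do far more than this.

```python
import re
from collections import defaultdict


def analyze(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical in-memory inverted index: token -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.documents = {}

    def index(self, doc_id, text):
        """Store the document and register each of its tokens."""
        self.documents[doc_id] = text
        for token in analyze(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        """Return the sorted ids of documents containing every query term."""
        tokens = analyze(query)
        if not tokens:
            return []
        result = None
        for token in tokens:
            ids = self.postings.get(token, set())
            # Intersect with the ids found so far (AND semantics).
            result = ids if result is None else result & ids
        return sorted(result)


index = TinyIndex()
index.index(1, "Sulu is a content management platform")
index.index(2, "Zend-search is a search engine library")
index.index(3, "Elasticsearch is a search engine built on Lucene")

print(index.search("search engine"))
print(index.search("Lucene"))
```

The first query matches documents 2 and 3, the second only document 3. Scoring is left out on purpose: this sketch only decides whether a document matches, not how well.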

This was quite amazing to see. In the discussion with the Sulu team we realized that this could be a solution for all our problems.

Current State

So far we are able to analyze, index and search documents. Scoring is implemented as well, but in a very hacky way.

In one of the next blog posts I will go into more detail about the scoring algorithm in Elasticsearch and how I want to achieve it in Pucene.