WorkHabit Blogs

WORKHABIT LABS

Using Hibernate Search for fulltext indexing

by Aaron Stewart Published: August 5th, 2009
Tagged: buzzword compliant, hibernate, howto, java, lucene, maven, mysql, performance, solr

During one of our recent projects, we ran into the classic conflict between wanting non-blocking database updates and needing to perform fulltext searches against our MySQL database.

Let me back up and explain a bit better what I mean. In other words, "let me explain.. no, there is too much. Let me sum up."

  • MyISAM Tables: very fast access times and support for FULLTEXT indexes, but each write locks the whole table until it completes. They are also non-transactional and subject to data corruption. This is BAD for many reasons: with a lot of write traffic your site performance will suffer, there's no support for commit / rollback, and if you need ACID-compliant transactions, this table type simply isn't an option.
  • InnoDB Tables: slightly slower than MyISAM and NO support for FULLTEXT indexes, but you get all the transactional goodness, and a write locks only the records being updated rather than the whole table.

So as a responsible enterprise development shop, we have to go with InnoDB. But how to do full text searches?

Enter Hibernate Search, which combines the power of an ORM with the ability to do full text searches backed by Lucene and Solr, and does so nearly transparently.

Let's walk through the steps:

Set up Maven Dependencies

This step is only necessary if you're using Maven; otherwise you can fetch the dependencies yourself, put them in a lib directory, add them to your classpath, and go from there. In our pom.xml, we added these dependencies alongside the ones we already had for hibernate-core, hibernate-annotations, and all the dbcp/c3p0 goodness that we could possibly need:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search</artifactId>
    <version>3.1.1.GA</version>
</dependency>
<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-common</artifactId>
    <version>1.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-core</artifactId>
    <version>1.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-snowball</artifactId>
    <version>2.4.1</version>
</dependency>

Set up Hibernate's Configuration

This tells Hibernate Search how to store the Lucene indexes it maintains automatically (here, on the filesystem) and which base directory to put them in. In your hibernate.cfg.xml:

<property name="hibernate.search.default.directory_provider">
   org.hibernate.search.store.FSDirectoryProvider
</property>
<property name="hibernate.search.default.indexBase">
    /tmp/lucene/indexes
</property>
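
One caveat about the path above: /tmp is often cleaned out on reboot, so for anything beyond a demo you will probably want the index base to vary per environment. As a minimal sketch, the same two settings can be built in code as plain java.util.Properties. The class and method names here (SearchSettingsSketch, searchProperties) are made up for illustration; only the two property keys and the provider class name come from the configuration above. The resulting Properties could then be handed to Hibernate's Configuration (for instance via addProperties) before building the session factory.

```java
import java.io.File;
import java.util.Properties;

final class SearchSettingsSketch {

    // Builds the two Hibernate Search settings from hibernate.cfg.xml as
    // plain properties. The index directory is chosen by the caller.
    static Properties searchProperties(File indexBase) {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.directory_provider",
                "org.hibernate.search.store.FSDirectoryProvider");
        props.setProperty("hibernate.search.default.indexBase",
                indexBase.getAbsolutePath());
        return props;
    }

    public static void main(String[] args) {
        // Assumption for this demo: keep indexes under the user's home
        // directory instead of /tmp.
        File base = new File(System.getProperty("user.home"), "lucene/indexes");
        searchProperties(base).list(System.out);
    }
}
```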

Set up our Entities to be indexed

In this case, we have a series of titles and authors that we want to search. Two entity beans define these:

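package com.workhabit.example.entity;

import javax.persistence.*;
// the indexing annotations come from Hibernate Search
import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;
import org.hibernate.search.annotations.Store;

@Entity
@Indexed
public class Article {
        @Id
        @GeneratedValue(strategy = GenerationType.AUTO)
        @DocumentId
        private Long id;

        // tokenized for searching; the original text is not stored in the index
        @Column
        @Field(index = Index.TOKENIZED, store = Store.NO)
        private String title;

        // Article and Author embed each other, so the depth is bounded
        // to keep the cycle from being followed indefinitely
        @ManyToOne
        @IndexedEmbedded(depth = 1)
        private Author author;

        // ... getters and setters for the above go here ...
}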

Same thing with the author class:

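package com.workhabit.example.entity;

import java.util.List;
import javax.persistence.*;
import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;
import org.hibernate.search.annotations.Store;

@Entity
@Indexed
public class Author {
        @Id
        @GeneratedValue(strategy = GenerationType.AUTO)
        @DocumentId
        private Long id;

        // no @Field here, so the name is not searchable
        @Column
        private String name;

        @Column
        @Field(index = Index.TOKENIZED, store = Store.NO)
        private String bio;

        // inverse side of Article.author; embedding the articles makes
        // "articles.title" searchable from the Author index
        @OneToMany(mappedBy = "author")
        @IndexedEmbedded(depth = 1)
        private List<Article> articles;

        // ... getters and setters for the above go here ...
}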

The @Indexed, @Field, @DocumentId, and @IndexedEmbedded annotations all come from Hibernate Search (the org.hibernate.search.annotations package) and tell it how to index these entities. Pretty basic.
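
If "tokenized but not stored" sounds abstract, here is a toy inverted index in plain Java that shows the idea. To be clear, this is not how Lucene is implemented, and InvertedIndexSketch and its methods are invented for this post. It only illustrates that the index maps each token to the ids of the records containing it, and that the original text itself is not kept. For simplicity the sketch requires every query token to match; Lucene's query parser behaves differently by default.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class InvertedIndexSketch {

    // token -> ids of the records whose text contains that token
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Crude stand-in for an analyzer: lower-case, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexes the text under the given id; only the tokens are kept.
    void add(long id, String text) {
        for (String token : tokenize(text)) {
            Set<Long> ids = postings.get(token);
            if (ids == null) {
                ids = new TreeSet<>();
                postings.put(token, ids);
            }
            ids.add(id);
        }
    }

    // Returns the ids of records that contain every token in the query.
    Set<Long> search(String query) {
        Set<Long> result = null;
        for (String token : tokenize(query)) {
            Set<Long> ids = postings.get(token);
            if (ids == null) {
                return new TreeSet<>();
            }
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<Long>() : result;
    }
}
```

Adding two titles and searching for "transit" returns both ids, while "chicago transit" returns only the record that contains both words.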

Set up the DAO

In our case, we're using our own DAO pattern, not the Hibernate EntityManager (though that's absolutely an option), so we added a method on our DAO to take advantage of Lucene's index. In our app, everything is handled as a runtime exception and run through an AOP advisor for throw advice, so we catch the parse exception and rethrow it as unchecked.

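// Imports needed by this method:
//   org.apache.lucene.analysis.standard.StandardAnalyzer
//   org.apache.lucene.queryParser.MultiFieldQueryParser
//   org.apache.lucene.queryParser.ParseException
//   org.hibernate.Transaction
//   org.hibernate.criterion.DetachedCriteria
//   org.hibernate.search.FullTextQuery
//   org.hibernate.search.FullTextSession
//   org.hibernate.search.Search
@SuppressWarnings("unchecked") // FullTextQuery.list() returns a raw List
public <T> List<T> getAllBySearch(
    Class<T> tClass,
    String[] fields,
    String searchTerms,
    DetachedCriteria criteria) {
  // the regular hibernate session is wrapped in a FullTextSession
  FullTextSession fullTextSession = Search.getFullTextSession(getSession());

  // we have to begin the transaction again.  Closing it is handled by our AOP layer.
  Transaction tx = fullTextSession.beginTransaction();

  // used to process the query; searches every field listed in "fields"
  MultiFieldQueryParser parser = new MultiFieldQueryParser(
      fields,
      new StandardAnalyzer()
  );

  try {
    // generates a lucene search based on our search terms
    org.apache.lucene.search.Query query = parser.parse(searchTerms);

    // we build a hibernate query restricted to the requested entity type
    FullTextQuery hibQuery = fullTextSession.createFullTextQuery(
        query,
        tClass
    );

    if (criteria != null) {
      // if there are any hibernate criteria, we drop those in..
      // This is only needed really if you want to reference some
      //  hibernate field in the query
      hibQuery.setCriteriaQuery(
          criteria.getExecutableCriteria(fullTextSession)
      );
    }
    // returns a list of entity beans that contain our search terms
    return hibQuery.list();

  } catch (ParseException e) {
    // rethrow as unchecked so our AOP throw advice can handle it
    throw new RuntimeException(e);
  }
}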

Now the fun part: making a query

In our manager, we have a method that returns a list of authors who have written articles whose titles match the given search terms, for example "Chicago Transit Authority." It couldn't be easier:

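public List<Author> getAuthorsForSearchTerms(String searchTerms, int start, int limit) {
        // search the titles of each author's embedded articles
        String[] fields = new String[] { "articles.title" };

        // note: start and limit are not applied here; the DAO method
        // as written returns every match
        return dao.getAllBySearch(Author.class, fields, searchTerms, null);
}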
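
You may have noticed that the method accepts start and limit but never uses them. With Hibernate Search, the usual route would be for the DAO to call setFirstResult(start) and setMaxResults(limit) on the FullTextQuery (both are inherited from org.hibernate.Query) before calling list(), so that only one page of entities is loaded. The stand-alone sketch below does not touch Hibernate at all; it only illustrates the window semantics we would expect from those two parameters, using an in-memory list. PagingSketch and page are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class PagingSketch {

    // Returns at most `limit` items, beginning at zero-based offset `start`.
    // An offset past the end of the list yields an empty page.
    static <T> List<T> page(List<T> results, int start, int limit) {
        if (start < 0 || limit < 0) {
            throw new IllegalArgumentException("start and limit must be >= 0");
        }
        int from = Math.min(start, results.size());
        // widen to long so that from + limit cannot overflow
        int to = (int) Math.min((long) from + limit, results.size());
        return new ArrayList<>(results.subList(from, to));
    }
}
```

Paging in memory like this still loads every match first, which is why pushing the two values down into the query is the better fit for a real DAO.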

What's Happening?

Hibernate Search automagically indexes new content as it is saved or updated. Existing content can be re-indexed as well; for how to do that, plus much more, I'll refer you to the Hibernate Search documentation:

http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#...
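
For a rough idea of what re-indexing involves: as far as I understand the 3.1 reference, you load the existing entities and pass each one to index() on the FullTextSession, calling flushToIndexes() and clear() every so often to keep memory in check. The sketch below shows only that batching pattern. IndexingSession is a hypothetical stand-in interface that mirrors those three method names so the example runs without Hibernate; BatchReindexSketch and reindex are likewise invented for illustration, and the reference documentation remains the authority on the real API.

```java
final class BatchReindexSketch {

    // Hypothetical stand-in for the three FullTextSession calls used
    // when indexing entities by hand.
    interface IndexingSession {
        void index(Object entity);
        void flushToIndexes();
        void clear();
    }

    // Indexes every entity, flushing and clearing after each full batch
    // and once more for a trailing partial batch. Returns the number indexed.
    static int reindex(Iterable<?> entities, IndexingSession session, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int count = 0;
        for (Object entity : entities) {
            session.index(entity);
            count++;
            if (count % batchSize == 0) {
                session.flushToIndexes(); // apply queued changes to the index
                session.clear();          // release the entities loaded so far
            }
        }
        if (count % batchSize != 0) {
            session.flushToIndexes();
            session.clear();
        }
        return count;
    }
}
```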
