WorkHabit Blogs
WORKHABIT LABS
Using Hibernate Search for fulltext indexing
During one of our recent projects, we ran into the classic conflict between wanting non-blocking database updates and needing to perform fulltext searches against our MySQL database.
Let me back up and explain a bit better what I mean. In other words, "let me explain.. no, there is too much. Let me sum up."
- MyISAM Tables: very fast access times and support for FULLTEXT indexes, but writing a record locks the whole table until the write completes. MyISAM is also non-transactional and subject to data corruption. This is BAD for many reasons: if you have a lot of write traffic, your site performance will suffer; there's no support for commit / rollback; and if you need something ACID-compliant, this table type is simply not an option.
- InnoDB Tables: slightly slower than MyISAM and NO support for FULLTEXT indexes, but you get all the transactional goodness, and a write locks only the records being updated rather than the whole table.
So as a responsible enterprise development shop, we have to go with InnoDB. But how to do full text searches?
Enter Hibernate Search, which combines the power of an ORM with the ability to do full text searches backed by Lucene and Solr, and does so nearly transparently.
Let's walk through the steps:
Set up Maven Dependencies
This is only really necessary if you're using Maven; otherwise you can fetch the dependencies yourself, put them somewhere in a lib directory, add them to your classpath, and go from there. In our pom.xml, we added these dependencies, in addition to the ones that we already have for hibernate-core, hibernate-annotations, and all the dbcp/c3p0 goodness that we could possibly need:
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-search</artifactId>
<version>3.1.1.GA</version>
</dependency>
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-common</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-core</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-snowball</artifactId>
<version>2.4.1</version>
</dependency>
Set up Hibernate's Configuration
This will tell hibernate how to use lucene to auto-index, and where to put lucene's configuration/indices. In your hibernate.cfg.xml:
<property name="hibernate.search.default.directory_provider">
org.hibernate.search.store.FSDirectoryProvider
</property>
<property name="hibernate.search.default.indexBase">
/tmp/lucene/indexes
</property>
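The same two settings can also be built in code rather than XML. This is only a sketch: it assumes you collect them in a plain `java.util.Properties` object and hand that to Hibernate's `Configuration` yourself (for example through `addProperties`). The class and method names here are made up for illustration.

```java
import java.util.Properties;

class SearchSettings {
    // Collects the two Hibernate Search settings shown above.
    // indexBase is the directory where Lucene should write its index files.
    static Properties searchProperties(String indexBase) {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.directory_provider",
                "org.hibernate.search.store.FSDirectoryProvider");
        props.setProperty("hibernate.search.default.indexBase", indexBase);
        return props;
    }

    public static void main(String[] args) {
        Properties props = searchProperties("/tmp/lucene/indexes");
        // The resulting object could then be passed to Hibernate's Configuration.
        System.out.println(props.getProperty("hibernate.search.default.indexBase"));
    }
}
```

Taking the index location as a parameter makes it easy to point at something more durable than /tmp outside of development, since many systems clear /tmp on reboot.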
Set up our Entities to be indexed.
In this case, we have a series of titles and authors that we want to search. Two entity beans define these:
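import javax.persistence.*;
import javax.xml.bind.annotation.*;
// @Indexed, @Field, @DocumentId, @IndexedEmbedded, Index and Store live here
import org.hibernate.search.annotations.*;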
@Entity
@Indexed
public class Article {
@Id
@GeneratedValue(strategy = GenerationType.AUTO)
@DocumentId
private Long id;
@Column
@Field(index = Index.TOKENIZED, store = Store.NO)
private String title;
@ManyToOne
@IndexedEmbedded
private Author author;
// ... getters and setters for the above go here ...
}
Same thing with the author class:
@Entity
@Indexed
public class Author {
@Id
@GeneratedValue(strategy = GenerationType.AUTO)
@DocumentId
private Long id;
@Column
private String name;
@Column
@Field(index = Index.TOKENIZED, store = Store.NO)
private String bio;
@OneToMany
@IndexedEmbedded
private List<Article> articles;
// ... getters and setters for the above go here ...
}
The @Indexed, @Field, @DocumentId, and @IndexedEmbedded annotations are all from the hibernate-search package, and tell hibernate what to do with these entities. Pretty basic.
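To make the effect of @IndexedEmbedded more concrete, here is a toy sketch of the field names an Article's index document roughly ends up with. It assumes the default behaviour, where embedded fields are prefixed with the association name, and it ignores any deeper nesting. The classes are plain stand-ins, not the real entities, and `toDocument` is a hypothetical helper rather than anything Hibernate Search exposes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EmbeddedFieldSketch {
    // Plain stand-ins for the entities above (no persistence involved).
    static class Author {
        final String bio;
        Author(String bio) { this.bio = bio; }
    }

    static class Article {
        final String title;
        final Author author;
        Article(String title, Author author) { this.title = title; this.author = author; }
    }

    // Builds a map of index field name to text: the Article's own @Field
    // property, plus the embedded Author's @Field property under "author.".
    static Map<String, String> toDocument(Article article) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", article.title);
        if (article.author != null) {
            doc.put("author.bio", article.author.bio);
        }
        return doc;
    }

    public static void main(String[] args) {
        Article a = new Article("Chicago Transit Authority", new Author("Writes about trains"));
        System.out.println(toDocument(a).keySet()); // prints [title, author.bio]
    }
}
```

Note that the author's name does not show up: in the entity above it carries @Column but no @Field, so it is stored in the database but not searchable through the index.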
Set up the DAO
In our case, we're using our own DAO pattern, not the Hibernate EntityManager (though that's absolutely an option), so we added a method on our DAO to take advantage of lucene's index. In our app, everything is handled as a runtime exception and run through an AOP advisor for throw advice, so we catch the parse exception and rethrow it as unchecked.
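// NOTE: the method name "search" is a placeholder for illustration only
public <T> List<T> search(
Class<T> tClass,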
String[] fields,
String searchTerms,
DetachedCriteria criteria) {
// the regular hibernate session is wrapped in a FullTextSession
FullTextSession fullTextSession = Search.getFullTextSession(getSession());
// we have to begin the transaction again. Closing it is handled by our AOP layer.
Transaction tx = fullTextSession.beginTransaction();
// used to process the query
MultiFieldQueryParser parser = new MultiFieldQueryParser(
fields,
new StandardAnalyzer()
);
try {
// generates a lucene search based on our search terms
org.apache.lucene.search.Query query = parser.parse(searchTerms);
// we build a hibernate query
FullTextQuery hibQuery = fullTextSession.createFullTextQuery(
query,
tClass
);
if (criteria != null) {
// if there are any hibernate criteria, we drop those in..
// This is only needed really if you want to reference some
// hibernate field in the query
hibQuery.setCriteriaQuery(
criteria.getExecutableCriteria(fullTextSession)
);
}
// returns a list of entity beans that contain our search terms
return hibQuery.list();
} catch (ParseException e) {
throw new RuntimeException(e);
}
}
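It may help to see what the `fields` array does to the search terms. As far as I understand the Lucene documentation, MultiFieldQueryParser turns a query with two terms over two fields into something shaped like `(title:term1 body:term1) (title:term2 body:term2)`. The toy below only reproduces that shape with plain strings. It is not Lucene code, it ignores analyzers, quoting and operators, and the class and method names are invented.

```java
import java.util.Locale;
import java.util.StringJoiner;

class MultiFieldExpansion {
    // For each whitespace-separated term, emit one parenthesised clause that
    // looks for that term in every field.
    static String expand(String[] fields, String searchTerms) {
        StringJoiner query = new StringJoiner(" ");
        for (String term : searchTerms.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input produces no clauses
            }
            StringJoiner clause = new StringJoiner(" ", "(", ")");
            for (String field : fields) {
                clause.add(field + ":" + term.toLowerCase(Locale.ROOT));
            }
            query.add(clause.toString());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"title", "author.bio"};
        System.out.println(expand(fields, "Chicago Transit"));
        // prints (title:chicago author.bio:chicago) (title:transit author.bio:transit)
    }
}
```

In other words, a hit in any of the listed fields is enough for a term to match, which is why the DAO method takes an array of field names rather than a single one.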
Now the fun part. Making a query
In our manager, we have a method that returns a list of authors that have written Articles with titles containing the words "Chicago Transit Authority." It couldn't be easier:
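A minimal sketch of what such a manager method might look like, assuming the DAO search method sits behind a small interface. The names `AuthorSearch` and `authorsWritingAbout` and the stand-in `Author` class are hypothetical. The `articles.title` field name assumes the default prefix that @IndexedEmbedded gives the embedded articles, and wrapping the words in double quotes assumes a phrase match is wanted.

```java
import java.util.Collections;
import java.util.List;

class AuthorLookupSketch {
    // Plain stand-in for the Author entity.
    static class Author {
        final String name;
        Author(String name) { this.name = name; }
    }

    // Hypothetical stand-in for the DAO: returns the Authors whose indexed
    // fields match the given Lucene query string.
    interface AuthorSearch {
        List<Author> search(String[] fields, String searchTerms);
    }

    // Finds authors through the titles of their embedded articles.
    static List<Author> authorsWritingAbout(AuthorSearch dao, String phrase) {
        // Double quotes ask the Lucene query parser for a phrase match.
        String query = "\"" + phrase + "\"";
        return dao.search(new String[] {"articles.title"}, query);
    }

    public static void main(String[] args) {
        // A fake DAO that always returns one author, just to exercise the call.
        AuthorSearch fake = (fields, q) -> Collections.singletonList(new Author("Jane"));
        List<Author> hits = authorsWritingAbout(fake, "Chicago Transit Authority");
        System.out.println(hits.size()); // prints 1
    }
}
```

Dropping the quotes would instead match titles containing any of the three words, since the query parser combines bare terms with OR by default.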
What's Happening?
Hibernate automagically indexes new content as it is saved or updated. For existing content, there's a means of re-indexing it, but I'll refer you to the Hibernate Search documentation for how to do that, plus much more:
http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#...
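For intuition about what "indexes new content as it is saved or updated" means, here is a toy in-memory inverted index. It is a deliberately simplified sketch and not how Lucene or Hibernate Search are implemented. The class `TinyIndex` and its methods are invented, and the tokenizer just lowercases and splits on anything that is not a letter or digit.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();  // term -> ids
    private final Map<Long, Set<String>> termsById = new HashMap<>(); // id -> terms

    // Called on save or update: drop the old terms for this id, then add the new ones.
    void index(long id, String text) {
        remove(id);
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        termsById.put(id, terms);
        for (String t : terms) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
        }
    }

    // Called on delete (and internally before re-indexing).
    void remove(long id) {
        Set<String> old = termsById.remove(id);
        if (old == null) {
            return;
        }
        for (String t : old) {
            Set<Long> ids = postings.get(t);
            ids.remove(id);
            if (ids.isEmpty()) {
                postings.remove(t);
            }
        }
    }

    // Returns the ids of all records containing the term.
    Set<Long> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.index(1L, "Chicago Transit Authority");
        index.index(2L, "Transit maps");
        System.out.println(index.search("transit")); // prints [1, 2]
    }
}
```

The point is that the search never touches the database table: each write updates a separate term-to-id structure, so InnoDB can keep its row-level locking while lookups go to the index.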

