This document covers the latest stable release of Hibernate Search 5. If you’re looking for a getting started guide for the latest stable series, Hibernate Search 7.2, see here.
Welcome to Hibernate Search. The following chapter will guide you through the initial steps required to integrate Hibernate Search 5.11 into an existing Hibernate enabled application.
If you are new to Hibernate, we recommend you start from here.
System requirements
- Java Runtime: To use the latest version of the Hibernate Search 5.11 series (5.11.12.Final) you’ll need Java version 8 or greater. You can download a Java Runtime for Windows/Linux/Solaris here.
- Hibernate Search: hibernate-search-orm-5.11.12.Final.jar and all runtime dependencies. You can get the jar artifacts either from the dist/lib directory of the Hibernate Search distribution or you can download them from the Maven central repository.
- Hibernate ORM: You will need hibernate-core version 5.4 and its transitive dependencies. See here for information on how to get these.
- JPA 2.2: Even though Hibernate Search can be used without needing JPA annotations the following instructions will use them for basic entity configuration (@Entity, @Id, @OneToMany,…). This part of the configuration could also be expressed in xml or code. Hibernate Search has its own set of annotations (@Indexed, @DocumentId, @Field,…) for which there exists so far no XML based alternative; if annotations aren’t suited for your project, a better option is the Programmatic Mapping API.
If you can’t upgrade beyond Java 7, you can use Hibernate Search versions 5.6.x. People on Java 6 should look at the older versions of Hibernate Search 4.x. For Java 5, you can use Hibernate Search 3.x.
Using Maven
The Hibernate Search artifacts can be found in Maven’s central repository but are released first in the JBoss Maven repository. To make sure you always get the latest updates in a timely manner, we suggest adding this repository to your global settings.xml file (see also Maven Getting Started for more details).
This is all you need to add to your pom.xml to get started:
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>{latest_stable}</version>
</dependency>
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-elasticsearch</artifactId>
    <version>{latest_stable}</version>
</dependency>
Optionally, you may also store your indexes in an Infinispan cluster; see the Infinispan Directory Provider documentation for information about dependencies and how to configure it.
Configuration
The next step is to add a couple of properties to your Hibernate configuration file. If you are using Hibernate directly this can be done in hibernate.properties or hibernate.cfg.xml; if you are using Hibernate via JPA you can add these same properties to persistence.xml.
Most configuration settings have reasonable defaults: to get started you don’t need to add any property.
An example configuration could look like this:
...
<property name="hibernate.search.default.directory_provider"
          value="filesystem"/>
<property name="hibernate.search.default.indexBase"
          value="/var/lucene/indexes"/>
...
First you have to tell Hibernate Search which DirectoryProvider to use. This can be achieved by setting the hibernate.search.default.directory_provider property. Apache Lucene has the notion of a Directory to store the index files. Hibernate Search handles the initialization and configuration of a Lucene Directory instance via a DirectoryProvider. In this tutorial we will use a directory provider storing the index on the file system. This will give us the ability to inspect the Lucene indexes created by Hibernate Search (e.g. via Luke). See Directory Configuration to configure different Directory implementations.
When using the filesystem DirectoryProvider you also have to specify the default base directory for all indexes via the property hibernate.search.default.indexBase.
Let’s assume that your application contains the Hibernate managed classes example.Book and example.Author and you want to add free text search capabilities to your application in order to search the books contained in your database.
package example;
...
@Entity
public class Book {
    @Id
    @GeneratedValue
    private Integer id;
    private String title;
    private String subtitle;
    @ManyToMany
    private Set<Author> authors = new HashSet<Author>();
    private Date publicationDate;
    public Book() {}
    // standard getters/setters follow here
    ...
}

package example;
...
@Entity
public class Author {
    @Id
    @GeneratedValue
    private Integer id;
    private String name;
    public Author() {}
    // standard getters/setters follow here
    ...
}
Enable full-text search capabilities on your Entities
To achieve this you have to add a few annotations to the Book and Author class:
Define which entities need to be indexed
The annotation @Indexed marks Book as an entity which needs to be indexed by Hibernate Search.
Pick a unique identifier
Hibernate Search needs to store the entity identifier in the index for each entity. By default it uses the field marked with @Id for this purpose, but you can override this using @DocumentId (advanced users only).
Choose what to index, and how
Next you have to mark the fields you want to make searchable. Let’s start with title and subtitle and annotate both with @Field.
The parameter index=Index.YES will ensure that the text will be indexed, while analyze=Analyze.YES ensures that the text will be analyzed using the default Lucene analyzer.
Analyzer options are an important concept that is explained in more depth in the reference documentation. For the purpose of a simple introduction, let’s simplify and say that analyzing means chunking a sentence into individual words, lowercasing them and potentially excluding common words like 'a' or 'the'.
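To make this simplified description concrete, here is a minimal sketch in plain Java that chunks a sentence into words, lowercases them and drops common words. It does not use the Lucene analyzer API: the class name AnalysisSketch and its stop-word list are assumptions made up for this illustration, and a real analyzer handles many more cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual sketch only: this is NOT the Lucene analyzer API.
class AnalysisSketch {

    // Hypothetical stop-word list, chosen for this illustration.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    // Splits the text into words, lowercases them and drops stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            String token = word.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [guide, to, java, language]
        System.out.println(analyze("A Guide to the Java Language"));
    }
}
```

The same processing is applied to the indexed text and to the search terms, which is why a search for "JAVA" would match a title containing "Java" in this sketch.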
Store option and Projections
The third parameter, store=Store.NO, ensures that the actual data will not be stored in the index. Whether this data is stored in the index or not has nothing to do with the ability to search for it: the benefit of storing it is the ability to retrieve it via projections (see Projections).
When not using projections Hibernate Search will execute a Lucene query in order to find the database identifiers of the entities matching the query and use these identifiers to retrieve managed objects from the database. If you use projections you might avoid the roundtrip to the database, but this will only return object arrays and not the managed objects you get from a normal query.
Note that index=Index.YES, analyze=Analyze.YES and store=Store.NO are the default values for these parameters and could be omitted.
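The two-step flow described above (the index yields identifiers, which are then used to load the objects from the database) can be pictured with the following plain Java sketch. It is only a mental model: two in-memory maps stand in for the Lucene index and the database, and the class QueryFlowSketch and its methods are hypothetical names, not part of Hibernate Search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual sketch only: in-memory maps stand in for the Lucene index and the database.
class QueryFlowSketch {

    // Stand-in for the index: term -> identifiers of the entities containing it.
    private final Map<String, Set<Integer>> index = new HashMap<>();
    // Stand-in for the database: identifier -> title.
    private final Map<Integer, String> database = new HashMap<>();

    // Stores the title and indexes each of its lowercased words under the identifier.
    void persist(int id, String title) {
        database.put(id, title);
        for (String term : title.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Step 1: the index lookup yields identifiers only.
    Set<Integer> findIds(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Step 2: the identifiers are resolved against the database.
    List<String> search(String term) {
        List<String> results = new ArrayList<>();
        for (Integer id : findIds(term)) {
            results.add(database.get(id));
        }
        return results;
    }

    public static void main(String[] args) {
        QueryFlowSketch sketch = new QueryFlowSketch();
        sketch.persist(1, "Java in Action");
        sketch.persist(2, "Learning SQL");
        sketch.persist(3, "Effective Java");
        // Prints [Java in Action, Effective Java]
        System.out.println(sketch.search("java"));
    }
}
```

In this picture, a projection corresponds to keeping a copy of the title next to the identifiers in the index, so that step 2 could be skipped at the price of returning plain values instead of managed objects.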
Some types might need encoding
The Lucene index is mostly string based, with some additional support for numeric types. For this reason Hibernate Search must convert the data types of the indexed fields to strings and vice versa. The exceptions are properties of types such as Integer, Long and Calendar: these are indexed as NumericField, which means they are encoded in a representation better suited to range queries.
Many predefined bridges are provided. For example, the BooleanBridge encodes properties of type Boolean as the literals "true" or "false", which makes them searchable by keyword.
In the case of our example, the Book entity has a Date property so if we want to make this property searchable too, we will need to annotate it with both @Field and @DateBridge.
For more details see Bridges.
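As a rough illustration of what a bridge does, the sketch below converts a Boolean to a keyword and back, and reduces a Date to day resolution so that two timestamps from the same day produce the same indexed value. The helper names are hypothetical and the yyyyMMdd string format is an assumption chosen for readability; it is not meant to reproduce the encoding that the built-in bridges actually write to the index.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Conceptual sketch only: these helpers are hypothetical and do not reproduce
// the encoding used by the built-in Hibernate Search bridges.
class BridgeSketch {

    // Day-resolution pattern evaluated in UTC (an assumption of this sketch).
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Encodes a Boolean as the keyword "true" or "false".
    static String booleanToString(Boolean value) {
        return value == null ? null : value.toString();
    }

    // Decodes the keyword back into a Boolean.
    static Boolean stringToBoolean(String value) {
        return value == null ? null : Boolean.valueOf(value);
    }

    // Encodes a Date with day resolution: the time of day is discarded.
    static String dateToDayString(Date date) {
        return DAY.format(date.toInstant());
    }

    public static void main(String[] args) {
        Date morning = Date.from(Instant.parse("2020-03-15T08:30:00Z"));
        Date evening = Date.from(Instant.parse("2020-03-15T21:45:00Z"));
        // Both lines print 20200315
        System.out.println(dateToDayString(morning));
        System.out.println(dateToDayString(evening));
    }
}
```

Choosing a coarser resolution means fewer distinct values in the index, at the cost of no longer being able to distinguish times within the same day.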
Indexing of associated entities
The @IndexedEmbedded annotation is used to index associated entities, like those normally defined via @ManyToMany, @OneToOne, @ManyToOne, @Embedded and @ElementCollection.
Note however that the properties of the associated entities are embedded in the same index entry of the entity being marked with @Indexed, essentially denormalizing the data. This is needed since a Lucene index document is a flat data structure which is not suited to store relational information.
In our example, to ensure that the author’s name will be searchable you have to make sure that the names are indexed as part of the book itself. On top of @IndexedEmbedded you will also have to mark all fields of the associated entity you want to have included in the index with @Field. For more details see Embedded and Associated Objects.
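The denormalization described above can be pictured with a small plain Java sketch in which a Map stands in for the flat index document. The class EmbeddedIndexingSketch and its method are hypothetical; the dotted field name authors.name matches the one used when querying the embedded author names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: a Map stands in for a flat Lucene document.
class EmbeddedIndexingSketch {

    // Flattens a book and its authors into a single document.
    // The author names are copied under the dotted field name "authors.name".
    static Map<String, List<String>> toDocument(String title, List<String> authorNames) {
        Map<String, List<String>> document = new LinkedHashMap<>();
        document.put("title", Collections.singletonList(title));
        document.put("authors.name", new ArrayList<>(authorNames));
        return document;
    }

    public static void main(String[] args) {
        List<String> authors = new ArrayList<>();
        authors.add("Alice");
        authors.add("Bob");
        // Prints {title=[Some Book], authors.name=[Alice, Bob]}
        System.out.println(toDocument("Some Book", authors));
    }
}
```

Because the author names are copied into the book's document, a change to an author has to be reflected in the documents of all books that embed that author.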
More advanced models
These are all annotations you need to know about for our quickstart. For more details on entity mapping refer to Mapping an Entity.
package example;
...
@Entity
@Indexed
public class Book {
    @Id
    @GeneratedValue
    private Integer id;
    @Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
    private String title;
    @Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
    private String subtitle;
    @Field(index=Index.YES, analyze=Analyze.NO, store=Store.YES)
    @DateBridge(resolution=Resolution.DAY)
    private Date publicationDate;
    @IndexedEmbedded
    @ManyToMany
    private Set<Author> authors = new HashSet<Author>();
    public Book() {
    }
    // standard getters/setters follow here
    ...
}

package example;
...
@Entity
public class Author {
    @Id
    @GeneratedValue
    private Integer id;
    @Field
    private String name;
    public Author() {
    }
    // standard getters/setters follow here
    ...
}
Indexing
The short answer is that indexing is automatic: Hibernate Search will transparently index every entity each time it’s persisted, updated or removed through Hibernate ORM. Its mission is to keep the index and your database in sync, allowing you to forget about this problem.
However, when introducing Hibernate Search in an existing application, you have to create an initial Lucene index for the data already present in your database.
Once you have added the above properties and annotations, if you have existing data in the database you will need to trigger an initial batch index of your books. This will rebuild your index to make sure your index and your database are in sync. You can achieve this by using one of the following code snippets (see also Rebuilding the whole index):
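// Using a Hibernate Session (org.hibernate.search.Search)
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.createIndexer().startAndWait(); // blocks until done; throws InterruptedException
// Using JPA (org.hibernate.search.jpa.Search)
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
fullTextEntityManager.createIndexer().startAndWait(); // blocks until done; throws InterruptedException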
After executing the above code, you should be able to see a Lucene index under /var/lucene/indexes/example.Book.
The root of the storage path depends on the configuration property hibernate.search.default.indexBase we specified in the configuration step.
You could now inspect this index with Luke. It will help you to understand how Hibernate Search works: Luke allows you to inspect the index contents and structure, similarly to how you would use a SQL console to inspect the working of Hibernate ORM on relational databases.
Searching
Now we’ll finally execute a first search. The general approach is to create a Lucene query, either via the Lucene API (see Building a Lucene query using the Lucene API) or via the Hibernate Search query DSL (Building a Lucene query with the Hibernate Search query DSL), and then wrap this query into an org.hibernate.Query in order to get all the functionality one is used to from the Hibernate API. Essentially:
- Create a Lucene Query (either using Lucene code directly or via the Hibernate Search DSL)
- Wrap the Lucene Query into a Hibernate Query (org.apache.lucene.search.Query → org.hibernate.Query)
- Execute the Hibernate Query
The following code will prepare a query against the indexed fields, execute it and return a list of Books.
EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager =
    org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
em.getTransaction().begin();
// create native Lucene query using the query DSL
// alternatively you can write the Lucene query using the Lucene query parser
// or the Lucene programmatic API. The Hibernate Search DSL is recommended though
QueryBuilder qb = fullTextEntityManager.getSearchFactory()
    .buildQueryBuilder().forEntity(Book.class).get();
org.apache.lucene.search.Query luceneQuery = qb
    .keyword()
    .onFields("title", "subtitle", "authors.name")
    .matching("Java rocks!")
    .createQuery();
// wrap Lucene query in a javax.persistence.Query
javax.persistence.Query jpaQuery =
    fullTextEntityManager.createFullTextQuery(luceneQuery, Book.class);
// execute search
List result = jpaQuery.getResultList();
em.getTransaction().commit();
em.close();
When the Lucene Query is wrapped into a Hibernate or JPA standard Query, all well-known methods of this interface are available.
Introduction to Full-Text
Let’s make things a little more interesting now. Assume that one of your indexed book entities has the title "Refactoring: Improving the Design of Existing Code" and you want to get hits for all of the following queries: "refactor", "refactors", "refactored" and "refactoring". In Lucene this can be achieved by choosing an Analyzer class which applies word stemming during the indexing and during the search process. Hibernate Search offers several ways to configure the analyzer to be used (see Default analyzer and analyzer by class):
- Setting the hibernate.search.analyzer property in the configuration file. The specified class will then be the default analyzer.
- Setting the @Analyzer annotation at the entity level.
- Setting the @Analyzer annotation at the field level.
When using the @Analyzer annotation one can either specify the fully qualified classname of the analyzer to use or one can refer to an analyzer definition defined by the @AnalyzerDef annotation. In the latter case the Solr analyzer framework with its factories approach is utilized. To find out more about the factory classes available you can either browse the Solr JavaDoc or read the corresponding section on the Solr Wiki.
In the example below a StandardTokenizerFactory is used followed by two filter factories, LowerCaseFilterFactory and SnowballPorterFilterFactory. The standard tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact. It is a good general purpose tokenizer. The lowercase filter lowercases the letters in each token whereas the snowball filter finally applies language specific stemming.
Generally, when using the Solr framework you start with a Tokenizer followed by an arbitrary number of filters.
@Entity
@Indexed
@AnalyzerDef(name = "customanalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
            @Parameter(name = "language", value = "English")
        })
    })
public class Book {
    @Id
    @GeneratedValue
    @DocumentId
    private Integer id;
    @Field
    @Analyzer(definition = "customanalyzer")
    private String title;
    @Field
    @Analyzer(definition = "customanalyzer")
    private String subtitle;
    @IndexedEmbedded
    @ManyToMany
    private Set<Author> authors = new HashSet<Author>();
    @Field(index = Index.YES, analyze = Analyze.NO, store = Store.YES)
    @DateBridge(resolution = Resolution.DAY)
    private Date publicationDate;
    public Book() {
    }
    // standard getters/setters follow here
    ...
}
With @AnalyzerDef you only define an analyzer; you still have to apply it to entities and/or properties using @Analyzer. In the above example the customanalyzer is defined but not applied on the entity: it’s applied on the title and subtitle properties only.
An analyzer definition is not scoped to the entity, so you can define it on any entity and reuse the definition on other entities.
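To see why a tokenizer followed by a lowercase filter and a stemming filter makes "refactor", "refactors", "refactored" and "refactoring" match each other, consider the following plain Java sketch. Ordinary functions stand in for the Solr tokenizer and filter factories, and the stemming stage is a deliberately naive suffix stripper rather than the Snowball algorithm; all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Conceptual sketch only: plain Java functions stand in for the Solr tokenizer
// and filter factories, and the "stemmer" is a naive suffix stripper, not Snowball.
class AnalyzerChainSketch {

    // Tokenizer stage: split on anything that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Filter stage 1: lowercase every token.
    static final UnaryOperator<String> LOWER_CASE = t -> t.toLowerCase(Locale.ROOT);

    // Filter stage 2: strip a few English suffixes (far cruder than real stemming).
    static final UnaryOperator<String> NAIVE_STEM = t -> {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (t.length() > suffix.length() + 2 && t.endsWith(suffix)) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    };

    // Runs the tokenizer, then applies the filters to each token in order.
    static List<String> analyze(String text, List<UnaryOperator<String>> filters) {
        List<String> result = new ArrayList<>();
        for (String token : tokenize(text)) {
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            result.add(token);
        }
        return result;
    }

    // Default chain of this sketch: lowercase first, then stem.
    static List<String> analyze(String text) {
        return analyze(text, Arrays.asList(LOWER_CASE, NAIVE_STEM));
    }

    public static void main(String[] args) {
        // Each of these prints [refactor]
        for (String word : new String[] {"refactor", "refactors", "refactored", "Refactoring"}) {
            System.out.println(analyze(word));
        }
    }
}
```

Since the same chain runs at indexing time and at search time, the title and the search term are reduced to the same token, which is what produces the hit.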
What’s next
The above paragraphs gave you an introduction to Hibernate Search, but it supports many more features.
For example, Filters make for a very convenient API to add recurring restrictions, and Spatial Queries can add restrictions based on distance from coordinates.
The next step after this tutorial is to get more familiar with the overall architecture of Hibernate Search (Architecture) and explore the basic features in more detail. Two topics which were only briefly touched on in this tutorial were Analyzer configuration (Default analyzer and analyzer by class) and field bridges (Bridges). Both are important features required for more fine-grained indexing. More advanced topics cover clustering (JMS Master/Slave back end, Infinispan Directory configuration), large index handling (Sharding Indexes), Spatial indexing and Faceting.