Elasticsearch is built on Apache Lucene and was first released by Elasticsearch NV (now Elastic) in 2010. According to the Elastic website, it is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch operations are available through a REST API. The main functions are:

  • Store documents in the index,
  • Search the index with powerful queries to retrieve these documents, and
  • Run analytic functions on the data.

Spring Data Elasticsearch provides a simple interface for performing these operations on Elasticsearch, as an alternative to using the REST API directly.
Here, we will use Spring Data Elasticsearch to demonstrate Elasticsearch's indexing and search functions, and at the end build a simple search application for searching products in a product inventory.

Code example

This article is accompanied by working code examples on GitHub.

Elasticsearch concepts

The easiest way to understand Elasticsearch concepts is through an analogy with a database, as shown in the following table:

Elasticsearch->Database
Index->Table
Document->Row
Field->Column

Any data we want to search or analyze is stored in the index as a document. In Spring Data, we represent a document in the form of a POJO and decorate it with annotations to define the mapping to the Elasticsearch document.

Unlike a database, the text stored in Elasticsearch is first processed by various analyzers. The default analyzer splits the text on common word separators (such as spaces and punctuation marks) and lowercases it. Analyzers can also be configured to remove common English words (stop words).

If we store the text "The sky is blue" with an analyzer that removes stop words, it is stored as a document containing the "terms" "sky" and "blue". We will be able to find this document by searching for "blue sky", "sky", or "blue", with the degree of match returned as a score.
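To make the idea of analysis concrete, here is a minimal Java sketch of a toy analyzer. It is only an illustration under stated assumptions: the class name AnalyzerSketch, the stop-word list, and the "count the shared terms" score are invented for this example and are far simpler than what Elasticsearch really does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Toy illustration only: real Elasticsearch analyzers and scoring are far more capable.
final class AnalyzerSketch {

    // Hypothetical stop-word list chosen for this example.
    private static final Set<String> STOP_WORDS = Set.of("the", "is", "a", "an", "of");

    // Lowercase, split on anything that is not a letter or digit, and drop stop words.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .filter(token -> !STOP_WORDS.contains(token))
                .collect(Collectors.toList());
    }

    // Simplistic score: the number of distinct query terms found in the document.
    static long score(String document, String query) {
        List<String> documentTerms = analyze(document);
        return analyze(query).stream()
                .distinct()
                .filter(documentTerms::contains)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(analyze("The sky is blue"));            // [sky, blue]
        System.out.println(score("The sky is blue", "blue sky"));  // 2
        System.out.println(score("The sky is blue", "blue"));      // 1
    }
}
```

The query text goes through the same analysis as the stored text, which is why "blue sky" matches a document that was stored as "The sky is blue".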

In addition to text, Elasticsearch can also store other types of data, known as field types, as described in the mapping-types section of the documentation.

Start an Elasticsearch instance

Before going any further, let's start an Elasticsearch instance, which we will use to run our example. There are multiple ways to run an Elasticsearch instance:

  • Use a hosted service
  • Use a managed service from a cloud provider such as AWS or Azure
  • Install Elasticsearch ourselves on a cluster of virtual machines
  • Run a Docker image

We will use the Docker image published by Elastic, which is sufficient for our demo application. Let's start the Elasticsearch instance by running the docker run command:

docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:7.10.0

Executing this command starts an Elasticsearch instance listening on port 9200. We can verify the instance status by opening the URL http://localhost:9200 in a browser and checking the output:

{
  "name" : "8c06d897d156",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "Jkx..VyQ",
  "version" : {
    "number" : "7.10.0",
    ...
  },
  "tagline" : "You Know, for Search"
}

If our Elasticsearch instance starts successfully, you should see the above output.

Index and search using REST API

Elasticsearch operations are accessed through the REST API. There are two ways to add documents to the index:

  • Add one document at a time, or
  • Add documents in batches.

The API for adding a single document accepts a document as a parameter.

A simple PUT request to an Elasticsearch instance to store documents is as follows:

PUT /messages/_doc/1
{
  "message": "The Sky is blue today"
}

This will store the message "The Sky is blue today" as a document in an index named "messages".

We can use the search query sent to the search REST API to get this document:

GET /messages/_search
{
  "query":
  {
    "match": {"message": "blue sky"}
  }
}

Here we send a query of type match to fetch documents matching the string "blue sky". We can specify the query used to search for documents in a variety of ways. Elasticsearch provides a JSON-based query DSL (Domain Specific Language) to define queries.
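The same search can be issued from plain Java. The sketch below, which is an illustration and not part of the article's code, uses the JDK's java.net.http API to build the request. The class name SearchRequestSketch and its parameters are hypothetical, and it uses POST, which the search API accepts as well as GET.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class SearchRequestSketch {

    // Builds (but does not send) a search request carrying a match query.
    // Assumes field and text contain no characters that need JSON escaping.
    static HttpRequest matchQuery(String baseUrl, String index, String field, String text) {
        String body = "{\"query\":{\"match\":{\"" + field + "\":\"" + text + "\"}}}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
                matchQuery("http://localhost:9200", "messages", "message", "blue sky");
        // prints "POST http://localhost:9200/messages/_search"
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually run the search against the local instance, the request could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).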

For bulk addition, we need to supply a JSON document containing entries similar to the following snippet:

POST /_bulk
{"index":{"_index":"productindex"}}
{"_class":"..Product","name":"Corgi Toys .. Car",..."manufacturer":"Hornby"}
{"index":{"_index":"productindex"}}
{"_class":"..Product","name":"CLASSIC TOY .. BATTERY"...,"manufacturer":"ccf"}
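The bulk body is newline-delimited JSON: each document is preceded by an action line, and every line, including the last, ends with a newline. A minimal sketch of assembling such a body in Java could look like this. The class BulkBodySketch and the sample documents are hypothetical and only serve as an illustration.

```java
import java.util.List;

final class BulkBodySketch {

    // Builds a bulk body: one action line followed by one source line per document.
    // Each document must already be serialized as a single-line JSON object.
    static String bulkIndexBody(String index, List<String> jsonDocuments) {
        StringBuilder body = new StringBuilder();
        for (String document : jsonDocuments) {
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            body.append(document).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample documents, for illustration only.
        System.out.print(bulkIndexBody("productindex", List.of(
                "{\"name\":\"toy car\",\"manufacturer\":\"Hornby\"}",
                "{\"name\":\"toy battery\",\"manufacturer\":\"ccf\"}")));
    }
}
```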

Use Spring Data for Elasticsearch operations

We have two ways to access Elasticsearch using Spring Data, as follows:

  • Repositories: We define methods in the interface, and Elasticsearch queries are generated based on the method name at runtime.
  • ElasticsearchRestTemplate: We use method chains and native queries to create queries in order to better control the creation of Elasticsearch queries in relatively complex scenarios.

We will examine these two methods in more detail in the following sections.

Create an application and add dependencies

Let's first create our application with Spring Initializr, including the web, thymeleaf, and lombok dependencies. We add the thymeleaf dependency to build the user interface.

Then we add the spring-data-elasticsearch dependency to the Maven pom.xml:

<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-elasticsearch</artifactId>
</dependency>

Connect to Elasticsearch instance

Spring Data Elasticsearch uses the Java High Level REST Client (JHLC) to connect to the Elasticsearch server. JHLC is the default client of Elasticsearch. We will create a Spring Bean configuration to set it up:

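import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;
import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

@Configuration
@EnableElasticsearchRepositories(basePackages
        = "io.pratik.elasticsearch.repositories")
@ComponentScan(basePackages = { "io.pratik.elasticsearch" })
public class ElasticsearchClientConfig extends
         AbstractElasticsearchConfiguration {

  @Override
  @Bean
  public RestHighLevelClient elasticsearchClient() {
    // Connect to the single-node instance started earlier
    final ClientConfiguration clientConfiguration =
      ClientConfiguration
        .builder()
        .connectedTo("localhost:9200")
        .build();

    return RestClients.create(clientConfiguration).rest();
  }
}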

Here, we connect to the Elasticsearch instance we started earlier. We can further customize the connection by setting more properties (such as enabling SSL or setting timeouts).

For debugging and diagnosis, we turn on request/response logging at the transport level in our logging configuration, logback-spring.xml, by setting the logger named tracer to the trace level.

Represent the document

In our example, we will search for products by name, brand, price, or description. To store a product as a document in Elasticsearch, we represent the product as a POJO and add Field annotations to configure the Elasticsearch mapping, as shown below:

@Document(indexName = "productindex")
public class Product {
  @Id
  private String id;
  
  @Field(type = FieldType.Text, name = "name")
  private String name;
  
  @Field(type = FieldType.Double, name = "price")
  private Double price;
  
  @Field(type = FieldType.Integer, name = "quantity")
  private Integer quantity;
  
  @Field(type = FieldType.Keyword, name = "category")
  private String category;
  
  @Field(type = FieldType.Text, name = "desc")
  private String description;
  
  @Field(type = FieldType.Keyword, name = "manufacturer")
  private String manufacturer;


  ...
}

The @Document annotation specifies the index name.

The @Id annotation makes the annotated field the _id of the document, which is the unique identifier within this index. The _id value is limited to 512 bytes.

The @Field annotation configures the type of a field. We can also use it to set a different field name.

Based on these annotations, an index named productindex is created in Elasticsearch.

Use Spring Data Repository for indexing and searching

Repositories provide the most convenient way to access data in Spring Data, through finder methods. Elasticsearch queries are created from the method names. However, we must be careful not to end up with inefficient queries that put a high load on the cluster.

Let's create a Spring Data repository interface by extending the ElasticsearchRepository interface:

public interface ProductRepository
    extends ElasticsearchRepository<Product, String> {

}

Here, the ProductRepository interface inherits methods such as save(), saveAll(), findById(), and findAll() from the ElasticsearchRepository interface.

Indexing

We will now store some products in the index by calling the save() method to add a single product and the saveAll() method for bulk indexing. Before that, we wrap the repository interface in a service class:

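@Service
public class ProductSearchServiceWithRepo {

  private final ProductRepository productRepository;

  // Constructor injection supplies the repository
  public ProductSearchServiceWithRepo(final ProductRepository productRepository) {
    this.productRepository = productRepository;
  }

  // Index several products in one bulk request
  public void createProductIndexBulk(final List<Product> products) {
    productRepository.saveAll(products);
  }

  // Index a single product
  public void createProductIndex(final Product product) {
    productRepository.save(product);
  }
}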

When we call these methods from JUnit, we can see the REST API calls for indexing and bulk indexing in the trace log.

Searching

To meet our search requirements, we will add finder methods to the repository interface:

public interface ProductRepository
    extends ElasticsearchRepository<Product, String> {
  List<Product> findByName(String name);
  
  List<Product> findByNameContaining(String name);
  List<Product> findByManufacturerAndCategory
       (String manufacturer, String category);
}

When running the findByName() method using JUnit, we can see the Elasticsearch query generated in the trace log before it is sent to the server:

TRACE Sending request POST /productindex/_search? ..:
Request body: {.."query":{"bool":{"must":[{"query_string":{"query":"apple","fields":["name^1.0"],..}

Similarly, by running the findByManufacturerAndCategory() method, we can see a query with two query_string parameters, corresponding to the two fields "manufacturer" and "category":

TRACE .. Sending request POST /productindex/_search..:
Request body: {.."query":{"bool":{"must":[{"query_string":{"query":"samsung","fields":["manufacturer^1.0"],..}},{"query_string":{"query":"laptop","fields":["category^1.0"],..}}],..}},"version":true}

A variety of method naming patterns are available, each generating a different kind of Elasticsearch query.
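To give a feel for how a method name can be turned into a query, here is a toy Java sketch that derives a bool/must query with one query_string clause per property, similar in shape to the traces above. This is not Spring Data's actual parser: the class DerivedQuerySketch is hypothetical, it only understands findBy names joined with And, and it does no JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of query derivation; this is not Spring Data's actual parser.
final class DerivedQuerySketch {

    // Extracts property names from a finder name such as findByManufacturerAndCategory.
    static List<String> properties(String methodName) {
        if (!methodName.startsWith("findBy")) {
            throw new IllegalArgumentException("Only findBy... names are supported here");
        }
        List<String> names = new ArrayList<>();
        // Split on "And" only when an upper-case letter follows it.
        for (String part : methodName.substring("findBy".length()).split("And(?=[A-Z])")) {
            names.add(Character.toLowerCase(part.charAt(0)) + part.substring(1));
        }
        return names;
    }

    // Combines one query_string clause per property inside a bool/must query.
    static String toQuery(String methodName, String... arguments) {
        List<String> names = properties(methodName);
        if (names.size() != arguments.length) {
            throw new IllegalArgumentException("One argument per property is required");
        }
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            clauses.add("{\"query_string\":{\"query\":\"" + arguments[i]
                    + "\",\"fields\":[\"" + names.get(i) + "\"]}}");
        }
        return "{\"bool\":{\"must\":[" + String.join(",", clauses) + "]}}";
    }

    public static void main(String[] args) {
        System.out.println(toQuery("findByManufacturerAndCategory", "samsung", "laptop"));
    }
}
```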

Use ElasticsearchRestTemplate for indexing and searching

When we need more control over the way we design queries, or the team has mastered Elasticsearch syntax, Spring Data repositories may no longer be suitable.

In this case, we use ElasticsearchRestTemplate. It is based on the HTTP client of Elasticsearch and replaces the earlier ElasticsearchTemplate, which relied on the TransportClient and its node-to-node binary protocol.

ElasticsearchRestTemplate implements the interface ElasticsearchOperations, which does the heavy lifting for the low-level search and cluster operations.

Indexing

The interface has the method index() for adding a single document and the method bulkIndex() for adding multiple documents to the index. The code snippet here shows how to use bulkIndex() to add multiple products to the index "productindex":

@Service
@Slf4j
public class ProductSearchService {

  private static final String PRODUCT_INDEX = "productindex";
  private ElasticsearchOperations elasticsearchOperations;

  public List<String> createProductIndexBulk
            (final List<Product> products) {

    List<IndexQuery> queries = products.stream()
      .map(product ->
        new IndexQueryBuilder()
          .withId(product.getId().toString())
          .withObject(product).build())
      .collect(Collectors.toList());

    return elasticsearchOperations
      .bulkIndex(queries, IndexCoordinates.of(PRODUCT_INDEX));
  }
  ...
}

The document to be stored is wrapped in an IndexQuery object. The bulkIndex() method takes as input a list of IndexQuery objects and the name of the index wrapped in IndexCoordinates. When we execute this method, we get a REST API trace of the bulk request:

Sending request POST /_bulk?timeout=1m with parameters:
Request body: {"index":{"_index":"productindex","_id":"383..35"}}{"_class":"..Product","id":"383..35","name":"New Apple..phone",..manufacturer":"apple"}
..
{"_class":"..Product","id":"d7a..34",.."manufacturer":"samsung"}

Next, we use the index() method to add a single document:

@Service
@Slf4j
public class ProductSearchService {


  private static final String PRODUCT_INDEX = "productindex";
   
  private ElasticsearchOperations elasticsearchOperations;


  public String createProductIndex(Product product) {


    IndexQuery indexQuery = new IndexQueryBuilder()
         .withId(product.getId().toString())
         .withObject(product).build();


    String documentId = elasticsearchOperations
     .index(indexQuery, IndexCoordinates.of(PRODUCT_INDEX));


    return documentId;
  }
}

The trace accordingly shows the REST API PUT request to add a single document.

Sending request PUT /productindex/_doc/59d..987..:
Request body: {"_class":"..Product","id":"59d..87",..,"manufacturer":"dell"}

Searching

ElasticsearchRestTemplate also has the search() method for searching documents in an index. The search operation resembles an Elasticsearch query: we build a Query object and pass it to the search method.

The Query object comes in three variants, NativeQuery, StringQuery, and CriteriaQuery, depending on how we construct the query. Let's build a few queries for searching products.

NativeQuery

NativeQuery provides maximum flexibility for building queries using objects that represent Elasticsearch constructs (such as aggregations, filters, and sorts). Here is a NativeQuery for searching products matching a specific manufacturer:

@Service
@Slf4j
public class ProductSearchService {


  private static final String PRODUCT_INDEX = "productindex";
  private ElasticsearchOperations elasticsearchOperations;


  public void findProductsByBrand(final String brandName) {


    QueryBuilder queryBuilder =
      QueryBuilders
      .matchQuery("manufacturer", brandName);


    Query searchQuery = new NativeSearchQueryBuilder()
      .withQuery(queryBuilder)
      .build();


    SearchHits<Product> productHits =
      elasticsearchOperations
      .search(searchQuery,
          Product.class,
          IndexCoordinates.of(PRODUCT_INDEX));
  }
}

Here, we use NativeSearchQueryBuilder to build a query that uses a MatchQueryBuilder to specify a match query on the field "manufacturer".

StringQuery

StringQuery provides full control by allowing native Elasticsearch queries to be used as JSON strings, as follows:

@Service
@Slf4j
public class ProductSearchService {


  private static final String PRODUCT_INDEX = "productindex";
  private ElasticsearchOperations elasticsearchOperations;


  public void findByProductName(final String productName) {
    Query searchQuery = new StringQuery(
      "{\"match\":{\"name\":{\"query\":\""+ productName + "\"}}}");

    SearchHits<Product> products = elasticsearchOperations.search(
      searchQuery,
      Product.class,
      IndexCoordinates.of(PRODUCT_INDEX));
  ...     
   }
}

In this code snippet, we specified a simple match query to get products with specific names sent as method parameters.
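Because the query is assembled by string concatenation, a product name containing a double quote or a backslash would produce invalid JSON. One way to guard against that is to escape the value first. The following is a minimal sketch with a hypothetical helper class JsonEscapeSketch. In a real application a JSON library would usually be a better choice.

```java
final class JsonEscapeSketch {

    // Escapes characters that would otherwise break a JSON string literal.
    static String escapeJson(String value) {
        StringBuilder escaped = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  escaped.append("\\\""); break;
                case '\\': escaped.append("\\\\"); break;
                case '\n': escaped.append("\\n"); break;
                case '\r': escaped.append("\\r"); break;
                case '\t': escaped.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become \\uXXXX escapes
                        escaped.append(String.format("\\u%04x", (int) c));
                    } else {
                        escaped.append(c);
                    }
            }
        }
        return escaped.toString();
    }

    // Builds the same match query body as above, with the name escaped.
    static String matchByName(String productName) {
        return "{\"match\":{\"name\":{\"query\":\"" + escapeJson(productName) + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(matchByName("15\" laptop"));
    }
}
```

The resulting string could then be passed to new StringQuery(...) in place of the raw concatenation.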

CriteriaQuery

With CriteriaQuery, we can build queries without knowing any Elasticsearch terminology. The query is built by chaining methods on Criteria objects. Each object specifies some criteria for searching documents:

@Service
@Slf4j
public class ProductSearchService {


  private static final String PRODUCT_INDEX = "productindex";
   
  private ElasticsearchOperations elasticsearchOperations;


  public void findByProductPrice(final String productPrice) {
    Criteria criteria = new Criteria("price")
                  .greaterThan(10.0)
                  .lessThan(100.0);


    Query searchQuery = new CriteriaQuery(criteria);


    SearchHits<Product> products = elasticsearchOperations
       .search(searchQuery,
           Product.class,
           IndexCoordinates.of(PRODUCT_INDEX));
  }
}

In this code snippet, we use CriteriaQuery to form a query that fetches products whose price is greater than 10.0 and less than 100.0.

Build a search application

We will now add a user interface to our application to see the product search in action. The user interface will have a search input box for searching products by name or description. The input box will have an auto-completion function that displays a list of suggestions based on the available products.

We will create auto-completion suggestions for the user's search input, and then search for products whose name or description closely matches the search text entered by the user. We will build two search services to implement this use case:

  • Get search suggestions for the auto-completion function
  • Process the search to find products matching the user's search query

The service class ProductSearchService will contain methods for searching and for getting suggestions.

The complete application with a user interface is available in the GitHub repository.

Build product search index

The productindex is the same index we used earlier to run the JUnit tests. We will first delete productindex using the Elasticsearch REST API, so that it is created afresh during application startup with products loaded from our sample dataset of 50 fashion-line products:

curl -X DELETE http://localhost:9200/productindex

If the delete operation is successful, we will receive the message {"acknowledged": true}.

Now, let's create an index for the products in our inventory. We will use a sample dataset of 50 products to build the index. The products are arranged as separate rows in a CSV file.

Each row has three attributes: id, name, and description. We want the index to be created during application startup. Please note that in an actual production environment, index creation should be a separate process. We will read each row of the CSV and add it to the product index:

@SpringBootApplication
@Slf4j
public class ProductsearchappApplication {
  ...
  @PostConstruct
  public void buildIndex() {
    esOps.indexOps(Product.class).refresh();
    productRepo.saveAll(prepareDataset());
  }


  private Collection<Product> prepareDataset() {
    Resource resource = new ClassPathResource("fashion-products.csv");
    ...
    return productList;
  }
}

In this snippet, we do some preprocessing by reading rows from the data set and passing these rows to the saveAll() method of the repository to add products to the index. When running the application, we can see the following trace log during application startup.

...Sending request POST /_bulk?timeout=1m with parameters:
Request body: {"index":{"_index":"productindex"}}{"_class":"io.pratik.elasticsearch.productsearchapp.Product","name":"Hornby 2014 Catalogue","description":"Product Desc..talogue","manufacturer":"Hornby"}{"index":{"_index":"productindex"}}{"_class":"io.pratik.elasticsearch.productsearchapp.Product","name":"FunkyBuys..","description":"Size Name:Lar..& Smoke","manufacturer":"FunkyBuys"}{"index":{"_index":"productindex"}}.
...

Search products using multi-field and fuzzy search

Here is how we process a submitted search request in the method processSearch():

@Service
@Slf4j
public class ProductSearchService {


  private static final String PRODUCT_INDEX = "productindex";


  private ElasticsearchOperations elasticsearchOperations;


  public List<Product> processSearch(final String query) {
    log.info("Search with query {}", query);

    // 1. Create query on multiple fields enabling fuzzy search
    QueryBuilder queryBuilder =
      QueryBuilders
        .multiMatchQuery(query, "name", "description")
        .fuzziness(Fuzziness.AUTO);

    Query searchQuery = new NativeSearchQueryBuilder()
      .withFilter(queryBuilder)
      .build();

    // 2. Execute search
    SearchHits<Product> productHits =
      elasticsearchOperations
        .search(searchQuery, Product.class,
          IndexCoordinates.of(PRODUCT_INDEX));

    // 3. Map searchHits to product list
    List<Product> productMatches = new ArrayList<Product>();
    productHits.forEach(searchHit -> {
      productMatches.add(searchHit.getContent());
    });
    return productMatches;
  }
  ...
}

Here, we perform a search on multiple fields: name and description. We also attach fuzziness() to match closely related text, which accounts for spelling errors.
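Fuzzy matching is based on edit distance, and with Fuzziness.AUTO the number of allowed edits depends on the length of the term: none for terms of 1 or 2 characters, one for 3 to 5 characters, and two for longer terms. The sketch below illustrates the idea with the classic Levenshtein distance. It is a simplification under stated assumptions: the class FuzzinessSketch is hypothetical, and unlike Elasticsearch's default behaviour it does not count a swap of two adjacent characters as a single edit.

```java
final class FuzzinessSketch {

    // Maximum edits allowed by AUTO fuzziness with its default length thresholds (3 and 6).
    static int autoMaxEdits(String term) {
        int length = term.length();
        if (length <= 2) {
            return 0;
        }
        if (length <= 5) {
            return 1;
        }
        return 2;
    }

    // Classic Levenshtein distance: insertions, deletions, and substitutions.
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] current = new int[b.length() + 1];
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            previous = current;
        }
        return previous[b.length()];
    }

    // True when the indexed term is within the edits allowed for the query term.
    static boolean fuzzyMatches(String queryTerm, String indexedTerm) {
        return editDistance(queryTerm, indexedTerm) <= autoMaxEdits(queryTerm);
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("jaket", "jacket")); // true: one missing letter
        System.out.println(fuzzyMatches("re", "red"));       // false: short terms must match exactly
    }
}
```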

Use wildcard search to get suggestions

Next, we build an auto-completion function for the search text box. When we enter something in the search text field, we will get suggestions by performing a wildcard search using the characters entered in the search box.

We build this function in the fetchSuggestions() method as follows:

@Service
@Slf4j
public class ProductSearchService {


  private static final String PRODUCT_INDEX = "productindex";

  private ElasticsearchOperations elasticsearchOperations;

  public List<String> fetchSuggestions(String query) {
    QueryBuilder queryBuilder = QueryBuilders
      .wildcardQuery("name", query+"*");


    Query searchQuery = new NativeSearchQueryBuilder()
      .withFilter(queryBuilder)
      .withPageable(PageRequest.of(0, 5))
      .build();


    SearchHits<Product> searchSuggestions =
      elasticsearchOperations.search(searchQuery,
        Product.class,
      IndexCoordinates.of(PRODUCT_INDEX));
    
    List<String> suggestions = new ArrayList<String>();
    
    searchSuggestions.getSearchHits().forEach(searchHit->{
      suggestions.add(searchHit.getContent().getName());
    });
    return suggestions;
  }
}

We build a wildcard query from the search input text with * appended, so that if we enter "red", we get suggestions starting with "red". We use the withPageable() method to limit the number of suggestions to 5.
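The intended behaviour (prefix match, capped at a fixed number of results) can be pictured with a purely in-memory sketch. This is only an analogy with hypothetical names (SuggestionSketch, suggest): Elasticsearch evaluates the wildcard against the terms in its index, not against a Java list.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class SuggestionSketch {

    // Returns at most `limit` names that start with the typed prefix, ignoring case.
    static List<String> suggest(List<String> productNames, String prefix, int limit) {
        String needle = prefix.toLowerCase(Locale.ROOT);
        return productNames.stream()
                .filter(name -> name.toLowerCase(Locale.ROOT).startsWith(needle))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical product names, for illustration only.
        List<String> names = List.of("Red shirt", "red socks", "Blue jeans");
        System.out.println(suggest(names, "red", 5)); // [Red shirt, red socks]
    }
}
```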

Conclusion

In this article, we introduced the main operations of Elasticsearch (indexing documents, bulk indexing, and searching), which are provided as a REST API. The combination of the Query DSL and different analyzers makes searching very powerful.

Spring Data Elasticsearch provides a convenient interface for accessing these operations in an application, using either Spring Data Repositories or ElasticsearchRestTemplate.

Finally, we built an application in which we saw how Elasticsearch's bulk indexing and search capabilities can be used in a close-to-real-life scenario.



信码由缰