Getting Started with RediSearch

Hands-On and Deep-Dive into RediSearch, a full-text search offering by Redis.
Ankit Kalia
December 29, 2020

What is RediSearch?

RediSearch is a Full-Text Search engine, available as a module for Redis. This article provides a hands-on tutorial with RediSearch, as well as some important Full-Text Search concepts needed to get a good grasp on RediSearch.

Prerequisites

Basic knowledge of programming and databases is enough to follow this guide. Some familiarity with Redis helps, but it is not required.

In a full-text search, a search engine examines all of the words in every stored document as it tries to match the search criteria, e.g. the text specified by a user. Unlike the wildcard search that databases offer via LIKE queries, full-text search is primarily based on natural language processing. The major differences between wildcard search and full-text search are highlighted below:

| Wildcard Search (via LIKE query) | Full-Text Search |
|---|---|
| Supported by databases such as PostgreSQL and MySQL | Supported by RediSearch, Elasticsearch, PostgreSQL, etc. |
| Based on a wildcard, e.g. ‘%ang%’ will match ‘mango’ and ‘angel’ | Based on language processing, e.g. “gone” can match “going” |
| Usually slower, especially on large datasets | Much better performance |
| Usually on a single attribute, e.g. search by name | Can search by multiple attributes in a single query, e.g. search by name and description |
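
To make the contrast in the table more concrete, here is a minimal Ruby sketch. It is only an illustration and not how RediSearch is implemented internally: the sample data and the method names (`wildcard_search`, `build_inverted_index`, `term_search`) are hypothetical. It compares a substring scan with a lookup in a simple inverted index, which maps each term to the records that contain it.

```ruby
# Hypothetical sample data; the names are illustrative only.
PRODUCT_NAMES = {
  1 => 'Mango Juice',
  2 => 'Angel Cake',
  3 => 'Apple Juice'
}.freeze

# Wildcard-style search (like SQL LIKE '%ang%'): scans every record and
# checks for a raw substring, ignoring word boundaries.
def wildcard_search(records, fragment)
  records.select { |_id, name| name.downcase.include?(fragment.downcase) }.keys
end

# Full-text style search: build an inverted index (term => record ids) once,
# then answer each query with a hash lookup instead of a full scan.
def build_inverted_index(records)
  records.each_with_object(Hash.new { |hash, term| hash[term] = [] }) do |(id, name), index|
    name.downcase.split(/\W+/).reject(&:empty?).uniq.each { |term| index[term] << id }
  end
end

# Looks up a whole term; unknown terms return an empty list.
def term_search(index, term)
  index.fetch(term.downcase, [])
end

index = build_inverted_index(PRODUCT_NAMES)
puts wildcard_search(PRODUCT_NAMES, 'ang').inspect # => [1, 2]
puts term_search(index, 'juice').inspect           # => [1, 3]
```

A real full-text engine adds more on top of this, such as stemming and scoring, so treat the sketch as a mental model only.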

Installation

For this hands-on, RediSearch can be installed with the steps below, provided Docker is already installed in the local environment.

docker run -p 6379:6379 redislabs/redisearch:latest

Then in a separate terminal tab:

docker ps # Get the container id
docker exec -it {container_id} redis-cli

Alternatively, you can follow the quick start guide.

Hands-On with Redis-CLI

After following the steps above for installation, you should be able to use Redis-CLI with the RediSearch module loaded. The steps below describe how to create indexes and add documents to the index.

Indexes

Indexes can be visualized like a table or collection of records e.g. a collection of products. To create a new index, use the command FT.CREATE:

FT.CREATE products ON HASH PREFIX 1 product: SCHEMA name TEXT SORTABLE quantity NUMERIC SORTABLE description TEXT

One important thing to note here is that indexing is based on the prefix specified. So all Redis hash keys which start with prefix product: will be added to this index.
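
As a rough illustration of the prefix rule, the Ruby sketch below filters a list of key names the same way the index definition above would select them. The helper name `keys_for_prefix` and the sample keys are hypothetical, and the check is a plain string comparison rather than a call to Redis.

```ruby
# Hypothetical helper: returns the keys that an index declared with
# `PREFIX 1 product:` would pick up, based purely on the key name.
def keys_for_prefix(keys, prefix)
  keys.select { |key| key.start_with?(prefix) }
end

keys = ['product:1', 'product:2', 'order:1', 'products:3']
puts keys_for_prefix(keys, 'product:').inspect # => ["product:1", "product:2"]
```

Note that `products:3` is not selected: the prefix has to match exactly, including the colon.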

Documents

These are the actual records in the index, e.g. individual products in the collection of products. To add a document to the index, use the command HMSET with a key that starts with the index prefix, e.g. product:1 or product:2 for products.

HMSET product:1 name "Apple Juice" quantity 2 description "Fresh apple juice"
HMSET product:2 name "Mango Juice" quantity 4 description "Fresh mango juice"
HMSET product:3 name "Grape Smoothie" quantity 5 description "Fresh grape smoothie"

HMSET is a standard Redis command for setting the fields of a hash and has nothing to do with the RediSearch module (since Redis 4.0.0 it is deprecated in favor of HSET, which accepts the same field-value pairs). Because our index is defined by a prefix, every hash whose key starts with that prefix is added to the index.
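
When documents come from application code rather than the CLI, the field-value pairs usually start out as a hash. The sketch below shows one possible way to flatten such a hash into command arguments in Ruby. The helper name `hset_arguments` is hypothetical, and the sketch uses HSET, which accepts multiple field-value pairs since Redis 4.0.0. It only builds the argument list and does not talk to Redis.

```ruby
# Hypothetical helper: turns a Ruby hash of attributes into the flat
# argument list that HSET expects (key, field, value, field, value, ...).
def hset_arguments(key, attributes)
  ['HSET', key, *attributes.flat_map { |field, value| [field.to_s, value.to_s] }]
end

arguments = hset_arguments('product:1', { name: 'Apple Juice', quantity: 2 })
puts arguments.inspect
# => ["HSET", "product:1", "name", "Apple Juice", "quantity", "2"]
```

Keeping the arguments in an array means a value containing spaces, such as "Apple Juice", stays a single argument.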

To search for a document, we can use the command FT.SEARCH. The search is case insensitive for all Latin characters. RediSearch supports primarily two algorithms for search:

  • Prefix-based Search

This type of search is based on a prefix of individual terms:


FT.SEARCH products app* // Returns the product named Apple Juice
FT.SEARCH products jui* // Returns the products named Apple Juice and Mango Juice
FT.SEARCH products @name:app* // Restricts the search to a specific field
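
To get a feel for which prefixes can match a given field value, here is a small Ruby sketch. It is an approximation under stated assumptions: the helper name `searchable_prefixes` is hypothetical, the splitting rule is simplified to "anything that is not a letter or digit", and the minimum prefix length of 2 is assumed to be the RediSearch default.

```ruby
# Assumed minimum prefix length (believed to be the RediSearch default).
MIN_PREFIX_LENGTH = 2

# Splits a value into lowercase terms, then lists every prefix of each term
# that is at least MIN_PREFIX_LENGTH characters long (the full term included).
def searchable_prefixes(value)
  value.downcase.split(/[^a-z0-9]+/).reject(&:empty?).flat_map do |term|
    (MIN_PREFIX_LENGTH..term.length).map { |length| term[0, length] }
  end
end

puts searchable_prefixes('Mango-Juice').inspect
# => ["ma", "man", "mang", "mango", "ju", "jui", "juic", "juice"]
```

Each returned string, followed by `*`, is a candidate query term for the value.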

The table below lists the terms that could be used to search based on a prefix:

| Dataset | Possible Search Terms |
|---|---|
| Playstation | Pl, Pla, Play |
| Chocolate Ice Cream | Ch, Cho, Ic, Ice, Cr, Cre |
| Mango-Juice | Ma, Man, Ju, Jui |

  • Fuzzy Search

This type of search is based on Levenshtein Distance (L.D.). The Levenshtein distance between two words is the minimum number of single-character edits required to change one word into the other. For instance, the distance between guava and grape is 3.

A maximum distance from 1 to 3 can be specified by wrapping the term in one to three pairs of % signs:

FT.SEARCH products %jui% // Matches terms within an L.D. of 1
FT.SEARCH products %%jui%% // Matches terms within an L.D. of 2
FT.SEARCH products %%%jui%%% // Matches terms within an L.D. of 3

Beware that this type of search can return highly inaccurate results when a high value of L.D is specified. For instance, with L.D set to 3, searching for the term dog can return cat as a result.
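
The distances quoted above can be checked with a short Ruby sketch. This is the textbook dynamic-programming algorithm, written for illustration only. It makes no claim about how RediSearch computes fuzzy matches internally, and the method name `levenshtein` is our own.

```ruby
# Computes the Levenshtein distance between two strings with the classic
# dynamic-programming approach, keeping only the previous row in memory.
def levenshtein(source, target)
  previous = (0..target.length).to_a
  source.each_char.with_index(1) do |source_char, i|
    current = [i]
    target.each_char.with_index(1) do |target_char, j|
      cost = source_char == target_char ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match when cost is 0)
      ].min
    end
    previous = current
  end
  previous.last
end

puts levenshtein('guava', 'grape') # => 3
puts levenshtein('dog', 'cat')     # => 3
puts levenshtein('jui', 'juice')   # => 2
```

The last line shows why a short query such as jui needs a distance of 2 before it can reach the full word juice.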

Listing All Entries

We can use the same FT.SEARCH command to list all entries in the index as well:

FT.SEARCH products *

Pagination and Sorting

Basic pagination and sorting can be implemented:

FT.SEARCH products {term}* LIMIT {offset} {limit} SORTBY {sort_field} {sort_direction}
FT.SEARCH products jui* LIMIT 0 10 SORTBY quantity DESC
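
Applications usually think in page numbers rather than offsets. The Ruby sketch below shows one way to translate a 1-based page number into the LIMIT pair and to append an optional SORTBY clause. The helper name `paginated_search_arguments` and its parameters are hypothetical. It only builds the argument list and does not send anything to Redis.

```ruby
# Hypothetical helper: converts a 1-based page number into the
# LIMIT offset/count pair and appends an optional SORTBY clause.
def paginated_search_arguments(index:, query:, page:, per_page:, sort_by: nil, direction: 'ASC')
  offset = (page - 1) * per_page
  arguments = ['FT.SEARCH', index, query, 'LIMIT', offset.to_s, per_page.to_s]
  arguments += ['SORTBY', sort_by.to_s, direction.to_s.upcase] if sort_by
  arguments
end

arguments = paginated_search_arguments(
  index: 'products', query: 'jui*', page: 3, per_page: 10,
  sort_by: :quantity, direction: :desc
)
puts arguments.join(' ')
# => FT.SEARCH products jui* LIMIT 20 10 SORTBY quantity DESC
```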

For a detailed reference on the query syntax for search, refer to the official documentation.

Tokenization and Escaping

This is an important concept to understand while implementing a search, especially if you want to implement search with special characters.

When a product is created with the name Apple Juice or Apple-Juice, it is split into terms Apple and Juice. So it is not possible to search using a term like Apple-Ju*, as the field is split into two different terms in the database.

These are some of the rules used for tokenization:

  • Special characters including ,.<>{}[]"':;!@#$%^&*()-+=~ break the text into terms. For instance, foo-bar will be broken into foo, bar terms.
  • All Latin characters are converted to lowercase, so search will always be case insensitive.
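
The two rules above can be approximated in a few lines of Ruby. This is a simplified sketch for building intuition, not the actual RediSearch tokenizer: the constant `SEPARATORS` and the method `tokenize` are hypothetical names, and Ruby's `downcase` also lowercases non-Latin letters, which goes beyond the rule stated above.

```ruby
# Separator characters listed in the tokenization rules above.
SEPARATORS = ',.<>{}[]"\':;!@#$%^&*()-+=~'.freeze

# Replaces every separator with a space, lowercases the text,
# and splits the result on whitespace.
def tokenize(text)
  separator_pattern = Regexp.union(SEPARATORS.chars)
  text.gsub(separator_pattern, ' ').downcase.split
end

puts tokenize('Apple-Juice').inspect         # => ["apple", "juice"]
puts tokenize('Chocolate Ice Cream').inspect # => ["chocolate", "ice", "cream"]
```

This is why a query such as Apple-Ju* cannot match by default: the index only contains the separate terms.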

For further information, refer to https://oss.redis.com/redisearch/Escaping/.

Escaping special characters

To search with special characters, escape them with a double backslash (\\), both when creating the record and when performing the search query. This way, the term Apple-Juice would be split into Apple, Juice, and Apple-Juice.

In the following Ruby implementation, we added the method below to escape special characters.

module StringExtensions
  refine String do
    def escape_special_characters
      # List of characters from https://oss.redis.com/redisearch/Escaping/
      # ,.<>{}[]"':;!@#$%^&*()-+=~ (this pattern also escapes / and \)
      pattern = %r{(\'|\"|\.|\,|\;|\<|\>|\{|\}|\[|\]|\=|\~|\*|\:|\#|\+|\^|\$|\@|\%|\!|\&|\)|\(|/|\-|\\)}
      gsub(pattern) { |match| '\\' + match }
    end
  end
end

# Sample Usage
# using StringExtensions
# 'Apple-Juice'.escape_special_characters

Demo Application

As there are no official libraries available from Redis for either Ruby or Ruby on Rails, we had to build a custom implementation. The demo application can be found in this GitHub repository.

Some snippets from the codebase have been added below focusing on how to interact with the RediSearch module using Ruby.

  • The Gemfile includes the redis gem (the redis-rb library), which is the standard Ruby library to interact with Redis.
# Gemfile
gem 'redis'
  • To initialize a connection to a Redis database.
# lib/redis.rb
REDIS = Redis.new(url: 'redis://redis:6379')
  • To create an index using FT.CREATE.
# lib/redisearch/index.rb
module RediSearch
  class Index
    class << self
      # Creates a new index
      # @param [String] name
      # @param [String] prefix
      # @param [Hash] schema
      # Example Usage:
      # RediSearch::Index.create(name: 'products', prefix: 'products:', schema: { id: 'NUMERIC SORTABLE', name: 'TEXT SORTABLE' })
      def create(name:, prefix:, schema:)
        command = "FT.CREATE #{name} ON HASH PREFIX 1 #{prefix} SCHEMA #{schema.to_a.flatten.join(' ')}"
        REDIS.call(command.split(' '))
      end
    end
  end
end
  • To add a record to the Redis database.
REDIS.mapped_hmset("products:1", { id: 1, name: 'mango', quantity: 2 })
  • To search the Redis database using the RediSearch module.
# lib/redisearch/document.rb

require_relative '../string_extensions'

module RediSearch
  class Document
    using ::StringExtensions

    class << self
      # Performs a search on an index for a particular term
      # @param [String] index_name
      # @param [String] term
      # @param [Class] klass You can specify the underlying object class for mapping the results
      # @param [Hash] filters
      # @param [Hash] paging
      # @param [Hash] sort
      # Example Usage:
      # RediSearch::Document.search(index_name: "business-products-1",
      #                             term: "app",
      #                             klass: Product,
      #                             filters: { stock_quantity: { min: 0, max: 4 } },
      #                             paging: { limit: 10, offset: 0 },
      #                             sort: { key: :stock_quantity, direction: :desc })
      def search(index_name:, term: nil, klass: nil, filters: {}, paging: {}, sort: {})
        command = build_query(index_name: index_name, term: term)
        command = apply_filters(command: command, filters: filters) unless filters.empty?
        command = apply_pagination(command: command, paging: paging) unless paging.empty?
        command = apply_sorting(command: command, sort: sort) unless sort.empty?

        parse_results(value: REDIS.call(command.split(' ')), klass: klass)
      end
     
      private

      def build_query(index_name:, term: nil)
        query = '*'
        query.prepend(term.escape_special_characters) unless term.nil?

        "FT.SEARCH #{index_name} #{query}"
      end

      def apply_pagination(command:, paging:)
        command << " LIMIT #{paging[:offset]} #{paging[:limit]}"
      end

      def apply_sorting(command:, sort:)
        command << " SORTBY #{sort[:key]} #{sort[:direction]}"
      end

      # The remaining implementation can be found in the codebase at https://github.com/nimblehq/redisearch-ruby-demo
    end
  end
end
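
The result parsing is left to the repository, but its general idea can be sketched. The Ruby example below is our own assumption of what such a step might look like, not the code from the demo application. The method name `parse_search_reply` is hypothetical, and it assumes the default FT.SEARCH reply shape: the total count first, followed by alternating keys and flat field-value arrays. Options such as NOCONTENT or WITHSCORES change that shape and are not handled here.

```ruby
# Hypothetical parser for a raw FT.SEARCH reply, assuming the default shape:
# [total, key_1, [field, value, ...], key_2, [field, value, ...], ...]
def parse_search_reply(reply)
  total, *entries = reply
  documents = entries.each_slice(2).map do |key, fields|
    # Pair up the flat field/value array and keep the Redis key alongside it.
    { 'key' => key }.merge(fields.each_slice(2).to_h)
  end
  { total: total, documents: documents }
end

# Hand-written sample reply for illustration; not captured from a real server.
sample_reply = [
  2,
  'product:1', ['name', 'Apple Juice', 'quantity', '2'],
  'product:2', ['name', 'Mango Juice', 'quantity', '4']
]

result = parse_search_reply(sample_reply)
puts result[:total]                              # => 2
puts result[:documents].map { |doc| doc['name'] }.inspect
# => ["Apple Juice", "Mango Juice"]
```

From here, each document hash could be mapped onto an application class, which is the role the `klass` parameter plays in the search method above.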

Deployment

There are two options for deployment:

✋ At the time of writing this article, AWS ElastiCache does not support options to add modules like RediSearch.

Conclusion

Should you use RediSearch instead of Elasticsearch? As with any engineering decision, it depends. The pros and cons below can help you pick one over the other, or at least understand the trade-offs of using RediSearch instead of Elasticsearch.

Pros

  • Faster in some benchmarks
  • If you are already using Redis for cache, the same Redis host and dataset can be used for Full-Text searching.

Cons

  • Limited search algorithms compared to Elasticsearch.
  • Only a default tokenizer is available, therefore there is no capability to use an n-gram tokenizer like Elasticsearch.
  • Limited libraries available for several languages and platforms.
  • Community adoption is relatively low at the moment.
