What is RediSearch?
RediSearch is a Full-Text Search engine, available as a module for Redis. This article provides a hands-on tutorial with RediSearch, as well as some important Full-Text Search concepts needed to get a good grasp on RediSearch.
Prerequisites
Basic knowledge of programming and databases is enough to follow this guide. Some understanding of Redis helps, but it is not required.
What is Full Text Search and how does it differ from wildcard search?
In a full-text search, a search engine examines all of the words in every stored document as it tries to match search criteria, e.g. the text specified by a user. Unlike the wildcard search that databases offer via a LIKE query, full-text search is primarily based on natural language processing.
The table below highlights the major differences between wildcard search and full-text search:
Wildcard Search (via LIKE query) | Full-Text Search |
---|---|
Supported by databases such as PostgreSQL and MySQL | Supported by RediSearch, Elasticsearch, PostgreSQL, etc. |
Based on wildcards, e.g. ‘%ang%’ matches ‘mango’ and ‘angel’ | Based on language processing, e.g. “going” can match “go” |
Usually slow, especially on large datasets | Usually much faster, since queries are answered from a pre-built index |
Usually limited to a single attribute, e.g. search by name | Can search multiple attributes in a single query, e.g. by name and description |
Installation
For this hands-on tutorial, RediSearch can be installed with the steps below, provided Docker is already installed locally.
docker run -p 6379:6379 redislabs/redisearch:latest
Then in a separate terminal tab:
docker ps # Get the container id
docker exec -it {container_id} redis-cli
Alternatively, you can follow the quick start guide.
Hands-On with Redis-CLI
After completing the installation steps above, you should be able to use redis-cli with the RediSearch module loaded. The following steps show how to create an index and add documents to it.
Indexes
An index can be thought of as a table or collection of records, e.g. a collection of products. To create a new index, use the FT.CREATE command:
FT.CREATE products ON HASH PREFIX 1 product: SCHEMA name TEXT SORTABLE quantity NUMERIC SORTABLE description TEXT
One important thing to note is that indexing is based on the specified prefix: every Redis hash whose key starts with the prefix product: is added to this index.
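The prefix rule can be illustrated with a small plain-Ruby sketch that needs no Redis server. The helper `indexed_keys` is hypothetical: it only mirrors the behaviour described above, where a hash is picked up when its key starts with one of the prefixes listed after PREFIX, and it is not how RediSearch implements the matching internally.

```ruby
# Hypothetical helper: selects the hash keys that an index created with
# `PREFIX <count> <prefix> ...` would pick up, i.e. keys that start with
# at least one of the given prefixes.
def indexed_keys(keys, prefixes)
  keys.select { |key| prefixes.any? { |prefix| key.start_with?(prefix) } }
end

keys = ['product:1', 'product:2', 'products:3', 'order:1']

p indexed_keys(keys, ['product:'])           # => ["product:1", "product:2"]
p indexed_keys(keys, ['product:', 'order:']) # => ["product:1", "product:2", "order:1"]
```

Note that products:3 is not matched by product:, because the comparison is a literal string prefix. This is worth keeping in mind when choosing prefixes, since a single missing or extra character means a hash is silently left out of the index.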
Documents
These are the actual records in the index, e.g. the individual products in the collection of products. To add a document to the index, use the HMSET command with a key that starts with the index prefix, e.g. product:1, product:2 for products.
HMSET product:1 name "Apple Juice" quantity 2 description "Fresh apple juice"
HMSET product:2 name "Mango Juice" quantity 4 description "Fresh mango juice"
HMSET product:3 name "Grape Smoothie" quantity 5 description "Fresh grape smoothie"
HMSET is a standard Redis command for setting hash fields (deprecated since Redis 4.0 in favor of HSET, which accepts the same arguments) and has nothing to do with the RediSearch module. Because our index is defined on the product: prefix, every hash written this way is added to the index.
Search
To search for documents, use the FT.SEARCH command. Searches are case-insensitive for Latin characters. RediSearch primarily supports two kinds of search:
- Prefix-based Search
This type of search is based on a prefix of individual terms:
FT.SEARCH products app* // Returns Apple Juice
FT.SEARCH products jui* // Returns Apple Juice and Mango Juice
FT.SEARCH products @name:app* // Searches the name field only
The table below lists terms that could be used to find each value with a prefix search:
Indexed Value | Possible Prefix Search Terms |
---|---|
Playstation | Pl, Pla, Play |
Chocolate Ice Cream | Ch, Cho, Ic, Ice, Cr, Cre |
Mango-Juice | Ma, Man, Ju, Jui |
- Fuzzy Search
This type of search is based on the Levenshtein Distance (L.D.). The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other. For instance, the distance between guava and grape is 3.
A maximum distance from 1 to 3 can be specified by wrapping the term in that many % characters:
FT.SEARCH products %jui% // Matches terms within an L.D. of 1
FT.SEARCH products %%jui%% // Matches terms within an L.D. of 2
FT.SEARCH products %%%jui%%% // Matches terms within an L.D. of 3
Beware that this type of search can return highly inaccurate results when a high L.D. is specified. For instance, with the L.D. set to 3, searching for the term dog can return cat as a result.
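To make the two query types concrete, here is a simplified plain-Ruby model of them. It is a sketch under our own assumptions, not RediSearch's implementation: values are split into lowercase terms on whitespace and hyphens only, stemming is ignored, and `MIN_PREFIX = 2` mirrors RediSearch's default MINPREFIX setting (the minimum length of a prefix query). The `levenshtein` function is the standard dynamic-programming algorithm and reproduces the distances quoted above.

```ruby
# Simplified, hypothetical model of the two query types above.
# Assumptions: values are split into lowercase terms on whitespace and
# hyphens only; stemming and RediSearch's other separators are ignored.
MIN_PREFIX = 2 # mirrors RediSearch's default MINPREFIX setting

def terms(value)
  value.downcase.split(/[\s\-]+/).reject(&:empty?)
end

# Standard dynamic-programming Levenshtein distance: the minimum number of
# single-character insertions, deletions, or substitutions turning a into b.
def levenshtein(a, b)
  previous = (0..b.length).to_a # distances from "" to each prefix of b
  a.each_char.with_index(1) do |char_a, i|
    current = [i] # distance from the first i chars of a to ""
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      deletion = previous[j] + 1
      insertion = current[j - 1] + 1
      substitution = previous[j - 1] + cost
      current << [deletion, insertion, substitution].min
    end
    previous = current
  end
  previous.last
end

# Models a query like `jui*`: some term of the value starts with the prefix.
def prefix_match?(value, prefix)
  prefix = prefix.downcase
  prefix.length >= MIN_PREFIX && terms(value).any? { |term| term.start_with?(prefix) }
end

# Models `%term%` (distance 1) through `%%%term%%%` (distance 3).
def fuzzy_match?(value, term, max_distance)
  raise ArgumentError, 'max_distance must be between 1 and 3' unless (1..3).cover?(max_distance)

  terms(value).any? { |candidate| levenshtein(candidate, term.downcase) <= max_distance }
end

puts levenshtein('guava', 'grape')         # => 3
puts levenshtein('dog', 'cat')             # => 3
puts prefix_match?('Mango-Juice', 'Jui')   # => true
puts prefix_match?('Mango-Juice', 'ngo')   # => false
puts fuzzy_match?('Mango Juice', 'jui', 1) # => false
puts fuzzy_match?('Mango Juice', 'jui', 2) # => true
```

In this model, jui is at distance 2 from juice, so a distance of 1 is not enough to reach Mango Juice. Unlike the ‘%ang%’ wildcard, a prefix query such as ngo* does not match mango, because it only looks at the start of each term.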
Listing All Entries
The same FT.SEARCH command can also list all entries in the index (by default, only the first 10 results are returned; use LIMIT to fetch more):
FT.SEARCH products *
Pagination and Sorting
Basic pagination and sorting can be implemented with the LIMIT and SORTBY arguments:
FT.SEARCH products {term}* LIMIT {offset} {limit} SORTBY {sort_field} {ASC|DESC}
FT.SEARCH products jui* LIMIT 0 10 SORTBY quantity DESC
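In application code, the LIMIT arguments usually come from a page number. The sketch below is a hypothetical helper (`search_args` is our own name, not part of redis-rb or RediSearch) that converts a 1-based page into the offset/limit pair, upper-cases the sort direction, and returns the command as an array of arguments.

```ruby
# Hypothetical helper (not part of redis-rb or RediSearch): builds the
# FT.SEARCH arguments for a 1-based page number and an optional sort.
VALID_DIRECTIONS = %w[ASC DESC].freeze

def search_args(index, query, page: 1, per_page: 10, sort_field: nil, direction: :asc)
  raise ArgumentError, 'page must be >= 1' if page < 1

  offset = (page - 1) * per_page # LIMIT takes an offset, not a page number
  args = ['FT.SEARCH', index, query, 'LIMIT', offset.to_s, per_page.to_s]
  return args if sort_field.nil?

  direction = direction.to_s.upcase
  unless VALID_DIRECTIONS.include?(direction)
    raise ArgumentError, "direction must be ASC or DESC, got #{direction}"
  end

  args + ['SORTBY', sort_field.to_s, direction]
end

p search_args('products', 'jui*', page: 2, sort_field: :quantity, direction: :desc)
# => ["FT.SEARCH", "products", "jui*", "LIMIT", "10", "10", "SORTBY", "quantity", "DESC"]
```

Validating the direction up front gives a clear Ruby error instead of relying on the server to reject an unknown keyword.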
For a detailed reference on the query syntax, refer to the official RediSearch documentation.
Tokenization and Escaping
Tokenization is an important concept to understand when implementing search, especially if the search needs to support special characters.
When a product is created with the name Apple Juice or Apple-Juice, the name is split into the terms Apple and Juice. So it is not possible to search with a term like Apple-Ju*, because the field is stored as two separate terms in the index.
These are some of the rules used for tokenization:
- Whitespace and the special characters ,.<>{}[]"':;!@#$%^&*()-+=~ break the text into terms. For instance, foo-bar is broken into the terms foo and bar.
- All Latin characters are converted to lowercase, so search is always case-insensitive.
For further information, refer to https://redis.io/docs/interact/search-and-query/advanced-concepts/escaping/.
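The rules above can be approximated in a few lines of Ruby. This is a simplified model under our own assumptions, not RediSearch's tokenizer: it splits on whitespace and the listed separator characters and lowercases the terms, but ignores stemming, stop words, and backslash escapes.

```ruby
# Separator characters from the tokenization rules above.
SEPARATORS = ',.<>{}[]"\':;!@#$%^&*()-+=~'

# Simplified model of the default tokenization: split on runs of
# whitespace or separators, drop empty pieces, and lowercase every term.
def tokenize(text)
  pattern = /[\s#{Regexp.escape(SEPARATORS)}]+/
  text.split(pattern).reject(&:empty?).map(&:downcase)
end

p tokenize('Apple-Juice')         # => ["apple", "juice"]
p tokenize('foo-bar.baz')         # => ["foo", "bar", "baz"]
p tokenize('Chocolate Ice Cream') # => ["chocolate", "ice", "cream"]
```

Because Apple-Juice and Apple Juice produce the same terms, a query for Apple-Ju* has no single term to match against.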
Escaping special characters
To search with special characters, the special characters must be escaped with a backslash (typically written as a double backslash \\ inside a quoted string), both when creating the record and when performing the search query. This way, the escaped value Apple\-Juice is kept as the single term apple-juice instead of being split into apple and juice.
In our Ruby implementation, we added the following method to escape special characters:
module StringExtensions
refine String do
def escape_special_characters
# List of characters from https://redis.io/docs/interact/search-and-query/advanced-concepts/escaping/
# ,.<>{}[]"':;!@#$%^&*()-+=~
pattern = %r{(\'|\"|\.|\,|\;|\<|\>|\{|\}|\[|\]|\"|\'|\=|\~|\*|\:|\#|\+|\^|\$|\@|\%|\!|\&|\)|\(|/|\-|\\)}
gsub(pattern) { |match| '\\' + match }
end
end
end
# Sample Usage
# using StringExtensions
# 'Apple-Juice'.escape_special_characters
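To see what escaping changes, the sketch below models tokenization of escaped text. It rests on our reading of the escaping documentation linked above: a backslash-escaped separator is treated as part of the term instead of splitting it. Both methods are hypothetical, simplified stand-ins written for this illustration; `escape_special_characters` here is a plain method rather than the refinement above, so the block runs on its own.

```ruby
# Separator characters from the tokenization rules, as an array.
ESCAPABLE = ',.<>{}[]"\':;!@#$%^&*()-+=~'.chars.freeze

# Simplified stand-in for the StringExtensions refinement above:
# prefixes every separator, slash, or backslash with a backslash.
def escape_special_characters(text)
  pattern = Regexp.union(ESCAPABLE + ['/', '\\'])
  text.gsub(pattern) { |match| "\\#{match}" }
end

# Hypothetical model of tokenizing text that contains escapes: a
# backslash-escaped separator stays inside the current term.
def tokenize_with_escapes(text)
  terms = []
  current = +''
  escaped = false
  text.each_char do |char|
    if escaped
      current << char # keep the escaped character as part of the term
      escaped = false
    elsif char == '\\'
      escaped = true
    elsif char.match?(/\s/) || ESCAPABLE.include?(char)
      terms << current.downcase unless current.empty?
      current = +''
    else
      current << char
    end
  end
  terms << current.downcase unless current.empty?
  terms
end

escaped = escape_special_characters('Apple-Juice')
puts escaped                           # prints Apple\-Juice
p tokenize_with_escapes(escaped)       # => ["apple-juice"]
p tokenize_with_escapes('Apple-Juice') # => ["apple", "juice"]
```

This is why the same escaping has to be applied when writing the record and when building the query: an escaped query term only matches a term that was also escaped at indexing time.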
Demo Application
As there are no official RediSearch libraries from Redis for either Ruby or Ruby on Rails, we built a custom implementation. The demo application can be found in this GitHub repository.
Below are some snippets from the codebase, focusing on how to interact with the RediSearch module using Ruby.
- The Gemfile includes the redis gem (developed as redis-rb), the standard Ruby client for Redis.
# Gemfile
gem 'redis'
- To initialize a connection to a Redis database.
# lib/redis.rb
REDIS = Redis.new(url: 'redis://redis:6379')
- To create an index using FT.CREATE.
# lib/redisearch/index.rb
module RediSearch
class Index
class << self
# Creates a new index
# @param [String] name
# @param [String] prefix
# @param [Hash] schema
# Example Usage:
# RediSearch::Index.create(name: 'products', prefix: 'products:', schema: { id: 'NUMERIC SORTABLE', name: 'TEXT SORTABLE' })
def create(name:, prefix:, schema:)
command = "FT.CREATE #{name} ON HASH PREFIX 1 #{prefix} SCHEMA #{schema.to_a.flatten.join(' ')}"
REDIS.call(command.split(' '))
end
end
end
end
- To add a record to the Redis database.
REDIS.mapped_hmset("products:1", { id: 1, name: 'mango', quantity: 2 })
- To search the Redis database using the RediSearch module.
#lib/redisearch/document.rb
require_relative '../string_extensions'
module RediSearch
class Document
using ::StringExtensions
class << self
# Searches an index for a particular term
# @param [String] index_name
# @param [String] term
# @param [Class] klass You can specify the underlying object class for mapping the results
# @param [Hash] filters
# @param [Hash] paging
# @param [Hash] sort
# Example Usage:
# RediSearch::Document.search(index_name: "business-products-1",
# term: "app",
# klass: Product,
# filters: { stock_quantity: { min: 0, max: 4 } },
# paging: { limit: 10, offset: 0 },
# sort: { key: :stock_quantity, direction: :desc })
def search(index_name:, term: nil, klass: nil, filters: {}, paging: {}, sort: {})
command = build_query(index_name: index_name, term: term)
command = apply_filters(command: command, filters: filters) unless filters.empty?
command = apply_pagination(command: command, paging: paging) unless paging.empty?
command = apply_sorting(command: command, sort: sort) unless sort.empty?
parse_results(value: REDIS.call(command.split(' ')), klass: klass)
end
private
def build_query(index_name:, term: nil)
# Build a new string instead of mutating a literal (safe with frozen string literals)
query = term.nil? ? '*' : "#{term.escape_special_characters}*"
"FT.SEARCH #{index_name} #{query}"
end
def apply_pagination(command:, paging:)
command << " LIMIT #{paging[:offset]} #{paging[:limit]}"
end
def apply_sorting(command:, sort:)
command << " SORTBY #{sort[:key]} #{sort[:direction]}"
end
# The remaining implementation (apply_filters, parse_results) can be found at https://github.com/nimblehq/redisearch-ruby-demo
end
end
end
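One caveat with the snippets above: they build each command as a string and then call command.split(' '), so a search term containing a space, such as mango juice, ends up as separate arguments. A possible alternative, sketched below with hypothetical method names, is to build the argument array directly; it produces the same kind of array the snippets already pass to REDIS.call.

```ruby
# Hypothetical array-based builders for the commands used in the demo.
# Keeping each argument as its own array element means a query such as
# "mango juice*" stays a single argument instead of being split on spaces.
def create_index_args(name:, prefix:, schema:)
  # A type such as 'TEXT SORTABLE' expands into the tokens TEXT and SORTABLE.
  schema_args = schema.flat_map { |field, type| [field.to_s, *type.split] }
  ['FT.CREATE', name, 'ON', 'HASH', 'PREFIX', '1', prefix, 'SCHEMA', *schema_args]
end

def search_command_args(index_name:, query:)
  ['FT.SEARCH', index_name, query]
end

p create_index_args(name: 'products', prefix: 'product:',
                    schema: { name: 'TEXT SORTABLE', quantity: 'NUMERIC SORTABLE' })
# => ["FT.CREATE", "products", "ON", "HASH", "PREFIX", "1", "product:", "SCHEMA", "name", "TEXT", "SORTABLE", "quantity", "NUMERIC", "SORTABLE"]

p search_command_args(index_name: 'products', query: 'mango juice*')
# => ["FT.SEARCH", "products", "mango juice*"]
p 'FT.SEARCH products mango juice*'.split(' ')
# => ["FT.SEARCH", "products", "mango", "juice*"]
```

The last line shows the string-splitting approach turning one query into two arguments, which the array-based builders avoid.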
Deployment
There are two options for deployment:
- Self-managed Redis using the official Docker image.
- Redis Cloud: the official cloud-based offering from Redis.
✋ At the time of writing, AWS ElastiCache does not support adding modules such as RediSearch.
Conclusion
Should you use RediSearch instead of Elasticsearch? As with any engineering decision, it depends. The pros and cons below can help you pick one over the other, or at least understand the trade-offs of using RediSearch instead of Elasticsearch.
Pros
- Faster in some benchmarks
- If you are already using Redis for cache, the same Redis host and dataset can be used for Full-Text searching.
Cons
- Limited search algorithms compared to Elasticsearch.
- Only a default tokenizer is available, so, unlike Elasticsearch, there is no n-gram tokenizer.
- Limited libraries available for several languages and platforms.
- Community adoption is relatively low at the moment.