First impressions of Meilisearch and how it compares to Elasticsearch

Thursday, Jan 26, 2023

tl;dr Meilisearch is like Elasticsearch but simpler. There's decent parity in functionality and performance, and it's definitely intriguing if you don't already know Elasticsearch or want to run with fewer resources.

Meilisearch is a full-text search engine that you can use to power a really good site-search. My personal blog uses Elasticsearch but I wanted to experiment with switching to Meilisearch. This blog post is about some impressions based on that experiment.

Here are some of my observations:

Memory usage

When I start Elasticsearch and index all my blog posts and all comments, the java process uses 1.3GB according to Activity Monitor. The meilisearch process peaks at 290MB.

Indexing performance

In my case, it doesn't matter. When you update the index incrementally (as I do with Elasticsearch), each update takes a negligible amount of time when the dataset is small.
If you do a mass reindexing, in Elasticsearch you do that by creating a new index with a timestamp in its name (e.g. blogpost-20230126134512) and then swapping the alias that your search queries are sent to. With that strategy, it doesn't matter how many seconds it takes because nobody's waiting for it to finish.
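
A minimal sketch of that reindex-and-swap strategy, building only the index name and the request body for Elasticsearch's `_aliases` endpoint. The function names and the "old" index name are made up for illustration, and no Elasticsearch server is contacted here.

```python
from datetime import datetime


def timestamped_index_name(prefix: str, now: datetime) -> str:
    """Build an index name such as 'blogpost-20230126134512'."""
    return f"{prefix}-{now.strftime('%Y%m%d%H%M%S')}"


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Request body for `POST /_aliases`.

    Elasticsearch applies all actions in one request atomically, so searches
    against the alias never see a moment without an index behind it.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical example values; the old index name is invented.
new_name = timestamped_index_name("blogpost", datetime(2023, 1, 26, 13, 45, 12))
body = alias_swap_actions("blogpost", "blogpost-20230101000000", new_name)
print(new_name)
print(body)
```

The new index gets fully populated first, and only then is the body sent to swap the alias, which is why the indexing time itself doesn't matter.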

At the moment I don't even know how to append more to a Meilisearch index. I only know how to index everything all at once.

Note that with the Elasticsearch SDK (in Python) you can pass a generator to the parallel_bulk helper function, meaning you can kinda "stream" in all the documents without loading them all into memory. With the Meilisearch Python client I couldn't find an equivalent. I.e. I can't do queued = index.add_documents(get_docs_iterator()) so I have to instead do queued = index.add_documents(list(get_docs_iterator())).
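
One possible middle ground, sketched under the assumption that add_documents only needs a list per call: chop the generator into fixed-size batches so that only one batch is in memory at a time. Meilisearch's add-documents endpoint adds or replaces documents by primary key, so repeated calls build up the same index. The `batched` helper and the stand-in `get_docs_iterator` below are hypothetical, and the actual Meilisearch call is left as a comment so the sketch runs offline.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def get_docs_iterator() -> Iterator[dict]:
    """Hypothetical stand-in for the real document generator."""
    for i in range(5):
        yield {"id": i, "title": f"Post {i}"}


batches = list(batched(get_docs_iterator(), 2))
print([len(batch) for batch in batches])

# In real use, something along these lines (not executed here):
# for batch in batched(get_docs_iterator(), 500):
#     queued = index.add_documents(batch)
```

Each call enqueues its own indexing task, so this trades one big request for several smaller ones.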

Complex relevancy is easier, but more magic

Ranking search results by "matchness" is easy. I.e. the more "present" the search terms are in a document, the more relevant it's likely to be. Elasticsearch sorts by _score by default. Meilisearch sorts by relevancy by default too. In reality, you want to control that more. In my use case, I want the ranking to be a function of "matchness" and popularity (which is a number I derive from pageview metrics). In Elasticsearch you have the power of writing a "Function score query", which gives you lots of flexibility to control exactly how the two are combined. E.g. multiply, sum, average, or a combination where you apply a log function.
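
As a rough illustration of what such a query body can look like, here is a sketch that adds a log-dampened popularity to the text-match score. The field names (`title`, `body`, `popularity`), the `log1p` modifier, and the `sum` boost mode are assumptions chosen for the example, not necessarily what this blog uses.

```python
def popularity_function_score(search_terms: str) -> dict:
    """Build an Elasticsearch search body combining text score and popularity."""
    return {
        "query": {
            "function_score": {
                # The regular full-text match that produces the base _score.
                "query": {
                    "multi_match": {
                        "query": search_terms,
                        "fields": ["title", "body"],
                    }
                },
                # Turn the numeric field into a score: log(1 + factor * popularity).
                "field_value_factor": {
                    "field": "popularity",
                    "modifier": "log1p",
                    "factor": 1.0,
                    "missing": 0,
                },
                # How the function's value is combined with the base _score.
                "boost_mode": "sum",
            }
        }
    }


search_body = popularity_function_score("meilisearch")
print(search_body)
```

Swapping `boost_mode` for `multiply` or `avg` changes how the two numbers are combined, which is the flexibility described above.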

With Meilisearch you can only control the order in which the relevancy "algorithms" (ranking rules) are applied. An example, where the line marked with + is the one I added:


[
  "words",
+ "popularity:desc",
  "typo",
  "proximity",
  "attribute",
  "sort",
  "exactness"
]

I can't quite put into words how I'd prefer it to work, but it feels a bit like magic. The lack of functionality also speaks to a strength of Meilisearch: it's easy to get something working.
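
For reference, a small sketch of how that list could be produced from the built-in rules (the list above without the added line). The helper name is made up, and the call that sends it to Meilisearch is shown only as a comment so the sketch runs without a server.

```python
from typing import List

# The built-in ranking rules, i.e. the list above without the added line.
DEFAULT_RANKING_RULES = ["words", "typo", "proximity", "attribute", "sort", "exactness"]


def insert_rule_after(rules: List[str], new_rule: str, after: str) -> List[str]:
    """Return a copy of `rules` with `new_rule` placed right after `after`."""
    position = rules.index(after) + 1
    return rules[:position] + [new_rule] + rules[position:]


ranking_rules = insert_rule_after(DEFAULT_RANKING_RULES, "popularity:desc", "words")
print(ranking_rules)

# With the Python client this would then be applied roughly like so (not executed here):
# client.index("blogitems").update_ranking_rules(ranking_rules)
```

Rules are applied in order, each one breaking ties left by the previous one, so placing popularity right after "words" makes it win over typo count, proximity and the rest.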

Highlighting is easy

What I want is for the title to be highlighted in full. For the text, I'd rather have snippets that focus on where the highlights appear within the text. This was a breeze! Here's how you do it:


res = client.index("blogitems").search(
    q,
    {
        "attributesToHighlight": ["title", "text"],
        "highlightPreTag": "<mark>",
        "highlightPostTag": "</mark>",
        "attributesToCrop": ["text"],
        "cropLength": 30,
    },
)
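
With those options, each hit in the response carries a `_formatted` object holding the highlighted and cropped versions of the fields. Below is a sketch of pulling those out for rendering; the sample response is hand-written for illustration (not real output) and the `summarize_hit` helper is hypothetical.

```python
def summarize_hit(hit: dict) -> dict:
    """Prefer the highlighted/cropped values, falling back to the raw fields."""
    formatted = hit.get("_formatted", hit)
    return {
        "title": formatted.get("title", ""),
        "snippet": formatted.get("text", ""),
    }


# Hand-written sample shaped like a search response; the values are invented.
sample_response = {
    "hits": [
        {
            "id": 1,
            "title": "First impressions of Meilisearch",
            "text": "My personal blog uses Elasticsearch but I wanted to experiment...",
            "_formatted": {
                "title": "First impressions of <mark>Meilisearch</mark>",
                "text": "…to experiment with switching to <mark>Meilisearch</mark>. This…",
            },
        }
    ]
}

results = [summarize_hit(hit) for hit in sample_response["hits"]]
print(results)
```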

Relevancy in between fields is too basic

In Elasticsearch you use a boost multiplier to specify that the title is more important than the body. For example:


from elasticsearch_dsl import Q

...

title_match = Q("match", title={"query": word, "boost": 3.5}) 
body_match = Q("match", body={"query": word, "boost": 1.0})
match = title_match | body_match

That lets you express that a match on the title is 3.5 times more important than a match on the body.

With Meilisearch you have no such functionality (as far as I can see) but you can say that title > body, and the way you do that is a bit odd. It's the order in which you define the searchable fields.
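
A sketch of what that looks like, assuming the Python client's `update_searchable_attributes` method on an index. To keep it runnable without a server, a small recording stand-in class (hypothetical, not part of any library) takes the place of the real index object.

```python
from typing import List, Optional


class RecordingIndex:
    """Hypothetical stand-in for a Meilisearch index object, so this runs offline."""

    def __init__(self) -> None:
        self.searchable_attributes: Optional[List[str]] = None

    def update_searchable_attributes(self, attributes: List[str]) -> None:
        self.searchable_attributes = list(attributes)


def prioritize_fields(index, fields_by_importance: List[str]) -> None:
    """Earlier fields in the list count for more in the "attribute" ranking rule."""
    index.update_searchable_attributes(fields_by_importance)


index = RecordingIndex()
prioritize_fields(index, ["title", "text"])  # title > text
print(index.searchable_attributes)
```

Note there's no way to say "how much" more important here, only the order.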

Search performance

My dataset is small. About 2,000 documents. I wasn't able to measure a significant difference. In both of my search query implementations, a search takes between 10 and 30 milliseconds. Both are fast enough. The networking to and from the search engine probably matters more, but if the network is localhost, the search time is irrelevant.
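
For anyone wanting to repeat the comparison, here is a sketch of one way to time a search callable. It is not the measurement method used for the numbers above; `fake_search` is a hypothetical placeholder for a function that calls either engine.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(search: Callable[[str], object], query: str, runs: int = 20) -> float:
    """Run `search(query)` several times and return the median wall-clock time in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        search(query)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


def fake_search(query: str) -> dict:
    """Placeholder; swap in a function that queries Elasticsearch or Meilisearch."""
    return {"hits": [], "query": query}


latency = median_latency_ms(fake_search, "meilisearch", runs=5)
print(f"median: {latency:.3f} ms")
```

Using the median rather than the mean keeps one slow outlier (a cold cache, say) from skewing the result.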

In conclusion

When you're already comfortable and well versed in the more powerful beast that is Elasticsearch, Meilisearch is less relevant. However, Meilisearch feels like a nicer experience in its simplicity if you're confronted with a choice on your next full-text search project.

You could say that in terms of core search functionality, to me, Meilisearch sits between PostgreSQL's full-text search and Elasticsearch.

What often matters more, if the project is a team effort that involves many people and might outlast you, is the operational side. I.e. do you install it yourself or do you use a proprietary cloud provider (which both Elastic and Meilisearch Cloud are)? That's what needs to be considered more carefully. It's good to know, though, that Meilisearch has most of the core functionality, plus great documentation, to build something really great.

Comments

Mikael March 9, 2023

Great review!

We’re also missing function scoring in Meili, we use that quite heavily in ES (decay, for example).

Lack of aliasing or some other mechanism to handle reindexing is also notable.

I hope they’ll add it so we can switch.