In the first part of this series, I explained how to use Elasticsearch for traditional keyword search. Elasticsearch excels at keyword-based searches, and it’s one of the primary reasons many organizations use it. I also presented an approach to using Elasticsearch in combination with semantic search and explained why it was not effective.
In this article, I will present a more robust approach to combining keyword search and semantic search. But first, let's look at the strengths of keyword search that make it worth keeping alongside semantic search.
Keyword search in systems like Elasticsearch leverages inverted indices, which allow for rapid lookups of terms in large datasets. For short, specific queries, this can result in lightning-fast responses.
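To make the idea concrete, here is a minimal sketch of an inverted index in plain Python. The corpus and the tokenization (simple whitespace splitting) are simplified assumptions; Elasticsearch builds a far more sophisticated version of this structure automatically at ingest time.

```python
from collections import defaultdict

# Hypothetical mini-corpus; document IDs map to raw text.
docs = {
    1: "elasticsearch excels at keyword search",
    2: "semantic search infers meaning from embeddings",
    3: "keyword search uses an inverted index",
}

# Build the inverted index: each term maps to the set of document IDs containing it.
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted_index[term].add(doc_id)

def search(query):
    """Return IDs of documents containing every query term (AND semantics)."""
    results = None
    for term in query.split():
        postings = inverted_index.get(term, set())
        results = postings if results is None else results & postings
    return results or set()

print(search("keyword search"))  # {1, 3}
```

Because each lookup is a direct dictionary access followed by small set intersections, query time depends on the length of the query and the size of the matching posting lists, not on the total size of the corpus, which is why short, specific queries are so fast.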
For precise queries containing unique terms or names, keyword search can often return more accurate results than semantic search. This is because it’s looking for exact or near-exact matches rather than trying to infer meaning.
Short questions can sometimes be challenging for semantic search models. When the question is concise and the analyzed chunks of text are significantly larger, the semantic meaning derived from embeddings might not align perfectly. The vector representation of a short question might not capture its nuance as effectively as that of a larger chunk of text.
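The alignment between a question and a chunk is typically scored with cosine similarity between their embedding vectors. The toy three-dimensional vectors below are purely illustrative assumptions (real embedding models produce hundreds of dimensions), but they show how a short question and a large chunk can end up with only a middling similarity score.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings: the short question's vector is dominated by
# one dimension, while the large chunk's meaning is spread across several.
short_question = [0.9, 0.1, 0.0]
large_chunk = [0.4, 0.5, 0.6]

score = cosine_similarity(short_question, large_chunk)
print(round(score, 3))
```

A score like this may fall below a retrieval threshold even when the chunk actually contains the answer, which is exactly the failure mode described above.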
Keyword searches excel when granularity matters. For instance, if you’re looking for mentions of a particular name or term, keyword search will pinpoint those occurrences. In contrast, semantic search might overlook them if the overall context of the chunk doesn’t align semantically with the query, even if the exact term is present.
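In Elasticsearch, this kind of exact-occurrence lookup can be expressed with a `match_phrase` query. The sketch below shows the query DSL as a Python dictionary; the index name `articles` and the field name `content` are hypothetical placeholders.

```python
# A match_phrase query in Elasticsearch's query DSL, expressed as a Python dict.
# Index name "articles" and field "content" are assumptions for illustration.
query = {
    "query": {
        "match_phrase": {
            "content": "inverted index"
        }
    }
}

# With the official Python client this would be sent along the lines of:
# es.search(index="articles", body=query)
print(query["query"]["match_phrase"]["content"])
```

Unlike a semantic query, this will surface every chunk containing the literal phrase, regardless of whether the chunk's overall meaning resembles the query.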
While modern semantic search models, especially those based on transformers like BERT or its variants, are powerful, they can be computationally intensive. For vast datasets and frequent searches, keyword searches might scale better in terms of cost and performance.
Now that I have hopefully convinced you of the importance of combining semantic search and traditional keyword search, let’s define the approach we will follow.