When it comes to optimizing Magento 2 Elasticsearch configuration, one often overlooked yet critical feature is word delimiters. If you’ve ever scratched your head wondering why certain search queries don’t return the expected results—especially when dealing with SKUs, hyphenated words, or product codes—the culprit could very well be Magento 2’s Elasticsearch word delimiter settings.
In this guide, we’ll break down what word delimiters are, why they matter, and how to configure them correctly in your Magento 2 store.
What Are Word Delimiters in Elasticsearch?
In Elasticsearch, word delimiters come into play during tokenization, the step in which text is broken into smaller parts (tokens) for indexing and searching.
A delimiter is simply a character that separates words or tokens. Common delimiters include:
- Hyphens (-)
- Underscores (_)
- Spaces ( )
- Periods (.)
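To make the idea concrete, here is a minimal Python sketch that splits text on the four delimiters listed above. It only illustrates the concept: the function name split_on_delimiters is made up for this example, and Elasticsearch's real tokenizers follow more elaborate rules.

```python
import re

# The four delimiters from the list above: hyphen, underscore, space, period.
DELIMITERS = re.compile(r"[-_ .]+")

def split_on_delimiters(text):
    """Split text into tokens wherever one or more delimiters occur."""
    return [part for part in DELIMITERS.split(text) if part]

print(split_on_delimiters("ABC-123-XYZ"))           # ['ABC', '123', 'XYZ']
print(split_on_delimiters("high_speed usb.cable"))  # ['high', 'speed', 'usb', 'cable']
```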
Why Word Delimiters Matter in Magento 2
Imagine you have two SKUs:
- ABC-123-XYZ
- 123-XYZ
If a customer searches for “123,” they might expect to see both products. However, if the SKU is indexed as a single token instead of being split at its hyphens, only 123-XYZ might appear: “123” matches the start of that SKU, but it sits in the middle of “ABC-123-XYZ.” Whether the hyphen (-) acts as a separator depends on how the field is tokenized.
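The difference can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch itself. It assumes, purely for illustration, that an unsplit SKU is matched by prefix and that a split SKU is matched part by part. Both helper names are hypothetical.

```python
def matches_as_prefix(term, sku):
    """Whole-SKU indexing: the term only matches at the start of the SKU."""
    return sku.lower().startswith(term.lower())

def matches_any_part(term, sku):
    """Split indexing: the term matches any hyphen-separated part."""
    return term.lower() in sku.lower().split("-")

skus = ["ABC-123-XYZ", "123-XYZ"]

prefix_hits = [s for s in skus if matches_as_prefix("123", s)]
split_hits = [s for s in skus if matches_any_part("123", s)]

print(prefix_hits)  # ['123-XYZ']
print(split_hits)   # ['ABC-123-XYZ', '123-XYZ']
```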
The Role of Tokenizers and Analyzers in Magento 2
To understand how word delimiters work, it’s essential to grasp two key Elasticsearch components:
- Tokenizers: Break text into tokens based on rules (like spaces or hyphens).
- Analyzers: Combine a tokenizer with token filters that apply additional processing, such as lowercasing or removing stop words.
Magento 2 builds its analyzers from Elasticsearch’s built-in tokenizers and filters, but you can customize these to handle word delimiters more effectively.
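An analyzer can be pictured as a tokenizer followed by a chain of token filters. The Python sketch below mimics that pipeline with plain functions. The function names and the tiny stop-word list are assumptions for illustration and do not correspond to Elasticsearch or Magento code.

```python
def whitespace_tokenizer(text):
    """Tokenizer stage: split the text on runs of whitespace."""
    return text.split()

def lowercase_filter(tokens):
    """Token filter stage: lowercase every token."""
    return [t.lower() for t in tokens]

def stop_filter(tokens, stop_words=frozenset({"the", "a", "an", "of"})):
    """Token filter stage: drop common stop words."""
    return [t for t in tokens if t not in stop_words]

def analyze(text, tokenizer, filters):
    """Run the tokenizer, then apply each filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

result = analyze("The High-Speed USB Cable",
                 whitespace_tokenizer,
                 [lowercase_filter, stop_filter])
print(result)  # ['high-speed', 'usb', 'cable']
```

Note that the whitespace tokenizer leaves "high-speed" intact, which is why a separate word delimiter filter is needed to split it.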
Common Search Issues Caused by Word Delimiters
- SKU Search Fails with Partial Matches: Searching for part of a SKU like “123” doesn’t return all relevant products.
- Hyphenated Product Names Are Missed: Products like “High-Speed-USB” don’t appear when searching “High Speed.”
- Inconsistent Results with Special Characters: Searches behave unpredictably when special characters are involved.
How to Configure Word Delimiters in Magento 2 Elasticsearch
Magento 2 doesn’t offer direct UI options for advanced Elasticsearch configurations like word delimiters. You’ll need to modify the Elasticsearch index settings via command line or API.
Step 1: Identify Your Elasticsearch Index
Magento creates an index for product data, typically named like:
magento2_product_1_v1
You can list all indexes using:
curl -X GET "localhost:9200/_cat/indices?v"
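If the listing is long, a short script can pick out the product indexes. The sketch below parses the text that the command above prints with the ?v header row. The function name is hypothetical, the sample text is a made-up stand-in for real output, and the magento2_product_ prefix is taken from the example name above and may differ in your store.

```python
def product_indices(cat_output, prefix="magento2_product_"):
    """Return index names starting with prefix from `_cat/indices?v` text."""
    lines = [ln for ln in cat_output.splitlines() if ln.strip()]
    if not lines:
        return []
    # The first line is the header row; locate the "index" column in it.
    column = lines[0].split().index("index")
    names = [ln.split()[column] for ln in lines[1:]]
    return sorted(n for n in names if n.startswith(prefix))

# Made-up sample in the shape of `_cat/indices?v` output (not real data).
SAMPLE = """\
health status index                 uuid pri rep docs.count docs.deleted store.size pri.store.size
green  open   magento2_product_1_v1 aaaa 1   0   120        0            1mb        1mb
green  open   other_index           bbbb 1   0   5          0            10kb       10kb
"""

print(product_indices(SAMPLE))  # ['magento2_product_1_v1']
```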
Step 2: Define a Custom Analyzer with Word Delimiter Filter
Create a JSON file (e.g., custom_analyzer.json) with the following configuration:
{
  "settings": {
    "analysis": {
      "filter": {
        "custom_word_delimiter": {
          "type": "word_delimiter",
          "preserve_original": true,
          "catenate_words": true,
          "catenate_numbers": true,
          "catenate_all": true
        }
      },
      "analyzer": {
        "sku_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "custom_word_delimiter"
          ]
        }
      }
    }
  }
}
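Hand-edited JSON is easy to get wrong, so it can help to build the same settings in code and check them before sending anything to Elasticsearch. The Python sketch below reproduces the configuration above and verifies that every filter an analyzer references is either defined in the file or listed as built-in. The helper names and the BUILT_IN_FILTERS set are assumptions for this example.

```python
import json

# Built-in filters this example relies on (extend as needed).
BUILT_IN_FILTERS = {"lowercase"}

def build_settings():
    """Return the same settings as custom_analyzer.json, as a dict."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "custom_word_delimiter": {
                        "type": "word_delimiter",
                        "preserve_original": True,
                        "catenate_words": True,
                        "catenate_numbers": True,
                        "catenate_all": True,
                    }
                },
                "analyzer": {
                    "sku_analyzer": {
                        "tokenizer": "whitespace",
                        "filter": ["lowercase", "custom_word_delimiter"],
                    }
                },
            }
        }
    }

def undefined_filters(settings):
    """List (analyzer, filter) pairs whose filter is not defined anywhere."""
    analysis = settings["settings"]["analysis"]
    defined = set(analysis.get("filter", {})) | BUILT_IN_FILTERS
    return [
        (name, f)
        for name, analyzer in analysis.get("analyzer", {}).items()
        for f in analyzer.get("filter", [])
        if f not in defined
    ]

settings = build_settings()
print(undefined_filters(settings))     # [] means every reference resolves
print(json.dumps(settings, indent=2))  # text to save as custom_analyzer.json
```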
Explanation of Key Settings:
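- preserve_original: Keeps the original token (abc-123-xyz) alongside the split parts.
- catenate_words/numbers: Joins adjacent word parts or adjacent number parts into one extra token, so “wi-fi” also yields “wifi” and “500-42” also yields “50042.”
- catenate_all: Joins all parts of a token into one (abc123xyz), so a SKU typed without separators still matches.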
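To see how these options interact, the following Python sketch approximates the set of tokens the filter would emit for a single, already lowercased token. It is a rough model under stated assumptions: ASCII letters and digits only, no case-change splitting, and no token positions or ordering. The function is hypothetical and is no substitute for checking real output with the _analyze API.

```python
import re
from itertools import groupby

def word_delimiter(token, preserve_original=True, catenate_words=True,
                   catenate_numbers=True, catenate_all=True):
    """Approximate the token set produced for one input token."""
    # Split into runs of letters and runs of digits; everything else separates.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = set(parts)
    if preserve_original:
        out.add(token)
    # Join neighbouring parts of the same kind (word-word or number-number).
    for is_number, group in groupby(parts, key=str.isdigit):
        run = list(group)
        enabled = catenate_numbers if is_number else catenate_words
        if enabled and len(run) > 1:
            out.add("".join(run))
    # Join every part into a single token.
    if catenate_all and len(parts) > 1:
        out.add("".join(parts))
    return out

print(sorted(word_delimiter("abc-123-xyz")))
# ['123', 'abc', 'abc-123-xyz', 'abc123xyz', 'xyz']
```

With these tokens in the index, searches for "123", "abc-123-xyz", and "abc123xyz" would all find the same product in this model.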
Step 3: Apply the Configuration to Elasticsearch
Delete the existing index (back up your data first!):
curl -X DELETE "localhost:9200/magento2_product_1_v1"
Create a new index with the custom analyzer:
curl -X PUT "localhost:9200/magento2_product_1_v1" -H 'Content-Type: application/json' -d @custom_analyzer.json
Reindex Magento:
php bin/magento indexer:reindex
Clear Cache:
php bin/magento cache:clean
php bin/magento cache:flush
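Now your Magento 2 Elasticsearch index includes the custom analyzer. Keep in mind that an analyzer only affects fields whose mapping references it, and that, depending on your Magento version, a full reindex may rebuild the index from Magento’s own settings. Confirm the result with the _analyze test described under Advanced Configuration Tips.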
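The two HTTP calls in this step can also be prepared with Python's standard library, which is handy when the procedure has to be repeated on staging and production. The sketch below only builds the requests and sends nothing until send is called explicitly. The host and index name are the ones used in the examples above, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"   # assumed local Elasticsearch node
INDEX = "magento2_product_1_v1"      # example index name from Step 1

def build_delete_request(index=INDEX):
    """Prepare the DELETE call that removes the existing index."""
    return urllib.request.Request(f"{BASE_URL}/{index}", method="DELETE")

def build_create_request(settings, index=INDEX):
    """Prepare the PUT call that creates the index with custom settings."""
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{index}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)

request = build_create_request({"settings": {}})
print(request.get_method(), request.full_url)
```

Keeping the build and send steps separate lets you inspect or log each request before anything destructive runs.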
Advanced Configuration Tips
- Use Edge N-grams for Autocomplete: Combine word delimiter filters with an edge_ngram filter so that the leading characters of a token match as the customer types.
- Custom Synonym Filters: Improve search by adding synonyms, e.g., mapping “laptop” to “notebook.”
- Test Tokenization: Use Elasticsearch’s _analyze API to verify tokenization:
curl -X GET "localhost:9200/magento2_product_1_v1/_analyze" -H 'Content-Type: application/json' -d '{ "analyzer": "sku_analyzer", "text": "ABC-123-XYZ" }'
This will show how the SKU is broken into tokens.
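The response is a JSON object with a "tokens" array, and each entry carries the token text in its "token" field. A small Python sketch can turn that into a quick check that the tokens you care about are present. The helper names are hypothetical, and the sample response is hand-written in the shape of an _analyze response, not captured from a real cluster.

```python
import json

def tokens_from_analyze(response_text):
    """Extract the token strings from an _analyze JSON response."""
    data = json.loads(response_text)
    return [entry["token"] for entry in data.get("tokens", [])]

def missing_tokens(response_text, expected):
    """Return the expected tokens that the response does not contain."""
    found = set(tokens_from_analyze(response_text))
    return [t for t in expected if t not in found]

# Hand-written sample in the shape of an _analyze response (illustrative only).
SAMPLE_RESPONSE = json.dumps({
    "tokens": [
        {"token": "abc-123-xyz", "position": 0},
        {"token": "abc", "position": 0},
        {"token": "123", "position": 1},
        {"token": "xyz", "position": 2},
    ]
})

print(tokens_from_analyze(SAMPLE_RESPONSE))
print(missing_tokens(SAMPLE_RESPONSE, ["123", "abc123xyz"]))  # ['abc123xyz']
```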
Common Mistakes to Avoid
- Forgetting to Reindex After Changes: Always reindex Magento after modifying Elasticsearch settings.
- Overcomplicating Analyzers: Keep configurations simple to avoid performance issues.
- Ignoring Case Sensitivity: Ensure the lowercase filter is applied to avoid case-related mismatches.
Magento 2 Elasticsearch Configuration Best Practices
- Regularly Monitor Search Performance: Use tools like Kibana or Magento’s built-in reports to track search queries and results.
- Back Up Before Changes: Always back up your Elasticsearch data and Magento database before making major changes.
- Test in a Staging Environment: Apply new configurations in a staging environment before deploying to production.
Final Thoughts
Properly configuring word delimiters in your Magento 2 Elasticsearch configuration can significantly improve your store’s search accuracy, especially when dealing with SKUs, product codes, and hyphenated names. By understanding how tokenizers and analyzers work, applying custom filters, and regularly testing your configuration, you’ll create a smoother, more accurate search experience for your customers.
