Debugging and Testing ES Systems
Chris Birchall
2013/8/29, Elasticsearch Study Group #1 (Elasticsearch 勉強会 第1回), #elasticsearchjp
Elasticsearch and me
● At Infoscience, helped build a log management product based on ES + Hadoop
● At M3, ES evangelist (??)
  ○ Maintain ES cluster
  ○ Help dev teams integrate ES into their apps
Twitter: @cbirchall
GitHub: https://github.com/cb372
Search at M3
● Using ES for all new services
  ○ Search, recommendation (MoreLikeThis)
● Slowly migrating other services from Solr
● A few legacy services use Lucene directly
● Running all indices on one ES cluster
● Kuromoji for Japanese content
Debugging
Mostly debugging of queries
● “Why doesn’t doc X match query Y?”
● “Why does this search return no results?”
Operational issues are very rare
● ES’s clustering magic is surprisingly stable!
● No performance issues so far
Debugging - Step 1
Check for typos!
ES will silently ignore many typos in settings/mapping definitions
Typo - Example
Let’s create a new index...
$ curl -X PUT localhost:9200/myapp -d '{
  "settings": { "number_of_shards": 3 },
  "mapping" : {
    "article" : {
      "_source": { "enabled": false },
      "properties": {
        "title": { "type": "string", "store": "true" },
        "body": { "type": "string", "store": "true" },
        ...
      }
    },
    ...
}'
Typo - Example (cont’d)
Response from ES:
{"ok":true,"acknowledged":true}
OK, seems fine...
Typo - Example (cont’d)
Now check the mappings...
$ curl 'localhost:9200/myapp/_mappings?pretty'
Response from ES:
{ "myapp" : { } }
Eh? Where are my lovingly-crafted mappings?!
Typo - Example (cont’d)
Oops! The top-level key must be "mappings" (plural), not "mapping":
$ curl -X PUT localhost:9200/myapp -d '{
  "settings": { "number_of_shards": 3 },
  "mappings" : {
    "article" : {
      "_source": { "enabled": false },
      "properties": {
        "title": { "type": "string", "store": "true" },
        "body": { "type": "string", "store": "true" },
        ...
      }
    },
    ...
}'
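Since ES accepts the misspelled key without complaint, one option is to lint the request body before sending it. The following is a minimal sketch, not an ES feature: the set of expected top-level keys is an assumption based on this example only, and `find_suspect_keys` is a hypothetical helper name.

```python
import difflib

# Assumption: for this sketch, only these top-level keys are expected in an
# index-creation body. Extend the set to match what your ES version accepts.
EXPECTED_TOP_LEVEL_KEYS = {"settings", "mappings"}


def find_suspect_keys(body):
    """Return {unknown_key: closest_expected_key_or_None} for a request body."""
    suspects = {}
    for key in body:
        if key not in EXPECTED_TOP_LEVEL_KEYS:
            close = difflib.get_close_matches(
                key, sorted(EXPECTED_TOP_LEVEL_KEYS), n=1
            )
            suspects[key] = close[0] if close else None
    return suspects


# Abbreviated versions of the two request bodies from the slides.
bad_body = {
    "settings": {"number_of_shards": 3},
    "mapping": {"article": {"_source": {"enabled": False}}},  # the typo
}
good_body = {
    "settings": {"number_of_shards": 3},
    "mappings": {"article": {"_source": {"enabled": False}}},
}

print(find_suspect_keys(bad_body))   # flags "mapping" and suggests "mappings"
print(find_suspect_keys(good_body))  # nothing to flag
```

The same idea could be applied one level down (field options inside "properties"), at the cost of maintaining a longer list of known keys.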
Debugging - Step 2
Set up a local environment
● Makes it easy to wipe & rebuild index
Setting up a local env (OSX)
# Install
$ brew install elasticsearch
# Kuromoji plugin (optional)
$ /usr/local/opt/elasticsearch/bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/1.5.0
# Start
$ elasticsearch
# Create index
$ curl -X PUT localhost:9200/my_app -d '{ ... }'
# Insert some documents
$ curl -X PUT localhost:9200/my_app/my_type/1 -d '{ ... }'
$ curl -X PUT localhost:9200/my_app/my_type/2 -d '{ ... }'
# Done!
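The wipe-and-rebuild cycle can be scripted so it is a single step. This sketch only builds the HTTP requests (DELETE the index, PUT it again, PUT each document) and prints the plan; `send_all` would actually send them to a local node. The index settings and documents are hypothetical placeholders for the `{ ... }` bodies above, and the Content-Type header is an addition that newer ES versions expect.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local ES on the default port


def build_rebuild_requests(index, index_body, docs, doc_type="my_type"):
    """Build (but do not send) requests to drop, recreate and refill an index."""

    def req(method, path, body=None):
        data = json.dumps(body).encode("utf-8") if body is not None else None
        return urllib.request.Request(
            f"{BASE_URL}/{path}",
            data=data,
            headers={"Content-Type": "application/json"},
            method=method,
        )

    requests = [req("DELETE", index), req("PUT", index, index_body)]
    for doc_id, doc in docs.items():
        requests.append(req("PUT", f"{index}/{doc_type}/{doc_id}", doc))
    return requests


def send_all(requests):
    """Send requests in order; a 404 on the first DELETE just means no index yet."""
    for r in requests:
        try:
            with urllib.request.urlopen(r) as resp:
                print(r.get_method(), r.full_url, resp.status)
        except urllib.error.HTTPError as err:
            print(r.get_method(), r.full_url, err.code)


# Hypothetical settings and documents, for illustration only.
plan = build_rebuild_requests(
    "my_app",
    {"settings": {"number_of_shards": 1}},
    {"1": {"title": "first"}, "2": {"title": "second"}},
)
for r in plan:
    print(r.get_method(), r.full_url)
# To run against a local node: send_all(plan)
```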
Useful commands - Analyze
How is my document/query being tokenized?
$ curl 'localhost:9200/myindex/_analyze?pretty' \
    -d '東京特許許可局許可局長'
{
  "tokens" : [ {
    "token" : "東京",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "特許",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "word",
    ...
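The analyze response is verbose, so when comparing tokenizations it can help to reduce it to the tokens and their offsets. A minimal sketch, assuming the response shape shown above; the sample contains only the fields visible on the slide, and `summarize_tokens` is a hypothetical helper name.

```python
import json

# Only the part of the response that is visible on the slide.
sample_response = json.loads("""
{ "tokens": [
    {"token": "東京", "start_offset": 0, "end_offset": 2, "type": "word"},
    {"token": "特許", "start_offset": 2, "end_offset": 4, "type": "word"}
] }
""")


def summarize_tokens(analyze_response):
    """Return (token, start_offset, end_offset) for each token, in order."""
    return [
        (t["token"], t["start_offset"], t["end_offset"])
        for t in analyze_response["tokens"]
    ]


for token, start, end in summarize_tokens(sample_response):
    print(f"{token} [{start}:{end}]")
```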
Useful commands - Explain
Why does this document (not) match this query?
Specify the document ID in the URL (123 here)
$ curl 'localhost:9200/kuro/docs/123/_explain?pretty' \
    -d '{ "query": { "term": { "body": "東京" } } }'
{
  ...
  "matched" : true,
  "explanation" : {
    "value" : 0.375,
    "description" : "weight(body:東京 in 0) [PerFieldSimilarity], result of:",
    "details" : [ {
      "value" : 0.375,
      "description" : "fieldWeight in 0, product of:",
      "details" : [ {
        "value" : 1.0,
        "description" : "tf(freq=1.0), with freq of:",
        "details" : [ {
          "value" : 1.0,
          "description" : "termFreq=1.0"
          ...
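The nested "explanation" tree is easier to read as an indented outline. A minimal sketch, assuming each node has "value", "description" and optionally "details" as in the response above; the sample is limited to the part of the response shown on the slide, and `format_explanation` is a hypothetical helper name.

```python
def format_explanation(node, depth=0):
    """Flatten an explanation tree into indented 'value : description' lines."""
    lines = [f"{'  ' * depth}{node['value']} : {node['description']}"]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# The visible part of the "explanation" object from the slide.
sample_explanation = {
    "value": 0.375,
    "description": "weight(body:東京 in 0) [PerFieldSimilarity], result of:",
    "details": [{
        "value": 0.375,
        "description": "fieldWeight in 0, product of:",
        "details": [{
            "value": 1.0,
            "description": "tf(freq=1.0), with freq of:",
            "details": [{"value": 1.0, "description": "termFreq=1.0"}],
        }],
    }],
}

print("\n".join(format_explanation(sample_explanation)))
```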
Tuning queries
Parameters to tweak
● default_operator (AND/OR)
● auto_generate_phrase_queries
● minimum_should_match
● Stop words/tags
● Kuromoji
  ○ Segmentation mode
  ○ Reading form filter
  ○ Disable Kuromoji! (for some fields)
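To try different combinations of the first three parameters, it helps to generate the query body from code. This sketch assumes a query_string query (the slide does not say which query type was used) on an ES version that still supports auto_generate_phrase_queries; `build_query_string` is a hypothetical helper name.

```python
import json


def build_query_string(text, default_operator="AND",
                       auto_generate_phrase_queries=False,
                       minimum_should_match=None):
    """Build a search body using a query_string query with the given knobs."""
    params = {
        "query": text,
        "default_operator": default_operator,
        "auto_generate_phrase_queries": auto_generate_phrase_queries,
    }
    # Only include minimum_should_match when the caller sets it.
    if minimum_should_match is not None:
        params["minimum_should_match"] = minimum_should_match
    return {"query": {"query_string": params}}


body = build_query_string("東日本病院", default_operator="OR",
                          minimum_should_match="75%")
print(json.dumps(body, ensure_ascii=False, indent=2))
```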
Why disable Kuromoji?
Problem: occasionally weird tokenization
Phrase | Terms
特定医療法人財団 日本会 東日本病院 (document field) | 特定、医療、法人、財団、日本、会、東日本、病院
東日本 (query) | 東日、東日本、本
東日本病院 (query) | 東、東日本、日本、病院
● AND query will fail, because not all terms match
● OR query will match any document with 病院 (hospital)
  → low precision
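The effect of these tokenizations can be checked with plain set logic. This is a simplified model, not how ES scores documents: it treats a document as a bag of terms and ignores positions. The term lists are copied from the table above; the second document is a hypothetical unrelated hospital added to show the precision problem.

```python
# Terms produced for the document field (from the table above).
DOC_TERMS = {"特定", "医療", "法人", "財団", "日本", "会", "東日本", "病院"}
# Hypothetical unrelated document that also contains 病院.
OTHER_DOC_TERMS = {"中央", "病院"}

# Terms produced for each query (from the table above).
QUERY_TERMS = {
    "東日本": ["東日", "東日本", "本"],
    "東日本病院": ["東", "東日本", "日本", "病院"],
}


def matches(query_terms, doc_terms, operator):
    """AND: every query term must be in the doc. OR: at least one must be."""
    hits = [term in doc_terms for term in query_terms]
    return all(hits) if operator == "AND" else any(hits)


for query, terms in QUERY_TERMS.items():
    missing = [t for t in terms if t not in DOC_TERMS]
    print(query,
          "AND:", matches(terms, DOC_TERMS, "AND"),
          "OR:", matches(terms, DOC_TERMS, "OR"),
          "missing:", missing)

# The OR query also matches the unrelated hospital document.
print(matches(QUERY_TERMS["東日本病院"], OTHER_DOC_TERMS, "OR"))
```

Both AND queries miss the intended document because terms such as 東日, 本 and 東 never appear in its token list, while the OR form of 東日本病院 also accepts the unrelated document.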
Useful plugin - Head
$ bin/plugin -install mobz/elasticsearch-head
http://mobz.github.io/elasticsearch-head/
Testing
Main goal: Ensure that queries return the results that we expect
● Test coverage of representative queries
  ○ Freedom to tune for a given query without breaking other queries
Ideally, tests should:
● Run fast
● Run standalone (i.e. no need to have an ES server running)
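One way to get that coverage is a table of representative queries with their expected results, checked on every change. A minimal sketch: `fake_search` is an in-memory stand-in so the example runs without a server, and the corpus and expectations are hypothetical. In a real project the `search` argument would call ES, ideally an in-process node.

```python
# Hypothetical corpus: doc id -> set of already-tokenized terms.
CORPUS = {
    "1": {"elasticsearch", "testing"},
    "2": {"elasticsearch", "debugging"},
    "3": {"solr", "migration"},
}


def fake_search(terms, operator="AND"):
    """Stand-in for a real search call; returns sorted matching doc ids."""
    combine = all if operator == "AND" else any
    return sorted(
        doc_id
        for doc_id, doc_terms in CORPUS.items()
        if combine(term in doc_terms for term in terms)
    )


# Representative queries and the results we expect from each.
EXPECTATIONS = [
    (("elasticsearch",), "AND", ["1", "2"]),
    (("elasticsearch", "testing"), "AND", ["1"]),
    (("solr", "testing"), "OR", ["1", "3"]),
]


def check_expectations(search, expectations):
    """Run every query; return (terms, operator, expected, actual) per failure."""
    failures = []
    for terms, operator, expected in expectations:
        actual = search(terms, operator)
        if actual != expected:
            failures.append((terms, operator, expected, actual))
    return failures


print(check_expectations(fake_search, EXPECTATIONS))  # empty list when all pass
```

Tuning for one query and then re-running the whole table shows immediately whether any other query regressed.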
Testing - Java
elasticsearch-test is awesome
● DSL to set up/tear down ES
● Annotations + JUnit runner
● ES runs in-process
  ○ No need to start an external ES server
● Index is stored in-memory
  ○ Runs quickly
https://github.com/tlrx/elasticsearch-test
https://github.com/cb372/elasticsearch-test-example
Testing - Java
Simple elasticsearch-test example
Testing - Ruby
Simple Rails + Tire + RSpec example
https://github.com/cb372/elasticsearch-rspec-example
We’re hiring!
http://bit.ly/m3jobs