Debugging and Testing ES Systems

Debugging and Testing ES Systems, Chris Birchall, 2013/8/29, Elasticsearch 勉強会 第1回 (1st Elasticsearch Study Meetup), #elasticsearchjp


Page 1: Debugging and Testing ES Systems

Debugging and Testing ES Systems

Chris Birchall

2013/8/29
Elasticsearch 勉強会 第1回 (1st Elasticsearch Study Meetup)
#elasticsearchjp

Page 2: Debugging and Testing ES Systems

Elasticsearch and me

● At Infoscience, helped build a log management product based on ES + Hadoop

● At M3, ES evangelist (??)
  ○ Maintain ES cluster
  ○ Help dev teams integrate ES into their apps

Twitter: @cbirchall
Github: https://github.com/cb372

Page 3: Debugging and Testing ES Systems

Search at M3

● Using ES for all new services
  ○ Search, recommendation (MoreLikeThis)

● Slowly migrating other services from Solr

● A few legacy services use Lucene directly

● Running all indices on one ES cluster

● Kuromoji for Japanese content

Page 4: Debugging and Testing ES Systems

Debugging

Mostly debugging of queries
● “Why doesn’t doc X match query Y?”
● “Why does this search return no results?”

Operational issues are very rare
● ES’s clustering magic is surprisingly stable!
● No performance issues so far

Page 5: Debugging and Testing ES Systems

Debugging - Step 1

Check for typos!

ES will silently ignore many typos in settings/mapping definitions

Page 6: Debugging and Testing ES Systems

Typo - Example

Let’s create a new index...

$ curl -X PUT localhost:9200/myapp -d '{
    "settings": { "number_of_shards": 3 },
    "mapping" : {
      "article" : {
        "_source": { "enabled": false },
        "properties": {
          "title": { "type": "string", "store": "true" },
          "body": { "type": "string", "store": "true" },
          ...
        }
      },
      ...
    }
  }'

Page 7: Debugging and Testing ES Systems

Typo - Example (cont’d)

Response from ES:

{"ok":true,"acknowledged":true}

OK, seems fine...

Page 8: Debugging and Testing ES Systems

Typo - Example (cont’d)

Now check the mappings...

$ curl 'localhost:9200/myapp/_mappings?pretty'

Response from ES:

{
  "myapp" : { }
}

Eh? Where are my lovingly-crafted mappings?!
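A check like this can be automated right after index creation. The following is a minimal sketch, assuming the mapping response has the shape shown on the slide ({"<index>": {"<type>": ...}}); other ES versions may nest the types differently. The function name is hypothetical.

```python
def missing_mapping_types(mapping_response, index, expected_types):
    """Return the expected type names that are absent from the index's mappings."""
    # Assumption: the response is shaped like the one on the slide,
    # i.e. {"<index>": {"<type>": {...}, ...}}.
    defined = mapping_response.get(index, {})
    return sorted(t for t in expected_types if t not in defined)


# The response from the slide: the index exists but has no mappings at all.
response = {"myapp": {}}
print(missing_mapping_types(response, "myapp", ["article"]))  # prints ['article']
```

A non-empty result straight after creating the index is a strong hint that the mapping definition was silently dropped.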

Page 9: Debugging and Testing ES Systems

Typo - Example (cont’d)

Oops! The top-level key should be "mappings" (plural), not "mapping":

$ curl -X PUT localhost:9200/myapp -d '{
    "settings": { "number_of_shards": 3 },
    "mappings" : {
      "article" : {
        "_source": { "enabled": false },
        "properties": {
          "title": { "type": "string", "store": "true" },
          "body": { "type": "string", "store": "true" },
          ...
        }
      },
      ...
    }
  }'
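Since ES accepts the misspelled key without complaint, one option is to lint the request body before sending it. The sketch below is an illustration, not part of the talk: it assumes "settings" and "mappings" are the only top-level keys your index definitions use, and the function name is hypothetical.

```python
import difflib
import json

# Assumption: the only top-level keys we expect in an index-creation body.
EXPECTED_TOP_LEVEL_KEYS = ["settings", "mappings"]


def suspect_keys(index_body_json):
    """Map each unexpected top-level key to its closest expected key (or None)."""
    body = json.loads(index_body_json)
    suspects = {}
    for key in body:
        if key in EXPECTED_TOP_LEVEL_KEYS:
            continue
        close = difflib.get_close_matches(key, EXPECTED_TOP_LEVEL_KEYS, n=1)
        suspects[key] = close[0] if close else None
    return suspects


body = '{"settings": {"number_of_shards": 3}, "mapping": {"article": {}}}'
print(suspect_keys(body))  # prints {'mapping': 'mappings'}
```

The same idea extends to nested keys, but even a top-level check would have caught the typo in this example.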

Page 10: Debugging and Testing ES Systems

Debugging - Step 2

Set up a local environment● Makes it easy to wipe & rebuild index

Page 11: Debugging and Testing ES Systems

Setting up a local env (OSX)

# Install
$ brew install elasticsearch

# Kuromoji plugin (optional)
$ /usr/local/opt/elasticsearch/bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/1.5.0

# Start
$ elasticsearch

# Create index
$ curl -X PUT localhost:9200/my_app -d '{ ... }'

# Insert some documents
$ curl -X PUT localhost:9200/my_app/my_type/1 -d '{ ... }'
$ curl -X PUT localhost:9200/my_app/my_type/2 -d '{ ... }'

# Done!
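Because the point of a local environment is to wipe and rebuild the index often, it can help to script the sequence. The sketch below only builds the list of requests (delete index, create index, insert documents) and does not send them; the function name, index body, and sample documents are all hypothetical.

```python
import json


def rebuild_plan(base_url, index, index_body, doc_type, docs):
    """Build the (method, url, body) requests that wipe and rebuild an index."""
    index_url = "{}/{}".format(base_url.rstrip("/"), index)
    plan = [
        ("DELETE", index_url, None),                 # wipe the old index
        ("PUT", index_url, json.dumps(index_body)),  # recreate it with settings/mappings
    ]
    for doc_id, doc in sorted(docs.items()):
        url = "{}/{}/{}".format(index_url, doc_type, doc_id)
        plan.append(("PUT", url, json.dumps(doc, ensure_ascii=False)))
    return plan


# Hypothetical index definition and documents, for illustration only.
plan = rebuild_plan(
    "http://localhost:9200",
    "my_app",
    {"settings": {"number_of_shards": 1}},
    "my_type",
    {1: {"title": "first"}, 2: {"title": "second"}},
)
for method, url, _body in plan:
    print(method, url)
```

Each tuple corresponds to one curl command from the slide, preceded by a DELETE of the index so the rebuild starts from a clean state.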

Page 12: Debugging and Testing ES Systems

Useful commands - Analyze

How is my document/query being tokenized?

$ curl 'localhost:9200/myindex/_analyze?pretty' \
    -d '東京特許許可局許可局長'
{
  "tokens" : [ {
    "token" : "東京",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "特許",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "word",
    ...
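The full analyze response is verbose, so when comparing tokenizations it can be handy to reduce it to the token strings. This is a minimal sketch: the sample is the response from the slide, trimmed to the two tokens and the fields that are visible there, and the function name is hypothetical.

```python
import json


def token_strings(analyze_response_json):
    """Extract just the token text from an _analyze response body."""
    response = json.loads(analyze_response_json)
    return [t["token"] for t in response.get("tokens", [])]


# Trimmed copy of the response shown on the slide (only the visible fields).
sample = """
{"tokens": [
  {"token": "東京", "start_offset": 0, "end_offset": 2, "type": "word", "position": 1},
  {"token": "特許", "start_offset": 2, "end_offset": 4, "type": "word"}
]}
"""
print(" | ".join(token_strings(sample)))  # prints 東京 | 特許
```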

Page 13: Debugging and Testing ES Systems

Useful commands - Explain

Why does this document (not) match this query?

Specify the document ID in the URL (123 here):

$ curl 'localhost:9200/kuro/docs/123/_explain?pretty' \
    -d '{ "query": { "term": { "body": "東京" } } }'
{
  ...
  "matched" : true,
  "explanation" : {
    "value" : 0.375,
    "description" : "weight(body:東京 in 0) [PerFieldSimilarity], result of:",
    "details" : [ {
      "value" : 0.375,
      "description" : "fieldWeight in 0, product of:",
      "details" : [ {
        "value" : 1.0,
        "description" : "tf(freq=1.0), with freq of:",
        "details" : [ {
          "value" : 1.0,
          "description" : "termFreq=1.0"
          ...
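The explanation is a tree of value/description/details nodes, which is easier to read as an indented outline. The sketch below flattens such a tree; the sample reproduces only the nodes visible on the slide, and the function name is hypothetical.

```python
def format_explanation(node, depth=0):
    """Flatten an explanation tree into indented 'value : description' lines."""
    lines = ["{}{} : {}".format("  " * depth, node["value"], node["description"])]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# The part of the explanation that is visible on the slide.
explanation = {
    "value": 0.375,
    "description": "weight(body:東京 in 0) [PerFieldSimilarity], result of:",
    "details": [{
        "value": 0.375,
        "description": "fieldWeight in 0, product of:",
        "details": [{
            "value": 1.0,
            "description": "tf(freq=1.0), with freq of:",
            "details": [{"value": 1.0, "description": "termFreq=1.0"}],
        }],
    }],
}
print("\n".join(format_explanation(explanation)))
```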

Page 14: Debugging and Testing ES Systems

Tuning queries

Parameters to tweak
● default_operator (AND/OR)
● auto_generate_phrase_queries
● minimum_should_match
● Stop words/tags
● Kuromoji
  ○ Segmentation mode
  ○ Reading form filter
  ○ Disable Kuromoji! (for some fields)
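To make the first three parameters concrete, here is a sketch of a request body builder. It assumes these parameters are being set on a query_string query, which the slide does not state; the function name and the example values are hypothetical.

```python
import json


def build_query(text, default_operator="OR", phrase_queries=False,
                minimum_should_match=None):
    """Build a search body with the tunable parameters made explicit."""
    # Assumption: the parameters are applied to a query_string query.
    query_string = {
        "query": text,
        "default_operator": default_operator,
        "auto_generate_phrase_queries": phrase_queries,
    }
    if minimum_should_match is not None:
        query_string["minimum_should_match"] = minimum_should_match
    return {"query": {"query_string": query_string}}


body = build_query("東日本 病院", default_operator="AND")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Keeping the parameters in one builder makes it easy to try a different combination and rerun the same set of queries against it.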

Page 15: Debugging and Testing ES Systems

Why disable Kuromoji?

Problem: occasionally weird tokenization

Phrase: 特定医療法人財団 日本会 東日本病院 (document field)
Terms: 特定、医療、法人、財団、日本、会、東日本、病院

Phrase: 東日本 (query)
Terms: 東日、東日本、本

Phrase: 東日本病院 (query)
Terms: 東、東日本、日本、病院

● AND query will fail, because not all terms match
● OR query will match any document with 病院
  → low precision
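The effect can be reproduced with plain set membership. The sketch below uses the terms from the slide and ignores scoring entirely; the second document (京都病院) and its tokens are a hypothetical example added to show the precision problem.

```python
# Terms from the slide.
DOC_TERMS = {"特定", "医療", "法人", "財団", "日本", "会", "東日本", "病院"}
QUERIES = {
    "東日本": ["東日", "東日本", "本"],
    "東日本病院": ["東", "東日本", "日本", "病院"],
}
# Hypothetical unrelated document that merely contains 病院.
OTHER_DOC_TERMS = {"京都", "病院"}


def matches(doc_terms, query_terms, operator):
    """AND needs every query term in the document; OR needs at least one."""
    hits = [term in doc_terms for term in query_terms]
    return all(hits) if operator == "AND" else any(hits)


for phrase, terms in QUERIES.items():
    for operator in ("AND", "OR"):
        print(phrase, operator,
              "target:", matches(DOC_TERMS, terms, operator),
              "other:", matches(OTHER_DOC_TERMS, terms, operator))
```

With AND, neither query finds the target document because 東日, 本, and 東 are not among its terms. With OR, the query 東日本病院 also matches the unrelated document through 病院 alone.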

Page 16: Debugging and Testing ES Systems

Useful plugin - Head

$ bin/plugin -install mobz/elasticsearch-head

http://mobz.github.io/elasticsearch-head/

Page 17: Debugging and Testing ES Systems

Testing

Main goal: Ensure that queries return the results that we expect
● Test coverage of representative queries
  ○ Freedom to tune for a given query without breaking other queries

Ideally, tests should:
● Run fast
● Run standalone (i.e. no need to have an ES server running)
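One way to get that coverage is a table of representative queries, each paired with the document IDs it must return. The sketch below is language-neutral in spirit and written in Python; the search function is injected, and here it is a fake so the example runs without an ES server. All names, queries, and IDs are hypothetical.

```python
def failing_cases(search, cases):
    """Return (query, missing_ids) for each case whose expected IDs are not all returned."""
    failures = []
    for query, expected_ids in cases:
        returned = set(search(query))
        missing = sorted(set(expected_ids) - returned)
        if missing:
            failures.append((query, missing))
    return failures


# Stand-in for the real search call, so the sketch runs without an ES server.
FAKE_RESULTS = {"hospital": ["doc1", "doc2"], "clinic": ["doc3"]}


def fake_search(query):
    return FAKE_RESULTS.get(query, [])


CASES = [
    ("hospital", ["doc1"]),
    ("clinic", ["doc3", "doc4"]),
]
print(failing_cases(fake_search, CASES))  # prints [('clinic', ['doc4'])]
```

After tuning for one query, rerunning the whole table shows whether any other representative query has lost an expected result.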

Page 18: Debugging and Testing ES Systems

Testing - Java

elasticsearch-test is awesome
● DSL to set up/tear down ES
● Annotations + JUnit runner
● ES runs in-process
  ○ No need to start an external ES server
● Index is stored in-memory
  ○ Runs quickly

https://github.com/tlrx/elasticsearch-test

Page 19: Debugging and Testing ES Systems

Testing - Java

Simple elasticsearch-test example

https://github.com/cb372/elasticsearch-test-example

Page 20: Debugging and Testing ES Systems

Testing - Ruby

Simple Rails + Tire + RSpec example

https://github.com/cb372/elasticsearch-rspec-example

Page 21: Debugging and Testing ES Systems

We’re hiring!


http://bit.ly/m3jobs