Elasticsearch And Ruby [RuPy2012]

Transcript of Elasticsearch And Ruby [RuPy2012]

Elasticsearch And Ruby

Karel Minařík

http://karmi.cz

{elasticsearch in a nutshell}

‣ Built on top of Apache Lucene

‣ Searching and analyzing big data

‣ Scalability

‣ REST API, JSON DSL

Great fit for dynamic languages and web-oriented workflows / architectures

http://www.elasticsearch.org

Example

class Results
  include Enumerable
  attr_reader :query, :curl, :time, :total, :results, :facets

  def initialize(search)
    response = JSON.parse(
      Slingshot.http.post("http://localhost:9200/#{search.indices}/_search", search.to_json)
    )
    @query   = search.to_json
    @curl    = %Q|curl -X POST "http://localhost:9200/#{search.indices}/_search?pretty" -d '#{@query}'|
    @time    = response['took']
    @total   = response['hits']['total']
    @results = response['hits']['hits']
    @facets  = response['facets']
  end

  def each(&block)
    @results.each(&block)
  end
end
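Because the class includes Enumerable and defines each, it gets map, select, first and the rest for free. A minimal sketch of that idea, using a canned response body instead of a live HTTP call; the class name CannedResults and the sample data are made up for illustration:

```ruby
require 'json'

# Hypothetical stand-in for the Results class above: it parses a canned
# response body instead of POSTing to a running elasticsearch node.
class CannedResults
  include Enumerable
  attr_reader :time, :total, :results

  def initialize(body)
    response = JSON.parse(body)
    @time    = response['took']
    @total   = response['hits']['total']
    @results = response['hits']['hits']
  end

  # Enumerable only requires #each; map, select, first etc. build on it.
  def each(&block)
    @results.each(&block)
  end
end

# Made-up sample data in the general shape of a search response.
SAMPLE_BODY = <<-BODY
{
  "took": 3,
  "hits": {
    "total": 2,
    "hits": [
      { "_id": "1", "_source": { "title": "RuPy Conference" } },
      { "_id": "2", "_source": { "title": "Ruby Meetup" } }
    ]
  }
}
BODY

canned = CannedResults.new(SAMPLE_BODY)
titles = canned.map { |hit| hit['_source']['title'] }
```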

Elasticsearch plays nicely with Ruby…

curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "range" : {
          "date" : {
            "from" : "2012-01-01",
            "to"   : "2012-12-31"
          }
        }
      },
      "query" : {
        "bool" : {
          "must" : [
            {
              "terms" : {
                "tags" : [ "ruby", "python" ]
              }
            },
            {
              "match" : {
                "title" : {
                  "query" : "conference",
                  "boost" : 10.0
                }
              }
            }
          ]
        }
      }
    }
  }
}'

elasticsearch’s Query DSL
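One reason the JSON DSL fits Ruby well is that a query is only nested hashes and arrays, so it can be built as plain Ruby data and serialized with the standard library. A sketch of a query like the one on this slide, with the two must clauses placed in an array; nothing is sent over the network here:

```ruby
require 'json'

# The query as plain Ruby data: hashes, arrays, strings and numbers.
query = {
  query: {
    filtered: {
      filter: {
        range: { date: { from: '2012-01-01', to: '2012-12-31' } }
      },
      query: {
        bool: {
          must: [
            { terms: { tags: ['ruby', 'python'] } },
            { match: { title: { query: 'conference', boost: 10.0 } } }
          ]
        }
      }
    }
  }
}

# Serialize to the JSON body that would be POSTed to the _search endpoint.
payload = JSON.pretty_generate(query)
puts payload
```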

Example

Tire.search('articles') do
  query do
    boolean do
      must { terms :tags, ['ruby', 'python'] }
      must { string 'published_on:[2011-01-01 TO 2011-01-02]' }
    end
  end
end

Example

tags_query = lambda do |boolean|
  boolean.must { terms :tags, ['ruby', 'python'] }
end

published_on_query = lambda do |boolean|
  boolean.must { string 'published_on:[2011-01-01 TO 2011-01-02]' }
end

Tire.search('articles') do
  query { boolean &tags_query }
end

Tire.search('articles') do
  query do
    boolean &tags_query
    boolean &published_on_query
  end
end

Example

search = Tire.search 'articles' do
  query do
    string 'title:T*'
  end
  filter :terms, tags: ['ruby']
  facet('tags') { terms :tags }
  sort { by :title, 'desc' }
end

search = Tire::Search::Search.new('articles')
search.query { string('title:T*') }
search.filter :terms, :tags => ['ruby']
search.facet('tags') { terms :tags }
search.sort { by :title, 'desc' }

TEH PROBLEM

Designing the Tire library as a domain-specific language from the top down, and consequently making a lot of mistakes at the lower levels.

‣ Class level settings (Tire.configure); cannot connect to two elasticsearch clusters in one codebase

‣ Inconsistent access (methods vs Hashes)

‣ Not enough abstraction and separation of concerns
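The first point can be illustrated without Tire at all. When the URL lives in class-level state, a second cluster has nowhere to go; moving it into an instance removes that limit. The names below (GlobalConfig, Client, the example hosts) are hypothetical and are not Tire's API:

```ruby
# Class-level settings: one URL for the whole process.
module GlobalConfig
  class << self
    attr_accessor :url
  end
end

GlobalConfig.url = 'http://localhost:9200'
GlobalConfig.url = 'http://search-2.example.com:9200' # replaces the first URL

# Instance-level settings: each client carries its own URL.
class Client
  attr_reader :url

  def initialize(url)
    @url = url
  end

  # Builds the search endpoint for an index; no HTTP request is made here.
  def search_url(index)
    "#{url}/#{index}/_search"
  end
end

local  = Client.new('http://localhost:9200')
remote = Client.new('http://search-2.example.com:9200')
```

With instances, local.search_url('articles') and remote.search_url('articles') can coexist in one codebase, which the class-level setting cannot express.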

"Blocks with arguments" (alternative DSL syntax)

Tire.search do
  query do
    text :name, params[:q]
  end
end

Tire.search do |search|
  search.query do |query|
    query.text :name, params[:q]
  end
end
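One way a DSL can accept both block styles is to check the block's arity: a block without parameters is evaluated with instance_eval, while a block with a parameter receives the builder as its argument. A self-contained sketch of that technique; QueryBuilder and its methods are illustrative and much simpler than Tire's classes:

```ruby
# Illustrative builder that supports both block styles.
class QueryBuilder
  attr_reader :clauses

  def initialize(&block)
    @clauses = []
    evaluate(&block) if block_given?
  end

  # Records a text clause and returns the builder.
  def text(field, value)
    @clauses << { text: { field => value } }
    self
  end

  private

  # No block parameters: run the block with the builder as self.
  # One block parameter: keep the caller's self and pass the builder in.
  def evaluate(&block)
    block.arity < 1 ? instance_eval(&block) : block.call(self)
  end
end

q = 'ruby' # stands in for params[:q]

implicit = QueryBuilder.new { text :name, q }
explicit = QueryBuilder.new { |query| query.text :name, q }
```

The explicit form matters in practice: instance_eval changes self inside the block, so methods of the calling object, such as a controller's params, are not reachable in the implicit form, while local variables still are.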

The Git(Hub) (r)evolution

‣ Lots of contributions... but less feedback

‣ Many contributions focus on a specific use case

‣ Many contributions don’t take the bigger picture and codebase conventions into account

‣ Almost every patch needs to be processed, polished, amended

‣ Maintainer: lots of curation, less development — even on this small scale (2K LOC, 7K LOT)

‣ Contributors very eager to code, but a bit afraid to talk

Tire’s Ruby on Rails integration

$ rails new myapp \
    -m "https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb"

‣ Generate a fully working Rails application with a single command

‣ Downloads elasticsearch if not running, creates the application, commits every step, seeds the example data, launches the application on a free port, …

‣ Tire::Results::Item fully compatible with Rails view / URL helpers

‣ Any ActiveModel compatible OxM supported

‣ Rake task for importing data (using pagination libraries)

Rails integration baked in

‣ No proper separation of concerns / layers

‣ People expect everything to be as easy as that

‣ Tire::Results::Item baked in, not opt-in, masquerades as models

‣ People consider ActiveRecord the only OxM in the world

Base library (HTTP, JSON, API)

The Ruby DSL

ActiveModel integration

ActiveRecord extensions

Rails extensions

Persistence extension

class Rubygem < ActiveRecord::Base
  # ...

  def self.search(query)
    conditions = <<-SQL
      versions.indexed and
        (upper(name) like upper(:query) or
         upper(translate(name, '#{SPECIAL_CHARACTERS}', '#{' ' * SPECIAL_CHARACTERS.length}')) like upper(:query))
    SQL

    where(conditions, {:query => "%#{query.strip}%"}).
      includes(:versions).
      by_downloads
  end
end

https://github.com/rubygems/rubygems.org/blob/master/app/models/rubygem.rb

“Search”

“Hello Cloud” with Chef Server

http://git.io/chef-hello-cloud

‣ Deploy Rubygems.org on EC2 (or locally with Vagrant) from a “zero state”

‣ 1 load balancer (HAproxy), 3 application servers (Thin+Nginx)

‣ 1 database node (PostgreSQL, Redis)

‣ 2 elasticsearch nodes

‣ Install Ruby 1.9.3 via RVM

‣ Clone the application from GitHub repository

‣ init.d scripts and full configuration for every component

‣ Restore data from backup (database dump) and import into search index

‣ Monitor every part of the stack

Thanks!