Elgg solr presentation
Transcript of Elgg solr presentation
Elgg Search Scalability & Solr Integration
Matt Beckett
Community search moved to Solr: Jun 17, 2014
Matt Beckett
Elgg Core Team Member
Lead Dev Arck Interactive
Scuba Diver
Hello and welcome.
My name is Matt Beckett; you may know me from such places as the internet and underwater. I have been involved with Elgg since April 2011 and quickly became a very productive plugin writer for various clients, notably Athabasca University. I have been a member of the Elgg core team since October 2013. I'm also the lead developer at Arck Interactive, one of the top Elgg dev outfits.
Sorry for the shameless plug, but every time I say Arck Interactive Paul gives me a raise ;)
Outline
Bundled Elgg Search
Scalability issues
Birth of the Elgg Solr Plugin
What is Solr?
Elgg-Solr integration
Customization
Case Study
Before we dive into the code, let's back up a bit and take a look at the history of search in Elgg. I came to Elgg when it was at version 1.7.8, and search was a bundled core plugin. It has been since 1.7.0. According to the code attribution it was a collaborative effort between Curverider and The MITRE Corporation (oh, also, whenever I say The MITRE Corporation they send me a contract offer worth more than Paul's last raise, so this should be a profitable trip!)
The core plugin brought some important features to search: a standardized hook-based framework and a nice way to customize results display with simple view overrides.
The plugin is mostly unchanged from that point to now.
Elgg Search
Bundled core plugin
Provides customizable UI
Search logic is hookable
Works out of the box
Bundled with Elgg: the ability to search is something people expect from a social framework, and there it is, supported in core.
Works as advertised: you type in a query, and you get results matching that query. No magic involved and not much unexpected.
No setup/config: it comes enabled by default and there's nothing else to it. No APIs, external services or technical debt.
Elgg Search Scalability
Large sites run into slow search times
Can affect performance of all areas of site
Combination of MySQL and Elgg data normalization
Elgg Community - 2014
MyISAM: a big part of the standard Elgg performance improvements involves converting database tables to InnoDB for row-level locking and transactions. Benchmarks have consistently shown this to be faster overall, but until recently InnoDB did not support full-text search.
Not scalable with DB size: we saw this on the Elgg community site. Back in 2014 we had to switch over to Google search because searches were timing out after more than 30 seconds...
Tag search: we have the ability to register multiple names for tag metadata, and each one makes the query heavier.
SLOW: filtering results by arbitrary metadata.
So that's core search in Elgg. What is Solr?
What is Solr?
Java based search engine
Single purpose and built for speed
Flat xml document structure
File content searching
Flexible setup options (same/different server, load balancing)
First and foremost, Solr is a Java-based search engine. It's a single-purpose application built for fast searching of XML documents. The documents have arbitrary fields, so they can be fit to model your data. It has a file parser that allows for indexing the content of a wide array of file types.
Being Java-based, it's OS independent and can be deployed on the same web server as other applications such as Elgg, or load balanced across multiple servers.
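To make the "flat XML document" idea concrete, here is a minimal sketch of what a document sent to Solr can look like, using Solr's XML update format (an add element wrapping a doc of named fields). The helper name solr_add_xml() and the sample field names are assumptions for illustration, not code from the plugin.

```php
<?php
// Illustration only: solr_add_xml() is a hypothetical helper, not plugin code.
function solr_add_xml(array $fields): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $add = $dom->appendChild($dom->createElement('add'));
    $doc = $add->appendChild($dom->createElement('doc'));

    foreach ($fields as $name => $values) {
        // A multi-valued field is simply the same field name repeated.
        foreach ((array) $values as $value) {
            $field = $dom->createElement('field');
            $field->setAttribute('name', $name);
            // A text node lets DOM handle escaping of &, < and >.
            $field->appendChild($dom->createTextNode((string) $value));
            $doc->appendChild($field);
        }
    }

    // Serialize just the <add> element, without the XML declaration.
    return $dom->saveXML($add);
}

$xml = solr_add_xml([
    'id' => 123,
    'title' => 'Diving & Elgg',
    'tags' => ['elgg', 'solr'],
]);
echo $xml, "\n";
```

Note that there is no nesting inside the doc element: everything about the record is a top-level field, which is what "flat" means here.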
Solr Plugin Design
Generic for use in any Elgg project
Utilize existing:
- Pagehandlers
- Views
- Hooks
So why did I choose Solr? I didn't. The Solr plugin was originally started by Billy Gunn as a solution for one of our clients with a large database that was experiencing some major performance issues with search. Solr is, however, an official FOSS project of the Apache Software Foundation and is used by many big players, which means it's well tested, well maintained, and well supported. Those are all good qualities to look for when pulling a new service into a project. Billy did the original implementation for the client; I then took over and made some improvements, eventually rewriting it and making it generic enough for general release as an open source Elgg plugin.
Indexing
Mirroring an ElggEntity in Solr
Hookable custom field management
Flatten data structure
Match Solr entity with ElggEntity by GUID
Event-based synchronization
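As a rough sketch of "flatten data structure" and "match by GUID", the snippet below collapses a nested record into one flat array whose id is the GUID. The plain array is a hypothetical stand-in for an ElggEntity, and flatten_for_index() is an invented name; the plugin's actual field handling may differ.

```php
<?php
// Hypothetical stand-in for an ElggEntity: attributes plus separately stored metadata.
$entity = [
    'guid' => 123,
    'type' => 'object',
    'subtype' => 'blog',
    'attributes' => ['title' => 'Hello', 'description' => 'First post'],
    'metadata' => ['tags' => ['elgg', 'solr'], 'location' => 'Somewhere'],
];

// Assumed helper name; collapses the nested groups into one flat document.
function flatten_for_index(array $entity): array
{
    // The GUID doubles as the document id so index and database records can be matched.
    $doc = [
        'id' => $entity['guid'],
        'type' => $entity['type'],
        'subtype' => $entity['subtype'],
    ];

    // Attributes and metadata all become top-level fields.
    foreach (['attributes', 'metadata'] as $group) {
        foreach ($entity[$group] ?? [] as $name => $value) {
            $doc[$name] = $value;
        }
    }

    return $doc;
}

$doc = flatten_for_index($entity);
print_r($doc);
```

Because the id is the GUID, a search result only needs to carry that one value back; the full entity can then be loaded from the Elgg database.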
How it works
[Diagram: ElggEntity, Annotation, Elgg DB; create/update Event; Shutdown Event; Cached GUID; Solr Index]
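One way to read the flow above (events, cached GUIDs, shutdown, Solr index) is that event handlers only remember which GUIDs changed, and the actual indexing happens once at the end of the request. The sketch below shows that pattern in plain PHP. ReindexQueue and its methods are hypothetical names, and the indexer callback is a stand-in for whatever sends documents to Solr.

```php
<?php
// Hypothetical sketch: names below are illustrative, not the plugin's API.
final class ReindexQueue
{
    /** @var array<int, true> GUIDs awaiting reindex, keyed by GUID to dedupe */
    private array $guids = [];

    /** @var callable Receives the list of GUIDs to index */
    private $indexer;

    public function __construct(callable $indexer)
    {
        $this->indexer = $indexer;
    }

    // Called from a create/update event handler: remember the GUID only.
    public function defer(int $guid): void
    {
        $this->guids[$guid] = true;
    }

    // Called once at shutdown: hand every queued GUID to the indexer.
    public function flush(): int
    {
        $guids = array_keys($this->guids);
        $this->guids = [];
        if ($guids) {
            ($this->indexer)($guids);
        }
        return count($guids);
    }
}

$indexed = [];
$queue = new ReindexQueue(function (array $guids) use (&$indexed) {
    // Stand-in for "send these documents to Solr".
    $indexed = array_merge($indexed, $guids);
});

$queue->defer(42); // entity created
$queue->defer(42); // same entity updated later in the same request
$queue->defer(7);  // annotation added to another entity

// Do the slow work after the page has been handled.
register_shutdown_function([$queue, 'flush']);
```

Keying the queue by GUID means an entity that is touched several times in one request is still indexed only once.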
Searching
Pagehandler & hook calling handled by core plugin
Default hooks unregistered
Hook parameters interpreted into Solr Query notation
All default parameters handled automagically
Search Hook Parameters
$params['select'] = [
    'start' => (int) $offset,
    'rows' => (int) $limit,
    'fields' => (array) $fields, // field names to match against
];
$params['sorts'] = [
    'score' => 'desc',
    'time_created' => 'desc',
];
Search Hook Parameters
$params['qf'] = 'title^1.5 description^1 location^1';
$params['hlfields'] = ['title','description'];
$params['fragsize'] = 200;
Search Hook Parameters
$params['fq'] = ['type' => 'type:object','subtype' => 'subtype:blog'];
e.g. $params['fq'] = ['profile_pic' => 'profile_pic:true'];
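To show how hook parameters like these could map onto Solr's own request parameters (q, qf, fq, sort, start, rows, hl, hl.fl, hl.fragsize), here is a minimal sketch that builds a raw query string. This is an assumption-laden illustration: build_solr_query_string() is a made-up helper, the defaults are guesses, and the plugin itself talks to Solr through Solarium rather than hand-built URLs.

```php
<?php
// Sketch only: build_solr_query_string() is a made-up helper, and the real
// plugin goes through Solarium rather than building URLs by hand.
function build_solr_query_string(string $query, array $params): string
{
    $select = $params['select'] ?? [];
    $pairs = [
        ['q', $query],
        ['defType', 'edismax'], // qf boosts apply to the (e)dismax parsers
        ['start', (string) ($select['start'] ?? 0)],
        ['rows', (string) ($select['rows'] ?? 10)],
    ];

    // Weighted fields, e.g. "title^1.5 description^1".
    if (!empty($params['qf'])) {
        $pairs[] = ['qf', $params['qf']];
    }

    // Each filter becomes its own fq parameter; Solr intersects them.
    foreach ($params['fq'] ?? [] as $filter) {
        $pairs[] = ['fq', $filter];
    }

    // ['score' => 'desc', ...] becomes "score desc,time_created desc".
    $sorts = [];
    foreach ($params['sorts'] ?? [] as $field => $direction) {
        $sorts[] = $field . ' ' . $direction;
    }
    if ($sorts) {
        $pairs[] = ['sort', implode(',', $sorts)];
    }

    // Highlighting: which fields to highlight and how long each fragment is.
    if (!empty($params['hlfields'])) {
        $pairs[] = ['hl', 'true'];
        $pairs[] = ['hl.fl', implode(',', $params['hlfields'])];
        $pairs[] = ['hl.fragsize', (string) ($params['fragsize'] ?? 100)];
    }

    $encoded = [];
    foreach ($pairs as [$name, $value]) {
        $encoded[] = $name . '=' . rawurlencode($value);
    }
    return implode('&', $encoded);
}

$qs = build_solr_query_string('scuba', [
    'select' => ['start' => 0, 'rows' => 10],
    'qf' => 'title^1.5 description^1 location^1',
    'fq' => ['type' => 'type:object', 'subtype' => 'subtype:blog'],
    'sorts' => ['score' => 'desc', 'time_created' => 'desc'],
    'hlfields' => ['title', 'description'],
    'fragsize' => 200,
]);
echo $qs, "\n";
```

The array keys in fq ('type', 'subtype', 'profile_pic') never reach Solr in this sketch; they only give other hooks a name by which to override or remove a particular filter.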
How it works
[Diagram: User, Query, Search Pagehandler, Hook, Solarium, Solr Index, Results, Elgg DB]
Code time, finally
$event = new \ElggObject();
$event->subtype = 'event';
$event->access_id = ACCESS_PUBLIC;
$event->title = $title;
$event->description = $description;
$event->location = $location;
$event->start_time = time(); // starting now
$event->end_time = strtotime('+3 days'); // ending in 3 days
Helper Plugin
Dynamic fields
_i : integer
_is : array of integers
_s : string (title)
_ss : array of strings (tags, etc)
_t : general text (description)
_txt : array of texts
_b : boolean
_bs : array of booleans
_f : float
_fs : array of floats
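The suffix table above lends itself to a small lookup. The sketch below picks a suffix from the PHP type of a value; the suffixes are the ones listed on the slide, while dynamic_field_name() and its $fullText flag are hypothetical and not taken from the plugin. Strings default to the plain string types, so prose meant for full-text matching has to be flagged explicitly.

```php
<?php
// Illustrative helper only: the suffixes come from the slide, the function is made up.
function dynamic_field_name(string $name, $value, bool $fullText = false): string
{
    $multi = is_array($value);

    if ($fullText) {
        // Long prose goes into the general text types.
        return $name . ($multi ? '_txt' : '_t');
    }

    if ($multi && $value === []) {
        return $name . '_ss'; // nothing to inspect; treat as strings
    }

    // Inspect the first element of an array, or the value itself.
    $sample = $multi ? reset($value) : $value;

    if (is_bool($sample)) {
        $suffix = '_b';
    } elseif (is_int($sample)) {
        $suffix = '_i';
    } elseif (is_float($sample)) {
        $suffix = '_f';
    } else {
        $suffix = '_s';
    }

    // The multi-valued variants just append an "s": _is, _ss, _bs, _fs.
    return $name . $suffix . ($multi ? 's' : '');
}

echo dynamic_field_name('start_time', time()), "\n";
echo dynamic_field_name('tags', ['elgg', 'solr']), "\n";
echo dynamic_field_name('description', 'Three days of diving', true), "\n";
```

With names like these, the event's start_time from the earlier snippet could be stored as start_time_i and filtered or sorted numerically without a schema change.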
Case Study: EN MIRG
Executive Networks
Member Information Report Generator
Staff facilitated communication
Multiple reports with varying conditions
Solr to the rescue!
Conclusions
MySQL: 41.89 seconds
Solr: 0.29 seconds
Solr === Fast (144x faster in this case)
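For anyone checking the arithmetic, the speedup figure is just the ratio of the two timings quoted above. The variable names are only for illustration.

```php
<?php
// Timings quoted on the slide, in seconds.
$mysqlSeconds = 41.89;
$solrSeconds = 0.29;

// Speedup is the ratio of the slow time to the fast time.
$speedup = $mysqlSeconds / $solrSeconds;

printf("%.1fx faster\n", $speedup); // roughly 144x
```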
Todos
https://github.com/arckinteractive/elgg_solr
Code cleanup
Multi-threaded reindex
Index auto-correction
Other ideas?