Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
![Page 1: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/1.jpg)
![Page 2: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/2.jpg)
Approaching Join Index: yet another join algorithm
Mikhail Khludnev, principal engineer
![Page 3: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/3.jpg)
• Grid Dynamics is a Silicon Valley-based leading provider of scalable, next-generation commerce technology solutions
• Record of outperformance with Tier 1 retail clients
• Fortune 1000 client relationships
![Page 4: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/4.jpg)
About Me
● principal engineer at Grid Dynamics
● spoke at the last few Lucene Revolution conferences
● contributed the BlockJoin query parser for Solr - {!parent}
● blogged about it at http://blog.griddynamics.com/
● tried to fix threads in DataImportHandler
http://google.com/+MikhailKhludnev
![Page 5: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/5.jpg)
You are expected to know
● how Lucene searches/filters
● how it counts facets
● that there are segments
● what DocValues are
● why joins are needed
● RDBMS joins: nested loop join, sort-merge join and hash join
![Page 6: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/6.jpg)
I’m expected to know
● query-time join
● index-time join
● yet another join
![Page 7: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/7.jpg)
Lucene/Solr Is Strong
● searching
○ filtering
● analytics
○ facets
○ pivots
○ stats
![Page 8: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/8.jpg)
Lucene/Solr Is Weak in
● robust joins
○ multiple entities
○ relations
![Page 9: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/9.jpg)
Joins in General
PK=FK
![Page 14: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/14.jpg)
Joins in General
parents 1:M children
![Page 15: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/15.jpg)
Executing Join Query
[Figure, built up over five slides: a query q and a filter fq applied to the joined document sets.]
![Page 20: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/20.jpg)
Join in General
parents ∩ join-relation ∩ children
● input
● output
● enumeration order
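The formula above can be read as a set expression. The following is a minimal sketch under stated assumptions: the join relation is modeled as explicit (parent, child) pairs, and the function name `join_to_parents` and all sample ids are hypothetical, not taken from the talk. It only illustrates the three ingredients (input sets, output set, and the order in which the relation is enumerated).

```python
from typing import Iterable, Set, Tuple


def join_to_parents(parents: Set[int],
                    relation: Iterable[Tuple[int, int]],
                    children: Set[int]) -> Set[int]:
    """Return the parents that pass the parent-side filter and have
    at least one child that passes the child-side query.

    `relation` is enumerated pair by pair; that loop is the
    "enumeration order" of this toy join.
    """
    return {p for (p, c) in relation if p in parents and c in children}


# Hypothetical data: (parent_id, child_id) pairs.
relation = [(1, 10), (1, 11), (2, 20), (3, 30)]
matching_parents = {1, 2}      # input: parent-side hits
matching_children = {11, 30}   # input: child-side hits

print(sorted(join_to_parents(matching_parents, relation, matching_children)))  # [1]
```

Parent 3 is dropped because it fails the parent-side filter, and parent 2 because none of its children match.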
![Page 22: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/22.jpg)
JoinUtil
[Figure, built up over four slides: q selects documents on one side; their keys are read from the per-document column FK[doc#] (“25”, “17”, “17”, “25”, “56”, ...); the collected keys (“25”, “17”, ...) are looked up in the FK term dictionary (“1” →△ … “17” →△ “25” →△ ...); the resulting documents are intersected with fq.]
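A query-time join of this kind can be modeled in two phases: collect the key values of the documents matched on one side, then look each key up in the term dictionary of the other side. The sketch below is a toy model in plain Python and does not call Lucene; `build_postings`, `query_time_join`, the document numbers, and the key values are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


def build_postings(key_by_doc: Dict[int, str]) -> Dict[str, List[int]]:
    """Invert doc -> key into key -> ascending doc list (a toy term dictionary)."""
    postings: Dict[str, List[int]] = defaultdict(list)
    for doc in sorted(key_by_doc):
        postings[key_by_doc[doc]].append(doc)
    return dict(postings)


def query_time_join(from_hits: Iterable[int],
                    fk_by_doc: Dict[int, str],
                    to_postings: Dict[str, List[int]],
                    to_filter: Optional[Set[int]] = None) -> Set[int]:
    # Phase 1: read the key of every "from" hit and collect the distinct keys.
    keys = {fk_by_doc[doc] for doc in from_hits if doc in fk_by_doc}
    # Phase 2: look each key up in the "to" side term dictionary.
    joined: Set[int] = set()
    for key in sorted(keys):
        joined.update(to_postings.get(key, []))
    # Optional filter (fq) on the "to" side.
    return joined if to_filter is None else joined & to_filter


# Hypothetical index: children carry an FK, parents carry a PK.
child_fk = {0: "25", 1: "17", 2: "17", 3: "25", 4: "56"}
parent_pk = {10: "17", 11: "25", 12: "56", 13: "4"}
parent_postings = build_postings(parent_pk)

print(sorted(query_time_join({1, 3}, child_fk, parent_postings)))            # [10, 11]
print(sorted(query_time_join({1, 3}, child_fk, parent_postings, {11, 12})))  # [11]
```

Phase 2 is the part that costs one term lookup per distinct key, which is the "expensive term enum" the deck refers to.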
![Page 26: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/26.jpg)
Block Join
[Figure, built up over six slides: documents laid out in doc# order with a bitmap (1 0 0 1 0 0 1 0 0 1 0) delimiting parent/child blocks; q and fq are applied along that order.]
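The block layout lets a join run as a single pass over document numbers. Here is a minimal sketch, assuming Lucene's block-join convention that child documents are indexed immediately before their parent, so the parent of a child is the next set bit in the parent bitmap. The function name and the sample bitmap are illustrative and are not the bitmap from the slides.

```python
from typing import List, Optional, Set


def to_parent_block_join(parent_bits: List[int],
                         child_hits: Set[int],
                         parent_filter: Optional[Set[int]] = None) -> List[int]:
    """Map child hits to their parents by scanning forward to the next parent bit."""
    parents: List[int] = []
    last_parent = -1
    for child in sorted(child_hits):
        if parent_bits[child]:
            continue  # this doc is a parent, not a child hit
        if child < last_parent:
            continue  # this block was already resolved by an earlier hit
        doc = child + 1
        while doc < len(parent_bits) and not parent_bits[doc]:
            doc += 1
        if doc == len(parent_bits):
            break  # trailing children without a parent
        last_parent = doc
        if parent_filter is None or doc in parent_filter:
            parents.append(doc)
    return parents


# Three blocks: children 0-1 -> parent 2, 3-4 -> parent 5, 6-7 -> parent 8.
parent_bits = [0, 0, 1, 0, 0, 1, 0, 0, 1]

print(to_parent_block_join(parent_bits, {0, 1, 4}))          # [2, 5]
print(to_parent_block_join(parent_bits, {0, 1, 4}, {5, 8}))  # [5]
```

No keys and no term lookups are involved; the price is that a block must be written contiguously, so changing one document means rewriting its whole block.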
![Page 32: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/32.jpg)
Comparison
searching is fast: JoinUtil < ? < BlockJoin
reindexing is fast: JoinUtil > ? > BlockJoin
(the missing middle: a join over doc#)
![Page 38: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/38.jpg)
Join Index
[Figure, built up over six slides: a per-document column doc#[doc#] stores document numbers instead of keys; q on one side is mapped through it and intersected with fq on the other.]
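One way to read the doc#[doc#] column is as an array that stores, for each document, the document number of its join counterpart, so that the key lookup is paid once at build time rather than per query. The sketch below follows that reading; it is an assumption about what the slides depict, and `build_join_index`, `join_via_index`, `NO_PARENT`, and the sample data are hypothetical names and values.

```python
from typing import List, Optional, Set

NO_PARENT = -1  # sentinel for documents that have no join counterpart


def build_join_index(fk_by_doc: List[Optional[str]],
                     pk_by_doc: List[Optional[str]]) -> List[int]:
    """For each doc number, store the doc number whose PK equals its FK."""
    doc_by_pk = {pk: doc for doc, pk in enumerate(pk_by_doc) if pk is not None}
    return [doc_by_pk.get(fk, NO_PARENT) if fk is not None else NO_PARENT
            for fk in fk_by_doc]


def join_via_index(join_index: List[int],
                   child_hits: Set[int],
                   parent_filter: Optional[Set[int]] = None) -> Set[int]:
    """Query time: one array read per child hit, no term dictionary lookups."""
    result: Set[int] = set()
    for child in child_hits:
        parent = join_index[child]
        if parent != NO_PARENT and (parent_filter is None or parent in parent_filter):
            result.add(parent)
    return result


# Docs 0 and 3 are parents (they have a PK); the others are children (they have an FK).
pk_by_doc = ["17", None, None, "25", None, None]
fk_by_doc = [None, "17", "25", None, "25", "17"]

join_index = build_join_index(fk_by_doc, pk_by_doc)
print(join_index)                                          # [-1, 0, 3, -1, 3, 0]
print(sorted(join_via_index(join_index, {1, 4})))          # [0, 3]
print(sorted(join_via_index(join_index, {1, 4}, {3})))     # [3]
```

Unlike the block layout, parents and children may sit anywhere in doc# order, but the array has to be kept in sync whenever document numbers change.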
![Page 44: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/44.jpg)
Benchmarking 2.9 M docs
https://github.com/m-khl/lucene-solr/tree/dvjoin-benchmark
[Bar chart: latency in ms (the bigger, the worse) for BlockJoin (i-time), Join Index and JoinUtil (q-time); extracted bar values: 10, 27, 14.]
![Page 45: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/45.jpg)
Indexing is still a problem
[Figure: the doc#[doc#] join index.]
![Page 46: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/46.jpg)
Further Plans
● incremental join-index update
● perhaps just calculate and cache it
● join in both directions
● calculate an optimal execution plan for segment enumeration
![Page 47: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/47.jpg)
Summary
● Joins in General
● JoinUtil vs Block-join
● {!scorejoin } - SOLR-6234
● updatable DocValues
● opportunities for improving query-time joins:
○ eliminate term enum
○ choose lower cardinality side for enumeration
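The last opportunity, enumerating from the lower-cardinality side, can be sketched as follows. This is an illustration under assumptions: both directions of the relation are available as in-memory maps, "cardinality" is simply the size of each hit set, and the names `invert` and `join_smaller_side` are hypothetical. A real planner would need proper cost estimates.

```python
from typing import Dict, List, Set, Tuple


def invert(child_to_parent: Dict[int, int]) -> Dict[int, List[int]]:
    """Build the parent -> children direction from child -> parent."""
    parent_to_children: Dict[int, List[int]] = {}
    for child, parent in sorted(child_to_parent.items()):
        parent_to_children.setdefault(parent, []).append(child)
    return parent_to_children


def join_smaller_side(child_hits: Set[int],
                      parent_hits: Set[int],
                      child_to_parent: Dict[int, int],
                      parent_to_children: Dict[int, List[int]]) -> Tuple[Set[int], str]:
    """Return matching parents and the side that was enumerated."""
    if len(child_hits) <= len(parent_hits):
        # Fewer child hits: walk children, probe the parent set.
        result = {child_to_parent[c] for c in child_hits
                  if child_to_parent.get(c) in parent_hits}
        return result, "children"
    # Fewer parent hits: walk parents, probe the child set.
    result = {p for p in parent_hits
              if any(c in child_hits for c in parent_to_children.get(p, []))}
    return result, "parents"


child_to_parent = {1: 0, 2: 0, 4: 3, 5: 3}
parent_to_children = invert(child_to_parent)

print(join_smaller_side({1}, {0, 3}, child_to_parent, parent_to_children))     # ({0}, 'children')
print(join_smaller_side({1, 2, 4}, {3}, child_to_parent, parent_to_children))  # ({3}, 'parents')
```

Both branches compute the same set; only the number of probes differs.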
![Page 48: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/48.jpg)
References
● Searching relational content with Lucene's BlockJoinQuery, http://blog.mikemccandless.com/
● Solr Experience: search parent-child relations. Part I; Solr block-join support, http://blog.griddynamics.com/
● https://wiki.apache.org/solr/Join
● http://www.slideshare.net/martijnvg/document-relations
● https://issues.apache.org/jira/browse/SOLR-6234 {!scorejoin }
● Updatable DocValues Under the Hood, http://shaierera.blogspot.com/
● Subject: How to openIfChanged the most recent merge? at: [email protected]
![Page 49: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/49.jpg)
Thanks for Joining us!
![Page 50: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/50.jpg)
Off scope
![Page 51: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/51.jpg)
Joins’ Zoo in Lucene
Workarounds
● term positions/SpanQueries
● FieldCollapsing/Grouping
● term decoration
○ spatial
● multivalue fields
True Joins
● query-time join
○ JoinUtil
○ {!join }
○ {!scorejoin } - SOLR-6234
● index-time join aka block-join {!parent}
![Page 55: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/55.jpg)
JoinUtil
● query-time
● indexing is fast
● searching is slow, why?
○ expensive term enum
○ single enumeration order
BlockJoin
● index-time
● reindexing the whole block is expensive, and mandatory
● searching is darn fast, however
○ can’t reorder child docs
![Page 56: Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics](https://reader030.fdocument.pub/reader030/viewer/2022032420/55a4e1f21a28aba70e8b484e/html5/thumbnails/56.jpg)