20151022 elasticsearch: Adoption and Use (송준이, prepared for an SDS presentation)

Adoption and Use

LOEN Entertainment, Platform Development Team, 2015.10.22, 송준이 ([email protected])

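Contents

• Before we begin
  – What is search?
• Demo
  – Developer Guide
    • Installation
  – What is a search engine?
    • Indexing and search
    • Searching
  – What is morphological analysis?
    • Korean morphological analysis
    • Synonym handling
  – Applications
    • Power network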

Contents

• Supplementary material
  – about elasticsearch
  – inside a cluster
  – inside a shard

Before we begin…

• Suppose we want to find the songs whose lyrics contain “사랑” (“love”)

What is search?

Document
  title: 1, lyrics: …

Indexing into a table in an RDBMS
  title | lyrics
  1     | …
  2     | …
  …     | …
Search: find the rows whose lyrics contain “사랑” → full scan => 1M

Indexing into an inverted index in a search engine
  term             | docs
  사랑 (love)      | 1, 7, 3, …
  행복 (happiness) | 23, 54, …
  …                | …
Search: find the rows containing “사랑” → no full scan, a single term lookup => 1
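The contrast above can be sketched in a few lines of Python. This is a minimal illustration, not Elasticsearch code: the lyrics data is hypothetical, and splitting on whitespace is a stand-in for the Korean morphological analysis covered later in the talk.

```python
from collections import defaultdict

# Hypothetical lyrics table: song id -> lyrics (illustrative data only)
songs = {
    1: "사랑 해요 오늘도",
    2: "행복 한 하루",
    3: "너를 사랑 해",
    7: "사랑 이 뭐길래",
}

def full_scan(table, term):
    """RDBMS-style search: read every row, so cost grows with the number of rows."""
    return sorted(doc_id for doc_id, lyrics in table.items() if term in lyrics.split())

def build_inverted_index(table):
    """Indexing step: map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, lyrics in table.items():
        for term in lyrics.split():  # whitespace split stands in for a real analyzer
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

inverted = build_inverted_index(songs)
# Search is now one dictionary lookup instead of a pass over every row.
print(inverted.get("사랑", []))
print(full_scan(songs, "사랑"))
```

Both calls return the same documents; the difference is that the inverted index pays the cost once at indexing time, so each search touches one posting list instead of every row.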

Demo

melon-helloEs

http://socurites.com:8088/melon-helloes-0.0.1-SNAPSHOT/overview/intro




elasticsearch-head

http://socurites.com:9200/_plugin/head/



Marvel Sense

http://socurites.com:9200/_plugin/marvel/sense/index.html



about elasticsearch

Overview

• Distributed and document-oriented
  – data store
    • scales out to hundreds of servers
    • stores petabytes of data
    • data structure based on documents (serialized JSON objects)
    • supports partial document updates
      – documents are fundamentally immutable
      – update = replacement (internally)
  – search engine
    • uses Lucene as its internal engine
    • every field is indexed, so every field is searchable
  – real-time analytics platform
    • supports aggregations
    • approximate aggregation = (big data + real-time analysis) - precision
      – analyzes big data in real time by giving up some precision
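To make the “big data + real time - precision” trade-off concrete, here is a small Python sketch of an approximate distinct count. Note that it uses linear counting, a deliberately simpler estimator than the HyperLogLog++ algorithm behind Elasticsearch's cardinality aggregation; the play data and the bitmap size m are hypothetical.

```python
import hashlib
import math

def linear_count(values, m=1024):
    """Estimate the number of distinct values with an m-slot bitmap (linear counting).

    Memory stays at m slots however many values are seen (a list of booleans here,
    for clarity); the price is that the answer is an estimate, not an exact count.
    """
    bitmap = [False] * m
    for v in values:
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        bitmap[int.from_bytes(digest[:8], "big") % m] = True
    zero_fraction = bitmap.count(False) / m
    if zero_fraction == 0:
        raise ValueError("bitmap saturated; use a larger m")
    return -m * math.log(zero_fraction)

# Hypothetical data: 100,000 plays of 500 distinct songs
plays = [i % 500 for i in range(100_000)]
exact = len(set(plays))       # exact: memory grows with the number of distinct values
approx = linear_count(plays)  # approximate: fixed-size bitmap
print(exact, round(approx))
```

The exact answer must remember every distinct value; the estimate only remembers which bitmap slots were hit, which is why this family of techniques scales to large data at the cost of a small error.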

Store: storing, indexing
Search: searching, filtering, ordering
Analyze: analyzing, aggregation
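The overview notes that documents are immutable and that a partial update is internally a replacement. A minimal Python sketch of that idea follows; the StoredDoc class and the in-memory store are hypothetical stand-ins. In Elasticsearch the old version is marked as deleted and the merged document is indexed anew, while here a dictionary assignment plays that role.

```python
from dataclasses import dataclass
from types import MappingProxyType

@dataclass(frozen=True)
class StoredDoc:
    """Hypothetical stored document: frozen, and its source is a read-only mapping."""
    version: int
    source: MappingProxyType

def make_doc(version, source):
    return StoredDoc(version, MappingProxyType(dict(source)))

def partial_update(store, doc_id, changes):
    """Apply a partial update as a full replacement:
    1. read the old document, 2. merge the changed fields into a copy,
    3. store a new document with a higher version in place of the old one.
    """
    old = store[doc_id]
    merged = {**old.source, **changes}
    new = make_doc(old.version + 1, merged)
    store[doc_id] = new  # the old object is untouched; it is simply replaced
    return new

store = {"1": make_doc(1, {"title": "1", "lyrics": "…"})}
before = store["1"]
after = partial_update(store, "1", {"lyrics": "사랑 …"})
print(before.version, before.source["lyrics"])
print(after.version, after.source["title"], after.source["lyrics"])
```

The caller only sends the changed field, but what gets stored is a complete new document; nothing is edited in place.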

life inside a cluster

node & cluster

• A cluster of 1 node holding 0 indices
  – node: a running elasticsearch instance
    • master node
      – manages the cluster
        » adds / removes indices
        » adds / removes nodes
      – is elected by vote
      – document-level changes and searches all happen on the data nodes, so the master node does not become a bottleneck
  – cluster: the set of nodes that share the same cluster.name

Reference: cluster / node / index

https://github.com/socurites/elasticsearch-foot
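The slide says the master node is elected by vote. A minimal Python sketch of the majority rule behind that vote: in Elasticsearch 1.x the discovery.zen.minimum_master_nodes setting is recommended to be (master-eligible nodes / 2) + 1, so that at most one side of a network partition can elect a master. The 3-node partition below is hypothetical, and this is not the election protocol itself, only the quorum check.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum: a strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1

def can_elect_master(reachable: int, master_eligible_nodes: int) -> bool:
    """A group of nodes may elect a master only if it can see a quorum."""
    return reachable >= minimum_master_nodes(master_eligible_nodes)

# Hypothetical 3-node cluster split by a network partition into groups of 2 and 1
total = 3
for group_size in (2, 1):
    print(group_size, can_elect_master(group_size, total))
```

Only the 2-node side reaches the quorum of 2, so the cluster cannot end up with two masters (split brain).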



index & shard

• A cluster of 1 node holding 1 index made up of 3 primary shards
  – index: a storage unit for related data
    • a logical namespace over the physical shards
  – shard: holds part of an index's data; together, the shards make up the whole index
    • where indexed documents are actually stored
    • documents are distributed across multiple shards, which is what makes scale-out possible
    • primary shard
      – every document lives in exactly one primary shard
      – the number of primary shards is set when the index is created and cannot be changed afterwards (default: 5)
    • replica shard
      – a copy of a primary shard
      – used for recovery on failure and serves concurrent reads for searches
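Why can the number of primary shards not change? Each document is routed with shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document id. The Python sketch below illustrates this under stated assumptions: zlib.crc32 is a stand-in for the hash Elasticsearch actually uses, and the document ids are hypothetical.

```python
import zlib

def route(doc_id: str, number_of_primary_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards, with _routing defaulting to the id.
    zlib.crc32 is a stand-in here, not the hash function Elasticsearch actually uses."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

ids = [f"song-{i}" for i in range(1000)]   # hypothetical document ids
placed_5 = {d: route(d, 5) for d in ids}   # index created with the default 5 primaries
placed_6 = {d: route(d, 6) for d in ids}   # what if the count could change to 6?
moved = sum(placed_5[d] != placed_6[d] for d in ids)
print(f"{moved} of {len(ids)} documents would now be looked up on the wrong shard")
```

Because the modulus is part of the formula, changing the primary count would send lookups for most existing documents to a shard that does not hold them; replicas carry no such constraint.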

1 replica

• A cluster of 2 nodes holding 1 index with 3 primary shards and 1 replica

scale out – shard reallocation


– shards are reallocated to the newly added node, so its extra computing power can be fully used
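A toy Python allocator makes the reallocation slides concrete. It relies on one real rule (two copies of the same shard never share a node, which is why replicas stay unassigned on a 1-node cluster) plus a hypothetical least-loaded placement policy. Unlike Elasticsearch, which moves only some shards when a node joins, this sketch simply recomputes the whole layout.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocator: place each copy on the least-loaded node that has no copy of that shard yet.

    Returns (layout, unassigned); layout maps node -> [(shard, "P" or "R"), ...].
    """
    layout = {n: [] for n in nodes}
    unassigned = []
    for shard in range(num_primaries):
        holders = set()  # nodes that already hold a copy of this shard
        for kind in ["P"] + ["R"] * num_replicas:
            candidates = [n for n in nodes if n not in holders]
            if not candidates:
                unassigned.append((shard, kind))  # e.g. replicas on a 1-node cluster
                continue
            target = min(candidates, key=lambda n: (len(layout[n]), nodes.index(n)))
            layout[target].append((shard, kind))
            holders.add(target)
    return layout, unassigned

# 3 primaries + 1 replica, as nodes are added (hypothetical node names)
for nodes in (["node1"], ["node1", "node2"], ["node1", "node2", "node3"]):
    layout, unassigned = allocate(3, 1, nodes)
    print({n: len(c) for n, c in layout.items()}, "unassigned:", len(unassigned))
```

With one node the three replicas cannot be placed; with two or three nodes every copy is placed and the load spreads evenly.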

2 replica


recovery on failure

• A cluster of 3 nodes holding 1 index with 3 primary shards and 2 replicas
  – when node 1, the master node, shuts down:
    • master node election
      – node 2 is elected as the new master node
    • recovering primary shards
      – primary shards 1 and 2, which were on node 1, are lost
      – their replica shards on the surviving nodes are promoted to primary shards
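The primary-shard recovery step can be sketched in Python as follows. The layout is a hypothetical one matching the slide (node 1 holds primaries 1 and 2); master election is a separate step and is not modeled, nor is the later re-creation of missing replicas.

```python
def fail_node(layout, dead):
    """Remove a failed node; for each primary it held, promote one surviving replica."""
    lost_primaries = [shard for shard, kind in layout[dead] if kind == "P"]
    survivors = {n: list(copies) for n, copies in layout.items() if n != dead}
    for shard in lost_primaries:
        for copies in survivors.values():
            if (shard, "R") in copies:
                copies[copies.index((shard, "R"))] = (shard, "P")  # replica becomes primary
                break
        # if no replica survived, the shard is left without a primary
    return survivors

# Hypothetical layout: 3 primaries, 2 replicas, 3 nodes; node1 holds primaries 1 and 2
layout = {
    "node1": [(1, "P"), (2, "P"), (0, "R")],
    "node2": [(0, "P"), (1, "R"), (2, "R")],
    "node3": [(0, "R"), (1, "R"), (2, "R")],
}
after = fail_node(layout, "node1")
print(after)
```

After the failure every shard still has exactly one primary, because a full copy of each lost primary already existed on another node.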

inside a shard

flush, refresh, optimize

• In elasticsearch
  – search is near real-time: by default, an indexed document becomes searchable after 1s
  – CRUD is real-time
  – data persistence is guaranteed
  – a delete operation does not immediately free disk space

• Why?
  – refresh
  – flush
  – optimize
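A toy Python model of one shard's write path shows why search lags indexing while data is still durable. It is illustrative only, not Elasticsearch code: refresh turns the in-memory buffer into a searchable segment (in Elasticsearch this happens every index.refresh_interval, 1s by default), and flush commits the segments and empties the translog, which is replayed after a crash.

```python
class ToyShard:
    """Toy model of one shard's write path (illustrative, not Elasticsearch code).

    index(): the doc goes to an in-memory buffer and is appended to the translog.
    refresh(): the buffer becomes a new searchable segment.
    flush(): segments are committed to disk and the translog is emptied.
    """

    def __init__(self):
        self.buffer = []     # indexed, not yet searchable
        self.segments = []   # searchable segments (may not be on disk yet)
        self.committed = []  # segments covered by the last commit point
        self.translog = []   # every document indexed since the last flush

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def search(self, term):
        return [doc for seg in self.segments for doc in seg if term in doc.split()]

    def refresh(self):
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        self.refresh()
        self.committed = list(self.segments)
        self.translog = []

    def crash_and_recover(self):
        """Memory is lost; restart from the last commit and replay the translog."""
        self.segments = list(self.committed)
        self.buffer = list(self.translog)
        self.refresh()

shard = ToyShard()
shard.index("사랑 노래")
print(shard.search("사랑"))  # empty: not refreshed yet
shard.refresh()
print(shard.search("사랑"))
shard.index("행복 노래")
shard.crash_and_recover()    # nothing was flushed, but the translog restores both docs
print(shard.search("노래"))
```

The gap between index() and refresh() is the near-real-time delay; the translog is what keeps un-flushed documents from being lost.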

dynamically updatable indices

• Keep the benefits of an immutable index while making the index updatable
  – “use multiple indices”
    • keep the existing large inverted index unchanged
    • periodically write new documents into a new index
    • on a search request, search each index in turn, then merge the results and return them
  – segment: one of these multiple indices

• Terminology
  – index = segments + commit point
  – commit point = the list of current segments

Figure: an index is made of shards, and each shard is made of segments.

http://socurites.com/big-data/elasticsearch/elasticsearchs-index-structrue-segment-merging
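The segment model above can be sketched as a toy Python class: immutable segments, a commit point listing the current segments, deletes kept only as marks, and a merge roughly like optimize (later renamed force merge). All names here are hypothetical and the model is far simpler than Lucene, but it shows why a delete frees no disk space until segments are merged.

```python
class SegmentedIndex:
    """Toy Lucene-style index: immutable segments + a commit point + deletion marks."""

    def __init__(self):
        self.segments = {}      # segment name -> tuple of (doc_id, text); never modified
        self.commit_point = []  # names of the current segments
        self.deleted = set()    # doc ids marked as deleted (kept apart, like .del files)
        self._counter = 0

    def add_segment(self, docs):
        name = f"seg{self._counter}"
        self._counter += 1
        self.segments[name] = tuple(docs)
        self.commit_point.append(name)
        return name

    def delete(self, doc_id):
        self.deleted.add(doc_id)  # only a mark: no segment shrinks, no disk is freed

    def search(self, term):
        hits = []
        for name in self.commit_point:  # search each segment in turn...
            hits += [d for d, text in self.segments[name]
                     if term in text.split() and d not in self.deleted]
        return sorted(hits)             # ...then merge the results

    def stored_docs(self):
        return sum(len(self.segments[name]) for name in self.commit_point)

    def merge_all(self):
        """Roughly what optimize does: copy live docs into one new segment, drop the old ones."""
        live = [(d, text) for name in self.commit_point
                for d, text in self.segments[name] if d not in self.deleted]
        for name in self.commit_point:
            del self.segments[name]
        self.commit_point = []
        self.deleted = set()
        self.add_segment(live)

idx = SegmentedIndex()
idx.add_segment([(1, "사랑 노래"), (2, "행복 노래")])  # the existing large segment
idx.add_segment([(3, "사랑")])                        # new documents go into a new segment
idx.delete(1)
print(idx.search("사랑"), idx.stored_docs())  # doc 1 is hidden but still stored
idx.merge_all()
print(idx.search("사랑"), idx.stored_docs())  # the merge finally drops doc 1
```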




