DynamoDB Inside and Out - 정민영 (비트패킹 컴퍼니)

Post on 21-Jan-2017



DynamoDB Inside and Out

정민영

AWSKRUG

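The outside of DynamoDB

Amazon DynamoDB is a fully managed NoSQL database service that provides fast, predictable performance with seamless scalability.

DynamoDB offloads the administrative burden of operating and scaling a distributed database, so you do not have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.

DynamoDB automatically spreads a table's data and traffic across a sufficient number of servers to handle throughput and storage requirements while maintaining consistent, fast performance. All data is stored on SSDs (solid state disks) and automatically replicated across multiple Availability Zones in an AWS Region, which provides built-in high availability and data durability.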

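DDB: too good not to use

• It is managed,

• lets you design around predictable performance,

• guarantees consistent performance,

• and offers high availability and durability on top of that?

Life is never that easy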

The most common questions about DynamoDB

Why am I not getting the performance I pay for?

Why am I being throttled when I still have throughput to spare?

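Understanding DDB's characteristics correctly

• DDB has its own distinctive characteristics, and constraints that come with them

• In particular, it focuses on delivering consistent performance (= latency)

• ex) It rejects requests (throttling) rather than letting them slow down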

DDB's biggest strength is also the root cause of DDB's quirks and constraints:

Consistent performance

??

The inside of DDB: what it does to deliver consistent performance

Throughput

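• You provision as much throughput as you want for a table's reads (RCU) and writes (WCU)

• RCU and WCU are set independently of each other

• 1 WCU = 1 KB/sec

• 1 RCU = 4 KB/sec **

• DDB offers two consistency options for reading items.

• Strongly consistent vs. eventually consistent

• 1 RCU covers an item of up to 4 KB: 1 read/sec with strong consistency, 2 reads/sec with eventual consistency

• ex) RCU required to read a 6 KB item 100 times per second, with strong and with eventual consistency

• Strongly: Ceiling(6KB/4KB) * 1 * 100 = 2 * 1 * 100 = 200

• Eventually: Ceiling(6KB/4KB) / 2 * 100 = 2 / 2 * 100 = 100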
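The capacity arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, not an AWS API: the function names `required_rcu` and `required_wcu` are made up for illustration, and the only rules assumed are the ones on the slide (4 KB per read unit, 1 KB per write unit, eventual consistency at half the read cost).

```python
import math


def required_rcu(item_size_kb: float, reads_per_sec: int,
                 strongly_consistent: bool = True) -> int:
    """RCU needed to read items of the given size at the given rate."""
    # Each read is billed in 4 KB steps, rounded up.
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_sec
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = math.ceil(total / 2)
    return total


def required_wcu(item_size_kb: float, writes_per_sec: int) -> int:
    """WCU needed to write items of the given size at the given rate."""
    # Each write is billed in 1 KB steps, rounded up.
    return math.ceil(item_size_kb) * writes_per_sec


# The slide's example: a 6 KB item read 100 times per second.
print(required_rcu(6, 100, strongly_consistent=True))   # 200
print(required_rcu(6, 100, strongly_consistent=False))  # 100
```

Note that the rounding happens per item, not per batch: a 4.1 KB item costs as much to read as an 8 KB one.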

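Partitioning

• To deliver consistent performance, DDB spreads a table across multiple partitions

• The number of partitions is determined by the throughput and the table size

• Which partition an item is stored in is determined by its hash key (partition key)

• Formula for the number of partitions

• MAX(Ceiling(RCU/3000 + WCU/1000), Ceiling(Table Size(GB)/10GB))

• ex) 5000 RCU, 500 WCU, 8 GB: MAX(Ceiling(5000/3000 + 500/1000), Ceiling(8/10)) = MAX(3, 1) = 3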
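The partition formula can be checked with a short sketch. The function name `estimated_partitions` is hypothetical, and the rounding up of each term is an assumption inferred from the slide's worked example (2.17 becomes 3, 0.8 becomes 1); the floor of one partition is also an assumption.

```python
import math


def estimated_partitions(rcu: int, wcu: int, table_size_gb: float) -> int:
    """Estimate the partition count from throughput and table size."""
    # One partition serves up to 3000 RCU or 1000 WCU.
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    # One partition stores up to 10 GB.
    by_size = math.ceil(table_size_gb / 10)
    # Assumption: a table always has at least one partition.
    return max(by_throughput, by_size, 1)


# The slide's example: 5000 RCU, 500 WCU, 8 GB.
print(estimated_partitions(5000, 500, 8))  # 3
```

A small but large table is driven by the size term instead: `estimated_partitions(1000, 100, 50)` gives 5 under this model.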

Hash table

• Hash key uniquely identifies an item

• Hash key is used for building an unordered hash index

• Table can be partitioned for scale

[Figure: key space from 00 to FF, with tick labels 00, 54, 55, A9, AA, FF. Example items: Id = 1, Name = Jim, Hash(1) = 7B; Id = 2, Name = Andy, Dept = Engg, Hash(2) = 48; Id = 3, Name = Kim, Dept = Ops, Hash(3) = CD.]
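To make the key-space picture concrete, here is a rough sketch of how a hash value picks a partition. It assumes the figure's tick labels read 00, 54, 55, A9, AA, FF, which suggests three ranges (00 to 54, 55 to A9, AA to FF); that reading is an interpretation of the garbled labels. The hash values are the ones printed on the slide, not the output of DynamoDB's real hash function, and `partition_for` is a made-up name.

```python
# Inclusive upper bound of each partition's slice of the 00..FF key space.
# Assumption: three ranges, as read from the figure's tick labels.
PARTITION_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]


def partition_for(hash_value: int) -> int:
    """Return the index of the partition whose range contains hash_value."""
    if not 0x00 <= hash_value <= 0xFF:
        raise ValueError("hash value outside the 00..FF key space")
    for index, upper in enumerate(PARTITION_UPPER_BOUNDS):
        if hash_value <= upper:
            return index
    raise AssertionError("unreachable: the last bound covers FF")


# Hash values as printed on the slide: Hash(1) = 7B, Hash(2) = 48, Hash(3) = CD.
slide_hashes = {1: 0x7B, 2: 0x48, 3: 0xCD}
placement = {item_id: partition_for(h) for item_id, h in slide_hashes.items()}
print(placement)  # {1: 1, 2: 0, 3: 2}
```

With these values the three example items each land on a different partition, which is the evenly spread case.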

So then….

• If we use a very evenly distributed hash key (uniform-distribution hash key)

• and provision an appropriate throughput, nothing can go wrong!

DDB's pitfalls

1. Provisioned throughput is split evenly across all partitions

The popularity of hash keys can differ between reads and writes

(Hot Key)

[Figure: Example: hot keys. Heat map of partitions over time.]
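The effect of the even split can be illustrated with a toy model. This is a simplified sketch that ignores burst capacity; the function name `throttled_reads` and the traffic numbers are invented for the example.

```python
def throttled_reads(table_rcu: int, partitions: int,
                    demand_per_partition: list[int]) -> list[int]:
    """Reads per second rejected on each partition under an even split."""
    # Every partition gets the same share, no matter where traffic goes.
    per_partition = table_rcu // partitions
    return [max(0, demand - per_partition) for demand in demand_per_partition]


# 3000 RCU over 3 partitions: each partition can serve 1000 reads/sec.
demand = [1500, 100, 100]  # one hot key drives most of the traffic
print(sum(demand))                        # 1700, well under the 3000 provisioned
print(throttled_reads(3000, 3, demand))   # [500, 0, 0]
```

The table as a whole uses barely more than half of what was provisioned, yet the hot partition rejects 500 reads every second.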

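2. Once the number of partitions has grown, it never shrinks

Something to weigh before scaling throughput up

= RCU/WCU per partition can keep shrinking over time

This is the main cause of throttling when RCU/WCU still has headroom

The danger of responding by blindly raising throughput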
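A small model shows why a temporary throughput increase can hurt afterwards. This sketch assumes the behavior described in the talk (partitions split when throughput grows and never merge back) and the 3000 RCU / 1000 WCU per-partition limits from the formula; it ignores the table-size term, and `TableModel` is a hypothetical class, not part of any SDK.

```python
import math


class TableModel:
    """Toy model of a table whose partition count can only grow."""

    def __init__(self, rcu: int, wcu: int) -> None:
        self.rcu, self.wcu = rcu, wcu
        self.partitions = self._needed()

    def _needed(self) -> int:
        # Throughput term of the partition formula only.
        return max(1, math.ceil(self.rcu / 3000 + self.wcu / 1000))

    def set_throughput(self, rcu: int, wcu: int) -> None:
        self.rcu, self.wcu = rcu, wcu
        # Partitions split on the way up but never merge on the way down.
        self.partitions = max(self.partitions, self._needed())

    def rcu_per_partition(self) -> int:
        return self.rcu // self.partitions


table = TableModel(rcu=3000, wcu=1000)
print(table.partitions, table.rcu_per_partition())   # 2 1500

table.set_throughput(rcu=27000, wcu=1000)   # scale up for a traffic spike
table.set_throughput(rcu=3000, wcu=1000)    # scale back down afterwards
print(table.partitions, table.rcu_per_partition())   # 10 300
```

The provisioned numbers before and after are identical, but each partition now gets a fifth of the read capacity it had before the spike.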

3. LSIs consume the table's WCU

A point that is often missed when calculating the average item size
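As a rough illustration of the LSI cost, the sketch below estimates the write units for putting one new item. It is a simplified model under stated assumptions: the item is new, it defines the indexed attribute of every LSI, and each index entry is billed in 1 KB steps like the table write. Updates that change an indexed attribute can cost more than this. The function name and the sizes are made up.

```python
import math


def wcu_for_put(item_size_kb: float, lsi_entry_sizes_kb: list[float]) -> int:
    """Estimated WCU for putting one new item into a table with LSIs."""
    # The write to the table itself, rounded up to whole KB.
    table_cost = math.ceil(item_size_kb)
    # One extra write per LSI, sized by the attributes projected into it.
    index_cost = sum(math.ceil(size) for size in lsi_entry_sizes_kb)
    return table_cost + index_cost


# A 1.5 KB item, with two LSIs projecting 0.3 KB and 1.2 KB of it.
print(wcu_for_put(1.5, []))           # 2
print(wcu_for_put(1.5, [0.3, 1.2]))   # 5
```

Sizing WCU from the item size alone would provision 2 units per write here, when the table actually spends 5.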

Unused throughput gets saved up

...or so I've heard?

Burst capacity is built-in

[Figure: capacity units (0 to 1600) over time, showing provisioned vs. consumed capacity. Annotations: "Save up" unused capacity; Consume saved up capacity.]

Burst capacity: 300 seconds (1200 × 300 = 360,000 CU)

Burst capacity may not be sufficient

[Figure: capacity units (0 to 1600) over time, showing provisioned, consumed, and attempted capacity. Annotation: Throttled requests.]

Burst capacity: 300 seconds (1200 × 300 = 360,000 CU)

Don’t completely depend on burst capacity… provision sufficient throughput
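The two burst-capacity charts can be approximated with a simple bucket model. This is a sketch under assumptions, not DynamoDB's actual accounting: unused capacity is saved second by second, capped at 300 seconds' worth (1200 × 300 = 360,000 units for 1200 provisioned), and spent when demand exceeds the provisioned rate. The function name and the traffic pattern are invented.

```python
def simulate_burst(provisioned: int, demand_per_second: list[int],
                   window_seconds: int = 300) -> list[int]:
    """Return the capacity units throttled in each second."""
    max_saved = provisioned * window_seconds
    saved = 0
    throttled = []
    for demand in demand_per_second:
        if demand <= provisioned:
            # Quiet second: bank whatever was not used, up to the cap.
            saved = min(max_saved, saved + (provisioned - demand))
            throttled.append(0)
        else:
            # Busy second: cover the excess from savings while they last.
            excess = demand - provisioned
            from_burst = min(excess, saved)
            saved -= from_burst
            throttled.append(excess - from_burst)
    return throttled


# 10 quiet seconds at 800 units/sec, then a spike of 2400 units/sec.
demand = [800] * 10 + [2400] * 5
print(simulate_burst(1200, demand))
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 800, 1200]
```

The savings absorb the first three seconds of the spike and part of the fourth; after that the requests are throttled, which is the situation in the second chart.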

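But all of this, too, is per partition

Please, just patch this….

• DDB throughput is set per table*, but it is split evenly across partitions. In effect, throughput is a per-partition setting.

• Without filing a separate request, it is hard to track the state of your partitions, which makes throughput design difficult

• Throughput decreases are limited to 4 per 24 hours

• * Excluding GSIs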

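And yet….

DDB has clear strengths too

• Being able to keep latency consistently low is a huge advantage

• You can use it without worrying about storage capacity

• Rich data types make it possible to build complex, advanced features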

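Some best practices of our own

Table management

• If you can clearly tell popular (hot) items from less popular (cold) items, split them into separate tables

• ex) For time-series data, split tables by year-month or year-month-day

• Older tables may see no writes or reads at all, which saves cost

• Do not project every attribute into an index, and use a GSI in place of an LSI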
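The time-series split in the list above can be as simple as deriving the table name from the item's timestamp. This is a minimal sketch; the `events` prefix, the naming pattern, and the function name `table_name_for` are all assumptions for illustration.

```python
from datetime import datetime, timezone


def table_name_for(timestamp: datetime, prefix: str = "events",
                   daily: bool = False) -> str:
    """Pick the per-month (or per-day) table an item belongs to."""
    fmt = "%Y-%m-%d" if daily else "%Y-%m"
    return f"{prefix}-{timestamp.strftime(fmt)}"


ts = datetime(2017, 1, 21, tzinfo=timezone.utc)
print(table_name_for(ts))              # events-2017-01
print(table_name_for(ts, daily=True))  # events-2017-01-21
```

With one table per period, only the current table needs high write throughput, and older tables can be provisioned down separately.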

Keep a buffer of 50% or more

https://github.com/sebdah/dynamic-dynamodb

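DynamoDB Stream

• In DDB, Scan is very expensive, and even Query is relatively costly compared to Get

• For searching by key or by content, consider pairing DDB Streams with Elasticsearch

• It is also useful in cases such as splitting data across tables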
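One way to picture the Streams-to-search pairing is a function that turns a batch of stream records into indexing actions. This is a sketch under assumptions: it relies on the stream record shape (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage`) with a stream view type that includes the new image, it flattens only top-level scalar attributes, and it stops short of calling any search client. The function name and the `(action, id, document)` tuple format are invented for the example.

```python
def to_index_actions(event: dict) -> list[tuple]:
    """Convert DynamoDB stream records into (action, doc_id, document) tuples."""
    actions = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        # Build a document id from the key attributes, e.g. {"Id": {"N": "1"}}.
        doc_id = "|".join(
            str(next(iter(typed.values())))
            for _, typed in sorted(change["Keys"].items())
        )
        if record["eventName"] == "REMOVE":
            actions.append(("delete", doc_id, None))
        else:  # INSERT or MODIFY
            # Strip the type tags ({"S": "Jim"} -> "Jim"); nested types stay as-is.
            document = {
                name: next(iter(typed.values()))
                for name, typed in change.get("NewImage", {}).items()
            }
            actions.append(("index", doc_id, document))
    return actions


sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"Id": {"N": "1"}},
                      "NewImage": {"Id": {"N": "1"}, "Name": {"S": "Jim"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"Id": {"N": "2"}}}},
    ]
}
print(to_index_actions(sample_event))
# [('index', '1', {'Id': '1', 'Name': 'Jim'}), ('delete', '2', None)]
```

The resulting actions could then be sent to a search cluster in bulk, so that searches never have to touch the table with Scan.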

Q&A