Inside and Outside DynamoDB
정민영
AWSKRUG
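Outside DynamoDB
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
DynamoDB takes away the administrative burden of operating and scaling a distributed database, so you don't have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.
DynamoDB automatically spreads the data and traffic of a table over a sufficient number of servers to handle throughput and storage requirements while maintaining consistent, fast performance. All data is stored on SSDs (solid-state disks) and automatically replicated across multiple Availability Zones in an AWS Region, which provides built-in high availability and data durability.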
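DDB: Too Good Not to Use
• Fully managed,
• performance you can predict and design for,
• guaranteed consistent performance,
• and high availability and durability on top of that?
Life can't be that easy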
The Most Common Questions about DynamoDB
Why don't I get the performance I'm paying for?
Why am I being throttled when I still have throughput to spare?
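Understanding DDB's characteristics correctly?
• DDB has its own distinctive features and constraints at the same time
• Above all, it focuses on delivering consistent performance (= latency)
• ex) It rejects requests (throttling) rather than letting latency degrade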
DDB's biggest strength, and the root of DDB's features and constraints:
Delivering consistent performance
??
Inside DDB: how it delivers consistent performance
Throughput
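• Set as much throughput as you want for a table's reads (RCU) and writes (WCU)
• RCU and WCU are set independently of each other
• 1 WCU = one write per second of an item up to 1KB
• 1 RCU = one read per second of an item up to 4KB **
• DDB offers two consistency options for reads.
• Strongly consistent vs. eventually consistent
• For items up to 4KB, 1 RCU provides 1 strongly consistent read/sec or 2 eventually consistent reads/sec
• ex) RCU needed for 100 reads/sec of a 6KB item, strongly vs. eventually consistent:
• Strongly: Ceiling(6KB/4KB) * 100 = 2 * 1 * 100 = 200
• Eventually: Ceiling(6KB/4KB) / 2 * 100 = 2 / 2 * 100 = 100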
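The RCU/WCU arithmetic above can be captured in a small Python helper. This is a sketch that follows the rules on this slide (writes billed in 1KB steps, reads in 4KB steps, eventually consistent reads at half cost); `required_rcu` and `required_wcu` are hypothetical names, not part of any AWS SDK.

```python
import math


def required_rcu(item_size_kb: float, reads_per_sec: int, strongly_consistent: bool) -> int:
    """RCU needed for reads of items of a given size (4KB per read unit)."""
    units_per_read = math.ceil(item_size_kb / 4)  # each read is billed in 4KB steps
    total = units_per_read * reads_per_sec
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half
    return math.ceil(total)


def required_wcu(item_size_kb: float, writes_per_sec: int) -> int:
    """WCU needed for writes: each write is billed in 1KB steps."""
    return math.ceil(item_size_kb / 1) * writes_per_sec


print(required_rcu(6, 100, strongly_consistent=True))   # 200
print(required_rcu(6, 100, strongly_consistent=False))  # 100
print(required_wcu(6, 100))                             # 600
```

Writes are far more expensive per KB than reads, so for write-heavy tables item size matters even more than it does in the read example.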
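Partitioning
• To deliver consistent performance, DDB distributes a single table across multiple partitions
• The number of partitions is determined by throughput and table size
• Which partition an item is stored in is determined by its hash key
• Formula for the number of partitions:
• MAX(CEILING(RCU/3000 + WCU/1000), CEILING(Table Size(GB)/10GB))
• ex) MAX(CEILING(5000/3000 + 500/1000), CEILING(8/10)) = MAX(CEILING(2.17), CEILING(0.8)) = MAX(3, 1) = 3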
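A minimal sketch of the partition formula above, with the per-partition figures (3000 RCU, 1000 WCU, 10GB) taken from the slide rather than queried from DynamoDB; `estimated_partitions` is a hypothetical helper. Because rounding up is monotonic, rounding each term before the MAX gives the same result as rounding the larger term.

```python
import math


def estimated_partitions(rcu: int, wcu: int, table_size_gb: float) -> int:
    """Partition count from the slide's formula (3000 RCU / 1000 WCU / 10GB per partition)."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(table_size_gb / 10)
    # A table always has at least one partition.
    return max(1, by_throughput, by_size)


print(estimated_partitions(5000, 500, 8))   # 3 (throughput-bound: ceil(2.17) = 3)
print(estimated_partitions(1000, 100, 50))  # 5 (size-bound: ceil(50 / 10) = 5)
```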
Hash table
• Hash key uniquely identifies an item
• Hash key is used for building an unordered hash index
• Table can be partitioned for scale
[Figure: key space 00 to FF split into three partitions (00-54, 55-A9, AA-FF). Id = 1, Name = Jim: Hash(1) = 7B; Id = 2, Name = Andy, Dept = Engg: Hash(2) = 48; Id = 3, Name = Kim, Dept = Ops: Hash(3) = CD]
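The figure's idea can be sketched as follows, assuming the three key ranges read off the figure (00-54, 55-A9, AA-FF) and using the first byte of an MD5 digest as a stand-in hash. DynamoDB's real hash function is internal and not exposed, so `key_space_position` is purely illustrative.

```python
import hashlib

# Upper bound (inclusive) of each partition's key range, as read off the figure.
PARTITION_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]


def key_space_position(hash_key: str) -> int:
    """Map a hash key to a position 0x00-0xFF (MD5 is a stand-in for DDB's internal hash)."""
    return hashlib.md5(hash_key.encode("utf-8")).digest()[0]


def partition_for_position(position: int) -> int:
    """Return the index of the partition whose key range contains the position."""
    for index, upper in enumerate(PARTITION_UPPER_BOUNDS):
        if position <= upper:
            return index
    raise ValueError("position outside the key space")


def partition_for(hash_key: str) -> int:
    return partition_for_position(key_space_position(hash_key))


# Hash values from the figure: Id 1 -> 7B, Id 2 -> 48, Id 3 -> CD.
for item_id, position in [("1", 0x7B), ("2", 0x48), ("3", 0xCD)]:
    print(item_id, partition_for_position(position))  # 1 -> 1, 2 -> 0, 3 -> 2
```

Items land in partitions purely by hash value, so adjacent IDs (1, 2, 3) can end up far apart in the key space.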
Then....
• If we use a hash key with a very uniform distribution (uniform distribution hash key)
• and provision appropriate throughput, there will be no problems at all!
DDB's Pitfalls
1. Provisioned throughput is divided evenly across all partitions
Some hash keys can be far more popular than others for reads and writes
(Hot Key)
[Figure: Example: hot keys (heat per partition over time)]
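A toy model of pitfall 1, assuming a perfectly even split of provisioned throughput and a hypothetical load in which one hot key lands on partition 1. The table as a whole uses only 2000 of its 3000 RCU, yet that one partition exceeds its 1000 RCU share and is throttled.

```python
def per_partition_capacity(table_rcu: int, partitions: int) -> float:
    """Provisioned RCU is split evenly across partitions (pitfall 1)."""
    return table_rcu / partitions


def throttled_partitions(load: dict[int, int], table_rcu: int, partitions: int) -> list[int]:
    """Return the partitions whose read load exceeds their even share of RCU."""
    limit = per_partition_capacity(table_rcu, partitions)
    return sorted(p for p, reads in load.items() if reads > limit)


# Hypothetical load (RCU consumed per partition); partition 1 holds a hot key.
load = {0: 300, 1: 1500, 2: 200}
print(sum(load.values()))                   # 2000: well under the table's 3000 RCU
print(per_partition_capacity(3000, 3))      # 1000.0
print(throttled_partitions(load, 3000, 3))  # [1]
```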
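2. Partitions, once added, never decrease
Things to weigh when scaling throughput
= RCU/WCU per partition can keep shrinking over time
The main cause of throttling even when RCU/WCU has headroom
The danger of responding by simply raising throughput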
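A simplified sketch of pitfall 2, reusing the throughput part of the partition formula and assuming the partition count after a split is exactly what the formula gives. A temporary raise to 30000 RCU leaves 11 partitions behind, so after scaling back to 3000 RCU each partition gets about 273 RCU instead of the original 1500.

```python
import math


def partitions_for_throughput(rcu: int, wcu: int) -> int:
    """Throughput part of the slide's partition formula."""
    return max(1, math.ceil(rcu / 3000 + wcu / 1000))


def after_scaling(current_partitions: int, rcu: int, wcu: int) -> int:
    """Partitions split when more are needed but are never merged back."""
    return max(current_partitions, partitions_for_throughput(rcu, wcu))


partitions = partitions_for_throughput(3000, 1000)   # 2 partitions, 1500 RCU each
partitions = after_scaling(partitions, 30000, 1000)  # temporary spike: 11 partitions
partitions = after_scaling(partitions, 3000, 1000)   # back to 3000 RCU: still 11
print(partitions, round(3000 / partitions))          # 11 273 (RCU per partition)
```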
3. LSIs consume the table's WCU
A point that is often missed when estimating average item size
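A rough sketch of pitfall 3, assuming string-only attributes. It encodes two points: attribute names count toward item size along with values, and (as this slide states) LSI writes are charged to the table's WCU, with an index entry costing two writes when its indexed key value changes (delete the old entry, put the new one). `item_size_bytes` and `write_wcu` are hypothetical helpers; real size accounting for numbers, sets, and nested types is more involved.

```python
import math

KB = 1024


def item_size_bytes(item: dict[str, str]) -> int:
    """Approximate size of a string-only item: names count as well as values."""
    return sum(len(name.encode("utf-8")) + len(value.encode("utf-8")) for name, value in item.items())


def write_wcu(item_bytes: int, lsi_entries: list[tuple[int, bool]]) -> int:
    """WCU charged to the table for one write, including LSI maintenance.

    lsi_entries: (index entry size in bytes, whether the indexed key value changed)
    for each LSI the write touches.
    """
    wcu = math.ceil(item_bytes / KB)  # the base table write
    for entry_bytes, key_changed in lsi_entries:
        writes = 2 if key_changed else 1  # changed key: delete old entry + put new one
        wcu += writes * math.ceil(entry_bytes / KB)
    return wcu


print(item_size_bytes({"Id": "1", "Name": "Jim"}))   # 10
print(write_wcu(1500, []))                           # 2: table only
print(write_wcu(1500, [(300, False), (300, True)]))  # 2 + 1 + 2 = 5
```

Long attribute names inflate every item, and every LSI multiplies the write cost, so both belong in the average item size used for WCU planning.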
But I heard that unused throughput is saved up?
Burst capacity is built-in
[Figure: capacity units (0 to 1600) over time, provisioned vs. consumed; "save up" unused capacity, then consume the saved-up capacity]
Burst capacity: 300 seconds (1200 × 300 = 360,000 CU)
Burst capacity may not be sufficient
[Figure: capacity units (0 to 1600) over time, provisioned vs. consumed vs. attempted; requests beyond the saved-up capacity are throttled]
Burst capacity: 300 seconds (1200 × 300 = 360,000 CU)
Don't completely depend on burst capacity… provision sufficient throughput
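The two charts can be approximated with a token-bucket model. This sketch assumes a single bucket holding at most 300 seconds of provisioned capacity; as the next slide points out, the real mechanism works per partition, so a hot partition can run dry even when the table as a whole has savings. `throttled_per_second` is a hypothetical helper, not DDB's internals.

```python
def throttled_per_second(provisioned: float, demand: list[float], burst_seconds: int = 300) -> list[float]:
    """Simplified token-bucket model of burst capacity.

    Unused provisioned capacity is saved up, capped at burst_seconds worth of
    provisioned throughput; demand above the provisioned rate draws on the
    savings, and whatever the savings cannot cover is throttled.
    """
    max_saved = provisioned * burst_seconds
    saved = 0.0
    throttled = []
    for wanted in demand:
        if wanted <= provisioned:
            # Under the provisioned rate: bank the unused capacity (up to the cap).
            saved = min(max_saved, saved + (provisioned - wanted))
            throttled.append(0.0)
        else:
            # Over the provisioned rate: spend savings, throttle the remainder.
            extra = min(wanted - provisioned, saved)
            saved -= extra
            throttled.append(wanted - provisioned - extra)
    return throttled


# Slide numbers: 1200 CU provisioned, 300 s of burst (at most 360,000 CU saved).
# 300 quiet seconds at 400 CU bank 240,000 CU; a 1600 CU spike then draws 400 CU/s,
# so savings last 600 s and the last 100 s of a 700 s spike are throttled.
demand = [400] * 300 + [1600] * 700
print(sum(throttled_per_second(1200, demand)))  # 40000.0
```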
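But this, too, is all per partition
Please, just patch this….
• DDB throughput is set per table*, but it is divided evenly across partitions. In practice, throughput is a per-partition setting.
• Partition status is hard to track without a separate request, which makes throughput design difficult
• Throughput can be decreased only 4 times per 24 hours
• * GSIs excluded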
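Nevertheless….
DDB has clear strengths too
• Keeping latency consistently low is a huge advantage
• You can use it without worrying about storage capacity
• Rich data type support makes complex, advanced features possible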
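Some Best Practices
Table management
• If you can clearly separate popular (hot) items from less popular (cold) items, split them into separate tables
• ex) For time-series data, split tables by year-month or year-month-day
• Older tables may see no writes or reads at all, which cuts costs
• Don't project every attribute into indexes, and prefer GSIs over LSIs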
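The year-month split can be sketched with a hypothetical naming scheme (`events_YYYY_MM`, or `events_YYYY_MM_DD` for daily tables). `table_for` picks the table for a timestamp and `monthly_tables` lists the tables a query over a date range has to touch; the prefix and format are assumptions, not a DynamoDB convention.

```python
from datetime import datetime


def table_for(timestamp: datetime, prefix: str = "events", granularity: str = "month") -> str:
    """Hypothetical naming scheme: one table per year-month (or per day)."""
    if granularity == "month":
        return f"{prefix}_{timestamp:%Y_%m}"
    if granularity == "day":
        return f"{prefix}_{timestamp:%Y_%m_%d}"
    raise ValueError(f"unsupported granularity: {granularity}")


def monthly_tables(start: datetime, end: datetime, prefix: str = "events") -> list[str]:
    """All monthly table names that a query over [start, end] must touch."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names


print(table_for(datetime(2016, 5, 17)))                      # events_2016_05
print(monthly_tables(datetime(2016, 11, 3), datetime(2017, 2, 1)))
```

With this layout only the current table needs high write throughput, while older tables can be provisioned down.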
Keep a buffer of 50% or more
https://github.com/sebdah/dynamic-dynamodb
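dynamic-dynamodb (linked above) automates this kind of capacity adjustment. The sketch below is not its code or configuration, only a hypothetical policy that keeps at least a 50% buffer over consumed capacity and honors the limit of 4 throughput decreases per 24 hours mentioned earlier.

```python
import math

BUFFER = 0.5               # keep at least 50% headroom over consumption
MAX_DECREASES_PER_DAY = 4  # throughput decrease limit per 24 hours


def target_capacity(consumed: float, buffer: float = BUFFER) -> int:
    """Capacity that leaves the requested buffer above current consumption."""
    return math.ceil(consumed * (1 + buffer))


def next_provisioned(current: int, consumed: float, decreases_today: int) -> int:
    """Hypothetical policy: scale up freely, scale down only while decreases remain."""
    target = target_capacity(consumed)
    if target > current:
        return target
    if target < current and decreases_today < MAX_DECREASES_PER_DAY:
        return target
    return current


print(next_provisioned(1000, 900, 0))  # 1350: headroom too small, scale up
print(next_provisioned(2000, 900, 4))  # 2000: no decreases left today
```

Since every increase may also add partitions for good (pitfall 2), a real policy would likely scale up in measured steps rather than chasing every spike.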
DynamoDB Streams
• In DDB, Scan is very expensive, and even Query costs relatively more than Get
• For key or content search, consider pairing DDB Streams with Elasticsearch
• Also useful in cases such as splitting tables
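A sketch of the Streams-to-Elasticsearch idea: it converts DynamoDB Streams records (the `eventName`, `Keys`, and `NewImage` fields of the stream record format) into Elasticsearch bulk API lines. It assumes the stream view type includes new images, handles only the S, N, BOOL, NULL, M, and L attribute types, and leaves out actually sending the request; the index name and the `|`-joined document ID are hypothetical choices. The action lines omit `_type`, which older Elasticsearch versions required.

```python
import json


def from_ddb(value: dict):
    """Convert one DynamoDB-typed attribute value (S, N, BOOL, NULL, M, L) to plain Python."""
    (type_tag, raw), = value.items()
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return int(raw) if raw.lstrip("-").isdigit() else float(raw)
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: from_ddb(v) for k, v in raw.items()}
    if type_tag == "L":
        return [from_ddb(v) for v in raw]
    raise ValueError(f"unsupported type: {type_tag}")


def to_bulk_lines(records: list[dict], index: str) -> list[str]:
    """Turn DynamoDB Streams records into Elasticsearch bulk API lines."""
    lines = []
    for record in records:
        keys = {k: from_ddb(v) for k, v in record["dynamodb"]["Keys"].items()}
        doc_id = "|".join(str(keys[k]) for k in sorted(keys))
        if record["eventName"] == "REMOVE":
            lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
        else:  # INSERT or MODIFY: index the new image of the item
            doc = {k: from_ddb(v) for k, v in record["dynamodb"]["NewImage"].items()}
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps(doc))
    return lines


records = [
    {"eventName": "INSERT", "dynamodb": {"Keys": {"Id": {"N": "1"}},
                                         "NewImage": {"Id": {"N": "1"}, "Name": {"S": "Jim"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"Id": {"N": "2"}}}},
]
print("\n".join(to_bulk_lines(records, "people")))
```

Joined with newlines (plus a trailing newline), these lines form a bulk request body, so search traffic goes to Elasticsearch instead of expensive Scans on the table.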
Q&A
Reference
• http://www.slideshare.net/AmazonWebServices/deep-dive-amazon-dynamodb
• http://www.slideshare.net/AmazonWebServices/deep-dive-on-amazon-dynamodb
• https://docs.aws.amazon.com/ko_kr/amazondynamodb/latest/developerguide/Introduction.html