The AWS China (Beijing) Region is operated by Sinnet. The AWS China (Ningxia) Region is operated by NWCD.
Deep Dive on Amazon ElastiCache
Best Practices and Common Use Cases
Kenny Zheng (郑进佳), Solutions Architect, Amazon Web Services
Agenda
• Amazon ElastiCache overview
• Scaling ElastiCache clusters online with zero downtime
• Amazon ElastiCache security and encryption
• Amazon ElastiCache use cases
• Best practices
Amazon ElastiCache
• In-memory key-value store, supporting:
  • Redis 3.2.10
  • Memcached 1.4.34
• High performance
• Fully managed; zero administration
• Highly available and reliable
• Hardened by Amazon
[Figure: AWS data stores compared along four axes: request rate, latency, data structure, and data volume. Stores shown: Amazon ElastiCache and Amazon DynamoDB Accelerator (DAX), Amazon DynamoDB, Amazon RDS, Amazon CloudSearch and Amazon Elasticsearch Service, HDFS, Amazon S3, Amazon Glacier.]
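Redis overview
• Powerful: ~200 commands plus Lua scripting
• In-memory data structure server
• Useful data structures: strings, lists, hashes, sets, sorted sets, bitmaps, and HyperLogLogs
• Simple
• Atomic operations; supports transactions
• Fast: most commands complete in <1 ms
• Highly available: replication
• Persistence
• Open source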
REDIS:6379> SMEMBERS features
1) "Easy to deploy & monitor"
REDIS:6379> hget feature:details "deploy-monitor"
[Figure: Amazon ElastiCache surrounded by the services used to deploy and monitor it: AWS Config, Amazon CloudWatch (alarm), AWS CloudTrail, AWS CloudFormation, AWS Management Console, AWS CLI and SDKs, Amazon SNS (notification), and AWS Lambda.]
REDIS:6379> SMEMBERS features
2) "Enhanced Redis Engine"
REDIS:6379> hget feature:details "enhancements"
Optimized swap memory
• Mitigates the risk of increased swap usage during syncs and snapshots
Dynamic write throttling
• Improved output buffer management when the node's memory is close to being exhausted
Smoother failovers
• Clusters recover faster because replicas avoid flushing their data to do a full resync with the primary
REDIS:6379> SMEMBERS ec-team:open-source:contributors
1) "Kevin McGehee"
2) "Qu Chen"
3) "Rajib Dugar"
REDIS:6379> hgetall ec-team:open-source:contributions
1) PSYNC2 (Redis 4.0)
2) https://raw.githubusercontent.com/antirez/redis/4.0/00-RELEASENOTES
3) BGSAVE (Redis 3.2)
4) https://raw.githubusercontent.com/antirez/redis/3.2/00-RELEASENOTES
5) MIGRATE (Redis 3.0)
6) https://raw.githubusercontent.com/antirez/redis/3.0/00-RELEASENOTES
7) MASTER TIMEOUT (Redis 2.8)
8) https://raw.githubusercontent.com/antirez/redis/2.8/00-RELEASENOTES
9) INCREASE 2Billion+ KEYS in a DATASET (Redis 2.8)
10) https://raw.githubusercontent.com/antirez/redis/2.8/00-RELEASENOTES (from https://github.com/antirez/redis/issues/1814)
11) also: 4114, 4250, 3926, 3899
Redis topologies
Cluster mode disabled (scale up)
• Primary endpoint
• 1 primary with 0–5 replicas
• Keyspace: slots 0 through 16383 on a single shard
• Max storage: 407 GiB
Cluster mode enabled (scale out)
• Configuration endpoint
• 1–15 primaries/shards, each with 0–5 replicas
• Keyspace split across shards, for example slots 0–5461, 5462–10922, and 10923–16383
• Max storage: 6+ TiB
Redis: cluster mode enabled vs. disabled
Failover
• Enabled: 15–30 sec (non-DNS)
• Disabled: ~1.5 min (DNS-based)
Failover risk
• Enabled: writes affected on part of the dataset (less risk with more partitions); reads available
• Disabled: writes affected on the entire dataset; reads available
Performance
• Enabled: scales with cluster size (up to 90 nodes: 15 primaries, each with 0–5 replicas)
• Disabled: 6 nodes (1 primary + 0–5 replicas)
Max connections
• Enabled: primaries 65,000 x 15 = 975,000; replicas 65,000 x 75 = 4,875,000
• Disabled: primary 65,000; replicas 65,000 x 5 = 325,000
Storage
• Enabled: 6+ TiB
• Disabled: 407 GB
Cost (example: 175 GB needed)
• Enabled: smaller nodes but more $$: 9 x cache.r3.xlarge ($0.455/hr) = $4.095/hr, 255.6 GB
• Disabled: larger nodes, less $: 1 x cache.r3.8xlarge = $3.640/hr, 237 GB
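The cost and connection figures in the comparison are simple products, and a short sketch makes them easy to re-run with other node counts. This is an illustration only: the prices and memory sizes are the ones quoted on the slide, the per-node memory of 28.4 GB is derived here from 255.6 GB / 9, and the function names are hypothetical.

```python
from decimal import Decimal

# Figures quoted on the slide; 28.4 GB per node is derived from 255.6 GB / 9 (assumption).
SMALL_NODES = {"nodes": 9, "price_per_hour": Decimal("0.455"), "memory_gb": Decimal("28.4")}
LARGE_NODE = {"nodes": 1, "price_per_hour": Decimal("3.640"), "memory_gb": Decimal("237")}

MAX_CLIENTS_PER_NODE = 65_000  # maxclients limit quoted in the deck


def totals(option):
    """Return (hourly cost in USD, total memory in GB) for a node option."""
    return (option["nodes"] * option["price_per_hour"],
            option["nodes"] * option["memory_gb"])


def max_connections(primaries, replicas_per_primary):
    """Return (connections to primaries, connections to replicas)."""
    replicas = primaries * replicas_per_primary
    return (primaries * MAX_CLIENTS_PER_NODE, replicas * MAX_CLIENTS_PER_NODE)


if __name__ == "__main__":
    print("9 x r3.xlarge :", totals(SMALL_NODES))
    print("1 x r3.8xlarge:", totals(LARGE_NODE))
    print("cluster mode enabled :", max_connections(15, 5))
    print("cluster mode disabled:", max_connections(1, 5))
```

Decimal is used so that the dollar amounts match the slide exactly instead of picking up floating-point noise.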
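Redis Cluster: automatic client-side sharding
• 16,384 hash slots per cluster
• A key's slot is CRC16(key) mod 16384
• Slots are distributed across the shards of the cluster
• Developers need to use a Redis cluster-aware client
• Clients are redirected to the shard that owns the slot
• Smart clients store the slot-to-shard map
[Figure: a client connected to shards S1 through S5.]
Shard S1 = slots 0–3276
Shard S2 = slots 3277–6553
Shard S3 = slots 6554–9829
Shard S4 = slots 9830–13106
Shard S5 = slots 13107–16383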
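To make the slot rule concrete, here is a minimal sketch of how a cluster-aware client could map a key to a slot and then to one of the five shards listed on the slide. It assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0), which Python exposes as binascii.crc_hqx, and it includes the hash-tag rule from the Redis Cluster specification, which the slide does not mention. The function names are illustrative, not part of any client library.

```python
import binascii

TOTAL_SLOTS = 16384

# Inclusive slot ranges per shard, copied from the slide.
SHARD_RANGES = {
    "S1": (0, 3276),
    "S2": (3277, 6553),
    "S3": (6554, 9829),
    "S4": (9830, 13106),
    "S5": (13107, 16383),
}


def key_slot(key: str) -> int:
    """Return CRC16(key) mod 16384, hashing only the {hash tag} if one is present."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # a closing brace exists and the tag is non-empty
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % TOTAL_SLOTS


def shard_for(key: str) -> str:
    """Look up which shard owns the key's slot."""
    slot = key_slot(key)
    for name, (low, high) in SHARD_RANGES.items():
        if low <= slot <= high:
            return name
    raise ValueError(f"slot {slot} is not covered by any shard")


if __name__ == "__main__":
    for key in ("foo", "{user1000}.following", "{user1000}.followers"):
        print(key, key_slot(key), shard_for(key))
```

Keys that share a hash tag land in the same slot, which is what makes multi-key operations possible in cluster mode.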
Redis Cluster: architecture (Multi-AZ)
• A cluster can contain 1 to 15 shards
• Example: a cluster with 3 shards and 2 read replicas per shard
[Figure: a Redis cluster spanning Availability Zones A, B, and C. Each zone hosts one node for each slot range: 0–5454, 5455–10909, and 10910–16363.]
Redis Cluster: architecture
• Each shard contains one primary node and up to five replica nodes
[Figure, shown in three builds: the same cluster across Availability Zones A, B, and C, with each build highlighting one shard (slots 0–5454, 5455–10909, or 10910–16363) and its one primary and two replicas.]
Scenario 1: a single primary node fails
[Figure: the three-shard cluster across Availability Zones A, B, and C, with one primary node failed.]
Mitigation:
1. Automatic failure detection and replica promotion (~15–30 s)
2. Repair of the failed node
Scenario 2: a majority of primary nodes fail
[Figure: the three-shard cluster across Availability Zones A, B, and C, with most primary nodes failed.]
Mitigation (Redis enhancements in ElastiCache):
• Automatic failure detection and replica promotion
• Repair of the failed nodes
Scaling out with backup and restore
Step 1: aws elasticache create-snapshot --replication-group-id redisclusterID --snapshot-name sname
Step 2: aws elasticache copy-snapshot --source-snapshot-name sname --target-snapshot-name sname --target-bucket s3bucketname
Step 3: aws elasticache create-replication-group --replication-group-id NewRedisClusterID … --snapshot-arns arn:aws:s3:::bucketname/redisbackup-0001.rdb, etc.
Step 4: Once the new cluster is up, update your app with the new Amazon ElastiCache endpoint, then terminate the old cluster.
[Figure: a 3-shard cluster is snapshotted to an rdb file, which seeds a new 5-shard cluster.]
Downtime: new writes made after the snapshot are not in the snapshot.
Pro tip: as a DR strategy, enable CRR on the Amazon S3 bucket and have it trigger an AWS Lambda function that hydrates the destination cluster.
Online resharding: zero downtime
Simple API to scale in or out:
aws elasticache modify-replication-group-shard-configuration --replication-group-id rep-group-id --apply-immediately --node-group-count 5
[Figure: three shards. Shard 1: slots 0–5461. Shard 2: slots 5462–10922. Shard 3: slots 10923–16383.]
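The three starting ranges are what an even split of 16,384 slots looks like. The sketch below is one way to compute such a split for any shard count; the function name is hypothetical, and handing leftover slots to the first shards is an assumption that happens to reproduce the 3-shard layout. It is not a description of how ElastiCache itself assigns slots.

```python
def even_slot_ranges(shards, total_slots=16384):
    """Split total_slots into contiguous, near-equal inclusive ranges."""
    base, extra = divmod(total_slots, shards)
    ranges, start = [], 0
    for index in range(shards):
        # The first `extra` shards take one additional slot each.
        size = base + (1 if index < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    for count in (3, 5):
        print(count, "shards:", even_slot_ranges(count))
```

An online reshard does not have to produce contiguous ranges like these, because moving fewer slots is cheaper than rebuilding a tidy layout.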
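Online resharding with zero downtime: scaling out
• No application interruption: reads and writes continue during the reshard
• Slots are evenly distributed across shards
Before (3 shards):
Shard 1: 0–5461
Shard 2: 5462–10922
Shard 3: 10923–16383
After (5 shards):
Shard 1: 0–2909, 5095–5461
Shard 2: 5462–5783, 6876–9830
Shard 3: 10923–14199
Shard 4: 2910–5094, 9831–10922
Shard 5: 5784–6875, 14200–16383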
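A quick check confirms that the scaled-out layout on the slide is a valid, balanced partition of the keyspace. The sketch below counts the slots per shard and verifies that the ranges cover 0 through 16383 with no gaps or overlaps. The assignment of the last two range pairs to Shard 4 and Shard 5 is read off the slide's layout and is an assumption; the helper names are illustrative.

```python
# Inclusive slot ranges after scaling from 3 to 5 shards, as shown on the slide.
# Which pair belongs to Shard 4 versus Shard 5 is assumed from the slide layout.
AFTER_SCALE_OUT = {
    "Shard 1": [(0, 2909), (5095, 5461)],
    "Shard 2": [(5462, 5783), (6876, 9830)],
    "Shard 3": [(10923, 14199)],
    "Shard 4": [(2910, 5094), (9831, 10922)],
    "Shard 5": [(5784, 6875), (14200, 16383)],
}


def slot_counts(layout):
    """Return the number of slots owned by each shard."""
    return {shard: sum(high - low + 1 for low, high in ranges)
            for shard, ranges in layout.items()}


def covers_all_slots(layout, total_slots=16384):
    """True if the ranges tile 0..total_slots-1 exactly once."""
    expected = 0
    for low, high in sorted(r for ranges in layout.values() for r in ranges):
        if low != expected:
            return False  # gap or overlap
        expected = high + 1
    return expected == total_slots


if __name__ == "__main__":
    print(slot_counts(AFTER_SCALE_OUT))
    print("complete:", covers_all_slots(AFTER_SCALE_OUT))
```

Each original shard keeps most of its slots and only gives away slices, which is why the resulting ranges are non-contiguous.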
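Online resharding with zero downtime: scaling in
• No application interruption: reads and writes continue during the reshard
• Slots are evenly distributed across shards
[Figure: Shards 4 and 5 are removed and their slots return to the remaining shards. Shard 1: 0–5461. Shard 2: 5462–10922. Shard 3: 10923–16383.]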
Online resharding triggered by a CloudWatch alarm
[Figure: high memory usage on a 3-shard cluster raises an Amazon CloudWatch alarm. Amazon SNS and AWS Lambda are involved in the response, and the cluster is resized to 5 shards ("Cluster Resized").]
AWS Lambda function (excerpt):
…
var params = {
  ApplyImmediately: true,
  NodeGroupCount: 5,
  ReplicationGroupId: 'rep-group-id',
  … }
elasticache.modifyReplicationGroupShardConfiguration(params, function(err, data) {
  if (err) console.log(err, err.stack);
  else console.log(data);
}); …
[Figure, shown in three builds: clients send reads/writes, reads, and search requests to a cache cluster spanning AZ1 and AZ2, which sits in front of relational data.
1. Normal state
2. Heavy load
3. Normal state after automatic scale-out]
ElastiCache Redis reference architecture
REDIS:6379> hget feature:details "ref-arch"
[Figure, built up over five slides:
• Three Availability Zones (A, B, and C)
• A private subnet in each Availability Zone
• A security group in each private subnet
• An ElastiCache Redis cluster across the private subnets, with Redis RDB snapshots stored in an Amazon S3 bucket and encryption at rest
• A public subnet with its own security group in each Availability Zone, plus encryption in transit and Redis AUTH (Redis 3.2.6)]
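Amazon ElastiCache encryption and compliance
Encryption
• In transit: encrypts all communication between clients and the Redis server, including communication between nodes
• At rest: encrypts data stored on disk and backups stored in Amazon S3
• Fully managed: set up through the API or the console, with automatic issuance and renewal
Compliance
• HIPAA eligibility for ElastiCache for Redis
• Included in the AWS Business Associate Addendum
• Redis 3.2.6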
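Use cases
• Session management
• Database caching
• APIs (HTTP responses)
• IoT
• Streaming data analytics (filtering/aggregation)
• Pub/sub
• Social media (sentiment analysis)
• Standalone database (metadata store)
• Leaderboards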
Caching
[Figure: clients reach Amazon EC2 through Elastic Load Balancing. The EC2 tier reads from and writes to Amazon ElastiCache for Redis, which caches data from Amazon RDS (relational data), Amazon DynamoDB (unstructured data), and Amazon S3 (object data). Arrow labels: write-through, reads/writes, DDB streams, mysql.lambda_async.]
Caching NoSQL databases
[Figure: clients call Amazon EC2, which sends reads/writes and reads to a MongoDB cluster, a Cassandra cluster, and an Elasticsearch cluster.]
• Smaller NoSQL DB clusters needed = lower cost
• Faster data retrieval = better performance
Caching NoSQL databases with Amazon ElastiCache
MongoDB cluster (Amazon EC2 sends reads/writes to MongoDB and reads to Amazon ElastiCache for Redis):
DBObject doc = collection.findOne();
• Cache the serialized DBObject in Redis (good)
• Cache rows in a Redis hash (faster/more efficient)
Cassandra cluster (Amazon EC2 sends reads/writes to Cassandra and reads to Amazon ElastiCache for Redis):
ResultSet rs = session.execute(stmt);
• Cache the serialized ResultSet in Redis (good)
• Cache rows in a Redis hash (faster/more efficient)
Smaller NoSQL DB clusters needed = lower costs
Faster data retrieval = better performance
Processing streaming data
[Figure: a data source feeds a raw stream in Amazon Kinesis Streams. AWS Lambda function 1 continuously filters and processes the data into a cleansed stream in Amazon Kinesis Streams, which Amazon Kinesis Analytics consumes. AWS Lambda function 2 subscribes and uses Amazon ElastiCache (Redis) for real-time pub/sub.]
Big data frameworks that use Redis
[Figure: a pipeline in four stages: Collect, Store, Process, Analyze. Components shown: data sources, Amazon Kinesis, Apache Kafka, AWS IoT, Amazon EC2, Amazon Kinesis app, AWS Lambda, Apache Storm on EMR, Spark Streaming on Amazon EMR, Spark on Amazon EMR, custom app, Amazon ElastiCache, and Amazon S3.]
Mobile social applications
[Figure: Amazon API Gateway and AWS Lambda in front of Amazon ElastiCache for Redis, alongside Amazon DynamoDB (DDB streams) and Amazon EC2.]
• Update points of interest: GEOADD
• Search points of interest: GEORADIUS
https://aws.amazon.com/blogs/database/amazon-elasticache-utilizing-redis-geospatial-capabilities/
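To show what the two geospatial commands on this slide do, here is a small pure-Python stand-in that stores named points and returns those within a radius, nearest first. It is not a Redis client and does not reproduce Redis's geohash encoding: the class and method names are hypothetical, the distance uses the haversine formula with a mean Earth radius of 6371 km, and the coordinates are made-up sample points.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class GeoIndex:
    """In-memory stand-in for a Redis geospatial key."""

    def __init__(self):
        self._points = {}

    def geoadd(self, lon, lat, member):
        """Add or update a point of interest (like GEOADD)."""
        self._points[member] = (lon, lat)

    def georadius(self, lon, lat, radius_km):
        """Return members within radius_km, nearest first (like GEORADIUS ... ASC)."""
        hits = [(haversine_km(lon, lat, p_lon, p_lat), member)
                for member, (p_lon, p_lat) in self._points.items()]
        return [member for distance, member in sorted(hits) if distance <= radius_km]


if __name__ == "__main__":
    index = GeoIndex()
    index.geoadd(116.40, 39.90, "beijing_cafe")    # sample data
    index.geoadd(121.47, 31.23, "shanghai_cafe")   # sample data
    print(index.georadius(116.39, 39.91, 10))
```

This linear scan is only for illustration; the point of using Redis here is that it answers the same query from an index kept in memory.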
Digital advertising
[Figure: advertisers (clients), an ad network, and publishers (websites/apps) with ad slots shown to the consumer. Amazon ElastiCache for Redis holds clickstream data (shopping events) and answers within <40 ms.]
1. User visits the page
2. Publisher places the ad slot up for auction
3. Ad network calls for bids
4. Bidders respond with bids
5. The winning bid's ad is displayed
https://aws.amazon.com/caching/database-caching/
Gaming: real-time leaderboards
• Popular with games that need uniqueness and ordering
• Very easy with Redis sorted sets
ZADD "leaderboard" 1201 "Gollum"
ZADD "leaderboard" 963 "Sauron"
ZADD "leaderboard" 1092 "Bilbo"
ZADD "leaderboard" 1383 "Frodo"
ZREVRANGE "leaderboard" 0 -1
1) "Frodo"
2) "Gollum"
3) "Bilbo"
4) "Sauron"
ZREVRANK "leaderboard" "Sauron"
(integer) 3
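The behavior of the three sorted-set commands above can be reproduced with a few lines of plain Python, which may help when reasoning about ranks without a Redis server at hand. This is a simplified stand-in, not a Redis client: the class is hypothetical, it re-sorts on every read, and it only handles a non-negative start index.

```python
class Leaderboard:
    """In-memory stand-in for a Redis sorted set holding unique members."""

    def __init__(self):
        self._scores = {}

    def zadd(self, score, member):
        """Insert the member or overwrite its score, keeping members unique."""
        self._scores[member] = score

    def _ranked(self):
        # Highest score first; ties broken by member name, also descending.
        return sorted(self._scores, key=lambda m: (self._scores[m], m), reverse=True)

    def zrevrange(self, start, stop):
        """Members from rank start to stop inclusive; a negative stop counts from the end."""
        ranked = self._ranked()
        if stop < 0:
            stop += len(ranked)
        return ranked[start:stop + 1]

    def zrevrank(self, member):
        """Zero-based rank with the highest score at 0, or None if absent."""
        ranked = self._ranked()
        return ranked.index(member) if member in self._scores else None


if __name__ == "__main__":
    board = Leaderboard()
    for score, name in [(1201, "Gollum"), (963, "Sauron"), (1092, "Bilbo"), (1383, "Frodo")]:
        board.zadd(score, name)
    print(board.zrevrange(0, -1))
    print(board.zrevrank("Sauron"))
```

Redis keeps the set ordered as scores change, so the real commands avoid the full sort this sketch performs.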
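Cluster sizing best practices
• Storage: the cluster needs enough memory
  • Recommended: required memory + 25% reserved memory (Redis) + room for growth (optional, 10%)
  • Optimize with eviction policies and TTLs
  • Use CloudWatch alarms to scale the cluster up or out before it reaches maximum memory
  • Use memory-optimized nodes for cost efficiency (R4 is supported)
• Performance: do not sacrifice performance
  • Establish a performance baseline with the Redis benchmark tool
  • More read IOPS: add replicas
  • More write IOPS: add shards (scale out)
  • More network I/O: use network-optimized instances and scale out
  • Use pipelining to batch reads and writes
  • Consider the Big O time complexity of data structure commands
• Cluster isolation (applications sharing a keyspace): choose the right strategy for your application
  • Identify which type of isolation best suits your workload and environment
  • Isolation: none $ | by application type $$ | full $$$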
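The storage recommendation is easy to turn into a worked calculation. The sketch below assumes the two percentages are added on top of the required memory (so 175 GB becomes 175 x 1.35), which is one reading of the slide, and it reuses the node sizes from the earlier cost example (28.4 GB derived from 255.6 GB / 9, and 237 GB). The function names are hypothetical.

```python
import math


def recommended_memory_gb(required_gb, reserved_pct=25, growth_pct=10):
    """Required memory plus reserved memory and optional growth headroom."""
    return required_gb * (100 + reserved_pct + growth_pct) / 100


def nodes_needed(total_gb, node_memory_gb):
    """Smallest number of equal-sized nodes whose memory covers total_gb."""
    return math.ceil(total_gb / node_memory_gb)


if __name__ == "__main__":
    total = recommended_memory_gb(175)
    print("provision at least", total, "GB")
    print("28.4 GB nodes:", nodes_needed(total, 28.4))
    print("237 GB nodes :", nodes_needed(total, 237))
```

Under these assumptions the 175 GB example needs 236.25 GB, which is consistent with the 9 x 28.4 GB and 1 x 237 GB options in the cost comparison.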
Redis benchmark tool: an open-source tool for establishing a performance baseline
Example:
src/redis-benchmark -h r3-xlarge-perf.foio87.0001.use1.cache.amazonaws.com -p 6379 -n 150000 -d 100
Syntax:
redis-benchmark -h <host> -p <port> -c 50 -n 1000 -d 500 -q
-c <clients>: number of parallel connections (default 50)
-n <requests>: total number of requests (default 100000)
-d <size>: data size of GET and SET values, in bytes
-t <test1,test2>: comma-separated list of tests to run
-q: quiet operation; displays only the result
Redis maxmemory policy: choose one based on your application's workload
• noeviction: return errors when the memory limit has been reached and the client tries to execute commands that could use more memory
• allkeys-lru: evict the least recently used (LRU) keys first
• volatile-lru: evict the least recently used (LRU) keys first, but only among keys that have an expire set
• allkeys-random: evict random keys to make space for new data
• volatile-random: evict random keys to make space for new data, but only among keys that have an expire set
• volatile-ttl: evict only keys that have an expire set, starting with those that have the shortest time to live (TTL)
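As an illustration of what allkeys-lru means in practice, here is a tiny exact LRU cache. Two simplifications are assumed and worth stating: the limit here is a number of keys rather than bytes of memory, and eviction is exact, whereas Redis approximates LRU by sampling keys. The class name is hypothetical.

```python
from collections import OrderedDict


class LruCache:
    """Exact LRU cache bounded by key count (a simplification of allkeys-lru)."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self._data = OrderedDict()  # oldest access first, newest last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # reading a key makes it most recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        while len(self._data) > self.max_keys:
            self._data.popitem(last=False)  # evict the least recently used key


if __name__ == "__main__":
    cache = LruCache(max_keys=2)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")       # "a" is now more recent than "b"
    cache.set("c", 3)    # evicts "b"
    print(cache.get("a"), cache.get("b"), cache.get("c"))
```

A volatile-lru variant would apply the same ordering but only consider keys that carry an expiry.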
Key ElastiCache CloudWatch metrics
• CPUUtilization
  • Memcached: up to 90% is OK
  • Redis: divide by the number of cores (e.g., 90% / 4 = 22.5%)
• SwapUsage: low
• CacheMisses/CacheHits ratio: low and stable
• Evictions: near zero
  • Exception: Russian-doll caching
• CurrConnections: stable
• Set CloudWatch alarms on these metrics
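Two of these metrics involve a small calculation before they can be turned into alarm thresholds. The sketch below scales a host-level CPU threshold by the core count, as in the 90% / 4 example, and computes a miss rate from hit and miss counts. Expressing the ratio as misses divided by total lookups is an assumption made here for readability; the function names are hypothetical.

```python
def redis_cpu_threshold(host_threshold_pct, cores):
    """Redis is single-threaded, so scale the host CPU threshold by core count."""
    return host_threshold_pct / cores


def miss_rate(cache_hits, cache_misses):
    """Fraction of lookups that missed the cache (0.0 when there were no lookups)."""
    total = cache_hits + cache_misses
    return cache_misses / total if total else 0.0


if __name__ == "__main__":
    print("CPU alarm threshold on a 4-core node:", redis_cpu_threshold(90, 4), "%")
    print("miss rate:", miss_rate(cache_hits=90, cache_misses=10))
```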
ElastiCache modifiable parameters
• maxclients: 65000 (unchangeable)
  • Use connection pooling
  • timeout: closes a connection after it has been idle for a given interval
  • tcp-keepalive: detects dead peers at a given interval
• databases: 16 (default) for non-clustered mode
  • Logical partition
• reserved-memory: 25% (default)
  • Recommended: 50% of maxmemory before 2.8.22; 25% from 2.8.22 onward (ElastiCache)
• maxmemory-policy
  • The eviction policy for keys when maximum memory usage is reached
  • Possible values: volatile-lru, allkeys-lru, volatile-random, allkeys-random, volatile-ttl, noeviction
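Caching tips
• Understand how often the underlying data changes
• Set a TTL on each key that matches that frequency
• Choose an eviction policy that fits your application's needs
• Isolate clusters by purpose (for example: cache cluster, queue, standalone database)
• Keep the cache up to date with write-through
• Performance-test to right-size the cluster
• Monitor the hit/miss ratio and alarm on it
• Use the failover API to test application resilience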
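Several of these tips (per-key TTLs, write-through updates, hit/miss monitoring) fit together in a few lines. The sketch below is a minimal in-memory illustration under stated assumptions: plain dictionaries stand in for both the database and the cache, the clock is injectable so expiry can be exercised without waiting, and the class name is hypothetical.

```python
import time


class WriteThroughCache:
    """Write-through cache with a TTL per entry and hit/miss counters."""

    def __init__(self, store, ttl_seconds, clock=time.monotonic):
        self._store = store        # dict standing in for the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}           # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def write(self, key, value):
        """Write to the database and refresh the cache in the same call."""
        self._store[key] = value
        self._cache[key] = (value, self._clock() + self._ttl)

    def read(self, key):
        """Serve from the cache while the entry is fresh, else reload from the store."""
        entry = self._cache.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self._store.get(key)
        if value is not None:
            self._cache[key] = (value, self._clock() + self._ttl)
        return value


if __name__ == "__main__":
    cache = WriteThroughCache(store={}, ttl_seconds=60)
    cache.write("user:1", "Frodo")
    print(cache.read("user:1"), cache.hits, cache.misses)
```

The TTL bounds how stale an entry can get if the database is changed by a path that bypasses the cache, which is why it should track how often the data changes.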