Digging Cassandra Cluster

Page 1: Digging Cassandra Cluster

DIGGING CASSANDRA CLUSTER

Ivan Burmistrov

Page 2: Digging Cassandra Cluster

WHO AM I?

Ivan Burmistrov
• Tech Lead at SKB Kontur
• 5+ years of Cassandra experience (since Cassandra 0.7)

@isburmistrov

https://www.linkedin.com/in/isburmistrov/en

Page 3: Digging Cassandra Cluster

SKB KONTUR

• Services for businesses
• B2B: e-invoicing
• B2G: e-reporting of tax returns to the government

Page 4: Digging Cassandra Cluster

RETAIL

Page 6: Digging Cassandra Cluster

REQUIREMENTS

• 24 x 7 x 365
• Guaranteed delivery
• Delivery time <= 1 minute

Page 7: Digging Cassandra Cluster

When Cassandra just works

Page 10: Digging Cassandra Cluster

SMART GUY

Page 11: Digging Cassandra Cluster

THE PROBLEM

• 150+ different tables in the cluster (Cassandra 1.2)
• Client read latency (99th percentile): 100 ms – 2.0 s
• Almost all tables were affected
• CPU: 40% – 80%
• Disk: not a problem

Page 13: Digging Cassandra Cluster

HYPOTHESIS 1: ANOMALIES IN METRICS

• ReadLatency.99thPercentile: the node's latency of processing a read request
• ReadLatency.OneMinuteRate: the node's read requests per second
• SSTablesPerReadHistogram: how many SSTables the node reads per read request
• Tables were pretty similar in these metrics
• Which values are good and which are bad?

Page 15: Digging Cassandra Cluster

HYPOTHESIS 2: COMPACTION

• Decrease/increase compaction throughput
• Change compaction strategy
• Nothing changed

Page 17: Digging Cassandra Cluster

HYPOTHESIS 3: GC

• ParNew GC: 6 seconds per minute (10%!)
• Read good articles about Cassandra and GC:
  • http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads
  • http://aryanet.com/blog/cassandra-garbage-collector-tuning
• Tried to tune
• Nothing changed

Page 18: Digging Cassandra Cluster

Java Mission Control and Java Flight Recorder

• Built-in profiling tools, available since Oracle JDK 7 Update 40
• Low performance overhead: 1-2%
• Useful for CPU profiling: hot threads, hot methods, call stacks, …
• Profiling results: 70% of time spent in SSTableReader

Page 19: Digging Cassandra Cluster

Which tables cause the most SSTable reads?

• SSTablesPerReadHistogram did not help
• We needed another metric
• SSTablesPerSecond: how many SSTables each table reads per second

SSTablesPerSecond = SSTablesPerReadHistogram.Mean * ReadLatency.OneMinuteRate
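The derived metric is easy to compute once both source metrics have been collected per table. The sketch below is only an illustration: the numbers are made up, `quiet_table` is a hypothetical table name, and how the values are pulled from the nodes (JMX, a metrics reporter) is left out.

```python
from typing import Dict, List, NamedTuple, Tuple


class TableMetrics(NamedTuple):
    sstables_per_read_mean: float  # SSTablesPerReadHistogram.Mean
    reads_per_second: float        # ReadLatency.OneMinuteRate


def sstables_per_second(m: TableMetrics) -> float:
    """SSTablesPerSecond = SSTablesPerReadHistogram.Mean * ReadLatency.OneMinuteRate."""
    return m.sstables_per_read_mean * m.reads_per_second


def rank_tables(metrics: Dict[str, TableMetrics]) -> List[Tuple[str, float]]:
    """Return (table, SSTablesPerSecond) pairs, heaviest table first."""
    scored = [(name, sstables_per_second(m)) for name, m in metrics.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up sample values, for illustration only.
sample = {
    "quiet_table": TableMetrics(1.2, 20.0),
    "activity_records": TableMetrics(10.0, 150.0),
    "users_lastaction": TableMetrics(6.0, 400.0),
}

for name, score in rank_tables(sample):
    print(f"{name}: {score:.0f} SSTables/s")
```

Sorting by the product rather than by either factor alone is the point: a table with a modest SSTables-per-read value can still dominate if it is read often enough.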

Page 20: Digging Cassandra Cluster

SSTablesPerSecond

Page 21: Digging Cassandra Cluster

SSTablesPerSecond: results

• 7 leading tables = only 7 candidates for deep investigation
• Large difference between the leaders and the others
• Almost all leaders were surprises
• 3 types of problems

Page 22: Digging Cassandra Cluster

Problem 1: Invalid timestamp usage

CREATE TABLE users_lastaction (
    user_id uuid,
    subsystem text,
    last_action_time timestamp,
    PRIMARY KEY (user_id)
);

subsystem: 'API', 'WebApplication', …

Page 23: Digging Cassandra Cluster

Problem 1: Invalid timestamp usage

First subsystem:

INSERT INTO users_lastaction (user_id, subsystem, last_action_time)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'API', '2011-02-03T04:05:00');

Second subsystem:

INSERT INTO users_lastaction (user_id, subsystem, last_action_time)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'WebApp', '2011-02-08T07:05:00')
USING TIMESTAMP 635774040762020710;

Time in ticks: 10000 ticks = 1 millisecond. The first INSERT has no USING TIMESTAMP, so it gets Cassandra's default write timestamp in microseconds since the Unix epoch.
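To see how far apart the two timestamp scales are, the tick value from the slide can be converted to the microsecond scale Cassandra uses by default. This is a sketch under one loud assumption: that the ticks are .NET-style 100 ns intervals counted from 0001-01-01, which matches "10000 ticks = 1 millisecond" but is not stated on the slide.

```python
from datetime import datetime, timedelta, timezone

# Assumption: .NET-style ticks (100 ns units since 0001-01-01 00:00:00).
TICKS_PER_MICROSECOND = 10  # 10000 ticks = 1 ms, as on the slide
TICKS_AT_UNIX_EPOCH = 621_355_968_000_000_000  # ticks at 1970-01-01 00:00:00


def ticks_to_unix_micros(ticks: int) -> int:
    """Convert ticks to microseconds since the Unix epoch."""
    return (ticks - TICKS_AT_UNIX_EPOCH) // TICKS_PER_MICROSECOND


def unix_micros_to_datetime(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch to an aware UTC datetime."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=micros)


slide_timestamp = 635_774_040_762_020_710  # value from USING TIMESTAMP
micros = ticks_to_unix_micros(slide_timestamp)

print(micros)                                      # same instant, microsecond scale
print(unix_micros_to_datetime(micros).isoformat())
print(slide_timestamp // micros)                   # how many times larger the raw tick value is
```

Under that assumption the raw tick value is hundreds of times larger than a microsecond timestamp for the same instant, so any cell written with ticks looks as if it were written far in the future compared with cells written by the other subsystem.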

Page 29: Digging Cassandra Cluster

Problem 1: Invalid timestamp usage

SELECT last_action_time FROM users_lastaction
WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204 AND subsystem = 'API'

1. Looks at Memtable
2. Filters SSTables using bloom filter
3. Filters SSTables by timestamp (CASSANDRA-2498)
4. Reads remaining SSTables
5. Merges result

[Diagram: Memtable and SSTables]
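The read steps above can be sketched as a toy model to show why mixed timestamp sources hurt. Everything here is a simplification and an assumption, not Cassandra's actual code: the `SSTable` class, `read_cell`, the tiny integer timestamps, and the factor 440 (standing in for tick-scale timestamps) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


@dataclass
class SSTable:
    name: str
    partitions: Dict[str, Dict[str, Cell]]  # partition key -> {cell name: Cell}

    @property
    def max_timestamp(self) -> int:
        return max(ts for cells in self.partitions.values() for ts, _ in cells.values())

    def might_contain(self, partition_key: str) -> bool:
        # Stand-in for the bloom filter; a real one can give false positives.
        return partition_key in self.partitions


def read_cell(partition_key: str, cell_name: str,
              memtable: Dict[str, Dict[str, Cell]],
              sstables: List[SSTable]) -> Tuple[Optional[Cell], List[str]]:
    """Return the newest cell and the names of the SSTables that had to be read."""
    best = memtable.get(partition_key, {}).get(cell_name)         # step 1
    candidates = sorted(
        (t for t in sstables if t.might_contain(partition_key)),  # step 2
        key=lambda t: t.max_timestamp,
        reverse=True,
    )
    touched = []
    for table in candidates:
        # Step 3: this table and all later ones are older than what we already have.
        if best is not None and table.max_timestamp < best[0]:
            break
        touched.append(table.name)                                # step 4
        cell = table.partitions[partition_key].get(cell_name)
        if cell is not None and (best is None or cell[0] > best[0]):
            best = cell                                           # step 5: newest wins
    return best, touched


def make_sstables(webapp_scale: int) -> List[SSTable]:
    """Three flushes; WebApp cells use timestamps scaled by webapp_scale."""
    tables = []
    for name, ts in (("old", 100), ("mid", 200), ("new", 300)):
        cells = {"API": (ts, f"api@{ts}"), "WebApp": (ts * webapp_scale, f"web@{ts}")}
        tables.append(SSTable(name, {"u1": cells}))
    return tables


consistent = read_cell("u1", "API", {}, make_sstables(1))
mixed = read_cell("u1", "API", {}, make_sstables(440))
print(consistent)  # same timestamp source: only the newest SSTable is read
print(mixed)       # mixed sources: every SSTable holding the partition is read
```

In the mixed case each SSTable's maximum timestamp comes from the tick-scale cells, so it is never older than the API cell already found, and the timestamp filter in step 3 stops excluding anything.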

Page 31: Digging Cassandra Cluster

Problem 1: Invalid timestamp usage

Fix: started using the same timestamp source for all writes to a table

Page 33: Digging Cassandra Cluster

Problem 2: Few writes, many reads

• Reads dominate over writes (example: user accounts)
• Each read goes to an SSTable (the Memtable has already been flushed)
• Fix: just enabled the row cache

Page 34: Digging Cassandra Cluster

Problem 3: Aggressive time series

CREATE TABLE activity_records (
    time_bucket text,
    record_time timestamp,
    record_content text,
    PRIMARY KEY (time_bucket, record_time)
);

SELECT record_content FROM activity_records
WHERE time_bucket = '2015-05-10 12:00:00'
AND record_time > '2015-05-10 12:30:10'

Page 41: Digging Cassandra Cluster

Problem 3: Aggressive time series

SELECT record_content FROM activity_records
WHERE time_bucket = '2015-05-10 12:00:00'
AND record_time > '2015-05-10 12:30:10'

1. Looks at Memtable
2. Filters SSTables using bloom filter
3. Can't use CASSANDRA-2498
4. CASSANDRA-5514!
5. Reads remaining SSTables
6. Merges result

[Diagram: Memtable and SSTables]

Page 42: Digging Cassandra Cluster

Problem 3: Aggressive time series

Fix: just upgraded to Cassandra 2.0+
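As we understand CASSANDRA-5514, the upgrade helps because each SSTable then records the minimum and maximum clustering values it holds, so a slice query can skip SSTables whose range cannot match. The sketch below shows that idea only; `SSTableMeta`, `sstables_for_slice`, and the flush times are hypothetical, and real metadata covers the whole SSTable rather than one bucket.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class SSTableMeta:
    name: str
    min_record_time: datetime  # smallest clustering value stored in the SSTable
    max_record_time: datetime  # largest clustering value stored in the SSTable


def sstables_for_slice(sstables: List[SSTableMeta], lower_bound: datetime) -> List[SSTableMeta]:
    """Keep only SSTables that may hold rows with record_time > lower_bound."""
    return [t for t in sstables if t.max_record_time > lower_bound]


def at(hour: int, minute: int, second: int = 0) -> datetime:
    return datetime(2015, 5, 10, hour, minute, second)


# Hypothetical flushes of the 12:00 bucket, each covering a later time range.
flushed = [
    SSTableMeta("flush-1", at(12, 0), at(12, 10)),
    SSTableMeta("flush-2", at(12, 10), at(12, 20)),
    SSTableMeta("flush-3", at(12, 20), at(12, 35)),
]

to_read = sstables_for_slice(flushed, at(12, 30, 10))
print([t.name for t in to_read])  # only SSTables that can contain newer records
```

Without such metadata, every SSTable that passes the bloom filter for the bucket has to be read, even though most of them hold only records older than the requested range.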

Page 43: Digging Cassandra Cluster

SSTablesPerSecond: before

Page 44: Digging Cassandra Cluster

SSTablesPerSecond: after

Page 45: Digging Cassandra Cluster

WHAT ABOUT OUR GOAL?

Before:
• Client read latency (99th percentile): 100 ms – 2 s
• CPU: 40% – 80%

After:
• Client read latency (99th percentile): 50 ms – 200 ms
• CPU: 20% – 50%

Page 46: Digging Cassandra Cluster

PROFILE AGAIN

• Reading SSTables vs reading Memtable: 50/50
• SliceQuery: 70%

Page 49: Digging Cassandra Cluster

LOOK AT METRICS AGAIN

• LiveScannedHistogram: how many live columns the node scans per slice query
• TombstonesScannedHistogram: how many tombstones the node scans per slice query
• Did not find any anomalies
• Why not use the successful trick again?

Page 50: Digging Cassandra Cluster

LiveScannedPerSecond: how many live columns Cassandra scans per second for each table

LiveScannedPerSecond = LiveScannedHistogram.Mean * ReadLatency.OneMinuteRate

Page 52: Digging Cassandra Cluster

LiveScannedPerSecond: results

• 1 obvious leader
• Large difference between the leader and the others
• The leader was a big surprise
• Fix: fixed the bug

Page 53: Digging Cassandra Cluster

WHAT ABOUT OUR GOAL?

Initial:
• Client read latency (99th percentile): 100 ms – 2.0 s
• CPU: 40% – 80%

After SSTablesPerSecond fixes:
• Client read latency (99th percentile): 50 ms – 200 ms
• CPU: 20% – 50%

After LiveScannedPerSecond fixes:
• Client read latency (99th percentile): 30 ms – 100 ms
• CPU: 10% – 30%

Page 55: Digging Cassandra Cluster

PROFILE AGAIN

Compaction: 30%

Fix: throttled compactions down during high-load periods and up during low-load periods
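One way to automate this kind of throttling is a small scheduled script that picks a compaction throughput by time of day and applies it with `nodetool setcompactionthroughput`. The sketch below only builds the command; the peak window and both throughput values are hypothetical, not the settings used on the cluster described here.

```python
from datetime import time
from typing import List

# Hypothetical schedule and limits; tune to the cluster's real load profile.
PEAK_START = time(8, 0)
PEAK_END = time(20, 0)
PEAK_THROUGHPUT_MB_S = 8       # throttle compaction down under high load
OFF_PEAK_THROUGHPUT_MB_S = 64  # let compaction catch up under low load


def compaction_throughput_for(now: time) -> int:
    """Pick the compaction throughput (MB/s) for the given time of day."""
    in_peak = PEAK_START <= now < PEAK_END
    return PEAK_THROUGHPUT_MB_S if in_peak else OFF_PEAK_THROUGHPUT_MB_S


def nodetool_command(now: time) -> List[str]:
    """Build the nodetool invocation; running it on each node is left to the caller."""
    return ["nodetool", "setcompactionthroughput", str(compaction_throughput_for(now))]


print(nodetool_command(time(12, 30)))  # inside the peak window
print(nodetool_command(time(2, 0)))    # outside the peak window
```

A scheduler such as cron could run this on each node and pass the resulting list to `subprocess.run`; a load-based trigger instead of a fixed clock window would be a natural refinement.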

Page 57: Digging Cassandra Cluster

WHAT ABOUT OUR GOAL?

Initial:
• Client read latency (99th percentile): 100 ms – 2.0 s
• CPU: 40% – 80%

After LiveScannedPerSecond fixes:
• Client read latency (99th percentile): 30 ms – 100 ms
• CPU: 10% – 30%

After Compaction fixes:
• Client read latency (99th percentile): 10 ms – 50 ms
• CPU: 5% – 25%

Page 59: Digging Cassandra Cluster

MORE METRICS!

• TombstonesScannedPerSecond
• KeyCacheMissesPerSecond
• …

Initial:
• Client read latency (99th percentile): 100 ms – 2.0 s
• CPU: 40% – 80%

After all fixes:
• Client read latency (99th percentile): 5 ms – 25 ms (50 times lower on average!)
• CPU: 5% – 15% (7 times lower on average)

Page 60: Digging Cassandra Cluster

THANK YOU

Page 61: Digging Cassandra Cluster

Extra: The effect of the slow queries

pending tasks

concurrent_reads