Using Load Tests to Automatically Compare the Subsystems of a Large Enterprise System
Haroon Malik, Bram Adams & Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL), Queen's University, Kingston, Canada
Parminder Flora & Gilbert Hamann
Performance Engineering, Research In Motion, Waterloo, Canada
Today's large-scale systems (LSS) are composed of many underlying subsystems.
These LSS grow rapidly in size to handle growing traffic, complex services, and business-critical functionality.
Performance analysts face the challenge of dealing with performance bugs, as processing is spread across thousands of subsystems and millions of hardware nodes.
LOAD TESTING
[Diagram: Load Generator-1 and Load Generator-2 drive the system under test; a monitoring tool records performance counter logs into a performance repository.]
CURRENT PRACTICE
1. Environment setup
2. Load test execution
3. Load test analysis
4. Report generation
CHALLENGES
- Large number of performance counters
- Limited time
- Risk of error (manual analysis can go wrong: 2 + 2 = 5)

An automated methodology is required.
[Figure: raw performance counter logs (PC-1, PC-2, PC-3) amount to a lot of noisy data; our methodology distills them into a compact performance signature.]
METHODOLOGY
[Diagram: an example LSS with Database, Mail, and Web subsystems.]
Signature counters such as Commits/Sec, Writes/Sec, CPU Utilization, and Database Cache % Hit are extracted per subsystem for the base-line test and for Load Test-1, and a deviation-match score is computed for each subsystem (e.g., 0.59, 1, 0.99).
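The slide does not spell out how the deviation-match score is computed. As a minimal sketch, assuming a signature maps counter names to importance weights and that the score is the cosine similarity between the base-line and test vectors (both assumptions, not necessarily the authors' exact scoring function):

```python
import numpy as np

def deviation_match(baseline_sig: dict, test_sig: dict) -> float:
    """Cosine similarity between two performance signatures.

    A signature maps counter name -> importance weight. A score
    near 1 means the subsystem behaves as in the base-line test;
    lower scores flag a performance deviation.
    """
    counters = sorted(set(baseline_sig) | set(test_sig))
    b = np.array([baseline_sig.get(c, 0.0) for c in counters])
    t = np.array([test_sig.get(c, 0.0) for c in counters])
    return float(b @ t / (np.linalg.norm(b) * np.linalg.norm(t)))

# Illustrative example: a database whose cache behaviour shifted.
baseline = {"Commits/Sec": 0.9, "Writes/Sec": 0.8,
            "CPU Utilization": 0.7, "Database Cache % Hit": 0.9}
test_1   = {"Commits/Sec": 0.9, "Writes/Sec": 0.8,
            "CPU Utilization": 0.7, "Database Cache % Hit": 0.2}
print(round(deviation_match(baseline, test_1), 2))  # well below 1
```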
METHODOLOGY STEPS
1. Data preparation
2. Counter normalization
3. Dimension reduction
4. Crafting performance signatures
5. Extracting performance deviations
6. Report generation
A sketch of steps 2 to 4 follows below.
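The slides name the steps but not the techniques behind them. Here is a minimal sketch of steps 2 to 4, assuming min-max normalization per counter and PCA (via SVD) for dimension reduction, with a counter's importance taken from its loading on the first principal component (all assumptions, not necessarily the authors' exact method):

```python
import numpy as np

def performance_signature(counter_matrix: np.ndarray,
                          counter_names: list[str],
                          top_k: int = 5) -> dict:
    """Steps 2-4: normalize counters, reduce dimensions with PCA,
    and keep the top-k most important counters as the subsystem's
    performance signature.

    counter_matrix: observations x counters, one column per counter.
    """
    # 2. Counter normalization: scale each counter to [0, 1] so
    #    high-magnitude counters do not dominate the analysis.
    lo, hi = counter_matrix.min(axis=0), counter_matrix.max(axis=0)
    normalized = (counter_matrix - lo) / np.where(hi > lo, hi - lo, 1.0)

    # 3. Dimension reduction via PCA (SVD on the centered data).
    centered = normalized - normalized.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)

    # Importance of a counter = magnitude of its loading on the
    # first principal component.
    importance = np.abs(vt[0])

    # 4. Craft the signature: the top-k counters and their weights.
    top = np.argsort(importance)[::-1][:top_k]
    return {counter_names[i]: float(importance[i]) for i in top}
```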
CASE STUDY
MEASURING THE PERFORMANCE
Base-Line vs. Test-1, compared over intervals t1, t2, t3, t4, t5, t6
Deviations Predicted (P)
Deviations Occurred (O)
PO = P ∩ O

Precision = |P ∩ O| / |P| = 1/4 = 0.25
Recall = |P ∩ O| / |O| = 1/3 = 0.33
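Reproducing the arithmetic: only the set sizes (|P| = 4, |O| = 3, |P ∩ O| = 1) come from the slide; the specific intervals chosen below are illustrative.

```python
# P and O are the sets of intervals (t1..t6) where a deviation
# was predicted vs. where one actually occurred.
predicted = {"t1", "t2", "t4", "t6"}  # |P| = 4 (illustrative pick)
occurred  = {"t2", "t3", "t5"}        # |O| = 3 (illustrative pick)

overlap = predicted & occurred        # PO = P ∩ O, here {"t2"}
precision = len(overlap) / len(predicted)  # 1/4 = 0.25
recall    = len(overlap) / len(occurred)   # 1/3 = 0.33
print(f"precision={precision:.2f} recall={recall:.2f}")
```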
RESEARCH QUESTIONS
RQ-1: Can our methodology identify the subsystems of an LSS that have performance deviations relative to prior tests?
RQ-2: Can we avoid running unnecessary load tests to completion by identifying performance deviations early across the different subsystems of an LSS?
RQ-3: How is the performance of our methodology affected by different sampling intervals?

RQ-1
Can our methodology identify the subsystems of an LSS that have performance deviations relative to prior tests?
APPROACH
- 4 load tests, 8 hours each
- 700 performance counters each; monitoring interval: 15 sec; 1,922 instances
- Baseline test: 85% data reduction
- Test-1: baseline test reproduction
- Test-2: synthetic fault injection via mutation
- Test-3: increased workload intensity (8X)
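Test-2's "synthetic fault injection via mutation" is not detailed on the slide. As a hypothetical illustration only, one simple mutation is to scale a chosen counter in a copy of the base-line log:

```python
import numpy as np

def mutate_counter(log: np.ndarray, counter_idx: int,
                   scale: float = 2.0, seed: int = 0) -> np.ndarray:
    """Synthesize a faulty test from a base-line counter log by
    scaling one counter (plus a little noise), mimicking e.g. a
    subsystem that suddenly consumes twice the CPU."""
    rng = np.random.default_rng(seed)
    mutated = log.copy()
    noise = rng.normal(1.0, 0.05, size=log.shape[0])
    mutated[:, counter_idx] *= scale * noise
    return mutated
```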
[Figure: importance (0.8 to 1) of the top performance counters, comparing the Base-Line Test, Test-A, the Synthesized Test, and the 8X-Load test.]
[Figure: per-subsystem counter-importance plots for Web Server-A, Application System, Web Server-B, and Database across the four tests.]
FINDINGS
Our methodology helps performance analysts identify subsystems with performance deviations relative to prior tests.

Deviation match per subsystem and load test:
Subsystem      Test-A   Synthesized   8X-Load
Database       0.997    0.732         0.826
Web Server-A   1.000    0.701         0.795
Web Server-B   1.000    0.700         0.790
Application    1.000    0.623         0.681
RQ-2
Can we avoid running unnecessary load tests to completion by identifying performance deviations early across the different subsystems of an LSS?
[Figures: % CPU Utilization (35 to 80) over the full load test (about 1,000 observations) and over a shorter window (about 40 observations).]
APPROACH
[Figure: % CPU Utilization (38 to 88) over 120 minutes for the baseline and the load test, with a CPU stress spike on the database server.]
- Two load tests, 2 hours each; monitoring rate: 15 sec
- CPU stress on the database server at the 60th minute, for 15 sec
- Test comparison: removed 12% of the samples (10 min; 6% + 6%)
[Figure: counter importance (0.8 to 1) for the Database subsystem over 30-, 15-, 10-, and 5-minute windows, comparing the Base-Line Test and the Load Test.]
FINDINGS
Time (Observations)   Database
30 mins (120)         1
15 mins (60)          1
10 mins (40)          0.9893
5 mins (20)           0.8255

Early identification of deviations within 10 minutes (40 observations).
RQ-3
How is the performance of our methodology affected by different sampling intervals?
APPROACH
- Two load tests, 2 hours each; monitoring rate: 15 sec
- Fault: stopped the load generators 10 times, for 15 sec each
- Measured the performance of the methodology at different time intervals
- 30-min interval: 4 samples; 15-min interval: 8 samples
[Figure: the Baseline and Load Test-1 runs divided into 30-min and 15-min samples.]
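The interval-to-sample arithmetic can be made concrete. A minimal sketch (the log contents are random placeholders; only the shapes follow the setup above):

```python
import numpy as np

def split_into_samples(counters: np.ndarray, interval_min: int,
                       monitor_sec: int = 15) -> list[np.ndarray]:
    """Split an (observations x counters) log into fixed-length
    samples, e.g. a 2-hour test at 15-sec monitoring yields
    4 samples of 120 observations for a 30-min interval."""
    obs_per_sample = interval_min * 60 // monitor_sec
    n = counters.shape[0] // obs_per_sample
    return [counters[i * obs_per_sample:(i + 1) * obs_per_sample]
            for i in range(n)]

# 2-hour test at 15-sec monitoring -> 480 observations.
rng = np.random.default_rng(0)
test_log = rng.random((480, 700))  # 700 counters, as in RQ-1
for minutes in (30, 15, 10, 5):
    samples = split_into_samples(test_log, minutes)
    print(f"{minutes}-min interval -> {len(samples)} samples "
          f"of {samples[0].shape[0]} observations")
```

Each sample can then be scored against the matching baseline window, which is exactly the grid of interval sizes evaluated in the findings table below.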
FINDINGS
Small samples yield high RECALL; large samples yield high PRECISION.

      Test Run          Database       Web Server-1   Web Server-2   Application    Average
Min   Obs   Samples   Recall  Prec   Recall  Prec   Recall  Prec   Recall  Prec   Recall  Prec
30    120   4         0.50    1.00   0.50    1.00   0.30    1.00   0.25    1.00   0.325   1.000
15    60    8         0.62    1.00   0.62    1.00   0.62    1.00   0.50    1.00   0.590   1.000
10    40    12        1.00    0.90   1.00    0.90   1.00    0.90   0.90    0.69   0.975   0.847
5     20    24        1.00    0.70   1.00    0.70   1.00    0.80   1.00    0.66   1.000   0.715
All   -     -         0.78    0.90   0.78    0.90   0.73    0.92   0.66    0.83   0.738   0.890

The methodology performs best at a 10-minute time interval, striking a good balance between recall and precision.