"Seegrid will be due for a migration to confluence on the 1st of August. Any update on or after the 1st of August will NOT be migrated"

PID Service Performance and Reliability

Overview

The service performance and reliability study was carried out jointly by CSIRO and Australian Bureau of Meteorology engineers using a test set of URI mappings. A set of stress and volume tests was run in a controlled environment while CPU utilization and memory usage were monitored.

Method

The stress and volume testing procedure ran multiple concurrent instances of the wget utility, each crawling the list of all mapped URIs in random order. Each instance was started with the following parameters:

wget -b --domains=neiipid-svt.bom.gov.au --max-redirect 0 -T 10 --mirror -r --delete-after -p --no-cache \
    --header="Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
    -P 1. http://neiipid-svt.bom.gov.au/manager/index.php

Note that /manager/index.php was served by Apache as a specially crafted index page which, when crawled by a robot, generates an unlimited supply of unique PID Service URLs for load testing.
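The procedure above can be sketched as a small launcher script. This is only an illustration, not the script used in the study: the function names are hypothetical, and it assumes that the "-P 1." directory prefix in the original command identifies the instance, so that instance n uses the prefix "n.".

```python
import subprocess

BASE_URL = "http://neiipid-svt.bom.gov.au/manager/index.php"
ACCEPT = "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"


def build_wget_command(instance: int) -> list[str]:
    """Return the argument list for one crawler instance.

    Assumption: the "-P 1." prefix in the original command numbers the
    instance, so instance n gets the directory prefix "n.".
    """
    return [
        "wget", "-b",
        "--domains=neiipid-svt.bom.gov.au",
        "--max-redirect", "0",
        "-T", "10",
        "--mirror", "-r", "--delete-after", "-p", "--no-cache",
        f"--header={ACCEPT}",
        "-P", f"{instance}.",
        BASE_URL,
    ]


def launch(instances: int) -> None:
    """Start the requested number of background wget crawlers."""
    for n in range(1, instances + 1):
        # "wget -b" detaches into the background, so run() returns
        # as soon as the crawler has been started.
        subprocess.run(build_wget_command(n), check=True)
```

Calling launch(3) would start three crawlers, matching the "Co. Instances: 3" value in the reports below. Passing the arguments as a list avoids shell quoting problems with the semicolon and asterisk in the Accept header.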

Hardware

  • Linux virtual server running RHEL 6 or equivalent
  • 1 x 2.4GHz Xeon vCPU
  • 512 MB RAM

Environment

Java version:

  • java version "1.6.0_35"
  • Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
  • Java HotSpot (TM) 64-Bit Server VM (build 20.10-b01, mixed mode)

JVM execution arguments:

/usr/java/latest/bin/java -Xmx384m -XX:MaxPermSize=128m -XX:+UseParallelGC -server -Djava.awt.headless=true
    -Djava.ext.dirs=/usr/java/latest/jre/lib/ext -Djavax.sql.DataSource.Factory=org.apache.commons.dbcp.BasicDataSourceFactory
    -classpath :/usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
    -Dcatalina.base=/data/pidsvc/pidsvc -Dcatalina.home=/usr/share/tomcat6 -Djava.endorsed.dirs=
    -Djava.io.tmpdir=/data/pidsvc/pidsvc/temp -Djava.util.logging.config.file=/data/pidsvc/pidsvc/conf/logging.properties
    -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start

Test Dataset

Test dataset included:
  • 100,001 one-to-one URI mappings
  • 501 Regex-based URI mappings

PerfTestDataset.zip contains the PID Service backup file (Backup.psb) and a list of test URIs in the BackupLinks.txt file.

psqldump8.1.zip contains the test dataset in the form of a PostgreSQL 8.1 SQL dump file. It can be used for a faster database restore because it bypasses the integrity checks otherwise performed by the PID Service import logic.

Test Results

##################################
#   FINAL LOADGEN STATS REPORT   #
##################################
Test started : 2013-02-11 22:19:00
Current time : 2013-02-12 22:19:00
Co. Instances: 3
Test interval: 1440
Test runtime : 1440
Requests sent: 118582
Requests/min : 82.3499984741211
Requests/sec : 1.3700000047683716
----------------------------------
HTTP response summary at 1440 mins:
591 (0.4984%)     - 200 OK
117369 (98.9771%)     - 302 Found
609 (0.5136%)     - 404 Not Found
23 (0.0194%)     - No data received.
##################################
#  End of Report                 #
##################################

##################################
#  INTERIM LOADGEN STATS REPORT  #
##################################
Test started : 2013-02-11 22:19:00
Current time : 2013-02-12 03:19:00
Co. Instances: 3
Test interval: 300
Test runtime : 1440
Requests sent: 26895
Requests/min : 89.6500015258789
Requests/sec : 1.4900000095367432
----------------------------------
HTTP response summary at 300 mins:
136 (0.5057%)     - 404 Not Found
3 (0.0112%)     - No data received.
135 (0.5020%)     - 200 OK
26621 (98.9812%)     - 302 Found
##################################
#  End of Report                 #
##################################
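The derived figures in the two reports above can be reproduced from the raw counts. The sketch below is an illustration only; the function and variable names are hypothetical, and it assumes the percentages are taken relative to the "Requests sent" value.

```python
# Raw counts copied from the final (1440 min) and interim (300 min) reports.
FINAL = {
    "requests_sent": 118582,
    "minutes": 1440,
    "responses": {
        "200 OK": 591,
        "302 Found": 117369,
        "404 Not Found": 609,
        "No data received": 23,
    },
}
INTERIM = {
    "requests_sent": 26895,
    "minutes": 300,
    "responses": {
        "200 OK": 135,
        "302 Found": 26621,
        "404 Not Found": 136,
        "No data received": 3,
    },
}


def throughput(report):
    """Return (requests per minute, requests per second)."""
    per_min = report["requests_sent"] / report["minutes"]
    return per_min, per_min / 60


def percentages(report):
    """Return each response count as a percentage of requests sent."""
    total = report["requests_sent"]
    return {
        status: round(100 * count / total, 4)
        for status, count in report["responses"].items()
    }


for name, report in (("final", FINAL), ("interim", INTERIM)):
    per_min, per_sec = throughput(report)
    print(f"{name}: {per_min:.2f} requests/min, {per_sec:.2f} requests/sec")
    for status, share in percentages(report).items():
        print(f"  {share:.4f}%  {status}")
```

The long decimals printed by the load generator (for example 82.3499984741211) agree with these values when rounded to two decimal places.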

Extract from “top” towards the end of the test:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12499 pidsvc    20   0 1372m  62m 4160 S 35.9 10.6 507:24.05 java
28627 postgres  20   0  214m  26m  24m S  9.6  4.5   0:11.24 postmaster
28628 postgres  20   0  214m  26m  24m S  9.3  4.5   0:11.20 postmaster
28624 postgres  20   0  214m  26m  24m R  8.3  4.5   0:11.68 postmaster
28626 postgres  20   0  214m  26m  24m R  8.0  4.5   0:11.69 postmaster
28631 postgres  20   0  214m  26m  24m R  8.0  4.5   0:10.65 postmaster
28633 postgres  20   0  214m  26m  24m R  7.6  4.5   0:11.36 postmaster
28632 postgres  20   0  214m  26m  24m R  4.7  4.5   0:11.04 postmaster
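The split of CPU usage between the JVM and the database can be read off the extract above by summing the %CPU column per command. The following sketch is an illustration only; the function name is hypothetical, and it assumes the standard top column order shown in the header line.

```python
from collections import defaultdict

TOP_EXTRACT = """\
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12499 pidsvc    20   0 1372m  62m 4160 S 35.9 10.6 507:24.05 java
28627 postgres  20   0  214m  26m  24m S  9.6  4.5   0:11.24 postmaster
28628 postgres  20   0  214m  26m  24m S  9.3  4.5   0:11.20 postmaster
28624 postgres  20   0  214m  26m  24m R  8.3  4.5   0:11.68 postmaster
28626 postgres  20   0  214m  26m  24m R  8.0  4.5   0:11.69 postmaster
28631 postgres  20   0  214m  26m  24m R  8.0  4.5   0:10.65 postmaster
28633 postgres  20   0  214m  26m  24m R  7.6  4.5   0:11.36 postmaster
28632 postgres  20   0  214m  26m  24m R  4.7  4.5   0:11.04 postmaster
"""


def cpu_by_command(top_text):
    """Sum the %CPU column (field 9) per COMMAND (field 12)."""
    totals = defaultdict(float)
    for line in top_text.splitlines():
        fields = line.split()
        # Skip the header and any blank or malformed lines.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        totals[fields[11]] += float(fields[8])
    return dict(totals)


totals = cpu_by_command(TOP_EXTRACT)
ratio = totals["java"] / totals["postmaster"]
print(f"java {totals['java']:.1f}% vs postmaster {totals['postmaster']:.1f}%")
print(f"JVM to database CPU ratio: {ratio:.2f}")
```

For this extract the sums are 35.9% for java and 55.5% for the postmaster processes, a ratio of about 0.65, which is close to 2:3.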

Interpretation of Results

  1. We are testing the stability of the application itself over time and under load. The “200 OK” and “404 Not Found” results can be safely ignored; the only responses of interest are “No data received” and “302 Found”. The 200 OK responses arise because the test method requires an index page of URLs to be crawled for stress testing purposes. This index page is served by Apache (not Tomcat) and won't be needed in a production deployment. The 404 Not Found responses are due to the test dataset itself.
  2. If the application suffered from memory leaks, the rate of “No data received” responses would have increased as time progressed. This wasn’t the case. We can see in the first graph (“Current loadgen test results”) that the 302 Found responses were steady all the way through, with no change in gradient even after 24 hours. We also found that the connection error rate was less than 0.02%, which is well within acceptable limits.
  3. Under this load (3 concurrent instances, or about 1.4 requests/sec), the ratio of CPU consumption between the JVM and the PostgreSQL database was approximately 2:3. When the test was re-run with a somewhat higher load of 8 concurrent instances (still only 1.3 requests/sec), the JVM to PostgreSQL CPU usage ratio changed to around 1:3, which is expected because the PID Service hands off part of the actual computation to the database tier.





Conclusion

No major issues were detected during stress and volume testing of the PID Service. The application itself was fully responsive at the end of the test, and no restarts were required.

Acknowledgements

Special thanks to Arya Abdian for detailed reports on the PID Service stress and volume testing conducted in a controlled environment.

-- PavelGolodoniuc - 13 Mar 2013
Topic attachments

  • PerfTestDataset.zip (250.3 K, 13 Mar 2013): PID Service performance testing dataset (backup file)
  • plot1.gif (7.0 K, 13 Mar 2013)
  • plot2.gif (7.7 K, 13 Mar 2013)
  • plot3.jpg (36.3 K, 13 Mar 2013)
  • plot4.jpg (25.7 K, 13 Mar 2013)
  • plot5.jpg (28.9 K, 13 Mar 2013)
  • psqldump8.1.zip (1290.2 K, 13 Mar 2013): PID Service performance testing dataset (PostgreSQL 8.1 SQL dump)

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).