How to use siege properly to performance test a web server? - performance-testing

I am trying to use siege for performance tests on a webpage. See also this documentation.
But when I try to make an example request like
siege -c=1 --reps=1 https://www.google.com/search?q=42
I get the following output:
HTTP/1.1 200 0.82 secs: 24729 bytes ==> GET /search?q=42
Transactions: 1 hits
Availability: 100.00 %
Elapsed time: 1.82 secs
Data transferred: 0.02 MB
Response time: 0.82 secs
Transaction rate: 0.55 trans/sec
Throughput: 0.01 MB/sec
Concurrency: 0.45
Successful transactions: 1
Failed transactions: 0
Longest transaction: 0.82
Shortest transaction: 0.82
I thought you get the complete set of requests that a browser makes when connecting to a web site? Because when I open the inspector and go to the URL https://www.google.com/search?q=42 I see about 20 requests coming from google.com, with a total of a few hundred kilobytes. With siege it is only one request with 24 kilobytes.
Am I doing something wrong? Am I misunderstanding the documentation?

You have specified exactly one URL, so exactly one URL was tested. Siege is not a browser: it does not parse and evaluate the HTML, so it does not fetch the sub-resources the page references. You can still specify multiple URLs, e.g. via a URLs file - see the Siege manual: https://www.joedog.org/siege-manual/
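For example, a URLs file might look like this (the example.com URLs are just placeholders; siege reads one URL per line):

https://www.example.com/
https://www.example.com/search?q=42
https://www.example.com/static/app.js

and the whole file can then be replayed with, say, 5 concurrent users doing 2 repetitions each:

siege -f urls.txt -c 5 -r 2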

Related

Node+Express+MongoDB Native Client Performance issue

I am testing the performance of Node.js (ExpressJS/Fastify), Python (Flask) and Java (Spring Boot with WebFlux) with MongoDB. I hosted all of these sample applications on the same server, one after another, so all services have the same environment. I used two different tools, loadtest and the Apache Benchmark CLI, to measure the performance.
All the code for the Node sample is present in this repository:
benchmark-nodejs-mongodb
I have executed multiple tests with various combinations of the number of requests and concurrent requests with both the tools
Apache Benchmark: 1,000 total requests, 100 concurrent
ab -k -n 1000 -c 100 http://{{server}}:7102/api/case1/1000
loadtest: 100 total requests, 10 concurrent
loadtest http://{{server}}:7102/api/case1/1000 -n 100 -c 10
The results are also attached to the GitHub repository and are shocking for NodeJS compared to the other technologies: either requests break in the middle of the test, or the test takes far too long to complete.
Server configuration (not a dedicated machine):
CPU: Core i7 8th Gen 12 Core
RAM: 32GB
Storage: 2TB HDD
Network Bandwidth: 30Mbps
Mongo server: different nodes on different networks, connected over the Internet
Please help me understand this issue in detail. I do understand how the event loop works in Node.js, but I cannot identify the cause of this problem.
Reproduced
Setup:
MongoDB Atlas M30
AWS c4.xlarge in the same region
Results:
No failures
Document Path: /api/case1/1000
Document Length: 37 bytes
Concurrency Level: 100
Time taken for tests: 33.915 seconds
Complete requests: 1000
Failed requests: 0
Keep-Alive requests: 1000
Total transferred: 265000 bytes
HTML transferred: 37000 bytes
Requests per second: 29.49 [#/sec] (mean)
Time per request: 3391.491 [ms] (mean)
Time per request: 33.915 [ms] (mean, across all concurrent requests)
Transfer rate: 7.63 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 3.1 0 12
Processing: 194 3299 1263.1 3019 8976
Waiting: 190 3299 1263.1 3019 8976
Total: 195 3300 1264.0 3019 8976
Length failures under heavier load:
Document Path: /api/case1/5000
Document Length: 37 bytes
Concurrency Level: 100
Time taken for tests: 176.851 seconds
Complete requests: 1000
Failed requests: 22
(Connect: 0, Receive: 0, Length: 22, Exceptions: 0)
Keep-Alive requests: 978
Total transferred: 259170 bytes
HTML transferred: 36186 bytes
Requests per second: 5.65 [#/sec] (mean)
Time per request: 17685.149 [ms] (mean)
Time per request: 176.851 [ms] (mean, across all concurrent requests)
Transfer rate: 1.43 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.9 0 4
Processing: 654 17081 5544.0 16660 37911
Waiting: 650 17323 5290.9 16925 37911
Total: 654 17081 5544.1 16660 37911
I copied the results of your tests from the GitHub repo for completeness (Python, Java Spring WebFlux, Node native Mongo driver).
So, there are 3 problems.
Upload bandwidth
ab -k -n 1000 -c 100 http://{{server}}:7102/api/case1/1000 uploads circa 700 MB of bson data over the wire.
30 Mb/s is less than 4 MB/s, which requires at least 100 seconds just to transfer the data at top speed. If you test from home, consumer-grade ISPs do not always give you the maximum speed, especially for upload.
It's usually less of a problem for servers, especially if the application is hosted close to the database. I put some stats for the app and Mongo servers hosted on AWS in the same zone in the question itself.
Failed requests
All I could notice are "Length" failures - the number of bytes actually received does not match the expected length.
It happens only to the last batch (100 requests) because of a race condition in the Node.js cluster module - the master closes the connections to the workers before the worker's http.response.end() has written the data to the socket. On the TCP level it looks like this:
After 46 seconds of struggling there is no HTTP 200 OK, only FIN, ACK.
This is easy to fix: use an nginx reverse proxy in front of a number of Node.js workers started manually instead of the built-in cluster module, or let k8s do the resource management (a minimal sketch follows below).
In short - don't use the Node.js cluster module for network-intensive tasks.
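For illustration, a rough sketch of that setup. It assumes the app reads its port from a PORT environment variable and that four workers are enough; adjust file names, ports and counts to your project:

PORT=7111 node index.js &
PORT=7112 node index.js &
PORT=7113 node index.js &
PORT=7114 node index.js &

with an nginx upstream balancing across the standalone workers on the original port 7102:

upstream api_workers {
    server 127.0.0.1:7111;
    server 127.0.0.1:7112;
    server 127.0.0.1:7113;
    server 127.0.0.1:7114;
}
server {
    listen 7102;
    location / {
        proxy_pass http://api_workers;
    }
}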
Timeout
It's the ab timeout. When the network is the limiting factor and you increase the payload 5x, increase the default timeout (30 sec) by at least 4x:
ab -s 120 -k -n 1000 -c 100 http://{{server}}:7102/api/case1/5000
I am sure you did this for the other tests, since you report 99 sec/request for Java and 81 sec/request for Python.
Conclusion
There is nothing shockingly bad about Node.js. There are some bugs in the cluster module, but it's a very niche use case to start from, and it's trivial to work around.
The flamechart:
Most of the CPU time is spent serialising/deserialising BSON and sending data to the stream, with some 10% spent on the most CPU-intensive part, bson serialiseInto.
If you are using only a single server, then you can cache the database operations on the app side, get rid of the database latency altogether, and only commit to the database at an interval or when the cache expires.
If there are multiple servers, you may get help from a scalable cache, maybe Redis. Redis also has client-side caching, and you can still apply your own cache on top of Redis to boost performance further.
A plain LRU cache written in Node.js can do at least 3-5 million lookups per second, and even more if key access is based on integers (so it can be sharded like an n-way associative LRU cache).
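A minimal sketch of such a cache, built on a Map (which preserves insertion order); names and sizes are made up for illustration:

// lru-cache.js - tiny LRU cache using Map's insertion order
class LruCache {
  constructor(maxEntries = 10000) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // re-insert to mark the entry as most recently used
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    // evict the least recently used entry (the first key in the Map)
    if (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value);
    }
  }
}

const cache = new LruCache(50000);
// usage idea: const doc = cache.get(id) ?? await collection.findOne({ _id: id });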
If you group multiple clients into a single cache request, then getting help from a C++ app can reach hundreds of millions to billions of lookups per second, depending on the data type.
You can also try sharding the database onto extra drives, such as a ramdisk, if the data is temporary.
The event loop can also be offloaded with a task queue for database operations and another queue for incoming requests. This way the event loop can exploit I/O overlapping more, instead of making each client wait for its own DB operation. A rough sketch follows.
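One possible sketch of such a queue, assuming a hypothetical fetchFromDb() helper: pending lookups are collected and then issued concurrently in batches, so the event loop overlaps the I/O instead of handling each client's DB call in isolation.

// db-queue.js - batch pending lookups so the event loop overlaps I/O
const queue = []; // entries: { id, resolve, reject }

function queuedFind(id) {
  return new Promise((resolve, reject) => {
    queue.push({ id, resolve, reject });
  });
}

// drain the queue every few milliseconds, issuing the lookups concurrently
setInterval(async () => {
  const batch = queue.splice(0, queue.length);
  if (batch.length === 0) return;
  await Promise.all(batch.map(async (job) => {
    try {
      job.resolve(await fetchFromDb(job.id)); // fetchFromDb is a hypothetical DB helper
    } catch (err) {
      job.reject(err);
    }
  }));
}, 5);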

PM2 nodeJS cluster mode testing with apache benchmark

I've set up one Node.js application which has only one route, '/', and I'm using Nginx as a reverse proxy. So the app flow is as below:
The user sends the request to the Nginx server.
Based on the location '/', Nginx passes the request to the Node server.
From Node.js, the '/' route sends one HTML file as the response to the client. For load testing, I've used Apache Benchmark.
Apache benchmark command used for testing:
ab -k -c 250 -n 10000 http://localhost/
Please check the apache benchmark response in the following two cases:
Case 1: Clustering mode is not on (without PM2, a simple Node.js server without clustering, e.g. node index.js).
rails@rails-laptop:~$ ab -k -c 250 -n 10000 http://localhost/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: nginx/1.10.3
Server Hostname: localhost
Server Port: 80
Document Path: /
Document Length: 134707 bytes
Concurrency Level: 250
Time taken for tests: 9.531 seconds
Complete requests: 10000
Failed requests: 0
Keep-Alive requests: 10000
Total transferred: 1350590000 bytes
HTML transferred: 1347070000 bytes
Requests per second: 1049.26 [#/sec] (mean)
Time per request: 238.264 [ms] (mean)
Time per request: 0.953 [ms] (mean, across all concurrent requests)
Transfer rate: 138390.37 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.8 0 6
Processing: 38 237 77.6 213 626
Waiting: 31 230 73.8 209 569
Total: 44 237 77.5 213 626
Percentage of the requests served within a certain time (ms)
50% 213
66% 229
75% 247
80% 280
90% 373
95% 395
98% 438
99% 538
100% 626 (longest request)
Case 2: PM2 clustering mode is ON (pm2 start index.js -i 4, i.e. 4 cluster workers).
rails@rails-laptop:~$ ab -k -c 250 -n 10000 http://localhost/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: nginx/1.10.3
Server Hostname: localhost
Server Port: 80
Document Path: /
Document Length: 134707 bytes
Concurrency Level: 1
Time taken for tests: 14.109 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 1350540000 bytes
HTML transferred: 1347070000 bytes
Requests per second: 708.79 [#/sec] (mean)
Time per request: 1.411 [ms] (mean)
Time per request: 1.411 [ms] (mean, across all concurrent requests)
Transfer rate: 93481.05 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 9
Processing: 1 1 1.2 1 35
Waiting: 0 1 0.9 1 21
Total: 1 1 1.2 1 35
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 2
95% 3
98% 5
99% 6
100% 35 (longest request)
Now, if you compare the requests per second in both scenarios, you will see that the requests per second without cluster mode (1049.26 [#/sec] (mean)) are higher than with the PM2 cluster mode (708.79 [#/sec] (mean)). I don't understand why that is. As far as I know, clustering mode is used to achieve a higher level of concurrency, so why does the result contradict that?
I tried clustering with different parameters:
no processing
a CPU-bound calculation:
let r = 0;
for (let i = 1; i <= 50000000; i++) {
    r += i;
}
sending a file
different concurrent request counts
Here is the git repo
Here is my conclusion:
for serving files, it does not make sense to cluster. I think the network is the bottleneck here, and clustering does not help.
for calculation it does make sense to cluster, because the calculation keeps the event loop busy, and if you cluster you have multiple event loops to keep busy. While testing the calculation case, I watched the server's cores with htop and saw that as many CPUs as I had cluster workers were 100 percent busy. The performance multiplied by the cluster count; for example, with a 6-worker cluster the performance became 6 times higher (see the sketch below).
it does not make sense to run more cluster workers than the CPU cores you have on the machine. I recommend reserving one core for the OS.
I made a repository and in the readme file I wrote the detailed results.
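A minimal sketch of the calculation case with the built-in cluster module (the port and the loop size are arbitrary, chosen only for illustration):

// cluster-calc.js - one worker per core for a CPU-bound route
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // leave one core for the OS, as recommended above
  const workers = Math.max(1, os.cpus().length - 1);
  for (let i = 0; i < workers; i++) cluster.fork();
} else {
  http.createServer((req, res) => {
    let r = 0;
    for (let i = 1; i <= 50000000; i++) r += i; // CPU-bound work
    res.end(String(r));
  }).listen(3000); // workers share the port via the master process
}

Running ab against this with and without the fork loop should show the throughput scaling roughly with the number of workers, as described above.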

Why does a nodejs ORM perform worse than just running a simple query?

I'm doing some performance testing using orm2, and it seems to be 4 times slower than just querying directly with SQL. Any thoughts?
https://github.com/gmaggiotti/rule-restApi/tree/orm-poc
Benchmark using ORM2
Document Path: /rules/
Document Length: 6355 bytes
Concurrency Level: 100
Time taken for tests: 5.745 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 6484000 bytes
HTML transferred: 6355000 bytes
Requests per second: 174.06 [#/sec] (mean)
Time per request: 574.526 [ms] (mean)
Time per request: 5.745 [ms] (mean, across all concurrent requests)
Transfer rate: 1102.13 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 2
Processing: 118 552 83.1 555 857
Waiting: 116 552 83.1 555 857
Total: 119 552 83.0 555 857
Benchmark using just sql
Document Path: /rules/
Document Length: 6355 bytes
Concurrency Level: 100
Time taken for tests: 1.630 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 6484000 bytes
HTML transferred: 6355000 bytes
Requests per second: 613.38 [#/sec] (mean)
Time per request: 163.032 [ms] (mean)
Time per request: 1.630 [ms] (mean, across all concurrent requests)
Transfer rate: 3883.92 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 2
Processing: 98 158 49.2 137 361
Waiting: 98 158 49.2 137 361
Total: 98 158 49.4 137 362
Not sure this is worthy of an answer, but it's too long for a comment.
This is (in my experience) true in every language/platform, for every ORM. In general, you don't use ORMs for query performance, you use them for code maintenance optimization and developer speed.
Why is this the case? Well, as a rule, ORMs have to translate what you say in language X into SQL, and in doing so they often won't come up with the most optimized query. They typically do the query generation on the fly, so the actual "building" of the string of (ideally parameterized) SQL takes some small amount of time, as can reflection on the structure of the native code objects to figure out the right column names, etc.
Many ORMs are also not completely deterministic in how they do this, which means that the underlying DB has a harder time caching the query plan than it otherwise would. Also, I couldn't find your actual benchmark tests in the link you provided; it's possible that you're not actually measuring apples to apples.
So I can't answer specifically for the particular module you're using without spending more time on it than I care to, but in general I would discourage this line of questioning for the reasons stated above. The workflow I've often used is to do all my development using the ORM and worry about optimizing queries, etc., once I can do some production-time profiling; at that point I would replace the worst offenders with direct SQL, or possibly stored procedures or views (depending on the DB engine), to improve performance where it actually matters.
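Purely for illustration, a hedged sketch of that kind of replacement using the node-postgres (pg) driver; your project may well use a different database and driver, and the table/column names here are made up:

// raw-rule-query.js - direct, parameterized SQL for the hot path, ORM elsewhere
const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from the standard PG* env vars

// instead of going through the ORM for this one hot endpoint:
async function getActiveRules() {
  const { rows } = await pool.query(
    'SELECT id, name, body FROM rules WHERE active = $1',
    [true]
  );
  return rows;
}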

Why does a new sailsjs project have terrible response times during load testing

I tried load testing our site with terrible results, so I decided to load test a new ("out of the box") sailsjs project with no changes other than the local.js port (8080) and environment mode (production). We are using Google Cloud Platform for both site hosting and load testing. The site resources can easily handle the requests:
30% CPU usage - disk I/O 16 KB/sec - RAM < 10% - no DB used
The average and max response times, in milliseconds:
250 users:
Avg 10
Max 89
500 users:
Avg 10
Max 122
750 users:
Avg 26
Max 847
1000 users:
Avg 50 (but starts jumping faster from this point)
Max 3000
2000 users:
Avg 700
Max 6400
2500 users:
Avg 1115
Max 7611
4000 users:
Avg 3030
Max 10370
Is there possibly some bottleneck created by a limit of 1,000, since that's when the bad delays start?
When I try profiling, the major part of the delay is attributed to (idle).
Sailsjs out of the box seems to be nowhere near the hundreds of thousands of concurrent users that others have achieved with good response times.

How to find uptime, load average for last one hour in Solaris/Unix like OS

I want to find the system load average for the last hour.
I could find the command: uptime
07:05:05 up 1151 days, 20:06, 2 users, load average: 0.01, 0.10, 0.29
where the three numbers are the average number of jobs in the run queue over the last 1, 5 and 15 minutes.
I want the load average above, but for the last hour. Could anyone tell me the way to do so? After that I will display the output using Java.
An easy solution would be to install the sysstat package. I am not sure whether Solaris is supported but the sources can be found at the download page, so you can try and compile it on your target distribution.
$ yum install sysstat
The command you are after is sar -q; it will display the historical queue size and load average for the time spans you are after, like this:
05:53:13 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
05:53:14 AM 0 200 5.00 4.00 3.00 0
05:53:15 AM 0 200 5.00 4.00 3.00 0
...
where ldavg-1, ldavg-5, etc. are the load averages for the preceding 1 and 5 minutes, respectively.
You also mentioned that you'd prefer to display the data yourself. In that case you will need to process the binary data stored in the /var/log/sa directory with the sadf tool - it can produce JSON or XML that you can consume in Java.
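For example, with a reasonably recent sysstat, something along these lines should dump the queue-size and load-average history as JSON (sa15 is just a placeholder for whichever daily data file you need; use -x instead of -j for XML output):

sadf -j /var/log/sa/sa15 -- -q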
