Cassandra READ WHERE IN performance

I have a Cassandra cluster of 6 nodes, each with 96 CPU cores and 800 GB of RAM.
My table for performance tests is:
create table if not exists space.table
(
    id bigint primary key,
    data frozen<list<float>>,
    updated_at timestamp
);
The table contains 150,000,000 rows.
When I was testing it with the query:
SELECT * FROM space.table WHERE id = X
I wasn't even able to overload the cluster; the client overloaded itself first, at 350,000 RPS to the cluster.
Now I'm testing a second case:
SELECT * FROM space.table WHERE id in (X1, X2 ... X3000)
I want to get 3000 random rows from Cassandra per request.
The max RPS in this case is 15; beyond that, a lot of pending tasks pile up in the Cassandra thread pool of type Native-Transport-Requests.
Is it a bad idea to fetch big result sets from Cassandra in one query? What is the best practice? Of course I can split the 3000 ids into separate requests, for example 30 requests with 100 ids each.
Where can I find information about this? Maybe the WHERE IN operation is just bad from a performance perspective?
Update:
I want to share my measurements for fetching 3000 rows from Cassandra with different chunk sizes:
Test with 3000 ids per request
Latency: 5 seconds
Max RPS to cassandra: 20
Test with 100 ids per request (total 30 requests with 100 ids each)
Latency at 350 rps to service (350 * 30 = 10500 requests to cassandra): 170 ms (q99), 95 ms (q90), 75 ms (q50)
Max RPS to cassandra: 350 * 30 = 10500
Test with 20 ids per request (total 150 requests with 20 ids each)
Latency at 250 rps to service (250 * 150 = 37500 requests to cassandra): 49 ms (q99), 46 ms (q90), 32 ms (q50)
Latency at 600 rps to service (600 * 150 = 90000 requests to cassandra): 190 ms (q99), 180 ms (q90), 148 ms (q50)
Max RPS to cassandra: 650 * 150 = 97500
Test with 10 ids per request (total 300 requests with 10 ids each)
Latency at 250 rps to service (250 * 300 = 75000 requests to cassandra): 48 ms (q99), 31 ms (q90), 11 ms (q50)
Latency at 600 rps to service (600 * 300 = 180000 requests to cassandra): 159 ms (q99), 95 ms (q90), 75 ms (q50)
Max RPS to cassandra: 650 * 300 = 195000
Test with 5 ids per request (total 600 requests with 5 ids each)
Latency at 550 rps to service (550 * 600 = 330000 requests to cassandra): 97 ms (q99), 92 ms (q90), 60 ms (q50)
Max RPS to cassandra: 550 * 600 = 330000
Test with 1 id per request (total 3000 requests with 1 id each)
Latency at 190 rps to service (190 * 3000 = 570000 requests to cassandra): 49 ms (q99), 43 ms (q90), 30 ms (q50)
Max RPS to cassandra: 190 * 3000 = 570000

IN is really not recommended, especially for so many individual partition keys. The problem is that when you send a query with IN:
the query is sent to some node (the coordinator node), not necessarily a node that owns the data
that coordinator node then identifies which nodes own the data for the specific partition keys
queries are sent to the identified nodes
the coordinator node collects the results from all nodes
the consolidated result is sent back to the client
This puts a lot of load onto the coordinator node and makes the whole query as slow as the slowest node in the cluster.
The better solution is to use prepared queries and send individual async requests for each partition key, then collect the data in your application. Just take into account that there is a limit on how many in-flight queries a single connection can carry.
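For illustration, here is a minimal sketch using the DataStax Node.js driver (cassandra-driver); the contact point, data center name, and concurrency cap are assumptions, and other drivers offer equivalent async APIs:
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
    contactPoints: ['10.0.0.1'],      // assumption: address of one of your nodes
    localDataCenter: 'datacenter1'    // assumption: your data center name
});
// Prepared once by the driver; token-aware routing then sends each
// execution directly to a replica that owns the key.
const query = 'SELECT * FROM space.table WHERE id = ?';
async function fetchByIds(ids, concurrency) {
    concurrency = concurrency || 100; // cap in-flight queries well below the per-connection limit
    const rows = [];
    for (let i = 0; i < ids.length; i += concurrency) {
        const chunk = ids.slice(i, i + concurrency);
        const results = await Promise.all(
            chunk.map(id => client.execute(query, [id], { prepare: true }))
        );
        results.forEach(rs => rows.push(...rs.rows));
    }
    return rows;
}
Chunking the Promise.all calls keeps the number of in-flight requests bounded, which matches the measurements above: many small requests scale far better than one big IN.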
P.S. It should be possible to optimize this further: look at your values, find which partition keys fall into the same token range, generate an IN query for all keys in the same token range, and send that query with the routing key set explicitly. But this requires more advanced coding.

Related

SAS Proc IML Simulate from empirical data with limits

This might sound bonkers, but I'm looking to see if there are any ideas on how to do this.
I have N categories (say 7) across which a set number of people (say 1000) have to be allocated. I know from historical data the minimum and maximum for each category. The historical data is limited (say 15 samples), so I have data that looks like this; if I had a larger sample, I would try to fit a distribution for each category from all the samples, but there isn't enough.
Year 1:  [78 97 300 358 132 35 0]
Year 2:  [24 74 346 300 148 84 22]
...
Year 15: [25 85 382 302 146 52 8]
The min and max for each category over these 15 years of data is:
Min: [25 74 252 278 112 27 0]
Max: [132 141 382 360 177 84 22]
I am trying to scale this using simulation: allocating the 1000 people across the seven categories within the min and max limits, and repeating. The only condition is that the allocations across the seven categories in each simulation have to sum to 1000.
Any ideas would be greatly appreciated!
The distribution you want is called the multinomial distribution. You can use the RandMultinomial function in SAS/IML to produce random samples from the multinomial distribution. To use the multinomial distribution, you need to know the probability that an individual falls in each category. If this probability has not changed over time, the best estimate of this probability is the average proportion in each category.
Thus, I would recommend using ALL the data to estimate the probability, not just max and min:
proc iml;
X = {...}; /* X is a 15 x 7 matrix of counts, each row is a year */
mean = mean(X);
p = mean / sum(mean);
/* simulate new counts by using the multinomial distribution */
numSamples = 10;
SampleSize = 1000;
Y = randmultinomial(numSamples, SampleSize, p);
print Y;
Now, if you insist on using the max/min, you could use the midrange to estimate the most likely value and use that to estimate the probability, as follows:
Min = {25 74 252 278 112 27 0};
Max = {132 141 382 360 177 84 22};
/* use midrange to estimate probabilities */
midrange = (Min + Max)/2;
p = midrange / sum(midrange);
/* now use RandMultinomial, as before */
If you use the second method, there is no guarantee that the simulated values will stay within the Min/Max limits, although in practice many of the samples will obey that criterion.
Personally, I advocate the first method, which uses the average count. Or you can use a time-weighted count, if you think recent observations are more relevant than observations from 15 years ago.

How do I interpret the includedQuantity in Azure's rate card API?

I am trying to calculate the cost incurred on my Azure Pay-As-You-Go subscription using the usage and rate-card APIs. For this I came across the parameter includedQuantity in the rate-card API, which, according to the documentation, refers to "The resource quantity that is included in the offer at no cost. Consumption beyond this quantity will be charged."
Consider an example, where the usageQuantity is 700 and the rate-card is as follows:
0 : 20
101 : 15
501 : 10
and the includedQuantity is 200.
My assumption was that the calculation would be one of the following:
Quantity = (700 - 200) = 500
Hence, cost = 100 * 20 + 400 * 15 = 8000
New rate card:
0 : 0
101 : 0
201 : 15
501 : 10
So, cost = 300 * 15 + 200 * 10 = 6500
I have seen this question, but it does not clarify the includedQuantity properly.
Great question! So I checked with the Azure Billing team on this, and what they told me is that they will first take off the included units (200 in your example) and then apply graduated pricing to the remaining units.
Based on this, your cost would be 4500:
Total units consumed: 700
Included units: 200
Tiered pricing: {0-100 = 0; 101-200 = 0; 201-500=15; 501-No Upper Limit=10}
4500 = 0 x 100 + 0 x 100 + 15 x 300
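For what it's worth, here is a small sketch that reproduces that arithmetic (the rate-card representation and the function name are my own, not Azure's API):
// Graduated pricing applied after the included units are taken off.
// tiers is an array of { from, rate }, sorted ascending by 'from'.
function tieredCost(usage, included, tiers) {
    let remaining = Math.max(usage - included, 0); // included units come off first
    let total = 0;
    for (let i = 0; i < tiers.length && remaining > 0; i++) {
        const upper = (i + 1 < tiers.length) ? tiers[i + 1].from - 1 : Infinity;
        const width = Math.min(remaining, upper - tiers[i].from + 1);
        total += width * tiers[i].rate;
        remaining -= width;
    }
    return total;
}
tieredCost(700, 200, [
    { from: 1,   rate: 0 },  // 0-100: zeroed by the included quantity
    { from: 101, rate: 0 },  // 101-200: zeroed by the included quantity
    { from: 201, rate: 15 },
    { from: 501, rate: 10 }
]); // => 0 * 100 + 0 * 100 + 15 * 300 = 4500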

Coordinator gets response from one node notably later than from other nodes

Please help me understand what I missed.
I see strange behavior from one cluster node on a SELECT with LIMIT and ORDER BY DESC clauses:
SELECT cid FROM test_cf WHERE uid = 0x50236b6de695baa1140004bf ORDER BY tuuid DESC LIMIT 1000;
TRACING (excerpt):
…
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:25.117000 | 10.0.23.15 | 7862
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:25.136000 | 10.0.25.57 | 6283
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:38.568000 | 10.0.24.51 | 457931
…
10.0.25.56 - coordinator node
10.0.23.15, 10.0.24.51, 10.0.25.57 - nodes with data
The coordinator gets the response from 10.0.24.51 about 13 seconds later than from the other nodes! Why is that? How can I fix it?
The number of rows for the partition key (uid = 0x50236b6de695baa1140004bf) is about 300.
Everything is fine if we use ORDER BY ASC (our clustering order) or a LIMIT value smaller than the number of rows in this partition.
Cassandra (v2.2.5) cluster contains 25 nodes.
Every node holds about 400Gb of data.
Cluster is placed in AWS. Nodes are evenly distributed over 3 subnets in VPC. Type of instance for nodes is c3.4xlarge (16 CPU cores, 30GB RAM). We use EBS-backed storages (1TB GP SSD).
Keyspace RF equals 3.
Column family:
CREATE TABLE test_cf (
uid blob,
tuuid timeuuid,
cid text,
cuid blob,
PRIMARY KEY (uid, tuuid)
) WITH CLUSTERING ORDER BY (tuuid ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction ={'class':'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression ={'sstable_compression':'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 86400
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
nodetool gcstats (10.0.25.57):
Interval (ms): 1208504; Max GC Elapsed (ms): 368; Total GC Elapsed (ms): 4559; Stdev GC Elapsed (ms): 73; GC Reclaimed (MB): 553798792712; Collections: 58; Direct Memory Bytes: 305691840
nodetool gcstats (10.0.23.15):
Interval (ms): 1445602; Max GC Elapsed (ms): 369; Total GC Elapsed (ms): 3120; Stdev GC Elapsed (ms): 57; GC Reclaimed (MB): 381929718000; Collections: 38; Direct Memory Bytes: 277907601
nodetool gcstats (10.0.24.51):
Interval (ms): 1174966; Max GC Elapsed (ms): 397; Total GC Elapsed (ms): 4137; Stdev GC Elapsed (ms): 69; GC Reclaimed (MB): 1900387479552; Collections: 45; Direct Memory Bytes: 304448986
This could be due to a number of factors both related and not related to Cassandra.
Non-Cassandra Specific
How does the hardware (CPU, RAM, disk type: SSD vs. rotational) on this node compare to the other nodes?
How is the network configured? Is traffic to this node slower than other nodes? Do you have a routing issue between the nodes?
How does the load on this server compare to other nodes?
Cassandra Specific
Is the JVM properly configured? Is GC running significantly more frequently than the other nodes? Check nodetool gcstats on this and other nodes to compare.
Has compaction been run on this node recently? Check nodetool compactionhistory
Are there any issues with corrupted files on disk?
Have you checked the system.log to see if it contains any relevant information?
Besides general Linux troubleshooting, I would suggest comparing some of the specific C* functionality using nodetool and looking for differences:
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsNodetool_r.html

Slow performance for Node.js running on AWS

I am running a very simple RESTful API on AWS using Node.js. The API takes a request in the form of '/rest/users/jdoe' and returns the following (it's all done in memory, no database involved):
{
    username: 'jdoe',
    firstName: 'John',
    lastName: 'Doe'
}
The performance of this API on Node.js + AWS is horrible compared to the local network: only 9 requests/sec vs. 2,214 requests/sec on a local network. AWS is running an m1.medium instance, whereas the local Node server is a desktop machine with an Intel i7-950 processor. I'm trying to figure out why there is such a huge difference in performance.
Benchmarks using Apache Bench are as follows:
Local Network
10,000 requests with concurrency of 100/group
> ab -n 10000 -c 100 http://192.168.1.100:8080/rest/users/jdoe
Document Path: /rest/users/jdoe
Document Length: 70 bytes
Concurrency Level: 100
Time taken for tests: 4.516 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 2350000 bytes
HTML transferred: 700000 bytes
Requests per second: 2214.22 [#/sec] (mean)
Time per request: 45.163 [ms] (mean)
Time per request: 0.452 [ms] (mean, across all concurrent requests)
Transfer rate: 508.15 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.4 0 2
Processing: 28 45 7.2 44 74
Waiting: 22 43 7.5 42 74
Total: 28 45 7.2 44 74
Percentage of the requests served within a certain time (ms)
50% 44
66% 46
75% 49
80% 51
90% 54
95% 59
98% 65
99% 67
100% 74 (longest request)
AWS
1,000 requests with concurrency of 100/group
(10,000 requests would have taken too long)
C:\apps\apache-2.2.21\bin>ab -n 1000 -c 100 http://54.200.x.xxx:8080/rest/users/jdoe
Document Path: /rest/users/jdoe
Document Length: 70 bytes
Concurrency Level: 100
Time taken for tests: 105.693 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 235000 bytes
HTML transferred: 70000 bytes
Requests per second: 9.46 [#/sec] (mean)
Time per request: 10569.305 [ms] (mean)
Time per request: 105.693 [ms] (mean, across all concurrent requests)
Transfer rate: 2.17 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 98 105 3.8 106 122
Processing: 103 9934 1844.8 10434 10633
Waiting: 103 5252 3026.5 5253 10606
Total: 204 10040 1844.9 10540 10736
Percentage of the requests served within a certain time (ms)
50% 10540
66% 10564
75% 10588
80% 10596
90% 10659
95% 10691
98% 10710
99% 10726
100% 10736 (longest request)
Questions:
Connect time for AWS is 105 ms (avg) compared to 0 ms on the local network. I assume this is because it takes a lot more time to open a socket to AWS than to a server on a local network. Is there anything to be done here for better performance under load, assuming requests are coming in from multiple machines across the globe?
More serious is the server processing time: 45 ms for the local server compared to 9.9 seconds for AWS! I can't figure out what's going on here. The server is only pushing 9.46 requests/sec, which is peanuts!
Any insight into these issues much appreciated. I am nervous about putting a serious application on Node+AWS if it can't perform super fast on such a simple application.
For reference here's my server code:
var express = require('express');
var app = express();
app.get('/rest/users/:id', function(req, res) {
    var user = {
        username: req.params.id,
        firstName: 'John',
        lastName: 'Doe'
    };
    res.json(user);
});
app.listen(8080);
console.log('Listening on port 8080');
Edit
Single request sent in isolation (-n 1 -c 1)
Requests per second: 4.67 [#/sec] (mean)
Time per request: 214.013 [ms] (mean)
Time per request: 214.013 [ms] (mean, across all concurrent requests)
Transfer rate: 1.07 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 104 104 0.0 104 104
Processing: 110 110 0.0 110 110
Waiting: 110 110 0.0 110 110
Total: 214 214 0.0 214 214
10 requests all sent concurrently (-n 10 -c 10)
Requests per second: 8.81 [#/sec] (mean)
Time per request: 1135.066 [ms] (mean)
Time per request: 113.507 [ms] (mean, across all concurrent requests)
Transfer rate: 2.02 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 98 103 3.4 102 110
Processing: 102 477 296.0 520 928
Waiting: 102 477 295.9 520 928
Total: 205 580 295.6 621 1033
Results using wrk
As suggested by Andrey Sidorov, the results are MUCH better, at 2821 requests per second:
Running 30s test @ http://54.200.x.xxx:8080/rest/users/jdoe
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 137.04ms 48.12ms 2.66s 98.89%
Req/Sec 238.11 27.97 303.00 88.91%
84659 requests in 30.01s, 19.38MB read
Socket errors: connect 0, read 0, write 0, timeout 53
Requests/sec: 2821.41
Transfer/sec: 661.27KB
So it certainly looks like the culprit is ApacheBench! Unbelievable!
It's probably an ab issue (see also this question). There is nothing wrong with your server code. I suggest trying to benchmark with the wrk load-testing tool. Your example on my t1.micro:
./wrk -t12 -c400 -d30s http://some-amazon-hostname.com/rest/users/10
Running 30s test @ http://some-amazon-hostname.com/rest/users/10
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 333.42ms 322.01ms 3.20s 91.33%
Req/Sec 135.02 59.20 283.00 65.32%
48965 requests in 30.00s, 11.95MB read
Requests/sec: 1631.98
Transfer/sec: 407.99KB

Node.js does a lot better in ab without clustering, what am I missing?

Update 1: @BagosGiAr's tests with a quite similar configuration show that the cluster should always perform better. That means there is some problem with my configuration, and I'm asking you to help me find out what it could be.
Update 2: I'd like to dig deeper into this problem. I've tested on a LiveCD* (Xubuntu 13.04) with the same Node version. The first thing is that, on Linux, performance is way better than on Windows: -n 100000 -c 1000 gives me 6409.85 reqs/sec without cluster and 7215.74 reqs/sec with clustering. The Windows build definitely has a lot of problems. Still, I want to investigate why this is happening only to me, given that some people with a similar configuration perform better (and clustering performs well for them too).
*It should be noted that the LiveCD uses a RAM filesystem, while on Windows I was using a fast SSD.
How is this possible? Shouldn't the result be better with the cluster module? Specs: Windows 7 x64, Dual Core P8700 2.53GHz, 4GB RAM, Node.js 0.10.5, ab 2.3. The test command line is ab -n 10000 -c 1000 http://127.0.0.1:8080/.
var http = require('http');
http.createServer(function (req, res) {
    res.end('Hello World');
}).listen(8080);
Benchmark result ~ 2840.75 reqs/second:
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /
Document Length: 12 bytes
Concurrency Level: 1000
Time taken for tests: 3.520 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 870000 bytes
HTML transferred: 120000 bytes
Requests per second: 2840.75 [#/sec] (mean)
Time per request: 352.020 [ms] (mean)
Time per request: 0.352 [ms] (mean, across all concurrent requests)
Transfer rate: 241.35 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 7.1 0 505
Processing: 61 296 215.9 245 1262
Waiting: 31 217 216.7 174 1224
Total: 61 297 216.1 245 1262
Percentage of the requests served within a certain time (ms)
50% 245
66% 253
75% 257
80% 265
90% 281
95% 772
98% 1245
99% 1252
100% 1262 (longest request)
With cluster module:
var cluster = require('cluster'),
    http = require('http'),
    numCPUs = require('os').cpus().length;
if (cluster.isMaster) {
    // Fork one worker per CPU core
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('exit', function (worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
    });
} else {
    http.createServer(function (req, res) {
        res.end('Hello World');
    }).listen(8080);
}
... and with the same benchmark, the result is worse: 849.64 reqs/sec:
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /
Document Length: 12 bytes
Concurrency Level: 1000
Time taken for tests: 11.770 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 870000 bytes
HTML transferred: 120000 bytes
Requests per second: 849.64 [#/sec] (mean)
Time per request: 1176.967 [ms] (mean)
Time per request: 1.177 [ms] (mean, across all concurrent requests)
Transfer rate: 72.19 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 21.3 0 509
Processing: 42 1085 362.4 1243 2274
Waiting: 27 685 409.8 673 1734
Total: 42 1086 362.7 1243 2275
Percentage of the requests served within a certain time (ms)
50% 1243
66% 1275
75% 1286
80% 1290
90% 1334
95% 1759
98% 1772
99% 1787
100% 2275 (longest request)
You are not giving the port number 8080 in your URL address.
By default, port 80 is used when no port is given (8080 is the default port used by Apache Tomcat). Maybe another server is listening on port 80 on your machine.
Update
Machine specs: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz, 64GB RAM, CentOS Linux release 6.0 (Final), node -v 0.8.8, ab -V 2.3
I think the problem in your case is that either Windows is not using the resources efficiently, or the CPU or RAM is being saturated when you run the benchmark.
Without cluster (used the same script)
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.232.5.169 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software:
Server Hostname: 10.232.5.169
Server Port: 8000
Document Path: /
Document Length: 11 bytes
Concurrency Level: 1000
Time taken for tests: 3.196 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 860000 bytes
HTML transferred: 110000 bytes
Requests per second: 3129.14 [#/sec] (mean)
Time per request: 319.577 [ms] (mean)
Time per request: 0.320 [ms] (mean, across all concurrent requests)
Transfer rate: 262.80 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 3 43.0 0 2999
Processing: 1 81 39.9 81 201
Waiting: 1 81 39.9 81 201
Total: 12 84 57.8 82 3000
Percentage of the requests served within a certain time (ms)
50% 82
66% 103
75% 114
80% 120
90% 140
95% 143
98% 170
99% 183
100% 3000 (longest request)
With cluster (used your cluster script)
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.232.5.169 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software:
Server Hostname: 10.232.5.169
Server Port: 8000
Document Path: /
Document Length: 11 bytes
Concurrency Level: 1000
Time taken for tests: 1.056 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 924672 bytes
HTML transferred: 118272 bytes
Requests per second: 9467.95 [#/sec] (mean)
Time per request: 105.620 [ms] (mean)
Time per request: 0.106 [ms] (mean, across all concurrent requests)
Transfer rate: 854.96 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 22 47 13.6 46 78
Processing: 23 52 13.8 52 102
Waiting: 5 22 17.6 17 83
Total: 77 99 5.8 100 142
Percentage of the requests served within a certain time (ms)
50% 100
66% 101
75% 102
80% 102
90% 104
95% 105
98% 110
99% 117
100% 142 (longest request)
I assume this is a result of not using the concurrency option of ApacheBench. By default, ab makes one request at a time, so each request (in the cluster test) is served by one worker while the rest stay idle. If you use the -c option you will actually benchmark the cluster mode of Node.js,
e.g.
ab -n 10000 -c 4 -t 25 http://127.0.0.1:8083/
My results are:
Without cluster ab -n 10000 -t 25 http://127.0.0.1:8083/:
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8083
Document Path: /
Document Length: 11 bytes
Concurrency Level: 1
Time taken for tests: 16.503 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Total transferred: 4300000 bytes
HTML transferred: 550000 bytes
Requests per second: 3029.66 [#/sec] (mean)
Time per request: 0.330 [ms] (mean)
Time per request: 0.330 [ms] (mean, across all concurrent requests)
Transfer rate: 254.44 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 1
Processing: 0 0 0.4 0 13
Waiting: 0 0 0.4 0 11
Total: 0 0 0.5 0 13
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 1
80% 1
90% 1
95% 1
98% 1
99% 1
100% 13 (longest request)
With cluster ab -n 10000 -c 4 -t 25 http://127.0.0.1:8083/:
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8083
Document Path: /
Document Length: 11 bytes
Concurrency Level: 4
Time taken for tests: 8.935 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Total transferred: 4300000 bytes
HTML transferred: 550000 bytes
Requests per second: 5595.99 [#/sec] (mean)
Time per request: 0.715 [ms] (mean)
Time per request: 0.179 [ms] (mean, across all concurrent requests)
Transfer rate: 469.98 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 1
Processing: 0 1 0.6 1 17
Waiting: 0 0 0.6 0 17
Total: 0 1 0.6 1 18
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 1
95% 1
98% 1
99% 1
100% 18 (longest request)
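If you want to check the claim above, that with one request at a time only one worker is busy, a small variation of the worker branch of the cluster script logs which worker serves each request (a sketch; the log format is my own):
http.createServer(function (req, res) {
    // process.pid identifies the worker handling this request; run ab with
    // different -c values and watch how the pids spread across workers.
    console.log('served by worker ' + process.pid);
    res.end('Hello World');
}).listen(8083);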
Cheers!
EDIT
I forgot my specifications: Windows 8 x64, Intel Core i5-2430M @ 2.4GHz, 6GB RAM
