Node.js: max number of open websockets on the server

I am finally trying to benchmark my Node.js application. To benchmark it, I open one Node.js instance where I run my server, and another where I run my benchmark.
The benchmark opens a number of websockets; the server waits for all of them to be open, gets the data from them, and only after having all the data crunches some numbers.
I have tested it with 2,000 websockets and it works fine; then I tried with 10,000 and it took 38 minutes to open all the connections.
This being my first Node.js/websocket project, I was wondering whether that is a reasonable timing (I guess so), or whether 38 minutes means that something went wrong in my implementation.
SHORT VERSION: is 38 minutes "too long" to open 10,000 websockets on a Node.js server?
I am testing on a server with two AMD Opteron(TM) 6272 processors and 96 GiB of DDR3-667 RAM (12 x 8 GB DIMMs), but I am not doing any load balancing, i.e. my server (I think) runs on only one core.
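One common cause of a very slow ramp-up is opening each connection only after the previous one is fully established (or, at the other extreme, firing all 10,000 at once and overflowing the listen backlog). Below is a sketch of a benchmark client that keeps a bounded number of connection attempts in flight. `openSocket` is a self-contained stub; in a real benchmark it would be replaced by an actual websocket connect, e.g. `new WebSocket(url)` from the `ws` package (an assumption about the setup, not something stated in the question).

```javascript
// Sketch: open `total` connections with at most `limit` attempts in
// flight at once. `openSocket` is a stand-in for a real connect call;
// here it just resolves on the next event-loop turn.
let inFlight = 0;
let peak = 0;

function openSocket(id) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  return new Promise(resolve =>
    setImmediate(() => { inFlight--; resolve(id); })
  );
}

async function openAll(total, limit) {
  const opened = [];
  let next = 0;
  async function worker() {
    // Each worker pulls the next connection slot off a shared counter,
    // so at most `limit` attempts are ever in flight simultaneously.
    while (next < total) {
      const id = next++;
      opened.push(await openSocket(id));
    }
  }
  await Promise.all(Array.from({ length: limit }, () => worker()));
  return opened;
}
```

With a bounded limit of, say, 100-200 in-flight attempts, 10,000 sockets typically open in seconds rather than minutes on a local network, so 38 minutes does suggest an implementation or OS-limit problem rather than an inherent Node.js cost.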

Related

Efficiency of multithreading and TCP socket coding

I developed a server which accepts many incoming TCP client connections and transfers data between them. Client A may send some data via connection A targeting client B; the server forwards the data to connection B so that client B gets it.
I am doing my testing on a PC with an Intel Core i7-8700 CPU @ 3.2 GHz and 32 GB RAM.
I launched two of these servers and have 50 threads constantly running.
Each thread repeats the following steps at an interval of 150~300 ms:
Randomly disconnects from and connects to one of the servers.
When a thread connects to a server, the server creates a new record in a SQL Server database.
When a thread disconnects from a server, the server updates that record in the database.
Sends 16 KB of data to another randomly picked connection, and receives the 16 KB back.
This is what I observed about the two servers in the Windows Task Manager:
There is no memory leak; memory stabilizes around 28 MB.
CPU usage varies between 8% and 18%.
Power usage varies between "moderate" and "very high".
My question is:
I have not worked with apps using TCP connections and threads before, so I have no feel for whether these figures look unreasonably high. If they are, then I must have done the coding in a very inefficient way.
Can you tell?

Socket.io hangs after 2k concurrent connections

I have purchased a VPS with 2 vCPU cores and 4 GB of RAM and deployed a Node.js Socket.io server on it. It works fine without any issue up to 2k concurrent connections, but that limit seems very small to me. When the connection count reaches 3k, the Socket.io server hangs and stops working.
Normally memory usage is around 300 MB, but after 3k connections memory usage climbs to 2.5 GB, the server stops emitting packets for several seconds, then works for a very few seconds and hangs again.
My server is not very small for this number of connections.
Are there any suggestions for optimisation, i.e. how to increase the number of concurrent connections without hanging after a few thousand clients connect simultaneously? For a few clients it works fine.

Load testing bottleneck on nodejs with Google Compute Engine

I cannot figure out the cause of the bottleneck on this site: very bad response times once about 400 users are reached. The site runs on Google Compute Engine, using an instance group with network load balancing. We created the project with Sails.js.
I have been doing load testing with Google Container Engine using Kubernetes, running the locust.py script.
The main results for one of the tests are:
RPS : 30
Spawn rate: 5 p/s
TOTALS USERS: 1000
AVG(res time): 27,500 ms (27.5 seconds!)
The response time initially is great, below one second, but when it starts reaching about 400 users the response time starts to jump massively.
I have tested obvious factors that can influence that response time, results below:
Compute engine Instances
(2 x standard-n2, 200gb disk, ram:7.5gb per instance):
Only about 20% cpu utilization used
Outgoing network bytes: 340k bytes/sec
Incoming network bytes: 190k bytes/sec
Disk operations: 1 op/sec
Memory: below 10%
MySQL:
Max_used_connections : 41 (below total possible)
Connection errors: 0
All other results for MySQL also seem fine, no reason to cause bottleneck.
I tried the same test with a freshly created Sails.js project, and it did better, but still had terrible results: a 5-second response time for about 2,000 users.
What else should I test? What could be the bottleneck?
Are you doing any file reading/writing? This is a major obstacle in Node.js and will always cause some issues. Caching read files, or removing the need for such code, should be done as much as possible. In my own experience, serving files like images, CSS and JS through my Node server started causing trouble when the number of concurrent requests increased. The solution was to serve all of this through a CDN.
Another problem could be the MySQL driver. We had some problems with connections not being closed correctly (not using Sails.js, but I think it used the same driver at the time I encountered this), so they would cause problems on the MySQL server, resulting in long delays when fetching data from the database. You should time/track the MySQL queries and make sure they aren't delayed.
Lastly, it could be an issue specific to the combination of Sails.js and Google Compute Engine. You should make sure there aren't any open issues on either project about the problem you are experiencing.
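One way to follow the query-timing advice is to wrap the driver's query function and record durations. This is a sketch: `queryFn` stands in for a real driver call (such as `connection.query` from the mysql package), and the 200 ms threshold is an arbitrary illustration.

```javascript
// Sketch: wrap an async query function so every call's duration is
// recorded, making slow database queries visible during a load test.
const timings = [];

function timed(queryFn) {
  return async (...args) => {
    const start = process.hrtime.bigint();
    try {
      return await queryFn(...args);
    } finally {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      timings.push(ms);
      if (ms > 200) console.warn(`slow query: ${ms.toFixed(1)} ms`);
    }
  };
}
```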

Why does node.js+mongodb not give 100 req/sec throughput for 100 requests sent in a second?

I kept the node.js server on one machine and the mongodb server on another. Requests were a mixture of 70% reads and 30% writes. I observed that at 100 requests per second the throughput is 60 req/sec, and at 200 requests per second the throughput is 130 req/sec. CPU and memory usage are the same in both cases. If the application can serve 130 req/sec, why did it not serve 100 req/sec in the first case, given that CPU and memory utilization are the same? The machines run Ubuntu Server 14.04.
Make user threads in JMeter and use "loop forever" for 300 seconds, then read the values.
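The suggestion above is a closed-loop measurement: a fixed number of workers issue requests back-to-back for a fixed duration, and throughput is completions divided by elapsed time. A self-contained sketch is below; `request` is stubbed with a 1 ms timer, and a real test would perform an actual HTTP call against the server.

```javascript
// Sketch: measure sustained throughput the "loop forever for a fixed
// duration" way. `request` stands in for a real HTTP request.
function request() {
  return new Promise(resolve => setTimeout(resolve, 1));
}

async function measure(workers, durationMs) {
  const deadline = Date.now() + durationMs;
  let completed = 0;
  async function loop() {
    // Each worker issues requests back-to-back until the deadline.
    while (Date.now() < deadline) {
      await request();
      completed++;
    }
  }
  await Promise.all(Array.from({ length: workers }, () => loop()));
  return completed / (durationMs / 1000); // requests per second
}
```

This also explains the observation in the question: below saturation, measured throughput tracks the offered load (more concurrent requests in flight means more completions per second), so 60 req/sec at an offered 100 req/sec and 130 req/sec at 200 can coexist with flat CPU and memory.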

Why does Windows Azure not scale?

I am trying to scale websites on Windows Azure. So far I've tested WordPress, Ghost (blog) and a plain HTML site, and it's all the same: if I scale them up (add instances), they don't get any faster. I am sure I must be doing something wrong...
This is what I did:
I created a new Shared website with a plain HTML Bootstrap template on it. http://demobootstrapsite.azurewebsites.net/
Then I installed ab.exe from the Apache project on a hosted bare-metal server (4 cores, 12 GB RAM, 100 MBit).
I ran the test twice, the first time with a single Shared instance and the second time with two Shared instances, using this command:
ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/
This means ab.exe is going to issue 10,000 requests, 100 of them concurrently.
I expected the response times of the test with two Shared instances to be significantly lower than with just one Shared instance. But the mean time per request even rose a bit, from 1452.519 ms with one Shared instance to 1460.631 ms with two. Later I even ran the site on 8 Shared instances with no effect at all. My first thought was that maybe the Shared instances were the problem, so I put the site on a Standard VM and ran the test again, but the problems remained the same. Adding more instances didn't make the site any faster there either (it was even a bit slower).
Later I watched a video with Scott Hanselman and Stefan Schackow in which they explained the Azure scaling features. Stefan says that Azure has a kind of "sticky load balancing" which always redirects a client to the same instance/VM to avoid compatibility problems with stateful applications. So I checked the web server logs and found a log file for every instance, each of about the same size, which usually means that every instance was used during the test.
PS: During the test run I checked the response time of the website from my local computer (from a different network than the server) and the response times were about 1.5 s.
Here are the test results:
######################################
1 instance result
######################################
PS C:\abtest> .\ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking demobootstrapsite.azurewebsites.net (be patient)
Finished 10000 requests
Server Software: Microsoft-IIS/8.0
Server Hostname: demobootstrapsite.azurewebsites.net
Server Port: 80
Document Path: /
Document Length: 16396 bytes
Concurrency Level: 100
Time taken for tests: 145.252 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 168800000 bytes
HTML transferred: 163960000 bytes
Requests per second: 68.85 [#/sec] (mean)
Time per request: 1452.519 [ms] (mean)
Time per request: 14.525 [ms] (mean, across all concurrent requests)
Transfer rate: 1134.88 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 14 8.1 16 78
Processing: 47 1430 93.9 1435 1622
Waiting: 16 705 399.3 702 1544
Total: 62 1445 94.1 1451 1638
Percentage of the requests served within a certain time (ms)
50% 1451
66% 1466
75% 1482
80% 1498
90% 1513
95% 1529
98% 1544
99% 1560
100% 1638 (longest request)
######################################
2 instances result
######################################
PS C:\abtest> .\ab.exe -n 10000 -c 100 http://demobootstrapsite.azurewebsites.net/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking demobootstrapsite.azurewebsites.net (be patient)
Finished 10000 requests
Server Software: Microsoft-IIS/8.0
Server Hostname: demobootstrapsite.azurewebsites.net
Server Port: 80
Document Path: /
Document Length: 16396 bytes
Concurrency Level: 100
Time taken for tests: 146.063 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 168800046 bytes
HTML transferred: 163960000 bytes
Requests per second: 68.46 [#/sec] (mean)
Time per request: 1460.631 [ms] (mean)
Time per request: 14.606 [ms] (mean, across all concurrent requests)
Transfer rate: 1128.58 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 14 8.1 16 78
Processing: 31 1439 92.8 1451 1607
Waiting: 16 712 402.5 702 1529
Total: 47 1453 92.9 1466 1622
Percentage of the requests served within a certain time (ms)
50% 1466
66% 1482
75% 1482
80% 1498
90% 1513
95% 1529
98% 1544
99% 1560
100% 1622 (longest request)
"Scaling" the website in terms of resources adds more capacity to accept more requests, and won't increase the speed at which a single capacity instance can perform when not overloaded.
For example; assume a Small VM can accept 100 requests per second, processing each request at 1000ms, (and if it was 101 requests per second, each request would start to slow down to say 1500ms) then scaling to more Small VMs won't increase the speed at which a single request can be processed, it just raises us to accepting 200 requests per second under 1000ms each (as now both machines are not overloaded).
For per-request performance; the code itself (and CPU performance of the Azure VM) will impact how quickly a single request can be executed.
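The capacity-versus-speed point can be made concrete with a toy model. All numbers are illustrative only, and it assumes perfectly even load balancing and linear degradation past capacity.

```javascript
// Toy model: adding instances raises the total request rate the
// deployment can absorb, but per-request latency stays flat until a
// single instance is pushed past its capacity.
function latencyMs(offeredRps, instances, perInstanceCapacityRps, baseLatencyMs) {
  const perInstance = offeredRps / instances; // even load balancing assumed
  if (perInstance <= perInstanceCapacityRps) return baseLatencyMs;
  // Over capacity: latency grows with the overload factor.
  return baseLatencyMs * (perInstance / perInstanceCapacityRps);
}
```

Under this model, 100 req/sec against one 100-req/sec instance already runs at base latency, so a second instance changes nothing; only an overloaded deployment (e.g. 200 req/sec on one instance) gets faster from scaling out, which matches the benchmark results above.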
Given the complete absence from the question of the most important detail of such a test, it sounds to me like you are merely testing your Internet connection's bandwidth. 10 Mbit/s is a very common rate.
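The ab output above is consistent with this hypothesis: both runs report a transfer rate of roughly 1130 KB/s regardless of instance count, which converts to just over 9 Mbit/s.

```javascript
// Convert ab's "Transfer rate" (KB/s, 1 KB = 1024 bytes) to Mbit/s.
function kbytesPerSecToMbits(kbps) {
  return (kbps * 1024 * 8) / 1e6;
}
```

A transfer rate pinned near the same value in every configuration is the classic signature of a saturated link on the test client's side rather than a server-side limit.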
No, it doesn't scale.
I usually run LogParser against the IIS logs generated during the load test and calculate the RPS and latency (the time-taken field) from those. This helps separate slowness caused by the network from server processing time and from the load-test tool's own reporting.
Some ideas:
Is Azure throttling to prevent a DoS attack? You are making a hell of a lot of requests from one location to a single page.
Try Small-sized Web Sites rather than Shared; capacity and scaling might be quite different. A load of 50 requests/sec doesn't seem terrible for a shared service.
Try to identify where that time is going; 1.4 s is a really long time.
Run load tests from several different machines simultaneously, to determine whether there's throttling going on or you're affected by sticky load balancing or other network artefacts.
You said it's OK under a load of about 10 concurrent requests at 50 requests/second. Gradually increase the load you're putting on the server to determine the point at which it starts to choke, and do this across multiple machines too.
Can you log on to Web Sites? Probably not... so see if you can replicate the same issues on a Cloud Service Web Role and analyze from there, using Performance Monitor and typical IIS tools, to see where the bottleneck is, or whether it's even on the machine rather than in the Azure network infrastructure.
Before you load test the websites, do a baseline test with a single instance, say with 10 concurrent threads, to check how the website behaves when not under load. Then use this baseline to understand how the websites behave under load.
For example, if the baseline shows the website responds in 1.5 s when not under load, and again in 1.5 s under load, then the website is handling the load easily. If under load the website takes 3-4 s with a single instance, it isn't handling the load so well; try adding another instance and check whether the response time improves.
You can test for free with Pingdom:
http://tools.pingdom.com/fpt/#!/ELmHA/http://demobootstrapsite.azurewebsites.net/
http://tools.pingdom.com/
Regards, Valentin