Low download speed - node.js

Hey guys, I am having a big problem and I need some advice.
I have a dedicated server with these specifications:
Atom C2750 8c/8t, 2.4 / 2.6 GHz
16 GB RAM DDR3 1600 MHz
12 TB storage
500 Mbps bandwidth
1 Gbps network burst
I am running a website using Node.js where users can download high-volume files.
The website has grown rapidly and I now have 10K users per day and an average of 1K concurrent users (downloads).
The problem is that clients are seeing lower and lower download speeds, so I added a throttle on downloads at 800 Kb/s. It helped a bit, but the problem remains the same. What should I do?
Thanks

You have to figure out where your worst bottleneck is. You will have to design some measurements, tests, and maybe some calculations to help determine that.
Here are some possibilities:
Total server bandwidth over your server ethernet connection to the Internet. If you have 1K users all trying to download something and you have 500Mbps total bandwidth, then you're only going to get 0.5Mbps or 500Kbps per user. At that point, you need to either shrink the data, reduce the number of users or increase your server bandwidth.
Server CPU. This should be easy to detect. Check your CPU utilization. If your node.js process is using 100% of a CPU, then you are CPU bound and you either need a faster computer or, if you have a multi-core processor, you can cluster your server on the same host to get more CPUs working for you (see the sketch after this list). If you don't have a multi-core processor, then get one or cluster across multiple servers (the idea is you need more CPUs). Though I have no idea if you are CPU bound (I suspect you're more likely bandwidth bound), this Atom C2750 has 8 cores so it would be good for clustering, but each core is not particularly fast compared to other Intel CPUs.
Network card. It's possible that your network card could be holding you back and not fully saturating your bandwidth. For example, if you only had a 100Mbps network connection to your server, then that's the max bandwidth you can use. If you think you should have a 1Gbps network connection to your server, then you need to make sure you actually are getting that fast a link.
FYI, the 1Gbps Network Burst probably doesn't help you much if you have lots of users downloading stuff over a longer period of time. That is most useful for a sudden and short peak of activity, not for a continuous high load.
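If clustering turns out to be the right lever, a minimal sketch of what it could look like with Node's built-in cluster module is below; the port number and the trivial HTTP handler are placeholders for illustration, not the poster's actual download code:

    // Minimal sketch: one worker per CPU core, all sharing a single listening port.
    // The port and the handler are placeholders, not the real download app.
    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isMaster) { // called cluster.isPrimary on newer Node versions
      // Fork one worker per core; the master process only supervises.
      for (let i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }
      cluster.on('exit', (worker) => {
        console.log(`worker ${worker.process.pid} died, restarting`);
        cluster.fork();
      });
    } else {
      // Each worker runs its own event loop and serves requests independently.
      http.createServer((req, res) => {
        res.end(`served by worker ${process.pid}\n`);
      }).listen(8000);
    }

Keep in mind that clustering only helps if the CPU really is the bottleneck; it does nothing for a saturated 500 Mbps uplink.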

Related

Node.js slowing down when a process consumes a large amount of memory

So, I have a Node.js process which, when scaled, consumes around 5-8 GB of RAM. It runs within a Docker container and is launched with the flag --max-old-space-size=12192 to raise the Node process's heap limit.
The memory consumption itself is OK, since I am trying to use a dedicated server (AMD EPYC CPU, 64 GB memory) instead of horizontal scaling with AWS or another cloud provider, because it is 10x cheaper if I can make it work on the dedicated server (most of the expenses on AWS/Google Cloud go to network traffic, while the VDS has unlimited traffic; the network side is already optimised with GraphQL and by minimising the number of requests). The process itself works through a huge amount of data in memory, in a multithreaded fashion. There is no further significant optimisation to be had in the process code itself.
When the process memory consumption reaches 3 GB+, it slows down significantly. Docker is not limiting the container's resources. The server itself is running at 5-10% load in terms of memory and CPU, and the SSD shows a low drive load (little I/O on the server side).
I guess rewriting the app in Go, for example, might improve things significantly, but that is really a lot of work.
Can anything be done on the server setup / Node.js app side to prevent the slowdown?
Thanks!
Solved by horizontal scaling within a single VDS with Docker.
As jfriend00 noted, the nature of the problem is probably garbage collection; personally I have no other guess.
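If garbage collection is the suspect, a quick way to check is to watch GC pauses and heap growth from inside the process. This is a minimal sketch using Node's built-in perf_hooks and v8 modules; the 10-second sampling interval is an arbitrary choice for illustration:

    // Minimal sketch: log GC pauses and heap usage to see whether GC time
    // grows as the heap approaches several gigabytes.
    const { PerformanceObserver } = require('perf_hooks');
    const v8 = require('v8');

    // Report every garbage-collection event and how long it paused the process.
    const obs = new PerformanceObserver((list) => {
      for (const entry of list.getEntries()) {
        console.log(`GC pause: ${entry.duration.toFixed(1)} ms`);
      }
    });
    obs.observe({ entryTypes: ['gc'] });

    // Sample heap statistics every 10 seconds (arbitrary interval).
    setInterval(() => {
      const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
      console.log(
        `heap used: ${(used_heap_size / 1e9).toFixed(2)} GB / ` +
        `limit: ${(heap_size_limit / 1e9).toFixed(2)} GB`
      );
    }, 10000);

If the logged pauses keep getting longer as the heap grows, splitting the work across several smaller processes, as was ultimately done here, is the usual workaround.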

CPU utilization in performance testing

I am doing performance testing on an app. I found that when the number of virtual users increases, the response time increases linearly (that should be natural, right?), but the CPU utilization stops increasing once it reaches around 60%. Does that mean the CPU is the bottleneck? If not, what could be the bottleneck?
The bottleneck might or might not be the CPU; you need to monitor other OS metrics as well, to wit:
Physical RAM
Swap usage
Network IO
Disk IO
Each of them could be the bottleneck.
Also, when you increase the number of users, an ideal system should increase the number of TPS (transactions per second) by the same factor. When you increase virtual users and TPS does not increase, that situation is called the saturation point, and you need to find out what is slowing your system down.
If resource utilization is far from 95-100% and your system still shows large response times, the reason can be non-optimal application code, a slow database query, or something like that; in that case you will need to use profiling tools to get to the bottom of the issue.
See the How to Monitor Your Server Health & Performance During a JMeter Load Test article for more information on monitoring the application under test.
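Whichever tool drives the load, it helps to sample the machine's vitals on the same timeline as the response times. As a rough illustration only, not tied to any particular test tool, a couple of those metrics can be polled from a small Node.js script; swap, disk I/O and network I/O still need OS-level tools such as those the linked article covers:

    // Rough illustration: sample the CPU load average and memory headroom every
    // 5 seconds while a load test runs, so they can be lined up against the
    // response-time graph. Swap, disk I/O and network I/O need OS-level tools.
    const os = require('os');

    setInterval(() => {
      const [load1] = os.loadavg();              // 1-minute load average
      const freeGb = os.freemem() / 1024 ** 3;   // unused physical RAM
      const totalGb = os.totalmem() / 1024 ** 3;
      console.log(
        `${new Date().toISOString()} load1=${load1.toFixed(2)} ` +
        `mem=${(totalGb - freeGb).toFixed(1)}/${totalGb.toFixed(1)} GB`
      );
    }, 5000);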

Solr I/O increases over time

I am running around eight Solr (version 3.5) server instances behind a load balancer. All servers are identical and the LB is weighted by number of connections. The servers hold around 4M documents and receive a constant flow of queries. When a Solr server starts, it works fine, but after some time running it starts to take longer to respond to queries, and the server I/O goes crazy, up to 100%. Look at the New Relic graphic:
If the servers behave well in the beginning, why do they start to fail after some time? And if I restart a server, it goes back to low I/O for some time, and this repeats over and over.
The answer to this question is related to the content in this blog post.
What happens in this case is that queries depend heavily on reading the Solr indexes. These indexes live on disk, so I/O is high. To optimize disk access, the Linux OS keeps a cache in memory of the most-accessed disk areas, using memory that is free (not occupied by applications) for this cache. When that memory fills up, the server has to read from disk again. This is why things improve when Solr restarts: the JVM occupies less memory, so there is more free space for the disk cache.
(The problem occurred on a server with 15 GB of RAM and a 20 GB Solr index.)
The solution is to simply increase the server's RAM so that the whole index fits into memory and no disk I/O is required.
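To see this effect in numbers on a Linux host, you can compare the kernel's page-cache size against the on-disk index size. A minimal Node.js sketch for reading those figures (any language, or a plain grep of /proc/meminfo, would do the same) might look like this:

    // Minimal sketch (Linux only): show how much memory the kernel is using as
    // page cache. If "Cached" stays well below the ~20 GB index size, queries
    // keep falling through to the disk and I/O climbs.
    const fs = require('fs');

    const meminfo = fs.readFileSync('/proc/meminfo', 'utf8');
    for (const key of ['MemTotal', 'MemFree', 'Cached']) {
      const match = meminfo.match(new RegExp(`^${key}:\\s+(\\d+) kB`, 'm'));
      if (match) {
        console.log(`${key}: ${(Number(match[1]) / 1024 / 1024).toFixed(1)} GB`);
      }
    }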

Nodejs on heavy load

I am using Node.js with socket.io for my chat system. My hardware is a 13.6 GHz CPU and 16 GB of RAM.
When the online user count reaches 600, some users can't connect to the socket and can't send messages, and some users get disconnected from the chat.
How can I resolve this problem? What is your opinion on this problem?
First, I'm not sure how you have a clock speed of 13.6 GHz for a single thread. I'd assume your CPU has multiple cores, or your mobo supports multiple processor sockets, and 13.6 is simply a sum. (8.2 GHz was a world record that was set July 23rd, 2013.)
Secondly, I'd ask yourself why the disconnect is happening.
Are you watching processor load - is a single thread maxing out its processor allocation (i.e. 100% usage on a single core)?
How's your RAM consumption; is it climbing, and has the OS offloaded memory onto the page file/swap partition - could there be a memory leak?
Is your network bandwidth capped? Has it reached its maximum capacity?
My high-level recommendations are:
Make sure your application is non-blocking. This means using asynchronous methods whenever possible (a small sketch of the difference follows below). By design, however, all of Node.js' Net methods are asynchronous.
Consider clustering your application and using a shared port. This is possible with Node.js child processes using Cluster. It will distribute the CPU load across multiple cores. You don't have to worry about load balancing (e.g. round-robin, fastest, ratio); handling a client is on a first-available, first-served basis: whichever Node.js process can handle the client's request first wins.
Verify your NIC has enough throughput to handle the load. If it is configured for / auto-negotiating as 10BASE-T or 100BASE-TX half-duplex, you could be in trouble.
Ultimately, you need to perform more diagnostics to isolate the issue. Curiosity, digging, patience and research will lead you to the answer. Your question is far too open-ended to be provided with an exact answer - it is more theoretical. There are also too many variables to pinpoint an exact cause.
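To illustrate the first recommendation, here is a minimal sketch of the difference between a blocking and a non-blocking handler; the file name and the port are placeholders for illustration, not anything from the original chat system:

    // Minimal sketch: a blocking handler stalls the whole event loop for every
    // connected client, while a non-blocking one lets other sockets be serviced.
    // "chatlog.txt" and port 3000 are placeholders for illustration only.
    const http = require('http');
    const fs = require('fs');

    http.createServer((req, res) => {
      if (req.url === '/blocking') {
        // BAD: readFileSync blocks the single JavaScript thread; with 600 users
        // online, every other socket waits while this file is read.
        const data = fs.readFileSync('chatlog.txt');
        res.end(data);
      } else {
        // BETTER: the asynchronous version hands the read off and keeps the
        // event loop free to service other connections in the meantime.
        fs.readFile('chatlog.txt', (err, data) => {
          if (err) {
            res.statusCode = 500;
            return res.end('read failed');
          }
          res.end(data);
        });
      }
    }).listen(3000);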

Theoretical limit of file descriptors in Linux

I'm running a dedicated proxy server with Squid, and I'm trying to get a feel for the maximum number of connections that the server can handle. I've realized this comes down to available file descriptors on the Linux machine.
I've found plenty of information on increasing maximum file descriptors, but I'd like to find out the theoretical maximum. According to the StackOverflow question "Why do operating systems limit file descriptors?", it comes down to available system RAM, which makes plenty of sense.
Now, given how much RAM I have available, how can I determine a maximum value for file descriptors for the operating system? Some value which would obviously still allow the system to run stably.
Perhaps someone might have an idea given other high-end production servers? What is the 'norm' for maxing out the potential number of simultaneous connections (file descriptors)? Any insight into how I can max-out file descriptors for a Linux system would be greatly appreciated.
You have many limits.
Multiplexing. This shouldn't be an issue if your application uses a decent backend. Libev claims to multiplex with 350us latency at 100,000 file descriptors.
Application speed. Even 1 ms of application latency per request (which is pretty low) at that scale means serving 100,000 requests one after another would take almost two minutes in optimum conditions.
Bandwidth. Depending on your application and protocol efficiency, this may be a problem. You say it's a squid proxy... if you're proxying websites: a client with no cache requesting a website can receive anywhere from a few hundred KB to several MB. If the average full page per client were 500 KB (about 4 megabits), you'd max out a full gigabit connection at roughly 250 requests per second; even at a leaner 60 KB or so per page you'd cap near 2000 per second. This might be your limiting factor.
2000 file descriptors is a fairly small amount. I've seen simple apps in languages like Python scale to over 3000 active connections on a single processor core without bad latency.
You can test your squid proxy with software like apachebench running on multiple client computers to get some realistic numbers. It's pretty easy to crank your file descriptor limit up to 2000+ and see what happens, and whether it even makes a difference afterwards.
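If you want to see the actual ceilings on a given box rather than a theoretical one, the kernel exposes them under /proc. Here is a minimal Node.js sketch (Linux only) that just prints them; reading the same files with cat works equally well:

    // Minimal sketch (Linux only): print the system-wide and per-process
    // file-descriptor ceilings straight from the kernel's /proc interface.
    const fs = require('fs');

    // System-wide ceiling on open file handles.
    const fileMax = fs.readFileSync('/proc/sys/fs/file-max', 'utf8').trim();
    // Currently allocated handles, unused handles, and the ceiling again.
    const fileNr = fs.readFileSync('/proc/sys/fs/file-nr', 'utf8').trim();
    // Per-process limit ("Max open files") as seen by this process.
    const openFiles = fs.readFileSync('/proc/self/limits', 'utf8')
      .split('\n')
      .find((line) => line.startsWith('Max open files'));

    console.log('fs.file-max :', fileMax);
    console.log('fs.file-nr  :', fileNr);
    console.log(openFiles);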
