Can Node.js force larger POST request chunk size?

This may be beyond the scope of what Node.js can control, but any pointers to other solutions to this problem are appreciated as well.
I'm uploading large files to a Node.js service via HTTP POST. When this runs locally or on the LAN, the chunks the server receives are always 65536 bytes in size and uploads are fast.
When I upload the same file to the same code running on a remote server (a Google Cloud VM, or real hardware at a co-lo), the received chunks are between 1448 and 2869 bytes and uploads are much slower (far below the bandwidth limits of the connection).
I'm not sure where the decision to send smaller chunks across the WAN is being made: whether it's the client software (both curl and a Node.js client produce the same result), routing hardware in between slicing up the packets, or something else entirely.
I'm wondering if there is anything I can do to force larger chunks through to the server, or perhaps an alternative approach that overcomes the server thrash associated with processing these small chunks?
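For reference, a minimal sketch of the kind of handler where these chunk sizes show up (the port is illustrative, not the actual service):

    // Logs the size of every chunk Node.js delivers to the 'data' event.
    const http = require('http');

    http.createServer((req, res) => {
      if (req.method !== 'POST') { res.statusCode = 405; return res.end(); }
      let total = 0;
      req.on('data', (chunk) => {
        total += chunk.length;
        console.log(`chunk: ${chunk.length} bytes`); // 65536 on the LAN, ~1448-2869 over the WAN
      });
      req.on('end', () => res.end(`received ${total} bytes\n`));
    }).listen(3000);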

After a lot of experimentation and discussion with a network engineer, the heart of the issue is the limit imposed by MTU size once the connection goes beyond a direct Ethernet link.
My solution was to modify the application to use multiple simultaneous requests to essentially "fill the gaps" created by the network overhead of the smaller chunks. The result was substantially faster network transfers at the cost of a little increased complexity.
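A rough sketch of that approach, assuming the server exposes an endpoint that can reassemble ranged parts (the host, path, and part count below are illustrative):

    // Split one large file into ranges and upload them as concurrent POSTs,
    // so several TCP connections keep the WAN pipe full at once.
    const fs = require('fs');
    const http = require('http');

    function uploadPart(filePath, start, end, partIndex) {
      return new Promise((resolve, reject) => {
        const req = http.request({
          host: 'upload.example.com', // illustrative
          port: 8080,
          path: `/upload?part=${partIndex}&offset=${start}`,
          method: 'POST',
          headers: { 'Content-Type': 'application/octet-stream' },
        }, (res) => {
          res.resume(); // drain the response body
          res.on('end', resolve);
        });
        req.on('error', reject);
        fs.createReadStream(filePath, { start, end }).pipe(req); // 'end' is inclusive
      });
    }

    async function uploadFile(filePath, parts = 4) {
      const size = fs.statSync(filePath).size;
      const partSize = Math.ceil(size / parts);
      const jobs = [];
      for (let i = 0; i < parts; i++) {
        const start = i * partSize;
        const end = Math.min(start + partSize - 1, size - 1);
        jobs.push(uploadPart(filePath, start, end, i));
      }
      await Promise.all(jobs); // the server reassembles parts by offset
    }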

Related

Http response time with nodejs compression is taking more time than without compression

I am trying to run a load test with JMeter against my Node.js server. I found that the average HTTP response time for N concurrent users, and its standard deviation, are higher when using the compression module than without it. Is this normal, and what could be the reason for it?
Compression benefits you when the time saved by sending fewer bytes outweighs the extra CPU time the compression takes. If you're running a local test over a fast network, then the bandwidth savings of compression may not be able to overcome the extra CPU load that the compression adds.
A local network test may not be representative of what would happen between a real set of clients and your server over a longer internet link that isn't as fast as a local network.
The slower the network link, the bigger the difference the compression might make. It also depends upon how large the http responses are. Small responses will not benefit much from compression either. Large responses have much more of a chance of benefiting from compression and even more on slower links.
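One practical knob, assuming an Express app using the npm compression middleware: only compress responses above a size threshold, so small payloads skip the CPU cost entirely (the threshold and route are illustrative).

    const express = require('express');
    const compression = require('compression');

    const app = express();

    // Skip compression for responses under ~10 KB; they gain little
    // from fewer bytes but still pay the CPU cost.
    app.use(compression({ threshold: 10 * 1024 }));

    app.get('/report', (req, res) => {
      // Large, repetitive payloads benefit most, especially on slow links.
      res.json({ rows: new Array(10000).fill({ ok: true }) });
    });

    app.listen(3000);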

Low download speed

Hey guys, I have a big problem and I need some advice.
I have a dedicated server with these specs:
Atom C2750, 8 cores / 8 threads, 2.4 / 2.6 GHz
16 GB DDR3 RAM, 1600 MHz
12 TB storage
500 Mbps bandwidth
1 Gbps network burst
I am running a website with Node.js where users download large files.
The website grew rapidly and I now have about 10K users per day and an average of 1K concurrent users (downloads).
The problem is that download speeds on the client side keep getting lower, so I throttled downloads to 800 Kb/s. That helped a bit, but the problem remains. What should I do?
Thanks
You have to figure out where your worst bottleneck is. You will have to design some measurements and tests and maybe some calculations to help determine where the bottleneck is.
Here are some possibilities:
Total server bandwidth over your server ethernet connection to the Internet. If you have 1K users all trying to download something and you have 500Mbps total bandwidth, then you're only going to get 0.5Mbps or 500Kbps per user. At that point, you need to either shrink the data, reduce the number of users or increase your server bandwidth.
Server CPU. This should be easy to detect. Check your CPU utilization. If your node.js process is using 100% of a CPU, then you are CPU bound, and you either need a faster computer or, if you have a multi-core processor, you can cluster your server on the same host to get more cores working for you (see the cluster sketch after this list). If you don't have a multi-core processor, get one or cluster across multiple servers; the idea is that you need more CPUs. Though I have no idea whether you are CPU bound (I suspect you're more likely bandwidth bound), the Atom C2750 has 8 cores, so it would be good for clustering, but each core is not particularly fast compared to other Intel CPUs.
Network card. It's possible that your network card could be holding you back and not fully saturating your bandwidth. For example, if you only had a 100Mbps network connection to your server, then that's the max bandwidth you can use. If you think you should have a 1Gbps network connection to your server, then you need to make sure you actually are getting that fast a link.
FYI, the 1Gbps Network Burst probably doesn't help you much if you have lots of users downloading stuff over a longer period of time. That is most useful for a sudden and short peak of activity, not for a continuous high load.
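A minimal sketch of the clustering idea, using Node's built-in cluster module (Node 16+ for cluster.isPrimary; the request handler is a placeholder for the real download logic):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isPrimary) {
      // One worker per core; all workers share the same listening port.
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
      cluster.on('exit', (worker) => {
        console.log(`worker ${worker.process.pid} died, starting a replacement`);
        cluster.fork();
      });
    } else {
      http.createServer((req, res) => {
        // The real app would stream the requested file here.
        res.end('ok\n');
      }).listen(8000);
    }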

Best way to manage big files downloads

I'm looking for the best way to manage the download of my products on the web.
Each of them weighs between 2 and 20 GB.
Each of them is downloaded approximately 1 to 1000 times a day by our customers.
I've tried to use Amazon S3 but the download speed isn't good and it quickly becomes expensive.
I've tried to use Amazon S3 + CloudFront but files are too large and the downloads too rare: the files didn't stay in the cache.
Also, I can't create torrent files in S3 because the file sizes are too large.
I guess the cloud solutions (such as S3, Azure, Google Drive...) work well only for small files, such as image / css / etc.
Now, I'm using my own servers. It works quite well but it is really more complex to manage...
Is there a better way, a perfect way to manage this sort of downloads?
This is a huge problem and we see it when dealing with folks in the movie or media business: they generate HUGE video files that need to be shared on a tight schedule. Some of them resort to physically shipping hard drives.
When "ordered and guaranteed data delivery" is required (e.g. HTTP, FTP, rsync, nfs, etc.) the network transport is usually performed with TCP. But TCP implementations are quite sensitive to packet loss, round-trip time (RTT), and the size of the pipe, between the sender and receiver. Some TCP implementations also have a hard time filling big pipes (limits on the max bandwidth-delay product; BDP = bit rate * propagation delay).
The ideal solution would need to address all of these concerns.
Reducing the RTT usually means reducing the distance between sender and receiver. Rule of thumb, reducing your RTT by half can double your max throughput (or cut your turnaround time in half). Just for context, I'm seeing an RTT from US East Coast to US West Coast as ~80-85ms.
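As a rough illustration of why RTT matters so much: a single TCP connection's throughput is capped at roughly the window size divided by the RTT (the numbers below are illustrative).

    // Ceiling on single-connection TCP throughput: window / RTT.
    const windowBytes = 64 * 1024; // 64 KB receive window
    const rttSeconds = 0.08;       // ~80 ms coast-to-coast RTT
    const maxBitsPerSec = (windowBytes * 8) / rttSeconds;
    console.log(`${(maxBitsPerSec / 1e6).toFixed(1)} Mbps`); // ~6.6 Mbps, regardless of link speed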
Big deployments typically use a content delivery network (CDN) like Akamai or AWS CloudFront, to reduce the RTT (e.g. ~5-15ms). Simply stated, the CDN service provider makes arrangements with local/regional telcos to deploy content-caching servers on-premise in many cities, and sells you the right to use them.
But control over a cached resource's time-to-live (TTL) can depend on your service level agreement ($). And cache memory is not infinite, so idle resources might be purged to make room for newly requested data, especially if the cache is shared with others.
In your case, it sounds to me like you want to meaningfully reduce the RTT while retaining full control of the cache behaviour, so you can set a really long cache TTL. The best price/performance solution IMO is to deploy your own cache servers running CentOS 7 + NGINX with proxy_cache turned on and enough disk space, one per major region (e.g. west coast and east coast). Your end users could select the region closest to them, or you could add some code to automatically detect the closest regional cache server.
Deploying these cache servers on AWS EC2 is definitely an option. Your end users will probably see much better performance than by connecting to AWS S3 directly, and there are no BW caps.
The current AWS pricing for your volume is about $0.09/GB for BW out to the internet. Assuming your ~50 files at an average of 10GB, that's about $50/month for BW from cache servers to your end users - not bad? You could start with c4.large for low/average usage regions ($79/month). Higher usage regions might cost you about ~$150/month (c4.xl), ~$300/month (c4.2xl), etc. You can get better pricing with spot instances and you can tune performance based on your business model (e.g. VIP vs Best-Effort).
In terms of being able to "fill the pipe" and sensitivity to network loss (e.g. congestion control, congestion avoidance), you may want to consider an optimized TCP stack like SuperTCP (full disclaimer, I'm the director of development). The idea here is to have a per-connection auto-tuning TCP stack with a lot of engineering behind it, so it can fill huge pipes like the ones between AWS regions, and not overreact to network loss as regular TCP often does, especially when sending to Wi-Fi endpoints.
Unlike UDP solutions, it's a single-sided install (<5 min), you don't get charged for hardware or storage, you don't need to worry about firewalls, and it won't flood/kill your own network. You'd want to install it on your sending devices: the regional cache servers and the origin server(s) that push new requests to the cache servers.
An optimized TCP stack can increase your throughput by 25%-85% over healthy networks, and I've seen anywhere from 2X to 10X throughput on lousy networks.
Unfortunately I don't think AWS is going to have a solution for you. At this point I would recommend looking into some other CDN providers like Akamai https://www.akamai.com/us/en/solutions/products/media-delivery/download-delivery.jsp that provide services specifically geared toward large file downloads. I don't think any of those services are going to be cheap though.
You may also want to look into file acceleration software, like Signiant Flight or Aspera (disclosure: I'm a product manager for Flight). Large files (multiple GB in size) can be a problem for traditional HTTP transfers, especially over large latencies. File acceleration software goes over UDP instead of TCP, essentially masking the latency and increasing the speed of the file transfer.
One negative to using this approach is that your clients will need to download special software to download their files (since UDP is not supported natively in the browser), but you mentioned they use a download manager already, so that may not be an issue.
Signiant Flight is sold as a service, meaning Signiant will run the required servers in the cloud for you.
With file acceleration solutions you'll generally see network utilization of about 80 - 90%, meaning 80 - 90 Mbps on a 100 Mbps connection, or 800 Mbps on a 1 Gbps network connection.

Does thread has limit to use the network bandwidth?

I heard there is some limit on how much network bandwidth a single thread can use. Is this true, and if so, is that the reason to use multithreaded programming to achieve maximum bandwidth?
The reason to use multithreading for network tasks is that one thread might be waiting for a response from the remote server. Creating multiple threads helps ensure that at least one of them is actively downloading data at any given time.
The usual reason for issuing more than one network request at a time (either explicitly with user threads, or implicitly with kernel threads and asynchronous callbacks) is that the effects of network latency can be minimised. Latency can have a large effect. A web connection, for example, needs a DNS lookup first, then a TCP 3-way connect, then some data transfer, and finally a 4-way close. If the page size is small and the bandwidth large compared with the latency, most of the time is spent waiting on protocol exchanges.
So, if you are crawling multiple servers, a multithreaded design is hugely faster even on a single-core machine. If you are downloading a single video file from one server, not so much.
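The same effect in Node.js terms, where the overlap comes from asynchronous I/O rather than explicit threads (the URLs are placeholders):

    const https = require('https');

    function download(url) {
      return new Promise((resolve, reject) => {
        https.get(url, (res) => {
          let size = 0;
          res.on('data', (chunk) => { size += chunk.length; });
          res.on('end', () => resolve({ url, size }));
        }).on('error', reject);
      });
    }

    const urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c'];

    // DNS lookups, TCP/TLS handshakes, and transfers all overlap, so the total
    // time is roughly that of the slowest single request, not the sum of all three.
    Promise.all(urls.map(download)).then(console.log);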

Theoretical limit of file descriptors in Linux

I'm running a dedicated proxy server with Squid, and I'm trying to get a feel for the maximum number of connections that the server can handle. I've realized this comes down to available file descriptors on the Linux machine.
I've found plenty of information on increasing maximum file descriptors, but I'd like to find out the theoretical maximum. According to the StackOverflow question "Why do operating systems limit file descriptors?", it comes down to available system RAM, which makes plenty of sense.
Now, given how much RAM I have available, how can I determine a maximum value for file descriptors for the operating system? Some value which would obviously still allow the system to run stably.
Perhaps someone might have an idea given other high-end production servers? What is the 'norm' for maxing out the potential number of simultaneous connections (file descriptors)? Any insight into how I can max-out file descriptors for a Linux system would be greatly appreciated.
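(On Linux, the kernel's current system-wide ceiling and usage can be read directly from /proc; a minimal sketch, assuming Linux:)

    const fs = require('fs');

    // System-wide maximum number of open file handles the kernel will allocate.
    const fileMax = fs.readFileSync('/proc/sys/fs/file-max', 'utf8').trim();

    // file-nr reports: allocated handles, free handles, and the maximum.
    const [allocated, , max] = fs
      .readFileSync('/proc/sys/fs/file-nr', 'utf8')
      .trim()
      .split(/\s+/);

    console.log(`file-max: ${fileMax}, allocated: ${allocated}, max: ${max}`);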
You have many limits.
Multiplexing. This shouldn't be an issue if your application uses a decent backend. Libev claims to multiplex with 350us latency at 100,000 file descriptors.
Application speed. A 1ms application latency at that scale (pretty low) per request would take almost two minutes to serve 100,000 requests in optimum conditions.
Bandwidth. Depending on your application and protocol efficiency, this may be a problem. You say it's a squid proxy... if you're proxying websites: a client with no cache requesting a website can receive anywhere from a few hundred KB to several MB. If your average full page delivered per client is 500 KB (about 4 Mbit), a full gigabit connection tops out at roughly 250 such requests per second, and if each transfer takes several seconds to reach a slow client, that is still on the order of 2000 connections open at once (see the quick calculation at the end of this answer). This might be your limiting factor.
2000 file descriptors is a fairly small amount. I've seen simple apps in languages like Python scale to over 3000 active connections on a single processor core without bad latency.
You can test your squid proxy with software like apachebench running on multiple client computers to get some realistic numbers. It's pretty easy to crank your file descriptor limit up to 2000+ and see what happens, and whether it even makes a difference afterwards.
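For quick what-if checks, the bandwidth ceiling above is just arithmetic (the values below are illustrative; plug in your own):

    const linkBitsPerSec = 1e9;          // 1 Gbps uplink
    const avgResponseBytes = 500 * 1024; // ~500 KB per full page load
    const avgTransferSeconds = 8;        // how long a slow client holds the connection open

    const requestsPerSec = linkBitsPerSec / (avgResponseBytes * 8);
    const concurrentConnections = requestsPerSec * avgTransferSeconds;

    console.log(requestsPerSec.toFixed(0));        // ~244 requests per second
    console.log(concurrentConnections.toFixed(0)); // ~1953 connections open at once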
