As you know, Windows 10 Pro supports only 10 concurrent connections at a time.
But my IIS-hosted app needs more connections than that. How can I remove this limit?
Related
I have both Linux and Windows 10 installed on a dual Xeon 6138 computer with 64GB of RAM.
I cannot access the computer immediately (because of the lockdown) but I strongly believe the Windows version is Windows 10 Enterprise. The system was last updated in late 2018, and not since.
Xeon 6138 specs are available here (basically, each CPU has 20 cores, for a total of 40 hyper-threads per CPU, and 80 in my dual setup):
https://ark.intel.com/content/www/fr/fr/ark/products/120476/intel-xeon-gold-6138-processor-27-5m-cache-2-00-ghz.html
When I run a CPU-intensive program in Linux on this setup, all 80 threads of my system are used (see attached image 1).
My question is: when I run the same program on Windows 10, compiled with VC++ 2017, the process can only saturate 40 of the 80 threads available on my system.
Why, and how can I have all 80 threads used? (I know there is the concept of processor groups on Windows, but most of the programs I use are simply not processor-group aware, and I just know that I can't change that).
I have a setup with 2 machines. I am using one as the server and the other as the client. They are connected directly using a 1 Gbps link. Both machines have 4 cores, 8 GB of RAM and almost 100 GB of disk space. I need to tune the Nginx server (it's the one I'm trying, but I can use any other as well) to handle 85000 concurrent connections. I have a 1 KB file on the server and I am using curl on the client to get the same file over all the connections.
After trying various tuning settings, I have 1500 established connections and around 30000 TIME_WAIT connections when I call curl around 40000 times. Is there a way to have those connections stay ESTABLISHED instead of ending up in TIME_WAIT?
Any help in tuning both the server and the client will be much appreciated. I am pretty new to Linux and trying to get the hang of it. The version of Linux on both machines is Fedora 20.
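To watch how many connections sit in ESTABLISHED versus TIME_WAIT during a run like this, one option is to tally the state column of /proc/net/tcp. This is only a rough sketch: the function name count_tcp_states is made up for illustration, and the hex codes are the Linux kernel's TCP state numbers (01 is ESTABLISHED, 06 is TIME_WAIT).

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp.

    count_tcp_states is a hypothetical helper name, not a system API.
    """
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        code = fields[3].upper()  # fourth column holds the state
        counts[TCP_STATES.get(code, code)] += 1
    return counts
```

On the server or the client you would feed it the contents of /proc/net/tcp (and /proc/net/tcp6 for IPv6 sockets) and compare the ESTABLISHED and TIME_WAIT counts between tuning attempts.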
Besides tuning Nginx, you will also need to tune your Linux installation with respect to limits on the number of TCP connections, sockets, open files, etc.
These two links should give you a great overview:
https://www.nginx.com/blog/tuning-nginx/
https://mrotaru.wordpress.com/2013/10/10/scaling-to-12-million-concurrent-connections-how-migratorydata-did-it/
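As a quick check of the open-files limit mentioned above, here is a small sketch using Python's standard resource module. The helper names and the headroom value of 64 descriptors are my own assumptions, and 85000 is simply the target from the question.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fds_sufficient(target_connections, soft_limit, headroom=64):
    """Check whether a soft limit leaves room for the target connection count.

    headroom is an assumed allowance for log files, listening sockets and
    similar descriptors; tune it to the actual server.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= target_connections + headroom


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft:", soft, "hard:", hard)
    print("enough for 85000 connections:", fds_sufficient(85000, soft))
```

Each connection costs one file descriptor in the process that owns it, so a soft limit of 1024 would stop a server long before 85000 connections regardless of any other tuning.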
You might want to check how much memory TCP buffers etc are using for all those connections.
See this SO thread: How much memory is consumed by the Linux kernel per TCP/IP network connection?
Also, this page is good: http://www.psc.edu/index.php/networking/641-tcp-tune
Given that your two machines are on the same physical network and delays are very low, you can use fairly small TCP window buffer sizes. Modern Linuxes (you didn't mention what kernel you're using) have TCP Autotuning that automatically adjusts these buffers, so you should not have to worry about this unless you're using an old kernel.
Regardless, the application(s) can allocate send and receive buffers explicitly, which disables TCP Autotuning for those sockets, so if you're running an application that does this, you might want to limit how much buffer space an application can request per connection (the net.core.wmem_max and net.core.rmem_max variables mentioned in the SO article).
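To get a feel for why those two caps matter, here is a back-of-the-envelope sketch. The function names and the overhead_factor parameter are assumptions of mine; the factor of 2 reflects the note in socket(7) that the kernel doubles the value an application sets with SO_SNDBUF or SO_RCVBUF. Treat the result as an upper bound on what applications could reserve, since buffer memory is only consumed while data is actually queued.

```python
def read_proc_int(path):
    """Read the first integer from a /proc/sys file, e.g. net/core/rmem_max."""
    with open(path) as f:
        return int(f.read().split()[0])


def worst_case_buffer_bytes(connections, rmem_max, wmem_max, overhead_factor=2):
    """Upper bound on socket buffer space if every connection asks for the maximum.

    rmem_max and wmem_max are the values of net.core.rmem_max and
    net.core.wmem_max in bytes. overhead_factor is an assumed multiplier
    for the kernel's doubling of explicitly requested buffer sizes.
    """
    return connections * (rmem_max + wmem_max) * overhead_factor


if __name__ == "__main__":
    rmem = read_proc_int("/proc/sys/net/core/rmem_max")
    wmem = read_proc_int("/proc/sys/net/core/wmem_max")
    total = worst_case_buffer_bytes(85000, rmem, wmem)
    print("worst case for 85000 connections: %.1f GiB" % (total / 2**30))
```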
I would recommend https://github.com/eunyoung14/mtcp to achieve 1 million concurrent connections. I did some tuning of mtcp and tested it on a used Dell PowerEdge R210 with 32 GB of RAM and 8 cores, and reached 1 million concurrent connections.
When I run a TCP server and client on the same machine, I observe that the client send time (that is: timestampT1, send(), timestampT2; timestampT2 - timestampT1) is significantly higher in the tail percentiles than if I run the same server on a different machine.
With all TCP parameters, software and machine specs being equal, if the client takes 10 microseconds in the mean and 20-25 microseconds in the 90-99th percentile for 1 million sends with the server and client on different boxes, it takes 10 microseconds in the mean and 70-100 microseconds in the 90-99th percentile with the server and client on the same box.
I have tried playing with interrupt isolation, socket send buffer sizing and CPU pinning with no significant improvements. This is RHEL 5.6.
Any possible explanation for this ?
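For anyone who wants to reproduce the measurement described above, here is a rough sketch of the timestamp, send(), timestamp loop with a percentile summary. It is not the original test program: the function names, the 64-byte payload and the use of a loopback connection inside a single process are all assumptions made to keep the example self-contained, and Python's own overhead will dominate the microsecond figures.

```python
import math
import socket
import time


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def measure_send_ns(count, payload=b"x" * 64):
    """Time `count` sends over a loopback TCP connection, in nanoseconds."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    samples = []
    try:
        for _ in range(count):
            t1 = time.perf_counter_ns()
            client.sendall(payload)
            t2 = time.perf_counter_ns()
            samples.append(t2 - t1)
            # Drain the receiver outside the timed region so buffers never fill.
            remaining = len(payload)
            while remaining:
                chunk = server.recv(remaining)
                if not chunk:
                    raise ConnectionError("peer closed the connection")
                remaining -= len(chunk)
    finally:
        client.close()
        server.close()
        listener.close()
    return samples


if __name__ == "__main__":
    data = measure_send_ns(100000)
    print("mean ns:", sum(data) // len(data))
    for p in (90, 99):
        print("p%d ns:" % p, percentile(data, p))
```

Comparing the p90 and p99 lines between a same-box run and a two-box run is the comparison the question is about; the mean alone hides the difference.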
Heisenberg uncertainty principle in a broad sense. More specifically, if you have two programs on a computer where one is sending data and the other is analyzing it, then you're taxing the CPU with two tasks, whereas if your monitoring program is running on a different computer, your sender has the benefit of not having to compete with anyone else and will always be faster.
Don't test network throughput with both programs on the same machine.
Are there any important changes in how SLES 10 implements TCP sockets vs. SLES 9?
I have several apps written in C# (.NET 3.5) that run on Windows XP and Windows Server 2003. They've been running fine for over a year, getting market data from a SLES 9 machine using a socket connection.
The machine was upgraded today to SLES 10 and it's causing some strange behavior. The socket normally returns a few hundred or thousand bytes every second. But occasionally, I stop receiving data. Ten or more seconds will go by with no data and then Receive returns with 10k+ bytes. And some buffer is causing data loss, because the bytes I receive on the socket no longer form a correct packet.
The only thing changed was the SLES 9 to 10 upgrade. And rolling back fixes this immediately. Any ideas?
The dropped packets can be resolved by upgrading the smp kernel to 2.6.16.60-0.37 or later. The BNX2 kernel module is the root cause of the dropped packets. This is a known issue with SLES 10 out of the box.
Reference: http://www.novell.com/support/search.do?cmd=displayKC&sliceId=SAL_Public&externalId=7002506
The defaults for /proc/sys/net settings may have changed. Maybe newer SLES enables things like tcp_ecn?
If your network is dropping some packets it doesn't like with SLES10, then it's probably enabling newer TCP features. Otherwise I don't know. I'd look at it with tcpdump/wireshark. And maybe strace the server process to see what system calls it was doing.
SLES is the sender, so it's possible something could have changed that made it decide to wait until it had a full window of data or something. But 10k is too much. Sounds more like dropped packets, and then a large return when a missing packet finally arrives, allowing the queued up data to be returned too.
Does anyone have an idea how many TCP socket connections are possible on a modern standard Linux server?
(There is generally little traffic on each connection, but all the connections have to be up all the time.)
I achieved 1600k concurrent idle socket connections, and at the same time 57k req/s, on a Linux desktop (16G RAM, i7 2600 CPU). It's a single-threaded HTTP server written in C with epoll. Source code is on GitHub, and there is a blog post here.
Edit:
I did 600k concurrent HTTP connections (client and server both on the same computer) with Java/Clojure. Details are in the post; HN discussion: http://news.ycombinator.com/item?id=5127251
The cost of a connection (with epoll):
The application needs some RAM per connection.
TCP buffers: 2 * 4k ~ 10k, or more.
epoll needs some memory per file descriptor. From epoll(7):
"Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes on a 64-bit kernel."
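Putting those three costs together as a rough estimate (a sketch only: the function name is made up, the defaults are the ~10k TCP buffer and 160-byte epoll figures quoted above, and the per-connection application memory is something you have to measure for your own server):

```python
def connection_memory_bytes(connections, app_bytes,
                            tcp_buffer_bytes=10 * 1024, epoll_bytes=160):
    """Estimate total memory for a number of open connections.

    app_bytes is whatever the application keeps per connection (an
    assumption you must supply). The two defaults are the rough figures
    for TCP buffers and for a registered epoll descriptor on a 64-bit
    kernel; they are estimates, not guarantees.
    """
    per_connection = app_bytes + tcp_buffer_bytes + epoll_bytes
    return connections * per_connection


if __name__ == "__main__":
    total = connection_memory_bytes(1_000_000, app_bytes=0)
    print("%.1f GB for 1 million connections" % (total / 1e9))
```

With no application memory at all, one million connections already come to 10,400,000,000 bytes (about 10.4 GB) by this estimate, so the TCP buffers dominate unless the application state per connection is large.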
This depends not only on the operating system in question, but also on configuration, potentially real-time configuration.
For Linux:
cat /proc/sys/fs/file-max
will show the current maximum number of file descriptors total allowed to be opened simultaneously. Check out http://www.cs.uwaterloo.ca/~brecht/servers/openfiles.html
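If you want the same number from a program, along with how many handles are currently in use, /proc/sys/fs/file-nr is the companion file: it holds three numbers (allocated handles, allocated but unused handles, and the maximum). A small sketch, with helper names of my own choosing:

```python
def parse_file_max(text):
    """Parse the contents of /proc/sys/fs/file-max into an int."""
    return int(text.strip())


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr into a dict of its three fields."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def read_text(path):
    """Return the full contents of a small /proc file."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    print("file-max:", parse_file_max(read_text("/proc/sys/fs/file-max")))
    print("file-nr:", parse_file_nr(read_text("/proc/sys/fs/file-nr")))
```

Keep in mind this is the system-wide ceiling; each process is additionally bound by its own open-files limit (ulimit -n).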
A limit on the number of open sockets is configurable in the /proc file system
cat /proc/sys/fs/file-max
The maximum number of incoming connections in the OS is defined by integer limits.
Linux itself allows billions of open sockets.
To use the sockets you need an application listening, e.g. a web server, and that will use a certain amount of RAM per socket.
RAM and CPU will introduce the real limits (as of 2017, think millions, not billions).
1 million is possible, but not easy. Expect to use X Gigabytes of RAM to manage 1 million sockets.
Outgoing TCP connections are limited by port numbers ~65000 per IP. You can have multiple IP addresses, but not unlimited IP addresses.
This is a limit in TCP not Linux.
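The port arithmetic can be sketched like this. The function names are made up; /proc/sys/net/ipv4/ip_local_port_range is the Linux setting that holds the range of source ports used for outgoing connections, and the bound applies per destination address and port, since a connection is identified by source IP, source port, destination IP and destination port.

```python
def parse_port_range(text):
    """Parse the 'low high' pair from ip_local_port_range."""
    low, high = (int(x) for x in text.split())
    return low, high


def ephemeral_ports(low, high):
    """Number of source ports in the inclusive range low..high."""
    return high - low + 1


def max_outgoing_connections(source_ips, low, high):
    """Upper bound on outgoing connections to ONE destination IP and port."""
    return source_ips * ephemeral_ports(low, high)


if __name__ == "__main__":
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        low, high = parse_port_range(f.read())
    print("ports per source IP:", ephemeral_ports(low, high))
    print("with 4 source IPs:", max_outgoing_connections(4, low, high))
```

With the common Linux default range of 32768 to 60999 this gives 28232 ports per source IP, noticeably fewer than the ~65000 theoretical maximum, which is why load-test clients often widen the range or add IP addresses.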
10,000? 70,000? is that all :)
FreeBSD is probably the server you want. Here's a little blog post about tuning it to handle 100,000 connections. It has had some interesting features like zero-copy sockets for some time now, along with kqueue to act as a completion port mechanism.
Solaris could handle 100,000 connections back in the last century! They say Linux would be better.
The best description I've come across is this presentation/paper on writing a scalable webserver. He's not afraid to say it like it is :)
Same for software: the cretins on the application layer forced great innovations on the OS layer. Because Lotus Notes keeps one TCP connection per client open, IBM contributed major optimizations for the "one process, 100.000 open connections" case to Linux. And the O(1) scheduler was originally created to score well on some irrelevant Java benchmark. The bottom line is that this bloat benefits all of us.
On Linux you should be looking at using epoll for async I/O. It might also be worth fine-tuning socket-buffers to not waste too much kernel space per connection.
I would guess that you should be able to reach 100k connections on a reasonable machine.
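To illustrate the epoll idea, here is a small Linux-only sketch using Python's select.epoll. It is not a server: it uses local socket pairs as stand-ins for client connections (an assumption made to keep it self-contained), registers all of them, makes a few of them active, and lets a single epoll call report just the active ones. The function name is made up.

```python
import select
import socket


def ready_indices(num_connections, active, timeout=1.0):
    """Return the indices of connections that epoll reports as readable.

    num_connections socket pairs stand in for client connections, and
    `active` lists which of them send one byte. Only those should be
    reported, no matter how many idle connections are registered.
    """
    ep = select.epoll()
    pairs = [socket.socketpair() for _ in range(num_connections)]
    fd_to_index = {}
    try:
        for i, (reader, _writer) in enumerate(pairs):
            ep.register(reader.fileno(), select.EPOLLIN)
            fd_to_index[reader.fileno()] = i
        for i in active:
            pairs[i][1].send(b"x")  # make this connection readable
        events = ep.poll(timeout)   # one call covers every registered fd
        return sorted(fd_to_index[fd] for fd, mask in events
                      if mask & select.EPOLLIN)
    finally:
        ep.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()


if __name__ == "__main__":
    print(ready_indices(200, [3, 42]))
```

The point is that the cost of each poll call tracks the number of ready descriptors rather than the number of registered ones, which is what makes very large numbers of mostly idle connections practical.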
It depends on the application. If there are only a few packets from each client, 100K is very easy for Linux. An engineer on my team did a test years ago; the result showed that when no packets arrive from the clients after the connections are established, Linux epoll can watch 400k fds for readability at a CPU usage level under 50%.
Which operating system?
For Windows machines, if you're writing a server to scale well, and therefore using I/O Completion Ports and async I/O, then the main limitation is the amount of non-paged pool that you're using for each active connection. This translates directly into a limit based on the amount of memory that your machine has installed (non-paged pool is a finite, fixed-size amount that is based on the total memory installed).
For connections that don't see much traffic, you can make them more efficient by posting 'zero byte reads', which don't use non-paged pool and don't affect the locked pages limit (another potentially limited resource that may prevent you from having lots of socket connections open).
Apart from that, well, you will need to profile but I've managed to get more than 70,000 concurrent connections on a modestly specified (760MB memory) server; see here http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html for more details.
Obviously, if you're using a less efficient architecture such as 'thread per connection' or 'select', then you should expect to achieve less impressive figures; but, IMHO, there's simply no reason to select such architectures for Windows socket servers.
Edit: see here http://blogs.technet.com/markrussinovich/archive/2009/03/26/3211216.aspx; the way that the amount of non-paged pool is calculated has changed in Vista and Server 2008 and there's now much more available.
Realistically, for an application, more than 4000-5000 open sockets on a single machine becomes impractical. Just checking for activity on all the sockets and managing them starts to become a performance issue, especially in real-time environments.