Connect exception in Gatling - What does this mean? - performance-testing

I ran the config below in Gatling from my local machine to verify 20K requests per second:
scn
.inject(
atOnceUsers(20000)
)
It gave the errors below in the report. What does this mean in Gatling?
j.n.ConnectException: Can't assign requested address: /xx.xx.xx:xxxx 3648 83.881 %
j.n.ConnectException: connection timed out: /xx.xx.xx:xxxx 416 9.565 %
status.find.is(200), but actually found 500 201 4.622 %
j.u.c.TimeoutException: Request timeout to not-connected after 60000ms 84 1.931 %
Are these timeouts happening because the server is not processing the requests, or because the requests are not leaving my local machine?

Most probably yes, that's the reason.
It seems your simulation compiled successfully and started.
If you look at the error messages, you will see percentages after each line (83.881 %, 9.565 %, 1.931 %). This means the requests were actually generated and sent, and some of them failed. The percentages are computed relative to the total number of failures.
If some of the requests are OK and you get these errors, then Gatling did its job. It stress tested your application.
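The statement that the percentages are relative to the total number of failures can be checked against the report in the question. A minimal sketch in Python (used here only for illustration; the variable names are made up, and the counts are copied from the question):

```python
# Failure counts copied from the Gatling report in the question.
failures = {
    "ConnectException: Can't assign requested address": 3648,
    "ConnectException: connection timed out": 416,
    "status.find.is(200), but actually found 500": 201,
    "TimeoutException: Request timeout to not-connected": 84,
}

total_failures = sum(failures.values())

# Share of each error among all failed requests, in percent.
shares = {
    name: round(100 * count / total_failures, 3)
    for name, count in failures.items()
}

for name, share in shares.items():
    print(f"{share:7.3f} %  {name}")
```

The shares come out as 83.881, 9.565, 4.622 and 1.931, matching the report. The four lines therefore describe the 4349 failed requests, not all 20000 requests.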
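Try simulating with a lower number of users, for example:
scn
.inject(
rampUsers(20) over (10 seconds)
)
If that works, then the setup cannot handle 20000 requests at once. Note that "Can't assign requested address" is raised on the client side, so part of the limit may be your local machine rather than the application.
For more info on how to set up a simulation see here.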

Related

Python aiohttp has been receiving SSL transport errors

We have an application running that relies heavily on asyncio.
It sends hundreds of get requests per minute to mostly the same host, but with different urls.
For about 3 weeks, we have observed the following issues:
The process gets stuck, often for up to (exactly) 2400 seconds.
We observe the following error in the logging:
2018-12-07T23:37:33Z ERROR base_events.py: Fatal error on SSL transport protocol:
File "/usr/lib64/python3.6/asyncio/sslproto.py", line 638, in _process_write_backlog ssldata, offset = self._sslpipe.feed_appdata(data, offset)
Python version: 3.6.3
aiohttp version: 3.4.4
Question 1: Does anyone know what is going on here? And how can we get rid of those nasty periods of the process getting stuck ... ? (Or how to debug?)
Question 2: Can this be related?: https://bugs.python.org/issue29406
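On the "how to debug / how to avoid getting stuck" part: one generic way to keep a single hung request from stalling the process for 2400 seconds is to put a deadline around each awaited call with asyncio.wait_for. This is only a sketch of that idea and does not address the root cause in the SSL transport. The functions stuck_request, quick_request, fetch_with_deadline and run are hypothetical stand-ins invented here, not part of aiohttp.

```python
import asyncio

async def fetch_with_deadline(make_request, deadline_s):
    """Await make_request(), but give up after deadline_s seconds."""
    try:
        return await asyncio.wait_for(make_request(), timeout=deadline_s)
    except asyncio.TimeoutError:
        # The inner coroutine is cancelled; log and move on instead of hanging.
        return None

async def stuck_request():
    # Hypothetical stand-in for a request whose transport never answers.
    await asyncio.sleep(3600)
    return "never reached"

async def quick_request():
    # Hypothetical stand-in for a request that completes normally.
    await asyncio.sleep(0)
    return "ok"

def run(coro):
    # Explicit loop handling so the sketch also works on Python 3.6.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()

print(run(fetch_with_deadline(quick_request, 1.0)))
print(run(fetch_with_deadline(stuck_request, 0.05)))
```

Logging every request that hits the deadline (URL, timestamp) would also show whether the stalls cluster around particular hosts or times.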

Spring webservice, single WSDL but different WS providers, performance issue

I am facing a performance issue. In my project, I have a web service client that calls a hardware entity to get its status and other parameter values. I am using SOAP-based Spring WS.
I have approximately 5000 devices that I need to call in parallel, using 100-500 threads at a time.
A single call takes less than 5 seconds per device, which is expected.
But with multi-threading, the time per device keeps increasing from 5 seconds to 30 seconds and beyond, even to more than 100 seconds. It takes more than 30 minutes for all devices, which should take less than 2 minutes as per the requirement.
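As a sanity check on these numbers, here is a rough back-of-the-envelope model in Python. It assumes every call takes the full 5 seconds and that the threads work through the devices in lockstep batches; both are simplifications made for this sketch, and the function names are invented here.

```python
import math

def ideal_wall_time_s(devices, threads, seconds_per_call):
    """Best-case total time if calls run in full batches of `threads`."""
    batches = math.ceil(devices / threads)
    return batches * seconds_per_call

def min_threads_for_budget(devices, seconds_per_call, budget_s):
    """Smallest thread count whose best-case time fits into budget_s."""
    batches_allowed = budget_s // seconds_per_call
    return math.ceil(devices / batches_allowed)

for threads in (100, 209, 500):
    print(threads, "threads ->", ideal_wall_time_s(5000, threads, 5), "s")

print("threads needed for 2 min:", min_threads_for_budget(5000, 5, 120))
```

Under these assumptions 100 threads need 250 s even in the best case, so the 2 minute target requires roughly 209 or more truly concurrent calls; 500 threads would take about 50 s. The observed 30 minutes is far beyond any of these, which points at contention rather than at the thread count alone.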
We have a different URI for each device, so we get the URI dynamically and use Spring's WebServiceTemplate method marshalSendAndReceive(String uri, Object requestPayload, WebServiceMessageCallback requestCallback).
The WebServiceTemplate object is a singleton.
There is only one WSDL, but each device is a different WS provider.
Somewhere I read that it might be an issue with marshallers, so I increased the number of marshaller objects for the singleton WebServiceTemplate object, but this didn't help either.
Please share any ideas for solving this issue, and let me know if you need more information.
Elaborating some more about the question:
Thanks hagrawal. Yes, threads by themselves cannot increase the response time, but somewhere the threads are spending time that I cannot account for; it is spent in the calls to the actual web service that talks to the devices. I measured the start and end time of that call and found that for the first few hundred devices the time taken is less than 3-4 seconds, but after that it keeps increasing for further devices.
I have also checked the JVM and could not find any memory-related issue, but I did find many threads blocked multiple times. It looks like these blocked threads consume most of the time. I took the stack trace of those blocked threads, shown below.
pool-111757-thread-1 [13184] (BLOCKED)
sun.security.ssl.Handshaker.calculateConnectionKeys line: 1266
sun.security.ssl.Handshaker.calculateKeys line: 1112
sun.security.ssl.ClientHandshaker.serverHelloDone line: 1078
sun.security.ssl.ClientHandshaker.processMessage line: 348
sun.security.ssl.Handshaker.processLoop line: 979
sun.security.ssl.Handshaker.process_record line: 914
sun.security.ssl.SSLSocketImpl.readRecord line: 1062
sun.security.ssl.SSLSocketImpl.performInitialHandshake line: 1375
sun.security.ssl.SSLSocketImpl.startHandshake line: 1403
sun.security.ssl.SSLSocketImpl.startHandshake line: 1387
org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket line: 275
org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket line: 254
org.apache.http.impl.conn.HttpClientConnectionOperator.connect line: 123
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect line: 318
Again, just to note: the time taken increases in the method that calls the actual web service of the devices.

Node.js struggling with lots of concurrent connections

I'm working on a somewhat unusual application where 10k clients are precisely timed to all try to submit data at once, every 3 mins or so. This 'ab' command fairly accurately simulates one barrage in the real world:
ab -c 10000 -n 10000 -r "http://example.com/submit?data=foo"
I'm using Node.js on Ubuntu 12.4 on a rackspacecloud VPS instance to collect these submissions. However, I'm seeing some very odd behavior from Node, even when I remove all my business logic and turn the http request into a no-op.
When the test gets about 90% done, it hangs for a long period of time. Strangely, this happens consistently at 90% - for c=n=10k, at 9000; for c=n=5k, at 4500; for c=n=2k, at 1800. The test actually completes eventually, often with no errors. But both ab and node logs show continuous processing up till around 80-90% of the test run, then a long pause before completing.
When node is processing requests normally, CPU usage is typically around 50-70%. During the hang period, CPU goes up to 100%. Sometimes it stays near 0. Between the erratic CPU response and the fact that it seems unrelated to the actual number of connections (only the % complete), I do not suspect the garbage collector.
I've tried this running 'ab' on localhost and on a remote server - same effect.
I suspect something related to the TCP stack, possibly involving closing connections, but none of my configuration changes have helped. My changes:
ulimit -n 999999
When I listen(), I set the backlog to 10000
Sysctl changes are:
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_max_orphans = 20000
net.ipv4.tcp_max_syn_backlog = 10000
net.core.somaxconn = 10000
net.core.netdev_max_backlog = 10000
I have also noticed that I tend to get this msg in the kernel logs:
TCP: Possible SYN flooding on port 80. Sending cookies. Check SNMP counters.
I'm puzzled by this msg since the TCP backlog queue should be deep enough to never overflow. If I disable syn cookies the "Sending cookies" goes to "Dropping connections".
I speculate that this is some sort of linux TCP stack tuning problem and I've read just about everything I could find on the net. Nothing I have tried seems to matter. Any advice?
Update: Tried with tcp_max_syn_backlog, somaxconn, netdev_max_backlog, and the listen() backlog param set to 50k with no change in behavior. Still produces the SYN flood warning, too.
Are you running ab on the same machine running node? If not do you have a 1G or 10G NIC? If you are, then aren't you really trying to process 20,000 open connections?
Also, if you are setting net.core.somaxconn to 10,000, do you have absolutely no other sockets open on that machine? If you do, then 10,000 is not high enough.
Have you tried using the Node.js cluster module to spread the open connections across several processes?
I think you might find this blog post, and the previous ones, useful:
http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/
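The point about running ab on the same machine can be illustrated with a small loopback experiment: every TCP connection has two endpoints, and when client and server share a host, both endpoints count against that host's socket and file descriptor limits. A minimal sketch in Python (the helper name open_loopback_connections is made up for this example, and 100 connections are used instead of 10k to keep it light):

```python
import socket

def open_loopback_connections(n):
    """Open n TCP connections to a listener inside this same process."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(n)
    address = listener.getsockname()
    client_side, server_side = [], []
    for _ in range(n):
        client_side.append(socket.create_connection(address))
        conn, _peer = listener.accept()
        server_side.append(conn)
    return listener, client_side, server_side

listener, client_side, server_side = open_loopback_connections(100)
sockets_in_use = len(client_side) + len(server_side)
print(f"{len(client_side)} connections -> {sockets_in_use} sockets on this machine")

# Release all descriptors again.
for s in client_side + server_side + [listener]:
    s.close()
```

Scaled up, 10,000 local ab connections mean about 20,000 sockets on the box, which is what the question above is hinting at.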

Does WGET timeout?

I'm running a PHP script via cron using Wget, with the following command:
wget -O - -q -t 1 http://www.example.com/cron/run
The script will take a maximum of 5-6 minutes to do its processing. Will WGet wait for it and give it all the time it needs, or will it time out?
According to the wget man page, there are a couple of options related to timeouts, and there is a default read timeout of 900s, so I would say that, yes, it could time out.
Here are the options in question:
-T seconds
--timeout=seconds
Set the network timeout to seconds seconds. This is equivalent to specifying --dns-timeout, --connect-timeout, and --read-timeout, all at the same time.
And for those three options:
--dns-timeout=seconds
Set the DNS lookup timeout to seconds seconds. DNS lookups that don't complete within the specified time will fail. By default, there is no timeout on DNS lookups, other than that implemented by system libraries.
--connect-timeout=seconds
Set the connect timeout to seconds seconds. TCP connections that take longer to establish will be aborted. By default, there is no connect timeout, other than that implemented by system libraries.
--read-timeout=seconds
Set the read (and write) timeout to seconds seconds. The "time" of this timeout refers to idle time: if, at any point in the download, no data is received for more than the specified number of seconds, reading fails and the download is restarted. This option does not directly affect the duration of the entire download.
I suppose using something like
wget -O - -q -t 1 --timeout=600 http://www.example.com/cron/run
should make sure there is no timeout before your script has finished.
(Yeah, that's probably the most brutal solution possible ^^ )
The default timeout is 900 seconds. You can specify a different timeout.
-T seconds
--timeout=seconds
The default is to retry 20 times. You can specify a different number of tries.
-t number
--tries=number
link: wget man document
Prior to version 1.14, wget timeout arguments were not adhered to if downloading over https due to a bug.
Since in your question you said it's a PHP script, maybe the best solution could be to simply add in your script:
ignore_user_abort(TRUE);
This way, even if wget terminates, the PHP script keeps running, at least until it exceeds the max_execution_time limit (ini directive: 30 seconds by default).
As for wget, you should not need to change its timeout anyway: according to the UNIX manual, the default wget timeout is 900 seconds (15 minutes), which is much larger than the 5-6 minutes you need.
None of the wget timeout values have anything to do with how long it takes to download a file.
If the PHP script that you're triggering sits there idle for 5 minutes and returns no data, wget's --read-timeout will trigger if it's set to less than the time it takes to execute the script.
If you are actually downloading a file, or if the PHP script sends some data back, like a ... progress indicator, then the read timeout won't be triggered as long as the script is doing something.
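The difference between an idle (read) timeout and a limit on total duration can be reproduced with plain sockets. The following Python sketch is not wget itself: it uses a throwaway local server and a client whose socket timeout plays the role of --read-timeout. The names serve_once and download, and all timing values, are made up for the illustration. Unlike wget, the sketch simply stops on a timeout instead of retrying.

```python
import socket
import threading
import time

def serve_once(server, chunks, gap_s):
    """Accept one client, then send `chunks` bytes, idling gap_s before each."""
    conn, _peer = server.accept()
    with conn:
        try:
            for _ in range(chunks):
                time.sleep(gap_s)
                conn.sendall(b".")
        except OSError:
            pass  # the client already gave up

def download(chunks, gap_s, read_timeout_s):
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    worker = threading.Thread(target=serve_once, args=(server, chunks, gap_s))
    worker.start()
    client = socket.create_connection(server.getsockname())
    client.settimeout(read_timeout_s)  # applies to each recv(), not to the whole transfer
    received = 0
    started = time.monotonic()
    try:
        while True:
            data = client.recv(1024)
            if not data:  # server closed the connection: transfer finished
                break
            received += len(data)
        outcome = "complete"
    except socket.timeout:
        outcome = "read timeout"
    finally:
        client.close()
        worker.join()
        server.close()
    return outcome, received, time.monotonic() - started

# Data keeps trickling in: the transfer outlasts the 0.3 s timeout and still completes.
steady = download(chunks=10, gap_s=0.05, read_timeout_s=0.3)
# Server stays silent for 0.6 s: the 0.3 s idle timeout fires before any data arrives.
silent = download(chunks=1, gap_s=0.6, read_timeout_s=0.3)
print(steady[0], steady[1])
print(silent[0], silent[1])
```

The first transfer takes about half a second in total, longer than the 0.3 s timeout, and still completes because no single gap between reads exceeds the timeout. The second one fails because the server is silent for longer than the timeout, which is the situation of a PHP script that sends nothing until it is done.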
wget --help tells you:
-T, --timeout=SECONDS set all timeout values to SECONDS
--dns-timeout=SECS set the DNS lookup timeout to SECS
--connect-timeout=SECS set the connect timeout to SECS
--read-timeout=SECS set the read timeout to SECS
So if you use --timeout=10 it sets the timeouts for DNS lookup, connecting, and reading bytes to 10s.
When downloading files you can set the timeout value pretty low and as long as you have good connectivity to the site you're connecting to you can still download a large file in 5 minutes with a 10s timeout. If you have a temporary connection failure to the site or DNS, the transfer will time out after 10s and then retry (if --tries aka -t is > 1).
For example, here I am downloading a file from NVIDIA that takes 4 minutes to download, and I have wget's timeout values set to 10s:
$ time wget --timeout=10 --tries=1 https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
--2021-07-02 16:39:21-- https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3057439068 (2.8G) [application/octet-stream]
Saving to: ‘cuda_11.2.2_460.32.03_linux.run.1’
cuda_11.2.2_460.32.03_linux.run.1 100%[==================================================================================>] 2.85G 12.5MB/s in 4m 0s
2021-07-02 16:43:21 (12.1 MB/s) - ‘cuda_11.2.2_460.32.03_linux.run.1’ saved [3057439068/3057439068]
real 4m0.202s
user 0m5.180s
sys 0m16.253s
4m to download, timeout is 10s, everything works just fine.
In general, timing out DNS, connections, and reads using a low value is a good idea. If you leave it at the default value of 900s you'll be waiting 15m every time there's a hiccup in DNS or your Internet connectivity.

How do I stress test a web form file upload?

I need to test a web form that takes a file upload.
The filesize in each upload will be about 10 MB.
I want to test if the server can handle over 100 simultaneous uploads, and still remain responsive for the rest of the site.
Repeated form submissions from our office will be limited by our local DSL line.
The server is offsite with higher bandwidth.
Answers based on experience would be great, but any suggestions are welcome.
Use the ab (ApacheBench) command-line tool that is bundled with Apache (I have just discovered this great little tool). Unlike cURL or wget, ApacheBench was designed for performing stress tests on web servers (any type of web server!).
It generates plenty of statistics too. The following command will send an HTTP POST request with the contents of test.jpg as the request body to http://localhost/ 100 times, with up to 4 concurrent requests.
Note that -p posts the file as the raw body, so a form that expects multipart/form-data needs a pre-built multipart body in the file and a matching content type set with -T.
ab -n 100 -c 4 -p test.jpg http://localhost/
It produces output like this:
Server Software:
Server Hostname: localhost
Server Port: 80
Document Path: /
Document Length: 0 bytes
Concurrency Level: 4
Time taken for tests: 0.78125 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Non-2xx responses: 100
Total transferred: 2600 bytes
HTML transferred: 0 bytes
Requests per second: 1280.00 [#/sec] (mean)
Time per request: 3.125 [ms] (mean)
Time per request: 0.781 [ms] (mean, across all concurrent requests)
Transfer rate: 25.60 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   2.6      0      15
Processing:     0    2   5.5      0      15
Waiting:        0    1   4.8      0      15
Total:          0    2   6.0      0      15
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 15
95% 15
98% 15
99% 15
100% 15 (longest request)
Automate Selenium RC using your favorite language. Start 100 threads of Selenium, each typing the path of a file into the input and clicking submit.
You could generate 100 sequentially named files to make looping over them easy, or just use the same file over and over again.
I would perhaps guide you towards using cURL and submitting just random stuff (like, read 10MB out of /dev/urandom and encode it into base32), through a POST-request and manually fabricate the body to be a file upload (it's not rocket science).
Fork that script 100 times, perhaps over a few servers. Just make sure that sysadmins don't think you are doing a DDoS, or something :)
Unfortunately, this answer remains a bit vague, but hopefully it helps by nudging you onto the right track.
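To make the "manually fabricate the body" step a little less vague, here is a minimal sketch in Python (standard library only) that builds a multipart/form-data body around 10 MB of random bytes. The form field name "file", the file name "random.bin" and the helper name build_multipart_upload are assumptions made for this example; the real form will have its own field name. Random bytes can go into the body as-is, so no base32 encoding is needed here.

```python
import os
import uuid

def build_multipart_upload(field_name, filename, payload,
                           content_type="application/octet-stream"):
    """Return (headers, body) for a single-file multipart/form-data POST."""
    boundary = uuid.uuid4().hex  # random delimiter, very unlikely to occur in the payload
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    body = head + payload + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body

# 10 MB of random data, as suggested above (os.urandom instead of /dev/urandom).
payload = os.urandom(10 * 1024 * 1024)
headers, body = build_multipart_upload("file", "random.bin", payload)
print(headers["Content-Type"])
print(len(body), "bytes in request body")
```

The resulting headers and body can be sent with any HTTP client, or the body can be written to a file and posted with a tool that accepts a raw request body. Running many such senders in parallel, ideally from several machines, gives the simultaneous uploads.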
Continued as per Liam's comment:
If the server receiving the uploads is not on the same LAN as the clients connecting to it, it would be better to use nodes that are as remote as possible for stress testing, if only to simulate realistic behavior. But if you don't have access to computers outside the local LAN, the local LAN is always better than nothing.
Stress testing from the same hardware would not be a good idea, as you would put double load on the server: generating the random data, packing it, and sending it through the TCP/IP stack (although probably not over Ethernet), and only then can the server do its magic. If the sending part is outsourced, the receiving end gets double the performance (taken with an arbitrarily sized grain of salt).

Resources