Spring web service, single WSDL but different WS providers, performance issue - multithreading

I am facing a performance issue. In my project, I have a web service client that calls a hardware device to get its status and other parameter values. I am using SOAP-based Spring WS.
I have approximately 5000 devices that I need to call in parallel, using 100-500 threads at a time.
A single call takes less than 5 seconds per device, which is expected.
But with multithreading, the time per device keeps increasing, from 5 seconds to 30 seconds and eventually to more than 100 seconds. Covering all devices takes more than 30 minutes, when the requirement is less than 2 minutes.
Each device has a different URI, which we obtain dynamically, so we use Spring's WebServiceTemplate method marshalSendAndReceive(String uri, Object requestPayload, WebServiceMessageCallback requestCallback).
The WebServiceTemplate object is a singleton.
There is only one WSDL, but each device is a different WS provider.
I read somewhere that the marshallers might be the issue, so I increased the number of marshaller objects for the singleton WebServiceTemplate object, but this didn't help either.
Please share any ideas for solving this issue. If more information is needed, please let me know.
Elaborating on the question:
Thanks hagrawal. Yes, threads by themselves should not increase the response time, but the threads are spending time somewhere that I do not understand. The time is spent during the call to the actual web service that talks to the devices. I recorded the start and end time of that call and found that for the first few hundred devices the time taken is less than 3-4 seconds, but after that it keeps increasing for further devices.
I have also checked the JVM and could not find any memory-related issue, but I did find many threads that were blocked multiple times. It looks like these blocked threads account for most of the time. I took the stack trace of those blocked threads, shown below.
pool-111757-thread-1 [13184] (BLOCKED)
sun.security.ssl.Handshaker.calculateConnectionKeys line: 1266
sun.security.ssl.Handshaker.calculateKeys line: 1112
sun.security.ssl.ClientHandshaker.serverHelloDone line: 1078
sun.security.ssl.ClientHandshaker.processMessage line: 348
sun.security.ssl.Handshaker.processLoop line: 979
sun.security.ssl.Handshaker.process_record line: 914
sun.security.ssl.SSLSocketImpl.readRecord line: 1062
sun.security.ssl.SSLSocketImpl.performInitialHandshake line: 1375
sun.security.ssl.SSLSocketImpl.startHandshake line: 1403
sun.security.ssl.SSLSocketImpl.startHandshake line: 1387
org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket line: 275
org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket line: 254
org.apache.http.impl.conn.HttpClientConnectionOperator.connect line: 123
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect line: 318
Again, just to note: the time taken keeps increasing in the method that calls the actual web service of the devices.
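As a rough sanity check on the numbers in the question, the sketch below computes the best-case wall time for 5000 devices at 5 seconds per call, and the smallest thread count that could meet the 2-minute requirement. It assumes every call takes the same time and that threads never block each other, which the stack trace above shows is not the case here. The function names are hypothetical.

```python
import math


def ideal_wall_time(devices: int, seconds_per_call: float, threads: int) -> float:
    """Best-case wall time: devices are processed in waves of `threads` calls."""
    waves = math.ceil(devices / threads)
    return waves * seconds_per_call


def min_threads(devices: int, seconds_per_call: float, deadline_seconds: float) -> int:
    """Smallest thread count whose best-case wall time fits within the deadline."""
    waves_allowed = math.floor(deadline_seconds / seconds_per_call)
    return math.ceil(devices / waves_allowed)


# Figures taken from the question: 5000 devices, under 5 s per call, 2 min target.
for threads in (100, 250, 500):
    print(threads, "threads ->", ideal_wall_time(5000, 5.0, threads), "s")

print("threads needed for 120 s:", min_threads(5000, 5.0, 120.0))
```

Under these assumptions 100 threads cannot meet the target even in the best case (250 seconds), while the observed 30 minutes is far beyond any of these figures, which points at contention rather than thread count.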

Related

ESP32: BLE transmission speed is very slow

I am trying to build an Android app that interfaces with the ESP32 using BLE. I am using the RxBluetoothKotlin library from Vincent Masselis for the Android side. For the ESP32 side, I am using the default Kolban libraries that are included in the Arduino IDE. My phone is a OnePlus 5T and my ESP32 is a MH ET Live ESP32DevKIT. My Android app can be found here, and my ESP32 program here.
The whole system works pretty much perfectly for me in terms of pure functionality. That is to say, every button does what it's supposed to do, and I get the exact behaviour I had expected to get. However, the communication itself is very slow, around 200 bytes/second. My test button in the Android app requests a bunch of text data from the ESP32, and displays this in a dialog. It also lists a number which represents the time between request and reception in milliseconds. Using this, I get around 2 seconds for 440 bytes of data. When I send less data, the time decreases approximately linearly with data size. 40 bytes of data will take around 200ms, and 20 bytes or under typically takes less than 100ms.
This seems rather slow to me. From what I understand, I should be able to get at least a few kilobytes per second. I have tried to check the speed using nRF Connect, but I get the same 2-second timespan for my data transfer. This suggests that the problem is not in my app, since I also have it with a completely different app. I also moved the code from my main loop into callbacks (which I probably should have done in the first place), but this didn't change things at all. I have tried taking the microcontroller and my phone to a few different locations, hoping to eliminate interference. I have tried to mess with BLEDevice::setPower and BLEDevice::setMTU, as well as setting RxBluetoothGatt.requestMtu(500) on the Android side. Everything so far seems to have had little to no effect. The only thing that had any effect was adding the line "pServer->updatePeerMTU(0,500);" in my loop during the connection phase. This caused the first 23 bytes of data to be repeated whenever I pressed the test button in my app, and made the data transfer take about 3 seconds. If I'm lucky, I can get maybe a bit under 1.8 seconds for 440 bytes, but this is a very small change when I'm expecting an order of magnitude of difference, and might even be down to pure chance rather than anything I did.
Does anyone have an idea of how to increase my transfer speed?
The data transmission speed is mainly influenced by the Bluetooth LE connection interval (between 7.5 ms and 4 seconds) and is negotiated between the master (central unit) and the peripheral device. The master establishes a connection with a parameter set and the peripheral can propose to change this parameter set. In the end, however, the central unit decides which parameter set is to be used.
However, an Android application, which normally takes the central role, cannot change the Bluetooth connection interval directly. Instead, it can request a connection priority, which is known to influence the connection interval.
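To see how strongly the connection interval dominates, here is a back-of-the-envelope sketch using the figures from the question. It assumes the default ATT MTU of 23 bytes (3 of which are ATT header, leaving 20 bytes of payload) and one packet per connection event; real stacks may send several packets per event, and the question does not state whether reads or notifications are used. The function names are hypothetical.

```python
import math

ATT_HEADER_BYTES = 3  # ATT opcode (1 byte) + attribute handle (2 bytes)


def payload_per_packet(att_mtu: int) -> int:
    """Usable payload bytes in one ATT packet for a given MTU."""
    return att_mtu - ATT_HEADER_BYTES


def packets_needed(total_bytes: int, att_mtu: int) -> int:
    """Number of packets required to move total_bytes."""
    return math.ceil(total_bytes / payload_per_packet(att_mtu))


def transfer_time_ms(total_bytes: int, att_mtu: int, interval_ms: float,
                     packets_per_event: int = 1) -> float:
    """Estimated transfer time if each connection event carries a fixed number of packets."""
    events = math.ceil(packets_needed(total_bytes, att_mtu) / packets_per_event)
    return events * interval_ms


packets = packets_needed(440, 23)
print("packets for 440 bytes at MTU 23:", packets)
print("observed time per packet (ms):", 2000 / packets)
print("estimate at 7.5 ms interval (ms):", transfer_time_ms(440, 23, 7.5))
```

Under these assumptions the observed 2 seconds works out to roughly 90 ms per 20-byte packet, which is consistent with a long connection interval rather than with a problem in the payload handling.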

Connect exception in gatling- What does this mean?

I ran the config below in Gatling from my local machine to verify 20K requests per second.
scn
.inject(
atOnceUsers(20000)
)
It gave the errors below in the report. What does this mean in Gatling?
j.n.ConnectException: Can't assign requested address: /xx.xx.xx:xxxx 3648 83.881 %
j.n.ConnectException: connection timed out: /xx.xx.xx:xxxx 416 9.565 %
status.find.is(200), but actually found 500 201 4.622 %
j.u.c.TimeoutException: Request timeout to not-connected after 60000ms 84 1.931 %
Are these timeouts happening because the server is not processing the requests, or because the requests are not leaving my local machine?
Most probably yes, that's the reason.
It seems your simulation compiled successfully and started.
If you look at the error messages, you will see a percentage after each line (83.881 %, 9.565 %, 1.931 %). This means that the requests were actually generated and sent, and some of them failed. The percentages are calculated relative to the total number of failures.
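The percentages can be reproduced from the counts in the report. The short sketch below divides each error count by the sum of all error counts; the dictionary keys are shortened labels for the four error lines, and the function name is hypothetical.

```python
# Error counts copied from the report in the question.
failures = {
    "ConnectException: Can't assign requested address": 3648,
    "ConnectException: connection timed out": 416,
    "status.find.is(200), but actually found 500": 201,
    "TimeoutException: Request timeout to not-connected": 84,
}


def failure_shares(counts: dict) -> dict:
    """Return each error's share of all failures, in percent, rounded to 3 places."""
    total = sum(counts.values())
    return {name: round(100 * count / total, 3) for name, count in counts.items()}


print("total failures:", sum(failures.values()))
for name, share in failure_shares(failures).items():
    print(f"{share:7.3f} %  {name}")
```

The shares match the report, which confirms that each percentage is a fraction of the 4349 failed requests and not of the 20000 injected users.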
If some of the requests are OK and you get these errors, then Gatling did its job. It stress tested your application.
Try simulating with a lower number of users, for example:
scn
.inject(
rampUsers(20) over (10 seconds)
)
If it works, then your application is definitely not capable of handling 20000 requests at once.
For more info on how to set up a simulation see here.

How to increase kernel poll rate for accelerometer?

I'm using the hwmon/mxc_mma8451.c module to access an accelerometer. Using /sys/devices/virtual/input/input0/poll I can change the polling rate to some degree: if I set a larger millisecond value, the polling becomes slower. However, I cannot seem to get below around 30ms per poll, despite the device driver source apparently allowing as low as 1ms per poll. The accelerometer itself supports an 800Hz sample rate, so that is not the bottleneck. When I write a value of 1 to the above file, I see each sample occurs either 30ms or 60ms after the previous sample, so it is not even consistent. However, even 30ms is unacceptably slow, as it is only 33Hz.
The kernel source for the module clearly shows that I should be able to use a value of 1:
#define POLL_INTERVAL_MIN 1
#define POLL_INTERVAL_MAX 500
#define POLL_INTERVAL 100 /* msecs */
...
mma8451_idev->poll_interval = POLL_INTERVAL;
mma8451_idev->poll_interval_min = POLL_INTERVAL_MIN;
mma8451_idev->poll_interval_max = POLL_INTERVAL_MAX;
I'm not familiar with exactly how Linux does this kind of polling, but this system has a 10ms tick, so even if sampling with ticks, why is it taking 3 or 6 ticks per sample and nothing else? Is there some kernel parameter somewhere else that is throttling how fast polling can occur?
Linux kernel version is 3.14.28 for IMX28 (ARM) if that makes any difference. This is the version available for the device in question, so I can't just up and use a different/newer one.
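The figures in the question can be put side by side with a few lines of arithmetic. The sketch below converts the observed sample periods to rates and to tick counts using the 10ms tick mentioned above. The last line assumes a poll that fires at most once per tick, which is an assumption for illustration and not something confirmed for this driver. The function names are hypothetical.

```python
TICK_MS = 10.0  # system tick stated in the question


def rate_hz(period_ms: float) -> float:
    """Sample rate corresponding to a sample period in milliseconds."""
    return 1000.0 / period_ms


def ticks(period_ms: float, tick_ms: float = TICK_MS) -> float:
    """How many system ticks one sample period spans."""
    return period_ms / tick_ms


for period in (30.0, 60.0):
    print(f"{period} ms -> {rate_hz(period):.1f} Hz, {ticks(period):.0f} ticks")

print("sensor period at 800 Hz (ms):", 1000.0 / 800.0)
print("ceiling if polled once per tick (Hz):", rate_hz(TICK_MS))
```

Even under that assumption a tick-driven poll would top out at 100Hz, well below the 800Hz the sensor supports, so the observed 3 and 6 ticks per sample are a separate question from the tick granularity itself.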

How much data can be fetched by submit_bio() at a time

Here is my LAN structure
I want to download a .zip file of 258.6MB from the Samba server and, in the meantime, profile the router's Linux stack, starting just before the download.
When the download finished, I stopped the profiling and found this in the profiling report:
samples % image name app name symbol name
...
16 0.0064 vmlinux smbd submit_bio
...
The sampling rate is 100000 and the event is CPU_CYCLES.
Because this is the first download of the file, that is to say it is not in the page cache, submit_bio() should be pretty busy. Thus, I don't understand why submit_bio() accounts for such a small portion. Does that mean that each time submit_bio() is called, we fetch about (258.6/16)MB of data?
Thanks
That's statistical sampling. It means that, of the x times the profiler sampled the system, 16 times it happened to find the CPU running in submit_bio(). It does not mean that submit_bio() was called 16 times.
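To make the numbers concrete, the sketch below recovers the approximate total sample count x from the report line and estimates the CPU cycles spent in submit_bio(). It assumes that "sampling rate 100000" means one sample is taken per 100000 CPU_CYCLES events, which is how such a counter value is usually interpreted but is not stated in the question. The reported percentage is rounded, so the total is only approximate. The function names are hypothetical.

```python
def total_samples(symbol_samples: int, symbol_percent: float) -> int:
    """Approximate total number of samples, given one symbol's count and share."""
    return round(symbol_samples / (symbol_percent / 100))


def estimated_cycles(samples: int, events_per_sample: int) -> int:
    """Approximate CPU cycles attributed to a symbol (assumed sampling period)."""
    return samples * events_per_sample


# Figures from the report: 16 samples, 0.0064 % of all samples.
print("approx. total samples:", total_samples(16, 0.0064))
print("approx. cycles in submit_bio():", estimated_cycles(16, 100000))
```

Neither figure says anything about how many times submit_bio() was called or how much data each call requested; it only says the CPU spent very little time inside that function.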

Ideal timeout period for dns lookup

In my Rails app I do a DNS lookup using the Ruby library resolv. If a site like dgdfgdfgdfg.com is entered, it takes too long to resolve, in some instances around 20 seconds (mostly for non-existent sites). This causes the application to slow down.
So I thought of introducing a timeout period for the DNS lookup.
What would be the ideal timeout period for the DNS lookup so that resolution of actual sites doesn't fail? Would something like 10 seconds be fine?
There's no IETF mandated value, although §6.1.3.3 of RFC 1123 suggests a value not less than 5 seconds.
Perl's Net::DNS and the command line dig utility do default to 5 seconds between retries. Some versions of the Microsoft resolver appear to default to 3 seconds.
You can run some tests with your users to find a value that is a good compromise between responsiveness and performance.
You can also adjust the timeout dynamically depending on the network traffic.
For example, for every successful resolution, you save how long it took. Every hour (for example) you can calculate the average and set double its value as the timeout (remember that "average" is, roughly speaking, "the middle"). This way, if your latency is high at some point, the timeout period adjusts itself upward.
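The adaptive scheme described above could be sketched roughly as follows. It is written in Python for illustration only, not in the Ruby of the question. The class name, the 10-second starting value and the 5-second floor (borrowed from the RFC 1123 suggestion) are assumptions, and the periodic trigger is left to the caller.

```python
class AdaptiveTimeout:
    """Keeps a DNS timeout at a multiple of the recent average lookup time."""

    def __init__(self, initial_timeout: float = 10.0, floor: float = 5.0,
                 factor: float = 2.0):
        self.timeout = initial_timeout  # current timeout in seconds
        self.floor = floor              # never go below this value
        self.factor = factor            # multiple of the average to use
        self._durations = []            # successful lookup times since last update

    def record_success(self, seconds: float) -> None:
        """Store how long one successful resolution took."""
        self._durations.append(seconds)

    def recalculate(self) -> float:
        """Call periodically (e.g. hourly) to update the timeout from recent samples."""
        if self._durations:
            average = sum(self._durations) / len(self._durations)
            self.timeout = max(self.floor, self.factor * average)
            self._durations.clear()
        return self.timeout


tracker = AdaptiveTimeout()
for elapsed in (2.0, 4.0):
    tracker.record_success(elapsed)
print(tracker.recalculate())
```

With no samples in a period the previous timeout is kept, so a quiet hour does not reset the value. Only successful lookups are recorded, so slow failures such as non-existent domains do not inflate the average.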
