JMeter TCP Raspberry Pi cache - Linux

I have written a simple server in Qt which responds to TCP requests with a short string (a few bytes); both the request and the response are constant sets of data. I compiled it on a Raspberry Pi (Arch Linux), ran it, and connected it to my LAN. On my laptop I ran JMeter with a TCP Sampler.
After 5 minutes of responding to 15 threads, the server holds a constant 80 ms response time. Then, after 8 minutes, the response time starts to fall:
time - avg. response time
5 min - 80 ms
8 min - 72 ms
10 min - 44 ms
12 min and more - 20 ms
And it stays at about 20 ms. Why is that happening? Is there some cache mechanism at work, or just some random conditions changing? I can't get the same results when I run the tests again, and I have no idea where the data being sent could possibly be cached.

How many hits per second are you getting?
The TCP sampler will NOT cache the results.
Maybe something in the OS is doing some caching?

This answer may be too late, but I would like to share my experience from a similar situation when I was debugging a performance issue. Use a network sniffer to capture the packets, then look at a few sessions to check the response time. That way you can either confirm or refute the response times reported by the traffic generator, and go from there. Good luck.

Related

Send HTTP request at exact time in future with Nodejs

I need to make a POST HTTP request at an exact timestamp in the future, as accurately as possible, down to milliseconds. But there is network latency as well. How can I achieve this?
setTimeout alone is not enough here, because it always fires a little late, and varying network latency makes the request arrive later still. Firing the request before the target timestamp, on the other hand, may make it arrive too early.
My goal is for the request to be guaranteed to arrive at the server after the target timestamp, but as soon as possible after it. Can you suggest any solutions with Node.js?
The best you can do in nodejs (which is not a real-time system) is to do the following:
Premeasure the expected latency so you know by about how much to presend the request.
Use setTimeout() to schedule the send at precisely the one-way latency time before your target time; there is no other mechanism in nodejs that would be more precise. (See the sketch after this list.)
If your request involves a DNS lookup, you can prefetch the TCP address for your hostname and take the DNS lookup time out of your request cycle or at least prime the local DNS cache.
Create a dedicated nodejs program that does nothing else - so its event loop will not be doing anything else at the time the setTimeout() needs to run. You could run this as a child_process from your larger program if desired.
Run a number of tests to see how the timing works and, if you are consistently off by some margin, then adjust your latency offset.
You can develop a regular latency test to determine if the latency changes with time.
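Putting the first two points together, here is a minimal sketch; the endpoint, target time, and 20 ms latency figure are placeholder assumptions to replace with your own measurements:

const https = require('https');

const TARGET_TIME_MS = Date.parse('2030-01-01T00:00:00.000Z'); // hypothetical target
const ONE_WAY_LATENCY_MS = 20; // premeasured estimate, adjust from your own tests

function sendRequest() {
  const req = https.request(
    { hostname: 'example.com', path: '/endpoint', method: 'POST' }, // hypothetical
    (res) => console.log('status', res.statusCode, 'answered at', Date.now())
  );
  req.end();
}

// Schedule the send one estimated one-way latency ahead of the target time,
// so the request reaches the server at (or just after) the target timestamp.
setTimeout(sendRequest, Math.max(0, TARGET_TIME_MS - ONE_WAY_LATENCY_MS - Date.now()));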
As others have said, there is no way to predict the target server's natural response time (how long it takes to start processing your request from the moment your network packets arrive there). If lots of incoming requests are all racing for the same time slot, your request will get interleaved among all the others and served in some order that you do not control.
One other thing you can consider: if the target server supports the latest HTTP specifications, you can hold open a pre-established HTTP connection with the host (perhaps targeting some other endpoint) that is kept alive for you to send your precisely timed request on, as sketched below. This would take some experimentation to figure out what the target host supports and whether this would work.
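A hedged sketch of that last idea, assuming a hypothetical host and endpoints: a keep-alive agent holds the socket (and its TLS session) open, so the timed request has no TCP/SSL handshake in its critical path.

const https = require('https');

const agent = new https.Agent({ keepAlive: true, maxSockets: 1 });

function send(path, method) {
  const req = https.request({ hostname: 'example.com', path, method, agent }, (res) => {
    res.resume(); // drain the body so the socket is handed back to the agent for reuse
  });
  req.end();
}

send('/warmup', 'GET'); // well ahead of time: establish TCP + TLS once

// ...then, at the scheduled moment, reuse the warm socket:
send('/endpoint', 'POST');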

Limit of HTTPS requests per second

I am doing a project where I need to send device parameters to the server. I will be using a Raspberry Pi for that and the Flask framework.
1. I want to know whether there is any limit on HTTPS POST requests per second. Also, I will be using PythonAnywhere for the server side along with their SQL database.
2. Initially, my objective was to send data over the HTTPS channel while the device is in sleep mode, and when the device (e.g. a car) wakes up, to upgrade from HTTPS to WebSocket and transmit data in real time. I later came to know that PythonAnywhere doesn't support WebSocket.
Apart from answering the first question, can anyone shed some light on the second part? I could just increase the rate of HTTPS requests when the device is awake (e.g. 1 per 60 minutes in sleep mode and 6 per 60 seconds when awake), but the per-request overhead would be unnecessary data consumption over the wake period, when what I really want is a persistent channel.
PythonAnywhere developer here: from the server side, if you're running on our platform, there's no hard limit on the number of requests you can handle beyond the amount of time your Flask server takes to process each request. In a free account you would have one worker process handling all of the requests, each one in turn, so if it takes (say) 0.2 seconds to handle a request, your theoretical maximum throughput would be five requests a second. A paid "Hacker" plan would have two worker processes, and they would both be handling requests, so that would get you up to ten a second. And you could customize a paid plan and get more worker processes to increase that.
I don't know whether there would be any limits on the RPi side; perhaps someone else will be able to help with that.

Expected performance with getstream.io

The getstream.io documentation says that one should expect to retrieve a feed in approximately 60 ms. When I retrieve my feeds, they contain a field named 'duration' which I take to be the server-side processing time. This value is steadily around 10-40 ms, with an average around 15 ms.
The problem is, I seldom get my feeds in less than 150 ms; the average is around 200-250 ms, and sometimes it goes up to 300-400 ms. This is the time for getting the feed alone, no enrichment etc., and I have verified with tcpdump that the network round trip is low (around 25 ms) and that the time is actually spent waiting for the server to respond.
I've tried to move around my application (eu-west and eu-central) but that doesn't seem to affect things much (again, network roundtrip is steadily around 25ms).
My question is: should I really expect 60 ms and continue investigating, or is 200-400 ms normal? The getstream.io site explains that developer accounts receive "Low Priority Processing" - what does this mean in practice? How much difference could I expect with another plan?
I'm using the Node.js low-level API.
Stream's APIs use SSL to encrypt traffic. Unfortunately, SSL introduces additional network I/O. Usually you need to pay the increased latency only once, because Stream's HTTP APIs support HTTP persistent connections (aka keep-alive).
Here's a Wireshark screenshot of the TCP traffic of 2 sequential API requests with keep alive disabled client side:
The 4 lines in red highlight that the TCP connection is getting closed each time. Another interesting thing is that the handshaking takes almost 100ms and it's done twice (the first bunch of lines).
After some investigation, it turned out that the library used to make API requests to Stream's APIs (request) does not have keep-alive enabled by default. That change will be part of the library soon and is already available on a development branch.
Here's a screenshot of the same two requests with keep-alive enabled (using the code from that branch):
This time there is no connection reset anymore, and the second HTTP request does not perform an SSL handshake.
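For reference, a hedged sketch (not the Stream client's actual code) of forcing keep-alive with the request library by supplying an explicit agent; the URL is a placeholder:

const https = require('https');
const request = require('request');

const keepAliveAgent = new https.Agent({ keepAlive: true });
const url = 'https://api.example.com/feed'; // hypothetical endpoint

request({ url, agent: keepAliveAgent }, () => {
  // Reusing the same agent means this second call rides the same TCP
  // connection: no new handshake, no SSL negotiation.
  request({ url, agent: keepAliveAgent }, () => {
    console.log('second request done without a new handshake');
  });
});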

Node.js Server Timeout Problems (EC2 + Express + PM2)

I'm relatively new to running production node.js apps and I've recently been having problems with my server timing out.
Basically, after a certain amount of usage and time, my node.js app stops responding to requests. I don't even see routes being fired on my console anymore - it's like the whole thing just comes to a halt, and the HTTP calls from my client (iPhone running AFNetworking) don't reach the server anymore. But if I restart my node.js app server, everything starts working again, until things inevitably stop again. The app never crashes, it just stops responding to requests.
I'm not getting any errors, and I've made sure to handle and log all DB connection errors so I'm not sure where to start. I thought it might have something to do with memory leaks so I installed node-memwatch and set up a listener for memory leaks but that doesn't get called before my server stops responding to requests.
Any clue as to what might be happening and how I can solve this problem?
Here's my stack:
Node.js on AWS EC2 Micro Instance (using Express 4.0 + PM2)
Database on AWS RDS volume running MySQL (using node-mysql)
Sessions stored w/ Redis on same EC2 instance as the node.js app
Clients are iPhones accessing the server via AFNetworking
Once again no errors are firing with any of the modules mentioned above.
First of all you need to be a bit more specific about timeouts.
TCP timeouts: TCP divides a message into packets which are sent one by one. The receiver needs to acknowledge having received each packet. If the receiver does not acknowledge having received the packet within a certain period of time, a TCP retransmission occurs, which means sending the same packet again. If this happens a couple more times, the sender gives up and kills the connection.
HTTP timeout: An HTTP client like a browser, or your server while acting as a client (e.g: sending requests to other HTTP servers), can set an arbitrary timeout. If a response is not received within that period of time, it will disconnect and call it a timeout.
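As an illustration of the second kind, here is how a client-side timeout might look in node; the URL and the 5-second limit are arbitrary:

const http = require('http');

const req = http.get('http://example.com/', (res) => res.resume());
// If the socket sees no activity for 5 seconds, treat it as a timeout.
req.setTimeout(5000, () => {
  console.log('HTTP timeout: no response within 5 s, giving up');
  req.destroy(); // tear down the connection
});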
Now, there are many, many possible causes for this... from more trivial to less trivial:
Wrong Content-Length calculation: If you send a request with a Content-Length: 20 header, that means "I am going to send you 20 bytes". If you send 19, the other end will wait for the remaining 1. If that takes too long... timeout. (See the sketch after this list.)
Not enough infrastructure: Maybe you should assign more machines to your application. If (total load / # of CPU cores) is over 1, or your memory usage is high, your system may be over capacity. However keep reading...
Silent exception: An error was thrown but not logged anywhere. The request never finished processing, leading to the next item.
Resource leaks: Every request needs to be handled to completion. If you don't do this, the connection will remain open. In addition, the IncomingMessage object (aka: usually called req in express code) will remain referenced by other objects (e.g: express itself). Each one of those objects can use a lot of memory.
Node event loop starvation: I will get to that at the end.
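To make the first item concrete, here is a small, deliberately broken server sketch (port and payload are arbitrary) that reproduces the Content-Length hang:

const http = require('http');

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Length': 20 }); // promise 20 bytes...
  res.write('0123456789012345678');             // ...but send only 19
  // deliberately never call res.end(): the client sits waiting for the
  // missing byte until its own timeout fires
}).listen(8080);

// e.g. curl -v http://localhost:8080/ stalls after receiving 19 bytes.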
For memory leaks, the symptoms would be:
the node process would be using an increasing amount of memory.
To make things worse, if available memory is low and your server is misconfigured to use swapping, Linux will start moving memory to disk (swapping), which is very I/O and CPU intensive. Servers should not have swapping enabled.
cat /proc/sys/vm/swappiness
will return the level of swappiness configured in your system (it goes from 0 to 100). You can modify it persistently by adding vm.swappiness=10 to /etc/sysctl.conf (takes effect after a restart), or temporarily with: sysctl vm.swappiness=10
Once you've established you have a memory leak, you need to get a core dump and download it for analysis. A way to do that can be found in this other Stackoverflow response: Tools to analyze core dump from Node.js
For connection leaks (a connection leaked by not handling a request to completion), the symptom would be an increasing number of established connections to your server. You can count them with: netstat -a -p tcp | grep ESTABLISHED | wc -l
Now, event loop starvation is the worst problem. If your code is short-lived, node works very well. But if you do CPU-intensive stuff and have a function that keeps the CPU busy for an excessive amount of time... like 50 ms (50 ms of solid, blocking, synchronous CPU time, not asynchronous code taking 50 ms), operations handled by the event loop, such as processing HTTP requests, start falling behind and eventually time out.
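You can see starvation for yourself with a toy sketch: a 10 ms heartbeat that reports how late it fires while a simulated handler blocks the CPU for roughly 50 ms at a time.

let last = Date.now();
setInterval(() => {
  console.log('heartbeat late by', Date.now() - last - 10, 'ms');
  last = Date.now();
}, 10);

setInterval(() => {
  const end = Date.now() + 50;
  while (Date.now() < end) { /* solid, synchronous CPU time */ }
}, 100);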
The way to find a CPU bottleneck is to use a performance profiler. nodegrind/qcachegrind are my preferred profiling tools, but others prefer flamegraphs and such. However, it can be hard to run a profiler in production. Just take a development server and slam it with requests. aka: a load test. There are many tools for this.
Finally, another way to debug the problem is:
env NODE_DEBUG=tls,net node <...arguments for your app>
node has optional debug statements that are enabled through the NODE_DEBUG environment variable. Setting NODE_DEBUG to tls,net will make node emit debugging information for the tls and net modules... so basically everything being sent or received. If there's a timeout you will see where it's coming from.
Source: Experience of maintaining large deployments of node services for years.

Linux TCP weirdly unresponsive when under heavy load

I'm trying to get an HTTP server I'm writing to behave well under heavy load, but I'm getting some weird behavior that I cannot quite understand.
My testing consists of using ab (the Apache benchmark program) over the loopback interface at a concurrency level of 1000 (ab -n 50000 -c 1000 http://localhost:8080/apa), while stracing the server process. strace slows processing down enough for the problem to be readily reproducible, and also lets me debug the server internals after completion to some extent. I also capture the network traffic with tcpdump while the test is running.
What happens is that ab stops running a while into the test, complaining that a connection returned ECONNRESET, which I find a bit weird. I could easily buy into a connection timing out since the server might simply not have the bandwidth to process them all, but shouldn't that reasonably return ETIMEDOUT or even ECONNREFUSED if not all connections can be accepted?
I used Wireshark to extract the packets constituting the first connection to return ECONNRESET, and its brief packet list looks like this:
(The entire tcpdump file of this connection is available here.)
As you can see from this dump, the connection is accepted (after a few SYN retransmissions), then the request is retransmitted a few times, and then the server resets the connection. I'm wondering what could cause this to happen. Normally, Linux's TCP implementation ACKs data before the reading process even chooses to receive it, so long as there is space in the TCP window, so why doesn't it do that here? Are there some kind of shared buffers that are running out? Most importantly, why is the kernel responding with a RST packet all of a sudden, instead of simply waiting and letting the client retransmit further?
For the record, the strace of the process indicates that it never even accepts a connection on the port used in this connection (port 56946), so this seems to be something Linux does on its own. It is also worth noting that the server works perfectly well as long as ab's concurrency level is low enough (it works perfectly well up to about 100, and then starts failing intermittently somewhere between 100-500), and that its request throughput is rather constant regardless of the concurrency level (it processes somewhere between 6000-7000 requests per second as long as it isn't being straced). I have not found any particular correlation between the frequency of the problem and the backlog setting I pass to listen() (I'm currently using 128, but I've tried up to 1024 without it seeming to make a difference).
In case it matters, I'm running Linux 3.2.0 on this AMD64 box.
The backlog queue filled up: hence the SYN retransmissions.
Then a slot became available: hence the SYN/ACK.
Then the GET was sent, followed by four retransmissions, which I can't account for.
Then the server gave up and reset the connection.
I suspect you have a concurrency or throughput problem in your server which is preventing you from accepting connections rapidly enough. You should have a thread that is dedicated to doing nothing but calling accept(), and either starting another thread to handle the accepted socket or queueing a job for a thread pool to handle it. I would then speculate that Linux resets connections that are sitting in the backlog queue while the client keeps retransmitting, but that's only a guess.
