How to loadtest microservices with Python 3.5?

How to loadtest microservices with Python 3.5? - python-3.x

We have a set of micro-services that I'd like to load test in a manner that is consistent with how they are accessed.
After settling on Locust as my tool of choice, I found out that the TCP connection underpinning has connection pooling because I keep seeing messages like these:
WARNING/requests.packages.urllib3.connectionpool: Connection pool is full, discarding connection:
As I understand it, this message is telling me that it discards a connection from the pool that it manages. I assume that it still creates a new connection, and adds it in the place of the one that it discarded.
Is that what it does?
Does it do this without the connection failing?
I don't think that our micro-services keep any sessions open. The connections are made, from a far end, to our services, which provide a result, and then the connection is closed. So, the test is handling the connections in a way that's different than the services are used. Is there a way to get the requests lib to not use a pool, and go through the work of setting up and tearing down all connections made through it, each time?
Is there any reason why we wouldn't want to test this way?
If it is preferable to test with a connection pool, how should I anticipate the difference in load when it's not done this way in production?

That's correct. Unless you set the urllib3 pool to blocking, it will generate more connections than the pool is configured to hold, as needed, and then will discard them once the request is done.
This often happens when you have more threads using a pool than the number of connections the pool is configured to store. urllib3 takes a maxsize parameter (defaults to 1) which you can set to the number of threads you're running. For requests, you'll need to make a custom adapter to do this. See:
https://stackoverflow.com/a/18845952/187878
https://laike9m.com/blog/requests-secret-pool_connections-and-pool_maxsize,89/
That said, it's merely a warning which some people ignore, so it's not a failure. But if this happens a lot in production, that probably means you should tweak your configuration because creating/discarding new connections all the time is fairly costly.
In general, it's a good idea to re-use connections for this reason.
My suggestions would be in this order:
Re-use connections, or
Increase the number of connections that get pooled to match the number of threads, or
Disable the warning if you'd rather not deal with it.

Related

Except from memory and CPU leaks, what will be reasons for Node.js server might go went down?

I have a Node.js (Express.js) server for my React.js website as BFF. I use Node.js for SSR, proxying some request and cache some pages in Redis. In last time I found that my server time to time went down. I suggest an uptime is about 2 days. After restart, all ok, then response time growth from hour to hour. I have resource monitoring at this server, and I see that server don't have problems with RAM or CPU. It used about 30% of RAM and 20% of CPU.
I regret to say it's a big production site and I can't make minimal reproducible example, cause i don't know where is reason of these error :(
Except are memory and CPU leaks, what will be reasons for Node.js server might go went down?
I need at least direction to search.
UPDATE1:
"went down" - its when kubernetes kills container due 3 failed life checks (GET request to a root / of website)
My site don't use any BD connection but call lots of 3rd party API's. About 6 API requests due one GET/ request from browser
UPDATE2:
Thx. To your answers, guys.
To understand what happend inside my GET/ request, i'm add open-telemetry into my server. In longtime and timeout GET/ requests i saw long API requests with very big tcp.connect and tls.connect.
I think it happens due lack of connections or something about that. I think Mostafa Nazari is right.
I create patch and apply them within the next couple of days, and then will say if problem gone
I solve problem.
It really was lack of connections. I add reusing node-fetch connection due keepAlive and a lot of cache for saving connections. And its works.
Thanks for all your answers. They all right, but most helpful thing was added open-telemetry to my server to understand what exactly happens inside request.
For other people with these problems, I'm strongly recommended as first step, add telemetry to your project.
https://opentelemetry.io/
PS: i can't mark two replies as answer. Joe have most detailed and Mostafa Nazari most relevant to my problem. They both may be "best answers".
Tnx for help, guys.

Gradual growth of response time suggest some kind of leak.
If CPU and memory consumption is excluded, another potentially limiting resources include:
File descriptors - when your server forgets to close files. Monitor for number of files in /proc//fd/* to confirm this. See what those files are, find which code misbehaves.
Directory listing - even temporary directory holding a lot of files will take some time to scan, and if your application is not removing some temporary files and lists them - you will be in trouble quickly.
Zombie processes - just monitor total number of processes on the server.
Firewall rules (some docker network magic may in theory cause this on host system) - monitor length of output of "iptables -L" or "iptables-save" or equivalent on modern kernels. Rare condition.
Memory fragmentation - this may happen in languages with garbage collection, but often leaves traces with something like "Can not allocate memory" in logs. Rare condition, hard to fix. Export some health metrics and make your k8s restart your pod preemptively.
Application bugs/implementation problems. This really depends on internal logic - what is going on inside the app. There may be some data structure that gets filled in with data as time goes by in some tricky way, becoming O(N) instead of O(1). Really hard to trace down, unless you have managed to reproduce the condition in lab/test environment.
API calls from frontend shift to shorter, but more CPU-hungry ones. Monitor distribution of API call types over time.

Here are some of the many possibilities of why your server may go down:
Memory leaks The server may eventually fail if a Node.js application is leaking memory, as you stated in your post above. This may occur if the application keeps adding new objects to the memory without appropriately cleaning up.
Unhandled exceptions The server may crash if an exception is thrown in the application code and is not caught. To avoid this from happening, ensure that all exceptions are handled properly.
Third-party libraries If the application uses any third-party libraries, the server may experience problems as a result. Before using them, consider examining their resource usage, versions, or updates.
Network Connection The server's network connection may have issues if the server is sending a lot of queries to third-party APIs or if the connection is unstable. Verify that the server is handling connections, timeouts, and retries appropriately.
Connection to the Database Even though your server doesn't use any BD connections, it's a good idea to look for any stale connections to databases that could be problematic.
High Volumes of Traffic The server may experience performance issues if it is receiving a lot of traffic. Make sure the server is set up appropriately to handle a lot of traffic, making use of load balancing, caching, and other speed enhancement methods. Cloudflare is always a good option ;)
Concurrent Requests Performance problems may arise if the server is managing a lot of concurrent requests. Check to see if the server is set up correctly to handle several requests at once, using tools like a connection pool, a thread pool, or other concurrency management strategies.
(Credit goes to my System Analysis and Design course slides)

With any incoming/outgoing web requests, 2 File Descriptors will be acquired. as there is a limit on number of FDs, OS does not let new Socket to be opened, this situation cause "Timeout Error" on clients. you can easily check number of open FDs by sudo ls -la /proc/_PID_/fd/ | tail -n +4 | wc -l where _PID_ is nodejs PID, if this value is rising, you have connection leak issue.
I guess you need to do the following to prevent Connection Leak:
make sure you are closing outgoing API call Http Connection (it depends on how you are opening them, some libraries manage this and you just need to config them)
cache your outgoing API call (if it is possible) to reduce API call
for your outgoing API call, use Connection pool, this would manage number of open HttpConnection, reuse already-opened connection and ...
review your code, so that you can serve a request faster than now (for example make your API call more parallel instead of await or nested call). anything you do to make your response faster, is good for preventing this situation

I solve problem. It really was lack of connections. I add reusing node-fetch connection due keepAlive and a lot of cache for saving connections. And its works.
Thanks for all your answers. They all right, but most helpful thing was added open-telemetry to my server to understand what exactly happens inside request.
For other people with these problems, I'm strongly recommended as first step, add telemetry to your project.
https://opentelemetry.io/

nodejs | worker_thread | keep alive tcp connection within workers?

Using worker_threads from node 12, is it suitable to establish remote connection within the workers and keep those connection alive ?
I don't mean sharing the socket between the master and the workers like we could do with node cluster and fork.
The idea would be to have pools of secure connections already established within the workers to use if needed.
Let say I have a pool of 10 workers. When a worker is created, some pre-established "TLS" connection are created (streams) to server X,Y amd Z, and the worker is marked as "ready"
Each time that I use a worker to process "heavy" tasks (mapReduce, etc, ) and if I need to post data or get data to/from server X,Y or Z during the process,
I use the appropriate "TLS" connection already established from the pool.
Once the task completed, the result is return to the master and the worker just execute a new/next tasks.
1 ) Do you see any side effect / impact of doing so ?
2 ) would it be better to have the pool of "TLS" connection on the "main thread" (master) . If "remote" data are needed within the workers during the tasks, use the "postMessage" method to communicate with the "master" ( and vice/versa ).
Thanks

Worker Threads do not work for remote connections. However, you can build your own system that would work similar using TLS sockets. In a case of such a system I would definitely recommend keeping these types of connections alive. There is a significant latency in setting up these connections, and having these connections active in memory, will use a minimum amount of resources.
Keep in mind that a system like this has some drawbacks:
You are working with different machines, and each of these machines can have its own set of failure conditions.
You are communicating over a network, connections with remote servers might suddenly drop, for any reason imaginable.
You are increasing the physical distance, this will cause latency.
So keep this in the back of your mind.
Would I recommend building a system like this. It is really hard to determine and it relies on your use case, time and money. You mentioned the cluster nodes are processing 'heavy tasks', and with that I reckon CPU / GPU intensive tasks. So a system like this might be a good solution, however, a simple rest API in front of your processing servers might be good enough. Or maybe even database synchronized servers, that just check the database for tasks to execute.
There are many solutions for the same problem, just have to consider what works best for your project(s).

Clustered socket.io server hangs

I'm writing a socket.io based server in Node.js (6.9.0). I am using the builtin cluster module to enable multiple processes. For now, there is only two process: a master and a worker. The master receives the connections and maintains an in-memory global data structure (which the worker can query via IPC). The worker process does the majority of work by handling each incoming connection.
I am finding a hanging condition that I cannot attribute to any internal failure when the server is stressed at 300 concurrent users. Under lower concurrency, I don't see the hanging condition.
I'm enabling all forms of debugging (using the debug module: socket.io:socket, socket.io:client as well as my own custom calls to debug).
The last activity I can see is in socket.io, however, the messages indicate that sockets are closing ("reason client namespace disconnect") due to their own "end of test" cycle. It just seems like incoming connections are not be serviced.
I'm using Artillery.io as the test client.
In the server application, I have handlers for uncaught exceptions and try-catch blocks around everything.
In a prior iteration, I also used cluster, but reversed the responsibilities so that the master process handled the connections (with the worker handling global data). That didn't exhibit the same failure. Not sure if something is wrong with the connection distribution. For that, I have also dumped internalMessage events to monitor the internal workings of cluster.
I am not using any other module for connection distribution or sticky sessions. As there is only a single process handling connections (at this time), it doesn't seem relevant.

I was able to remove the hanging condition by changing the cluster scheduling policy from Round Robin (SCHED_RR) to None, which is OS specific (SCHED_NONE). I can't tell whether this is due to a bug in connection distribution (or something else inherent in the scheduling policy), but this one change seems to prevent the hanging condition.

Check queued reads/writes for MongoDB

I feel like this question would have been asked before, but I can't find one. Pardon me if this is a repeat.
I'm building a service on Node.js hosted in Heroku and using MongoDB hosted by Compose. Under heavy load, the latency is most likely to come from the database, as there is nothing very CPU-heavy in the service layer. Thus, when MongoDB is overloaded, I want to return an HTTP 503 promptly instead of waiting for a timeout.
I'm also using REDIS, and REDIS has a feature where you can check the number of queued commands (redisClient.command_queue.length). With this feature, I can know right away if REDIS is backed up. Is there something similar for MongoDB?
The best option I have found so far is polling the server for status via this command, but (1) I'm hoping for something client side, as there could be spikes within the polling interval that cause problems, and (2) I'm not actually sure what part of the status response I want to act on. That second part brings me to a follow up question...
I don't fully understand how the MondoDB client works with the server. Is one connection shared per client instance (and in my case, per process)? Are queries and writes queued locally or on the server? Or, is one connection opened for each query/write, until the database's connection pool is exhausted? If the latter is the case, it seems like I might want to keep an eye on the open connections. Does the MongoDB server return such information at other times, besides when polled for status?
Thanks!

MongoDB connection pool workflow-
Every MongoClient instance has a built-in connection pool. The client opens sockets on demand to support the number of concurrent MongoDB operations your application requires. There is no thread-affinity for sockets.
The client instance, opens one additional socket per server in your MongoDB topology for monitoring the server’s state.
The size of each connection pool is capped at maxPoolSize, which defaults to 100.
When a thread in your application begins an operation on MongoDB, if all other sockets are in use and the pool has reached its maximum, the thread pauses, waiting for a socket to be returned to the pool by another thread.
You can increase maxPoolSize:
client = MongoClient(host, port, maxPoolSize=200)
By default, any number of threads are allowed to wait for sockets to become available, and they can wait any length of time. Override waitQueueMultiple to cap the number of waiting threads. E.g., to keep the number of waiters less than or equal to 500:
client = MongoClient(host, port, maxPoolSize=50, waitQueueMultiple=10)
Once the pool reaches its max size, additional threads are allowed to wait indefinitely for sockets to become available, unless you set waitQueueTimeoutMS:
client = MongoClient(host, port, waitQueueTimeoutMS=100)
Reference for connection pooling-
http://blog.mongolab.com/2013/11/deep-dive-into-connection-pooling/

When does a single JMS connection with multiple producing sessions start becoming a bottleneck?

I've recently read a lot about best practices with JMS, Spring (and TIBCO EMS) around connections, sessions, consumers & producers
When working within the Spring world, the prevailing wisdom seems to be
for consuming/incoming flows - to use an AbstractMessageListenerContainer with a number of consumers/threads.
for producing/publishing flows - to use a CachingConnectionFactory underneath a JmsTemplate to maintain a single connection to the broker and then cache sessions and producers.
For producing/publishing, this is what my (largeish) server application is now doing, where previously it was creating a new connection/session/producer for every single message it was publishing (bad!) due to use of the raw connection factory under JmsTemplate. The old behaviour would sometimes lead to 1,000s of connections being created and closed on the broker in a short period of time in high peak periods and even hitting socket/file handle limits as a result.
However, when switching to this model I am having trouble understanding what the performance limitations/considerations are with the use of a single TCP connection to the broker. I understand that the JMS provider is expected to ensure it can be used in the multi-threaded way etc - but from a practical perspective
it's just a single TCP connection
the JMS provider to some degree needs to co-ordinate writes down the pipe so they don't end up an interleaved jumble, even if it has some chunking in its internal protocol
surely this involves some contention between threads/sessions using the single connection
with certain network semantics (high latency to broker? unstable throughput?) surely a single connection will not be ideal?
On the assumption that I'm somewhat on the right track
Am I off base here and misunderstanding how the underlying connections work and are shared by a JMS provider?
is any contention a problem mitigated by having more connections or does it just move the contention to the broker?
Does anyone have any practical experience of hitting such a limit they could share? Either with particular message or network throughput, or even caused by # of threads/sessions sharing a connection in parallel
Should one be concerned in a single-connection scenario about sessions that write very large messages blocking other sessions that write small messages?
Would appreciate any thoughts or pointers to more reading on the subject or experience even with other brokers.

When thinking about the bottleneck, keep in mind two facts:
TCP is a streaming protocol, almost all JMS providers use a TCP based protocol
lots of the actions from TIBCO EMS client to EMS server are in the form of request/reply. For example, when you publish a message / acknowledge a receive message / commit a transactional session, what's happening under the hood is that some TCP packets are sent out from client and the server will respond with some packets as well. Because of the nature of TCP streaming, those actions have to be serialised if they are initiated from the same connection -- otherwise say if from one thread you publish a message and in the exact same time from another thread you commit a session, the packets will be mixed on the wire and there is no way server can interpret the right message from the packets. [ Note: the synchronisation is done from the EMS client library level, hence user can feel free to share one connection with multiple threads/sessions/consumers/producers ]
My own experience is multiple connections always output perform single connection. In a lossy network situation, it is definitely a must to use multiple connections. Under best network condition, with multiple connections, a single client can nearly saturate the network bandwidth between client and server.
That said, it really depends on what is your clients' performance requirement, a single connection under good network can already provides good enough performance.

Even if you use one connection and 100 sessions it means finally you
are using 100threads, it is same as using 10connections* 10 sessions =
100threads.
You are good until you reach your system resource limits

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string