Is there a way to set the concurrency level on a Linux EC2 instance?

I currently have a script on a Linux EC2 instance that processes some documents. The script is called from AWS Lambda using SSM send_command. It works fine when it processes one or two documents, but beyond that I get empty responses. I'm assuming the system bottlenecks, since there is essentially no limit to the number of calls I can send to the instance. Is there a way to set the concurrency level on the instance so that it only processes, say, 2 commands at a time?
I know I can set the concurrency level on the Lambda functions, but their execution time is usually under 200 ms, while the processing time on the instance is about 5 to 15 seconds.
I could have the Lambda functions wait for each job to complete, but that would be expensive because I need to process thousands of documents.
Thank you!
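One way to cap the work on the instance itself, regardless of how many SSM commands arrive, is a cross-process semaphore: each invocation must claim one of N lock files before processing a document and waits otherwise. The sketch below assumes the per-document script can be wrapped by a Node.js entry point (the question does not say what language the script uses); the lock directory, slot count, file names, and the `tryAcquireSlot`/`withSlot` helpers are hypothetical. Because `send_command` returns as soon as the command has been sent, the waiting happens on the instance and not in the Lambda.

```javascript
const fs = require("fs");
const path = require("path");

// Try to claim one of `slots` lock files in `lockDir`.
// Returns a function that releases the claimed slot, or null if every slot is taken.
function tryAcquireSlot(lockDir, slots) {
  for (let i = 0; i < slots; i++) {
    const lockPath = path.join(lockDir, `slot-${i}.lock`);
    try {
      // The "wx" flag makes creation atomic: it fails with EEXIST if another process holds the slot.
      const fd = fs.openSync(lockPath, "wx");
      fs.writeSync(fd, String(process.pid)); // record the owner to help debug stale locks
      fs.closeSync(fd);
      return () => fs.rmSync(lockPath, { force: true });
    } catch (err) {
      if (err.code !== "EEXIST") throw err;
    }
  }
  return null;
}

// Poll until a slot is free, run `job`, and always release the slot afterwards.
async function withSlot(lockDir, slots, job, pollMs = 500) {
  let release = tryAcquireSlot(lockDir, slots);
  while (release === null) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    release = tryAcquireSlot(lockDir, slots);
  }
  try {
    return await job();
  } finally {
    release();
  }
}

// Hypothetical usage inside the per-document entry point run by each SSM command:
// withSlot("/tmp/doc-slots", 2, () => processDocument(process.argv[2]));
```

A process that crashes without releasing its slot leaves a stale lock file behind, so a real deployment would need some cleanup strategy, and commands that are waiting for a slot still count toward the SSM command's execution timeout.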


How to convert a multi-process Flask/Gunicorn app to a single multithreaded process

I would like to cache a large amount of data in a Flask application. It currently runs on K8S pods with the following unicorn.ini:
```
bind = ""
workers = 10
timeout = 900
preload_app = True
```
To avoid caching the same data in each of those 10 workers, I would like to know whether Python supports multi-threading instead of multi-processing here. This would be very easy in Java, but I am not sure whether it is possible in Python. I know that you can share a cache between Python processes using the file system or other methods. However, it would be a lot simpler if it were all shared in the same process space.
A couple of posts suggest that threads are supported in Python, for example this comment by Filipe Correia, or this answer to the same question.
Following that comment, the Gunicorn design document discusses workers and threads:
Since Gunicorn 19, a threads option can be used to process requests in multiple threads. Using threads assumes use of the gthread worker.
Based on how Java works, to share some data among threads I would need one worker and multiple threads. Based on this other link, I know it is possible, so I assume I can change my Gunicorn configuration as follows:
```
bind = ""
workers = 1
threads = 10
timeout = 900
preload_app = True
```
This should give me 1 worker with 10 threads, which should be able to process the same number of requests as the current configuration. The question is: would the cache still be instantiated once and shared among all the threads? How, or where, should I instantiate the cache to make sure it is shared among all the threads?
would like to ... multi-thread instead of multi-process.
I'm not sure you really want that. Python is rather different from Java.
workers = 10
One way to read that is "ten cores", sure.
But another way is "wow, we get ten GILs!"
The global interpreter lock must be held
before the interpreter interprets a new bytecode instruction.
Ten interpreters offer significant parallelism,
executing up to ten instructions simultaneously.
Now, there are workloads dominated by async I/O, or where
the interpreter calls into a C extension to do the bulk of the work.
If a C thread can keep running, doing useful work
in the background, and the interpreter gathers the result later,
terrific. But that's not most workloads.
tl;dr: You probably want ten GILs, rather than just one.
To avoid caching the same data in those 10 workers
Right! That makes perfect sense.
Consider pushing the cache into a storage layer, or a daemon like Redis.
Or access a memory-resident cache, in the context of your own process,
via mmap or shmat.
When running Flask under Gunicorn, you are certainly free
to set threads greater than 1,
though it's likely not what you want.
YMMV. Measure and see.

Querying multiple sensors regularly using NodeJS

I need to fetch the values of about 200 sensors every 15 seconds or so. To fetch a value I simply need to make an HTTP call with basic authentication and parse the response. The catch is that these sensors might be on slow connections, so I need to allow at least 5 seconds per sensor (usually they respond a lot quicker, but there are always some that are slow and time out).
So right now I have the following setup for that:
There is a NodeJS process that is connected to my DB and knows all about the sensors. It checks regularly whether sensors have been added or deleted. It spawns a child process for every sensor, restarts a child process if it dies, and kills it if its sensor gets deleted. Each child process makes the HTTP call to its sensor with a 5-second timeout and, if it receives a value, saves it to Redis; it then repeats in an infinite loop using a 15-second setTimeout. A third process copies all the values from Redis to the main MySQL DB.
This has been a working solution for half a year, but after a major system upgrade (from Ubuntu 14.04 to 18.04, and thus every package upgraded as well) it seems to leak memory, and I can't figure out where.
Right after startup, the processes take about 1.5 GB of memory in total. After a day or so this goes up to 3 GB and keeps growing, and before the machine runs out of memory I have to kill all node processes and restart the whole thing.
So now I am trying to find more efficient ways to achieve the same result (query around 200-300 URLs every 15 seconds and store the results in MySQL). At the moment I'm thinking of ditching Redis: the child processes would communicate with their master process, and the master process would write to MySQL directly. This way I don't need to load the Redis library into every child process, which might save me some time.
So I need ideas on how to reduce the memory usage of this application. (I'm limited to PHP and NodeJS, mainly because of my knowledge, so writing a native daemon might be out of the question.)
The solution was easier than I thought: I rewrote the child process as a native bash script, and that brought the memory usage down to almost zero.
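For comparison, because the work is I/O-bound, the same polling can also be done without one child process per sensor by issuing all requests concurrently from a single Node.js process. This is a sketch of that alternative, not what the poster ended up doing; it assumes Node.js 18+ (global `fetch` and `AbortSignal.timeout`), and the sensor object shape `{ id, url, user, password }` and the function names are hypothetical. Parsing the body and writing the results to MySQL are left out.

```javascript
// Build the value of an HTTP Basic authentication header.
function basicAuthHeader(user, password) {
  return "Basic " + Buffer.from(`${user}:${password}`).toString("base64");
}

// Fetch one sensor's reading, giving up after `timeoutMs`.
// Resolves to null on timeout, network error, or a non-2xx status,
// so one bad sensor never rejects the whole round.
async function pollSensor(sensor, timeoutMs, fetchImpl = globalThis.fetch) {
  try {
    const res = await fetchImpl(sensor.url, {
      headers: { Authorization: basicAuthHeader(sensor.user, sensor.password) },
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok ? await res.text() : null;
  } catch {
    return null;
  }
}

// Poll all sensors concurrently; a round takes roughly as long as the slowest sensor,
// bounded by timeoutMs, instead of the sum of all response times.
async function pollAll(sensors, timeoutMs = 5000, fetchImpl = globalThis.fetch) {
  const values = await Promise.all(sensors.map((s) => pollSensor(s, timeoutMs, fetchImpl)));
  return sensors.map((s, i) => ({ id: s.id, value: values[i] }));
}

// Hypothetical scheduling loop: start the next round 15 s after the previous one finishes.
// async function loop(sensors) {
//   const readings = await pollAll(sensors);
//   /* write `readings` to MySQL here */
//   setTimeout(() => loop(sensors), 15000);
// }
```

The `fetchImpl` parameter is only there so the polling logic can be exercised without real sensors.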

Why are concurrent lambda requests being kicked off late?

I'm running load tests on AWS Lambda with Charles Proxy, but I'm confused by the timeline chart it produces. I've set up a test with 100 concurrent connections. I expect varying degrees of latency, but I also expect all 100 requests to be kicked off at the same time (hence the concurrency setting in Charles Proxy's Repeat Advanced feature). However, I'm seeing some requests start a bit late, if I understand the chart correctly.
With only 100 invocations, I should be well within the AWS Lambda concurrency limit, so why are these requests being kicked off late (see requests 55 - 62 on the attached image)?
Lambda can take anywhere from a few hundred milliseconds to 1-2 seconds to start up when it is in a "cold state". Cold means it needs to download your package, unpack it, load it into memory, and then start executing your code. After execution, the container is kept "alive" for a while, roughly 5 to 30 minutes ("warm state"), although AWS does not guarantee a specific duration. If you send another request while it is warm, startup is much faster.
You probably had a few containers already warm when you started your test, and those started up faster. Since the other requests arrived concurrently, Lambda needed to start more containers, and those came from a cold state; hence the time difference you see in the chart.
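To check this explanation during a load test, a handler can report whether it ran in a fresh container. The sketch below relies on module-scope code running once per container (during initialization) while the handler runs on every invocation; the returned field names are illustrative, not part of any AWS API.

```javascript
// Module scope: runs once per container, i.e. during a cold start.
const containerStartedAt = Date.now();
let invocationCount = 0;

// Handler: runs on every invocation served by this container.
async function handler(event) {
  invocationCount += 1;
  return {
    coldStart: invocationCount === 1, // true only for the first request this container serves
    invocationCount,
    containerAgeMs: Date.now() - containerStartedAt,
  };
}

module.exports = { handler };
```

If cold starts explain the chart, the requests that started late should mostly be the ones whose responses report `coldStart: true`.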

Can I use child process or cluster to do custom function calls in node?

I have a node program that does a lot of heavy synchronous work. The work that needs to be done could easily be split into several parts. I would like to utilize all processor cores on my machine for this. Is this possible?
From the docs on child processes and clusters I see no obvious solution. Child processes seem to be focused on running external programs, and clusters only work for incoming HTTP connections (or have I misunderstood that?).
I have a simple function var output = fn(input) and would just like to run it several times, spread all the calls across the cores on my machine and provide the result in a callback. Can that be done?
Yes, child processes and clusters are the way to do that. There are a couple of ways of implementing a solution to your problem.
Your server creates a queue and manages that queue. Whenever you need to call your function, you will drop it into the queue. You will then process the queue N items at a time, where N equals the number of your cores. When you start processing, you will spawn a child process, probably either using spawn or exec, with the argument being another standalone Node.js script, along with any additional parameters (it's just a command line call, basically). Inside that script you will do your work, and emit the result back to the server. The worker is then freed up.
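The first approach can be sketched as follows, assuming a standalone worker script that reads its input from the command line and prints its result as JSON on stdout. To keep the example self-contained, the worker script is written to a temporary file; `fn` (here a simple sum) stands in for the real heavy synchronous function, and `runInChild` and `mapAcrossCores` are hypothetical names. At most `concurrency` child processes (by default one per CPU core) run at once.

```javascript
const { spawn } = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Standalone worker script: parse the input from argv, run fn(input), print the result as JSON.
const WORKER_SOURCE = `
function fn(n) { // stand-in for the real heavy synchronous function
  let sum = 0;
  for (let i = 0; i <= n; i++) sum += i;
  return sum;
}
const input = JSON.parse(process.argv[2]);
process.stdout.write(JSON.stringify(fn(input)));
`;
const WORKER_PATH = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "fn-")), "worker.js");
fs.writeFileSync(WORKER_PATH, WORKER_SOURCE);

// Run the worker once in a separate Node.js process and resolve with its parsed output.
function runInChild(workerPath, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, [workerPath, JSON.stringify(input)], {
      stdio: ["ignore", "pipe", "inherit"],
    });
    let out = "";
    child.stdout.setEncoding("utf8");
    child.stdout.on("data", (chunk) => (out += chunk));
    child.on("error", reject);
    // "close" fires after stdout has been fully read, so `out` is complete here.
    child.on("close", (code) => {
      if (code !== 0) return reject(new Error(`worker exited with code ${code}`));
      try {
        resolve(JSON.parse(out));
      } catch (err) {
        reject(err);
      }
    });
  });
}

// Process a queue of inputs, keeping at most `concurrency` child processes busy at once.
async function mapAcrossCores(inputs, workerPath, concurrency = Math.max(1, os.cpus().length)) {
  const results = new Array(inputs.length);
  let next = 0;
  // Each "lane" repeatedly takes the next queued input until the queue is empty.
  async function lane() {
    while (next < inputs.length) {
      const i = next++;
      results[i] = await runInChild(workerPath, inputs[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(concurrency, inputs.length) }, lane));
  return results;
}

// Callback-style usage, as asked in the question:
// mapAcrossCores([10, 100, 1000], WORKER_PATH).then((r) => callback(null, r), callback);
```

Spawning one process per item keeps the sketch simple; when items are small, the process start-up cost can dominate, which is one reason to prefer long-lived workers or an existing queue library.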
You can create a dedicated server with cluster, where all it does is run your function. With the cluster module you can (once again) create N workers and delegate work to them.
Now this may seem like a lot of work, and it is. For that reason, you should use an existing library, as this is, for the most part, a solved problem at this point. I really like Redis-based queues, so if you're interested in that, see this answer for some queue recommendations.

CRON + Nodejs + multiple cores => behaviour?

I'm building a CRON-like module into my service (using node-schedule) that will be required into each instance of my multi-core setup. Since the instances are all running their own threads and are all scheduled to run at the same time, will the job be called once for every thread, or just once because they're all loading the same module?
If they do get called multiple times, then what is the best way to make sure the desired actions only get called once?
If you are using pm2 in cluster mode, you can use process.env.NODE_APP_INSTANCE to detect which instance is running. With the following code, your cron jobs will only be scheduled in one instance:
```javascript
// Run cron jobs only in the first pm2 instance (environment variables are always strings)
if (process.env.NODE_APP_INSTANCE === '0') {
  // cron jobs
}
```
node-schedule runs inside a single node process and only schedules what that particular process asked it to schedule.
If you are running multiple node processes and each is using node-schedule, then the node-schedule instances within those separate processes are all independent (there is no cooperation or coordination between them). If each node process asks its own node-schedule instance to run a particular task at 3pm on the first Wednesday of the month, then all the node processes will start running that task at that time.
If you only want the action carried out once, then you have to either coordinate among your node instances so that the action is scheduled in only one node process, or only schedule these types of operations in one of your node instances to begin with.
The best way to handle this generically is to have a shared database that each instance writes a "lock" entry to. For example, let's say all tasks write a DB entry such as {instanceId: "a", taskId: "myTask", timestamp: "2021-12-22:10:35"}.
All tasks submit the same entry, except each with its own instanceId. You then put a unique index on taskId and timestamp so that only one entry gets accepted.
Then they all query to see whether their instance was the one accepted to run the cron job.
You could do the same thing but also add a "random" field that generates a random number and the task with the lowest number wins.
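The lock-entry approach can be sketched with an in-memory stand-in for the database table. `LockTable` and `runIfElected` are hypothetical; in a real deployment the uniqueness check has to be done by the shared database itself (a unique index covering the task and its scheduled time), because an in-memory map is not shared between processes. Every instance must also derive the same `timestamp` value, for example the scheduled slot time rather than the moment its timer happened to fire.

```javascript
// In-memory stand-in for a DB table with a UNIQUE index on (taskId, timestamp).
// A real setup would rely on the database rejecting the duplicate INSERT instead.
class LockTable {
  constructor() {
    this.rows = new Map();
  }

  // Returns true if the row was accepted, false if (taskId, timestamp) is already taken.
  insert(row) {
    const key = `${row.taskId}|${row.timestamp}`;
    if (this.rows.has(key)) return false;
    this.rows.set(key, row);
    return true;
  }

  // Returns the instanceId whose entry was accepted for this task and time slot.
  winner(taskId, timestamp) {
    const row = this.rows.get(`${taskId}|${timestamp}`);
    return row ? row.instanceId : undefined;
  }
}

// Every instance calls this when its schedule fires; only the accepted instance runs the job.
function runIfElected(table, instanceId, taskId, timestamp, job) {
  table.insert({ instanceId, taskId, timestamp }); // duplicates are rejected
  if (table.winner(taskId, timestamp) !== instanceId) return false;
  job();
  return true;
}
```

In this in-memory version the first caller always wins; with a real database, whichever insert commits first wins, and the follow-up query tells every instance whether it was that one.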
