I have a fargate cluster running a node.js API, running on fargate 1.4.0
I have maybe 8-25 instances running depending on the load. Instances are defined with these parameters using aws CDK:
cpu: 512,
assignPublicIp: true,
memoryLimitMiB: 2048,
publicLoadBalancer: true,
Like few times a day I get error like this:
error Command failed with signal "SIGKILL".
I though I was running out of memory, so I've configured node, to start with less memory like this: NODE_OPTIONS=--max_old_space_size=900
This made it less likely to occur, but I am still getting some SIGKILLs.
When looking at the instances at runtime I see they have plenty of memory free on the OS level:
{
"freemem": "6.95GB",
"totalmem": "7.79GB",
"max_old_space_size": 813.1680679321289,
"processUptime": "46m",
"osUptime": "49m",
"rssMemory": "396.89MB"
}
Why is fargate still killing those instances? Is there a way to find out most memory hungry processes just before the SIGKILL?
Related
I have a Python (3.x) webservice deployed in GCP. Everytime Cloud Run is shutting down instances, most noticeably after a big load spike, I get many logs like these Uncaught signal: 6, pid=6, tid=6, fault_addr=0. together with [CRITICAL] WORKER TIMEOUT (pid:6) They are always signal 6.
The service is using FastAPI and Gunicorn running in a Docker with this start command
CMD gunicorn -w 2 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8080 app.__main__:app
The service is deployed using Terraform with 1 gig of ram, 2 cpu's and the timeout is set to 2 minutes
resource "google_cloud_run_service" <ressource-name> {
name = <name>
location = <location>
template {
spec {
service_account_name = <sa-email>
timeout_seconds = 120
containers {
image = var.image
env {
name = "GCP_PROJECT"
value = var.project
}
env {
name = "BRANCH_NAME"
value = var.branch
}
resources {
limits = {
cpu = "2000m"
memory = "1Gi"
}
}
}
}
}
autogenerate_revision_name = true
}
I have already tried tweaking the resources and timeout in Cloud Run, using the --timeout and --preload flag for gunicorn as that is what people always seem to recommend when googling the problem but all without success. I also dont exactly know why the workers are timing out.
Extending on the top answer which is correct, You are using GUnicorn which is a process manager that manages Uvicorn processes which runs the actual app.
When Cloudrun wants to shutdown the instance (due to lack of requests probably) it will send a signal 6 to process 1. However, GUnicorn occupies this process as the manager and will not pass it to the Uvicorn workers for handling - thus you receive the Unhandled signal 6.
The simplest solution, is to run Uvicorn directly instead of through GUnicorn (possibly with a smaller instance) and allow the scaling part to be handled via Cloudrun.
CMD ["uvicorn", "app.__main__:app", "--host", "0.0.0.0", "--port", "8080"]
Unless you have enabled CPU is always allocated, background threads and processes might stop receiving CPU time after all HTTP requests return. This means background threads and processes can fail, connections can timeout, etc. I cannot think of any benefits to running background workers with Cloud Run except when setting the --cpu-no-throttling flag. Cloud Run instances that are not processing requests, can be terminated.
Signal 6 means abort which terminates processes. This probably means your container is being terminated due to a lack of requests to process.
Run more workloads on Cloud Run with new CPU allocation controls
What if my application is doing background work outside of request processing?
This error happens when a background process is aborted. There are some advantages of running background threads on cloud just like for other applications. Luckily, you can still use them on Cloud Run without processes getting aborted. To do so, when deploying, chose the option "CPU always allocated" instead of "CPU only allocated during request processing"
For more details, check https://cloud.google.com/run/docs/configuring/cpu-allocation
I have two services deployed on Google cloud infrastructure; Service 1 runs on Compute Engine and Service 2 on Cloud Run and I'd like to log their memory usage via the ekg-core library (https://hackage.haskell.org/package/ekg-core-0.1.1.7/docs/System-Metrics.html).
The logging bracket is similar to this :
mems <- newStore
registerGcMetrics mems
void $ concurrently io (loop mems)
where
loop ms = do
m <- sampleAll ms
... (lookup the gauges from m and log their values)
threadDelay dt
loop ms
I'm very puzzled by this: both rts.gc.current_bytes_used and rts.gc.max_bytes_used gauges return constant 0 in the case of Service 2 (the Cloud Run one), even though I'm using the same sampling/logging functionality and build options for both services. I should add that the concurrent process in concurrently is a web server, and I expect the base memory load to be around 200 KB, not 0B.
This is about where my knowledge ends; could this behaviour be due to the Google Cloud Run hypervisor ("gVisor") implementing certain syscalls in a non-standard way (gVisor syscall guide : https://gvisor.dev/docs/user_guide/compatibility/linux/amd64/) ?
Thank you for any pointers to guides/manuals/computer wisdom.
Details :
Both are built with these options :
-optl-pthread -optc-Os -threaded -rtsopts -with-rtsopts=-N -with-rtsopts=-T
the only difference is that Service 2 has an additional flag -with-rtsopts=-M2G since Cloud Run services must work with 2 GB of memory at most.
The container OS in both cases is Debian 10.4 ("Buster").
Thinking a bit longer about this, this behaviour is perfectly reasonable in the "serverless" model; resources(both CPU and memory) are throttled down to 0 when the service is not processing requests [1], which is exactly what ekg picks up.
Why logs are printed out even outside of requests is still a bit of a mystery, though ..
[1] https://cloud.google.com/run/docs/reference/container-contract#lifecycle
I am using sharp image processing module to resize the image and render it on UI.
app.get('/api/preview-small/:filename',(req,res)=>{
let filename = req.params.filename;
sharp('files/' + filename)
.resize(200, 200, {
fit: sharp.fit.inside,
withoutEnlargement: true
})
.toFormat('jpeg')
.toBuffer()
.then(function(outputBuffer) {
res.writeHead('200',{"Content-Type":"image/jpeg"});
res.write(outputBuffer);
res.end();
});
});
I am running above code on a single board computer Rock64 with 1 GB ram. When I run a Linux htop command and monitor the memory utilization, I could see the memory usage is adding up exponentially from 10% to 60% after every call to the nodejs app and it never comes down.
CPU USAGE
Though it does not give any issue running the application, my only concern is memory usage does not come down, even when the app is not running and I am not sure if this will crash the application eventually if this application runs continuously.
or if I move a similar code snippet to the cloud will it keep occupying memory even when it's not running?
Anyone who is using sharp module facing the similar issue or is this a known issue with node.js. Do we have a way to flush out/clear out the memory or will node do garbage collection?
Any help is appreciated. Thanks
sharp has some memory debugging stuff built in:
http://sharp.dimens.io/en/stable/api-utility/#cache
You can control the libvips cache, and get stats about resource usage.
The node version has a very strong effect on memory behaviour. This has been discussed a lot on the sharp issue tracker, see for example:
https://github.com/lovell/sharp/issues/429
Or perhaps:
https://github.com/lovell/sharp/issues/778
I'm running a kubernetes cluster and one microservice is constantly crashing with exitCode 134. I already changed the resource memory limit to 6Gi
resources: {
limits: {
memory: "6Gi"
}
}
but the pod never goes above 1.6/1.7Gi.
What may be missing?
It's not about Kubernetes memory limit. Default JavaScript Heap limit is 1.76GB when running in node (v8 engine).
The command-line in Deployment/Pod should be changed like node --max-old-space-size=6144 index.js.
Env.: Node.js on Ubuntu, using PM2 programmatically.
I have started PM2 with 3 instances via Node on my main code. Suppose I use the PM2 command line to delete one of the instances. Can I add back another worker to the pool? Can this be done without affecting the operation of the other workers?
I suppose I should use the start method:
pm2.start({
name : 'worker',
script : 'api/workers/worker.js', // Script to be run
exec_mode : 'cluster', // OR FORK
instances : 1, // Optional: Scale your app by 4
max_memory_restart : '100M', // Optional: Restart your app if it reaches 100Mo
autorestart : true
}, function(err, apps) {
pm2.disconnect();
});
However, if you use pm2 monit you'll see that the 2 existing instances are restarted and no other is created. Result is still 2 running instances.
update
it doesn't matter if cluster or fork -- behavior is the same.
update 2 The command line has the scale option ( https://keymetrics.io/2015/03/26/pm2-clustering-made-easy/ ), but I don't see this method on the programmatic API documentation ( https://github.com/Unitech/PM2/blob/master/ADVANCED_README.md#programmatic-api ).
I actually think this can't be done in PM2 as I have the exact same problem.
I'm sorry, but I think the solution is to use something else as PM2 is fairly limited. The lack of ability to add more workers is a deal breaker for me.
I know you can "scale" on the command line if you are using clustering but I have no idea why you can not start more instances if you are using fork. It makes no sense.
As I know, all commands of PM2 can also be used programmatically, including scale. Check out CLI.js to see all available methods.
Try to use the force attribute in the application declaration. If force is true, you can start the same script several times, which is usually not allowed by PM2 (according to the Application Declaration
docs)
By the way, autorestart it's true by default.
You can do so by use of a ecosystem.config file.
Inside that file you can specify as much worker processes as you want.
E.g. we used BullJS to develop a microservice architecture of different workers that are started with the help of PM2 on multiple cores: The same worker started as named instances multiple times.
Now when jobs are run BullJS load balances the workloads for one specific worker on all available instances for that worker.
You could of course start or stop any instance via CLI and also start additional named workers via the command line to increase the amount of workers (e.g. if many jobs need to be run and you want to process more jobs at a time):
pm2 start './script/to/start.js' --name additional-worker-4
pm2 start './script/to/start.js' --name additional-worker-5