I have a dictionary in my Flask app which is global and contains information about the logged-in user.
globalDict = {}
globalDict[username] = "Some Important info"
When a user logs in, globalDict is populated, and when the user logs out, the entry is removed. When I run uWSGI with a single process, there is obviously no problem.
When I run it with multiple processes, the dictionary sometimes turns out to be empty when printed. I assume this is because each process has its own copy of globalDict.
How can I share globalDict across all processes in my Flask application?
P.S. I use only uwsgi for hosting my server and nothing else.
Give the uWSGI cache a try.
uWSGI startup:
uwsgi --cache2 name=mycache,items=100 --ini $conffile
write example:
uwsgi.cache_update(this_item["sha"], encoded_blob, 0, "mycache")
read example:
value_from_cache = uwsgi.cache_get(this_item["sha"], "mycache")
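A possible convenience layer on top of those calls: uWSGI cache values must be bytes, and the uwsgi module is only importable inside a uWSGI-run process, so this sketch serializes values to JSON and falls back to a plain dict for local development (the helper names and the fallback are my own, not part of uWSGI):

```python
import json

try:
    import uwsgi  # only importable when running under the uWSGI server

    def cache_set(key, value, cache="mycache"):
        # cache values must be bytes; serialize to JSON first
        uwsgi.cache_update(key, json.dumps(value).encode("utf-8"), 0, cache)

    def cache_get(key, cache="mycache"):
        raw = uwsgi.cache_get(key, cache)
        return json.loads(raw) if raw is not None else None

except ImportError:
    _local = {}  # per-process fallback for local development only

    def cache_set(key, value, cache="mycache"):
        _local[key] = json.dumps(value)

    def cache_get(key, cache="mycache"):
        raw = _local.get(key)
        return json.loads(raw) if raw is not None else None
```

Keep in mind the cache itself is created at startup (the --cache2 option above) with a fixed number of items.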
I have a Python (3.x) web service deployed on GCP. Every time Cloud Run shuts down instances, most noticeably after a big load spike, I get many logs like Uncaught signal: 6, pid=6, tid=6, fault_addr=0. together with [CRITICAL] WORKER TIMEOUT (pid:6). They are always signal 6.
The service uses FastAPI and Gunicorn, running in Docker with this start command:
CMD gunicorn -w 2 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8080 app.__main__:app
The service is deployed using Terraform with 1 GiB of RAM, 2 CPUs, and a timeout of 2 minutes:
resource "google_cloud_run_service" <ressource-name> {
  name     = <name>
  location = <location>
  template {
    spec {
      service_account_name = <sa-email>
      timeout_seconds      = 120
      containers {
        image = var.image
        env {
          name  = "GCP_PROJECT"
          value = var.project
        }
        env {
          name  = "BRANCH_NAME"
          value = var.branch
        }
        resources {
          limits = {
            cpu    = "2000m"
            memory = "1Gi"
          }
        }
      }
    }
  }
  autogenerate_revision_name = true
}
I have already tried tweaking the resources and timeout in Cloud Run, and using the --timeout and --preload flags for gunicorn, as that is what people seem to recommend when googling the problem, but all without success. I also don't know exactly why the workers are timing out.
Extending on the top answer, which is correct: you are using Gunicorn, a process manager that manages the Uvicorn processes that run the actual app.
When Cloud Run wants to shut down the instance (probably due to a lack of requests), it sends signal 6 to process 1. However, Gunicorn occupies that process as the manager and does not pass the signal on to the Uvicorn workers for handling, so you see the uncaught signal 6.
The simplest solution is to run Uvicorn directly instead of through Gunicorn (possibly with a smaller instance) and let Cloud Run handle the scaling:
CMD ["uvicorn", "app.__main__:app", "--host", "0.0.0.0", "--port", "8080"]
Unless you have enabled "CPU always allocated", background threads and processes may stop receiving CPU time after all HTTP requests return. This means background threads and processes can fail, connections can time out, etc. I cannot think of any benefit to running background workers with Cloud Run except when setting the --cpu-no-throttling flag. Cloud Run instances that are not processing requests can be terminated.
Signal 6 means abort, which terminates processes. This probably means your container is being terminated due to a lack of requests to process.
Run more workloads on Cloud Run with new CPU allocation controls
What if my application is doing background work outside of request processing?
This error happens when a background process is aborted. There are advantages to running background threads on Cloud Run, just as for other applications, and you can still use them without processes getting aborted. To do so, when deploying, choose the option "CPU always allocated" instead of "CPU only allocated during request processing".
For more details, check https://cloud.google.com/run/docs/configuring/cpu-allocation
I need to fix a problem with django-celery using Redis as the broker. Celery receives the task and accepts it, but then fails, raising an exception that the task is not registered.
celery -A proj inspect registered shows that my tasks are registered.
I have also run into this problem. You might want to check a few things:
1. the directory from which you are running the celery application
2. the directory you supplied to the include parameter
Celery is a little bit picky about full versus relative paths when looking for your tasks.
Here is an example of what my app looks like:
|Dir
|_to
|_my
|_app
|_celery.py
|_producer.py
|_tasksfolder
|_consumer.py
In celery.py, this is how I provide the path to my tasks:
celery_app = Celery(
    "some_name",
    broker="amqp://localhost//",
    backend="db+YOUR_DB://user:pw@localhost:port/db",
    include=["tasksfolder.consumer"],
)
And in /Dir/to/my/app I execute celery -A celery worker -l info
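To see why the import path changes the task name (and hence registration), here is a stdlib-only sketch: Celery registers each task under "<module name>.<function name>", where the module name is whatever the file was imported as, so a producer and a worker that import the same file under different paths end up with different task names. The folder and module names below mirror the layout above but are otherwise made up:

```python
import os
import sys
import tempfile

src = "def mytask():\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "tasksfolder"))
    open(os.path.join(tmp, "tasksfolder", "__init__.py"), "w").close()
    with open(os.path.join(tmp, "tasksfolder", "consumer.py"), "w") as f:
        f.write(src)

    sys.path.insert(0, tmp)
    import tasksfolder.consumer as as_package  # imported as a package module
    sys.path.insert(0, os.path.join(tmp, "tasksfolder"))
    import consumer as as_top_level            # same file, different module name

# Same file on disk, two different module names, so Celery would register
# "tasksfolder.consumer.mytask" in one process and "consumer.mytask" in the other
print(as_package.__name__)    # tasksfolder.consumer
print(as_top_level.__name__)  # consumer
```
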
Hope this helps
Is that the only Celery worker in your cluster, or do you have more? You did not show the entire trace, so I am not sure... What I am sure of, though, is that the exception was not thrown on your celery#a4983e769717 worker! You can verify this by examining that particular worker's log (I hope you are writing logs to a file).
The exception was thrown on a worker that has old code. You can even find which one by examining more carefully the output of the same command you used above: celery -A proj inspect registered. Find the worker(s) that do not have run_saving_aggregate_data registered; at least one of them does not.
PS. You do not have to use screenshots; you can paste the text into a pre-formatted block...
I'm setting up a Flask app with Gunicorn in a Docker environment.
When I spin up my containers, I want my Flask container to create database tables (based on my models) if my database is empty. I included a function in my wsgi.py file, but that seems to trigger each time a worker is initialized. After that, I tried to use server hooks in my gunicorn.py config file, like below.
"""gunicorn WSGI server configuration."""
from multiprocessing import cpu_count

from setup import init_database


def on_starting(server):
    """Executes code before the master process is initialized"""
    init_database()


def max_workers():
    """Returns an amount of workers based on the number of CPUs in the system"""
    return 2 * cpu_count() + 1


bind = '0.0.0.0:8000'
worker_class = 'eventlet'
workers = max_workers()
I expect gunicorn to trigger the on_starting function automatically, but the hook never seems to fire. The app starts up normally, but when I make a request that inserts a database entry, it says the table doesn't exist. How do I trigger the on_starting hook?
I fixed my issue by preloading the app before the workers that serve it are created. I did this by adding this line to my gunicorn.py config file:
...
preload_app = True
This way the app is already running and can accept commands to create the necessary database tables.
Gunicorn imports a module in order to get at app (or whatever other name you tell Gunicorn the WSGI application object lives at). During that import, which happens before Gunicorn starts directing traffic to the app, code is executed. Put your startup code there, after you've created db (assuming you're using SQLAlchemy) and imported your models (so that SQLAlchemy will know about them and hence what tables to create).
Alternatively, populate your container with a pre-created database.
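The init_database from the question is not shown anywhere; for illustration, an idempotent initializer along these lines (stdlib sqlite3, table name invented) is safe to run at import time or in a hook, because re-running it against an already-initialized database changes nothing:

```python
import sqlite3

def init_database(path="app.db"):
    # CREATE TABLE IF NOT EXISTS makes this safe to call repeatedly
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    conn.commit()
    return conn
```
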
Env.: Node.js on Ubuntu, using PM2 programmatically.
I have started PM2 with 3 instances via Node on my main code. Suppose I use the PM2 command line to delete one of the instances. Can I add back another worker to the pool? Can this be done without affecting the operation of the other workers?
I suppose I should use the start method:
pm2.start({
    name: 'worker',
    script: 'api/workers/worker.js',  // script to be run
    exec_mode: 'cluster',             // or 'fork'
    instances: 1,                     // number of instances to launch
    max_memory_restart: '100M',       // optional: restart if memory exceeds 100 MB
    autorestart: true
}, function(err, apps) {
    pm2.disconnect();
});
However, if you use pm2 monit you'll see that the 2 existing instances are restarted and no new one is created. The result is still 2 running instances.
Update
It doesn't matter whether it's cluster or fork; the behavior is the same.
Update 2
The command line has the scale option (https://keymetrics.io/2015/03/26/pm2-clustering-made-easy/), but I don't see this method in the programmatic API documentation (https://github.com/Unitech/PM2/blob/master/ADVANCED_README.md#programmatic-api).
I actually think this can't be done in PM2, as I have the exact same problem.
I'm sorry, but I think the solution is to use something else, as PM2 is fairly limited. The lack of the ability to add more workers is a deal breaker for me.
I know you can scale on the command line if you are using clustering, but I have no idea why you cannot start more instances if you are using fork. It makes no sense.
As far as I know, all PM2 commands can also be used programmatically, including scale. Check out CLI.js to see all available methods.
Try using the force attribute in the application declaration. If force is true, you can start the same script several times, which is usually not allowed by PM2 (according to the Application Declaration docs).
By the way, autorestart is true by default.
You can do so by using an ecosystem.config file.
Inside that file you can specify as many worker processes as you want.
E.g. we used BullJS to develop a microservice architecture of different workers that are started with the help of PM2 on multiple cores: the same worker started as named instances multiple times.
Now when jobs are run, BullJS load-balances the workload for one specific worker across all available instances of that worker.
You can of course start or stop any instance via the CLI, and also start additional named workers via the command line to increase the number of workers (e.g. if many jobs need to be run and you want to process more jobs at a time):
pm2 start './script/to/start.js' --name additional-worker-4
pm2 start './script/to/start.js' --name additional-worker-5
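For reference, a minimal ecosystem.config.js for the named-worker setup described above might look like this (names and paths are placeholders):

```javascript
// ecosystem.config.js -- start with `pm2 start ecosystem.config.js`
module.exports = {
  apps: [
    {
      name: "worker-1",
      script: "./script/to/start.js",
      max_memory_restart: "100M",
    },
    {
      name: "worker-2",
      script: "./script/to/start.js",
      max_memory_restart: "100M",
    },
  ],
};
```
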
I'm running a Spark cluster in standalone mode.
I've submitted a Spark application in cluster mode using options:
--deploy-mode cluster --supervise
So that the job is fault tolerant.
Now I need to keep the cluster running but stop the application from running.
Things I have tried:
Stopping the cluster and restarting it. But the application resumes execution when I do that.
Used kill -9 on a daemon named DriverWrapper, but the job resumes again after that.
I've also removed temporary files and directories and restarted the cluster but the job resumes again.
So the running application is really fault tolerant.
Question:
Based on the above scenario can someone suggest how I can stop the job from running or what else I can try to stop the application from running but keep the cluster running.
Something just occurred to me: if I call sparkContext.stop() that should do it, but that requires a bit of work in the code, which is OK. Can you suggest any other way without a code change?
If you wish to kill an application that is failing repeatedly, you may do so through:
./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>
You can find the driver ID through the standalone Master web UI at http://<master url>:8080.
From Spark Doc
Revisiting this because I wasn't able to use the existing answer without debugging a few things.
My goal was to programmatically kill a driver that runs persistently once a day, deploy any updates to the code, then restart it, so I won't know ahead of time what my driver ID is. It took me some time to figure out that you can only kill drivers that were submitted with the --deploy-mode cluster option. It also took me some time to realize that there is a difference between the application ID and the driver ID: while you can easily correlate an application name with an application ID, I have yet to find a way to discover the driver ID through the API endpoints and correlate it to either an application name or the class you are running. So while spark-class org.apache.spark.deploy.Client kill <master url> <driver ID> works, you need to make sure you are deploying your driver in cluster mode and are using the driver ID, not the application ID.
Additionally, there is a submission endpoint that spark provides by default at http://<spark master>:6066/v1/submissions and you can use http://<spark master>:6066/v1/submissions/kill/<driver ID> to kill your driver.
Since I wasn't able to find the driver ID that correlated to a specific job from any api endpoint, I wrote a python web scraper to get the info from the basic spark master web page at port 8080 then kill it using the endpoint at port 6066. I'd prefer to get this data in a supported way, but this is the best solution I could find.
#!/usr/bin/python
import sys, re, requests, json
from selenium import webdriver

classes_to_kill = sys.argv[1:]  # class names passed on the command line
spark_master = 'masterurl'

driver = webdriver.PhantomJS()
driver.get("http://" + spark_master + ":8080/")

for running_driver in driver.find_elements_by_xpath("//*/div/h4[contains(text(), 'Running Drivers')]"):
    for driver_id in running_driver.find_elements_by_xpath("..//table/tbody/tr/td[contains(text(), 'driver-')]"):
        for class_to_kill in classes_to_kill:
            right_class = driver_id.find_elements_by_xpath("../td[text()='" + class_to_kill + "']")
            if len(right_class) > 0:
                driver_to_kill = re.search(r'^driver-\S+', driver_id.text).group(0)
                print("Killing " + driver_to_kill)
                result = requests.post("http://" + spark_master + ":6066/v1/submissions/kill/" + driver_to_kill)
                print(json.dumps(json.loads(result.text), indent=4))

driver.quit()
https://community.cloudera.com/t5/Support-Questions/What-is-the-correct-way-to-start-stop-spark-streaming-jobs/td-p/30183
According to this link, if your master runs on YARN you can stop the application with:
yarn application -list
yarn application -kill application_id