I am trying to set up a Celery application under Flask to accept API requests, with separate Celery workers performing the long-running tasks. My problem is that Flask and everything else in my environment use MongoDB, so I do not want to set up a separate SQL database just for the Celery results. I cannot find any good examples of how to properly configure Celery with a MongoDB cluster as the backend.
Here are the settings I have tried to make it accept:
CELERY_RESULT_BACKEND = "mongodb"
CELERY_MONGODB_BACKEND_SETTINGS = {
    "host": "mongodb://mongodev:27017",
    "database": "celery",
    "taskmeta_collection": "celery_taskmeta",
}
No matter what I do, Celery seems to ignore the config settings and launches without any results backend. Does anyone have a working example using the latest version of Celery? The only other examples I can find are for Celery v3 setups, and those did not work for me either, since I am using a Mongo replica cluster in production, which seems to be unsupported in that version.
[Edit] Adding more information on the complicated way I am setting the config so it works with the rest of the application.
The config values are first passed as environment variables through a docker-compose file like this:
environment:
  - PYTHONPATH=/usr/src/
  - APP_SETTINGS=config.DevelopmentConfig
  - FLASK_ENV=development
  - CELERY_BROKER_URL=amqp://guest:guest@rabbit1:5672
  - CELERY_BROKER_DEV=amqp://guest:guest@rabbit1:5672
  - CELERY_RESULT_SERIALIZER=json
  - CELERY_RESULT_BACKEND=mongodb
  - CELERY_MONGODB_BACKEND_SETTINGS={"host":"mongodb://mongodev:27017","database":"celery","taskmeta_collection":"celery_taskmeta"}
Then, inside the config.py file they are loaded:
import ast
import os

class DevelopmentConfig(BaseConfig):
    """Development configuration"""
    CELERY_BROKER_URL = os.getenv('CELERY_BROKER_DEV')
    CELERY_RESULT_SERIALIZER = os.getenv('CELERY_RESULT_SERIALIZER')
    CELERY_RESULT_BACKEND = os.getenv('CELERY_RESULT_BACKEND')
    CELERY_MONGODB_BACKEND_SETTINGS = ast.literal_eval(os.getenv('CELERY_MONGODB_BACKEND_SETTINGS'))
Then, when Celery is initiated, the config is loaded:
app = Celery('celeryworker',
             broker=os.getenv('CELERY_BROKER_URL'),
             include=['celeryworker.tasks'])
print('app initiated')
app.config_from_object(app_settings)
app.conf.update(accept_content=['json'])
print("CELERY_MONGODB_BACKEND_SETTINGS",
      os.getenv('CELERY_MONGODB_BACKEND_SETTINGS'))
print("celery config", app.conf)
When the application comes up, here is what I see with all my troubleshooting prints. I have redacted much of the config output just to show that what I have here passes through config.py into app.config but is ignored by Celery. You can see the value makes it into the celery.py file, and I am sure Celery does something with it, because before I added the ast.literal_eval in config.py, Celery would throw an error saying that the MongoDB backend settings needed to be a dict rather than a string. Unfortunately, now that it is being passed as a proper dict, Celery ignores it.
app_settings SGSDevOps.config.DevelopmentConfig
app initiated
CELERY_MONGODB_BACKEND_SETTINGS {"host":"mongodb://mongodev:27017","database":"celery","taskmeta_collection":"celery_taskmeta"}
celery config Settings(Settings({'BROKER_URL': 'amqp://guest:guest@rabbit1:5672', 'CELERY_INCLUDE': ['celeryworker.tasks'], 'CELERY_ACCEPT_CONTENT': ['json']}, 'BROKER_URL': 'amqp://guest:guest@rabbit1:5672', 'CELERY_MONGODB_BACKEND_SETTINGS': None, 'CELERY_RESULT_BACKEND': None}))
APP_SETTINGS config.DevelopmentConfig
app.config <Config {'ENV': 'development', 'CELERY_BROKER_URL': 'amqp://guest:guest#rabbit1:5672', 'CELERY_MONGODB_BACKEND_SETTINGS': {'host': 'mongodb://mongodev:27017', 'database': 'celery', 'taskmeta_collection': 'celery_taskmeta'}, 'CELERY_RESULT_BACKEND': 'mongodb', 'CELERY_RESULT_SERIALIZER': 'json', }>
-------------- celery@a5ea76b91f77 v4.2.1 (windowlicker)
---- **** -----
--- * *** * -- Linux-4.9.93-linuxkit-aufs-x86_64-with-debian-9.4 2018-10-29 17:25:27
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: celeryworker:0x7f28e828f668
- ** ---------- .> transport: amqp://guest:**@rabbit1:5672//
- ** ---------- .> results: mongodb://
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. celeryworker.tasks.longtime_add
I still do not know why the above config is not working, but I found a workaround: update the config after the app loads, using the new (lowercase) config value names:
app = Celery('celeryworker',
             broker=os.getenv('CELERY_BROKER_URL'),
             backend=os.getenv('CELERY_RESULT_BACKEND'),
             include=['SGSDevOps.celeryworker.tasks'])
print('app initiated')
app.config_from_object(app_settings)
app.conf.update(accept_content=['json'])
app.conf.update(mongodb_backend_settings=ast.literal_eval(os.getenv('CELERY_MONGODB_BACKEND_SETTINGS')))
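For anyone hitting the same problem, here is a minimal sketch of the equivalent setup using only the new lowercase setting names end to end. It assumes the mongodev and rabbit1 hostnames from my compose file resolve; adapt the URLs to your environment:

from celery import Celery

# Sketch: configure the broker and MongoDB result backend using only
# Celery 4's lowercase setting names; hostnames come from the compose file above.
app = Celery('celeryworker',
             broker='amqp://guest:guest@rabbit1:5672',
             include=['celeryworker.tasks'])
app.conf.update(
    result_backend='mongodb://mongodev:27017',
    mongodb_backend_settings={
        'database': 'celery',
        'taskmeta_collection': 'celery_taskmeta',
    },
    result_serializer='json',
    accept_content=['json'],
)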
Related
I'm currently working on a project where I have a virtual machine on Microsoft Azure, and I'm trying to make multiple Docker containers accessible through different routes with the help of a Traefik reverse proxy. Besides the reverse proxy, the first service I need is RabbitMQ, and I should be able to access its user interface on a /rmq route. Right now, I have the following docker-compose file to build both services:
version: "3.5"
services:
rabbitmq:
image: rabbitmq:3-alpine
expose:
- 5672
- 15672
volumes:
- ./rabbit/enabled_plugins:/etc/rabbitmq/enabled_plugins
labels:
- traefik.enable=true
- traefik.http.routers.rabbitmq.rule=Host(`HOST.com`) && PathPrefix(`/rmq`)
# needed, when you do not have a route "/rmq" inside your container (according to https://stackoverflow.com/questions/59054551/how-to-map-specific-port-inside-docker-container-when-using-traefik)
- traefik.http.routers.rabbitmq.middlewares=strip-docs
- traefik.http.middlewares.strip-docs.stripprefix.prefixes=/rmq
- traefik.port=15672
networks:
- proxynet
traefik:
image: traefik:2.1
command: --api=true # Enables the web UI
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./traefik/traefik.toml:/etc/traefik/traefik.toml:ro
ports:
- 80:80
- 443:443
labels:
traefik.enable: true
traefik.http.routers.traefik.rule: "Host(`HOST.com`)"
traefik.http.routers.traefik.service: "api#internal"
networks:
- proxynet
And this is the content of my traefik.toml file:
logLevel = "DEBUG"
debug = true
[api]
dashboard = true
insecure = false
debug = true
[providers.docker]
endpoint = "unix:///var/run/docker.sock"
watch = true
[entryPoints]
[entryPoints.web]
address = ":80"
[entryPoints.web-secure]
address = ":443"
[log]
level = "DEBUG"
format = "json"
The enabled_plugins file specifies which RabbitMQ plugins should be activated. Here, I have the rabbitmq_management plugin (among others), which I think is needed to access the RabbitMQ UI. I even checked the logs of the RabbitMQ container, and apparently rabbitmq_management was started properly:
rabbitmq_1 | 2021-01-30 15:50:30.538 [info] <0.730.0> Server startup complete; 7 plugins started.
rabbitmq_1 | * rabbitmq_stomp
rabbitmq_1 | * rabbitmq_federation_management
rabbitmq_1 | * rabbitmq_mqtt
rabbitmq_1 | * rabbitmq_federation
rabbitmq_1 | * rabbitmq_management
rabbitmq_1 | * rabbitmq_web_dispatch
rabbitmq_1 | * rabbitmq_management_agent
rabbitmq_1 | completed with 7 plugins.
rabbitmq_1 | 2021-01-30 15:50:30.539 [info] <0.730.0> Resetting node maintenance status
With these configurations, running docker-compose up, if I try to access HOST.com/rmq, I get a 502 (Bad Gateway) error in my browser's console. And initially, this was where I was stuck. However, after searching for some help online, I found a different way to specify the Traefik port in the RabbitMQ container labels (traefik.http.services.rabbitmq.loadbalancer.server.port=15672) and, with this modification, I no longer get the Bad Gateway error, but I get a lot of ERR_ABORTED 404 (Not Found) errors in my browser's console (the list below does not contain all the errors):
rmq:7 GET http://HOST.com/js/ejs-1.0.min.js net::ERR_ABORTED 404 (Not Found)
rmq:18 GET http://HOST.com/js/charts.js net::ERR_ABORTED 404 (Not Found)
rmq:19 GET http://HOST.com/js/singular/singular.js net::ERR_ABORTED 404 (Not Found)
Refused to apply style from 'http://HOST.com/css/main.css' because its MIME type ('text/plain') is not a supported stylesheet MIME type, and strict MIME checking is enabled.
rmq:27 Uncaught ReferenceError: sync_get is not defined at rmq:27
I don't have much experience with this kind of project, and I don't know if I'm doing something wrong or if something is missing in these configurations or in the configuration of the virtual machine itself. Do you know what I should do to be able to access the RabbitMQ UI at the URL HOST.com/rmq?
If I get this running properly, I think I would also be able to configure access to the Traefik UI through a route such as HOST.com/dashboard, instead of only through the bare URL without any route.
Thanks in advance!
Solved it. I don't know why, but when I added the configuration traefik.http.services.rabbitmq.loadbalancer.server.port=15672, I had also swapped the order of the lines traefik.http.routers.rabbitmq.middlewares=strip-docs and traefik.http.middlewares.strip-docs.stripprefix.prefixes=/rmq, making the prefix definition appear before the middleware. I changed that back, and now I can access the RabbitMQ UI at HOST.com/rmq. So my final docker-compose file was this:
version: "3.5"
services:
rabbitmq:
image: rabbitmq:3-alpine
expose:
- 5672
- 15672
volumes:
- ./rabbit/enabled_plugins:/etc/rabbitmq/enabled_plugins
labels:
- traefik.enable=true
- traefik.http.routers.rabbitmq.rule=Host(`HOST.com`) && PathPrefix(`/rmq`)
# needed, when you do not have a route "/rmq" inside your container (according to https://stackoverflow.com/questions/59054551/how-to-map-specific-port-inside-docker-container-when-using-traefik)
- traefik.http.routers.rabbitmq.middlewares=strip-docs
- traefik.http.middlewares.strip-docs.stripprefix.prefixes=/rmq
- traefik.http.services.rabbitmq.loadbalancer.server.port=15672
networks:
- proxynet
traefik:
image: traefik:2.1
command: --api=true # Enables the web UI
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./traefik/traefik.toml:/etc/traefik/traefik.toml:ro
ports:
- 80:80
- 443:443
labels:
traefik.enable: true
traefik.http.routers.traefik.rule: "Host(`HOST.com`)"
traefik.http.routers.traefik.service: "api#internal"
networks:
- proxynet
I'll mark this question as solved, but if you know why the order of these 2 lines matters, please explain for future reference.
Thanks!
Trace of how I determined an answer to suggest for this question, given that I haven't used the specific tools:
By searching for rabbitmq admin url, I found the RabbitMQ management docs page, which, near the top, mentions support for a path prefix setting. I searched the page for that and, under the relevant heading, found that you will likely need to set this in your RabbitMQ config:
management.path_prefix = /rmq
So, to apply it to your Docker config, I looked up the rabbitmq Docker image, whose docs explain that configuration files need to be injected via a bind mount, or can be provided via an esoteric Erlang config mechanism that I'd personally not mess with. Therefore, the steps I'd follow from here would be:
- Look in the existing rabbitmq image to find the default config file at /etc/rabbitmq/rabbitmq.conf, e.g. by running docker-compose run rabbitmq cat /etc/rabbitmq/rabbitmq.conf, or an appropriate docker cp command if it turns out rabbitmq sets a Docker ENTRYPOINT that prevents running shell commands on the image command line.
- Add a volume just like the one you have for enabled_plugins, but move it one directory upward, mapping rabbit/ to /etc/rabbitmq/, and then put the default config from the container in rabbit/.
- Add that line to the config file (see the sketch below).
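To make that concrete, here is a sketch of the end state (unverified, since I haven't run this). The volume mount for the rabbitmq service would become:

volumes:
  - ./rabbit:/etc/rabbitmq

and rabbit/rabbitmq.conf would keep whatever defaults you copied out of the image, plus one appended line:

# serve the management UI under the /rmq prefix
management.path_prefix = /rmq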
With any luck that should at least get you closer. I'm curious to hear how it goes!
By the way, while looking at the rabbitmq Docker image docs, I discovered that there are special tags for when you need management-interface support. You may find that you need to switch to one of those instead of plain 3-alpine for this to work, e.g. rabbitmq:3-management-alpine.
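In the compose file that swap would be a one-line change:

rabbitmq:
  image: rabbitmq:3-management-alpine  # instead of rabbitmq:3-alpine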
I have the following code snippet in my main.py:
import os
from app import create_app
from models import db, bcrypt

if __name__ == '__main__':
    HOST = os.environ.get('SERVER_HOST', 'localhost')
    try:
        PORT = int(os.environ.get('SERVER_PORT', '5555'))
    except ValueError:
        PORT = 5555
    env_name = os.getenv('FLASK_ENV', "Please set FLASK_ENV")
    print("env_name: ", env_name)
    app = create_app(env_name)
I run it using flask run inside a pipenv shell and run into the following issue on the line that prints env_name. I have tried both set FLASK_ENV=development (Windows 10) and using a .env file, but to no avail. I am using Python 3.8.3.
(src-4Nvvrxp5) C:\Projects\Python\PythonFlaskRestAPI\src>flask run
* Serving Flask app "main.py" (lazy loading)
* Environment: development
* Debug mode: on
* Restarting with stat
env_name: <flask.cli.ScriptInfo object at 0x000002D6BA598940>
* Debugger is active!
* Debugger PIN: 269-678-937
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Any advice and insight is appreciated.
So upon further research, I was able to reproduce your problem (somewhat) locally.
Now, the reason this is happening is this paragraph from the official Flask documentation:
What you want to focus on is this:
If the application factory takes only one argument and no parentheses follow the factory name, the ScriptInfo instance is passed as a positional argument.
As such, no error is occurring; the code is working as expected.
Now, if your concern is that the set FLASK_ENV=development command is not setting the variable correctly, I would point out that it is indeed set correctly, as seen here in your OP:
(src-4Nvvrxp5) C:\Projects\Python\PythonFlaskRestAPI\src>flask run
* Serving Flask app "main.py" (lazy loading)
* Environment: development
* Debug mode: on
* Restarting with stat
The third line in the terminal output above says "* Environment: development", whereas the default value according to the documentation is "* Environment: production".
Let me know if that resolves your concerns and queries :D. Good luck!
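If you do want create_app to receive your string instead of the ScriptInfo object, one option (a sketch based on the same Flask docs section, not something I've run against your project) is to spell the factory call out in FLASK_APP so flask run passes an explicit argument:

:: Windows cmd; use export instead of set on Unix shells
set FLASK_APP=main:create_app('development')
flask run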
I have the following cron.yaml:
cron:
- description: "TEST_TEST_TEST"
- url: /cronBatchClean
- schedule: every 2 minutes
And then in app.yaml:
service: environ-flexible
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT main:app
runtime_config:
  python_version: 3
With this as main.py:
from flask import Flask, request
import sys

app = Flask(__name__)

@app.route('/cronBatchClean')
def cronBatchClean():
    print("CRON_CHECK", file=sys.stderr)
    return "CRON_CHECK"
When I type in the full URL, I receive "CRON_CHECK" on screen, but the cron job doesn't seem to be executing. Also, in the App Engine dashboard, when I click on Cron jobs, there aren't any listed.
Any help in getting this to execute would be much appreciated,
Thanks :)
EDIT 1
I now have the cron task executing, but I'm receiving a 404 error. When I type the full URL (that is, https://.appspot.com/cronBatchClean), the respective code executes.
I added a GET handler, but I'm still not having any luck.
@app.route('/cronBatchClean', methods=['GET'])
def cronBatchClean():
    print("CRON_JOB_PRINT", file=sys.stderr)
    return "CRON_CHECK"
In the cron.yaml there are unnecessary "-" characters; each one starts a new list item (see the YAML syntax).
The correct format for cron jobs in cron.yaml, per the Google Cloud documentation, is:
cron:
- description: "TEST_TEST_TEST"
  url: /cronBatchClean
  schedule: every 2 minutes
To deploy the cron job, use the gcloud command:
$ gcloud app deploy cron.yaml
To solve this problem I changed the service name to default and redeployed. The cron task's path pointed at App Engine's default service, so when the task was scheduled with the service name set to "environ-flexible", a 404 error was raised because the paths didn't match.
In app.yaml, change:
service: environ-flexible
to
service: default
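As an alternative I have not tried here, the cron.yaml reference also documents a target field that routes the scheduled request to a named service, which should let you keep the environ-flexible service name:

cron:
- description: "TEST_TEST_TEST"
  url: /cronBatchClean
  target: environ-flexible
  schedule: every 2 minutes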
My Node.js app on Google App Engine was working fine, and then suddenly, after another deployment, it started returning this error:
Error: Server Error
The server encountered a temporary error and could not complete your
request. Please try again in 30 seconds.
I checked the logs; there is no error or any problem notification.
Here is my app.yaml content:
# [START app_yaml]
runtime: nodejs
env: flex

# [START env]
env_variables:
  MYSQL_USER: adminpard
  MYSQL_PASSWORD: password
  MYSQL_DATABASE: mydatabase
  # e.g. my-awesome-project:us-central1:my-cloud-sql-instance
  INSTANCE_CONNECTION_NAME: kock-130024:us-east1:osdb
# [END env]

# [START cloudsql_settings]
beta_settings:
  # The connection name of your instance, available by using
  # 'gcloud beta sql instances describe [INSTANCE_NAME]' or from
  # the Instance details page in the Google Cloud Platform Console.
  cloud_sql_instances: kock-130024:us-east1:osdb
# [END cloudsql_settings]
# [END app_yaml]
Here is also my .dockerignore file:
.git
projects.sln
.vs
myproject.njsproj
.gitignore
node_module
skills
npm-debug.log
I am a little bit confused; I need some guidance to understand where the problem is and how to solve it.
I use PM2 to execute my Node.js app.
In order to do that, I have defined the following ecosystem config file:
apps:
  - script: app.js
    name: "myApp"
    exec_mode: cluster
    cwd: "/etc/myService/myApp"
Everything is working. Now I want to specify a custom location for PM2's logs, so I added this to the ecosystem config file:
log: "/etc/myService/myApp/logs/myApp.log"
It works, but I noticed that after executing pm2 start ecosystem, PM2 writes the logs to both locations at the same time:
/etc/myService/myApp/logs/myApp.log (as expected)
/home/%$user%/.pm2/logs/ (default logs destination)
How can I specify a single location for PM2's logs and avoid generating duplicate logs?
Based on robertklep's comment, to solve the issue we have to use the out_file and err_file fields for the output and error log paths respectively.
Syntax sample in YAML format:
out_file: "/etc/myService/myApp/logs/myApp_L.log"
err_file: "/etc/myService/myApp/logs/myApp_E.log"
P.S. The log field can be removed from the config file:
log: "/etc/myService/myApp/logs/myApp.log"