I am trying to run Airflow on my Windows machine using Docker. Here is the link I am following from the official docs - https://airflow.apache.org/docs/apache-airflow/2.0.1/start/docker.html.
I have created the directory structure as expected and also downloaded the docker-compose YAML file. On running 'docker-compose up airflow-init' as suggested by the documentation, I get the error below:
airflow-init_1 |
airflow-init_1 | [2021-07-03 10:19:29,721] {cli_action_loggers.py:105} WARNING - Failed to log action with (psycopg2.errors.UndefinedTable) relation "log" does not exist
airflow-init_1 | LINE 1: INSERT INTO log (dttm, dag_id, task_id, event, execution_dat...
airflow-init_1 | ^
airflow-init_1 |
airflow-init_1 | [SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (%(dttm)s, %(dag_id)s, %(task_id)s, %(event)s, %(execution_date)s, %(owner)s, %(extra)s) RETURNING log.id]
airflow-init_1 | [parameters: {'dttm': datetime.datetime(2021, 7, 3, 10, 19, 29, 712157, tzinfo=Timezone('UTC')), 'dag_id': None, 'task_id': None, 'event': 'cli_upgradedb', 'execution_date': None, 'owner': 'airflow', 'extra': '{"host_name": "7f142ce11611", "full_command": "[\'/home/airflow/.local/bin/airflow\', \'db\', \'upgrade\']"}'}]
From the logs it's clear that the log table does not exist and Airflow is trying to insert into it. I'm not sure, though, why this happens or how this error can be fixed. I am using the original docker-compose file published on the Airflow doc page.
This is the current status of my Airflow Docker image.
On trying to access the Airflow UI using http://localhost:8080/admin/
I get the Airflow "404 = lots of circles" error.
This is just a warning, because the Airflow CLI tries to add an audit-log entry to the log table before the tables get created.
I see the same warning on a fresh DB initially, but then the output continues.
The output should continue, and you should get something like this at the end (I ran it with the just-released 2.1.1, which I recommend you start with):
airflow-init_1 | [2021-07-03 15:54:01,449] {manager.py:784} WARNING - No user yet created, use flask fab command to do it.
airflow-init_1 | Upgrades done
airflow-init_1 | [2021-07-03 15:54:06,899] {manager.py:784} WARNING - No user yet created, use flask fab command to do it.
airflow-init_1 | Admin user airflow created
airflow-init_1 | 2.1.1
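If you want to try 2.1.1 without editing the compose file, recent versions of the official docker-compose.yaml resolve the image from the AIRFLOW_IMAGE_NAME variable, so a line in the .env file next to it should be enough. This is an assumption about your copy of the file, so double-check its image: entry first:
# .env (same directory as docker-compose.yaml)
AIRFLOW_IMAGE_NAME=apache/airflow:2.1.1
Then re-run docker-compose up airflow-init.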
You need to initialize the Airflow DB:
docker exec -ti airflow-webserver airflow db init && echo "Initialized airflow DB"
Create Admin user
docker exec -ti airflow-webserver airflow users create --role Admin --username {AIRFLOW_USER} --password {AIRFLOW_PASSWORD} -e {AIRFLOW_USER_EMAIL} -f {FIRST_NAME} -l {LAST_NAME} && echo "Created airflow admin user"
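To confirm the user was actually created (same airflow-webserver container-name assumption as above; airflow users list is a standard Airflow 2.x CLI command):
docker exec -ti airflow-webserver airflow users list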
Related
I use the mcr.microsoft.com/mssql/server:2017 Docker container to run an MSSQL server. I tried to change the collation like this:
echo "SQL_Latin1_General_CP1_CI_AS" | /opt/mssql/bin/mssql-conf set-collation
Unfortunately I get this error:
No passwd entry for user 'mssql'
How is it possible to fix this error?
I created a new user with useradd mssql, but now I get this error if I run the command:
sqlservr: Unable to open /var/opt/mssql/.system/instance_id: File: pal.cpp:566 [Status: 0xC0000022 Access Denied errno = 0xD(13) Permission denied]
/opt/mssql/bin/sqlservr: PAL initialization failed. Error: 101
It looks like the latest mcr.microsoft.com/mssql/server image fixes this issue. If you insist on the old one, the following procedure fixes the user/permission issues:
cake@cake:~/20211012$ docker run --rm -it mcr.microsoft.com/mssql/server:2017-latest /bin/bash
SQL Server 2019 will run as non-root by default.
This container is running as user root.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
root@4fd0bdf1d21c:/# useradd mssql
root@4fd0bdf1d21c:/# mkdir -p /var/opt/mssql
root@4fd0bdf1d21c:/# chmod -R 777 /var/opt/mssql
root@4fd0bdf1d21c:/# echo "SQL_Latin1_General_CP1_CI_AS" | /opt/mssql/bin/mssql-conf set-collation
Enter the collation: Configuring SQL Server...
The SQL Server End-User License Agreement (EULA) must be accepted before SQL
Server can start. The license terms for this product can be downloaded from
http://go.microsoft.com/fwlink/?LinkId=746388.
You can accept the EULA by specifying the --accept-eula command line option,
setting the ACCEPT_EULA environment variable, or using the mssql-conf tool.
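Following up on that prompt: the message itself lists the ways to accept the EULA, and inside the container the environment-variable route is probably the simplest. A sketch, not verified against every image tag:
# accept the EULA (one of the options listed in the message above), then retry the collation change
export ACCEPT_EULA=Y
echo "SQL_Latin1_General_CP1_CI_AS" | /opt/mssql/bin/mssql-conf set-collation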
I am adding Airflow to a web application that manually adds a directory containing business logic to the PYTHONPATH env var, and also does additional system-level setup that I want to be consistent across all servers in my cluster. I've been successfully running Celery for this application with RMQ as the broker and Redis as the task-results backend for a while, and I have prior experience running Airflow with the LocalExecutor.
Instead of using Pukel's image, I have an entrypoint for a base backend image that runs a different service based on the SERVICE env var. That looks like this:
if [ $SERVICE == "api" ]; then
# upgrade to the data model
flask db upgrade
# start the web application
python wsgi.py
fi
if [ $SERVICE == "worker" ]; then
celery -A tasks.celery.celery worker --loglevel=info --uid=nobody
fi
if [ $SERVICE == "scheduler" ]; then
celery -A tasks.celery.celery beat --loglevel=info
fi
if [ $SERVICE == "airflow" ]; then
airflow initdb
airflow scheduler
airflow webserver
I have an .env file that I build the containers with, which defines my Airflow parameters:
AIRFLOW_HOME=/home/backend/airflow
AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW__CORE__EXECUTOR=CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN=mysql+pymysql://${MYSQL_USER}:${MYSQL_ROOT_PASSWORD}@${MYSQL_HOST}:${MYSQL_PORT}/airflow?charset=utf8mb4
AIRFLOW__CELERY__BROKER_URL=amqp://${RABBITMQ_DEFAULT_USER}:${RABBITMQ_DEFAULT_PASS}@${RABBITMQ_HOST}:5672
AIRFLOW__CELERY__RESULT_BACKEND=redis://${REDIS_HOST}
With how my entrypoint is currently set up, it never makes it to the webserver. Instead, it runs the scheduler in the foreground without ever invoking the webserver. I can change this to:
airflow initdb
airflow scheduler -D
airflow webserver
Now the webserver runs, but it isn't aware of the scheduler, which is now running as a daemon.
Airflow does, however, know that I'm using the CeleryExecutor and looks for the DAGs in the right place:
airflow | [2020-07-29 21:48:35,006] {default_celery.py:88} WARNING - You have configured a result_backend of redis://redis, it is highly recommended to use an alternative result_backend (i.e. a database).
airflow | [2020-07-29 21:48:35,010] {__init__.py:50} INFO - Using executor CeleryExecutor
airflow | [2020-07-29 21:48:35,010] {dagbag.py:396} INFO - Filling up the DagBag from /home/backend/airflow/dags
airflow | [2020-07-29 21:48:35,113] {default_celery.py:88} WARNING - You have configured a result_backend of redis://redis, it is highly recommended to use an alternative result_backend (i.e. a database).
I can solve this by going inside the container and manually firing up the scheduler.
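Roughly like this — the container name below is a placeholder rather than something from my setup, so substitute whatever your compose service is actually called:
docker exec -it <airflow-container> airflow scheduler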
The trick seems to be running both processes in the foreground within the container, but I'm stuck on how to do that inside the entrypoint. I've checked out Pukel's entrypoint code, but it's not obvious to me what he's doing. I'm sure that with just a slight tweak this will be off to the races... Thanks in advance for the help. Also, if there's any major anti-pattern I'm at risk of running into here, I'd love the feedback so that I can implement Airflow properly. This is my first time implementing the CeleryExecutor, and there's a decent amount involved.
Try using nohup: https://en.wikipedia.org/wiki/Nohup
nohup airflow scheduler >scheduler.log &
In your case, you would update your entrypoint as follows:
if [ $SERVICE == "airflow" ]; then
airflow initdb
nohup airflow scheduler > scheduler.log &
nohup airflow webserver
fi
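A small variant of the same idea (my own tweak, untested against this image): keep the webserver as the container's main process with exec, so the container exits and can be restarted if the webserver dies:
if [ "$SERVICE" == "airflow" ]; then
    airflow initdb
    # scheduler in the background, webserver replaces the shell as PID of this branch
    nohup airflow scheduler > scheduler.log &
    exec airflow webserver
fi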
I have an Elastic Beanstalk environment which is running a Docker container with a Node.js API. On the AWS Console, if I select my environment and then go to Configuration/Software, I have the following:
Log groups: /aws/elasticbeanstalk/my-environment
Log streaming: Enabled
Retention: 3 days
Lifecycle: Keep after termination.
However, if I click on that log group on the Cloudwatch console, I have a Last Event Time of some weeks ago (which I believe corresponds to when the environment was created) and have no content on the logs.
Since this is a dockerized application, the logs for the server itself should be at /aws/elasticbeanstalk/my-environment/var/log/eb-docker/containers/eb-current-app/stdouterr.log.
If I instead get the Logs directly from the instances by going once again to my EB environment, clicking "Logs" and then "Request last 100 Lines" the logging is happening correctly. I just can't see a thing when using CloudWatch.
Any help is gladly appreciated
I was able to get around this problem.
So CloudWatch makes a hash based on the first line of your log file and the log stream key, and the problem is that my first line on the stdouterr.log file was actually an empty line!
After a couple of days of playing around and getting help from the AWS support team, I first connected via SSH to the EC2 instance associated with the EB environment. You need to add the following line to the /etc/awslogs/config/beanstalklogs.conf file, right after the "file=/var/log/eb-docker/containers/eb-current-app/stdouterr.log" line:
file_fingerprint_lines=1-20
With this, you tell the awslogs agent to calculate the hash using lines 1 through 20 of the log file. You could change 20 to a larger or smaller number depending on your logging content; however, I don't know if there is an upper limit for the value.
After doing so, you need to restart the AWS Logs Service on the instance.
For this you would execute:
sudo service awslogs stop
sudo service awslogs start
or simpler:
sudo service awslogs restart
After these steps I started using my environment and the logging was now being properly streamed to the CloudWatch console!
However, this would not survive a new deployment, the EC2 instance being replaced, or the Auto Scaling group spawning another instance.
To fix this permanently, you can add the log config via the .ebextensions directory at the root of your application before deploying.
I added a file called logs.config to the newly created .ebextensions directory and placed the following content in it:
files:
  "/etc/awslogs/config/beanstalklogs.conf":
    mode: "000644"
    user: root
    group: root
    content: |
      [/var/log/eb-docker/containers/eb-current-app/stdouterr.log]
      log_group_name=/aws/elasticbeanstalk/EB-ENV-NAME/var/log/eb-docker/containers/eb-current-app/stdouterr.log
      log_stream_name={instance_id}
      file=/var/log/eb-docker/containers/eb-current-app/*stdouterr.log
      file_fingerprint_lines=1-20
commands:
  01_remove_eb_stream_config:
    command: 'rm /etc/awslogs/config/beanstalklogs.conf.bak'
  02_restart_log_agent:
    command: 'service awslogs restart'
Of course, change EB-ENV-NAME to your environment name on EB.
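To confirm from your workstation that events are actually arriving after a deploy, the AWS CLI can list the streams in that group (same group name as above, with your environment name substituted):
aws logs describe-log-streams \
  --log-group-name /aws/elasticbeanstalk/EB-ENV-NAME/var/log/eb-docker/containers/eb-current-app/stdouterr.log \
  --order-by LastEventTime --descending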
Hope it can help someone else!
For 64-bit Amazon Linux 2, the setup is slightly different.
For log delivery, the AWS CloudWatch agent is installed in /opt/aws/amazon-cloudwatch-agent and the Elastic Beanstalk configuration is in /opt/aws/amazon-cloudwatch-agent/etc/beanstalk.json. It is set to log the output of the container, assuming there's a file called stdouterr.log; here's a snippet of the config:
{
  "file_path": "/var/log/eb-docker/containers/eb-current-app/stdouterr.log",
  "log_group_name": "/aws/elasticbeanstalk/EB-ENV-NAME/var/log/eb-docker/containers/eb-current-app/stdouterr.log",
  "log_stream_name": "{instance_id}"
}
However, when I look for that file_path it doesn't exist; instead I have a file path that encodes the current Docker container ID: /var/log/eb-docker/containers/eb-current-app/eb-e4e26c0bc464-stdouterr.log.
This logfile is created by the script /opt/elasticbeanstalk/config/private/eb-docker-log-start, which is started by the eb-docker-log service. The default contents of this file are:
EB_CONFIG_DOCKER_CURRENT_APP=`cat /opt/elasticbeanstalk/deployment/.aws_beanstalk.current-container-id | cut -c 1-12`
mkdir -p /var/log/eb-docker/containers/eb-current-app/
docker logs -f $EB_CONFIG_DOCKER_CURRENT_APP >> /var/log/eb-docker/containers/eb-current-app/eb-$EB_CONFIG_DOCKER_CURRENT_APP-stdouterr.log 2>&1
To temporarily fix the logging, you can manually run the following (replacing the Docker ID), and logs will start to appear in CloudWatch:
ln -sf /var/log/eb-docker/containers/eb-current-app/eb-e4e26c0bc464-stdouterr.log /var/log/eb-docker/containers/eb-current-app/stdouterr.log
To make this permanent, I added an .ebextension to fix the eb-docker-log service so that it re-creates this link. Create a file in your source code under .ebextensions called fix-cloudwatch-logging.config and set its contents to:
files:
  "/opt/elasticbeanstalk/config/private/eb-docker-log-start":
    mode: "000755"
    owner: root
    group: root
    content: |
      EB_CONFIG_DOCKER_CURRENT_APP=`cat /opt/elasticbeanstalk/deployment/.aws_beanstalk.current-container-id | cut -c 1-12`
      mkdir -p /var/log/eb-docker/containers/eb-current-app/
      ln -sf /var/log/eb-docker/containers/eb-current-app/eb-$EB_CONFIG_DOCKER_CURRENT_APP-stdouterr.log /var/log/eb-docker/containers/eb-current-app/stdouterr.log
      docker logs -f $EB_CONFIG_DOCKER_CURRENT_APP >> /var/log/eb-docker/containers/eb-current-app/eb-$EB_CONFIG_DOCKER_CURRENT_APP-stdouterr.log 2>&1
commands:
  fix_logging:
    command: systemctl restart eb-docker-log.service
    cwd: /home/ec2-user
    test: "[ ! -L /var/log/eb-docker/containers/eb-current-app/stdouterr.log ] && systemctl is-active --quiet eb-docker-log"
I configured Odoo on AWS EC2, connecting to PostgreSQL on RDS. When I run the command ./odoo-bin --config=/etc/odoo.conf and try to access it from a browser, I get the following error:
ERROR odoo_db odoo.modules.loading: Database odoo_db not initialized, you can force it with `-i base`
File "/opt/odoo/odoo/odoo/modules/registry.py", line 176, in __getitem__
return self.models[model_name]
KeyError: 'ir.http' - - -
I'm also getting this error:
STATEMENT: SELECT latest_version FROM ir_module_module WHERE name='base'
ERROR odoo_db odoo.sql_db: bad query: SELECT latest_version FROM ir_module_module WHERE name='base'
ERROR: relation "ir_module_module" does not exist
On the command line, run:
./odoo-bin --addons-path=addons --database=odoo --db_user=odoo --db_password=odoo --db_host=localhost --db_port=5432 -i INIT
Explicitly give the DB name, user, and password; the "-i INIT" option initialises the Odoo database.
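Once that initialization has finished, subsequent starts can go back to the config file from the question, assuming /etc/odoo.conf points at the same database:
./odoo-bin --config=/etc/odoo.conf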
At first glance, the issue is that although the DB has been created in Postgres, it does not have the required Odoo setup records, i.e. the base setup. You can verify this by accessing the DB directly and checking the number of tables or browsing some of them.
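A quick way to do that check (a sketch; substitute your own RDS endpoint, user, and database name):
psql -h <rds-endpoint> -U <db-user> -d <db-name> -c '\dt'
# a properly initialized Odoo database contains many ir_* tables, including ir_module_module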
It sometimes happens that when you create a DB (specifically, giving a name similar to one you created and deleted before, so it is dropped from PG but still has traces in the session or DB location path), it does not get initialized properly.
Solution:
Create a sample DB with a different name (make at least the first 4 characters completely different) and check.
Initialize the DB from the odoo.conf file: add db_name = <Your DB Name> (for experiment purposes, use a completely different name), restart the Odoo services, and check.
Hope it helps. Enjoy troubleshooting!
First do what @FaisalAnsari says here (what I reference below):
Go to RDS and create a database in PostgreSQL and configure the server.conf file as given below.
;This is the password that allows database operations:
;admin_passwd = admin
db_host = rds_endpoint (after creating database you will get rds_endpoint)
db_port = False
db_user = "user name which is created by you to the database"
db_password = "password which is created"
;addons_path = /home/deadpool/workspace/odoo_13_community/custom_addons, /home/deadpool/workspace/odoo_13_community/custom_addons
Then go to the command line and do the following.
Stop your odoo instance
~$ service odoo stop
Enable command line for the user odoo
~$ chsh -s /bin/bash odoo
Execute Odoo from the command line as user odoo
~$ runuser -l odoo -c "odoo -i base -d YourRDSDatabase --db_host YourAmazonRDSHost.Address.rds.amazonaws.com -r YourRDSDatabaseUserName -w YourRDSDatabasePassword --stop-after-init"
After the initialization has finished, start the Odoo service
~$ service odoo start
Troubleshooting:
If Odoo doesn't start correctly, make sure the database user in your RDS instance has privileges on at least the database you are using.
~$ psql --host=YourAmazonRDSHost.Address.rds.amazonaws.com --port=5432 --username=YourRDSDatabaseUserName --password --dbname=YourRDSDatabase
and when you are inside PostgreSQL, type the following:
~$ grant all privileges on database YourRDSDatabase to YourRDSDatabaseUserName;
~$ \q
and try again from step 3.
Hope that Helps!!
I am following this post to run a SQL script that creates a default table.
Here is my docker compose file
version: "3"
services:
web:
build: .
volumes:
- ./:/usr/src/app/
- /usr/src/app/node_modules
ports:
- “3000:3000”
depends_on:
- postgres
postgres:
container_name: postgres
build:
context: .
dockerfile: pgDockerfile
environment:
POSTGRES_PASSWORD: test
POSTGRES_USER: test
ports:
- 5432:5432
Here is my pgDockerfile
FROM postgres:9.6-alpine
# copy init sql
ADD 00-initial.sql /docker-entrypoint-initdb.d/
Here is my sql script
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE IF NOT EXISTS test (
  id text NOT NULL,
  title varchar(200) NOT NULL
);
I can build and run docker-compose up, and I see the following message:
postgres | CREATE DATABASE
postgres |
postgres | CREATE ROLE
postgres |
postgres |
postgres | /usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/00-initial-data.sql
postgres | CREATE EXTENSION
postgres | CREATE TABLE
postgres |
postgres |
postgres | LOG: received fast shutdown request
postgres | LOG: aborting any active transactions
postgres | waiting for server to shut down....LOG: autovacuum launcher shutting down
postgres | LOG: shutting down
postgres | LOG: database system is shut down
postgres | done
postgres | server stopped
postgres |
postgres | PostgreSQL init process complete; ready for start up.
postgres |
postgres | LOG: database system was shut down at 2017-03-22 21:36:16 UTC
postgres | LOG: MultiXact member wraparound protections are now enabled
postgres | LOG: database system is ready to accept connections
postgres | LOG: autovacuum launcher started
It seems like the DB is shut down after the table is created. I think that's the standard process for the Postgres Docker image, but when I log in to Postgres, I don't see the table that is supposed to be there.
I log in through:
docker exec -it $(my postgres container id) sh
#su - postgres
#psql
# \d => No relations found.
I am not sure if this is the right way to create default data for Postgres.
I think your Docker setup works as intended, but it's not quite doing what you think. When you set POSTGRES_USER but not POSTGRES_DB, the image defaults to using the user name as the database name, so your table is in the test database!
Use \l to list your databases and you will see it; then you can use \c test to connect to that database. Once connected, \d will list your relations!
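In this setup, checking that looks roughly like the following (container name taken from the compose file above; psql connects to the database named after the user by default):
docker exec -it postgres psql -U test
\l        -- the "test" database is listed
\c test   -- connect to it (psql may already have defaulted to it)
\d        -- the "test" table from 00-initial.sql shows up here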