Restarting the airflow scheduler

I'm trying to get Airflow working to better orchestrate an ETL process. When I make changes to a DAG in my dags folder, I often have to restart the scheduler with
airflow scheduler
before the changes are visible in the UI. I would like to run the scheduler as a daemon process with
airflow scheduler -D
but when I try to do so, I get a message saying
[2018-10-17 14:13:54,769] {jobs.py:580} ERROR -
Cannot use more than 1 thread when using sqlite. Setting max_threads to 1
I think this error pops up because the scheduler is already running as a daemon. However, when I try to find out where the scheduler is being run with
lsof -i
I don't get any results.
Question: Why am I not able to restart the scheduler with airflow scheduler -D? Why does the scheduler restart with airflow webserver? How do I successfully kill the process that is preventing me from running airflow scheduler -D?

Run ps aux | grep airflow and check whether any airflow webserver or airflow scheduler processes are running. If they are, kill them and rerun with airflow scheduler -D.

You need to clear out the airflow-scheduler.pid file in $AIRFLOW_HOME. The stale pid file left behind by the daemon will prevent you from starting another scheduler process.
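A minimal sketch of that cleanup, assuming the pid file holds a single PID on one line. The function name remove_stale_pidfile is made up for illustration; it deletes the file only when the recorded process is gone, so a scheduler that is still running is left alone.

```shell
#!/bin/sh
# remove_stale_pidfile PIDFILE  (hypothetical helper, not part of Airflow)
# Deletes PIDFILE only when the PID it records is no longer a live process.
# Returns 0 if the file is absent or was removed, 1 if the process is alive.
remove_stale_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 0    # nothing to clean up
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    # Note: it also fails for a live process owned by another user.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; leaving $pidfile in place" >&2
        return 1
    fi
    rm -f "$pidfile"
}
```

It could then be used as remove_stale_pidfile "$AIRFLOW_HOME/airflow-scheduler.pid" && airflow scheduler -D, so the daemon is only started once the old pid file is out of the way.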

If you just restart your webserver, the DAG changes are reflected in the UI. There is no need to restart the scheduler for that.
This applies to 1.8 and 1.10.3. I can't comment on the latest release, 1.10.10.

Related

Trying to write a shell script to monitor when a service stops in linux, and to automate the restart of this service

So I am relatively new to CentOS, version 6.2. I have a service that needs to be monitored by a cron job and restarted if it stops. I have a few ideas on how to monitor it, but when it comes to getting it restarted, that's where I get stuck. I also know the PID of the service I want to monitor.

You can use supervise for this: http://cr.yp.to/daemontools/supervise.html
Put it in your crontab to launch on system start:
@reboot supervise foo

docker stop spark container from exiting

I know Docker only watches PID 1, and if that process exits (or turns into a daemon) it considers the program finished and shuts the container down.
When apache-spark is started with the ./start-master.sh script, how can I keep the container running?
I do not think that while true; do sleep 1000; done is an appropriate solution.
E.g. I used the command sbin/start-master.sh to start the master, but the container keeps shutting down.
How do I keep it running when it is started with docker-compose?

As mentioned in "Use of Supervisor in docker", you could use phusion/baseimage-docker as a base image in which you can register scripts as "services".
The my_init script included in that image will take care of the exit signals management.
And the processes launched by start-master.sh would still be running.
Again, that supposes you are building your apache-spark image starting from phusion/baseimage-docker.
As commented by thaJeztah, using an existing image works too: gettyimages/spark/~/dockerfile/. Its default CMD will keep the container running.
Both options are cleaner than relying on a tail -f trick, which won't handle the kill/exit signals gracefully.

Here is another solution.
Create a file spark-env.sh with the following contents and copy it into the spark conf directory.
SPARK_NO_DAEMONIZE=true
If your CMD in the Dockerfile looks like this:
CMD ["/spark/sbin/start-master.sh"]
the container will not exit.
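The same idea can be sketched as a small entrypoint function instead of a spark-env.sh file. This is only an illustration: the function name start_master_foreground is hypothetical, and the /spark default is an assumption taken from the CMD path above.

```shell
#!/bin/sh
# start_master_foreground  (hypothetical helper for a container entrypoint)
# Runs the Spark master start script in the foreground so the container's
# main process does not exit.
start_master_foreground() {
    # Assumed install location; override SPARK_HOME to match your image.
    spark_home=${SPARK_HOME:-/spark}
    # Same variable the spark-env.sh approach sets: keeps Spark from forking
    # into the background.
    export SPARK_NO_DAEMONIZE=true
    # exec replaces this shell, so no extra wrapper process stays in front
    # of the Spark start script.
    exec "$spark_home/sbin/start-master.sh" "$@"
}
```

Calling start_master_foreground as the last line of the entrypoint script hands control to Spark. Anything placed after that call would never run.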

tail -f -n 50 /path/to/spark/logfile
This will keep the container alive and also provide useful info if you run it in interactive mode (-it). You can also run it detached (-d) and it will stay alive.

Alternative for @reboot cron job, start job when cron daemon starts

I have a script which is specified to start on boot-up with the @reboot annotation.
I tried to restart the script by stopping the cron daemon and starting it again with service crond stop and service crond start, respectively.
However, I noticed that the script doesn't restart when the cron daemon restarts, but only when the entire system is rebooted.
My question is, since the cron daemon starts when the system is booted, is there a way to start jobs not on reboot but specifically when the cron daemon starts, so that service crond stop and service crond start work as expected?

Unfortunately, there is no way to do so.
When it is restarted, the cron daemon simply ignores the @reboot directive:
(CRON) INFO (Skipping @reboot jobs -- not system startup)
However, if you're trying to start a script at boot time and want to be able to restart it without rebooting the machine, you might want to consider creating either an init script or, if you're using systemd, a systemd service description (the same applies to upstart and other init replacements).

Node.js Ubuntu and Monit

I'm working on getting a Node server up with upstart and monit instead of using a cron job to run a script to check on things. I've built an admin dashboard for the server that uses the Node os module for things like os.loadavg() and os.totalmem(), etc...
The problem is, when monit is running, os.loadavg() always returns [0, 0, 0]. Has anyone else encountered this problem? Does monit create a lock or something that does not allow Node to read that property?
Thanks in advance for any help!
Monit Script
check process flinch
with pidfile "/var/run/flinch.pid"
start program = "/sbin/start flinch"
stop program = "/sbin/stop flinch"
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if memory usage > 0% then alert

To give this question some closure: I removed monit from the system check and wrote a custom bash script that checks the process and is run by a cron job every minute. Monit seems to put a lock on the system stats while it is in use.
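The answer does not include the script itself. A rough sketch of what such a check could look like follows, with hypothetical function names is_running and ensure_running, and assuming the service writes its PID to a pid file as in the monit configuration above.

```shell
#!/bin/sh
# is_running PIDFILE  (hypothetical helper)
# Succeeds only if PIDFILE exists and names a process that can be signalled.
is_running() {
    [ -f "$1" ] || return 1
    pid=$(cat "$1")
    # kill -0 delivers no signal; it only checks that the PID exists.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# ensure_running PIDFILE COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND only when the process recorded in PIDFILE is not alive.
ensure_running() {
    pidfile=$1
    shift
    if is_running "$pidfile"; then
        return 0
    fi
    echo "process not running; restarting" >&2
    "$@"
}
```

With the paths from the monit configuration, the script would end with ensure_running /var/run/flinch.pid /sbin/start flinch, and a crontab entry with the schedule * * * * * would run it every minute.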

Can I use cron to run long processes or services?

I need to have some processes start when the computer boots and run forever. These are not actually daemons, i.e. they do not fork or daemonize, but they do not exit. I am currently using cron to start them using the @reboot directive like this:
@reboot /path/to/myProcess >>/logs/myProcess.log
Could this cause any problems with the cron daemon? I thought I could try nohup ... & to detach the new process from cron, like this:
@reboot nohup /path/to/myProcess >>/logs/myProcess.log &
Is this required at all?
Is there some other, preferred method to start processes at system boot? I know all Linux distributions provide config files and means to run a program as a service but I am looking for a method that is not Linux distribution specific.

http://www.somacon.com/p38.php
This article answers my question. It suggests that running daemons this way spawns two extra processes, a cron and a shell process, that live for as long as your daemon.
I tested this on Linux, and by following the instructions I was able to get rid of the cron process but not the zombie shell process.
