Airflow task works with airflow test, but fails in a DAG run - python-3.x

I'm wondering why it is possible that the following command works:
airflow test [dag_id] [task_id] 20200421
but that the same task fails if I trigger the dag manually in the UI.
The task itself is quite simple; it is basically:
cmd = 'ls' # other command
os.system(cmd)
The os library is imported and, as said above, the task works in testing but not when the DAG actually runs. My code is in Python, and this specific DAG needs to run a specific command in the terminal.
Have you got any idea how this is possible?
If you need more info, let me know in the comments!

Answer:
This problem is due to the different user that runs the script.
airflow run uses a different user (and subprocesses) than airflow test. Switching to the airflow user does not work by itself, but giving the airflow user additional rights (in Linux) should work.

One possible reason for this behaviour could be that an earlier execution of your task is cached in the DB.
So the test works, but when you ask Airflow to run the DAG it fails because the task is already running in the background or its state is cached in the database. Try running $ airflow resetdb
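Whichever of the two causes applies, it helps to surface the command's exit code and output in the task log. A minimal sketch, assuming the task is a Python callable and that replacing os.system with subprocess is acceptable (the function name is hypothetical):

import subprocess

def run_cmd():
    # Hypothetical task callable; replace 'ls' with the real command.
    # subprocess.run is used instead of os.system so the exit code, stdout and
    # stderr end up in the Airflow task log, which makes user/permission
    # differences between airflow test and a scheduled run visible.
    result = subprocess.run(['ls'], capture_output=True, text=True)
    print('return code:', result.returncode)
    print('stdout:', result.stdout)
    print('stderr:', result.stderr)
    if result.returncode != 0:
        raise RuntimeError(f'command failed with code {result.returncode}')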

Related

Run Python script in Task Scheduler as normal user but with admin privileges

I have an odd set of constraints and I'm not sure if what I want to do is possible. I'm writing a Python script that can restart programs/services for me via an Uvicorn/FastAPI server. I need the following:
For the script to always be running and to restart if it stops
To be constantly logged on as the standard (non-admin) user
To stop/start a Windows service that requires admin privileges
To start a program as the current (non-admin) user that displays a GUI
I've set up Task Scheduler to run this script as admin, whether logged in or not. This was the only way I found to be able to stop/start Windows services. With this, I'm able to do everything I need except for running a program as the current user. If I set the task to run as the current user, I can do everything except the services.
Within Python, I've tried running the program with os.startfile(), subprocess.Popen(), and subprocess.run(), but it always runs with no GUI, and seemingly as the admin since I can't kill the process without running Task Manager as admin. I'm aware of the 'user' flag in subprocess, but as I'm on Windows 8, the latest Python version I can use is 3.8.10, and 'user' wasn't introduced until Python 3.9.
I've tried the 'runas' cmd command (run through os.system() as well as a separate batch script), but this doesn't work as it prompts for the user's password. I've tried the /savecred flag and I've run the script manually both as a user and as admin just fine, but if I run this through Task Scheduler, either nothing happens, or there is a perpetual 'RunAs' process that halts my script.
I've tried PsExec, but again that doesn't work in Task Scheduler. If I run even a basic one-line batch file with PsExec as a task, I get error 0xC0000142, which from what I can tell is some DLL error: NT_STATUS_DLL_INIT_FAILED.
The only solution I can think of is running two different Python scripts in Task Scheduler (one as admin, one as non-admin), but this is not ideal as I want only one Uvicorn/FastAPI server running with one single port.
EDIT -
I figured out a way to grant service perms to the user account with ServiceSecurityEditor, but I'm still open to any suggestions that may be better. I want the setup process for a new machine to be as simple as possible.
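For reference, the service stop/start part of the setup described above might look roughly like the sketch below; the route, the use of sc.exe, and the response shape are assumptions, not taken from the question, and the call still needs either admin rights or the service permissions granted via ServiceSecurityEditor.

import subprocess
from fastapi import FastAPI

app = FastAPI()

@app.post("/service/{name}/stop")   # hypothetical route
def stop_service(name: str):
    # sc.exe needs elevated rights unless the user was granted service
    # permissions (e.g. via ServiceSecurityEditor, as in the edit above).
    result = subprocess.run(["sc", "stop", name], capture_output=True, text=True)
    return {"returncode": result.returncode, "output": result.stdout}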

How can I send a command to X number of EC2 instances via SSH

I have a lot of AWS EC2 instances and I need to execute a Python script on all of them at the same time.
I have been trying, from my PC, to execute the script by sending the required commands via SSH. For this, I created another Python script that opens a cmd terminal and then executes some commands (the ones needed to run the Python script on each instance). Since I need all these cmd terminals to be opened at the same time, I used ThreadPoolExecutor, which (with my PC's characteristics) grants me 60 runs in parallel. This is the code:
import os
from concurrent.futures import ThreadPoolExecutor

# One IP address per line in hosts.txt
ipAddressesList = list(open("hosts.txt").read().splitlines())

def functionMain(threadID):
    # Open a cmd window and run the script on the remote instance over SSH
    # (note: ssh expects user@host)
    os.system(r'start cmd ssh -o StrictHostKeyChecking=no -i mysshkey.pem ec2-user@'
              + ipAddressesList[threadID] + ' "cd scripts && python3.7 script.py"')

functionMainList = list(range(0, len(ipAddressesList)))
with ThreadPoolExecutor() as executor:
    results = executor.map(functionMain, functionMainList)
The problem with this is that the command that executes script.py blocks the terminal until the process ends, so functionMain stays waiting for the result. I would like the function to return right after sending the command python3.7 script.py, while the script keeps executing on the instance, so the pool executor can continue with the other threads.
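As a concrete illustration of what the question is after (not taken from the answers below, and untested): running the remote script under nohup, backgrounded and with output redirected, lets ssh return as soon as the command is launched, freeing the worker thread. A minimal sketch, dropping the start cmd window and assuming the same hosts.txt and key file:

import os
from concurrent.futures import ThreadPoolExecutor

ipAddressesList = list(open("hosts.txt").read().splitlines())

def functionMain(threadID):
    # nohup + '&' + output redirection lets the remote shell detach the
    # script, so ssh returns immediately and the worker thread is freed.
    remote_cmd = 'cd scripts && nohup python3.7 script.py > script.log 2>&1 &'
    os.system('ssh -o StrictHostKeyChecking=no -i mysshkey.pem ec2-user@'
              + ipAddressesList[threadID] + ' "' + remote_cmd + '"')

with ThreadPoolExecutor() as executor:
    executor.map(functionMain, range(len(ipAddressesList)))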
The AWS Systems Manager Run Command can be used to run scripts on multiple Amazon EC2 instances (and even on-premises computers if they have the Systems Manager agent installed).
The Run Command can also provide back results of the commands run on each instance.
This is definitely preferable to connecting to the instances via SSH to run commands.
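A rough boto3 sketch of the Run Command approach (the instance ID and command are placeholders, and the instances need the SSM agent plus an IAM role that allows Systems Manager):

import boto3

ssm = boto3.client('ssm')
response = ssm.send_command(
    InstanceIds=['i-0123456789abcdef0'],          # hypothetical instance ID
    DocumentName='AWS-RunShellScript',
    Parameters={'commands': ['cd scripts && python3.7 script.py']},
)
command_id = response['Command']['CommandId']
# Results per instance can later be fetched with:
# ssm.get_command_invocation(CommandId=command_id, InstanceId='i-0123456789abcdef0')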
Forgive me for not providing a "code" answer, but I believe there are existing tools that already solve this problem. This sounds like an ideal use of ClusterShell:
ClusterShell provides a light and unified command execution Python framework to help administer GNU/Linux or BSD clusters. Some of the most important benefits of using ClusterShell are to:
provide an efficient, parallel and highly scalable command execution engine in Python,
Using clush you can execute commands in parallel across many nodes. It has options for grouping the output by hostname as well.
Another option would be to use Ansible, but you'll need to create a playbook in that case whereas with ClusterShell you are running a command the same way you would with SSH. With Ansible, you will create a target group for a playbook and it will connect up to each instance and tell it to run the playbook. To make it disconnect while the command is still running, look into asynchronous actions:
By default Ansible runs tasks synchronously, holding the connection to the remote node open until the action is completed. This means within a playbook, each task blocks the next task by default, meaning subsequent tasks will not run until the current task completes. This behavior can create challenges. For example, a task may take longer to complete than the SSH session allows for, causing a timeout. Or you may want a long-running process to execute in the background while you perform other tasks concurrently. Asynchronous mode lets you control how long-running tasks execute.
I've used both of these in HPC environments with more than 5,000 machines and they both will work well for your purpose.

How to create a nodejs instance to run cron jobs at set schedule?

I need to create a nodejs "server" which won't actually serve any assets or content, but will just run a scheduled job to fetch contents from one database and update another database. The schedule of the job should be configurable, and it should be possible to cancel the job at any time. Basically, what I need is to run a node script periodically. In the past I have created node/express projects, but I am having a hard time understanding how to implement such a node instance that will run on a remote machine, and how to start or terminate it. I found an npm package called "node-schedule" which runs a job periodically, but how do I put this package on a remote machine instance and run it?
One possibility that was considered was to schedule a cron job on the remote machine which would execute "node updateDB.js" on a set schedule, but it is a requirement to keep everything in the Node package and not depend on cron.
Sounds like a job for ssh.
Personally I wouldn't use NodeJS for this; it should be pretty trivial to do, with Node or otherwise, so I'm honestly not sure why you are stuck. I have nothing against Node, but I don't see why it would be necessary for this task, though you could certainly use it.
EDIT: After reading your comment I'm convinced someone thinks Node is a good tool for this task. I guess I don't understand where you are stuck. What part are you stuck on?
I think you should be able to puzzle this out pretty fast. The link below should be enough to put this together. http://book.mixu.net/node/ch9.html
If you need to execute ad hoc commands on a remote server you could use Node to call an Ansible playbook, in that case you'll need to share the public ssh key on the target instance(s) with the instance issuing the commands. There are other ways to skin this cat, but based on the information given, that's how I'd do it. I'd use Node and Ansible (requires python) + SSH.
Oh neato, maybe if I were forced to use NodeJS I'd use this package. https://www.npmjs.com/package/ssh2-exec

How do I create an application that runs in the background and is interactive in Linux?

I want to create an application that runs in the background in Linux (a daemon) that will, at set times (5 times every day), play a music file or any given sound. I want this daemon to start when the computer is started in terminal mode (non-GUI). I want to know if this is possible and, if so, what considerations, tools, and programming language would be the most efficient for doing so. This will be a dedicated computer that will only be executing this task, so any recommendations on how I can maximize efficiency while disabling features not required for this task will be appreciated. Also, could you please explain how processes and tasks work in the terminal (non-GUI)? I always thought the terminal was something like CMD in Windows and could only run tasks one at a time.
EDIT: I need the sound to run at variable times, I'll be fetching these times from a website. Any suggestions regarding how to achieve this?
Thanks for the help and sorry for any shortcoming in the questions or my research.
Look at using cron to run your tasks. cron is a very flexible scheduling utility built in to most Linux distributions.
Basically, with cron you specify a task to run (your main program, or maybe just a sound-playing program), all of its arguments, and when it runs. cron takes care of running it, and will even send you "mail" if the job produces any output (such as errors).
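As a rough illustration (not part of the original answer), the program that cron invokes could be as small as the Python script below; the player command and file path are assumptions:

#!/usr/bin/env python3
# Minimal sketch of a script cron could run at each scheduled time
# (assumes a command-line player such as aplay is installed and that
# the sound file path is adjusted to the real one).
import subprocess

SOUND_FILE = "/home/user/sounds/alarm.wav"   # hypothetical path

subprocess.run(["aplay", SOUND_FILE], check=True)

A crontab entry such as 0 6,9,12,15,18 * * * /usr/bin/python3 /home/user/play_sound.py (times and paths hypothetical) would then run it at the fixed times; for the variable times mentioned in the edit, the script itself would need to fetch the schedule and decide whether to play.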
You can make processes fork into a subprocess of your terminal, i.e. you are able to run more than one task at a time, by putting an & after your terminal command:
> cmd&
> [you can type other commands here but the "cmd" program is still running]
However, for services you generally don't have to worry about starting them as subprocesses because the system already knows to do this. Here's a good question from Super User that has an example of a working service. Simply place your service as a shell script in /etc/init.d and it will be automatically started as a service.

Is it possible to use Jenkins server to run custom tasks one by one?

Is it possible to use Jenkins server to run custom tasks one by one?
By task I mean to execute an external groovy program which designed as an independent performance and integration test for specific deployment.
If it is possible then how to:
To define tasks in Jenkins and group them so they can start by starting a group.
To see an output of each task (output log).
If there is a specific outcome like "-1" then stop execution of the whole group.
And all this should start automatically after software has been built and deployed.
I feel there has to be a way to do it with Jenkins utilising its out-of-the-box functionality, just not sure how. Or I am wrong and we are looking at custom plugin as a solution?
Thanks a lot!
P.S. I am not asking for detailed answer, just a general direction would be Ok. Also Jenkins is not a requirement, it can be another similar CI server.
It sounds like this could work as a simple Jenkins job with an Execute shell build step.
The Console Output for the job will contain the output from the processes that you run externally, and the exit status of the script can mark the task as failed (any non-zero exit code will do this by default).
On unix systems, #! beginning the first line will denote the script environment to use.
To chain this together with the other Jenkins steps, you can use Build Triggers for Build after other projects are built and use your deployment step as the starting off point.
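As an illustration of the exit-status behaviour described above, a wrapper script called from the Execute shell step might look like this (the groovy command and file name are placeholders, not from the question):

#!/usr/bin/env python3
# Hypothetical wrapper for one task: run the external test program and exit
# non-zero if it fails, which by default marks the Jenkins job as failed and
# stops a chained group of jobs.
import subprocess
import sys

result = subprocess.run(["groovy", "performance_test.groovy"])  # placeholder command
if result.returncode != 0:
    print(f"Task failed with exit code {result.returncode}", file=sys.stderr)
    sys.exit(1)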
It is possible, but be careful. Normally Jenkins is used to run build jobs and to deploy software to a QA or staging server. It does not touch Production. But when you start doing this in Jenkins you increase the risk that someone will accidentally run a production job that should not have been run. So if you do decide to use Jenkins for this, set up an entirely separate instance of Jenkins that does nothing other than run these jobs. Then go to Manage Jenkins->Configure Global Security and set up login users. At the least, use "logged in users can do anything" but it would be better to set up "matrix-based security". Then run any jobs that you need by using an Execute Shell step. You can schedule jobs by using a Build Trigger, and you can connect jobs sequentially by setting up Build Other Projects in the post build section. If you want to do more complex job chaining, look into the Join Plugin.
Just keep this Jenkins entirely separate from the Jenkins which you use for CI.
