We have configured a webserver, a scheduler, and a VMSS for workers in Airflow. We created a Postgres database for the metadata of all Airflow-related activity (tasks, connections), Redis for orchestrating the workers, and Azure Blob Storage for logging. We created a sample DAG, and when it is triggered it stays in the running state and is never executed. The scheduler, workers, and webserver are all up and working fine, so I am not sure why my jobs are not picked up by the scheduler. Is there a connection I could have missed? Kindly let me know.
Ensure you have set a FERNET_KEY and that it is the same across the webserver, scheduler, and workers.
To generate a FERNET_KEY:
https://bcb.github.io/airflow/fernet-key
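If that page is unavailable, a key can also be generated locally. This is a minimal sketch: a Fernet key is 32 random bytes encoded as URL-safe base64, so the standard library is enough here. The function name generate_fernet_key is only illustrative; if the cryptography package is installed, Fernet.generate_key() produces a key in the same format.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a new Fernet key: 32 random bytes, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    # Generate the key once, then copy the same value to every component.
    print(generate_fernet_key())
```

Generate the key once and set the identical value on the webserver, the scheduler, and every worker, for example as fernet_key under [core] in airflow.cfg or through the AIRFLOW__CORE__FERNET_KEY environment variable.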
I have a long-running .NET API process/function, hosted in AKS, that usually takes 30 minutes per execution. The API is normally triggered by users from the front end of the app.
Concurrent executions from users are exhausting the app, so I'm planning to implement some sort of queueing mechanism with the help of one or more schedulers.
Which Azure service could call my API in AKS on a schedule (let's say every minute) and check the database for some flag values?
I need a way to check a table for a flag that tells me whether a process is currently running or has completed, so that the next one can be processed; otherwise the call should be ignored until the current one is complete.
I was looking into Azure Web Apps, WebJobs, and Batch jobs, but I'm confused about which one applies to my case.
Please advise. Thank you in advance.
There are a couple of options here.
Hangfire
Hangfire is an open-source library that runs background jobs in queues. In your case, you can enqueue each client request. The Hangfire server then processes the jobs one by one (with retries if a job fails). Hangfire supports SQL Server or Redis as storage, and you can query the storage to see the status of the queued jobs.
Hangfire can also run scheduled jobs, and it takes care that only one job runs at a time.
Azure Service Bus
A more expensive option is to use Azure Service Bus for the queueing capability. For scheduled jobs you can use Kubernetes CronJobs on AKS, but you will have to implement the check yourself to see whether a job is already running.
Overall, I would recommend Hangfire, which can meet your requirements and is cheaper.
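If you do end up implementing the "is a job already running?" check yourself, the logic could look roughly like the sketch below. It is written in Python with an in-process SQLite table purely for illustration; the jobs table, its status values, and all function names are assumptions, not part of Hangfire or any Azure service.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the hypothetical jobs table used as the queue and the flag."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'queued')"
    )
    conn.commit()


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    """Record a user request instead of running it immediately."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection):
    """Called on every scheduler tick; returns (id, payload) or None.

    Returns None when a job is still running or nothing is queued.
    With a shared multi-client database this check-and-claim would also
    need row locking (or an equivalent) to be safe against races.
    """
    with conn:
        running = conn.execute(
            "SELECT 1 FROM jobs WHERE status = 'running' LIMIT 1"
        ).fetchone()
        if running:
            return None
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'queued' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        return row


def mark_done(conn: sqlite3.Connection, job_id: int) -> None:
    """Clear the flag so the next tick can pick up the next job."""
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

The front end would only call enqueue; the scheduled trigger calls claim_next every minute, runs the long process when it gets a job back, and calls mark_done at the end.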
I was just curious about the lifecycle of the Azure Batch's virtual machines. Say a VM is created and a task is completed. Is the VM terminated after a successful completion?
• No. According to the official best practices documentation and the nodes and pools documentation, a VM (compute node) in an Azure Batch pool does not shut down just because a task has completed. If a recurring scheduled job is to run on the pool, the period for which the job runs is defined when the job and its resource allocation are created.
When a Batch job on a VM completes, the VM becomes idle. If it stays idle for a defined threshold period and no further task or job is lined up, the VM is removed from the node pool (or deallocated, if a spot instance is used).
• Please refer to the official documentation below on pool provisioning and compute node lifetime:
https://learn.microsoft.com/en-us/azure/batch/nodes-and-pools#pool-and-compute-node-lifetime
We installed the following Presto cluster on Red Hat Linux 7.2:
presto latest version - 0.216
1 presto coordinator
231 presto workers
On each worker machine we can use the following command to verify the status:
/app/presto/presto-server-0.216/bin/launcher status
Running as 61824
and also stop/start it as follows:
/app/presto/presto-server-0.216/bin/launcher stop
/app/presto/presto-server-0.216/bin/launcher start
I also searched Google for a UI that can manage Presto status/stop/start,
but did not find anything.
It is very strange that Presto does not come with a user interface that shows the cluster status and can stop/start workers when needed.
As everyone knows, the only Presto user interface shows status and has no stop/start actions.
In the example screen above we can see that only 5 of the 231 Presto workers are active, but this UI does not support stop/start actions and does not show on which workers Presto is not active.
So what can we do about it?
It is a very bad idea to access each worker machine to see whether Presto is up or down.
Why does Presto not have a centralized UI that can perform stop/start actions?
Example of what we expect from the UI (partial list):
.
.
.
Presto currently uses a discovery service with which workers announce themselves to join the cluster, so if a worker node is not registered, the coordinator and the discovery server have no way to know it exists or to restart it.
At Qubole, we run an external service alongside the Presto master that tracks nodes which do not register with the discovery service within a certain interval. This service is responsible for removing such nodes from the cluster.
We also run the monit service on each Presto worker node, which ensures that the Presto server is restarted whenever it goes down.
You may have to do something similar for cluster management, as Presto does not provide it right now.
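As a rough illustration of the per-node restart idea (this is not monit itself, just a stand-in that could be run from cron on each worker), the sketch below checks the launcher status and starts Presto when it is down. It assumes that launcher status exits with a non-zero code when the server is not running; the function names are hypothetical, and the path is the one from the question.

```python
import subprocess
from typing import Callable, Sequence

# Launcher path taken from the question; adjust for your installation.
LAUNCHER = "/app/presto/presto-server-0.216/bin/launcher"


def _exit_code(cmd: Sequence[str]) -> int:
    """Run a command quietly and return its exit code."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def ensure_running(
    launcher: str = LAUNCHER,
    run: Callable[[Sequence[str]], int] = _exit_code,
) -> bool:
    """Start Presto if it is down; return True when a start was attempted.

    Assumption: `launcher status` returns exit code 0 only while the
    server is running.
    """
    if run([launcher, "status"]) == 0:
        return False
    run([launcher, "start"])
    return True
```

Calling ensure_running() from a cron entry every minute on each worker gives a crude version of the restart behaviour; the run parameter exists only so the logic can be exercised without a real Presto installation.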
In my opinion, and from my experience managing a prestosql cluster, this comes down to the service discovery patterns in the architecture.
So far, the open-source releases of prestodb/prestosql use the following patterns:
server-side service discovery - a client app such as the presto cli, or any app that uses a presto sdk, only needs to reach the coordinator, without any awareness of the worker nodes.
service registry - a place that keeps track of the available instances.
self-registration - a service instance is responsible for registering itself with the service registry. This is the key part, because it forces several behaviors:
Service instances must be registered with the service registry on startup and unregistered on shutdown
Service instances that crash must be unregistered from the service registry
Service instances that are running but incapable of handling requests must be unregistered from the service registry
So life-cycle management of each Presto worker is left to the instance itself.
So what can we do about it?
The Presto cluster itself provides some observability, such as the HTTP APIs /v1/node and /v1/service/presto, which show instance status. Personally, I recommend using a separate cluster manager such as k8s or Nomad to manage the Presto cluster members.
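For example, the /v1/node output can be compared against your own inventory to see which workers are not registered. The sketch below assumes that the endpoint returns a JSON list whose entries carry a "uri" field such as "http://host:8080", and that your inventory uses the same host form (name or IP) as those URIs; the function names and the coordinator address are hypothetical.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_nodes(coordinator: str, timeout: float = 10.0) -> list:
    """GET /v1/node from the coordinator and return the decoded JSON list."""
    with urlopen(f"{coordinator.rstrip('/')}/v1/node", timeout=timeout) as resp:
        return json.load(resp)


def missing_workers(expected_hosts, nodes) -> list:
    """Return the expected hosts that are absent from the node list, sorted.

    Assumption: each node entry has a "uri" key holding the worker's URL.
    """
    active = {urlparse(node["uri"]).hostname for node in nodes}
    return sorted(set(expected_hosts) - active)
```

Something like missing_workers(inventory, fetch_nodes("http://coordinator:8080")) would then list the machines on which Presto is not active, which is the information the built-in UI does not show.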
It is a very bad idea to access each worker machine to see whether Presto is up or down.
Why does Presto not have a centralized UI that can perform stop/start actions?
No opinion on good/bad. Take k8s for example: you can manage all Presto workers as one k8s deployment, with each Presto worker in its own pod. It can use liveness, readiness, and startup probes to automate the instance lifecycle with a few lines of YAML, e.g. the livenessProbe design in the stable/presto Helm chart. A cluster manager like k8s also provides a web UI, so you can operate on resources as an admin. Or you can choose to write more Java code to extend Presto.
I am evaluating options for launching arbitrary Python tasks/scripts on temporary cloud VMs that get shut down once the job is complete. I am looking across all cloud providers, but the ideal solution should not be vendor-specific. Here is what I found:
Docker Swarm / Kubernetes / Nomad for spinning up Docker containers. All look attractive, but I cannot confirm whether they can terminate VMs once the task is complete.
Cloud Functions/Lambdas look great, but work only for short-lived (a few minutes) tasks. Also, GCP supports only JavaScript.
Spinning VMs up/down explicitly from a launch script with vendor-specific commands. Straightforward and should work.
AWS Batch, Azure Batch - vendor-specific services for batch jobs
AWS Data Pipeline, Azure Data Factory, Google Dataflow - vendor-specific services for data pipelines
Did I miss any good option? Does any container orchestration service like Docker Swarm support allocation and deallocation of multiple temporary VMs to run a one-shot job?
Similar to an Azure worker role, do Amazon Web Services or Heroku offer a service that would be useful for a worker process or a recurring worker role? For example, an hourly job that downloads a file from a URL and parses it into a database or inserts it into DynamoDB.
Ideally it would have a status dashboard web interface that lets you see the status of jobs, errors in jobs, etc.
Heroku has a concept of 'worker dynos', which you can assign to process background jobs or the other tasks you asked about.
More information:
https://devcenter.heroku.com/articles/background-jobs-queueing
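Whatever platform runs it, the job body itself can be small. The sketch below shows one possible shape for the hourly task from the question, using Python and SQLite for illustration only; the CSV layout with name and value columns, the items table, and the function names are all assumptions.

```python
import csv
import io
import sqlite3
from urllib.request import urlopen


def load_rows(conn: sqlite3.Connection, csv_text: str) -> int:
    """Parse CSV text with name,value columns and insert the rows.

    Returns the number of rows inserted.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, value TEXT)")
    rows = [(r["name"], r["value"]) for r in csv.DictReader(io.StringIO(csv_text))]
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO items (name, value) VALUES (?, ?)", rows)
    return len(rows)


def run_once(conn: sqlite3.Connection, url: str) -> int:
    """Download the file at url and load it; one scheduled run of the job."""
    with urlopen(url, timeout=30) as resp:
        return load_rows(conn, resp.read().decode("utf-8"))
```

The worker process (or the platform's scheduler) would call run_once once per hour; keeping the download separate from load_rows makes the parsing step easy to check on its own.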