While reading the Kubernetes agent documentation, I am getting confused by the line below:
"Configure a flow-run to run as a Kubernetes Job."
Does it mean that the process which is in charge of submitting the flow and communicating with the API server will run as a Kubernetes Job?
On the other side, the use case which I am trying to solve is:
1. Set up a backend server
2. Execute a flow composed of 2 tasks
3. If k8s infrastructure is available, the tasks should be executed as Kubernetes Jobs
4. If only Docker infrastructure is available, the tasks should be executed as Docker containers
Can somebody suggest how to solve the above scenario in prefect.io?
That's exactly right. When you use KubernetesAgent, Prefect deploys your flow runs as Kubernetes jobs.
For #1 - you can do that in your agent YAML file as follows:
env:
  - name: PREFECT__CLOUD__AGENT__AUTH_TOKEN
    value: ''
  - name: PREFECT__CLOUD__API
    value: "http://some_ip:4200/graphql"  # paste your GraphQL Server endpoint here
  - name: PREFECT__BACKEND
    value: server
#2 - write your flow
#3 and #4 - this is more challenging to do in Prefect, as there is currently no load-balancing mechanism that is aware of your infrastructure. There are some hacky solutions you may try, but there is no first-class way to handle this in Prefect.
One hack would be: build a parent flow that checks your infrastructure resources and, depending on the outcome, spins up your flow run with either a DockerRun or a KubernetesRun run config.
from prefect import Flow, task, case
from prefect.tasks.prefect import create_flow_run, wait_for_flow_run
from prefect.run_configs import DockerRun, KubernetesRun


@task
def check_the_infrastructure():
    return "kubernetes"


with Flow("parent_flow") as flow:
    infra = check_the_infrastructure()
    with case(infra, "kubernetes"):
        child_flow_run_id = create_flow_run(
            flow_name="child_flow_name", run_config=KubernetesRun()
        )
        k8_child_flowrunview = wait_for_flow_run(
            child_flow_run_id, raise_final_state=True, stream_logs=True
        )
    with case(infra, "docker"):
        child_flow_run_id = create_flow_run(
            flow_name="child_flow_name", run_config=DockerRun()
        )
        docker_child_flowrunview = wait_for_flow_run(
            child_flow_run_id, raise_final_state=True, stream_logs=True
        )
But note that this would require you to have 2 agents running at all times: a Kubernetes agent and a Docker agent.
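As a small sketch of how to make that routing explicit (the label names "k8s" and "docker" here are my own, hypothetical choices), you can attach labels to each run config and start each agent with the matching label, so a flow run is only picked up by the agent that can actually execute it:

from prefect.run_configs import DockerRun, KubernetesRun

# Hypothetical labels; start the agents with matching labels, e.g.
#   prefect agent kubernetes start --label k8s
#   prefect agent docker start --label docker
k8s_run_config = KubernetesRun(labels=["k8s"])
docker_run_config = DockerRun(labels=["docker"])

These run configs can then be passed to create_flow_run in the corresponding case branches of the parent flow above.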
Using the v2 Azure ML Python SDK (azure-ai-ml), how do I get an instance of the currently running job?
In v1 (azureml-core) I would do:
from azureml.core import Run

run = Run.get_context()
if isinstance(run, Run):
    print("Running on compute...")
What is the equivalent in the v2 SDK?
This is a little more involved in v2 than it was in v1. The reason is that v2 makes a clear distinction between the control plane (where you start/stop your job, deploy compute, etc.) and the data plane (where you run your data science code, load data from storage, etc.).
Jobs can do control plane operations, but they need to do that with a proper identity that was explicitly assigned to the job by the user.
Let me first show you the code for how to do this. This script creates an MLClient and then uses that client to connect to the service and retrieve the job's metadata, from which it extracts the name of the user that submitted the job:
# control_plane.py
from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential
import os


def get_ml_client():
    uri = os.environ["MLFLOW_TRACKING_URI"]
    uri_segments = uri.split("/")
    subscription_id = uri_segments[uri_segments.index("subscriptions") + 1]
    resource_group_name = uri_segments[uri_segments.index("resourceGroups") + 1]
    workspace_name = uri_segments[uri_segments.index("workspaces") + 1]
    credential = AzureMLOnBehalfOfCredential()
    client = MLClient(
        credential=credential,
        subscription_id=subscription_id,
        resource_group_name=resource_group_name,
        workspace_name=workspace_name,
    )
    return client


ml_client = get_ml_client()
this_job = ml_client.jobs.get(os.environ["MLFLOW_RUN_ID"])
print("This job was created by:", this_job.creation_context.created_by)
As you can see, the code uses a special AzureMLOnBehalfOfCredential to create the MLClient. Options that you would use locally (AzureCliCredential or InteractiveBrowserCredential) won't work for a remote job since you are not authenticated through az login or through the browser prompt on that remote run. For your credentials to be available on the remote job, you need to run the job with user_identity. And you need to retrieve the corresponding credential from the environment by using the AzureMLOnBehalfOfCredential class.
So, how do you run a job with user_identity? Below is the YAML that will achieve it:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
type: command
command: |
  pip install azure-ai-ml
  python control_plane.py
code: code
environment:
  image: library/python:latest
compute: azureml:cpu-cluster
identity:
  type: user_identity
Note the identity section at the bottom. Also note that I am being lazy here and install the azure-ai-ml SDK as part of the job. In a real setting, I would of course create an environment with the package already installed.
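If it helps, this is one way such a job spec could be submitted with the v2 CLI; the file name job.yml and the resource group/workspace names below are placeholders, not values from the question:

# job.yml, my-resource-group and my-workspace are placeholder names
az ml job create --file job.yml --resource-group my-resource-group --workspace-name my-workspace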
These are the valid settings for the identity type:
aml_token: this is the default, which will not allow you to access the control plane.
managed or managed_identity: the job will run under the given managed identity (aka the compute identity). This is accessed in your job via azure.identity.ManagedIdentityCredential. Of course, you need to grant the chosen compute identity access to the workspace so it can read job information.
user_identity: this runs the job under the submitting user's identity. It is to be used with the azure.ai.ml.identity.AzureMLOnBehalfOfCredential credentials as shown above.
So, for your use case, you have 2 options:
You could run the job with user_identity and use the AzureMLOnBehalfOfCredential class to create the MLClient.
You could create the compute with a managed identity, give that identity access to the workspace, and then run the job with managed_identity and use the ManagedIdentityCredential class to create the MLClient (a sketch of this follows below).
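A minimal sketch of the second option, assuming the compute cluster's managed identity already has read access on the workspace (the subscription, resource group, and workspace values are placeholders):

# Sketch for the managed_identity option; workspace details are placeholders.
import os

from azure.ai.ml import MLClient
from azure.identity import ManagedIdentityCredential

# Inside the job this picks up the compute's system-assigned identity;
# for a user-assigned identity, pass client_id="<identity-client-id>".
credential = ManagedIdentityCredential()

ml_client = MLClient(
    credential=credential,
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)
this_job = ml_client.jobs.get(os.environ["MLFLOW_RUN_ID"])
print("This job was created by:", this_job.creation_context.created_by)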
I'm making a cron job that switches cluster context every time and checks for stuff. But for switching the context to EKS, I need to run aws configure every time to get logged in.
I'm wondering how this step can be fulfilled via a cron job that will also switch the context to EKS. If it is possible to run aws configure like aws configure | key1 | key2 | region, I'll pass the input in via string templating.
Since you are using EKS, I assume you are also using the aws-auth ConfigMap. To talk to EKS, you need to use a role or user that is listed in aws-auth.
Here is what you can do now:
Set up your credentials file (~/.aws/credentials) with multiple profiles:
[profile1]
...
[profile2]
...
Then you can switch profiles in your script with this environment variable:
export AWS_PROFILE=profile1
For example:
export AWS_PROFILE=profile1
aws eks ...
kubectl ...
export AWS_PROFILE=profile2
aws eks ...
kubectl ...
The export parts might look different in the real world, but the basic script is similar.
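For completeness, here is a minimal, hypothetical script along those lines (the profile names, regions, and cluster names are made up) that avoids interactive aws configure entirely by using the profiles above together with aws eks update-kubeconfig to write and switch the kubeconfig context:

#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names: replace with your real profiles, regions, and cluster names.
export AWS_PROFILE=profile1
aws eks update-kubeconfig --region us-east-1 --name cluster-a   # writes and selects cluster A's context
kubectl get nodes

export AWS_PROFILE=profile2
aws eks update-kubeconfig --region eu-west-1 --name cluster-b   # switches to cluster B's context
kubectl get nodes

Scheduled from cron, this removes the need for any interactive login, as long as the credentials in each profile map to a role or user present in the respective cluster's aws-auth ConfigMap.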
I would like to use an Azure Machine Learning Compute Cluster as a compute target but do not want it to containerize my project. Is there a way to deactivate this "feature"?
The main reasons behind this request are:
I already set up a docker-compose file that specifies 3 containers for Apache Airflow and want to avoid a Docker-in-Docker situation, especially since I already tried to do so but have failed so far (here's the link to my other related SO question).
I prefer not to use a Compute Instance as it is tied to an Azure account, which is not ideal for automation purposes.
Thanks in advance!
Use the provisioning_configuration method of the AmlCompute class to specify configuration parameters.
In the following example, a persistent compute target provisioned by AmlCompute is created. The provisioning_configuration parameter in this example is of type AmlComputeProvisioningConfiguration, which is a child class of ComputeTargetProvisioningConfiguration.
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Verify that the cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)
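As a hedged usage sketch (the experiment name and entry script below are placeholders, not part of the original answer), a run can then be submitted to this cluster with a ScriptRunConfig:

from azureml.core import Experiment, ScriptRunConfig

# Placeholder names: adjust source_directory, script, and the experiment name to your project.
src = ScriptRunConfig(
    source_directory=".",
    script="train.py",
    compute_target=cpu_cluster,  # the AmlCompute target created above
)
run = Experiment(ws, "example-experiment").submit(src)
run.wait_for_completion(show_output=True)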
Refer to https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.amlcompute(class)?view=azure-ml-py
I've installed Prometheus Operator 0.34 (which works as expected) on cluster A (the main Prometheus).
Now I want to use the federation option, i.e. collect metrics from another Prometheus which is located on another K8s cluster B.
Scenario:
I have the MAIN Prometheus Operator v0.34 config in cluster A.
I have the SLAVE Prometheus 2.13.1 config in cluster B.
Both were installed successfully via Helm; I can access localhost via port-forwarding and see the scraping results on each cluster.
I did the following steps:
Use additionalScrapeConfigs on the operator (main cluster A).
I added the following to the values.yaml file and updated it via Helm:
additionalScrapeConfigs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 101.62.201.122:9090  # The External-IP and port of the target Prometheus on cluster B
I took the target as follows: on the Prometheus inside cluster B (from which I want to collect the data) I ran:
kubectl get svc -n monitoring
From the returned entries I took the EXTERNAL-IP and put it inside the additionalScrapeConfigs config entry.
Now I switch to cluster A and run kubectl port-forward svc/mon-prometheus-operator-prometheus 9090:9090 -n monitoring.
I open the browser at localhost:9090, see the graphs, click on Status and then on Targets, and see the new target with job federate.
Now to my main questions/gaps (security & verification):
To be able to see that target's state as green, I configured the Prometheus server in cluster B to use type: LoadBalancer instead of type: NodePort, which exposes the metrics outside the cluster. This can be good for testing, but I need to secure it. How can that be done?
How do I make the end-to-end flow work in a secure way?
TLS:
https://prometheus.io/docs/prometheus/1.8/configuration/configuration/#tls_config
Inside cluster A (the main cluster) we use certificates for our services with Istio, like the following, which works:
tls:
  mode: SIMPLE
  privateKey: /etc/istio/oss-tls/tls.key
  serverCertificate: /etc/istio/oss-tls/tls.crt
I see that inside the docs there is an option to configure:
additionalScrapeConfigs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 101.62.201.122:9090  # The External-IP and port from the target
    # tls_config:
    #   ca_file: /opt/certificate-authority-data.pem
    #   cert_file: /opt/client-certificate-data.pem
    #   key_file: /sfp4/client-key-data.pem
    #   insecure_skip_verify: true
But I am not sure which certificate I need to use inside the Prometheus Operator config: the certificate of the main Prometheus in cluster A or of the slave in cluster B?
You should consider using Additional Scrape Configuration:
AdditionalScrapeConfigs allows specifying a key of a Secret containing additional Prometheus scrape configurations. Scrape configurations specified are appended to the configurations generated by the Prometheus Operator.
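For concreteness, here is a minimal, hedged sketch of that wiring (the Secret name, file name, and namespace are my own placeholders): the extra scrape config goes into a file, the file into a Secret, and the Prometheus custom resource points at that Secret.

# 1) Put the extra scrape config (the 'federate' job from the question) into prometheus-additional.yaml
# 2) Create the Secret (names and namespace are placeholders):
#      kubectl create secret generic additional-scrape-configs \
#        --from-file=prometheus-additional.yaml -n monitoring
# 3) Reference it from the Prometheus custom resource:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
  namespace: monitoring
spec:
  additionalScrapeConfigs:
    name: additional-scrape-configs    # the Secret created above
    key: prometheus-additional.yaml    # the key (file name) inside the Secret

When installing via the prometheus-operator Helm chart, the additionalScrapeConfigs value shown in the question is typically rendered into such a Secret for you.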
I am afraid this is not officially supported. However, you can update the prometheus.yml section within the Helm chart. If you want to learn more about it, check out this blog.
I see two options here:
Connections to Prometheus and its exporters are not encrypted and authenticated by default. This is one way of fixing that: with TLS certificates and stunnel.
Or specify Secrets which you can add to your scrape configuration.
Please let me know if that helped.
A couple of options spring to mind:
Put the two clusters in the same network space and put a firewall in front of them.
Set up a VPN tunnel between the clusters.
Use Istio multicluster routing (but this could get complicated): https://istio.io/docs/setup/install/multicluster
I have written a scheduled function in Node.js using TypeScript that deploys successfully. The related Pub/Sub topic gets created automatically, but somehow the related scheduler job does not.
This is even after getting these lines:
i scheduler: ensuring necessary APIs are enabled...
i pubsub: ensuring necessary APIs are enabled...
+ scheduler: all necessary APIs are enabled
+ pubsub: all necessary APIs are enabled
+ functions: created scheduler job firebase-schedule-myFunction-us-central1
+ functions[myFunction(us-central1)]: Successful create operation.
+ Deploy complete!
I have cloned the sample at https://github.com/firebase/functions-samples/tree/master/delete-unused-accounts-cron which deploys and automatically creates both the related Pub/Sub topic and the scheduler job.
What could I be missing?
Try to change .timeZone('utc') (per the docs) to .timeZone('Etc/UTC') (also per the self-contradictory docs).
It seems that when using the 'every 5 minutes' syntax, the deploy does not create the scheduler job.
Switching to cron syntax solved the problem for me.
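As a hedged illustration of both suggestions above (the function name and schedule below are made up, not taken from the question), the scheduled function can be declared with cron syntax and the 'Etc/UTC' time zone:

import * as functions from "firebase-functions";

// Hypothetical example: cron syntax instead of "every 5 minutes", and "Etc/UTC" instead of "utc"
export const myFunction = functions.pubsub
  .schedule("*/5 * * * *")
  .timeZone("Etc/UTC")
  .onRun(async (context) => {
    console.log("Scheduled run at", context.timestamp);
    return null;
  });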
Maybe your cron syntax isn't correct. There are some tools to validate the syntax.
Check your firebase-debug.log.
At some point, it will invoke a POST request to:
>> HTTP REQUEST POST https://cloudscheduler.googleapis.com/v1beta1/projects/*project_name*/locations/*location*/jobs
This must return a 200 response.