Azure DevOps build pipeline log ID for a specific task, dynamically [duplicate]

This question already has an answer here:
Azure DevOps pipeline logs for a specific task
(1 answer)
Closed last month.
In Azure DevOps, I have a multi-stage build pipeline. In that pipeline, I have a task named Terraform_init. I need the log ID of this task, dynamically. How do I find out the log ID dynamically if I know the display name of the task?
Current situation:
Right now I have figured out the log ID for that task. But later I will add more tasks before the Terraform_init task in my build pipeline, so the log ID for that task will change.
Why I need:
After the Terraform_init task, I have another task called Get_logs. This task gets the logs of the Terraform_init task and saves them in a blob. For that, I have to use the following line:
$logs_url = ('https://dev.azure.com/bmw-ai-big-data-platform/{0}/_apis/build/builds/{1}/logs/27?api-version=6.0' -f $($env:SYSTEM_TEAMPROJECTID), $($env:BUILD_BUILDID) )
I will have more tasks before the Terraform_init task, so the ../logs/27.. part will need to be updated every time. I want to avoid this.
Thanks in advance.

Use this code to get the log IDs:
import requests
import json

url = "https://dev.azure.com/<orgname>/<project name>/_apis/build/builds/<build id>/timeline?api-version=6.0"

payload = {}
headers = {
    'Authorization': 'Basic <base64encoded PAT>'
}

response = requests.request("GET", url, headers=headers, data=payload)
response_json = response.json()

records = response_json['records']
for record in records:
    if record['name'] == 'Initialize job':
        print(record['log']['id'])
After that, use a logging command to output the value as a pipeline variable, and then you can use it in the following tasks (this only works at runtime; it cannot be resolved at compile time).
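For example, the matched log ID can be published from the same Python script with an Azure DevOps logging command. This is only a sketch, assuming the task's display name is Terraform_init and using an illustrative variable name terraformInitLogId:

log_id = None
for record in records:
    # Match on the task's display name instead of 'Initialize job'
    if record['name'] == 'Terraform_init' and record.get('log'):
        log_id = record['log']['id']

if log_id is not None:
    # The ##vso[task.setvariable ...] logging command makes the value available to later tasks at runtime
    print("##vso[task.setvariable variable=terraformInitLogId]{}".format(log_id))

The Get_logs task can then build its URL from $(terraformInitLogId) instead of a hard-coded /logs/27.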

Related

how to launch a cloud dataflow pipeline when particular set of files reaches Cloud storage from a google cloud function

I have a requirement to create a cloud function which should check for a set of files in a GCS bucket, and only if all of those files arrive in the bucket should it launch the Dataflow templates for those files.
My existing cloud function code launches a Cloud Dataflow job for each file that comes into the GCS bucket. It runs different Dataflow templates for different files based on a naming convention. This existing code works fine, but my intention is not to trigger Dataflow for each uploaded file directly.
It should check for the set of files and, only once all the files have arrived, launch the Dataflow jobs for those files.
Is there a way to do this using Cloud Functions, or is there an alternative way of achieving the desired result?
from googleapiclient.discovery import build
import time

def df_load_function(file, context):
    filesnames = [
        'Customer_',
        'Customer_Address',
        'Customer_service_ticket'
    ]

    # Check the uploaded file and run related dataflow jobs.
    for i in filesnames:
        if 'inbound/{}'.format(i) in file['name']:
            print("Processing file: {filename}".format(filename=file['name']))

            project = 'xxx'
            inputfile = 'gs://xxx/inbound/' + file['name']
            job = 'df_load_wave1_{}'.format(i)
            template = 'gs://xxx/template/df_load_wave1_{}'.format(i)
            location = 'asia-south1'

            dataflow = build('dataflow', 'v1b3', cache_discovery=False)
            request = dataflow.projects().locations().templates().launch(
                projectId=project,
                gcsPath=template,
                location=location,
                body={
                    'jobName': job,
                    "environment": {
                        "workerRegion": "asia-south1",
                        "tempLocation": "gs://xxx/temp"
                    }
                }
            )

            # Execute the dataflow job
            response = request.execute()
            job_id = response["job"]["id"]
I've written the code below for the above functionality. The cloud function runs without any error, but it does not trigger any Dataflow job. I'm not sure what is happening, as the logs show no error.
from googleapiclient.discovery import build
import time
import os

def df_load_function(file, context):
    filesnames = [
        'Customer_',
        'Customer_Address_',
        'Customer_service_ticket_'
    ]
    paths = ['Customer_', 'Customer_Address_', 'Customer_service_ticket_']
    for path in paths:
        if os.path.exists('gs://xxx/inbound/') == True:
            # Check the uploaded file and run related dataflow jobs.
            for i in filesnames:
                if 'inbound/{}'.format(i) in file['name']:
                    print("Processing file: {filename}".format(filename=file['name']))

                    project = 'xxx'
                    inputfile = 'gs://xxx/inbound/' + file['name']
                    job = 'df_load_wave1_{}'.format(i)
                    template = 'gs://xxx/template/df_load_wave1_{}'.format(i)
                    location = 'asia-south1'

                    dataflow = build('dataflow', 'v1b3', cache_discovery=False)
                    request = dataflow.projects().locations().templates().launch(
                        projectId=project,
                        gcsPath=template,
                        location=location,
                        body={
                            'jobName': job,
                            "environment": {
                                "workerRegion": "asia-south1",
                                "tempLocation": "gs://xxx/temp"
                            }
                        }
                    )

                    # Execute the dataflow job
                    response = request.execute()
                    job_id = response["job"]["id"]
        else:
            exit()
Could someone please help me with the above Python code?
Also, my file names have the current date appended at the end, as these are incremental files which I get from different source teams.
If I'm understanding your question correctly, the easiest thing to do is to write basic logic in your function that determines if the entire set of files is present. If not, exit the function. If yes, run the appropriate Dataflow pipeline. Basically implementing what you wrote in your first paragraph as Python code.
If it's a small set of files, it shouldn't be an issue to have a function run on each upload to check set completeness. Even if it's, for example, 10,000 files a month, the cost of this service is extremely small, assuming:
Your function isn't using lots of bandwidth to transfer data
The code for each function invocation doesn't take a long time to run.
Even in scenarios where you can't meet these requirements, Functions is still pretty cheap to run.
If you're worried about costs I would recommend checking out the Google Cloud Pricing Calculator to get an estimate.
Edit with updated code:
I would highly recommend using the Google Cloud Storage Python client library for this. Using os.path likely won't work as there are additional underlying steps required to search a bucket...and probably more technical details there than I fully understand.
To use the Python client library, add google-cloud-storage to your requirements.txt. Then, use something like the following code to check the existence of an object. This example is based off an HTTP trigger, but the gist of the code to check object existence is the same.
from google.cloud import storage

def hello_world(request):
    # Instantiate GCS client
    client = storage.Client()

    # Instantiate bucket definition
    bucket = client.bucket("bucket-name")

    # Search for objects (filenames is assumed to be defined elsewhere)
    for file in filenames:
        blob = bucket.blob(file)
        if blob.exists() and "name_modifier" in file:
            # Run name_modifier Dataflow job
            pass
        elif blob.exists() and "name_modifier_2" in file:
            # Run name_modifier_2 Dataflow job
            pass
        else:
            return "File not found"
This code isn't exactly what you want from a logic standpoint, but it should get you started. You'll probably want to first make sure all of the objects can be found, and then move to another step where you start running the corresponding Dataflow jobs for each file if they were all found in the previous step.
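A rough sketch of that two-step flow under the same assumptions (bucket name xxx, the inbound/ prefix, and a hypothetical run_dataflow_job helper standing in for the templates().launch() call shown earlier):

from google.cloud import storage

def all_files_present(bucket_name, prefixes):
    # Return True only if every expected prefix has at least one object under inbound/.
    client = storage.Client()
    found = set()
    for blob in client.list_blobs(bucket_name, prefix="inbound/"):
        for p in prefixes:
            if blob.name.startswith("inbound/" + p):
                found.add(p)
    return found == set(prefixes)

def df_load_function(file, context):
    prefixes = ['Customer_', 'Customer_Address_', 'Customer_service_ticket_']
    if not all_files_present('xxx', prefixes):
        # Not all files have arrived yet; exit and wait for the next upload event.
        return
    for p in prefixes:
        run_dataflow_job(p)  # hypothetical helper wrapping the Dataflow template launch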

How to use Pipeline parameters on AzureML

I've built a pipeline in the AzureML Designer and I'm trying to use pipeline parameters, but I'm not able to get the values of those parameters in a Python script module.
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline
This documentation contains a section called "Use pipeline parameters for arguments that change at inference time" but, unfortunately, it is empty.
I'm defining the parameters in the pipeline settings; see the screenshot at the bottom. Does anyone know how to use the parameters when using the Designer to build the pipeline?
You can correlate each pipeline stage's outputs with its inputs. For example, given the results of a model evaluation, we should be able to easily identify all the artifacts (model evaluation configuration, model specification, model parameters, training script, training data, etc.) pertaining to that evaluation.
Azure Machine Learning Pipelines Referenced Article:
https://github.com/Azure/MachineLearningNotebooks/blob/4a3f8e7025334ea8c0de0bada69b031ce54c24a0/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb
We have an AMLS pipeline that we parameterize with a date string so we can run it in the context of old historical dates.
Here's the code we're using to submit the pipeline:
from azureml.core.authentication import InteractiveLoginAuthentication
import requests

auth = InteractiveLoginAuthentication()
aad_token = auth.get_authentication_header()

rest_endpoint = published_pipeline.endpoint
print("You can perform HTTP POST on URL {} to trigger this pipeline".format(rest_endpoint))

# specify the param when running the pipeline
response = requests.post(rest_endpoint,
                         headers=aad_token,
                         json={"ExperimentName": "dtpred-Dock2RTEG-EX-param",
                               "RunSource": "SDK",
                               "DataPathAssignments": {"input_datapath": {"DataStoreName": "erpgen2datastore", "RelativePath": "teams/PredictiveInsights/DatePrediction/2019/10/10"}},
                               "ParameterAssignments": {"param_inputDate": "2019/10/10"}})

run_id = response.json()["Id"]
print('Submitted pipeline run: ', run_id)
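For comparison, when a pipeline is built with the SDK rather than the Designer, a parameter is declared with PipelineParameter and passed to the step as a script argument, which the script reads back with argparse. A minimal sketch (step, script, and compute names are illustrative):

from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineParameter
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Declare the parameter with a default; callers override it via ParameterAssignments.
input_date_param = PipelineParameter(name="param_inputDate", default_value="2019/10/10")

step = PythonScriptStep(
    script_name="process.py",
    arguments=["--input_date", input_date_param],
    compute_target="cpu-cluster",
    source_directory="./scripts",
)

pipeline = Pipeline(workspace=ws, steps=[step])

# Inside process.py the value arrives as an ordinary command-line argument:
#   parser = argparse.ArgumentParser()
#   parser.add_argument("--input_date")
#   args = parser.parse_args()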

'Delay until' finish time of 'Queue a new build' not working in Azure Logic App

I'm triggering an Azure Logic App from an HTTPS webhook for a Docker image in Azure Container Registry.
The workflow is roughly:
When a HTTP request is received
Queue a new build
Delay until
FinishTime of Queue a new build
See: Workflow image
The Delay until action doesn't work, in that the queried FinishTime is 0001-01-01T00:00:00.
It complains about the wrong format, so I manually added a Z after the FinishTime keyword.
Now the time stamp is in the right format, however, the timestamp 0001-01-01T00:00:00Z obviously doesn't make sense and subsequent steps are executed without delay.
Anything that I am missing?
edit: Queue a new build queues an Azure pipeline build. I.e. the FinishTime property comes from the pipeline.
You need to set a timestamp in the future; the timestamp 0001-01-01T00:00:00Z passed to the "Delay until" action is not a future time. If you set a timestamp such as 2020-04-02T07:30:00Z, the "Delay until" action will take effect.
Update:
I don't think the "Delay until" action can do what you expect, but maybe you can refer to the operations below. Just add a "Condition" action to judge whether the FinishTime is greater than the current time.
The expression in the "Condition" is:
sub(ticks(variables('FinishTime')), ticks(utcNow()))
In a word, if the FinishTime is greater than the current time, do the "Delay until" action; if the FinishTime is less than the current time, do whatever else you want. (By the way, you need to pay attention to the time zone of your timestamps; you may need to convert all of them to UTC.)
I've been in touch with an Azure support engineer, who has confirmed that the Delay until action should work as I intended to use it, however, that the FinishTime property will not hold a value that I can use.
In the meantime, I have found a workaround, where I'm using some logic and quite a few additional steps. Inconvenient but at least it does what I want.
Here are the most important steps that are executed after the workflow gets triggered from a webhook (docker base image update in Azure Container Registry).
Essentially, I'm initializing the following variables and queuing a new build:
buildStatusCompleted: String value containing the target value completed
jarsBuildStatus: String value containing the initial value notStarted
jarsBuildResult: String value containing the default value failed
Then, I'm using an Until action to monitor when the jarsBuildStatus value switches to completed.
In the Until action, I'm repeating the following steps until jarsBuildStatus changes its value to buildStatusCompleted:
Delay for 15 seconds
HTTP request to Azure DevOps build, authenticating with personal access token
Parse JSON body of previous raw HTTP output for status and result keywords
Set jarsBuildStatus = status
After breaking out of the Until action (loop), the jarsBuildResult is set to the parsed result.
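For reference, the polling the Until loop performs is the same thing the following Python sketch does against the Azure DevOps Builds REST API; the organization, project, and build ID values are placeholders:

import base64
import time
import requests

def wait_for_build(org, project, build_id, pat, poll_seconds=15):
    # Poll the Azure DevOps Builds API until the build reports status 'completed'.
    url = "https://dev.azure.com/{}/{}/_apis/build/builds/{}?api-version=6.0".format(org, project, build_id)
    token = base64.b64encode((":" + pat).encode()).decode()
    headers = {"Authorization": "Basic " + token}
    while True:
        build = requests.get(url, headers=headers).json()
        if build.get("status") == "completed":
            # result is e.g. 'succeeded', 'partiallySucceeded' or 'failed'
            return build.get("result", "failed")
        time.sleep(poll_seconds)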
All these steps are part of a larger build orchestration workflow, where I'm repeating the given steps multiple times for several different Azure DevOps build pipelines.
The final action in the workflow is sending all the status, result and other relevant data as a build summary to Azure DevOps.
To me, this is only a workaround and I'll leave this question open to see if others have suggestions as well or in case the Azure support engineers can give more insight into the Delay until action.
Here's an image of the final workflow (at least, the part where I implemented the Delay until action):
edit: Turns out, I can simplify the workflow because there's a dedicated Azure DevOps action in the Logic App called Send an HTTP request to Azure DevOps, which omits the need for manual authentication (Azure support engineer pointed this out).
The workflow now looks like this:
That is, I can query the build status directly and set the jarsBuildStatus as
#{body('Send_an_HTTP_request_to_Azure_DevOps:_jar''s')['status']}
The code snippet above is automagically converted to a value for the Set variable action. Thus, no need to use an additional Parse JSON action.

Microsoft.TeamFoundation.Build.WebApi Get Build Status Launched by PR policy

In our pipeline we programmatically create a pull request (PR). The branch being merged into has a policy on it that launches a build. This build takes a variable amount of time. I need to query the build status until it is complete (or long timeout) so that I can complete the PR, and clean up the temp branch.
I am trying to figure out how to get the build that was kicked off by the PR so that I can inspect the status by using Microsoft.TeamFoundation.Build.WebApi, but all overloads of BuildHttpClientBase.GetBuildAsync require a build Id which I don't have. I would like to avoid using the Azure Build REST API. Does anyone know how I might get the Build kicked off by the PR without the build ID using BuildHttpClientBase?
Unfortunately the documentation doesn't offer a lot of detail about functionality.
Answering the question you asked:
Finding a call that provides the single deterministic build id for a pull request doesn't seem to be very readily available.
As mentioned, you can use BuildHttpClient.GetBuildsAsync() to filter builds based on branch, repository, requesting user, and reason.
Adding the BuildReason.PullRequest value in the request is probably redundant according to the branch you will need to pass.
var pr = new GitPullRequest(); // the PR you've received after creation
var requestedFor = pr.CreatedBy.DisplayName;
var repo = pr.Repository.Id.ToString();
var branch = $"refs/pull/{pr.PullRequestId}/merge";
var reason = BuildReason.PullRequest;
var buildClient = c.GetClient<BuildHttpClient>();

var blds = await buildClient.GetBuildsAsync("myProject",
    branchName: branch,
    repositoryId: repo,
    requestedFor: requestedFor,
    reasonFilter: reason,
    repositoryType: "TfsGit");
In your question you mentioned wanting the build (singular) for the pull request, which implies that you only have one build definition acting as the policy gate. This method can return multiple Builds based on the policy configurations on your target branch. However, if that were your setup, it would seem logical that your question would then be asking for all those related builds for which you would wait to complete the PR.
I was looking into Policy Evaluations to see if there was a more straight forward way to get the id of the build being run via policy, but I haven't been able to format the request properly as per:
Evaluations are retrieved using an artifact ID which uniquely identifies the pull request. To generate an artifact ID for a pull request, use this template:
vstfs:///CodeReview/CodeReviewId/{projectId}/{pullRequestId}
Even using the value that is returned in the artifactId field on the PR using the GetById method results in a Doesn't exist or Don't have access response, so if someone else knows how to use this method and if it gives exact build ids being evaluated for the policy configurations, I'd be glad to hear it.
An alternative to get what you actually desire
It sounds like the only use you have for the branch policy is to run a "gate build" before completing the merge.
Why not create the PR with auto-complete enabled?
Name - autoCompleteSetBy
Type - IdentityRef
Description - If set, auto-complete is enabled for this pull request and this is the identity that enabled it.
var me = new IdentityRef(); // you obviously need to populate this with real values
var prClient = connection.GetClient<GitHttpClient>();

await prClient.CreatePullRequestAsync(new GitPullRequest()
    {
        CreatedBy = me,
        AutoCompleteSetBy = me,
        Commits = new GitCommitRef[0],
        SourceRefName = "feature/myFeature",
        TargetRefName = "master",
        Title = "Some good title for my PR"
    },
    "myBestRepository",
    true);

How to output a value from ADF .NET Custom Activity

I have an ADF (Azure Data Factory) that takes a Dataset input from Azure Data Lake Storage; it then has a pipeline with a custom .NET activity. The activity moves files from their Import folder into a custom folder location ready for processing and then deletes the original file.
I want to be able to pass the custom folder location back out into the activities pipeline so that I can give it to the next activity.
Is there a way of outputting the custom folder string to the activities pipeline?
Thank you,
We have an improvement item so that an activity's output can be used as the next custom activity's input. It's pending deployment; you can try it by the end of this month :)
For how to use this feature:
Update the code in Activity1:
Execute(...)
{
    return new Dictionary<string, string> { { "Foo", "Value" } };
}
Update the pipeline JSON for Activity2:
"extendedProperties": { "ValueOfFoo": "$$Foo" }
If you want to use a "custom activity" as the title of the question suggests, this is possible using the activity's execution outputs:
Reference : https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-custom-activity#retrieve-execution-outputs
@activity('MyCustomActivity').output.outputs[0]
You can also consume the output in another activity as described here:
Reference : https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-custom-activity#pass-outputs-to-another-activity
You can send custom values from your code in a Custom Activity back to the service. You can do so by writing them into outputs.json from your application. The service copies the content of outputs.json and appends it into the Activity Output as the value of the customOutput property. (The size limit is 2MB.) If you want to consume the content of outputs.json in downstream activities, you can get the value by using the expression
@activity('<MyCustomActivity>').output.customOutput
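On the application side, a minimal sketch of writing outputs.json from a Python custom activity (the customFolder key and the path value are illustrative, not from the original question):

import json

def main():
    # ... move the files and work out the destination folder ...
    custom_folder = "processed/2021/05/01"  # placeholder value

    # The service copies the content of outputs.json into the activity output
    # under the customOutput property.
    with open("outputs.json", "w") as f:
        json.dump({"customFolder": custom_folder}, f)

if __name__ == "__main__":
    main()

A downstream activity could then read it with an expression along the lines of @activity('MyCustomActivity').output.customOutput.customFolder (activity name assumed).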