How to organize one step after another in Azure Machine Learning Pipelines? - azure-machine-learning-service

I have defined an Azure Machine Learning Pipeline with three steps:
e2e_steps=[etl_model_step, train_model_step, evaluate_model_step]
e2e_pipeline = Pipeline(workspace=ws, steps = e2e_steps)
The idea is to run the Pipeline in the given sequence:
etl_model_step
train_model_step
evaluate_model_step
However, my experiment fails because it tries to execute evaluate_model_step before train_model_step.
How do I enforce the sequence of execution?

azureml.pipeline.core.StepSequence lets you do exactly that.
A StepSequence can be used to easily run steps in a specific order, without needing to specify data dependencies through the use of PipelineData.
See the docs to read more.
However, the preferred way to have steps run in order is to stitch them together via PipelineData or OutputFileDatasetConfig. In your example, does train_model_step depend on outputs from etl_model_step? If so, consider making that data dependency the mechanism that sequences the steps. For more info, see this tutorial.
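For illustration, here is a minimal sketch of both approaches using the SDK v1 API; the script names, datastore, and compute_target are assumptions rather than details from the question:

from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineData, StepSequence
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Option 1: StepSequence enforces the order without any data dependencies.
step_sequence = StepSequence(steps=[etl_model_step, train_model_step, evaluate_model_step])
e2e_pipeline = Pipeline(workspace=ws, steps=step_sequence)

# Option 2 (preferred): wire the steps together through PipelineData so the
# data dependency itself dictates the execution order.
etl_output = PipelineData("etl_output", datastore=ws.get_default_datastore())

etl_model_step = PythonScriptStep(
    name="etl_model_step",
    script_name="etl.py",                  # placeholder script
    arguments=["--output", etl_output],
    outputs=[etl_output],
    compute_target=compute_target,         # assumed to be defined elsewhere
)
train_model_step = PythonScriptStep(
    name="train_model_step",
    script_name="train.py",                # placeholder script
    arguments=["--input", etl_output],
    inputs=[etl_output],
    compute_target=compute_target,
)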

Related

What are the steps within `make_pipeline()`?

What are the steps within sklearn.pipeline.make_pipeline()?
The documentation doesn't explicitly state which steps we "save" when using make_pipeline() instead of doing it the normal way. I was wondering what those steps are, just for my own understanding.
I can guess that one of the steps is applying scaling to all columns in the dataset, like StandardScaler().
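For context, here is a minimal sketch of the two equivalent constructions (the estimators are just examples); make_pipeline() simply auto-generates the step names from the lowercased class names:

from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# "The normal way": you choose the step names yourself.
pipe_explicit = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# make_pipeline(): the same steps, with names generated for you.
pipe_auto = make_pipeline(StandardScaler(), LogisticRegression())
print(pipe_auto.steps)
# [('standardscaler', StandardScaler()), ('logisticregression', LogisticRegression())]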

Is there a generic dag/task can be written in Airflow?

Detailed question: I have a scenario in mind and would like suggestions from experts!
In the attached image, the first set of images shows the current workflow; below it is the expected workflow. I can't combine all the workflows into one DAG because the source data refresh times differ.
Is it possible to create one generic DAG and reuse its individual tasks in the other child DAGs, like templates?
Or, how can I call these generic DAGs from my child DAGs?
I'm looking for your suggestions to make the workflow optimized and easy to maintain whenever the extract and load parts need updating. As of now, it's really complex because I need to touch all four DAGs individually!
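One hedged sketch of the "template" idea, assuming Airflow 2.x: put the shared extract/load logic in a factory function that each child DAG calls, or keep one standalone generic DAG and trigger it with TriggerDagRunOperator. The DAG ids, task ids, and callables below are placeholders:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

def extract(**context):
    ...  # shared extract logic

def load(**context):
    ...  # shared load logic

def add_extract_load(dag):
    # Factory that adds the common extract >> load tasks to any DAG.
    extract_task = PythonOperator(task_id="extract", python_callable=extract, dag=dag)
    load_task = PythonOperator(task_id="load", python_callable=load, dag=dag)
    extract_task >> load_task
    return load_task

with DAG("child_dag_a", start_date=datetime(2024, 1, 1), schedule_interval="@hourly", catchup=False) as dag_a:
    last_task = add_extract_load(dag_a)
    # Alternative: trigger a separate "generic" DAG instead of reusing its tasks.
    trigger_generic = TriggerDagRunOperator(task_id="trigger_generic_dag", trigger_dag_id="generic_extract_load")
    last_task >> trigger_generic

This keeps the extract/load code in one place, so a change there only needs to be made once instead of in all four DAGs.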

Ability to execute tests on data sets in a csv file 'in parallel'

I asked in my previous question whether Karate is capable of executing tests on specific data sets (for instance, based on priority p0, p1) given in a CSV file.
Now my second question is whether Karate can execute tests on specific data sets in a CSV file in parallel.
Example: DataProvider supports data-provider-thread-count. Here's an example of its usage.
I've read the documentation regarding parallel execution in Karate, but I did not find anything on this type of parallel feature. Can you please let me know if this is possible in Karate? Thank you.
Yes, if you use a Scenario Outline, each row will run in parallel. This applies even to the "Dynamic" Scenario Outline, as explained here: https://github.com/intuit/karate#dynamic-scenario-outline
Karate runs each Scenario in parallel, and behind the scenes each Examples row is turned into a Scenario. This is mentioned a few paragraphs further down in the docs: https://intuit.github.io/karate/#parallel-stats

sklearn subset fitted pipeline - reuse for transform

I have constructed a pipeline with several steps which takes some time to fit. For debugging, I would like to be able to inspect subsets of that pipeline (e.g. {pipe step 1-3}.transform(X)).
I know that I can use Pipeline(pipe.steps[:3]) to extract a subset and construct a new pipeline from it. Unfortunately, I have to refit that pipeline before calling transform on it.
Is there a way to avoid the refit?
You can access sub-parts of a Pipeline object by indexing it like a normal list, e.g. pipe[:3]. This returns a new, as-yet-unfitted Pipeline instance. Interestingly, though, its components are fitted.
Consequently, a check with scikit-learn's check_is_fitted function would raise an error. Nonetheless, you can call pipe[:3].transform(X), which will still work as long as you have fit the whole pipeline beforehand.
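A minimal sketch of that behaviour (the step names and data are illustrative):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=3)),
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)

# Slicing returns a new Pipeline wrapping the already-fitted step objects,
# so transform works without refitting.
X_reduced = pipe[:2].transform(X)
print(X_reduced.shape)  # (100, 3)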

TestComplete scripting checkpoints and comparisons

So, with my limited knowledge of TestComplete scripting: it seems as though one should look at the Object Browser to see your windows, and use the UI features via Name Mapping of those objects, clicking, selecting, or populating their fields.
I have a question about how to do assertions using the JavaScript scripting tests. If I want to see whether a certain window looks like a past window, what I have been doing is creating a checkpoint via keyword tests at that point. I feel like I should be doing this through the API, though. Is there an area that explains how to do this via code, rather than using the keyword checkpoints?
Bob, the checkpoints idea is not limited to keyword tests; you can use checkpoints in scripts as well. When recording a script, you just create the needed checkpoint type via the Recording toolbar (I guess you need a Region Checkpoint in your case), and the needed script will be generated for you. From that script, you will see how checkpoints are called in code.
As for the documentation, the "Region Checkpoints" help topic does a good job of explaining the basics and linking to other topics to read, and the "Creating Region Checkpoints" help topic shows the procedure step by step.
I hope this helps. Let me know if there are unclear points.
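If I remember the generated pattern correctly, a region checkpoint called from script looks roughly like the sketch below, shown here with TestComplete's Python scripting (the JavaScript version is essentially the same call). The alias MyApp.wndMain and the stored checkpoint name RegionCheckpoint1 are placeholders:

def CheckMainWindowRegion():
    # Compare the live window against a baseline image stored in the project's
    # Regions collection (created via the Recording toolbar).
    window = Aliases.MyApp.wndMain
    if Regions.RegionCheckpoint1.Check(window):
        Log.Message("Window matches the stored baseline.")
    else:
        Log.Error("Window differs from the stored baseline.")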
