If I go to the Test results screen after a run of my pipeline, each test case from my Java/Maven/TestNG automated test project is shown duplicated. One instance of each test case has a blank machine name, and its duplicate shows a machine name.
Run 1000122 - JUnit_TestResults_3662
There are several possibilities. First, if you added multiple configurations to a test plan, the test cases will be repeated in the plan, once for each configuration you have assigned.
Another possibility is that you passed multiple sets of parameters to the test method, so the test method was executed more than once.
The information you have provided is not sufficient. Could you share the code or screenshots of your test samples?
I have a use case for testing authentication functionality where there are multiple test cases, such as logging in to an app, resetting a forgotten password, and logging in to MFA-enabled applications. I have a set of users that can be used in any of the test cases, but an issue arises when trying to run the tests in multiple browser contexts. I have stored my test data in a JSON file with the username and password of several sample users.
When, say, the test for logging in to an MFA-enabled application runs, all three browser workers are launched simultaneously and all of them try to get user details from the test data file.
But the issue is that all of them pick up the first object, say user A. All three browser tests pass up to the password step, but when the MFA code is entered it creates a race condition: the worker that submits the OTP first passes, while the rest fail because the OTP for that 30-second window has already been redeemed.
I want something that works like a synchronized method in Java: if a worker is using one user, that user should not be made available to another worker; instead, the next worker should be given the next user from the test data.
Please guide me on how to do that in Playwright!
I'm sure there are far more elegant ways of doing this, but one approach is to use the worker index / parallel index feature described in the docs here:
https://playwright.dev/docs/test-parallel#worker-index-and-parallel-index
It looks like parallel index may be a better fit for your use case.
If each of your rows of test data includes the index of the worker it is intended for, then your code can ensure that worker 0 only picks up worker 0's test data and worker 1 picks up worker 1's test data.
Alternatively you could use testConfig.workers to determine how many workers there are and then use the remainder/modulo operator (%) in JS to split the rows up between the different workers:
https://playwright.dev/docs/api/class-testconfig#test-config-workers
So you would compare TestDataRowNumber % testConfig.workers to process.env.TEST_PARALLEL_INDEX to split the file up amongst however many workers you had.
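To make that arithmetic concrete, here is a minimal sketch of the row-selection logic. It is shown in Python only for brevity; in a Playwright Test spec the index would come from process.env.TEST_PARALLEL_INDEX and the worker count from your config, as described in the docs above, and the users.json structure is an assumption.

import json

def rows_for_worker(all_rows, parallel_index, worker_count):
    # each worker keeps only the rows whose position modulo the worker count
    # matches its own parallel index, so no two workers share a user
    return [row for i, row in enumerate(all_rows) if i % worker_count == parallel_index]

# assumed test-data file: a list of {"username": ..., "password": ...} objects
with open("users.json") as f:
    users = json.load(f)

# e.g. worker 1 of 3 only ever sees rows 1, 4, 7, ...
print(rows_for_worker(users, parallel_index=1, worker_count=3))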
I have two variable groups with overlapping keys but different values. I want to use one group under one task [JSON replace] and the other group in another [JSON replace].
I have tried going through the documentation and it says that variables can only be set at the root/stage/job levels. Is there a way I can work around it?
I want to use one group under one task [JSON replace] and the other group in another [JSON replace].
According to the document Specify jobs in your pipeline, we can see that:
You can organize your pipeline into jobs. Every pipeline has at least one job. A job is a series of steps that run sequentially as a unit. In other words, a job is the smallest unit of work that can be scheduled to run.
And the variable group is added as a preselected condition when the pipeline is compiled, so it cannot be re-set at the task level.
To work around this, you can overwrite the specific variable with a logging command:
Write-Host "##vso[task.setvariable variable=testvar;]testvalue"
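The logging command is picked up from the stdout of any script task, so you can place such a step immediately before each [JSON replace] task to re-set the overlapping keys to the values you want for that task. A minimal sketch of the same command emitted from a Python script task; the variable name and value are placeholders:

# placeholder variable name and value; run this step right before the second JSON replace task
print("##vso[task.setvariable variable=testvar]value-for-this-task")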
I want to perform hyperparameter search using AzureML. My models are small (around 1 GB), so I would like to run multiple models on the same GPU/node to save costs, but I do not know how to achieve this.
The way I currently submit jobs is the following (resulting in one training run per GPU/node):
from azureml.core import Experiment, ScriptRunConfig

experiment = Experiment(workspace, experiment_name)
config = ScriptRunConfig(source_directory="./src",
                         script="train.py",
                         compute_target="gpu_cluster",
                         environment="env_name",
                         arguments=["--args args"])
run = experiment.submit(config)
ScriptRunConfig can be provided with a distributed_job_config. I tried to use MpiConfiguration there, but then the run fails with an MPI error that reads as if the cluster were configured to allow only one run per node:
Open RTE detected a bad parameter in hostfile: [...]
The max_slots parameter is less than the slots parameter:
slots = 3
max_slots = 1
[...] ORTE_ERROR_LOG: Bad Parameter in file util/hostfile/hostfile.c at line 407
Using HyperDriveConfig also defaults to submitting one run per GPU, and additionally providing an MpiConfiguration leads to the same error as shown above.
I guess I could always rewrite my training script to train multiple models in parallel, such that each run wraps multiple trainings. I would like to avoid this option, though, because logging and checkpoint writing become increasingly messy and it would require a large refactor of the training pipeline. Also, this functionality seems so basic that I hope there is a way to do it gracefully. Any ideas?
Use the Run.create_children method, which starts child runs that are “local” to the parent run and don’t need authentication.
For AmlCompute, max_concurrent_runs maps to the maximum number of nodes that will be used for a hyperparameter tuning run, so there would be one execution per node.
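For a flavor of the first suggestion, here is a minimal sketch of what train.py could do to fan several trainings out as child runs on the one node, assuming the azureml-core SDK v1 Run.create_children(count=...) call; the parameter list and the train_one_model helper are placeholders:

from azureml.core import Run

def train_one_model(lr):
    # placeholder for the real training code; returns a fake validation loss
    return 0.1 * lr

parent = Run.get_context()                               # the run submitted via ScriptRunConfig
param_sets = [{"lr": 1e-3}, {"lr": 1e-4}, {"lr": 1e-5}]  # placeholder search space

children = parent.create_children(count=len(param_sets))  # child runs live under the parent, on the same node
for child, params in zip(children, param_sets):
    val_loss = train_one_model(**params)
    child.log("lr", params["lr"])
    child.log("val_loss", val_loss)
    child.complete()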
You can deploy a single service but load multiple model versions in init(); the score function then, depending on the request's parameters, uses a particular model version to score.
Or you can do this with the new ML Endpoints (Preview):
What are endpoints (preview) - Azure Machine Learning | Microsoft Docs
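For the first option (a single deployed service that loads several model versions), a minimal sketch of a scoring script; the model file names and the "version" request field are assumptions:

import json
import os
import joblib

models = {}

def init():
    # AZUREML_MODEL_DIR points at the registered model files inside the deployment
    global models
    root = os.getenv("AZUREML_MODEL_DIR", ".")
    models["v1"] = joblib.load(os.path.join(root, "model_v1.pkl"))  # assumed file names
    models["v2"] = joblib.load(os.path.join(root, "model_v2.pkl"))

def run(raw_data):
    payload = json.loads(raw_data)
    model = models.get(payload.get("version", "v1"))  # the request's param picks the version
    return model.predict(payload["data"]).tolist()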
I'm new to azure-ml and have been tasked with writing some integration tests for a couple of pipeline steps. I have prepared some input test data and some expected output data, which I store on a 'test_datastore'. The following example code is a simplified version of what I want to do:
from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config('blabla/config.json')
ds = Datastore.get(ws, datastore_name='test_datastore')

main_ref = DataReference(datastore=ds,
                         data_reference_name='main_ref'
                         )

data_ref = DataReference(datastore=ds,
                         data_reference_name='data_ref',
                         path_on_datastore='/data'
                         )

data_prep_step = PythonScriptStep(
    name='data_prep',
    script_name='pipeline_steps/data_prep.py',
    source_directory='/.',
    arguments=['--main_path', main_ref,
               '--data_ref_folder', data_ref
               ],
    inputs=[main_ref, data_ref],
    outputs=[data_ref],
    runconfig=arbitrary_run_config,
    allow_reuse=False
)
I would like:
my data_prep_step to run,
have it store some data on the path to my data_ref, and
I would then like to access this stored data afterwards outside of the pipeline
But, I can't find a useful function in the documentation. Any guidance would be much appreciated.
two big ideas here -- let's start with the main one.
main ask
With an Azure ML Pipeline, how can I access the output data of a PythonScriptStep outside of the context of the pipeline?
short answer
Consider using OutputFileDatasetConfig (docs example), instead of DataReference.
To your example above, I would just change your last two definitions.
from azureml.data import OutputFileDatasetConfig

data_ref = OutputFileDatasetConfig(
    name='data_ref',
    destination=(ds, '/data')
).as_upload()

data_prep_step = PythonScriptStep(
    name='data_prep',
    script_name='pipeline_steps/data_prep.py',
    source_directory='/.',
    arguments=[
        '--main_path', main_ref,
        '--data_ref_folder', data_ref
    ],
    inputs=[main_ref, data_ref],
    outputs=[data_ref],
    runconfig=arbitrary_run_config,
    allow_reuse=False
)
some notes:
be sure to check out how DataPaths work; they can be tricky at first glance.
set overwrite=False in the .as_upload() method if you don't want future runs to overwrite the first run's data.
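And to the second half of the main ask, once the pipeline has finished you can get at the uploaded data from outside the pipeline. A minimal sketch, assuming the same 'test_datastore' and '/data' destination as above:

from azureml.core import Dataset, Datastore, Workspace

ws = Workspace.from_config()
ds = Datastore.get(ws, datastore_name='test_datastore')

# point a FileDataset at the destination data_ref uploaded to, then pull it down locally
output = Dataset.File.from_files(path=(ds, '/data'))
local_files = output.download(target_path='./data_out', overwrite=True)
print(local_files)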
more context
PipelineData used to be the de facto object for passing data ephemerally between pipeline steps. The idea was to make it easy to:
stitch steps together
get the data after the pipeline runs if need be (datastore/azureml/{run_id}/data_ref)
The downside was that you had no control over where the PipelineData was saved. If you wanted the data to be more than just a baton passed between steps, you could add a DataTransferStep to land the PipelineData wherever you please after the PythonScriptStep finished.
This downside is what motivated OutputFileDatasetConfig.
auxiliary ask
how might I programmatically test the functionality of my Azure ML pipeline?
there are not enough people talking about data pipeline testing, IMHO.
There are three areas of data pipeline testing:
unit testing (does the code in the step work?)
integration testing (does the code work when submitted to the Azure ML service?)
data expectation testing (does the data coming out of the step meet my expectations?)
For #1, I think it should be done outside of the pipeline, perhaps as part of a package of helper functions.
For #2, why not just see if the whole pipeline completes? I think you get more information that way. That's how we run our CI.
#3 is the juiciest, and we do this in our pipelines with the Great Expectations (GE) Python library. The GE community calls these "expectation tests". To me you have two options for including expectation tests in your Azure ML pipeline:
within the PythonScriptStep itself, i.e.
run whatever code you have
test the outputs with GE before writing them out; or,
for each functional PythonScriptStep, hang a downstream PythonScriptStep off of it in which you run your expectations against the output data.
Our team does the first of these, but either strategy should work. What's great about this approach is that you can run your expectation tests just by running your pipeline (which also makes integration testing easy).
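For a flavor of what an expectation test can look like inside a step (the "run your code, then test the outputs with GE before writing them out" flow), here is a minimal sketch using the classic Great Expectations pandas API; the file name and column names are made up:

import great_expectations as ge
import pandas as pd

df = pd.read_parquet("prepared.parquet")   # whatever the step just produced (made-up name)
gdf = ge.from_pandas(df)                   # wrap the frame so expect_* methods are available

gdf.expect_column_values_to_not_be_null("customer_id")
gdf.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)

results = gdf.validate()                   # run all expectations declared above
assert results["success"], f"expectation failures: {results}"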
I have a JMeter test that has two thread groups. The first thread group goes out and gets auth and audit tokens. The second requires the tokens to test the APIs on which I'm interested in gathering performance data. I have Listeners set up as children of the samplers in the second thread group only. Running JMeter, I get the results I want. But when I execute the same test from Jenkins, I get results from both of the thread groups. I don't want the results from the first thread group. They clutter up my graphs, and since there is only one execution of each, they fluctuate enough, performance-wise, to routinely trigger my unstable/failed percentages. Is there a way to get Jenkins to report on only the listeners/samplers I want? Do I have to run one test to get the tokens and another to test? If so, how do I pass the tokens from one test to the other?
You can execute two Jenkins jobs:
The first job writes the tokens to a file using a BeanShell/JSR223 PostProcessor.
The second job reads the tokens from the file using a CSV Data Set Config.