For "reasons", we expect a fraction of the HyperDrive runs to fail whenever we use azureml-sdk's HyperDriveStep, normally around 20%. How can we handle this without failing the entire HyperDriveStep (and then all downstream steps)? Below is an example of the pipeline.
I thought there would be a HyperDriveRunConfig param to allow for this, but it doesn't seem to exist. Perhaps this is controlled on the Pipeline itself with the continue_on_step_failure param?
The workaround we're considering is to catch the failed run within our train.py script and manually log the primary_metric as zero.
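A minimal sketch of that workaround, under stated assumptions: the names `run_trial`, `train_fn`, `log_metric` and the metric name `accuracy` are hypothetical and not taken from our pipeline. In a real train.py the logger would presumably be the `log` method of the run returned by `Run.get_context()`, and the metric name would have to match the primary metric configured for HyperDrive. Logging zero only makes sense as a "worst" value when the metric is being maximized.

```python
from typing import Callable

FALLBACK_METRIC = 0.0  # assumed "worst" value for a metric that is maximized


def run_trial(
    train_fn: Callable[[], float],
    log_metric: Callable[[str, float], None],
    metric_name: str = "accuracy",  # hypothetical; must match the primary metric name
) -> float:
    """Run one training trial and always log the primary metric."""
    try:
        value = train_fn()
    except Exception as exc:  # broad on purpose: any training error counts as a failed trial
        print(f"training failed, logging fallback metric: {exc!r}")
        value = FALLBACK_METRIC
    # Logged on both the success path and the failure path.
    log_metric(metric_name, value)
    return value
```

Because the exception is swallowed, the script exits normally and the child run is not reported as failed; the trade-off is that genuine bugs in train.py are also hidden unless the printed message is checked.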
Thanks for your question.
I'm assuming that HyperDriveStep is one of the steps in your Pipeline and that you want the remaining Pipeline steps to continue when HyperDriveStep fails. Is that correct?
Enabling continue_on_step_failure should allow the rest of the pipeline steps to continue when any single step fails.
Additionally, the HyperDrive run consists of multiple child runs, controlled by the HyperDriveConfig. If the first 3 child runs explored by HyperDrive fail (e.g. with user script errors), the system automatically cancels the entire HyperDrive run, in order to avoid further wasting resources.
Are you looking to continue other Pipeline steps when the HyperDriveStep fails? Or are you looking to continue other child runs within the HyperDrive run when the first 3 child runs fail?
Thanks!
Related
I have this gitlab-ci job and I would like it to just ignore failures and keep going. Do you have a way of doing that? Note that allow_failure: true does not work, because it only ignores the fact that the job has failed; what I want is for the job to keep executing in spite of failing commands in the middle.
palms up, serious look: "We don't do that here"
The pipeline is supposed to work every time, and by design its commands cannot fail. You can however:
change the commands logic and avoid failure
split the commands into different jobs, using when: on_failure to manage the workflow
force the commands to have a clean exit code (e.g. by appending || true after the fallible command)
While debugging I often use the third option after debug statements, or after commands whose behavior I'm not sure of. The final version, however, is supposed to always work.
I have one DAG that has three task streams (licappts, agents, agentpolicy):
For simplicity I'm calling these three distinct streams. The streams are independent in the sense that if agentpolicy fails, the other two (licappts and agents) should not be affected by that stream's failure.
But for the sourceType_emr_task_1 tasks (i.e., licappts_emr_task_1, agents_emr_task_1, and agentpolicy_emr_task_1) I can only run one of these tasks at a time. For example I can't run agents_emr_task_1 and agentpolicy_emr_task_1 at the same time even though they are two independent tasks that don't necessarily care about each other.
How can I achieve this functionality in Airflow? For now the only thing I can think of is to wrap that task in a script that somehow locks a global variable, then if the variable is locked I'll have the script do a Thread.sleep(60 seconds) or something, and then retry. But that seems very hacky and I'm curious if Airflow offers a solution for this.
I'm open to restructuring the ordering of my DAG if needed to achieve this. One thing I thought about doing was to make a hard coded ordering of
Dag Starts -> ... -> licappts_emr_task_1 -> agents_emr_task_1 -> agentpolicy_emr_task_1 -> DAG Finished
But I don't think combining the streams this way is a good idea, because then, for example, agentpolicy_emr_task_1 has to wait for the other two to finish before it can start, and there could be times when agentpolicy_emr_task_1 is ready to go before the other two have finished their other tasks.
So ideally I want whichever sourceType_emr_task_1 task is ready first to start, and then block the other streams from running their sourceType_emr_task_1 task until it has finished.
Update:
Another solution I just thought of: if there is a way for me to check the status of another task, I could create a script for sourceType_emr_task_1 that checks whether either of the other two sourceType_emr_task_1 tasks has a status of running. If one does, it sleeps and periodically checks again until none of the others are running, at which point it starts its own process. I'm not a big fan of this approach, though, because I feel like it could cause a race condition where two tasks read (at the same time) that none are running and both start.
You could use a pool to ensure the parallelism for those tasks is 1.
For each of the *_emr_task_1 tasks, set the pool kwarg to something like pool="emr_task".
Then just go into the webserver -> admin -> pools -> create:
Set the Pool name to match the pool used in your operator, and set Slots to 1.
This will ensure the scheduler will only allow tasks to be queued for that pool up to the number of slots configured, regardless of the parallelism of the rest of Airflow.
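To illustrate what a one-slot pool buys you, here is a plain-Python simulation. It is not Airflow code: a semaphore with one slot stands in for the pool, and three threads stand in for the *_emr_task_1 tasks. The `run_in_pool` and `simulate` helpers and the sleep duration are hypothetical; in the real DAG the only change would be the pool kwarg on each operator.

```python
import threading
import time

POOL_SLOTS = 1  # mirrors the "Slots" value configured for the pool
pool = threading.BoundedSemaphore(POOL_SLOTS)

lock = threading.Lock()
running = 0      # tasks currently holding a slot
max_running = 0  # highest concurrency observed so far
finished = []    # task names in completion order


def run_in_pool(name: str, work_seconds: float = 0.05) -> None:
    """Wait for a free slot, do the work, then release the slot."""
    global running, max_running
    with pool:  # blocks while every slot is taken
        with lock:
            running += 1
            max_running = max(max_running, running)
        time.sleep(work_seconds)  # stand-in for the EMR job
        with lock:
            running -= 1
            finished.append(name)


def simulate(task_names):
    """Start all tasks at once and return the peak concurrency seen."""
    threads = [threading.Thread(target=run_in_pool, args=(n,)) for n in task_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_running


TASKS = ["licappts_emr_task_1", "agents_emr_task_1", "agentpolicy_emr_task_1"]
```

Calling `simulate(TASKS)` starts all three tasks together, yet only one ever holds the slot at a time. Whichever task reaches the slot first runs and the others wait, and because taking the slot is atomic there is no check-then-start race of the kind described in the update.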
I'm running scripts that require a different thread for each user account I pull from a database. So the script starts by running a JDBC processor to get all the accounts and store them (using the "Variable Names" field) in "accounts". Then I run a BeanShell PreProcessor to convert the variable "accounts_#" to a property:
props.put("p_accounts_#",vars.get("accounts_#"));
Then, I have a thread group start. Under "Number of Threads (users)", I have
${__P(p_accounts_#)}
The FIRST time I run this script (after launching JMeter), I only get a SINGLE thread. Every subsequent time I run it, it runs for all accounts.
It seems like for some reason the property is not being saved until the end of the first execution. This is a very big problem, because when JMeter is launched without the UI, it only runs a single thread every time.
Am I setting the property incorrectly? I also tried it with a Beanshell Assertion with the same result.
Just as a test, I created a new test with the bare minimum I needed to reproduce this. Here's the script (images): http://imgur.com/a/WB5J2
It's a Beanshell PreProcessor with "props.put("accounts","12");"
Then a Thread group using "${__P(accounts)}" as the Number of Threads
Then inside that thread group is a Debug Sampler outputting the JMeter properties.
At the end is a View Results Tree.
When I run it the first time, there's only one output: "Thread 1 Running".
When I run it again, there are 12 outputs: "Thread 1 Running", "Thread 2 Running", etc.
I can see that for both Debug Samplers (for the first run and second run), the "accounts" property is set to 12. But the Thread Group needed to execute TWICE before it would work.
Any ideas?
This can be solved by adding another thread group, a 'setUp Thread Group', to hold the setup portion. If you put all of your staging steps into this type of thread group, it will run before any other thread groups. You can keep your PreProcessor there, or move the logic to a BeanShell Sampler if you'd like, and set the property from there.
I have a feature file which has multiple Given, When and Then steps, for example:
// File My.feature
Given doUserLogin
And changeUserPreference
When executeWhen1
And executeWhen2
Then executeThen1
And executeThen2
These are mapped to step definitions correctly. The problem I'm facing is that some steps are getting executed in parallel. For example, in the Given part, 'changeUserPreference' is happening before 'doUserLogin'. Similarly, in the Then part, 'executeThen2' is triggered before 'executeThen1' has fully completed.
How do I specify the dependency between these statements? Is there any way I can say "don't start executing the second statement (Given, When or Then) until the first one has executed completely"?
If your 'doUserLogin' step is returning before the download completes, that would explain why 'changeUserPreference' starts early. This could happen if, say, you connect to an external system and initiate a download, and the API you are using performs the download in another thread: the main thread then continues on to the next step while the download is still running in the other thread.
My advice would be to execute this scenario in debug mode (assuming you are using an IDE that supports this) and see if your 'doUserLogin' step is finishing before the file download.
Is there any way to run an exec task in CruiseControl.NET even if errors have occurred in the previous tasks? I'm after the same functionality we get from a finally block in .NET.
I want to execute a set of tasks which is independent of the success/failure of the previous tasks.
Thanks.
This is usually achieved using the <publishers/> element, which accepts the same Tasks as the <tasks/> element. Publishers are always executed, even if the <tasks/> fail.