Azure AutoML pipeline - Batch scoring fails with "DriverException: Job failed with There is no succeeded mini batch item returned from run()" - azure-machine-learning-service

While running an AutoML parallel run with a TabularDataset, I get the following error:
azureml_common.parallel_run.exception.NoResultToAppendError: There is no succeeded mini batch item returned from run()
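For context, a parallel run step calls the entry script's run(mini_batch) function once per mini batch, and the message above indicates that no call returned a successful result. The following is a minimal sketch of an entry script that returns one result per input row, not a confirmed fix for this case. It assumes the mini batch arrives as a pandas DataFrame (as it does for a TabularDataset); score_row is a hypothetical placeholder for the real model call.

```python
def init():
    """Called once per worker process; load the model here."""
    global model
    model = None  # placeholder: load the real model in an actual script


def score_row(row):
    """Hypothetical stand-in for the real per-row prediction."""
    return len(row)


def run(mini_batch):
    """Return one result per successfully scored row of the mini batch."""
    # For a TabularDataset the mini batch is a pandas DataFrame; convert it to
    # a list of row dicts. Any other iterable of rows is used as-is.
    if hasattr(mini_batch, "to_dict"):
        rows = mini_batch.to_dict("records")
    else:
        rows = list(mini_batch)

    results = []
    for row in rows:
        try:
            results.append(score_row(row))
        except Exception as exc:
            # Log and skip the bad row so one failure does not empty the batch.
            print(f"Failed to score row {row!r}: {exc}")
    return results
```

If run() raises for every mini batch, or always returns an empty result, there is nothing for the step to append, so checking the per-row log output is a reasonable first step.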

Related

How to force PythonScriptStep to run in Azure ML

I'm relatively new to Azure ML and am trying to run a model via PythonScriptStep.
I can publish pipelines and run the model. However, once it has run, I can't resubmit the step, as it states "This run reused the output from a previous run".
My code sets allow_reuse to False, but this doesn't seem to make a difference, and I simply cannot resubmit the step even though the underlying data is changing.
from azureml.pipeline.steps import PythonScriptStep

train_step = PythonScriptStep(
    name='model_train',
    script_name="model_train.py",
    compute_target=aml_compute,
    runconfig=pipeline_run_config,
    source_directory=train_source_dir,
    allow_reuse=False)
Many thanks for your help
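One option that may help here, offered as a sketch rather than a confirmed fix: the azureml-core SDK accepts a regenerate_outputs flag at submission time, which asks the service to rerun every step instead of reusing earlier outputs. The helper name submit_without_reuse is hypothetical, and experiment and pipeline are assumed to be an existing Experiment and Pipeline.

```python
def submit_without_reuse(experiment, pipeline):
    """Submit a pipeline and force all steps to rerun.

    Assumes `experiment` is an azureml.core.Experiment and `pipeline` is an
    azureml.pipeline.core.Pipeline built from steps such as train_step.
    """
    # regenerate_outputs=True disallows reuse of previous step outputs
    # for this submission.
    return experiment.submit(pipeline, regenerate_outputs=True)
```

This applies per submission, so it does not change the allow_reuse setting stored on the step itself.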

Azure Datafactory Pipeline Failed inside a scheduled trigger

I have created two pipelines in Azure Data Factory. We have a custom activity that runs a Python script inside the pipeline. When the pipeline is executed manually, it runs successfully any number of times. However, I have created a scheduled trigger with a 15-minute interval to run the two pipelines. The first execution succeeds, but at the next interval I get the error "Operation on target PyScript failed: Hit unexpected exception and execution failed." We are blocked by this; any input would be really helpful.
The ADF troubleshooting guide states:
Custom Activity:
The following table applies to Azure Batch.
Error code: 2500
Message: Hit unexpected exception and execution failed.
Cause: Can't launch command, or the program returned an error code.
Recommendation: Ensure that the executable file exists. If the program started, make sure stdout.txt and stderr.txt were uploaded to the storage account. It's a good practice to emit copious logs in your code for debugging.
Related helpful doc: Tutorial: Run Python scripts through Azure Data Factory using Azure Batch
Hope this helps.
If you are still blocked, please share failed pipeline run ID & failed activity run ID, for further analysis.
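The logging recommendation above might look like the following in a Python script run by a Custom Activity. This is a minimal sketch: do_work is a hypothetical placeholder for the real job, and it assumes the process's stdout and stderr end up in stdout.txt and stderr.txt as the guide describes.

```python
import logging
import sys
import traceback


def do_work():
    """Hypothetical placeholder for the real pipeline logic."""
    return "done"


def main():
    """Run the job, log progress to stdout, and return a process exit code."""
    # Send log records to stdout so they end up in the captured stdout.txt.
    logging.basicConfig(
        stream=sys.stdout,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    logging.info("Script started with arguments: %s", sys.argv[1:])
    try:
        result = do_work()
        logging.info("Script finished with result: %s", result)
        return 0
    except Exception:
        # Write the full traceback to stderr and signal failure to the caller.
        traceback.print_exc(file=sys.stderr)
        return 1
```

A real script would end by calling sys.exit(main()), so that a failure surfaces as a non-zero exit code and the traceback is available for the failing scheduled run.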

Spark streaming error during job runtime in cluster (yarn resource manager)

I am facing the following error:
I wrote an application based on Spark Streaming (DStream) that pulls messages from PubSub. Unfortunately, I am facing errors during the execution of this job. I am using a cluster of 4 nodes to run the Spark job.
After the job has run for about 10 minutes without any specific error, I get the following error repeatedly:
ERROR org.apache.spark.streaming.CheckpointWriter:
Could not submit checkpoint task to the thread pool executor java.util.concurrent.RejectedExecutionException: Task org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler#68395dc9 rejected
from java.util.concurrent.ThreadPoolExecutor#1a1acc25
[Running, pool size = 1, active threads = 1, queued tasks = 1000, completed tasks = 412]

Tensorboard + Keras + ML Engine

I currently have Google Cloud ML Engine set up to train models created in Keras. When using Keras, it seems ML Engine does not automatically save the logs to a storage bucket. I see the logs on the ML Engine Jobs page, but they do not show up in my storage bucket, and therefore I am unable to run TensorBoard while training.
You can see the job completed successfully and produced logs:
But then there are no logs saved in my storage bucket:
I followed this tutorial when setting up my environment: (http://liufuyang.github.io/2017/04/02/just-another-tensorflow-beginner-guide-4.html)
So, how do I get the logs and run tensorboard when training a Keras model on ML Engine? Has anyone else had success with this?
You will need to create a keras.callbacks.TensorBoard(...) callback in order to write out the logs (see the TensorBoard callback documentation). You can supply a GCS path (gs://path/to/my/logs) as the log_dir argument of the callback and then point TensorBoard to that location. Pass the callback in a list via the callbacks argument when calling model.fit_generator(...) or model.fit(...).
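from keras import callbacks

# Write TensorBoard logs directly to a GCS path.
tb_logs = callbacks.TensorBoard(
    log_dir='gs://path/to/logs',
    histogram_freq=0,
    write_graph=True,
    embeddings_freq=0)
# Pass the callback in a list so Keras invokes it during training.
model.fit_generator(..., callbacks=[tb_logs])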

In Google Cloud Dataproc, where are all the logs stored?

I have a PySpark job that I am distributing across a 1-master, 3-worker cluster.
I have some python print commands which help me debug my code.
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
Now, when I run the code on Google Dataproc with the master set to local, the print output appears correctly. However, when I run it with YARN-based Spark, the print output does not appear in the Google Cloud Console under the Jobs section of the Dataproc UI.
Where can I access the Python print output from each of the workers and the master, which does not appear in the Google Dataproc Console?
If you're using Dataproc, why access the logs via the Spark UI? A better way would be to:
Submit a job using gcloud dataproc jobs submit example
Once the job is submitted, you can access Cloud Dataproc job driver output using the Cloud Platform Console, the gcloud command, or Cloud Storage, as explained below.
The Cloud Platform Console allows you to view a job's realtime driver output. To view job output, go to your project's Cloud Dataproc Jobs section, then click on the Job ID to view job output.
Reference Documentation
If you really want to access the YARN interface (with the detailed list of all the jobs and their logs), you can do the following:
Get the external IP address of your master. You can find it under Cluster Details/VM instances in the UI.
http://img15.hostingpics.net/pics/386611Capturedecran20170403a162303.png
Just click on your master.
Connect to the URL: http://yourMastersExternalIpAddress:8088/cluster
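The last step can be scripted. A small sketch, assuming the YARN ResourceManager listens on port 8088 as in the URL above; the function name and the example IP address are hypothetical.

```python
def yarn_cluster_url(master_external_ip, port=8088):
    """Build the YARN ResourceManager web UI URL for a Dataproc master node."""
    return f"http://{master_external_ip}:{port}/cluster"


# 203.0.113.10 is a documentation-only example address, not a real master.
print(yarn_cluster_url("203.0.113.10"))
```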
