Encountered an internal AutoML error - PipelineRunException: No group keys passed - azure

I created a pipeline with the Python SDK that uses AutoML for data preparation and AutoML training, then deploys the model as an endpoint URL with a scheduler option. It had been working as expected for the past day. Today I tried to create a new pipeline and got this error at the AutoML model creation step:
PipelineRunException: No group keys passed!
I tried a different conda environment and a new compute instance, but the issue persists.
"message": "Encountered an internal AutoML error. Error Message/Code: PipelineRunException. Additional Info: PipelineRunException:\n\tMessage: PipelineRunException: No group keys passed!\n\tInnerException: None\n\tErrorResponse \n{\n "error": {\n "message": "PipelineRunException: No group keys passed!"\n }\n}",
"message_format": "Encountered an internal AutoML error. Error Message/Code: PipelineRunException. Additional Info: {error_details}"
What needs to be done?

The Azure team fixed this bug and released the fix to all regions. It now works as expected.

Related

Strange Internal server error on Synapse pipeline

I received the error below on a Synapse pipeline. I am running the pipeline on a larger cluster size with memory-optimized clusters, and I am only processing 7-8 JSON files of around 90 MB each.
Error
{ "errorCode": "145", "message": "Internal Server Error in Synapse
batch operation: '[plugins.C4T-PRIV-SAW-CAS.IR-Test.19
WorkspaceType: CCID:]
[Monitoring] Livy Endpoint=[
https://hubservice1.westeurope.azuresynapse.net:8001/api/v1.0/publish/c1e53348-b457-4afd-a61d-76553bdd369c
]. Livy Id=[4] Job failed during run time with state=[dead].'.",
"failureType": "SystemError", "target": "DF_Load_CustomsShipment",
"details": [] }
Internal server errors are mostly seen when there is an intermittent or transient issue with a dependent service.
If the issue does not resolve itself, or you keep hitting it consistently, I would recommend raising a support ticket so that a support engineer can investigate.
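For a transient failure like this, retrying with a short backoff is often worth trying before escalating. Here is a minimal sketch in plain Python. `run_with_retry` and its parameters are hypothetical names for illustration, not part of any Synapse SDK, and `operation` stands for whatever call triggers your pipeline run.

```python
import time


def run_with_retry(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on failure wait base_delay * 2**attempt, then retry.

    Hypothetical helper: `operation` is any zero-argument callable.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:  # broad on purpose; narrow it to the error you actually see
            last_error = exc
            # Only wait if another attempt is coming.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed: at this point the issue is probably not transient.
    raise RuntimeError(
        f"still failing after {max_attempts} attempts; consider a support ticket"
    ) from last_error
```

If the call still fails after the last attempt, the original exception is kept as the cause, which is useful to attach to the support ticket.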

Azure : Error 404: AciDeploymentFailed / Error 400 ACI Service request failed

I am trying to deploy a machine learning model through an ACI (Azure Container Instances) service. I am working in Python and followed the code from the official documentation (https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=azcli):
The entry script file is the following (score.py):
import json
import os
import dill
import joblib

def init():
    global model
    # Get the path where the deployed model can be found
    model_path = os.getenv('AZUREML_MODEL_DIR')
    # Load existing model from the deployed model directory
    model = joblib.load(os.path.join(model_path, 'model.pkl'))

# Handle request to the service
def run(data):
    try:
        # Pick out the text property of the JSON request
        # Expected JSON details {"text": "some text to evaluate"}
        data = json.loads(data)
        prediction = model.predict(data['text'])
        return prediction
    except Exception as e:
        error = str(e)
        return error
And the model deployment workflow is as:
from azureml.core import Workspace
# Connect to workspace
ws = Workspace(subscription_id="my-subscription-id",
               resource_group="my-ressource-group-name",
               workspace_name="my-workspace-name")
from azureml.core.model import Model
model = Model.register(workspace=ws,
                       model_path='model.pkl',
                       model_name='my-model',
                       description='my-description')
from azureml.core.environment import Environment
# Name environment and call requirements file
# requirements: numpy, tensorflow
myenv = Environment.from_pip_requirements(name='myenv', file_path='requirements.txt')
from azureml.core.model import InferenceConfig
# Create inference configuration
inference_config = InferenceConfig(environment=myenv, entry_script='score.py')
from azureml.core.webservice import AciWebservice  # AksWebservice
# Set the virtual machine capabilities
deployment_config = AciWebservice.deploy_configuration(cpu_cores=0.5, memory_gb=3)
# Deploy ML model (Azure Container Instances)
service = Model.deploy(workspace=ws,
                       name='my-service-name',
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)
service.wait_for_deployment(show_output=True)
I succeeded once with the code above. I noticed that during the deployment, Model.deploy created a container registry with a specific name (6e07ce2cc4ac4838b42d35cda8d38616).
The problem:
The API was working well and I wanted to deploy another model from scratch. I deleted the API service and model from Azure ML Studio, and the container registry from my Azure resources.
Unfortunately, I am no longer able to deploy anything.
Everything goes fine until the last step (the Model.deploy step), where I get the following error message:
Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 46243f9b-3833-4650-8d47-3ac54a39dc5e
More information can be found here: https://machinelearnin2812599115.blob.core.windows.net/azureml/ImageLogs/46245f8b-3833-4659-8d47-3ac54a39dc5e/build.log?sv=2019-07-07&sr=b&sig=45kgNS4sbSZrQH%2Fp29Rhxzb7qC5Nf1hJ%2BLbRDpXJolk%3D&st=2021-10-25T17%3A20%3A49Z&se=2021-10-27T01%3A24%3A49Z&sp=r
Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 404,
  "message": "No definition exists for Environment with Name: myenv Version: Autosave_2021-10-25T17:24:43Z_b1d066bf Reason: Container registry 6e07ce2cc4ac4838b42d35cda8d38616.azurecr.io not found. If private link is enabled in workspace, please verify ACR is part of private link and retry..",
  "details": []
}
I do not understand why a new container registry was created the first time, whereas now an existing one is looked up (the message says that the container registry named 6e07ce2cc4ac4838b42d35cda8d38616 is missing). I never found how to force the creation of a new container registry resource in Python, or how to specify a name for it in AciWebservice.deploy_configuration or Model.deploy.
Could anyone help me move forward with this? I think the best solution would be to get rid of this 6e07ce2cc4ac4838b42d35cda8d38616 container registry entirely, but I can't find where the reference is set, so Model.deploy always fails to find it.
Another solution would be to force Model.deploy to generate a new container registry, but I could not find out how to do that.
I have been on this for 2 days and I really need your help!
PS: I am not at all a DevOps/MLOps guy. I do data science and build good models, but infrastructure and deployment are not really my thing, so please be gentle on this part! :-)
What I tried
Creating the container registry with the same name
I tried to create the container registry by hand, but this time it is the container that cannot be created. The Python output of Model.deploy is the following:
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-10-25 19:25:10+02:00 Creating Container Registry if not exists.
2021-10-25 19:25:10+02:00 Registering the environment.
2021-10-25 19:25:13+02:00 Building image..
2021-10-25 19:30:45+02:00 Generating deployment configuration.
2021-10-25 19:30:46+02:00 Submitting deployment to compute.
Failed
Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 93780de6-7662-40d8-ab9e-4e1556ef880f
Current sub-operation type not known, more logs unavailable.
Error:
{
  "code": "InaccessibleImage",
  "statusCode": 400,
  "message": "ACI Service request failed. Reason: The image '6e07ce2cc4ac4838b42d35cda8d38616.azurecr.io/azureml/azureml_684133370d8916c87f6230d213976ca5' in container group 'my-service-name-LM4HbqzEBEi0LTXNqNOGFQ' is not accessible. Please check the image and registry credential.. Refer to https://learn.microsoft.com/azure/container-registry/container-registry-authentication#admin-account and make sure Admin user is enabled for your container registry."
}
Setting admin user enabled
I followed the recommendation in the last message and enabled the Admin user for the container registry. All I saw in the Azure interface is that a username and password appeared once the admin user was enabled.
Unfortunately, the same error message appears when I relaunch my code, and I am stuck here...
Changing the name of the environment and model
This does not change anything. Same errors.
Your first attempt worked. After deleting the API service and model from Azure ML Studio, and the container registry from your Azure resources, you were no longer able to redeploy.
My assumption is that your first attempt already registered the model environment, so re-registering under the same name during deployment gives you the error.
Thanks @Anders Swanson, your solution worked for me:
If you have already registered your environment, myenv, and none of its details have changed, there is no need to re-register it with myenv.register(). You can simply get the already registered environment using Environment.get(), like so:
myenv = Environment.get(ws, name='myenv', version=11)
My suggestion is to give your environment a new name, for example "model_scoring_env". Register it once, then pass it to the InferenceConfig.
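A minimal sketch of the "register once, then reuse" idea from the answer above. `get_or_register_environment` and its `env_cls` parameter are hypothetical names; `env_cls` only exists so the logic can be tried without a workspace. The sketch assumes `Environment.get` raises when no environment with that name is registered, which is why it catches a broad `Exception`.

```python
def get_or_register_environment(ws, name, requirements_path, env_cls=None):
    """Return the registered environment `name`, registering it only if missing.

    Hypothetical helper. By default it uses azureml.core's Environment class;
    pass a stand-in class as env_cls to exercise the logic offline.
    """
    if env_cls is None:
        # Imported lazily so the function can be defined without the SDK installed.
        from azureml.core.environment import Environment as env_cls
    try:
        # Reuse the environment that is already registered in the workspace.
        return env_cls.get(ws, name=name)
    except Exception:
        # Assumption: a failed lookup means the name is not registered yet.
        env = env_cls.from_pip_requirements(name=name, file_path=requirements_path)
        return env.register(workspace=ws)
```

With a real workspace this would be called as `myenv = get_or_register_environment(ws, "model_scoring_env", "requirements.txt")` and the result passed to `InferenceConfig(environment=myenv, entry_script='score.py')`.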

Service Fabric FABRIC_E_IMAGEBUILDER_VALIDATION_ERROR: DOWNLOAD PATH SANITIZED Error

I am deploying a Service Fabric application and encountered this error for a resource of type Microsoft.ServiceFabric/clusters/applicationTypes/versions:
Status: Failed
Error:
Code: ClusterChildResourceOperationFailed
Message: Resource operation failed. Operation: CreateOrUpdate. Error details: {
"Details": "FABRIC_E_IMAGEBUILDER_VALIDATION_ERROR: DOWNLOAD PATH SANITIZED"
}
Has anyone run into this issue before? If so, what was the root cause of the error?
When I encountered this error, the application type name in my manifest did not match the application type name that I was deploying to.
You can view far more useful and relevant error messages in these scenarios by going to the Service Fabric Explorer, e.g.
https://{my-service-fabric-clustername.example.com}:19080/Explorer/old.html#
NOTE: The "new" UI does not show these useful error details; you need to select the "View old SFX" interface option.
Clicking on the "Type" that I was uploading the application to then revealed far more descriptive and helpful errors.
In my experience, this is an issue with the version number of the sfpkg not aligning with the version in the template's Microsoft.ServiceFabric/clusters/applicationTypes/versions resource. Try checking the ApplicationTypeVersion in the application package's ApplicationManifest.xml file for the right version.
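To catch this kind of mismatch before deploying, you could read the type name and version straight from ApplicationManifest.xml and compare them with what your template declares. A minimal sketch using only the Python standard library; the function names and the sample values in the usage note are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_application_type(manifest_xml):
    """Return (ApplicationTypeName, ApplicationTypeVersion) from manifest text."""
    root = ET.fromstring(manifest_xml)
    # Both values are plain attributes on the root ApplicationManifest element.
    return root.get("ApplicationTypeName"), root.get("ApplicationTypeVersion")


def find_mismatches(manifest_xml, expected_name, expected_version):
    """Compare the manifest with the name/version the deployment template uses."""
    name, version = read_application_type(manifest_xml)
    problems = []
    if name != expected_name:
        problems.append(f"type name: manifest has {name!r}, template has {expected_name!r}")
    if version != expected_version:
        problems.append(f"type version: manifest has {version!r}, template has {expected_version!r}")
    return problems
```

For example, `find_mismatches(open("ApplicationManifest.xml").read(), "MyAppType", "1.0.0")` returns an empty list when both values line up, and one line per mismatch otherwise.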

Azure Data Factory Integration runtimes will not start

I have an issue where Azure Data Factory Integration runtimes will not start.
When I trigger the pipeline I get the following error in Monitor -> Pipeline runs "InternalServerError executing request"
Image 1
In "view activity run" I can see that it's the Data Flow that failed with the error
{
  "errorCode": "1006",
  "message": "Hit unexpected exception and execution failed.",
  "failureType": "SystemError",
  "target": "data_wrangling_ks",
  "details": []
}
Image 2
(the two successful runs are from a Self-Hosted IR)
When I try to start "Data flow debug", it just disappears without any information.
This issue started earlier today without any changes in Data Factory config or the pipeline.
Please help and thank you for your time.
SOLVED:
I changed the Compute type from General Purpose to Compute Optimized and that solved the problem.
Judging by the error message, this issue seems to have been caused by an ADF-related service outage in the West Europe region. The product team has resolved the issue. Please open an MSDN thread if you ever encounter this issue again.
Ref: Azure Data Factory Pipeline failed while running data flows with error message : Hit unexpected exception and execution failed

Unable to register an ONNX model in azure machine learning service workspace

I was trying to register an ONNX model to Azure Machine Learning service workspace in two different ways, but I am getting errors I couldn't solve.
First method: Via Jupyter Notebook and python Script
model = Model.register(model_path=MODEL_FILENAME,
                       model_name="MyONNXmodel",
                       tags={"onnx": "V0"},
                       description="test",
                       workspace=ws)
The error is: HttpOperationError: Operation returned an invalid status code 'Service invocation failed!Request: GET https://cert-westeurope.experiments.azureml.net/rp/workspaces'
Second method: Via Azure Portal
Can anyone help, please?
Error 413 means the payload is too large. Using the Azure portal, you can only upload a model up to 25 MB in size. Please use the Python SDK to upload models larger than 25 MB.
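If you want to know upfront which route a model file needs, a small size check is enough. A minimal sketch, assuming the 25MB portal limit mentioned above means 25 * 1024 * 1024 bytes (that interpretation is my assumption); `needs_sdk_upload` is a hypothetical helper, not part of the Azure ML SDK.

```python
import os

# Assumption: the portal's 25MB limit is read as 25 MiB.
PORTAL_UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024


def needs_sdk_upload(model_path, limit_bytes=PORTAL_UPLOAD_LIMIT_BYTES):
    """Return True when the file is larger than the portal upload limit."""
    return os.path.getsize(model_path) > limit_bytes
```

When `needs_sdk_upload(MODEL_FILENAME)` is true, the portal upload will be rejected as too large and the model has to be registered through the Python SDK instead.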

Resources