Strange Internal server error on Synapse pipeline - azure

I received the error below on a Synapse pipeline. I am running the pipeline with a larger cluster size on memory-optimized clusters, and I am only processing 7-8 JSON files of around 90 MB each.
Error
{ "errorCode": "145", "message": "Internal Server Error in Synapse
batch operation: '[plugins.C4T-PRIV-SAW-CAS.IR-Test.19
WorkspaceType: CCID:]
[Monitoring] Livy Endpoint=[
https://hubservice1.westeurope.azuresynapse.net:8001/api/v1.0/publish/c1e53348-b457-4afd-a61d-76553bdd369c
]. Livy Id=[4] Job failed during run time with state=[dead].'.",
"failureType": "SystemError", "target": "DF_Load_CustomsShipment",
"details": [] }

Internal server errors are mostly seen when there is an intermittent/transient issue with a dependent service.
If the issue does not resolve on its own, or you still experience it consistently, I would recommend raising a support ticket so a support engineer can investigate.

Related

Encountered an internal AutoML error - PipelineRunException: No group keys passed

I created a pipeline using the Python SDK with AutoML for data preparation, AutoML training, and deployment as an endpoint URL with the scheduler option. It was working as expected until yesterday. Today I tried to create a new pipeline, and at the AutoML model creation step I got the error
PipelineRunException: No group keys passed!
I tried a different conda environment and also a new compute instance, but the issue still persists.
"message": "Encountered an internal AutoML error. Error Message/Code: PipelineRunException. Additional Info: PipelineRunException:\n\tMessage: PipelineRunException: No group keys passed!\n\tInnerException: None\n\tErrorResponse \n{\n "error": {\n "message": "PipelineRunException: No group keys passed!"\n }\n}",
"message_format": "Encountered an internal AutoML error. Error Message/Code: PipelineRunException. Additional Info: {error_details}"
What needs to be done?
The Azure team fixed this bug and released the fix to all regions. It is now working as expected.

Logic Apps "Get file Content" Sharepoint connector timed out for files over 20MB

As the topic implies, my Logic App has a Get File Content action using the SharePoint connector, which was working fine for all files smaller than 20 MB. For files larger than 20 MB it times out after 4 retries with a 500 Internal Server Error.
I couldn't find this kind of size limitation in the documentation.
I tried the chunking option, but it only applies to uploads, not retrieval.
Some other findings:
A 17 MB file succeeded on the second retry; however, files larger than 20 MB always fail, even after 4 retries.
Raw output:
{
  "error": {
    "code": 500,
    "source": "logic-apis-northeurope.azure-apim.net",
    "clientRequestId": "3a0bf64d-2b82-4aef-92ba-ff8b101e44bb",
    "message": "BadGateway",
    "innerError": {
      "status": 500,
      "message": "Request timed out. Try again later.\r\nclientRequestId: 3a0bf64d-2b82-4aef-92ba-ff8b101e44bb\r\nserviceRequestId: e0ce569f-96aa-d08b-1c7e-20a6ccf358c3",
      "source": "https://xxxxx",
      "errors": []
    }
  }
}
P.S. I'm using on-premises SharePoint, i.e. the on-premises data gateway is already in use. However, there are no timeout logs in the gateway, which leads me to believe the issue is not with the gateway but with the Logic App.
We can see in this document that Logic Apps supports getting data from SharePoint through the on-premises data gateway.
If you then click through to the limits on its payload size, you can see the limits that apply.
Although the document above doesn't mention a 20 MB limit for the data response, I think that when you request 17 MB of data you are already at or beyond that limit, which is why the request succeeded only on the second retry rather than the first.

Azure Data Factory Integration runtimes will not start

I have an issue where Azure Data Factory Integration runtimes will not start.
When I trigger the pipeline, I get the following error in Monitor -> Pipeline runs: "InternalServerError executing request"
Image 1
In "view activity run" I can see that it's the Data Flow that failed with the error
{
  "errorCode": "1006",
  "message": "Hit unexpected exception and execution failed.",
  "failureType": "SystemError",
  "target": "data_wrangling_ks",
  "details": []
}
Image 2
(the two successful runs are from a Self-Hosted IR)
When I try to start "Data flow debug", it just disappears without any information.
This issue started earlier today without any changes in Data Factory config or the pipeline.
Please help and thank you for your time.
SOLVED:
I changed the Compute type from General Purpose to Compute Optimized and that solved the problem.
Looking at the error message, it seems this issue occurred due to an ADF-related service outage in the West Europe region. The issue has since been resolved by the product team. Please open an MSDN thread if you encounter this issue again.
Ref: Azure Data Factory Pipeline failed while running data flows with error message: Hit unexpected exception and execution failed
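If you would rather make the same compute-type change from a script instead of the portal UI, a rough sketch with the Az.DataFactory PowerShell module might look like the following. The resource names are placeholders and the data-flow parameters are my assumption about that module, not something confirmed in this thread:
# Assumes the Az.DataFactory module is installed and you are signed in (Connect-AzAccount).
# Creates/updates a managed (Azure) integration runtime whose data flows run on Compute Optimized cores.
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName 'my-rg' `
    -DataFactoryName 'my-adf' `
    -Name 'DataFlowComputeOptimizedIR' `
    -Type Managed `
    -Location 'AutoResolve' `
    -DataFlowComputeType 'ComputeOptimized' `
    -DataFlowCoreCount 8 `
    -DataFlowTimeToLive 10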

Service Fabric Application Package Deployment Operation Timeout exception

I have a Service Fabric cluster with 3 nodes created on 3 systems, and the nodes are interconnected. I am able to connect to each of the nodes. The nodes are created on Windows Server, and these Windows Server VMs are on-premises.
When I manually try to deploy my package to the cluster/one of the nodes, I get an Operation Timeout exception. I used the commands below for the deployment.
Service Fabric PowerShell commands:
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath 'c:\sample\etc' -ApplicationPackagePathInImageStore 'abc.app.portaltype'
After executing the above command, it runs for 2-3 minutes and then throws an Operation Timeout exception. My package is almost 250 MB and contains approximately 15,000 files. I then explicitly passed the extra parameter -TimeoutSec 600 (10 minutes) to the command, after which it executed successfully and the package was copied to the Service Fabric image store, as in the sketch below.
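For reference, a minimal sketch of the copy with the longer timeout (the image store connection string is an assumption; adjust it for your cluster):
# Copy the ~250 MB application package with an explicit 10-minute timeout.
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath 'c:\sample\etc' `
    -ApplicationPackagePathInImageStore 'abc.app.portaltype' `
    -ImageStoreConnectionString 'fabric:ImageStore' `
    -TimeoutSec 600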
Register-ServiceFabricApplicationType -ApplicationPathInImageStore 'abc.app.portaltype'
After the Copy-ServiceFabricApplicationPackage command completed, I executed the Register-ServiceFabricApplicationType command above to register my application type in the cluster, but it also throws an Operation Timeout exception. I then explicitly passed the extra parameter -TimeoutSec 600 (10 minutes) to that command as well, but no luck; it throws the same Operation Timeout exception.
Just to check whether the operation timeout is caused by the number of files in the package, I created a simple, empty Service Fabric ASP.NET Core app, created its package, and tried to deploy it on the same server using the commands above. It deployed within a fraction of a second and works smoothly.
Does anybody have any idea how to overcome this Service Fabric operation timeout issue?
How should the operation timeout be handled when the package contains a large set of files?
Any help/suggestions would be much appreciated.
Thanks,
If this is taking longer than the 10-minute default max, it's probably one of the following issues:
Large application packages (>100s of MB)
Slow network connections
A large number of files within the application package (>1000s).
The following workarounds should help you.
Add the following settings to your cluster config:
"fabricSettings": [
{
"name": "NamingService",
"parameters": [
{
"name": "MaxOperationTimeout",
"value": "3600"
},
]
}
]
Also add:
"fabricSettings": [
{
"name": "EseStore",
"parameters": [
{
"name": "MaxCursors",
"value": "32768"
},
]
}
]
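Since this is an on-premises (standalone) cluster, these fabricSettings would normally be merged into the cluster's ClusterConfig.json and applied with a configuration upgrade. A rough sketch, assuming a standalone cluster and a placeholder config path:
# Connect to the cluster (endpoint is a placeholder for one of your nodes).
Connect-ServiceFabricCluster -ConnectionEndpoint 'localhost:19000'

# Roll out the updated ClusterConfig.json (with the fabricSettings above) as a config-only upgrade.
# Remember to bump clusterConfigurationVersion inside the file before starting the upgrade.
Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath 'C:\SfStandalone\ClusterConfig.json'

# Check progress until the upgrade completes.
Get-ServiceFabricClusterConfigurationUpgradeStatus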
There are a couple of additional features currently rolling out. For these to be present and functional, you need to make sure the client is at least 2.4.28 and the runtime of your cluster is at least 5.4.157. If you're staying up to date, these should already be present in your environment.
For registration you can specify the -Async flag, which handles the provisioning asynchronously, reducing the required timeout to just the time needed to send the command, not to process the application package. You can also query the status of the registration with Get-ServiceFabricApplicationType; a sketch of this pattern follows below. 5.5 fixes some issues with these commands, so if they aren't working for you, you'll have to wait for that release to reach your environment.
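As a rough illustration of that pattern (the cluster endpoint and application type name are placeholders, and the Status property assumes one of the newer SDK versions mentioned above):
# Connect to the cluster (placeholder endpoint).
Connect-ServiceFabricCluster -ConnectionEndpoint 'localhost:19000'

# Kick off provisioning asynchronously; the command returns once the request is accepted,
# instead of waiting for the whole application package to be processed.
Register-ServiceFabricApplicationType -ApplicationPathInImageStore 'abc.app.portaltype' -Async

# Poll the registration until the application type is no longer provisioning.
do {
    Start-Sleep -Seconds 30
    $appType = Get-ServiceFabricApplicationType |
        Where-Object { $_.ApplicationTypeName -eq 'MyAppType' }   # placeholder type name
    Write-Host "Current status: $($appType.Status)"
} while ($appType.Status -eq 'Provisioning')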

Azure ML: Getting Error 503: NoMoreResources to any web service API even when I only make 1 request

Getting the following response even when I make one request (concurrency set to 200) to a web service.
{ status: 503, headers: '{"content-length":"174","content-type":"application/json; charset=utf-8","etag":"\"8ce068bf420a485c8096065ea3e4f436\"","server":"Microsoft-HTTPAPI/2.0","x-ms-request-id":"d5c56cdd-644f-48ba-ba2b-6eb444975e4c","date":"Mon, 15 Feb 2016 04:54:01 GMT","connection":"close"}', body: '{"error":{"code":"ServiceUnavailable","message":"Service is temporarily unavailable.","details":[{"code":"NoMoreResources","message":"No resources available for request."}]}}' }
The request-response web service is a recommender retraining web service whose training set contains close to 200k records. The training set is already present in my ML Studio dataset; only 10-15 extra records are passed in the request. The same experiment was working flawlessly until 13th Feb 2016. I have already tried increasing the concurrency, but the issue remains. I even reduced the training set to 20 records; it still didn't work.
I have two web services that do something similar, and neither has been working since 13th Feb 2016.
Finally, I created a really small experiment (skill.csv --> split row --> web output) which doesn't take any input; it just has to return some part of the dataset. It did not work either, response code 503.
The logs I got are as follows:
{
  "version": "2014-10-01",
  "diagnostics": [{
    .....
    {
      "type": "GetResourceEndEvent",
      "timestamp": 13.1362,
      "resourceId": "5e2d653c2b214e4dad2927210af4a436.865467b9e7c5410e9ebe829abd0050cd.v1-default-111",
      "status": "Failure",
      "error": "The Uri for the target storage location is not specified. Please consider changing the request's location mode."
    },
    {
      "type": "InitializationSummary",
      "time": "2016-02-15T04:46:18.3651714Z",
      "status": "Failure",
      "error": "The Uri for the target storage location is not specified. Please consider changing the request's location mode."
    }
  ]
}
What am I missing? Or am I doing it completely wrong?
Thank you in advance.
PS: Data is stored in MongoDB and then imported as CSV.
This was an Azure problem. I quote the Microsoft engineer:
We believe we have isolated the issue impacting your service and we are currently working on a fix. We will be able to deploy this in the next couple of days. The problem is impacting only the ASIA AzureML region at this time, so if this is an option for you, might I suggest using a workspace in either the US or EU region until the fix gets rolled out here.
To view the complete discussion, click here
