Azure Synapse Spark pool 0 core limit

I want to use a Spark pool in Synapse, but every notebook execution fails with the following error:
InvalidHttpRequestToLivy: Your Spark job requested 24 vcores.
However, the workspace has a 0 core limit.
Try reducing the numbers of vcores requested or increasing your vcore quota.
HTTP status code: 400.
A simple print statement gives me the same error. How can I fix this?

Related

Azure ML Internal Server Error and 404 Error

My Azure ML pipeline run failed with the status message ServiceError: InternalServerError.
I also get a 404 error when viewing executionlogs.txt, stderrlogs.txt, and stdoutlogs.txt.
A pipeline run completed the day before, and no changes were made between these runs.
Compute cluster properties:
VM size: Standard_D3_v2 (4 cores, 14 GB RAM, 200 GB disk)
Processing unit: CPU - General purpose
OS type: Linux
Location: East US
You need to delete and recreate the endpoint; this should work.
Also, for the 404 error, make sure that the URL is correct.
The root cause of the issue:
A time-based retention policy was set at the storage account level. This immutability policy prevented Azure Machine Learning from writing log files to the workspaceblobstore, hence the 404 error when viewing any of the log files, followed by the InternalServerError.
Solution:
Allow protected writes to append blobs
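The "allow protected append writes" switch lives on the container-level immutability policy. A minimal sketch of enabling it with the azure-mgmt-storage Python package (track 2) follows; the resource group, account, and container names are placeholders, and older SDK versions flatten these arguments differently:

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import ImmutabilityPolicy

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Placeholder names; point this at the container backing workspaceblobstore.
# allow_protected_append_writes=True lets Azure ML keep appending to its log blobs
# while the time-based retention policy stays in force.
policy = client.blob_containers.create_or_update_immutability_policy(
    resource_group_name="<resource-group>",
    account_name="<storage-account>",
    container_name="<workspaceblobstore-container>",
    parameters=ImmutabilityPolicy(
        immutability_period_since_creation_in_days=30,  # keep whatever period is already set
        allow_protected_append_writes=True,
    ),
)
print(policy.state)

If an unlocked policy already exists on the container, you may need to pass its current etag via if_match. Once Azure ML can append to the log blobs again, the 404s and the InternalServerError should stop.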

How can I increase the request timeout in the Java SDK v4 for Cosmos or Spring Data for Cosmos v3?

I need to run an aggregate query to calculate the count of records e.g. SELECT r.product_id, r.rating, COUNT(1) FROM product_ratings r GROUP BY r.product_id, r.rating. The query works perfectly fine on the Azure Data Explorer, albeit a little slow. An optimised version of the query takes about 30 seconds when executed on the Data Explorer. However, when I run the same query in my Java app, it appears to be timing out in 5 seconds with the following exception:
com.azure.cosmos.implementation.GoneException: {"innerErrorMessage":"The requested resource is no longer available at the server."}
I believe this is due to a default request timeout of 5 seconds defined in ConnectionPolicy (both Direct and Gateway modes). I can't find a way to override this default. Is there a way to increase the request timeout? Is there another possible reason for this error?
I tried this with both the Java SDK v4 and the Spring Data connector v3, with the same end result, i.e. a GoneException.
You could consider the following recommendations; they should help address the issue:
Try increasing the HTTP connection pool size (the default is 1000; you can increase it to 2000).
If you are using Gateway mode, try Direct mode: more traffic will go over TCP and less over HTTP.
You can refer to the GitHub code on setting the timeout.

Azure ML - AKS Service deployment unable to handle concurrent requests despite auto scaling enabled

I have deployed around 23 models (amounting to 1.57 GB) in an Azure ML workspace using Azure Kubernetes Service. For the AKS cluster, I have used 3 D8sv3 nodes and enabled cluster autoscaling up to 6 nodes.
The AksWebservice is configured with 4.4 cores and 16 GB memory. I have enabled pod autoscaling for the web service, with autoscale_max_replicas set to 40:
aks_config = AksWebservice.deploy_configuration(
    cpu_cores=4.4,
    memory_gb=16,
    autoscale_enabled=True,
    description='TEST - Configuration for Kubernetes Compute Target',
    enable_app_insights=True,
    max_request_wait_time=25000,
    autoscale_target_utilization=0.6,
    autoscale_max_replicas=40,
)
I tried running load tests with 10 concurrent users (using JMeter) and monitored the cluster's Application Insights:
I can see the nodes and pods scaling. However, there is no spike in CPU/memory utilization. For 10 concurrent requests, only 5 to 6 requests pass; the rest fail. When I send an individual request to the deployed endpoint, the response is generated in 7 to 9 seconds. However, in the load test logs there are plenty of requests taking more than 15 seconds to generate a response, and the requests taking more than 25 seconds fail with status code 503. I increased max_request_wait_time for this reason, but I don't understand why responses take so long despite this amount of compute, and the dashboard shows that memory isn't even 30% utilized. Should I be changing the replica_max_concurrent_requests param? Or should I be increasing autoscale_max_replicas even more? The concurrent request load may sometimes reach 100 in production; is there any solution to this?
Will be grateful for any advice. Thanks.
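Not a definitive fix, but it may be worth tuning the per-replica concurrency and autoscale bounds rather than only raw CPU/memory. replica_max_concurrent_requests defaults to 1, so each replica queues everything beyond a single in-flight request, and anything still queued after max_request_wait_time is rejected with a 503. A sketch of an adjusted configuration (the specific numbers are illustrative assumptions, not values taken from this workload):

from azureml.core.webservice import AksWebservice

# Illustrative values only: several smaller replicas, each allowed a couple of
# concurrent requests, usually handle bursty traffic better than a few large replicas.
aks_config = AksWebservice.deploy_configuration(
    cpu_cores=2,                          # smaller replicas, so more of them fit per node
    memory_gb=8,
    autoscale_enabled=True,
    autoscale_min_replicas=3,
    autoscale_max_replicas=40,
    autoscale_target_utilization=70,      # integer percentage out of 100 (not a fraction like 0.6)
    replica_max_concurrent_requests=2,    # requests one replica may serve in parallel
    max_request_wait_time=25000,          # ms a request may wait in the queue before a 503
    scoring_timeout_ms=60000,             # timeout for a single scoring call
    enable_app_insights=True,
    description='TEST - Configuration for Kubernetes Compute Target',
)

With cpu_cores=4.4, roughly one replica fits on an 8-vCPU D8sv3 node, so smaller replicas also let the autoscaler add pods in finer steps instead of waiting for whole new nodes.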

"Operation was cancelled" exception is throwing for long running Azure indexer

We are getting "Operation was cancelled" exception while Azure Indexer is running for larger records (around 2M+). Here are the log details -
"The operation was canceled. Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request. The I/O operation has been aborted because of either a thread exit or an application request "
We are running the indexer on a separate thread. It works for smaller record sets, but for larger ones (1M+) it throws a SocketException.
Has anyone seen this error while running the Azure indexer over larger record sets (i.e. running for a long time)?
(We have already increased the HttpClient timeout to the maximum value on the serviceClient object.)
This could happen because of excess HTTP connections. Try making your **HttpClient** static and see if anything improves. Setting the **HttpClient** timeout to the maximum value is required when executing against the maximum number of records.
You may also want to work on reducing your SQL query time for the best indexer performance. Also, please share your code if possible.
Hope it helps.
Try setting SearchServiceClient.HttpClient.Timeout to Timeout.InfiniteTimeSpan. You have to set the timeout before you send any requests to Azure Cognitive Search.
client.HttpClient.Timeout = Timeout.InfiniteTimeSpan;

Presto query memory limit error

I am running a complex query on Presto 0.148 on HDP 2.3, which errors out with:
Query 20161215_175704_00035_tryh6 failed: Query exceeded local memory limit of 1GB
I am able to run simple queries without issues.
Configuration on coordinator and worker nodes:
http-server.http.port=9080
query.max-memory=50GB
query.max-memory-per-node=4GB
discovery.uri=http://host:9080
Query:
SELECT a.product_id, b.date, location FROM tblproduct a, day b WHERE b.date BETWEEN a.mfg_date AND a.exp_date
I had to restart Presto, and then the configuration was picked up. I see that Presto keeps the query result set in memory if any operation is performed on the result set.
Hence Presto needs a lot of reserved memory, and the default setting of 1 GB is not good enough.
Make sure that you restart Presto after changing the config files; it seems like your configuration files are out of sync with the Presto server.
