Failed to flush task queue within 120 seconds on Automated ML - Azure

I'm running an automated ML (AutoML) experiment on Azure, but it fails, and this is the error the child jobs are giving me:
Starting the automl_batch_driver setup...
Set enable_streaming flag to False
Batch Run Id in the real script: AutoML_11f87010-f7b0-4819-aecb-ad6b31f2469a_worker_11
2022-06-07 17:46:03,026817 - INFO - Beginning batch driver wrapper.
2022-06-07 17:46:03.297 - INFO - Successfully got the cache data store, caching enabled.
2022-06-07 17:46:03.297 - INFO - Took 0.13631367683410645 seconds to retrieve cache data store
2022-06-07 17:46:03.306 - INFO - No files are available for the cache store locally, downloading files from the Run.
2022-06-07 17:48:04.348 - ERROR - Type: {'code': 'ResourceExhausted', 'inner_error': {'code': 'Timeout'}}
Class: AzureMLException
Message: AzureMLException:
Message: Failed to flush task queue within 120 seconds
InnerException None
ErrorResponse
{
  "error": {
    "code": "UserError",
    "message": "Failed to flush task queue within 120 seconds",
    "inner_error": {
      "code": "ResourceExhausted",
      "inner_error": {
        "code": "Timeout"
      }
    }
  }
}
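For context, an experiment like this is typically submitted with the v1 Python SDK roughly as follows. This is a hedged sketch only: the workspace, dataset, compute, task type, and label column are placeholders, not the actual configuration of this run.

# Hedged sketch: typical v1 SDK submission of an automated ML experiment.
# Workspace, dataset, compute, and column names below are placeholders.
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
train_data = Dataset.get_by_name(ws, name="training-data")   # placeholder dataset

automl_config = AutoMLConfig(
    task="classification",                # placeholder task type
    training_data=train_data,
    label_column_name="target",           # placeholder label column
    compute_target="cpu-cluster",         # placeholder compute cluster
    max_concurrent_iterations=4,          # fewer parallel child runs reduces load on the cache store
    experiment_timeout_hours=2,
    enable_early_stopping=True,
)

run = Experiment(ws, "automl-experiment").submit(automl_config)
run.wait_for_completion(show_output=True)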

Related

How can I debug a "Build failed: Too many concurrent builds" error when only one function is being deployed via Google Cloud Functions?

I'm currently trying to deploy a function via the console. I have added variables, package specs, and service account credentials.
When I hit deploy, the status stayed in build with a spinning wheel for about ten minutes before coming back with a build-failed icon.
When I went to the logs, I saw the following:
status: {
  code: 8
  message: "Build failed: Too many concurrent builds, please stagger your deployments."
}
with severity: ERROR under resource.
There are several other cloud functions that are already deployed and active; they were deployed some time ago and are not currently being redeployed.
I have attempted to redeploy the function in question but that resulted in a timeout after 60 seconds.
Full logs below:
{
  protoPayload: {
    #type: "type.googleapis.com/google.cloud.audit.AuditLog"
    status: {
      code: 8
      message: "Build failed: Too many concurrent builds, please stagger your deployments."
    }
    authenticationInfo: {
      principalEmail: "user#user"
    }
    serviceName: "cloudfunctions.googleapis.com"
    methodName: "google.cloud.functions.v1.CloudFunctionsService.CreateFunction"
    resourceName: "projects/resource_name"
  }
  insertId: "-n11hqacqvq"
  resource: {
    type: "cloud_function"
    labels: {3}
  }
  timestamp: "2021-02-18T22:16:56.681559Z"
  severity: "ERROR"
  logName: "projects/.../logs/cloudaudit.googleapis.com%2Factivity"
  operation: {
    id: "operations/..."
    producer: "cloudfunctions.googleapis.com"
    last: true
  }
  receiveTimestamp: "2021-02-18T22:16:56.858611526Z"
}
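One way to stagger the deployment and retry past the concurrent-build limit is to wrap the gcloud call in a backoff loop. This is a hedged sketch, not an official workaround: the function name, region, runtime, and source path are placeholders.

# Hedged sketch: retry "gcloud functions deploy" with increasing delays so builds
# are staggered. Function name, region, runtime, and source path are placeholders.
import subprocess
import time

def deploy_with_backoff(max_attempts=5, base_delay_seconds=60):
    cmd = [
        "gcloud", "functions", "deploy", "my-function",   # placeholder function name
        "--region", "us-central1",
        "--runtime", "python310",
        "--trigger-http",
        "--source", ".",
    ]
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            print(f"Deployed on attempt {attempt}")
            return
        print(f"Attempt {attempt} failed:\n{result.stderr}")
        time.sleep(base_delay_seconds * attempt)   # wait longer each retry to avoid concurrent builds
    raise RuntimeError("Deployment did not succeed after retries")

deploy_with_backoff()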

How to resolve "error sending: timeout expired while executing transaction" in Hyperledger Fabric?

I'm trying to upload bulk data. I'm splitting the records into batches of 100 and invoking the chaincode for each batch. The first 100 transactions execute fine, but after that I'm facing the issue below.
DLT Error { Error: failed to execute transaction 53842934bed9ad4b1f604bdc253f2e06f5677383c0c91cc20f313fa40a85ebf8: error sending: timeout expired while executing transaction
at self._endorserClient.processProposal (/home/user/Project/node_modules/fabric-client/lib/Peer.js:140:36)
at Object.onReceiveStatus (/home/user/Project/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:1207:9)
at InterceptingListener._callNext (/home/user/Project/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:568:42)
at InterceptingListener.onReceiveStatus (/home/user/Project/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:618:8)
at callback (/home/user/Project/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:845:24)
status: 500,
payload: <Buffer >,
peer:
{ url: 'grpcs://ip:7051',
name: 'ip:7051',
options:
{ 'grpc.max_receive_message_length': -1,
'grpc.max_send_message_length': -1,
'grpc.keepalive_time_ms': 120000,
'grpc.http2.min_time_between_pings_ms': 120000,
'grpc.keepalive_timeout_ms': 20000,
'grpc.http2.max_pings_without_data': 0,
'grpc.keepalive_permit_without_calls': 1,
'grpc.ssl_target_name_override': 'peer0.tata.com',
'grpc.default_authority': 'peer0.tata.com' } },
isProposalResponse: true }
I have tried reducing the batch size from 100 to 50 records at a time and have also increased the timeout in the invoke file:
// Unregister the transaction event listener and resolve with TIMEOUT if the
// commit event does not arrive within 70 seconds.
let handle = setTimeout(() => {
    event_hub.unregisterTxEvent(transaction_id_string);
    event_hub.disconnect();
    resolve({ event_status: 'TIMEOUT' });
}, 70000);
But I'm still facing the same issue. Can anybody please help me fix it?
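One generic mitigation is to pace the upload so each batch is fully committed before the next one is sent, instead of firing all proposals at once. The sketch below is language-agnostic pacing logic written in Python; submit_batch is a hypothetical callable wrapping the real chaincode invoke, not part of the fabric-client API.

# Hedged sketch: pace bulk uploads so each batch commits before the next is sent.
# submit_batch is a hypothetical callable that wraps the actual chaincode invoke.
import time

def upload_in_batches(records, submit_batch, batch_size=50, pause_seconds=2):
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        status = submit_batch(batch)      # should block until the transaction is endorsed and committed
        if status != 'VALID':
            raise RuntimeError(f"Batch starting at record {start} failed with status {status}")
        time.sleep(pause_seconds)         # give the peer and orderer time to drain before the next batch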

Prisma: getting "com.prisma.deploy.schema.InvalidProjectId: No service with name 'default' and stage 'default' found" error

I'm getting errors related to name 'default' and stage 'default' when initializing a new Prisma project.
Steps to reproduce:
Follow all the steps from the official guide strictly
Get the com.prisma.deploy.schema.InvalidProjectId: No service with name 'default' and stage 'default' found error when running prisma deploy
Get this error when performing a simple query at http://localhost:4466/graphql:
Query:
query {
  user {
    id
    name
  }
}
Response:
{
  "errors": [
    {
      "message": "Project not found: 'graphql_default'",
      "code": 3016,
      "requestId": "local:cjzs556h5000f0754vc6k36qd"
    }
  ]
}
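For reference, the same failing query can be reproduced outside the playground with a plain HTTP POST. This is a hedged sketch using Python's requests against the default local endpoint; adjust the URL if your server listens elsewhere.

# Hedged sketch: reproduce the failing query with a plain HTTP POST.
# Assumes the Prisma server is listening on the default local endpoint.
import requests

query = """
query {
  user {
    id
    name
  }
}
"""

resp = requests.post("http://localhost:4466/graphql", json={"query": query})
print(resp.status_code)
print(resp.json())   # prints the "Project not found: 'graphql_default'" error shown above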
Versions:
Connector: MongoDB
Prisma Server: 1.34.6
prisma CLI: prisma/1.34.6 (darwin-x64) node-v10.16.3
OS: OS X Mojave - 10.14.6
Logs from Docker:
$ docker logs hello-world_prisma_1
No log level set, defaulting to INFO.
[INFO] Cluster created with settings {hosts=[mongo:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
[INFO] Exception in monitor thread while connecting to server mongo:27017
Exception opening socket
com.mongodb.MongoSocketOpenException: Exception opening socket
at com.mongodb.internal.connection.AsynchronousSocketChannelStream$OpenCompletionHandler.failed(AsynchronousSocketChannelStream.java:272)
at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:128)
at sun.nio.ch.Invoker$2.run(Invoker.java:218)
at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishConnect(UnixAsynchronousSocketChannelImpl.java:252)
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:198)
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)
at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293)
... 1 more
[INFO] Initializing workers...
[INFO] Obtaining exclusive agent lock...
[INFO] Obtaining exclusive agent lock... Successful.
[INFO] Successfully started 1 workers.
[INFO] No server chosen by com.mongodb.async.client.ClientSessionHelper$1#70a6c292 from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=mongo:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused}}]}. Waiting for 30000 ms before timing out
[INFO] Opened connection [connectionId{localValue:2, serverValue:1}] to mongo:27017
[INFO] Monitor thread successfully connected to server with description ServerDescription{address=mongo:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 6, 13]}, minWireVersion=0, maxWireVersion=6, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=16638401}
Server running on :4466
[INFO] Opened connection [connectionId{localValue:3, serverValue:2}] to mongo:27017
[INFO] Deployment worker initialization complete.
[Warning] Management authentication is disabled. Enable it in your Prisma config to secure your server.
{"key":"error/handled","requestId":"local:cjzs54qg500020754mbbzqni9","payload":{"exception":"com.prisma.deploy.schema.InvalidProjectId: No service with name 'default' and stage 'default' found","query":"\n query($name: String! $stage: String!) {\n project(name: $name stage: $stage) {\n name\n stage\n }\n }\n ","variables":"{\"name\":\"default\",\"stage\":\"default\"}","code":"4000","stack_trace":"com.prisma.deploy.schema.SchemaBuilderImpl.$anonfun$projectField$3(SchemaBuilder.scala:144)\\n scala.Option.getOrElse(Option.scala:121)\\n com.prisma.deploy.schema.SchemaBuilderImpl.$anonfun$projectField$2(SchemaBuilder.scala:144)\\n scala.util.Success.$anonfun$map$1(Try.scala:251)\\n scala.util.Success.map(Try.scala:209)\\n scala.concurrent.Future.$anonfun$map$1(Future.scala:288)\\n scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)\\n scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)\\n scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)\\n akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)\\n akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)\\n scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)\\n scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)\\n akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)\\n akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)\\n akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)\\n akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)\\n akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)\\n akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)\\n akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)","message":"No service with name 'default' and stage 'default' found"}}
[Debug] Initializing deployment worker for default_default
[Debug] Scheduling deployment for project default_default
[INFO] Opened connection [connectionId{localValue:4, serverValue:3}] to mongo:27017
[Debug] Applied migration for project default_default
Formatted [Warning]:
{
  "key": "error/handled",
  "requestId": "local:cjzs54qg500020754mbbzqni9",
  "payload": {
    "exception": "com.prisma.deploy.schema.InvalidProjectId: No service with name 'default' and stage 'default' found",
    "query": "\n query($name: String! $stage: String!) {\n project(name: $name stage: $stage) {\n name\n stage\n }\n }\n ",
    "variables": "{\"name\":\"default\",\"stage\":\"default\"}",
    "code": "4000",
    "stack_trace": "com.prisma.deploy.schema.SchemaBuilderImpl.$anonfun$projectField$3(SchemaBuilder.scala:144)\\n scala.Option.getOrElse(Option.scala:121)\\n com.prisma.deploy.schema.SchemaBuilderImpl.$anonfun$projectField$2(SchemaBuilder.scala:144)\\n scala.util.Success.$anonfun$map$1(Try.scala:251)\\n scala.util.Success.map(Try.scala:209)\\n scala.concurrent.Future.$anonfun$map$1(Future.scala:288)\\n scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)\\n scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)\\n scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)\\n akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)\\n akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)\\n scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)\\n scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)\\n akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)\\n akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)\\n akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)\\n akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)\\n akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)\\n akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)\\n akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)",
    "message": "No service with name 'default' and stage 'default' found"
  }
}
Formatted "query":
query($name: String! $stage: String!) {
  project(name: $name stage: $stage) {
    name
    stage
  }
}
Formatted "variables":
{ "name":"default", "stage":"default" }
Formatted stack trace:
com.prisma.deploy.schema.SchemaBuilderImpl.$anonfun$projectField$3(SchemaBuilder.scala:144)
scala.Option.getOrElse(Option.scala:121)
com.prisma.deploy.schema.SchemaBuilderImpl.$anonfun$projectField$2(SchemaBuilder.scala:144)
scala.util.Success.$anonfun$map$1(Try.scala:251)
scala.util.Success.map(Try.scala:209)
scala.concurrent.Future.$anonfun$map$1(Future.scala:288)
scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
P.S.: It was actually running flawlessly a few days ago, but today I can't manage to make it work again!

Azure container fails to configure and is then 'terminated'

I have a Docker container with an ASP.NET (.NET 4.7) web application. The Docker image works perfectly in our local Docker deployment, but it will not start on Azure, and I cannot find any information or diagnostics on why that might be.
From the log stream I get:
31/05/2019 11:05:34.487 INFO - Site: ip-app-develop-1 - Creating container for image: 3tcsoftwaredockerdevelop.azurecr.io/irs-plus-app:latest-develop.
31/05/2019 11:05:34.516 INFO - Site: ip-app-develop-1 - Create container for image: 3tcsoftwaredockerdevelop.azurecr.io/irs-plus-app:latest-develop succeeded. Container Id 1ea16ee9f5f128f14246fefcd936705bb8a655dc6cdbce184fb11970ef7b1cc9
31/05/2019 11:05:40.151 INFO - Site: ip-app-develop-1 - Start container succeeded. Container: 1ea16ee9f5f128f14246fefcd936705bb8a655dc6cdbce184fb11970ef7b1cc9
31/05/2019 11:05:43.745 INFO - Site: ip-app-develop-1 - Application Logging (Filesystem): On
31/05/2019 11:05:44.919 INFO - Site: ip-app-develop-1 - Container ready
31/05/2019 11:05:44.919 INFO - Site: ip-app-develop-1 - Configuring container
31/05/2019 11:05:57.448 ERROR - Site: ip-app-develop-1 - Error configuring container
31/05/2019 11:06:02.455 INFO - Site: ip-app-develop-1 - Container has exited
31/05/2019 11:06:02.456 ERROR - Site: ip-app-develop-1 - Container customization failed
31/05/2019 11:06:02.470 INFO - Site: ip-app-develop-1 - Purging pending logs after stopping container
31/05/2019 11:06:02.456 INFO - Site: ip-app-develop-1 - Attempting to stop container: 1ea16ee9f5f128f14246fefcd936705bb8a655dc6cdbce184fb11970ef7b1cc9
31/05/2019 11:06:02.470 INFO - Site: ip-app-develop-1 - Container stopped successfully. Container Id: 1ea16ee9f5f128f14246fefcd936705bb8a655dc6cdbce184fb11970ef7b1cc9
31/05/2019 11:06:02.484 INFO - Site: ip-app-develop-1 - Purging after container failed to start
After several restart attempts (manual or as a result of re-configuration) I will simply get:
2019-05-31T10:33:46 The application was terminated.
The application then refuses to even attempt to start regardless of whether I use the az cli or the portal.
My current logging configuration is:
{
  "applicationLogs": {
    "azureBlobStorage": {
      "level": "Off",
      "retentionInDays": null,
      "sasUrl": null
    },
    "azureTableStorage": {
      "level": "Off",
      "sasUrl": null
    },
    "fileSystem": {
      "level": "Verbose"
    }
  },
  "detailedErrorMessages": {
    "enabled": true
  },
  "failedRequestsTracing": {
    "enabled": false
  },
  "httpLogs": {
    "azureBlobStorage": {
      "enabled": false,
      "retentionInDays": 2,
      "sasUrl": null
    },
    "fileSystem": {
      "enabled": true,
      "retentionInDays": 2,
      "retentionInMb": 35
    }
  },
  "id": "/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Web/sites/XXX/config/logs",
  "kind": null,
  "location": "North Europe",
  "name": "logs",
  "resourceGroup": "XXX",
  "type": "Microsoft.Web/sites/config"
}
Further info on the app:
- deployed using a docker container
- docker base image mcr.microsoft.com/dotnet/framework/aspnet:4.7.2
- image entrypoint c:\ServiceMonitor.exe w3svc
- app developed in ASP.NET 4.7
- using IIS as a web server
Questions:
How can I get some diagnostics on what is going on to enable me to determine why the app is not starting?
Why does the app refuse to even attempt to restart after a few failed attempts?
We have had the same issue. Eventually we saw that App Service mounts a directory containing a "specially cooked" version of ServiceMonitor.exe; this version reads the events from the App Service back end. If you change your Docker image to use this version of ServiceMonitor, it will work. We created a small PowerShell script and changed the entrypoint from this:
#WORKDIR /LogMonitor
SHELL ["C:\\LogMonitor\\LogMonitor.exe", "powershell.exe"]
# Start IIS Remote Management and monitor IIS
ENTRYPOINT Start-Service WMSVC; C:/ServiceMonitor.exe w3svc;
to this:
ENTRYPOINT ["C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe","-File","C:\\start-iis-environment.ps1"]
and we created the PowerShell script like this:
# Use the App Service-mounted ServiceMonitor when the platform provides it,
# otherwise fall back to the image's own copy (e.g. when running locally).
if (Test-Path -Path 'C:\AppService\Util\ServiceMonitor.exe' -PathType Leaf) {
    & C:\AppService\Util\ServiceMonitor.exe w3svc
}
else {
    & C:\ServiceMonitor.exe w3svc
}

Spark Jobserver - stress test - Async POST error response: akka.pattern.AskTimeoutException

Hi, I am trying to do a stress test on the Spark Jobserver, and I am sharing the Spark context with the following properties among the submitted jobs.
spark.executor.cores='2'
spark.cores.max='1'
spark.driver.cores='1'
spark.driver.memory='1g'
spark.executor.memory='1g'
spark.executor.instances='2'
spark.scheduler.mode='FAIR'
spark.scheduler.pool='fair_pool'
spark.scheduler.allocation.file='/spark-jobserver/scheduler.xml'
When I post 10 jobs within 100 ms using JMeter, only 4 to 5 jobs give a success response and the others give the following error:
{
  "status": "ERROR",
  "result": {
    "message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/admin-context#-1409264293]] after [10000 ms]. Sender[null] sent message of type \"spark.jobserver.JobManagerActor$StartJob\".",
    "errorClass": "akka.pattern.AskTimeoutException",
    "stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)", "akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)", "scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)", "scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)", "akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)", "akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)", "akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)", "java.lang.Thread.run(Thread.java:748)"]
  }
}
Please note that I am expecting an asynchronous success response, no matter how long the response time turns out to be.
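For reference, each JMeter request corresponds to an asynchronous job submission against the Jobserver REST API, roughly like the sketch below. This is a hedged example using Python's requests; the host, appName, classPath, and input payload are placeholders for the real deployment.

# Hedged sketch: submit async jobs to Spark Jobserver with a small gap between
# requests. Host, appName, classPath, and the input payload are placeholders.
import time
import requests

JOBSERVER = "http://localhost:8090"   # placeholder Jobserver host

def submit_job(i):
    params = {
        "appName": "my-app",                  # placeholder uploaded job binary
        "classPath": "com.example.MyJob",     # placeholder job class
        "context": "admin-context",           # the shared context from the question
        "sync": "false",                      # asynchronous submission; poll the job id for the result
    }
    resp = requests.post(f"{JOBSERVER}/jobs", params=params, data=f"input.param = {i}")
    return resp.status_code, resp.json()

for i in range(10):
    print(submit_job(i))
    time.sleep(0.1)   # ~100 ms spacing, matching the JMeter test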
