Hi, I am trying to run a stress test against Spark Job Server, and I am sharing a single Spark context among the submitted jobs, configured with the following properties:
spark.executor.cores='2'
spark.cores.max='1'
spark.driver.cores='1'
spark.driver.memory='1g'
spark.executor.memory='1g'
spark.executor.instances='2'
spark.scheduler.mode='FAIR'
spark.scheduler.pool='fair_pool'
spark.scheduler.allocation.file='/spark-jobserver/scheduler.xml'
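For reference, the scheduler.xml follows the standard Spark fair-scheduler allocation format; a minimal version of such a file looks like the sketch below (the weight and minShare values are illustrative, not necessarily my exact file):

<?xml version="1.0"?>
<allocations>
  <pool name="fair_pool">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>1</minShare>
  </pool>
</allocations>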
When I post 10 jobs within 100 ms using JMeter, only 4 to 5 jobs return a success response; the others fail with the following error:
{
  "status": "ERROR",
  "result": {
    "message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/admin-context#-1409264293]] after [10000 ms]. Sender[null] sent message of type \"spark.jobserver.JobManagerActor$StartJob\".",
    "errorClass": "akka.pattern.AskTimeoutException",
    "stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)", "akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)", "scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)", "scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)", "akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)", "akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)", "akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)", "java.lang.Thread.run(Thread.java:748)"]
  }
}
Please note that I am expecting an asynchronous success response, regardless of how long the job itself takes to complete.
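For context, each JMeter request is the equivalent of an asynchronous submission against the spark-jobserver REST API, roughly like the curl below; appName and classPath are placeholders rather than my real values.

curl -d "" "http://localhost:8090/jobs?appName=my-app&classPath=com.example.MyJob&context=admin-context&sync=false"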
Related
I'm running an Automated ML job on Azure, but it fails, and this is the error the child jobs are giving me:
Starting the automl_batch_driver setup...
Set enable_streaming flag to False
Batch Run Id in the real script: AutoML_11f87010-f7b0-4819-aecb-ad6b31f2469a_worker_11
2022-06-07 17:46:03,026817 - INFO - Beginning batch driver wrapper.
2022-06-07 17:46:03.297 - INFO - Successfully got the cache data store, caching enabled.
2022-06-07 17:46:03.297 - INFO - Took 0.13631367683410645 seconds to retrieve cache data store
2022-06-07 17:46:03.306 - INFO - No files are available for the cache store locally, downloading files from the Run.
2022-06-07 17:48:04.348 - ERROR - Type: {'code': 'ResourceExhausted', 'inner_error': {'code': 'Timeout'}}
Class: AzureMLException
Message: AzureMLException:
Message: Failed to flush task queue within 120 seconds
InnerException None
ErrorResponse
{
  "error": {
    "code": "UserError",
    "message": "Failed to flush task queue within 120 seconds",
    "inner_error": {
      "code": "ResourceExhausted",
      "inner_error": {
        "code": "Timeout"
      }
    }
  }
}
I'm currently trying to deploy a function via the console. I have added variables, package specs, and service account credentials.
When I hit deploy, the status sat in build with the spinning wheel for about ten minutes before coming back with a build-failed icon.
When I went to the logs, I saw the following:
status: {
  code: 8
  message: "Build failed: Too many concurrent builds, please stagger your deployments."
}
with severity: ERROR under resource.
There are several other cloud functions that are already deployed and active; they were deployed some time ago and are not currently being redeployed.
I have attempted to redeploy the function in question but that resulted in a timeout after 60 seconds.
Full logs below:
{
  protoPayload: {
    @type: "type.googleapis.com/google.cloud.audit.AuditLog"
    status: {
      code: 8
      message: "Build failed: Too many concurrent builds, please stagger your deployments."
    }
    authenticationInfo: {
      principalEmail: "user@user"
    }
    serviceName: "cloudfunctions.googleapis.com"
    methodName: "google.cloud.functions.v1.CloudFunctionsService.CreateFunction"
    resourceName: "projects/resource_name"
  }
  insertId: "-n11hqacqvq"
  resource: {
    type: "cloud_function"
    labels: {3}
  }
  timestamp: "2021-02-18T22:16:56.681559Z"
  severity: "ERROR"
  logName: "projects/.../logs/cloudaudit.googleapis.com%2Factivity"
  operation: {
    id: "operations/..."
    producer: "cloudfunctions.googleapis.com"
    last: true
  }
  receiveTimestamp: "2021-02-18T22:16:56.858611526Z"
}
I'm trying to upload bulk data. I'm splitting the records into batches of 100 and invoking a transaction for each batch. The first 100 transactions execute fine, but after that I'm facing the issue below.
DLT Error { Error: failed to execute transaction 53842934bed9ad4b1f604bdc253f2e06f5677383c0c91cc20f313fa40a85ebf8: error sending: timeout expired while executing transaction
    at self._endorserClient.processProposal (/home/user/Project/node_modules/fabric-client/lib/Peer.js:140:36)
    at Object.onReceiveStatus (/home/user/Project/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:1207:9)
    at InterceptingListener._callNext (/home/user/Project/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/home/user/Project/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:618:8)
    at callback (/home/user/Project/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:845:24)
  status: 500,
  payload: <Buffer >,
  peer:
   { url: 'grpcs://ip:7051',
     name: 'ip:7051',
     options:
      { 'grpc.max_receive_message_length': -1,
        'grpc.max_send_message_length': -1,
        'grpc.keepalive_time_ms': 120000,
        'grpc.http2.min_time_between_pings_ms': 120000,
        'grpc.keepalive_timeout_ms': 20000,
        'grpc.http2.max_pings_without_data': 0,
        'grpc.keepalive_permit_without_calls': 1,
        'grpc.ssl_target_name_override': 'peer0.tata.com',
        'grpc.default_authority': 'peer0.tata.com' } },
  isProposalResponse: true }
I have tried reducing the batch size from 100 to 50 records at a time, and I also increased the timeout in the invoke file:
// Give up waiting for the transaction commit event after 70 s
let handle = setTimeout(() => {
  event_hub.unregisterTxEvent(transaction_id_string);
  event_hub.disconnect();
  resolve({ event_status: 'TIMEOUT' });
}, 70000);
But I am still facing the same issue. Can anybody please help me fix this?
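One thing I am unsure about: the setTimeout above only bounds the event-hub wait, while the error itself is thrown from the endorser's processProposal call in Peer.js. If I understand the fabric-client docs correctly, that call is bounded by the SDK-wide request-timeout setting (45 seconds by default), so I am considering something like the sketch below; the 120000 ms value is just an example.

// Sketch, assuming fabric-client's static setConfigSetting works as documented:
// raise the request timeout that applies to endorsement proposals.
const Client = require('fabric-client');
Client.setConfigSetting('request-timeout', 120000); // ms, default is 45000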
I am trying to build a YouTube entertainment app using Google Assistant, following this tutorial: here. I have followed every step precisely, copying code verbatim, but when I run the test, I get this error:
MalformedResponse Failed to parse Dialogflow response into AppResponse because of invalid platform response: Could not find a RichResponse or SystemIntent in the platform response for agentId: ~~ and intentId: ~~. WebhookStatus: code: 2 message: "Webhook call failed. Error: UNKNOWN." ..
I'm not really well versed in DialogFlow, so I'm not sure what's happening. If anyone has any advice, I'd really appreciate it!
Edit: So, here's what happens that triggers the error. I follow the tutorial all the way to the end. I run the test and type in their test request 'rahman'. The response I get back from the test is the above error. I'm not sure what other details I can add, but if there's anything else I can provide, please let me know!
Edit 2: Following the next comment I received, I opened Cloud Functions in the GCP console and found that a new function called dialogflowFirebaseFulfillment had been created. I checked the logs for the 'youtube' function I made and found this notification:
{
  insertId: "..."
  labels: {
    execution_id: ""
  }
  logName: "projects/<name of project>/logs/cloudfunctions.googleapis.com%2Fcloud-functions"
  receiveTimestamp: "<time>"
  resource: {
    labels: {…}
    type: "cloud_function"
  }
  severity: "ERROR"
  textPayload: "Warning, estimating Firebase Config based on GCLOUD_PROJECT. Intializing firebase-admin may fail"
  timestamp: "<time>"
}
I then checked the new function that had been created without my knowledge and saw that it failed to deploy, with the error: "Function failed on loading user code. Error message: Node.js module defined by file index.js is expected to export function named dialogflowFirebaseFulfillment". I checked the logs and found this:
{
  insertId: "<id>"
  logName: "projects/<project name>/logs/cloudaudit.googleapis.com%2Factivity"
  operation: {
    id: "operations/<id>"
    last: true
    producer: "cloudfunctions.googleapis.com"
  }
  protoPayload: {
    @type: "type.googleapis.com/google.cloud.audit.AuditLog"
    authenticationInfo: {
      principalEmail: "<email>"
    }
    methodName: "google.cloud.functions.v1.CloudFunctionsService.UpdateFunction"
    requestMetadata: {
      destinationAttributes: {
      }
      requestAttributes: {
      }
    }
    resourceName: "projects/<project name>/locations/us-central1/functions/dialogflowFirebaseFulfillment"
    serviceName: "cloudfunctions.googleapis.com"
    status: {
      code: 3
      message: "INVALID_ARGUMENT"
    }
  }
  receiveTimestamp: "<time>"
  resource: {
    labels: {…}
    type: "cloud_function"
  }
  severity: "ERROR"
  timestamp: "<time>"
}
I know this isn't a good sign, but I also don't know how to interpret it or where to go to fix the error. Any ideas would be appreciated, thanks!
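For reference, my understanding of the "expected to export function named dialogflowFirebaseFulfillment" error is that index.js must export a function under exactly that name. A minimal sketch of that shape, based on the standard actions-on-google + firebase-functions fulfillment pattern rather than the tutorial's exact code:

// Minimal sketch; intent handlers omitted.
const functions = require('firebase-functions');
const { dialogflow } = require('actions-on-google');

const app = dialogflow({ debug: true });
// ... register intent handlers on `app` here ...

// The deployed function must be exported under this exact name:
exports.dialogflowFirebaseFulfillment = functions.https.onRequest(app);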
Sorry for the long title. I'm having issues that randomly pop up (every handful of hours, but not on a regular schedule; it could be anywhere from 3 to 8 hours) when streaming data from Cloud Pub/Sub into Cloud Datastore using Cloud Functions.
The source is a Node.js 6 script that receives an HTTP POST with the info, writes it to a Pub/Sub topic, and then writes the messages from that topic into Cloud Datastore.
It is a modified version of this:
https://github.com/CiscoSE/serverless-cmx
Errors:
This first one happens sometimes with TCP Write instead of Read, but it's the same error.
ERROR: { Error: 14 UNAVAILABLE: TCP Read failed
    at Object.exports.createStatusError (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/common.js:87:15)
    at Object.onReceiveStatus (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/client_interceptors.js:1188:28)
    at InterceptingListener._callNext (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/client_interceptors.js:564:42)
    at InterceptingListener.onReceiveStatus (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/client_interceptors.js:614:8)
    at callback (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/client_interceptors.js:841:24)
  code: 14,
  metadata: Metadata { _internal_repr: {} },
  details: 'TCP Read failed' }
And:
ERROR: { Error: 13 INTERNAL: GOAWAY received
    at Object.exports.createStatusError (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/common.js:87:15)
    at Object.onReceiveStatus (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/client_interceptors.js:1188:28)
    at InterceptingListener._callNext (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/client_interceptors.js:564:42)
    at InterceptingListener.onReceiveStatus (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/client_interceptors.js:614:8)
    at callback (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/client_interceptors.js:841:24)
  code: 13,
  metadata: Metadata { _internal_repr: {} },
  details: 'GOAWAY received' }
It looks like similar errors show up for other services, and the workaround there is just to retry.
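Based on that, the sketch below is the kind of retry wrapper I am planning to put around the Datastore write; datastore.save() is the call my modified script uses, and the attempt count and backoff values are illustrative assumptions, not anything from serverless-cmx itself.

// Retry transient gRPC failures (13 INTERNAL / 14 UNAVAILABLE) around a
// Datastore write, with a simple linear backoff. Node.js 6 compatible.
function saveWithRetry(datastore, entity, attempts) {
  attempts = attempts || 3;
  return datastore.save(entity).catch(function (err) {
    if ((err.code === 13 || err.code === 14) && attempts > 1) {
      var delayMs = (4 - attempts) * 1000; // 1 s, then 2 s
      return new Promise(function (resolve) { setTimeout(resolve, delayMs); })
        .then(function () { return saveWithRetry(datastore, entity, attempts - 1); });
    }
    throw err; // non-transient, or out of attempts
  });
}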