Conditional Retry of Failed Jobs using Bull in Node.js

I know we can retry failed jobs in Bull using the backoff option, like below:
var Queue = require('bull');
var myQueue = new Queue('myQueue');
myQueue
  .add(data, {
    jobId: uid,
    attempts: 10,
    backoff: {
      type: 'exponential',
      delay: 2000,
    },
    stackTraceLimit: 100,
  })
  .then((result) => {})
  .catch(console.log);
But I want to retry only when the failure reason is worth retrying (for example timeouts or rate limits) and not when the error is related to user input and the like.
How can I check the failure error message and decide whether to retry?

You can mark the job as completed instead: do not throw for errors that should not be retried, but return the error data as the job's result. The completion handler can then treat the result differently if it shows any sign of an error.
async function jobProcess(job) {
if (doNotRetryError) {
return doNotRetryError
} else if (anyOtherError) {
throw new Error('retry')
} else {
return {success: true}
}
}
async function jobCompleted(job, result) {
if (result instanceof Error) {
// some error happened, but job shouldn't be retried
} else {
// job completed
}
}

Keep your backoff config as it is. If a job is not worth retrying, set job.opts.attempts to job.attemptsMade + 1 or smaller inside the processor before rethrowing the error.
The changed opts will not be saved, but the job will be moved to the failed list immediately.
async process(job: Job): Promise<void> {
  try {
    await doStuff(job);
  } catch (error) {
    if (!isWorthRetrying(error)) {
      // Move job to failed list instantly by setting attempts to attemptsMade + 1 or smaller
      job.opts.attempts = job.attemptsMade + 1;
    }
    throw error;
  }
}
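The snippet above calls isWorthRetrying without defining it. A minimal sketch of such a classifier follows; the error codes and HTTP statuses it treats as retryable are illustrative assumptions, not values defined by Bull, and should be adapted to whatever errors your job actually produces.

```javascript
// Hypothetical classifier: the codes and statuses below are illustrative
// assumptions, not something Bull defines.
const RETRYABLE_CODES = new Set(['ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED']);
const RETRYABLE_STATUSES = new Set([408, 429, 502, 503, 504]);

function isWorthRetrying(error) {
  if (!error) return false;
  // Node.js network errors expose a string `code` such as 'ETIMEDOUT'.
  if (RETRYABLE_CODES.has(error.code)) return true;
  // Where an HTTP client stores the status depends on the client, so two
  // common locations are checked here.
  const status = error.status ?? error.response?.status;
  return RETRYABLE_STATUSES.has(status);
}
```

Anything the classifier does not recognise (for example a validation error caused by user input) is treated as not worth retrying.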

This answer is for BullMQ, so it is only useful if you're willing to upgrade.
BullMQ now has an UnrecoverableError you can throw, and it will take care of not retrying the job.
https://docs.bullmq.io/guide/retrying-failing-jobs#stop-retrying-jobs

Bull has a job.discard() function that should allow you to mark a job as non-retriable from inside the processor.
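As a rough sketch of how job.discard() could be used, the wrapper below discards the job before rethrowing when the failure is of a kind that retrying cannot fix. ValidationError and makeProcessor are hypothetical names introduced for illustration; only job.discard() comes from Bull.

```javascript
// Hypothetical error type used to flag failures that retrying cannot fix.
class ValidationError extends Error {}

// Wraps the real work so that non-retryable failures discard the job first.
function makeProcessor(doStuff) {
  return async function processor(job) {
    try {
      return await doStuff(job);
    } catch (error) {
      if (error instanceof ValidationError) {
        // Ask Bull not to schedule further attempts for this job.
        await job.discard();
      }
      // Rethrow so the job is still recorded as failed.
      throw error;
    }
  };
}

// Usage with a Bull queue (not executed here; runQuery is hypothetical):
// myQueue.process(makeProcessor(runQuery));
```

Retryable errors pass through untouched, so the attempts and backoff settings keep applying to them.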

Related

Solace via NodeJS Not Waiting for Success

I'm trying to get Solace (a queuing system) to create a session, then send a message on that session. Instead, it listens to my session creation, receives all the event handlers (I registered all of them), fails to create that session and fails to tell me why. I cannot get this to WAIT for completion. I suspect if it had a few more microseconds, the session would be completed. The promises I have are not being kept. Any awaits that I put in are dutifully ignored.
The Typescript code below is attempting to make a connection to Solace to put a message on a queue. At a high level, it works by getting an instance of the Solace module, then it creates a Session, then with that session, it sends the message. Session creation returns an actual Session and not a promise. That doesn't mean it actually works though. Instead, you have to register an event handler. Because I don't see any of the console.log()s, I believe the createSession event handlers are not being run. Despite registering an event handler for every error in the session handler, Solace neither made the connection, nor said why. As far as I can tell, there's no concept of getting the current state of the session either.
Please note, in previous attempts, I was getting a WaitingForDNS error on the send. It also runs relatively quickly, so I don't think it's doing very much. When I turned on tracing, the most I could tell is that eventually Solace decided to resolve the IP address.
Please see my wishes annotated below:
export class TopicPublisher {
public async connect() {
// Return me a Promise for the Session; either the Session is fully
// loaded, or it's rejected
return new Promise<Session>((resolve, reject) => {
if (this.session !== null) {
this.log("Already connected and ready to publish");
reject();
}
try {
this.session = this.solace.SolclientFactory.createSession({
// solace.SessionProperties
url: this.hosturl,
vpnName: this.vpn,
userName: this.username,
password: this.pass,
connectRetries: 1,
});
} catch (error: any) {
this.log('Error on creating session: ' + error.toString());
reject(error);
}
//The UP_NOTICE dictates whether the session has been established
this.session.on(solace.SessionEventCode.UP_NOTICE, () => {
// *** At this point, return the session as a successfully completing promise ***
this.log("=== Successfully connected and ready to subscribe. ===");
resolve(this.session);
});
//The CONNECT_FAILED_ERROR implies a connection failure
this.session.on(solace.SessionEventCode.CONNECT_FAILED_ERROR, (sessionEvent: { infoStr: string; }) => {
this.log("Connection failed to the message router: " + sessionEvent.infoStr + " - check correct parameter values and connectivity!");
reject(`Check the settings in game-config.ts and try again!`);
});
// Register every event handler in vain attempt at getting Solace to tell me
// why it does not work
let otherErrors = [
solace.SessionEventCode.DOWN_ERROR,
solace.SessionEventCode.REJECTED_MESSAGE_ERROR,
solace.SessionEventCode.SUBSCRIPTION_ERROR,
solace.SessionEventCode.SUBSCRIPTION_OK,
solace.SessionEventCode.VIRTUALROUTER_NAME_CHANGED,
solace.SessionEventCode.REQUEST_ABORTED,
solace.SessionEventCode.REQUEST_TIMEOUT,
solace.SessionEventCode.PROPERTY_UPDATE_OK,
solace.SessionEventCode.PROPERTY_UPDATE_ERROR,
solace.SessionEventCode.CAN_ACCEPT_DATA,
solace.SessionEventCode.RECONNECTING_NOTICE,
solace.SessionEventCode.RECONNECTED_NOTICE,
solace.SessionEventCode.REPUBLISHING_UNACKED_MESSAGES,
solace.SessionEventCode.ACKNOWLEDGED_MESSAGE,
solace.SessionEventCode.UNSUBSCRIBE_TE_TOPIC_OK,
solace.SessionEventCode.UNSUBSCRIBE_TE_TOPIC_ERROR,
solace.SessionEventCode.MESSAGE,
solace.SessionEventCode.GUARANTEED_MESSAGE_PUBLISHER_DOWN
];
for (let errorCodeIndex = 0; errorCodeIndex < otherErrors.length; errorCodeIndex++) {
this.log('Registering error handler code: '+otherErrors[errorCodeIndex]);
this.session.on(otherErrors[errorCodeIndex], (sessionEvent: { infoStr: string; }) => {
this.log("Connection failed with error code : " + otherErrors[errorCodeIndex] + " " + sessionEvent.infoStr);
reject(`Check the config settings`);
});
}
//DISCONNECTED implies the client was disconnected
this.session.on(solace.SessionEventCode.DISCONNECTED, (sessionEvent: any) => {
this.log("Disconnected.");
if (this.session !== null) {
this.session.dispose();
//this.subscribed = false;
this.session = null;
}
});
try {
this.session.connect();
} catch (error: any) {
reject();
}
});
};
public async publish(topicName: string, payload: any) {
// This builds a message payload, it works fine
let solaceMessage = this.getSolaceMessage(topicName, payload);
try {
// *** It does *not* wait for the connection ***
console.log('##This point is reached');
let localSession = await this.connect();
// UP_EVENT ***SHOULD*** have happened, but it does not wait for any events
// or promises to be completed.
console.log('##This point is reached');
console.log('localSession =' + localSession);
localSession.send(solaceMessage);
} catch (error) {
}
};
}
let topicPublisher: TopicPublisher = new TopicPublisher(getInitializedSolaceModule(),
argumentParser.hosturl,
argumentParser.usernamevpn,
argumentParser.username,
argumentParser.vpn,
argumentParser.pass,
argumentParser.topicName);
topicPublisher.publish(argumentParser.topicName, readMessageFromFile(argumentParser.messageFileSpecification)).then(() => {
console.log('##This point is reached');
}, () => {
console.log('##BP10.5 Error handler on publish');
}
).catch(error => {
console.log('publish error' + error);
});
console.log('##This point is reached');
topicPublisher.disconnect();
console.log('##This point is reached');
Solace API documentation is at https://docs.solace.com/API-Developer-Online-Ref-Documentation/nodejs/index.html, but I'm not sure this is a Solace error.
I don't have great exposure to TypeScript, but is it possible that the check 'this.session !== null' ends up rejecting the promise? If this.session was never initialized it holds undefined, and undefined !== null evaluates to true, so that branch would call reject() before any connection is attempted. Maybe your log output sequence can shed light on this.
My apologies if this is a silly point that offers no direct help.
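For illustration, here is a minimal sketch of the pattern the question is aiming for: wrap the event-based connection in a promise and await the publish before doing any cleanup. FakeSession is a stand-in built on Node's EventEmitter, and its event names 'up' and 'connectFailed' are hypothetical; a real Solace session would use solace.SessionEventCode values instead.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a Solace session: connect() emits 'up' or 'connectFailed'
// asynchronously. The class and its event names are hypothetical.
class FakeSession extends EventEmitter {
  constructor(shouldFail) {
    super();
    this.shouldFail = shouldFail;
    this.sent = [];
  }
  connect() {
    setTimeout(() => {
      if (this.shouldFail) {
        this.emit('connectFailed', { infoStr: 'host unreachable' });
      } else {
        this.emit('up');
      }
    }, 10);
  }
  send(message) {
    this.sent.push(message);
  }
}

// Resolves with the session once it is up, rejects if the connection fails.
function connect(session) {
  return new Promise((resolve, reject) => {
    session.once('up', () => resolve(session));
    session.once('connectFailed', (event) => reject(new Error(event.infoStr)));
    session.connect();
  });
}

async function publish(session, message) {
  const ready = await connect(session); // waits for 'up' or throws
  ready.send(message);
}

async function main() {
  const session = new FakeSession(false);
  // Without this await, any cleanup placed after the call would run
  // before the connection has been established.
  await publish(session, 'hello');
  return session.sent;
}
```

Two things differ from the code in the question: the caller awaits publish before moving on, and a failed connection surfaces as a thrown error rather than disappearing into an empty catch block.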

GetDone in ibmmq (Node.js) doesn't stop listener of the queue

I'm using ibmmq module https://github.com/ibm-messaging/mq-mqi-nodejs
I need to get a message by CorrelId and then stop listening to the queue.
async listen(queue: string, messageId?: string, waitInterval?: number) {
let mqmd = new mq.MQMD()
let gmo = new mq.MQGMO()
gmo.Options = this.MQC.MQGMO_NO_SYNCPOINT | this.MQC.MQGMO_WAIT | this.MQC.MQGMO_CONVERT | this.MQC.MQGMO_NO_PROPERTIES | this.MQC.MQGMO_FAIL_IF_QUIESCING
gmo.MatchOptions = this.MQC.MQMO_MATCH_CORREL_ID
mqmd.CorrelId = this.hexToBytes(messageId)
gmo.WaitInterval = this.MQC.MQWI_UNLIMITED
mq.Get(obj as mq.MQObject, mqmd, gmo, getCB)
}
And the getCB function:
getCB(err: mq.MQError, hObj: mq.MQObject, gmo: mq.MQGMO, mqmd: mq.MQMD, buf: Buffer, hConn: mq.MQQueueManager) {
if (err) {
...
} else {
...
console.log('GetDone:', hObj)
mq.GetDone(hObj, err => {
console.log('GetDoneError:', err)
})
}
}
I start listening to the queue. Then I put a message with the CorrelId there. The listener gets it, and I see 'GetDone' in the terminal.
Then I put another message with the same CorrelId. I get that message too, along with an error:
GetDoneError: MQError: GetDone: MQCC = MQCC_FAILED [2] MQRC = MQRC_HOBJ_ERROR [2019]
at Object.exports.GetDone (/home/apps/connector/node_modules/ibmmq/lib/mqi.js:2316:11)
at MqiConnector.getCB (/home/apps/connector/src/wmq-mqi-connector.js:206:20)
at /home/apps/connector/node_modules/ibmmq/lib/mqi.js:2263:14
at Object.<anonymous> (/home/apps/connector/node_modules/ffi-napi/lib/_foreign_function.js:115:9) {
mqcc: 2,
mqccstr: 'MQCC_FAILED',
mqrc: 2019,
mqrcstr: 'MQRC_HOBJ_ERROR',
version: '1.0.0',
verb: 'GetDone'
}
Looks like the loop with the function getCB didn't stop after GetDone.
I get messages with this CorrelId as many times as I send them. And every time I see this error. The listener is still running.
What am I doing wrong?
I suspect that you are calling GetDone twice and the second time hObj is invalid in the call to mq.GetDone.
mq.GetDone(hObj, err => {
console.log('GetDoneError:', err)
})
I think you have fallen foul of Node.js's asynchronous nature and hit a timing issue, i.e. the cleanup following GetDone has not completed by the time the next message is being retrieved.
The function GetDone seems to be synchronous and can be found in https://github.com/ibm-messaging/mq-mqi-nodejs/blob/3a99e0bbbeb017cc5e8498a59c32967cbd2b27fe/lib/mqi.js
The error appears to come from this snippet in GetDone -
var userContext = getUserContext(jsObject);
var err;
if (!userContext) {
err = new MQError(MQC.MQCC_FAILED,MQC.MQRC_HOBJ_ERROR,"GetDone");
} else {
deleteUserContext(jsObject);
}
First time through userContext is found and then deleted. Second time round userContext doesn't exist and the error is thrown.
The Sample in the repo - https://github.com/ibm-messaging/mq-mqi-nodejs/blob/72fba926b7010a85ce2a2c6459d2e9c58fa066d7/samples/amqsgeta.js
only calls GetDone in an error condition, ie. when there are either no messages on the queue or there has been a problem getting the next message off the queue.
function getCB(err, hObj, gmo,md,buf, hConn ) {
// If there is an error, prepare to exit by setting the ok flag to false.
if (err) {
if (err.mqrc == MQC.MQRC_NO_MSG_AVAILABLE) {
console.log("No more messages available.");
} else {
console.log(formatErr(err));
exitCode = 1;
}
ok = false;
// We don't need any more messages delivered, so cause the
// callback to be deleted after this one has completed.
mq.GetDone(hObj);
} else {
if (md.Format=="MQSTR") {
console.log("message <%s>", decoder.write(buf));
} else {
console.log("binary message: " + buf);
}
}
}
Whereas you are calling it when you have retrieved a message. You may need to create a guard that stops you calling it twice.
As for why the second message has been obtained, without an error, you might need to raise an issue on the ibmmq module.
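One possible shape for such a guard is sketched below. makeSingleShotCallback is a hypothetical helper, and getDone is passed in as a parameter standing in for mq.GetDone so that the sketch runs without the ibmmq module; it only shows the "call it at most once" idea and does not explain why a second message is delivered.

```javascript
// Builds a Get callback that calls getDone at most once and ignores any
// callback that arrives afterwards. `getDone` stands in for mq.GetDone.
function makeSingleShotCallback(getDone, onResult) {
  let finished = false;
  return function getCB(err, hObj, gmo, md, buf) {
    if (finished) {
      // A late callback arrived after delivery was stopped; ignore it.
      return;
    }
    finished = true;
    getDone(hObj);
    if (err) {
      onResult(err, null);
    } else {
      onResult(null, buf);
    }
  };
}

// Usage with ibmmq (not executed here; handleMessage is hypothetical):
// mq.Get(hObj, mqmd, gmo, makeSingleShotCallback(mq.GetDone, handleMessage));
```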

Waiting in a while loop on an async function (Node.js/ES6)

I'm writing a Windows Node.js server app (using ES6 btw).
The first thing I want to do - in the top-level code - is sit in a while loop, calling an async function which searches for a particular registry key/value. This function is 'proven' - it returns the value data if found, or else throws:
async GetRegValue(): Promise<string> { ... }
I need to sit in a while loop until the registry item exists, and then grab the value data. (With a delay between retries).
I think I know how to wait for an async call to complete (one way or the other) before progressing with the rest of the start-up, but I can't figure out how to sit in a loop waiting for it to succeed.
Any advice please on how to achieve this?
(I'm fairly new to typescript, and still struggling to get my head round all async/await scenarios!)
Thanks
EDIT
Thanks guys. I know I was 'vague' about my code - I didn't want to put my real/pseudo code attempts, since they have all probably overlooked the points you can hopefully help me understand.
So I just kept it as a textual description... I'll try though:
async GetRegValue(): Promise<string> {
const val: RegistryItem = await this.GetKeyValue(this.KEY_SW, this.VAL_CONN);
return val.value
}
private async GetKeyValue(key: string, name: string): Promise<RegistryItem> {
return await new Promise((resolve, reject) => {
new this.Registry({
hive: this.Hive, key
}).get(name, (err, items) => {
if (err) {
reject(new Error('Registry get failed'));
}
else {
resolve( items );
}
});
})
.catch(err => { throw err });
}
So I want to do something like:
let keyObtained = false
let val
while (keyObtained == false)
{
  // Call GetRegValue until val returned, in which case break from loop
  // If exception then pause (e.g. ~100ms), then loop again
}
// Don't execute here till while loop has exited
// Then use 'val' for the subsequent statements
As I say, GetRegValue() works fine in other places I use it, but here I'm trying to pause further execution (and retry) until it does come back with a value
You can probably just use recursion. Here is an example of how you can keep calling the GetRegValue function until it resolves, using the retryReg function below.
If the catch case is hit, it will just call GetRegValue over and over until it resolves successfully.
You should add a counter in the catch() so that you give up after a set number of attempts.
Keep in mind I mocked the whole GetRegValue function, but given what you stated this would still work for you.
let test = 0;
function GetRegValue() {
return new Promise((resolve, reject) => {
setTimeout(function() {
test++;
if (test === 4) {
return resolve({
reg: "reg value"
});
}
reject({
msg: "not ready"
});
}, 1000);
});
}
function retryReg() {
GetRegValue()
.then(registryObj => {
console.log(`got registry obj: ${JSON.stringify(registryObj)}`)
})
.catch(fail => {
console.log(`registry object is not ready: ${JSON.stringify(fail)}`);
retryReg();
});
}
retryReg();
I don't see why you need this line:
.catch(err => { throw err });
The loop condition of while isn't much use in this case, as you don't really need a state variable or expression to determine if the loop should continue:
let val;
while (true)
{
try {
val = await GetRegValue(/* args */);
break;
} catch (x) {
console.log(x); // or something better
}
await delay(100);
}
If the assignment to val succeeds, we make it to the break; statement and so we leave the loop successfully. Otherwise we jump to the catch block and log the error, wait 100 ms and try again.
It might be better to use a for loop and so set a sensible limit on how many times to retry.
Note that delay is available in an npm package of the same name. It's roughly the same as:
await new Promise(res => setTimeout(res, 100));
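The for-loop variant with a retry limit mentioned above could look roughly like this. The retry helper and its option names (attempts, waitMs) are assumptions made for the sketch; fn would be something like () => GetRegValue().

```javascript
// Resolves after `ms` milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn until it succeeds or the attempt limit is reached.
// `attempts` and `waitMs` are hypothetical option names.
async function retry(fn, { attempts = 5, waitMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Pause before the next attempt, but not after the final one.
      if (attempt < attempts) {
        await delay(waitMs);
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}

// Usage (not executed here):
// const val = await retry(() => GetRegValue(), { attempts: 50, waitMs: 100 });
```

Code after the awaited call only runs once a value has been obtained, and a permanently missing key ends in an exception instead of an endless loop.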

Firebase Nodejs : DEADLINE_EXCEEDED: Deadline exceeded on set() [duplicate]

I took one of the sample functions from the Firestore documentation and was able to successfully run it from my local firebase environment. However, once I deployed to my firebase server, the function completes, but no entries are made in the firestore database. The firebase function logs show "Deadline Exceeded." I'm a bit baffled. Anyone know why this is happening and how to resolve this?
Here is the sample function:
exports.testingFunction = functions.https.onRequest((request, response) => {
var data = {
name: 'Los Angeles',
state: 'CA',
country: 'USA'
};
// Add a new document in collection "cities" with ID 'DC'
var db = admin.firestore();
var setDoc = db.collection('cities').doc('LA').set(data);
response.status(200).send();
});
Firestore has limits, and “Deadline Exceeded” is probably caused by hitting one of them.
See https://firebase.google.com/docs/firestore/quotas
Maximum write rate to a document: 1 per second
https://groups.google.com/forum/#!msg/google-cloud-firestore-discuss/tGaZpTWQ7tQ/NdaDGRAzBgAJ
In my own experience, this problem can also happen when you try to write documents using a bad internet connection.
I use a solution similar to Jurgen's suggestion, inserting documents in batches of fewer than 500 at once, and this error appears if I'm using a not so stable wifi connection. When I plug in the cable, the same script with the same data runs without errors.
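A minimal sketch of splitting writes into groups of fewer than 500 and committing them one after another is shown below. chunk and writeInBatches are hypothetical helpers, and commitBatch is a caller-supplied callback (for example, one that fills and commits a Firestore batch), so the sketch runs without firebase-admin.

```javascript
// Splits an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Writes the groups strictly one after another. `commitBatch` is a
// hypothetical callback that persists one group and returns a promise.
async function writeInBatches(docs, commitBatch, size = 400) {
  for (const group of chunk(docs, size)) {
    await commitBatch(group);
  }
}
```

The default of 400 is an arbitrary value below the limit of 500 mentioned above.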
I have written this little script, which uses batch writes (max 500) and only writes one batch after the other.
Use it by first creating a batch worker: let batch: any = new FbBatchWorker(db);
Then add anything to the worker: batch.set(ref.doc(docId), MyObject);. Finish via batch.commit().
The API is the same as for the normal Firestore batch (https://firebase.google.com/docs/firestore/manage-data/transactions#batched-writes). However, it currently supports only set and delete.
import { firestore } from "firebase-admin";
class FBWorker {
callback: Function;
constructor(callback: Function) {
this.callback = callback;
}
work(data: {
type: "SET" | "DELETE";
ref: FirebaseFirestore.DocumentReference;
data?: any;
options?: FirebaseFirestore.SetOptions;
}) {
if (data.type === "SET") {
// tslint:disable-next-line: no-floating-promises
data.ref.set(data.data, data.options).then(() => {
this.callback();
});
} else if (data.type === "DELETE") {
// tslint:disable-next-line: no-floating-promises
data.ref.delete().then(() => {
this.callback();
});
} else {
this.callback();
}
}
}
export class FbBatchWorker {
db: firestore.Firestore;
batchList2: {
type: "SET" | "DELETE";
ref: FirebaseFirestore.DocumentReference;
data?: any;
options?: FirebaseFirestore.SetOptions;
}[] = [];
elemCount: number = 0;
private _maxBatchSize: number = 490;
public get maxBatchSize(): number {
return this._maxBatchSize;
}
public set maxBatchSize(size: number) {
if (size < 1) {
throw new Error("Size must be positive");
}
if (size > 490) {
throw new Error("Size must not be larger than 490");
}
this._maxBatchSize = size;
}
constructor(db: firestore.Firestore) {
this.db = db;
}
async commit(): Promise<any> {
const workerProms: Promise<any>[] = [];
const maxWorker = this.batchList2.length > this.maxBatchSize ? this.maxBatchSize : this.batchList2.length;
for (let w = 0; w < maxWorker; w++) {
workerProms.push(
new Promise((resolve) => {
const A = new FBWorker(() => {
if (this.batchList2.length > 0) {
A.work(this.batchList2.pop());
} else {
resolve();
}
});
// tslint:disable-next-line: no-floating-promises
A.work(this.batchList2.pop());
}),
);
}
return Promise.all(workerProms);
}
set(dbref: FirebaseFirestore.DocumentReference, data: any, options?: FirebaseFirestore.SetOptions): void {
this.batchList2.push({
type: "SET",
ref: dbref,
data,
options,
});
}
delete(dbref: FirebaseFirestore.DocumentReference) {
this.batchList2.push({
type: "DELETE",
ref: dbref,
});
}
}
I tested this by having 15 concurrent AWS Lambda functions write 10,000 requests into the database, into different collections / documents, milliseconds apart. I did not get the DEADLINE_EXCEEDED error.
Please see the documentation on firebase.
'deadline-exceeded': Deadline expired before operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire.
In our case we are writing a small amount of data and it works most of the time, but losing data is unacceptable. I have not concluded why Firestore fails to write simple, small bits of data.
SOLUTION:
I am using an AWS Lambda function that uses an SQS event trigger.
# This function receives requests from the queue and handles them
# by persisting the survey answers for the respective users.
QuizAnswerQueueReceiver:
  handler: app/lambdas/quizAnswerQueueReceiver.handler
  timeout: 180 # The SQS visibility timeout should always be greater than the Lambda function’s timeout.
  reservedConcurrency: 1 # optional, reserved concurrency limit for this function. By default, AWS uses account concurrency limit
  events:
    - sqs:
        batchSize: 10 # Wait for 10 messages before processing.
        maximumBatchingWindow: 60 # The maximum amount of time in seconds to gather records before invoking the function
        arn:
          Fn::GetAtt:
            - SurveyAnswerReceiverQueue
            - Arn
  environment:
    NODE_ENV: ${self:custom.myStage}
I am using a dead letter queue connected to my main queue for failed events.
Resources:
  QuizAnswerReceiverQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: ${self:provider.environment.QUIZ_ANSWER_RECEIVER_QUEUE}
      # VisibilityTimeout MUST be greater than the lambda functions timeout https://lumigo.io/blog/sqs-and-lambda-the-missing-guide-on-failure-modes/
      # The length of time during which a message will be unavailable after a message is delivered from the queue.
      # This blocks other components from receiving the same message and gives the initial component time to process and delete the message from the queue.
      VisibilityTimeout: 900 # The SQS visibility timeout should always be greater than the Lambda function’s timeout.
      # The number of seconds that Amazon SQS retains a message. You can specify an integer value from 60 seconds (1 minute) to 1,209,600 seconds (14 days).
      MessageRetentionPeriod: 345600 # The number of seconds that Amazon SQS retains a message.
      RedrivePolicy:
        deadLetterTargetArn:
          "Fn::GetAtt":
            - QuizAnswerReceiverQueueDLQ
            - Arn
        maxReceiveCount: 5 # The number of times a message is delivered to the source queue before being moved to the dead-letter queue.
  QuizAnswerReceiverQueueDLQ:
    Type: "AWS::SQS::Queue"
    Properties:
      QueueName: "${self:provider.environment.QUIZ_ANSWER_RECEIVER_QUEUE}DLQ"
      MessageRetentionPeriod: 1209600 # 14 days in seconds
If the error is generated after around 10 seconds, it is probably not your internet connection; it might be that your function is not returning a promise. In my experience I got the error simply because I had wrapped a Firebase set operation (which already returns a promise) inside another promise.
You can do this:
return db.collection("COL_NAME").doc("DOC_NAME").set(attribs).then(ref => {
var SuccessResponse = {
"code": "200"
}
var resp = JSON.stringify(SuccessResponse);
return resp;
}).catch(err => {
console.log('Quiz Error OCCURED ', err);
var FailureResponse = {
"code": "400",
}
var resp = JSON.stringify(FailureResponse);
return resp;
});
instead of this, where resolve and reject are never called, so the outer promise never settles:
return new Promise((resolve,reject)=>{
db.collection("COL_NAME").doc("DOC_NAME").set(attribs).then(ref => {
var SuccessResponse = {
"code": "200"
}
var resp = JSON.stringify(SuccessResponse);
return resp;
}).catch(err => {
console.log('Quiz Error OCCURED ', err);
var FailureResponse = {
"code": "400",
}
var resp = JSON.stringify(FailureResponse);
return resp;
});
});
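The difference between the two versions can be checked with a small sketch. fakeSet is a hypothetical stand-in for a Firestore set() call, and settlesWithin is a helper invented here to observe whether a promise settles in a given time.

```javascript
// Hypothetical stand-in for a Firestore write that takes about 10 ms.
function fakeSet(data) {
  return new Promise((resolve) => setTimeout(() => resolve({ ok: true }), 10));
}

// Returns the promise chain directly, so callers see it settle.
function saveDirect(data) {
  return fakeSet(data)
    .then(() => JSON.stringify({ code: '200' }))
    .catch(() => JSON.stringify({ code: '400' }));
}

// Wraps the chain in a new Promise but never calls resolve or reject,
// so the returned promise stays pending forever.
function saveWrapped(data) {
  return new Promise((resolve, reject) => {
    fakeSet(data).then(() => JSON.stringify({ code: '200' }));
  });
}

// Resolves to true if `promise` settles within `ms`, otherwise false.
function settlesWithin(promise, ms) {
  const timeout = new Promise((resolve) => setTimeout(() => resolve(false), ms));
  const settled = promise.then(() => true, () => true);
  return Promise.race([settled, timeout]);
}
```

Under these assumptions settlesWithin(saveDirect({}), 200) resolves to true while settlesWithin(saveWrapped({}), 200) resolves to false. A function that waits on the wrapped version can only end by hitting its deadline.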

Awaited async method keeps previous retry payload

In our program, we try to implement a task retry pattern with await.
Our main problem is our method keeps the first retry payload in subsequent ones.
Here is the retry method:
async retryTaskUntilExpectedValue({
task,
expectedValue,
messageOnError = 'Max retry number reached without expected result',
maxRetries = 10,
timeout = 10,
spinner = null
}) {
let printFn = console.log;
if (spinner !== null) {
printFn = spinner.text;
}
// Proceed retries
for (let i = 1; i <= maxRetries; i++) {
try {
let result = await task;
console.log(result); // Always display same result: {"state": "upgrading"} even if curling returns {"state": "upgraded"} after about 2 retries
result = JSON.parse(result).state;
if (result === expectedValue) {
return Promise.resolve(result);
} else if (i <= maxRetries) {
printFn(`Result "${result}" differs from expected value "${expectedValue}"`);
await wait(1000);
printFn(`Waiting ${timeout}s before retry`);
await wait(timeout * 1000);
printFn(`Retrying (${i})`);
continue;
} else {
return Promise.reject(`ERROR: ${messageOnError}`);
}
} catch (err) {
return Promise.reject(`ERROR: Unexpected error while running task`);
}
}
};
And the use in our CLI:
checkUpgrade(url) {
return retryTaskUntilExpectedValue({
task: this.makeHttpRequest('GET', url),
expectedValue: 'upgraded'
});
}
In our case, the task is an http request returning a state from our backend database.
The model is simple:
{ "state": "upgrading" } then when the backend job is done, it returns { "state": "upgraded"}.
The job takes some time to process (around 20 sec). In our tests, this behavior occurred:
First call: upgrading
First retry: upgrading
After that, by curling directly the REST api by hand, I get the upgraded status
all other retries in the CLI: upgrading
So in the CLI we build, we have 10 times the result: Result "upgrading" differs from expected value "upgraded"
It seems the let result = await task; in the subsequent retries does not call the task method at each retry. Indeed, if an actual call were made, it would surely retrieve the proper upgraded state, since we get it through curl.
How can we make await task; actually trigger the task call instead of keeping the result from the first call?
A promise is the result of an already started operation. Because task is passed in as a promise, every await task waits on that same promise and returns the same value.
Instead, retryTaskUntilExpectedValue should take a function that returns a promise and await an invocation of that:
let result = await functionReturningTask();
Where functionReturningTask is whatever you used to obtain task in the first place.
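Applied to the retry method from the question, the change might look like the sketch below. It is a simplified version (no spinner, and a timeoutMs parameter in milliseconds instead of the original timeout in seconds, both changes made for the sketch), intended only to show that task is now a function called once per attempt.

```javascript
// Resolves after `ms` milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `task` is a function returning a promise, so each attempt starts a new
// operation instead of re-awaiting the first one.
async function retryTaskUntilExpectedValue({
  task,
  expectedValue,
  messageOnError = 'Max retry number reached without expected result',
  maxRetries = 10,
  timeoutMs = 10000,
}) {
  for (let i = 1; i <= maxRetries; i++) {
    const state = JSON.parse(await task()).state;
    if (state === expectedValue) {
      return state;
    }
    // Wait before the next attempt, but not after the last one.
    if (i < maxRetries) {
      await wait(timeoutMs);
    }
  }
  throw new Error(`ERROR: ${messageOnError}`);
}

// Usage in the CLI (not executed here):
// retryTaskUntilExpectedValue({
//   task: () => this.makeHttpRequest('GET', url),
//   expectedValue: 'upgraded',
// });
```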
