RxJS non-blocking pooling - node.js

I'm working with Amazon Transcribe Service and in the SDK doesn't have any way to get when the transcription job has finished.
So we need to pool that information. Nowdays we have the following working code...
const jobid = nanoid();
await amazonTrascribeClient
.startTranscriptionJob({
IdentifyMultipleLanguages: true,
TranscriptionJobName: jobId,
Media: {
MediaFileUri: "s3://file-location",
},
Subtitles: {
OutputStartIndex: 1,
Formats: ["vtt", "srt"],
},
OutputBucketName: `file-location`,
OutputKey: `transcriptions/${jobId}/`,
})
.promise();
// HELP HERE:
const callIntervalFunc = () => {
const callInverval = setInterval(async () => {
const { TranscriptionJob } = await amazonTrascribeClient
.getTranscriptionJob({ TranscriptionJobName: jobId })
.promise();
if (
["COMPLETED", "FAILED"].includes(TranscriptionJob.TranscriptionJobStatus)
) {
clearInterval(callInverval);
// Persist in database etc...
}
}, 2000);
};
callIntervalFunc();
But as you can see it's extremally expensive and don't work in the concurrency mode since it's lock the thread. The objective is pool the information without block the event loop, some people said to use fire and forget, but I have no idea where I should start.

Related

Bi-directional Websocket with RTK Query

I'm building a web-based remote control application for the music program Ableton Live. The idea is to be able to use a tablet on the same local network as a custom controller.
Ableton Live runs Python scripts, and I use this library that exposes the Ableton Python API to Node. In Node, I'm building an HTTP/Websocket server to serve my React frontend and to handle communication between the Ableton Python API and the frontend running Redux/RTK Query.
Since I both want to send commands from the frontend to Ableton Live, and be able to change something in Ableton Live on my laptop and have the frontend reflect it, I need to keep a bi-directional Websocket communication going. The frontend recreates parts of the Ableton Live UI, so different components will care about/subscribe to different small parts of the whole Ableton Live "state", and will need to be able to update just those parts.
I tried to follow the official RTK Query documentation, but there are a few things I really don't know how to solve the best.
RTK Query code:
import { createApi, fetchBaseQuery } from '#reduxjs/toolkit/query/react';
import { LiveProject } from '../models/liveModels';
export const remoteScriptsApi = createApi({
baseQuery: fetchBaseQuery({ baseUrl: 'http://localhost:9001' }),
endpoints: (builder) => ({
getLiveState: builder.query<LiveProject, void>({
query: () => '/completeLiveState',
async onCacheEntryAdded(arg, { updateCachedData, cacheDataLoaded, cacheEntryRemoved }) {
const ws = new WebSocket('ws://localhost:9001/ws');
try {
await cacheDataLoaded;
const listener = (event: MessageEvent) => {
const message = JSON.parse(event.data)
switch (message.type) {
case 'trackName':
updateCachedData(draft => {
const track = draft.tracks.find(t => t.trackIndex === message.id);
if (track) {
track.trackName = message.value;
// Components then use selectFromResult to only
// rerender on exactly their data being updated
}
})
break;
default:
break;
}
}
ws.addEventListener('message', listener);
} catch (error) { }
await cacheEntryRemoved;
ws.close();
}
}),
})
})
Server code:
import { Ableton } from 'ableton-js';
import { Track } from 'ableton-js/ns/track';
import path from 'path';
import { serveDir } from 'uwebsocket-serve';
import { App, WebSocket } from 'uWebSockets.js';
const ableton = new Ableton();
const decoder = new TextDecoder();
const initialTracks: Track[] = [];
async function buildTrackList(trackArray: Track[]) {
const tracks = await Promise.all(trackArray.map(async (track) => {
initialTracks.push(track);
// A lot more async Ableton data fetching will be going on here
return {
trackIndex: track.raw.id,
trackName: track.raw.name,
}
}));
return tracks;
}
const app = App()
.get('/completeLiveState', async (res, req) => {
res.onAborted(() => console.log('TODO: Handle onAborted error.'));
const trackArray = await ableton.song.get('tracks');
const tracks = await buildTrackList(trackArray);
const liveProject = {
tracks // Will send a lot more than tracks eventually
}
res.writeHeader('Content-Type', 'application/json').end(JSON.stringify(liveProject));
})
.ws('/ws', {
open: (ws) => {
initialTracks.forEach(track => {
track.addListener('name', (result) => {
ws.send(JSON.stringify({
type: 'trackName',
id: track.raw.id,
value: result
}));
})
});
},
message: async (ws, msg) => {
const payload = JSON.parse(decoder.decode(msg));
if (payload.type === 'trackName') {
// Update track name in Ableton Live and respond
}
}
})
.get('/*', serveDir(path.resolve(__dirname, '../myCoolProject/build')))
.listen(9001, (listenSocket) => {
if (listenSocket) {
console.log('Listening to port 9001');
}
});
I have a timing issue where the server's ".ws open" method runs before the buildTrackList function is done fetching all the tracks from Ableton Live. These "listeners" I'm adding in the ws-open-method are callbacks that you can attach to stuff in Ableton Live, and the one in this example will fire the callback whenever the name of a track changes. The first question is if it's best to try to solve this timing issue on the server side or the RTK Query side?
All examples I've seen on working with Websockets in RTK Query is about "streaming updates". But since the beginning I've thought about my scenario as needing bi-directional communication using the same Websocket connection the whole application through. Is this possible with RTK Query, and if so how do I implement it? Or should I use regular query endpoints for all commands from the frontend to the server?

Manual approval task in step function - invoke through a script

I have a step function with an activity task that should wait for an input from a script that I will run from my terminal. If the script is invoked, the task that is waiting should get succeeded
Could someone provide examples on how to achieve this?
Any helpful documentations or referenced code links are appreciated.
Would I need an activity worker to invoke this?
Can a script running in my terminal invoke the lambda and mark it as succeeded?
node report-choice-step-success.js --stepfunction-arn <SFN-EXEC> --step-name ManualTask
Script report-choice-step-success.js
const main = () => {
let sfnClient;
const rolename = `StepFunctionExecuter-LOCAL`;
return getcreds({ accountId: '123456789012', region: 'us-east-1', rolename })
.then(params => {
sfnClient = new AWS.StepFunctions(params)
})
.then(() => startmystepfunction(sfnClient));
};
const startmystepfunction = (sfnClient) => {
const stateMachineArn = `arn:aws:states:us-east-1:123456789012:stateMachine:MYSTEPFUNCTION`;
const name = `Manual step`;
const executionParams = { name, stateMachineArn };
return sfnClient.startExecution(executionParams).promise()
.then(response => {
if (response && response.executionArn) {
print(`Started SFN execution for arn: ${response.executionArn}`);)
};
How should I modify the task so that it waits for a manual input so that it gets succeeded?
{
"Comment": "My state machine",
"StartAt": "Manual step",
"States": {
"ManualStep": {
"Type": "Task",
"Resource": "arn:aws:states:::activity:manualtask",
"End": true
}
}
}
I could get the Activity ARN from the executionARN using the getExecutionHistory method and filtered the scheduled activities.
Then I used that particular activityARN to getActivityTask and then used the sendTaskSuccess method to transition the step to succeeded.
sfnClient.getActivityTask(params).promise()
.then(data => {
var params = {
output: data.input.toString(),
taskToken: data.taskToken.toString()
};
sfnClient.sendTaskSuccess(params).promise()
}).catch(err => {
console.error(err);
console.error(err.stack);
});

Does top-level await have a timeout?

With top-level await accepted into ES2022, I wonder if it is save to assume that await import("./path/to/module") has no timeout at all.
Here is what I’d like to do:
// src/commands/do-a.mjs
console.log("Doing a...");
await doSomethingThatTakesHours();
console.log("Done.");
// src/commands/do-b.mjs
console.log("Doing b...");
await doSomethingElseThatTakesDays();
console.log("Done.");
// src/commands/do-everything.mjs
await import("./do-a");
await import("./do-b");
And here is what I expect to see when running node src/commands/do-everything.mjs:
Doing a...
Done.
Doing b...
Done.
I could not find any mentions of top-level await timeout, but I wonder if what I’m trying to do is a misuse of the feature. In theory Node.js (or Deno) might throw an exception after reaching some predefined time cap (say, 30 seconds).
Here is how I’ve been approaching the same task before TLA:
// src/commands/do-a.cjs
import { autoStartCommandIfNeeded } from "#kachkaev/commands";
const doA = async () => {
console.log("Doing a...");
await doSomethingThatTakesHours();
console.log("Done.");
}
export default doA;
autoStartCommandIfNeeded(doA, __filename);
// src/commands/do-b.cjs
import { autoStartCommandIfNeeded } from "#kachkaev/commands";
const doB = async () => {
console.log("Doing b...");
await doSomethingThatTakesDays();
console.log("Done.");
}
export default doB;
autoStartCommandIfNeeded(doB, __filename);
// src/commands/do-everything.cjs
import { autoStartCommandIfNeeded } from "#kachkaev/commands";
import doA from "./do-a";
import doB from "./do-b";
const doEverything = () => {
await doA();
await doB();
}
export default doEverything;
autoStartCommandIfNeeded(doEverything, __filename);
autoStartCommandIfNeeded() executes the function if __filename matches require.main?.filename.
Answer: No, there is not a top-level timeout on an await.
This feature is actually being used in Deno for a webserver for example:
import { serve } from "https://deno.land/std#0.103.0/http/server.ts";
const server = serve({ port: 8080 });
console.log(`HTTP webserver running. Access it at: http://localhost:8080/`);
console.log("A");
for await (const request of server) {
let bodyContent = "Your user-agent is:\n\n";
bodyContent += request.headers.get("user-agent") || "Unknown";
request.respond({ status: 200, body: bodyContent });
}
console.log("B");
In this example, "A" gets printed in the console and "B" isn't until the webserver is shut down (which doesn't automatically happen).
As far as I know, there is no timeout by default in async-await. There is the await-timeout package, for example, that is adding a timeout behavior. Example:
import Timeout from 'await-timeout';
const timer = new Timeout();
try {
await Promise.race([
fetch('https://example.com'),
timer.set(1000, 'Timeout!')
]);
} finally {
timer.clear();
}
Taken from the docs: https://www.npmjs.com/package/await-timeout
As you can see, a Timeout is instantiated and its set method defines the timeout and the timeout message.

Job processing microservices using bull

I would like to process scheduled jobs using node.js bull. Basically I have two processors that handle 2 types of jobs. There is one configurator that configures the jobs which will be added to the bull queue using cron.
The scheduler will be in one microservice and the each of the processor will be a separate microservice. So I will be having 3 micro services.
My question is am I using the correct pattern with bull?
index.js
const Queue = require('bull');
const fetchQueue = new Queue('MyScheduler');
fetchQueue.add("fetcher", {name: "earthQuakeAlert"}, {repeat: {cron: '1-59/2 * * * *'}, removeOnComplete: true});
fetchQueue.add("fetcher", {name: "weatherAlert"}, {repeat: {cron: '3-59/3 * * * *'}, removeOnComplete: true});
processor-configurator.js
const Queue=require('bull');
const scheduler = new Queue("MyScheduler");
scheduler.process("processor", __dirname + "/alert-processor");
fetcher-configurator.js
const Queue=require('bull');
const scheduler = new Queue("MyScheduler");
scheduler.process("fetcher", __dirname+"/fetcher");
fetcher.js
const Queue = require('bull');
const moment = require('moment');
module.exports = function (job) {
const scheduler = new Queue('MyScheduler');
console.log("Insider processor ", job.data, moment().format("YYYY-MM-DD hh:mm:ss"));
scheduler.add('processor', {'name': 'Email needs to be sent'}, {removeOnComplete: true});
return Promise.resolve()
};
alert-processor.js
const Queue = require('bull');
const moment = require('moment');
module.exports = function (job) {
const scheduler = new Queue('MyScheduler');
console.log("Insider processor ", job.data, moment().format("YYYY-MM-DD hh:mm:ss"));
scheduler.add('processor', {'name': 'Email needs to be sent'}, {removeOnComplete: true});
return Promise.resolve()
};
There will be three microservices -
node index.js
node fetcher-configurator.js
node processor-configurator.js
I see inconsistent behavior from bull. Sometimes I am getting the error Missing process handler for job type
Quoting myself with a hope this will be helpful for someone else
This is because both workers use the same queue. Worker tries to get next job from queue, receives a job with wrong type (eg "fetcher" instead of "processor") and fails because it knows how to handle "processor" and doesn't know what to do with "fetcher". Bull doesn't allow you to take only compatible jobs from queue, both workers should be able to process all types of jobs. The simplest solution would be to use two different queues, one for processors and one for fetchers. Then you can remove names from jobs and processors, it won't be needed anymore since name is defined by the queue.
https://github.com/OptimalBits/bull/issues/1481
The Bull:
expiration-queue.js
import Queue from 'bull';
import { ExpirationCompletePublisher } from '../events/publishers/expiration-complete-publisher';
import { natsWrapper } from '../nats-wrapper';
interface Payload {
orderId: string;
}
const expirationQueue = new Queue<Payload>('order:expiration', {
redis: {
host: process.env.REDIS_HOST,
},
});
expirationQueue.process(async (job) => {
console.log('Expiries order id', job.data.orderId);
new ExpirationCompletePublisher(natsWrapper.client).publish({
orderId: job.data.orderId,
});
});
export { expirationQueue };
promotionEndQueue.js
import Queue from 'bull';
import { PromotionEndedPublisher } from '../events/publishers/promotion-ended-publisher';
import { natsWrapper } from '../nats-wrapper';
interface Payload {
promotionId: string;
}
const promotionEndQueue = new Queue<Payload>('promotions:end', {
redis: {
host: process.env.REDIS_HOST, // look at expiration-depl.yaml
},
});
promotionEndQueue.process(async (job) => {
console.log('Expiries promotion id', job.data.promotionId);
new PromotionEndedPublisher(natsWrapper.client).publish({
promotionId: job.data.promotionId,
});
});
export { promotionEndQueue };
order-created-listener.js
import { Listener, OrderCreatedEvent, Subjects } from '#your-lib/common';
import { queueGroupName } from './queue-group-name';
import { Message } from 'node-nats-streaming';
import { expirationQueue } from '../../queues/expiration-queue';
export class OrderCreatedListener extends Listener<OrderCreatedEvent> {
subject: Subjects.OrderCreated = Subjects.OrderCreated;
queueGroupName = queueGroupName;
async onMessage(data: OrderCreatedEvent['data'], msg: Message) {
// delay = expiredTime - currentTime
const delay = new Date(data.expiresAt).getTime() - new Date().getTime();
// console.log("delay", delay)
await expirationQueue.add(
{
orderId: data.id,
},
{
delay,
}
);
msg.ack();
}
}
promotion-started-listener.js
import {
Listener,
PromotionStartedEvent,
Subjects,
} from '#your-lib/common';
import { queueGroupName } from './queue-group-name';
import { Message } from 'node-nats-streaming';
import { promotionEndQueue } from '../../queues/promotions-end-queue';
export class PromotionStartedListener extends Listener<PromotionStartedEvent> {
subject: Subjects.PromotionStarted = Subjects.PromotionStarted;
queueGroupName = queueGroupName;
async onMessage(data: PromotionStartedEvent['data'], msg: Message) {
// delay = expiredTime - currentTime
const delay = new Date(data.endTime).getTime() - new Date().getTime();
// console.log("delay", delay)
await promotionEndQueue.add(
{
promotionId: data.id,
},
{
delay,
}
);
msg.ack();
}
}

When use vm2 in worker_threads, is it possible to share a NodeVM instance between workers?

I am using worker_threads and vm2 to implement a serverless-like thing, but I cannot get a NodeVM instance in the main thread and then pass through workData(because of worker_threads's limitation), so I can only new NodeVM in a worker thread per request, inside which I cannot reuse a vm instance and the cost hurts.
The new NodeVM() takes 200 ~ 450 ms to finish, so I wish to pre-init a reusable instance.
const w = new Worker(`
(async () => {
const { workerData, parentPort } = require('worker_threads');
const { NodeVM } = require('vm2');
const t = Date.now();
const vm = new NodeVM({ // cost 200 ~ 450 ms
console: 'inherit',
require: {
external: [ 'request-promise', 'lodash' ],
builtin: [],
import: [ 'request-promise', 'lodash' ], // faster if added
},
});
console.log('time cost on new NodeVM:', Date.now() - t);
const fnn = vm.run(workerData.code, workerData.filename);
console.log('time cost by initializing vm:', Date.now() - t);
try {
const ret = await fnn(workerData.params);
parentPort.postMessage({
data: typeof ret === 'string' ? ret : JSON.stringify(ret),
});
} catch (e) {
parentPort.postMessage({
err: e.toString(),
});
}
console.log('----worker donex');
})();
`,
{
workerData: {
params,
code,
dirname: __dirname,
filename: `${__dirname}/faasVirtual/${fn}.js`,
},
eval: true,
});
Can anybody give me some advice?
Thanks a lot.
I've decided to prohibit external module import. Because require is internally readFileSync, which costs most of the time, and the http module within node itself can be used to replace request-promise.
After comment out external option, the average time cost for init is roughly 10+ms, which is acceptable for now.
But if worker_threads can clone function object through workerData, it would be more efficient.

Resources