How to conditionally thread in a Node.js application - node.js

My website's API works fine and runs fast, except for one route. Since Node.js is single threaded, we wanted the DATA call to be handled on a separate thread so that it doesn't block the rest of the incoming calls, and because forking took too long. Basically the code I want is like this:
router.get('/', (req, res) => {
if (isMainThread) {
// create a thread to run the DATA functions
} else {
res.json(getDATA())
}
})
Is this possible at all? All the tutorials I found implied that I either had to use cluster, or that my threading had to occur in my main.js, neither of which I want to do.
When I tried to set up the threading across 2 files, the imports for the Node.js threading module were always null, or my imports never worked.
So again, is this possible?

You can use worker_threads for this case. I use it for one of my sites and it works perfectly. Here is a minimal example:
const path = require('path')
const { Worker } = require('worker_threads')
const worker_script = path.join(__dirname, "./worker.js")
router.get('/', function(req, res) {
// obj is whatever payload you want to hand to the worker
const worker = new Worker(worker_script, {
workerData: JSON.stringify(obj)
})
worker.on("error", (err) => console.log(err))
worker.on("exit", () => console.log("exit"))
worker.on("message", (data) => {
console.log(data)
res.send(data)
})
})
And here is the worker:
const { parentPort, workerData, isMainThread } = require('worker_threads')
if (!isMainThread) {
console.log("workerData: ", workerData)
// do some heavy work with the data
const parsed = JSON.parse(workerData)
parentPort.postMessage(parsed)
}
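If you would rather keep the route handler flat, the worker can also be wrapped in a promise. This is a minimal sketch under the same assumptions as above (a worker.js beside the router; the payload shape is hypothetical):
const path = require('path')
const { Worker } = require('worker_threads')
const workerScript = path.join(__dirname, './worker.js')
// Resolves with the worker's first message; rejects on error or non-zero exit
function runWorker(payload) {
return new Promise((resolve, reject) => {
const worker = new Worker(workerScript, { workerData: JSON.stringify(payload) })
worker.once('message', resolve)
worker.once('error', reject)
worker.once('exit', (code) => {
if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`))
})
})
}
router.get('/', async (req, res) => {
try {
res.send(await runWorker({ url: req.originalUrl }))
} catch (err) {
res.status(500).send(err.message)
}
})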

Related

Keeping track of socket.io sessions for logging

I am working on a generic logging module for my application and am trying to add session information to each log (requestId/socketId, userId, etc.), but I am running into some issues with logging websockets.
Basically my application has 2 parts: a REST API (express) and websockets (socket.io).
Both the REST API and the websockets use some of the same functions (database edits etc.), and these functions should log errors or other useful data.
But passing the session information to the logger module creates a lot of overhead and makes the code quite unreadable, so I am looking for a way to save the session information so that the logger can get it from there.
For the REST API this was fairly simple using AsyncLocalStorage, and I was hoping to use the same principle for the websockets, but I guess it's not that simple.
My (partially) working code setup is as follows:
Global context creator (logAsyncContext.ts):
import { AsyncLocalStorage } from "async_hooks";
export const context = new AsyncLocalStorage();
export const createContext = (data: any, callBack: () => any) => {
const store = data;
return context.run(store, () => callBack());
};
This is then used by the middleware of the REST API and the websockets.
REST API middleware (apiLogContext.ts):
// Import the required modules
import { v4 } from "uuid";
import { Request, Response, NextFunction } from "express";
// Import custom utilities
import { createContext } from "../../utils/logAsyncContext";
import { logger } from "../../utils/logger";
// Generate a unique ID for incoming requests and store in context so logger can access it
export const apiLogContext = (
req: Request,
_res: Response,
next: NextFunction
) => {
const logData = {
api: {
requestId: v4(),
originalUrl: req.originalUrl,
},
};
return createContext(logData, () => debugLog(next));
};
const debugLog = (next: NextFunction) => {
logger.debug("API log context created");
return next();
};
websocket middleware (wsLogContext.ts):
// Import the required modules
import { v4 } from "uuid";
import { Socket } from "socket.io";
// Import custom utilities
import { createContext } from "../../utils/logAsyncContext";
import { logger } from "../../utils/logger";
// Generate a unique ID for incoming requests and store in context so logger can access it
export const wsLogContext = (socket: Socket, next: () => void) => {
const logData = {
ws: {
socketId: v4(),
nameSpace: socket.nsp.name,
},
};
return createContext(logData, () => debugLog(next));
};
const debugLog = (next: () => void) => {
logger.debug(`WS log context created`);
return next();
};
Now the logger can get the context from logAsyncContext.ts:
import { context } from "./logAsyncContext";
const getStore = () => {
// Get the store from the AsyncLocalStorage
const store = context.getStore();
// If the store is not defined, log an error
if (!store) {
console.log("Store is not defined");
return undefined;
}
return store;
};
export function debug(message: string) {
// Get the context
const store = getStore();
if (!store) {
return;
}
// isAPILog is a type guard (not shown here) that checks whether store.api exists
if (isAPILog(store)) {
console.debug(
`DEBUG LOG: ${store.api.requestId} | ${store.api.originalUrl} - ${message}`
);
} else {
console.debug(
`DEBUG LOG: ${store.ws.socketId} | ${store.ws.nameSpace} - ${message}`
);
}
};
This works perfectly for the REST API, but for the websockets it's a different story: it does log the initial debug message ("WS log context created"), but everything logged after that cannot access the store ("Store is not defined").
Now I am sure there is a logical explanation, but I don't fully understand the structure of data for websocket connections. So am I just making a simple mistake, or is this whole setup of logging for websockets incorrect? If so, what would be the better way (without needing to pass the session info with every log)?
I faced the same issue.
After a shallow investigation, I can offer the following observations:
socket.io middlewares are not the same as in express (not 100% sure)
There is a known issue https://github.com/nodejs/node/issues/32330 (closed, but with tricky code)
To move forward with AsyncLocalStorage in socket.io, I took the following steps:
// context.js
const uuid = require('uuid').v4;
const { AsyncLocalStorage } = require('async_hooks');
const context = new AsyncLocalStorage();
const enterWith = (data) => context.enterWith({traceId: uuid(), ...data });
module.exports = { context, enterWith };
// sockets.js
// I have legacy socket.io v2, your code may be different
io.use(contextMiddleware);
io.use(authSocket);
io.on('connection', (socket) => {
socket.on('USER_CONNECT', async () => {
socket.emit('Exo', `USER_CONNECT`);
try {
// The main solution is here, enter a valid context before actual controller execution
await enterWith({ userId: socket.chatuser });
await userService.createOrUpdateChatUser({ userId: socket.chatuser, customerId });
socket.emit('Exo', `User created`);
} catch (e) {
logger.error(`Create user failure ${e.message}`, { error: e });
socket.emit('Error', e.message);
}
});
});
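The key point is the difference between context.run(store, cb), which only makes the store visible inside the callback, and context.enterWith(store), which keeps it for the remainder of the current execution and everything async that follows. A tiny sketch to illustrate (names made up for the example):
const { AsyncLocalStorage } = require('async_hooks');
const als = new AsyncLocalStorage();
function withRun() {
als.run({ from: 'run' }, () => {
// the store is visible here and in async work started from here
});
console.log(als.getStore()); // undefined - we have left the run() scope
}
function withEnterWith() {
als.enterWith({ from: 'enterWith' });
console.log(als.getStore()); // { from: 'enterWith' } - sticks to this execution
}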
Thanks to @bohdan for reminding me that this issue was still unanswered. While his solution works, I will also explain what I did, for anyone wondering how to do this using middleware.
What I learned is that WebSockets can be very confusing but quite logical. For me, the most important thing to realize was that you cannot use the "same" AsyncLocalStorage context for a single socket for as long as that socket is connected. So I use a different context for each event (I will call them stores).
For me there are 4 different situations in a websocket connection which cannot share the same store:
When a connection is made
When an event is received (frontend --> backend)
When an event is sent (backend --> frontend)
When a connection is closed
For all of these types I (mostly) use the same middleware:
import { AsyncLocalStorage } from "async_hooks";
import { Socket } from "socket.io";
import { v4 } from "uuid";
const context = new AsyncLocalStorage();
const wsLogStore = (socket: Socket, next: () => void) => {
const newData: any = {
// Any data you want to save in the store
// For example the socket id
socketId: socket.id,
// I also add an eventId which I can later use in my logging to combine all logs belonging to a single event
eventId: v4()
}
return context.run(newData, () => next())
}
export default wsLogStore
#1 For the first type (when a connection is made)
You can use the middleware like this:
// Import the middleware we just created
import wsLogStore from "./wsLogStore"
// io = socketIO server instance (io = new Server)
io.use(wsLogStore)
Now a store will be available everywhere, as long as the code runs directly after the connection is made.
#2 When an event is received (frontend --> backend)
io.use((socket, next) => {
socket.use((event, next) => {
wsLogStore(socket, () => {
next()
});
});
// Don't forget to let the connection itself proceed
next();
})
Now everywhere you use socket.on("<any event>"), a store will have been created and will be usable.
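For example, inside any handler registered after that middleware (the event name and payload here are hypothetical), the logger from the question can read the store as usual:
socket.on("chat:message", (msg) => {
// context.getStore() now returns { socketId, eventId } for this event
logger.debug(`received chat:message: ${msg}`);
});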
#3 When an event is sent (backend --> frontend)
Now this one is a little different, since depending on your implementation this will not be easy. For example, when you send something to a specific room, is it enough to create a single store for the whole room? Or do you want a separate one for each socket that is receiving an event? And how do we create a store, since we don't have a specific socket available?
For my use case it was absolutely necessary to have a separate store for each socket that is receiving an event.
const sendEventsToSockets = () => {
// Get the sockets you want to send an event to
// For example, you could get the sockets from a room
// (io.sockets here stands for the Map of connected sockets; in socket.io v4 that is io.of("/").sockets)
const sockets = (Array.from(io.sockets.values()) as Socket[]).filter((socket) => socket.rooms.has("your room"))
for (const socket of sockets) {
wsLogStore(socket, () => {
//Here a separate store for each socket will be available
socket.emit("your event")
})
}
}
#4 When a connection is closed
Sadly, the store we created in step 1 is not available in this case, so we need to create a new one.
io.use((socket, next) => {
socket.on("disconnect", () => {
wsLogStore(socket, () => {
// A separate store will be available here when the connection is closed
});
});
// Let the connection proceed
next();
})
Conclusion
While it would be easier if we could create a single store for each socket and use it the whole time, it seems like that is simply not possible.
By saving the socketId in our store we can however combine all data that we need afterwards. For example, in logging.
Note: If you use namespaces, the socketId will be different for each namespace. You could use the connection id socket.conn.id, which is a unique ID for each socket (no matter which namespace). Why this value is marked as private (if you are using TS), I have no clue.
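If the TS typings get in the way, a cast works around them; this assumes the property exists at runtime, as described above:
// socket.conn.id is typed as private, but it exists at runtime
const connectionId = (socket.conn as any).id;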
All of this will of course be slightly different depending on your use case and implementation. For example, if you use namespaces then you need to make sure the middleware is applied in each namespace.
I hope someone finds this helpful, and if there are any questions about how I do things or how to improve my setup, I would love to hear from you!

Trigger the execution of a function if any condition is met

I'm writing an HTTP API with expressjs in Node.js and here is what I'm trying to achieve:
I have a recurring task that I would like to run approximately every minute. This task is implemented with an async function named task.
In reaction to a call in my API, I would like to have that task called immediately as well.
Two executions of the task function must not be concurrent. Each execution should run to completion before another execution is started.
The code looks like this:
// only a single execution of this function is allowed at a time
// which is not the case with the current code
async function task(reason: string) {
console.log("do thing because %s...", reason);
await sleep(1000);
console.log("done");
}
// call task regularly
setIntervalAsync(async () => {
await task("ticker");
}, 5000) // normally 1min
// call task immediately
app.get("/task", async (req, res) => {
await task("trigger");
res.send("ok");
});
I've put a full working sample project at https://github.com/piec/question.js
If I were in Go I would do it like this and it would be easy, but I don't know how to do that with Node.js.
Ideas I have considered or tried:
I could apparently put task in a critical section using a mutex from the async-mutex library, but I'm not too fond of adding mutexes in js code (a sketch of what that would look like follows after this list).
Many people seem to be using message-queue libraries with worker processes (bee-queue, bullmq, ...), but this adds a dependency on an external service, usually redis. Also, if I'm correct, the code would be a bit more complex because I would need a main entrypoint and an entrypoint for worker processes. Also, you can't share objects with the workers as easily as in a "normal" single-process situation.
I have tried an RxJS Subject in order to make a producer/consumer channel, but I was not able to limit the execution of task to one at a time (task is async).
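For reference, the async-mutex variant mentioned above would look roughly like this (a sketch, reusing the task function from the snippet above):
import { Mutex } from "async-mutex";
const taskMutex = new Mutex();
// runExclusive queues callers, so executions of task never overlap
async function serializedTask(reason: string) {
return taskMutex.runExclusive(() => task(reason));
}
Both the interval and the route handler would then call serializedTask instead of task.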
Thank you!
You can make your own serialized asynchronous queue and run the tasks through that.
This queue uses a flag to keep track of whether it's in the middle of running an asynchronous operation already. If so, it just adds the task to the queue and will run it when the current operation is done. If not, it runs it now. Adding it to the queue returns a promise so the caller can know when the task finally got to run.
If the tasks are asynchronous, they are required to return a promise that is linked to the asynchronous activity. You can mix in non-asynchronous tasks too and they will also be serialized.
class SerializedAsyncQueue {
constructor() {
this.tasks = [];
this.inProcess = false;
}
// adds a promise-returning function and its args to the queue
// returns a promise that resolves when the function finally gets to run
add(fn, ...args) {
let d = new Deferred();
this.tasks.push({ fn, args, deferred: d });
this.check();
return d.promise;
}
check() {
if (!this.inProcess && this.tasks.length) {
// run next task
this.inProcess = true;
const nextTask = this.tasks.shift();
Promise.resolve(nextTask.fn(...nextTask.args)).then(val => {
this.inProcess = false;
nextTask.deferred.resolve(val);
this.check();
}).catch(err => {
console.log(err);
this.inProcess = false;
nextTask.deferred.reject(err);
this.check();
});
}
}
}
const Deferred = function() {
if (!(this instanceof Deferred)) {
return new Deferred();
}
const p = this.promise = new Promise((resolve, reject) => {
this.resolve = resolve;
this.reject = reject;
});
this.then = p.then.bind(p);
this.catch = p.catch.bind(p);
if (p.finally) {
this.finally = p.finally.bind(p);
}
}
let queue = new SerializedAsyncQueue();
// utility function
const sleep = function(t) {
return new Promise(resolve => {
setTimeout(resolve, t);
});
}
// only a single execution of this function is allowed at a time
// so it is run only via the queue that makes sure it is serialized
async function task(reason: string) {
async function runIt() {
console.log("do thing because %s...", reason);
await sleep(1000);
console.log("done");
}
return queue.add(runIt);
}
// call task regularly
setIntervalAsync(async () => {
await task("ticker");
}, 5000) // normally 1min
// call task immediately
app.get("/task", async (req, res) => {
await task("trigger");
res.send("ok");
});
Here's a version using an RxJS Subject that is almost working. How to finish it depends on your use-case.
async function task(reason: string) {
console.log("do thing because %s...", reason);
await sleep(1000);
console.log("done");
}
const run = new Subject<string>();
const effect$ = run.pipe(
// Limit one task at a time
concatMap(task),
share()
);
const effectSub = effect$.subscribe();
interval(5000).subscribe(_ =>
run.next("ticker")
);
// call task immediately
app.get("/task", async (req, res) => {
effect$.pipe(
take(1)
).subscribe(_ =>
res.send("ok")
);
run.next("trigger");
});
The issue here is that res.send("ok") is linked to the effect$ stream's next emission, which may not be the one generated by the run.next you're about to call.
There are many ways to fix this. For example, you can tag each emission with an ID and then wait for the corresponding emission before using res.send("ok").
There are better ways too if calls distinguish themselves naturally.
A Clunky ID Version
Generating an ID randomly is a bad idea, but it gets the general thrust across. You can generate unique IDs however you like. They can be integrated directly into the task somehow or can be kept 100% separate the way they are here (task itself has no knowledge that it's been assigned an ID before being run).
interface IdTask {
taskId: number,
reason: string
}
interface IdResponse {
taskId: number,
response: any
}
async function task(reason: string) {
console.log("do thing because %s...", reason);
await sleep(1000);
console.log("done");
}
const run = new Subject<IdTask>();
const effect$: Observable<IdResponse> = run.pipe(
// concatMap only allows one observable at a time to run
concatMap((eTask: IdTask) => from(task(eTask.reason)).pipe(
map((response:any) => ({
taskId: eTask.taskId,
response
}) as IdResponse)
)),
share()
);
const effectSub = effect$.subscribe({
next: v => console.log("This is a shared task emission: ", v)
});
interval(5000).subscribe(num =>
run.next({
taskId: num,
reason: "ticker"
})
);
// call task immediately
app.get("/task", async (req, res) => {
const randomId = Math.random();
effect$.pipe(
filter(({taskId}) => taskId == randomId),
take(1)
).subscribe(_ =>
res.send("ok")
);
run.next({
taskId: randomId,
reason: "trigger"
});
});

Node.js/Vuetify- Is there a way to get data from server based on time?

I have a Node.js server set up for a Vuetify project. On my server, I am parsing a CSV file that has information in it about scheduling and time. In my Vuetify project, is there a way to get data from the CSV based on the time at which the client is being used?
OK, let's go with an example.
From what I understand, you have the following information in your CSV file:
Time,Activity
07:00,Breakfast
08:00,Go to work
12:00,Lunch break
Since you didn't specify, I will use an example parser, which will push all rows, as objects, into an array:
[
{ Time: '07:00', Activity: 'Breakfast' },
{ Time: '08:00', Activity: 'Go to work' },
{ Time: '12:00', Activity: 'Lunch break' }
]
You need to send that information to your clients, so assuming you are using Express, you could go with something along the lines of:
const csv = require('csv-parser');
const fs = require('fs');
const express = require('express');
const app = express();
const timeSchedule = [];
let csvParsed = false;
function parseCsv() {
// Parse the file only once; afterwards, serve the cached rows
if (csvParsed) {
return Promise.resolve();
}
return new Promise((resolve, reject) => {
fs.createReadStream('data.csv')
.pipe(csv())
.on('data', (data) => timeSchedule.push(data))
.on('error', (err) => reject(err))
.on('end', () => {
csvParsed = true;
resolve();
});
});
}
app.get('/scheduled-for/:hour', function (req, res) {
// You need to come up with the logic for your case
// As an example, I will assume that if I make a request at any time
// between 7 and 8, I will get "Breakfast"
// between 8 and 9 - "Go to work"
// between 12 and 13 - "Lunch break"
parseCsv().then(() => {
res.json(timeSchedule.find(row => row.Time.startsWith(req.params.hour)))
})
})
Please note, all of the above happens on the Node.js server.
From the client, you will have to call the scheduled-for GET handler with the hour param. Another option is to let the back-end determine the hour of the request using the Date object, but the above is more flexible for the client. It also avoids issues with timezones, given that your client requests may come from different timezones than the one your server is in.
Assuming you are using axios in your Vue application, the simplest way to get the schedule is to call your API in your component:
new Vue({
el: '#app',
data () {
return {
activity: null
}
},
mounted () {
axios
.get('https://urlToYourApi/v1/scheduled-for/' + new Date().getHours())
.then(response => {
// axios resolves with a response object; the payload is on response.data
const schedule = response.data;
if (schedule) {
this.activity = schedule.Activity;
}
else {
console.log("No activity found for this hour!");
}
})
}
})
This code is NOT for production! You need to handle many cases, such as new Date().getHours() returning single-digit hours, the parsing of the CSV, not to mention the domain logic itself, which depends on your specific case. This is just a simple example. I hope it helps guide you in the right direction!
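For instance, matching the zero-padded times in the CSV would need something like:
// "07:00".startsWith("7") is false, so zero-pad single-digit hours first
const hour = String(new Date().getHours()).padStart(2, '0');
axios.get('https://urlToYourApi/v1/scheduled-for/' + hour);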

How to launch/cancel a function in express by user request

I have express js server which listens for a request from a user:
// PUG template
$("#request").click(()=>{
$.ajax({url: "/launch", method: 'get'});
})
// server.js
app.get('/launch', (req, res) => {
getCatalog();
});
This should launch a huge do/while function, which may literally run for hours, unless the user cancels it.
Question: what is the proper way to launch and cancel this function on user request?
// PUG template
$("#cancel").click(()=>{
...
})
I would approach this with application logic rather than express functionality.
You can create a class that handles catalog loading and also keeps a state for this process that you can turn on and off (I believe the loading process involves multiple async function calls, so the event loop allows this).
For example:
class CatalogLoader {
constructor() {
this.isProcessing = false
}
getCatalog() {
this.isProcessing = true
while(... && this.isProcessing) {
// Huge loading logic
}
this.isProcessing = false
}
}
And in express you can add the API below:
app.get('/launch', (req, res) => {
catalogLoader.getCatalog();
// Respond right away; the loading continues in the background
res.send();
});
app.get('/cancelLaunch', (req, res) => {
catalogLoader.isProcessing = false
...
});
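One caveat with this approach: the loop body must actually await asynchronous work, otherwise the event loop is blocked and the /cancelLaunch request is never processed. A sketch of that idea (loadNextChunk is a hypothetical helper):
class CatalogLoader {
constructor() {
this.isProcessing = false
}
async getCatalog() {
this.isProcessing = true
let done = false
while (!done && this.isProcessing) {
// awaiting here yields to the event loop, so /cancelLaunch gets a chance to run
done = await loadNextChunk()
}
this.isProcessing = false
}
}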
A second possible solution uses require('child_process'), but you need to know the PID of the process you wish to cancel. Benefit: it unloads the main node thread from a heavy task.
So, include Node's const childProcess = require('child_process');
Then:
app.get('/launch', (req, res) => {
const getCatalog = childProcess.fork('script.js', [], {
detached: true
});
res.send();
});
app.get('/kill', (req, res, next) => {
const pid = req.query.pid;
if (pid) {
// req.query.pid is a string; process.kill expects a number
process.kill(Number(pid));
res.send();
} else {
res.end();
}
});
$("#requestCancel").click(()=>{
$.ajax({url: "/kill?pid=variable*", method: 'get'});
})
* I send the PID to the PUG template's js from Node via Server-Sent Events.
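For completeness, script.js just needs to run the heavy work in the child process; a minimal hypothetical sketch (assuming getCatalog is exported from its own module):
// script.js - runs in the forked child process
const { getCatalog } = require('./catalog');
getCatalog()
.then(() => process.exit(0))
.catch(() => process.exit(1));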

How to make multiple http requests from a Google Cloud Function (Cheerio, Node.js)

MY PROBLEM:
I'm building a web-scraper with Cheerio, Node.js, and Google Cloud Functions.
The problem is I need to make multiple requests, then write data from each request to a Firestore database before calling response.send() and thereby terminating the function.
My code requires two loops: the first loop is with urls from my db, with each one making a separate request. The second loop is with Cheerio using .each to scrape multiple rows of table data from the DOM and make a separate write for each row.
WHAT I'VE TRIED:
I've tried pushing each request to an array of promises and then waiting for all the promises to resolve with Promise.all() before calling res.send(), but I'm still a little shaky on promises and not sure that is the right approach. (I have gotten the code to work for smaller datasets that way, but only inconsistently.)
I also tried creating each request as a new promise and using async/await to await each function call from the forEach loop, to give each request and write time to fully finish so I could call res.send() afterward, but I found out that forEach doesn't support async/await.
I tried to get around that with the p-iteration module, but because it's not actually forEach but rather a method on the query (doc.forEach()), I don't think it works like that.
So here's my code.
NOTE:
As mentioned, this is not everything I tried (I removed my promise attempts), but this should show what I am trying to accomplish.
export const getCurrentLogs = functions.https.onRequest((req, response) => {
//First, I make a query from my db to get the urls
// that I want the webscraper to loop through.
const ref = scheduleRef.get()
.then((snapshot) => {
snapshot.docs.forEach((doc) => {
const scheduleGame = doc.data()
const boxScoreUrl = scheduleGame.boxScoreURL
//Inside the forEach I call the request
// as a function with the url passed in
updatePlayerLogs("https://" + boxScoreUrl + "/");
});
})
.catch(err => {
console.log('Error getting schedule', err);
});
function updatePlayerLogs (url){
//Here I'm not sure on how to set these options
// to make sure the request stays open but I have tried
// lots of different things.
const options = {
uri: url,
Connection: 'keep-alive',
transform: function (body) {
return cheerio.load(body);
}
};
request(options)
.then(($) => {
//Below I loop through some table data
// on the dom with cheerio. Every loop
// in here needs to be written to firebase individually.
$('.stats-rows').find('tbody').children('tr').each(function(i, element){
const playerPage = $(element).children('td').eq(0).find('a').attr('href');
const pts = replaceDash($(element).children('td').eq(1).text());
const reb = replaceDash($(element).children('td').eq(2).text());
const ast = replaceDash($(element).children('td').eq(3).text());
const fg = replaceDash($(element).children('td').eq(4).text());
const _3pt = replaceDash($(element).children('td').eq(5).text());
const stl = replaceDash($(element).children('td').eq(9).text());
const blk = replaceDash($(element).children('td').eq(10).text());
const to = replaceDash($(element).children('td').eq(11).text());
const currentLog = {
'pts': +pts,
'reb': +reb,
'ast': +ast,
'fg': +fg,
'3pt': +_3pt,
'stl': +stl,
'blk': +blk,
'to': +to
}
//here is the write
playersRef.doc(playerPage).update({
'currentLog': currentLog
})
.catch(error =>
console.error("Error adding document: ", error + " : " + url)
);
});
})
.catch((err) => {
console.log(err);
});
};
//Here I call response.send() to finish the function.
// I have tried doing this lots of different ways but
// whatever I try the response is being sent before all
// docs are written.
response.send("finished writing logs")
});
Everything I have tried either results in a deadline-exceeded error (possibly because of quota limits, which I have looked into, but I don't think I should be exceeding them), or some unexplained error where the code doesn't finish executing but shows me nothing in the logs.
Please help: is there a way to use async/await in this scenario that I am not understanding? Is there a way to use promises to make this elegant?
Many thanks,
Maybe have a look at something like this. It uses Bluebird promises and the request-promise library:
const Promise = require('bluebird');
var rp = require('request-promise');
const urlList = ['http://www.google.com', 'http://example.com']
async function getList() {
await Promise.map(urlList, (url, index, length) => {
return rp(url)
.then((response) => {
console.log(`${'\n\n\n'}${url}:${'\n'}${response}`);
return;
}).catch(async (err) => {
console.log(err);
return;
})
}, {
concurrency: 10
}); //end Promise.map
}
getList();
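Applied to the question's code, the same idea means having updatePlayerLogs return a promise (its request(options) chain, with the row writes collected via Promise.all) and only calling response.send() after every URL has been processed. A sketch under those assumptions:
export const getCurrentLogs = functions.https.onRequest(async (req, response) => {
try {
const snapshot = await scheduleRef.get();
const urls = snapshot.docs.map((doc) => "https://" + doc.data().boxScoreURL + "/");
// One promise per scraped URL; each should resolve only after its Firestore writes finish
await Promise.all(urls.map((url) => updatePlayerLogs(url)));
response.send("finished writing logs");
} catch (err) {
console.log('Error writing logs', err);
response.status(500).send("failed writing logs");
}
});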
