I am struggling with Node.js 'worker_threads'.
What I want to do is pass several instances of my custom class into the worker thread.
The instances are stored in a map, keyed by a unique serial number.
So basically, I have a Map of type <string, MyUniqueClassInstance>.
My worker implementation looks like this:
Class method running a worker service:
public static runService = (workerData: any) => {
  return new Promise((resolve, reject) => {
    const route = path.join(__dirname, '/worker.js');
    const worker = new Worker(route, { workerData });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code: number) => {
      if (code !== 0)
        reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}
The worker itself:
const { workerData, parentPort } = require('worker_threads');

const instance = new Counter();
const { items, myMap } = workerData;
const pResponseList: Promise<any>[] = [];

items.map((item: Item) => {
  pResponseList.push(
    instance.count(item, myMap.get(item._id)!)
  );
});

Promise.all(pResponseList).then(res => parentPort.postMessage(res));
Whenever I try to call a method on a map entry inside the 'count' method, it throws an error:
myMapEntry.myCustomInstanceMethod is not a function
I tried to console.log() the contents of my instance just before passing it to the .count() method, and everything is correctly set.
The same pattern runs flawlessly outside of the worker.
Could anyone help me find out what exactly could be wrong with this code?
You can't pass functions: instances of classes won't work either, or at least their methods won't, because you can only pass serializable data.
https://nodejs.org/api/worker_threads.html#worker_threads_worker_workerdata
An arbitrary JavaScript value that contains a clone of the data passed to this thread’s Worker constructor.
The data is cloned as if using postMessage(), according to the HTML structured clone algorithm.
So let's follow that to the HTML structured clone algorithm:
Things that don't work with structured clone
Function objects cannot be duplicated by the structured clone algorithm; attempting to do so will throw a DATA_CLONE_ERR exception.
Could you perhaps reconstruct the instance of the class using serialized data, call your methods, and then return the serialized data back to the parent thread?
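For example, here is a minimal sketch of that round trip. MyUniqueClass with its toJSON/fromJSON helpers is a hypothetical stand-in for your own class; only plain data crosses the thread boundary, and the worker re-attaches the methods:

import * as path from 'path';
import { Worker } from 'worker_threads';

// Hypothetical stand-in for the class stored in the map.
class MyUniqueClass {
  constructor(public serial: string, public value: number) {}
  myCustomInstanceMethod() { return this.value * 2; }
  toJSON() { return { serial: this.serial, value: this.value }; } // plain data out
  static fromJSON(d: { serial: string; value: number }) {         // methods back in
    return new MyUniqueClass(d.serial, d.value);
  }
}

// Main thread: convert the Map into plain, structured-clone-friendly entries.
const myMap = new Map([['abc', new MyUniqueClass('abc', 1)]]);
const mapEntries = [...myMap.entries()].map(([id, inst]) => [id, inst.toJSON()]);
new Worker(path.join(__dirname, '/worker.js'), { workerData: { mapEntries } });

// Worker thread (worker.js): rebuild real instances so the methods exist again.
// const { workerData } = require('worker_threads');
// const rebuilt = new Map(
//   workerData.mapEntries.map(([id, d]) => [id, MyUniqueClass.fromJSON(d)])
// );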
I may be demonstrating my deep, deep ignorance of threading in node, but this is my first attempt at using worker threads, and the documentation says I ought to use a worker pool.
So here we are. I am using the node-worker-threads-pool package. My app will be training several dozen ML models simultaneously using TensorFlow.
The following TypeScript code falls over at runtime, although it transpiles fine:
trainModels = async () => {
  const modelIds: string[] = getModelId();
  for await (const modelId of modelIds) {
    this.dynamicPool.exec({
      task: id => {
        const predictor = new Predictor(id, []);
        // ...do such-and-such
      },
      param: modelId,
    });
  }
};
The runtime complains:
ReferenceError [Error]: Predictor is not defined
So it can't find it. Accessing the class elsewhere in the code is fine, but not within a thread.
I"m guessing this is threading 101 in node. How do I get around this? Perhaps I can construct the class and then create the thread within that class instead? What is the pattern?
Ok, so I've done this. Seems to be happier:
constructor(private pairNames: string[], private data: ModelData[]) {
  for (const pairName of pairNames) {
    this.dynamicPool.exec({
      task: this.execTask,
      param: pairName,
    });
  }
  console.log(`In Predictor constructor`);
}
private execTask = async (pairName: string) => {
  console.log(`In Predictor execTask for ${pairName}`);
};
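For what it's worth, the likely cause of the original ReferenceError: pools like node-worker-threads-pool serialize the task function (via its source text) and re-create it inside the worker, so the task loses its closure and module scope, and Predictor simply doesn't exist there. One way around that is to require what you need inside the task body. A minimal sketch, where './predictor' is a hypothetical placeholder for wherever the class lives:

trainModels = async () => {
  const modelIds: string[] = getModelId();
  for (const modelId of modelIds) {
    this.dynamicPool.exec({
      task: (id: string) => {
        // Runs inside the worker: this file's imports are not visible here,
        // so pull the module in locally.
        const { Predictor } = require('./predictor');
        const predictor = new Predictor(id, []);
        // ...do such-and-such
      },
      param: modelId,
    });
  }
};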
I was trying to create an automation script for work. It is supposed to use multiple Puppeteer instances to process input strings simultaneously.
The task queue and the number of Puppeteer instances are controlled by the generic-pool package.
Strangely, when I run the script on Ubuntu or Debian, it seems to fall into an infinite loop and tries to launch an endless number of Puppeteer instances, while on Windows the output is normal.
const puppeteer = require('puppeteer');
const genericPool = require('generic-pool');
const faker = require('faker');
let options = require('./options');
let i = 0;
let proxies = [...options.proxy];
const pool = genericPool.createPool({
  create: async () => {
    i++;
    console.log(`create instance ${i}`);
    if (!proxies.length) {
      proxies = [...options.proxy];
    }
    let { control = null, proxy } = proxies.pop();
    let instance = await puppeteer.launch({
      headless: true,
      args: [
        `--proxy-server=${proxy}`,
      ],
    });
    instance._own = {
      proxy,
      tor: control,
      numInstance: i,
    };
    return instance;
  },
  destroy: async instance => {
    console.log('destroy instance', instance._own.numInstance);
    await instance.close();
  },
}, {
  max: 3,
  min: 1,
});
async function run(emails = []) {
  console.log('Processing', emails.length);
  const promises = emails.map(email => {
    console.log('Processing', email);
    pool.acquire()
      .then(browser => {
        console.log(`${email} handled`);
        pool.destroy(browser);
      });
  });
  await Promise.all(promises);
  await pool.drain();
  await pool.clear();
}
let emails = [a,b,c,d,e,];
run(emails)
Output
create instance 1
Processing 10
Processing Stacey_Haley52
Processing Polly.Block
create instance 2
Processing Shanny_Hudson59
Processing Vivianne36
Processing Jayda_Ullrich
Processing Cheyenne_Quitzon
Processing Katheryn20
Processing Jamarcus74
Processing Lenore.Osinski
Processing Hobart75
create instance 3
create instance 4
create instance 5
create instance 6
create instance 7
create instance 8
create instance 9
Is it because of my async functions? How can I fix it?
Appreciate your help!
Edit 1: modified according to what @James suggested.
The main problem you are trying to solve:
It is supposed to use multiple puppeteer instances to process input strings simultaneously.
Promise Queue
You can use a rather simple solution that involves a promise queue. We can use the p-queue package to limit concurrency however we wish. I have used this on multiple scraping projects to test things out.
Here is how you can use it.
// emails to handle
let emails = [a, b, c, d, e, ];

// create a promise queue
const PQueue = require('p-queue');

// create queue with concurrency, ie: how many instances we want to run at once
const queue = new PQueue({
  concurrency: 1
});

// single task processor
const createInstance = async (email) => {
  let instance = await puppeteer.launch({
    headless: true,
    args: [
      `--proxy-server=${proxy}`,
    ],
  });
  instance._own = {
    proxy,
    tor: control,
    numInstance: i,
  };
  console.log('email:', email);
  return instance;
};

// add tasks to queue
for (let email of emails) {
  queue.add(async () => createInstance(email));
}
Generic Pool Infinite Loop Problem
I removed all Puppeteer-related code from your sample and it still produced infinite output to the console:
create instance 70326
create instance 70327
create instance 70328
create instance 70329
create instance 70330
create instance 70331
...
Now, if you test a few times, you will see it enters the loop only if something in your code is crashing. The culprit is the pool.acquire() promise, which simply re-queues on error.
To find out what is causing the crash, listen for the following events:
pool.on("factoryCreateError", function(err) {
console.log('factoryCreateError',err);
});
pool.on("factoryDestroyError", function(err) {
console.log('factoryDestroyError',err);
});
There are some issues related to this:
acquire() never resolves/rejects if factory always rejects, here.
About the acquire function in pool.js, here.
.acquire() doesn't reject when resource creation fails, here.
Good luck!
You want to return from your map rather than await; also, don't await inside the destroy call: return the result and you can chain these, e.g.
const promises = emails.map(e => pool.acquire().then(pool.destroy));
Or alternatively, you could just get rid of destroy completely e.g.
pool.acquire().then(b => b.close())
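Putting both suggestions together, the corrected run() from the question might look like this (a sketch):

async function run(emails = []) {
  console.log('Processing', emails.length);
  const promises = emails.map(email =>
    pool.acquire().then(browser => {
      console.log(`${email} handled`);
      return pool.destroy(browser); // returned, so it becomes part of the chain
    })
  );
  await Promise.all(promises); // now actually waits for every acquire/destroy
  await pool.drain();
  await pool.clear();
}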
So no matter what I've read, even once I do it right, I can't seem to get the hang of async and await. For example, I have this in my startup.
startup.js
await CommandBus.GetInstance();
await Consumers.GetInstance();
Debugging jumps to the end of GetInstance for CommandBus (which starts up a channel to RabbitMQ) and starts Consumers.GetInstance(), which fails since the channel is null.
CommandBus.js
export default class CommandBus {
  private static instance: CommandBus;
  private channel: any;
  private conn: Connection;

  private constructor() {
    this.init();
  }

  private async init() {
    // Create connection to rabbitmq
    console.log("Starting connection to rabbit.");
    this.conn = await connect({
      protocol: "amqp",
      hostname: settings.RabbitIP,
      port: settings.RabbitPort,
      username: settings.RabbitUser,
      password: settings.RabbitPwd,
      vhost: "/"
    });
    console.log("connecting channel.");
    this.channel = await this.conn.createChannel();
  }

  static async GetInstance(): Promise<CommandBus> {
    if (!CommandBus.instance) {
      CommandBus.instance = new CommandBus();
    }
    return CommandBus.instance;
  }

  public async AddConsumer(queue: Queues) {
    await this.channel.assertQueue(queue);
    this.channel.consume(queue, msg => {
      this.Handle(msg, queue);
    });
  }
}
Consumers.js
export default class Consumers {
  private cb: CommandBus;
  private static instance: Consumers;

  private constructor() {
    this.init();
  }

  private async init() {
    this.cb = await CommandBus.GetInstance();
    await this.cb.AddConsumer(Queues.AuthResponseLogin);
  }

  static async GetInstance(): Promise<Consumers> {
    if (!Consumers.instance) {
      Consumers.instance = new Consumers();
    }
    return Consumers.instance;
  }
}
Sorry, I realize this is in TypeScript, but I imagine that doesn't matter. The issue occurs specifically when calling cb.AddConsumer, which can be found in CommandBus.js. It tries to assert a queue against a channel that doesn't exist yet. What I don't understand is that, looking at it, I feel like I've covered all the await areas so that it should wait on channel creation. The CommandBus is always fetched as a singleton; I don't know if this poses issues, but again it is one of those areas that I cover with awaits as well. Any help is great, thanks everyone.
You can't really use asynchronous operations in a constructor. The problem is that the constructor needs to return your instance so it can't also return a promise that will tell the caller when it's done.
So, in your Consumers class, awaiting new Consumers() is not doing anything useful. new Consumers() returns a new instance of a Consumers object, so when you await it, it doesn't actually wait for anything. Remember that await does something useful when you await a promise. It doesn't have any special powers to wait for your constructor to be done.
The usual way around this is to create a factory function (which can be a static in your design) that returns a promise that resolves to the new object.
Since you're also trying to make a singleton, you would cache the promise the first time you create it and always return the promise to the caller so the caller would always use .then() to get the finished instance. The first time they call it, they'd get a promise that was still pending, but later they'd get a promise that was already fulfilled. In either case, they just use .then() to get the instance.
I don't know TypeScript well enough to suggest to you the actual code for doing this, but hopefully you get the idea from the description. Turn GetInstance() into a factory function that returns a promise (that you cache) and have that promise resolve to your instance.
Something like this:
private static promise: Promise<Consumers>; // cached so every caller gets the same promise

static async GetInstance(): Promise<Consumers> {
  if (!Consumers.promise) {
    let obj = new Consumers();
    Consumers.promise = obj.init().then(() => obj);
  }
  return Consumers.promise;
}
Then, the caller would do:
Consumers.GetInstance().then(consumer => {
  // code here to use the singleton consumer object
}).catch(err => {
  console.log("failed to get consumer object");
});
You will have to do the same thing in any class that has async operations involved in initializing the object (like CommandBus). If a class derives from a base class that also needs async initialization, each .init() needs to call super.init().then(...) so the base class can get properly initialized too, and so the promise your .init() returns is linked to the base class's. Likewise, if you're creating other objects that themselves have factory functions, your .init() needs to call those factory functions and chain their promises together, so that the promise your .init() returns does not resolve until all dependent objects are done.
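Applied to the CommandBus from the question, the same pattern might look like this (a sketch, not a drop-in: the constructor no longer kicks off async work, and init() returns its promise):

export default class CommandBus {
  private static promise: Promise<CommandBus>; // cached factory promise
  private channel: any;
  private conn: Connection;

  private constructor() {} // no async work in the constructor

  private async init() {
    this.conn = await connect({
      protocol: "amqp",
      hostname: settings.RabbitIP,
      port: settings.RabbitPort,
      username: settings.RabbitUser,
      password: settings.RabbitPwd,
      vhost: "/"
    });
    this.channel = await this.conn.createChannel();
  }

  static GetInstance(): Promise<CommandBus> {
    if (!CommandBus.promise) {
      const obj = new CommandBus();
      // Resolves only after the channel exists.
      CommandBus.promise = obj.init().then(() => obj);
    }
    return CommandBus.promise;
  }
}

With that, await CommandBus.GetInstance() in startup.js genuinely waits for the channel before Consumers.GetInstance() runs.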
I am very new to Node.js and stuck at a place where one function populates an array and the other reads from it.
Is there any simple construct to synchronize this?
The code looks something like below:
let arr = [];

let prod = function () {
  arr.push('test');
};

let consume = function () {
  process(arr.pop());
};
I did find some complicated ways to do it :(
Thanks a lot for any help... ☺️
By synchronizing you probably mean that a push on one side of your application should trigger a pop on the other. That can be achieved with a not-so-trivial event-driven approach, using the Node.js events module.
However, in a simple case you could try another approach with an intermediary object that encapsulates the array operations and uses the provided callbacks to achieve observable behavior.
// Using the Module pattern to make some processor
// which has 2 public methods and private array storage
const processor = () => {
  const storage = [];

  // Consume takes a value and another function
  // that is then passed to the produce method
  const consume = (value, cb) => {
    if (value) {
      storage.push(value);
      produce(cb);
    }
  };

  // Pops the value from storage and
  // passes it to a callback function
  const produce = (cb) => {
    cb(storage.pop());
  };

  return { consume, produce };
};

// Usage
processor().consume(13, (value) => {
  console.log(value);
});
This is really a no-op example, but I think it should create a basic understanding of how to build the "synchronization" mechanism you've mentioned, using observer behavior and essential JavaScript callbacks.
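For comparison, the event-driven approach mentioned at the top could look roughly like this, using Node's built-in events module (console.log stands in for your process() call):

const EventEmitter = require('events');

const arr = [];
const bus = new EventEmitter();

// Consumer: react whenever the producer signals new data.
bus.on('produced', () => {
  console.log('consumed:', arr.pop());
});

// Producer: push a value, then signal that it is available.
function prod(value) {
  arr.push(value);
  bus.emit('produced');
}

prod('test');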
You can use a callback to share data between the two functions:
function prod(callback) {
  const array = [];
  array.push('test1');
  callback(array);
}

function consume() {
  prod(function (array) {
    console.log(array);
  });
}
I'm running Express on Node.js and am wondering how I can effectively pass a single database connection context object between distinct Node modules (think of them sort of like application models).
I'd like to do this to be able to start a database transaction in one model and preserve it across calls to other affected models, for the duration of a single HTTP request.
I've seen people attempt to solve this using per-request database connections exposed as middleware before my route is run (taking from a connection pool, then running another piece of middleware after the routes to return the connection to the pool). That unfortunately means explicitly passing around a context object to all the affected functions, which is inelegant and clunky.
I've also seen people talking about the continuation-local-storage and AsyncWrap modules, but I'm unclear how they can solve my particular problem. I tried working with continuation-local-storage briefly but because I primarily use promises and generators in my code, it wasn't able to pass state back from the run method (it simply returns the context object passed into its callback).
Here's an example of what I'm trying to do:
// player-routes.js
router.post('/items/upgrade', wrap(function* (req, res) {
  const result = yield playerItem.upgrade(req.body.itemId);
  res.json();
}));
// player-item.js
const playerItem = {
  upgrade: Promise.coroutine(function* (user, itemId) {
    return db.withTransaction(function* (conn) {
      yield db.queryAsync('UPDATE player_items SET level = level + 1 WHERE id = ?', [itemId]);
      yield player.update(user);
      return true;
    });
  })
};

module.exports = playerItem;
// player.js
const player = {
  update(user) {
    return db.queryAsync('UPDATE players SET last_updated_at = NOW() WHERE id = ?', [user.id]);
  }
};

module.exports = player;
// db.js
db.withTransaction = function (genFn) {
  return Promise.using(getTransactionConnection(), conn => {
    return conn.beginTransactionAsync().then(() => {
      const cr = Promise.coroutine(genFn);
      return Promise
        .try(() => cr(conn))
        .then(res => {
          return conn.commitAsync().thenReturn(res);
        }, err => {
          return conn.rollbackAsync()
            .then(() => logger.info('Transaction successfully rolled back'))
            .catch(e => logger.error(e))
            .throw(err);
        });
    });
  });
};
A couple of notes here:
The wrap function is just a little piece of wrapper middleware that allows me to use generators/yield in my routes.
The db module is also just a small wrapper around the popular mysql module, that has been promisified.
What I'd like to do, probably in db.queryAsync, is check if there's a conn object set on the current context (which I'd set around the return Promise... call in db.withTransaction). If so, use that connection to do all subsequent database calls until the context goes out of scope.
Unfortunately, wrapping the return Promise... call in the CLS namespace code didn't allow me to actually return the promise -- it just returned the context object, which is incorrect in my case. It looks like most usages of CLS rely on not actually returning anything from inside the run callback. I also looked at cls-bluebird, but that didn't seem to do what I need it to do, either.
Any ideas? I feel like I'm close, but it's just not all hooking up exactly how I need it to.
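One note: Node later shipped AsyncLocalStorage (built on async_hooks), which targets exactly this kind of implicit per-request or per-transaction context, and, unlike the CLS run method, it returns whatever your callback returns, so promises pass straight through. A rough sketch of how db.js could use it; db.pool as the fallback connection source is an assumption:

// db.js (sketch)
const { AsyncLocalStorage } = require('async_hooks');

const als = new AsyncLocalStorage();

// Run the whole transaction inside a context that carries the connection.
db.withTransaction = function (genFn) {
  return Promise.using(getTransactionConnection(), conn => {
    // als.run() returns the callback's return value,
    // so the transaction promise is returned as-is.
    return als.run({ conn }, () => {
      return conn.beginTransactionAsync()
        .then(() => Promise.coroutine(genFn)(conn))
        .then(res => conn.commitAsync().thenReturn(res),
              err => conn.rollbackAsync().throw(err));
    });
  });
};

// Any query issued under that context picks up the transaction's
// connection; otherwise it falls back to the pool.
db.queryAsync = function (sql, params) {
  const store = als.getStore();
  const target = store ? store.conn : db.pool; // db.pool is an assumption
  return target.queryAsync(sql, params);
};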