I'm writing a customizable server script that runs on NW.js (with a visual interface), so that both hosts and clients can easily do what they want. The script originally used web workers, because NW.js had problems supporting worker threads.
Now that NW.js has fixed its worker thread problems, I've been trying to move everything from the web workers into worker threads, but there's a problem: once the main thread receives the answer from the second thread, the latter stops responding to any subsequent message.
For example, running the following code with either NW.js or Node.js itself returns "pong" only once:
const { Worker } = require('worker_threads');
const worker = new Worker('const { parentPort } = require("worker_threads");parentPort.once("message",message => parentPort.postMessage({ pong: message })); ', { eval: true });
worker.on('message', message => console.log(message));
worker.postMessage('ping');
worker.postMessage('ping');
How do I configure the worker so it will keep responding to whatever message it receives after the first one?
Because you are using the EventEmitter.once() method. According to the documentation, this method does the following:
Adds a one-time listener function for the event named eventName. The
next time eventName is triggered, this listener is removed and then
invoked.
If you need your worker to process more than one event, use EventEmitter.on() instead:
const worker = new Worker('const { parentPort } = require("worker_threads");' +
'parentPort.on("message",message => parentPort.postMessage({ pong: message }));',
{ eval: true });
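As a rough end-to-end check of the difference, here is a minimal sketch that sends several messages to a worker registered with on() and collects every reply. The helper name pingWorker is made up for this illustration. One side effect worth knowing: a listener registered with on() keeps the worker alive, so the sketch terminates it explicitly once all replies have arrived.

```javascript
const { Worker } = require('worker_threads');

// Worker source: parentPort.on() keeps the listener registered,
// so every incoming message gets a reply (once() would answer only the first).
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (message) => parentPort.postMessage({ pong: message }));
`;

// Hypothetical helper: sends each message to a fresh worker and
// resolves with all replies once every message has been answered.
function pingWorker(messages) {
  return new Promise((resolve, reject) => {
    if (messages.length === 0) {
      resolve([]);
      return;
    }
    const worker = new Worker(workerSource, { eval: true });
    const replies = [];
    worker.on('message', (reply) => {
      replies.push(reply);
      if (replies.length === messages.length) {
        // The 'message' listener keeps the worker running, so stop it explicitly.
        worker.terminate().then(() => resolve(replies), reject);
      }
    });
    worker.on('error', reject);
    messages.forEach((message) => worker.postMessage(message));
  });
}

pingWorker(['ping', 'ping']).then((replies) => console.log(replies));
```

With once() in the worker source instead, the promise above would never resolve for more than one message, which matches the behaviour described in the question.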
I have an architecture with an Express.js web server that accepts new tasks over a REST API.
Furthermore, I must have another process that creates and supervises many other tasks on other servers (a distributed system). This process should run in the background for a very long time (months, years).
Now the question is:
1)
Should I create one single Node.js app with a task queue such as bull.js/Redis or Celery/Redis that basically launches this long-running task once at the beginning?
Or
2)
Should I have two processes, one for the REST API and a separate daemon process that schedules and manages the tasks in the distributed system?
I heavily lean towards solution 2).
I am facing the same problem now. As we know, Node.js runs JavaScript on a single thread, but we can create workers to run work in parallel or to handle functions that take a long time, so that they don't affect the main server. Fortunately, Node.js supports multi-threading through the worker_threads module.
Take a look at this example:
const {
  Worker, isMainThread, parentPort, workerData
} = require('worker_threads');
if (isMainThread) {
module.exports = function parseJSAsync(script) {
return new Promise((resolve, reject) => {
const worker = new Worker(__filename, {
workerData: script
});
worker.on('message', resolve);
worker.on('error', reject);
worker.on('exit', (code) => {
if (code !== 0)
reject(new Error(`Worker stopped with exit code ${code}`));
});
});
};
} else {
const { parse } = require('some-js-parsing-library');
const script = workerData;
parentPort.postMessage(parse(script));
}
https://nodejs.org/api/worker_threads.html
Search for some articles about multi-threading in Node.js, but remember one thing: plain JavaScript state is not shared between threads. You can use a message broker such as Kafka, RabbitMQ (my recommendation), or Redis to handle such needs.
Kafka is quite difficult to configure in production.
RabbitMQ is good because it can also persist messages and queues to local storage. Personally, though, I could not find a proper solution for load balancing these threads. Maybe this is not your answer, but I hope it gives you some clues.
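To make the point about state concrete, here is a small sketch showing that an object passed as workerData is copied into the worker, so a change made there is not visible in the main thread. The helper name runWorkerWithState is invented for this illustration. (SharedArrayBuffer is the exception that does give shared memory, but ordinary objects are cloned.)

```javascript
const { Worker } = require('worker_threads');

// Worker source: increments its own copy of the counter and reports it back.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  workerData.counter += 1;
  parentPort.postMessage(workerData.counter);
`;

// Hypothetical helper: starts a worker with `state` as workerData and
// resolves with the value seen by the worker and the value left in main.
function runWorkerWithState(state) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: state });
    worker.once('message', (seenByWorker) => {
      // state.counter is read in the main thread after the worker changed its copy.
      resolve({ seenByWorker, seenByMain: state.counter });
    });
    worker.on('error', reject);
  });
}

runWorkerWithState({ counter: 0 }).then((result) => console.log(result));
```

The worker sees the incremented value while the main thread's object stays unchanged, which is why results have to travel back through messages or through an external store or broker.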
I'm using the Node.js cluster module to run multiple workers.
I created a basic architecture with a single MASTER process, which is basically an Express server handling multiple requests. The main task of MASTER is to write incoming request data into a Redis instance. The other workers (numOfCPUs - 1) are non-master, i.e. they don't handle any requests; they are just consumers. I have two features, ABC and DEF, and I distribute the non-master workers evenly across the features by assigning each one a type.
For example, on an 8-core machine:
1 will be the MASTER instance handling requests via the Express server.
The remaining (8 - 1 = 7) will be distributed evenly: 4 to feature:ABC and 3 to feature:DEF.
The non-master workers are basically consumers, i.e. they read from Redis, to which only the MASTER worker can write.
Here's the code for the same:
if (cluster.isMaster) {
// Fork workers.
for (let i = 0; i < numCPUs - 1; i++) {
ClusteringUtil.forkNewClusterWithAutoTypeBalancing();
}
cluster.on('exit', function(worker) {
console.log(`Worker ${worker.process.pid}::type(${worker.type}) died`);
ClusteringUtil.removeWorkerFromList(worker.type);
ClusteringUtil.forkNewClusterWithAutoTypeBalancing();
});
// Start consuming on server-start
ABCConsumer.start();
DEFConsumer.start();
console.log(`Master running with process-id: ${process.pid}`);
} else {
console.log('CLUSTER type', cluster.worker.process.env.type, 'running on', process.pid);
if (
cluster.worker.process.env &&
cluster.worker.process.env.type &&
cluster.worker.process.env.type === ServerTypeEnum.EXPRESS
) {
// worker for handling requests
app.use(express.json());
...
}
}
Everything works fine except the consumers reading from Redis.
Since there are multiple consumers for a particular feature, each one reads the same message and starts processing it individually, which is not what I want. Say there are 4 consumers: 1 is marked as busy and cannot consume until it is free, and 3 are available. Once MASTER writes a message for that feature to Redis, all 3 available consumers of that feature start consuming it. This means that a single message is processed as many times as there are available consumers.
const stringifedData = JSON.stringify(req.body);
const key = uuidv1();
const asyncHsetRes = await asyncHset(type, key, stringifedData);
if (asyncHsetRes) {
await asyncRpush(FeatureKeyEnum.REDIS.ABC_MESSAGE_QUEUE, key);
res.send({ status: 'success', message: 'Added to processing queue' });
} else {
res.send({ error: 'failure', message: 'Something went wrong in adding to queue' });
}
The consumer simply accepts messages and stops when it is busy:
module.exports.startHeartbeat = startHeartbeat = async function(config = {}) {
if (!config || !config.type || !config.listKey) {
return;
}
heartbeatIntervalObj[config.type] = setInterval(async () => {
await asyncLindex(config.listKey, -1).then(async res => {
if (res) {
await getFreeWorkerAndDoJob(res, config);
stopHeartbeat(config);
}
});
}, HEARTBEAT_INTERVAL);
};
Ideally, a message should be read by only one consumer of that particular feature. After consuming, that consumer is marked as busy so it won't consume further until it is free (I have handled this). The next message should then be processed by only one of the other available consumers.
Please help me tackle this problem. Again, I want each message to be read by only one free consumer, while the remaining free consumers wait for a new message.
Thanks
I'm not sure I fully understand your Redis consumer architecture, but I feel it conflicts with the way Redis itself is meant to be used. What you're trying to achieve is essentially queue-based messaging with the ability to acknowledge a message once it is done.
Redis has its own pub/sub feature, but it is built on a fire-and-forget principle. It doesn't distinguish between consumers: it just sends the data to all of them, assuming that handling the incoming data is up to their own logic.
I recommend using a queue server such as RabbitMQ. You can achieve your goal with features that AMQP 0-9-1 supports: message acknowledgment, consumer prefetch count, and so on. You can set up your cluster with very flexible configuration, for example: "I want X consumers, each handling 1 unique (!) message at a time, and each receiving a new one only after it lets the server (RabbitMQ) know that it has finished processing the previous message." This is highly configurable and robust.
However, if you want to go serverless with a fully managed service, so that you don't have to provision virtual machines or anything else to run a message queue server of your choice, you can use AWS SQS. It has a fairly similar API and feature list.
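As a side note on the code in the question, and not a replacement for the broker recommendation above: LINDEX only reads a list element and leaves it in place, so every free consumer that polls sees the same key, whereas RPOP (or LPOP/BRPOP) removes the element atomically and hands it to a single caller. The sketch below illustrates that difference with an in-memory stand-in for Redis so it runs without a server. createFakeRedis, claimByPeeking, claimByPopping and asyncRpop are hypothetical names modelled on the asker's asyncLindex wrapper, and popping alone gives no acknowledgment or redelivery if a consumer crashes.

```javascript
// In-memory stand-in for a Redis client, only so this sketch runs without a server.
// asyncLindex and asyncRpop are hypothetical promisified wrappers.
function createFakeRedis(lists) {
  const store = new Map(Object.entries(lists).map(([key, items]) => [key, [...items]]));
  return {
    // LINDEX: read an element without removing it (negative index counts from the tail).
    asyncLindex: async (key, index) => {
      const list = store.get(key) || [];
      const i = index < 0 ? list.length + index : index;
      return list[i] ?? null;
    },
    // RPOP: remove and return the tail element, or null when the list is empty.
    asyncRpop: async (key) => {
      const list = store.get(key) || [];
      return list.length > 0 ? list.pop() : null;
    },
  };
}

// Every free consumer peeks at the tail: all of them get the same key.
function claimByPeeking(redis, listKey, consumerCount) {
  return Promise.all(
    Array.from({ length: consumerCount }, () => redis.asyncLindex(listKey, -1))
  );
}

// Every free consumer pops the tail: each key goes to at most one of them.
function claimByPopping(redis, listKey, consumerCount) {
  return Promise.all(
    Array.from({ length: consumerCount }, () => redis.asyncRpop(listKey))
  );
}

claimByPeeking(createFakeRedis({ queue: ['job-1'] }), 'queue', 3).then(console.log);
claimByPopping(createFakeRedis({ queue: ['job-1'] }), 'queue', 3).then(console.log);
```

Since the producer in the question appends with RPUSH, popping from the tail processes the newest message first; popping from the head with LPOP would give first-in, first-out order instead.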
Hope it helps!
I have an implementation in Node where an API, when called, does some processing and waits for an event from another function before returning the response. This works fine when run locally and when running on a single instance in AWS, but when multiple instances are involved there are issues. I assume this is because the API is called on one instance and the event is emitted on another. Is there any way to share the listeners and emitters across all instances?
Update:
After some research I found that using an Application Load Balancer with some routing logic can help with this issue. I am marking the answer below as correct because, while it did not help me with AWS auto scaling, it did help me find an alternate solution to my problem.
As far as I understand, you think that an event emitted in one process is being handled in a different process. As far as I know that can never happen, because each process has its own memory and an EventEmitter's events exist only within the process that created it.
I have added sample code that demonstrates what I mean. If you post the code you are referring to, we could check what went wrong.
const cluster = require("cluster");
const EventEmitter = require("events");
if (cluster.isMaster) {
cluster.fork();
const myEE = new EventEmitter();
myEE.on("foo", arg =>
console.log("emitted from ", arg, "received in master")
);
setTimeout(() => {
myEE.emit("foo", "master");
}, 1000);
} else {
const myEE = new EventEmitter();
myEE.on("foo", arg => console.log("emitted from", arg, "received in worker"));
setTimeout(() => {
myEE.emit("foo", "client");
}, 2000);
}
I'm building a background task management system with RabbitMQ and Node.js using the amqplib module.
Some of the tasks are really CPU-intensive, so if I launch a lot of them with only a few workers up, my server can get killed (for using too much CPU).
I'm wondering if there is a way to create an AMQP queue so that my consumers only consume one task from this queue at a time (i.e. until an ack or a reject, do not send another task of this kind to this consumer).
Or should I handle this myself in the code (maybe by keeping a flag in my worker that says I'm handling a task from this queue, and rejecting all other tasks from this queue while I'm executing it)?
Here is my sample code.
I'm creating the AMQP connection like this:
const amqpConn = require('amqplib').connect('amqp://localhost');
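My queue name is tasks:
amqpConn.then((conn) => {
  return conn.createChannel();
}).then((ch) => {
  return ch.assertQueue('tasks').then((ok) => {
    // `i` is the task index from the surrounding code, which is not shown here.
    // Buffer.from() replaces the deprecated `new Buffer()` constructor.
    ch.sendToQueue('tasks', Buffer.from(`something to do ${i}`));
  });
}).catch(console.warn);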
And here is my consumer (I guess this is where I should do the work to limit this queue to one concurrent task):
amqpConn.then((conn) => {
return conn.createChannel();
}).then((ch) => {
return ch.assertQueue('tasks').then((ok) => {
return ch.consume('tasks', (msg) => {
if (msg !== null) {
console.log(msg.content.toString());
ch.ack(msg);
}
});
});
}).catch(console.warn);
Thanks a lot !
I'm wondering if there is a way to create an amqp queue so that my consumers will only consume one task of this queue at a time
If this is what you really need, then yes: simply have exactly one consumer and declare the queue exclusive. That way, one task is consumed at a time.
I think I got it working by:
creating a channel per queue
using the prefetch count of the channel to limit concurrency on a per-consumer basis
https://www.rabbitmq.com/consumer-prefetch.html
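The two steps above can be sketched roughly as follows, assuming an amqplib channel created with the promise API. The function name consumeOneAtATime and the handler argument are placeholders for this illustration. The channel is passed in as a parameter so the sketch does not need a running broker to be loaded.

```javascript
// Starts a consumer that holds at most one unacknowledged message at a time.
// `ch` is expected to be an amqplib channel dedicated to this queue;
// `handler` is a hypothetical async function that performs the CPU-heavy task.
async function consumeOneAtATime(ch, queue, handler) {
  await ch.assertQueue(queue);
  // With a prefetch count of 1, the broker delivers the next message to this
  // consumer only after the previous one has been acked or nacked.
  await ch.prefetch(1);
  return ch.consume(queue, async (msg) => {
    if (msg === null) return; // the consumer was cancelled by the broker
    try {
      await handler(msg.content.toString());
      ch.ack(msg);
    } catch (err) {
      // Reject without requeueing so a failing task is not redelivered forever.
      ch.nack(msg, false, false);
    }
  });
}
```

With the connection from the question, this could be wired up as amqpConn.then((conn) => conn.createChannel()).then((ch) => consumeOneAtATime(ch, 'tasks', runTask)), where runTask stands for whatever function executes a task. The important detail is that ack is called only after the work has finished; acknowledging immediately on receipt, as in the consumer in the question, would let the broker send the next task right away.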
I have encountered a weird problem in one of my projects. I am creating one WCF channel and trying to consume it from multiple threads. The service I am targeting is shut down, so I expect to get an exception after the "Open timeout" (30 seconds in my case) at most. But what I have seen is that the first two calls to the channel finish (with an exception) really quickly, while all the other calls finish only after 20 minutes (my receive timeout).
I am using the same channel because I don't want to wait for the channel to open for each request (it can take a few seconds with security and high latency). I have read that a channel is thread-safe, so I didn't think it would be a problem.
I am using .NET 4.
Code sample:
EndpointAddress address = new EndpointAddress("net.tcp://localhost:9000/SomeService");
var netTcpBinding = new NetTcpBinding();
var channelFactory = new ChannelFactory<IService>(netTcpBinding, address);
IService channel = channelFactory.CreateChannel();
Parallel.For(0, 10, new ParallelOptions{MaxDegreeOfParallelism = 10}, i =>
{
try
{
channel.SomeOperation();
}
catch
{
}
});
I have tried to Close/Abort/Dispose the channel in the catch block but it didn't help.
Does anyone have any idea why this happens and how to fix it?
A channel only has one connection, so even if it is thread-safe, you won't get the concurrency benefits of using Parallel. Create a channel per iteration and make sure you close the channel after each request, or you'll exhaust the connection pool on your machine with undisposed connections retained by the channels.
I didn't find a standard solution, but what I did find is that when I use async calls the problem doesn't happen (I tested it several times with a 100-iteration loop).
Parallel.For(0, 10, new ParallelOptions{MaxDegreeOfParallelism = 10}, i =>
{
try
{
var result = channel.BeginSomeOperation(null, null);
channel.EndSomeOperation(result);
}
catch
{
}
});
Try this instead.
var tasks = from i in Enumerable.Range(0, 10)
select Task.Factory.FromAsync(channel.BeginSomeOperation, channel.EndSomeOperation, null);
var results = from t in TaskEx.WhenAll(tasks)
select t.Result;
PS TaskEx is in the Async targeting pack.