I am working in nodejs with express for a web app that communicates with mongodb frequently. I am currently running production with my own job queue system that only begins processing a job once the previous job has been completed (an approach that seems to be taken by kue).
To me, this seems wildly inefficient. I was hoping for a more asynchronous job queue, so I am looking for advice on how other nodejs developers queue their jobs and structure their processing.
One of my ideas is to process any jobs that are received immediately and return the resulting data in the order of addition.
Also to be considered: currently each user has their own independent job queue instance. Is this normal practice? Is there any reason why this should not be the case (i.e., should all users send jobs to one universal queue)?
Any comments/advice are appreciated.
Why do you build your own queue system? You did quite a lot of work to serialize an async queue with addLocalJob.
Why don't you just do something like
on('request', function(req, res) { queryDB(parameter, function(result) { res.send(result) }) })
? Full parallel access, no throttle, no (async) waiting.
If you really want to do it "by hand" in your own code, why not execute the first n elements of your trafficQueue instead of only the first?
If you want to throttle the DB - two ways:
use a library like async and the function parallelLimit
connect to your mongodb with http(s) and use Node's built-in http.globalAgent.maxSockets.
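As a minimal sketch of the throttling idea, the following runs jobs in parallel with a cap on how many are in flight, and returns the results in the order the jobs were added. It is hand-written rather than taken from the async library; runLimited and fakeQuery are hypothetical names, and fakeQuery only stands in for a real database call.

```javascript
// Run async tasks with at most `limit` in flight; results keep input order.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task nobody has claimed yet

  // Each worker repeatedly claims the next unclaimed task and awaits it.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = [];
  for (let i = 0; i < Math.min(limit, tasks.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// Hypothetical stand-in for a database query that takes `ms` milliseconds.
function fakeQuery(value, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Usage: three jobs, at most two running at once.
runLimited(
  [() => fakeQuery('a', 30), () => fakeQuery('b', 10), () => fakeQuery('c', 20)],
  2
).then((results) => console.log(results)); // results arrive in input order
```

Storing each result at its task's index is what keeps the order of addition, even though the second job finishes before the first.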
Hope this helps.
Related
I have a restify API similar to this (written in pseudocode):
server.post('api/import')
{
database.write(status of this file.id is pending)
fileModification(req.file)
res.status(200)
res.send(import has started)
} //here I do some file modifications and then i import it to database
server.get('api/import_info')
{
database.select(file status)
} //here I want to see status (is file imported or pending(process is not finished yet))
//In another module after import is finished I update database to
database.write(file.id import status is completed)
Importing a file is a process that takes about 2 minutes. Even though I don't wait for it to finish in api/import, my API is blocked when I try to call the 'info' route.
Is it possible that the event loop is blocked, or that the connection is not properly closed?
Thanks in advance
I have some ideas about your question.
You can use the cluster module. It can create one worker process per CPU core, so when one process is blocked, the other processes can still handle requests.
You can fork a new process in your API handler and let the new process handle your task.
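A minimal sketch of moving the heavy work off the main event loop follows. To keep the example in a single file it uses Node's worker_threads module instead of a forked process, which is a different mechanism from the one suggested above but serves the same goal. importInBackground is a hypothetical name, and the busy loop only stands in for the real file import.

```javascript
const { Worker } = require('worker_threads');

// Source of the worker, inlined so the example fits in one file.
// The busy loop is a hypothetical stand-in for the CPU-heavy import.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = 0; i < workerData.iterations; i++) sum += i;
  parentPort.postMessage({ fileId: workerData.fileId, status: 'completed', sum });
`;

// Start the import in a worker thread and resolve when it reports back.
// The main thread stays free to answer other requests in the meantime.
function importInBackground(fileId, iterations) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { fileId, iterations },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Usage: respond to the client right away, update the status when done.
importInBackground(42, 1e6).then((info) => {
  console.log('file', info.fileId, 'is', info.status);
});
```

In the import route, the handler would write the pending status, call importInBackground without awaiting it, send the response, and write the completed status when the promise resolves.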
We have recently started working with TypeScript for an application where queued communication is expected between a server and one or more clients.
To achieve the queued communication, we are trying to use the ZeroMQ library version 4.6.0 as an npm package: npm install -g zeromq and npm install -g @types/zeromq.
The exact scenario :
The client is going to send thousands of messages to the server over ZeroMQ. The server will in turn respond with an acknowledgement message for each incoming message from the client. Based on the acknowledgement message, the client will send the next message.
ZeroMQ pattern used :
The ROUTER/DEALER pattern (we cannot use any other pattern).
Client side code :
import Zmq = require('zeromq');
let clientSocket : Zmq.Socket;
let messageQueue = [];
export class ZmqCommunicator
{
constructor(connString : string)
{
clientSocket = Zmq.socket('dealer');
clientSocket.connect(connString);
clientSocket.on('message', this.ReceiveMessage);
}
public ReceiveMessage = (msg) => {
var argl = arguments.length,
envelopes = Array.prototype.slice.call(arguments, 0, argl - 1),
payload = arguments[0];
var json = JSON.parse(msg.toString('utf8'));
if(json.type != "error" && json.type =='ack'){
if(messageQueue.length>0){
this.Dispatch(messageQueue.splice(0, 1)[0]);
}
}
}
public Dispatch(message) {
clientSocket.send(JSON.stringify(message));
}
public SendMessage(msg: Message, isHandshakeMessage : boolean){
// The if condition will be called only once for the first handshake message. For all other messages, the else condition will be called always.
if(isHandshakeMessage == true){
clientSocket.send(JSON.stringify(msg));
}
else{
messageQueue.push(msg);
}
}
}
On the server side, we already have a ROUTER socket configured.
The above code is pretty straightforward. The SendMessage() function is essentially called for thousands of messages, and the code works, but with heavy memory consumption.
Problem :
Because ZeroMQ behaves asynchronously, the client has to wait for the ReceiveMessage() callback before it can send a new message to the ZeroMQ ROUTER (as the flow into Dispatch() shows).
Based on our limited knowledge of TypeScript and of using ZeroMQ with it, the problem is this: the default thread running the TypeScript code creates the 1000+ messages and passes them to SendMessage(). After sending the first (handshake) message, it keeps creating and passing on more messages. SendMessage() does not send them; it only queues them, because we want to interpret the acknowledgement sent by the ROUTER socket and send the next message only after that. Until all 1000+ messages have been created and queued, control never reaches the ReceiveMessage() callback.
In other words, ReceiveMessage() is called only after the default thread has finished calling SendMessage() for all 1000+ messages and has nothing else left to do.
Because ZeroMQ does not provide any synchronous mechanism of sending/receiving data using the ROUTER/DEALER, we had to use a queue, the messageQueue object in the code above.
This mechanism loads a huge messageQueue (1000+ messages) into memory and dequeues only once the default thread finally gets to the ReceiveMessage() call. The situation will only get worse with 10000+ or more messages.
Questions :
We have validated this behavior, so we are confident in the understanding explained above. Is there any gap in our understanding of either/or TypeScript or ZeroMQ usage?
Is there any concept like a blocking queue or limited-size array in TypeScript that accepts a limited number of entries and blocks new additions until existing ones are dequeued (which essentially implies that the default thread pauses until the ReceiveMessage() callback is called and dequeues entries)?
Is there any synchronous ZeroMQ methodology (we have used it in a similar setup for C#, where we poll on ZeroMQ and receive the data synchronously)?
Any leads on using multi-threading for such a scenario? We are not sure whether TypeScript supports multi-threading to a good extent.
Note : We have searched on many forums and have not got any leads any where. The above description may have multiple questions inside one question (against the rules of stackoverflow forum); but for us all of these questions are interlinked to using ZeroMQ effectively in Typescript.
Looking forward to getting some leads from the community.
Welcome to ZeroMQ
If this is your first read about ZeroMQ, feel free to first take a five-second read about the main conceptual differences in the [ ZeroMQ hierarchy in less than a five seconds ] section.
1 ) ... Is there any gap in our understanding of either/or TypeScript or ZeroMQ usage ?
While I cannot speak to the TypeScript part, let me mention a few details that may help you move forward. ZeroMQ is principally a broker-less, asynchronous signalling/messaging framework, but it has many flavours of use, and there are tools to enforce both synchronous and asynchronous cooperation between the application code and the ZeroMQ Context()-instance, which is the cornerstone of all the services design.
The native API provides means to define whether a given call ought to block until message processing across the Context()-instance's boundary has completed or, on the contrary, whether the call ought to obey ZMQ_DONTWAIT and return control to the caller asynchronously, irrespective of whether the operation has completed.
As additional tricks, one may configure ZMQ_SNDHWM + ZMQ_RCVHWM and other related .setsockopt()-options to obtain a specific blocking / silent-dropping behaviour.
Because ZeroMQ does not provide any synchronous mechanism of sending/receiving data
Well, ZeroMQ API does provide means for a synchronous call to .send()/.recv() methods, where the caller is blocked until any feasible message could get delivered into / from a Context()-engine's domain of control.
Obviously, the TypeScript language binding/wrapper is responsible for exposing these native API services to your hands.
3 ) Is there any synchronous ZeroMQ methodology (We have used it in similar setup for C# where we poll on ZeroMQ and received the data synchronously) ?
Yes, there are several such :
- the native API, if not instructed by a ZMQ_DONTWAIT flag, blocks until a message can get served
- the native API provides a Poller()-object that can .poll(); given -1 as the timeout, it blocks the caller until one of the awaited events arrives at the Poller()-instance.
Again, the TypeScript language binding/wrapper is responsible for exposing these native API services to your hands.
... Large memory consumption ...
Well, this may signal poor resource management. ZeroMQ messages, once allocated, ought also to be freed where appropriate. Check your TypeScript code and the sources of the TypeScript language binding/wrapper to see whether resources are systematically disposed of and freed from memory.
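On the question about a blocking queue: Node.js runs application code on a single thread, so a producer cannot block the way it would in C#, but it can await. The following is a minimal sketch of a bounded queue whose push() resolves only once there is room, so an async producer pauses instead of filling memory. BoundedQueue is a hypothetical helper, not part of the zeromq package, and the sketch assumes the code that creates the messages can be written as an async function.

```javascript
// A bounded FIFO queue: push() resolves only when there is room.
class BoundedQueue {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.waitingProducers = []; // resolve callbacks of producers paused in push()
  }

  // Waits (without blocking the event loop) while the queue is full.
  async push(item) {
    while (this.items.length >= this.capacity) {
      await new Promise((resolve) => this.waitingProducers.push(resolve));
    }
    this.items.push(item);
  }

  // Meant to be called from the 'ack' handler: take the next message
  // and wake one paused producer, which then re-checks for room.
  shift() {
    const item = this.items.shift();
    const wake = this.waitingProducers.shift();
    if (wake) wake();
    return item;
  }

  get length() {
    return this.items.length;
  }
}

// Usage: the producer creates at most `capacity` messages ahead of the acks.
async function produce(queue, total) {
  for (let i = 0; i < total; i++) {
    await queue.push({ id: i }); // pauses here whenever the queue is full
  }
}
```

Because the producer yields at every await, the event loop gets a chance to run the message callback between pushes, which is what lets the queue drain while messages are still being created.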
I'm still looking for a solution to implement an asynchronous call before returning a response. In other words, I have a long asynchronous process which should start running before the response is returned; the user should not have to wait a long time for this process to finish:
$data = ....
...//Here call to an asynchronous function <<----
return $this->getSuccessResponse($data);
I tried with Events, Thread, Process, but no result.
What should I do? (something except RabbitMQ)
You can use a queuing system like Beanstalk. With this bundle LeezyPheanstalkBundle you can manage the queues.
In the controller, insert the job in the queue. And, in a command running with supervisor, execute your task.
Edit:
You can use an EventSubscriber
I have a few Azure functions sharing the same code, so I created a batch file for publishing my libs. It is a simple bat file: for each of my Azure functions, it connects to a host and uses robocopy to synchronize folders.
However, each time I publish, the currently running functions are dropped. I want to avoid that. Is there a way to let a running function finish its work naturally?
I think it's possible because when I publish, I am not overwriting the DLL that is actually running; I only copy files into the <azure-function-url>/site/wwwroot folder.
NOTE:
The function calls an async method without await. The async method has not completed its work when the source changes. (I had not focused on this problem; thanks to Matt for the comment, which opened my eyes.)
The functions runtime is designed to allow functions to gracefully exit in the event of host restarts, see here.
Not awaiting your async calls is an antipattern in functions, as we won't be able to track your function execution. We use the returned Task to determine when your function has finished. If you do not return a Task, we assume your function has completed when it returns.
In your case, that means we will kill the host on restarts while your orphaned asynchronous calls are running. If you fail to await async calls, we also don't guarantee successful:
Logging
Output bindings
Exception handling
Do: static async Task Run(...){ await asyncMethod(); }
Don't: static void Run(...){ asyncMethod(); }
I'm using MVC4 ApiController to upload data to Azure Blob. Here is the sample code:
public Task PostAsync(int id)
{
return Task.Factory.StartNew(() =>
{
// CloudBlob.UploadFromStream(stream);
});
}
Does this code even make sense? I think ASP.NET is already processing the request in a worker thread, so running UploadFromStream in another thread doesn't seem to make sense since it now uses two threads to run this method (I assume the original worker thread is waiting for this UploadFromStream to finish?)
So my understanding is that async ApiController only makes sense if we are using some built-in async methods such as HttpClient.GetAsync or SqlCommand.ExecuteReaderAsync. Those methods probably use I/O Completion Ports internally so it can free up the thread while doing the actual work. So I should change the code to this?
public Task PostAsync(int id)
{
// only to show it's using the proper async version of the method.
return TaskFactory.FromAsync(BeginUploadFromStream, EndUploadFromStream...)
}
On the other hand, if all the work in the Post method is CPU/memory intensive, then the async version PostAsync will not help throughput of requests. It might be better to just use the regular "public void Post(int id)" method, right?
I know it's a lot of questions. Hopefully it will clarify my understanding of async usage in ASP.NET MVC. Thanks.
Yes, most of what you say is correct. Even down to the details with completion ports and such.
Here is a tiny error:
I assume the original worker thread is waiting for this UploadFromStream to finish?
Only your task thread is running. You're using the async pipeline after all. It does not wait for the task to finish, it just hooks up a continuation. (Just like with HttpClient.GetAsync).