API with Work Queue Design Pattern - node.js

I am building an API that is connected to a work queue and I'm having trouble with the structure. What I'm looking for is a design pattern for a worker queue that is interfaced via an API.
Details:
I'm using a Node.js server and Express to create an API that takes a request and returns JSON. These requests can take a long time to process (they are very data intensive), which is why we use a queuing system (RabbitMQ).
So for example, let's say I send a request to the API that will take 15 min to process. The Express API formats the request and puts it in a RabbitMQ (AMQP) queue. The next available worker takes the request off the queue and starts to process it. After it's done (in this case 15 min) it saves the data into MongoDB. ... now what?
My issue is, how do I get the finished data back to the caller of the API? The caller is a completely separate program that contacts the API via something like an Ajax request.
The worker will save the processed data into a database but I have no way to push back to the original calling program.
Does anyone have any resources on combining an API with a work queue?
Please and thank you.

On the client's initiating call, you should return a task identifier that will persist with the data all the way to MongoDB.
You can then provide an additional API method for the client to check the task's status. This method should take a single parameter, the task identifier, and check whether a document with that identifier has made it into your collection in MongoDB. Return false if it doesn't exist yet, true when it does.
The client will have to poll the task status API method repeatedly (perhaps at a one-minute interval) until it returns true.

Related

Generate big data in excel or pdf using REST API

I'm trying to generate an Excel report file in a microservice using a REST API.
With a REST API, if the generation process takes a long time, the connection will time out for the users.
Is there any best practice or architecture pattern for this purpose?
Example: if the data includes 10 columns with 1 million rows, the generation process might take 30 seconds. It also depends on what technical resources we have.
You should do heavy tasks asynchronously. The client should just trigger the process and not wait for completion. The question then becomes how the client will get the finished copy of the Excel file. There are two ways:
In the response to the initiating call, the server returns a job id. The client keeps polling for the status of that job id; whenever the job completes, it fetches the file.
A notification mechanism like Socket.io, where the server notifies the client whenever the job is done. After getting the notification, the client downloads the processed file.

Handling Response on a Worker Process in NodeJs

I am trying to design a service following the Command Query Responsibility Segregation Pattern (CQRS) in NodeJs. To handle segregation, I am taking the following approach:
Create separate workers for querying and executing commands
Expose them using a REST API
The REST API has been designed with ExpressJs. All endpoints starting with 'update', 'create' and 'delete' keywords are treated as commands; all endpoints with 'get' or 'find' are treated as queries.
When a request reaches its designated handler, one of the following occurs:
If it's a command, a response is sent immediately after delegating the task to a worker process; other services are notified by generating appropriate events when the master process receives a completion message from the worker.
If it's a query, the response is handled by a designated worker that uses a database connection reference, passed in as an argument, to fetch and send the query result.
For (2) above, I am trying to create a mechanism that somehow "passes" the response object to the worker, which can then complete the request. Can this be done by "cloning" the response object and passing it as plain arguments? If not, what is the preferred way of achieving this?
I think you are better off in (2) passing the query off to a worker process, which returns the result to the master process, which then sends back the response.
First of all, you don't really want to give the worker processes "access" to the outside. They should be all internal workers, managed by the master process.
Second, the Express server's job is to receive requests, do something with them, then return a result. It seems like over-complication to try to pass the communication off to a worker.
If you are really worried about your Express server getting overwhelmed with requests, you should consider something like Docker to create a "swarm" of express instances.

How to get the status of all requests to one API in nodejs

I want to get the API server status in Node.js. I'm using Node.js to expose an endpoint: "api/request?connId=50&timeout=90". This API keeps the request running for the provided time on the server side, and after that time has elapsed it returns status OK. When there are multiple connection ids and timeouts, I want the API to return all the running requests on the server with their time left to completion, something like below, where 4 and 8 are the connIds and 25 and 15 are the seconds remaining until the requests complete:
{"4":"25","8":"15"}
Please help.
A Node.js server uses an asynchronous model on a single thread, which means that at any time only one request (connId) is being executed by Node (unless you run multiple Node.js instances, but let's keep the scenario simple and ignore that case).
When a request is processed (its handler code is running), it may start an async task such as reading a file and continue execution. The handler code runs without waiting for the async task, and once it finishes, from Node.js's point of view the request handling itself is done; handling the async task's result is a separate thing at a separate time, and Node does not track its progress.
Thus, in order to return the remaining time of all requests (I assume this means the remaining time of each request's async task, since the remaining time of handler code execution does not make sense), there must be some place to store information about all requests, including:
the request's connId and startTime (the time when the request was received);
the request's timeout value, passed as a parameter in the URL;
the request's estimated remaining time. This information is task specific and must be retrieved from the services behind the async task (you can poll periodically using setInterval or have those services push the latest remaining time). Node.js itself doesn't know the remaining time of any async task.
In this way, you can track all running requests and their remaining time. Before a request is returned, you can check the above "some place" to calculate all requests' remaining time. This "some place" could be a global variable, a memory database such as Redis, or even a plain database such as MySQL.
Please note: the calculated remaining time will not be exact, as the read and calculation themselves cost time and introduce error.
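The "some place" described above can be sketched as a global Map keyed by connId; the function names are illustrative, and the output shape matches the question's example:

```javascript
// Track in-flight requests and report their approximate remaining seconds,
// e.g. {"4":"25","8":"15"} as in the question.
const inflight = new Map(); // connId -> { startTime, timeoutSec }

function trackRequest(connId, timeoutSec) {
  // Call this when the request arrives; startTime records receipt time.
  inflight.set(connId, { startTime: Date.now(), timeoutSec });
}

function finishRequest(connId) {
  // Call this just before the request's response is sent.
  inflight.delete(connId);
}

function remainingTimes(now = Date.now()) {
  // Remaining seconds per request; approximate, as the note above warns.
  const status = {};
  for (const [connId, { startTime, timeoutSec }] of inflight) {
    const elapsed = (now - startTime) / 1000;
    status[connId] = String(Math.max(0, Math.round(timeoutSec - elapsed)));
  }
  return status;
}
```

A status endpoint would simply return `JSON.stringify(remainingTimes())`. For multiple Node instances, move the Map into Redis so all instances see the same state.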

Azure Storage Queue - correlate response to request

When a Web Role places a message onto a Storage Queue, how can it poll for a specific, correlated response? I would like the back-end Worker Role to place a message onto a response queue, with the intent being that the caller would pick the response up and go from there.
Our intent is to leverage the Queue in order to offload some heavy processing onto the back-end Worker Roles in order to ensure high performance on the Web Roles. However, we do not wish to respond to the HTTP requests until the back-end Workers are finished and have responded.
I am actually in the middle of making a similar decision. In my case I have a WCF service running in a web role which should offload calculations to worker roles. When the result has been computed, the web role will return the answer to the client.
My basic data-structure knowledge tells me that I should avoid using something designed as a queue in a non-queue way; a queue should always be serviced in a FIFO-like manner. So if queues are used for both requests and responses, the threads waiting to return data to the client have to wait until their calculation's message reaches the "top" of the response queue, which is not optimal. And if responses are stored in Azure tables instead, the threads poll for messages, creating unnecessary overhead.
What I believe is a possible solution to this problem is using a queue for the requests. This enables use of the competing-consumers pattern and thereby load balancing. On messages sent into this queue you set the correlationId property. For the reply, the pub/sub part ("topics") of Azure Service Bus is used together with a correlation filter. When your back end has processed the request, it publishes the result to a response topic with the correlationId given in the original request. The response can then be retrieved by your client by calling CreateSubscription (sorry, I can't post more than two links apparently; google it) with that correlation filter, and it will be notified when the answer is published. Note that the CreateSubscription part should be done just once, in the OnStart method. Then you can do an async BeginReceive on that subscription, and the role will be notified in the given callback when a response for one of its requests is available. The correlationId tells you which request the response is for. So your last challenge is giving this response back to the thread holding the client connection.
This could be achieved by creating a Dictionary with the correlationIds (probably GUIDs) as keys and responses as values. When your web role gets a request, it creates the GUID, sets it as the correlationId, adds it to the dictionary, fires the message to the queue, and then calls Monitor.Wait() on the GUID object. Then have the receive method invoked by the topic subscription add the response to the dictionary and call Monitor.Pulse() on that same GUID object. This wakes your original request thread, and you can return the answer to your client. (Or something like it; basically you just want your thread to sleep and not consume any resources while waiting.)
The queues on the Azure Service Bus have a lot more capabilities and paradigms, including pub/sub capabilities, which can address issues dealing with queue servicing across multiple instances.
One approach with pub/sub is to have one queue for requests and one for the responses. Each requesting instance would also subscribe to the response queue with a filter on the header such that it would only receive the responses targeted for it. The request message would, of course, contain the value to be placed in the response header to drive the filter.
For the Service Bus based solution, there are samples available for implementing the Request/Response pattern with Queues and Topics (pub/sub).
Let the worker role keep polling and processing messages. As soon as a message is processed, add an entry to Table storage with the required correlationId (RowKey) and the processing result, before deleting the processed message from the queue.
Then the web roles just need to look up the table with the desired correlationId (RowKey) and PartitionKey.
Have a look at using SignalR between the worker role and the browser client. Your web role puts a message on the queue, returns a result to the browser (something simple like "waiting..."), and hooks it up to the worker role with SignalR. That way your web role carries on doing other stuff and doesn't have to wait for a result from asynchronous processing; only the browser needs to.
There is nothing intrinsic to Windows Azure queues that does what you are asking. However, you could build this yourself fairly easily. Include a message ID (GUID) in your push to the queue and when processing is complete, have the worker push a new message with that message ID into a response channel queue. Your web app can poll this queue to determine when processing is completed for a given command.
We have done something similar and are looking to use something like SignalR to help reply back to the client when commands are completed.
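The poll-the-response-queue idea in the previous answer can be sketched in Node.js; `drainResponses` is a hypothetical stand-in for reading a batch of messages from the response channel queue:

```javascript
// Poll a response queue until a message carrying our message ID (GUID)
// appears, or give up after maxTries polls.
function waitForResult(messageId, drainResponses,
                       { intervalMs = 1000, maxTries = 60 } = {}) {
  return new Promise((resolve, reject) => {
    let tries = 0;
    const timer = setInterval(() => {
      // Look for a response message matching our GUID in the drained batch.
      const match = drainResponses().find((m) => m.messageId === messageId);
      if (match) {
        clearInterval(timer);
        resolve(match.result);
      } else if (++tries >= maxTries) {
        clearInterval(timer);
        reject(new Error('timed out waiting for ' + messageId));
      }
    }, intervalMs);
  });
}
```

One caveat the earlier Service Bus answer raises: with plain storage queues, a drained message not meant for this instance must be re-queued or it is lost, which is why header-filtered subscriptions are the cleaner fit.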

API usage Limit in Flurry?

Flurry says that the rate limit for the API is 1 request per second; in other words, you may call the API once every second. I could not understand this. Does it mean that whenever an event occurs in the mobile application, a request is sent to the server immediately, rather than everything being sent as a whole? Am I right? Any help please?
When registering events you don't need to worry about API limits. All events you fire are stored locally, and when your session finishes, the whole event package is sent to the Flurry server.
You could create a queue in your application with the events you want to register with the API, and continuously try to send all items in that queue, with a one second interval.
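The queue suggested above can be sketched as follows; `sendToApi` is a placeholder for the real API call, and the one-second interval matches the stated rate limit:

```javascript
// Client-side event queue: events accumulate locally and a timer drains
// them one API call per interval, respecting the 1-request-per-second limit.
function createEventQueue(sendToApi, intervalMs = 1000) {
  const queue = [];
  const timer = setInterval(() => {
    if (queue.length > 0) {
      sendToApi(queue.shift()); // at most one request per interval
    }
  }, intervalMs);
  return {
    push: (event) => queue.push(event),   // record an event locally
    stop: () => clearInterval(timer),     // stop draining (e.g. on shutdown)
  };
}
```

Events are never dropped when they arrive in bursts; they simply wait their turn in the local queue.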