I have a queue of data from the AWS SQS service, and I am retrieving this data, posting it to a webpage created and hosted via Node.js, and then telling the SQS service to delete the file. I use Nodemon to create and update the page, such that every time I pull a new event, the page updates and users logged into the page see fresh data. I achieve this with code that goes something like:
sqs.receiveMessage(data){
if (data = 1) {
dataForWebPage = something
fs.writeFileSync( "dataFile.json", JSON.stringify(dataForWebPage, null, 2), "utf8");
}
if (data = 2) {
dataForWebPage = somethingDifferent
fs.writeFileSync( "dataFile.json", JSON.stringify(dataForWebPage, null, 2), "utf8");
}
}
sqs.deleteMessage(data)
When testing this on Windows using Visual Code Studio, this works well. Running 'nodemon myscript.js' and opening localhost:3000 displays the page. As events come in, nodemon restarts, the page updates seamlessly, and the events are purged from the queue.
However, if zip the files and modules up, and move the script over to a linux machine, running an identical script via SSH means that I can view the webpage, the page gets update, nodemon restarts and behaves in the same way that I expect, but the messages from the SQS queue do not get deleted. They simply stay in the queue, and are never removed. Moments later, my script will pull them again, making the webpage inaccurate. They will continue to look forever and never delete.
If I do not use nodemon or if I comment out the fs.writeFileSync, the app works as expected and the events from the SQS queue are deleted as expected. However, my webpage is not then updated.
I had a theory that this was due to nodemon restarting the service, and as a result, causing the script to stop and restart before it reached the 'deleteMessage' part. However, If I simply move the delete event so that it happens before any reset, it does not solve the problem. For example, the following code is still broken on Linux, but like the previous version, DOES work on Windows:
sqs.receiveMessage(data){
if (data = 1) {
dataForWebPage = something
sqs.deleteMessage(data)
fs.writeFileSync( "dataFile.json", JSON.stringify(dataForWebPage, null, 2), "utf8");
}
if (data = 2) {
dataForWebPage = somethingDifferent
sqs.deleteMessage(data)
fs.writeFileSync( "dataFile.json", JSON.stringify(dataForWebPage, null, 2), "utf8");
}
}
It seems that if I use the asynchronous version of this call, fs.writeFile, the SQS events are also deleted as expected, but as I receive a lot of events, I am using the synchronous version of this service to ensure that data does not queue, and is updated simultaneously.
Later in the code, I use fs.readFileSync, and that does not seem to be interfering with the call to delete the SQS events.
My questions are:
1) What is happening, and why is it happening?
2) Why only Linux, and not windows?
3) What's the best way to solve this to ensure I get live updates to the page, but events are being deleted as expected?
1) What is happening, and why is it happening?
Guessing : deleteMessage is asynchronous, and a sync operation to write file is blocking the event loop, so your deleteMessage http call may be blocked and as you restart the process, it's actually never executed.
2) Why only Linux, and not windows?
No idea.
3) What's the best way to solve this to ensure I get live updates to
the page, but events are being deleted as expected?
I will be blunt : you have to redo all the architecture of your system.
Voluntarily failing your webserver and restarting it to refresh a web page won't scale to more than one user, and not even one it seems. It's not meant to work that way.
Depending of the constraint of the system you are trying to build (scale, speed, etc..) many different solution can work.
To stay as simple as possible :
A first improvement could be to keep your file storage but expose it through an API to get the data on the frontend from an ajax request, and polling it at regular interval. You will have a lot more request, but a lot less problem. It's maybe less "live" but few system actually need less than a few seconds live update.
Secondly, don't do sync operation on nodejs, it's a huge performance bottleneck, leading to strange errors and huge latencies.
When that works, file storage is usually a pain, and not really performant, maybe ask yourself if you need a database or a memcached/redis, also you can check if you need to replace polling an API from the webpage to a Socket that will prevent a lot of request and allow less than 1sec update.
Related
I'm currently developing a Shopify app with Node/Express and a Postgres database. When a user registers an account and connects their Shopify store, I'll need to download all of their store's orders. They could have 100,000s of orders, so I'd like to use a Shopify GraphQL Bulk Operation. While Shopify is handling this, my Node server will need to poll the Shopify server to check on the progress, and when the operation is complete, Shopify will send me a link where I can download all of the data. Once the data is processed and stored in my database, I'll send the user an email to say that their account is now set up.
How should I handle polling the Shopify server? The process could take anywhere from a few mins to hours. Using setInterval() would be a bad idea right? Because if the server restarts for whatever reason, It will lose the interval? So, should I use some sort of background task? And would I need to store anything in my database? I've researched cron jobs, child processes, worker threads, the bull package -- and it's left me a little confused.
(I also know that I could use a webhook, but Shopify offers no guarantees that my app will receive the webhook.)
Upon installation, launch a background job labeled "GetCustomerOrders". As you know, background jobs are mature, and nicely handle problems. For example, they can retry themselves if something goes wrong.
The Background job itself just sets up the Bulk Download and then settles into Poll. Polling is no big deal and just happens. As you said, could be minutes, could take hours. Nevertheless, a poll gets status on a bulk download, and that can even be hot-rodded. For example, you poll with an ID. So you poll till that ID completes. Regardless of restarts.
At the end of that rather simple setup, you get an URL to download and parse JSON. Spawn another job even for that. Endless fun. Why sweat it? Background jobs are the way to go.
The Webhook idea is OK but as the documentation says, they are not 100% and CRON is bush-league in that it misses out on the mature development of jobs in queues and is more like a simple trigger. Relying on CRON to start something is fine, but gives you zero management over what it starts.
I am guessing NodeJS has a decent background job system by this time. When you look at Sidekiq for Ruby you realize what awesome is. Surely you can find a copycat in Node that comes close anyway.
I have an app where I need to check people's posts constantly. I am trying to make sure that the -server- handles more than 100,000 posts. I tried to explain the program and specify the issues I am worried about by numbers.
I am running a simple node.js program on my terminal that runs as firebase admin controlling the Firebase Database. The program has no connectivity with clients(users), it just keeps the database locally to check users' posts every 2-3 seconds. I am keeping the posts in local hash variables by using on('child_added') to simply push the post to a posts hash and so on for on('child_removed') and on('child_changed').
Are these functions able to handle more than 5 requests per second?
Is this the proper way of keeping data locally for faster processing(and not abusing firebase limits)? I need to check every post on the platform every 2-3 seconds, so I am trying to keep a local copy of the -posts data.
That local copy of the posts are looped through every 2-3 seconds.
If there are thousands of posts, will a simple array variable handle that load?
Second part of the program:
I run a for loop to loop through the posts in a function. I run the function every 2-3 seconds using setInterval(). The program needs not only to check new added posts but it constantly needs to check all posts on the database.
If(specific condition for a post) => the program changes the state of the post
.on(child_changed) function => sends an API request to a website after that state change
Can this function run asynchronously ? When it is called, the function should not wait for the previous call to finish because the old call is sending an API request and it might not complete fast. How can I make sure that .on(child_changed) doesn't miss a single change on the -posts data?
Listen for Value Events documentation shows how to observe changes, namely one uses the .on method.
In terms of backing up your Realtime Database, you simply export the data manually, or if you have the paid plan you can automate it.
I don't understand why you would want to recreate the wheel, so to speak, and have your server ping firebase for updates. Simply use firebase observers.
I have a question that nobody seems to help with. How will this be handled in a production mode with thousands of requests at the same time?
I did a simple test case:
module.exports = {
index: function (req, res){
if (req.param('foo') == 'bar'){
async.series([
function(callback){
for (k=0; k <= 50000; k++){
console.log('did something stupid a few times');
}
callback();
}
], function(){
return res.json(null);
});
}else{
return res.view('homepage', {
});
}
}
};
Now if I go to http://localhost:1337/?foo=bar it will obviously wait a while before it responds. So if I now open a different session (other browser or incognito, and go to http://localhost:1337/ I am expecting a result immediately. Instead it is waiting for the other request to finish and only then it will let this request go through.
Therefore it is not asynchronous and it is a huge problem if I have as much as 2 ppl at the same time operating this app. I mean this app will have drop downs coming from databases, html files being served etc...
My question is this: how does one handle such an issue??? I hear the word "promises vs callbacks" - is this some sort of a solution to this?
I know about clustering, but that only separates the requests over the amount of cpu's, ultimately you will fix it by at most allowing 8 people at the same time without being blocked. It won;t handle 100 requests at the same time...
P.S. That test was to simplify the example, but think of someone uploading a file, a web service that goes to a different server, a point of sales payment terminal waiting for a user to input the pin, someone downloading a file from the app, etc...
nodejs is event driven and runs your Javascript as single threaded. So, as long as your code from the first request is sitting in that for loop, nodejs can't do anything else and won't get to the next event in the event queue so thus your second request has to wait for the first one to finish.
Now, if you used a true async operation such as setTimeout() instead of your big for loop, then nodejs could service other events while the first request was waiting for the setTimeout().
The general rule in nodejs is to avoid doing anything that takes a ton of CPU in your main nodejs app. If you are stuck with something CPU-intensive, then you're best to either run clusters (as many as CPUs you have) or move the CPU-intensive work to some sort of worker queue that is served by different processes and let the OS time slice those other processes while the main nodejs process stays free and ready to service new incoming requests.
My question is this: how does one handle such an issue??? I hear the word "promises vs callbacks" - is this some sort of a solution to this?
I know about clustering, but that only separates the requests over the amount of cpu's, ultimately you will fix it by at most allowing 8 people at the same time without being blocked. It won;t handle 100 requests at the same time...
Most of the time, a server process spends most of the time of a request doing things that are asynchronous in nodejs (reading files, talking to other servers, doing database operations, etc...) where the actual work is done outside the nodejs process. When that is the case, nodejs does not block and is free to work on other requests while the async operations from other requests are underway. The little bit of CPU time coordinating these operations can be helped further by clustering though it's probably worth testing a single process first to see if clustering is really needed.
P.S. That test was to simplify the example, but think of someone uploading a file, a web service that goes to a different server, a point of sales payment terminal waiting for a user to input the pin, someone downloading a file from the app, etc...
All the operations you mentioned here can be done truly asynchronously so they won't block your nodejs app like your for loop does so basically the for loop isn't a good simulation of any of this. You need to use a real async operation to simulate it. Real async operations do their work outside of the main nodejs thread and then just post an event to the event queue when they are done, allowing nodejs to do other things while the async operations are doing their work. That's the key.
I have, as part of a meteor application, a server side that gets POST messages of information to feed to the web client via inserts/updates to a Collection. So far so good. However, sometimes these updates can be rather large (50K records a go, every 5 seconds). I was having a hard time keeping up to this until I started using batch-insert package and then low-level batch.find.update() and batch.execute() from Mongo.
However, there is still a good amount of processing going on even with 50K records (it does some calculations, analytics, etc). I would LOVE to be able to "thread" that logic so the main event loop can continue along. However, I am not sure there is a real easy way to create "real" threads for this within Meteor. So baring that, I would like to know the best / proper way of at least "batching" the work so that every N (say 1K or so) records I can release the event loop back to process other events (like some client side DDP messages and the like). Then do another 1K records, etc. until however many records as I need are done.
I am THINKING the solution lies within using Fibers/Futures -- which appear to be the Meteor way -- but I am not positive that is correct or the low level ideas like "setTimeout()" and/or "setImmediate()" are more appropriate.
TIA!
Meteor is not a one size fits all tool. I think you should decouple your meteor application from your batch processing. Set up a separate meteor instance, or better yet set up a pure node.js server to handle these requests and batch processes. It would look like this:
Create a node.js instance that connects to the same mongo database using the mongodb plugin (https://www.npmjs.com/package/mongodb).
Use express if you're using node.js to handle the post requests (https://www.npmjs.com/package/express).
Do the batch processing/inserts/updates in this instance.
The updates in mongo will be reflected in meteor very quickly. I had a similar situation and used a node server to do some batch data collection and then pass it into a cassandra database. I then used pig latin to run some batch operations on that data, and then inserted it into mongo. My meteor application would reactively display the new data pretty much instantaneously.
You can call this.unblock() inside a server method to allow the code to run in the background, and immediately return from the method. See example below.
Meteor.methods({
longMethod: function() {
this.unblock();
Meteor._sleepForMs(1000 * 60 * 60);
}
});
I have an NServiceBus configuration that is working great on developers machines and in my Development Environment.
However, when I move it to my Test Environment my messages just start getting tossed.
Here is the system:
An app gets a TCP message from a Mainframe system and sends it to a MSMQ (call it FromMainframe).
An application hosted in IIS has a "Handle" method for that MSMQ and processes the messages from the mainframe.
In my Test Environment, step two only half way happens. The message is popped off the MSMQ, but not processed by my application.
Effectively my data is LOST! NServiceBus removes them from the Queue but I never get to process them. They are not even in the error queue!
These are the things I have tried in an attempt to figure out what is happening:
Check the Config files
Attach a remote debugger to the process to see what the Handle method is doing
The Handle method is never called (but when I attach to the Development Environment my breakpoint in my Handle method is hit and it all works flawlessly).
Redeploy my Dev version to the Test Envioronment and try step 2 again (just in case the versions were not exactly the same.)
Check the Config files again
Check that the Error queue is not filling up
The error queue stays empty (I wish it would fill up, then my data would not be LOST).
Check for any other process that may be pulling stuff from my MSMQs
I Turned off my IIS website and the messages in the FromMainframe queue start to backup.
When I turn it back on, the messages disappear fairly fast (but still not all at once). The speed that they disappear is too fast for them to be processed by my Handle method.
Check Config files yet again.
Run the NServiceBusTools\MsmqUtils\Runner.exe \i
I ran it, rebooted, ran it again and again for good measure!
Check the Configs again (I must have missed SOMETHING right?)
Check the Development Environment Configs are not pointing to the Test Environment
I don't think it is possible to use another computer's MSMQ as your input queue, but it does not hurt to check.
Look for any catch blocks that could be silently killing my message.
One last check of the Config files.
Recreate my Test Environment on another machine (it worked flawlessly)
Run my stuff outside of IIS.
When I host outside of IIS (using NServiceBus.Host.exe) it all works fine. So it has to be an IIS thing right?
Go crazy and hope that stack overflow can offer any kind of insight.
So I know enough about what happened to throw out an "Answer".
When I setup my NServiceBus self hosting I had a call that loaded the message handlers.
NServiceBus.Configure.With().LoadMessageHandlers()
(There are more configurations, but I omitted them for brevity)
When you call this, NServiceBus scans the assmeblies for a class that implements IHandleMessages<T>.
So, somehow, on my Test Environment Machine, the ServiceBus scan of the directory for a class that calls IHandleMessages was failing to find my class (even though the assembly was absolutely there).
Turns out that if NServiceBus does not find something that handles a message it will THROW IT AWAY!!!
This is a total design bug in my opinion. The whole idea of NServiceBus is to not lose your data, but in this case it does just that!
Now, once you know about this pitfall, there are several ways around it.
Expressly state what your handler(s) should be:
NServiceBus.Configure.With().LoadMessageHandlers<First<MyMessageType>>()
Even further protection is to add another handler that will handle "Everything else". IMessage is the base for all message payloads, so if you put a handler on it, it will pickup everything.
If you set IMessage to handle after your messages get handled, then it will handle everything that NServiceBus can't find a handler for. If you throw and exception in that Handle method that will cause NServiceBus to to move the message to the error queue. (What I think should be the default behavior.)