I want to schedule some code to run at a variable time. For example, after 60 minutes I want to send an HTTP request to an endpoint and update a database document, but I also want to be able to cancel that code before it runs if an action occurs within those 60 minutes.
What is the best way to architect this? I want it to be able to survive server restarts and also to work if my app is scaled across multiple servers.
Thanks!
You would use setTimeout() for that and save the timer ID that it returns because you can then use that timer ID to cancel the timer. To schedule the timer:
var timer = setTimeout(myFunc, 60 * 60 * 1000);
Then, sometime later before the timer fires, you can cancel the timer with:
clearTimeout(timer);
If you want to survive server restarts, then you also need to save the actual system time that you want the timer to fire to some persistent store (like a config file on disk or database). Then, when your server starts, you read that value from the persistent store and, if it is set, then you set a new setTimeout() that will trigger at that time. Likewise, when the timer fires or when you clear the timer because you no longer need it, you then update the persistent store so there is no future time stored there.
You could make this all fairly clean to use by creating a persistentTimer object that had a method for setting the timer, clearing the timer and initializing from whatever persistent store you are using upon server restart.
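As a minimal sketch of such an object, assuming the persistent store is a small JSON file on disk: the class name PersistentTimer, its method names, and the file layout are all hypothetical, and a database could take the place of the file.

```javascript
const fs = require("fs");

// Hypothetical helper: a timer whose due time is mirrored to a JSON file
// so that it can be re-armed after a server restart.
class PersistentTimer {
  constructor(file, callback) {
    this.file = file;         // assumed location of the persistent store
    this.callback = callback; // code to run when the timer fires
    this.handle = null;
  }

  // Schedule the callback delayMs from now and store the absolute due time.
  set(delayMs) {
    this.arm(Date.now() + delayMs);
  }

  // Cancel the pending timer and remove the stored due time.
  clear() {
    clearTimeout(this.handle);
    this.handle = null;
    fs.rmSync(this.file, { force: true });
  }

  // Call once at server start: re-arm the timer if a due time was stored.
  // Returns true if a timer was restored.
  restore() {
    if (!fs.existsSync(this.file)) return false;
    const { dueAt } = JSON.parse(fs.readFileSync(this.file, "utf8"));
    this.arm(dueAt);
    return true;
  }

  arm(dueAt) {
    clearTimeout(this.handle);
    fs.writeFileSync(this.file, JSON.stringify({ dueAt }));
    // A due time that has already passed fires as soon as possible.
    const delay = Math.max(0, dueAt - Date.now());
    this.handle = setTimeout(() => {
      this.handle = null;
      fs.rmSync(this.file, { force: true });
      this.callback();
    }, delay);
  }
}
```

Usage would be along the lines of creating the object at startup, calling restore() once, then calling set(60 * 60 * 1000) when the action is scheduled and clear() when it is cancelled.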
Related
I have 500,000 records in a single collection (Node.js, Express.js, Mongoose).
We need to send an SMS to each one of them. We have a direct API URL.
Now we need to send these one by one in the background and update the status in the DB as sent.
There is a condition: send 60 SMS per minute.
How do I schedule this automatically in the background?
How do I send this? Is it possible via crontab? Any references?
The most basic solution is to use a timer function like setTimeout or setInterval, run it every minute, fetch a batch of unsent messages from the database, and send them while marking them as sent. There are a few caveats:
Make sure the sending function finishes before it is executed again, in case sending the batch of messages takes more than 1 minute. Calling setTimeout at the end of the function is a safer alternative to setInterval.
An active timer will prevent your application from exiting, so make sure to run clearTimeout in an exit handler.
A periodic task can block your event loop, so it may be better to run message sending either in a separate worker thread or as a completely separate process (i.e. a script executed separately with Node).
A possibly more robust option is to use a job scheduler like bree or a task queue like bull (requires Redis).
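A rough sketch of the basic timer approach, rescheduling with setTimeout only after the current batch has finished. The functions fetchUnsent, sendSms and markSent are hypothetical stand-ins for the database query, the SMS API call and the status update; they are not part of any library.

```javascript
// Hypothetical stand-ins passed in by the caller:
//   fetchUnsent(limit) resolves to up to `limit` unsent messages
//   sendSms(msg)       calls the SMS API for one message
//   markSent(msg)      updates the message status in the database
function startBatchSender({ fetchUnsent, sendSms, markSent, batchSize = 60, intervalMs = 60000 }) {
  let timer = null;
  let stopped = false;

  async function runBatch() {
    const startedAt = Date.now();
    try {
      const batch = await fetchUnsent(batchSize);
      for (const msg of batch) {
        await sendSms(msg);
        await markSent(msg);
      }
    } catch (err) {
      // Messages not yet marked as sent are picked up again by a later run.
      console.error("batch failed:", err);
    }
    if (stopped) return;
    // Schedule the next run only now, so two runs can never overlap.
    const wait = Math.max(0, intervalMs - (Date.now() - startedAt));
    timer = setTimeout(runBatch, wait);
  }

  const firstRun = runBatch();
  return {
    firstRun, // promise for the first batch, useful for testing
    stop() {
      stopped = true;
      clearTimeout(timer); // call this from an exit handler
    },
  };
}
```

Each run starts at least intervalMs after the previous one started, so with batchSize at 60 this stays at roughly 60 messages per minute.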
I have a timer-triggered Function App ("version": "2.0") in Azure which runs every 5 minutes. Cron expression used: 0 */5 * * * *
It's working as expected, but sometimes it suddenly stops running. If I disable the function app and re-enable it, it starts working again.
As the screenshot below shows, it stopped working from 2021-04-14 16:54:59.998 to 2021-04-14 20:55:12.139.
Any help will be appreciated.
There could be different reasons for this issue, and I suggest you review the documents below to troubleshoot it and see whether you can find the root cause.
A timer-triggered function app uses TimerTriggerAttribute. This attribute includes the Singleton Lock feature, which ensures that only a single instance of the function is running at any given time. If any process runs longer than the scheduled timer interval, the new incoming process waits for the older process to finish and then uses the same instance. If you are using the same storage account across different timer-triggered functions, this could be one of the reasons, as mentioned here.
Another reason could be a restart; I suggest you check the Web App Restart detection section.
https://github.com/Azure/azure-functions-host/wiki/Investigating-and-reporting-issues-with-timer-triggered-functions-not-firing
https://github.com/Azure/azure-webjobs-sdk-extensions/wiki/TimerTrigger#troubleshooting
I am serving my users data fetched from an external API. I don't know when this API will have new data. What would be the best approach to handle that using Node, for example?
I have tried setInterval's and node-schedule to do that and got it working, but isn't it expensive for the CPU? For example, over a day I would hit this endpoint to check for new data every minute, but it could have new data every five minutes or more.
The thing is, this external API isn't run by me. Would the only way to check for updates hitting it every minute? Is there any module that can do that in Node or any approach that fits better?
Use case 1 : Call a weather API for every city of the country and just save data to my db when it is going to rain in a given city.
Use case 2 : Send notification to the user when a given Philips Hue lamp is turned on at the time it is turned on without having to hit the endpoint to check if it is on or not.
I appreciate the time to discuss this.
If this external API has no means of notifying you when there's new data, then the only thing you can do is to "poll" it to check for new data.
You will have to decide what an "efficient design" for polling is in your specific application and given the type of data and the needs of the client (what is an acceptable latency for new data).
You also need to be sure that your service is not violating any terms of service with your polling scheme or running afoul of rate limiting that may deny you access to the server if you use it "too much".
Would the only way to check for updates hitting it every minute?
Unless the API offers some notification feature, there is no scheme other than polling at some interval. Polling every minute is fairly quick. Do your clients really need information that is less than a minute old? Or would it really make no difference if the information was as much as 5 minutes old?
For example, in your example of weather, a client wouldn't really need temperature updates more often than probably every 10-15 minutes.
Is there any module that can do that in Node or any approach that fits better?
No, not really. You'll probably just use some sort of timer (either repeated setTimeout() or setInterval()) in a node.js app to repeatedly carry out your API operations.
Use case: Call a weather API for every city of the country and just save data to my db when it is going to rain in a given city.
Trying to pre-save every possible piece of data from an external API is probably a losing proposition. You're essentially trying to "scrape" all the data from the external API. That is likely against the terms of service and will likely also run afoul of rate limits. And, it's just not very practical.
Instead, you will probably want to fetch data on demand (when a client requests data for Phoenix, then, and only then, do you start collecting data for Phoenix). Once demand for a certain type of data (temperatures in a particular city) is established, you might want to pre-cache that data more regularly so you can notify clients of changes. If, after a while, no clients are asking for data from Phoenix, you stop requesting updates for Phoenix until a client establishes demand again.
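One way this demand-driven polling might look, as a sketch only: the class name DemandPoller, the fetchFn callback (standing in for the external API call) and the pollMs/idleMs defaults are all assumptions, not anything the API or Node provides.

```javascript
// Hypothetical helper: polls only the keys (e.g. cities) that clients have
// requested recently. fetchFn(key) stands in for the external API call.
class DemandPoller {
  constructor(fetchFn, { pollMs = 5 * 60 * 1000, idleMs = 30 * 60 * 1000 } = {}) {
    this.fetchFn = fetchFn;
    this.pollMs = pollMs; // how often an in-demand key is refreshed
    this.idleMs = idleMs; // how long without requests before polling stops
    this.entries = new Map(); // key -> { value (promise), lastRequested, timer }
  }

  // Called when a client asks for a key; the first request starts the polling.
  get(key) {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { value: null, lastRequested: Date.now(), timer: null };
      entry.value = Promise.resolve().then(() => this.fetchFn(key));
      // If the very first fetch fails, forget the key so a later request retries.
      entry.value.catch(() => this.drop(key, entry));
      this.entries.set(key, entry);
      this.schedule(key, entry);
    }
    entry.lastRequested = Date.now();
    return entry.value;
  }

  schedule(key, entry) {
    entry.timer = setTimeout(async () => {
      if (Date.now() - entry.lastRequested > this.idleMs) {
        this.drop(key, entry); // no recent demand: stop polling this key
        return;
      }
      try {
        const fresh = await this.fetchFn(key);
        entry.value = Promise.resolve(fresh);
      } catch (err) {
        // Keep serving the previous value if a refresh fails.
      }
      if (this.entries.get(key) === entry) this.schedule(key, entry);
    }, this.pollMs);
  }

  drop(key, entry) {
    clearTimeout(entry.timer);
    if (this.entries.get(key) === entry) this.entries.delete(key);
  }

  // Stop all polling, e.g. on shutdown.
  stop() {
    for (const entry of this.entries.values()) clearTimeout(entry.timer);
    this.entries.clear();
  }
}
```

A request handler would call something like poller.get("Phoenix") and await the result; cities nobody asks about are never fetched.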
I have tried setInterval's and node-schedule to do that and got it working, but isn't it expensive for the CPU? For example, over a day I would hit this endpoint to check for new data every minute, but it could have new data every five minutes or more.
Making a remote network request is not a CPU intensive operation, even if you're doing it every minute. node.js uses non-blocking networking so most of the time during a network request, node.js isn't doing anything and isn't using the CPU at all. The only time the CPU would be briefly used is when you first send the API request and then when you receive back the result from the API call and need to process it.
Whether you really need to "poll" every minute depends upon the data and the needs of the client. I'd ask yourself if your app will work just fine if you check for new data every 5 minutes.
The method I would use to update would be kept outside of the code, in a scheduled batch/PowerShell/bash file. In Windows you can schedule tasks based on time of day or duration since last run, so what you could do is run a simple command that will kill your application for five minutes, run npm update, and then restart your application before closing the shell.
That way you're staying out of your API and keeping code to a minimum, and if your code is inside that Node package in the update, it'll be there and ready once you make serious application changes or you need to take the server down for maintenance and updates to the low-level code.
This is a light-weight solution for you and it's a method I've used once or twice at my workplace. There are lots of options out there, and if this isn't what you're looking for I can keep looking out for you.
I'm creating an Azure Function that will run in consumption mode and will get triggered by messages in a queue.
The function will typically need to make a database call when it gets triggered. I "assume" the function gets launched and loaded to memory when it gets triggered and when it's idle, it gets terminated because it's running in consumption mode.
Based on this assumption, I don't think I can load up a singleton instance of my back-end client which includes the logic for making database calls.
Is then new'ing up my back-end client the right approach every time I need to perform some back-end operations?
This is a wrong assumption. Your function will be loaded during the first call, and will be unloaded only after an idle timeout (5 or 10 minutes).
You will not pay for idling, but you will pay for the whole time that your function was running, including the wait time during the database calls (or other IO).
Singletons and statics work just fine; and you should reuse instances like HttpClient between the calls.
I am trying to use mysql pool in my NodeJS service that is running on Amazon Lambda.
This is the beginning of my module that works with database:
console.log('init database module ...');
var settings = require('./settings.json');
var mysql = require('mysql');
var pool = mysql.createPool(settings);
According to the logs in the Amazon console, this piece of code is executed very often:
If I just deployed the service and executed 10 requests simultaneously - all these 10 requests execute this piece of code.
If I again execute 10 requests simultaneously immediately after first series - they don't execute this code.
If some time is passed from last query - then some of the requests re-execute that code.
Even if I use global, this reduces but does not eliminate the duplicates:
if (!global.pool) {
  console.log('init database module ...');
  var settings = require('./settings.json');
  var mysql = require('mysql');
  global.pool = mysql.createPool(settings);
}
Moreover, if request execution has some error - this piece of code is executed after the request and global.pool is null at that moment.
So, does this mean that using pool in Amazon Lambda is not possible?
Is there any way I can make Amazon use the same pool instance every time?
Each time a Lambda function is invoked, it runs in its own, independent container. If no idle containers are available, a new one is automatically created by the service. Hence:
If I just deployed the service and executed 10 requests simultaneously - all these 10 requests execute this piece of code.
If a container is available, it may be, and very likely will be, reused. When that happens, the process is already running, so the global section doesn't run again -- the invocation starts with the handler. Therefore:
If I again execute 10 requests simultaneously immediately after first series - they don't execute this code.
After each invocation is complete, the container that was used is frozen, and will ultimately be either thawed and reused for a subsequent invocation, or if it isn't needed after a few minutes, it is destroyed. Thus:
If some time is passed from last query - then some of the requests re-execute that code.
Makes sense, now, right?
The only "catch" is that the amount of time that must elapse before a container is destroyed is not a fixed value. Anecdotally, it appears to be about 15 minutes, but I don't believe it's documented, since most likely the timer is adaptive... the service can (at its discretion) use heuristics to predict whether recent activity was a spike or likely to be sustained, and probably considers other factors.
(Lambda@Edge, which is Lambda integrated with CloudFront for HTTP header manipulation, seems to operate with different timing. Idle containers seem to persist much longer, at least in small quantities, but this makes sense because they are always very small containers... and again this observation is anecdotal.)
The global section of your code only runs when a new container is created.
Pooling doesn't make sense because nothing is shared during an invocation -- each invocation is the only one running in its container -- one per process -- at any one time.
What you will want to do, though, is change the wait_timeout on the connections. MySQL Server doesn't have an effective way to "discover" that an idle connection has gone away entirely, so when your connection goes away because the container is destroyed, the server just sits there, and the connection remains in the Sleep state until the default wait_timeout expires. The default is 28800 seconds, or 8 hours, which is too long. You can change this on the server, or send the query SET @@wait_timeout = 900 (though you'll need to experiment with an appropriate value).
Or, you can establish and destroy the connection inside the handler for each invocation. This will take a little bit more time, of course, but it's a sensible approach if your function isn't going to be running very often. The MySQL client/server protocol's connection/handshake sequence is reasonably lightweight, and frequent connect/disconnect doesn't impose as much load on the server as you might expect... although you would not want to do that on an RDS server that uses IAM token authentication, which is more resource-intensive.
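The two options above (one connection kept per container, or a connection opened and closed inside each invocation) could be sketched roughly as follows. Here createConnection is a hypothetical stand-in for something like a promise-returning wrapper around mysql.createConnection(settings); the sketch assumes it returns an object with query(sql) and end() methods that return promises, which is not the callback API of the mysql module itself.

```javascript
// createConnection is a hypothetical factory supplied by the caller. It is
// assumed to return { query(sql): Promise, end(): Promise }.

// Option 1: keep one connection per container, created lazily and reused by
// later invocations that land in the same container.
function reusingHandler(createConnection) {
  let connection = null; // survives between invocations while the container lives
  return async function handler(event) {
    if (!connection) connection = createConnection();
    try {
      return await connection.query(event.sql);
    } catch (err) {
      // Drop a possibly broken connection so the next invocation reconnects.
      const broken = connection;
      connection = null;
      try { await broken.end(); } catch (_) { /* already unusable */ }
      throw err;
    }
  };
}

// Option 2: connect and disconnect inside every invocation.
function perInvocationHandler(createConnection) {
  return async function handler(event) {
    const connection = createConnection();
    try {
      return await connection.query(event.sql);
    } finally {
      await connection.end();
    }
  };
}
```

With option 1 it still makes sense to lower the server-side timeout as described above, because a destroyed container never gets the chance to close its connection.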