I am trying to understand how node-cron by kelektiv works.
Specifically, if your node app crashes, how does it remember the dates that you scheduled for an event? Does it store the jobs in a database somewhere, or somewhere locally on the machine?
Any recommended reading resources or an explanation will be very helpful.
Thank you in advance for your answer.
See this code: https://github.com/kelektiv/node-cron/blob/master/lib/cron.js
It uses sendAt to calculate when the job should fire next, getTimeout to work out how much time is left until then, and then simply sets a setTimeout for that interval in start.
It's a nice piece of code and I suggest you check it out; it's very simple and written in a very understandable way.
And no, it doesn't store the next time in a database; it's just a setTimeout, so the schedule lives only in the process's memory and is lost if the app crashes.
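To make the mechanism concrete, here is a minimal sketch of that same pattern, not node-cron's actual code; schedule and onTick are illustrative names:

```js
// Compute the delay until the next run, then arm a plain setTimeout --
// the same in-memory pattern node-cron uses. Nothing is persisted anywhere.
function schedule(nextDate, onTick) {
  const delay = nextDate.getTime() - Date.now(); // what getTimeout() computes
  return setTimeout(onTick, delay); // lives only in this process's memory
}

// Hypothetical usage: fire a job 5 seconds from now.
const next = new Date(Date.now() + 5000); // stand-in for what sendAt() returns
schedule(next, () => console.log('job fired'));
```

So if the process crashes before the timeout fires, the job is simply gone; persisting schedules across restarts is left to your application.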
I am trying to implement a Django web application (on Python 3.8.5) which allows a user to create “activities” where they define an activity duration and then set the activity status to “In progress”.
The POST action to the view writes the new status, the duration and the start time (an end time, based on start time and duration, could of course also be added here).
The back-end should then keep track of the duration and automatically change the status to “Finished”.
User actions can also change the status to “Finished” before the calculated end time (i.e. the timer no longer needs to be tracked).
I am fairly new to Python, so I need some advice on the smartest way to implement such a concept.
It needs to be efficient and scalable – I’m currently using a Heroku Free account so have limited system resources, but efficiency would also be important for future production implementations of course.
I have looked at the Python threading Timer, and this seems to work on a basic level, but I’ve not been able to determine what kind of constraints this places on the system – e.g. whether the spawned Timer thread might prevent the main thread from finishing and releasing resources (i.e. Heroku Dyno threads), etc.
I have read that persistence might be a problem (if the server goes down), and I haven’t found a way to cancel the timer from another process (the .cancel() method seems to rely on having the original object to cancel, and I’m not sure if this is achievable from another process).
I was also wondering about a more “background” approach, i.e. a single process which is constantly checking the database looking for activity records which have reached their end time and swapping the status.
But what would be the best way of implementing such a server?
Is it practical to read the database every second to find records with an end time of “now”? I need the status to change in real-time when the end time is reached.
Is something like Celery a good option, or is it overkill for a single process like this?
As I said I’m fairly new to these technologies, so I may be missing other obvious solutions – please feel free to enlighten me!
Thanks in advance.
To achieve this you need some kind of task-scheduling functionality. For a quick and simple implementation, the Timer object from the threading module is a good solution.
A more complete solution is to use Celery. Even if you are new, digging into it will pay off: Celery acts as a task queue, distributing your work easily across several threads or processes.
You mentioned that you want it to be efficient and scalable, so I guess you will eventually want functionality that requires multiprocessing and scheduling; for that reason my recommendation is to use Celery.
You can integrate it into your Django application easily by following the documentation: Integrate Django with Celery.
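As a hedged sketch of how that could look, assuming an Activity model with status, end_time and task_id fields (these names are illustrative, not from your question): schedule a Celery task for the activity's end time, and store the task id so the pending task can be revoked if the user finishes early.

```python
# tasks.py -- a minimal sketch; the Activity model and its fields are assumed.
from celery import shared_task


@shared_task
def finish_activity(activity_id):
    from myapp.models import Activity  # assumed app/model path

    # Only flip activities that are still in progress, so a task that fires
    # after a manual finish is a harmless no-op.
    Activity.objects.filter(id=activity_id, status="In progress").update(
        status="Finished"
    )


# In the view, after saving the new activity:
#   result = finish_activity.apply_async(args=[activity.id], eta=activity.end_time)
#   activity.task_id = result.id  # store it so it can be revoked later
#   activity.save()
#
# If the user finishes early, cancel the pending task:
#   from myapp.celery import app  # assumed Celery app module
#   app.control.revoke(activity.task_id)
```

This also speaks to your persistence and cancellation concerns: unlike a threading.Timer, the scheduled task does not live only inside your web process, and revoking by task id works from any process.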
I'm writing a crawler module which calls itself recursively to download more and more links, depending on a depth option parameter passed in.
Besides that, I'm doing more tasks on the returned resources I've downloaded (enriching/changing them depending on the configuration passed to the crawler). This process goes on recursively until it's done, which might take a lot of time (or not) depending on the configurations used.
I wish to optimize it to be as fast as possible and not to hinder any Node.js application that will use it. I've set up an express server where one of its routes launches the crawler for a user-defined (query string) host. After launching a few crawling sessions for different hosts, I've noticed that I can sometimes get really slow responses from other routes that only return simple text. The delay can be anywhere from a few milliseconds to something like 30 seconds, and it seems to happen at random times (well, nothing is random, but I can't pinpoint the cause). I've read an article by JetBrains about CPU profiling using the V8 profiler functionality that is integrated with WebStorm, but unfortunately it only shows how to collect the information and how to view it; it doesn't give me any hints on how to find such problems, so I'm pretty much stuck here.
Could anyone help me with this matter and guide me? Any tips on what my crawler might be doing to hinder the express server (a lot of recursive calls), or on how to find the hotspots I'm looking for and optimize them?
It's hard to say anything more specific on how to optimize code that is not shown, but I can give some advice that is relevant to the described situation.
One thing that comes to mind is that you may be running some blocking code. Never run deep recursion over a large workload without breaking it up with setImmediate (or setTimeout) so that the event loop gets a chance to serve other requests once in a while.
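As a minimal sketch of that idea (the crawler code isn't shown, so the queue and processLink are illustrative stand-ins); note the use of setImmediate, which, unlike process.nextTick, actually yields back to the event loop between batches:

```js
// Process a large queue of links in bounded batches: do a chunk of work,
// then yield so pending I/O (e.g. other express routes) can be served.
function crawlQueue(links, batchSize = 50) {
  const batch = links.splice(0, batchSize);
  for (const link of batch) {
    processLink(link); // stand-in for the real per-link work
  }
  if (links.length > 0) {
    setImmediate(() => crawlQueue(links, batchSize)); // continue on a later loop turn
  }
}

function processLink(link) {
  console.log('processing', link); // placeholder
}

crawlQueue(['a', 'b', 'c']);
```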
I'm trying to figure out exactly how nodejs tasks are run. I understand that there is a main loop that takes requests and then queues them up and moves on. What exactly then executes those queued up events/tasks?
Update:
Can somebody actually please explain it? I appreciate people wanting me to script it and figure it out myself, but sometimes it's better to just have it explained rather than creating barriers to learning simple concepts.
You can follow https://github.com/node-inspector/node-inspector.
You can use node-inspector to open a script and set some breakpoints; stepping through it will help you understand the event loop.
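While you're in the debugger, a tiny script like this is a standard demonstration of the ordering (not from the node-inspector docs): the main script runs to completion on the single JavaScript thread, and only then does the event loop run the queued callbacks.

```js
console.log('start');

setTimeout(() => console.log('timeout callback'), 0); // queued for the timers phase
process.nextTick(() => console.log('nextTick callback')); // runs before any timer

console.log('end');
// Prints: start, end, nextTick callback, timeout callback
```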
Right now I have a weekly email job that works by first checking a last_email_sent timestamp against the current time; it then uses setTimeout to schedule a routine exactly a week from the last_email_sent timestamp. If the process ever restarts, the setTimeout would be queued again, just with a smaller interval. This works for a weekly email job, but is there a better way to handle jobs in node.js? Maybe there's a module out there, which I'm not aware of, that would let me manage my jobs.
There's a handy module in npmjs.org called node-cron.
It'll give you more flexibility.
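For example, here's a minimal sketch of the weekly job with the kelektiv cron package (npm install cron); sendWeeklyEmail is a stand-in for your email routine:

```js
const { CronJob } = require('cron');

function sendWeeklyEmail() {
  console.log('sending weekly email'); // placeholder for the real routine
}

// Pattern fields: second minute hour day-of-month month day-of-week.
const job = new CronJob(
  '0 0 9 * * 1',     // every Monday at 09:00:00
  sendWeeklyEmail,   // onTick
  null,              // onComplete (unused here)
  true               // start the job immediately
);
```

As noted in the node-cron answer above, the schedule lives in memory only, so keeping your last_email_sent timestamp around for crash recovery is still a good idea.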
Many of the modules listed in the node.js wiki under "Message Queues" will help with this type of system. Being a TJ Holowaychuk fanboy, I myself would probably first look at Kue.
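A hedged sketch of what that could look like with Kue (npm install kue; it needs a running Redis server, and the payload here is just an example):

```js
const kue = require('kue');
const queue = kue.createQueue(); // connects to Redis on localhost by default

// Enqueue the next weekly email, delayed by one week. Because the job is
// stored in Redis, it survives a process restart, unlike a bare setTimeout.
queue.create('weekly-email', { to: 'user@example.com' })
  .delay(7 * 24 * 60 * 60 * 1000) // delay in milliseconds
  .save();

// A worker picks the job up once its delay expires.
// (Very old Kue versions also need queue.promote() for delayed jobs.)
queue.process('weekly-email', (job, done) => {
  console.log('sending email to', job.data.to); // placeholder
  done();
});
```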
Can someone recommend a straightforward way of adding some type of graphical notification (status bar, spinning clock, etc.) to my wxPython GUI application? Currently, it searches logs on a server for unique strings, and often takes upwards of 3-4 minutes to complete. It would be convenient to have some type of display letting a user know the job's progress towards completion. If I add a feature like this, I'm afraid I may have to look at using threads, and I'm a complete newbie to Python. Any help and direction will be appreciated.
Yes, you'd need to use threads or queues or something similar. Fortunately, there are some excellent examples here: http://wiki.wxpython.org/LongRunningTasks and this tutorial I wrote is pretty straight-forward too: http://www.blog.pythonlibrary.org/2010/05/22/wxpython-and-threads/
Using threads isn't that hard. Basically you put the long running part in the thread and every so often, you send a status update to your GUI using a thread-safe method, like wx.PostEvent or wx.CallAfter. Then you can update your statusbar or a progress bar or whatever you're using.
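Here's a minimal sketch of that pattern (the frame, the button, and the fake five-step search are all illustrative): the worker thread does the slow work and posts status-bar updates back to the GUI thread with wx.CallAfter.

```python
import threading
import time

import wx


class MainFrame(wx.Frame):
    def __init__(self):
        super().__init__(None, title="Log search")
        self.CreateStatusBar()
        button = wx.Button(self, label="Search logs")
        button.Bind(wx.EVT_BUTTON, self.on_search)

    def on_search(self, event):
        # Daemon thread: it won't keep the app alive after the GUI closes.
        threading.Thread(target=self.search_logs, daemon=True).start()

    def search_logs(self):
        for i in range(1, 6):  # stand-in for the real 3-4 minute server search
            time.sleep(1)
            # wx.CallAfter is thread-safe: the update runs on the GUI thread.
            wx.CallAfter(self.SetStatusText, "Searched %d/5 log files..." % i)
        wx.CallAfter(self.SetStatusText, "Done")


if __name__ == "__main__":
    app = wx.App()
    MainFrame().Show()
    app.MainLoop()
```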