Using 'cron' module in node.js on heroku server - node.js

I am using cron in Node.js to schedule a function that sends text messages at user-determined times. It works on my local server, but when I deploy to Heroku the functions never get called.

I'll elaborate a little more on what @rsp said above, so that anyone else who finds this question understands why Heroku Scheduler is the correct answer here.
When you're running software on Heroku, what happens is that Heroku will take your project, and run it on a random dyno (server) in the Heroku collection of servers on Amazon.
For a number of reasons (including to help distribute application load evenly across a large number of Amazon servers), Heroku will periodically move your dyno from one Amazon server to another. This happens many times per day, automatically, behind the scenes.
This means that your application code will restart periodically when running on Heroku.
Now -- this isn't a bad thing from an end-user perspective, because while your application code is restarting, Heroku will queue up incoming requests, then just send them to your application once it's been successfully restarted on a new host. So to the end user, this behavior is 100% transparent.
What's important to know here, though, is that since your application code may randomly restart, you shouldn't use it to do things like run long tasks that take a while to finish, or queue up jobs to be executed in the future (what the cron module does).
Instead: Heroku created a free scheduler addon that you can use to basically say "Hey Heroku, run this Node script every minute|hour|etc."
The scheduler addon Heroku provides will reliably execute your cron task because Heroku keeps track of that timing stuff separately (outside of your application logic).
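As a rough sketch of what such a script could look like: Heroku Scheduler starts the command you give it, the script does its work once, and the process ends. Everything named below (the file name send-due-texts.js, the reminders array, findDueReminders, sendText) is hypothetical and stands in for your own database query and SMS call.

```javascript
// send-due-texts.js: hypothetical one-off script for Heroku Scheduler.
// It does not keep any timers alive; it runs once and lets the process exit.

// Hypothetical in-memory data; a real app would query its database here.
const reminders = [
  { to: '+15550001', sendAt: new Date('2024-01-01T10:00:00Z'), sent: false },
  { to: '+15550002', sendAt: new Date('2024-01-01T12:00:00Z'), sent: false },
];

// Return reminders whose time has arrived and that have not been sent yet.
function findDueReminders(all, now) {
  return all.filter((r) => !r.sent && r.sendAt <= now);
}

// Placeholder for the real SMS call.
function sendText(reminder) {
  console.log('Sending text to ' + reminder.to);
  reminder.sent = true;
}

// Send everything that is due and report how many texts went out.
function run(all, now) {
  const dueNow = findDueReminders(all, now);
  dueNow.forEach(sendText);
  return dueNow.length;
}

run(reminders, new Date());
```

In the Scheduler dashboard the job command would then be something like node send-due-texts.js, run at whatever interval you pick. Because the script only looks at what is due right now, it does not matter which dyno runs it or how often the web dyno restarts.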
Hopefully this helps!

I'm using a cron job on heroku with node. Here is my top level stuff
var CronJob = require('cron').CronJob;
var dailyJob = new CronJob({
  cronTime: '0 0 0 * * *',
  onTick: function () {
    // Do daily function
    console.log('I get called 1 time a day.');
  },
  start: false
});
dailyJob.start();

On Heroku you may need to use the Heroku Scheduler.
See: https://devcenter.heroku.com/articles/scheduler

Related

how to stop heroku from restarting dynos

I have a Node.js app hosted on Heroku. I am paying $7 a month for the better plan, which gives me the next-tier dynos and SSL. My problem is that I have a cron job in my app that runs every minute. It is VERY important that it runs every minute and pretty much never misses. However, it sometimes fails to run, and after a little debugging I believe the cause is that the dyno restarts itself.
So I was wondering if there is a way to schedule the app to restart instead of having it restart whenever, or if my cron job is actually the problem and I can't do what I'm looking for. Any ideas?
EDIT: here's the cronjob code:
var sendTexts = new CronJob('*/1 * * * *', function() {
  // code that sends Texts if event is true
}, null, true)
It should run every minute. It does locally when my server is up, but again the issue seems to be with restarting dynos.
Dynos are restarted (cycled) at least every 24 hours; restarting manually (with the Heroku CLI, for example) resets the 24-hour period.
You could consider restarting your app every X hours to try to manage that, but keep in mind:
Dynos can be restarted at any time by Heroku (after a platform error).
Upon restart your cron job starts immediately, so you are going to have executions before a whole minute has passed.
You might want to consider an architectural change using a DB or a queue, which lets you avoid relying on the application always running.
In a cloud-based architecture it is never a good idea to assume a single instance (container) is always available.
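One way to picture the DB-backed idea is the sketch below. It assumes the time of the last completed run is persisted somewhere that survives restarts; the store object here is a hypothetical in-memory stand-in for that database row, and missedTicks and catchUp are illustrative names, not part of the cron module.

```javascript
// Stands in for a database row that survives dyno restarts (hypothetical).
const store = { lastRunMs: 0 };

// List every scheduled tick after the last recorded run, up to `nowMs`.
function missedTicks(lastRunMs, nowMs, intervalMs) {
  const ticks = [];
  for (let t = lastRunMs + intervalMs; t <= nowMs; t += intervalMs) {
    ticks.push(t);
  }
  return ticks;
}

// Run the job once per missed tick and record progress after each one,
// so a restart in the middle resumes where it left off.
function catchUp(nowMs, intervalMs, job) {
  const ticks = missedTicks(store.lastRunMs, nowMs, intervalMs);
  for (const tick of ticks) {
    job(tick);
    store.lastRunMs = tick;
  }
  return ticks.length;
}
```

Calling catchUp(Date.now(), 60000, sendTextsFor) on boot and again on every cron tick would mean a restart delays a minute's work rather than dropping it. In a real app store.lastRunMs would be initialised from the database, not from 0, and sendTextsFor is a placeholder for your own job.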

Doing tasks before heroku nodejs server is ready

When deploying a new release, I would like my server to do some tasks before actually being released and listen to http requests.
Let's say that those tasks take around a minute and are setting some variables: until the tasks are done I would like the users to be redirected to the old release.
Basically do some nodejs work before the server is ready.
I tried a naive approach:
doSomeTasks().then(() => {
  app.listen(PORT);
})
But as soon as the new version is released, all HTTPS requests made during the tasks fail instead of being redirected to the old release.
I have read https://devcenter.heroku.com/articles/release-phase but this looks like I can only run an external script which is not good for me since my tasks are setting cache variables.
I know this is possible with /check_readiness on App Engine, but I was wondering for Heroku.
You have a couple options.
If the work you're doing only changes on release, you can add a task as part of your dyno build stage that will fetch and store data inside of the compiled slug that will be deployed to virtual containers on Heroku and booted as your dyno. For example, you can run a task in your build cycle that fetches data and stores/caches it as a file in your app that you read on-boot.
If this data changes more frequently (e.g. daily), you can utilize “preboot” to capture and cache this data on a per-dyno basis. Depending on the data and architecture of your app you may want to be cautious with this approach when running multiple dynos as each dyno will have data that was fetched independently, thus this data may not match across instances of your application. This can lead to subtle, hard to diagnose bugs.
This is a great option if you need to, for example, pre-cache a larger chunk of data and then fetch only new data on a per-request basis (e.g. fetch the last 1,000 posts in an RSS feed on-boot, then per request fetch anything newer—which is likely to be fewer than a few new entries—and coalesce the data to return to the client).
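The coalescing step described above could be sketched like this. The post shape ({ id, publishedAt }) and the function name mergeNewPosts are assumptions made for illustration; fetching the feed itself is left out.

```javascript
// Merge posts cached at boot with posts fetched for the current request.
// Assumes each post looks like { id, publishedAt } (hypothetical shape).
function mergeNewPosts(cached, fresh) {
  const seen = new Set(cached.map((post) => post.id));
  // Keep only posts that were not already in the boot-time cache.
  const added = fresh.filter((post) => !seen.has(post.id));
  // Newest first, without mutating the cached array.
  return cached.concat(added).sort((a, b) => b.publishedAt - a.publishedAt);
}
```

The cached array would be filled once while the dyno boots, and each request would pass in only the handful of entries newer than the cache.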
Here's the documentation on customizing a build process for Node.js on Heroku.
Here's the documentation for enabling and working with Preboot on Heroku
I don't think this is a good approach. You can use an external script (an npm script) to do this task and then run it in the release phase. The situation is very similar to running migrations: the script can require the libraries it needs, and it can even load the whole application without listening on a port. Let's make it clearer with an example:
// script file (assumed here to be saved as set_cache.js)
var client = require('cache_client');
// require any other libraries the script needs here,
// then execute your logic using sync APIs
client.setCacheVar('xyz', 'xyz');
Then, in package.json, add this script under "scripts". Let's assume you named it set_cache:
"scripts": {
  "set_cache": "node set_cache.js"
},
Now you can run this script with npm run set_cache and use that command in the Procfile:
web: npm start
release: npm run set_cache

Google Cloud Platform : Running several hours scraping script

I have a Node.js script that scrapes URLs every day.
The requests are throttled to be kind to the server. This results in my script running for a fairly long time (several hours).
I have been looking for a way to deploy it on GCP. And because it was previously done in cron, I naturally had a look at how to have a cronjob running on Google Cloud. However, according to the docs, the script has to be exposed as an API and http calls to that API can only run for up to 60 minutes, which doesn't fit my needs.
I had a look at this S.O question, which recommends to use a Cloud Function. However, I am unsure this approach would be suitable in my case, as my script requires a lot more processing than the simple server monitoring job described there.
Does anyone have experience doing this on GCP?
N.B.: To clarify, I want to avoid deploying it on a VPS.
Edit :
I reached out to google, here is their reply :
Thank you for your patience. Currently, it is not possible to run cron script for 6 to 7 hours in a row since the current limitation for cron in App Engine is 60 minutes per HTTP request.
If it is possible for your use case, you can spread the 7 hours to recurring tasks, for example, every 10 minutes or 1 hour. A cron job request is subject to the same limits as those for push task queues. Free applications can have up to 20 scheduled tasks. You may refer to the documentation for cron schedule format.
Also, it is possible to still use Postgres and Redis with this. However, kindly take note that Postgres is still in beta.
As I can't spread the task, I had to keep managing a Dokku VPS for this.
I would suggest combining two services, GAE Cron Jobs and Cloud Tasks.
Use GAE Cron jobs to publish a list of sites and ranges to scrape to Cloud Tasks. This initialization process doesn't need to be 'kind' to the server yet; it can simply publish all chunks of work to the Cloud Tasks queue and consider itself finished when that is done.
Follow that up with a Task Queue, and use the queue's rate-limiting configuration option to limit the overall request rate to the endpoint you're scraping. If you need less than 1 qps, add a sleep statement directly in your code. If you're really queueing millions or billions of jobs, follow their advice of having one queue fan out into another:
Large-scale/batch task enqueues
When a large number of tasks, for example millions or billions, need to be added, a double-injection pattern can be useful. Instead of creating tasks from a single job, use an injector queue. Each task added to the injector queue fans out and adds 100 tasks to the desired queue or queue group. The injector queue can be sped up over time, for example start at 5 TPS, then increase by 50% every 5 minutes.
That should be pretty hands-off, and only requires you to think through how the cron job picks the next sites and pages to scrape, and how small the chunks of work should be.
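The "break the work into chunks" step might look like the sketch below. It only builds plain task payloads and leaves the actual enqueueing to whichever Cloud Tasks client you use. The payload shape ({ index, urls }) and the names chunk and buildTasks are assumptions for illustration.

```javascript
// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (size < 1) {
    throw new RangeError('size must be at least 1');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Turn a list of URLs into task payloads, one per batch.
// The cron handler would enqueue each payload and then return.
function buildTasks(urls, size) {
  return chunk(urls, size).map((batch, index) => ({ index: index, urls: batch }));
}
```

With payloads this small, the cron request finishes quickly, and the queue's rate limit decides how fast the batches are actually scraped.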
I'm also working on this kind of task. I need to crawl a website and had the same problem.
Instead of running the main crawler task on a VM, I moved it to Google Cloud Functions. The task consists of getting the target URL, scraping the page, saving the result to Datastore, and returning the result to the caller.
This is how it works: I have a long-running application that can be called the master. The master knows which URLs we are going to access. But instead of accessing the target website itself, it sends the URL to a crawler function in GCF. The crawling is done there and the result is sent back to the master. In this setup, the master only requests and receives a small amount of data and never touches the target website; the rest is left to GCF. You can offload work from your master and crawl the website in parallel via GCF. You can also use other methods to trigger GCF instead of an HTTP request.

No Mongo Query gets result when cron is running in background

I have been using Node.js on the server side with MongoDB as our database. They work really well together.
Now I have added the node-schedule library to our system to call a function like a cron job.
The process takes hours to complete.
My issue is that whenever the cron job is running, all users of my site get no response from the server, i.e. the database gets locked.
I have been stuck on this issue for a week and need a good solution to run the cron job without affecting users of the site.
Typically you will want to write a worker and run the worker in a different entry point that is not part of your server. There are multiple ways you could achieve this.
1) Write a worker on another server to interact with your database
2) Write a service worker on another server that interacts with your api
3) Use the same server but set up a cron job to execute the file that does the work at a specified time.
But you should not do this from the same entry point that your server is running on. You need a different execution file.
One thing you can do so that this does not bog down your server is to have your node-schedule trigger run a child process: https://nodejs.org/api/child_process.html
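A minimal sketch of the child-process idea, using spawn from Node's built-in child_process module. The helper name runJobInChild and the file name worker.js are hypothetical. The node-schedule call is shown only as a comment so the block runs without extra packages.

```javascript
const { spawn } = require('child_process');

// Start the heavy job in a separate Node.js process so the web server's
// event loop stays free to answer requests. `args` is passed to `node`.
function runJobInChild(args) {
  const child = spawn(process.execPath, args, { stdio: 'inherit' });
  child.on('exit', (code) => console.log('job exited with code ' + code));
  child.on('error', (err) => console.error('job failed to start: ' + err.message));
  return child;
}

// Hypothetical wiring with node-schedule (requires the package):
//   const schedule = require('node-schedule');
//   schedule.scheduleJob('0 3 * * *', () => runJobInChild(['worker.js']));
```

This keeps the long-running JavaScript off the server's event loop. If the slowdown comes from MongoDB itself (for example heavy unindexed queries), the worker's queries would still need tuning separately.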

Node.JS with forever on Heroku

I need to run my Node.js app on Heroku. It works very well, but when my app crashes I need something to restart it, so I added forever to package.json and created a file named forever.js with this:
var forever = require('forever');
var child = new (forever.Monitor)('web.js', {
  max: 3,
  silent: false,
  options: []
});
//child.on('exit', this.callback);
child.start();
forever.startServer(child);
In my Procfile (which Heroku uses to know what to start) I put:
web: node forever.js
Alright! Now every time my app crashes it restarts automatically, but from time to time (almost every hour) Heroku starts throwing H99 - Platform error, and about this error they say:
Unlike all of the other errors which will require action from you to correct, this one does not require action from you. Try again in a minute, or check the status site.
But I just manually restart my app and the error goes away, if I don't do that, it may take hours to go away by itself.
Can anyone help me here? Maybe this is a forever problem? A heroku issue?
This is an issue with free Heroku accounts: Heroku automatically kills unpaid apps after 1 hour of inactivity, and then spins them back up the next time a request comes in. (As mentioned below, this does not apply to paid accounts. If you scale up to two servers and pay for the second one, you get two always-on servers.) - https://devcenter.heroku.com/articles/dynos#dyno-sleeping
This behavior is probably not playing nicely with forever. To confirm this, run heroku logs and look for the lines "Idling" and " Stopping process with SIGTERM" and then see what comes next.
Instead of using forever, you might want to try using the Cluster API and automatically create a new child each time one dies. http://nodejs.org/api/cluster.html#cluster_cluster is a good example; you'd just put your code into the else block.
The upshot is that your app is now much more stable, plus it gets to use all of the available CPU cores (4 in my experience).
The downside is that you cannot store any state in memory. If you need to store sessions or something along those lines, try out the free Redis To Go addon (heroku addons:add redistogo).
Here's an example that's currently running on heroku using cluster and Redis To Go: https://github.com/nfriedly/node-unblocker
UPDATE: Heroku has recently made some major changes to how free apps work, and the big one is they can only be online for a maximum of 18 hours per day, making it effectively unusable as a "real" web server. Details at https://blog.heroku.com/archives/2015/5/7/heroku-free-dynos
UPDATE 2: They changed it again. Now, if you verify your ID, you can run 1 free dyno constantly: https://blog.heroku.com/announcing_heroku_free_ssl_beta_and_flexible_dyno_hours#flexible-free-dyno-hours
