What exactly is a "sleeping" dyno - node.js

This question actually means a couple of things.
First of all, what exactly happens when a dyno sleeps?
If I have global variables stored in an array in my bot, do they get
wiped when it sleeps (meaning that I have to save everything to
external files)? Basically, is my in-memory data cleared when it
sleeps, so that my bot no longer has that data when it wakes up?
Secondly, for the 550 free dyno hours, can I set a sleeping
schedule (e.g., 01:00 - 07:00) or is it not a daily limit (18h/day)
but a monthly limit (so running 24/7 uses up the hours until I have 0
for the rest of the month)?

Adding on to what @Beppe C said in his answer, in order to understand dyno sleeping, you need to understand the difference between web and worker dynos.
Web dynos allow you to show a webpage, some file content, etc. A web dyno goes to sleep after 30 minutes of inactivity. This is where you're getting the phrase dyno sleeping. If you insist on using a web dyno (as in you need to show content), the best way to schedule it is by using the Heroku Scheduler add-on. However, I don't recommend this, as it requires you to enter your credit card information. Otherwise, you could use a cron job service.
Worker dynos are different. As long as you don't use up your 550 hours in a given month (the hour limit is monthly), they will stay running without sleeping. With a worker dyno, any global variables will be saved for the duration of the time that the process stays running, unless you stop the process. The downside is that they work in the background, meaning that you can't show/display web content with them. If you need to show web content, stick with a web dyno.
To start a worker process for an app, scale up the app like so:
# scale up
heroku ps:scale worker=1 -a {appname}
Read more documentation about worker dynos here: https://devcenter.heroku.com/articles/background-jobs-queueing
My advice? If you don't need to show web content, use a worker dyno that periodically shuts off at a given time every day. Since you seem to be using node.js, maybe use setInterval to check the time?
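The time check suggested above could look roughly like this. It is a minimal sketch, not the answerer's code: `isQuietHour`, `watchQuietHours`, and the callback are illustrative names. One assumption to keep in mind is what the callback does: a process that simply exits is normally restarted by Heroku, so actually stopping the dyno would still mean scaling it to zero (for example with `heroku ps:scale worker=0`).

```javascript
// Returns true when `date` falls inside the quiet window [startHour, endHour).
// Handles windows that wrap past midnight (e.g. 23 -> 7).
function isQuietHour(date, startHour, endHour) {
  const hour = date.getHours(); // local time of the process
  if (startHour === endHour) return false; // empty window
  if (startHour < endHour) return hour >= startHour && hour < endHour;
  return hour >= startHour || hour < endHour;
}

// Checks the clock every `intervalMs` and calls `onQuiet` once each time
// the quiet window begins. Returns the timer so the caller can clear it.
function watchQuietHours(startHour, endHour, onQuiet, intervalMs = 60000) {
  let fired = false;
  return setInterval(() => {
    const quiet = isQuietHour(new Date(), startHour, endHour);
    if (quiet && !fired) {
      fired = true;
      onQuiet();
    }
    if (!quiet) fired = false; // re-arm for the next day
  }, intervalMs);
}
```

A call such as `watchQuietHours(1, 7, () => console.log('quiet window started'))` would check once a minute. The hours are read in the process's local time zone, which may not match your own, so the window may need shifting.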

A dyno sleeps when the virtual node shuts down, stopping the application that runs on it. The application memory is cleared (yes, all your variables and arrays) and it cannot process any request until the dyno restarts.
Any data which needs to survive should be persisted to an external storage (external file system, database).
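As a rough illustration of persisting state, the in-memory data can be turned into a string before it is written to whatever external storage you choose, and rebuilt from that string on startup. The helper names `serializeState` and `restoreState` and the `version` field are assumptions made for this sketch; the actual write to a database or other store is left out.

```javascript
// Turns the in-memory state into a string that can be written to a database
// row, a Redis key, or any other storage outside the dyno.
function serializeState(state) {
  return JSON.stringify({ version: 1, data: state });
}

// Rebuilds the state from a stored string. Falls back to `initial` when
// nothing usable was stored (first boot, corrupted value, unknown version).
function restoreState(stored, initial) {
  if (typeof stored !== 'string') return initial;
  try {
    const parsed = JSON.parse(stored);
    return parsed && parsed.version === 1 ? parsed.data : initial;
  } catch {
    return initial;
  }
}
```

On startup the bot would call `restoreState(valueReadFromStorage, [])`, and it would call `serializeState` whenever the array changes, so a sleep or restart loses at most the latest unsaved change.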
A Web dyno goes to sleep after 30 minutes of inactivity (i.e. no HTTP requests have arrived); you cannot schedule this.
A Worker does not go to sleep and it runs until your free quota has run out.
You can always scale down (shutdown) and scale up (restart) using the command line.
# scale down
heroku ps:scale web=0 -a {appname}

Related

Response timeout or slow response from my node server hosted at Heroku's free plan

I am running a nodejs app on Heroku free tier.
Free tier is more than enough for occasional traffic, just a few times a month for a backend admin system.
However, if several REST API calls are made within 1 or 2 minutes, they hit a timeout error. Actual scenario: my node server receives WhatsApp replies coming from Twilio. Twilio uses a webhook to call the REST API on my node server hosted on the Heroku free tier. About 10 to 20 WhatsApp replies are expected, and they could all come in within a minute. The node server receives the data from Twilio and writes to Firestore for each WhatsApp reply.
I am reading Heroku's help document on how to deal with request time-outs > https://help.heroku.com/PFSOIDTR/why-am-i-seeing-h12-request-timeouts-high-response-times-in-my-app
Can I increase the number of web workers under the current Free unverified account?
I tried from the Heroku CLI and get the following response
heroku ps:type worker=standard-2x
› Warning: heroku update available from 7.59.2 to 7.60.2.
▸ Type worker not found in process formation.
▸ Types: web
If I get my account verified by adding a credit card, how many web workers can I add to the existing free dyno?
If I stay on the Free tier, without upgrading to, say, the Hobby tier, and there is an unexpected spike in traffic, my dyno will just go to sleep once the free dyno hours are used up. Heroku will not automatically charge my credit card for the excess traffic, right? I am very concerned because of news about Firebase users being charged thousands of dollars after an unexpected spike in traffic, with no way to cap or limit it.
If I am expecting up to 20 simultaneous/concurrent REST API calls to my node server at Heroku within a minute, how many web workers should I add?
On a free account, you can only have one worker at a time. If you add more dynos, you will be charged.
When your free Heroku dyno sleeps because it is not being used, if it receives a new request it has to wake up. Waking up takes quite a bit of time and likely is longer than Twilio's 10 second timeout for HTTP responses. I think your app is probably failing not because of the spike of traffic especially as you are using Node which can handle multiple connections with ease, but because your sleeping dyno times out.
You might find some benefit adding the same URL as the fallback URL for your number. That would allow Twilio to call your server, wake it up and if it fails the first time because it timed out, the fallback URL will make a second attempt which would hopefully succeed.
In reality, the best practice is to pay for an account that allows you to keep your dyno running the whole time.

how to stop heroku from restarting dynos

I have a node.js app hosted on Heroku. I am paying the $7 a month hosting for the better plan which has me running with the next tier dynos and SSL. My problem is, I have a cronjob running in my app that runs every minute. It is VERY important this runs every minute and pretty much never misses. However, it happens to not run sometimes, and after debugging a little bit, I believe it to be that it restarts itself. like so:
So I was wondering if there is a way to schedule the app to restart instead of having it do it whenever, or if my cronjob is actually the problem and I can't do what I'm looking for. any ideas?
EDIT: here's the cronjob code:
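const { CronJob } = require('cron'); // npm package "cron"
// Fires every minute for as long as this process stays alive
var sendTexts = new CronJob('*/1 * * * *', function() {
  // code that sends Texts if event is true
}, null, true); // null: no onComplete callback; true: start right away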
and it should run every 1 minute. it does locally when my server is up, but again the issue seems to be with restarting dynos
Dynos are restarted (cycled) at least every 24 hours; restarting manually (with the Heroku CLI, for example) resets the 24-hour period.
You could consider restarting your app every X hours to try to manage that; however, you must consider:
Dynos can be restarted randomly by Heroku (after a platform error)
upon restart your cron job starts immediately, so you are going to have executions before a whole minute has passed
You might want to consider an architectural change using a DB or a queue which allow you not to rely on the application always running.
In cloud-based architecture it is never a good idea to assume a single instance (container) is always available.
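One way to picture the database idea: persist the time of the last completed run, and on every start or tick work out which one-minute slots have passed since then. The sketch below only shows that arithmetic. `dueSlots` is a hypothetical helper, and how `lastRunMs` is stored and read back is an assumption left to the reader.

```javascript
// Returns the timestamps (ms since epoch) of every interval boundary that
// lies after `lastRunMs` and at or before `nowMs`. Each returned slot is a
// run that should have happened and has not been recorded yet.
function dueSlots(lastRunMs, nowMs, intervalMs = 60000) {
  const first = Math.floor(lastRunMs / intervalMs) + 1;
  const last = Math.floor(nowMs / intervalMs);
  const slots = [];
  for (let slot = first; slot <= last; slot++) {
    slots.push(slot * intervalMs);
  }
  return slots;
}
```

After a restart, the app would read the stored timestamp, process each slot returned by `dueSlots(lastRunMs, Date.now())`, and store the newest slot it handled. A long outage produces a long list, so a real implementation would probably cap how far back it catches up.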

Node/Bull/Throng background jobs. Super slow, and many processes are being used?

I've changed a long running process on an Node/Express route to use Bull/Redis.
I've pretty much copied this tutorial on Heroku docs.
The gist of that tutorial is: The Express route schedules the Job, immediately returns 200 to the client, and the browser long polls for the job status (a ping route on Express). When the client gets a completed status, it displays it in the UI. The Worker is a separate file and is run with an additional yarn run worker.js.
Notice the end where it recommends using Throng for clustering your workers.
I'm using this Bull Dashboard to monitor jobs/queues. This dashboard shows the available Workers, and their status (Idle when not running/ Not in Idle when running).
I've got the MVP working, but the operation is super slow. The average time to complete is 1 minute 30 seconds, whereas before adding Bull it took seconds to complete.
Another strange thing: it seems to take at least 30 seconds for a Worker's status to change from Idle to not Idle. It seems that a lot of the latency is spent waiting for the worker.
Being that the new operation is a separate file (worker.js) and throng is enabling clustering, I was expecting this operation to be super fast, but it is just the opposite.
Does anyone have an experience with this? Or pointers to help figure out what is causing this to be so slow?

Google Cloud Platform : Running several hours scraping script

I have a NodeJS script that scrapes URLs every day.
The requests are throttled to be kind to the server. This results in my script running for a fairly long time (several hours).
I have been looking for a way to deploy it on GCP. And because it was previously done in cron, I naturally had a look at how to have a cronjob running on Google Cloud. However, according to the docs, the script has to be exposed as an API and http calls to that API can only run for up to 60 minutes, which doesn't fit my needs.
I had a look at this S.O question, which recommends to use a Cloud Function. However, I am unsure this approach would be suitable in my case, as my script requires a lot more processing than the simple server monitoring job described there.
Does anyone have experience doing this on GCP?
N.B.: To clarify, I want to avoid deploying it on a VPS.
Edit :
I reached out to google, here is their reply :
Thank you for your patience. Currently, it is not possible to run cron
script for 6 to 7 hours in a row since the current limitation for cron
in App Engine is 60 minutes per HTTP request.
If it is possible for your use case, you can spread the 7 hours to
recurring tasks, for example, every 10 minutes or 1 hour. A cron job
request is subject to the same limits as those for push task queues.
Free applications can have up to 20 scheduled tasks. You may refer to
the documentation for cron schedule format.
Also, it is possible to still use Postgres and Redis with this.
However, kindly take note that Postgres is still in beta.
As I can't spread the task, I had to keep on managing a dokku VPS for this.
I would suggest combining two services, GAE Cron Jobs and Cloud Tasks.
Use GAE Cron jobs to publish a list of sites and ranges to scrape to Cloud Tasks. This initialization process doesn't need to be 'kind' to the server yet, and can simply publish all chunks of work to the Cloud Task queue, and consider itself finished when completed.
Follow that up with a Task Queue, and use the queue rate limiting configuration option as the method of limiting the overall request rate to the endpoint you're scraping from. If you need less than 1 qps add a sleep statement in your code directly. If you're really queueing millions or billions of jobs follow their advice of having one queue spawn to another.
Large-scale/batch task enqueues
When a large number of tasks, for
example millions or billions, need to be added, a double-injection
pattern can be useful. Instead of creating tasks from a single job,
use an injector queue. Each task added to the injector queue fans out
and adds 100 tasks to the desired queue or queue group. The injector
queue can be sped up over time, for example start at 5 TPS, then
increase by 50% every 5 minutes.
That should be pretty hands-off, and only requires you to think through how the cron job pulls the next desired sites and pages, and how small the chunks of work should be.
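The chunking step described above might be sketched as follows. This covers only how the URL list is split. The payload shape (`index`, `total`, `urls`) and the function names are hypothetical, and the call that actually enqueues each payload into Cloud Tasks is deliberately left out.

```javascript
// Splits a list of URLs into fixed-size chunks; each chunk would become one task.
function chunkUrls(urls, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < urls.length; i += chunkSize) {
    chunks.push(urls.slice(i, i + chunkSize));
  }
  return chunks;
}

// Builds one plain payload object per chunk, ready to be serialized into a task body.
function buildTaskPayloads(urls, chunkSize) {
  const chunks = chunkUrls(urls, chunkSize);
  return chunks.map((chunk, index) => ({ index, total: chunks.length, urls: chunk }));
}
```

Smaller chunks give the queue's rate limit finer control over how fast the target site is hit, at the cost of more tasks to enqueue.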
I'm also working on this task. I need to crawl a website and have the same problem.
Instead of running the main crawler task on the VM, I moved the task to Google Cloud Functions. The task consists of getting the target URL, scraping the page, saving the result to Datastore, and returning the result to the caller.
This is how it works: I have a long-running application that can be called a master. The master knows which URLs we are going to access. But instead of accessing the target website itself, it sends the URL to a crawler function in GCF. The crawling task is then done there and the result is sent back to the master. In this case, the master only requests and receives a small amount of data and never touches the target website, leaving the rest to GCF. You can offload your master and crawl the website in parallel via GCF. You can also use another method to trigger GCF instead of an HTTP request.

Node.JS with forever on Heroku

So, I need to run my node.js app on Heroku. It works very well, but when my app crashes, I need something to restart it, so I added forever to package.json and created a file named forever.js with this:
var forever = require('forever');
var child = new (forever.Monitor)('web.js', {
  max: 3,
  silent: false,
  options: []
});
//child.on('exit', this.callback);
child.start();
forever.startServer(child);
on my Procfile (that heroku uses to know what to start) i put:
web: node forever.js
Alright! Now every time my app crashes it auto-restarts, but from time to time (almost every hour), Heroku starts throwing H99 - Platform error, and about this error, they say:
Unlike all of the other errors which will require action from you to correct, this one does not require action from you. Try again in a minute, or check the status site.
But I just manually restart my app and the error goes away, if I don't do that, it may take hours to go away by itself.
Can anyone help me here? Maybe this is a forever problem? A heroku issue?
This is an issue with free Heroku accounts: Heroku automatically kills unpaid apps after 1 hour of inactivity, and then spins them back up the next time a request comes in. (As mentioned below, this does not apply to paid accounts. If you scale up to two servers and pay for the second one, you get two always-on servers.) - https://devcenter.heroku.com/articles/dynos#dyno-sleeping
This behavior is probably not playing nicely with forever. To confirm this, run heroku logs and look for the lines "Idling" and " Stopping process with SIGTERM" and then see what comes next.
Instead of using forever, you might want to try using the Cluster API and automatically create a new child each time one dies. http://nodejs.org/api/cluster.html#cluster_cluster is a good example; you'd just put your code into the else block.
The upshot is that your app is now much more stable, plus it gets to use all of the available CPU cores (4 in my experience).
The downside is that you cannot store any state in memory. If you need to store sessions or something along those lines, try out the free Redis To Go addon (heroku addons:add redistogo).
Here's an example that's currently running on heroku using cluster and Redis To Go: https://github.com/nfriedly/node-unblocker
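The cluster approach described above can be sketched like this. It is a minimal illustration rather than the code from the linked repository: `superviseWorkers` and `main` are made-up names, and the `clusterApi` parameter exists only so the re-fork logic can be exercised without spawning real processes.

```javascript
const cluster = require('cluster');
const os = require('os');

// Forks `count` workers and forks a replacement whenever one exits.
function superviseWorkers(count = os.cpus().length, clusterApi = cluster) {
  for (let i = 0; i < count; i++) {
    clusterApi.fork();
  }
  clusterApi.on('exit', (worker, code, signal) => {
    console.error(`worker ${worker.process.pid} exited (${signal || code}); forking a replacement`);
    clusterApi.fork();
  });
}

// Hypothetical entry point: the primary process supervises, while each
// worker runs the actual server (the "else block" mentioned above).
function main(startServer) {
  // `isPrimary` exists on newer Node versions, `isMaster` on older ones.
  if (cluster.isPrimary ?? cluster.isMaster) {
    superviseWorkers();
  } else {
    startServer();
  }
}
```

With this in place the Procfile would start the file that calls `main(...)` directly, with no forever wrapper. A worker that crashes immediately on boot would be re-forked in a tight loop, so some backoff is worth adding in practice.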
UPDATE: Heroku has recently made some major changes to how free apps work, and the big one is they can only be online for a maximum of 18 hours per day, making it effectively unusable as a "real" web server. Details at https://blog.heroku.com/archives/2015/5/7/heroku-free-dynos
UPDATE 2: They changed it again. Now, if you verify your ID, you can run 1 free dyno constantly: https://blog.heroku.com/announcing_heroku_free_ssl_beta_and_flexible_dyno_hours#flexible-free-dyno-hours

Resources