Doing tasks before Heroku Node.js server is ready - node.js

When deploying a new release, I would like my server to do some tasks before the new version actually goes live and starts listening for HTTP requests.
Let's say those tasks take around a minute and set some variables: until the tasks are done, I would like users to be routed to the old release.
Basically, I want to do some Node.js work before the server is ready.
I tried a naive approach:
doSomeTasks().then(() => {
  app.listen(PORT);
});
But as soon as the new version is released, all HTTP requests made while the tasks are running fail instead of being routed to the old release.
I have read https://devcenter.heroku.com/articles/release-phase, but it looks like I can only run an external script there, which is not good for me since my tasks set cache variables.
I know this is possible with /check_readiness on App Engine, but I was wondering about Heroku.

You have a couple of options.
If the work you're doing only changes on release, you can add a task to your dyno build stage that fetches data and stores it inside the compiled slug, which is then deployed to virtual containers on Heroku and booted as your dyno. For example, you can run a task in your build cycle that fetches data and caches it as a file in your app, then read that file on boot.
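For instance, the Heroku Node.js buildpack runs a heroku-postbuild script if your package.json defines one, so a build-time task can be wired up roughly like this (the script name, data URL, and file layout are illustrative assumptions, not from the question):
// scripts/fetch-data.js -- run at build time via package.json:
//   "scripts": { "heroku-postbuild": "node scripts/fetch-data.js" }
const fs = require('fs');
const https = require('https');

// Assumed data source; replace with whatever your tasks actually fetch.
https.get('https://example.com/data.json', (res) => {
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => {
    // This file is baked into the slug, so every dyno boots with it.
    fs.writeFileSync('cached-data.json', body);
  });
});
At boot, your server then just reads cached-data.json synchronously before calling app.listen, which takes milliseconds instead of a minute.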
If this data changes more frequently (e.g. daily), you can use Preboot to capture and cache this data on a per-dyno basis. Depending on the data and the architecture of your app, be cautious with this approach when running multiple dynos: each dyno fetches its data independently, so the data may not match across instances of your application. This can lead to subtle, hard-to-diagnose bugs.
This is a great option if you need to, for example, pre-cache a larger chunk of data and then fetch only new data on a per-request basis (e.g. fetch the last 1,000 posts in an RSS feed on-boot, then per request fetch anything newer—which is likely to be fewer than a few new entries—and coalesce the data to return to the client).
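A minimal sketch of that boot-time pattern (fetchLastPosts and its feed source are assumptions; with Preboot enabled, the old dynos keep serving traffic until this dyno binds to its port):
const express = require('express');
const app = express();

let cachedPosts = [];

// Stub standing in for "fetch the last 1,000 posts in the feed".
async function fetchLastPosts(limit) {
  return []; // a real implementation would fetch and parse the feed
}

app.get('/posts', (req, res) => res.json(cachedPosts));

async function boot() {
  // Work done before listen(); Preboot keeps traffic on the old dynos
  // until this process is actually accepting connections.
  cachedPosts = await fetchLastPosts(1000);
  app.listen(process.env.PORT || 3000);
}

boot();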
Here's the documentation on customizing a build process for Node.js on Heroku.
Here's the documentation for enabling and working with Preboot on Heroku.

I don't think this is a good approach. You can use an external script (an npm script) to do this task and then use the release phase. The situation here is very similar to running migrations: you can require the needed libraries in the script, and you can even load the whole application in the script without listening on a port. Let's make it clearer with an example:
//script file
var client = require('cache_client');
// and here you can require all the needed libraries for the script
// then execute your logic using sync APIs
client.setCacheVar('xyz','xyz');
Then, in package.json under "scripts", add this script. Let's assume you named the file set_cache.js:
"scripts": {
"set_cache": "set_cache",
},
Now you can run this script with npm run set_cache and use that command in your Procfile:
web: npm start
release: npm run set_cache
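One caveat worth adding: most real cache clients (Redis, Memcached, etc.) expose asynchronous APIs, so the release script usually has to wait for the write to complete and then exit with an explicit status code, since a non-zero exit aborts the release on Heroku. A sketch under that assumption, keeping the hypothetical cache_client from above but giving it a promise-based API:
// set_cache.js
var client = require('cache_client'); // hypothetical client, as above

client.connect()
  .then(function () { return client.setCacheVar('xyz', 'xyz'); })
  .then(function () {
    console.log('cache warmed');
    process.exit(0); // success: the release proceeds
  })
  .catch(function (err) {
    console.error(err);
    process.exit(1); // failure: Heroku aborts the release
  });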

Ways to overcome the 230s hard coded limit to the SCM endpoint

Background
We have a PHP App Service and MySQL that is deployed using an Azure DevOps Pipeline (YML). The content itself is a PHP site that is packaged up into a single file using Akeeba by an external supplier. The package is a Zip file (which can be deployed as a standard Zip deployment) and inside the Zip file is a huge JPA file. The JPA is essentially the whole web site plus database tables, settings, file renames and a ton of other stuff all rolled into one JPA file. Akeeba essentially unzips the files, copies them to the right places, does all the DB stuff and so on. To kick the process off, we can simply connect to a specific URL (web site + path) and run the PHP, which does all the clever unpackaging via a web GUI. But we want to include this stage in the pipeline instead so that the process is fully automated end to end. Akeeba has a CLI as an alternative to the Web GUI deployment, so it should go like this:
Create web app
Deploy the web site ZIP (zipDeploy)
Use the REST API to access Kudu and run the relevant command (php install.php web.jpa) to unpack the jpa and do the MySQL stuff - this normally takes well over 30 minutes (it is a big site and it has a lot of "stuff" to do - but, it does actually work in the end).
The problem is that the SCM REST API has a hard-coded 230s limit as described here: https://blog.headforcloud.com/2016/11/15/azure-app-service-hard-timeout-limit/
So, the unpack stage keeps throwing "Invoke-RestMethod : 500 - The request timed out" exactly on the 230s mark.
We have tried SCM_COMMAND_IDLE_TIMEOUT and WEBJOBS_IDLE_TIMEOUT but, unsurprisingly, they did not make any difference.
$cmd=@{"command"="php .\site\wwwroot\install.php .\site\wwwroot\web.jpa .\site\wwwroot"}
Invoke-RestMethod -Uri $url -Headers @{"Authorization"="Basic $creds"} -Body (ConvertTo-Json($cmd)) -Method Post -ContentType "application/json" -TimeoutSec 7200
I can think of a few hypothetical ways around it (some quite eccentric):
Find another way to run CLI commands inside the Web App after deployment other than the Kudu REST API. Is there such a thing? I Googled and checked SO but all I found were pointers to the way we do it (or try to do it) now.
Use something like Selenium to click the GUI buttons instead of using the CLI. (I do not know if they would suffer a timeout.)
Instead of running the command via Kudu REST, use the same API to create and deploy a script to the web server, start it and then let the REST API exit whilst the script still runs on the Web App. Essentially, bodge an async call but without the callback and then have the pipeline check in on the site at, say, 5 minute intervals. Clunky.
Extend the 230s limit - but I do not think that Microsoft make this possible.
Make the web site as fast as possible during the deployment in the hope of getting it under the 4-minute mark and then down-scale it. Yuk!
See what the Akeeba JPA unpacking actually does, unpack it pre-deployment and do what the unpackage process does but controlled via the Pipeline. This is potentially a lot of work and would lose the support of the supplier.
Give up on an automated deployment. That would rather defeat much of the purpose of a Devops pipeline.
Try AWS + Terraform instead. That's not an approved infrastructure environment, however.
Given that Microsoft understandably do not want long-running API calls hanging around, I understand why the limit exists. However, I would therefore expect there to be some other mechanism for interacting with an App Service file system via a CLI. Does anyone know of one?
The 4-minute idle timeout is at the TCP level and is implemented on the Azure hardware load balancer. This timeout is not configurable and cannot be changed. One thing I want to mention is that this is an idle timeout at the TCP level, which means it is only hit if the connection is idle and no data transfer is happening. To put it concretely, it will be hit if the web application receives the request and keeps processing it for more than 4 minutes without sending any data back.
Resolution
Ideally, in a web application it is not good to keep the underlying HTTP request open, and 4 minutes is a decent amount of time. If you have a requirement for background processing within your web application, then the recommended solution is to use Azure WebJobs and have the Azure Web App interact with the WebJob, which notifies the app once the background processing is done (Azure provides many ways to do this, such as queue triggers, and you can choose the method that suits you best). Azure WebJobs are designed for background processing and you can do as much background processing as you want within them. I am sharing a few articles that talk about WebJobs in detail:
· http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
· https://azure.microsoft.com/en-us/documentation/articles/websites-webjobs-resources/
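If you want to keep the question's pipeline-driven approach while following this advice, Kudu exposes REST endpoints for triggered WebJobs (POST /api/triggeredwebjobs/{name}/run to start one, GET /api/triggeredwebjobs/{name} to read its latest run), so the pipeline can kick the job off and then poll instead of holding one request open past the timeout. A rough Node.js sketch; the job name, hostname, credential variables, and status strings are assumptions to verify against the Kudu docs:
// Requires Node 18+ for the global fetch API.
const base = 'https://yoursite.scm.azurewebsites.net/api/triggeredwebjobs/install-jpa';
const auth = 'Basic ' + Buffer.from(
  process.env.KUDU_USER + ':' + process.env.KUDU_PASS).toString('base64');

async function run() {
  // Start the WebJob; this call returns quickly.
  await fetch(base + '/run', { method: 'POST', headers: { Authorization: auth } });

  // Poll every 5 minutes until the run reaches a final state.
  for (;;) {
    await new Promise((resolve) => setTimeout(resolve, 5 * 60 * 1000));
    const res = await fetch(base, { headers: { Authorization: auth } });
    const info = await res.json();
    const status = info.latest_run && info.latest_run.status;
    console.log('WebJob status:', status);
    if (status === 'Success') return;
    if (status === 'Failed' || status === 'Aborted') throw new Error('WebJob failed');
  }
}

run().catch((err) => { console.error(err); process.exit(1); });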
============================================================================
It totally depends on the app. Message Queue comes to mind. There are a lot of potential solutions and it will be up to you to decide.
============================================================================
Option #1)
You can change the code to send some sort of continue header to the client to keep the session open.
A sample is shown here
This shows the HTTP Headers with the Expect 100-continue header:
https://msdn.microsoft.com/en-us/library/aa287673%28v=vs.71%29.aspx?f=255&MSPPError=-2147217396
This shows how to add a Header to the collection:
https://msdn.microsoft.com/en-us/library/aa287502(v=vs.71).aspx
Option #2) Progress bar
This sample shows how to use a progress bar:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/d84f4c89-ebbf-44d3-bc4e-43525ae1df45/how-to-increase-progressbar-when-i-running-havey-query-in-sql-server-or-oracle-?forum=csharpgeneral
Option #3) A common practice to keep the connection active for a longer period is to use TCP keep-alive: packets are sent when no activity is detected on the connection. By keeping up ongoing network activity, the idle timeout value is never hit and the connection is maintained for a long period.
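For illustration, the same idea can be applied one level up in a Node.js server: stream a byte of progress output periodically while the long task runs, so the connection never looks idle to the load balancer. A sketch, where runLongTask is an assumed placeholder for the actual work:
const http = require('http');

// Stub so the sketch runs standalone; replace with the real long task.
function runLongTask() {
  return new Promise((resolve) => setTimeout(resolve, 10 * 60 * 1000));
}

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });

  // Heartbeat: one byte every 30 seconds keeps the connection non-idle.
  const heartbeat = setInterval(() => res.write('.'), 30 * 1000);

  runLongTask()
    .then(() => { clearInterval(heartbeat); res.end('done'); })
    .catch((err) => { clearInterval(heartbeat); res.end('failed: ' + err.message); });
}).listen(process.env.PORT || 3000);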
Option #4) You can also try hosting your application in an IaaS VM instead of an App Service. This may avoid the ARR timeout issue because the architecture is different, and I believe the timeout is configurable there.

Using 'cron' module in node.js on heroku server

I am using cron in node.js to schedule a function that sends text messages at user-determined times. It works on my local server, but when I deploy to Heroku the functions never get called.
I'll elaborate a little more on what @rsp said above, so that if anyone else finds this question they'll understand why using Heroku Scheduler is the correct answer here.
When you're running software on Heroku, what happens is that Heroku will take your project, and run it on a random dyno (server) in the Heroku collection of servers on Amazon.
For a number of reasons (including to help distribute application load evenly across a large number of Amazon servers), Heroku will periodically move your dyno from one Amazon server to another. This happens many times per day, automatically, behind the scenes.
This means that when running on Heroku, your application code will be restarted periodically.
Now -- this isn't a bad thing from an end-user perspective, because while your application code is restarting, Heroku will queue up incoming requests, then just send them to your application once it's been successfully restarted on a new host. So to the end user, this behavior is 100% transparent.
What's important to know here, though, is that since your application code may randomly restart, you shouldn't use it to do things like run long tasks that take a while to finish, or queue up jobs to be executed in the future (what the cron module does).
Instead: Heroku created a free scheduler addon that you can use to basically say "Hey Heroku, run this Node script every minute|hour|etc."
The scheduler addon Heroku provides will reliably execute your cron task because Heroku keeps track of that timing stuff separately (outside of your application logic).
Hopefully this helps!
I'm using a cron job on heroku with node. Here is my top level stuff
var CronJob = require('cron').CronJob;

var dailyJob = new CronJob({
  cronTime: '0 0 0 * * *',
  onTick: function () {
    // Do daily function
    console.log('I get called 1 time a day.');
  },
  start: false
});

dailyJob.start();
On Heroku you may need to use the Heroku Scheduler.
See: https://devcenter.heroku.com/articles/scheduler
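With the Scheduler add-on, the timing lives outside your dyno: you add a job in the add-on's dashboard and give it a command to run on the chosen interval. A minimal sketch of such a script (the file name and sendTextMessages logic are assumptions standing in for the question's SMS code):
// bin/send-texts.js -- configured in Heroku Scheduler as: node bin/send-texts.js
function sendTextMessages() {
  // Placeholder for the user-determined text-message logic.
  console.log('sending scheduled texts at', new Date().toISOString());
}

sendTextMessages();
// The process simply exits when done; Scheduler boots a fresh
// one-off dyno for each scheduled run.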

Zero downtime deploy with node.js and mongodb?

I'm looking at building a global app from the ground up that can be updated and scaled transparently to the user.
The architecture so far is very simple: each part of the application has its own process and talks to the others through sockets.
This way I can spawn as many instances as I want for each part of the application and distribute them across the globe according to my needs.
At the front of the system I'll have a load balancer that routes users to their closest instance, and when new code is deployed my instances will spawn new processes with the new code, route new requests to them, and gracefully shut down the old ones.
Thank you very much for any advice.
Edit:
The question is: What is the best (and simplest) solution for achieving zero downtime when deploying node to multiple instances?
About the app:
https://github.com/Raynos/boot for "socket" connections,
http for http requests,
mongo for database
Solutions i'm trying at the moment:
https://www.npmjs.org/package/thalassa (which manages haproxy configuration files and app instances); if you don't know it, watch this talk: https://www.youtube.com/watch?v=k6QkNt4hZWQ and be aware that crowsnest is being replaced by https://github.com/PearsonEducation/thalassa-consul
Deployment with zero downtime is only possible if the data you share between old and new nodes is compatible.
So in case you change the structure, you have to build an intermediate release that can handle both the old and the new data structure without using the new structure, until you have replaced all nodes with that intermediate version. Then roll out the new version.
Taking nodes in and out of production can be done with your load balancer (with a grace time until all sessions on the node have expired; I don't know enough about your application to be more specific).
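At the process level, the building block for taking a node out of production cleanly is a graceful shutdown: stop accepting new connections, let in-flight requests finish, then exit so the load balancer's grace period can do its job. A minimal Node sketch (the SIGTERM trigger and the 30-second cutoff are assumptions):
const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));
server.listen(process.env.PORT || 3000);

process.on('SIGTERM', () => {
  // Stop accepting new connections; in-flight requests may finish.
  server.close(() => process.exit(0));
  // Safety net: force-exit if connections linger past the grace period.
  setTimeout(() => process.exit(1), 30 * 1000).unref();
});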

How to run cronjobs on local files on cloudControl PaaS?

On cloudControl, I can either run a local task via a worker or I can run a cronjob.
What if I want to perform a local task on a regular basis (I don't want to call a publicly accessible website)?
I see two possible solutions:
According to the documentation,
"cronjobs on cloudControl are periodical calls to a URL you specify."
So calling the file locally is not possible(?). I'd have to create a page I can call via URL, and I'd have to check whether the client is on localhost (= the server). I would like to avoid this approach.
I make the worker sleep() for the desired amount of time and then make it re-run.
// do some arbitrary action
Foo::doSomeAction();
// e.g. sleep 1 day
sleep(86400);
// restart worker
exit(2);
Which one is recommended?
(Or: Can I simply call a local file via cron?)
The first option is not possible, because the URL request is made from a separate web service.
You could use HTTP authentication in the cron task, but the worker solution is also completely valid.
Just keep in mind that the worker can get migrated to a different server (in case of software updates or hardware failure), so doSomeAction() may occasionally get executed more often than once per day.

Node.JS with forever on Heroku

So, I need to run my node.js app on Heroku. It works very well, but when my app crashes I need something to restart it, so I added forever to package.json and created a file named forever.js with this:
var forever = require('forever');

var child = new (forever.Monitor)('web.js', {
  max: 3,
  silent: false,
  options: []
});

//child.on('exit', this.callback);
child.start();
forever.startServer(child);
In my Procfile (which Heroku uses to know what to start) I put:
web: node forever.js
Alright! Now every time my app crashes it auto-restarts, but from time to time (almost every hour) Heroku starts throwing H99 - Platform error, and about this error they say:
Unlike all of the other errors which will require action from you to correct, this one does not require action from you. Try again in a minute, or check the status site.
But I just manually restart my app and the error goes away; if I don't do that, it may take hours to go away by itself.
Can anyone help me here? Maybe this is a forever problem? A heroku issue?
This is an issue with free Heroku accounts: Heroku automatically kills unpaid apps after 1 hour of inactivity, and then spins them back up the next time a request comes in. (As mentioned below, this does not apply to paid accounts. If you scale up to two servers and pay for the second one, you get two always-on servers.) - https://devcenter.heroku.com/articles/dynos#dyno-sleeping
This behavior is probably not playing nicely with forever. To confirm this, run heroku logs and look for the lines "Idling" and "Stopping process with SIGTERM" and then see what comes next.
Instead of using forever, you might want to try using the cluster API and automatically create a new child each time one dies. http://nodejs.org/api/cluster.html#cluster_cluster is a good example; you'd just put your code into the else block.
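A condensed sketch of that pattern (the request handler is a placeholder for the code in web.js):
var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core.
  os.cpus().forEach(function () { cluster.fork(); });

  // Respawn a worker whenever one dies -- this replaces forever's job.
  cluster.on('exit', function () { cluster.fork(); });
} else {
  // The "else block": your actual app code goes here.
  http.createServer(function (req, res) { res.end('hello'); })
      .listen(process.env.PORT || 3000);
}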
The upshot is that your app is now much more stable, plus it gets to use all of the available CPU cores (4 in my experience).
The downside is that you cannot store any state in memory. If you need to store sessions or something along those lines, try out the free Redis To Go addon (heroku addons:add redistogo).
Here's an example that's currently running on heroku using cluster and Redis To Go: https://github.com/nfriedly/node-unblocker
UPDATE: Heroku has recently made some major changes to how free apps work, and the big one is they can only be online for a maximum of 18 hours per day, making it effectively unusable as a "real" web server. Details at https://blog.heroku.com/archives/2015/5/7/heroku-free-dynos
UPDATE 2: They changed it again. Now, if you verify your ID, you can run 1 free dyno constantly: https://blog.heroku.com/announcing_heroku_free_ssl_beta_and_flexible_dyno_hours#flexible-free-dyno-hours