Calculating large data in Node on Heroku

I have a service that runs daily in the background against a database about 140 MB in size. The calculations require me to load all 140 MB into Node at once, so after a minute or so the process hits the 512 MB limit and Heroku restarts it.
For the time being, a quick solution is to upgrade the server to 2X so I get 1 GB of RAM, but within a month or so the database will outgrow that as well.
As far as Heroku goes, is my only real option to keep upgrading dyno sizes? Since these are calculations I run once per day, I would rather run them locally on my machine and upload the results than pay $250-500/month for Performance dynos.
I know I could also upgrade to a Performance dyno to run these services and then downgrade once finished, but I'm looking for something I can automate rather than deal with each day.
Thanks for reading.

Heroku Scheduler seems to fit your use case exactly. You can schedule your task to run daily on a One-Off Dyno of any size, and since Heroku pricing is "prorated to the second" you will only pay for the time that your task is running on that Dyno.
I haven't actually used this, but I was about to recommend a similar solution on AWS when I searched and found this feature of Heroku.
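The detail that makes this cheap is that a one-off dyno stops, and stops billing, as soon as its process exits. A minimal sketch, assuming the daily job lives in a script called calculate.js (a hypothetical name):

    // calculate.js -- hypothetical daily job, run by Heroku Scheduler
    // as "node calculate.js" on whatever dyno size the job needs.
    async function main() {
      // ... load the data and run the daily calculations here ...
    }

    main()
      .then(() => process.exit(0))    // exit so the one-off dyno stops billing
      .catch((err) => {
        console.error(err);
        process.exit(1);
      });

In the Scheduler dashboard you would then add a daily job with that command and pick a larger dyno size for just this job, so the always-on web dyno can stay small.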

Related

node cron jobs on heroku, is using the heroku scheduler necessary?

I'm building my app backend as a node/express API to be deployed on Heroku.
I am new to implementing "cron jobs", and I found the npm library node-cron, which seems very straightforward.
Is it as simple as setting up the cron job in my app's runtime code? I know it will not run when a Heroku free dyno goes into "sleep mode" (based on other Stack Overflow answers), but I plan to use paid dynos in production, so that's not an issue.
My main concern is what happens when I "scale up" on Heroku and run multiple dynos. Will every "instance" of my app on a separate dyno run its own crons independently, duplicating the work?
I know Heroku provides a free "scheduler" add-on that spins up dynos for this, but if the above won't happen, the scheduler add-on seems like unneeded overhead in my case.
Notes: my cron will be very simple, just cleaning up old records in the database. I didn't want to do it in the database layer, to keep things simple, since scheduling jobs in Postgres doesn't seem very easy.
Any insights will be much appreciated. Thank you.
Everything you describe matches my past experience with the same situation.
The npm package node-cron only works as expected if you have a single dyno; otherwise the job fires once per dyno, so the work is duplicated.
If you want the cron to run exactly once, no matter how many dynos you have, I suggest you use the Heroku Scheduler add-on.
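If you do stay with node-cron on a single-dyno app, a minimal sketch looks like this. The DYNO environment-variable guard is my own suggestion (not part of the answer above) and is best-effort only, since Heroku's dyno naming is subject to change:

    const cron = require('node-cron');

    // Heroku names dynos like "web.1", "web.2", ...
    // Only schedule the job on web.1 so that scaling out later
    // does not duplicate the work (a best-effort guard).
    if (!process.env.DYNO || process.env.DYNO === 'web.1') {
      // Run the cleanup every day at 03:00.
      cron.schedule('0 3 * * *', () => {
        // ... delete old records here ...
        console.log('cleanup finished');
      });
    }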

Azure App Service: How can I determine which process is consuming high CPU?

UPDATE: I've figured it out. See the end of this question.
I have an Azure App Service running four sites. One of the sites has two deployment slots in addition to the primary one. Recently I've been seeing really high CPU utilization for the App Service plan as a whole.
The dark orange line shows the CPU percentage. This is just after restarting all my sites, which brought it down to this level.
However, when I look at the CPU use reported by each site, it's really low.
The darker blue line shows the CPU time, which is basically nothing. I did this for all of my sites, and all the graphs look the same. Basically, it seems that none of my sites are causing the issue.
A couple of the sites have web jobs, so I took a look at the logs but everything is running fine there. The jobs run for a few seconds every few hours.
So my question is: how can I determine the source of this CPU utilization? Any pointers would be greatly appreciated.
UPDATE: Thanks to the replies below, I was able to get more detail into what was happening. I ended up getting what I needed from SCM / Kudu tools. You can get here by going to your web app in Azure and choosing Advanced Tools from the side nav. From the Kudu dashboard, choose Process Explorer. The value in the Total CPU Time column is not directly useful, because it's the time in seconds that the process has run since it started, which might have been minutes or days ago.
However, if you make a record of the value at intervals, you can look at the change over time, and one process might jump out at you. In my case, it was my WebJobs process. Every 60 seconds, this one process was consuming about 10 seconds of processor time, just within one environment.
The great thing about this Kudu dashboard is, if you can catch the problem while it is actually happening, you can hit the Start Profiling button and capture a diagnostic session. You can then open this up in Visual Studio and get some nice details about where the CPU time is being spent.
Just in case anyone else is seeing similar issues, I'll provide more details about my particular case. As I mentioned, my WebJobs exe was the culprit, and I found that all the CPU time was being spent in StackExchange.Redis.SocketManager, which manages connections to Azure Redis Cache. In my main web app I create only one connection, as recommended. But since my WebJobs only run every once in a while, I was creating a new connection to Azure Redis Cache each time one ran, which apparently can lead to issues. I changed my code to create the Redis Cache connection once when the WebJob process starts up and to reuse that connection whenever an individual WebJob runs.
Time will tell if this really fixes the issue, but I think it will. When the problem occurred, it always fit the same pattern: After a few days of running fine, my CPU would slowly ramp up over the course of about 12 hours. My thinking is that each time a WebJob ran, it created a connection object, which at first didn't produce trouble, but gradually as WebJobs ran every hour or two, cruft was building up until finally some critical threshold was met and the CPU usage would take off.
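The original fix was in .NET with StackExchange.Redis, but the connection-reuse pattern itself is language-agnostic. A sketch of the same idea in Node (using the ioredis package, my choice for illustration, not the poster's):

    const Redis = require('ioredis');

    // Create the connection once, when the worker process starts,
    // not once per job run. The env var name is a placeholder.
    const redis = new Redis(process.env.REDIS_URL);

    async function runJob() {
      // Every job run reuses the same connection.
      await redis.set('last-run', new Date().toISOString());
    }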
Hope this helps someone out there. Best wishes!
Maybe you should go to the web app's SCM (Kudu) site?
https://<yourAppName>.scm.azurewebsites.net
There is a page there that shows all processes currently running on your web app (the Process Explorer).
You can also reach the support page from the top-right corner of the SCM site.
You can find some more information about your performance there and take a memory dump (not for this problem specifically, but it is useful for performance issues).
Based on your description, you could use the Crash Diagnoser site extension to capture dump files from your Web Apps and WebJobs whenever CPU usage rises above a specific threshold, which should help isolate the issue. For more details, refer to the official blog.

Heroku how many apps on a dyno

I'm running my blog on a Heroku dyno, and too often my users have to wait almost half a minute for the blog to respond. There are ways to prevent Heroku from idling (see: Easy way to prevent Heroku idling?); the most obvious is to ping the server every minute or so.
But those methods seem to be against Heroku's TOS, judging by the pricing page: https://www.heroku.com/pricing (see "MUST SLEEP 6 HOURS IN A 24 HOUR PERIOD"). And because Pingdom costs me money as well, I'm thinking of paying $7 a month for the Hobby package. But how many apps can you run with that package? I always run one app per dyno, but if I have to pay $7 per app, that seems too much.
Does anyone know a way to run multiple apps on one dyno? Or is renting a server at DigitalOcean with Node.js a better choice, for example?
The free and hobby dyno types only support a maximum of one dyno running per process type. Additionally, applications using a free dyno type are limited to a maximum of two concurrent running dynos.
By default, a process type can’t be scaled to more than 100 dynos for standard-1X or standard-2X sized dynos. A process type can’t be scaled to more than 10 dynos for performance dynos.

Heroku workers for node.js

I am starting out with Heroku and I have a web app with a part that needs to run once a week (preferably on Mondays). I have been reading about workers (here, here, and here), but I still have many doubts:
1) These workers run in the background without strict control and can't be scheduled to run once a week, or am I wrong? If I am wrong, how can I schedule them?
2) To make them work, what exactly do I need to do? Type
web: node webApp.js
worker: node worker.js
in the Procfile (where worker.js is the part of the program that needs to run only once a week)? And that is all? Nothing else? Is it really that easy?
3) And the last one, but the most important: the thorny matter of money. One dyno is the same as one worker, so if you have a dyno running the web process, you need to buy another for the worker, no? And on the price list an extra dyno costs $34.50 (€27.87). That isn't cheap, so I want to know if I am right: do you have to buy a second dyno if you want to run a worker?
You might find that the Heroku Scheduler add-on (https://devcenter.heroku.com/articles/scheduler) is a "good enough" low-cost option. You are charged only for the hours your scheduled tasks actually run, so a regular job that takes a short time works out much cheaper than a continuous worker process.
It's not as flexible with scheduling as other options: it can run a task hourly or at a specific time every day. So if you need your task to run only on Mondays, you would have the scheduler run daily, check the day inside your worker.js, and exit immediately on other days, as sketched below.
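A minimal sketch of that day check, assuming the scheduled command is node worker.js:

    // worker.js -- run daily by Heroku Scheduler; getDay() returns 1 on Monday.
    if (new Date().getDay() !== 1) {
      console.log('Not Monday, nothing to do.');
      process.exit(0);   // exit right away, so the dyno barely accrues any cost
    }

    // ... the weekly, Monday-only work goes here ...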

Which Node.js Concurrent Web Server is best on Heroku?

I have just learned about Heroku and was excited to test it out. I quickly assembled their demos with the Node.js language and stumbled across a problem. When running the application locally, Apache Bench reports roughly 3,500 requests/s, but on the cloud that drops to 10 requests/s and does not rise or fall with network latency. I cannot believe this is the performance they are asking 5 cents/hour for, and I strongly suspect my application is not multi-threaded.
This is my code.js: http://pastebin.com/hyM47Ue7
What configuration do I need to apply to get it running faster on Heroku? Or what other web servers for Node.js could I use?
I am thankful for every answer on this topic.
Your little example is not multi-threaded (not even on your own machine), but you don't need to pay for more dynos immediately, as you can make use of multiple cores on a dyno; see this answer: Running Node.js App with cluster module is meaningless in Heroku?
To repeat that answer: a Node solution for using multiple processes, which should increase your throughput, is the built-in cluster module (a sketch follows below).
I would guess you can easily get more than 10 req/s from a Heroku dyno; see this benchmark, for example:
http://openhood.com/ruby/node/heroku/sinatra/mongo_mapper/unicorn/express/mongoose/cluster/2011/06/14/benchmark-ruby-versus-node-js/
What do you use to benchmark?
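For reference, a minimal sketch of the cluster approach mentioned above (the request handler is just a placeholder):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isMaster) {
      // Fork one worker process per available CPU core.
      os.cpus().forEach(() => cluster.fork());
    } else {
      // All workers share the same listening port.
      http.createServer((req, res) => {
        res.end('hello\n');
      }).listen(process.env.PORT || 3000);
    }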
You're right, the web server is not multi-threaded until you pay for more web dynos. I've found Heroku handy for prototyping; depending on the monetary value of your time, you may or may not want to use it rather than setting up a scalable server directly on EC2.
