Azure: when should I use startup tasks? - azure

I understand that startup tasks are used to set up your system. For example, if your code is written in Python, you can add a startup task to install Python. But can't this also be done in the ProgramEntryPoint batch script? What's the difference?

It's true that if you use the ProgramEntryPoint, there doesn't seem to be a reason to use startup tasks. You can indeed include all the logic in that same batch file.
Startup tasks become more useful when working with .NET WebRoles/WorkerRoles. There your only options are to write code (which could again call a single batch file that calls other batch files) and/or to use startup tasks.
But from a maintenance point of view, it's much cleaner to use startup tasks for everything related to configuring and installing your instance. You draw a clear line between configuration/installation and your actual application. You could see this as separation of concerns, which makes the project easier to understand for other or new developers.
Besides that, tasks can be executed in different contexts (limited / elevated), which might be important from a security perspective. Tasks also come in different types (simple, background, foreground) that suit different scenarios (a background app that constantly pings your site, for example). If you don't use tasks, you might need to handle all of this yourself.
Here is a good blog post covering the details of startup tasks: Using Startup Task in Windows Azure detailed summary

Great answer from Sandrino. In a nutshell - you would use startup tasks if you want some code to execute (or start executing) before your role starts. If that is not a constraint you can always execute any process (including batch scripts) from the OnStart method of the Role. One case where I have used startup tasks in the past is to install the NewRelic monitoring agent. I wanted that running to profile my app before the actual app started.

You will probably not be able to install Python from the ProgramEntryPoint, since the install will probably require elevated ("admin") privileges.
A role (web/worker) usually does not have elevated privileges (it might be possible, but it is bad practice for obvious security reasons). So code in ProgramEntryPoint does not have elevated privileges.
On the other hand, a startup task can have elevated privileges. IMO, this is probably the biggest (perhaps the only?) benefit of using startup tasks.

Related

Is it possible to run different tasks on different schedules with prefect?

I'm taking my first steps with Prefect, and I'm trying to see what its degrees of freedom are. To this end, I'm investigating whether Prefect supports running different tasks on different schedules in the same Python process. For example, Task A might have to run every 5 minutes, while Task B might run twice a day with a Cron scheduler.
It seems to me that schedules are associated with a Flow, not with a task, so to do the above, one would have to create two distinct one-task Flows, each with its own schedule. But even so, given that running a flow is a blocking operation, I can't see how to "start" both flows concurrently (or pseudo-concurrently; I'm perfectly aware the flows won't execute on separate threads).
Is there a built-in way of getting the tasks running on their independent schedules? I'm under the impression that there is a way to achieve this, but given my limited experience with prefect, I'm completely missing it.
Many thanks in advance for any pointers.
You are right that schedules are associated with Flows and not Tasks, so the only place to add a schedule is a Flow. Running a Flow is a blocking operation only if you are using the open source Prefect Core on its own. For production use cases, it's recommended to run your Flows against Prefect Cloud or Prefect Server. Cloud is the managed offering, and Server is what you host yourself. Note that Cloud has a very generous free tier.
When using a backend, you will use an agent that will kick off the flow run in a new process. This will not be blocking.
To start with using a backend, you can check the docs here
This Prefect Discourse topic discusses a very similar problem and shows how you could solve it using a flow-of-flows orchestrator pattern.
One way to approach it is to leverage Caching to avoid recomputation of certain tasks that require lower-frequency scheduling than the main flow.
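The caching idea can be illustrated without Prefect at all. The sketch below is plain standard-library Python, not Prefect's caching API: a hypothetical `TtlCache` wraps the low-frequency task so that a loop ticking every 5 minutes only recomputes it once a 12-hour validity window has expired. The names `TtlCache`, `task_b`, and the injected `clock` are assumptions made for illustration only.

```python
from typing import Any, Callable, Optional


class TtlCache:
    """Cache one function's result and recompute only after ttl_seconds."""

    def __init__(self, func: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self._func = func
        self._ttl = ttl_seconds
        self._clock = clock              # injected so the sketch needs no real waiting
        self._expires_at: Optional[float] = None
        self._value: Any = None

    def get(self) -> Any:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._func()   # cache miss or expired: recompute
            self._expires_at = now + self._ttl
        return self._value               # otherwise reuse the cached value


calls = {"task_b": 0}


def task_b() -> int:
    """Hypothetical expensive task that should run only twice a day."""
    calls["task_b"] += 1
    return calls["task_b"]


# Simulated clock: the "flow" below ticks every 5 minutes for 24 hours.
fake_now = [0.0]
cached_b = TtlCache(task_b, ttl_seconds=12 * 3600, clock=lambda: fake_now[0])

for tick in range(24 * 12):              # 288 five-minute ticks
    fake_now[0] = tick * 300.0
    cached_b.get()                       # the high-frequency task would also run here

print(calls["task_b"])                   # number of real executions of task_b
```

In a real flow the same effect would come from whatever cache-validity setting the orchestrator offers; the point is only that the high-frequency schedule drives the run and the cache decides whether the slow task actually executes.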

Can the Linux crontab command replace Java Quartz?

I have a question about job schedulers: can the Linux crontab command replace Java Quartz?
I would like to know the advantages of Quartz. Could someone give some advice?
Depending on the scale of the problem being solved, using the cron scheduler provided by Linux will work well for many problems (on a single host). When you would like some fail over capability quartz is going to be the solution. Quartz can act as a clustered scheduler. Configured correctly, one node could be brought down for patching while the jobs running on quartz continue to process. There are also features of quartz that cron does not provide. Persistence and disallowing concurrent execution are two that I am using for a project. Those are some of the features that stand out to me. It would be best to check the documentation and look at some of the examples provided.
Cron is available by default on any Unix-based OS. Quartz is simply a Java API with more scheduling options. If you wish to schedule tasks within a Java application, Quartz is the way to go. If you wish to schedule ad hoc OS commands, cron is the way to go, unless you feel like writing your own generic scheduler.

Any limitations creating processes under Azure Web Sites (specifically Web Jobs)?

Are there any limitations on creating separate processes from an Azure Web Site (specifically, from a continuous Web Job)? I have an executable that often (about 20% of the time) stalls and eventually fails with exit code -1073741819 (access denied? or access violation?), but only when run as a separate process. If this work is retried later, it eventually succeeds (usually on the first retry).
When instead I call this logic directly via a .NET method call (so within the same process and app domain), the code succeeds 100% of the time. The same code also always succeeds when run locally, even when it creates a separate process.
Is there anything going on at the Azure Web Sites/Web Jobs level that I should be aware of, such as using Windows job objects or other security mechanisms to limit the creation or runtime of spawned processes? If not, any suggestions on how to diagnose further what might be going wrong? (I believe remote desktop to a web site isn't possible; anything else that would help "see" what's failing, such as whether there's a WER dialog appearing?)
In case it matters, the logic (in both cases) includes P/Invoking custom native code, and the web site I'm using is Always On, x64, Basic pricing tier.
@David Ebbo, thanks for the suggestion. I used it to help isolate the issue, and I ultimately found this was non-determinism in the code, made more likely in the Azure Web Sites environment but not 100% restricted to that context.
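On the "access denied or access violation?" question: the negative exit code is easier to read as an unsigned 32-bit hex value. A minimal sketch (the helper name `to_ntstatus` is made up for illustration):

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit process exit code as unsigned hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073741819))  # prints 0xC0000005
```

0xC0000005 is the value Windows uses for STATUS_ACCESS_VIOLATION, so this points to an access violation (for example, a bad pointer in the native code) rather than a permissions problem, which fits the non-determinism described above.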

Monitor node.js scripts running on ubuntu instance

I have a node.js script that runs once a day on an Ubuntu EC2 instance. This script pulls data from a few hundred thousand remote APIs and saves it to our local database. Is there any way we can monitor this node.js script on the remote server? There have been a few instances where the script crashed for some reason and we were unable to figure it out without SSHing into the instance and checking the logs. After the first few crashes, I did create a small system that sends us an email whenever the script crashes due to an uncaught exception, and also when the script completes execution.
However, we need to develop a better system where we can monitor the progress of script via web interface of our admin application which is deployed over some other instance and also trigger start/stop of script via this interface. What are possible options for achieving this?
If you like to stay in Node.js, then there are several process monitoring tools:
PM2 comes with lots of other features besides monitoring processes. You can monitor your processes via CLI or their official web interface: https://keymetrics.io/. A quick search on npm also gives a bunch of nice unofficial gui tools: https://www.npmjs.com/search?q=pm2+web
Forever is not as feature-rich as PM2 but handles the basic process operations, and a couple of GUIs are also available on npm.
There are two problems here that you are trying to solve:
Scheduling work to be done
Monitoring a process for failure
At a simple level, this is easy: schedule a cron job and restart failed things so they keep trying.
However, when things don't go smoothly, it helps to have a lot more granularity over what you are scheduling, and how it is executed. This would also give you the visibility over each little piece of work.
Adding a little more complexity, you can end up with something like this:
Schedule the script that starts everything (via cron, if that's comfortable)
That script generates several jobs that need to be executed into a queue
A worker process (or n worker processes) consume that queue and execute pending jobs
You can monitor both the progress of the jobs, as well as the state of each worker (# of crashes, failures, jobs completed, etc.). The other tools mentioned above are good candidates for this (forever, pm2, etc.)
When jobs fail, other workers can pick up the small piece of work that was in progress and restart it. This is much more efficient than restarting the entire process, and also lets you parallelize things across n workers based on how you can split up the workloads.
You could easily throw the status onto a web app so you can check in periodically rather than have to dig through server logs.
You can also get more intelligent with different types of failures. Network error? Retry 5 times. Rate limited? Gradual back-off. Crash? Don't retry, and notify via email. And so on.
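The queue-plus-workers design above can be sketched in a few lines. This is only an illustration in standard-library Python (the question's script is Node.js, and a real setup would use a persistent queue); `Job`, `Stats`, `NetworkError`, and the job names are all hypothetical. Network errors are re-queued up to 5 times, any other exception is treated as a crash and not retried, and the back-off delay is left out to keep the sketch short.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable, List

MAX_NETWORK_RETRIES = 5


class NetworkError(Exception):
    """Hypothetical transient failure that is worth retrying."""


@dataclass
class Job:
    name: str
    run: Callable[[], None]
    attempts: int = 0


@dataclass
class Stats:
    completed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    retries: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)


def worker(jobs: "queue.Queue[Job]", stats: Stats) -> None:
    """Consume jobs until the queue is empty, re-queueing transient failures."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        job.attempts += 1
        try:
            job.run()
        except NetworkError:
            with stats.lock:
                if job.attempts <= MAX_NETWORK_RETRIES:
                    stats.retries += 1
                    jobs.put(job)               # any worker may pick it up again
                else:
                    stats.failed.append(job.name)
        except Exception:
            with stats.lock:
                stats.failed.append(job.name)   # crash: no retry, notify here
        else:
            with stats.lock:
                stats.completed.append(job.name)


def make_flaky(failures: int) -> Callable[[], None]:
    """Return a job body that raises NetworkError `failures` times, then succeeds."""
    remaining = [failures]

    def run() -> None:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise NetworkError("simulated timeout")

    return run


def crash() -> None:
    raise RuntimeError("simulated crash")


jobs: "queue.Queue[Job]" = queue.Queue()
for job in (Job("api-1", lambda: None), Job("api-2", make_flaky(2)), Job("api-3", crash)):
    jobs.put(job)

stats = Stats()
threads = [threading.Thread(target=worker, args=(jobs, stats)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(stats.completed), stats.failed, stats.retries)
```

The `Stats` object is the part an admin web page would read from: it already holds the per-job outcome and retry counts, so progress can be shown without digging through logs.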
I have tried this with pm2, you can get the info of the task, then cat out or grab the log files. Or you could have a logging server, see also: https://github.com/papertrail/remote_syslog2

How do you instruct a SharePoint Farm to run a Timer Job on a specific server?

We have an SP timer job that was running fine for quite a while. Recently the admins enlisted another server into the farm, and consequently SharePoint decided to start running this timer job on this other server. The problem is that the server does not have all the dependencies (e.g., Oracle) installed on it, so the job is failing. I'm just looking for the path of least resistance here. My question is: is there a way to force a timer job to run on the server you want it to?
[Edit]
If I can do it through code that works for me. I just need to know what the API is to do this if one does exist.
I apologize if I'm pushing for the obvious; I just haven't seen anyone drill down on it yet.
Constraining a custom timer job (that is, your own timer job class that derives from SPJobDefinition) is done by controlling constructor parameters.
Timer jobs typically run on the server where they are submitted (as indicated by vinny) assuming no target server is specified during the creation of the timer job. The two overloaded constructors for the SPJobDefinition type, though, accept an SPServer and an SPJobLockType as the third and fourth parameters, respectively. Using these two parameters properly will allow you to dictate where your job runs.
By specifying your target server as the SPServer and an SPJobLockType of "Job," you can constrain the timer job instance you create to run on the server of your choice.
For documentation on what I've described, see MSDN: http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.administration.spjobdefinition.spjobdefinition.aspx.
I don't know anything about the code you're running, but custom timer jobs are commonly set up during Feature activation. I got the sense that your codebase might not be your own (?); if so, you might want to look for the one or more types/classes that derive from SPFeatureReceiver. The FeatureActivated method of such classes is where you might find the code that actually carries out the timer job instantiation.
Of course, you'll also want to look at the custom timer job class (or classes) themselves to see how they're being instantiated. Sometimes developers will build the instantiation of the class into the class itself (via Factory Method pattern, for example). Between the timer job class and SPFeatureReceiver implementations, though, you should be on the way towards finding what needs to change.
I hope that helps!
Servers in a farm need to be identical.
If you happen to use VMs for your web front ends, you can snap a server and provision copies so that you know they are all identical.
Timer jobs by definition run on all web front ends.
If you need scheduled logic to run on a specific server, you either need to specifically code this in the timer job, or to use a "standard" NT Service instead.
I think a side effect of setting SPJobLockType to 'Job' is that it'll execute on the server where the job is submitted.
You could implement a Web Service with the business logic and deploy that Web Service to one machine. Then your Timer Job could trigger your Web Service periodically.
Then it should not matter much where your timer job is running. SharePoint decides for itself where to run the timer job.
