Background job with a thread/process? - multithreading

Technology used: EJB 3.1, Java EE 6, GlassFish 3.1.
I need to implement a background job that executes every 2 minutes to check the status of a list of servers. I have already implemented a timer, and my function updateStatus gets called every two minutes.
The problem is that I want to do the update in a thread because, if the timer is triggered again while my function call is not done, I would like to kill the thread and start a new one.
I understand I cannot use threads with EJB 3.1, so how should I do that? I don't really want to introduce JMS either.

You should simply use an EJB Timer for this.
When the job finishes, have it reschedule itself. If you don't want the job to take more than some amount of time, then monitor the system time in the process, and when it runs too long, stop the job and reschedule it.
The other thing you need to manage is that if the job is running when the server goes down, it will restart automatically when the server comes back up. You would be wise to have a startup process that scans the current jobs in the Timer system and, if yours is not there, submits a new one. After that the job should take care of itself until your next deploy (which erases existing Timer jobs).
The only other issue is that if the job depends on some initialization code that runs on server startup, it is quite possible that the job will start BEFORE this happens while the server is firing up. So you may need to manage that startup race condition (or simply ensure that the job "fails fast" and resubmits itself).
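The monitor-the-system-time part is language-neutral; a minimal sketch (shown in Python for brevity; update_statuses, check_server, and the budget value are illustrative names, not part of the EJB API):

```python
import time

def update_statuses(servers, check_server, budget_seconds):
    """Check servers until done or the time budget is spent.

    Returns the servers actually checked. The timer callback would call
    this with a budget safely below the two-minute interval, then
    reschedule itself; a later run can resume where this one stopped.
    """
    deadline = time.monotonic() + budget_seconds
    checked = []
    for server in servers:
        if time.monotonic() >= deadline:
            break  # over budget: stop now instead of overlapping the next tick
        check_server(server)
        checked.append(server)
    return checked
```

This avoids killing threads entirely: the work simply gives up when its slice of time is used, which is the cooperative style an EJB container expects.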

Related

Single job store, single started scheduler and multiple read-only worker processes

I ran into this FAQ indicating that sharing a persistent job store among two or more processes will lead to incorrect scheduler behavior:
How do I share a single job store among one or more worker processes?
My question is: If there's only a single worker scheduler that has been started via .start(), and another scheduler process is initialized on the same persistent sqlite jobstore only to print the trigger of a certain job_id (won't invoke a .start()), could that lead to cases of incorrect scheduler behavior?
Using apscheduler 3.6.3
Yes. First of all, the scheduler has to be started for it to return the list of persistently stored jobs. Another potential issue is that the current APScheduler version deletes any jobs on retrieval for which it cannot find the corresponding task function. This behavior was initially added to clear out obsolete jobs, but was in retrospect ill-conceived and will be removed in v4.0.
On the upside, it is possible to start the scheduler in paused mode so it won't try to run any jobs but will still give you the list of jobs, so long as all the target functions are importable.

Priority of Sitecore scheduled jobs

I have two Sitecore agents that run as scheduled jobs. Agent A is a long-running task that has low priority. Agent B is a short-running task that has high priority. B runs with an interval that is shorter than A's interval.
The problem is that B is never run, when A is already running.
I have implemented this to be able to run agents manually inside the content editor. When I do this, I am able to run B although A is already running (even though I set them to the same thread priority in the custom dialog).
How can I specify in my configuration file that B has a higher priority than A? Or make my scheduled job setup multi-threaded so simultaneously running jobs are possible in Sitecore? Is there a standard way to do this?
I have tried something like this, where I set the thread priority inside the agent implementation, but this code is never invoked in B when A is already running. So the prioritizing would somehow have to be done "before" the job implementations themselves.
As already mentioned in the other answer, Sitecore Scheduled Tasks are run sequentially, so each agent runs only after the previous one has finished.
However, tasks defined in the database can be run async, which means you can schedule multiple tasks to run in parallel.
You'll need to create a Command and define a Schedule:
Defining a Command
Commands are defined in the Sitecore client.
In Content Editor navigate to /sitecore/system/Tasks/Commands
Create a new item using the Command template
Specify the Type and Method fields.
Defining a Schedule
The database agent will execute the command based on the settings in the schedule.
In Content Editor navigate to /sitecore/system/Tasks/Schedules
Create a new item using the Schedule template
For the Command field select the Command item you just created
If the task applies to specific items, you can identify those items in the Items field. This field supports a couple of formats.
For the Schedule field, you identify when the task should run. The value for this field is in a pipe-separated format.
On the schedule, check the field marked Async.
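For reference, the Schedule field's pipe-separated value has four parts: from date | to date | days-of-week bit mask | interval. A sketch (the concrete dates and interval here are illustrative):

```
20000101|20991231|127|00:15:00
```

Here 127 is the sum of the day flags (Sunday=1, Monday=2, Tuesday=4, Wednesday=8, Thursday=16, Friday=32, Saturday=64), so this schedule runs the command every day, at most every 15 minutes.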
You can read more about the Database Scheduler in the Sitecore Community Docs.
Note that this may lead to performance issues if you schedule too many tasks to run in parallel.
The downside to running scheduled tasks from the database is that you cannot pass in parameters like you can for tasks defined in config. If you cannot simply access config settings from code (and they need to be passed in), then for a scheduled task defined in config you could invoke a Sitecore Job from your scheduled task code. Sitecore Jobs run as threads, and each job spins up a new thread, so they can run in parallel; just make sure that job names are unique (jobs with the same name will queue up).
The reason is that Sitecore Scheduled Jobs run in sequence. So if you have a job being executed, it will not trigger the other jobs until it finishes.
If I am not mistaken, Sitecore will queue the other jobs that need to be executed after the currently running job.
Since you trigger the job using the Run Agent tool, it runs because you are forcing it to execute; it does not check whether another job is running. The exception is publishing, which will queue because it is transferring items from the source to the target database.
EDIT:
You can check the <job> element in Web.config for Sitecore v7.x or Sitecore.config for Sitecore v8.x. You will see the pipeline being used for the job. If I am not mistaken, you will also need to check the code for the scheduler; the namespace is Sitecore.Tasks.Scheduler, Sitecore.Kernel.
Thanks
As you might already understand from Hishaam's answer (not going to repeat that good info), using the Sitecore agents might not be the best solution for what you are trying to do. For a similar setup (tasks that need to perform import, export, or other queued work on an e-commerce site) I used an external scheduling engine (in my case Hangfire, which did the job fine, but you could use another as well) that called services in my Sitecore solution. The services acted as a layer into Sitecore.
You can decide how far you want to go with these services (they could even start new threads), but they will be able to run next to each other. This way you will not bump into the issue of another process still running. I went for an architecture where the service was a very thin layer in front of the real business logic.
You might need to make sure that the code behind one service cannot be called while it is already running (I needed that to handle a queue), but those things are all possible in .NET code.
I find this setup more robust, especially for tasks that are important. It is also easier to configure when the tasks need to run.
I ended up with the following solution after realising that running my high-priority scheduled task, B, as a background agent in Sitecore was not a good idea. For the purposes of this answer I will now call B ExampleJob.
I created a new agent class called ExampleJobStarter. Its only purpose is to start another thread that runs the actual job, ExampleJob:
public class ExampleJobStarter
{
    public void Run()
    {
        if (ExampleJob.RunTheExampleJob) return; // already started
        ExampleJob.RunTheExampleJob = true;
        Task.Run(() => new ExampleJob().Run()); // requires System.Threading.Tasks
    }
}

public class ExampleJob
{
    public static bool RunTheExampleJob;

    public void Run()
    {
        while (RunTheExampleJob)
        {
            DoWork();
            Thread.Sleep(10000); // pause 10 seconds between runs
        }
    }

    private void DoWork()
    {
        ... // here I perform the actual work
    }
}
ExampleJobStarter was then registered in my Sitecore config file to run every 10 minutes. I also removed ExampleJob from the Sitecore config, so it no longer runs automatically (thus, it is no longer a Sitecore job per se). ExampleJobStarter simply ensures that ExampleJob is running in another thread. ExampleJob itself does its work every 10 seconds, without interference from the low-priority job agent, which still runs as a normal background agent.
Be aware of deadlock-issues if you go down this path (not an issue for the data I am working with in this case).

Creating a Web Crawler using Windows Azure

I want to create a Web Crawler, that takes the content of some website and saves it in a blob storage. What is the right way to do that on Azure? Should I start a Worker role, and use the Thread.Sleep method to make it run once a day?
I also wonder, if I use this Worker Role, how it would work if I create two instances of it. I noticed using the Compute Emulator UI that the command Trace.WriteLine works on both instances at the same time; can someone clarify this point?
I created the same crawler using PHP and set a cron job to start the script once a day, but it took 6 hours to grab the whole content; that's why I want to use Azure.
This is the right way to do it: as of January 2014, Microsoft offers Azure WebJobs, where you can create a project (a console app, for example) and run it as a scheduled task (one occurrence, or recurring):
https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/
http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
Considering that a worker role is basically Windows 2008 Server, you can run the same code you'd run on-premises.
Consider, though, that there are several reasons why a role instance might reboot: OS updates, crash, etc. In these cases, it's possible you'd lose the work being done. So... you can handle this in a few ways:
Queue. Place a message on a command queue. If it's a once-a-day task, you can just push the message on the queue when done processing the previous message. Note that you can put an invisibility timeout on the message, so it doesn't appear for a day. In the event of failure during processing, the message will re-appear on the queue and a different instance can pick it up. You can also modify the message as you go, to keep track of your status.
Scheduler. Just make sure there's only one instance running (by way of a mutex). An easy way to do this is to attempt to obtain a write-lock on a blob (there can only be one).
One other thing to consider is breaking up your web crawl into separate tasks (one per URL?) and placing those individually on the queue. With this, you'd be able to scale, running either multiple instances or, potentially, multiple threads in the same instance (since web crawling is likely to be a blocking operation, rather than a CPU- and bandwidth-intensive one).
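The queue option can be sketched abstractly. In this Python sketch, TaskQueue is a hypothetical stand-in for the Azure storage queue client (its receive/enqueue/delete mirror the get/put/delete message operations), not a real SDK class:

```python
class TaskQueue:
    """Hypothetical stand-in for an Azure storage queue client."""
    def receive(self): ...              # return a visible message, or None
    def delete(self, msg): ...          # remove a processed message for good
    def enqueue(self, msg, invisible_for_seconds): ...  # hide until due

def run_crawl_once(queue, crawl):
    """One tick of the worker's control loop: handle at most one command.

    The message is deleted only after the work and the re-enqueue
    succeed, so if the role instance reboots mid-run the old message
    reappears and another instance picks the crawl up.
    """
    msg = queue.receive()
    if msg is None:
        return False        # nothing visible yet (invisibility timeout)
    crawl()                 # do the day's crawl
    queue.enqueue("crawl", 24 * 3600)  # next run becomes visible in a day
    queue.delete(msg)
    return True
```

The worker role would call run_crawl_once in its main loop; the day-long invisibility timeout replaces Thread.Sleep as the scheduling mechanism.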
A single worker role running once a day is probably the best approach. I would not use Thread.Sleep, though, since you may want to restart the instance, and then the task may, depending on your programming, start before one day has passed or later than one day. What about putting the task command as a message on the Azure Queue, dequeuing it once it has been picked up by a worker role, and then adding a new task command to the Azure Queue once the task is done?

How to check whether a Timer Job has run

Is it possible to check whether a SharePoint (actually WSS 3.0) timer job has run when it was scheduled to ?
The reason is that we have a few daily custom jobs and want to make sure they always run, even if the server has been down during the time slot when the jobs should run, so I'd like to check them and then run them.
And is it possible to add a setting when creating them similar to the one for standard Windows scheduled tasks ... "Run task as soon as possible after a scheduled start is missed" ?
Check it on the job status page, and then look at the logs in the 12 hive folder for further details:
Central Administration / Operations / Monitoring / Timer Jobs / Check Jobs Status
As far as restarting a missed job is concerned, that is not possible with OOTB features. It makes sense as well: there are a lot of jobs which execute at particular intervals, and if everything started at the same time, the load on the server would be very high.
You can look at the LastRunTime property of an SPJobDefinition to see when the job was actually executed. As far as I can see in Reflector, the value of this property is loaded from the database and hence it should reflect the time it was actually executed.

SharePoint timer jobs not getting invoked

I have a timer job which has been deployed to a server with multiple Web front ends.
This timer job reads its configuration from a Hierarchical Object Store.
This timer job is scheduled to run daily on the server.
But the problem is that this timer job is not getting invoked daily. I have implemented event logging in the timer job's Execute() method, but I don't see any logs being generated.
Any ideas as to what could cause a timer job not to be picked up for execution by the SharePoint Timer Service? How can I troubleshoot this problem?
Are there any "gotcha"s for running timer jobs on servers with multiple front ends? Will the timer job get executed on all the web front ends, or on any one of them arbitrarily? How do I know which machine will have my event logs?
This might be a stupid question, but does having multiple front ends for load balancing affect the way Hierarchical Object Stores behave?
EDIT:
One of the commenters, Sean McDounough (thanks Sean!!), made a very good point:
"whether or not the timer job runs on all WFEs will be a function of the SPJobLockType enum value you specified in the constructor. Using a value of "None" means that the job will run on all WFEs."
Now, my timer job is responsible for sending periodic mails to a list of users. Currently it is marked as SPJobLockType.Job.
If I change this to SPJobLockType.None, does this mean that my timer job will be executed on all the WFEs separately? (This is not desired; it would spam all the users with multiple emails.)
Or does it mean that the timer job will execute on any one of the WFEs, arbitrarily?
Try restarting the SharePoint timer service from the command-line using NET STOP SPTIMERV3 followed by a NET START SPTIMERV3. My guess is that the timer service is running with an older version of your .NET assembly. The timer service does not automatically reload assemblies when you upgrade the WSP solution.
To do this, follow these steps:
Stop the Timer service.
Click Start, point to Administrative Tools, and then click Services.
Right-click Windows SharePoint Services Timer, and then click Stop, or Restart service.
This URL helped me.
