Holding Quartz scheduler from triggering a job again till the previous job finishes processing - cron

I am using the Quartz scheduler to schedule the download of a file from SFTP.
The job is triggered every 2 hours, but sometimes, due to a huge file size, the download
takes more time, and the process is restarted before it completes. Is there any way we
can hold the scheduler from triggering the same job again until the previous process completes?
I am using Quartz 1.8.5.
Below is the code:
<flow name="quartzCronModel">
    <quartz:inbound-endpoint connector-ref="Quartz"
        jobName="cron-job" cronExpression="${database_download_timer}"
        encoding="UTF-8">
        <quartz:event-generator-job />
    </quartz:inbound-endpoint>
    <component doc:name="Download Database"
        class="com.org.components.sftp.FileTransfer">
        <method-entry-point-resolver
            acceptVoidMethods="true">
            <include-entry-point method="execute" />
        </method-entry-point-resolver>
    </component>
</flow>
I am reading cron expression from a properties file.

Your job will need to implement the StatefulJob interface. It is a marker interface that tells Quartz that it should not trigger the job if it is still running. In other words, it prevents concurrent executions of the job.

It has been a long time since this question was asked. Jan Moravec answered correctly, but in the meantime the StatefulJob interface has been deprecated. According to the Quartz documentation, it is now best to use the @DisallowConcurrentExecution and/or @PersistJobDataAfterExecution annotations instead.
I hope this is useful.
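Since showing the annotation itself runnable would require the Quartz jar, here is a plain-Java sketch of the same skip-if-still-running guard that @DisallowConcurrentExecution gives you (class and method names are illustrative, not Quartz API):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch: a trigger that fires while the previous run is
// still in progress is simply skipped, which is the behaviour Quartz's
// @DisallowConcurrentExecution annotation provides for a job class.
class NonReentrantJob {
    private final AtomicBoolean running = new AtomicBoolean(false);

    // Runs the work unless a previous execution is still in progress.
    // Returns true if the work ran, false if this trigger was skipped.
    public boolean execute(Runnable work) {
        if (!running.compareAndSet(false, true)) {
            return false; // previous execution still running: skip
        }
        try {
            work.run();
            return true;
        } finally {
            running.set(false); // allow the next trigger to run
        }
    }
}
```

Quartz does this bookkeeping for you per job class; the sketch only shows the idea behind the annotation.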

Related

How does node-schedule work in this scenario

I am new to Node.js and I want to use node-schedule, but I have one query; please give me a suggestion regarding this.
https://github.com/node-schedule/node-schedule
When I set up a scheduled job that runs every 5 minutes, what happens if the job does
not complete within 5 minutes? My question is: will the scheduler
start another thread or not?
Please solve my query.
Thanks.
Since jobs don't seem to have a mechanism to let the scheduler know they are done, jobs will be scheduled according to their scheduled time alone.
In other words: if you schedule a job to run every 5 minutes, it will be started every 5 minutes, even if the job itself takes more than 5 minutes to complete.
To clarify: this doesn't start a new thread for each job, as JS is single-threaded. If a job blocks the event loop (for instance by doing heavy calculations), it is possible for the scheduler to not be able to start a new job when its time has arrived, but blocking the event loop is not a good thing.
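For contrast, Java's standard library offers exactly the "wait for the previous run to finish" behaviour through scheduleWithFixedDelay, which measures the delay from the end of one run to the start of the next, so runs never overlap. This is an analogy to clarify the distinction, not node-schedule code (the timings below are arbitrary illustrations):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedDelayDemo {
    // Counts how many times a task of 'taskMillis' duration starts within
    // 'totalMillis', scheduled with a fixed delay *between* runs: each new
    // run waits for the previous one to finish, so runs cannot overlap.
    public static int countRuns(long taskMillis, long delayMillis, long totalMillis)
            throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger starts = new AtomicInteger();
        ses.scheduleWithFixedDelay(() -> {
            starts.incrementAndGet();
            try {
                Thread.sleep(taskMillis); // simulate a slow job
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
        Thread.sleep(totalMillis);
        ses.shutdownNow();
        return starts.get();
    }
}
```

With an 80 ms task and a 10 ms delay, each cycle takes at least 90 ms, so far fewer runs start than a naive "every 10 ms" schedule would fire; node-schedule behaves like the naive schedule.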

Kafka consumer polling increases method execution time

In our application, the consumer starts polling continuously when the application loads, and this sometimes impacts the execution time of one of our
methods by polling in the middle of that method's execution.
A method (say test()) that ideally takes a few milliseconds to run in a JUnit test now takes a few seconds to execute in the app. Therefore, I would like to skip the polling
at that point in time, if possible.
In the Spring Integration docs I have seen something called PollSkipAdvice/PollSkipStrategy, which says "The PollSkipAdvice can be used to suppress (skip) a poll."
Could you please suggest whether this can be of any help in the above scenario? It would be great if you could explain with an example. Thanks.
sample config:
<int-kafka:inbound-channel-adapter
    id="kafkaInboundChannelAdapter" kafka-consumer-context-ref="consumerContext"
    auto-startup="false" channel="inputFromKafka">
    <int:poller fixed-delay="10" time-unit="MILLISECONDS"
        max-messages-per-poll="5" />
</int-kafka:inbound-channel-adapter>
Your scenario isn't clear. Really...
We have here only one adapter with an aggressive fixed-delay of 10 milliseconds, and only for a small number of messages per poll.
Consider increasing the poll interval and setting max-messages-per-poll to -1 to poll all available messages in one poll task.
On the other hand, it isn't clear how your test() method is involved...
Also consider switching to <int-kafka:message-driven-channel-adapter> for better control over messages.
Regarding PollSkipAdvice... it really isn't clear which aim you would like to reach with it...
And one more point: bear in mind that all <poller>s share the same ThreadPoolTaskScheduler with a pool size of 10. So maybe some other long-running task is keeping its threads busy...
This <int-kafka:inbound-channel-adapter> takes only one of them, but every 10 millis, of course.
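A sketch of the suggested poller changes (the 5-second interval is an arbitrary illustration; pick what suits your load):

```xml
<int-kafka:inbound-channel-adapter
    id="kafkaInboundChannelAdapter" kafka-consumer-context-ref="consumerContext"
    auto-startup="false" channel="inputFromKafka">
    <!-- Poll less aggressively, and drain all available messages per poll -->
    <int:poller fixed-delay="5" time-unit="SECONDS"
        max-messages-per-poll="-1" />
</int-kafka:inbound-channel-adapter>
```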

Mule Exhausted Action RUN vs WAIT. Which one to choose and when

I have a question on the Mule threading profile Exhausted_Action. From the documentation, I understand that when the action is WAIT, any new request beyond maxActive will wait for a thread to become available, whereas the RUN action causes the original thread to process the request. From my understanding, I thought WAIT was the better option, rather than RUN. However, it appears Mule has all default values set to RUN. I just want to hear comments on my understanding, the differences between these two actions, and how to decide which one to use when.
Your understanding of WAIT and RUN is correct.
The reason why all the default values are RUN is that message processing is not stopped by the unavailability of a flow thread: because the original thread (or receiver thread) is waiting anyway for the flow thread to take the message and process it, why not process it itself? (This is my opinion.)
But there is a downside for using RUN.
Ex:
The number of receiver threads is restricted to 2.
<asynchronous-processing-strategy name="customAsynchronous" maxThreads="1" />
<flow name="sample" processingStrategy="customAsynchronous" >
<file:inbound-endpoint ......>
............
..........
</flow>
File sizes: 1MB, 50MB, 100MB, 1MB, 5MB.
In the above flow, when 5 files come in, 3 files are processed at once: one by the single flow thread and two by the 2 file receiver threads (Exhausted_Action = RUN). The flow thread finishes its processing quickly, as the first file is small, and keeps waiting for the next message. Unfortunately, the receiver thread whose job is to pick up the next file and hand it to the flow thread is busy processing a BIG file. In this way receiver threads can get stuck in time-consuming processing while the flow threads are waiting.
So it always depends on the use case you are dealing with.
Hope this helps.
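Java's ThreadPoolExecutor exposes the same trade-off in the standard library: CallerRunsPolicy makes the submitting (receiver) thread execute the task itself when the pool is exhausted, which is the analogue of Mule's RUN action. A minimal sketch of that behaviour (an analogy, not Mule code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CallerRunsDemo {
    // Returns the name of the thread that ran the second task when the
    // single pool thread was busy and CallerRunsPolicy (RUN-like) applied.
    public static String whoRanWhenExhausted() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new SynchronousQueue<>(),                   // no task queue
                new ThreadPoolExecutor.CallerRunsPolicy()); // RUN-like action
        CountDownLatch busy = new CountDownLatch(1);
        // Occupy the only pool thread, like the flow thread on a big file.
        pool.execute(() -> {
            try { busy.await(); } catch (InterruptedException ignored) {}
        });
        String[] runner = new String[1];
        // Pool exhausted: the *calling* thread runs this task itself,
        // so the caller is blocked for the duration of the task.
        pool.execute(() -> runner[0] = Thread.currentThread().getName());
        busy.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return runner[0];
    }
}
```

The downside described above falls out directly: while the caller is running the task, it cannot accept new work, just like the receiver thread stuck on the big file.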

Using CurrentRoleInstance.Id to run a task in just one instance

In a context where you are deploying a web role over multiple instances and need to schedule a task that should be done by one instance only (like sending an email with some stats to the site admin), how reliable is it to use RoleEnvironment.CurrentRoleInstance.Id to make the task run on one instance only (for example, only running it if the Id ends with IN_0)?
If anyone has ever done this, I'd be interested in their feedback.
I wouldn't use instance ID. What happens if instance 0 gets rebooted (which happens at least once per month)? Now your scheduler or task-runner is offline.
An alternate solution is to use a type of mutex that spans instances. The one I'm thinking of is a blob lease. You can actually acquire a lease on a blob for writing (and there can only be one lease-holder). You could attempt to get a blob lease before running a task. If you get it, run task. If you don't, don't run it.
A slight variation: In a thread (let's say started from your Run() method), attempt to acquire a lease and if successful, launch a scheduler task (maybe a thread or something). If you cannot acquire the lease, sleep for a minute and try again. Eventually, the instance with the lease will be rebooted (or it'll disappear for some other reason). After a few seconds, another instance will acquire the abandoned lease and start up a new scheduler task.
Steve Marx wrote a blog post about concurrency using leases. Tyler Doerksen also has a good post about leases.
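The acquire-or-sleep loop from the variation above can be sketched like this. LeaseClient is a hypothetical interface standing in for the blob-lease call; it is not an Azure SDK type, and the retry timings are illustrative:

```java
// Sketch of the lease loop: keep trying to acquire the lease; whoever
// holds it is the single instance allowed to run the scheduler task.
class LeaderLoop {
    // Hypothetical stand-in for "try to acquire a blob lease".
    public interface LeaseClient {
        boolean tryAcquireLease();
    }

    // Returns the attempt number on which the lease was acquired,
    // or -1 if it was never acquired within maxAttempts.
    public static int acquireWithRetry(LeaseClient client, int maxAttempts,
                                       long sleepMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (client.tryAcquireLease()) {
                return attempt; // we hold the lease: start the scheduler task here
            }
            Thread.sleep(sleepMillis); // another instance holds it; retry later
        }
        return -1;
    }
}
```

In production the sleep would be on the order of a minute, and the lease holder must keep renewing it; when that instance disappears, the abandoned lease eventually frees up and another instance's loop succeeds.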
Yes, you can use the InstanceId if you specifically need to:
<Startup>
    <Task commandLine="StartUpTasks\WindowService\InstallWindowService.bat" executionContext="elevated" taskType="background">
        <Environment>
            <Variable name="InstanceId">
                <RoleInstanceValue xpath="/RoleEnvironment/CurrentInstance/@id" />
            </Variable>
        </Environment>
    </Task>
</Startup>
It will be of the following form:
<deployment Id>.<Application Name>.<Role Name>_IN_<index>
For example: MyRole_IN_0, MyRole_IN_1
Access the environment variable in a batch file like this:
%InstanceId%
You can then use substring or the last index of _ to get the index from the InstanceId.
An instance with index 0 will keep the same index even after a reboot.
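The "last index of _" parsing can be sketched like this, assuming the documented <Role Name>_IN_<index> format (the class name is illustrative):

```java
class InstanceIndex {
    // Extracts the trailing index from an id like "MyRole_IN_0".
    public static int parse(String instanceId) {
        int underscore = instanceId.lastIndexOf('_');
        if (underscore < 0 || underscore == instanceId.length() - 1) {
            throw new IllegalArgumentException("Unexpected instance id: " + instanceId);
        }
        return Integer.parseInt(instanceId.substring(underscore + 1));
    }
}
```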
More Details
http://blogs.msdn.com/b/cclayton/archive/2012/05/17/windows-azure-start-up-tasks-part-2.aspx
http://msdn.microsoft.com/en-us/library/windowsazure/hh404006.aspx
It's possible to have some block of code run on only one of your multiple instances, for example by checking the ID of the role instance you are executing on.
You could achieve the same result with other solutions, but those might require some more work, like decoupling the task from your instance.

How to let users define the frequency of a job in the application?

I have an application that has to launch jobs repeatedly. But (yes, that would have been too easy without a but...) I would like users to define their backup frequency in the application.
In the worst case, they would have to choose between:
weekly,
daily,
every 12 hours,
every 6 hours,
hourly
In the best case, they should be able to use crontab expressions (see documentation for examples).
How do I do this? Do I launch a job every minute that checks the last execution time and frequency, and then launches another job if needed? Do I create a sort of queue that will be executed by a master job?
Any clues, ideas, opinions, best practices, or experiences are welcome!
EDIT: Solved this problem using the Akka scheduler. OK, this is a technical solution, not a design answer, but still, everything works great.
Each user-defined repetition is an actor that sends messages every period to a new actor that executes the actual job.
There may be two ways to do this depending on your requirements/architecture:
If you can only use Play:
The user creates the job and the frequency it will run (crontab, whatever).
On saving the job, you calculate the first time it will have to be run. You then add an entry to a table JOBS with the execution time, job id, and any other information required. This is required as Play is stateless and information must be stored in the DB for later retrieval.
You have a job that queries the table for entries whose execution date is earlier than now. It retrieves the first one, runs it, removes it from the table, and adds a new entry for the next execution. You should keep some execution counter so that if a task fails (which means the entry is not removed from the DB) it won't block execution of the other tasks by being retried again and again.
The frequency of this job is set to run every second. That way, while there is information in the table, the tasks will be executed roughly as often as they are required. As Play won't spawn a new job while the current one is working, if you have enough tasks this one job will serve them all. If not, it will be killed at some point and restored when required.
Of course, the users' crons will not be very precise, as you have to account for your own cron's delays plus the execution delays of all the tasks in the queue, which are run sequentially. Not the best approach, unless you somehow disallow crons which run every second or more often than every minute (to be safe). Doing a check on the execution time of the crons and killing them if they run over a certain amount of time would be a good idea.
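The poll-the-table loop described above can be sketched with an in-memory stand-in for the JOBS table (all names are illustrative; in Play this state would live in the database):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class JobTable {
    static final class Entry {
        final String jobId;
        final long nextRunAt;     // epoch millis of the next execution
        final long periodMillis;  // repetition period chosen by the user

        Entry(String jobId, long nextRunAt, long periodMillis) {
            this.jobId = jobId;
            this.nextRunAt = nextRunAt;
            this.periodMillis = periodMillis;
        }
    }

    // Table ordered by next execution time, like "ORDER BY execution_date".
    public static PriorityQueue<Entry> newTable() {
        return new PriorityQueue<>((a, b) -> Long.compare(a.nextRunAt, b.nextRunAt));
    }

    // One poll: run every entry whose execution date is <= now, and
    // re-insert it with its next execution time. Returns the ids run.
    public static List<String> runDue(PriorityQueue<Entry> table, long now) {
        List<String> ran = new ArrayList<>();
        while (!table.isEmpty() && table.peek().nextRunAt <= now) {
            Entry due = table.poll();
            ran.add(due.jobId);   // here you would actually execute the job
            table.add(new Entry(due.jobId, due.nextRunAt + due.periodMillis,
                    due.periodMillis));
        }
        return ran;
    }
}
```

The every-second job simply calls the equivalent of runDue with the current time; the sequential while-loop is also why user crons drift when many tasks are due at once.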
If you can use more than Play:
The better alternative, I believe, is to use Quartz (see this) to create a future execution when the user creates the job, and reprogram it once the execution is over.
There was a discussion on google-groups about it. As far as I remember, you must define a job which starts every 6 hours and checks which backups must be done. So you must remember when the last backup job finished and do the check yourself. I'm unsure if Quartz can handle such a requirement.
I looked in the source code (always a good source ;-)) and found a method every, which I think should do what you want. However, I'm unsure if this is a clever design, because if you have 1000 users you will then have 1000 jobs. I'm unsure if Play was built to handle such a large number of jobs.
[Update] For cron expressions you should have a look at JobPlugin.scheduleForCRON()
There are several ways to solve this.
If you don't have a really huge load of jobs, I'd just persist them to a table with the required flexibility. Then check all of them every hour (or the lowest interval you support) and run those that are eligible. Simple.
Or, if you prefer to use cron syntax anyway, just write (export) jobs to a user crontab using a wrapper which calls back to your running app, or starts the job in a standalone process if that's possible.
