Liferay scheduler is not firing in a cluster

I have a scheduler in my portlet that should trigger every 5 minutes.
The following are the configurations:
liferay-portlet.xml:
<scheduler-entry>
    <scheduler-description>
        This scheduler is used to invoke the update and delete results
    </scheduler-description>
    <scheduler-event-listener-class>com.test.myown.scheduler.action.GetResultsScheduler</scheduler-event-listener-class>
    <trigger>
        <simple>
            <simple-trigger-value>5</simple-trigger-value>
            <time-unit>minute</time-unit>
        </simple>
    </trigger>
</scheduler-entry>
And my class is:
public class GetResultsScheduler implements MessageListener {

    public void receive(Message message) throws MessageListenerException {
        // some code here
    }
}
We are using Liferay 6.1 on WebLogic.
The problem is this: the server has 2 nodes.
The scheduler triggers every 5 minutes on node 1.
The scheduler never triggers on node 2.
In other words, the scheduler works on node 1 and does not work on node 2.
Does anyone have any idea about this issue?

I'd say this is expected: when you run frequent jobs, you usually don't want them to execute concurrently on every machine, as they would conflict. You've stated that the job must be executed every 5 minutes, not twice every 5 minutes (as it would be if every machine fired it).
If you take down node 1, I'd also expect node 2 to take over the duty of executing the scheduled jobs. That would be a good test of the assumption stated above.
Also, you shouldn't rely on such jobs executing in each VM (e.g. to manipulate common state shared through the classloader). If you modify content that is cached, the cluster communication should take care of invalidating the caches.
If you have problems with this, you're either violating that (my) assumption about global state in the VM, or you haven't configured the cluster cache invalidation correctly.

Related

Netty multithreading broken in version 4.1? Unable to process short queries after long ones?

I just want to set up a very common server: it must accept connections and run some business calculations to return the answer. Calculations can be short or long -> I need some kind of ThreadPoolExecutor to execute them.
In Netty 3, which we had been using for a long time, this was achieved very easily, by just putting an ExecutionHandler in the pipeline before my BusinessHandler.
But now, trying to set up the same thing in Netty 4, I read in the documentation that ExecutionHandler no longer exists, and that I have to specify an EventExecutorGroup when adding my BusinessHandler to the channel pipeline:
DefaultEventExecutorGroup applicativeExecutorGroup = new DefaultEventExecutorGroup(10);
...
ch.pipeline().addLast(applicativeExecutorGroup, businessHandler);
It works for very basic scenarios (only short queries), but not in the following one. The reason is that DefaultEventExecutorGroup does not select a free worker, but picks one in round-robin order.
A first request (R1) comes in, is assigned T1 (thread 1 of the DefaultEventExecutorGroup), and takes a long time (say 1 minute).
Then a few other queries Ri (i = 2 to 10) are received. They are assigned Ti and are also processed successfully.
But when R11 comes in, it is assigned T1 again, due to the round-robin algorithm implemented in DefaultEventExecutorGroup, and the query is queued behind the long R1. As a result, it will not start processing for one minute, and that is clearly an unacceptable delay. In real scenarios, clients never get the answer, because they time out before we even start the processing.
And it continues like this: one query out of every 10 will fail, because it is queued behind the long one in the only busy thread, while all the other threads of the group sit idle.
Is there another configuration of my pipeline that would work? For example, is there an implementation of EventExecutor that just works like a standard Executor (selects a FREE worker)?
Or is it just a bug in Netty 4.1? That would be very strange, as this looks like a very common scenario for any server.
Thanks for your help.
From what you explained above, I think you want to use UnorderedThreadPoolEventExecutor as a replacement for DefaultEventExecutorGroup. Or, if ordering is important, NonStickyEventExecutorGroup. A minimal sketch of wiring the first one in is below.
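For illustration only (this code is not from the question): a sketch of an initializer that hands the business handler to an UnorderedThreadPoolEventExecutor, assuming BusinessHandler is the handler class from the question. Both executor classes live in io.netty.util.concurrent in Netty 4.1.
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.util.concurrent.EventExecutorGroup;
import io.netty.util.concurrent.UnorderedThreadPoolEventExecutor;

public class BusinessChannelInitializer extends ChannelInitializer<SocketChannel> {

    // Unlike DefaultEventExecutorGroup, tasks submitted here are picked up
    // by any idle thread in the pool rather than by a round-robin-assigned one.
    private final EventExecutorGroup businessExecutor =
            new UnorderedThreadPoolEventExecutor(10);

    @Override
    protected void initChannel(SocketChannel ch) {
        // BusinessHandler is the application handler from the question (assumed here).
        ch.pipeline().addLast(businessExecutor, new BusinessHandler());
    }
}
If you still need the per-handler ordering that UnorderedThreadPoolEventExecutor gives up, wrapping the group instead, e.g. new NonStickyEventExecutorGroup(new DefaultEventExecutorGroup(10)), and passing that to addLast should keep events for a given channel ordered without pinning the channel to a single thread.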

Why in kubernetes cron job two jobs might be created, or no job might be created?

The k8s Cron Job Limitations documentation mentions that there is no guarantee that a job will be executed exactly once:
A cron job creates a job object about once per execution time of its schedule. We say “about” because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.
Could anyone explain:
why could this happen?
what are the probabilities/statistics that this could happen?
will it be fixed in some reasonable future in k8s?
are there any workarounds to prevent such behavior (if the running job can't be implemented as idempotent)?
do other cron-related services suffer from the same issue? Maybe it is a core cron problem?
The controller:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cronjob/cronjob_controller.go
starts with a comment that lays the groundwork for an explanation:
I did not use watch or expectations. Those add a lot of corner cases, and we aren't expecting a large volume of jobs or scheduledJobs. (We are favoring correctness over scalability.)
If we find a single controller thread is too slow because there are a lot of Jobs or CronJobs, we can parallelize by Namespace. If we find the load on the API server is too high, we can use a watch and UndeltaStore.)
Just periodically list jobs and SJs, and then reconcile them.
Periodically means every 10 seconds:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cronjob/cronjob_controller.go#L105
The documentation following the quoted limitations also has some useful color on some of the circumstances under which 2 jobs or no jobs may be launched on a particular schedule:
If startingDeadlineSeconds is set to a large value or left unset (the default) and if concurrencyPolicy is set to AllowConcurrent, the jobs will always run at least once.
Jobs may fail to run if the CronJob controller is not running or broken for a span of time from before the start time of the CronJob to start time plus startingDeadlineSeconds, or if the span covers multiple start times and concurrencyPolicy does not allow concurrency. For example, suppose a cron job is set to start at exactly 08:30:00 and its startingDeadlineSeconds is set to 10, if the CronJob controller happens to be down from 08:29:00 to 08:42:00, the job will not start. Set a longer startingDeadlineSeconds if starting later is better than not starting at all.
Higher level, solving for only-once in a distributed system is hard:
https://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
Clocks and time synchronization in a distributed system is also hard:
https://8thlight.com/blog/rylan-dirksen/2013/10/04/synchronization-in-a-distributed-system.html
To the questions:
why could this happen?
For instance: the node hosting the CronJobController fails at the time a job is supposed to run.
what are the probabilities/statistics that this could happen?
Very unlikely for any given run. Over a large enough number of runs, though, it is very unlikely that you will escape having to face this issue.
will it be fixed in some reasonable future in k8s?
There are no idempotency-related issues under the area/batch label in the k8s repo, so one would guess not.
https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Aarea%2Fbatch
are there any workarounds to prevent such behavior (if the running job can't be implemented as idempotent)?
Think more about the specific definition of idempotent, and about the particular points in the job where there are commits. For instance, jobs can be made to support more-than-once execution if they save state to staging areas and then run an election process to determine whose work wins (see the sketch after this answer).
do other cron-related services suffer from the same issue? Maybe it is a core cron problem?
Yes, it's a core distributed systems problem.
For most users, the k8s documentation gives perhaps a more precise and nuanced answer than is necessary. If your scheduled job is controlling some critical medical procedure, it's really important to plan for failure cases. If it's just doing some system cleanup, missing a scheduled run doesn't much matter. By definition, nearly all users of k8s CronJobs fall into the latter category.
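As an illustration of that staging/election idea (not from the original answer): a minimal Java sketch in which each run tries to claim its schedule slot by inserting a row protected by a unique constraint, so if the same slot fires twice, only one run does the work. The table, column, environment variable and job names here are made up for the example.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

public class IdempotentJobRunner {

    public static void main(String[] args) throws Exception {
        // The scheduled time is passed in by the job spec, so duplicate runs
        // of the same slot compete for the same claim row.
        Instant scheduledFor = Instant.parse(args[0]);
        try (Connection conn = DriverManager.getConnection(System.getenv("JDBC_URL"))) {
            if (claimRun(conn, "nightly-cleanup", scheduledFor)) {
                doWork();
            }
            // else: another run already claimed this slot; exit successfully.
        }
    }

    // job_runs has a unique constraint on (job_name, scheduled_for), so the
    // INSERT succeeds for exactly one run of a given slot.
    private static boolean claimRun(Connection conn, String jobName, Instant scheduledFor)
            throws SQLException {
        String sql = "INSERT INTO job_runs (job_name, scheduled_for) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, jobName);
            ps.setTimestamp(2, Timestamp.from(scheduledFor));
            ps.executeUpdate();
            return true;
        } catch (SQLException duplicateKey) {
            return false; // unique constraint violated: slot already claimed
        }
    }

    private static void doWork() {
        // the actual job body goes here
    }
}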

CRON + Nodejs + multiple cores => behaviour?

I'm building a CRON-like module into my service (using node-schedule) that will get required into each instance of my multi-core setup. Since the instances are all running their own threads and are all scheduled to run at the same time, I'm wondering whether the jobs will get called in every single thread, or just once because they're all loading the same module.
If they do get called multiple times, what is the best way to make sure the desired actions only get called once?
If you are using pm2 in cluster mode, you can use
process.env.NODE_APP_INSTANCE to detect which instance is running. You can use the following code so your cron jobs will be called only once:
// run cron jobs only for the first instance
if (process.env.NODE_APP_INSTANCE === '0') {
    // cron jobs
}
node-schedule runs inside a given node process and schedules the things that that particular node process asked it to schedule.
If you are running multiple node processes and each is using node-schedule, then all the node-schedule instances within those separate node processes are independent (no cooperation or coordination between them). If each node process asks its own node-schedule instance to run a particular task at 3pm on the first Wednesday of the month, then all the node processes will start running that task at that time.
If you only want the action carried out once, then you have to coordinate among your node instances so that the action is only scheduled in one node process, not in all of them, or only schedule these types of operations in one of your node instances, not all of them.
The best way to handle this generically is to have a shared database that you write a "lock" entry to. For example, say all tasks wrote a DB entry such as {instanceId: "a", taskId: "myTask", timestamp: "2021-12-22:10:35"}.
All tasks would submit the same thing, except with their own instanceId. You then have a unique index on 'timestamp' so that only one gets accepted.
Then they all do a query and see whether their node was the one that was accepted to do the cron job.
You could do the same thing but also add a "random" field that generates a random number, and the task with the lowest number wins.

Orchard CMS custom background job

I created a custom background job following the instructions.
You can also see the discussion regarding this problem.
I created a task handler, and the constructor of the task is called every minute. The Process method is never called. I am also getting a timeout exception: "Orchard.Tasks.BackgroundService - Error while processing background task".
Interestingly, it is possible to add a new task, but no query will work.
I checked the Scheduling_ScheduledTaskRecord table and it was locked. I am getting timeout exceptions both in code and in the SQL Management interface. The lock is released once I shut down the site process.
You are describing Scheduled tasks, which run in the background.
In order to trigger the execution of Process, you need to schedule the first task (thus starting the loop). Use DateTime.UtcNow to schedule tasks.
You can also use Background Tasks, this way:
public class MyBackgroundService : IBackgroundTask {

    public MyBackgroundService() {
    }

    public void Sweep() {
        // Background task execution
        // Do some work
    }
}
Sweep() will be executed every 60 seconds. I don't know if this suits you, because it will be executed every minute even if the previous task is still running.

Storm Bolt Database Connection

I am using Storm (java) with Cassandra.
One of my bolts inserts data into Cassandra. Is there any way to hold the connection to Cassandra open between instantiations of this bolt?
My application writes at a high rate. The bolt needs to run several times a second, and performance is being hindered by the fact that it connects to Cassandra each time.
It would run a lot faster if I could have a static connection that was held open, but I am not sure how to achieve this in Storm.
To clarify the question:
what is the scope of a static connection in a Storm topology?
Unlike other messaging systems, which have workers where the "work" goes on in a loop or callback that can make use of a variable (maybe a static connection) outside this loop, Storm's bolts seem to be instantiated each time they are called and cannot have parameters passed in to them, so how can I use the same connection to Cassandra?
Unlike other messaging systems, which have workers where the "work" goes on in a loop or callback that can make use of a variable (maybe a static connection) outside this loop, Storm's bolts seem to be instantiated each time they are called and cannot have parameters passed in to them
It's not exactly right to say that Storm bolts get instantiated each time they are called. For example, the prepare method only gets called during the initialization phase, i.e. only once. The doc says it is "Called when a task for this component is initialized within a worker on the cluster. It provides the bolt with the environment in which the bolt executes."
So the best bet would be to put the initialization code in the prepare (or open, in the case of spouts) method, as it will be called when the tasks are starting. But you need to make it thread safe, as it will be called by every task concurrently in its own thread.
The execute(Tuple tuple) method, on the other hand, is actually responsible for the processing logic and is called every time the bolt receives a tuple from the corresponding spouts or bolts (so this is what actually gets called every single time the bolt runs).
The cleanup method is called when an IBolt is going to be shut down. The documentation says:
There is no guarantee that cleanup will be called, because the supervisor kill -9's worker processes on the cluster. The one context where cleanup is guaranteed to be called is when a topology is killed while running Storm in local mode.
So it's not true that you can't pass a variable to it: you can initialize any instance variables in the prepare method and then use them during processing.
Regarding the DB connection, I am not exactly sure about your use case as you have not posted any code, but maintaining a pool of resources sounds like a good choice to me. A rough sketch of the prepare/execute idea is below.
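For illustration only (none of this code is from the question): a minimal bolt that opens its Cassandra session once in prepare and reuses it in execute. The DataStax driver calls (Cluster, Session) and the keyspace/table names are assumptions, and the backtype.storm import prefix matches older Storm releases (newer ones use org.apache.storm).
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CassandraWriterBolt extends BaseRichBolt {

    private OutputCollector collector;
    // transient: the bolt object is serialized when the topology is submitted,
    // so the connection must be created on the worker, inside prepare().
    private transient Cluster cluster;
    private transient Session session;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        // Called once per task when the worker starts, not once per tuple.
        this.collector = collector;
        this.cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        this.session = cluster.connect("my_keyspace");
    }

    @Override
    public void execute(Tuple tuple) {
        // Called for every tuple; reuses the session opened in prepare().
        session.execute("INSERT INTO results (id, value) VALUES (?, ?)",
                tuple.getStringByField("id"), tuple.getStringByField("value"));
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt only writes to Cassandra and emits nothing downstream.
    }

    @Override
    public void cleanup() {
        // Not guaranteed to run (see above), but close the connection if it does.
        if (session != null) {
            session.close();
        }
        if (cluster != null) {
            cluster.close();
        }
    }
}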
