Hazelcast ScheduledExecutorService lost tasks after node shutdown

I'm trying to use hazelcast ScheduledExecutorService to execute some periodic tasks. I'm using hazelcast 3.8.1.
I start one node and then the other, and the tasks are distributed between both nodes and properly executed.
If I shutdown the first node, then the second one will start to execute the periodic tasks that were previously on the first node.
The problem is that, if I stop the second node instead of the first, then its tasks are not rescheduled to the first one. This happens even if I have more nodes. If I shutdown the last node to receive tasks, those tasks are lost.
The shutdown is always done with ctrl+c
I've created a test application, with some sample code from hazelcast examples and with some pieces of code I've found on the web. I start two instances of this app.
public class MasterMember {
/**
* The constant LOG.
*/
final static Logger logger = LoggerFactory.getLogger(MasterMember.class);
public static void main(String[] args) throws Exception {
Config config = new Config();
config.setProperty("hazelcast.logging.type", "slf4j");
config.getScheduledExecutorConfig("scheduler").
setPoolSize(16).setCapacity(100).setDurability(1);
final HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
Runtime.getRuntime().addShutdownHook(new Thread() {
HazelcastInstance threadInstance = instance;
@Override
public void run() {
logger.info("Application shutdown");
for (int i = 0; i < 12; i++) {
logger.info("Verifying whether it is safe to close this instance");
boolean isSafe = getResultsForAllInstances(hzi -> {
if (hzi.getLifecycleService().isRunning()) {
return hzi.getPartitionService().forceLocalMemberToBeSafe(10, TimeUnit.SECONDS);
}
return true;
});
if (isSafe) {
logger.info("Verifying whether cluster is safe.");
isSafe = getResultsForAllInstances(hzi -> {
if (hzi.getLifecycleService().isRunning()) {
return hzi.getPartitionService().isClusterSafe();
}
return true;
});
if (isSafe) {
System.out.println("is safe.");
break;
}
}
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
threadInstance.shutdown();
}
private boolean getResultsForAllInstances(
Function<HazelcastInstance, Boolean> hazelcastInstanceBooleanFunction) {
return Hazelcast.getAllHazelcastInstances().stream().map(hazelcastInstanceBooleanFunction).reduce(true,
(old, next) -> old && next);
}
});
new Thread(() -> {
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
IScheduledExecutorService scheduler = instance.getScheduledExecutorService("scheduler");
scheduler.scheduleAtFixedRate(named("1", new EchoTask("1")), 5, 10, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(named("2", new EchoTask("2")), 5, 10, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(named("3", new EchoTask("3")), 5, 10, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(named("4", new EchoTask("4")), 5, 10, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(named("5", new EchoTask("5")), 5, 10, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(named("6", new EchoTask("6")), 5, 10, TimeUnit.SECONDS);
}).start();
new Thread(() -> {
try {
// delays init
Thread.sleep(20000);
while (true) {
IScheduledExecutorService scheduler = instance.getScheduledExecutorService("scheduler");
final Map<Member, List<IScheduledFuture<Object>>> allScheduledFutures =
scheduler.getAllScheduledFutures();
// check if the subscription already exists as a task, if so, stop it
for (final List<IScheduledFuture<Object>> entry : allScheduledFutures.values()) {
for (final IScheduledFuture<Object> objectIScheduledFuture : entry) {
logger.info(
"TaskStats: name {} isDone() {} isCanceled() {} total runs {} delay (sec) {} other statistics {} ",
objectIScheduledFuture.getHandler().getTaskName(), objectIScheduledFuture.isDone(),
objectIScheduledFuture.isCancelled(),
objectIScheduledFuture.getStats().getTotalRuns(),
objectIScheduledFuture.getDelay(TimeUnit.SECONDS),
objectIScheduledFuture.getStats());
}
}
Thread.sleep(15000);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}).start();
while (true) {
Thread.sleep(1000);
}
// Hazelcast.shutdownAll();
}
}
And the task
public class EchoTask implements Runnable, Serializable {
/**
* serialVersionUID
*/
private static final long serialVersionUID = 5505122140975508363L;
final Logger logger = LoggerFactory.getLogger(EchoTask.class);
private final String msg;
public EchoTask(String msg) {
this.msg = msg;
}
@Override
public void run() {
logger.info("--> " + msg);
}
}
Am I doing something wrong?
Thanks in advance
-- EDIT --
Modified (and updated above) the code to use log instead of system.out. Added logging of task statistics and fixed usage of the Config object.
The logs:
Node1_log
Node2_log
I forgot to mention that I wait until all the tasks are running on the first node before starting the second one.

Bruno, thanks for reporting this; it really is a bug. Unfortunately, it was not as obvious with multiple nodes as it is with just two. As you figured out in your answer, it's not losing the task but rather keeping it cancelled after a migration. Your fix, however, is not safe, because a task can be cancelled and have a null Future at the same time, e.g. when you cancel the master replica, the backup, which never had a future, just gets the result. The fix is very close to what you did: in prepareForReplication(), when in migrationMode, we avoid setting the result. I will push a fix for that shortly; I'm just running a few more tests. It will be available in master and later versions.
I logged an issue with your finding, if you don't mind: https://github.com/hazelcast/hazelcast/issues/10603. You can keep track of its status there.

I was able to do a quick fix for this issue by changing the ScheduledExecutorContainer class of the Hazelcast project (using the 3.8.1 source code), namely the promoteStash() method. Basically, I added a condition for the case where the task was cancelled during a previous migration of data.
I don't know the possible side effects of this change, or whether this is the best way to do it!
void promoteStash() {
for (ScheduledTaskDescriptor descriptor : tasks.values()) {
try {
if (logger.isFinestEnabled()) {
logger.finest("[Partition: " + partitionId + "] " + "Attempt to promote stashed " + descriptor);
}
if (descriptor.shouldSchedule()) {
doSchedule(descriptor);
} else if (descriptor.getTaskResult() != null && descriptor.getTaskResult().isCancelled()
&& descriptor.getScheduledFuture() == null) {
// tasks that were already present in this node, once they get sent back to this node, since they
// have been cancelled when migrating the task to other node, are not rescheduled...
logger.fine("[Partition: " + partitionId + "] " + "Attempt to promote stashed canceled task "
+ descriptor);
descriptor.setTaskResult(null);
doSchedule(descriptor);
}
descriptor.setTaskOwner(true);
} catch (Exception e) {
throw rethrow(e);
}
}
}


How to properly close a flowable and close response body using rxjava and retrofit

I am attempting to close a stream coming from an HTTP request using Retrofit and RxJava, either because it timed out or because I need to change details that went into the request. Both appear to work perfectly: when I cancel the subscription I get the doOnCancel debug message, and when doOnNext is completed I get the doOnTerminate message. I also do not receive input lines from multiple threads. However, my thread count rises every single time either of the above actions happens. It appears that responseBody.close is not releasing its resources and therefore the thread is not dying (I have also gotten error messages along the lines of "OKHTTP leaked. did you close your responseBody?").
Does anyone have any suggestions?
public boolean closeSubscription() {
flowableAlive = false;
subscription.cancel();
return true;
}
public void subscribeToFlowable() {
streamFlowable.observeOn(Schedulers.newThread()).subscribeOn(Schedulers.newThread())
.doOnTerminate(() -> log.debug("TERMINATED")).doOnCancel(() -> log.debug("FLOWABLE CANCELED"))
.subscribe(new Subscriber<ResponseBody>() {
@Override
public void onSubscribe(Subscription s) {
subscription = s;
subscription.request(Long.MAX_VALUE);
}
@Override
public void onNext(ResponseBody responseBody) {
log.debug("onNext called");
String inputLine;
try (InputStream inputStream = responseBody.byteStream()) {
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
while (flowableAlive && ((inputLine = br.readLine()) != null)) {
log.debug("stream receive input line for thread " + name);
log.debug(inputLine);
}
} catch (IOException e) {
log.debug("error occurred");
log.debug(e.getMessage());
}
}
@Override
public void onError(Throwable t) {
log.debug("error");
flowableAlive = false;
}
@Override
public void onComplete() {
log.debug("completed");
closeSubscription();
flowableAlive = false;
}
});
}
The result of subscribe() is a Disposable object. You should store it as a field and call Disposable.dispose() on it later, as shown here:
https://proandroiddev.com/disposing-on-android-the-right-way-97bd55cbf970
Your OkHttp call will be interrupted properly because dispose() interrupts the thread on which the call runs, and OkHttp regularly checks whether the thread was interrupted so that it can stop the transfer when that happens. This is called cooperative cancellation/interruption.
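The cooperative interruption described above can be illustrated with plain java.util.concurrent, without RxJava or OkHttp. This is only a sketch of the general mechanism: the class name CooperativeCancelDemo and the sleep loop standing in for a stream read are assumptions made for illustration, and Future.cancel(true) merely plays the role the answer attributes to dispose().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CooperativeCancelDemo {

    /**
     * Starts a long-running worker, cancels it, and reports whether the
     * worker noticed the interrupt and ran its cleanup code.
     */
    public static boolean cancelAndCheckCleanup() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean cleanedUp = new AtomicBoolean(false);

        Future<?> handle = pool.submit(() -> {
            try {
                started.countDown();
                // Hypothetical stand-in for reading a stream chunk by chunk:
                // the worker checks the interrupt flag between chunks.
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(10); // throws if interrupted while sleeping
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag
            } finally {
                cleanedUp.set(true); // where a response body would be closed
                finished.countDown();
            }
        });

        try {
            started.await();
            handle.cancel(true); // interrupts the thread running the task
            return finished.await(5, TimeUnit.SECONDS) && cleanedUp.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker cleaned up after cancel: " + cancelAndCheckCleanup());
    }
}
```

The point of the sketch is that cancellation only works if the worker cooperates: it has to check the interrupt flag (or call a method that throws InterruptedException) and release its resources in a finally block.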

Executor Service with void return type

Here I want to run n threads, each executing padrDao.saveGuidanceDetails(sgd), a DAO method that performs an insert and returns a long value, as shown in the code below.
I'm using Callable, but it requires me to return a value, and I'm not familiar enough with threads to use Runnable for the same job. Can someone please validate whether the code is right or needs modifications? I feel the code is wrong, since there is a return statement inside the Callable and that will take me out of the main method on the very first task.
int totalThreadsNeeded=listForguidanceItems.size();
ExecutorService executor = Executors.newFixedThreadPool(totalThreadsNeeded);
List<Callable<Void>> runnableTasks = new ArrayList<>();
final PriceLineItemsResultExt response1=response;
for(final ProductLineItemResultExt item: listForguidanceItems)
{
int counter=0;
final SavedGuidanceDetailsDto sgd=list.get(counter);
Callable<Void> task1 = new Callable() {
public Void call() {
if (sgd.hasGuidance())
{
if (response1.isSaveGuidance()) {
long guidanceDetailsId = padrDao.saveGuidanceDetails(sgd);
item.setGuidanceDetailsId(String.valueOf(guidanceDetailsId));
}
}
return null;
}};
counter++;
runnableTasks.add(task1);
}
try {
executor.invokeAll(runnableTasks);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
logger.info("Thread fail exception " + e);
}
executor.shutdown();
Please suggest modifications with the right code. Thanks in advance.
To use Runnable you can simply replace these :
Callable<Void> task1 = new Callable() {
public Void call() {
...
With
Runnable task1 = new Runnable() {
public void run() {
...
And with runnable you wouldn't have to return anything.
Of course you'd also need to change runnableTasks to a List<Runnable> if you still want to store these in a collection (possibly not), and also change the way you submit them to the ExecutorService:
executor.submit(your_Runnable_object)
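A minimal, self-contained sketch of the Runnable variant follows. The class name RunnableSubmitDemo, the pool size of 4, and the counter standing in for the DAO insert are all assumptions for illustration, not the asker's code. Note that a return statement inside run() or call() only leaves that method, not the method that created the task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class RunnableSubmitDemo {

    /** Submits one Runnable per item, waits for all of them, and returns how many ran. */
    public static int runAll(int itemCount) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        AtomicInteger saved = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < itemCount; i++) {
                Runnable task = new Runnable() {
                    @Override
                    public void run() {
                        // Hypothetical stand-in for padrDao.saveGuidanceDetails(sgd)
                        saved.incrementAndGet();
                    }
                };
                futures.add(executor.submit(task));
            }
            for (Future<?> future : futures) {
                future.get(); // blocks until the task is done; rethrows its failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
        return saved.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks completed: " + runAll(5));
    }
}
```

Keeping the Future returned by submit() is optional, but calling get() on it is what lets the caller wait for completion and see exceptions thrown inside run().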

Run swingworkers sequentially with semaphore

I have a panel with a JTabbedPane, and in every tab you can set parameters to execute a query. While one query is busy retrieving its data from the database, you can already open a new tab to set new parameters. To avoid overloading the database, only one query may be executed at a time. But when you click execute, the program must remember which queries to execute and in what order. During execution a loader icon is shown, and the GUI must not freeze, because there is a stop button you can click to stop the execution.
I used a SwingWorker to keep the GUI from blocking while executing the query, and that works fine. But now I want to prevent the next query from starting before the previous one has finished. In a model that is common to the whole panel, I initialized a semaphore: private final Semaphore semaphore = new Semaphore(1, true);
This is the code which starts the SwingWorker (I've added println commands to see which one is started, stopped or finished):
private void doStoredQuery() {
try {
semaphore.acquire();
System.out.println(queryName + "started");
worker.execute();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
And this is my swingworker (initializeWorker() is called from the constructor of the main class):
private SwingWorker<StoredQueryDataModel, Integer> initializeWorker() {
worker = new SwingWorker<StoredQueryDataModel, Integer>() {
@Override
protected StoredQueryDataModel doInBackground() throws Exception {
try {
StoredQueryDataModel dataModel = null;
publish(0);
try {
dataModel = new StoredQueryDataModel(queryRunner, ldbName, queryName, params);
} catch (S9SQLException e) {
//
} catch (Throwable e) {
showErrorMessage(e);
}
return dataModel;
}
finally {
semaphore.release();
System.out.println(queryName + "finished");
}
}
@Override
protected void process(List<Integer> chunks) {
//ignore chunks, just reload loader icon
panel.repaint();
}
@Override
protected void done() {
String error;
try {
result = get();
error = null;
} catch (Exception e) {
error = e.getMessage();
}
if(result == null) {
semaphore.release();
System.out.println(queryName + " stopped");
}
if(error == null) {
// process result
}
else {
showErrorMessage(new Throwable(error));
}
}
};
return worker;
}
I've tried putting the acquire and release at other positions in the code, but nothing seems to work. I am quite new to both SwingWorker and semaphores... Can someone help?
I have found the problem: the semaphore had to be a static variable. In my code there were as many semaphores as there are tabs, which caused them to run at the same time instead of sequentially.
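The effect of sharing a single static semaphore can be sketched without Swing. In this illustration the names SharedSemaphoreDemo and QueryTab and the sleep standing in for the database query are assumptions; plain threads replace the SwingWorkers so the example runs headless.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSemaphoreDemo {

    /** Hypothetical stand-in for one tab; all instances share the static semaphore. */
    static class QueryTab {
        private static final Semaphore SEMAPHORE = new Semaphore(1, true);
        private static final AtomicInteger RUNNING = new AtomicInteger();
        private static final AtomicInteger MAX_RUNNING = new AtomicInteger();

        void runQuery() throws InterruptedException {
            SEMAPHORE.acquire();
            try {
                int now = RUNNING.incrementAndGet();
                MAX_RUNNING.accumulateAndGet(now, Math::max); // track peak concurrency
                Thread.sleep(20); // stand-in for the database query
                RUNNING.decrementAndGet();
            } finally {
                SEMAPHORE.release(); // exactly one release per acquire
            }
        }
    }

    /** Runs one query per tab on its own thread and returns the peak number running at once. */
    public static int maxConcurrentQueries(int tabs) {
        Thread[] threads = new Thread[tabs];
        for (int i = 0; i < tabs; i++) {
            threads[i] = new Thread(() -> {
                try {
                    new QueryTab().runQuery();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return QueryTab.MAX_RUNNING.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent queries: " + maxConcurrentQueries(4));
    }
}
```

With an instance field instead of the static one, each QueryTab would acquire its own semaphore and the queries would overlap. It may also be worth checking that every acquire is matched by exactly one release: releasing in both doInBackground() and done() would add an extra permit and allow two queries to run at once.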

Blackberry multi threading issue

I am developing a BlackBerry application which communicates with the server via HTTP requests (javax.microedition.io.HttpConnection). On the device, the user clicks some UI items and the device sends requests to the server; when the response comes, the UI changes. Communication takes place on a new thread, while the UI thread pushes and pops a ProgressDialogScreen.
The problem is that sometimes, when the response comes and the ProgressDialogScreen is popped, the UI does not change immediately but only after a couple of seconds. If you make another request in between, after the ProgressDialogScreen is popped and before the new Screen is pushed, things get messy: first the older new Screen is pushed, and then the newest one. This can look as if the server were responding to the wrong requests. These problems occur on both the simulator and the device.
The other problem is that sometimes two identical responses are returned for one request. I was able to see these two problems in the logs on the simulator, but I have not been able to see this issue on the device, since I cannot see the logs there.
EDIT:
String utf8Response;
HttpConnection httpConn = null;
try{
httpConn = (HttpConnection) Connector.open(url);
httpConn.setRequestMethod(HttpConnection.GET);
httpConn.setRequestProperty("Content-Type", "text/html; charset=UTF8");
if(sessionIdCookie != null){
//may throw IOException, if the connection is in the connected state.
httpConn.setRequestProperty("Cookie", sessionIdCookie);
}
}catch (Exception e) {
//...
}
try{
httpConn.getResponseCode();
return httpConn;
}catch (IOException e) {
// ...
}
byte[] responseStr = new byte[(int)httpConn.getLength()];
DataInputStream strm = httpConn.openDataInputStream();
strm.readFully(responseStr);
try{
strm.close();
}catch (IOException e) {
// ....
}
utf8Response = new String(responseStr, "UTF-8");
If this code runs successfully, the following piece of code runs and a new screen is pushed:
UiApplication.getUiApplication().invokeLater(new Runnable() {
public void run() {
Vector accounts = Parser.parse(utf8Response,Parser.ACCOUNTS);
if (accounts.size() == 0){
DialogBox.inform(Account.NO_DEPOSIT);
return;
}
currentScreen = new AccountListScreen(accounts);
changeScreen(null,currentScreen);
}
});
public void changeScreen(final AbstractScreen currentScreen,final AbstractScreen nextScreen) {
if (currentScreen != null)
UiApplication.getUiApplication().popScreen(currentScreen);
if (nextScreen != null)
UiApplication.getUiApplication().pushScreen(nextScreen);
}
EDITv2:
private static void progress(final Stoppable runThis, String text,boolean cancelable) {
progress = new ProgressBar(runThis, text,cancelable);
Thread threadToRun = new Thread() {
public void run() {
UiApplication.getUiApplication().invokeLater(new Runnable() {
public void run() {
try{
UiApplication.getUiApplication().pushScreen(progress);
}catch(Exception e){
Logger.log(e);
}
}
});
try {
runThis.run();
} catch (Throwable t) {
t.printStackTrace();
}
UiApplication.getUiApplication().invokeLater(new Runnable() {
public void run() {
try {
UiApplication.getUiApplication().popScreen(progress);
} catch (Exception e) { }
}
});
}
};
threadToRun.start();
}
By the way, ProgressBar extends net.rim.device.api.ui.container.PopupScreen and Stoppable extends Runnable.
I preferred to pop progress bar after new Screen is prepared and pushed. This way there will be no new request between request and response.
Why not do:
private static void progress(final Stoppable runThis, String text,boolean cancelable) {
progress = new ProgressBar(runThis, text,cancelable);
UiApplication.getUiApplication().pushScreen(progress);
[...]
It seems like you are parsing on the UI thread. Please move Vector accounts = Parser.parse(utf8Response,Parser.ACCOUNTS); off the UI thread and do it in a separate thread.

Jackrabbit and concurrent modification

After doing some performance testing on our application, which uses Jackrabbit, we ran into a huge problem with concurrent modification of Jackrabbit's repository. The problem appears when we add or edit nodes in a multithreaded emulation. I then wrote a very simple test, which shows that the problem is not in our environment.
Here it is:
Simple Stateless Bean
@Stateless
@Local(TestFacadeLocal.class)
@Remote(TestFacadeRemote.class)
public class TestFacadeBean implements TestFacadeRemote, TestFacadeLocal {
public void doAction(int name) throws Exception {
new TestSynch().doAction(name);
}
}
Simple class
public class TestSynch {
public void doAction(int name) throws Exception {
Session session = ((Repository) new InitialContext().
lookup("java:jcr/local")).login(
new SimpleCredentials("username", "pwd".toCharArray()));
List<Node> added = new ArrayList<Node>();
Node folder = session.getRootNode().getNode("test");
for (int i = 0; i <= 100; i++) {
Node child = folder.addNode("" + System.currentTimeMillis(),
"nt:folder");
child.addMixin("mix:versionable");
added.add(child);
}
// saving batch changes
session.save();
//checking in all created nodes
for (Node node : added) {
session.getWorkspace().getVersionManager().checkin(node.getPath());
}
}
}
And Test class
public class Test {
private int c = 0;
private int countAll = 50;
private ExecutorService executor = Executors.newFixedThreadPool(5);
public ExecutorService getExecutor() {
return executor;
}
public static void main(String[] args) {
Test test = new Test();
try {
test.start();
} catch (Exception e) {
e.printStackTrace();
}
}
private void start() throws Exception {
long time = System.currentTimeMillis();
TestFacadeRemote testBean = (TestFacadeRemote) getContext().
lookup( "test/TestFacadeBean/remote");
for (int i = 0; i < countAll; i++) {
getExecutor().execute(new TestInstallerThread(i, testBean));
}
getExecutor().shutdown();
while (!getExecutor().isTerminated()) {
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
System.out.println(c + " shutdown " +
(System.currentTimeMillis() - time));
}
class TestInstallerThread implements Runnable {
private int number = 0;
TestFacadeRemote testBean;
public TestInstallerThread(int number, TestFacadeRemote testBean) {
this.number = number;
this.testBean = testBean;
}
@Override
public void run() {
try {
System.out.println("Installing data " + number);
testBean.doAction(number);
System.out.println("STOP" + number);
} catch (Exception e) {
e.printStackTrace();
c++;
}
}
}
public Context getContext() throws NamingException {
Properties properties = new Properties();
//init props
..............
return new InitialContext(properties);
}
}
If I initialize the executor with 1 thread in the pool, everything completes without any error. If I initialize the executor with 5 threads, I sometimes get errors:
on the client
java.lang.RuntimeException: javax.transaction.RollbackException: [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state
at org.jboss.aspects.tx.TxPolicy.handleEndTransactionException(TxPolicy.java:198)
on the server, at the beginning, a warning
ItemStateReferenceCache [ItemStateReferenceCache.java:176] overwriting cached entry 187554a7-4c41-404b-b6ee-3ce2a9796a70
and then
javax.jcr.RepositoryException: org.apache.jackrabbit.core.state.ItemStateException: there's already a property state instance with id 52fb4b2c-3ef4-4fc5-9b79-f20a6b2e9ea3/{http://www.jcp.org/jcr/1.0}created
at org.apache.jackrabbit.core.PropertyImpl.restoreTransient(PropertyImpl.java:195) ~[jackrabbit-core-2.2.7.jar:2.2.7]
at org.apache.jackrabbit.core.ItemSaveOperation.restoreTransientItems(ItemSaveOperation.java:879) [jackrabbit-core-2.2.7.jar:2.2.7]
We have tried synchronizing this method and other workarounds to handle multithreaded calls as one thread. Nothing helps.
And one more thing: when we ran a similar test without the EJB layer, everything worked fine.
It looks like the container wraps the call in its own transaction and then everything crashes.
Maybe somebody has faced such a problem.
Thanks in advance.
From the Jackrabbit Wiki:
The JCR specification explicitly states that a Session is not thread-safe (JCR 1.0 section 7.5 and JCR 2.0 section 4.1.2). Hence, Jackrabbit does not support multiple threads concurrently reading from or writing to the same session. Each session should only ever be accessed from one thread.
...
If you need to write to the same node concurrently, then you need to use multiple sessions, and use JCR locking to ensure there is no conflict.
