Chronicle queue bookmarking - multithreading

I have a situation where I need to get the end index of the queue in one thread, then resume reading it from another thread at some later point in time.
If it were the same thread, this would be trivial. I'd just create a tailer, move to end, and then start reading from that tailer when I'm ready.
The documentation states that using a tailer from multiple threads will result in undefined behavior. I presume that creating a Tailer in one thread with .createTailer().direction(FORWARD).toEnd(), and then reading from that Tailer in another thread would violate the contract. If not, let me know, because that would be the easiest solution.
What I've tried to do instead is:
bookmarkTailer = queue.createTailer().direction(FORWARD).toEnd();
bookmarkIndex = bookmarkTailer.index(); // tailer left open to ensure the file doesn't expire
doAsync(() -> {
    tailer = queue.createTailer();
    if (!tailer.moveToIndex(bookmarkIndex)) {
        // fail
    }
});
But the moveToIndex() call always fails.

The simplest way to resume a tailer is to use a named tailer.
tailer = queue.createTailer("my-name");
You can create this again at any time, and it will continue from where it left off.

Related

Spawning a new thread for each object load

I have a system which runs multiple service (long-lived) and worker (short-lived) threads. They all share a state which contains objects. Any thread can request an object at any time, through a singleton-of-sorts class called ObjectManager. If the object is not available, it needs to be loaded.
Here's some pseudo-code of how object loading looks now:
class ObjectManager {
getLoadinData(path) {
if (hasLoadingDataFor(path))
return whatWeHave()
else {
loadingData = createNewLoadingData();
loadingData.path = path;
pushLoadingTaskToLoadingThread(loadingData);
return loadingData;
}
}
// loads object and blocks until it's loaded
loadObjectSync(path) {
loadingData = getLoadinData(path);
waitFor(loadingData.conditionVar);
return loadingData.loadedObject;
}
// initiates a load and calls a callback when done
loadObjectAsync(path, callback) {
loadingData = getLoadinData(path);
loadingData.callbacks.add(callback);
}
// dedicated loading thread
loadingThread() {
while (running) {
loadingData = waitForLoadingData();
object = readObjectFromDisk(loadingData.path);
object.onLoaded(); // !!!!
loadingData.loadedObject = object;
// unblock cv waiters
loadingData.conditionVar.notifyAll();
// call callbacks
loadingData.callbacks.callAll(object);
}
}
}
The problem is the line object.onLoaded(). I have no control over this function. Some objects might decide that they need other objects to be valid, so in their onLoaded method they might call loadObjectSync. Uh-oh! This (naturally) deadlocks: it blocks the loading loop until the loading loop makes more iterations.
What I could do to solve this is leave the onLoaded call to the initiating threads. This will change loadObjectSync to something like:
loadObjectSync(path) {
loadingData = getLoadinData(path);
waitFor(loadingData.conditionVar);
if (loadingData.wasCreatedInThisThread()) {
loadingData.loadedObject.onLoaded();
loadingData.onLoadedConditionVar.notifyAll();
loadingData.callbacks.callAll(loadingData.loadedObject);
}
else {
// wait more
waitFor(loadingData.onLoadedConditionVar);
}
return loadingData.loadedObject;
}
... but then the problem is that if there are no calls to loadObjectSync, only to loadObjectAsync, or if a loadObjectAsync call was simply the first to create the loading data, there will be no one to finalize the object. So to make this work, I have to introduce another thread that finalizes objects whose loadingData was created by loadObjectAsync.
It seems that this would work. But I have a simpler idea: what if I change getLoadinData instead, so that it does this:
getLoadinData(path) {
if (hasLoadingDataFor(path))
return whatWeHave()
else {
loadingData = createNewLoadingData();
loadingData.path = path;
///
thread = spawnLoadingThread(loadingData);
thread.detach();
///
return loadingData;
}
}
Spawn a new thread for every object load. Thus there is no deadlock. Every loading thread can safely block until it's done. The rest of the code remains exactly as it is.
This means potentially tens (or even thousands, in certain edge cases) of active threads waiting on condition variables. I know that spawning threads has its overhead, but I think it would be negligible compared to the cost of the I/O in readObjectFromDisk.
So my question is: Is this terrible? Can this somehow backfire?
The target platform is conventional desktop machines. But this software is supposed to run for a long time without stopping: weeks, maybe months.
Alternatively... even though I have an idea how to solve this if the thread-per-load turns out to be terrible, can this be solved in another way?
Very interesting! This is a problem I have bumped into a couple of times, trying to add a synchronous interface to a fundamentally asynchronous operation (i.e. file load, or in my case, network write) that is performed by a service thread.
My own preference would be to not provide the synchronous interface. Why? Because it keeps the code simpler in design & implementation and easier to reason about -- always important for multi-threading.
The benefit of sticking to a single thread and an async-only interface is that you only have one service thread, so resource growth is not a concern. In addition, the user callbacks are always invoked on this same thread, which simplifies thread-safety concerns for users of ObjectManager (if you have multiple callback threads, every user callback must be thread safe, so it's an important choice to make). However, sticking to an async-only interface does mean the user of ObjectManager has more work to do.
But if you do want to keep the synchronous interface, another approach that I have taken could work for you. Stick to a single service thread, but inside the implementation of loadObjectSync check the thread ID to determine whether the invoker is the service thread or any other thread. If it is any other thread, queue the request and safely block. If it is the service thread, load the object immediately by calling a new function, say loadObjectImpl, which is basically just the body of the loop inside loadingThread. You will need to grab the thread ID of the service thread during initialization and store it within the ObjectManager instance, and use that for thread identification.
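The thread-ID check described above can be sketched roughly as follows. This is a minimal illustration in Java, not the original poster's code: the class name ObjectManagerSketch, the dependencies map (standing in for whatever an object requests inside onLoaded), and the string-building "load" are all assumptions made for the example. A single-threaded executor plays the role of the loading thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; every name here is illustrative, not from the original code.
class ObjectManagerSketch {
    // path -> paths the object asks for inside its onLoaded() (stand-in data)
    private final Map<String, List<String>> dependencies;
    // Touched only by the service thread, so a plain HashMap is enough.
    private final Map<String, String> loaded = new HashMap<>();
    // Recorded when the executor creates its worker; used for thread identification.
    private volatile Thread serviceThread;
    private final ExecutorService service = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "object-loader");
        t.setDaemon(true);
        serviceThread = t;
        return t;
    });

    ObjectManagerSketch(Map<String, List<String>> dependencies) {
        this.dependencies = dependencies;
    }

    String loadObjectSync(String path) {
        if (Thread.currentThread() == serviceThread) {
            // Re-entrant call made from onLoaded on the service thread: load inline.
            return loadObjectImpl(path);
        }
        // Any other thread: queue the request and block until it completes.
        Callable<String> task = () -> loadObjectImpl(path);
        try {
            return service.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Always runs on the service thread.
    private String loadObjectImpl(String path) {
        String cached = loaded.get(path);
        if (cached != null) {
            return cached;
        }
        StringBuilder object = new StringBuilder(path); // stand-in for readObjectFromDisk
        for (String dep : dependencies.getOrDefault(path, List.of())) {
            // Stand-in for onLoaded() requesting another object synchronously.
            object.append('[').append(loadObjectSync(dep)).append(']');
        }
        loaded.put(path, object.toString());
        return object.toString();
    }

    public static void main(String[] args) {
        ObjectManagerSketch manager = new ObjectManagerSketch(
                Map.of("a", List.of("b", "c"), "b", List.of("c")));
        System.out.println(manager.loadObjectSync("a")); // prints a[b[c]][c]
    }
}
```

Note that this sketch does nothing about circular dependencies: two objects that request each other in onLoaded would recurse without end, so a real implementation would need some form of cycle detection.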

Linux kernel: how to wait in multiple wait queues?

I know how to wait in Linux kernel queues using wait_event and how to wake them up.
Now I need to figure out how to wait in multiple queues at once. I need to multiplex multiple event sources, basically in a way similar to poll or select, but since the sources of events don't have the form of a pollable file descriptor, I wasn't able to find inspiration in the implementation of these syscalls.
My initial idea was to take the code from the wait_event macro, use DEFINE_WAIT multiple times as well as prepare_to_wait.
However, given how prepare_to_wait is implemented, I'm afraid the internal linked list of the queue would become corrupted if the same "waiter" is added multiple times (which could maybe happen if one queue causes wakeup, but the wait condition isn't met and waiting is being restarted).
One possible scenario for waiting on several waitqueues:
int ret = 0; // Result of waiting; in form 0/-err.
// Define wait objects, one object per waitqueue.
DEFINE_WAIT_FUNC(wait1, default_wake_function);
DEFINE_WAIT_FUNC(wait2, default_wake_function);
// Add ourselves to all waitqueues.
add_wait_queue(wq1, &wait1);
add_wait_queue(wq2, &wait2);
// Waiting cycle
while(1) {
// Change task state for waiting.
// NOTE: this must come **before** the condition check to avoid races.
set_current_state(TASK_INTERRUPTIBLE);
// Check condition(s) which we are waiting
if(cond) break;
// Need to wait
schedule();
// Check if waiting has been interrupted by signal
if (signal_pending(current)) {
ret = -ERESTARTSYS;
break;
}
}
// Remove ourselves from all waitqueues.
remove_wait_queue(wq1, &wait1);
remove_wait_queue(wq2, &wait2);
// Restore task state
__set_current_state(TASK_RUNNING);
// 'ret' contains result of waiting.
Note that this scenario is slightly different from that of wait_event:
wait_event uses autoremove_wake_function for the wait object (created with DEFINE_WAIT). This function, called from wake_up(), removes the wait object from the queue, so the wait object has to be re-added to the queue on each iteration.
But with multiple waitqueues it is impossible to know which waitqueue has fired, so following this strategy would require re-adding every wait object on every iteration, which is inefficient.
Instead, our scenario uses default_wake_function for the wait objects, so an object is not removed from its waitqueue on a wake_up() call, and it is sufficient to add each wait object to its queue only once, before the loop.

multithreading with MQ

I'm having a problem using the MQSeries Perl module in a multi-threaded environment. Here is what I have tried:
Create two handles in different threads with $mqMgr = MQSeries::QueueManager->new(). I thought this would give me two different connections to MQ, but instead I got return code 2219 on the second call to MQOPEN(), which probably means I got the same underlying connection to MQ from two separate calls to the new() method.
Declare only one $mqMgr as a global shared variable. But I can't assign a reference to an MQSeries::QueueManager object to $mqMgr. The reason is "Type of arg 1 to threads::shared::share must be one of [$#%] (not subroutine entry)".
Declare only one $mqMgr as a global variable. Got the same 2219 code.
Tried to pass MQCNO_HANDLE_SHARE_NO_BLOCK into MQSeries::QueueManager->new(), so that a single connection can be shared across threads. But I cannot find a way to pass it in.
My questions are, with the Perl MQSeries module:
Can I get a separate connection to the MQ queue manager from each thread, and if so, how?
Can I share one connection to the MQ queue manager across different threads, and if so, how?
I have looked around with little luck. Any info would be appreciated.
related question:
C++ - MQ RC Code 2219
Update 1: added an example in which two local MQSeries::QueueManager objects in two threads cause MQ error code 2219.
use threads;
use Thread::Queue;
use MQSeries;
use MQSeries::QueueManager;
use MQSeries::Queue;
use MQSeries::Message;
# globals
our $jobQ = Thread::Queue->new();
our $resultQ = Thread::Queue->new();
# ----------------------------------------------------------------------------
# sub routines
# ----------------------------------------------------------------------------
sub worker {
# fetch work from $jobQ and put result to $resultQ
# ...
}
sub monitor {
# fetch result from $resultQ and put it onto another MQ queue
my $mqQMgr = MQSeries::QueueManager->new( ... );
# different queue from the one in main
# this would cause error with MQ code 2219
my $mqQ = MQSeries::Queue->new( ... );
while (defined(my $result = $resultQ->dequeue())) {
# create an mq message and put it into $mqQ
my $mqMsg = MQSeries::Message->new();
$mqQ->put($mqMsg);
}
}
# main
unless (caller()) {
# create connection to MQ
my $mqQMgr = MQSeries::QueueManager->new( ... );
my $mqQ = MQSeries::Queue->new( ... );
# create worker and monitor thread
my @workers;
for (1 .. $nThreads) {
push(@workers, threads->create('worker'));
}
my $monitor = threads->create('monitor');
while (1) {
my $mqMsg = MQSeries::Message->new ();
my $retCode = $mqQ->get(
Message => $mqMsg,
GetMsgOptions => $someOption,
Wait => $sometime
);
die("error") if ($retCode == 0);
next if ($retCode == -1); # no message
# now we have some job to do
$jobQ->enqueue($mqMsg->Data);
}
}
There is a very real danger when trying to multithread with modules that the module is not thread safe. There's a bunch of things that can just break messily because of the way threading works - you clone the current process state, and that includes things like file handles, sockets, etc.
But if you try and use them in an asynchronous/threaded way, they'll act really weird because the operations aren't (necessarily) atomic.
So whilst I can't answer your question directly, because I have no experience of the particular module:
Unless you know otherwise, assume you can't share between threads. It might be thread safe, it might not. If it isn't, it might still look ok, until one day you get a horrifically difficult to find bug as a result of a race condition in concurrent conditions.
A shared scalar/list is explicitly described in threads::shared as basically safe (and even then, you can still have problems with non-atomicity if you're not locking).
I would suggest therefore that what you need to do is either:
have a 'comms' thread, that does all the work related to the module, and make the other threads use IPC to talk to it. Thread::Queue can work nicely for this.
treat each thread as entirely separate for purposes of the module. That includes loading it (with require and import - not use because that acts earlier) and instantiating. (You might get away with 'loading' the module before threads start, but instantiating does things like creating descriptors, sockets etc.)
lock stuff when there's any danger of interruption of an atomic operation.
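The first option, a single 'comms' thread that owns the connection while every other thread only hands it messages through a queue, can be sketched as below. The sketch is in Java rather than Perl, purely to show the shape of the pattern; the class CommsThreadSketch and its methods are hypothetical, and a plain ArrayList stands in for the non-thread-safe connection object. In Perl, Thread::Queue would play the role of the BlockingQueue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; names are illustrative only.
class CommsThreadSketch {
    // Sentinel object, compared by identity, that tells the owner thread to stop.
    private static final String STOP = new String("stop");
    private final BlockingQueue<String> outbox = new LinkedBlockingQueue<>();
    // Stand-in for a connection that is not thread safe; only the owner thread touches it.
    private final List<String> sent = new ArrayList<>();
    private final Thread owner = new Thread(this::drain, "comms");

    void start() {
        owner.start();
    }

    // Safe to call from any thread: it only enqueues.
    void send(String message) {
        outbox.add(message);
    }

    // Stops the owner thread and returns what it sent; join() makes the list safe to read.
    List<String> stop() {
        outbox.add(STOP);
        join(owner);
        return sent;
    }

    private void drain() {
        try {
            for (String m = outbox.take(); m != STOP; m = outbox.take()) {
                sent.add(m); // all "connection" work happens on this one thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Starts the given number of worker threads, each sending perWorker messages.
    static List<String> demo(int workers, int perWorker) {
        CommsThreadSketch comms = new CommsThreadSketch();
        comms.start();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            int id = w;
            Thread t = new Thread(() -> {
                for (int i = 0; i < perWorker; i++) {
                    comms.send("worker-" + id + "-msg-" + i);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            join(t);
        }
        return comms.stop();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10).size()); // prints 40
    }
}
```

The point of the design is that no locking is needed around the connection itself, because exactly one thread ever uses it; the queue is the only shared structure.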
Much of the above also applies to fork parallelism too - but not in quite the same way, as fork makes "sharing" stuff considerably harder, so you're less likely to trip over it.
Edit:
Looking at the code you've posted, and cross-referencing it against the MQSeries source:
There is a BEGIN block that sets up some state for MQSeries at the point at which you use it.
Whilst I can't say for sure that this is your problem, it makes me very wary - because bear in mind that when it does that, it sets up some stuff - and then when your threads start, they inherit non-shared copies of "whatever it did" during that "BEGIN" block.
So in light of what I suggested earlier on - I would recommend you try (because I can't say for sure, as I don't have a reference implementation):
require MQSeries;
MQSeries->import;
Put this in your code - in lieu of use - after thread start. E.g. after you do the creates and within the thread subroutine.

Synchronously request data within JavaFX thread from different thread

I've got a separate thread which needs to request some data that lives on the JavaFX thread and may change in the meantime. I'd like to execute a blocking invocation in this separate thread that makes sure the request is enqueued on the JavaFX thread.
The Swing-GUI testing framework, AssertJ, provides an easy to use API for this purpose:
List list = GuiActionRunner.execute(new GuiQuery<...>...);
The invocation blocks the current thread, executes the passed code within event dispatching thread and returns the required data.
How can this be implemented in production code for JavaFX applications? What would be the recommended approach for this requirement?
Here's an alternative solution, using a FutureTask. This avoids the explicit latch and managing the synchronized data in an AtomicReference. The code here is probably simple enough that it would make including this functionality in Platform redundant.
FutureTask<List<?>> task = new FutureTask<>( () -> {
List<?> data = ... ; // access data
return data ;
});
Platform.runLater(task);
List<?> data = task.get();
This technique is very useful if you want to pause a background thread to await user input.
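To try the FutureTask hand-off without a JavaFX runtime, here is a minimal self-contained sketch. The helper name callAndWait and the class BlockingQuery are made up for this example, and a single-threaded executor stands in for the JavaFX application thread. In a real JavaFX application the Executor argument could be supplied as Platform::runLater.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

// Hypothetical helper; names are illustrative only.
class BlockingQuery {
    // Runs the query on the target thread and blocks the caller until the result is ready.
    static <T> T callAndWait(Executor target, Callable<T> query) {
        FutureTask<T> task = new FutureTask<>(query);
        target.execute(task); // FutureTask is a Runnable, so it can be handed over directly
        try {
            return task.get(); // blocks until the target thread has run the query
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-in for the JavaFX application thread.
        ExecutorService uiStandIn =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "ui-stand-in"));
        String where = callAndWait(uiStandIn, () -> Thread.currentThread().getName());
        System.out.println(where); // prints ui-stand-in
        uiStandIn.shutdown();
    }
}
```

One caveat with any blocking hand-off of this kind: if it is called from the target thread itself, the task can never run and the call hangs. In JavaFX, Platform.isFxApplicationThread() can be checked first so that the query is run directly in that case.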
Ok I think I got it now. You need to implement something like this yourself:
AtomicReference<List<?>> r = new AtomicReference<>();
CountDownLatch l = new CountDownLatch(1);
Platform.runLater( () -> {
// access data
r.set(...);
l.countDown();
});
l.await();
System.err.println(r.get());

What's the difference between log4net.LogicalThreadContext and log4net.ThreadContext

I don't understand the explanation in the official documentation:
Logical threads can jump from one managed thread to another.
What's the difference between ThreadContext and LogicalThreadContext?
Can someone elaborate on it?
Thanks.
I should go back and add this to my own question (that Stefan Egli linked above) ...
From what I can tell, there is very little practical difference between the two.
ThreadContext stores information in a Dictionary that is stored using Thread.SetData.
LogicalThreadContext stores its information in a Dictionary that is stored using the CallContext.
Information stored in the CallContext has almost the same accessibility as information stored using Thread.SetData. That is, the information is accessible to the thread that stored the information in the first place.
Now, IF the LogicalThreadContext used CallContext.LogicalSetData (or if the Dictionary stored using CallContext.SetData implemented the marker interface, ILogicalThreadAffinative) then there WOULD be a BIG difference. In that case, any information stored (via LogicalSetData) could be accessed within the same thread AND would be passed to child threads. In addition (because it flows with the logical thread), the information can flow across remoting calls and across AppDomains (if the data is Serializable).
I would have put in some links, but am working from iPhone so is a little awkward. There are some good links in the link that Stefan Egli posted above.
Also, look at Jeffrey Richter's blog from September for an article on CallContext.LogicalSetData. I used his test program as a basis for comparing CallContext.SetData vs CallContext.LogicalSetData vs Thread.SetData vs [ThreadStatic]. Last time I checked, it was the last thing he posted.
Will try to come back and post more links and/or some sample code when I have easy access to computer.
Good luck!
From using this myself, I see the benefit of using the LogicalThreadContext when working with multi-threaded logic (async, await).
For example, if you set the property on your original calling thread using ThreadContext, it is also available to any other tasks that get to run on the same thread.
// caller thread (thread #1)
log4net.ThreadContext.Properties["MyProp"] = "123"; // now set on thread #1
log("start");
await Task.WhenAll(
MyAsync(1), // `Issue` if task run on thread #1, it will have "MyProp"
MyAsync(2) // `Issue` if task run on thread #1, it will have "MyProp"
);
log("end"); // `Issue` only by random chance will you run on thread #1 again
Whereas if you use LogicalThreadContext, it stays on the calling context.
// caller thread (thread #1)
log4net.LogicalThreadContext.Properties["MyProp"] = "123"; // now set on calling context
log("start");
await Task.WhenAll(
MyAsync(1), // if task run on thread #1, there is no "MyProp"
MyAsync(2) // if task run on thread #1, there is no "MyProp"
);
log("end"); // if task run on thread #1, there is no "MyProp"
With await you are never guaranteed you come back to the same thread as when you started and the calling context will have changed, so you will have to set the property again.
...
log4net.LogicalThreadContext.Properties["MyProp"] = "123";
log("end");
