In Kotlin Native, how to keep an object around in a separate thread, and mutate its state from any other thead without using C pointers? - multithreading

I'm exploring Kotlin Native and have a program with a bunch of Workers doing concurrent stuff
(running on Windows, but this is a general question).
Now, I wanted to add simple logging. A component that simply logs strings by appending them as new lines to a file that is kept open in 'append' mode.
(Ideally, I'd just have a "global" function...
fun log(text:String) {...} ]
...that I would be able to call from anywhere, including from "inside" other workers and that would just work. The implication here is that it's not trivial to do this because of Kotlin Native's rules regarding passing objects between threads (TLDR: you shouldn't pass mutable objects around. See: https://github.com/JetBrains/kotlin-native/blob/master/CONCURRENCY.md#object-transfer-and-freezing ).
Also, my log function would ideally accept any frozen object. )
What I've come up with are solutions using DetachedObjectGraph:
First, I create a detached logger object
val loggerGraph = DetachedObjectGraph { FileLogger("/foo/mylogfile.txt")}
and then use loggerGraph.asCPointer() ( asCPointer() ) to get a COpaquePointer to the detached graph:
val myPointer = loggerGraph.asCPointer()
Now I can pass this pointer into the workers ( via the producer lambda of the Worker's execute function ), and use it there. Or I can store the pointer in a #ThreadLocal global var.
For the code that writes to the file, whenever I want to log a line, I have to create a DetachedObjectGraph object from the pointer again,
and attach() it in order to get a reference to my fileLogger object:
val fileLogger = DetachedObjectGraph(myPointer).attach()
Now I can call a log function on the logger:
fileLogger.log("My log message")
This is what I've come up with looking at the APIs that are available (as of Kotlin 1.3.61) for concurrency in Kotlin Native,
but I'm left wondering what a better approach would be ( using Kotlin, not resorting to C ). Clearly it's bad to create a DetachedObjectGraph object for every line written.
One could pose this question in a more general way: How to keep a mutable resource open in a separate thread ( or worker ), and send messages to it.
Side comment: Having Coroutines that truly use threads would solve this problem, but the question is about how to solve this task with the APIs currently ( Kotlin 1.3.61 ) available.

You definitely shouldn't use DetachedObjectGraph in the way presented in the question. There's nothing to prevent you from trying to attach on multiple threads, or if you pass the same pointer, trying to attach to an invalid one after another thread as attached to it.
As Dominic mentioned, you can keep the DetachedObjectGraph in an AtomicReference. However, if you're going to keep DetachedObjectGraph in an AtomicReference, make sure the type is AtomicRef<DetachedObjectGraph?> and busy-loop while the DetachedObjectGraph is null. That will prevent the same DetachedObjectGraph from being used by multiple threads. Make sure to set it to null, and repopulate it, in an atomic way.
However, does FileLogger need to be mutable at all? If you're writing to a file, it doesn't seem so. Even if so, I'd isolate the mutable object to a separate worker and send log messages to it rather than doing a DetachedObjectGraph inside an AtomicRef.
In my experience, DetachedObjectGraph is super uncommon in production code. We don't use it anywhere at the moment.
To isolate mutable state to a Worker, something like this:
class MutableThing<T:Any>(private val worker:Worker = Worker.start(), producer:()->T){
private val arStable = AtomicReference<StableRef<T>?>(null)
init {
worker.execute(TransferMode.SAFE, {Pair(arStable, producer).freeze()}){
it.first.value = StableRef.create(it.second()).freeze()
}
}
fun <R> access(block:(T)->R):R{
return worker.execute(TransferMode.SAFE, {Pair(arStable, block).freeze()}){
it.second(it.first.value!!.get())
}.result
}
}
object Log{
private val fileLogger = MutableThing { FileLogger() }
fun log(s:String){
fileLogger.access { fl -> fl.log(s) }
}
}
class FileLogger{
fun log(s:String){}
}
The MutableThing uses StableRef internally. producer makes the mutable state you want to isolate. To log something, call Log.log, which will wind up calling the mutable FileLogger.
To see a basic example of MutableThing, run the following test:
#Test
fun goIso(){
val mt = MutableThing { mutableListOf("a", "b")}
val workers = Array(4){Worker.start()}
val futures = mutableListOf<Future<*>>()
repeat(1000) { rcount ->
val future = workers[rcount % workers.size].execute(
TransferMode.SAFE,
{ Pair(mt, rcount).freeze() }
) { pair ->
pair.first.access {
val element = "ttt ${pair.second}"
println(element)
it.add(element)
}
}
futures.add(future)
}
futures.forEach { it.result }
workers.forEach { it.requestTermination() }
mt.access {
println("size: ${it.size}")
}
}

The approach you've taken is pretty much correct and the way it's supposed to be done.
The thing I would add is, instead of passing around a pointer around. You should pass around a frozen FileLogger, which will internally hold a reference to a AtomicRef<DetachedObjectGraph>, the the attaching and detaching should be done internally. Especially since DetachedObjectGraphs are invalid once attached.

Related

Coordinating emission and subscription in Kotlin coroutines with hot flows

I am trying to design an observable task-like entity which would have the following properties:
Reports its current state changes reactively
Shares state and result events: new subscribers will also be notified if the change happens after they've subscribed
Has a lifecycle (backed by CoroutineScope)
Doesn't have suspend functions in the interface (because it has a lifecycle)
The very basic code is something like this:
class Worker {
enum class State { Running, Idle }
private val state = MutableStateFlow(State.Idle)
private val results = MutableSharedFlow<String>()
private val scope = CoroutineScope(Dispatchers.Default)
private suspend fun doWork(): String {
println("doing work")
return "Result of the work"
}
fun start() {
scope.launch {
state.value = State.Running
results.emit(doWork())
state.value = State.Idle
}
}
fun state(): Flow<State> = state
fun results(): Flow<String> = results
}
The problems with this arise when I want to "start the work after I'm subscribed". There's no clear way to do that. The simplest thing doesn't work (understandably):
fun main() {
runBlocking {
val worker = Worker()
// subscriber 1
launch {
worker.results().collect { println("received result $it") }
}
worker.start()
// subscriber 2 can also be created "later" and watch
// for state()/result() changes
}
}
This prints only "doing work" and never prints a result. I understand why this happens (because collect and start are in separate coroutines, not synchronized in any way).
Adding a delay(300) to coroutine inside doWork "fixes" things, results are printed, but I'd like this to work without artificial delays.
Another "solution" is to create a SharedFlow from results() and use its onSubscription to call start(), but that didn't work either last time I've tried.
My questions are:
Can this be turned into something that works or is this design initially flawed?
If it is flawed, can I take some other approach which would still hit all the goals I have specified in the beginning of the post?
Your problem is that your SharedFlow has no buffer set up, so it is emitting results to its (initially zero) current collectors and immediately forgetting them. The MutableSharedFlow() function has a replay parameter you can use to determine how many previous results it should store and replay to new collectors. You will need to decide what replay amount to use based on your use case for this class. For simply displaying latest results in a UI, a common choice is a replay of 1.
Depending on your use case, you may want to give your CoroutineScope a SupervisorJob() in its context so it isn't destroyed by any child job failing.
Side note, your state() and results() functions should be properties by Kotlin convention, since they do nothing but return references. Personally, I would also have them return read-only StateFlow/SharedFlow instead of just Flow to clarify that they are not cold.

Child thread not seeing updates made by main thread

I'm implementing a SparkHealthListener by extending SparkListener class.
#Component
class ClusterHealthListener extends SparkListener with Logging {
val appRunning = new AtomicBoolean(false)
val executorCount = new AtomicInteger(0)
override def onApplicationStart(applicationStart: SparkListenerApplicationStart) = {
logger.info("Application Start called .. ")
this.appRunning.set(true)
logger.info(s"[appRunning = ${appRunning.get}]")
}
override def onExecutorAdded(executorAdded: SparkListenerExecutorAdded) = {
logger.info("Executor add called .. ")
this.executorCount.incrementAndGet()
logger.info(s"[executorCount = ${executorCount.get}]")
}
}
appRunning and executorCount are two variables declared in ClusterHealthListener class. ClusterHealthReporterThread will only read the values.
#Component
class ClusterHealthReporterThread #Autowired() (healthListener: ClusterHealthListener) extends Logging {
new Thread {
override def run(): Unit = {
while (true) {
Thread.sleep(10 * 1000)
logger.info("Checking range health")
logger.info(s"[appRunning = ${healthListener.appRunning.get}] [executorCount=${healthListener.executorCount.get}]"
}
}
}.start()
}
ClusterHealthReporterThread is always reporting the initialized values regardless of the changes made to the variable by main thread? What am I doing wrong? Is this because I inject healthListener to ClusterHealthReporterThread?
Update
I played around a bit and looks like it has something to do with the way i initiate spark listener.
If I add the spark listener like this
val sparkContext = SparkContext.getOrCreate(sparkConf)
sparkContext.addSparkListener(healthListener)
Parent thread will show appRunning as 'false' always but shows executor count correctly. Child thread (health reporter) will also show proper executor counts but appRunning was always reporting 'false' like that of the main thread.
Then I stumbled across this Why is SparkListenerApplicationStart never fired? and tried setting listener at the spark config level,
.set("spark.extraListeners", "HealthListener class path")
If I do this, main thread will report 'true' for appRunning and will report correct executor counts but child thread will always report 'false' and '0' value for executors.
I can't immediately see what's wrong here, you might have found an interesting edge case.
I think #m4gic's comment might be correct, that the logging library is perhaps caching that interpolated string? It looks like you are using https://github.com/lightbend/scala-logging which claims that this interpolation "has no effect on behavior", so maybe not. Please could you follow his suggestion to retry without using that feature and report back?
A second possibility: I wonder if there is only one ClusterHealthListener in the system? Perhaps the autowiring is causing a second instance to be created? Can you log the object ids of the ClusterHealthListener reference in both locations and verify that they are the same object?
If neither of those suggestions fix this, are you able to post a working example that I can play with?

Why locally created struct can't be send to another thread?

why in D I can't send to another thread through Tid.send local instances of structs?
I would like to make simple handling of thread communication like this:
void main()
{
...
tid.send(Command.LOGIN, [ Variant("user"), Variant("hello1234") ] );
...
}
void thread()
{
...
receive(
(Command cmd, Variant[] args) { ... })
)
...
}
If I understand it correctly, D should create the array of Variants in the stack and
then copy content of the array the send function right? So there should not be any
issues about synchronization and concurrency. I'm quiet confused, this concurrency is
wierd, I'm used to code with threads in C# and C.
Also I'm confused about the shared keyword, and creating shared classes.
Usualy when I try to call method of shared class instance from non-shared object, the
compiler throws an error. Why?
you should idup the array and it will be able to go through, normal arrays are default sharable (as they have a shared mutable indirection)
as is the compiler can rewrite the sending as
Variant[] variants = [ Variant("user"), Variant("hello1234") ] ;
tid.send(Command.LOGIN, variants);
and Variant[] fails the hasUnsharedAlias test
you can fix this by making the array shared or immutable (and receiving the appropriate one on the other side)

Actor's value sometimes returns null

I have an Actor and some other object:
object Config {
val readValueFromConfig() = { //....}
}
class MyActor extends Actor {
val confValue = Config.readValueFromConfig()
val initValue = Future {
val a = confValue // sometimes it's null
val a = Config.readValueFromConfig() //always works well
}
//..........
}
The code above is a very simplified version of what I actually have. The odd thing is that sometimes val a = confValue returns null, whereas if I replace it with val a = Config.readValueFromConfig() then it always works well.
I wonder, is this due to the fact that the only way to interact with an actor is sending it a message? Therefore, since val confValue is not a local variable, I must either use val a = Config.readValueFromConfig() (a different object, not an actor) or val a = self ! GetConfigValue and read the result afterwards?
val readValueFromConfig() = { //....}
This gives me a compile error. I assume you mean without parentheses?
val readValueFromConfig = { //....}
Same logic with different timing gives different result = a race condition.
val confValue = Config.readValueFromConfig() is always executed during construction of MyActor objects (because it's a field of MyActor). Sometimes this is returning null.
val a = Config.readValueFromConfig() //always works well is always executed later - after MyActor is constructed, when the Future initValue is executed by it's Executor. It seems this never returns null.
Possible causes:
Could be explained away if the body of readValueFromConfig was dependent upon another
parallel/async operation having completed. Any chance you're reading the config asynchronously? Given the name of this method, it probably just reads synchronously from a file - meaning this is not the cause.
Singleton objects are not threadsafe?? I compiled your code. Here's the decompilation of your singleton object java class:
public final class Config
{
public static String readValueFromConfig()
{
return Config..MODULE$.readValueFromConfig();
}
}
public final class Config$
{
public static final MODULE$;
private final String readValueFromConfig;
static
{
new ();
}
public String readValueFromConfig()
{
return this.readValueFromConfig;
}
private Config$()
{
MODULE$ = this;
this.readValueFromConfig = // ... your logic here;
}
}
Mmmkay... Unless I'm mistaken, that ain't thread-safe.
IF two threads are accessing readValueFromConfig (say Thread1 accesses it first), then inside method private Config$(), MODULE$ is unsafely published before this.readValueFromConfig is set (reference to this prematurely escapes the constructor). Thread2 which is right behind can read MODULE$.readValueFromConfig before it is set. Highly likely to be a problem if '... your logic here' is slow and blocks the thread - which is precisely what synchronous I/O does.
Moral of story: avoid stateful singleton objects from Actors (or any Threads at all, including Executors) OR make them become thread-safe through very careful coding style. Work-Around: change to a def, which internally caches the value in a private val.
I wonder, is this due to the fact that the only way to interact with an actor is sending it a message? Therefore, since val confValue is not a local variable, I must either use val a = Config.readValueFromConfig() (a different object, not an actor)
Just because it's not an actor, doesn't mean it's necessarily safe. It probably isn't.
or val a = self ! GetConfigValue and read the result afterwards?
That's almost right. You mean self ? GetConfigValue, I think - that will return a Future, which you can then map over. ! doesn't return anything.
You cannot read from an actor's variables directly inside a Future because (in general) that Future could be running on any thread, on any processor core, and you don't have any memory barrier there to force the CPU caches to reload the value from main memory.

Why missingMethod is not working for Closure?

UPDATE
I have to apologize for confusing the readers. After I got totally lost in the code, I reverted all my changes from Mercurial repo, carefully applied the same logic as before -- and it worked. The answers below helped me understand the (new to me) concept better, and for that I gave them upvotes.
Bottom line: if a call to a missing method happens within a closure, and resolution set to DELEGATE_FIRST, methodMissing() will be called on the delegate. If it doesn't -- check you own code, there is a typo somewhere.
Thanks a lot!
Edit:
OK, now that you've clarified what your are doing (somewhat ;--))
Another approach (one that I use for DSLs) is to parse your closure group to map via a ClosureToMap utility like this:
// converts given closure to map method => value pairs (1-d, if you need nested, ask)
class ClosureToMap {
Map map = [:]
ClosureToMap(Closure c) {
c.delegate = this
c.resolveStrategy = Closure.DELEGATE_FIRST
c.each{"$it"()}
}
def methodMissing(String name, args) {
if(!args.size()) return
map[name] = args[0]
}
def propertyMissing(String name) { name }
}
// Pass your closure to the utility and access the generated map
Map map = new ClosureToMap(your-closure-here)?.map
Now you can iterate through the map, perhaps adding methods to applicable MCL instance. For example, some of my domains have dynamic finders like:
def finders = {
userStatusPaid = { Boolean active = true->
eq {
active "$active"
paid true
}
}
}
I create a map using the ClosureToMap utility, and then iterate through, adding map keys (methods, like "userStatus") and values (in this case, closure "eq") to domain instance MCL, delegating the closure to our ORM, like so:
def injectFinders(Object instance) {
if(instance.hasProperty('finders')) {
Map m = ClosureToMap.new(instance.finders).map
m?.each{ String method, Closure cl->
cl.delegate = instance.orm
cl.resolveStrategy = Closure.DELEGATE_FIRST
instance.orm.metaClass."$method" = cl
}
}
}
In this way in controller scope I can do, say:
def actives = Orders.userStatusPaid()
and "eq" closure will delegate to the ORM and not domain Orders where an MME would occur.
Play around with it, hopefully I've given you some ideas for how to solve the problem. In Groovy, if you can't do it one way, try another ;--)
Good luck!
Original:
Your missingMethod is defined on string metaclass; in order for it to be invoked, you need "someString".foo()
If you simply call foo() by itself within your closure it will fail, regardless of delegation strategy used; i.e. if you don't use the (String) delegate, good luck. Case in point, do "".foo() and it works.
I don't fully understand the issue either, why will you not have access to the closure's delegate? You are setting the closure's delegate and will invoke the closure, which means you will have access to the delegate within the closure itself (and can just delegate.foo())
nope, you will not catch a missing method and redirect it to the delegate with metaclass magic.
the closure delegate is the chance to capture those calls and adapt them to the backing domain.
that means...
you should create your own delegate with the methods required by the dsl.
do not try to force a class to do delegate work if it's not designed for the task, or the code will get really messy in not time.
keep everything dsl related in a set of specially designed delegate classes and everything will suddenly become ridiculously simple and clear.

Resources