SyncVar transfer producer/consumer threads in scala

SyncVar transfer producer/consumer threads in scala - multithreading

Note: The problem that I solve has only educational purpose, I know that abstraction that I want to create is error prone and so on... I don't need fast solution, I need explanation.
In the book I am reading there is exercise that says that I need to implement SyncVar which has the following interface:
class SyncVar[T] {
def get(): T = ???
def put(x: T): Unit = ???
}
My comment: Alright seems understandable, need some sync variable that I can put or get.
A SyncVar object is used to exchange values between two or more threads.
When created, the SyncVar object is empty:
° Calling get throws an exception
° Calling put adds a value to the SyncVar object
After a value is added to a SyncVar object, we can say that it is non-empty:
° Calling get returns the current value, and changes the state to empty
° Calling put throws an exception
My thoughts: This is variable that throws exception on empty value when calling get, or put when we have a value, when we call get it clears previous value. Seems like I need to use Option.
So I provide the following implementation:
class SyncVar[T] {
var value: Option[T] = None
def get(): T = value match {
case Some(t) => this.synchronized {
value = None
t
}
case None => throw new IllegalArgumentException("error get")
}
def put(x: T): Unit = this.synchronized{
value match {
case Some(t) => throw new IllegalArgumentException("error put")
case None => value = Some(x)
}
}
def isEmpty = value.isEmpty
def nonEmpty = value.nonEmpty
}
My comment:
Synchronously invoking put and get, also have isEmpty and nonEmpty
The next task makes me confused:
The SyncVar object from the previous exercise can be cumbersome to use,
due to exceptions when the SyncVar object is in an invalid state. Implement
a pair of methods isEmpty and nonEmpty on the SyncVar object. Then,
implement a producer thread that transfers a range of numbers 0 until 15
to the consumer thread that prints them.
As I understand I need two threads:
//producer thread that produces numbers from 1 to 15
val producerThread = thread{
for (i <- 0 until 15){
println(s"$i")
if (syncVar.isEmpty) {
println(s"put $i")
syncVar.put(i)
}
}
}
//consumer that prints value from 0 to 15
val consumerThread = thread{
while (true) {
if (syncVar.nonEmpty) println(s"get ${syncVar.get()}")
}
}
Question:
But this code caused by nondeterminism, so it has different result each time, while I need to print numbers from 1 to 15 (in right order). Could you explain me what is wrong with my solution?

First, your synchronized in get is too narrow. It should surround the entire method, like in put (can you think why?).
After fixing, consider this scenario:
producerThread puts 0 into syncVar.
producerThread continues to run and tries to put 1. syncVar.isEmpty returns false so it doesn't put 1. It continues to loop with next i instead.
consumerThread gets 0.
producerThread puts 2.
Etc. So consumerThread can never get and print 1, because producerThread never puts it there.
Think what producerThread should do if syncVar is not empty and what consumerThread should do if it is.

Thanks to #Alexey Romanov, finally I implement transfer method:
Explanation:
The idea is, that producer thread checks is syncVar is empty, if it is it puts it, otherwise it waits with while(syncVar.nonEmpty){} (using busy waiting, which is bad practice, but it is important to know about it in educational purpose) and when we leaving the loop(stop busy waiting) we putting variable and leaving for loop for i == 0. Meanwhile consumer thread busy waiting forever, and reads variable when it is nonEmpty.
Solution:
def transfer() = {
val syncVar = new SyncVar[Int]
val producerThread = thread{
log("producer thread started")
for (i <- 0 until 15){
if (syncVar.isEmpty) {
syncVar.put(i)
} else {
while (syncVar.nonEmpty) {
log("busy wating")
}
if (syncVar.isEmpty) {
syncVar.put(i)
}
}
}
}
val consumerThread = thread{
log("consumer thread started")
while (true) {
if (syncVar.nonEmpty) {
syncVar.get()
}
}
}
}

Related

What's wrong with this groovy for-loop of Closures? [duplicate]

In the context of Jenkins pipelines, I have some Groovy code that's enumerating a list, creating closures, and then using that value in the closure as a key to lookup another value in a map. This appears to be rife with some sort of anomaly or race condition almost every time.
This is a simplification of the code:
def tasks = [:]
for (platformName in platforms) {
// ...
tasks[platformName] = {
def componentUploadPath = componentUploadPaths[platformName]
echo "Uploading for platform [${platformName}] to [${componentUploadPath}]."
// ...
}
tasks.failFast = true
parallel(tasks)
platforms has two values. I will usually see two iterations and two tasks registered and the keys in tasks will be correct, but the echo statement inside the closure indicates that we're just running one of the platforms twice:
14:20:02 [platform2] Uploading for platform [platform1] to [some_path/platform1].
14:20:02 [platform1] Uploading for platform [platform1] to [some_path/platform1].
It's ridiculous.
What do I need to add or do differently?

It's the same issue as you'd see in Javascript.
When you generate the closures in a for loop, they are bound to a variable, not the value of the variable.
When the loop exits, and the closures are run, they will all be using the same value...that is -- the last value in the for loop before it exited
For example, you'd expect the following to print 1 2 3 4, but it doesn't
def closures = []
for (i in 1..4) {
closures << { -> println i }
}
closures.each { it() }
It prints 4 4 4 4
To fix this, you need to do one of two things... First, you could capture the value in a locally scoped variable, then close over this variable:
for (i in 1..4) {
def n = i
closures << { -> println n }
}
The second thing you could do is use groovy's each or collect as each time they are called, the variable is a different instance, so it works again:
(1..4).each { i ->
closures << { -> println i }
}
For your case, you can loop over platforms and collect into a map at the same time by using collectEntries:
def tasks = platforms.collectEntries { platformName ->
[
platformName,
{ ->
def componentUploadPath = componentUploadPaths[platformName]
echo "Uploading for platform [${platformName}] to [${componentUploadPath}]."
}
]
}
Hope this helps!

NPE when retrieving from HashMap even though containsKey() returns true in multithreaded environment

We are trying to store unique object for a particular key. When getMyObject is called in multithreaded environment we are getting null ptr exception at the time of return statement
object SampleClass
{
fun getMyObject(Id : String) : MyObject
{
if(!myMap.containsKey(Id))
{
synchronized(SampleClass)
{
if(!myMap.containsKey(Id))
{
myMap[Id] = MyObject()
}
}
}
return myMap[Id]!!
}
private val myMap = HashMap<String,MyObject>()
}
It seems even though contains method returns true when we try to get the value the value is returned null.
I am not sure what is the reason behind it.

This is the kind of trouble you get into if you try to outsmart the memory model. If you look at HashMap's source, you'll find containsKey implemented as:
public boolean containsKey(Object key) {
return getNode(key) != null;
}
Note that it returns true only if there's a HashMap.Node object corresponding to the given key. Now, this is how get is implemented:
public V get(Object key) {
Node<K,V> e;
return (e = getNode(key)) == null ? null : e.value;
}
What you're seeing is an instance of the unsafe publication problem. Let's say 2 threads (A & B) call getMyObject for a non-existent key. A is slightly ahead of B, so it gets into the synchronized block before B calls containsKey. Particularly, A calls put before B calls containsKey. The call to put creates a new Node object and puts it into the hash map's internal data structures.
Now, consider the case when B calls containsKey before A exists the synchronized block. B might see the Node object put by A, in such case containsKey returns true. At this point, however, the node is unsafely published, because it is accessed by B concurrently in a non-synchronized manner. There's no guarantee its constructor (the one setting its value field) has been called. Even if it was called, there's no guarantee the value reference (or any references set by the constructor) is published along with the node reference. This means B can see an incomplete node: the node reference but not its value or any of its fields. When B proceeds to get, it reads null as the value of the unsafely published node. Hence the NullPointerException.
Here's an ad-hoc diagram for visualizing this:
Thread A Thread B
- Enter the synchronized block
- Call hashMap.put(...)
- Insert a new Node
- See the newly inserted (but not yet
initialized from the perspective of B)
Node in HashMap.containsKey
- Return node.value (still null)
from HashMap.get
- !! throws a `NullPointerException`
...
- Exit the synchronized block
(now the node is safely published)
The above is just one scenario where things can go wrong (see comments). To avoid such hazards, either use a ConcurrentHashMap (e.g. map.computeIfAbsent(key, key -> new MyObject())) or never access your HashMap concurrently outside of a synchronized block.

Implementing pipes without using threads

I am working on a small language for fun and to try out some ideas. One of the ideas I am trying to implement is piping like in the shell but with arbitrary objects. An example might make this clearer.
The functions range and show_pipe can be defined like this:
range(n) => {
x := 0;
while x < n do {
push x;
x := x + 1;
}
}
show_pipe() => {
while true do {
x := pull;
if x = FinishedPipe then {
return 0
} else {
print(x)
};
}
}
push pushes a value into the next part of the pipeline and suspends the function until another value is needed and pull pulls a value from the pipe and returns it or FinishedPipe if the previous part of the pipeline has finished executing.
You can then pipe these two function together with range(10) | show_pipe() which will show the numbers 0 through 9 on console.
I'm implementing this by using a thread for each part of the pipeline and using thread safe queues for passing values from one part of the pipe the other. I would really like to find a way to implement pipes without using threads. I am using Rust so I can't use coroutines.

Fork/Join example with GPars

I found an example for fork/join in GPars here: Fork/Join
import static groovyx.gpars.GParsPool.runForkJoin
import static groovyx.gpars.GParsPool.withPool
withPool() {
println """Number of files: ${
runForkJoin(new File("./src")) {file ->
long count = 0
file.eachFile {
if (it.isDirectory()) {
println "Forking a child task for $it"
forkOffChild(it) //fork a child task
} else {
count++
}
}
return count + (childrenResults.sum(0))
//use results of children tasks to calculate and store own result
}
}"""
}
It works and returns the correct number of files, but unfortunately I don't understand this line:
return count + (childrenResults.sum(0))
How exactly work count and childrenResult?
Why is a 0 passed as a parameter to sum()?

I'm not much familiar with GPars, but the link you provided says it is a Divide-and-Conquer algorithm and clarifies a bit more what's implicit later on, explaining that forkOffChild() does not wait -- instead getChildrenResults() does.
You may find easier to understand the provided alternative approach in the same page, that uses a more Java-ish style, if you're more familiar to that.
childrenResults results in calling the method getChildrenResults(), this is the "join" in "Fork/Join", it waits for all children to finish and then returns a list with the results of them (or re-throws any exception a children may have thrown).
0 is just the initial value for the sum. If childrenResult is empty, that's what gets summed to count:
groovy:000> [].sum(1)
===> 1
groovy:000> [1].sum(1)
===> 2

Access to modified closure - ref int

int count = itemsToValidate.Count;
foreach(var item in itemsToValidate)
{
item.ValidateAsync += (x, y) => this.HandleValidate(ref count);
}
private void HandleValidate(ref int x)
{
--x;
if (x == 0)
{
// All items are validated.
}
}
For the above code resharper complained "Access to Modified Closure". Doesn't do that if I change that to type of object. Why is this a closure, even though I am passing by ref ?

This happens all the time
ReSharper is warning you that count is implicitly captured by the lambdas that you are assigning as "validation complete" event handlers, and that its value may well change between the time the lambda is created (i.e. when you assign the event handler) and the time when it is invoked. If this happens, the lambda will not see the value one would intuitively expect.
An example:
int count = itemsToValidate.Count;
foreach(var item in itemsToValidate)
{
item.ValidateAsync += (x, y) => this.HandleValidate(ref count);
}
// afterwards, at some point before the handlers get invoked:
count = 0;
In this instance the handlers will read the value of count as 0 instead of itemsToValidate.Count -- which might be called "obvious", but is surprising and counter-intuitive to many developers not familiar with the mechanics of lambdas.
And we usually solve it like this
The usual solution to "shut R# up" is to move the captured variable in an inner scope, where it is much less accessible and R# can be prove that it cannot be modified until the lambda is evaluated:
int count = itemsToValidate.Count;
foreach(var item in itemsToValidate)
{
int inner = count; // this makes inner impossible to modify
item.ValidateAsync += (x, y) => this.HandleValidate(ref inner);
}
// now this will of course not affect what the lambdas do
count = 0;
But your case is special
Your particular case is a comparatively rare one where you specifically want this behavior, and using the above trick would actually make the program behave incorrectly (you need the captured references to point to the same count).
The correct solution: disable this warning using the special line comments that R# recognizes.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

SyncVar transfer producer/consumer threads in scala - multithreading

Related

What's wrong with this groovy for-loop of Closures? [duplicate]

NPE when retrieving from HashMap even though containsKey() returns true in multithreaded environment

Implementing pipes without using threads

Fork/Join example with GPars

Access to modified closure - ref int

Categories

Resources