How can I get a safe reference to an existing object in Nim?

I wrote a Nim procedure taking two objects as var parameters. Each is an object with an int field level. Before doing the real work of the procedure, I want to order the parameters by which has the larger level, so I do this:
proc subroutine(param1: var MyObject, param2: var MyObject) =
  var large: ptr MyObject = param1.addr
  var small: ptr MyObject = param2.addr
  # If in wrong order by level, swap large and small
  if large.level < small.level:
    large = param2.addr
    small = param1.addr
  # The rest of the proc references only the variables large and small, e.g.,
  large.level += 1
  small.level += 2
This seems to work for my application, but I notice that in the Nim documentation, the ptr type is called "unsafe", and it is suggested to be used only for low-level operations. There's a "safe" reference type ref, and it is suggested to use ref unless you really want to do manual memory management.
I don't want to do manual memory management, and I want the Nim garbage collector to handle the memory for these parameters for me, but I don't see a way to get safe ref's to the two parameters.
I really want to be able to write the algorithm (which is significantly more complex than the simple code I showed) in terms of the variables large and small, rather than param1 and param2. Otherwise, if I can only reference parameters param1 and param2, without making these aliases for them, I'd have to copy and paste the same algorithm twice to separately handle the cases param1.level < param2.level and param1.level >= param2.level.
Is there a more idiomatic Nim way to do something like this, without using a ptr type?

There is no way to turn an object into a safe ref (or vice versa) except by performing a copy of the object.
Normal variables are stored on the stack and thus will be destroyed when their enclosing scope exits, but a reference can be stored in a global variable and accessed later. To make that safe, the compiler/language would need some way of extracting the variable from the stack so that it is still valid after its scope exits, or it would have to perform a copy behind your back, or some other magical thing.
That's also the reason why taking the address of a variable is unsafe: you can't guarantee its lifetime, because you could store that address somewhere else and try to use it later. However, in terms of memory guarantees, those variables are kept alive at least for the duration of your proc call, so it should be safe to use those address aliases within the proc without worrying.
That said, you could rewrite your code to use an intermediate proxy proc that performs the check and thus passes the correct variables in each slot. The intermediate proc guarantees that one of the variables will always be large, and you can avoid using unsafe references:
type
  MyObject = object
    level: int

proc subroutine_internal(large: var MyObject, small: var MyObject) =
  assert large.level >= small.level
  large.level += 1
  small.level += 2

proc subroutine(param1: var MyObject, param2: var MyObject) =
  if param1.level < param2.level:
    subroutine_internal(param2, param1)
  else:
    subroutine_internal(param1, param2)

proc main() =
  var
    a = MyObject(level: 3)
    b = MyObject(level: 40)
  subroutine(a, b)
  echo a, b

main()

type
  MyObject = object
    level: int

proc subroutine(param1, param2: var MyObject) =
  if param1.level > param2.level:
    swap(param1, param2)
  echo param1.level
  echo param2.level

var
  p1 = MyObject(level: 7)
  p2 = MyObject(level: 3)

subroutine(p1, p2)
p2.level = 13
subroutine(p1, p2)
Nim has the special swap() proc for that case. Var parameters are passed internally as pointers, so no copy is involved in parameter passing, and swap performs the exchange as efficiently as possible.
Of course, there can be cases where using ref objects instead of value objects has advantages. Nim's references are managed pointers. You can find more information in the tutorials; mine is located at http://ssalewski.de/nimprogramming.html#_value_objects_and_references.

Related

Pass by Reference in Haskell?

Coming from a C# background, I would say that the ref keyword is very useful in certain situations where changes to a method parameter should directly influence the passed value for value types, or for setting a parameter to null.
Also, the out keyword can come in handy when returning a multitude of various logically unconnected values.
My question is: is it possible to pass a parameter to a function by reference in Haskell? If not, what is the direct alternative (if any)?
There is no difference between "pass-by-value" and "pass-by-reference" in languages like Haskell and ML, because it's not possible to assign to a variable in these languages. It's not possible for "changes to a method parameter" to influence any passed variable in the first place.
It depends on context. Without any context, no, you can't (at least not in the way you mean). With context, you may very well be able to do this if you want. In particular, if you're working in IO or ST, you can use IORef or STRef respectively, as well as mutable arrays, vectors, hash tables, weak hash tables (IO only, I believe), etc. A function can take one or more of these and produce an action that (when executed) will modify the contents of those references.
Another sort of context, StateT, gives the illusion of a mutable "state" value implemented purely. You can use a compound state and pass around lenses into it, simulating references for certain purposes.
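As a rough sketch of the IORef approach mentioned above (addTen and counter are illustrative names, not from the question; this is just one way to do it):
import Data.IORef

-- A function that takes a reference and produces an IO action which,
-- when executed, mutates the referenced value in place.
addTen :: IORef Int -> IO ()
addTen ref = modifyIORef' ref (+ 10)

main :: IO ()
main = do
  counter <- newIORef 0        -- allocate a mutable cell holding 0
  addTen counter               -- "pass by reference": the callee updates it
  readIORef counter >>= print  -- prints 10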
My question is: is it possible to pass a parameter to a function by reference in Haskell? If not, what is the direct alternative (if any)?
No, values in Haskell are immutable (well, the do notation can create some illusion of mutability, but it all happens inside a function and is an entirely different topic). If you want to change the value, you will have to return the changed value and let the caller deal with it. For instance, see the random number generating function next that returns the value and the updated RNG.
Also, the out keyword can come in handy when returning a multitude of various logically unconnected values.
Consequently, you can't have out either. If you want to return several entirely disconnected values (at which point you should probably ask yourself why disconnected values are being returned from a single function), return a tuple.
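For example, a minimal sketch of the tuple-returning style (quotientAndRemainder is a made-up name standing in for whatever your function computes):
quotientAndRemainder :: Int -> Int -> (Int, Int)
quotientAndRemainder a b = (a `div` b, a `mod` b)

main :: IO ()
main = do
  let (q, r) = quotientAndRemainder 17 5  -- destructure both "out" values
  print q  -- 3
  print r  -- 2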
No, it's not possible: Haskell variables are immutable, so the creators of Haskell presumably reasoned that there's no point in passing a reference that cannot be changed.
Consider a Haskell variable:
let x = 37
In order to change this, we need to make a temporary variable, and then set the first variable to the temporary variable (with modifications).
let tripleX = x * 3
let x = tripleX
If Haskell had pass by reference, could we do this?
The answer is no.
Suppose we tried:
tripleVar :: Int -> IO ()
tripleVar var = do
  let times_3 = var * 3
  let var = times_3
The problem with this code is the last line: even though we can imagine the variable being passed by reference, it is not updated. In other words, we're introducing a new local variable with the same name.
Take a look again at the last line:
let var = times_3
Haskell doesn't know that we want to "change" the caller's variable; since we can't reassign it, we are creating a new variable with the same name in the local scope, thus not changing anything through the "reference". :-(
tripleVar :: Int -> IO ()
tripleVar var = do
  let tripleVar = var
  let var = tripleVar * 3
  return ()

main = do
  let x = 4
  tripleVar x
  print x -- 4 :(
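For contrast, a minimal sketch of the usual pure approach - return the new value and bind it to a fresh name at the call site (x' is just an illustrative name):
tripleVar :: Int -> Int
tripleVar var = var * 3

main :: IO ()
main = do
  let x  = 4
      x' = tripleVar x  -- the "changed" value is a new binding
  print x   -- 4, the original binding is untouched
  print x'  -- 12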

Must access to scala.collection.immutable.List and Vector be synchronized?

I'm going through Learning Concurrent Programming in Scala, and encountered the following:
In current versions of Scala, however, certain collections that are
deemed immutable, such as List and Vector, cannot be shared without
synchronization. Although their external API does not allow you to
modify them, they contain non-final fields.
Tip: Even if an object
seems immutable, always use proper synchronization to share any object
between the threads.
From Learning Concurrent Programming in Scala by Aleksandar Prokopec, end of Chapter 2 (p.58), Packt Publishing, Nov 2014.
Can that be right?
My working assumption has always been that any internal mutability (to implement laziness, caching, whatever) in Scala library data structures described as immutable would be idempotent, such that the worst that might happen in a bad race is that work would be unnecessarily duplicated. This author seems to suggest correctness may be imperiled by concurrent access to immutable structures. Is that true? Do we really need to synchronize access to Lists?
Much of my transition to an immutable-heavy style has been motivated by a desire to avoid synchronization and the potential contention overhead it entails. It would be an unhappy big deal to learn that synchronization cannot be eschewed for Scala's core "immutable" data structures. Is this author simply overconservative?
Scala's documentation of collections includes the following:
A collection in package scala.collection.immutable is guaranteed to be immutable for everyone. Such a collection will never change after it is created. Therefore, you can rely on the fact that accessing the same collection value repeatedly at different points in time will always yield a collection with the same elements.
That doesn't quite say that they are safe for concurrent access by multiple threads. Does anyone know of an authoritative statement that they are (or aren't)?
It depends on where you share them:
it's not safe to share them inside the Scala library itself
it's not safe to share them with Java code or via reflection
Simply put, these collections are less protected than objects with only final fields. Even though they're the same at the JVM level (absent optimizations like ldc) - both may be fields at some mutable address, so you can change them with the putfield bytecode instruction - a var is still less protected by the compiler than Java's final, Scala's final val, and val.
However, it's still fine to use them in most cases, as their behaviour is logically immutable - all mutable operations are encapsulated (for Scala code). Let's look at Vector. It requires mutable fields to implement its appending algorithm:
private var dirty = false
//from VectorPointer
private[immutable] var depth: Int = _
private[immutable] var display0: Array[AnyRef] = _
private[immutable] var display1: Array[AnyRef] = _
private[immutable] var display2: Array[AnyRef] = _
private[immutable] var display3: Array[AnyRef] = _
private[immutable] var display4: Array[AnyRef] = _
private[immutable] var display5: Array[AnyRef] = _
which is implemented like:
val s = new Vector(startIndex, endIndex + 1, blockIndex)
s.initFrom(this) //uses displayN and depth
s.gotoPos(startIndex, startIndex ^ focus) //uses displayN
s.gotoPosWritable //uses dirty
...
s.dirty = dirty
And s reaches the user only after the method has returned it, so happens-before guarantees are not even a concern - all mutable operations are performed in the same thread (the thread where you call :+, +: or updated); it's just a kind of initialization. The only problem here is that private[somePackage] members are accessible directly from Java code and from the Scala library itself, so if you pass the vector to some Java method, that method could modify them.
I don't think you should worry about the thread-safety of, say, the cons operator. It also has a mutable field:
final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
  override def tail: List[B] = tl
  override def isEmpty: Boolean = false
}
But it is used only inside library methods (within a single thread), without any explicit sharing or thread creation, and those methods always return a new collection. Let's consider take as an example:
override def take(n: Int): List[A] = if (isEmpty || n <= 0) Nil else {
  val h = new ::(head, Nil)
  var t = h
  var rest = tail
  var i = 1
  while ({if (rest.isEmpty) return this; i < n}) {
    i += 1
    val nx = new ::(rest.head, Nil)
    t.tl = nx // here is the mutation of t's field
    t = nx
    rest = rest.tail
  }
  h
}
So here t.tl = nx is not much different from t = nx as far as thread-safety is concerned: both are referenced only from a single stack (take's stack). However, if I were to add, say, someActor ! t (or any other async operation), someField = t, or someFunctionWithExternalSideEffect(t) right inside the while loop - I could break this contract.
A little addition here about the relation to JSR-133:
1) new ::(head, Nil) creates a new object on the heap and puts its address (let's say 0x100500) on the stack (val h =)
2) as long as this address lives only on the stack, it's known only to the current thread
3) other threads can get involved only after this address is shared by putting it into some field; in the case of take, the JVM has to flush any caches (to restore the stack and registers) before executing areturn (return h), so the returned object will be consistent.
So all operations on the object at 0x100500 are outside the scope of JSR-133 as long as 0x100500 is part of the stack only (not the heap, not other threads' stacks). However, some fields of that object may point to shared objects (which might be within the scope of JSR-133), but that's not the case here (as these objects are immutable from the outside).
I think (hope) the author meant logical synchronization guarantees for library developers - you still need to be careful with these things if you're developing the Scala library itself, as these vars are private[scala] or private[immutable], so it's possible to write code that mutates them from different threads. From a library developer's perspective, it usually means that all mutations on a single instance should be applied in a single thread, and only on a collection that is not yet visible to users. Or, simply put - don't expose mutable fields to outside users in any way.
P.S. Scala has had several unexpected issues with synchronization, which caused some parts of the library to be surprisingly not thread-safe, so I wouldn't be surprised if something turns out to be wrong (and then it's a bug), but in, say, 99% of cases and for 99% of methods, immutable collections are thread-safe. In the worst case you might be pushed away from using some broken method, or just (it might not be so "just" in some cases) need to clone the collection for every thread.
Anyway, immutability is still a good route to thread-safety.
P.S.2 An exotic case which might break immutable collections' thread-safety is using reflection to access their non-final fields.
A little addition about another exotic but really terrifying way, as pointed out in the comments by @Steve Waldman and @axel22 (the author). If you share an immutable collection as a member of some object shared between threads, && if the collection's constructor becomes physically (by the JIT) inlined (it's not logically inlined by default), && if your JIT implementation allows reordering inlined code with the surrounding code - then you have to synchronize it (usually having @volatile is enough). However, IMHO, I don't believe that last condition is correct behaviour - but for now, I can neither prove nor disprove it.
In your question you ask for an authoritative statement. I found the following in "Programming in Scala" by Martin Odersky et al.:
"Third, there is no way for two threads concurrently accessing an immutable to corrupt its state once it has been properbly constructed, because no thread can change the state of an immutable"
If you look at the implementation of Vector, for example, you can see that this principle is followed; see below.
There are some fields inside Vector which are not final and could lead to data races. But since they are only changed inside a method that creates a new instance, and since you need a synchronization action to access the newly created instance from a different thread anyway, everything is fine.
The pattern used here is to create and modify an object, then make it visible to other threads (for example by assigning the instance to a volatile static or static final), and after that make sure it is not changed anymore.
As an example, the creation of two vectors:
val vector = Vector(4,5,5)
val vector2 = vector.updated(1, 2);
The method updated uses the var field dirty inside:
private[immutable] def updateAt[B >: A](index: Int, elem: B): Vector[B] = {
  val idx = checkRangeConvert(index)
  val s = new Vector[B](startIndex, endIndex, idx)
  s.initFrom(this)
  s.dirty = dirty
  s.gotoPosWritable(focus, idx, focus ^ idx) // if dirty, commit changes; go to new pos and prepare for writing
  s.display0(idx & 0x1f) = elem.asInstanceOf[AnyRef]
  s
}
but since, after its creation, vector2 is assigned to a final variable:
Bytecode of variable declaration:
private final scala.collection.immutable.Vector vector2;
Bytecode of constructor:
61 invokevirtual scala.collection.immutable.Vector.updated(int, java.lang.Object, scala.collection.generic.CanBuildFrom) : java.lang.Object [52]
64 checkcast scala.collection.immutable.Vector [48]
67 putfield trace.agent.test.scala.TestVector$.vector2 : scala.collection.immutable.Vector [22]
Everything is o.k.

What exactly are strings in Nim?

From what I understand, strings in Nim are basically a mutable sequence of bytes and that they are copied on assignment.
Given that, I assumed that sizeof would tell me (like len) the number of bytes, but instead it always gives 8 on my 64-bit machine, so it seems to be holding a pointer.
Given that, I have the following questions...
What was the motivation behind copy on assignment? Is it because they're mutable?
Is there ever a time when it isn't copied when assigned? (I assume non-var function parameters don't copy. Anything else?)
Are they optimized such that they only actually get copied if/when they're mutated?
Is there any significant difference between a string and a sequence, or can the answers to the above questions be equally applied to all sequences?
Anything else in general worth noting?
Thank you!
The definition of strings is actually in system.nim, just under another name:
type
  TGenericSeq {.compilerproc, pure, inheritable.} = object
    len, reserved: int

  PGenericSeq {.exportc.} = ptr TGenericSeq

  UncheckedCharArray {.unchecked.} = array[0..ArrayDummySize, char]

  # len and space without counting the terminating zero:
  NimStringDesc {.compilerproc, final.} = object of TGenericSeq
    data: UncheckedCharArray

  NimString = ptr NimStringDesc
So a string is a raw pointer to an object with a len, reserved and data field. The procs for strings are defined in sysstr.nim.
The semantics of string assignment have been chosen to be the same as for all value types (not ref or ptr) in Nim by default, so you can assume that assignments create a copy. When a copy is unnecessary, the compiler can leave it out, but I'm not sure how much that happens so far. Passing strings into a proc doesn't copy them. There is no copy-on-write optimization that delays string copies until they are mutated. Sequences behave in the same way.
You can change the default assignment behaviour of strings and seqs by marking them as shallow, then no copy is done on assignment:
var s = "foo"
shallow s

Inconsistencies when using UnsafeMutablePointer with String or Character types

I'm currently trying to implement my own DynamicArray data type in Swift. To do so I'm using pointers a bit. As my root I'm using an UnsafeMutablePointer of a generic type T:
struct DynamicArray<T> {
    private var root: UnsafeMutablePointer<T> = nil
    private var capacity = 0 {
        didSet {
            //...
        }
    }
    //...
    init(capacity: Int) {
        root = UnsafeMutablePointer<T>.alloc(capacity)
        self.capacity = capacity
    }

    init(count: Int, repeatedValue: T) {
        self.init(capacity: count)
        for index in 0..<count {
            (root + index).memory = repeatedValue
        }
        self.count = count
    }
    //...
}
Now as you can see I've also implemented a capacity property which tells me how much memory is currently allocated for root. Accordingly one can create an instance of DynamicArray using the init(capacity:) initializer, which allocates the appropriate amount of memory, and sets the capacity property.
But then I also implemented the init(count:repeatedValue:) initializer, which first allocates the needed memory using init(capacity: count). It then sets each segment in that part of memory to the repeatedValue.
When using the init(count:repeatedValue:) initializer with number types like Int, Double, or Float, it works perfectly fine. When using Character or String though, it crashes. It doesn't crash consistently, but actually works sometimes, as can be seen by compiling a few times.
var a = DynamicArray<Character>(count: 5, repeatedValue: "A")
println(a.description) // prints [A, A, A, A, A]
// crashes most of the time

var b = DynamicArray<Int>(count: 5, repeatedValue: 1)
println(b.description) // prints [1, 1, 1, 1, 1]
// works consistently
Why is this happening? Does it have to do with String and Character holding values of different length?
Update #1:
Now @AirspeedVelocity has addressed the problem with init(count:repeatedValue:). The DynamicArray contains another initializer though, which at first worked in a similar fashion to init(count:repeatedValue:). I changed it to work as @AirspeedVelocity described for init(count:repeatedValue:), though:
init<C: CollectionType where C.Generator.Element == T, C.Index.Distance == Int>(collection: C) {
    let collectionCount = countElements(collection)
    self.init(capacity: collectionCount)
    root.initializeFrom(collection)
    count = collectionCount
}
I'm using the initializeFrom(source:) method as described here. And since collection conforms to CollectionType it should work fine.
I'm now getting this error though:
<stdin>:144:29: error: missing argument for parameter 'count' in call
root.initializeFrom(collection)
^
Is this just a misleading error message again?
Yes, chances are this doesn’t crash with basic inert types like integers but does with strings or arrays because they are more complex and allocate memory for themselves on creation/destruction.
The reason it’s crashing is that UnsafeMutablePointer memory needs to be initialized before it’s used (and similarly, needs to de-inited with destroy before it is deallocated).
So instead of assigning to the memory property, you should use the initialize method:
for index in 0..<count {
    (root + index).initialize(repeatedValue)
}
Since initializing from another collection of values is so common, there’s another version of initialize that takes one. You could use that in conjunction with another helper struct, Repeat, that is a collection of the same value repeated multiple times:
init(count: Int, repeatedValue: T) {
    self.init(capacity: count)
    root.initializeFrom(Repeat(count: count, repeatedValue: repeatedValue))
    self.count = count
}
However, there’s something else you need to be aware of which is that this code is currently inevitably going to leak memory. The reason being, you will need to destroy the contents and dealloc the pointed-to memory at some point before your DynamicArray struct is destroyed, otherwise you’ll leak. Since you can’t have a deinit in a struct, only a class, this won’t be possible to do automatically (this is assuming you aren’t expecting users of your array to do this themselves manually before it goes out of scope).
Additionally, if you want to implement value semantics (as with Array and String) via copy-on-write, you’ll also need a way of detecting if your internal buffer is being referenced multiple times. Take a look at ManagedBufferPointer to see a class that handles this for you.

Programming Language Evaluation Strategies

Could you please explain differences between and definition of call by value, call by reference, call by name and call by need?
Call by value
Call-by-value evaluation is the most common evaluation strategy, used in languages as different as C and Scheme. In call-by-value, the argument expression is evaluated, and the resulting value is bound to the corresponding variable in the function (frequently by copying the value into a new memory region). If the function or procedure is able to assign values to its parameters, only its local copy is assigned — that is, anything passed into a function call is unchanged in the caller's scope when the function returns.
Call by reference
In call-by-reference evaluation (also referred to as pass-by-reference), a function receives an implicit reference to a variable used as argument, rather than a copy of its value. This typically means that the function can modify (i.e. assign to) the variable used as argument—something that will be seen by its caller. Call-by-reference can therefore be used to provide an additional channel of communication between the called function and the calling function. A call-by-reference language makes it more difficult for a programmer to track the effects of a function call, and may introduce subtle bugs.
differences
call by value example
If data is passed by value, the data is copied from the variable used in, for example, main() to a variable used by the function. So if the data passed (that is stored in the function's variable) is modified inside the function, the value is only changed in the variable used inside the function. Let's take a look at a call by value example:
#include <stdio.h>

void call_by_value(int x) {
    printf("Inside call_by_value x = %d before adding 10.\n", x);
    x += 10;
    printf("Inside call_by_value x = %d after adding 10.\n", x);
}

int main() {
    int a = 10;
    printf("a = %d before function call_by_value.\n", a);
    call_by_value(a);
    printf("a = %d after function call_by_value.\n", a);
    return 0;
}
The output of this call by value code example will look like this:
a = 10 before function call_by_value.
Inside call_by_value x = 10 before adding 10.
Inside call_by_value x = 20 after adding 10.
a = 10 after function call_by_value.
call by reference example
If data is passed by reference, a pointer to the data is copied instead of the actual variable, as is done in a call by value. Because a pointer is copied, if the value at that pointer's address is changed in the function, the value is also changed in main(). Let's take a look at a code example:
#include <stdio.h>

void call_by_reference(int *y) {
    printf("Inside call_by_reference y = %d before adding 10.\n", *y);
    (*y) += 10;
    printf("Inside call_by_reference y = %d after adding 10.\n", *y);
}

int main() {
    int b = 10;
    printf("b = %d before function call_by_reference.\n", b);
    call_by_reference(&b);
    printf("b = %d after function call_by_reference.\n", b);
    return 0;
}
The output of this call by reference source code example will look like this:
b = 10 before function call_by_reference.
Inside call_by_reference y = 10 before adding 10.
Inside call_by_reference y = 20 after adding 10.
b = 20 after function call_by_reference.
when to use which
One advantage of the call by reference method is that it uses pointers, so there is no doubling of the memory used by the variables (as with the copy made by the call by value method). This is of course great; lowering the memory footprint is always a good thing. So why don't we just make all parameters call by reference?
There are two reasons why this is not a good idea, and why you (the programmer) need to choose between call by value and call by reference. The reasons are: side effects and privacy. Unwanted side effects are usually caused by inadvertent changes made to a call by reference parameter. Also, in most cases you want the data to be private, and changeable by the caller only where you intend it. So it is better to use call by value by default and only use call by reference if data changes are expected.
call by name
In call-by-name evaluation, the arguments to a function are not evaluated before the function is called — rather, they are substituted directly into the function body (using capture-avoiding substitution) and then left to be evaluated whenever they appear in the function.
call by need
Lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed (non-strict evaluation) and which also avoids repeated evaluations by sharing the result of the first evaluation.
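To make the last two strategies concrete, here is a small Haskell sketch (Haskell itself is call-by-need; the names are illustrative only). An argument that is never used is never evaluated, and an argument that is used twice is still evaluated only once - that sharing is what distinguishes call-by-need from call-by-name, which would re-evaluate the expression at every use.
import Debug.Trace (trace)

-- Non-strict: the argument is never forced, so the error never fires.
ignoreArg :: a -> Int
ignoreArg _ = 42

-- Sharing: under call-by-need the thunk bound to x is evaluated at most once.
addToItself :: Int -> Int
addToItself x = x + x

main :: IO ()
main = do
  print (ignoreArg (error "never evaluated"))           -- prints 42
  print (addToItself (trace "evaluated once" (2 + 3)))  -- trace fires once, prints 10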
