What exactly is the difference between Box::into_raw() and Box::leak()?

As you know, both Box::into_raw() and Box::leak() consume the Box and give up responsibility for freeing the memory.
The two just seem to have different return types; what other differences are there between them?
And what are the typical application scenarios for each?

into_raw is typically used for FFI to get a pointer that can be sent to the other language, and is usually matched with a later call to from_raw to reclaim ownership and free the memory.
leak is typically used to get a 'static reference to satisfy some API requirement and is usually kept until the program exits.
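For concreteness, here is a minimal sketch of both patterns (the values and the FFI framing are illustrative assumptions, not taken from the answer above):

```rust
fn main() {
    // into_raw: hand the allocation over as a raw pointer (e.g. to pass across
    // an FFI boundary), then later reclaim ownership with from_raw so that the
    // memory is freed again.
    let boxed = Box::new(42u32);
    let raw: *mut u32 = Box::into_raw(boxed);
    // ... the pointer could now be sent to C and handed back later ...
    let reclaimed: Box<u32> = unsafe { Box::from_raw(raw) };
    drop(reclaimed); // the heap allocation is freed here

    // leak: trade the Box for a &'static mut reference that is normally never
    // freed, e.g. to satisfy an API that demands 'static data.
    let leaked: &'static mut u32 = Box::leak(Box::new(7u32));
    *leaked += 1;
    println!("leaked value: {leaked}");
}
```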

Related

How does Rust's memory management differ from compile-time garbage collection?

I have read that Rust's compiler "inserts" memory management code during compile time, and this sounds kind of like "compile-time garbage collection".
What is the difference between these two ideas?
I've seen What does Rust have instead of a garbage collector? but that is about runtime garbage collection, not compile-time.
Compile-time garbage collection is commonly defined as follows:
A complementary form of automatic memory management is compile-time memory management (CTGC), where the decisions for memory management are taken at compile-time instead of at run-time. The compiler determines the life-time of the variables that are created during the execution of the program, and thus also the memory that will be associated with these variables. Whenever the compiler can guarantee that a variable, or more precisely, parts of the memory resources that this variable points to at run-time, will never ever be accessed beyond a certain program instruction, then the compiler can add instructions to deallocate these resources at that particular instruction without compromising the correctness of the resulting code.
(From Compile-Time Garbage Collection for the Declarative Language Mercury by Nancy Mazur)
Rust handles memory by using a concept of ownership and borrow checking. Ownership and move semantics describe which variable owns a value. Borrowing describes which references are allowed to access a value. These two concepts allow the compiler to "drop" the value when it is no longer accessible, causing the program to call the drop method from the Drop trait.
However, the compiler itself doesn't handle dynamically allocated memory at all. It only handles drop checking (figuring out when to call drop) and inserting the .drop() calls. The drop implementation is responsible for determining what happens at this point, whether that is deallocating some dynamic memory (which is what Box's drop does, for example), or doing anything else. The compiler therefore never really enforces garbage collection, and it doesn't enforce deallocating unused memory. So we can't claim that Rust implements compile-time garbage collection, even if what Rust has is very reminiscent of it.
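As a minimal sketch of that point (the Noisy type is made up for illustration): the compiler only decides where the drop calls go; what happens at that point is entirely up to the type's Drop implementation.

```rust
struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        // This could deallocate heap memory (as Box's drop does), close a
        // file, or do nothing special at all.
        println!("dropping {}", self.0);
    }
}

fn main() {
    let _outer = Noisy("outer"); // leading underscore only silences the unused-variable warning
    {
        let _inner = Noisy("inner");
        println!("inner scope ends");
    } // the compiler inserts the drop call for `_inner` here
    println!("main ends");
} // the compiler inserts the drop call for `_outer` here
```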

Why use an AtomicU32 in Rust, given that u32 already implements Sync?

The std::sync::atomic module contains a number of atomic variants of primitive types, with the stated purpose that these types are now thread-safe. However, all the primitives that correspond to the atomic types already implement Send and Sync, and should therefore already be thread-safe. What's the reasoning behind the Atomic types?
Generally, non-atomic integers are safe to share across threads because shared access to them is read-only: if you attempt to modify the value, you implicitly create a new one in most cases because they're Copy. However, it isn't safe to share a mutable reference to a u32 across threads (or to have both mutable and immutable references to the same value), which practically means that you won't be able to modify the variable and have another thread see the results. An atomic type has some additional behavior which makes this safe.
In the more general case, using non-atomic operations doesn't guarantee that a change made in one thread will be visible in another. Many architectures, especially RISC architectures, do not guarantee that behavior without additional instructions.
In addition, compilers often reorder accesses to memory in functions and in some cases, across functions, and an atomic type with an appropriate barrier is required to indicate to the compiler that such behavior is not wanted.
Finally, atomic operations are often required to logically update the contents of a variable. For example, I may want to atomically add 1 to a variable. On a load-store architecture such as ARM, I cannot modify the contents of memory with an add instruction; I can only perform arithmetic on registers. Consequently, an atomic add is multiple instructions, usually consisting of a load-linked, which loads a memory location, the add operation on the register, and then a store-conditional, which stores the value if the memory location has not changed. There's also a loop to retry if it has.
These are the reasons atomic operations are needed and generally useful across languages. So while one can use non-atomic operations on shared data in non-Rust languages, they don't generally produce useful results, and since one typically wants one's code to function correctly, atomic operations are desirable for correctness. Rust's atomic types guarantee this behavior by generating suitable instructions and can therefore be safely shared across threads.
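A small sketch of the usual pattern (the counter and thread counts are made up for illustration): an AtomicU32 can be updated through a shared reference from several threads, which a plain u32 cannot.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

fn main() {
    static COUNTER: AtomicU32 = AtomicU32::new(0);

    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1_000 {
                    // Atomic read-modify-write; on a load-store architecture
                    // this compiles to something like an LL/SC loop.
                    COUNTER.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(COUNTER.load(Ordering::Relaxed), 4_000);
}
```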

Understanding the Send trait

I am trying to wrap my head around the Send + Sync traits. I get the intuition behind Sync - this is the traditional kind of thread safety (like in C++). The object does the necessary locking (interior mutability if needed), so threads can safely access it.
But the Send part is a bit unclear. I understand why things like Rc are not Send - the object could be given to a different thread, but its non-atomic operations make that thread-unsafe.
What is the intuition behind Send? Does it mean the object can be copied/moved into another thread context, and continues to be valid after the copy/move?
Any example scenarios for "Sync but not Send" would really help. Please also point to any Rust libraries for this case (I found several for the opposite, though).
For the "Sync but not Send" case, I found some threads which use structs with pointers to data on the stack/thread-local storage as examples. But these are unsafe anyway (Sync or otherwise).
Sync allows an object to be used by two threads A and B at the same time. This is trivial for non-mutable objects, but mutations need to be synchronized (performed in sequence with the same order being seen by all threads). This is often done using a Mutex or RwLock, which allows one thread to proceed while others must wait. By enforcing a shared order of changes, these types can turn a non-Sync object into a Sync object. Another mechanism for making objects Sync is to use atomic types, which are essentially Sync primitives.
Send allows an object to be used by two threads A and B at different times. Thread A can create and use an object, then send it to thread B, so thread B can use the object while thread A cannot. The Rust ownership model can be used to enforce this non-overlapping use. Hence the ownership model is an important part of Rust's Send thread safety, and may be the reason that Send is less intuitive than Sync when comparing with other languages.
Using the above definitions, it should be apparent why there are few examples of types that are Sync but not Send. If an object can be used safely by two threads at the same time (Sync) then it can be used safely by two threads at different times (Send). Hence, Sync usually implies Send. Any exception probably relates to Send's transfer of ownership between threads, which affects which thread runs the Drop handler and deallocates the value.
Most objects can be used safely by different threads if the uses can be guaranteed to be at different times. Hence, most types are Send.
Rc is an exception. It does not implement Send. Rc allows data to have multiple owners. If one owner in thread A could send the Rc to another thread, giving ownership to thread B, there could be other owners in thread A that can still use the object. Since the reference count is modified non-atomically, the value of the count on the two threads may get out of sync and one thread may drop the pointed-at value while there are owners in the other thread.
Arc is an Rc that uses an atomic type for the reference count. Hence it can be used by multiple threads without the count getting out of sync. If the data that the Arc points to is Sync, the entire object is Sync. If the data is not Sync (e.g. a mutable type), it can be made Sync using a Mutex. Hence the proliferation of Arc<Mutex<T>> types in multithreaded Rust code.
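A minimal sketch of that difference (the string value is an arbitrary example): an Rc cannot cross a thread boundary, while an Arc can, because its reference count is updated atomically.

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // let rc = std::rc::Rc::new(String::from("hello"));
    // thread::spawn(move || println!("{rc}"));
    // ^ error: `Rc<String>` cannot be sent between threads safely

    let arc = Arc::new(String::from("hello"));
    let arc_for_thread = Arc::clone(&arc); // atomic refcount increment
    thread::spawn(move || println!("worker sees {arc_for_thread}"))
        .join()
        .unwrap();
    println!("main still sees {arc}"); // the original owner remains valid
}
```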
Send means that a type is safe to move from one thread to another. If the same type also implements Copy, this also means that it is safe to copy from one thread to another.
Sync means that a type is safe to reference from multiple threads at the same time. Specifically, that &T is Send and can be moved/copied to another thread if T is Sync.
So Send and Sync capture two different aspects of thread safety:
Non-Send types can only ever be owned by a single thread, since they cannot be moved or copied to other threads.
Non-Sync types can only be used by a single thread at any single time, since their references cannot be moved or copied to other threads. They can still be moved between threads if they implement Send.
It rarely makes sense to have Sync without Send, as being able to use a type from different threads would usually mean that moving ownership between threads should also be possible. They are technically different properties, though, so it is conceivable that certain types can be Sync but not Send.
Most types that own data will be Send, as there are few cases where data can't be moved from one thread to another (and not be accessed from the original thread afterwards).
Some common exceptions:
Raw pointers are neither Send nor Sync.
Types that share ownership of data without thread synchronization (for instance Rc).
Types that borrow data that is not Sync.
Types from external libraries or the operating system that are not thread safe.
Overall
Send and Sync exist to help you reason about types when many threads are involved. In a single-threaded world, there would be no need for Send and Sync to exist.
It may also help not to think of Send and Sync as allowing you to do something, or giving you the power to do something. On the contrary, think of !Send and !Sync as ways of forbidding or preventing you from doing multithread-problematic stuff.
As for the definitions of Send and Sync:
If some type X is Send, then if you have an owned X, you can move it into another thread.
This can be problematic if X is somehow related to multiple/shared ownership.
Rc has a problem with this, since having one Rc allows you to create more owned Rc's (by cloning it), but you don't want any of those to pass into other threads. The problem is that many threads could be making more clones of that Rc at the same time, and the owner counter inside of it doesn't work well in that multithreaded situation - even if each thread owned its own Rc, there would really be only one counter, and access to it would not be synchronized.
Arc may work better. At least its owner counter is capable of dealing with the situation mentioned above, so in that regard Arc is OK to allow Send'ing - but only if the inner type is both Send and Sync. For example, an Arc<Rc> is still problematic - remember that Rc forbids Send (!Send) - because multiple threads having their own owned clone of that Arc<Rc> could still trigger the Rc's own multithreading problems; the Arc itself can't protect the threads from doing that. The other requirement for Arc<T> to be Send, namely that T also be Sync, is not that big a deal, because a type that already forbids Send'ing will likely also forbid Sync'ing.
So if some type forbids Send'ing, then it doesn't matter what other types you wrap around it: you won't be able to make it "sendable" into another thread.
If some type X is Sync, then if multiple threads happened to somehow have an &X each, they all can safely use that &X.
This is problematic if &X allows interior mutability, and you'd want to forbid Sync when you want to prevent multiple threads from having an &X.
So if X has a problem with Sending, it will basically also have a problem with Syncing.
It's also problematic for Cell - which doesn't actually forbid Send'ing. Since Cell allows interior mutation through a mere &Cell, and that mutation offers no guarantees in a multithreaded situation, it must forbid Sync'ing - that is, the situation of multiple threads having an &Cell must not be allowed (in general). Regarding Send, an owned Cell can still be moved into another thread, as long as there won't be any &Cells to it anywhere else.
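A small sketch of that behaviour (an assumed example, not from the original answer): an owned Cell can be moved into another thread because Cell is Send, but a &Cell cannot be shared across threads because Cell is !Sync.

```rust
use std::cell::Cell;
use std::thread;

fn main() {
    // Moving the Cell is fine: only the new thread can touch it afterwards.
    let cell = Cell::new(1);
    let value = thread::spawn(move || {
        cell.set(2);
        cell.get()
    })
    .join()
    .unwrap();
    assert_eq!(value, 2);

    // Sharing a reference instead does not compile, even with scoped threads
    // (which remove the lifetime issue), because Cell is !Sync:
    //
    // let cell = Cell::new(1);
    // thread::scope(|s| {
    //     s.spawn(|| cell.set(2));
    //     // ^ error: `Cell<i32>` cannot be shared between threads safely
    // });
}
```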
Mutex may work better. It also allows interior mutation, but it knows how to deal with many threads trying to do that at once. The Mutex only requires that nothing inside of it forbids Send'ing - otherwise, it's the same problem that Arc would have to deal with. All being good, the Mutex is both Send and Sync.
This is not a practical example, but a curious note: if we have a Mutex<Cell> (which is redundant, but oh well), where the Cell itself forbids Sync, the Mutex is able to deal with that problem and still be (or "re-allow") Sync. This is because, once a thread gets access to that Cell, we know it won't have to deal with other threads trying to access other &Cells at the same time, since the Mutex will be locked, preventing this from happening.
Mutating a value across multiple threads
In theory you could share a Mutex between threads!
If you simply move an owned Mutex to another thread, you can get that done, but it's of little use, since you'd want multiple threads to have access to it at the same time.
Since it's Sync, you're allowed to share a &Mutex between threads, and its lock method indeed only requires a &Mutex.
But trying this is problematic. Let's say you're in the main thread; you create a Mutex and then a reference to it, a &Mutex, and then spawn another thread Z into which you try to pass that &Mutex.
The problem is that the Mutex has only one owner, and that owner lives inside the main thread. If for some reason thread Z outlives the main thread, that &Mutex would be dangling. So even though Sync doesn't particularly forbid you from sending/sharing a &Mutex between threads, you'll likely not get it done this way, for lifetime reasons. Arc to the rescue!
Arc gets rid of that lifetime problem: instead of being owned by a particular scope in a particular thread, the value can be multi-owned, by multiple threads.
So using an Arc<Mutex> allows a value to be co-owned and shared, and offers interior mutability across many threads. In sum, the Mutex itself re-allows Sync'ing while not particularly forbidding Send'ing, and the Arc doesn't particularly forbid either while offering shared ownership (avoiding the lifetime problems).
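A small sketch of the pattern just described (an assumed example, not from the original answer): Arc provides the shared ownership that avoids the lifetime problem, and Mutex provides the synchronized interior mutability.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let shared = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let shared = Arc::clone(&shared); // each thread becomes a co-owner
            thread::spawn(move || {
                *shared.lock().unwrap() += 1; // only one thread mutates at a time
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(*shared.lock().unwrap(), 4);
}
```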
Small list of types
Types that are Send and Sync are those that don't particularly forbid either:
primitives, Arc, Mutex - depending on the inner types
Types that are Send and !Sync are those that offer (multithread-unsynchronized) interior mutability:
Cell, RefCell - depending on the inner type
Types that are !Send and !Sync are those that offer (multithread-unsynchronized) co-ownership:
Rc
I don't know of types that are !Send and Sync.
According to
Rustonomicon: Send and Sync
A type is Send if it is safe to send it to another thread.
A type is Sync if it is safe to share between threads (T is Sync if and only if &T is Send).

Is there a way to drop a static lifetime object in Rust?

When searching for an answer, I found this question; however, there is no mention of static lifetime objects. Can the method mentioned in this answer (calling drop() on the object) be used for static lifetime objects?
I was imagining a situation like a linked list. You need to keep nodes of the list around for (potentially) the entire lifetime of the program, however you also may remove items from the list. It seems wasteful to leave them in memory for the entire execution of the program.
Thanks!
No. The very point of a static is that it's static: It has a fixed address in memory and can't be moved from there. As a consequence, everybody is free to have a reference to that object, because it's guaranteed to be there as long as the program is executing. That's why you only get to use a static in the form of a &'static-reference and can never claim ownership.
Besides, doing this for the purpose of memory conservation is pointless: The object is baked into the executable and mapped to memory on access. All that could happen is for the OS to relinquish the memory mapping. Yet, since the memory is never allocated from the heap in the first place, there is no saving to be had.
The only thing you could do is to replace the object using unsafe mutable access. This is both dangerous (because the compiler is free to assume that the object does not in fact change) and pointless, due to the fact that the memory can't be freed, as it's part of the executable's memory mapping.
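As a small illustration (a made-up snippet, not part of the answer above): ownership of a static item can never be taken, so there is nothing to hand to drop().

```rust
struct Node {
    value: i32,
}

// Baked into the executable; it exists for the entire program run.
static HEAD: Node = Node { value: 42 };

fn main() {
    let r: &'static Node = &HEAD; // all you ever get is a borrow
    println!("{}", r.value);

    // drop(HEAD);
    // ^ error: cannot move out of static item `HEAD`
}
```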

multiple threads vs reference counting: does each thread count variables separately

I've been playing around with glib, which
utilizes reference counting to manage memory for its objects;
supports multiple threads.
What I can't understand is how they play together.
Namely:
In glib each thread doesn't seem to increase the refcount of objects passed on its input, AFAIK (I'll call them thread-shared objects). Is this true (or have I just failed to find the right piece of code)? Is it common practice not to increase refcounts on thread-shared objects for each thread that shares them, besides the main thread (which is responsible for refcounting them)?
Still, each thread increases reference counts for the objects it dynamically creates itself. Should the programmer take care not to give the same names to variables in each thread in order to prevent name collisions and memory leaks? (E.g. in my picture, thread2 shouldn't create a heap variable called output_object or it will collide with thread1's heap variable of the same name.)
UPDATE: The answer to question 2 is no, because the visibility scopes of those variables don't intersect:
Is dynamically allocated memory (heap), local to a function or can all functions in a thread have access to it even without passing pointer as an argument.
An illustration to my questions:
I think that threads are irrelevant to understanding the use of reference counters. The point is rather ownership and lifetime, and a thread is just one thing that is affected by this. This is a bit difficult to explain, hopefully I'll make this clearer using examples.
Now, let's look at the given example where main() creates an object and starts two threads using that object. The question is, who owns the created object? The simple answer is that main() and both threads share this object, so this is shared ownership. In order to model this, you should increment the refcounter before each call to pthread_create(). If the call fails, you must decrement it again, otherwise it is the responsibility of the started thread to do that when it is done with the object. Then, when main() terminates, it should also release ownership, i.e. decrement the refcounter. The general rule is that when adding an owner, increment the refcounter. When an owner is done with the object, it decrements the refcounter and the last one destroys the object with that.
Now, why does the code not do this? Firstly, you can get away with adding the first thread as owner and then passing main()'s ownership to the second thread. This will save one increment/decrement operation. This still isn't what's happening though. Instead, no reference counting is done at all, and the simple reason is that it isn't needed. The point of refcounting is to coordinate the lifetime of a dynamically allocated object between different owners that are peers. Here though, the object is created and owned by main(); the two threads are not peers but rather slaves of main. Since main() is the master that controls start/stop of the threads, it doesn't have to coordinate the lifetime of the object with them.
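The same rule can be sketched in Rust terms with Arc (an analogy, not part of the original answer): cloning the Arc before spawning a thread plays the role of the increment before pthread_create, and the clone being dropped when the thread finishes is the decrement by that owner.

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    let object = Arc::new(String::from("shared object")); // refcount = 1

    let for_worker = Arc::clone(&object); // add the worker thread as an owner
    let handle = thread::spawn(move || {
        println!("worker uses {for_worker}");
    }); // `for_worker` is dropped when the thread finishes: its decrement

    println!("main uses {object}");
    handle.join().unwrap();
} // main's `object` is dropped here; the last owner frees the allocation
```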
Lastly, though that might be due to the example-ness of your code, I think that main simply leaks the reference, relying on the OS to clean up. While this isn't beautiful, it doesn't hurt. In general, you can allocate objects once and then use them forever without any refcounting in some cases. An example for this is the main window of an application, which you only need once and for the whole runtime. You shouldn't repeatedly allocate such objects though, because then you have a significant memory leak that will increase over time. Both cases will be caught by tools like valgrind though.
Concerning your second question, about the heap variable name clash you expect: it doesn't exist. Variable names that are function-local cannot collide. This is not because they are used by different threads - even if the same function is called twice by the same thread (think recursion!), the local variables in each call to the function are distinct. Also, variable names are for the human reader; the compiler completely eradicates them.
UPDATE:
As matthias says below, GObject is not thread-safe, only reference counting functions are.
Original content:
GObject is supposed to be thread safe, but I've never played with that myself…
