How do I implement the ABA solution? - linux

I am trying to implement the Michael-Scott FIFO queue from here. I'm unable to implement their solution to the ABA problem; I get this error:
error: incompatible type for argument 1 of '__sync_val_compare_and_swap'
For reference, I am using a Linux box to compile this on an Intel architecture. If you need more information on my setup, please ask.
It seems that __sync_val_compare_and_swap handles only up to 32-bit values. When I remove the counter that the paper uses to eliminate the ABA problem, everything compiles and runs fine.
Does anyone know of the relevant 64-bit CAS instruction I should be using here?
As an additional question, are there better (faster) implementations of lock-free FIFO queues out there? I came across this paper by Nir Shavit et al., which seems interesting. I am wondering whether others have seen similar efforts? Thanks.

Assuming GCC, try the -march switch, e.g. -march=i686, so that the compiler targets an architecture that has the 8-byte CMPXCHG8B instruction.
There is also __sync_bool_compare_and_swap. I don't know whether it's faster or not.
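Note also that the __sync builtins only accept integral or pointer operands, not the pointer_t struct from the paper, which is a likely cause of the "incompatible type" error. A minimal sketch of one workaround, assuming a 32-bit target so that pointer plus counter fit in 64 bits (the tagged_ptr name and TP_* helpers are illustrative, not from the paper):

    #include <stdint.h>

    typedef struct node node_t;

    /* Pack a 32-bit pointer and a 32-bit ABA counter into one word that
       the __sync builtins can operate on. Assumes a 32-bit target;
       compile with -march=i686 (or later) so GCC can inline the 8-byte
       CMPXCHG8B instruction. */
    typedef uint64_t tagged_ptr;

    #define TP_MAKE(node, count) ((uint64_t)(uint32_t)(uintptr_t)(node) \
                                  | ((uint64_t)(uint32_t)(count) << 32))
    #define TP_PTR(tp)           ((node_t *)(uintptr_t)(uint32_t)(tp))
    #define TP_COUNT(tp)         ((uint32_t)((tp) >> 32))

    /* CAS the pointer and counter in one shot; nonzero on success. */
    static int tp_cas(volatile tagged_ptr *loc,
                      tagged_ptr expected, tagged_ptr desired)
    {
        return __sync_bool_compare_and_swap(loc, expected, desired);
    }

A dequeue would then read the head once into a tagged_ptr and call tp_cas with TP_MAKE(next, TP_COUNT(old) + 1), bumping the counter on every successful swing, which is exactly what defeats ABA.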

GCC, last I looked in 2009, does not support contiguous double-word CAS. I had to implement it in inline assembly.
You can find my implementation of the M&S queue (including, in the abstraction layer, the assembly implementation of DCAS) and other lock-free data structures here:
http://www.liblfds.org
Briefly looking at the Nir Shavit et al paper, the queue requires Safe Memory Reclamation, which I suspect you'll need to implement yourself - it won't be built into the queue. An SMR API will be available in the next release (couple of weeks).
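For what it's worth, on x86-64, where pointer plus counter no longer fit in 64 bits, the contiguous double-word CAS has to go through the CMPXCHG16B instruction. A minimal inline-assembly sketch, assuming GCC on x86-64 and a 16-byte-aligned operand (the dwcas name and struct layout are mine, not liblfds'):

    #include <stdint.h>

    /* Pointer plus ABA counter; CMPXCHG16B faults on operands that are
       not 16-byte aligned. */
    typedef struct {
        void     *ptr;    /* low  64 bits: RAX (expected) / RBX (new) */
        uint64_t  count;  /* high 64 bits: RDX (expected) / RCX (new) */
    } __attribute__((aligned(16))) dwcas_t;

    /* Atomically: if (*dst == *cmp) { *dst = *set; return 1; }
       else        { *cmp = *dst; return 0; } */
    static inline int dwcas(volatile dwcas_t *dst,
                            dwcas_t *cmp, const dwcas_t *set)
    {
        unsigned char ok;
        __asm__ __volatile__(
            "lock cmpxchg16b %1\n\t"
            "sete %0"
            : "=q" (ok), "+m" (*dst),
              "+a" (cmp->ptr), "+d" (cmp->count)
            : "b" (set->ptr), "c" (set->count)
            : "cc", "memory");
        return ok;
    }

On failure, the expected value is refreshed from memory, so the usual retry loop can call dwcas again without an extra read.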

Lock-free may not be what you want, since lock-free is not necessarily wait-free. If you need a fast thread-safe queue (not lock-free!), then consider using Threading Building Blocks' concurrent_queue.
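For illustration, a minimal sketch of what using it looks like (TBB's push never blocks, and try_pop simply returns false when the queue is empty):

    #include <tbb/concurrent_queue.h>
    #include <thread>
    #include <iostream>

    int main()
    {
        tbb::concurrent_queue<int> q;   // thread-safe, not lock-free

        std::thread producer([&] {
            for (int i = 0; i < 100; ++i)
                q.push(i);
        });

        int v, seen = 0;
        while (seen < 100)
            if (q.try_pop(v))           // non-blocking; false when empty
                ++seen;

        producer.join();
        std::cout << "drained " << seen << " items\n";
    }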

Related

How do I find out the number of CPU cores using cpuid?

I am interested in physical cores, not logical cores.
I am aware of https://crates.io/crates/num_cpus, but I want to get the number of cores using cpuid. I am mostly interested in a solution that works on Ubuntu, but cross-platform solutions are welcome.
I see mainly two ways for you to do this.
You could use the higher level library cpuid. With this, it's as simple as cpuid::identify().unwrap().num_cores (of course, please do proper error handling). But since you know about the library num_cpus and still ask this question, I assume you don't want to use an external library.
The second way is to do it all yourself. But this method is mostly unrelated to Rust, as the main difficulty lies in understanding the CPUID instruction and what it returns. This is explained in this Q&A, for example. It's not trivial, so I won't repeat it here.
The only Rust-specific thing is how to actually execute that instruction in Rust. One way is to use core::arch::x86_64::__cpuid_count. It's an unsafe function that returns the raw result (four registers). After calling it, you have to extract the information you want via bit shifting and masking, as described in the Q&A I linked above. Consult core::arch for other architectures or more CPUID-related functions.
But again, doing this manually is not trivial, error-prone, and apparently hard to make work truly across CPUs. So I would strongly recommend using a library like num_cpus in any real code.

What happened to libgreen?

As far as I understand, libgreen is not part of the Rust standard library anymore. Also, I can't find a separate libgreen package. There are a few alternatives - coroutine, which does not provide actual green threads for now, and green-rs, which is broken. Am I right in understanding that, for now, there are no lightweight Go-like processes in Rust?
You are correct that there's no lightweight tasking library in std (or the rest of the main distribution), that green doesn't compile and that coroutine doesn't seem to fully handle the threading aspect yet. I do not know of any other library in this space.
As for what happened: the RFC linked to by that issue—RFC 230—is the canonical source of information. The summary is that it was found that the method by which green threading/IO was handled (std tried to abstract across both models, allowing them to be used interoperably automagically) was not worth the downsides. Now, std aims to just provide a minimum base-line of useful support: for IO/threading, that means "thin", safe wrappers for operating system functionality.
Read this https://aturon.github.io/blog/2016/08/11/futures/ and also:
Steve Klabnik's response in the comments:
In the beginning, Rust had only green threads. Eventually, it was
decided that a systems language without systems threads is... strange.
So we needed to add them. Why not add choice? Since the interfaces
could be the same, why not abstract over them, and you could just
choose which one you wanted?
At the same time, the problems with green threads by default were
becoming issues. Segmented stacks cause slow C interop. You need a
runtime to manage them, etc. Furthermore, the overall abstraction was
causing an unacceptable cost. The green threads weren't very green.
Plus, with the need to actually release someday looming, decisions
needed to be made regarding tradeoffs. And since Rust is supposed to
be a systems language, having 1:1 threads and basically no runtime
makes more sense than N:M threads and a runtime. So libgreen was
removed, the interface was re-done to be 1:1 thread centric.
The 'release someday looming' is a big part of it. We want to be
really stable with Rust, and with all the things to do to actually
ship a 1.0, we didn't want to crystallize an interface we weren't
happy with. Heck, we pulled out a lot of libraries that are even less
important for similar reasons, like rand. Engineering is all about
tradeoffs, and we decided to choose minimalism.
mio is a non-starter for us, as are most of the other async I/O
frameworks for Rust, because we need Windows and besides we don't want
to get locked into an expensive-to-replace library which may get
orphaned.
Totally understood here, especially in the general case. In the
specific case, mio is going to either have Windows support, or a
windows-specific version of mio is going to be released, with a
higher-level package providing the features for all platforms. And in
this case, it's maintained by one of the people who's currently using
Rust heavily in production, so it's not likely to go away anytime
soon. But, unless you're actively involved, it's hard to know things
like that, which is, of itself, an issue.
One of the reasons we were comfortable removing libgreen is that you
can write your own libraries to do different kinds of IO. 1.0 is a
strong core that we feel good about stabilizing forever, not the final
bit. Libraries like https://github.com/carllerche/mio can test out
different ways of handling things like async IO, and, when they're
mature enough, we can always pull them back in the standard library if
need be. But in the meantime, it's just one line to your Cargo.toml to
add them in.
And this text from Reddit:
Unfortunately they ended up canning the greenlet support because
theirs were slower than kernel threads which in turn demonstrates
someone didn’t understand how to get a language compiler to generate
stackless coroutines effectively (not surprising, the number of
engineers wired the right way is not many in this world, but see
http://www.reddit.com/r/rust/comments/2l0a4b/do_rust_web_servers_use_libuv_through_libgreen_or/
for more detail). And they canned the async i/o because libuv is
“slow” (which it is only because it is single threaded only, plus
forces a malloc + free per async operation as the buffers must last
until completion occurs, plus it enforces a penalty over synchronous
i/o see
http://blog.kazuhooku.com/2014/09/the-reasons-why-i-stopped-using-libuv.html),
which was a real shame - they should have taken the opportunity to
replace libuv with something better (hint: ASIO + AFIO, and yes I know
they are both C++, but Rust could do with much better C++ interop than
the presently none it currently has) instead of canning
always-async-everything in what could have been an amazing step up
from C++ with most of the benefits of Erlang without the disadvantages
of Erlang.
For newcomers, there is now may, a crate that implements green threads similar to goroutines.

Is rbp/ebp(x86-64) register still used in conventional way?

I have been writing a small kernel lately, based on the x86-64 architecture. When taking care of some user-space code, I realized I am virtually not using rbp. I then looked at some other things and found out that compilers are getting smarter these days, and they really don't use rbp anymore. (I could be wrong here.)
I was wondering whether the conventional use of rbp/ebp is no longer required in many instances, or am I wrong here? And if that kind of usage is not required, can it be used like a general-purpose register?
Thanks
It is only needed if you have variable-length arrays in your stack frames: the frame size is then not a compile-time constant, so the compiler keeps rbp as a fixed anchor for the frame (tracking the array length separately instead would require more memory and more computation). It is no longer needed for unwinding because there is now metadata for that.
It is still useful if you are hand-writing entire assembly functions, but who does that? Assembly should only be used as glue to jump into a C (or whatever) function.
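To make the variable-length-array point concrete, here is a small illustrative pair of functions (my own, not from the answer): with a fixed-size frame, every local sits at a constant offset from rsp, so -fomit-frame-pointer frees rbp for general-purpose use; once the frame grows by a runtime amount there is no constant rsp offset left and the compiler re-establishes rbp as the frame anchor:

    #include <cstddef>

    // Fixed-size frame: buf lives at rsp+constant, and with
    // -O2 -fomit-frame-pointer rbp is just another scratch register.
    long fixed_frame(long x)
    {
        long buf[4] = { x, x, x, x };
        return buf[0] + buf[3];
    }

    // Dynamic frame: __builtin_alloca moves rsp by a runtime amount,
    // so earlier locals no longer have known rsp offsets; the compiler
    // keeps rbp as a stable base even under -fomit-frame-pointer.
    long dynamic_frame(std::size_t n)   // assumes n >= 1
    {
        long *buf = static_cast<long *>(__builtin_alloca(n * sizeof(long)));
        for (std::size_t i = 0; i < n; ++i)
            buf[i] = static_cast<long>(i);
        return buf[0] + buf[n - 1];
    }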

What is data-dependency barrier: Linux Kernel

As the question says, I am looking for an in-depth explanation of the data-dependency barrier in SMP, especially with respect to the Linux kernel. I have the definition and a brief description handy in this link:
Linux Kernel Memory Barriers Documentation
However I was attempting to get a profound understanding of this concept. Your thoughts and inputs are highly appreciated.
Actually, at least in terms of C++11, this is more closely related to consume semantics. You can read more about it e.g. here. In short, they provide weaker guarantees than acquire semantics, which makes them more efficient on certain platforms that support data dependency ordering.
In C++ terms, it is called memory_order_consume. See this presentation from Paul McKenney.
Old answer: I believe "acquire semantics" is the more commonly used term for what the document is calling a "data-dependency barrier". See for example this presentation or the C++11 memory_order_acquire.
Update: per comments, the Linux description for Data Dependency Barriers sounds more like C++ "consume semantics".
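A minimal C++11 sketch of the pattern both terms describe: the consumer's second read dereferences the pointer it just loaded, and that data dependency is the only ordering a consume load / data-dependency barrier promises (a full acquire would order all subsequent reads):

    #include <atomic>

    struct payload { int value; };

    std::atomic<payload *> ptr{nullptr};

    void producer()
    {
        payload *p = new payload{42};
        // Release store: the initialization of *p happens-before any
        // consume/acquire load that observes this pointer.
        ptr.store(p, std::memory_order_release);
    }

    int consumer()
    {
        payload *p;
        while ((p = ptr.load(std::memory_order_consume)) == nullptr)
            ;                       // spin until published
        // This read carries a data dependency on p, so it is ordered
        // after the load above without a full acquire barrier. (Most
        // compilers currently strengthen consume to acquire anyway.)
        return p->value;
    }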

Lockfree standard collections and tutorial or articles

Does anyone know of a good resource for implementations (meaning source code) of the usual lock-free data types? I'm thinking of lists, queues, and so on.
Locking implementations are extremely easy to find, but I can't find examples of lock-free algorithms, how exactly CAS works, or how to use it to implement those structures.
Check out Julian M Bucknall's blog. He describes (in detail) lock-free implementations of queues, lists, stacks, etc.
http://www.boyet.com/Articles/LockfreeQueue.html
http://www.boyet.com/Articles/LockfreeStack.html
http://www.liblfds.org
Written in C.
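Since the question also asks how exactly CAS works, here is a minimal C++11 sketch of the canonical pattern, a lock-free stack push (illustrative, not taken from the resources above): read the current head, point the new node at it, then compare-and-swap; if another thread changed head in the meantime, the CAS fails and the loop retries:

    #include <atomic>

    struct node {
        int   value;
        node *next;
    };

    std::atomic<node *> head{nullptr};

    void push(int v)
    {
        node *n = new node{v, head.load(std::memory_order_relaxed)};
        // CAS succeeds only if head still equals n->next. On failure,
        // compare_exchange_weak reloads the current head into n->next,
        // so the loop can simply try again.
        while (!head.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed))
            ;
    }

Pop is where the ABA problem bites (the head you compared against may have been freed and reallocated), which is why real implementations pair the pointer with a counter or use a freelist, as in the questions above.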
If C++ is okay with you, take a look at boost::lockfree. It has lock-free Queue, Stack, and Ringbuffer implementations.
In the boost::lockfree::details section, you'll find a lock-free freelist and tagged pointer (ABA prevention) implementation. You will also see examples of explicit memory ordering via boost::atomic (an in-development version of C++0x std::atomic).
Neither boost::lockfree nor boost::atomic is part of Boost yet, but both have seen attention on the boost-development mailing list and are on the schedule for review.
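For reference, usage in the form that later shipped in Boost (boost::lockfree was merged in Boost 1.53) looks roughly like this; the capacity and element type are illustrative:

    #include <boost/lockfree/queue.hpp>
    #include <thread>
    #include <cassert>

    int main()
    {
        // Fixed-capacity MPMC queue; push/pop return false rather
        // than blocking.
        boost::lockfree::queue<int> q(1024);

        std::thread producer([&] {
            for (int i = 0; i < 1000; ++i)
                while (!q.push(i))   // retry while the queue is full
                    ;
        });

        int popped = 0, v;
        while (popped < 1000)
            if (q.pop(v))            // false when the queue is empty
                ++popped;

        producer.join();
        assert(popped == 1000);
    }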
