when( expression ){ do stuff } - programming-languages

Does anybody know in which programming language can you use this:
when( expression ){ do stuff }
It is supposed to register the expression and the code block in some array, which in turn is scanned every Q milli-/micro-/nanoseconds by a background thread; for every expression that evaluates to true, its respective code block is executed.
As you might think, I already have an implementation. I'm asking because I think it would be nice to have it supported natively in some programming languages.
For those who might ask what the use of such a construct is: imagine that you create some variables/objects in your program and want a procedure executed every time / while / whenever the variable/object has a certain value/state. The advantage is that you wouldn't have to bind the code to the variable/object; moreover, they wouldn't even have to exist at the time you declare the when(){}.
It would be some sort of trigger.
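To make the described mechanism concrete, here is a minimal sketch in Python (all names here — when, _registry, the 10 ms polling interval — are invented for illustration; this is not the asker's actual implementation):

```python
import threading
import time

# Registry of (predicate, action) pairs, scanned by a background thread.
_registry = []
_lock = threading.Lock()

def when(predicate, action):
    """Register a code block to run on every scan in which predicate is true."""
    with _lock:
        _registry.append((predicate, action))

def _poll(interval):
    while True:
        with _lock:
            entries = list(_registry)
        for predicate, action in entries:
            if predicate():
                action()
        time.sleep(interval)

# Start the background poller, scanning every 10 ms.
threading.Thread(target=_poll, args=(0.01,), daemon=True).start()

# Usage: fire whenever the counter has reached a threshold.
# Note that the when() is declared before the condition ever holds.
state = {"count": 0}
fired = []
when(lambda: state["count"] >= 3, lambda: fired.append(state["count"]))

for _ in range(5):
    state["count"] += 1
    time.sleep(0.05)
time.sleep(0.05)  # give the poller one more scan
```

As described in the question, the block fires on every scan in which the expression is true, not just on the first transition.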

The SR language has a construct very similar to the one you are looking for.
There the syntax is
do guard -> command
[] guard -> command
[] guard -> command
...
od
You can find information (and implementation) here: http://www.cs.arizona.edu/sr/

I don't know of any language-level constructs like what you describe, but it sounds similar in principle to what ReactiveX does with its observables. I believe the Observable type is due to be incorporated into future versions of JavaScript as well.
It looks a little different:
observable.doNext(() => { // do stuff })
vs:
when(expression) { // do stuff }
In the example you give your expression would return something akin to a ReactiveX observable, and the body of the statement could be equated to .doNext(() => do stuff). In fact Rx provides a number of options for how to handle different observable events and it can do some really powerful stuff allowing you to chain operations.
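For readers unfamiliar with the observable style, here is a toy push-based sketch in Python (the Observable class and the do_next/emit names are invented, loosely echoing Rx's naming; real Rx observables are far richer):

```python
class Observable:
    """A toy push-based observable, loosely modeled on ReactiveX."""
    def __init__(self):
        self._subscribers = []

    def do_next(self, callback):
        # Register a callback to run for each emitted value.
        self._subscribers.append(callback)
        return self

    def emit(self, value):
        # Push a value to every subscriber.
        for callback in self._subscribers:
            callback(value)

# Usage: the "expression" becomes a stream of events to react to,
# instead of a flag that a background thread polls.
seen = []
obs = Observable()
obs.do_next(lambda v: seen.append(v * 2))
obs.emit(1)
obs.emit(2)
```

The key design difference from the polling approach in the question: nothing scans an array; producers push events, so there is no wasted work while the condition is false.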

The Software Transactional Memory library for Haskell has something similar:
atomically $ do
  a <- readTVar x
  b <- readTVar y
  check (a < b)
  writeTVar z (b - a)
This will read the current values of x and y into a and b, and wait until a < b becomes true before it continues to the next line.
It does so without busy-waiting. If the condition a < b is false, it aborts the transaction and starts listening for writes to the TVars that were read so far, namely x and y. Only when it's notified that one of those TVars was updated does it restart the transaction from the top to attempt it again.
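The same wait-without-busy-waiting behavior can be sketched outside of STM with a condition variable. This Python sketch (all names invented; it mimics only the blocking behavior, not STM's transactional retry) blocks until x < y holds and is woken by the writer, roughly the way the transaction above is re-run when a TVar it read is written:

```python
import threading

class Vars:
    """Shared x and y guarded by a condition variable; the writer notifies
    waiters whenever a value changes (analogous to STM waking retriers)."""
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.cond = threading.Condition()

    def set_y(self, value):
        with self.cond:
            self.y = value
            self.cond.notify_all()

    def wait_and_diff(self):
        # Block (without busy-waiting) until x < y, then compute y - x.
        with self.cond:
            self.cond.wait_for(lambda: self.x < self.y)
            return self.y - self.x

shared = Vars(x=5, y=0)
result = []
t = threading.Thread(target=lambda: result.append(shared.wait_and_diff()))
t.start()
shared.set_y(7)   # wakes the waiter: 5 < 7 now holds
t.join()
```

Unlike STM, this version requires the writer to cooperate by calling notify_all; STM's check/retry arranges those wakeups automatically from the transaction's read set.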


Will Rust optimize away unused function arguments?

I have a function of type
f: fn(x: SomeType, y: Arc<()>) -> ISupposeTheReturnTypeDoesNotMatter
When compiled (with or without optimization), will the y be optimized away?
The intention of y is to limit the number of running instances of f: if y is referenced too many times, the caller of f will not call f again until the reference count of y becomes lower.
Edit: clarification on my intention
The intention is to keep the number of running http requests (represented by the above f) in control, the pseudo code looks like this:
let y = Arc::new(());
let all_jobs_to_be_done = vector_of_many_jobs;
loop {
    while Arc::strong_count(&y) < some_predefined_limit {
        // We have some free slots, so fill them with instances of f.
        // f is passed a clone of y, so that a running f raises the
        // ref count and the death of its worker thread lowers it.
        let work = all_jobs_to_be_done.pop();
        let ticket = y.clone();
        spawn_work(move || f(work, ticket));
    }
    sleep_for_a_few_seconds();
}
The reason for this seemingly hacky workaround is that I cannot find a library that meets my needs (consuming a changing work queue with a bounded number of async (tokio) workers, and requeueing a job if it fails).
Will Rust optimize away unused function arguments?
Yes, LLVM (the backend for rustc) is able to optimize away unused variables when removing them does not change program behavior, although nothing guarantees that it will. rustc has some passes before LLVM too, but the same applies.
Knowing what exactly counts as program behavior is tricky business. However, multi-threaded primitives used in refcounting mechanics are usually the sort of thing that cannot be optimized away for good reason. Refer to the Rust reference for more information (other resources that might help are the nomicon, the different GitHub repos, the Rust fora, the C++11 memory model which Rust uses, etc.).
On the other hand, if you are asking about what are the semantics of the language when it encounters unused parameters, then no, Rust does not ignore them (and hopefully never will!).
will the y be optimized away?
No, it is a type with side effects. For instance, dropping it requires running non-trivial code.
The intention of the y is to limit the number of the running instances of f
Such an arrangement does not limit how many threads are running f, since Arc is not a mutex and, even if it were some kind of mutex, you could construct as many independent ones as you wanted.
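A construct that actually enforces such a limit is a counting semaphore. Here is a sketch of the idea in Python (LIMIT, f, and the bookkeeping variables are all placeholders, not the asker's code; the bookkeeping exists only to observe the bound):

```python
import threading
import time

LIMIT = 3
sem = threading.BoundedSemaphore(LIMIT)
running = 0
peak = 0
state_lock = threading.Lock()

def f(job):
    """Hypothetical worker; the semaphore, not a refcount, bounds concurrency."""
    global running, peak
    with sem:  # blocks while LIMIT workers are already inside
        with state_lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for the real work
        with state_lock:
            running -= 1

threads = [threading.Thread(target=f, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The difference from the Arc scheme: acquiring the semaphore blocks the caller when the limit is reached, so the bound is enforced rather than merely observed and polled. (In async Rust, tokio's Semaphore plays the analogous role.)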

Is Clojure "multithreaded"?

My question may seem weird, but I think I'm facing an issue with volatile objects.
I have written a library implemented like this (just a scheme, not real content):
(def var1 (volatile! nil))
(def var2 (volatile! nil))

(defn do-things [a]
  (vreset! var1 a)
  (vswap! var2 inc)
  {:a @var1 :b @var2})
So I have global vars that are initialized with external values, others that are calculated, and I return their contents.
I used volatiles to get better speed than with atoms and to avoid redefining a new var for every calculation.
The problem is that this seems to fail in practice, because I map do-things over a collection (in another program) with occasional inner sub-calls to this function, like (pseudo-code):
(map
  (fn [x]
    (let [analysis (do-things x)]
      (if blabla
        (do-things (f x))
        analysis)))
  coll)
Will the inner conditional call spawn another thread under the hood? It seems so, because sometimes the calls work and sometimes they don't.
Is there any other way to do this, apart from defining the volatiles inside every do-things body?
EDIT
Actually the error was something else, but the question still stands: is this an acceptable/safe way to do it without any explicit use of multithreading capabilities?
There are very few constructs in Clojure that create threads on your behalf - generally Clojure can and will run on one or more threads depending on how you structure your program. pmap is a good example that creates and manages a pool of threads to map in parallel. Another is clojure.core.reducers/fold, which uses a fork/join pool, but really that's about it. In all other cases it's up to you to create and manage threads.
Volatiles should only be used with great care and in circumstances where you control the scope of use such that you are guaranteed not to be competing with threads to read and write the same volatile. Volatiles guarantee that writes can be read on another thread, but they do nothing to guarantee atomicity. For that, you must use either atoms (for uncoordinated) or refs and the STM (for coordinated).
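To illustrate the atomicity that volatiles lack, here is a minimal Clojure-style atom sketched in Python using a lock (Clojure's real atoms use compare-and-swap rather than a lock; the Atom class and its method names are invented for this sketch):

```python
import threading

class Atom:
    """A minimal Clojure-style atom: swap applies a pure function atomically."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def swap(self, fn, *args):
        # Read-modify-write as one indivisible step, like Clojure's swap!.
        with self._lock:
            self._value = fn(self._value, *args)
            return self._value

    def deref(self):
        with self._lock:
            return self._value

counter = Atom(0)

def worker():
    for _ in range(10_000):
        counter.swap(lambda n: n + 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a volatile-style unsynchronized read-then-write, concurrent increments could interleave and lose updates; the atomic swap guarantees every one of the 40,000 increments lands.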

How does the GHC garbage collector / runtime know that it can create an array `inplace'

For example
main = do
  let ls = [0..10000000]
  print ls
This will create the array 'inplace', using O(1) memory.
The following edit causes the program to run out of memory while executing.
main = do
  let ls = [0..10000000]
  print ls
  print ls
ls in this case must be kept in memory to be printed again. It would actually be far more memory-efficient to recalculate the list 'in place' than to try to keep it all in memory. That's an aside, though. My real question is: how and when does GHC communicate to the runtime system that ls can be destroyed while it's created in O(1) time? I understand that liveness analysis can find this information; I'm just wondering where the information is used. Is the garbage collector passed this info? Is it somehow compiled away? (If I look at the compiled Core from GHC, then both examples use eftInt, so if it's a compiler artifact then it must happen at a deeper level.)
edit: My question was more about finding where this optimization takes place. I thought maybe it was in the GC, which was fed information from some liveness check in the compilation step. Given the answers so far, I'm probably wrong. This is most likely happening at some level lower than Core, so Cmm perhaps?
edit2: Most of the answers here assume that the GC knows that ls is no longer referenced in the first example, and that it is referenced again in the second example. I know the basics of GC, and I know that arrays are linked lists, etc. My question is exactly HOW the GC knows this. The answer can probably only be (a) it gets extra information from the compiler, or (b) it doesn't need to know, because this is handled 100% by the compiler.
ls here is a lazy list, not an array. In practice, it's closer to a stream or generator in another language.
The reason the first program works fine is that it never actually has the whole list in memory. ls is defined lazily and then consumed element by element by print. As print goes along, there are no other references to the beginning of ls, so the list cells can be garbage collected immediately.
In theory, GHC could realize that it's more efficient to not store the list in memory between the two prints but instead recompute it. However, this is not always desirable—a lot of code is actually faster if things are only evaluated once—and, more importantly, would make the execution model even more confusing for programmers.
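The stream/generator analogy can be made concrete in Python: consuming the sequence once needs O(1) extra memory, and a second traversal must either recompute or retain everything (ls here is an invented stand-in for the Haskell list, not part of the original programs):

```python
def ls():
    """Like Haskell's [0..10000000]: elements are produced on demand."""
    n = 0
    while n <= 10_000_000:
        yield n
        n += 1

# One pass: each element becomes garbage right after sum consumes it,
# just like the single-print Haskell program.
total = sum(ls())

# A second pass must recompute from a fresh generator ...
total_again = sum(ls())
# ... whereas list(ls()) would pin all ten million elements in memory
# at once, which is what retaining ls across two prints does in Haskell.
```

The difference is that Python makes you choose explicitly (fresh generator vs. materialized list), while in Haskell the choice falls out of whether a live expression still mentions ls.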
This explanation is probably a lie, especially because I'm making it up as I go, but that shouldn't be a problem.
The essential mistake you're making is assuming that a value is live if a variable bound to it is in scope in a live expression. This is simply wrong. A value bound to a variable is live only if the variable is actually mentioned in a live expression.
The job of the runtime is very simple:
1. Execute the expression bound to main.
There is no 2.
We can think of this execution as involving a couple different steps that repeat over and over:
Figure out what to do now.
Figure out what to do next.
So we start with some main expression. From the start, the "root set" for GC consists of those names that are used in that main expression, not the things that are in scope in that expression. If I write
foo = "Hi!"
main = print "Bye!"
then since foo is not mentioned in main, it is not in the root set at the beginning, and since it is not even mentioned indirectly by anything mentioned by main, it is dead right from the start.
Now suppose we take a more interesting example:
foo = "Hi!"
bar = "Bye!"
main = print foo >> print bar
Now foo is mentioned in main, so it starts out live. We evaluate main to weak head normal form to find out what to do, and we get, approximately,
(primitive operation that prints out "Hi!") >> print bar
Note that foo is no longer mentioned, so it is dead!
Now we execute that primitive operation, printing "Hi!", and our "to do list" is reduced to
print bar
Now we evaluate that to WHNF, and get, roughly,
(primitive operation to print "Bye!")
Now bar is dead. We print "Bye!" and exit.
Consider, now, the first program you described:
main = do
let ls = [0..10000000]
print ls
This desugars to
main =
  let ls = [0..10000000]
  in print ls
This is where we start. The "root set" at the beginning is everything mentioned in the in clause of the expression. So we conceptually have ls and print to start with. Now we can imagine that print, specialized to [Integer], looks vaguely like the following (this is greatly simplified and will print the list differently, but that really doesn't matter).
print xs = case xs of
  []     -> return ()
  (y:ys) -> printInteger y >> print ys
So when we start executing main (What do we do now? What will we do afterwards?), we are trying to calculate print ls. To do this, we pattern match on the first constructor of ls, which forces ls to be evaluated to WHNF. We find the second pattern, y:ys, matches, so we replace print ls with printInteger y >> print ys, where y points to the first element of ls and ys points to the thunk representing the second list constructor of ls. But note that ls itself is now dead! Nothing is pointing to it! So as print forces bits of the list, the bits it has already passed become dead.
In contrast, when you have
main =
  let ls = ...
  in print ls >> print ls
and you start executing, you start by calculating the thing to do first (print ls). You get
(printInteger y >> print ys) >> print ls
Everything is the same, except that the second part of the expression now points to ls. So even though the first part will be dropping pieces of the list as it goes, the second part will keep holding on to the beginning, keeping it all live.
Edit
Let me try explaining with something a little simpler than IO. Pretend that your program is an expression of type [Int], and the job of the runtime system is to print each element on its own line. So we can write
countup m n = if m == n then [] else m : countup (m+1) n
main = countup 0 1000
The runtime system holds a value representing everything that it should print. Let's call the "current value" whatPrint. The RTS needs to follow this process:
1. Set whatPrint to main.
2. Is whatPrint empty? If so, I'm done and can exit the program. If not, it is a cons, printNow : whatPrint'.
3. Calculate printNow and print it.
4. Set whatPrint to whatPrint'.
5. Go to step 2.
In this model, the "root set" for garbage collection is just whatPrint.
In a real program, we don't produce a list; we produce an IO action. But such an action is also a lazy data structure (conceptually). You can think of >>=, return, and each primitive IO operation as a constructor for IO. Think of it as
data IO :: * -> * where
  Return   :: a -> IO a
  Bind     :: IO a -> (a -> IO b) -> IO b
  PrintInt :: Int -> IO ()
  ReadInt  :: IO Int
  ...
The initial value of whatPrint is main, but its value evolves over time. Only what it points to directly is in the root set. There is no magical analysis necessary.

What are the C++11 memory ordering guarantees in this corner case?

I'm writing some lock-free code, and I came up with an interesting pattern, but I'm not sure if it will behave as expected under relaxed memory ordering.
The simplest way to explain it is using an example:
std::atomic<int> a, b, c;

auto a_local = a.load(std::memory_order_relaxed);
auto b_local = b.load(std::memory_order_relaxed);
if (a_local < b_local) {
    auto c_local = c.fetch_add(1, std::memory_order_relaxed);
}
Note that all operations use std::memory_order_relaxed.
Obviously, on the thread that this is executed on, the loads for a and b must be done before the if condition is evaluated.
Similarly, the read-modify-write (RMW) operation on c must be done after the condition is evaluated (because it's conditional on that... condition).
What I want to know is: does this code guarantee that the value of c_local is at least as up-to-date as the values of a_local and b_local? If so, how is this possible given the relaxed memory ordering? Is the control dependency together with the RMW operation acting as some sort of acquire fence? (Note that there's not even a corresponding release anywhere.)
If the above holds true, I believe this example should also work (assuming no overflow) -- am I right?
std::atomic<int> a(0), b(0);

// Thread 1
while (true) {
    auto a_local = a.fetch_add(1, std::memory_order_relaxed);
    if (a_local >= 0) {  // Always true at runtime
        b.fetch_add(1, std::memory_order_relaxed);
    }
}

// Thread 2
auto b_local = b.load(std::memory_order_relaxed);
if (b_local < 777) {
    // Note that fetch_add returns the pre-increment value
    auto a_local = a.fetch_add(1, std::memory_order_relaxed);
    assert(b_local <= a_local);  // Is this guaranteed?
}
On thread 1, there is a control dependency which I suspect guarantees that a is always incremented before b is incremented (but they each keep being incremented neck-and-neck). On thread 2, there is another control dependency which I suspect guarantees that b is loaded into b_local before a is incremented. I also think that the value returned from fetch_add will be at least as recent as any observed value in b_local, and the assert should therefore hold. But I'm not sure, since this departs significantly from the usual memory-ordering examples, and my understanding of the C++11 memory model is not perfect (I have trouble reasoning about these memory ordering effects with any degree of certainty). Any insights would be appreciated!
Update: As bames53 has helpfully pointed out in the comments, given a sufficiently smart compiler, it's possible that an if could be optimised out entirely under the right circumstances, in which case the relaxed loads could be reordered to occur after the RMW, causing their values to be more up-to-date than the fetch_add return value (the assert could fire in my second example). However, what if instead of an if, an atomic_signal_fence (not atomic_thread_fence) is inserted? That certainly can't be ignored by the compiler no matter what optimizations are done, but does it ensure that the code behaves as expected? Is the CPU allowed to do any re-ordering in such a case?
The second example then becomes:
std::atomic<int> a(0), b(0);

// Thread 1
while (true) {
    auto a_local = a.fetch_add(1, std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_acq_rel);
    b.fetch_add(1, std::memory_order_relaxed);
}

// Thread 2
auto b_local = b.load(std::memory_order_relaxed);
std::atomic_signal_fence(std::memory_order_acq_rel);
// Note that fetch_add returns the pre-increment value
auto a_local = a.fetch_add(1, std::memory_order_relaxed);
assert(b_local <= a_local);  // Is this guaranteed?
Another update: After reading all the responses so far and combing through the standard myself, I don't think it can be shown that the code is correct using only the standard. So, can anyone come up with a counter-example of a theoretical system that complies with the standard and also fires the assert?
Signal fences don't provide the necessary guarantees (well, not unless 'thread 2' is a signal handler that actually runs on 'thread 1').
To guarantee correct behavior we need synchronization between threads, and the fence that does that is std::atomic_thread_fence.
Let's label the statements so we can diagram various executions (with thread fences replacing signal fences, as required):
// Thread 1
while (true) {
    auto a_local = a.fetch_add(1, std::memory_order_relaxed);  // A
    std::atomic_thread_fence(std::memory_order_acq_rel);       // B
    b.fetch_add(1, std::memory_order_relaxed);                 // C
}

// Thread 2
auto b_local = b.load(std::memory_order_relaxed);              // X
std::atomic_thread_fence(std::memory_order_acq_rel);           // Y
auto a_local = a.fetch_add(1, std::memory_order_relaxed);      // Z
So first let's assume that X loads a value written by C. The following paragraph specifies that in that case the fences synchronize and a happens-before relationship is established.
29.8/2:
A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.
And here's a possible execution order where the arrows are happens-before relations.
Thread 1: A₁ → B₁ → C₁ → A₂ → B₂ → C₂ → ...
↘
Thread 2: X → Y → Z
If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B shall take its value from X or from a side effect Y that follows X in the modification order of M. — [C++11 1.10/18]
So the load at Z must take its value from A₁ or from a subsequent modification. Therefore the assert holds because the value written at A₁ and at all later modifications is greater than or equal to the value written at C₁ (and read by X).
Now let's look at the case where the fences do not synchronize. This happens when the load of b does not read a value written by thread 1, but instead reads the value that b is initialized with. There's still synchronization where the thread starts, though:
30.3.1.2/5
Synchronization: The completion of the invocation of the constructor synchronizes with the beginning of the invocation of the copy of f.
This specifies the behavior of std::thread's constructor. So (assuming the thread creation is correctly sequenced after the initialization of a) the value read by Z must be taken from the initialization of a or from one of the subsequent modifications on thread 1, which means that the assertion still holds.
This example gets at a variation of reads-from-thin-air-like behavior. The relevant discussion in the spec is in section 29.3p9-11. Since the current version of the C11 standard doesn't guarantee that dependences are respected, the memory model should allow the assertion to fire. The most likely situation is that the compiler optimizes away the check that a_local >= 0. But even if you replace that check with a signal fence, CPUs would be free to reorder those instructions.
You can test such code examples under the C/C++11 memory models using the open source CDSChecker tool.
The interesting issue with your example is that for an execution to violate the assertion, there has to be a cycle of dependences. More concretely:
The b.fetch_add in thread 1 depends on the a.fetch_add in the same loop iteration due to the if condition. The a.fetch_add in thread 2 depends on the b.load. For an assertion violation, we have to have T2's b.load read from a b.fetch_add in a later loop iteration than T2's a.fetch_add. Now consider the b.fetch_add that the b.load reads from, and call it # for future reference. We know that b.load depends on #, as it takes its value from #.
We know that # must depend on T2's a.fetch_add, as T2's a.fetch_add atomically reads and updates a prior a.fetch_add from T1 in the same loop iteration as #. So we know that # depends on the a.fetch_add in thread 2. That gives us a cycle in the dependences, which is plain weird, but allowed by the C/C++ memory model. The most likely way of actually producing that cycle is: (1) the compiler figures out that a_local is always greater than 0, breaking the dependence. It can then do loop unrolling and reorder T1's fetch_adds however it wants.
After reading all the responses so far and combing through the standard myself, I don't think it can be shown that the code is correct using only the standard.
And unless you admit that non-atomic operations are magically safer and more ordered than relaxed atomic operations (which is silly), and that there is one semantics of C++ without atomics (and try_lock and shared_ptr::count) and another semantics for those features that don't execute sequentially, you also have to admit that no program at all can be proven correct, as non-atomic operations don't have an "ordering" and they are needed to construct and destroy variables.
Or you stop taking the standard text as the only word on the language and use some common sense, which is always recommended.

How can one implement a forking try-catch in Haskell?

I want to write a function
forkos_try :: IO (Maybe α) -> IO (Maybe α)
which takes a command x. x is an imperative operation that first mutates state and then checks whether that state is messed up or not. (It does not do anything external, which would require some kind of OS-level sandboxing to revert the state.)
if x evaluates to Just y, forkos_try returns Just y.
otherwise, forkos_try rolls back state, and returns Nothing.
Internally, it should fork() into threads parent and child, with x running on child.
if x succeeds, child should keep running (returning x's result) and parent should die
otherwise, parent should keep running (returning Nothing) and child should die
Question: What's the way to write something with equivalent or more powerful semantics than forkos_try? N.B. the state mutated (by x) lives in an external library and cannot be passed between threads. Hence the semantics of which thread is kept alive matters.
Formally, "keep running" means "execute some continuation rest :: Maybe α -> IO () ". But, that continuation isn't kept anywhere explicit in code.
For my case, I think it will work (for the time being) to write it in a different style, using forkOS (which takes the entire computation child will run), since I can write an explicit expression for rest. But it troubles me that I can't figure out how to do this with the primitive function forkOS; one would think it would be general enough to support any specific case (which could then appear as a high-level API, like forkos_try).
EDIT -- please see the example code with explicit rest if the problem's still not clear [ http://pastebin.com/nJ1NNdda ].
p.s. I haven't written concurrency code in a while; hopefully my knowledge of POSIX fork() is correct! Thanks in advance.
Things are a lot simpler to reason about if you model state explicitly.
someStateFunc :: s -> Maybe (a, s)

-- inside some other function
case someStateFunc initialState of
  Nothing            -> ... -- it failed; stick with the initial state
  Just (a, newState) -> ... -- it succeeded; do something with
                            -- the result and the new state
With immutable state, "rolling back" is simple: just keep using initialState. And "not rolling back" is also simple: just use newState.
So... I'm assuming from your explanation that this "external library" performs some nontrivial IO effects that are nevertheless restricted to a few knowable and reversible operations (modify a file, an IORef, etc.). There is no way to reverse some things (launch the missiles, write to stdout, etc.), so I see one of two choices for you here:
clone the world, and run the action in a sandbox. If it succeeds, then go ahead and run the action in the Real World.
clone the world, and run the action in the real world. If it fails, then replace the Real World with the snapshot you took earlier.
Of course, both of these are actually the same approach: fork the world. One world runs the action, one world doesn't. If the action succeeds, then that world continues; otherwise, the other world continues. You are proposing to accomplish this by building upon forkOS, which would clone the entire state of the program, but this would not be sufficient to deal with, for example, file modifications. Allow me to suggest instead an approach that is nearer to the simplicity of immutable state:
tryIO :: IO s -> (s -> IO ()) -> IO (Maybe a) -> IO (Maybe a)
tryIO save restore action = do
  initialState <- save
  result <- action
  case result of
    Nothing -> restore initialState >> return Nothing
    Just x  -> return (Just x)
Here you must provide some data structure s, and a way to save to and restore from said data structure. This allows you the flexibility to perform any cloning you know to be necessary. (e.g. save could copy a certain file to a temporary location, and then restore could copy it back and delete the temporary file. Or save could copy the value of certain IORefs, and then restore could put the value back.) This approach may not be the most efficient, but it's very straightforward.
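Translated loosely into Python under the same assumption (that the external state can be snapshotted and restored), the pattern looks like this; try_io, flaky_action, and the dict standing in for the external mutable state are all invented for illustration:

```python
def try_io(save, restore, action):
    """Sketch of tryIO: run action; if it reports failure (None), roll back."""
    snapshot = save()
    result = action()
    if result is None:
        restore(snapshot)
        return None
    return result

# Usage: a dict stands in for the external mutable state.
state = {"x": 1}

def flaky_action():
    state["x"] = 99   # mutate the state first ...
    return None       # ... then report that it is "messed up"

def restore(snap):
    # Put the saved snapshot back in place of the mutated state.
    state.clear()
    state.update(snap)

outcome = try_io(lambda: dict(state), restore, flaky_action)
```

As in the Haskell version, the caller supplies the save/restore pair, so any cloning known to be necessary (files, refs, etc.) can be plugged in without changing the combinator.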
