I would like to know if there is a difference between these two definitions.
[(x,y) | x<-[1..10000], x==2000, y<-[1..100], odd y]
[(x,y) | x<-[1..10000], y<-[1..100], x==2000, odd y]
Both will generate the same list of tuples.
But if our compiler doesn't do any optimization, how can I find out which one is faster?
In both cases x<-[1..10000] ranges over the list [1,2..10000], and only x==2000 passes the test.
In what order will the y value be evaluated?
Things are executed left-to-right. Think of it as nested loops. So in the first one the test of x is executed 10000 times, and in the second it's executed 1000000 times.
Moving the condition outwards to speed up the execution is called "filter promotion"; a term coined by David Turner (ca 1980).
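To make the nested-loop picture concrete, here is a minimal sketch. The names early, late, earlyTests and lateTests are made up for illustration; the two counts simply measure how many times control reaches the test on x in each version, assuming no optimization.

```haskell
-- Filter placed before the inner generator: the test on x runs once per x.
early :: [(Int, Int)]
early = [ (x, y) | x <- [1..10000], x == 2000, y <- [1..100], odd y ]

-- Filter placed after the inner generator: the test on x runs once per (x, y) pair.
late :: [(Int, Int)]
late = [ (x, y) | x <- [1..10000], y <- [1..100], x == 2000, odd y ]

-- How many times the test `x == 2000` is reached in each version.
earlyTests, lateTests :: Int
earlyTests = length [ () | _ <- [1..10000 :: Int] ]
lateTests  = length [ () | _ <- [1..10000 :: Int], _ <- [1..100 :: Int] ]

main :: IO ()
main = do
  print (early == late)          -- True: both produce the same list
  print (length early)           -- 50: the odd y values for x == 2000
  print (earlyTests, lateTests)  -- (10000,1000000)
```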
I have been working on a multi-GPU project where I have had problems with obtaining non-deterministic results. I was surprised when it turned out that I obtained non-deterministic results due to a reduction clause executed on the CPU.
In the book Using OpenMP - The Next Step it is written that
"[...] the order in which threads combine their value to construct the
value for the shared result is non-deterministic."
Maybe I just don't understand how the reduction clauses are implemented. Does it mean that if I use schedule(monotonic:static) in combination with a reduction clause each thread will execute its chunk of the iterations in a deterministic order, but that the order in which the partial results are combined at the end of the parallel region is non-deterministic?
It is known that the end result is non-deterministic; detailed information can be found in:
What Every Computer Scientist Should Know about Floating Point Arithmetic. For instance:
Another grey area concerns the interpretation of parentheses. Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers. For example, the expression (x+y)+z has a totally different answer than x+(y+z) when x = 1e30, y = -1e30 and z = 1 (it is 1 in the former case, 0 in the latter).
Now, regarding the order in which the threads perform the reduction: as far as I know, the OpenMP standard does not enforce any order, nor does it require the order to be deterministic. Hence, this is an implementation detail left to the compiler implementing the OpenMP standard, and consequently it is something your code should not rely upon.
Programming language semantics usually declares that a+b+c+d is evaluated as ((a+b)+c)+d. This is not parallel, so an OpenMP reduction is probably evaluated as (a+b)+(c+d). And so on for larger numbers of summands.
So you immediately have that, because of the non-associativity of floating point arithmetic, the result may be subtly different from the sequential value.
But more importantly, the exact value will depend on precisely how the combination is done. Is it a+(b+c) (on 2 threads) or (a+b)+c? So the result is at least "indeterministic" in the sense that you can not reconstruct how it was formed. It could probably even be done in two different ways, if you run the same reduction twice. That's what I would call "non-deterministic", but look in the standard for the exact definition of the term.
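The effect of the grouping can be reproduced without OpenMP at all. The sketch below is plain Haskell, not OpenMP: it only simulates "each thread sums its chunk, then the partial sums are combined" with a hypothetical helper chunkedSum, and compares that with a strict left-to-right sum. The sample values are chosen to exaggerate the rounding.

```haskell
import Data.List (foldl')

-- Split a list into consecutive chunks of n elements (assumes n >= 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Strict left-to-right sum: ((a+b)+c)+d
sequentialSum :: [Double] -> Double
sequentialSum = foldl' (+) 0

-- Simulated reduction: sum each chunk, then combine the partial sums.
chunkedSum :: Int -> [Double] -> Double
chunkedSum n = sequentialSum . map sequentialSum . chunksOf n

samples :: [Double]
samples = [1e30, 1, -1e30, 1]

main :: IO ()
main = do
  print (sequentialSum samples)  -- 1.0
  print (chunkedSum 2 samples)   -- 0.0
```

Same inputs, same operator, different association, different answer. A real OpenMP runtime may additionally combine the partial results in a different order from run to run.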
By the way, if you want to get some idea of how OpenMP actually does it, write your own reduction operator, and let each invocation print out what it computes. Here is a decent illustration: https://victoreijkhout.github.io/pcse/omp-reduction.html#Initialvalueforreductions
By the way, the standard actually doesn't use the word "non-deterministic" for this case. The following passage explains the issue:
Furthermore, using different numbers of threads may result in
different numeric results because of changes in the association of
numeric operations. For example, a serial addition reduction may have
a different pattern of addition associations than a parallel
reduction.
I was recently working on an implementation of calculating a moving average from a stream of input, using Data.Sequence. I figured I could get the whole operation to be O(n) by using a deque.
My first attempt was (in my opinion) a bit more straightforward to read, but not a true deque. It looked like:
let newsequence = (|>) sequence n
...
let dropFrontTotal = fromIntegral (newtotal - index newsequence 0)
let newsequence' = drop 1 newsequence
...
According to the Hackage docs for Data.Sequence, index should take O(log(min(i,n-i))), and drop should also take O(log(min(i,n-i))).
Here's my question:
If I do drop 1 someSequence, doesn't this mean a time complexity of O(log(min(1, (length someSequence)))), which in this case means: O(log(1))?
If so, isn't O(log(1)) effectively constant?
I had the same question for index someSequence 0: shouldn't that operation end up being O(log(0))?
Ultimately, I had enough doubts about my understanding that I resorted to using Criterion to benchmark the two implementations to prove that the index/drop version is slower (and the amount it's slower by grows with the input). The informal results on my machine can be seen at the linked gist.
I still don't really understand how to calculate time complexity for these operations, though, and I would appreciate any clarification anyone can provide.
What you suggest looks correct to me.
As a minor caveat, remember that these are amortized complexity bounds, so a single operation could require more than constant time, but a long chain of operations will only require a constant times the number of operations in the chain.
If you use criterion to benchmark and "reset" the state at every computation, you might see non-constant time costs, because the "reset" prevents the amortization. It really depends on how you perform the test. If you start from a sequence and perform a long chain of operations on it, it should be OK. If you repeat a single operation many times using the same operands, then it might not be OK.
Further, I guess bounds such as O(log(...)) should actually be read as O(log(1 + ...)): you can't realistically have O(log(1)) = O(0) or, worse, O(log(0)) = O(-inf) as a complexity bound.
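For reference, here is a minimal sketch of a windowed moving average that only touches the two ends of the sequence. It is not the asker's code (which is elided above): the names Window, step and movingAverage are made up, and the window size w is assumed to be at least 1. It uses viewl and (|>) from Data.Sequence, both documented as O(1).

```haskell
import Data.List (mapAccumL)
import Data.Sequence (Seq, ViewL (..), (|>))
import qualified Data.Sequence as Seq

-- State carried along the stream: the current window and its running total.
type Window = (Seq Double, Double)

-- Push one sample at the back; if the window grew past w, pop the oldest
-- sample from the front. Returns the new state and the current average.
step :: Int -> Window -> Double -> (Window, Double)
step w (win, total) n
  | Seq.length win' > w =
      case Seq.viewl win' of
        oldest :< rest ->
          let t = total' - oldest
          in ((rest, t), t / fromIntegral w)
        EmptyL -> ((win', total'), 0) -- unreachable: win' is never empty here
  | otherwise = ((win', total'), total' / fromIntegral (Seq.length win'))
  where
    win'   = win |> n
    total' = total + n

-- Moving average over a window of (at most) w samples; assumes w >= 1.
movingAverage :: Int -> [Double] -> [Double]
movingAverage w = snd . mapAccumL (step w) (Seq.empty, 0)

main :: IO ()
main = print (movingAverage 3 [1, 2, 3, 4, 5])  -- [1.0,1.5,2.0,3.0,4.0]
```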
I would like to know how I can make a function work with infinite lists.
For example, I have a function that reverses each list in a list of lists.
innerReverse [[1,2,3]] will return [[3,2,1]]. However, when I tried take 10 $ innerReverse [[1..]], it ran into an infinite loop.
When I do innerReverse [(take 10 [1..])] It gives the result: [[10,9,8,7,6,5,4,3,2,1]]
Haskell is a lazy language, which means that evaluations are only performed right before the result is actually used. That's what makes it possible for Haskell to have infinite lists; only the portions of the list that you've accessed so far are actually stored in memory.
The concept of an infinite list makes what you're trying to do impossible. In the list [1..] the first element is 1. What's the last element? The answer is that that's a trick question; there is no concept of the "end" of an infinite list. Similarly, what is the first element of the reverse of [1..]? Again, it's a trick question. The last element is 1, but the list would have no beginning.
The reverse of [1..] is not [10,9,8,7,6,5,4,3,2,1]. The reverse of the latter is [1,2,3,4,5,6,7,8,9,10], not [1..].
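A small sketch of what does and does not work. The definition of innerReverse is an assumption (the question does not show it); map reverse matches the behaviour described.

```haskell
-- Assumed definition; the original function is not shown in the question.
innerReverse :: [[a]] -> [[a]]
innerReverse = map reverse

-- Works: the inner list is cut to a finite prefix before it is reversed.
finiteInner :: [[Int]]
finiteInner = innerReverse [take 10 [1..]]

-- Also works: the outer list is infinite, but every inner list is finite,
-- and laziness means only the first three inner lists are ever reversed.
infiniteOuter :: [[Int]]
infiniteOuter = take 3 (innerReverse [ [1..n] | n <- [1..] ])

main :: IO ()
main = do
  print finiteInner    -- [[10,9,8,7,6,5,4,3,2,1]]
  print infiniteOuter  -- [[1],[2,1],[3,2,1]]
```

By contrast, take 10 $ innerReverse [[1..]] takes at most 10 elements of the outer list (which has only one), and printing that single element forces reverse [1..], which never produces a first element.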
I am reading page 69 of Haskell School of Expression and I am not sure that I got the evaluation of rev [1,2,3,4] right.
Hudak does not explain the evaluation (rewriting) order in detail in his book for reverse.
Could someone please either confirm that my guess (shown in the attached picture) is correct or if not correct then point out what I got wrong. I believe that it is correct but I am not 100% sure, this is the reason for asking.
So the question is:
When I evaluate one step of reverse, then after the evaluation (i.e. rewriting) the result should be surrounded by parentheses, right?
If I understand correctly, this unlucky placement of parentheses is the reason for the poor (read: quadratic) time complexity of reverse. In this example, 6 steps in total are spent on list appending in order to reverse a 4-element list.
Yes, nested, left-associative calls to append (which in Haskell goes by the names (++) and (<>)) perform poorly on singly-linked lists.
There are several solutions to this problem, since it's been known about for 30 or 40 years, at least. I believe the library version of reverse uses an accumulator to achieve linear complexity rather than quadratic, but it's still not something you want to call frequently on lists.
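A sketch of the two variants side by side. revNaive is assumed to be the textbook append-based definition being discussed (the book's exact code is not quoted here), and revAcc is one common way to write the accumulator version; neither is claimed to be the library source.

```haskell
-- Append-based reverse: builds ((([] ++ [4]) ++ [3]) ++ [2]) ++ [1],
-- so each (++) re-walks everything accumulated so far. Quadratic overall.
revNaive :: [a] -> [a]
revNaive []       = []
revNaive (x : xs) = revNaive xs ++ [x]

-- Accumulator-based reverse: one cons per element. Linear overall.
revAcc :: [a] -> [a]
revAcc = go []
  where
    go acc []       = acc
    go acc (x : xs) = go (x : acc) xs

main :: IO ()
main = do
  print (revNaive [1, 2, 3, 4 :: Int])  -- [4,3,2,1]
  print (revAcc   [1, 2, 3, 4 :: Int])  -- [4,3,2,1]
```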
I am trying to write a simple iterating algorithm in Haskell, but I'm struggling to find the optimal solution in terms of elegance and speed.
I have an algorithm that needs to apply an operation to a state over a number of iterations until some stopping condition is reached, recording the state using some arbitrary function. I already know how to implement a scheme like this by defining a function like iterateM.
But in this case the operation to perform for each step depends on the state, and boils down to checking a 'step type' condition to decide on the next iteration types, and then performing operation A for the next 10 iterations, or performing operation B for the next iteration before checking the condition again.
I could write it in an imperative style as:
c=0
while True:
    if c>0:
        x=iterateByA(x)
        c=c-1
    else:
        if stepCondition(x)==0:
            x=iterateByA(x)
            c=9
        else:
            x=iterateByB(x)
    observeState(x)
    if stopCondition(x):
        break
and of course this could just be copied in Haskell, but I would rather do something more elegant.
My idea is to have the iteration use a list of functions to pop and apply to the state, and update that list with a new one (based on the 'step type' condition) once it is empty. I'm slightly concerned that this will be inefficient though. Would doing this and using something like
take 10 (repeat iterateByA)
compile away all of the list allocation etc to a tight loop that only uses a counter, like the imperative one above?
Is there another neat and efficient way of doing this?
If it helps, this is for an adaptive stochastic simulation algorithm: the iteration steps update the state, and the step condition (which decides the best simulation scheme) is a function of the current state. There are in fact 3 different iteration schemes, but I figured that an example with 2 is easier to explain.
(I'm not sure if it matters, but I should probably also point out that in Haskell the iterateByX functions are monadic since they use random numbers.)
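For concreteness, the "list of functions" idea could look something like the sketch below. applyAll is a made-up helper, and iterateByA here is only a stand-in in IO (the real one would use random numbers). Whether GHC compiles this down to a counter-only loop is not something the sketch demonstrates.

```haskell
import Control.Monad (foldM)

-- Run a list of monadic steps in order, threading the state through.
applyAll :: Monad m => [a -> m a] -> a -> m a
applyAll steps x0 = foldM (\x step -> step x) x0 steps

-- Stand-in for the real (random) step; purely illustrative.
iterateByA :: Int -> IO Int
iterateByA x = pure (x + 1)

main :: IO ()
main = do
  x <- applyAll (replicate 10 iterateByA) 0
  print x  -- 10
```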
A direct translation doesn't look too bad.
loop c x
  | stopCondition x = observe x
  | c > 0           = observe x >> iterateByA x >>= loop (c-1)
  | stepCondition x = observe x >> iterateByA x >>= loop 9
  | otherwise       = observe x >> iterateByB x >>= loop c
The repetition of observe can be removed via various tricks if you don't like it.
You should probably rethink things, though. This is a very imperative approach; probably something much better can be done (but it's hard to say how from the few details you've given here).
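As one possible way to factor out the repeated observe, here is a self-contained sketch. Everything concrete in it is a stand-in: State is just an Int, the step and condition functions are toy definitions, and runLoop collects the observed states in an IORef so the run can be inspected. Only the control flow mirrors the loop above.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical stand-ins for the real simulation.
type State = Int

iterateByA, iterateByB :: State -> IO State
iterateByA x = pure (x + 1)
iterateByB x = pure (x + 10)

stepCondition, stopCondition :: State -> Bool
stepCondition x = x `mod` 25 == 0
stopCondition x = x >= 100

-- Observe every state exactly once, then either stop or pick the next step.
loop :: (State -> IO ()) -> Int -> State -> IO ()
loop observe = go
  where
    go c x = do
      observe x
      if stopCondition x
        then pure ()
        else do
          (x', c') <- next c x
          go c' x'
    -- Choose the scheme: finish a run of A steps, start a new run, or do one B step.
    next c x
      | c > 0           = (\x' -> (x', c - 1)) <$> iterateByA x
      | stepCondition x = (\x' -> (x', 9))     <$> iterateByA x
      | otherwise       = (\x' -> (x', c))     <$> iterateByB x

-- Run the loop from x0 and return the observed states in order.
runLoop :: State -> IO [State]
runLoop x0 = do
  ref <- newIORef []
  loop (\x -> modifyIORef' ref (x :)) 0 x0
  reverse <$> readIORef ref

main :: IO ()
main = runLoop 0 >>= print
```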