I was reading this article: https://wiki.haskell.org/GHC/Memory_Management
and found it a bit confusing. I do not understand the part:
The trick is that immutable data NEVER points to younger values. Indeed, younger values don't yet exist at the time when an old value is created, so it cannot be pointed to from scratch. And since values are never modified, neither can it be pointed to later. This is the key property of immutable data.
I think it would be true if we forget about recursion and mutual recursion, but what about
let x = 1:y
    y = 2:x
which is valid Haskell and makes x == [1,2,1,2,...]. Here x points to y and y points to x, so in this situation we cannot say that no value points to a younger one (one of the two must actually be older).
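For concreteness, here is a complete version you can run (take is only there to print a finite prefix):

main :: IO ()
main = print (take 6 x)   -- prints [1,2,1,2,1,2]
  where
    x = 1 : y
    y = 2 : x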
Who is mistaken - me or the article's author? Why? How does immutability help the GC's analysis in this case?
I am trying to understand the Bellman equation and am running into some confusing points.
1) Different sources give different definitions of the Bellman equation.
Sometimes it is defined as the state-value function
v(s) = R + y*v(s')
Sometimes it is defined as the action-value function
q(s, a) = r + max(q(s', a'))
Are both of these definitions correct? How was the Bellman equation introduced in the original paper?
The Bellman equation gives a definite form to dynamic programming solutions, and with it we can generalise the approach to optimisation problems that are recursive in nature and have the optimal substructure property.
Optimal substructure, in simpler terms, means that the given problem can be broken down into smaller subproblems that require the same kind of solution on smaller data. If an optimal solution to each smaller problem can be computed, then the given (larger) problem can also be solved optimally.
Let's denote the solution of the problem for a given state S by the value V(S), where S is the state or subproblem. Let's denote the cost incurred by choosing action a(i) in state S by R. R will be a function f(S, a(i)), where a is the set of all possible actions that can be performed in state S.
V(S) = max{ f(S, a(i)) + y * V(S') }, where the max is taken over all possible i and S' is the state reached by taking action a(i) in S. y is a fixed constant that discounts the transition from the subproblem to the bigger problem; for most problems y = 1, so you can ignore it for now.
So basically, at any given subproblem S, V(S) gives the most optimal solution by considering every action a(i) that can be performed and the next state it leads to. If you think recursively and are used to this kind of reasoning, it's easy to see why the above equation is correct.
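To make the shape of the recursion concrete, here is a minimal Haskell sketch of that equation on a toy problem (the state, the actions and the reward function below are invented purely for illustration, and y is taken to be 1):

-- V(S) = max over i of [ f(S, a(i)) + y * V(S') ], with y = 1.
-- Toy problem: the state is a remaining budget n, an action spends 1, 3 or 4
-- units, the reward for spending a units is a*a, and S' is the budget left over.
actions :: [Int]
actions = [1, 3, 4]

reward :: Int -> Int
reward a = a * a

-- A naive recursive form of the equation (no memoisation, just the recursion).
value :: Int -> Int
value 0 = 0
value n = maximum [ reward a + value (n - a) | a <- actions, a <= n ]

main :: IO ()
main = print (value 10)

Memoising value over the states turns this into the usual bottom-up dynamic programming table.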
I would suggest solving some dynamic programming problems and looking at a few standard problems and their solutions to get an idea of how those problems are broken down into smaller, similar problems and solved recursively. After that, the above equation will make more sense. You will also realise that the two equations you have written above are almost the same thing, just written in slightly different ways.
Here is a list of more commonly known DP problems and their solutions.
I'm taking a course on Coursera that uses MiniZinc. In one of the assignments, I was spinning my wheels forever because my model was not performing well enough on a hidden test case. I finally solved it by changing the following kind of access in my model
from
constraint sum(neg1,neg2 in party where neg1 < neg2)(joint[neg1,neg2]) >= m;
to
constraint sum(i,j in 1..u where i < j)(joint[party[i],party[j]]) >= m;
I don't know what I'm missing, but why would these two perform any differently from each other? It seems like they should perform similarly, with the former maybe being slightly faster, but the performance difference was dramatic. I'm guessing there is some sort of optimization that the former misses out on? Or am I really missing something, and do those lines actually result in different behavior? My intention is to sum the joint strength of every pair in party.
Misc. Details:
party is an array of enum vars
party's index set is 1..real_u
every element in party should be unique except for a dummy variable.
solver was Gecode
verification of my model was done on a Coursera server, so I don't know what optimization level their compiler used.
edit: Since MiniZinc (mz) is a declarative language, I'm realizing that "array accesses" in mz don't necessarily have a direct analogue in an imperative language. However, to me these two lines mean the same thing semantically. So I guess my question is more "Why are the above lines semantically different in mz?"
edit2: I had to change the example in question; the original was toeing the line of violating Coursera's honor code.
The difference stems from the way in which the where-clause (the "neg1 < neg2" / "i < j" condition) is evaluated. When both sides are parameters, the compiler can already exclude the irrelevant parts of the sum during compilation. If either side is a variable, then this usually cannot be decided at compile time, and the solver receives a more complex constraint.
In this case the solver would have received a sum over "array[int] of var opt int", meaning that some variables in the array might not actually be present. For most solvers this is rewritten to a sum in which every variable is multiplied by a boolean variable that is true iff the variable is present. You can see how this is less efficient than a normal sum without multiplications.
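Purely to illustrate that rewrite (a Haskell sketch of the idea, not the actual FlatZinc the compiler emits), the difference is between summing terms that are all known to be present and summing terms that each carry a presence indicator which must be reasoned about as well:

-- Plain sum: every term is known to belong to the sum when the model is compiled.
plainSum :: [Int] -> Int
plainSum = sum

-- "Optional" sum: each term comes with a presence flag, so it is effectively
-- multiplied by a 0/1 indicator that the solver must also reason about.
optSum :: [(Bool, Int)] -> Int
optSum terms = sum [ if present then t else 0 | (present, t) <- terms ]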
I was recently working on an implementation of calculating moving average from a stream of input, using Data.Sequence. I figured I could get the whole operation to be O(n) by using a deque.
My first attempt was (in my opinion) a bit more straightforward to read, but not a true deque. It looked like:
let newsequence = (|>) sequence n
...
let dropFrontTotal = fromIntegral (newtotal - index newsequence 0)
let newsequence' = drop 1 newsequence
...
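For context, here is a minimal, self-contained sketch of the window-update step I mean (the names here are made up for this post, not taken from my real code):

import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (|>))

-- One step of a fixed-size moving window: push the new sample on the right,
-- read the element about to fall off the front with index, then drop it.
step :: Int -> (Seq Double, Double) -> Double -> (Seq Double, Double)
step size (window, total) sample
  | Seq.length window < size = (window |> sample, total + sample)
  | otherwise =
      let window'  = window |> sample
          oldest   = Seq.index window' 0
          window'' = Seq.drop 1 window'
      in  (window'', total + sample - oldest)

main :: IO ()
main = print (foldl (step 3) (Seq.empty, 0) [1, 2, 3, 4, 5])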
According to the hackage docs for Data.Sequence, index should take O(log(min(i,n-i))) while drop should also take O(log(min(i,n-i))).
Here's my question:
If I do drop 1 someSequence, doesn't this mean a time complexity of O(log(min(1, (length someSequence)))), which in this case means: O(log(1))?
If so, isn't O(log(1)) effectively constant?
I had the same question for index someSequence 0: shouldn't that operation end up being O(log(0))?
Ultimately, I had enough doubts about my understanding that I resorted to using Criterion to benchmark the two implementations to prove that the index/drop version is slower (and the amount it's slower by grows with the input). The informal results on my machine can be seen at the linked gist.
I still don't really understand how to calculate time complexity for these operations, though, and I would appreciate any clarification anyone can provide.
What you suggest looks correct to me.
As a minor caveat, remember that these are amortized complexity bounds, so a single operation could require more than constant time, but a long chain of operations will only require time bounded by a constant times the length of the chain.
If you use criterion to benchmark and "reset" the state at every computation, you might see non-constant time costs, because the "reset" prevents the amortization. It really depends on how you perform the test. If you start from a sequence and perform a long chain of operations on it, it should be OK. If you instead repeat a single operation many times on the same operands, then it might not be.
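To illustrate that distinction (a sketch using Criterion; the names and sizes are arbitrary), compare re-running one isolated drop on the same operand against folding a long chain of push/drop steps over a single sequence:

import Criterion.Main
import Data.List (foldl')
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (|>))

-- One sliding step: push a new element on the right, drop one from the left.
slide :: Seq Int -> Int -> Seq Int
slide s x = Seq.drop 1 (s |> x)

main :: IO ()
main = defaultMain
  [ -- the same single operation, re-run on the same operand each iteration
    bench "single drop" $ whnf (Seq.drop 1) (Seq.fromList [1 .. 100000 :: Int])
    -- one long chain of operations evolving a single sequence
  , bench "chained slides" $ whnf (foldl' slide (Seq.fromList [1 .. 1000])) [1 .. 100000 :: Int]
  ]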
Further, I guess bounds such as O(log(...)) should actually be read as O(log(1 + ...)) -- you can't realistically have O(log(1)) = O(0) or, worse, O(log(0)) = O(-inf) as a complexity bound.
data StableName a
Stable names have the following property:
If sn1 :: StableName a and sn2 :: StableName a and sn1 == sn2 then sn1 and sn2 were created by calls to makeStableName on the same object.
The reverse is not necessarily true: if two stable names are not equal, then the objects they name may still be equal.
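As a small illustration of how they're used (a sketch; the value being named is arbitrary):

import System.Mem.StableName (makeStableName, hashStableName)

main :: IO ()
main = do
  let xs = [1 .. 10 :: Int]
  sn1 <- makeStableName xs
  sn2 <- makeStableName xs
  -- Equal stable names guarantee both calls saw the same object;
  -- unequal stable names guarantee nothing.
  print (sn1 == sn2)
  print (hashStableName sn1)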
reallyUnsafePtrEquality# :: a -> a -> Int#
reallyUnsafePtrEquality# returns whether two objects on the GHC heap are the same object. It's really unsafe because the garbage collector moves things around, closures get updated, and so on. To the best of my knowledge, it can return false negatives (saying two objects aren't the same when they are), but not false positives (saying they're the same when they aren't).
Both of them seem to do the same basic thing: they can tell you whether two objects are definitely the same, but not whether they're definitely not.
The advantages I can see for StableNames are:
They can be hashed.
They are more portable.
Their behaviour is well-defined and supported.
They don't have reallyUnsafe as part of their name.
The advantages I can see for reallyUnsafePtrEquality#:
It can be called directly on the objects in question, instead of having to create separate StableNames.
You don't have to go through an IO function to create the StableNames.
You don't have to keep StableNames around, so memory usage is lower.
The RTS doesn't have to do whatever magic it does to make the StableNames work, so performance is presumably better.
It has reallyUnsafe in the name and a # at the end. Hardcore!
My questions are:
Did I miss anything?
Is there any use case where the fact that StableNames are separate from the objects they name is an advantage?
Is either one more accurate (less likely to return false negatives) than the other?
If you don't need hashing, don't care about portability, and aren't bothered by using something called reallyUnsafe, is there any reason to prefer StableNames over reallyUnsafePtrEquality#?
Holding the StableName of an object doesn't prevent it from being garbage collected, whereas holding the object itself around (to use with reallyUnsafePtrEquality# later) does. Sure, you can use System.Mem.Weak, but at that point, why not just use a StableName? (In fact, weak pointers were added with StableNames.)
Being able to hash them is the main motivator for StableNames, as the documentation says:
We can't build a hash table using the address of the object as the key, because objects get moved around by the garbage collector, meaning a re-hash would be necessary after every garbage collection.
In general, if StableNames will work for your purposes, I'd use them, even if you need to use unsafePerformIO; if you really need reallyUnsafePtrEquality#, you'll know. The only example I can think of where reallyUnsafePtrEquality# would work and StableNames wouldn't is speeding up an expensive Eq instance:
x == y =
  x `seq` y `seq`
    case reallyUnsafePtrEquality# x y of
      1# -> True
      _  -> slowEq x y
There are probably other examples I just haven't thought of, but they're not common.
In reading Haskell-related material I sometimes come across the expression "tying the knot". I think I understand what it does, but not how.
So, are there any good, basic, and simple to understand explanations of this concept?
Tying the knot is a solution to the problem of circular data structures. In imperative languages you construct a circular structure by first creating a non-circular structure, and then going back and fixing up the pointers to add the circularity.
Say you wanted a two-element circular list with the elements "0" and "1". It would seem impossible to construct because if you create the "1" node and then create the "0" node to point at it, you cannot then go back and fix up the "1" node to point back at the "0" node. So you have a chicken-and-egg situation where both nodes need to exist before either can be created.
Here is how you do it in Haskell. Consider the following value:
alternates = x where
  x = 0 : y
  y = 1 : x
In a non-lazy language this will be an infinite loop because of the unterminated recursion. But in Haskell lazy evaluation does the Right Thing: it generates a two-element circular list.
To see how it works in practice, think about what happens at run-time. The usual "thunk" implementation of lazy evaluation represents an unevaluated expression as a data structure containing a function pointer plus the arguments to be passed to the function. When this is evaluated the thunk is replaced by the actual value so that future references don't have to call the function again.
When you take the first element of the list, 'x' is evaluated down to a value (0, &y), where the "&y" bit is a pointer to the value of 'y'. Since 'y' has not been evaluated yet, it is currently a thunk. When you take the second element of the list, the computer follows the link from x to this thunk and evaluates it. It evaluates to (1, &x), or in other words a pointer back to the original 'x' value. So you now have a circular list sitting in memory. The programmer doesn't need to fix up the back-pointers because the lazy evaluation mechanism does it for you.
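You can convince yourself of this by taking a finite prefix (a tiny, self-contained example):

alternates :: [Int]
alternates = x where
  x = 0 : y
  y = 1 : x

main :: IO ()
main = print (take 6 alternates)   -- prints [0,1,0,1,0,1]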
It's not quite what you asked for, and it's not directly related to Haskell, but Bruce McAdam's paper That About Wraps It Up goes into this topic in substantial breadth and depth. Bruce's basic idea is to use an explicit knot-tying operator called WRAP instead of the implicit knot-tying that is done automatically in Haskell, OCaml, and some other languages. The paper has lots of entertaining examples, and if you are interested in knot-tying I think you will come away with a much better feel for the process.