Understanding a space leak with Arrow - Haskell

I am having difficulty understanding the space leak in Liu and Hudak's paper "Plugging a Space Leak with an Arrow":
https://www.sciencedirect.com/science/article/pii/S1571066107005919.
1) What exactly does O(n) space complexity mean? The total memory allocated with respect to input size? What about garbage collection along the way?
2) If the definition in 1) holds, how is it that on page 34 they say that if dt is constant, the signal type is akin to a list type and runs in constant space? Doesn't integralC still create one unit of space at each step, n units in total, i.e., still O(n)?
3) I totally do not understand why the time complexity is O(n^2). I do have an inkling of what needs to be evaluated (i', i'', i''' in the picture below), but how is that O(n^2)?
The image represents the evaluation steps I have drawn in lambda-graph notation. Each step sees its structure ADDED to the overall scope rather than REPLACING whatever is in it. A square denotes a pointer, so square(i') in step 2 denotes the i' block in step 1, for example.

I have only glanced at the paper briefly, but will do my best.
As usual, space complexity means that at some point in time we need to be storing that much "stuff" simultaneously in memory. GC means we can recover memory from variables we no longer need, but here we need to be remembering O(n) stuff: the memory can't be recovered yet because we (may) still need access to any part of it. You can think of it as: reusing memory (via e.g. GC) adds to time but not space complexity. Here, n means computing the nth value by providing n time steps (dts).
If dt is constant, then instead of the type C a = (a, dt -> C a) we have C' a = (a, C' a), which is just a (nonempty) list. The point of the paper is that either type can be made to run in constant space, but if it were isomorphic to lists then that's a solved problem. To see why creating a new value at each step can be constant memory, consider a possible evaluation of iterate f x !! n, where we store just x, then overwrite it with f (x), then overwrite it with f (f (x)), and so on until we have f^n(x), only ever using this one cell of memory for our values (and technically a second cell to iterate up to n).
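As a minimal sketch of that overwrite-in-place evaluation (iterN is my own name; it assumes f is strict enough that forcing the accumulator is all that is needed):

{-# LANGUAGE BangPatterns #-}

-- Compute f^n(x) with a single accumulator cell plus a counter. The
-- bang forces each intermediate value, so earlier results can be
-- garbage collected and the loop runs in constant space.
iterN :: Int -> (a -> a) -> a -> a
iterN 0 _ x  = x
iterN n f !x = iterN (n - 1) f (f x)

-- e.g. iterN 1000000 (+ 1) (0 :: Int) evaluates to 1000000 in constant space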
Let's consider a really simple example of evaluation giving these different complexities. Say we're generating a list from some seed where each item is the sum of all the previous items. To calculate each next item, we could hold the entire initial part p of the list in memory (O(len(p))) and sum it (O(len(p))), resulting in O(n) total memory and O(n^2) run time to retrieve the nth element - or we could observe that this is in fact the same as doubling the previous item, allowing us to use constant memory and linear time. I think the analogy the paper gives in that section is quite helpful - can you mechanically write out the successors of the first few values and see how the two evaluation strategies rapidly diverge in the number of steps needed?
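To make this concrete, here is a rough sketch of the two strategies for that running example (nthSlow and nthFast are my own names; s is the seed):

-- Quadratic time, linear space: keep the whole prefix and re-sum it.
nthSlow :: Int -> Integer -> Integer
nthSlow n s = go n [s]
  where
    go 0 (x : _) = x
    go k prefix  = go (k - 1) (sum prefix : prefix)

-- Linear time, constant space: after the first step, each new item
-- is simply double the previous one, so one accumulator suffices.
nthFast :: Int -> Integer -> Integer
nthFast 0 s = s
nthFast n s = go (n - 1) s
  where
    go 0 x = x
    go k x = x `seq` go (k - 1) (2 * x)

-- Both agree, e.g. nthSlow 5 1 == nthFast 5 1 == 16.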


Do I need a HashSet or a Set?

I want to initialise a set of all Ints from 1 to n (n<20000). Then I want to remove them one by one and meanwhile check if certain elements are still in it until the set is empty.
Which data structure is suited best for this task?
If you want to stick to immutable data structures, I would recommend IntSet. It's carefully optimized for precisely this kind of thing. A Set Int is a balanced binary search tree of Ints, which takes a lot of space and a good bit of time. A HashSet Int is an array-mapped trie of Ints, which is likely faster and more compact, but still pretty mediocre. An IntSet is a PATRICIA tree whose leaves are bitsets. So it's pretty compact (a little over twice the size of an unboxed immutable array when full), but much more efficient to modify.
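For illustration, a tiny sketch of using Data.IntSet this way (the function name demo is just an example):

import qualified Data.IntSet as IntSet

-- Build {1..n}, delete an element, and query membership/emptiness.
demo :: Int -> Bool
demo n =
  let s0 = IntSet.fromDistinctAscList [1 .. n]   -- O(n) construction
      s1 = IntSet.delete 3 s0                    -- remove one element
  in  IntSet.member 3 s1 || IntSet.null s1       -- cheap queries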
Initializing an IntSet with all Ints from 1 to n takes O(n) time. If you're only initializing once, or once in a while, and n < 20000, then that shouldn't cause any performance trouble. If, however, you need to initialize often (especially if you sometimes only remove a few elements before discarding the set), or n turns out to be much larger (e.g., hundreds of millions) and you want to cut down on initialization time, you can use IntSet to represent the complement of the set you want to store.
data CompSet = CompSet
  { initialMax      :: !Int
  , size            :: !Int
  , missingElements :: !IntSet
  }
A CompSet stores the initial maximum (n), the current size, and an IntSet indicating which elements of [1..initialMax] are no longer in the set. The size is initialized to initialMax and lets you know in O(1) time whether the set is empty (i.e., when it reaches 0, or equivalently when IntSet.size missingElements = initialMax).
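A hedged sketch of how the basic operations on this CompSet might look (these helper names are mine, not from any library; they assume the CompSet type defined above):

import qualified Data.IntSet as IntSet

-- O(1): the "full" set {1..n}; nothing is missing yet.
fullCompSet :: Int -> CompSet
fullCompSet n = CompSet { initialMax = n, size = n, missingElements = IntSet.empty }

-- Membership: in range and not recorded as missing.
memberC :: Int -> CompSet -> Bool
memberC x cs =
  x >= 1 && x <= initialMax cs && not (IntSet.member x (missingElements cs))

-- Deletion: record the element as missing and decrement the cached size.
deleteC :: Int -> CompSet -> CompSet
deleteC x cs
  | memberC x cs = cs { size = size cs - 1
                      , missingElements = IntSet.insert x (missingElements cs) }
  | otherwise    = cs

-- O(1) emptiness test via the cached size.
nullC :: CompSet -> Bool
nullC cs = size cs == 0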
Use a bitset (a.k.a. Integer). A 1 bit represents a value still in the set; a 0 bit represents one that just ain't there. For example, the Integer that represents having all the numbers from 1 to n would be bit (n+1) - 2 (assuming you plan to use 0-indexing, as seems sensible to me); to check whether a number is in the set, use testBit; to remove a number, use clearBit.
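A small sketch of that Integer-as-bitset idea using Data.Bits (helper names are mine):

import Data.Bits (bit, clearBit, testBit)

-- Bit i set means i is in the set; bit 0 stays clear.
fullSet :: Int -> Integer
fullSet n = bit (n + 1) - 2          -- sets bits 1..n

memberB :: Int -> Integer -> Bool
memberB i s = testBit s i

deleteB :: Int -> Integer -> Integer
deleteB i s = clearBit s i

emptyB :: Integer -> Bool
emptyB s = s == 0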
An alternate implementation strategy for the same underlying idea would be to use an unboxed array of Bool, either mutable or immutable as needed. The unboxed versions do the appropriate bit-packing. The only downside would be possibly having to resize the array if you need to add numbers to the set later that are larger than you originally allocated space for.
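And a sketch of the immutable unboxed-array variant (helper names are mine; note that each immutable update copies the array, so a mutable array is preferable for heavy use):

import Data.Array.Unboxed (UArray, elems, listArray, (!), (//))

-- True means "still in the set"; UArray Int Bool is bit-packed.
fullArr :: Int -> UArray Int Bool
fullArr n = listArray (1, n) (replicate n True)

memberA :: Int -> UArray Int Bool -> Bool
memberA i a = a ! i

deleteA :: Int -> UArray Int Bool -> UArray Int Bool
deleteA i a = a // [(i, False)]

emptyA :: UArray Int Bool -> Bool
emptyA = not . or . elems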

Python: Time and space complexity of gcd and recursive iterations

I'm studying for mid-terms, and this is one of the questions from a past year's paper at university. (Questions stated below.)
Given Euclid’s algorithm, we can write the function gcd.
def gcd(a, b):
    if b == 0:
        return a
    else:
        return gcd(b, a % b)
[Reduced Proper Fraction]
Consider the fraction, n/d , where n and d are positive integers.
If n < d and GCD(n,d) = 1, it is called a reduced proper fraction.
If we list the set of reduced proper fractions for n <=8 in ascending order of size, we get:
1/8,1/7,1/6,1/5,1/4,2/7,1/3,3/8,2/5,3/7,1/2,4/7,3/5,5/8,2/3,5/7,3/4,4/5,5/6,6/7,7/8
It can be seen that there are 21 elements in this set.
Implement the function count_fraction that takes an integer n and returns the number of reduced proper fractions for n. Assuming that the order of growth (in time) for gcd is O(logn), what is the order of growth in terms of time and space for the function you wrote in Part (B) in terms of n. Explain your answer.
Suggested answer.
def count_fraction(n):
    if n == 1:
        return 0
    else:
        new = 0
        for i in range(1, n):
            if gcd(i, n) == 1:
                new += 1
        return new + count_fraction(n - 1)
The suggested answer is pretty strange, as in previous years this kind of question was designed to test purely recursive/purely iterative solutions, but here it gives a mix. Nevertheless, I don't understand why the suggested order of growth is given as such. (I will write it in the format: suggested answer, my answer, and questions on my fundamentals.)
Time: O(n log n), since it's roughly log 1 + log 2 + ... + log(n-1) + log n.
My time: O(n^2 log n), since there are n recursive calls, each of which performs up to n-1 iterations, and each iteration takes O(log n) time due to gcd.
Question 1: Time, in my opinion, is the number of iterations/recursions times the time taken for one iteration/recursion. It's actually my first time dealing with a mixed iterative/recursive solution, so I don't really know how they interact. Can someone tell me whether I'm right or wrong?
Space: O(n), since gcd is O(1) and this code is obviously linear recursion.
My space: O(n log n), since gcd is O(log n) and this code takes up O(n) space.
Question 2: Space, in my opinion, is the number of recursions times the space taken for one recursive call, OR the largest amount of space required among all iterations. In the first place, I would think gcd is O(log n), as I assume the recursion will happen log n times. I want to ask whether the discrepancy is due to what my lecturer said.
(I don't really understand what my lecturer says about delayed operations for recursion on factorial, or about no new objects being formed in iterative code. How do you then accept the fact that NEW objects are formed in recursion, and also that there are no delayed operations in iteration?)
If you can clarify my doubt about why gcd is O(1) instead of O(log n), then I think, taking n*1 for the recursion case, I would agree with the answer.
I agree with your analysis of the running time. It should be O(n^2 log(n)), since you make n calls to gcd on each recursive call to count_fraction.
You're also partly right about the second question, but you get the conclusion wrong (and the supplied answer gets the right conclusion for the wrong reasons). The gcd function does indeed use O(log(n)) space, for the stack of the recursive calls. However, that space gets reused for each later call to gcd from count_fraction, so there's only ever one stack of size log(n). So there's no reason to multiply the log(n) by anything, only add it to whatever else might be using memory when the gcd calls are happening. Since there will also be a stack of size O(n) for the recursive calls of count_fraction, the smaller log(n) term can be dropped, so you say it takes O(n) space rather than O(n + log(n)).
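Spelling out the counting (my own arithmetic, not part of the assignment):

total gcd calls  = (n-1) + (n-2) + ... + 1 = n(n-1)/2 = O(n^2)
time             = O(n^2) calls * O(log n) per call = O(n^2 log n)
peak stack space = O(n) frames of count_fraction
                   + O(log n) frames of the single active gcd call
                 = O(n + log n) = O(n)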
All in all, I'd say this is a really bad assignment to be trying to learn from. Almost everything in it has an error somewhere, from the description saying it's limiting n when it's really limiting d, to the answers you describe which are all at least partly wrong.

Time Complexity for index and drop of first item in Data.Sequence

I was recently working on an implementation of calculating moving average from a stream of input, using Data.Sequence. I figured I could get the whole operation to be O(n) by using a deque.
My first attempt was (in my opinion) a bit more straightforward to read, but not a true deque. It looked like:
let newsequence = (|>) sequence n
...
let dropFrontTotal = fromIntegral (newtotal - index newsequence 0)
let newsequence' = drop 1 newsequence
...
According to the hackage docs for Data.Sequence, index should take O(log(min(i,n-i))) while drop should also take O(log(min(i,n-i))).
Here's my question:
If I do drop 1 someSequence, doesn't this mean a time complexity of O(log(min(1, (length someSequence)))), which in this case means: O(log(1))?
If so, isn't O(log(1)) effectively constant?
I had the same question for index someSequence 0: shouldn't that operation end up being O(log(0))?
Ultimately, I had enough doubts about my understanding that I resorted to using Criterion to benchmark the two implementations to prove that the index/drop version is slower (and the amount it's slower by grows with the input). The informal results on my machine can be seen at the linked gist.
I still don't really understand how to calculate time complexity for these operations, though, and I would appreciate any clarification anyone can provide.
What you suggest looks correct to me.
As a minor caveat, remember that these are amortized complexity bounds, so a single operation could require more than constant time, but a long chain of operations will only require a constant times the length of the chain.
If you use criterion to benchmark and "reset" the state at every computation, you might see non-constant time costs, because the "reset" is preventing the amortization. It really depends on how you perform the test. If you start from a sequence and perform a long chain of operations on it, it should be OK. If you repeat a single operation many times using the same operands, then it might not be.
Further, I guess bounds such as O(log(...)) should actually be read as O(log(1 + ...)) -- you can't realistically have O(log(1)) = O(0) or, worse, O(log(0)) = O(-inf) as a complexity bound.
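To make the question's setup concrete, here is a minimal sketch of a fixed-window moving average over Data.Sequence that reads and drops the front with viewl (the code and names are illustrative, not the questioner's actual implementation):

import Data.Sequence (Seq, ViewL (..), viewl, (|>))
import qualified Data.Sequence as Seq

-- Average over a sliding window of size k. The oldest element is read
-- and dropped via viewl, which is cheap at the ends of the sequence,
-- and a running total avoids re-summing the window.
movingAvg :: Int -> [Double] -> [Double]
movingAvg k = go Seq.empty 0
  where
    go _ _ [] = []
    go win total (x : xs)
      | Seq.length win' < k = go win' total' xs            -- window not full yet
      | otherwise           = (total' / fromIntegral k) : rest
      where
        win'   = win |> x
        total' = total + x
        rest   = case viewl win' of
                   EmptyL     -> []                        -- unreachable: win' holds at least x
                   y :< newer -> go newer (total' - y) xs

-- e.g. movingAvg 3 [1..6] == [2.0,3.0,4.0,5.0]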

How can natural numbers be represented to offer constant time addition?

Cirdec's answer to a largely unrelated question made me wonder how best to represent natural numbers with constant-time addition, subtraction by one, and testing for zero.
Why Peano arithmetic isn't good enough:
Suppose we use
data Nat = Z | S Nat
Then we can write
Z + n = n
S m + n = S(m+n)
We can calculate m+n in O(1) time by placing m-r debits (for some constant r), one on each S constructor added onto n. To get O(1) isZero, we need to be sure to have at most p debits per S constructor, for some constant p. This works great if we calculate a + (b + (c+...)), but it falls apart if we calculate ((...+b)+c)+d. The trouble is that the debits stack up on the front end.
One option
The easy way out is to just use catenable lists, such as the ones Okasaki describes, directly. There are two problems:
O(n) space is not really ideal.
It's not entirely clear (at least to me) that the complexity of bootstrapped queues is necessary when we don't care about order the way we would for lists.
As far as I know, Idris (a dependently-typed purely functional language which is very close to Haskell) deals with this in quite a straightforward way. The compiler is aware of Nats and Fins (upper-bounded Nats) and replaces them with machine integer types and operations whenever possible, so the resulting code is pretty efficient. However, that's not true for custom types (even isomorphic ones), nor at the compilation stage (there were some code samples using Nats for type checking which resulted in exponential growth in compile time; I can provide them if needed).
In the case of Haskell, I think a similar compiler extension could be implemented. Another possibility is to write TH macros that would transform the code. Of course, neither option is easy.
My understanding is that, in basic computer programming terminology, the underlying problem is that you want to concatenate lists in constant time. The lists don't have cheats like forward references, so you can't jump to the end in O(1) time, for example.
You can use rings instead, which you can merge in O(1) time, regardless of whether a+(b+(c+...)) or ((...+c)+b)+a logic is used. The nodes in the rings don't need to be doubly linked; a link to the next node is enough.
Subtraction is the removal of any node, O(1), and testing for zero (or one) is trivial. Testing for n > 1 is O(n), however.
If you want to reduce space, then at each operation you can merge the nodes at the insertion or deletion points and weight the remaining ones higher. The more operations you do, the more compact the representation becomes! I think the worst case will still be O(n), however.
We know that there are two "extremal" solutions for efficient addition of natural numbers:
Memory efficient: the standard binary representation of natural numbers, which uses O(log n) memory and requires O(log n) time for addition. (See also the chapter "Binary Representations" in Okasaki's book.)
CPU efficient: uses just O(1) time for addition. (See the chapter "Structural Abstraction" in the book.) However, this solution uses O(n) memory, as we'd represent the natural number n as a list of n copies of ().
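For concreteness, a small sketch of the memory-efficient extreme: little-endian binary naturals with O(log n) ripple-carry addition (constructor and function names are mine):

-- Least significant bit first: End ~ 0, O b ~ 2*b, I b ~ 2*b + 1.
data Bin = End | O Bin | I Bin

add :: Bin -> Bin -> Bin
add End b = b
add a End = a
add (O a) (O b) = O (add a b)
add (O a) (I b) = I (add a b)
add (I a) (O b) = I (add a b)
add (I a) (I b) = O (inc (add a b))   -- 1 + 1 produces a carry

inc :: Bin -> Bin
inc End   = I End
inc (O b) = I b
inc (I b) = O (inc b)

-- For checking the sketch against ordinary Integers.
toInt :: Bin -> Integer
toInt End   = 0
toInt (O b) = 2 * toInt b
toInt (I b) = 2 * toInt b + 1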
I haven't done the actual calculations, but I believe that for O(1) numerical addition we won't need the full power of O(1) FIFO queues; it'd be enough to bootstrap the standard list type [] (LIFO) in the same way. If you're interested, I could try to elaborate on that.
The problem with the CPU-efficient solution is that we need to add some redundancy to the memory representation so that we can spare enough CPU time. In some cases, adding such redundancy can be accomplished without compromising the memory size (as for the O(1) increment/decrement operation). But if we allow arbitrary tree shapes, as in the CPU-efficient solution with bootstrapped lists, there are simply too many tree shapes to distinguish them in O(log n) memory.
So the question is: Can we find just the right amount of redundancy so that sub-linear amount of memory is enough and with which we could achieve O(1) addition? I believe the answer is no:
Let's have a representation+algorithm that has O(1) time addition. Let's then take a number of magnitude m bits, which we compute as a sum of 2^k numbers, each of magnitude (m-k) bits. To represent each of those summands we need (regardless of the representation) a minimum of (m-k) bits of memory, so at the beginning we start with (at least) (m-k)*2^k bits of memory. Now at each of those 2^k additions we are allowed to perform only a constant amount of work, so we are able to process (and ideally remove) a total of C*2^k bits. Therefore at the end, the lower bound for the number of bits we need to represent the outcome is (m-k-C)*2^k bits. Since k can be chosen arbitrarily, our adversary can set k = m-C-1, which means the total sum will be represented with at least 2^(m-C-1) = 2^m/2^(C+1) = Ω(2^m) bits. So a natural number n will always need Ω(n) bits of memory!

Most efficient way to print an AVL tree of strings?

I'm thinking that an in-order traversal will run in O(n) time. The only thing better than that would be something running in log n time, but I don't see how that could be possible, considering we have to visit at least n nodes.
Is O(n) the fastest we could do here?
Converting and expanding @C.B.'s comment into an answer:
If you have an AVL tree with n strings in it and you want to print all of them, then you have to do at least Θ(n) total work simply because you have to print out each of the n strings. You can often lower-bound the amount of work required to produce a list or otherwise output a sequence of values simply by counting up how many items are going to be in the list.
We can be even more precise here. Suppose the combined length of all the strings in the tree is L. The time required to print out all the strings in the tree has to be at least Θ(L), since it costs some computational effort to output each individual character. Therefore, we can say that we have to do at least Θ(n + L) work to print out all the strings in the tree.
The bound given here just says that any correct algorithm has to do at least this much work, not that there actually is an algorithm that does this much work. But if you look closely at any of the major tree traversals - inorder, preorder, postorder, level-order - you'll find that they all match this time bound.
Now, one area where you can look for savings is in space complexity. A level-order traversal of the tree might require Ω(n) total space if the tree is perfectly balanced (since it holds a whole layer of the tree in memory and the bottommost layer can have Θ(n) nodes in it), while an inorder, preorder, or postorder traversal would only require O(log n) memory because you only need to store the current access path, which has logarithmic height in an AVL tree.
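As a point of reference, a sketch of the Θ(n + L)-time, O(log n)-extra-space approach on a bare binary tree of strings (not a full AVL implementation, but the traversal is identical):

-- A bare binary search tree of strings; a real AVL node would also
-- carry balance information, which the traversal can ignore.
data Tree = Leaf | Node Tree String Tree

-- In-order traversal: visits each node once and prints each string
-- once, so the time is Θ(n + L); the only extra space is the
-- recursion stack, which is O(height) = O(log n) in an AVL tree.
printInOrder :: Tree -> IO ()
printInOrder Leaf = return ()
printInOrder (Node l s r) = do
  printInOrder l
  putStrLn s
  printInOrder r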
