I'm working with a set of n integers from the interval [0, m] and I need the following operations:
find
remove
insert
enumerate (get all entries)
clear (remove all entries)
At the moment I'm using binary trees / heaps for this but wonder if there is a more efficient data structure.
I could use the uninitialized-RAM trick (see e.g. https://research.swtch.com/sparse), which as far as I can see gives O(1) find / remove / insert and O(n) enumerate / clear, at the cost of O(m) space.
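Concretely, the scheme I mean looks roughly like this (sketched in Haskell with mutable arrays; the names are mine, and GHC will actually initialize these arrays, so only the operation pattern carries over, not the uninitialized-memory part):

    import Control.Monad (unless, when)
    import Data.Array.IO (IOUArray, newArray, readArray, writeArray)
    import Data.IORef

    -- Sparse-set sketch: the first 'count' slots of 'dense' list the members,
    -- and 'sparse ! v' claims to hold v's position in 'dense'. Correctness
    -- never depends on the contents of untouched 'sparse' slots, because
    -- membership is confirmed by the round trip through 'dense'.
    data SparseSet = SparseSet
      { sparse :: IOUArray Int Int  -- indexed by value v in [0, m]
      , dense  :: IOUArray Int Int  -- indexed by position
      , count  :: IORef Int
      }

    newSet :: Int -> IO SparseSet   -- O(m) space, as in the question
    newSet m = SparseSet <$> newArray (0, m) 0 <*> newArray (0, m) 0 <*> newIORef 0

    member :: SparseSet -> Int -> IO Bool   -- O(1) find
    member s v = do
      k <- readIORef (count s)
      i <- readArray (sparse s) v
      if i < k then (== v) <$> readArray (dense s) i else return False

    insert :: SparseSet -> Int -> IO ()     -- O(1)
    insert s v = do
      present <- member s v
      unless present $ do
        k <- readIORef (count s)
        writeArray (dense s) k v
        writeArray (sparse s) v k
        writeIORef (count s) (k + 1)

    remove :: SparseSet -> Int -> IO ()     -- O(1): last member fills v's slot
    remove s v = do
      present <- member s v
      when present $ do
        k <- readIORef (count s)
        i <- readArray (sparse s) v
        w <- readArray (dense s) (k - 1)
        writeArray (dense s) i w
        writeArray (sparse s) w i
        writeIORef (count s) (k - 1)

    clear :: SparseSet -> IO ()             -- O(1); enumerate is just reading
    clear s = writeIORef (count s) 0        -- dense[0 .. count - 1], O(n)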
Is there any data structure requiring less than O(m) space while still maintaining (amortized) worst-case-complexities O(1) for find / remove / insert and O(n) for enumerate / clear?
Please note: Amortized as in: "Given n operations, we have constant * n operations in total, so amortized worst case is O(1)". Not as in: "Given this probability distribution and given assumptions X, Y, Z, average worst case will most probably be in O(1)."
So I'm not looking for hash-table based solutions that will work in O(1) "probably most of the time".
No, there is no such known data structure. See "Mihai Pǎtraşcu: obituary and open problems" by Mikkel Thorup, section 1, Problem 1. "If we want a joint bound for both lookups and updates, the best known bound is O(sqrt(log n / log log n))"
Related
I am having difficulty understanding the space leak in Hudak's paper "Plugging a Space Leak with an Arrow":
https://www.sciencedirect.com/science/article/pii/S1571066107005919.
1) What exactly does O(n) space complexity mean? The total memory allocated with respect to input size? What about garbage collection along the way?
2) If the definition in 1) holds, how is it that on page 34 they say that if dt is constant, the signal type is akin to the list type and runs in constant space? Doesn't integralC still create 1 unit of space at each step, n units in total, i.e. still O(n)?
3) I totally do not understand why the time complexity is O(n^2). I do have an inkling of what needs to be evaluated (i', i'', i''' in the picture below), but how is that O(n^2)?
The image represents the evaluation steps I have drawn in lambda-graph notation. Each step sees its structure ADDED to the overall scope rather than REPLACING whatever is in it. A square denotes a pointer, so square(i') in step 2 denotes the i' block in step 1, for example.
I have only glanced at the paper briefly, but will do my best.
As usual, space complexity means that at some point in time we need to be storing that much "stuff" simultaneously in memory. GC means we can recover memory from values we no longer need, but here we have to be remembering O(n) worth of stuff: the memory can't be reclaimed yet because we (may) still need access to any part of it. You can think of it this way: reusing memory (e.g. via GC) adds to time complexity but not to space complexity. Here, n refers to computing the nth value, by feeding in n time steps (dts).
If dt is constant, then instead of the type C a = (a, dt -> C a) we have C' a = (a, C' a), which is just a (nonempty) list. The point of the paper is that either type can be made to run in constant space, but if it were isomorphic to lists then that would already be a solved problem. To see why creating a new value at each step can still be constant memory, consider a possible evaluation of (iterate f) !! n, where we store just x, then overwrite it with f x, then overwrite it with f (f x), and so on until we have f^n(x), only ever using this one cell of memory for our values (and technically a second cell to count up to n).
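A strict loop of that shape runs in constant space (my own sketch of the idea, not code from the paper):

    -- Computes f applied n times to x while keeping only the current value
    -- alive: seq forces each intermediate result, so no thunk chain builds up
    -- and the previous value can be collected immediately.
    iterN :: Int -> (a -> a) -> a -> a
    iterN 0 _ x = x
    iterN n f x = let x' = f x in x' `seq` iterN (n - 1) f x'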
Let's consider a really simple example of evaluation giving these different complexities. Say we're generating a list from some seed where each item is the sum of all the previous items. To calculate each next item, we could hold the entire prefix p of the list in memory (O(length p)) and sum it (O(length p)), resulting in O(n) total memory and O(n^2) run time to reach the n-th element. Or we could observe that this is in fact the same as doubling the previous item, allowing us to use constant memory and linear time. I think the analogy section is quite helpful: can you mechanically write out successors for the first few values and see how the two evaluation strategies rapidly diverge in the number of steps needed?
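Here is a sketch of the two strategies for that running-sum example (my own code, assuming the sequence starts from a single seed):

    -- Strategy 1: keep the whole prefix and re-sum it for every new item:
    -- O(n) space and O(n^2) time to reach the n-th element.
    naive :: Integer -> Int -> Integer
    naive seed = go [seed]
      where
        go xs 0 = head xs
        go xs k = go (sum xs : xs) (k - 1)

    -- Strategy 2: after the first step each item is just double the previous
    -- one, so a strict accumulator gives O(1) space and O(n) time.
    fast :: Integer -> Int -> Integer
    fast seed 0 = seed
    fast seed n = go seed (n - 1)
      where
        go acc 0 = acc
        go acc k = let acc' = 2 * acc in acc' `seq` go acc' (k - 1)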
I have encountered times when it was O(log n), and times when it was O(n log n). This is also a very popular interview question. So when the interviewer asks, with no context at all, what the run time of binary search is, what should you say?
Sounds like a trick question, since there is no context. It looks like the interviewer wants to cover the cases where binary search is a good fit and the cases where it is not.
Binary search is great when you have a sorted list of elements and you search for a single element; in that case it costs O(log n).
If the array isn't sorted, sorting it first costs O(n log n), after which the first case applies. In that situation it may be better to place the values in a hash-based set or map and search there (expected O(n) to insert all values, O(1) per lookup).
Both of these cases assume a single search (or a fixed number of searches). Binary search is not meant for searching for n elements in a single run (or any number of elements that grows with n, like n/2, n/4, or even log n elements; for a fixed number it is fine). For such cases there are better tools (sets and maps).
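For reference, the O(log n) single-search case looks roughly like this (a sketch using Data.Vector from the vector package; the function name is mine):

    import qualified Data.Vector as V

    -- Standard binary search over a sorted vector: halves the range each
    -- step, so O(log n) comparisons per search.
    bsearch :: Ord a => a -> V.Vector a -> Maybe Int
    bsearch x v = go 0 (V.length v - 1)
      where
        go lo hi
          | lo > hi   = Nothing
          | otherwise =
              let mid = (lo + hi) `div` 2
              in case compare x (v V.! mid) of
                   LT -> go lo (mid - 1)
                   GT -> go (mid + 1) hi
                   EQ -> Just mid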
O(log n), for both the average and the worst case. I've never heard anyone claim it is O(n log n).
Cirdec's answer to a largely unrelated question made me wonder how best to represent natural numbers with constant-time addition, subtraction by one, and testing for zero.
Why Peano arithmetic isn't good enough:
Suppose we use
data Nat = Z | S Nat
Then we can write
Z + n = n
S m + n = S(m+n)
We can calculate m+n in O(1) time by placing m-r debits (for some constant r), one on each S constructor added onto n. To get O(1) isZero, we need to be sure to have at most p debits per S constructor, for some constant p. This works great if we calculate a + (b + (c+...)), but it falls apart if we calculate ((...+b)+c)+d. The trouble is that the debits stack up on the front end.
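To see the asymmetry concretely, here is isZero written out (my own snippet; the point is how much of the sum it has to force):

    isZero :: Nat -> Bool
    isZero Z = True
    isZero _ = False

    -- With the lazy (+) above, isZero (a + (b + (c + d))) only needs one
    -- unfolding of (+) applied to a before it sees an S or a Z, but
    -- isZero (((a + b) + c) + d) has to unwind every (+) on the spine first,
    -- so its cost grows with the nesting depth: the debits pile up in front.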
One option
The easy way out is to just use catenable lists, such as the ones Okasaki describes, directly. There are two problems:
O(n) space is not really ideal.
It's not entirely clear (at least to me) that the complexity of bootstrapped queues is necessary when we don't care about order the way we would for lists.
As far as I know, Idris (a dependently typed, purely functional language that is very close to Haskell) deals with this in a quite straightforward way. The compiler is aware of Nats and Fins (upper-bounded Nats) and replaces them with machine integer types and operations whenever possible, so the resulting code is pretty efficient. However, that does not apply to custom types (even isomorphic ones), nor does it apply at compile time (there were some code samples using Nats for type checking that resulted in exponential growth in compile time; I can provide them if needed).
In the case of Haskell, I think a similar compiler optimization could be implemented. Another possibility is to write Template Haskell macros that transform the code. Of course, neither option is easy.
My understanding is that, in basic computer programming terminology, the underlying problem is that you want to concatenate lists in constant time. The lists don't have cheats like forward references, so you can't jump to the end in O(1) time, for example.
You can use rings instead, which you can merge in O(1) time regardless of whether the a+(b+(c+...)) or the ((...+c)+b)+a nesting is used. The nodes in the rings don't need to be doubly linked; a link to the next node is enough.
Subtraction is the removal of any node, O(1), and testing for zero (or one) is trivial. Testing for n > 1 is O(n), however.
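A rough sketch of the ring idea with mutable links (names and representation are mine; zero would need an extra case, e.g. Maybe Node, which the description above glosses over):

    import Data.IORef

    -- One node per unit; 'next' links the nodes into a cycle.
    newtype Node = Node { next :: IORef Node }

    -- The number 1: a single node pointing at itself.
    one :: IO Node
    one = do
      r <- newIORef undefined
      let n = Node r
      writeIORef r n
      return n

    -- Addition: merge two disjoint rings in O(1) by swapping the successor
    -- pointers of one node from each ring.
    add :: Node -> Node -> IO ()
    add a b = do
      an <- readIORef (next a)
      bn <- readIORef (next b)
      writeIORef (next a) bn
      writeIORef (next b) an

    -- Subtraction by one: unlink the successor of a node, O(1) (assuming the
    -- ring has at least two nodes, i.e. the number is at least 2).
    decr :: Node -> IO ()
    decr a = do
      an  <- readIORef (next a)
      ann <- readIORef (next an)
      writeIORef (next a) ann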
If you want to reduce space, then at each operation you can merge the nodes at the insertion or deletion points and weight the remaining ones higher. The more operations you do, the more compact the representation becomes! I think the worst case will still be O(n), however.
We know that there are two "extremal" solutions for efficient addition of natural numbers:
Memory efficient: the standard binary representation of natural numbers, which uses O(log n) memory and requires O(log n) time for addition (a sketch follows below). (See also the chapter "Binary Representations" in Okasaki's book.)
CPU efficient: one that uses just O(1) time for addition. (See the chapter "Structural Abstraction" in the book.) However, this solution uses O(n) memory, since we'd represent the natural number n as a list of n copies of ().
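Here is a sketch of that first, memory-efficient representation (my own code, not taken from the book): little-endian bit lists with O(log n) increment and addition.

    -- Little-endian binary naturals: least significant bit first.
    data Bit = O | I
    type Bin = [Bit]

    -- Increment: proportional to the length of the carry run, O(log n) worst case.
    succB :: Bin -> Bin
    succB []       = [I]
    succB (O : bs) = I : bs
    succB (I : bs) = O : succB bs

    -- Addition with an explicit carry bit: one pass over the bits, O(log n).
    addB :: Bin -> Bin -> Bin
    addB = go O
      where
        go O xs []             = xs
        go I xs []             = succB xs
        go O [] ys             = ys
        go I [] ys             = succB ys
        go c (x : xs) (y : ys) = s : go c' xs ys
          where
            n  = val c + val x + val y
            s  = if odd n then I else O
            c' = if n >= 2 then I else O

        val :: Bit -> Int
        val O = 0
        val I = 1

For example, [I,O,I] encodes 5, [I,I] encodes 3, and addB [I,O,I] [I,I] gives [O,O,O,I], i.e. 8.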
I haven't done the actual calculations, but I believe that for O(1) numerical addition we won't need the full power of O(1) FIFO queues; it would be enough to bootstrap the standard list type [] (LIFO) in the same way. If you're interested, I could try to elaborate on that.
The problem with the CPU-efficient solution is that we need to add some redundancy to the memory representation so that we can save enough CPU time. In some cases, such redundancy can be added without compromising the memory size (as with the O(1) increment/decrement operations). But if we allow arbitrary tree shapes, as in the CPU-efficient solution with bootstrapped lists, there are simply too many tree shapes to distinguish them all in O(log n) memory.
So the question is: can we find just the right amount of redundancy so that a sub-linear amount of memory is enough and O(1) addition is still achievable? I believe the answer is no:
Let's have a representation+algorithm that has O(1)-time addition. Let's then take a number of magnitude m bits, which we compute as a sum of 2^k numbers, each of magnitude (m-k) bits. To represent each of those summands we need (regardless of the representation) at least (m-k) bits of memory, so at the beginning we start with at least (m-k)·2^k bits. Now at each of those 2^k additions we are allowed to perform only a constant amount of work, so we are able to process (and ideally discard) a total of C·2^k bits. Therefore, at the end, the lower bound on the number of bits needed to represent the outcome is (m-k-C)·2^k bits. Since k can be chosen arbitrarily, our adversary can set k = m-C-1, which means the total sum will be represented with at least 2^(m-C-1) = 2^m/2^(C+1) ∈ Ω(2^m) bits. So a natural number n will always need Ω(n) bits of memory!
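Restating that counting argument in symbols (with C the constant number of bits a single O(1) addition can dispose of):

    (m-k)\,2^k \;-\; C\,2^k \;=\; (m-k-C)\,2^k
      \;\xrightarrow{\;k \,=\, m-C-1\;}\;
      2^{\,m-C-1} \;=\; \frac{2^m}{2^{C+1}} \;\in\; \Omega(2^m)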
I'm thinking that an in-order traversal will run in O(n) time. The only thing better than that would be something running in O(log n) time. But I don't see how that could be possible, considering we have to visit all n nodes.
Is O(n) the best we can do here?
Converting and expanding #C.B.'s comment to an answer:
If you have an AVL tree with n strings in it and you want to print all of them, then you have to do at least Θ(n) total work simply because you have to print out each of the n strings. You can often lower-bound the amount of work required to produce a list or otherwise output a sequence of values simply by counting up how many items are going to be in the list.
We can be even more precise here. Suppose the combined length of all the strings in the tree is L. The time required to print out all the strings in the tree has to be at least Θ(L), since it costs some computational effort to output each individual character. Therefore, we can say that we have to do at least Θ(n + L) work to print out all the strings in the tree.
The bound given here just says that any correct algorithm has to do at least this much work, not that there actually is an algorithm that does this much work. But if you look closely at any of the major tree traversals - inorder, preorder, postorder, level-order - you'll find that they all match this time bound.
Now, one area where you can look for savings is in space complexity. A level-order traversal of the tree might require Ω(n) total space if the tree is perfectly balanced (since it holds a whole layer of the tree in memory and the bottommost layer can have Θ(n) nodes in it), while an inorder, preorder, or postorder traversal would only require O(log n) memory because you only need to store the current access path, which has logarithmic height in an AVL tree.
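As a concrete illustration (a sketch with a plain binary tree type rather than a full AVL implementation):

    data Tree a = Leaf | Node (Tree a) a (Tree a)

    -- In-order traversal: visits each of the n nodes once and prints every
    -- character of every string, i.e. Theta(n + L) work, while the recursion
    -- stack only holds the current root-to-node path, which is O(log n) deep
    -- in a balanced (e.g. AVL) tree.
    printInorder :: Tree String -> IO ()
    printInorder Leaf         = return ()
    printInorder (Node l s r) = do
      printInorder l
      putStrLn s
      printInorder r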
If the list has 1024 items (lg 1024 = 10), at what point (i.e. after how many searches) does sorting the list first and then using binary search pay off, compared to using sequential search? How does your answer change if the list has 2048 items?
Where the "linear access" curve crosses the "binary search" curve depends on how long it takes to access/insert a single item versus how many items there are. This will be different for every combination of compiler, memory and cpu architecture, type of data/node in the list, the distribution of data values, what sort and insertion algorithms you use, etc... But with a "large enough" set of items, the running time can be described by mentioning how its upper bound grows with increasing number of items, even though that "Big-O" bound may not precisely describe any particular run.
You can figure out precisely if you can know the specific algorithm you will insert or search with, and determine the actual instructions that make up your list accesses, and find out how many clock cycles they take to execute, etc etc...
Then you can say for sure which one is faster, and at which point. And if you know you data values, you can model it. But if you don't know, you have to assume (for example, what if your inserted data values are already ordered? how does that affect your sort or insertion function?)
For example, a single item retrieval may take 1us. Comparing two items may take 0.5us. Doing a sorted list insertion with 100 items in the list might require X number of retrievals, Y number of compares, and Z number of updates/writes.... Whereas an unordered list might require more or less depending on what's already there and what you're inserting.
If your list is unsorted, each lookup takes O(n). Sorting with quicksort costs O(n log n), after which each binary search is O(log n). Let x be the number of searches; the break-even point is where x * n = n * log n + x * log n. By plugging in different values you can estimate the dynamics. My rough estimate is that for n = 1024, sorting first becomes more efficient once the number of searches exceeds about 10. Substitute 1024 for n and try it.
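A quick way to check that estimate (my own helper; it just solves x * n = n * lg n + x * lg n for x):

    -- Number of searches at which "sort once, then binary search" breaks even
    -- with repeated linear search.
    breakEven :: Double -> Double
    breakEven n = n * lg n / (n - lg n)
      where lg = logBase 2

    -- breakEven 1024 ~ 10.1 and breakEven 2048 ~ 11.1, matching the rough
    -- estimate above.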