Related
I want to map a timestamp t and an identifier id to a certain state of an object. I can do so by mapping a tuple (t,id) -> state_of_id_in_t. I can use this mapping to access one specific (t,id) combination.
However, sometimes I want to know all states (with matching timestamps t) of a specific id (i.e. id -> a set of (t, state_of_id_in_t)) and sometimes all states (with matching identifiers id) of a specific timestamp t (i.e. t -> a set of (id, state_of_id_in_t)). The problem is that I can't just put all of these in a single large matrix and do linear search based on what I want. The amount of (t,id) tuples for which I have states is very large (1m +) and very sparse (some timestamps have many states, others none etc.). How can I make such a dict, which can deal with accessing its contents by partial keys?
I created two distinct dicts dict_by_time an dict_by_id, which are dicts of dicts. dict_by_time maps a timestamp t to a dict of ids, which each point to a state. Similiarly, dict_by_id maps an id to a dict of timestamps, which each point to a state. This way I can access a state or a set of states however I like. Notice that the 'leafs' of both dicts (dict_by_time an dict_by_id) point to the same objects, so its just the way I access the states that's different, the states themselves however are the same python objects.
dict_by_time = {'t_1': {'id_1': 'some_state_object_1',
'id_2': 'some_state_object_2'},
't_2': {'id_1': 'some_state_object_3',
'id_2': 'some_state_object_4'}
dict_by_id = {'id_1': {'t_1': 'some_state_object_1',
't_2': 'some_state_object_3'},
'id_2': {'t_1': 'some_state_object_2',
't_2': 'some_state_object_4'}
Again, notice the leafs are shared across both dicts.
I don't think it is good to do it using two dicts, simply because maintaining both of them when adding new timestamps or identifiers result in double work and could easily lead to inconsistencies when I do something wrong. Is there a better way to solve this? Complexity is very important, which is why I can't just do manual searching and need to use some sort of HashMap magic.
You can always trade add complexity with lookup complexity. Instead of using a single dict, you can create a Class with an add method and a lookup method. Internally, you can keep track of the data using 3 different dictionaries. One uses the (t,id) tuple as key, one uses t as the key and one uses id as the key. Depending on the arguments given to lookup, you can return the result from one of the dictionaries.
I am kind of new using data structures in haskell besides of Lists. My goal is to chose one container among Data.Vector, Data.Sequence, Data.List, etc ... My problem is the following:
I have to create a sequence (mathematically speaking). The sequence starts at 0. In each iteration two new elements are generated but only one should be appended based in whether the first element is already in the sequence. So in each iteration there is a call to elem function (see the pseudo-code below).
appendNewItem :: [Integer] -> [Integer]
appendNewItem acc = let firstElem = someFunc
secondElem = someOtherFunc
newElem = if firstElem `elem` acc
then secondElem
else firstElem
in acc `append` newElem
sequenceUptoN :: Int -> [Integer]
sequenceUptoN n = (iterate appendNewItem [0]) !! n
Where append and iterate functions vary depending on which colection you use (I am using lists in the type signature for simplicity).
The question is: Which data structure should I use?. Is Data.Sequence faster for this task because of the Finger Tree inner structure?
Thanks a lot!!
No, sequences are not faster for searching. A Vector is just a flat chunk of memory, which gives generally the best lookup performance. If you want to optimise searching, use Data.Vector.Unboxed. (The normal, “boxed” variant is also pretty good, but it actually contains only references to the elements in the flat memory-chunk, so it's not quite as fast for lookups.)
However, because of the flat memory layout, Vectors are not good for (pure-functional) appending: basically, whenever you add a new element, the whole array must be copied so as to not invalidate the old one (which somebody else might still be using). If you need to append, Seq is a pretty good choice, although it's not as fast as destructive appending: for maximum peformance, you'll want to pre-allocate an uninitialized Data.Vector.Unboxed.Mutable.MVector of the required size, populate it using the ST monad, and freeze the result. But this is much more fiddly than purely-functional alternatives, so unless you need to squeeze out every bit of performance, Data.Sequence is the way to go. If you only want to append, but not look up elements, then a plain old list in reverse order would also do the trick.
I suggest using Data.Sequence in conjunction with Data.Set. The Sequence to hold the sequence of values and the Set to track the collection.
Sequence, List, and Vector are all structures for working with values where the position in the structure has primary importance when it comes to indexing. In lists we can manipulate elements at the front efficiently, in sequences we can manipulate elements based on the log of the distance the closest end, and in vectors we can access any element in constant time. Vectors however, are not that useful if the length keeps changing, so that rules out their use here.
However, you also need to lookup a certain value within the list, which these structures don't help with. You have to search the whole of a list/sequence/vector to be certain that a new value isn't present. Data.Map and Data.Set are two of the structures for which you define an index value based on Ord, and let you lookup/insert in log(n). So, at the cost of memory usage you can lookup the presence of firstElem in your Set in log(n) time and then add newElem to the end of the sequence in constant time. Just make sure to keep these two structures in synch when adding or taking new elements.
The main problem is that I'm working in a functional language with immutable types so thing like pointers and deletion are a bit harder. I would prefer if this was implementable primarily in Haskell.
Let's imagine we have a single dimensional field
[x,x,x,x,x,x,x,x,x]
So I have a map with keys being SIZES and values being ADDRESSES because each entry starts from a certain ADDRESS and has a certain SIZE.
[(x,x,x),x,x,(x,x,x,x)]
I want to be able to add an element by SIZE to a map and then check if the entries are touching so that I can merge them.
Since my map is by SIZEs I have to iterate through the whole map to find the ones with the bordering ADDRESSes.
Do I really have to chose between implementing a 2 key map and O(n) for merger?
Welp, in essence, this looks like computer memory. Do you want it to be efficient? Because you know, "things like pointers" exist and work in Haskell perfectly well.
Since my map is by SIZEs I have to iterate through the whole map to find the ones with the bordering ADDRESSes.
No, if you store the ranges in a separate data structure. I think for such non-overlapping subsets, there was something called a spanning tree (or as suggested by #Daniel, IntervalMap), but I'm not exactly an expert on those. Otherwise, why don't you simply hold memory blocks like that?
data Block = Block { start :: Int, data :: [Byte] }
type Memory = [Block]
You could cache the block length or use a data structure where length is O(1), to make merges O(nBlocks).
Sure, that doesn't make it obvious at the type level that they won't ever overlap, but that's an invariant you can keep for yourself.
I am currently writing my bachelor CS thesis for my studies in Austria.
The programming language that I use is Haskell.
Now, I am trying to find a way to fix my following issue:
I have a list of tuples, lets say [(1,2),(2,3)]. From that list of tuples, I would now like to pick out each of that tuples and then do an operation on it:
Map.insert (1,2) XXX ftable
where
(1,2) is the first element of that list and XXX is some value and ftable is my map.
How can I "iterate" through that list and proceed with that operation inserting the "n-th" elment of my list to my map?
I guess I am just too much familiar with programming imperative and I do not find a way to fix that in Haskell.
It's not entirely clear what you mean here. Is it correct to assume that the tuples are meant to represent the keys in your map and XXX is some value attached to a specific key? Are all the values you want to match to a given key also provided in a list? In that case you can easily use the fromList function in Data.Map:
keys = [(1,2),(2,3),(7,9)]
values = ["A","B","C"]
map = Data.Map.fromList $ zip keys values
Think about what your loop is doing.
if it transforms each list element, then use map (or concatMap)
if it filters out some of the list elements, then use filter
if it reduces the list to a summary value, use a fold (e.g. foldl, foldr; more specific folds are sum, and, etc)
or if it's doing some combination of the above, use a combination of the above functions
In your case, I'm not entirely certain what you want, but I think you want to end up with a single Map, so you want to fold your list. Perhaps something like
foldl (\oldMap key -> Map.insert key xxx oldMap) ftable yourListOfTuples
You have several options to iterate on lists, each one to be chosen depending on the effect you're searching for:
folds, they are useful for many computations on lists. They may also be what you're looking for, but I do not understand from your question your specific issue.
map, that applies the same (given) function on every element of the input list returning the list of the results. There are also variants that works in monadic computations.
or, if it best suits your needs, you can always write your own tail recursive function: haskell will treat them just like for loops, without consuming heap (but please look at the reference link for a good explanation of the technique).
In the end, whatever function or technique you will use, it will be based on recursion since functional languages like Haskell do not admit for-loops in the form you are used to.
Building off of hakoja's suggestion...
It is probable that XXX at this point is either 1) constant for every key, or 2) some function based on the key, or 3) somehow defined in parallel to the key.
1) constant xxx for every key
keys = [(1,2),(2,3),(7,9)]
xxx = "A"
ftable = Data.Map.fromList $ zip keys (repeat xxx)
2) function produces value based on key
keys = [(1,2),(2,3),(7,9)]
f = ...
ftable = Data.Map.fromList $ zip keys (map f keys)
3) xxx defined in parallel: use hakoja's suggestion
In C++ and other languages, add-on libraries implement a multi-index container, e.g. Boost.Multiindex. That is, a collection that stores one type of value but maintains multiple different indices over those values. These indices provide for different access methods and sorting behaviors, e.g. map, multimap, set, multiset, array, etc. Run-time complexity of the multi-index container is generally the sum of the individual indices' complexities.
Is there an equivalent for Haskell or do people compose their own? Specifically, what is the most idiomatic way to implement a collection of type T with both a set-type of index (T is an instance of Ord) as well as a map-type of index (assume that a key value of type K could be provided for each T, either explicitly or via a function T -> K)?
I just uploaded IxSet to hackage this morning,
http://hackage.haskell.org/package/ixset
ixset provides sets which have multiple indexes.
ixset has been around for a long time as happstack-ixset. This version removes the dependencies on anything happstack specific, and is the new official version of IxSet.
Another option would be kdtree:
darcs get http://darcs.monoid.at/kdtree
kdtree aims to improve on IxSet by offering greater type-safety and better time and space usage. The current version seems to do well on all three of those aspects -- but it is not yet ready for prime time. Additional contributors would be highly welcomed.
In the trivial case where every element has a unique key that's always available, you can just use a Map and extract the key to look up an element. In the slightly less trivial case where each value merely has a key available, a simple solution it would be something like Map K (Set T). Looking up an element directly would then involve first extracting the key, indexing the Map to find the set of elements that share that key, then looking up the one you want.
For the most part, if something can be done straightforwardly in the above fashion (simple transformation and nesting), it probably makes sense to do it that way. However, none of this generalizes well to, e.g., multiple independent keys or keys that may not be available, for obvious reasons.
Beyond that, I'm not aware of any widely-used standard implementations. Some examples do exist, for example IxSet from happstack seems to roughly fit the bill. I suspect one-size-kinda-fits-most solutions here are liable to have a poor benefit/complexity ratio, so people tend to just roll their own to suit specific needs.
Intuitively, this seems like a problem that might work better not as a single implementation, but rather a collection of primitives that could be composed more flexibly than Data.Map allows, to create ad-hoc specialized structures. But that's not really helpful for short-term needs.
For this specific question, you can use a Bimap. In general, though, I'm not aware of any common class for multimaps or multiply-indexed containers.
I believe that the simplest way to do this is simply with Data.Map. Although it is designed to use single indices, when you insert the same element multiple times, most compilers (certainly GHC) will make the values place to the same place. A separate implementation of a multimap wouldn't be that efficient, as you want to find elements based on their index, so you cannot naively associate each element with multiple indices - say [([key], value)] - as this would be very inefficient.
However, I have not looked at the Boost implementations of Multimaps to see, definitively, if there is an optimized way of doing so.
Have I got the problem straight? Both T and K have an order. There is a function key :: T -> K but it is not order-preserving. It is desired to manage a collection of Ts, indexed (for rapid access) both by the T order and the K order. More generally, one might want a collection of T elements indexed by a bunch of orders key1 :: T -> K1, .. keyn :: T -> Kn, and it so happens that here key1 = id. Is that the picture?
I think I agree with gereeter's suggestion that the basis for a solution is just to maintiain in sync a bunch of (Map K1 T, .. Map Kn T). Inserting a key-value pair in a map duplicates neither the key nor the value, allocating only the extra heap required to make a new entry in the right place in the index. Inserting the same value, suitably keyed, in multiple indices should not break sharing (even if one of the keys is the value). It is worth wrapping the structure in an API which ensures that any subsequent modifications to the value are computed once and shared, rather than recomputed for each entry in an index.
Bottom line: it should be possible to maintain multiple maps, ensuring that the values are shared, even though the key-orders are separate.