I need some help with a problem involving a complex data type in C#. I have the following kind of data and want to store it in a variable in a way that is performance-efficient, because I will be searching it and there will be a lot of data. A sample of the data follows:
ParentNode1
    ChildNode1
    ChildNode2
    ChildNode3
ParentNode2
ParentNode3
ParentNode4
    ChildNode1
    ChildNode2
        Node1
        Node2
        Node3
            Nth level Node1
    ChildNode3
ParentNode5
The data above is just a sample to show the hierarchy. I'm not sure whether a nested List, a Dictionary, an IEnumerable, or a linked list would be best in terms of performance. Thanks.
If you know that a search will take place at a single level, then you might want a list of lists: one list for each level. If your hierarchy has N levels, then you have N lists, each containing nodes that look like this:

class ListNode
{
    public string Data;        // the node's text
    public int ParentIndex;    // index of the parent in the previous level's list (-1 for roots)
}
So to search level 4, you go to the list for that level and do your contains or regex test on each node in that level. If it matches, then the ParentIndex value will get you the parent, and its ParentIndex will get you the grandparent, etc.
This way, you don't have to worry about navigating the hierarchy except when you find a match, and you don't have to write nested or recursive algorithms to traverse the tree.
You could maintain your hierarchy, as well, with each top-level node containing a list of child nodes, and build this secondary list only for searching.
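A minimal C# sketch of that layout and the upward walk, assuming the ListNode class above (HierarchySearch, Levels, and Find are illustrative names, not from the question):

using System.Collections.Generic;

class HierarchySearch
{
    // Levels[0] holds the roots, Levels[1] their children, and so on.
    public List<List<ListNode>> Levels = new List<List<ListNode>>();

    // Search one level; for each match, walk back to the root via ParentIndex.
    public IEnumerable<List<string>> Find(int level, string term)
    {
        for (int i = 0; i < Levels[level].Count; i++)
        {
            if (!Levels[level][i].Data.Contains(term))
                continue;

            var path = new List<string>();
            int lvl = level, idx = i;
            while (lvl >= 0)
            {
                path.Add(Levels[lvl][idx].Data);
                idx = Levels[lvl][idx].ParentIndex;
                lvl--;
            }
            yield return path;   // matched node, its parent, ..., root
        }
    }
}

Each step up is an O(1) index into the previous level's list, so reconstructing the ancestor chain for a match only costs O(depth).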
I want to map a timestamp t and an identifier id to a certain state of an object. I can do so by mapping a tuple (t,id) -> state_of_id_in_t. I can use this mapping to access one specific (t,id) combination.
However, sometimes I want to know all states (with matching timestamps t) of a specific id (i.e. id -> a set of (t, state_of_id_in_t)), and sometimes all states (with matching identifiers id) of a specific timestamp t (i.e. t -> a set of (id, state_of_id_in_t)). The problem is that I can't just put all of these in a single large matrix and do a linear search based on what I want. The number of (t,id) tuples for which I have states is very large (1M+) and very sparse (some timestamps have many states, others none, etc.). How can I build such a dict, which can deal with accessing its contents by partial keys?
I created two distinct dicts, dict_by_time and dict_by_id, which are dicts of dicts. dict_by_time maps a timestamp t to a dict of ids, each of which points to a state. Similarly, dict_by_id maps an id to a dict of timestamps, each of which points to a state. This way I can access a state or a set of states however I like. Notice that the 'leaves' of both dicts (dict_by_time and dict_by_id) point to the same objects, so it's just the way I access the states that differs; the states themselves are the same Python objects.
dict_by_time = {'t_1': {'id_1': 'some_state_object_1',
                        'id_2': 'some_state_object_2'},
                't_2': {'id_1': 'some_state_object_3',
                        'id_2': 'some_state_object_4'}}

dict_by_id = {'id_1': {'t_1': 'some_state_object_1',
                       't_2': 'some_state_object_3'},
              'id_2': {'t_1': 'some_state_object_2',
                       't_2': 'some_state_object_4'}}
Again, notice that the leaves are shared across both dicts.
I don't think it is good to do it with two dicts, simply because maintaining both of them when adding new timestamps or identifiers results in double work and could easily lead to inconsistencies if I do something wrong. Is there a better way to solve this? Complexity is very important, which is why I can't just do manual searching and need some sort of HashMap magic.
You can always trade add complexity for lookup complexity. Instead of using a single dict, you can create a class with an add method and a lookup method. Internally, you can keep track of the data using three different dictionaries: one keyed by the (t,id) tuple, one keyed by t alone, and one keyed by id alone. Depending on the arguments given to lookup, you return the result from the appropriate dictionary.
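A minimal Python sketch of that wrapper (the class and method names are illustrative, not from the question):

class StateStore:
    def __init__(self):
        self._by_key = {}    # (t, id) -> state
        self._by_time = {}   # t -> {id: state}
        self._by_id = {}     # id -> {t: state}

    def add(self, t, ident, state):
        # one insert keeps all three indexes consistent
        self._by_key[(t, ident)] = state
        self._by_time.setdefault(t, {})[ident] = state
        self._by_id.setdefault(ident, {})[t] = state

    def lookup(self, t=None, ident=None):
        if t is not None and ident is not None:
            return self._by_key[(t, ident)]     # one specific state
        if t is not None:
            return self._by_time.get(t, {})     # all states at timestamp t, keyed by id
        if ident is not None:
            return self._by_id.get(ident, {})   # all states of this id, keyed by t
        raise ValueError("provide t, ident, or both")

Every add is three O(1) dict insertions and every lookup is a single dict access, and because the class owns all three dicts there is only one place where they can get out of sync.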
How do you define a directed acyclic graph (DAG) (of strings) (with one root) best in Haskell?
I especially need to apply the following two functions on this data structure as fast as possible:
Find all (direct and indirect) ancestors of one element (including the parents of the parents etc.).
Find all (direct) children of one element.
I thought of [(String,[String])] where each pair is one element of the graph consisting of its name (String) and a list of strings ([String]) containing the names of (direct) parents of this element. The problem with this implementation is that it's hard to do the second task.
You could also use [(String,[String])] again, where the list of strings ([String]) contains the names of the (direct) children. But here again, it's hard to do the first task.
What can I do? What alternatives are there? Which is the most efficient way?
EDIT: One more remark: I'd also like it to be easy to define. I have to define the instance of this data type myself "by hand", so I'd like to avoid unnecessary repetition.
Have you looked at the tree implementation in Martin Erwig's Functional Graph Library? Each node is represented as a context containing both its children and its parents. See the Graph type class for how to access this. It might not be as easy as you requested, but it is already there, well tested, and easy to use. I have used it for more than a decade in a large project.
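A small sketch of what that can look like with fgl (Data.Graph.Inductive); the module layout and the rdfs helper are given as I recall them, so check the current fgl documentation:

import Data.Graph.Inductive.Graph (mkGraph, suc)
import Data.Graph.Inductive.PatriciaTree (Gr)
import Data.Graph.Inductive.Query.DFS (rdfs)

-- A tiny DAG: "root" is a parent of "a" and "b", and "a" is a parent of "c".
g :: Gr String ()
g = mkGraph [(0, "root"), (1, "a"), (2, "b"), (3, "c")]
            [(0, 1, ()), (0, 2, ()), (1, 3, ())]

directChildren :: [Int]
directChildren = suc g 1      -- direct children of node 1 ("a"): [3]

allAncestors :: [Int]
allAncestors = rdfs [3] g     -- walks edges backwards from node 3 ("c"); includes 3 itself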
When looking through the docs of Data.Set, I saw that insertion of an element into the tree is mentioned to be O(log(n)). However, I would intuitively expect it to be O(n*log(n)) (or maybe O(n)?), as referential transparency requires creating a full copy of the previous tree in O(n).
I understand that for example (:) can be made O(1) instead of O(n), as here the full list doesn't have to be copied; the new list can be optimized by the compiler to be the first element plus a pointer to the old list (note that this is a compiler - not a language level - optimization). However, inserting a value into a Data.Set involves rebalancing that looks quite complex to me, to the point where I doubt that there's something similar to the list optimization. I tried reading the paper that is referenced by the Set docs, but couldn't answer my question with it.
So: how can inserting an element into a binary tree be O(log(n)) in a (purely) functional language?
There is no need to make a full copy of a Set in order to insert an element into it. Internally, elements are stored in a tree, which means that you only need to create new nodes along the path of the insertion. Untouched nodes can be shared between the pre-insertion and post-insertion versions of the Set. And as Deitrich Epp pointed out, in a balanced tree O(log(n)) is the length of the insertion path. (Sorry for omitting that important fact.)
Say your Tree type looks like this:
data Tree a = Node a (Tree a) (Tree a)
            | Leaf
... and say you have a Tree that looks like this
let t = Node 10 tl (Node 15 Leaf tr')
... where tl and tr' are some named subtrees. Now say you want to insert 12 into this tree. Well, that's going to look something like this:
let t' = Node 10 tl (Node 15 (Node 12 Leaf Leaf) tr')
The subtrees tl and tr' are shared between t and t', and you only had to construct 3 new Nodes to do it, even though the size of t could be much larger than 3.
EDIT: Rebalancing
With respect to rebalancing, think about it like this, and note that I claim no rigor here. Say you have an empty tree. Already balanced! Now say you insert an element. Already balanced! Now say you insert another element. Well, there's an odd number so you can't do much there.
Here's the tricky part. Say you insert another element. This could go two ways: left or right; balanced or unbalanced. In the case that it's unbalanced, you can clearly perform a rotation of the tree to balance it. In the case that it's balanced, already balanced!
What's important to note here is that you're constantly rebalancing. It's not like you have a mess of a tree, decided to insert an element, but before you do that, you rebalance, and then leave a mess after you've completed the insertion.
Now say you keep inserting elements. The tree's going to get unbalanced, but not by much. And when that does happen, first off you're correcting it immediately, and secondly, the correction occurs along the path of the insertion, which is O(log(n)) in a balanced tree. The rotations in the paper you linked to touch at most three nodes in the tree to perform a rotation, so you're doing O(3 * log(n)) work when rebalancing. That's still O(log(n)).
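To make the "at most three nodes" point concrete, here is one possible left rotation written against the Tree type defined above (a sketch, not the exact rotation scheme from the paper):

-- Only the two Node constructors on the left-hand side are rebuilt;
-- the subtrees a, b and c are reused unchanged.
rotateLeft :: Tree a -> Tree a
rotateLeft (Node x a (Node y b c)) = Node y (Node x a b) c
rotateLeft t                       = t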
To add extra emphasis to what dave4420 said in a comment, there are no compiler optimizations involved in making (:) run in constant time. You could implement your own list data type, and run it in a simple non-optimizing Haskell interpreter, and it would still be O(1).
A list is defined to be an initial element plus a list (or it's empty in the base case). Here's a definition that's equivalent to native lists:
data List a = Nil | Cons a (List a)
So if you've got an element and a list, and you want to build a new list out of them with Cons, that's just creating a new data structure directly from the arguments the constructor requires. There is no more need to even examine the tail list (let alone copy it), than there is to examine or copy the string when you do something like Person "Fred".
You are simply mistaken when you claim that this is a compiler optimization and not a language level one. This behaviour follows directly from the language level definition of the list data type.
Similarly, for a tree defined to be an item plus two trees (or an empty tree), when you insert an item into a non-empty tree it must either go in the left or right subtree. You'll need to construct a new version of that tree containing the element, which means you'll need to construct a new parent node containing the new subtree. But the other subtree doesn't need to be traversed at all; it can be put in the new parent tree as is. In a balanced tree, that's a full half of the tree that can be shared.
Applying this reasoning recursively should show you that there's actually no copying of data elements necessary at all; there's just the new parent nodes needed on the path down to the inserted element's final position. Each new node stores 3 things: an item (shared directly with the item reference in the original tree), an unchanged subtree (shared directly with the original tree), and a newly created subtree (which shares almost all of its structure with the original tree). There will be O(log(n)) of those in a balanced tree.
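As a concrete sketch of that path copying, here is a plain insert for the Tree type defined in the earlier answer (balancing omitted); only the nodes along the comparison path are rebuilt:

insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node x Leaf Leaf
insert x t@(Node y l r)
  | x < y     = Node y (insert x l) r   -- the right subtree is shared as-is
  | x > y     = Node y l (insert x r)   -- the left subtree is shared as-is
  | otherwise = t                       -- already present: the whole tree is shared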
Is there a way to formulate a Core Data predicate for a given object, representing the head of a singly linked list, and all of the other objects in that list?
E.g., I have objects, each of which has a relationship to another object (say nextObject) and I want a predicate for a specified object and all other objects reachable by traversing nextObject (until it is nil).
CLARIFICATION:
I'm using these for a UITableView's NSFetchedResultsController, so these need to be part of the fetch, not something I iterate through in code.
You wouldn't use a predicate for a linked list. Instead, you would just start with the first object of interest and walk the relationships by calling nextObject until you hit one that did not have a nextObject value.
You can find the first and last objects with a predicate in a fetch just by looking for previousObject==nil and nextObject==nil.
Predicates do not understand arbitrarily long relationship chains. They understand a chain like entity1.entity2.entity3, but not nextObject.nextObject.nextObject..., because they have no way of knowing when to stop.
Hoping that someone here will be able to provide some MySQL advice...
I am working on a categorical searchtag system. I have tables like the following:
EXERCISES
    exerciseID
    exerciseTitle

SEARCHTAGS
    searchtagID
    parentID (-> searchtagID)
    searchtag

EXERCISESEARCHTAGS
    exerciseID (foreign key -> EXERCISES)
    searchtagID (foreign key -> SEARCHTAGS)
Searchtags can be arranged in an arbitrarily deep tree. So for example I might have a tree of searchtags that looks like this...
Body Parts
    Head
    Neck
    Arm
        Shoulder
        Elbow
    Leg
        Hip
        Knee
Muscles
    Pecs
    Biceps
    Triceps
Now...
I want to select all of the searchtags in ONE branch of the tree that reference at least ONE record in the subset of records referenced by a SINGLE searchtag in a DIFFERENT branch of the tree.
For example, let's say the searchtag "Arm" points to a subset of exercises. If any of the exercises in that subset are also referenced by searchtags from the "Muscles" branch of SEARCHTAGS, I would like to select for them. So my query could potentially return "Biceps," "Triceps".
Two questions:
1) What would the SELECT query for something like this look like? (If such a thing is even possible without creating a lot of slowdown. I'm not sure where to start...)
2) Is there anything I should do to tweak my data structure to ensure this query will continue to run fast, even as the tables get big?
Thanks in advance for your help, it's much appreciated.
An idea: consider using a cache table that saves all ancestor relationships in your searchtags:
CREATE TABLE SEARCHTAGRELATIONS (
parentID INT,
descendantID INT
);
Also include the tag itself as parent and descendant (so, for the searchtag with id 1, the relations table includes a row (1,1)).
That way, you avoid having to walk the parent/child hierarchy recursively and can join against a flat table. Assuming "Muscles" has the ID 5,
SELECT descendantID FROM SEARCHTAGRELATIONS WHERE parentID=5
returns all searchtags contained in muscles.
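With that relations table in place, the query from question 1 might look roughly like this (a sketch against the tables above; the id 12 for the "Arm" tag is just an assumed example value):

-- Searchtags under "Muscles" (id 5) that tag at least one exercise
-- which is also tagged with "Arm" (assumed id 12).
SELECT DISTINCT est.searchtagID
FROM SEARCHTAGRELATIONS r
JOIN EXERCISESEARCHTAGS est ON est.searchtagID = r.descendantID
JOIN EXERCISESEARCHTAGS arm ON arm.exerciseID  = est.exerciseID
WHERE r.parentID = 5
  AND arm.searchtagID = 12;

Indexes on EXERCISESEARCHTAGS (exerciseID, searchtagID) and SEARCHTAGRELATIONS (parentID, descendantID) should help keep this fast as the tables grow.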
Alternatively, use modified preorder tree traversal, also known as the nested set model. It requires two fields (left and right) instead of one (parent id), and makes certain operations harder, but makes selecting whole branches much easier.