Drawing a hierarchical tree: treemapping

Drawing a hierarchical tree: treemapping - graphics

I'm trying to develop a view of a hierarchical tree in which the weight of each node is the actual number of children it has. A leaf node has weight 1.
I want to arrange these items in a way they can be browsed going deeper into the tree by showing the root categories (with no parent) at the beginning. Clicking on a node makes the view repaint iself to show just children of that node.
The tricky part is that the size in pixel of a node should be proportional to its weight compared to adjacents nodes. According to wikipedia this is called treemapping and what I need is a tiling algorithm, I was trying to figure out by myself but it seems more complex that I expected..
To give you an example there is a program for Mac Os X called GrandPerspective that shows folder sizes of your HD:
(source: arstechnica.com)
I want to arrange nodes in a way like this! (of course size is proportional to folder size)
Any suggestions?
Thanks

The data structure used in the file system example you show is most likely a KD tree. I'm not exactly sure how well the problem you want to solve maps to the file system example but this is how I would go along solving the file system case myself:
You start with a rectangle representing the root of the hard disk.
You take all files and directories in the directory and give them a size.
For files the size is the size of the file
For directories, the size is the the complete size of all files it contains (including all its subfolders, and their subfolders and so on).
Now you try to cut this list into two as equally sized lists as possible. Now you cut the input rectangle into two rectangles that has the same size proportions as the two lists you cut the input files into. You should make the cut along the axis that is shorter of the input rectangles size to make sure you always have as quadratic as possible rectangles. Now you run the algorithm recursively on the two lists with their corresponding rectangle.
The base cases would be:
There is only a single file in the list. You then fill the rectangle with the color of the file type.
There is a single directory in the list. For this case you run the algorithm recursively on the contents in the directory inside the rectangle.
Choosing how to split the lists into two as equally sized parts as possible may not be trivial (is it a knapsack?). A decent heuristic approach would probably be to sort the list in descending order and take the elements out of the list and put it in the currently smallest of the two resulting lists.
EDIT: The splitting problem is called partition and is a special case of knapsack. It's covered in this thread here on SO.

That's a squarified treemap. You can read the paper explaining this technique.

Related

How can I adjust a GraphViz render

I'm trying to render a tree that is very broad... and it renders, as expected, in a long, skinny horizontal image.
Problem is that I need a graph suitable for a document. I would very much like to take and move the nodes that are rendered horizontally and "drag" them down so that the graph is more vertical... with the edges curving to accommodate this. Are there any clever ways to accomplish this? GraphViz settings? Third party tools that let me manipulate and fine tune the output? I work mostly in the Python ecosystem, but open to others. Also open to the use of tools like Visio and other pro drawing tools. Thanks!
Edit
After implementing the answer below by #sroush, and then tweaking a little further with Photoshop, got some nice results.
Tweaking the above in Photosop. Had to add the two curved edges after the secondary node by hand, but it's worth it. Much more presentable.

I assume you are using dot, and your graph "naturally" has only a few ranks (rows).
There are a few tweaks that will help a bit (reducing node horizontal footprints):
node [shape=rect] // snugger fit into rectangles
insert newlines into node labels e.g. xxx [label="Controller Board\n#19_8"])
Also try the unflatten program (https://www.graphviz.org/pdf/unflatten.1.pdf). It will increase the apparent number of ranks (rows).
See related question here with command line examples:
Distribute nodes on the same rank of a wide graph to different lines

You can use the minlen property to limit the minimum level span of some edges.This avoids the result becoming very long in the horizontal position.
For example:
digraph {
a->b
a->c
a->d
a->e
}
This will output the following image:
But when minlen is used, the picture will become longer vertically but shortened horizontally:
digraph {
a->b
a->c
a->d[minlen=2]
a->e[minlen=3]
}

How Might I organize vertex data in WebGL for a frame-by-frame (very specific) animated program?

I have been working on an animated graphics project with very specific requirements, and after quite a bit of searching and test coding, I have figured that I could take several approaches, but the Khronos and MDN documentation I have been reading coupled with other posts I have seen here don't answer all of my questions regarding my particular project. In the meantime, I have written short test programs (setting infrastructure for testing).
Firstly, I should describe the project:
The main object drawn to the screen is a simple quad surrounded by a black outline (LINE_LOOP or LINES will do, probably, though I have had issues with z-fighting...that will be left for another question). When the user interacts with the program, exactly one new quad is created and immediately drawn, but for a set amount of time its vertices move around until the quad moves to its final destination. (Note that translations won't do.) Random black lines are also drawn, and sometimes those lines also move around.
Once one of the quads reaches its final spot, it never moves again.
A new quad is always atop old quads (closer to the screen). That means that I need to layer the quads and lines from oldest to newest.
*this also means that it would probably be best to assign z-values to each quad and line, even if the graphics are in pixel coordinates and use an orthographic matrix. Would everyone agree with this?
Given these parameters, I have a few options with varying levels of complexity:
1> Take the object-oriented approach and just assign a buffer to each quad, and the same goes for the random lines. --creation and destruction of buffers every frame for the one shape that is moving. I truthfully think that this is a terrible idea that might only work in a higher level library that does heavy optimization underneath. This approach also doesn't take advantage of the fact that almost every quad will stay the same.
[vertices0] ... , [verticesN]
Draw x N (many draws for many small-size buffers)
2> Assign a z-value to each quad, outline, and line (as mentioned above). Allocate a huge vertex buffer and element buffer to store all permanently-in-their-final-positions quads. Resize only in the very unlikely case someone interacts for long enough. Create a second tiny buffer to store the one temporary moving quad and use bufferSubData every frame. When the quad reaches its destination, bufferSubData it into the large buffer and overwrite the small buffer upon creation of the next quad...all on the same frame. The main questions I have here are: is it possible (safe?) to use bufferSubData and draw it on the same frame? Also, would I use DYNAMIC_DRAW on both buffers even though the larger one would see fewer updates?
[permanent vertices ... | uninitialized (keep a count)]
bufferSubData -> [tempVerticesForOneQuad]
Draw 2x
3> Still create the large and small buffers, but instead of using bufferSubData every frame, create a second shader program and add an attribute for the new/moving quad that explicitly sets the vertex positions for the animation (I would pass vertex index attributes). Only draw with the small buffer when the quad is moving. For the frame when the quad reaches its destination, draw both large and small buffer, but then bufferSubData the final coordinates into the large permanent buffer to be used in the next frame.
switchToShaderProgramA();
[permanent vertices...| uninitialized (keep a count)]
switchToShaderProgramB();
[temp vertices] <- shader B accepts indices for each vertex so we can do all animation in the vertex shader
---last frame of movement arrives : bufferSubData into the permanent vertices buffer for when the the next quad is created
I get the sense that the third option might be the best, but I would like to learn whether there are some other factors that I did not consider. For example, my assumption that a program switch, additional attributes, and vertex shader manipulation would be faster than just substituting the buffer values as in 2>. The advantage of approach 3> (I think) is that I can defer the buffer substitution to a time when nothing needs to be drawn.
Still, I am still not sure of how to work with the randomly-appearing lines. I can't take the "single quad vertex buffer" approach since the number of lines cannot be predicted. Might I also allocate a large buffer for the moving lines? Those also stay after the quad is finished moving, though I don't think that I could use the vertex shader trick because there would be too many attributes to set (as opposed to the 4 for the one quad). I suppose that I could create a large "permanent line data" buffer first, but what to do during the animation is tricky because the lines move. Maybe bufferSubData() + draw on the same frame is not terrible? Or it could be. This is where I need advise.
I understand that this question might not be too specific code-wise, but I don't believe that I would be allowed to show the core of the program. All I have is the typical WebGL boilerplate ready.
I am looking forward to hearing people's thoughts on how I might proceed and whether there are any trade-offs I might have missed when considering the three options above.
Thank you in advance, and please feel free to ask any additional questions if clarification is necessary.

Honestly, for what you're describing, it doesn't sound to me like it matters which you choose. On modern hardware, drawing a few hundred quads and a few thousand lines each frame would not really tax the hardware much.
Having said that, I agree that approach 1 seems very inefficient. Approach 2 sounds perfectly fine. You can safely draw a buffer on the same frame that you uploaded the data. I don't think it matters much whether you use DYNAMIC_DRAW or STATIC_DRAW for the buffer. I tend to think of dynamic buffers as being something you're updating every frame. If you only update it every few seconds or less, then static is fine. Approach 3 is also fine. Between 2 and 3, I'd say do whichever is easier for you to understand and program.
Likewise, for the lines, I would use a separate buffer. It sounds like that one changes per frame, so I would use DYNAMIC_DRAW for that. Allocating a single large buffer for it and performing a glBufferSubData() per frame is probably a fine strategy. As always, trying it and profiling it will tell you for sure.

How to get only the nearby subset of a way's nodes

I'm using the Overpass API to query Open Street Maps for nearby road segments. I am pretty sure that my query is returning all of the nodes of the nearby way... but I only want nearby nodes of the nearby way.
In the documentation it references this problem:
In general, you will be rather interested in complete data than just
elements of a single type. First, there are several valid definitions
of what "complete map data" means. The first unclear topic is what to
do with nodes outside the bounding box which are members of ways that
lie partly inside the bounding box.
The same question repeats for relations. If you wait for a turn
restriction, you may prefer to get all elements of the relation
included. If your bounding box hits for example the border of Russia,
you likely don't want to download ten thousands kilometers of boundary
around half the world.
But I looked at the subsequent examples and didn't see the solution.
Basically, in their example, how would I restrict the elements returned to those strictly in the bounding box (rather than returning the whole boundary of Russia)?
My current query is
way (around:100,50.746,7.154) [highway~"^(secondary|tertiary)$"];
>;
out ids geom;
I'm thinking maybe I need to change it to node (around:...) and then recurse upwards to the way to query for the highway tag but I'm not sure if I am even on the right track.

Actually, it's even a bit more complicated, as you need the set intersection of all nodes in a 100m distance and those nodes belonging to one of the relevant ways. Here's how your query should look like: Adjust distance, tags for ways as needed.
Note that depending on the tagging, there's no guarantee that you will find a node in a certain distance, especially if roads tend to be rather straight and longish. This for sure will impact your results, so a bit experimenting with a suitable radius is probably needed.
// Find nodes up to 100m around center point
// (center is overpass turbo specific for center point lat/lon in current map view)
node(around:100,{{center}})->.aroundnodes;
// recurse up to ways with highway = secondary/tertiary
way(bn.aroundnodes)[highway~"^(secondary|tertiary)$"]->.allways;
// determine nodes belonging to found ways
node(w.allways)->.waynodes;
(
// determine intersection of all ways' nodes and nodes around center point
node.waynodes.aroundnodes;
// and return ways (intersection is just a workaround for a bug)
way.allways.allways;
);
out;
check it out in overpass turbo: http://overpass-turbo.eu/s/hPV

Union find in python3

I know how to implement union find in general, but I was thinking of whether there would be a way to utilize the set structure in python to achieve the same result.
For example, we can union sets pretty easily. But I'm not sure how to determine if two elements are in the same set using just sets.
So, I am wondering if there is a data structure in python that would support such operation, other than the usual implementation?

You could always solve this problem by visualizing it as a tree and its nodes connecting to each other via the root, and then looking up the tree if you want to know if two nodes are connected. If the two nodes you are comparing has the same root (they are in the same tree), than they are connected.
To connect two nodes, just go to the root of each tree they are in, and make one root become the parent of the other.
This video will give you a great intuition about it:
https://www.youtube.com/watch?v=YIFWCpquoS8&list=PLUX6FBiUa2g4YWs6HkkCpXL6ru02i7y3Q&index=1
The connection between the tree nodes can be made via pointers in a language which supports it, but if your language dont (python), than you can create your own pointers by storing positions and links via an array.
The array would be such that its positions would represent your nodes, and the values inside it represents the connection of the specific node to its root. On the beginning, the position in the array is filled with the node number because the nodes has initially no parent, but as you connect nodes, the roots changes, and the array has to represent this. Actually, the value stored there is the identificator of the root.
But try visualizing the problem visually first instead of thinking of arrays and too much mathematical artificats. Visually dealing with it makes the solution sound banal, and can be a good guidance while writing code.
I say this because I have watched the video from Robert Sedgewick I just posted, with a graphical simulation of the solution, and implemented myself without paying too much attention to the code on his book. The intuition the video gave me is much more valuable than any mathematics.
It will help you to encapsulate the nodes into a class, with the following methods:
climbTreeFromNodeUpToRoot
setNewParentToThisNodeAndUpdateHeights
The first method, as the name says, takes you from a node and goes up the tree until finding the root of it, which is then returned.
If you compare two nodes with this method (actually, the roots returned by it), you know easily if they are connected by just comparing their roots.
Once you want to connected them, you go up the trees of both nodes, and ask one root to take the other one as its parent.
The trees can grow very big in height (sorry I dont use the official nomeclature, but this is the one that makes sense to me), so this simple approach will get very slow when you have to climb the tree at a later time.
To prevent trees from becoming to high, dont just set one root as the parent to another without criterium, but attach the smallest tree (in terms of height, not quantity of elements) to the highest one.
For this, you need to know the heights of each tree, and this information you can store on their respective root (via an extra array in your case, or an extra pointer from each node in other languages). This information should be updated everytime another tree connects to it.
It is not possible for a tree to know that she just got a new tree attached to it, so its important that every tree attaching to a second one informs the second as to update its height.
This information can be sent to the root of the second tree, and later used to judge (as writen before) which tree is the smallest. Remember, attaching a small tree to a big one instead of the opposite will save you incredible amounts of time.

Do you want something like this?
myset = ...
all(elt in myset for elt in (a,b))

Visit all nodes in a graph with least repeat visits

I have a tile based map where several tiles are walls and others are walkable. the walkable tiles make up a graph I would like to use in path planning. My question is are their any good algorithms for finding a path which visits every node in the graph, minimising repeat visits?
For example:
map example http://img220.imageshack.us/img220/3488/mapq.png
If the bottom yellow tile is the starting point, the best path to visit all tiles with least repeats is:
path example http://img222.imageshack.us/img222/7773/mapd.png
There are two repeat visits in this path. A worse path would be to take a left at the first junction, then backtrack over three already visited tiles.
I don't care about the end node but the start node is important.
Thanks.
Edit:
I added pictures to my question but cannot see them when viewing it. here they are:
http://img220.imageshack.us/img220/3488/mapq.png
http://img222.imageshack.us/img222/7773/mapd.png
Additionally, in the graphs I need this for there will never be a situation where min repeats = 0. That is, to step on every tile in the map the player must cross his own path at least once.

Your wording is bad -- it allows a reduction to an NP-complete problem. If you could minimize repeat visits, then could you push them to 0 and then you would have a Hamiltonian Cycle. Which is solvable, but hard.

This sounds like it could be mapped onto the traveling salesman problem ... and so likely ends up being NP complete and no efficient deterministic algorithm is known.
Finding a path is fairly straight forward -- find a (or the minimum) spanning subtree and then do a depth/breadth-first traversal. Finding the optimal route is the really difficult bit.
You could use one of the dynamic optimization techniques to try and converge on a fairly good solution.
Unless there is some attribute of the minimum spanning subtree that could be used to generate the best path ... but I don't remember enough graph theory for that.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string