I came across the tree-based algorithm LightGBM and I have read that it grows trees vertically, meaning that LightGBM grows trees leaf-wise (while some other algorithms grow them level-wise). I was wondering: what is the advantage of growing a tree vertically? Is there any?
A difference (not necessarily an advantage) that I can see is the way you need to define early-stopping criteria while growing the tree. Any thoughts on this?
As described in this section of LightGBM's documentation:
LightGBM uses leaf-wise (or what XGBoost calls lossguide) tree growth because it can achieve lower loss (i.e. better fit to the training data) than depth-wise tree growth, holding the number of leaves constant.
In leaf-wise tree growth, the split with the largest gain is chosen, regardless of its level of depth.
A difference ... I can see is the way you need to define early-stopping criteria while growing the tree
It's true that in this type of tree growth, you now have to consider two closely-related ways to prevent overfitting:
maximum depth (max_depth in LightGBM)
total allowed number of leaves (num_leaves in LightGBM)
I'm assuming this is what you meant by "early-stopping criteria", but wanted to also note that the phrase "early stopping" has a special meaning in GBMs that isn't related to how individual trees are grown. Early stopping, as XGBoost, LightGBM, and other GBM libraries refer to it, means "if performance on held-out data fails to improve for n iterations, stop training".
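For concreteness, here is a minimal sketch of how those knobs look in LightGBM's scikit-learn interface; num_leaves, max_depth, and the early_stopping callback are documented LightGBM names, but the dataset and the specific values are purely illustrative (and the callback-based API assumes a reasonably recent LightGBM version).

    # Illustrative sketch only: constrain leaf-wise tree growth with num_leaves /
    # max_depth, and use "early stopping" in the GBM sense (stop boosting when
    # the validation metric stops improving).
    import lightgbm as lgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    model = lgb.LGBMClassifier(
        num_leaves=31,      # cap on the total number of leaves per tree
        max_depth=6,        # cap on depth, guards against very deep lopsided trees
        n_estimators=1000,  # upper bound; early stopping usually halts sooner
    )
    model.fit(
        X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        callbacks=[lgb.early_stopping(stopping_rounds=50)],  # stop after 50 rounds with no improvement
    )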
The way I understand it, in creating a random forest, the algorithm bundles a bunch of randomly generated decision trees together, weighting them such that they fit the training data.
Is it reasonable to say that this average of forests could be simplified into a simple decision tree? And, if so - how can I access and present this tree?
What I'm looking to do here is extract the information in the tree to help identify the leading attributes, their boundary values, and their placement in the tree. I'm assuming that such a tree would give a human (or a computer heuristic) insight into which attributes within a dataset best determine the target outcome.
This probably seems like a naive question, and if so, please be patient; I'm new to this and want to get to a stage where I understand it sufficiently.
RandomForest uses bootstrapping to create many training sets by sampling the data with replacement (bagging). Each bootstrapped set is very close to the original data, but slightly different, since it may contain multiple copies of some points while other points from the original data will be missing. (This creates a whole bunch of similar-but-different sets that as a whole represent the population your data came from, and allows better generalization.)
Then it fits a DecisionTree to each set. However, what a regular DecisionTree does at each step is loop over every feature, find the best split for each feature, and finally split on the feature that produced the best split among them all. In RandomForest, instead of looping over every feature to find the best split, you only try a random subsample of the features at each step (the default is sqrt(n_features)).
So every tree in a RandomForest is fit to a bootstrapped random training set. And at each branching step, it only looks at a subsample of features, so some of the branches will be good but not necessarily the ideal split. This means that each tree is a less-than-ideal fit to the original data. When you average the results of all these (sub-ideal) trees, though, you get a robust prediction. Regular DecisionTrees overfit the data; this two-way randomization (bagging and feature subsampling) allows them to generalize, and a forest usually does a good job.
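As a rough illustration of those two sources of randomness in scikit-learn (the parameter values here are arbitrary, not recommendations):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, n_features=16, random_state=0)

    forest = RandomForestClassifier(
        n_estimators=200,     # number of bootstrapped trees whose outputs are averaged
        bootstrap=True,       # sample rows with replacement for each tree (bagging)
        max_features="sqrt",  # consider only sqrt(n_features) candidate features per split
        random_state=0,
    ).fit(X, y)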
Here is the catch: while you can average the output of the trees, you cannot really "average the trees" to get an "average tree". Since a tree is a bunch of chained if-then statements, there is no way of taking these chains and producing a single chain whose result equals the average of the individual chains' results. Each tree in the forest is different; even if the same features show up, they show up in different places in the trees, which makes them impossible to combine. You cannot represent a RandomForest as a single tree.
There are two things you can do.
1) As RPresle mentioned, you can look at the .feature_importances_ attribute, which averages each feature's splitting score across the trees. The idea is that, while you can't get an average tree, you can quantify how much and how effectively each feature is used in the forest by averaging its score over the trees.
2) When I fit a RandomForest model and need some insight into what's happening and how the features affect the result, I also fit a single DecisionTree. This model is usually not good at all by itself; it will easily be outperformed by the RandomForest and I wouldn't use it to predict anything. But by drawing and looking at the splits in this tree, combined with the .feature_importances_ of the forest, I usually get a pretty good idea of the big picture.
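A small sketch of both suggestions, assuming scikit-learn; the shallow max_depth=3 explainer tree and the synthetic data are illustrative choices, not part of any prescribed recipe:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, plot_tree

    X, y = make_classification(n_samples=1000, n_features=16, random_state=0)
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # 1) Feature importances averaged over the trees in the forest.
    ranked = np.argsort(forest.feature_importances_)[::-1]
    for i in ranked[:5]:
        print(f"feature {i}: importance {forest.feature_importances_[i]:.3f}")

    # 2) A single shallow DecisionTree fit only for interpretation; the forest
    #    will usually outperform it, but its splits are easy to read.
    explainer = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    plot_tree(explainer, filled=True)
    plt.show()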
I have a simple question that I don't understand:
Why is the decision tree in scikit-learn a binary tree instead of an n-ary tree?
Does anyone know the answer? Please tell me, thank you so much.
This is better suited for the Cross Validated site, but the answer is simplicity. Any decision tree simply partitions the space, with each leaf node holding data and the prediction being a function of that data: typically the majority class for classification and the empirical average for regression.
However, every decision tree can be converted into a binary decision tree. Intuitively, if you have a rule at level 1 like (X1 < 1 AND X2 > 10), it can be converted into a two-level tree by pushing one part of the predicate down a level.
It is much simpler to train a binary decision tree than an n-ary one because of the combinatorial explosion that otherwise takes place. Instead of picking a splitting variable and then optimizing over that single field (a 1-D optimization), n-ary trees must select a subset of variables and optimize over that whole set.
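To make the "1-D optimization" point concrete, here is a hypothetical sketch of the exhaustive (feature, threshold) search a binary split performs at each node; the misclassification-count impurity is a simplification, not the exact criterion scikit-learn uses:

    import numpy as np

    def best_binary_split(X, y):
        """Try every (feature, threshold) pair and return the one with the lowest
        impurity. Each feature only needs a 1-D threshold search, which is what
        keeps binary splitting cheap compared with n-ary splits."""
        best = (None, None, np.inf)  # (feature index, threshold, impurity)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                if len(left) == 0 or len(right) == 0:
                    continue
                # misclassifications if each side predicts its majority class
                impurity = (len(left) - np.bincount(left).max()) \
                         + (len(right) - np.bincount(right).max())
                if impurity < best[2]:
                    best = (j, t, impurity)
        return best

The tree would then recurse on the two subsets defined by the chosen split; an n-ary tree instead has to optimize over combinations of values or variables at every node.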
If you look at the Wikipedia entry for k-d trees, you will see this illustration of points and planes that divides the 2D space into rectangles.
My question is how do I get the resultant set of rectangles? I thought that each 'path' to a leaf node might give me the bounds. Is there a general way to do this for N points at arbitrary depths?
Notice that what I am not asking for is a k-d tree of hyperrectangle structures, where the given input is a set of rectangles that can then be queried for range search, etc. My input is a set of random points, and I want to output the set of rectangles that 'tessellate' or completely subdivide the Cartesian space.
Thanks eh9 for the edit. Just to clarify the input is the k-d tree constructed from a set of random points, the output is the set of resulting rectangles.
And thanks to Jerdak for the 'trivial' solution:
Indeed, just walk down the tree starting at the root node, splitting the current rectangle along each node's splitting axis. The only additional piece of information needed is the outer bound of the original rectangle. Once all nodes are visited, you can return the complete set.
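A hypothetical sketch of that walk; the node fields (point, axis, left, right) are assumptions about how the k-d tree is stored, not any particular library's layout:

    def collect_rectangles(node, bounds, rects):
        """Walk a k-d tree, splitting the current bounding box at each node's
        splitting plane; the cells collected at the bottom tessellate the
        original outer bounds.

        `bounds` is a list of (low, high) pairs, one per dimension.
        Each node is assumed to have .point, .axis, .left and .right fields.
        """
        if node is None:
            rects.append(tuple(bounds))       # this cell is not split further
            return
        axis, cut = node.axis, node.point[node.axis]
        lo, hi = bounds[axis]
        left_bounds, right_bounds = list(bounds), list(bounds)
        left_bounds[axis] = (lo, cut)
        right_bounds[axis] = (cut, hi)
        collect_rectangles(node.left, left_bounds, rects)
        collect_rectangles(node.right, right_bounds, rects)

    # Hypothetical usage with the outer bound of the original space:
    # rects = []
    # collect_rectangles(root, [(0.0, 10.0), (0.0, 10.0)], rects)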
A lot of k-d trees actually store the bounding hyperrectangle of each subtree/leaf so that better pruning can be done in KNN searches. Note, these aren't rectangles that cover all of the space; rather, they leave gaps between leaves where there aren't any points. Personally I think they are cooler ;-)
Academically speaking, what's the essential difference between the Tree and Graph data structures? And what about tree-based search versus graph-based search?
A Tree is just a restricted form of a Graph.
Trees have direction (parent / child relationships) and don't contain cycles.
They fit within the category of Directed Acyclic Graphs (DAGs).
So Trees are DAGs with the restriction that a child can only have one parent.
One thing that is important to point out: Trees aren't a recursive data structure.
They cannot be implemented as a recursive data structure because of the above restrictions. But any DAG implementation, which is generally not recursive, can also be used.
My preferred Tree implementation is a centralized map representation and is non recursive.
Graphs are generally searched breadth-first or depth-first. The same applies to trees.
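A quick sketch of both searches over an adjacency-list graph (the node names are arbitrary); since a tree is just a restricted graph, the same code traverses a tree stored the same way:

    from collections import deque

    graph = {            # adjacency list; works for trees and general graphs alike
        "A": ["B", "C"],
        "B": ["D"],
        "C": ["D"],
        "D": [],
    }

    def bfs(start):
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in graph[node]:
                if nxt not in seen:   # the `seen` set is what copes with cycles
                    seen.add(nxt)
                    queue.append(nxt)
        return order

    def dfs(start, seen=None):
        seen = seen if seen is not None else set()
        seen.add(start)
        order = [start]
        for nxt in graph[start]:
            if nxt not in seen:
                order.extend(dfs(nxt, seen))
        return order

    print(bfs("A"))  # ['A', 'B', 'C', 'D']
    print(dfs("A"))  # ['A', 'B', 'D', 'C']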
Instead of explaining I prefer to show it in pictures.
A tree in real life:
A graph in real-life use:
Yes a map can be visualised as a graph data structure.
Seeing them like this makes life easier. Trees are used where we know that each node has only one parent, but nodes in a graph can have multiple predecessors (the term "parent" is generally not used for graphs).
In the real world, you can represent almost anything using graphs. I used a map, for example. If you consider each city as a node, it can be reached from multiple points. The points that lead to a node are called its predecessors, and the points that this node leads to are called its successors.
An electrical circuit diagram, the plan of a house, a computer network, or a river system are a few more examples of graphs. Many real-world things can be modelled as graphs.
A technical diagram could look like this:
Tree:
Graph:
Make sure to refer to the links below; they will answer almost all your questions on trees and graphs.
References :
http://www.introprogramming.info/english-intro-csharp-book/read-online/chapter-17-trees-and-graphs/#_Toc362296541
http://www.community-of-knowledge.de/beitrag/data-trees-as-a-means-of-presenting-complex-data-analysis/
Wikipedia
The other answers are useful, but they're missing the properties of each:
Graph
Undirected graph, image source: Wikipedia
Directed graph, image source: Wikipedia
Consists of a set of vertices (or nodes) and a set of edges connecting some or all of them
Any edge can connect any two vertices that aren't already connected by an identical edge (in the same direction, in the case of a directed graph)
Doesn't have to be connected (the edges don't have to connect all vertices together): a single graph can consist of a few disconnected sets of vertices
Could be directed or undirected (which would apply to all edges in the graph)
As per Wikipedia:
For example, if the vertices represent people at a party, and there is an edge between two people if they shake hands, then this graph is undirected because any person A can shake hands with a person B only if B also shakes hands with A. In contrast, if any edge from a person A to a person B corresponds to A admiring B, then this graph is directed, because admiration is not necessarily reciprocated.
Tree
Image source: Wikipedia
A type of graph
Vertices are more commonly called "nodes"
Edges are directed and represent an "is child of" (or "is parent of") relationship
Each node (except the root node) has exactly one parent (and zero or more children)
Has exactly one "root" node (if the tree has at least one node), which is a node without a parent
Has to be connected
Is acyclic, meaning it has no cycles: "a cycle is a path [AKA sequence] of edges and vertices wherein a vertex is reachable from itself"
There is some overlap in the above properties. Specifically, the last two properties are implied by the rest of the properties. But all of them are worth noting nonetheless.
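As a rough illustration of how the "connected" and "acyclic" properties can be checked, here is a sketch over an undirected adjacency list (the function name and the toy inputs are mine, not from the answer above):

    from collections import deque

    def is_tree(adj):
        """Return True if the undirected graph given as an adjacency list is
        connected and has no cycles, i.e. satisfies the tree properties above."""
        if not adj:
            return True
        start = next(iter(adj))
        seen = {start}
        queue = deque([(start, None)])            # (node, node we arrived from)
        while queue:
            node, parent = queue.popleft()
            for nxt in adj[node]:
                if nxt == parent:
                    continue                      # ignore the edge we came in on
                if nxt in seen:
                    return False                  # reached a visited node: cycle
                seen.add(nxt)
                queue.append((nxt, node))
        return len(seen) == len(adj)              # everything reachable: connected

    print(is_tree({1: [2, 3], 2: [1], 3: [1]}))        # True
    print(is_tree({1: [2, 3], 2: [1, 3], 3: [1, 2]}))  # False (triangle cycle)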
TREE:
1. Only one path exists between two vertices (nodes).
2. The root node is the starting node of the tree.
3. A tree doesn't have loops.
4. Number of edges: n-1 (where n is the number of nodes).
5. A tree looks hierarchical.
6. All trees are graphs.
GRAPH:
1. More than one path is allowed between two vertices.
2. There is no root-node concept (we can start from any node).
3. There can be loops in a graph.
4. The number of edges is not fixed.
5. A graph looks like a network.
6. Not all graphs are trees.
A more detailed explanation can be found in this video -> https://www.youtube.com/watch?v=KVHrjVTp9_w
A tree is a special form of graph: a minimally connected graph with only one path between any two vertices.
In a graph there can be more than one path between two nodes, i.e. a graph can have unidirectional or bidirectional paths (edges) between nodes.
You can also find more details here:
http://freefeast.info/difference-between/difference-between-trees-and-graphs-trees-vs-graphs/
A tree is basically an undirected graph that contains no cycle, so we can say that a tree is a more restricted form of graph.
However, trees and graphs have different applications when implementing various algorithms in programming.
For example, a graph can be used to model a road map, while a tree can be used to implement a hierarchical data structure.
The simple idea is that a tree has no cycles and is unidirectional, whereas a graph can form cycles and may be bidirectional in some cases and unidirectional in others.
A tree is a digraph such that:
a) with edge directions removed, it is connected and acyclic
- You can remove either the assumption that it is acyclic
- If it is finite, you can alternatively remove the assumption that it is connected
b) every vertex but one, the root, has indegree 1
c) the root has indegree 0
If there are only finitely many nodes, you can remove either the assumption that the root has indegree 0 or the assumption that the nodes other than the root have indegree 1.
Reference: http://www.cs.cornell.edu/courses/cs2800/2016sp/lectures/lec27-29-graphtheory.pdf
Trees are obvious: they're recursive data structures consisting of nodes with children.
Maps (aka dictionaries) are key/value pairs. Give a map a key and it will return the associated value.
Maps can be implemented using trees, I hope you don't find that confusing.
UPDATE: I confused "graph" with "map" above, so let me address graphs directly.
Graphs are more complex than trees. Trees imply recursive parent/child relationships. There are natural ways to traverse a tree: depth-first, breadth-first, level-order, etc.
Graphs can have uni-directional or bi-directional paths between nodes, be cyclic or acyclic, etc. I would consider graphs to be more complex.
I think a cursory search in any decent data structures text (e.g. The Algorithm Design Manual) would give more and better information than any number of SO answers. I would recommend that you not take the passive route, and instead do some research for yourself.
A tree has one root node, and each child has only one parent. In a graph, however, there is no concept of a root node. Another difference is that a tree is a hierarchical model while a graph is a network model.
In a tree, each node (except the root node) has exactly one predecessor node and zero or more successor nodes. A tree can be traversed using in-order, pre-order, post-order, and breadth-first traversals. A tree is a special kind of graph that has no cycles, so when its edges are directed it is a directed acyclic graph (DAG). A tree is a hierarchical model.
In a graph, each node can have any number of predecessor and successor nodes. A graph is traversed using the depth-first search (DFS) and breadth-first search (BFS) algorithms. A graph may have cycles, so it is more complex than a tree. A graph is a network model. There are two kinds of graphs: directed and undirected.
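A tiny sketch of the tree traversals named above on a binary tree; the Node class and the example tree are illustrative:

    class Node:
        def __init__(self, value, left=None, right=None):
            self.value, self.left, self.right = value, left, right

    def preorder(n):        # node, then left subtree, then right subtree
        if n is None:
            return []
        return [n.value] + preorder(n.left) + preorder(n.right)

    def inorder(n):         # left subtree, then node, then right subtree
        if n is None:
            return []
        return inorder(n.left) + [n.value] + inorder(n.right)

    def postorder(n):       # left subtree, then right subtree, then node
        if n is None:
            return []
        return postorder(n.left) + postorder(n.right) + [n.value]

    root = Node(1, Node(2, Node(4), Node(5)), Node(3))
    print(preorder(root))   # [1, 2, 4, 5, 3]
    print(inorder(root))    # [4, 2, 5, 1, 3]
    print(postorder(root))  # [4, 5, 2, 3, 1]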
In mathematics, a graph is a representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges.[1] Typically, a graph is depicted in diagrammatic form as a set of dots for the vertices, joined by lines or curves for the edges. Graphs are one of the objects of study in discrete mathematics.