Presentation of a CHAID decision tree - statistics

Would anyone please tell me how to effectively present a large CHAID decision tree with 4 levels and 550 nodes? Any sample manuscripts are appreciated. I can't figure out an effective way to present the tree in a manuscript. Thanks very much.


Using Learning To Rank on textual documents?

I need some help implementing Learning to Rank (LTR). It is for my semester project and I'm totally new to this. The details are as follows:
I gathered around 90 documents and created 10 user queries. Now I have to rank these documents for each query using three algorithms: LambdaMART, AdaRank, and Coordinate Ascent. Previously I applied clustering techniques on a Vector Space Model, but that was easy. In this case, however, I don't know how to transform the data for these algorithms. My textual data (documents and queries) is stored in txt format in separate files. I have searched online but couldn't find a proper solution, so could anyone here please guide me in the right direction, i.e. the steps? I would really appreciate it.
As you said, you have already applied clustering in the vector space model. The inputs of these algorithms are also vectors.
Why don't you have a look at the standard dataset introduced for the learning-to-rank problem (the LETOR benchmark), in which documents are represented as vectors of features?
There are also Java implementations of these algorithms (RankLib), which may give you an idea of how to solve the problem. I hope this helps!
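Following that idea, here is a minimal Python sketch of turning the txt documents and queries into the LETOR-style text format that RankLib reads: one line per query-document pair, `label qid:<n> 1:<f1> 2:<f2> ... # comment`, grouped by query. The whitespace tokenizer, the three features (query-term frequency, a TF-IDF sum, document length) and the function names are illustrative assumptions; the relevance labels must come from your own judgments, and the real LETOR data uses many more features.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def document_frequencies(docs):
    """Count in how many documents each term appears."""
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return df


def pair_features(query, doc_text, df, n_docs):
    """Three illustrative features for one (query, document) pair."""
    counts = Counter(tokenize(doc_text))
    q_terms = tokenize(query)
    tf_sum = sum(counts[t] for t in q_terms)
    # Smoothed IDF so that unseen terms do not divide by zero.
    tfidf_sum = sum(counts[t] * math.log((n_docs + 1) / (df[t] + 1)) for t in q_terms)
    doc_len = sum(counts.values())
    return [tf_sum, tfidf_sum, doc_len]


def to_letor_lines(queries, docs, judgments):
    """queries: {int qid: text}, docs: {doc_id: text}, judgments: {(qid, doc_id): label}."""
    df = document_frequencies(docs)
    lines = []
    for qid, query in queries.items():  # lines stay grouped by query
        for doc_id, text in docs.items():
            label = judgments.get((qid, doc_id), 0)  # unjudged pairs treated as non-relevant
            feats = pair_features(query, text, df, len(docs))
            feat_str = " ".join(f"{i}:{v:.4f}" for i, v in enumerate(feats, start=1))
            lines.append(f"{label} qid:{qid} {feat_str} # {doc_id}")
    return lines
```

For example, `to_letor_lines({1: "neural ranking"}, {"d1": "neural ranking models", "d2": "cooking recipes"}, {(1, "d1"): 2})` yields one line per document for query 1. Written to a file, these lines can serve as RankLib training data; check the RankLib documentation for the option that selects LambdaMART, AdaRank or Coordinate Ascent.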

Decision nodes and chance nodes definition in decision tree

Could someone please provide a definition of decision nodes, chance nodes and end nodes?
I have read the decision tree article on Wikipedia and haven't found a clear definition of these three kinds of node.
Thanks!
The way I understand it is like this:
Decision Nodes:
A node where a set requirement determines the outcome, e.g. Profit > 50k.
Chance Node:
A node where there isn't a set requirement that determines which branch to take; instead, each branch simply has a probability of happening, e.g. a 50% chance of success or failure of a business.
End Nodes:
The end of a branch: something feeds into it, but nothing comes out of it. It is usually a result, e.g. "Business is successful".
DECISION NODE: These are variables included in an influence diagram or decision tree. They are points where decisions have to be made, and they are usually depicted by a square or rectangle. by Master E. Felix.
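To make the three node types concrete, here is a minimal Python sketch of the usual "rollback" evaluation of a decision tree: an end node returns its payoff, a chance node returns the probability-weighted average of its branches, and a decision node takes its best option. The class names, the `rollback`/`best_option` helpers and the business payoffs are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class End:
    """End node: nothing comes out of it, it just holds a result (payoff)."""
    payoff: float


@dataclass
class Chance:
    """Chance node: each branch happens with a probability, nobody chooses."""
    branches: List[Tuple[float, "Node"]]  # (probability, child)


@dataclass
class Decision:
    """Decision node: the decision-maker picks one of the options."""
    options: Dict[str, "Node"]


Node = Union[End, Chance, Decision]


def rollback(node):
    """Expected value of a (sub)tree, evaluated from the end nodes back."""
    if isinstance(node, End):
        return node.payoff
    if isinstance(node, Chance):
        return sum(p * rollback(child) for p, child in node.branches)
    if isinstance(node, Decision):
        return max(rollback(child) for child in node.options.values())
    raise TypeError(f"unknown node type: {node!r}")


def best_option(decision):
    """Name of the option with the highest expected value at a decision node."""
    return max(decision.options, key=lambda name: rollback(decision.options[name]))
```

For example, deciding whether to launch a business with a 50% chance of a payoff of 100 and a 50% chance of -40, versus staying out for 0, the launch branch is worth 0.5 * 100 + 0.5 * (-40) = 30, so the decision node picks "launch".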

How to find maximum independent set of a directed acyclic graph?

Say we have a graph that is similar to a linked list (or, more generally, a directed acyclic graph). An independent set is a set of nodes no two of which share an edge. If each node is weighted, how can we calculate the maximum possible total weight of an independent set? I understand we have to use dynamic programming, so I have a slight clue, but I'm hoping someone could explain how they would approach it. Thank you!
I believe that this problem is NP-hard for arbitrary directed acyclic graphs. The corresponding problem for undirected graphs is known to be NP-hard, and it can be converted into the directed version by directing all of the edges so that the resulting graph is a DAG (for example, number the vertices and point every edge from its lower-numbered endpoint to its higher-numbered one). Any independent set in the original graph is an independent set in the directed graph and vice versa, so any algorithm for the directed case also solves the undirected case.
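A minimal Python sketch of the orientation step, assuming the vertices are numbered 0..n-1: directing every edge from the lower-numbered to the higher-numbered endpoint cannot create a cycle, because every directed path strictly increases the vertex number. The helper names are hypothetical; Kahn's algorithm is used only to confirm acyclicity, and the last helper shows that independence depends only on which pairs are adjacent, not on edge direction.

```python
from collections import deque


def orient_as_dag(undirected_edges):
    """Direct each edge from the lower-numbered to the higher-numbered vertex."""
    return [(min(u, v), max(u, v)) for u, v in undirected_edges]


def is_dag(n, directed_edges):
    """Kahn's algorithm: the graph is acyclic iff every vertex gets removed."""
    indegree = [0] * n
    adjacency = [[] for _ in range(n)]
    for u, v in directed_edges:
        adjacency[u].append(v)
        indegree[v] += 1
    queue = deque(i for i in range(n) if indegree[i] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in adjacency[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed == n


def is_independent(nodes, edges):
    """True if no edge (in either direction) joins two chosen nodes."""
    chosen = set(nodes)
    return not any(u in chosen and v in chosen for u, v in edges)
```

For a triangle 0-1-2 with an extra edge 2-3, the oriented edges are (0, 1), (1, 2), (0, 2), (2, 3), which form a DAG, and a set such as {0, 3} is independent in both versions of the graph.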
Your question talks about solving this problem on a linked list. If you're solving it just for linked lists, there is a polynomial-time (in fact, linear-time) solution using dynamic programming. As a hint: if you choose a node in the list, you have to skip the next node and then maximize what remains; if you don't choose the node, you just maximize the value of the rest of the list. Taking the better of these two options and evaluating this bottom-up gives a fast DP algorithm.
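The hint translates directly into a bottom-up table. In this Python sketch (the function name is hypothetical), `best[i]` is the maximum total weight obtainable from nodes i through n-1, computed with the recurrence best[i] = max(w[i] + best[i+2], best[i+1]); a second pass recovers which nodes were chosen. It runs in O(n) time and returns 0 with no nodes chosen when all weights are negative.

```python
def max_weight_independent_set_path(weights):
    """Max-weight independent set on a path/linked list; returns (value, chosen indices)."""
    n = len(weights)
    best = [0] * (n + 2)  # two sentinel zeros past the end of the list
    for i in range(n - 1, -1, -1):
        take = weights[i] + best[i + 2]  # choose node i, so node i+1 must be skipped
        skip = best[i + 1]               # don't choose node i
        best[i] = max(take, skip)

    # Walk forward again, repeating each decision to recover the chosen nodes.
    chosen, i = [], 0
    while i < n:
        if weights[i] + best[i + 2] >= best[i + 1]:
            chosen.append(i)
            i += 2
        else:
            i += 1
    return best[0], chosen
```

For example, with weights [3, 2, 5, 10, 7] the best value is 15, obtained by choosing nodes 0, 2 and 4.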
Hope this helps!

Fusion Tables: Polygon not displayed as of certain zoom level

I'm working on a map that shows various population statistics at a rather granular level in Berlin (447 sub-districts).
https://www.google.com/fusiontables/data?docid=1tIAPGaYK1iEWWLANQOupkAqCcPhVauMjdPS1qOs#map:id=3
For some reason, a small number of polygons (3) are not displayed as soon as you zoom in to level 12 or higher.
As the polygons are displayed at the previous zoom level, they should have the proper coordinates. I first thought the shapefiles (KMLs provided by the local statistics authority) might be buggy, but that does not seem to be the case.
Can anybody explain to me why this happens?
Thank you very much!
Michael
There are two possibilities that I can think of:
It is a complexity problem or a winding-direction issue with the polygon (see the thread on the Fusion Tables Users Group discussing this issue).
It is a complexity issue with the number of "features" on the tile. See Limits in the documentation; it used to be more clearly defined.
Reversing the winding direction of two of the problem polygons seems to fix the issue:
https://www.google.com/fusiontables/DataSource?snapid=S787935DQC4
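If you want to try the winding-direction fix on the KML geometry yourself, a ring's orientation can be checked with the shoelace formula. This Python sketch treats lon/lat as planar coordinates (adequate for a sign check on small polygons), drops any altitude values, and uses hypothetical helper names. Which direction Fusion Tables expects is an assumption you would need to verify, so the helper just flips a ring to whichever direction you request.

```python
def parse_kml_coordinates(text):
    """Parse a KML coordinates string ('lon,lat[,alt] ...') into (lon, lat) pairs.

    Altitude, if present, is dropped (assumption: 2-D polygons only).
    """
    points = []
    for token in text.split():
        parts = token.split(",")
        points.append((float(parts[0]), float(parts[1])))
    return points


def signed_area(ring):
    """Shoelace formula on raw lon/lat: > 0 means counter-clockwise, < 0 clockwise."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wraps around; a closed ring adds a zero term
        total += x1 * y2 - x2 * y1
    return total / 2.0


def with_winding(ring, counter_clockwise=True):
    """Return the ring in the requested direction, reversing it if necessary."""
    is_ccw = signed_area(ring) > 0
    return ring if is_ccw == counter_clockwise else list(reversed(ring))


def format_kml_coordinates(ring):
    """Serialize (lon, lat) pairs back into a KML coordinates string."""
    return " ".join(f"{lon},{lat}" for lon, lat in ring)
```

For example, the unit square "0,0 1,0 1,1 0,1 0,0" has a signed area of 1.0 (counter-clockwise); requesting the clockwise direction reverses it to "0.0,0.0 0.0,1.0 1.0,1.0 1.0,0.0 0.0,0.0".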

How can I select layer by location AND attributes in ArcMap?

I have a dataset (a shapefile) containing spatial location data (coordinates) and elevation data, as well as other attribute fields.
I want to select points that have at least 200 m of vertical separation (i.e. are at least 200 m apart on the z-axis) AND are within 3 km of each other.
The aim is to create a new shapefile with all points that have this relationship with 1 or more other points.
I'm sure there is a solution to this problem (maybe not using ArcMap at all?), but I just can't find it. Any help would be greatly appreciated.
Chris
You are going to have much better luck asking this question on gis.stackexchange.com; there are many more ESRI users/programmers there. In fact, I bet you'll find your solution there without having to ask the question.
You can run the ArcGIS Near tool on all the points.
Then use Select By Attributes to select the points whose Z difference is > 200 m and whose distance value is < 3000 m.
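Since ArcMap is not a hard requirement, here is a minimal brute-force Python sketch of the selection itself: it compares every pair of points and keeps both members of any pair that lies within 3 km horizontally and at least 200 m apart vertically. It assumes the points have already been read out of the shapefile as (id, x, y, z) tuples in a projected coordinate system with metre units, and that "within 3 km" means horizontal distance; the function name is hypothetical. Because it checks all pairs, a point qualifies even if its partner is not its nearest neighbour.

```python
import math


def points_with_partner(points, max_horizontal=3000.0, min_vertical=200.0):
    """Return the ids of points that have at least one qualifying partner.

    points: list of (id, x, y, z) tuples, x/y/z in metres (assumed projected CRS).
    A pair qualifies if its horizontal distance is <= max_horizontal
    and its elevation difference is >= min_vertical.
    """
    selected = set()
    for i in range(len(points)):
        id_a, xa, ya, za = points[i]
        for j in range(i + 1, len(points)):  # each unordered pair checked once
            id_b, xb, yb, zb = points[j]
            close_enough = math.hypot(xb - xa, yb - ya) <= max_horizontal
            far_apart_vertically = abs(zb - za) >= min_vertical
            if close_enough and far_apart_vertically:
                selected.update((id_a, id_b))
    return selected
```

This is O(n²) in the number of points; for large datasets, a spatial index such as SciPy's `cKDTree.query_pairs(r)` can generate only the candidate pairs within 3 km before the elevation check. The selected ids can then be used to export the matching points to a new shapefile.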
