Leaf ordering in scikit-learn

I am constructing a decision tree in scikit-learn, and the tree is missing leaf #2. Why is that? Here is my example:
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz
def leaf_ordering():
    X = np.genfromtxt('X.csv', delimiter=',')
    Y = np.genfromtxt('Y.csv', delimiter=',')
    dt = DecisionTreeClassifier(min_samples_leaf=100, random_state=99)
    dt.fit(X, Y)
    print(set(dt.apply(X)))
leaf_ordering()
link to file X
link to file Y
Here is the output: {1, 3, 4}. As you can see, there is no leaf #2.

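Nodes 0 and 2 in your example are both non-leaf nodes. dt.apply returns the index of the leaf each sample ends up in, and those indices are shared with the internal nodes, so the leaf indices are not consecutive. In my example below, you can see from the export that 0, 1, and 4 are all internal tree nodes and 2, 3, 5, and 6 are the leaves, so every prediction lands in one of those 4.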
In [35]: X = np.random.random([100, 5])
In [36]: y = X.sum(axis=1) + np.random.random(100)
In [37]: dt = DecisionTreeRegressor(max_depth=2)
In [38]: dt.fit(X, y)
Out[38]:
DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False, random_state=None,
splitter='best')
In [39]: dt.apply(X)
Out[39]:
array([6, 3, 3, 3, 6, 6, 3, 6, 3, 6, 2, 3, 3, 5, 3, 5, 5, 6, 3, 3, 3, 3, 3,
3, 3, 6, 6, 3, 3, 3, 3, 5, 3, 5, 3, 3, 3, 3, 2, 3, 3, 3, 6, 3, 3, 3,
3, 6, 3, 5, 2, 3, 3, 6, 3, 3, 3, 3, 3, 6, 6, 3, 6, 6, 3, 5, 6, 3, 3,
3, 3, 6, 3, 3, 2, 3, 6, 2, 6, 2, 3, 3, 6, 2, 5, 6, 3, 3, 3, 6, 5, 3,
3, 3, 6, 6, 3, 3, 6, 5])
In [40]: export_graphviz(dt)
In [41]: !cat tree.dot
digraph Tree {
node [shape=box] ;
0 [label="X[2] <= 0.7003\nmse = 0.4442\nsamples = 100\nvalue = 3.0586"] ;
1 [label="X[4] <= 0.1842\nmse = 0.3332\nsamples = 65\nvalue = 2.8321"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label="mse = 0.0426\nsamples = 7\nvalue = 1.9334"] ;
1 -> 2 ;
3 [label="mse = 0.2591\nsamples = 58\nvalue = 2.9406"] ;
1 -> 3 ;
4 [label="X[0] <= 0.3576\nmse = 0.3782\nsamples = 35\nvalue = 3.4791"] ;
0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
5 [label="mse = 0.1212\nsamples = 10\nvalue = 2.9395"] ;
4 -> 5 ;
6 [label="mse = 0.3179\nsamples = 25\nvalue = 3.695"] ;
4 -> 6 ;
}
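The numbering in the export above can be reproduced without scikit-learn. The following is only a sketch: it assumes node ids are handed out in depth-first pre-order (which is what the export suggests), and the helper number_preorder and the shape assumed for the question's tree are hypothetical, inferred from the {1, 3, 4} output.

```python
def number_preorder(tree):
    """Assign ids in depth-first pre-order; return (leaf_ids, internal_ids).

    A leaf is written as None, a split as a (left, right) tuple.
    """
    leaves, internal = [], []
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = counter          # every node, leaf or not, consumes an id
        counter += 1
        if node is None:
            leaves.append(node_id)
        else:
            internal.append(node_id)
            left, right = node
            visit(left)
            visit(right)

    visit(tree)
    return leaves, internal


# Assumed shape of the question's tree: the root splits into a leaf and a second split.
question_tree = (None, (None, None))
# Shape of the depth-2 tree in the export above.
answer_tree = ((None, None), (None, None))

print(number_preorder(question_tree))  # ([1, 3, 4], [0, 2])
print(number_preorder(answer_tree))    # ([2, 3, 5, 6], [0, 1, 4])
```

On a fitted estimator the same information should be readable from dt.tree_.children_left, where leaves are marked with -1.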

Related

Apply if statement on multiple lists with multiple conditions

I would like to append ids to a list which meet a specific condition.
output = []
areac = [4, 4, 4, 4, 1, 6, 7, 8, 9, 6, 10, 11]
arean = [1, 1, 1, 4, 5, 6, 7, 8, 9, 10, 10, 10]
id = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dist = [2, 2, 2, 4, 5, 6, 7.2, 5, 5, 5, 8.5, 9.1]
for a, b, c, d in zip(areac, arean, id, dist):
    if a >= 5 and b == b and d >= 3:
        output.append(c)
        print(comp)
    else:
        pass
The condition is the following:
- areacount has to be >= 5
- At least 3 ids with a distance of >= 3 with the same area_number
So the id output should be [10, 11, 12]. I already tried a different approach with Counter, but it didn't work out. Thanks for your help!
Here you go:
I changed the list names to something more descriptive.
output = []
area_counts = [4, 4, 4, 4, 1, 6, 7, 8, 9, 6, 10, 11]
area_numbers = [1, 1, 1, 4, 5, 6, 7, 8, 9, 10, 10, 10]
ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
distances = [2, 2, 2, 4, 5, 6, 7.2, 5, 5, 5, 8.5, 9.1]
temp_numbers, temp_ids = [], []
# First pass: keep only the entries that satisfy the per-item conditions.
for count, number, id_, distance in zip(area_counts, area_numbers, ids, distances):
    if count >= 5 and distance >= 3:
        temp_numbers.append(number)
        temp_ids.append(id_)
# Second pass: keep ids whose area number occurs at least 3 times among them.
for number, id_ in zip(temp_numbers, temp_ids):
    if temp_numbers.count(number) >= 3:
        output.append(id_)
output will be:
[10, 11, 12]
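Since the question mentions an attempt with Counter, here is one way that variant might look. It is a sketch on the same data, not the asker's original attempt; the names qualifying and per_area are made up for illustration.

```python
from collections import Counter

area_counts = [4, 4, 4, 4, 1, 6, 7, 8, 9, 6, 10, 11]
area_numbers = [1, 1, 1, 4, 5, 6, 7, 8, 9, 10, 10, 10]
ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
distances = [2, 2, 2, 4, 5, 6, 7.2, 5, 5, 5, 8.5, 9.1]

# Entries that pass the per-item conditions, kept as (area_number, id) pairs.
qualifying = [
    (number, id_)
    for count, number, id_, distance in zip(area_counts, area_numbers, ids, distances)
    if count >= 5 and distance >= 3
]

# How many qualifying entries each area number has.
per_area = Counter(number for number, _ in qualifying)

# Keep the ids whose area number has at least 3 qualifying entries.
output = [id_ for number, id_ in qualifying if per_area[number] >= 3]
print(output)  # [10, 11, 12]
```

Counting once with Counter avoids calling list.count inside the loop, which matters only for long lists.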

When I use set(list_a + list_b) it returns a dictionary. Do sets naturally return dictionaries?

I'm doing some beginner python exercises and one of them is to remove duplicates from a list. I've successfully done it, but the strange thing is that it is returning a dictionary instead of a list.
This is my code.
import random
a = []
b = []
for i in range(0, 20):
    n = random.randint(0, 10)
    a.append(n)
for i in range(0, 20):
    n = random.randint(0, 10)
    b.append(n)
print(sorted(a))
print(sorted(b))
c = set(list(a + b))
print(c)
and this is what it's spitting out
[0, 0, 1, 1, 1, 1, 2, 3, 4, 4, 6, 6, 7, 7, 7, 8, 9, 9, 10, 10]
[0, 1, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 10, 10, 10]
{0, 1, 2, 3, 4, 6, 7, 8, 9, 10}
thanks in advance!
{0, 1, 2, 3, 4, 6, 7, 8, 9, 10} is a set, not a dictionary. A dictionary would be printed as {key: value, key: value, ...}.
Try print(type(c)) and you'll see that it prints <class 'set'> rather than <class 'dict'>.
Also try the following:
s = {1,2,3}
print(type(s))
d = {'a':1,'b':2,'c':3}
print(type(d))
You'll see that the two types are different.
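Because the exercise asks for a list, the set can be converted back once the duplicates are gone. A small sketch with made-up input values (the variable names unique and as_list are only for illustration); it also shows the one case where braces do give a dictionary, the empty literal {}.

```python
a = [0, 1, 1, 2]
b = [2, 3, 3]

unique = set(a + b)        # duplicates removed, but this is a set
print(type(unique))        # <class 'set'>

as_list = sorted(unique)   # sorted() always returns a list
print(as_list)             # [0, 1, 2, 3]

print(type({}))            # <class 'dict'>  (empty braces are a dict)
print(type(set()))         # <class 'set'>   (an empty set needs set())
```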

taking averages of certain indices using another list in python

I was hoping to take the average of a list using another list with the start and stop indices.
for example:
a = [3, 9]
b = [0, 1, 12, 9, 0, 8, 9, 3, 3, 5, 7, 1, 4, 6, 6]
I want to take the average of the numbers from b[3] to b[9] and this is what I have so far
counter = a[0]
sum = b[counter]
while counter < a[1] + 1:
    counter += 1
    sum = sum + b[counter]
denominator = a[1] - a[0] + 1
avg = sum/denominator
But after checking, it seems to be giving the wrong result.
You could use statistics.mean:
from statistics import mean
a = [3, 9]
b = [0, 1, 12, 9, 0, 8, 9, 3, 3, 5, 7, 1, 4, 6, 6]
mean(b[a[0]: a[1] + 1])
or you could use:
sum(b[a[0]: a[1] + 1]) / len(b[a[0]: a[1] + 1])
I would suggest the following:
from statistics import mean
a = [3, 9]
b = [0, 1, 12, 9, 0, 8, 9, 3, 3, 5, 7, 1, 4, 6, 6]
avg = mean(b[a[0]:a[1]+1])
print(avg)
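As for why the loop in the question is off: it increments counter before adding, so the last pass adds b[a[1] + 1] (here b[10]) on top of the intended range. A sketch of a corrected loop, using total instead of sum so the built-in is not shadowed:

```python
a = [3, 9]
b = [0, 1, 12, 9, 0, 8, 9, 3, 3, 5, 7, 1, 4, 6, 6]

start, stop = a
total = 0
# range(start, stop + 1) visits indices 3..9 inclusive and nothing beyond.
for index in range(start, stop + 1):
    total += b[index]

avg = total / (stop - start + 1)
print(total)  # 37
print(avg)
```

This gives the same value as the slice-based versions above, since b[3:10] sums to 37 over 7 elements.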

Python code to print elements from list that meet certain criteria [duplicate]

I have a list with 22 integers (ranging from 1 through 9) and want to create/ print a new list containing only those integers that are above 5.
This is what I have tried so far - the result (obviously) is that 'the_list' gets printed multiple times - i.e. the number of times = the number of instances above 5.
the_list = [1, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7]
print(the_list)
k = 5
tl2 = []
for i in the_list:
    if i > k:
        tl2.append(the_list)
Try this code:
>>> the_list = [1, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7]
>>> print(the_list)
[1, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7]
>>> the_filtered_list = list(filter(lambda x: x > 5, the_list))
>>> print(the_filtered_list)
[7, 6, 6, 7, 6, 6, 7]
See the documentation for filter and lambda.
EDIT:
Another option is to use a generator expression:
>>> the_filtered_list = list(i for i in the_list if i > 5)
>>> print(the_filtered_list)
[7, 6, 6, 7, 6, 6, 7]
See the documentation on generator expressions and list comprehensions.
EDIT:
My initial answer was indeed slow and memory inefficient. Here is the comparison of several possibilities. Which one to choose depends on how big the list is and what it is used for later.
>>> import random
>>> import timeit
>>> import sys
>>>
>>> the_list = [random.randrange(1, 10) for _ in range(100)]
>>>
>>> timeit.timeit('filter(lambda x: x > 5, the_list)', setup=f'the_list = {the_list}')
0.15890196000000856
>>> timeit.timeit('[i for i in the_list if i > 5]', setup=f'the_list = {the_list}')
2.633208761999981
>>> timeit.timeit('(i for i in the_list if i > 5)', setup=f'the_list = {the_list}')
0.227755295999998
>>>
>>> timeit.timeit('list(filter(lambda x: x > 5, the_list))', setup=f'the_list = {the_list}')
7.5565902380000125
>>> timeit.timeit('list(i for i in the_list if i > 5)', setup=f'the_list = {the_list}')
3.599053368
>>>
>>> sys.getsizeof(filter(lambda x: x > 5, the_list))
64
>>> sys.getsizeof([i for i in the_list if i > 5])
440
>>> sys.getsizeof((i for i in the_list if i > 5))
128
>>>
>>> sys.getsizeof(list(filter(lambda x: x > 5, the_list)))
480
>>> sys.getsizeof(list(i for i in the_list if i > 5))
480
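One thing to keep in mind when reading the timings above: a bare filter(...) or generator expression is lazy, so timing it measures only the creation of the object, not the filtering itself. The sketch below illustrates this with a hypothetical predicate, above_five, that records each call.

```python
the_list = [1, 7, 6, 5, 4, 3, 2]

calls = []

def above_five(x):
    calls.append(x)        # remember every element the predicate is asked about
    return x > 5

lazy = filter(above_five, the_list)
print(len(calls))          # 0, nothing has been filtered yet

result = list(lazy)        # consuming the iterator does the actual work
print(len(calls))          # 7, one call per element
print(result)              # [7, 6]
```

That is why the list(...) variants are the ones comparable to the list comprehension.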
The problem is that you are appending the whole list, not the number 'i':
the_list = [1, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7]
print(the_list)
k = 5
tl2 = []
# the_list refers to the entire list
# i is an element in the list
for i in the_list:
    if i > k:
        # append the number 'i' if it is greater than k
        tl2.append(i)
print(tl2)
Try this solution:
the_list = [1, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 3, 4, 5, 6, 7]
print(the_list)
k = 5
new_ls = [x for x in the_list if x > k]
print(new_ls)

Creating all the possible combinations inside an element of list

I need to explore every permutation of a list. Let's say I have this variable:
samplelist = [1, 2, 3, 4, 5, 6, 7, 8, 9]
An example output would be:
output = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 3, 2, 4, 5, 6, 7, 8, 9], [1, 3, 4, 2, 5, 6, 7, 8, 9], [1, 3, 5, 3, 2, 6, 7, 8, 9]] .... and so on.
Here's what I did:
import itertools
samplelist = [1, 2, 3, 4, 5, 6, 7, 8, 9]
def combinations(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = range(r)
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)
list(combinations_with_replacement(samplelist, 9))
Since the length of the list is 9, there are 9! = 362,880 orderings. I'm trying to get all of these orderings of the elements in the list, but my output is not what I'm trying to achieve.
What you describe are permutations, not combinations. itertools.permutations(samplelist) yields all 9! orderings, each as a tuple.
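A short sketch of that call, converting each tuple to a list to match the output format in the question. The name all_orderings is only for illustration; for longer inputs it would be better to iterate lazily than to build the full list.

```python
from itertools import permutations
from math import factorial

samplelist = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# permutations() yields tuples; convert each one to a list.
all_orderings = [list(p) for p in permutations(samplelist)]

print(len(all_orderings) == factorial(9))  # True
print(all_orderings[0])                    # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(all_orderings[1])                    # [1, 2, 3, 4, 5, 6, 7, 9, 8]
```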
