What's the purpose of n_labels in make_multilabel_classification? - scikit-learn

from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(n_samples=1000, n_features=10, n_classes=3, n_labels=2, random_state=1)
With n_labels=2, why are there rows like [1,1,1] in y? Doesn't this mean that some examples have 3 labels? According to the documentation of n_labels:
n_labels : int, optional (default=2)
The average number of labels per instance. More precisely, the number
of labels per sample is drawn from a Poisson distribution with
``n_labels`` as its expected value, but samples are bounded
So n_labels=2 doesn't mean that the maximum number of labels per sample is 2, only that the average is 2. In that case, why should I specify this parameter?

You could still be interested in varying the average number of labels.
For example:
from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(n_samples=5, n_features=10,
n_classes=20, n_labels=2, random_state=1)
print(y)
# [[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0]
# [0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0]
# [0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]]
X, y = make_multilabel_classification(n_samples=5, n_features=10,
n_classes=20, n_labels=15, random_state=1)
print(y)
# [[1 1 0 1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 0 0]
# [1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
# [1 0 0 0 0 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1]
# [1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1]
# [1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1]]
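To see why single rows can carry more labels than n_labels while the average still matches it, here is a minimal standard-library sketch. It only mimics the documented idea (a Poisson draw with n_labels as its expected value); the helper draw_poisson is made up for illustration, it is not scikit-learn's implementation, and it ignores the bounding mentioned in the docstring.

```python
import math
import random

def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(1)  # fixed seed so the run is repeatable
n_labels = 2            # plays the role of the n_labels argument
counts = [draw_poisson(n_labels, rng) for _ in range(20000)]

average = sum(counts) / len(counts)
share_above = sum(c > n_labels for c in counts) / len(counts)
print("average labels per sample:", round(average, 2))
print("largest count drawn:", max(counts))
print("share of samples with more than n_labels labels:", round(share_above, 2))
```

The average stays close to n_labels, but a sizeable share of the draws is larger than it, which is consistent with seeing rows such as [1,1,1].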

How to interpret SVM output in Weka?

I have a Support Vector model in Weka (SMO) and I want to extract knowledge from this output:
=== Classifier model (full training set) ===
SMO
Kernel used:
Puk kernel
Classifier for classes: Positive, Negative
BinarySMO
0.9349 * <0.364865 0 0 1 0 1 0 0 0 1 1 1 0 1 1 1 > * X]
+ 0.743 * <0.486486 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 > * X]
+ 0.8578 * <0.391892 0 0 1 0 1 1 1 0 1 0 1 0 1 1 1 > * X]
- 0.815 * <0.297297 1 0 1 0 1 0 0 0 1 0 1 0 1 1 0 > * X]
- 0.2347 * <0.391892 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 > * X]
+ 1.1502 * <0.527027 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 > * X]
+ 0.6922 * <0.554054 0 0 1 0 1 1 0 1 1 0 1 0 0 1 1 > * X]
.....
- 0.3291 * <0.594595 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 > * X]
+ 0.9296 * <0.364865 0 0 1 1 1 0 1 0 1 0 0 0 1 0 1 > * X]
+ 0.6504 * <0.351351 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 > * X]
- 0.0333 * <0.27027 1 1 0 0 0 1 0 1 0 0 0 1 1 1 1 > * X]
+ 0.0085 * <0.513514 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 > * X]
+ 0.8176 * <0.72973 0 1 1 0 1 1 0 1 0 0 0 1 0 0 1 > * X]
- 0.4812 * <1 1 0 0 1 1 0 1 1 0 0 1 0 0 0 1 > * X]
- 0.3286 * <0.256757 1 0 0 1 0 1 0 0 0 0 1 1 1 1 1 > * X]
.........
- 0.1838 * <0.635135 0 1 0 1 0 1 0 1 1 0 1 0 0 0 0 > * X]
- 0.0976 * <0.189189 1 1 0 1 1 1 0 0 1 1 1 1 0 1 1 > * X]
- 0.0036 * <0.364865 1 1 0 1 0 1 0 1 1 0 1 1 0 1 0 > * X]
- 0.0157 * <0.554054 0 1 0 1 0 1 0 1 1 0 1 1 1 1 1 > * X]
.........
- 0.0167 * <0.621622 0 1 1 0 0 0 1 1 0 1 1 1 0 0 0 > * X]
+ 0.2005 * <0.5 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 > * X]
- 0.589
Number of support vectors: 378
Number of kernel evaluations: 131997 (92.5% cached)
How can I interpret this output?
Thanks in advance
Have a look at SMO's toString() method to see how the output is constructed. Check out the Puk kernel itself (publication), to see how its calculations are done.
The textual output of classifiers is usually only for informative purposes (it is optional and has no impact on a classifier). People usually apply trained models directly to new data rather than trying to understand the output (especially with support vector machines).
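If you still want to read the printout, one plausible interpretation is that each line is a coefficient multiplied by the kernel value between a support vector and the input X, and that the final constant (- 0.589 here) is added to the sum. The sketch below follows that reading; it is an assumption, not something confirmed from Weka's source. The function names, the omega=1 and sigma=1 values, and the use of only the first two printed support vectors are all illustrative choices, so the resulting number is not the real model's output.

```python
import math

def puk_kernel(x, y, omega=1.0, sigma=1.0):
    """Pearson VII universal kernel (Puk) between two equal-length vectors."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega

def decision_value(x, support_vectors, weights, offset, kernel=puk_kernel):
    """Weighted sum of kernel values against each support vector, plus an offset."""
    return sum(w * kernel(sv, x) for w, sv in zip(weights, support_vectors)) + offset

# First two support vectors and coefficients copied from the printout above.
sv1 = [0.364865, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1]
sv2 = [0.486486, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
weights = [0.9349, 0.743]
offset = -0.589  # the constant printed on the last line of the model

# X is assumed to be scaled the same way as the support vectors.
value = decision_value(sv1, [sv1, sv2], weights, offset)
print(value)
```

Under this reading, the sign of the full sum over all 378 support vectors would decide between the two classes; which sign maps to which class is not visible from the printout.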

make lower half of a n*n list zero without using any functions in python

I tried to solve it using two for loops and an if statement, but I was unable to get the desired output.
INPUT-
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
thislist=[1]*10
thislist=[thislist]*10
print(thislist)
for i in range(10):
    for j in range(10):
        print(thislist[i][j], end = " ")
    print()
print()
for i in range(10):
    for j in range(10):
        if i>j:
            thislist[i][j]=0
for i in range(10):
    for j in range(10):
        print(thislist[i][j], end = " ")
    print()
This was the output I got:
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
But when I built the list as shown below, I got the desired output.
thislist=[[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1]]
print(thislist)
for i in range(10):
    for j in range(10):
        if i>j:
            thislist[i][j]=0
for i in range(10):
    for j in range(10):
        print(thislist[i][j], end = " ")
    print()
Note: this is the desired output:
1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1 1 1
0 0 0 0 0 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 1
Can someone explain the difference between the two versions above?
As you pointed out, the problem comes from the way you created your list of lists. In your first example, you do something like this:
list1 = [1]*10
list_of_list1=[list1]*10
list_of_list1 is actually a list of ten references to the same list1 object, not ten separate copies. So if you modify a value through one row of list_of_list1, the modification shows up in all the rows.
To get independent rows, each row needs to be its own list (a copy). You might want to search for more information about shallow and deep copies on the Internet.
In the meantime, you can simply try this:
thislist = []
for row in range(10):
    list1 = [1]*10
    thislist.append(list1)
But I usually go with numpy when it is available.
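A small sketch of the difference, using a 3x3 grid so the output stays readable. The variable names shared and independent are only for illustration.

```python
# Repeating a list with * repeats references to the same inner list.
shared = [[1] * 3] * 3
shared[2][0] = 0
print(shared)                  # [[0, 1, 1], [0, 1, 1], [0, 1, 1]]
print(shared[0] is shared[1])  # True: every row is the same object

# Building each row separately gives independent rows.
independent = [[1] * 3 for _ in range(3)]
for i in range(3):
    for j in range(3):
        if i > j:
            independent[i][j] = 0
print(independent)                       # [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
print(independent[0] is independent[1])  # False: each row is its own list
```

The `is` check is a quick way to tell whether two rows are really the same list.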

how can i replace a matrix with another matrix in a non sequence order

I want to know how can I replace a part of a big matrix with another small matrix by a non-sequence order of row & columns. I mean
a=np.zeros([15,15])
B=np.ones([5,5])
ind1=[0,1,2,3,4]
ind2=[0,5,8,7,12]
# Now I want to replace like this
a[ind1,ind1]=a[ind1,ind1]+B
# and
a[ind2,ind2]=a[ind2,ind2]+B
This is very easy to do in MATLAB, but I do not understand why, in Python, indexing rows and columns with a list of numbers does not work the same way.
Thank you in advance
Your problem is about numpy rather than Python itself. Read up on indexing in numpy: https://docs.scipy.org/doc/numpy-1.15.1/user/basics.indexing.html (it is actually not that different from MATLAB indexing).
For example:
import numpy as np
a = np.zeros(shape=[15, 15], dtype=int)
b = np.ones(shape=[5, 5], dtype=int)
a[0:5, 0:5] += b
a[0:5, 5:10] += b * 2
ind_1 = [11, 6, 7, 12, 13]
ind_2 = [9, 7, 14, 13, 4]
a[np.ix_(ind_1, ind_2)] += b * 3
print(a)
Output:
[[1 1 1 1 1 2 2 2 2 2 0 0 0 0 0]
[1 1 1 1 1 2 2 2 2 2 0 0 0 0 0]
[1 1 1 1 1 2 2 2 2 2 0 0 0 0 0]
[1 1 1 1 1 2 2 2 2 2 0 0 0 0 0]
[1 1 1 1 1 2 2 2 2 2 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 3 0 0 3 0 3 0 0 0 3 3]
[0 0 0 0 3 0 0 3 0 3 0 0 0 3 3]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 3 0 0 3 0 3 0 0 0 3 3]
[0 0 0 0 3 0 0 3 0 3 0 0 0 3 3]
[0 0 0 0 3 0 0 3 0 3 0 0 0 3 3]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
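To make the difference explicit with the index list from the question: passing two lists directly pairs them up element by element, while np.ix_ builds the full row-by-column block. This is a minimal sketch; the names pairwise and block are just for illustration.

```python
import numpy as np

a = np.zeros((15, 15), dtype=int)
B = np.ones((5, 5), dtype=int)
ind2 = [0, 5, 8, 7, 12]

# Two plain lists are paired up: (0, 0), (5, 5), (8, 8), (7, 7), (12, 12).
pairwise = a[ind2, ind2]
# np.ix_ selects every combination of the given rows and columns.
block = a[np.ix_(ind2, ind2)]
print(pairwise.shape)  # (5,)
print(block.shape)     # (5, 5)

# Adding the 5x5 matrix therefore needs the np.ix_ form.
a[np.ix_(ind2, ind2)] += B
print(a[5, 8], a[12, 0], a[1, 1])  # 1 1 0
print(a.sum())                     # 25
```

Since the pairwise form only yields 5 elements, a 5x5 matrix does not fit into it, which would explain why the assignment in the question does not work.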

Rare error while computing dataframe by using pandas

I am running into an unusual error while doing some machine learning on a dataset loaded with pandas.
This is the error I am getting:
I have been reading about it, and it seems to be related to the columns and how pandas interprets them, but I have no clue what is wrong.
This is the code I am using for this:
import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data'
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi',
             'pedigree', 'age', 'label']
pima = pd.read_csv(url, header=None, names=col_names)
# define X and y
feature_cols = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi',
                'pedigree', 'age']
X = pima[feature_cols]
y = pima.label
# k-fold cv
from sklearn.model_selection import KFold, cross_val_score
kf = KFold(n_splits=10)  # define number of splits
kf.get_n_splits(X)  # to check how many splits will be done
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
clf = LinearDiscriminantAnalysis()  # select the model
for train, test in kf.split(X, y):
    y_pred_prob = clf.fit(X[train], y[train]).predict_proba(X[test])
    y_pred_class = clf.predict(X[test])
Thanks in advance
According to the documentation, the clf.fit method expects arrays as parameters:
Parameters
----------
X : array-like, shape (n_samples, n_features)
Training data.
y : array, shape (n_samples,)
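If you look at the example in the link, X and y are numpy arrays.
Try calling as_matrix() on both X and y instead of just using X = pima[feature_cols] and y = pima.label (as_matrix() was removed in pandas 1.0; on newer versions, to_numpy() does the same job):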
X = pima[feature_cols].as_matrix()
y = pima.label.as_matrix()
Testing by print:
import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data'
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
pima = pd.read_csv(url, header=None, names=col_names)
# define X and y as numpy arrays
# (as_matrix() was removed in pandas 1.0; use .to_numpy() on newer versions)
feature_cols = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age']
X = pima[feature_cols].as_matrix()
y = pima.label.as_matrix()
# k-fold cv
from sklearn.model_selection import KFold, cross_val_score
kf = KFold(n_splits=10)  # define number of splits
kf.get_n_splits(X)  # to check how many splits will be done
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
clf = LinearDiscriminantAnalysis()  # select the model
for train, test in kf.split(X, y):
    # train and test are arrays of row positions for this fold
    y_pred_prob = clf.fit(X[train], y[train]).predict_proba(X[test])
    y_pred_class = clf.predict(X[test])
    print(y_pred_class)
Result:
[1 0 1 0 1 0 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 1
0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0]
[0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0
1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 1 1]
[1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0
0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 1 0 1 0 0 0 0 1 1 0 1 0 0 0 1
1 0 1]
[1 0 0 0 1 1 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 1 1
0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 1 0
0 1 0]
[0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 1 1 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
0 0 0]
[0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0
0 0 1 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1
1 0 0]
[0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 1
1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0]
[0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 1 1
0 1 0]
[0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0
0 0 1 0 0 1 0 1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1
0 1]
[0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 1 1 1 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0
0 0]
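A likely reason the original code fails is that indexing a DataFrame with square brackets and an array of integers looks for columns with those labels, not rows at those positions. The sketch below uses a tiny stand-in frame instead of the real dataset, and the train array is a hand-written example of the positions a KFold split would yield; both are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in frame with two of the feature columns from the question.
X = pd.DataFrame({"glucose": [85, 90, 120, 140],
                  "bmi": [22.1, 30.5, 28.0, 35.2]})
train = np.array([0, 2, 3])  # row positions, as a KFold split would yield

try:
    X[train]  # looks for columns labelled 0, 2 and 3
    raised = False
except KeyError:
    raised = True
print("X[train] raised KeyError:", raised)

rows_by_position = X.iloc[train]  # select rows by integer position
as_array = X.to_numpy()[train]    # or convert to an array first, then index
print(rows_by_position.shape)     # (3, 2)
print(as_array.shape)             # (3, 2)
```

So besides converting to numpy arrays, keeping the DataFrame and writing X.iloc[train] and y.iloc[train] should also work.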

J Language rank of power function

t=:1
test=: monad define
t=.y
t=. t, 0
)
testloop=: monad def'test^:y t'
testloop 1
1 0
testloop 2
1 0 0
testloop 10
1 0 0 0 0 0 0 0 0 0 0
In order to simplify this
(testloop 0),(testloop 1), (testloop 2), ...
110100100010000...
I tried
, testloop"0 (i.10)
but it gives
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0...
It seems like I have a problem with a rank, I can't figure out which one to use.
I would be grateful if you could help me on this issue.
Thank you!
This is not so much a rank problem as the fact that the results are padded with zeros so that the row lengths match.
testloop 1
1 0
testloop 2
1 0 0
testloop"0 [ 1 2
1 0 0
1 0 0
testloop"0 [ 1 2 3
1 0 0 0
1 0 0 0
1 0 0 0
If I redefine your test and testloop to add a different appending digit, we can see how the padding is working.
test2 =: 3 : 0
t=. y
t=. t,2
)
test2loop=: monad def'test2^:y t'
test2loop"0 [1
1 2
test2loop"0 [2
1 2 2
test2loop"0 [ 1 2 NB. 0 padded in first row
1 2 0
1 2 2
test2loop"0 [ 1 2 3 NB. 0's padded in first two rows
1 2 0 0
1 2 2 0
1 2 2 2
To get around the padding issue I will use each=: &.> so that the results are boxed before combining to avoid the padding.
testloop each 1 2 3
+---+-----+-------+
|1 0|1 0 0|1 0 0 0|
+---+-----+-------+
testloop each i. 10
+-+---+-----+-------+---------+-----------+-------------+---------------+-----------------+-------------------+
|1|1 0|1 0 0|1 0 0 0|1 0 0 0 0|1 0 0 0 0 0|1 0 0 0 0 0 0|1 0 0 0 0 0 0 0|1 0 0 0 0 0 0 0 0|1 0 0 0 0 0 0 0 0 0|
+-+---+-----+-------+---------+-----------+-------------+---------------+-----------------+-------------------+
Then use ; to unbox and ravel the results:
; testloop each i. 10
1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
To be honest I would be more inclined to use the fact that complex numbers used as the left argument of # introduce 0's for padding. The number of 0's depends on the imaginary value of the complex number.
1j0 # 1
1
1j1 # 1
1 0
1j2 # 1
1 0 0
test3=: monad def '(1 j. y)#1'
test3 1
1 0
test3 2
1 0 0
test3 1 2
1 0 1 0 0
test3 i. 10
1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
