I am trying to train an SVM in scikit-learn. I am following the documentation example and tried to adapt it to my 3-dimensional feature vectors.
I tried the example from the page http://scikit-learn.org/stable/modules/svm.html
and it ran through. While debugging I went back to the tutorial setup and found this:
from sklearn import svm
X = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 1]
clf = svm.SVC()
clf.fit(X, y)
works while
X = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
y = [0, 1, 1]
clf = svm.SVC()
clf.fit(X, y)
fails with:
ValueError: X.shape[1] = 2 should be equal to 3, the number of features at training time
What is wrong here? It's only one additional dimension...
Thanks,
El
Running your latter code works for me:
>>> X = [[0,0,0], [1,1,1], [2,2,2]]
>>> y = [0,1,1]
>>> clf = svm.SVC()
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', max_iter=-1, probability=False, shrinking=True, tol=0.001,
verbose=False)
That error message seems like it should actually happen when you're calling .predict() on an SVM object with kernel="precomputed". Is that the case?
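For what it's worth, the same kind of error can be reproduced by fitting on 3 features and then predicting on 2. This is only a minimal sketch, assuming a reasonably recent scikit-learn; the exact wording of the message differs between versions, and the variable names are just for illustration.

```python
from sklearn import svm

# Fit on samples with 3 features each.
X_train = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
y_train = [0, 1, 1]
clf = svm.SVC()
clf.fit(X_train, y_train)

# Predicting on a sample with only 2 features does not match the
# number of features seen at training time, so scikit-learn raises.
mismatch_error = None
try:
    clf.predict([[1, 1]])
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)

# A sample with 3 features is accepted.
ok_prediction = clf.predict([[2, 2, 2]])
print(ok_prediction)
```

So if the fit call itself runs fine, it may be worth checking which array is passed to predict (or to a second fit on the same object) further down in the script.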
I am trying to understand lasso regression, and I don't understand why it needs two input values to predict another value when it's just a 2-dimensional regression.
The documentation gives this example:
clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
which I don't understand. Why is it [0,0] or [1,1] and not just [0] or [1]?
[[0,0], [1, 1], [2, 2]]
means that you have 3 samples/observations and each is characterised by 2 features/variables (2 dimensional).
Indeed, you could have these 3 samples with only 1 feature/variable and still be able to fit a model.
Example using 1 feature:
from sklearn import datasets
from sklearn import linear_model
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :1] # we only take the first feature
y = iris.target
clf = linear_model.Lasso(alpha=0.1)
clf.fit(X,y)
print(clf.coef_)
print(clf.intercept_)
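To tie this back to the toy data in the question, here is a small sketch (variable names are made up for illustration) that fits the same targets once with 1 feature per sample and once with 2. The point is that X always has shape (n_samples, n_features), so a single feature is written as [[0], [1], [2]] rather than [0, 1, 2].

```python
import numpy as np
from sklearn import linear_model

# 3 samples, 1 feature each: X must still be 2-dimensional,
# with shape (n_samples, n_features) = (3, 1).
x = np.array([0, 1, 2])
X_one = x.reshape(-1, 1)
y = np.array([0, 1, 2])

clf_one = linear_model.Lasso(alpha=0.1)
clf_one.fit(X_one, y)
print(X_one.shape, clf_one.coef_.shape)  # one coefficient per feature

# Same targets, but 2 features per sample as in the documentation.
X_two = np.array([[0, 0], [1, 1], [2, 2]])
clf_two = linear_model.Lasso(alpha=0.1)
clf_two.fit(X_two, y)
print(X_two.shape, clf_two.coef_.shape)
```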
I have found an inconsistency in the predict function of the SVM model for multiclass problems. I have trained a model with scikit-learn's svm.SVC for a multiclass prediction problem (see plot below).
But on some occasions, the predict function gives me different results than the argmax of the decision function. One can see that the inconsistent points are close to the decision boundary.
This inconsistency vanishes when I use the OneVsRestClassifier directly. Does the predict function of svm.SVC apply some correction, or why does it differ from the argmax prediction?
Here is the code to reproduce the result:
import numpy as np
from sklearn import svm, datasets
from sklearn.multiclass import OneVsRestClassifier
from scipy.linalg import cho_solve, cho_factor
def create_data(n_samples, noise):
    # 4 Gaussian blobs with different means and variances
    sample_per_cls = n_samples // 4
    # put the rest of the samples into the last class
    sample_per_cls_rest = sample_per_cls + n_samples - 4*sample_per_cls
    x1 = np.random.multivariate_normal([20, 18], np.array([[2, 3], [3, 7]])*4*noise, sample_per_cls, 'warn')
    x2 = np.random.multivariate_normal([13, 27], np.array([[10, 3], [3, 2]])*4*noise, sample_per_cls, 'warn')
    x3 = np.random.multivariate_normal([9, 13], np.array([[6, 1], [1, 5]])*4*noise, sample_per_cls, 'warn')
    x4 = np.random.multivariate_normal([14, 20], np.array([[4, 0.2], [0.2, 7]])*4*noise, sample_per_cls_rest, 'warn')
    X = np.vstack([x1, x2, x3, x4])
    # define the labels for each class (np.int was removed from NumPy, so use int)
    Y = np.empty([n_samples], dtype=int)
    Y[0:sample_per_cls] = 0
    Y[sample_per_cls:2*sample_per_cls] = 1
    Y[2*sample_per_cls:3*sample_per_cls] = 2
    Y[3*sample_per_cls:] = 3
    # shuffle the data set
    rand_int = np.arange(n_samples)
    np.random.shuffle(rand_int)
    X = X[rand_int]
    Y = Y[rand_int]
    return X, Y
X, Y = create_data(n_samples=800, noise=0.15)
clf = svm.SVC(C=0.5, kernel='rbf', gamma=0.1, decision_function_shape='ovr', cache_size=8000)
#the classifier below is consistent
#clf = OneVsRestClassifier(svm.SVC(C=0.5, kernel='rbf', gamma=0.1, decision_function_shape='ovr', cache_size=8000))
clf.fit(X,Y)
Xs = np.linspace(np.min(X[:,0] - 1), np.max(X[:,0] + 1), 150)
Ys = np.linspace(np.min(X[:,1] - 1), np.max(X[:,1] + 1), 150)
XX, YY = np.meshgrid(Xs, Ys)
test_set = np.stack([XX, YY], axis=2).reshape(-1,2)
#prediction via argmax of the decision function
pred = np.argmax(clf.decision_function(test_set), axis=1)
#prediction with sklearn function
pred_1 = clf.predict(test_set)
diff = np.equal(pred, pred_1)
error = np.where(~diff)[0]  # indices where the two predictions disagree
print(error)
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [16, 10]
plt.contourf(XX, YY, pred_1.reshape(XX.shape), alpha=0.5, cmap='seismic')
plt.colorbar()
plt.scatter(X[:,0], X[:,1], c=Y, s=20, marker='o', edgecolors='k')
plt.scatter(test_set[error, 0], test_set[error, 1], c=pred_1[error], s=120, marker='^', edgecolors='k')
plt.show()
Triangles are marking the inconsistent points:
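One possible explanation, offered here as an assumption and not something established in the question: SVC trains one-vs-one classifiers internally and predict follows the pairwise votes, while decision_function_shape='ovr' only reshapes the pairwise values into per-class scores. Where the votes tie, the two can disagree. scikit-learn 0.22 and later have a break_ties parameter on SVC that makes predict use the argmax of decision_function when the shape is 'ovr'. A minimal sketch with made-up blob data (not the data from the question):

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
# 4 made-up Gaussian blobs, 50 points each (illustrative data only)
centers = np.array([[20, 18], [13, 27], [9, 13], [14, 20]])
X_demo = np.vstack([rng.normal(c, 2.0, size=(50, 2)) for c in centers])
y_demo = np.repeat(np.arange(4), 50)

# break_ties=True requires scikit-learn >= 0.22
clf_ties = svm.SVC(C=0.5, kernel='rbf', gamma=0.1,
                   decision_function_shape='ovr', break_ties=True)
clf_ties.fit(X_demo, y_demo)

# Random test points covering the area around the blobs
grid = rng.uniform(5, 30, size=(2000, 2))
scores = clf_ties.decision_function(grid)
pred_argmax = clf_ties.classes_[np.argmax(scores, axis=1)]
pred_predict = clf_ties.predict(grid)

# Count the points where the two ways of predicting disagree
n_mismatch = int(np.sum(pred_argmax != pred_predict))
print(n_mismatch)
```

With break_ties=True the two predictions are expected to agree everywhere, at the cost of a somewhat slower predict.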
Let's say this example implements a simple binary classification.
X = array([[1,2,3],[2,3,4],[3,4,5]])
y = array([[0],[1],[0]])
...
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=50, verbose=0)
# new instance where we do not know the answer
Xnew = array([[4, 5, 6]])
# make a prediction
ynew = model.predict(Xnew)
#show the inputs and predicted outputs
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
...
results
X=[4, 5, 6], Predicted=[0 or 1]
And this one implements multiclass classification.
X = array([[1,2,3],[2,3,4],[3,4,5]])
y = array([[4],[5],[6]])
...
model.compile(loss='categorical_crossentropy', optimizer='adam')
# fit model
model.fit(X, y, epochs=50, verbose=2)
model.reset_states()
# evaluate model on new data
yhat = model.predict((X))
...
results decoded
X=[4, 5, 6], Predicted=[4, 5, 6]
How can I implement multiclass classification with a single output to get something like this? (Similar to forecasting a time series.)
X = array([[1,2,3],[2,3,4],[3,4,5]])
y = array([[4],[5],[6]])
# new instance where we do not know the answer
Xnew = array([[4, 5, 6]])
yhat = model.predict_classes(Xnew)
results decoded
X=[4, 5, 6], Predicted=[7]
What you are looking for is the loss='sparse_categorical_crossentropy' function which will assume that the integer targets are class labels. So if your model has 7 outputs, and you give target 2, sparse_categorical_crossentropy will convert 2 into [0,0,1,0,0,0,0] as the target and apply categorical_crossentropy as usual.
In this case, your output layer activation function should be softmax and the number of outputs should equal the number of classes. Most likely something like Dense(num_classes, activation='softmax').
If your integer classes are just [4,5,6] then you need to shift them to [0,1,2] to satisfy the condition max(Y_targets) < num_classes.
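The label shift and the implied one-hot conversion can be sketched in plain NumPy. This only illustrates the idea and is not what Keras runs internally; y_raw and predicted_index are made-up values.

```python
import numpy as np

# Original integer class labels (made-up example values)
y_raw = np.array([4, 5, 6, 4, 6])

# Map the labels to 0..num_classes-1; classes keeps the original values.
classes, y_shifted = np.unique(y_raw, return_inverse=True)
num_classes = len(classes)

# Conceptually, a sparse integer target is the index of the "hot"
# entry in a one-hot vector of length num_classes.
y_onehot = np.eye(num_classes, dtype=int)[y_shifted]

# Decode a predicted class index back to the original label.
predicted_index = 2  # hypothetical argmax of a softmax output
decoded = classes[predicted_index]
print(y_shifted, decoded)
```

y_shifted is what you would pass as the target with sparse_categorical_crossentropy, and classes lets you map the argmax of the softmax output back to the original labels.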
My dataset has around 5000 samples and 3 classes (one hot encoded) and I am interested in creating samples using stratified K fold. Moreover, in the end, I want to split each output file (from the K fold) into train and test.
I tried the following suggestion from sklearn documentation but I want to retain the shape of my dataset.
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)
sss.get_n_splits(X, y)
print(sss)
for train_index, test_index in sss.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
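One way this could be approached, sketched under some assumptions: the data below is a made-up stand-in (60 samples instead of 5000), the one-hot labels are turned into class indices with argmax only for the stratification, and each of the K folds is then split once more into train and test. Indexing with the fold indices keeps the column layout of both X and the one-hot y.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.RandomState(0)
# Made-up stand-in data: 60 samples, 4 features, 3 one-hot encoded classes.
X_all = rng.rand(60, 4)
labels = np.repeat(np.arange(3), 20)
y_onehot = np.eye(3, dtype=int)[labels]

# Stratification needs one label per sample, so recover it with argmax.
y_labels = y_onehot.argmax(axis=1)

folds = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, fold_index in skf.split(X_all, y_labels):
    # Each fold is an index array; indexing keeps the column layout.
    X_fold, y_fold = X_all[fold_index], y_onehot[fold_index]
    # Split this fold again into train and test, stratified as well.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_fold, y_fold, test_size=0.25,
        stratify=y_fold.argmax(axis=1), random_state=0)
    folds.append((X_tr, X_te, y_tr, y_te))

print(len(folds), folds[0][0].shape, folds[0][2].shape)
```

Here the "test" part returned by StratifiedKFold is used as the fold itself, which gives K disjoint, stratified chunks of the data.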
import theano.tensor as T
import numpy as np
from nolearn.lasagne import NeuralNet
def multilabel_objective(predictions, targets):
    epsilon = np.float32(1.0e-6)
    one = np.float32(1.0)
    pred = T.clip(predictions, epsilon, one - epsilon)
    return -T.sum(targets * T.log(pred) + (one - targets) * T.log(one - pred), axis=1)
net = NeuralNet(
    # your other parameters here (layers, update, max_epochs...)
    # here are the one you're interested in:
    objective_loss_function=multilabel_objective,
    custom_score=("validation score", lambda x, y: np.mean(np.abs(x - y)))
)
I found this code online and wanted to test it. It worked; the output includes training loss, test loss, validation score, duration and so on.
But how can I get the F1-micro score? I tried to use scikit-learn to calculate the F1 score by adding the following code:
data = data.astype(np.float32)
classes = classes.astype(np.float32)
net.fit(data, classes)
score = cross_validation.cross_val_score(net, data, classes, scoring='f1', cv=10)
print(score)
I got this error:
ValueError: Can't handle mix of multilabel-indicator and continuous-multioutput
How can I implement the F1-micro calculation based on the code above?
Suppose your true labels on the test set are y_true (shape: (n_samples, n_classes), composed only of 0s and 1s), and your test observations are X_test (shape: (n_samples, n_features)).
Then you get your net's predicted values on the test set with y_pred = net.predict(X_test).
If you are doing multiclass classification:
Since you have set regression to False in your network, y_pred should also be composed of 0s and 1s only.
You can compute the micro averaged f1 score with:
from sklearn.metrics import f1_score
f1_score(y_true, y_pred, average='micro')
Small code sample to illustrate this (with dummy data, use your actual y_pred and y_true):
from sklearn.metrics import f1_score
import numpy as np
y_true = np.array([[0, 0, 1], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1]])
t = f1_score(y_true, y_pred, average='micro')
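To make clear what average='micro' computes, here is a short sketch that counts true positives, false positives and false negatives over all entries of the indicator matrices and compares the result with f1_score. It reuses the dummy arrays from above; the names f1_manual and f1_sklearn are just for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[0, 0, 1], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1]])

# Micro averaging pools the counts over all samples and classes.
tp = np.sum((y_true == 1) & (y_pred == 1))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_manual = 2 * precision * recall / (precision + recall)

f1_sklearn = f1_score(y_true, y_pred, average='micro')
print(f1_manual, f1_sklearn)
```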
If you are doing multilabel classification:
You are not outputting a matrix of 0 and 1, but a matrix of probabilities. y_pred[i, j] is the probability that observation i belongs to the class j.
You need to define a threshold value, above which you will say an observation belongs to a given class. Then you can attribute labels accordingly and proceed just the same as in the previous case.
thresh = 0.8 # choose your own value
# creates an array with 1 where y_pred > thresh, 0 elsewhere
y_pred_binary = np.where(y_pred > thresh, 1, 0)
f1_score(y_true, y_pred_binary, average='micro')
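A self-contained sketch of the thresholding step, with made-up probabilities standing in for the output of net.predict (y_prob and the two threshold values are assumptions chosen for illustration). It also shows that the score depends on the threshold you pick:

```python
import numpy as np
from sklearn.metrics import f1_score

# Dummy multilabel ground truth and dummy predicted probabilities
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.85], [0.1, 0.95, 0.3], [0.7, 0.9, 0.1]])

scores = {}
for thresh in (0.5, 0.8):
    # 1 where the probability exceeds the threshold, 0 elsewhere
    y_pred_binary = (y_prob > thresh).astype(int)
    scores[thresh] = f1_score(y_true, y_pred_binary, average='micro')

print(scores)
```

In practice you would pick the threshold on a validation set rather than on the test set.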