Training matrix for SVM in OpenCV

My features are normalized RGB values, i.e. the matrix contains values in the range 0 to 1. I have declared the training matrix as CV_64FC1. It has 1000 rows and 60 columns, with decimal values like 0.3333 or 0.2789. The OpenCV docs say the training matrix has to be of float type, but my matrix is of type double. How can I give this matrix to the SVM for training without converting it to float?

The conversion is hard to avoid: OpenCV's SVM expects CV_32FC1 samples. Cast your matrix to float, e.g. with Mat::convertTo, before training.
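In OpenCV's Python bindings the same conversion is a one-line astype; a minimal sketch (the data here is random, standing in for the 1000x60 feature matrix in the question):

```python
import numpy as np

# Stand-in for the 1000x60 double-precision (CV_64FC1) feature matrix
train = np.random.default_rng(0).random((1000, 60))  # dtype is float64
assert train.dtype == np.float64

# OpenCV's SVM expects CV_32FC1 samples, so convert before calling train()
train32 = train.astype(np.float32)
```

In C++ the equivalent is trainData.convertTo(trainData32, CV_32FC1).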

Related

Converting several binary columns to multilabel array

I have several binary outcome variables (see below) that I'd like to encode into a single column y that can be used for a multilabel Keras model in Python.
has_carinsurance  has_lifeinsurance  has_petinsurance
1                 0                  1
So far, I've created the column y using
df['y'] = df[['has_carinsurance','has_lifeinsurance','has_petinsurance']].values.tolist()
How would I convert this to an array suitable for multilabel classification in Python while keeping the original labels retrievable?
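Since the columns are already binary indicators, stacking them column-wise is itself the multilabel target matrix a Keras model expects; a sketch with made-up rows matching the example (column names taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the question's columns
df = pd.DataFrame({
    'has_carinsurance':  [1, 0],
    'has_lifeinsurance': [0, 1],
    'has_petinsurance':  [1, 1],
})

label_cols = ['has_carinsurance', 'has_lifeinsurance', 'has_petinsurance']
# A binary indicator matrix is exactly what a multilabel model trains on
y = df[label_cols].to_numpy()

# Original labels stay retrievable by indexing back into the column names
labels_row0 = [label_cols[i] for i in np.flatnonzero(y[0])]
print(labels_row0)  # ['has_carinsurance', 'has_petinsurance']
```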

Math.Net Weighted Least Squares dimensions issue

I'm trying to run a univariate WLS - using WeightedRegression.Weighted(X, y, W) - and am getting the error message "Matrix dimensions must agree: op1 is 5836x1, op2 is 5836x1". It does not matter whether I make y a column vector or a matrix with only one column.
From the error message, you can see the matrix (or the vector and matrix) dimensions agree - both have 5836 rows and 1 column.
What am I doing wrong?
What are the dimensions of your weight matrix W? I think it expects a full diagonal matrix with the per-observation weights on the diagonal, so in your case 5836x5836.
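To illustrate the shape requirement (sketched in NumPy rather than Math.NET, since the fix is the same either way): the weights go on the diagonal of an n-by-n matrix, not in an n-by-1 column. The data below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # small stand-in for the 5836 rows in the question
X = np.column_stack([np.ones(n), rng.random(n)])  # design matrix with intercept
y = X @ np.array([2.0, 3.0]) + rng.normal(0, 0.1, n)
w = rng.random(n) + 0.5  # per-observation weights, length n

# The weight argument must be an n-by-n diagonal matrix, not an n-by-1 vector
W = np.diag(w)

# Weighted least squares: beta = (X' W X)^-1 X' W y
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta)  # should be close to [2.0, 3.0]
```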

Input formatting for models such as logistic regression and KNN for Python

In my training set I have 24 feature vectors (FVs). Each FV contains 2 lists. When I try to fit this with model = LogisticRegression() or model = KNeighborsClassifier(n_neighbors=k), I get this error: ValueError: setting an array element with a sequence.
In my dataframe, each row represents one FV. There are 3 columns: the first contains a list of an individual's heart-rate values, the second a list of the corresponding activity data, and the third the target. Visually, it looks something like this:
HR                        ACT                       Target
[0.5018, 0.5106, 0.4872]  [0.1390, 0.1709, 0.0886]  1
[0.4931, 0.5171, 0.5514]  [0.2423, 0.2795, 0.2232]  0
Should I:
1. Join both lists to form one long FV, or
2. Expand both lists so that each column holds one value? In other words, if there are 5 items in the HR and ACT data for a FV, the new dataframe would have 10 columns for features and 1 for the target.
How do logistic regression and KNN handle input data? I understand that logistic regression combines the inputs linearly using weights or coefficient values, but I am not sure what that means for lists vs. dataframe columns. Does it automatically convert corresponding values of dataframe columns to a list before transforming? Is there a difference between methods 1 and 2?
Additionally, if a long list is required, should it be ordered [HR,HR,HR,ACT,ACT,ACT] or [HR,ACT,HR,ACT,HR,ACT]?
You should go with option 2:
Expand both lists such that each column represents one value. In other words, if there are 5 items in HR and ACT data for a FV, the new dataframe would have 10 columns for features and 1 for Target.
You should then select the feature columns from the dataframe and pass it as X, and the target column as Y to the model's fit function.
scikit-learn models accept input of shape [n_samples, n_features]; after following your 2nd option, your training dataframe will be 2-D with shape [n_samples, 10], which is exactly that.
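A sketch of option 2 using the two rows from the question (the tolist() expansion is one common idiom; the generated column names are invented):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical frame matching the question's layout: list-valued HR/ACT columns
df = pd.DataFrame({
    'HR':  [[0.5018, 0.5106, 0.4872], [0.4931, 0.5171, 0.5514]],
    'ACT': [[0.1390, 0.1709, 0.0886], [0.2423, 0.2795, 0.2232]],
    'Target': [1, 0],
})

# Expand each list column into one column per value (option 2)
hr = pd.DataFrame(df['HR'].tolist()).add_prefix('HR_')
act = pd.DataFrame(df['ACT'].tolist()).add_prefix('ACT_')
X = pd.concat([hr, act], axis=1)  # shape (n_samples, 6) for these 3-item lists
y = df['Target']

model = LogisticRegression().fit(X, y)
print(model.predict(X))
```

With this layout the column order ([HR,HR,HR,ACT,ACT,ACT] vs. interleaved) makes no difference to either model, as long as it is consistent between training and prediction.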

Normalizing Vectors with Negative values

I want to represent each text-based item in my system as a vector in a vector space model. The term values can be negative or positive, reflecting the frequency of a term in the positive or negative class; a value of zero means neutral.
For example:
Item1 (-1,0,-5,4.5,2)
Item2 (2,6,0,-4,0.5)
My questions are:
1- How can I normalize my vectors to the range [0, 1] such that:
a value of 0 before normalization maps to .5,
positive values map above .5,
and negative values map below .5?
Is there a mathematical formula that does this?
2- Will the choice of similarity measure change after the normalization? For example, can I still use cosine similarity?
3- Will it be difficult to perform dimensionality reduction after the normalization?
Thanks in advance
One solution could be to use MinMaxScaler, which scales each column to the [0, 1] range, and then divide each row by its sum. In Python with sklearn you can do something like this:
import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize

X = np.array([[-1, 0, -5, 4.5, 2],    # Item1
              [2, 6, 0, -4, 0.5]])    # Item2

scaler = MinMaxScaler()
scaled_X = scaler.fit_transform(X)                     # each column in [0, 1]
normalized_X = normalize(scaled_X, norm='l1', axis=1)  # each row sums to 1
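Note that MinMaxScaler pins the column minimum to 0 and the maximum to 1, so an original 0 lands at .5 only when the range happens to be symmetric. A formula that pins 0 to .5 exactly is x' = 0.5 + x / (2 * max|x|); a sketch (scaling per vector by its largest absolute value is an assumption, you could also scale per dimension):

```python
import numpy as np

def center_scale(v):
    """Map v into [0, 1] so that 0 -> 0.5, positives > 0.5, negatives < 0.5."""
    m = np.max(np.abs(v))
    return 0.5 + v / (2 * m) if m > 0 else np.full_like(v, 0.5)

item1 = np.array([-1, 0, -5, 4.5, 2], dtype=float)
scaled = center_scale(item1)
print(scaled)  # [0.4, 0.5, 0.0, 0.95, 0.7] -- the 0 maps to 0.5
```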

How to curve fit data in Excel to a multi variable polynomial?

I have a simple set of data, 10 values that increase.
I want to fit them to a polynomial of the form:
Z = A1 + A2*X + A3*Y + A4*X^2 + A5*X*Y+ A6*Y^2
Where Z, the output, is the set of data above; A1 - A6 are the coefficients I am looking for;
X is the range of inputs (10 of course); and Y, for the moment, is a constant value.
How can I curve fit to this polynomial and not the standard 2nd order one that is created using 'trendline'?
Construct a Vandermonde matrix from your data points, find its inverse with MINVERSE, then apply it to the vector of Z values with MMULT. This works for a degree-n polynomial through n data points (exact interpolation).
Otherwise you could try polynomial regression over the columns 1, X, Y, X^2, X*Y, Y^2, which again uses a Vandermonde-style matrix.
More math than Excel, really.
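The regression route can be sketched outside Excel too (NumPy here; the data is synthetic). One caveat worth knowing: with Y held constant, the columns 1, Y, and Y^2 (and likewise X and X*Y) are collinear, so the individual coefficients A1-A6 are not uniquely identifiable even though the fitted Z values are:

```python
import numpy as np

n = 10
X = np.arange(1.0, n + 1)   # the 10 increasing inputs
Y = np.full(n, 3.0)         # Y constant, as in the question
true_coeffs = np.array([1.0, 2.0, 0.5, -0.1, 0.3, 0.05])

# Design matrix with columns 1, X, Y, X^2, X*Y, Y^2
design = np.column_stack([np.ones(n), X, Y, X**2, X * Y, Y**2])
Z = design @ true_coeffs

# Least-squares fit of A1..A6; lstsq returns the minimum-norm solution
# despite the collinear columns, so the fitted Z still matches exactly
coeffs, *_ = np.linalg.lstsq(design, Z, rcond=None)
```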
