CoreML Output for kNN Classifier - python-3.x

I am working with a kNN model that I have built and I would like to export it as an .mlmodel file. I have done so already, but it could use some work in terms of efficiency.
I have Python 3.6, sklearn 0.19.2, and the latest version of coremltools.
Initially, I trained my model with x_train and x_test as arrays of float64 and y_train and y_test as arrays of int8. The y values are either 0 or 1. Using:
import coremltools
coreml_model = coremltools.converters.sklearn.convert(model)
I get this error:
ValueError: Class labels must be all of type int or all of type string.
Fine. I changed the y values to int32 and it works. But the reason I wanted int8 was for memory reasons in my app. Any reason why int8 won't work?
The other issue is with the output. Currently my labels are 0 or 1. However, is there a way to have the model output the strings go or stop instead of 1 or 0? It seems from the documentation that I can pass a dict for the input, but not for the outputs. Ideally, something like this would be great for the output, but I cannot get it to work: labels = {"stop": 0, "go": 1}

CoreML currently does not support int8 as an input data type.
If you want the model to predict strings, you should use labels that are strings. That said, it's possible to change the model so that it outputs strings instead of numbers.
You will have to edit the mlmodel file, grab the "spec" object, and fill in the stringClassLabels field of the spec.kNearestNeighborsClassifier.
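Something along these lines, though I have not tested it (the file names are placeholders, and you may also have to change the corresponding output feature types in spec.description to string):
import coremltools

# load the spec of the converted model
spec = coremltools.utils.load_spec("KNNClassifier.mlmodel")
knn = spec.kNearestNeighborsClassifier

# map each stored integer label to the string the model should output
names = {0: "stop", 1: "go"}
string_labels = [names[v] for v in knn.int64ClassLabels.vector]

# filling in stringClassLabels switches the label type; the int64 labels are cleared
knn.stringClassLabels.vector.extend(string_labels)

coremltools.utils.save_spec(spec, "KNNClassifier_strings.mlmodel")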

Related

Sklearn Random Forest different accuracy values for different label encodings

I'm using sklearn Random Forest to train my model. With the same input features for the model, I first tried passing the target labels through label_binarize to create one-hot encodings of my target labels, and second I tried using LabelEncoder to encode my target labels. In both cases I'm getting different accuracy scores. Is there a specific reason why this is happening, since I'm just using a different method to encode the labels without changing any input features?
It is not because of the labels, but because of the randomness of Random Forest.
Try fixing the random_state to avoid this situation.
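For example (toy data just to show where random_state goes):
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# fixing random_state everywhere makes the run reproducible
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # same accuracy on every run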
https://datascience.stackexchange.com/questions/74364/random-forrest-sklearn-gives-different-accuracy-for-different-target-label-encod
Basically, when you encode your target labels as one-hot encodings, sklearn treats it as a multilabel problem, whereas a label encoder gives a 1-D array that sklearn treats as a multiclass problem.
https://scikit-learn.org/stable/modules/multiclass.html
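You can see the difference in the shapes with a small example (made-up labels):
from sklearn.preprocessing import LabelEncoder, label_binarize

y = ["go", "stop", "wait", "go"]

# LabelEncoder -> 1-D integer array, which sklearn treats as a multiclass target
print(LabelEncoder().fit_transform(y))                    # [0 1 2 0]

# label_binarize -> 2-D indicator matrix, which sklearn treats as a multilabel target
print(label_binarize(y, classes=["go", "stop", "wait"]))  # shape (4, 3)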

Machine Learning Linear Regression - Sklearn

I'm new to the machine learning domain and in linear regression I have some doubts.
1: While practicing the sklearn linear regression model's prediction method, I am getting the below error.
Code:
sklearn.linear_model.LinearRegression.predict(25)
Error:
"ValueError: Expected 2D array, got scalar array instead: array=25. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
Do I need to pass a 2-D array? I checked the sklearn documentation page and haven't found anything about a version update.
Running my code on Kaggle:
https://www.kaggle.com/aman9d/bikesharingdemand-upx/
2: Is the index of the dataset going to affect the model's score (weights)?
First of all, you should post the code as you actually use it:
# import, instantiate, fit
from sklearn.linear_model import LinearRegression
linreg = LinearRegression()
linreg.fit(X, y)
# use the predict method
linreg.predict(25)
What you post in the question is not properly executable; the predict method is not static for the LinearRegression class.
When you fit a model, the first step is to recognize what kind of data the input will be; in your case it will be similar to X. That means that if you pass something with a different shape from X to the model, it will raise an error.
In your example X seems to be a pd.DataFrame() instance with only 1 column. It should be replaceable with a 2-dimensional array representing the number of examples by the number of features, so if you try:
linreg.predict([[25]])
should work.
For example, if you were trying a regression with more than 1 feature (aka column), let's say temp and humidity, your input would look like this:
linreg.predict([[25, 56]])
I hope this helps you, and always keep in mind the shape of your data.
Documentation: LinearRegression fit
X : array-like or sparse matrix, shape (n_samples, n_features)
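Putting it together with some made-up single-feature data:
import numpy as np
from sklearn.linear_model import LinearRegression

# one feature, so X must have shape (n_samples, 1)
X = np.array([10, 15, 20, 25, 30]).reshape(-1, 1)
y = np.array([50, 80, 120, 160, 200])

linreg = LinearRegression()
linreg.fit(X, y)

print(linreg.predict([[25]]))  # the input to predict must be 2-D as well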

Custom binary cross-entropy loss with weight-map using Keras

I have a question regarding the implementation of a custom loss-function for my neural network.
I am currently trying to segment cells for a project and I decided to use a U-Net as it seems to work quite well. In order to improve my current model, I decided to follow the idea of the original U-Net paper (https://arxiv.org/abs/1505.04597), where they implemented a weight map that assigns more weight to pixels located between tightly associated cells, as you can see in this picture: Example of a weight map.
I am currently using Keras for my U-Net, and my problem is that I do not know how to pass my weights to my model without creating any problem. My idea was to create a generator that yields the images together with a 2-channel array containing the labels in the first channel and the weights in the second channel; that way I can easily extract my weights and my labels in my custom loss function.
My code looks like this:
import numpy as np

# wrapped in a generator function so that the yield below is valid
def combined_generator(image_generator, label_generator, weight_generator):
    train_generator = zip(image_generator, label_generator, weight_generator)
    for (img, label, weight) in train_generator:
        img, label = adjustData(img, True, label)
        label_weights = np.concatenate((label, weight), axis=3)
        # This is the final generator
        yield (img, label_weights)
As you can see, I construct the train_generator with three previously constructed generators, I adjust some things and then I yield my images and combined labels and weights.
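The custom loss I have in mind would then slice that stacked array apart again, roughly like this (just a sketch, assuming a single-channel sigmoid output and the TensorFlow backend):
from keras import backend as K

def weighted_binary_crossentropy(y_true, y_pred):
    # y_true carries the labels in channel 0 and the weight map in channel 1
    labels = y_true[..., 0:1]
    weights = y_true[..., 1:2]
    bce = K.binary_crossentropy(labels, y_pred)
    return K.mean(bce * weights)

# model.compile(optimizer='adam', loss=weighted_binary_crossentropy)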
Then, when I try to fit my model with fit_generator, I get this error: ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays.
I really do not know what to do and how to implement correctly what I want to do.
Thank you in advance for your answers.

How should I give label names to the model output with coreML convert

I have built a model using Keras and I want to convert it to Core ML using this function:
import coremltools
coreml_model = coremltools.converters.keras.convert(model)
coreml_model.save('myModel')
The output of my model is a 10 neurons layer to predict 10 classes. My issue is that I would like to give the label name associated with each neuron classA, classB, etc.
The doc shows a lot of parameters (https://apple.github.io/coremltools/generated/coremltools.converters.keras.convert.html) but I can't understand which one to use: output_names, predicted_feature_name, or predicted_probabilities_output?
Never mind... I just did not read the doc properly. I had to use the class_labels parameter.
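For example, something like this (placeholder class names, assuming ten output classes and the Keras model from above):
import coremltools

class_labels = ['classA', 'classB', 'classC', 'classD', 'classE',
                'classF', 'classG', 'classH', 'classI', 'classJ']

coreml_model = coremltools.converters.keras.convert(model, class_labels=class_labels)
coreml_model.save('myModel.mlmodel')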

How to ignore some input layer, while predicting, in a keras model trained with multiple input layers?

I'm working with neural networks and I've implemented the following architecture using Keras with a TensorFlow backend:
For training, I'll give some labels in the layer labels_vector; this vector can have int32 values (i.e., 0 could be a label). For the testing phase, I need to just ignore this input layer. If I set it to 0, results could be wrong, since I've trained with labels that can be equal to the 0 vector. Is there a way to simply ignore or disable this layer in the prediction phase?
Thanks in advance.
How to ignore some input layer?
You can't. Keras cannot just ignore an input layer as the output depends on it.
One solution to get nearly what you want is to define a custom label in your training data to be the null value. Your network will learn to ignore it if it feels that it is not an important feature.
If labels_vector is a vector of categorical labels, use one-hot encoding instead of integer encoding. Integer encoding assumes that there is a natural ordered relationship between the labels, which is wrong here.
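For example, with the Keras utility (just a sketch with made-up labels):
import numpy as np
from keras.utils import to_categorical

labels = np.array([0, 2, 1, 0])                       # integer labels; 0 is a real class
labels_vector = to_categorical(labels, num_classes=3)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]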
