Dataiku - Saving Models in DSS Python Recipes - scikit-learn

How do I save a model in Dataiku?
This is the tutorial that I am using: https://doc.dataiku.com/dss/latest/python-api/model-evaluation-stores.html
Example Code:
import dataiku
from sklearn import linear_model
reg = linear_model.LinearRegression()
m = dataiku.Model(reg)
> TypeError: argument of type 'LinearRegression' is not iterable

Not sure what you're trying to achieve, but according to the doc, dataiku.Model(...) expects its parameter to be a str holding the ID of the model, not a scikit-learn estimator object.
You might want to do one of the following:
Turn your linear regression into an MLflow model and import it into Dataiku DSS.
Use the AutoML pipeline to train your linear regression in Dataiku.
In both cases, the corresponding doc can be found here: link

Related

How to preprocess data for training a character level RNN

I am trying to train an RNN model that classifies the origin of names. The data looks like the attached image. I understand that I first have to map the labels to integers. I am using the following code to do that:
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
origin_label = encoder.fit_transform(origins)
I am having trouble figuring out what the next steps should be. I am using Keras to build this model. Thank you very much for your help.
Data Format
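One possible next step, sketched here under assumptions rather than taken from the thread, is to turn each name into a fixed-length sequence of character indices. The toy names, the helper encode, and the choice to reserve index 0 for padding are all hypothetical:

```python
# Hypothetical toy data standing in for the names in the question.
names = ["Satoshi", "Müller", "Ng"]

# Build a character vocabulary; index 0 is reserved for padding.
chars = sorted({ch for name in names for ch in name})
char_to_index = {ch: i + 1 for i, ch in enumerate(chars)}


def encode(name, max_len):
    """Map a name to a list of character indices, right-padded with 0.

    Characters missing from the vocabulary are also mapped to 0 (an assumption).
    """
    ids = [char_to_index.get(ch, 0) for ch in name[:max_len]]
    return ids + [0] * (max_len - len(ids))


max_len = max(len(name) for name in names)
encoded = [encode(name, max_len) for name in names]
print(encoded)
```

Integer sequences like these can then be fed to a Keras Embedding layer (for example with mask_zero=True so the padding is ignored) followed by a recurrent layer, with origin_label as the target.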

Machine Learning Linear Regression - Sklearn

I'm new to the machine learning domain and have some doubts about linear regression.
1: While practicing with the predict method of the sklearn linear regression model, I get the error below.
Code:
sklearn.linear_model.LinearRegression.predict(25)
Error:
"ValueError: Expected 2D array, got scalar array instead: array=25. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
Do I need to pass a 2-D array? I checked the sklearn documentation page and haven't found anything about a version change.
Note: I am running my code on Kaggle:
https://www.kaggle.com/aman9d/bikesharingdemand-upx/
2: Does the index of the dataset affect the model's score (weights)?
First of all, you should post the code as you actually run it:
# import, instantiate, fit
from sklearn.linear_model import LinearRegression
linreg = LinearRegression()
linreg.fit(X, y)
# use the predict method
linreg.predict(25)  # raises the ValueError above: 25 is a scalar, not a 2D array
The code you posted in the question is not executable as written: predict is not a static method of the LinearRegression class, so it has to be called on a fitted instance.
When you fit a model, it learns the shape of its input data, in your case the shape of X. If you later pass something with a different shape to the model, it raises an error.
In your example X seems to be a pd.DataFrame instance with only one column. It can be replaced by a 2-dimensional array of shape (number of samples, number of features), so if you try:
linreg.predict([[25]])
it should work.
For example, if you were fitting a regression with more than one feature (column), say temp and humidity, your input would look like this:
linreg.predict([[25, 56]])
I hope this helps. Always keep the shape of your data in mind.
Documentation: LinearRegression fit
X : array-like or sparse matrix, shape (n_samples, n_features)
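As a minimal sketch of the shape rule described above, the following uses made-up toy data (one feature, y = 2x + 1), not the asker's Kaggle dataset, and shows that a single sample still has to be passed as a 2D array:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (assumed): five samples, one feature, y = 2x + 1 exactly.
X = np.array([10, 15, 20, 30, 35]).reshape(-1, 1)  # shape (5, 1)
y = 2 * X.ravel() + 1

linreg = LinearRegression().fit(X, y)

# A single sample must still be 2D: one row, one column.
sample = np.array([25]).reshape(1, -1)  # equivalent to [[25]]
prediction = linreg.predict(sample)

print(sample.shape, prediction.shape)
```

reshape(-1, 1) turns a flat array into one column (many samples, one feature), while reshape(1, -1) turns it into one row (one sample, many features), which matches the two options named in the error message.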

How should I give label names to the model output with coreML convert

I have built a model using Keras and I want to convert it to Core ML using this function:
import coremltools
coreml_model = coremltools.converters.keras.convert(model)
coreml_model.save('myModel')
The output of my model is a 10-neuron layer that predicts 10 classes. My issue is that I would like to give each neuron its associated label name: classA, classB, etc.
The doc shows a lot of parameters (https://apple.github.io/coremltools/generated/coremltools.converters.keras.convert.html) but I can't work out which one to use: output_names, predicted_feature_name, or predicted_probabilities_output?
Never mind... I just did not read the doc properly. I had to use the class_labels parameter.

Pickle for datapreprocessing

I was going through various tutorials and articles on using pickle to save an ML model so that it can be used later.
But I have not found a way to pickle, or do something similar for, the data preprocessing. My preprocessing consists of:
Changing the datatype of a few columns/features.
Feature engineering.
One-hot encoding/dummy variables.
Scaling the data using the code below:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Now I want to apply the same steps to every dataset that I pass in for predictions.
Is there a way to do something like pickle for the data preprocessing steps, so that I can load and apply them before passing the data to the ML model loaded from pickle?
Please guide.
I created a function and saved it in a separate file, then called that function whenever required.
Below is the code showing how I call the data preprocessing function:
import pandas as pd
from DataPreparationv3 import Data_Preprocess
Base_Data = pd.read_csv('Validate.csv')
DataReady = Data_Preprocess(Base_Data)
This solved my problem.
Regards
Sudhir
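An alternative to a separate preprocessing function, sketched here under assumptions rather than taken from the answer above, is to bundle the fitted scaler and the model in one scikit-learn Pipeline and pickle that single object. The toy data and the step names "scaler" and "model" are made up for illustration:

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data (assumed for illustration).
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 500.0], [4.0, 700.0]])
y_train = np.array([1.0, 2.0, 3.0, 4.0])

# Bundle scaling and the model so they are fitted and saved together.
pipe = Pipeline([("scaler", StandardScaler()), ("model", LinearRegression())])
pipe.fit(X_train, y_train)

# Serialize the fitted pipeline; pickle.dump(pipe, f) does the same for a file.
blob = pickle.dumps(pipe)

# Later: restore it and predict directly on raw, unscaled data.
restored = pickle.loads(blob)
X_new = np.array([[2.5, 400.0]])
print(restored.predict(X_new))
```

Because the scaler travels with the model, the statistics learned from X_train are reused at prediction time. Steps such as datatype changes or feature engineering would need to be wrapped as transformers (for example with sklearn's FunctionTransformer) to be included in the same way.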

Use gensim Random Projection in sklearn SVM

Is it possible to use a gensim Random Projection to train an SVM in sklearn?
I need to use gensim's tfidf implementation because it's better at dealing with large inputs and then want to put that into a random projection on which I will train my SVM. I'd also be happy to just pass the tfidf model generated by gensim to sklearn and use their random projection, if that makes things easier.
But so far I haven't found a way to get either model out of gensim into sklearn.
I have tried using gensim.matutils.corpus2csc but of course that doesn't work: neither TfidfModel nor RpModel is a corpus, so now I'm clueless about what to try next.
This is now very easy thanks to an awesome gensim contribution from Chinmaya Pancholi (see post here).
Simply import the sklearn wrapper from gensim (the gensim.sklearn_api module shipped with gensim 3.x and was removed in gensim 4.0):
from gensim.sklearn_api import RpTransformer
Then you can use the model in a pipeline as you would any other sklearn transformer:
from sklearn import svm
from sklearn.pipeline import Pipeline
model = RpTransformer(num_topics=2)
clf = svm.SVC()
pipe = Pipeline([('features', model), ('classifier', clf)])
pipe.fit(X_train, y_train)
One thing to be aware of, when using the gensim models, is that you still need to perform the dictionary and corpus steps. So instead of fitting your model on X_train, you'll have to do something along the following lines:
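from gensim.corpora import Dictionary
# X_train and X_test are expected to be lists of tokenized documents
dictionary = Dictionary(X_train)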
corpus_train = [dictionary.doc2bow(text) for text in X_train]
corpus_test = [dictionary.doc2bow(text) for text in X_test]
Then fit/predict your model on corpus_train or corpus_test.
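If the gensim wrapper is not available in your installed version, the fallback mentioned in the question (using sklearn's own random projection) can be sketched with scikit-learn alone. This swaps gensim's tfidf and RpModel for sklearn's TfidfVectorizer and SparseRandomProjection, so it is a different technique from the answer above; the toy documents, labels, and n_components=4 are assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.random_projection import SparseRandomProjection
from sklearn.svm import SVC

# Toy corpus and labels (assumed for illustration).
docs = [
    "cheap pills buy now",
    "meeting agenda attached",
    "buy cheap watches now",
    "agenda for the next meeting",
]
labels = [1, 0, 1, 0]

# tf-idf features -> random projection to 4 dimensions -> linear SVM.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rp", SparseRandomProjection(n_components=4, random_state=0)),
    ("classifier", SVC(kernel="linear")),
])
pipe.fit(docs, labels)
predictions = pipe.predict(docs)
print(predictions)
```

Here the pipeline takes raw strings directly, so no separate dictionary or corpus step is needed; whether it scales to large inputs as well as gensim does is not something this sketch establishes.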
