Error: transform() missing 1 required positional argument: 'X' after ML deployment while testing through Postman - azure-machine-learning-service

We have deployed an ML model to an Azure Machine Learning workspace and obtained the endpoints. But while testing, we get the above error.

The error concerns the transformation step in your modelling: the call is missing X, the matrix of independent variables to be transformed. Pass X explicitly, together with the correct column index of the column to be encoded, and fit-transform it:
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder = 'passthrough')
X = np.array(ct.fit_transform(X)) # X is independent variable
Something like the code block above.
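To make the fix concrete, here is a minimal runnable version with a made-up toy frame (the column names and values are illustrative; only the column index [1] matters):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical frame: column index 1 ("city") is the categorical column to encode
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["NY", "LA", "NY"],
    "income": [50_000, 64_000, 81_000],
})

ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [1])],  # index of the column to encode
    remainder="passthrough",
)
X = np.array(ct.fit_transform(df))  # fit AND transform in one call, so X is always supplied
print(X.shape)  # (3, 4): 2 one-hot columns for "city" plus 2 passthrough columns
```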


Using Sktime Regressor

I am trying to use a regressor model from sktime, but I couldn't figure out how to create the data type and format I need. Assume I want to use 2 columns as input and 1 column as the target.
from sktime.regression.interval_based import TimeSeriesForestRegressor
rand = np.random.random((200, 3))
X = pd.DataFrame(rand)[[0, 1]]
y = pd.DataFrame(rand)[2]
forecaster = TimeSeriesForestRegressor()
forecaster.fit(X=X, y=pd.Series(y))
The code block above throws this error: "X is not of a supported input data type. X must be in a supported mtype format for Panel, found <class 'pandas.core.frame.DataFrame'>. Use datatypes.check_is_mtype to check conformance with specifications."
How can I solve that problem?
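If I understand sktime's Panel mtypes correctly, the estimator wants panel data: one time series per instance, not one scalar per instance. One accepted layout is the "nested" DataFrame, in which each cell holds an entire pd.Series (recent versions also accept a 3D numpy array of shape (n_instances, n_variables, n_timepoints)). The sketch below only builds the container with synthetic data; the column names are arbitrary:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_instances, n_timepoints = 200, 10

# Nested Panel layout: one row per instance, one column per variable,
# and EACH CELL holds a pd.Series over that instance's time points
X = pd.DataFrame({
    "var_0": [pd.Series(rng.random(n_timepoints)) for _ in range(n_instances)],
    "var_1": [pd.Series(rng.random(n_timepoints)) for _ in range(n_instances)],
})
y = pd.Series(rng.random(n_instances))

print(X.shape)             # (200, 2) -- but each cell is a whole series
print(type(X.iloc[0, 0]))  # <class 'pandas.core.series.Series'>
```

sktime's datatypes.check_is_mtype can be used to verify that such a container conforms to a Panel mtype before fitting.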

ValueError: could not convert string to float: ' '. Is Permutation importance only applicable for numeric features?

I have a DataFrame containing categorical, float, and int dtypes.
X contains features of all three dtypes, and y is int.
I've created a pipeline as given below.
def get_imputer():
    ...  # returns some imputing transformer

def get_encoder():
    ...  # returns some encoder transformer

# model
pipeline = Pipeline(steps=[
    ('imputer', get_imputer()),
    ('encoder', get_encoder()),
    ('regressor', RandomForestRegressor())
])
I needed to find the permutation importance of the model; below is the code for that.
import eli5
from eli5.sklearn import PermutationImportance
perm = PermutationImportance(pipeline.steps[2][1], random_state=1).fit(X, y)
eli5.show_weights(perm)
But this code is throwing an error as follows:
ValueError: could not convert string to float: ''
Let's briefly go over how PermutationImportance works.
After you have trained your model on all the features, PermutationImportance shuffles the values of one or more columns and checks the effect on the loss function.
E.g. there are 5 features (columns) and n rows:
f1  f2  f3  f4  f5
v1  v2  v3  v4  v5
v6  v7  v8  v9  v10
...
Now, to identify whether column f3 is important, it shuffles the values in column f3 (e.g. the value of f3 in row x is swapped with the value of f3 in row y) and then checks the effect on the loss function. That is how it identifies the importance of a feature in the model.
Now, to answer this particular question: a model is only ever trained once all the features are numerical (an ML model does not understand text directly), so the data you supply to PermutationImportance must consist of numbers. Since you trained the model after converting categorical/textual values to numbers, you need to apply the same conversion to any new input.
Hence, PermutationImportance should only be used once your data is pre-processed and your dataframe is entirely numerical.
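The shuffling mechanism described above can be sketched without any ML library at all. The toy linear model below is purely illustrative (all data is synthetic): y depends strongly on column 0, weakly on column 1, and not at all on column 2, and shuffling each column in turn recovers exactly that ordering:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
# y depends strongly on column 0, weakly on column 1, not at all on column 2
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# "Train" a linear model via least squares (a stand-in for any fitted model)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
base_mse = np.mean((X @ w - y) ** 2)

importances = []
for col in range(3):
    Xp = X.copy()
    Xp[:, col] = rng.permutation(Xp[:, col])  # shuffle ONE column, keep the rest
    shuffled_mse = np.mean((Xp @ w - y) ** 2)
    importances.append(shuffled_mse - base_mse)  # loss increase = importance

print(importances)  # largest for column 0, near zero for column 2
```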
For the next poor soul...
I came across this post while having the same problem. While the accepted answer makes total sense, the fact is that in the OP's pipeline he is handling the categorical data with encoders that convert it to numeric.
So it appears that PermutationImportance checks the array for numeric values far too early in the process (before the pipeline runs at all). Instead, it should check after the preprocessing steps, right before the model is fitted. This is frustrating, because if it doesn't work with pipelines it is hard to use.
I started off having some luck with sklearn's implementation, permutation_importance, instead... but then I figured it out.
You need to separate the pipeline again, and then you should be able to get it to work. It's annoying, but it works!
import eli5
from eli5.sklearn import PermutationImportance
from sklearn.pipeline import Pipeline

estimator = pipeline.named_steps['regressor']
# Everything before the final step forms the preprocessing sub-pipeline
preprocessor = Pipeline(pipeline.steps[:-1])
X2 = preprocessor.transform(X)
if hasattr(X2, 'toarray'):  # densify only if the encoder returned a sparse matrix
    X2 = X2.toarray()
perm = PermutationImportance(estimator, random_state=1).fit(X2, y)
eli5.show_weights(perm)
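A self-contained sketch of this splitting trick, with a made-up mixed-dtype frame and sklearn's own permutation_importance standing in for eli5 (the data, the step names, and the ColumnTransformer layout are all mine, not the OP's):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data mirroring the OP's mixed-dtype frame
X = pd.DataFrame({
    "num": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
    "cat": ["a", "b", "a", np.nan, "b", "a"],
})
y = np.array([1.0, 2.0, 1.5, 3.0, 2.5, 1.2])

pre = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), ["num"]),
    ("cat", Pipeline([("imp", SimpleImputer(strategy="most_frequent")),
                      ("ohe", OneHotEncoder(handle_unknown="ignore"))]), ["cat"]),
])
pipeline = Pipeline([("pre", pre),
                     ("regressor", RandomForestRegressor(random_state=0))])
pipeline.fit(X, y)

# Split the fitted pipeline: everything but the last step is the preprocessor
preprocessor = Pipeline(pipeline.steps[:-1])
estimator = pipeline.named_steps["regressor"]
X2 = preprocessor.transform(X)
if hasattr(X2, "toarray"):  # densify only if the encoder output is sparse
    X2 = X2.toarray()

result = permutation_importance(estimator, X2, y, random_state=1)
print(result.importances_mean.shape)  # one importance per *transformed* column
```

Note that the importances are reported per transformed column (here 1 numeric column plus 2 one-hot columns), not per original feature.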

How to use extract the hidden layer features in H2ODeepLearningEstimator?

I found H2O has the function h2o.deepfeatures in R to pull the hidden layer features
https://www.rdocumentation.org/packages/h2o/versions/3.20.0.8/topics/h2o.deepfeatures
train_features <- h2o.deepfeatures(model_nn, train, layer=3)
But I didn't find any example in Python. Can anyone provide some sample code?
Most Python/R API functions are wrappers around REST calls. See http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/_modules/h2o/model/model_base.html#ModelBase.deepfeatures
So, to convert an R example to a Python one, make the model the object the method is called on, and shift the remaining arguments along. I.e. the example from the manual becomes (with dots in variable names changed to underscores):
prostate_hex = ...
prostate_dl = ...
prostate_deepfeatures_layer1 = prostate_dl.deepfeatures(prostate_hex, 1)
prostate_deepfeatures_layer2 = prostate_dl.deepfeatures(prostate_hex, 2)
Sometimes the function name changes slightly (e.g. h2o.importFile() vs. h2o.import_file()), so you may need to hunt for it at http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/index.html

Obtaining Same Result as Sklearn Pipeline without Using It

How would one correctly standardize the data without using a pipeline? I just want to make sure my code is correct and there is no data leakage.
So if I standardize the entire dataset once, right at the beginning of my project, and then go on to try different CV tests with different ML algorithms, will that be the same as creating an sklearn Pipeline and performing the same standardization in conjunction with each ML algorithm?
y = df['y']
X = df.drop(columns=['y', 'Date'])
scaler = preprocessing.StandardScaler().fit(X)
X_transformed = scaler.transform(X)
clf1 = DecisionTreeClassifier()
clf1.fit(X_transformed, y)
clf2 = SVC()
clf2.fit(X_transformed, y)
####Is this the same as the below code?####
pipeline1 = Pipeline([
    ('standardize', StandardScaler()),
    ('clf1', DecisionTreeClassifier())
])
pipeline1.fit(X_transformed, y)
pipeline2 = Pipeline([
    ('standardize', StandardScaler()),
    ('clf2', SVC())
])
pipeline2.fit(X_transformed, y)
Why would anybody choose the latter other than personal preference?
They are the same. You may prefer one or the other from a maintainability standpoint, but the outcome of a test-set prediction will be identical.
Edit: note that this is only the case because the StandardScaler is idempotent. It is also strange that you fit the pipelines on data that has already been scaled...
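A quick check of both claims on synthetic data: the two routes produce identical predictions, and re-standardizing already-standardized data is (numerically) a no-op, which is why fitting the pipeline on pre-scaled data happens to do no harm here:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y = (X[:, 0] > 5.0).astype(int)

# Route 1: scale once up front, then fit the bare classifier
X_scaled = StandardScaler().fit_transform(X)
clf = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# Route 2: let a Pipeline do the scaling as its first step
pipe = Pipeline([("standardize", StandardScaler()),
                 ("clf", DecisionTreeClassifier(random_state=0))]).fit(X, y)

# Predictions agree on the same raw inputs
print(np.array_equal(clf.predict(X_scaled), pipe.predict(X)))  # True

# Idempotence: standardizing already-standardized data changes (almost) nothing
X_twice = StandardScaler().fit_transform(X_scaled)
print(np.allclose(X_twice, X_scaled))  # True
```

The pipeline's real advantage shows up under cross-validation, where it refits the scaler on each training fold and thus avoids leaking test-fold statistics.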

CNTK Sequence 2 Sequence Tutorial : placeholder_variable initialization

I am new to CNTK and was following its sequence-to-sequence tutorial.
Inside the LSTM_layer function there is the following code:
dh = placeholder_variable(shape=(output_dim), dynamic_axes=input.dynamic_axes)
dc = placeholder_variable(shape=(output_dim), dynamic_axes=input.dynamic_axes)
LSTM_cell = LSTM(output_dim)
f_x_h_c = LSTM_cell(input, (dh, dc))
h_c = f_x_h_c.outputs
Now, in LSTM_cell(input, (dh, dc)), what are the values of dh and dc?
I don't see them being initialized anywhere when the LSTM_layer function is called.
If you look a few lines below, you will see that the placeholders get replaced. At model-creation time you may not yet have all the values needed, but you do know the shape of the data the function will operate on, so you create placeholders (containers) for those variables. Before the function is executed, these placeholders are replaced with variables that hold the values to be computed.
replacements = { dh: h.output, dc: c.output }
f_x_h_c.replace_placeholders(replacements)
