How to set initial weights in MLPClassifier? - scikit-learn

I cannot find a way to set the initial weights of the neural network, could someone tell me how please?
I am using python package sklearn.neural_network.MLPClassifier.
Here is the code for reference:
from sklearn.neural_network import MLPClassifier
classifier = MLPClassifier(solver="sgd")
classifier.fit(X_train, y_train)

Solution:
A working solution is to inherit from MLPClassifier and override the _init_coef method. In the _init_coef write the code to set the initial weights.
Then use the new class "MLPClassifierOverride" as in the example below instead of "MLPClassifier"
# new class
class MLPClassifierOverride(MLPClassifier):
# Overriding _init_coef method
def _init_coef(self, fan_in, fan_out):
if self.activation == 'logistic':
init_bound = np.sqrt(2. / (fan_in + fan_out))
elif self.activation in ('identity', 'tanh', 'relu'):
init_bound = np.sqrt(6. / (fan_in + fan_out))
else:
raise ValueError("Unknown activation function %s" %
self.activation)
coef_init = ### place your initial values for coef_init here
intercept_init = ### place your initial values for intercept_init here
return coef_init, intercept_init

The docs show you the attributes in use.
Attributes:
...
coefs_ : list, length n_layers - 1
The ith element in the list represents the weight matrix corresponding to > layer i.
intercepts_ : list, length n_layers - 1
The ith element in the list represents the bias vector corresponding to layer > i + 1.
Just build your classifier clf=MLPClassifier(solver="sgd") and set coefs_ and intercepts_ before calling clf.fit().
The only remaining question is: does sklearn overwrite your inits?
The code looks like:
if not hasattr(self, 'coefs_') or (not self.warm_start and not
incremental):
# First time training the model
self._initialize(y, layer_units)
This looks to me like it won't replace your given coefs_ (you might check biases too).
The packing and unpacking functions further indicates that this should be possible. These are probably used for serialization through pickle internally.

multilayer_perceptron.py initializes the weights based on the nonlinear function used for hidden layers. If you want to try a different initialization, you can take a look at the function _init_coef here and modify as you desire.

Related

How to use Keras Conv2D layers with OpenAI gym?

Using OpenAI's gym environment, I've created my own environment in which the observation space of box type, and the shape is (21,21,1).
The intention is to use a keras Conv2D layer as the model's input. Ideally, the shape going into this model would be (None,21,21,1), with None representing the batch size. Kera's documentation is here: https://keras.io/api/layers/convolution_layers/convolution2d/
The issue I'm having is that an extra dimension is being required while checking the shaping. Because of this, the shape it expects is (None,1,21,21,1). This is prohibiting me from using MaxPooling layers in the model. After investigating the keras RL library, this is due to two functions that are adding this dimensionality.
The first function is found in memory.py, where a current observation is put into a list and returned as such. Here:
def get_recent_state(self, current_observation):
"""Return list of last observations
# Argument
current_observation (object): Last observation
# Returns
A list of the last observations
"""
# This code is slightly complicated by the fact that subsequent observations might be
# from different episodes. We ensure that an experience never spans multiple episodes.
# This is probably not that important in practice but it seems cleaner.
state = [current_observation]
idx = len(self.recent_observations) - 1
for offset in range(0, self.window_length - 1):
current_idx = idx - offset
current_terminal = self.recent_terminals[current_idx - 1] if current_idx - 1 >= 0 else False
if current_idx < 0 or (not self.ignore_episode_boundaries and current_terminal):
# The previously handled observation was terminal, don't add the current one.
# Otherwise we would leak into a different episode.
break
state.insert(0, self.recent_observations[current_idx])
while len(state) < self.window_length:
state.insert(0, zeroed_observation(state[0]))
return state
The second function is called just after and computes the Q values based on the recent observation. It creates a list of the state when passing onto "compute_batch_q_values".
def compute_q_values(self, state):
q_values = self.compute_batch_q_values([state]).flatten()
assert q_values.shape == (self.nb_actions,)
return q_values
I understand that one extra dimension should be added to represent the batch size, but is it twice? Can anyone explain why this is or how to use Conv2d layers with OpenAI gym?
Thanks.

MultiOutput Classification with TensorFlow Extended (TFX)

I'm quite new to TFX (TensorFlow Extended), and have been going through the sample tutorial on the TensorFlow portal to understand a bit more to apply it to my dataset.
In my scenario, instead of predicting a single label, the problem at hand requires me to predict 2 outputs (category 1, category 2).
I've done this using pure TensorFlow Keras Functional API and that works fine, but then am now looking to see if that can be fitted into the TFX pipeline.
Where i get the error, is at the Trainer stage of the pipeline, and where it throws the error is in the _input_fn, and i suspect it's because i'm not correctly splitting out the given data into (features, labels) tensor pair in the pipeline.
Scenario:
Each row of the input data comes in the form of
[Col1, Col2, Col3, ClassificationA, ClassificationB]
ClassificationA and ClassificationB are the categorical labels which i'm trying to predict using the Keras Functional Model
The output layer of the keras functional model looks like below, where there's 2 outputs that is joined to a single dense layer (Note: _xf appended to the end is just to illustrate that i've encoded the classes to int representations)
output_1 = tf.keras.layers.Dense(
TargetA_Class, activation='sigmoid',
name = 'ClassificationA_xf')(dense)
output_2 = tf.keras.layers.Dense(
TargetB_Class, activation='sigmoid',
name = 'ClassificationB_xf')(dense)
model = tf.keras.Model(inputs = inputs,
outputs = [output_1, output_2])
In the trainer module file, i've imported the required packages at the start of the module file >
import tensorflow_transform as tft
from tfx.components.tuner.component import TunerFnResult
import tensorflow as tf
from typing import List, Text
from tfx.components.trainer.executor import TrainerFnArgs
from tfx.components.trainer.fn_args_utils import DataAccessor, FnArgs
from tfx_bsl.tfxio import dataset_options
The current input_fn in the trainer module file looks like the below (by following the tutorial)
def _input_fn(file_pattern: List[Text],
data_accessor: DataAccessor,
tf_transform_output: tft.TFTransformOutput,
batch_size: int = 200) -> tf.data.Dataset:
"""Helper function that Generates features and label dataset for tuning/training.
Args:
file_pattern: List of paths or patterns of input tfrecord files.
data_accessor: DataAccessor for converting input to RecordBatch.
tf_transform_output: A TFTransformOutput.
batch_size: representing the number of consecutive elements of returned
dataset to combine in a single batch
Returns:
A dataset that contains (features, indices) tuple where features is a
dictionary of Tensors, and indices is a single Tensor of label indices.
"""
return data_accessor.tf_dataset_factory(
file_pattern,
dataset_options.TensorFlowDatasetOptions(
batch_size=batch_size,
#label_key=[_transformed_name(x) for x in _CATEGORICAL_LABEL_KEYS]),
label_key=_transformed_name(_CATEGORICAL_LABEL_KEYS[0]), _transformed_name(_CATEGORICAL_LABEL_KEYS[1])),
tf_transform_output.transformed_metadata.schema)
When i run the trainer component the error that comes up is:
label_key=_transformed_name(_CATEGORICAL_LABEL_KEYS[0]),transformed_name(_CATEGORICAL_LABEL_KEYS1)),
^ SyntaxError: positional argument follows keyword argument
I've also tried label_key=[_transformed_name(x) for x in _CATEGORICAL_LABEL_KEYS]) which also gives an error.
However, if i just pass in a single label key, label_key=transformed_name(_CATEGORICAL_LABEL_KEYS[0]) then it works fine.
FYI - _CATEGORICAL_LABEL_KEYS is nothing but a list which contains the names of the 2 outputs i'm trying to predict (ClassificationA, ClassificationB).
transformed_name is nothing but a function to return an updated name/key for the transformed data:
def transformed_name(key):
return key + '_xf'
Question:
From what i can see, the label_key argument for dataset_options.TensorFlowDatasetOptions can only accept a single string/name of label, which means it may not be able to output the dataset with multi labels.
Is there a way which i can modify the _input_fn so that i can get the dataset that's returned by _input_fn to work with returning the 2 output labels? So the tensor that's returned looks something like:
Feature_Tensor: {Col1_xf: Col1_transformedfeature_values, Col2_xf:
Col2_transformedfeature_values, Col3_xf:
Col3_transformedfeature_values}
Label_Tensor: {ClassificationA_xf: ClassA_encodedlabels,
ClassificationB_xf: ClassB_encodedlabels}
Would appreciate advice from the wider community of tfx!
Since the label key is optional, maybe instead of specifying it in the TensorflowDatasetOptions, instead you can use dataset.map afterwards and pass both labels after taking them from your dataset.
Haven't tested it but something like:
def _data_augmentation(feature_dict):
features = feature_dict[_transformed_name(x) for x in
_CATEGORICAL_FEATURE_KEYS]]
keys=[_transformed_name(x) for x in _CATEGORICAL_LABEL_KEYS]
return features, keys
def _input_fn(file_pattern: List[Text],
data_accessor: DataAccessor,
tf_transform_output: tft.TFTransformOutput,
batch_size: int = 200) -> tf.data.Dataset:
"""Helper function that Generates features and label dataset for tuning/training.
Args:
file_pattern: List of paths or patterns of input tfrecord files.
data_accessor: DataAccessor for converting input to RecordBatch.
tf_transform_output: A TFTransformOutput.
batch_size: representing the number of consecutive elements of returned
dataset to combine in a single batch
Returns:
A dataset that contains (features, indices) tuple where features is a
dictionary of Tensors, and indices is a single Tensor of label indices.
"""
dataset = data_accessor.tf_dataset_factory(
file_pattern,
dataset_options.TensorFlowDatasetOptions(
batch_size=batch_size,
tf_transform_output.transformed_metadata.schema)
dataset = dataset.map(_data_augmentation)
return dataset

Getting a Scoring Function by Name in scikit-learn

In scikit-learn , there is the notion of a scoring function. If we have some predicted labels and the true labels, we can get to the score by calling scoring(y_true, y_predict). An example of such scoring function is sklearn.metrics.accuracy_score.
A scoring function is not to be confused of the scorer, which is an object that can be called as scorer(estimator, X, y_true).
There are many builtin scorers in scikit-learn. It is possible to get to these scorers by their string names. For example, we can get the scorer corresponding to the name 'accuracy' by calling sklearn.metrics.get_scorer("accuracy")/
But it turns out that there is no obvious mechanism to access the built-in scoring functions by their names at run-time, through passing in the name as a string. For example, there is no way to access sklearn.metrics.accuracy_score by its name accuracy.
For example, if at run time, the program knows the name of the scoring function is contained in variable name, I am looking for a mechanism get_scoring_function(), such that, get_scoring_function(name) will return the scoring function handle. Note that this name, name, is not known at scripting time.
Is there any way to access the built-in scoring functions by their names at run time through passing in the names as strings?
You can use the get_scorer() function, which accepts a string as an argument, and then get the _score_func attribute of the returned object.
So for example
from sklearn.metrics import get_scorer
get_scorer('accuracy')._score_func(y_true, y_pred)
is equivalent to
from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred)
I myself faced this task, and I haven't found a better way to access metrics by names than with sklearn.metrics.get_scorer function, but the drawback of it is that you have to pass an estimator there, not predictions. I tried to use the #collinb9 recommendation, but you see, you have to access a protected method there, and in my case, it led to unpleasant consequences, namely incorrectly calculated metrics.
This is a short example showing this problem.
from sklearn import datasets, model_selection, linear_model, metrics
features, labels = datasets.make_regression(1000, random_state=123)
train_features, test_features, train_labels, test_labels = model_selection.train_test_split(features, labels, test_size=0.1, random_state=567)
model = linear_model.LinearRegression()
model.fit(train_features, train_labels)
print(f'variant 1 neg_mse = {metrics.get_scorer("neg_mean_squared_error")(model, test_features, test_labels)}')
print(f'variant 1 neg_rmse = {metrics.get_scorer("neg_root_mean_squared_error")(model, test_features, test_labels)}\n')
preds = model.predict(test_features)
print(f'variant 2 mse = {metrics.mean_squared_error(test_labels, preds)}')
print(f'variant 2 rmse = {metrics.mean_squared_error(test_labels, preds, squared=False)}\n')
print(f'protected neg_mse = {metrics.get_scorer("neg_mean_squared_error")._score_func(test_labels, preds)}')
print(f'protected neg_rmse = {metrics.get_scorer("neg_root_mean_squared_error")._score_func(test_labels, preds)}')
The output of this program will be:
variant 1 neg_mse = -2.142587870436064e-25
variant 1 neg_rmse = -4.628809642268803e-13
variant 2 mse = 2.142587870436064e-25
variant 2 rmse = 4.628809642268803e-13
protected neg_mse = 2.142587870436064e-25
protected neg_rmse = 2.142587870436064e-25
You see, metrics calculated with the use of the protected method differ. First, we ordered to get negative values, but got positive ones (it should be mentioned, that for variant 2 metrics we didn't imply negative values). Second, the neg_mse and neg_rmse values are equal but should be different.
If we go to the source code of sklearn metrics, we will see:
This is how _score_func is called: it is multiplied by sign, so that's where we lose our negative values.
This is how scorers are made: you see, neg_root_mean_squared_error_scorer has extra parameter squared=False. This parameter is stated explicitly as an optional one in metrics.mean_squared_error, so you won't make a mistake. We can pass this parameter as a keyword argument to _score_fun and at least we will get a correct absolute value then:
print(f'protected neg_rmse = {metrics.get_scorer("neg_root_mean_squared_error")._score_func(test_labels, preds, squared=False)}')
protected neg_rmse = 4.628809642268803e-13
To make things short, I've shown, to my knowledge, the only way to get sklearn metrics by name (btw, you can find the full list of names here), and that it's not safe to use protected methods that you're not supposed to use. BTW, I was using sklearn version=0.24.2.
Since the documentation is incomplete, you'll have to go directly to the source code here for the complete list of metric names:
Metric Names
Search for __all__.
Answer of #collinb9 should not be accepted as it would lead to incorrect calculations.
You need other arguments (such as squared:False for rmse) to compute the correct thing. They can be accessed via the _kwargs attribute of _BaseScorer class. If you combine _score_func and _kwargs then we can get the corresponding scorer function.
The full answer to the question should be:
import functools
import sklearn
def score(scoring_name, y_true, y_pred):
sklearn_scorer = sklearn.metrics.get_scorer(scoring_name)
return sklearn_scorer._sign * sklearn_scorer._score_func(
y_true=y_true, y_pred=y_pred, **sklearn_scorer._kwargs
)
score("neg_root_mean_squared_error", y_true, y_pred)

How to include multiple input tensor in keras.model.fit_generator

I am a keras rookie and I need some help in working with keras after many days struggling at this problem. Please ask for further information if there is any ambiguity.
Currently, I am trying to modify the code from a link.According to their network model, there are 2 input tensors expected. Now I have trouble including 2 input tensors into the source code provided by them.
Function Boneage_prediction_model() initiates a model of 2 input tensors.
def Boneage_prediction_model():
i1 = Input(shape=(500, 500, 1), name='input_img') # the 1st input tensor
i2 = Input(shape=(1,), name='input_gender') # the 2nd input tensor
... ...
model = Model(inputs=(i1, i2), outputs=o) # define model input
with both i1 and i2
... ...
#using model.fit_generator to instantiate
# datagen is initiated by keras.preprocessing.image.ImageDataGenerator
# img_train is the 1st network input, and boneage_train is the training label
# gender_train is the 2nd network input
model.fit_generator(
(datagen.flow(img_train, boneage_train, batch_size=10),
gender_train),
... ...
)
I tried many ways to combine the two (datagen.flow(img_train, boneage_train, batch_size=10) and gender_train) as stated above, but it failed and kept reporting errors
such as the following,
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays: [array([[[[-0.26078433],
[-0.26078433],
[-0.26078433],
...,
[-0.26078433],
[-0.26078433],
[-0.26078433]],
[[-0.26078433],
[-0.26...
If I understand you correctly, you want to have two inputs for one network and have one label for the combined output. In the official documentation for the fit_generator there is an example with multiple inputs.
Using a dictionary to map the multiple inputs would result in:
model.fit_generator(
datagen.flow({'input_img':img_train, 'input_gender':gender_train}, boneage_train, batch_size=10),
...
)
After failure either blindly to simply combine the 2 inputs, or as another contributor suggested, to use a dictionary to map the multiple inputs, I realized it seems to be the problem of datagen.flow which keeps me from combining a image tensor input and a categorical tensor input. datagen.flow is initiated by keras.preprocessing.image.ImageDataGenerator with the goal of preprocessing the input images. Therefore chances are that it is inappropriate to combine the 2 inputs inside datagen.flow. Additionally, fit_generator seems to expect an input of generator type, and what I did as proposed in my question is wrong, though I do not fully understand the mechanism of this function.
As I looked up carefully in other codes written by the team, I learned that I need to write a generator to combine the two. The solution is as following,
def combined_generators(image_generator, gender_data, batch_size):
gender_generator = cycle(batch(gender_data, batch_size))
while True:
nextImage = next(image_generator)
nextGender = next(gender_generator)
assert len(nextImage[0]) == len(nextGender)
yield [nextImage[0], nextGender], nextImage[1]
def batch(iterable, n=1):
l = len(iterable)
for ndx in range(0, l, n):
yield iterable[ndx:min(ndx + n, l)]
train_gen_wrapper = combined_generators(train_gen_boneage, train_df_boneage['male'], BATCH_SIZE_TRAIN)
model.fit_generator(train_gen_wrapper, ... )

Implementing a complicated activation function in keras

I just read an interesting paper: A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks.
I'd like to try to implement this activation function in Keras. I've implemented custom activations before, e.g. a sinusoidal activation:
def sin(x):
return K.sin(x)
get_custom_objects().update({'sin': Activation(sin)})
However, the activation function in this paper has 3 unique properties:
It doubles the size of the input (the output is 2x the input)
It's parameterized
It's parameters should be regularized
I think once I have a skeleton for dealing with the above 3 issues, I can work out the math myself, but I'll take any help I can get!
Here, we will need one of these two:
A Lambda layer - If your parameters are not trainable (you don't want them to change with backpropagation)
A custom layer - If you need custom trainable parameters.
The Lambda layer:
If your parameters are not trainable, you can define your function for a lambda layer. The function takes one input tensor, and it can return anything you want:
import keras.backend as K
def customFunction(x):
#x can be either a single tensor or a list of tensors
#if a list, use the elements x[0], x[1], etc.
#Perform your calculations here using the keras backend
#If you could share which formula exactly you're trying to implement,
#it's possible to make this answer better and more to the point
#dummy example
alphaReal = K.variable([someValue])
alphaImag = K.variable([anotherValue]) #or even an array of values
realPart = alphaReal * K.someFunction(x) + ...
imagPart = alphaImag * K.someFunction(x) + ....
#You can return them as two outputs in a list (requires the fuctional API model
#Or you can find backend functions that join them together, such as K.stack
return [realPart,imagPart]
#I think the separate approach will give you a better control of what to do next.
For what you can do, explore the backend functions.
For the parameters, you can define them as keras constants or variables (K.constant or K.variable), either inside or outside the function above, or even transform them in model inputs. See details in this answer
In your model, you just add a lambda layer that uses that function.
In a Sequential model: model.add(Lambda(customFunction, output_shape=someShape))
In a functional API model: output = Lambda(customFunction, ...)(inputOrListOfInputs)
If you're going to pass more inputs to the function, you'll need the functional model API.
If you're using Tensorflow, the output_shape will be computed automatically, I believe only Theano requires it. (Not sure about CNTK).
The custom layer:
A custom layer is a new class you create. This approach will only be necessary if you're going to have trainable parameters in your function. (Such as: optimize alpha with backpropagation)
Keras teaches it here.
Basically, you have an __init__ method where you pass the constant parameters, a build method where you create the trainable parameters (weights), a call method that will do the calculations (exactly what would go in the lambda layer if you didn't have trainable parameters), and a compute_output_shape method so you can tell the model what the output shape is.
class CustomLayer(Layer):
def __init__(self, alphaReal, alphaImag):
self.alphaReal = alphaReal
self.alphaImage = alphaImag
def build(self,input_shape):
#weights may or may not depend on the input shape
#you may use it or not...
#suppose we want just two trainable values:
weigthShape = (2,)
#create the weights:
self.kernel = self.add_weight(name='kernel',
shape=weightShape,
initializer='uniform',
trainable=True)
super(CustomLayer, self).build(input_shape) # Be sure to call this somewhere!
def call(self,x):
#all the calculations go here:
#dummy example using the constant inputs
realPart = self.alphaReal * K.someFunction(x) + ...
imagPart = self.alphaImag * K.someFunction(x) + ....
#dummy example taking elements of the trainable weights
realPart = self.kernel[0] * realPart
imagPart = self.kernel[1] * imagPart
#all the comments for the lambda layer above are valid here
#example returning a list
return [realPart,imagPart]
def compute_output_shape(self,input_shape):
#if you decide to return a list of tensors in the call method,
#return a list of shapes here, twice the input shape:
return [input_shape,input_shape]
#if you stacked your results somehow in a single tensor, compute a single tuple, maybe with an additional dimension equal to 2:
return input_shape + (2,)
You need to implement a "Layer", not a common activation function.
I think the implementation of pReLU in Keras would be a good example for your task. See pReLU
A lambda function in the activation worked for me. Maybe not what you want but it's one step more complicated than the simple use of a built-in activation function.
encoder_outputs = Dense(units=latent_vector_len, activation=k.layers.Lambda(lambda z: k.backend.round(k.layers.activations.sigmoid(x=z))), kernel_initializer="lecun_normal")(x)
This code changes the output of a Dense from Reals to 0,1 (ie, binary).
Keras throws a warning but the code still proves to work.

Resources