If I understand the concept of embedding matrices correctly, they exist to provide a more efficient way to encode categorical variables than by using one hot encoding. It seems that if you have multiple categorical variables as inputs to a Keras model, you need to use a separate embedding matrix for each categorical variable. However, I can't find a way to use embedding with multiple categorical variables using the Embedding class provided by Keras. The example in the documentation shows only how to use embedding when the input to the model is a single categorical variable. Can somebody please provide a working example of how to use embedding with Keras when the input consists of multiple categorical variables, and possibly other variables for which embedding is not used (for example, continuous variables)?
For each categorical variable, you can have separate embedding. Hope the below code helps.
inputss = []
embeddings = []
for c in self.categorical_vars:
inputs = Input(shape=(1,),name='input_sparse_'+c)
#no_of_unique_cat = data_lr[categorical_var].nunique()
embedding_size = min(np.ceil((no_of_unique_cat)/2), 50 )
embedding_size = int(embedding_size)
embedding = Embedding(no_of_unique_cat+1, embedding_size, input_length = 1)(inputs)
embedding = Reshape(target_shape=(embedding_size,))(embedding)
inputss.append(inputs)
embeddings.append(embedding)
input_numeric = Input(shape=(1,),name='input_constinuous')
embedding_numeric = Dense(16)(input_numeric)
inputss.append(input_numeric)
embeddings.append(embedding_numeric)
x = Concatenate()(embeddings)
x = Dense(10, activation = 'relu')(x)
x = Dropout(.15)(x)
out_control = Dense(output_shape)(x)
Related
I have a Keras model with multiple images as inputs, but after trained, I'd like to set some images as default input and let just a single placeholder/input as the Query image.
I tried to use layers.Input(tensor=my_default_tensor) but it doesn't seems to be what I need, the model still seems to need this input to be given during inference time. I'd like it to be hidden from the user, so he just needs to pass the query image as input.
Have you tried tf.keras.layers.Input(tensor=tftensor, shape=())?
I discovered that it is so easy that's why I couldn't find an answer while googling hahaha
It just needed to wrap my "default input images" as a tf.constant(input_image), so I could now use them as normal Keras inputs, but without the need to pass it as a Model input.
Usage example:
from tensorflow.keras import layers
from tensorflow.keras import Model
image_1 = cv2.imread("image/path/1.jpg")
image_2 = cv2.imread("image/path/2.jpg")
normal_input = layers.Input((256, 256, 3))
default_input_1 = tf.constant(image_1)
default_input_2 = tf.constant(image_2)
x_1 = embeddingModule(default_input_1)
x_2 = embeddingModule(default_input_2)
x_3 = embeddingModule(normal_input)
concat_x1_x3 = layers.Concatenate()([x_1, x_3])
concat_x2_x3 = layers.Concatenate()([x_2, x_3])
relation_1 = relationModule(concat_x1_x3)
relation_2 = relationModule(concat_x2_x3)
model = Model(inputs=[normal_input], outputs=[relation_1, relation_2])
Here, embeddingModule and relationModule are two pre-trained Keras models.
I am dealing with a binary classification problem.
I have 2 lists of indexes listTrain and listTest, which are partitions of the training set (the actual test set will be used only later). I would like to use the samples associated with listTrain to estimate the parameters and the samples associated with listTest to evaluate the error in a cross validation process (hold out set approach).
However, I am not be able to find the correct way to pass this to the sklearn GridSearchCV.
The documentation says that I should create "An iterable yielding (train, test) splits as arrays of indices". However, I do not know how to create this.
grid_search = GridSearchCV(estimator = model, param_grid = param_grid,cv = custom_cv, n_jobs = -1, verbose = 0,scoring=errorType)
So, my question is how to create custom_cv based on these indexes to be used in this method?
X and y are respectivelly the features matrix and y is the vector of labels.
Example: Supose that I only have one hyperparameter alpha that belongs to the set{1,2,3}. I would like to set alpha=1, estimate the parameters of the model (for instance the coefficients os a regression) using the samples associated with listTrain and evaluate the error using the samples associated with listTest. Then I repeat the process for alpha=2 and finally for alpha=3. Then I choose the alpha that minimizes the error.
EDIT: Actual answer to question. Try passing cv command a generator of the indices:
def index_gen(listTrain, listTest):
yield listTrain, listTest
grid_search = GridSearchCV(estimator = model, param_grid =
param_grid,cv = index_gen(listTrain, listTest), n_jobs = -1,
verbose = 0,scoring=errorType)
EDIT: Before Edits:
As mentioned in the comment by desertnaut, what you are trying to do is bad ML practice, and you will end up with a biased estimate of the generalisation performance of the final model. Using the test set in the manner you're proposing will effectively leak test set information into the training stage, and give you an overestimate of the model's capability to classify unseen data. What I suggest in your case:
grid_search = GridSearchCV(estimator = model, param_grid = param_grid,cv = 5,
n_jobs = -1, verbose = 0,scoring=errorType)
grid_search.fit(x[listTrain], y[listTrain]
Now, your training set will be split into 5 (you can choose the number here) folds, trained using 4 of those folds on a specific set of hyperparameters, and tested the fold that was left out. This is repeated 5 times, till all of your training examples have been part of a left out set. This whole procedure is done for each hyperparameter setting you are testing (5x3 in this case)
grid_search.best_params_ will give you a dictionary of the parameters that performed the best over all 5 folds. These are the parameters that you use to train your final classifier, using again only the training set:
clf = LogisticRegression(**grid_search.best_params_).fit(x[listTrain],
y[listTrain])
Now, finally your classifier is tested on the test set and an unbiased estimate of the generalisation performance is given:
predictions = clf.predict(x[listTest])
I have the following siamese model:
I would like to make the enabling/disabling of layers a-L1 and b-L1 trainable. ie: a-L1 and/or b-L1 should be transparent (not used or disabled) for the current input if necessary. So, the model after training, will learn when it should enable/disable one or both of the layers a-L1 and b-L1.
I managed to train this model with 4 cases, so I got 4 different models accordingly:
model-1: without a-L1 and b-L1
model-2: without a-L1
model-3: without b-L1
model-4: with both a-L1 and b-L1
the performances of these models complement each other and I would like to combine them. Do you have some suggestions, please?
Let's consider you have trained four models and them let's call them m1, m2, m3 and m4
first define the input layer which is common for all of them.
inputs = Input(shape=your_inputs_shape)
model_1_output = m1(inputs)
model_2_output = m2(inputs)
model_3_output = m3(inputs)
model_4_output = m4(inputs)
merged_layer = Concatenate(axis=your_concatanation_axis)([model_1_output, model_2_output, model_3_output,model_4_output)
new_model = Model(inputs=inputs, outputs=merged_layer)
I hope this will solve your problem.
EDIT:
To answer your question on comment, It is possible to combine only the layers before L2. But you have to decide which model's layers starting from L2, you are going to use(Since you are not combining layers starting from L2). Let's assume you want to use m1 model's layers after L2. In addition I want to add the weighting mechanism I've specified above in comments of the answer.
First let's define new models with common new inputs
new_inputs = Input(shape=(inputs_shape))
new_m1 = keras.models.Model(inputs = new_inputs, outputs = m1(new_inputs))
new_m2 = keras.models.Model(inputs = new_inputs, outputs = m2(new_inputs))
new_m3 = keras.models.Model(inputs = new_inputs, outputs = m3(new_inputs))
new_m4 = keras.models.Model(inputs = new_inputs, outputs = m4(new_inputs))
Now get the L2 layer for all models
model1_l2 = new_m1.layers[1].get_layer("L2").output
model2_l2 = new_m2.layers[1].get_layer("L2").output
model3_l2 = new_m3.layers[1].get_layer("L2").output
model4_l2 = new_m4.layers[1].get_layer("L2").output
weighted merge
merged = Concatenate(axis=your_concatanation_axis)([model1_l2, model2_l2, model3_l2,model4_l2])
merged_layer_shape = merged.get_shape().as_list()
# specify number of channels you want the output to have after merging
desired_output_channels = 32
new_trainable_weights = keras.backend.random_normal_variable(shape=(merged_layer_shape[-1], desired_output_channels),mean=0,scale=1)
weighted_output = keras.backend.dot(merged,new_trainable_weights)
now connect the layer of model1(m1) next to L2 with this new weighted_output
# I'm using some protected properties of layer. But it is not recommended way to do it.
# get the index of l2 layer in new_m1
for i in range(len(new_m1.layers[1].layers)):
if new_m1.layers[1].layers[i].name=="L2":
index = i
x = weighted_output
for i in range(index+1, len(new_m1.layers[1].layers)):
x = new_m1.layers[1].layers[i](x)
new_model = keras.models.Model(inputs=new_inputs, outputs=x)
I am trying to build a neural network without using estimators. I have defined layers as,
x_categorical = tf.placeholder(tf.string)
x_numeric = tf.placeholder(tf.float32)
l1 = tf.add(tf.matmul(x_numeric,weights), biases)
l2 = tf.add(tf.matmul(x_categorical,weights), biases)
tf.matmul works well for numeric features but i also have some categorical features. So i am unable to use them
I tried tf.string_to_hash_bucket_fast but it converts the string to int64 which is not supported by tf.matmul, i also tried tf.decode_raw. that also did not work. So please help me with this I want use categorical features as well.
To handle categorical values in a Neural Network you have to represent them in OneHot representation. If they are string (as it seems to be your case) you first have to convert them to "Integer representation". Step by step:
Using from sklearn.preprocessing import LabelEncoder,OneHotEncoder
Define you categorial string values
categorical_values = np.array([['Foo','bar','values'],['more','foo','bar'],['many','foo','bar']])
Then encode them as integers:
categorical_values[:,0] = LabelEncoder().fit_transform(categorical_values[:,0])
categorical_values[:,1] = LabelEncoder().fit_transform(categorical_values[:,1])
categorical_values[:,2] = LabelEncoder().fit_transform(categorical_values[:,2])
And use OneHotEncoder to obtain the OneHot representation:
oneHot_values = OneHotEncoder().fit_transform(categorical_values).toarray()
Define your graph:
x_categorical = tf.placeholder(shape=[NUM_OBSERVATIONS,NUM_FEATURES],dtype=tf.float32)
weights = tf.Variable(tf.truncated_normal([NUM_FEATURES,NUM_CLASSES]),dtype=tf.float32)
bias = tf.Variable([NUM_CLASSES],dtype=tf.float32)
l2 = tf.add(tf.matmul(x_categorical,weights),bias)
And execute it obtaining the results:
with tf.Session() as sess:
tf.global_variables_initializer().run()
_l2 = sess.run(l2,feed_dict={x_categorical : oneHot_values})
Edit: As requested, no-sklearn version.
Using just numpy.unique() and tensorflow.one_hot()
categorical_values = np.array(['Foo','bar','values']) #For one observation
lookup, labeledValues = np.unique(categorical_values, return_inverse=True)
oneHotValues = tf.one_hot(labeledValues,depth=NUM_FEATURES)
Full example on the JN linked below
Here you have a Jupyter Notebook with the code on my Github
Assume I have a model like this. M1 and M2 are two layers linking left and right sides of the model.
The example model: Red lines indicate backprop directions
During training, I hope M1 can learn a mapping from L2_left activation to L2_right activation. Similarly, M2 can learn a mapping from L3_right activation to L3_left activation.
The model also needs to learn the relationship between two inputs and the output.
Therefore, I should have three loss functions for M1, M2, and L3_left respectively.
I probably can use:
model.compile(optimizer='rmsprop',
loss={'M1': 'mean_squared_error',
'M2': 'mean_squared_error',
'L3_left': mean_squared_error'})
But during training, we need to provide y_true, for example:
model.fit([input_1,input_2], y_true)
In this case, the y_true is the hidden layer activations and not from a dataset.
Is it possible to build this model and train it using it's hidden layer activations?
If you have only one output, you must have only one loss function.
If you want three loss functions, you must have three outputs, and, of course, three Y vectors for training.
If you want loss functions in the middle of the model, you must take outputs from those layers.
Creating the graph of your model: (if the model is already defined, see the end of this answer)
#Here, all "SomeLayer(blabla)" could be replaced by a "SomeModel" if necessary
#Example of using a layer or a model:
#M1 = SomeLayer(blablabla)(L12)
#M1 = SomeModel(L12)
from keras.models import Model
from keras.layers import *
inLef = Input((shape1))
inRig = Input((shape2))
L1Lef = SomeLayer(blabla)(inLef)
L2Lef = SomeLayer(blabla)(L1Lef)
M1 = SomeLayer(blablaa)(L2Lef) #this is an output
L1Rig = SomeLayer(balbla)(inRig)
conc2Rig = Concatenate(axis=?)([L1Rig,M1]) #Or Add, or Multiply, however you're joining the models
L2Rig = SomeLayer(nlanlab)(conc2Rig)
L3Rig = SomeLayer(najaljd)(L2Rig)
M2 = SomeLayer(babkaa)(L3Rig) #this is an output
conc3Lef = Concatenate(axis=?)([L2Lef,M2])
L3Lef = SomeLayer(blabla)(conc3Lef) #this is an output
Creating your model with three outputs:
Now you've got your graph ready and you know what the outputs are, you create the model:
model = Model([inLef,inRig], [M1,M2,L3Lef])
model.compile(loss='mse', optimizer='rmsprop')
If you want different losses for each output, then you create a list:
#example of custom loss function, if necessary
def lossM1(yTrue,yPred):
return keras.backend.sum(keras.backend.abs(yTrue-yPred))
#compiling with three different loss functions
model.compile(loss = [lossM1, 'mse','binary_crossentropy'], optimizer =??)
But you've got to have three different yTraining too, for training with:
model.fit([input_1,input_2], [yTrainM1,yTrainM2,y_true], ....)
If your model is already defined and you don't create it's graph like I did:
Then, you have to find in yourModel.layers[i] which ones are M1 and M2, so you create a new model like this:
M1 = yourModel.layers[indexForM1].output
M2 = yourModel.layers[indexForM2].output
newModel = Model([inLef,inRig], [M1,M2,yourModel.output])
If you want that two outputs be equal:
In this case, just subtract the two outputs in a lambda layer, and make that lambda layer be an output of your model, with expected values = 0.
Using the exact same vars as before, we'll just create two addictional layers to subtract outputs:
diffM1L1Rig = Lambda(lambda x: x[0] - x[1])([L1Rig,M1])
diffM2L2Lef = Lambda(lambda x: x[0] - x[1])([L2Lef,M2])
Now your model should be:
newModel = Model([inLef,inRig],[diffM1L1Rig,diffM2L2lef,L3Lef])
And training will expect those two differences to be zero:
yM1 = np.zeros((shapeOfM1Output))
yM2 = np.zeros((shapeOfM2Output))
newModel.fit([input_1,input_2], [yM1,yM2,t_true], ...)
Trying to answer to the last part: how to make gradients only affect one side of the model.
...well.... at first that sounds unfeasible to me. But, if that is similar to "train only a part of the model", then it's totally ok by defining models that only go to a certain point and making part of the layers untrainable.
By doing that, nothing will affect those layers. If that's what you want, then you can do it:
#using the previous vars to define other models
modelM1 = Model([inLef,inRig],diffM1L1Rig)
This model above ends in diffM1L1Rig. Before compiling, you must set L2Right untrainable:
modelM1.layers[??].trainable = False
#to find which layer is the right one, you may define then using the "name" parameter, or see in the modelM1.summary() the shapes, types etc.
modelM1.compile(.....)
modelM1.fit([input_1, input_2], yM1)
This suggestion makes you train only a single part of the model. You can repeat the procedure for M2, locking the layers you need before compiling.
You can also define a full model taking all layers, and lock only the ones you want. But you won't be able (I think) to make half gradients pass by one side and half the gradients pass by the other side.
So I suggest you keep three models, the fullModel, the modelM1, and the modelM2, and you cycle them in training. One epoch each, maybe....
That should be tested....