I would like to use Keras to train a multi-input NN with a batch of training data, but I'm not able to pass a set of input and output samples to execute a fit or a train_on_batch on the model.
My NN is defined as follows:
i1 = keras.layers.Input(shape=(2,))
i2 = keras.layers.Input(shape=(2,))
i3 = keras.layers.Input(shape=(2,))
i_layer = keras.layers.Dense(2, activation='sigmoid')
embedded_i1 = i_layer(i1)
embedded_i2 = i_layer(i2)
embedded_i3 = i_layer(i3)
middle_concatenation = keras.layers.concatenate([embedded_i1, embedded_i2, embedded_i3], axis=1)
out = keras.layers.Dense(1, activation='sigmoid')(middle_concatenation)
model = keras.models.Model(inputs=[i1, i2, i3], outputs=out)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
For example, an instance of the input (successfully used to predict the output) is the following:
[array([[0.1, 0.2]]), array([[0.3, 0.5]]), array([[0.1, 0.3]])]
But when I try to train my model with:
inputs = [[np.array([[0.1, 0.2]]), np.array([[0.3, 0.5]]), np.array([[0.1, 0.3]])],
[np.array([[0.2, 0.1]]), np.array([[0.5, 0.3]]), np.array([[0.3, 0.1]])]
]
outputs = np.ones(len(inputs))
model.fit(inputs, outputs)
I get this error:
ValueError: Error when checking model input: you are passing a list as input to your model, but the model expects a list of 3 Numpy arrays instead. The list you passed was: [[array([[ 0.1, 0.2]]), array([[ 0.3, 0.5]]), array([[ 0.1, 0.3]])], [array([[ 0.2, 0.1]]), array([[ 0.5, 0.3]]), array([[ 0.3, 0.1]])]]
What am I doing wrong?
How can I train a multi-input NN with a batch of input/output samples?
Thank you!
The problem is just incorrect formatting. You can't pass a list of lists to Keras; a multi-input model expects a list of NumPy arrays, one per input. So when you have your data structured like
inputs = [[np.array([[0.1, 0.2]]), np.array([[0.3, 0.5]]), np.array([[0.1, 0.3]])],
[np.array([[0.2, 0.1]]), np.array([[0.5, 0.3]]), np.array([[0.3, 0.1]])]
]
You need to pass one list element into your model at a time. You will also need to pass one output value to the model at a time. To do this, structure your outputs like this
outputs = [np.ones(1) for x in inputs]
[array([ 1.]), array([ 1.])]
Then you can loop over the fit function like this
for z in range(len(inputs)):
    model.fit(inputs[z], outputs[z], batch_size=1)
You can also replace model.fit with model.train_on_batch(); see the docs.
However, to avoid the loop, you could store 3 NumPy arrays in your inputs list and keep your outputs as a single NumPy array. If you only want to train on a single batch at a time, you can set your batch size to do that.
inputs = [np.array([[0.1, 0.2],[0.2, 0.1]]), np.array([[0.3, 0.5],[0.5, 0.3]]), np.array([[0.1, 0.3],[0.3, 0.1]])]
outputs = np.ones(inputs[0].shape[0])
model.fit(inputs,outputs,batch_size=1)
The problem is that right now you are using a list of lists as input, although keras expects a list of arrays.
You need to convert your list so that it looks like [array_inputs_1, array_inputs_2, array_inputs_3], where each input array is the array of inputs you would pass to the model if it had only that input layer. You simply put the 3 of them inside a list.
Using your data the correct input should be:
[np.array([[0.1, 0.2], [0.2, 0.1]]),
 np.array([[0.3, 0.5], [0.5, 0.3]]),
 np.array([[0.1, 0.3], [0.3, 0.1]])]
This way, as long as all 3 input arrays have the same number of samples, Keras will know how to divide them into batches.
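If the data already exists in the per-sample layout from the question, it can be regrouped into one array per input without retyping it. The following is a minimal NumPy sketch of that regrouping; the variable name `samples` is only illustrative, and it assumes every per-sample array has shape (1, 2) as in the question.

```python
import numpy as np

# Per-sample layout from the question: samples[i][j] is the (1, 2) array
# holding input j of sample i.
samples = [
    [np.array([[0.1, 0.2]]), np.array([[0.3, 0.5]]), np.array([[0.1, 0.3]])],
    [np.array([[0.2, 0.1]]), np.array([[0.5, 0.3]]), np.array([[0.3, 0.1]])],
]

# zip(*samples) groups the arrays by input index; concatenating along axis 0
# stacks the samples, so each resulting array has shape (num_samples, 2).
inputs = [np.concatenate(per_input, axis=0) for per_input in zip(*samples)]
outputs = np.ones(len(samples))

for arr in inputs:
    print(arr.shape)
```

The resulting `inputs` list has one array per model input, which is the layout that `model.fit(inputs, outputs)` expects for the three-input model above.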
I am doing time-series forecasting in Keras with a CNN and the EHR dataset. The goal is to predict both what molecule to give to the patient and the time until the next patient visit. I have to implement a bi-objective gradient descent based on this paper. The algorithm to implement is here (end of page 7, beginning of page 8):
The model I chose is this one:
With time series of length 3 as input (corresponding to 3 consecutive visits for a client)
And 2 outputs:
the atc code (the code of the molecule to predict)
the time to wait until the next visit (in categories of months: 0,1,2,3,4 for >=4)
Both outputs use the SparseCategoricalCrossentropy loss function.
When I start to implement the first operation, gs - gl, I get this error:
Some values in my gradients are None and I don't know why. My optimizer is defined as follows: optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3) when compiling my model.
Also, when I try some operations on gradients to see how things work, I have another problem: only one input is taken into account which will pose a problem later because I have to consider each loss function separately:
With this code, I have this output message : WARNING:tensorflow:Gradients do not exist for variables ['outputWaitTime/kernel:0', 'outputWaitTime/bias:0'] when minimizing the loss.
EPOCHS = 1
for epoch in range(EPOCHS):
    with tf.GradientTape() as ATCTape, tf.GradientTape() as WTTape:
        predictions = model(xTrain,training=False)
        ATCLoss = loss(yTrain[:,:,0],predictions[ATC_CODE])
        WTLoss = loss(yTrain[:,:,1],predictions[WAIT_TIME])
    ATCGrads = ATCTape.gradient(ATCLoss, model.trainable_variables)
    WTGrads = WTTape.gradient(WTLoss,model.trainable_variables)
    grads = ATCGrads + WTGrads
    model.optimizer.apply_gradients(zip(grads, model.trainable_variables))
With this code, it's okay, but both losses are combined into one, whereas I need to consider both losses separately
EPOCHS = 1
for epoch in range(EPOCHS):
    with tf.GradientTape() as tape:
        predictions = model(xTrain,training=False)
        ATCLoss = loss(yTrain[:,:,0],predictions[ATC_CODE])
        WTLoss = loss(yTrain[:,:,1],predictions[WAIT_TIME])
        lossValue = ATCLoss + WTLoss
    grads = tape.gradient(lossValue, model.trainable_variables)
    model.optimizer.apply_gradients(zip(grads, model.trainable_variables))
I need help to understand why I have all of those problems.
The notebook containing all the code is here: https://colab.research.google.com/drive/1b6UorAAEddNKFQCxaK1Wsuj09U645KhU?usp=sharing
The implementation begins in the part Model Creation
The reason you get None in ATCGrads and WTGrads is that each loss depends on only one of the two outputs, outputATC or outputWaitTime. If an output is not used to calculate a loss, there are no gradients of that loss with respect to that output layer's variables, so you get None for them. That is also why you get WARNING:tensorflow:Gradients do not exist for variables ['outputWaitTime/kernel:0', 'outputWaitTime/bias:0'] when minimizing the loss: those gradients do not exist for that loss. If you combine the losses into one, both outputs are used to calculate it, so there is no warning.
So if you want to do an element-wise subtraction of the two lists, you can first convert None to 0. before subtracting. You cannot use tf.math.subtract(gs, gl) directly, because it requires the shapes of all inputs to match, so:
import tensorflow as tf
gs = [tf.constant([1., 2.]), tf.constant(3.), None]
gl = [tf.constant([3., 4.]), None, tf.constant(4.)]
to_zero = lambda i : 0. if i is None else i
gs = list(map(to_zero, gs))
gl = list(map(to_zero, gl))
sub = [s_i - l_i for s_i, l_i in zip(gs, gl)]
print(sub)
Outputs:
[<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-2., -2.], dtype=float32)>,
<tf.Tensor: shape=(), dtype=float32, numpy=3.0>,
<tf.Tensor: shape=(), dtype=float32, numpy=-4.0>]
Also be aware that tape.gradient() returns a list or nested structure of Tensors (or IndexedSlices, or None), one for each element in sources, with the same structure as sources. Adding two Python lists, [1, 2] + [3, 4], does not give you [4, 6] as it would with NumPy arrays; it concatenates them and gives you [1, 2, 3, 4].
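The difference between concatenating two gradient lists and adding them element by element can be sketched in plain Python. The helper `add_gradient_lists` below is hypothetical (it is not part of TensorFlow), and plain floats stand in for gradient tensors; it treats a None entry as "this loss does not contribute to that variable".

```python
def add_gradient_lists(grads_a, grads_b):
    """Element-wise sum of two gradient lists; None means 'no contribution'."""
    if len(grads_a) != len(grads_b):
        raise ValueError("both lists must have one entry per variable")
    combined = []
    for a, b in zip(grads_a, grads_b):
        if a is None and b is None:
            combined.append(None)   # neither loss reaches this variable
        elif a is None:
            combined.append(b)
        elif b is None:
            combined.append(a)
        else:
            combined.append(a + b)
    return combined


# Floats stand in for gradient tensors, one entry per trainable variable.
atc_grads = [1.0, 2.0, None]
wt_grads = [3.0, None, 4.0]

print(len(atc_grads + wt_grads))                # 6: the lists were concatenated
print(add_gradient_lists(atc_grads, wt_grads))  # [4.0, 2.0, 4.0]
```

With real gradients the same loop works unchanged, because tensors support `+`; the point is only that the combined list keeps one entry per variable, which is what `zip(grads, model.trainable_variables)` needs.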
I am quite new to keras and I have a problem in understanding shapes.
I want to create a 1D Conv Keras model as follows, but I don't know whether it is correct:
TIME_PERIODS = 511
num_sensors = 2
num_classes = 4
BATCH_SIZE = 400
EPOCHS = 50
model_m = Sequential()
model_m.add(Conv1D(100, 10, activation='relu', input_shape=(TIME_PERIODS, num_sensors)))
model_m.add(Conv1D(100, 10, activation='relu'))
model_m.add(MaxPooling1D(3))
model_m.add(Conv1D(160, 10, activation='relu'))
model_m.add(Conv1D(160, 10, activation='relu'))
model_m.add(GlobalAveragePooling1D())
model_m.add(Dropout(0.5))
model_m.add(Dense(num_classes, activation='softmax'))
The input data I have is 888 different pandas DataFrames, each of shape (511, 3), where 511 is the number of signal points, the 0th column holds the sensor1 values, the 1st column the sensor2 values, and the 2nd column the labelled signals.
How should I combine all 888 DataFrames into X and Y, so that I can get x_train and y_train using sklearn's train_test_split?
Also, I think the input shape I am defining for the model is wrong, and I don't think I actually have TIME_PERIODS because, for one time point, I have 2 sensor input values (orange and blue lines) and 1 output label (green line).
The context of the problem I am trying to solve e.g.
input: time-based 2 sensors values say for 1 AM-2 AM hour from a user, output: the range of times e.g where the user was doing activity 1, activity 2, activity X on 1:10-1:15, 1:15-1:30, 1:30-2:00, The above plot show a sample training input and output.
The problem is inspired from here but in my case, I don't have any time period, my 1-time point has 1 output label.
Update 1:
I am almost certain that my TIME_PERIODS=1, as for the prediction I will give 511 inputs and expect to get 511 output values.
Each dataframe is an independent sequence?
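import os
import numpy as np
import pandas as pd
data_dir = "data"  # hypothetical folder holding the 888 CSV files
fileNames = [os.path.join(data_dir, f) for f in sorted(os.listdir(data_dir))]
allFrames = [pd.read_csv(filename).values for filename in fileNames]  # add other read_csv options as needed
allData = np.stack(allFrames, axis=0)     # shape (888, 511, 3)
inputData = allData[:, :, :num_sensors]   # sensor columns, shape (888, 511, 2)
outputData = allData[:, :, -1:]           # label column, shape (888, 511, 1)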
You can now use train test split the way you want.
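To make the shapes concrete, here is a minimal sketch that uses random numbers in place of the real frames and splits whole sequences with plain NumPy indexing. The 20% test fraction and the random data are assumptions for illustration only; sklearn's train_test_split could be applied to the same two arrays instead.

```python
import numpy as np

num_frames, time_steps, num_sensors = 888, 511, 2

# Synthetic stand-in for the stacked frames: sensor columns plus one label column.
rng = np.random.default_rng(0)
allData = rng.random((num_frames, time_steps, num_sensors + 1))

# Slice the last axis (the columns), keeping the frame and time axes intact.
inputData = allData[:, :, :num_sensors]   # (888, 511, 2)
outputData = allData[:, :, -1:]           # (888, 511, 1)

# Shuffle whole sequences (axis 0) and hold out 20% of them for testing.
order = rng.permutation(num_frames)
n_test = int(0.2 * num_frames)
test_idx, train_idx = order[:n_test], order[n_test:]

x_train, y_train = inputData[train_idx], outputData[train_idx]
x_test, y_test = inputData[test_idx], outputData[test_idx]

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

Splitting along axis 0 keeps each 511-step sequence intact, so no sequence is divided between the training and test sets.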
Your input shape is correct.
If you want to predict the whole sequence, then you have to remove the poolings. Every convolution should use padding='same'.
And maybe you should use a Bidirectional(LSTM(units, return_sequences=True)) layer somewhere to make your model stronger.
A simple model as an example. (Notice that models are totally open to creativity)
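from keras.layers import Input, Conv1D, LSTM, Bidirectional
from keras.models import Model

# 64 filters and 32 LSTM units are placeholder choices; any positive integers work
inputs = Input((TIME_PERIODS, num_sensors))  # TIME_PERIODS is really the number of time steps
outputs = Conv1D(64, 3, padding='same', activation='tanh')(inputs)
outputs = Bidirectional(LSTM(32, return_sequences=True))(outputs)
# kernel_size=1 is an assumed choice; the original call omitted the kernel size
outputs = Conv1D(num_classes, 1, activation='softmax', padding='same')(outputs)
model = Model(inputs, outputs)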
To say the least, you're on the right track. The full solution would look like this:
df = pd.concat([pd.read_csv(fname, index_col=<int>, header=<int>) for fname in filenames], ignore_index=True, axis=0)
inputs = df.iloc[:, :-1]  # every column except the last (the sensor values)
labels = df.iloc[:, -1]   # the last column (the labels)
X_train, X_test, y_train, y_test = train_test_split(inputs, labels, test_size=<float>)
To add a bit more information, note how you are doing,
model_m.add(Conv1D(100, 10, activation='relu', input_shape=(TIME_PERIODS, num_sensors)))
and not
model_m.add(Conv1D(100, 10, activation='relu', padding='same', input_shape=(TIME_PERIODS, num_sensors)))
So, as you're not setting padding='same' for the convolution layers, the sequence becomes shorter and shorter as you go deeper into the model. If that's what you need, that's okay. Otherwise, set padding='same'.
For example, without same-padding you get a length of 146 by the time you reach the GlobalAveragePooling1D layer, whereas with same-padding it would be 170. It's not a major problem here, but it can easily lead to negative sizes in deeper layers.
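The sequence length at each layer can be worked out by hand. The sketch below follows the layers of model_m from the question, assuming stride-1 convolutions and MaxPooling1D with its default stride (equal to the pool size) and 'valid' padding; the helper names are made up for this illustration.

```python
def conv1d_length(length, kernel_size, padding="valid"):
    """Output length of a stride-1 Conv1D layer."""
    if padding == "same":
        return length
    return length - kernel_size + 1


def maxpool1d_length(length, pool_size):
    """Output length of MaxPooling1D with stride = pool_size and 'valid' padding."""
    return (length - pool_size) // pool_size + 1


def length_before_global_pooling(time_periods, padding):
    """Follow the sequence length through the layers of model_m."""
    length = time_periods
    length = conv1d_length(length, 10, padding)   # Conv1D(100, 10)
    length = conv1d_length(length, 10, padding)   # Conv1D(100, 10)
    length = maxpool1d_length(length, 3)          # MaxPooling1D(3)
    length = conv1d_length(length, 10, padding)   # Conv1D(160, 10)
    length = conv1d_length(length, 10, padding)   # Conv1D(160, 10)
    return length


print(length_before_global_pooling(511, "valid"))  # 146
print(length_before_global_pooling(511, "same"))   # 170
```

Each 'valid' convolution with kernel size 10 removes 9 time steps, so a short input or a deeper stack would eventually drive the length to zero or below, which is when Keras raises a negative-dimension error.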
I'm using the estimator library of TensorFlow in Python. I want to train a student network by using a pre-trained teacher. I'm facing the following issue.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_data},
    y=train_labels,
    batch_size=100,
    num_epochs=None,
    shuffle=True)
student_classifier.train(
    input_fn=train_input_fn,
    steps=20,
    hooks=None)
This code returns a generator object that is passed to a student classifier. Inside the generator, we have the inputs and labels (in batches of 100) as tensors. The problem is, I want to pass the same values to the teacher model and extract its softmax outputs. But unfortunately, the model input requires a numpy array as follows
student_classifier = tf.estimator.Estimator(
    model_fn=student_model_fn, model_dir="./models/mnist_student")
def student_model_fn(features, labels, mode):
    sess=tf.InteractiveSession()
    tf.train.start_queue_runners(sess)
    data=features['x'].eval()
    out=labels.eval()
    sess.close()
    input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])
    eval_teacher_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x":data},
        y=out,
        num_epochs=1,
        shuffle=False)
This requires x and y to be NumPy arrays, so I converted them with the ugly hack of using a session to turn the tensors into NumPy arrays. Is there a better way of doing this?
P.S. I tried tf.estimator.Estimator.get_variable_value() but it retrieves weights from the model, not the input and output
Convert the Tensor to a NumPy array using tf.make_ndarray.
tf.make_ndarray() creates a NumPy ndarray with the same shape and data as the tensor. It takes a TensorProto, so convert the tensor with tf.make_tensor_proto first.
Sample working code:
import tensorflow as tf
a = tf.constant([[1,2,3],[4,5,6]])
proto_tensor = tf.make_tensor_proto(a)
tf.make_ndarray(proto_tensor)
output:
array([[1, 2, 3],
[4, 5, 6]], dtype=int32)
# output has shape (2,3)
As we know:
Keras.layers.Embedding turns positive integers (indexes) into dense vectors of fixed size. e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
I want to know how can I see or print the dense vector output.
Or
how to see a tensor object's output?
You can take a look here : https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer
In a few words :
Create a new model from your trained model whose output is the layer you are interested in, then use the predict method.
from keras.models import Model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
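To see what kind of values such a prediction returns for an Embedding layer, it helps to remember that the layer is a lookup into a weight matrix: row i is the dense vector for index i. The NumPy sketch below imitates that lookup with a made-up matrix. The sizes and random weights are assumptions; in a real model the matrix could be read from the layer with get_weights()[0].

```python
import numpy as np

vocab_size, embedding_dim = 25, 2

# Made-up weight matrix standing in for a trained embedding table:
# row i is the dense vector for index i.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

indices = np.array([[4], [20]])        # shape (batch, sequence_length) = (2, 1)
vectors = embedding_matrix[indices]    # shape (2, 1, 2): one vector per index

print(vectors.shape)
print(vectors)
```

The output has one extra trailing axis of size embedding_dim compared with the input, which is also the shape predict returns for an Embedding layer.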
I am trying to modify Resnet50 with my custom data as follows:
X = [[1.85, 0.460,... -0.606] ... [0.229, 0.543,... 1.342]]
y = [2, 4, 0, ... 4, 2, 2]
X is a feature vector of length 2000 for 784 images. y is an array of size 784 containing the binary representation of labels.
Here is the code:
def __classifyRenet(self, X, y):
    image_input = Input(shape=(2000,1))
    num_classes = 5
    model = ResNet50(weights='imagenet',include_top=False)
    model.summary()
    last_layer = model.output
    # add a global spatial average pooling layer
    x = GlobalAveragePooling2D()(last_layer)
    # add fully-connected & dropout layers
    x = Dense(512, activation='relu',name='fc-1')(x)
    x = Dropout(0.5)(x)
    x = Dense(256, activation='relu',name='fc-2')(x)
    x = Dropout(0.5)(x)
    # a softmax layer for 5 classes
    out = Dense(num_classes, activation='softmax',name='output_layer')(x)
    # this is the model we will train
    custom_resnet_model2 = Model(inputs=model.input, outputs=out)
    custom_resnet_model2.summary()
    for layer in custom_resnet_model2.layers[:-6]:
        layer.trainable = False
    custom_resnet_model2.layers[-1].trainable
    custom_resnet_model2.compile(loss='categorical_crossentropy',
                                 optimizer='adam',metrics=['accuracy'])
    clf = custom_resnet_model2.fit(X, y,
                                   batch_size=32, epochs=32, verbose=1,
                                   validation_data=(X, y))
    return clf
I am calling the function as:
clf = self.__classifyRenet(X_train, y_train)
It is giving an error:
ValueError: Error when checking input: expected input_24 to have 4 dimensions, but got array with shape (785, 2000)
Please help. Thank you!
1. First, understand the error.
Your input does not match the input of ResNet. For ResNet, the input should have shape (n_sample, 224, 224, 3), but yours has shape (785, 2000). From your question, you have 784 images, each represented as an array of size 2000, which doesn't align with the original ResNet50 input shape of (224 x 224) no matter how you reshape it. That means you cannot use ResNet50 directly with your data. The only thing your code does is take the last layer of ResNet50 and add your own output layer to match your number of classes.
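A quick count shows why no reshape can work. The sketch below compares the 2000 values per sample with the number of values in a ResNet50 input image; the 197-pixel minimum side is taken from the docstring quoted at the end of this answer, and the helper name is made up for illustration.

```python
def values_per_image(height, width, channels=3):
    """Number of scalar values in one image of the given shape."""
    return height * width * channels


feature_length = 2000                          # values per sample in the question
default_input = values_per_image(224, 224)     # standard ResNet50 input
smallest_allowed = values_per_image(197, 197)  # minimum side per the quoted docstring

print(default_input)                       # 150528
print(smallest_allowed)                    # 116427
print(feature_length >= smallest_allowed)  # False
```

Even the smallest accepted image holds far more values than one 2000-element feature vector, so the vectors cannot be rearranged into a valid input.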
2. Then, what you can do.
If you insist on using the ResNet architecture, you will need to change the input layer rather than the output layer. You will also need to reshape your image data to make use of the convolution layers. That means you cannot keep it as a (2000,) array; it needs to be something like (height, width, channel), just as ResNet and other architectures expect. Of course, you will also need to change the output layer, as you already did, so that you are predicting your own classes. Try something like:
model = ResNet50(input_shape=image_input_shape, include_top=False, weights='imagenet')
This way, you can specify customized input image shape. You can check the github code for more information (https://github.com/keras-team/keras/blob/master/keras/applications/resnet50.py). Here's part of the docstring:
input_shape: optional shape tuple, only to be specified
if `include_top` is False (otherwise the input shape
has to be `(224, 224, 3)` (with `channels_last` data format)
or `(3, 224, 224)` (with `channels_first` data format).
It should have exactly 3 inputs channels,
and width and height should be no smaller than 197.
E.g. `(200, 200, 3)` would be one valid value.