Is it possible to reshape a 512x512 RGB image to (timestep, dim)? In other words, I am trying to convert this reshape layer, Reshape((23, 3887)), to work with 512 instead of 299. Also, is there any documentation explaining how to determine input_dim and timestep for Keras?
It seems like your problem is similar to one that I had earlier today. Look at it here: Keras functional API: Combine CNN model with a RNN to to look at sequences of images
Now, to add to the answer from the question I linked to: let number_of_images be n. In your case the original data format would be (n, 512, 512, 3). All you then need to do is decide how many images you want per sequence. Say you want sequences of 5 images and have 5000 images in total. Then reshaping to (1000, 5, 512, 512, 3) should do it. This way the model sees 1000 sequences of 5 images.
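To make that concrete, a minimal NumPy sketch (the zeros array is just a placeholder for the 5000 real images):

import numpy as np

images = np.zeros((5000, 512, 512, 3), dtype=np.float32)  # placeholder for the real data
sequences = images.reshape(1000, 5, 512, 512, 3)          # 1000 sequences of 5 images
print(sequences.shape)  # (1000, 5, 512, 512, 3)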
I have a couple of questions.
I have data of the following shape:
(32, 64, 11)
where 32 is the batch size, 64 is the sequence length and 11 is the number of features. Each sample of mine is 64x11 and has a label of 0 or 1.
I’d like to predict when a sequence has a label of “1”.
I’m trying to use a simple architecture with
conv1D → ReLU → flatten → linear → sigmoid.
For the Conv1D, since this is multivariate time series prediction and each row in my data is one second, I think the number of input channels should be the number of features, so that the layer processes all of the features concurrently. (I don't have anything spatial in my data; it doesn't matter whether a column is at index 0 or 9, the way it matters for pixels in an image.)
I can't decide how to "initialize" the Conv1D parameters. Currently I think the number of channels should be the number of features and not 1, for the reason I just explained, but I'm unsure of it.
Secondly, should the loss function be BCELoss or something else? My labels are 0 or 1, and I want the model to output the probability of belonging to the class with label 1.
Thanks a lot.
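For what it's worth, here is a minimal PyTorch sketch of the architecture described above, with the feature dimension used as Conv1d's input channels; the filter count and kernel size are placeholder choices, not recommendations:

import torch
import torch.nn as nn

class SeqClassifier(nn.Module):
    def __init__(self, n_features=11, seq_len=64, n_filters=32, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(n_features, n_filters, kernel_size)  # in_channels = number of features
        self.fc = nn.Linear(n_filters * (seq_len - kernel_size + 1), 1)

    def forward(self, x):                  # x: (batch, seq_len, features)
        x = x.permute(0, 2, 1)             # Conv1d expects (batch, channels, seq_len)
        x = torch.relu(self.conv(x))
        x = x.flatten(1)
        return torch.sigmoid(self.fc(x))   # probability of label 1

model = SeqClassifier()
criterion = nn.BCELoss()                   # pairs with the sigmoid output
probs = model(torch.randn(32, 64, 11))     # dummy batch -> shape (32, 1)

Since the labels are 0/1 and the output is a sigmoid probability, nn.BCELoss is a reasonable fit; dropping the final sigmoid and using nn.BCEWithLogitsLoss instead is the more numerically stable variant of the same objective.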
I am trying to build an image classification model with Keras that should be able to classify cups as "good condition" or "defect". This was easy to do with a single image as input. However, I now want to try feeding 6 images, one from every angle (top, bottom, side, etc.), as input. What would be the best approach for this? My initial idea was a np array of shape (6, width, height, 3), but this seemed unsuitable.
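One common approach is to stack the 6 views into a single 5D input and share one CNN across them with TimeDistributed. A hedged sketch, where the image size and layer widths are assumptions:

from tensorflow.keras import layers, models

view_cnn = models.Sequential([
    layers.InputLayer(input_shape=(128, 128, 3)),   # assumed per-view size
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.GlobalAveragePooling2D(),
])

inp = layers.Input(shape=(6, 128, 128, 3))          # one sample = 6 views
x = layers.TimeDistributed(view_cnn)(inp)           # same CNN weights applied to every view
x = layers.Flatten()(x)                             # combine the per-view features
out = layers.Dense(1, activation='sigmoid')(x)      # good condition vs. defect
model = models.Model(inp, out)
model.compile(optimizer='adam', loss='binary_crossentropy')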
In order to do multiclass segmentation the masks need to be one-hot encoded. For example, if I have 100 images of shape 224x224x3 with 5 different classes, I would have a set of masks with shape (100, 224, 224, 5), i.e. the last dimension (the channel) refers to the class of the pixel. Given a grayscale mask that contains 6 classes, where each pixel has a label 1-6, I can easily convert this to the categorical mask I need using tf.keras.utils.to_categorical.
If I use the ImageDataGenerator provided with Keras, I know I can create a generator for both images and masks and then zip them together (as the code below shows). Where I'm confused is: how do I convert the masks into this categorical one-hot-encoded structure while using the ImageDataGenerator? The ImageDataGenerator only finds files in directories that are saved as images, so I can't convert the masks and then save them as numpy arrays (the one-hot-encoded masks) for the generator to pick up, since images can't have more than 4 channels, right? Is there some way of telling the generator to do this conversion? Or does this limit the number of classes I can have in my problem?
One solution is to write my own custom generator with the Sequence class, which I have done, but I'm keen on understanding whether this is possible with Keras's inbuilt ImageDataGenerator. Could writing a Lambda layer on the network be the solution?
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

mask_categorical = tf.keras.utils.to_categorical(mask)  # converts a 224x224 grayscale mask to its one-hot-encoded version

imgDataGen = ImageDataGenerator(rescale=1/255.)
maskDataGen = ImageDataGenerator()
imageGenerator = imgDataGen.flow_from_directory("dataset/image/",
                                                class_mode=None, seed=40)
maskGenerator = maskDataGen.flow_from_directory("dataset/mask/",
                                                class_mode=None, seed=40)
trainGenerator = zip(imageGenerator, maskGenerator)
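One workaround is to keep the two ImageDataGenerators and wrap the zipped pair in a plain Python generator that one-hot encodes each mask batch on the fly. A sketch, where NUM_CLASSES is an assumption and pixel labels are assumed to run 0..NUM_CLASSES-1 (subtract 1 first if they run 1-6):

import tensorflow as tf

NUM_CLASSES = 6

def categorical_generator(image_gen, mask_gen):
    for img, mask in zip(image_gen, mask_gen):
        labels = mask[..., 0]  # mask batch is (batch, H, W, channels); keep one channel
        yield img, tf.keras.utils.to_categorical(labels, num_classes=NUM_CLASSES)

trainGenerator = categorical_generator(imageGenerator, maskGenerator)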
In TensorFlow, how do I convolve each image in a minibatch with a different 2D kernel? Each minibatch of images has size [10000, 32, 32] and the corresponding filters have size [10000, 2, 2]: 10000 kernels, each 2 pixels by 2 pixels. I'd like to get output with size [10000, 31, 31]. (I plan to set the stride lengths all to 1 and to use the "VALID" option to turn off padding, so the output images would have size 31x31 while the input images have size 32x32.)
In a related question, the solution was to add a "depth" dimension to the minibatch of images and then to use conv3d rather than conv2d. But in that problem, the OP seemed content to get just one image back as output, rather than one output image for each sample in the minibatch.
Ah, the tf.nn.depthwise_conv2d function does exactly what I wanted. I don't think there was any way to use conv2d or conv3d for the task.
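For reference, a sketch of the batch-into-channels trick that makes tf.nn.depthwise_conv2d do this (the random tensors are stand-ins for the real minibatch and kernels):

import tensorflow as tf

images = tf.random.normal([10000, 32, 32])   # stand-in minibatch
kernels = tf.random.normal([10000, 2, 2])    # one 2x2 kernel per image

x = tf.expand_dims(tf.transpose(images, [1, 2, 0]), 0)    # [1, 32, 32, 10000]: each image becomes a channel
k = tf.expand_dims(tf.transpose(kernels, [1, 2, 0]), -1)  # [2, 2, 10000, 1]: one filter per channel
y = tf.nn.depthwise_conv2d(x, k, strides=[1, 1, 1, 1], padding='VALID')  # [1, 31, 31, 10000]
out = tf.transpose(tf.squeeze(y, 0), [2, 0, 1])           # back to [10000, 31, 31]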
Newbie to Keras alert!!!
I've got some questions related to recurrent layers in Keras (over Theano).
How is the input supposed to be formatted with regard to timesteps? Say, for instance, I want a layer that will have 3 timesteps: one in the future, one in the past and one current. I see some answers and the API proposing padding and using the embedding layer, or shaping the input using a time window (3 in this case). In any case I can't make heads or tails of the API, and SimpleRNN examples are scarce and don't seem to agree.
How would the input time window formatting work with a masking layer?
Some related answers propose performing masking with an embedding layer. What does masking have to do with embedding layers anyway? Aren't embedding layers basically one-hot word embeddings? (My application would use phonemes or characters as input.)
I can start an answer, but this question is very broad, so I would appreciate suggestions for improving my answer.
Keras SimpleRNN expects an input of size (num_training_examples, num_timesteps, num_features).
For example, suppose I have sequences of counts of numbers of cars driving by an intersection per hour (small example just to illustrate):
import numpy as np
X = np.array([[10, 14, 2, 5], [12, 15, 1, 4], [13, 10, 0, 0]])
Aside: Notice that I was taking observations over four hours, and the last two hours had no cars driving by. That's an example of zero-padding the input, which means making all of the sequences the same length by adding 0s to the end of shorter sequences to match the length of the longest sequence.
Keras would expect the following input shape: (X.shape[0], X.shape[1], 1), which means I could do this:
X_train = np.reshape(X, (X.shape[0], X.shape[1], 1))
And then I could feed that into the RNN:
from keras.models import Sequential
from keras.layers import SimpleRNN
model = Sequential()
model.add(SimpleRNN(units=10, activation='relu', input_shape=(X_train.shape[1], X_train.shape[2])))
You'd add more layers, or add regularization, etc., depending on the nature of your task.
For your specific application, I would think you would need to reshape your input to have 3 elements per row (last time step, current, next).
I don't know much about the masking layers, but here is a good place to start.
As far as I know, embedding is independent of masking, but you can mask an embedding.
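For concreteness, here is a minimal sketch of both options (shapes and vocabulary size are assumed):

from keras.models import Sequential
from keras.layers import Masking, SimpleRNN, Embedding

# Option 1: real-valued, zero-padded sequences -> explicit Masking layer.
model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(4, 1)))  # skip all-zero timesteps
model.add(SimpleRNN(units=10))

# Option 2: integer token sequences (characters/phonemes) -> let the
# Embedding layer produce the mask, reserving index 0 for padding.
model2 = Sequential()
model2.add(Embedding(input_dim=50, output_dim=16, mask_zero=True))
model2.add(SimpleRNN(units=10))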
Hope that provides a good starting point!