I performed data augmentation for images with two channels. My data set is formatted in the shape (image_numbers, image_height, image_width, image_channels), where image_channels = 2.
While performing data augmentation with a datagen created by ImageDataGenerator, a UserWarning is raised:
UserWarning: NumpyArrayIterator is set to use the data format convention
"channels_last" (channels on axis 3),
i.e. expected either 1, 3 or 4 channels on axis 3.
However, it was passed an array with shape (1, 150, 150, 2) (2 channels).
Does the warning imply that the data augmentation was unsuccessful, or that it was only performed on one-channel images? If so, how can I perform data augmentation on two-channel images directly (rather than augmenting one channel at a time and then concatenating)?
It means the generator doesn't expect two-channel images; that channel count is non-standard.
The standard images are:
1 channel: grayscale
3 channels: RGB
4 channels: RGBA
Since it's a warning, we don't really know what's going on.
Check the outputs of this generator yourself.
x, y = theGenerator[someIndex]
Plot x[0] and others.
In case the generated images aren't good, you can do the augmentations yourself using a Python generator or a custom keras.utils.Sequence, as sketched below.
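For the keras.utils.Sequence route, here is a minimal sketch, assuming a simple random horizontal flip is enough augmentation for your case; the class name TwoChannelAugmenter and its arguments are just illustrative:
import numpy as np
import keras

class TwoChannelAugmenter(keras.utils.Sequence):
    """Illustrative Sequence that augments (H, W, 2) images with random flips."""
    def __init__(self, images, labels, batch_size=32):
        self.images = images          # shape (N, H, W, 2)
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, idx):
        x = self.images[idx * self.batch_size:(idx + 1) * self.batch_size].copy()
        y = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
        # Flip a random subset of the batch horizontally; both channels are
        # flipped together, so they stay aligned.
        flip = np.random.rand(len(x)) < 0.5
        x[flip] = x[flip, :, ::-1, :]
        return x, y
You can pass an instance of this to model.fit (or fit_generator on older Keras versions) in place of the ImageDataGenerator flow.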
I have an image gradient of size (3, 224, 224) and a patch of size (1, 768). Is it possible to add this gradient to the patch while keeping the patch's size of (1, 768)?
Forgive my inquisitiveness. I know PyTorch also utilizes broadcasting, and I am not sure whether I will be able to do so with two tensors of different shapes in a way similar to the line below:
torch.add(a, b)
For example:
The end product would be the same patch on the left with the gradient of an entire image on the right added to it. My understanding is that it’s not possible, but knowledge isn’t bounded.
No. Whether two tensors are broadcastable is defined by the following rules:
Each tensor has at least one dimension.
When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.
Because the second bullet doesn't hold in your example (i.e., 768 != 224, 1 not in {224, 768}), you can't broadcast the add. If you have some meaningful way to reshape your gradients, you might be able to.
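To see the rules in action, here is a small sketch with made-up shapes, showing one pair that broadcasts and your (3, 224, 224) + (1, 768) pair that doesn't:
import torch

a = torch.randn(3, 224, 224)
b = torch.randn(1, 224)        # trailing dims: 224 == 224, then 1 broadcasts, then 3 vs nothing
print(torch.add(a, b).shape)   # torch.Size([3, 224, 224])

c = torch.randn(1, 768)
try:
    torch.add(a, c)            # trailing dims 224 vs 768: the second rule fails
except RuntimeError as err:
    print(err)                 # prints a size-mismatch error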
I figured out how to do it myself. I divided the image gradient (right) into 16 x 16 patches and created a loop that adds each patch to the original image patch (left). This way, I was able to add a 224 x 224 image gradient to a 16 x 16 patch. I just wanted to see what would happen if I did that.
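In code, that loop could look something like the sketch below, assuming the (1, 768) patch is a flattened (3, 16, 16) block and that accumulating every gradient block onto it is the intended behaviour:
import torch

grad = torch.randn(3, 224, 224)   # image gradient
patch = torch.randn(1, 768)       # flattened (3, 16, 16) patch

result = patch.clone()
for i in range(0, 224, 16):
    for j in range(0, 224, 16):
        block = grad[:, i:i + 16, j:j + 16]   # one (3, 16, 16) block of the gradient
        result += block.reshape(1, 768)       # add it to the flattened patch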
The 3D CNN works with video, MRI, and scan datasets. Can you tell me: if I have to feed the input (video) to the proposed 3D CNN network and train its weights, how can I do that? A 3D CNN expects 5-dimensional input:
[batch size, channels, depth, height, width]
How can I extract the depth from the videos?
If I have 10 videos of 10 different classes, the duration of each video is 6 seconds, and I extract 2 frames per second, that gives 12 frames per video.
The RGB videos are 112x112 --> Height = 112, Width = 112, and Channels = 3.
If I keep the batch size equal to 2:
1 video --> 6 seconds --> 12 frames (1 sec == 2 frames) [each frame (3, 112, 112)]
10 videos (10 classes) --> 60 seconds --> 120 frames
So the 5 dimensions will be something like this; [2, 3, 12, 112, 112]
2 --> Two videos will be processed for each batch size.
3 --> RGB channels
12 --> each video contains 12 frames
112 --> Height of each video
112 --> Width of each video
Am I right?
Yes, that seems to make sense if you're looking to use a 3D CNN. You're essentially adding a temporal dimension to your input, and it is logical to use the depth dimension for it. This way you keep the channel axis as the feature channel (i.e., not a spatio-temporal dimension).
Keep in mind that 3D CNNs are really memory intensive. There exist other methods for working with temporally dependent input; here you are not really dealing with a third 'spatial' dimension, so you're not required to use a 3D CNN.
Edit:
If I give input of the above dimensions to the 3D CNN, will it learn both kinds of features (spatial and temporal)? [...] Can you help me understand spatial and temporal features?
If you use a 3D CNN then your filters will have a 3D kernel, and the convolution will be three-dimensional: along the two spatial dimensions (width and height) as well as the depth dimension (here corresponding to the temporal dimension, since you're using the depth axis for the sequence of video frames). A 3D CNN will allow you to capture local spatial and temporal information ('local' because the receptive field is limited by the kernel sizes and the overall number of layers in the CNN).
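As a concrete (if arbitrary) sketch in PyTorch, a small 3D CNN applied to a dummy batch of the shape discussed above could look like this:
import torch
import torch.nn as nn

# Dummy batch: [batch, channels, depth (frames), height, width]
x = torch.randn(2, 3, 12, 112, 112)

model = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=3, padding=1),   # 3D kernel slides over time and space
    nn.ReLU(),
    nn.MaxPool3d(2),                              # halves depth, height and width
    nn.Conv3d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(32, 10),                            # 10 classes, as in the example
)

print(model(x).shape)   # torch.Size([2, 10])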
So I am writing a custom dataset for medical images in .nii (NIfTI-1) format, but there is some confusion.
My dataloader returns the shape torch.Size([1, 1, 256, 256, 51]). But NIfTI volumes use anatomical axes, a different coordinate system, so it doesn't seem to make sense to permute the axes, which I would normally do with a volume built from 51 separately stored 2D slice images on my local drive, since Conv3d follows the (N, C, D, H, W) convention.
So torch.Size([1, 1, 256, 256, 51]) (ordinarily 51 would be the depth) doesn't follow the (N, C, D, H, W) convention, but should I not permute the axes because the data uses an entirely different coordinate system?
In PyTorch's 3D convolution layer, the naming of the three dimensions you convolve over is not really important (e.g., the layer doesn't treat depth differently from height). All the difference comes from the kernel_size argument (and also padding, if you use it). If you permute the dimensions and correspondingly permute the kernel_size entries, nothing will really change. So you can either permute your input's dimensions using e.g. x.permute(0, 1, 4, 2, 3), or continue using your initial tensor with depth as the last dimension.
Just to clarify: if you wanted to use kernel_size=(2, 10, 10) on your DxHxW image, you can instead use kernel_size=(10, 10, 2) on your HxWxD image. If you want all your code to explicitly assume that the dimension order is always D, H, W, then you can create a tensor with permuted dimensions using x.permute(0, 1, 4, 2, 3).
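A quick sketch of the two equivalent options with dummy data (the number of filters is arbitrary):
import torch
import torch.nn as nn

x = torch.randn(1, 1, 256, 256, 51)   # (N, C, H, W, D) as returned by the dataloader

# Option 1: keep depth last and order kernel_size as (H, W, D).
conv_hwd = nn.Conv3d(1, 8, kernel_size=(10, 10, 2))
print(conv_hwd(x).shape)               # torch.Size([1, 8, 247, 247, 50])

# Option 2: permute to the conventional (N, C, D, H, W) layout and flip kernel_size.
x_dhw = x.permute(0, 1, 4, 2, 3)       # torch.Size([1, 1, 51, 256, 256])
conv_dhw = nn.Conv3d(1, 8, kernel_size=(2, 10, 10))
print(conv_dhw(x_dhw).shape)           # torch.Size([1, 8, 50, 247, 247])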
Let me know if I somehow misunderstand the problem you have.
My dataset consists mostly of 3-channel images, but I also have a few 1-channel images. Is it possible to train a network that takes both 3-channel and 1-channel images as input?
Any suggestions are welcome. Thanks in advance.
You can detect the grayscale images by checking the number of channels and apply a transformation to give them 3 channels.
It seems better to convert images from grayscale to RGB than to simply copy the single channel three times.
You can do that with cv2.cvtColor(gray_img, cv2.COLOR_GRAY2RGB) if you have opencv-python installed.
If you want a clean implementation, you can extend torchvision.transforms with a new Transform that does this job automatically.
Load your images and convert them to RGB:
from PIL import Image
image = Image.open(path).convert('RGB')
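If you go the torchvision route, a minimal sketch of such a transform could look like this (the class name ToRGB is made up; it just wraps the PIL conversion above):
from torchvision import transforms

class ToRGB:
    """Illustrative transform: force any PIL image (grayscale or not) to 3-channel RGB."""
    def __call__(self, img):
        return img.convert('RGB')

transform = transforms.Compose([
    ToRGB(),                        # 1-channel and 3-channel inputs both come out with 3 channels
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])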
I want to use an image sequence to predict 1 output.
training data:
[(x_img1, y1), (x_img2, y2), ..., (x_img10, y10)]
Color image dimension:
(100, 120, 3)
Output dimension: (1)
Model implemented in Keras:
img_sequence_length = 3
model = Sequential()
model.add(TimeDistributed(Convolution2D(24, 5, 5, subsample=(2, 2), border_mode="same", activation='relu', name='conv1'),
                          input_shape=(img_sequence_length, 100, 120, 3)))
….
model.add(LSTM(64, return_sequences=True, name='lstm_1'))
model.add(LSTM(10, return_sequences=False, name='lstm_2'))
model.add(Dense(256))
model.add(Dense(1, name='output'))
The batch should be:
A)
[ [(x_img1, y1), (x_img2, y2), (x_img3, y3)],
[(x_img2, y2), (x_img3, y3), (x_img4, y4)],
…
]
Or
B)
[ [(x_img1, y1), (x_img2, y2), (x_img3, y3)],
[(x_img4, y4), (x_img5, y5), (x_img6, y6)],
…
]
Why?
This choice really depends on what you want to achieve. Understanding what your data is totally influences the decision. (Not only the shape and type of the data, but what it means and what you want from it. Is it a video? Many videos? Do I want the name of the character in a little segment of a video? Or to know the state of the plot continuously along the video?)
In option A:
This option is used when all your images form a single long sequence and you want to predict the next element in the sequence by knowing a specific number of previous images.
Each group of 3 images in that batch is completely independent.
The layer doesn't keep a memory between them, and the actual memory is length 3.
The simulation of a long sequence happens because you are repeating images in each batch, like a sliding window. But there is no connection or memory transfer from one group to another.
You use this if the sequence has any logical possibility of being predicted from 3 images.
Imagine you have one long video, but you watch only 3 seconds of it and try to deduce something from those 3 seconds. Then your memory is completely washed away before you watch another 3 seconds. When you watch these 3 new seconds, you will not be able to remember what you watched before, and you will not be able to say you watched 4 seconds. All you learn will be confined to 3-second segments.
In option B:
In this option, each group of 3 images has absolutely no connection at all to the others. You can use this as if every group of 3 images were a different sequence (not belonging to a long sequence).
Imagine you have lots of videos, and they are about different things. One is Titanic, another is The Avengers, and so on.
This batch may be used for a case similar to the one proposed in A, but your sliding window would have a step of 3. This would make learning faster, but the model would also learn less.
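A small sketch of how the two batching schemes differ when you build the windows, assuming the frames of one long sequence sit in a list called frames (the helper name is made up):
def make_windows(frames, seq_len=3, step=1):
    # Slide a window of seq_len frames over the sequence with the given step.
    return [frames[i:i + seq_len] for i in range(0, len(frames) - seq_len + 1, step)]

frames = list(range(10))                  # stand-ins for x_img1 ... x_img10

option_a = make_windows(frames, step=1)   # overlapping windows: [0,1,2], [1,2,3], ...
option_b = make_windows(frames, step=3)   # disjoint windows:    [0,1,2], [3,4,5], ...

print(len(option_a), len(option_b))       # 8 windows vs 3 windows (the 10th frame is dropped)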
Other options:
You can take a look at this question, its answer and the comments to have more ideas.
Some hints on splitting the data:
First, input and output data must be separate:
X = [item[0] for item in training_data]
Y = [item[1] for item in training_data]
Then you must separate the sequences properly.
As you defined in the input_shape, X must follow the same shape.
X.shape must be (numberOfSequences, img_sequence_length, 100, 120, 3)
So if it's a list of images, you must make sure that every image is a numpy array (transform them if necessary), and that you will later convert X to numpy:
X = np.asarray(X_with_numpy_images)
And if you have only one Y for each sequence, you may have it shaped as:
Y.shape must be (numberOfSequences,1)
You would probably build it by taking values in steps of 3:
Y = [Y[(i+1)*3 - 1] for i in range(numberOfSequences)]
Now it's important to understand if each sequence of 3 images is independent of the other sequences, or if you have just one huge sequence divided in small parts.
In case one, use LSTM(..., stateful=False); in case two, use LSTM(..., stateful=True).
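For reference, a sketch of what the stateful variant could look like (assuming the same Keras version as in the question; stateful layers require a fixed batch size, and the sizes here are placeholders):
from keras.models import Sequential
from keras.layers import LSTM, Dense

features = 100 * 120 * 3    # flattened frame size, as in the Reshape suggestion below

model = Sequential()
# stateful=True keeps the hidden state across batches, so consecutive batches must
# contain consecutive chunks of the same long sequence (the "one huge sequence" case).
model.add(LSTM(64, stateful=True, batch_input_shape=(2, 3, features)))
model.add(Dense(1, name='output'))

# Reset the state manually once you reach the end of the long sequence:
model.reset_states()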
And you will also probably need to reshape the tensors properly in the transition from the convolutional layers to the LSTM layers, because LSTM will require inputs shaped as (NumberOfSequences, SequenceLength, Features)
A suggestion is to use reshape layers:
model.add(Reshape((img_sequence_length,100*120*3)))
#of course the last dimension may be different if you don't use `padding='same'` in the convolutions or if you use pooling.
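Putting the pieces together, here is a sketch of the whole model using TimeDistributed(Flatten()) in place of the Reshape layer (it has the same effect of flattening each frame's features before the LSTM) and the newer Conv2D arguments (strides/padding instead of subsample/border_mode); filter counts are taken from the question where given and arbitrary otherwise:
from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, Flatten, LSTM, Dense

img_sequence_length = 3

model = Sequential()
model.add(TimeDistributed(Conv2D(24, (5, 5), strides=(2, 2), padding='same',
                                 activation='relu', name='conv1'),
                          input_shape=(img_sequence_length, 100, 120, 3)))
model.add(TimeDistributed(Flatten()))   # gives (batch, 3, features) for the LSTMs
model.add(LSTM(64, return_sequences=True, name='lstm_1'))
model.add(LSTM(10, return_sequences=False, name='lstm_2'))
model.add(Dense(256))
model.add(Dense(1, name='output'))
model.compile(optimizer='adam', loss='mse')

model.summary()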