In a normal ANN, each training sample is represented by a row of a matrix, so a batch of training data can be processed together. How are multiple images processed in a CNN?
Same as with an ANN: you can stack the images into an n-dimensional tensor and process them together.
For CNNs that are trained on images, for example, say your dataset consists of RGB (3-channel) images that are 256x256 pixels. A single image can be represented by a 3 x 256 x 256 tensor. If you set your batch size to 10, that means you’re stacking 10 images together into a 10 x 3 x 256 x 256 tensor.
Tuning the batch size is one of the aspects of getting training right - if your batch size is too small, there will be a lot of variance within a batch, and your training loss curve will bounce around a lot. But if it’s too large, your GPU will run out of memory trying to hold it, or training will progress too slowly for you to see whether the optimization is diverging early on.
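The stacking described above can be sketched with NumPy (the image contents here are placeholders, and channels-first layout is assumed):

```python
import numpy as np

# Ten hypothetical RGB images, each stored channels-first as 3 x 256 x 256.
images = [np.zeros((3, 256, 256), dtype=np.float32) for _ in range(10)]

# Stacking along a new leading axis produces the batch tensor described
# above: batch_size x channels x height x width.
batch = np.stack(images, axis=0)
print(batch.shape)  # (10, 3, 256, 256)
```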
Related
I recently trained a GAN with ~1000 real images and 64 fake images. This obviously wasn't enough fake images, so most of the "latent space" in the GAN just creates the same image. How many fake images are GANs usually trained with in order to make their latent space usable?
The number of fake images you generate depends on how many noise vectors you sample. Ideally, in a single training step, you would sample a noise tensor of dimension batch_size x noise_dim and use the generator to produce batch_size fake images. Similarly, your discriminator would see batch_size real images.
This way, at every step your discriminator sees an equal number of fake and real images, and over one epoch the total number of fake images seen by your discriminator equals the total number of images in your training set.
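A minimal sketch of the data shapes in one such step, with a stand-in generator (the sizes and the `generator` function are hypothetical, not from any particular GAN implementation):

```python
import numpy as np

batch_size, noise_dim = 64, 100  # hypothetical sizes

def generator(z):
    # Stand-in for a real generator network: maps noise to flat fake "images".
    return np.tanh(z @ np.random.randn(noise_dim, 28 * 28))

# One training step: sample a fresh noise batch and produce batch_size fakes,
# while the discriminator also sees batch_size real images.
z = np.random.randn(batch_size, noise_dim)
fake_images = generator(z)
real_images = np.random.rand(batch_size, 28 * 28)  # placeholder real batch

# Fake and real counts match at every step.
assert fake_images.shape[0] == real_images.shape[0] == batch_size
```

Because a fresh noise batch is drawn at every step, the generator's output set is effectively unlimited; there is no fixed pool of 64 fakes.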
I'm having some difficulty grasping the input_shape for an LSTM layer in Keras. Assume that is the first layer in the network; it takes input of the form (batch, time, features). Also assume there is only one feature, so the input is of the form (batch, time, 1).
Is the number "batch" the batch size or the number of batches? I assume it's the batch size from the examples I've seen online. Then I'm struggling to see how the number of batches isn't always one.
As a concrete example, I have a time series of 1000 steps, which I split into 10 series of 100 steps. One epoch is when the network has gone through all 1000 steps, i.e. the 10 series. I should be free to split the 10 series into batches of different sizes, but then the input would be of the form (number of batches, batch size, time steps, 1). What am I misunderstanding?
I am new to CNNs and am building a model using Keras to combine inputs from multiple sources. Two of my sources have different dimensions and cannot be scaled by an integer number (i.e., x2 or x3 smaller). Therefore, simply max-pooling will not work. I am having trouble figuring out how to downsample the larger image. Here are the exact dimensions:
Image1: 7000 x 4000
Image2: 2607 x 1370
Is there a best practice for dealing with non-conventional downsampling?
I am applying a Conv2D layer and am thinking that combining an appropriately sized filter (1787x1261 with stride=1) with max pooling (2x2 and stride=2) would give me the correct dimensions. Any reason why that is a bad idea? It does seem like a large filter compared to the total size of the image.
Somewhat related, would it be better to run the model on smaller chunks of the full image? That way I could control the size of each chunk?
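The arithmetic behind the proposed filter size can be checked with the standard output-size formulas for "valid" (unpadded) convolution and pooling:

```python
def conv_out(size, kernel, stride=1):
    # Output length of a "valid" (no padding) convolution along one axis.
    return (size - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    # Output length of a max-pooling layer along one axis.
    return (size - window) // stride + 1

# Conv2D with a 1787x1261 filter, stride 1, then 2x2 max pooling, stride 2:
h = pool_out(conv_out(7000, 1787))
w = pool_out(conv_out(4000, 1261))
print(h, w)  # 2607 1370
```

The dimensions do work out to 2607 x 1370, so the arithmetic in the question is sound; whether such a large filter is statistically sensible is a separate question.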
Usually, the input to a convolutional neural network (CNN) is described by an image with a given width*height*channels. Will the number of input nodes differ for different batch sizes as well? That is, does the number of input nodes become batch_size*width*height*channels?
From this excellent answer: batch size defines the number of samples that will be propagated through the network. Batch size has no effect on the network's architecture, including the number of inputs.
Let's say you have 1,000 RGB images with size 32x32, and you set the batch size to be 100. Your convolutional network should have an input shape of 32x32x3. To train the network the training algorithm picks a sample of 100 images out of the total 1,000, and trains the network on each individual image in that subset. That is your 'batch'. The network architecture doesn't care whether your subset (batch) has 100, 200 or 1,000 images, it only cares about the shape of a single image, because that's all it sees at a time. When the network has been trained on all 100 images it has completed one iteration, and the network parameters are updated; one epoch is completed once the network has seen all 1,000 images.
The training algorithm will pick a different batch of images for each iteration, but the above always holds true: the network only sees a single image at a time, so the image shape must match the input layer shape, regardless of how many images are in that particular batch.
As to why we have batches instead of just training on the entire set (i.e. setting the batch size to 100% of the data and updating once per epoch): mini-batches fit in GPU memory and keep the hardware busy, which makes training much faster, and compared with updating after every single sample, averaging the gradient over a batch means fewer, smoother parameter updates and more reliable convergence.
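The batch-size-independence of the architecture can be illustrated with a toy fully connected layer: the weight shape is fixed by the per-image input size, yet the same weights handle any batch size (the hidden size of 16 is an arbitrary illustration):

```python
import numpy as np

input_size = 32 * 32 * 3  # per-image input, fixed by the architecture
hidden_size = 16          # arbitrary hidden width for illustration
weights = np.random.randn(input_size, hidden_size)  # shape ignores batch size

for batch_size in (100, 200, 1000):
    batch = np.random.rand(batch_size, input_size)  # flattened image batch
    out = batch @ weights  # works for any batch size; weights are unchanged
    print(out.shape)       # (batch_size, 16)
```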
I want to feed images to a Keras CNN. The program randomly feeds either an image downloaded from the net, or an image of random pixel values. How do I set batch size and epoch number? My training data is essentially infinite.
Even if your dataset is infinite, you have to set both batch size and number of epochs.
For batch size, you can use the largest batch size that fits into your GPU/CPU RAM, found by simple trial and error. For example, you can try power-of-two batch sizes like 32, 64, 128, 256.
For the number of epochs, this is a parameter that always has to be tuned for the specific problem. You can use a validation set and train until the validation loss stops decreasing, or until the training loss is almost constant (it has converged). Make sure to use a different part of the dataset to decide when to stop training. Then you can report final metrics on yet another separate set (the test set).
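A minimal sketch of this stopping rule, assuming a hypothetical per-epoch validation-loss sequence and a patience of 3 epochs (both made up for illustration):

```python
def train_until_converged(val_losses, patience=3):
    # Stop once validation loss has not improved for `patience` epochs;
    # return the epoch with the lowest validation loss seen.
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return best_epoch

# Hypothetical validation-loss curve: improves for a while, then plateaus.
stop = train_until_converged([1.0, 0.8, 0.6, 0.55, 0.56, 0.57, 0.58])
print(stop)  # 3
```

Framework callbacks (e.g. early-stopping utilities in Keras) implement the same idea without a fixed epoch count chosen up front.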
It is because implementations are vectorised for faster, more efficient execution. When the dataset is large, all of it cannot fit in memory, so we use a batch size that still lets us benefit from vectorisation.
In my opinion, one should use a batch size as large as your machine can handle.
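The vectorisation point can be made concrete: a per-sample Python loop and a single batched matrix product compute the same result, but the batched form hands the whole computation to an optimised BLAS backend at once (the array sizes here are arbitrary):

```python
import numpy as np

x = np.random.rand(1000, 64)  # a batch of 1000 samples, 64 features each
w = np.random.rand(64)        # a single weight vector

# Loop version: one sample at a time, interpreted Python per iteration.
loop_result = np.array([sample @ w for sample in x])

# Vectorised version: the whole batch in one matrix-vector product.
vec_result = x @ w

# Same numbers, but the vectorised form is what makes batching fast.
assert np.allclose(loop_result, vec_result)
```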