When we implement YOLOv2 in darknet, after every 10 epochs, the image size is changed. How is this happening? - conv-neural-network

In YOLOv2, after every 10 epochs, the network randomly chooses a new image size. How does this happen in darknet? I'm using Ubuntu 18.04.

I think the network size is changed every 10 iterations (not epochs).
In your cfg file, check the random flag.
random = 1 means YOLO changes the network size every 10 iterations. This helps precision, because the network is trained at different resolutions.
According to the YOLOv2 paper:
However, since our model only uses convolutional and pooling layers it
can be resized on the fly. We want YOLOv2 to be robust to running on
images of different sizes so we train this into the model. Instead of
fixing the input image size we change the network every few
iterations. Every 10 batches our network randomly chooses a new image
dimension size. Since our model downsamples by a factor of 32, we pull
from the following multiples of 32: {320, 352, ..., 608}. Thus the
smallest option is 320 × 320 and the largest is 608 × 608. We resize
the network to that dimension and continue training.
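To make the quoted schedule concrete, here is a minimal Python sketch of it. This is not darknet's actual code; the function name multiscale_schedule and its parameters are made up for illustration, and only the size set and the "every 10 batches" interval come from the paper excerpt above.

```python
import random

# Multiples of 32 from 320 to 608, as listed in the quoted passage.
SIZES = list(range(320, 608 + 1, 32))

def multiscale_schedule(num_batches, resize_every=10, seed=None):
    """Yield (batch_index, input_size) pairs.

    A new square input size is drawn at random every `resize_every`
    batches (hypothetical helper, not part of darknet).
    """
    rng = random.Random(seed)
    size = None
    for i in range(num_batches):
        if i % resize_every == 0:
            size = rng.choice(SIZES)  # pick a new resolution for the next block
        yield i, size

schedule = list(multiscale_schedule(30, seed=0))
for i, size in schedule[::10]:
    print(f"batches {i}-{i + 9}: {size} x {size}")
```

Within each block of 10 batches the size stays fixed, so the network is only resized once per block.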

Related

On batch size, epochs, and learning rate of DistributedDataParallel

I have read these threads [1] [2] [3] [4], and this article.
I think I got how batch size and epochs works with DDP, but I am not sure about the learning rate.
Let's say I have a dataset of 100 * 8 images. In a non-distributed scenario, I set the batch size to 8, so each epoch will do 100 gradient steps.
Now I am in a multi-node multi-gpu scenario, with 2 nodes and 4 GPUs (so world size is 8).
I understand that I need to pass batches of 8 / 8 = 1, because each update will aggregate the gradients from the 8 GPUs. In each worker, the data loader will still load 100 batches, but each with 1 sample. So the whole dataset is processed exactly once per epoch.
I checked, and this seems to be the case.
But what about the learning rate?
According to the official docs:
When a model is trained on M nodes with batch=N, the gradient will be
M times smaller when compared to the same model trained on a single
node with batch=M*N if the loss is summed (NOT averaged as usual)
across instances in a batch (because the gradients between different
nodes are averaged). [...] But in most cases, you can just treat a
DistributedDataParallel wrapped model, a DataParallel wrapped model
and an ordinary model on a single GPU as the same (E.g. using the same
learning rate for equivalent batch size).
I understand that the gradients are averaged, so if the loss is averaged over samples nothing changes, while if it is summed we need to account for that. But does 'nodes' refer to the total number of GPUs across all cluster nodes (world size) or just the cluster nodes? In my example, would M be 2 or 8? Some posts in the threads I linked say that the gradient is divided 'by the number of GPUs'. How exactly is the gradient aggregated?
Please refer to the following discussion:
https://github.com/PyTorchLightning/pytorch-lightning/discussions/3706
"As far as I know, learning rate is scaled with the batch size so that the sample variance of the gradients is kept approx. constant.
Since DDP averages the gradients from all the devices, I think the LR should be scaled in proportion to the effective batch size, namely, batch_size * num_accumulated_batches * num_gpus * num_nodes
In this case, assuming batch_size=512, num_accumulated_batches=1, num_gpus=2 and num_nodes=1, the effective batch size is 1024, thus the LR should be scaled by sqrt(2), compared to a single GPU with effective batch size 512."
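The arithmetic in that quote can be sketched in a few lines of Python. The function names are made up for illustration, num_gpus is read here as "GPUs per node" (an assumption), and the square-root rule is the quoted poster's suggestion rather than an official PyTorch recommendation.

```python
import math

def effective_batch_size(batch_size, num_accumulated_batches, num_gpus, num_nodes):
    """Samples contributing to one optimizer step (num_gpus = GPUs per node, assumed)."""
    return batch_size * num_accumulated_batches * num_gpus * num_nodes

def sqrt_lr_scale(effective, reference):
    """LR multiplier under the square-root rule suggested in the quote."""
    return math.sqrt(effective / reference)

# Example from the quoted discussion: 512 per GPU, 2 GPUs, 1 node.
eff = effective_batch_size(512, 1, 2, 1)
scale = sqrt_lr_scale(eff, reference=512)
print(eff, scale)

# Setup from the question: 1 sample per GPU, 4 GPUs per node, 2 nodes.
eff_q = effective_batch_size(1, 1, 4, 2)
print(eff_q, sqrt_lr_scale(eff_q, reference=8))
```

For the setup in the question, the effective batch size is 8, the same as the single-GPU run with batch size 8, so under this rule the learning rate would stay unchanged.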

How to run inference on a high-resolution image (1000×1000) with U-Net

I have trained an image segmentation model with a basic U-Net. When testing the model, I noticed that images with a resolution of 256 by 256 produced very good results.
But at 512 by 512 or 1000 by 1000, the model produced bad results for the same image.
How can I handle this scenario?
I have tried cv2.resize(), but the mask degrades as I increase its size from 256×256 to 1000×1000.
So how can I tackle this problem?

Is there a PyTorch with CUDA Unified GPU-CPU Memory fork?

Training a DNN model can be a pain when a batch of one image takes 15 GB. Speed is not so important for me, but fitting bigger batches (and models) is. So I wonder if there is a PyTorch fork with CUDA Unified Memory, or something like that, to fit giant models (with 16 GB of RAM per GPU but 250 GB on the CPU side, it seems quite reasonable)?
If you do not care about the time it takes but need large batches, you can use a slower approach. Say you need a batch of 128 samples but your GPU memory fits only 8 samples. You can create smaller batches of 8 samples and then average their gradients.
For each small batch of 8 samples that you evaluate, you keep the .grad of each parameter in your CPU memory. You keep a list of grads for each of your model's parameters. After you have gathered the grads for 16 batches of 8 samples (128 samples in total), you can average the gradients of each parameter and put the result back into the .grad attribute of each parameter.
You can then call the .step() of your optimizer. This should yield the same update as a single batch of 128 samples, up to floating-point error, as long as the loss is averaged within each batch and the model has no layers whose output depends on the batch composition (batch normalization, for example).
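As a sanity check of the averaging argument, here is a small sketch in plain Python (deliberately not PyTorch, so it runs anywhere). The one-parameter linear model and the data are made up for illustration; the point is only that the mean of 16 micro-batch gradients matches the gradient of the full batch of 128.

```python
import random

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(128)]   # toy inputs
ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]  # toy targets
w = 0.5                                         # current parameter value

# Gradient computed on the full batch of 128 samples.
full_grad = mse_grad(w, xs, ys)

# Gradients of 16 micro-batches of 8 samples, then averaged.
micro = 8
micro_grads = [
    mse_grad(w, xs[i:i + micro], ys[i:i + micro])
    for i in range(0, len(xs), micro)
]
accumulated_grad = sum(micro_grads) / len(micro_grads)

print(full_grad, accumulated_grad)
```

In PyTorch itself you usually do not need to copy gradients to the CPU: calling backward() on loss / 16 for each micro-batch accumulates into .grad, and you call optimizer.step() followed by optimizer.zero_grad() after the 16th micro-batch.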

What are the differences between the batch parameter in yolov3.cfg and the batch_size parameter in keras.fit()?

What are the differences between the batch parameter in the yolov3.cfg file and the batch_size parameter in keras.fit()? How should I set them?
There is no difference: batch size means how many images (samples) are in a mini-batch during training. For YOLO, the batch size is usually 1 at inference time.
How would you set it?
Go as high as you can without running out of GPU memory.
Batch size is a term used in machine learning and refers to the number of training examples used in one iteration.
Traditionally, batch_size is chosen as a power of 2: 8, 16, 32, 64. Usually, the higher the batch size, the faster the convergence.

What's the difference between "samples_per_epoch" and "steps_per_epoch" in fit_generator

I have been confused by this problem for several days...
My question is why the training time differs so massively between setting batch_size to 1 and to 20 for my generator.
If I set batch_size to 1, the training time for 1 epoch is approximately 180 ~ 200 sec.
If I set batch_size to 20, the training time for 1 epoch is approximately 3000 ~ 3200 sec.
This huge difference seems abnormal, since I would expect the reverse:
batch_size = 1, training time -> 3000 ~ 3200 sec.
batch_size = 20, training time -> 180 ~ 200 sec.
The input to my generator is not a file path, but numpy arrays that are already loaded into memory by calling "np.load()".
So I don't think I/O is the bottleneck.
I'm using Keras-2.0.3 and my backend is tensorflow-gpu 1.0.1
I have seen the update in this merged PR,
but that change does not seem to affect anything (the usage is the same as before).
The link here is a gist of my custom generator and the relevant part of my fit_generator call.
When you use fit_generator, the number of samples processed in each epoch is batch_size * steps_per_epoch. From the Keras documentation for fit_generator: https://keras.io/models/sequential/
steps_per_epoch: Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples of your dataset divided by the batch size.
This is different from the behaviour of 'fit', where increasing batch_size typically speeds up things.
In conclusion, when you increase batch_size with fit_generator, you should decrease steps_per_epoch by the same factor if you want the training time to stay the same or go down.
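The effect is easy to see with a little arithmetic. In this sketch the value steps_per_epoch = 1000 is hypothetical (the question does not state it); only the batch sizes 1 and 20 come from the question.

```python
def samples_per_epoch(batch_size, steps_per_epoch):
    """Samples the generator must yield before fit_generator ends an epoch."""
    return batch_size * steps_per_epoch

steps = 1000  # hypothetical value, kept fixed while batch_size changes

small = samples_per_epoch(batch_size=1, steps_per_epoch=steps)
large = samples_per_epoch(batch_size=20, steps_per_epoch=steps)

print(small, large, large / small)
```

With steps_per_epoch left unchanged, batch_size = 20 pushes 20 times as many samples through the model per epoch, which is roughly the order of the slowdown reported in the question (about 16x).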
Let's clear this up:
Assume you have a dataset with 8000 samples (rows of data), and you choose batch_size = 32 and epochs = 25.
This means the dataset will be divided into 8000 / 32 = 250 batches of 32 samples each. The model weights are updated after each batch.
One epoch trains on 250 batches, i.e. 250 updates to the model.
Here steps_per_epoch = the number of batches.
With 25 epochs, the model will pass through the whole dataset 25 times.
Ref - https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
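The same numbers in code, as a small sketch. The helper name steps_for_one_pass is made up; the ceiling covers datasets whose size is not an exact multiple of the batch size.

```python
import math

def steps_for_one_pass(num_samples, batch_size):
    """Batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

num_samples, batch_size, epochs = 8000, 32, 25

steps_per_epoch = steps_for_one_pass(num_samples, batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```

So this example gives 250 weight updates per epoch and 6250 over the whole run.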
You should also take the following function parameters into account when working with fit_generator:
max_queue_size, use_multiprocessing and workers
max_queue_size - might cause more data to be loaded than you actually expect, which, depending on your generator code, may do something unexpected or unnecessary and slow down your execution.
use_multiprocessing together with workers - might spin up additional processes, which adds work for serialization and inter-process communication. First your data is serialized using pickle, then it is sent to the target processes, then the processing happens inside those processes, and then the whole communication procedure repeats backwards: the results are pickled and sent back to the main process. In most cases this should be fast, but if you are processing dozens of gigabytes of data, or your generator is implemented in a sub-optimal fashion, you might get the slowdown you describe.
To sum up:
fit() works faster than fit_generator() since it can access the data directly in memory.
fit() takes NumPy arrays that are already in memory, while fit_generator() pulls data from a generator or a sequence such as keras.utils.Sequence, which is slower.
