PyTorch Training exiting after Caching Images - pytorch

I have a dataset of around 12k training images and 500 validation images. I am using YOLOv5-PyTorch to train my model. When I start the training and it reaches the Caching Images stage, it suddenly quits.
The code I'm using to run this is as follows:
!python train.py --img 800 --batch 32 --epochs 20 --data '/content/data.yaml' --cfg ./models/custom_yolov5s.yaml --weights yolov5s.pt --name yolov5s_results --cache
I am using Google Colab to train my model.
This is the command that executes before shutting down:
train: Caching Images (12.3GB ram): 99% 11880/12000 [00:47<00:00, 94.08it/s]

So I solved the above problem. It occurs because we cache all the images beforehand to increase the speed during the epochs. Caching does increase speed, but it also consumes memory. Google Colab provides 12.69 GB of RAM, and caching such a huge dataset consumed all of it, leaving nothing for caching the validation set, hence the immediate shutdown.
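To see why caching alone can exhaust the session, here is a rough back-of-the-envelope sketch (my own estimate, not YOLOv5 code) of how the cache size scales with --img, assuming uint8 RGB images resized so the longer side equals --img at a 16:9 aspect ratio; the real numbers in the logs (12.3 GB and 6.6 GB) depend on the dataset's actual aspect ratios:

# Rough estimate of the RAM needed by --cache: images are stored as uint8
# RGB arrays, resized so the longer side equals --img with the aspect
# ratio preserved. Assumes roughly 16:9 images; real datasets will differ.
def cache_size_gb(num_images, img_size, aspect=16 / 9, channels=3):
    short_side = img_size / aspect
    return num_images * img_size * short_side * channels / 1e9

print(cache_size_gb(12_000, 800))  # ~13.0 GB -> more than Colab's 12.69 GB
print(cache_size_gb(12_000, 640))  # ~8.3 GB  -> leaves room for the val cache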
There are two basic methods to solve this issue:
Method 1:
I simply reduced the image size from 800 to 640, as my training images did not contain any small objects, so I did not actually need large images. It reduced my RAM consumption by about 50%:
--img 640
train: Caching Images (6.6GB ram): 100% 12000/12000 [00:30<00:00, 254.08it/s]
Method 2:
I had added an argument at the end of the command I use to run this project:
--cache
This flag caches the entire dataset in the first epoch so it can be reused instantly instead of being processed again. If you are willing to compromise on training speed, this method will work for you. Simply remove this flag and you will be good to go. Your new command will be:
!python train.py --img 800 --batch 32 --epochs 20 --data '/content/data.yaml' --cfg ./models/custom_yolov5s.yaml --weights yolov5s.pt --name yolov5s_results

Maybe you should add "memory consumption" to your title, because this was the main reason your training was crashing.
Your answer is still correct, but I would like to go into more detail about why such crashes can happen for people with this kind of problem. YOLOv5 works with image sizes that are multiples of 32. If your image size is not a multiple of 32, YOLOv5 will stretch the image in every epoch and consume a lot of memory that should not be needed (at least not for this). Large image sizes also consume a lot of VRAM, so even if the size is a multiple of 32, your setup or config may not be enough for the training. The --cache option speeds up training, but with the downside of consuming more memory. Batch size also plays a big role in VRAM consumption: if you really want to train with a large image size, you should reduce your batch size, e.g. by a factor of 2.
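To make the multiple-of-32 point concrete, here is a tiny illustrative sketch (my own helper, not YOLOv5's internal check) that rounds an --img value up to the nearest multiple of the network stride:

# Round an image size up to the nearest multiple of 32 so the network's
# stride-32 downsampling fits without extra padding or stretching.
def round_to_stride(img_size, stride=32):
    return int(-(-img_size // stride) * stride)  # ceiling division

print(round_to_stride(800))  # 800 (already a multiple of 32)
print(round_to_stride(700))  # 704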
I hope this helps somebody.

Related

pytorch loss.backward() keeps running for hours

I am using PyTorch to train on some X-ray images, but I ran into the following issue:
at the line loss.backward(), the program just keeps running and never ends, and there is no error or warning.
loss, outputs = self.forward(images, targets)
loss = loss / self.accumulation_steps
print("loss calculated: " + str(loss))
if phase == "train":
    print("running loss backwarding!")
    loss.backward()
    print("loss is backwarded!")
    if (itr + 1) % self.accumulation_steps == 0:
        self.optimizer.step()
        self.optimizer.zero_grad()
The loss calculated before this is something like tensor(0.8598, grad_fn=<DivBackward0>).
Could anyone help me understand why this keeps running, or suggest good ways to debug the backward() call?
I am using torch 1.2.0+cu92 with the compatible cuda 10.0.
Thank you so much!!
It's hard to give a definite answer, but I have a guess.
Your code looks fine, but from the output you've posted (tensor(0.8598, grad_fn=<DivBackward0>)) I conclude that you are operating on the CPU and not on the GPU.
One possible explanation is that the backward pass is not running forever, it just takes very, very long. Training a large network on a CPU is much slower than on a GPU. Check your CPU and memory utilization: it might be that your data and model are too big to fit into main memory, forcing the operating system to swap to your hard disk, which slows execution down by several additional orders of magnitude. If this is the case, I generally recommend:
Use a smaller batch size.
Downscale your images (if possible).
Only open images that are currently needed.
Reduce the size of your model.
Use your GPU (if available) by calling model.cuda(); images = images.cuda() before starting your training (see the sketch below).
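For that last point, here is a minimal, self-contained sketch (a toy model and random data standing in for the real ones, not the poster's code) of running the forward and backward pass on the GPU:

import torch
import torch.nn as nn

# Pick the GPU when one is available; the model and the data must be on
# the same device, otherwise backward() runs on the (much slower) CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(128, 2).to(device)          # toy stand-in for the real model
images = torch.randn(32, 128, device=device)  # toy stand-in for an image batch
targets = torch.randint(0, 2, (32,), device=device)

loss = nn.functional.cross_entropy(model(images), targets)
loss.backward()
print(loss)  # device='cuda:0' in the printed tensor confirms GPU execution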
If that doesn't solve your problem you could start narrowing down the issue by doing some of the following:
Create a minimal working example to reproduce the issue.
Check if the problem persists with other, very simple model architectures.
Check if the problem persists with different input data.
Check if the problem persists with a different PyTorch version.

How can I avoid a "Segmentation Fault (core dumped)" error when loading large .JP2 images with PIL/OpenCV/Matplotlib?

I am running the following simple line in a short script without any issues (environment: Python 3.5.2; PIL 1.1.7; OpenCV 2.4.9.1; Matplotlib 3.0.1):
...
# for example:
img = plt.imread(i1)
...
However, if the size of a loaded .JP2 > ~500 MB, Python3 throws the following error when attempting to load an image:
"Segmentation Fault (core dumped)"
It should not be a RAM issue, as only ~40% of the available RAM is in use when the error occurs, and the error stays the same when RAM is removed from or added to the computer. The error also stays the same when using other ways to load the image, e.g. with PIL.
Is there a way to avoid this error or to work around it?
Thanks a lot!
Not really a solution, more of an idea that may work or help other folks think up similar or further developments...
If you want to do several operations or crops on each monster JP2 image, it may be worth paying the price up-front, just once, to convert it to a format that ImageMagick can subsequently handle more easily. Your image is 20048x80000 2-byte shorts, so you can expand it out to a 16-bit PGM file like this:
convert monster.jp2 -depth 16 image.pgm
and that takes around 3 minutes. However, if you now want to extract part of the image some way down its height, you can extract it from the PGM:
convert image.pgm -crop 400x400+0+6000 tile.tif
in 18 seconds, instead of from the monster JP2:
convert monster.jp2 -crop 400x400+0+6000 tile.tif
which takes 153 seconds.
Note that the PGM will take lots of disk space... I guess you could try the same thing with a TIFF, which can hold 16-bit data too and could maybe be LZW-compressed. I guess you could also use libvips to extract tiles even faster from the PGM file.
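Following up on the libvips idea, here is a hedged sketch with the pyvips bindings (assuming libvips and pyvips are installed; file names match the ImageMagick example above). Sequential access lets libvips stream the file instead of decoding the whole image at once:

import pyvips

# Open the intermediate PGM (or TIFF) with sequential access so libvips
# streams it rather than decoding everything up front, then crop a tile.
image = pyvips.Image.new_from_file("image.pgm", access="sequential")
tile = image.crop(0, 6000, 400, 400)   # left, top, width, height
tile.write_to_file("tile.tif")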

Google Colab: Increase TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD

I have a Python script which I run on Google Colaboratory using
!python3 "/content/gdrive/My Drive/my_folder/my_file.py"
And it gives me:
tcmalloc: large alloc 21329330176 bytes == 0x18e144000 # 0x7f736dbc2001 0x7f736b6f6b85 0x7f736b759b43 0x7f736b75ba86 0x7f736b7f3868 0x5030d5 0x506859 0x504c28 0x506393 0x634d52 0x634e0a 0x6385c8 0x63915a 0x4a6f10 0x7f736d7bdb97 0x5afa0a
And the session crashes.
Therefore I increased TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD and ran the code with:
!TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=21329330176
!python3 "/content/gdrive/My Drive/my_folder/my_file.py"
But I still get the same error/warning. What is it that I am doing wrong?
That warning indicates an attempted allocation of 21329330176 bytes, which is > 20 gigabytes of RAM.
That exceeds the memory capacity of Colab backends, so the crash is expected.
You'll want to restructure your computation to use less concurrent memory, or use a local runtime in order to make use of backends with more available memory.
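As a quick sanity check of how much room the backend actually has before attempting a large allocation, here is a small sketch using psutil (normally pre-installed on Colab; it only inspects memory, it does not raise any limit):

import psutil

# Report total and currently available RAM on the runtime, in GB, so you
# can compare it against the size tcmalloc reports (here ~21.3 GB).
mem = psutil.virtual_memory()
print(f"total:     {mem.total / 1e9:.1f} GB")
print(f"available: {mem.available / 1e9:.1f} GB")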

YOLO - tensorflow works on cpu but not on gpu

I've used YOLO detection with a trained model on my GPU (Nvidia GTX 1060 3GB), and everything worked fine.
Now I am trying to generate my own model with the --gpu 1.0 option. TensorFlow can see my GPU, as I can read these messages at startup:
"name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705"
"totalMemory: 3.00GiB freeMemory: 2.43GiB"
Anyway, later on, when the program loads the data and tries to start learning, I get the following error:
"failed to allocate 832.51M (872952320 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY"
I've checked whether it tries to use my other GPU (Intel 630), but it doesn't.
When I run the training process without the --gpu option, it works fine, but slowly.
(I've also tried --gpu 0.8, 0.4, etc.)
Any idea how to fix it?
Problem solved. Changing the batch size and image size in the config file didn't seem to help, as they weren't being loaded correctly. I had to go into the defaults.py file and lower them there to make it possible for my GPU to handle the training steps.
Looks like your custom model uses too much memory and the graphics card cannot support it. You only need to use the --batch option to control the amount of memory used.
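If I remember correctly, darkflow's --gpu value is passed to TensorFlow as the per-process GPU memory fraction. For reference, here is a hedged sketch of capping that fraction directly with the TensorFlow 1.x API (illustrative only, not darkflow's actual code):

import tensorflow as tf  # TensorFlow 1.x API

# Cap this process at ~40% of the card's memory and let it grow on demand,
# so an oversized graph fails fast instead of grabbing the whole 3 GB.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.4,
                            allow_growth=True)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... build and run the training graph here ...
    pass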

Compress .ipa MonoTouch

Starting from the assumption that I have deleted all unnecessary files, my app contains a folder of JPG images (1024x700 minimum permitted resolution) whose size is 400 MB. When I generate my .ipa, its size is 120 MB. I have tried converting those images to PNG and then generating the .ipa, but the size is more than 120 MB (140 MB) and the quality is a bit worse.
Which best practices are recommended to reduce the size of the application?
P.S. Those files are shown as a gallery.
One tool we used in our game, Draw a Stickman: EPIC, is smusher.
To install it (you need Ruby or the Xcode command line tools):
sudo gem install smusher
It might print some errors during installation that you can ignore.
To use it:
smusher mypng.png
smusher myjpg.jpg
The tool sends the picture off to Yahoo's smush.it web service and compresses the image losslessly.
Generally you can save maybe 20% of the file size with no loss in quality.
There are definitely other techniques we used, like indexed PNGs, but you are already using JPGs, which are smaller.
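If you still want to try the indexed-PNG route for the gallery images, here is a hedged Pillow sketch (file names are placeholders; palette conversion is lossy for photos, so check the results visually):

from PIL import Image

# Convert a truecolor image to an indexed (palette) PNG with at most 256
# colors; this usually shrinks flat-colored art a lot, photos less so.
img = Image.open("gallery_photo.jpg")
indexed = img.convert("P", palette=Image.ADAPTIVE, colors=256)
indexed.save("gallery_photo_indexed.png", optimize=True)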
