Google Colab: Increase TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD - python-3.x

I have a python script which I run on Google Colaboratory using
!python3 "/content/gdrive/My Drive/my_folder/my_file.py"
And it gives me:
tcmalloc: large alloc 21329330176 bytes == 0x18e144000 # 0x7f736dbc2001 0x7f736b6f6b85 0x7f736b759b43 0x7f736b75ba86 0x7f736b7f3868 0x5030d5 0x506859 0x504c28 0x506393 0x634d52 0x634e0a 0x6385c8 0x63915a 0x4a6f10 0x7f736d7bdb97 0x5afa0a
And the session crashes.
So I increased TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD and ran the script with:
!TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=21329330176
!python3 "/content/gdrive/My Drive/my_folder/my_file.py"
But I still get the same error/warning. What am I doing wrong?

That warning indicates an attempted allocation of 21329330176 bytes, which is about 21.3 GB (roughly 19.9 GiB) of RAM.
That exceeds the memory capacity of Colab backends, so the crash is expected. Raising TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD only changes when the message is printed; it does not make more memory available.
You'll want to restructure your computation so that it holds less data in memory at once, or use a local runtime in order to make use of backends with more available memory.
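As a side note on the attempt in the question: each `!` line in a notebook runs in its own shell, so a variable assigned on one line is gone by the time the next line runs. Here is a minimal sketch of passing an environment variable to a child process from Python. The helper name `run_with_env` and the threshold value are illustrative assumptions, and this only affects the warning, not the amount of RAM available.

```python
import os
import subprocess
import sys

def run_with_env(args, extra_env):
    """Run a command with extra environment variables set for that process only."""
    env = dict(os.environ, **extra_env)
    return subprocess.run(args, env=env, capture_output=True, text=True)

# Size reported in the warning, converted to decimal and binary units.
alloc_bytes = 21329330176
alloc_gb = alloc_bytes / 1e9     # decimal gigabytes
alloc_gib = alloc_bytes / 2**30  # binary gibibytes

# Illustrative threshold: one byte above the reported allocation.
# The child process here just echoes the variable to show it was received.
result = run_with_env(
    [sys.executable, "-c",
     "import os; print(os.environ['TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD'])"],
    {"TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD": str(alloc_bytes + 1)},
)
print(result.stdout.strip())
print(f"{alloc_gb:.1f} GB, {alloc_gib:.1f} GiB")
```

In a notebook cell, the same effect can be had by putting the assignment and the command on a single `!` line, so both run in the same shell.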

Related

PyTorch Training exiting after Caching Images

I have a dataset of around 12k training images and 500 validation images. I am using YOLOv5-PyTorch to train my model. When I start training and it reaches the Caching Images stage, it suddenly quits.
The code I'm using to run this is as follows:
!python train.py --img 800 --batch 32 --epochs 20 --data '/content/data.yaml' --cfg ./models/custom_yolov5s.yaml --weights yolov5s.pt --name yolov5s_results --cache
I am using Google Colab to train my model.
This is the last output before it shuts down:
train: Caching Images (12.3GB ram): 99% 11880/12000 [00:47<00:00, 94.08it/s]
I solved the problem. It occurs because all the images are cached beforehand to speed up each epoch. This may increase speed, but it also consumes memory. Google Colab provides 12.69GB of RAM. Caching this much data consumed nearly all of the RAM, leaving nothing for caching the validation set, so the session shut down immediately.
There are two basic methods to solve this issue:
Method 1:
I simply reduced the image size from 800 to 640, since my training images didn't contain any small objects and I did not actually need large images. This reduced my RAM consumption by roughly half (from 12.3GB to 6.6GB).
--img 640
train: Caching Images (6.6GB ram): 100% 12000/12000 [00:30<00:00, 254.08it/s]
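The effect of the smaller image size can be checked with some quick arithmetic. This is a rough sketch that assumes the cache size scales with the number of pixels per image; the figures are the ones reported in the logs above.

```python
# Figures taken from the post: Colab RAM and the two cache sizes in the logs.
colab_ram_gb = 12.69
cache_at_800_gb = 12.3
cache_at_640_gb = 6.6

# Assumption: cached bytes are proportional to pixel count, i.e. to size squared.
scale = (640 / 800) ** 2
predicted_at_640_gb = cache_at_800_gb * scale

# RAM left over for everything else (Python, the model, the validation cache).
headroom_at_800_gb = colab_ram_gb - cache_at_800_gb
headroom_at_640_gb = colab_ram_gb - cache_at_640_gb

print(f"pixel-count scale factor: {scale:.2f}")
print(f"predicted cache at 640: {predicted_at_640_gb:.2f} GB, reported: {cache_at_640_gb} GB")
print(f"headroom at 800: {headroom_at_800_gb:.2f} GB, at 640: {headroom_at_640_gb:.2f} GB")
```

The reported cache is somewhat smaller than the pixel-count estimate, so treat the scaling rule as an approximation only. Either way, the 800 setting leaves well under 1 GB free, which matches the crash described above.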
Method 2:
I had added an argument at the end of the command I use to run this project:
--cache
This flag caches the entire dataset during the first epoch so it can be reused instantly instead of being processed again. If you are willing to compromise on training speed, this method will work for you: simply remove the flag. The new command is:
!python train.py --img 800 --batch 32 --epochs 20 --data '/content/data.yaml' --cfg ./models/custom_yolov5s.yaml --weights yolov5s.pt --name yolov5s_results
Maybe you should add "VRAM consumption" to your title, because this was the main reason your training was crashing.
Your answer is still right, but I would like to go into more detail on why such crashes can happen for people with this kind of problem. YOLOv5 works with image sizes that are multiples of 32. If your image size is not a multiple of 32, YOLOv5 will try to stretch the image in every epoch and consume a lot of VRAM that shouldn't be consumed (at least not for this). Large image sizes also consume a lot of VRAM, so even if the size is a multiple of 32, your setup or config may not be enough for this training. The cache option speeds up your training, but with the downside of consuming more VRAM. Batch size also plays a big role in VRAM consumption. If you really want to train with a large image size, you should reduce your batch size by a multiple of 2.
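To check whether an image size is a multiple of 32 before starting a run, something like the following could be used. This is a minimal sketch; `round_up_to_multiple` is a hypothetical helper written for illustration, not a YOLOv5 function.

```python
import math

def round_up_to_multiple(size, stride=32):
    """Return the smallest multiple of `stride` that is >= `size`."""
    return math.ceil(size / stride) * stride

# Both sizes used in this thread are already multiples of 32.
for requested in (800, 640, 810):
    adjusted = round_up_to_multiple(requested)
    note = "ok" if adjusted == requested else f"would need {adjusted}"
    print(f"--img {requested}: {note}")
```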
I hope this helps somebody.

How to work with big WAV files in Python without getting memory error?

When using Python's soundfile to read and write audio WAV files that are longer than 9 minutes (size > 500 MB), I get a memory error ("cannot allocate 1.1 GBi"). How can I work with such big files without splitting them into smaller files (e.g. in Audacity)? Why without splitting? To detect the effects of my processing after long runs (e.g. after more than 9 minutes of continuous processing), where history is important.
In general, how do I extend Python runtime memory in order to allow working with big files, including loading them and writing them to the hard-drive?
See also: Unable to allocate array with shape and data type
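One common approach is to stream the file in fixed-size chunks and carry the processing state from one chunk to the next, so the history is preserved without loading the whole file. The sketch below uses the standard library's wave module as a stand-in for soundfile and assumes 16-bit PCM data on a little-endian machine. The names `process_in_chunks` and `track_peak` are made up for illustration.

```python
import array
import os
import tempfile
import wave

def process_in_chunks(src_path, dst_path, process, frames_per_chunk=65536):
    """Stream a 16-bit PCM WAV through `process` one chunk at a time.

    `process(samples, state)` gets an array('h') of interleaved samples plus
    the state returned by the previous call, and returns (samples, new_state).
    """
    state = None
    with wave.open(src_path, "rb") as src, wave.open(dst_path, "wb") as dst:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        dst.setparams(src.getparams())
        while True:
            raw = src.readframes(frames_per_chunk)
            if not raw:
                break
            samples = array.array("h")
            samples.frombytes(raw)  # assumes a little-endian host
            samples, state = process(samples, state)
            dst.writeframes(samples.tobytes())
    return state

def track_peak(samples, state):
    """Example processor: pass audio through and remember the running peak."""
    chunk_peak = max((abs(s) for s in samples), default=0)
    return samples, max(state or 0, chunk_peak)

# Small demo file so the sketch can be run without real audio.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "in.wav")
dst_path = os.path.join(tmp_dir, "out.wav")
with wave.open(src_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(48000)
    w.writeframes(array.array("h", [0, 1000, -3000, 2000] * 1000).tobytes())

peak = process_in_chunks(src_path, dst_path, track_peak, frames_per_chunk=256)
print(peak)
```

Only one chunk is held in memory at a time, so peak memory depends on `frames_per_chunk` rather than on the file length.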

Memory error in pycharm using scipy's welch function

I want to compute Welch's periodogram using scipy.signal in PyCharm. My signal is a 5-min audio file with Fs = 48 kHz, so I guess it's a very big signal. The line was:
f, p = signal.welch(audio, Fs, nperseg=512)
I am getting a memory error. I was wondering whether that's a PyCharm configuration issue or whether the signal is simply too big. My RAM is 8 GB.
Sometimes it works with some audio files, but the idea is to do it with several, and after one or two the error is raised.
I've tested your setup and welch does not seem to be the problem. For further analysis the entire script you are running would be necessary.
import numpy as np
from scipy.signal import welch
fs = 48000
signal_length = 5 * 60 * fs
audio_signal = np.random.rand(signal_length)
f, Pxx = welch(audio_signal, fs=fs, nperseg=512)
On my computer (Windows 10, 64 bit) it consumes 600 MB of peak memory during the call to welch, which is released directly afterwards, in addition to ~600 MB of allocation for the initial array and Python itself. The call to welch itself does not lead to any significant permanent memory increase.
You can do the following:
Upgrade to newest version of scipy, as there have been problems with Welch previously
Check that your PC has enough free memory and close memory-hungry applications (e.g. Chrome)
Convert your array to a lower-precision datatype, e.g. from float64 to float32 or float16
Make sure to free variables that are not needed anymore. Especially if you load several signals and store the results in different arrays, memory use can accumulate quite quickly. Only keep what you need, delete variables via del variable_name, and check that no references remain elsewhere in the program. E.g. if you don't need the audio variable, either delete it explicitly after welch(...) or overwrite it with the next audio data.
Run the garbage collector gc.collect(). However, this will probably not solve your problem as garbage is managed automatically in Python anyway.
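To put the datatype suggestion in numbers, the raw size of the signal itself can be worked out directly. This sketch assumes a mono signal with the sample rate and duration from the question, and it counts only the sample data, not Python or library overhead.

```python
FS = 48_000         # sample rate from the question, in Hz
DURATION_S = 5 * 60  # 5-minute recording
n_samples = FS * DURATION_S

# Bytes per sample for the datatypes mentioned in the list above.
BYTES_PER_SAMPLE = {"float64": 8, "float32": 4, "float16": 2}
footprint_mb = {
    dtype: n_samples * size / 1e6 for dtype, size in BYTES_PER_SAMPLE.items()
}

for dtype, mb in footprint_mb.items():
    print(f"{dtype}: {mb:.1f} MB for {n_samples:,} samples")
```

A single array of this size is small next to 8 GB of RAM, which is consistent with the point above that memory tends to run out when several signals and results are kept alive at once.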

p5.serialserver (i.e. p5serial and p5.serialcontrol) leaking memory

I'm trying to use p5.serial to display my Arduino-like device's USB output on a web page. It generates about ten strings a second continually.
the problem:
When I run p5serial (in a shell window) or p5.serialcontrol (an Electron/GUI app), the node server starts out at ~ 12 MB, but as it runs it bloats quickly to > 1 GB and the output becomes sluggish. The server eventually dies with
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
...
Abort trap: 6
the question:
Is this a known issue (aside from the bug report I just filed)? Or perhaps an error in the way I'm using it?
some details:
When I connect the Arduino-like device via a serial USB terminal, things work just fine (except for the lack of lovely p5.js graphics).
I'm running OS X (10.12.6 / Sierra), node v6.3.0, p5.serialserver#0.0.24
I've posted a gist containing a minimal example (but understand that it assumes you have an Arduino-like device with USB).
This memory leak was fixed in p5.serial: https://github.com/p5-serial/p5.serialcontrol/issues/12

How much data can be fetched by submit_bio() at a time

Here is my LAN structure
I want to download a 258.6MB .zip file from the Samba server and, just before the download starts, begin profiling the router's Linux stack.
When the download finished, I stopped the profiling and found this in the profiling report:
samples  %       image name  app name  symbol name
...
16       0.0064  vmlinux     smbd      submit_bio
...
The sampling rate is 100000 and the event is CPU_CYCLES.
Because this is the first download of the file, that is to say it is not in the page cache, submit_bio() should be pretty busy. So I don't understand why submit_bio() accounts for such a small share. Does that mean that each time submit_bio() is called, we fetch about (258.6/16)MB of data?
Thanks
That's statistical sampling. It means of the x times the profiler sampled the system, 16 times it happened to find the CPU running in submit_bio(). It does not mean that submit_bio() is called 16 times.
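The figures in the report can be turned into rough totals. This is only a sketch and it rests on two assumptions: that the % column is a percentage of all samples taken, and that a "sampling rate" of 100000 means one sample is recorded every 100000 CPU_CYCLES events.

```python
samples_in_symbol = 16      # samples attributed to submit_bio in the report
percent_of_total = 0.0064   # assumed to be a percentage of all samples
events_per_sample = 100_000  # assumed: CPU cycles between consecutive samples

# Total samples implied by "16 samples is 0.0064% of the profile".
total_samples = round(samples_in_symbol / (percent_of_total / 100))

# Approximate CPU cycles spent inside submit_bio while profiling.
cycles_in_symbol = samples_in_symbol * events_per_sample

print(f"total samples in profile: {total_samples:,}")
print(f"approx. cycles in submit_bio: {cycles_in_symbol:,}")
```

Neither number says how many times submit_bio() was called or how much data each call handled; the samples only estimate how much CPU time was spent inside the function.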
