I train a network on GPU with Pytorch. However, after at most 3 epochs, code stops with a message :
Killed
No other error message is given.
I monitored the memory and gpu usage, there were still space during the run. I reviewed the /var/sys/dmesg to find a detailed message regarding to this, however no message with "kill" was entered. What might be the problem?
Cuda version: 9.0
Pytorch version: 1.1.0
If you had root access, you could check whether this is memory issue or not by dmesg command.
In my case, the process was killed by kernel due to out of memory.
I found the cause to be saving tensors require grad to a list and each of those stores an entire computation graph, which consumes significant memory.
I fixed the issue by saving .detach() tensor instead of saving tensors returned by loss function to the list.
You can type "dmesg" on your terminal and scroll down to the bottom. It will show you the message of why it is killed.
Since you mentioned PyTorch, the chances are that your process is killed due to "Out of Memory". To resolve this, reduce your batch size till you no longer see the error.
Hope this helps! :)
In order to give an idea to people who will enconter this:
Apparently, Slurm was installed on the machine so that I needed to give the tasks on Slurm.
Related
I am running a code in Google Colab for training a neural network.
All my scripts have been working just fine, but starting this week, I have been receiving this error:
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
which seems to occur at random. Sometimes it occurs at the beginning of my script run, say, even before epoch 1, some other times at epoch 160 or 56 or so. Nonetheless, it seems to always point to this sentence: loss.backward().
I'm running the code over GPU and have the paid subscription to Colab Pro.
Does anybody have faced this issue? I read somewhere that this seems to be a problem of the GPU running out of memory, however, can't say that for sure given the error messages I'm receiving.
Well, It took a while but I managed to find the source of this problem myself. Some other posts mentioned this could be a GPU memory issue so I tried to minimize the memory usage as much as possible. Though this was good for my code, it didn't solve the problem.
Others talked about switching to CPU and running the script to get a better error message (which I did and took forever). Running my script with CPU gave the error of binary cross entropy not receiving inputs in the zero to one interval. This was clearly not the problem since those inputs can from a sigmoid function.
Finally, I recall the last thing I changed before my script started behaving like this and I found out that it was because of the learning rate. When I ran my training with a learning rate of 0.001, everything was fine. I switched it to 0.02 (20 times higher) and then I started receiving this execution errors at random. Switching back to the smaller learning rate solved the problem immediately. No more GPU errors and now I'm happy.
So, if you have this issue, you my take a look to the learning rate and hopefully will help you.
I traced my Neural Network using torch.jit.trace on a CUDA-compatible GPU server. When I reloaded that Trace on the same server, I could reload it and use it fine. Now, when I downloaded it onto my laptop (for quick testing), when I try to load the trace I get:
RuntimeError: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. 'aten::empty_strided' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
Can I not switch between GPU and CPU on a trace? Or is there something else going on?
I had this exact same issue. In my model I had one line of code that was causing this:
if torch.cuda.is_available():
weight = weight.cuda()
If you have a look at the official documentation for trace (https://pytorch.org/docs/stable/generated/torch.jit.trace.html) you will see that
the returned ScriptModule will always run the same traced graph on any input. This has some important implications when your module is expected to run different sets of operations, depending on the input and/or the module state
So, if the model was traced on a machine with GPU this operation will be recorded and you won't be able to even load your model a CPU only machine. To solve this, deleted everything that makes you model CUDA dependent. In my case it was as easy as deleting the code-block above.
Whatever program I ran on GPU, even if programs that ran successfully before, my GPU throws this error: CL_OUT_OF_RESOURCES for the clEnqueueReadBuffer function.
Then I remembered that I ran a deep learning framework last night which crashed and may ate up all the memory on GPU. I tried to restart the computer, but it doesn't work.
Is it possible that my GPU ran out of memory due to the DL framework's crash?
If so, how should I solve this problem?
CL_OUT_OF_RESOURCES is a generic error given by NVIDIA implementation at clEnqueueRead, it more or less means:
Something went out of bounds (resources) when trying to write to this
buffer
Most probably the kernel you launched before that writes to that buffer went out of bounds of the buffer.
I'm testing CUDA app and I have run into strange memory issue:
My program performs some image operations and displays it using ImageMagick's display program.
The problem is that every time I run that IM's display I get more GPU memory usage, so less memory for GPU computation.
I'm using IM's display, because I couldn't find anything that displays image from the pipe input. Any suggestions?
Anyway why IM's display takes so much GPU memory and why is it not freed?
Based on your question, you're attempting to display a series of files in sequence using a shell not unlike Bash after performing a set of GPU-intensive operations. You're curious why more GPU memory is being consumed with every subsequent invocation of ImageMagick display, which appears to be closing out successfully after the conclusion of each operation.
We may further theorize that you're using ImageMagick's OpenCL support for at least some of your processing. While we don't have enough information to determine what your GPU's texture buffers look like at the completion of each rendering via display, I speculate your GPU isn't freeing textures expediently, causing memory to slowly creep up.
Instead of continuing to build conjecture around this hypothesis, I will instead recommend a tool to debug your issue: gDEBugger. This should allow you to interrogate your video card to determine exactly why things are slowing down.
Best of luck with your application.
I know it's old, but we have figured out that using pipes (popen()) makes sophisticated copy of the program in memory, what also causes copying the end program directives, or whatever called... So when I close program opened with popen I also finish all CUDA related context that are usually freed in "background", when program ends. So cleaning CUDA memory after I close popen application won't work, and I thing here was my memory leak and general major program error.
I hope someone will find it useful.
I would like to take a picture of whats happenning to my screen, but screenshot won't capture it, but the best description is snow.
One of my projects has a habit of randomly failing on a new iteration, and I always assumed it was a 'You're using too much memory fool!' error, so was happy to restart, deal with it, and try to fix the problem.
Then I started to actually monitor the global memory assigned; Its constant at around 70% free throughout execution until suddenly dying on a fresh malloc.
To make matters more worrying, these Guru Meditations have started to habitually appear in my dmesg; all (that I've noticed) with the same address.
NVRM: Xid (0000:01:00): 13, 0008 00000000 000050c0 00000368 00000000 00000080
Any words from the wise on what the hell is going on? I'm still continuing investigation into issues with register and shared memory, but wanted to start this question for any ideas anyone else has.
If none of your CUDA memory allocations fail, then your problem isn't that you are out of memory (if you were it could be due to fragmentation, not necessarily due to 100%+ consumption).
If you are getting a x-mas tree effect, then you probably have a kernel that is writing outside of allocated memory. Check the indexes of pixels/array-cells you are accessing and the memory offset calculation of their position in the output buffers.
You can also try using 1D index while invoking the kernels, to make calculations simpler.
(You can model any multi-dimensional array as a long 1D array.)
Please wrap all calls to CUDA Runtime API with cudaSafeCall() and add a cudaCheckError() after all kernel invocations. These utility functions are exposed in cutil.h. This should help you catch any CUDA errors at the point they actually happen and their error message should help your investigation.