I am a beginner with TF and I am trying to run the TensorFlow Object Detection API with:
GeForce 2GB-MX150
16GB RAM
I7 8550U
I'm getting the following error when it starts training and I can't figure out what's wrong.
I have tried changing some parameters like the batch size multiple times, but I still get the same error.
In this picture you can see the total and available memory that the computer has.
I'd be grateful for your help.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,1024,52,38] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_20/bottleneck_v1/conv3/Conv2D
= Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true,
_device="/job:localhost/replica:0/task:0/device:GPU:0"](FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_20/bottleneck_v1/conv2/Relu, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_20/bottleneck_v1/conv3/weights/read/_2629)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: gradients/FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_18/bottleneck_v1/conv3/Conv2D_grad/tuple/control_dependency_1/_3229
= _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6894_...pendency_1", tensor_type=DT_FLOAT,
_device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
2 GB of GPU memory on your card is far too little for a huge model like ResNet-101.
I'm trying to optimize some weights in PyTorch but I keep getting this error:
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 8000000000000 bytes. Error code 12 (Cannot allocate memory).
Namely, things blow up when I run (weights * col).sum() / weights.sum(). Weights is a tensor of size (1000000,1) and col is also a tensor of size (1000000, 1). Both tensors are decently sized, but it seems odd that I'm using up all the memory in my computer (8GB) for these operations.
It could be that your weights and col tensors are not aligned, i.e. one of them is transposed so its shape is (1, 1000000) instead of (1000000, 1). When you compute (weights * col), the shapes are broadcast together, producing a (1000000, 1000000) tensor, which is probably where the extreme memory usage comes from (the resulting tensor is 1,000,000 times bigger than your original tensor).
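A quick way to confirm this is to check the shapes directly. Below is a minimal sketch (using small tensors so it doesn't exhaust memory) of how a transposed operand blows up the result; the sizes are illustrative, not taken from the question:
import torch

weights = torch.rand(5, 1)   # stand-in for the (1000000, 1) tensor
col_ok  = torch.rand(5, 1)
col_bad = torch.rand(1, 5)   # same data, but transposed

print((weights * col_ok).shape)   # torch.Size([5, 1])  -> elementwise, as intended
print((weights * col_bad).shape)  # torch.Size([5, 5])  -> broadcast to an N x N matrix

# With N = 1,000,000 and 8-byte doubles, an N x N result would need
# 1e6 * 1e6 * 8 = 8e12 bytes, which matches the number in the error message.
# If this is the problem, col = col.reshape(-1, 1) restores the intended shape.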
I have been getting this error after a few epochs.
I have tried several suggestions found in similar questions such as
reduce the batch size of both training and test to 1
reduce the data size
use kill -9 pid
use more than one GPU by setting os.environ['CUDA_VISIBLE_DEVICES'] = '0,2'
reduce the number of output neurons of the LSTM model
add gpu_options = tf.GPUOptions(allow_growth=True)
session = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))
add del model after using the model
add k.clear_session(). I'm not sure I used this particular one correctly; see the sketch below for the usual pattern.
None of them worked.
Does anyone have any other suggestions? Please help.
The tensor shape in the error message changes between runs, but the error itself remains the same.
I'm using Python3.7, tensorflow-gpu==1.14, CuDNN=7.6.5, CUDA==10.0.
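For reference, here is a minimal sketch of how the allow_growth and clear_session options from the list above are usually combined in TF 1.x with the tf.keras backend (assuming the configured session should be registered with Keras before the model is built):
import tensorflow as tf
from tensorflow.keras import backend as K

# Release any graph/session state left over from a previous run.
K.clear_session()

# Let the allocator grow GPU memory on demand instead of grabbing it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))

# ... build and train the model after this point ...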
I'm surprised to hit an out-of-memory error using the tf.keras.applications.ResNet50 implementation on an Nvidia RTX 2080 Ti (which has 11 GB of memory!).
Question:
Is there something wrong with the workflow I use?
Notes:
I'm using tensorflow-gpu==2.0.0b1 with CUDA v10.1
I work on a segmentation task, thus the large output_shape
I build the batches myself, thus the use of train_on_batch()
Even when setting memory_growth to True, the memory gets filled up from 700 MB to 10850 MB in a fraction of a second.
Code:
import tensorflow as tf
import tensorflow.keras as ke
import numpy as np
ke.backend.clear_session()
inputs = ke.layers.Input(shape=(512,1024,3), dtype="float32")
outputs = ke.applications.ResNet50(include_top=False, weights="imagenet")(inputs)
outputs = ke.layers.Lambda(lambda x: tf.compat.v1.image.resize_bilinear(x, size=(512,1024)))(outputs)
outputs = ke.layers.Conv2D(2, 1, activation="softmax")(outputs)
model = ke.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=ke.optimizers.RMSprop(lr=0.001), loss=ke.losses.CategoricalCrossentropy())
images = np.zeros((1,512,1024,3), dtype=np.float32)
targets = np.zeros((1,512,1024,2), dtype=np.float32)
model.train_on_batch(images, targets)
ResNet being a complex model, the input dimensions might be the reason for the OOM error. Try reducing the input dimensions and the corresponding batch size (as much as the memory can hold).
As mentioned in the comments, it worked with batch size 1 and input dimensions of 700×512.
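For illustration, a sketch of the same model with a smaller input resolution; the numbers below are only an example and not taken from the original comments:
import tensorflow as tf
import tensorflow.keras as ke
import numpy as np

ke.backend.clear_session()
# Halving the spatial resolution roughly quarters the activation memory of the backbone.
inputs = ke.layers.Input(shape=(256, 512, 3), dtype="float32")
outputs = ke.applications.ResNet50(include_top=False, weights="imagenet")(inputs)
outputs = ke.layers.Lambda(lambda x: tf.compat.v1.image.resize_bilinear(x, size=(256, 512)))(outputs)
outputs = ke.layers.Conv2D(2, 1, activation="softmax")(outputs)
model = ke.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=ke.optimizers.RMSprop(lr=0.001), loss=ke.losses.CategoricalCrossentropy())
model.train_on_batch(np.zeros((1, 256, 512, 3), dtype=np.float32),
                     np.zeros((1, 256, 512, 2), dtype=np.float32))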
We can allocate a tensor on the GPU using torch.Tensor([1., 2.], device='cuda'). Are there any differences between that approach and torch.cuda.Tensor([1., 2.]), other than that we can pass a specific CUDA device to the former?
Or in other words, in which scenario is torch.cuda.Tensor() necessary?
So generally both torch.Tensor and torch.cuda.Tensor are equivalent. You can do everything you like with them both.
The key difference is just that torch.Tensor occupies CPU memory while torch.cuda.Tensor occupies GPU memory. Of course operations on a CPU Tensor are computed with CPU while operations for the GPU / CUDA Tensor are computed on GPU.
The reason you need these two tensor types is that the underlying hardware interface is completely different. Apart from the fact that it doesn't make sense computationally, you will get an error as soon as you try to do computations between a torch.Tensor and a torch.cuda.Tensor:
import torch
# device will be 'cuda' if a GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# creating a CPU tensor
cpu_tensor = torch.rand(10)
# moving same tensor to GPU
gpu_tensor = cpu_tensor.to(device)
print(cpu_tensor, cpu_tensor.dtype, type(cpu_tensor), cpu_tensor.type())
print(gpu_tensor, gpu_tensor.dtype, type(gpu_tensor), gpu_tensor.type())
print(cpu_tensor*gpu_tensor)
Output:
tensor([0.8571, 0.9171, 0.6626, 0.8086, 0.6440, 0.3682, 0.9920, 0.4298, 0.0172,
0.1619]) torch.float32 <class 'torch.Tensor'> torch.FloatTensor
tensor([0.8571, 0.9171, 0.6626, 0.8086, 0.6440, 0.3682, 0.9920, 0.4298, 0.0172,
0.1619], device='cuda:0') torch.float32 <class 'torch.Tensor'> torch.cuda.FloatTensor
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-15-ac794171c178> in <module>()
12 print(gpu_tensor, gpu_tensor.dtype, type(gpu_tensor), gpu_tensor.type())
13
---> 14 print(cpu_tensor*gpu_tensor)
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'other'
As the underlying hardware interface is completely different, CPU tensors are only compatible with CPU tensors and, vice versa, GPU tensors are only compatible with GPU tensors.
Edit:
As you can see here that a tensor which is moved to GPU is actually a tensor of type: torch.cuda.*Tensor i.e. torch.cuda.FloatTensor.
So cpu_tensor.to(device) or torch.Tensor([1., 2.], device='cuda') will actually return a tensor of type torch.cuda.FloatTensor.
In which scenario is torch.cuda.Tensor() necessary?
When you want to use GPU acceleration (which is much faster in most cases) for your program, you need to use torch.cuda.Tensor, but you have to make sure that ALL tensors you are using are CUDA tensors; mixing is not possible here.
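For example, a minimal sketch of resolving the mixed-device error from the traceback above by moving both operands to the same device before combining them:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

cpu_tensor = torch.rand(10)
gpu_tensor = torch.rand(10, device=device)

# Move the CPU tensor to the same device as the other operand, then multiply.
result = cpu_tensor.to(gpu_tensor.device) * gpu_tensor
print(result.device)  # cuda:0 when a GPU is available, otherwise cpu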
I am building a Keras model to run a simple image recognition task. If I do everything in raw Keras, I don't hit OOM. Strangely, when I do it through a mini framework I wrote, which is fairly simple and mainly meant to keep track of the hyperparameters and setup I used, I hit OOM. Most of the execution should be the same as running raw Keras. I am guessing I made some mistake in my code. Note that this same mini framework had no issue running on the CPU of my local laptop. I think I will need to debug, but before that, does anyone have any general advice?
Here's a few lines of the errors I got:
Epoch 1/50
2018-05-18 17:40:27.435366: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-18 17:40:27.435906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235 pciBusID: 0000:00:04.0 totalMemory: 11.17GiB freeMemory: 504.38MiB
2018-05-18 17:40:27.435992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-05-18 17:40:27.784517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-18 17:40:27.784675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-05-18 17:40:27.784724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-05-18 17:40:27.785072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 243 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2018-05-18 17:40:38.569609: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 36.00MiB. Current allocation summary follows.
2018-05-18 17:40:38.569702: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (256): Total Chunks: 66, Chunks in use: 66. 16.5KiB allocated for chunks. 16.5KiB in use in bin. 2.3KiB client-requested in use in bin.
2018-05-18 17:40:38.569768: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (512): Total Chunks: 10, Chunks in use: 10. 5.0KiB allocated for chunks. 5.0KiB in use in bin. 5.0KiB client- etc. etc
2018-05-18 17:40:38.573706: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[18432,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
This is caused by running out of GPU memory, as is clear from the warnings.
The first workaround is to allow GPU memory to grow, if possible, by creating this ConfigProto and passing it to tf.Session():
# See https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
Then pass this config to the session that is causing the error, like so:
tf.Session(config = config)
If this doesn't help, you can disable the GPU for the particular session that is causing the error, like this:
config = tf.ConfigProto(device_count ={'GPU': 0})
sess = tf.Session(config=config)
If you are using Keras, you can get the Keras backend and apply these configs by setting its session.
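For example, with the standalone Keras package this usually looks like the following sketch (with tf.keras the import would be tensorflow.keras.backend instead):
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Register the configured session with the Keras backend so models use it.
K.set_session(tf.Session(config=config))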