Theano / Chainer Not Reporting Correct Free VRAM on K80 with 12GB RAM


System:
Ubuntu 16.04.2
cudnn 5.1, CUDA 8.0
I have theano installed from git (latest version).
When I run the generate.py sample from https://github.com/yusuketomoto/chainer-fast-neuralstyle/tree/resize-conv, it reports out of memory whether the CPU or the GPU is used.
python generate.py sample_images/tubingen.jpg -m models/composition.model -o sample_images/output.jpg -g 0
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
/home/ubuntu/Theano/theano/sandbox/cuda/__init__.py:558: UserWarning: Theano flag device=gpu* (old gpu back-end) only support floatX=float32. You have floatX=float64. Use the new gpu back-end with device=cuda* for that value of floatX.
warnings.warn(msg)
Using gpu device 0: Tesla K80 (CNMeM is enabled with initial size: 95.0% of memory, cuDNN 5105)
Traceback (most recent call last):
File "generate.py", line 45, in <module>
y = model(x)
File "/home/ubuntu/chainer-fast-neuralstyle/net.py", line 56, in __call__
h = F.relu(self.b2(self.c2(h), test=test))
File "/usr/local/lib/python2.7/dist-packages/chainer/links/connection/convolution_2d.py", line 108, in __call__
deterministic=self.deterministic)
File "/usr/local/lib/python2.7/dist-packages/chainer/functions/connection/convolution_2d.py", line 326, in convolution_2d
return func(x, W, b)
File "/usr/local/lib/python2.7/dist-packages/chainer/function.py", line 199, in __call__
outputs = self.forward(in_data)
File "/usr/local/lib/python2.7/dist-packages/chainer/function.py", line 310, in forward
return self.forward_gpu(inputs)
File "/usr/local/lib/python2.7/dist-packages/chainer/functions/connection/convolution_2d.py", line 90, in forward_gpu
y = cuda.cupy.empty((n, out_c, out_h, out_w), dtype=x.dtype)
File "/usr/local/lib/python2.7/dist-packages/cupy/creation/basic.py", line 19, in empty
return cupy.ndarray(shape, dtype=dtype, order=order)
File "cupy/core/core.pyx", line 88, in cupy.core.core.ndarray.__init__ (cupy/core/core.cpp:6333)
File "cupy/cuda/memory.pyx", line 280, in cupy.cuda.memory.alloc (cupy/cuda/memory.cpp:5988)
File "cupy/cuda/memory.pyx", line 431, in cupy.cuda.memory.MemoryPool.malloc (cupy/cuda/memory.cpp:9256)
File "cupy/cuda/memory.pyx", line 447, in cupy.cuda.memory.MemoryPool.malloc (cupy/cuda/memory.cpp:9162)
File "cupy/cuda/memory.pyx", line 342, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc (cupy/cuda/memory.cpp:7817)
File "cupy/cuda/memory.pyx", line 368, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc (cupy/cuda/memory.cpp:7592)
File "cupy/cuda/memory.pyx", line 260, in cupy.cuda.memory._malloc (cupy/cuda/memory.cpp:5930)
File "cupy/cuda/memory.pyx", line 261, in cupy.cuda.memory._malloc (cupy/cuda/memory.cpp:5851)
File "cupy/cuda/memory.pyx", line 35, in cupy.cuda.memory.Memory.__init__ (cupy/cuda/memory.cpp:1772)
File "cupy/cuda/runtime.pyx", line 207, in cupy.cuda.runtime.malloc (cupy/cuda/runtime.cpp:3429)
File "cupy/cuda/runtime.pyx", line 130, in cupy.cuda.runtime.check_status (cupy/cuda/runtime.cpp:2241)
cupy.cuda.runtime.CUDARuntimeError: cudaErrorMemoryAllocation: out of memory
-
import theano.sandbox.cuda.basic_ops as sbcuda
sbcuda.cuda_ndarray.cuda_ndarray.mem_info()
(500105216L, 11995578368L)
-
lspci -vvv |grep -i -A 20 nvidia
00:04.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
Physical Slot: 4
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 400000000 (64-bit, prefetchable) [size=16G]
Region 3: Memory at 800000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at c000 [size=128]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidia_375_drm, nvidia_375
What exactly do those numbers mean? Does Theano/Chainer only have access to ~500 MB of VRAM?

I managed to fix the issue by completely uninstalling Theano. I was confused about why importing Chainer displayed Theano warnings, but it did. Uninstalling Theano allowed the Chainer script to work.
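A hedged reading of the mem_info() numbers: in Theano's old CUDA backend this call appears to return (free_bytes, total_bytes) for the current device, so roughly 477 MiB of about 11.17 GiB was free. The log line "CNMeM is enabled with initial size: 95.0% of memory" suggests Theano reserved a 95% pool as soon as it was imported, which would leave only about 5% (~572 MiB, before CUDA context overhead) for CuPy. The sketch below only does that arithmetic on the copied byte counts (Python 3 division assumed); it does not call Theano.

```python
# Byte counts copied from the mem_info() output above; assumed to be (free, total).
FREE_BYTES = 500105216
TOTAL_BYTES = 11995578368
CNMEM_FRACTION = 0.95  # from the "initial size: 95.0% of memory" log line

MIB = 2 ** 20
GIB = 2 ** 30


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


def left_after_pool(total_bytes, fraction):
    """Bytes not claimed by a pool that reserves `fraction` of the device."""
    return total_bytes * (1.0 - fraction)


if __name__ == "__main__":
    print("free:  %.0f MiB" % to_mib(FREE_BYTES))
    print("total: %.2f GiB" % (TOTAL_BYTES / GIB))
    print("left after CNMeM pool: %.0f MiB"
          % to_mib(left_after_pool(TOTAL_BYTES, CNMEM_FRACTION)))
```

The observed free memory sitting somewhat below the ~572 MiB estimate is consistent with other allocations such as the CUDA context, though that part is a guess. As an untested alternative to uninstalling Theano, the old backend's `lib.cnmem` flag (for example `THEANO_FLAGS=lib.cnmem=0`) should disable that pool.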

Related

RuntimeError: The 'data' object was created by an older version of PyG

Thanks for your great contribution to science.
I have installed the following PyTorch and pytorch_geometric versions, as you mentioned in this link:
conda create -n tox-env python=3.6
conda install pytorch=1.6.0 torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install torch-scatter==2.0.6 torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
pip install torch-geometric==2.0.0
The reason is that I am trying to run the code from a GitHub repository; when it reaches this line, it raised an error with the latest version of PyTorch. So I downgraded PyG and PyTorch, but now I get the following error:
/home/es/anaconda3/envs/tox-env/bin/python /home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/main.py
/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/cuda/__init__.py:125: UserWarning:
NVIDIA GeForce RTX 3090 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3090 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
tox21
Iteration: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/main.py", line 131, in <module>
main("tox21", "model_gin/supervised_contextpred.pth", "gin", True, True, True, 0.1, 5)
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/main.py", line 105, in main
support_grads = model(epoch)
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/meta_model.py", line 183, in forward
for step, batch in enumerate(tqdm(support_loaders[task], desc="Iteration")):
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
return self._process_data(data)
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
data.reraise()
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 198, in __getitem__
data = self.get(self.indices()[idx])
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/loader.py", line 142, in get
for key in self.data.keys:
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/data.py", line 103, in keys
for store in self.stores:
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/data.py", line 393, in stores
return [self._store]
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/data.py", line 341, in __getattr__
"The 'data' object was created by an older version of PyG. "
RuntimeError: The 'data' object was created by an older version of PyG. If this error occurred while loading an already existing dataset, remove the 'processed/' directory in the dataset's root folder and try again.
Process finished with exit code 1
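The last line of the traceback names its own remedy: delete the `processed/` directory in the dataset's root folder so PyG rebuilds it with the installed version. A minimal sketch of doing that from Python; `clear_processed_cache` is an illustrative helper, not part of PyG, and the root path in the commented usage line is hypothetical.

```python
import shutil
from pathlib import Path


def clear_processed_cache(dataset_root):
    """Delete <dataset_root>/processed so PyG re-runs processing on the next load.

    Returns True if a directory was removed, False if there was none.
    """
    processed = Path(dataset_root) / "processed"
    if processed.is_dir():
        shutil.rmtree(processed)
        return True
    return False


# Hypothetical root: point it at the folder that holds raw/ and processed/.
# clear_processed_cache("dataset/tox21")
```

Note that the sm_86 warning earlier in the log is a separate problem: according to that warning, this PyTorch 1.6 build does not support the RTX 3090 Ti, so clearing the cache alone may not be enough to run on the GPU.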

AssertionError: Torch not compiled with CUDA enabled in PyTorch

I've created a new environment using Anaconda and tried to run my model, which is built on PyTorch, but I am having trouble with an AssertionError. Here is my CUDA version from the nvcc --version command:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:24:09_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
I installed PyTorch following https://pytorch.org/get-started/locally/ with the command below:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
Then I got:
>>> import torch
>>> print(torch.__version__)
1.11.0
>>>
It seems that I installed the right PyTorch version with the right CUDA version; however, I get the following error:
(torch) PS C:\Users\Administrator\Desktop> python .\torch_test.py
Traceback (most recent call last):
File ".\torch_test.py", line 136, in <module>
model = TFModel(24*7*2, 24*7, 512, 8, 4, 0.1).to(device)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 907, in to
return self._apply(convert)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
module._apply(fn)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
module._apply(fn)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
module._apply(fn)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 601, in _apply
param_applied = fn(param)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 905, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\cuda\__init__.py", line 210, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
How can I resolve the AssertionError?
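One way to check which build actually landed in the environment is to look at the version string and `torch.version.cuda`. Wheels from the `cu113` index normally carry a local tag such as `+cu113`, so the bare `1.11.0` printed above suggests (but does not prove) that a different wheel, likely the CPU-only Windows wheel from PyPI, is the one being imported. `cuda_build_tag` is a hypothetical helper; `torch.version.cuda` (None on CPU-only builds) and `torch.cuda.is_available()` are real PyTorch attributes.

```python
def cuda_build_tag(version):
    """Return the local build tag of a version string ('cu113', 'cpu'), or None."""
    _, sep, local = version.partition("+")
    return local if sep else None


def report():
    """Print PyTorch build info if it is importable in this interpreter."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this interpreter")
        return
    print("version:", torch.__version__, "tag:", cuda_build_tag(str(torch.__version__)))
    # torch.version.cuda is None for CPU-only builds.
    print("built with CUDA:", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())


if __name__ == "__main__":
    report()
```

If `built with CUDA` prints `None`, reinstalling with the cu113 command inside the activated environment is the usual next step; invoking it as `python -m pip install ...` rather than `pip3` makes sure pip targets the same interpreter that runs the script.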

RuntimeError: cuda runtime error (100). The GPU is enabled but still giving an error

I am new to Google Colab and PyTorch. I am running a PyTorch model, but it gives me a CUDA runtime error in Google Colab. The GPU is enabled in Colab, yet the error persists; the GPU description is shown in the image below. Can anyone please help me out?
[Image: torch GPU]
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=47 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
File "run.py", line 338, in <module>
main()
File "run.py", line 303, in main
model = model.cuda()
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 458, in cuda
return self._apply(lambda t: t.cuda(device))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 376, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 458, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 190, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:47
Read prediction from logs/logs_sparc_editsql/valid_use_predicted_queries_predictions.json

Never mind: I was setting CUDA_VISIBLE_DEVICES to 5.
It should list the indices of GPUs that actually exist (on a single-GPU Colab runtime, that is 0), not an index the machine doesn't have.
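`CUDA_VISIBLE_DEVICES` takes a comma-separated list of device indices to expose, and it has to be set before the CUDA runtime initializes. A minimal sketch that builds the value and rejects indices that cannot exist; `visible_devices_value` is a hypothetical helper, and `device_count` is an assumed input (for example, the number of GPUs `nvidia-smi` lists, typically 1 on a Colab GPU runtime).

```python
import os


def visible_devices_value(indices, device_count):
    """Return a CUDA_VISIBLE_DEVICES string such as '0' or '0,2'.

    Raises ValueError for indices outside 0..device_count-1, since a value of
    just '5' on a one-GPU machine leaves no GPU visible at all.
    """
    bad = [i for i in indices if not 0 <= i < device_count]
    if bad:
        raise ValueError("no such GPU index: %s" % bad)
    return ",".join(str(i) for i in indices)


# Set this before importing torch (or at least before the first CUDA call).
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices_value([0], device_count=1)
```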

TypeError: names_to_saveables must be a dict mapping string names to Tensors/Variables

Same issue as these two questions (1/2). This happens when I'm retrieving a TF model to display using Lucid, following this tutorial. My full stack trace is:
/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
2018-05-08 01:22:34.477907: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-08 01:22:34.478221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 362.12MiB
2018-05-08 01:22:34.478266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-05-08 01:22:34.871564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-08 01:22:34.871635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-05-08 01:22:34.871680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-05-08 01:22:34.871877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 101 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py", line 380, in <module>
app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py", line 274, in main
FLAGS.saved_model_tags, checkpoint_version)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py", line 256, in freeze_graph
checkpoint_version=checkpoint_version)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py", line 130, in freeze_graph_with_def_protos
var_list=var_list, write_version=checkpoint_version)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1311, in __init__
self.build()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1320, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1357, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
saveables = self._ValidateAndSliceInputs(names_to_saveables)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 688, in _ValidateAndSliceInputs
variable)
TypeError: names_to_saveables must be a dict mapping string names to Tensors/Variables. Not a variable: Tensor("conv2d_1/bias:0", shape=(16,), dtype=float32)
My network graph is defined as follows:
input_1
conv2d_1/kernel
conv2d_1/kernel/read
conv2d_1/bias
conv2d_1/bias/read
conv2d_1/convolution
conv2d_1/BiasAdd
conv2d_1/Relu
max_pooling2d_1/MaxPool
conv2d_2/kernel
conv2d_2/kernel/read
conv2d_2/bias
conv2d_2/bias/read
conv2d_2/convolution
conv2d_2/BiasAdd
conv2d_2/Relu
max_pooling2d_2/MaxPool
conv2d_3/kernel
conv2d_3/kernel/read
conv2d_3/bias
conv2d_3/bias/read
conv2d_3/convolution
conv2d_3/BiasAdd
conv2d_3/Relu
max_pooling2d_3/MaxPool
conv2d_4/kernel
conv2d_4/kernel/read
conv2d_4/bias
conv2d_4/bias/read
conv2d_4/convolution
conv2d_4/BiasAdd
conv2d_4/Relu
up_sampling2d_1/Shape
up_sampling2d_1/strided_slice/stack
up_sampling2d_1/strided_slice/stack_1
up_sampling2d_1/strided_slice/stack_2
up_sampling2d_1/strided_slice
up_sampling2d_1/Const
up_sampling2d_1/mul
up_sampling2d_1/ResizeNearestNeighbor
conv2d_5/kernel
conv2d_5/kernel/read
conv2d_5/bias
conv2d_5/bias/read
conv2d_5/convolution
conv2d_5/BiasAdd
conv2d_5/Relu
up_sampling2d_2/Shape
up_sampling2d_2/strided_slice/stack
up_sampling2d_2/strided_slice/stack_1
up_sampling2d_2/strided_slice/stack_2
up_sampling2d_2/strided_slice
up_sampling2d_2/Const
up_sampling2d_2/mul
up_sampling2d_2/ResizeNearestNeighbor
conv2d_6/kernel
conv2d_6/kernel/read
conv2d_6/bias
conv2d_6/bias/read
conv2d_6/convolution
conv2d_6/BiasAdd
conv2d_6/Relu
up_sampling2d_3/Shape
up_sampling2d_3/strided_slice/stack
up_sampling2d_3/strided_slice/stack_1
up_sampling2d_3/strided_slice/stack_2
up_sampling2d_3/strided_slice
up_sampling2d_3/Const
up_sampling2d_3/mul
up_sampling2d_3/ResizeNearestNeighbor
conv2d_7/kernel
conv2d_7/kernel/read
conv2d_7/bias
conv2d_7/bias/read
conv2d_7/convolution
conv2d_7/BiasAdd
conv2d_7/Relu
output_node0

Unable to train tensorflow object detection on custom dataset

I am trying to train the ssd_mobilenet_v1_coco_2017_11_17 model on my own object (BournVita, if you want to Google it). The images are of varying sizes and have been labelled using labelImg.
I am following the Sentdex tutorial that can be found here, but on Windows 10.
I have an NVIDIA GTX 1050 Ti GPU with TensorFlow 1.4.0, Python 3.5.3 and OpenCV 3.
Situation:
On executing
python path\to\train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config
I get the following error:
(.dl-env) D:\Work\Gemini\bournvita>python D:\Work\.dl-env\Lib\site-packages\tensorflow\models\research\object_detection\train.py --logtostderr --train_dir=training\ --pipeline_config_path=training\ssd_mobilenet_v1_pets.config
WARNING:tensorflow:From D:\Work\.dl-env\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py:210: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
2018-01-24 15:04:57.969803: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-01-24 15:04:59.246153: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.31GiB
2018-01-24 15:04:59.246358: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from training\model.ckpt-5
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path training\model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Error reported to Coordinator: , image_size must contain 3 elements[4]
[[Node: cond_1/RandomCropImage/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2 = SampleDistortedBoundingBoxV2[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.5, 2], max_attempts=100, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](cond_1/RandomCropImage/Shape, cond_1/RandomCropImage/ExpandDims, cond_1/RandomCropImage/PruneNonOverlappingBoxes/Const)]]
INFO:tensorflow:Caught OutOfRangeError. Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
INFO:tensorflow:Recording summary at step 5.
Traceback (most recent call last):
File "D:\Work\.dl-env\Lib\site-packages\tensorflow\models\research\object_detection\train.py", line 163, in <module>
tf.app.run()
File "D:\Work\.dl-env\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "D:\Work\.dl-env\Lib\site-packages\tensorflow\models\research\object_detection\train.py", line 159, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "D:\Work\.dl-env\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 332, in train
saver=saver)
File "D:\Work\.dl-env\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 775, in train
sv.stop(threads, close_summary_writer=True)
File "D:\Work\.dl-env\lib\site-packages\tensorflow\python\training\supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "D:\Work\.dl-env\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "D:\Work\.dl-env\lib\site-packages\six.py", line 693, in reraise
raise value
File "D:\Work\.dl-env\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
enqueue_callable()
File "D:\Work\.dl-env\lib\site-packages\tensorflow\python\client\session.py", line 1231, in _single_operation_run
target_list_as_strings, status, None)
File "D:\Work\.dl-env\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: image_size must contain 3 elements[4]
[[Node: cond_1/RandomCropImage/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2 = SampleDistortedBoundingBoxV2[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.5, 2], max_attempts=100, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](cond_1/RandomCropImage/Shape, cond_1/RandomCropImage/ExpandDims, cond_1/RandomCropImage/PruneNonOverlappingBoxes/Const)]]
The last line could be of importance.
I have tried changing all my images to jpg, but to no avail. I have also consulted this page, but it didn't help much. Thanks for your help.
