I tried to use conv3d2d to build a 3D CNN, but I get the error below:
Traceback (most recent call last):
File "/home/shome/workspace_temp/conv3d_test/conv3d_test.py", line 1, in <module>
from convnet3d import ConvLayer
File "/home/shome/softwares/theano/Theano-3D-ConvNet-master/convnet3d/convnet3d.py", line 12, in <module>
from conv3d2d import conv3d
File "/home/shome/softwares/theano/Theano-3D-ConvNet-master/convnet3d/conv3d2d.py", line 298, in <module>
make_gpu_optimizer(DiagonalSubtensor, [0])
File "/home/shome/softwares/theano/Theano-3D-ConvNet-master/convnet3d/conv3d2d.py", line 266, in make_gpu_optimizer
@theano.gof.local_optimizer([])
File "/usr/local/lib/python2.7/dist-packages/theano/gof/opt.py", line 948, in decorator
raise ValueError, ("Use None instead of an empty list to apply to all nodes.", f.__module__, f.__name__)
ValueError: ('Use None instead of an empty list to apply to all nodes.', 'conv3d2d', 'local_to_gpu')
My CNN construction is as below:
layer_0_input=x.reshape(batch_size,1,28,28,28)
layer0=ConvLayer(layer_0_input, 1, nkerns[0], (5,5,5), (28,28,28), 100, T.tanh )
layer1=PoolLayer(layer0.output, (2,2,2))
layer2=ConvLayer(layer1.output, nkerns[0], nkerns[1], (5,5,5), (12,12,12), 100, T.tanh)
layer3=PoolLayer(layer2.output, (2,2,2))
layer4_input=layer3.output.flatten(2)
layer4=HiddenLayer(layer4_input, nkerns[1]*4*4*4, 500, T.tanh)
layer5=LogRegr(layer4.output, 500, 10, rng1)
I think the error is in instantiating the ConvLayer. Can anyone help?
Your conv3d2d.py file seems to have been copied from an earlier version of Theano, and the syntax for @theano.gof.local_optimizer has changed since.
If you look at the updated version in the latest master, the call to the decorator has been changed from @theano.gof.local_optimizer([]) to @theano.gof.local_optimizer([op, cuda.gpu_from_host]).
Applying that change alone may not be enough, though, so you may be better off importing conv3d2d from Theano rather than your repository, or updating the whole file.
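Note that the traceback starts at line 1 of conv3d_test.py (the import), so the failure happens while conv3d2d.py is being imported, before any ConvLayer is instantiated. The sketch below is a pure-Python stand-in for that mechanism, not Theano's actual code: local_optimizer here is a simplified imitation modeled on the message in the traceback, and DiagonalSubtensor is an empty placeholder class.

```python
# Stand-in for the check reported from theano/gof/opt.py in the traceback.
# This is NOT Theano's implementation; every name below is illustrative.

def local_optimizer(tracks):
    """Decorator factory that rejects an empty `tracks` list."""
    def decorator(f):
        if tracks is not None and len(tracks) == 0:
            raise ValueError(
                "Use None instead of an empty list to apply to all nodes.",
                f.__module__, f.__name__)
        f.tracks = tracks
        return f
    return decorator


def make_gpu_optimizer(op, tracks):
    # The decorator is evaluated as soon as this function is called.
    # In conv3d2d.py that call sits at module level, i.e. at import time.
    @local_optimizer(tracks)
    def local_to_gpu(node):
        return node
    return local_to_gpu


class DiagonalSubtensor:
    """Placeholder for the real Op class."""


# Old style: an empty list is rejected immediately.
try:
    make_gpu_optimizer(DiagonalSubtensor, [])
    old_style_error = None
except ValueError as exc:
    old_style_error = exc.args[0]

# New style: naming the ops to track passes the check.
new_style = make_gpu_optimizer(DiagonalSubtensor, [DiagonalSubtensor])

print(old_style_error)
print(new_style.tracks)
```

In the real file the list would be [op, cuda.gpu_from_host] as described above; the stand-in only passes a single placeholder class to keep the example self-contained.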
I am trying to make data augmentations (self.transform, self.transform_prime in the code below) to be done on the GPU to reduce computation time. However, I am running into memory errors.
Below is a snippet of the dataset class code.
Below, sub_img is a NumPy array held in RAM. As the code shows, I tried to convert it to a GPU torch tensor, apply the transformations on the GPU, and return the result.
However, when I built a data loader from this dataset and ran it, I got a CUDA out-of-memory error even with a batch size of 2. That is strange, because the version of the code that does the augmentations on the CPU worked with a batch size of 35. (Also, nvidia-smi shows that many processes, each taking up VRAM, are created before the out-of-memory error occurs.)
def __getitem__(self, idx):
    sub_data = self.dataset[idx]
    sub_img, sub_label = self.dataset[idx]  # get the image of the subject at this idx
    if self.split == 'train':
        """below : major revision, so check again (maybe no need to copy?)"""
        y1 = self.transform(from_numpy(sub_img).float().to("cuda:0"))
        y2 = self.transform_prime(from_numpy(sub_img).float().to("cuda:0"))
        return (y1, y2), sub_label
Could anyone explain to me how I can fix this? My questions are:
Why does CUDA run out of memory inside __getitem__? If my understanding is correct, __getitem__ is called when the dataloader generates batches, so whatever it allocates should be freed once that batch is no longer used. Shouldn't the GPU memory used when I move sub_img to cuda:0 therefore be released after each batch, rather than piling up? Why is there a GPU memory error?
How can I fix this? Should I make it so that the self.transform itself gets NumPy arrays, but within self.transform function it converts the arrays to tensors to perform operations in the GPU then return the tensor back to the CPU? Wouldn’t this be inefficient since the tensor has to move back and forth between the CPU and GPU?
I am sorry for my novice questions… thank you for any help and suggestions :)
I have attached the error log below :
Traceback (most recent call last):
File "main_3D.py", line 371, in <module>
main()
File "main_3D.py", line 87, in main
torch.multiprocessing.spawn(main_worker, (args,), args.ngpus_per_node)
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/scratch/connectome/dyhan316/VAE_ADHD/barlowtwins/main_3D.py", line 151, in main_worker
for step, ((y1, y2), _) in enumerate(loader, start=epoch * len(loader)):
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/scratch/connectome/dyhan316/VAE_ADHD/barlowtwins/dataset.py", line 81, in __getitem__
y1 = self.transform(from_numpy(sub_img).float().to("cuda:0"))
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
I'm new to python, that is the first problem.
Secondly I am trying to automate the task of adding vector point layers from spreadsheets (xlsx-files) with Python.
The task can be done manually with the plugin "add spreadsheet layer".
I have a folder with roughly 20 xlsx-files that need to be added into the QGIS-project as vector point layers.
My computer runs Windows 7, and I am using the Python console that ships with QGIS 3.4. The plugin I want to control through Python is called "add spreadsheet layer".
I tried the following snippet to check whether the core task of adding a spreadsheet layer works at all:
from qgis.core import *
import processing
processing.run("qgis:createpointslayerfromtable",
{'INPUT':r'C:\Users\Desktop\PlayItAll\Test.xlsx',
'XFIELD':'X_Pos',
'YFIELD':'Y_Pos',
'ZFIELD':None,
'MFIELD':None,
'TARGET_CRS':QgsCoordinateReferenceSystem('EPSG:4326'),
'OUTPUT':r'memory'})
It produces this error:
File "C:/PROGRA~1/QGIS3~1.4/apps/qgis/./python/plugins\processing\core\Processing.py", line 183, in runAlgorithm
raise QgsProcessingException(msg)
I have contacted the programmer of the plugin and he gave me this code to try:
import processing
processing.runAndLoadResults("qgis:createpointslayerfromtable",
{
'INPUT':r'C:\Users\username\Desktop\Delete\test.xlsx',
'XFIELD':'Longitude',
'YFIELD':'Latitude',
'ZFIELD':None,
'MFIELD':None,
'TARGET_CRS':QgsCoordinateReferenceSystem('EPSG:4326'),
'OUTPUT':'memory'
})
For him it worked, for me it didn't.
I got this on the processing tab:
2019-07-03T13:19:43 CRITICAL Traceback (most recent call last):
File "C:/PROGRA~1/QGIS3~1.4/apps/qgis/./python/plugins\processing\algs\qgis\PointsLayerFromTable.py", line 112, in processAlgorithm
fields, wkb_type, target_crs)
Exception: unknown
2019-07-03T13:19:43 CRITICAL Traceback (most recent call last):
File "C:/PROGRA~1/QGIS3~1.4/apps/qgis/./python/plugins\processing\algs\qgis\PointsLayerFromTable.py", line 112, in processAlgorithm
fields, wkb_type, target_crs)
Exception: unknown
2019-07-03T13:19:43 CRITICAL There were errors executing the algorithm.
The "python warnings" tab showed this:
2019-07-03T13:19:43 WARNING warning:__console__:1: ResourceWarning:
unclosed file
traceback: File "C:/PROGRA~1/QGIS3~1.4/apps/qgis/./python\console\console.py", line 575, in runScriptEditor
self.tabEditorWidget.currentWidget().newEditor.runScriptCode()
File "C:/PROGRA~1/QGIS3~1.4/apps/qgis/./python\console\console_editor.py", line 629, in runScriptCode
.format(filename.replace("\\", "/"), sys.getfilesystemencoding()))
File "C:/PROGRA~1/QGIS3~1.4/apps/qgis/./python\console\console_sci.py", line 635, in runCommand
more = self.runsource(src)
File "C:/PROGRA~1/QGIS3~1.4/apps/qgis/./python\console\console_sci.py", line 665, in runsource
return super(ShellScintilla, self).runsource(source, filename, symbol)
File "C:\PROGRA~1\QGIS3~1.4\apps\Python37\lib\code.py", line 74, in runsource
self.runcode(code)
File "C:\PROGRA~1\QGIS3~1.4\apps\Python37\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "", line 1, in
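Once a single call works, the "roughly 20 xlsx-files" part is just a loop over the folder. Here is a minimal sketch of that loop, kept independent of QGIS so it can be tried anywhere: build_point_layer_params and add_all are hypothetical helper names, the field names and CRS are copied from the first snippet above, and the run argument stands for whatever function executes the algorithm. It does not address the "Exception: unknown" error itself.

```python
import tempfile
from pathlib import Path


def build_point_layer_params(folder, x_field, y_field, crs):
    """Return one parameter dict per .xlsx file in `folder`, sorted by name."""
    params = []
    for xlsx in sorted(Path(folder).glob("*.xlsx")):
        params.append({
            "INPUT": str(xlsx),
            "XFIELD": x_field,
            "YFIELD": y_field,
            "ZFIELD": None,
            "MFIELD": None,
            "TARGET_CRS": crs,
            "OUTPUT": "memory",
        })
    return params


def add_all(folder, run, x_field="X_Pos", y_field="Y_Pos", crs="EPSG:4326"):
    """Call run(algorithm_id, params) once per spreadsheet; return the file names."""
    done = []
    for params in build_point_layer_params(folder, x_field, y_field, crs):
        run("qgis:createpointslayerfromtable", params)
        done.append(Path(params["INPUT"]).name)
    return done


# Demo with a throwaway folder and a fake `run` that only records its calls.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.xlsx", "a.xlsx", "notes.txt"):
        (Path(tmp) / name).touch()
    loaded = add_all(tmp, lambda alg, params: calls.append((alg, params)))

print(loaded)
```

Inside the QGIS console, the idea would be to pass processing.runAndLoadResults as run and QgsCoordinateReferenceSystem('EPSG:4326') as crs, with folder pointing at the directory that holds the spreadsheets.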
I installed Anaconda and all the packages I needed. I ran the MNIST example from Keras and it works fine. Now I'm trying to visualize the model graph following the Keras manual, but it does not work.
When I run the MNIST example with the plot_model call, I get this error message:
Traceback (most recent call last):
File "dummy.py", line 77, in <module>
plot_model(model, to_file='model.png')
File "C:\Users\***\AppData\Local\Continuum\anaconda3\lib\site-packages\keras\utils\vis_utils.py", line 132, in plot_model
dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
File "C:\Users\***\AppData\Local\Continuum\anaconda3\lib\site-packages\keras\utils\vis_utils.py", line 55, in model_to_dot
_check_pydot()
File "C:\Users\***\AppData\Local\Continuum\anaconda3\lib\site-packages\keras\utils\vis_utils.py", line 26, in _check_pydot
pydot.Dot.create(pydot.Dot())
File "C:\Users\***\AppData\Local\Continuum\anaconda3\lib\site-packages\pydot.py", line 1885, in create
assert p.returncode == 0, p.returncode
AssertionError: 1
I also tried convnet_drawer. It failed too.
Any ideas?
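The last frame of the traceback asserts that a process started by pydot returned exit code 0, and it returned 1 instead. pydot renders through the Graphviz dot executable, so one thing worth checking (an assumption, not a confirmed diagnosis) is whether dot is on the PATH and runs at all. A minimal sketch using only the standard library; graphviz_dot_status is a hypothetical helper name:

```python
import shutil
import subprocess


def graphviz_dot_status():
    """Return (path, ok): where `dot` was found and whether `dot -V` ran cleanly."""
    path = shutil.which("dot")
    if path is None:
        return None, False
    try:
        result = subprocess.run(
            [path, "-V"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return path, False
    return path, result.returncode == 0


dot_path, dot_ok = graphviz_dot_status()
if dot_path is None:
    print("Graphviz 'dot' is not on PATH")
elif not dot_ok:
    print("Found 'dot' at", dot_path, "but it did not run cleanly")
else:
    print("Graphviz 'dot' looks usable:", dot_path)
```

If dot is missing or fails here, plot_model would be expected to fail in the same way regardless of the Keras code.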
I am using an evolutionary algorithm to find satisfactory hyper-parameters for a CNN written in Keras/Theano. The stochastic nature of this approach means that from time to time a pathological configuration will be tried, which will yield an exception. In those scenarios, I'd like to catch the exception so I can assign an appropriate low fitness. Unfortunately, when Theano throws an exception, it appears to be masked before it reaches my try/catch block. That is, at some point the exception is caught and not re-raised, which means it never propagates up the stack to reach my try/catch block.
I've asked on the Keras Slack workspace if there was some configuration I had to tickle in Keras to un-mask these exceptions, but I was told that the problem was not at the Keras level, that it was something with Theano. And, so here I am.
I have the following configuration settings at the top of the corresponding theanorc file that I had hoped would solve the problem:
[config]
on_opt_error = raise
on_shape_error = raise
numpy.seterr_all = raise
compute_test_value = raise
And, these are the exceptions I am seeing:
ERROR (theano.gof.opt): SeqOptimizer apply <theano.tensor.opt.ShapeOptimizer object at 0x2aaae03674a8>
ERROR (theano.gof.opt): Traceback:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/gof/opt.py", line 235, in apply
sub_prof = optimizer.optimize(fgraph)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/gof/opt.py", line 83, in optimize
self.add_requirements(fgraph)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1482, in add_requirements
fgraph.attach_feature(ShapeFeature())
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/gof/fg.py", line 541, in attach_feature
attach(self)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1299, in on_attach
self.on_import(fgraph, node, reason='on_attach')
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1362, in on_import
self.set_shape(r, s)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1151, in set_shape
shape_vars.append(self.unpack(s[i], r))
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1073, in unpack
raise ValueError(msg)
ValueError: There is a negative shape in the graph!
Backtrace when that variable is created:
File "/ccs/proj/geo121/mcoletti/dl-4-settlement-mapping/eadl/train_cnn.py", line 218, in <module>
validation_accuracy = train_cnn(data_dir=args.data_dir, kernel_sizes=args.kernel_sizes, max_epoch=args.epoch, batch_sizes=args.batch_size)
File "/ccs/proj/geo121/mcoletti/dl-4-settlement-mapping/eadl/train_cnn.py", line 193, in train_cnn
model = create_cnn(kernel_sizes=kernel_sizes)
File "/ccs/proj/geo121/mcoletti/dl-4-settlement-mapping/eadl/train_cnn.py", line 52, in create_cnn
model.add(Conv2D(256, kernel_size=kernel_sizes[3], activation="relu", kernel_initializer="normal"))
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/models.py", line 475, in add
output_tensor = layer(self.outputs[0])
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/engine/topology.py", line 602, in __call__
output = self.call(inputs, **kwargs)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/layers/convolutional.py", line 164, in call
dilation_rate=self.dilation_rate)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/backend/theano_backend.py", line 1890, in conv2d
filter_dilation=dilation_rate)
And, if you're curious to see the try/catch block, it's just this:
try:
    validation_accuracy = train_cnn(data_dir=args.data_dir, kernel_sizes=args.kernel_sizes, max_epoch=args.epoch, batch_sizes=args.batch_size)
except Exception as e:
    print(socket.gethostname(), ', Caught exception while training:', str(e) )
My intuition is that this is probably something very, very simple. Maybe I need to add more options to the THEANORC file?
Setting theano.config.compute_test_value = 'raise' appears to work.
Curiously, compute_test_value should have been set from the Theano configuration file, which suggests that it's not being properly read and parsed. I should not have to set this value programmatically when I explicitly set it in the configuration file.
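If the configuration file cannot be relied on, another option is to stop depending on exception propagation altogether. The output above shows the failure arriving as ERROR records on the theano.gof.opt logger, so those records can be collected with the standard logging module and turned into an exception afterwards. This is a sketch under that assumption: ErrorCollector and run_and_fail_on_logged_errors are hypothetical names, and fake_training merely imitates the logged error so the example runs without Theano.

```python
import logging


class ErrorCollector(logging.Handler):
    """Remember every ERROR-or-worse record sent to the logger it is attached to."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def run_and_fail_on_logged_errors(func, logger_name="theano.gof.opt"):
    """Call func(); raise RuntimeError if the named logger reported errors meanwhile."""
    logger = logging.getLogger(logger_name)
    collector = ErrorCollector()
    logger.addHandler(collector)
    try:
        result = func()
    finally:
        # Always detach the handler, even if func() itself raised.
        logger.removeHandler(collector)
    if collector.records:
        raise RuntimeError(collector.records[0].getMessage())
    return result


def fake_training():
    # Imitates a library that logs an error and then carries on regardless.
    logging.getLogger("theano.gof.opt").error(
        "There is a negative shape in the graph!")
    return 0.9


try:
    fitness = run_and_fail_on_logged_errors(fake_training)
except RuntimeError as e:
    print("Caught logged error:", e)
    fitness = 0.0  # pathological configuration gets the lowest fitness

print(fitness)
```

In the real script, func would wrap the train_cnn(...) call, so a configuration that only produces logged optimizer errors still ends up with a low fitness.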
I have seen problems similar to mine here on Stack Overflow, but not exactly the same. I can reshape when using fully-connected NN layers, but not with Conv1D layers. Here's a minimal example. I'm using TF 1.4.0 on Python 3.6.3.
import tensorflow as tf
# fully connected
fc = tf.placeholder(tf.float32, [None,12])
fc = tf.contrib.layers.fully_connected(fc, 12)
fc = tf.contrib.layers.fully_connected(fc, 6)
fc = tf.reshape(fc, [-1,3,2])
# convolutional
con = tf.placeholder(tf.float32, [None,50,4])
con = tf.layers.Conv1D(con, 12, 3, activation=tf.nn.relu)
con = tf.layers.Conv1D(con, 6, 3, activation=tf.nn.relu)
con = tf.reshape(con, [-1,50,3,2])
Here is the output (yes, I'm aware of the RuntimeWarning. The messages I have found which discuss it suggest that it's harmless, but if you know otherwise, please share!):
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py", line 468, in make_tensor_proto
str_values = [compat.as_bytes(x) for x in proto_values]
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py", line 468, in <listcomp>
str_values = [compat.as_bytes(x) for x in proto_values]
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got <tensorflow.python.layers.convolutional.Conv1D object at 0x7fa67e0d1a20>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "minimal reshape example.py", line 16, in <module>
con = tf.reshape(con, [-1,width,3,2])
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3938, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 513, in _apply_op_helper
raise err
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
preferred_dtype=default_dtype)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py", line 472, in make_tensor_proto
"supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'tensorflow.python.layers.convolutional.Conv1D'> to Tensor. Contents: <tensorflow.python.layers.convolutional.Conv1D object at 0x7fa67e0d1a20>. Consider casting elements to a supported type.
My code fails at con = tf.reshape(con, [-1,50,3,2]). Yet the pattern is nearly identical to the pattern that I use for the fully-connected graph, fc.
I made nets very similar to these work in the higher-level API for TensorFlow called TFLearn. However, TFLearn's DNN Estimator object is having trouble managing a tf.Session correctly. After over a month, I have yet to resolve the issue with TFLearn's developers on GitHub.
I don't mind using TensorFlow's native Estimator, but I have to solve this reshape problem to achieve it.
Well, I found the error: tf.layers.Conv1D != tf.layers.conv1d. Changing the former to the latter eliminated the error. Let the TensorFlow / Python user beware!
Even though TensorFlow seems to avoid Python's object model (which is probably necessary, given the possibility of distributed, low-level computation), there are in fact a few genuine classes in the Python API. The class constructors can accept many (all?) of the same arguments as the similarly-named convenience functions.
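To make the difference concrete without needing TensorFlow installed, here is a toy imitation of the two call styles. ToyConv1D and toy_conv1d are made-up stand-ins, not TensorFlow code; they only copy the argument order (the class takes filters first, the function takes inputs first). That ordering is why passing the tensor to the class constructor raises no error at that line: the tensor silently lands in the filters slot, and the result is a layer object rather than a tensor.

```python
class ToyConv1D:
    """Stand-in for a layer class: the constructor takes settings, not data."""

    def __init__(self, filters, kernel_size, strides=1):
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides

    def __call__(self, inputs):
        # 'valid' sliding-window sum over a list of numbers, standing in
        # for a real convolution so the example produces visible output.
        k = self.kernel_size
        return [sum(inputs[i:i + k])
                for i in range(0, len(inputs) - k + 1, self.strides)]


def toy_conv1d(inputs, filters, kernel_size, strides=1):
    """Stand-in for the convenience function: build the layer, then apply it."""
    return ToyConv1D(filters, kernel_size, strides)(inputs)


data = [1, 2, 3, 4, 5]

# Mistake from the question: data goes where `filters` is expected,
# 12 becomes the kernel size and 3 becomes the stride. No error yet.
layer = ToyConv1D(data, 12, 3)
print(type(layer).__name__, layer.kernel_size, layer.strides)

# Function form: inputs first, result is data-like.
via_function = toy_conv1d(data, 12, 3)

# Class form used correctly: configure first, then call on the inputs.
via_class = ToyConv1D(12, 3)(data)

print(via_function)  # [6, 9, 12]
```

By analogy, the class-based form in TensorFlow would presumably be tf.layers.Conv1D(12, 3, activation=tf.nn.relu)(con), with tf.layers.conv1d(con, 12, 3, activation=tf.nn.relu) as the function form that fixed the error here.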