I am training a pix2pix GAN, but twice now my program has randomly broken after hours of running, with the same problem each time.
The runtime error I am getting is:
error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/gdrive/My Drive/All_Deep_Learning/PythonCustomLibraries/pix2pixdatasetlib.py", line 40, in __getitem__
  File "/content/gdrive/My Drive/All_Deep_Learning/PythonCustomLibraries/pix2pixdatasetlib.py", line 185, in process_images
  File "/content/gdrive/My Drive/All_Deep_Learning/PythonCustomLibraries/pix2pixdatasetlib.py", line 112, in read_a_image
cv2.error: OpenCV(4.6.0) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
This last time it ran for 209 epochs before breaking at 64% of the 210th epoch. So the confusing part is why it would break after running for so long; the only thing I can think of is that it must be some kind of memory problem, but I don't know.
Here is my code:
class Pix2PixDataset(Dataset):
    def __init__(self, data_points, transforms=None):
        self.data_points = data_points
        self.transforms = transforms
        self.resize = T.Resize((512, 512))

    def __getitem__(self, index):
        image, y_label = process_images(self.data_points[index].reference_image, self.data_points[index].drawing)
        image = self.resize(image)
        y_label = self.resize(y_label)
        if self.transforms:
            image = self.transforms(image)
            y_label = self.transforms(y_label)
        return (image, y_label)

    def __len__(self):
        return len(self.data_points)
Here is my read_a_image function:
def read_a_image(a_image_path):
    image = cv2.imread(a_image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return image
And here is my process_images function:
def process_images(reference_image_path, drawing_path):
    reference_image = read_a_image(reference_image_path)
    drawing = read_a_image(drawing_path)
    reference_image = Image.fromarray(reference_image)
    drawing = Image.fromarray(drawing)
    if apply_random_transformation:
        reference_image, drawing = random_transformation(reference_image, drawing)
    return reference_image, drawing
And here is my data point class:
class DataPoint:
    def __init__(self, reference_image, drawing):
        self.reference_image = reference_image
        self.drawing = drawing
The error message indicates that the input image _src passed to cv2.cvtColor in the read_a_image function is empty. This can happen if the image path given to the function is incorrect, or if the file is missing or corrupted. But again, it ran for over 12 hours both times, so why would the image path suddenly be incorrect, or the file suddenly be missing or corrupted?
I am using Google Colab, and my dataset is in my Google Drive, which I mounted in my notebook.
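One thing worth knowing here: cv2.imread does not raise when a read fails; it silently returns None, and transient Google Drive I/O hiccups in Colab are a known way for an otherwise valid path to fail hours into a run. A minimal defensive sketch of read_a_image (the retry count and delay are my own assumptions, not from the original code):

import time
import cv2

def read_a_image(a_image_path, retries=3, delay=1.0):
    # cv2.imread returns None instead of raising on failure,
    # so retry a few times before giving up with a useful message
    for attempt in range(retries):
        image = cv2.imread(a_image_path)
        if image is not None:
            return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        time.sleep(delay)  # give a flaky Drive mount a moment to recover
    raise FileNotFoundError("cv2.imread returned None for: " + a_image_path)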
I am trying to create a custom IterableDataset in PyTorch and split it into train, validation, and test datasets using this answer: https://stackoverflow.com/a/61818182/9478434.
My dataset class is:
class EchoDataset(torch.utils.data.IterableDataset):
    def __init__(self, delay=4, seq_length=15, size=1000):
        super(EchoDataset).__init__()
        self.delay = delay
        self.seq_length = seq_length
        self.size = size

    def __len__(self):
        return self.size

    def __iter__(self):
        """ An iterable dataset doesn't have to implement __getitem__.
        Instead, we only need to implement __iter__ to return
        an iterator (or generator).
        """
        for _ in range(self.size):
            seq = torch.tensor([random.choice(range(1, N + 1)) for i in range(self.seq_length)], dtype=torch.int64)
            result = torch.cat((torch.zeros(self.delay), seq[:self.seq_length - self.delay])).type(torch.int64)
            yield seq, result
And the dataset is created and split as:
DELAY = 4
DATASET_SIZE = 200000
ds = EchoDataset(delay=DELAY, size=DATASET_SIZE)
train_count = int(0.7 * DATASET_SIZE)
valid_count = int(0.2 * DATASET_SIZE)
test_count = DATASET_SIZE - train_count - valid_count
train_dataset, valid_dataset, test_dataset = torch.utils.data.random_split(
    ds, (train_count, valid_count, test_count)
)
The problem is that when I try to iterate over the dataloader, I get a NotImplementedError:
iterator = iter(train_dataset_loader)
print(next(iterator))
I get:
NotImplementedError: Caught NotImplementedError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "venv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "venv/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "venv/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "venv/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 363, in __getitem__
    return self.dataset[self.indices[idx]]
  File "venv/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 69, in __getitem__
    raise NotImplementedError
NotImplementedError
It seems the problem goes back to splitting the dataset and creating Subset objects, since I can iterate through a DataLoader created from the original (unsplit) dataset.
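The traceback above shows why: random_split wraps the dataset in Subset objects, and Subset.__getitem__ calls self.dataset[self.indices[idx]], but IterableDataset deliberately leaves __getitem__ unimplemented. A minimal sketch of the usual workaround, rewriting the generator as a map-style Dataset so it can be indexed and split (the class name and the n_symbols parameter, standing in for the question's N, are mine):

import random
import torch

class EchoMapDataset(torch.utils.data.Dataset):
    """Map-style version: having __getitem__ makes random_split / Subset work."""
    def __init__(self, delay=4, seq_length=15, size=1000, n_symbols=10):
        self.delay = delay
        self.seq_length = seq_length
        self.size = size
        self.n_symbols = n_symbols

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        # same sample logic as the original __iter__, just driven by an index
        seq = torch.tensor([random.choice(range(1, self.n_symbols + 1))
                            for _ in range(self.seq_length)], dtype=torch.int64)
        result = torch.cat((torch.zeros(self.delay),
                            seq[:self.seq_length - self.delay])).type(torch.int64)
        return seq, result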
So I have this dataloader that loads data from HDF5, but it exits unexpectedly when I use num_workers > 0 (it works fine with 0). More strangely, it works with more workers on Google Colab, but not on my computer.
On my computer I get the following error:
Traceback (most recent call last):
  File "C:\Users\Flavio Maia\AppData\Roaming\Python\Python37\site-packages\torch\utils\data\dataloader.py", line 986, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\queues.py", line 105, in get
    raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "", line 2, in <module>
  File "C:\Users\Flavio Maia\AppData\Roaming\Python\Python37\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
    data = self._next_data()
  File "C:\Users\Flavio Maia\AppData\Roaming\Python\Python37\site-packages\torch\utils\data\dataloader.py", line 1182, in _next_data
    idx, data = self._get_data()
  File "C:\Users\Flavio Maia\AppData\Roaming\Python\Python37\site-packages\torch\utils\data\dataloader.py", line 1148, in _get_data
    success, data = self._try_get_data()
  File "C:\Users\Flavio Maia\AppData\Roaming\Python\Python37\site-packages\torch\utils\data\dataloader.py", line 999, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 12332) exited unexpectedly
Also, my __getitem__ function is:
def __getitem__(self, index):
    desired_file = int(index / self.file_size)
    position = index % self.file_size
    h5_file = h5py.File(self.files[desired_file], 'r')
    image = h5_file['Screenshots'][position]
    rect = h5_file['Rectangles'][position]
    numb = h5_file['Numbers'][position]
    h5_file.close()
    image = torch.from_numpy(image).float()
    rect = torch.from_numpy(rect).float()
    numb = torch.from_numpy(np.asarray(numb)).float()
    return (image, rect, numb)
Does anyone have any idea what can be causing this empty queue?
Windows can't fork DataLoader worker processes; with num_workers > 0 each spawned worker re-imports your main script, and unguarded top-level code then makes the workers die. You can just set num_workers to 0, which is fine. What also should work: put all your train/test script in train()/test() functions and call them under if __name__ == "__main__":
For example like this:
class MyDataLoder(torch.utils.data.Dataset):
    . . .

def train():
    train_set = create_dataloader()
    . . .

def test():
    test_set = create_dataloader()
    . . .

if __name__ == "__main__":
    train()
    test()
Okay, for the last 2 weeks or so I have been teaching myself both OpenCV and Kivy in order to create a UI/camera system for the autonomous mission from MATE ROV. (I don't feel like explaining MATE ROV, just google it.) I have succeeded in creating both the UI and the camera implementation. However, whenever I add the cv2.HoughLinesP calculation to find the length of a rectangle in my test image ([Test Image][1]), the code runs for a short amount of time (usually a couple of full passes through the update loop) and then I get this:
Traceback (most recent call last):
  File "main.py", line 87, in <module>
    CamApp().run()
  File "/home/mlees/kivy_venv/lib/python3.6/site-packages/kivy/app.py", line 855, in run
    runTouchApp()
  File "/home/mlees/kivy_venv/lib/python3.6/site-packages/kivy/base.py", line 504, in runTouchApp
    EventLoop.window.mainloop()
  File "/home/mlees/kivy_venv/lib/python3.6/site-packages/kivy/core/window/window_sdl2.py", line 747, in mainloop
    self._mainloop()
  File "/home/mlees/kivy_venv/lib/python3.6/site-packages/kivy/core/window/window_sdl2.py", line 479, in _mainloop
    EventLoop.idle()
  File "/home/mlees/kivy_venv/lib/python3.6/site-packages/kivy/base.py", line 339, in idle
    Clock.tick()
  File "/home/mlees/kivy_venv/lib/python3.6/site-packages/kivy/clock.py", line 591, in tick
    self._process_events()
  File "kivy/_clock.pyx", line 384, in kivy._clock.CyClockBase._process_events
  File "kivy/_clock.pyx", line 414, in kivy._clock.CyClockBase._process_events
  File "kivy/_clock.pyx", line 412, in kivy._clock.CyClockBase._process_events
  File "kivy/_clock.pyx", line 167, in kivy._clock.ClockEvent.tick
  File "main.py", line 56, in update
    for line in buf8:
TypeError: 'NoneType' object is not iterable
I have no clue what is causing this error, so if anyone can help me out, that would be great. The full code is below.
from kivy.app import App  # this import was missing from the original listing
from kivy.uix.widget import Widget
from kivy.uix.gridlayout import GridLayout
from kivy.uix.image import Image
from kivy.uix.label import Label
from kivy.clock import Clock
from kivy.graphics.texture import Texture
import cv2
import numpy as np

class CamApp(App):
    def build(self):
        self.img0 = Image()
        self.img1 = Image()
        self.img2 = Image()
        self.img3 = Image()
        layout = GridLayout(cols=4, rows=3)
        layout.add_widget(self.img0)
        layout.add_widget(self.img1)
        layout.add_widget(Label(text="HELP"))
        layout.add_widget(Label(text="HELP"))
        layout.add_widget(self.img2)
        layout.add_widget(self.img3)
        layout.add_widget(Label(text="HELP"))
        layout.add_widget(Label(text="HELP"))
        layout.add_widget(Label(text="HELP"))
        layout.add_widget(Label(text="HELP"))
        layout.add_widget(Label(text="HELP"))
        layout.add_widget(Label(text="HELP"))
        # opencv2 stuffs
        self.capture = cv2.VideoCapture(0)
        Clock.schedule_interval(self.update, 1.0 / 33.0)
        return layout

    def update(self, dt):
        # display image from cam in opencv window
        ret, frame = self.capture.read()
        # Flip image and set up first frame
        buf1 = cv2.flip(frame, -1)
        # Convert main frame to grayscale
        buf3 = cv2.cvtColor(buf1, cv2.COLOR_BGR2GRAY)
        # Take grayscale and add an adaptiveThreshold
        buf5 = cv2.adaptiveThreshold(buf3, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
        # Edge detection and line detection
        buf7 = cv2.Canny(buf3, 80, 240, 3)
        buf8 = cv2.HoughLinesP(buf7, 1, np.pi / 180, 60, np.array([]), 50, 5)
        for line in buf8:  # this is the line that crashes when buf8 is None
            for x1, y1, x2, y2 in line:
                cv2.line(buf8, (x1, y1), (x2, y2), (255, 255, 255), 4)
                distance_pixels = np.sqrt(np.square(x2 - x1) + np.square(y2 - y1))
                print(distance_pixels)
        # Necessary to display all the transformations and other bullshit
        buf0 = buf1.tostring()
        buf2 = buf3.tostring()
        buf4 = buf5.tostring()
        buf6 = buf7.tostring()
        # The next lines are kivy bullshit to get the images on the screen.
        texture0 = Texture.create(size=(frame.shape[1], frame.shape[0]), colorfmt='bgr')
        texture1 = Texture.create(size=(frame.shape[1], frame.shape[0]), colorfmt='luminance')
        texture2 = Texture.create(size=(frame.shape[1], frame.shape[0]), colorfmt='luminance')
        texture3 = Texture.create(size=(frame.shape[1], frame.shape[0]), colorfmt='luminance')
        texture0.blit_buffer(buf0, colorfmt='bgr', bufferfmt='ubyte')
        texture1.blit_buffer(buf2, colorfmt='luminance', bufferfmt='ubyte')
        texture2.blit_buffer(buf4, colorfmt='luminance', bufferfmt='ubyte')
        texture3.blit_buffer(buf6, colorfmt='luminance', bufferfmt='ubyte')
        # display image from the texture
        self.img0.texture = texture0
        self.img1.texture = texture1
        self.img2.texture = texture2
        self.img3.texture = texture3

# Here's the running shit.
if __name__ == '__main__':
    CamApp().run()
    cv2.destroyAllWindows()
Thanks in advance!
This error arises because in some frames no line is found by the HoughLinesP() function, so buf8 is None in those instances.
Add a check before starting the for loop to test whether buf8 is None: if it is not None, process the lines; otherwise skip that frame.
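A minimal sketch of that guard as it would sit in the update() method above (redirecting cv2.line to draw on buf1 instead of buf8 is my assumption about the intent; the original draws onto the line array itself):

buf8 = cv2.HoughLinesP(buf7, 1, np.pi / 180, 60, np.array([]), 50, 5)
# HoughLinesP returns None when no line segments are detected,
# so only iterate when there is something to iterate over
if buf8 is not None:
    for line in buf8:
        for x1, y1, x2, y2 in line:
            cv2.line(buf1, (x1, y1), (x2, y2), (255, 255, 255), 4)
            distance_pixels = np.sqrt(np.square(x2 - x1) + np.square(y2 - y1))
            print(distance_pixels)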
I do transformations on images as below (which works with RandomCrop); it is from this dataloader script: https://github.com/jeffreyhuang1/two-stream-action-recognition/blob/master/dataloader/motion_dataloader.py
def train(self):
    training_set = motion_dataset(dic=self.dic_video_train, in_channel=self.in_channel, root_dir=self.data_path,
        mode='train',
        transform=transforms.Compose([
            transforms.Resize([256, 256]),
            transforms.FiveCrop([224, 224]),
            #transforms.RandomCrop([224, 224]),
            transforms.ToTensor(),
            #transforms.Normalize([0.5], [0.5])
        ]))
    print '==> Training data :', len(training_set), ' videos', training_set[1][0].size()
    train_loader = DataLoader(
        dataset=training_set,
        batch_size=self.BATCH_SIZE,
        shuffle=True,
        num_workers=self.num_workers,
        pin_memory=True
    )
    return train_loader
But when I try to get the five crops, I get this error:
Traceback (most recent call last):
  File "motion_cnn.py", line 267, in <module>
    main()
  File "motion_cnn.py", line 51, in main
    train_loader, test_loader, test_video = data_loader.run()
  File "/media/d/DATA_2/two-stream-action-recognition-master/dataloader/motion_dataloader.py", line 120, in run
    train_loader = self.train()
  File "/media/d/DATA_2/two-stream-action-recognition-master/dataloader/motion_dataloader.py", line 156, in train
    print '==> Training data :', len(training_set), ' videos', training_set[1][0].size()
  File "/media/d/DATA_2/two-stream-action-recognition-master/dataloader/motion_dataloader.py", line 77, in __getitem__
    data = self.stackopf()
  File "/media/d/DATA_2/two-stream-action-recognition-master/dataloader/motion_dataloader.py", line 51, in stackopf
    H = self.transform(imgH)
  File "/media/d/DATA_2/two-stream-action-recognition-master/venv/local/lib/python2.7/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
    img = t(img)
  File "/media/d/DATA_2/two-stream-action-recognition-master/venv/local/lib/python2.7/site-packages/torchvision/transforms/transforms.py", line 91, in __call__
    return F.to_tensor(pic)
  File "/media/d/DATA_2/two-stream-action-recognition-master/venv/local/lib/python2.7/site-packages/torchvision/transforms/functional.py", line 50, in to_tensor
    raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))
TypeError: pic should be PIL Image or ndarray. Got <type 'tuple'>
To get the 5 crops I should handle a tuple of images instead of a PIL image, so I use a Lambda. But then I get this error at line 55, in stackopf:

flow[2*(j),:,:] = H
RuntimeError: expand(torch.FloatTensor{[5, 1, 224, 224]}, size=[224, 224]): the number of sizes provided (2) must be greater or equal to the number of dimensions in the tensor (4)

And when I try to set flow = torch.FloatTensor(5, 2*self.in_channel, self.img_rows, self.img_cols), I get, at motion_dataloader.py, line 55, in stackopf:

flow[:,2*(j),:,:] = H
RuntimeError: expand(torch.FloatTensor{[5, 1, 224, 224]}, size=[5, 224, 224]): the number of sizes provided (3) must be greater or equal to the number of dimensions in the tensor (4)

When I multiply the returned train batch size by 5, I also get the same error.
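For reference, the FiveCrop documentation in torchvision shows the intended pattern: stack the tuple of crops into a single 4D tensor inside the transform, then fold the crop dimension into the batch dimension in the training loop. A sketch of that pattern (variable names are mine, not from the repository):

import torch
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize([256, 256]),
    transforms.FiveCrop([224, 224]),  # produces a tuple of 5 PIL images
    # stack the crops so ToTensor never sees a bare tuple
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])

# in the training loop, each batch then comes out as (bs, ncrops, c, h, w);
# fuse batch size and ncrops before the forward pass:
# bs, ncrops, c, h, w = inputs.size()
# outputs = model(inputs.view(-1, c, h, w))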
I am trying to build a patch-wise image classifier network, so I want to extract patches from a .tif image and then save them as files:
import cv2
import tensorflow as tf
import numpy as np
from PIL import Image

file = 'data/test/Benign/b001.tif'
patch_path = 'data/patches/'
k = 1495  # window size
s = 99    # stride

def extract_patches(img_file, img_name):
    img = cv2.imread(img_file)
    padd = tf.constant([[29, 29, ], [21, 20], [0, 0]])
    img = tf.pad(img, padd, "CONSTANT")
    img = tf.expand_dims(img, 0)
    c = img.get_shape()[-1]  # color
    extracted = tf.extract_image_patches(
        images=img,
        ksizes=[1, k, k, 1],
        strides=[1, s, s, 1],
        rates=[1, 1, 1, 1],
        padding='VALID')
    patches_shape = extracted.shape
    patches = tf.reshape(extracted, [tf.reduce_prod(patches_shape[0:3]), k, k, int(c)])
    patch_num = patches.shape[0]
    for i in range(patch_num):
        sess = tf.Session()
        curr_patch = patches[i]
        print(type(curr_patch))
        print(curr_patch.shape)
        # decode_patch = tf.image.decode_image(curr_patch, channels=3)
        # print(type(decode_patch))
        # print(decode_patch.shape)
        resized_patch = tf.image.resize_images(curr_patch, [299, 299], method=tf.image.ResizeMethod.BILINEAR)
        print(type(resized_patch))
        print(resized_patch.shape)
        encode_patch = tf.image.encode_jpeg(resized_patch)
        print(type(encode_patch))
        print(encode_patch.shape)
        fwrite = tf.write_file(patch_path + img_name + '/' + str(i) + '_' + img_name, encode_patch)
        sess.run(fwrite)

extract_patches(file, 'test.tif')
This is the output I currently get:

<class 'tensorflow.python.framework.ops.Tensor'>
(1495, 1495, 3)
<class 'tensorflow.python.framework.ops.Tensor'>
(299, 299, 3)
Traceback (most recent call last):
  File "C:\Users\Mary\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "C:\Users\Mary\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1104, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Users\Mary\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 947, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype uint8 for Tensor with dtype float32: 'Tensor("resize_images/Squeeze:0", shape=(299, 299, 3), dtype=float32)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Mary/PycharmProjects/FineTune_18_12_10/patchbazi.py", line 51, in <module>
    extract_patches(file, 'test.tif')
  File "C:/Users/Mary/PycharmProjects/FineTune_18_12_10/patchbazi.py", line 43, in extract_patches
    encode_patch = tf.image.encode_jpeg(resized_patch)
  File "C:\Users\Mary\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_image_ops.py", line 1439, in encode_jpeg
    name=name)
  File "C:\Users\Mary\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 533, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'image' of 'EncodeJpeg' Op has type float32 that does not match expected type of uint8.

Process finished with exit code 1
As you can see, when I try to encode_jpeg(resized_patch), I get the type mismatch error. Without tf.image.resize_images() everything works perfectly, so I guess some type change is happening in the resize function. I also tried decoding the image as suggested here, but apparently the decoder works only for a few file extensions. Can someone help me with this?
I am using Python 3.6.5 and TensorFlow 1.12.0.
From the docs of tf.image.resize_images (emphasis mine):

The return value has the same type as images if method is ResizeMethod.NEAREST_NEIGHBOR. It will also have the same type as images if the size of images can be statically determined to be the same as size, because images is returned in this case. Otherwise, the return value has type float32.
You need to cast the result to uint8, which is the expected type for the input of EncodeJpeg:

encode_patch = tf.image.encode_jpeg(tf.cast(resized_patch, tf.uint8))

Also, as a side note, type(my_tensor) is not useful for finding out what type of data is in the tensor. Either print my_tensor directly, or inspect my_tensor.dtype.