RuntimeError when running 3D Ken Burns Effect

I'm trying to run a Google Colab notebook of the 3D Ken Burns effect, which can be found here:
https://colab.research.google.com/github/agmm/colab-3d-ken-burns/blob/master/automatic-3d-ken-burns.ipynb
I've got through most of it (I had to add a pip install gevent line, as that was causing another error), but on the final step I'm getting the following error:
Traceback (most recent call last):
  File "autozoom.py", line 76, in <module>
    process_load(npyImage, {})
  File "<string>", line 10, in process_load
  File "<string>", line 128, in disparity_refinement
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "<string>", line 94, in forward
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Would anybody be able to take a look and shed some light on what the error may be?
Thank you.

Try uploading a picture with smaller pixel dimensions.
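For context, the error itself points at the fix: .view() only works on tensors whose memory layout is contiguous, and somewhere in the notebook's generated code (line 94 of the forward shown in the trace) a non-contiguous tensor reaches a .view(...) call. A minimal sketch of the failure mode and the two usual workarounds, using an illustrative tensor rather than anything from the notebook:

import torch

# A transpose makes the tensor non-contiguous: its strides no longer match
# a flat row-major layout, which is what .view() requires.
t = torch.arange(12).reshape(3, 4).transpose(0, 1)

# t.view(-1) would raise the same RuntimeError as in the question.
flat = t.reshape(-1)                # copies when necessary, always works
flat = t.contiguous().view(-1)      # equivalent workaround

If you can edit the code, swapping the offending .view(...) for .reshape(...) is the direct fix; changing the image size may merely change whether the non-contiguous case is hit.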

Related

Why is my weights-file not working / not found?

I downloaded YOLOv7 from the GitHub page, trained it on a custom dataset, took the weights, and trained it again on a different custom dataset. Both trainings worked as expected.
Now I want to use the final weights for detection in new videos, but I get an error when YOLO tries to load the weights (see message below).
Edit: I found the answer to my question. The problem was that the path to my weights contained upper-case characters. It turns out they were converted to lower case, so the file could not be found anymore.
Thanks @gspr for pointing me in the right direction.
Traceback (most recent call last):
  File "detect.py", line 195, in <module>
    detect()
  File "detect.py", line 34, in detect
    model = attempt_load(weights, map_location=device)  # load FP32 model
  File "/home/mahler/yolo/yolov7/models/experimental.py", line 241, in attempt_load
    attempt_download(w)
  File "/home/mahler/yolo/yolov7/utils/google_utils.py", line 31, in attempt_download
    tag = subprocess.check_output('git tag', shell=True).decode().split()[-1]
IndexError: list index out of range
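For reference, the IndexError is a symptom rather than the root cause: when attempt_load cannot find the weights file, YOLOv7 falls back to attempt_download, which runs git tag and crashes on .split()[-1] when that command prints nothing. A quick sanity check you could run before detect.py (the path here is hypothetical):

from pathlib import Path

weights = Path("runs/train/exp/weights/best.pt")  # hypothetical weights path
if not weights.exists():
    raise FileNotFoundError(f"Weights not found (check upper/lower case!): {weights}")

On Linux file systems paths are case-sensitive, so a single wrong-case character is enough to send YOLO down the download path.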

Error using MultiWorkerMirroredStrategy to train object detection research model ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8

I'm trying to train the research model ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8 using MultiWorkerMirroredStrategy (by setting --num_workers=2 in the invocation of model_main_tf2.py). I'm training across two workers (0 and 1), each with a single GPU. However, when I attempt this I get the following error, always on worker 1:
Traceback (most recent call last):
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 553, in __next__
    return self.get_next()
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 610, in get_next
    return self._get_next_no_partial_batch_handling(name)
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 642, in _get_next_no_partial_batch_handling
    replicas.extend(self._iterators[i].get_next_as_list(new_name))
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 1594, in get_next_as_list
    return self._format_data_list_with_options(self._iterator.get_next())
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\data\ops\multi_device_iterator_ops.py", line 580, in get_next
    result.append(self._device_iterators[i].get_next())
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 889, in get_next
    return self._next_internal()
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 819, in _next_internal
    ret = gen_dataset_ops.iterator_get_next(
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2922, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\framework\ops.py", line 7186, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence [Op:IteratorGetNext]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\JS\Desktop\Tensorflow\models\research\object_detection\model_main_tf2.py", line 114, in <module>
    tf.compat.v1.app.run()
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\platform\app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\absl\app.py", line 312, in run
    _run_main(main, args)
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\absl\app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "C:\Users\JS\Desktop\Tensorflow\models\research\object_detection\model_main_tf2.py", line 105, in main
    model_lib_v2.train_loop(
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\object_detection\model_lib_v2.py", line 605, in train_loop
    load_fine_tune_checkpoint(
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\object_detection\model_lib_v2.py", line 401, in load_fine_tune_checkpoint
    _ensure_model_is_built(model, input_dataset, unpad_groundtruth_tensors)
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\object_detection\model_lib_v2.py", line 161, in _ensure_model_is_built
    features, labels = iter(input_dataset).next()
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 549, in next
    return self.__next__()
  File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 555, in __next__
    raise StopIteration
StopIteration
Worker 0 eventually fails after detecting that worker 1 has gone down.
This error happens regardless of the physical machines on which the two workers run. In other words, I see it whether I run both workers on a single machine (using localhost) or on different machines on the same network.
Based on the trace in the error messages, the error appears to be occurring whenever the training loop attempts to iterate over the training data generated by strategy.experimental_distribute_datasets_from_function. Note that if I change the strategy to MirroredStrategy it runs fine on a single machine (no other changes made). I'm not sure if I'm doing something wrong or if there is a bug in the object detection API.
My setup on both machines is identical (I basically followed the setup instructions on the object detection website):
Windows 10
Tensorflow 2.8.0
Cuda Toolkit 11.2
cudnn 8.1
Has anyone ever seen this error before? If so, is there a way around it?
Ok, I think I understand the issue. In the object detection library there is a file called dataset_builder.py that builds the training dataset from the TFRecord stored in the file specified in the pipeline.config file (in the input_path item of the tf_record_input_reader). The function that actually reads the TFRecord file is _read_dataset_internal. This function treats the input_path of the pipeline config as a LIST OF FILES and then applies a sharding function (passed as an argument) to divide the files between the replicas doing the training (one replica per worker). Since my input_path only specified a single TFRecord file, it was assigned to the first replica and the other replicas were given empty filenames!! Thus only the first replica actually had an input dataset to work with, hence the crash.
The solution was to split the training data across two files (two TFRecords) and then set the input_path in pipeline.config to be a list of paths rather than a single path. Once I did this, the model appears to have trained successfully (at least it didn't crash).
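If your training data currently lives in a single TFRecord, a minimal sketch of splitting it into one shard per worker might look like this (the file names are illustrative):

import tensorflow as tf

SOURCE = "train.record"  # hypothetical single-file input
NUM_SHARDS = 2           # one shard per worker

writers = [
    tf.io.TFRecordWriter(f"train-{i:05d}-of-{NUM_SHARDS:05d}.record")
    for i in range(NUM_SHARDS)
]
# Deal the serialized examples out round-robin across the shards.
for i, record in enumerate(tf.data.TFRecordDataset(SOURCE)):
    writers[i % NUM_SHARDS].write(record.numpy())
for w in writers:
    w.close()

Then list both shard paths under tf_record_input_reader in pipeline.config (input_path is a repeated field, and to my knowledge it also accepts glob patterns like train-?????-of-00002.record).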
I'm not sure if this is a bug in the object detection code or not. I assumed that if I only had one training record (visible to both workers) that both workers would use it and just batch the data accordingly. I'm just not sure if the assumption itself is wrong or if the assumption is correct and the code is wrong.
Anyway, I hope this helps anyone who might be wrestling with the same issue.

RuntimeError: unexpected EOF, expected 3302200 more bytes. The file might be corrupted

I am trying to run the pretrained model from the following repository, and I need your assistance to rectify this error:
RuntimeError: unexpected EOF, expected 3302200 more bytes. The file might be corrupted.
I used Google Colab to run the pretrained CANNet model from the repo below and followed all the steps (Prerequisites, Cloning, Data Preparation, and Testing):
https://github.com/gjy3035/NWPU-Crowd-Sample-Code.git
The detailed error is given below:
Traceback (most recent call last):
  File "test.py", line 118, in <module>
    main()
  File "test.py", line 46, in main
    test(lines, model_path)
  File "test.py", line 55, in test
    net.load_state_dict(torch.load(model_path))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 779, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 3302200 more bytes. The file might be corrupted.
Check out this GitHub issue: https://github.com/huggingface/transformers/issues/1491
It proposes using the force_download arg, which is equivalent to force_reload if you're loading the pretrained model with torch.hub.load. The other proposed option, applicable to Windows users, is to delete the downloaded model and download it again.
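If the checkpoint is fetched through torch.hub, that would look roughly like this (the repo and entry-point names are placeholders):

import torch

# force_reload=True discards the cached download and fetches it again,
# which replaces a partially downloaded or corrupted checkpoint.
model = torch.hub.load("owner/repo", "model_name", force_reload=True)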
I have the same issue, but setting force_reload=True hasn't cleared it for me. I'm thinking I have space problems, but I think it's worth a shot on your end.
I also faced the same problem while I was evaluating my trained model on Google Colab. I found that the model was taking a lot of time to fully upload to the machine, and I was testing with the incompletely uploaded model. When I ensured the model had been fully uploaded and then ran the test, it worked.
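One way to confirm the upload actually finished before testing is to compare a checksum of the copy on Colab against the original file on your machine; a small sketch (the checkpoint path is hypothetical):

import hashlib

def md5sum(path, chunk_size=1 << 20):
    # Read in chunks so large checkpoints don't have to fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

print(md5sum("exp_name/latest_state.pth"))  # compare against the local file's hash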

AzureML: Unable to unpickle LightGBM model

I am trying to run an Azure ML pipeline. This pipeline trains a model, saves it as a pickle file, and then tries to unpickle it in the next step. When unpickling it, I am facing the below issue on random runs:
Traceback (most recent call last):
  File "batch_scoring.py", line 199, in <module>
    clf = joblib.load(open(model_path, 'rb'))
  File "/azureml-envs/azureml_347514cea2002d6bd71b42aceb1e4eeb/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 595, in load
    obj = _unpickle(fobj)
  File "/azureml-envs/azureml_347514cea2002d6bd71b42aceb1e4eeb/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 529, in _unpickle
    obj = unpickler.load()
  File "/azureml-envs/azureml_347514cea2002d6bd71b42aceb1e4eeb/lib/python3.6/pickle.py", line 1048, in load
    raise EOFError
EOFError
Has anyone faced this issue before?
I get the same error when I try to unpickle the model from the output folder / model registry. In my case the .pkl was not properly formed during the experiment. Try re-running the experiment (I did it without changing a single line and it worked for me). In my case the bad pickle was even smaller than the good one. Hope this helps :)
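Building on that observation, a cheap guard in the scoring step is to check the artifact's size before unpickling, so a truncated file fails with a useful message instead of a bare EOFError (the path and threshold are illustrative):

import os
import joblib

model_path = "outputs/model.pkl"  # hypothetical path
MIN_EXPECTED_BYTES = 1024         # tune to your model's known good size

size = os.path.getsize(model_path)
if size < MIN_EXPECTED_BYTES:
    raise RuntimeError(
        f"{model_path} is only {size} bytes; the pickle from the training "
        "step is likely truncated. Re-run the experiment."
    )
with open(model_path, "rb") as f:
    clf = joblib.load(f)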

Unable to download file (Web Scraping) - OSError [Errno 22] - invalid argument

I wrote a program in Python 3 that scrapes and downloads the pages of a Wikipedia category to a certain depth and places them in a directory.
The problem I am facing is that whenever, during execution, the algorithm encounters a Wikipedia page whose title contains a special character like *, #, or $, it fails with the error trace below.
An example of such a wiki page is:
https://en.wikipedia.org/wiki/Eden*
The error trace is as follows:
Traceback (most recent call last):
  File "F:\Pen Drive 8 GB\PDF\Code\wiki.py", line 103, in <module>
    d.search_and_store("Biomedical_engineering", subcategory_depth=2, path=PATH)
  File "F:\Pen Drive 8 GB\PDF\Code\wiki.py", line 98, in search_and_store
    self.search_and_store(subcat_result['title'], subcategory_depth-1, path)
  File "F:\Pen Drive 8 GB\PDF\Code\wiki.py", line 98, in search_and_store
    self.search_and_store(subcat_result['title'], subcategory_depth-1, path)
  File "F:\Pen Drive 8 GB\PDF\Code\wiki.py", line 76, in search_and_store
    if self.write_page_text(path, page_result):
  File "F:\Pen Drive 8 GB\PDF\Code\wiki.py", line 44, in write_page_text
    txt_file = open(file_path, 'w')
OSError: [Errno 22] Invalid argument: 'F:\\Code\\Wikipedia\\DATASETS\\Biomedical Engineering/Eden*.txt'
As you can see, the algorithm scrapes pages whose titles have no special characters just fine, so why is it raising the aforementioned error?
The MWE is very large; I can share it if anyone suggests it would help.
Please suggest something, as I have been trying this for a long time and am frustrated. I don't even have an idea of what I am doing wrong. Please help.
Any small help is deeply appreciated.
Thanks in advance.
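For what it's worth, the trace points at the filename rather than the scraping: * (along with \ / : ? " < > |) is not a legal character in a Windows filename, so open() fails with Errno 22 on Eden*.txt. A minimal sketch of sanitizing the title before building file_path (the helper name is mine, not from the MWE):

import re

def sanitize_filename(title):
    # Replace the characters Windows forbids in filenames with '_'.
    return re.sub(r'[\\/:*?"<>|]', '_', title)

# e.g. inside write_page_text:
# file_path = os.path.join(path, sanitize_filename(page_result['title']) + '.txt')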
