How to resolve an error in Yolov5 train.py in yaml - python-3.x

I am trying to run YOLOv5 in the free GPU version of Google Colab. After running
!pip install PyYAML==5.3
I am getting this error:
Model Summary: 407 layers, 8.84875e+07 parameters, 8.84875e+07 gradients
Optimizer groups: 134 .bias, 142 conv.weight, 131 other
Traceback (most recent call last):
File "/content/yolov5/train.py", line 116, in train
ckpt['model'] = {k: v for k, v in ckpt['model'].float().state_dict().items()
File "/content/yolov5/train.py", line 117, in <dictcomp>
if model.state_dict()[k].shape == v.shape} # to FP32, filter
KeyError: 'model.18.conv.weight'
My command in google colab is
!python /content/yolov5/train.py --img 640 --batch 4 --epochs 30 \
--data /content/yolov5/data/clothing.yaml \
--cfg /content/yolov5/models/yolov5x.yaml \
--weights yolov5x.pt \
--name yolov5_clothing --cache
Can you please help me to resolve this?
Thanks

This issue has been resolved: the YOLOv5 team asked me to reinstall the requirements (requirements.txt) and download YOLOv5 again. Everything works fine now; you can see more details at https://github.com/ultralytics/yolov5/issues/2181
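For reference, a minimal sketch of that reinstall in Colab (assuming the standard ultralytics/yolov5 repository layout; the exact commands are my own, not the team's):
!rm -rf /content/yolov5
!git clone https://github.com/ultralytics/yolov5 /content/yolov5
!pip install -r /content/yolov5/requirements.txt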
Thanks

Related

RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

I am following this tutorial: https://blog.paperspace.com/train-yolov5-custom-data/ in order to train a custom dataset. I followed exactly the steps it describes, but when I run this command:
python3 train.py --img 640 --cfg yolov5s.yaml --hyp hyp.scratch.yaml --batch 32 --epochs 100 --data road_sign_data.yaml --weights yolov5s.pt --workers 24 --name yolo_road_det
I get this error:
File "/home/UbuntuUser/.local/lib/python3.8/site-packages/torch/serialization.py", line 242, in __init__
super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
I have searched on Google and found similar threads like this: https://discuss.pytorch.org/t/error-on-torch-load-pytorchstreamreader-failed/95103 and this: last.ckpt | RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory, but I couldn't find a solution! The problem is here:
class _open_zipfile_reader(_opener):
    def __init__(self, name_or_buffer) -> None:
        super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
I followed the steps of the above tutorial, but I don't know how to fix it. Could you help me?
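As a quick diagnostic (my own assumption, not part of the tutorial): PyTorch checkpoints are zip archives, so "failed finding central directory" usually means the weights file is truncated or corrupted. You can check the downloaded file like this:
import os
import zipfile

weights = "yolov5s.pt"  # the weights file passed to train.py
print(os.path.getsize(weights), "bytes on disk")
print("valid zip archive:", zipfile.is_zipfile(weights))  # False -> re-download the file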

Unable to export Core ML model in Turicreate

I used AWS SageMaker with a Jupyter notebook to train my Turicreate model. It trained successfully, but I'm unable to export it to a Core ML model; it shows the error below. I've tried various kernels in the Jupyter notebook with the same result. Any ideas on how to fix this error?
turicreate 5.4
GPU: mxnet-cu100
KeyError Traceback (most recent call last)
<ipython-input-6-3499bdb76e06> in <module>()
1 # Export for use in Core ML
----> 2 model.export_coreml('pushupsTC.mlmodel')
~/anaconda3/envs/python3/lib/python3.6/site-packages/turicreate/toolkits/object_detector/object_detector.py in export_coreml(self, filename, include_non_maximum_suppression, iou_threshold, confidence_threshold)
1216 assert (self._model[23].name == 'pool5' and
1217 self._model[24].name == 'specialcrop5')
-> 1218 del net._children[24]
1219 net._children[23] = op
1220
KeyError: 24

Pyspark Inferring Timezone by location

I'm trying to infer the timezone in PySpark given the longitude and latitude of an event. I came across the timezonefinder library, which works locally. I wrapped it in a user-defined function in an attempt to use it as the timezone inferrer.
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def get_timezone(longitude, latitude):
    from timezonefinder import TimezoneFinder
    tzf = TimezoneFinder()
    return tzf.timezone_at(lng=longitude, lat=latitude)

udf_timezone = F.udf(get_timezone, StringType())

df = sqlContext.read.parquet(INPUT)
df.withColumn("local_timezone", udf_timezone(df.longitude, df.latitude))\
  .write.parquet(OUTPUT)
When I run on a single node, this code works. However, when running in parallel, I get the following error:
File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1525907011747_0007/container_1525907011747_0007_01_000062/pyspark.zip/pyspark/worker.py", line 177, in main
process()
File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1525907011747_0007/container_1525907011747_0007_01_000062/pyspark.zip/pyspark/worker.py", line 172, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1525907011747_0007/container_1525907011747_0007_01_000062/pyspark.zip/pyspark/worker.py", line 104, in <lambda>
func = lambda _, it: map(mapper, it)
File "<string>", line 1, in <lambda>
File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1525907011747_0007/container_1525907011747_0007_01_000062/pyspark.zip/pyspark/worker.py", line 71, in <lambda>
return lambda *a: f(*a)
File "/tmp/c95422912bfb4079b64b88427991552a/enrich_data.py", line 64, in get_timezone
File "/opt/conda/lib/python2.7/site-packages/timezonefinder/__init__.py", line 3, in <module>
from .timezonefinder import TimezoneFinder
File "/opt/conda/lib/python2.7/site-packages/timezonefinder/timezonefinder.py", line 59, in <module>
from .helpers_numba import coord2int, int2coord, distance_to_polygon_exact, distance_to_polygon, inside_polygon, \
File "/opt/conda/lib/python2.7/site-packages/timezonefinder/helpers_numba.py", line 17, in <module>
#jit(b1(i4, i4, i4[:, :]), nopython=True, cache=True)
File "/opt/conda/lib/python2.7/site-packages/numba/decorators.py", line 191, in wrapper
disp.enable_caching()
File "/opt/conda/lib/python2.7/site-packages/numba/dispatcher.py", line 529, in enable_caching
self._cache = FunctionCache(self.py_func)
File "/opt/conda/lib/python2.7/site-packages/numba/caching.py", line 614, in __init__
self._impl = self._impl_class(py_func)
File "/opt/conda/lib/python2.7/site-packages/numba/caching.py", line 349, in __init__
"for file %r" % (qualname, source_path))
RuntimeError: cannot cache function 'inside_polygon': no locator available for file '/opt/conda/lib/python2.7/site-packages/timezonefinder/helpers_numba.py'
I can import the library locally on the nodes where I got the error.
Any solution along these lines would be appreciated:
- Is there a native Spark way to do the task?
- Is there another way to load the library?
- Is there a way to avoid the caching that numba does?
Eventually this was solved by abandoning timezonefinder completely and instead using the geospatial timezone dataset from timezone-boundary-builder, querying it with magellan, the geospatial SQL query library for Spark.
One caveat I had was that the Point and other objects in the library were not wrapped for Python. I ended up writing my own Scala function for timezone matching and dropped the objects from magellan before returning the DataFrame.
I encountered this error when running timezonefinder on a Spark cluster:
RuntimeError: cannot cache function 'inside_polygon': no locator available for file '/disk-1/hadoop/yarn/local/usercache/timezonefinder1.zip/timezonefinder/helpers_numba.py'
The issue was that the numpy versions were different on the cluster and in the timezonefinder package that we shipped to Spark.
The cluster had numpy 1.13.3, whereas numpy in timezonefinder.zip was 1.17.2.
To overcome the version mismatch, we created a custom conda environment with timezonefinder and numpy 1.17.2 and submitted the Spark job using that custom conda environment.
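For reference, a quick way to confirm such a mismatch (a sketch of my own, assuming a live SparkContext named sc alongside the sqlContext used above) is to compare the driver's numpy with what each executor actually imports:
# Compare the numpy version on the driver with the ones seen by the executors.
import numpy as np
print("driver numpy:", np.__version__)

def partition_numpy_version(_):
    import numpy
    yield numpy.__version__

executor_versions = set(
    sc.parallelize(range(20), 20)              # spread a few tasks across executors
      .mapPartitions(partition_numpy_version)
      .collect())
print("executor numpy:", executor_versions)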
Creating Custom Conda Environment with timezonefinder package installed:
conda create --name timezone-conda python timezonefinder
source activate timezone-conda
conda install -y conda-pack
conda pack -o timezonecondaevnv.tar.gz -d ./MY_CONDA_ENV
https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands
Submitting spark job with custom conda environment:
!spark-submit --name app_name \
--master yarn \
--deploy-mode cluster \
--driver-memory 1024m \
--executor-memory 1GB \
--executor-cores 5 \
--num-executors 10 \
--queue QUEUE_NAME \
--archives ./timezonecondaevnv.tar.gz#MY_CONDA_ENV \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./MY_CONDA_ENV/bin/python \
--conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=./MY_CONDA_ENV/bin/python \
--conf spark.executorEnv.PYSPARK_PYTHON=./MY_CONDA_ENV/bin/python \
--conf spark.executorEnv.PYSPARK_DRIVER_PYTHON=./MY_CONDA_ENV/bin/python \
./main.py

keras lstm seq2seq example Keyword argument not understood return_state on windows

I am running this example code (seq2seq built on Keras) from https://github.com/fchollet/keras/blob/master/examples/lstm_seq2seq.py.
This code runs correctly on my Ubuntu machine, but an error occurred when I ran the same code on Windows.
It says:
Using TensorFlow backend.
Number of samples: 10000
Number of unique input tokens: 73
Number of unique output tokens: 86
Max sequence length for inputs: 17
Max sequence length for outputs: 42
Traceback (most recent call last):
File "h:/eclipse_workspace/Keras_DL/src/seq2seq/lstm_seq2seq.py", line 125, in
encoder = LSTM(latent_dim, return_state = True)
File "D:\software\anaconda\lib\site-packages\keras\legacy\interfaces.py", line 88, in wrapper
return func(*args, **kwargs)
File "D:\software\anaconda\lib\site-packages\keras\layers\recurrent.py", line 949, in init
super(LSTM, self).init(**kwargs)
File "D:\software\anaconda\lib\site-packages\keras\layers\recurrent.py", line 191, in init
super(Recurrent, self).init(**kwargs)
File "D:\software\anaconda\lib\site-packages\keras\engine\topology.py", line 281, in init
raise TypeError('Keyword argument not understood:', kwarg)
TypeError: ('Keyword argument not understood:', 'return_state')
I found that return_state does exist in
keras.layers.recurrent.Recurrent(return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False, implementation=0)
Can anyone tell me how can I run this demo correctly on Windows?
My system info:
- OS : Windows 10 64 bit
- python 3.5.2 64 bit
- cudnn-8.0-windows10-x64-v5.1
- keras 2.0.4, tensorflow-gpu 1.1.0
Your Keras version is too old. return_state was added in Keras 2.0.5. I suggest you install the latest version from GitHub, since the example code you're running was added to the library less than 24 hours ago.
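For example (the exact command is my suggestion; the repository is the one linked above), you can upgrade straight from GitHub with pip:
pip install --upgrade git+https://github.com/fchollet/keras.git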

Error when running deepmind

It took me two days to install the requirements of deepQ (Python version). Then I tried to run it today, but I faced this problem; the output is as follows.
root#unicorn:/media/trump/Data1/wei/college/laboratory/deep_q_rl-master/deep_q_rl# python run_nips.py
A.L.E: Arcade Learning Environment (version 0.5.0)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./ale.cfg
Game console created:
ROM file: ../roms/breakout.bin
Cart Name: Breakout - Breakaway IV (1978) (Atari)
Cart MD5: f34f08e5eb96e500e851a80be3277a56
Display Format: AUTO-DETECT ==> NTSC
ROM Size: 2048
Bankswitch Type: AUTO-DETECT ==> 2K
Running ROM file...
Random seed is 65
Traceback (most recent call last):
File "run_nips.py", line 60, in <module>
launcher.launch(sys.argv[1:], Defaults, __doc__)
File "/media/trump/Data1/wei/college/laboratory/deep_q_rl-master/deep_q_rl/launcher.py", line 223, in launch
rng)
File "/media/trump/Data1/wei/college/laboratory/deep_q_rl-master/deep_q_rl/q_network.py", line 53, in __init__
num_actions, num_frames, batch_size)
File "/media/trump/Data1/wei/college/laboratory/deep_q_rl-master/deep_q_rl/q_network.py", line 168, in build_network
batch_size)
File "/media/trump/Data1/wei/college/laboratory/deep_q_rl-master/deep_q_rl/q_network.py", line 407, in build_nips_network_dnn
from lasagne.layers import dnn
File "/usr/local/lib/python2.7/dist-packages/Lasagne-0.2.dev1-py2.7.egg/lasagne/layers/dnn.py", line 13, in <module>
raise ImportError("dnn not available") # pragma: no cover
ImportError: dnn not available
I have already tested theano, numpy, and scipy and no errors came up. But when I ran it, it said dnn is not available. So I went looking for dnn, and its code is like this:
import theano
from theano.sandbox.cuda import dnn
from .. import init
from .. import nonlinearities
from .base import Layer
from .conv import conv_output_length
from .pool import pool_output_length
from ..utils import as_tuple
if not theano.config.device.startswith("gpu") or not dnn.dnn_available():
    raise ImportError("dnn not available")  # pragma: no cover
Just hope someone can help me.
Did you install CUDA and cuDNN?
Lasagne is built on top of Theano and, in some cases, relies on CUDA code (e.g. here) rather than abstracting it away.
This can be seen from the import:
from theano.sandbox.cuda import dnn
Also see: https://github.com/Lasagne/Lasagne/issues/242
To get cuDNN you need to register as a developer at NVIDIA; see:
https://developer.nvidia.com/accelerated-computing
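Once both are installed, a quick sanity check (a sketch that simply re-evaluates the guard from lasagne/layers/dnn.py quoted above) is:
# Re-evaluates the condition Lasagne checks in dnn.py.
import theano
from theano.sandbox.cuda import dnn

print("device:", theano.config.device)          # must start with "gpu"
print("cuDNN available:", dnn.dnn_available())  # must be True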
Hope this helps.
Cheers,
Michael
