I am unable to train a self-supervised (SSL) model to create image embeddings using the lightly CLI (Lightly Platform Link). I intend to select diverse examples from my dataset to train an object detection model further downstream, and the image embeddings created with the SSL model will help me perform active learning. I have reproduced the error in a publicly accessible notebook: lightly_app_troubleshooting_stackoverflow.ipynb (Link).
In the notebook shared above, this command raises an exception:
!source /content/venv_1/bin/activate;lightly-magic \
input_dir="/content/Sunflowers" trainer.max_epochs=20 \
token='< your lightly token(free account) >' \
new_dataset_name="sunflowers_dataset" loader.batch_size=64
The exception stack trace produced is shown below:
/content/venv_1/lib/python3.7/site-packages/hydra/_internal/hydra.py:127: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
configure_logging=with_log_configuration,
########## Starting to train an embedding model.
/content/venv_1/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py:23: LightningDeprecationWarning: pytorch_lightning.core.lightning.LightningModule has been deprecated in v1.7 and will be removed in v1.9. Use the equivalent class from the pytorch_lightning.core.module.LightningModule class instead.
"pytorch_lightning.core.lightning.LightningModule has been deprecated in v1.7"
Error executing job with overrides: ['input_dir=/content/Sunflowers', 'trainer.max_epochs=20', 'token=5bbcf60e3a5c7c266dcd4e0e9056c8301684e0f2f8922bc5', 'new_dataset_name=sunflowers_dataset', 'loader.batch_size=64']
Traceback (most recent call last):
File "/content/venv_1/lib/python3.7/site-packages/lightly/cli/lightly_cli.py", line 115, in lightly_cli
return _lightly_cli(cfg)
File "/content/venv_1/lib/python3.7/site-packages/lightly/cli/lightly_cli.py", line 52, in _lightly_cli
checkpoint = _train_cli(cfg, is_cli_call)
File "/content/venv_1/lib/python3.7/site-packages/lightly/cli/train_cli.py", line 137, in _train_cli
encoder.train_embedding(**cfg['trainer'], strategy=distributed_strategy)
File "/content/venv_1/lib/python3.7/site-packages/lightly/embedding/_base.py", line 88, in train_embedding
trainer = pl.Trainer(**kwargs, callbacks=[self.checkpoint_callback])
File "/content/venv_1/lib/python3.7/site-packages/pytorch_lightning/utilities/argparse.py", line 345, in insert_env_defaults
return fn(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'weights_summary'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
I could not create a new tag, "lightly", as I lack the Stack Overflow reputation points to do so.
The error comes from an incompatibility with the latest PyTorch Lightning release (1.7 at the time of this writing), which no longer accepts the weights_summary Trainer argument that lightly passes. A quick fix is to use a lower version (e.g. 1.6). We are working on a fix :)
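For example, in the notebook you could pin PyTorch Lightning to a compatible release inside the same virtual environment before running lightly-magic (a minimal sketch; any 1.6.x release should do):
!source /content/venv_1/bin/activate; pip install "pytorch-lightning<1.7"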
Let me know in case that does not work for you!
I have a trained ONNX model that I want to incorporate into an Android app.
I'm actually working on a uni project, combining ML & Android development.
After a lot of research, and since I don't want to use a private Python REST API, I came to the conclusion that there are two ways I could continue from here: I can either try converting my ONNX model into a TF model and then generate a TFLite model through the TFLite Converter API, or give onnxruntime a try.
I tried the first way with TFLite, using the answer from this post, which led to this code:
import onnx
from onnx_tf.backend import prepare
onnx_model = onnx.load("input_path") # load onnx model
tf_rep = prepare(onnx_model) # prepare tf representation
tf_rep.export_graph("output_path") # export the model
but I'm stuck at that first conversion from .onnx to .pb, since I think onnx-tf doesn't support dynamic dimensions (which my model has). I'm constantly getting errors such as
"Input size (depth of inputs) must be accessible via shape inference," or
RuntimeError: Node name is not unique in your model. Please recreate your model with unique node name.
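For reference, one workaround I am considering for the dynamic-dimension errors is to pin the model's inputs to fixed sizes before running onnx-tf. This is only an untested sketch; the size 1 is a placeholder, not something my model or the error message dictates:
import onnx

# Untested sketch: replace every symbolic (dynamic) input dimension with a
# fixed value before handing the model to onnx-tf. File names and the size 1
# are placeholders.
model = onnx.load("input_path")
for graph_input in model.graph.input:
    for dim in graph_input.type.tensor_type.shape.dim:
        if dim.dim_param:        # a named dynamic dimension such as "batch"
            dim.dim_value = 1    # pin it to a concrete size
onnx.checker.check_model(model)
onnx.save(model, "input_path_fixed.onnx")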
I also gave onnxruntime a try, but I can't manage to "Create a minimal build for Android with NNAPI support". I'm getting this error while building:
[1/67] Building CXX object CMakeFiles/libprotobuf.dir/C_/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.cc.obj
FAILED: CMakeFiles/libprotobuf.dir/C_/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.cc.obj
C:\PROGRA~2\CODEBL~1\MinGW\bin\C__~1.EXE -DGOOGLE_PROTOBUF_CMAKE_BUILD -DHAVE_PTHREAD -I. -IC:/Users/chris/onnxruntime/cmake/external/protobuf/src -std=c++11 -MD -MT CMakeFiles/libprotobuf.dir/C_/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.cc.obj -MF CMakeFiles\libprotobuf.dir\C_\Users\chris\onnxruntime\cmake\external\protobuf\src\google\protobuf\io\io_win32.cc.obj.d -o CMakeFiles/libprotobuf.dir/C_/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.cc.obj -c C:/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.cc
C:/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.cc: In function 'int google::protobuf::io::win32::stat(const char*, google::protobuf::io::win32::_stat*)':
C:/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.cc:315:40: error: cannot convert 'google::protobuf::io::win32::_stat*' to '_stat*' for argument '2' to 'int _wstat(const wchar_t*, _stat*)'
return ::_wstat(wpath.c_str(), buffer);
^
In file included from C:/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.cc:52:0:
C:/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.h:75:51: note: class type 'google::protobuf::io::win32::_stat' is incomplete
PROTOBUF_EXPORT int stat(const char* path, struct _stat* buffer);
^
C:/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.cc: In function 'FILE* google::protobuf::io::win32::fopen(const char*, const char*)':
C:/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/io/io_win32.cc:337:10: error: '::_wfopen' has not been declared
return ::_wfopen(wpath.c_str(), wmode.c_str());
^
[6/67] Building CXX object CMakeFiles/libprotobuf.dir/C_/Users/chris/onnxruntime/cmake/external/protobuf/src/google/protobuf/message_lite.cc.obj
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "C:\Users\chris\onnxruntime\\tools\ci_build\build.py", line 2023, in <module>
sys.exit(main())
File "C:\Users\chris\onnxruntime\\tools\ci_build\build.py", line 1918, in main
cmake_path, source_dir, build_dir, args)
File "C:\Users\chris\onnxruntime\\tools\ci_build\build.py", line 1673, in build_protoc_for_host
run_subprocess(cmd_args)
File "C:\Users\chris\onnxruntime\\tools\ci_build\build.py", line 544, in run_subprocess
return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
File "C:\Users\chris\onnxruntime\tools\python\util\run.py", line 44, in run
env=env, shell=shell)
File "C:\Users\chris\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['C:\\Program Files\\CMake\\bin\\cmake.EXE', '--build', 'C:\\Users\\chris\\onnxruntime\\\\build\\Windows\\host_protoc', '--config', 'Release', '--target', 'protoc']' returned non-zero exit status 1.
Am I on a completely wrong path? It's my first time trying to combine ML with Android, so I have no experience in this. Any advice would be very welcome.
For your issue with onnxruntime, try changing CMakeCache.txt:
CMAKE_CXX_FLAGS:STRING=-U__STRICT_ANSI__
It should fix the error you mentioned. I think you're on the right track with onnxruntime.
I run an automated Python job on an EMR cluster that updates Amazon Athena tables.
It was running well until a few days ago (on Python 2.7 and 3.7). Here is the script:
from pyathenajdbc import connect
import yaml
config = yaml.load(open('athena-config.yaml', 'r'))
statements = config['statements']
staging_dir = config['staging_dir']
conn = connect(s3_staging_dir=staging_dir, region_name='eu-west-1')
try:
    with conn.cursor() as cursor:
        for statement in statements:
            cursor.execute(statement)
finally:
    conn.close()
The athena-config.yaml has a staging directory and a few Athena statements.
Here is the error:
You are using pip version 9.0.3, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Unrecognized option: -server
create_tables.py:5: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(open('athena-config.yaml', 'r'))
/mnt/conda/lib/python3.7/site-packages/jpype/_core.py:210: UserWarning:
-------------------------------------------------------------------------------
Deprecated: convertStrings was not specified when starting the JVM. The default
behavior in JPype will be False starting in JPype 0.8. The recommended setting
for new code is convertStrings=False. The legacy value of True was assumed for
this session. If you are a user of an application that reported this warning,
please file a ticket with the developer.
-------------------------------------------------------------------------------
""")
Traceback (most recent call last):
File "create_tables.py", line 10, in <module>
region_name='eu-west-1')
File "/mnt/conda/lib/python3.7/site-packages/pyathenajdbc/__init__.py", line 69, in connect
driver_path, log4j_conf, **kwargs)
File "/mnt/conda/lib/python3.7/site-packages/pyathenajdbc/connection.py", line 68, in __init__
self._start_jvm(jvm_path, jvm_options, driver_path, log4j_conf)
File "/mnt/conda/lib/python3.7/site-packages/pyathenajdbc/util.py", line 25, in _wrapper
return wrapped(*args, **kwargs)
File "/mnt/conda/lib/python3.7/site-packages/pyathenajdbc/connection.py", line 97, in _start_jvm
jpype.startJVM(jvm_path, *args)
File "/mnt/conda/lib/python3.7/site-packages/jpype/_core.py", line 219, in startJVM
_jpype.startup(jvmpath, tuple(args), ignoreUnrecognized, convertStrings)
RuntimeError: Unable to start JVM
at loadJVM(native/common/jp_env.cpp:169)
at loadJVM(native/common/jp_env.cpp:179)
at startup(native/python/pyjp_module.cpp:159)
As far as I understand, the issue is in convertStrings being deprecated. Can anyone help me resolve that? I cannot understand why this """) comes before the traceback, or what changed in the past few days to break the code. Thanks!
Got the same issue today. Try downgrading JPype1 to 0.6.3. JPype1 released 0.7.0 today, which is not compatible with the old interfaces.
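On the EMR cluster that would look something like this (pin it wherever you install the job's dependencies, e.g. a bootstrap action or requirements file):
pip install "JPype1==0.6.3"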
The issue appears to be that the package is calling the JVM with an unrecognized argument, -server. The previous version ignored that sort of error, allowing things to proceed. To get the same behavior with 0.7.0, the flag ignoreUnrecognized would need to be set to True. Likely this needs to be sent to pyathenajdbc as a bug report to correct the defect that placed the bogus argument into the startJVM call in the first place.
Looking at the source, the -server flag is hardcoded into the module:
if not jpype.isJVMStarted():
    _logger.debug('JVM path: %s', jvm_path)
    args = [
        '-server',
        '-Djava.class.path={0}'.format(driver_path),
        '-Dlog4j.configuration=file:{0}'.format(log4j_conf)
    ]
    if jvm_options:
        args.extend(jvm_options)
    _logger.debug('JVM args: %s', args)
    jpype.startJVM(jvm_path, *args)
    cls.class_loader = jpype.java.lang.Thread.currentThread().getContextClassLoader()
It assumes a particular JVM that accepts -server as an argument.
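For completeness, here is a hedged sketch of what a corrected call could look like under JPype1 0.7.0. This is not the actual pyathenajdbc fix, just an illustration of the ignoreUnrecognized flag; the driver and log4j paths are placeholders:
import jpype

# Illustration only: same style of options as pyathenajdbc builds above, but
# with ignoreUnrecognized=True so an unsupported option such as -server is
# skipped instead of aborting JVM startup (JPype1 >= 0.7.0 keyword argument).
jvm_path = jpype.getDefaultJVMPath()
driver_path = '/path/to/AthenaJDBC.jar'       # placeholder
log4j_conf = '/path/to/log4j.properties'      # placeholder

jpype.startJVM(
    jvm_path,
    '-server',
    '-Djava.class.path={0}'.format(driver_path),
    '-Dlog4j.configuration=file:{0}'.format(log4j_conf),
    ignoreUnrecognized=True,
)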
I have an existing LMDB zarr archive (~6GB) saved at path. Now I want to consolidate the metadata to improve read performance.
Here is my script:
store = zarr.LMDBStore(path)
root = zarr.open(store)
zarr.consolidate_metadata(store)
store.close()
I get the following error:
Traceback (most recent call last):
File "zarr_consolidate.py", line 12, in <module>
zarr.consolidate_metadata(store)
File "/local/home/marcel/.virtualenvs/noisegan/local/lib/python3.5/site-packages/zarr/convenience.py", line 1128, in consolidate_metadata
return open_consolidated(store, metadata_key=metadata_key)
File "/local/home/marcel/.virtualenvs/noisegan/local/lib/python3.5/site-packages/zarr/convenience.py", line 1182, in open_consolidated
meta_store = ConsolidatedMetadataStore(store, metadata_key=metadata_key)
File "/local/home/marcel/.virtualenvs/noisegan/local/lib/python3.5/site-packages/zarr/storage.py", line 2455, in __init__
d = store[metadata_key].decode() # pragma: no cover
AttributeError: 'memoryview' object has no attribute 'decode'
I am using zarr 2.3.2 and Python 3.5.2. I have another machine running Python 3.6.2 where this works. Could it have to do with the Python version?
Thanks for the report. Should be fixed with gh-452. Please test it out (if you are able).
If you are able to share a bit more information on why read performance suffers in your case, that would be interesting to learn about. :)
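For reference, once you are on a build that includes the fix, the intended pattern looks roughly like this (a sketch based on your script; the path is a placeholder): consolidate once after writing, then open through the consolidated metadata so readers avoid many small metadata lookups.
import zarr

path = "/path/to/archive.lmdb"        # placeholder for your LMDB path

# One-off consolidation after the archive has been written.
store = zarr.LMDBStore(path)
zarr.consolidate_metadata(store)
store.close()

# Later, readers open via the consolidated metadata key.
store = zarr.LMDBStore(path)
root = zarr.open_consolidated(store)  # uses the consolidated metadata
print(root.tree())
store.close()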
It took me two days to install the requirements of deep Q (Python version). Then I tried to run it today, but I faced this problem; the output is as follows.
root#unicorn:/media/trump/Data1/wei/college/laboratory/deep_q_rl-master/deep_q_rl# python run_nips.py
A.L.E: Arcade Learning Environment (version 0.5.0)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./ale.cfg
Game console created:
ROM file: ../roms/breakout.bin
Cart Name: Breakout - Breakaway IV (1978) (Atari)
Cart MD5: f34f08e5eb96e500e851a80be3277a56
Display Format: AUTO-DETECT ==> NTSC
ROM Size: 2048
Bankswitch Type: AUTO-DETECT ==> 2K
Running ROM file...
Random seed is 65
Traceback (most recent call last):
File "run_nips.py", line 60, in <module>
launcher.launch(sys.argv[1:], Defaults, __doc__)
File "/media/trump/Data1/wei/college/laboratory/deep_q_rl-master/deep_q_rl/launcher.py", line 223, in launch
rng)
File "/media/trump/Data1/wei/college/laboratory/deep_q_rl-master/deep_q_rl/q_network.py", line 53, in __init__
num_actions, num_frames, batch_size)
File "/media/trump/Data1/wei/college/laboratory/deep_q_rl-master/deep_q_rl/q_network.py", line 168, in build_network
batch_size)
File "/media/trump/Data1/wei/college/laboratory/deep_q_rl-master/deep_q_rl/q_network.py", line 407, in build_nips_network_dnn
from lasagne.layers import dnn
File "/usr/local/lib/python2.7/dist-packages/Lasagne-0.2.dev1-py2.7.egg/lasagne/layers/dnn.py", line 13, in <module>
raise ImportError("dnn not available") # pragma: no cover
ImportError: dnn not available
I have already tested theano, numpy, and scipy, and no errors came up. But when I ran it, it said dnn is not available. So I went looking for dnn, and the code is like this:
import theano
from theano.sandbox.cuda import dnn
from .. import init
from .. import nonlinearities
from .base import Layer
from .conv import conv_output_length
from .pool import pool_output_length
from ..utils import as_tuple
if not theano.config.device.startswith("gpu") or not dnn.dnn_available():
    raise ImportError("dnn not available")  # pragma: no cover
Just hope someone can help me.
Did you install CUDA and cuDNN?
Lasagne is built on top of Theano and, in some cases, relies on CUDA code (e.g. here) rather than abstracting it away.
This can be seen from the import:
from theano.sandbox.cuda import dnn
Also see: https://github.com/Lasagne/Lasagne/issues/242
To get cuDNN, you need to register with NVIDIA as a developer; see:
https://developer.nvidia.com/accelerated-computing
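As a quick sanity check, something like the following (using the same old theano.sandbox.cuda API that the quoted dnn.py relies on) should print a device name starting with "gpu" and True before that import can succeed:
import theano
from theano.sandbox.cuda import dnn

# Both conditions checked in lasagne/layers/dnn.py must hold, otherwise the
# "dnn not available" ImportError is raised.
print(theano.config.device)    # should start with "gpu" (e.g. THEANO_FLAGS=device=gpu)
print(dnn.dnn_available())     # True only if cuDNN is installed and visible to Theano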
Hope this helps.
Cheers,
Michael
This issue is a continuation of my previous question here, which was seemingly resolved but leads here, to another issue.
I am using Spark 1.4.0 on Cloudera QuickstartVM CDH-5.4.0.
When I run my PySpark script as a SparkAction in Oozie, I encounter this error in the Oozie job / container logs:
KeyError: 'SPARK_HOME'
Then I came across this solution and this one, which are actually for Spark 1.3.0, although I still gave them a try. The documentation seems to say that this issue is already fixed for Spark versions 1.3.2 and 1.4.0 (but here I am, encountering the same issue).
The suggested solution in the links was that I need to set spark.yarn.appMasterEnv.SPARK_HOME and spark.executorEnv.SPARK_HOME to anything, even a path that does not point to the actual SPARK_HOME (e.g. /bogus), although I did set these to the actual SPARK_HOME.
Here's my workflow after the change:
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${resourceManager}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>local[2]</master>
    <mode>client</mode>
    <name>${name}</name>
    <jar>${workflowRootLocal}/lib/my_pyspark_job.py</jar>
    <spark-opts>--conf spark.yarn.appMasterEnv.SPARK_HOME=/usr/lib/spark spark.executorEnv.SPARK_HOME=/usr/lib/spark</spark-opts>
</spark>
This seems to solve the original problem above. However, it leads to another error when I inspect the stderr of the Oozie container log:
Error: Cannot load main class from JAR file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/cloudera/appcache/application_1437103727449_0011/container_1437103727449_0011_01_000001/spark.executorEnv.SPARK_HOME=/usr/lib/spark
If I am using Python, it should not expect a main class, right? Please note, from my previous related post, that the Oozie job example shipped with Cloudera QuickstartVM CDH-5.4.0, which features a SparkAction written in Java, was working in my tests. It seems that the issue occurs only with Python.
I would greatly appreciate any help.
Rather than setting the spark.yarn.appMasterEnv.SPARK_HOME and spark.executorEnv.SPARK_HOME variables, try adding the following line of code to your Python script before creating your SparkConf():
os.environ["SPARK_HOME"] = "/path/to/spark/installed/location"
Found the reference here
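For context, here is roughly where that line goes in a PySpark script (a sketch; the app name is a placeholder, and the path is the Spark location from your workflow above):
import os

# Set SPARK_HOME before any pyspark imports or context creation.
os.environ["SPARK_HOME"] = "/usr/lib/spark"   # adjust to your Spark install location

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("my_pyspark_job")   # placeholder app name
sc = SparkContext(conf=conf)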
This helped me resolve the error you are facing, but I faced the following error afterwards:
Traceback (most recent call last):
File "/usr/hdp/current/spark-client/AnalyticsJar/boxplot_outlier.py", line 129, in <module>
main()
File "/usr/hdp/current/spark-client/AnalyticsJar/boxplot_outlier.py", line 60, in main
sc = SparkContext(conf=conf)
File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 107, in __init__
File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 155, in _do_init
File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 201, in _initialize_context
File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/py4j/java_gateway.py", line 701, in __call__
File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.SecurityException: class "javax.servlet.FilterRegistration"'s signer information does not match signer information of other classes in the same package