RuntimeError: Unable to start JVM because of Deprecated: convertStrings - python-3.x

I run an automated Python job on an EMR cluster that updates Amazon Athena tables.
It was running well until a few days ago (on Python 2.7 and 3.7). Here is the script:
from pyathenajdbc import connect
import yaml

config = yaml.load(open('athena-config.yaml', 'r'))
statements = config['statements']
staging_dir = config['staging_dir']

conn = connect(s3_staging_dir=staging_dir, region_name='eu-west-1')
try:
    with conn.cursor() as cursor:
        for statement in statements:
            cursor.execute(statement)
finally:
    conn.close()
The athena-config.yaml file holds the staging directory and a few Athena statements.
Here is the Error:
You are using pip version 9.0.3, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Unrecognized option: -server
create_tables.py:5: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(open('athena-config.yaml', 'r'))
/mnt/conda/lib/python3.7/site-packages/jpype/_core.py:210: UserWarning:
-------------------------------------------------------------------------------
Deprecated: convertStrings was not specified when starting the JVM. The default
behavior in JPype will be False starting in JPype 0.8. The recommended setting
for new code is convertStrings=False. The legacy value of True was assumed for
this session. If you are a user of an application that reported this warning,
please file a ticket with the developer.
-------------------------------------------------------------------------------
""")
Traceback (most recent call last):
  File "create_tables.py", line 10, in <module>
    region_name='eu-west-1')
  File "/mnt/conda/lib/python3.7/site-packages/pyathenajdbc/__init__.py", line 69, in connect
    driver_path, log4j_conf, **kwargs)
  File "/mnt/conda/lib/python3.7/site-packages/pyathenajdbc/connection.py", line 68, in __init__
    self._start_jvm(jvm_path, jvm_options, driver_path, log4j_conf)
  File "/mnt/conda/lib/python3.7/site-packages/pyathenajdbc/util.py", line 25, in _wrapper
    return wrapped(*args, **kwargs)
  File "/mnt/conda/lib/python3.7/site-packages/pyathenajdbc/connection.py", line 97, in _start_jvm
    jpype.startJVM(jvm_path, *args)
  File "/mnt/conda/lib/python3.7/site-packages/jpype/_core.py", line 219, in startJVM
    _jpype.startup(jvmpath, tuple(args), ignoreUnrecognized, convertStrings)
RuntimeError: Unable to start JVM
        at loadJVM(native/common/jp_env.cpp:169)
        at loadJVM(native/common/jp_env.cpp:179)
        at startup(native/python/pyjp_module.cpp:159)
As far as I understand, the issue is that convertStrings is deprecated. Can anyone help me resolve that? I also cannot understand why this """) appears before the traceback, or what changed in the past few days to break the code. Thanks!

Got the same issue today. Try downgrading JPype1 to 0.6.3. JPype1 0.7.0 was released today, and it is not compatible with the old interfaces.
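For example, with pip (a minimal sketch; re-run the job after pinning):
pip install 'JPype1==0.6.3'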

The issue appears to be that the package is starting the JVM with an unrecognized argument, -server. The previous version ignored that sort of error, allowing things to proceed. To get the same behavior with 0.7.0, the flag ignoreUnrecognized would need to be set to True. Likely this needs to be reported to pyathenajdbc to correct the defect that placed the bogus argument into startJVM in the first place.
Looking at the source, -server is hardcoded into the module:
if not jpype.isJVMStarted():
    _logger.debug('JVM path: %s', jvm_path)
    args = [
        '-server',
        '-Djava.class.path={0}'.format(driver_path),
        '-Dlog4j.configuration=file:{0}'.format(log4j_conf)
    ]
    if jvm_options:
        args.extend(jvm_options)
    _logger.debug('JVM args: %s', args)
    jpype.startJVM(jvm_path, *args)
    cls.class_loader = jpype.java.lang.Thread.currentThread().getContextClassLoader()
It assumes a particular JVM that accepts -server as an argument.
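Because the module only starts the JVM when jpype.isJVMStarted() is false, one possible stopgap until pyathenajdbc is patched is to start the JVM yourself, with ignoreUnrecognized=True, before calling connect(). A minimal sketch, assuming JPype 0.7.0; the driver path is a placeholder you would need to substitute:

import jpype
from pyathenajdbc import connect

# Starting the JVM first makes pyathenajdbc skip its own startJVM call.
# ignoreUnrecognized=True restores the old tolerance for flags like -server.
jpype.startJVM(
    jpype.getDefaultJVMPath(),
    '-Djava.class.path=/path/to/AthenaJDBC.jar',  # assumption: your actual driver path
    ignoreUnrecognized=True,
    convertStrings=True,
)

conn = connect(s3_staging_dir=staging_dir, region_name='eu-west-1')  # as in the original script

Note that this may also skip the class-loader setup shown above, so the simple downgrade to 0.6.3 is the safer route.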

Related

Unable to train a self-supervised(ssl) model using Lightly CLI

I am unable to train a self-supervised (SSL) model to create image embeddings using the Lightly CLI: Lightly Platform Link. I intend to select diverse examples from my dataset to create an object detection model further downstream, and the image embeddings created with the SSL model will help me perform active learning. I have reproduced the error in a notebook with public access -----> lightly_app_troubleshooting_stackoverflow.ipynb Link.
In the notebook shared above this cmd raises an exception:
!source /content/venv_1/bin/activate;lightly-magic \
input_dir="/content/Sunflowers" trainer.max_epochs=20 \
token='< your lightly token(free account) >' \
new_dataset_name="sunflowers_dataset" loader.batch_size=64
The exception stack trace produced is as below:
/content/venv_1/lib/python3.7/site-packages/hydra/_internal/hydra.py:127: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
configure_logging=with_log_configuration,
########## Starting to train an embedding model.
/content/venv_1/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py:23: LightningDeprecationWarning: pytorch_lightning.core.lightning.LightningModule has been deprecated in v1.7 and will be removed in v1.9. Use the equivalent class from the pytorch_lightning.core.module.LightningModule class instead.
"pytorch_lightning.core.lightning.LightningModule has been deprecated in v1.7"
Error executing job with overrides: ['input_dir=/content/Sunflowers', 'trainer.max_epochs=20', 'token=5bbcf60e3a5c7c266dcd4e0e9056c8301684e0f2f8922bc5', 'new_dataset_name=sunflowers_dataset', 'loader.batch_size=64']
Traceback (most recent call last):
  File "/content/venv_1/lib/python3.7/site-packages/lightly/cli/lightly_cli.py", line 115, in lightly_cli
    return _lightly_cli(cfg)
  File "/content/venv_1/lib/python3.7/site-packages/lightly/cli/lightly_cli.py", line 52, in _lightly_cli
    checkpoint = _train_cli(cfg, is_cli_call)
  File "/content/venv_1/lib/python3.7/site-packages/lightly/cli/train_cli.py", line 137, in _train_cli
    encoder.train_embedding(**cfg['trainer'], strategy=distributed_strategy)
  File "/content/venv_1/lib/python3.7/site-packages/lightly/embedding/_base.py", line 88, in train_embedding
    trainer = pl.Trainer(**kwargs, callbacks=[self.checkpoint_callback])
  File "/content/venv_1/lib/python3.7/site-packages/pytorch_lightning/utilities/argparse.py", line 345, in insert_env_defaults
    return fn(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'weights_summary'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
I could not create a new tag, "lightly", as I lack the Stack Overflow reputation points to do so.
The error is from an incompatibility with the latest PyTorch Lightning version (version 1.7 at the time of this writing). A quick fix is to use a lower version (e.g. 1.6). We are working on a fix :)
Let me know in case that does not work for you!
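Until the fix is released, pinning Lightning to a 1.6.x release inside the notebook's virtual environment should work (a sketch; 1.6.5 is one such version):
source /content/venv_1/bin/activate
pip install 'pytorch-lightning==1.6.5'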

Unable to run index migration in OpenSearch

I have a Docker Compose setup in which a Django backend, OpenSearch, and OpenSearch Dashboards are running. I have connected the backend to OpenSearch and I'm able to query it successfully. I'm trying to create indices using this command inside the Docker container:
./manage.py opensearch --rebuild
Reference: https://django-opensearch-dsl.readthedocs.io/en/latest/getting_started/#create-and-populate-opensearchs-indices
I get the following error when I run the above command:
root@ed186e462ca3:/app# ./manage.py opensearch --rebuild
/usr/local/lib/python3.6/site-packages/OpenSSL/crypto.py:8: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
from cryptography import utils, x509
Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 224, in fetch_command
    klass = load_command_class(app_name, subcommand)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 37, in load_command_class
    return module.Command()
  File "/usr/local/lib/python3.6/site-packages/django_opensearch_dsl/management/commands/opensearch.py", line 32, in __init__
    if settings.TESTING:  # pragma: no cover
  File "/usr/local/lib/python3.6/site-packages/django/conf/__init__.py", line 80, in __getattr__
    val = getattr(self._wrapped, name)
AttributeError: 'Settings' object has no attribute 'TESTING'
Sentry is attempting to send 1 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit
I'm not sure where I'm going wrong. Any help would be greatly appreciated.
TIA
For future reference, this was indeed a bug in django-opensearch-dsl, which was fixed in the 0.3.0 release.
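If you cannot upgrade immediately, a stopgap suggested by the settings.TESTING lookup in the traceback (my assumption, not an official fix) is to define that attribute in your Django settings:

# settings.py
import sys

# Hypothetical workaround: older django-opensearch-dsl reads settings.TESTING,
# which plain Django projects do not define.
TESTING = 'test' in sys.argv

Upgrading with pip install 'django-opensearch-dsl>=0.3.0' remains the proper fix.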

Problem running odoo-bin on MacOS (ValueError: current limit exceeds maximum limit)

I am unable to get Odoo 14 running on my macOS machine. Some research into the following error suggests that manually configuring the memory limits may resolve the issue, but I cannot find the relevant config files on my machine.
I've checked and reinstalled all of the requirements, and I can't find much information to point me in the right direction.
(venv) kilgow@wmbp odoo-dev % python3 odoo/odoo-bin
2021-09-18 15:56:53,295 1931 INFO ? odoo: Odoo version 14.0
2021-09-18 15:56:53,295 1931 INFO ? odoo: addons paths: ['/Users/kilgow/Desktop/odoo-dev/odoo/odoo/addons', '/Users/kilgow/Library/Application Support/Odoo/addons/14.0', '/Users/kilgow/Desktop/odoo-dev/odoo/addons']
2021-09-18 15:56:53,295 1931 INFO ? odoo: database: default@default:default
2021-09-18 15:56:53,351 1931 INFO ? odoo.addons.base.models.ir_actions_report: You need Wkhtmltopdf to print a pdf version of the reports.
Traceback (most recent call last):
  File "/Users/kilgow/Desktop/odoo-dev/odoo/odoo-bin", line 8, in <module>
    odoo.cli.main()
  File "/Users/kilgow/Desktop/odoo-dev/odoo/odoo/cli/command.py", line 61, in main
    o.run(args)
  File "/Users/kilgow/Desktop/odoo-dev/odoo/odoo/cli/server.py", line 178, in run
    main(args)
  File "/Users/kilgow/Desktop/odoo-dev/odoo/odoo/cli/server.py", line 172, in main
    rc = odoo.service.server.start(preload=preload, stop=stop)
  File "/Users/kilgow/Desktop/odoo-dev/odoo/odoo/service/server.py", line 1298, in start
    rc = server.run(preload, stop)
  File "/Users/kilgow/Desktop/odoo-dev/odoo/odoo/service/server.py", line 510, in run
    self.start(stop=stop)
  File "/Users/kilgow/Desktop/odoo-dev/odoo/odoo/service/server.py", line 452, in start
    set_limit_memory_hard()
  File "/Users/kilgow/Desktop/odoo-dev/odoo/odoo/service/server.py", line 83, in set_limit_memory_hard
    resource.setrlimit(rlimit, (config['limit_memory_hard'], hard))
ValueError: current limit exceeds maximum limit
(venv) kilgow@wmbp odoo-dev %
You should check out this documentation for more info; the easy way is to add an extra argument to your run script, like below:
python3 odoo-bin --addons-path=addons -d mydb --limit-memory-hard 0
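If you start Odoo from a configuration file instead, the equivalent setting should be (a sketch; the key mirrors the CLI flag):

# odoo.conf
[options]
limit_memory_hard = 0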
I believe this could be due to the machine using an M1 chip. Manually increasing the memory limits did not resolve the problem.
I’ve managed to work around the issue by running Odoo and Postgres in Docker containers instead.
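For reference, a minimal docker-compose.yml along the lines of the official odoo image documentation (a sketch; image versions and credentials are placeholders):

version: '3.1'
services:
  web:
    image: odoo:14.0
    depends_on:
      - db
    ports:
      - "8069:8069"
  db:
    image: postgres:13
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=odoo
      - POSTGRES_PASSWORD=odoo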

AttributeError: module 'resource' has no attribute 'getpagesize'

I am trying to use the TensorFlow Object Detection API, following the steps in the link below:
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html#tf-models-install
When I try to open the Object Detection Jupyter notebook via jupyter notebook,
I get the exception below:
Traceback (most recent call last):
  File "/usr/local/bin/jupyter-notebook", line 7, in <module>
    from notebook.notebookapp import main
  File "/home/dinesh/.local/lib/python3.6/site-packages/notebook/notebookapp.py", line 79, in <module>
    from .base.handlers import Template404, RedirectWithParams
  File "/home/dinesh/.local/lib/python3.6/site-packages/notebook/base/handlers.py", line 32, in <module>
    import prometheus_client
  File "/home/dinesh/.local/lib/python3.6/site-packages/prometheus_client/__init__.py", line 7, in <module>
    from . import process_collector
  File "/home/dinesh/.local/lib/python3.6/site-packages/prometheus_client/process_collector.py", line 12, in <module>
    _PAGESIZE = resource.getpagesize()
AttributeError: module 'resource' has no attribute 'getpagesize'
I am using
Python - 3.6.3
Jupyter - 1.0.0
How can I overcome this exception?
Got a similar error. My project contained these modules (folders):
model
resource (replaced with resources)
service
The local resource folder was shadowing Python's standard library resource module, so I renamed it to resources (any appropriate module name will do).
I had the same error. After renaming my resource module on the PYTHONPATH, it worked. Check your PYTHONPATH: is there a resource module on it?
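A quick way to check which resource module Python actually imports (a sketch; on Linux the standard library copy lives inside the interpreter installation, e.g. under lib-dynload):

import resource

# If this prints a path inside your project rather than the Python
# installation, a local module is shadowing the standard library one.
print(getattr(resource, '__file__', 'built-in'))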
I had a similar issue when starting Jupyter Notebook on Windows 10.
When I initially ran the regular startup script, I got a Windows terminal that opened and immediately closed, too fast to see any error messages. So I opened a Windows 10 PowerShell terminal and ran
conda update conda
and
conda update --all
then I ran
jupyter-notebook at the Windows prompt. The results were:
Traceback (most recent call last):
  File "E:\Users\Bob\anaconda3\Scripts\jupyter-notebook-script.py", line 6, in <module>
    from notebook.notebookapp import main
  File "E:\Users\Bob\anaconda3\lib\site-packages\notebook\notebookapp.py", line 76, in <module>
    from .base.handlers import Template404, RedirectWithParams
  File "E:\Users\Bob\anaconda3\lib\site-packages\notebook\base\handlers.py", line 24, in <module>
    import prometheus_client
  File "E:\Users\Bob\anaconda3\lib\site-packages\prometheus_client\__init__.py", line 3, in <module>
    from . import (
  File "E:\Users\Bob\anaconda3\lib\site-packages\prometheus_client\process_collector.py", line 11, in <module>
    _PAGESIZE = resource.getpagesize()
AttributeError: module 'resource' has no attribute 'getpagesize'
I opened process_collector.py in site-packages\prometheus_client in Notepad++ and changed
line 9: import resource to import resources
and
line 11: _PAGESIZE = resource.getpagesize() to _PAGESIZE = resources.getpagesize()
I searched for other instances of resource and found none. I then saved the file and reran jupyter-notebook at the Windows terminal prompt.
This time I got:
Traceback (most recent call last):
  File "E:\Users\Bob\anaconda3\Scripts\jupyter-notebook-script.py", line 10, in <module>
    sys.exit(main())
  File "E:\Users\Bob\anaconda3\lib\site-packages\jupyter_core\application.py", line 254, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "E:\Users\Bob\anaconda3\lib\site-packages\traitlets\config\application.py", line 844, in launch_instance
    app.initialize(argv)
  File "E:\Users\Bob\anaconda3\lib\site-packages\traitlets\config\application.py", line 87, in inner
    return method(app, *args, **kwargs)
  File "E:\Users\Bob\anaconda3\lib\site-packages\notebook\notebookapp.py", line 2126, in initialize
    self.init_resources()
  File "E:\Users\Bob\anaconda3\lib\site-packages\notebook\notebookapp.py", line 1697, in init_resources
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
AttributeError: module 'resource' has no attribute 'getrlimit'
Still having Notepad++ open, I opened notebookapp.py in site-packages\notebook and searched for resource. I found and changed the following lines:
Line 37 import resource to import resources
line 40 resource = None to resources = None
line 1036 resource is None to resources is None
line 1040 soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE) to
soft, hard = resources.getrlimit(resources.RLIMIT_NOFILE)
line 1693 if resource is None: to if resources is None:
line 1697 old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE) to old_soft, old_hard = resources.getrlimit(resources.RLIMIT_NOFILE)
line 1706 resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard)) to
resources.setrlimit(resources.RLIMIT_NOFILE, (soft, hard))
I searched for other instances of resource and found none. I then saved the notebookapp.py file and reran jupyter-notebook at the Windows terminal prompt. This time Jupyter Notebook opened with the expected Files tab displayed. I quit Jupyter Notebook and restarted it using the normal link to the startup script, and it worked as expected.
I am not sure what caused this issue; I did not intentionally update anything before I saw it. Yesterday Jupyter Notebook worked as expected; today, when I tried to run it, I got the flickering screen described above.
[Quick look for root cause]
Check whether you have set PYTHONPATH on your system:
try renaming it momentarily
run "jupyter notebook"
[Fix the issue]
If the issue is resolved by doing this, then:
search for a "resource" folder within all paths mentioned in PYTHONPATH
if such a folder is found, rename/refactor it to some other name, e.g. "resources"
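To automate that search, something like this should work (a sketch; it only checks for directories named resource on sys.path):

import os
import sys

# Report every sys.path entry containing a folder named "resource",
# which could shadow the standard library module of the same name.
for p in sys.path:
    candidate = os.path.join(p or os.getcwd(), 'resource')
    if os.path.isdir(candidate):
        print('possible shadowing module:', candidate)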

"KeyError: 'SPARK_HOME' ", "can't load main class from JAR" in running PySpark as an Oozie workflow job

This issue is a continuation of my previous question here, which was seemingly resolved but led here, to another issue.
I am using Spark 1.4.0 on Cloudera QuickstartVM CHD-5.4.0.
When I run my PySpark script as a SparkAction in Oozie, I encounter this error in the Oozie job / container logs:
KeyError: 'SPARK_HOME'
Then I came across this solution and this one, which are actually for Spark 1.3.0; I still tried them. The documentation seems to say that this issue is already fixed for Spark 1.3.2 and 1.4.0 (but here I am, encountering the same issue).
The suggested solution in those links was that I need to set spark.yarn.appMasterEnv.SPARK_HOME and spark.executorEnv.SPARK_HOME to anything, even a path that does not point to the actual SPARK_HOME (e.g., /bogus, although I did set these to the actual SPARK_HOME).
Here's my workflow after that:
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${resourceManager}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>local[2]</master>
    <mode>client</mode>
    <name>${name}</name>
    <jar>${workflowRootLocal}/lib/my_pyspark_job.py</jar>
    <spark-opts>--conf spark.yarn.appMasterEnv.SPARK_HOME=/usr/lib/spark spark.executorEnv.SPARK_HOME=/usr/lib/spark</spark-opts>
</spark>
This seems to solve the original problem above. However, it leads to another error, which I found when inspecting the stderr of the Oozie container log:
Error: Cannot load main class from JAR file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/cloudera/appcache/application_1437103727449_0011/container_1437103727449_0011_01_000001/spark.executorEnv.SPARK_HOME=/usr/lib/spark
If I am using Python, it should not expect a main class, right? Please note from my previous related post that the Oozie job example shipped with Cloudera QuickstartVM CDH-5.4.0, which features a SparkAction written in Java, was working in my tests. The issue seems to affect only Python.
I would greatly appreciate help from anyone.
Rather than setting the spark.yarn.appMasterEnv.SPARK_HOME and spark.executorEnv.SPARK_HOME variables, try adding the following lines of code to your Python script before creating your SparkConf():
import os
os.environ["SPARK_HOME"] = "/path/to/spark/installed/location"
Found the reference here
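In context, that looks something like this (a sketch; the path and app name are assumptions):

import os

# Must be set before the SparkContext is created.
os.environ["SPARK_HOME"] = "/usr/lib/spark"

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('my_pyspark_job')
sc = SparkContext(conf=conf)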
This helped me resolve the error you are facing, but I ran into the following error afterwards:
Traceback (most recent call last):
  File "/usr/hdp/current/spark-client/AnalyticsJar/boxplot_outlier.py", line 129, in <module>
    main()
  File "/usr/hdp/current/spark-client/AnalyticsJar/boxplot_outlier.py", line 60, in main
    sc = SparkContext(conf=conf)
  File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 107, in __init__
  File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 155, in _do_init
  File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 201, in _initialize_context
  File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/py4j/java_gateway.py", line 701, in __call__
  File "/hadoop/yarn/local/filecache/1314/spark-core_2.10-1.1.0.jar/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.SecurityException: class "javax.servlet.FilterRegistration"'s signer information does not match signer information of other classes in the same package
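As for the original "Cannot load main class from JAR" error: the path in that message ends with spark.executorEnv.SPARK_HOME=/usr/lib/spark, which suggests Spark parsed the second property as the application JAR because each property needs its own --conf flag. A hedged guess at the corrected element:

<spark-opts>--conf spark.yarn.appMasterEnv.SPARK_HOME=/usr/lib/spark --conf spark.executorEnv.SPARK_HOME=/usr/lib/spark</spark-opts>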
