Using pyspark on Windows not working- py4j - apache-spark

I installed Zeppelin on Windows using this tutorial and this.
I also installed Java 8 to avoid problems.
I'm now able to start the Zeppelin server, and I'm trying to run this code:
%pyspark
a=5*4
print("value = %i" % (a))
sc.version
I'm getting an error related to py4j. I had other problems with this library before (same as here), and to avoid them I replaced the py4j library in both Zeppelin and Spark on my computer with the latest version, py4j 0.10.7.
This is the error I get:
Traceback (most recent call last):
File "C:\Users\SHIRM~1.ARG\AppData\Local\Temp\zeppelin_pyspark-1240802621138907911.py", line 309, in <module>
sc = _zsc_ = SparkContext(jsc=jsc, gateway=gateway, conf=conf)
File "C:\Users\SHIRM.ARGUS\spark-2.3.2\spark-2.3.2-bin-hadoop2.7\python\pyspark\context.py", line 118, in __init__
conf, jsc, profiler_cls)
File "C:\Users\SHIRM.ARGUS\spark-2.3.2\spark-2.3.2-bin-hadoop2.7\python\pyspark\context.py", line 189, in _do_init
self._javaAccumulator = self._jvm.PythonAccumulatorV2(host, port, auth_token)
File "C:\Users\SHIRM.ARGUS\Documents\zeppelin-0.8.0-bin-all\interpreter\spark\pyspark\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1525, in __call__
File "C:\Users\SHIRM.ARGUS\Documents\zeppelin-0.8.0-bin-all\interpreter\spark\pyspark\py4j-0.10.7-src.zip\py4j\protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.api.python.PythonAccumulatorV2. Trace:
I googled it, but couldn't find anyone else who had run into it.
Does anyone have an idea how I can solve this?
Thanks

I suspect you have Java 9 or 10 installed. Uninstall those versions and install a fresh copy of Java 8 from here: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Then set JAVA_HOME inside hadoop-env.cmd (open it with any text editor).
Note: Java 8 and 7 are the stable versions to use, so uninstall any other existing versions of Java. Make sure JAVA_HOME points to a JDK, not a JRE.
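To confirm which Java version is actually picked up, something like the following can help. This is only a sketch: the function names are made up for illustration, and it assumes the usual `java -version` banner format, where Java 8 and earlier report "1.8.0_181" and Java 9 and later report "9.0.4", "10.0.2" or "11".

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Old scheme: "1.8.0_181" means Java 8. New scheme: "10.0.2" means Java 10.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` (the banner goes to stderr) and parse the result."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        # No `java` executable on PATH at all.
        return None
    return parse_java_major(result.stderr + result.stdout)
```

Calling `installed_java_major()` from the same environment that starts Zeppelin should return 8 if the advice above has been followed, and None if no `java` is on the PATH.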

I faced the same problem today, and I fixed it by adding PYTHONPATH to the system environment variables, like this:
%SPARK_HOME%\python\lib\py4j;%SPARK_HOME%\python\lib\pyspark
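If it is unclear what sits under %SPARK_HOME%\python\lib on a given machine, a short script can list the candidates. This is a sketch under assumptions: the helper name is hypothetical, and it assumes the archives are named py4j-<version>-src.zip (as in the traceback above) and pyspark.zip.

```python
import glob
import os


def spark_pythonpath_entries(spark_home):
    """List the archives under <spark_home>/python/lib that PySpark imports from."""
    lib_dir = os.path.join(spark_home, "python", "lib")
    # Assumed naming: py4j-<version>-src.zip, e.g. py4j-0.10.7-src.zip
    entries = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    pyspark_zip = os.path.join(lib_dir, "pyspark.zip")
    if os.path.isfile(pyspark_zip):
        entries.append(pyspark_zip)
    return entries
```

Joining the result with `os.pathsep` gives a value that could be pasted into PYTHONPATH. An empty list suggests SPARK_HOME points at the wrong directory.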

Related

Can I run pyspark locally without installing spark on windows 10?

I need to create a proof of concept using pyspark and I was wondering if there is a way to install it and use it via pip without having to install and configure Spark itself. I've read a few answers suggesting that the newer versions of pyspark allow you to run it in standalone mode without needing the full Spark, but when I try that I get the following error:
Traceback (most recent call last):
File "C:\Users\320181940\PycharmProjects\meetup\main.py", line 8, in <module>
sc = SparkContext("local", "meetup_etl")
File "C:\Users\320181940\PycharmProjects\meetup\venv\lib\site-packages\pyspark\context.py", line 144, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "C:\Users\320181940\PycharmProjects\meetup\venv\lib\site-packages\pyspark\context.py", line 331, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "C:\Users\320181940\PycharmProjects\meetup\venv\lib\site-packages\pyspark\java_gateway.py", line 101, in launch_gateway
proc = Popen(command, **popen_kwargs)
File "C:\Python310\lib\subprocess.py", line 966, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Python310\lib\subprocess.py", line 1435, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
I installed pyspark 3.1.3 using pip, and I'm trying to run this on Windows 10. Any help would be much appreciated.
You need to install Java and set JAVA_HOME in your environment variables.
Then start a Python interpreter, create a Spark session, and run your code. Here's an example:
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame(
    [["I'm ready!"], ["If I could put into words how much I love waking up at 6 am on Mondays I would."]]
).toDF("text")
df.show()
Also make sure to set up HADOOP_HOME as specified in this gist.
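Before creating the session, it may be worth checking that both variables are really visible to Python. The helper below is a hypothetical sketch, not part of pyspark. It only checks that JAVA_HOME contains a java executable under bin and that HADOOP_HOME has a bin directory, since the exact contents the gist asks for are not shown here.

```python
import os


def check_spark_prerequisites(env):
    """Return a list of problems found in an environment mapping such as os.environ."""
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    else:
        bin_dir = os.path.join(java_home, "bin")
        # Accept both the Windows and the POSIX executable name.
        if not any(
            os.path.isfile(os.path.join(bin_dir, name))
            for name in ("java.exe", "java")
        ):
            problems.append("no java executable under JAVA_HOME/bin")

    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    elif not os.path.isdir(os.path.join(hadoop_home, "bin")):
        problems.append("HADOOP_HOME has no bin directory")

    return problems
```

Running `check_spark_prerequisites(os.environ)` in the same interpreter (for example inside the PyCharm run configuration) shows whether the variables reach the process that launches Spark. An empty list means both checks passed.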

ValueError: Protocol message SsdFeatureExtractor has no field replace_preprocessor_with_placeholder

I'm using an object-detection API to train my own model, but while running the training using this command:
python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config
I get this error:
WARNING:tensorflow:From C:\Users\MHD\Anaconda3\envs\tf15\lib\site-packages\tensorflow\python\platform\app.py:124: main (from __main__) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
Traceback (most recent call last):
File "train.py", line 179, in <module>
tf.app.run()
File "C:\Users\MHD\Anaconda3\envs\tf15\lib\site-packages\tensorflow\python\platform\app.py", line 124, in run
_sys.exit(main(argv))
File "C:\Users\MHD\Anaconda3\envs\tf15\lib\site-packages\tensorflow\python\util\deprecation.py", line 136, in new_func
return func(*args, **kwargs)
File "train.py", line 175, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 249, in train
detection_model = create_model_fn()
File "C:\tensorflow1\models\research\object_detection\builders\model_builder.py", line 119, in build
return _build_ssd_model(model_config.ssd, is_training, add_summaries)
File "C:\tensorflow1\models\research\object_detection\builders\model_builder.py", line 237, in _build_ssd_model
is_training=is_training)
File "C:\tensorflow1\models\research\object_detection\builders\model_builder.py", line 187, in _build_ssd_feature_extractor
if feature_extractor_config.HasField('replace_preprocessor_with_placeholder'):
ValueError: Protocol message SsdFeatureExtractor has no field replace_preprocessor_with_placeholder
please help me guys
Tracing down the cause of this error, I found that the option replace_preprocessor_with_placeholder was added only recently. Here is the commit record. (If you search that page for replace_preprocessor_with_placeholder, you will find that it was added on March 7th, 2019.)
So the cause of the error is that the version of your proto files is not consistent with the version of the code. If you compare object_detection/protos/ssd.proto on your local machine with the one in the GitHub repo, you will probably find that this line does not exist in your local file (because this field was also added recently).
The easiest way to fix this error is to reinstall the object detection API following this guide.
Since you already have all the packages installed, there are essentially two steps you need to do: install the COCO API and compile the protobuf files. A fresh protobuf compilation will fix your error.
I also recommend following the latest API tutorial. I see in your call that you are using train.py; this file has been moved to the legacy folder and is not recommended, since it may not be up to date.
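The comparison of ssd.proto described above can be done with a few lines of Python instead of by eye. This is a rough sketch: the function name is hypothetical, and it does a simple text search for a declaration of the form `<name> = <number>` rather than a real parse of the proto grammar.

```python
import re


def declares_field(proto_text, field_name):
    """Return True if the .proto source declares a field called field_name."""
    # Drop // comments so a commented-out field does not count.
    without_comments = re.sub(r"//[^\n]*", "", proto_text)
    # A field declaration looks like "<type> <name> = <number>".
    pattern = r"\b" + re.escape(field_name) + r"\s*=\s*\d+"
    return re.search(pattern, without_comments) is not None
```

For example, read your local object_detection/protos/ssd.proto into a string and call `declares_field(text, "replace_preprocessor_with_placeholder")`. If it returns False, the local protos are older than the code. If it returns True and the error persists, the compiled protobuf files are probably stale and need recompiling.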

Can't get cqlsh to run

I've installed DSE 5.1.10 + the DSE demos as per these instructions on ubuntu.
Apparently it doesn't come with cqlsh, so I went about installing it myself.
I've tried various methods, the latest of which was
pip3 install cqlsh
this completed successfully and I can now run
cqlsh -version
and get
cqlsh 5.0.1
when running
cqlsh
I get the following error
Traceback (most recent call last):
File "/usr/bin/dsecqlsh.py", line 510, in <module>
cqlsh.main(*cqlsh.read_options(sys.argv[1:], os.environ))
File "/usr/bin/cqlsh.py", line 2447, in main
encoding=options.encoding)
File "/usr/bin/dsecqlsh.py", line 383, in __init__
connect_timeout=connect_timeout)
File "/usr/bin/cqlsh.py", line 528, in __init__
self.get_connection_versions()
File "/usr/bin/cqlsh.py", line 645, in get_connection_versions
if result['dse_version']:
KeyError: 'dse_version'
Any ideas what I'm doing wrong?
Thanks
I'm curious as to your source of information that says DSE does not include the cqlsh command line. As far as I am aware all versions of DSE will install this.
To me it looks like you have clobbered your cqlsh install with the pip3 install command. You've likely installed the OSS version of cqlsh, hence the error complaining about dse_version above.
I would first try uninstalling the pip3 version to see if that helps. If it doesn't, uninstall DSE and reinstall it.
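To see whether more than one cqlsh is installed, and which one the shell finds first, you could list every match on the PATH. A small sketch with a hypothetical helper name; it uses only the standard library and knows nothing about DSE itself.

```python
import os


def find_all_on_path(command, path=None):
    """Return every executable file named `command` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or os.curdir, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches
```

`find_all_on_path("cqlsh")` returning two or more entries would support the idea that the pip3 copy and the DSE copy are both present. The first entry is the one that runs when you type cqlsh.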
Not sure what was wrong; I resorted to removing everything and installing from the tarball instead, following https://docs.datastax.com/en/install/doc/install60/installTARdse.html

Unable to launch Anaconda Navigator in Windows after TensorFlow installation

I tried to install the TensorFlow library in Anaconda, and after that I am unable to launch Anaconda Navigator. I am using a Windows machine, and Anaconda was working fine before. The error is as below:
An unexpected error occurred on Navigator start-up
Psutil.AccessDenied(pid=9636)
Traceback (most recent call last):
File "C:\Users\india\AppData\Local\Continuum\Anaconda3\lib\site-packages\psutil\_pswindows.py", line 620, in wrapper
return fun(self, *args, **kwargs)
File "C:\Users\india\AppData\Local\Continuum\Anaconda3\lib\site-packages\psutil\_pswindows.py", line 690, in cmdline
ret = cext.proc_cmdline(self.pid)
PermissionError: [WinError 5] Access is denied
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\india\AppData\Local\Continuum\Anaconda3\lib\site-packages\anaconda_navigator\exceptions.py", line 75, in exception_handler
return_value = func(*args, **kwargs)
File "C:\Users\india\AppData\Local\Continuum\Anaconda3\lib\site-packages\anaconda_navigator\app\start.py", line 108, in start_app
if misc.load_pid() is None: # A stale lock might be around
File "C:\Users\india\AppData\Local\Continuum\Anaconda3\lib\site-packages\anaconda_navigator\utils\misc.py", line 384, in load_pid
cmds = process.cmdline()
File "C:\Users\india\AppData\Local\Continuum\Anaconda3\lib\site-packages\psutil\__init__.py", line 701, in cmdline
return self._proc.cmdline()
File "C:\Users\india\AppData\Local\Continuum\Anaconda3\lib\site-packages\psutil\_pswindows.py", line 623, in wrapper
raise AccessDenied(self.pid, self._name)
psutil.AccessDenied: psutil.AccessDenied (pid=9636)
I have uninstalled and reinstalled Anaconda, but I'm still facing the same issue.
I could fix it by simply resetting the navigator via Command Prompt:
C:\Anaconda3\anaconda-navigator --reset
I use Anaconda on Win10 x64 (Python 3.x) and ran into this problem this morning as well. The only thing I had done in Anaconda was install a package called pywinauto; then I shut down the computer and went home.
Following the suggestion from https://groups.google.com/a/continuum.io/forum/#!topic/anaconda/4hBTDOcDzgo by lan, I updated Anaconda Navigator with the command
conda update anaconda-navigator
However, the same error occurred.
Then I rebooted the computer and restarted Anaconda, and it worked fine!
I think installing the new package changed some configuration that conflicts with the current one, and a reboot is needed for the change to take effect. Note that shutting the computer down and turning it back on is not the same as rebooting it.
So just reboot the computer and it should be fine.
I solved this issue by running the following in the Anaconda Prompt:
anaconda-navigator --reset

Spyder IDE fails to start on Windows 10 with Python 3.8 [duplicate]

This question already has answers here:
Jupyter Notebook with Python 3.8 - NotImplementedError
(4 answers)
Closed 3 years ago.
Note: this issue is fixed in Spyder 4.1.3!
(original question) While checking out Python 3.8 (x64) on Windows 10, I ran into trouble when trying to set up Spyder. Note: the issue was reproducible with a fresh Python installation on a clean Windows 10 system. However, there were no such issues on Linux (tested on Debian / Mint 19.x).
At first, everything went smoothly during installation via pip install spyder.
error #1: pywin32
After starting Spyder, it said in the IPython console window:
Traceback (most recent call last):
File "c:\users\USERNAME\appdata\local\programs\python\python38\lib\site-packages\spyder\plugins\ipythonconsole.py", line 1572, in create_kernel_manager_and_kernel_client
kernel_manager.start_kernel(stderr=stderr_handle)
File "c:\users\USERNAME\appdata\local\programs\python\python38\lib\site-packages\jupyter_client\manager.py", line 240, in start_kernel
self.write_connection_file()
File "c:\users\USERNAME\appdata\local\programs\python\python38\lib\site-packages\jupyter_client\connect.py", line 470, in write_connection_file
self.connection_file, cfg = write_connection_file(self.connection_file,
File "c:\users\USERNAME\appdata\local\programs\python\python38\lib\site-packages\jupyter_client\connect.py", line 141, in write_connection_file
with secure_write(fname) as f:
File "c:\users\USERNAME\appdata\local\programs\python\python38\lib\contextlib.py", line 113, in __enter__
return next(self.gen)
File "c:\users\USERNAME\appdata\local\programs\python\python38\lib\site-packages\jupyter_core\paths.py", line 424, in secure_write
win32_restrict_file_to_user(fname)
File "c:\users\USERNAME\appdata\local\programs\python\python38\lib\site-packages\jupyter_core\paths.py", line 359, in win32_restrict_file_to_user
import win32api
ImportError: DLL load failed while importing win32api: The specified module could not be found.
I was able to fix the import error by running pywin32_postinstall.py -install from the scripts folder (from a cmd prompt with elevated rights). That copies pythoncom38.dll and pywintypes38.dll from \Lib\site-packages\pywin32_system32 to \windows\system32, see also here. However, I'd suggest not modifying system folders and instead using the option I put in my answer below.
error #2: tornado
However, now Spyder just freezes at the loading screen (logo displayed, saying something like "initializing main window")!
Cloning the dev version of Spyder from https://github.com/spyder-ide/spyder.git and running it via python bootstrap.py --debug reveals the cause of the freeze:
2019-11-03 17:39:53,261 [ERROR] [tornado.application] -> Exception in callback functools.partial(<function ThreadedZMQSocketChannel.__init__.<locals>.setup_stream at 0x0000015E00B758B0>)
Traceback (most recent call last):
File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python38\lib\site-packages\tornado\ioloop.py", line 743, in _run_callback
ret = callback()
File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python38\lib\site-packages\jupyter_client\threaded.py", line 48, in setup_stream
self.stream = zmqstream.ZMQStream(self.socket, self.ioloop)
File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python38\lib\site-packages\zmq\eventloop\zmqstream.py", line 127, in __init__
self._init_io_state()
File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python38\lib\site-packages\zmq\eventloop\zmqstream.py", line 546, in _init_io_state
self.io_loop.add_handler(self.socket, self._handle_events, self.io_loop.READ)
File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python38\lib\site-packages\tornado\platform\asyncio.py", line 99, in add_handler
self.asyncio_loop.add_reader(fd, self._handle_events, fd, IOLoop.READ)
File "C:\Users\USERNAME\AppData\Local\Programs\Python\Python38\lib\asyncio\events.py", line 501, in add_reader
raise NotImplementedError
NotImplementedError
...so it seems the import error caused by the Python 3.8 version of pywin32 is only one issue. There's also a problem related to tornado IO (web server), see here / here.
Last checked with Python 3.8.2 (AMD64), Spyder 4.1.1. Please note that I am not using Anaconda. Use either conda or pip, not both.
Spyder 4.1.3 Update: The issue is fixed!
(Tested on Python 3.8.3rc1, tornado 6.0.4)
If you come here still experiencing similar startup issues with Spyder, the first thing I'd suggest trying is an upgrade to Spyder version >= 4.1.3.
older version of this answer
workaround, tornado issue:
Modify the file ...\Python38...\Lib\site-packages\tornado\platform\asyncio.py;
add
import sys
if sys.platform == 'win32':
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
after the other import statements. Source: here on SO and also linked here. If I get this post on the tornado repo right, this is likely to be a pretty permanent workaround.
if also needed - workaround, pywin32 issue:
Modify the file ...\Python38\Lib\site-packages\jupyter_core\paths.py;
add a line
import pywintypes
before import win32api in line 359. This modification is based on this post.
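For your own scripts that use tornado (as opposed to Spyder itself), the same policy switch as in the tornado workaround above can be made in startup code, without editing files under site-packages. A minimal sketch, assuming it runs before tornado creates its event loop; the function name is made up for illustration. The background is that Python 3.8 changed the default Windows event loop to the proactor loop, which does not implement add_reader, and that is where the NotImplementedError in the traceback comes from.

```python
import asyncio
import sys


def use_selector_event_loop_on_windows():
    """Switch asyncio to the selector event loop on Windows with Python 3.8+.

    Returns True if the policy was changed, False if nothing was done.
    """
    if sys.platform == "win32" and sys.version_info >= (3, 8):
        # WindowsSelectorEventLoopPolicy only exists on Windows builds of Python.
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
        return True
    return False
```

Call it once at the very top of the program, before importing anything that starts an IOLoop. On Linux or macOS it does nothing and returns False.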

Resources