Getting the below error while connecting to a PySpark session - python-3.x

I am overwriting the default Python with the Anaconda3 Python. This is on AWS Linux 2, and I was able to set the python and pip aliases as shown below. I also tried exporting the PySpark interpreter variables as below, but the change is not reflected. How can I fix this?
###Set PySpark
sudo tee -a /etc/skel/.bashrc <<"EOF"
export PYSPARK_PYTHON="/apps/softwares/anaconda3/bin/python"
export PYSPARK_DRIVER_PYTHON="/apps/softwares/anaconda3/bin/python"
EOF
source ~/.bashrc
$ which python
alias python='/apps/softwares/anaconda3/bin/python'
/apps/softwares/anaconda3/bin/python
$ which pip
alias pip='/apps/softwares/anaconda3/bin/pip'
/apps/softwares/anaconda3/bin/pip
$ which pyspark
/usr/bin/pyspark
$ python
Python 3.8.3 (default, Jul 2 2020, 16:21:59)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
$ pyspark
Python 3.8.3 (default, Jul 2 2020, 16:21:59)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/shell.py", line 31, in <module>
from pyspark import SparkConf
File "/usr/lib/spark/python/pyspark/__init__.py", line 51, in <module>
from pyspark.context import SparkContext
File "/usr/lib/spark/python/pyspark/context.py", line 31, in <module>
from pyspark import accumulators
File "/usr/lib/spark/python/pyspark/accumulators.py", line 97, in <module>
from pyspark.serializers import read_int, PickleSerializer
File "/usr/lib/spark/python/pyspark/serializers.py", line 72, in <module>
from pyspark import cloudpickle
File "/usr/lib/spark/python/pyspark/cloudpickle.py", line 145, in <module>
_cell_set_template_code = _make_cell_set_template_code()
File "/usr/lib/spark/python/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code return types.CodeType(
TypeError: an integer is required (got type bytes)
>>>
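This traceback is the same failure described in the related "TypeError: an integer is required (got type bytes)" question below: the cloudpickle module bundled with Spark 2.4.x calls types.CodeType with an argument list that changed in Python 3.8. A minimal sketch of one workaround, pointing PySpark at a Python 3.7 interpreter instead; the conda environment name and its path are assumptions:
# /etc/skel/.bashrc only seeds the .bashrc of accounts created later,
# so put the exports in your own ~/.bashrc and point them at a Python 3.7
# interpreter that Spark 2.4 can use.
/apps/softwares/anaconda3/bin/conda create -y -n py37 python=3.7   # env name "py37" is an assumption
cat <<'EOF' >> ~/.bashrc
export PYSPARK_PYTHON=/apps/softwares/anaconda3/envs/py37/bin/python
export PYSPARK_DRIVER_PYTHON=/apps/softwares/anaconda3/envs/py37/bin/python
EOF
source ~/.bashrc
pyspark
Alternatively, Spark 3.x supports Python 3.8, so upgrading Spark avoids changing the interpreter at all.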

Related

Why is the latest python3 pandas update not working?

python3.9.x
python3
Python 3.9.2 (default, Mar 12 2021, 04:06:34)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pi/.local/lib/python3.9/site-packages/pandas/__init__.py", line 22, in <module>
from pandas.compat import (
File "/home/pi/.local/lib/python3.9/site-packages/pandas/compat/__init__.py", line 15, in <module>
from pandas.compat.numpy import (
File "/home/pi/.local/lib/python3.9/site-packages/pandas/compat/numpy/__init__.py", line 7, in <module>
from pandas.util.version import Version
File "/home/pi/.local/lib/python3.9/site-packages/pandas/util/__init__.py", line 1, in <module>
from pandas.util._decorators import ( # noqa
File "/home/pi/.local/lib/python3.9/site-packages/pandas/util/_decorators.py", line 14, in <module>
from pandas._libs.properties import cache_readonly # noqa
File "/home/pi/.local/lib/python3.9/site-packages/pandas/_libs/__init__.py", line 13, in <module>
from pandas._libs.interval import Interval
File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 48 from C header, got 40 from PyObject
Then, when I update numpy, it says that it needs version 3.9+.
Pandas is not officially supported for the Python version you are using. I would advise uninstalling 3.9 and installing 3.7.
Before updating or installing new packages/libraries, always make sure they are compatible with whatever Python version you are using.
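As an alternative to downgrading Python, this particular ValueError usually means pandas' compiled extensions were built against a newer numpy than the one installed, so bringing the two back in step often clears it. A minimal sketch, assuming pip is available for the python3 in use (interpreter name is an assumption):
# Bring numpy up to the version pandas was compiled against
python3 -m pip install --upgrade numpy
# If the error persists, reinstall pandas so its extensions match the installed numpy
python3 -m pip install --upgrade --force-reinstall pandas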

How to fix "TypeError: an integer is required (got type bytes)" error while running import pyspark

Spark version 2.4.5, Python version 3.8.2.
I got the below error:
VirtualBox:~/spark-2.4.5-bin-hadoop2.7/python$ python3
Python 3.8.2 (default, Apr 27 2020, 15:53:34)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/prasanth/spark-2.4.5-bin-hadoop2.7/python/pyspark/__init__.py", line 51, in <module>
from pyspark.context import SparkContext
File "/home/prasanth/spark-2.4.5-bin-hadoop2.7/python/pyspark/context.py", line 31, in <module>
from pyspark import accumulators
File "/home/prasanth/spark-2.4.5-bin-hadoop2.7/python/pyspark/accumulators.py", line 97, in <module>
from pyspark.serializers import read_int, PickleSerializer
File "/home/prasanth/spark-2.4.5-bin-hadoop2.7/python/pyspark/serializers.py", line 72, in <module>
from pyspark import cloudpickle
File "/home/prasanth/spark-2.4.5-bin-hadoop2.7/python/pyspark/cloudpickle.py", line 145, in <module>
_cell_set_template_code = _make_cell_set_template_code()
File "/home/prasanth/spark-2.4.5-bin-hadoop2.7/python/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
return types.CodeType(
TypeError: an integer is required (got type bytes)
This issue happens because Spark 2.4.x is not compatible with Python 3.8.x. Please use Python 3.7; you can do it with this command:
PYSPARK_PYTHON=python3.7 pyspark
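To avoid typing the variable on every launch, the same setting can be made persistent. A small sketch, assuming python3.7 is on the PATH and Spark lives in ~/spark-2.4.5-bin-hadoop2.7 (taken from the prompt above; adjust otherwise):
# Persist the interpreter choice for interactive shells
echo 'export PYSPARK_PYTHON=python3.7' >> ~/.bashrc
echo 'export PYSPARK_DRIVER_PYTHON=python3.7' >> ~/.bashrc
source ~/.bashrc

# Or set it in Spark's own config so spark-submit picks it up as well
echo 'export PYSPARK_PYTHON=python3.7' >> ~/spark-2.4.5-bin-hadoop2.7/conf/spark-env.sh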

how to solve import readline error in python 3 linux install

I am getting the below error while installing Python 3 on SUSE SLES 12 SP3. How do I get rid of (or skip) the import readline error? Am I missing some step?
:/tmp/Python-3.6.4> ./configure
./configure --prefix=/usr/local
make altinstall
: /usr/local/lib64> sudo ln -s /usr/local/lib/python3.6/lib-dynload/
/usr/local/lib> python3
Python 3.6.3 (default, Jan 8 2018, 10:26:56)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
Failed calling sys.__interactivehook__
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site.py", line 387, in register_readline
import readline
File "/usr/local/lib/python3.6/site-packages/readline.py", line 6, in <module>
from pyreadline.rlmain import Readline
File "/usr/local/lib/python3.6/site-packages/pyreadline/__init__.py", line 12, in <module>
from . import logger, clipboard, lineeditor, modes, console
File "/usr/local/lib/python3.6/site-packages/pyreadline/clipboard/__init__.py", line 13, in <module>
from .win32_clipboard import GetClipboardText, SetClipboardText
File "/usr/local/lib/python3.6/site-packages/pyreadline/clipboard/win32_clipboard.py", line 37, in <module>
import ctypes.wintypes as wintypes
File "/usr/local/lib/python3.6/ctypes/wintypes.py", line 20, in <module>
class VARIANT_BOOL(ctypes._SimpleCData):
ValueError: _type_ 'v' not supported
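The traceback shows that a pip-installed readline.py shim is delegating to pyreadline, a Windows-only readline replacement, which fails on Linux as soon as it touches ctypes.wintypes. One way past it is to remove that shim and make sure the real readline headers are present before rebuilding; a sketch, assuming the interpreter was installed to /usr/local and that the zypper package name matches your SLES repositories:
# Remove the Windows-only readline shim so CPython can use its own readline module
/usr/local/bin/python3.6 -m pip uninstall -y pyreadline
# (if a stray /usr/local/lib/python3.6/site-packages/readline.py remains, delete it too)

# Install the GNU readline headers, then rebuild so the readline extension is compiled in
sudo zypper install readline-devel
cd /tmp/Python-3.6.4
./configure --prefix=/usr/local
make
sudo make altinstall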

Importing package in python error

The import procedure in Python:
utina@utinax55:~$ python
Python 2.7.11+ (default, Apr 17 2016, 14:00:29)
[GCC 5.3.1 20160413] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lal
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/utina/.local/lib/python2.7/site-packages/lal/__init__.py", line 2, in <module>
from .lal import *
File "/home/utina/.local/lib/python2.7/site-packages/lal/lal.py", line 28, in <module>
_lal = swig_import_helper()
File "/home/utina/.local/lib/python2.7/site-packages/lal/lal.py", line 24, in swig_import_helper
_mod = imp.load_module('_lal', fp, pathname, description)
ImportError: libgsl.so.19: cannot open shared object file: No such file or directory
>>> exit
The libgsl.so.19 path is:
utina@utinax55:~$ locate libgsl
/usr/lib/i386-linux-gnu/libgsl.so.19
/usr/lib/i386-linux-gnu/libgsl.so.19.0.0
/usr/lib/i386-linux-gnu/libgslcblas.so.0
/usr/lib/i386-linux-gnu/libgslcblas.so.0.0.0
/usr/lib/x86_64-linux-gnu/libgsl.a
/usr/lib/x86_64-linux-gnu/libgsl.so
/usr/lib/x86_64-linux-gnu/libgsl.so.19
/usr/lib/x86_64-linux-gnu/libgsl.so.19.0.0
/usr/lib/x86_64-linux-gnu/libgslcblas.a
/usr/lib/x86_64-linux-gnu/libgslcblas.so
/usr/lib/x86_64-linux-gnu/libgslcblas.so.0
/usr/lib/x86_64-linux-gnu/libgslcblas.so.0.0.0
/usr/share/doc/libgsl-dbg
/usr/share/doc/libgsl-dev
/usr/share/doc/libgsl2
/usr/share/lintian/overrides/libgsl2
/var/cache/apt/archives/libgsl-dbg_2.1+dfsg-2_amd64.deb
/var/cache/apt/archives/libgsl-dev_2.1+dfsg-2_amd64.deb
/var/cache/apt/archives/libgsl2_2.1+dfsg-2_amd64.deb
/var/lib/dpkg/info/libgsl-dbg:amd64.list
/var/lib/dpkg/info/libgsl-dbg:amd64.md5sums
My library path:
utina@utinax55:~$ echo $LD_LIBRARY_PATH
/usr/lib/x86_64-linux-gnu/
I tried adding this path to /etc/ld.so.conf and running "sudo ldconfig", but these changes, suggested in previous posts, did not fix the import error in Python.
I should also mention that I installed the libgsl dependency, and the lal packages were installed using the Synaptic package manager.
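Since the locate output above shows libgsl.so.19 present under /usr/lib/x86_64-linux-gnu, it is worth checking what the dynamic linker cache and the extension module itself actually resolve; a small diagnostic sketch (the _lal.so filename is an assumption based on the traceback):
# Is libgsl.so.19 known to the dynamic linker cache?
ldconfig -p | grep libgsl

# Which shared libraries does the lal extension need, and which fail to resolve?
ldd /home/utina/.local/lib/python2.7/site-packages/lal/_lal.so | grep "not found"
If ldd still reports libgsl.so.19 as not found even though the 64-bit copy exists, the lal build may have been linked against a differently named or 32-bit copy, in which case rebuilding lal against the installed GSL is the usual way out.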

UPDATE: Py 3.5.2 + Matplotlib: error messages

Sorry for the second post, but I wanted to include the errors I'm getting.
"import matplotlib" works.
get_backend() returns "TkAgg" (change made in the RC file),
but "import matplotlib.pyplot as plt" returns:
Python 3.4.2 (default, Oct 19 2014, 13:31:11)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib.pyplot as plt
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.4/dist-packages/matplotlib-1.5.0-py3.4-linux-armv7l.egg/matplotlib/pyplot.py", line 114, in
_backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
File "/usr/local/lib/python3.4/dist-packages/matplotlib-1.5.0-py3.4-linux-armv7l.egg/matplotlib/backends/init.py", line 32, in pylab_setup
globals(),locals(),[backend_name],0)
File "/usr/local/lib/python3.4/dist-packages/matplotlib-1.5.0-py3.4-linux-armv7l.egg/matplotlib/backends/backend_tkagg.py", line 13, in
import matplotlib.backends.tkagg as tkagg
File "/usr/local/lib/python3.4/dist-packages/matplotlib-1.5.0-py3.4-linux-armv7l.egg/matplotlib/backends/tkagg.py", line 9, in
from matplotlib.backends import _tkagg
ImportError: cannot import name '_tkagg'
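matplotlib only builds its _tkagg extension when the Tk development headers are present at build time, so a missing '_tkagg' in a locally built egg usually means they were absent. A sketch of the usual remedy on a Debian-style armv7l system like the one in the egg path (package names and pip flags are assumptions for that setup):
# Install Tk plus its headers, then rebuild matplotlib so _tkagg gets compiled
sudo apt-get install python3-tk tk-dev
python3 -m pip install --force-reinstall --no-binary matplotlib matplotlib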
