Environment variables in Pyspark - apache-spark

I have installed Hadoop in cluster mode and now I have installed Spark. I want to use pyspark, and this is my .bashrc:
# User specific aliases and functions
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:/opt/hadoop/spark/bin:/opt/hadoop/spark/sbin
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
# These variables are added for Spark
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export SPARK_HOME=/opt/hadoop/spark
# For pyspark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/python:$PATH
export PYSPARK_PYTHON=/usr/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7
When I run the pyspark command the following happens:
[hadoop@nodo1 ~]$ pyspark
Python 2.7.5 (default, Nov 16 2020, 22:23:17)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "/opt/hadoop/spark/python/pyspark/shell.py", line 29, in <module>
from pyspark.context import SparkContext
File "/opt/hadoop/spark/python/pyspark/__init__.py", line 53, in <module>
from pyspark.rdd import RDD, RDDBarrier
File "/opt/hadoop/spark/python/pyspark/rdd.py", line 34, in <module>
from pyspark.java_gateway import local_connect_and_auth
File "/opt/hadoop/spark/python/pyspark/java_gateway.py", line 31, in <module>
from pyspark.find_spark_home import _find_spark_home
File "/opt/hadoop/spark/python/pyspark/find_spark_home.py", line 68
print("Could not find valid SPARK_HOME while searching {0}".format(paths), file=sys.stderr)
^
SyntaxError: invalid syntax
I am using
Hadoop 3.2.3
Spark 3.1.2
Python 2.7.5
CentOS 7
Where is the error?

The problem was the Python version. Installing Python 3 fixed the problem, leaving the following environment variables:
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:/opt/hadoop/spark/bin:/opt/hadoop/spark/sbin
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export SPARK_HOME=/opt/hadoop/spark
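If python3 is not the default interpreter on the nodes, it may also be necessary to point PySpark at it explicitly. A minimal sketch, assuming python3 was installed at /usr/bin/python3 (the path is an assumption; check it with which python3):
# Point both the PySpark workers and the driver at the Python 3 interpreter
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3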

Related

Unable to import netmiko module

I have a problem when importing the netmiko module. I've installed python3-pip and am trying to import netmiko:
root@Python,Go,Perl,PHP-1:~# python3
Python 3.5.2 (default, Oct 8 2019, 13:06:37)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import netmiko
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/netmiko/__init__.py", line 7, in <module>
from netmiko.ssh_dispatcher import ConnectHandler
File "/usr/local/lib/python3.5/dist-packages/netmiko/ssh_dispatcher.py", line 2, in <module>
from netmiko.a10 import A10SSH
File "/usr/local/lib/python3.5/dist-packages/netmiko/a10/__init__.py", line 1, in <module>
from netmiko.a10.a10_ssh import A10SSH
File "/usr/local/lib/python3.5/dist-packages/netmiko/a10/a10_ssh.py", line 3, in <module>
from netmiko.cisco_base_connection import CiscoSSHConnection
File "/usr/local/lib/python3.5/dist-packages/netmiko/cisco_base_connection.py", line 143
msg = f"Login failed: {self.host}"
^
SyntaxError: invalid syntax
As you can see, the SyntaxError you are receiving happens at this line:
msg = f"Login failed: {self.host}"
This is because your Python version is 3.5, while f-strings were introduced in Python 3.6 with PEP 498. Upgrading your Python version to 3.6 or later will resolve the issue.
Netmiko 3.x.x (and going forward) requires Python 3.6 or greater. Netmiko 2.4.2 is the last version to support Python 2.7 (or Python 3.5).
F-strings, as mentioned above, are one thing that will break if you try to use Netmiko 3.x.x with Python 3.5.
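If upgrading Python is not an option right away, a stopgap is to pin Netmiko to the last release that still supports Python 3.5, as noted above. A minimal sketch, assuming pip3 targets this interpreter:
# Replace the incompatible Netmiko 3.x with the last Python 3.5-compatible release
pip3 uninstall -y netmiko
pip3 install "netmiko==2.4.2"
# Confirm the import now succeeds
python3 -c "import netmiko; print(netmiko.__version__)"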

How to install nest in Python3 on Ubuntu 18.04

After following the Ubuntu/Debian installation instructions for the NEST simulator, I can only import the nest module in Python 2.x, not Python 3.x:
$ python3
Python 3.6.8 (default, Aug 20 2019, 17:12:48)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nest
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/nest/lib/python2.7/site-packages/nest/__init__.py", line 26, in <module>
from . import ll_api # noqa
File "/nest/lib/python2.7/site-packages/nest/ll_api.py", line 72, in <module>
from . import pynestkernel as kernel # noqa
ImportError: dynamic module does not define module export function (PyInit_pynestkernel)
The default install compiles against the default Python version, which is still Python 2 on Ubuntu.
To use Python 3, run:
cmake -Dwith-python=3 -DCMAKE_INSTALL_PREFIX:PATH=</install/path> </path/to/NEST/src>
This is mentioned in the NEST documentation.
NB: don't forget to clear the build folder to avoid issues
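Putting the two notes together, a rebuild could look like the sketch below; the source and install paths are placeholders, and the steps assume the standard out-of-source CMake build that the NEST instructions describe:
# Start from a clean build directory so stale Python 2 artifacts are not reused
rm -rf build && mkdir build && cd build
# Configure against Python 3 (paths are placeholders, adjust to your layout)
cmake -Dwith-python=3 -DCMAKE_INSTALL_PREFIX:PATH=/opt/nest /path/to/NEST/src
make && make install
# The install prefix may need to be on PYTHONPATH before the import works, e.g.:
# export PYTHONPATH=/opt/nest/lib/python3.6/site-packages:$PYTHONPATH
python3 -c "import nest"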

running spark 2.3 with python 3.x on yarn

I am trying to run the example pi.py using spark-submit, but I am getting the following error:
Python 3.6.5
[GCC 4.3.4 [gcc-4_3-branch revision 152973]] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "/var/lib/spark/python/pyspark/shell.py", line 31, in <module>
from pyspark import SparkConf
File "/var/lib/spark/python/pyspark/__init__.py", line 110, in <module>
from pyspark.sql import SQLContext, HiveContext, Row
File "/var/lib/spark/python/pyspark/sql/__init__.py", line 45, in <module>
from pyspark.sql.types import Row
File "/var/lib/spark/python/pyspark/sql/types.py", line 27, in <module>
import ctypes
File "Python-3.6.5_suse/lib/python3.6/ctypes/__init__.py", line 7, in <module>
from _ctypes import Union, Structure, Array
ImportError: libffi.so.4: cannot open shared object file: No such file or directory
I am new to Python and Spark, but when I set the PYSPARK_PYTHON path in spark-defaults.sh to some older version of Python, like 3.3.x, it works perfectly fine.
Am I setting anything wrong, or do I need some other library? This looks like a library issue.
Thanks!
I found what the problem was! My small YARN cluster has hosts with different operating systems, some SUSE and some CentOS, and the PYSPARK_PYTHON I set in spark-env.sh pointed to a single central Python path, so the libraries didn't match and the libffi.so error was thrown. Checking the host OS against the Python library path was what helped. Once I set the correct path and ran
./bin/spark-submit --deploy-mode client examples/src/main/python/pi.py
then I could verify that the local libraries were set properly. I didn't need to install any additional Python libraries such as pyspark or py4j, as suggested in comments or other answers.
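For a heterogeneous cluster like this one, a possible pattern is to point PySpark at an interpreter path that exists on every node, either centrally in conf/spark-env.sh or per job at submit time. A sketch, with the interpreter path as an assumption that must be valid on all hosts:
# conf/spark-env.sh: interpreter path that exists on every node (assumed)
export PYSPARK_PYTHON=/usr/bin/python3
# or equivalently per job, using the spark.pyspark.python property
./bin/spark-submit --deploy-mode client \
    --conf spark.pyspark.python=/usr/bin/python3 \
    examples/src/main/python/pi.py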

How do I configure the sqlite3 module to work with Django 1.10?

So the issue is that Django apparently uses the sqlite3 module that is included with Python. I have sqlite3 on my computer and it works fine on its own. I have tried many things to fix this and have not found a solution yet.
Please let me know how I can fix this issue so that I can use Django on my computer.
:~$ python
Python 3.5.2 (default, Nov 6 2016, 14:10:16)
[GCC 6.2.0 20161005] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/sqlite3/__init__.py", line 23, in <module>
from sqlite3.dbapi2 import *
File "/usr/local/lib/python3.5/sqlite3/dbapi2.py", line 27, in <module>
from _sqlite3 import *
ImportError: No module named '_sqlite3'
>>> exit()
I figured out that this error was caused by changing my Python path from the default of 2.7 to 3.5.
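For anyone hitting the same ImportError with a self-built Python, the usual cause is that the SQLite development headers were missing when Python was compiled, so the _sqlite3 extension was never built. A hedged sketch of the common fix, assuming a Debian/Ubuntu host and a Python built from source (package name and source path are assumptions):
# Install the SQLite headers, then rebuild Python so _sqlite3 gets compiled
sudo apt-get install libsqlite3-dev
cd /path/to/Python-3.5.2    # placeholder for your Python source tree
./configure && make && sudo make install
# Verify the module is now importable
python3 -c "import sqlite3; print(sqlite3.sqlite_version)"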

canopy linux running error: missing libmkl_intel_lp64.so

I am a paid user and I have installed Canopy on a Red Hat server and got the virtual environment configured. In the virtual env, python is the one contained in the environment:
(User) $ which python
~/Enthought/Canopy_64bit/User/bin/python
But I cannot import numpy from python:
(User) $ python
Enthought Canopy Python 2.7.3 | 64-bit | (default, Mar 25 2013, 15:55:17)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/canopy/appdata/canopy-1.0.0.1160.rh5-x86_64/lib/python2.7/site-packages/numpy/__init__.py", line 148, in <module>
import add_newdocs
File "/usr/lib/canopy/appdata/canopy-1.0.0.1160.rh5-x86_64/lib/python2.7/site-packages/numpy/add_newdocs.py", line 9, in <module>
from numpy.lib import add_newdoc
File "/usr/lib/canopy/appdata/canopy-1.0.0.1160.rh5-x86_64/lib/python2.7/site-packages/numpy/lib/__init__.py", line 13, in <module>
from polynomial import *
File "/usr/lib/canopy/appdata/canopy-1.0.0.1160.rh5-x86_64/lib/python2.7/site-packages/numpy/lib/polynomial.py", line 17, in <module>
from numpy.linalg import eigvals, lstsq
File "/usr/lib/canopy/appdata/canopy-1.0.0.1160.rh5-x86_64/lib/python2.7/site-packages/numpy/linalg/__init__.py", line 48, in <module>
from linalg import *
File "/usr/lib/canopy/appdata/canopy-1.0.0.1160.rh5-x86_64/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 23, in <module>
from numpy.linalg import lapack_lite
ImportError: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory
I tried to update numpy with enpkg, but the above error still shows up when importing numpy.
(User) $ enpkg numpy
prefix: /home/wchen06/canopy_virtual
MKL-10.3-1.egg [fetching]
74.60 MB [.................................................................]
numpy-1.6.1-5.egg [fetching]
3.33 MB [.................................................................]
MKL-10.3-1.egg [installing]
248.04 MB [.................................................................]
numpy-1.6.1-5.egg [installing]
11.20 MB [.................................................................]
Please help.
Sorry for the late reply! This is due to a bug in Canopy 1.0.0 (beta) for Linux. For a workaround, please see https://support.enthought.com/entries/21656595-ImportError-libmkl-intel-lp64-so-cannot-open-shared-object-file
