Running Spark 2.3 with Python 3.x on YARN

I am trying to run the example pi.py with spark-submit, but I get the following error:
Python 3.6.5
[GCC 4.3.4 [gcc-4_3-branch revision 152973]] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "/var/lib/spark/python/pyspark/shell.py", line 31, in <module>
from pyspark import SparkConf
File "/var/lib/spark/python/pyspark/__init__.py", line 110, in <module>
from pyspark.sql import SQLContext, HiveContext, Row
File "/var/lib/spark/python/pyspark/sql/__init__.py", line 45, in <module>
from pyspark.sql.types import Row
File "/var/lib/spark/python/pyspark/sql/types.py", line 27, in <module>
import ctypes
File "Python-3.6.5_suse/lib/python3.6/ctypes/__init__.py", line 7, in <module>
from _ctypes import Union, Structure, Array
ImportError: libffi.so.4: cannot open shared object file: No such file or directory
I am new to Python and Spark, but when I set the PYSPARK_PYTHON path in spark-defaults.sh to an older version of Python such as 3.3.x, it works perfectly fine.
Am I setting something wrong, or do I need another library? This looks like a library issue.
Thanks!

I found the problem. My small YARN cluster has hosts running different operating systems, some SUSE and some CentOS. When I set PYSPARK_PYTHON in spark-env.sh, that configuration used a central Python path, so the shared libraries did not match every host's OS and Python threw the libffi.so error. Checking each host's OS type against the Python library path was what helped. Once I set the correct path and ran
./bin/spark-submit --deploy-mode client examples/src/main/python/pi.py
then I could verify the local libraries were set properly. I didn't need to install any additional python libraries such as pyspark or py4j as suggested in comments or other answers.
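As a rough sketch of the "check the host OS against the Python path" step, the snippet below reads the ID field from /etc/os-release content and maps it to an interpreter path. It assumes the hosts expose /etc/os-release; the function names and the paths in PYTHON_BY_OS are hypothetical placeholders, not values from my cluster.

```python
# Sketch only: choose a Python build that matches the host OS family.
# The paths below are hypothetical placeholders, not real cluster paths.
PYTHON_BY_OS = {
    "sles": "/opt/python-suse/bin/python3",      # hypothetical SUSE build
    "centos": "/opt/python-centos/bin/python3",  # hypothetical CentOS build
}


def os_family(os_release_text):
    """Return the ID= value from /etc/os-release content, or None if absent."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        # Match the exact key so ID_LIKE and VERSION_ID are ignored.
        if sep and key.strip() == "ID":
            return value.strip().strip('"').lower()
    return None


def python_for_host(os_release_text, default="python3"):
    """Return the interpreter path for this OS family, or a default."""
    return PYTHON_BY_OS.get(os_family(os_release_text), default)
```

On each host, the result of python_for_host(open("/etc/os-release").read()) could then be exported as PYSPARK_PYTHON in that host's spark-env.sh.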

Related

class YAMLObject(metaclass=YAMLObjectMetaclass): invalid syntax for docker-compose up command

My Linux system has two Python versions, 2.7 and 3.6. To use Python 3.6, I put /usr/local/lib64/python3.6/site-packages and /usr/local/lib/python3.6/site-packages at the top of PYTHONPATH in my .profile file, and there is no entry for the Python 2.7 path. Even so, sys.path still shows Python 2.7 entries (I don't know how they get there).
The issue is that docker-compose up gives me the error below. The docker-compose version is 1.29.2.
Traceback (most recent call last):
File "/usr/bin/docker-compose", line 7, in <module>
from compose.cli.main import main
File "/usr/lib/python2.7/site-packages/compose/cli/main.py", line 22, in <module>
from ..bundle import get_image_digests
File "/usr/lib/python2.7/site-packages/compose/bundle.py", line 12, in <module>
from .config.serialize import denormalize_config
File "/usr/lib/python2.7/site-packages/compose/config/__init__.py", line 6, in <module>
from .config import ConfigurationError
File "/usr/lib/python2.7/site-packages/compose/config/config.py", line 13, in <module>
import yaml
File "/usr/local/lib64/python3.6/site-packages/yaml/__init__.py", line 284
class YAMLObject(metaclass=YAMLObjectMetaclass):
^
SyntaxError: invalid syntax
If I add the Python 2.7 site-packages path at the top of PYTHONPATH, docker-compose works but other programs start failing. I don't want to use Python 2.7. How can I use only Python 3.6?
Please suggest.
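The traceback shows a Python 2.7 program importing yaml from a python3.6 directory, which is what happens when PYTHONPATH is shared by every interpreter on the machine. A minimal sketch for spotting such entries follows; the helper name foreign_paths is made up for illustration, and it only catches paths that contain a pythonX.Y component.

```python
import re
import sys

# Matches the "pythonX.Y" directory component found in site-packages paths.
_VERSIONED = re.compile(r"python(\d+)\.(\d+)")


def foreign_paths(paths, running=None):
    """Return entries whose pythonX.Y component differs from `running`.

    `running` defaults to the (major, minor) of the current interpreter.
    """
    if running is None:
        running = sys.version_info[:2]
    foreign = []
    for entry in paths:
        match = _VERSIONED.search(entry)
        if match and (int(match.group(1)), int(match.group(2))) != tuple(running):
            foreign.append(entry)
    return foreign
```

Running foreign_paths(sys.path) under each interpreter lists the entries that belong to the other version.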

google cloud bigquery import failing because "ImportError: cannot import name 'client_pb2' from 'google.api'"

I'm trying to import bigquery from google.cloud, but it's failing because there's a missing dependency. I'm using Python 3.7.1.
Here is the error I'm getting:
Python 3.7.1 (default, Dec 14 2018, 13:28:58)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.cloud import bigquery
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/anaconda3/lib/python3.7/site-packages/google/cloud/bigquery/__init__.py", line 35, in <module>
from google.cloud.bigquery.client import Client
File "/anaconda3/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 53, in <module>
from google.cloud.bigquery.dataset import Dataset
File "/anaconda3/lib/python3.7/site-packages/google/cloud/bigquery/dataset.py", line 24, in <module>
from google.cloud.bigquery.model import ModelReference
File "/anaconda3/lib/python3.7/site-packages/google/cloud/bigquery/model.py", line 27, in <module>
from google.cloud.bigquery_v2 import types
File "/anaconda3/lib/python3.7/site-packages/google/cloud/bigquery_v2/__init__.py", line 23, in <module>
from google.cloud.bigquery_v2 import types
File "/anaconda3/lib/python3.7/site-packages/google/cloud/bigquery_v2/types.py", line 23, in <module>
from google.cloud.bigquery_v2.proto import model_pb2
File "/anaconda3/lib/python3.7/site-packages/google/cloud/bigquery_v2/proto/model_pb2.py", line 28, in <module>
from google.api import client_pb2 as google_dot_api_dot_client__pb2
ImportError: cannot import name 'client_pb2' from 'google.api' (/anaconda3/lib/python3.7/site-packages/google/api/__init__.py)
I've tried upgrading, and uninstalling and reinstalling the "google-cloud-bigquery" and "google-api-python-client" libraries, but this error continues to occur.
I'm not sure how to resolve this error or how to debug it further. I thought it may have been my version of the package, but I haven't been able to replicate this issue on other computers. Is it possible this is occurring because of my version of Python, or because it's installed through Anaconda?
Edit: https://github.com/googleapis/google-cloud-python/issues/8674
Solution is there - upgrade googleapis-common-protos
As you mentioned in your post, the solution is to upgrade the googleapis-common-protos module:
pip install --upgrade googleapis-common-protos
The common protos are dependencies shared throughout the Google API ecosystem, and they are made available for use as dependencies elsewhere, as in BigQuery.
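To confirm which version of the distribution is actually installed in the environment that fails, something like the sketch below may help. The helper name installed_version is an assumption for illustration; the pkg_resources fallback is there because importlib.metadata only exists on Python 3.8 and later, and the question uses 3.7.1.

```python
def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # setuptools fallback for older interpreters

        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

Calling installed_version("googleapis-common-protos") before and after the upgrade shows whether pip changed the environment that Python is really importing from.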

Command line application shoogle requires python3 but Ubuntu 16 defaults to python 2.7

The command-line application shoogle (https://github.com/tokland/shoogle), which exposes Google API services at the terminal, requires Python 3, but the default Python on Ubuntu 16 is 2.7.
I have tried an alias, and I have tried calling shoogle through subprocess from a Python 3 shell, but (of course) the OS still uses the default interpreter. I have been reluctant to make system-wide changes, e.g. to .bashrc or PYTHONPATH, because so many other resources expect 2.7. I am currently using a virtual machine, so I can recover if something breaks. A system-wide change seems to be the only option, but it would be impractical in a production environment.
I've found very little shoogle help online (the author suggests SO etc. for support), so if anyone has experience with shoogle or suggestions for getting the required Python version, I'd be happy to hear them.
Running shoogle from a python3 interpreter finds the 2.7 files:
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]
Type "help", "copyright", "credits" or "license" for more information.
>>> from subprocess import call
>>> call(['shoogle', 'show'])
Traceback (most recent call last):
File "/usr/local/bin/shoogle", line 11, in <module>
import shoogle
File "/usr/local/lib/python2.7/dist-packages/shoogle/__init__.py", line 5, in <module>
from .shoogle import *
File "/usr/local/lib/python2.7/dist-packages/shoogle/shoogle.py", line 14, in <module>
from . import commands
File "/usr/local/lib/python2.7/dist-packages/shoogle/commands/__init__.py", line 2, in <module>
from . import execute
Following @Surest Texan's suggestions, I uninstalled shoogle and reinstalled it with pip3 install. Now the application works as expected when called from the command line.
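The reason subprocess did not help is that a script installed by pip carries a shebang line naming the interpreter that installed it, so the calling shell's Python version does not matter. A small sketch for checking this is below; read_shebang is a hypothetical helper name, and /usr/local/bin/shoogle is the script path taken from the traceback.

```python
def read_shebang(script_path):
    """Return the interpreter named on the script's first line, or None."""
    with open(script_path, "rb") as fh:
        first = fh.readline().decode("utf-8", errors="replace").strip()
    # A shebang line starts with "#!" followed by the interpreter path.
    return first[2:].strip() if first.startswith("#!") else None
```

For example, read_shebang("/usr/local/bin/shoogle") should name a Python 2 interpreter after a pip install and a Python 3 interpreter after a pip3 install.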

How do I configure the sqlite3 module to work with Django 1.10?

The issue is that Django apparently uses the sqlite3 module bundled with Python. I have sqlite3 installed on my computer and it works fine on its own. I have tried many things to fix this and have not found a solution yet.
Please let me know how I can fix this issue so that I can use Django on my computer.
:~$ python
Python 3.5.2 (default, Nov 6 2016, 14:10:16)
[GCC 6.2.0 20161005] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/sqlite3/__init__.py", line 23, in <module>
from sqlite3.dbapi2 import *
File "/usr/local/lib/python3.5/sqlite3/dbapi2.py", line 27, in <module>
from _sqlite3 import *
ImportError: No module named '_sqlite3'
>>> exit()
I figured out that this error was caused by me changing my python path to 3.5 from the default of 2.7.
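A quick way to see which interpreter is running and whether it has the _sqlite3 extension at all is sketched below. The helper name extension_status is made up for illustration. The /usr/local/lib/python3.5 path in the traceback suggests a source build, which only includes _sqlite3 if the SQLite development headers were present at build time, but that is an assumption about this setup.

```python
import importlib.util
import sys


def extension_status(name="_sqlite3"):
    """Report the running interpreter and whether module `name` can be found."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "module": name,
        "found": spec is not None,
        # Where the module would be loaded from, or None if it is missing.
        "origin": getattr(spec, "origin", None),
    }
```

Running print(extension_status()) under both python and python3 shows which of the two installations is missing the extension.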

Sublime Text: Change default python.dll file

When I try to run some Python code (which works in Anaconda Spyder) in Sublime Text, I get this error:
Python 3.5.1 |Anaconda 2.4.0 (64-bit)| (default, Dec 7 2015, 15:00:12) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\pandas\__init__.py", line 7, in <module>
from pandas import hashtable, tslib, lib
File "pandas\src\numpy.pxd", line 157, in init pandas.hashtable (pandas\hashtable.c:38262)
File "C:\Anaconda3\lib\site-packages\numpy\__init__.py", line 200, in <module>
from . import add_newdocs
File "C:\Anaconda3\lib\site-packages\numpy\add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc
File "C:\Anaconda3\lib\site-packages\numpy\lib\__init__.py", line 8, in <module>
from .type_check import *
File "C:\Anaconda3\lib\site-packages\numpy\lib\type_check.py", line 11, in <module>
import numpy.core.numeric as _nx
File "C:\Anaconda3\lib\site-packages\numpy\core\__init__.py", line 21, in <module>
from . import _internal # for freeze programs
File "C:\Anaconda3\lib\site-packages\numpy\core\_internal.py", line 14, in <module>
import ctypes
File "C:\Anaconda3\lib\ctypes\__init__.py", line 7, in <module>
from _ctypes import Union, Structure, Array
ImportError: Module use of python33.dll conflicts with this version of Python.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 5, in <module>
File "C:\Anaconda3\lib\site-packages\pandas\__init__.py", line 13, in <module>
"extensions first.".format(module))
ImportError: C extension: Module use of python33.dll conflicts with this version of Python. not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace' to build the C extensions first.
Although the SublimeREPL is Python 3.5.1, the program seems to be using python33.dll. It seems like it should be using python35.dll.
In C:\Program Files\Sublime Text, I found python33.dll, so I moved it to a temp folder, put python35.dll (from the Anaconda folder) in its place, and restarted Sublime Text.
After that, the program would not open, saying python33.dll was missing. This makes me think that some settings file in Sublime Text looks for python33.dll specifically and won't accept python35.dll.
I went through Sublime's PackageResourceViewer and couldn't find anything within the Python package that indicated a python33.dll preference. For reference, my PYTHONPATH points to C:\Anaconda3\ which is where my Python installation lies.
Is there any easy way to switch out python33.dll with python35.dll in C:\Program Files\Sublime Text?
As you discovered, you should not do this. Python 3.3.3 is compiled into the Sublime Text 3 binary and is used to run the Python API and plugin system, among other things. Inserting a Python 3.5 .dll will cause all sorts of conflicts between the ABI and the compiled-in bits, killing the program.
So, instead of fiddling around with that, please edit your question and post the code you were trying to run along with detailed information on exactly how you were trying to run it, and we can troubleshoot that instead.
