Module masked in PySpark, not accessible - apache-spark

I am new to PySpark. While trying out some PySpark code, I ran a script named "time.py", and since then PySpark has not been able to run. I get the error below.
Traceback (most recent call last):
File "/home/VAL_CODE/test.py", line 1, in <module>
from pyspark import SparkContext,HiveContext,SparkConf
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p0.1796617/lib/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p0.1796617/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 24, in <module>
File "/usr/lib64/python2.7/threading.py", line 14, in <module>
from time import time as _time, sleep as _sleep
File "/home/VAL_CODE/time.py", line 1, in <module>
ImportError: cannot import name SparkContext
20/08/13 19:04:16 INFO util.ShutdownHookManager: Shutdown hook called
20/08/13 19:04:16 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-f104da1f-ba70-4c45-8a19-6ffc55b609aa
The traceback shows that "/usr/lib64/python2.7/threading.py" resolves its import of time to the local script "/home/VAL_CODE/time.py", which I created, instead of the standard library module. I have since deleted "/home/VAL_CODE/time.py", but I still get the same error when I run a new script, "/home/VAL_CODE/test.py". Please help me resolve this.
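One thing worth checking, stated as an assumption because the question does not list the directory contents: Python 2.7 will still import a compiled time.pyc sitting next to the script even after time.py itself is deleted. The helper below is a hypothetical sketch (the function name is made up) that lists compiled files whose source is gone.

```python
import os


def orphaned_bytecode(directory):
    """Return .pyc/.pyo files in `directory` whose .py source no longer exists."""
    orphans = []
    for name in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(name)
        if ext in (".pyc", ".pyo"):
            source = os.path.join(directory, stem + ".py")
            if not os.path.exists(source):
                orphans.append(name)
    return orphans

# Example call (path taken from the question):
#   orphaned_bytecode("/home/VAL_CODE")
```

If time.pyc shows up in the result, deleting it should let threading.py find the real time module again.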

Related

AttributeError: module 'resource' has no attribute 'getpagesize'

I am trying to use the TensorFlow Object Detection API, following the steps in this guide:
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html#tf-models-install
When I try to open the Object Detection Jupyter notebook by running jupyter notebook,
I get the exception below:
Traceback (most recent call last):
File "/usr/local/bin/jupyter-notebook", line 7, in <module>
from notebook.notebookapp import main
File "/home/dinesh/.local/lib/python3.6/site-packages/notebook/notebookapp.py", line 79, in <module>
from .base.handlers import Template404, RedirectWithParams
File "/home/dinesh/.local/lib/python3.6/site-packages/notebook/base/handlers.py", line 32, in <module>
import prometheus_client
File "/home/dinesh/.local/lib/python3.6/site-packages/prometheus_client/__init__.py", line 7, in <module>
from . import process_collector
File "/home/dinesh/.local/lib/python3.6/site-packages/prometheus_client/process_collector.py", line 12, in <module>
_PAGESIZE = resource.getpagesize()
AttributeError: module 'resource' has no attribute 'getpagesize'
I am using
Python - 3.6.3
Jupyter - 1.0.0
How can I overcome this exception?
I got a similar error. My project contained these modules (folders):
model
resource
service
The project's resource folder was shadowing Python's standard resource module, so I renamed it to resources (any name that does not clash with a standard module will do).
I had the same error. After I renamed the resource module on my PYTHONPATH, it worked. Check your PYTHONPATH: is there a resource module on it?
I had a similar issue starting Jupyter Notebook on Windows 10.
When I ran the regular startup script, a Windows terminal opened and immediately closed, too fast to see any error messages. So I opened a Windows 10 PowerShell terminal and ran
conda update conda
and
conda update --all
Then I ran jupyter-notebook at the Windows prompt. The results were:
Traceback (most recent call last):
File "E:\Users\Bob\anaconda3\Scripts\jupyter-notebook-script.py", line 6, in <module>
from notebook.notebookapp import main
File "E:\Users\Bob\anaconda3\lib\site-packages\notebook\notebookapp.py", line 76, in <module>
from .base.handlers import Template404, RedirectWithParams
File "E:\Users\Bob\anaconda3\lib\site-packages\notebook\base\handlers.py", line 24, in <module>
import prometheus_client
File "E:\Users\Bob\anaconda3\lib\site-packages\prometheus_client\__init__.py", line 3, in <module>
from . import (
File "E:\Users\Bob\anaconda3\lib\site-packages\prometheus_client\process_collector.py", line 11, in <module>
_PAGESIZE = resource.getpagesize()
AttributeError: module 'resource' has no attribute 'getpagesize'
I opened process_collector.py in site-packages\prometheus_client in Notepad++ and changed:
line 9: import resource to import resources
line 11: _PAGESIZE = resource.getpagesize() to _PAGESIZE = resources.getpagesize()
I searched for other instances of resource and found none. I then saved the file and reran jupyter-notebook at the Windows terminal prompt.
This time I got:
Traceback (most recent call last):
File "E:\Users\Bob\anaconda3\Scripts\jupyter-notebook-script.py", line 10, in <module>
sys.exit(main())
File "E:\Users\Bob\anaconda3\lib\site-packages\jupyter_core\application.py", line 254, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "E:\Users\Bob\anaconda3\lib\site-packages\traitlets\config\application.py", line 844, in launch_instance
app.initialize(argv)
File "E:\Users\Bob\anaconda3\lib\site-packages\traitlets\config\application.py", line 87, in inner
return method(app, *args, **kwargs)
File "E:\Users\Bob\anaconda3\lib\site-packages\notebook\notebookapp.py", line 2126, in initialize
self.init_resources()
File "E:\Users\Bob\anaconda3\lib\site-packages\notebook\notebookapp.py", line 1697, in init_resources
old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
AttributeError: module 'resource' has no attribute 'getrlimit'
With Notepad++ still open, I opened notebookapp.py in site-packages\notebook and searched for resource. I found and changed the following lines:
line 37: import resource to import resources
line 40: resource = None to resources = None
line 1036: resource is None to resources is None
line 1040: soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE) to soft, hard = resources.getrlimit(resources.RLIMIT_NOFILE)
line 1693: if resource is None: to if resources is None:
line 1697: old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE) to old_soft, old_hard = resources.getrlimit(resources.RLIMIT_NOFILE)
line 1706: resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard)) to resources.setrlimit(resources.RLIMIT_NOFILE, (soft, hard))
I searched for other instances of resource and found none. I then saved notebookapp.py and reran jupyter-notebook at the Windows terminal prompt. This time Jupyter Notebook opened with the expected Files tab displayed. I quit Jupyter Notebook and restarted it using the normal link to the startup script, and it worked as expected.
I am not sure what caused this issue. I did not intentionally update anything before I saw it. Yesterday Jupyter Notebook worked as expected; today when I tried to run it, I got the flickering terminal window described above.
[Quick look for root cause]
Check whether PYTHONPATH is set on your system.
Temporarily rename or unset that variable.
Run "jupyter notebook".
[Fix the issue]
If that resolves the issue, search every path listed in PYTHONPATH for a folder named "resource".
If you find such a folder, rename/refactor it to something else, e.g. "resources".
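The search step above can be sketched in Python. This is only an illustration: the helper name find_shadowing_entries is made up, and it checks just for a folder or a .py file with the given name in each PYTHONPATH entry.

```python
import os


def find_shadowing_entries(module_name, pythonpath):
    """Return paths inside PYTHONPATH entries that would be imported as `module_name`."""
    hits = []
    for entry in pythonpath.split(os.pathsep):
        # Skip empty entries and entries that are not directories.
        if not entry or not os.path.isdir(entry):
            continue
        for candidate in (module_name, module_name + ".py"):
            path = os.path.join(entry, candidate)
            if os.path.exists(path):
                hits.append(path)
    return hits

# Example call:
#   find_shadowing_entries("resource", os.environ.get("PYTHONPATH", ""))
```

Any path it returns is a candidate for renaming.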

Python 3 exception also gives the output of the previous program

I ran into an interesting bug while writing a JSON parser (saved as /home/myusername/py/json.py) in Python 3.
I raised a basic exception and got unexpected output.
To investigate further, I wrote an entirely new script, given below:
/home/myusername/py/error.py
raise Exception("basic exception")
After running "python3 error.py"
I should get a really short error message, but instead I get the console output of the previously run program.
[unexpected debug output of json.py]
[truncated for readability]
[it is extremely long but does not contain further errors]
Traceback (most recent call last):
File "error.py", line 1, in <module>
raise Exception("basic exception")
Exception: basic exception
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 17, in <module>
import json
File "/home/myusername/py/json.py", line 174, in <module>
rs = parser.Object(testcase)
File "/home/myusername/py/json.py", line 104, in Object
raise Exception(self.Array(source, "crashing object scanner"))
Exception: None
Original exception was:
Traceback (most recent call last):
File "error.py", line 1, in <module>
raise Exception("basic exception")
Exception: basic exception
I don't know why I get such a long message, nor why I get debug output from a script I never called. I would like an explanation. I am running Ubuntu, and I have not yet found related bugs on the internet.
It appears that the exception hook (apport_python_hook.py in the traceback above) ends up importing json, so when my error.py raises an exception it loads my json.py instead of the built-in module, and then my json script throws an exception of its own.
The solution is to rename my json.py.
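To confirm this kind of shadowing before renaming anything, one can ask Python where it would load a module from. The sketch below uses importlib.util.find_spec and sysconfig from the standard library; the helper names are made up for illustration, and the check only makes sense for names that belong to the standard library.

```python
import importlib.util
import os
import sysconfig


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_under(path, directory):
    """True if `path` lies inside `directory` after resolving symlinks."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    try:
        return os.path.commonpath([path, directory]) == directory
    except ValueError:  # e.g. different drives on Windows
        return False


def resolves_outside_stdlib(name):
    """True if `name` would be loaded from a file outside the stdlib directory."""
    origin = module_origin(name)
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    # Built-in and frozen modules have no file on disk, so they are never flagged.
    return origin is not None and os.path.isfile(origin) and not is_under(origin, stdlib_dir)
```

Run from /home/myusername/py, resolves_outside_stdlib("json") would be expected to flag the local json.py.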

ImportError: No Module named 'driver'

I am trying to run a voice program with Python 3, like given here: https://stackoverflow.com/a/31257805/760393
But I keep getting errors:
..\site-packages\pyttsx\engine.py, line 18, in import driver..
Please help.
Traceback (most recent call last):
File "C:/Users/USER/AppData/Local/Programs/Python/Python35-32/0DEV/L X/voice2.py", line 1, in <module>
import pyttsx
File "C:\Users\USER\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pyttsx\__init__.py", line 18, in <module>
from .engine import Engine
File "C:\Users\USER\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pyttsx\engine.py", line 18, in <module>
import driver
ImportError: No module named 'driver'
I had this problem. engine.py imports driver.py in a way that fails on Python 3, which no longer supports implicit relative imports.
Edit engine.py and change the import from:
import driver
to:
from . import driver
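A minimal sketch of the explicit relative import working under Python 3. The package demo_tts and its files are made up for illustration and are not pyttsx's real code.

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root):
    """Create a throwaway package `demo_tts` containing engine.py and driver.py."""
    pkg = os.path.join(root, "demo_tts")
    os.makedirs(pkg)
    files = {
        "__init__.py": "",
        "driver.py": "NAME = 'dummy driver'\n",
        # The explicit form; a bare `import driver` here would raise ImportError
        # on Python 3 because `driver` is not a top-level module.
        "engine.py": "from . import driver\n",
    }
    for filename, source in files.items():
        with open(os.path.join(pkg, filename), "w") as handle:
            handle.write(source)


root = tempfile.mkdtemp()
build_demo_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()
engine = importlib.import_module("demo_tts.engine")
print(engine.driver.NAME)  # prints: dummy driver
```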

PySpark - The system cannot find the path specified

Hi,
I have run Spark many times before (from the Spyder IDE).
Today I got this error, although the code is the same:
import os
import sys
from py4j.java_gateway import JavaGateway
gateway = JavaGateway()
# Point Spark at the local installation before importing pyspark
os.environ['SPARK_HOME']="C:/Apache/spark-1.6.0"
os.environ['JAVA_HOME']="C:/Program Files/Java/jre1.8.0_71"
sys.path.append("C:/Apache/spark-1.6.0/python/")
os.environ['HADOOP_HOME']="C:/Apache/spark-1.6.0/winutils/"
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf()
The system cannot find the path specified.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Apache\spark-1.6.0\python\pyspark\conf.py", line 104, in __init__
SparkContext._ensure_initialized()
File "C:\Apache\spark-1.6.0\python\pyspark\context.py", line 245, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway()
File "C:\Apache\spark-1.6.0\python\pyspark\java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
What's going wrong?
Thanks for your time.
OK... Someone installed a new Java version on the virtual machine, so the old JAVA_HOME path was no longer valid. I only changed this line:
os.environ['JAVA_HOME']="C:/Program Files/Java/jre1.8.0_91"
and it works again.
Thanks for your time.
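A stale path like this can be caught before Spark is started. The sketch below is a hypothetical helper (the name missing_env_paths is made up) that reports which of the environment variables set in the question are unset or point to a directory that does not exist.

```python
import os


def missing_env_paths(env, names=("SPARK_HOME", "JAVA_HOME", "HADOOP_HOME")):
    """Return {name: value} for variables that are unset or not an existing directory."""
    problems = {}
    for name in names:
        value = env.get(name)
        if not value or not os.path.isdir(value):
            problems[name] = value
    return problems

# Example call, before creating SparkConf():
#   missing_env_paths(os.environ)
```

An empty result means all three directories exist; it does not prove that the Java installation itself works.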

cqlsh: Copy command error ImportError: No module named cqlsh

I am new to the world of Cassandra and was trying to import a CSV data file into my newly created Cassandra server on Windows 7 for learning purposes. I was following the DataStax online tutorial and got stuck at
https://academy.datastax.com/courses/installing-configuring-and-manipulating-keyspace/copying-external-data-tables-development
When I try to copy a CSV file, I get this error:
cqlsh> use musicdb
... ;
cqlsh:musicdb> copy album(title,year,performer,genre,tracks)
... from 'album.csv'
... with header = true;
Error starting import process:
Can't pickle <type 'thread.lock'>: it's not found as thread.lock
can only join a started process
cqlsh:musicdb> Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Program Files\DataStax Community\python\lib\multiprocessing\forking.py", line 373, in main
prepare(preparation_data)
File "C:\Program Files\DataStax Community\python\lib\multiprocessing\forking.py", line 482, in prepare
file, path_name, etc = imp.find_module(main_name, dirs)
ImportError: No module named cqlsh
cqlsh:musicdb>
My album.csv file is in the same folder as cqlsh.exe.
