PySpark - No module named coverage_daemon - apache-spark

I am trying to execute this simple code in my dataframe:
import ast
rddAlertsRdd = df.rdd.map(lambda message: ast.literal_eval(message['value']))
rddAlerts = rddAlertsRdd.collect()
But I'm getting the error below:
Versions:
Spark: 3.3.1
Hadoop: 2.7
Python: 3.7
Pyspark: 3.3.1
Py4j: 0.10.9.5
OpenJDK: 8
Could it be a problem related to version compatibility? I'd appreciate your help!
In order to solve the problem I tried to change Spark environment variables in my Dockerfile.
This is what I have in my Dockerfile:

tl;dr No idea what could be wrong, but here is a little more about the possible cause from reading the source code. Hope this helps.
The only place with coverage_daemon is python/test_coverage/conf/spark-defaults.conf which (as you may've guessed already) is for test coverage and does not seem to be used in production.
It appears that for some reason python/run-tests-with-coverage got executed.
It looks as if you're using a Jupyter environment that is misconfigured.
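If it helps narrow it down, here is a minimal sketch for checking whether a coverage-related setting has leaked into your Spark configuration. It assumes an active SparkSession named spark, and the property name spark.python.daemon.module is my assumption based on what the test-coverage spark-defaults.conf sets, so verify it against your Spark version:
# Sketch: look for a leftover coverage setting in the active Spark configuration.
# Assumes a SparkSession named `spark`; the property name is an assumption.
daemon_module = spark.conf.get("spark.python.daemon.module", None)
print("spark.python.daemon.module =", daemon_module)
# If it comes back as "coverage_daemon", restart the session without that setting
# (or point it back at the default daemon, which I believe is pyspark.daemon):
# spark.stop()
# spark = SparkSession.builder.config("spark.python.daemon.module", "pyspark.daemon").getOrCreate()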

Related

Breusch-Pagan test in Python 3

I am trying to run the Breusch-Pagan test in Python 3 using the below code. It works perfectly in Python 2.7, but when I run it in Anaconda with Python 3.6 instead, I get the following error: "module 'statsmodels.stats.api' has no attribute 'het_breuschpagan'".
I have looked at the statsmodels documentation at this link, https://www.statsmodels.org/devel/generated/statsmodels.stats.diagnostic.het_breuschpagan.html, and know that I am running the right code.
import statsmodels.stats.api as sms
breuschpagan_test = sms.het_breuschpagan(model_run.resid, model.model.exog)
Does anyone know a solution to this or a different way to call this statsmodel function in python 3?
Also, due to limitations at work, I cannot uninstall/re-install or update my statsmodels library at the moment either.
Thanks in advance!
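One thing that might be worth checking, assuming an older statsmodels is installed (I can't tell the version from the question): earlier releases exposed this test under the misspelled name het_breushpagan, so a fallback lookup along these lines may let the call work without upgrading:
import statsmodels.stats.api as sms
# Sketch: try the current name first, then the older misspelled one; which of the
# two exists depends on the installed statsmodels version (assumption).
bp_test = getattr(sms, 'het_breuschpagan', None) or getattr(sms, 'het_breushpagan', None)
if bp_test is None:
    raise AttributeError('No Breusch-Pagan test found in this statsmodels version')
# model_run is the fitted results object from the question
breuschpagan_test = bp_test(model_run.resid, model.model.exog)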

Running Scala with pixiedust in jupyter notebook

I'm trying to run some Scala code in Python 3 with Jupyter notebook. I have installed pixiedust to make this easier, and I have imported it successfully. However, according to the tutorial (https://pixiedust.github.io/pixiedust/scalabridge.html) I should be able to use %scala and then run Scala code. This is not working for me and I'm getting an error like this: UsageError: Cell magic %%scala not found.
I have tried with both %scala and %%scala, but neither works.
Does anyone know of a different syntax or how this could work?
Thanks!
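For reference, a minimal sketch of how the tutorial's Scala bridge is meant to be invoked, assuming pixiedust registers the cell magic when it is imported in the same kernel (I can't verify your notebook setup from here):
# Cell 1: importing pixiedust is what should register the %%scala cell magic (assumption)
import pixiedust
# Cell 2: %%scala has to be the very first line of its own cell
%%scala
val x = 1 to 10
println(x.sum)
If the import cell shows no errors but the magic is still missing, the notebook kernel is probably not the Python environment pixiedust was installed into.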

Setting PYSPARK_SUBMIT_ARGS causes creating SparkContext to fail

A little backstory to my problem: I've been working on a Spark project and recently switched my OS to Debian 9. After the switch, I reinstalled Spark version 2.2.0 and started getting the following error when running pytest:
E Exception: Java gateway process exited before sending the driver its port number
After googling for a little while, it looks like people have been seeing this cryptic error in two situations: 1) when trying to use spark with java 9; 2) when the environment variable PYSPARK_SUBMIT_ARGS is set.
It looks like I'm in the second scenario, because I'm using Java 1.8. I have written a minimal example:
from pyspark import SparkContext
import os
def test_whatever():
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11,com.databricks:spark-avro_2.11:3.2.0 pyspark-shell'
    sc = SparkContext.getOrCreate()
It fails with said error, but when the fourth line is commented out, the test is fine (I invoke it with pytest file_name.py).
Removing this env variable is not -- at least I don't think it is -- a solution to this problem, because it gives SparkContext some important information. I can't find any documentation on this and am completely lost.
I would appreciate any hints on this.
Putting this at the top of my Jupyter notebook works for me:
import os
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64/'
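Adapting that to the pytest example above, a minimal sketch could look like this (the JAVA_HOME path is just what works on my machine, so point it at wherever your Java 8 installation actually lives):
import os
# Make sure a Java 8 JVM is picked up before the gateway is launched (path is an assumption)
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64/'
# Keep the extra packages; the string must still end with 'pyspark-shell'
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11,'
    'com.databricks:spark-avro_2.11:3.2.0 pyspark-shell'
)
from pyspark import SparkContext
def test_whatever():
    sc = SparkContext.getOrCreate()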

mkl_blas_dgemm_alloc not found in mkl_intel_thread

I developed a tool in Python 3.5 some time ago which currently only uses differential evolution from SciPy to do its task. For some reason I had to change settings on my machine and switch to using Python in a virtual environment.
My specs now:
win 10 64 bit
used pip 9.0.1
numpy 1.12.1+mkl
scipy 0.19.0
python 3.6.1
I have a different env using python 2.7 flying around somewhere else.
Now my problem: every time the differential evolution function gets its first set of data, it crashes after returning the differential_evolution step value.
The differential_evolution call can be found under ./libraries/methods/differential_evoluation.py line 76
The pop-up error is "Entry point 'mkl_blas_dgemm_alloc' wasn't found in 'mkl_intel_thread.dll'." And the printed error is "Intel MKL FATAL ERROR: Cannot load mkl_intel_thread.dll." Please note that my system language is German, therefore the pop-up message was translated by me.
I don't know whether this is relevant but my directory structure is:
>some_place/location1/goal.py
>some_place/location2/env/
I hadn't worked with virtualenv before, and in addition to this I used Python 3.5. I'd appreciate any help, or instructions on how to add more information to this case to help clarify my problem.
Yours sincerely
OK, I'm back with more information. Maybe someone else will stumble upon it. To be clear: I do not know the fix or the reason. I just tried a bunch of stuff.
Using
Python 3.5.3, numpy 1.11.1+mkl, and scipy 0.18.0 or 0.19.0
made the error disappear. I couldn't try numpy 1.11.1 or 1.11.2 for Python 3.6 because this kind soul sadly doesn't offer those versions anymore. I found the 1.11.1 version for Python 3.5 somewhere on my disk.
From my testing I can tell that it breaks once I use numpy 1.11.3 or higher; 1.11.1 works fine. Therefore I assume that some changes happened either in numpy between 1.11.1 and 1.11.3 which break it, or in how this kind soul creates/builds his wheels. So I'm going to use Python 3.5 for now, as I do not have numpy 1.11.1+mkl for Python 3.6.
Yours sincerely
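If anyone wants to dig further, here is a small diagnostic sketch, nothing specific to my tool, that prints which BLAS/MKL build numpy was linked against and forces a matrix multiply that should go through the same dgemm path that crashes:
import numpy as np
import scipy
# Show versions and the BLAS/LAPACK configuration numpy was built against
print("numpy:", np.__version__)
print("scipy:", scipy.__version__)
np.show_config()
# A plain matrix multiply goes through BLAS dgemm; if the MKL entry point is
# missing, this should fail in the same way as the tool does.
a = np.random.rand(200, 200)
print("dgemm check:", (a @ a).sum())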

Python 3 relative path conversion issue

I am currently working on converting PyCrypto over to Python 3.x.
Whilst I seem to have the cryptography side working, the same cannot be said for the tests provided with the module :(
I have used the tests under Python 2.6.4 and all works fine.
I then ran '2to3' over the tests to generate new files in 3.X format.
There are several references to the following:
from .common import make_block_tests
Whenever I run the tests I get:
ValueError: Attempted relative import in non-package
If someone would point me towards a way to fix this it would be much appreciated :)
Cheers
Grail
You are trying to run the test files directly, so you can't have relative imports. Change them to absolute imports, and that will solve the problem.
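As an illustration, the change would look something like this (the absolute package path is an assumption about where PyCrypto keeps its self-test helpers, so adjust it to wherever common.py actually lives in your tree):
# Before (relative import, fails when the test file is run directly):
# from .common import make_block_tests
# After (absolute import; package path is an assumption, adjust as needed):
from Crypto.SelfTest.Cipher.common import make_block_tests
Alternatively, keep the relative imports and run the tests as modules inside the package (python -m package.module) so that Python knows which package they belong to.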
