Package Cell Issue on Databricks Community Edition

I followed https://docs.databricks.com/notebooks/package-cells.html
on Community Edition, latest release, Spark 3.x:
A.1. Created the package with an object as per the example.
A.2. Ran it in the same notebook in a different cell, without a cluster restart. No issues; it runs fine.
package x.y.z
object Utils {
val aNumber = 5 // works!
def functionThatWillWork(a: Int): Int = a + 1
}
import x.y.z.Utils
Utils.functionThatWillWork(Utils.aNumber)
B.1. Ran this in a different notebook without a cluster restart. Error.
import x.y.z.Utils
Utils.functionThatWillWork(Utils.aNumber)
C.1. Restarted the cluster. Ran the import. Error.
import x.y.z.Utils
Utils.functionThatWillWork(Utils.aNumber)
Question
Is this an issue with Community Edition? I do not think so, but I cannot place it. My observations contradict the official docs.

Related

Why NoSuchElementException and how to fix it?

I am working on the deployment of the Purview ADB Lineage Solution Accelerator developed by the MS Azure team. The tool's GitHub site is here.
I followed their instructions and deployed the tool on Azure. But when I run their sample Scala file abfss-in-abfss-out-olsample, the following code gives the error shown below:
NoSuchElementException: spark.openlineage.samplestorageaccount
Code (in Scala):
import org.apache.spark.sql.types.{StructType, StructField, IntegerType, StringType}
val storageServiceName = spark.conf.get("spark.openlineage.samplestorageaccount")
val storageContainerName = spark.conf.get("spark.openlineage.samplestoragecontainer")
val adlsRootPath = "wasbs://"+storageContainerName+"@"+storageServiceName+".blob.core.windows.net"
val storageKey = dbutils.secrets.get("purview-to-adb-kv", "storageAccessKey")
spark.conf.set("fs.azure.account.key."+storageServiceName+".blob.core.windows.net", storageKey)
Question: What could be the cause of the error, and how can we fix it?
UPDATE: In the Spark config in the Advanced Options section of the Databricks cluster, I added the following content, as suggested by item 4 of the "Install OpenLineage on Your Databricks Cluster" section of the above-mentioned tutorial.
Spark config
spark.openlineage.host https://functionapppv2dtbr8s6k.azurewebsites.net
spark.openlineage.url.param.code bmHFCiNI86nfgqwfkX86Lj5veclRds9Zb1NIJ48uRgNXAzFuQEueiQ==
spark.openlineage.namespace https://adb-1900514794152199.12#0160-060038-516wad48
spark.databricks.delta.preview.enabled true
spark.openlineage.version v1
It means the Spark configuration (spark.conf) does not contain such a key. Note that the config shown in the UPDATE sets spark.openlineage.host, spark.openlineage.url.param.code, spark.openlineage.namespace, and so on, but not spark.openlineage.samplestorageaccount or spark.openlineage.samplestoragecontainer, which are the keys the sample code reads. You have to check how the configuration is set up/provided if you expect these keys to be present.
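For reference, a minimal PySpark sketch of the same check (spark.conf behaves the same from Python as from Scala; the fallback value below is a made-up placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
key = "spark.openlineage.samplestorageaccount"

# This raises NoSuchElementException if the key was never provided:
# account = spark.conf.get(key)

# Read with a fallback default instead of throwing:
account = spark.conf.get(key, "placeholderaccount")

# Or set it earlier in the notebook (or in the cluster's Spark config) and then read it:
spark.conf.set(key, "placeholderaccount")
print(spark.conf.get(key))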

python-docx is not able to be imported even though the library is installed

System:
MacBook Pro (13-inch, 2020, Two Thunderbolt 3 ports)
Processor 1.4 GHz Quad-Core Intel Core i5
Memory 16GB
OS macOS Monterey version 12.6.1
I'm still fairly new to Python and just learned about the docx library.
I get the following error in Visual Studio Code about the docx library.
Import "docx" could not be resolved.
When I check to see the version installed I get the following:
pip3 show python-docx
Name: python-docx
Version: 0.8.11
I am able to create Word documents with one Python script even with the import issue. However, I've tried to create one using a table and this is what is causing me issues. I'm not sure if the import issue is the root cause.
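Since the script itself runs, the "could not be resolved" message may only mean the editor is pointing at a different interpreter than the one pip3 installed into. A quick check, run with the interpreter VS Code is configured to use (the install command at the end is an example path taken from the traceback below):

import sys

print(sys.executable)  # compare this with the interpreter selected in VS Code
print([p for p in sys.path if "site-packages" in p])  # python-docx should live in one of these

# Installing against the exact interpreter avoids the mismatch, e.g.:
#   /Library/Frameworks/Python.framework/Versions/3.9/bin/python3 -m pip install python-docx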
When I run my script I get the following:
python3 test.py
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/docx/styles/styles.py:139: UserWarning: style lookup by style_id is deprecated. Use style name as key instead.
return self._get_style_id_from_style(self[style_name], style_type)
The code causing the error:
# Add review question table
table = document.add_table(rows=1, cols=3)
table.style = 'TableGrid'
In researching I found I may need to import the following:
from docx.oxml.table import CT_TableStyle
And add the following to that section:
# Add review question table
table = document.add_table(rows=1, cols=3)
style = CT_TableStyle()
style.name = 'TableGrid'
table.style = style
I now get the following warning:
Import "docx.oxml.table" could not be resolved
And the following error when running the script:
line 2, in
from docx.oxml.table import CT_TableStyle
ImportError: cannot import name 'CT_TableStyle' from 'docx.oxml.table' (/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/docx/oxml/table.py)
I also created a virtual environment and still have the same issues. If you need additional details, just let me know what to provide.
Kind regards,
Marcus
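
A possible direction, as a minimal sketch rather than a definitive fix: the deprecation warning suggests looking the built-in style up by its name, "Table Grid" (with a space), rather than by the style_id "TableGrid"; CT_TableStyle is an internal oxml class and is not meant to be imported directly:

from docx import Document

document = Document()

# Add review question table, referring to the built-in style by name
table = document.add_table(rows=1, cols=3)
table.style = "Table Grid"
# equivalently: table.style = document.styles["Table Grid"]

document.save("review.docx")  # hypothetical output file name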

Import of Ecoinvent 2.2 and Ecoinvent 3.x fail with Brightway

The import of Ecoinvent 2.2 and 3.x does not complete, and I do not understand what the issue is.
I have downloaded both the 2 and 3 versions from Ecoinvent.org, but both the Ecospold1 and Ecospold2 importers show me the same (or rather, the lack of) results.
from brightway2 import *
ei33cutoff = SingleOutputEcospold2Importer(
    r"C:\Users\HS254581\Documents\test\ecoinvent33cutoff\datasets",
    "ecoinvent 3.3 cutoff"
)
ei33cutoff.apply_strategies()
ei33cutoff.statistics()
All I get is
"Extracting XML data from 13831 datasets"
without any error message. I have sometimes kept it running for hours; ultimately I have to quit.
I can, however, import the FORWAST database, do some LCA calculations with it, and import some CSV files from SimaPro.
I have read most of the questions answered here and on GitHub, but cannot find a solution to this problem.
Send help!
P.S. I am using Python 3.7 with Spyder. Here I need to activate the bw2 environment in the Anaconda prompt and set the Python interpreter path to the default; I have installed kernels too. This is the only way it seems to work. I also updated the conda and brightway packages today, just in case.
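
Not a definitive answer, but for reference, a sketch of the usual Brightway2 sequence, assuming a fresh project: the project name below is made up, and the use_mp flag (available in some bw2io versions) disables multiprocessing during extraction, which is a common source of silent hangs on Windows:

from brightway2 import *

projects.set_current("ecoinvent_import_test")  # hypothetical project name
bw2setup()  # installs the biosphere3 database and LCIA methods, needed before importing

ei33cutoff = SingleOutputEcospold2Importer(
    r"C:\Users\HS254581\Documents\test\ecoinvent33cutoff\datasets",
    "ecoinvent 3.3 cutoff",
    use_mp=False  # assumption: disable multiprocessing if extraction hangs
)
ei33cutoff.apply_strategies()
ei33cutoff.statistics()
ei33cutoff.write_database()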

Setting PYSPARK_SUBMIT_ARGS causes creating SparkContext to fail

A little backstory to my problem: I've been working on a Spark project and recently switched my OS to Debian 9. After the switch, I reinstalled Spark version 2.2.0 and started getting the following error when running pytest:
E Exception: Java gateway process exited before sending the driver its port number
After googling for a little while, it looks like people have been seeing this cryptic error in two situations: 1) when trying to use Spark with Java 9; 2) when the environment variable PYSPARK_SUBMIT_ARGS is set.
It looks like I'm in the second scenario, because I'm using Java 1.8. I have written a minimal example:
from pyspark import SparkContext
import os
def test_whatever():
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11,com.databricks:spark-avro_2.11:3.2.0 pyspark-shell'
    sc = SparkContext.getOrCreate()
It fails with said error, but when the fourth line is commented out, the test is fine (I invoke it with pytest file_name.py).
Removing this env variable is not a solution to this problem (at least I don't think it is), because it passes important information to the SparkContext. I can't find any documentation on this and am completely lost.
I would appreciate any hints on this.
Putting this at the top of my Jupyter notebook works for me:
import os
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64/'
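If the same idea carries over to the pytest setup from the question, a sketch (the JAVA_HOME path is an assumption; PYSPARK_SUBMIT_ARGS must keep pyspark-shell at the end, and both variables should be set before the SparkContext is created):

import os

# Set the environment before any Spark machinery starts, e.g. at module level or in a fixture.
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64/'  # assumed Java 8 location
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11,'
    'com.databricks:spark-avro_2.11:3.2.0 pyspark-shell'  # keep the trailing pyspark-shell
)

from pyspark import SparkContext

def test_whatever():
    sc = SparkContext.getOrCreate()
    assert sc is not None
    sc.stop()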

Combining PyCharm, Spark and Jupyter

In my current setup, I use a Jupyter notebook server that has a pyspark profile to use Spark. This all works great. However, I'm working on a pretty big project and the notebook environment is lacking a bit for me. I found out that PyCharm allows you to run notebooks inside the IDE, giving you more of the advantages of a full IDE than Jupyter does.
In the best-case scenario I would run PyCharm locally rather than over remote desktop on the gateway, but using the gateway would be an acceptable alternative.
I'm first trying to get it to work on the gateway. If I have my (Spark) Jupyter server running, the IP address set correctly (127.0.0.1:8888), and I create an .ipynb file, then after I enter a line and press Enter (not running it, just adding a newline) I get the following error in the terminal I started PyCharm from:
ERROR - pplication.impl.LaterInvocator - Not a stub type: Py:IPNB_TARGET in class org.jetbrains.plugins.ipnb.psi.IpnbPyTargetExpression
Googling doesn't get me anywhere.
I was able to get all three working by installing Spark via the terminal on OS X. Then I added the following packages to the PyCharm project interpreter: findspark, pyspark.
Tested it out with:
import findspark
findspark.init()
import pyspark
import random
sc = pyspark.SparkContext(appName="Pi")
num_samples = 100000000
def inside(p):
    x, y = random.random(), random.random()
    return x*x + y*y < 1
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples
print(pi)
sc.stop()
outputting: 3.14160028
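As a side note, if Spark is not on a standard path, findspark.init() also accepts the Spark home explicitly; the path below is only an example for a Homebrew install and should be adjusted to your machine:

import findspark
findspark.init('/usr/local/opt/apache-spark/libexec')  # assumed install location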

Resources