ipython notebook logging to notebook - apache-spark

I'm playing with Spark in an IPython notebook and having a blast, but unfortunately a recent change (maybe a recent Spark upgrade) caused Spark to log to the notebook rather than to the console where I started the notebook. I still want to see these log messages, so I can't just turn off logging, but I would prefer that they be directed to the console rather than the notebook. Is there any way to achieve this?

Edit your conf/log4j.properties file and change the following line:
log4j.rootCategory=INFO, console
to
log4j.rootCategory=ERROR, console
Another approach would be to start spark-shell and type in the following:
import org.apache.log4j.Logger
import org.apache.log4j.Level
Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
You won't see any logs after that.
(copied from https://stackoverflow.com/a/1323999/348056)
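The first approach above, editing conf/log4j.properties, can also be scripted. Below is a minimal sketch in Python, assuming the file uses the log4j.rootCategory=LEVEL, console form quoted in the answer; the function name set_root_level is hypothetical and not part of Spark.

```python
import re

# Matches the root logger line, e.g. "log4j.rootCategory=INFO, console".
ROOT_LINE = re.compile(r"^(\s*log4j\.rootCategory\s*=\s*)(\w+)(.*)$")


def set_root_level(properties_text, level):
    """Return properties_text with the rootCategory level replaced by level."""
    out = []
    for line in properties_text.splitlines():
        match = ROOT_LINE.match(line)
        if match:
            # Keep the key and the appender list, swap only the level.
            line = match.group(1) + level + match.group(3)
        out.append(line)
    return "\n".join(out)


sample = "# Set everything to be logged to the console\nlog4j.rootCategory=INFO, console"
print(set_root_level(sample, "ERROR"))
```

Reading conf/log4j.properties, passing its text through set_root_level, and writing the result back would apply the same change as the manual edit. Like the manual edit, this only changes how much is logged, not where it is written.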

Related

runOutput isn't appearing even after using dbutils.notebook.exit in ADF

I am using the code below to return some information from an Azure Databricks notebook, but runOutput isn't appearing even after the notebook activity completes successfully.
This is the code I used:
import json
dbutils.notebook.exit(json.dumps({
    "num_records": dest_count,
    "source_table_name": table_name
}))
The Databricks notebook exited properly, but the Notebook activity isn't showing runOutput.
Can someone please help me understand what is wrong here?
When I tried the above in my environment, it worked fine for me.
These are my linked service configurations.
Result:
I suggest you try some troubleshooting steps, such as changing the notebook, switching to a new Databricks workspace, or using an existing cluster in the linked service.
If it still gives the same result, it's better to raise a support ticket for your issue.
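For reference, the value passed to dbutils.notebook.exit is what the Notebook activity exposes as runOutput, typically read in the pipeline with an expression such as @activity('<activity name>').output.runOutput. The sketch below only shows how the JSON payload from the question is built and parsed back; build_exit_payload and the sample values are hypothetical, and the dbutils call is left as a comment because dbutils exists only inside Databricks.

```python
import json


def build_exit_payload(dest_count, table_name):
    """Serialize the values the notebook wants to hand back to the pipeline."""
    # int() and str() guard against values that json cannot serialize directly.
    return json.dumps({
        "num_records": int(dest_count),
        "source_table_name": str(table_name),
    })


payload = build_exit_payload(42, "sales")
# Inside the Databricks notebook this would be the last statement:
# dbutils.notebook.exit(payload)

# The consumer can parse the string back into its fields.
print(json.loads(payload)["num_records"])
```

Checking the payload this way in the notebook, before the exit call, is a quick way to confirm that the values are defined and serializable.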

Using databricks-connect debugging a notebook that runs another notebook

I am able to connect to the Azure Databricks cluster from my Linux CentOS VM using Visual Studio Code.
The code below works without any issue:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
print("Cluster access test - ", spark.range(100).count())
setting = spark.conf.get("spark.master")  # returns local[*]
if "local" in setting:
    from pyspark.dbutils import DBUtils
    dbutils = DBUtils().get_dbutils(spark)
else:
    print("Do nothing - dbutils should be available already")
out = dbutils.fs.ls('/FileStore/')
print(out)
I have a local notebook that runs another notebook using %run path/anothernotebook.
Since the %run line is commented out with #, Python does not execute it.
So I tried calling dbutils.notebook.run('pathofnotebook') instead, but it fails with the following error:
Exception has occurred: AttributeError
'SparkServiceClientDBUtils' object has no attribute 'notebook'
Is it possible to locally debug a notebook that invokes another notebook?
It’s impossible: the dbutils implementation included in Databricks Connect supports only the ‘fs’ and ‘secrets’ subcommands (see docs).
Databricks Connect is designed to work with code developed locally, not with notebooks. If you can package the content of that notebook as a Python package, then you’ll be able to debug it.
P.S. Please take into account that dbutils.notebook.run executes the notebook as a separate job, in contrast with %run.
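One way to act on this advice is to move the child notebook's logic into an importable function and call dbutils.notebook.run only where it exists. The following is a sketch under that assumption; run_child, local_fallback, and the fake dbutils objects are hypothetical names used for illustration, and the only real API assumed is dbutils.notebook.run(path, timeout_seconds) on a Databricks cluster.

```python
from types import SimpleNamespace


def run_child(dbutils, path, timeout_seconds, local_fallback):
    """Run a child notebook on Databricks, or a local function elsewhere."""
    if hasattr(dbutils, "notebook"):
        # On a cluster: runs the notebook as a separate job and returns its exit value.
        return dbutils.notebook.run(path, timeout_seconds)
    # Under Databricks Connect: call the packaged Python code directly,
    # which a local debugger can step into.
    return local_fallback()


# Stand-ins for the two environments, so the sketch runs anywhere.
connect_dbutils = SimpleNamespace(fs=object(), secrets=object())
cluster_dbutils = SimpleNamespace(
    notebook=SimpleNamespace(run=lambda path, timeout: "ran " + path)
)

print(run_child(connect_dbutils, "/path/child", 60, lambda: "ran locally"))
print(run_child(cluster_dbutils, "/path/child", 60, lambda: "ran locally"))
```

Because the two paths behave differently (a separate job versus an in-process call), the fallback is best treated as a debugging aid rather than an exact replacement.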

Execute databricks magic command from PyCharm IDE

With databricks-connect we can successfully run code written in Databricks or in a Databricks notebook from many IDEs. Databricks has also created many magic commands, such as %sql or %md, to support multiple languages in each cell. One issue I am currently facing when I try to execute Databricks notebooks in PyCharm is the following:
How do I execute Databricks-specific magic commands from PyCharm?
E.g.
Importing a script or notebook is done in Databricks using this command:
%run './FILE_TO_IMPORT'
whereas in an IDE, from FILE_TO_IMPORT import XYZ works.
Also, every time I download a Databricks notebook, it comments out the magic commands, which makes it impossible to use anywhere outside the Databricks environment.
It's really inefficient to convert all the Databricks magic commands every time I want to do any development.
Is there any configuration I could set that automatically detects Databricks-specific magic commands?
Any solution to this will be helpful. Thanks in Advance!!!
Unfortunately, as of databricks-connect version 6.2.0:
"We cannot use magic commands outside the Databricks environment directly. This would require creating custom functions, but again that would only work for Jupyter, not PyCharm."
Also, since importing .py files requires the %run magic command, this becomes a major issue as well. One solution is to convert the set of files to be imported into a Python package, add it to the cluster via the Databricks UI, and then import and use it in PyCharm. But this is a very tedious process.
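Before converting imports by hand, it can help to list which files a downloaded notebook pulls in. The sketch below assumes the exported .py source marks magic commands as comment lines of the form # MAGIC %run ./path, which is how the commented-out commands described in the question usually appear; find_run_targets is a hypothetical helper, not part of databricks-connect.

```python
import re
from typing import List

# A commented-out %run magic in an exported notebook source file.
RUN_MAGIC = re.compile(r"^#\s*MAGIC\s+%run\s+(.+?)\s*$")


def find_run_targets(source: str) -> List[str]:
    """Return the paths referenced by '# MAGIC %run' lines, in file order."""
    targets = []
    for line in source.splitlines():
        match = RUN_MAGIC.match(line.strip())
        if match:
            # Drop optional quotes around the path.
            targets.append(match.group(1).strip("'\""))
    return targets


exported = """# Databricks notebook source
# MAGIC %run './FILE_TO_IMPORT'

# COMMAND ----------

# MAGIC %sql
print("hello")
"""
print(find_run_targets(exported))
```

Each path it reports is a candidate for a regular Python import once the corresponding file has been packaged as described above.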

When trying to register a UDF using Python I get an error about Spark BUILD with HIVE

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o54))
This happens whenever I create a UDF on a second notebook in Jupyter on IBM Bluemix Spark as a Service.
If you are using IBM Bluemix Spark as a Service, execute the following command in a cell of the Python notebook:
!rm -rf /gpfs/global_fs01/sym_shared/YPProdSpark/user/spark_tenant_id/notebook/notebooks/metastore_db/*.lck
Replace spark_tenant_id with the actual one. You can find the tenant id using the following command in a cell of the notebook:
!whoami
I've run into these errors as well. Only the first notebook you launch will have access to the Hive context. From here:
By default Hive(Context) is using embedded Derby as a metastore. It is intended mostly for testing and supports only one active user.
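The rm command above can also be written in Python from a notebook cell. This is a sketch only: remove_derby_locks is a hypothetical helper, the directory must be the metastore_db path for your own tenant, and it assumes no other running notebook is still using that metastore.

```python
import glob
import os
import tempfile


def remove_derby_locks(metastore_dir):
    """Delete *.lck files in metastore_dir and return the removed paths."""
    removed = []
    for path in sorted(glob.glob(os.path.join(metastore_dir, "*.lck"))):
        os.remove(path)
        removed.append(path)
    return removed


# Demonstration on a throwaway directory rather than a real metastore.
demo_dir = tempfile.mkdtemp()
for name in ("db.lck", "dbex.lck", "service.properties"):
    open(os.path.join(demo_dir, name), "w").close()

print([os.path.basename(p) for p in remove_derby_locks(demo_dir)])
print(os.listdir(demo_dir))
```

Returning the removed paths makes it easy to see whether stale lock files were actually present.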

Logback stops logging on file change

I wrote a small test and found that when Logback is logging to a file and I open that file in vi, edit it, and then save it, Logback stops writing to that file.
There are no apparent errors in the process that is writing the logs. It keeps running, but no logs are appended to the file.
Is anyone familiar with this? I tried running the same test against log4j, and it appears to continue writing the logs. I recall reading in the past that log4j also had such a shortcoming, but I couldn't reproduce it.
Please advise.
