Databricks Connect does not work from IntelliJ

I am trying to use Databricks Connect to run a Spark job on a Databricks cluster from IntelliJ. I followed the documentation linked below:
https://docs.databricks.com/dev-tools/databricks-connect.html
However, I could not make it work from IntelliJ; it throws the exception below:
21/10/01 18:32:07 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/10/01 18:32:07 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
Exception in thread "main" java.lang.NoSuchFieldError: JAVA_9
at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:207)
at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:95)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:443)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:384)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:432)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:262)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:291)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:495)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2834)
I could not find a workaround for this, as the documentation does not say anything clearly. I cross-checked that IntelliJ points to the correct jar directory returned by databricks-connect get-jar-dir. Any clue on this would be helpful.
Note: databricks-connect test returns success.
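For reference, the driver program itself is just a plain Spark application built against the jars from databricks-connect get-jar-dir; a minimal sketch of the kind of code being run from IntelliJ (the app name is a placeholder, and the real job is more involved):
import org.apache.spark.sql.SparkSession

object ConnectSmokeTest {
  def main(args: Array[String]): Unit = {
    // With Databricks Connect, getOrCreate() picks up the cluster configured
    // via `databricks-connect configure`; no master URL is set here.
    val spark = SparkSession.builder()
      .appName("databricks-connect-smoke-test") // placeholder name
      .getOrCreate()

    spark.range(100).show()
    spark.stop()
  }
}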

Related

Application failed 2 times due to AM container for appattempt_ exited with exitCode: 0

When I submit my Spark program, it fails at the end, but with an ExitCode: 0 as shown in the picture.
The program should write a table to Hive and, despite the failure, the table was created successfully.
But I can't figure out the origin of the error. Can you help, please?
The output of yarn logs -appID is linked here.
I finally solved my problem. In fact, today I surprisingly got another error from the same program, saying:
ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver X:37478 disassociated! Shutting down.
I found many solutions talking about memory or timeouts.
What did the trick was much simpler: I had forgotten to close my SparkSession (spark.close()).
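For anyone hitting the same thing, a minimal sketch of what that fix looks like (the job body is a placeholder; the point is closing the session in the finally block):
import org.apache.spark.sql.SparkSession

object HiveTableWriter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-writer") // placeholder name
      .enableHiveSupport()
      .getOrCreate()
    try {
      // ... build the DataFrame and write the Hive table here ...
    } finally {
      // close() delegates to stop(); per the answer above, omitting this
      // was what left the final application attempt in a failed state
      spark.close()
    }
  }
}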

Hive INFO logs are not getting suppressed in Spark job

There are two approaches to controlling logging: via log4j.properties, or programmatically. I have tried both.
Via log4j.properties file:
# disable logging for spark libraries
log4j.additivity.org=false
log4j.additivity.org.apache=false
#log4j.logger.org.apache=ERROR, NOAPPENDER
log4j.logger.org=ERROR, NOAPPENDER
and programmatically:
org.apache.log4j.Logger logger = LogManager.getLogger(pkgName);
logger.setLevel(Level.ERROR);
I was able to suppress the other logs, but a few INFO logs are still being printed:
INFO metastore: Connected to metastore.
INFO Hive: Registering function addfunc ca.nextpathway.hive.UDFToDate
and
INFO ContextHandler: Started o.s.j.s.ServletContextHandler@17f9344b{/static,null,AVAILABLE}
I want to suppress all INFO logs except those from a few specific packages, but I think I am nowhere near it. If anyone knows what the problem could be here, please let me know.
Try using the below. This should work.
Logger.getLogger("org.apache.hadoop.hive").setLevel(Level.ERROR);
The code at https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java has a bug: it creates the logger as below:
Logger LOG = LoggerFactory.getLogger("hive.ql.metadata.Hive");
So the regular filter with org.apache.hadoop.hive does not work. Instead, you have to use "hive.ql.metadata.Hive". For example:
org.apache.log4j.Logger.getLogger("hive.ql.metadata.Hive").setLevel(Level.WARN);
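Putting the two together, a sketch of the programmatic setup, run before the SparkContext/HiveContext is created (the ca.nextpathway package is just the example taken from the logs above):
import org.apache.log4j.{Level, Logger}

// Quiet everything under org.* first ...
Logger.getLogger("org").setLevel(Level.ERROR)
// ... then target loggers that do not follow the package naming convention:
// Hive's metadata class logs under the literal name "hive.ql.metadata.Hive".
Logger.getLogger("hive.ql.metadata.Hive").setLevel(Level.WARN)
// Keep your own packages verbose.
Logger.getLogger("ca.nextpathway").setLevel(Level.INFO)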

Spark - Cosmos - connector problems

I am playing around with the Azure Spark-CosmosDB connector, which lets you access CosmosDB nodes directly from a Spark cluster for analytics using Jupyter on HDInsight.
I have been following the steps described here, including uploading the required jars to Azure storage and executing the %%configure magic to prepare the environment (sketched at the end of this question).
But it always seems to terminate due to an I/O exception when trying to open the jar (see the YARN log below):
17/10/09 20:10:35 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.io.IOException: Error accessing /mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1507534135641_0014/container_1507534135641_0014_01_000001/azure-cosmosdb-spark-0.0.3-SNAPSHOT.jar)
17/10/09 20:10:35 ERROR ApplicationMaster: RECEIVED SIGNAL TERM
17/10/09 20:10:35 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.io.IOException: Error accessing /mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1507534135641_0014/container_1507534135641_0014_01_000001/azure-cosmosdb-spark-0.0.3-SNAPSHOT.jar)
I'm not sure whether this is related to the jar not being copied to the worker nodes.
Any ideas? Thanks, Nick
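For reference, the %%configure cell mentioned above is essentially a Livy session configuration; a rough sketch of its shape (the wasb:// path is a placeholder for wherever the jar was actually uploaded):
%%configure -f
{
  "jars": ["wasb:///example/jars/azure-cosmosdb-spark-0.0.3-SNAPSHOT.jar"]
}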

insert data into Microsoft SQL server using Spark

I am trying to insert data into SQL Server from Spark using the JDBC methods below.
Option 1:
prop.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
dataf.write.mode(org.apache.spark.sql.SaveMode.Append).jdbc(url,table_name, prop)
The table is already created; I am appending new data. The job errored out with the exception below:
Exception in thread "main" com.microsoft.sqlserver.jdbc.SQLServerException: CREATE TABLE permission denied in database
Question: why is CREATE TABLE permission required for appending data?
Option 2:
prop.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(dataf, url, table_name, prop)
The above command works from spark-shell. When the same code is used in a Scala program and packaged with dependencies, it gives the exception below:
Exception in thread "main" java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:315)
I tried setting the driver class-path and the executor class-path, and also --jars; still no luck. I included sqljdbc4.jar in the driver classpath and in --jars.
I copied sqljdbc4.jar to all worker nodes as well; still no luck.
Any ideas on this?
After a lot of searching and testing, I found the answer. It might be useful for someone.
Option 1: this fails because of a bug in Spark 1.5.x that was resolved in 1.6.x and later. Because of the bug, it always tries to create a new table.
Option 2: this happens because the driver name on the classpath is given priority over the properties we pass as an argument. The workaround is to create the connection first and then invoke saveTable.
Workaround if you are using Spark 1.5.x or lower:
JdbcUtils.createConnection(url, prop)
JdbcUtils.saveTable(dataf, url, table_name, prop)

Inject runtime exception in Nutch 2.3

I am stuck setting up Nutch 2.3 with HBase 0.94:
fx#fx:~$ $NUTCH_HOME/runtime/local/bin/nutch inject file:///home/fx/Abivin/apache-nutch-2.3/seed/urls.txt
InjectorJob: starting at 2015-06-17 14:46:35
InjectorJob: Injecting urlDir: file:/home/fx/Abivin/apache-nutch-2.3/seed/urls.txt
InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora storage class.
InjectorJob: java.lang.RuntimeException: job failed: name=inject file:/home/fx/Abivin/apache-nutch-2.3/seed/urls.txt, jobid=job_local1999341506_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:231)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
where seed/urls.txt contains the URLs. I've searched many similar errors but am still stuck with this. Please give me some ideas to resolve it. Thanks.
It seems that Nutch cannot inject the URLs into the 'webpage' table. First, please check the gora-hbase configuration. If the configuration is correct, you should delete the HBase data directory and start again.
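If it helps, a sketch of the settings that normally point Nutch 2.x at HBase instead of the in-memory store (the MemStore line in the log above suggests these are still at their defaults; the file names below are the stock Nutch ones):
# conf/gora.properties
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore

# conf/nutch-site.xml
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
</property>

# ivy/ivy.xml: make sure the gora-hbase dependency is uncommented before rebuilding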
Hope this helps
