spark-submit on YARN: java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver

spark-submit --master yarn --deploy-mode cluster sqlserver.py --jars sqljdbc42.jar
I get this error:
java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver
Everything works when I use --deploy-mode client and copy the jar file to /usr/hdp/current/spark2-client/jars/sqljdbc42.jar.
Should I copy sqljdbc42.jar to /usr/hdp/current/hadoop-yarn-client/lib/ on all data nodes?
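One likely cause worth checking before copying jars around: spark-submit treats everything after the application file as arguments to that application, so in the command above --jars sqljdbc42.jar is being passed to sqlserver.py rather than to Spark. Placing the option before the script should ship the driver jar with the job:
spark-submit --master yarn --deploy-mode cluster --jars sqljdbc42.jar sqlserver.py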

Related

What is the difference between yarn mode and deploy mode in Spark?

I'm very confused right now.
Please check if this is right.
The 4 cases are the commands below:
# This means: YARN is in cluster mode, and the deploy mode is cluster.
# The cluster has a YARN container (holding the Spark AM and Spark driver) and a YARN node manager.
spark-submit --master yarn --deploy-mode cluster
# This means: YARN is in cluster mode, and the deploy mode is client.
# The client has the Spark driver.
# The cluster has a YARN container (holding the Spark AM and Spark driver) and a YARN node manager.
spark-submit --master yarn --deploy-mode client
# This means: YARN is in client mode, and the deploy mode is cluster.
# The cluster has a YARN container (holding the Spark AM) and a YARN node manager.
spark-submit --master yarn-client --deploy-mode cluster
# This means: YARN is in client mode, and the deploy mode is client.
# The client has the Spark driver.
# The cluster has a YARN container (holding the Spark AM) and a YARN node manager.
spark-submit --master yarn-client --deploy-mode client
Is my explanation of the above commands correct?
# Use YARN; deploy the driver into the YARN cluster.
spark-submit --master yarn --deploy-mode cluster
# Use YARN; run the driver on my local machine (the machine launching the job).
spark-submit --master yarn --deploy-mode client # this is the default if you don't specify --deploy-mode
These aren't actual options anymore, so they aren't really worth discussing:
spark-submit --master yarn-client --deploy-mode cluster
spark-submit --master yarn-client --deploy-mode client
--master yarn-client was an option in early versions of Spark, but it was deprecated in favor of --master yarn --deploy-mode client and isn't used today.
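If you want to confirm which mode a submitted job actually resolved to, the deploy mode is recorded in the Spark configuration under the standard property spark.submit.deployMode; a minimal check from the REPL (everything else here is just illustration):
scala> sc.getConf.get("spark.submit.deployMode")
// returns "client" for a shell session, "cluster" for a job submitted with --deploy-mode cluster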

Can't run example, error '-Xmx1g org.apache.spark.deploy.SparkSubmit'

I'm trying to replicate the wordcount example from the Cloudera website:
spark-submit --master yarn --deploy-mode client --conf "spark.dynamicAllocation.enabled=false" --jars SPARK_HOME/examples/jars/spark-examples_2.11-2.4.6.jar cloudera_analyze.py zookeeper_server:2181 transaction
But it gives me this error:
C:\openjdk\jdk8025209hotspot\bin\java -cp "C:\spark246hadoop27/conf\;C:\spark246hadoop27\jars\*" -Xmx1g org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf "spark.dynamicAllocation.enabled=false" --jars SPARK_HOME/examples/jars/spark-examples_2.11-2.4.6.jar cloudera_analyze.py zookeeper_server:2181 transaction
What could be the reason?
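Not a full answer, but one thing that stands out in the output shown: SPARK_HOME is being passed through literally instead of being expanded, so Java is handed a jar path that doesn't exist. On Windows, environment variables are expanded with the %VAR% syntax, so something like this may get further (assuming SPARK_HOME is actually set):
spark-submit --master yarn --deploy-mode client --conf "spark.dynamicAllocation.enabled=false" --jars "%SPARK_HOME%\examples\jars\spark-examples_2.11-2.4.6.jar" cloudera_analyze.py zookeeper_server:2181 transaction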

Load properties file in Spark classpath during spark-submit execution

I'm installing the Spark Atlas Connector via a spark-submit script (https://github.com/hortonworks-spark/spark-atlas-connector).
Due to security restrictions, I can't put atlas-application.properties in the spark/conf directory.
I used these two options with spark-submit:
--driver-class-path "spark.driver.extraClassPath=hdfs:///directory_to_properties_files" \
--conf "spark.executor.extraClassPath=hdfs:///directory_to_properties_files" \
When I launch the spark-submit, I encounter this issue:
20/07/20 11:32:50 INFO ApplicationProperties: Looking for atlas-application.properties in classpath
20/07/20 11:32:50 INFO ApplicationProperties: Looking for /atlas-application.properties in classpath
20/07/20 11:32:50 INFO ApplicationProperties: Loading atlas-application.properties from null
Please see the CDP Atlas configuration article:
https://community.cloudera.com/t5/Community-Articles/How-to-pass-atlas-application-properties-configuration-file/ta-p/322158
Client mode (point Atlas at a local directory containing the properties file via -Datlas.conf):
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-java-options="-Datlas.conf=/tmp/" /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10
Cluster mode (ship the properties file to the driver's working directory with --files, then point -Datlas.conf at ./):
sudo -u spark spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --files /tmp/atlas-application.properties --conf spark.driver.extraJavaOptions="-Datlas.conf=./" /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10
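Applied to the Spark Atlas Connector setup above, the same cluster-mode pattern would look roughly like this (the properties path and application jar are placeholders for your own):
spark-submit --master yarn --deploy-mode cluster \
  --files /path/to/atlas-application.properties \
  --conf spark.driver.extraJavaOptions="-Datlas.conf=./" \
  your-application.jar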

Unable to access Hive table when used yarn-cluster mode

I have a Kerberos-enabled Cloudera cluster. I run the kinit command and then spark2-submit; Spark can access the Hive table when I use client deploy mode:
spark2-submit --master yarn --deploy-mode client --keytab XXXXXXXXXX.keytab --principal XXXXXXXXXXX#USER.COM --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:MaxPermSize=1024M -Djava.security.krb5.conf=/etc/krb5.conf" test.jar
But when I use cluster mode, Spark gives a "table not found" error:
spark2-submit --master yarn --deploy-mode cluster --keytab XXXXXXXXXX.keytab --principal XXXXXXXXXXX#USER.COM --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:MaxPermSize=1024M -Djava.security.krb5.conf=/etc/krb5.conf" test.jar
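One common cause of this symptom, offered as a guess since the Hive configuration isn't shown: in cluster mode the driver runs on an arbitrary cluster node, and if hive-site.xml isn't on its classpath there, Spark falls back to a local metastore where the table doesn't exist. Shipping the Hive client configuration with the job often fixes it (/etc/hive/conf/hive-site.xml is the typical Cloudera location):
spark2-submit --master yarn --deploy-mode cluster --files /etc/hive/conf/hive-site.xml --keytab XXXXXXXXXX.keytab --principal XXXXXXXXXXX#USER.COM test.jar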

Spark/Yarn: FileNotFoundException

I am running the following code in Spark:
scala> import com.databricks.spark.xml.XmlInputFormat
scala> import org.apache.hadoop.io._
scala> sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, "<mytag>")
scala> sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</mytag>")
scala> sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "utf-8")
scala> val record1 = sc.newAPIHadoopFile("file:///home/myuser/myfile.xml", classOf[XmlInputFormat], classOf[LongWritable], classOf[Text])
When I run it with the master set to local, it works fine:
spark2-shell --jars spark-xml_2.10-0.4.1.jar --master local[*]
But when I try to run it on YARN, it returns java.io.FileNotFoundException: File file:/home/myuser/myfile.xml does not exist.
spark2-shell --jars spark-xml_2.10-0.4.1.jar --master yarn
I tried adding --deploy-mode as both client and cluster, but it didn't work.
The file file:///home/myuser/myfile.xml seems to be accessible only on your driver, not on your executors. You should either
A) manually put the file on HDFS and read it from there (a sketch follows after option B), or
B) use the --files option of spark2-shell, which uploads the file to HDFS for you:
spark2-shell --files /home/myuser/myfile.xml --jars spark-xml_2.10-0.4.1.jar --master yarn
and then
val record1 = sc.newAPIHadoopFile(org.apache.spark.SparkFiles.get("myfile.xml"), classOf[XmlInputFormat], classOf[LongWritable], classOf[Text])
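For option A, a minimal sketch (the HDFS destination /user/myuser/ is just an example path):
hdfs dfs -put /home/myuser/myfile.xml /user/myuser/myfile.xml
and then read it with an HDFS URI instead of file://:
scala> val record1 = sc.newAPIHadoopFile("hdfs:///user/myuser/myfile.xml", classOf[XmlInputFormat], classOf[LongWritable], classOf[Text])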
