error: not found: value sqlContext on EMR - apache-spark

I am on EMR using Spark 2. When I ssh into the master node and run spark-shell, I can't seem to have access to sqlContext. Is there something I'm missing?
[hadoop@ip-172-31-13-180 ~]$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/11/10 21:07:05 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/11/10 21:07:14 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://172.31.13.180:4040
Spark context available as 'sc' (master = yarn, app id = application_1478720853870_0003).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext
scala> sqlContext
<console>:25: error: not found: value sqlContext
sqlContext
^
Since I'm getting the same error on my local computer, I've tried the following, to no avail:
exported SPARK_LOCAL_IP
➜ play grep "SPARK_LOCAL_IP" ~/.zshrc
export SPARK_LOCAL_IP=127.0.0.1
➜ play source ~/.zshrc
➜ play spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/11/10 16:12:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/10 16:12:19 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://127.0.0.1:4040
Spark context available as 'sc' (master = local[*], app id = local-1478812339020).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
scala> sqlContext
<console>:24: error: not found: value sqlContext
sqlContext
^
scala>
My /etc/hosts contains the following
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost

Spark 2.0 doesn't use SQLContext anymore:
use SparkSession (initialized in spark-shell as spark).
For legacy applications you can use:
val sqlContext = spark.sqlContext
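For example, in the Spark 2.x spark-shell you can work through the session directly or grab the legacy handle from it (a minimal sketch; the JSON path and view name are just placeholders):
scala> val sqlContext = spark.sqlContext   // legacy handle, backed by the session
scala> val df = spark.read.json("examples/src/main/resources/people.json")
scala> df.createOrReplaceTempView("people")
scala> sqlContext.sql("SELECT * FROM people").show()   // old-style API keeps working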

Related

Google PubSub in Apache Spark 2.2.1

I'm trying to use Google Cloud PubSub within a Spark application. For simplicity let's just say that this application is Spark's shell. Trying to instantiate a Publisher throws a NoClassDefFoundError, which is most likely the result of dependency version conflicts. However, with a simple setup like this (just Spark and a Google Cloud PubSub dependency), I can't figure out how to resolve this issue.
bash-4.4# spark-shell --packages com.google.cloud:google-cloud-pubsub:1.105.0
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-2.2.1-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.google.cloud#google-cloud-pubsub added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.google.cloud#google-cloud-pubsub;1.105.0 in central
found io.grpc#grpc-api;1.28.1 in central
...
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.
scala> com.google.cloud.pubsub.v1.Publisher.newBuilder("topic").build
java.lang.NoClassDefFoundError: com/google/api/gax/grpc/InstantiatingGrpcChannelProvider
at com.google.cloud.pubsub.v1.stub.PublisherStubSettings.defaultGrpcTransportProviderBuilder(PublisherStubSettings.java:225)
at com.google.cloud.pubsub.v1.TopicAdminSettings.defaultGrpcTransportProviderBuilder(TopicAdminSettings.java:169)
at com.google.cloud.pubsub.v1.Publisher$Builder.<init>(Publisher.java:674)
at com.google.cloud.pubsub.v1.Publisher$Builder.<init>(Publisher.java:625)
at com.google.cloud.pubsub.v1.Publisher.newBuilder(Publisher.java:621)
... 48 elided
Caused by: java.lang.ClassNotFoundException: com.google.api.gax.grpc.InstantiatingGrpcChannelProvider
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 53 more
Is there any way to get this to work? I could change the pubsub dependency version, but not the Spark version.
That is due to a Guava dependency conflict, which is known to occur whenever Spark and Google libraries are used together. The workaround (with Maven) is to use the maven-shade-plugin.
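A sketch of that workaround, assuming the application is built with Maven (the plugin version and the repackaged prefix below are illustrative): relocate the Guava packages the PubSub client needs so they no longer collide with the older Guava on Spark's classpath, then submit the resulting shaded jar instead of relying on --packages.
<!-- pom.xml excerpt (sketch): shade the app and relocate Guava inside the jar -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>repackaged.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
With the relocation in place, the PubSub client resolves the Guava classes bundled in the application jar while Spark keeps using its own copy.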

Why spark submit --files option not working?

The spark-submit --files option is not working as expected.
I am trying to use the following option with spark-submit:
--files FILES          Comma-separated list of files to be placed in the working
                       directory of each executor. File paths of these files in
                       executors can be accessed via SparkFiles.get(fileName).
sh-4.2$ spark-shell --files etl_emr_test_config.json
...
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.apache.spark._
import org.apache.spark._
scala> SparkFiles.get("etl_emr_test_config.json")
res0: String = /mnt/tmp/spark-770e7981-2a38-4b12-950d-3519e70bdbe0/userFiles-afa53bd8-45c9-4c30-a923-feb2f0927117/etl_emr_test_config.json
scala> spark.read.text(SparkFiles.get("etl_emr_test_config.json")).show()
org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://ip-100-69-166-111.ec2.internal:8020/mnt/tmp/spark-770e7981-2a38-4b12-950d-3519e70bdbe0/userFiles-afa53bd8-45c9-4c30-a923-feb2f0927117/etl_emr_test_config.json;
I was expecting etl_emr_test_config.json to be present at the path returned by SparkFiles.get("etl_emr_test_config.json"), but it gives me an error that the file is not present.
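For context, a sketch of what seems to be going on (the reading code below is illustrative, not from the question): SparkFiles.get returns a path on the local filesystem where the --files copy was placed, while spark.read resolves a scheme-less path against the cluster's default filesystem (HDFS on EMR), hence the hdfs://... error. Reading the localized copy as a plain local file shows it is actually there:
scala> import org.apache.spark.SparkFiles
scala> import scala.io.Source
scala> val localPath = SparkFiles.get("etl_emr_test_config.json")   // local path, not an HDFS path
scala> val configJson = Source.fromFile(localPath).mkString         // reads the driver's local copy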

Apache Spark installation error.

I am able to install Apache Spark with the given set of commands on Ubuntu 16:
dpkg -i scala-2.12.1.deb
mkdir /opt/spark
tar -xvf spark-2.0.2-bin-hadoop2.7.tgz
cp -rv spark-2.0.2-bin-hadoop2.7/* /opt/spark
cd /opt/spark
Executing the Spark shell worked well:
./bin/spark-shell --master local[2]
It returns this output on the shell:
jai#jaiPC:/opt/spark$ ./bin/spark-shell --master local[2]
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
18/05/15 19:00:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/15 19:00:55 WARN Utils: Your hostname, jaiPC resolves to a loopback address: 127.0.1.1; using 172.16.16.46 instead (on interface enp4s0)
18/05/15 19:00:55 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/05/15 19:00:55 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://172.16.16.46:4040
Spark context available as 'sc' (master = local[2], app id = local-1526391055793).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
But when I tried to access the Spark context Web UI at http://172.16.16.46:4040, it shows:
The page cannot be displayed
How can I resolve this problem? Please help.
Thanks and regards

I want to run spark shell in client mode?

Spark context available as 'sc' (master = yarn, app id = application_1519491124804_0002).
I need master = yarn-client
error:
Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/02/24 22:27:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/24 22:27:29 WARN Utils: Your hostname, suraj resolves to a loopback address: 127.0.1.1; using 192.168.43.193 instead (on interface wlan0)
18/02/24 22:27:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/02/24 22:27:32 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://192.168.43.193:4040
Spark context available as 'sc' (master = yarn, app id = application_1519491124804_0002).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_161)
Type in expressions to have them evaluated.
Type :help for more information.
I need master = yarn-client
In Spark 2.x, master = yarn-client is deprecated.
spark-shell --master yarn --deploy-mode client is the correct way to run the shell.
The default deploy mode is client.
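A short sketch of equivalent invocations (the --conf form is just another way of setting the same deploy mode):
spark-shell --master yarn --deploy-mode client
spark-shell --master yarn --conf spark.submit.deployMode=client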

spark-shell, dependency jars and class not found exception

I'm trying to run my Spark app in the Spark shell. Here is what I tried (and many more variants, after hours of reading about this error), but none seem to work.
spark-shell --class my_home.myhome.RecommendMatch --jars /Users/anon/Documents/Works/sparkworkspace/myhome/target/myhome-0.0.1-SNAPSHOT.jar,/Users/anon/Documents/Works/sparkworkspace/myhome/target/original-myhome-0.0.1-SNAPSHOT.jar
What I get instead is:
java.lang.ClassNotFoundException: my_home.myhome.RecommendMatch
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Any ideas please? Thanks!
UPDATE:
Found that the jars must be colon(:) separated and not comma(,) separated as described in several articles/docs
spark-shell --class my_home.myhome.RecommendMatch --jars /Users/anon/Documents/Works/sparkworkspace/myhome/target/myhome-0.0.1-SNAPSHOT.jar:/Users/anon/Documents/Works/sparkworkspace/myhome/target/original-myhome-0.0.1-SNAPSHOT.jar
However, now the errors have changed. Note that ls -la finds the paths, although the following lines complain that they don't exist. Bizarre...
Warning: Local jar /Users/anon/Documents/Works/sparkworkspace/myhome/target/myhome-0.0.1-SNAPSHOT.jar:/Users/anon/Documents/Works/sparkworkspace/myhome/target/original-myhome-0.0.1-SNAPSHOT.jar does not exist, skipping.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:314)
at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:268)
UPDATE 2:
spark-shell --class my_home.myhome.RecommendMatch --jars "/Users/anon/Documents/Works/sparkworkspace/myhome/target/myhome-0.0.1-SNAPSHOT.jar:/Users/anon/Documents/Works/sparkworkspace/myhome/target/original-myhome-0.0.1-SNAPSHOT.jar"
The above command yields the following in spark-shell.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/05/16 01:19:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/16 01:19:13 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.0.101:4040
Spark context available as 'sc' (master = local[*], app id = local-1494877749685).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :load my_home.myhome.RecommendMatch
That file does not exist
scala> :load RecommendMatch
That file does not exist
scala> :load my_home.myhome.RecommendMatch.scala
That file does not exist
scala> :load RecommendMatch.scala
That file does not exist
The jars don't seem to be loaded :( based on what I see at http://localhost:4040/environment/
The URLs supplied to --jars must be separated by commas. Your first command is correct.
You also have to pass the application jar as the last parameter to spark-submit. Let's say my_home.myhome.RecommendMatch is part of the myhome-0.0.1-SNAPSHOT.jar file.
spark-submit --class my_home.myhome.RecommendMatch \
--jars "/Users/anon/Documents/Works/sparkworkspace/myhome/target/original-myhome-0.0.1-SNAPSHOT.jar" \
/Users/anon/Documents/Works/sparkworkspace/myhome/target/myhome-0.0.1-SNAPSHOT.jar
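For the spark-shell case in the question, a comma-separated --jars list alone should be enough to put the class on the REPL's classpath (a sketch reusing the question's paths; --class is a spark-submit option for selecting an application's main class and isn't needed when launching the interactive shell):
spark-shell --jars /Users/anon/Documents/Works/sparkworkspace/myhome/target/myhome-0.0.1-SNAPSHOT.jar,/Users/anon/Documents/Works/sparkworkspace/myhome/target/original-myhome-0.0.1-SNAPSHOT.jar
scala> import my_home.myhome.RecommendMatch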
