Error reading Hive table from Spark using JdbcStorageHandler

Error reading Hive table from Spark using JdbcStorageHandler - apache-spark

I've set up access to an external relational store (PostgreSQL) via my Spark/Hive deployment. I can read this table via Hive/Beeline, but it fails when I try to read via SparkSQL/pyspark3 jupyter notebook, because it's unable to find JdbcStorageHandler. I've tried to add the appropriate jars in a couple of ways but am hitting the same stack trace across the board - any advice on what jar and version I need, and where exactly I should put it, for this to work? Stack trace:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.hive.storage.jdbc.JdbcStorageHandler
..
..
java.lang.ClassNotFoundException: org.apache.hive.storage.jdbc.JdbcStorageHandler
In terms of getting Hive/Beeline to work: I did as described in this JDBC Storage Handler document. I hit a few jar dependency problems while doing this, but resolved it by adding the hive-jdbc-2.0.0.jar, postgresql-42.2.12.jar jars after launching Beeline, and can now successfully read data directly from the relational store from Beeline.
Some things I've tried:
Add the jars listed above with spark.jars.packages in the notebook sparkmagic conf. hive-jdbc 2.0.0 installs cleanly but yields aforementioned error. I tried hive-jdbc 3.1.0 also, but it errors out and does not install. I was a little confused as to how to assess compatibility here, might be a distraction.
Launch spark-sql on the cluster directly, add hive-jdbc-2.0.0.jar jar (successfully). Same stack trace.
Add Apache Hive libraries across the cluster during cluster creation (the hive-jdbc, and postgres driver)
Look around the rest of /usr/hdp for hive-jdbc, of which there are a variety of versions (beneath zeppelin, spark2, oozie, hive-hcatalog, hive, ranger-admin).
Environment details:
running on Azure HDInsight
Spark 2.4 (HDI 4.0)

Please copy the hive-jdbc-handler.jar to $SPARK_HOME/standalone-metastore directory in all nodes.
cp /usr/hdp/current/hive-client/lib/hive-jdbc-handler.jar /usr/hdp/current/spark2-client/standalone-metastore/
After that launch the spark-shell and test the example
sudo -u spark spark-shell --master yarn --jars /usr/hdp/current/hive-client/lib/hive-jdbc-handler.jar
scala> spark.sql("select * from table_name").show()
If you get below error, then there is an issue created for this one, you need to back port that code Spark 3.0.0 to your cluster spark version for example Spark 2.3.2.
Caused by: java.lang.IllegalArgumentException: requirement failed: length (-1) cannot be negative
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.rdd.InputFileBlockHolder$.set(InputFileBlockHolder.scala:70)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:226)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:214)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
https://issues.apache.org/jira/browse/SPARK-27259

Related

Spark 3.2 on Kubernetes keeps throwing okhttp3/okio EOFException

I'm using Spark 3.2.1 image that was built from the official distribution via `docker-image-tool.sh', on Kubernetes 1.18 cluster. Everything works fine, except for this error message every 90 seconds:
WARN WatcherWebSocketListener: Exec Failure
java.io.EOFException
at okio.RealBufferedSource.require(RealBufferedSource.java:61)
at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This error message does not effect the application, but it's really annoying, especially for Jupyter users, and the lack of details makes it very hard to debug.
It appears on any submit variation - spark-submit, pyspark, spark-shell, and regardless to dynamic execution enabled or disabled.
I've found traces of it on the internet, but all occurrences were from older versions of Spark and resolved by using "newer" version of fabric8 (4.x). Spark 3.2.1 already use fabric8 5.4.1.
I wonder if anyone else still sees this error in Spark 3.x, and has a resolution.
Thanks.
Update:
This seems to be related to the Kubernetes cluster itself. After migrating to a new cluster this error was gone.

Flink-Cassandra connector throws exception (flink-connector-cassandra_2.11-1.10.0)

I am trying to upgrade flink 1.7.2 to flink 1.10 and I am having problem with cassandra connector. Everytime I start a job that is using it the following exception is thrown:
com.datastax.driver.core.exceptions.TransportException: [/xx.xx.xx.xx] Error writing
at com.datastax.driver.core.Connection$10.operationComplete(Connection.java:550)
at com.datastax.driver.core.Connection$10.operationComplete(Connection.java:534)
at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyLateListener(DefaultPromise.java:621)
at com.datastax.shaded.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:138)
at com.datastax.shaded.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:93)
at com.datastax.shaded.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:28)
at com.datastax.driver.core.Connection$Flusher.run(Connection.java:870)
at com.datastax.shaded.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
at com.datastax.shaded.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at com.datastax.shaded.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.datastax.shaded.netty.handler.codec.EncoderException: java.lang.OutOfMemoryError: Direct buffer memory
at com.datastax.shaded.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:643)
Also the following message was printed when the job was run locally (not in YARN):
13:57:54,490 ERROR com.datastax.shaded.netty.util.ResourceLeakDetector - LEAK: You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created.
All jobs that do not use cassandra connector are working properly
Can someone help?

UPDATE: The bug is still reproducible and I think this is the reason: https://issues.apache.org/jira/browse/FLINK-17493.
I had an old configuration (from flink 1.7) where classloader.parent-first-patterns.additional: com.datastax. was configured and my cassadndra-flink connector was in flink/lib folder ( this was done because of other problems related to shaded netty I had with Cassandra-flink connector). Now with the migration to flink 1.10 the following problem was hit. Once removing this configuration - classloader.parent-first-patterns.additional: com.datastax., including flink-connector-cassandra_2.12-1.10.0.jar in my jar and removing it from /usr/lib/flink/lib/ the problem was no longer reproducible.

Hive on Spark ERROR java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS

Running Hive on Spark with a simple select * from table query runs smoothly, but on joins and sums, the ApplicationMaster returns this stack trace for the associated spark container:
2019-03-29 17:23:43 ERROR ApplicationMaster:91 - User class threw exception: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
at org.apache.hive.spark.client.rpc.RpcConfiguration.<clinit>(RpcConfiguration.java:47)
at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:134)
at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:516)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:706)
2019-03-29 17:23:43 INFO ApplicationMaster:54 - Final app status: FAILED, exitCode: 13, (reason: User class threw exception: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
at org.apache.hive.spark.client.rpc.RpcConfiguration.<clinit>(RpcConfiguration.java:47)
at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:134)
at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:516)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:706)
)
2019-03-29 17:23:43 ERROR ApplicationMaster:91 - Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:486)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:800)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:799)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:824)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
at scala.concurrent.impl.Promise$.resolver(Promise.scala:55)
at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:47)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:244)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:724)
Caused by: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
at org.apache.hive.spark.client.rpc.RpcConfiguration.<clinit>(RpcConfiguration.java:47)
at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:134)
at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:516)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:706)
2019-03-29 17:23:43 INFO ApplicationMaster:54 - Deleting staging directory hdfs://LOSLDAP01:9000/user/hdfs/.sparkStaging/application_1553880018684_0001
2019-03-29 17:23:43 INFO ShutdownHookManager:54 - Shutdown hook called
I have already tried to increase yarn container memory allocation (and decrease spark memory) with no success.
Using:
Hadoop 2.9.2
Spark 2.3.0
Hive 2.3.4
Thank you for your help.

This was asked 6 months ago. Hope this helps others.
The reason for this error is SPARK_RPC_SERVER_ADDRESS added in hive version 2.x and Spark by default supports hive 1.2.1.
I was able to enable hive-on-spark using this manual on EMR 5.25 cluster (Hadoop 2.8.5, hive 2.3.5, Spark 2.4.3) for running on YARN. However, manual needs to be updated, it is missing some key items.
To run with YARN mode (either yarn-client or yarn-cluster), link the following jars to HIVE_HOME/lib. Manual didn't mention linking the last library spark-unsafe.jar
ln -s /usr/lib/spark/jars/scala-library-2.11.12.jar /usr/lib/hive/lib/scala-library.jar
ln -s /usr/lib/spark/jars/spark-core_2.11-2.4.3.jar /usr/lib/hive/lib/spark-core.jar
ln -s /usr/lib/spark/jars/spark-network-common_2.11-2.4.3.jar /usr/lib/hive/lib/spark-network-common.jar
ln -s /usr/lib/spark/jars/spark-unsafe_2.11-2.4.3.jar /usr/lib/hive/lib/spark-unsafe.jar
Allow Yarn to cache necessary spark dependency jars on nodes so that it does not need to be distributed each time when an application runs.
Hive 2.2.0, upload all jars in $SPARK_HOME/jars to hdfs folder and add following in hive-site.xml
<property>
<name>spark.yarn.jars</name>
<value>hdfs://xxxx:8020/spark-jars/*</value>
</property>
Manual is missing key information that you need to exclude default hive 1.2.1 jars. This is what I did:
hadoop fs -mkdir /spark-jars
hadoop fs -put /usr/lib/spark/jars/*.jar /spark-jars/
hadoop fs -rm /spark-jars/*hive*1.2.1*
Also, you need to add the following to spark-defaults.conf file:
spark.sql.hive.metastore.version 2.3.0;
spark.sql.hive.metastore.jars /usr/lib/hive/lib/*:/usr/lib/hadoop/client/*
For more information on interacting with different versions of hive metastore please check this link.

It turned out that Hive-on-Spark has a lot of implementation problems and essentially does not work at all unless you write your own custom Hive connector. In a nutshell, Spark devs are struggling to keep up with Hive releases and they did not yet decided how to deal with backward compatibility on how to load Hive versions ~< 2 while focusing on the newest branch.
Solutions
1) Go back to Hive 1.x
Not ideal. Especially if you want some more modern integration with file formats such as ORC.
2) Use Hive-on-Tez
This is the one we decided to adopt. *This solution does not break the open source stack* and works perfectly along with Spark-on-Yarn. 3rd party Hadoop ecosystems, like those for Azure, AWS and Hortonworks all add proprietary code just for running Hive-On-Spark because of the mess that it became.
By installing Tez, your Hadoop queries will work like this:
A direct Hive query (e.g. jdbc connection from DBeaver) will run a Tez container on the cluster
A Spark job will be able to access the Hive metastore as normal and will use a Spark container on the cluster when creating the SparkSession.builder.enableHiveSupport().getOrCreate() (this is pyspark code)
Installing Hive-on-Tez with Spark-on-Yarn
Note: I'll keep it short since I do not see much interest on these boards. Ask for details and I'll be happy to help and expand.
Version matrix
Hadoop 2.9.2
Tez 0.9.2
Hive 2.3.4
Spark 2.4.2
Hadoop is installed in cluster mode.
This is what worked for us. I would not expect it to work seamlessly when switching to Hadoop 3.x, which we will be doing at some point in the future, but it should work fine if you do not change the main release version for each component.
Basic guide
Compile Tez from source as written in the official install guide, with Mode A for sharing hadoop jars. Do not use any pre-compiled Tez distro. Test it by the hive shell with a simple query which is not a simple data access (i.e. just a select). For example, use: select count(*) from myDb.myTable. You should see the Tez bars from the hive console.
Compile Spark from source. To do so, follow the official guide (Important: download the archive labeled without-hadoop !!), but before compiling it edit the source code at ./sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala and comment out the following line: ConfVars.HIVE_STATS_JDBC_TIMEOUT -> TimeUnit.SECONDS,
Share $HIVE_HOME/conf/hive-site.xml, in your $SPARK_HOME/conf/ dir. You must make a hard copy of this config file and not a symlink. The reason is that you must remove all Tez-related Hive config values from it to guarantee that Spark co-exist independently with Tez, as explained above. This does include the hive.execution.engine=tez property which must be left empty. Just remove it completely from the Spark's hive-site.xml, while leaving it in the Hive's hive-site.xml.
In $HADOOP_HOME/etc/hadoop/mapred-site.xml set property mapreduce.framework.name=yarn. This will be picked up correctly by both environments even if it is not set to yarn-tez. It just means that raw mapreduce jobs will not run on Tez, while Hive jobs will indeed use it. This is a problem only for legacy jobs, since raw mapred is obsolete.
Good luck!

Spark action stuck with EOFException

I'm trying to execute an action with Spark with gets stuck. The corresponding executor throws following exception:
2019-03-06 11:18:16 ERROR Inbox:91 - Ignoring error
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readUTF(DataInputStream.java:609)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:131)
at org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:130)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at org.apache.spark.scheduler.TaskDescription$.decode(TaskDescription.scala:130)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:96)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
My environment is a standalone Spark cluster on Docker with Zeppelin as Spark driver. The connection to the cluster is working fine.
My Spark action is a simple output of a database read like:
spark.read.jdbc(jdbcString, "table", props).show()
I can print the schema of the table, so there shouldn't be a problem with the connection.

Please check your environment JAVA, Python, Pysaprk must be same in MASTER and WORKER and path, version same too

Our driver machine had a different version of Java compared to spark standalone cluster. When we tried with another machine with the same java version, it worked.

I had the same issue in one of folder available on S3. Data was stored as Parquet with Snappy compression. When I changed it to ORC with Snappy compression, it worked like charm.

Spark enable security with secure YARN Hadoop cluster

I have an Hadoop 3.0 cluster configured with Kerberos. Everything works fine and YARN is started as well.
Now I wish to add Spark on top of it and make full use of Hadoop and security. To do so I use a binary distribution of Spark 2.3 and modified the following.
In spark-env.sh:
YARN_CONF_DIR, set to the folder where my Hadoop configuration files core-site.xml, hdfs-site.xml and yarn-site.xml are located.
In spark-defaults.conf:
spark.master yarn
spark.submit.deployMode cluster
spark.authenticate true
spark.yarn.principal mysparkprincipal
spark.yarn.keytab mykeytabfile
If I understood correctly when using YARN, the secret key will be generated automatically and I don't need to manually set spark.authenticate.secret.
The problem I have is that the worker is complaining about the key:
java.lang.IllegalArgumentException: A secret key must be specified via the spark.authenticate.secret config
I also don't have any indication in the logs that Spark is using YARN or try to do anything with my hdfs volume. It's almost like Hadoop configuration files are ignored completely. I've read documentation on YARN and security for Spark but it's not very clear to me.
My questions are:
How can i be sure that Spark is using YARN
Do I need to set spark.yarn.access.hadoopFileSystems if I only use the server set in YARN_CONF_DIR
Is LOCAL_DIRS best to be set to HDFS and if yes, what is the syntax
Do I need both HADOOP_CONF_DIR and YARN_CONF_DIR?
Edit/Add:
After looking at the source code the exception is from SASL which is not enabled for Spark so I don't understand.
My Hadoop has SSL enabled (Data Confidentiality) and since I give Spark my server configuration maybe it requires SSL for Spark if the configuration for Hadoop has it enabled.
So far I am really confused about everything.
It says that environment variables need to be set using the spark.yarn.appMasterEnv. But which one? All of them?
Also it says that it is Hadoop CLIENT file that I need to have on the classpath but what properties should be present in a CLIENT file?
I guess I can replace the XML files using spark.hadoop.* properties but what are the properties required for Spark to know where is my YARN cluster?
Setting spark.authenticate.enableSaslEncryption to false seems to have no effect as the exception is still about SparkSaslClient
The exception is:
java.lang.IllegalArgumentException: A secret key must be specified via the spark.authenticate.secret config
at org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
at org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:509)
at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:551)
at org.apache.spark.network.sasl.SparkSaslClient$ClientCallbackHandler.handle(SparkSaslClient.java:137)
at com.sun.security.sasl.digest.DigestMD5Client.processChallenge(DigestMD5Client.java:337)
at com.sun.security.sasl.digest.DigestMD5Client.evaluateChallenge(DigestMD5Client.java:220)
at org.apache.spark.network.sasl.SparkSaslClient.response(SparkSaslClient.java:98)
at org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:71)
at org.apache.spark.network.crypto.AuthClientBootstrap.doSaslAuth(AuthClientBootstrap.java:115)
at org.apache.spark.network.crypto.AuthClientBootstrap.doBootstrap(AuthClientBootstrap.java:74)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:257)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string