Flume is not able to send the event when submitting the job on cluster with yarn-client - apache-spark

I am using Horton Works Cluster (2 Node cluster) to run the spark and flume , So when I am running the job with --master "local[*]" , Flume is able to send the events and Spark is also able to receive and on checking at localhost:4040 I can see the events are being received from the flume. (We are pumping 100 Events/Sec from flume using flume-ng-sql source with an approx size of ~1KB each)
Where as when I run the same example with --master "yarn-client" , I am getting the below error in flume and spark is not getting any events as well.
2015-08-13 18:24:24,927 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:403)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flume.FlumeException: NettyAvroRpcClient { host: localhost, port: 55555 }: RPC connection error
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:182)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:121)
at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:638)
at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
at org.apache.flume.sink.AvroSink.initializeRpcClient(AvroSink.java:127)
at org.apache.flume.sink.AbstractRpcSink.createConnection(AbstractRpcSink.java:222)
at org.apache.flume.sink.AbstractRpcSink.verifyConnection(AbstractRpcSink.java:283)
at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:360)
... 3 more
Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:55555
at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:261)
at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:203)
at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:152)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:168)
... 10 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:496)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:452)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:365)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
^
Also below observation has been observed in cluster:
-- Memory consumption using yarn is pretty much higher than compared to that being used in case of local.
-- Also when I am pumping 100 events per 30 second then Flume and spark are able to connect and process the same using yarn-client as well as local..
Below is the command which I am using for flume and spark.
Flume:
sudo -u hdfs flume-ng agent --conf conf/ -f conf/flume_mysql_spark.conf -n agent1 -Dflume.root.logger=INFO,console > flumelog.txt
Spark:
sudo -u hdfs spark-submit --master "yarn-client" --class "org.paladion.atm.FlumeEventCount" target/atm-1.1-jar-with-dependencies.jar > sparklog.txt
sudo -u hdfs spark-submit --master "local[*]" --class "org.paladion.atm.FlumeEventCount" target/atm-1.1-jar-with-dependencies.jar > sparklog.txt
Kindly l;et me know what could be wrong over here?

It got solves as below:
1 - If running as local give IP of local machine in Flume as well as spark.
2 - If running as cluster (yarn-client or yarn-cluster) give IP of the machine in cluster where you want to send the events (other than the one where you are executing the program so may be give IP of node which is not a master node) machine in Flume as well as spark.
Let me know if I am wrong and this could have worked for some other reason and any better solution is there for the same.

Related

Spark Structure Streaming job failing in cluster mode

I am using spark-sql-2.4.1 v in my application.
While writing data on to hdfs folder I am facing this issue in spark-streaming application
Error:
yarn.Client: Deleted staging directory hdfs://dev/user/xyz/.sparkStaging/application_1575699597805_47
20/02/24 14:02:15 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied: user= xyz, access=WRITE, inode="/tmp/hadoop-admin":admin:supergroup:drwxr-xr-x
.
.
.
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xyz, access=WRITE, inode="/tmp/hadoop-admin":admin:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:350)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:251)
While writing data on to HDFS folder I am facing this issue in spark-streaming application. When I run in yarn-cluster mode I face this issue i.e.
--master yarn \
--deploy-mode cluster \
But when I run in “yarn-client” mode it runs fine i.e.
--master yarn \
--deploy-mode client \
What is the root cause of this problem?
Fundamental question here, why it is trying to write in "/tmp/hadoop-admin/" instead of respective user directory i.e. hdfs://qa2/user/xyz/?
I have come across this fix:
https://issues.apache.org/jira/browse/SPARK-26825
How can I implement it in my spark-sql application?
The only difference between the working --deploy-mode client and the failing --deploy-mode cluster cases is the location of the driver. In client deploy mode, the driver runs on the machine you execute spark-submit (which is usually an edge node that is configured to use a YARN cluster, but it is not part of it) while in cluster deploy mode the driver runs as part of a YARN cluster (one of the nodes under control of YARN).
It looks like you've got a misconfigured edge node.
I'd not be surprised if a regular Spark SQL-only Spark application would be failing too. I'd not be surprised to hear that it has nothing to do with a streaming query (Spark Structured Streaming) and would fail for any Spark application.

Spark job failed to write to Alluxio due to DeadlineExceededException

I am running a Spark job writing to an Alluxio cluster with 20 workers (Alluxio 1.6.1). Spark job failed to write its output due to alluxio.exception.status.DeadlineExceededException. The worker is still alive from Alluxio WebUI. How can I avoid this failure?
alluxio.exception.status.DeadlineExceededException: Timeout writing to WorkerNetAddress{host=spark-74-44.xxxx, rpcPort=51998, dataPort=51999, webPort=51997, domainSocketPath=} for request type: ALLUXIO_BLOCK
id: 3209355843338240
tier: 0
worker_group {
host: "spark6-64-156.xxxx"
rpc_port: 51998
data_port: 51999
web_port: 51997
socket_path: ""
}
This error indicates that your Spark job timed out while trying to write data to an Alluxio worker. The worker could be under high load, or have a slow connection to your UFS.
The default timeout is 30 seconds. To increase the timeout, configure alluxio.user.network.netty.timeout on the Spark side.
For example, to increase the timeout to 5 minutes, use the --conf option to spark-submit
$ spark-submit --conf 'spark.executor.extraJavaOptions=-Dalluxio.user.network.netty.timeout=5min' \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.network.netty.timeout=5min' \
...
You can also set these properties in your spark-defaults.conf file to have them automatically applied to all jobs.
Source: https://www.alluxio.org/docs/1.6/en/Configuration-Settings.html#spark-jobs

Spark executor failure on mesos agent while using mesos dispatcher in cluster mode

I launched dispatcher as follows and the launch was successful as seen from the logs
./sbin/start-mesos-dispatcher.sh --master mesos://10.0.0.6:5050
Rest server was activated on port 7078
I submitted the job to the dispatcher as follows
./bin/spark-submit \
--class com.ibm.cds.spark.samples.HelloSpark \
--master mesos://10.0.0.6:7078 \
--deploy-mode cluster \
--verbose \
https://github.com/../helloSpark.jar
On the spark slave, I get the following error in mesos agent sandbox - stderr.
17/11/22 09:22:06 INFO RestSubmissionClient: Submitting a request to launch an application in mesos://10.0.0.6:5050.
Exception in thread "main" org.apache.spark.deploy.rest.SubmitRestProtocolException: Malformed response received from server
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:268)
at org.apache.spark.deploy.rest.RestSubmissionClient.org$apache$spark$deploy$rest$RestSubmissionClient$$postJson(RestSubmissionClient.s
Question:
why is the executor submitting the launch of application to mesos-master? In spark-submit (above), I clearly give spark master address (at port 7078). why is this not taken?
How can I avoid this error?
using mesos version 1.4.1
removed all entries in spark-defaults.conf except the below.
spark.eventLog.enabled true
It works fine now, meaning, I dont get this error.
looks like having a spark.master called out in spark-defaults.conf was causing this issue.

java.net.ConnectException (on port 9000) while submitting a spark job

On running this command:
~/spark/bin/spark-submit --class [class-name] --master [spark-master-url]:7077 [jar-path]
I am getting
java.lang.RuntimeException: java.net.ConnectException: Call to ec2-[ip].compute-1.amazonaws.com/[internal-ip]:9000 failed on connection exception: java.net.ConnectException: Connection refused
Using spark version 1.3.0.
How do I resolve it?
When Spark is run in Cluster mode, all input files will be expected to be from HDFS (otherwise how will workers read from master's local files). But in this case, Hadoop wasn't running, so it was giving this exception.
Starting HDFS resolved this.

Spark 2 application on the same time

I am using spark streaming and saving the processed output in a data.csv file
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(1000))
JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
At the same time i would like to read output of NetworkWordCount data.csv along with another newfile and process it again simultaneously
My Question here is
Is it possible to run two spark applications at the same time?
Is it possible to submit a spark application through the code itself
I am using mac, Now i am submitting spark application from the spark folder with the following command
bin/spark-submit --class "com.abc.test.SparkStreamingTest" --master spark://xyz:7077 --executor-memory 20G --total-executor-cores 100 ../workspace/Test/target/Test-0.0.1-SNAPSHOT-jar-with-dependencies.jar 1000
or just without spark:ip:port and executor memory, total executor core
bin/spark-submit --class "com.abc.test.SparkStreamingTest" --master local[4] ../workspace/Test/target/Test-0.0.1-SNAPSHOT-jar-with-dependencies.jar
and the other application which read the textfile for batch processing like follows
bin/spark-submit --class "com.abc.test.BatchTest" --master local[4] ../workspace/Batch/target/BatchTesting-0.0.1-SNAPSHOT-jar-with-dependencies.jar
when i run the both the applictions SparkStreamingTest and BatchTest separately both works fine , but when i tried to run both simultaneously, i get the following error
Currently i am using spark stand alone mode
WARN AbstractLifeCycle: FAILED SelectChannelConnector#0.0.0.0:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
Any help is much appriciated.. i am totally out of my mind
From http://spark.apache.org/docs/1.1.0/monitoring.html
If multiple SparkContexts are running on the same host, they will bind to successive ports beginning with 4040 (4041, 4042, etc).
Your apps should be able to run. It's just a warning to tell you about port conflicts. It's because you run the two Spark apps in the same time. But don't worry about it. Spark will try 4041, 4042 until it finds an available port. So in your case, you will find two Web UIs: ip:4040, ip:4041 for these two apps.

Resources