Scala application using Spark Cassandra Connector hangs - apache-spark

I am developing a test application in IntelliJ using Scala and the Spark Cassandra Connector. Here is my build.sbt:
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
libraryDependencies += "org.apache.spark".%%("spark-sql") % "1.6.1"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0-M2"
I have created a 4-node Cassandra cluster using ccm. The keyspace I created has a replication factor of 3. Here is the code in my Scala app:
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("SparkCassandra")
  // set the Cassandra host address to your local address
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)
// cassandraTable comes from the com.datastax.spark.connector._ implicits
val rdd = sc.cassandraTable("excelsior", "emp")
val total = rdd.count()
println(total)
println("exiting now:")
sc.stop()
But the Spark job hangs at the following line:
CassandraConnector: Disconnected from Cassandra cluster: cluster4nodes
Only 3 tasks out of 4 are completed. Here is the full log:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/08 14:23:26 INFO SparkContext: Running Spark version 1.6.0
16/07/08 14:23:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/08 14:23:27 WARN Utils: Your hostname, renuka-Inspiron-3542 resolves to a loopback address: 127.0.1.1; using 192.168.1.189 instead (on interface wlan0)
16/07/08 14:23:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/07/08 14:23:27 INFO SecurityManager: Changing view acls to: renuka
16/07/08 14:23:27 INFO SecurityManager: Changing modify acls to: renuka
16/07/08 14:23:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(renuka); users with modify permissions: Set(renuka)
16/07/08 14:23:28 INFO Utils: Successfully started service 'sparkDriver' on port 41027.
16/07/08 14:23:28 INFO Slf4jLogger: Slf4jLogger started
16/07/08 14:23:28 INFO Remoting: Starting remoting
16/07/08 14:23:28 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.189:46329]
16/07/08 14:23:28 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 46329.
16/07/08 14:23:29 INFO SparkEnv: Registering MapOutputTracker
16/07/08 14:23:29 INFO SparkEnv: Registering BlockManagerMaster
16/07/08 14:23:29 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-d1db2ea9-3fba-4b64-ad2b-80deddd3f05a
16/07/08 14:23:29 INFO MemoryStore: MemoryStore started with capacity 1091.3 MB
16/07/08 14:23:29 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/08 14:23:29 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/08 14:23:29 INFO SparkUI: Started SparkUI at http://192.168.1.189:4040
16/07/08 14:23:30 INFO Executor: Starting executor ID driver on host localhost
16/07/08 14:23:30 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40956.
16/07/08 14:23:30 INFO NettyBlockTransferService: Server created on 40956
16/07/08 14:23:30 INFO BlockManagerMaster: Trying to register BlockManager
16/07/08 14:23:30 INFO BlockManagerMasterEndpoint: Registering block manager localhost:40956 with 1091.3 MB RAM, BlockManagerId(driver, localhost, 40956)
16/07/08 14:23:30 INFO BlockManagerMaster: Registered BlockManager
16/07/08 14:23:30 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.1:9042 added
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.2:9042 added
16/07/08 14:23:31 INFO LocalNodeFirstLoadBalancingPolicy: Added host 127.0.0.2 (datacenter1)
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.3:9042 added
16/07/08 14:23:31 INFO LocalNodeFirstLoadBalancingPolicy: Added host 127.0.0.3 (datacenter1)
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.4:9042 added
16/07/08 14:23:31 INFO LocalNodeFirstLoadBalancingPolicy: Added host 127.0.0.4 (datacenter1)
16/07/08 14:23:31 INFO CassandraConnector: Connected to Cassandra cluster: cluster4nodes
16/07/08 14:23:31 INFO SparkContext: Starting job: count at hello.scala:36
16/07/08 14:23:32 INFO DAGScheduler: Got job 0 (count at hello.scala:36) with 4 output partitions
16/07/08 14:23:32 INFO DAGScheduler: Final stage: ResultStage 0 (count at hello.scala:36)
16/07/08 14:23:32 INFO DAGScheduler: Parents of final stage: List()
16/07/08 14:23:32 INFO DAGScheduler: Missing parents: List()
16/07/08 14:23:32 INFO DAGScheduler: Submitting ResultStage 0 (CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18), which has no missing parents
16/07/08 14:23:32 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 7.2 KB, free 7.2 KB)
16/07/08 14:23:32 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.7 KB, free 10.9 KB)
16/07/08 14:23:32 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40956 (size: 3.7 KB, free: 1091.2 MB)
16/07/08 14:23:32 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/07/08 14:23:32 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18)
16/07/08 14:23:32 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
16/07/08 14:23:32 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,NODE_LOCAL, 3530 bytes)
16/07/08 14:23:32 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,NODE_LOCAL, 3530 bytes)
16/07/08 14:23:32 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,NODE_LOCAL, 3530 bytes)
16/07/08 14:23:32 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
16/07/08 14:23:32 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/07/08 14:23:32 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/07/08 14:23:34 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2082 bytes result sent to driver
16/07/08 14:23:34 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2082 bytes result sent to driver
16/07/08 14:23:34 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 2082 bytes result sent to driver
16/07/08 14:23:34 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 2089 ms on localhost (1/4)
16/07/08 14:23:34 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2179 ms on localhost (2/4)
16/07/08 14:23:34 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2117 ms on localhost (3/4)
16/07/08 14:23:41 INFO CassandraConnector: Disconnected from Cassandra cluster: cluster4nodes
If I create a keyspace with replication factor 4 on the 4-node cluster, the app works fine and never hangs. Am I missing anything in the configuration? Thanks in advance.
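One commonly suggested workaround, sketched here in Scala with the caveat that it is an assumption about the cause rather than a confirmed diagnosis: with replication factor 3 on a 4-node cluster, some token ranges have no replica on 127.0.0.1, and the three tasks that do start in the log above are all NODE_LOCAL. A task whose preferred replicas do not include the local node can end up waiting for a locality-matched slot. Lowering spark.locality.wait lets Spark schedule such tasks at a weaker locality level:

import org.apache.spark.SparkConf

// Sketch only: spark.locality.wait is how long Spark waits for a
// locality-preferred slot before falling back to a less-local one;
// "0" disables the wait (assumption: the hang is locality-related).
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("SparkCassandra")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.locality.wait", "0")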

Related

Permission denied error when setting up local Spark instance and running pyspark

I am setting up a local Spark instance on Windows to use with PySpark as described in this guide (but with spark-3.0.0 / hadoop 2.7 instead): https://phoenixnap.com/kb/install-spark-on-windows-10.
I can start up Spark with:
C:\Spark\spark-3.0.0-bin-hadoop2.7\bin>spark-shell.cmd
and connect to it with http://localhost:4040/ in my browser (I see the Spark GUI).
But when I am running the PySpark example with
C:\Spark\spark-3.0.0-bin-hadoop2.7\examples>run-example SparkPi
it throws a Permission Denied error, as in this trace:
21/03/08 10:51:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/08 10:51:04 INFO SparkContext: Running Spark version 3.0.0
21/03/08 10:51:04 INFO ResourceUtils: ==============================================================
21/03/08 10:51:04 INFO ResourceUtils: Resources for spark.driver:
21/03/08 10:51:04 INFO ResourceUtils: ==============================================================
21/03/08 10:51:04 INFO SparkContext: Submitted application: Spark Pi
21/03/08 10:51:04 INFO SecurityManager: Changing view acls to: #####
21/03/08 10:51:04 INFO SecurityManager: Changing modify acls to: #####
21/03/08 10:51:04 INFO SecurityManager: Changing view acls groups to:
21/03/08 10:51:04 INFO SecurityManager: Changing modify acls groups to:
21/03/08 10:51:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(#####); groups with view permissions: Set(); users with modify permissions: Set(#####); groups with modify permissions: Set()
21/03/08 10:51:05 INFO Utils: Successfully started service 'sparkDriver' on port 63213.
21/03/08 10:51:05 INFO SparkEnv: Registering MapOutputTracker
21/03/08 10:51:05 INFO SparkEnv: Registering BlockManagerMaster
21/03/08 10:51:05 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/03/08 10:51:05 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/03/08 10:51:05 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/03/08 10:51:05 INFO DiskBlockManager: Created local directory at C:\Users\#####\AppData\Local\Temp\blockmgr-dce03954-27a7-484d-8e54-f552b21433f7
21/03/08 10:51:05 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
21/03/08 10:51:05 INFO SparkEnv: Registering OutputCommitCoordinator
21/03/08 10:51:05 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/03/08 10:51:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://WORKSTATION.DOMAIN.EXT:4040
21/03/08 10:51:05 INFO SparkContext: Added JAR file:///C:/Spark/spark-3.0.0-bin-hadoop2.7/examples/jars/scopt_2.12-3.7.1.jar at spark://WORKSTATION.DOMAIN.EXT:63213/jars/scopt_2.12-3.7.1.jar with timestamp 1615197065578
21/03/08 10:51:05 INFO SparkContext: Added JAR file:///C:/Spark/spark-3.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.12-3.0.0.jar at spark://WORKSTATION.DOMAIN.EXT:63213/jars/spark-examples_2.12-3.0.0.jar with timestamp 1615197065579
21/03/08 10:51:05 INFO Executor: Starting executor ID driver on host WORKSTATION.DOMAIN.EXT
21/03/08 10:51:05 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 63260.
21/03/08 10:51:05 INFO NettyBlockTransferService: Server created on WORKSTATION.DOMAIN.EXT:63260
21/03/08 10:51:05 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/03/08 10:51:05 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, NLLR4000250910.solon.prd, 63260, None)
21/03/08 10:51:05 INFO BlockManagerMasterEndpoint: Registering block manager NLLR4000250910.solon.prd:63260 with 366.3 MiB RAM, BlockManagerId(driver, WORKSTATION.DOMAIN.EXT, 63260, None)
21/03/08 10:51:05 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, NLLR4000250910.solon.prd, 63260, None)
21/03/08 10:51:05 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, NLLR4000250910.solon.prd, 63260, None)
21/03/08 10:51:06 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
21/03/08 10:51:06 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
21/03/08 10:51:06 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
21/03/08 10:51:06 INFO DAGScheduler: Parents of final stage: List()
21/03/08 10:51:06 INFO DAGScheduler: Missing parents: List()
21/03/08 10:51:06 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
21/03/08 10:51:06 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.1 KiB, free 366.3 MiB)
21/03/08 10:51:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1816.0 B, free 366.3 MiB)
21/03/08 10:51:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on WORKSTATION.DOMAIN.EXT:63260 (size: 1816.0 B, free: 366.3 MiB)
21/03/08 10:51:06 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1200
21/03/08 10:51:06 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
21/03/08 10:51:06 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
21/03/08 10:51:06 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, WORKSTATION.DOMAIN.EXT, executor driver, partition 0, PROCESS_LOCAL, 7393 bytes)
21/03/08 10:51:06 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, WORKSTATION.DOMAIN.EXT, executor driver, partition 1, PROCESS_LOCAL, 7393 bytes)
21/03/08 10:51:06 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
21/03/08 10:51:06 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/03/08 10:51:06 INFO Executor: Fetching spark://WORKSTATION.DOMAIN.EXT:63213/jars/spark-examples_2.12-3.0.0.jar with timestamp 1615197065579
21/03/08 10:51:06 ERROR Utils: Aborting task
java.io.IOException: Failed to connect to WORKSTATION.DOMAIN.EXT/192.168.#.#:63213
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:392)
at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:360)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:359)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:719)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:535)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:869)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:860)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:860)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:404)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: Permission denied: no further information: WORKSTATION.DOMAIN.EXT/192.168.#.#:63213
Caused by: java.net.SocketException: Permission denied: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Unknown Source)
[snip]
When running it on a different machine with seemingly the same config, it works fine; there, this is the trace at the point where the exception is thrown in the other trace:
[snip]
21/03/08 08:00:22 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/03/08 08:00:22 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
21/03/08 08:00:22 INFO Executor: Fetching spark://WORKSTATION.DOMAIN.EXT:63646/jars/spark-examples_2.12-3.0.0.jar with timestamp 1615186820489
21/03/08 08:00:22 INFO TransportClientFactory: Successfully created connection to WORKSTATION.DOMAIN.EXT/10.121.#.#:63646 after 86 ms (0 ms spent in bootstraps)
21/03/08 08:00:22 INFO Utils: Fetching spark://WORKSTATION.DOMAIN.EXT:63646/jars/spark-examples_2.12-3.0.0.jar to C:\Users\#####\AppData\Local\Temp\spark-54a13d9f-9064-4f34-ba81-af49b18d9a0c\userFiles-24c3eabc-02a4-4aca-8abb-424431c6442f\fetchFileTemp5258763437798623210.tmp
21/03/08 08:00:24 INFO Executor: Adding file:/C:/Users/#####/AppData/Local/Temp/spark-54a13d9f-9064-4f34-ba81-af49b18d9a0c/userFiles-24c3eabc-02a4-4aca-8abb-424431c6442f/spark-examples_2.12-3.0.0.jar to class loader
[snip]
At first it seemed to me like a firewall issue, but adding the executing java.exe as an exception to the firewall didn't solve it.
Does anyone know what I should try next to get this issue resolved?
Finally I was able to solve it by setting SPARK_LOCAL_IP to localhost in my environment variables: go to your Windows environment variables and set SPARK_LOCAL_IP=localhost.
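For reference, a hedged in-code sketch of the same idea (shown in Scala, and an assumption rather than part of the original answer): spark.driver.bindAddress, which overrides SPARK_LOCAL_IP, controls the address the driver's listening sockets bind to.

import org.apache.spark.SparkConf

// Sketch: bind the driver's listening sockets to loopback, mirroring the
// effect of SPARK_LOCAL_IP=localhost (assumption: the failure was the driver
// advertising an address the executor could not connect back to).
val conf = new SparkConf()
  .setAppName("SparkPi")
  .set("spark.driver.bindAddress", "127.0.0.1")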

FileNotFoundException on submitting Spark Jobs to remote

I've created an environment with 3 Docker containers: one for Airflow, using the puckel/docker-airflow image with Spark and Hadoop additionally installed. The other two containers act as the Spark master and worker (created from the gettyimages/spark Docker image). All three containers are connected via a bridge network, so they can all communicate with each other.
What I'm trying to do next is to submit a Spark job from the Airflow container to the Spark cluster (master).
As an initial example, I'm using the wordcount sample script. I created a sample.txt file in the Airflow container at the path /usr/local/airflow/sample.txt. I've shelled into the Airflow container and am using the command below to run wordcount.py on the Spark master, located at the IP I found after inspecting the bridge network.
spark-submit --master spark://ipaddress:7077 --files usr/local/airflow/sample.txt /opt/spark-2.4.1/examples/src/main/python/wordcount.py sample.txt
After submitting the script, I can see from the logs that a connection has been established with the master (from the Airflow container), and that the file specified by --files was copied to the master and worker, but then it just errors out saying:
java.io.FileNotFoundException: File file:/usr/local/airflow/sample.txt does not exist
As per my understanding (which could be wrong), when we specify files to copy using --files, you can access them directly via the file name (sample.txt in my case). So what I'm trying to figure out is: if the job has been submitted and the file has been copied, why is it searching in the location file:/usr/local/airflow/sample.txt? How do I make it refer to the correct path?
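For background, files shipped with --files are staged into each executor's work directory and are meant to be resolved at runtime through SparkFiles (this helper exists in both the Scala API and PySpark, as pyspark.SparkFiles). A minimal sketch in Scala, with the caveat that the returned path is local to whichever JVM calls it, which is why a driver-side path like file:/usr/local/airflow/sample.txt need not exist on the workers:

import org.apache.spark.SparkFiles

// Sketch: resolve the node-local staged copy of a --files artifact by name.
// The absolute path returned is specific to the node where this line runs.
val localPath = SparkFiles.get("sample.txt")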
I apologize, as this question has been asked a couple of times, but I've read all the related questions on Stack Overflow and I'm still unable to resolve this. I'd really appreciate y'all's help on this.
Thanks.
The full log is below:
user#machine:/usr/local/airflow# spark-submit --master spark://172.22.0.2:7077 --files sample.txt /opt/spark-2.4.1/examples/src/main/python/wordcount.py ./sample.txt
20/07/25 03:23:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/07/25 03:23:35 INFO SparkContext: Running Spark version 2.4.1
20/07/25 03:23:35 INFO SparkContext: Submitted application: PythonWordCount
20/07/25 03:23:35 INFO SecurityManager: Changing view acls to: root
20/07/25 03:23:35 INFO SecurityManager: Changing modify acls to: root
20/07/25 03:23:35 INFO SecurityManager: Changing view acls groups to:
20/07/25 03:23:35 INFO SecurityManager: Changing modify acls groups to:
20/07/25 03:23:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/07/25 03:23:35 INFO Utils: Successfully started service 'sparkDriver' on port 33457.
20/07/25 03:23:35 INFO SparkEnv: Registering MapOutputTracker
20/07/25 03:23:36 INFO SparkEnv: Registering BlockManagerMaster
20/07/25 03:23:36 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/07/25 03:23:36 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/07/25 03:23:36 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-dd1957de-6907-484d-a3d8-2b3b88e0c7ca
20/07/25 03:23:36 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/07/25 03:23:36 INFO SparkEnv: Registering OutputCommitCoordinator
20/07/25 03:23:36 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/07/25 03:23:36 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://0508a77fcaad:4040
20/07/25 03:23:37 INFO SparkContext: Added file file:///usr/local/airflow/sample.txt at spark://0508a77fcaad:33457/files/sample.txt with timestamp 1595647417081
20/07/25 03:23:37 INFO Utils: Copying /usr/local/airflow/sample.txt to /tmp/spark-f9dfe6ee-22d7-4747-beab-9450fc1afce0/userFiles-74f8cfe4-8a19-4d2e-8fa1-1f0bd1f0ef12/sample.txt
20/07/25 03:23:37 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://172.22.0.2:7077...
20/07/25 03:23:37 INFO TransportClientFactory: Successfully created connection to /172.22.0.2:7077 after 32 ms (0 ms spent in bootstraps)
20/07/25 03:23:38 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20200725032338-0003
20/07/25 03:23:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45057.
20/07/25 03:23:38 INFO NettyBlockTransferService: Server created on 0508a77fcaad:45057
20/07/25 03:23:38 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/07/25 03:23:38 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200725032338-0003/0 on worker-20200725025003-172.22.0.4-8881 (172.22.0.4:8881) with 2 core(s)
20/07/25 03:23:38 INFO StandaloneSchedulerBackend: Granted executor ID app-20200725032338-0003/0 on hostPort 172.22.0.4:8881 with 2 core(s), 1024.0 MB RAM
20/07/25 03:23:38 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 0508a77fcaad, 45057, None)
20/07/25 03:23:38 INFO BlockManagerMasterEndpoint: Registering block manager 0508a77fcaad:45057 with 366.3 MB RAM, BlockManagerId(driver, 0508a77fcaad, 45057, None)
20/07/25 03:23:38 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 0508a77fcaad, 45057, None)
20/07/25 03:23:38 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 0508a77fcaad, 45057, None)
20/07/25 03:23:38 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200725032338-0003/0 is now RUNNING
20/07/25 03:23:38 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
20/07/25 03:23:38 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/usr/local/airflow/spark-warehouse').
20/07/25 03:23:38 INFO SharedState: Warehouse path is 'file:/usr/local/airflow/spark-warehouse'.
20/07/25 03:23:40 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/07/25 03:23:47 INFO FileSourceStrategy: Pruning directories with:
20/07/25 03:23:47 INFO FileSourceStrategy: Post-Scan Filters:
20/07/25 03:23:47 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
20/07/25 03:23:47 INFO FileSourceScanExec: Pushed Filters:
20/07/25 03:23:51 INFO CodeGenerator: Code generated in 2187.926234 ms
20/07/25 03:23:53 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 220.9 KB, free 366.1 MB)
20/07/25 03:23:55 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.8 KB, free 366.1 MB)
20/07/25 03:23:55 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 0508a77fcaad:45057 (size: 20.8 KB, free: 366.3 MB)
20/07/25 03:23:55 INFO SparkContext: Created broadcast 0 from javaToPython at NativeMethodAccessorImpl.java:0
20/07/25 03:23:55 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
20/07/25 03:23:57 INFO SparkContext: Starting job: collect at /opt/spark-2.4.1/examples/src/main/python/wordcount.py:40
20/07/25 03:23:58 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.22.0.4:59324) with ID 0
20/07/25 03:23:58 INFO DAGScheduler: Registering RDD 5 (reduceByKey at /opt/spark-2.4.1/examples/src/main/python/wordcount.py:39)
20/07/25 03:23:58 INFO DAGScheduler: Got job 0 (collect at /opt/spark-2.4.1/examples/src/main/python/wordcount.py:40) with 1 output partitions
20/07/25 03:23:58 INFO DAGScheduler: Final stage: ResultStage 1 (collect at /opt/spark-2.4.1/examples/src/main/python/wordcount.py:40)
20/07/25 03:23:58 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
20/07/25 03:23:58 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
20/07/25 03:23:58 INFO DAGScheduler: Submitting ShuffleMapStage 0 (PairwiseRDD[5] at reduceByKey at /opt/spark-2.4.1/examples/src/main/python/wordcount.py:39), which has no missing parents
20/07/25 03:23:58 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 15.2 KB, free 366.0 MB)
20/07/25 03:23:58 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 9.1 KB, free 366.0 MB)
20/07/25 03:23:58 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 0508a77fcaad:45057 (size: 9.1 KB, free: 366.3 MB)
20/07/25 03:23:58 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
20/07/25 03:23:58 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (PairwiseRDD[5] at reduceByKey at /opt/spark-2.4.1/examples/src/main/python/wordcount.py:39) (first 15 tasks are for partitions Vector(0))
20/07/25 03:23:58 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/07/25 03:23:58 INFO BlockManagerMasterEndpoint: Registering block manager 172.22.0.4:45435 with 366.3 MB RAM, BlockManagerId(0, 172.22.0.4, 45435, None)
20/07/25 03:23:58 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.22.0.4, executor 0, partition 0, PROCESS_LOCAL, 8307 bytes)
20/07/25 03:24:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.22.0.4:45435 (size: 9.1 KB, free: 366.3 MB)
20/07/25 03:24:09 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.22.0.4:45435 (size: 20.8 KB, free: 366.3 MB)
20/07/25 03:24:11 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.22.0.4, executor 0): java.io.FileNotFoundException: File file:/usr/local/airflow/sample.txt does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
20/07/25 03:24:11 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1, 172.22.0.4, executor 0, partition 0, PROCESS_LOCAL, 8307 bytes)
20/07/25 03:24:11 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) on 172.22.0.4, executor 0: java.io.FileNotFoundException (File file:/usr/local/airflow/sample.txt does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.) [duplicate 1]
20/07/25 03:24:11 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 2, 172.22.0.4, executor 0, partition 0, PROCESS_LOCAL, 8307 bytes)
20/07/25 03:24:12 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2) on 172.22.0.4, executor 0: java.io.FileNotFoundException (File file:/usr/local/airflow/sample.txt does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.) [duplicate 2]
20/07/25 03:24:12 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 3, 172.22.0.4, executor 0, partition 0, PROCESS_LOCAL, 8307 bytes)
20/07/25 03:24:12 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3) on 172.22.0.4, executor 0: java.io.FileNotFoundException (File file:/usr/local/airflow/sample.txt does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.) [duplicate 3]
20/07/25 03:24:12 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
20/07/25 03:24:12 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/07/25 03:24:12 INFO TaskSchedulerImpl: Cancelling stage 0
20/07/25 03:24:12 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
20/07/25 03:24:12 INFO DAGScheduler: ShuffleMapStage 0 (reduceByKey at /opt/spark-2.4.1/examples/src/main/python/wordcount.py:39) failed in 13.690 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 172.22.0.4, executor 0): java.io.FileNotFoundException: File file:/usr/local/airflow/sample.txt does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
Driver stacktrace:
20/07/25 03:24:12 INFO DAGScheduler: Job 0 failed: collect at /opt/spark-2.4.1/examples/src/main/python/wordcount.py:40, took 14.579961 s
Traceback (most recent call last):
File "/opt/spark-2.4.1/examples/src/main/python/wordcount.py", line 40, in <module>
output = counts.collect()
File "/opt/spark-2.4.1/python/lib/pyspark.zip/pyspark/rdd.py", line 816, in collect
File "/opt/spark-2.4.1/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark-2.4.1/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/spark-2.4.1/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 172.22.0.4, executor 0): java.io.FileNotFoundException: File file:/usr/local/airflow/sample.txt does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File file:/usr/local/airflow/sample.txt does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:557)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
20/07/25 03:24:13 INFO SparkContext: Invoking stop() from shutdown hook
20/07/25 03:24:13 INFO SparkUI: Stopped Spark web UI at http://0508a77fcaad:4040
20/07/25 03:24:13 INFO StandaloneSchedulerBackend: Shutting down all executors
20/07/25 03:24:13 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
20/07/25 03:24:16 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/07/25 03:24:16 INFO MemoryStore: MemoryStore cleared
20/07/25 03:24:16 INFO BlockManager: BlockManager stopped
20/07/25 03:24:16 INFO BlockManagerMaster: BlockManagerMaster stopped
20/07/25 03:24:16 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/07/25 03:24:16 INFO SparkContext: Successfully stopped SparkContext
20/07/25 03:24:16 INFO ShutdownHookManager: Shutdown hook called
20/07/25 03:24:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-2dfb2222-d56c-4ee1-ab62-86e71e5e751b
20/07/25 03:24:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-f9dfe6ee-22d7-4747-beab-9450fc1afce0
20/07/25 03:24:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-f9dfe6ee-22d7-4747-beab-9450fc1afce0/pyspark-2ee74d07-6606-4edc-8420-fe46212c50e5
Change your spark-submit command as shown below when submitting your Spark job.
# --deploy-mode cluster: add this if you want to pass the file name to wordcount.py
spark-submit \
--master spark://ipaddress:7077 \
--deploy-mode cluster \
--files /usr/local/airflow/sample.txt \
/opt/spark-2.4.1/examples/src/main/python/wordcount.py sample.txt
OR
spark-submit \
--master spark://ipaddress:7077 \
/opt/spark-2.4.1/examples/src/main/python/wordcount.py /usr/local/airflow/sample.txt

Spark/Python - Slave in Cluster is not used

I'm new to Spark. I have a master (192.168.33.10) and slave (192.168.33.12) cluster set up locally, and I wrote the following script to demo that both the master and the slave run get_ip_wrap() on their own machines.
However, when I run the command ./bin/spark-submit ip.py, I only see 192.168.33.10 in the output; I was expecting 192.168.33.12 as well.
I have also included the traces from my master and worker output files.
import socket
import fcntl
import struct

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

def get_ip_address(ifname):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    return socket.inet_ntoa(fcntl.ioctl(
        s.fileno(),
        0x8915,  # SIOCGIFADDR
        struct.pack('256s', ifname[:15])
    )[20:24])

def get_ip_wrap(num):
    return get_ip_address('eth1')

#spark = SparkSession\
#    .builder\
#    .appName("PythonALS")\
#    .getOrCreate()
#sc = spark.sparkContext

conf = SparkConf().setAppName('appName').setMaster('spark://vagrant-ubuntu-trusty-64:7077')
sc = SparkContext(conf=conf)

data = [x for x in range(0, 50)]
distData = sc.parallelize(data)

result = distData.map(get_ip_wrap)
print(result.collect())
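One portability note on the script above: get_ip_address hardcodes the eth1 interface name, which may not exist on every node. An interface-independent variant of the same probe, sketched in Scala (assuming a running SparkContext named sc):

import java.net.InetAddress

// Sketch: emit one address per partition so collect() shows which hosts
// actually ran work, without depending on a specific interface name.
val hosts = sc.parallelize(0 until 50, 8)
  .mapPartitions(_ => Iterator(InetAddress.getLocalHost.getHostAddress))
  .collect()
  .distinct
println(hosts.mkString(", "))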
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ ./sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /home/vagrant/spark-2.1.1-bin-hadoop2.7/logs/spark-vagrant-org.apache.spark.deploy.master.Master-1-vagrant-ubuntu-trusty-64.out
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ ./sbin/start-slave.sh spark://vagrant-ubuntu-trusty-64:7077
starting org.apache.spark.deploy.worker.Worker, logging to /home/vagrant/spark-2.1.1-bin-hadoop2.7/logs/spark-vagrant-org.apache.spark.deploy.worker.Worker-1-vagrant-ubuntu-trusty-64.out
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ ./bin/spark-submit ip.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/05/27 17:08:09 INFO SparkContext: Running Spark version 2.1.1
17/05/27 17:08:09 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
17/05/27 17:08:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/27 17:08:10 INFO SecurityManager: Changing view acls to: vagrant
17/05/27 17:08:10 INFO SecurityManager: Changing modify acls to: vagrant
17/05/27 17:08:10 INFO SecurityManager: Changing view acls groups to:
17/05/27 17:08:10 INFO SecurityManager: Changing modify acls groups to:
17/05/27 17:08:10 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vagrant); groups with view permissions: Set(); users with modify permissions: Set(vagrant); groups with modify permissions: Set()
17/05/27 17:08:10 INFO Utils: Successfully started service 'sparkDriver' on port 59290.
17/05/27 17:08:10 INFO SparkEnv: Registering MapOutputTracker
17/05/27 17:08:10 INFO SparkEnv: Registering BlockManagerMaster
17/05/27 17:08:10 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/05/27 17:08:10 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/05/27 17:08:10 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ad008702-6e92-4e60-ab27-a582b1ba9fb9
17/05/27 17:08:10 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
17/05/27 17:08:11 INFO SparkEnv: Registering OutputCommitCoordinator
17/05/27 17:08:11 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/05/27 17:08:11 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
17/05/27 17:08:11 INFO Utils: Successfully started service 'SparkUI' on port 4042.
17/05/27 17:08:11 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4042
17/05/27 17:08:11 INFO SparkContext: Added file file:/home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py at spark://10.0.2.15:59290/files/ip.py with timestamp 1495904891756
17/05/27 17:08:11 INFO Utils: Copying /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py to /tmp/spark-5400808c-1304-404d-ae53-dc6cdb14694f/userFiles-dc94d72e-15d3-4d84-87b9-27e87dcb0f6a/ip.py
17/05/27 17:08:11 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://vagrant-ubuntu-trusty-64:7077...
17/05/27 17:08:11 INFO TransportClientFactory: Successfully created connection to vagrant-ubuntu-trusty-64/10.0.2.15:7077 after 20 ms (0 ms spent in bootstraps)
17/05/27 17:08:12 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20170527170812-0000
17/05/27 17:08:12 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53124.
17/05/27 17:08:12 INFO NettyBlockTransferService: Server created on 10.0.2.15:53124
17/05/27 17:08:12 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/05/27 17:08:12 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 53124, None)
17/05/27 17:08:12 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20170527170812-0000/0 on worker-20170527170800-10.0.2.15-54829 (10.0.2.15:54829) with 1 cores
17/05/27 17:08:12 INFO StandaloneSchedulerBackend: Granted executor ID app-20170527170812-0000/0 on hostPort 10.0.2.15:54829 with 1 cores, 1024.0 MB RAM
17/05/27 17:08:12 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:53124 with 413.9 MB RAM, BlockManagerId(driver, 10.0.2.15, 53124, None)
17/05/27 17:08:12 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 53124, None)
17/05/27 17:08:12 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 53124, None)
17/05/27 17:08:12 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20170527170812-0000/0 is now RUNNING
17/05/27 17:08:12 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/05/27 17:08:13 INFO SparkContext: Starting job: collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31
17/05/27 17:08:13 INFO DAGScheduler: Got job 0 (collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31) with 2 output partitions
17/05/27 17:08:13 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31)
17/05/27 17:08:13 INFO DAGScheduler: Parents of final stage: List()
17/05/27 17:08:13 INFO DAGScheduler: Missing parents: List()
17/05/27 17:08:13 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31), which has no missing parents
17/05/27 17:08:13 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.1 KB, free 413.9 MB)
17/05/27 17:08:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.8 KB, free 413.9 MB)
17/05/27 17:08:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:53124 (size: 2.8 KB, free: 413.9 MB)
17/05/27 17:08:13 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996
17/05/27 17:08:13 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31)
17/05/27 17:08:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
17/05/27 17:08:15 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.2.15:40762) with ID 0
17/05/27 17:08:15 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.0.2.15, executor 0, partition 0, PROCESS_LOCAL, 6136 bytes)
17/05/27 17:08:15 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:33949 with 413.9 MB RAM, BlockManagerId(0, 10.0.2.15, 33949, None)
17/05/27 17:08:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:33949 (size: 2.8 KB, free: 413.9 MB)
17/05/27 17:08:16 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.0.2.15, executor 0, partition 1, PROCESS_LOCAL, 6136 bytes)
17/05/27 17:08:16 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1050 ms on 10.0.2.15 (executor 0) (1/2)
17/05/27 17:08:16 INFO DAGScheduler: ResultStage 0 (collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31) finished in 2.504 s
17/05/27 17:08:16 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 119 ms on 10.0.2.15 (executor 0) (2/2)
17/05/27 17:08:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/05/27 17:08:16 INFO DAGScheduler: Job 0 finished: collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31, took 2.981746 s
['192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10']
17/05/27 17:08:16 INFO SparkContext: Invoking stop() from shutdown hook
17/05/27 17:08:16 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4042
17/05/27 17:08:16 INFO StandaloneSchedulerBackend: Shutting down all executors
17/05/27 17:08:16 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
17/05/27 17:08:16 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/05/27 17:08:16 INFO MemoryStore: MemoryStore cleared
17/05/27 17:08:16 INFO BlockManager: BlockManager stopped
17/05/27 17:08:16 INFO BlockManagerMaster: BlockManagerMaster stopped
17/05/27 17:08:16 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/05/27 17:08:16 INFO SparkContext: Successfully stopped SparkContext
17/05/27 17:08:16 INFO ShutdownHookManager: Shutdown hook called
17/05/27 17:08:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-5400808c-1304-404d-ae53-dc6cdb14694f/pyspark-021d6ed2-91d0-481b-b528-108581abe66c
17/05/27 17:08:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-5400808c-1304-404d-ae53-dc6cdb14694f
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ cat /home/vagrant/spark-2.1.1-bin-hadoop2.7/logs/spark-vagrant-org.apache.spark.deploy.master.Master-1-vagrant-ubuntu-trusty-64.out
Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java -cp /home/vagrant/spark-2.1.1-bin-hadoop2.7/conf/:/home/vagrant/spark-2.1.1-bin-hadoop2.7/jars/* -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --host vagrant-ubuntu-trusty-64 --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/05/27 17:07:44 INFO Master: Started daemon with process name: 9384@vagrant-ubuntu-trusty-64
17/05/27 17:07:44 INFO SignalUtils: Registered signal handler for TERM
17/05/27 17:07:44 INFO SignalUtils: Registered signal handler for HUP
17/05/27 17:07:44 INFO SignalUtils: Registered signal handler for INT
17/05/27 17:07:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/27 17:07:45 INFO SecurityManager: Changing view acls to: vagrant
17/05/27 17:07:45 INFO SecurityManager: Changing modify acls to: vagrant
17/05/27 17:07:45 INFO SecurityManager: Changing view acls groups to:
17/05/27 17:07:45 INFO SecurityManager: Changing modify acls groups to:
17/05/27 17:07:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vagrant); groups with view permissions: Set(); users with modify permissions: Set(vagrant); groups with modify permissions: Set()
17/05/27 17:07:45 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
17/05/27 17:07:45 INFO Master: Starting Spark master at spark://vagrant-ubuntu-trusty-64:7077
17/05/27 17:07:45 INFO Master: Running Spark version 2.1.1
17/05/27 17:07:45 INFO Utils: Successfully started service 'MasterUI' on port 8080.
17/05/27 17:07:45 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://10.0.2.15:8080
17/05/27 17:07:45 INFO Utils: Successfully started service on port 6066.
17/05/27 17:07:45 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
17/05/27 17:07:46 INFO Master: I have been elected leader! New state: ALIVE
17/05/27 17:08:00 INFO Master: Registering worker 10.0.2.15:54829 with 1 cores, 2.8 GB RAM
17/05/27 17:08:12 INFO Master: Registering app appName
17/05/27 17:08:12 INFO Master: Registered app appName with ID app-20170527170812-0000
17/05/27 17:08:12 INFO Master: Launching executor app-20170527170812-0000/0 on worker worker-20170527170800-10.0.2.15-54829
17/05/27 17:08:16 INFO Master: Received unregister request from application app-20170527170812-0000
17/05/27 17:08:16 INFO Master: Removing app app-20170527170812-0000
17/05/27 17:08:16 INFO Master: 10.0.2.15:51703 got disassociated, removing it.
17/05/27 17:08:16 INFO Master: 10.0.2.15:59290 got disassociated, removing it.
17/05/27 17:08:16 WARN Master: Got status update for unknown executor app-20170527170812-0000/0
vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$

Spark: why are tasks assigned only to one worker?

I'm new to Apache Spark and trying to run a simple program on my cluster. The problem is that the driver allocates all tasks to one worker.
I am running a Spark standalone cluster on 2 computers:
1 - runs the master and a worker with 4 cores: 1 used for the master, 3 for the worker. IP: 192.168.1.101
2 - runs only a worker with 4 cores: all for the worker. IP: 192.168.1.104
This is the code:
package spark; // matches --class spark.Main in the submit command below

import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class Main {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark-project");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Give the executors a few seconds to register (see SPARK-3100 below).
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // Read the file into 7 partitions and count its lines via an accumulator.
        JavaRDD<String> lines = sc.textFile("/Datasets/somefile.txt", 7);
        System.out.println(lines.partitions().size());
        Accumulator<Integer> sum = sc.accumulator(0);
        JavaRDD<Integer> numbers = lines.map(line -> 1);
        System.out.println(numbers.partitions().size());
        numbers.foreach(num -> System.out.println(num));
        numbers.foreach(num -> sum.add(num));
        System.out.println(sum.value());
        sc.close();
    }
}
Note: I added the Thread.sleep() call because I tried the workaround from https://issues.apache.org/jira/browse/SPARK-3100
I used the submit script:
bin/spark-submit --class spark.Main --master spark://192.168.1.101:7077 --deploy-mode cluster /home/sparkUser/JarsOfSpark/JarForSpark.jar
This is the result I got from the driver stdout:
7
7
50144
logs from the master:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/15 19:22:14 INFO SecurityManager: Changing view acls to: sparkUser
16/01/15 19:22:14 INFO SecurityManager: Changing modify acls to: sparkUser
16/01/15 19:22:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sparkUser); users with modify permissions: Set(sparkUser)
16/01/15 19:22:24 INFO Slf4jLogger: Slf4jLogger started
16/01/15 19:22:24 INFO Utils: Successfully started service 'Driver' on port 46546.
16/01/15 19:22:24 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@192.168.1.101:43150/user/Worker
16/01/15 19:22:24 INFO SparkContext: Running Spark version 1.4.1
16/01/15 19:22:24 INFO SecurityManager: Changing view acls to: sparkUser
16/01/15 19:22:24 INFO SecurityManager: Changing modify acls to: sparkUser
16/01/15 19:22:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sparkUser); users with modify permissions: Set(sparkUser)
16/01/15 19:22:24 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@192.168.1.101:43150/user/Worker
16/01/15 19:22:25 INFO Slf4jLogger: Slf4jLogger started
16/01/15 19:22:25 INFO Utils: Successfully started service 'sparkDriver' on port 38186.
16/01/15 19:22:25 INFO SparkEnv: Registering MapOutputTracker
16/01/15 19:22:25 INFO SparkEnv: Registering BlockManagerMaster
16/01/15 19:22:25 INFO DiskBlockManager: Created local directory at /tmp/spark-ef3b8193-e086-4764-993c-0a40534052c1/blockmgr-e80c1c60-fe19-4be1-b3f9-259b3f1031a0
16/01/15 19:22:25 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
16/01/15 19:22:25 INFO HttpFileServer: HTTP File server directory is /tmp/spark-ef3b8193-e086-4764-993c-0a40534052c1/httpd-e05a5a70-dbf3-4055-b6ab-7efa22dfa4d2
16/01/15 19:22:25 INFO HttpServer: Starting HTTP Server
16/01/15 19:22:25 INFO Utils: Successfully started service 'HTTP file server' on port 34728.
16/01/15 19:22:25 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/15 19:22:35 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/15 19:22:35 INFO SparkUI: Started SparkUI at http://192.168.1.101:4040
16/01/15 19:22:35 INFO SparkContext: Added JAR file:/home/sparkUser/JarsOfSpark/JarForSpark.jar at http://192.168.1.101:34728/jars/JarForSpark.jar with timestamp 1452878555317
16/01/15 19:22:35 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@192.168.1.101:7077/user/Master...
16/01/15 19:22:35 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160115192235-0016
16/01/15 19:22:35 INFO AppClient$ClientActor: Executor added: app-20160115192235-0016/0 on worker-20160115181337-192.168.1.104-50099 (192.168.1.104:50099) with 4 cores
16/01/15 19:22:35 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160115192235-0016/0 on hostPort 192.168.1.104:50099 with 4 cores, 512.0 MB RAM
16/01/15 19:22:35 INFO AppClient$ClientActor: Executor added: app-20160115192235-0016/1 on worker-20160115125104-192.168.1.101-43150 (192.168.1.101:43150) with 3 cores
16/01/15 19:22:35 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160115192235-0016/1 on hostPort 192.168.1.101:43150 with 3 cores, 512.0 MB RAM
16/01/15 19:22:35 INFO AppClient$ClientActor: Executor updated: app-20160115192235-0016/1 is now LOADING
16/01/15 19:22:35 INFO AppClient$ClientActor: Executor updated: app-20160115192235-0016/0 is now LOADING
16/01/15 19:22:35 INFO AppClient$ClientActor: Executor updated: app-20160115192235-0016/0 is now RUNNING
16/01/15 19:22:35 INFO AppClient$ClientActor: Executor updated: app-20160115192235-0016/1 is now RUNNING
16/01/15 19:22:35 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33359.
16/01/15 19:22:35 INFO NettyBlockTransferService: Server created on 33359
16/01/15 19:22:35 INFO BlockManagerMaster: Trying to register BlockManager
16/01/15 19:22:35 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.101:33359 with 265.1 MB RAM, BlockManagerId(driver, 192.168.1.101, 33359)
16/01/15 19:22:35 INFO BlockManagerMaster: Registered BlockManager
16/01/15 19:22:35 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/01/15 19:22:38 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@192.168.1.104:49573/user/Executor#1472403765]) with ID 0
16/01/15 19:22:39 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.104:33856 with 265.1 MB RAM, BlockManagerId(0, 192.168.1.104, 33856)
16/01/15 19:22:40 INFO MemoryStore: ensureFreeSpace(130448) called with curMem=0, maxMem=278019440
16/01/15 19:22:40 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 127.4 KB, free 265.0 MB)
16/01/15 19:22:40 INFO MemoryStore: ensureFreeSpace(14257) called with curMem=130448, maxMem=278019440
16/01/15 19:22:40 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 265.0 MB)
16/01/15 19:22:40 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.101:33359 (size: 13.9 KB, free: 265.1 MB)
16/01/15 19:22:40 INFO SparkContext: Created broadcast 0 from textFile at Main.java:25
16/01/15 19:22:41 INFO FileInputFormat: Total input paths to process : 1
16/01/15 19:22:41 INFO SparkContext: Starting job: foreach at Main.java:33
16/01/15 19:22:41 INFO DAGScheduler: Got job 0 (foreach at Main.java:33) with 7 output partitions (allowLocal=false)
16/01/15 19:22:41 INFO DAGScheduler: Final stage: ResultStage 0(foreach at Main.java:33)
16/01/15 19:22:41 INFO DAGScheduler: Parents of final stage: List()
16/01/15 19:22:41 INFO DAGScheduler: Missing parents: List()
16/01/15 19:22:41 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at map at Main.java:30), which has no missing parents
16/01/15 19:22:41 INFO MemoryStore: ensureFreeSpace(4400) called with curMem=144705, maxMem=278019440
16/01/15 19:22:41 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.3 KB, free 265.0 MB)
16/01/15 19:22:41 INFO MemoryStore: ensureFreeSpace(2538) called with curMem=149105, maxMem=278019440
16/01/15 19:22:41 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 265.0 MB)
16/01/15 19:22:41 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.101:33359 (size: 2.5 KB, free: 265.1 MB)
16/01/15 19:22:41 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
16/01/15 19:22:41 INFO DAGScheduler: Submitting 7 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at map at Main.java:30)
16/01/15 19:22:41 INFO TaskSchedulerImpl: Adding task set 0.0 with 7 tasks
16/01/15 19:22:41 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:41 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:41 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:41 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:41 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.104:33856 (size: 2.5 KB, free: 265.1 MB)
16/01/15 19:22:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.104:33856 (size: 13.9 KB, free: 265.1 MB)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 2017 ms on 192.168.1.104 (1/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2036 ms on 192.168.1.104 (2/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2027 ms on 192.168.1.104 (3/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 2027 ms on 192.168.1.104 (4/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 143 ms on 192.168.1.104 (5/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 199 ms on 192.168.1.104 (6/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 206 ms on 192.168.1.104 (7/7)
16/01/15 19:22:43 INFO DAGScheduler: ResultStage 0 (foreach at Main.java:33) finished in 2.218 s
16/01/15 19:22:43 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/01/15 19:22:43 INFO DAGScheduler: Job 0 finished: foreach at Main.java:33, took 2.289399 s
16/01/15 19:22:43 INFO SparkContext: Starting job: foreach at Main.java:34
16/01/15 19:22:43 INFO DAGScheduler: Got job 1 (foreach at Main.java:34) with 7 output partitions (allowLocal=false)
16/01/15 19:22:43 INFO DAGScheduler: Final stage: ResultStage 1(foreach at Main.java:34)
16/01/15 19:22:43 INFO DAGScheduler: Parents of final stage: List()
16/01/15 19:22:43 INFO DAGScheduler: Missing parents: List()
16/01/15 19:22:43 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[2] at map at Main.java:30), which has no missing parents
16/01/15 19:22:43 INFO MemoryStore: ensureFreeSpace(4824) called with curMem=151643, maxMem=278019440
16/01/15 19:22:43 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.7 KB, free 265.0 MB)
16/01/15 19:22:43 INFO MemoryStore: ensureFreeSpace(2761) called with curMem=156467, maxMem=278019440
16/01/15 19:22:43 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.7 KB, free 265.0 MB)
16/01/15 19:22:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.101:33359 (size: 2.7 KB, free: 265.1 MB)
16/01/15 19:22:43 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
16/01/15 19:22:43 INFO DAGScheduler: Submitting 7 missing tasks from ResultStage 1 (MapPartitionsRDD[2] at map at Main.java:30)
16/01/15 19:22:43 INFO TaskSchedulerImpl: Adding task set 1.0 with 7 tasks
16/01/15 19:22:43 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 7, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 8, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 9, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 10, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.104:33856 (size: 2.7 KB, free: 265.1 MB)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 11, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 7) in 106 ms on 192.168.1.104 (1/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 8) in 125 ms on 192.168.1.104 (2/7)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 12, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 13, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 9) in 131 ms on 192.168.1.104 (3/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 10) in 133 ms on 192.168.1.104 (4/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 12) in 32 ms on 192.168.1.104 (5/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 11) in 61 ms on 192.168.1.104 (6/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 13) in 34 ms on 192.168.1.104 (7/7)
16/01/15 19:22:43 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/01/15 19:22:43 INFO DAGScheduler: ResultStage 1 (foreach at Main.java:34) finished in 0.165 s
16/01/15 19:22:43 INFO DAGScheduler: Job 1 finished: foreach at Main.java:34, took 0.177378 s
16/01/15 19:22:43 INFO SparkUI: Stopped Spark web UI at http://192.168.1.101:4040
16/01/15 19:22:43 INFO DAGScheduler: Stopping DAGScheduler
16/01/15 19:22:43 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/01/15 19:22:43 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/01/15 19:22:43 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/01/15 19:22:43 INFO Utils: path = /tmp/spark-ef3b8193-e086-4764-993c-0a40534052c1/blockmgr-e80c1c60-fe19-4be1-b3f9-259b3f1031a0, already present as root for deletion.
16/01/15 19:22:43 INFO MemoryStore: MemoryStore cleared
16/01/15 19:22:43 INFO BlockManager: BlockManager stopped
16/01/15 19:22:43 INFO BlockManagerMaster: BlockManagerMaster stopped
16/01/15 19:22:43 INFO SparkContext: Successfully stopped SparkContext
16/01/15 19:22:43 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/01/15 19:22:43 INFO Utils: Shutdown hook called
16/01/15 19:22:43 INFO Utils: Deleting directory /tmp/spark-ef3b8193-e086-4764-993c-0a40534052c1
logs from worker 192.168.1.101:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/15 18:14:15 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/01/15 18:14:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/15 18:14:15 INFO SecurityManager: Changing view acls to: sparkUser
16/01/15 18:14:15 INFO SecurityManager: Changing modify acls to: sparkUser
16/01/15 18:14:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sparkUser); users with modify permissions: Set(sparkUser)
logs from worker 192.168.1.104:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/15 19:23:23 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/01/15 19:23:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/15 19:23:24 INFO SecurityManager: Changing view acls to: root,sparkUser
16/01/15 19:23:24 INFO SecurityManager: Changing modify acls to: root,sparkUser
16/01/15 19:23:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, sparkUser); users with modify permissions: Set(root, sparkUser)
16/01/15 19:23:25 INFO Slf4jLogger: Slf4jLogger started
16/01/15 19:23:25 INFO Remoting: Starting remoting
16/01/15 19:23:25 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@192.168.1.104:43937]
16/01/15 19:23:25 INFO Utils: Successfully started service 'driverPropsFetcher' on port 43937.
16/01/15 19:23:26 INFO SecurityManager: Changing view acls to: root,sparkUser
16/01/15 19:23:26 INFO SecurityManager: Changing modify acls to: root,sparkUser
16/01/15 19:23:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, sparkUser); users with modify permissions: Set(root, sparkUser)
16/01/15 19:23:26 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/01/15 19:23:26 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/01/15 19:23:26 INFO Slf4jLogger: Slf4jLogger started
16/01/15 19:23:26 INFO Remoting: Starting remoting
16/01/15 19:23:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@192.168.1.104:49573]
16/01/15 19:23:26 INFO Utils: Successfully started service 'sparkExecutor' on port 49573.
16/01/15 19:23:26 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/01/15 19:23:26 INFO DiskBlockManager: Created local directory at /tmp/spark-6ffb215c-7267-4a93-a766-2486d2331f6b/executor-146bfe64-d7e8-4da4-9144-8003754f0b5b/blockmgr-41031d8c-b069-4147-90c9-2237baed04f1
16/01/15 19:23:26 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
16/01/15 19:23:26 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@192.168.1.101:38186/user/CoarseGrainedScheduler
16/01/15 19:23:26 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@192.168.1.104:50099/user/Worker
16/01/15 19:23:26 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@192.168.1.104:50099/user/Worker
16/01/15 19:23:26 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
16/01/15 19:23:26 INFO Executor: Starting executor ID 0 on host 192.168.1.104
16/01/15 19:23:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33856.
16/01/15 19:23:26 INFO NettyBlockTransferService: Server created on 33856
16/01/15 19:23:26 INFO BlockManagerMaster: Trying to register BlockManager
16/01/15 19:23:26 INFO BlockManagerMaster: Registered BlockManager
16/01/15 19:23:29 INFO CoarseGrainedExecutorBackend: Got assigned task 0
16/01/15 19:23:29 INFO CoarseGrainedExecutorBackend: Got assigned task 1
16/01/15 19:23:29 INFO CoarseGrainedExecutorBackend: Got assigned task 2
16/01/15 19:23:29 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/01/15 19:23:29 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
16/01/15 19:23:29 INFO CoarseGrainedExecutorBackend: Got assigned task 3
16/01/15 19:23:29 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/01/15 19:23:29 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
16/01/15 19:23:29 INFO Executor: Fetching http://192.168.1.101:34728/jars/JarForSpark.jar with timestamp 1452878555317
16/01/15 19:23:29 INFO Utils: Fetching http://192.168.1.101:34728/jars/JarForSpark.jar to /tmp/spark-6ffb215c-7267-4a93-a766-2486d2331f6b/executor-146bfe64-d7e8-4da4-9144-8003754f0b5b/fetchFileTemp1585609242243689070.tmp
16/01/15 19:23:29 INFO Utils: Copying /tmp/spark-6ffb215c-7267-4a93-a766-2486d2331f6b/executor-146bfe64-d7e8-4da4-9144-8003754f0b5b/3339800781452878555317_cache to /home/sparkUser2/Programs/spark-1.4.1-bin-hadoop2.6/work/app-20160115192235-0016/0/./JarForSpark.jar
16/01/15 19:23:29 INFO Executor: Adding file:/home/sparkUser2/Programs/spark-1.4.1-bin-hadoop2.6/work/app-20160115192235-0016/0/./JarForSpark.jar to class loader
16/01/15 19:23:29 INFO TorrentBroadcast: Started reading broadcast variable 1
16/01/15 19:23:29 INFO MemoryStore: ensureFreeSpace(2538) called with curMem=0, maxMem=278019440
16/01/15 19:23:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 265.1 MB)
16/01/15 19:23:29 INFO TorrentBroadcast: Reading broadcast variable 1 took 273 ms
16/01/15 19:23:29 INFO MemoryStore: ensureFreeSpace(4400) called with curMem=2538, maxMem=278019440
16/01/15 19:23:29 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.3 KB, free 265.1 MB)
16/01/15 19:23:29 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:161772+161772
16/01/15 19:23:29 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:323544+161772
16/01/15 19:23:29 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:0+161772
16/01/15 19:23:29 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:485316+161772
16/01/15 19:23:29 INFO TorrentBroadcast: Started reading broadcast variable 0
16/01/15 19:23:29 INFO MemoryStore: ensureFreeSpace(14257) called with curMem=6938, maxMem=278019440
16/01/15 19:23:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 265.1 MB)
16/01/15 19:23:30 INFO TorrentBroadcast: Reading broadcast variable 0 took 66 ms
16/01/15 19:23:30 INFO MemoryStore: ensureFreeSpace(188976) called with curMem=21195, maxMem=278019440
16/01/15 19:23:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 184.5 KB, free 264.9 MB)
16/01/15 19:23:30 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/01/15 19:23:30 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/01/15 19:23:30 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/01/15 19:23:30 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/01/15 19:23:30 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/01/15 19:23:30 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 1796 bytes result sent to driver
16/01/15 19:23:30 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 1796 bytes result sent to driver
16/01/15 19:23:30 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1796 bytes result sent to driver
16/01/15 19:23:30 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1796 bytes result sent to driver
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Got assigned task 4
16/01/15 19:23:31 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
16/01/15 19:23:31 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:647088+161772
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Got assigned task 5
16/01/15 19:23:31 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Got assigned task 6
16/01/15 19:23:31 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:808860+161772
16/01/15 19:23:31 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
16/01/15 19:23:31 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:970632+161773
16/01/15 19:23:31 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 1796 bytes result sent to driver
16/01/15 19:23:31 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 1796 bytes result sent to driver
16/01/15 19:23:31 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 1796 bytes result sent to driver
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Got assigned task 7
16/01/15 19:23:31 INFO Executor: Running task 0.0 in stage 1.0 (TID 7)
16/01/15 19:23:31 INFO TorrentBroadcast: Started reading broadcast variable 2
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Got assigned task 8
16/01/15 19:23:31 INFO Executor: Running task 1.0 in stage 1.0 (TID 8)
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Got assigned task 9
16/01/15 19:23:31 INFO Executor: Running task 2.0 in stage 1.0 (TID 9)
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Got assigned task 10
16/01/15 19:23:31 INFO Executor: Running task 3.0 in stage 1.0 (TID 10)
16/01/15 19:23:31 INFO MemoryStore: ensureFreeSpace(2761) called with curMem=210171, maxMem=278019440
16/01/15 19:23:31 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.7 KB, free 264.9 MB)
16/01/15 19:23:31 INFO TorrentBroadcast: Reading broadcast variable 2 took 42 ms
16/01/15 19:23:31 INFO MemoryStore: ensureFreeSpace(4824) called with curMem=212932, maxMem=278019440
16/01/15 19:23:31 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.7 KB, free 264.9 MB)
16/01/15 19:23:31 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:0+161772
16/01/15 19:23:31 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:161772+161772
16/01/15 19:23:31 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:323544+161772
16/01/15 19:23:31 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:485316+161772
16/01/15 19:23:31 INFO Executor: Finished task 0.0 in stage 1.0 (TID 7). 1814 bytes result sent to driver
16/01/15 19:23:31 INFO Executor: Finished task 1.0 in stage 1.0 (TID 8). 1814 bytes result sent to driver
16/01/15 19:23:31 INFO Executor: Finished task 2.0 in stage 1.0 (TID 9). 1814 bytes result sent to driver
16/01/15 19:23:31 INFO Executor: Finished task 3.0 in stage 1.0 (TID 10). 1814 bytes result sent to driver
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Got assigned task 11
16/01/15 19:23:31 INFO Executor: Running task 4.0 in stage 1.0 (TID 11)
16/01/15 19:23:31 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:647088+161772
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Got assigned task 12
16/01/15 19:23:31 INFO Executor: Running task 5.0 in stage 1.0 (TID 12)
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Got assigned task 13
16/01/15 19:23:31 INFO Executor: Running task 6.0 in stage 1.0 (TID 13)
16/01/15 19:23:31 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:808860+161772
16/01/15 19:23:31 INFO HadoopRDD: Input split: file:/Datasets/somefile.txt:970632+161773
16/01/15 19:23:31 INFO Executor: Finished task 5.0 in stage 1.0 (TID 12). 1814 bytes result sent to driver
16/01/15 19:23:31 INFO Executor: Finished task 4.0 in stage 1.0 (TID 11). 1814 bytes result sent to driver
16/01/15 19:23:31 INFO Executor: Finished task 6.0 in stage 1.0 (TID 13). 1814 bytes result sent to driver
16/01/15 19:23:31 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
I also tried stopping one of the workers to see what happens, and the program completed successfully on the other worker.
I also looked at this post, but unfortunately it didn't solve my problem:
Why my tasks only be done in one worker in Spark cluster
Appreciate your help!
It is because of data locality - "how close data is to the code processing it".
Spark tries to schedule the available tasks at their best possible locality levels.
By default Spark tries the "PROCESS_LOCAL" level first and switches to the lower levels (NODE_LOCAL, RACK_LOCAL, ANY) only if none of the matching CPUs free up within a certain time interval.
The default wait time before switching to a lower level is 3s (see the spark.locality.wait parameter).
And looking at the logs, all your tasks finished within 3 seconds:
16/01/15 19:22:41 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:41 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:41 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:41 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:41 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.104:33856 (size: 2.5 KB, free: 265.1 MB)
16/01/15 19:22:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.104:33856 (size: 13.9 KB, free: 265.1 MB)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, 192.168.1.104, PROCESS_LOCAL, 1495 bytes)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 2017 ms on 192.168.1.104 (1/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2036 ms on 192.168.1.104 (2/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2027 ms on 192.168.1.104 (3/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 2027 ms on 192.168.1.104 (4/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 143 ms on 192.168.1.104 (5/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 199 ms on 192.168.1.104 (6/7)
16/01/15 19:22:43 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 206 ms on 192.168.1.104 (7/7)
16/01/15 19:22:43 INFO DAGScheduler: ResultStage 0 (foreach at Main.java:33) finished in 2.218 s
16/01/15 19:22:43 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/01/15 19:22:43 INFO DAGScheduler: Job 0 finished: foreach at Main.java:33, took 2.289399 s
I would suggest trying with larger files (in the GBs), where each task takes some time to finish, so the scheduler has a reason to spill over to the second worker.
For more information on data locality, please read the "Data Locality" section of the Spark tuning guide.
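If you want tasks to spread to the second worker even for a short job, you can also shrink the locality wait so the scheduler stops holding out for process-local slots. A minimal sketch in Scala (the original program is Java; the "0s" value is illustrative, not a recommendation):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("spark-project")
  // Don't wait for a process-local slot to free up; schedule each task
  // on any free executor immediately.
  .set("spark.locality.wait", "0s")
val sc = new SparkContext(conf)
The same setting can be passed at submit time with --conf spark.locality.wait=0s, and there are per-level variants (spark.locality.wait.process, spark.locality.wait.node, spark.locality.wait.rack) if you only want to relax a single level.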

Spark metrics on wordcount example

I read the Metrics section on the Spark website. I want to try it on the wordcount example, but I can't make it work.
spark/conf/metrics.properties :
# Enable CsvSink for all instances
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
# Polling period for CsvSink
*.sink.csv.period=1
*.sink.csv.unit=seconds
# Polling directory for CsvSink
*.sink.csv.directory=/home/spark/Documents/test/
# Worker instance overlap polling period
worker.sink.csv.period=1
worker.sink.csv.unit=seconds
# Enable jvm source for instance master, worker, driver and executor
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
I run my app in local mode as in the documentation:
$SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar
I checked /home/spark/Documents/test/ and it is empty.
What did I miss?
Shell:
$SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[4] --conf spark.metrics.conf=/home/spark/development/spark/conf/metrics.properties target/scala-2.10/simple-project_2.10-1.0.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
INFO SparkContext: Running Spark version 1.3.0
WARN Utils: Your hostname, cv-local resolves to a loopback address: 127.0.1.1; using 192.168.1.64 instead (on interface eth0)
WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
INFO SecurityManager: Changing view acls to: spark
INFO SecurityManager: Changing modify acls to: spark
INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
INFO Slf4jLogger: Slf4jLogger started
INFO Remoting: Starting remoting
INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@cv-local.local:35895]
INFO Utils: Successfully started service 'sparkDriver' on port 35895.
INFO SparkEnv: Registering MapOutputTracker
INFO SparkEnv: Registering BlockManagerMaster
INFO DiskBlockManager: Created local directory at /tmp/spark-447d56c9-cfe5-4f9d-9e0a-6bb476ddede6/blockmgr-4eaa04f4-b4b2-4b05-ba0e-fd1aeb92b289
INFO MemoryStore: MemoryStore started with capacity 265.4 MB
INFO HttpFileServer: HTTP File server directory is /tmp/spark-fae11cd2-937e-4be3-a273-be8b4c4847df/httpd-ca163445-6fff-45e4-9c69-35edcea83b68
INFO HttpServer: Starting HTTP Server
INFO Utils: Successfully started service 'HTTP file server' on port 52828.
INFO SparkEnv: Registering OutputCommitCoordinator
INFO Utils: Successfully started service 'SparkUI' on port 4040.
INFO SparkUI: Started SparkUI at http://cv-local.local:4040
INFO SparkContext: Added JAR file:/home/spark/workspace/IdeaProjects/wordcount/target/scala-2.10/simple-project_2.10-1.0.jar at http://192.168.1.64:52828/jars/simple-project_2.10-1.0.jar with timestamp 1444049152348
INFO Executor: Starting executor ID <driver> on host localhost
INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@cv-local.local:35895/user/HeartbeatReceiver
INFO NettyBlockTransferService: Server created on 60320
INFO BlockManagerMaster: Trying to register BlockManager
INFO BlockManagerMasterActor: Registering block manager localhost:60320 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 60320)
INFO BlockManagerMaster: Registered BlockManager
INFO MemoryStore: ensureFreeSpace(34046) called with curMem=0, maxMem=278302556
INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 33.2 KB, free 265.4 MB)
INFO MemoryStore: ensureFreeSpace(5221) called with curMem=34046, maxMem=278302556
INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.1 KB, free 265.4 MB)
INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:60320 (size: 5.1 KB, free: 265.4 MB)
INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
INFO SparkContext: Created broadcast 0 from textFile at SimpleApp.scala:11
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN LoadSnappy: Snappy native library not loaded
INFO FileInputFormat: Total input paths to process : 1
INFO SparkContext: Starting job: count at SimpleApp.scala:12
INFO DAGScheduler: Got job 0 (count at SimpleApp.scala:12) with 2 output partitions (allowLocal=false)
INFO DAGScheduler: Final stage: Stage 0(count at SimpleApp.scala:12)
INFO DAGScheduler: Parents of final stage: List()
INFO DAGScheduler: Missing parents: List()
INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at filter at SimpleApp.scala:12), which has no missing parents
INFO MemoryStore: ensureFreeSpace(2848) called with curMem=39267, maxMem=278302556
INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.8 KB, free 265.4 MB)
INFO MemoryStore: ensureFreeSpace(2056) called with curMem=42115, maxMem=278302556
INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 265.4 MB)
INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:60320 (size: 2.0 KB, free: 265.4 MB)
INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[2] at filter at SimpleApp.scala:12)
INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1391 bytes)
INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1391 bytes)
INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
INFO Executor: Fetching http://192.168.1.64:52828/jars/simple-project_2.10-1.0.jar with timestamp 1444049152348
INFO Utils: Fetching http://192.168.1.64:52828/jars/simple-project_2.10-1.0.jar to /tmp/spark-cab5a940-e2a4-4caf-8549-71e1518271f1/userFiles-c73172c2-7af6-4861-a945-b183edbbafa1/fetchFileTemp4229868141058449157.tmp
INFO Executor: Adding file:/tmp/spark-cab5a940-e2a4-4caf-8549-71e1518271f1/userFiles-c73172c2-7af6-4861-a945-b183edbbafa1/simple-project_2.10-1.0.jar to class loader
INFO CacheManager: Partition rdd_1_1 not found, computing it
INFO CacheManager: Partition rdd_1_0 not found, computing it
INFO HadoopRDD: Input split: file:/home/spark/development/spark/conf/metrics.properties:2659+2659
INFO HadoopRDD: Input split: file:/home/spark/development/spark/conf/metrics.properties:0+2659
INFO MemoryStore: ensureFreeSpace(7840) called with curMem=44171, maxMem=278302556
INFO MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 7.7 KB, free 265.4 MB)
INFO BlockManagerInfo: Added rdd_1_0 in memory on localhost:60320 (size: 7.7 KB, free: 265.4 MB)
INFO BlockManagerMaster: Updated info of block rdd_1_0
INFO MemoryStore: ensureFreeSpace(8648) called with curMem=52011, maxMem=278302556
INFO MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 8.4 KB, free 265.4 MB)
INFO BlockManagerInfo: Added rdd_1_1 in memory on localhost:60320 (size: 8.4 KB, free: 265.4 MB)
INFO BlockManagerMaster: Updated info of block rdd_1_1
INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2399 bytes result sent to driver
INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2399 bytes result sent to driver
INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 139 ms on localhost (1/2)
INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 133 ms on localhost (2/2)
INFO DAGScheduler: Stage 0 (count at SimpleApp.scala:12) finished in 0.151 s
INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
INFO DAGScheduler: Job 0 finished: count at SimpleApp.scala:12, took 0.225939 s
INFO SparkContext: Starting job: count at SimpleApp.scala:13
INFO DAGScheduler: Got job 1 (count at SimpleApp.scala:13) with 2 output partitions (allowLocal=false)
INFO DAGScheduler: Final stage: Stage 1(count at SimpleApp.scala:13)
INFO DAGScheduler: Parents of final stage: List()
INFO DAGScheduler: Missing parents: List()
INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[3] at filter at SimpleApp.scala:13), which has no missing parents
INFO MemoryStore: ensureFreeSpace(2848) called with curMem=60659, maxMem=278302556
INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.8 KB, free 265.3 MB)
INFO MemoryStore: ensureFreeSpace(2056) called with curMem=63507, maxMem=278302556
INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.0 KB, free 265.3 MB)
INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:60320 (size: 2.0 KB, free: 265.4 MB)
INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:839
INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[3] at filter at SimpleApp.scala:13)
INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, PROCESS_LOCAL, 1391 bytes)
INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, PROCESS_LOCAL, 1391 bytes)
INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
INFO BlockManager: Found block rdd_1_0 locally
INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1830 bytes result sent to driver
INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 9 ms on localhost (1/2)
INFO BlockManager: Found block rdd_1_1 locally
INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1830 bytes result sent to driver
INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 10 ms on localhost (2/2)
INFO DAGScheduler: Stage 1 (count at SimpleApp.scala:13) finished in 0.011 s
INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
INFO DAGScheduler: Job 1 finished: count at SimpleApp.scala:13, took 0.024084 s
Lines with a: 5, Lines with b: 12
I made it work by specifying the path to the metrics file in the spark-submit command:
--files=/yourPath/metrics.properties --conf spark.metrics.conf=./metrics.properties
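Alternatively, the same spark.metrics.conf property can be set from code; a sketch, assuming the driver can read the file at that absolute path (true in local mode, where driver and executors share one filesystem):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("SimpleApp")
  // Absolute path to metrics.properties; no --files shipping needed in local mode.
  .set("spark.metrics.conf", "/home/spark/development/spark/conf/metrics.properties")
val sc = new SparkContext(conf)
On a real cluster, the --files plus relative ./metrics.properties approach above is safer, because the executors do not necessarily share the driver's filesystem.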

Resources