PySpark not reading data from Kafka stream - apache-spark

I'm using PySpark to read streaming data from Kafka. Both Kafka and PySpark are running in a local Kubernetes cluster set up with minikube, and I am using the bitnami/spark-cluster Helm chart. The PySpark app appears to be unable to read streaming data from Kafka.
Here are some logs from the PySpark app:
22/12/18 20:40:57 INFO StandaloneSchedulerBackend: Granted executor ID app-20221218203946-0013/5 on hostPort 172.17.0.4:40893 with 1 core(s), 1024.0 MiB RAM
22/12/18 20:40:57 INFO AppInfoParser: Kafka version: 2.6.0
22/12/18 20:40:57 INFO AppInfoParser: Kafka commitId: 62abe01bee039651
22/12/18 20:40:57 INFO AppInfoParser: Kafka startTimeMs: 1671396057484
22/12/18 20:40:57 INFO KafkaConsumer: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Subscribed to topic(s): meetup-rsvps-topic
22/12/18 20:40:57 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20221218203946-0013/5 is now RUNNING
22/12/18 20:41:04 INFO Metadata: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Cluster ID: tIhn8pqNQbWmb6C_WDce3A
22/12/18 20:41:04 INFO AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Discovered group coordinator kafka-local-0.kafka-local-headless.meetupstream.svc.cluster.local:9092 (id: 2147483647 rack: null)
22/12/18 20:41:04 INFO AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] (Re-)joining group
22/12/18 20:41:05 INFO AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Join group failed with org.apache.kafka.common.errors.MemberIdRequiredException: The group member needs to have a valid member id before actually entering a consumer group.
22/12/18 20:41:05 INFO AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] (Re-)joining group
22/12/18 20:41:05 INFO ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Finished assignment for group at generation 1: {consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1-8ff6df17-88f3-48dd-86f1-caa731cfbfab=Assignment(partitions=[meetup-rsvps-topic-0])}
22/12/18 20:41:05 INFO AbstractCoordinator: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Successfully joined group with generation 1
22/12/18 20:41:05 INFO ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Notifying assignor about the new Assignment(partitions=[meetup-rsvps-topic-0])
22/12/18 20:41:05 INFO ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Adding newly assigned partitions: meetup-rsvps-topic-0
22/12/18 20:41:05 INFO ConsumerCoordinator: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Found no committed offset for partition meetup-rsvps-topic-0
22/12/18 20:41:06 INFO SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Seeking to EARLIEST offset of partition meetup-rsvps-topic-0
22/12/18 20:41:06 INFO SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Resetting offset for partition meetup-rsvps-topic-0 to offset 0.
22/12/18 20:41:06 INFO CheckpointFileManager: Writing atomically to file:/tmp/temporary-a95fafb4-6113-49ab-93d8-b0c495bac40a/sources/0/0 using temp file file:/tmp/temporary-a95fafb4-6113-49ab-93d8-b0c495bac40a/sources/0/.0.e93bfa5b-6db8-4892-8447-a500739de876.tmp
22/12/18 20:41:07 INFO CheckpointFileManager: Renamed temp file file:/tmp/temporary-a95fafb4-6113-49ab-93d8-b0c495bac40a/sources/0/.0.e93bfa5b-6db8-4892-8447-a500739de876.tmp to file:/tmp/temporary-a95fafb4-6113-49ab-93d8-b0c495bac40a/sources/0/0
22/12/18 20:41:07 INFO KafkaMicroBatchStream: Initial offsets: {"meetup-rsvps-topic":{"0":0}}
22/12/18 20:41:07 INFO SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Seeking to LATEST offset of partition meetup-rsvps-topic-0
22/12/18 20:41:07 INFO SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0-1, groupId=spark-kafka-source-bfcc8f10-c42a-4320-a4bf-735d75f3893b--1704904730-driver-0] Resetting offset for partition meetup-rsvps-topic-0 to offset 1898.
22/12/18 20:41:07 INFO CheckpointFileManager: Writing atomically to file:/tmp/temporary-a95fafb4-6113-49ab-93d8-b0c495bac40a/offsets/0 using temp file file:/tmp/temporary-a95fafb4-6113-49ab-93d8-b0c495bac40a/offsets/.0.a9f679ba-5f6a-4ca5-9e3d-4e7ac92d3afa.tmp
22/12/18 20:41:09 INFO CheckpointFileManager: Renamed temp file file:/tmp/temporary-a95fafb4-6113-49ab-93d8-b0c495bac40a/offsets/.0.a9f679ba-5f6a-4ca5-9e3d-4e7ac92d3afa.tmp to file:/tmp/temporary-a95fafb4-6113-49ab-93d8-b0c495bac40a/offsets/0
22/12/18 20:41:09 INFO MicroBatchExecution: Committed offsets for batch 0. Metadata OffsetSeqMetadata(0,1671396067279,Map(spark.sql.streaming.stateStore.providerClass -> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider, spark.sql.streaming.join.stateFormatVersion -> 2, spark.sql.streaming.stateStore.compression.codec -> lz4, spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion -> 2, spark.sql.streaming.multipleWatermarkPolicy -> min, spark.sql.streaming.aggregation.stateFormatVersion -> 2, spark.sql.shuffle.partitions -> 200))
22/12/18 20:41:15 INFO KafkaOffsetReaderConsumer: Partitions added: Map()
22/12/18 20:41:16 INFO KafkaOffsetReaderConsumer: Partitions added: Map()
22/12/18 20:41:16 INFO KafkaOffsetReaderConsumer: Partitions added: Map()
22/12/18 20:41:16 INFO KafkaOffsetReaderConsumer: Partitions added: Map()
22/12/18 20:41:17 INFO KafkaOffsetReaderConsumer: Partitions added: Map()
22/12/18 20:41:17 INFO KafkaOffsetReaderConsumer: Partitions added: Map()
22/12/18 20:41:31 INFO CodeGenerator: Code generated in 4801.506531 ms
22/12/18 20:41:33 INFO WriteToDataSourceV2Exec: Start processing data source write support: org.apache.spark.sql.execution.streaming.sources.MicroBatchWrite#53fa8f13. The input RDD has 1 partitions.
22/12/18 20:41:33 INFO SparkContext: Starting job: start at NativeMethodAccessorImpl.java:0
22/12/18 20:41:33 INFO DAGScheduler: Got job 0 (start at NativeMethodAccessorImpl.java:0) with 1 output partitions
22/12/18 20:41:33 INFO DAGScheduler: Final stage: ResultStage 0 (start at NativeMethodAccessorImpl.java:0)
22/12/18 20:41:33 INFO DAGScheduler: Parents of final stage: List()
22/12/18 20:41:33 INFO DAGScheduler: Missing parents: List()
22/12/18 20:41:33 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20221218203946-0013/5 is now EXITED (Command exited with code 1)
22/12/18 20:41:33 INFO StandaloneSchedulerBackend: Executor app-20221218203946-0013/5 removed: Command exited with code 1
22/12/18 20:41:33 INFO BlockManagerMaster: Removal of executor 5 requested
22/12/18 20:41:33 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 5
22/12/18 20:41:33 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20221218203946-0013/6 on worker-20221217144830-172.17.0.4-40893 (172.17.0.4:40893) with 1 core(s)
22/12/18 20:41:33 INFO BlockManagerMasterEndpoint: Trying to remove executor 5 from BlockManagerMaster.
22/12/18 20:41:33 INFO StandaloneSchedulerBackend: Granted executor ID app-20221218203946-0013/6 on hostPort 172.17.0.4:40893 with 1 core(s), 1024.0 MiB RAM
22/12/18 20:41:33 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at start at NativeMethodAccessorImpl.java:0), which has no missing parents
22/12/18 20:41:33 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20221218203946-0013/6 is now RUNNING
22/12/18 20:41:34 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20221218203946-0013/4 is now EXITED (Command exited with code 1)
22/12/18 20:41:34 INFO StandaloneSchedulerBackend: Executor app-20221218203946-0013/4 removed: Command exited with code 1
22/12/18 20:41:34 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20221218203946-0013/7 on worker-20221217144749-172.17.0.3-46817 (172.17.0.3:46817) with 1 core(s)
22/12/19 10:35:09 INFO StandaloneSchedulerBackend: Granted executor ID app-20221219103231-0000/9 on hostPort 172.17.0.4:35303 with 1 core(s), 1024.0 MiB RAM
22/12/19 10:35:09 INFO StandaloneSchedulerBackend: Granted executor ID app-20221219103231-0000/9 on hostPort 172.17.0.4:35303 with 1 core(s), 1024.0 MiB RAM
22/12/19 10:35:09 INFO BlockManagerMasterEndpoint: Trying to remove executor 7 from BlockManagerMaster.
22/12/19 10:35:09 INFO BlockManagerMaster: Removal of executor 7 requested
22/12/19 10:35:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 7
22/12/19 10:35:09 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20221219103231-0000/9 is now RUNNING
22/12/19 10:35:18 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
22/12/19 10:35:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20221219103231-0000/8 is now EXITED (Command exited with code 1)
22/12/19 10:35:25 INFO StandaloneSchedulerBackend: Executor app-20221219103231-0000/8 removed: Command exited with code 1
22/12/19 10:35:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20221219103231-0000/10 on worker-20221219102048-172.17.0.3-33275 (172.17.0.3:33275) with 1 core(s)
22/12/19 10:35:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20221219103231-0000/10 on hostPort 172.17.0.3:33275 with 1 core(s), 1024.0 MiB RAM
22/12/19 10:35:25 INFO BlockManagerMasterEndpoint: Trying to remove executor 8 from BlockManagerMaster.
22/12/19 10:35:25 INFO BlockManagerMaster: Removal of executor 8 requested
Here is the code for my PySpark app:
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

def main():
    spark = SparkSession \
        .builder \
        .appName("SparkProcess") \
        .getOrCreate()

    df = spark.readStream.format("kafka") \
        .option("kafka.bootstrap.servers", "kafka-local.meetupstream.svc.cluster.local:9092") \
        .option("subscribe", "meetup-rsvps-topic") \
        .option("startingOffsets", "earliest") \
        .load()

    # selectExpr returns a new DataFrame, so keep the result
    df = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    df.writeStream \
        .trigger(processingTime="60 seconds") \
        .format("console") \
        .outputMode("append") \
        .start() \
        .awaitTermination()

if __name__ == "__main__":
    main()
I have also tried setting .config("spark.dynamicAllocation.enabled", "false") in PySpark but I still get the same error.
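For reference, a minimal sketch (my assumption, not the exact snippet from the question) of where that option would go if set on the SparkSession builder; the same setting can equally be passed to spark-submit as --conf spark.dynamicAllocation.enabled=false:

from pyspark.sql import SparkSession

# Illustrative only: disabling dynamic allocation when building the session
spark = SparkSession \
    .builder \
    .appName("SparkProcess") \
    .config("spark.dynamicAllocation.enabled", "false") \
    .getOrCreate()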
My Spark UI (screenshot omitted).
I am using the following command to execute my Spark app:
kubectl run spark-base -it --rm --labels="app=spark-client" --image-pull-policy Never --restart Never --image spark-layer:latest -- bash ./spark/bin/spark-submit --master spark://spark-cluster-master-svc:7077 --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1 ./spark-process.py

Related

Can the driver program and cluster manager (resource manager) be on the same machine in Spark standalone?

I'm running spark-submit from the same machine as the Spark master, using the following command: ./bin/spark-submit --master spark://ip:port --deploy-mode "client" test.py. My application runs forever with the following kind of output:
22/11/18 13:17:37 INFO BlockManagerMaster: Removal of executor 8 requested
22/11/18 13:17:37 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 8
22/11/18 13:17:37 INFO StandaloneSchedulerBackend: Granted executor ID app-20221118131723-0008/10 on hostPort 192.168.210.94:37443 with 2 core(s), 1024.0 MiB RAM
22/11/18 13:17:37 INFO BlockManagerMasterEndpoint: Trying to remove executor 8 from BlockManagerMaster.
22/11/18 13:17:37 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20221118131723-0008/10 is now RUNNING
22/11/18 13:17:38 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20221118131723-0008/9 is now EXITED (Command exited with code 1)
22/11/18 13:17:38 INFO StandaloneSchedulerBackend: Executor app-20221118131723-0008/9 removed: Command exited with code 1
22/11/18 13:17:38 INFO BlockManagerMaster: Removal of executor 9 requested
22/11/18 13:17:38 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 9
22/11/18 13:17:38 INFO BlockManagerMasterEndpoint: Trying to remove executor 9 from BlockManagerMaster.
22/11/18 13:17:38 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20221118131723-0008/11 on worker-20221118111836-192.168.210.82-46395 (192.168.210.82:4639
But when I run it from other nodes, my application runs successfully. What could be the reason?

Spark: Jobs are not assigned

I've deployed a Spark cluster on my Kubernetes. Here is the web UI (screenshot omitted).
I'm trying to submit the SparkPi example using:
$ ./spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://spark-cluster-ra-iot-dev.si-origin-cluster.t-systems.es:32316 \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
../examples/jars/spark-examples_2.11-2.4.5.jar 10
The job reaches the Spark cluster (screenshot omitted).
Nevertheless, I'm getting messages like:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
It seems like the SparkPi application is scheduled but never executed...
Here is the complete log:
./spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark-cluster-ra-iot-dev.si-origin-cluster.t-systems.es:32316 --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 ../examples/jars/spark-examples_2.11-2.4.5.jar 10
20/06/09 10:52:57 WARN Utils: Your hostname, psgd resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
20/06/09 10:52:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/06/09 10:52:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/09 10:52:58 INFO SparkContext: Running Spark version 2.4.5
20/06/09 10:52:58 INFO SparkContext: Submitted application: Spark Pi
20/06/09 10:52:58 INFO SecurityManager: Changing view acls to: jeusdi
20/06/09 10:52:58 INFO SecurityManager: Changing modify acls to: jeusdi
20/06/09 10:52:58 INFO SecurityManager: Changing view acls groups to:
20/06/09 10:52:58 INFO SecurityManager: Changing modify acls groups to:
20/06/09 10:52:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jeusdi); groups with view permissions: Set(); users with modify permissions: Set(jeusdi); groups with modify permissions: Set()
20/06/09 10:52:59 INFO Utils: Successfully started service 'sparkDriver' on port 42943.
20/06/09 10:52:59 INFO SparkEnv: Registering MapOutputTracker
20/06/09 10:52:59 INFO SparkEnv: Registering BlockManagerMaster
20/06/09 10:52:59 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/06/09 10:52:59 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/06/09 10:52:59 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b6c54054-c94b-42c7-b85f-a4e30be4b659
20/06/09 10:52:59 INFO MemoryStore: MemoryStore started with capacity 117.0 MB
20/06/09 10:52:59 INFO SparkEnv: Registering OutputCommitCoordinator
20/06/09 10:52:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/06/09 10:53:00 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
20/06/09 10:53:00 INFO SparkContext: Added JAR file:/home/jeusdi/projects/workarea/valladolid/spark-2.4.5-bin-hadoop2.7/bin/../examples/jars/spark-examples_2.11-2.4.5.jar at spark://10.0.2.15:42943/jars/spark-examples_2.11-2.4.5.jar with timestamp 1591692780146
20/06/09 10:53:00 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-cluster-ra-iot-dev.si-origin-cluster.t-systems.es:32316...
20/06/09 10:53:00 INFO TransportClientFactory: Successfully created connection to spark-cluster-ra-iot-dev.si-origin-cluster.t-systems.es/10.49.160.69:32316 after 152 ms (0 ms spent in bootstraps)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20200609085300-0002
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/0 on worker-20200609084543-10.129.3.127-45867 (10.129.3.127:45867) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/0 on hostPort 10.129.3.127:45867 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/1 on worker-20200609084543-10.129.3.127-45867 (10.129.3.127:45867) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/1 on hostPort 10.129.3.127:45867 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/2 on worker-20200609084543-10.129.3.127-45867 (10.129.3.127:45867) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/2 on hostPort 10.129.3.127:45867 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/3 on worker-20200609084543-10.129.3.127-45867 (10.129.3.127:45867) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/3 on hostPort 10.129.3.127:45867 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/4 on worker-20200609084543-10.129.3.127-45867 (10.129.3.127:45867) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/4 on hostPort 10.129.3.127:45867 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33755.
20/06/09 10:53:01 INFO NettyBlockTransferService: Server created on 10.0.2.15:33755
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/5 on worker-20200609084509-10.128.3.197-41600 (10.128.3.197:41600) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/5 on hostPort 10.128.3.197:41600 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/6 on worker-20200609084509-10.128.3.197-41600 (10.128.3.197:41600) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/6 on hostPort 10.128.3.197:41600 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/7 on worker-20200609084509-10.128.3.197-41600 (10.128.3.197:41600) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/7 on hostPort 10.128.3.197:41600 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/8 on worker-20200609084509-10.128.3.197-41600 (10.128.3.197:41600) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/8 on hostPort 10.128.3.197:41600 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/9 on worker-20200609084509-10.128.3.197-41600 (10.128.3.197:41600) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/9 on hostPort 10.128.3.197:41600 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/10 on worker-20200609084426-10.131.1.27-46041 (10.131.1.27:46041) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/10 on hostPort 10.131.1.27:46041 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/11 on worker-20200609084426-10.131.1.27-46041 (10.131.1.27:46041) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/11 on hostPort 10.131.1.27:46041 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/12 on worker-20200609084426-10.131.1.27-46041 (10.131.1.27:46041) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/12 on hostPort 10.131.1.27:46041 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/13 on worker-20200609084426-10.131.1.27-46041 (10.131.1.27:46041) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/13 on hostPort 10.131.1.27:46041 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/14 on worker-20200609084426-10.131.1.27-46041 (10.131.1.27:46041) with 1 core(s)
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/14 on hostPort 10.131.1.27:46041 with 1 core(s), 512.0 MB RAM
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/5 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/6 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/7 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/8 is now RUNNING
20/06/09 10:53:01 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 33755, None)
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/9 is now RUNNING
20/06/09 10:53:01 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:33755 with 117.0 MB RAM, BlockManagerId(driver, 10.0.2.15, 33755, None)
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/10 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/11 is now RUNNING
20/06/09 10:53:01 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 33755, None)
20/06/09 10:53:01 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 33755, None)
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/12 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/13 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/14 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/0 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/1 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/2 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/3 is now RUNNING
20/06/09 10:53:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/4 is now RUNNING
20/06/09 10:53:01 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
20/06/09 10:53:02 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
20/06/09 10:53:02 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
20/06/09 10:53:02 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
20/06/09 10:53:02 INFO DAGScheduler: Parents of final stage: List()
20/06/09 10:53:02 INFO DAGScheduler: Missing parents: List()
20/06/09 10:53:02 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
20/06/09 10:53:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.0 KB, free 117.0 MB)
20/06/09 10:53:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1381.0 B, free 117.0 MB)
20/06/09 10:53:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:33755 (size: 1381.0 B, free: 117.0 MB)
20/06/09 10:53:03 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
20/06/09 10:53:03 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
20/06/09 10:53:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
20/06/09 10:53:18 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/06/09 10:53:33 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/06/09 10:53:48 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/06/09 10:54:03 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/06/09 10:54:18 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/06/09 10:54:33 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/06/09 10:54:48 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/13 is now EXITED (Command exited with code 1)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Executor app-20200609085300-0002/13 removed: Command exited with code 1
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/15 on worker-20200609084426-10.131.1.27-46041 (10.131.1.27:46041) with 1 core(s)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/15 on hostPort 10.131.1.27:46041 with 1 core(s), 512.0 MB RAM
20/06/09 10:55:03 INFO BlockManagerMasterEndpoint: Trying to remove executor 13 from BlockManagerMaster.
20/06/09 10:55:03 INFO BlockManagerMaster: Removal of executor 13 requested
20/06/09 10:55:03 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 13
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/15 is now RUNNING
20/06/09 10:55:03 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/12 is now EXITED (Command exited with code 1)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Executor app-20200609085300-0002/12 removed: Command exited with code 1
20/06/09 10:55:03 INFO BlockManagerMaster: Removal of executor 12 requested
20/06/09 10:55:03 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 12
20/06/09 10:55:03 INFO BlockManagerMasterEndpoint: Trying to remove executor 12 from BlockManagerMaster.
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/16 on worker-20200609084426-10.131.1.27-46041 (10.131.1.27:46041) with 1 core(s)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/16 on hostPort 10.131.1.27:46041 with 1 core(s), 512.0 MB RAM
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/16 is now RUNNING
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/14 is now EXITED (Command exited with code 1)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Executor app-20200609085300-0002/14 removed: Command exited with code 1
20/06/09 10:55:03 INFO BlockManagerMaster: Removal of executor 14 requested
20/06/09 10:55:03 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 14
20/06/09 10:55:03 INFO BlockManagerMasterEndpoint: Trying to remove executor 14 from BlockManagerMaster.
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/17 on worker-20200609084426-10.131.1.27-46041 (10.131.1.27:46041) with 1 core(s)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/17 on hostPort 10.131.1.27:46041 with 1 core(s), 512.0 MB RAM
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/17 is now RUNNING
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/10 is now EXITED (Command exited with code 1)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Executor app-20200609085300-0002/10 removed: Command exited with code 1
20/06/09 10:55:03 INFO BlockManagerMaster: Removal of executor 10 requested
20/06/09 10:55:03 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 10
20/06/09 10:55:03 INFO BlockManagerMasterEndpoint: Trying to remove executor 10 from BlockManagerMaster.
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/18 on worker-20200609084426-10.131.1.27-46041 (10.131.1.27:46041) with 1 core(s)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/18 on hostPort 10.131.1.27:46041 with 1 core(s), 512.0 MB RAM
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/18 is now RUNNING
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/8 is now EXITED (Command exited with code 1)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Executor app-20200609085300-0002/8 removed: Command exited with code 1
20/06/09 10:55:03 INFO BlockManagerMaster: Removal of executor 8 requested
20/06/09 10:55:03 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 8
20/06/09 10:55:03 INFO BlockManagerMasterEndpoint: Trying to remove executor 8 from BlockManagerMaster.
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/19 on worker-20200609084509-10.128.3.197-41600 (10.128.3.197:41600) with 1 core(s)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/19 on hostPort 10.128.3.197:41600 with 1 core(s), 512.0 MB RAM
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/11 is now EXITED (Command exited with code 1)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Executor app-20200609085300-0002/11 removed: Command exited with code 1
20/06/09 10:55:03 INFO BlockManagerMasterEndpoint: Trying to remove executor 11 from BlockManagerMaster.
20/06/09 10:55:03 INFO BlockManagerMaster: Removal of executor 11 requested
20/06/09 10:55:03 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 11
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/20 on worker-20200609084426-10.131.1.27-46041 (10.131.1.27:46041) with 1 core(s)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/20 on hostPort 10.131.1.27:46041 with 1 core(s), 512.0 MB RAM
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/19 is now RUNNING
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/20 is now RUNNING
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/7 is now EXITED (Command exited with code 1)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Executor app-20200609085300-0002/7 removed: Command exited with code 1
20/06/09 10:55:03 INFO BlockManagerMaster: Removal of executor 7 requested
20/06/09 10:55:03 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 7
20/06/09 10:55:03 INFO BlockManagerMasterEndpoint: Trying to remove executor 7 from BlockManagerMaster.
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/21 on worker-20200609084509-10.128.3.197-41600 (10.128.3.197:41600) with 1 core(s)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20200609085300-0002/21 on hostPort 10.128.3.197:41600 with 1 core(s), 512.0 MB RAM
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/21 is now RUNNING
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200609085300-0002/9 is now EXITED (Command exited with code 1)
20/06/09 10:55:03 INFO StandaloneSchedulerBackend: Executor app-20200609085300-0002/9 removed: Command exited with code 1
20/06/09 10:55:03 INFO BlockManagerMaster: Removal of executor 9 requested
20/06/09 10:55:03 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 9
20/06/09 10:55:03 INFO BlockManagerMasterEndpoint: Trying to remove executor 9 from BlockManagerMaster.
20/06/09 10:55:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200609085300-0002/22 on worker-20200609084509-10.128.3.197-41600 (10.128.3.197:41600) with 1 core(s)
...

How to deploy spark on Kubeedge?

I tried to use the k8s deployment mode to deploy Spark 2.4.3 on Kubeedge 1.1.0 but failed (Docker version 19.03.4, k8s version 1.16.1).
SPARK_DRIVER_BIND_ADDRESS=10.4.20.34
SPARK_IMAGE=spark:2.4.3
SPARK_MASTER="k8s://http://127.0.0.1:8080"
CMD=(
"$SPARK_HOME/bin/spark-submit"
--conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
--conf "spark.kubernetes.container.image=${SPARK_IMAGE}"
--conf "spark.executor.instances=1"
--conf "spark.kubernetes.executor.limit.cores=1"
--deploy-mode client
--master ${SPARK_MASTER}
--name spark-pi
--class org.apache.spark.examples.SparkPi
--driver-memory 1G
--executor-memory 1G
--num-executors 1
--executor-cores 1
file://${PWD}/spark-examples_2.11-2.4.3.jar
)
"${CMD[@]}"
Node status is normal.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
edge-node-001 Ready edge 6d1h v1.15.3-kubeedge-v1.1.0-beta.0.178+c6a5aa738261e7-dirty
ubuntu-ms-7b89 Ready master 6d4h v1.16.1
But I got some errors
19/11/17 21:45:12 INFO k8s.ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
19/11/17 21:45:12 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46571.
19/11/17 21:45:12 INFO netty.NettyBlockTransferService: Server created on 10.4.20.34:46571
19/11/17 21:45:12 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/11/17 21:45:12 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.4.20.34, 46571, None)
19/11/17 21:45:12 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.4.20.34:46571 with 366.3 MB RAM, BlockManagerId(driver, 10.4.20.34, 46571, None)
19/11/17 21:45:12 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.4.20.34, 46571, None)
19/11/17 21:45:12 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.4.20.34, 46571, None)
19/11/17 21:45:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#451882b2{/metrics/json,null,AVAILABLE,#Spark}
19/11/17 21:45:42 INFO k8s.KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
19/11/17 21:45:42 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Parents of final stage: List()
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Missing parents: List()
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
19/11/17 21:45:42 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 366.3 MB)
19/11/17 21:45:42 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 366.3 MB)
19/11/17 21:45:42 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.4.20.34:46571 (size: 1256.0 B, free: 366.3 MB)
19/11/17 21:45:42 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
19/11/17 21:45:42 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
19/11/17 21:45:57 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/11/17 21:46:12 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/11/17 21:46:27 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/11/17 21:46:42 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/11/17 21:46:57 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/11/17 21:47:12 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Is it possible to deploy Spark on Kubeedge in Kubernetes deployment mode, or should I try standalone deployment mode?
I'm quite confused.

Collect failed in ... s due to Stage cancelled because SparkContext was shut down

I want to display the number of elements in each partition, so I write the following:
def count_in_a_partition(iterator):
    yield sum(1 for _ in iterator)
If I use it like this
print("number of element in each partitions: {}".format(
my_rdd.mapPartitions(count_in_a_partition).collect()
))
I get the following:
19/02/18 21:41:15 INFO DAGScheduler: Job 3 failed: collect at /project/6008168/tamouze/testSparkCedar.py:435, took 30.859710 s
19/02/18 21:41:15 INFO DAGScheduler: ResultStage 3 (collect at /project/6008168/tamouze/testSparkCedar.py:435) failed in 30.848 s due to Stage cancelled because SparkContext was shut down
19/02/18 21:41:15 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/02/18 21:41:16 INFO MemoryStore: MemoryStore cleared
19/02/18 21:41:16 INFO BlockManager: BlockManager stopped
19/02/18 21:41:16 INFO BlockManagerMaster: BlockManagerMaster stopped
19/02/18 21:41:16 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/02/18 21:41:16 WARN BlockManager: Putting block rdd_3_14 failed due to exception java.net.SocketException: Connection reset.
19/02/18 21:41:16 WARN BlockManager: Block rdd_3_14 could not be removed as it was not found on disk or in memory
19/02/18 21:41:16 WARN BlockManager: Putting block rdd_3_3 failed due to exception java.net.SocketException: Connection reset.
19/02/18 21:41:16 WARN BlockManager: Block rdd_3_3 could not be removed as it was not found on disk or in memory
19/02/18 21:41:16 INFO SparkContext: Successfully stopped SparkContext
....
Note that my_rdd.take(1) returns:
[(u'id', u'text', array([-0.31921682, ...,0.890875]))]
How can I solve this issue?
You have to use the glom() function for that. Let's take an example.
Let's create a DataFrame first.
rdd=sc.parallelize([('a',22),('b',1),('c',4),('b',1),('d',2),('e',0),('d',3),('a',1),('c',4),('b',7),('a',2),('f',1)] )
df=rdd.toDF(['key','value'])
df=df.repartition(5,"key") # Make 5 Partitions
The number of partitions -
print("Number of partitions: {}".format(df.rdd.getNumPartitions()))
Number of partitions: 5
Number of rows/elements on each partition. This can give you an idea of skew -
print('Partitioning distribution: '+ str(df.rdd.glom().map(len).collect()))
Partitioning distribution: [3, 3, 2, 2, 2]
See how the rows are actually distributed across the partitions. Beware that if the dataset is big, your system could crash because of an out-of-memory (OOM) issue.
print("Partitions structure: {}".format(df.rdd.glom().collect()))
Partitions structure: [
#Partition 1 [Row(key='a', value=22), Row(key='a', value=1), Row(key='a', value=2)],
#Partition 2 [Row(key='b', value=1), Row(key='b', value=1), Row(key='b', value=7)],
#Partition 3 [Row(key='c', value=4), Row(key='c', value=4)],
#Partition 4 [Row(key='e', value=0), Row(key='f', value=1)],
#Partition 5 [Row(key='d', value=2), Row(key='d', value=3)]
]
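As an aside (not part of the original answer), glom().map(len) should give the same per-partition counts as the question's mapPartitions counter; a self-contained sketch reusing the example data above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-counts").getOrCreate()

rdd = spark.sparkContext.parallelize(
    [('a', 22), ('b', 1), ('c', 4), ('b', 1), ('d', 2), ('e', 0),
     ('d', 3), ('a', 1), ('c', 4), ('b', 7), ('a', 2), ('f', 1)])
df = rdd.toDF(['key', 'value']).repartition(5, "key")  # 5 partitions, hashed by key

def count_in_a_partition(iterator):
    # emit one count per partition
    yield sum(1 for _ in iterator)

# Both lines print one count per partition, e.g. [3, 3, 2, 2, 2]
print(df.rdd.mapPartitions(count_in_a_partition).collect())
print(df.rdd.glom().map(len).collect())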

Why would Spark executors be removed (with "ExecutorAllocationManager: Request to remove executorIds" in the logs)?

I'm trying to execute a Spark job on an AWS cluster of 6 c4.2xlarge nodes, and I don't know why Spark is killing the executors...
Any help will be appreciated.
Here the spark submit command:
. /usr/bin/spark-submit --packages="com.databricks:spark-avro_2.11:3.2.0" --jars RedshiftJDBC42-1.2.1.1001.jar --deploy-mode client --master yarn --num-executors 12 --executor-cores 3 --executor-memory 7G --driver-memory 7g --py-files dependencies.zip iface_extractions.py 2016-10-01 > output.log
At this line it starts to remove executors:
17/05/25 14:42:50 INFO ExecutorAllocationManager: Request to remove executorIds: 5, 3
Output spark-submit log:
Ivy Default Cache set to: /home/hadoop/.ivy2/cache
The jars for the packages stored in: /home/hadoop/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-avro_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-avro_2.11;3.2.0 in central
found org.slf4j#slf4j-api;1.7.5 in central
found org.apache.avro#avro;1.7.6 in central
found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
found com.thoughtworks.paranamer#paranamer;2.3 in central
found org.xerial.snappy#snappy-java;1.0.5 in central
found org.apache.commons#commons-compress;1.4.1 in central
found org.tukaani#xz;1.0 in central
:: resolution report :: resolve 284ms :: artifacts dl 8ms
:: modules in use:
com.databricks#spark-avro_2.11;3.2.0 from central in [default]
com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
org.apache.avro#avro;1.7.6 from central in [default]
org.apache.commons#commons-compress;1.4.1 from central in [default]
org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
org.slf4j#slf4j-api;1.7.5 from central in [default]
org.tukaani#xz;1.0 from central in [default]
org.xerial.snappy#snappy-java;1.0.5 from central in [default]
:: evicted modules:
org.slf4j#slf4j-api;1.6.4 by [org.slf4j#slf4j-api;1.7.5] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 10 | 0 | 0 | 1 || 9 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 9 already retrieved (0kB/8ms)
17/05/25 14:41:37 INFO SparkContext: Running Spark version 2.1.0
17/05/25 14:41:38 INFO SecurityManager: Changing view acls to: hadoop
17/05/25 14:41:38 INFO SecurityManager: Changing modify acls to: hadoop
17/05/25 14:41:38 INFO SecurityManager: Changing view acls groups to:
17/05/25 14:41:38 INFO SecurityManager: Changing modify acls groups to:
17/05/25 14:41:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
17/05/25 14:41:38 INFO Utils: Successfully started service 'sparkDriver' on port 37132.
17/05/25 14:41:38 INFO SparkEnv: Registering MapOutputTracker
17/05/25 14:41:38 INFO SparkEnv: Registering BlockManagerMaster
17/05/25 14:41:38 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/05/25 14:41:38 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/05/25 14:41:38 INFO DiskBlockManager: Created local directory at /mnt/tmp/blockmgr-e368a261-c1a1-49e7-8533-8081896a45e4
17/05/25 14:41:38 INFO MemoryStore: MemoryStore started with capacity 4.0 GB
17/05/25 14:41:38 INFO SparkEnv: Registering OutputCommitCoordinator
17/05/25 14:41:39 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/05/25 14:41:39 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.185.53.161:4040
17/05/25 14:41:39 INFO Utils: Using initial executors = 12, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
17/05/25 14:41:39 INFO RMProxy: Connecting to ResourceManager at ip-10-185-53-161.eu-west-1.compute.internal/10.185.53.161:8032
17/05/25 14:41:39 INFO Client: Requesting a new application from cluster with 5 NodeManagers
17/05/25 14:41:40 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container)
17/05/25 14:41:40 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/05/25 14:41:40 INFO Client: Setting up container launch context for our AM
17/05/25 14:41:40 INFO Client: Setting up the launch environment for our AM container
17/05/25 14:41:40 INFO Client: Preparing resources for our AM container
17/05/25 14:41:40 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/05/25 14:41:42 INFO Client: Uploading resource file:/mnt/tmp/spark-4f534fa1-c377-4113-9c86-96d5cdab4cb5/__spark_libs__6500399427935716229.zip -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/__spark_libs__6500399427935716229.zip
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/RedshiftJDBC42-1.2.1.1001.jar -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/RedshiftJDBC42-1.2.1.1001.jar
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/com.databricks_spark-avro_2.11-3.2.0.jar -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/com.databricks_spark-avro_2.11-3.2.0.jar
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/org.slf4j_slf4j-api-1.7.5.jar
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/org.apache.avro_avro-1.7.6.jar -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/org.apache.avro_avro-1.7.6.jar
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/org.codehaus.jackson_jackson-core-asl-1.9.13.jar
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/com.thoughtworks.paranamer_paranamer-2.3.jar
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/org.xerial.snappy_snappy-java-1.0.5.jar
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/org.apache.commons_commons-compress-1.4.1.jar
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/org.tukaani_xz-1.0.jar -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/org.tukaani_xz-1.0.jar
17/05/25 14:41:43 INFO Client: Uploading resource file:/etc/spark/conf/hive-site.xml -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/hive-site.xml
17/05/25 14:41:43 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/pyspark.zip
17/05/25 14:41:43 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.4-src.zip -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/py4j-0.10.4-src.zip
17/05/25 14:41:43 INFO Client: Uploading resource file:/home/hadoop/dependencies.zip -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/dependencies.zip
17/05/25 14:41:43 WARN Client: Same path resource file:/home/hadoop/.ivy2/jars/com.databricks_spark-avro_2.11-3.2.0.jar added multiple times to distributed cache.
17/05/25 14:41:43 WARN Client: Same path resource file:/home/hadoop/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar added multiple times to distributed cache.
17/05/25 14:41:43 WARN Client: Same path resource file:/home/hadoop/.ivy2/jars/org.apache.avro_avro-1.7.6.jar added multiple times to distributed cache.
17/05/25 14:41:43 WARN Client: Same path resource file:/home/hadoop/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar added multiple times to distributed cache.
17/05/25 14:41:43 WARN Client: Same path resource file:/home/hadoop/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar added multiple times to distributed cache.
17/05/25 14:41:43 WARN Client: Same path resource file:/home/hadoop/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar added multiple times to distributed cache.
17/05/25 14:41:43 WARN Client: Same path resource file:/home/hadoop/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar added multiple times to distributed cache.
17/05/25 14:41:43 WARN Client: Same path resource file:/home/hadoop/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar added multiple times to distributed cache.
17/05/25 14:41:43 WARN Client: Same path resource file:/home/hadoop/.ivy2/jars/org.tukaani_xz-1.0.jar added multiple times to distributed cache.
17/05/25 14:41:43 INFO Client: Uploading resource file:/mnt/tmp/spark-4f534fa1-c377-4113-9c86-96d5cdab4cb5/__spark_conf__1516567354161750682.zip -> hdfs://ip-10-185-53-161.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1495720658394_0004/__spark_conf__.zip
17/05/25 14:41:43 INFO SecurityManager: Changing view acls to: hadoop
17/05/25 14:41:43 INFO SecurityManager: Changing modify acls to: hadoop
17/05/25 14:41:43 INFO SecurityManager: Changing view acls groups to:
17/05/25 14:41:43 INFO SecurityManager: Changing modify acls groups to:
17/05/25 14:41:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
17/05/25 14:41:43 INFO Client: Submitting application application_1495720658394_0004 to ResourceManager
17/05/25 14:41:43 INFO YarnClientImpl: Submitted application application_1495720658394_0004
17/05/25 14:41:43 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1495720658394_0004 and attemptId None
17/05/25 14:41:44 INFO Client: Application report for application_1495720658394_0004 (state: ACCEPTED)
17/05/25 14:41:44 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1495723303463
final status: UNDEFINED
tracking URL: http://ip-10-185-53-161.eu-west-1.compute.internal:20888/proxy/application_1495720658394_0004/
user: hadoop
17/05/25 14:41:45 INFO Client: Application report for application_1495720658394_0004 (state: ACCEPTED)
17/05/25 14:41:46 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
17/05/25 14:41:46 INFO Client: Application report for application_1495720658394_0004 (state: ACCEPTED)
17/05/25 14:41:46 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> ip-10-185-53-161.eu-west-1.compute.internal, PROXY_URI_BASES -> http://ip-10-185-53-161.eu-west-1.compute.internal:20888/proxy/application_1495720658394_0004), /proxy/application_1495720658394_0004
17/05/25 14:41:46 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
17/05/25 14:41:47 INFO Client: Application report for application_1495720658394_0004 (state: RUNNING)
17/05/25 14:41:47 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.185.52.31
ApplicationMaster RPC port: 0
queue: default
start time: 1495723303463
final status: UNDEFINED
tracking URL: http://ip-10-185-53-161.eu-west-1.compute.internal:20888/proxy/application_1495720658394_0004/
user: hadoop
17/05/25 14:41:47 INFO YarnClientSchedulerBackend: Application application_1495720658394_0004 has started running.
17/05/25 14:41:47 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37860.
17/05/25 14:41:47 INFO NettyBlockTransferService: Server created on 10.185.53.161:37860
17/05/25 14:41:47 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/05/25 14:41:47 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.185.53.161, 37860, None)
17/05/25 14:41:47 INFO BlockManagerMasterEndpoint: Registering block manager 10.185.53.161:37860 with 4.0 GB RAM, BlockManagerId(driver, 10.185.53.161, 37860, None)
17/05/25 14:41:47 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.185.53.161, 37860, None)
17/05/25 14:41:47 INFO BlockManager: external shuffle service port = 7337
17/05/25 14:41:47 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.185.53.161, 37860, None)
17/05/25 14:41:47 INFO EventLoggingListener: Logging events to hdfs:///var/log/spark/apps/application_1495720658394_0004
17/05/25 14:41:47 INFO Utils: Using initial executors = 12, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
17/05/25 14:41:50 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.185.52.31:57406) with ID 5
17/05/25 14:41:50 INFO ExecutorAllocationManager: New executor 5 has registered (new total is 1)
17/05/25 14:41:50 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-185-52-31.eu-west-1.compute.internal:38781 with 4.0 GB RAM, BlockManagerId(5, ip-10-185-52-31.eu-west-1.compute.internal, 38781, None)
17/05/25 14:41:50 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.185.53.45:40096) with ID 3
17/05/25 14:41:50 INFO ExecutorAllocationManager: New executor 3 has registered (new total is 2)
17/05/25 14:41:50 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-185-53-45.eu-west-1.compute.internal:43702 with 4.0 GB RAM, BlockManagerId(3, ip-10-185-53-45.eu-west-1.compute.internal, 43702, None)
17/05/25 14:41:50 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.185.53.135:42390) with ID 2
17/05/25 14:41:50 INFO ExecutorAllocationManager: New executor 2 has registered (new total is 3)
17/05/25 14:41:50 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-185-53-135.eu-west-1.compute.internal:41552 with 4.0 GB RAM, BlockManagerId(2, ip-10-185-53-135.eu-west-1.compute.internal, 41552, None)
17/05/25 14:41:50 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.185.53.10:60612) with ID 1
17/05/25 14:41:50 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 4)
17/05/25 14:41:50 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-185-53-10.eu-west-1.compute.internal:33391 with 4.0 GB RAM, BlockManagerId(1, ip-10-185-53-10.eu-west-1.compute.internal, 33391, None)
17/05/25 14:41:50 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.185.53.68:57424) with ID 4
17/05/25 14:41:50 INFO ExecutorAllocationManager: New executor 4 has registered (new total is 5)
17/05/25 14:41:50 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-185-53-68.eu-west-1.compute.internal:34222 with 4.0 GB RAM, BlockManagerId(4, ip-10-185-53-68.eu-west-1.compute.internal, 34222, None)
17/05/25 14:42:09 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
17/05/25 14:42:09 INFO SharedState: Warehouse path is 'hdfs:///user/spark/warehouse'.
17/05/25 14:42:10 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
17/05/25 14:42:11 INFO CodeGenerator: Code generated in 170.416763 ms
17/05/25 14:42:11 INFO SparkContext: Starting job: collect at /home/hadoop/iface_extractions/select_fields.py:90
17/05/25 14:42:11 INFO DAGScheduler: Got job 0 (collect at /home/hadoop/iface_extractions/select_fields.py:90) with 1 output partitions
17/05/25 14:42:11 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /home/hadoop/iface_extractions/select_fields.py:90)
17/05/25 14:42:11 INFO DAGScheduler: Parents of final stage: List()
17/05/25 14:42:11 INFO DAGScheduler: Missing parents: List()
17/05/25 14:42:11 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at collect at /home/hadoop/iface_extractions/select_fields.py:90), which has no missing parents
17/05/25 14:42:11 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 7.5 KB, free 4.0 GB)
17/05/25 14:42:11 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.1 KB, free 4.0 GB)
17/05/25 14:42:11 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.185.53.161:37860 (size: 4.1 KB, free: 4.0 GB)
17/05/25 14:42:11 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996
17/05/25 14:42:11 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at collect at /home/hadoop/iface_extractions/select_fields.py:90)
17/05/25 14:42:11 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
17/05/25 14:42:11 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-10-185-53-135.eu-west-1.compute.internal, executor 2, partition 0, PROCESS_LOCAL, 5899 bytes)
17/05/25 14:42:11 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-10-185-53-135.eu-west-1.compute.internal:41552 (size: 4.1 KB, free: 4.0 GB)
17/05/25 14:42:12 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1101 ms on ip-10-185-53-135.eu-west-1.compute.internal (executor 2) (1/1)
17/05/25 14:42:12 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/05/25 14:42:12 INFO DAGScheduler: ResultStage 0 (collect at /home/hadoop/iface_extractions/select_fields.py:90) finished in 1.109 s
17/05/25 14:42:12 INFO DAGScheduler: Job 0 finished: collect at /home/hadoop/iface_extractions/select_fields.py:90, took 1.290037 s
17/05/25 14:42:12 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 10.185.53.161:37860 in memory (size: 4.1 KB, free: 4.0 GB)
17/05/25 14:42:12 INFO SparkContext: Starting job: collect at /home/hadoop/iface_extractions/select_fields.py:91
17/05/25 14:42:12 INFO BlockManagerInfo: Removed broadcast_0_piece0 on ip-10-185-53-135.eu-west-1.compute.internal:41552 in memory (size: 4.1 KB, free: 4.0 GB)
17/05/25 14:42:12 INFO DAGScheduler: Got job 1 (collect at /home/hadoop/iface_extractions/select_fields.py:91) with 1 output partitions
17/05/25 14:42:12 INFO DAGScheduler: Final stage: ResultStage 1 (collect at /home/hadoop/iface_extractions/select_fields.py:91)
17/05/25 14:42:12 INFO DAGScheduler: Parents of final stage: List()
17/05/25 14:42:12 INFO DAGScheduler: Missing parents: List()
17/05/25 14:42:12 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at collect at /home/hadoop/iface_extractions/select_fields.py:91), which has no missing parents
17/05/25 14:42:12 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 7.5 KB, free 4.0 GB)
17/05/25 14:42:12 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.1 KB, free 4.0 GB)
17/05/25 14:42:12 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.185.53.161:37860 (size: 4.1 KB, free: 4.0 GB)
17/05/25 14:42:12 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:996
17/05/25 14:42:12 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at collect at /home/hadoop/iface_extractions/select_fields.py:91)
17/05/25 14:42:12 INFO YarnScheduler: Adding task set 1.0 with 1 tasks
17/05/25 14:42:12 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, ip-10-185-53-68.eu-west-1.compute.internal, executor 4, partition 0, PROCESS_LOCAL, 5900 bytes)
17/05/25 14:42:13 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-10-185-53-68.eu-west-1.compute.internal:34222 (size: 4.1 KB, free: 4.0 GB)
17/05/25 14:42:14 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 1047 ms on ip-10-185-53-68.eu-west-1.compute.internal (executor 4) (1/1)
17/05/25 14:42:14 INFO YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/05/25 14:42:14 INFO DAGScheduler: ResultStage 1 (collect at /home/hadoop/iface_extractions/select_fields.py:91) finished in 1.047 s
17/05/25 14:42:14 INFO DAGScheduler: Job 1 finished: collect at /home/hadoop/iface_extractions/select_fields.py:91, took 1.054768 s
17/05/25 14:42:14 INFO CodeGenerator: Code generated in 13.109425 ms
17/05/25 14:42:14 INFO CodeGenerator: Code generated in 12.568665 ms
17/05/25 14:42:14 INFO CodeGenerator: Code generated in 11.257538 ms
17/05/25 14:42:14 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 10.185.53.161:37860 in memory (size: 4.1 KB, free: 4.0 GB)
17/05/25 14:42:14 INFO BlockManagerInfo: Removed broadcast_1_piece0 on ip-10-185-53-68.eu-west-1.compute.internal:34222 in memory (size: 4.1 KB, free: 4.0 GB)
17/05/25 14:42:14 INFO CodeGenerator: Code generated in 11.563958 ms
17/05/25 14:42:14 INFO CodeGenerator: Code generated in 18.189301 ms
17/05/25 14:42:14 INFO CodeGenerator: Code generated in 13.490762 ms
17/05/25 14:42:14 INFO CodeGenerator: Code generated in 15.156166 ms
17/05/25 14:42:50 INFO ExecutorAllocationManager: Request to remove executorIds: 5, 3
17/05/25 14:42:50 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 5, 3
17/05/25 14:42:50 INFO YarnClientSchedulerBackend: Actual list of executor(s) to be killed is 5, 3
17/05/25 14:42:50 INFO ExecutorAllocationManager: Removing executor 5 because it has been idle for 60 seconds (new desired total will be 4)
17/05/25 14:42:50 INFO ExecutorAllocationManager: Removing executor 3 because it has been idle for 60 seconds (new desired total will be 3)
17/05/25 14:42:50 INFO ExecutorAllocationManager: Request to remove executorIds: 1
17/05/25 14:42:50 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 1
17/05/25 14:42:50 INFO YarnClientSchedulerBackend: Actual list of executor(s) to be killed is 1
17/05/25 14:42:50 INFO ExecutorAllocationManager: Removing executor 1 because it has been idle for 60 seconds (new desired total will be 2)
17/05/25 14:42:50 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 5.
17/05/25 14:42:50 INFO DAGScheduler: Executor lost: 5 (epoch 0)
17/05/25 14:42:50 INFO BlockManagerMasterEndpoint: Trying to remove executor 5 from BlockManagerMaster.
17/05/25 14:42:50 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(5, ip-10-185-52-31.eu-west-1.compute.internal, 38781, None)
17/05/25 14:42:50 INFO BlockManagerMaster: Removed 5 successfully in removeExecutor
17/05/25 14:42:50 INFO YarnScheduler: Executor 5 on ip-10-185-52-31.eu-west-1.compute.internal killed by driver.
17/05/25 14:42:50 INFO ExecutorAllocationManager: Existing executor 5 has been removed (new total is 4)
17/05/25 14:42:51 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 1.
17/05/25 14:42:51 INFO DAGScheduler: Executor lost: 1 (epoch 0)
17/05/25 14:42:51 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
17/05/25 14:42:51 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, ip-10-185-53-10.eu-west-1.compute.internal, 33391, None)
17/05/25 14:42:51 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
17/05/25 14:42:51 INFO YarnScheduler: Executor 1 on ip-10-185-53-10.eu-west-1.compute.internal killed by driver.
17/05/25 14:42:51 INFO ExecutorAllocationManager: Existing executor 1 has been removed (new total is 3)
17/05/25 14:42:51 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 3.
17/05/25 14:42:51 INFO DAGScheduler: Executor lost: 3 (epoch 0)
17/05/25 14:42:51 INFO BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster.
17/05/25 14:42:51 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, ip-10-185-53-45.eu-west-1.compute.internal, 43702, None)
17/05/25 14:42:51 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
17/05/25 14:42:51 INFO YarnScheduler: Executor 3 on ip-10-185-53-45.eu-west-1.compute.internal killed by driver.
17/05/25 14:42:51 INFO ExecutorAllocationManager: Existing executor 3 has been removed (new total is 2)
17/05/25 14:43:12 INFO ExecutorAllocationManager: Request to remove executorIds: 2
17/05/25 14:43:12 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 2
17/05/25 14:43:12 INFO YarnClientSchedulerBackend: Actual list of executor(s) to be killed is 2
17/05/25 14:43:12 INFO ExecutorAllocationManager: Removing executor 2 because it has been idle for 60 seconds (new desired total will be 1)
17/05/25 14:43:13 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 2.
17/05/25 14:43:13 INFO DAGScheduler: Executor lost: 2 (epoch 0)
17/05/25 14:43:13 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
17/05/25 14:43:13 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, ip-10-185-53-135.eu-west-1.compute.internal, 41552, None)
17/05/25 14:43:13 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
17/05/25 14:43:13 INFO YarnScheduler: Executor 2 on ip-10-185-53-135.eu-west-1.compute.internal killed by driver.
17/05/25 14:43:13 INFO ExecutorAllocationManager: Existing executor 2 has been removed (new total is 1)
17/05/25 14:43:14 INFO ExecutorAllocationManager: Request to remove executorIds: 4
17/05/25 14:43:14 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 4
17/05/25 14:43:14 INFO YarnClientSchedulerBackend: Actual list of executor(s) to be killed is 4
17/05/25 14:43:14 INFO ExecutorAllocationManager: Removing executor 4 because it has been idle for 60 seconds (new desired total will be 0)
17/05/25 14:43:17 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 4.
17/05/25 14:43:17 INFO DAGScheduler: Executor lost: 4 (epoch 0)
17/05/25 14:43:17 INFO BlockManagerMasterEndpoint: Trying to remove executor 4 from BlockManagerMaster.
17/05/25 14:43:17 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(4, ip-10-185-53-68.eu-west-1.compute.internal, 34222, None)
17/05/25 14:43:17 INFO BlockManagerMaster: Removed 4 successfully in removeExecutor
17/05/25 14:43:17 INFO YarnScheduler: Executor 4 on ip-10-185-53-68.eu-west-1.compute.internal killed by driver.
17/05/25 14:43:17 INFO ExecutorAllocationManager: Existing executor 4 has been removed (new total is 0)
My guess is that you've got Dynamic Resource Allocation enabled in your Spark configuration.
Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand. This feature is particularly useful if multiple applications share resources in your Spark cluster.
This feature is disabled by default and available on all coarse-grained cluster managers, i.e. standalone mode, YARN mode, and Mesos coarse-grained mode.
The relevant part is that the feature is disabled by default, which is why I can only guess that it was explicitly enabled in your configuration.
From ExecutorAllocationManager:
An agent that dynamically allocates and removes executors based on the workload.
With that said, I'd open the Spark web UI (Environment tab) and check whether the spark.dynamicAllocation.enabled property is set to true.
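If you'd rather confirm it from code than from the UI, a small sketch (assuming a SparkSession named spark in your PySpark app) is to read the property from the running application's configuration:

# "true" means Dynamic Resource Allocation is on for this application;
# the second argument is the default returned when the property is unset.
print(spark.conf.get("spark.dynamicAllocation.enabled", "false"))
# The 60-second idle removals in your log match the usual idle-timeout default.
print(spark.conf.get("spark.dynamicAllocation.executorIdleTimeout", "60s"))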
There are two requirements for using this feature (Dynamic Resource Allocation). First, your application must set spark.dynamicAllocation.enabled to true. Second, you must set up an external shuffle service on each worker node in the same cluster and set spark.shuffle.service.enabled to true in your application.
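To make that concrete, a minimal sketch of how the two requirements map onto configuration when building the session from PySpark could look like this (the app name and the timeout value are just illustrations, not your actual settings):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dynamic-allocation-demo")                            # hypothetical app name
         .config("spark.dynamicAllocation.enabled", "true")             # requirement 1
         .config("spark.shuffle.service.enabled", "true")               # requirement 2: only works if the external shuffle service runs on each node
         .config("spark.dynamicAllocation.executorIdleTimeout", "60s")  # idle executors are released after this, matching the 60s removals in your logs
         .getOrCreate())

Conversely, if you don't want executors to be released while the driver sits idle between jobs, set spark.dynamicAllocation.enabled to false (or raise the idle timeout).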
This is the line that prints out the INFO message:
logInfo("Request to remove executorIds: " + executors.mkString(", "))
You can also kill executors yourself using SparkContext.killExecutors, which gives a Spark developer a way to remove executors on demand.
killExecutors(executorIds: Seq[String]): Boolean
Request that the cluster manager kill the specified executors.
There are actually two of these kill methods (killExecutor for a single executor and killExecutors for a list), and they are very handy for demos, since you can easily show how executors come and go.
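These are developer-level Scala APIs without a first-class PySpark wrapper, so a heavily hedged sketch from Python has to reach into the underlying Scala SparkContext through py4j internals (the executor id "2" is just an example taken from your logs, and _jsc is a private attribute that may differ across Spark versions):

sc = spark.sparkContext
# _jsc is the JavaSparkContext; .sc() returns the underlying Scala SparkContext,
# which exposes killExecutor(executorId: String).
scala_sc = sc._jsc.sc()
scala_sc.killExecutor("2")   # asks the cluster manager to kill executor 2; returns a boolean

The plural killExecutors variant expects a Scala Seq, which is awkward to build from Python, so the single-executor form is the easier one for a quick demo.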

Resources