Error when running apache spark-kafka-hbase jar from cmd - apache-spark

First of all, I'm new to this tech stack, so if I've left out any details, please let me know.
Here's my problem: I'm trying to build a jar of an Apache Spark - Kafka app. To package the app into a jar I use the sbt-assembly plugin:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
The jar is packaged successfully.
Now if I try to run it with:
spark-submit kafka-consumer.jar
the app boots up successfully.
I want to do the same with the java -jar command, but unfortunately it fails.
Here's what the stack trace looks like:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/03/29 11:16:23 INFO SparkContext: Running Spark version 2.4.4
20/03/29 11:16:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/29 11:16:23 INFO SparkContext: Submitted application: KafkaConsumer
20/03/29 11:16:23 INFO SecurityManager: Changing view acls to: popar
20/03/29 11:16:23 INFO SecurityManager: Changing modify acls to: popar
20/03/29 11:16:23 INFO SecurityManager: Changing view acls groups to:
20/03/29 11:16:23 INFO SecurityManager: Changing modify acls groups to:
20/03/29 11:16:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(popar); groups with view permissions: Set(); users with modify permissions: Set(popar); groups with modify permissions: Set()
20/03/29 11:16:25 INFO Utils: Successfully started service 'sparkDriver' on port 55595.
20/03/29 11:16:25 INFO SparkEnv: Registering MapOutputTracker
20/03/29 11:16:25 INFO SparkEnv: Registering BlockManagerMaster
20/03/29 11:16:25 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/29 11:16:25 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/29 11:16:25 INFO DiskBlockManager: Created local directory at C:\Users\popar\AppData\Local\Temp\blockmgr-77af3fbc-264e-451c-9df3-5b7dda58f3a8
20/03/29 11:16:25 INFO MemoryStore: MemoryStore started with capacity 898.5 MB
20/03/29 11:16:26 INFO SparkEnv: Registering OutputCommitCoordinator
20/03/29 11:16:26 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/03/29 11:16:26 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://DESKTOP-0IISN4F.mshome.net:4040
20/03/29 11:16:26 INFO Executor: Starting executor ID driver on host localhost
20/03/29 11:16:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55636.
20/03/29 11:16:26 INFO NettyBlockTransferService: Server created on DESKTOP-0IISN4F.mshome.net:55636
20/03/29 11:16:26 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/29 11:16:26 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, DESKTOP-0IISN4F.mshome.net, 55636, None)
20/03/29 11:16:26 INFO BlockManagerMasterEndpoint: Registering block manager DESKTOP-0IISN4F.mshome.net:55636 with 898.5 MB RAM, BlockManagerId(driver, DESKTOP-0IISN4F.mshome.net, 55636, None)
20/03/29 11:16:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, DESKTOP-0IISN4F.mshome.net, 55636, None)
20/03/29 11:16:26 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, DESKTOP-0IISN4F.mshome.net, 55636, None)
Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:181)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.spark.streaming.StreamingContext.checkpoint(StreamingContext.scala:239)
at KafkaConsumer$.main(KafkaConsumer.scala:85)
at KafkaConsumer.main(KafkaConsumer.scala)
As you can see it fails with: Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
Here is the definition of my main class:
import Service._
import kafka.serializer.StringDecoder
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Put, Scan, Table}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.log4j.{Level, Logger}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.JavaConversions
import scala.util.Try
object KafkaConsumer {
def setupLogging(): Unit = {
val rootLogger = Logger.getRootLogger
rootLogger.setLevel(Level.ERROR)
}
def persistToHbase[A](rdd: A): Unit = {
val connection = getHbaseConnection
val admin = connection.getAdmin
val columnFamily1 = "personal_data"
val table = connection.getTable(TableName.valueOf("employees"))
val scan = new Scan()
scan.addFamily(columnFamily1.getBytes())
val totalRows: Int = getLastRowNumber(scan, columnFamily1, table)
persistRdd(rdd, table, columnFamily1, totalRows + 1)
Try(table.close())
Try(admin.close())
Try(connection.close())
}
private def getLastRowNumber[A](scan: Scan,
columnFamily: String,
table: Table): Int = {
val scanner = table.getScanner(scan)
val values = scanner.iterator()
val seq = JavaConversions.asScalaIterator(values).toIndexedSeq
seq.size
}
def persistRdd[A](rdd: A,
table: Table,
columnFamily: String,
rowNumber: Int): Unit = {
val row = Bytes.toBytes(String.valueOf(rowNumber))
val put = new Put(row)
val qualifier = "test_column"
put.addColumn(columnFamily.getBytes(),
qualifier.getBytes(),
String.valueOf(rdd).getBytes())
table.put(put)
}
def main(args: Array[String]): Unit = {
// create the context with a one second batch of data & uses all the CPU cores
val context = new StreamingContext("local[*]", "KafkaConsumer", Seconds(1))
// hostname:port for Kafka brokers
val kafkaParams = Map("metadata.broker.list" -> "192.168.56.22:9092")
// list of topics you want to listen from Kafka
val topics = List("employees").toSet
setupLogging()
// create a Kafka Stream, which will contain(topic, message) pairs
// we take a map(_._2) at the end in order to only get the messages which contain individual lines of data
val stream: DStream[String] = KafkaUtils
.createDirectStream[String, String, StringDecoder, StringDecoder](
context,
kafkaParams,
topics)
.map(_._2)
// debug print
stream.print()
// stream.foreachRDD(rdd => rdd.foreach(persistToHbase(_)))
context.checkpoint("C:/checkpoint/")
context.start()
context.awaitTermination()
}
}
and the build.sbt looks like this:
import sbt._
import Keys._
name := "kafka-consumer"
version := "0.1"
scalaVersion := "2.11.8"
lazy val sparkVersion = "2.4.4"
lazy val sparkStreamingKafkaVersion = "1.6.3"
lazy val hbaseVersion = "2.2.1"
lazy val hadoopVersion = "2.8.0"
lazy val hadoopCoreVersion = "1.2.1"
resolvers in Global ++= Seq(
"Sbt plugins" at "https://dl.bintray.com/sbt/sbt-plugin-releases"
)
lazy val commonSettings = Seq(
version := "0.1",
organization := "com.rares",
scalaVersion := "2.11.8",
test in assembly := {}
)
lazy val excludeJPountz =
ExclusionRule(organization = "net.jpountz.lz4", name = "lz4")
lazy val excludeHadoop =
ExclusionRule(organization = "org.apache.hadoop")
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % sparkVersion excludeAll (excludeJPountz, excludeHadoop),
"org.apache.spark" % "spark-streaming-kafka_2.11" % sparkStreamingKafkaVersion,
"org.apache.spark" % "spark-streaming_2.11" % sparkVersion excludeAll (excludeJPountz),
"org.apache.hadoop" % "hadoop-client" % hadoopVersion,
"org.apache.hbase" % "hbase-server" % hbaseVersion,
"org.apache.hbase" % "hbase-client" % hbaseVersion,
"org.apache.hbase" % "hbase-common" % hbaseVersion
)
//Fat jar
assemblyMergeStrategy in assembly := {
case PathList("org", "aopalliance", xs @ _*) => MergeStrategy.last
case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
case PathList("net", "jpountz", xs @ _*) => MergeStrategy.last
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case PathList("jetty-dir.css", xs @ _*) => MergeStrategy.last
case PathList("org", "apache", xs @ _*) => MergeStrategy.last
case PathList("com", "sun", xs @ _*) => MergeStrategy.last
case PathList("hdfs-default.xml", xs @ _*) => MergeStrategy.last
case PathList("javax", xs @ _*) => MergeStrategy.last
case PathList("mapred-default.xml", xs @ _*) => MergeStrategy.last
case PathList("core-default.xml", xs @ _*) => MergeStrategy.last
case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
// case "git.properties" => MergeStrategy.last
// case PathList("org", "apache", "jasper", xs @ _*) => MergeStrategy.first
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
assemblyJarName in assembly := "kafka-consumer.jar"
Any advice will be deeply appreciated!!!

OK, here is what helped me. Set the Hadoop configuration on the Spark context as follows (before the context.checkpoint call, which is where the exception is thrown):
val hadoopConfiguration = context.sparkContext.hadoopConfiguration
hadoopConfiguration.set(
"fs.hdfs.impl",
classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hadoopConfiguration.set(
"fs.file.impl",
classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
Works like a charm!
Big thanks to the question "hadoop No FileSystem for scheme: file".
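A likely explanation for why this is needed, assuming the jar was built with the merge strategy from the question: Hadoop 2.x discovers FileSystem implementations through the service file META-INF/services/org.apache.hadoop.fs.FileSystem, and the rule that discards everything under META-INF removes that file from the fat jar. With spark-submit, the Hadoop jars shipped with Spark are on the classpath, so the registrations are still found. An alternative to setting fs.file.impl by hand might be to keep the service files when assembling. The fragment below is an untested sketch of that idea for build.sbt; only the first case is new, and it has to come before the META-INF discard rule.

```scala
assemblyMergeStrategy in assembly := {
  // Assumption: keep ServiceLoader registrations and concatenate the entries from all jars
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.concat
  // Everything else under META-INF is still discarded, as in the original build
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // (the remaining cases from the original build.sbt go here, unchanged)
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

Setting fs.file.impl and fs.hdfs.impl explicitly, as in the answer, works regardless of how the jar was merged, which may be preferable if the merge rules cannot be changed.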

Related

What could be the reasons for JsonMappingException?

I get an exception while the Spark (3.0.1) application is running:
com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0
I have already seen questions with a similar Jackson error, but they differ from my case: the Jackson dependencies already match and should be compatible:
[Driver] INFO ru.kontur.srs.infra.launching.daemon.DaemonLauncher$ - /hadoop/yarn/local/usercache/SrsApp/filecache/4368/__spark_libs__4872112349984824286.zip/jackson-annotations-2.10.0.jar
[Driver] INFO ru.kontur.srs.infra.launching.daemon.DaemonLauncher$ - /hadoop/yarn/local/usercache/SrsApp/filecache/4368/__spark_libs__4872112349984824286.zip/jackson-core-2.10.0.jar
[Driver] INFO ru.kontur.srs.infra.launching.daemon.DaemonLauncher$ - /hadoop/yarn/local/usercache/SrsApp/filecache/4368/__spark_libs__4872112349984824286.zip/jackson-databind-2.10.0.jar
[Driver] INFO ru.kontur.srs.infra.launching.daemon.DaemonLauncher$ - /hadoop/yarn/local/usercache/SrsApp/filecache/4368/__spark_libs__4872112349984824286.zip/jackson-dataformat-yaml-2.10.0.jar
[Driver] INFO ru.kontur.srs.infra.launching.daemon.DaemonLauncher$ - /hadoop/yarn/local/usercache/SrsApp/filecache/4368/__spark_libs__4872112349984824286.zip/jackson-datatype-jsr310-2.10.3.jar
[Driver] INFO ru.kontur.srs.infra.launching.daemon.DaemonLauncher$ - /hadoop/yarn/local/usercache/SrsApp/filecache/4368/__spark_libs__4872112349984824286.zip/jackson-module-jaxb-annotations-2.10.0.jar
[Driver] INFO ru.kontur.srs.infra.launching.daemon.DaemonLauncher$ - /hadoop/yarn/local/usercache/SrsApp/filecache/4368/__spark_libs__4872112349984824286.zip/jackson-module-paranamer-2.10.0.jar
[Driver] INFO ru.kontur.srs.infra.launching.daemon.DaemonLauncher$ - /hadoop/yarn/local/usercache/SrsApp/filecache/4368/__spark_libs__4872112349984824286.zip/jackson-module-scala_2.12-2.10.0.jar
I even double-checked it this way:
def main(args: Array[String]): Unit = {
// ...
logger.info("JACKSON VERSION: " + new ObjectMapper().version())
// ...
}
> [Driver] INFO ru.kontur.khajiit.reports.builders.raw_products.ReportsRawProductsReportBuilderDaemon$ - JACKSON VERSION: 2.10.0
What could be the reasons that 2.10 does not pass the >= 2.10.0 and < 2.11.0 check and how can this problem be solved?
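One way to narrow this down is to log where the relevant classes are actually loaded from, since a second copy of a library can be visible to a different class loader than the one that was checked. The sketch below is a generic diagnostic that uses only the JDK; the object name ClassOrigin and its methods are hypothetical helpers, not part of Jackson or Spark.

```scala
object ClassOrigin {
  /** Location (jar or directory) a class was loaded from, if the JVM exposes it. */
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString

  /** One-line description suitable for a log statement. */
  def describe(cls: Class[_]): String =
    s"${cls.getName} <- ${locationOf(cls).getOrElse("(bootstrap or unknown)")}"
}
```

Calling ClassOrigin.describe(classOf[com.fasterxml.jackson.databind.ObjectMapper]) next to the existing version log line, on both the driver and an executor, would show whether the databind classes come from the expected jackson-databind-2.10.0.jar or from some other jar.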

Cannot write to Druid through SparkStreaming and Tranquility

I am trying to write the results of a Spark Streaming job to a Druid datasource. Spark completes its jobs successfully and hands the data to Druid. Druid starts indexing but does not write anything.
My code and logs are as follows:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import scala.util.parsing.json._
import com.metamx.tranquility.spark.BeamRDD._
import org.joda.time.{DateTime, DateTimeZone}
object MyDirectStreamDriver {
def main(args:Array[String]) {
val sc = new SparkContext()
val ssc = new StreamingContext(sc, Minutes(5))
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> "[$hadoopURL]:6667",
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
"group.id" -> "use_a_separate_group_id_for_each_stream",
"auto.offset.reset" -> "latest",
"enable.auto.commit" -> (false: java.lang.Boolean)
)
val eventStream = KafkaUtils.createDirectStream[String, String](
ssc,
PreferConsistent,
Subscribe[String, String](Array("events_test"), kafkaParams))
val t = eventStream.map(record => record.value).flatMap(_.split("(?<=\\}),(?=\\{)")).
map(JSON.parseRaw(_).getOrElse(new JSONObject(Map(""-> ""))).asInstanceOf[JSONObject]).
map(x => (new DateTime(), x.obj.getOrElse("OID", "").asInstanceOf[String], x.obj.getOrElse("STATUS", "").asInstanceOf[Double].toInt)).
map(x => MyEvent(x._1, x._2, x._3))
t.saveAsTextFiles("/user/username/result", "txt")
t.foreachRDD(rdd => rdd.propagate(new MyEventBeamFactory))
ssc.start
ssc.awaitTermination
}
}
case class MyEvent (time: DateTime,oid: String, status: Int)
{
@JsonValue
def toMap: Map[String, Any] = Map(
"timestamp" -> (time.getMillis / 1000),
"oid" -> oid,
"status" -> status
)
}
object MyEvent {
implicit val MyEventTimestamper = new Timestamper[MyEvent] {
def timestamp(a: MyEvent) = a.time
}
val Columns = Seq("time", "oid", "status")
def fromMap(d: Dict): MyEvent = {
MyEvent(
new DateTime(long(d("timestamp")) * 1000),
str(d("oid")),
int(d("status"))
)
}
}
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.BoundedExponentialBackoffRetry
import io.druid.granularity._
import io.druid.query.aggregation.LongSumAggregatorFactory
import com.metamx.common.Granularity
import org.joda.time.Period
class MyEventBeamFactory extends BeamFactory[MyEvent]
{
// Return a singleton, so the same connection is shared across all tasks in the same JVM.
def makeBeam: Beam[MyEvent] = MyEventBeamFactory.BeamInstance
object MyEventBeamFactory {
val BeamInstance: Beam[MyEvent] = {
// Tranquility uses ZooKeeper (through Curator framework) for coordination.
val curator = CuratorFrameworkFactory.newClient(
"{IP_2}:2181",
new BoundedExponentialBackoffRetry(100, 3000, 5)
)
curator.start()
val indexService = DruidEnvironment("druid/overlord") // Your overlord's druid.service, with slashes replaced by colons.
val discoveryPath = "/druid/discovery" // Your overlord's druid.discovery.curator.path
val dataSource = "events_druid"
val dimensions = IndexedSeq("oid")
val aggregators = Seq(new LongSumAggregatorFactory("status", "status"))
// Expects simpleEvent.timestamp to return a Joda DateTime object.
DruidBeams
.builder((event: MyEvent) => event.time)
.curator(curator)
.discoveryPath(discoveryPath)
.location(DruidLocation(indexService, dataSource))
.rollup(DruidRollup(SpecificDruidDimensions(dimensions), aggregators, QueryGranularities.MINUTE))
.tuning(
ClusteredBeamTuning(
segmentGranularity = Granularity.HOUR,
windowPeriod = new Period("PT10M"),
partitions = 1,
replicants = 1
)
)
.buildBeam()
}
}
}
This is the druid indexing task log: (index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0)
2017-12-28T13:05:19,299 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Running with task: {
"type" : "index_realtime",
"id" : "index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0",
"resource" : {
"availabilityGroup" : "events_druid-2017-12-28T13:00:00.000Z-0000",
"requiredCapacity" : 1
},
"spec" : {
"dataSchema" : {
"dataSource" : "events_druid",
"parser" : {
"type" : "map",
"parseSpec" : {
"format" : "json",
"timestampSpec" : {
"column" : "timestamp",
"format" : "iso",
"missingValue" : null
},
"dimensionsSpec" : {
"dimensions" : [ "oid" ],
"spatialDimensions" : [ ]
}
}
},
"metricsSpec" : [ {
"type" : "longSum",
"name" : "status",
"fieldName" : "status",
"expression" : null
} ],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "HOUR",
"queryGranularity" : {
"type" : "duration",
"duration" : 60000,
"origin" : "1970-01-01T00:00:00.000Z"
},
"rollup" : true,
"intervals" : null
}
},
"ioConfig" : {
"type" : "realtime",
"firehose" : {
"type" : "clipped",
"delegate" : {
"type" : "timed",
"delegate" : {
"type" : "receiver",
"serviceName" : "firehose:druid:overlord:events_druid-013-0000-0000",
"bufferSize" : 100000
},
"shutoffTime" : "2017-12-28T14:15:00.000Z"
},
"interval" : "2017-12-28T13:00:00.000Z/2017-12-28T14:00:00.000Z"
},
"firehoseV2" : null
},
"tuningConfig" : {
"type" : "realtime",
"maxRowsInMemory" : 75000,
"intermediatePersistPeriod" : "PT10M",
"windowPeriod" : "PT10M",
"basePersistDirectory" : "/tmp/1514466313873-0",
"versioningPolicy" : {
"type" : "intervalStart"
},
"rejectionPolicy" : {
"type" : "none"
},
"maxPendingPersists" : 0,
"shardSpec" : {
"type" : "linear",
"partitionNum" : 0
},
"indexSpec" : {
"bitmap" : {
"type" : "concise"
},
"dimensionCompression" : "lz4",
"metricCompression" : "lz4",
"longEncoding" : "longs"
},
"buildV9Directly" : true,
"persistThreadPriority" : 0,
"mergeThreadPriority" : 0,
"reportParseExceptions" : false,
"handoffConditionTimeout" : 0,
"alertTimeout" : 0
}
},
"context" : null,
"groupId" : "index_realtime_events_druid",
"dataSource" : "events_druid"
}
2017-12-28T13:05:19,312 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Attempting to lock file[/apps/druid/tasks/index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0/lock].
2017-12-28T13:05:19,313 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Acquired lock file[/apps/druid/tasks/index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0/lock] in 1ms.
2017-12-28T13:05:19,317 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Running task: index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0
2017-12-28T13:05:19,323 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0] location changed to [TaskLocation{host='hadooptest9.{host}', port=8100}].
2017-12-28T13:05:19,323 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0] status changed to [RUNNING].
2017-12-28T13:05:19,327 INFO [main] org.eclipse.jetty.server.Server - jetty-9.3.19.v20170502
2017-12-28T13:05:19,350 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Creating plumber using rejectionPolicy[io.druid.segment.realtime.plumber.NoopRejectionPolicyFactory$1@7925d517]
2017-12-28T13:05:19,351 INFO [task-runner-0-priority-0] io.druid.server.coordination.CuratorDataSegmentServerAnnouncer - Announcing self[DruidServerMetadata{name='hadooptest9.{host}:8100', host='hadooptest9.{host}:8100', maxSize=0, tier='_default_tier', type='realtime', priority='0'}] at [/druid/announcements/hadooptest9.{host}:8100]
2017-12-28T13:05:19,382 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Expect to run at [2017-12-28T14:10:00.000Z]
2017-12-28T13:05:19,392 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2017-12-28T13:05:19,392 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] segments. Attempting to hand off segments that start before [1970-01-01T00:00:00.000Z].
2017-12-28T13:05:19,392 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge
2017-12-28T13:05:19,451 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.EventReceiverFirehoseFactory - Connecting firehose: firehose:druid:overlord:events_druid-013-0000-0000
2017-12-28T13:05:19,453 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.EventReceiverFirehoseFactory - Found chathandler of class[io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider]
2017-12-28T13:05:19,453 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Registering Eventhandler[firehose:druid:overlord:events_druid-013-0000-0000]
2017-12-28T13:05:19,454 INFO [task-runner-0-priority-0] io.druid.curator.discovery.CuratorServiceAnnouncer - Announcing service[DruidNode{serviceName='firehose:druid:overlord:events_druid-013-0000-0000', host='hadooptest9.{host}', port=8100}]
2017-12-28T13:05:19,502 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider as a provider class
2017-12-28T13:05:19,502 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider as a provider class
2017-12-28T13:05:19,502 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering io.druid.server.initialization.jetty.CustomExceptionMapper as a provider class
2017-12-28T13:05:19,502 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering io.druid.server.StatusResource as a root resource class
2017-12-28T13:05:19,505 INFO [main] com.sun.jersey.server.impl.application.WebApplicationImpl - Initiating Jersey application, version 'Jersey: 1.19.3 10/24/2016 03:43 PM'
2017-12-28T13:05:19,515 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Registering Eventhandler[events_druid-013-0000-0000]
2017-12-28T13:05:19,515 INFO [task-runner-0-priority-0] io.druid.curator.discovery.CuratorServiceAnnouncer - Announcing service[DruidNode{serviceName='events_druid-013-0000-0000', host='hadooptest9.{host}', port=8100}]
2017-12-28T13:05:19,529 WARN [task-runner-0-priority-0] org.apache.curator.utils.ZKPaths - The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.
2017-12-28T13:05:19,535 INFO [task-runner-0-priority-0] io.druid.server.metrics.EventReceiverFirehoseRegister - Registering EventReceiverFirehoseMetric for service [firehose:druid:overlord:events_druid-013-0000-0000]
2017-12-28T13:05:19,536 INFO [task-runner-0-priority-0] io.druid.data.input.FirehoseFactory - Firehose created, will shut down at: 2017-12-28T14:15:00.000Z
2017-12-28T13:05:19,574 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.initialization.jetty.CustomExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2017-12-28T13:05:19,576 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider to GuiceManagedComponentProvider with the scope "Singleton"
2017-12-28T13:05:19,583 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider to GuiceManagedComponentProvider with the scope "Singleton"
2017-12-28T13:05:19,845 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.http.security.StateResourceFilter to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,863 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.http.SegmentListerResource to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,874 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.QueryResource to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,876 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.segment.realtime.firehose.ChatHandlerResource to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,880 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.query.lookup.LookupListeningResource to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,882 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.query.lookup.LookupIntrospectionResource to GuiceInstantiatedComponentProvider
2017-12-28T13:05:19,883 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.StatusResource to GuiceManagedComponentProvider with the scope "Undefined"
2017-12-28T13:05:19,896 WARN [main] com.sun.jersey.spi.inject.Errors - The following warnings have been detected with resource and/or provider classes:
WARNING: A HTTP GET method, public void io.druid.server.http.SegmentListerResource.getSegments(long,long,long,javax.servlet.http.HttpServletRequest) throws java.io.IOException, MUST return a non-void type.
2017-12-28T13:05:19,905 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@2fba0dac{/,null,AVAILABLE}
2017-12-28T13:05:19,914 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@25218a4d{HTTP/1.1,[http/1.1]}{0.0.0.0:8100}
2017-12-28T13:05:19,914 INFO [main] org.eclipse.jetty.server.Server - Started @6014ms
2017-12-28T13:05:19,915 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.server.listener.announcer.ListenerResourceAnnouncer.start()] on object[io.druid.query.lookup.LookupResourceListenerAnnouncer@426710f0].
2017-12-28T13:05:19,919 INFO [main] io.druid.server.listener.announcer.ListenerResourceAnnouncer - Announcing start time on [/druid/listeners/lookups/__default/hadooptest9.{host}:8100]
2017-12-28T13:05:20,517 WARN [task-runner-0-priority-0] io.druid.segment.realtime.firehose.PredicateFirehose - [0] InputRow(s) ignored as they do not satisfy the predicate
This is index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0 payload:
{
"task":"index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0","payload":{
"id":"index_realtime_events_druid_2017-12-28T13:00:00.000Z_0_0","resource":{
"availabilityGroup":"events_druid-2017-12-28T13:00:00.000Z-0000","requiredCapacity":1},"spec":{
"dataSchema":{
"dataSource":"events_druid","parser":{
"type":"map","parseSpec":{
"format":"json","timestampSpec":{
"column":"timestamp","format":"iso","missingValue":null},"dimensionsSpec":{
"dimensions":["oid"],"spatialDimensions":[]}}},"metricsSpec":[{
"type":"longSum","name":"status","fieldName":"status","expression":null}],"granularitySpec":{
"type":"uniform","segmentGranularity":"HOUR","queryGranularity":{
"type":"duration","duration":60000,"origin":"1970-01-01T00:00:00.000Z"},"rollup":true,"intervals":null}},"ioConfig":{
"type":"realtime","firehose":{
"type":"clipped","delegate":{
"type":"timed","delegate":{
"type":"receiver","serviceName":"firehose:druid:overlord:events_druid-013-0000-0000","bufferSize":100000},"shutoffTime":"2017-12-28T14:15:00.000Z"},"interval":"2017-12-28T13:00:00.000Z/2017-12-28T14:00:00.000Z"},"firehoseV2":null},"tuningConfig":{
"type":"realtime","maxRowsInMemory":75000,"intermediatePersistPeriod":"PT10M","windowPeriod":"PT10M","basePersistDirectory":"/tmp/1514466313873-0","versioningPolicy":{
"type":"intervalStart"},"rejectionPolicy":{
"type":"none"},"maxPendingPersists":0,"shardSpec":{
"type":"linear","partitionNum":0},"indexSpec":{
"bitmap":{
"type":"concise"},"dimensionCompression":"lz4","metricCompression":"lz4","longEncoding":"longs"},"buildV9Directly":true,"persistThreadPriority":0,"mergeThreadPriority":0,"reportParseExceptions":false,"handoffConditionTimeout":0,"alertTimeout":0}},"context":null,"groupId":"index_realtime_events_druid","dataSource":"events_druid"}}
This is the end of the Spark job's stderr:
50:09 INFO ZooKeeper: Client environment:os.version=3.10.0-514.10.2.el7.x86_64
17/12/28 14:50:09 INFO ZooKeeper: Client environment:user.name=yarn
17/12/28 14:50:09 INFO ZooKeeper: Client environment:user.home=/home/yarn
17/12/28 14:50:09 INFO ZooKeeper: Client environment:user.dir=/data1/hadoop/yarn/local/usercache/hdfs/appcache/application_1512485869804_6924/container_e58_1512485869804_6924_01_000002
17/12/28 14:50:09 INFO ZooKeeper: Initiating client connection, connectString={IP2}:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@5967905
17/12/28 14:50:09 INFO ClientCnxn: Opening socket connection to server {IP2}/{IP2}:2181. Will not attempt to authenticate using SASL (unknown error)
17/12/28 14:50:09 INFO ClientCnxn: Socket connection established, initiating session, client: /{IP6}:42704, server: {IP2}/{IP2}:2181
17/12/28 14:50:09 INFO ClientCnxn: Session establishment complete on server {IP2}/{IP2}:2181, sessionid = 0x25fa4ea15980119, negotiated timeout = 40000
17/12/28 14:50:10 INFO ConnectionStateManager: State change: CONNECTED
17/12/28 14:50:10 INFO Version: HV000001: Hibernate Validator 5.1.3.Final
17/12/28 14:50:10 INFO JsonConfigurator: Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, directory='extensions', hadoopDependenciesDir='hadoop-dependencies', hadoopContainerDruidClasspath='null', loadList=null}]
17/12/28 14:50:10 INFO LoggingEmitter: Start: started [true]
17/12/28 14:50:11 INFO FinagleRegistry: Adding resolver for scheme[disco].
17/12/28 14:50:11 INFO CachedKafkaConsumer: Initial fetch for spark-executor-use_a_separate_group_id_for_each_stream events_test 0 6658
17/12/28 14:50:12 INFO ClusteredBeam: Global latestCloseTime[2017-12-28T12:00:00.000Z] for identifier[druid:overlord/events_druid] has moved past timestamp[2017-12-28T12:00:00.000Z], not creating merged beam
17/12/28 14:50:12 INFO ClusteredBeam: Turns out we decided not to actually make beams for identifier[druid:overlord/events_druid] timestamp[2017-12-28T12:00:00.000Z]. Returning None.
17/12/28 14:50:12 WARN MapPartitioner: Cannot partition object of class[class MyEvent] by time and dimensions. Consider implementing a Partitioner.
17/12/28 14:50:12 INFO ClusteredBeam: Global latestCloseTime[2017-12-28T12:00:00.000Z] for identifier[druid:overlord/events_druid] has moved past timestamp[2017-12-28T12:00:00.000Z], not creating merged beam
17/12/28 14:50:12 INFO ClusteredBeam: Turns out we decided not to actually make beams for identifier[druid:overlord/events_druid] timestamp[2017-12-28T12:00:00.000Z]. Returning None.
17/12/28 14:50:12 INFO ClusteredBeam: Global latestCloseTime[2017-12-28T12:00:00.000Z] for identifier[druid:overlord/events_druid] has moved past timestamp[2017-12-28T12:00:00.000Z], not creating merged beam
17/12/28 14:50:12 INFO ClusteredBeam: Turns out we decided not to actually make beams for identifier[druid:overlord/events_druid] timestamp[2017-12-28T12:00:00.000Z]. Returning None.
17/12/28 14:50:12 INFO ClusteredBeam: Global latestCloseTime[2017-12-28T12:00:00.000Z] for identifier[druid:overlord/events_druid] has moved past timestamp[2017-12-28T12:00:00.000Z], not creating merged beam
17/12/28 14:50:12 INFO ClusteredBeam: Turns out we decided not to actually make beams for identifier[druid:overlord/events_druid] timestamp[2017-12-28T12:00:00.000Z]. Returning None.
17/12/28 14:50:12 INFO ClusteredBeam: Global latestCloseTime[2017-12-28T12:00:00.000Z] for identifier[druid:overlord/events_druid] has moved past timestamp[2017-12-28T12:00:00.000Z], not creating merged beam
17/12/28 14:50:12 INFO ClusteredBeam: Turns out we decided not to actually make beams for identifier[druid:overlord/events_druid] timestamp[2017-12-28T12:00:00.000Z]. Returning None.
17/12/28 14:50:16 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1541 bytes result sent to driver
I have also written the results to a text file to confirm that data is arriving and is correctly formatted. Here are a few lines from that file:
MyEvent(2017-12-28T16:10:00.387+03:00,0010,1)
MyEvent(2017-12-28T16:10:00.406+03:00,0030,1)
MyEvent(2017-12-28T16:10:00.417+03:00,0010,1)
MyEvent(2017-12-28T16:10:00.431+03:00,0010,1)
MyEvent(2017-12-28T16:10:00.448+03:00,0010,1)
MyEvent(2017-12-28T16:10:00.464+03:00,0030,1)
Help is much appreciated. Thanks.
This problem was solved by adding timestampSpec to DruidBeams as such:
DruidBeams
.builder((event: MyEvent) => event.time)
.curator(curator)
.discoveryPath(discoveryPath)
.location(DruidLocation(indexService, dataSource))
.rollup(DruidRollup(SpecificDruidDimensions(dimensions), aggregators, QueryGranularities.MINUTE))
.tuning(
ClusteredBeamTuning(
segmentGranularity = Granularity.HOUR,
windowPeriod = new Period("PT10M"),
partitions = 1,
replicants = 1
)
)
.timestampSpec(new TimestampSpec("timestamp", "posix", null))
.buildBeam()
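For reference, Druid's "posix" timestamp format means seconds since the Unix epoch. Below is a minimal sketch of what that value looks like for one of the sample events above, using only java.time. The helper name toPosixSeconds is made up for illustration, and the post does not show how MyEvent is serialized, so whether its output already carries a "timestamp" field in this unit is an assumption.

```scala
import java.time.OffsetDateTime

object PosixTimestamp {
  // Hypothetical helper: parse an ISO-8601 timestamp with a UTC offset
  // and return whole seconds since the Unix epoch.
  def toPosixSeconds(iso: String): Long =
    OffsetDateTime.parse(iso).toEpochSecond

  def main(args: Array[String]): Unit = {
    // Timestamp copied from the first sample line of the text file above.
    println(toPosixSeconds("2017-12-28T16:10:00.387+03:00"))
  }
}
```

If the serialized events carry ISO strings or epoch milliseconds instead, the format argument of TimestampSpec would presumably need to match that rather than "posix".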

Derby Metastore directory is created in spark workspace

I have Spark 2.1.0 installed and integrated with Eclipse, and Hive 2 installed with its metastore configured in MySQL. I also placed the hive-site.xml file in Spark's conf folder. I'm trying to access tables that already exist in Hive from Eclipse.
When I execute the program, a metastore folder and a derby.log file are created in the Spark workspace, and the Eclipse console shows the INFO output below:
Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/06/13 18:26:43 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/06/13 18:26:43 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/06/13 18:26:43 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/06/13 18:26:43 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/06/13 18:26:43 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery#0" since the connection used is closing
17/06/13 18:26:43 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
Spark is not able to locate the configured MySQL metastore database.
It also throws this error:
Exception in thread "main" java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
Code:
import org.apache.spark.SparkContext, org.apache.spark.SparkConf
import com.typesafe.config._
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
object hivecore {
def main(args: Array[String]) {
val warehouseLocation = "hdfs://HADOOPMASTER:54310/user/hive/warehouse"
val spark = SparkSession
.builder().master("local[*]")
.appName("hivecore")
.config("spark.sql.warehouse.dir", warehouseLocation)
.enableHiveSupport()
.getOrCreate()
import spark.implicits._
import spark.sql
sql("SELECT * FROM sample.source").show()
}
}
Build.sbt
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"
libraryDependencies += "com.typesafe" % "config" % "1.3.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
libraryDependencies += "org.apache.spark" % "spark-hive_2.11" % "2.1.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.42"
NOTE: I am able to access the Hive tables from spark-shell.
Thanks
When you call context.setMaster(local), Spark may not pick up the configuration you set up for the cluster, especially when you launch the program from Eclipse.
Build a jar and launch it from the command line with spark-submit --class <main class package> --master spark://207.184.161.138:7077 --deploy-mode client
Replace the master address spark://207.184.161.138:7077 with your cluster's IP and Spark port.
Also remember to initialize a HiveContext in order to run queries against the underlying Hive tables.
val hc = new HiveContext(sc)
hc.sql("SELECT * FROM ...")
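Since a local metastore folder and derby.log appear when the program runs from Eclipse, one thing worth checking is whether hive-site.xml is actually visible on the application's classpath in that environment. Below is a minimal diagnostic sketch that needs nothing beyond the JDK and the Scala standard library. The object and method names are made up for illustration.

```scala
object HiveSiteCheck {
  // Returns the URL of the named resource if the current classloader can see it,
  // or None if it is not on the classpath.
  def locate(resource: String): Option[java.net.URL] =
    Option(Thread.currentThread().getContextClassLoader.getResource(resource))

  def main(args: Array[String]): Unit =
    locate("hive-site.xml") match {
      case Some(url) => println(s"hive-site.xml found at $url")
      case None      => println("hive-site.xml is not on the classpath")
    }
}
```

Run it from the same Eclipse launch configuration as the Spark program. If it reports that the file is missing, adding the directory that contains hive-site.xml to the project's classpath is one possible remedy to try.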

Spark Driver Heap Memory Issues

I am seeing issues where I slowly run out of Java Heap on the master node. Below is a simple example I've created which just repeats itself 200 times. With the settings below the master runs out of memory in about 1 hour with the following error:
16/12/15 17:55:46 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 97578 on executor id: 9 hostname: ip-xxx-xxx-xx-xx
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 20160"...
The Code:
import org.apache.spark.sql.functions._
import org.apache.spark._
object MemTest {
case class X(colval: Long, colname: Long, ID: Long)
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("MemTest")
val spark = new SparkContext(conf)
val sc = org.apache.spark.sql.SQLContext.getOrCreate(spark)
import sc.implicits._;
for( a <- 1 to 200)
{
var df = spark.parallelize((1 to 5000000).map(x => X(x.toLong, x.toLong % 10, x.toLong / 10 ))).toDF()
df = df.groupBy("ID").pivot("colname").agg(max("colval"))
df.count
}
spark.stop()
}
}
I'm running on AWS emr-5.1.0 using m4.xlarge (4 nodes+1 master). Here are my spark settings
{
"Classification": "spark-defaults",
"Properties": {
"spark.dynamicAllocation.enabled": "false",
"spark.executor.instances": "16",
"spark.executor.memory": "2560m",
"spark.driver.memory": "768m",
"spark.executor.cores": "1"
}
},
{
"Classification": "spark",
"Properties": {
"maximizeResourceAllocation": "false"
}
},
I compile with sbt using
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
"org.apache.spark" %% "spark-sql" % "2.0.2")
and then run it using
spark-submit --class MemTest target/scala-2.11/simple-project_2.11-1.0.jar
Looking at memory with jmap -histo I see java.lang.Long and scala.Tuple2 keep growing.
Are you sure the spark version installed on the cluster is 2.0.2?
Or if there are several Spark installations on your cluster, are you sure you're calling the correct (2.0.2) spark-submit?
(I unfortunately cannot comment so that's the reason I posted this as an answer)

How to run spark + cassandra + mesos (dcos) with dynamic resource allocation?

On every slave node we run the Mesos External Shuffle Service through Marathon. When we submit a Spark job via the dcos CLI in coarse-grained mode without dynamic allocation, everything works as expected. But when we submit the same job with dynamic allocation, it fails.
16/12/08 19:20:42 ERROR OneForOneBlockFetcher: Failed while starting block fetches
java.lang.RuntimeException: java.lang.RuntimeException: Failed to open file:/tmp/blockmgr-d4df5df4-24c9-41a3-9f26-4c1aba096814/30/shuffle_0_0_0.index
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:234)
...
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
...
Caused by: java.io.FileNotFoundException: /tmp/blockmgr-d4df5df4-24c9-41a3-9f26-4c1aba096814/30/shuffle_0_0_0.index (No such file or directory)
Full description:
We installed Mesos (DCOS) with Marathon using the Azure Portal.
Via Universe packages we installed Cassandra, Spark and Marathon-lb.
We generated test data in Cassandra.
On my laptop I installed the dcos CLI.
When I submit the job as below, everything works as expected:
./dcos spark run --submit-args="--properties-file coarse-grained.conf --class portal.spark.cassandra.app.ProductModelPerNrOfAlerts http://marathon-lb-default.marathon.mesos:10018/jars/spark-cassandra-assembly-1.0.jar"
Run job succeeded. Submission id: driver-20161208185927-0043
cqlsh:sp> select count(*) from product_model_per_alerts_by_date ;
count
-------
476
coarse-grained.conf:
spark.cassandra.connection.host 10.32.0.17
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.executor.cores 1
spark.executor.memory 1g
spark.executor.instances 2
spark.submit.deployMode cluster
spark.cores.max 4
portal.spark.cassandra.app.ProductModelPerNrOfAlerts:
package portal.spark.cassandra.app
import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.spark.{SparkConf, SparkContext}
object ProductModelPerNrOfAlerts {
def main(args: Array[String]): Unit = {
val conf = new SparkConf(true)
.setAppName("cassandraSpark-ProductModelPerNrOfAlerts")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val df = sqlContext
.read
.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "asset_history", "keyspace" -> "sp"))
.load()
.select("datestamp","product_model","nr_of_alerts")
val dr = df
.groupBy("datestamp","product_model")
.avg("nr_of_alerts")
.toDF("datestamp","product_model","nr_of_alerts")
dr.write
.mode(SaveMode.Overwrite)
.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "product_model_per_alerts_by_date", "keyspace" -> "sp"))
.save()
sc.stop()
}
}
Dynamic Allocation
Through Marathon we run Mesos External Shuffle Service:
{
"id": "spark-mesos-external-shuffle-service-tt",
"container": {
"type": "DOCKER",
"docker": {
"image": "jpavt/mesos-spark-hadoop:mesos-external-shuffle-service-1.0.4-2.0.1",
"network": "BRIDGE",
"portMappings": [
{ "hostPort": 7337, "containerPort": 7337, "servicePort": 7337 }
],
"forcePullImage":true,
"volumes": [
{
"containerPath": "/tmp",
"hostPath": "/tmp",
"mode": "RW"
}
]
}
},
"instances": 9,
"cpus": 0.2,
"mem": 512,
"constraints": [["hostname", "UNIQUE"]]
}
Dockerfile for jpavt/mesos-spark-hadoop:mesos-external-shuffle-service-1.0.4-2.0.1:
FROM mesosphere/spark:1.0.4-2.0.1
WORKDIR /opt/spark/dist
ENTRYPOINT ["./bin/spark-class", "org.apache.spark.deploy.mesos.MesosExternalShuffleService"]
Now when I submit the job with dynamic allocation, it fails:
./dcos spark run --submit-args="--properties-file dynamic-allocation.conf --class portal.spark.cassandra.app.ProductModelPerNrOfAlerts http://marathon-lb-default.marathon.mesos:10018/jars/spark-cassandra-assembly-1.0.jar"
Run job succeeded. Submission id: driver-20161208191958-0047
select count(*) from product_model_per_alerts_by_date ;
count
-------
5
dynamic-allocation.conf:
spark.cassandra.connection.host 10.32.0.17
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.executor.cores 1
spark.executor.memory 1g
spark.submit.deployMode cluster
spark.cores.max 4
spark.shuffle.service.enabled true
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 2
spark.dynamicAllocation.maxExecutors 5
spark.dynamicAllocation.cachedExecutorIdleTimeout 120s
spark.dynamicAllocation.schedulerBacklogTimeout 10s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 20s
spark.mesos.executor.docker.volumes /tmp:/tmp:rw
spark.local.dir /tmp
logs from mesos:
16/12/08 19:20:42 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 18.0 KB, free 366.0 MB)
16/12/08 19:20:42 INFO TorrentBroadcast: Reading broadcast variable 7 took 21 ms
16/12/08 19:20:42 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 38.6 KB, free 366.0 MB)
16/12/08 19:20:42 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 0, fetching them
16/12/08 19:20:42 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker#10.32.0.4:45422)
16/12/08 19:20:42 INFO MapOutputTrackerWorker: Got the output locations
16/12/08 19:20:42 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 58 blocks
16/12/08 19:20:42 INFO TransportClientFactory: Successfully created connection to /10.32.0.11:7337 after 2 ms (0 ms spent in bootstraps)
16/12/08 19:20:42 INFO ShuffleBlockFetcherIterator: Started 1 remote fetches in 13 ms
16/12/08 19:20:42 ERROR OneForOneBlockFetcher: Failed while starting block fetches java.lang.RuntimeException: java.lang.RuntimeException: Failed to open file: /tmp/blockmgr-d4df5df4-24c9-41a3-9f26-4c1aba096814/30/shuffle_0_0_0.index
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:234)
...
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
...
Caused by: java.io.FileNotFoundException: /tmp/blockmgr-d4df5df4-24c9-41a3-9f26-4c1aba096814/30/shuffle_0_0_0.index (No such file or directory)
logs from marathon spark-mesos-external-shuffle-service-tt:
...
16/12/08 19:20:29 INFO MesosExternalShuffleBlockHandler: Received registration request from app 704aec43-1aa3-4971-bb98-e892beeb2c45-0008-driver-20161208191958-0047 (remote address /10.32.0.4:49710, heartbeat timeout 120000 ms).
16/12/08 19:20:31 INFO ExternalShuffleBlockResolver: Registered executor AppExecId{appId=704aec43-1aa3-4971-bb98-e892beeb2c45-0008-driver-20161208191958-0047, execId=2} with ExecutorShuffleInfo{localDirs=[/tmp/blockmgr-14525ef0-22e9-49fb-8e81-dc84e5fba8b2], subDirsPerLocalDir=64, shuffleManager=org.apache.spark.shuffle.sort.SortShuffleManager}
16/12/08 19:20:38 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 8157825166903585542
java.lang.RuntimeException: Failed to open file: /tmp/blockmgr-14525ef0-22e9-49fb-8e81-dc84e5fba8b2/16/shuffle_0_55_0.index
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:234)
...
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
Caused by: java.io.FileNotFoundException: /tmp/blockmgr-14525ef0-22e9-49fb-8e81-dc84e5fba8b2/16/shuffle_0_55_0.index (No such file or directory)
...
but the file exists on the given slave box:
$ ls -l /tmp/blockmgr-14525ef0-22e9-49fb-8e81-dc84e5fba8b2/16/shuffle_0_55_0.index
-rw-r--r-- 1 root root 1608 Dec 8 19:20 /tmp/blockmgr-14525ef0-22e9-49fb-8e81-dc84e5fba8b2/16/shuffle_0_55_0.index
stat shuffle_0_55_0.index
File: 'shuffle_0_55_0.index'
Size: 1608 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 1805493 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2016-12-08 19:20:38.163188836 +0000
Modify: 2016-12-08 19:20:38.163188836 +0000
Change: 2016-12-08 19:20:38.163188836 +0000
Birth: -
I am not familiar with DCOS, Marathon or Azure, but I do use dynamic resource allocation (the Mesos external shuffle service) on Mesos and Aurora with Docker.
Does each Mesos agent node have its own external shuffle service (that is, one external shuffle service per Mesos agent)?
Is the spark.local.dir setting exactly the same string, pointing to the same directory, on both sides? Your spark.local.dir for the shuffle service is /tmp, but I don't know the DCOS setting.
Is the spark.local.dir directory readable and writable by both? If both the Mesos agent and the external shuffle service are launched by Docker, spark.local.dir on the host MUST be mounted into both containers.
EDIT
If the SPARK_LOCAL_DIRS environment variable (Mesos or standalone) is set, it overrides spark.local.dir.
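The override described in the EDIT can be illustrated with a small sketch. This is a simplified model of the precedence, not Spark's own code; Spark also consults other settings depending on the cluster manager. The object and function names are made up for illustration.

```scala
object LocalDirs {
  // Simplified precedence: the SPARK_LOCAL_DIRS environment variable, when set,
  // wins over the configured spark.local.dir value; java.io.tmpdir is the last resort.
  def effectiveLocalDirs(env: Map[String, String], sparkLocalDir: Option[String]): Seq[String] = {
    val raw = env.get("SPARK_LOCAL_DIRS")
      .orElse(sparkLocalDir)
      .getOrElse(System.getProperty("java.io.tmpdir"))
    // Both settings accept a comma-separated list of directories.
    raw.split(",").map(_.trim).filter(_.nonEmpty).toSeq
  }

  def main(args: Array[String]): Unit = {
    // With spark.local.dir set to /tmp, as in dynamic-allocation.conf above.
    println(effectiveLocalDirs(sys.env, Some("/tmp")))
  }
}
```

Running this inside the executor container and inside the shuffle service container is one way to compare what each side would resolve. Differing results would be consistent with the "No such file or directory" errors in the logs.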
There was an error in the Marathon external shuffle service config: instead of the path container.docker.volumes, we should use container.volumes.
Correct configuration:
{
"id": "mesos-external-shuffle-service-simple",
"container": {
"type": "DOCKER",
"docker": {
"image": "jpavt/mesos-spark-hadoop:mesos-external-shuffle-service-1.0.4-2.0.1",
"network": "BRIDGE",
"portMappings": [
{ "hostPort": 7337, "containerPort": 7337, "servicePort": 7337 }
],
"forcePullImage":true
},
"volumes": [
{
"containerPath": "/tmp",
"hostPath": "/tmp",
"mode": "RW"
}
]
},
"instances": 9,
"cpus": 0.2,
"mem": 512,
"constraints": [["hostname", "UNIQUE"]]
}

Resources