Hazelcast mapreduce executor overload - hazelcast

I'm setting up a new cluster and I'm getting an error from the hazelcast mapreduce executor:
java.util.concurrent.RejectedExecutionException: Executor[mapreduce::hz::default] is overloaded
Using spring, I am configuring the jobtracker as follows:
<hz:jobtracker name="default" max-thread-size="8" queue-size="0"/>
Per the documentation, 0 is the default queue size, which is unbounded.
Thoughts? I am only sending about 100 jobs simultaneously.

The manual is wrong about this.
A value less than or equal to zero means that the queue size is twice the partition count.
int queueSize = jobTrackerConfig.getQueueSize();
if (queueSize <= 0) {
    queueSize = ps.getPartitionCount() * 2;
}
Code snippet on github
Set queue-size to a positive integer that is big enough for your use case.
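To see what the fallback means in practice, here is a minimal sketch that mirrors the quoted snippet in plain JavaScript. The function name effectiveQueueSize is made up for illustration, and 271 is assumed as the partition count because it is Hazelcast's default; substitute your own cluster's value.

```javascript
// Hypothetical helper mirroring the fallback in the quoted Hazelcast snippet.
function effectiveQueueSize(configuredQueueSize, partitionCount) {
  // A non-positive setting is replaced by twice the partition count.
  return configuredQueueSize <= 0 ? partitionCount * 2 : configuredQueueSize;
}

// Assumption: 271 is Hazelcast's default partition count.
const DEFAULT_PARTITION_COUNT = 271;

console.log(effectiveQueueSize(0, DEFAULT_PARTITION_COUNT));    // 542
console.log(effectiveQueueSize(5000, DEFAULT_PARTITION_COUNT)); // 5000
```

Under that assumption, queue-size="0" yields a bounded queue of 542 entries rather than an unbounded one.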

Related

How to fire N Requests per Second in Scala

I am developing an application in Scala, which is a kind of batch request processor. It gets the required data from storage, forms the request, and calls an API of another service (named ServiceB).
ServiceB allocates 110 TPS of its throughput to my service. To use as much of that as possible, I want to make requests at a rate of 100 TPS.
How can I fire requests at a rate of exactly 100 TPS?
In theory, if an API call takes around 500 milliseconds to execute, one thread can execute at most 2 requests per second, so 50 threads are needed to achieve 100 TPS.
But Futures in Scala are executed in parallel anyway, which makes it difficult to come up with the exact number of threads required.
I tried the following
def main(args: Array[String]): Unit = {
  val executorService = Executors.newFixedThreadPool(1) // thread pool size 1
  implicit val executionContext: ExecutionContextExecutor = ExecutionContextFactory.get(executorService)
  ....
  val request = getRequest(..)
  val responseList = ListBuffer[Future[Response]]()
  val stTime = System.currentTimeMillis()
  val rateLimiter = RateLimiter.create(300) // Guava RateLimiter
  while (System.currentTimeMillis() - stTime <= (1000 * 60 * 3)) { // run for 3 minutes
    rateLimiter.acquire(1)
    responseList += http(request) // using dispatch
  }
  // TODO: use a different thread pool to process the futures in responseList.
}
Metrics from DataDog:
TPS is 83
Total number of calls to that API is 16.4k
This doesn't seem to be firing calls at even 100 TPS (though I allowed 300 requests per second; check the rate limiter value).
This problem is similar to load testing. How do load testing frameworks fire requests at a constant X TPS?
Thanks.
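A rough back-of-the-envelope check of the numbers in the question, assuming the 500 ms latency and the 3-minute run are accurate. Both helper names are hypothetical: the first multiplies rate by latency to estimate how many requests must be in flight at once, and the second derives the average rate from the reported call count.

```javascript
// Requests that must be in flight at once to sustain targetTps
// when each call takes latencyMs to complete.
function requiredConcurrency(targetTps, latencyMs) {
  return Math.ceil((targetTps * latencyMs) / 1000);
}

// Average calls per second over a whole run.
function averageTps(totalCalls, durationSeconds) {
  return totalCalls / durationSeconds;
}

console.log(requiredConcurrency(100, 500));      // 50
console.log(averageTps(16400, 180).toFixed(1));  // "91.1"
```

The first result matches the estimate of 50 in the question. Both the reported 83 TPS and the roughly 91 calls per second implied by 16.4k calls in 3 minutes are below the 100 TPS target.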

Cassandra-driver Client.batch() gives RangeError

This code
const cassandra = require('cassandra-driver');
const Long = require('cassandra-driver').types.Long;
const client = new cassandra.Client({
  contactPoints: ['localhost:9042'],
  localDataCenter: 'datacenter1',
  keyspace: 'ks'
});
let q = [];
const ins_q = 'INSERT INTO ks.table1 (id , num1, num2, txt, date) VALUES (?,33,44,\'tes2\',toTimeStamp(now()));';
for (let i = 50000000003n; i < 50000100003n; i++) {
  q.push({ query: ins_q, params: [Long.fromString(i.toString(), true)] });
}
client.batch(q, { prepare: true }).catch(err => {
  console.log('Failed %s', err);
});
is causing this error
Failed RangeError [ERR_OUT_OF_RANGE]: The value of "value" is out of range. It must be >= 0 and <= 65535. Received 100000
at new NodeError (f:\node\lib\internal\errors.js:371:5)
at checkInt (f:\node\lib\internal\buffer.js:72:11)
at writeU_Int16BE (f:\node\lib\internal\buffer.js:832:3)
at Buffer.writeUInt16BE (f:\node\lib\internal\buffer.js:840:10)
at FrameWriter.writeShort (f:\node\test\node_modules\cassandra-driver\lib\writers.js:47:9)
at BatchRequest.write (f:\node\test\node_modules\cassandra-driver\lib\requests.js:438:17)
Is this a bug? I tried execute() with one bigint the same way and there was no problem.
"cassandra-driver": "^4.6.3"
Failed RangeError [ERR_OUT_OF_RANGE]: The value of "value" is out of range. It must be >= 0 and <= 65535. Received 100000
Is this a bug?
No, this is Cassandra protecting the cluster from running a large batch and crashing one or more nodes.
While you do appear to be running this on your own machine, Cassandra is first and foremost a distributed system. So it has certain guardrails built in to prevent non-distributed things from causing problems. This is one of them.
What will happen here is that the driver looks at the id values and figures out real fast that a single node isn't responsible for all of them. So it sends the batch of 100k statements to one node picked as the "coordinator." That coordinator then has to "coordinate" writing each partition to the nodes in the cluster that own it and wait for their acknowledgements.
Or rather, it'll try to, but probably time-out before getting through even 1/5th of a batch this size. Remember, BATCH with Cassandra was built to really only run 5 or 6 write operations to keep 5 or 6 tables in-sync; not 100k write operations to the same table.
The way to approach this scenario, is to execute each write operation individually. If you want to optimize the process, make each write operation asynchronous with a listenable future. Run only a certain number of async threads at a time, block on their completion, and then run the next set of threads. Repeat this process until complete.
In short, there are many nuances about Cassandra that are different from a relational database. The use and implementation of BATCH writes being one of them.
Why does it cause a range error?
Because of this part in the error message:
It must be >= 0 and <= 65535
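The Cassandra Node.js driver will not allow a batch to exceed 65535 statements. As the stack trace shows (BatchRequest.write calling FrameWriter.writeShort), the number of statements in a batch is written as an unsigned 16-bit integer, and here it was given 100000.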
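The "run only a certain number at a time" advice above can be sketched in Node.js as a small worker pool. This is an illustration under stated assumptions, not the driver's own mechanism: runLimited is a hypothetical helper, and the block deliberately does not import the driver so that it runs on its own.

```javascript
// Hypothetical helper: run task(item) for every item, keeping at most
// `limit` calls in flight at any moment. Results keep the input order.
async function runLimited(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none are left.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(limit, items.length); w++) {
    workers.push(worker());
  }
  // Rejects as soon as any task rejects.
  await Promise.all(workers);
  return results;
}
```

With the client from the question, the task could be something like id => client.execute(ins_q, [id], { prepare: true }), with a limit in the tens or low hundreds; the right value depends on the cluster and is an assumption to tune.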

What is openCostInBytes?

Can someone explain openCostInBytes in Apache Spark to me? I can see the definition in the documentation, but I don't understand how exactly it affects reading files. Should I really care about this, and if yes, how should I tune it?
spark.sql.files.openCostInBytes will affect how many partitions the input data will be read into. The exact calculation can be found in FilePartition.scala.
The way it exists at the time of this answer, the calculation is the following:
def maxSplitBytes(
    sparkSession: SparkSession,
    selectedPartitions: Seq[PartitionDirectory]): Long = {
  val defaultMaxSplitBytes = sparkSession.sessionState.conf.filesMaxPartitionBytes
  val openCostInBytes = sparkSession.sessionState.conf.filesOpenCostInBytes
  val minPartitionNum = sparkSession.sessionState.conf.filesMinPartitionNum
    .getOrElse(sparkSession.leafNodeDefaultParallelism)
  val totalBytes = selectedPartitions.flatMap(_.files.map(_.getLen + openCostInBytes)).sum
  val bytesPerCore = totalBytes / minPartitionNum
  Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
}
So that last line is the interesting one. We take the minimum of:
  defaultMaxSplitBytes, which comes from spark.sql.files.maxPartitionBytes and is by default 128 * 1024 * 1024 (128 MB)
  the maximum of:
    openCostInBytes, which comes from spark.sql.files.openCostInBytes and is by default 4 * 1024 * 1024 (4 MB)
    bytesPerCore, which is totalBytes / minPartitionNum. minPartitionNum comes from spark.default.parallelism in the default case, which is equal to your total number of cores
So now we know this, we can try to understand the 3 edge cases of this calculation (taking into account default values of the parameters):
If the result is the value of defaultMaxSplitBytes, this is because we have a bytesPerCore that is larger than the other values. This only happens when we're handling BIG files. So big, that if we would fairly split the data over all the cores it would be bigger than defaultMaxSplitBytes. So here we are limiting the size of each partition.
If the result is the value of bytesPerCore, then that means that it was smaller than 128MB but larger than 4MB. In this case, we are fairly splitting the data over all of the cores.
If the result is the value of openCostInBytes, then that means bytesPerCore was smaller than 4MB. Since each partition has a cost of opening, we want to limit the number of partitions that get created. So in this case, we are limiting the number of partitions created.
From understanding this, we see that this value only has an effect if your data is small w.r.t. your cluster (i.e. if your data size / nr of cores in cluster is small)
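The three cases can be checked with a small worked example. The sketch below re-implements the arithmetic of the quoted function in plain JavaScript; it assumes the defaults of 128 MiB and 4 MiB and a cluster with 8 cores, and the file sizes are made up for illustration.

```javascript
const MiB = 1024 * 1024;

// Same arithmetic as the quoted maxSplitBytes, with file lengths passed in directly.
function maxSplitBytes(fileSizes, minPartitionNum,
                       maxPartitionBytes = 128 * MiB, openCostInBytes = 4 * MiB) {
  // Every file is counted as its length plus the open cost.
  const totalBytes = fileSizes.reduce((sum, len) => sum + len + openCostInBytes, 0);
  // Integer division, as with Long values in the original.
  const bytesPerCore = Math.floor(totalBytes / minPartitionNum);
  return Math.min(maxPartitionBytes, Math.max(openCostInBytes, bytesPerCore));
}

const cores = 8; // assumed cluster size

// One 10 GiB file: bytesPerCore is far above 128 MiB, so the cap applies.
console.log(maxSplitBytes([10240 * MiB], cores) / MiB);          // 128
// Ten 40 MiB files: (10 * 44 MiB) / 8 = 55 MiB, between the two bounds.
console.log(maxSplitBytes(Array(10).fill(40 * MiB), cores) / MiB); // 55
// Four 1 MiB files: (4 * 5 MiB) / 8 = 2.5 MiB, so the 4 MiB floor applies.
console.log(maxSplitBytes(Array(4).fill(1 * MiB), cores) / MiB);   // 4
```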
Hope this helps!

Color scheme in spark web UI

What does the blue zone represent? I understand that the green zone represents computing time. Going by the legend, the blue zone should represent scheduler delay. However, the numbers do not match: the reported scheduler delay is negligible compared to the executor time. So what does it mean?
The scheduler is the part of the master that constructs the DAG of stages and tasks and interacts with the cluster to distribute them in the most efficient way it can. Scheduler Delay is the overhead of how long it takes to ship tasks to the executors and get the results back.
This is how it is calculated in the most recent branch:
private[ui] def getSchedulerDelay(
    info: TaskInfo, metrics: TaskMetricsUIData, currentTime: Long): Long = {
  if (info.finished) {
    val totalExecutionTime = info.finishTime - info.launchTime
    val executorOverhead = (metrics.executorDeserializeTime +
      metrics.resultSerializationTime)
    math.max(
      0,
      totalExecutionTime - metrics.executorRunTime - executorOverhead -
        getGettingResultTime(info, currentTime))
  } else {
    // The task is still running and the metrics like executorRunTime are not available.
    0L
  }
}
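To make the subtraction concrete, here is a minimal sketch of the same arithmetic in plain JavaScript. The function and the flat task object are hypothetical stand-ins for TaskInfo and the metrics, and the numbers are invented for illustration; all times are in milliseconds.

```javascript
// Sketch of the arithmetic in getSchedulerDelay.
function schedulerDelay(task) {
  if (!task.finished) {
    return 0; // metrics are not available while the task is still running
  }
  const totalExecutionTime = task.finishTime - task.launchTime;
  const executorOverhead = task.executorDeserializeTime + task.resultSerializationTime;
  // Whatever is left after run time, (de)serialization, and result fetching,
  // clamped so that it never goes negative.
  return Math.max(
    0,
    totalExecutionTime - task.executorRunTime - executorOverhead - task.gettingResultTime
  );
}

// Made-up numbers for illustration only.
const example = {
  finished: true,
  launchTime: 0,
  finishTime: 1000,
  executorRunTime: 900,
  executorDeserializeTime: 30,
  resultSerializationTime: 10,
  gettingResultTime: 20,
};
console.log(schedulerDelay(example)); // 40
```

In this example, 1000 ms of wall time minus 900 ms of run time, 40 ms of serialization overhead, and 20 ms of result fetching leaves 40 ms attributed to scheduler delay.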

Spark RDD.isEmpty costs much time

I built a Spark cluster.
workers:2
Cores:12
Memory: 32.0 GB Total, 20.0 GB Used
Each worker gets 1 cpu, 6 cores and 10.0 GB memory
My program reads its data from a MongoDB cluster. The Spark and MongoDB clusters are on the same LAN (1000 Mbps).
MongoDB document format:
{name:string, value:double, time:ISODate}
There are about 13 million documents.
I want to get the average value for a specific name over a specific hour, which contains 60 documents.
Here is my key function
/*
 * rdd = sc.newAPIHadoopRDD(configOriginal, classOf[com.mongodb.hadoop.MongoInputFormat], classOf[Object], classOf[BSONObject])
 * Apache-Spark-1.3.1 scala doc: SparkContext.newAPIHadoopFile[K, V, F <: InputFormat[K, V]](path: String, fClass: Class[F], kClass: Class[K], vClass: Class[V], conf: Configuration = hadoopConfiguration): RDD[(K, V)]
 */
def findValueByNameAndRange(rdd: RDD[(Object, BSONObject)], name: String, time: Date): RDD[BasicBSONObject] = {
  val nameRdd = rdd.map(arg => arg._2).filter(_.get("name").equals(name))
  val timeRangeRdd1 = nameRdd.map(tuple => (tuple, tuple.get("time").asInstanceOf[Date]))
  val timeRangeRdd2 = timeRangeRdd1.map(tuple => (tuple._1, duringTime(tuple._2, time, getHourAgo(time, 1))))
  val timeRangeRdd3 = timeRangeRdd2.filter(_._2).map(_._1)
  val timeRangeRdd4 = timeRangeRdd3.map(x => (x.get("name").toString, x.get("value").toString.toDouble)).reduceByKey(_ + _)
  if (timeRangeRdd4.isEmpty()) {
    return basicBSONRDD(name, time)
  }
  else {
    return timeRangeRdd4.map(tuple => {
      val bson = new BasicBSONObject()
      bson.put("name", tuple._1)
      bson.put("value", tuple._2 / 60)
      bson.put("time", time)
      bson
    })
  }
}
Here is part of Job information
My program runs very slowly. Is it because of isEmpty and reduceByKey? If yes, how can I improve it? If not, why?
=======update ===
timeRangeRdd3.map(x => (x.get("name").toString, x.get("value").toString.toDouble)).reduceByKey(_ + _)
is on line 34.
I know reduceByKey is a global operation and may take a long time, but what it cost is beyond my budget. How can I improve it, or is this a defect in Spark? With the same calculation and hardware, it takes just several seconds if I use multiple threads in Java.
First, isEmpty is merely the point at which the RDD stage ends. The maps and filters do not create a need for a shuffle, and the method shown in the UI is always the method that triggers a stage change/shuffle, in this case isEmpty. Why it's running slowly is not as easy to discern from this perspective, especially without seeing the composition of the originating RDD. I can tell you that isEmpty first checks the number of partitions and then does a take(1) and verifies whether data was returned or not. So the odds are that there is a bottleneck in the network or something else blocking along the way. It could even be a GC issue. Click into the isEmpty stage and see what more you can discern from there.
