I am developing an application in Scala that is essentially a batch request processor: it fetches the required data from storage, builds a request, and calls an API of another service (ServiceB).
ServiceB allocates 110 TPS of its throughput to my service. To use as much of that as possible, I want to make requests at a rate of 100 TPS.
How can I fire requests at a rate of exactly 100 TPS?
In theory, if an API call takes around 500 milliseconds, one thread can execute at most 2 requests per second, so 50 threads are needed to reach 100 TPS.
But Scala Futures are executed in parallel anyway, which makes it difficult to come up with the exact number of threads required.
I tried the following:
def main(args: Array[String]): Unit = {
  val executorService = Executors.newFixedThreadPool(1) // thread pool size 1
  implicit val executionContext: ExecutionContextExecutor = ExecutionContextFactory.get(executorService)
  ....
  val request = getRequest(..)
  val responseList = ListBuffer[Future[Response]]()
  val stTime = System.currentTimeMillis()
  val rateLimiter = RateLimiter.create(300) // Guava RateLimiter
  while (System.currentTimeMillis() - stTime <= (1000 * 60 * 3)) { // run for 3 minutes
    rateLimiter.acquire(1)
    responseList += http(request) // using dispatch
  }
  // TODO: use a different thread pool to process the futures in responseList.
}
Metrics from DataDog:
TPS is 83
Total number of calls to that API is 16.4k
This doesn't even reach 100 TPS, even though the rate limiter allows 300 requests per second.
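For reference, here is a minimal sketch of what the 50-thread estimate above could look like in practice, assuming Guava's RateLimiter and a hypothetical callServiceB helper (neither is the actual client code from the question):
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import com.google.common.util.concurrent.RateLimiter

object FixedRateFiring {
  def main(args: Array[String]): Unit = {
    // 100 TPS * ~0.5 s per call => roughly 50 requests in flight, so a 50-thread pool.
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(50))

    val rateLimiter = RateLimiter.create(100.0) // 100 permits per second

    // Hypothetical stand-in for the dispatch `http(request)` call.
    def callServiceB(i: Int): Future[String] = Future {
      // ... perform the HTTP call to ServiceB here ...
      s"response-$i"
    }

    val deadline = System.currentTimeMillis() + 3 * 60 * 1000L // run for 3 minutes
    var i = 0
    while (System.currentTimeMillis() < deadline) {
      rateLimiter.acquire() // blocks until the next permit, pacing submissions at ~100/s
      callServiceB(i)       // fire-and-forget here; collect the futures elsewhere if needed
      i += 1
    }
  }
}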
This problem is similar to load testing: how do load-testing frameworks fire requests at a constant X TPS?
Thanks.
I want to kill my Spark Streaming job when there is no activity (i.e. the receivers are not receiving messages) for a certain time. I tried doing this:
var counter = 0
myDStream.foreachRDD { rdd =>
  if (rdd.count() == 0L) {
    counter = counter + 1
    if (counter == 40) {
      ssc.stop(true, true) // stop the StreamingContext and the SparkContext, gracefully
    }
  } else {
    counter = 0
  }
}
Is there a better way of doing this? How would I make a variable available to all receivers and update the variable by 1 whenever there is no activity?
Use an external table, e.g. in a NoSQL store like Cassandra or HBase, to keep the counter. You cannot reliably handle this kind of polling inside the streaming loop itself. Implement the same logic against NoSQL or MariaDB and perform a graceful shutdown of your streaming job when no activity is happening.
The way I did it: I maintained a table in MariaDB for the streaming job with a polling interval of 5 minutes. Every 5 minutes the job hits the database and writes the count of records it consumed, and a query returns how many consecutive zero-record entries exist as of the latest timestamp. This helped me a lot with managing the streaming job. The same table also lets me automatically trigger the streaming job based on logic written in a shell script.
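A minimal sketch of that idea, assuming a MariaDB table stream_activity(job_name, batch_time, record_count) and plain JDBC (the table, column names, URL and credentials are all illustrative): the job records each batch's count, and the inactivity check, whether done here or by an external script as described above, can trigger a graceful shutdown after 40 empty batches, matching the counter in the question.
import java.sql.DriverManager

// Illustrative only: JDBC URL, credentials and schema are assumptions.
val jdbcUrl = "jdbc:mariadb://dbhost:3306/monitoring"

myDStream.foreachRDD { rdd =>
  val count = rdd.count()
  val conn = DriverManager.getConnection(jdbcUrl, "user", "password")
  try {
    // Record this batch's activity.
    val insert = conn.prepareStatement(
      "INSERT INTO stream_activity(job_name, batch_time, record_count) VALUES (?, NOW(), ?)")
    insert.setString(1, "my-streaming-job")
    insert.setLong(2, count)
    insert.executeUpdate()

    // How many of the most recent 40 batches saw zero records?
    val check = conn.prepareStatement(
      "SELECT COUNT(*) FROM (SELECT record_count FROM stream_activity " +
      "WHERE job_name = ? ORDER BY batch_time DESC LIMIT 40) t WHERE t.record_count = 0")
    check.setString(1, "my-streaming-job")
    val rs = check.executeQuery()
    if (rs.next() && rs.getLong(1) == 40L) {
      ssc.stop(stopSparkContext = true, stopGracefully = true) // shut down on sustained inactivity
    }
  } finally {
    conn.close()
  }
}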
Currently, I am able to write to the database in batches of 500. But due to memory pressure and synchronization delay between the child aggregator and the leaf nodes of the database, I sometimes run into a leaf-node memory error. The only fix I have found is to limit my write operations to 1k records per second, which makes the error go away.
dataStream
  .map(line => readJsonFromString(line))
  .grouped(memsqlBatchSize)
  .foreach { recordSet =>
    val dbRecords = recordSet.map(m => (m, Events.transform(m)))
    dbRecords.foreach { record =>
      try {
        Events.setValues(eventInsert, record._2)
        eventInsert.addBatch
      } catch {
        case e: Exception =>
          logger.error(s"error adding batch: ${e.getMessage}")
          val error_event = Events.jm.writeValueAsString(mapAsJavaMap(record._1.asInstanceOf[Map[String, Object]]))
          logger.error(s"event: $error_event")
      }
    }
    // Bulk commit records
    try {
      eventInsert.executeBatch
    } catch {
      case e: java.sql.BatchUpdateException =>
        val updates = e.getUpdateCounts
        logger.error(s"failed commit: ${updates.toString}")
        updates.zipWithIndex.filter { case (v, i) => v == Statement.EXECUTE_FAILED }.foreach { case (v, i) =>
          val error = Events.jm.writeValueAsString(mapAsJavaMap(dbRecords(i)._1.asInstanceOf[Map[String, Object]]))
          logger.error(s"insert error: $error")
          logger.error(e.getMessage)
        }
    } finally {
      connection.commit
      eventInsert.clearBatch
      logger.debug(s"committed: ${dbRecords.length.toString}")
    }
  }
The reason for the 1k limit is that some of the data I am trying to write can contain tons of JSON records, and with a batch size of 500 that may still result in 30k records per second. Is there any way I can make sure that only 1000 records are written to the database in a batch, irrespective of how many records arrive?
I don't think Thread.sleep is a good way to handle this. It is generally discouraged in Scala, and we don't want to block a thread in any case.
One suggestion would be to use a streaming library such as Akka Streams or Monix Observable. There are pros and cons between them that I won't go into here, but both support backpressure to control the producing rate when the consumer is slower than the producer. In your case the consumer is the database write and the producer is presumably reading JSON lines and doing some aggregation.
The following code illustrates the idea; you will need to adapt it to your needs:
import akka.actor.ActorSystem
import akka.stream.ThrottleMode
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("throttle") // supplies the materializer (Akka 2.6+)
val sourceJson = Source(dataStream.map(line => readJsonFromString(line)))
val sinkDB = Sink.foreach[String](json => ???) // you will need to figure out how to generate the Sink
val flowThrottle = Flow[String]
  .throttle(1, 1.second, 1, ThrottleMode.shaping) // e.g. throttle(1000, 1.second, ...) for your 1k/sec target
val runnable = sourceJson.via(flowThrottle).toMat(sinkDB)(Keep.right)
val result = runnable.run()
The code block is already called from a thread, and there are multiple such threads running in parallel. I can either use Thread.sleep(1000) or delay(1.0) in this Scala code, but delay() would use a Promise that might have to be completed outside the function. So Thread.sleep() together with a batch size of 1000 looks like the best option. After testing, I could benchmark 120,000 records/thread/sec without any problem.
According to the memsql architecture, all loads into memsql go into a rowstore in local memory first, and from there memsql merges them into the columnstore on the leaf nodes. That caused the leaf error every time I pushed more data and created a bottleneck. Reducing the batch size and introducing a Thread.sleep() let me write 120,000 records/sec; I verified this with benchmark testing.
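A minimal sketch of that resolution, reusing the names from the question (writeBatch is a hypothetical wrapper around the setValues/addBatch/executeBatch/commit logic shown earlier): group into batches of 1000 and sleep one second between batches so each thread writes at most roughly 1000 records per second.
// Sketch only: caps throughput at ~1000 records/sec per thread.
// `writeBatch` is a hypothetical stand-in for the JDBC batch/commit block above.
dataStream
  .map(line => readJsonFromString(line))
  .grouped(1000)            // fixed batch size of 1000 records
  .foreach { recordSet =>
    writeBatch(recordSet)   // add to the JDBC batch, executeBatch, commit
    Thread.sleep(1000)      // throttle: roughly one 1000-record batch per second
  }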
I am streaming data from Kafka and trying to limit the number of events per batch to 10. After 10-15 batches, there is a sudden spike in the batch size. Here are my settings:
spark.streaming.kafka.maxRatePerPartition=1
spark.streaming.backpressure.enabled=true
spark.streaming.backpressure.pid.minRate=1
spark.streaming.receiver.maxRate=2
Please check this image for the streaming behavior
This is a bug in Spark; please refer to https://issues.apache.org/jira/browse/SPARK-18371
The pull request isn't merged yet, but you can pick it up and build Spark on your own.
To summarize the issue:
If you have the spark.streaming.backpressure.pid.minRate set to a number <= partition count, then an effective rate of 0 is calculated:
val totalLag = lagPerPartition.values.sum
...
val backpressureRate = Math.round(lag / totalLag.toFloat * rate)
...
(the second line calculates the per-partition rate, where rate comes from the PID estimator and falls back to minRate whenever the PID would calculate something smaller)
As here: DirectKafkaInputDStream code
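To make the rounding concrete, a small worked example (the numbers are purely illustrative: 4 partitions with equal lag and minRate = 1):
// Illustrative numbers only: 4 partitions, equal lag, PID rate clamped to minRate = 1.
val lagPerPartition = Map(0 -> 100L, 1 -> 100L, 2 -> 100L, 3 -> 100L)
val rate = 1L                                    // spark.streaming.backpressure.pid.minRate
val totalLag = lagPerPartition.values.sum        // 400
val perPartitionRate = lagPerPartition.map { case (p, lag) =>
  p -> Math.round(lag / totalLag.toFloat * rate) // Math.round(0.25f) == 0 for every partition
}
println(perPartitionRate.values.sum)             // 0 -> falls through to getOrElse(leaderOffsets)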
This result of 0 causes the fallback to the (unreasonable) head of the partitions:
...
if (effectiveRateLimitPerPartition.values.sum > 0) {
  val secsPerBatch = context.graph.batchDuration.milliseconds.toDouble / 1000
  Some(effectiveRateLimitPerPartition.map {
    case (tp, limit) => tp -> (secsPerBatch * limit).toLong
  })
} else {
  None
}
...
maxMessagesPerPartition(offsets).map { mmp =>
  mmp.map { case (tp, messages) =>
    val lo = leaderOffsets(tp)
    tp -> lo.copy(offset = Math.min(currentOffsets(tp) + messages, lo.offset))
  }
}.getOrElse(leaderOffsets)
As in DirectKafkaInputDStream#clamp
This means backpressure effectively stops working whenever your actual and minimum receive rate (messages per batch across partitions) is smaller than or roughly equal to the partition count and you have significant lag (e.g. messages arrive in spikes while your processing power is constant).
I built a Spark cluster.
workers:2
Cores:12
Memory: 32.0 GB Total, 20.0 GB Used
Each worker gets 1 cpu, 6 cores and 10.0 GB memory
My program reads its data from a MongoDB cluster. The Spark and MongoDB clusters are on the same LAN (1000 Mbps).
MongoDB document format:
{name:string, value:double, time:ISODate}
There are about 13 million documents.
I want to get the average value for a specific name over a specific hour, which contains 60 documents.
Here is my key function
/*
*rdd=sc.newAPIHadoopRDD(configOriginal, classOf[com.mongodb.hadoop.MongoInputFormat], classOf[Object], classOf[BSONObject])
Apache-Spark-1.3.1 scala doc: SparkContext.newAPIHadoopFile[K, V, F <: InputFormat[K, V]](path: String, fClass: Class[F], kClass: Class[K], vClass: Class[V], conf: Configuration = hadoopConfiguration): RDD[(K, V)]
*/
def findValueByNameAndRange(rdd: RDD[(Object, BSONObject)], name: String, time: Date): RDD[BasicBSONObject] = {
  val nameRdd = rdd.map(arg => arg._2).filter(_.get("name").equals(name))
  val timeRangeRdd1 = nameRdd.map(tuple => (tuple, tuple.get("time").asInstanceOf[Date]))
  val timeRangeRdd2 = timeRangeRdd1.map(tuple => (tuple._1, duringTime(tuple._2, time, getHourAgo(time, 1))))
  val timeRangeRdd3 = timeRangeRdd2.filter(_._2).map(_._1)
  val timeRangeRdd4 = timeRangeRdd3.map(x => (x.get("name").toString, x.get("value").toString.toDouble)).reduceByKey(_ + _)
  if (timeRangeRdd4.isEmpty()) {
    basicBSONRDD(name, time)
  } else {
    timeRangeRdd4.map { tuple =>
      val bson = new BasicBSONObject()
      bson.put("name", tuple._1)
      bson.put("value", tuple._2 / 60)
      bson.put("time", time)
      bson
    }
  }
}
Here is part of the job information.
My program runs very slowly. Is it because of isEmpty and reduceByKey? If so, how can I improve it? If not, why is it slow?
======= update =======
timeRangeRdd3.map(x => (x.get("name").toString, x.get("value").toString.toDouble)).reduceByKey(_ + _)
is at line 34.
I know reduceByKey is a global operation and can cost a lot of time, but here it costs far more than my budget. How can I improve it, or is this a shortcoming of Spark? With the same calculation and hardware, it only takes a few seconds if I use multiple Java threads.
First, isEmpty is merely the point at which the RDD stage ends. The maps and filters do not require a shuffle, and the method shown in the UI is always the one that triggers a stage change/shuffle... in this case isEmpty. Why it's running slowly is not as easy to discern from this view, especially without seeing the composition of the originating RDD. I can tell you that isEmpty first checks the partition count and then does a take(1) and checks whether any data was returned. So the odds are that there is a bottleneck in the network or something else blocking along the way. It could even be a GC issue... Click into the isEmpty stage and see what more you can discern from there.
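For what it's worth, a rough sketch of the check described above (simplified, not the actual Spark source):
import org.apache.spark.rdd.RDD

// Simplified illustration of what RDD.isEmpty boils down to: no partitions,
// or take(1) returning nothing. The take(1) is what launches the job you see in the UI.
def isEmptySketch[T](rdd: RDD[T]): Boolean =
  rdd.partitions.length == 0 || rdd.take(1).length == 0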
Is it possible to cancel a Spark future and still get a smaller RDD containing the elements processed so far?
Spark Async Actions "documented" here
http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.rdd.AsyncRDDActions
And the future itself has a rich set of functions
http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.FutureAction
The use case I had in mind is a very large map that could be aborted after 30 minutes of computation, while still being able to collect (or iterate over, or saveAsObjectFile) the subset of the RDD that has effectively been mapped.
FutureAction.cancel causes a failure (see comment in JobWaiter.scala), so you cannot use it to get partial results. I don't think there's a way to do it through the async API.
Instead, you could stop processing the input after 30 minutes.
val stopTime = System.currentTimeMillis + 30 * 60 * 1000 // 30 minutes from now.
rdd.mapPartitions { partition =>
  if (System.currentTimeMillis < stopTime) partition.map {
    // Process it like usual.
    ???
  } else {
    // Time's up. Don't process anything.
    Iterator()
  }
}
Keep in mind that this only makes a difference once all the shuffle dependencies have completed. (It cannot stop the shuffle from being performed, even when 30 minutes have passed.)
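A possible usage sketch for the use case in the question (the transform function and the output path are hypothetical placeholders): assign the result of the mapPartitions call to a val and persist whatever was processed before the cutoff.
// Hypothetical continuation: only the subset mapped before stopTime ends up in the output,
// since partitions that start after the cutoff contribute an empty iterator.
val processed = rdd.mapPartitions { partition =>
  if (System.currentTimeMillis < stopTime) partition.map(record => transform(record)) // transform is a placeholder
  else Iterator()
}
processed.saveAsObjectFile("hdfs:///tmp/partial-map-output") // illustrative output path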