Sub-second latency causing delay in Spark application - apache-spark

I have a Spark batch job that runs every minute and processes ~200k records per batch. The usual processing delay of the app is ~30 seconds. In the app, for each request, we make a write request to DynamoDB (DDB). At times, the server-side DDB write latency is ~5 ms instead of the usual ~3.5 ms (a ~43% increase). This causes the overall delay of the app to increase sixfold (to ~3 minutes).
How can a sub-second change in DDB call latency increase the overall delay of the app sixfold?
PS: I have verified the root cause by overlaying the CloudWatch graphs of DDB put latency and the Spark app processing delay.
Thanks,
Vinod.

Just a ballpark estimate:
If the average latency is 3.5 ms and about half of your 200k records take 5 ms instead of 3.5 ms, this leaves us with:
200,000 * 0.5 * (5 - 3.5) = 150,000 ms
of total added delay, which is 150 seconds or 2.5 minutes if the writes run one after another. I don't know how well the process is parallelized, but this seems to be in line with the delay you observe.
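The estimate above can be sketched in a few lines of Python. This is only a rough model: the function name and the parallel_writers parameter are made up for illustration, and it assumes the slow writes are spread evenly over however many writers run in parallel.

```python
def added_delay_seconds(records, slow_fraction, base_ms, slow_ms, parallel_writers=1):
    """Extra wall-clock time caused by the slower writes, in seconds.

    parallel_writers is a hypothetical knob: 1 means every write runs
    one after another, as in the ballpark estimate.
    """
    extra_ms_per_slow_record = slow_ms - base_ms
    total_extra_ms = records * slow_fraction * extra_ms_per_slow_record
    return total_extra_ms / parallel_writers / 1000.0


# Numbers from the question: 200k records, half of them at 5 ms instead of 3.5 ms.
print(added_delay_seconds(200_000, 0.5, 3.5, 5.0))                      # 150.0
print(added_delay_seconds(200_000, 0.5, 3.5, 5.0, parallel_writers=4))  # 37.5
```

With fully serial writes the extra 1.5 ms per record is enough to explain a multi-minute delay; the more the writes are parallelized, the smaller the effect should be.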

Related

Spark Structured Streaming metrics: Why process rate can be greater than input rate?

How come the process rate can be greater than the input rate?
From my understanding, the process rate is the rate at which Spark can process arriving data, i.e., the processing capacity. If so, the process rate must on average be lower than or equal to the input rate. If it is lower, we know we need more processing power, or need to rethink the trigger time.
I am basing my understanding on this blog post and common sense, but I might be wrong. I am also looking for the formal formula in the source code while writing this question.
This is an example where the process rate is constantly greater than the input rate:
You can see that on average we have 200-300 records being processed per second, whereas we have 80-120 records arriving per second.
Setup background: Spark 3.x reading from Kafka and writing to Delta.
Thank you all.
A process rate higher than the input rate means that Spark is processing data faster than it arrives, i.e., it could process 300-400 records per second although the event rate is only 100 per second. For example, say the input rate is ~100 records per second and Spark is able to process 100 records within half a second; then it could process 100 more in the next half of the second, which on average gives a process rate of ~200 per second.
The attached screenshot could be interpreted as follows:
Spark could process ~3000 records in each batch (~200 per second * ~15 s) with a processing time of 15 s per batch (based on the ~15000 ms seen in the latency chart), but it is actually processing only ~1000 records in each batch with 15 s of processing time.
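The half-second example can be written out as a small calculation. This is a sketch based on my understanding that both rates are derived from the same per-batch row count, with the input rate divided by the time between trigger starts and the process rate divided by the time the batch actually took; the function and parameter names are invented for illustration.

```python
def batch_rates(num_input_rows, trigger_interval_s, processing_time_s):
    """Return (input_rate, process_rate) in rows per second for one batch.

    Assumption: input rate = rows / time between trigger starts,
    process rate = rows / time spent processing the batch.
    """
    input_rate = num_input_rows / trigger_interval_s
    process_rate = num_input_rows / processing_time_s
    return input_rate, process_rate


# 100 rows arrive per 1 s trigger, and the batch finishes in 0.5 s.
input_rate, process_rate = batch_rates(100, 1.0, 0.5)
print(input_rate, process_rate)  # 100.0 200.0
```

Under that assumption, the process rate exceeds the input rate whenever a batch finishes faster than the interval between triggers, which is the healthy case.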

Spark streaming slow down

In our Spark app we're consuming a Kafka stream and storing the data in a Cassandra DB.
First, we ran the stream without backpressure and experienced a weird anomaly: the processing time was constant at ~1 minute, but the scheduling delay kept increasing. As a result the queue piled up, eventually crashing the stream.
Any thoughts on why this could be happening? If it's not the processing, what can cause such dramatic delays?
Then we tried the same setup with backpressure (and an increased maxRatePerPartition). Initially, everything ran well. Backpressure did its throttling job and we were able to process at a constant rate of ~100K / minute.
Then, after a few hours, something happened and the rate dropped rapidly to 5K / minute. The processing time was only 5-6 seconds with no scheduling delay, but backpressure absurdly kept the rate at 5K / minute and never increased it. Actually, there was no reason to throttle down to 5K at all.
Our Setup:
Window: 1 minute
spark.streaming.kafka.maxRatePerPartition = 500 (4 partitions * 60 sec * 500 = 120K / window)
spark.streaming.backpressure.enabled = true
spark.streaming.kafka.allowNonConsecutiveOffsets = true
spark.streaming.kafka.consumer.cache.enabled = false
Spark cluster with one master and 2 worker nodes
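For reference, the ceiling implied by the setup above can be computed as follows. This is a plain arithmetic sketch with invented function names; it only reproduces the 120K figure from the configuration and compares it with the 5K / minute rate reported in the question. maxRatePerPartition is an upper bound, so backpressure can settle anywhere below it.

```python
def max_records_per_batch(max_rate_per_partition, partitions, batch_interval_s):
    """Upper bound on records per batch implied by maxRatePerPartition."""
    return max_rate_per_partition * partitions * batch_interval_s


cap = max_records_per_batch(500, 4, 60)
print(cap)  # 120000

# Rate reported after the slowdown, as a share of the configured ceiling.
observed = 5_000
print(f"{observed / cap:.1%}")  # 4.2%
```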

Spark Streaming Processing Time vs Total Delay vs Processing Delay

I am trying to understand what the different metrics that Spark Streaming outputs mean, and I am slightly confused about the difference between the Processing Time, Total Delay and Processing Delay of the last batch.
I have looked at the Spark Streaming guide, which mentions the Processing Time as a key metric for figuring out whether the system is falling behind, but other sources such as "Pro Spark Streaming: The Zen of Real-Time Analytics Using Apache Spark" speak about using Total Delay and Processing Delay. I have failed to find any documentation that lists all the metrics produced by Spark Streaming with an explanation of what each one of them means.
I would appreciate it if someone could outline what each of these three metrics means or point me to any resources that can help me understand them.
Let's break down each metric. For that, let's define a basic streaming application which reads a batch at a given 4 second interval from some arbitrary source, and computes the classic word count:
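// inputDStream is assumed to be a DStream[String] of text lines
inputDStream.flatMap(line => line.split(" "))  // split each line into words
  .map(word => (word, 1))                      // pair each word with a count of 1
  .reduceByKey(_ + _)                          // sum the counts per word within each batch
  .saveAsTextFiles("hdfs://...")               // DStream output operation (saveAsTextFile is the RDD method)
Processing Time: The time it takes to compute a given batch for all its jobs, end to end. In our case this means a single job which starts at flatMap and ends at saveAsTextFiles, and assumes as a prerequisite that the job has been submitted.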
Scheduling Delay: The time taken by the Spark Streaming scheduler to submit the jobs of the batch. How is this computed? As we've said, our batch reads from the source every 4 seconds. Now let's assume that a given batch took 8 seconds to compute. This means that we're now 8 - 4 = 4 seconds behind, thus making the scheduling delay of the next batch 4 seconds long.
Total Delay: This is Scheduling Delay + Processing Time. Following the same example, if we're 4 seconds behind, meaning our scheduling delay is 4 seconds, and the next batch took another 8 seconds to compute, this means that the total delay is now 8 + 4 = 12 seconds long.
A live example from a working Streaming application:
We see that:
The bottom job took 11 seconds to process. So now the next batch's scheduling delay is 11 - 4 = 7 seconds.
If we look at the second row from the bottom, we see that scheduling delay + processing time = total delay, in that case (rounding 0.9 to 1) 7 + 1 = 8.
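The relationship between the three metrics can be reproduced with a small simulation. This is a simplified model, not Spark's own bookkeeping: it assumes a new batch is generated at every interval and that batches are processed strictly one at a time; the function name and the sample processing times are made up.

```python
def simulate_batches(batch_interval_s, processing_times_s):
    """Return (scheduling_delay, processing_time, total_delay) per batch.

    Batch i is generated at i * batch_interval_s and can only start
    once the previous batch has finished.
    """
    results = []
    previous_end = 0
    for i, processing in enumerate(processing_times_s):
        generated_at = i * batch_interval_s
        started_at = max(generated_at, previous_end)
        scheduling_delay = started_at - generated_at
        total_delay = scheduling_delay + processing
        results.append((scheduling_delay, processing, total_delay))
        previous_end = started_at + processing
    return results


# 4 second interval; the first two batches take 8 seconds each, the third 1 second.
for row in simulate_batches(4, [8, 8, 1]):
    print(row)
# (0, 8, 8)
# (4, 8, 12)
# (8, 1, 9)
```

The second row matches the worked example in the answer: a scheduling delay of 4 seconds plus a processing time of 8 seconds gives a total delay of 12 seconds.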
We're experiencing a stable processing time but an increasing scheduling delay.
Based on the answer, the scheduling delay should be influenced only by the processing time of previous runs.
Spark is running only the streaming job, nothing else.
The time window is 1 minute, processing 120K records.
If your window is 1 minute and the average processing time is 1 minute 7 seconds, you have a problem: each batch delays the next one by a further 7 seconds.
Your processing time graph shows a stable processing time, but one that is always higher than the batch interval.
I think that after a given amount of time your driver will crash with "GC overhead limit exceeded", as it will be full of pending batches waiting to be executed.
You can fix this by reducing the processing time so that it stays under the micro-batch duration (this requires code and/or resource allocation changes), by increasing the micro-batch duration, or by moving to continuous streaming.
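To get a feel for how quickly the backlog grows, here is a rough closed-form estimate. It assumes a constant processing time, an initially empty queue and strictly sequential batches; the function names are invented for illustration.

```python
def scheduling_delay_s(batch_number, batch_interval_s, processing_time_s):
    """Scheduling delay of the n-th batch (1-based) under constant processing time."""
    lag_per_batch = max(0, processing_time_s - batch_interval_s)
    return (batch_number - 1) * lag_per_batch


def queued_batches(batch_number, batch_interval_s, processing_time_s):
    """Approximate number of batches waiting when the n-th batch starts."""
    delay = scheduling_delay_s(batch_number, batch_interval_s, processing_time_s)
    return delay // batch_interval_s


# 60 s batch interval, 67 s processing time: after 1 batch, 1 hour and 1 day.
for n in (1, 60, 1440):
    print(n, scheduling_delay_s(n, 60, 67), queued_batches(n, 60, 67))
# 1 0 0
# 60 413 6
# 1440 10073 167
```

Under these assumptions the delay grows without bound, so the only open question is how long the driver can hold the growing queue.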
Rgds

How big can a Spark Streaming window be?

I have some data flows that need to be processed, and I am thinking about using Spark Streaming for the job. But there is one thing I am not sure about, and it worries me.
My requirements are as follows:
Data comes in as CSV files every 5 minutes. I need reports on the data of the most recent 5 minutes, 1 hour and 1 day. So if I set up a Spark stream to do this calculation, I need a batch interval of 5 minutes. I also need to set up two windows: 1 hour and 1 day.
Every 5 minutes, 1 GB of data comes in. So the one-hour window will cover 12 GB (60 / 5) of data and the one-day window will cover 288 GB (24 * 60 / 5) of data.
I do not have much experience with Spark, so this worries me.
Can Spark handle such a big window?
How much RAM do I need to process those 288 GB of data? More than 288 GB of RAM? (I know this may depend on my disk I/O, CPU and the calculation pattern, but I just want an estimate based on experience.)
If calculating over one day / one hour of data is too expensive in a stream, do you have any better suggestions?
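The window sizes in the question follow directly from the batch interval and the volume per batch. A minimal sketch of that arithmetic (the function name is made up, and it only counts raw input data, not whatever Spark needs in memory to process it):

```python
def window_volume_gb(window_minutes, batch_interval_minutes, gb_per_batch):
    """Return (number of batches in the window, raw data volume in GB)."""
    batches = window_minutes // batch_interval_minutes
    return batches, batches * gb_per_batch


print(window_volume_gb(60, 5, 1))       # (12, 12)
print(window_volume_gb(24 * 60, 5, 1))  # (288, 288)
```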

Performance testing - Jmeter results

I am using JMeter (I started using it a few days ago) as a tool to simulate a load of 30 threads, using a CSV data file that contains login credentials for 3 system users.
The objective I set out to achieve was to measure 30 users (threads) logging in and navigating to a page via the menu over a time span of 30 seconds.
I have set up my thread group as:
Number of threads: 30
Ramp-up Period: 30
Loop Count: 10
I ran the test successfully. Now I'd like to understand what the results mean, what is classed as a good or bad measurement, and what can be suggested to improve the results. Below is a table of the results collated in the Summary Report of JMeter.
I have done some research, only to find blogs/sites telling me the same information as what is defined on the jmeter.apache.org site. One blog (Nicolas Vahlas) that I came across gave me some very useful information, but it still hasn't helped me understand what to do next with my results.
Can anyone help me understand these results and what I could do next following the execution of this test plan? Or point me in the direction of an informative blog/site that will help me understand what to do next?
Many thanks.
In my opinion, the deviation is high.
You know your application better than all of us.
You should focus on whether the average response time you got, and the frequency and value of the maximum response times, are acceptable to you and your users. This applies to throughput as well.
The report shows that the average response time is below 0.5 seconds and the maximum response time is below 1 second, which is generally acceptable, but that should be defined by you (is it acceptable to your users?). If the answer is yes, try with more load to check scaling.
In your requirement it is mentioned that you need to have 30 concurrent users performing different actions. The response time of your requests is low and you have a ramp-up of 30 seconds. Can you please check the total number of active threads during the test? I believe the time during which there are 30 concurrent users in the system is pretty short, so the average response time that you are seeing may be misleading. I would suggest running the test for longer so that there really are 30 concurrent users in the system; that would give a correct reading as per your requirements.
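The concurrency concern can be estimated with a quick calculation. This is a rough sketch, assuming threads are started evenly across the ramp-up period and each thread simply runs its loops back to back with no timers; the 0.5 s per iteration is a hypothetical figure taken from the "below 0.5 seconds" average, and the function name is invented.

```python
import math


def peak_concurrent_threads(threads, ramp_up_s, loops, seconds_per_iteration):
    """Estimate the largest number of threads active at the same time."""
    start_gap_s = ramp_up_s / threads                   # delay between thread starts
    thread_lifetime_s = loops * seconds_per_iteration   # how long one thread stays active
    overlapping = math.ceil(thread_lifetime_s / start_gap_s)
    return min(threads, overlapping)


# Thread group from the question: 30 threads, 30 s ramp-up, 10 loops.
print(peak_concurrent_threads(30, 30, 10, 0.5))  # 5
```

Under these assumptions only about 5 of the 30 threads overlap at any moment, because each thread finishes its 10 loops in roughly 5 seconds while a new thread starts every second. A higher loop count or a shorter ramp-up would bring the test closer to 30 truly concurrent users.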
You can use the Aggregate Report instead of the Summary Report. In performance testing, the following can be used for analysis:
Throughput (requests/second)
Response time (90th percentile)
Target application resource utilization (CPU, processor queue length and memory)
Normally the SLA for websites is a response time of 3 seconds, but this requirement changes from application to application.
Your test results are good, assuming the users are actually logging into the system/portal.
Samples: The number of requests sent to a particular module.
Average: The average response time over the 300 samples.
Min: The minimum response time among the 300 samples (the fastest of the 300).
Max: The maximum response time among the 300 samples (the slowest of the 300).
Standard Deviation: A measure of the variation in response time (over the 300 samples).
Error: The percentage of failed requests.
Throughput: The number of requests processed per second.
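To make the columns above concrete, here is a small sketch that computes them from raw samples. It is an illustration only, not JMeter's implementation: the sample data is invented, and the population standard deviation is an assumption about which form of the statistic is reported.

```python
import statistics


def summarize(samples, test_duration_s):
    """Compute Summary Report style figures.

    samples: list of (elapsed_ms, success) tuples, one per request.
    test_duration_s: wall-clock duration of the test in seconds.
    """
    elapsed = [ms for ms, _ in samples]
    failures = sum(1 for _, ok in samples if not ok)
    return {
        "samples": len(samples),
        "average_ms": statistics.mean(elapsed),
        "min_ms": min(elapsed),
        "max_ms": max(elapsed),
        "std_dev_ms": statistics.pstdev(elapsed),  # assumption: population form
        "error_pct": 100.0 * failures / len(samples),
        "throughput_per_s": len(samples) / test_duration_s,
    }


# Hypothetical data: four requests over two seconds, one of them failed.
report = summarize([(200, True), (400, True), (300, True), (300, False)], 2.0)
for name, value in report.items():
    print(name, value)
```

A standard deviation that is large relative to the average means individual response times vary a lot, which is why looking at percentiles alongside the average is useful.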
Hope this will help.
