Achieving concurrency through FAIR scheduling in Spark - apache-spark

My Environment:
I'm connecting to Cassandra through the Spark Thrift Server. I create a meta-table in the Hive Metastore which holds the Cassandra table data, and a web application connects to this meta-table through the JDBC driver. I have enabled fair scheduling for the Spark Thrift Server.
Issue:
When I run a concurrency load test through JMeter with 100 users for a 300-second duration, I get sub-second response times for the initial requests (roughly the first 30 seconds). Then the response time gradually increases to around 2 to 3 seconds. When I check the Spark UI, every job executes in less than 100 milliseconds, yet jobs and tasks sit in a pending state when requests arrive. So I assume that even though the tasks take sub-second times to execute, they are submitted with some latency by the scheduler. How can I fix this latency in job submission?
Following are my configuration details,
Number of Workers - 2
Number of Executors per Worker - 1
Number of Cores per Executor - 14
Total Cores of Workers - 30
Memory per Executor - 20 GB
Total Memory of Workers - 106 GB
Configuration in the fair scheduler XML (fairscheduler.xml)
<pool name="default">
  <schedulingMode>FAIR</schedulingMode>
  <weight>2</weight>
  <minShare>15</minShare>
</pool>
<pool name="test">
  <schedulingMode>FIFO</schedulingMode>
  <weight>2</weight>
  <minShare>3</minShare>
</pool>
I'm executing in Spark Standalone mode.

Isn't it just the case that queries are pending in the queue while others are running? Try reducing spark.locality.wait to, say, 1s.
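A minimal sketch of how these settings could be passed when starting the Thrift Server (the fairscheduler.xml path and the values are placeholders to adapt to your deployment):
./sbin/start-thriftserver.sh \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.scheduler.allocation.file=/path/to/fairscheduler.xml \
  --conf spark.locality.wait=1s
A JDBC session can then be pinned to one of the pools defined above with:
SET spark.sql.thriftserver.scheduler.pool=default;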

Related

The specific role of worker in DolphinDB?

What do worker, secondaryWorker, web worker, infra worker, dynamic worker, and local executor each stand for in DolphinDB? Why were the secondaryWorker and dynamic worker introduced, and what are they used for?
worker: the thread for regular interactive jobs. It divides clients' requests into subtasks on receipt. Depending on the task granularity, the worker either executes the tasks itself or allocates them to a local or remote executor. The number of workers can be set with the configuration parameter workerNum; the default is determined by the number of CPU cores.
secondary worker: the thread for secondary jobs. It is used to avoid job loops and to resolve deadlocks caused by circular dependencies between tasks. The upper limit can be set with the configuration parameter secondaryWorkerNum; the default is workerNum.
web worker: the thread that processes HTTP requests. DolphinDB provides a web interface for cluster management, allowing users to interact with DolphinDB nodes. The upper limit can be set with the configuration parameter webWorkerNum; the default is 1.
infra worker: the thread that reports heartbeats within a cluster. It solves the problem of heartbeats not reaching the master in time when the cluster is under heavy load.
dynamic worker: a dynamic working thread that supplements the workers. If a new task arrives when all worker threads are occupied, the system creates a dynamic worker thread to perform it. The upper limit can be set with the configuration parameter maxDynamicWorker; the default is workerNum. A dynamic worker is recycled after being idle for 60 seconds to release memory resources.
local executor: a local thread that executes the subtasks allocated by workers. Each local executor executes one task at a time, and all worker threads share the local executors. The number of local executors can be set with the configuration parameter localExecutors; the default is the number of CPU cores minus 1. The numbers of workers and local executors directly determine the system's performance for concurrent computing.
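These parameters are ordinarily set as key=value lines in the node's configuration file (e.g. dolphindb.cfg). A hedged sketch with purely illustrative values, not recommendations:
workerNum=8
secondaryWorkerNum=8
webWorkerNum=2
maxDynamicWorker=8
localExecutors=7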

Cassandra - Frequent cross-node timeouts

I am observing timeouts in the Cassandra cluster with the following logs in debug.log:
time 1478 msec - slow timeout 500 msec/cross-node
Does this mean that the read request spent 1378 ms waiting for the other replicas to respond?
NTP is in sync for this cluster, which holds little data and has ample CPU and memory allocated.
Is setting cross_node_timeout: true going to help?
Cassandra version: 3.11.6
Thanks
The value 1478 msec reported in the logs is the time recorded for that particular query to execute. The cross-node part signifies that the query/operation was performed across nodes. This is just a warning that your queries are running slower than expected. The default slow-query threshold is 500 ms and can be set in cassandra.yaml via slow_query_log_timeout_in_ms.
If this is a one-off entry in your logs, it could have been caused by GC. If it shows up consistently, then something is wrong in your environment (network, etc.) or with your query.
Regarding the property cross_node_timeout: true, it was introduced via CASSANDRA-4812. Its purpose is to avoid timeouts in case NTP is not synced across nodes. The default value is false. Since NTP is synced on your cluster, you can set it to true, but it will not help with the message you are getting.
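Both settings live in cassandra.yaml; a minimal sketch showing the values discussed above (not tuning advice):
# cassandra.yaml
slow_query_log_timeout_in_ms: 500   # threshold above which queries are logged as slow
cross_node_timeout: true            # let replicas use the coordinator's timestamp to age out requests (needs synced clocks)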

Cassandra write query timing out after PT2S

I have a monolithic Cassandra application where I want to write at a high rate, reading payloads from a queue. The Cassandra cluster has 3 nodes. When I start processing a large number of messages in parallel (by spawning threads) I get the exception below:
java.util.concurrent.ExecutionException: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2S
I am creating the CqlSession as a bean:
return CqlSession.builder().addContactPoints(contactPoints)
/*.addContactPoint(new InetSocketAddress("localhost", 9042))*/
.withConfigLoader(new DefaultDriverConfigLoader()).withLocalDatacenter("datacenter1")
.addTypeCodecs(new CustomDateCodec())
.withKeyspace("dev").build();
I am injecting this CqlSession into my mapper and other classes to run queries.
In the DataStax driver I have given the IPs of the 3 nodes as contact points.
Is there any tuning I need to do in the CqlSession creation, or on my Cassandra nodes, so that they can take writes at high concurrency?
Also, how many writes can I do in parallel?
All are update statements without any IF condition, keyed only on the primary key.
The timeout you're seeing is a result of your app overloading the cluster, effectively doing a DDoS attack.
PT2S is ISO-8601 notation for a 2-second timeout. There will come a point when the commitlog disks can only take so much write IO. If you're seeing dropped mutations in the logs or in nodetool tpstats, that's confirmation that the commitlog can't keep up with the writes.
If your cluster can sustain 10K writes/sec but your app is doing 20K writes/sec, then you need to double the size of your cluster (add more nodes) to support the throughput requirement. Cheers!
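Since the session above is built with DefaultDriverConfigLoader, driver settings are presumably read from an application.conf on the classpath. A hedged sketch for driver 4.x, raising the request timeout and adding a client-side throttler to cap in-flight writes (values are illustrative; limiting concurrency or adding nodes is the real fix rather than just raising the timeout):
datastax-java-driver {
  basic.request.timeout = 5 seconds
  advanced.throttler {
    class = ConcurrencyLimitingRequestThrottler
    max-concurrent-requests = 1024
    max-queue-size = 10000
  }
}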

Heartbeat, poll interval and session timeout for Spark Streaming with Kafka

Using Spark Streaming with Kafka - Direct Approach - Doc
Spark version - 2.3.2
Spark Streaming version - spark-streaming-kafka-0-10_2.11
Problem: I need to run the streaming application with a batch interval of 10 minutes, but the default timeouts are much smaller than 10 minutes, so how should the following parameters be configured:
heartbeat.interval.ms
session.timeout.ms
group.max.session.timeout.ms
group.min.session.timeout.ms
max.poll.interval.ms
Given that the batch interval is 10 minutes.
Also, does setting these to certain values affect all consumer groups (existing ones and any added in the future)?
If yes, how can these params be configured for a specific consumer group only?
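With the direct approach, consumer settings are passed through the kafkaParams map when the stream is created, so they apply only to the consumer group this application uses; group.max.session.timeout.ms and group.min.session.timeout.ms, by contrast, are broker-side settings that bound what any group may request. A hedged Scala sketch with placeholder brokers, topic, group id and values:
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val ssc = new StreamingContext(new SparkConf().setAppName("kafka-direct-sketch"), Minutes(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092,broker2:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-streaming-group",
  // heartbeat.interval.ms should stay well below session.timeout.ms,
  // and session.timeout.ms must lie within the broker's group.min/max bounds.
  "heartbeat.interval.ms" -> (60000: java.lang.Integer),
  "session.timeout.ms" -> (180000: java.lang.Integer),
  // max.poll.interval.ms must exceed the worst-case time to process one batch.
  "max.poll.interval.ms" -> (660000: java.lang.Integer)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))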

Kafka consumer request timeout

I have a Spark streaming (Scala) application running in CDH 5.13, consuming messages from Kafka using client 0.10.0. My Kafka cluster contains 3 brokers. The Kafka topic is divided into 12 partitions, evenly distributed across these 3 brokers. My Spark streaming consumer has 12 executors with 1 core each.
Spark streaming starts by reading millions of messages from Kafka in each batch, but then reduces the number to thousands because Spark cannot cope with the load and a queue of unprocessed batches builds up. That is fine; my expectation is that Spark processes the small batches very quickly and returns to normal. However, I see that from time to time one of the executors that processes only a few hundred messages gets a 'request timeout' error just after reading the last offset from Kafka:
DEBUG org.apache.clients.NetworkClient Disconnecting from node 12345 due to request timeout
After this error, the executor sends several RPC requests to the driver that take ~40 seconds, and after this time the executor reconnects to the same broker from which it disconnected.
My question is: how can I prevent this request timeout, and what is the best way to find its root cause?
Thank you
The root cause of the disconnection was that the response to a data request arrived from Kafka too late, i.e. after the request.timeout.ms interval, which was set to the default of 40000 ms. The disconnection problem was fixed when I increased this value.
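With the 0.10 integration this is just another entry in the kafkaParams map used to create the direct stream; a minimal hedged sketch (the value is illustrative):
import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092,broker2:9092,broker3:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-streaming-group",
  // Raised from the 40000 ms default so that slow broker responses no longer
  // trigger the disconnect/reconnect cycle described above.
  "request.timeout.ms" -> (120000: java.lang.Integer)
)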
