Best practice for Slick 2.1 / 3 execution context usage - multithreading

We use Slick (2.1.0) with Spray-io (1.3.3). Currently we are facing an issue because we use the same execution context for both the Spray HTTP API part and background running jobs accessing the same database. All database / blocking calls are wrapped in futures using the same scala.concurrent.ExecutionContext.global execution context.
When the background jobs start doing heavy work, they'll consume all available threads, which will lead to timeouts on the API side since their aren't any available threads to handle the API work.
The obvious solution would be to use different execution contexts for both parts with a total thread count not higher than the configured DB connection pool (HikariCP). (as partially suggested here https://www.playframework.com/documentation/2.1.0/ThreadPools#Many-specific-thread-pools) but how would such a setup work with Slick 3 where the execution context is tied to the DB configuration itself?

Slick3 comes with own execution context and number of threads are configurable.You can tweak all the connection pool setting for example(MySQL):
dev-dbconf={
dataSourceClass = "com.mysql.jdbc.jdbc2.optional.MysqlDataSource"
numThreads = 10 //for execution context
maxConnections = 10
minConnections = 5
connectionTimeout = 10000
initializationFailFast = false
properties {
user = "root"
password = "root"
databaseName = "db_name"
serverName = "localhost"
}
}
In this config you can change number of thread according your requirement.
I would like advise you never used "scala.concurrent.ExecutionContext.global" for IO. Because default ExecutionContext comes with fork-join thread pool which is not good for IO.You can create own thread pool for IO:
import scala.concurrent.ExecutionContext
import java.util.concurrent.Executors
object MyExecutionContext {
private val concorrency = Runtime.getRuntime.availableProcessors()
private val factor = 3 // get from configuration file
private val noOfThread = concorrency * factor
implicit val ioThreadPool: ExecutionContext = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(noOfThread))
}
// Use this execution context for IO instead of scala execution context.
import MyExecutionContext.ioThreadPool
Future{
// your blocking IO code
}
You can change noOfThread according to your requirement. It would be good if you set number of thread according number processors in your machine.
For more info, your can see Best Practices for Using Slick on Production
and Slick Doc.

Related

Oracle Connection retry using spark

I am trying to Spark to Oracle. If my connection fails, job is failing. Instead, I want to set some connection retry limit to ensure its trying to reconnect as per the limit and then fail the job if its not connecting.
Please suggest on how we could implement this.
Let's assume you are using PySpark. Recently I used this in my project so I know this works.
I have used retry PyPi project
retry 0.9.2
and its application passed through extensive testing process
I used a Python class to hold the retry related configurations.
class RetryConfig:
retry_count = 1
delay_interval = 1
backoff_multiplier = 1
I collected the application parameter from runtime configurations and set them as below:
RetryConfig.retry_count = <retry_count supplied from config>
RetryConfig.delay_interval = <delay_interval supplied from config>
RetryConfig.backoff_multiplier = <backoff_multiplier supplied from config>
Then applied the on the method call that connects the DB
#retry((Exception), tries=RetryConfig.retry_count, delay=RetryConfig.delay_interval, backoff=RetryConfig.backoff_multiplier)
def connect(connection_string):
print("trying")
obj = pyodbc.connect(connection_string)
return obj
Backoff will increase the delay by backoff multiplication factor with each retry - a quite common functional ask.
Cheers!!

Access mutable property of immutable broadcast variable

I'm making Spark App but get stuck on broadcast variable. According to document, broadcast variable should be 'read only'. What if it's properties are mutable?
In local, it works like variable. I don't have cluster environment, so ...
case object Var {
private var a = 1
def get() = {
a = a + 1
a
}
}
val b = sc.broadcast(Var)
// usage
b.value.get // => 2
b.value.get // => 3
// ...
Is this wrong usage of broadcast? It seems destroy the broadcast variable's consistency.
Broadcasts are moved from the driver JVM to executor JVMs once per executor. What happens is Var would get serialized on driver with its current a, then copied and deserialized to all executor JVMs. Let's say get was never called on driver before broadcasting. Now all executors get a copy of Var with a = 1 and whenever they call get, the value of a in their local JVM gets increased by one. That's it, nothing else happens and the changes of a won't get propagated to any other executor or the driver and the copies of Var will be out of sync.
Is this wrong usage of broadcast?
Well, the interesting question is why would you do that as only the initial value of a gets transferred. If the aim is to build local counters with a common initial value it technically works but there are much cleaner ways to implement that. If the aim is to get the value changes back to the driver then yes, it is wrong usage and accumulators should be used instead.
It seems destroy the broadcast variable's consistency.
Yep, definitely as explained earlier.

Performance bottlenecks when using async-await with Azure Storage API

I'm hitting a performance bottleneck, on insertion requests using the Azure Table Storage API. I'm trying to reach of a speed of at least 1 insert per 30ms into a table (unique partition keys).
What is the recommended way to achieve this request rate and how can I fix my program to overcome my bottleneck?
I have a test programs that inserts into the azure table at roughly 1 / 30ms. With this test program, the latency continuously increases and requests begin to take even more than 15 seconds per insert.
Below is the code for my test program. It creates async tasks that log the time it takes to await on the CloudTable ExecuteAsync method. Unfortunately, the insertion latency just grows as the program runs.
List<Task> tasks = new List<Task>();
while (true)
{
Thread.Sleep(30);
tasks = tasks.Where(t => t.IsCompleted == false).ToList(); // Remove completed tasks
DynamicTableEntity dte = new DynamicTableEntity() { PartitionKey = Guid.NewGuid().ToString(), RowKey = "abcd" };
tasks.Add(AddEntityToTableAsync(dte));
}
...
public static async Task<int> AddEntityToTableAsync<T>(T entity) where T : class, ITableEntity
{
Stopwatch timer = Stopwatch.StartNew();
var tableResult = await this.cloudTable.ExecuteAsync(TableOperation.InsertOrReplace(entity));
timer.Stop();
Console.WriteLine($"Table Insert Time: {timer.ElapsedMilliseconds}, Inserted {entity.PartitionKey}");
return tableResult.HttpStatusCode;
}
I thought that it might be my test program running out of threads for the outgoing Network IO, so I tried monitoring the available thread counts during the program's execution:
ThreadPool.GetAvailableThreads(out workerThreads, out completionIoPortThreads);
It showed that nearly all the IO threads were available during execution (Just in case, I even tried increasing the available threads but that had no affect on the issue).
As I understand it, for async tasks, the completion port threads don't get "reserved" until there's data on them to process, so I started thinking that there might be an issue with my connection to Azure Table Storage.
However, I confirmed that was not the case by lowering the request rate (1 insert / 100ms) and launching 30 instances of my test program on the same machine. With 30 instances, I was able to maintain a stable ~90ms / insert without any increase in latency.
What can I do to enable a single test program to achieve a simillar performance that I was getting when running 30 programs on the same machine?
The test program was hitting the System.Net.ServicePointManager.DefaultConnectionLimit. The default value is 2
Increasing the number to 100 fixes the problem. And allows the single program to achieve the same speed as the 30 programs scenario

Using cassandra python driver with twisted

My python application uses twisted, and uses cassandra python driver under the hood. Cassandra python driver can use cassandra.io.twistedreactor.TwistedConnection as a connection class to use twisted as a way to query.
TwistedConnection class uses timer and reactor.callLater to check if a query task has timed out.
The problem is when I use cassandra ORM (cassandra.cqlengine.models.Model) to query.
from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model
# ORM for user settings
class UserSettings(Model):
userid = columns.Text(primary_key=True)
settings = columns.Text()
# Function registered with autobahn/wamp
def worker():
userid = "96c5d462-cf7c-11e7-b567-b8e8563d0920"
def _query():
# This is a blocking call, internally calling twisted reactor
# to collect the query result
setting = model.UserSettings.objects(userid=userid).get()
return json.loads(setting.settings)
threads.deferToThread(_query)
When run in twisted.trial unit tests. The test that uses above code always fails with
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was
unclean.
DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
<DelayedCall 0x10e0a2dd8 [9.98250699043274s] called=0 cancelled=0 TwistedLoop._on_loop_timer()
In the autobahn worker where this code used, however works fine.
The cassandra driver code for TwistedConnection, keeps on calling callLater, and I could not find a way to find if any of these calls are still pending, as these calls are hidden in the TwistedLoop class.
Questions:
Is this correct way of handling cassandra query (which in turn would call twisted reactor)
If yes, is there a way to address DelayedCall resulting from cassandra driver timeout (reactor.callLater).
Just my understanding:
Maybe you will need to call .filter function while filtering? as mentioned in docs setting = model.UserSettings.objects.filter(userid=userid).get()
Maybe work around by change response time in Cassandra conf yaml file?

Akka scheduler : strange behavior in production (messages not firing)

I'm developing an scala + akka app as part as a bigger application. The purpose of the app is to call external services and SQL databases (using JDBC), do some processing, and return a parsed result, on a recurrent basis. The app uses akka cluster so that it can scale horizontally.
How it should work
I'm creating a **singleton actor* on the cluster who's responsible for sending instructions to a pool of instruction handlers actors. I'm receiving events from a Redis pub/sub channel that state which datasources should be refreshed and how often. This SourceScheduler actor stores in an internal Array the instruction along with the interval.
Then I'm using akka Scheduler to execute a tick function every second. This function filters the array to determine which instructions need to be executed, and sends messages to the instructions handlers pool. The routees in the pool execute the instructions and emit the results through Redis Pub/Sub
The issue
On my machine (Ryzen 7 + 16GB RAM + ArchLinux) everything runs fine and we're processing easily 2500 database calls/second. But once in production, I cannot get it to process more than ~400 requests/s.
The SourceScheduler doesn't tick every second, and messages get stuck in the mailbox. Also, the app uses more CPU resources, and way more RAM (1.3GB in production vs ~350MB on my machine)
The production app runs in a JRE-8 alpine-based Docker container on Rancher, on a MS Azure server.
I understand that singleton actors on clusters can be a bottleneck, but since it only forwards messages to other actors I don't see how it could block.
What I've tried
I use Tomcat JDBC as connection pool manager for SQL queries. I'm sure I don't leak any a connection for I log every connection that is borrowed from the pool and every connection that returns to it
Blocking operations like JDBC queries are all executed on a separate dispatcher, a fixed thread pool executer with 500 threads, so all other actors should run properly
I've also given the SourceScheduler actor a dedicated pinned dispatcher so it should run on it's own thread
I've tried running the app in cluster with 3 nodes, with no performance improvement. Since the SourceScheduler is a singleton, running multiple nodes does not resolve the issue
I've tried the app on my coworker's machine. Works like a charm. I'm only experiencing issues with the production server
I've tried upgrading the production server to the most powerful available on Azure (16 cores, 2.3ghz) with no noticeable change
As anyone ever experienced such differences between their local machine and the production server ?
EDIT SourceScheduler.scala
class SourceScheduler extends Actor with ActorLogging with Timers {
case object Tick
case object SchedulerReport
import context.dispatcher
val instructionHandlerPool = context.actorOf(
ClusterRouterGroup(
RoundRobinGroup(Nil),
ClusterRouterGroupSettings(
totalInstances = 10,
routeesPaths = List("/user/instructionHandler"),
allowLocalRoutees = true
)
).props(),
name = "instructionHandlerRouter")
var ticks: Int = 0
var refreshedSources: Int = 0
val maxTicks: Int = Int.MaxValue - 1
var scheduledSources = Array[(String, Int, String)]()
override def preStart(): Unit = {
log.info("Starting Scheduler")
}
def refreshSource(hash: String) = {
instructionHandlerPool ! Instruction(hash)
refreshedSources += 1
}
// Get sources that neeed to be refreshed
def getEligibleSources(sources: Seq[(String, Int, String)], tick: Int) = {
sources.groupBy(_._1).mapValues(_.toList.minBy(_._2)).values.filter(tick * 1000 % _._2 == 0).map(_._1)
}
def tick(): Unit = {
ticks += 1
log.debug("Scheduler TICK {}", ticks)
val eligibleSources = getEligibleSources(scheduledSources, ticks)
val chunks = eligibleSources.grouped(ConnectionPoolManager.connectionPoolSize).zipWithIndex.toList
log.debug("Scheduling {} sources in {} chunks", eligibleSources.size, chunks.size)
chunks.foreach({
case(sources, index) =>
after((index * 25 + 5) milliseconds, context.system.scheduler)(Future.successful {
sources.foreach(refreshSource)
})
})
if(ticks >= maxTicks) ticks = 0
}
timers.startPeriodicTimer("schedulerTickTimer", Tick, 990 milliseconds)
timers.startPeriodicTimer("schedulerReportTimer", SchedulerReport, 10 seconds)
def receive: Receive = {
case AttachSource(hash, interval, socketId) =>
scheduledSources.synchronized {
scheduledSources = scheduledSources :+ ((hash, interval, socketId))
}
case DetachSource(socketId) =>
scheduledSources.synchronized {
scheduledSources = scheduledSources.filterNot(_._3 == socketId)
}
case SchedulerReport =>
log.info("{} sources were scheduled since last report", refreshedSources)
refreshedSources = 0
case Tick => tick()
case _ =>
}
}
Each source has is determined by a hash containing all required data for the execution (like the host of the database for example), the refresh interval, and the unique id of the client that asked for it so we can stop refreshing when the client disconnects.
Each second, we check if the source needs to be refreshed by applying a modulo with the current value of the ticks counter.
We refresh sources in smaller chunks to avoid connection pool starvation
The problem is that under a small load (~300 rq/s) the tick function is no longer executed every second
It turns out the issue was with Rancher.
We did several tests and the app was running fine on the machine directly, and on docker, but not when using Rancher as the orchestrator. I'm not sure why but since it's not related to Akka I'm closing the issue.
Thanks everyone for your help.
Maybe the bottleneck is on the network latency? In your machine all components are running side by side and communication should have no latency but in the cluster, if you are making a high number of database calls from one machine to another the network latency may be noticeable.

Resources