Performance bottlenecks when using async-await with Azure Storage API - azure

I'm hitting a performance bottleneck on insert requests using the Azure Table Storage API. I'm trying to reach a rate of at least 1 insert per 30ms into a table (unique partition keys).
What is the recommended way to achieve this request rate and how can I fix my program to overcome my bottleneck?
I have a test program that inserts into the Azure table at roughly 1 insert per 30ms. With this test program, the latency continuously increases and requests eventually take more than 15 seconds per insert.
Below is the code for my test program. It creates async tasks that log the time it takes to await on the CloudTable ExecuteAsync method. Unfortunately, the insertion latency just grows as the program runs.
List<Task> tasks = new List<Task>();
while (true)
{
    Thread.Sleep(30);
    tasks = tasks.Where(t => !t.IsCompleted).ToList(); // Remove completed tasks
    DynamicTableEntity dte = new DynamicTableEntity() { PartitionKey = Guid.NewGuid().ToString(), RowKey = "abcd" };
    tasks.Add(AddEntityToTableAsync(dte));
}
...
// cloudTable is assumed to be a static CloudTable field (a static method cannot use "this")
public static async Task<int> AddEntityToTableAsync<T>(T entity) where T : class, ITableEntity
{
    Stopwatch timer = Stopwatch.StartNew();
    var tableResult = await cloudTable.ExecuteAsync(TableOperation.InsertOrReplace(entity));
    timer.Stop();
    Console.WriteLine($"Table Insert Time: {timer.ElapsedMilliseconds}, Inserted {entity.PartitionKey}");
    return tableResult.HttpStatusCode;
}
I thought that it might be my test program running out of threads for the outgoing Network IO, so I tried monitoring the available thread counts during the program's execution:
ThreadPool.GetAvailableThreads(out workerThreads, out completionIoPortThreads);
It showed that nearly all the IO threads were available during execution (just in case, I even tried increasing the available threads, but that had no effect on the issue).
As I understand it, for async tasks, the completion port threads don't get "reserved" until there's data on them to process, so I started thinking that there might be an issue with my connection to Azure Table Storage.
However, I confirmed that was not the case by lowering the request rate (1 insert / 100ms) and launching 30 instances of my test program on the same machine. With 30 instances, I was able to maintain a stable ~90ms / insert without any increase in latency.
What can I do to enable a single test program to achieve performance similar to what I was getting when running 30 programs on the same machine?

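The test program was hitting System.Net.ServicePointManager.DefaultConnectionLimit. The default value is 2, so only two requests to the storage endpoint could be in flight at once and the rest queued up on the client, which is why the latency kept growing.
Increasing the limit to 100 fixes the problem and allows the single program to achieve the same speed as in the 30-program scenario.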
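A minimal sketch of how the limit might be raised, assuming a .NET Framework process where ServicePointManager governs outgoing HTTP connections. The class name ConnectionLimitSetup and the method Apply are hypothetical. The setting only affects connections to endpoints that have not been contacted yet, so it should run at startup, before the first storage request.

```csharp
using System;
using System.Net;

public static class ConnectionLimitSetup
{
    // Hypothetical helper: call once at startup, before the first request
    // to the storage endpoint is made.
    public static void Apply(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
    }

    public static void Main()
    {
        Apply(100); // 100 is the value reported to work in the answer above
        Console.WriteLine(ServicePointManager.DefaultConnectionLimit);
    }
}
```

The same limit can also be set per endpoint or in the application configuration file; the code form is shown here only because it is the shortest to try in a test program.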

Related

Throttling EF queries to save DTUs

We have an ASP.NET application using EF 6 hosted in Azure. The database runs at about 20% DTU usage most of the time, except during certain rare actions.
These are almost like db dumps in Excel format, like having all orders of the last X years etc. which the (power) users can trigger and then get the result later by email.
The problem is that these queries use up all the DTUs and the whole application slows to a crawl. We would like to throttle these non-critical queries, as it doesn't matter if they take 10-15 minutes longer.
While googling I found the option to reduce the DEADLOCK_PRIORITY, but this won't fix the issue of using up all resources.
Thanks for any pointers, ideas or solutions.
Optimizing is going to be hard as it is more or less a db dump.
Azure SQL Database doesn't have Resource Governor available, so you'll have to handle this in code.
Azure SQL Database runs in READ COMMITTED SNAPSHOT mode, so slowing down the session that dumps the data from a table (or any streaming query plan) should reduce its DTU consumption without adversely affecting other sessions.
To do this put waits in the loop that reads the query results, either an IEnumerable<TEntity> returned from a LINQ query or a SqlDataReader returned from an ADO.NET SqlCommand.
But you'll have to directly loop over the streaming results. You can't copy the query results into memory first using IQueryable<TEntity>.ToList() or DataTable.Load(), SqlDataAdapter.Fill(), etc as that would read as fast as possible.
For example:
var results = new List<TEntity>();
int rc = 0;
using (var dr = cmd.ExecuteReader())
{
    while (dr.Read())
    {
        rc++;
        var e = new TEntity();
        e.Id = dr.GetInt32(0);
        e.Name = dr.GetString(1);
        // ...
        results.Add(e);
        // Pause after every 100 rows to slow down the read
        if (rc % 100 == 0)
            Thread.Sleep(100);
    }
}
or
var results = new List<TEntity>();
int rc = 0;
// Enumerate the query directly; do not call ToList() first
foreach (var e in db.MyTable.AsEnumerable())
{
    rc++;
    results.Add(e);
    // Pause after every 100 rows to slow down the read
    if (rc % 100 == 0)
        Thread.Sleep(100);
}
For extra credit, use async waits and stream the results directly to the client without batching in memory.
Alternatively, or in addition, you can limit the number of sessions that can concurrently perform the dump to one, or one per table, etc using named Application Locks.
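The pausing loop described above could be factored into a reusable helper. This is only a sketch: the extension method Throttle and the Demo class are hypothetical names, and Enumerable.Range stands in for a streaming query result such as db.MyTable.AsEnumerable().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ThrottleExtensions
{
    // Yields items unchanged, sleeping for `pause` after every `batchSize` items.
    // Because this is an iterator, nothing runs until the result is enumerated.
    public static IEnumerable<T> Throttle<T>(this IEnumerable<T> source, int batchSize, TimeSpan pause)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        int count = 0;
        foreach (var item in source)
        {
            yield return item;
            if (++count % batchSize == 0)
                Thread.Sleep(pause);
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // Stand-in for a streaming query result
        IEnumerable<int> rows = Enumerable.Range(1, 250);
        int seen = 0;
        foreach (var row in rows.Throttle(100, TimeSpan.FromMilliseconds(100)))
            seen++;
        Console.WriteLine(seen);
    }
}
```

The helper only slows the consumer down. It reduces load on the database only if the underlying source really streams rows as they are read, which is the precondition the answer states.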

AWS DocumentDB Performance Issue with Concurrency of Aggregations

I'm working with DocumentDB in AWS, and I've been having trouble when I try to read from the same collection simultaneously with different aggregation queries.
The issue is not that I cannot read from the database, but rather that it takes a lot of time to complete the queries. It doesn't matter if I trigger the queries simultaneously or one after the other.
I'm using a Lambda Function with NodeJS to run my code. And I'm using mongoose to handle the connection with the database.
Here's a sample code that I put together to illustrate my problem:
query1() {
return Collection.aggregate([...])
}
query2() {
return Collection.aggregate([...])
}
query3() {
return Collection.aggregate([...])
}
It takes the same time if I run them using Promise.all
Promise.all([ query1(), query2(), query3() ])
as it does if I run each one after the previous one has finished
query1().then(result1 => query2().then(result2 => query3()))
While if I run each query in different Lambda Executions, it takes significantly less time for each individual query to finish (Between 1 and 2 seconds).
So if they were running in parallel the execution should be finished with the time of the query that takes the most time (2 seconds), and not take 7 seconds, as it does now.
So my guess is that the DocumentDB instance is running the queries in sequence no matter how I send them. In the collection there are around 19,000 documents with a total size of almost 25Mb.
When I check the metrics of the instance, the CPUUtilization is barely over 8% and the RAM available only drops by 20Mb. So I don't think the problem of the delay has to do with the size of the instance.
Do you know why DocumentDB is behaving like this? Is there a configuration that I can change to run the aggregations in parallel?

Akka scheduler : strange behavior in production (messages not firing)

I'm developing a Scala + Akka app as part of a bigger application. The purpose of the app is to call external services and SQL databases (using JDBC), do some processing, and return a parsed result, on a recurring basis. The app uses Akka Cluster so that it can scale horizontally.
How it should work
I'm creating a singleton actor on the cluster that is responsible for sending instructions to a pool of instruction handler actors. I'm receiving events from a Redis pub/sub channel that state which datasources should be refreshed and how often. This SourceScheduler actor stores each instruction along with its interval in an internal Array.
Then I'm using the Akka Scheduler to execute a tick function every second. This function filters the array to determine which instructions need to be executed, and sends messages to the instruction handler pool. The routees in the pool execute the instructions and emit the results through Redis Pub/Sub.
The issue
On my machine (Ryzen 7 + 16GB RAM + ArchLinux) everything runs fine and we're processing easily 2500 database calls/second. But once in production, I cannot get it to process more than ~400 requests/s.
The SourceScheduler doesn't tick every second, and messages get stuck in the mailbox. Also, the app uses more CPU resources, and way more RAM (1.3GB in production vs ~350MB on my machine)
The production app runs in a JRE-8 alpine-based Docker container on Rancher, on a MS Azure server.
I understand that singleton actors on clusters can be a bottleneck, but since it only forwards messages to other actors I don't see how it could block.
What I've tried
I use Tomcat JDBC as the connection pool manager for SQL queries. I'm sure I don't leak any connections, because I log every connection that is borrowed from the pool and every connection that returns to it.
Blocking operations like JDBC queries are all executed on a separate dispatcher, a fixed thread pool executor with 500 threads, so all other actors should run properly.
I've also given the SourceScheduler actor a dedicated pinned dispatcher so it should run on its own thread.
I've tried running the app in a cluster with 3 nodes, with no performance improvement. Since the SourceScheduler is a singleton, running multiple nodes does not resolve the issue.
I've tried the app on my coworker's machine. Works like a charm. I'm only experiencing issues with the production server.
I've tried upgrading the production server to the most powerful available on Azure (16 cores, 2.3 GHz) with no noticeable change.
Has anyone ever experienced such differences between their local machine and the production server?
EDIT SourceScheduler.scala
class SourceScheduler extends Actor with ActorLogging with Timers {
  case object Tick
  case object SchedulerReport
  import context.dispatcher
  val instructionHandlerPool = context.actorOf(
    ClusterRouterGroup(
      RoundRobinGroup(Nil),
      ClusterRouterGroupSettings(
        totalInstances = 10,
        routeesPaths = List("/user/instructionHandler"),
        allowLocalRoutees = true
      )
    ).props(),
    name = "instructionHandlerRouter")
  var ticks: Int = 0
  var refreshedSources: Int = 0
  val maxTicks: Int = Int.MaxValue - 1
  var scheduledSources = Array[(String, Int, String)]()
  override def preStart(): Unit = {
    log.info("Starting Scheduler")
  }
  def refreshSource(hash: String) = {
    instructionHandlerPool ! Instruction(hash)
    refreshedSources += 1
  }
  // Get sources that need to be refreshed
  def getEligibleSources(sources: Seq[(String, Int, String)], tick: Int) = {
    sources.groupBy(_._1).mapValues(_.toList.minBy(_._2)).values.filter(tick * 1000 % _._2 == 0).map(_._1)
  }
  def tick(): Unit = {
    ticks += 1
    log.debug("Scheduler TICK {}", ticks)
    val eligibleSources = getEligibleSources(scheduledSources, ticks)
    val chunks = eligibleSources.grouped(ConnectionPoolManager.connectionPoolSize).zipWithIndex.toList
    log.debug("Scheduling {} sources in {} chunks", eligibleSources.size, chunks.size)
    chunks.foreach({
      case(sources, index) =>
        after((index * 25 + 5) milliseconds, context.system.scheduler)(Future.successful {
          sources.foreach(refreshSource)
        })
    })
    if(ticks >= maxTicks) ticks = 0
  }
  timers.startPeriodicTimer("schedulerTickTimer", Tick, 990 milliseconds)
  timers.startPeriodicTimer("schedulerReportTimer", SchedulerReport, 10 seconds)
  def receive: Receive = {
    case AttachSource(hash, interval, socketId) =>
      scheduledSources.synchronized {
        scheduledSources = scheduledSources :+ ((hash, interval, socketId))
      }
    case DetachSource(socketId) =>
      scheduledSources.synchronized {
        scheduledSources = scheduledSources.filterNot(_._3 == socketId)
      }
    case SchedulerReport =>
      log.info("{} sources were scheduled since last report", refreshedSources)
      refreshedSources = 0
    case Tick => tick()
    case _ =>
  }
}
Each source is identified by a hash containing all the data required for execution (the host of the database, for example), the refresh interval, and the unique id of the client that asked for it, so we can stop refreshing when the client disconnects.
Each second, we check whether the source needs to be refreshed by applying a modulo to the current value of the ticks counter.
We refresh sources in smaller chunks to avoid connection pool starvation.
The problem is that under a small load (~300 requests/s) the tick function is no longer executed every second.
It turns out the issue was with Rancher.
We did several tests and the app was running fine on the machine directly, and on docker, but not when using Rancher as the orchestrator. I'm not sure why but since it's not related to Akka I'm closing the issue.
Thanks everyone for your help.
Maybe the bottleneck is network latency? On your machine all the components run side by side, so communication has almost no latency. In the cluster, if you are making a high number of database calls from one machine to another, the network latency may be noticeable.

Time slowed down using `S.D.Stopwatch` on Azure

I just ran some code which reports its performance on an Azure Web Sites instance; the result seemed a little off. I re-ran the operation, and indeed it seems consistent: System.Diagnostics.Stopwatch sees an execution time of 12 seconds for an operation that actually took more than three minutes (at least 3m16s).
Debug.WriteLine("Loading dataset in database ...");
var stopwatch = new Stopwatch();
stopwatch.Start();
ProcessDataset(CurrentDataSource.Database.Connection as SqlConnection, parser);
stopwatch.Stop();
Debug.WriteLine("Dataset loaded in database ({0}s)", stopwatch.Elapsed.Seconds);
return (short)stopwatch.Elapsed.Seconds;
This process runs in the context of a WCF Data Service "action" and seeds test data in a SQL Database (this is not production code). Specifically, it:
Opens a connection to an Azure SQL Database,
Disables a null constraint,
Uses System.Data.SqlClient.SqlBulkCopy to lock an empty table and load it using a buffered stream that retrieves a dataset (2.4MB) from Azure Blob Storage via the filesystem, decompresses it (GZip, 4.9MB inflated) and parses it (CSV, 349996 records, parsed with a custom IDataReader using TextFieldParser),
Updates a column of the same table to set a common value,
Re-enables the null constraint.
No less, no more; there's nothing particularly intensive going on, I figure the operation is mostly network-bound.
Any idea why time is slowing down?
Notes:
Interestingly, timeouts for both the bulk insert and the update commands had to be increased (set to five minutes). I read that the default is 30 seconds, which is more than the reported 12 seconds; hence, I conclude that SqlClient measures time differently.
Reports from local execution seem perfectly correct, although it's consistently faster (4-6s using LocalDB) so it may just be that the effects are not noticeable.
You used stopwatch.Elapsed.Seconds to get the total time, but that is wrong. Elapsed.Seconds is only the seconds component (0 to 59) of the time interval represented by the TimeSpan structure, so the minutes are dropped. Use stopwatch.Elapsed.TotalSeconds instead.
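A short illustration of the difference, using a made-up elapsed time of 4 minutes 12 seconds (the real duration in the question is only known to be at least 3m16s). The class name ElapsedDemo is hypothetical.

```csharp
using System;

public static class ElapsedDemo
{
    public static void Main()
    {
        // Hypothetical elapsed time: 0 hours, 4 minutes, 12 seconds
        TimeSpan elapsed = new TimeSpan(0, 4, 12);

        Console.WriteLine(elapsed.Seconds);             // 12  (seconds component only)
        Console.WriteLine(elapsed.TotalSeconds);        // 252 (whole interval, as a double)
        Console.WriteLine((short)elapsed.TotalSeconds); // 252 (cast as in the original return statement)
    }
}
```

The same distinction applies to Minutes and TotalMinutes, and to Milliseconds and TotalMilliseconds.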

Azure Table Storage slow to update records

I have an Azure Table which stores thousands of discount codes partitioned by the first letter of the code, so there are roughly 30 partitions with 1000 records each. In my application I enter a code and get the specific record from the table. I then update the discount code to say that it's been used. When load testing this application with 1000 concurrent users for 30 seconds, reading a code takes less than 1 second but updating the record takes over 10 seconds. Is this typical behavior for table storage or is there a way to speed this up?
// update discount code
string code = "A0099";
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("constring...");
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("discounts");
string partitionKey = code[0].ToString().ToUpper();
TableOperation retrieveOperation = TableOperation.Retrieve<DiscountEntity>(partitionKey, code);
TableResult retrievedResult = table.Execute(retrieveOperation);
if (retrievedResult.Result != null) {
    DiscountEntity discount = (DiscountEntity)retrievedResult.Result;
    discount.Used = true;
    TableOperation updateOperation = TableOperation.Replace(discount);
    table.Execute(updateOperation);
}
This is not typical behavior, but I've seen it before. First of all, check your VM size, because the bigger the VM, the faster the I/O (there's a Microsoft doc somewhere that says larger VMs have "fast I/O" or something like that). Still, 10 seconds is a lot even for an extra-small VM.
To speed things up, I would suggest that you:
Implement a cache: instead of looking up one code at a time, load all the unused codes for a whole "letter" partition at once, cache them, and then search the cache for the code to update.
Don't update live. Instead, update the cache and then use the async methods to save the changes back.
One thing you can check is the E2E time for a specific request vs how much time server has spent processing the request. That would allow you to see whether the bottleneck is the client/network or the server.
For more information on enabling Windows Azure Storage Analytics (specifically Logging), please refer to How To Monitor a Storage Account and Storage Analytics articles.

Resources