Meta-Question:
We're pulling data from Event Hub, running some logic, and saving it off to Cosmos. Currently, Cosmos inserts are our bottleneck. How do we maximize our throughput?
Details
We're trying to optimize our Cosmos throughput and there seems to be some contention in the SDK that makes parallel inserts only marginally faster than serial inserts.
We're logically doing:
for (int i = 0; i < insertCount; i++)
{
taskList.Add(InsertCosmos(sdkContainerClient));
}
var parallelTimes = await Task.WhenAll(taskList);
Here are the results comparing serial inserts, parallel inserts, and "faking" an insert (with Task.Delay):
Serial took: 461ms for 20
- Individual times 28,8,117,19,14,11,10,12,5,8,9,11,18,15,79,23,14,16,14,13
Cosmos Parallel
Parallel took: 231ms for 20
- Individual times 17,15,23,39,45,52,72,74,80,91,96,98,108,117,123,128,139,146,147,145
Just Parallel (no cosmos)
Parallel took: 27ms for 20
- Individual times 27,26,26,26,26,26,26,25,25,25,25,25,25,24,24,24,23,23,23,23
Serial is obvious (just add each value).
No-cosmos (the last timing) is also obvious (just take the max time, since everything runs concurrently).
But parallel cosmos doesn't parallelize nearly as well, indicating there's some contention.
We're running this on a VM in Azure (same datacenter as Cosmos), have enough RUs so we aren't getting 429s, and are using Microsoft.Azure.Cosmos 3.2.0.
Full Code Sample
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class Program
{
public static void Main(string[] args)
{
CosmosWriteTest().Wait();
}
public static async Task CosmosWriteTest()
{
var cosmosClient = new CosmosClient("todo", new CosmosClientOptions { ConnectionMode = ConnectionMode.Direct });
var database = cosmosClient.GetDatabase("<ourdatabase>");
var sdkContainerClient = database.GetContainer("<ourcontainer>");
int insertCount = 25;
//Warmup
await sdkContainerClient.CreateItemAsync(new TestObject());
//---Serially inserts into Cosmos---
List<long> serialTimes = new List<long>();
var serialTimer = Stopwatch.StartNew();
Console.WriteLine("Cosmos Serial");
for (int i = 0; i < insertCount; i++)
{
serialTimes.Add(await InsertCosmos(sdkContainerClient));
}
serialTimer.Stop();
Console.WriteLine($"Serial took: {serialTimer.ElapsedMilliseconds}ms for {insertCount}");
Console.WriteLine($" - Individual times {string.Join(",", serialTimes)}");
//---Parallel inserts into Cosmos---
Console.WriteLine(Environment.NewLine + "Cosmos Parallel");
var parallelTimer = Stopwatch.StartNew();
var taskList = new List<Task<long>>();
for (int i = 0; i < insertCount; i++)
{
taskList.Add(InsertCosmos(sdkContainerClient));
}
var parallelTimes = await Task.WhenAll(taskList);
parallelTimer.Stop();
Console.WriteLine($"Parallel took: {parallelTimer.ElapsedMilliseconds}ms for {insertCount}");
Console.WriteLine($" - Individual times {string.Join(",", parallelTimes)}");
//---Testing parallelism minus cosmos---
Console.WriteLine(Environment.NewLine + "Just Parallel (no cosmos)");
var justParallelTimer = Stopwatch.StartNew();
var noCosmosTaskList = new List<Task<long>>();
for (int i = 0; i < insertCount; i++)
{
noCosmosTaskList.Add(InsertCosmos(sdkContainerClient, true));
}
var justParallelTimes = await Task.WhenAll(noCosmosTaskList);
justParallelTimer.Stop();
Console.WriteLine($"Parallel took: {justParallelTimer.ElapsedMilliseconds}ms for {insertCount}");
Console.WriteLine($" - Individual times {string.Join(",", justParallelTimes)}");
}
//inserts
private static async Task<long> InsertCosmos(Container sdkContainerClient, bool justDelay = false)
{
var timer = Stopwatch.StartNew();
if (!justDelay)
await sdkContainerClient.CreateItemAsync(new TestObject());
else
await Task.Delay(20);
timer.Stop();
return timer.ElapsedMilliseconds;
}
//Test object to save to Cosmos
public class TestObject
{
public string id { get; set; } = Guid.NewGuid().ToString();
public string pKey { get; set; } = Guid.NewGuid().ToString();
public string Field1 { get; set; } = "Testing this field";
public double Number { get; set; } = 12345;
}
}
This is the scenario for which Bulk is being introduced. Bulk mode is in preview at this moment and available in the 3.2.0-preview2 package.
What you need to do to take advantage of Bulk is turn the AllowBulkExecution flag on:
new CosmosClient(endpoint, authKey, new CosmosClientOptions() { AllowBulkExecution = true } );
This mode was made precisely for the scenario you describe: a list of concurrent operations that need throughput.
We have a sample project here: https://github.com/Azure/azure-cosmos-dotnet-v3/tree/master/Microsoft.Azure.Cosmos.Samples/Usage/BulkSupport
We are still working on the official documentation, but the idea is this: when concurrent operations are issued, instead of executing them as individual requests (as you are seeing right now), the SDK groups them based on partition affinity and executes them as grouped (batch) operations. This reduces the number of backend service calls and can increase throughput by 50%-100%, depending on the volume of operations. This mode consumes more RU/s because it pushes a higher volume of operations per second than issuing the operations individually (so if you hit 429s, it means the bottleneck is now the provisioned RU/s).
var cosmosClient = new CosmosClient("todo", new CosmosClientOptions { AllowBulkExecution = true });
var database = cosmosClient.GetDatabase("<ourdatabase>");
var sdkContainerClient = database.GetContainer("<ourcontainer>");
//The more operations the better, just 25 might not yield a great difference vs non bulk
int insertCount = 10000;
//Don't do any warmup
List<Task> operations = new List<Task>();
var timer = Stopwatch.StartNew();
for (int i = 0; i < insertCount; i++)
{
operations.Add(sdkContainerClient.CreateItemAsync(new TestObject()));
}
await Task.WhenAll(operations);
timer.Stop();
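If you do start hitting 429s in bulk mode, each task surfaces its own failure, so you can inspect them individually. Here is a minimal sketch (the SafeCreateAsync wrapper is my own illustration, not an SDK method; it assumes using System.Net for HttpStatusCode):
private static async Task SafeCreateAsync(Container container, TestObject item)
{
    try
    {
        await container.CreateItemAsync(item);
    }
    catch (CosmosException ex) when (ex.StatusCode == (HttpStatusCode)429)
    {
        // Throttled: the bottleneck is now the provisioned RU/s.
        // Retry after the suggested delay, buffer, or scale up RU/s.
        Console.WriteLine($"429 on item {item.id}; retry after {ex.RetryAfter}");
    }
}
You would then add SafeCreateAsync(sdkContainerClient, new TestObject()) to the operations list instead of calling CreateItemAsync directly.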
Important: this is a feature that is still in preview. Since this mode is optimized for throughput (not latency), no single individual operation will have great latency.
If you want to optimize even further, and your data source lets you access Streams (avoiding serialization), you can use the CreateItemStreamAsync SDK methods for even better throughput.
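For example, if your Event Hub payload is already UTF-8 JSON bytes, something like the following skips serialization entirely. This is only a sketch: rawJsonBytes and partitionKeyValue are hypothetical placeholders, and the JSON must already contain the id and pKey properties the container expects. Note that the stream overload takes the partition key explicitly and reports failures via the status code instead of throwing:
using (var payload = new MemoryStream(rawJsonBytes))
using (ResponseMessage response = await sdkContainerClient.CreateItemStreamAsync(payload, new PartitionKey(partitionKeyValue)))
{
    if (!response.IsSuccessStatusCode)
    {
        Console.WriteLine($"Insert failed: {response.StatusCode}");
    }
}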
I'm making a web application connected to Azure using .NET Core 2.1.1. I encountered a problem with a Service Bus queue when trying to get the session ids of the related queue.
I found some code, but it isn't supported by .NET Core. Here is the code:
var queueClient = QueueClient.CreateFromConnectionString(AppSettings.ServiceBusConnection, queueName);
var sessions = await queueClient.GetMessageSessionsAsync();
return sessions;
I also tried this function:
var connString = Configuration.GetConnectionString("servicebus");
sessionClient = new SessionClient(connString, queue, ReceiveMode.PeekLock);
List<IMessageSession> sessions=new List<IMessageSession>();
while (true)
{
var session = await sessionClient.AcceptMessageSessionAsync();
if (session == null)
break;
sessions.Add(session);
}
return sessions;
}
But it keeps giving me a TimeoutException. Can anyone help me?
This is something I tried and it worked for me.
Here is the code I tried:
namespace Core.SBConsole
{
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
class Program
{
// Connection String for the namespace can be obtained from the Azure portal under the
// 'Shared Access policies' section.
const string ServiceBusConnectionString = "{Connection String}";
const string QueueName = "mvq";
static IMessageSender messageSender;
static ISessionClient sessionClient;
const string SessionPrefix = "session-prefix";
static void Main(string[] args)
{
MainAsync().GetAwaiter().GetResult();
}
static async Task MainAsync()
{
const int numberOfSessions = 5;
const int numberOfMessagesPerSession = 3;
messageSender = new MessageSender(ServiceBusConnectionString, QueueName);
sessionClient = new SessionClient(ServiceBusConnectionString, QueueName);
// Send messages with sessionId set
await SendSessionMessagesAsync(numberOfSessions, numberOfMessagesPerSession);
// Receive all Session based messages using SessionClient
await ReceiveSessionMessagesAsync(numberOfSessions, numberOfMessagesPerSession);
Console.WriteLine("=========================================================");
Console.WriteLine("Completed Receiving all messages... Press any key to exit");
Console.WriteLine("=========================================================");
Console.ReadKey();
await messageSender.CloseAsync();
await sessionClient.CloseAsync();
}
static async Task ReceiveSessionMessagesAsync(int numberOfSessions, int messagesPerSession)
{
Console.WriteLine("===================================================================");
Console.WriteLine("Accepting sessions in the reverse order of sends for demo purposes");
Console.WriteLine("===================================================================");
for (int i = 0; i < numberOfSessions; i++)
{
int messagesReceivedPerSession = 0;
// AcceptMessageSessionAsync(i.ToString()), i.e. with a session id as parameter, will try to get the session with that sessionId.
// AcceptMessageSessionAsync() without any parameters will try to get any available session that has messages associated with it.
IMessageSession session = await sessionClient.AcceptMessageSessionAsync();// (SessionPrefix + i.ToString());
if (session != null)
{
// Messages within a session will always arrive in order.
Console.WriteLine("=====================================");
Console.WriteLine($"Received Session: {session.SessionId}");
while (messagesReceivedPerSession++ < messagesPerSession)
{
Message message = await session.ReceiveAsync();
Console.WriteLine($"Received message: SequenceNumber:{message.SystemProperties.SequenceNumber} Body:{Encoding.UTF8.GetString(message.Body)}");
// Complete the message so that it is not received again.
// This can be done only if the queueClient is created in ReceiveMode.PeekLock mode (which is default).
await session.CompleteAsync(message.SystemProperties.LockToken);
}
Console.WriteLine($"Received all messages for Session: {session.SessionId}");
Console.WriteLine("=====================================");
// Close the Session after receiving all messages from the session
await session.CloseAsync();
}
}
}
static async Task SendSessionMessagesAsync(int numberOfSessions, int messagesPerSession)
{
if (numberOfSessions == 0 || messagesPerSession == 0)
{
await Task.FromResult(false);
}
for (int i = numberOfSessions - 1; i >= 0; i--)
{
var messagesToSend = new List<Message>();
string sessionId = SessionPrefix + i;
for (int j = 0; j < messagesPerSession; j++)
{
// Create a new message to send to the queue
string messageBody = "test" + j;
var message = new Message(Encoding.UTF8.GetBytes(messageBody));
// Assign a SessionId for the message
message.SessionId = sessionId;
messagesToSend.Add(message);
// Write the sessionId, body of the message to the console
Console.WriteLine($"Sending SessionId: {message.SessionId}, message: {messageBody}");
}
// Send a batch of messages corresponding to this sessionId to the queue
await messageSender.SendAsync(messagesToSend);
}
Console.WriteLine("=====================================");
Console.WriteLine($"Sent {messagesPerSession} messages each for {numberOfSessions} sessions.");
Console.WriteLine("=====================================");
}
}
}
Things to consider before creating queue
1) Make sure the service bus is not in the Free or Basic tier; if it is, scale it up to Standard.
2) Make sure to enable sessions while creating the queue (see the sketch below).
I am using the Microsoft.Azure.ServiceBus NuGet package 3.4, which is the latest right now. If you are using some other package, try to upgrade/downgrade it.
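For point 2, you can also create the session-enabled queue from code. Here is a minimal sketch using the ManagementClient from the same Microsoft.Azure.ServiceBus package (it reuses the connection string and queue name constants from the sample above):
// using Microsoft.Azure.ServiceBus.Management;
var managementClient = new ManagementClient(ServiceBusConnectionString);
if (!await managementClient.QueueExistsAsync(QueueName))
{
    // RequiresSession must be set when the queue is created;
    // it cannot be enabled later on an existing queue.
    await managementClient.CreateQueueAsync(new QueueDescription(QueueName) { RequiresSession = true });
}
await managementClient.CloseAsync();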
Hope it helps.
public static async Task DoMessage()
{
const int numberOfMessages = 10;
queueClient = new QueueClient(ConnectionString, QueueName);
await SendMessageAsync(numberOfMessages);
await queueClient.CloseAsync();
}
private static async Task SendMessageAsync(int numOfMessages)
{
try
{
for (var i = 0; i < numOfMessages; i++)
{
var messageBody = $"Message {i}";
var message = new Message(Encoding.UTF8.GetBytes(messageBody));
message.SessionId = i.ToString();
await queueClient.SendAsync(message);
}
}
catch (Exception e)
{
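// Note: swallowing exceptions here hides send failures; log or rethrow in real code.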
}
}
This is my sample code to send messages to the Service Bus queue with a session id.
My question: if I call the DoMessage function two times, producing what I'll call MessageSet1 and MessageSet2, will MessageSet2 be received and processed by the Azure Function on the receiving end of the messages?
I want to handle them in order: MessageSet1 first, and never start on MessageSet2 until MessageSet1 is finished.
There are a couple of issues with what you're doing.
First, Azure Functions do not currently support sessions. There's an issue for that you can track.
Second, the sessions you're creating are off. A session should be applied on a set of messages using the same SessionId. Meaning your for loop should be assigning the same SessionId to all the messages in the set. Something like this:
private static async Task SendMessageAsync(int numOfMessages, string sessionId)
{
try
{
var tasks = new List<Task>();
for (var i = 0; i < numOfMessages; i++)
{
var messageBody = $"Message {i}";
var message = new Message(Encoding.UTF8.GetBytes(messageBody));
message.SessionId = sessionId;
tasks.Add(queueClient.SendAsync(message));
}
await Task.WhenAll(tasks).ConfigureAwait(false);
}
catch (Exception e)
{
// handle exception
}
}
For ordered messages using Sessions, see documentation here.
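Until Functions gain session support, a plain SessionClient receiver can enforce that ordering by accepting one session at a time. Here is a minimal sketch (the connection string, queue name, and the "MessageSet1"/"MessageSet2" session ids are illustrative):
// Accept the session for the first set explicitly and drain it before moving on.
var sessionClient = new SessionClient(ConnectionString, QueueName);
IMessageSession session = await sessionClient.AcceptMessageSessionAsync("MessageSet1");
Message message;
while ((message = await session.ReceiveAsync(TimeSpan.FromSeconds(5))) != null)
{
    // Messages within a session arrive in send order.
    await session.CompleteAsync(message.SystemProperties.LockToken);
}
await session.CloseAsync();
// Only after MessageSet1 is drained, accept "MessageSet2" the same way.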
I have a small WebJob running in 3 instances; the WebJob is triggered by ServiceBusTrigger, and each job takes about 20 seconds (I added a sleep for testing).
Now I add 3 items to the Service Bus queue, but only 2 WebJob instances are working.
What is the third instance doing, and how can I get that instance to also work on the queue?
My code is very basic:
public class Functions
{
// This function will get triggered/executed when a new message is written
// on an Azure Queue called queue.
public static void ProcessQueueMessage([ServiceBusTrigger("jobs2")] string message, TextWriter log)
{
string url = "https://requestb.in/xxxxx";
log.WriteLine(message);
log.WriteLine("gotmsg");
Thread.Sleep(20000);
log.WriteLine("sending");
string postData = "test=" + message;
Console.WriteLine(postData);
System.Net.WebRequest req = System.Net.WebRequest.Create(url);
//Add these, as we're doing a POST
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
//We need to count how many bytes we're sending. Post'ed Faked Forms should be name=value&
byte[] bytes = System.Text.Encoding.ASCII.GetBytes(postData);
req.ContentLength = bytes.Length;
System.IO.Stream os = req.GetRequestStream();
os.Write(bytes, 0, bytes.Length); //Push it out there
os.Close();
System.Net.WebResponse resp = req.GetResponse();
if (resp == null) return;
System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream());
log.WriteLine(sr.ReadToEnd().Trim());
}
}
What is the third instance doing and how can i get the instance to also work on the queue?
The third instance should work by default. I assume an Azure Web App uses its load-balancing strategy across multiple instances, and it seems we have no way to configure that strategy. In your case, 3 messages may not be enough to test this; please try it with more messages. I tested it on my side and it works correctly. The following is the test code I used for sending messages.
var client = QueueClient.CreateFromConnectionString("connection string", QueueName);
for (int i = 0; i < 20; i++)
{
var sendMessage = new BrokeredMessage("test message"+i);
client.Send(sendMessage);
}
I have an event hub with 4 partitions and 2 consumer groups, and 2 webjobs that read the data using an EventProcessor, each for a different consumer group.
I have configured the event processors like this:
var host = new EventProcessorHost(
Guid.NewGuid().ToString(),
configurationManager.EventHubConfiguration.Path,
configurationManager.EventHubConfiguration.ConsumerGroupName,
configurationManager.EventHubConfiguration.ListenerConnectionString,
configurationManager.StorageConfiguration.ConnectionString)
{
PartitionManagerOptions = new PartitionManagerOptions
{
AcquireInterval = TimeSpan.FromSeconds(10),
RenewInterval = TimeSpan.FromSeconds(10),
LeaseInterval = TimeSpan.FromSeconds(30)
}
};
var options = EventProcessorOptions.DefaultOptions;
options.MaxBatchSize = 250;
await host.RegisterEventProcessorFactoryAsync(new PlanCareEventProcessorFactory(telemetryClient, configurationManager), options);
return host;
In my EventProcessor I keep track of the progress (some methods skipped to keep it short and readable):
internal class PlanCareEventProcessor : IEventProcessor
{
public Task OpenAsync(PartitionContext context)
{
namespaceManager = NamespaceManager.CreateFromConnectionString(configurationManager.EventHubConfiguration.ManagerConnectionString);
if (namespaceManager == null)
return Task.CompletedTask;
var currentSeqNo = context.Lease.SequenceNumber;
var lastSeqNo = namespaceManager.GetEventHubPartition(context.EventHubPath, context.ConsumerGroupName, context.Lease.PartitionId).EndSequenceNumber;
var delta = lastSeqNo - currentSeqNo;
var msg = $"Last processed seqnr for partition {context.Lease.PartitionId}: {currentSeqNo} of {lastSeqNo} in consumergroup '{context.ConsumerGroupName}' (lag: {delta})";
telemetryClient.TrackTrace(new TraceTelemetry(msg, SeverityLevel.Information));
telemetryClient.TrackMetric(new MetricTelemetry($"Partition_Lag_{context.Lease.PartitionId}_{context.ConsumerGroupName}", delta));
return Task.CompletedTask;
}
public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> events)
{
progressCounter++;
...
await LogProgress(context);
}
private async Task LogProgress(PartitionContext context)
{
if (progressCounter >= 100)
{
await CheckPointAsync(context);
progressCounter = 0;
}
}
}
Now I noticed a difference between the webjobs in how often OpenAsync and CloseAsync are called. For one of the consumer groups this is about every half hour, while for the other it is several times a minute.
Since both webjobs use the same code and run on the same app plan, what could be the reason for this?
It bothers me because checkpointing via await CheckPointAsync(context) is almost never done for one of the webjobs, since it does not reach the threshold before the lease is gone.