Cosmos inserts won't parallelize effectively - azure

Meta-Question:
We're pulling data from EventHub, running some logic, and saving it off to cosmos. Currently Cosmos inserts are our bottleneck. How do we maximize our throughput?
Details
We're trying to optimize our Cosmos throughput and there seems to be some contention in the SDK that makes parallel inserts only marginally faster than serial inserts.
We're logically doing:
for (int i = 0; i < insertCount; i++)
{
taskList.Add(InsertCosmos(sdkContainerClient));
}
var parallelTimes = await Task.WhenAll(taskList);
Here are the results comparing serial inserts, parallel inserts, and "faking" an insert (with Task.Delay):
Serial took: 461ms for 20
- Individual times 28,8,117,19,14,11,10,12,5,8,9,11,18,15,79,23,14,16,14,13
Cosmos Parallel
Parallel took: 231ms for 20
- Individual times 17,15,23,39,45,52,72,74,80,91,96,98,108,117,123,128,139,146,147,145
Just Parallel (no cosmos)
Parallel took: 27ms for 20
- Individual times 27,26,26,26,26,26,26,25,25,25,25,25,25,24,24,24,23,23,23,23
The serial total is obvious (just sum the individual values), and the no-Cosmos total is also obvious (it is roughly the longest individual time, since the delays all run concurrently). But the parallel Cosmos run doesn't parallelize nearly as well, which indicates some contention.
We're running this on a VM in Azure (same datacenter as Cosmos), have enough RUs so aren't getting 429s, and using Microsoft.Azure.Cosmos 3.2.0.
Full Code Sample
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class Program
{
public static void Main(string[] args)
{
CosmosWriteTest().Wait();
}
public static async Task CosmosWriteTest()
{
var cosmosClient = new CosmosClient("todo", new CosmosClientOptions { ConnectionMode = ConnectionMode.Direct });
var database = cosmosClient.GetDatabase("<ourcontainer>");
var sdkContainerClient = database.GetContainer("<ourcontainer>");
int insertCount = 25;
//Warmup
await sdkContainerClient.CreateItemAsync(new TestObject());
//---Serially inserts into Cosmos---
List<long> serialTimes = new List<long>();
var serialTimer = Stopwatch.StartNew();
Console.WriteLine("Cosmos Serial");
for (int i = 0; i < insertCount; i++)
{
serialTimes.Add(await InsertCosmos(sdkContainerClient));
}
serialTimer.Stop();
Console.WriteLine($"Serial took: {serialTimer.ElapsedMilliseconds}ms for {insertCount}");
Console.WriteLine($" - Individual times {string.Join(",", serialTimes)}");
//---Parallel inserts into Cosmos---
Console.WriteLine(Environment.NewLine + "Cosmos Parallel");
var parallelTimer = Stopwatch.StartNew();
var taskList = new List<Task<long>>();
for (int i = 0; i < insertCount; i++)
{
taskList.Add(InsertCosmos(sdkContainerClient));
}
var parallelTimes = await Task.WhenAll(taskList);
parallelTimer.Stop();
Console.WriteLine($"Parallel took: {parallelTimer.ElapsedMilliseconds}ms for {insertCount}");
Console.WriteLine($" - Individual times {string.Join(",", parallelTimes)}");
//---Testing parallelism minus cosmos---
Console.WriteLine(Environment.NewLine + "Just Parallel (no cosmos)");
var justParallelTimer = Stopwatch.StartNew();
var noCosmosTaskList = new List<Task<long>>();
for (int i = 0; i < insertCount; i++)
{
noCosmosTaskList.Add(InsertCosmos(sdkContainerClient, true));
}
var justParallelTimes = await Task.WhenAll(noCosmosTaskList);
justParallelTimer.Stop();
Console.WriteLine($"Parallel took: {justParallelTimer.ElapsedMilliseconds}ms for {insertCount}");
Console.WriteLine($" - Individual times {string.Join(",", justParallelTimes)}");
}
//inserts
private static async Task<long> InsertCosmos(Container sdkContainerClient, bool justDelay = false)
{
var timer = Stopwatch.StartNew();
if (!justDelay)
await sdkContainerClient.CreateItemAsync(new TestObject());
else
await Task.Delay(20);
timer.Stop();
return timer.ElapsedMilliseconds;
}
//Test object to save to Cosmos
public class TestObject
{
public string id { get; set; } = Guid.NewGuid().ToString();
public string pKey { get; set; } = Guid.NewGuid().ToString();
public string Field1 { get; set; } = "Testing this field";
public double Number { get; set; } = 12345;
}
}

This is the scenario for which Bulk is being introduced. Bulk mode is in preview at this moment and available in the 3.2.0-preview2 package.
What you need to do to take advantage of Bulk is turn the AllowBulkExecution flag on:
new CosmosClient(endpoint, authKey, new CosmosClientOptions() { AllowBulkExecution = true } );
This mode was designed precisely for the scenario you describe: a set of concurrent operations that need throughput.
We have a sample project here: https://github.com/Azure/azure-cosmos-dotnet-v3/tree/master/Microsoft.Azure.Cosmos.Samples/Usage/BulkSupport
And we are still working on the official documentation, but the idea is that when concurrent operations are issued, instead of executing them as individual requests as you are seeing right now, the SDK will group them based on partition affinity and execute them as grouped (batch) operations, reducing backend service calls and potentially increasing throughput by 50%-100% depending on the volume of operations. This mode will consume more RU/s, as it is pushing a higher volume of operations per second than issuing the operations individually (so if you hit 429s it means the bottleneck is now the provisioned RU/s).
var cosmosClient = new CosmosClient("todo", new CosmosClientOptions { AllowBulkExecution = true });
var database = cosmosClient.GetDatabase("<ourcontainer>");
var sdkContainerClient = database.GetContainer("<ourcontainer>");
//The more operations the better, just 25 might not yield a great difference vs non bulk
int insertCount = 10000;
//Don't do any warmup
List<Task> operations = new List<Task>();
var timer = Stopwatch.StartNew();
for (int i = 0; i < insertCount; i++)
{
operations.Add(sdkContainerClient.CreateItemAsync(new TestObject()));
}
await Task.WhenAll(operations);
timer.Stop();
Console.WriteLine($"Bulk inserts took: {timer.ElapsedMilliseconds}ms for {insertCount}");
Important: This is a feature that is still in preview. Since this mode is optimized for throughput (not latency), any single individual operation you do won't have great latency.
If you want to optimize even further, and your data source lets you work with Streams (avoiding serialization), you can use the CreateItemStreamAsync SDK methods for even better throughput.
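For illustration, here is a minimal sketch of that stream-based path (not part of the original answer). It assumes the item has already been serialized to JSON and that pKey is the partition key property, as in the TestObject above; the json and partitionKeyValue arguments are hypothetical inputs, and the snippet also needs System.IO and System.Text.

//Hypothetical helper: insert a pre-serialized document via the stream API
private static async Task<long> InsertCosmosStream(Container container, string json, string partitionKeyValue)
{
    var timer = Stopwatch.StartNew();
    using (var payload = new MemoryStream(Encoding.UTF8.GetBytes(json)))
    using (ResponseMessage response = await container.CreateItemStreamAsync(payload, new PartitionKey(partitionKeyValue)))
    {
        //The stream APIs don't throw on failure; check the status code instead
        if (!response.IsSuccessStatusCode)
            Console.WriteLine($"Create failed: {response.StatusCode}");
    }
    timer.Stop();
    return timer.ElapsedMilliseconds;
}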

Related

Async threading in ASP.NET Web API

I have to update prices of thousands of SKUs by calling a third-party API hosted in AWS. The third party has a TPS throttle of 1000, i.e., 1000 API calls are permitted per second. The third-party API takes approximately 1.5 seconds per call.
Now, if I update the prices sequentially by invoking the third-party API, then for 2,000 products the price update takes 2000 * 1.5 = 3000 seconds. With threading and thread synchronization this should be achievable in about 3 seconds, since at 1000 calls per second the 2,000 calls can be issued in two waves that each complete in roughly 1.5 seconds. Here is a sample code snippet of my present method:
[HttpPut]
public async Task<HttpResponseMessage> UpdatePrices([FromBody] List<PriceViewModel> prices)
{
int failedAPICalls = 0;
int successAPICalls = 0;
foreach (var p in prices) {
PriceManagement pm = new PriceManagement();
ErrorViewModel error = await pm.UpdateMarketplacePrice(p.SKU, p.Price);
if (error != null) {
failedAPICalls++;
//Error handling code here...
}
else {
successAPICalls++;
//Log price update to database
}
}
var totalApiCalls = successAPICalls + failedAPICalls;
string message = "Total API calls : " + totalApiCalls.ToString() + " | Successful API Calls: " + successAPICalls.ToString()
+ " | Failed API Calls: " + failedAPICalls.ToString();
return Request.CreateResponse(HttpStatusCode.OK, message);
}
Here is the sample definition View models:
public class PriceViewModel
{
public string SKU { get; set; }
public decimal Price { get; set; }
}
public class ErrorViewModel
{
public int StatusCode { get; set; }
public string Description { get; set; }
}
Please help me out to improve performance.
The code you posted is sequential. Asynchronous, but still sequential. await waits for an already asynchronous operation to complete without blocking the thread before continuing execution; it won't fire off all requests at the same time.
One easy way to make multiple concurrent calls with a specific limit is to use an ActionBlock<> with MaxDegreeOfParallelism set to the limit you want, e.g.:
var options=new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism,
BoundedCapacity = capacity
};
var block=new ActionBlock<PriceViewModel>(async p=>{
var pm = new PriceManagement();
var error = await pm.UpdateMarketplacePrice(p.SKU, p.Price);
if (error != null) {
Interlocked.Increment(ref failedAPICalls);
}
else {
Interlocked.Increment(ref successAPICalls);
}
}, options);
Setting MaxDegreeOfParallelism controls how many messages can be processed concurrently. The rest of the messages are buffered.
Once the block is created, we can post messages to it. Each message will be processed by a separate task up to the MaxDOP limit. Once we're done, we tell the block so and wait for it to complete all remaining messages.
foreach(var p in prices)
{
await block.SendAsync(p);
}
//Tell the block we're done
block.Complete();
//Wait until all prices are processed
await block.Completion;
By default, there's no limit to how many items can be buffered. This can be a problem if operations are slow, as the buffer may end up with thousands of items waiting to be processed, essentially duplicating the prices list.
To avoid this, BoundedCapacity can be set to a specific number. When the limit is reached, SendAsync will wait (asynchronously) until a slot becomes available.
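Putting the pieces together, the controller action might look roughly like this (a sketch only, not the answer's exact code; the MaxDegreeOfParallelism and BoundedCapacity values are illustrative assumptions to be tuned against the 1000 TPS throttle, and a reference to System.Threading.Tasks.Dataflow is required):

[HttpPut]
public async Task<HttpResponseMessage> UpdatePrices([FromBody] List<PriceViewModel> prices)
{
    int failedAPICalls = 0;
    int successAPICalls = 0;
    var options = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 100, // assumed value, not a recommendation
        BoundedCapacity = 1000        // assumed value; caps how many items wait in the buffer
    };
    var block = new ActionBlock<PriceViewModel>(async p =>
    {
        var pm = new PriceManagement();
        var error = await pm.UpdateMarketplacePrice(p.SKU, p.Price);
        if (error != null)
            Interlocked.Increment(ref failedAPICalls);
        else
            Interlocked.Increment(ref successAPICalls);
    }, options);
    foreach (var p in prices)
    {
        await block.SendAsync(p);
    }
    block.Complete();
    await block.Completion;
    string message = "Total API calls : " + (successAPICalls + failedAPICalls) +
        " | Successful API Calls: " + successAPICalls + " | Failed API Calls: " + failedAPICalls;
    return Request.CreateResponse(HttpStatusCode.OK, message);
}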

Best way to asynchronously receive all brokered messages from the service bus

Here is what I am trying to achieve :
On the service bus I have a topic which contains 5005 messages.
I need to peek all the messages without completing them and add them to a list (List<BrokeredMessage>)
Here is what I am trying :
IEnumerable<BrokeredMessage> dlIE = null;
List<BrokeredMessage> bmList = new List<BrokeredMessage>();
long i = 0;
while (i < count) //count is the total messages in the subscription
{
dlIE = deadLetterClient.ReceiveBatch(100);
bmList.AddRange(dlIE);
i = i + dlIE.Count();
}
In the above code I can only fetch 100 messages at a time since there is a batch size limit to retrieving the messages.
I have also tried to do it asynchronously, but it always returns 0 messages in the list. This is the code for that:
static List<BrokeredMessage> messageList = new List<BrokeredMessage>();
long i = 0;
while (i < count)
{
var task = ReceiveMessagesBatchForSubscription(deadLetterClient);
i = i + 100;
}
Task.WaitAny();
public async static Task ReceiveMessagesBatchForSubscription(SubscriptionClient deadLetterClient)
{
while (true)
{
var receivedMessage = await deadLetterClient.ReceiveBatchAsync(100);
messageList.AddRange(receivedMessage);
}
}
Can anyone please suggest a better way to do this?
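For illustration (a sketch, not an answer from the thread), one way to keep the retrieval asynchronous while still collecting every message is to await each batch before requesting the next, using the same SubscriptionClient.ReceiveBatchAsync call shown above. The sketch assumes the client is in PeekLock mode so the messages are not completed, and it needs System.Linq:

public static async Task<List<BrokeredMessage>> ReceiveAllAsync(SubscriptionClient deadLetterClient, long count)
{
    var messages = new List<BrokeredMessage>();
    while (messages.Count < count)
    {
        //Await each batch; ReceiveBatchAsync may return fewer than 100 messages
        var batch = await deadLetterClient.ReceiveBatchAsync(100);
        if (batch == null || !batch.Any())
            break; //nothing left to receive
        messages.AddRange(batch);
    }
    return messages;
}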

Performance testing: One core vs multiple

I'm trying to understand a problem that I encountered recently in my project. I'm using the Aurigma library to resize images. It is used in single-threaded mode and produces only one thread during calculation. Recently I decided to move to the ImageMagick project, because it is free and open source. I built IM in single-threaded mode and started to test. At first I wanted to compare their performance without interruptions, so I created a test that gives high priority to the process and its thread. I also set affinity to one core. I found that IM was about 25% faster than Aurigma. But the more threads I added, the smaller IM's advantage over Aurigma became.
My project is a Windows service that starts about 7-10 child processes. Each process has 2 threads to process images. When I ran my test as two different processes with 2 threads each, I noticed that IM performed about 5% worse than Aurigma.
Maybe my question is not very detailed, but this area is a little new to me and I would be glad to get direction for further investigation. How can it be that one program is faster when run on one thread in one process, but slower when run in multiple processes at the same time?
For example:
Au: 8 processes x 2Th(20 tasks per thread) = 320 tasks for 245 secs
IM: 8 processes x 2Th(20 tasks per thread) = 320 tasks for 280 secs
Au: 4 processes x 2Th(20 tasks per thread) = 160 tasks for 121 secs
IM: 4 processes x 2Th(20 tasks per thread) = 160 tasks for 141 secs
We can see that Au does better when there is more than one process, but in single-process mode Au processes one task in 2.2 sec and IM in 1.4 sec, so the total time is better for IM.
private static void ThreadRunner(
Action testFunc,
int repeatCount,
int threadCount
)
{
WaitHandle[] handles = new WaitHandle[threadCount];
var stopwatch = new Stopwatch();
// warmup
stopwatch.Start();
for (int h = 0; h < threadCount; h++)
{
var handle = new ManualResetEvent(false);
handles[h] = handle;
var thread = new Thread(() =>
{
Runner(testFunc, repeatCount);
handle.Set();
});
thread.Name = "Thread id" + h;
thread.IsBackground = true;
thread.Priority = ThreadPriority.Normal;
thread.Start();
}
WaitHandle.WaitAll(handles);
stopwatch.Stop();
Console.WriteLine("All Threads Total time taken " + stopwatch.ElapsedMilliseconds);
}
private static void Runner(
Action testFunc,
int count
)
{
//Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); // Use only the second core
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.BelowNormal;
Process.GetCurrentProcess().PriorityBoostEnabled = false;
Thread.CurrentThread.Priority = ThreadPriority.Normal;
var stopwatch = new Stopwatch();
// warmup
stopwatch.Start();
while(stopwatch.ElapsedMilliseconds < 10000)
testFunc();
stopwatch.Stop();
long elmsec = 0;
for (int i = 0; i < count; i++)
{
stopwatch.Reset();
stopwatch.Start();
testFunc();
stopwatch.Stop();
elmsec += stopwatch.ElapsedMilliseconds;
Console.WriteLine("Ticks: " + stopwatch.ElapsedTicks +
" mS: " + stopwatch.ElapsedMilliseconds + " Thread name: " + Thread.CurrentThread.Name);
}
Console.WriteLine("Total time taken " + elmsec + " Thread name: " + Thread.CurrentThread.Name);
}
/// <summary>
/// Entry point
/// </summary>
/// <param name="args"></param>
private static void Main(string[] args)
{
var files = GetFiles(args.FirstOrDefault());
if (!files.Any())
{
Console.WriteLine("Source files were not found.");
goto End;
}
//// Run tests
Console.WriteLine("ImageMagick run... Resize");
Runner(() => PerformanceTests.RunResizeImageMagickTest(files[0]), 20);
Console.WriteLine("Aurigma run... Resize");
Runner(() => PerformanceTests.RunResizeAurigmaTest(files[0]), 20);
Console.WriteLine("ImageMagick run... multi Resize");
ThreadRunner(() => PerformanceTests.RunResizeImageMagickTest(files[0]), 20, 2);
Console.WriteLine("Aurigma run... multi Resize");
ThreadRunner(() => PerformanceTests.RunResizeAurigmaTest(files[0]), 20, 2);
End:
Console.WriteLine("Done");
Console.ReadKey();
}
public static void RunResizeImageMagickTest(string source)
{
float[] ratios = { 0.25f, 0.8f, 1.4f };
// load the source bitmap
using (MagickImage bitmap = new MagickImage(source))
{
foreach (float ratio in ratios)
{
// determine the target image size
var size = new Size((int)Math.Round(bitmap.Width * ratio), (int)Math.Round(bitmap.Height * ratio));
MagickImage thumbnail = null;
try
{
thumbnail = new MagickImage(bitmap);
// scale the image down
thumbnail.Resize(size.Width, size.Height);
}
finally
{
if (thumbnail != null && thumbnail != bitmap)
{
thumbnail.Dispose();
}
}
}
}
}
public static void RunResizeAurigmaTest(string source)
{
float[] ratios = { 0.25f, 0.8f, 1.4f };
//// load the source bitmap
using (ABitmap bitmap = new ABitmap(source))
{
foreach (float ratio in ratios)
{
// determine the target image size
var size = new Size((int)Math.Round(bitmap.Width * ratio), (int)Math.Round(bitmap.Height * ratio));
ABitmap thumbnail = null;
try
{
thumbnail = new ABitmap();
// scale the image down
using (var resize = new Resize(size, InterpolationMode.HighQuality))
{
resize.ApplyTransform(bitmap, thumbnail);
}
}
finally
{
if (thumbnail != null && thumbnail != bitmap)
{
thumbnail.Dispose();
}
}
}
}
}
Code for testing is added above. I use C#/.NET; ImageMagick is used through the ImageMagick.NET library, and there is a similar wrapper for Aurigma. The IM .NET library is written in C++/CLI, while IM itself is written in C, so several languages are involved.
OpenMP for IM is off.
This could be a memory/cache issue. It is possible that multiple threads using memory in a certain way create a scenario where one thread invalidates cache lines that another thread was using, causing a stall.
Programs that are not purely number crunching, but rely on a lot of traffic between CPU and memory, are more difficult to analyze.
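As a rough illustration of that effect (false sharing), not taken from the thread: two threads incrementing counters that sit on the same cache line tend to run measurably slower than the same threads incrementing counters padded onto separate cache lines. A minimal, self-contained sketch:

using System;
using System.Diagnostics;
using System.Threading.Tasks;

class FalseSharingDemo
{
    const int Iterations = 100000000;

    static long Run(long[] counters, int indexA, int indexB)
    {
        var sw = Stopwatch.StartNew();
        Task.WaitAll(
            Task.Run(() => { for (int i = 0; i < Iterations; i++) counters[indexA]++; }),
            Task.Run(() => { for (int i = 0; i < Iterations; i++) counters[indexB]++; }));
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        var counters = new long[32];
        //Adjacent elements usually share a 64-byte cache line, so the cores keep invalidating each other's copy
        Console.WriteLine("Same cache line:      " + Run(counters, 0, 1) + " ms");
        //Elements 16 longs (128 bytes) apart land on separate cache lines, so there is no ping-pong
        Console.WriteLine("Separate cache lines: " + Run(counters, 0, 16) + " ms");
    }
}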

Tight Loop - Disk at 100%, Quad Core CPU at ~25% usage, only 15MB/sec disk write speed

I have a tight loop which runs through a load of carts, each of which contains around 10 event objects, and writes them to disk as JSON via an intermediate repository (jOliver common domain rewired with GetEventStore.com):
// create ~200,000 carts, each with ~5 events
List<Cart> testData = TestData.GenerateFrom(products);
foreach (var cart in testData)
{
count = count + (cart as IAggregate).GetUncommittedEvents().Count;
repository.Save(cart);
}
I see the disk is at 100%, but the throughput is 'low' (15MB/sec, ~5,000 events per second). Why is this? Things I can think of are:
Since this is single-threaded, does the 25% CPU usage actually mean 100% of the one core I am on (is there any way to show which core my app is running on in Visual Studio)?
Am I constrained by I/O or by CPU? Can I expect better performance if I create my own thread pool, one thread per CPU?
How come I can copy a file at ~120MB/sec, but I can only get throughput of 15MB/sec in my app? Is this due to the write size of lots of smaller packets?
Anything else I have missed?
The code I am using is from the geteventstore docs/blog:
public class GetEventStoreRepository : IRepository
{
private const string EventClrTypeHeader = "EventClrTypeName";
private const string AggregateClrTypeHeader = "AggregateClrTypeName";
private const string CommitIdHeader = "CommitId";
private const int WritePageSize = 500;
private const int ReadPageSize = 500;
IStreamNamingConvention streamNamingConvention;
private readonly IEventStoreConnection connection;
private static readonly JsonSerializerSettings serializerSettings = new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.None };
public GetEventStoreRepository(IEventStoreConnection eventStoreConnection, IStreamNamingConvention namingConvention)
{
this.connection = eventStoreConnection;
this.streamNamingConvention = namingConvention;
}
public void Save(IAggregate aggregate)
{
this.Save(aggregate, Guid.NewGuid(), d => { });
}
public void Save(IAggregate aggregate, Guid commitId, Action<IDictionary<string, object>> updateHeaders)
{
var commitHeaders = new Dictionary<string, object>
{
{CommitIdHeader, commitId},
{AggregateClrTypeHeader, aggregate.GetType().AssemblyQualifiedName}
};
updateHeaders(commitHeaders);
var streamName = this.streamNamingConvention.GetStreamName(aggregate.GetType(), aggregate.Identity);
var newEvents = aggregate.GetUncommittedEvents().Cast<object>().ToList();
var originalVersion = aggregate.Version - newEvents.Count;
var expectedVersion = originalVersion == 0 ? ExpectedVersion.NoStream : originalVersion - 1;
var eventsToSave = newEvents.Select(e => ToEventData(Guid.NewGuid(), e, commitHeaders)).ToList();
if (eventsToSave.Count < WritePageSize)
{
this.connection.AppendToStreamAsync(streamName, expectedVersion, eventsToSave).Wait();
}
else
{
var startTransactionTask = this.connection.StartTransactionAsync(streamName, expectedVersion);
startTransactionTask.Wait();
var transaction = startTransactionTask.Result;
var position = 0;
while (position < eventsToSave.Count)
{
var pageEvents = eventsToSave.Skip(position).Take(WritePageSize);
var writeTask = transaction.WriteAsync(pageEvents);
writeTask.Wait();
position += WritePageSize;
}
var commitTask = transaction.CommitAsync();
commitTask.Wait();
}
aggregate.ClearUncommittedEvents();
}
private static EventData ToEventData(Guid eventId, object evnt, IDictionary<string, object> headers)
{
var data = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(evnt, serializerSettings));
var eventHeaders = new Dictionary<string, object>(headers)
{
{
EventClrTypeHeader, evnt.GetType().AssemblyQualifiedName
}
};
var metadata = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(eventHeaders, serializerSettings));
var typeName = evnt.GetType().Name;
return new EventData(eventId, typeName, true, data, metadata);
}
}
It was partially mentioned in the comments, but to expand on that: since you are working fully single-threaded in the code above (though you use async methods, you just Wait() on them, so you are effectively working synchronously), you are suffering the latency and overhead of context switching and of the EventStore protocol going back and forth for every save. Either really go the async route, but avoid waiting on the async tasks and instead parallelize them (EventStore likes parallelization because it can batch multiple writes), or do the batching yourself and send, for example, 20 events at a time.
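As a rough sketch of the first option (not taken from the answer), the saving loop could issue the appends concurrently and await them in bounded groups. SaveAsync here is a hypothetical async counterpart of Save() that awaits connection.AppendToStreamAsync(...) instead of calling Wait() on it; the snippet assumes an async context and System.Linq:

//Sketch only: parallelize the saves in bounded groups instead of blocking on each one
const int groupSize = 20; //assumed group size
for (int i = 0; i < testData.Count; i += groupSize)
{
    var pending = testData
        .Skip(i)
        .Take(groupSize)
        .Select(cart => repository.SaveAsync(cart, Guid.NewGuid(), d => { }))
        .ToList();
    await Task.WhenAll(pending); //let EventStore process a group of writes concurrently
}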

Does ConcurrencyMode of Multiple have relevance when InstanceContextMode is PerCall for a WCF service with Net.Tcp binding?

I always thought that setting InstanceContextMode to PerCall makes concurrency mode irrelevant even if using a session aware binding like net.tcp. This is what MSDN says
http://msdn.microsoft.com/en-us/library/ms731193.aspx
"In PerCallinstancing, concurrency is not relevant, because each message is processed by a new InstanceContext and, therefore, never more than one thread is active in the InstanceContext."
But today I was going through Juval Lowy's book Programming WCF Services and he writes in Chapter 8
If the per-call service has a transport-level session, whether concurrent processing of calls is allowed is a product of the service concurrency mode. If the service is configured with ConcurrencyMode.Single, concurrent processing of the pending calls is not allowed, and the calls are dispatched one at a time. [...] I consider this to be a flawed design. If the service is configured with ConcurrencyMode.Multiple, concurrent processing is allowed. Calls are dispatched as they arrive, each to a new instance, and execute concurrently. An interesting observation here is that in the interest of throughput, it is a good idea to configure a per-call service with ConcurrencyMode.Multiple; the instance itself will still be thread-safe (so you will not incur the synchronization liability), yet you will allow concurrent calls from the same client.
This is contradicting my understanding and what MSDN says. Which is correct ?
In my case I have a WCF Net.Tcp service used by many client applications that create a new proxy object, make the call, and then immediately close the proxy. The service has PerCall InstanceContextMode. Will I get improved throughput if I change the ConcurrencyMode to Multiple, with no worse thread-safety behaviour than PerCall?
The key phrase in reading Lowy’s statement is “in the interest of throughput”. Lowy is pointing out that when using ConcurrencyMode.Single WCF will blindly implement a lock to enforce serialization to the service instance. Locks are expensive and this one isn’t necessary because PerCall already guarantees that a second thread will never try to call the same service instance.
In terms of behavior:
ConcurrencyMode does not matter for a PerCall service instance.
In terms of performance:
A PerCall service that is ConcurrencyMode.Multiple should be slightly faster because it's not creating and acquiring the (unneeded) thread lock that ConcurrencyMode.Single uses.
I wrote a quick benchmark program to see if I could measure the performance impact of Single vs Multiple for a PerCall service: The benchmark showed no meaningful difference.
I pasted in the code below if you want to try running it yourself.
Test cases I tried:
600 threads calling a service 500 times
200 threads calling a service 1000 times
8 threads calling a service 10000 times
1 thread calling a service 10000 times
I ran this on a 4 CPU VM running Server 2008 R2. All but the 1 thread case were CPU constrained.
Results:
All the runs were within about 5% of each other.
Sometimes ConcurrencyMode.Multiple was faster. Sometimes ConcurrencyMode.Single was faster. Maybe a proper statistical analysis could pick a winner. In my opinion they are close enough to not matter.
Here’s a typical output:
Starting Single Service on net.pipe://localhost/base...
Type=SingleService ThreadCount=600 ThreadCallCount=500
runtime: 45156759 ticks 12615 msec
Starting Multiple Service on net.pipe://localhost/base...
Type=MultipleService ThreadCount=600 ThreadCallCount=500
runtime: 48731273 ticks 13613 msec
Starting Single Service on net.pipe://localhost/base...
Type=SingleService ThreadCount=600 ThreadCallCount=500
runtime: 48701509 ticks 13605 msec
Starting Multiple Service on net.pipe://localhost/base...
Type=MultipleService ThreadCount=600 ThreadCallCount=500
runtime: 48590336 ticks 13574 msec
Benchmark Code:
Usual caveat: This is benchmark code that takes short cuts that aren’t appropriate for production use.
using System;
using System.Collections.Generic;
using System.Linq;
using System.ServiceModel;
using System.ServiceModel.Description;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace WCFTest
{
[ServiceContract]
public interface ISimple
{
[OperationContract()]
void Put();
}
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall, ConcurrencyMode = ConcurrencyMode.Single)]
public class SingleService : ISimple
{
public void Put()
{
//Console.WriteLine("put got " + i);
return;
}
}
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall, ConcurrencyMode = ConcurrencyMode.Multiple)]
public class MultipleService : ISimple
{
public void Put()
{
//Console.WriteLine("put got " + i);
return;
}
}
public class ThreadParms
{
public int ManagedThreadId { get; set; }
public ServiceEndpoint ServiceEndpoint { get; set; }
}
public class BenchmarkService
{
public readonly int ThreadCount;
public readonly int ThreadCallCount;
public readonly Type ServiceType;
int _completed = 0;
System.Diagnostics.Stopwatch _stopWatch;
EventWaitHandle _waitHandle;
bool _done;
public BenchmarkService(Type serviceType, int threadCount, int threadCallCount)
{
this.ServiceType = serviceType;
this.ThreadCount = threadCount;
this.ThreadCallCount = threadCallCount;
_done = false;
}
public void Run(string baseAddress)
{
if (_done)
throw new InvalidOperationException("Can't run twice");
ServiceHost host = new ServiceHost(ServiceType, new Uri(baseAddress));
host.Open();
Console.WriteLine("Starting " + ServiceType.Name + " on " + baseAddress + "...");
_waitHandle = new EventWaitHandle(false, EventResetMode.ManualReset);
_completed = 0;
_stopWatch = System.Diagnostics.Stopwatch.StartNew();
ServiceEndpoint endpoint = host.Description.Endpoints.Find(typeof(ISimple));
for (int i = 1; i <= ThreadCount; i++)
{
// ServiceEndpoint is NOT thread safe. Make a copy for each thread.
ServiceEndpoint temp = new ServiceEndpoint(endpoint.Contract, endpoint.Binding, endpoint.Address);
ThreadPool.QueueUserWorkItem(new WaitCallback(CallServiceManyTimes),
new ThreadParms() { ManagedThreadId = i, ServiceEndpoint = temp });
}
_waitHandle.WaitOne();
host.Shutdown();
_done = true;
//Console.WriteLine("All DONE.");
Console.WriteLine(" Type=" + ServiceType.Name + " ThreadCount=" + ThreadCount + " ThreadCallCount=" + ThreadCallCount);
Console.WriteLine(" runtime: " + _stopWatch.ElapsedTicks + " ticks " + _stopWatch.ElapsedMilliseconds + " msec");
}
public void CallServiceManyTimes(object threadParams)
{
ThreadParms p = (ThreadParms)threadParams;
ChannelFactory<ISimple> factory = new ChannelFactory<ISimple>(p.ServiceEndpoint);
ISimple proxy = factory.CreateChannel();
for (int i = 1; i < ThreadCallCount; i++)
{
proxy.Put();
}
((ICommunicationObject)proxy).Shutdown();
factory.Shutdown();
int currentCompleted = Interlocked.Increment(ref _completed);
if (currentCompleted == ThreadCount)
{
_stopWatch.Stop();
_waitHandle.Set();
}
}
}
class Program
{
static void Main(string[] args)
{
BenchmarkService benchmark;
int threadCount = 600;
int threadCalls = 500;
string baseAddress = "net.pipe://localhost/base";
for (int i = 0; i <= 4; i++)
{
benchmark = new BenchmarkService(typeof(SingleService), threadCount, threadCalls);
benchmark.Run(baseAddress);
benchmark = new BenchmarkService(typeof(MultipleService), threadCount, threadCalls);
benchmark.Run(baseAddress);
}
baseAddress = "http://localhost/base";
for (int i = 0; i <= 4; i++)
{
benchmark = new BenchmarkService(typeof(SingleService), threadCount, threadCalls);
benchmark.Run(baseAddress);
benchmark = new BenchmarkService(typeof(MultipleService), threadCount, threadCalls);
benchmark.Run(baseAddress);
}
Console.WriteLine("Press ENTER to close.");
Console.ReadLine();
}
}
public static class Extensions
{
static public void Shutdown(this ICommunicationObject obj)
{
try
{
if (obj != null)
obj.Close();
}
catch (Exception ex)
{
Console.WriteLine("Shutdown exception: {0}", ex.Message);
obj.Abort();
}
}
}
}
