Performance testing: One core vs multiple - multithreading

I'm trying to understand a problem I ran into recently in my project. I'm using the Aurigma library to resize images. It runs in single-threaded mode and produces only one thread during the calculation. Recently I decided to move to ImageMagick, because it is free and open source. I built IM in single-threaded mode and started testing. First I wanted to compare their performance without interruptions, so I created a test that gives high priority to the process and its thread, and I set the affinity to one core. The result: IM was about 25% faster than Aurigma. But the more threads I added, the smaller IM's advantage over Aurigma became.
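Roughly, the isolation setup looks like this (a minimal sketch using the standard System.Diagnostics APIs; the exact affinity mask and priority values here are illustrative, and the Runner method below shows the settings I actually use):
var p = Process.GetCurrentProcess();
p.ProcessorAffinity = new IntPtr(0x1);        // pin the whole process to core 0
p.PriorityClass = ProcessPriorityClass.High;  // schedule ahead of normal-priority processes
p.PriorityBoostEnabled = false;               // keep the priority stable while timing
Thread.CurrentThread.Priority = ThreadPriority.Highest; // raise the measuring thread as well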
My project is a Windows service that starts about 7-10 child processes. Each process has 2 threads that process images. When I ran my test as two separate processes with 2 threads each, I noticed that IM performed about 5% worse than Aurigma.
Maybe my question is not very detailed, but this area is a little new to me and I would be glad to get a direction for further investigation. How can one program be faster when run on one thread in one process, but slower when run in multiple processes at the same time?
For example:
Au: 8 processes x 2Th(20 tasks per thread) = 320 tasks for 245 secs
IM: 8 processes x 2Th(20 tasks per thread) = 320 tasks for 280 secs
Au: 4 processes x 2Th(20 tasks per thread) = 160 tasks for 121 secs
IM: 4 processes x 2Th(20 tasks per thread) = 160 tasks for 141 secs
We can see that Au does better once there is more than one process, but in single-process mode Au processes one task in 2.2 s and IM in 1.4 s, so the total time is better for IM.
private static void ThreadRunner(
    Action testFunc,
    int repeatCount,
    int threadCount
)
{
    WaitHandle[] handles = new WaitHandle[threadCount];
    var stopwatch = new Stopwatch();
    stopwatch.Start();
    for (int h = 0; h < threadCount; h++)
    {
        var handle = new ManualResetEvent(false);
        handles[h] = handle;
        var thread = new Thread(() =>
        {
            Runner(testFunc, repeatCount);
            handle.Set();
        });
        thread.Name = "Thread id" + h;
        thread.IsBackground = true;
        thread.Priority = ThreadPriority.Normal;
        thread.Start();
    }
    WaitHandle.WaitAll(handles);
    stopwatch.Stop();
    Console.WriteLine("All Threads Total time taken " + stopwatch.ElapsedMilliseconds);
}
private static void Runner(
    Action testFunc,
    int count
)
{
    //Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); // Use only the second core
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.BelowNormal;
    Process.GetCurrentProcess().PriorityBoostEnabled = false;
    Thread.CurrentThread.Priority = ThreadPriority.Normal;
    var stopwatch = new Stopwatch();
    // warmup: run the test function for 10 seconds before measuring
    stopwatch.Start();
    while (stopwatch.ElapsedMilliseconds < 10000)
        testFunc();
    stopwatch.Stop();
    long elmsec = 0;
    for (int i = 0; i < count; i++)
    {
        stopwatch.Reset();
        stopwatch.Start();
        testFunc();
        stopwatch.Stop();
        elmsec += stopwatch.ElapsedMilliseconds;
        Console.WriteLine("Ticks: " + stopwatch.ElapsedTicks +
            " mS: " + stopwatch.ElapsedMilliseconds + " Thread name: " + Thread.CurrentThread.Name);
    }
    Console.WriteLine("Total time taken " + elmsec + " Thread name: " + Thread.CurrentThread.Name);
}
/// <summary>
/// Entry point
/// </summary>
/// <param name="args"></param>
private static void Main(string[] args)
{
    var files = GetFiles(args.FirstOrDefault());
    if (!files.Any())
    {
        Console.WriteLine("Source files were not found.");
    }
    else
    {
        //// Run tests
        Console.WriteLine("ImageMagick run... Resize");
        Runner(() => PerformanceTests.RunResizeImageMagickTest(files[0]), 20);
        Console.WriteLine("Aurigma run... Resize");
        Runner(() => PerformanceTests.RunResizeAurigmaTest(files[0]), 20);
        Console.WriteLine("ImageMagick run... multi Resize");
        ThreadRunner(() => PerformanceTests.RunResizeImageMagickTest(files[0]), 20, 2);
        Console.WriteLine("Aurigma run... multi Resize");
        ThreadRunner(() => PerformanceTests.RunResizeAurigmaTest(files[0]), 20, 2);
    }
    Console.WriteLine("Done");
    Console.ReadKey();
}
public static void RunResizeImageMagickTest(string source)
{
    float[] ratios = { 0.25f, 0.8f, 1.4f };
    // load the source bitmap
    using (MagickImage bitmap = new MagickImage(source))
    {
        foreach (float ratio in ratios)
        {
            // determine the target image size
            var size = new Size((int)Math.Round(bitmap.Width * ratio), (int)Math.Round(bitmap.Height * ratio));
            MagickImage thumbnail = null;
            try
            {
                thumbnail = new MagickImage(bitmap);
                // scale the image down
                thumbnail.Resize(size.Width, size.Height);
            }
            finally
            {
                if (thumbnail != null && thumbnail != bitmap)
                {
                    thumbnail.Dispose();
                }
            }
        }
    }
}
public static void RunResizeAurigmaTest(string source)
{
    float[] ratios = { 0.25f, 0.8f, 1.4f };
    //// load the source bitmap
    using (ABitmap bitmap = new ABitmap(source))
    {
        foreach (float ratio in ratios)
        {
            // determine the target image size
            var size = new Size((int)Math.Round(bitmap.Width * ratio), (int)Math.Round(bitmap.Height * ratio));
            ABitmap thumbnail = null;
            try
            {
                thumbnail = new ABitmap();
                // scale the image down
                using (var resize = new Resize(size, InterpolationMode.HighQuality))
                {
                    resize.ApplyTransform(bitmap, thumbnail);
                }
            }
            finally
            {
                if (thumbnail != null && thumbnail != bitmap)
                {
                    thumbnail.Dispose();
                }
            }
        }
    }
}
The test code is included above. I use C#/.NET; ImageMagick is used through the ImageMagick.NET library (the .NET wrapper is written in C++/CLI, ImageMagick itself is C), and there is a .NET library for Aurigma as well. So quite a few languages are involved.
OpenMP for IM is off.

It could be a memory/cache issue. It is possible that multiple threads utilizing memory in a certain way create a scenario where one thread invalidates cache lines that another thread was using, causing stalls.
Programs that are not purely number-crunching but move a lot of data between CPU and memory are more difficult to analyze.
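To illustrate the mechanism, here is a minimal, self-contained C# sketch of false sharing (purely illustrative; it demonstrates the cache effect, not anything specific to ImageMagick or Aurigma). Two threads incrementing counters that sit on the same cache line run much slower than the same threads working on counters on separate lines:
using System;
using System.Diagnostics;
using System.Threading;

class FalseSharingDemo
{
    // 32 longs = 256 bytes, so slots far apart land on different cache lines.
    static readonly long[] Counters = new long[32];

    static long Measure(int indexA, int indexB)
    {
        var sw = Stopwatch.StartNew();
        var t1 = new Thread(() => { for (int i = 0; i < 50_000_000; i++) Interlocked.Increment(ref Counters[indexA]); });
        var t2 = new Thread(() => { for (int i = 0; i < 50_000_000; i++) Interlocked.Increment(ref Counters[indexB]); });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine("adjacent slots (shared cache line): " + Measure(0, 1) + " ms");
        Console.WriteLine("distant slots (separate lines):     " + Measure(8, 24) + " ms");
    }
}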

Related

Cosmos inserts won't parallelize effectively

Meta-Question:
We're pulling data from EventHub, running some logic, and saving it off to cosmos. Currently Cosmos inserts are our bottleneck. How do we maximize our throughput?
Details
We're trying to optimize our Cosmos throughput and there seems to be some contention in the SDK that makes parallel inserts only marginally faster than serial inserts.
We're logically doing:
for (int i = 0; i < insertCount; i++)
{
    taskList.Add(InsertCosmos(sdkContainerClient));
}
var parallelTimes = await Task.WhenAll(taskList);
Here are the results comparing serial inserts, parallel inserts, and "faking" an insert (with Task.Delay):
Serial took: 461ms for 20
- Individual times 28,8,117,19,14,11,10,12,5,8,9,11,18,15,79,23,14,16,14,13
Cosmos Parallel
Parallel took: 231ms for 20
- Individual times 17,15,23,39,45,52,72,74,80,91,96,98,108,117,123,128,139,146,147,145
Just Parallel (no cosmos)
Parallel took: 27ms for 20
- Individual times 27,26,26,26,26,26,26,25,25,25,25,25,25,24,24,24,23,23,23,23
Serial is obvious (just add up the individual times).
No cosmos (the last timing) is also obvious (just take the longest individual time).
But parallel cosmos doesn't parallelize nearly as well, indicating there's some contention.
We're running this on a VM in Azure (same datacenter as Cosmos), have enough RUs so aren't getting 429s, and using Microsoft.Azure.Cosmos 3.2.0.
Full Code Sample
class Program
{
    public static void Main(string[] args)
    {
        CosmosWriteTest().Wait();
    }

    public static async Task CosmosWriteTest()
    {
        var cosmosClient = new CosmosClient("todo", new CosmosClientOptions { ConnectionMode = ConnectionMode.Direct });
        var database = cosmosClient.GetDatabase("<ourcontainer>");
        var sdkContainerClient = database.GetContainer("<ourcontainer>");

        int insertCount = 25;

        //Warmup
        await sdkContainerClient.CreateItemAsync(new TestObject());

        //---Serially inserts into Cosmos---
        List<long> serialTimes = new List<long>();
        var serialTimer = Stopwatch.StartNew();
        Console.WriteLine("Cosmos Serial");
        for (int i = 0; i < insertCount; i++)
        {
            serialTimes.Add(await InsertCosmos(sdkContainerClient));
        }
        serialTimer.Stop();
        Console.WriteLine($"Serial took: {serialTimer.ElapsedMilliseconds}ms for {insertCount}");
        Console.WriteLine($" - Individual times {string.Join(",", serialTimes)}");

        //---Parallel inserts into Cosmos---
        Console.WriteLine(Environment.NewLine + "Cosmos Parallel");
        var parallelTimer = Stopwatch.StartNew();
        var taskList = new List<Task<long>>();
        for (int i = 0; i < insertCount; i++)
        {
            taskList.Add(InsertCosmos(sdkContainerClient));
        }
        var parallelTimes = await Task.WhenAll(taskList);
        parallelTimer.Stop();
        Console.WriteLine($"Parallel took: {parallelTimer.ElapsedMilliseconds}ms for {insertCount}");
        Console.WriteLine($" - Individual times {string.Join(",", parallelTimes)}");

        //---Testing parallelism minus cosmos---
        Console.WriteLine(Environment.NewLine + "Just Parallel (no cosmos)");
        var justParallelTimer = Stopwatch.StartNew();
        var noCosmosTaskList = new List<Task<long>>();
        for (int i = 0; i < insertCount; i++)
        {
            noCosmosTaskList.Add(InsertCosmos(sdkContainerClient, true));
        }
        var justParallelTimes = await Task.WhenAll(noCosmosTaskList);
        justParallelTimer.Stop();
        Console.WriteLine($"Parallel took: {justParallelTimer.ElapsedMilliseconds}ms for {insertCount}");
        Console.WriteLine($" - Individual times {string.Join(",", justParallelTimes)}");
    }

    //inserts
    private static async Task<long> InsertCosmos(Container sdkContainerClient, bool justDelay = false)
    {
        var timer = Stopwatch.StartNew();
        if (!justDelay)
            await sdkContainerClient.CreateItemAsync(new TestObject());
        else
            await Task.Delay(20);
        timer.Stop();
        return timer.ElapsedMilliseconds;
    }

    //Test object to save to Cosmos
    public class TestObject
    {
        public string id { get; set; } = Guid.NewGuid().ToString();
        public string pKey { get; set; } = Guid.NewGuid().ToString();
        public string Field1 { get; set; } = "Testing this field";
        public double Number { get; set; } = 12345;
    }
}
This is the scenario for which Bulk is being introduced. Bulk mode is in preview at this moment and available in the 3.2.0-preview2 package.
What you need to do to take advantage of Bulk is turn the AllowBulkExecution flag on:
new CosmosClient(endpoint, authKey, new CosmosClientOptions() { AllowBulkExecution = true } );
This mode was made to benefit this scenario you describe, a list of concurrent operations that need throughput.
We have a sample project here: https://github.com/Azure/azure-cosmos-dotnet-v3/tree/master/Microsoft.Azure.Cosmos.Samples/Usage/BulkSupport
And we are still working on the official documentation, but the idea is that when concurrent operations are issued, instead of executing them as individual requests like you are seeing right now, the SDK will group them based on partition affinity and execute them as grouped (batch) operations, reducing the backend service calls and potentially increasing throughput between 50%-100% depending on the volume of operations. This mode will consume more RU/s as it is pushing a higher volume of operations per second than issuing the operations individually (so if you hit 429s it means the bottleneck is now on the provisioned RU/s).
var cosmosClient = new CosmosClient("todo", new CosmosClientOptions { AllowBulkExecution = true });
var database = cosmosClient.GetDatabase("<ourcontainer>");
var sdkContainerClient = database.GetContainer("<ourcontainer>");

//The more operations the better, just 25 might not yield a great difference vs non bulk
int insertCount = 10000;

//Don't do any warmup

List<Task> operations = new List<Task>();
var timer = Stopwatch.StartNew();
for (int i = 0; i < insertCount; i++)
{
    operations.Add(sdkContainerClient.CreateItemAsync(new TestObject()));
}
await Task.WhenAll(operations);
timer.Stop();
Important: This is a feature that is still in preview. Since this mode is optimized for throughput (not latency), no single individual operation will have great latency on its own.
If you want to optimize even further, and your data source lets you access streams (avoiding serialization), you can use the CreateItemStream SDK methods for even better throughput.
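For example (a hedged sketch, assuming the payload is already available as raw UTF-8 JSON bytes, e.g. straight off Event Hub; jsonBytes and partitionKeyValue are placeholders). Note the stream APIs return a ResponseMessage and do not throw on failure, so the status code has to be checked:
using (var stream = new MemoryStream(jsonBytes))
using (ResponseMessage response = await sdkContainerClient.CreateItemStreamAsync(
    stream, new PartitionKey(partitionKeyValue)))
{
    if (!response.IsSuccessStatusCode)
    {
        // handle the failure; stream methods do not throw on non-success status codes
    }
}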

.NET Core: Why are there so many threads when I use async IO (on Windows)?

I am a little bit confused about how many threads .NET Core creates. I have a very simple program that fetches static content from our application's server for an array of URLs:
class Program
{
    static void Main(string[] args)
    {
        //ThreadPool.SetMaxThreads(10, 10);
        var urls = new string[]
        {
            "http://url1.....",
            "http://url2.....",
            .......
        };
        var taskCount = 100;
        var count = 1000;
        var httpClient = new HttpClient { Timeout = TimeSpan.FromMilliseconds(10000) };
        var tasks = new Collection<Task>();
        var threads = new ConcurrentBag<int>();
        for (var i = 0; i < taskCount; i++)
        {
            tasks.Add(Task.Run(async () =>
            {
                var sw = new Stopwatch();
                for (var j = 0; j < count; j++)
                {
                    foreach (var url in urls)
                    {
                        sw.Start();
                        try
                        {
                            threads.Add(Thread.CurrentThread.ManagedThreadId);
                            using (var r = await httpClient.GetAsync(url))
                            {
                                threads.Add(Thread.CurrentThread.ManagedThreadId);
                                if (r.StatusCode != HttpStatusCode.OK)
                                {
                                    sw.Stop();
                                    Console.WriteLine($"[{url}] - status: [{r.StatusCode}], time: [{sw.ElapsedMilliseconds}]");
                                }
                                else
                                {
                                    using (var content = r.Content)
                                    {
                                        threads.Add(Thread.CurrentThread.ManagedThreadId);
                                        _ = await content.ReadAsByteArrayAsync();
                                        threads.Add(Thread.CurrentThread.ManagedThreadId);
                                        sw.Stop();
                                        Console.WriteLine($"[{url}] - status: [{r.StatusCode}], time: [{sw.ElapsedMilliseconds}]");
                                    }
                                }
                            }
                        }
                        catch (Exception e)
                        {
                            Console.WriteLine(e);
                        }
                        sw.Reset();
                    }
                }
            }));
        }
        Console.ReadKey();
        File.AppendAllText(@"E:\temp\log.txt", $"t count: {threads.ToList().Distinct().ToList().Count}\r\n");
    }
}
Only 100 tasks fetch content from the URLs, log it, and do it again. When I run this program on Windows using .NET Core 2.0, I see 10 thread ids in the log file. I understand that these are threads from the thread pool, but when I look at the thread count in Windows Resource Monitor, I see 120 threads. Why?
If I run this program on Windows using .NET Framework 4.7.1, I see 17 thread ids in the log file and 37 threads in Windows Resource Monitor. If I use ThreadPool.SetMaxThreads(10, 10) in both cases, nothing changes: I still see 120 threads under .NET Core 2.0 and around 37 under .NET Framework 4.7.1 in Windows Resource Monitor. I do not understand why. Can someone please explain what these 120 threads are and why .NET Core 2.0 creates so many of them?
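(A small diagnostic sketch that may help separate the two numbers: it prints the pool limits next to the OS-level thread count. Runtime-owned threads such as the GC, finalizer, timer, and socket event-loop threads appear in the OS count but will never show up in the log file, since they never run your delegates.)
using System;
using System.Diagnostics;
using System.Threading;

class ThreadCountProbe
{
    static void Main()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        Console.WriteLine($"Pool min/max: {minWorker}/{maxWorker} worker, {minIo}/{maxIo} IO");
        Console.WriteLine($"OS-level threads in this process: {Process.GetCurrentProcess().Threads.Count}");
    }
}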

Should I use CRITICAL_SECTION or not?

I have a program with a UI through which users choose how to display the data and make small configuration changes. It also has a background procedure that continuously reads data from the network and updates the data to display.
Now I put them in one process:
background procedure:
STATE MainWindow::Rcv()
{
    DeviceMAP::iterator dev;
    for (dev = dev_map.begin(); dev != dev_map.end(); dev++)
    {
        dev->second.rcvData(); //receive data from the network, the time can be ignored
        BitLog* log = new BitLog();
        dev->second.parseData(log);
        LogItem* logItem = new LogItem();
        logItem->time = QString::fromLocal8Bit(log->rcvTime.c_str());
        logItem->name = QString::fromLocal8Bit(log->basicInfo.getName().c_str());
        logItem->PIN = QString::fromLocal8Bit(log->basicInfo.getPIN().c_str()).toShort();
        delete log;
        add_logItem(logItem);
    }
    return SUCCESS;
}
add_logItem:
void MainWindow::add_logItem(LogItem* logItem)
{
    writeToFile(logItem);
    Device* r = getDevbyPIN(QString::number(logItem->PIN));
    if (r == NULL) return;
    devInfo_inside_widget::States state = logItem->state;
    bool bool_list[portsNum_X];
    for (int i = 0; i < portsNum_X; i++)
    {
        bool_list[i] = 0;
    }
    for (int i = 0; i < portsNum; i++)
    {
        bool_list[i] = (logItem->BITS[i / 8] >> (7 - i % 8)) & 0x1;
    }
    r->refresh(state, logItem->time, bool_list); //update data inside...state, time, BITS...
    IconLabel* icl = getIConLabelByDev(r); //update data
    icl->refresh(state);
    logDisplayQueue.enqueue(logItem); //write queue here
    int size = logDisplayQueue.size();
    if (size > 100)
    {
        logDisplayQueue.dequeue(); //write queue here
    }
}
The code above does not touch any UI yet, but when the user pushes a radio button in the UI, the program has to filter the data in the queue and display it in the table widget:
ui operations:
void MainWindow::filter_log_display(bool bol)
{
    row_selectable = false;
    ui->tableWidget->setRowCount(0); //delete all table items
    row_selectable = true;
    int size_1 = logDisplayQueue.size() - 1;
    ui->tableWidget->verticalScrollBar()->setSliderPosition(0);
    if (size_1 + 1 < 100)
    {
        ui->tableWidget->setRowCount(size_1 + 1);
    }
    else
    {
        ui->tableWidget->setRowCount(100); //display 100 rows at most
    }
    if (bol) //filter from logDisplayQueue and display unworking-state-log rows
    {
        int index = 0;
        for (int queue_i = size_1; queue_i >= 0; queue_i--)
        {
            LogItem* logItem = (LogItem*)logDisplayQueue.at(queue_i); //read queue here
            if (logItem->state == STATE_WORK || logItem->state == STATE_UN) continue;
            QString BITS_str = bits2Hexs(logItem->BITS);
            ui->tableWidget->setItem(index, 0, new QTableWidgetItem(logItem->time)); //time
            ui->tableWidget->setItem(index, 1, new QTableWidgetItem(logItem->name)); //name
            ui->tableWidget->setItem(index, 2, new QTableWidgetItem(BITS_str)); //BITS
            if (queue_i == oldRowItemNo) ui->tableWidget->selectRow(index);
            index++;
        }
        ui->tableWidget->setRowCount(index);
    }
    else //display all rows
    {
        for (int queue_i = size_1, index = 0; queue_i >= 0; queue_i--, index++)
        {
            LogItem* logItem = (LogItem*)logDisplayQueue.at(queue_i); //read queue here
            QString BITS_str = bits2Hexs(logItem->BITS);
            finish = clock();
            ui->tableWidget->setItem(index, 0, new QTableWidgetItem(logItem->time)); //time
            ui->tableWidget->setItem(index, 1, new QTableWidgetItem(logItem->name)); //name
            ui->tableWidget->setItem(index, 2, new QTableWidgetItem(BITS_str)); //BITS
            if (queue_i == oldRowItemNo) ui->tableWidget->selectRow(index);
        }
    }
}
So the queue is quite small and the background procedure runs quite frequently (nearly 500 times per second). That is, the queue is written 500 times per second, but only read for display from time to time by the user.
I want to split the functions into two threads and run them together: one to receive and update the data, one to display it.
If I do not use any lock or mutex, the user may get wrong data, but if I force the write-data procedure to enter and leave a critical section every time, it will be a heavy overhead. :)
Should I use CRITICAL_SECTION or something else? Any related suggestions are welcome. (My wording may be verbose; I only hope for some hints. :))
I'd put the "Rcv" function in another QObject-derived class, move it to a QThread other than the main GUI thread, and connect its "logItemAdded(LogItem* item)" signal to the main window's "add_logItem(LogItem* item)" slot.
As a quick and dirty hint, my conceptual code follows.
#include <QObject>

class Logger : public QObject
{
    Q_OBJECT
public:
    Logger(QObject* parent = 0);
    virtual ~Logger();

signals:
    void logItemAdded(LogItem* logItem);

public slots:

protected:
    void Rcv()
    {
        // ...
        // was "add_logItem(logItem)"
        emit logItemAdded(logItem);
    }
};
MainWindow::MainWindow(...)
{
    Logger* logger = new Logger;
    // setup your logger
    QThread* thread = new QThread;
    logger->moveToThread(thread);
    connect(thread, SIGNAL(finished()), thread, SLOT(deleteLater()));
    thread->start();
}
Hope this helps.
Good luck.

Does ConcurrencyMode of Multiple have relevance when InstanceContextMode is PerCall for a WCF service with Net.Tcp binding?

I always thought that setting InstanceContextMode to PerCall makes concurrency mode irrelevant even if using a session aware binding like net.tcp. This is what MSDN says
http://msdn.microsoft.com/en-us/library/ms731193.aspx
"In PerCallinstancing, concurrency is not relevant, because each message is processed by a new InstanceContext and, therefore, never more than one thread is active in the InstanceContext."
But today I was going through Juval Lowy's book Programming WCF Services and he writes in Chapter 8
If the per-call service has a transport-level session, whether concurrent processing of calls is allowed is a product of the service concurrency mode. If the service is configured with ConcurrencyMode.Single, concurrent processing of the pending calls is not allowed, and the calls are dispatched one at a time. [...] I consider this to be a flawed design. If the service is configured with ConcurrencyMode.Multiple, concurrent processing is allowed. Calls are dispatched as they arrive, each to a new instance, and execute concurrently. An interesting observation here is that in the interest of throughput, it is a good idea to configure a per-call service with ConcurrencyMode.Multiple— the instance itself will still be thread-safe (so you will not incur the synchronization liability), yet you will allow concurrent calls from the same client.
This contradicts my understanding and what MSDN says. Which is correct?
In my case I have a WCF Net.Tcp service used by many client applications, each of which creates a new proxy object, makes the call, and then immediately closes the proxy. The service uses PerCall InstanceContextMode. Will I get improved throughput if I change the ConcurrencyMode to Multiple, with no worse thread-safety behaviour than PerCall?
The key phrase in reading Lowy’s statement is “in the interest of throughput”. Lowy is pointing out that when using ConcurrencyMode.Single WCF will blindly implement a lock to enforce serialization to the service instance. Locks are expensive and this one isn’t necessary because PerCall already guarantees that a second thread will never try to call the same service instance.
In terms of behavior:
ConcurrencyMode does not matter for a PerCall service instance.
In terms of performance:
A PerCall service that is ConcurrencyMode.Multiple should be slightly faster because it's not creating and acquiring the (unneeded) thread lock that ConcurrencyMode.Single uses.
I wrote a quick benchmark program to see if I could measure the performance impact of Single vs Multiple for a PerCall service: the benchmark showed no meaningful difference.
I pasted in the code below if you want to try running it yourself.
Test cases I tried:
600 threads calling a service 500 times
200 threads calling a service 1000 times
8 threads calling a service 10000 times
1 thread calling a service 10000 times
I ran this on a 4-CPU VM running Server 2008 R2. All but the 1-thread case were CPU constrained.
Results:
All the runs were within about 5% of each other.
Sometimes ConcurrencyMode.Multiple was faster. Sometimes ConcurrencyMode.Single was faster. Maybe a proper statistical analysis could pick a winner. In my opinion they are close enough to not matter.
Here’s a typical output:
Starting Single Service on net.pipe://localhost/base...
Type=SingleService ThreadCount=600 ThreadCallCount=500
runtime: 45156759 ticks 12615 msec
Starting Multiple Service on net.pipe://localhost/base...
Type=MultipleService ThreadCount=600 ThreadCallCount=500
runtime: 48731273 ticks 13613 msec
Starting Single Service on net.pipe://localhost/base...
Type=SingleService ThreadCount=600 ThreadCallCount=500
runtime: 48701509 ticks 13605 msec
Starting Multiple Service on net.pipe://localhost/base...
Type=MultipleService ThreadCount=600 ThreadCallCount=500
runtime: 48590336 ticks 13574 msec
Benchmark Code:
Usual caveat: This is benchmark code that takes short cuts that aren’t appropriate for production use.
using System;
using System.Collections.Generic;
using System.Linq;
using System.ServiceModel;
using System.ServiceModel.Description;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace WCFTest
{
    [ServiceContract]
    public interface ISimple
    {
        [OperationContract()]
        void Put();
    }

    [ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall, ConcurrencyMode = ConcurrencyMode.Single)]
    public class SingleService : ISimple
    {
        public void Put()
        {
            //Console.WriteLine("put got " + i);
            return;
        }
    }

    [ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall, ConcurrencyMode = ConcurrencyMode.Multiple)]
    public class MultipleService : ISimple
    {
        public void Put()
        {
            //Console.WriteLine("put got " + i);
            return;
        }
    }

    public class ThreadParms
    {
        public int ManagedThreadId { get; set; }
        public ServiceEndpoint ServiceEndpoint { get; set; }
    }

    public class BenchmarkService
    {
        public readonly int ThreadCount;
        public readonly int ThreadCallCount;
        public readonly Type ServiceType;

        int _completed = 0;
        System.Diagnostics.Stopwatch _stopWatch;
        EventWaitHandle _waitHandle;
        bool _done;

        public BenchmarkService(Type serviceType, int threadCount, int threadCallCount)
        {
            this.ServiceType = serviceType;
            this.ThreadCount = threadCount;
            this.ThreadCallCount = threadCallCount;
            _done = false;
        }

        public void Run(string baseAddress)
        {
            if (_done)
                throw new InvalidOperationException("Can't run twice");

            ServiceHost host = new ServiceHost(ServiceType, new Uri(baseAddress));
            host.Open();
            Console.WriteLine("Starting " + ServiceType.Name + " on " + baseAddress + "...");

            _waitHandle = new EventWaitHandle(false, EventResetMode.ManualReset);
            _completed = 0;
            _stopWatch = System.Diagnostics.Stopwatch.StartNew();

            ServiceEndpoint endpoint = host.Description.Endpoints.Find(typeof(ISimple));
            for (int i = 1; i <= ThreadCount; i++)
            {
                // ServiceEndpoint is NOT thread safe. Make a copy for each thread.
                ServiceEndpoint temp = new ServiceEndpoint(endpoint.Contract, endpoint.Binding, endpoint.Address);
                ThreadPool.QueueUserWorkItem(new WaitCallback(CallServiceManyTimes),
                    new ThreadParms() { ManagedThreadId = i, ServiceEndpoint = temp });
            }

            _waitHandle.WaitOne();
            host.Shutdown();
            _done = true;

            //Console.WriteLine("All DONE.");
            Console.WriteLine(" Type=" + ServiceType.Name + " ThreadCount=" + ThreadCount + " ThreadCallCount=" + ThreadCallCount);
            Console.WriteLine(" runtime: " + _stopWatch.ElapsedTicks + " ticks " + _stopWatch.ElapsedMilliseconds + " msec");
        }

        public void CallServiceManyTimes(object threadParams)
        {
            ThreadParms p = (ThreadParms)threadParams;
            ChannelFactory<ISimple> factory = new ChannelFactory<ISimple>(p.ServiceEndpoint);
            ISimple proxy = factory.CreateChannel();
            for (int i = 1; i < ThreadCallCount; i++)
            {
                proxy.Put();
            }
            ((ICommunicationObject)proxy).Shutdown();
            factory.Shutdown();

            int currentCompleted = Interlocked.Increment(ref _completed);
            if (currentCompleted == ThreadCount)
            {
                _stopWatch.Stop();
                _waitHandle.Set();
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            BenchmarkService benchmark;
            int threadCount = 600;
            int threadCalls = 500;
            string baseAddress = "net.pipe://localhost/base";

            for (int i = 0; i <= 4; i++)
            {
                benchmark = new BenchmarkService(typeof(SingleService), threadCount, threadCalls);
                benchmark.Run(baseAddress);
                benchmark = new BenchmarkService(typeof(MultipleService), threadCount, threadCalls);
                benchmark.Run(baseAddress);
            }

            baseAddress = "http://localhost/base";
            for (int i = 0; i <= 4; i++)
            {
                benchmark = new BenchmarkService(typeof(SingleService), threadCount, threadCalls);
                benchmark.Run(baseAddress);
                benchmark = new BenchmarkService(typeof(MultipleService), threadCount, threadCalls);
                benchmark.Run(baseAddress);
            }

            Console.WriteLine("Press ENTER to close.");
            Console.ReadLine();
        }
    }

    public static class Extensions
    {
        static public void Shutdown(this ICommunicationObject obj)
        {
            try
            {
                if (obj != null)
                    obj.Close();
            }
            catch (Exception ex)
            {
                Console.WriteLine("Shutdown exception: {0}", ex.Message);
                obj.Abort();
            }
        }
    }
}

Reading output from console application and WPF plotting async

I have a console application which outputs about 160 lines of info every 1 second.
The data output is points that can be used to plot on a graph.
In my WPF application, I've successfully hooked this up and the data output by the console application is being plotted; however, after about 500 or so data points, I see significant slowdown in the application and UI thread lockups.
I assume this is due to the async operations I'm using:
BackgroundWorker worker = new BackgroundWorker();
worker.DoWork += delegate(object s, DoWorkEventArgs args)
{
    _process = new Process();
    _process.StartInfo.FileName = "consoleApp.exe";
    _process.StartInfo.UseShellExecute = false;
    _process.StartInfo.RedirectStandardOutput = true;
    _process.StartInfo.CreateNoWindow = true;
    _process.EnableRaisingEvents = true;
    _process.OutputDataReceived += new DataReceivedEventHandler(SortOutputHandler);
    _process.Start();
    _process.BeginOutputReadLine();
    _watch.Start();
};
worker.RunWorkerAsync();
And the handler that is taking care of parsing and plotting the data:
private void SortOutputHandler(object sendingProcess, DataReceivedEventArgs outLine)
{
    if (!String.IsNullOrEmpty(outLine.Data))
    {
        var xGroup = Regex.Match(outLine.Data, "x: ?([-0-9]*)").Groups[1];
        int x = int.Parse(xGroup.Value);
        var yGroup = Regex.Match(outLine.Data, "y: ?([-0-9]*)").Groups[1];
        int y = int.Parse(yGroup.Value);
        var zGroup = Regex.Match(outLine.Data, "z: ?([-0-9]*)").Groups[1];
        int z = int.Parse(zGroup.Value);

        Reading reading = new Reading()
        {
            Time = _watch.Elapsed.TotalMilliseconds,
            X = x,
            Y = y,
            Z = z
        };

        Dispatcher.Invoke(new Action(() =>
        {
            _readings.Enqueue(reading);
            _dataPointsCount++;
        }), System.Windows.Threading.DispatcherPriority.Normal);
    }
}
_readings is a custom ObservableQueue<Reading> as defined in this answer. I've modified it so that only 50 items can be in the queue at a time: if a new item is being added and the queue count >= 50, a Dequeue() is called before the Enqueue().
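The capping logic is roughly this (a minimal sketch over a plain Queue<T>; the real ObservableQueue<T> from that answer additionally raises CollectionChanged so the WPF binding updates):
using System.Collections.Generic;

public sealed class BoundedQueue<T>
{
    private readonly Queue<T> _inner = new Queue<T>();
    private readonly int _capacity;

    public BoundedQueue(int capacity) { _capacity = capacity; }

    public void Enqueue(T item)
    {
        if (_inner.Count >= _capacity)
            _inner.Dequeue();  // drop the oldest item first
        _inner.Enqueue(item);  // then add the new one
    }

    public int Count { get { return _inner.Count; } }
}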
Is there any way I can improve the performance or am I doomed because of how much the console app outputs?
From what I can tell, here is what is going on:
The UI thread spins up a background worker to launch the console app.
It redirects the output of the console and handles it with a handler on the UI thread.
The handler on the UI thread then calls Dispatcher.Invoke 160 times a second to update a queue object on the same thread.
After 50 calls the queue starts blocking while items are dequeued by the UI.
The trouble would seem to be:
Having the UI thread handle the raw output from the console and the queue and the update to the Graph.
There is also a potential problem with blocking between enqueue and dequeue once the UI is over 50 data items behind that might be leading to a cascading failure. (I can't see enough of the code to be sure of that)
Resolution:
Start another background thread to manage the data from the console app
The new thread should: Create the Queue; handle the OutputDataReceived event; and launch the console app process.
The event handler should not use Dispatcher.Invoke to update the queue. A direct thread-safe call should be used (see the sketch below).
The queue really needs to be non-blocking when updating the UI, but I don't really have enough information about how that's being implemented to comment.
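A minimal sketch of those steps, assuming a ConcurrentQueue<string> as the thread-safe structure (the names here are illustrative):
var readings = new ConcurrentQueue<string>(); // owned by the background side, drained by the UI on its own schedule

var proc = new Process();
proc.StartInfo.FileName = "consoleApp.exe";
proc.StartInfo.UseShellExecute = false;
proc.StartInfo.RedirectStandardOutput = true;
proc.StartInfo.CreateNoWindow = true;
proc.OutputDataReceived += (s, e) =>
{
    if (!string.IsNullOrEmpty(e.Data))
        readings.Enqueue(e.Data); // direct thread-safe call, no Dispatcher.Invoke
};
proc.Start();
proc.BeginOutputReadLine();
// The UI thread later drains `readings` in batches, e.g. from a DispatcherTimer tick.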
Hope this helps
-Chris
I suspect that there's a thread starvation issue happening on the UI thread as your background thread is marshaling calls to an observable collection that is possibly forcing the underlying CollectionView to be recreated each time. This can be a pretty expensive operation.
How you've got your XAML configured is also a concern. The measure/layout changes alone could be killing you. I would imagine that at the rate the data is coming in, the UI hasn't got a chance to properly evaluate what's happening to the underlying data.
I would suggest not binding the View to the Queue directly. Instead of using an Observable Queue as you've suggested, consider:
Use a regular queue that caps content at 50 items. Don't worry about the NotifyCollectionChanged event happening on the UI thread. You also won't have to marshal each item to the UI thread either.
Expose a CollectionViewSource object in your ViewModel that takes the Queue as its collection.
Use a timer thread on the UI to manually force a refresh of the CollectionViewSource. Start with once a second and decrease the interval to see what your XAML and machine can handle. In this fashion, you control when the CollectionView is created and destroyed.
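A rough sketch of that last point (assuming a Queue<Reading> field named _queue in the ViewModel; the one-second interval is just a starting guess):
// using System.Windows.Data; using System.Windows.Threading;
var viewSource = new CollectionViewSource { Source = _queue }; // the XAML binds to viewSource.View

var timer = new DispatcherTimer { Interval = TimeSpan.FromSeconds(1) };
timer.Tick += (s, e) => viewSource.View.Refresh(); // the view is rebuilt only here, on our schedule
timer.Start();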
You could try passing the processed data onto the UI Thread from the BackgroundWorker ProgressChanged event.
Something like....
// Standard warnings apply: not tested, no exception handling, etc.
var locker = new object();
var que = new ConcurrentQueue<string>();
var worker = new BackgroundWorker();
worker.WorkerReportsProgress = true;       // required for ReportProgress
worker.WorkerSupportsCancellation = true;  // required for CancellationPending

var proc = new Process();
proc.StartInfo.FileName = "consoleApp.exe";
proc.StartInfo.UseShellExecute = false;
proc.StartInfo.RedirectStandardOutput = true;
proc.StartInfo.CreateNoWindow = true;
proc.EnableRaisingEvents = true;
proc.OutputDataReceived +=
    (p, a) =>
    {
        que.Enqueue(a.Data);
        lock (locker) Monitor.Pulse(locker); // Pulse requires holding the lock
    };

worker.DoWork +=
    (s, e) =>
    {
        var watch = Stopwatch.StartNew();
        while (!worker.CancellationPending)
        {
            while (que.Count > 0)
            {
                string data;
                if (que.TryDequeue(out data))
                {
                    if (!String.IsNullOrEmpty(data))
                    {
                        var xGroup = Regex.Match(data, "x: ?([-0-9]*)").Groups[1];
                        int x = int.Parse(xGroup.Value);
                        var yGroup = Regex.Match(data, "y: ?([-0-9]*)").Groups[1];
                        int y = int.Parse(yGroup.Value);
                        var zGroup = Regex.Match(data, "z: ?([-0-9]*)").Groups[1];
                        int z = int.Parse(zGroup.Value);
                        var reading = new Reading()
                        {
                            Time = watch.Elapsed.TotalMilliseconds,
                            X = x,
                            Y = y,
                            Z = z
                        };
                        worker.ReportProgress(0, reading);
                    }
                }
                else break;
            }
            // wait for data or timeout, then check again whether the worker is cancelled.
            lock (locker) Monitor.Wait(locker, 50); // Wait also requires holding the lock
        }
    };

worker.ProgressChanged +=
    (s, e) =>
    {
        var reading = (Reading)e.UserState;
        // We are on the UI Thread....do something with the new reading...
    };

// start everybody.....
worker.RunWorkerAsync();
proc.Start();
proc.BeginOutputReadLine();
You can simply store the points in a list and invoke the dispatcher only when you have collected, e.g., 160 points, so you do not create too many update messages. Currently you are causing a window message every 6 ms, which is far too much. If you update the UI only every second, or every 160 points, things will be much smoother. If the notifications are still too much, look at how you can suspend redrawing your control while you add the batch of data points and resume drawing afterwards, so you do not get heavy flickering.
List<Reading> _Readings = new List<Reading>();
DateTime _LastUpdateTime = DateTime.Now;
TimeSpan _UpdateInterval = new TimeSpan(0, 0, 0, 0, 1 * 1000); // Update every 1 second

private void SortOutputHandler(object sendingProcess, DataReceivedEventArgs outLine)
{
    if (!String.IsNullOrEmpty(outLine.Data))
    {
        var xGroup = Regex.Match(outLine.Data, "x: ?([-0-9]*)").Groups[1];
        int x = int.Parse(xGroup.Value);
        var yGroup = Regex.Match(outLine.Data, "y: ?([-0-9]*)").Groups[1];
        int y = int.Parse(yGroup.Value);
        var zGroup = Regex.Match(outLine.Data, "z: ?([-0-9]*)").Groups[1];
        int z = int.Parse(zGroup.Value);
        Reading reading = new Reading()
        {
            Time = _watch.Elapsed.TotalMilliseconds,
            X = x,
            Y = y,
            Z = z
        };

        // create a batch of readings until it is time to send it to the UI
        // via ONE window message and not hundreds per second.
        _Readings.Add(reading);
        DateTime current = DateTime.Now;
        if (current - _LastUpdateTime > _UpdateInterval) // update ui every second
        {
            _LastUpdateTime = current;
            List<Reading> copy = _Readings; // Get the current buffer and make it invisible to other threads by creating a new list.
                                            // Since this is the only thread that writes to it, this is a safe operation.
            _Readings = new List<Reading>(); // publish a new empty list
            Dispatcher.Invoke(new Action(() =>
            {
                // This is called as part of a window message in the main UI thread
                // once per second now, and not every 6 ms. Now we can update the UI
                // with a batch of 160 points at once.
                // A further optimization would be to disable drawing events
                // while we add the points to the control and enable them after
                // the loop.
                foreach (Reading reading in copy)
                {
                    _readings.Enqueue(reading);
                    _dataPointsCount++;
                }
            }),
            System.Windows.Threading.DispatcherPriority.Normal);
        }
    }
}
