Thread Startup Speed in Delphi - multithreading

I am not sure that this question exclusively pertains to Delphi but that is what I use, so I will refer to that.
I have been told that starting up a new thread, even from a typically implemented threadpool takes about 20 - 40ms. I was referred to the article at https://learn.microsoft.com/en-us/windows/desktop/procthread/multitasking, which basically says that a timeslice in Windows is about 20 ms, so realistically the minimum thread execution time is 20 ms.
I have written the code below, which is very basic. In a VMWare workstation VM that is set up with 2 processors, 1 core per processor, the timing reports about 17 ms to complete.
When I run it on my host machine (an i7-6700), the stopwatch consistently reports 0 ms to complete. I was told that I am just getting "lucky" with the WaitFor on my host machine, and that typically I should expect 20 ms for a single thread. Obviously this would mean that lowering the time of threaded execution below 20 ms is not possible.
Is there any definitive explanation of how long it takes to start a thread?
The code I am using for testing is below.
unit Unit1;
interface
uses
Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, Vcl.Graphics,
Vcl.Controls, Vcl.Forms, Vcl.Dialogs, Vcl.StdCtrls;
type
TForm1 = class(TForm)
Button1: TButton;
Memo1: TMemo;
procedure Button1Click(Sender: TObject);
private
{ Private declarations }
public
{ Public declarations }
end;
TMyThread=class(TThread)
public
Sum:integer;
procedure Execute;override;
end;
var
Form1: TForm1;
implementation
uses
System.Diagnostics;
{$R *.dfm}
procedure TForm1.Button1Click(Sender: TObject);
var
sw:TStopWatch;
thrd: TMyThread;
theSum:integer;
begin
sw:=TStopWatch.StartNew;
thrd:=TMyThread.Create;
thrd.WaitFor;
theSum:=thrd.sum;
thrd.Free;
sw.Stop;
memo1.lines.add('sum: '+theSum.ToString);
memo1.lines.add('elapsed: '+sw.ElapsedMilliseconds.toString);
end;
{ TMyThread }
procedure TMyThread.Execute;
var
cntr: Integer;
begin
inherited;
sum:=0;
for cntr := 0 to 100 do
sum:=sum+cntr;
end;
end.

The fastest speed I was able to get is 14-16 ms with the following code on Win10 x64, i5 6500, Delphi Rio:
unit Unit1;
interface
uses
Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, Vcl.Graphics,
Vcl.Controls, Vcl.Forms, Vcl.Dialogs, Vcl.StdCtrls;
type
TForm1 = class(TForm)
Button1: TButton;
Label1: TLabel;
procedure Button1Click(Sender: TObject);
private
{ Private declarations }
public
{ Public declarations }
end;
TMyThread = class(TThread)
public
procedure Execute; override;
end;
var
Form1: TForm1;
implementation
{$R *.dfm}
procedure TForm1.Button1Click(Sender: TObject);
var
M: TMyThread;
S, L: Int64;
begin
QueryPerformanceCounter(S);
M := TMyThread.Create;
M.WaitFor;
QueryPerformanceCounter(L);
Label1.Caption := IntToStr(L - S);
M.Free;
end;
{ TMyThread }
procedure TMyThread.Execute;
begin
inherited;
end;
end.
It is all about OS time slices. Even on multi-core or hyper-threaded systems with parallel execution, a theoretical near-zero thread start time, and no context switching, your thread may terminate early and you still only get to see the result in the next time slice.
Multiple short tasks can be executed in one time slice in a single thread.
Thread pools are useful for getting an already initialized thread instantly when there are multiple short operations and thread initialization takes some time.
The OS balances slice time between context-switching cost and responsiveness. Even though there are ways to decrease it to 1 ms or even 0.5 ms, if the hardware architecture permits, a shorter slice time is not always better.
Edit: Some technologies, like Intel Hyper-Threading, allow multiple threads to execute on the same core in the same time slice; see comments.

Timing short threaded applications in Delphi can be misleading because most people are used to doing such timings in the IDE. If you are running in the IDE, thread startup is quite slow, about 20 - 40 ms, as described in this post. Possibly this is because, when running in the IDE, text is sent to the "Messages" pane about threads starting and stopping, which may be synchronized with the main IDE thread.
Timing outside the IDE, by myself and others above on many computers, seems to indicate that on Windows thread creation/destruction takes between 0.1 and 0.4 msec, depending on the computer. Of course, test on your own target computers to confirm. Just make sure to test outside the IDE.
Thread pools still do play a role in Delphi. For one, they can very much speed up execution while debugging inside the IDE. Also, depending on the thread pool, I have obtained task startup times of 0.01 msec. Thus, for short tasks that only take 1-2 msec on their own when threaded, the benefit of a shorter task startup time can be meaningful.
However, for most people, whose threads run longer than 20 msec or so, using a thread pool probably provides negligible benefit.
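To check the numbers above on your own machine, a console program run outside the IDE can be used. The following is only a minimal sketch: the names TEmptyThread and Iterations are made up for illustration, and averaging over many create/wait/free cycles is my own choice to smooth out timer resolution, not something taken from the posts above.

```pascal
program ThreadStartupTiming;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Diagnostics, System.TimeSpan;

type
  // Worker with an empty body, so only create/run/join overhead is measured.
  TEmptyThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TEmptyThread.Execute;
begin
  // intentionally empty
end;

const
  Iterations = 1000; // hypothetical sample size

var
  SW: TStopwatch;
  I: Integer;
  T: TEmptyThread;
begin
  SW := TStopwatch.StartNew;
  for I := 1 to Iterations do
  begin
    T := TEmptyThread.Create(False); // not suspended: starts right away
    try
      T.WaitFor;                     // block until Execute has returned
    finally
      T.Free;
    end;
  end;
  SW.Stop;
  // Average cost of one create + run + wait + free cycle.
  Writeln(Format('average per thread: %.3f ms',
    [SW.Elapsed.TotalMilliseconds / Iterations]));
end.
```

Build it as a release executable and start it from Explorer or a command prompt; running it under the debugger will include the IDE overhead described above.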

Related

Unable to enter critical section

Why is it impossible to enter the critical section without Sleep(1)?
type
TMyThread = class(TThread)
public
procedure Execute; override;
end;
var
T: TMyThread;
c: TRTLCriticalSection;
implementation
procedure TForm1.FormCreate(Sender: TObject);
begin
InitializeCriticalSection(c);
T := TMyThread.Create(false);
end;
procedure TMyThread.Execute;
begin
repeat
EnterCriticalSection(c);
Sleep(100);
LeaveCriticalSection(c);
sleep(1); // can't enter from another thread without it
until false;
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
EnterCriticalSection(c);
Caption := 'entered';
LeaveCriticalSection(c);
end;
Oh by the way if the section is created by the thread then it is working fine.
There is no guarantee that threads acquire a critical section on a FIFO basis (MSDN). If your current thread always re-acquires the critical section a few uops after releasing it, then any other waiting threads will likely never wake in time to find it available themselves.
If you want better control of lock sequencing there are other synchronization objects you can use. Events or a queue might be suitable but we don't really know what you are trying to achieve.

Thread.FreeOnTerminate := True, memory leak and ghost running

Years ago, I decided never to rely solely on setting a thread's FreeOnTerminate property to true to be sure of its destruction, because I discovered, or rather reasoned, two things that happen at application termination:
it produces a memory leak, and
after program's termination, the thread is still running somewhere below the keyboard of my notebook.
I familiarized myself with a workaround, and it did not bother me all this time. Until tonight, when again someone (@MartinJames in this case) commented on my answer in which I refer to some code that does not use FreeOnTerminate in combination with premature termination of the thread. I dove back into the RTL code and realized I may have made the wrong assumptions. But I am not quite sure about that either, hence this question.
First, to reproduce the above mentioned statements, this illustrative code is used:
unit Unit3;
interface
uses
Classes, Windows, Messages, Forms;
type
TMyThread = class(TThread)
FForm: TForm;
procedure Progress;
procedure Execute; override;
end;
TMainForm = class(TForm)
procedure FormClick(Sender: TObject);
procedure FormDestroy(Sender: TObject);
private
FThread: TMyThread;
end;
implementation
{$R *.dfm}
{ TMyThread }
procedure TMyThread.Execute;
begin
while not Terminated do
begin
Synchronize(Progress);
Sleep(2000);
end;
end;
procedure TMyThread.Progress;
begin
FForm.Caption := FForm.Caption + '.';
end;
{ TMainForm }
procedure TMainForm.FormClick(Sender: TObject);
begin
FThread := TMyThread.Create(True);
FThread.FForm := Self;
FThread.FreeOnTerminate := True;
FThread.Resume;
end;
procedure TMainForm.FormDestroy(Sender: TObject);
begin
FThread.Terminate;
end;
end.
Now (situation A), if you start the thread with a click on the form and close the form right after the caption has changed, there is a memory leak of 68 bytes. I assume this is because the thread is not freed. Secondly, the program terminates immediately, and the IDE is back in its normal state at that same moment. That is in contrast to situation B: when FreeOnTerminate is not used and the last line of the above code is changed to FThread.Free, it takes (at most) 2 seconds from the disappearance of the program to the normal IDE state.
The delay in situation B is explained by the fact that FThread.Free calls FThread.WaitFor, both of which are executed in the context of the main thread. Further investigation of Classes.pas showed that the destruction of the thread due to FreeOnTerminate is done in the context of the worker thread. This led to the following questions about situation A:
Is there indeed a memory leak? And if so: is it important, could it be ignored? Because when an application terminates, doesn't Windows give back all its reserved resources?
What happens with the thread? Does it indeed run further somewhere in memory until its work is done, or not? And: is it freed, despite the evidence of the memory leak?
Disclaimer: For memory leak detection, I use this very simple unit as first in the project file.
Indeed, the OS reclaims all a process's memory when it terminates, so even if those 68 bytes refer to the non-freed thread object, the OS is going to take those bytes back anyway. It doesn't really matter whether you've freed the object at that point.
When your main program finishes, it eventually reaches a place where it calls ExitProcess. (You should be able to turn on debug DCUs in your project's linker options and step through to that point with the debugger.) That API call does several things, including terminating all other threads. The threads are not notified that they're terminating, so the cleanup code provided by TThread never runs. The OS thread simply ceases to exist.
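If the thread's own cleanup does need to run, the main thread has to wait for the worker before the process exits. The console program below is a rough sketch of that idea, not the code from the question: TWorker and the 50 ms polling interval are assumptions chosen only to keep the example short.

```pascal
program CleanThreadShutdown;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical worker that polls Terminated in short steps.
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  public
    destructor Destroy; override;
  end;

procedure TWorker.Execute;
begin
  while not Terminated do
    Sleep(50); // short sleep so a Terminate request is noticed quickly
end;

destructor TWorker.Destroy;
begin
  inherited; // TThread.Destroy waits for the thread if it is still running
  Writeln('worker cleaned up'); // runs in the main thread, after Execute returned
end;

var
  W: TWorker;
begin
  W := TWorker.Create(False); // FreeOnTerminate stays False: we own the object
  Sleep(200);                 // let the worker run for a moment
  W.Terminate;                // ask Execute to leave its loop
  W.WaitFor;                  // block until it actually has
  W.Free;
  Writeln('exiting');         // only now does the process head for ExitProcess
end.
```

Because the object is freed by the main thread after WaitFor, no leak is reported and no thread is cut off by ExitProcess; the price is the short wait at shutdown described for situation B.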

How to implement thread which periodically checks something using minimal resources?

I would like to have a thread running in background which will check connection to some server with given time interval. For example for every 5 seconds.
I don't know if there is a good "design pattern" for this. If I remember correctly, I've read somewhere that sleeping in a thread's Execute method is not good. But I might be wrong.
Also, I could use normal TThread class or OTL threading library.
Any ideas?
Thanks.
In OmniThreadLibrary, you would do:
uses
OtlTask,
OtlTaskControl;
type
TTimedTask = class(TOmniWorker)
public
procedure Timer1;
end;
var
FTask: IOmniTaskControl;
procedure StartTaskClick;
begin
FTask := CreateTask(TTimedTask.Create())
.SetTimer(1, 5*1000, @TTimedTask.Timer1)
.Run;
end;
procedure StopTaskClick;
begin
FTask.Terminate;
FTask := nil;
end;
procedure TTimedTask.Timer1;
begin
// this is triggered every 5 seconds
end;
As for sleeping in Execute - it depends on how you do it. If you use Sleep, then this might not be very wise (for example, because it would prevent the thread from stopping during the sleep). Sleeping with WaitForSingleObject is fine.
An example of TThread and WaitForSingleObject:
type
TTimedThread = class(TThread)
public
procedure Execute; override;
end;
var
FStopThread: THandle;
FThread: TTimedThread;
procedure StartTaskClick(Sender: TObject);
begin
FStopThread := CreateEvent(nil, false, false, nil);
FThread := TTimedThread.Create;
end;
procedure StopTaskClick(Sender: TObject);
begin
SetEvent(FStopThread);
FThread.Terminate;
FThread.Free;
CloseHandle(FStopThread);
end;
{ TTimedThread }
procedure TTimedThread.Execute;
begin
while WaitForSingleObject(FStopThread, 5*1000) = WAIT_TIMEOUT do begin
// this is triggered every 5 seconds
end;
end;
The OTL timer implementation is similar to the TThread code above. OTL timers are kept in a priority list (basically, the timers are sorted by their "next occurrence" time), and the internal MsgWaitForMultipleObjects dispatcher in TOmniWorker specifies the appropriate timeout value for the highest-priority timer.
You could use an event and implement the Execute method of the TThread descendant by a loop with WaitForSingleObject waiting for the event, specifying the timeout. That way you can wake the thread up immediately when needed, e.g. when terminating.
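A possible shape for this, as a sketch only: the event can live inside the thread class so callers do not need to manage a handle. TPeriodicThread, Stop and the interval parameter are hypothetical names; TEvent comes from System.SyncObjs.

```pascal
program PeriodicCheck;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

type
  // Hypothetical thread that does some work every AIntervalMs milliseconds
  // and can be woken up immediately through its private event.
  TPeriodicThread = class(TThread)
  private
    FWake: TEvent;
    FIntervalMs: Cardinal;
  protected
    procedure Execute; override;
  public
    constructor Create(AIntervalMs: Cardinal);
    destructor Destroy; override;
    procedure Stop;
  end;

constructor TPeriodicThread.Create(AIntervalMs: Cardinal);
begin
  // Set up the fields first so Execute never sees them uninitialized.
  FWake := TEvent.Create(nil, True, False, ''); // manual reset, not signaled
  FIntervalMs := AIntervalMs;
  inherited Create(False);
end;

destructor TPeriodicThread.Destroy;
begin
  Stop;
  inherited; // waits until Execute has returned
  FWake.Free;
end;

procedure TPeriodicThread.Stop;
begin
  Terminate;
  if Assigned(FWake) then
    FWake.SetEvent; // ends the wait in Execute right away
end;

procedure TPeriodicThread.Execute;
begin
  // wrTimeout means the interval elapsed; anything else means "stop".
  while FWake.WaitFor(FIntervalMs) = wrTimeout do
  begin
    // periodic work goes here, e.g. the connection check
  end;
end;

var
  T: TPeriodicThread;
begin
  T := TPeriodicThread.Create(5000);
  Sleep(100);
  T.Free; // returns promptly instead of waiting out the 5 seconds
  Writeln('stopped');
end.
```

The design choice here is that a timeout drives the periodic work while a signaled event means shutdown, so stopping never has to wait for the full interval.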
If the thread runs for the life of the app, can be simply terminated by the OS on app close and does not need accurate timing, why bother with solutions that require more typing than sleep(5000)?
To add another means of achieving a 5-sec event it is possible to use the Multimedia Timer which is similar to TTimer but has no dependence on your application. After configuring it (you can setup one-shot or repetitive) it calls you back in another thread. By its nature it is very accurate (to within better than 1ms). See some sample Delphi code here.
The code to call the timer is simple and it is supported on all Windows platforms.
Use CreateWaitableTimer and SetWaitableTimer
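For illustration, a minimal console sketch of that approach could look as follows. The 5-second period matches the question; the limit of three ticks is arbitrary and only there so the program ends.

```pascal
program WaitableTimerDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

var
  Timer: THandle;
  DueTime: TLargeInteger;
  I: Integer;
begin
  // Auto-reset (synchronization) timer without a name.
  Timer := CreateWaitableTimer(nil, False, nil);
  if Timer = 0 then
    RaiseLastOSError;
  try
    // Negative due time = relative, in 100 ns units: first tick after 5 s.
    DueTime := -50000000;
    // Afterwards the timer fires again every 5000 ms.
    if not SetWaitableTimer(Timer, DueTime, 5000, nil, nil, False) then
      RaiseLastOSError;
    for I := 1 to 3 do
    begin
      if WaitForSingleObject(Timer, INFINITE) = WAIT_OBJECT_0 then
        Writeln('tick ', I); // the periodic check would go here
    end;
    CancelWaitableTimer(Timer);
  finally
    CloseHandle(Timer);
  end;
end.
```

In a real worker thread the wait would typically be a WaitForMultipleObjects call on the timer plus a stop event, so that the thread can still be shut down between ticks.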

Delphi - Updating a global string from a second thread

I am experimenting with multithreading in Delphi (XE) and have run into a problem with the use of a global variable shared between the main VCL thread and a second worker thread.
My project involves a second worker thread that scans through some files and updates a global string variable with the name of the file it is currently on. This variable is then picked up by a timer on the main VCL thread, which updates a status bar.
I have noticed, though, that it occasionally comes up with an 'Invalid Pointer Operation' or 'Out of Memory' error, or the worker thread just stops responding (deadlock, probably).
I therefore created a test app to identify the problem and greatly increase the chance of the error, so I could see what's going on.
type
TSyncThread = class(TThread)
protected
procedure Execute; override;
end;
var
Form11: TForm11;
ProgressString : String;
ProgressCount : Int64;
SyncThread : TSyncThread;
CritSect : TRTLCriticalSection;
implementation
{$R *.dfm}
procedure TForm11.StartButtonClick(Sender: TObject);
begin
Timer1.Enabled := true;
SyncThread := TSyncThread.Create(True);
SyncThread.Start;
end;
procedure TForm11.StopbuttonClick(Sender: TObject);
begin
Timer1.Enabled := false;
SyncThread.Terminate;
end;
procedure TForm11.Timer1Timer(Sender: TObject);
begin
StatusBar1.Panels[0].Text := 'Count: ' + IntToStr(ProgressCount);
StatusBar1.Panels[1].Text := ProgressString;
end;
procedure TSyncThread.Execute;
var
i : Int64;
begin
i := 0;
while not Terminated do begin
inc(i);
EnterCriticalSection(CritSect);
ProgressString := IntToStr(i);
ProgressCount := i;
LeaveCriticalSection(CritSect);
end;
end;
initialization
InitializeCriticalSection(CritSect);
finalization
DeleteCriticalSection(CritSect);
I set the timer interval to 10 ms so that it reads a lot, while the worker thread runs flat out updating the global string variable. Sure enough, this app barely lasts a second before it comes up with the above errors.
My question is: does the read of the global variable in the VCL timer need to run inside a critical section, and if so, why? From my understanding it is only a read, and with the writes already running in a critical section, I cannot see why it runs into a problem. If I put the read in the timer into a critical section as well, it works fine, but I am unhappy just doing that without knowing why!
I am new to multithreading, so I would appreciate any help in explaining why this simple example causes all sorts of problems, and whether there is a better way to access a string from a worker thread.
A Delphi string is allocated on the heap; it is not a static buffer somewhere. The variable itself is just a pointer. When your reading thread accesses a string while that very string is being deallocated by another thread, bad things happen. You are accessing already freed memory, possibly allocated again for something else, etc.
Even if this String was a static buffer, update operations are not atomic, therefore you could be using a corrupted string that is being updated at this very moment (half new data and half old).
So you need to protect your reading operations with the same critical section you used around the writing operations.
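As a sketch of what that looks like in practice, the reader takes the same lock, copies the string into a local, and only then uses it. This is a console stand-in for the timer handler in the question; TWriter, ReadProgress and the use of TCriticalSection from System.SyncObjs (instead of the raw TRTLCriticalSection) are my own choices for brevity.

```pascal
program SharedStringDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

var
  Lock: TCriticalSection;
  ProgressString: string; // shared between writer and reader

type
  TWriter = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWriter.Execute;
var
  I: Int64;
begin
  I := 0;
  while not Terminated do
  begin
    Inc(I);
    Lock.Enter;
    try
      ProgressString := IntToStr(I); // frees the old string, stores a new one
    finally
      Lock.Leave;
    end;
  end;
end;

// Reader side: take the same lock and grab a reference to the current value.
function ReadProgress: string;
begin
  Lock.Enter;
  try
    Result := ProgressString; // reference is taken while the writer is locked out
  finally
    Lock.Leave;
  end;
end;

var
  W: TWriter;
  N: Integer;
begin
  Lock := TCriticalSection.Create;
  W := TWriter.Create(False);
  try
    for N := 1 to 5 do
    begin
      Sleep(10);
      Writeln(ReadProgress); // use the local copy outside the lock
    end;
  finally
    W.Free;   // terminates the writer and waits for it
    Lock.Free;
  end;
end.
```

In the VCL version, the timer handler would call something like ReadProgress and assign the result to the status bar panel, keeping the UI update itself outside the lock.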

Delphi 2010: No thread vs threads - TSQLConnection and TSQLDataSet

My previous question
From the above answer, does this mean that if my threads create objects, I will run into the memory allocation/deallocation bottleneck?
I have a case where I need to create a TSQLConnection and a TSQLDataSet to query data from 5 tables of the database; each table has more than 10000 records. So I create 5 threads, each of which accepts a table name as a parameter via its constructor. Unfortunately, I cannot see any obvious difference in the time taken. I've written the following code:
TMyThread = class(TThread)
private
FTableName: string;
protected
procedure Execute; override;
public
constructor Create(const CreateSuspended: Boolean; const aTableName: string);
reintroduce; overload;
end;
constructor TMyThread.Create(const CreateSuspended: Boolean; const aTableName: string);
begin
inherited Create(CreateSuspended);
FTableName := aTableName;
FreeOnTerminate := True;
end;
procedure TMyThread.Execute;
var C: TSQLConnection;
D: TDataSet;
begin
C := NewSQLConnection;
try
D := NewSQLDataSet(C, FTableName);
try
D.Open;
while not D.Eof do begin
// Do something
D.Next;
end;
finally
D.Free;
end;
finally
C.Free;
end;
end;
function NewSQLConnection: TSQLConnection;
begin
Result := TSQLConnection.Create(nil);
// Setup TSQLConnection
end;
function NewSQLDataSet(const aConn: TSQLConnection; const aTableName: string):
TSQLDataSet;
begin
Result := TSQLDataSet.Create(aConn);
Result.CommandText := Format('SELECT * FROM %s', [aTableName]);
Result.SQLConnection := aConn;
end;
Is there any advice or recommendation for this case?
The first thing I would do is look at the CPU time. Where is the bottleneck? If your application already uses 100% CPU in single-threaded mode, then multithreading will not help (except on a dual/quad core). But if your app is not using much CPU in multi-threaded mode, you could have another bottleneck: for example, maybe the DB server is using 100% CPU? Or your server HD is slow? Or the network is slow? Or maybe even the DBExpress database driver is not multi-threaded, so it only uses 1 thread at a time (you will notice low CPU on client and server on multicore).
By the way: yes, you should always try to minimize memory allocations. FastMM is not the best memory manager for heavily multithreaded code: it does not scale very well on multicore (you will never see 100% CPU in memory-heavy apps). But it is nothing to worry about in normal apps. Only if this is the case could you try TopMM: somewhat slower(?), but it scales much better:
http://www.topsoftwaresite.nl/Downloads/TopMemory.pdf
Aha! You said you use SuperServer?
Firebird 1.5 Classic Server vs. Superserver
SuperServer:
No SMP support. On multi-processor Windows machines, performance can even drop dramatically as the OS switches the process between CPUs. To prevent this, set the CpuAffinityMask parameter in the configuration file firebird.conf.
So it uses only 1 core. You should use Classic Server (process per connection). Firebird has no true SMP support yet, will be in 3.0 version:
http://tracker.firebirdsql.org/browse/CORE-775
Actually I would like to see a CPU GRAPH with Process Explorer, like step 4 of this page:
http://www.brightrev.com/how-to/windows/53-five-uses-for-sysinternals-process-explorer.html
Because it shows CPU + memory + IO(!) over time. Please post one for both the test app and Firebird.
I think firebird is bound by IO (your HD), because in your first CPU screenshot, both have low CPU. Maybe you can increase firebird memory cache?
Btw: how many cores/cpu's do you have?

Resources