Thread safety read InterlockedIncremented value

Thread safety read InterlockedIncremented value - multithreading

In my application several threads increment some counter and only one reads this value (main thread). As far I know, reading 32-bit value is thread-safe if it is aligned by double word, so I use such code:
{$A8}
TMyStat = class
private
FCounter: Integer;
public
procedure IncCounter;
property Counter: Integer read FCounter;
...
procedure TMyStat.IncCounter;
begin
InterlockedIncrement(FCounter);
end;
But I'm not sure that it's safe to mix Interlocked functions and direct access to value.
Should I use InterlockedCompareExchange instead?
function TMyStat.GetCounter: Integer;
begin
Result := InterlockedCompareExchange(FCounter, 0, 0);
end;

You can use a normal read. As FCounter is 4-aligned, reading and writing is guaranteed to be atomic.
[This holds for Intel platforms, though. I really don't know how ARM behaves. I would guess that the behaviour is the same (reading of aligned value is atomic).]
Actually, if you only increment a counter and read it, you don't even need InterlockedIncrement. When you read the value you'll always get either the pre-increment or post-increment value. There's no way you get a mix of both.

Reading an aligned 32-bit value is atomic (i.e. all 32 bits are guaranteed to be consistent). But the read is not synchronized. You may not read the "current" value; but instead a value from the caches.
For example:
FCounter := 1; //number of threads running
FResult := 0; //the answer to everything
And then your thread:
procedure ThreadProc;
begin
FResult := 42; //set the answer before we indicate we're done
InterlockedDecrement(FCounter);
end;
And then your code checks that the thread is done:
if (FCounter <= 0) then
begin
//Thread is done; read the answer
ShowMessage('The answer is: '+IntToStr(FResult));
Exit;
end;
The answer is: 0
Even though your flag said the thread set the result, you read the result value out of your local cache.
Even though the read of FCounter and FResult is atomic
they aren't synchronized
If you are only using FCounter then you are fine. But as soon as you use anything else and expect them to be consistent, then you need to also use something to enforce synchronization.
The strict answer to your question is that reading a 32-bit aligned value is already an atomic operation.
But it's entirely possible that people coming here to read this question might forget that the read being atomic is a small part of your worries.

Related

System.Move and Array of String

I am trying to move some array elements (of string) to some other position.
When I use System.Move(), FastMM4 reports leaks.
Here is a small snippet to show the problem:
procedure TForm1.Button2Click(Sender: TObject);
type
TArrayOfStr = array of string;
const
Count1 = 100;
String1 = 'some string '; {space at end}
var
Array1: TArrayOfStr;
Index1: Integer;
begin
SetLength(Array1, Count1);
Index1 := 0;
while Index1 < Count1 do begin
Array1[Index1] := String1 + IntToStr(Index1);
Inc(Index1);
end;
System.Move(Array1[0], Array1[3], 2 * SizeOf(string)); {move 2 cells from cell 0 to cell 3}
ShowMessage(Array1[3]);
end;
It probably has something to do with SizeOf(String) but I don't know what.
Could someone help me make the leaks go away?

Issues
The problem you are having has to do with the reference counting of the string.
Leaks
If there is already a string in the areas you're overwriting these strings will not get freed. These are the leaks you are reporting.
Potential access violations
You copy the string pointers, but you do not increase the reference count of the string. This will lead to access violations if the original strings ever get destroyed due to going out of scope. This is a very subtle bug and will bite you when you least expect it.
Best solution
It's far simpler to just let Delphi do the copying and then all internal bookkeeping will get done properly.
{move 6 from cell 1 to cell 3}
System.Move(Array1[0], Array1[3], 2 * SizeOf(string));
//This does not increase the reference count for the string;
//leading to problems at cleanup.
Array1[3]:= Array1[0];
Array1[4]:= Array1[1]; //in a loop obviously :-)
//this increases the reference count of the string.
Note that Delphi does not copy the strings, it just copies the pointers and increases the ref counts as needed. It also frees any strings as needed.
Hack solution
You should manually clear the area first.
Using
for i:= start to finish do Array1[i]:= '';
The next part of this solution horrible hack is to manually increase the ref counts on the strings you've copied.
See: http://docwiki.embarcadero.com/RADStudio/Seattle/en/Internal_Data_Formats#Long_String_Types
procedure IncreaseRefCount(const str: string; HowMuch: integer = 1);
var
Hack: PInteger;
begin
Hack:= pointer(str);
Dec(Hack,2); //get ref count
Hack^:= AtomicIncrement(Hack^,HowMuch);
end;
System.Move(Array1[0], Array1[3], 2 * SizeOf(string));
IncreaseRefCount(Array1[3]);
.... do this for every copied item.
Note that this hack is not completely thread safe if you get the strings from somewhere outside your array.
However if you are really really in need of speed it might be a solution to gain a wooping 2% in the performance of the copy.
Warning
Don't use this code to decrease ref counts manually, you'll run into thread-safety issues!
Need for speed
It is unlikely that simply coping a few strings leads to slowness.
There is no way to get out of the clean up issues if you insist on using managed strings.
On the other hand the overhead of the reference counting is really not that bad, so I suspect the reason for the slowness lies elsewhere; somewhere we can't see because you haven't told us your problem.
I suggest you ask a new question explaining what you're trying to do, why and where the slowness is hurting you.

The String type in Delphi is a managed type. Delphi keeps record of references and dereferences and automatically releases memory allocated to the string when it's no longer being referenced.
The reason for the leak is that you are bypassing Delphi's management of the string type. You are simply overwriting the pointer that references the 4th string in the array. (and for that matter the 5th as well because of 2 * ...) So you now have strings in memory that are no longer referenced. But Delphi doesn't realise this and cannot free the memory.
Solution: write Array1[3] := Array1[0];
Edit:
I've fully answered your question; giving you everything you need to: 1) Understand why you've got the memory leaks. 2) And make the leaks go away.
Yet you're not satisfied... In comments you've explained that you're trying to improve a phantom performance problem you've conjured up via an artificial benchmark.
It's been explained to you that the only reason Move is a little faster than normal string assignment is that: string assignment needs to do additional record keeping to prevent memory leaks and access violations.
If you insist on using Move, you'll need to do said record keeping yourself. (Johan has even demonstrated how.)
And then you complain that this will slow down your Move "solution".
Seriously, take your pick: If you want to use string, you can have it a little faster with AV's and memory leaks OR a little slower but behaving correctly. There's no magic wand waving that's going fix it for you.
You could choose to abandon the string type (and all the goodness it gives you). I.e. Use array of char instead. Of course, you'll have to do all the memory management yourself, and see how far multi-byte string copying helps you.
I still maintain that if you ask a new question demonstrating a specific performance problem you're trying to solve, you'll get much better feedback.
E.g. In comments you've mentioned that you're trying to improve the performance of TStringList.
I have previously encountered a performance problem with TStringList in older versions of Delphi. It's generally fine even working with hundreds of thousands of items. However, even with only 10,000 strings, CommaText was noticeably slow; and at 40,000 it was almost unbearable.
The solution didn't involve trying to bastardise reference counting: because that's not the reason it was slow. It was slow because the algorithm was a little naive and performed a huge number of incremental memory allocations. Writing a custom CommaText method solved it.

I'm adding a fundamentally different answer in which I present an option how you can do something you really and truly should not be doing at all.
Disclaimer: What I describe here is terrible advice and should not be done (but it would work). Feel free to learn from it, but don't use it.
As has already been discussed ad nauseam, your problem is that when you use Move you bypass reference counting.
The solution involves making "internal" reference counting unnecessary by working on the principal that no matter how many times a string is internally referenced, you'll hold only a single count to the string.
And you'll only remove that single count when you're certain you have no more "internal" references to the string.
Unfortunately, having abandoned internal reference counting, the only time you can be sure of this is when you've completely finished with your internal array.
At this point you must first manually clear the internal array.
Then reduce the reference count of all strings you had previously used by 1.
This implies you need a separate "master reference" to each string.
Warning
There are 2 immediate problems:
You need a second list to track the master reference count; wasting memory.
You cannot recover memory used by strings that are no longer referenced internally until you've finished with the internal array entirely because you've abandoned your ability to track internal reference counts.
(Technically this is still a memory leak, albeit controlled. And FastMM won't report it provided you cleanup correctly.)
Without further ado, some sample code:
//NOTE: Deliberate use of fixed size array instead of dynamic to
//avoid further complications. See Yet Another Warning after code
//for explanation and resolution.
TStringArray100 = array[0..99] of string;
TBadStrings = class(TObject)
private
FMasterStrings: TStrings;
FInternalStrings: TStringArray100;
public
...
end;
constructor TBadStrings.Create()
begin
FMasterStings := TStringList.Create;
end;
destructor TBadStrings.Destroy;
begin
Clear;
FMasterStrings.Free;
inherited;
end;
procedure TBadStrings.Clear;
begin
for I := 0 to 99 do
Pointer(FInternalStrings[I]) := 0;
//Should optimise to equivalent of
//Move(0, FInternalStrings[0], 100 * SizeOf(String));
//NOTE: Only now is it safe to clear the master list.
FMasterStings.Clear;
end;
procedure TBadStrings.SetString(APos: Integer; AString: string);
begin
FMasterStrings.Add(AString); //Hold single reference count
//Bypass reference counting to assign the string internally
//Equivalent to Move
Pointer(FInternalStrings[APos]) := Pointer(AString);
end;
//Usage
begin
LBadStrings := TBadStrings.Create;
try
for I := 0 to 199 do
begin
//NOTE 0 to 99 are set, then all overwritten with 100 to 199
//However strings 0 to 99 are still in memory...
LBadStrings.SetString(I mod 100, 'String ' + IntToStr(I));
end;
finally
//...until cleanup.
LBadStrings.Free;
end;
end;
NOTE: You can add methods to do whatever you like using Move on FInternalStrings. It won't matter that those references aren't being tracked because the master reference can perform correct cleanup at the end. However....
WARNING: Anything you do to FInternalStrings MUST also bypass reference counting otherwise you'll have nasty side-effects. So it should go without saying you need to solidly guard access to the internal array. If client code gets direct access to the array, you can expect accidental 'abuse'.
Yet Another Warning: As commented in code this uses a fixed size array to avoid other problems. Said problems are that if you use a dynamic array, then resizing the array can apply reference counting. Increasing the array size shouldn't be a problem (I recall that being a pointer copy). However, when the size is decreased, elements that are discarded will be dereferenced as necessary. This means you'll have to take the precaution of first pointer nilling these elements before shrinking the array.
The above is a way you can bypass string reference counting in a controlled fashion. But let me reiterate what a terrible idea it is.
You should have no trouble concocting an artificial benchmark to demonstrate that it's faster. However, I seriously doubt it will provide any benefit in a real-world environment.
In fact, if it does you probably have a different problem entirely; because why on earth would be shuffling the strings around so much that time spent there overshadows other aspects of your application?

procedure TamListVar<T>.Insert(aIndex: Integer; aItem: T);
begin
InitCheck;
if not IsIndex(aIndex) then Add(aItem)
else
begin
Setlength(Items,FCount+1);
System.Move(Items[aIndex], Items[aIndex + 1], (FCount - aIndex) * SizeOf(Items[aIndex]));
PPointer(#Items[aIndex])^:=nil;
Items[aIndex]:= aItem;
inc(FCount);
end;
end;

Here is the solution I have come up with. It is heavily inspired by System.Move().
As far as I could see, after doing a number of tests, it seems to work OK --no leaks reported by FastMM4.
Obvioulsy, this is not the hand-optimized asm routine I was after; but given my (lack of) talents in asm area, this has to do for now.
I'd be most appreciative if you commented on this --especially to point out any pitfalls, as well as any other (e.g. speed) improvements.
{ACount refers to the number of actual array elements (cells of strings), not their byte count.}
procedure MoveString(const ASource; var ATarget; ACount: NativeInt);
type
PString = ^string;
const
SzString = SizeOf(string);
var
Source1: PString;
Target1: PString;
begin
Source1 := PString(#ASource);
Target1 := PString(#ATarget);
if Source1 = Target1 then Exit;
while ACount > 0 do begin
Target1^ := Source1^;
//Source1^ := ''; {enable if you want to avoid duplicates}
Inc(Source1);
Inc(Target1);
Dec(ACount);
end;
end;
procedure TForm1.Button2Click(Sender: TObject);
type
TArrayOfStr = array of string;
const
Count1 = 100;
String1 = 'some string '; {space at end}
var
Array1: TArrayOfStr;
Index1: Integer;
begin
SetLength(Array1, Count1);
Index1 := 0;
while Index1 < Count1 do begin
Array1[Index1] := String1 + IntToStr(Index1);
Inc(Index1);
end;
MoveString(Array1[0], Array1[3], 2); {move 2 cells from cell 0 to cell 3}
ShowMessage(Array1[3]); {should be 'some string 0'}
MoveString(Array1[3], Array1[0], 2); {move 2 cells from cell 3 to cell 0}
ShowMessage(Array1[0]); {should be 'some string 0'}
end;

How can I use a Dictionary/StringList inside Execute of a Thread - delphi

I have a thread class TValidateInvoiceThread:
type
TValidateInvoiceThread = class(TThread)
private
FData: TValidationData;
FInvoice: TInvoice; // Do NOT free
FPreProcessing: Boolean;
procedure ValidateInvoice;
protected
procedure Execute; override;
public
constructor Create(const objData: TValidationData; const bPreProcessing: Boolean);
destructor Destroy; override;
end;
constructor TValidateInvoiceThread.Create(const objData: TValidationData;
const bPreProcessing: Boolean);
var
objValidatorCache: TValidationCache;
begin
inherited Create(False);
FData := objData;
objValidatorCache := FData.Caches.Items['TInvAccountValidator'];
end;
destructor TValidateInvoiceThread.Destroy;
begin
FreeAndNil(FData);
inherited;
end;
procedure TValidateInvoiceThread.Execute;
begin
inherited;
ValidateInvoice;
end;
procedure TValidateInvoiceThread.ValidateInvoice;
var
objValidatorCache: TValidationCache;
begin
objValidatorCache := FData.Caches.Items['TInvAccountValidator'];
end;
I create this thread in another class
procedure TInvValidators.ValidateInvoiceUsingThread(
const nThreadIndex: Integer;
const objValidatorCaches: TObjectDictionary<String, TValidationCache>;
const nInvoiceIndex: Integer; const bUseThread, bPreProcessing: Boolean);
begin
objValidationData := TValidationData.Create(FConnection, FAllInvoices, FAllInvoices[nInvoiceIndex], bUseThread);
objValidationData.Caches := objValidatorCaches;
objThread := TValidateInvoiceThread.Create(objValidationData, bPreProcessing);
FThreadArray[nThreadIndex] := objThread;
FHandleArray[nThreadIndex]:= FThreadArray[nThreadIndex].Handle;
end;
Then I execute it
rWait:= WaitForMultipleObjects(FThreadsRunning, #FHandleArray, True, 100);
Note I have removed some code out of here to try to keep it a bit simpler to follow
The problem is that my Dictionary is becoming corrupt
If I put a breakpoint in the constructor all is fine
However, in the first line of the Execute method, the dictionary is now corrupt.
The dictionary itself is a global variable to the class
Do I need to do anything special to allow me to use Dictionaries inside a thread?
I have also had the same problem with a String List
Edit - additional information as requested
TInvValidators contains my dictionary
TInvValidators = class(TSTCListBase)
private
FThreadArray : Array[1..nMaxThreads] of TValidateInvoiceThread;
FHandleArray : Array[1..nMaxThreads] of THandle;
FThreadsRunning: Integer; // total number of supposedly running threads
FValidationList: TObjectDictionary<String, TObject>;
end;
procedure TInvValidators.Validate(
const Phase: TValidationPhase;
const objInvoices: TInvoices;
const ReValidate: TRevalidateInvoices;
const IDs: TList<Integer>;
const objConnection: TSTCConnection;
const ValidatorCount: Integer);
var
InvoiceIndex: Integer;
i : Integer;
rWait : Cardinal;
Flags: DWORD; // dummy variable used in a call to find out if a thread handle is valid
nThreadIndex: Integer;
procedure ValidateInvoiceRange(const nStartInvoiceID, nEndInvoiceID: Integer);
var
InvoiceIndex: Integer;
I: Integer;
begin
nThreadIndex := 1;
for InvoiceIndex := nStartInvoiceID - 1 to nEndInvoiceID - 1 do
begin
if InvoiceIndex >= objInvoices.Count then
Break;
objInvoice := objInvoices[InvoiceIndex];
ValidateInvoiceUsingThread(nThreadIndex, FValidatorCaches, InvoiceIndex, bUseThread, False);
Inc(nThreadIndex);
if nThreadIndex > nMaxThreads then
Break;
end;
FThreadsRunning := nMaxThreads;
repeat
rWait:= WaitForMultipleObjects(FThreadsRunning, #FHandleArray, True, 100);
case rWait of
// one of the threads satisfied the wait, remove its handle
WAIT_OBJECT_0..WAIT_OBJECT_0 + nMaxThreads - 1: RemoveHandle(rWait + 1);
// at least one handle has become invalid outside the wait call,
// or more than one thread finished during the previous wait,
// find and remove them
WAIT_FAILED:
begin
if GetLastError = ERROR_INVALID_HANDLE then
begin
for i := FThreadsRunning downto 1 do
if not GetHandleInformation(FHandleArray[i], Flags) then // is handle valid?
RemoveHandle(i);
end
else
// the wait failed because of something other than an invalid handle
RaiseLastOSError;
end;
// all remaining threads continue running, process messages and loop.
// don't process messages if the wait returned WAIT_FAILED since we didn't wait at all
// likewise WAIT_OBJECT_... may return soon
WAIT_TIMEOUT: Application.ProcessMessages;
end;
until FThreadsRunning = 0; // no more valid thread handles, we're done
end;
begin
try
FValidatorCaches := TObjectDictionary<String, TValidationCache>.Create([doOwnsValues]);
for nValidatorIndex := 0 to Count - 1 do
begin
objValidator := Items[nValidatorIndex];
objCache := TValidationCache.Create(objInvoices);
FValidatorCaches.Add(objValidator.ClassName, objCache);
objValidator.PrepareCache(objCache, FConnection, objInvoices[0].UtilityType);
end;
nStart := 1;
nEnd := nMaxThreads;
while nStart <= objInvoices.Count do
begin
ValidateInvoiceRange(nStart, nEnd);
Inc(nStart, nMaxThreads);
Inc(nEnd, nMaxThreads);
end;
finally
FreeAndNil(FMeterDetailCache);
end;
end;
If I remove the repeat until and leave just WaitForMultipleObjects I still get lots of errors
You can see here that I am processing the invoices in chunks of no more than nMaxThreads (10)
When I reinstated the repeat until loop it worked on my VM but then access violated on my host machine (which has more memory available)
Paul

Before I offer guidance on how to resolve your problem, I'm going to give you a very important tip.
First ensure your code works single-threaded, before trying to get a multi-threaded implementation working. The point is that multi-threaded code adds a whole new layer of complexity. Until your code works correctly in a single thread, it has no chance of doing so in multiple threads. And the extra layer of complexity makes it extremely difficult to fix.
You might believe you've got a working single-threaded solution, but I'm seeing errors in your code that imply you still have a lot of resource management bugs. Here's one example with relevant lines only, and comments to explain the mistakes:
begin
try //try/finally is used for resource protection, in order to protect a
//resource correctly, it should be allocated **before** the try.
FValidatorCaches := TObjectDictionary<String, TValidationCache>.Create([doOwnsValues]);
finally
//However, in the finally you're destroying something completely
//different. In fact, there are no other references to FMeterDetailCache
//anywhere else in the code you've shown. This strongly implies an
//error in your resource protection.
FreeAndNil(FMeterDetailCache);
end;
end;
Reasons for not being able to use the dictionary
You say that: "in the first line of the Execute method, the dictionary is now corrupt".
For a start, I'm fairly certain that your dictionary isn't really "corrupt". The word "corrupt" implies that it's there, but its internal data is invalid resulting in inconsistent behaviour. It's far more likely that by the time the Execute method wants to use the dictionary, it has already been destroyed. So your thread is basically pointing to an area of memory that used to have a dictionary, but it's no longer there at all. (I.e. not "corrupt")
SIDE NOTE It is possible for your dictionary to truly become corrupt because you have multiple threads sharing the same dictionary. If different threads cause any internal changes to the dictionary at the same time, it could very easily become corrupt. But, assuming your threads are all treating the dictionary as read-only, you would need a memory overwrite to corrupt it.
So let's focus on what might cause your dictionary to be destroyed before the thread gets to use it. NOTE I can't see anything in the code provided, but there are 2 likely possibilities:
Your main thread destroys the dictionary before the child thread gets to use it.
One of your child threads destroys the dictionary as soon as it is destroyed resulting in all other threads being unable to use it.
In the first case, this would happen as follows:
Main Thread: ......C......D........
Child Thread ---------S......
. = code being executed
C = child thread created
- = child thread exists, but isn't doing anything yet
S = OS has started the child thread
D = main thread destroys dictionary
The point of the above is that it's easy to forget that the main thread can reach a point where it decides to destroy the dictionary even before the child thread starts running.
As for the second possibility, this depends on what is happening inside the destructor of TValidationData. Since you haven't shown that code, only you know the answer to that.
Debugging to pinpoint the problem
Assuming the dictionary is being destroyed too soon, a little debugging can quickly pinpoint where/why the dictionary is being destroyed. From your question, it seems you've already done some debugging, so I'm assuming you'll have no trouble following these steps:
Put a breakpoint on the first line of the dictionary's destructor.
Run your code.
If you reach Execute before reaching the dictionary's destructor, then the thread should still be able to use the dictionary.
If you reach the dictionary's destructor before reaching Execute, then you simply need to examine the sequence of calls leading to the object's destruction.
Debugging in case of a memory overwrite
Keeping an open mind about the possibility of a memory overwrite... This is a little trickier to debug. But provided you can consistently reproduce the problem it should be possible to debug.
Put a breakpoint in the thread's destructor.
Run the app
When you reach the above breakpoint, find the address of the of the dictionary by pressing Ctrl + F7 and evaluating #FData.Caches.
Now add a Data Breakpoint (use the drop-down from the Breakpoints window) for the address and the size of the dictionary.
Continue running, the app will pause when the the data changes.
Again, examine the call-stack to determine the cause.
Wrapping up
You have a number of questions and statements that imply misunderstandings about sharing data (dictionary/string list) between threads. I'll try cover those here.
There is nothing special required to use a Dictionary/StringList in a thread. It's basically the same as passing it to any other object. Just make sure the Dictionary/StringList isn't destroyed prematurely.
That said, whenever you share data, you need to be aware of the possibility of "race conditions". I.e. one thread attempts to access the shared data at the same time another thread is busy modifying it. If no threads are modifying the data, then there's no need for concern. But as soon as any thread is able to modify the data, the access needs to be made "thread-safe". (There are a number of ways to do this, please search for existing questions on SO.)
You mention: "The dictionary itself is a global variable to the class". Your terminology is not correct. A global variable is something declared at the unit level and is accessible anywhere. It's enough to simply say the dictionary is a member of or field of the class. When dealing with "globals", there are significantly different things to worry about; so best to avoid any confusion.
You may want to rethink how you initialise your threads. There are a few reasons you some entries of FHandleArray won't be initialised. Are you ok with this?
You mention AV on a machine that has more memory available. NOTE: Amount of memory is not relevant. And if you run in 32-bit mode you wouldn't have access to more than 4 GB in any case.
Finally, to make a special mention:
Using multiple threads to perform dictionary lookups is extremely inefficient. A dictionary lookup is an O(1) operation. The overhead of threading will almost certainly slow you down unless you intend doing a significant amount of processing in addition to the dictionary lookup.
PS - (not so big) mistake
I noticed the following in your code, and it's a mistake.
procedure TValidateInvoiceThread.Execute;
begin
inherited;
ValidateInvoice;
end;
The TThread.Execute method is abstract, meaning there's no implementation. Attempting to call an abstract method will trigger an EAbstractError. Luckily as LU RD points out, the compiler is able to protect you by not compiling the line in. Even so, it would be more accurate to not call inherited here.
NOTE: In general, overridden methods don't always need to call inherited. You should be explicitly aware of what inherited is doing for you and decide whether to call it on a case-by-case basis. Don't go into auto-pilot mode of calling inherited just because you're overriding a virtual method.

Threads complete but loop doesn't end

I wrote some threading code with what seems to be an incorrect assumption, that integers were thread-safe. Now it seems that although they are, my use of them is NOT thread safe. I'm using a global integer ThreadCount to hold the number of threads. During thread create, I increment the ThreadCount. During thread destroy, I decrement it. After all threads are done being created, I wait for them to be done (ThreadCount should drop to 0) and then write my final report and exit.
Sometimes (5%) though, I never get to 0, even though a post-mortem examination of my log shows that all threads did run and complete. So all signs point to ThreadCount getting trampled. I have been telling myself that this is impossible since it's an integer, and I'm just using inc/dec.
Here some relevant code:
var // global
ThreadCount : integer; // Number of active threads
...
constructor TTiesUpsertThread.Create(const CmdStr : string);
begin
inherited create(false);
Self.FreeOnTerminate := true;
...
Inc(ThreadCount); // Number of threads created. Used for throttling.
end;
destructor TTiesUpsertThread.Destroy;
begin
inherited destroy;
Dec(ThreadCount); // When it reaches 0, the overall job is done.
end;
...
//down at the end of the main routine:
while (ThreadCount > 0) do // Sometimes this doesn't ever end.
begin
SpinWheels('.'); // sleeps for 1000ms and writes dots... to console
end;
I THINK my problem is with inc/dec. I think I'm getting collisions where an two or more dec() hit at the same time and both read the same value, so they replace it with the same value. ex: ThreadCount = 5, and two threads end at the same time, both read 5, replace with 4. But the new value should be 3.
This never runs into trouble in our test environment, (different hardware, topology, load, etc..) so I'm looking for confirmation that this is likely the problem, before I try to "sell" this solution to the business unit.
If this is my problem, do I use a critical selection to protect the inc/dec?
Thanks for taking a look.

If multiple threads modify the variable without protection then yes you have a data race. If two threads attempt to increment or decrement at the same instance then what happens is:
The variable is read into a register.
The modification is made in the register.
The new value is written back to the variable.
That read/modify/write is not atomic. If you have two threads executing at the same time then you have the canonical data race.
Thread 1 reads the value, N say.
Thread 2 reads the value, the same value as was read by thread 1, N.
Thread 1 writes N+1 to the variable.
Thread 2 writes N+1 to the variable.
And instead of the variable being incremented twice, it is incremented only once.
In this case there's no need for a full blown critical section. Use InterlockedIncrement to perform lock-free, thread safe modification.

The best practice on how to wait for data within thread and write it into file?

I am programming multi-threaded app. I have two threads. On is for transferring some data from device to global data buffer and second is for writing those data to file.
Data from device to buffer is transferring asynchronously. The purpose of second thread should be to wait for specified amount of data to be written in main data buffer and finally to write it to file.
Well the first thread is in DLL and second one is in main app. Temporarily I solve this with events. First thread transfers data from device to main data buffer and counts data and when specified amount of data is transferred it sets an event. The second one waits for event to be signalled and when it is it runs some code for data store. Simple as that it is working.
Thread1.Execute:
var DataCount, TransferedData: Integer;
DataCounter := 0;
while not Terminted do
begin
TransferData(#pData, TransferedData);
Inc(DataCounter, TransferedData)
if DataCounter >= DataCountToNotify then SetEvent(hDataCount);
end;
Thread2.Execute:
hndlArr[0] := hDataCount;
hndlArr[1] := hTerminateEvent;
while (not Terminated) do
begin
wRes := WaitForMultipleObjects(HandlesCount, Addr(hndlArr), false, 60000);
case wRes of
WAIT_OBJECT_0:
begin
Synchronize(WriteBuffer); // call it from main thread
ResetEvent(hndlArr[0]);
end;
WAIT_OBJECT_0 + 1:
begin
ResetEvent(hTerminateEvent);
break;
end;
WAIT_TIMEOUT: Break;
end;
end;
Now I would like to make second thread more independent... so I can make multiple instances of second thread and I don't need to wait for first thread. I would like to move data counting part of code from first thread to second one so I won't need data counting in first thread anymore. First one will be just for data transfer purpose.
I would like to use second one as data counter and to store data. But now I will have to loop and constantly check manually for specified data amount. If I had while loop I will have to add some sleep so second thread will not decrease computer performances but I don't know how long should sleep be while speed of data transfer in firts thread is not constant and thus speed of counting in second thread will vary.
My guess that this code sample is not good:
Thread2.Execute:
var DataCount: Integer;
DataIdx1 := GetCurrentDataIdx;
while (not Terminated) do
begin
if (GetCurrentDataIdx - DataIdx1) >= DataCountToNotify then
begin
Synchronize(WriteBuffer);
DataIdx1 := GetCurrentIdx;
end;
sleep(???);
end;
So my question is what is the best approach to solve that issue with data counting and storing it within second thread? What are your experiences and suggestions?

You have some issues. #LU RD has already pointed one out - don't synchronize stuff that does not need to be synchronized. It's not clear what 'WriteBuffer' does, but the file system and all databases I have used are just fine to have one thread opening a file/table and writing to them.
Your buffer system could probably do with some attention. Is there some 'specified data amount' or is this some notional figure that allows lazy writing?
Typically, producer and consumer threads exchange multiple buffer pointers on queues and so avoid sharing any single buffer. Given that this is a DLL and so memory-management calls can be problematic, I would probably avoid them as well by creating a pool of buffers at startup to transfer data round the system. I would use a buffer class rather than just pointers to memory, but it's not absolutely required, (just much more easy/flexible/safe).
Sleep() loops are a spectacularly bad way of communicating between threads. Sleep() does hae its uses, but this is not one of them. Delphi/Windows has plenty of synchronization mechanisms - events, semaphores, mutexes etc - tha make such polling unnecessary.
LU RD has also mentioned the problems of parallel processing of data whose order must be preserved. This often requires yet another thread, a list-style collection and sequence-numbers. I wouldn't try that until you have the inter-thread comms working well.

If you want to avoid the Sleep() call in your second thread, use a waitable timer like TSimpleEvent.
Set the sleep time to handle all your timing conditions.
There is an advantage of using this scheme instead of a normal Sleep(), since a waitable timer will not put the thread into a deep sleep.
To dispose the thread, see comments in code.
var
FEvent: TSimpleEvent;
FSleepTime: Integer = 100; // Short enough to handle all cases
Constructor TThread2.Create;
begin
Inherited Create( False);
FEvent := TSimpleEvent.Create;
Self.FreeOnTerminate := True;
end;
procedure TThread2.Execute;
var
DataCount: Integer;
begin
DataIdx1 := GetCurrentDataIdx;
while (fEvent.WaitFor(FSleepTime) = wrTimeout) do
begin
if Terminated then
break;
// Do your work
if (GetCurrentDataIdx - DataIdx1) >= DataCountToNotify then
begin
// Write data to buffer
DataIdx1 := GetCurrentIdx;
end;
end;
end;
// To stop the thread gracefully, call this instead of Terminate or override the DoTerminate
procedure TThread2.SetTerminateFlag;
begin
FEvent.SetEvent;
end;

Possible to force Delphi threadvar Memory to be Freed?

I have been chasing down what appears to be a memory leak in a DLL built in Delphi 2007 for Win32. The memory for the threadvar variables is not freed if the threads still exist when the DLL is unloaded (there are no active calls into the DLL when it is unloaded).
The question: Is there some way to cause Delphi to free memory associated with threadvar variables? It is not as simple as just not using them. A number of the existing Delphi components use them, so even if the DLL does not explicitly declare them, it ends up using them.
A Few Details
I have tracked it down to a LocalAlloc call that occurs in response to the usage of a threadvar variable, which is Delphi's "wrapper" around thread-local storage in Win32. For the curious, the allocation call is in the Delphi source file sysinit.pas. The corresponding LocalFree call occurs only for threads that get DLL_THREAD_DETACH calls. If you have multiple threads in an application and unload a DLL, there is no DLL_THREAD_DETACH call for each thread. The DLL gets a DLL_PROCESS_DETACH and nothing else; I believe that is expected and valid. Thus, any thread-local storage allocations made on other threads are leaked.
I re-created it with a short C program that starts several "worker" threads. It loads the DLL (via LoadLibrary) on the main thread and then makes calls into an exported function on the worker threads. The function exported from the Delphi DLL assigns a value to a threadvar integer variable and returns. The C program then unloads the DLL (via FreeLibrary on the main thread) and repeats. After about 32,000 iterations, the process memory usage shown in Process Explorer grows to over 130MB. I also verified it more accurately with umdh. UMDH showed 24 bytes lost per instance. But the 130MB in Process Explorer seems to indicate about 4K per iteration; I'm guessing a 4K segment was leaked each time based on that, but I don't know for sure.
For clarification, here is the threadvar declaration and the entire exported function:
threadvar
threadint : integer;
function Startup( ulID: LongWord; hValue: Longint ): LongWord; stdcall;
begin
threadint := 123;
Result := 0;
end;
Thanks.

As you've already determined, thread-local storage will get released for each thread that gets detached from the DLL. That happens in System._StartLib when Reason is DLL_Thread_Detach. For that to happen, though, the thread needs to terminate. Thread-detach notifications occur when the thread terminates, not when the DLL is unloaded. (If it were the other way around, the OS would have to interrupt the thread someplace so it could insert a call to DllMain on the thread's behalf. That would be disastrous.)
The DLL is supposed to receive thread-detach notifications. In fact, that's the model suggested by Microsoft in its description of how to use thread-local storage with DLLs.
The only way to release thread-local storage is to call TlsFree from the context of the thread whose storage you want to free. From what I can tell, Delphi keeps all its threadvars in a single TLS index, given by the TlsIndex variable in SysInit.pas. You can use that value to call TlsFree whenever you want, but you'd better be sure there won't be any more code executed by the DLL in the current thread.
Since you also want to free the memory used for holding all the threadvars, you'll need to call TlsGetValue to get the address of the buffer Delphi allocates. Call LocalFree on that pointer.
This would be the (untested) Delphi code to free the thread-local storage.
var
TlsBuffer: Pointer;
begin
TlsBuffer := TlsGetValue(SysInit.TlsIndex);
LocalFree(HLocal(TlsBuffer));
TlsFree(SysInit.TlsIndex);
end;
If you need to do this from the host application instead of from within the DLL, then you'll need to export a function that returns the DLL's TlsIndex value. That way, the host program can free the storage itself after the DLL is gone (thus guaranteeing no further DLL code executes in a given thread).

Note that it is clearly specified in the Help that you have to take care of freeing yourself your threadvars.
You should do so as soon as you know you won't need them anymore.
From Help:
Dynamic variables that are ordinarily managed by the compiler (long strings, wide strings, dynamic arrays, variants, and interfaces) can be declared with threadvar, but the compiler does not automatically free the heap-allocated memory created by each thread of execution. If you use these data types in thread variables, it is your responsibility to dispose of their memory from within the thread, before the thread terminates. For example,
threadvar S: AnsiString;
S := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
...
S := ''; // free the memory used by S
Note: Use of such constructs is discouraged.
You can free a variant by setting it to Unassigned and an interface or dynamic array by setting it to nil.

At the risk of way too much code, here is a possible (poor) solution to my own question. Using the fact that the thread-local storage memory is stored in a single block for the threadvar variables (as pointed out by Mr. Kennedy - thanks), this code stores the allocated pointers in a TList and then frees them at process detach. I wrote it mostly just to see if it would work. I probably would not use this in production code because it makes assumptions about the Delphi runtime that could change with different versions and quite possibly misses problems even with the version I am using (Delphi 7 and 2007).
This implementation does make umdh happy, it doesn't think there are any more memory leaks. However, if I run the test in a loop (load, call entrypoint on another thread, unload), the memory usage as seen in Process Explorer still grows alarmingly fast. In fact, I created a completely empty DLL with only an empty DllMain (that was not called since I did not assign Delphi's global DllMain pointer to it ... Delhi itself provides the real DllMain entrypoint). A simple loop of loading/unloading the DLL still leaked 4K per iteration. So there may still be something else a Delphi DLL is supposed to include (the main point of the original question). But I don't know what it is. A DLL written in C does not behave this way.
Our code (a server) can call DLLs written by customers to extend functionality. We typically unload the DLL after there are no more references to it. I think my solution to the problem is going to be to add an option to leave the DLL loaded "permanently" in memory. If customers use Delphi to write their DLL, they will need to turn that option on (or maybe we can detect that it is a Delphi DLL on load ... need to check that out). Nonetheless, it has been an interesting exercise.
library Sample;
uses
SysUtils,
Windows,
Classes,
HTTPApp,
SyncObjs;
{$E dll}
var
gListSync : TCriticalSection;
gTLSList : TList;
threadvar
threadint : integer;
// remove all entries from the TLS storage list
procedure RemoveAndFreeTLS();
var
i : integer;
begin
// Only call this at process detach. Those calls are serialized
// so don't get the critical section.
if assigned( gTLSList ) then
for i := 0 to gTLSList.Count - 1 do
// Is this actually safe in DllMain process detach? From reading the MSDN
// docs, it appears that the only safe statement in DllMain is "return;"
LocalFree( Cardinal( gTLSList.Items[i] ));
end;
// Remove this thread's entry
procedure RemoveThreadTLSEntry();
var
p : pointer;
begin
// Find the entry for this thread and remove it.
gListSync.enter;
try
if ( SysInit.TlsIndex <> -1 ) and ( assigned( gTLSList )) then
begin
p := TlsGetValue( SysInit.TlsIndex );
// if this thread didn't actually make a call into the DLL and use a threadvar
// then there would be no memory for it
if p <> nil then
gTLSList.Remove( p );
end;
finally
gListSync.leave;
end;
end;
// Add current thread's TLS pointer to the global storage list if it is not already
// stored in it.
procedure AddThreadTLSEntry();
var
p : pointer;
begin
gListSync.enter;
try
// Need to create the list if first call
if not assigned( gTLSList ) then
gTLSList := TList.Create;
if SysInit.TlsIndex <> -1 then
begin
p := TlsGetValue( SysInit.TlsIndex );
if p <> nil then
begin
// if it is not stored, add it
if gTLSList.IndexOf( p ) = -1 then
gTLSList.Add( p );
end;
end;
finally
gListSync.leave;
end;
end;
// Some entrypoint that uses threadvar (directly or indirectly)
function MyExportedFunc(): LongWord; stdcall;
begin
threadint := 123;
// Make sure this thread's TLS pointer is stored in our global list so
// we can free it at process detach. Do this AFTER using the threadvar.
// Delphi seems to allocate the memory on demand.
AddThreadTLSEntry;
Result := 0;
end;
procedure DllMain(reason: integer) ;
begin
case reason of
DLL_PROCESS_DETACH:
begin
// NOTE - if this is being called due to process termination, then it should
// just return and do nothing. Very dangerous (and against MSDN recommendations)
// otherwise. However, Delphi does not provide that information (the 3rd param of
// the real DlLMain entrypoint). In my test, though, I know this is only called
// as a result of the DLL being unloaded via FreeLibrary
RemoveAndFreeTLS();
gListSync.Free;
if assigned( gTLSList ) then
gTLSList.Free;
end;
DLL_THREAD_DETACH:
begin
// on a thread detach, Delphi will clean up its own TLS, so we just
// need to remove it from the list (otherwise we would get a double free
// on process detach)
RemoveThreadTLSEntry();
end;
end;
end;
exports
DllMain,
MyExportedFunc;
// Initialization
begin
IsMultiThread := TRUE;
// Make sure Delphi calls my DllMain
DllProc := #DllMain;
// sync object for managing TLS pointers. Is it safe to create a critical section?
// This init code is effectively DllMain's DLL_PROCESS_ATTACH
gListSync := TCriticalSection.Create;
end.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string