How to free a Pascal function's TStringList?

How to free a Pascal function's TStringList? - object

I've got a function in Pascal which returns a StringList as the result. How do I free it correctly? As it's not a global variable I can't free it in the Form.FormDestroy procedure. And freeing it after returning it doesn't work either (that should be allowed though, LOL).
In general, is there a way to free all objects (including the ones that the form has no control over) as soon as the program is closed?

#TomBrunberg is right, his comment solved the problem:
Take a reference to the returned TStringList and use that ref to free it:
var
sl: TStringList;
begin
sl := fnThatReturnsAStringList;
UseTheList(sl);
sl.Free;
end;
Another possibility, in case you call the function only because of side effects of that call, without any interest in the returned TStringList, you may even free it simply by calling:
begin
fnThatReturnsAStringList.Free;
end;

Related

Is TStringList thread safe?

Is it ok to read data from TStringList without any form of synchronization? For example synchronization with main thread.
Example code
var MyStringList:TStringList; //declared globally
procedure TForm1.JvThread1Execute(Sender: TObject; Params: Pointer);
var x:integer;
begin
for x:=0 to MaxInt do MyStringList.Add(FloatToStr(Random));
end;
procedure TForm1.ButtonClick(Sender: TObject);
var x:integer;
SumOfRandomNumbers:double;
begin
for x:=0 to MyStringList.Count-1 do
SumOfRandomNumbers:=SumOfRandomNumbers+StrToFloat(MyStringList.Strings[x]);
end;
or Should I protect access to MyStringList with EnterCiticalSection
var MyStringList:TStringList; //declared globally
procedure TForm1.JvThread1Execute(Sender: TObject; Params: Pointer);
var x:integer;
begin
for x:=0 to MaxInt do
begin
EnterCriticalSection(MySemaphore);
MyStringList.Add(FloatToStr(Random));
LeaveCriticalSection(MySemaphore);
end;
end;
procedure TForm1.ButtonClick(Sender: TObject);
var x:integer;
SumOfRandomNumbers:double;
begin
for x:=0 to MyStringList.Count-1 do
begin
EnterCriticalSection(MySemaphore);
SumOfRandomNumbers:=SumOfRandomNumbers+StrToFloat(MyStringList.Strings[x]);
LeaveCriticalSection(MySemaphore);
end;
end;

First, no TStringList is not thread-safe.
Second, attempting to make it so would be a terrible idea for a low-level container that in the vast majority of cases would not be shared across multiple threads.
Third, the naive code you propose to make it thread-safe is woefully insufficient. It falls well short of making it truly thread-safe - which is part of the problem in trying to do so generically.
In the text of your question you ask:
Is it ok to read data from TStringList without any form of synchronisation?
Yes it is okay. In fact, that is preferred because it is more efficient.
However, if the data is shared across threads, you may run into problems. Which is why you should minimise the amount of data (not just string lists) shared across threads. And if you do need to share data, do so in a suitably controlled fashion.
Expanding on point 3
The reason your code is not thread-safe is that it falls short of protecting all your data from shared access. This is a common misunderstanding in multi-threaded development: "I just need to wrap certain operations with locks and all will be fine."
The point is, if your list is shared, you are:
Sharing the structures that represent the container.
AND you are sharing the data members (the actual strings) themselves.
When dealing with strings, this goes a step further, because the way Delphi manages strings means they could be shared (through internal reference counting) with other strings of the same value in an entirely different area of the application.
While it is possible your proposed locking strategy might be suitable for your current requirements, it is far from being generally thread-safe.
Conclusion
If you want to write thread-safe code the onus is on you to:
Understand the data access paths.
Minimise sharing between threads (by far the best bang for buck).
And to implement the best strategy to share the data safely (of which there are many options, and locking is not guaranteed to be best in any case).
Sidenote
I indicated earlier that your locking technique only "might be suitable for your current requirements" because I do not believe you have really given an indication as to you real requirements. If you have then you really do need to take note of the following:
In the code you have presented there would be absolutely no benefit in making your TStringList "thread-safe". You populate the list in a loop, and you read values in a second loop. You're doing absolutely nothing to use the data concurrently.
The closest your code should come to multi-threading is: It would be a good idea to process both loops off the main thread to avoid blocking the UI. In which case, the background thread should NOT share its TStringList instance. And can simply synchronise with the main thread to report the result (and possibly progress updates).
By not sharing data that doesn't need to be shared, you can bypass the need for locks entirely. They would be an unnecessary overhead. And you can be happy that TStringList doesn't have a built-in "thread-safety" mechanism.

No, it isn't. There is no mechanism inside of TStringList, that locks for example .Add() or .GetStrings().
Unfortunately there is nothing built in like TThreadList, that is a threadsafe wrapper for TList. But you could build that easily on your own.
Here is a simple example for a synchronized decorator of TStringList, in that I cover the case for Add():
TThreadStringList = class
private
FStringList: TStringList;
FCriticalSection: TRtlCriticalSection;
// ...
public
function Add(const S: string): Integer;
// ...
end;
// ...
TThreadStringList.Add(const S: string): Integer;
begin
EnterCriticalSection(FCriticalSection);
try
Result:= Add(S);
finally
LeaveCriticalSection(FCriticalSection);
end;
end;
It should be easy, to apply this to all other methods you need.
Bear in mind, that you have to initialize the critical section, before you can use it, and to delete it afterwards.

How to prevent my threads from exiting before their work is done?

I have 10 threads working together. After starting the threads, 15 seconds later all threads exit before the job done, and only one thread remains.
My code:
procedure TForm1.Button2Click(Sender: TObject);
begin
AA;
BB;
CC;
DD;
EE;
FF;
GG;
HH;
II;
JJ;
end;
procedure TForm1.AA; //same procedure for BB,CC,DD,EE.FF,JJ,HH,II,JJ
begin
lHTTP := TIdHTTP.Create(nil);
TTask.Create(Procedure
try
//HTTP Opertations
finally
end;
end).Start;
end;
Note, i can't Free the HTTP component because if i did i get an AV and I don't know how to debug it, where to correctly free it in the code? However without freeing it the code works well but the threads exit. It might be the problem as Mr Dodge said.

Based on how I see you're creating the TIdHTTP component, it's simply wrong. You shouldn't create an object outside of the thread, then use it from inside the thread. That's not thread-safe. You should create it in the same thread as where it's being used. This is why you're unable to free it as well, so you actually have two problems to fix here at the same time.
I also realized that your lHTTP variable is not in the scope of your code, so I'm going to assume that you have it declared in some global (or otherwise shared) location. Each thread needs its own variable for its own instance.
So your code should look a little more like this:
procedure TForm1.AA; //same procedure for BB,CC,DD,EE.FF,JJ,HH,II,JJ
begin
TTask.Create(Procedure
var
lHTTP: TIdHTTP;
begin
lHTTP := TIdHTTP.Create(nil);
try
//HTTP Opertations
finally
lHTTP.Free;
end;
end).Start;
end;
Other components (such as TADOConnection) would even completely fail and crash for attempting such a thing (since such components utilize COM). Luckily, TIdHTTP does not use COM, but the design is still flawed for the same reason.
Now, when you say that you debugged it, I'm guessing you mean you debugged the code in the actual thread, but the breakpoint jumped to another place in your code before it reached the end of this? That is to be expected when using the debugger in threads. You can't just step into a thread and expect each sequential breakpoint to be in the same thread - I mean, if you have more than one breakpoint in different threads, your debugger is very likely to jump from one to another - because, again, they are multiple threads. I suggest creating some sort of work log, and each thread reports its status and position.
It is literally just like an alternate universe. Multiple different similar threads doing slightly different things than each other. The Delphi Debugger is simply the Time Lord who can see into all the alternate universes.

How can I use a Dictionary/StringList inside Execute of a Thread - delphi

I have a thread class TValidateInvoiceThread:
type
TValidateInvoiceThread = class(TThread)
private
FData: TValidationData;
FInvoice: TInvoice; // Do NOT free
FPreProcessing: Boolean;
procedure ValidateInvoice;
protected
procedure Execute; override;
public
constructor Create(const objData: TValidationData; const bPreProcessing: Boolean);
destructor Destroy; override;
end;
constructor TValidateInvoiceThread.Create(const objData: TValidationData;
const bPreProcessing: Boolean);
var
objValidatorCache: TValidationCache;
begin
inherited Create(False);
FData := objData;
objValidatorCache := FData.Caches.Items['TInvAccountValidator'];
end;
destructor TValidateInvoiceThread.Destroy;
begin
FreeAndNil(FData);
inherited;
end;
procedure TValidateInvoiceThread.Execute;
begin
inherited;
ValidateInvoice;
end;
procedure TValidateInvoiceThread.ValidateInvoice;
var
objValidatorCache: TValidationCache;
begin
objValidatorCache := FData.Caches.Items['TInvAccountValidator'];
end;
I create this thread in another class
procedure TInvValidators.ValidateInvoiceUsingThread(
const nThreadIndex: Integer;
const objValidatorCaches: TObjectDictionary<String, TValidationCache>;
const nInvoiceIndex: Integer; const bUseThread, bPreProcessing: Boolean);
begin
objValidationData := TValidationData.Create(FConnection, FAllInvoices, FAllInvoices[nInvoiceIndex], bUseThread);
objValidationData.Caches := objValidatorCaches;
objThread := TValidateInvoiceThread.Create(objValidationData, bPreProcessing);
FThreadArray[nThreadIndex] := objThread;
FHandleArray[nThreadIndex]:= FThreadArray[nThreadIndex].Handle;
end;
Then I execute it
rWait:= WaitForMultipleObjects(FThreadsRunning, #FHandleArray, True, 100);
Note I have removed some code out of here to try to keep it a bit simpler to follow
The problem is that my Dictionary is becoming corrupt
If I put a breakpoint in the constructor all is fine
However, in the first line of the Execute method, the dictionary is now corrupt.
The dictionary itself is a global variable to the class
Do I need to do anything special to allow me to use Dictionaries inside a thread?
I have also had the same problem with a String List
Edit - additional information as requested
TInvValidators contains my dictionary
TInvValidators = class(TSTCListBase)
private
FThreadArray : Array[1..nMaxThreads] of TValidateInvoiceThread;
FHandleArray : Array[1..nMaxThreads] of THandle;
FThreadsRunning: Integer; // total number of supposedly running threads
FValidationList: TObjectDictionary<String, TObject>;
end;
procedure TInvValidators.Validate(
const Phase: TValidationPhase;
const objInvoices: TInvoices;
const ReValidate: TRevalidateInvoices;
const IDs: TList<Integer>;
const objConnection: TSTCConnection;
const ValidatorCount: Integer);
var
InvoiceIndex: Integer;
i : Integer;
rWait : Cardinal;
Flags: DWORD; // dummy variable used in a call to find out if a thread handle is valid
nThreadIndex: Integer;
procedure ValidateInvoiceRange(const nStartInvoiceID, nEndInvoiceID: Integer);
var
InvoiceIndex: Integer;
I: Integer;
begin
nThreadIndex := 1;
for InvoiceIndex := nStartInvoiceID - 1 to nEndInvoiceID - 1 do
begin
if InvoiceIndex >= objInvoices.Count then
Break;
objInvoice := objInvoices[InvoiceIndex];
ValidateInvoiceUsingThread(nThreadIndex, FValidatorCaches, InvoiceIndex, bUseThread, False);
Inc(nThreadIndex);
if nThreadIndex > nMaxThreads then
Break;
end;
FThreadsRunning := nMaxThreads;
repeat
rWait:= WaitForMultipleObjects(FThreadsRunning, #FHandleArray, True, 100);
case rWait of
// one of the threads satisfied the wait, remove its handle
WAIT_OBJECT_0..WAIT_OBJECT_0 + nMaxThreads - 1: RemoveHandle(rWait + 1);
// at least one handle has become invalid outside the wait call,
// or more than one thread finished during the previous wait,
// find and remove them
WAIT_FAILED:
begin
if GetLastError = ERROR_INVALID_HANDLE then
begin
for i := FThreadsRunning downto 1 do
if not GetHandleInformation(FHandleArray[i], Flags) then // is handle valid?
RemoveHandle(i);
end
else
// the wait failed because of something other than an invalid handle
RaiseLastOSError;
end;
// all remaining threads continue running, process messages and loop.
// don't process messages if the wait returned WAIT_FAILED since we didn't wait at all
// likewise WAIT_OBJECT_... may return soon
WAIT_TIMEOUT: Application.ProcessMessages;
end;
until FThreadsRunning = 0; // no more valid thread handles, we're done
end;
begin
try
FValidatorCaches := TObjectDictionary<String, TValidationCache>.Create([doOwnsValues]);
for nValidatorIndex := 0 to Count - 1 do
begin
objValidator := Items[nValidatorIndex];
objCache := TValidationCache.Create(objInvoices);
FValidatorCaches.Add(objValidator.ClassName, objCache);
objValidator.PrepareCache(objCache, FConnection, objInvoices[0].UtilityType);
end;
nStart := 1;
nEnd := nMaxThreads;
while nStart <= objInvoices.Count do
begin
ValidateInvoiceRange(nStart, nEnd);
Inc(nStart, nMaxThreads);
Inc(nEnd, nMaxThreads);
end;
finally
FreeAndNil(FMeterDetailCache);
end;
end;
If I remove the repeat until and leave just WaitForMultipleObjects I still get lots of errors
You can see here that I am processing the invoices in chunks of no more than nMaxThreads (10)
When I reinstated the repeat until loop it worked on my VM but then access violated on my host machine (which has more memory available)
Paul

Before I offer guidance on how to resolve your problem, I'm going to give you a very important tip.
First ensure your code works single-threaded, before trying to get a multi-threaded implementation working. The point is that multi-threaded code adds a whole new layer of complexity. Until your code works correctly in a single thread, it has no chance of doing so in multiple threads. And the extra layer of complexity makes it extremely difficult to fix.
You might believe you've got a working single-threaded solution, but I'm seeing errors in your code that imply you still have a lot of resource management bugs. Here's one example with relevant lines only, and comments to explain the mistakes:
begin
try //try/finally is used for resource protection, in order to protect a
//resource correctly, it should be allocated **before** the try.
FValidatorCaches := TObjectDictionary<String, TValidationCache>.Create([doOwnsValues]);
finally
//However, in the finally you're destroying something completely
//different. In fact, there are no other references to FMeterDetailCache
//anywhere else in the code you've shown. This strongly implies an
//error in your resource protection.
FreeAndNil(FMeterDetailCache);
end;
end;
Reasons for not being able to use the dictionary
You say that: "in the first line of the Execute method, the dictionary is now corrupt".
For a start, I'm fairly certain that your dictionary isn't really "corrupt". The word "corrupt" implies that it's there, but its internal data is invalid resulting in inconsistent behaviour. It's far more likely that by the time the Execute method wants to use the dictionary, it has already been destroyed. So your thread is basically pointing to an area of memory that used to have a dictionary, but it's no longer there at all. (I.e. not "corrupt")
SIDE NOTE It is possible for your dictionary to truly become corrupt because you have multiple threads sharing the same dictionary. If different threads cause any internal changes to the dictionary at the same time, it could very easily become corrupt. But, assuming your threads are all treating the dictionary as read-only, you would need a memory overwrite to corrupt it.
So let's focus on what might cause your dictionary to be destroyed before the thread gets to use it. NOTE I can't see anything in the code provided, but there are 2 likely possibilities:
Your main thread destroys the dictionary before the child thread gets to use it.
One of your child threads destroys the dictionary as soon as it is destroyed resulting in all other threads being unable to use it.
In the first case, this would happen as follows:
Main Thread: ......C......D........
Child Thread ---------S......
. = code being executed
C = child thread created
- = child thread exists, but isn't doing anything yet
S = OS has started the child thread
D = main thread destroys dictionary
The point of the above is that it's easy to forget that the main thread can reach a point where it decides to destroy the dictionary even before the child thread starts running.
As for the second possibility, this depends on what is happening inside the destructor of TValidationData. Since you haven't shown that code, only you know the answer to that.
Debugging to pinpoint the problem
Assuming the dictionary is being destroyed too soon, a little debugging can quickly pinpoint where/why the dictionary is being destroyed. From your question, it seems you've already done some debugging, so I'm assuming you'll have no trouble following these steps:
Put a breakpoint on the first line of the dictionary's destructor.
Run your code.
If you reach Execute before reaching the dictionary's destructor, then the thread should still be able to use the dictionary.
If you reach the dictionary's destructor before reaching Execute, then you simply need to examine the sequence of calls leading to the object's destruction.
Debugging in case of a memory overwrite
Keeping an open mind about the possibility of a memory overwrite... This is a little trickier to debug. But provided you can consistently reproduce the problem it should be possible to debug.
Put a breakpoint in the thread's destructor.
Run the app
When you reach the above breakpoint, find the address of the of the dictionary by pressing Ctrl + F7 and evaluating #FData.Caches.
Now add a Data Breakpoint (use the drop-down from the Breakpoints window) for the address and the size of the dictionary.
Continue running, the app will pause when the the data changes.
Again, examine the call-stack to determine the cause.
Wrapping up
You have a number of questions and statements that imply misunderstandings about sharing data (dictionary/string list) between threads. I'll try cover those here.
There is nothing special required to use a Dictionary/StringList in a thread. It's basically the same as passing it to any other object. Just make sure the Dictionary/StringList isn't destroyed prematurely.
That said, whenever you share data, you need to be aware of the possibility of "race conditions". I.e. one thread attempts to access the shared data at the same time another thread is busy modifying it. If no threads are modifying the data, then there's no need for concern. But as soon as any thread is able to modify the data, the access needs to be made "thread-safe". (There are a number of ways to do this, please search for existing questions on SO.)
You mention: "The dictionary itself is a global variable to the class". Your terminology is not correct. A global variable is something declared at the unit level and is accessible anywhere. It's enough to simply say the dictionary is a member of or field of the class. When dealing with "globals", there are significantly different things to worry about; so best to avoid any confusion.
You may want to rethink how you initialise your threads. There are a few reasons you some entries of FHandleArray won't be initialised. Are you ok with this?
You mention AV on a machine that has more memory available. NOTE: Amount of memory is not relevant. And if you run in 32-bit mode you wouldn't have access to more than 4 GB in any case.
Finally, to make a special mention:
Using multiple threads to perform dictionary lookups is extremely inefficient. A dictionary lookup is an O(1) operation. The overhead of threading will almost certainly slow you down unless you intend doing a significant amount of processing in addition to the dictionary lookup.
PS - (not so big) mistake
I noticed the following in your code, and it's a mistake.
procedure TValidateInvoiceThread.Execute;
begin
inherited;
ValidateInvoice;
end;
The TThread.Execute method is abstract, meaning there's no implementation. Attempting to call an abstract method will trigger an EAbstractError. Luckily as LU RD points out, the compiler is able to protect you by not compiling the line in. Even so, it would be more accurate to not call inherited here.
NOTE: In general, overridden methods don't always need to call inherited. You should be explicitly aware of what inherited is doing for you and decide whether to call it on a case-by-case basis. Don't go into auto-pilot mode of calling inherited just because you're overriding a virtual method.

Aquire Singleton class Instance Multithread

To get the instance of the class with Singleton pattern, I want use the following function:
This is a sketch
interface
uses SyncObjs;
type
TMCriticalSection = class(TCriticalSection)
private
Dummy : array [0..95] of Byte;
end;
var
InstanceNumber : Integer;
AObject: TObject;
CriticalSection: TMCriticalSection;
function getInstance: TObject;
implementation
uses Windows;
function getInstance: TObject;
begin
//I Want somehow use InterlockedCompareExchange instead of CriticalSession, for example
if InterlockedCompareExchange(InstanceNumber, 1, 0) > 0 then
begin
Result := AObject;
end
else
begin
CriticalSection.Enter;
try
AObject := TObject.Create;
finally
CriticalSection.Leave;
end;
InterlockedIncrement(InstanceNumber);
Result := AObject
end;
end;
initialization
CriticalSection := TMCriticalSection.Create;
InstanceNumber := 0;
finalization;
CriticalSection.Free;
end.
Three Questions:
1- Is this design Thread Safe? Especially the with InterlockedExchange Part.
2- How to use the InterlockedCompareExchange? Is it possible to do what i'm trying?
3- Is this design better than involve all code within the critical section scope?
Remark:
My object is thread safe, only the construction i need to serialize!
This is not the intire code, only the important part, i mean, the getInstance function.
Edit
I NEED to use some kind of singleton object.
Is there any way to use InterlockedCompareExchange to compare if the value of InstanceNumber is zero?
And
1 - Create the object only when is 0, otherwise, return the instance.
2 - When the value is 0: Enter in the critical section. Create the object. Leave critical section.
3 - Would be better to do this way, instead of involve all code within the critical section scope?

There is a technique called "Lock-free initialization" that does what you want:
interface
function getInstance: TObject;
implementation
var
AObject: TObject;
function getInstance: TObject;
var
newObject: TObject;
begin
if (AObject = nil) then
begin
//The object doesn't exist yet. Create one.
newObject := TObject.Create;
//It's possible another thread also created one.
//Only one of us will be able to set the AObject singleton variable
if InterlockedCompareExchangePointer(AObject, newObject, nil) <> nil then
begin
//The other beat us. Destroy our newly created object and use theirs.
newObject.Free;
end;
end;
Result := AObject;
end;
The use of InterlockedCompareExchangePointer erects a full memory barrier around the operation. It is possible one might be able to get away with InterlockedCompareExchangeRelease to use release semantics (to ensure the construction of the object completes before performing the compare exchange). The problem with that is:
i'm not smart enough to know if Release semantics alone really will work (just cause a smarter person than me said they would, doesn't mean i know what he's talking about)
you're constructing an object, the memory barrier performance hit is the least of your worries (it's the thread safety)
Note: Any code released into public domain. No attribution required.

Your design does not work and this can be seen even without any understanding of what InterlockedCompareExchange does. In fact, irrespective of the meaning of InterlockedCompareExchange, your code is broken.
To see this, consider two threads arriving at the if statement in getInstance at the same time. Let's consider the three options for which branches they take:
They both choose the second branch. Then you create two instances and your code no longer implements a singleton.
They both choose the first branch. Then you never create an instance.
One choose the first and the other chooses the second. But since there is no lock in the first branch, the thread that takes that route can read AObject before the other thread has written it.
Personally I'd use double-checked locking if I had to implement your getInstance function.

Possible to force Delphi threadvar Memory to be Freed?

I have been chasing down what appears to be a memory leak in a DLL built in Delphi 2007 for Win32. The memory for the threadvar variables is not freed if the threads still exist when the DLL is unloaded (there are no active calls into the DLL when it is unloaded).
The question: Is there some way to cause Delphi to free memory associated with threadvar variables? It is not as simple as just not using them. A number of the existing Delphi components use them, so even if the DLL does not explicitly declare them, it ends up using them.
A Few Details
I have tracked it down to a LocalAlloc call that occurs in response to the usage of a threadvar variable, which is Delphi's "wrapper" around thread-local storage in Win32. For the curious, the allocation call is in the Delphi source file sysinit.pas. The corresponding LocalFree call occurs only for threads that get DLL_THREAD_DETACH calls. If you have multiple threads in an application and unload a DLL, there is no DLL_THREAD_DETACH call for each thread. The DLL gets a DLL_PROCESS_DETACH and nothing else; I believe that is expected and valid. Thus, any thread-local storage allocations made on other threads are leaked.
I re-created it with a short C program that starts several "worker" threads. It loads the DLL (via LoadLibrary) on the main thread and then makes calls into an exported function on the worker threads. The function exported from the Delphi DLL assigns a value to a threadvar integer variable and returns. The C program then unloads the DLL (via FreeLibrary on the main thread) and repeats. After about 32,000 iterations, the process memory usage shown in Process Explorer grows to over 130MB. I also verified it more accurately with umdh. UMDH showed 24 bytes lost per instance. But the 130MB in Process Explorer seems to indicate about 4K per iteration; I'm guessing a 4K segment was leaked each time based on that, but I don't know for sure.
For clarification, here is the threadvar declaration and the entire exported function:
threadvar
threadint : integer;
function Startup( ulID: LongWord; hValue: Longint ): LongWord; stdcall;
begin
threadint := 123;
Result := 0;
end;
Thanks.

As you've already determined, thread-local storage will get released for each thread that gets detached from the DLL. That happens in System._StartLib when Reason is DLL_Thread_Detach. For that to happen, though, the thread needs to terminate. Thread-detach notifications occur when the thread terminates, not when the DLL is unloaded. (If it were the other way around, the OS would have to interrupt the thread someplace so it could insert a call to DllMain on the thread's behalf. That would be disastrous.)
The DLL is supposed to receive thread-detach notifications. In fact, that's the model suggested by Microsoft in its description of how to use thread-local storage with DLLs.
The only way to release thread-local storage is to call TlsFree from the context of the thread whose storage you want to free. From what I can tell, Delphi keeps all its threadvars in a single TLS index, given by the TlsIndex variable in SysInit.pas. You can use that value to call TlsFree whenever you want, but you'd better be sure there won't be any more code executed by the DLL in the current thread.
Since you also want to free the memory used for holding all the threadvars, you'll need to call TlsGetValue to get the address of the buffer Delphi allocates. Call LocalFree on that pointer.
This would be the (untested) Delphi code to free the thread-local storage.
var
TlsBuffer: Pointer;
begin
TlsBuffer := TlsGetValue(SysInit.TlsIndex);
LocalFree(HLocal(TlsBuffer));
TlsFree(SysInit.TlsIndex);
end;
If you need to do this from the host application instead of from within the DLL, then you'll need to export a function that returns the DLL's TlsIndex value. That way, the host program can free the storage itself after the DLL is gone (thus guaranteeing no further DLL code executes in a given thread).

Note that it is clearly specified in the Help that you have to take care of freeing yourself your threadvars.
You should do so as soon as you know you won't need them anymore.
From Help:
Dynamic variables that are ordinarily managed by the compiler (long strings, wide strings, dynamic arrays, variants, and interfaces) can be declared with threadvar, but the compiler does not automatically free the heap-allocated memory created by each thread of execution. If you use these data types in thread variables, it is your responsibility to dispose of their memory from within the thread, before the thread terminates. For example,
threadvar S: AnsiString;
S := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
...
S := ''; // free the memory used by S
Note: Use of such constructs is discouraged.
You can free a variant by setting it to Unassigned and an interface or dynamic array by setting it to nil.

At the risk of way too much code, here is a possible (poor) solution to my own question. Using the fact that the thread-local storage memory is stored in a single block for the threadvar variables (as pointed out by Mr. Kennedy - thanks), this code stores the allocated pointers in a TList and then frees them at process detach. I wrote it mostly just to see if it would work. I probably would not use this in production code because it makes assumptions about the Delphi runtime that could change with different versions and quite possibly misses problems even with the version I am using (Delphi 7 and 2007).
This implementation does make umdh happy, it doesn't think there are any more memory leaks. However, if I run the test in a loop (load, call entrypoint on another thread, unload), the memory usage as seen in Process Explorer still grows alarmingly fast. In fact, I created a completely empty DLL with only an empty DllMain (that was not called since I did not assign Delphi's global DllMain pointer to it ... Delhi itself provides the real DllMain entrypoint). A simple loop of loading/unloading the DLL still leaked 4K per iteration. So there may still be something else a Delphi DLL is supposed to include (the main point of the original question). But I don't know what it is. A DLL written in C does not behave this way.
Our code (a server) can call DLLs written by customers to extend functionality. We typically unload the DLL after there are no more references to it. I think my solution to the problem is going to be to add an option to leave the DLL loaded "permanently" in memory. If customers use Delphi to write their DLL, they will need to turn that option on (or maybe we can detect that it is a Delphi DLL on load ... need to check that out). Nonetheless, it has been an interesting exercise.
library Sample;
uses
SysUtils,
Windows,
Classes,
HTTPApp,
SyncObjs;
{$E dll}
var
gListSync : TCriticalSection;
gTLSList : TList;
threadvar
threadint : integer;
// remove all entries from the TLS storage list
procedure RemoveAndFreeTLS();
var
i : integer;
begin
// Only call this at process detach. Those calls are serialized
// so don't get the critical section.
if assigned( gTLSList ) then
for i := 0 to gTLSList.Count - 1 do
// Is this actually safe in DllMain process detach? From reading the MSDN
// docs, it appears that the only safe statement in DllMain is "return;"
LocalFree( Cardinal( gTLSList.Items[i] ));
end;
// Remove this thread's entry
procedure RemoveThreadTLSEntry();
var
p : pointer;
begin
// Find the entry for this thread and remove it.
gListSync.enter;
try
if ( SysInit.TlsIndex <> -1 ) and ( assigned( gTLSList )) then
begin
p := TlsGetValue( SysInit.TlsIndex );
// if this thread didn't actually make a call into the DLL and use a threadvar
// then there would be no memory for it
if p <> nil then
gTLSList.Remove( p );
end;
finally
gListSync.leave;
end;
end;
// Add current thread's TLS pointer to the global storage list if it is not already
// stored in it.
procedure AddThreadTLSEntry();
var
p : pointer;
begin
gListSync.enter;
try
// Need to create the list if first call
if not assigned( gTLSList ) then
gTLSList := TList.Create;
if SysInit.TlsIndex <> -1 then
begin
p := TlsGetValue( SysInit.TlsIndex );
if p <> nil then
begin
// if it is not stored, add it
if gTLSList.IndexOf( p ) = -1 then
gTLSList.Add( p );
end;
end;
finally
gListSync.leave;
end;
end;
// Some entrypoint that uses threadvar (directly or indirectly)
function MyExportedFunc(): LongWord; stdcall;
begin
threadint := 123;
// Make sure this thread's TLS pointer is stored in our global list so
// we can free it at process detach. Do this AFTER using the threadvar.
// Delphi seems to allocate the memory on demand.
AddThreadTLSEntry;
Result := 0;
end;
procedure DllMain(reason: integer) ;
begin
case reason of
DLL_PROCESS_DETACH:
begin
// NOTE - if this is being called due to process termination, then it should
// just return and do nothing. Very dangerous (and against MSDN recommendations)
// otherwise. However, Delphi does not provide that information (the 3rd param of
// the real DlLMain entrypoint). In my test, though, I know this is only called
// as a result of the DLL being unloaded via FreeLibrary
RemoveAndFreeTLS();
gListSync.Free;
if assigned( gTLSList ) then
gTLSList.Free;
end;
DLL_THREAD_DETACH:
begin
// on a thread detach, Delphi will clean up its own TLS, so we just
// need to remove it from the list (otherwise we would get a double free
// on process detach)
RemoveThreadTLSEntry();
end;
end;
end;
exports
DllMain,
MyExportedFunc;
// Initialization
begin
IsMultiThread := TRUE;
// Make sure Delphi calls my DllMain
DllProc := #DllMain;
// sync object for managing TLS pointers. Is it safe to create a critical section?
// This init code is effectively DllMain's DLL_PROCESS_ATTACH
gListSync := TCriticalSection.Create;
end.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string