FreeThreadedDOMDocument, Neutral Apartments and Free-Threaded Marshaler

FreeThreadedDOMDocument, Neutral Apartments and Free-Threaded Marshaler - multithreading

As MSDN states:
If you are writing a single threaded application (or a multi-threaded application where only one thread accesses a DOM at one time), use the rental threaded model (Msxml2.DOMDocument.3.0 or Msxml2.DOMDocument.6.0). If you are writing an application where multiple threads access will simultaneously access a DOM, use the free threaded model (Msxml2.FreeThreadedDOMDocument.3.0 or Msxml2.FreeThreadedDOMDocument.6.0).
Is there any connection between FreeThreadedDOMDocument, neutral apartments and free-threaded marshaler? I looked in OleView and found that FreeThreadedDOMDocument threading model is Both. As far as I understand neutral apartment objects are supported with a free-threaded marshaler. Does it mean that FreeThreadedDOMDocument doesn't use a free-threaded marshaler and it is called a bit confusing as free-threaded?
What is the implementation difference between COM classes that marked as Free, Both or Neutral? As far as I understand they all must be thread-safe, why is the difference? Is it correct that Neutral should support a free-threaded marshaler?

There are multiple questions here.
TL;DR:
Neutral objects:
Incur a bit less in-process marshaling than STA and MTA object
Avoid thread switching
Interface pointers are automatically marshaled
Neutral apartment lives as long as there's a neutral object alive
Must be ready to run under any kind of thread, using COM utility functions for waiting or choosing the Win32 wait function to use depending on the thread type
Free threaded objects:
Incur practically no in-process marshaling
Avoid thread switching
Interface pointers are not automatically marshaled
Lifetime is tied to the activation apartment
Must be ready to run under any kind of thread, using COM utility functions for waiting or choosing the Win32 wait function to use depending on the thread type
Is there any connection between FreeThreadedDOMDocument, neutral apartments and free-threaded marshaler?
TL;DR: The FreeThreadedDOMDocument's threading model is "Both", so it's tied to the apartment where it's activated (created). It aggregates the free threaded marshaler, so it's a free threaded object.
The FreeThreadedDOMDocument is a COM class whose objects aggregate the free threaded marshaler. What this marshaler does is to provide a raw pointer whenever marshaling in-process (i.e. IMarshal::MarshalInterface with dwDestContext set to MSHCTX_INPROC.
I'll use definition of a free threaded object as an object which aggregates the free threaded marshaler.
A free threaded object's threading model should be specified as "Neutral", or "Both" before Windows 2000, so it can be created and used in any thread, avoiding context switches.
If its threading model is specified as "Both", the object's lifetime is tied to the apartment where it was created. For instance, if an STA thread terminates, all free threaded objects created within that apartment are either destroyed or no longer valid.
As far as I understand neutral apartment objects are supported with a free-threaded marshaler.
No, proxies to neutral objects are a bit lighter than other in-process proxies as it only sets up a COM context, but it never incurs in full marshaling and it avoids thread switching.
Does it mean that FreeThreadedDOMDocument doesn't use a free-threaded marshaler and it is called a bit confusing as free-threaded?
No, the FreeThreadedDOMDocument does use the free threaded marshaler.
Historically, there were already free threaded objects before Microsoft provided its own support for them (due to popularity, and probably because most free threaded marshalers out there were flaky), and the Neutral apartment appeared only in Windows 2000.
As such, instances of FreeThreadedDOMDocument are free threaded because they aggregate the free threaded marshaler, and the lifetime of each instance is tied to the apartment where it was created. Usually, there's little impact, but with e.g. a thread-pool of STA threads, the effect is observed more often, because STAs come and go as the owning threads terminate (either normally or to reclaim resources) and get created. For instance, classic ASP uses STA threads by default.
PS: I've mentioned the following subject in another answer, but I believe the content is a bit different as the questions are different too.
Here's the current threading model values:
None: use the main STA
"Apartment": use any STA, i.e. if the current apartment is STA or NA over STA, use the current STA, otherwise use the host STA (more on this later)
"Free": use the MTA
"Both": use the current apartment
"Neutral": use the NA
For any apartment that doesn't exist, COM creates it if needed.
There are several peculiarities here:
To use the main STA, you must not mention any threading model, instead of something more sensible, like "Main"
All names except "Neutral" make no sense nowadays:
"Apartment" feels like the current apartment, but it's not
"Free" feels like free threaded objects, but it's not
"Both" makes you think there are only 2 apartment types, but there are 3: STA, MTA and NA
Actually, since Windows 8, there's ASTA, a variation of STA that is created for the GUI which, during outgoing calls, discards incoming calls that are not related, thus avoiding a great source of reentrancy bugs
You can make a regular STA behave like this with a message filter
The main STA is the first created STA. It only matters for classes with an unspecified threading model.
There can be several STAs, but there are at most one MTA and one NA.
While there is an active MTA, any thread not initialized for COM is implicitly in the MTA if it doesn't call CoInitializeEx(NULL, COINIT_MULTITHREADED), but it also doesn't affect the MTA's lifetime at all, meaning the MTA may be destroyed while the thread is using it. Since this is scarcely documented and pretty much unreliable, you shouldn't rely on this.
Implicitly created apartments are called host STA and host MTA. You have no control over them (unless by cheating with CoUninitialize while in that apartment; note: don't actually do this). In fact, if you activate "Apartment" objects outside of an STA or outside of an NA running over an STA, it'll be activated in a host STA. For further confusion, this might also be the main STA if the host STA was the first STA to be initialized.
All COM threads that support host apartments are background threads, so they don't keep your application from exiting.
You have no control whatsoever over the NA, other than creating it when activating a neutral object. You cannot enter it directly, but you can create your own neutral object with a method that runs a callback in the context of the neutral apartment. This callback could be a free threaded object.
What is the implementation difference between COM classes that marked as Free, Both or Neutral?
COM classes with apartment declared as "Free" will result in objects that belong to the MTA. Such objects may assume that the threads they run in don't have to pump window messages. Essentially, they may block.
Free threaded objects and neutral objects must be prepared to run under any apartment. For free threaded objects, it should be obvious why: it bypasses any context marshaling, so methods execute in any thread. For neutral objects, there's the distinction of which kind of apartment was active (through CoGetApartmentType).
In either case, you should use COM's utility functions, like CoWaitForMultipleHandles instead of WaitForMultipleHandles[Ex], which blocks and is unacceptable in STAs, or MsgWaitForMultipleHandles[Ex], which accesses the window message queue, probably creating it implicitly, and is usually unacceptable in MTAs.
You can check the apartment type by yourself and choose to use the proper Win32 waiting functions or to use a polling strategy which waits and pumps messages with timeouts in the STA, in case you're waiting on something other than handles or if you require a specific waiting logic.
The most striking difference between free threaded objects and neutral objects is marshaling of other COM objects.
While using neutral objects, incoming and outgoing interface pointers are automatically marshaled. For instance, you can store incoming interface pointers in fields.
While using free threaded objects, incoming and outgoing interface pointers are not marshaled at all, meaning either you get raw pointers to objects in the same apartment, or you get proxies to objects in other apartments. These proxies are tied to the current apartment too.
For instance, an incoming raw pointer means you're getting an object that belongs to the current apartment, so you'll have to marshal it if you intend to store a reference to the object.
An incoming proxy means you're getting a proxy to an object in another apartment, but this proxy is tied to the current apartment. You can't store this proxy either. Specifically, notwithstanding standard proxy/stubs' apartment verification, STA proxies may have thread affinity. You must marshal it as well. But don't worry, marshaling a proxy will not stack marshaling; when you again unmarshal, you'll get a proxy to the object, not a proxy to a proxy.
When a free threaded object must store an interface pointer, it must always do so through manual marshaling, and when it must call methods from this interface pointer, it must do so through manually unmarshaling.
Usually, the Global Interface Table (GIT; another misleading name, it's actually an in-process table) is used for this purpose.
As far as I understand they all must be thread-safe, why is the difference?
Regarding thread-safety, there's no difference.
But as I explained in the previous question, there's a huge difference when storing interface pointers, and a subtle difference regarding object activation and lifetime.
Is it correct that Neutral should support a free-threaded marshaler?
The free threaded marshaler effectively ignores the apartment, so it's the methods' responsibility to behave, synchronize and/or lock correctly. So, neither apartment must support the free threaded marshaler, it's the free threaded object that must support every apartment.
It's possible to aggregate the free threaded marshaler in objects with any threading model, including "Neutral".
If you find that the context setup by the neutral apartment marshaler is somehow a bottleneck, then you may consider using the free threaded marshaler, at the cost of manually marshaling stored interface pointers. If not, just use the neutral apartment.

Related

What is STA/MTA vs apartments/free threads vs UI threads/worker threads? Why the name changes?

I am reading Inside COM by Dale Rogerson, and it uses the terms apartment threads and free threads to describe the different types of COM threads.
He also clarifies that these correspond directly to UI threads and worker threads:
COM uses the same two types of threads, although COM has different names for them. Instead of calling one a user-interface thread, COM uses the term apartment thread. The term free thread is used instead of worker thread. [...]
However, lots of other documentation refers to STAs and MTAs. "Single-Threaded Apartments" and "Multi-Threaded Apartments."
Do "apartments/free threads" and "STA/MTA" mean different things? Does Rogerson's book (1997) no longer reflect COM's threading model?
Why has the naming changed?
Disclaimer: I work for Microsoft.

It seems that the terms are interchangeable:
COM synchronizes calls
COM does not synchronize calls
STA (preferred name)
MTA (preferred name)
"Apartment thread"
Free thread
(often) UI thread
(often) Worker thread
Descriptions and workings of OLE threading models indicates STA == apartment thread and MTA == free thread (even though both new terms use the word "apartment"):
Single-threaded Apartment model (STA): One or more threads in a process use COM and calls to COM objects are synchronized by COM. Interfaces are marshaled between threads. A degenerate case of the single-threaded apartment model, where only one thread in a given process uses COM, is called the single-threading model. Previous have sometimes referred to the STA model simply as the "apartment model."
Multi-threaded Apartment model (MTA): One or more threads use COM and calls to COM objects associated with the MTA are made directly by all threads associated with the MTA without any interposition of system code between caller and object. Because multiple simultaneous clients may be calling objects more or less simultaneously (simultaneously on multi-processor systems), objects must synchronize their internal state by themselves. Interfaces are not marshaled between threads. Previous have sometimes referred to this model as the "free-threaded model."
And as quoted above apartment thread == UI thread and free thread == worker thread according to Rogerson:
COM uses the same two types of threads, although COM has different names for them. Instead of calling one a user-interface thread, COM uses the term apartment thread. The term free thread is used instead of worker thread. [...]
I'd love to know why the terminology changed, though.

Can a singleton object have a (thread-safe) method executing in different threads simultaneously?

If so, how will different threads share the same instance or chunk of memory that represents the object? Will different threads somehow "copy" the single-instance object method code to be run on its own CPU resources?
EDIT - to clarify this question further:
I understand that different threads can be in the "process" of executing a singleton object's method at the same time, while they might not all be actively executing - they may be waiting to be scheduled for execution by the OS and in various states of execution within the method. This question is specifically for multiple active threads that are executing each on a different processor. Can multiple threads be actively executing the same exact code path (same region of memory) at the same time?

Can a singleton object have a (thread-safe) method executing in different threads simultaneously?
Yes, of course. Note that this is no different from multiple threads running a method on the same non-singleton object instance.
how will different threads share the same instance or chunk of memory that represents the object?
(Ignoring NUMA and processor caches), the object exists in only one place in memory (hence, it's a singleton) so the different threads are all reading from the same memory addresses.
Now, if the object is immutable (and doesn't have external side-effects, like IO) then that's okay: multiple threads reading from the same memory and never changing it doesn't introduce any problems.
By analogy, this is like having a single (single-sided) piece of paper on a desk and 10 people reading it simultaneously: no problem.
If the object is mutable or (does have external side-effects like IO) then that's a problem :) You need to synchronize each thread's changes otherwise they'll be overwriting each other - this is bad.
Note that in many languages and platforms, like C#/.NET and C++, documentation can often assert or claim that a method is "thread-safe" but this is not necessarily absolutely guaranteed by the runtime (excepting the case where a function is provably "const-correct" and has no IO, which is something C++ can do, but C# cannot currently do that).
Note that just because a single method is thread-safe, doesn't mean an entire logical operation (e.g. calling multiple object methods in sequence) is thread-safe; for example, consider an Dictionary<K,V>: many threads can simultaneously call TryGetValue without issues - but as soon as one thread calls .Add then that messes up all of the other threads calling TryGetValue (because Dictionary<K,V> does not guarantee that .Add is atomic and blocks TryGetValue, this is why we use ConcurrentDictionary which has a different API - or we wrap our logical business/domain operations in lock (Monitor) statements).
Will different threads somehow "copy" the single-instance object method code to be run on its own CPU resources?
No.
You can make that happen with [ThreadLocal] and [ThreadStatic] (and [AsyncLocal]) but proponents of functional programming techniques (like myself) will strongly recommend against this for many reasons. If you want a thread to "own" something then pass the object on the stack as a parameter and not inside hidden static state.
(Also note this does not apply to value-types (e.g. struct) which are always copied between uses, but I assume you're referring to class object instances, fwiw - as .NET discourages using mutable structs anyway).

Microsoft's Aparment Analogy (STA, MTA): Need help understanding it

I've read lots about the Microsoft's threaded apartment model, but I'm still having a little trouble visualizing it.
Microsoft uses the analogy of living things living in an apartment. So, for STA, consider the following (I know it's a little silly).
Assume thread = person and COMObject = bacteria. The person lives in the apartment, and the bacteria lives inside the person. So in STA-Land, a thread lives in the STA and the COMObject lives inside the thread, so in order to interact with the COMObject, one must do so by running code on the COMObject's thread.
Assume thread = person and COMObject = cat. The person lives in the apartment, and the cat lives in the apartment with the person. SO in STA-Land, the thread and the COMObject at the same hierarchical level.
Q1. Which analogy above is correct, or if neither are correct, how would you describe the STA?
Q2. How would you describe the MTA?

I do not like these analogies. They are confusing.
You create an apartment.
If it is an STA there will be only one thread in the apartment so all the objects in that apartment will be executed on that single thread (so there is no concurrent execution in the objects in that apartment)
If it is an MTA there can be multiple threads in that apartment. So the objects in the MTA need to implement the synchronization explicitly if needed.
An object lives in one apartment. There can be multiple objects in the same apartment.
A very good read here

It is not a great term. It actually describes thread behavior. A thread tells COM how it behaves in the CoInitializeEx() call, selecting between STA and MTA. By using STA, the thread promises that it behaves in a manner that suitable for code that is not thread-safe. The hard promises it makes are:
Never blocks execution
Pumps a message loop
Using MTA means a thread can do whatever it wants and does not make any effort to support code that is not thread-safe.
This matters first when a COM object gets created. Such an object contains a key in the registry that describes what kind of thread-safety it implements. The ThreadingModel key. By far the most common value for this key is "Apartment" (or is missing), telling COM that it doesn't support threading at all and that any calls on the object must be made from the same thread.
If the thread that creates such an object is in an STA then everything is happy. After all, the thread promised to support single threaded objects. If the thread is in the MTA then there's a problem, the thread said it didn't support thread-safety but still created an object that isn't thread-safe. COM steps in an creates a new thread, an STA thread that can support code that isn't thread safe. The code gets a proxy to the object. Any calls made on the object go through that proxy. The proxy code intercepts the call and makes it run on the STA thread that was created, thus ensuring the call is made in a thread-safe way.
As you can imagine, the job done by the proxy isn't cheap. It involves two thread context switches and a stack frame must be constructed from the function arguments to make the call. It must also wait until the thread is ready to execute the call. This is called marshaling, it is an easy 3 orders of magnitude slower than making a call that doesn't have to be marshaled. This perhaps also explains the reason an STA thread has those two requirements listed above. It cannot block because as long as it blocks that marshaled call cannot be made and makes deadlock very likely. And it must pump a message loop, that loop is what makes injecting a call into another thread possible.
So making a thread join the MTA is easy programming for you. But deadly to performance. An STA is efficient.

Questions about COM multithreading and STA / MTA

Hi I am a beginner in COM. I want to test a COM dll in both STA and MTA modes. My first question is: is it possible a COM object supports both STA and MTA?
Now I imagine the STA code snippet below:
// this is the main thread
m_IFoo;
CoInitializeEx(STA); // initialize COM in main thread
CreateInstance(m_IFoo);
m_IFoo->Bar();
CreateThread(ThreadA);
// start ThreadA
// this is secondary thread
ThreadA()
{
CoInitializeEx(STA);
m_IFoo->Buz(); // call m_IFoo's method directly
}
Will this code work? Am I missing any fundamental things? I know the main thread needs a window message loop to let calls from other threads be executed. Do I have to do anything about it?
Now I move on to test MTA. If I merely replace "STA" with "MTA" in the above code, will it work？
Another question is: As a thread with GUI must be STA, I cannot initialize and test MTA in a GUI thread?
Thanks in advance and sorry for me being naive on COM and threading.

Your code is not legal COM, because you are passing a pointer directly from one STA to another, which COM doesn't allow.
In COM, interface pointers have "apartment affinity", they can only be used within an apartment. To pass a pointer from one STA to another, or between STA and MTA, you have to 'marshal' the pointer to a safe representation, which is then unmarshaled by the receiving thread.
The simplest way to do this is using the Global Interface Table; you register the interface with it in one thread and get back a DWORD, which you then use in the other thread to get back a version of the interface that the other thread can use.
If both threads are MTA, you can avoid doing this. While STA are one-per-thread - each STA thread has its own aparment - the MTA is shared by all MTA threads. This means that MTA threads can pass COM pointers between themselves freely. (But they still need to marshal if passing pointers to or from STA threads.)
Generally speaking, you don't change code between STA or MTA, you usually decide this once at the outset. If the thread has UI, then it needs a message loop, and is usually STA. If there's no UI, you may decide to use MTA. But once you make that decision and write your code, it's rare to change to the other later, since picking one or the other has different requirements and assumptions that affect the code; change from STA to MTA or vice versa and you'd have to carefully review the code and see if things like pointer assignments needed to be changed.

Being able to switch from "MTA" to "STA" and consequences of such switch will depend on how the object is registered in system registry. In order for the object to "support" both cases without marshalling it has to have ThreadingModel set to Both.
Please see this great answer - Both means "either Free or Apartment depending on how the caller initializes COM". That's exactly what you want.
As to using the "STA" mode - yes, the tread object belongs to will have to run the message loop by calling GetMessage(), TranslateMesage() and DispatchMessage() in a loop. Anyway the objects methods won't be called directly from the second thread - they will go through the proxy. Please see this very good article for thorough explanation.

Difference between "free-threaded" and "thread-safe"

Sometimes I see the term "free-threaded" to describe a class or a method. It seems to have a similar or identical meaning to "thread-safe".
Is there a difference between the two terms?

There may well be other things meant in other contexts, but in cases I've worked with in the past, "free threaded" means it works, or at least can work, across different threads without any marshalling between apartments.
Apartment-threading in contrast blocks off different "apartments" with separate copies of "global" data (which hence isn't really global, when you think about it) and either allows only one thread to operate in the apartment, or allows several but which will still be separate from those using other apartments.
Now, because the apartment model offers some thread-safety of its own, some (but not all) concerns about thread-safety go away. A piece of code that is designed to operate in an apartment model will be thread-safe, but some or all of that thread-safety is coming from the apartment model.
A free threaded piece of code will have to provide full guarantees of whatever degree of thread-safety it is claiming itself.
Which means it pretty much does mean the same thing as thread-safe, for any intents and purposes where you don't also have to consider the thread-safety of the use of apartment-model code.

I've just done some research on what "free-threaded" might mean, and I also ended up with COM. Let me first cite two passages from the 1998 edition of Don Box' book Essential COM. (The book actually contains more sections about the free-threaded model, but I'll leave it at that for now.)
A thread executes in exactly one apartment at a time. Before a thread can use COM, it must first enter an apartment. […] COM defines two types of apartments: multithreaded apartments (MTAs) and singlethreaded apartments (STAs). Each process has at most one MTA; however, a process can contain multiple STAs. As their names imply, multiple threads can execute in an MTA concurrently, whereas only one thread can execute in an STA. […]
— from pages 200-201. (Emphasis added by me.)
Each CLSID in a DLL can have its own distinct ThreadingModel. […]
ThreadingModel="Both" indicates that the class can execute in either an MTA or an STA.
ThreadingModel="Free" indicates that the class can execute only in an MTA.
ThreadingModel="Apartment" indicates that the class can execute only in an STA.
The absence of a ThreadingModel value implies that the class can run only on the main STA. The main STA is defined as the first STA to be initialized in the process.
— from page 204. (Formatting and emphasis added by me.)
I take this to mean that a component (class) who is declared as free-threaded runs in an MTA, where concurrency of several threads is possible and calls to the component from different threads is explicitly allowed; ie. a free-threaded component supports a multithreaded environment. It would obviously have to be thread-safe in order to achieve this.
The opposite would be a component that is designed for an STA, ie. only allows calls from one particular thread. Such a class would not have to be thread-safe (because COM will take care that no other thread than the one who "entered" / set up the STA can use the component in the first place, that is, COM protects the component from concurrent access).
Conclusion: COM's term "free-threaded" essentially seems to have the same implications as the more general term "thread-safe".
P.S.: This answer assumes that "thread-safe" basically means something like, "can deal with concurrent accesses (possibly by different threads)."
P.P.S.: I do wonder whether "free-threaded" is the opposite of "having thread affinity".

“free-threaded” and “thread-safe” have different meaning. For example, ASP.NET can use "application state" to share data between different web sessions and users of a application (aka IIS virtual directory and all subdirectory). Microsoft Help said:
https://learn.microsoft.com/en-us/previous-versions/aspnet/ms178594(v=vs.100)
Application state is free-threaded, which means that application state
data can be accessed simultaneously by many threads. Therefore, it is
important to ensure that when you update application state data, you
do so in a thread-safe manner by including built-in synchronization
support. You can use the Lock and UnLock methods to ensure data
integrity by locking the data for writing by only one source at a
time.
Therefore, I understand the "application state" can be read simultaneously; but to update its contents, your source codes of ASP.NET must use lock and unlock operations around.
It can extend the understand to COM domain. “free-threaded” for MTA apartment means threads can read COM data simultaneously；but to update COM data in MTA, your source codes should use synchronization manner by yourself. In the other hand, STA is “thread-safe” by COM itself. Your program can access STA data diretly and easily.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string