Microsoft's Aparment Analogy (STA, MTA): Need help understanding it - multithreading

I've read lots about the Microsoft's threaded apartment model, but I'm still having a little trouble visualizing it.
Microsoft uses the analogy of living things living in an apartment. So, for STA, consider the following (I know it's a little silly).
Assume thread = person and COMObject = bacteria. The person lives in the apartment, and the bacteria lives inside the person. So in STA-Land, a thread lives in the STA and the COMObject lives inside the thread, so in order to interact with the COMObject, one must do so by running code on the COMObject's thread.
Assume thread = person and COMObject = cat. The person lives in the apartment, and the cat lives in the apartment with the person. SO in STA-Land, the thread and the COMObject at the same hierarchical level.
Q1. Which analogy above is correct, or if neither are correct, how would you describe the STA?
Q2. How would you describe the MTA?

I do not like these analogies. They are confusing.
You create an apartment.
If it is an STA there will be only one thread in the apartment so all the objects in that apartment will be executed on that single thread (so there is no concurrent execution in the objects in that apartment)
If it is an MTA there can be multiple threads in that apartment. So the objects in the MTA need to implement the synchronization explicitly if needed.
An object lives in one apartment. There can be multiple objects in the same apartment.
A very good read here

It is not a great term. It actually describes thread behavior. A thread tells COM how it behaves in the CoInitializeEx() call, selecting between STA and MTA. By using STA, the thread promises that it behaves in a manner that suitable for code that is not thread-safe. The hard promises it makes are:
Never blocks execution
Pumps a message loop
Using MTA means a thread can do whatever it wants and does not make any effort to support code that is not thread-safe.
This matters first when a COM object gets created. Such an object contains a key in the registry that describes what kind of thread-safety it implements. The ThreadingModel key. By far the most common value for this key is "Apartment" (or is missing), telling COM that it doesn't support threading at all and that any calls on the object must be made from the same thread.
If the thread that creates such an object is in an STA then everything is happy. After all, the thread promised to support single threaded objects. If the thread is in the MTA then there's a problem, the thread said it didn't support thread-safety but still created an object that isn't thread-safe. COM steps in an creates a new thread, an STA thread that can support code that isn't thread safe. The code gets a proxy to the object. Any calls made on the object go through that proxy. The proxy code intercepts the call and makes it run on the STA thread that was created, thus ensuring the call is made in a thread-safe way.
As you can imagine, the job done by the proxy isn't cheap. It involves two thread context switches and a stack frame must be constructed from the function arguments to make the call. It must also wait until the thread is ready to execute the call. This is called marshaling, it is an easy 3 orders of magnitude slower than making a call that doesn't have to be marshaled. This perhaps also explains the reason an STA thread has those two requirements listed above. It cannot block because as long as it blocks that marshaled call cannot be made and makes deadlock very likely. And it must pump a message loop, that loop is what makes injecting a call into another thread possible.
So making a thread join the MTA is easy programming for you. But deadly to performance. An STA is efficient.

Related

FreeThreadedDOMDocument, Neutral Apartments and Free-Threaded Marshaler

As MSDN states:
If you are writing a single threaded application (or a multi-threaded application where only one thread accesses a DOM at one time), use the rental threaded model (Msxml2.DOMDocument.3.0 or Msxml2.DOMDocument.6.0). If you are writing an application where multiple threads access will simultaneously access a DOM, use the free threaded model (Msxml2.FreeThreadedDOMDocument.3.0 or Msxml2.FreeThreadedDOMDocument.6.0).
Is there any connection between FreeThreadedDOMDocument, neutral apartments and free-threaded marshaler? I looked in OleView and found that FreeThreadedDOMDocument threading model is Both. As far as I understand neutral apartment objects are supported with a free-threaded marshaler. Does it mean that FreeThreadedDOMDocument doesn't use a free-threaded marshaler and it is called a bit confusing as free-threaded?
What is the implementation difference between COM classes that marked as Free, Both or Neutral? As far as I understand they all must be thread-safe, why is the difference? Is it correct that Neutral should support a free-threaded marshaler?
There are multiple questions here.
TL;DR:
Neutral objects:
Incur a bit less in-process marshaling than STA and MTA object
Avoid thread switching
Interface pointers are automatically marshaled
Neutral apartment lives as long as there's a neutral object alive
Must be ready to run under any kind of thread, using COM utility functions for waiting or choosing the Win32 wait function to use depending on the thread type
Free threaded objects:
Incur practically no in-process marshaling
Avoid thread switching
Interface pointers are not automatically marshaled
Lifetime is tied to the activation apartment
Must be ready to run under any kind of thread, using COM utility functions for waiting or choosing the Win32 wait function to use depending on the thread type
Is there any connection between FreeThreadedDOMDocument, neutral apartments and free-threaded marshaler?
TL;DR: The FreeThreadedDOMDocument's threading model is "Both", so it's tied to the apartment where it's activated (created). It aggregates the free threaded marshaler, so it's a free threaded object.
The FreeThreadedDOMDocument is a COM class whose objects aggregate the free threaded marshaler. What this marshaler does is to provide a raw pointer whenever marshaling in-process (i.e. IMarshal::MarshalInterface with dwDestContext set to MSHCTX_INPROC.
I'll use definition of a free threaded object as an object which aggregates the free threaded marshaler.
A free threaded object's threading model should be specified as "Neutral", or "Both" before Windows 2000, so it can be created and used in any thread, avoiding context switches.
If its threading model is specified as "Both", the object's lifetime is tied to the apartment where it was created. For instance, if an STA thread terminates, all free threaded objects created within that apartment are either destroyed or no longer valid.
As far as I understand neutral apartment objects are supported with a free-threaded marshaler.
No, proxies to neutral objects are a bit lighter than other in-process proxies as it only sets up a COM context, but it never incurs in full marshaling and it avoids thread switching.
Does it mean that FreeThreadedDOMDocument doesn't use a free-threaded marshaler and it is called a bit confusing as free-threaded?
No, the FreeThreadedDOMDocument does use the free threaded marshaler.
Historically, there were already free threaded objects before Microsoft provided its own support for them (due to popularity, and probably because most free threaded marshalers out there were flaky), and the Neutral apartment appeared only in Windows 2000.
As such, instances of FreeThreadedDOMDocument are free threaded because they aggregate the free threaded marshaler, and the lifetime of each instance is tied to the apartment where it was created. Usually, there's little impact, but with e.g. a thread-pool of STA threads, the effect is observed more often, because STAs come and go as the owning threads terminate (either normally or to reclaim resources) and get created. For instance, classic ASP uses STA threads by default.
PS: I've mentioned the following subject in another answer, but I believe the content is a bit different as the questions are different too.
Here's the current threading model values:
None: use the main STA
"Apartment": use any STA, i.e. if the current apartment is STA or NA over STA, use the current STA, otherwise use the host STA (more on this later)
"Free": use the MTA
"Both": use the current apartment
"Neutral": use the NA
For any apartment that doesn't exist, COM creates it if needed.
There are several peculiarities here:
To use the main STA, you must not mention any threading model, instead of something more sensible, like "Main"
All names except "Neutral" make no sense nowadays:
"Apartment" feels like the current apartment, but it's not
"Free" feels like free threaded objects, but it's not
"Both" makes you think there are only 2 apartment types, but there are 3: STA, MTA and NA
Actually, since Windows 8, there's ASTA, a variation of STA that is created for the GUI which, during outgoing calls, discards incoming calls that are not related, thus avoiding a great source of reentrancy bugs
You can make a regular STA behave like this with a message filter
The main STA is the first created STA. It only matters for classes with an unspecified threading model.
There can be several STAs, but there are at most one MTA and one NA.
While there is an active MTA, any thread not initialized for COM is implicitly in the MTA if it doesn't call CoInitializeEx(NULL, COINIT_MULTITHREADED), but it also doesn't affect the MTA's lifetime at all, meaning the MTA may be destroyed while the thread is using it. Since this is scarcely documented and pretty much unreliable, you shouldn't rely on this.
Implicitly created apartments are called host STA and host MTA. You have no control over them (unless by cheating with CoUninitialize while in that apartment; note: don't actually do this). In fact, if you activate "Apartment" objects outside of an STA or outside of an NA running over an STA, it'll be activated in a host STA. For further confusion, this might also be the main STA if the host STA was the first STA to be initialized.
All COM threads that support host apartments are background threads, so they don't keep your application from exiting.
You have no control whatsoever over the NA, other than creating it when activating a neutral object. You cannot enter it directly, but you can create your own neutral object with a method that runs a callback in the context of the neutral apartment. This callback could be a free threaded object.
What is the implementation difference between COM classes that marked as Free, Both or Neutral?
COM classes with apartment declared as "Free" will result in objects that belong to the MTA. Such objects may assume that the threads they run in don't have to pump window messages. Essentially, they may block.
Free threaded objects and neutral objects must be prepared to run under any apartment. For free threaded objects, it should be obvious why: it bypasses any context marshaling, so methods execute in any thread. For neutral objects, there's the distinction of which kind of apartment was active (through CoGetApartmentType).
In either case, you should use COM's utility functions, like CoWaitForMultipleHandles instead of WaitForMultipleHandles[Ex], which blocks and is unacceptable in STAs, or MsgWaitForMultipleHandles[Ex], which accesses the window message queue, probably creating it implicitly, and is usually unacceptable in MTAs.
You can check the apartment type by yourself and choose to use the proper Win32 waiting functions or to use a polling strategy which waits and pumps messages with timeouts in the STA, in case you're waiting on something other than handles or if you require a specific waiting logic.
The most striking difference between free threaded objects and neutral objects is marshaling of other COM objects.
While using neutral objects, incoming and outgoing interface pointers are automatically marshaled. For instance, you can store incoming interface pointers in fields.
While using free threaded objects, incoming and outgoing interface pointers are not marshaled at all, meaning either you get raw pointers to objects in the same apartment, or you get proxies to objects in other apartments. These proxies are tied to the current apartment too.
For instance, an incoming raw pointer means you're getting an object that belongs to the current apartment, so you'll have to marshal it if you intend to store a reference to the object.
An incoming proxy means you're getting a proxy to an object in another apartment, but this proxy is tied to the current apartment. You can't store this proxy either. Specifically, notwithstanding standard proxy/stubs' apartment verification, STA proxies may have thread affinity. You must marshal it as well. But don't worry, marshaling a proxy will not stack marshaling; when you again unmarshal, you'll get a proxy to the object, not a proxy to a proxy.
When a free threaded object must store an interface pointer, it must always do so through manual marshaling, and when it must call methods from this interface pointer, it must do so through manually unmarshaling.
Usually, the Global Interface Table (GIT; another misleading name, it's actually an in-process table) is used for this purpose.
As far as I understand they all must be thread-safe, why is the difference?
Regarding thread-safety, there's no difference.
But as I explained in the previous question, there's a huge difference when storing interface pointers, and a subtle difference regarding object activation and lifetime.
Is it correct that Neutral should support a free-threaded marshaler?
The free threaded marshaler effectively ignores the apartment, so it's the methods' responsibility to behave, synchronize and/or lock correctly. So, neither apartment must support the free threaded marshaler, it's the free threaded object that must support every apartment.
It's possible to aggregate the free threaded marshaler in objects with any threading model, including "Neutral".
If you find that the context setup by the neutral apartment marshaler is somehow a bottleneck, then you may consider using the free threaded marshaler, at the cost of manually marshaling stored interface pointers. If not, just use the neutral apartment.

Questions about COM multithreading and STA / MTA

Hi I am a beginner in COM. I want to test a COM dll in both STA and MTA modes. My first question is: is it possible a COM object supports both STA and MTA?
Now I imagine the STA code snippet below:
// this is the main thread
m_IFoo;
CoInitializeEx(STA); // initialize COM in main thread
CreateInstance(m_IFoo);
m_IFoo->Bar();
CreateThread(ThreadA);
// start ThreadA
// this is secondary thread
ThreadA()
{
CoInitializeEx(STA);
m_IFoo->Buz(); // call m_IFoo's method directly
}
Will this code work? Am I missing any fundamental things? I know the main thread needs a window message loop to let calls from other threads be executed. Do I have to do anything about it?
Now I move on to test MTA. If I merely replace "STA" with "MTA" in the above code, will it work?
Another question is: As a thread with GUI must be STA, I cannot initialize and test MTA in a GUI thread?
Thanks in advance and sorry for me being naive on COM and threading.
Your code is not legal COM, because you are passing a pointer directly from one STA to another, which COM doesn't allow.
In COM, interface pointers have "apartment affinity", they can only be used within an apartment. To pass a pointer from one STA to another, or between STA and MTA, you have to 'marshal' the pointer to a safe representation, which is then unmarshaled by the receiving thread.
The simplest way to do this is using the Global Interface Table; you register the interface with it in one thread and get back a DWORD, which you then use in the other thread to get back a version of the interface that the other thread can use.
If both threads are MTA, you can avoid doing this. While STA are one-per-thread - each STA thread has its own aparment - the MTA is shared by all MTA threads. This means that MTA threads can pass COM pointers between themselves freely. (But they still need to marshal if passing pointers to or from STA threads.)
Generally speaking, you don't change code between STA or MTA, you usually decide this once at the outset. If the thread has UI, then it needs a message loop, and is usually STA. If there's no UI, you may decide to use MTA. But once you make that decision and write your code, it's rare to change to the other later, since picking one or the other has different requirements and assumptions that affect the code; change from STA to MTA or vice versa and you'd have to carefully review the code and see if things like pointer assignments needed to be changed.
Being able to switch from "MTA" to "STA" and consequences of such switch will depend on how the object is registered in system registry. In order for the object to "support" both cases without marshalling it has to have ThreadingModel set to Both.
Please see this great answer - Both means "either Free or Apartment depending on how the caller initializes COM". That's exactly what you want.
As to using the "STA" mode - yes, the tread object belongs to will have to run the message loop by calling GetMessage(), TranslateMesage() and DispatchMessage() in a loop. Anyway the objects methods won't be called directly from the second thread - they will go through the proxy. Please see this very good article for thorough explanation.

Difference between "free-threaded" and "thread-safe"

Sometimes I see the term "free-threaded" to describe a class or a method. It seems to have a similar or identical meaning to "thread-safe".
Is there a difference between the two terms?
There may well be other things meant in other contexts, but in cases I've worked with in the past, "free threaded" means it works, or at least can work, across different threads without any marshalling between apartments.
Apartment-threading in contrast blocks off different "apartments" with separate copies of "global" data (which hence isn't really global, when you think about it) and either allows only one thread to operate in the apartment, or allows several but which will still be separate from those using other apartments.
Now, because the apartment model offers some thread-safety of its own, some (but not all) concerns about thread-safety go away. A piece of code that is designed to operate in an apartment model will be thread-safe, but some or all of that thread-safety is coming from the apartment model.
A free threaded piece of code will have to provide full guarantees of whatever degree of thread-safety it is claiming itself.
Which means it pretty much does mean the same thing as thread-safe, for any intents and purposes where you don't also have to consider the thread-safety of the use of apartment-model code.
I've just done some research on what "free-threaded" might mean, and I also ended up with COM. Let me first cite two passages from the 1998 edition of Don Box' book Essential COM. (The book actually contains more sections about the free-threaded model, but I'll leave it at that for now.)
A thread executes in exactly one apartment at a time. Before a thread can use COM, it must first enter an apartment. […] COM defines two types of apartments: multithreaded apartments (MTAs) and singlethreaded apartments (STAs). Each process has at most one MTA; however, a process can contain multiple STAs. As their names imply, multiple threads can execute in an MTA concurrently, whereas only one thread can execute in an STA. […]
— from pages 200-201. (Emphasis added by me.)
Each CLSID in a DLL can have its own distinct ThreadingModel. […]
ThreadingModel="Both" indicates that the class can execute in either an MTA or an STA.
ThreadingModel="Free" indicates that the class can execute only in an MTA.
ThreadingModel="Apartment" indicates that the class can execute only in an STA.
The absence of a ThreadingModel value implies that the class can run only on the main STA. The main STA is defined as the first STA to be initialized in the process.
— from page 204. (Formatting and emphasis added by me.)
I take this to mean that a component (class) who is declared as free-threaded runs in an MTA, where concurrency of several threads is possible and calls to the component from different threads is explicitly allowed; ie. a free-threaded component supports a multithreaded environment. It would obviously have to be thread-safe in order to achieve this.
The opposite would be a component that is designed for an STA, ie. only allows calls from one particular thread. Such a class would not have to be thread-safe (because COM will take care that no other thread than the one who "entered" / set up the STA can use the component in the first place, that is, COM protects the component from concurrent access).
Conclusion: COM's term "free-threaded" essentially seems to have the same implications as the more general term "thread-safe".
P.S.: This answer assumes that "thread-safe" basically means something like, "can deal with concurrent accesses (possibly by different threads)."
P.P.S.: I do wonder whether "free-threaded" is the opposite of "having thread affinity".
“free-threaded” and “thread-safe” have different meaning. For example, ASP.NET can use "application state" to share data between different web sessions and users of a application (aka IIS virtual directory and all subdirectory). Microsoft Help said:
https://learn.microsoft.com/en-us/previous-versions/aspnet/ms178594(v=vs.100)
Application state is free-threaded, which means that application state
data can be accessed simultaneously by many threads. Therefore, it is
important to ensure that when you update application state data, you
do so in a thread-safe manner by including built-in synchronization
support. You can use the Lock and UnLock methods to ensure data
integrity by locking the data for writing by only one source at a
time.
Therefore, I understand the "application state" can be read simultaneously; but to update its contents, your source codes of ASP.NET must use lock and unlock operations around.
It can extend the understand to COM domain. “free-threaded” for MTA apartment means threads can read COM data simultaneously;but to update COM data in MTA, your source codes should use synchronization manner by yourself. In the other hand, STA is “thread-safe” by COM itself. Your program can access STA data diretly and easily.

VC++ thread marshalling and COM : The application called an interface that was marshalled for a different thread

My VC++ 2005 Dialog based application initializes a COM object in the dialog class and uses it in the worker thread.
I called CoInitialize(NULL) At the start of the application and the at the start of the worker thread. But when a COM method is called the error "The application called an interface that was marshalled for a different thread" follows.
If I use CoInitializeEx(0,COINIT_MULTITHREADED) then I will get the same error message
Please help me in finding the root cause.
Thanks.
You created two single-threaded apartments by calling CoInitialize(NULL). An interface pointer must be marshaled from one apartment to the other before it is usable. Initializing the worker thread as MTA doesn't solve the problem. The original interface pointer was still created in a single-threaded apartment and is thus not thread-safe. In other words, you cannot call the interface methods directly from a thread. Those calls have to be marshaled to the thread that created the interface. Marshaling the interface pointer sets up the plumbing that makes that possible.
The only time you don't have to marshal is when both threads are MTA. That's almost never possible, your main thread must be STA if it creates any windows. And the COM server would actually have to be thread-safe, they very rarely are. They advertise what they need with the ThreadingModel key in the registry. COM will actually create an STA thread if necessary to find a good home for the server.
You must marshal the pointer with CoMarshalInterThreadInterfaceInStream() to avoid the error. That's a fairly unfriendly function, IGlobalInterfaceTable is easier to use. The COM server also has to support it, you typically need a proxy/stub DLL that takes care of the marshaling. You'll get E_NOINTERFACE if it doesn't.
Also beware the overhead, marshaling a call from the worker thread to the main thread is pretty expensive and subject to how responsive your main thread is. In other words, if you wrote the thread to speed up your program or to avoid blocking the user interface then this won't actually work. It is the 'there is no free lunch' principle.
Probably CoMarshalInterface() and CoUnMarshalInterface() are the simplest way to do this.
Look at http://support.microsoft.com/kb/206076. You can download the code example and find different implementations of your requirements in Client.cpp.
I think one of the ways to access COM objects inside another thread would be to use Global Interface Pointers. After initialization,form the GIT pointer to the thread along with dwCookie value. Then inside the thread reinterpret-cast the pointer as a DWORD and pass it to the GI table to get our COM pointer.
Thanks

Has anyone seen a programming language that handles threads like this?

Most of the multithreaded work I have done has been in C/C++, Python, or Delphi (Object Pascal). All on Windows. I'll use Delphi for my discussion here. Delphi has a nice class called TThread which abstracts the thread creation process. The class provides an Execute method which is the created thread's thread function. You override that method and typically create a loop within it that exits when the thread is terminated. You do the thread's work inside the loop.
One of the recurring tasks that crops up is keeping track (carefully) of which code gets executed in the thread's context and which code gets executed by external threads in an external context, with synchronization objects guarding data shared by the threads. All basic thread programming stuff. One of the recurring annoyances is creating functions to allow external threads to submit or retrieve data, and moving data from public thread-safe memory objects to those private to the thread and back the other way.
I was wondering if anyone has ever seen a programming language that makes this simpler? Here's what I would love as a programming thread idiom. Let's take a Delphi TThread sub-class created for this discussion. Suppose I could mark class methods with one of three keywords like Private, PublicExecuteInAnyContext, or PublicExecuteInPrivateContext. Here's how they would work.
Private: Private methods would only execute in the thread's context. The compiler would automatically add code that would raise an exception if a code path led to that method being executed in a context outside of the host thread. (E.g. - "Error, attempt to execute method private to thread $AEB from thread $EE0").
PublicExecuteInAnyContext: methods marked as such could be called by the thread that owns the method and any external thread. Any data objects referenced in these methods would automatically be guarded with synchronization objects, with the option to override the default choice and supply your own. (Mutex or semaphore instead of Critical Section, etc.)
PublicExecuteInPrivateContext: Methods marked with this keyword would execute in the thread's context, but are callable by any thread. This option would allow for two strategies for dealing with calls to such methods by external threads:
1) Mode 1 - Block calling thread: the calling thread would block until the method returned. In other words the compiler would automatically write code to make the calling thread would block. Any parameters passed by the calling thread would be copied into variables private to the host thread. The method would not execute until the host thread received control. When the host thread exited the method the calling thread would be released and would have any results returned by the method copied into it's own private variable space.
2) Mode 2 - Do not block the calling thread: this would allow for the additional argument of a callback function. Any parameters passed to the method by an external thread would be copied into the thread's private variable space. The compiler would again hold the execution of the method until the host thread received control, but would let the calling thread continue without blocking. When the host thread completed execution of the method, if a callback function was specified, it would call that function ** but in the context of the original thread that called the method, not in the host thread context **.
A language that would provide this kind of automatic thread handling would, at least to me, be a lot more fun to do multithreading with then the way I have to do it now. Has anybody seen a programming language, or a module/hack for one of the mainstream languages, that provides this kind of multi-threading model?
I don't think it matches your description exactly, but Erlang provides one of the easiest concurrency models I've seen. It uses a share-nothing approach, which may or may not sound appealing to you, but I found it really interesting. It's also really easy to create distributed systems with it, if you ever have the need. Check out "Getting Started with Erlang", it's a great tutorial that covers almost all parts of the language.
Wanted to mention. LabVIEW really makes starting with multi threading a breeze. I just found some article on the Internet which actually explains this in a better way:
If you have programmed in a
traditional, textual language before,
the data-flow paradigm of Labview can
be somewhat hard to embrace. The
data-flow paradigm stipulates that it
does not matter where on the 2D
surface of the block diagram a
particular component is placed in
relation to other components, but what
other components it is wired to. A
particular component-node does not
execute until all its inputs are
available; it executes, however, as
soon as all its inputs are available,
regardless of what else might be
executing at the same time, that is -
in parallel with potentially many
other things.
You need extensive experience with
multi-threading programming in a
traditional language, however, to
truly appreciate how easy it is to
achieve multi-threading with LabVIEW
and how naturally the parallelism
arises - whether intentionally, or
not. While LabVIEW does not magically
dissolve all the challenges related to
parallel processing, it certainly
makes it considerably simpler to get
started with it. In fact, in contrast
with the traditionally sequential
textual programming languages, its is
the sequential execution that comes at
a price in LabVIEW - parallelism is
almost free.
Taken from http://saberrobotics.org/?id=34

Resources