Delphi Win API CreateTimerQueueTimer threads and thread safe FormatDateTime crashes - multithreading

This is a bit of a long question, but here we go. There is a version of FormatDateTime that is said to be thread safe in that you use
GetLocaleFormatSettings(3081, FormatSettings);
to get a value and then you can use it like so;
FormatDateTime('yyyy', 0, FormatSettings);
Now imagine two timers, one using TTimer (interval say 1000ms) and then another timer created like so (10ms interval);
CreateTimerQueueTimer
(
FQueueTimer,
0,
TimerCallback,
nil,
10,
10,
WT_EXECUTEINTIMERTHREAD
);
Now the gnarly bit, if in the callback and also the timer event you have the following code;
for i := 1 to 10000 do
begin
FormatDateTime('yyyy', 0, FormatSettings);
end;
Note there is no assignment. This produces access violations, sometimes almost immediately, sometimes 20 minutes later, at random places. Now if you write that code in C++Builder it never crashes. The header conversion we are using is the JEDI JwaXXXX ones. Even if we put locks in the Delphi version around the code, it only delays the inevitable. We've looked at the original C header files and it all looks good. Is there some different way that C++ uses the Delphi runtime? The thread safe version of FormatDateTime looks to be re-entrant. Any ideas or thoughts from anyone who may have seen this before?
UPDATE:
To narrow this down a bit, FormatSettings is passed in as a const, so does it matter if they use the same copy (as it turns out, passing a local version within the function call yields the same problem)? Also the version of FormatDateTime that takes the FormatSettings doesn't call GetThreadLocale, because it already has the Locale information in the FormatSettings structure (I double checked by stepping through the code).
I made mention of no assignment to make it clear that no shared storage is being accessed, so no locking is required.
WT_EXECUTEINTIMERTHREAD is used to simplify the problem. I was under the impression you should only use it for very short tasks because it may mean it'll miss the next interval if it is running something long?
If you use a plain old TThread the problem doesn't occur. What I am getting at here I suppose is that using a TThread or a TTimer works but using a thread created outside the VCL doesn't, that's why I asked if there was a difference in the way C++ Builder uses the VCL/Delphi RTL.
As an aside, this code, as mentioned before, also fails after a while (it just takes longer):
CS := TCriticalSection.Create;
CS.Acquire;
for i := 1 to LoopCount do
begin
FormatDateTime('yyyy', 0, FormatSettings);
end;
CS.Release;
And now for the bit I really don't understand, I wrote this as suggested;
function ReturnAString: string;
begin
Result := 'Test';
UniqueString(Result);
end;
and then inside each type of timer the code is;
for i := 1 to 10000 do
begin
ReturnAString;
end;
This causes the same kinds of failures. As I said before, the fault is never in the same place inside the CPU window etc. Sometimes it's an access violation and sometimes it might be an invalid pointer operation. I am using Delphi 2009 btw.
UPDATE 2:
Roddy (below) points out that the OnTimer event (and unfortunately also Winsock, i.e. TClientSocket) uses the Windows message pump (as an aside, it would be nice to have some nice Winsock2 components using IOCP and Overlapping IO), hence the push to get away from it. However, does anyone know how to see what sort of thread local storage is set up on the threads created by CreateTimerQueueTimer?
Thanks for taking the time to think and answer this problem.

I am not sure if it is good form to post an "Answer" to my own question but it seemed logical, let me know if that is uncool.
I think I have found the problem. The thread local storage idea led me to follow a bunch of leads and I found this magical line;
IsMultiThread := True;
From the help;
"IsMultiThread is set to true to indicate that the memory manager should support multiple threads. IsMultiThread is set to true by BeginThread and class factories."
This of course is not set when you only have the single main VCL thread using a TTimer; however, it is set for you when you use TThread. If I set it manually the problem goes away.
In C++Builder, I do not use a TThread but it appears by using the following code;
if (IsMultiThread) {
ShowMessage("IsMultiThread is True!");
}
that it is set for you somewhere automatically.
I am really glad for people's input so that I could find this, and I hope (vainly) it might help someone else.
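To illustrate why a single flag can matter this much, here is a toy model in portable C++. It is not Delphi RTL code: ToyHeap, is_multi_thread and hammer are made-up names, and the only assumption is the one the help text implies, namely that the memory manager skips its lock while the flag is false.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Toy stand-in for a memory manager that only locks when it has been
// told that more than one thread exists. All names are hypothetical.
class ToyHeap {
public:
    bool is_multi_thread = false;   // plays the role of IsMultiThread

    void allocate() {
        if (is_multi_thread) {
            std::lock_guard<std::mutex> guard(lock_);
            ++blocks_;              // bookkeeping protected by the lock
        } else {
            ++blocks_;              // fast path: only safe with one thread
        }
    }

    long blocks() const { return blocks_; }

private:
    std::mutex lock_;
    long blocks_ = 0;
};

// Sets the flag first (as creating a TThread would), then runs the workers.
long hammer(int threads, int allocations_per_thread) {
    ToyHeap heap;
    heap.is_multi_thread = true;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&heap, allocations_per_thread] {
            for (int i = 0; i < allocations_per_thread; ++i) heap.allocate();
        });
    }
    for (auto& w : workers) w.join();
    return heap.blocks();
}
```

With the flag set, hammer(4, 10000) accounts for every allocation. A thread created behind the runtime's back never flips the flag, so in this model it would run on the unlocked path, which is the kind of random corruption described in the question.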

As DateTimeToString which FormatDateTime calls uses GetThreadLocale, you may wish to try having a local FormatSettings variable for each thread, maybe even setting up FormatSettings in a local variable before the loop.
It may also be the WT_EXECUTEINTIMERTHREAD parameter which causes this. Note that it states it should only be used for very short tasks.
If the problem persists the problem may actually be elsewhere, which was my first hunch when I saw this but I don't have enough information about what the code does to really determine that.
Details about where the access violation occurs may help.

Are you sure this actually has anything to do with FormatDateTime? You made a point of mentioning that there is no assignment statement there; is that an important aspect of your question? What happens if you call some other string-returning function instead? (Make sure it's not a constant string. Write your own function that calls UniqueString(Result) before returning.)
Is the FormatSettings variable thread-specific? That's the point of having the extra parameter for FormatDateTime, so each thread has its own private copy that is guaranteed not to be modified by any other thread while the function is active.
Is the timer queue important to this question? Or do you get the same results when you use a plain old TThread and run your loop in the Execute method?
You did warn that it was a long question, but it seems there are a few things you could do to make it smaller, to narrow down the scope of the issue.

I wonder if the RTL/VCL calls you're making are expecting access to some thread-local storage (TLS) variables that aren't correctly set up when you invoke your code via the timer queue?
This isn't the answer to your problem, but are you aware that TTimer OnTimer events just run as part of the normal message loop in the main VCL thread?

You found your answer: IsMultiThread. It has to be set any time you bypass TThread and create threads through the API directly. From MSDN: CreateTimerQueueTimer creates a thread pool to handle this functionality, so you have an outside thread working alongside the main VCL thread with no protection. (Note: your CS.Acquire/Release doesn't do anything at all unless other parts of the code respect this lock.)
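The point about the lock can be shown with a small portable C++ sketch (std::mutex standing in for TCriticalSection; LockProbe and probe_locks are names made up for this example). It assumes the question's snippet is taken literally, i.e. that the critical section is created inside the routine, so every caller gets its own lock.

```cpp
#include <mutex>
#include <thread>

struct LockProbe {
    bool shared_lock_was_free;   // could a second thread take the shared lock?
    bool private_lock_was_free;  // could it take a lock it created itself?
};

LockProbe probe_locks() {
    std::mutex shared;           // one lock that every thread agrees to use
    LockProbe result{};

    // This thread plays the timer callback and is inside the protected code.
    std::lock_guard<std::mutex> held(shared);

    std::thread other([&] {
        // Same lock object: the attempt fails, so this thread is kept out.
        result.shared_lock_was_free = shared.try_lock();
        if (result.shared_lock_was_free) shared.unlock();

        // Freshly created lock, like calling TCriticalSection.Create inside
        // the routine: nobody else holds it, so it excludes nobody.
        std::mutex fresh;
        result.private_lock_was_free = fresh.try_lock();
        if (result.private_lock_was_free) fresh.unlock();
    });
    other.join();
    return result;
}
```

A lock only protects anything when every piece of code touching the shared state acquires that same lock object, and no application-level lock helps if the unprotected state lives inside the runtime itself.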

Re. your last question about Winsock and overlapping I/O: You should look closely at Indy.
Indy uses blocking I/O, and is a great choice when you want high performance network I/O regardless of what the main thread is doing. Now that you've cracked the multi-threading issue, you should just create another thread (or more) and use Indy to handle your I/O.

The problem with Indy is that if you need many connections it's not efficient at all. It requires one thread per connection (blocking I/O), which doesn't scale at all. Hence the benefit of IOCP and Overlapping IO: it's pretty much the only scalable way on Windows.

For UPDATE 2:
There is a free set of IOCP socket components: http://www.torry.net/authorsmore.php?id=7131 (source code included)
"By Naberegnyh Sergey N.. High performance socket server based on Windows Completion Port and with using Windows Socket Extensions. IPv6 supported."
I found it while looking for better components or libraries to re-architect my little instant messaging server. I haven't tried it yet, but at first glance it looks well coded.

Related

Workaround for ncurses multi-thread read and write

This is what it says on http://invisible-island.net/ncurses/ncurses.faq.html#multithread
If you have a program which uses curses in more than one thread, you will almost certainly see odd behavior. That is because curses relies upon static variables for both input and output. Using one thread for input and other(s) for output cannot solve the problem, nor can extra screen updates help. This FAQ is not a tutorial on threaded programming.
Specifically, it mentions it is not safe even if input and output are done on separate threads. Would it be safe if we further use a mutex for the whole ncurses library so that at most one thread can be calling any ncurses function at a time? If not, what would be other cheap workarounds to use ncurses safely in multi-thread application?
I'm asking this question because I notice a real application often has its own event loop but relies on the ncurses getch function to get keyboard input. But if the main thread is blocked waiting in its own event loop, it has no chance to call getch. A seemingly applicable solution is to call getch in a different thread, which hasn't caused me a problem yet, but as the text above says, this is actually not safe, and that was verified by another user here. So I'm wondering what the best way is to merge getch into an application's own event loop.
I'm considering making getch non-blocking and waking up the main thread regularly (every 10-100 ms) to check if there is something to read. But this adds an additional delay between key events and makes the application less responsive. Also, I'm not sure if that would cause any problems with some ncurses internal delay such as ESCDELAY.
Another solution I'm considering is to poll stdin directly. But I guess ncurses should also be doing something like that and reading the same stream from two different places looks bad.
The text also mentions the "ncursest" or "ncursestw" libraries, but they seem to be less available, for example, if you are using a different language binding of curses. It would be great if there is a viable solution with the standard ncurses library.
Without the thread support, you're out of luck for using curses functions in more than one thread. That's because most of the curses calls use static or global data. The getch function, for instance, calls refresh, which can update the whole screen using the global pointers curscr and stdscr. The difference in the thread-support configuration is that global values are converted to functions and mutexes are added.
If you want to read stdin from a different thread and run curses in one thread, you probably can make that work by checking the file descriptor (i.e., 0) for pending activity and alerting the thread which runs curses to tell it to read data.
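A minimal sketch of the "check the file descriptor for pending activity" step, using POSIX poll(2). The helper name input_pending is made up for this example; it touches no curses state, so under that assumption it can run on a thread other than the one that owns curses.

```cpp
#include <poll.h>
#include <unistd.h>

// Returns true if `fd` has data ready to read within `timeout_ms`
// milliseconds (0 = just check, -1 = wait indefinitely).
bool input_pending(int fd, int timeout_ms) {
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    int ready = poll(&p, 1, timeout_ms);
    return ready > 0 && (p.revents & POLLIN) != 0;
}
```

A watcher thread could call input_pending(STDIN_FILENO, -1) in a loop and, on success, wake the curses thread through whatever mechanism the application's event loop already has (a self-pipe, an eventfd, a condition variable). The curses thread then calls getch itself, so all curses calls stay on one thread.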

Creating promise in one thread and setting it in another

Can I have a boost::promise<void> created in one thread and set its value in a different thread through boost::promise<void>::set_value()?
I think I am having a crash because of this, probably, so I must guess that no, but I would need confirmation. Thanks in advance.
P.S.: Note that I am using boost implementation.
Yes, you can do that, but you must ensure that the call to set_value() does not conflict with anything in the other thread, such as the completion of the constructor or the start of the destructor.
(According to the C++ standard you cannot even make potentially concurrent calls to set_value() and get_future() but that is a defect and should get fixed.)
To give a more precise answer it would be necessary to see exactly what your code is doing.
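As a rough illustration of the safe ordering, here is a sketch using std::promise from the standard library rather than boost::promise (a substitution: the Boost interface is very similar, but this code has not been checked against Boost). The function name signal_across_threads is made up for the example.

```cpp
#include <future>
#include <thread>

// Returns true once the worker thread has fulfilled the promise.
bool signal_across_threads() {
    std::promise<void> done;                       // created on this thread
    std::future<void> ready = done.get_future();   // taken before the worker starts

    bool worker_ran = false;
    std::thread worker([&done, &worker_ran] {
        worker_ran = true;
        done.set_value();                          // fulfilled on another thread
    });

    ready.get();      // blocks until set_value() has been called
    worker.join();    // the promise must outlive the worker's use of it
    return worker_ran;
}
```

The promise is fully constructed and get_future() has returned before the second thread exists, and the promise is only destroyed after join(), so set_value() cannot overlap with construction, get_future() or destruction.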

Using threadsafe initialization in a JRuby gem

Wanting to be sure we're using the correct synchronization (and no more than necessary) when writing threadsafe code in JRuby; specifically, in a Puma instantiated Rails app.
UPDATE: Extensively re-edited this question, to be very clear and use latest code we are implementing. This code uses the atomic gem written by @headius (Charles Nutter) for JRuby, but not sure it is totally necessary, or in which ways it's necessary, for what we're trying to do here.
Here's what we've got, is this overkill (meaning, are we over/uber-engineering this), or perhaps incorrect?
ourgem.rb:
require 'atomic' # gem from @headius
SUPPORTED_SERVICES = %w(serviceABC anotherSvc andSoOnSvc).freeze
module Foo
  def self.included(cls)
    cls.extend(ClassMethods)
    cls.send :__setup
  end
  module ClassMethods
    def get(service_name, method_name, *args)
      __cached_client(service_name).send(method_name.to_sym, *args)
      # we also capture exceptions here, but leaving those out for brevity
    end
    private
    def __client(service_name)
      # obtain and return a client handle for the given service_name
      # we definitely want to cache the value returned from this method
      # **AND**
      # it is a requirement that this method ONLY be called *once PER service_name*.
    end
    def __cached_client(service_name)
      @@_clients.value[service_name]
    end
    def __setup
      @@_clients = Atomic.new({})
      @@_clients.update do |current_services|
        SUPPORTED_SERVICES.inject(Atomic.new({}).value) do |memo, service_name|
          if current_services[service_name]
            current_services[service_name]
          else
            memo.merge({service_name => __client(service_name)})
          end
        end
      end
    end
  end
end
client.rb:
require 'ourgem'
class GetStuffFromServiceABC
  include Foo
  def self.get_some_stuff
    result = get('serviceABC', 'method_bar', 'arg1', 'arg2', 'arg3')
    puts result
  end
end
Summary of the above: we have @@_clients (a mutable class variable holding a Hash of clients) which we only want to populate ONCE for all available services, which are keyed on service_name.
Since the hash is in a class variable (and hence threadsafe?), are we guaranteed that the call to __client will not get run more than once per service name (even if Puma is instantiating multiple threads with this class to service all the requests from different users)? If the class variable is threadsafe (in that way), then perhaps the Atomic.new({}) is unnecessary?
Also, should we be using an Atomic.new(ThreadSafe::Hash) instead? Or again, is that not necessary?
If not (meaning: you think we do need the Atomic.news at least, and perhaps also the ThreadSafe::Hash), then why couldn't a second (or third, etc.) thread interrupt between the Atomic.new(nil) and the @@_clients.update do ... meaning the Atomic.news from EACH thread will EACH create two (separate) objects?
Thanks for any thread-safety advice, we don't see any questions on SO that directly address this issue.
Just a friendly piece of advice, before I attempt to tackle the issues you raise here:
This question, and the accompanying code, strongly suggests that you don't (yet) have a solid grasp of the issues involved in writing multi-threaded code. I encourage you to think twice before deciding to write a multi-threaded app for production use. Why do you actually want to use Puma? Is it for performance? Will your app handle many long-running, I/O-bound requests (like uploading/downloading large files) at the same time? Or (like many apps) will it primarily handle short, CPU-bound requests?
If the answer is "short/CPU-bound", then you have little to gain from using Puma. Multiple single-threaded server processes would be better. Memory consumption will be higher, but you will keep your sanity. Writing correct multi-threaded code is devilishly hard, and even experts make mistakes. If your business success, job security, etc. depends on that multi-threaded code working and working right, you are going to cause yourself a lot of unnecessary pain and mental anguish.
That aside, let me try to unravel some of the issues raised in your question. There is so much to say that it's hard to know where to start. You may want to pour yourself a cold or hot beverage of your choice before sitting down to read this treatise:
When you talk about writing "thread-safe" code, you need to be clear about what you mean. In most cases, "thread-safe" code means code which doesn't concurrently modify mutable data in a way which could cause data corruption. (What a mouthful!) That could mean that the code doesn't allow concurrent modification of mutable data at all (using locks), or that it does allow concurrent modification, but makes sure that it doesn't corrupt data (probably using atomic operations and a touch of black magic).
Note that when your threads are only reading data, not modifying it, or when working with shared stateless objects, there is no question of "thread safety".
Another definition of "thread-safe", which probably applies better to your situation, has to do with operations which affect the outside world (basically I/O). You may want some operations to only happen once, or to happen in a specific order. If the code which performs those operations runs on multiple threads, they could happen more times than desired, or in a different order than desired, unless you do something to prevent that.
It appears that your __setup method is only called when ourgem.rb is first loaded. As far as I know, even if multiple threads require the same file at the same time, MRI will only ever let a single thread load the file. I don't know whether JRuby is the same. But in any case, if your source files are being loaded more than once, that is symptomatic of a deeper problem. They should only be loaded once, on a single thread. If your app handles requests on multiple threads, those threads should be started up after the application has loaded, not before. This is the only sane way to do things.
Assuming that everything is sane, ourgem.rb will be loaded using a single thread. That means __setup will only ever be called by a single thread. In that case, there is no question of thread safety at all to worry about (as far as initialization of your "client cache" goes).
Even if __setup was to be called concurrently by multiple threads, your atomic code won't do what you think it does. First of all, you use Atomic.new({}).value. This wraps a Hash in an atomic reference, then unwraps it so you just get back the Hash. It's a no-op. You could just write {} instead.
Second, your Atomic#update call will not prevent the initialization code from running more than once. To understand this, you need to know what Atomic actually does.
Let me pull out the old, tired "increment a shared counter" example. Imagine the following code is running on 2 threads:
i += 1
We all know what can go wrong here. You may end up with the following sequence of events:
Thread A reads i and increments it.
Thread B reads i and increments it.
Thread A writes its incremented value back to i.
Thread B writes its incremented value back to i.
So we lose an update, right? But what if we store the counter value in an atomic reference, and use Atomic#update? Then it would be like this:
Thread A reads i and increments it.
Thread B reads i and increments it.
Thread A tries to write its incremented value back to i, and succeeds.
Thread B tries to write its incremented value back to i, and fails, because the value has already changed.
Thread B reads i again and increments it.
Thread B tries to write its incremented value back to i again, and succeeds this time.
Do you get the idea? Atomic never stops 2 threads from running the same code at the same time. What it does do, is force some threads to retry the #update block when necessary, to avoid lost updates.
If your goal is to ensure that your initialization code will only ever run once, using Atomic is a very inappropriate choice. If anything, it could make it run more times, rather than less (due to retries).
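The retry behaviour described above can be sketched in C++ with a compare-and-swap loop. This is only an analogue of Atomic#update, not the gem's implementation; UpdateStats and run_updates are names invented for the example, and block_runs simply counts how often the "block" body executes.

```cpp
#include <atomic>
#include <thread>
#include <vector>

struct UpdateStats {
    long final_value;   // value of the counter after all threads finish
    long block_runs;    // how many times the "update block" body executed
};

// Rough analogue of Atomic#update: compute a new value from the current
// one and retry if another thread changed it in the meantime.
UpdateStats run_updates(int threads, int updates_per_thread) {
    std::atomic<long> counter{0};
    std::atomic<long> block_runs{0};

    auto worker = [&] {
        for (int i = 0; i < updates_per_thread; ++i) {
            long seen = counter.load();
            for (;;) {
                block_runs.fetch_add(1);          // the block body runs here
                long wanted = seen + 1;
                // On failure, `seen` is refreshed with the current value.
                if (counter.compare_exchange_weak(seen, wanted)) break;
            }
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return {counter.load(), block_runs.load()};
}
```

No update is lost, so final_value always equals threads × updates_per_thread, while block_runs is at least that large and grows with contention. That is exactly why this pattern is a poor fit for code that must run only once.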
So, that is that. But if you're still with me here, I am actually more concerned about whether your "client" objects are themselves thread-safe. Do they have any mutable state? Since you are caching them, it seems that initializing them must be slow. Be that as it may, if you use locks to make them thread-safe, you may not be gaining anything from caching and sharing them between threads. Your "multi-threaded" server may be reduced to what is effectively an unnecessarily complicated, single-threaded server.
If the client objects have no mutable state, good for you. You can be "free and easy" and share them between threads with no problems. If they do have mutable state, but initializing them is slow, then I would recommend caching one object per thread, so they are never shared. Thread[] is your friend there.
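For comparison, the "one object per thread" idea looks like this in C++, where thread_local plays the role that Thread[] plays in Ruby. Client, client_for and calls_seen_on_new_thread are hypothetical names for this sketch. Note that it builds a client once per thread, not once per process, so it does not satisfy a strict "only once per service_name" requirement.

```cpp
#include <string>
#include <thread>
#include <unordered_map>

// Hypothetical client type; stands in for whatever __client returns.
struct Client {
    std::string service;
    int calls;                  // mutable state, so instances are not shared
};

// Each thread gets its own cache, so no locking is needed.
Client& client_for(const std::string& service) {
    thread_local std::unordered_map<std::string, Client> cache;
    auto it = cache.find(service);
    if (it == cache.end()) {
        // Built at most once per thread and service name.
        it = cache.emplace(service, Client{service, 0}).first;
    }
    return it->second;
}

// Uses the client twice on a fresh thread and reports that thread's count.
int calls_seen_on_new_thread(const std::string& service) {
    int seen = 0;
    std::thread t([&] {
        client_for(service).calls++;
        client_for(service).calls++;
        seen = client_for(service).calls;
    });
    t.join();
    return seen;
}
```

State changed through client_for on one thread is invisible to every other thread, which is the whole point: nothing is shared, so nothing needs protecting.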

Delphi thread best practices

I am implementing a synchronization method inside my application. The main steps it will perform are:
Get XML content from a remote site
Parse this XML using IXMLDomDocument2
Update a Firebird database
The logic is quite complex, but it is working fine per se.
The problem is when I try to run it inside a separate thread. It is clear to me that I am not getting thread safety properly in my logic.
So let's slice it
I - Get content using TidHTTP
Didn't have any problems with it, should I have any concerns here?
II - For IXMLDomDocument2 I am calling
CoInitializeEx(nil, 0);
which according to the documentation should be enough to use IXMLDomDocument2 safely. And it seems to be ok, after adding it I did not get any error when trying to use it. Any extra concern here?
III - To use Firebird safely
My problems are here. Sometimes it works, sometimes it doesn't (which I guess is the main symptom of badly designed thread logic). Most of the time I get an EInterbaseError with the message "Error reading data from the connection". Other times it simply locks up.
Should I have a separate connection with the database?
Warren nailed the main problem, which is sharing the connection between the background and foreground threads. You have another issue: every call to CoInitialize needs to be paired with a call to CoUninitialize.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms688715(v=vs.85).aspx

Can I get the id of the thread which holds a CriticalSection?

I want to write a few asserts around a complicated multithreaded piece of code.
Is there some way to do a
assert(GetCurrentThreadId() == ThreadOfCriticalSection(sec));
If you want to do this properly I think you have to use a wrapper object around your critical sections which will track which thread (if any) owns each CS in debug builds.
i.e. Rather than call EnterCriticalSection directly, you'd call a method on your wrapper which did the EnterCriticalSection and then, when it succeeded, stored GetCurrentThreadId in a DWORD which the asserts would check. Another method would zero that thread ID DWORD before calling LeaveCriticalSection.
(In release builds, the wrapper would probably omit the extra stuff and just call Enter/LeaveCriticalSection.)
As Casablanca points out, the owner thread ID is within the current CRITICAL_SECTION structure, so using a wrapper like I suggest would be storing redundant information. But, as Casablanca also points out, the CRITICAL_SECTION structure is not part of any API contract and could change. (In fact, it has changed in past Windows versions.)
Knowing the internal structure is useful for debugging but should not be used in production code.
So which method you use depends on how "proper" you want your solution to be. If you just want some temporary asserts for tracking down problems today, on the current version of Windows, then using the CRITICAL_SECTION fields directly seems reasonable to me. Just don't expect those asserts to be valid forever. If you want something that will last longer, use a wrapper.
(Another advantage of using a wrapper is that you'll get RAII. i.e. The wrapper's constructor and destructor will take care of the InitializeCriticalSection and DeleteCriticalSection calls so you no longer have to worry about them. Speaking of which, I find it extremely useful to have a helper object which enters a CS on construction and then automatically leaves it on destruction. No more critical sections accidentally left locked because a function had an early return hidden in the middle of it...)
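A portable sketch of such a wrapper follows. It uses std::mutex and std::thread::id instead of CRITICAL_SECTION and a DWORD thread ID (a substitution so the example is not tied to Windows), and the names TrackedLock, OwnershipReport and check_ownership are invented here. It always tracks the owner; a release build could compile that part out.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Wrapper that remembers which thread currently holds the lock.
class TrackedLock {
public:
    void lock() {
        mutex_.lock();
        owner_.store(std::this_thread::get_id());
    }
    void unlock() {
        assert(held_by_current_thread());   // catches unlock from a non-owner
        owner_.store(std::thread::id());    // default id means "no owner"
        mutex_.unlock();
    }
    bool held_by_current_thread() const {
        return owner_.load() == std::this_thread::get_id();
    }

private:
    std::mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

struct OwnershipReport {
    bool before, inside, from_other_thread, after;
};

OwnershipReport check_ownership() {
    TrackedLock lock;
    OwnershipReport r{};
    r.before = lock.held_by_current_thread();
    {
        std::lock_guard<TrackedLock> guard(lock);   // RAII: released on scope exit
        r.inside = lock.held_by_current_thread();
        std::thread other([&] {
            r.from_other_thread = lock.held_by_current_thread();
        });
        other.join();
    }
    r.after = lock.held_by_current_thread();
    return r;
}
```

Because TrackedLock provides lock() and unlock(), std::lock_guard supplies the enter-on-construction, leave-on-destruction helper for free. A function that expects its caller to already hold the lock can then start with assert(lock.held_by_current_thread()).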
As far as I know, there is no documented way to get this information. If you look at the headers, the CRITICAL_SECTION structure contains a thread handle, but I wouldn't rely on such information because internal structures could change without notice. A better way would be to maintain this information yourself whenever a thread enters/exits the critical section.
Your requirement doesn't make sense. If your current thread is not the thread which is in the critical section, then the code within the current thread won't be running, it'll be blocked when trying to lock the critical section.
If your thread is actually inside the critical section, then your assertion will always be true. If it's not, your assertion will always be false!
So what I mean is, assuming you're able to track which thread is in the critical section, if you place your assertion inside the critical section code, it'll always be true. If you place it outside, it'll always be false.
