Are i/o WinApi methods thread safe? - multithreading

In my last job interview, I was asked to make a program that sorts data in a huge file. I implemented it using c++ WinApi. Everything was ok until interviewer noticed that i wrote to the file concurrently through a single file handle. He told that i had to synchronize writes with a mutex. I tried to argue that every thread wrote data in its own file area explicitly specifing offset from the file beginning so there was no need in synchronization, it was useless.
Questions:
Is it safe to write (WriteFile) to a file concurrently using a
single handle, assuming that a every thread has its own file part?
Where i can find any information about it?

Related

Delphi call TDBGrid.Datasource.Dataset.Refresh from a background thread [duplicate]

I'd like to be able to open a TDataSet asynchronously in its own thread so that the main VCL thread can continue until that's done, and then have the main VCL thread read from that TDataSet afterwards. I've done some experimenting and have gotten into some very weird situations, so I'm wondering if anyone has done this before.
I've seen some sample apps where a TDataSet is created in a separate thread, it's opened and then data is read from it, but that's all done in the separate thread. I'm wondering if it's safe to read from the TDataSet from the main VCL thread after the other thread opens the data source.
I'm doing Win32 programming in Delphi 7, using TmySQLQuery from DAC for MySQL as my TDataSet descendant.
Provided you only want to use the dataset in its own thread, you can just use synchronize to communicate with the main thread for any VCL/UI update, like with any other component.
Or, better, you can implement communication between the mainthread and worker threads with your own messaging system.
check Hallvard's solution for threading here:
http://hallvards.blogspot.com/2008/03/tdm6-knitting-your-own-threads.html
or this other one:
http://dn.codegear.com/article/22411
for some explanation on synchronize and its inefficiencies:
http://www.eonclash.com/Tutorials/Multithreading/MartinHarvey1.1/Ch3.html
I have seen it done with other implementations of TDataSet, namely in the Asta components. These would contact the server, return immediately, and then fire an event once the data had been loaded.
However, I believe it depends very much on the component. For example, those same Asta components could not be opened in a synchronous manner from anything other than the main VCL thread.
So in short, I don't believe it is a limitation of TDataSet per se, but rather something that is implementation specific, and I don't have access to the components you've mentioned.
One thing to keep in mind about using the same TDataSet between multiple threads is you can only read the current record at any given time. So if you are reading the record in one thread and then the other thread calls Next then you are in trouble.
Also remember the thread will most likely need its own database connection. I believe what is needed here is a multi-threaded "holding" object to load the data from the thread into (write only) which is then read only from the main VCL thread. Before reading use some sort of syncronization method to insure that your not reading the same moment your writing, or writing the same moment your reading, or load everything into a memory file and write a sync method to tell the main app where in the file to stop reading.
I have taken the last approach a few times, depdending on the number of expected records (and the size of the dataset) I have even taken this to a physical disk file on the local system. It works quite well.
I've done multithreaded data access, and it's not straightforward:
1) You need to create a session per thread.
2) Everything done to that TDataSet instance must be done in context of the thread where it was created. That's not easy if you wanted to place e.g. a db grid on top of it.
3) If you want to let e.g. main thread play with your data, the straight-forward solution is to move it into a separate container of some kind,e.g. a Memory dataset.
4) You need some kind of signaling mechanism to notify main thread once your data retrieval is complete.
...and exception handling isn't straightforward, either...
But: Once you've succeeded, the application will be really elegant !
Most TDatasets are not thread safe. One that I know is thread safe is kbmMemtable. It also has the ability to clone a dataset so that the problem of moving the record pointer (as explained by Jim McKeeth) does occur. They're one of the best datasets you can get (bought or free).

multi-threaded file locking in linux

I'm working on a multithreaded application where multiple threads may want exclusive access to the same file. I'm looking for a way of serializing these operations. I was planning to use flock, lockf, or fcntl locking. However it appears that with these methods an attempt to lock a file by a second thread when a first thread already owns the lock will be granted, because the two threads are in the same process. This is according to the manpages for flock and fnctl (and I guess in linux lockf is implemented with fnctl). Also supported by this other question. So, are there other ways of locking a file in linux which works at a thread-level instead of a process-level?
Some alternatives that I came up with which I do not like are:
1) Use a lockfile (xxx.lock) opened with O_CREAT | O_EXCL flags. This call will succeed only in one thread if there is contention. The problem with this is that then other threads have to spin on the call until they achieve the lock, meaning that I have to _yield() or sleep() which makes me think this is not a great option.
2) Keep a mutex'ed list of all open files. When a thread wants to open/close a file it has to lock the list first. When opening a file, it searches the list to see if it's open. This sounds particularly inefficient because it requires a significant amount of work even if the file is not owned yet.
Are there other ways of doing this?
Edit:
I just discovered this text in my system's manpages which isn't in the online man pages:
If a process uses open(2) (or similar) to obtain more than one descriptor for the same file, these descriptors are treated independently by flock(). An attempt to lock the file using one of these file descriptors may be denied by a lock that the calling process has already placed via another descriptor.
I'm not happy about the words "may be denied", I'd prefer "will be denied" but I guess it's time to test that.

Write to file from many threads

i have a problem.
my progrem creates a some number of processes in the system (Windows).
They all must write some data in ONE file on a disk.
so, i need to synchronize them... but don't know...
exactly: my program calls SetWindowsHook and injects one dll in many processes. and they all need to write some data to one file
The synchronisation object that works across processes is a mutex.
Windows have a lock foreach file, so if one process is writting in a file windows wont let another write. Mutex is what you want, protect the code where you are writting into the file with one.
Single System As David mentioned, you can use a mutex to accomplish this task. In Windows, this is done by using named mutexes and (if you want) named semaphores to do this.
The function CreateMutex can be used to both create the mutex (by the first process) and open it by the other processes. Supply the same lpName value in all processes. Use WaitForSingleObject to gain ownership of the mutex. And use ReleaseMutex to give up ownership.
An example of creating a named mutex can be found here.
If use a named semaphore, you can accomplish the same thing by giving the semaphore an initial count of 1. Use CreateSemaphore to create and open it, WaitForSingleObject to gain ownership (same as with a mutex) and ReleaseSemaphore to give up ownwership.
Multiple Systems The above approach assumes that the processes are all running on the same system. If the processes are running on different systems, then you may need to use the idea mentioned by DVD. You can lock portions of a file and use that as the synchronization method. For example, you could, by convention, lock 1 byte at some offset (it can even be past the end of the file) as a type of semaphore. Using this mechanism, though, may mean you need to implement some kind of efficient wait depending on the functions you use. If you use CreateFile and LockFileEx, you can have the function do a blocked wait by not specifying LOCKFILE_FAIL_IMMEDIATELY in the call.
The answer to your problem is to implement Thread synchronization.
If you are using C#, you can put a lock{} statement over your file writing code.
For other languages, you must use a Monitor or Mutex class to synchronize.
Use Stream.Synchronized()
See http://msdn.microsoft.com/en-us/library/system.io.stream.synchronized.aspx. This method only works for C# though
I recently had to do almost this exact thing (for logging to a single file from a dll injected into multiple processes).
Use _fsopen() to open the file in each process and then use a mutex for synchronization to ensure that only one process at a time is ever writing to the file.

files on multiple processes

If one of my processes open a file, let's say for reading only, does the O.S guarantee that no other process will write on it as I'm reading, maybe
leaving the reading process with first part of the old file version, and second part of the newer file version, making data integrity questionable?
I am not talking about pipes which have no seek, but on regular files, with seek option (at least when opened with only one process).
No, other processes can change the file contents as you are reading it. Try running "man fcntl" and ignore the section on "advisory" locks; those are "optional" locks that processes only have to pay attention to if they want. Instead, look for the (alas, non-POSIX) "mandatory" locks. Those are the ones that will protect you from other programs. Try a read lock.
No, if you open a file, other processes can write to it, unless you use a lock.
On Linux, you can add an advisory lock on a file with:
#include <sys/file.h>
...
flock(file_descriptor,LOCK_EX); // apply an advisory exclusive lock
Any process which can open the file for writing, may write to it. Writes can happen concurrently with your own writes, resulting in (potentially) indeterminate states.
It is your responsibility as an application writer to ensure that Bad Things don't happen. In my opinion mandatory locking is not a good idea.
A better idea is not to grant write access to processes which you don't want to write to the file.
If several processes open a file, they will have independent file pointers, so they can seek() and not affect one another.
If a file is opened by a threaded program (or a task which shares its file descriptors with another, more generally), the file pointer is also shared, so you need to use another method to access the file to avoid race conditions causing chaos - normally pread, pwrite, or the scatter/gather functions readv and writev.

In Delphi, is TDataSet thread safe?

I'd like to be able to open a TDataSet asynchronously in its own thread so that the main VCL thread can continue until that's done, and then have the main VCL thread read from that TDataSet afterwards. I've done some experimenting and have gotten into some very weird situations, so I'm wondering if anyone has done this before.
I've seen some sample apps where a TDataSet is created in a separate thread, it's opened and then data is read from it, but that's all done in the separate thread. I'm wondering if it's safe to read from the TDataSet from the main VCL thread after the other thread opens the data source.
I'm doing Win32 programming in Delphi 7, using TmySQLQuery from DAC for MySQL as my TDataSet descendant.
Provided you only want to use the dataset in its own thread, you can just use synchronize to communicate with the main thread for any VCL/UI update, like with any other component.
Or, better, you can implement communication between the mainthread and worker threads with your own messaging system.
check Hallvard's solution for threading here:
http://hallvards.blogspot.com/2008/03/tdm6-knitting-your-own-threads.html
or this other one:
http://dn.codegear.com/article/22411
for some explanation on synchronize and its inefficiencies:
http://www.eonclash.com/Tutorials/Multithreading/MartinHarvey1.1/Ch3.html
I have seen it done with other implementations of TDataSet, namely in the Asta components. These would contact the server, return immediately, and then fire an event once the data had been loaded.
However, I believe it depends very much on the component. For example, those same Asta components could not be opened in a synchronous manner from anything other than the main VCL thread.
So in short, I don't believe it is a limitation of TDataSet per se, but rather something that is implementation specific, and I don't have access to the components you've mentioned.
One thing to keep in mind about using the same TDataSet between multiple threads is you can only read the current record at any given time. So if you are reading the record in one thread and then the other thread calls Next then you are in trouble.
Also remember the thread will most likely need its own database connection. I believe what is needed here is a multi-threaded "holding" object to load the data from the thread into (write only) which is then read only from the main VCL thread. Before reading use some sort of syncronization method to insure that your not reading the same moment your writing, or writing the same moment your reading, or load everything into a memory file and write a sync method to tell the main app where in the file to stop reading.
I have taken the last approach a few times, depdending on the number of expected records (and the size of the dataset) I have even taken this to a physical disk file on the local system. It works quite well.
I've done multithreaded data access, and it's not straightforward:
1) You need to create a session per thread.
2) Everything done to that TDataSet instance must be done in context of the thread where it was created. That's not easy if you wanted to place e.g. a db grid on top of it.
3) If you want to let e.g. main thread play with your data, the straight-forward solution is to move it into a separate container of some kind,e.g. a Memory dataset.
4) You need some kind of signaling mechanism to notify main thread once your data retrieval is complete.
...and exception handling isn't straightforward, either...
But: Once you've succeeded, the application will be really elegant !
Most TDatasets are not thread safe. One that I know is thread safe is kbmMemtable. It also has the ability to clone a dataset so that the problem of moving the record pointer (as explained by Jim McKeeth) does occur. They're one of the best datasets you can get (bought or free).

Resources