Files opened by multiple processes - Linux

If one of my processes opens a file, say for reading only, does the OS guarantee that no other process will write to it while I'm reading? Otherwise the reading process could end up with the first part of the old file version and the second part of the newer version, making data integrity questionable.
I am not talking about pipes, which have no seek, but about regular files that support seeking (at least when opened by only one process).

No, other processes can change the file contents as you are reading it. Try running "man fcntl" and ignore the section on "advisory" locks; those are "optional" locks that processes only have to pay attention to if they choose. Instead, look for the (alas, non-POSIX) "mandatory" locks. Those are the ones that will protect you from other programs. Try a read lock.
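For illustration, a minimal sketch of taking such a read lock with fcntl() (lock_for_reading is an invented name; note the lock stays advisory unless the filesystem is mounted with -o mand and the file has the set-group-ID bit set with the group-execute bit cleared):

#include <fcntl.h>
#include <unistd.h>

/* Take a shared (read) lock on an already-open descriptor. */
int lock_for_reading(int fd)
{
    struct flock fl = {0};
    fl.l_type   = F_RDLCK;           /* shared lock for readers */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;                 /* 0 = lock to end of file */
    return fcntl(fd, F_SETLKW, &fl); /* wait until granted */
}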

No, if you open a file, other processes can write to it, unless you use a lock.
On Linux, you can add an advisory lock on a file with:
#include <sys/file.h>
...
flock(file_descriptor, LOCK_EX); // apply an advisory exclusive lock
// ... read or write the file ...
flock(file_descriptor, LOCK_UN); // release the lock when done

Any process which can open the file for writing may write to it. Such writes can happen concurrently with your own reads and writes, resulting in (potentially) indeterminate states.
It is your responsibility as an application writer to ensure that Bad Things don't happen. In my opinion mandatory locking is not a good idea.
A better idea is not to grant write access to processes which you don't want to write to the file.
If several processes open a file, they will have independent file pointers, so they can lseek() without affecting one another.
If a file is opened by a threaded program (or, more generally, by tasks which share their file descriptors), the file pointer is also shared, so you need another way of accessing the file to avoid race conditions causing chaos - normally pread(), pwrite(), or the scatter/gather functions readv() and writev().
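A minimal sketch of that approach (the file name "data.bin" and the offsets are placeholders): pread()/pwrite() take an explicit offset and never touch the shared file pointer, so concurrent threads don't race on lseek().

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDWR | O_CREAT, 0644); /* placeholder name */
    const char *a = "AAAA", *b = "BBBB";
    pwrite(fd, a, strlen(a), 0);     /* one thread's region: offset 0 */
    pwrite(fd, b, strlen(b), 4096);  /* another thread's region: offset 4096 */
    char buf[5] = {0};
    pread(fd, buf, 4, 4096);         /* read back without seeking */
    close(fd);
    return 0;
}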


POSIX-compliant file locking (within a single process)?

I'm making a client/server system where clients can download files from the server and upload files to it (one client can run several such operations at once). If a client crashes, it has to resume its interrupted operations on restart.
Obviously, I need some metadata file that keeps track of current operations. It would be accessed by a thread every time a chunk of data is downloaded or uploaded. The client application should also be able to print every file's download/upload progress as a percentage.
I don't mind locking the whole meta-file to update a single entry (each entry corresponds to one download/upload operation), but at least reading it should allow thread concurrency (unless finding and reading a line in a file is fast anyway).
This article says that inter-process file locking sucks in POSIX. What options do I have?
EDIT: it might be clear already, but concurrency in my system must be based on pthreads.
This article says that inter-process file locking sucks in POSIX. What options do I have?
The article correctly describes the state of inter-process file locking. In your case, however, you have a single process, so no inter-process locking takes place.
[...] concurrency in my system must be based on pthreads.
As you say, one of the possibilities is to use a global mutex to synchronize accesses to the meta file.
One way to minimize the locking is to make the per-thread/per-file entry in the meta-file a predictable size (for best results, aligned on the file-system block size). That would allow the following:
- while holding the global mutex, allocate the per-file/per-thread entry in the meta-file by appending it to the end of the meta-file, and save its file offset;
- with the file offset, use the pread()/pwrite() functions to update the information about the file operation, without using or affecting the global file offset.
Since every thread would be writing into its own area of the meta-file, there should be no race conditions.
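A rough sketch of that scheme, assuming a fixed 512-byte entry (ENTRY_SIZE, alloc_entry, and update_entry are names invented for illustration; error handling trimmed):

#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

#define ENTRY_SIZE 512                     /* fixed-size record */
static pthread_mutex_t meta_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append a zeroed entry under the global mutex and return its offset. */
off_t alloc_entry(int meta_fd)
{
    char blank[ENTRY_SIZE] = {0};
    pthread_mutex_lock(&meta_lock);
    off_t off = lseek(meta_fd, 0, SEEK_END);
    write(meta_fd, blank, ENTRY_SIZE);
    pthread_mutex_unlock(&meta_lock);
    return off;
}

/* Later updates need no mutex: each thread owns its own entry. */
void update_entry(int meta_fd, off_t off, const void *buf, size_t len)
{
    pwrite(meta_fd, buf, len < ENTRY_SIZE ? len : ENTRY_SIZE, off);
}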
(If the number of entries has an upper limit, it is also possible to preallocate the whole file and mmap() it, and then use it as if it were plain memory. If the application crashes, the most recent state (possibly with some corruption; the application has crashed, after all) would be present in the file. Some applications, to speed up restart after a software crash, go as far as keeping the whole state of the application in an mmap()ed file.)
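A small sketch of the mmap() variant (map_meta is a hypothetical helper):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Preallocate the meta-file to a fixed size and map it as memory. */
void *map_meta(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    ftruncate(fd, size);                  /* preallocate the whole file */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                            /* the mapping keeps the file open */
    return p == MAP_FAILED ? NULL : p;
}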
Another alternative is to use the meta-file as a "journal": open the meta-file in append mode and write an entry for every status change of the pending file operations, including the start and end of each operation. After a crash, to recover the state of the file transfers, you "replay" the journal file: read the entries from the file and update the state of the file operations in memory. After you reach the end of the journal, the state in memory should be up to date and ready to be resumed. The typical complication of the approach, since the journal file is only ever appended to, is that it has to be cleaned up periodically, purging old finished operations from it. (The simplest method is to do the clean-up during recovery (after recovery, write a new journal and delete the old one) and then periodically gracefully restart the application.)
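A minimal sketch of the journal idea under the same caveats (the record layout and all names here are hypothetical):

#include <fcntl.h>
#include <unistd.h>

struct journal_rec {        /* hypothetical record layout */
    int  op_id;             /* which transfer this refers to */
    int  state;             /* e.g. STARTED, PROGRESS, FINISHED */
    long offset;            /* how far the transfer has got */
};

/* Writing: jfd was opened with O_APPEND, so records go to the end. */
void log_state(int jfd, const struct journal_rec *r)
{
    write(jfd, r, sizeof *r);
}

/* Recovery: reopen the journal read-only and replay it from the start;
   the last record seen for each op_id wins. */
void replay(int jfd, void (*apply)(const struct journal_rec *))
{
    struct journal_rec r;
    while (read(jfd, &r, sizeof r) == sizeof r)
        apply(&r);
}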

Are I/O WinAPI methods thread-safe?

In my last job interview, I was asked to write a program that sorts data in a huge file. I implemented it using the C++ WinAPI. Everything was fine until the interviewer noticed that I wrote to the file concurrently through a single file handle. He said I had to synchronize the writes with a mutex. I tried to argue that every thread wrote data to its own area of the file, explicitly specifying the offset from the beginning of the file, so there was no need for synchronization; it would be useless.
Questions:
Is it safe to write (WriteFile) to a file concurrently using a single handle, assuming that every thread has its own part of the file?
Where can I find information about this?
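For reference, a sketch of the positioned-write pattern the question describes, assuming the handle was opened with FILE_FLAG_OVERLAPPED so each call carries its own offset (write_at is a hypothetical helper; whether this needs extra synchronization is exactly what the question asks):

#include <windows.h>

/* Positioned write through a shared handle; the offset travels in the
   OVERLAPPED structure, so the shared file pointer is never used. */
BOOL write_at(HANDLE h, const void *buf, DWORD len, ULONGLONG off)
{
    OVERLAPPED ov = {0};
    ov.Offset     = (DWORD)off;          /* low 32 bits of the offset */
    ov.OffsetHigh = (DWORD)(off >> 32);  /* high 32 bits */
    ov.hEvent     = CreateEvent(NULL, TRUE, FALSE, NULL); /* per-call event */
    DWORD written = 0;
    BOOL ok = WriteFile(h, buf, len, &written, &ov);
    if (!ok && GetLastError() == ERROR_IO_PENDING)
        ok = GetOverlappedResult(h, &ov, &written, TRUE); /* wait to finish */
    CloseHandle(ov.hEvent);
    return ok;
}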

Multi-threaded file locking in Linux

I'm working on a multithreaded application where multiple threads may want exclusive access to the same file. I'm looking for a way of serializing these operations. I was planning to use flock, lockf, or fcntl locking. However, it appears that with these methods an attempt by a second thread to lock a file will be granted when a first thread already owns the lock, because the two threads are in the same process. This is according to the manpages for flock and fcntl (and I guess on Linux lockf is implemented with fcntl). It is also supported by this other question. So, are there other ways of locking a file on Linux which work at thread level instead of process level?
Some alternatives that I came up with which I do not like are:
1) Use a lockfile (xxx.lock) opened with the O_CREAT | O_EXCL flags. The open will succeed in only one thread if there is contention. The problem is that the other threads then have to spin on the call until they acquire the lock, meaning they have to _yield() or sleep(), which makes me think this is not a great option.
2) Keep a mutex-protected list of all open files. When a thread wants to open/close a file, it has to lock the list first. When opening a file, it searches the list to see whether the file is already open. This sounds particularly inefficient because it requires a significant amount of work even if the file is not owned yet.
Are there other ways of doing this?
Edit:
I just discovered this text in my system's manpages which isn't in the online man pages:
If a process uses open(2) (or similar) to obtain more than one descriptor for the same file, these descriptors are treated independently by flock(). An attempt to lock the file using one of these file descriptors may be denied by a lock that the calling process has already placed via another descriptor.
I'm not happy about the words "may be denied"; I'd prefer "will be denied", but I guess it's time to test that.
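Based on that quoted manpage behaviour, one sketch is to give each thread its own descriptor for the same path and flock() that, assuming the "may be denied" behaviour does hold in practice (lock_file_per_thread is an invented name):

#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Each thread opens its own descriptor for the same path, so the locks
   are taken on distinct descriptors and contend with each other. */
int lock_file_per_thread(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {  /* blocks until this fd owns the lock */
        close(fd);
        return -1;
    }
    return fd;  /* flock(fd, LOCK_UN) and close(fd) when finished */
}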

RAII and system resource cleanup

RAII is a good solution for resource cleanup. However, RAII is based on stack unwinding; if a process terminates abnormally, the stack won't be unwound, which means RAII won't work in that situation. For resources with process lifetime there is nothing to worry about, but for resources with file-system or kernel lifetime, such as files, message queues, semaphores, and shared memory, it is a problem.
How can I clean up system (file-system and kernel) resources in a reliable way?
Example:
A shared file will be created by a "master" process and used by "slave" processes. The shared file should be deleted by the "master" process as planned. Is there a way to do that?
Obviously, the shared file can't be unlinked as soon as it is created; if it were, other processes couldn't "see" the file.
There is no perfect answer to your question. You mostly just mitigate the effects as best you can, so that "abnormal termination leaving junk behind" is rare.
First, write your programs to be as robust as possible against abnormal process termination.
Second, rely on kernel mechanisms where possible. In your shared file example, if you are just talking about one "master" and one "slave" process using a file to communicate, then you can unlink the file as soon as both processes have it open. The file will continue to exist and be readable and writable by both processes until both of them close it, at which point the kernel will automatically reclaim the storage. (Even if both of them terminate abnormally.)
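A minimal sketch of that idiom (the path and the synchronization with the slave are placeholders):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/shared.dat", O_RDWR | O_CREAT, 0600);
    /* ... wait here until the slave process has opened the file too ... */
    unlink("/tmp/shared.dat");  /* the name is gone; nothing to clean up
                                   later, even after a crash */
    /* both processes keep using their descriptors as before */
    close(fd);                  /* storage reclaimed after the last close */
    return 0;
}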
And of course, the next time your server starts, it can clean up any junk left by the previous run, assuming only one is supposed to exist at a time.
The usual last-chance mechanism is to have "cleanup" processes that run periodically to (e.g.) sweep out /tmp.
But what you are asking is fundamentally hard. Any process responsible for handling another's abnormal termination might itself terminate abnormally. "Who watches the watchers?"

Write to file from many threads

I have a problem.
My program creates some number of processes in the system (Windows).
They all must write some data to ONE file on disk.
So I need to synchronize them... but I don't know how.
To be exact: my program calls SetWindowsHook and injects one DLL into many processes, and they all need to write some data to one file.
The synchronisation object that works across processes is a mutex.
Windows can lock a file against other writers (via share modes), so if one process has a file open for writing, Windows can refuse writes from another. A mutex is what you want: protect the code where you write to the file with one.
Single System
As David mentioned, you can use a mutex to accomplish this task. In Windows, this is done using named mutexes and (if you want) named semaphores.
The function CreateMutex can be used to both create the mutex (by the first process) and open it by the other processes. Supply the same lpName value in all processes. Use WaitForSingleObject to gain ownership of the mutex. And use ReleaseMutex to give up ownership.
An example of creating a named mutex can be found here.
If you use a named semaphore, you can accomplish the same thing by giving the semaphore an initial count of 1. Use CreateSemaphore to create and open it, WaitForSingleObject to gain ownership (same as with a mutex), and ReleaseSemaphore to give up ownership.
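A minimal sketch of the named-mutex approach (the name "Global\MyFileLock" and the helper write_guarded are illustrative; error handling trimmed):

#include <windows.h>

/* Serialize writes across processes with a named mutex; every process
   calls CreateMutexW with the same name and gets the same object. */
void write_guarded(HANDLE file, const void *buf, DWORD len)
{
    HANDLE m = CreateMutexW(NULL, FALSE, L"Global\\MyFileLock");
    WaitForSingleObject(m, INFINITE);      /* gain ownership */
    DWORD written = 0;
    WriteFile(file, buf, len, &written, NULL);
    ReleaseMutex(m);                       /* give up ownership */
    CloseHandle(m);
}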
Multiple Systems
The above approach assumes that the processes are all running on the same system. If the processes are running on different systems, then you may need to use the idea mentioned by DVD: you can lock portions of a file and use that as the synchronization method. For example, you could, by convention, lock 1 byte at some offset (it can even be past the end of the file) as a kind of semaphore. Using this mechanism, though, may mean you need to implement some kind of efficient wait, depending on the functions you use. If you use CreateFile and LockFileEx, you can have the function do a blocking wait by not specifying LOCKFILE_FAIL_IMMEDIATELY in the call.
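A sketch of that byte-lock convention (the offset 0xFFFFFFF0 and the helper names are arbitrary illustrative choices):

#include <windows.h>

/* Lock a single conventional byte as a cross-machine "semaphore".
   Without LOCKFILE_FAIL_IMMEDIATELY, LockFileEx blocks until granted. */
BOOL acquire_byte_lock(HANDLE h)
{
    OVERLAPPED ov = {0};
    ov.Offset = 0xFFFFFFF0;  /* arbitrary agreed offset, may be past EOF */
    return LockFileEx(h, LOCKFILE_EXCLUSIVE_LOCK, 0, 1, 0, &ov);
}

void release_byte_lock(HANDLE h)
{
    OVERLAPPED ov = {0};
    ov.Offset = 0xFFFFFFF0;
    UnlockFileEx(h, 0, 1, 0, &ov);
}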
The answer to your problem is to implement Thread synchronization.
If you are using C#, you can put a lock{} statement over your file writing code.
For other languages, you must use a Monitor or Mutex class to synchronize.
Use Stream.Synchronized().
See http://msdn.microsoft.com/en-us/library/system.io.stream.synchronized.aspx. This method only works for C#/.NET, though.
I recently had to do almost this exact thing (for logging to a single file from a dll injected into multiple processes).
Use _fsopen() to open the file in each process and then use a mutex for synchronization to ensure that only one process at a time is ever writing to the file.
