What is the correct way to write to a memory mapped file with no synchronization from multiple threads in Rust?
I need to create a 40+ GB file using multiple threads. The file is used as a giant vector of u64 values. The threads do not need any kind of synchronization: each thread's output is unique to that thread, but each thread does NOT get its own contiguous slice. Rather, the nature of the data ensures that each thread generates a set of unique positions in the file to write to. A simple example: thread t writes to every position ind where ind mod thread_count == t, with ind running into the millions. For thread_count = 2, one thread writes to the odd positions and the other to the even ones.
I have used memmap2, a maintained fork of the memmap crate. It seems to do everything I need for the access itself, but I do not know how to use it correctly from multiple threads.
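For reference, one approach generally considered sound: map the file with memmap2 and reinterpret the mapping as a slice of AtomicU64, so every thread can store into its own positions without locks. A minimal sketch, assuming a hypothetical data.bin and a SLOTS constant standing in for the real size (the mapping is page-aligned, so the u64 alignment requirement holds):

use std::fs::OpenOptions;
use std::sync::atomic::{AtomicU64, Ordering};
use memmap2::MmapMut; // assumes memmap2 = "0.9" in Cargo.toml

fn main() -> std::io::Result<()> {
    const SLOTS: usize = 1_000_000; // stand-in for the real size
    const THREADS: usize = 2;

    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open("data.bin")?; // hypothetical path
    file.set_len((SLOTS * 8) as u64)?;

    let mut mmap = unsafe { MmapMut::map_mut(&file)? };

    // Reinterpret the mapping as atomic u64 slots. Atomic stores let
    // every thread write without locks; the "no two threads hit the
    // same position" guarantee comes from the data, as described above.
    let slots: &[AtomicU64] = unsafe {
        std::slice::from_raw_parts(mmap.as_mut_ptr() as *const AtomicU64, SLOTS)
    };

    std::thread::scope(|s| {
        for t in 0..THREADS {
            s.spawn(move || {
                // Thread t owns the positions where ind % THREADS == t.
                let mut ind = t;
                while ind < SLOTS {
                    slots[ind].store(ind as u64, Ordering::Relaxed);
                    ind += THREADS;
                }
            });
        }
    });

    mmap.flush() // push dirty pages to disk
}

Relaxed ordering suffices because no two threads ever touch the same position; joining the scope makes all stores visible before the final flush.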
Related
Suppose we have a 100x100 matrix.
We have two threads that both access this matrix by reference (std::ref() in C++).
The first thread is assigned rows 1-50, the second rows 51-100. They both start working on their blocks and writing into them.
There's no communication between the two threads and no chance that one thread will read/write something from the block assigned to the other thread.
In this particular case, it seems that using a mutex is redundant, am I correct?
Correct. If you do not share any data, there is no need for locking.
But you have this matrix for a reason; you probably want to share the results later. To do that you will need to establish some communication between the threads, probably using a mutex and a condition variable.
Most of the time, threads are used to offload computations to other CPU cores so that their results can be merged later. Merging is the part where synchronization is needed.
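The question is C++, but the idea is easy to see in a Rust sketch: split_at_mut hands each thread a disjoint half of the matrix, no mutex involved, and the join at the end of the scope is the only synchronization (exactly the "merge" point described above):

fn main() {
    // 100x100 matrix stored row-major; rows 0-49 vs rows 50-99.
    let mut matrix = vec![0u64; 100 * 100];
    let (top, bottom) = matrix.split_at_mut(50 * 100);

    std::thread::scope(|s| {
        s.spawn(|| top.iter_mut().for_each(|x| *x = 1));
        s.spawn(|| bottom.iter_mut().for_each(|x| *x = 2));
    }); // joining the scope is the only synchronization point

    // "Merging": reading the results after both threads are done.
    assert_eq!((matrix[0], matrix[99 * 100]), (1, 2));
}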
If multiple threads run through a function that contains a for loop with variable assignments inside it, how do the variables' values not get messed up across the threads?
Each thread has a block of memory named the "thread stack", of about 1 MB on 32-bit and 4 MB on 64-bit systems, where value types and references (pointers to the heap) that are parameters or intermediate results are stored; this space is not shared across threads. Everything else (i.e., shared value types and references, and the objects any references may point to) is stored in the heap, which is the rest of RAM and is shared across all threads.
Values can get messed up across threads, for example in race conditions. If you try to update a 64-bit value from multiple threads on a 32-bit CPU, you can get a torn, inconsistent result; to prevent that, .NET gives you synchronization primitives such as the Interlocked and Monitor utilities. Also, non-thread-safe objects like List<> can get corrupted if used from multiple threads without synchronization, and even a simple integer increment can lose updates if the operation is not atomic.
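The answer is about .NET, but the last point translates directly; a sketch in Rust, where fetch_add plays the role of Interlocked.Increment and turns the read-modify-write into one atomic operation so no increments are lost:

use std::sync::atomic::{AtomicU64, Ordering};

static COUNTER: AtomicU64 = AtomicU64::new(0);

fn main() {
    std::thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..100_000 {
                    // One atomic instruction; a plain `x += 1` would
                    // race and drop updates under contention.
                    COUNTER.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    assert_eq!(COUNTER.load(Ordering::Relaxed), 400_000);
}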
You should do some reading. I recommend the book CLR via C# by Jeffrey Richter. Every .NET dev should read that book, actually.
I'm writing a basic UNIX program that involves processes sending messages to each other. My idea to synchronize the processes is to simply have an array of flags to indicate whether or not a process has reached a certain point in the code.
For example, I want all the processes to wait until they've all been created. I also want them to wait until they've all finished sending messages to each other before they begin reading their pipes.
I'm aware that a forked process performs a copy-on-write operation when it writes to a previously defined variable.
What I'm wondering is, if I make an array of flags, will the pointer to that array be copied, or will the entire array be copied (thus making my idea useless).
I'd also like any tips on inter-process communication and process synchronization.
EDIT: The processes are writing to every other process's pipe. Each process will send the following information:
typedef struct MessageCDT {
    pid_t destination;
    pid_t source;
    int num;
} Message;
So, just the source of the message and some random number. Then each process will print out the message to stdout: Something along the lines of "process 20 received 5724244 from process 3".
Unix processes have independent address spaces. This means that the memory in one is totally separate from the memory in another. When you call fork(), you get a new copy of the process. Immediately on return from fork(), the only thing different between the two processes is fork()'s return value. All of the data in the two processes are the same, but they are copies. Updating memory in one cannot be known by the other, unless you take steps to share the memory.
There are many choices for interprocess communication (IPC) in Unix, including shared memory, semaphores, pipes (named and unnamed), sockets, message queues and signals. If you Google these things you will find lots to read.
In your particular case, trying to make several processes wait until they all reach a certain point, I might use a semaphore or shared memory, depending on whether there is some master process that started them all or not.
If there is a master process that launches the others, then the master could setup the semaphore with a count equal to the number of processes to synchronize and then launch them. Each child could then decrement the semaphore value and wait for the semaphore value to reach zero.
If there is no master process, then I might create a shared memory segment that contains a count of processes and a flag for each process. But when you have two or more processes using shared memory, then you also need some kind of locking mechanism (probably a semaphore again) to ensure that two processes do not try to update the shared memory simultaneously.
Keep in mind that reading a pipe that nobody is writing to will block the reader until data appears. I don't know what your processes do, but perhaps that is synchronization enough? One other thing to consider: if you have multiple processes writing to a given pipe, their data may become interleaved if the writes are larger than PIPE_BUF. The value and location of this macro are system dependent.
-Kevin
The entire array of flags will seem to be copied. It will not actually be copied until one process or another writes to it of course. But that's an implementation detail and transparent to the individual processes. As far as each process is concerned, they each get a copy of the array.
There are ways to make this not happen. You can use mmap with the MAP_SHARED option for the memory used for your flags. Then each sub-process will share the same region of memory. There's also Posix shared memory (which I, BTW, think is an awful hack). To find out about Posix shared memory, look at the shm_overview(7) man page.
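A sketch of the MAP_SHARED approach, written in Rust with the libc crate for illustration (Unix only; the single-flag layout is made up): the page really is shared after fork(), so the parent sees the child's write, where an ordinary variable would have been copied on write:

fn main() {
    unsafe {
        // One page of anonymous, *shared* memory for the flags.
        let ptr = libc::mmap(
            std::ptr::null_mut(),
            4096,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_SHARED | libc::MAP_ANONYMOUS,
            -1,
            0,
        );
        assert_ne!(ptr, libc::MAP_FAILED);
        let flags = ptr as *mut i32;

        if libc::fork() == 0 {
            *flags.add(0) = 1; // child raises its flag
            libc::_exit(0);
        }

        libc::wait(std::ptr::null_mut()); // parent waits for the child
        // Visible here only because the mapping is MAP_SHARED; with
        // fork's usual copy-on-write semantics it would still be 0.
        // (No race here: the wait() above orders the accesses.)
        assert_eq!(*flags.add(0), 1);
    }
}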
But using memory in this way isn't really a good idea. On multi-core systems it's not always the case that when one process (or thread) writes to an area of shared memory, all other processes will see the value written right away. Frequently the value will hang out for a while in the L2 cache and not be immediately flushed.
If you want to communicate using shared memory, you will have to use mutexes or the C++11 atomic operations to ensure that writes are properly seen by the other processes.
Here's the deal. My app has a lot of threads that do the same thing: read specific data from huge files (>2 GB), parse the data, and eventually write back to that file.
The problem is that it could happen that one thread reads entry X from file A while a second thread writes to entry X of that same file A. Would a problem occur?
The I/O code uses TFileStream for every file. I split the I/O code out into local (static class) routines because I'm afraid there will be a problem; since it's shared that way, there should presumably be critical sections.
Every case below is local (static) code that is not instantiated.
Case 1:

procedure Foo(obj: TObject);
begin ... end;

Case 2:

procedure Bar(obj: TObject);
var i: Integer;
begin
  for i := 0 to X do ... {something}
end;

Case 3:

function Foo(obj: TObject; j: Integer): TSomeObject;
var i: Integer;
begin
  for i := 0 to X do
    for j := 0 to Y do
      Result := {something};
end;
Question 1: In which case do I need critical sections so there are no problems if >1 threads call it at same time?
Question 2: Will there be a problem if Thread 1 reads X(entry) from file A while Thread 2 writes to X(entry) to file A?
When should I use critical sections? I try to imagine it in my head, but it's hard - only one thread :))
EDIT
Is this going to suit it?
{a class for every 2GB file}
TSpecificFile = class
  cs: TCriticalSection;
  ...
end;

TFileParser = class
  SpecificFile: TSpecificFile; // "file" is a reserved word in Delphi
  procedure ParseThis;
  procedure ParseThat;
  ...
end;

function Read(AFile: TSpecificFile): TSomeObject;
begin
  AFile.cs.Enter;
  try
    ... // read
  finally
    AFile.cs.Leave;
  end;
end;

function Write(AFile: TSpecificFile): TSomeObject;
begin
  AFile.cs.Enter;
  try
    ... // write
  finally
    AFile.cs.Leave;
  end;
end;
Now will there be a problem if two threads call Read with:
case 1: same TSpecificFile
case 2: different TSpecificFile?
Do I need another critical section?
In general, you need a locking mechanism (critical sections are a locking mechanism) whenever multiple threads may access a shared resource at the same time, and at least one of the threads will be writing to / modifying the shared resource.
This is true whether the resource is an object in memory or a file on disk.
And the reason the locking is necessary is that if a read operation happens concurrently with a write operation, the read is likely to obtain inconsistent data, leading to unpredictable behaviour.
Stephen Cheung has mentioned the platform-specific considerations with regard to file handling, and I'll not repeat them here.
As a side note, I'd like to highlight another concurrency concern that may be applicable in your case.
Suppose one thread reads some data and starts processing.
Then another thread does the same.
Both threads determine that they must write a result to position X of File A.
At best the values to be written are the same, and one of the threads effectively did nothing but waste time.
At worst, the calculation of one of the threads is overwritten, and the result is lost.
You need to determine whether this would be a problem for your application. And I must point out that if it is, just locking the read and write operations will not solve it. Furthermore, trying to extend the duration of the locks leads to other problems.
Options
Critical Sections
Yes, you can use critical sections.
You will need to choose the best granularity of the critical sections: One per whole file, or perhaps use them to designate specific blocks within a file.
The decision would require a better understanding of what your application does, so I'm not going to answer for you.
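To make the granularity choice concrete, here is an illustrative Rust sketch of per-file locking (file names made up); per-block granularity would simply key the map on (file, block index) instead:

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

fn main() {
    // One lock per file; threads on different files never contend.
    let locks: HashMap<&str, Arc<Mutex<()>>> = ["a.dat", "b.dat"]
        .iter()
        .map(|&name| (name, Arc::new(Mutex::new(()))))
        .collect();

    std::thread::scope(|s| {
        for name in ["a.dat", "b.dat", "a.dat"] {
            let lock = Arc::clone(&locks[name]);
            s.spawn(move || {
                let _guard = lock.lock().unwrap();
                // read/modify/write `name` while holding its lock
            });
        }
    });
}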
Just be aware of the possibility of deadlocks:
Thread 1 acquires lock A
Thread 2 acquires lock B
Thread 1 desires lock B, but has to wait
Thread 2 desires lock A - causing a deadlock because neither thread is able to release its acquired lock.
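The standard way to rule this out is a global lock order; a Rust sketch (lock names illustrative) where every thread takes A before B, so the circular wait above can never form:

use std::sync::Mutex;

fn main() {
    let lock_a = Mutex::new(0);
    let lock_b = Mutex::new(0);

    std::thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                // Every thread takes A before B, never the reverse,
                // so no thread can hold B while waiting for A.
                let _a = lock_a.lock().unwrap();
                let _b = lock_b.lock().unwrap();
            });
        }
    });
}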
I'm also going to suggest 2 other tools for you to consider in your solution.
Single-Threaded
What a shocking thing to say! But seriously, if your reason to go multi-threaded was "to make the application faster", then you went multi-threaded for the wrong reason. Most people who do that actually end up making their applications more difficult to write, less reliable, and slower!
It is a far too common misconception that multiple threads speed up applications. If a task requires X clock cycles to perform, it will take X clock cycles! Multiple threads don't speed up a task; they permit multiple tasks to be done in parallel. But this can be a bad thing! ...
You've described your application as being highly dependent on reading from disk, parsing what's read, and writing to disk. Depending on how CPU-intensive the parsing step is, you may find that all your threads spend the majority of their time waiting for disk IO operations. In that case, the multiple threads generally only serve to shunt the disk heads to the far 'corners' of your (ummm, round) disk platters. Disk IO is still the bottleneck, and the threads make it behave as if the files are maximally fragmented.
Queueing Operations
Let's suppose your reasons for going multi-threaded are valid, and you do still have threads operating on shared resources. Instead of using locks to avoid concurrency issues, you could queue your shared-resource operations onto specific threads.
So instead of Thread 1:
Reading position X from File A
Parsing the data
Writing to position Y in file A
Create another thread; the FileA thread:
the FileA thread has a queue of instructions
When it gets to the instruction to read position X, it does so.
It sends the data to Thread 1
Thread 1 parses its data --- while FileA thread continues processing instructions
Thread 1 places an instruction to write its result to position Y at the back of FileA thread's queue --- while FileA thread continues to process other instructions.
Eventually the FileA thread will write the data as required by Thread 1.
Synchronization is only needed for shared data that can cause a problem (or an error) if more than one agent is doing something with it.
Obviously the file writing operation should be wrapped in a critical section for that file only if you don't want other writer processes to trample on the new data before the write is completed -- the file may no longer be consistent if half of the new data is modified by another process that does not see the other half of the new data (which hasn't been written out by the original writer process yet). Therefore you'll have a collection of CS's, one for each file. Each CS should be released as soon as you're done writing.
In certain cases, e.g. memory-mapped files or sparse files, the O/S may allow you to write to different portions of the file at the same time. Therefore, in such cases, your CS will have to be on a particular segment of the file. Thus you'll have a collection of CS's (one for each segment) for each file.
If you write to a file and read it at the same time, the reader may get inconsistent data. In some O/S's, reading is allowed to happen simultaneously with a write (perhaps the read comes from cached buffers). However, if you are writing to a file and reading it at the same time, what you read may not be correct. If you need consistent data on reads, then the reader should also be subject to the critical section.
In certain cases, if you are writing to a segment and read from another segment, the O/S may allow it. However, whether this will return correct data usually cannot be guaranteed because there you can't always tell whether two segments of the file may be residing in one disk sector, or other low-level O/S things.
So, in general, the advice is to wrap any file operation in a CS, per file.
Theoretically, you should be able to read simultaneously from the same file, but locking it in a CS will only allow one reader. In that case, you'll need to separate your implementation into "read locks" and "write locks" (similar to a database system). This is highly non-trivial though, as you'll then have to deal with promoting between different levels of locks.
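For what it's worth, some standard libraries already ship this split. A sketch with Rust's std::sync::RwLock (many concurrent readers, one exclusive writer); note it still does not solve lock promotion:

use std::sync::RwLock;

fn main() {
    let data = RwLock::new(vec![0u8; 1024]);

    std::thread::scope(|s| {
        for _ in 0..3 {
            s.spawn(|| {
                let r = data.read().unwrap(); // shared: readers coexist
                let _first = r[0];
            });
        }
        s.spawn(|| {
            let mut w = data.write().unwrap(); // exclusive: one writer
            w[0] = 42;
        });
    });
}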
After note: the kind of thing you're trying to do (reading and writing huge data sets that are GBs in size, simultaneously, in segments) is what is typically done in a database. You should be looking into breaking your data files into database records. Otherwise you either suffer from non-optimized read/write performance due to locking, or you end up re-inventing the relational database.
Conclusion first
You don't need TCriticalSection. You should implement a Queue-based algorithm that guarantees no two threads are working on the same piece of data, without blocking.
How I got to that conclusion
First of all, Windows (Win 7?) will allow you to write to the same file simultaneously as many times as you see fit. I have no idea what it does with the writes, and I'm clearly not saying it's a good idea, but I've just done the following test to prove that Windows allows simultaneous multiple writes to the same file:
I made a thread that opens a file for writing (with "share deny none") and keeps writing random stuff to a random offset for 30 seconds. Here's a pastebin with the code.
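The pastebin isn't reproduced here, but a rough re-creation of the same experiment in Rust (the file name, sizes, and offset formula are all made up): several threads each open their own handle and write at arbitrary offsets concurrently, which the OS accepts:

use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};

fn main() -> std::io::Result<()> {
    let threads: Vec<_> = (0..4u64)
        .map(|t| {
            std::thread::spawn(move || -> std::io::Result<()> {
                // Each thread opens its own handle; the default share
                // mode allows concurrent writers to the same file.
                let mut f = OpenOptions::new()
                    .write(true)
                    .create(true)
                    .open("scratch.bin")?;
                for i in 0..1_000u64 {
                    // Made-up scattered offsets within 1 MB.
                    let off = (i.wrapping_mul(2_654_435_761) ^ t) % (1 << 20);
                    f.seek(SeekFrom::Start(off))?;
                    f.write_all(&i.to_le_bytes())?;
                }
                Ok(())
            })
        })
        .collect();
    for t in threads {
        t.join().unwrap()?;
    }
    Ok(())
}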
Why a TCriticalSection would be bad
A critical section only allows one thread to access the protected resource at any given time. You have two options: only hold the lock for the duration of the read/write operation, or hold the lock for the entire time required to process the given resource. Both have serious problems.
Here's what might happen if a thread holds the lock only for the duration of the read/write operations:
Thread 1 acquires the lock, reads the data, releases the lock
Thread 2 acquires the lock, reads the same data, releases the lock
Thread 1 finishes processing, acquires the lock, writes the data, releases the lock
Thread 2 acquires the lock, writes the data, and here's the oops: Thread 2 has been working on old data, since Thread 1 made changes in the background!
Here's what might happen if a thread holds the lock for the entire round-trip read & write operation:
Thread 1 acquires the lock, starts reading data
Thread 2 tries to acquire the same lock, gets blocked...
Thread 1 finishes reading the data, processes the data, writes the data back to file, releases the lock
Thread 2 acquires the lock and starts processing the same data again!
The Queue solution
Since you're multi-threading, and you can have multiple threads simultaneously processing data from the same file, I assume data is somehow "context free": You can process the 3rd part of a file before processing the 1st. This must be true, because if it's not, you can't multi-thread (or are limited to 1 thread per file).
Before you start processing you can prepare a number of "Jobs", that look like this:
File 'file1.raw', offset 0, 1024 KB
File 'file1.raw', offset 1024 KB, 1024 KB
...
File 'fileN.raw', offset 99999999, 1024 KB
Put all those "jobs" in a queue. Have your threads dequeue one Job from the queue and process it. Since no two jobs overlap, threads don't need to synchronize with each other, so you don't need the critical section. You only need the critical section to protect access to the Queue itself. Windows makes sure threads can read and write to/from the files just fine, as long as they stick to the allocated "Job".
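A sketch of this in Rust (the Job type, file names, and sizes are illustrative): an mpsc channel plus a mutex plays the role of the protected queue, and because no two jobs overlap, the workers never lock the files themselves:

use std::sync::{mpsc, Arc, Mutex};
use std::thread;

struct Job {
    file: &'static str, // which file to work on
    offset: u64,        // where the block starts
    len: u64,           // block size in bytes
}

fn main() {
    let (tx, rx) = mpsc::channel::<Job>();
    // The queue is the only shared, locked structure.
    let rx = Arc::new(Mutex::new(rx));

    let workers: Vec<_> = (0..4)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                // Dequeue jobs until the queue is closed and drained.
                while let Ok(job) = { let r = rx.lock().unwrap(); r.recv() } {
                    // Here: read `job.len` bytes at `job.offset` of
                    // `job.file`, parse, write the result back. No two
                    // jobs overlap, so no per-file locking is needed.
                    let _ = (job.file, job.offset, job.len);
                }
            })
        })
        .collect();

    // Prepare non-overlapping 1024 KB jobs, as in the list above.
    for offset in (0..8u64 * 1024 * 1024).step_by(1024 * 1024) {
        tx.send(Job { file: "file1.raw", offset, len: 1024 * 1024 }).unwrap();
    }
    drop(tx); // close the queue; workers exit when it drains

    for w in workers {
        w.join().unwrap();
    }
}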
I am going to implement a program where one parent process reads a text file and feeds the data it's reading into a shared memory buffer that's going to be read by some child processes. All this work will be mediated by semaphores. Let's assume the parent is going to read one character at a time from the file and the shared memory buffer contains 5 slots.
At first, I thought of only having 2 semaphores:
writeSemaphore, initialized to 5, is the semaphore that tells whether the writer is allowed to write to the buffer. When it finally goes down to 0, the parent process will be blocked until one of the children unlocks it (after having read some block).
readSemaphore, initialized to 0, is the semaphore that tells whether any of the readers is allowed to read from the buffer.
But now that I think of it, this wouldn't prevent 2 consumers from accessing the shared memory at the same time. I must prevent that. So I introduced a third semaphore:
allowedToRead, which is either 1 or 0, and allows or blocks access by the child processes.
Here is pseudo code for both children and parent:
Child:

while (something) {
    wait(readSemaphore)
    wait(allowedToRead)
    <<read from shared memory>>
    post(allowedToRead)
    post(writeSemaphore)
}

Parent:

while (something) {
    wait(writeSemaphore)
    <<write to shared memory>>
    post(allowedToRead)
}
Is my reasoning correct?
Thanks
Khachik is half right. He may be entirely right, but his description isn't as clear as it could be.
Firstly, where you have the parent posting allowedToRead you probably mean for it to post readSemaphore.
Secondly, your code allows the parent to write at the same time as a child is reading. You say you have 5 slots. If the parent writes to a different slot than the child is reading then this is OK, I suppose, but how does the child determine where to read? Is it using the same variables the parent uses to determine where to write? You probably need some extra protection there. After all, I assume the different children are all reading different slots, so if you need to prevent them treading on each other's toes, you'll need to do the same for the parent too.
Thirdly, I'd have used a mutex instead of a semaphore for allowedToRead.
Fourthly, what determines which child reads which data or is it meant to be first come first served like pigs at a slop bucket?
If the shared memory has 5 independent slots, then I'd be inclined to add a "next read" and a "next write" variable. Protect those two variables with a mutex in both the producer and the consumers, and then use the semaphores just to block/trigger reading and writing as you are already doing. If it weren't a school exercise, you could do better using a single condition variable attached to the mutex I mentioned. When it gets signalled, the parent checks whether it can write and the children check whether they can read. When either a read or a write occurs, signal the condition variable globally to wake everybody up to check their conditions. This has the advantage that if you have independent buffer slots you can safely and happily have multiple consumers consuming at the same time.
No.
the writer should release readSemaphore when it writes one unit of information;
the writer should acquire the allowedToRead lock (a 0/1 semaphore is a lock/mutex) before writing to the shared memory, to prevent race conditions.
To simplify: consider two functions, read_shared_memory and write_shared_memory, which read and write from/to the shared memory respectively, each acquiring/releasing the same lock around the access.
The producer acquires the write semaphore, calls the write function, and releases the read semaphore.
The consumer acquires the read semaphore, calls the read function, and releases the write semaphore.
Sure, this can be implemented without read/write functions; they just simplify making access to the shared memory atomic. A critical section can be implemented inside the produce/consume loops without additional functions.
Wikipedia describes it in a more scientific way :)
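To tie the corrections together, a sketch of the fixed protocol in Rust, with two stated liberties: std has no semaphore type, so a minimal counting semaphore is built from Mutex + Condvar, and the buffer's own mutex stands in for allowedToRead:

use std::sync::{Condvar, Mutex};

struct Semaphore {
    count: Mutex<usize>,
    cond: Condvar,
}

impl Semaphore {
    fn new(n: usize) -> Self {
        Semaphore { count: Mutex::new(n), cond: Condvar::new() }
    }
    fn wait(&self) {
        let mut c = self.count.lock().unwrap();
        while *c == 0 {
            c = self.cond.wait(c).unwrap(); // block until a post()
        }
        *c -= 1;
    }
    fn post(&self) {
        *self.count.lock().unwrap() += 1;
        self.cond.notify_one();
    }
}

fn main() {
    const SLOTS: usize = 5;
    let write_sem = Semaphore::new(SLOTS); // counts free slots
    let read_sem = Semaphore::new(0);      // counts filled slots
    // The buffer's own mutex plays allowedToRead's role.
    let buffer = Mutex::new([0u8; SLOTS]);

    std::thread::scope(|s| {
        // Producer: claim a free slot, write, then post the READ
        // semaphore (the fix suggested above), not allowedToRead.
        s.spawn(|| {
            for i in 0..20u8 {
                write_sem.wait();
                buffer.lock().unwrap()[i as usize % SLOTS] = i;
                read_sem.post();
            }
        });
        // Two consumers: claim a filled slot, read under the lock,
        // then free the slot for the producer.
        for _ in 0..2 {
            s.spawn(|| {
                for _ in 0..10 {
                    read_sem.wait();
                    let _value = buffer.lock().unwrap()[0]; // read under the lock
                    write_sem.post();
                }
            });
        }
    });
}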