Here's the deal. My app has a lot of threads that do the same thing - read specific data from huge files(>2gb), parse the data and eventually write to that file.
Problem is that sometimes it could happen that one thread reads X from file A and second thread writes to X of that same file A. A problem would occur?
The I/O code uses TFileStream for every file. I split the I/O code to be local(static class), because I'm afraid there will be a problem. Since it's split, there should be critical sections.
Every case below is local(static) code that is not instaniated.
Case 1:
procedure Foo(obj:TObject);
begin ... end;
Case 2:
procedure Bar(obj:TObject);
var i: integer;
begin
for i:=0 to X do ...{something}
end;
Case 3:
function Foo(obj:TObject; j:Integer):TSomeObject
var i:integer;
begin
for i:=0 to X do
for j:=0 to Y do
Result:={something}
end;
Question 1: In which case do I need critical sections so there are no problems if >1 threads call it at same time?
Question 2: Will there be a problem if Thread 1 reads X(entry) from file A while Thread 2 writes to X(entry) to file A?
When should I use critical sections? I try to imagine it my head, but it's hard - only one thread :))
EDIT
Is this going to suit it?
{a class for every 2GB file}
TSpecificFile = class
cs: TCriticalSection;
...
end;
TFileParser = class
file :TSpecificFile;
void Parsethis; void ParseThat....
end;
function Read(file: TSpecificFile): TSomeObject;
begin
file.cs.Enter;
try
...//read
finally
file.cs.Leave;
end;
end;
function Write(file: TSpecificFile): TSomeObject;
begin
file.cs.Enter;
try
//write
finally
file.cs.Leave
end;
end;
Now will there be a problem if two threads call Read with:
case 1: same TSpecificFile
case 2: different TSpecificFile?
Do i need another critical section?
In general, you need a locking mechanism (critical sections are a locking mechanism) whenever multiple threads may access a shared resource at the same time, and at least one of the threads will be writing to / modifying the shared resource.
This is true whether the resource is an object in memory or a file on disk.
And the reason that the locking is necessary is that, is that if a read operation happens concurrently with a write operation, the read operation is likely to obtain inconsistent data leading to unpredictable behaviour.
Stephen Cheung has mentioned the platform specific considerations with regards file handling, and I'll not repeat them here.
As a side note, I'd like to highlight another concurrency concern that may be applicable in your case.
Suppose one thread reads some data and starts processing.
Then another thread does the same.
Both threads determine that they must write a result to position X of File A.
At best the values to be written are the same, and one of the threads effectively did nothing but waste time.
At worst, the calculation of one of the threads is overwritten, and the result is lost.
You need to determine whether this would be a problem for your application. And I must point out that if it is, just locking the read and write operations will not solve it. Furthermore, trying to extend the duration of the locks leads to other problems.
Options
Critical Sections
Yes, you can use critical sections.
You will need to choose the best granularity of the critical sections: One per whole file, or perhaps use them to designate specific blocks within a file.
The decision would require a better understanding of what your application does, so I'm not going to answer for you.
Just be aware of the possibility of deadlocks:
Thread 1 acquires lock A
Thread 2 acquires lock B
Thread 1 desires lock B, but has to wait
Thread 2 desires lock A - causing a deadlock because neither thread is able to release its acquired lock.
I'm also going to suggest 2 other tools for you to consider in your solution.
Single-Threaded
What a shocking thing to say! But seriously, if your reason to go multi-threaded was "to make the application faster", then you went multi-threaded for the wrong reason. Most people who do that actually end up making their applications, more difficult to write, less reliable, and slower!
It is a far too common misconception that multiple threads speed up applications. If a task requires X clock-cycles to perform - it will take X clock-cycles! Multiple threads don't speed up tasks, it permits multiple tasks to be done in parallel. But this can be a bad thing! ...
You've described your application as being highly dependent on reading from disk, parsing what's read and writing to disk. Depending on how CPU intensive the parsing step is, you may find that all your threads are spending the majority of their time waiting for disk IO operations. In which case, the multiple threads generally only serve to shunt the disk heads to the far 'corners' of your (ummm round) disk platters. Disk IO is still the bottle-neck, and the threads make it behave as if the files are maximally fragmented.
Queueing Operations
Let's suppose your reason for going multi-threaded are valid, and you do still have threads operating on shared resources. Instead of using locks to avoid concurrency issues, you could queue your shared resource operations onto specific threads.
So instead of Thread 1:
Reading position X from File A
Parsing the data
Writing to position Y in file A
Create another thread; the FileA thread:
the FileA has a queue of instructions
When it gets to the instruction to read position X, it does so.
It sends the data to Thread 1
Thread 1 parses its data --- while FileA thread continues processing instructions
Thread 1 places an instruction to write its result to position Y at the back of FileA thread's queue --- while FileA thread continues to process other instructions.
Eventually FileA thread will write the data as required by Trhead 1.
Synchronization is only needed for shared data that can cause a problem (or an error) if more than one agent is doing something with it.
Obviously the file writing operation should be wrapped in a critical section for that file only if you don't want other writer processes to trample on the new data before the write is completed -- the file may no long be consistent if you have half of the new data modified by another process that does not see the other half of the new data (that hasn't been written out by the original writer process yet). Therefore you'll have a collection of CS's, one for each file. That CS should be released asap when you're done with writing.
In certain cases, e.g. memory-mapped files or sparse files, the O/S may allow you to write to different portions of the file at the same time. Therefore, in such cases, your CS will have to be on a particular segment of the file. Thus you'll have a collection of CS's (one for each segment) for each file.
If you write to a file and read it at the same time, the reader may get inconsistent data. In some O/S's, reading is allowed to happen simultaneously with a write (perhaps the read comes from cached buffers). However, if you are writing to a file and reading it at the same time, what you read may not be correct. If you need consistent data on reads, then the reader should also be subject to the critical section.
In certain cases, if you are writing to a segment and read from another segment, the O/S may allow it. However, whether this will return correct data usually cannot be guaranteed because there you can't always tell whether two segments of the file may be residing in one disk sector, or other low-level O/S things.
So, in general, the advise is to wrap any file operation in a CS, per file.
Theoretically, you should be able to read simultaneously from the same file, but locking it in a CS will only allow one reader. In that case, you'll need to separate your implementation into "read locks" and "write locks" (similar to a database system). This is highly non-trivial though as you'll then have to deal with promoting different levels of locks.
After note: The kind of thing you're trying to data (reading and writing huge data sets that are GB's in size simultaneously in segments) is what is typically done in a database. You should be looking into breaking your data files into database records. Otherwise, you either suffer from non-optimized read/write performance due to locking, or you end up re-inventing the relational database.
Conclusion first
You don't need TCriticalSection. You should implement a Queue-based algorithm that guarantees no two threads are working on the same piece of data, without blocking.
How I got to that conclusion
First of all Windows (Win 7?) will allow you to simultaneously write to a file as many times as you see fit. I have no idea what it does with the writes, and I'm clearly not saying it's a good idea, but I've just done the following test to prove Windows allows simultaneous multiple writes to the same file:
I made a thread that opens a file for writing (with "share deny none") and keeps writing random stuff to a random offset for 30 seconds. Here's a pastebin with the code.
Why a TCriticalSection would be bad
A critical section only allows one thread to access the protect resource at any given time. You have two options: Only hold the lock for the duration of the read/write operation, or hold the lock for the entire time required to process the given resource. Both have serious problems.
Here's what might happen if a thread holds the lock only for the duration of the read/write operations:
Thread 1 acquires the lock, reads the data, releases the lock
Thread 2 acquires the lock, reads the same data, releases the lock
Thread 1 finishes processing, acquires the lock, writes the data, releases the lock
Thread 2 acquires the lock, writes the data, and here's the oops: Thread 2 has been working on old data, since Thread 1 made changes in the background!
Here's what might happen if a thread holds the lock for the entire round-trim read & write operation:
Thread 1 acquires the lock, starts reading data
Thread 2 tries to acquire the same lock, gets blocked...
Thread 1 finishes reading the data, processes the data, writes the data back to file, releases the lock
Thread 2 acquires the lock and starts processing the same data again !
The Queue solution
Since you're multi-threading, and you can have multiple threads simultaneously processing data from the same file, I assume data is somehow "context free": You can process the 3rd part of a file before processing the 1st. This must be true, because if it's not, you can't multi-thread (or are limited to 1 thread per file).
Before you start processing you can prepare a number of "Jobs", that look like this:
File 'file1.raw', offset 0, 1024 Kb
File 'file1.raw', offset 1024, 1024 kb.
...
File 'fileN.raw', offset 99999999, 1024 kb
Put all those "jobs" in a queue. Have your threads dequeue one Job from the queue and process it. Since no two jobs overlap, threads don't need to synchronize with each other, so you don't need the critical section. You only need the critical section to protect access to the Queue itself. Windows makes sure threads can read and write to/from the files just fine, as long as they stick to the allocated "Job".
Related
I have a vector of entities. At update cycle I iterate through vector and update each entity: read it's position, calculate current speed, write updated position. Also, during updating process I can change some other objects in other part of program, but each that object related only to current entity and other entities will not touch that object.
So, I want to run this code in threads. I separate vector into few chunks and update each chunk in different threads. As I see, threads are fully independent. Each thread on each iteration works with independent memory regions and doesn't affect other threads work.
Do I need any locks here? I assume, that everything should work without any mutexes, etc. Am I right?
Short answer
No, you do not need any lock or synchronization mechanism as your problem appear to be a embarrassingly parallel task.
Longer answer
A race conditions that can only appear if two threads might access the same memory at the same time and at least one of the access is a write operation. If your program exposes this characteristic, then you need to make sure that threads access the memory in an ordered fashion. One way to do it is by using locks (it is not the only one though). Otherwise the result is UB.
It seems that you found a way to split the work among your threads s.t. each thread can work independently from the others. This is the best case scenario for concurrent programming as it does not require any synchronization. The complexity of the code is dramatically decreased and usually speedup will jump up.
Please note that as #acelent pointed out in the comment section, if you need changes made by one thread to be visible in another thread, then you might need some sort of synchronization due to the fact that depending on the memory model and on the HW changes made in one thread might not be immediately visible in the other.
This means that you might write from Thread 1 to a variable and after some time read the same memory from Thread 2 and still not being able to see the write made by Thread 1.
"I separate vector into few chunks and update each chunk in different threads" - in this case you do not need any lock or synchronization mechanism, however, the system performance might degrade considerably due to false sharing depending on how the chunks are allocated to threads. Note that the compiler may eliminate false sharing using thread-private temporal variables.
You can find plenty of information in books and wiki. Here is some info https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads
Also there is a stackoverflow post here does false sharing occur when data is read in openmp?
In a multi-threading environment, isn’t it that every operation on the RAM must be synchronized?
Let’s say, I have a variable, which is a pointer to another memory address:
foo 12345678
Now, if one thread sets that variable to another memory address (let’s say 89ABCDEF), meanwhile the first thread reads the variable, couldn’t it be that the first thread reads totally trash from the variable if access wouldn’t be synchronized (on some system level)?
foo 12345678 (before)
89ABCDEF (new data)
••••• (writing thread progress)
89ABC678 (memory content)
Since I never saw those things happen I assume that there is some system level synchronization when writing variables. I assume, that this is why it is called an ‘atomic’ operation. As I found here, this problem is actually a topic and not totally fictious from me.
On the other hand, I read everywhere that synchronizing has a significant impact on performance. (Aside from threads that must wait bc. they cannot enter the lock; I mean just the action of locking and unlocking.) Like here:
synchronized adds a significant overhead to the methods […]. These operations are quite expensive […] it has an extreme impact on the program performance. […] the expensive synchronized operations that cause the code to be so terribly slow.
How does this go together? Why is locking for changing a variable unnoticeable fast, but locking for anything else so expensive? Or, is it equally expensive, and there should be a big warning sign when using—let’s say—long and double because they always implicitly require synchronization?
Concerning your first point, when a processor writes some data to memory, this data is always properly written and cannot be "trashed" by other writes by threads processes, OS, etc. It is not a matter of synchronization, just required to insure proper hardware behaviour.
Synchronization is a software concept that requires hardware support. Assume that you just want to acquire a lock. It is supposed to be free when at 0 et locked when at 1.
The basic method to do that is
got_the_lock=0
while(!got_the_lock)
fetch lock value from memory
set lock value in memory to 1
got_the_lock = (fetched value from memory == 0)
done
print "I got the lock!!"
The problem is that if other threads do the same thing at the same time and read lock value before it has been set to 1, several threads may think they got the lock.
To avoid that, one need atomic memory access. An atomic access is typically a read-modify-write cycle to a data in memory that cannot interrupted and that forbids access to this information until completion. So not all accesses are atomic, only specific read-modify-write operation and it is realized thanks tp specific processor support (see test-and-set or fetch-and-add instructions, for instance). Most accesses do not need it and can be a regular access. Atomic access is mostly use to synchronize threads to insure that only one thread is in a critical section.
So why are atomic access expensive ? There are several reasons.
The first one is that one must ensure a proper ordering of instructions. You probably know that instruction order may be different from instruction program order, provided the semantic of the program is respected. This is heavily exploited to improve performances : compiler reorder instructions, processor execute them out-of-order, write-back caches write data in memory in any order, and memory write buffer do the same thing. This reordering can lead to improper behavior.
1 while (x--) ; // random and silly loop
2 f(y);
3 while(test_and_set(important_lock)) ; //spinlock to get a lock
4 g(z);
Obviously instruction 1 is not constraining and 2 can be executed before (and probably 1 will be removed by an optimizing compiler). But if 4 is executed before 3, the behavior will not be as expected.
To avoid that, an atomic access flushes the instruction and memory buffer that requires tens of cycles (see memory barrier).
Without pipeline, you pay the full latency of the operation: read data from memory, modify it and write it back. This latency always happens, but for regular memory accesses you can do other work during that time that largely hides the latency.
An atomic access requires at least 100-200 cycles on modern processors and is accordingly extremely expensive.
How does this go together? Why is locking for changing a variable unnoticeable fast, but locking for anything else so expensive? Or, is it equally expensive, and there should be a big warning sign when using—let’s say—long and double because they always implicitly require synchronization?
Regular memory access are not atomic. Only specific synchronization instructions are expensive.
Synchronization always has a cost involved. And the cost increases with contention due to threads waking up, fighting for lock and only one gets it, and the rest go to sleep resulting in lot of context switches.
However, such contention can be kept at a minimum by using synchronization at a much granular level as in a CAS (compare and swap) operation by CPU, or a memory barrier to read a volatile variable. A far better option is to avoid synchronization altogether without compromising safety.
Consider the following code:
synchronized(this) {
// a DB call
}
This block of code will take several seconds to execute as it is doing a IO and therefore run high chance of creating a contention among other threads wanting to execute the same block. The time duration is enough to build up a massive queue of waiting threads in a busy system.
This is the reason the non-blocking algorithms like Treiber Stack Michael Scott exist. They do a their tasks (which we'd otherwise do using a much larger synchronized block) with the minimum amount of synchronization.
isn’t it that every operation on the RAM must be synchronized?
No. Most of the "operations on RAM" will target memory locations that are only used by one thread. For example, in most programming languages, None of a thread's function arguments or local variables will be shared with other threads; and often, a thread will use heap objects that it does not share with any other thread.
You need synchronization when two or more threads communicate with one another through shared variables. There are two parts to it:
mutual exclusion
You may need to prevent "race conditions." If some thread T updates a data structure, it may have to put the structure into a temporary, invalid state before the update is complete. You can use mutual exclusion (i.e., mutexes/semaphores/locks/critical sections) to ensure that no other thread U can see the data structure when it is in that temporary, invalid state.
cache consistency
On a computer with more than one CPU, each processor typically has its own memory cache. So, when two different threads running on two different processors both access the same data, they may each be looking at their own, separately cached copy. Thus, when thread T updates that shared data structure, it is important to ensure that all of the variables it updated make it into thread U's cache before thread U is allowed to see any of them.
It would totally defeat the purpose of the separate caches if every write by one processor invalidated every other processor's cache, so there typically are special hardware instructions to do that only when it's needed, and typical mutex/lock implementations execute those instructions on entering or leaving a protected block of code.
I've been reading about semaphores and came across this article:
www.csc.villanova.edu/~mdamian/threads/posixsem.html
So, this page states that if there are two threads accessing the same data, things can get ugly. The solution is to allow only one thread to access the data at the same time.
This is clear and I understand the solution, only why would anyone need threads to do this? What is the point? If the threads are blocked so that only one can execute, why use them at all? There is no advantage. (or maybe this is a just a dumb example; in such a case please point me to a sensible one)
Thanks in advance.
Consider this:
void update_shared_variable() {
sem_wait( &g_shared_variable_mutex );
g_shared_variable++;
sem_post( &g_shared_variable_mutex );
}
void thread1() {
do_thing_1a();
do_thing_1b();
do_thing_1c();
update_shared_variable(); // may block
}
void thread2() {
do_thing_2a();
do_thing_2b();
do_thing_2c();
update_shared_variable(); // may block
}
Note that all of the do_thing_xx functions still happen simultaneously. The semaphore only comes into play when the threads need to modify some shared (global) state or use some shared resource. So a thread will only block if another thread is trying to access the shared thing at the same time.
Now, if the only thing your threads are doing is working with one single shared variable/resource, then you are correct - there is no point in having threads at all (it would actually be less efficient than just one thread, due to context switching.)
When you are using multithreading not everycode that runs will be blocking. For example, if you had a queue, and two threads are reading from that queue, you would make sure that no thread reads at the same time from the queue, so that part would be blocking, but that's the part that will probably take the less time. Once you have retrieved the item to process from the queue, all the rest of the code can be run asynchronously.
The idea behind the threads is to allow simultaneous processing. A shared resource must be governed to avoid things like deadlocks or starvation. If something can take a while to process, then why not create multiple instances of those processes to allow them to finish faster? The bottleneck is just what you mentioned, when a process has to wait for I/O.
Being blocked while waiting for the shared resource is small when compared to the processing time, this is when you want to use multiple threads.
This is of course a SSCCE (Short, Self Contained, Correct Example)
Let's say you have 2 worker threads that do a lot of work and write the result to a file.
you only need to lock the file (shared resource) access.
The problem with trivial examples....
If the problem you're trying to solve can be broken down into pieces that can be executed in parallel then threads are a good thing.
A slightly less trivial example - imagine a for loop where the data being processed in each iteration is different every time. In that circumstance you could execute each iteration of the for loop simultaneously in separate threads. And indeed some compilers like Intel's will convert suitable for loops to threads automatically for you. In that particular circumstances no semaphores are needed because of the iterations' data independence.
But say you were wanting to process a stream of data, and that processing had two distinct steps, A and B. The threadless approach would involve reading in some data then doing A then B and then output the data before reading more input. Or you could have a thread reading and doing A, another thread doing B and output. So how do you get the interim result from the first thread to the second?
One way would be to have a memory buffer to contain the interim result. The first thread could write the interim result to a memory buffer and the second could read from it. But with two threads operating independently there's no way for the first thread to know if it's safe to overwrite that buffer, and there's no way for the second to know when to read from it.
That's where you can use semaphores to synchronise the action of the two threads. The first thread takes a semaphore that I'll call empty, fills the buffer, and then posts a semaphore called filled. Meanwhile the second thread will take the filled semaphore, read the buffer, and then post empty. So long as filled is initialised to 0 and empty is initialised to 1 it will work. The second thread will process the data only after the first has written it, and the first won't write it until the second has finished with it.
It's only worth it of course if the amount of time each thread spends processing data outweighs the amount of time spent waiting for semaphores. This limits the extent to which splitting code up into threads yields a benefit. Going beyond that tends to mean that the overall execution is effectively serial.
You can do multithreaded programming without semaphores at all. There's the Actor model or Communicating Sequential Processes (the one I favour). It's well worth looking up JCSP on Wikipedia.
In these programming styles data is shared between threads by sending it down communication channels. So instead of using semaphores to grant another thread access to data it would be sent a copy of that data down something a bit like a network socket, or a pipe. The advantage of CSP (which limits that communication channel to send-finishes-only-if-receiver-has-read) is that it stops you falling into the many many pitfalls that plague multithreaded do programs. It sounds inefficient (copying data is inefficient), but actually it's not so bad with Intel's QPI architecture, AMD's Hypertransport. And it means hat the 'channel' really could be a network connection; scalability built in by design.
I am going to implement a program where one parent process reads a text file and feeds the data he's reading into a shared memory buffer that's going to be read by some children processes. All this work will be mediated by semaphores. Let's assume the parent is going to read one character at a time from the file and the shared memory buffer contains 5 slots.
At first, I thought of only having 2 semaphores:
writeSemaphore, initialized to 5, is the semaphore that tells whether the writer is allowed to write to the buffer. when it finally goes down to 0, the parent process will be blocked until one of the children unlocks it (after having read some block).
readSemaphore, initialized to 0, is the semaphore that tells whether any of the readers is allowed to read from the buffer.
But now that I think of it, this wouldn't prevent me from having 2 consumers accessing the the shared memory at the same time. I must prevent it. So I introduced a third semaphore:
allowedToRead that is either 1 or 0, that allows or blocks access to the children processes.
Here is pseudo code for both children and parent:
Child:
while (something) {
wait(readSemaphore)
wait(allowedToRead)
<<read from shared memory>>
post(allowedToRead)
post(writeSemaphore)
}
Parent:
while (something) {
wait(writeSemaphore)
<<writes to shared memory>>
post(allowedToRead)
}
Is my reasoning correct?
Thanks
Khachik is half right. He's may be all right, but his description isn't as clear as it could be.
Firstly, where you have the parent posting allowedToRead you probably mean for it to post readSemaphore.
Secondly your code allows the parent to write at the same time as a child is reading. You say you have 5 slots. If the parent writes to a different slot than the child is reading then this is ok I suppose, but how does the child determine where to read? Is it using the same variables as the parent is using to determine where to write? You probably need some extra protection there. After all I assume the different children are all reading different slots, so if you need to prevent them treading one ach other's toes you'll need to do the same for the parent too.
Thirdly, I'd have used a mutex instead of a semaphore for allowedToRead.
Fourthly, what determines which child reads which data or is it meant to be first come first served like pigs at a slop bucket?
If the shared memory has 5 independant slots, then I'd be inclined to add a "next read" and "next write" variable. Protect those two variables with a mutex in both producer and consumers, and then use the semaphores just to block/trigger reading and writing as you are already doing. If it weren't a school exercise, you could do better using a single condition variable attached to the mutex I mentioned. When it gets signalled the parent checks if he can write and the children check if they can read. When either a read or a write occurs, signal the condition variable globally to wake everybody up to check their conditions. This has the advantage that if you have independant buffer slots then you can safely and happily have multiple consumers consuming at the same time.
No.
the writer should release readSemaphore when it write one unit of information;
the writer should acquire allowedToRead lock (0,1 semaphore is a lock/mutex) before writing to shared memory to prevent race conditions.
To simplify: consider two functions read_shared_memory, write_shared_memory, which are to read and write from/to the shared memory respectively and both acquiring/releasing the same lock before reading/writing.
The producer acquires write semaphore, calls the write function, releases the read semaphore.
The consumer acquire read semaphore, calls the read function, releases the the write semaphore.
Sure this can be implemented without read/write functions, they are just to simplify using atomic access to the shared memory. A critical section can be implemented inside produce/consume loops without additional functions.
Wikipedia describes it in more scientific way :)
I am writing a program where there is an object shared by multiple threads:
A) Multiple write threads write to the object (all running the same
function)
B) A read thread which accesses the object every 5 seconds
C) A read thread which accesses the object there is a user request
It is obviously necessary to lock the object when writing to it, as we do not want multiple threads to write to the object at the same time.
My questions are:
Is it also necessary to lock the object when reading from it?
Am I correct to think that if we just lock the object when writing, a critical section is enough; but if we lock the object when reading or writing, a mutex is necessary?
I am asking this question because in Microsoft Office, it is not possible for two instances of Word to access a document in read/write access mode; but while the document is being opened in read/write mode, it is possible to open another instance of Word to access the document in read only mode. Would the same logic apply in threading?
As Ofir already wrote - if you try to read data from an object that some other thread is modyfying - you could get data in some inconsistent state.
But - if you are sure the object is not being modified, you can of course read it from multiple threads. In general, the question you are asking is more or less the Readers-writers problem - see http://en.wikipedia.org/wiki/Readers-writers_problem
Lastly - a critical section is an abstract term and can be implemented using a mutex or a monitor. The syntax sugar for a critical section in java or C# (synchronized, lock) use a monitor under the covers.
Is it also necessary to lock the object when reading from it?
If something else could write to it at the same time - yes. If only another read could occur - no. In your circumstances, I would say - yes.
Am I correct to think that if we just lock the object when writing, a
critical section is enough; but if we
lock the object when reading or
writing, a mutex is necessary?
No, you can use a critical section for both, other things being equal. Mutexes have added features over sections (named mutexes can be used from multiple processes, for example), but I don't think you need such features here.
It is necessary, because otherwise (unless operations are atomic) you may be reading an intermediate state.
You may want to allow multiple readers at the same time which requires a (bit) more complex kind of lock.
depends on how you use and read it. if your read is atomic (i.e, won't be interrupted by write) and the read thread does not have dependency with the write threads, then you maybe able to skip read lock. But if your 'read' operation takes some time and takes heavy object interation, then you should lock it for read.
if your reading does not take a very long time (i.e., won't delay the write threads too long), critical section should be enough.
locking is only needed when two processes can change the same database table elements.
when you want to read data it is always secure. you read data of a consistent database. the process changing the data has a shadow version which is consistent and will override current data when you save it. but if you are running a reading process which is depending on critical value from database elements you should look for locks which indicates those values are likely to be altered. so your reading is delayed until the lock is gone.