Why don't thread locks render multithreading useless?

I understand that thread locks can be beneficial in some cases, but in others they make no sense. The case I am talking about is shared data. Suppose you have an array, and you want each thread to sort a random range within the array, then write the sorted values back into that range, and keep doing that until the whole array is sorted (not practical; this is actually a school project I need to do, but it also serves as an example of what I don't understand).
array = {9, 7, 5, 3, 2, 4, 1}
// thread 1 sorts indices 1-3
// thread 2 sorts indices 3-6
thread1_result = {3, 5, 7}
thread2_result = {1, 2, 3, 4}
//update array
array = {9, 3, 5, 1, 2, 3, 4}
The above pseudocode is what would happen if two threads read from the array and sorted their respective ranges at the same time. Since they are reading from the same version of the array, they do not take into account the changes the other thread is about to make. This results in the 7 being lost and the 3 being duplicated.
The way to prevent this is to lock the array so that only a single thread at a time can read from it, sort its given range, and write the result back. Here's my big problem: doesn't that completely nullify the reason for using threads? Locking the array so that the threads take turns operating on it turns the program from a multithreaded solution into a sequential one, because, again, the array is only ever being touched by one thread at any given time. What's the point of using threads? Is there a solution I am not seeing where multiple threads can work on the array at the same time without data being lost?

Locking is used to prevent concurrent access to shared data, which can result in data inconsistencies.
A race condition appears when several threads access and manipulate the same data concurrently and the outcome of the execution depends on the order in which the accesses and updates take place. This is exactly what happened in your described scenario.
Synchronization is what's needed here: with all the multithreading in the world, if you can't sync up your threads, it's all for nothing. Semaphores and mutexes are the standard tools for synchronization, and they in turn bring new problems such as deadlock and starvation.
Lock-free multithreading is extremely hard to achieve; if it were easily obtainable, everybody would be using it. This answer discusses what it takes to achieve lock-free multithreading, its observations, and its challenges. Hope this helps!
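To make the last part of the question concrete: there is such a solution, and it's the standard one. A minimal Java sketch (my own illustration, not from the original post): if the ranges are disjoint, each thread owns its slice exclusively, so no lock is needed at all and the sorts run fully in parallel. A lock only becomes necessary when ranges can overlap, and even then it only has to be held for the short read and write-back, not for the whole sort.

import java.util.Arrays;

public class DisjointRangeSort {
    public static void main(String[] args) throws InterruptedException {
        int[] array = {9, 7, 5, 3, 2, 4, 1};

        // Disjoint ranges: each thread has exclusive ownership of its
        // slice, so no lock is needed and both sorts run in parallel.
        Thread t1 = new Thread(() -> Arrays.sort(array, 0, 3)); // indices 0-2
        Thread t2 = new Thread(() -> Arrays.sort(array, 3, 7)); // indices 3-6
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(Arrays.toString(array)); // [5, 7, 9, 1, 2, 3, 4]
    }
}

This is the usual answer in practice: partition the data so threads rarely touch the same memory, and keep whatever critical sections remain as short as possible.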

Related

Which one should I use in Clojure? go block or thread?

I want to see the intrinsic difference between a thread and a long-running go block in Clojure. In particular, I want to figure out which one I should use in my context.
I understand that if one creates a go block, it is scheduled to run on a so-called thread pool, whose default size is 8, whereas thread creates a new thread.
In my case, there is an input stream that takes values from somewhere, and each value is taken as an input. Some calculations are performed and the result is inserted into a result channel. In short, we have an input and an output channel, and the calculation is done in a loop. To achieve concurrency, I have two choices: use a go block or use thread.
I wonder what is the intrinsic difference between these two. (We may assume there is no I/O during the calculations.) The sample code looks like the following:
(go-loop []
  (when-let [input (<! input-stream)]
    ... ; calculations here
    (>! result-chan result)
    (recur))) ; recur inside when-let, so the loop stops when the channel closes

(thread
  (loop []
    (when-let [input (<!! input-stream)]
      ... ; calculations here
      (put! result-chan result)
      (recur)))) ; same fix: stop looping once input-stream is closed
I realize that the number of threads that can actually run simultaneously is the number of CPU cores. In that case, do go blocks and threads show no difference if I am creating more than 8 threads or go blocks?
I could try to measure the performance difference on my own laptop, but the production environment is quite different from the simulated one, so I could draw no conclusions.
By the way, the calculation is not so heavy. If the inputs are not so large, 8,000 loops can be run in 1 second.
Another consideration is whether go-block vs thread will have an impact on GC performance.
There are a few things to note here.
Firstly, the pool that clojure.core.async/thread creates threads on is what is known as a cached thread pool: although it re-uses recently used threads, it is essentially unbounded, which of course means it could hog a lot of system resources if left unchecked.
But given that what you're doing inside each asynchronous process is very lightweight, threads seem a little overkill to me. It's also important to take into account the quantity of items you expect to hit the input stream: if this number is large, you could overwhelm core.async's fixed-size pool for go blocks, to the point where go blocks sit waiting for a thread to become available.
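The distinction is easiest to see with the analogous Java executors (a sketch purely for illustration; core.async's actual internals differ in detail):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Pools {
    // Analogue of core.async/thread: a cached pool that re-uses idle
    // threads but creates a new one whenever all existing threads are
    // busy, so it is effectively unbounded.
    static final ExecutorService threadLike = Executors.newCachedThreadPool();

    // Analogue of the go-block dispatcher: a small fixed pool (core.async
    // defaults to 8), so a long-running or blocking task here can starve
    // every other go block waiting for a slot.
    static final ExecutorService goLike = Executors.newFixedThreadPool(8);
}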
You also didn't mention precisely where you're getting the input values from: are the inputs some fixed data set that remains constant at the start of the program, or are they continuously fed into the input stream from some source over time?
If it's the former, then I would suggest you lean more towards transducers, and I would argue that a CSP model isn't a good fit for your problem, since you aren't modelling communication between separate components in your program; you're just processing data in parallel.
If it's the latter, then I presume you have some other process that's listening to the result channel and doing something important with those results, in which case your usage of go blocks is perfectly acceptable.

What multithreading based data structure should I use?

I have recently come across a question about multithreading. I was given a situation where there will be a variable number of cars constantly changing their locations. There are also multiple users posting requests to get the location of any car at any moment. What data structure would handle this situation, and why?
You could use a mutex (one per car).
Lock: before changing location of the associated car
Unlock: after changing location of the associated car
Lock: before getting location of the associated car
Unlock: after done doing work that relies on that location being up to date
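In Java terms, that per-car mutex might look like the following sketch (Car and its coordinate fields are hypothetical names of mine):

import java.util.concurrent.locks.ReentrantLock;

class Car {
    private final ReentrantLock lock = new ReentrantLock(); // one mutex per car
    private double x, y; // current location

    void setLocation(double newX, double newY) {
        lock.lock();       // lock: before changing the location
        try {
            x = newX;
            y = newY;
        } finally {
            lock.unlock(); // unlock: after changing the location
        }
    }

    double[] getLocation() {
        lock.lock();       // lock: before reading the location
        try {
            return new double[] {x, y}; // hand back a copy
        } finally {
            lock.unlock(); // unlock: after the read
        }
    }
}

Returning a copy lets the reader release the lock immediately; a caller that needs the location to stay up to date while it works would have to hold the lock for that whole stretch, as the last bullet above says.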
I'd answer with:
Try to make threading an external concern of your system, while making the system as modular and encapsulated as possible at the same time. That allows concurrency to be added at a later phase at low cost, and if the solution happens to work nicely in a single thread (say, by being event-loop-based), no time will have been burnt for nothing.
There are several ways to do this. Which way you choose depends a lot on the number of cars, the frequency of updates and position requests, the expected response time, and how accurate (up to date) you want the position reports to be.
The easiest way to handle this is with a simple mutex (lock) that allows only one thread at a time to access the data structure. Assuming you're using a dictionary or hash map, your code would look something like this:
Map Cars = new Map(...)
Mutex CarsMutex = new Mutex(...)

Location GetLocation(carKey)
{
    acquire mutex
    result = Cars[carKey].Location
    release mutex
    return result
}
You'd do that for Add, Remove, Update, etc. Any method that reads or updates the data structure would require that you acquire the mutex.
If the number of queries far outweighs the number of updates, then you can do better with a reader/writer lock instead of a mutex. With an RW lock, you can have an unlimited number of readers, OR you can have a single writer. With that, querying the data would be:
acquire reader lock
result = Cars[carKey].Location
release reader lock
return result
And Add, Update, and Remove would be:
acquire writer lock
do update
release writer lock
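A compact Java rendering of that reader/writer pattern (a sketch; the map and key types are placeholders of mine):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class CarRegistry {
    private final Map<String, String> locations = new HashMap<>();
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    String getLocation(String carKey) {
        rw.readLock().lock();  // any number of readers may hold this at once
        try {
            return locations.get(carKey);
        } finally {
            rw.readLock().unlock();
        }
    }

    void updateLocation(String carKey, String location) {
        rw.writeLock().lock(); // the writer is exclusive: no readers, no other writers
        try {
            locations.put(carKey, location);
        } finally {
            rw.writeLock().unlock();
        }
    }
}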
Many runtime libraries have a concurrent dictionary data structure already built in. .NET, for example, has ConcurrentDictionary. With those, you don't have to worry about explicitly synchronizing access with a Mutex or RW lock; the data structure handles synchronization for you, either with a technique similar to that shown above, or by implementing lock-free algorithms.
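Java's counterpart is ConcurrentHashMap, which reduces the code above to something like this (sketch):

import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentCars {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> cars = new ConcurrentHashMap<>();
        cars.put("car-42", "52.52,13.40");      // thread-safe update, no external lock
        System.out.println(cars.get("car-42")); // thread-safe read
    }
}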
As mentioned in comments, a relational database can handle this type of thing quite easily and can scale to a very large number of requests. Modern relational databases, properly constructed and with sufficient hardware, are surprisingly fast and can handle huge amounts of data with very high throughput.
There are other, more involved methods that can increase throughput in some situations, depending on what you're trying to optimize. For example, if you're willing to accept some latency in reported positions, you could serve position requests from a static copy of the list that's updated once per minute (or once every five minutes). Requests are then fulfilled immediately, with no lock required, from that static copy. Updates are queued, and once per minute a new list is created by applying the queued updates to the old list; the new list is then made available for requests.
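That once-per-minute copy is essentially an atomic snapshot swap; in Java it might be sketched like this (class and method names are mine):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class SnapshotRegistry {
    // Readers always see a complete, never-mutated snapshot, so they
    // need no lock at all.
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(new HashMap<>());

    String getLocation(String carKey) {
        return snapshot.get().get(carKey); // lock-free read of the current copy
    }

    // Called periodically by a background thread: build a fresh map from
    // the old snapshot plus the queued updates, then publish it.
    void publish(Map<String, String> queuedUpdates) {
        Map<String, String> next = new HashMap<>(snapshot.get());
        next.putAll(queuedUpdates);
        snapshot.set(next); // readers switch to the new copy atomically
    }
}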
There are many different ways to solve your problem.

PostgreSQL: Is using SELECT nextval generator thread safe, in harsh multiuser environments?

I mean with, say, thousands of users updating values in the database at the same time?
Yes, nextval is safe to use from multiple concurrently operating transactions. That is its purpose and its reason for existing.
That said, it is not actually "thread safe" as such, because PostgreSQL uses a multi-processing model not a multi-threading model, and because most client drivers (libpq, for example) do not permit more than one thread at a time to interact with a single connection.
You should also be aware that while nextval is guaranteed to return distinct and increasing values, it is not guaranteed to do so without "holes" or "gaps". Such gaps are created when a generated value is discarded without being committed (say, by a ROLLBACK) and when PostgreSQL recovers after a server crash.
While nextval will always return increasing numbers, this does not mean that your transactions will commit in the order they got IDs from a given sequence in. It's thus perfectly normal to have something like this happen:
Start IDs in table: [1 2 3 4]
1st tx gets ID 5 from nextval()
2nd tx gets ID 6 from nextval()
2nd tx commits: [1 2 3 4 6]
1st tx commits: [1 2 3 4 5 6]
In other words, holes can appear and disappear.
Both these anomalies are necessary and unavoidable consequences of making one nextval call not block another.
If you want a sequence without such ordering and gap anomalies, you need to use a gapless sequence design that permits only one transaction at a time to have an uncommitted generated ID, effectively eliminating all concurrency for inserts in that table. This is usually implemented using SELECT FOR UPDATE or UPDATE ... RETURNING on a counter table.
Search for "PostgreSQL gapless sequence" for more information.
Yes, it is thread-safe.
From the manual:
nextval: Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value.
(Emphasis mine)
Yes: http://www.postgresql.org/docs/current/static/functions-sequence.html
It wouldn't be useful otherwise.
Edit:
Here is how you use nextval and currval:
nextval returns a new sequence number; you use this for the id in an insert into the first table
currval returns the last sequence number obtained by this session; you use that in foreign keys to reference the first table
each call to nextval returns another value; don't call it twice in the same set of inserts
And of course, you should use transactions in any multiuser code.
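As a concrete illustration of that pattern (a JDBC sketch; the table, column, and sequence names are made up):

import java.sql.Connection;
import java.sql.PreparedStatement;

public class ParentChildInsert {
    static void insertPair(Connection conn, String name, String note) throws Exception {
        conn.setAutoCommit(false); // both inserts in one transaction
        try (PreparedStatement parent = conn.prepareStatement(
                 "INSERT INTO parent(id, name) VALUES (nextval('parent_id_seq'), ?)");
             PreparedStatement child = conn.prepareStatement(
                 "INSERT INTO child(parent_id, note) VALUES (currval('parent_id_seq'), ?)")) {
            parent.setString(1, name);
            parent.executeUpdate(); // nextval: draws a fresh id for the parent row
            child.setString(1, note);
            child.executeUpdate(); // currval: the id this session just drew
            conn.commit();
        }
    }
}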
This poster asked a different question about the same flawed code here. The point is: he does not seem to know how foreign keys work, and has them reversed (a sequence functioning as a foreign key is kind of awkward, IMHO).
BTW: this should be a comment, not an answer, but I can't comment yet.

Implementing Concurrent writes in the CRCW threading model

PRAM models for parallel computing come in three main flavours: EREW, CREW, and CRCW.
I can understand how EREW and CREW can be implemented on a multicore machine, but how would one go about implementing the CRCW model on a multicore CPU? Is it even a practical model, given that concurrent writes are not possible and every basic parallel programming course goes into great detail about race conditions?
Essentially this means that trying to avoid race conditions and trying to implement concurrent writes are two opposing goals.
First up: We know that the PRAM is a theoretical, or abstract machine. There are several simplifications made so that it may be used for analyzing/designing parallel algorithms.
Next, let's talk about the ways in which one may do 'concurrent writes' meaningfully.
Concurrent write memories are usually divided into subclasses, based on how they behave:
Priority based CW - Processors have a priority, and if multiple concurrent writes to the same location arrive, the write from the processor of highest priority gets committed to memory.
Arbitrary CW - One processor's write is arbitrarily chosen for commit.
Common CW - Multiple concurrent writes to the same location are committed only if the values being written are the same. i.e. all writing processors must agree on the value being written.
Reduction CW - A reduction operator is applied to the multiple values being written, e.g. a summation, where multiple concurrent writes to the same location cause the sum of the written values to be committed to memory.
These subclasses lead to some interesting algorithms. Some of the examples I remember from class are:
A CRCW-PRAM where the concurrent write is achieved as a summation can sum an arbitrarily large number of integers in a single timestep. There is a processor for each integer in the input array. All processors write their value to the same location. Done.
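On real hardware, that reduction-style concurrent write is roughly what an atomic fetch-and-add gives you. A Java simulation of the idea (my sketch):

import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

public class ReductionWrite {
    public static void main(String[] args) {
        int[] input = {3, 1, 4, 1, 5, 9, 2, 6};
        LongAdder cell = new LongAdder(); // the single memory cell being written
        // Every "processor" writes its value to the same location; the
        // reduction-CW rule commits the sum of all concurrent writes.
        IntStream.range(0, input.length).parallel()
                 .forEach(i -> cell.add(input[i]));
        System.out.println(cell.sum()); // 31
    }
}

The caveat is that real atomics serialize the additions internally, so this costs O(N) memory operations rather than the PRAM's single timestep; that single-timestep sum is exactly the simplification the abstract model buys you.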
Imagine a CRCW-PRAM where the memory commits concurrent writes only if the value written by all processors is the same. Now imagine N numbers A[1] ... A[N], whose maximum you need to find. Here's how you'd do it:
Step 1:
N² processors will compare each value to every other value, and write the results to a 2D array:
parallel_for i in [1,N]
    parallel_for j in [1,N]
        if (A[i] >= A[j])
            B[i,j] = 1
        else
            B[i,j] = 0
So in this 2D array, the row corresponding to the biggest number will be all 1's.
Step 2:
Find the row which has only 1's, and store the corresponding value as the max:
parallel_for i in [1,N]
    M[i] = 1
    parallel_for j in [1,N]
        if (B[i,j] == 0)
            M[i] = 0 // multiple concurrent writes of the *same* value
    if M[i]
        max = A[i]
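The same algorithm can be simulated directly on a multicore machine (a Java sketch of mine, with each loop iteration standing in for one processor):

import java.util.stream.IntStream;

public class CommonCwMax {
    public static void main(String[] args) {
        int[] A = {4, 9, 2, 7};
        int n = A.length;
        int[][] B = new int[n][n];
        int[] M = new int[n];
        int[] max = new int[1];

        // Step 1: n^2 "processors" fill the comparison matrix.
        IntStream.range(0, n).parallel().forEach(i ->
            IntStream.range(0, n).forEach(j ->
                B[i][j] = (A[i] >= A[j]) ? 1 : 0));

        // Step 2: find the all-ones row. The writes to M[i] and max are
        // the "common" concurrent writes: every processor that writes a
        // given cell writes the same value, so the commits never clash.
        IntStream.range(0, n).parallel().forEach(i -> {
            M[i] = 1;
            IntStream.range(0, n).forEach(j -> {
                if (B[i][j] == 0) M[i] = 0;
            });
            if (M[i] == 1) max[0] = A[i];
        });

        System.out.println(max[0]); // 9
    }
}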
Finally, is it possible to implement for real?
Yes, it is possible. Designing, say, a register file, or a memory and its associated logic, with multiple write ports and arbitration that resolves concurrent writes to the same address in a meaningful way (like the ways I described above) is possible; you can probably already see that from the subclasses I mentioned. Whether or not it is practical, I cannot say. I can say that in my limited experience with computers (which involves mostly general-purpose hardware, like the Core Duo machine I'm currently sitting in front of), I haven't seen one in practice.
EDIT: I did find a CRCW implementation. The Wikipedia article on PRAM describes a CRCW machine which can find the max of an array in 2 clock cycles (using the same algorithm as the one above). The description is in SystemVerilog and can be implemented on an FPGA.

Thread locking / exclusive access improvements

I have two threaded methods running in two separate places but sharing access to a list object (let's call it PriceArray). The first thread adds and removes items from PriceArray when necessary (the contents of the array get updated from a third-party data provider), and the average update rate is between 0.5 and 1 second.
The second thread only reads (for now) the contents of the array every 3 seconds using a foreach loop (it takes most items, but not all of them).
To avoid the nasty "Collection was modified; enumeration operation may not execute" exception when the second thread loops through the array, I have wrapped the add and remove operations in the first thread with lock(PriceArray) to ensure exclusive access. The problem is that I have noticed a performance issue when the second method loops through the array items, as most of the time the array is locked by the add/remove thread.
With the scenario running this way, do you have any suggestions for improving performance using other thread-safety/exclusive-access tactics in C# 4.0?
Thanks.
Yes, there are many alternatives.
The best/easiest would be to switch to an appropriate collection in System.Collections.Concurrent. These are all thread-safe collections that you can use without managing your own locks. They are typically either lock-free or use very fine-grained locking, so they will likely dramatically reduce the performance impact you're seeing from synchronization.
Another option would be to use ReaderWriterLockSlim so that your readers do not block each other. Since a third-party library is writing this array, this may be a more appropriate solution. It still blocks everything during a write, but readers no longer need to block each other during reads.
My guess is that ArrayList.Remove() takes most of the time, because in order to perform a deletion it does two costly things:
a linear search: it takes elements one by one and compares them with the element being removed
once the index of the element being removed is found, it shifts everything after it one position to the left
Thus every deletion takes time proportional to the number of elements currently in the collection.
So you should try to replace ArrayList with a structure more appropriate for this task. I'd need more information about your case to suggest which one to choose.
