When we run an STM expression that hits retry, the thread is blocked, and the transaction is run again once one of the entries it read is modified.
But I was wondering:
1. If we read an STM variable which, in the specific branch leading to retry, is not actually used, would updating it cause the transaction to run again?
2. While the thread is blocked, is it really blocked? Or is it recycled into a thread pool to be used by other potentially waiting operations?
Yes. Reading an STM variable invokes stmReadTVar (see here). This generates a new entry in the transaction record, which is checked on commit. If you take a look here, you will find that ReadTVarOp is marked as an operation with side effects (has_side_effects = True), so I don't think the compiler will eliminate it, regardless of whether you use its result or not.
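A minimal sketch of the situation (waitReady and the TVar names are made up for illustration):

import Control.Concurrent.STM

-- 'unused' is read, but its value never influences the result; the
-- read still creates an entry in the transaction record, so a
-- writeTVar to 'unused' in another thread wakes this transaction
-- out of retry.
waitReady :: TVar Bool -> TVar Int -> STM ()
waitReady ready unused = do
  ok <- readTVar ready
  _  <- readTVar unused   -- result discarded, entry still recorded
  check ok                -- same as: if ok then pure () else retry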
As @WillSewell wrote, Haskell uses green threads. You can even use STM in a single-threaded runtime without worrying that the actual OS thread will be blocked.
Re. 1: As I understand your question, yes, that is correct; your entire STM transaction will have a consistent view of the world, including branches composed with orElse (see: https://ghc.haskell.org/trac/ghc/ticket/8680). But I'm not sure what you mean by "but my transaction actually depends on the value of just 1 variable"; if you do a readTVar, then changes to that var will be tracked.
Re. 2: you can think of green threads as lumps of saved computation state that are stored in a stack-like thing and popped off, run for a bit, and put back onto the stack when they can't make further progress for the time being ("blocked") or after they've run for long enough. The degree to which this happens in parallel depends on the number of OS threads you tell the runtime to use (via +RTS -N). You can have a concurrent program that uses thousands of green threads but is only run with a single OS thread, and that's perfectly fine.
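To illustrate, here is a small, self-contained program (hypothetical, just for demonstration) that forks ten thousand green threads; it behaves the same whether you run it with +RTS -N1 or +RTS -N8, the runtime just multiplexes the threads over however many OS threads you gave it:

import Control.Concurrent
import Control.Monad (forM_, replicateM_)

main :: IO ()
main = do
  done <- newEmptyMVar
  forM_ [1 .. 10000 :: Int] $ \i ->
    forkIO $ do
      threadDelay (i `mod` 100)  -- pretend to block for a moment
      putMVar done ()
  replicateM_ 10000 (takeMVar done)
  putStrLn "all green threads finished"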
I want to see the intrinsic difference between a thread and a long-running go block in Clojure. In particular, I want to figure out which one I should use in my context.
I understand that if one creates a go-block, it is scheduled to run on a so-called thread pool, whose default size is 8. But thread will create a new thread.
In my case, there is an input stream that takes values from somewhere, and each value is taken as an input. Some calculations are performed and the result is inserted into a result channel. In short, we have an input and an output channel, and the calculation is done in a loop. To achieve concurrency, I have two choices: either use a go-block or use thread.
I wonder what the intrinsic difference between these two is. (We may assume there is no I/O during the calculations.) The sample code looks like the following:
(go-loop []
  (when-let [input (<! input-stream)]
    ... ; calculations here
    (>! result-chan result)
    (recur)))

(thread
  (loop []
    (when-let [input (<!! input-stream)]
      ... ; calculations here
      (put! result-chan result)
      (recur))))
I realize that the number of threads that can run simultaneously is exactly the number of CPU cores. In that case, do go-block and thread show no difference if I am creating more than 8 threads or go-blocks?
I could simulate the performance difference on my own laptop, but the production environment is quite different from the simulated one, so I could draw no conclusions.
By the way, the calculation is not heavy; if the inputs are not too large, 8,000 loops can run in 1 second.
Another consideration is whether go-block vs thread will have an impact on GC performance.
There are a few things to note here.
Firstly, the thread pool that threads are created on via clojure.core.async/thread is what is known as a cached thread pool: although it will re-use recently used threads inside that pool, it's essentially unbounded, which of course means it could potentially hog a lot of system resources if left unchecked.
But given that what you're doing inside each asynchronous process is very lightweight, threads seem a little overkill to me. It's also important to take into account the quantity of items you expect to hit the input stream; if this number is large, you could potentially overwhelm core.async's thread pool for go macros, to the point where you're waiting for a thread to become available.
You also didn't mention precisely where you're getting the input values from: are the inputs some fixed data set that remains constant from the start of the program, or are they continuously fed into the input stream from some source over time?
If it's the former, then I would suggest you lean more towards transducers; I would argue that a CSP model isn't a good fit for your problem, since you aren't modelling communication between separate components in your program, but rather just processing data in parallel.
If it's the latter, then I presume you have some other process that's listening to the result channel and doing something important with those results, in which case I would say your usage of go-blocks is perfectly acceptable.
I have a main thread that creates/destroys objects. Let's name the object 'f'.
Now, every time one of these objects is created, it is added to the tail queue of another object, say 'mi'; conversely, it is removed when it is deleted.
Now, there is another thread that runs every second and tries to gather statistics for these 'f' objects. It basically walks through all of the (at most 2048, say) instances of 'mi', and for each such 'mi' it gathers all the 'f' objects attached to it and sends a command down to the lower layer, which emits some values corresponding to these objects. It must then update the corresponding 'f' objects with these values.
Now the concern is: what if one of these 'f' objects gets deleted by the main thread while this once-per-second walk is happening?
Intuitively, one would think of having a lock at the 'mi' level that is acquired before beginning the walk and released after the walk/update of all the 'f' objects belonging to that particular instance of 'mi', correct?
But the only hitch is that there could be tens of thousands, or even millions, of 'f' objects tied to a single instance of 'mi'.
The other requirement is that the main thread's performance in creating/destroying these 'f' objects should stay high, i.e. at a rate of at least 10,000 objects per second.
So given that, I'm not sure whether it's feasible to have this per-'mi' lock. Or am I overestimating the side effects of lock contention?
Any other ideas?
Now the concern is: what if one of these 'f' objects gets deleted by the main thread while this once-per-second walk is happening?
If an f object gets deleted while the other thread is trying to use it, undefined behavior will be invoked, and you will probably end up spending some hours debugging your program trying to figure out why it occasionally crashes. :) The trick is to make sure that you never delete any f while the other thread might be using it. Typically that means your main thread needs to lock the mi's mutex before removing the f from its queue; once the f is no longer in the queue, you can release the mutex before deleting the f if you want to, since at that point the other thread can no longer access the f anyway.
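The question is language-neutral, so here is a minimal sketch of that discipline in Haskell (all names hypothetical; an MVar plays the role of the per-mi mutex, and teardown stands in for whatever expensive cleanup deletion involves):

import Control.Concurrent.MVar
import qualified Data.Map.Strict as M

type FId = Int
data F = F { fStats :: Int }
newtype Mi = Mi (MVar (M.Map FId F))

-- main thread: unlink under the lock, tear down outside it, so the
-- walker can never observe a half-deleted f
destroyF :: Mi -> FId -> IO ()
destroyF (Mi tbl) fid = do
  removed <- modifyMVar tbl $ \m ->
    pure (M.delete fid m, M.lookup fid m)
  case removed of
    Just f  -> teardown f        -- lock already released here
    Nothing -> pure ()
  where
    teardown _ = pure ()         -- stand-in for freeing resources

-- stats thread: hold the lock for the whole walk/update of this mi
walkMi :: Mi -> (F -> IO F) -> IO ()
walkMi (Mi tbl) update = modifyMVar_ tbl (traverse update)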
I'm not sure if it's feasible to have this per-'mi' object lock?
It's feasible, as long as you don't mind your main thread occasionally getting held off (i.e. blocked waiting in a mutex::lock() method-call) until your other thread finishes iterating through the mi's queue and releases the mutex. Whether that holdoff time is acceptable will depend on the latency requirements of your main thread (e.g. if it's generating a report, then being blocked for some number of milliseconds is no problem; OTOH, if it's operating the control surfaces on a rocket in flight, being blocked for any length of time is unacceptable).
Any other ideas?
My first idea is to get rid of the second thread entirely: just have the main thread call the statistics-collection function directly once per second instead. Then you don't have to worry about mutexes or mutex contention at all. This does mean that your main thread won't be able to perform its primary function while it is running the statistics-collection function, but at least its "down time" is now predictable, rather than being a random function of which mi objects the two threads happen to try to lock/access at any given instant.
If that's no good (i.e. you can't tolerate any significant hold-off time whatsoever), another approach would be to use a message-passing paradigm rather than a shared-data paradigm. That is, instead of allowing both threads direct access to the same set of mi's, use a message queue of some sort, so that the main thread can take an mi out of service and send it over to the second thread for statistics-gathering purposes. The second thread would then scan/update it as usual and, when it's done, pass it back (via a second message queue) to the primary thread, which would put it back into service. You could periodically do this with various mi's to keep statistics updated on each of them, without ever requiring shared access to any of them. (This only works if your main thread can afford to go without access to certain mi's for short periods, though.)
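A minimal sketch of the two-queue handoff (again in Haskell for concreteness; Mi and gatherStats are hypothetical stand-ins):

import Control.Concurrent
import Control.Monad (forever)

data Mi = Mi { miName :: String }

gatherStats :: Mi -> IO Mi
gatherStats = pure   -- placeholder: scan/update the mi here

main :: IO ()
main = do
  toStats <- newChan   -- main thread -> stats thread
  toMain  <- newChan   -- stats thread -> main thread
  _ <- forkIO $ forever $ do
         mi  <- readChan toStats   -- this mi is out of service
         mi' <- gatherStats mi     -- exclusive access, no lock needed
         writeChan toMain mi'      -- hand it back
  writeChan toStats (Mi "mi-0")    -- take one mi out of service
  mi <- readChan toMain            -- put it back once returned
  putStrLn (miName mi ++ " back in service")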
We want to be sure we're using the correct synchronization (and no more than necessary) when writing thread-safe code in JRuby; specifically, in a Puma-instantiated Rails app.
UPDATE: Extensively re-edited this question to be very clear and to use the latest code we are implementing. This code uses the atomic gem written by @headius (Charles Nutter) for JRuby, but we're not sure it is totally necessary, or in which ways it's necessary, for what we're trying to do here.
Here's what we've got. Is this overkill (meaning, are we over/uber-engineering this), or perhaps incorrect?
ourgem.rb:
require 'atomic' # gem from @headius

SUPPORTED_SERVICES = %w(serviceABC anotherSvc andSoOnSvc).freeze

module Foo
  def self.included(cls)
    cls.extend(ClassMethods)
    cls.send :__setup
  end

  module ClassMethods
    def get(service_name, method_name, *args)
      __cached_client(service_name).send(method_name.to_sym, *args)
      # we also capture exceptions here, but leaving those out for brevity
    end

    private

    def __client(service_name)
      # obtain and return a client handle for the given service_name
      # we definitely want to cache the value returned from this method
      # **AND**
      # it is a requirement that this method ONLY be called *once PER service_name*.
    end

    def __cached_client(service_name)
      @@_clients.value[service_name]
    end

    def __setup
      @@_clients = Atomic.new({})
      @@_clients.update do |current_services|
        SUPPORTED_SERVICES.inject(Atomic.new({}).value) do |memo, service_name|
          if current_services[service_name]
            current_services[service_name]
          else
            memo.merge({service_name => __client(service_name)})
          end
        end
      end
    end
  end
end
client.rb:
require 'ourgem'

class GetStuffFromServiceABC
  include Foo

  def self.get_some_stuff
    result = get('serviceABC', 'method_bar', 'arg1', 'arg2', 'arg3')
    puts result
  end
end
Summary of the above: we have @@_clients (a mutable class variable holding a Hash of clients), which we only want to populate ONCE for all available services, keyed on service_name.
Since the hash is in a class variable (and hence threadsafe?), are we guaranteed that the call to __client will not get run more than once per service name (even if Puma is instantiating multiple threads with this class to service all the requests from different users)? If the class variable is threadsafe (in that way), then perhaps the Atomic.new({}) is unnecessary?
Also, should we be using an Atomic.new(ThreadSafe::Hash) instead? Or again, is that not necessary?
If not (meaning: you think we do need the Atomic.news at least, and perhaps also the ThreadSafe::Hash), then why couldn't a second (or third, etc.) thread interrupt between the Atomic.new(nil) and the @@_clients.update do ..., meaning the Atomic.news from EACH thread will EACH create two (separate) objects?
Thanks for any thread-safety advice; we don't see any questions on SO that directly address this issue.
Just a friendly piece of advice, before I attempt to tackle the issues you raise here:
This question, and the accompanying code, strongly suggests that you don't (yet) have a solid grasp of the issues involved in writing multi-threaded code. I encourage you to think twice before deciding to write a multi-threaded app for production use. Why do you actually want to use Puma? Is it for performance? Will your app handle many long-running, I/O-bound requests (like uploading/downloading large files) at the same time? Or (like many apps) will it primarily handle short, CPU-bound requests?
If the answer is "short/CPU-bound", then you have little to gain from using Puma. Multiple single-threaded server processes would be better. Memory consumption will be higher, but you will keep your sanity. Writing correct multi-threaded code is devilishly hard, and even experts make mistakes. If your business success, job security, etc. depends on that multi-threaded code working and working right, you are going to cause yourself a lot of unnecessary pain and mental anguish.
That aside, let me try to unravel some of the issues raised in your question. There is so much to say that it's hard to know where to start. You may want to pour yourself a cold or hot beverage of your choice before sitting down to read this treatise:
When you talk about writing "thread-safe" code, you need to be clear about what you mean. In most cases, "thread-safe" code means code which doesn't concurrently modify mutable data in a way which could cause data corruption. (What a mouthful!) That could mean that the code doesn't allow concurrent modification of mutable data at all (using locks), or that it does allow concurrent modification, but makes sure that it doesn't corrupt data (probably using atomic operations and a touch of black magic).
Note that when your threads are only reading data, not modifying it, or when working with shared stateless objects, there is no question of "thread safety".
Another definition of "thread-safe", which probably applies better to your situation, has to do with operations which affect the outside world (basically I/O). You may want some operations to only happen once, or to happen in a specific order. If the code which performs those operations runs on multiple threads, they could happen more times than desired, or in a different order than desired, unless you do something to prevent that.
It appears that your __setup method is only called when ourgem.rb is first loaded. As far as I know, even if multiple threads require the same file at the same time, MRI will only ever let a single thread load the file. I don't know whether JRuby is the same. But in any case, if your source files are being loaded more than once, that is symptomatic of a deeper problem. They should only be loaded once, on a single thread. If your app handles requests on multiple threads, those threads should be started up after the application has loaded, not before. This is the only sane way to do things.
Assuming that everything is sane, ourgem.rb will be loaded using a single thread. That means __setup will only ever be called by a single thread. In that case, there is no question of thread safety at all to worry about (as far as initialization of your "client cache" goes).
Even if __setup was to be called concurrently by multiple threads, your atomic code won't do what you think it does. First of all, you use Atomic.new({}).value. This wraps a Hash in an atomic reference, then unwraps it so you just get back the Hash. It's a no-op. You could just write {} instead.
Second, your Atomic#update call will not prevent the initialization code from running more than once. To understand this, you need to know what Atomic actually does.
Let me pull out the old, tired "increment a shared counter" example. Imagine the following code is running on 2 threads:
i += 1
We all know what can go wrong here. You may end up with the following sequence of events:
Thread A reads i and increments it.
Thread B reads i and increments it.
Thread A writes its incremented value back to i.
Thread B writes its incremented value back to i.
So we lose an update, right? But what if we store the counter value in an atomic reference, and use Atomic#update? Then it would be like this:
Thread A reads i and increments it.
Thread B reads i and increments it.
Thread A tries to write its incremented value back to i, and succeeds.
Thread B tries to write its incremented value back to i, and fails, because the value has already changed.
Thread B reads i again and increments it.
Thread B tries to write its incremented value back to i again, and succeeds this time.
Do you get the idea? Atomic never stops 2 threads from running the same code at the same time. What it does do, is force some threads to retry the #update block when necessary, to avoid lost updates.
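For what it's worth, the two behaviours can be sketched in Haskell, with an IORef standing in for the atomic reference (racyIncr and safeIncr are made-up names):

import Data.IORef

-- the plain `i += 1`: a read followed by a write, so two threads
-- can interleave and an increment can be lost
racyIncr :: IORef Int -> IO ()
racyIncr ref = do
  i <- readIORef ref
  writeIORef ref (i + 1)

-- the Atomic#update analogue: the update function may run more
-- than once (the "retry"), but no increment is ever lost
safeIncr :: IORef Int -> IO ()
safeIncr ref = atomicModifyIORef' ref (\i -> (i + 1, ()))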
If your goal is to ensure that your initialization code will only ever run once, using Atomic is a very inappropriate choice. If anything, it could make it run more times, rather than less (due to retries).
So, that is that. But if you're still with me here, I am actually more concerned about whether your "client" objects are themselves thread-safe. Do they have any mutable state? Since you are caching them, it seems that initializing them must be slow. Be that as it may, if you use locks to make them thread-safe, you may not be gaining anything from caching and sharing them between threads. Your "multi-threaded" server may be reduced to what is effectively an unnecessarily complicated, single-threaded server.
If the client objects have no mutable state, good for you. You can be "free and easy" and share them between threads with no problems. If they do have mutable state, but initializing them is slow, then I would recommend caching one object per thread, so they are never shared. Thread[] is your friend there.
Disclaimer: This can easily be done using an MVar () as a simple mutex. I'm just curious to see whether it can be done with STM.
I want to do the following atomically:
Read some variables.
Decide what I/O to perform, based on what I just read.
Perform the I/O.
Record the results in the variables.
For concreteness, suppose I want to keep track of how many bytes of input I've read, and pretend I've reached EOF after a certain number of bytes have been consumed. (OK, letting two threads read from the same file concurrently is probably a bogus thing to do in the first place, but go with me on this...)
Clearly this cannot be a single STM transaction; there's I/O in the middle. Clearly it would also be wrong to have it as two unconnected transactions. (Two threads could see that there's one byte of quota left, and both decide to read that byte.)
Is there a nice solution to this problem? Or is STM simply the wrong tool for this task?
Use a TVar Bool named consistent to keep track of whether your IO action is in progress. Before you run the IO action you set consistent to False and after you run the IO action you set consistent to True. Then, any action that depends on the values of those STM variables that you are modifying just puts this clause at the beginning:
do b <- readTVar consistent
   check b
   ...
This ensures that those actions only see a consistent view of the variables being modified and will not run while the IO action is in progress.
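Applied to the byte-quota example from the question, the pattern might look like the following sketch (readWithQuota, quota, and the 4096-byte cap are all made up for illustration; real code would also want an exception handler that restores consistent):

import Control.Concurrent.STM
import qualified Data.ByteString as B
import System.IO (Handle)

readWithQuota :: TVar Bool -> TVar Int -> Handle -> IO B.ByteString
readWithQuota consistent quota h = do
  n <- atomically $ do
    b <- readTVar consistent
    check b                      -- wait until no I/O is in flight
    writeTVar consistent False   -- claim the critical section
    readTVar quota
  bs <- B.hGet h (min n 4096)    -- the I/O, outside any transaction
  atomically $ do
    modifyTVar' quota (subtract (B.length bs))
    writeTVar consistent True    -- publish the consistent state
  pure bs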
I think you are looking for the stm-io-hooks package.
Whatever it is you actually want to do, would it be possible to express it in terms of STM's abort/retry semantics? In other words: can you roll back and repeat the IO action?
If not, then I'd refer you to Gabriel Gonzalez's answer.
I'd say STM can't do it, and that's on purpose. A piece of STM code can be restarted multiple times at various places if a transaction rolls back. What would happen if your transaction performed the I/O action and then rolled back while recording the results in the variables?
For this reason, STM computations must be pure, apart from the STM primitives themselves, such as STM mutable variables and arrays.
I am using shared variables in Perl with use threads::shared.
These variables can be modified only from a single thread; all other threads only 'read' them.
Is it required for the 'reading' threads to lock, as in
{
    lock $shared_var;
    if ($shared_var > 0) .... ;
}
?
Isn't it safe to do a simple check without locking (in the 'reading' thread!), like
if ($shared_var > 0) ....
?
Locking is not required to maintain internal integrity when setting or fetching a scalar.
Whether it's needed or not in your particular case depends on the needs of the reader, the other readers and the writers. It rarely makes sense not to lock, but you haven't provided enough details for us to determine what your needs are.
For example, it might not be acceptable to use an old value after the writer has updated the shared variable. For starters, this can lead to a situation where one thread is still using the old value while another thread is using the new value, a situation that can be undesirable if those two threads interact.
It depends on whether it's meaningful to test the condition at just some point in time or other. The problem, however, is that in the vast majority of cases that Boolean test stands for other things, which might have already changed by the time you're done reading it; the condition only tells you about a previous state.
Think about it. If it's an insignificant test, then it means little--and you have to question why you are making it. If it's a significant test, then it is telltale of a coherent state that may or may not exist anymore--you won't know for sure, unless you lock it.
A lot of times, say in real-time reporting, you don't really care which snapshot the database hands you; you just want a relatively current one. But, as part of its transaction logic, the database keeps a complete picture of how things were prior to a commit. I don't think you're likely to find this in ordinary code, where the current state is the current state--and even the state of being in a provisional state is a definite state.
I guess one of the times this can be different is cyclical access to a queue. If one consumer doesn't get the head record this time around, then one of them will the next time around. You can probably save some processing time by asynchronously accessing the queue counter. But here's a case where it means little in the context of just one iteration.
In the case above, you would just want to put some locked-level instructions afterward that expect that the queue might actually be empty, even if your test suggested it had data. So, if it is just a preliminary test, you have to have logic that treats the test as being as unreliable as it actually is.
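That "treat the test as unreliable" pattern is easy to sketch; here it is in Haskell, with an MVar-guarded list standing in for the queue (tryPop is a made-up name):

import Control.Concurrent.MVar

-- peek cheaply, then re-check under the lock: the queue may be
-- empty even though the peek suggested it had data
tryPop :: MVar [a] -> IO (Maybe a)
tryPop q = do
  snapshot <- readMVar q            -- possibly stale by now
  if null snapshot
    then pure Nothing               -- cheap early exit
    else modifyMVar q $ \xs ->
           case xs of               -- authoritative re-check
             []       -> pure ([], Nothing)
             (y : ys) -> pure (ys, Just y)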