I'm using Node and Redis.
If I issue a redis.set() command, is there any chance that whilst that is being set, another read can occur with the old value?
No, you will never have that problem. One of the basic virtues of Redis is that it has a tight event loop which executes the commands, so they are naturally atomic.
This page has more on the topic (see subheading "Atomicity"), and about Redis in general.
Assuming you're talking about two truly concurrent accesses, one write and one read, there is essentially no meaning to this question. If a write is itself atomic and the value is never seen as anything other than the old or new value, then a reader who reads at "about the same time" as a writer may legitimately see either the old or the new value.
Related
Here is the nice article which describes what is ES and how to deal with it.
Everything is fine there, but one image is bothering me. Here it is
I understand that in distributed event-based systems we are able to achieve eventual consistency only. Anyway ... How do we ensure that we don't book more seats than available? This is especially a problem if there are many concurrent requests.
It may happen that n aggregates are populated with the same amount of reserved seats, and all of these aggregate instances allow reservations.
I understand that in distributes event-based systems we are able to achieve eventual consistency only, anyway ... How to do not allow to book more seats than we have? Especially in terms of many concurrent requests?
All events are private to the command running them until the book of record acknowledges a successful write. So we don't share the events at all, and we don't report back to the caller, without knowing that our version of "what happened next" was accepted by the book of record.
The write of events is analogous to a compare-and-swap of the tail pointer in the aggregate history. If another command has changed the tail pointer while we were running, our swap fails, and we have to mitigate/retry/fail.
In practice, this is usually implemented by having the write command to the book of record include an expected position for the write. (Example: ES-ExpectedVersion in GES).
The book of record is expected to reject the write if the expected position is in the wrong place. Think of the position as a unique key in a table in a RDBMS, and you have the right idea.
This means, effectively, that the writes to the event stream are actually consistent -- the book of record only permits the write if the position you write to is correct, which means that the position hasn't changed since the copy of the history you loaded was written.
It's typical for commands to read event streams directly from the book of record, rather than the eventually consistent read models.
It may happen that n-AggregateRoots will be populated with the same amount of reserved seats, it means having validation in the reserve method won't help, though. Then n-AggregateRoots will emit the event of successful reservation.
Every bit of state needs to be supervised by a single aggregate root. You can have n different copies of that root running, all competing to write to the same history, but the compare and swap operation will only permit one winner, which ensures that "the" aggregate has a single internally consistent history.
There are going to be a couple of ways to deal with such a scenario.
First off, an event stream would have the current version as the version of the last event added. This means that when you would not, or should not, be able to persist the event stream if the event stream is not at the version when loaded. Since the very first write would cause the version of the event stream to be increased, the second write would not be permitted. Since events are not emitted, per se, but rather a result of the event sourcing we would not have the type of race condition in your example.
Well, if your commands are processed behind a queue any failures should be retried. Should it not be possible to process the request you would enter the normal "I'm sorry, Dave. I'm afraid I can't do that" scenario by letting the user know that they should try something else.
Another option is to start the processing by issuing an update against some table row to serialize any calls to the aggregate. Probably not the most elegant but it does cause a system-wide block on the processing.
I guess, to a large extent, one cannot really trust the read store when it comes to transactional processing.
Hope that helps :)
Wanting to be sure we're using the correct synchronization (and no more than necessary) when writing threadsafe code in JRuby; specifically, in a Puma instantiated Rails app.
UPDATE: Extensively re-edited this question, to be very clear and use latest code we are implementing. This code uses the atomic gem written by #headius (Charles Nutter) for JRuby, but not sure it is totally necessary, or in which ways it's necessary, for what we're trying to do here.
Here's what we've got, is this overkill (meaning, are we over/uber-engineering this), or perhaps incorrect?
ourgem.rb:
require 'atomic' # gem from #headius
SUPPORTED_SERVICES = %w(serviceABC anotherSvc andSoOnSvc).freeze
module Foo
def self.included(cls)
cls.extend(ClassMethods)
cls.send :__setup
end
module ClassMethods
def get(service_name, method_name, *args)
__cached_client(service_name).send(method_name.to_sym, *args)
# we also capture exceptions here, but leaving those out for brevity
end
private
def __client(service_name)
# obtain and return a client handle for the given service_name
# we definitely want to cache the value returned from this method
# **AND**
# it is a requirement that this method ONLY be called *once PER service_name*.
end
def __cached_client(service_name)
##_clients.value[service_name]
end
def __setup
##_clients = Atomic.new({})
##_clients.update do |current_service|
SUPPORTED_SERVICES.inject(Atomic.new({}).value) do |memo, service_name|
if current_services[service_name]
current_services[service_name]
else
memo.merge({service_name => __client(service_name)})
end
end
end
end
end
end
client.rb:
require 'ourgem'
class GetStuffFromServiceABC
include Foo
def self.get_some_stuff
result = get('serviceABC', 'method_bar', 'arg1', 'arg2', 'arg3')
puts result
end
end
Summary of the above: we have ##_clients (a mutable class variable holding a Hash of clients) which we only want to populate ONCE for all available services, which are keyed on service_name.
Since the hash is in a class variable (and hence threadsafe?), are we guaranteed that the call to __client will not get run more than once per service name (even if Puma is instantiating multiple threads with this class to service all the requests from different users)? If the class variable is threadsafe (in that way), then perhaps the Atomic.new({}) is unnecessary?
Also, should we be using an Atomic.new(ThreadSafe::Hash) instead? Or again, is that not necessary?
If not (meaning: you think we do need the Atomic.news at least, and perhaps also the ThreadSafe::Hash), then why couldn't a second (or third, etc.) thread interrupt between the Atomic.new(nil) and the ##_clients.update do ... meaning the Atomic.news from EACH thread will EACH create two (separate) objects?
Thanks for any thread-safety advice, we don't see any questions on SO that directly address this issue.
Just a friendly piece of advice, before I attempt to tackle the issues you raise here:
This question, and the accompanying code, strongly suggests that you don't (yet) have a solid grasp of the issues involved in writing multi-threaded code. I encourage you to think twice before deciding to write a multi-threaded app for production use. Why do you actually want to use Puma? Is it for performance? Will your app handle many long-running, I/O-bound requests (like uploading/downloading large files) at the same time? Or (like many apps) will it primarily handle short, CPU-bound requests?
If the answer is "short/CPU-bound", then you have little to gain from using Puma. Multiple single-threaded server processes would be better. Memory consumption will be higher, but you will keep your sanity. Writing correct multi-threaded code is devilishly hard, and even experts make mistakes. If your business success, job security, etc. depends on that multi-threaded code working and working right, you are going to cause yourself a lot of unnecessary pain and mental anguish.
That aside, let me try to unravel some of the issues raised in your question. There is so much to say that it's hard to know where to start. You may want to pour yourself a cold or hot beverage of your choice before sitting down to read this treatise:
When you talk about writing "thread-safe" code, you need to be clear about what you mean. In most cases, "thread-safe" code means code which doesn't concurrently modify mutable data in a way which could cause data corruption. (What a mouthful!) That could mean that the code doesn't allow concurrent modification of mutable data at all (using locks), or that it does allow concurrent modification, but makes sure that it doesn't corrupt data (probably using atomic operations and a touch of black magic).
Note that when your threads are only reading data, not modifying it, or when working with shared stateless objects, there is no question of "thread safety".
Another definition of "thread-safe", which probably applies better to your situation, has to do with operations which affect the outside world (basically I/O). You may want some operations to only happen once, or to happen in a specific order. If the code which performs those operations runs on multiple threads, they could happen more times than desired, or in a different order than desired, unless you do something to prevent that.
It appears that your __setup method is only called when ourgem.rb is first loaded. As far as I know, even if multiple threads require the same file at the same time, MRI will only ever let a single thread load the file. I don't know whether JRuby is the same. But in any case, if your source files are being loaded more than once, that is symptomatic of a deeper problem. They should only be loaded once, on a single thread. If your app handles requests on multiple threads, those threads should be started up after the application has loaded, not before. This is the only sane way to do things.
Assuming that everything is sane, ourgem.rb will be loaded using a single thread. That means __setup will only ever be called by a single thread. In that case, there is no question of thread safety at all to worry about (as far as initialization of your "client cache" goes).
Even if __setup was to be called concurrently by multiple threads, your atomic code won't do what you think it does. First of all, you use Atomic.new({}).value. This wraps a Hash in an atomic reference, then unwraps it so you just get back the Hash. It's a no-op. You could just write {} instead.
Second, your Atomic#update call will not prevent the initialization code from running more than once. To understand this, you need to know what Atomic actually does.
Let me pull out the old, tired "increment a shared counter" example. Imagine the following code is running on 2 threads:
i += 1
We all know what can go wrong here. You may end up with the following sequence of events:
Thread A reads i and increments it.
Thread B reads i and increments it.
Thread A writes its incremented value back to i.
Thread B writes its incremented value back to i.
So we lose an update, right? But what if we store the counter value in an atomic reference, and use Atomic#update? Then it would be like this:
Thread A reads i and increments it.
Thread B reads i and increments it.
Thread A tries to write its incremented value back to i, and succeeds.
Thread B tries to write its incremented value back to i, and fails, because the value has already changed.
Thread B reads i again and increments it.
Thread B tries to write its incremented value back to i again, and succeeds this time.
Do you get the idea? Atomic never stops 2 threads from running the same code at the same time. What it does do, is force some threads to retry the #update block when necessary, to avoid lost updates.
If your goal is to ensure that your initialization code will only ever run once, using Atomic is a very inappropriate choice. If anything, it could make it run more times, rather than less (due to retries).
So, that is that. But if you're still with me here, I am actually more concerned about whether your "client" objects are themselves thread-safe. Do they have any mutable state? Since you are caching them, it seems that initializing them must be slow. Be that as it may, if you use locks to make them thread-safe, you may not be gaining anything from caching and sharing them between threads. Your "multi-threaded" server may be reduced to what is effectively an unnecessarily complicated, single-threaded server.
If the client objects have no mutable state, good for you. You can be "free and easy" and share them between threads with no problems. If they do have mutable state, but initializing them is slow, then I would recommend caching one object per thread, so they are never shared. Thread[] is your friend there.
I am using shared variables on perl with use threads::shared.
That variables can we modified only from single thread, all other threads are only 'reading' that variables.
Is it required in the 'reading' threads to lock
{
lock $shared_var;
if ($shared_var > 0) .... ;
}
?
isn't it safe to simple verification without locking (in the 'reading' thread!), like
if ($shared_var > 0) ....
?
Locking is not required to maintain internal integrity when setting or fetching a scalar.
Whether it's needed or not in your particular case depends on the needs of the reader, the other readers and the writers. It rarely makes sense not to lock, but you haven't provided enough details for us to determine what your needs are.
For example, it might not be acceptable to use an old value after the writer has updated the shared variable. For starters, this can lead to a situation where one thread is still using the old value while the another thread is using the new value, a situation that can be undesirable if those two threads interact.
It depends on whether it's meaningful to test the condition just at some point in time or other. The problem however is that in a vast majority of cases, that Boolean test means other things, which might have already changed by the time you're done reading the condition that says it represents a previous state.
Think about it. If it's an insignificant test, then it means little--and you have to question why you are making it. If it's a significant test, then it is telltale of a coherent state that may or may not exist anymore--you won't know for sure, unless you lock it.
A lot of times, say in real-time reporting, you don't really care which snapshot the database hands you, you just want a relatively current one. But, as part of its transaction logic, it keeps a complete picture of how things are prior to a commit. I don't think you're likely to find this in code, where the current state is the current state--and even a state of being in a provisional state is a definite state.
I guess one of the times this can be different is a cyclical access of a queue. If one consumer doesn't get the head record this time around, then one of them will the next time around. You can probably save some processing time, asynchronously accessing the queue counter. But here's a case where it means little in context of just one iteration.
In the case above, you would just want to put some locked-level instructions afterward that expected that the queue might actually be empty even if your test suggested it had data. So, if it is just a preliminary test, you would have to have logic that treated the test as unreliable as it actually is.
I'm Korean. My English skill too low.
In NODE.JS, there are two setInterval().
Of course, nodejs is single thread.
but, I worry about that each setInterval handles same value(or array).
To tell the truth, my circumstance has network and setInterval().
how can I controll the value. Or my worry is nothing?
You want to consider rewording this, I'm having trouble understanding what you are asking (especially in relation to network/threads), but I'm guessing you want to look into what the nodejs event loop is:
http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
JavaScript runs code in what I like to call turns.
During a turn, the code that is running has full and exclusive access to all variables and the values bound to them. As no other code is or can be running, you don't have to worry about locking.
You can ignore the text below the line.
Note that although this doesn't matter in this case, if you have a process that completes over multiple turns, you should be aware that other code may have taken turns between those turns. Each turn is atomic, and there are ways to make multi-turn processes atomic but they are too complex to explain here.
Note that the concept of a turn comes from the E lang but fits so nicely in JavaScript.
only one thread is allocated to user-level
user level 에서는 오직 1 thread 만 할당 되어있다 .
so, you don't have to worry about thread confliction. or IPC
즉 thread confliction 은 고민할 필요가 없다는 얘기
if your question is not regarding this ,
then you can handle every other case easily by your application-level programming
기타 상황은 응용프로그램 레벨에서 조치 하면 될것 같음.
i'm newbie to here,
so i don't know whether language other than english is permitted or not ....
There is nothing in the way the program uses this data which will cause the program to crash if it reads the old value rather than the new value. It will get the new value at some point.
However, I am wondering if reading and writing at the same time from multiple threads can cause problems for the OS?
I am yet to see them if it does. The program is developed in Linux using pthreads.
I am not interested in being told how to use mutexs/semaphores/locks/etc edit: so my program is only getting the new values, that is not what I'm asking.
No.. the OS should not have any problem. The tipical problem is the that you dont want to read the old values or a value that is half way updated, and thus not valid (and may crash your app or if the next value depends on the former, then you can get a corrupted value and keep generating wrong values all the itme), but if you dont care about that, the OS wont either.
Are the kernel/drivers reading that data for any reason (eg. it contains structures passed in to kernel APIs)? If no, then there isn't any issue with it, since the OS will never ever look at your hot memory.
Your own reads must ensure they are consistent so you don't read half of a value pre-update and half post-update and end up with a value that is neither pre neither post update.
There is no danger for the OS. Only your program's data integrity is at risk.
Imagine you data to consist of a set (structure) of values, which cannot be updated in an atomic operation. The reading thread is bound to read inconsistent data at some point (data consisting of a mixture of old and new values). But you did not want to hear about mutexes...
Problems arise when multiple threads share access to data when accessing that data is not atomic. For example, imagine a struct with 10 interdependent fields. If one thread is writing and one is reading, the reading thread is likely to see a struct that is halfway between one state and another (for example, half of it's members have been set).
If on the other hand the data can be read and written to with a single atomic operation, you will be fine. For example, imagine if there is a global variable that contains a count... One thread is incrementing it on some condition, and another is reading it and taking some action... In this case, there is really no intermediate inconsistent state. It's either got the new value, or it has the old value.
Logically, you can think of locking as a tool that lets you make arbitrary blocks of code atomic, at least as far as the other threads of execution are concerned.