Service Fabric locking and timeout - multithreading

I'm dealing with a situation where multiple threads are accessing this method
using (var tx = StateManager.CreateTransaction())
{
    var item = await reliableDictionary.GetAsync(tx, key);
    ... // Do work on a copy of item
    await reliableDictionary.SetAsync(tx, key, item);
    await tx.CommitAsync();
}
Single-threaded, this works well, but when I access the dictionary this way from multiple threads I encounter a System.TimeoutException.
The only way I've been able to get around it is to use LockMode.Update on the GetAsync(...) method. Has anyone here experienced something like this?
I'm wondering if there is a way to read with snapshot isolation, which would allow a read with no lock on it, as opposed to a read with a shared lock on the record.
I've tried doing this with both a shared transaction as shown above as well as individual transactions for the get and the set. Any help would be appreciated.

The default lock taken by a read (GetAsync) is a shared lock.
If you want to write, you need an exclusive lock, and you can't acquire one while other transactions hold shared locks.
Taking the initial lock as an update lock prevents this, as you noticed.
Snapshot isolation only applies when enumerating records, which is not what GetAsync does.
More info here.
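A minimal sketch of that workaround, reusing the question's own GetAsync/SetAsync calls and assuming the overload that accepts a LockMode (the question reports that LockMode.Update on the read resolved the timeouts):
using (var tx = StateManager.CreateTransaction())
{
    // Take the update lock up front, so the later write does not have to
    // upgrade a shared lock (two transactions each holding a shared lock
    // and waiting to upgrade is what times out).
    var item = await reliableDictionary.GetAsync(tx, key, LockMode.Update);

    // ... work on a copy of item ...

    await reliableDictionary.SetAsync(tx, key, item);
    await tx.CommitAsync();
}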

Related

Using Redis transaction vs redlock to solve the Lost Update Problem

I'm working with a Redis cluster that has 2+ nodes. I'm trying to figure out which tool is the better fit for handling concurrency: transactions or locking. Transactions are well documented, but I didn't find a good best-practice example for Redlock. I also wonder why the two tools exist and what the use case for each is.
For simplicity, let's assume I want to do a concurrent increment and there is no INCR command in Redis.
Option 1. Using Transactions
If I understand correctly, the NodeJS pseudocode would look like this:
transactIncrement = async (key) => {
  await redisClient.watch(key);
  let value = await redisClient.get(key);
  value = Number(value) + 1;
  try {
    // EXEC aborts the transaction if the watched key was modified in the meantime
    await redisClient.multi().set(key, value).exec();
  } catch (e) {
    // most probably the error was thrown because the transaction failed
    // TODO: think if it's a good idea to restart in every case, introducing a potential infinite loop
    // whatever, restart
    await transactIncrement(key);
  }
}
Bad things I can see above are:
try-catch block
the possibility to use transactions with multiple keys is limited on a Redis cluster
Option 2. Redlock
Is it true that trying to lock a resource that's already locked does not fail immediately, and that Redlock retries N times before erroring?
If true then here's my pseudocode:
redlockIncrement = async (key) => {
  const lock = await redlock.lock(key, 1000); // hold the lock for up to 1000 ms
  // below this line it's guaranteed that other "threads" are put on hold
  // and cannot access the key, right?
  let value = await redisClient.get(key);
  value = Number(value) + 1;
  await redisClient.set(key, value);
  await lock.unlock();
}
Summary
If I got things right, then Redlock is definitely the more powerful technique. Please correct me if I'm wrong in the above assumptions. It would also be really great if someone provided example code solving a similar problem, because I couldn't find any.
Redlock is useful when you have a distributed set of components that you want to coordinate to create an atomic operation.
You wouldn't use it for operations that affect a single Redis node. That's because Redis already has much simpler and more reliable means of ensuring atomicity for commands that use its single-threaded server: transactions or scripting. (You didn't mention Lua scripting, but that's the most powerful way to create custom atomic commands).
Since INCR operates on a single key, and therefore on a single node, the best way to implement that would be with a simple Lua script.
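For illustration, a custom atomic increment written as a Lua script (functionally what INCR already does); the whole script executes atomically on the node that owns the key:
-- increment.lua: read, add 1, write back, all atomically on the server
local value = tonumber(redis.call('GET', KEYS[1]) or '0')
value = value + 1
redis.call('SET', KEYS[1], value)
return value
It can be invoked with EVAL (or loaded once via SCRIPT LOAD and called through EVALSHA), passing the key name as KEYS[1].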
Now, if you want to use a sequence of commands that spans multiple nodes, neither transactions nor scripting will work. In that case you could use Redlock or a similar distributed lock. However, you would generally try to avoid that in your Redis design. Specifically, you would use hash tags to force certain keys to reside on the same node:
Hash tags are a way to ensure that multiple keys are allocated in the same hash slot. This is used in order to implement multi-key operations in Redis Cluster.
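As an illustration of hash tags (the key names are made up): only the part inside the braces is hashed, so both keys below map to the same hash slot and can be used together in a single MULTI/EXEC or Lua script on a cluster:
MULTI
INCR {user:1000}:followers
DECR {user:1000}:following
EXEC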

JDBC LockRegistry across JVMs

Is my application service, which obtains a lock using the JDBC LockRepository, supposed to run inside an @Transactional method?
We have a sample application service that updates a JDBCRepository, and since this application can run on multiple JVMs (headless), we needed a global lock to serialize those updates.
I looked at your test and was hoping my use case would work too. ... JdbcLockRegistryDifferentClientTests
My config has a DefaultLockRepository and a JdbcLockRegistry.
I launched my application (java -jar boot.jar) in two terminals to simulate two JVMs. When I obtain a lock and issue tryLock() without @Transactional on my application service, both of them get the lock (albeit one after the other, almost immediately). I expected one of them to NOT get it for at least 10 seconds (the default expiry).
Service (Instance -1) {
Obtain("KEY-1")
tryLock()
DoWork()
unlock();
close();
}
Service (Instance -2) {
Obtain("KEY-1")
tryLock() <-- Wait until the lock expires or the unlock happens
DoWork()
unlock();
close();
}
I also noticed in DefaultLockRepository that the transaction scope (if not inherited) only surrounds the individual JDBC operation.
When I change my service to
@Transactional
Service (Instance -1) {
Obtain("KEY-1")
tryLock()
DoWork()
unlock();
close();
}
It works as expected.
I am quite sure I missed something, but I expect my lock operation to honor global locks (the fact that a lock exists in a JDBC store with an expiration) until an unlock or expiration.
Is my understanding incorrect ?
This works as designed. I didn't configure the DefaultLockRepository correctly, and the default TTL was shorter than my service's (artificial wait) lock duration. My apologies. :) Josh Long helped me figure this out :)
You have to use different client ids. The same id means the same client; that is for a special use case. Use different client ids since they are different instances.
The behavior here is subtle (or obvious once you see how this is working) and the general lack of documentation unhelpful, so here's my experience.
I created a lock table by looking at the SQL in DefaultLockRepository, which appeared to imply a composite primary key of REGION, LOCK_KEY and CLIENT_ID - THIS WAS WRONG.
I subsequently found the SQL script in the spring-integration-jdbc JAR, where I could see that the composite primary key MUST BE on just REGION and LOCK_KEY, as @ArtemBilan says.
The reason is that the lock doesn't care about the client: the primary key must be just the REGION and LOCK_KEY columns. These columns are used when acquiring a lock, and it is the key violation on that (REGION, LOCK_KEY) pair, raised when another client tries to insert the same row, that keeps other client IDs out.
This also implies that, again as @ArtemBilan says, each client instance must have a unique ID, which is the default behavior when no ID is specified at construction time.
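Tying the two answers together, a minimal configuration sketch (bean names and the TTL value are illustrative, not from the question): the TTL must outlive the longest critical section, and each JVM keeps its own client id, which DefaultLockRepository generates automatically when none is passed to its constructor:
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.jdbc.lock.DefaultLockRepository;
import org.springframework.integration.jdbc.lock.JdbcLockRegistry;

@Configuration
public class LockConfig {

    @Bean
    public DefaultLockRepository lockRepository(DataSource dataSource) {
        DefaultLockRepository repository = new DefaultLockRepository(dataSource);
        // TTL (ms) must exceed the longest time a lock is held, otherwise
        // another JVM can acquire the lock once the row expires.
        repository.setTimeToLive(30000);
        return repository;
    }

    @Bean
    public JdbcLockRegistry lockRegistry(DefaultLockRepository lockRepository) {
        return new JdbcLockRegistry(lockRepository);
    }
}
The service would then call lockRegistry.obtain("KEY-1"), tryLock() with a timeout, do its work, and unlock() in a finally block.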

Using Hibernate, Spring Data JPA in multithreading [duplicate]

I am using Spring Batch with partitioning to do parallel processing, and Hibernate and Spring Data JPA for the database. For the partition step, the reader, processor and writer are step-scoped, so I can inject the partition key and range (from-to) into them. Now, in the processor, I have one synchronized method and expected it to run only once at a time, but that is not the case.
I set it to have 10 partitions; all 10 item readers read the right partitioned range. The problem comes with the item processor. The code below has the same logic I use.
public class AccountProcessor implements ItemProcessor<Item, Item> {

    @Override
    public Item process(Item item) {
        createAccount(item);
        return item;
    }

    // Account has unique constraints on username, gender and email.
    /*
    When 1 thread executes this method, it will create 1 account
    and save it. If the next thread comes in and tries to save the same account,
    it should find the account created by the first thread and do an update.
    But that doesn't happen; instead findIfExist returns null
    and it tries to do another insert of duplicate data.
    */
    private synchronized void createAccount(Item item) {
        Account account = accountRepo.findIfExist(item.getUsername(), item.getGender(), item.getEmail());
        if (account == null) {
            // account doesn't exist yet, create it
            account = new Account();
            account.setUsername(item.getUsername());
            account.setGender(item.getGender());
            account.setEmail(item.getEmail());
            account.setMoney(10000);
        } else {
            account.setMoney(account.getMoney() - 10);
        }
        accountRepo.save(account);
    }
}
The expected output is that only 1 thread runs this method at any given time, so that there is no duplicate insertion in the db and no DataIntegrityViolationException.
The actual result is that the second thread can't find the first account, tries to create a duplicate account and save it to the db, which causes a DataIntegrityViolationException (unique constraint error).
Since I synchronized the method, threads should execute it in order: the second thread should wait for the first thread to finish and then run, which means it should be able to find the first account.
I tried many approaches, like a volatile set that contains all unique accounts, calling saveAndFlush to commit as soon as possible, and using ThreadLocal; none of these works.
Need some help.
Since you made the item processor step-scoped, you don't really need synchronization as each step will have its own instance of the processor.
But it looks like you have a design problem rather than an implementation issue. You are trying to synchronize threads to act in a certain order in a parallel setup. When you decide to go parallel and divide the data into partitions, giving each worker (either local or remote) a partition to work on, you must accept that these partitions will be processed in an undefined order and that there should be no relation between the records of each partition or between the work done by each worker.
When 1 thread executes that method, it will create 1 account
and save it. If the next thread comes in and tries to save the same account,
it should find the account created by the first thread and do an update. But that doesn't happen; instead findIfExist returns null and it tries to do another insert of duplicate data.
That's because the transaction of thread1 may not be committed yet, hence thread2 won't find the record you think has been inserted by thread1.
It looks like you are trying to create or update some accounts with a partitioned setup. I'm not sure if this setup is suitable for the problem at hand.
As a side note, I would not call accountRepo.save(account); in an item processor but rather do that in an item writer.
Hope this helps.
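A minimal sketch of that side note (class and repository names are assumptions carried over from the question, using the List-based ItemWriter signature from Spring Batch 4): the processor only maps the item, and the writer persists the whole chunk inside the chunk's transaction:
import java.util.List;
import org.springframework.batch.item.ItemWriter;

public class AccountWriter implements ItemWriter<Account> {

    private final AccountRepository accountRepo;

    public AccountWriter(AccountRepository accountRepo) {
        this.accountRepo = accountRepo;
    }

    @Override
    public void write(List<? extends Account> accounts) {
        // One save per chunk instead of one save per item in the processor.
        accountRepo.saveAll(accounts);
    }
}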

Using QSqlQuery from multiple threads

I have a lot of C++11 threads running which all need database access at some point. In main I initialize the database connection and open the database. The Qt documentation says that queries are not thread-safe, so I hold a global mutex for as long as a QSqlQuery exists inside a thread.
This works, but is it guaranteed to work, or will I run into problems at some point?
A look at the documentation tells us that
A connection can only be used from within the thread that created it.
Moving connections between threads or creating queries from a
different thread is not supported.
So you do indeed need one connection per thread. I solved this by generating dynamic names based on the thread:
auto name = "my_db_" + QString::number((quint64)QThread::currentThread(), 16);
if(QSqlDatabase::contains(name))
return QSqlDatabase::database(name);
else {
auto db = QSqlDatabase::addDatabase( "QSQLITE", name);
// open the database, setup tables, etc.
return db;
}
In case you use threads not managed by Qt, use QThreadStorage to generate names per thread:
// must be static, to be the same for all threads
static QThreadStorage<QString> storage;
QString name;
if(storage.hasLocalData())
name = storage.localData();
else {
//simple way to get a random name
name = "my_db_" + QUuid::createUuid().toString();
storage.setLocalData(name);
}
Important: SQLite may or may not be able to handle multithreading. See https://sqlite.org/threadsafe.html. As far as I know, the SQLite embedded into Qt is thread-safe, as that's the default, and I could not find any flags that disable it in the source code. But if you are using a different SQLite version, make sure it actually supports threads.
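Putting the above together, a usage sketch (the helper name databaseForThread(), the table and the query are made up; the database is assumed to have been opened already): each thread fetches its own connection and binds its queries to it:
#include <QSqlDatabase>
#include <QSqlQuery>
#include <QVariant>
#include <QDebug>

void workerTask()
{
    // Each thread gets (or creates) its own named connection.
    QSqlDatabase db = databaseForThread();

    QSqlQuery query(db); // query explicitly bound to this thread's connection
    query.prepare("SELECT name FROM users WHERE id = ?");
    query.addBindValue(42);
    if (query.exec()) {
        while (query.next())
            qDebug() << query.value(0).toString();
    }
}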
You can write a class with SQL functions and use signals and slots to run the queries and get the results from the database.
That is thread-safe too, and there is no need to use a mutex.
You did not choose the best approach. You should use a shared QSqlDatabase object instead of a QSqlQuery. Please check the following example of multithreaded database access. If that is not clear to you, please let me know and I will explain more.

Need to lock a copy of an stl::map of pointers while reading?

I have an stl::map<int, *msg> msg_container, where msg is a class (not relevant here).
There are multiple threads adding to the global msg_container, with locks in place for synchronised access.
In a separate thread, it needs to assess a local copy of msg_container at a particular time and perform checks on it. Pseudo-code below:
map<int, *msg> msg_container;
map<int, *msg> msg_container_copy;
if (appropriate_time_is_reached)
{
msg_container_copy = msg_container;
//perform functions on msg_container_copy
}
As per my previous question, I know I will need to lock msg_container when reading, if there is a chance that other threads are adding to it.
Do I need to lock msg_container_copy when using it in this manner? It is local only to this thread, so there are no other threads that will be accessing it.
I do not see the necessity to lock the variable msg_container_copy if, as you describe, "It is local only to this thread, so there are no other threads that will be accessing it."
By the way, I think the definition "stl::map<int, *msg> msg_container;" should be written as "stl::map<int, msg *> msg_container;" if msg is a class, so that msg * is a pointer type. It must be a typo.
You don't need a lock to access msg_container_copy because no other thread can access it.
You might need a lock when dereferencing the pointers it contains, because they are shared with other threads. It depends what you do with those pointers.
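A minimal sketch of that advice (the mutex and the msg member are illustrative): hold the lock only while taking the copy, and lock again when dereferencing the shared msg objects if writers may touch them concurrently:
#include <map>
#include <mutex>

struct msg { int id = 0; /* ... */ };

std::map<int, msg*> msg_container;   // shared, guarded by container_mutex
std::mutex container_mutex;

void check_messages()
{
    std::map<int, msg*> msg_container_copy;
    {
        // Lock only while copying the shared map.
        std::lock_guard<std::mutex> guard(container_mutex);
        msg_container_copy = msg_container;
    }

    // No lock is needed to iterate the local copy itself...
    for (const auto& entry : msg_container_copy) {
        msg* m = entry.second;
        // ...but dereferencing m touches objects shared with other threads,
        // so lock here too if writers may modify or delete them concurrently.
        std::lock_guard<std::mutex> guard(container_mutex);
        // perform checks on *m, e.g. read m->id
    }
}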
