Multithreaded UnitOfWork in NHibernate - multithreading

We are working on a C# Windows service using NHibernate which is supposed to process a batch of records.
The service has to process about 6000-odd records, and it is currently taking about 3 hours to get through them. A lot of db hits are incurred, and while we are trying to minimize those, we are also exploring multithreading options to improve performance.
We are using the UnitOfWork pattern to access the NHibernate session.
This is roughly how the service looks:
public class BatchService
{
    public void DoWork()
    {
        StartUnitOfWork();
        foreach (var record in recordsToBeProcessed)
        {
            Process(record);
            // Perform lots of db operations
        }
        StopUnitOfWork();
    }
}
We were thinking of using the Task Parallel Library to process these records in batches (using the Parallel.ForEach() method).
From what I have read about NHibernate so far, we should give each thread a separate NHibernate session.
My question is how to supply this, considering the UnitOfWork pattern only makes a single session available.
Should I be looking at wrapping a UnitOfWork around the processing of a single record?
Any help much appreciated.
Thanks

The best way is to start a new unit of work for each thread; use a thread-static contextual session (NHibernate.Context.ThreadStaticSessionContext). You must be aware of detached entities.
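For illustration, here is a minimal sketch of what binding a thread-static contextual session per unit of work might look like. PerThreadUnitOfWork and Run are hypothetical names, and it assumes the session factory was configured with current_session_context_class set to thread_static:

using System;
using NHibernate;
using NHibernate.Context;

public static class PerThreadUnitOfWork
{
    // Opens a session, binds it to the current thread, runs the work in a
    // transaction, then unbinds and disposes the session on the same thread.
    public static void Run(ISessionFactory sessionFactory, Action work)
    {
        ISession session = sessionFactory.OpenSession();
        CurrentSessionContext.Bind(session);
        try
        {
            using (ITransaction tx = session.BeginTransaction())
            {
                work(); // code inside can call sessionFactory.GetCurrentSession()
                tx.Commit();
            }
        }
        finally
        {
            CurrentSessionContext.Unbind(sessionFactory);
            session.Dispose();
        }
    }
}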

The easiest way is to wrap the processing of each record in its own unit of work and then run each UOW on its own thread. You need to make sure that each UOW and session is started, used and completed on a single thread.
To gain performance you could split the batch of records into smaller batches, wrap the processing of each smaller batch in a UOW, and execute those on separate threads, as sketched below.
Depending on your workload, using a second-level cache (memcached/membase) might dramatically improve your performance (e.g. if you need to read some records from the db while processing each one).
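A rough sketch of that batching idea, assuming the poster's existing StartUnitOfWork/StopUnitOfWork/Process methods open, commit and dispose a session per call (names reused from the question; not a definitive implementation):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class ParallelBatchService
{
    public void DoWork(IList<Record> recordsToBeProcessed, int batchSize = 200)
    {
        // Split the records into smaller batches.
        var batches = recordsToBeProcessed
            .Select((record, index) => new { record, index })
            .GroupBy(x => x.index / batchSize, x => x.record);

        Parallel.ForEach(batches, batch =>
        {
            // Each batch gets its own unit of work (and therefore its own session),
            // created, used and disposed on this worker thread only.
            StartUnitOfWork();
            try
            {
                foreach (var record in batch)
                {
                    Process(record);
                }
            }
            finally
            {
                StopUnitOfWork();
            }
        });
    }

    private void StartUnitOfWork() { /* open session + transaction for this thread */ }
    private void StopUnitOfWork() { /* commit and dispose */ }
    private void Process(Record record) { /* per-record db work */ }
}

public class Record { }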

Related

Multiple insert in single transaction using multithreading

I am using Spring Boot JPA and have marked a method as transactional.
Inside the transactional method there are multiple inserts which are independent of each other.
I am trying to improve performance by using multiple threads inside the transactional method, but the transactional behavior is lost when I do this.
Can you please suggest a way to achieve this behavior?
@Transactional
public void methodA() {
    repo1.save(A);
    repo2.save(B);
    repo3.save(C);
    repo4.save(D);
}
I want to run these save operations in different threads but in one single transaction.

Synchronicity of Azure stored procedures [duplicate]

Can DocumentDB stored procedures run in parallel and update the same object? Will DocumentDB process them sequentially?
Consider the following scenario.
I have an app and I have 10000 coins to give away to my users when they complete a task. And I have the following object
{
remainingPoints: 10000
}
I have a stored procedure that subtracts 10 points from this object and adds them to the users' points.
Now let's say 10 users complete the task at the same time and I call the stored procedure 10 times at once. Will DocumentDB execute them sequentially, or will I have to execute the stored procedures sequentially myself?
I had similar questions when I first started using DocumentDB and got good answers here and in email from the DocumentDB product managers. Quoting:
Stored procedures ... get an isolated snapshot of the database for transactional support. The snapshot reflects the current state of the world (no stale data) at the time the sproc begins execution (strongly consistent).
Caveat – since stored procedures are operating on a snapshot, you can still get a stale read in a sproc if a new write comes in from the outside world during execution.
Also, stored procedures will ALWAYS read their own writes.
Sprocs are DocumentDB’s mechanism for multi-document transactions. Sproc writes are committed when a sproc successfully completes execution. If an exception is thrown, all work done in a sproc gets rolled back.
So if two sprocs are running concurrently, they won’t see each other’s writes.
If both sprocs happen to write to the same document (replace) – then the 2nd one will fail due to an etag mismatch when it attempts to commit writes.
From that, I went forward with my design making sure to use ETags in my writes as @Julian suggests. I also automatically retry each sproc execution up to 3 times to handle the case where they fail due to parallel operations, among other reasons. In practice, I've never exceeded the 3 retries (except in cases where my sproc had a bug) and I rarely even get a single retry.
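As an illustration of that retry approach, here is a rough C# sketch. executeSproc and isEtagConflict are placeholders for however your DocumentDB client executes the stored procedure and recognizes the 412/ETag-mismatch failure; they are not a specific SDK API:

using System;
using System.Threading.Tasks;

public static class SprocRetry
{
    public static async Task<T> ExecuteWithRetryAsync<T>(
        Func<Task<T>> executeSproc,
        Func<Exception, bool> isEtagConflict,
        int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await executeSproc();
            }
            catch (Exception ex) when (isEtagConflict(ex) && attempt < maxAttempts)
            {
                // Another sproc committed first; back off briefly and re-run
                // so we operate on the new snapshot.
                await Task.Delay(TimeSpan.FromMilliseconds(50 * attempt));
            }
        }
    }
}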
I assume from the behavior that I observe that it sends each new sproc execution to a different replica until it runs out of replicas and then it queues them for sequential execution, so it's a hybrid of parallel and serial execution.
One other tip that I learned through experimentation is that you are better off doing pure read operations (no writes and no significant aggregation) client-side rather than in a sproc when you are on a heavily loaded system. I assume the advantage is because DocumentDB can satisfy different reads from different replicas in parallel. I have modularized my sproc code using the expandScript functionality of documentdb-utils to make sure that I use the exact same code for write validation, intra-document consistency, and derived fields both client-side and server-side, which is possible using node.js. Even if you are mostly .NET, you may want to use expandScripts to build your sprocs in a modular DRY way. You'll still need to run node.js in your build process to pre-process your sprocs or use Edge.NET (node running inside of .NET) to do so on the fly.
It will depend on the consistency level you have chosen for your collection, but the idea is that DocumentDB handles concurrency using ETags and executes stored procedures on a snapshot of a document version, committing the result only if the execution succeeds.
See: https://azure.microsoft.com/en-us/documentation/articles/documentdb-faq/#develop
This thread may help too: Atomically increment an integer in a document in Azure DocumentDB

How to optimize mongodb insert query in nodejs?

We are doing manipulation and insertion of data in MongoDB. A single insert into MongoDB takes about 28 ms, and I have to insert twice per request. If I get 6000 requests at a time, inserting each record individually takes far too long. How can I optimize this? Kindly help me on this.
var obj = new gnModel({
    id: data.EID,
    val: data.MID,
});
let response = await insertIntoMongo(obj);
If it is not vital for the data to be stored immediately, you can implement some form of batching.
For example, you can have a service which queues operations and commits them to the database every X seconds. In the service itself, you can use Mongo's Bulk API, and more specifically Bulk.insert() for insertion. It lets you queue operations to be executed as a single query (or at least a minimal number of queries/round trips).
It would also be a good idea to serialize and store this operation log/cache somewhere, as a server restart will wipe it out if it is kept entirely in memory. A possible solution is Redis, as it can both persist data to disk and is distributed, enabling you to queue operations from different application instances.
You'll achieve even better performance if the operations are unrelated and not dependent on each other. In this case you can use db.collection.initializeUnorderedBulkOp(), which allows Mongo to execute the operations in parallel instead of sequentially, and a single failed operation won't affect the execution of the rest of the set (contrary to an ordered bulk op).

Should I use Hazelcast to detect duplicate requests to a REST service

I have a simple use case. I have a system where duplicate requests to a REST service (with dozens of instances) are not allowed. However, duplicates are also difficult to prevent because of a complicated datastore configuration and downstream services.
So the only way I can prevent duplicate "transactions" is to have some centralized place where I write a unique hash of the request data. Each REST endpoint first checks whether the hash of a new request already exists and only proceeds if no such hash exists.
For purposes of this question assume that it's not possible to do this with database constraints.
One solution is to create a table in the database where I store my request hashes and always write to this table before proceeding with the request. However, I want something lighter than that.
Another solution is to use something like Redis and write my unique hashes there before proceeding with the request. However, I don't want to spin up and maintain a Redis cluster, etc.
I was thinking of embedding Hazelcast in each of my app instances and writing my unique hashes there. In theory, all instances will see the hash in the memory grid and will be able to detect duplicate requests. This solves my problem of wanting something lighter than a database and the other requirement of not having to maintain a Redis cluster.
OK, now for my question. Is it a good idea to use Hazelcast for this use case?
Will Hazelcast be fast enough to detect duplicate requests that come in milliseconds or microseconds apart?
If request 1 comes into instance 1 and request 2 comes into instance 2 microseconds apart, and instance 1 writes a hash of the request to Hazelcast while instance 2 checks Hazelcast for that hash only milliseconds later, will the hash be visible by then? Will Hazelcast propagate the data across the cluster in time? Does it even need to?
Thanks in advance, all ideas are welcome.
Hazelcast is definitely a good choice for this kind of use case, especially if you use a Map<String, Boolean> and test with Map::containsKey instead of retrieving the element and checking for null. You should also set a TTL when putting the element, so you won't run out of memory. However, as with Redis, we recommend running Hazelcast as a standalone cluster for "bigger" datasets, as the lifecycle of cached elements normally interferes with the rest of the application and complicates GC optimization. Running Hazelcast embedded is a choice that should be taken only after serious consideration and testing of your application at runtime.
Yes, you can use a Hazelcast distributed map to detect duplicate requests to a REST service: whenever a put operation happens on the Hazelcast map, the data becomes available to all the other clustered instances.
From what I've read and seen in the tests, it doesn't actually replicate. It uses a data grid to distribute the primary data evenly across all the nodes rather than each node keeping a full copy of everything and replicating to sync the data. The great thing about this is that there is no data lag, which is inherent to any replication strategy.
There is a backup copy of each node's data stored on another node, and that obviously depends on replication, but the backup copy is only used when a node crashes.
See the code below, which creates two clustered Hazelcast instances and gets the distributed map. One Hazelcast instance puts data into the distributed IMap and the other instance reads it back.
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class TestHazelcastDataReplication {

    // Create 1st instance
    public static final HazelcastInstance instanceOne = Hazelcast
            .newHazelcastInstance(new Config("distributedFirstInstance"));

    // Create 2nd instance
    public static final HazelcastInstance instanceTwo = Hazelcast
            .newHazelcastInstance(new Config("distributedSecondInstance"));

    // Insert into distributedMap using instance one
    static IMap<Long, Long> distributedInsertMap = instanceOne.getMap("distributedMap");

    // Read from distributedMap using instance two
    static IMap<Long, Long> distributedGetMap = instanceTwo.getMap("distributedMap");

    public static void main(String[] args) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                for (long i = 0; i < 100000; i++) {
                    // Inserting data into distributedMap using 1st instance
                    distributedInsertMap.put(i, System.currentTimeMillis());
                    // Reading data from distributedMap using 2nd instance
                    System.out.println(i + " : " + distributedGetMap.get(i));
                }
            }
        }).start();
    }
}

Transactional operation with SaveChanges and ExecuteStoreCommand

I have a problem that I would like to share. The context is a bit messy, so I will try to do my best in the explanation.
I need to create a transactional operation over a number of entities. I'm working with EF Code First, but with a legacy database that I can't change. In order to create a more consistent model than the database provides, I'm projecting the database information into more refined entities I created on my own.
As I need to use different contexts, my initial idea was to use TransactionScope, which has given me good results in the past. Why do I need different contexts? Due to various problems with the db, I can't make the updates in a single operation (UnitOfWork). I need to retrieve different IDs which only appear after SaveChanges().
using (var scope = new TransactionScope())
{
    Operation1();
    Operation2();
    Operation3(); // uses ExecuteStoreCommand
    SaveChanges();
    Operation4();
    SaveChanges();
    scope.Complete();
}
I know that, in order to use TransactionScope, I need to share the same connection among all the operations (and I'm doing that, passing the context to the objects). However, when I execute one of the operations (the one using ExecuteStoreCommand), or when I try to do some update after the first SaveChanges, I always get the MSDTC error (support for distributed transactions is disabled) or, more rarely, unloaded-domain errors.
I don't know if someone can help me, at least to point me toward the best direction for this scenario.
Have a look at this answer:
Entity Framework - Using Transactions or SaveChanges(false) and AcceptAllChanges()?
The answer does exactly what you require: a transaction over multiple data contexts.
I found this post on Transactions and Connections in Entity Framework 4.0 really helpful too.
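For convenience, here is a condensed sketch of the pattern from that linked answer, written against the EF 4 ObjectContext API (with a DbContext you would reach the ObjectContext via IObjectContextAdapter). Treat it as an outline under those assumptions rather than a drop-in implementation:

using System.Data.Objects;   // ObjectContext, SaveOptions
using System.Transactions;

public static class MultiContextTransaction
{
    public static void Save(ObjectContext context1, ObjectContext context2)
    {
        using (var scope = new TransactionScope())
        {
            // Persist, but keep the change trackers "dirty" for now.
            context1.SaveChanges(SaveOptions.DetectChangesBeforeSave);
            context2.SaveChanges(SaveOptions.DetectChangesBeforeSave);

            scope.Complete();

            // Only accept the changes once the whole transaction has completed;
            // if the scope had failed, a retry would still see the pending changes.
            context1.AcceptAllChanges();
            context2.AcceptAllChanges();
        }
    }
}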
For people who may need a simpler solution, here's what I use when I need to mix ExecuteStoreCommand and SaveChanges in a transaction.
using (var dataContext = new ContextEntities())
{
    dataContext.Connection.Open();
    var trx = dataContext.Connection.BeginTransaction();

    var sql = "DELETE TestTable WHERE SomeCondition";
    dataContext.ExecuteStoreCommand(sql);

    var list = CreateMyListOfObjects(); // this could throw an exception
    foreach (var obj in list)
        dataContext.TestTable.AddObject(obj);
    dataContext.SaveChanges(); // this could throw an exception

    trx.Commit();
}
