Do I need to use transaction when reading data after write?

Do I need to use transaction when reading data after write? - node.js

I have a Node.js web app with a route that marks some entity as deleted - flipping boolean field in a database. This route returns that entity. Right now I have code that looks like this:
UPDATE entity SET is_deleted=true WHERE entity.id = ?
SELECT * FROM entity WHERE entity.id = ?
For the moment I can't use RETURNING statement for other reasons.
So I got in the argument with colleague, I think that putting both UPDATE and SELECT inside transaction is unnecessary, because we are not doing anything significant with data, just returning it. As a user of the app I would expect that data that is returned is as fresh as possible, meaning that I would get same results on page refresh.
My question is, what is the best practice regarding reading data after write? Do you always wrap reading with writing inside transaction? Or it depends?

Well, for performance reasons you want to keep your transactions as small and quick as possible. This will minimize the chance to have potential locks and deadlocks that could bring your application to its knees. As such, unless there is a very good reason to do so, keep your select statements outside of the transaction. This is specially important if your need to execute a long running select statement. By putting the select inside the transaction, you keep the update locks much longer than needed.

Related

Calling Save Changes Multiple Times

I was wondering if anyone has done any perf tests around the effect calling EF Cores SaveChangesAsync() has on performance if there are no changes to be saved.
Essentially I am assuming it's basically nothing and therefore isn't a big deal to call it "just in case"?
(I am trying to do something with tracking user activity in middleware in asp net core and essentially on the way out I want to make sure save changes was called to persist the activity to the database. There is a chance that it has already been called on the context depending on the operation of the user and if that's the case I don't want to incur the cost of a second operation when the activity could be persisted as part of the normal transaction/round trip)

As you can see in implementation if there are no changes, nothing will be done. As far it has impact to performance, I don't know. But of course calling SaveChanges or SaveChangesAsync without any changes will have a performance impact in relation to don't call them.
That's the same behavior like EF6 has too.

Avoiding race condition for inserting model to DB on complex conditions

We are trying to create an algorithm/heuristic that will schedule a delivery at a certain time period, but there is definitely a race condition here, whereby two conflicting scheduled items could be written to the DB, because the write is not really atomic.
The only way to truly prevent race conditions is to create some atomic insert operation, TMK.
The server receives a request to schedule something for a certain time period, and the server has to check if that time period is still available before it writes the data to the DB. But in that time the server could get a similar request and end up writing conflicting data.
How to circumvent this? Is there some way to create some script in the DB itself that hooks into the write operation to make the whole thing atomic? By putting a locking mechanism on that script? What makes the whole thing non-atomic is the read and the wire time between the server and the DB.

Whenever I run into race condition I think of one immediate solution QUEUE.
Step 1) What you can do is that instead of adding data to a database directly you can add it to queue without checking anything.
Step 2) A separate reader will read from the queue check DB for any conflict and take necessary action.
This is one of the ways to solve this If you implement any better solution please do share it.
Hope that helps

How to deal with Command which is depend on existing records in application using CQRS and Event sourcing

We are using CQRS with EventSourcing.
In our application we can add resources(it is business term for a single item) from ui and we are sending command accordingly to add resources.
So we have x number of resources present in application which were added previously.
Now, we have one special type of resource(I am calling it as SpecialResource).
When we add this SpecialResource , id needs to be linked with all existing resources in application.
Linked means this SpecialResource should have List of ids(guids) (List)of existing resources.
The solution which we tried to get all resource ids in applcation before adding the special
resource(i.e before firing the AddSpecialResource command).
Assign these List to SpecialResource, Then send AddSpecialResource command.
But we are not suppose to do so , because as per cqrs command should not query.
I.e. command cant depend upon query as query can have stale records.
How can we achieve this business scenario without querying existing records in application?

But we are not suppose to do so , because as per cqrs command should not query. I.e. command cant depend upon query as query can have stale records.
This isn't quite right.
"Commands" run queries all the time. If you are using event sourcing, in most cases your commands are queries -- "if this command were permitted, what events would be generated?"
The difference between this, and the situation you described, is the aggregate boundary, which in an event sourced domain is a fancy name for the event stream. An aggregate is allowed to run a query against its own event stream (which is to say, its own state) when processing a command. It's the other aggregates (event streams) that are out of bounds.
In practical terms, this means that if SpecialResource really does need to be transactionally consistent with the other resource ids, then all of that data needs to be part of the same aggregate, and therefore part of the same event stream, and everything from that point is pretty straight forward.
So if you have been modeling the resources with separate streams up to this point, and now you need SpecialResource to work as you have described, then you have a fairly significant change to your domain model to do.
The good news: that's probably not your real requirement. Consider what you have described so far - if resourceId:99652 is created one millisecond before SpecialResource, then it should be included in the state of SpecialResource, but if it is created one millisecond after, then it shouldn't. So what's the cost to the business if the resource created one millisecond before the SpecialResource is missed?
Because, a priori, that doesn't sound like something that should be too expensive.
More commonly, the real requirement looks something more like "SpecialResource needs to include all of the resource ids created prior to close of business", but you don't actually need SpecialResource until 5 minutes after close of business. In other words, you've got an SLA here, and you can use that SLA to better inform your command.
How can we achieve this business scenario without querying existing records in application?
Turn it around; run the query, copy the results of the query (the resource ids) into the command that creates SpecialResource, then dispatch the command to be passed to your domain model. The CreateSpecialResource command includes within it the correct list of resource ids, so the aggregate doesn't need to worry about how to discover that information.

It is hard to tell what your database is capable of, but the most consistent way of adding a "snapshot" is at the database layer, because there is no other common place in pure CQRS for that. (There are some articles on doing CQRS+ES snapshots, if that is what you actually try to achieve with SpecialResource).
One way may be to materialize list of ids using some kind of stored procedure with the arrival of AddSpecialResource command (at the database).
Another way is to capture "all existing resources (up to the moment)" with some marker (timestamp), never delete old resources, and add "SpecialResource" condition in the queries, which will use the SpecialResource data.
Ok, one more option (depends on your case at hand) is to always have the list of ids handy with the same query, which served the UI. This way the definition of "all resources" changes to "all resources as seen by the user (at some moment)".

I do not think any computer system is ever going to be 100% consistent simply because life does not, and can not, work like this. Apparently we are all also living in the past since it takes time for your brain to process input.
The point is that you do the best you can with the information at hand but ensure that your system is able to smooth out any edges. So if you need to associate one or two resources with your SpecialResource then you should be able to do so.
So even if you could associate your SpecialResource with all existing entries in your data store what is to say that there isn't another resource that has not yet been entered into the system that also needs to be associated.
It all, as usual, will depend on your specific use-case. This is why process managers, along with their state, enable one to massage that state until the process can complete.
I hope I didn't misinterpret your question :)

You can do two things in order to solve that problem:
make a distinction between write and read model. You know what read model is, right? So "write model" of data in contrast is a combination of data structures and behaviors that is just enough to enforce all invariants and generate consistent event(s) as a result of every executed command.
don't take a rule which states "Event Store is a single source of truth" too literally. Consider the following interpretation: ES is a single source of ALL truth for your application, however, for each specific command you can create "write models" which will provide just enough "truth" in order to make this command consistent.

Why limit commands and events to one aggregate? CQRS + ES + DDD

Please explain why modifying many aggregates at the same time is a bad idea when doing CQRS, ES and DDD. Is there any situations where it still could be ok?
Take for example a command such as PurgeAllCompletedTodos. I want this command to lead to one event that update the state of each completed Todo-aggregate by setting IsActive to false.
Why is this not good?
One reason I could think of:
When updating the domain state it's probably good to limit the transaction to a well defined part of the entire state so that only this part need to be write locked during the update. Doing so would allow many writes on different aggregates in parallell which could boost performance in some extremely heavy scenarios.

The response of the question lie in the meaning of "aggregate".
As first thing I would say that you are not modifying 'n' aggregates, but you are modifying 'n' entities.
An aggregate contains more-than-one entity and it is just a transaction concept, the aggregate (pattern) is used when you need to modify the state of more than one entity in your application transactionally (all are modified or none).
Now, why you would modify more than one aggregate with one command?
If you feel this needs, before doing anything else check your aggregate boundaries to see if you can modify it to remove the needs to 1 command -> 'n' aggregate.
An aggregate can contains a lot of entities of the same type, so for your command PurgeAllCompletedTodos, you could also think about expand the transaction boundary from a single Todo to an aggregate UserTodosAggregate that contains all the user todos, and let it manage all the commands for the todos of a single user.
In this way you can modify all the todos of a user in a single transaction.
If this still doesn't solve your problem because, let's say that is needed to purge all completed todos of each user in the application, you will still need to send a command to 'n' aggregates, the aggregate boundary doesn't help, so we can think of having an AllApplicationTodosAggregate that manage the command.
Probably this isn't the best solution, because as you said it that command would block ALL the todos of the application, but, always check if it can be a good trade off (this part of the blocking is explained very well in both Blue Book and Red Book of DDD).
What if I need to modify some entities and can't have them in a single aggregate?
With the previous said, a command that modify more than one aggregate is bad because of transactions. What if you modify 3 aggregate, the first is good, and then the server is shut down?
In this case what you are doing is having a lot of single modification that needs to be managed to prevent inconsistency of the system.
It can be done using a process manager, whom responsabilities are modify all the aggregates sending them the right command and manage failures if they happen.
An aggregate still receive it's own command, but the process manager is in charge to send them in a way it knows (one at time, all in parallel, 5 per time, what-do-you-want)
So you can have a strategy to manage the failure between two transaction, and make decision like: "if something fail, roll back all the modification done untill now" (sending a rollback command to each aggregate), or "if an operation fail repeat it 3 times each 30 minutes and if doens't work then rollback", "if something fail create a notification for the system admin".
(sorry for the long post, at least hope it helps)

What's the best way to keep count of the data set size information in Core Data?

Right now whenever I need to access my data set size (and it can be quite frequently), I perform a countForFetchRequest on the managedObjectContext. Is this a bad thing to do? Should I manage the count locally instead? The reason I went this route is to ensure I am getting 100% correct answer. With Core Data being accessed from more than one places (for example, through NSFetchedResultsController as well), it's hard to keep an accurate count locally.

-countForFetchRequest: is always evaluated in the persistent store. When using the Sqlite store, this will result in IO being performed.
Suggested strategy:
Cache the count returned from -countForFetchRequest:.
Observe NSManagedObjectContextObjectsDidChangeNotification for your own context.
Observe NSManagedObjectContextDidSaveNotification for related contexts.
For the simple case (no fetch predicate) you can update the count from the information contained in the notification without additional IO.
Alternately, you can invalidate your cached count and refresh via -countForFetchRequest: as necessary.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string