If we know that some organizations might want to keep certain information private from others, why not just create a separate channel? Is private data purely for management and to reduce channel overhead?
I've read the documentation on when to use a collection within a channel vs. a separate channel:
Use channels when entire transactions (and ledgers) must be kept confidential within a set of organizations that are members of the channel.
Use collections when transactions (and ledgers) must be shared among a set of organizations, but when only a subset of those organizations should have access to some (or all) of the data within a transaction. Additionally, since private data is disseminated peer-to-peer rather than via blocks, use private data collections when transaction data must be kept confidential from ordering service nodes.
Take a practical example. There is an auction house and 3-4 vendors who regularly bid; the bidding type is a closed auction. The auction house is one node and will announce the item to be bid upon. This item must be visible to all the vendors. Each vendor will then submit their bid for the item over the blockchain. As each bid is private, a vendor may view only their own bid, while the auction house has full visibility.
Without private data
1) Channel PUBLIC -> the auction house posts the item to be bid on; all vendors can view it.
2) Channels VENDOR_1, VENDOR_2, VENDOR_3 -> each of these channels contains only the auction house and one vendor; that vendor submits its bid there.
What happens is that the auction house now has to check bids across multiple channels, choose the winner and then update all channels appropriately. At larger scale, and in more complex systems, the associated overhead is massive. You may require separate modules/API calls just to ensure the state of certain objects (bids) stays the same across channels.
Instead, private data allows a single channel to be used. A vendor may submit a bid that is viewable by everyone, BUT mark the price of the bid as private, so that only the auction house and that vendor can view it.
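As a rough illustration of that single-channel approach, here is a minimal sketch using Fabric's Node chaincode API (fabric-contract-api, TypeScript). The contract name, key prefixes and the per-vendor collection name are all hypothetical, not from any real project: the bid's existence goes into public channel state, while the price travels in the transient map and is written only to a private data collection shared by the auction house and the submitting vendor.

```typescript
import { Context, Contract } from 'fabric-contract-api';

export class BidContract extends Contract {

    // itemId/bidId/collection are illustrative parameters; the collection must be
    // defined in the chaincode's collection config with the auction house and the
    // submitting vendor as its members.
    public async submitBid(ctx: Context, itemId: string, bidId: string, collection: string): Promise<void> {
        // The price is passed in the transient map so it never appears in the
        // public transaction payload.
        const transient = ctx.stub.getTransient();
        const price = transient.get('price');
        if (!price) {
            throw new Error('bid price must be supplied in the transient map');
        }

        const vendorMsp = ctx.clientIdentity.getMSPID();

        // Publicly visible on the channel: which vendor bid on which item.
        await ctx.stub.putState(
            `BID_${bidId}`,
            Buffer.from(JSON.stringify({ itemId, vendorMsp }))
        );

        // Visible only to the collection members (auction house + this vendor):
        // the actual price. Other organizations only ever see its hash.
        await ctx.stub.putPrivateData(collection, `BIDPRICE_${bidId}`, price);
    }
}
```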
Yes, private data is mostly used to reduce channel overhead.
Adding a new private data collection dynamically is convenient and has practically no overhead on the network.
Whereas having too many channels in the network can lead to a maintenance nightmare and can drastically affect the network's performance.
When to use multiple channels:
when it is acceptable to have isolated transactions, and
the number of channels is manageable.
When to use a private data collection:
when you only need to hide the transaction data (the confidential payload) and not to isolate other users from seeing that an interaction between the parties took place (others can only see the hash of the data anyway, but they would know there was a transaction between the involved parties); a small sketch of that hash check follows below.
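To make the "others can only see the hash" point concrete, here is a small hedged sketch in the same hypothetical fabric-contract-api style as above, assuming Fabric's default SHA-256 hashing of private values: an organization outside the collection cannot read the private value, but it can read the on-chain hash and use it to verify a value disclosed to it off-chain.

```typescript
import * as crypto from 'crypto';
import { Context, Contract } from 'fabric-contract-api';

export class BidAuditContract extends Contract {

    // Callable by any channel member, including organizations that are NOT in
    // the collection: they cannot fetch the private value itself, only its hash.
    public async verifyBidPrice(
        ctx: Context,
        collection: string,
        bidId: string,
        claimedPrice: string
    ): Promise<boolean> {
        // The hash of the private value is stored on the channel ledger.
        const onChainHash = await ctx.stub.getPrivateDataHash(collection, `BIDPRICE_${bidId}`);
        // Compare it against the hash of a value the vendor disclosed off-chain.
        const claimedHash = crypto.createHash('sha256').update(Buffer.from(claimedPrice)).digest();
        return Buffer.from(onChainHash).equals(claimedHash);
    }
}
```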
I would like to highlight one important distinction (it's also in your documentation quote): private data collections hide the transaction data from the orderers too, i.e. the private data is disseminated peer-to-peer and is never sent to the ordering service; only its hash goes into the ordered transaction. When using the multiple-channels approach, your transaction data is shared with the orderer(s).
I'm trying to design a double-entry ledger with DDD and running into some trouble with defining aggregate roots. There are three domain models:
LedgerLine: individual line items that have data such as amount, timestamp they are created at, etc.
LedgerEntry: entries into the ledger. Each entry contains multiple LedgerLines where the debit and credit lines must balance.
LedgerAccount: accounts in the ledger. There are two types of accounts: (1) internal accounts (e.g. cash) (2) external accounts (e.g. linked bank accounts). External accounts can be added/removed.
After reading some articles online (e.g. this one: https://lorenzo-dee.blogspot.com/2013/06/domain-driven-design-accounting-domain.html?m=0), it seems like LedgerEntry should be one aggregate root, holding references to LedgerLines, and LedgerAccount should be the other aggregate root. LedgerLines would hold the corresponding LedgerAccount's ID.
While this makes a lot of sense, I'm having trouble figuring out how to update the balance of ledger accounts when ledger lines are added. The above article suggests that the balance should be calculated on the fly, which means it wouldn't need to be updated when LedgerEntrys are added. However, I'm using Amazon QLDB for the ledger, and their solutions engineer specifically recommended that the balance should be computed and stored on the LedgerAccount, since QLDB is not optimized for that kind of "scanning through lots of documents" operation.
Now the dilemma ensues:
If I update the balance field synchronously when adding LedgerEntrys, then I would be updating two aggregates in one operation, which violates the consistency boundary.
If I update the balance field asynchronously after receiving the event emitted by the "Add LedgerEntry" operation, then I could be reading a stale balance on the account if I add another LedgerEntry that spends the balance on the account, which could lead to overdrafts.
If I subsume the LedgerAccount model into the same aggregate of LedgerEntry, then I lose the ability to add/remove individual LedgerAccount since I can't query them directly.
If I get rid of the balance field and compute it on the fly, then there could be performance problems given (1) the QLDB limitation and (2) the fact that the number of ledger lines is unbounded.
So what's the proper design here? Any help is greatly appreciated!
You could use the Saga pattern to ensure the whole process completes or fails.
Here's a primer: https://medium.com/@lfgcampos/saga-pattern-8394e29bbb85
I'd add a 'reserved funds' collection owned by the Ledger Account.
A Ledger Account will have an 'Actual' balance and an 'Available' balance.
The 'Available' balance is the 'Actual' balance less the total value of 'reserved funds'.
Using a Saga to manage the flow:
Try to reserve funds on the Account aggregate. The Ledger Account will check its available balance (actual minus total of reserved funds) and, if sufficient, add another reservation to its collection. If the reservation succeeds, the Account aggregate will return a unique reservation id. If the reservation fails, then the entry cannot be posted.
Try to complete the double-entry bookkeeping. If it fails, send a 'release reservation' command to the Account aggregate quoting the unique reservation id, which will remove the reservation, and we're back to where we started.
After the double-entry bookkeeping is complete, send a command to the Account aggregate to 'complete' the reservation, quoting the unique reservation id. The Account aggregate will then remove the reservation and adjust its actual balance.
In this way, you can manage a distributed transaction without the possibility of an account going overdrawn; a rough sketch of such an aggregate follows below.
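Here is a minimal in-memory sketch of the account aggregate described above. It is TypeScript with purely illustrative names (LedgerAccount, Reservation), no persistence and no saga framework; it only shows the reserve / release / complete operations the saga would call.

```typescript
import { randomUUID } from 'crypto';

interface Reservation { id: string; amount: number; }

class LedgerAccount {
    private reservations: Reservation[] = [];

    constructor(public readonly id: string, private actualBalance: number) {}

    private get reservedTotal(): number {
        return this.reservations.reduce((sum, r) => sum + r.amount, 0);
    }

    // 'Available' balance = 'Actual' balance less the value of reserved funds.
    get availableBalance(): number {
        return this.actualBalance - this.reservedTotal;
    }

    // Saga step 1: reserve funds, or fail fast so the entry is never posted.
    reserveFunds(amount: number): string {
        if (amount > this.availableBalance) {
            throw new Error('insufficient available balance');
        }
        const reservation = { id: randomUUID(), amount };
        this.reservations.push(reservation);
        return reservation.id;
    }

    // Compensation: the double-entry posting failed, give the funds back.
    releaseReservation(reservationId: string): void {
        this.reservations = this.reservations.filter(r => r.id !== reservationId);
    }

    // Saga step 3: posting succeeded, turn the reservation into a real balance change.
    completeReservation(reservationId: string): void {
        const reservation = this.reservations.find(r => r.id === reservationId);
        if (!reservation) throw new Error('unknown reservation');
        this.actualBalance -= reservation.amount;
        this.releaseReservation(reservationId);
    }
}

// Saga usage (sketch): const reservationId = account.reserveFunds(entryTotal);
// post the LedgerEntry; then completeReservation(reservationId) on success,
// or releaseReservation(reservationId) on failure.
```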
An aggregate root should serve as a transaction boundary. A multi-legged transaction spans multiple accounts, hence an account cannot be one.
So the ledger itself is an aggregate root. An accounting transaction should correspond to a database transaction.
Actually, "the ledger itself" doesn't mean a singleton. It can be a ledger per org branch and time period (org branch × time period). And it usually is, in non-computer event-sourcing systems.
Update.
A ledger account balance is merely a view into the ledger. And as a view, it has a state as of some known event. When deciding whether to accept an operation or not, you should make sure that the actual (latest) state of the ledger is the state the balance was computed from. If it is not, the newer events should be processed first, and then the account operation should be tried again.
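One hedged way to express "a view as of some known event" in code (all names below are illustrative, not from any framework): the balance projection carries the sequence number of the last ledger event it has applied, and a command is only evaluated once the view has caught up with the ledger.

```typescript
interface LedgerEvent {
    sequence: number;    // monotonically increasing position in the ledger
    accountId: string;
    amount: number;      // signed debit/credit effect on this account
}

interface AccountBalanceView {
    accountId: string;
    balance: number;
    asOfSequence: number;   // last ledger event folded into this balance
}

// Fold one more ledger event into the view.
function applyEvent(view: AccountBalanceView, event: LedgerEvent): AccountBalanceView {
    return { ...view, balance: view.balance + event.amount, asOfSequence: event.sequence };
}

// Decide on a withdrawal only when the view is up to date with the ledger.
function decideWithdrawal(
    view: AccountBalanceView,
    latestLedgerSequence: number,
    amount: number
): 'accept' | 'reject' | 'catch-up-first' {
    // The view is stale: process the newer events first, then try again.
    if (view.asOfSequence < latestLedgerSequence) return 'catch-up-first';
    return view.balance >= amount ? 'accept' : 'reject';
}
```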
Hyperledger Fabric provides built-in support for storing off-chain (private) data with the help of private data collections. For this we need to specify a collection config, which contains the various collection names along with the participants that have access to the data present in those collections.
There is a setting called "BlockToLive" with which we can specify for how many blocks the peers should keep the private data they have access to. The peers automatically purge the private data once the ledger block height exceeds the specified threshold.
We have a requirement in which we need to use the private data collections but the data should be removed (automatically/manually) after exactly 30 days. Is there any possibility to achieve the same?
timeToLive: is there any implementation of a timeToLive or similar configuration, so that the peers automatically purge the data after the specified duration?
If there is no automatic way currently, how can the data present in a private collection be removed manually? Is there any way the data in private collections can be removed directly using external scripts/code? We don't want to create chaincode methods that are invoked as transactions to delete private data, because even the deletion of private data would need to be endorsed, sent to the orderer, and added to the ledger. How can the private data be removed directly?
First, everything that you put on the blockchain is permanent and supposed to be decentralized. So having unilateral control over when to delete the private data goes against the virtue of decentralization and you should avoid it (answer to point 2). Endorsers endorse every change or transaction (including the BlockToLive setting), so it does not make sense to deviate from the agreed period.
Second, time in distributed systems is subjective and it is impossible to have a global clock ⏰ (say, 30 days for one node can be 29.99 days for another, or 29.80 days for yet another). Hence, time is measured in blocks, which is objective for all nodes. So it is recommended that you use BlockToLive. It can be difficult at first, but you can calculate backwards.
Say your block size is 10 (number of transactions in a block) and you expect around 100 transactions per day; then you can set BlockToLive = 300. (Of course, this is a ballpark number.)
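For reference, the same arithmetic as a tiny helper. The 10 transactions/block and 100 transactions/day figures are only assumptions to be replaced with your network's observed numbers; the result is the value you would put in the collection's blockToLive field.

```typescript
// Back-of-the-envelope conversion from a retention period in days to blocks.
function blockToLiveForDays(days: number, txPerDay: number, txPerBlock: number): number {
    const blocksPerDay = txPerDay / txPerBlock;   // e.g. 100 / 10 = 10 blocks per day
    return Math.ceil(days * blocksPerDay);        // e.g. 30 * 10 = 300 blocks
}

// blockToLiveForDays(30, 100, 10) === 300, matching the ballpark figure above.
```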
Finally, if you still want to delete private data at will, I would recommend manual off-chain storage mechanisms.
I am aware there are a lot of topics on set validation, and I won't say I have read every single one of them, but I've read a lot and still don't feel I've seen a definitive answer that doesn't smell hackish.
Consider this:
we have a concept of Customer
Customer has some general details data
Customer can make Transaction (buying things from the store)
if Customer is in credit mode then he has a limit of how much he can spend in a year
number of Transactions per Customer per year can be huge (thousands+)
it is critical that Customer never spends a cent over the limit (there is no human delivering goods who would check the limit manually)
Customer can either create new Transaction or add items to existing ones and for both the limit must be checked
Customer can actually be a Company behind which there are many Users making the actual transactions, meaning Transactions can be created/updated concurrently
Obviously, I want to avoid loading all Transactions for a Customer when creating a new or editing an existing Transaction, as it doesn't scale well for a huge number of Transactions.
If I introduce an aggregate dedicated to checking currentLimitSpent before creating/updating a Transaction, then I have a non-transactional create/update (one step to check currentLimitSpent and then another to create/update the Transaction).
I know how to implement this if I don't care about all the DDD rules (or with an eventual consistency approach), but I am wondering if there is some idiomatic DDD way of solving this kind of problem with strict consistency that doesn't involve loading all Transactions for every Transaction create/update.
it is critical that Customer never spends a cent over the limit (there is no human delivering goods who would check the limit manually)
Please read this couple of posts: RC Don't Exist and Eventual Consistency.
If the system's owners still think that the condition must be honored, then, to avoid concurrency issues, you could use a precomputed currentLimitSpent stored in persistence (since there is no event-sourcing tag in your question) to check the invariant, and use it as an optimistic concurrency flag.
Hydrate your aggregate with currentLimitSpent and any other data you need from persistence.
Check the invariant (currentLimitSpent + newTransactionValue <= customerMaxCredit).
Persist (currentLimitSpent + newTransactionValue) as the new currentLimitSpent.
If currentLimitSpent has changed in persistence while the aggregate was working (many Users in the same Company making transactions), you should get an optimistic concurrency error from persistence.
You could stop on the exception, or rehydrate the aggregate and try again.
This is an overview (a rough sketch of these steps follows below); it cannot be more detailed without going into tech stack details and architectural design.
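A rough sketch of that flow, assuming a relational store accessed with node-postgres and a made-up customer_credit table; the previously read currentLimitSpent value itself acts as the optimistic concurrency flag in the UPDATE's WHERE clause.

```typescript
import { Pool } from 'pg';

const pool = new Pool();   // connection settings come from the environment

// Returns true if the transaction value was accepted, false if it would exceed the limit.
async function tryAddTransaction(customerId: string, value: number, maxRetries = 3): Promise<boolean> {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        // 1) Hydrate: read the precomputed spent amount and the credit limit.
        const { rows } = await pool.query(
            'SELECT current_limit_spent, max_credit FROM customer_credit WHERE customer_id = $1',
            [customerId]
        );
        if (rows.length === 0) throw new Error('unknown customer');
        const spent = Number(rows[0].current_limit_spent);
        const maxCredit = Number(rows[0].max_credit);

        // 2) Check the invariant.
        if (spent + value > maxCredit) return false;

        // 3) Persist, using the previously read value as the optimistic concurrency flag.
        //    (Inserting the Transaction row itself would go in the same DB transaction; omitted here.)
        const result = await pool.query(
            `UPDATE customer_credit
                SET current_limit_spent = current_limit_spent + $2
              WHERE customer_id = $1 AND current_limit_spent = $3`,
            [customerId, value, spent]
        );
        if (result.rowCount === 1) return true;   // nobody raced us, done
        // Someone else updated the row in the meantime: rehydrate and try again.
    }
    throw new Error('could not add transaction after retries (too much contention)');
}
```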
In my domain I have Product and Order aggregates. Orders reference Products. This is a 1-to-n relationship, so a Product has many Orders and Orders belong to a Product. When a Product is discontinued, a ProductDiscontinued event is published and all Orders that belong to that Product must be cancelled. So there's an adapter that receives the ProductDiscontinued event via RabbitMQ. The adapter then delegates cancelling Orders to an application service. How can I ensure that each single Order is cancelled in a single transaction? Should the adapter iterate all Orders of the discontinued Product and call the application service for every single Order? Should I just ignore that I modify more than one aggregate in a single transaction and call the application service just once with a list of all affected OrderIds? Is there a better solution?
From the DDD point of view, the Aggregate is the transaction boundary. The transaction should not be larger than the Aggregate. This rule exists in order to force one to design the Aggregates correctly, so as not to depend on multiple Aggregates being modified in the same transaction.
However, you already designed your Aggregates having that in mind (from what I can see).
Should the adapter iterate all Orders of the discontinued Product and call the application service for every single Order?
This is the normal way of doing things.
Should I just ignore that I modify more than one aggregate in a single transaction and call the application service just once with a list of all affected OrderIds?
In the context of what I wrote earlier, you may do that if somehow it offers a better performance (I don't see how a bigger transaction can give better performance but hey, it depends on the code also).
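A hedged sketch of that per-Order approach (all interfaces and names below are illustrative, not from any specific framework): the adapter fetches the affected Order ids and triggers one application-service call, and therefore one transaction, per Order.

```typescript
interface OrderRepository {
    findIdsByProductId(productId: string): Promise<string[]>;
}

interface CancelOrderService {
    // One call = one transaction = one Order aggregate modified.
    cancelOrder(orderId: string): Promise<void>;
}

class ProductDiscontinuedAdapter {
    constructor(
        private readonly orders: OrderRepository,
        private readonly cancelOrderService: CancelOrderService,
    ) {}

    // Called by the RabbitMQ consumer when a ProductDiscontinued event arrives.
    async onProductDiscontinued(productId: string): Promise<void> {
        const orderIds = await this.orders.findIdsByProductId(productId);
        for (const orderId of orderIds) {
            // Each cancellation runs in its own transaction; a failure for one
            // Order does not roll back the Orders already cancelled.
            await this.cancelOrderService.cancelOrder(orderId);
        }
    }
}
```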
In DDD, an aggregate root can have a repository. Let us take an Order aggregate and its non-persistent counterpart OrderRepository and persistent counterpart OrderUoW. We also have a ProductVariant aggregate, which tracks the inventory of the products in the order. It can have a ProductVariantRepository and a ProductVariantUoW.
The way the Order and the ProductVariant work is that before the order is persisted, the inventory is checked. If there is inventory, the order will be persisted by calling OrderUoW.Commit(). Yes, the ProductVariantUoW.Commit() will be called next to update the inventory of the products.
Unfortunately, things can go bad: another user bought the same products in that short time (consider this a web app where two users are buying the same products). Now the whole transaction for the second user should fail by reverting the order that was just created. Should I call the OrderUoW to roll back the changes (the order should be deleted from the DB)? Or should I put both UoW.Commit() operations in a transaction scope, so that the failure of one Commit() rolls back the changes? Or should both repositories (Order, ProductVariant) share a single UoW, so there is only one transaction scope?
To make a long story short: how should the transaction be handled when there are multiple repositories involved?
A question we could ask is who is doing the following:
The way the Order and the ProductVariant work is that before the order is persisted, the inventory is checked. If there is inventory, the order will be persisted by calling OrderUoW.Commit(). Yes, the ProductVariantUoW.Commit() will be called next to update the inventory of the products.
Some argue that this kind of work belongs in the service layer, which allows the service layer to put things crossing aggregate objects into a single transaction.
According to http://www.infoq.com/articles/ddd-in-practice:
Some developers prefer managing the transactions in the DAO classes which is a poor design. This results in too fine-grained transaction control which doesn't give the flexibility of managing the use cases where the transactions span multiple domain objects. Service classes should handle transactions; this way even if the transaction spans multiple domain objects, the service class can manage the transaction since in most of the use cases the Service class handles the control flow.
I think, as an alternative to using a single transaction, you can claim the inventory using ProductVariant and, if all the inventory items necessary are available, then you can commit the order. Otherwise (i.e. you can't claim all the products you need for the order) you have to return the inventory that was successfully claimed, using compensating transactions. The result is that in the case of an unsuccessful commit of an order, some of the inventory will temporarily appear unavailable for other orders, but the advantage is that you can work without a distributed transaction.
Nonetheless, this logic still belongs in the service layer, not the DAO classes.
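A rough service-layer sketch of that claim-then-compensate idea. The interfaces are illustrative only, and persistOrder stands in for something like OrderUoW.Commit(): claim stock first, place the order only if every claim succeeded, and compensate (return the stock) otherwise.

```typescript
interface InventoryService {
    claim(productVariantId: string, quantity: number): Promise<boolean>;
    release(productVariantId: string, quantity: number): Promise<void>;   // compensating action
}

interface OrderLine { productVariantId: string; quantity: number; }

async function placeOrder(
    inventory: InventoryService,
    orderLines: OrderLine[],
    persistOrder: () => Promise<void>
): Promise<boolean> {
    const claimed: OrderLine[] = [];
    try {
        for (const line of orderLines) {
            const ok = await inventory.claim(line.productVariantId, line.quantity);
            if (!ok) throw new Error(`insufficient stock for ${line.productVariantId}`);
            claimed.push(line);
        }
        await persistOrder();          // e.g. OrderUoW.Commit()
        return true;
    } catch {
        // Compensate: give back whatever we managed to claim before the failure.
        for (const line of claimed) {
            await inventory.release(line.productVariantId, line.quantity);
        }
        return false;
    }
}
```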
The way you are using unit of work seems a bit fine-grained. Just in case you haven't read Martin Fowler's take: http://martinfowler.com/eaaCatalog/unitOfWork.html
That being said, you want to handle the transaction at the use-case level. The fact that the inventory is checked up front is simply a convenience (UX), and the stock level should be checked when persisting the various bits as well. An exception can be raised for insufficient stock.
The transaction isolation level should be set such that the two 'simultaneous' parts are performed serially. So whichever one gets to update the stock levels first is going to 'win'. The second will then raise the exception.
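One simple way to get the "whichever one gets there first wins, the second raises an exception" behaviour is a guarded UPDATE rather than (or in addition to) raising the isolation level. This sketch assumes node-postgres and a made-up product_variant table with a stock column; the decrement only succeeds while enough stock remains.

```typescript
import { Pool } from 'pg';

const pool = new Pool();

async function decrementStock(productVariantId: string, quantity: number): Promise<void> {
    // The WHERE clause guards the stock level, so only one of two concurrent
    // buyers can succeed for the last items.
    const result = await pool.query(
        `UPDATE product_variant
            SET stock = stock - $2
          WHERE id = $1 AND stock >= $2`,
        [productVariantId, quantity]
    );
    if (result.rowCount === 0) {
        // The 'simultaneous' competitor got here first and took the remaining stock.
        throw new Error('insufficient stock');
    }
}
```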
If you can use a single UoW then do so, because it's easier.
If your repositories are on different DBs (or maybe one is file-based and the others are not) then you may be forced to use multiple UoWs, but then you're writing rollback commands too: if UoW1 saves changes to SqlRepo OK, but UoW2 then fails to save changes to FileRepo, you need to roll back SqlRepo. Don't bother writing all that rollback command stuff if you can avoid it!
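A hedged sketch of that multi-UoW case, with illustrative interfaces only; rollbackCommitted stands in for whatever hand-written compensation your SqlRepo needs.

```typescript
interface UnitOfWork {
    commit(): Promise<void>;
}

interface ReversibleUnitOfWork extends UnitOfWork {
    // Hand-written compensation: undo what commit() persisted.
    rollbackCommitted(): Promise<void>;
}

async function commitAcrossStores(sqlUow: ReversibleUnitOfWork, fileUow: UnitOfWork): Promise<void> {
    await sqlUow.commit();
    try {
        await fileUow.commit();
    } catch (err) {
        // FileRepo failed after SqlRepo succeeded: compensate the SQL changes.
        await sqlUow.rollbackCommitted();
        throw err;
    }
}
```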