How do I read all entities of a kind in a transaction with google cloud datastore nodejs

When I try to run a query to read all entities of a kind in a transaction with Google Datastore, it gives me this error:
{ Error: Only ancestor queries are allowed inside transactions.
at /root/src/node_modules/grpc/src/client.js:554:15
code: 3,
metadata: Metadata { _internal_repr: {} },
So I need to use an ancestor query. How do I create one? It appears to depend on how the hierarchy is structured in Datastore. So my next question is: given that every entity I have created in Datastore has been saved like so (the identifier is unique to the entityData saved):
const entityKey = datastore.key({ namespace: ns, path: [kind, identifier] });
{ key: entityKey, method: 'upsert', data: entityData };
How do I read from the db within a transaction? I think I could do it if I knew the identifiers, but the identifiers are constructed from the entityData that I saved in the kind, and I need to read the entities of the kind to figure out what I have in the db (a chicken-and-egg problem). I am hoping I am missing something.
More context
The domain of my problem involves sponsoring people. I have stored a kind people in Datastore where each entity is a person consisting of a unique identifier, name and grade. I have another kind called relationships where each entity is a relationship containing two of the people's identifiers, the sponsor and the sponsee (linking two people together). So I have structured it like an RDB. If I want to get a person's sponsor, I get all the relationships from the db, loop over them returning the relationships where the person is the sponsee, then query the db for the sponsor of that relationship.
How do I structure it the 'datastore' way, with entity groups/ancestors, given I have to model people and their links/relationships?
Let's assume a RDB is out of the question.
Example scenario
Two people have to be deleted from the app/db (let's say they left the company on the same day). When I delete someone, I also want to remove their relationships. The two people I delete share a relationship (one is sponsoring the other). Assume the first transaction is successful, i.e. I delete one person and their relationship. In the next transaction, I delete the other person, then search the relationships for relevant ones and find one that has already been deleted, because the query is eventually consistent. I try to find the person for that relationship and they don't exist. It blows up.
Note: each transaction wraps delete person & their relationship. Multiple people equals multiple transactions.
Scalability is not a concern for my application

Your understanding is correct:
you can't use an ancestor query since your entities are not in an ancestry relationship (i.e. not in the same entity group).
you can't perform non-ancestor queries inside transactions. Note that you also can't read more than 25 of your entities inside a single transaction (each entity is in a separate entity group). From Restrictions on queries:
Queries inside transactions must be ancestor queries
Cloud Datastore transactions operate on entities belonging to up to 25 entity groups, but queries inside transactions must be ancestor queries. All queries performed within a transaction must specify an ancestor. For more information, refer to Datastore Transactions.
The typical approach in a context similar to yours is to perform queries outside transactions (often just keys-only queries, to obtain the entity keys), then read the corresponding entities (up to 25 at a time) by key lookup inside transactions. And use transactions only when absolutely needed; see, for example, this related discussion: Ancestor relation in datastore.
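A minimal sketch of that approach in Node.js, assuming `datastore` is a `@google-cloud/datastore` client passed in by the caller. The helper names (`listKeys`, `chunk`, `readInTransactions`) are made up for illustration, and the batch size of 25 simply mirrors the limit quoted above:

```javascript
// Sketch: query for keys outside the transaction, then read by key inside it.
// `datastore` is expected to be a @google-cloud/datastore client instance.
const MAX_KEYS_PER_TRANSACTION = 25; // each root entity is its own entity group

async function listKeys(datastore, namespace, kind) {
  // Keys-only query: project the special __key__ property.
  const query = datastore.createQuery(namespace, kind).select('__key__');
  const [entities] = await datastore.runQuery(query);
  return entities.map((entity) => entity[datastore.KEY]);
}

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

async function readInTransactions(datastore, keys) {
  const results = [];
  for (const batch of chunk(keys, MAX_KEYS_PER_TRANSACTION)) {
    const transaction = datastore.transaction();
    await transaction.run();
    try {
      // Lookups by key are allowed inside a transaction.
      const [entities] = await transaction.get(batch);
      results.push(...entities);
      await transaction.commit();
    } catch (err) {
      await transaction.rollback();
      throw err;
    }
  }
  return results;
}
```

Note that each batch is its own transaction, so the overall read is not a single consistent snapshot across all entities.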
Your question suggests you're approaching the datastore with a relational DB mindset. If your app fundamentally needs relational data (you didn't describe what you're trying to do), the datastore might not be the best product for it. See Choosing a storage option. I'm not saying that you can't use the datastore with relational data; it can still be done in many cases, but with a bit more careful design. Those restrictions are driving towards scalable datastore-based apps (IMHO potentially much more scalable than you can achieve with relational DBs).
There is a difference between structuring the data RDB style (which is OK with the datastore) and using it in RDB style (which is not that good).
In the particular usage scenario you mentioned you do not need to query for the sponsor of a relationship: you already have the sponsor's key in the relationship entity, all you need to do is look it up by key, which can be done in a transaction.
Getting all relationship entities for a person needs a query, filtered by the person being the sponsor or the sponsee. But does it really have to be done in a transaction? Or is it acceptable if maybe you miss in the result list a relationship created just seconds ago? Or having one which was recently deleted? It will eventually (dis)appear in the list if you repeat the query a bit later (see Eventual Consistency on Reading an Index). If that's acceptable (IMHO it is, relationships don't change that often, chances of querying exactly right after a change are rather slim) then you don't need to make the query inside a transaction thus you don't need an ancestry relationship between the people and relationship entities. Great for scalability.
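As a rough sketch, the query for a person's relationships could run outside any transaction like this. It assumes the relationship entities store the two identifiers in properties named `sponsor` and `sponsee` (hypothetical names, since the question doesn't show them) and that `datastore` is a `@google-cloud/datastore` client. Two queries are used, one per role, and the results are merged:

```javascript
// Sketch: collect the keys of every relationship that mentions a person.
// Property names `sponsor` and `sponsee` are assumptions for illustration.
async function findRelationshipKeys(datastore, namespace, personId) {
  const keysByPath = new Map();
  for (const role of ['sponsor', 'sponsee']) {
    const query = datastore
      .createQuery(namespace, 'relationships')
      .filter(role, '=', personId)
      .select('__key__'); // keys-only: cheaper than fetching full entities
    const [entities] = await datastore.runQuery(query);
    for (const entity of entities) {
      const key = entity[datastore.KEY];
      // De-duplicate in case the same relationship matches both roles.
      keysByPath.set(JSON.stringify(key.path), key);
    }
  }
  return [...keysByPath.values()];
}
```

The returned keys can then be looked up or deleted by key inside a transaction.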
Another consideration: looping through the list of relationship entities doesn't necessarily have to be done in a transaction either. And if the number of relationships is large, the loop can hit the request deadline. A more scalable approach is to use query cursors and split the work across multiple tasks/requests, each handling a subset of the list. See a Python example of such an approach: How to delete all the entries from google datastore?
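In Node.js the cursor technique might look roughly like the sketch below. `processPage` and its `handle` callback are hypothetical names. The function handles one page per call and returns the cursor for the next call, so each call could run in its own task or request:

```javascript
// Sketch: process one page of keys per call and return the next cursor,
// or null when the query reports that there are no more results.
async function processPage(datastore, namespace, kind, pageSize, cursor, handle) {
  let query = datastore
    .createQuery(namespace, kind)
    .select('__key__')
    .limit(pageSize);
  if (cursor) {
    query = query.start(cursor); // resume where the previous page ended
  }
  const [entities, info] = await datastore.runQuery(query);
  await handle(entities.map((entity) => entity[datastore.KEY]));
  const done = info.moreResults === datastore.NO_MORE_RESULTS;
  return done ? null : info.endCursor;
}
```

A driver would keep calling `processPage` with the returned cursor until it gets `null` back.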
For each person deletion case:
add something like a being_deleted property (in a transaction) to that person to flag the deletion and prevent any use during deletion, such as creating a new relationship while the deletion task is in progress. Add checks for this flag wherever needed in the app's logic (also in transactions).
get the list of all relationship keys for that person and delete them, using the looping technique mentioned above
in the last loop iteration, when there are no relationships left, enqueue another task, generously delayed, to re-check for any recent relationships that might have been missed in the previous loop execution due to eventual consistency. If any show up, re-run the loop; otherwise just delete the person
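The first of these steps could be sketched as follows, assuming `datastore` is a `@google-cloud/datastore` client and `personKey` is the person's key. The function name `flagForDeletion` is made up; the read-modify-write happens by key, so it is allowed inside a transaction:

```javascript
// Sketch: set the being_deleted flag on a person inside a transaction.
// Returns true if the person existed and was flagged, false otherwise.
async function flagForDeletion(datastore, personKey) {
  const transaction = datastore.transaction();
  await transaction.run();
  try {
    const [person] = await transaction.get(personKey);
    if (person) {
      person.being_deleted = true;
      transaction.save({ key: personKey, data: person });
    }
    await transaction.commit();
    return Boolean(person);
  } catch (err) {
    await transaction.rollback();
    throw err;
  }
}
```

Code that creates relationships would do a similar transactional lookup of both people and refuse to proceed if either has the flag set.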
If scalability is not a concern, you can also re-design your data structures to use ancestry between all your entities (placing them in the same entity group), and then you could do what you want. See, for example, What would be the purpose of putting all datastore entities in a single group?. But there are many potential risks to be aware of, for example:
max rate of 1 write/sec across the entire entity group (up to 500 entities each), see Datastore: Multiple writes against an entity group inside a transaction exceeds write limit?
large transactions taking too long and hitting the request deadlines, see Dealing with DeadlineExceededErrors
higher risk of contention, see Contention problems in Google App Engine
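If you do go the single-entity-group route, a sketch of it in Node.js might look like this. The root path `['Root', 'app']` is a hypothetical choice, and `datastore` is assumed to be a `@google-cloud/datastore` client. Every entity key is built under the shared root, and the query inside the transaction names that root as its ancestor:

```javascript
// Sketch: one shared root key makes all entities part of one entity group.
// 'Root' and 'app' are placeholder names for the root entity's kind and id.
function rootKey(datastore, namespace) {
  return datastore.key({ namespace, path: ['Root', 'app'] });
}

// Key for a person or relationship stored underneath the shared root.
function childKey(datastore, namespace, kind, identifier) {
  return datastore.key({ namespace, path: ['Root', 'app', kind, identifier] });
}

// Read every entity of a kind inside a transaction via an ancestor query.
async function readKindInTransaction(datastore, namespace, kind) {
  const transaction = datastore.transaction();
  await transaction.run();
  try {
    const query = transaction
      .createQuery(namespace, kind)
      .hasAncestor(rootKey(datastore, namespace));
    const [entities] = await transaction.runQuery(query);
    await transaction.commit();
    return entities;
  } catch (err) {
    await transaction.rollback();
    throw err;
  }
}
```

Existing entities saved with the `[kind, identifier]` path would have to be re-saved under the new keys, since an entity's key (and therefore its ancestor) cannot be changed in place.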

Related

Can DDD repositories return data from other aggregate roots?

I'm having trouble getting my head around how to use the repository pattern with a more complex object model. Say I have two aggregate roots Student and Class. Each student may be enrolled in any number of classes. Access to this data would therefore be through the respective repositories StudentRepository and ClassRepository.
Now on my front end say I want to create a student details page that shows the information about the student, and a list of classes they are enrolled in. I would first have to get the Student from StudentRepository and then their Classes from ClassRepository. This makes sense.
Where I get lost is when the domain model becomes more realistic/complex. Say students have a major that is associated with a department, and classes are associated with a course, room, and instructors. Rooms are associated with a building. Courses are associated with a department, etc.
I could easily see wanting to show information from all these entities on the student details page. But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
I understand the ClassRepository should only be responsible for updating classes, and not anything in other aggregate roots. But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots? In most cases this would only need to be a partial summary of those related entities (building name, course name, course number, instructor name, instructor email etc..).

But then I would have to make a number of calls to separate repositories per each class the student is enrolled in. So now what could have been a couple queries to the database has increased massively. This doesn't seem right.
Yup.
But does it violate DDD if the values ClassRepository returns contains information from other related aggregate roots?
Nobody cares about "violate DDD". What we care about is: do you still get the benefits of the repository pattern if you start pulling in data from other aggregates?
Probably not - part of the point of "aggregates" is that when writing the business code you don't have to worry too much about how storage is implemented... but if you start mixing locked data and unlocked data, your abstraction starts leaking into the domain code.
However: if you are trying to support reporting, or some other effectively read only function, you don't necessarily need the domain model at all -- it might make sense to just query your data store and present a representation of the answer.
This substitution isn't necessarily "free" -- the accuracy of the information will depend in part on how closely your stored information matches your in memory information (ie, how often are you writing information into your storage).
This is basically the core idea of CQRS: reads and writes are different, so maybe we should separate the two, so that they each can be optimized without interfering with the correctness of the other.

Can DDD repositories return data from other aggregate roots?
Short answer: No. If that happened, that would not be a DDD repository for a DDD aggregate (that said, nobody will go after you if you do it).
Long answer: Your problem is that you are trying to use tools made to safely modify data (aggregates and repositories) to solve a problem reading data for presentation purposes. An aggregate is a consistency boundary. Its goal is to implement a process and encapsulate the data required for that process. The repository's goal is to read and atomically update a single aggregate. It is not meant to implement queries needed for data presentation to users.
Also, note that the model you present is not a model based on aggregates. If you break that model into aggregates you'll have multiple clusters of entities without "lines" between them. For example, a Student aggregate might have a collection of ClassEnrollments and a Class aggregate a collection of Attendees (that's just an example; note that modeling many-to-many relationships with aggregates can be a bit tricky). You'll have one repository for each aggregate, which will fully load the aggregate when executing an operation and transactionally update the full aggregate.
Now to your actual question: how do you implement queries for data presentation that require data from multiple aggregates? Well, you have multiple options:
As you say, do multiple round trips using your existing repositories. Load a student and from the list of ClassEnrollments, load the classes that you need.
Use CQRS "lite". Aggregates and repositories are used only for update operations. For query operations, implement Queries, which don't use repositories but access the DB directly, so you can join tables from multiple aggregates (Student->Enrollments->Attendees->Classes).
Use "full" CQRS. Create read models optimised for your queries based on the data from your aggregates.
My preferred approach is to use CQRS lite and only create a dedicated read model when it's really needed.
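To make the CQRS "lite" option concrete, here is a small sketch in JavaScript. The in-memory arrays stand in for database tables, and all names and sample data (`db`, `studentDetailsQuery`, the rows) are invented for illustration. The point is that the query function reads across several aggregates' data directly and returns a plain read model, without going through any repository:

```javascript
// Sketch: a read-side query that bypasses repositories and joins rows
// belonging to several aggregates. The arrays stand in for DB tables.
const db = {
  students: [{ id: 's1', name: 'Ada' }],
  enrollments: [
    { studentId: 's1', classId: 'c1' },
    { studentId: 's1', classId: 'c2' },
  ],
  classes: [
    { id: 'c1', courseName: 'Algebra', roomId: 'r1' },
    { id: 'c2', courseName: 'Logic', roomId: 'r2' },
  ],
  rooms: [
    { id: 'r1', building: 'North' },
    { id: 'r2', building: 'South' },
  ],
};

// Returns a flat, presentation-oriented object, or null if not found.
function studentDetailsQuery(db, studentId) {
  const student = db.students.find((s) => s.id === studentId);
  if (!student) return null;
  const classes = db.enrollments
    .filter((e) => e.studentId === studentId)
    .map((e) => {
      const cls = db.classes.find((c) => c.id === e.classId);
      const room = db.rooms.find((r) => r.id === cls.roomId);
      return { courseName: cls.courseName, building: room.building };
    });
  return { name: student.name, classes };
}
```

With a real database the body of the function would be a single joined query; updates would still go through the aggregates and their repositories.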

DDD - How to form Aggregates where Entities have to reference non-root Entities

I have some Entities and I am trying to follow Domain Driven Design practices to identify Aggregates. I somehow can't do this, because I either break the rule that Entities are not allowed to reference non-root Entities of other Aggregates, or I can't form Aggregates at all.
I have the following Entities: Organisation, JobOffer, Candidate, and JobApplication.
An Organisation creates JobOffers but may only have a limited amount of active JobOffers.
A Candidate creates JobApplications but may only have a limited amount of active JobApplications.
A JobApplication references a JobOffer that it is meant for.
Since I have to know how many JobOffers an Organisation has before I can create a new one (enforcing limits), I assume Organisation should be a Root-Entity that owns JobOffers. The same applies to Candidates and JobApplications. Now I have two Aggregates: Organisation with JobOffers and Candidate with JobApplications. But... I need to reference JobOffer from JobApplication... and that breaks the rule that I can't reference non-Root-Entities.
I have looked for and found similar questions on this forum but I somehow still cant figure it out, so sorry in advance - I appreciate any help.

In general, you should avoid holding object references to other aggregates and instead reference other aggregates by id. In some cases it can be valid to reference an entity within another aggregate, but again this should be done via id as well.
If you go this way you should reference a composite id. Aggregates are meant to depict logical boundaries and also transactional boundaries. Child entity ids which are modelled as part of the aggregate only need to be unique inside the boundaries of that aggregate. This makes it a lot easier to focus on stuff just inside those boundaries when performing actions in your system. Even if you are using UUIDs (or GUIDs), if you really need to reference a child entity of another aggregate - let's say you have good reasons for that - you should model the id graph via the aggregate root which means always knowing the id of the other aggregate in combination with the id of the entity you are interested in. That means referencing a composite id.
But: whenever I think I need to reference a child entity of another aggregate root, I first investigate this more deeply. It could mean that this child entity is important as a stand-alone entity as well.
Did I fail to discover another aggregate root?
In your case, looking at your domain model diagram, I suspect JobOffer should be an aggregate on its own. Of course I don't know your domain, but I can at least guess that there might be some transactions performed in your system that mutate job offers on their own, without having to consider organization-specific business invariants. If this is the case, you should rethink the domain model and consider making JobOffer an aggregate root on its own. In this case your initial problem gets resolved automatically. Also note that modelling job offers as aggregates can make actions performed on organizations simpler as well, as you do not need to load all the job offers for that organization when loading the organization aggregate. This might of course not be relevant in your case and really depends on the maximum number of job offers for an organization.
So, depending on your business requirements and domain logic invariants, I would recommend one of the following two options:
Reference the foreign child entity only through a composite id that includes the id of the other aggregate plus the child entity id (e.g. by creating a value object that represents this reference as a strong type)
Make JobOffer an aggregate on its own if the mentioned considerations hold true in your case
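For the first option, a composite-id value object could be sketched like this in JavaScript. The class name `JobOfferRef` and its field names are hypothetical; the idea is only that a JobApplication never holds a JobOffer object, just the pair of ids needed to reach it through the Organisation root:

```javascript
// Sketch: an immutable value object that references a child entity of
// another aggregate via (aggregate root id, child entity id).
class JobOfferRef {
  constructor(organisationId, jobOfferId) {
    if (!organisationId || !jobOfferId) {
      throw new Error('Both organisationId and jobOfferId are required');
    }
    this.organisationId = organisationId;
    this.jobOfferId = jobOfferId;
    Object.freeze(this); // value objects do not change after creation
  }

  // Value objects are compared by their contents, not by identity.
  equals(other) {
    return (
      other instanceof JobOfferRef &&
      other.organisationId === this.organisationId &&
      other.jobOfferId === this.jobOfferId
    );
  }
}

class JobApplication {
  constructor(id, jobOfferRef) {
    this.id = id;
    this.jobOfferRef = jobOfferRef; // a reference by id, not an object graph
  }
}

const application = new JobApplication('app-1', new JobOfferRef('org-7', 'offer-3'));
```

If JobOffer later becomes its own aggregate root (the second option), the reference shrinks to a plain JobOffer id.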

Multiple Data Transfer Objects for same domain model

How do you solve a situation when you have multiple representations of same object, depending on a view?
For example, lets say you have a book store. Within a book store, you have 2 main representations of Books:
In Lists (search results, browse by category, author, etc.): This is a compact representation that might have some aggregates, for example NumberOfAuthors and NumberOfReviews. Each Author and Review is an entity itself, saved in the db.
DetailsView: here you wouldn't have aggregates but real values for each Author, as Book has a property AuthorsList.
Case 2 is clear: you get everything from the DB and show it. But how do you solve case 1 if you want to reduce the number of connections and the payload to/from the DB? That is, you don't want to get all the actual Authors and Reviews from the DB, but just two ints with the count of each.
Full normalized solution would be 2, but 1 seems to require either some denormalization or create 2 different entities: BookDetails and BookCompact within Business Layer.
Important: I am not talking about View DTOs, but actually getting data from DB which doesn't fit into Business Layer Book class.

For me it sounds like multiple Query Models (QM).
I used DDD with CQRS/ES style, so aggregate roots are producing events based on commands being passed in. To those events multiple QMs are subscribed. So I create multiple "views" based on requirements.
The ES (event-sourcing) has huge power - I can introduce another QMs later by replaying stored events.
It sounds like managing a lot of similar, or even duplicate, data, but it makes sense to me.
QMs can be, and are, optimized to contain just enough data/structure/indexes for a given purpose. This is the way out of the "shared data model". I see great harm in the one-for-all "RDBMS" approach. You will always get lost in the complexity of managing a shared model, as you are now.
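A tiny sketch of the idea in JavaScript, using the book store from the question. The event names and shapes are invented for illustration. Two query models are built from the same stored events, and a new one could be added later simply by replaying the events through another projection function:

```javascript
// Sketch: stored domain events, replayed into two different query models.
const events = [
  { type: 'BookAdded', bookId: 'b1', title: 'Dune' },
  { type: 'AuthorAssigned', bookId: 'b1', author: 'Frank Herbert' },
  { type: 'ReviewPosted', bookId: 'b1', rating: 5 },
  { type: 'ReviewPosted', bookId: 'b1', rating: 4 },
];

// Compact model for list views: only counts are kept.
function projectListView(events) {
  const books = new Map();
  for (const event of events) {
    if (event.type === 'BookAdded') {
      books.set(event.bookId, { title: event.title, numberOfAuthors: 0, numberOfReviews: 0 });
    } else if (event.type === 'AuthorAssigned') {
      books.get(event.bookId).numberOfAuthors += 1;
    } else if (event.type === 'ReviewPosted') {
      books.get(event.bookId).numberOfReviews += 1;
    }
  }
  return books;
}

// Detailed model for the details view: the actual author names are kept.
function projectDetailsView(events) {
  const books = new Map();
  for (const event of events) {
    if (event.type === 'BookAdded') {
      books.set(event.bookId, { title: event.title, authors: [] });
    } else if (event.type === 'AuthorAssigned') {
      books.get(event.bookId).authors.push(event.author);
    }
  }
  return books;
}
```

In a real system each projection would persist its model in its own table or collection and update it as new events arrive.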

I had a very good result with the following design:
domain package, which contains the @Entity classes holding all the data that is stored in the database
dto package which contains view/views of entity which will be returned from service
Dto should have constructor which takes entity as parameter. To copy data easier you can use BeanUtils.copyProperties(domainClass, dtoClass);
By doing this you are sharing only minimal amount of information and it is returned in object which does not have any functionality.
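The same shape can be sketched in JavaScript (the answer above is written with Java in mind; the class names here are hypothetical). The DTO takes the entity in its constructor, copies only what the list view needs, and carries no behaviour:

```javascript
// Sketch: an entity with full data and a compact DTO built from it.
class Book {
  constructor(id, title, authors, reviews) {
    this.id = id;
    this.title = title;
    this.authors = authors; // full Author data
    this.reviews = reviews; // full Review data
  }
}

class BookCompactDto {
  constructor(book) {
    this.id = book.id;
    this.title = book.title;
    this.numberOfAuthors = book.authors.length;
    this.numberOfReviews = book.reviews.length;
    Object.freeze(this); // plain data, no functionality
  }
}

const dune = new Book('b1', 'Dune', ['Frank Herbert'], [{ rating: 5 }, { rating: 4 }]);
const compact = new BookCompactDto(dune);
```

Note that this still loads the full entity first, so it reduces what the service returns rather than what is read from the database.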

Should the implementation of repositories be isolated like their corresponding aggregates?

The benefit of having repositories when using DDD is that they allow one to design a domain model without worrying about how objects will be persisted. They also allow the final product to be more flexible, as different implementations of repositories can be swapped in and out easily. So it's possible for the implementation of repositories to be based on SQL databases, REST web services, XML files, or any other method of storing and retrieving data. From the model's perspective, the expectation is that there are just these magic collections that can be used to store and retrieve aggregate root objects.
Now if I have two normal in-memory collections, say an IList<Order> and an IList<Customer>, I would never expect that modifying one collection would affect the other. So should the same logic apply to repositories? Should the actual implementation of repositories be totally isolated from one another, even if they in reality access the same database?
For example a cascade-on-delete relationship may be setup in a SQL database between a Customers table and an Orders table so that corresponding orders are deleted when a customer is deleted. Yet this functionality would break if later the SQLCustomerRepository is replaced by a RESTCustomerRepository.
So am I correct in thinking that the model should always be under the assumption that repositories are totally isolated from one another, and correspondingly the actual implementation of repositories should be isolated as well?
So if Orders should be deleted when a Customer is deleted, should this be defined explicitly in the domain model rather than relying on the database? Say, through a CustomerService.DeleteCustomer() method which accesses the current ICustomerRepository and IOrderRepository.
I think I am just having a hard time getting my head out of the relational world and into the DDD world. I keep wanting to think of things in terms of tables and PK/FK relationships, where I should just ignore that a database is involved at all.

I believe the point you are missing is that aggregate roots draw context boundaries.
In simple words: the stuff underneath makes sense only together with the aggregate root itself.
As I see it - Order is not an aggregate root but an entity which lives in Customer aggregate root context. That means - there is no need for Order repository because repositories are supposed to be per aggregate root. So there should be only CustomerRepository which is supposed to know how to persist Customer.Orders too.
I myself don't worry that much and omit repository pattern altogether and just rely on NHibernate ORM. Rich domain model that correctly tracks and monitors state changes is much more important than way how you actually send update/select sql statements.
Also - think twice before deleting stuff.

Never delete a customer. A customer is not deleted; it is made inactive or something similar. Also, please don't cascade delete orders; it will get you into strange places. Orders should always be preserved once they are processed. Think of the reports for your application: 1.1 million in revenue just went away because you decided to cascade delete.

You have a repository per aggregate root, not per entity, so cascading deletion of an aggregate root's children is fine within that aggregate root's repository, as it is still isolated.
Don't cascade deletions or have any side effects on other aggregate roots; coordinate this logic in the application layer.

Your domain model should model the transactional operations of your domain. By putting Orders on Customer, in your Customer entity, you are saying that when a Customer is deleted, so should his Orders.
If you have OrderIds on your Customer, that's different. Then you have an association between Customer and Orders. In this case, you are saying that by adding to or removing from the list of OrderIds on Customer, you are adding or removing associations, not adding or deleting Orders.
Should the actual implementation of repositories be totally isolated from one another, even if they in reality access the same database?
Yes, for the most part. If you decide to make both Order and Customer Aggregate Roots, you are saying they are independent of one another, and should be allowed to change independently and simultaneously. That is, you don't need the changes to be transactional between the two. If you only make Customer an Aggregate Root, and have it hold a list of Orders, you are saying that the Customer entity dictates what happens to the Orders, and changing a Customer will cascade changes to its Orders.
Now in your example, it seems you'd have Customers as aggregate roots. And Orders as aggregate roots. Each with their own repo. Customers would have a list of OrderIds to model the one to many association. If you deleted a Customer, you could publish a customer deleted event, and have everything related to this customer clean itself up.
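A minimal sketch of that event-based cleanup, using Node's built-in `EventEmitter` as a stand-in for whatever event mechanism you use. The maps stand in for the two repositories, the event name `customerDeleted` is made up, and archiving the orders is just one possible way for them to "clean themselves up":

```javascript
const { EventEmitter } = require('node:events');

// Sketch: in-memory stand-ins for the two aggregate stores.
const domainEvents = new EventEmitter();
const customers = new Map([['c1', { id: 'c1', orderIds: ['o1', 'o2'] }]]);
const orders = new Map([
  ['o1', { id: 'o1', customerId: 'c1', archived: false }],
  ['o2', { id: 'o2', customerId: 'c1', archived: false }],
  ['o3', { id: 'o3', customerId: 'c2', archived: false }],
]);

// The order side subscribes and decides for itself what cleanup means.
domainEvents.on('customerDeleted', ({ customerId }) => {
  for (const order of orders.values()) {
    if (order.customerId === customerId) {
      order.archived = true;
    }
  }
});

// The customer side only deletes the customer and announces the fact.
function deleteCustomer(customerId) {
  if (!customers.delete(customerId)) {
    return false; // nothing to delete, so no event is published
  }
  domainEvents.emit('customerDeleted', { customerId });
  return true;
}
```

`EventEmitter` calls listeners synchronously; with a real message bus the cleanup would happen some time after the deletion, which is exactly the independence between the two aggregate roots described above.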

How should I enforce relationships and constraints between aggregate roots?

I have a couple questions regarding the relationship between references between two aggregate roots in a DDD model. Refer to the typical Customer/Order model diagrammed below.
First, should references between the actual object implementation of aggregates always be done through ID values and not object references? For example, if I want details on the customer of an Order, I would need to take the CustomerId and pass it to an ICustomerRepository to get a Customer rather than setting up the Order object to return a Customer directly, correct? I'm confused because returning a Customer directly seems like it would make writing code against the model easier, and is not much harder to set up if I am using an ORM like NHibernate. Yet I'm fairly certain this would be violating the boundaries between aggregate roots/repositories.
Second, where and how should a cascade on delete relationship be enforced for two aggregate roots? For example, say I want all the associated orders to be deleted when a customer is deleted. The ICustomerRepository.DeleteCustomer() method should not be referencing the IOrderRepository, should it? That seems like it would be breaking the boundaries between the aggregates/repositories. Should I instead have a CustomerManagment service which handles deleting Customers and their associated Orders, and which would reference both an IOrderRepository and an ICustomerRepository? In that case, how can I be sure that people know to use the Service and not the repository to delete Customers? Is that just down to educating them on how to use the model correctly?

First, should references between aggregates always be done through ID values and not actual object references?
Not really - though some would make that change for performance reasons.
For example, if I want details on the customer of an Order, I would need to take the CustomerId and pass it to an ICustomerRepository to get a Customer rather than setting up the Order object to return a Customer directly, correct?
Generally, you'd model 1 side of the relationship (eg., Customer.Orders or Order.Customer) for traversal. The other can be fetched from the appropriate Repository (eg., CustomerRepository.GetCustomerFor(Order) or OrderRepository.GetOrdersFor(Customer)).
Wouldn't that mean that the OrderRepository would have to know something about how to create a Customer? Wouldn't that be beyond what OrderRepository should be responsible for...
The OrderRepository would know how to use an ICustomerRepository.FindById(int). You can inject the ICustomerRepository. Some may be uncomfortable with that, and choose to put it into a service layer - but I think that's overkill. There's no particular reason repositories can't know about and use each other.
I'm confused because returning a Customer directly seems like it would make writing code against the model easier, and is not much harder to setup if I am using an ORM like NHibernate. Yet I'm fairly certain this would be violating the boundaries between aggregate roots/repositories.
Aggregate roots are allowed to hold references to other aggregate roots. In fact, anything is allowed to hold a reference to an aggregate root. An aggregate root cannot hold a reference to a non-aggregate root entity that doesn't belong to it, though.
Eg., Customer cannot hold a reference to OrderLines - since OrderLines properly belongs as an entity on the Order aggregate root.
Second, where and how should a cascade on delete relationship be enforced for two aggregate roots?
If (and I stress if, because it's a peculiar requirement) that's actually a use case, it's an indication that Customer should be your sole aggregate root. In most real-world systems, however, we wouldn't actually delete a Customer that has associated Orders - we may deactivate them, move their Orders to a merged Customer, etc. - but not out and out delete the Orders.
That being said, while I don't think it's pure-DDD, most folks will allow some leniency in following a unit of work pattern where you delete the Orders and then the Customer (which would fail if Orders still existed). You could even have the CustomerRepository do the work, if you like (though I'd prefer to make it more explicit myself). It's also acceptable to allow the orphaned Orders to be cleaned up later (or not). The use case makes all the difference here.
Should I instead have a CustomerManagment service which handles deleting Customers and their associated Orders which would references both a IOrderRepository and ICustomerRepository? In that case how can I be sure that people know to use the Service and not the repository to delete Customers. Is that just down to educating them on how to use the model correctly?
I probably wouldn't go a service route for something so intimately tied to the repository. As for how to make sure a service is used...you just don't put a public Delete on the CustomerRepository. Or, you throw an error if deleting a Customer would leave orphaned Orders.
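The "throw an error" variant could be sketched as below in JavaScript. The class and method names (`InMemoryOrderRepository`, `countFor`) are hypothetical. The customer repository is handed an order repository, in line with the point above that repositories may know about each other, and it refuses to delete a customer who still has orders:

```javascript
// Sketch: a customer repository that guards against orphaning orders.
class InMemoryOrderRepository {
  constructor(orders) {
    this.orders = orders; // array of { id, customerId }
  }

  countFor(customerId) {
    return this.orders.filter((o) => o.customerId === customerId).length;
  }
}

class CustomerRepository {
  constructor(customers, orderRepository) {
    this.customers = customers; // Map of id -> customer
    this.orderRepository = orderRepository;
  }

  // Deletes the customer, or throws if orders would be left orphaned.
  delete(customerId) {
    if (this.orderRepository.countFor(customerId) > 0) {
      throw new Error(`Customer ${customerId} still has orders`);
    }
    return this.customers.delete(customerId);
  }
}

const orderRepo = new InMemoryOrderRepository([{ id: 'o1', customerId: 'c1' }]);
const customerRepo = new CustomerRepository(
  new Map([['c1', { id: 'c1' }], ['c2', { id: 'c2' }]]),
  orderRepo
);
```

Whoever wants to delete a customer with orders is then forced to deal with the orders explicitly first.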

Another option would be to have a ValueObject describing the association between the Order and the Customer ARs, VO which will contain the CustomerId and additional information you might need - name,address etc (something like ClientInfo or CustomerData).
This has several advantages:
Your ARs are decoupled - and now can be partitioned, stored as event streams etc.
In the Order ARs you usually need to keep the information you had about the customer at the time of the order creation and not reflect on it any future changes made to the customer.
In almost all the cases the information in the value object will be enough to perform the read operations ( display customer info with the order ).
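As a sketch of such a value object in JavaScript (the name `CustomerInfo` comes from the suggestion above; the fields and sample data are assumptions), the Order copies the customer's data at creation time, so later changes to the Customer do not affect it:

```javascript
// Sketch: an immutable snapshot of customer data stored on the Order.
class CustomerInfo {
  constructor(customerId, name, address) {
    this.customerId = customerId;
    this.name = name;
    this.address = address;
    Object.freeze(this);
  }
}

class Order {
  constructor(id, customer) {
    this.id = id;
    // Copy the values as they are when the order is created.
    this.customerInfo = new CustomerInfo(customer.id, customer.name, customer.address);
  }
}

const customer = { id: 'c1', name: 'Ann Lee', address: '1 Main St' };
const order = new Order('o1', customer);

// A later change to the customer leaves the order's snapshot untouched.
customer.address = '9 Oak Ave';
```

The `customerId` inside the snapshot is still available whenever the current Customer needs to be loaded from its repository.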
To handle the Deletion/deactivation of a Customer you have the freedom to chose any behavior you like. You can use DomainEvents and publish a CustomerDeleted event for which you can have a handler that moves the Orders to an archive, or deletes them or whatever you need. You can also perform more than one operation on that event.
If for whatever reason DomainEvents are not your choice you can have the Delete operation implemented as a service operation and not as a repository operation and use a UOW to perform the operations on both ARs.
I have seen a lot of problems like this when trying to do DDD, and I think the source of the problems is that developers/modelers have a tendency to think in DB terms. You ( we :) ) have a natural tendency to remove redundancy and normalize the domain model. Once you get over it, allow your model to evolve, and involve the domain expert(s) in its evolution, you will see that it's not that complicated and it's quite natural.
UPDATE: and a similar VO - OrderInfo can be placed inside the Customer AR if needed, with only the needed information - order total, order items count etc.
