Difference between relations and collections in hybris? - sap-commerce-cloud

I am new to hybris. What is the difference between relations and collections, and why should we choose relations over collections?

Basically, there are two technically different ways of modeling collections in hybris:
CollectionTypes
Think of a CollectionType in hybris as a backpack mounted onto a type
At runtime, CollectionTypes are resolved into a Collection of a kind of item, such as a List of MediaModels
Can overflow, resulting in truncation and therefore loss of data
More difficult to search, with lower performance
On the database level, a CollectionType is a comma-separated list of PKs stored in a single column, so there is a maximum number of values it can hold
RelationTypes
Create links between all kinds of types
Create type-safe n-to-m relations: only elements of the source / target type declared at the relation can be linked
Values for relations are stored in a separate database table
Each value is stored in a separate table row
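The storage difference described above can be sketched in plain Java. This is an illustration only, not hybris platform code: MAX_COLUMN_LENGTH is a made-up column width (the real limit depends on the database and column type), and storeAsCsv / storeAsRows are hypothetical helpers that mimic the two storage styles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Illustration only: plain Java, not hybris platform code.
class CollectionVsRelationStorage {

    // Hypothetical column width; the real limit depends on the database and column type.
    static final int MAX_COLUMN_LENGTH = 20;

    // CollectionType-style storage: all PKs joined into one string, cut off at the column width.
    static String storeAsCsv(List<Long> pks) {
        String csv = pks.stream().map(String::valueOf).collect(Collectors.joining(","));
        return csv.length() <= MAX_COLUMN_LENGTH ? csv : csv.substring(0, MAX_COLUMN_LENGTH);
    }

    // RelationType-style storage: one row per (source, target) pair, so nothing is cut off.
    static List<long[]> storeAsRows(long sourcePk, List<Long> targetPks) {
        List<long[]> rows = new ArrayList<>();
        for (long target : targetPks) {
            rows.add(new long[] {sourcePk, target});
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Long> mediaPks = List.of(8796093054980L, 8796093087748L, 8796093120516L);
        // The CSV form loses the tail of the list once it exceeds the column width.
        System.out.println(storeAsCsv(mediaPks));
        // The row form keeps one row per linked item.
        System.out.println(storeAsRows(1L, mediaPks).size());
    }
}
```

With three 13-digit PKs and a 20-character column, the CSV form keeps the first PK and only a fragment of the second, while the row form keeps all three links.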

I totally agree with @KilleKat's comment; he has covered all the differences between CollectionType and RelationType in hybris.
I have attached some diagrams below to give a clearer view of the subject.
CollectionTypes: (to be used wisely)
RelationTypes: (recommended)

As Sumit says above,
CollectionType is discouraged and RelationType should be used whenever possible. This is because the maximum length of the database field behind a CollectionType is limited, so a CollectionType with many values may end up with its values truncated. In addition, the values of a CollectionType are written in CSV format rather than in a normalized way. Consequently, hybris recommends using RelationTypes whenever possible.
CollectionType: CollectionTypes are based on the Java Collection class i.e. a Collection is a list of elements.
1:n - Keep links to the respective values via an attribute on the source item, for example, a list of Primary Keys.
n:1 - Store the attribute values at the respective target items and have a getter method at the source type to retrieve the values.
RelationType:
n:m - Internally, the elements on both sides of the relation are linked together via instances of a helper type called LinkItem.
LinkItems hold two attributes, SourceItem and TargetItem, that hold references to the respective items.
For each entry within a relation (in other words, for each link from one item to another), there is a LinkItem instance that stores the PKs of the related items. LinkItem instances are handled transparently and automatically by the platform: On the API level, you only need to use the respective getter and setter methods.
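A rough picture of the LinkItem mechanism, as a minimal Java sketch. The Link class and the targetsOf / sourcesOf methods are hypothetical stand-ins written for illustration; they are not the platform's LinkItem API, which handles all of this behind the generated getters and setters.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the link mechanism: one Link instance per relation entry.
class LinkItemSketch {

    // Hypothetical counterpart of a LinkItem: it only stores the two PKs.
    static final class Link {
        final long sourcePk;
        final long targetPk;

        Link(long sourcePk, long targetPk) {
            this.sourcePk = sourcePk;
            this.targetPk = targetPk;
        }
    }

    private final List<Link> links = new ArrayList<>();

    // Adds one entry to the relation, i.e. one link from a source item to a target item.
    void link(long sourcePk, long targetPk) {
        links.add(new Link(sourcePk, targetPk));
    }

    // Roughly what a getter on the source side has to resolve: all targets linked to one source.
    List<Long> targetsOf(long sourcePk) {
        List<Long> result = new ArrayList<>();
        for (Link link : links) {
            if (link.sourcePk == sourcePk) {
                result.add(link.targetPk);
            }
        }
        return result;
    }

    // The same lookup from the other side of the n:m relation.
    List<Long> sourcesOf(long targetPk) {
        List<Long> result = new ArrayList<>();
        for (Link link : links) {
            if (link.targetPk == targetPk) {
                result.add(link.sourcePk);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LinkItemSketch relation = new LinkItemSketch();
        relation.link(1L, 10L);
        relation.link(1L, 11L);
        relation.link(2L, 10L);
        System.out.println(relation.targetsOf(1L));
        System.out.println(relation.sourcesOf(10L));
    }
}
```

Because every link is its own entry, both directions of the relation can be read without parsing a packed list of PKs.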

It's important to understand that hybris strongly discourages using collections; use relations instead.
As stated above, collections are stored as a comma-separated list, which is why you might see data truncation, whereas relations use a relational data structure: a separate link table that joins the two item tables.
Because of their storage structure, collections can't be searched.
I would say that for a very simple (1:n) relationship with limited data you can still use collections, while for any complex (m:n or 1:n) relationship you should always use relations.

Collections have a limited size: if we try to insert more data than fits, it will be truncated.
With relations we can store any number of entries.
Collections are faster than relations, but collections support only one-to-many relationships; for many-to-many we must use relations.

Adding to what Raghav said: a collection is stored internally as a CSV of PKs in a single column, hence the size limit imposed by the field length restriction of the database.
A relation however can be stored in a separate table and hence unlimited mappings can be done.

Collection
The Root Interface in the Collection Hierarchy.
Collection represents a group of objects, known as its elements.
Some collections allow Duplicate elements and others do Not.
Some are Ordered and others Un-Ordered
To get a really good idea of what each collection is good for and their performance characteristics I would recommend getting a good idea about Data Structures like Arrays, Linked Lists, Binary Search Trees, Hashtables, as well as Stacks and Queues. There is really no substitute to learning this if you want to be an Effective Programmer in any Language.
HashMap is only really used for cases when there is some logical reason to have special keys corresponding to values

Collections are persisted as a serialized object in a single column in the DB.
Relations are persisted in the usual relational database way - using a foreign key on another table or a link table (depending on the cardinality of the relation)
Collection types are discouraged because they cannot be searched using FlexibleSearch and have significant performance limitations when dealing with collections of more than a handful of objects.

The exact difference between Collection and Relations in hybris is:
"How the data is stored in both of them"
In collections, a new column is created in the item's table, containing the comma-separated primary keys of the list elements. The actual list elements are stored in another table.
In relations, a new table is created as a link table between two item types.
You can read the complete difference here.

A one-to-many relationship can be modelled with either a collection or a relation.
Why a collection is preferred over a relation in some cases in hybris
Collection - an alternative to a one-to-many relation
Example: User and Address
Here addresses is an attribute of type AddressCollection on User.
A User needs its Address objects, but an Address does not necessarily need a UserModel reference (a user can have many addresses).
This is why a collection is preferred over a relation here.
<collectiontype code="AddressCollection" elementtype="Address" autocreate="true" generate="false"/>
<itemtype code="User"
          extends="Principal"
          jaloclass="de.hybris.platform.jalo.user.User"
          autocreate="true"
          generate="true">
    <deployment table="Users" typecode="4" propertytable="UserProps"/>
    <attributes>
        <attribute autocreate="true" qualifier="addresses" type="AddressCollection">
            <modifiers read="true" write="true" search="false" optional="true" partof="true"/>
            <persistence type="jalo"/>
        </attribute>
    </attributes>
</itemtype>
Relation - One to many
Example: User and Order
Here one User can place as many orders as they want.
The User needs references to its OrderModels, and each OrderModel needs a reference to its UserModel.
A bidirectional link will be created.
<relation code="User2Orders" generate="true" localized="false" autocreate="true">
    <sourceElement type="User" cardinality="one" qualifier="user">
        <modifiers read="true" write="true" search="true" optional="false"/>
    </sourceElement>
    <targetElement type="Order" cardinality="many" qualifier="orders">
        <modifiers read="true" write="true" search="true" optional="true" partof="true"/>
    </targetElement>
</relation>

Related

How do I read all entities of a kind in a transaction with google cloud datastore nodejs

When I try to run a query to read all entities of a kind in a transaction with Google Datastore, it gives me this error
{ Error: Only ancestor queries are allowed inside transactions.
at /root/src/node_modules/grpc/src/client.js:554:15
code: 3,
metadata: Metadata { _internal_repr: {} },
So I need to use an ancestor query. How do I create an ancestor query? It appears to depend on how you structured the hierarchy in datastore. So my next question is, given every entity I have created in datastore has been saved like so (the identifier is unique to the entityData saved)
const entityKey = datastore.key({ namespace: ns, path: [kind, identifier] });
const entity = { key: entityKey, method: 'upsert', data: entityData };
How do I read from the db within a transaction? I think I could do it if I knew the identifiers, but the identifiers are constructed from the entityData that I saved in the kind and I need to read the entities of the kind to figure out what I have in the db (chicken egg problem). I am hoping I am missing something.
More context
The domain of my problem involves sponsoring people. I have stored a kind people in datastore where each entity is a person consisting of a unique identifier, name and grade. I have another kind called relationships where each entity is a relationship containing two of the people's identifiers, the sponsor and the sponsee (linking two people together). So I have structured it like an RDB. If I want to get a person's sponsor, I get all the relationships from the db, loop over them returning the relationships where the person is the sponsee, then query the db for the sponsor of that relationship.
How do I structure it the 'datastore' way, with entity groups/ancestors, given I have to model people and their links/relationships.
Let's assume a RDB is out of the question.
Example scenario
Two people have to be deleted from the app/db (let's say they left the company on the same day). When I delete someone, I also want to remove their relationships. The two people I delete share a relationship (one is sponsoring the other). Assume the first transaction is successful, i.e. I delete one person and their relationship. In the next transaction, I delete the other person, then search the relationships for relevant ones and find one that has already been deleted, because the query is eventually consistent. I try to find the person for that relationship and they don't exist. It blows up.
Note: each transaction wraps delete person & their relationship. Multiple people equals multiple transactions.
Scalability is not a concern for my application
Your understanding is correct:
you can't use an ancestor query since your entities are not in an ancestry relationship (i.e. not in the same entity group).
you can't perform non-ancestor queries inside transactions. Note that you also can't read more than 25 of your entities inside a single transaction (each entity is in a separate entity group). From Restrictions on queries:
Queries inside transactions must be ancestor queries
Cloud Datastore transactions operate on entities belonging to up to 25 entity groups, but queries inside transactions must be ancestor queries. All queries performed within a transaction must specify an ancestor. For more information, refer to Datastore Transactions.
The typical approach in a context similar to yours is to perform queries outside transactions, often just keys-only queries, to obtain the entity keys, then read the corresponding entities (up to 25 at a time) by key lookup inside transactions. Use transactions only when absolutely needed; see, for example, this related discussion: Ancestor relation in datastore.
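The batching part of this approach can be sketched as follows. This is a minimal, self-contained Java illustration rather than Datastore client code: the keys are plain strings standing in for the entity keys a keys-only query would return, and partition is a hypothetical helper, not part of any Google library. The same logic carries over directly to Node.js.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split keys gathered outside a transaction into transaction-sized batches.
class TransactionBatches {

    // Limit quoted in the answer: one transaction may touch at most 25 entity groups.
    static final int MAX_ENTITY_GROUPS_PER_TRANSACTION = 25;

    // Splits the keys into consecutive batches of at most batchSize elements each.
    static <K> List<List<K>> partition(List<K> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-ins for keys returned by a keys-only query run outside any transaction.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            keys.add("people/" + i);
        }
        for (List<String> batch : partition(keys, MAX_ENTITY_GROUPS_PER_TRANSACTION)) {
            // Each batch would be looked up by key inside its own transaction.
            System.out.println("transaction over " + batch.size() + " keys");
        }
    }
}
```

Each batch stays within the entity-group limit, so the key lookups for one batch fit into a single transaction.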
Your question suggests you're approaching the datastore with a relational DB mindset. If your app fundamentally needs relational data (you didn't describe what you're trying to do), the datastore might not be the best product for it. See Choosing a storage option. I'm not saying that you can't use the datastore with relational data; it can still be done in many cases, but with a bit more careful design. Those restrictions are driving towards scalable datastore-based apps (IMHO potentially much more scalable than you can achieve with relational DBs).
There is a difference between structuring the data RDB style (which is OK with the datastore) and using it in RDB style (which is not that good).
In the particular usage scenario you mentioned you do not need to query for the sponsor of a relationship: you already have the sponsor's key in the relationship entity, all you need to do is look it up by key, which can be done in a transaction.
Getting all relationship entities for a person needs a query, filtered by the person being the sponsor or the sponsee. But does it really have to be done in a transaction? Or is it acceptable if maybe you miss in the result list a relationship created just seconds ago? Or having one which was recently deleted? It will eventually (dis)appear in the list if you repeat the query a bit later (see Eventual Consistency on Reading an Index). If that's acceptable (IMHO it is, relationships don't change that often, chances of querying exactly right after a change are rather slim) then you don't need to make the query inside a transaction thus you don't need an ancestry relationship between the people and relationship entities. Great for scalability.
Another consideration: looping through the list of relationship entities also doesn't necessarily have to be done in a transaction. And, if the number of relationships is large, the loop can hit the request deadline. A more scalable approach is to use query cursors and split the work across multiple tasks/requests, each handling a subset of the list. See a Python example of such an approach: How to delete all the entries from google datastore?
For each person deletion case:
add something like a being_deleted property (in a transaction) to that person to flag the deletion and prevent any use during deletion, like creating new relationship while the deletion task is progressing. Add checks for this flag wherever needed in the app's logic (also in transactions).
get the list of all relationship keys for that person and delete them, using the looping technique mentioned above
in the last loop iteration, when there are no relationships left, enqueue another task, generously delayed, to re-check for any recent relationships that might have been missed in the previous loop execution due to the eventual consistency. If any shows up re-run the loop, otherwise just delete the person
If scalability is not a concern, you can also re-design your data structures to use ancestry between all your entities (placing them in the same entity group), and then you could do what you want. See, for example, What would be the purpose of putting all datastore entities in a single group?. But there are many potential risks to be aware of, for example:
max rate of 1 write/sec across the entire entity group (up to 500 entities each), see Datastore: Multiple writes against an entity group inside a transaction exceeds write limit?
large transactions taking too long and hitting the request deadlines, see Dealing with DeadlineExceededErrors
higher risk of contention, see Contention problems in Google App Engine

Multiple Data Transfer Objects for same domain model

How do you solve a situation when you have multiple representations of same object, depending on a view?
For example, let's say you have a book store. Within a book store, you have 2 main representations of Books:
In Lists (search results, browse by category, author, etc.): This is a compact representation that might have some aggregates, for example NumberOfAuthors and NumberOfReviews. Each Author and Review is an entity itself, saved in the db.
DetailsView: here you wouldn't have aggregates but real values for each Author, as Book has a property AuthorsList.
Case 2 is clear: you get everything from the DB and show it. But how do you solve case 1 if you want to reduce the number of connections and the payload to/from the DB, that is, if you don't want to get all the actual Authors and Reviews from the DB but just 2 ints with the count of each?
Full normalized solution would be 2, but 1 seems to require either some denormalization or create 2 different entities: BookDetails and BookCompact within Business Layer.
Important: I am not talking about View DTOs, but actually getting data from DB which doesn't fit into Business Layer Book class.
To me this sounds like multiple Query Models (QMs).
I used DDD in a CQRS/ES style, so aggregate roots produce events based on the commands passed in. Multiple QMs are subscribed to those events, so I create multiple "views" based on requirements.
ES (event sourcing) has huge power: I can introduce another QM later by replaying the stored events.
It sounds like managing a lot of similar, or even duplicate, data, but it makes sense to me.
QMs can be, and are, optimized to contain just enough data/structure/indexes for a given purpose. This is the way out of the "shared data model". I see huge evil in the "one RDBMS for all" approach. You will always get lost in the complexity of managing a shared model, as you are now.
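A query model for the compact list view could look roughly like this. It is a minimal Java sketch under assumptions: the event types AuthorAdded and ReviewAdded, the BookListRow shape and the replay method are all hypothetical names chosen for illustration, and a real setup would receive events from an event store or message bus rather than from a List.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of one query model (QM): it keeps only the counts the list view needs.
class BookListQueryModelSketch {

    // Hypothetical domain events produced by the aggregate roots.
    static final class AuthorAdded {
        final String bookId;
        AuthorAdded(String bookId) { this.bookId = bookId; }
    }

    static final class ReviewAdded {
        final String bookId;
        ReviewAdded(String bookId) { this.bookId = bookId; }
    }

    // Compact row for list views: two counters instead of full Author/Review entities.
    static final class BookListRow {
        int numberOfAuthors;
        int numberOfReviews;
    }

    private final Map<String, BookListRow> rows = new HashMap<>();

    // Updates the view for one event; event types this view does not care about are ignored.
    void apply(Object event) {
        if (event instanceof AuthorAdded) {
            rowFor(((AuthorAdded) event).bookId).numberOfAuthors++;
        } else if (event instanceof ReviewAdded) {
            rowFor(((ReviewAdded) event).bookId).numberOfReviews++;
        }
    }

    // Builds a fresh query model by replaying stored events in order.
    static BookListQueryModelSketch replay(List<Object> storedEvents) {
        BookListQueryModelSketch model = new BookListQueryModelSketch();
        for (Object event : storedEvents) {
            model.apply(event);
        }
        return model;
    }

    // Returns the row for a book, creating an empty one on first access.
    BookListRow rowFor(String bookId) {
        return rows.computeIfAbsent(bookId, id -> new BookListRow());
    }

    public static void main(String[] args) {
        List<Object> events = List.of(new AuthorAdded("b1"), new AuthorAdded("b1"), new ReviewAdded("b1"));
        BookListRow row = replay(events).rowFor("b1");
        System.out.println(row.numberOfAuthors + " authors, " + row.numberOfReviews + " reviews");
    }
}
```

A details view would be a second, separate query model subscribed to the same events, which is why a new view can be added later by replaying the stored events.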
I had a very good result with the following design:
domain package contains @Entity classes which hold all the necessary data stored in the database
dto package contains the view or views of an entity that will be returned from the service
A Dto should have a constructor which takes the entity as a parameter. To copy data more easily you can use BeanUtils.copyProperties(domainClass, dtoClass); (this source-then-target argument order matches Spring's BeanUtils; Apache Commons BeanUtils takes the destination first).
By doing this you share only a minimal amount of information, and it is returned in an object which does not have any functionality.
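A minimal sketch of this layout in plain Java, without any persistence annotations or BeanUtils so that it stays self-contained. The Book and BookSummaryDto classes and their fields are hypothetical, modelled on the book store example from the question.

```java
import java.util.List;

// Sketch: a DTO whose constructor takes the entity and copies only what the view needs.
class BookDtoSketch {

    // Hypothetical domain entity; in the real design this would be a persistent @Entity class.
    static final class Book {
        final String title;
        final List<String> authors;
        final List<String> reviews;

        Book(String title, List<String> authors, List<String> reviews) {
            this.title = title;
            this.authors = authors;
            this.reviews = reviews;
        }
    }

    // Compact view for lists: plain data, no behaviour.
    static final class BookSummaryDto {
        final String title;
        final int numberOfAuthors;
        final int numberOfReviews;

        // The constructor takes the entity and keeps only the aggregates.
        BookSummaryDto(Book book) {
            this.title = book.title;
            this.numberOfAuthors = book.authors.size();
            this.numberOfReviews = book.reviews.size();
        }
    }

    public static void main(String[] args) {
        Book book = new Book("Some Title", List.of("A. Author"), List.of("review 1", "review 2"));
        BookSummaryDto dto = new BookSummaryDto(book);
        System.out.println(dto.title + ": " + dto.numberOfAuthors + " author(s), " + dto.numberOfReviews + " review(s)");
    }
}
```

Note that this shapes what the service returns, but the entity is still fully loaded first; on its own it does not reduce the data fetched from the database, which is what the question asks about.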

How to model sort order for many-to-one across two aggregate roots

Take the domain proposed in Effective Aggregate Design of a Product which has multiple Releases. In this article, Vaughn arrives at the conclusion that both the Product and Release should each be their own aggregate roots.
Now suppose that we add a feature
As a release manager I would like to be able to sort releases so that I can create timelines for rolling out larger epics to our users
I'm not a PM with a specific need but it seems reasonable that they would want the ability to sort releases in the UI.
I'm not exactly sure how this should work. It's natural for each Release to have an order property, but re-ordering would involve changing multiple aggregates in the same transaction. On the other hand, if that information is stored in the Product aggregate, you have to have a method like product.setReleaseOrder(ReleaseId[]), which seems like a weird bit of data to store in a completely different place than the Releases. Worse, adding a release would again involve modifying two different aggregates! What else can we do? ProductReleaseSortOrder can be its own aggregate, but that sounds downright absurd!
So what to do? At the moment I'm still leaning toward the let-product-manage-it option but what's correct here?
I have found that it is in fact best to create a new aggregate root (e.g., ProductReleaseSorting as suggested) for each individual sorting and/or ordering purpose.
This is because releaseOrder clearly is not actually a property of the Product, i.e., something that has a meaning on a product on its own. Rather, it is actually a property of a "view" on a collection of products, and this view should be modeled on its own.
The reason why I tend to introduce a new aggregate root for each individual view on a collection of items becomes clear if you think of what happens if you were to introduce additional orderings in the future, say a "marketing order", or multiple product managers want to keep their own ordering etc. Here, one easily sees that "marketing order" and "release order" are two different concepts that should be treated independently, and if multiple persons want to order the products with the same key, but using different orderings, you'll need individual "per person views". Furthermore, it could be that there are multiple order criteria that one would like to take into account when sorting (an example for the latter would be (in a different context) fastest route vs. shortest route), all of which depends on the view you have on the collection, and not on individual properties of its items.
If you now handle the Product Manager's sorting in a ProductReleaseSorting aggregate:
you have a single source of truth for the ordering (the AR),
the ProductReleaseSorting AR can enforce constraints such as that no two products have the same order number, and
you don't face the issue of having to update multiple ARs in a single transaction when changing the order.
Note that your ProductReleaseSorting aggregate most probably has a unique identity ("Singleton") in your domain, i.e., all product managers share the same sorting. If however all team members would like to have their own ProductReleaseSorting, it's trivial to support this by giving the ProductReleaseSorting a corresponding ID. Similarly, a more generic ProductSorting can be fetched by a per-team ID (marketing vs. product management) from the repository. All of this is easy with a new, separate aggregate root for ordering purposes, but hard if you add properties to the underlying items/entities.
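One possible shape for such an aggregate, as a minimal Java sketch. Everything beyond the name ProductReleaseSorting is an assumption made for illustration: IDs are plain strings, the order is represented by list position, and the methods addRelease, moveTo and orderedReleaseIds are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a separate aggregate root that owns nothing but the ordering.
class ProductReleaseSorting {

    private final String productId;
    // List position is the order, so two releases can never share an order number.
    private final List<String> orderedReleaseIds = new ArrayList<>();

    ProductReleaseSorting(String productId) {
        this.productId = productId;
    }

    String productId() {
        return productId;
    }

    // Appends a new release at the end; rejects IDs that are already part of the ordering.
    void addRelease(String releaseId) {
        if (orderedReleaseIds.contains(releaseId)) {
            throw new IllegalArgumentException("release already sorted: " + releaseId);
        }
        orderedReleaseIds.add(releaseId);
    }

    // Moves one release to a new position; only this aggregate changes, no Release is touched.
    void moveTo(String releaseId, int newIndex) {
        int current = orderedReleaseIds.indexOf(releaseId);
        if (current < 0) {
            throw new IllegalArgumentException("unknown release: " + releaseId);
        }
        if (newIndex < 0 || newIndex >= orderedReleaseIds.size()) {
            throw new IndexOutOfBoundsException("index out of range: " + newIndex);
        }
        orderedReleaseIds.remove(current);
        orderedReleaseIds.add(newIndex, releaseId);
    }

    // Returns a copy so callers cannot change the order behind the aggregate's back.
    List<String> orderedReleaseIds() {
        return new ArrayList<>(orderedReleaseIds);
    }

    public static void main(String[] args) {
        ProductReleaseSorting sorting = new ProductReleaseSorting("product-1");
        sorting.addRelease("r1");
        sorting.addRelease("r2");
        sorting.addRelease("r3");
        sorting.moveTo("r3", 0);
        System.out.println(sorting.orderedReleaseIds());
    }
}
```

Reordering is a single change to this one aggregate, and a per-person or per-team ordering would simply be another instance with its own identity.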
So, Product and Release are both ARs. Release has an association to Product via AggregateId. You want to get a list of all releases for a given product, ordered by something?
Since ordering is an attribute of the aggregate, it should be set on Product, but Releases are ARs too, and you shouldn't access the Release repository from the Product AR (every AR should have its own repository).
I would simply make a ReleaseQueryService that takes a productId and an order parameter and calls ReleaseRepository.loadOrderedReleasesForProduct(productId, order).
I would also think about separating contexts: maybe the model for release presentation should live in another context? For example, an additional AR ProductReleases that would be used only for querying.

Core Data subentity vs. relationship for entities with similar attributes

Planning out the Core Data schema for my iOS app, I found that I have a handful of entities that need the same set of attributes (e.g. descriptive text, a rating, user notes, etc.) and relationships (e.g. for applying tags, attaching parameters, setting parent/child relationships, associating images, etc.). For one entity "Entity A", the shared attributes/relationships are all it needs while the others each have a couple of additional unique attributes and/or relationships. Reading the Core Data documentation and posts here, I decided to set up "Entity A" as the parent of the others.
One additional entity shares all of the same attributes and a subset of the same relationships, and that is the entity for images (note that the images themselves are stored in files, this entity is for metadata and has a key to the image file). "Entity A" and children all need a to-many relationship to the image entity, however, while the image entity does not need a relationship to itself. The image entity also does not need the parent/child relationships of "Entity A". I see four options, but am having trouble determining which way to go.
My question is whether any of these options have a significant pro or con that I'm missing or if one particular option is generally considered the "correct" way of doing things. I've read that in Core Data all subentities will share a single table with the parent entity, which could lead to performance issues for large numbers of objects, but I anticipate my app would only require a couple thousand rows in such a table at most.
Option 1: Set up the image entity as a child of "Entity A" and just ignore the relationships it doesn't need. This is the easiest to set up and takes full advantage of inheritance, but I don't know if ignoring relationships is a good design choice. "Entity A" has existing data that I think would migrate most easily this way, but I can also recreate the data without too much trouble, so that's not a significant consideration. There is no current image entity, so migration of that is not a concern.
Option 2: Create an abstract parent that both "Entity A" and the image entity are children of. The parent would be "Entity A" minus the relationship to images and "Entity A" would now just have the relationship to images. This seems cleaner and is currently where I'm leaning. While this seems functionally good, conceptually I don't know if it's appropriate for an entity with what's essentially metadata to be the parent.
Option 3: Instead of an abstract parent, make a separate new "metadata" entity that is "Entity A" minus the image relationship and add a to-one relationship to the metadata entity from both "Entity A" and the image entity, like making a composite object. This seems conceptually appropriate since the metadata is just one aspect of the main entities, not the defining factor (which is how it feels with Option 2). It also keeps the image entity and "Entity A" in separate tables and should allow searching by metadata to be done more efficiently. The only downside, if it is one, is that it's taking the majority of the attributes and relationships for both "Entity A" and the image entity and tacking them on as a relationship.
Option 4: Ignore the similarity in attributes and just make the image entity completely separate, duplicating all of the attributes from "Entity A". This seems the least desirable due to duplication of effort.
I ended up going with Option 3 (creating a new "metadata" entity) and then created two separate sub-entities of metadata to handle the inverse relationships for "Entity A" and the image entity. This seemed to be the appropriate course of action in terms of object hierarchy. I also added accessors to "Entity A" and the image entity as a convenience to pass through calls for the metadata entity's attributes.
The resulting Core Data migration required a few steps and some custom coding - probably more work than simply creating an empty database and manually re-populating it, but it was a good learning experience with migrations.
Having completed the migration I confirmed what I had read about sub-entities sharing a table with the parent entity. In the case of the metadata entity, this meant that every row had columns representing the inverse relationships to both "Entity A" and the image entity. For reference, the metadata table had the following columns:
Z_PK = row index
Z_ENT = entity index to distinguish between sub-entities (all entity indices are in the table Z_PRIMARYKEY)
Z_OPT = count of writes for the given row
Zxxxx - columns for each attribute and to-one relationship in the parent and sub-entities, apparently ordered with booleans first, then integers, then relationships, dates, and finally strings (from smallest to largest data size)
Note that to-many relationships are handled in separate tables.

Lookup tables in Core Data

Core Data is not a database, so I am getting confused about how to create, manage or even implement lookup tables in Core Data.
Here is a specific example that relates to my project.
Staff (1) -> (Many) Talents (1)
The talents table consists of:
TalentSkillName (String)
TalentSkillLevel (int)
But I do not want to keep entering the TalentSkillName, so I want to put this information into another, separate table/entity.
But as Core Data is not really a database, I'm getting confused as to what the relationships should look like, or even if Lookup tables should even be stored in core data.
One solution I'm thinking of is to use a PLIST of all the TalentSkillNames and then in the Talents entity simply have a numeric value which points to the PLIST version.
Thanks.
I've added a diagram which I believe is what you're meant to do, but I am unsure if this is correct.
I'd suggest that you have a third entity, Skill. This can have a one to many relationship with Talent, which then just has the level as an attribute.
Effectively, this means you are modelling a many-to-many relationship between Staff and Skill through the Talent entity. Logically, that seems to fit with the situation you're describing.
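The three-entity shape can be sketched with plain Java classes standing in for Core Data entities. This only illustrates the object graph: the class and field names are hypothetical, and in Core Data the inverse relationships would be configured in the model rather than maintained by hand as done here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: Skill is the lookup entity, Talent joins a Staff member to a Skill with a level.
class SkillLookupSketch {

    // Lookup entity: each skill name is stored once.
    static final class Skill {
        final String name;
        final List<Talent> talents = new ArrayList<>();

        Skill(String name) { this.name = name; }
    }

    static final class Staff {
        final String name;
        final List<Talent> talents = new ArrayList<>();

        Staff(String name) { this.name = name; }

        // Creates the joining Talent and registers it on both sides of the relationship.
        Talent addTalent(Skill skill, int level) {
            Talent talent = new Talent(this, skill, level);
            talents.add(talent);
            skill.talents.add(talent);
            return talent;
        }
    }

    // Join entity: carries only the level plus references to its Staff and Skill.
    static final class Talent {
        final Staff staff;
        final Skill skill;
        final int level;

        Talent(Staff staff, Skill skill, int level) {
            this.staff = staff;
            this.skill = skill;
            this.level = level;
        }
    }

    public static void main(String[] args) {
        Skill juggling = new Skill("Juggling");
        Staff alice = new Staff("Alice");
        Staff bob = new Staff("Bob");
        alice.addTalent(juggling, 3);
        bob.addTalent(juggling, 5);
        // The skill name exists once, however many staff members have the skill.
        System.out.println(juggling.name + " is held by " + juggling.talents.size() + " staff members");
    }
}
```

The skill name is entered once on the Skill, and each Talent only records who has it and at what level.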
