On SQLAlchemy's documentation page the author starts with a philosophy:
SQL databases behave less like object collections the more size and
performance start to matter; object collections behave less like
tables and rows the more abstraction starts to matter.
I've been scratching my head trying to understand the idea behind these two sentences, but without success. Could someone give an example that illustrates the idea? Thanks.
When you are creating an application using an Object Oriented language and a SQL database, you are simultaneously working with two very different conceptual models for storing information:
The relational model says how to store data in tables and rows and how to link elements through keys and joins.
The object model establishes a way to store entities with attributes in memory (usually) and how to set links between them using pointers or references.
So, let's say that you have a User entity that is linked to addresses and other users in your application. Those entities will need to be stored in the form of several tables in the database (a users table, an addresses table and a many-to-many table for associating users with users, for instance). At the same time, if your code uses object-oriented constructs, users and addresses will exist in memory in the form of objects with references between them, pointing to objects of the same or a different kind.
The thing is, moving information between those two different worlds is much much more difficult than it looks at first:
You might associate one object with one row in a table, but that is not always possible and sometimes a single object must be associated to multiple rows in different tables.
Inheritance and polymorphic behavior are particularly difficult to map to a relational model.
Traversing objects and querying the database are vastly different actions.
Performance factors to take into account in an object model and a relational model are completely different.
And those are just a few examples. ORMs such as SQLAlchemy are essentially translators that convert information from one world into the other and back.
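To make the difference between traversing objects and querying the database concrete, here is a minimal sketch using only Python's standard library. The User and Address classes, the table layout and the sample data are all made up for illustration; this is not how SQLAlchemy itself does it, just the same facts seen from both worlds.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Address:
    email: str


@dataclass
class User:
    name: str
    addresses: list = field(default_factory=list)  # in-memory references


# Object world: follow a reference from a user to its addresses.
alice = User("alice", [Address("alice@example.com"), Address("a@example.org")])
emails_from_objects = [a.email for a in alice.addresses]

# Relational world: the same facts live in two tables linked by a key.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        email TEXT NOT NULL
    );
    """
)
cur = conn.execute("INSERT INTO users (name) VALUES (?)", (alice.name,))
user_id = cur.lastrowid
conn.executemany(
    "INSERT INTO addresses (user_id, email) VALUES (?, ?)",
    [(user_id, a.email) for a in alice.addresses],
)

# Getting the addresses back requires a join, not a pointer traversal.
rows = conn.execute(
    """
    SELECT a.email
    FROM users AS u
    JOIN addresses AS a ON a.user_id = u.id
    WHERE u.name = ?
    ORDER BY a.id
    """,
    ("alice",),
).fetchall()
emails_from_sql = [email for (email,) in rows]
```

Both lists end up with the same e-mail addresses, but one came from following references and the other from a join over keys. An ORM sits between those two code paths.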
What I think Mike Bayer was trying to convey is: the more you adapt your entity information to the object model (lots of inheritance, polymorphism, traversal of objects, ...), the less it will resemble the natural structure of a relational model and the more performance concessions you will be making. And the other way around: the more you design your tables to perform well and be optimized for your queries, the less they will fit a natural structure of objects.
Martin Fowler has a nice write-up about the need for this translation in this article: ORM Hate (from which I took the above image).
Edit: further clarification on the abstraction vs performance issue
In the end, I think that the bottom line of that SQLAlchemy presentation text is: many ORMs hide the relational side of the relational/object-oriented translation to make things easier. With them you only have to worry about the object-oriented side, and the library is in charge of taking away the burden of dealing with the database. You get persistence for your objects without having to deal with SQL. However, they incur a performance penalty in doing so, because the details of working with the database are abstracted away and you have no control over them. And those details are essential when you have to optimize performance. SQLAlchemy takes the opposite approach. It hides nothing of the relational side: you are in control of how SQL is generated and of when and when not to use joins, subqueries and other SQL constructs. That makes it a much more complex library to learn, but at the same time you are in control of the whole relational/object-oriented translation process.
Related
I am developing a Library Management System which has two sorts of books (Ebook and PrintedBook).
I intend to provide a search over both ebooks and printed books on the same page.
The only problem is that both Ebook and PrintedBook are books. Should I create a Book entity and have PrintedBook and Ebook inherit from it? If I do this, the search is easier to implement through an IBookRepository. If not, I have to join two tables (Ebooks and PrintedBooks).
Please help me.
Dealing with inheritance at the persistence level, especially when talking about relational databases, can be a headache. First of all, you should ask yourself why this is a problem for you.
If the problem is performance due to using a JOIN in your database query, you might look at a technique called single table inheritance. Basically, you have one table containing all the columns of all your book types (i.e. PrintedBook and Ebook). This way you don't have to use a JOIN, but you sacrifice some storage.
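A minimal sketch of single table inheritance, assuming Python with the standard sqlite3 module rather than the asker's actual stack. The books table, the kind discriminator column and the download_url / shelf_location columns are hypothetical names chosen for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Book:
    id: int
    title: str


@dataclass
class Ebook(Book):
    download_url: str


@dataclass
class PrintedBook(Book):
    shelf_location: str


conn = sqlite3.connect(":memory:")
# One table holds every column of every subtype; unused columns stay NULL.
conn.execute(
    """
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('ebook', 'printed')),
        title TEXT NOT NULL,
        download_url TEXT,
        shelf_location TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO books (kind, title, download_url, shelf_location) "
    "VALUES (?, ?, ?, ?)",
    [
        ("ebook", "SQL Basics", "https://example.com/sql.pdf", None),
        ("printed", "Advanced SQL", None, "A-12"),
        ("printed", "Cooking", None, "C-03"),
    ],
)


def search_books(conn, term):
    """Search both book types with a single query and no JOIN."""
    rows = conn.execute(
        "SELECT id, kind, title, download_url, shelf_location "
        "FROM books WHERE title LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    results = []
    for book_id, kind, title, url, shelf in rows:
        # The discriminator column decides which subclass to build.
        if kind == "ebook":
            results.append(Ebook(book_id, title, url))
        else:
            results.append(PrintedBook(book_id, title, shelf))
    return results


found = search_books(conn, "SQL")
```

The storage cost mentioned above shows up as the NULL columns: every printed book carries an empty download_url and every ebook an empty shelf_location.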
Other than the concrete table inheritance technique (as you described yourself), the only other common way to deal with the inheritance problem in relational databases is class table inheritance, which brings the JOIN back.
If your application becomes too complex or the domain model isn't compatible with your read use cases, you might look at a read model. A read model helps you focus on your problem domain without modifying it, while still giving you easy access to the data. This is a very complex topic, so if you want to read something about read models (or about DDD implementation problems and techniques) I recommend Implementing Domain-Driven Design by Vaughn Vernon.
Everywhere I look, I notice that both Domain Driven Design (DDD) and entity hydration approaches attempt to populate entities directly from the data layer. I disagree with such approaches. It is not that these approaches do not work; they do. Instead, I would argue that such approaches give a low level of transparency for testing purposes. I propose that at the data access layer, data is retrieved to populate dictionaries instead of directly populating the entities themselves. There are several reasons for this:
First, there is greater flexibility. A dictionary per result set could be populated. We would decide later which entities could be populated from these result sets.
Second, less knowledge about the data layer is needed to determine where data retrieval is failing. We can still write tests to verify data retrieval without having to understand anything about the associated complex domain entity factories.
There is one so-called disadvantage: performance. Is going through two layers slower than going through one? Yes, it is, but the performance gain from going through a single data layer is negligible here. The reason I say this is that both the dictionaries and the entities these dictionaries would populate would be cached. So, if anything, there would be a memory overhead. I think this would be worthwhile to gain the two advantages stated above.
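A rough sketch of what I have in mind, assuming Python and the standard sqlite3 module. The customers table, the Customer entity and both function names are hypothetical; caching is left out to keep it short.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    name: str


def fetch_customer_rows(conn):
    """Data access layer: return plain dictionaries, one per row."""
    cursor = conn.execute("SELECT id, name FROM customers ORDER BY id")
    return [dict(row) for row in cursor]


def build_customers(rows):
    """Factory layer: turn dictionaries into entities, decided separately."""
    return [Customer(id=row["id"], name=row["name"]) for row in rows]


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted to dicts by column name
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)])

rows = fetch_customer_rows(conn)   # testable without any entity knowledge
customers = build_customers(rows)  # testable without any database
```

A test for fetch_customer_rows only needs to compare dictionaries, and a test for build_customers can be fed hand-written dictionaries, which is the transparency I am after.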
It seems like testing is the issue ("for testing purposes"), so I suggest you use repositories just like #tschmuck pointed out.
As Ayende points out, they might give you unnecessary lasagna code (i.e. too many layers), but they will give you flexibility. You can implement fakes/test spies yourself, mock and stub 'em, as well as use an in-memory DB such as SQLite, and the dependent class is just as happy.
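As a small illustration of the repository idea, here is a sketch in Python. The CustomerRepository protocol, the in-memory fake and the greeting function are all invented for the example; a real repository would talk to the database behind the same method.

```python
from typing import Optional, Protocol


class CustomerRepository(Protocol):
    """What the dependent code needs; it does not care where data lives."""

    def get_name(self, customer_id: int) -> Optional[str]: ...


class InMemoryCustomerRepository:
    """Hand-written fake for tests: backed by a plain dictionary."""

    def __init__(self, names):
        self._names = dict(names)

    def get_name(self, customer_id: int) -> Optional[str]:
        return self._names.get(customer_id)


def greeting(repo: CustomerRepository, customer_id: int) -> str:
    """Dependent code: works with any object that has get_name()."""
    name = repo.get_name(customer_id)
    return f"Hello, {name}!" if name is not None else "Hello, stranger!"


fake = InMemoryCustomerRepository({1: "Ada"})
known = greeting(fake, 1)
unknown = greeting(fake, 2)
```

The dependent function never sees a connection or a query, so the fake and a database-backed implementation are interchangeable in tests.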
I have learned about Object Role Modeling but not about Object-Relational Mapping, and I want to know whether they are two ways of doing the same thing and what their pros and cons are. To me, Object Role Modeling makes a lot more sense. Could you give a brief, easy-to-understand comparison, if they can be compared at all? Cheers
Object Role Modeling: a software modeling notation used, in particular, to define domain models. You can think of this language as an alternative to using UML class diagrams to design your database. More info here: http://www.orm.net/
Object-relational mapping: a set of strategies to bridge the gap between object-oriented programs and relational databases. It aims to allow the persistent storage of objects in a relational database structure.
Object Role Modeling was invented by a team at Control Data around 1973, and named by Falkenberg. It is a modeling method rooted in linguistic analysis, and was formalised as a first-order logic by Terry Halpin, see http://orm.net. ORM is thus the original user of the acronym. ORM and related modeling languages are distinguished by being attribute-free. These languages contain only objects and object types (kinds of things), facts and fact types (relationships between individual things) and constraints (rules about what things and relationships may exist). No relationship has the master-slave characteristic like entity-attribute - this is a notion that only arises during physical mapping, as it's irrelevant to the underlying semantics of the domain.
Object Relational Mapping (which I always write O/RM) is a name for a method or family of tools that help translate data between relational form and object-oriented form. Both these forms use aggregate or composite things based on attributes (entity/attribute or object/attribute), but the principles for aggregation differ between the two approaches, so the same underlying semantics results in different data structures; hence the need for tools to help automate the translation. Furthermore, in ER or O-O analysis, the need to make early decisions about which things are objects/entities and which are attributes is forced, and this gives rise to a whole class of modeling errors that simply does not occur with ORM.
Of course, both relational and o-o models can be automatically derived from an ORM model, and the mapping between the derived forms is also automatic and painless. I suppose that's not done more often because it would make life too easy.
You are comparing apples to oranges.
Object Relational Mapping is all about trying to overcome the impedance mismatch between the object world and relational databases.
ActiveRecord, for example, is an ORM that wraps a row in a database.
Hibernate is another popular ORM.
Just google ORM; Wikipedia explains it much better:
http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
In Domain Driven Design are collection properties of entities allowed to have partial values?
For example, should properties such as Customer.Orders, Post.Comments, Graph.Vertices always contain all orders, comments, vertices, or is it allowed for them to hold only today's orders, recent comments, orphaned vertices?
Correspondingly, should Repositories provide methods like
GetCustomerWithOrdersBySpecification
GetPostWithCommentsBefore
etc.?
I don't think that DDD tells you to do or not to do this. It strongly depends on the system you are building and the specific problems you need to solve.
I haven't even heard of any patterns for this.
From a subjective point of view, I would say that entities should be complete by definition (allowing for lazy loading), and could be completely or partially loaded into DTOs to optimize the amount of data sent to clients. But I wouldn't mind loading partial entities from the database if it solved some problem.
Remember that Domain-Driven Design also has a concept of services. For performing certain database queries, it's better to model the problem as a service than as a collection of child objects attached to a parent object.
A good example of this might be creating a report by accepting several user-entered parameters. It would be easier to model this as:
CustomerReportService.GetOrdersByOrderDate(Customer theCustomer, Date cutoff);
Than like this:
myCustomer.OrdersCollection.SelectMatching(Date cutoff);
Or to put it another way, the DDD model you use for data entry does not have to be the same as the DDD model you use for reporting.
In highly scalable systems, it's common to separate these two concerns.
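Here is a sketch of the service approach in Python with the standard sqlite3 module. The orders table, its columns and the reading of the cutoff as "orders placed on or after this date" are assumptions made for the example; the original signature does not say which way the cutoff goes.

```python
import sqlite3
from datetime import date


class CustomerReportService:
    """Reporting service: runs a targeted query instead of loading a collection."""

    def __init__(self, conn):
        self._conn = conn

    def get_orders_by_order_date(self, customer_id: int, cutoff: date):
        # Assumption: return orders placed on or after the cutoff date.
        rows = self._conn.execute(
            "SELECT id, order_date FROM orders "
            "WHERE customer_id = ? AND order_date >= ? "
            "ORDER BY order_date",
            (customer_id, cutoff.isoformat()),
        )
        return [(order_id, date.fromisoformat(d)) for order_id, d in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, order_date TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, order_date) VALUES (?, ?)",
    [(1, "2024-01-05"), (1, "2024-03-10"), (2, "2024-03-11")],
)

service = CustomerReportService(conn)
recent = service.get_orders_by_order_date(1, date(2024, 2, 1))
```

The filtering happens in the database, so the Customer entity never has to expose a half-loaded orders collection.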
I'm new to working with Domain Models, so forgive me for asking an elementary question.
If a Domain Object has a 1-many relationship with another Domain Object, but logic that uses the first object works with only a subset of that object's related objects, what is the best way to expose this subset?
For example, say a Person is related to many Orders but some external logic needs to examine only the "Dispatched" Orders associated with a Person. Should the Person have a DispatchedOrders property, along with other properties for other subsets (such as CompletedOrders etc.), or is this bad design? Assume that for performance reasons I can't filter the objects in memory and must use SQL to pull back only the subset I'm interested in.
Thanks
If you're using SQL to find the set you're interested in, you're in a perfect world. Relational queries are all about finding that sort of thing. Find the perfect query, and then just figure out what the class of the result tuples are, i.e., an object for each result tuple, and process them appropriately.
In your example, you want a set of "Dispatched Orders", with whatever person information is necessary attached to each one.
I think you have the right idea: DispatchedOrders would tell me precisely what collection of objects you are returning to me. As Curt said, you are in a good spot, as you can use SQL or a stored procedure to fetch your data.
One caveat: be sure that the domain matches the business process and is not an interpolation of your understanding of that process. That is, why does a person have primacy over an order, and what corner are you painted into when you construct other objects? Does a line-item contain an order as well, and does this lead to object bloat? Discussions with your client should help shape the answer.
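A minimal sketch of fetching only the dispatched orders through SQL, written in Python with the standard sqlite3 module. The orders table, the status values and the OrderRepository name are hypothetical; whether this ends up behind a DispatchedOrders property or a repository method is a design choice.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    person_id: int
    status: str


class OrderRepository:
    def __init__(self, conn):
        self._conn = conn

    def dispatched_orders_for(self, person_id: int):
        """Let the database do the filtering; only matching rows come back."""
        rows = self._conn.execute(
            "SELECT id, person_id, status FROM orders "
            "WHERE person_id = ? AND status = 'dispatched' "
            "ORDER BY id",
            (person_id,),
        )
        return [Order(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, person_id INTEGER NOT NULL, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (person_id, status) VALUES (?, ?)",
    [(1, "dispatched"), (1, "completed"), (1, "dispatched"), (2, "dispatched")],
)

dispatched = OrderRepository(conn).dispatched_orders_for(1)
```

The name of the method says exactly which subset comes back, and nothing is filtered in memory.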
Rob Conery of SubSonic fame has a good discussion of these types of issues. It's worth listening to.