Related
I was asked to implement CQRS/Event sourcing patterns into a legacy web application, in order to prepare to migrate it from a monolithic/state oriented model to a distributed, service oriented app.
I have some questions on how I can design a Domain oriented code bundle that would connect the legacy entities strongly coupled to database, with a new Event sourced model.
The first things I did were:
writing a small "framework" for CQRS/ES, with classes like AggregateRoot, DomainEvent, Command, Handlers, Messaging, Eventstore, AggregateIds, etc.
trying to group and "migrate" the legacy Entities into some Aggregates to reconstruct all the history and states of the app into EventSoourced Aggregates
plug some Commands dispatching in the old controllers in order to let the app work as is, but also to feed the new CQRS/ES system on the side.
The context:
The legacy app contains several entities, mapped to database, that hold the model layer. (Our domain is Human resources (manpower).
Let's say we have those existing entities:
Worker, with various fields and related entities (OneToOne, OneToMany), like
name
address 1-1
competences 1-N
Society, in which worker works, with various fields and related entities (OneToOne, OneToMany), like
name
address 1-1
hours
Contract, with various fields and related entities (OneToOne, OneToMany), like
address 1-1
Worker 1-1
Society 1-1
documents 1-N
days 1-N
hours
etc.
From this legacy model, I designed a MissionAggregate that holds:
A db independent ID, like UUID
some Value objects: address, days (they were an entity in the legacy model, they became VOs here)
I also designed a WorkerAggregate and a SocietyAggregate, with fields and UUIDS, and in the MissionAggregate I added:
a reference to WorkerAggregate's UUID
a reference to SocietyAggregate's UUID
As I said earlier, my aim is to leave the legacy app as is, but just introduce in the CRUD controller's methods some calls to dispatch Commands to the new CQRS system.
For example:
After flushing newly created Contract in bdd, I want to dispatch a "CreateMissionCommand" to the new command bus.
It targets the appropriate Command Handler, that handles all the command's data, passes it to a newly created Aggregate with a new UUID and stores "MissionCreatedDomainEvent" in the EventStore.
The DomainEvent is indexed with an AggregateId, a playhead, and has a payload which contains the fields necessary to be applied to and build the MissionAggregate.
The newly Contract created in the app has now its former lifecycle, as usual, with all the updates that the legacy app does on it. But I also need to reflects all those changes to the corresponding EventSourcedAggregate, so every time there is a flush in database in the app, I dispatch a Command that translates the "crud like operations" of the legacy app into a Domain oriented /Command oriented pattern.
To sum up the workflow is:
A Crud legacy operation occurs and flushes some changes on the Contract Entity
In just a row of code in the controller, I dispatch a command built with necessary fields (AggregateId of the MissionAggregate... that I need to have stored somewhere... see next problems) to the Domain command bus, so that the impact on the existing code base is very low.
The bus passes the command to the corresponding command handler
The handler loads the aggregate and applies the changes it by calling the appropriate Aggregate method
then after some validation, the aggregate raises and stores the appropriate event
My problems and questions (some of them at least) are:
I feel like I am rewriting all big portions of the legacy app, with the same kind of relations between the Aggregates that I have between the Entities, and with the same type of validations, checks etc.
Having references, to both WorkerAggregate and SocietyAggregate UUID in MissionAggregate implies that I have to build those aggregate also (hence to dispatch commands from legacy app when the Worker and Society entities are flushed). Can't I have only references to Worker's entity id and Society's entity id?
How can I avoid having a eternally growing MissionAggregate? The Contract Entity is quite huge, it has a lot of fields that are constantly updated (hours, days, documents, etc.) If I want to store all those events, I need to have a large MissionAggregate to reflect all those changes; and so I need to have a tons of CommandHandlers that react to all the Commands of add, update, etc. that I am going to dispatch from the legacy app.
How "free" is an Aggregate from the Root entity it is supposed to refer to ? For example, a Contract Entity needs to relate somewhere to it's related Mission Aggregate, like for example when I want to dispatch a Command from the app, just after the legacy code having flushed something on the Entity. Where to store this relation? In the Entity itself, in a AggregateId field? in the Aggregate, should I have a ContractId field? Or should I have some kind of Mapping Table somewhere that holds the relationship between Contract ID and MissionAggregate ID?
What to do with the past? Should I migrate all the existing data through a script that generates Aggregates and events on all the historical data?
Thanks in advance for your time.
You have a huge task ahead of you, let's try to break it down.
It's best to build this new part of the system in isolation from the legacy codebase, otherwise you're going to have your hands tied in every turn of the way.
Create a separate layer in your project for these new requirements. We're going to call it "bubble" from now on. This bubble will be like a greenfield project, with its own structure, dependencies, etc. There will be no direct communication between the bubble and the legacy; communication will happen through another dedicated translation layer, which we'll call "Anti-Corruption Layer" (ACL).
ACL
It is like an API between two systems.
It translates calls from the bubble to the legacy and vice-versa. Its purpose is to prevent one system from corrupting or influencing the other. This way you can keep building/maintaining each system independently from each other.
At the same time, the ACL allows one system to consume the other, and reuse logic, validations, rules, etc.
To answer your questions directly:
I feel like i am rewriting all big portions of the legacy app, with the same kind of relations between the Aggregates that i have between the Entities, and with the same type of validations, checks etc.
With the ACL, you can resort to calling validations and reuse implementations from the legacy code. This will allow you time to rewrite things as needed or as possible.
You may not need to rewrite the entire system, though. If your goal is to implement CQRS and Event Sourcing and you can achieve this goal by keeping most or part of the legacy system, I would say you do it. Unless, of course, one of the goals is to completely replace the old system. Otherwise, keep it; write as less code as possible.
Suggested workflow:
Keep the CQRS and Event Sourcing system in the bubble
Do not bring these new frameworks into legacy
Make the lagacy Controller issue method calls to the ACL
The ACL will convert these calls into Commands and dispatch them
Any events will be caught by your Event Sourcing framework
Results will be persisted to the bubble's database
The bubble's database can be a different schema in the same database or can be a different database altogether. But you'll have to think about synchronization, and that's a topic of its own. To reduce complexity, I recommend a different schema in the same database.
Having references, to both WorkerAggregate and SocietyAggregate UUID in MissionAggregate implies that i have to build those aggregate also (hence to dispatch commands from legacy app when the Worker and Society entities are flushed). Can't i have only references to Worker's entity id and Society's entity id?
How can i avoid having a eternally growing MissionAggregate ? The Contract Entity is quite huge, it has a looot of fields that are constantly updated (hours, days, documents, etc.) If i want to store all those events, i need to have a large MissionAggregate to reflect all those changes; and so i need to have a tons of CommandHandlers that react to all the Commands of add, update, etc that i am going to dispatch from the legacy app.
You should aim for small aggregates. Huge aggregates are likely to degrade performance and cause concurrency problems.
If you anticipate having a huge aggregate, it is best to rethink it and try to break it down. Ask what fields/properties change together - these are possibly a different aggregate.
Also, when you speak about CQRS, you generally lean towards a task-based way of doing things in your system.
Think of a traditional web application, where you have a huge page with lots of fields that are all sent to the server in one batch when the user saves.
Now, contrast it with a modern web app where the user changes small portions of data at each step. If you think about your system this way you'll find those smaller aggregates.
PS. you don't need to rebuild your interfaces for this. If your legacy system has those huge pages, you could have logic in the controllers to detect which fields were changed and issue the appropriate commands.
How "free" is an Aggregate from the Root entity it is supposed to refer to ? For example, a Contract Entity needs to relate somewhere to it's related Mission Aggregate, like for example when i want to dispatch a Command from the app, just after the legacy code having flushed something on the Entity. Where to store this relation ? In the Entity itself, in a AggregateId field ? in the Aggregate, should i have a ContratId field ? Or should i have some kind of Mapping Table somewhere that holds the relationship between Contract ID and MissionAggregate ID?
Aggregates represent a conceptual whole. They are like atoms, indivisible things. You should always refer to an aggregate by its Root Entity Id, and never to a Child Entity Id: looking from the outside, there are no children.
An aggregate should be loaded as a whole and persisted as a whole. One more reason to have small aggregates.
An aggregate can be comprised of a single entity. Or it can have more entities and value objects, forming a graph, but one entity will be elected as the Root and will hold references to its children. Child entities and value objects should not hold references to their parents. The dependency is not bi-directional.
If Contract is an entity inside the Mission aggregate, the Contract should not have a reference to its parent.
But, if your Contract and Mission are different aggregates, then they can reference each other by their Ids.
What to do with the past? Should i migrate all the existing datas through a script that generates Aggregates and events on all the historical data?
That's a question for the business experts. Do they need it? If they don't, then don't implement it just for the sake of doing so. Every decision you make should be geared towards satisfying a business need and generating real value for it, considering the costs and tradeoffs.
Some people say that code is a liability, not an asset, and I aggre to some extent: every line of code you write needs to be tested and supported. Don't write any code that is not really necessary.
Also, have a look at this article about the Strangler Pattern, which shows how to migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services.
If you have a chance, watch this course at Pluralsight (paid): Domain-Driven Design: Working with Legacy Projects. The author presents practical approaches for dealing with this kind of task.
I hope this has given you some insight.
I don't want to spoil your game. Everybody knows how cool it is to rewrite something from scratch. It's a challenge, it's fun, it's exciting. However...
migrate it from a monolithic/state oriented model to a distributed, service oriented app
CQRS/Event Sourcing won't solve any of your problems and it won't help you distribute the app in any reasonable way. If you just generate events on the CRUD operations you'll have a large tangled mess of dependencies between each part. Every part that needs data will have to call a couple of "services" (i.e. tables) to get it, than push data elsewhere, generate events1 that some other parts will react to. It will be a mess. Usually this is called a distributed monolith.
This is also the reason you already see problems with it. These problems won't go away, because you are essentially building the same system in the same way, but this time it'll be more complex.
Where to go from here
The very first thing is always: have a clear goal. You want a service oriented architecture you said. Why? Are there parts that need different scaling, different resources? Are they managed by different teams with different life-cycles? Etc.? Maybe you already have all this, I don't know, but if not, that's your first task.
Then. The parts you do want to pull out can't be just CRUD things. Those will not be independent, so whether your goal (see point above!) is scaling or different team, you won't reach your goal! To be independent you'll have to pull out the behavior with the data, and in a way that the service can operate on its own.
You can't just throw buzzwords at it and hope for the best. I'd suggest to just ignore all the hype and buzzwords and think about the goal you want to reach.
For example: I need a million workers to log their time in under 10 minutes total. So that means I need a "service" to enable worker to log their time with a web interface. So let's create that as a complete independent piece with its own database so it can be scaled to a 100 nodes when it needs to be. Export data to billing automatically every hour or so.
I am just starting out with ES/DDD and I have a question how one is supposed to do reporting in this architecture. Lets take a typical example, where you have a Customer Aggregate, Order Aggregate, and Product Aggregate all independent.
Now if i want to run a query across all 3 aggregates and/or services, but that data is each in a separate DB, maybe one is SQL, one is a MongoDB, and one something else. How is one supposed to design or be able to run a query that would require a join across these aggregates ?
You should design the Reporting as a simple read-model/projection, possible in its own bounded context (BC), that just listen to the relevant events from the other bounded contexts (Customer BC, Ordering BC and Inventory BC) and builds the needed reports with full data denormalization (i.e. at query time you won't need to query the original sources).
Because of events you won't need any joins as you could maintain a private local state attached to the Reporting read-model in which you can store temporary external models and query those temporary read-models as needed thus avoiding external additional queries to the other BCs.
An anti-corruption layer would not be necessary in this case as there would be no write-model involved in the Reporting BC.
Things are really as simple as that because you already have an event-driven architecture (you use Event sourcing).
UPDATE:
This particular solution is very handy in creating new reports that you haven't thought ahead of time. Every time you thing about a new report you just create a new Read-model (as in you write its source code) then you replay all the relevant events on it. Read-models are side-effect free, you can replay all the events (from the beggining of time) any time and as many time you want.
Read-model rebuilding is done in two situations:
you create a new Read-model
you modify an existing one by listening to a new event or the algorithm differs too much from the initial version
You can read more here:
DDD/CQRS specialized forum - Grey Young is there!
Event sourcing applied – the read model
Writing an Event-Sourced CQRS Read Model
A post in first group describing Read Model rebuilding
Or you can search about this using this text: event sourcing projection rebuilding
Domain-Driven Design is more concerned with the command side of things. You should not attempt to query your domain as that leads to pain and suffering.
Each bounded context may have its own data store and that data store may be a different technology as you have stated.
For reporting you would use a reporting store. How you get data into that store would either require each bounded context to publish events that the reporting BC would pick up and use to update the reporting store or you could make use of event sourcing where the reporting store would project the events into the relevant reporting structures.
There are known practices to solve this.
One might be having a reporting context, which, as Eben has pointed out, will listen to domain events from other contexts and update its store. This of course will lead to issues, since this reporting context will be coupled to all services it reports from. Some might say this is a necessary evil but this is not always the case.
Another technique is to aggregate on-demand. This is not very complex and can be done on different layers/levels. Consider aggregation on the web API level or even on the front-end level, if your reporting is on the screen (not sent by mail as PDF, for example).
This is well known as UI composition and Udi Dahan has wrote an article about this, which is worth reading: UI Composition Techniques for Correct Service Boundires. Also, Mauro Servienti has wrote a blog post about this recently: The secret of better UI composition.
Mauro mentions two types of composition, which I mentioned above. The API/server-side composition is called ViewModel Composition in his post, and front-end (JavaScript) composition is mentioned in the Client side composition process section. Server-side composition is illustrated by this picture:
DDD strategic modeling tools says:
Design two different models 1. Write Models (Handles Command Side) 2.Read Models (POCOs/POJOs) whatever u call them.
I've got a question on my mind that has been stirring for months as I've read about DDD, patterns and many other topics of application architecture. I'm going to frame this in terms of an MVC web application but the question is, I'm sure, much broader. and it is this: Does the adherence to domain entities create rigidity and inefficiency in an application?
The DDD approach makes complete sense for managing the business logic of an application and as a way of working with stakeholders. But to me it falls apart in the context of a multi-tiered application. Namely there are very few scenarios when a view needs all the data of an entity or when even two repositories have it all. In and of itself that's not bad but it means I make multiple queries returning a bunch of properties I don't need to get a few that I do. And once that is done the extraneous information either gets passed to the view or there is the overhead of discarding, merging and mapping data to a DTO or view model. I have need to generate a lot of reports and the problem seems magnified there. Each requires a unique slicing or aggregating of information that SQL can do well but repositories can't as they're expected to return full entities. It seems wasteful, honestly, and I don't want to pound a database and generate unneeded network traffic on a matter of principle. From questions like this Should the repository layer return data-transfer-objects (DTO)? it seems I'm not the only one to struggle with this question. So what's the answer to the limitations it seems to impose?
Thanks from a new and confounded DDD-er.
What's the real problem here? Processing business rules and querying for data are 2 very different concerns. That realization leads us to CQRS - Command-Query Responsibility Segregation. What's that? You just don't use the same model for both tasks: Domain Model is about behavior, performing business processes, handling command. And there is a separate Reporting Model used for display. In general, it can contain a table per view. These tables contains only relevant information so you can get rid of DTO, AutoMapper, etc.
How these two models synchronize? It can be done in many ways:
Reporting model can be built just on top of database views
Database replication
Domain model can issue events containing information about each change and they can be handled by denormalizers updating proper tables in Reporting Model
as I've read about DDD, patterns and many other topics of application architecture
Domain driven design is not about patterns and architecture but about designing your code according to business domain. Instead of thinking about repositories and layers, think about problem you are trying to solve. Simplest way to "start rehabilitation" would be to rename ProductRepository to just Products.
Does the adherence to domain entities create rigidity and inefficiency in an application?
Inefficiency comes from bad modeling. [citation needed]
The DDD approach makes complete sense for managing the business logic of an application and as a way of working with stakeholders. But to me it falls apart in the context of a multi-tiered application.
Tiers aren't layers
Namely there are very few scenarios when a view needs all the data of an entity or when even two repositories have it all. In and of itself that's not bad but it means I make multiple queries returning a bunch of properties I don't need to get a few that I do.
Query that data as you wish. Do not try to box your problems into some "ready-made solutions". Instead - learn from them and apply only what's necessary to solve them.
Each requires a unique slicing or aggregating of information that SQL can do well but repositories can't as they're expected to return full entities.
http://ayende.com/blog/3955/repository-is-the-new-singleton
So what's the answer to the limitations it seems to impose?
"seems"
Btw, internet is full of things like this (I mean that sample app).
To understand what DDD is, read blue book slowly and carefully. Twice.
If you think that fully fledged DDD is too much effort for your scenario then maybe you need to take a step down and look at something closer to Active Record.
I use DDD but in my scenario I have to support multiple front-ends; a couple web sites and a WinForms app, as well as a set of services that allow interaction with other automated processes. In this case, the extra complexity is worth it. I use DTO's to transfer a representation of my data to the various presentation layers. The CPU overhead in mapping domain entities to DTO's is small - a rounding error when compared to net work calls and database calls. There is also the overhead in managing this complexity. I have mitigated this to some extent by using AutoMapper. My Repositories return fully populated domain objects. My service layer will map to/from DTO's. Here we can flatten out the domain objects, combine domain objects, etc. to produce a more tabulated representation of the data.
Dino Esposito wrote an MSDN Magazine article on this subject here - you may find this interesting.
So, I guess to answer your "Why" question - as usual, it depends on your context. DDD maybe too much effort. In which case do something simpler.
Each requires a unique slicing or aggregating of information that SQL can do well but repositories can't as they're expected to return full entities.
Add methods to your repository to return ONLY what you want e.g. IOrderRepository.GetByCustomer
It's completely OK in DDD.
You may also use Query object pattern or Specification to make your repositories more generic; only remember not to use anything which is ORM-specific in interfaces of the repositories(e.g. ICriteria of NHibernate)
I am on a tight schedule with my project so don't have time to read books to understand it.
Just like anything else we can put it in few lines after reading books for few times. So here i need some description about each terms in DDD practices guideline so I can apply them bit at a piece to my project.
I already know terms in general but can't put it in terms with C# Project.
Below are the terms i have so far known out of reading some brief description in relation with C# project. Like What is the purpose of it in C# project.
Services
Factories
Repository
Aggregates
DomainObjects
Infrastructure
I am really confused about Infrastructure, Repository and Services
When to use Services and when to use Repository?
Please let me know if anyway i can make this question more clear
I recommend that you read through the Domain-Driven Design Quickly book from infoq, it is short, free in pdf form that you can download right away and does its' best to summarize the concepts presented in Eric Evan's Blue Bible
You didn't specify which language/framework the project you are currently working on is in, if it is a .NET project then take a look at the source code for CodeCampServer for a good example.
There is also a fairly more complicated example named Fohjin.DDD that you can look at (it has a focus on CQRS concepts that may be more than you are looking for)
Steve Bohlen has also given a presentation to an alt.net crowd on DDD, you can find the videos from links off of his blog post
I've just posted a blog post which lists these and some other resources as well.
Hopefully some of these resources will help you get started quickly.
This is my understanding and I did NOT read any DDD book, even the holy bible of it.
Services - stateless classes that usually operate on different layer objects, thus helping to decouple them; also to avoid code duplication
Factories - classes that knows how to create objects, thus decouple invoking code from knowing implementation details, making it easier to switch implementations; many factories also help to auto-resolve object dependencies (IoC containers); factories are infrastructure
Repository - interfaces (and corresponding implementations) that narrows data access to the bare minimum that clients should know about
Aggregates - classes that unifies access to several related entities via single interfaces (e.g. order and line items)
Domain Objects - classes that operate purely on domain/business logic, and do not care about persistence, presentation, or other concerns
Infrastructure - classes/layers that glue different objects or layers together; contains the actual implementation details that are not important to real application/user at all (e.g. how data is written to database, how HTTP form is mapped to view models).
Repository provides access to a very specific, usually single, kind of domain object. They emulate collection of objects, to some extent. Services usually operate on very different types of objects, usually accessed via static methods (do not have state), and can perform any operation (e.g. send email, prepare report), while repositories concentrate on CRUD methods.
DDD what all terms mean for Joe the plumber who can’t afford to read books few times?
I would say - not much. Not enough for sure.
I think you're being quite ambitious in trying to apply a new technique to a project that's under such tight deadlines that you can't take the time to study the technique in detail.
At a high level DDD is about decomposing your solution into layers and allocating responsibilities cleanly. If you attempt just to do that in your application you're likely to get some benefit. Later, when you have more time to study, you may discover that you didn't quite follow all the details of the DDD approach - I don't see that as a problem, you proabably already got some benefit of thoughtful structure even if you deviated from some of the DDD guidance.
To specifically answer your question in detail would just mean reiterating material that's already out there: Seems to me that this document nicely summarises the terms you're asking about.
They say about Services:
Some concepts from the domain aren’t
natural to model as objects. Forcing
the required domain functionality to
be the responsibility of an ENTITY or
VALUE either distorts the definition
of a model-based object or adds
meaningless artificial objects.
Therefore: When a significant process
or transformation in the domain is not
a natural responsibility of an ENTITY
or VALUE OBJECT, add an operation to
the model as a standalone interface
declared as a SERVICE.
Now the thing about this kind of wisdom is that to apply it you need to be able to spot when you are "distorting the definition". And I suspect that only with experience (or guidance from someone who is experienced) do you gain the insight to spot such things.
You must expect to experiment with ideas, get it a bit wrong sometimes, then reflect on why your decisions hurt or work. Your goal should not be to do DDD for its own sake, but to produce good software. When you find it cumbersome to implement something, or difficult to maintain something think about why this is, then examine what you did in the light of DDD advice. At that point you may say "Oh, if I had made that a Service, the Model would be so nmuch cleaner", or whatever.
You may find it helpful to read an example.:
After reading Evan's and Nilsson's books I am still not sure how to manage Data access in a domain driven project. Should the CRUD methods be part of the repositories, i.e. OrderRepository.GetOrdersByCustomer(customer) or should they be part of the entities: Customer.GetOrders(). The latter approach seems more OO, but it will distribute Data Access for a single entity type among multiple objects, i.e. Customer.GetOrders(), Invoice.GetOrders(), ShipmentBatch.GetOrders() ,etc. What about Inserting and updating?
CRUD-ish methods should be part of the Repository...ish. But I think you should ask why you have a bunch of CRUD methods. What do they really do? What are they really for? If you actually call out the data access patterns your application uses I think it makes the repository a lot more useful and keeps you from having to do shotgun surgery when certain types of changes happen to your domain.
CustomerRepo.GetThoseWhoHaventPaidTheirBill()
// or
GetCustomer(new HaventPaidBillSpecification())
// is better than
foreach (var customer in GetCustomer()) {
/* logic leaking all over the floor */
}
"Save" type methods should also be part of the repository.
If you have aggregate roots, this keeps you from having a Repository explosion, or having logic spread out all over: You don't have 4 x # of entities data access patterns, just the ones you actually use on the aggregate roots.
That's my $.02.
DDD usually prefers the repository pattern over the active record pattern you hint at with Customer.Save.
One downside in the Active Record model is that it pretty much presumes a single persistence model, barring some particularly intrusive code (in most languages).
The repository interface is defined in the domain layer, but doesn't know whether your data is stored in a database or not. With the repository pattern, I can create an InMemoryRepository so that I can test domain logic in isolation, and use dependency injection in the application to have the service layer instantiate a SqlRepository, for example.
To many people, having a special repository just for testing sounds goofy, but if you use the repository model, you may find that you don't really need a database for your particular application; sometimes a simple FileRepository will do the trick. Wedding to yourself to a database before you know you need it is potentially limiting. Even if a database is necessary, it's a lot faster to run tests against an InMemoryRepository.
If you don't have much in the way of domain logic, you probably don't need DDD. ActiveRecord is quite suitable for a lot of problems, especially if you have mostly data and just a little bit of logic.
Let's step back for a second. Evans recommends that repositories return aggregate roots and not just entities. So assuming that your Customer is an aggregate root that includes Orders, then when you fetched the customer from its repository, the orders came along with it. You would access the orders by navigating the relationship from Customer to Orders.
customer.Orders;
So to answer your question, CRUD operations are present on aggregate root repositories.
CustomerRepository.Add(customer);
CustomerRepository.Get(customerID);
CustomerRepository.Save(customer);
CustomerRepository.Delete(customer);
I've done it both ways you are talking about, My preferred approach now is the persistent ignorant (or PONO -- Plain Ole' .Net Object) method where your domain classes are only worried about being domain classes. They do not know anything about how they are persisted or even if they are persisted. Of course you have to be pragmatic about this at times and allow for things such as an Id (but even then I just use a layer super type which has the Id so I can have a single point where things like default value live)
The main reason for this is that I strive to follow the principle of Single Responsibility. By following this principle I've found my code much more testable and maintainable. It's also much easier to make changes when they are needed since I only have one thing to think about.
One thing to be watchful of is the method bloat that repositories can suffer from. GetOrderbyCustomer.. GetAllOrders.. GetOrders30DaysOld.. etc etc. One good solution to this problem is to look at the Query Object pattern. And then your repositories can just take in a query object to execute.
I'd also strongly recommend looking into something like NHibernate. It includes a lot of the concepts that make Repositories so useful (Identity Map, Cache, Query objects..)
Even in a DDD, I would keep Data Access classes and routines separate from Entities.
Reasons are,
Testability improves
Separation of concerns and Modular design
More maintainable in the long run, as you add entities, routines
I am no expert, just my opinion.
The annoying thing with Nilsson's Applying DDD&P is that he always starts with "I wouldn't do that in a real-world-application but..." and then his example follows. Back to the topic: I think OrderRepository.GetOrdersByCustomer(customer) is the way to go, but there is also a discussion on the ALT.Net Mailing list (http://tech.groups.yahoo.com/group/altdotnet/) about DDD.