We want to model a warehouse application. Let us assume we have identified the following real-world objects:
Articles (the things stored in the warehouse)
Palettes (what the articles sit on)
Compartments (the places in the racks where the palettes are stored)
There are the following constraints:
A palette is in exactly one compartment
A compartment can hold zero or one palette
To start with, we have one operation:
Move (moves a palette from its current compartment to another).
Of course this is very simplified.
How should this be modelled?
Stockitem could be a value object, I think. One solution would be to model the whole warehouse as a single aggregate with palette and compartment entities. In that case the move operation could be implemented without any trouble with respect to its constraints (invariants). But this approach has obvious drawbacks: the event log for this aggregate will grow without bound, two move operations cannot be executed in parallel because of aggregate versioning, and so on. And from a DDD point of view, it does not feel right to me.
Another approach would be to make each palette and each compartment its own aggregate. But how would the move operation be implemented then?
Problem 1 : Who loads referenced aggregates?
I think the operation should belong to the palette. The palette could reference the compartment it is in (which is itself an aggregate). But how is this reference implemented (CQRS/ES)? The command handler of the move command will obviously load the palette aggregate from the palette repository and call the move method on it. Who loads the referenced compartment? And who loads the compartment it should be moved to? I have read that aggregates should not access repositories. Should the command handler load both compartments? Should the compartments be passed to the move method as parameters? Or should the command handler set the current compartment on the palette and pass the target compartment as a parameter?
Problem 2 & 3 : Constraints and bidirectional association between aggregates
What about the constraints? To check whether the target compartment is empty, the compartment needs to know the palette that is stored in it. This would be a bidirectional association, which should be avoided. And because they are different aggregates, they cannot be updated in the same transaction. Does the palette have to fire a domain event to inform the compartment that it will move to it? Does it have to be implemented as a saga with an undo action? What if two moves conflict and one wins, but the losing palette cannot be moved back because its old compartment has been filled in the meantime?
This all seems very complicated to me for such a simple problem.
In the books and the examples it seems all so clear. But if I try to use it, I seem to do it wrong.
Could somebody guide me in the right direction please?
Problem 1
I think you need to do some transactional analysis in addition to the business analysis here.
If your domain is highly collaborative (which is what DDD is recommended for), how often do moves to a given compartment happen? Would it be feasible, contention-wise, for the Move operation to happen under a transaction that spans the two Compartment aggregates (source and target)? Or would eventual consistency suffice, where the source Compartment signals to the world via an event that the Palette has left it, and the target Compartment is informed later, asynchronously, that the Palette is joining it?
A palette floating "in limbo" for a small amount of time can be perfectly acceptable; that's something you need to ask a domain expert about.
Problem 2
A bidirectional association is not the only solution. You could ask the PaletteRepository for all the Palettes that have CompartmentID X. Alternatively, Compartment could (should) have a list of Palette IDs, not a full reference to them.
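To make the repository-query option concrete, here is a minimal Python sketch. It assumes an in-memory repository and hypothetical names (Palette, PaletteRepository, find_by_compartment_id, handle_move); none of this comes from a real framework. Note that the occupancy check and the save are not atomic here, so in a real system two concurrent moves could still collide unless the store enforces uniqueness on the compartment ID.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Palette:
    palette_id: str
    compartment_id: str  # reference by ID only, not an object reference

    def move_to(self, target_compartment_id: str) -> None:
        self.compartment_id = target_compartment_id


class PaletteRepository:
    """In-memory stand-in for a real repository (hypothetical API)."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Palette] = {}

    def save(self, palette: Palette) -> None:
        self._by_id[palette.palette_id] = palette

    def get(self, palette_id: str) -> Palette:
        return self._by_id[palette_id]

    def find_by_compartment_id(self, compartment_id: str) -> Optional[Palette]:
        # "Which palette is in compartment X?" answered by a query,
        # so Compartment needs no back-reference to Palette.
        for palette in self._by_id.values():
            if palette.compartment_id == compartment_id:
                return palette
        return None


def handle_move(repo: PaletteRepository, palette_id: str,
                target_compartment_id: str) -> None:
    """Command handler: does the loading, then calls the aggregate."""
    if repo.find_by_compartment_id(target_compartment_id) is not None:
        raise ValueError(f"compartment {target_compartment_id} is occupied")
    palette = repo.get(palette_id)
    palette.move_to(target_compartment_id)
    repo.save(palette)
```

The aggregate itself never touches the repository; only the handler does.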
Overall, I think you should look first if there's a business implication/answer to all the design questions you're asking yourself. The domain expert will usually have an educated opinion on whether eventual consistency is realistic or immediate consistency is required, what should happen in case of conflict, etc.
You should just start object modeling the behaviour, and don't think about aggregates and value objects at all. Once you've modeled the needed behaviour, you'll know what are the entities, aggregates, roots and value objects.
In the explanations by Vaughn Vernon he makes it clear that you need to work out the details of what you need before you can make these decisions.
No, it is not a duplication question.
I have read many sources on the subject, but I still feel like I don't fully understand it.
This is the information I have so far (from multiple sources, be it articles, videos, etc...) about what is an Aggregate and Aggregate Root:
An Aggregate is a collection of multiple Value Objects/Entity references and rules.
An Aggregate is always a command model (meant to change business state).
An Aggregate represents a single unit of work (database work, because essentially the changes will be persisted), meaning it has to be consistent.
The Aggregate Root is the interface to the external world.
An Aggregate Root must have a globally unique identifier within the system
DDD suggests to have a Repository per Aggregate Root
A simple object from an aggregate can't be changed without its AR (Aggregate Root) knowing about it
So with all that in mind, let's get to the part where I get confused:
On this site it says:
The Aggregate Root is the interface to the external world. All interaction with an Aggregate is via the Aggregate Root. As such, an Aggregate Root MUST have a globally unique identifier within the system. Other Entities that are present in the Aggregate but are not Aggregate Roots require only a locally unique identifier, that is, an Id that is unique within the Aggregate.
But then, in this example I can see that an Aggregate Root is implemented by a static class called Transfer that acts as an Aggregate and a static function inside called TransferedRegistered that acts as an AR.
So the questions are:
How can the function be an AR if an AR must have a globally unique identifier and a function, being a function, has none? What does have a globally unique identifier is the Domain Event that this function produces.
Follow-up question: what does an Aggregate Root look like in code? Is it the event? Is it the entity that is returned? Is it the function of the Aggregate class itself?
If the Domain Event that the function returns is the AR (since it is what has the globally unique identifier), then how can we interact with this Aggregate? The first article clearly states that all interaction with an Aggregate goes through the AR; if the AR is an event, then we can do nothing but react to it.
Is it right to say that the aggregate has two main jobs:
Apply the needed changes based on the input it received and rules it knows
Return the needed data to be persisted from AR and/or need to be raised in a Domain Event from the AR
Please correct me on any of the bullet points at the beginning if some or all of them are wrong in some way or another, and feel free to add more if I have missed any!
Thanks for clarifying things!
I feel like I don't fully understand it.
That's not your fault. The literature sucks.
As best I can tell, the core ideas of implementing solutions using domain-driven design came out of the world of Java circa 2003. So the patterns described by Evans in chapters 5 and 6 of the blue book were understood to be object-oriented (in the Java sense) domain modeling done right.
Chapter 6, which discusses the aggregate pattern, is specifically about life cycle management; how do you create new entities in the domain model, how does the application find the right entity to interact with, and so on.
And so we have Factories, that allow you to create instances of domain entities, and Repositories, that provide an abstraction for retrieving a reference to a domain entity.
But there's a third riddle, which is this: what happens when you have some rule in your domain that requires synchronization between two entities in the domain? If you allow applications to talk to the entities in an uncoordinated fashion, then you may end up with inconsistencies in the data.
So the aggregate pattern is an answer to that; we organize the coordinated entities into graphs. With respect to change (and storage), the graph of entities becomes a single unit that the application is allowed to interact with.
The notion of the aggregate root is that the interface between the application and the graph should be one of the members of the graph. So the application shares information with the root entity, and then the root entity shares that information with the other members of the aggregate.
The aggregate root, being the entry point into the aggregate, plays the role of a coarse grained lock, ensuring that all of the changes to the aggregate members happen together.
It's not entirely wrong to think of this as a form of encapsulation -- to the application, the aggregate looks like a single entity (the root), with the rest of the complexity of the aggregate being hidden from view.
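As a rough illustration of the root acting as the single entry point, here is a small Python sketch. The Order/OrderLine example and the MAX_ITEMS rule are my own assumptions, picked only to have an invariant to protect; they are not taken from the blue book.

```python
from typing import List, Tuple


class OrderLine:
    """Child entity: line_no is unique only inside its Order."""

    def __init__(self, line_no: int, sku: str, quantity: int) -> None:
        self.line_no = line_no
        self.sku = sku
        self.quantity = quantity


class Order:
    """Aggregate root: the only object the application talks to."""

    MAX_ITEMS = 10  # hypothetical invariant, chosen for illustration

    def __init__(self, order_id: str) -> None:
        self.order_id = order_id  # globally unique identity
        self._lines: List[OrderLine] = []

    @property
    def lines(self) -> Tuple[OrderLine, ...]:
        # Callers get a tuple, so they cannot add or remove lines
        # behind the root's back.
        return tuple(self._lines)

    def total_quantity(self) -> int:
        return sum(line.quantity for line in self._lines)

    def add_line(self, sku: str, quantity: int) -> None:
        # Every change goes through the root, which checks the
        # aggregate-wide rule before touching any child.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.total_quantity() + quantity > self.MAX_ITEMS:
            raise ValueError("order would exceed MAX_ITEMS")
        self._lines.append(OrderLine(len(self._lines) + 1, sku, quantity))
```

Because add_line is the only way in, the rule that spans all lines is checked in one place, which is the "coarse grained lock" idea in miniature.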
Now, over the past 15 years, there's been some semantic drift; people trying to adapt the pattern in ways that it better fits their problems, or better fits their preferred designs. So you have to exercise some care in designing how to translate the labels that they are using.
In simple terms an aggregate root (AR) is an entity that has a life-cycle of its own. To me this is the most important point. One AR cannot contain another AR but can reference it by Id or by some value object (VO) containing at least the Id of the referenced AR. I tend to prefer to have an AR contain only other VOs instead of entities (YMMV). To this end the AR is responsible for consistency and invariants w.r.t. the AR. Each VO can have its own invariants, such as an EMailAddress requiring a valid e-mail format. Even if one were to call contained classes entities, I would call that semantics, since one could get the same thing done with a VO. A repository is responsible for AR persistence.
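A short Python sketch of those two ideas, a VO that guards its own invariant and an AR that points at another AR by Id only. All class names are hypothetical, and the e-mail pattern is deliberately simplistic; it is not a full address validator.

```python
import re
from dataclasses import dataclass

# Deliberately naive pattern: something@something.something
_EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class EmailAddress:
    """Value object: immutable, compared by value, self-validating."""
    value: str

    def __post_init__(self) -> None:
        if not _EMAIL_PATTERN.match(self.value):
            raise ValueError(f"not a valid e-mail address: {self.value!r}")


@dataclass(frozen=True)
class CustomerId:
    """Value object wrapping the identity of a Customer AR."""
    value: str


class Customer:
    """Aggregate root made up of value objects only."""

    def __init__(self, customer_id: CustomerId, email: EmailAddress) -> None:
        self.customer_id = customer_id
        self.email = email

    def change_email(self, new_email: EmailAddress) -> None:
        # The VO has already validated itself; the AR just swaps it.
        self.email = new_email


class Subscription:
    """A separate AR: it holds the Customer's Id, never the Customer."""

    def __init__(self, subscription_id: str, customer_id: CustomerId) -> None:
        self.subscription_id = subscription_id
        self.customer_id = customer_id
```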
The example implementation you linked to is not something I would do or recommend. I followed some of the comments and I too, as one commenter alluded to, would rather use a domain service to perform something like a Transfer between two accounts. The registration of the transfer is not something that may necessarily be permitted and, as such, the domain service would be required to ensure the validity of the transfer. In fact, the registration of a transfer request would probably be a Journal in an accounting sense as that is my experience. Once the journal is approved it may attempt the actual transfer.
At some point in my DDD journey I thought that there had to be something wrong, since it shouldn't be so difficult to understand aggregates. There are many opinions and interpretations w.r.t. DDD and aggregates, which is why it can get confusing. The other aspect is, IMHO, that there is a fair amount of design involved that requires some creativity and is based on an understanding of the domain itself. Creativity cannot be taught, and design falls into the realm of tacit knowledge. The popular example of tacit knowledge is learning to ride a bike. We can read all we want about how to ride a bike and it may or may not help much. Once we are on the bike and teach ourselves to balance, then we can make progress. Then there are people who end up doing absolutely crazy things on a bike, and even if I read how to, I don't think that I'll try :)
Keep practicing and modelling until it starts to make sense or until you feel comfortable with the model. If I recall correctly Eric Evans mentions in the Blue Book that it may take a couple of designs to get the model closer to what we need.
Keep in mind that Mike Mogosanu is using an event sourcing approach, but in any case (even without ES) his approach is very good at avoiding unwanted artifacts in mainstream OOP languages.
How can the function be an AR if an AR must have a globally unique identifier and a function, being a function, has none? What does have a globally unique identifier is the Domain Event that this function produces.
TransferNumber acts as a natural unique ID; there is also a GUID to avoid the need for a full Value Object in some cases.
There is no unique ID held as state in memory, because it is passed as an argument. But think about it: why do you want a globally unique ID? It is just to locate the root element and its children (whose IDs are not globally unique) for persistence purposes (to find, modify, or delete it).
Order A has 2 order lines (1 and 2) while Order B has 4 order lines (1, 2, 3, 4); the unique identifier of an order line is the composition of its own ID and the Order ID: A1, B3, etc. It is just like a relational schema in a relational database.
So you need that ID just for persistence, and the element that goes to persistence is a domain event expressing the changes, that is, all the changes needed to keep consistency. If you persist the domain event, using the globally unique ID to find what has to be modified in persistence, the system will be in a consistent state.
You could do
var newTransfer = new Transfer(transferNumber); // newTransfer is now an aggregate with a globally unique ID
var changes = newTransfer.RegisterTransfer(debit, credit); // returns the changes to persist
persistence.applyChanges(changes);
But what is the point of instantiating an object, creating state in memory, if you are not going to do more than one thing with it? It is pointless, and most OOP detractors point to this kind of bad OOP design to criticize OOP and lean toward functional programming.
Follow-up question: what does an Aggregate Root look like in code? Is it the event? Is it the entity that is returned? Is it the function of the Aggregate class itself?
It is the function itself. You can read in the post:
AR is a role , and the function is the implementation.
An Aggregate represents a single unit of work, meaning it has to be consistent. You can see how the function honors this. It is a single unit of work that keeps the system in a consistent state.
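For comparison, the function-style version might look roughly like this in Python. This is my own sketch, not Mike Mogosanu's code: the event fields and the two validation rules are assumptions made only so the function has something to enforce.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TransferRegistered:
    """Domain event: carries every change that must be persisted."""
    transfer_number: str  # natural, globally unique ID
    debit_account: str
    credit_account: str
    amount: Decimal


def register_transfer(transfer_number: str, debit_account: str,
                      credit_account: str, amount: Decimal) -> TransferRegistered:
    """Stateless function playing the aggregate-root role.

    It checks the rules (assumed here) and returns the event;
    no object is kept alive in memory afterwards.
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    if debit_account == credit_account:
        raise ValueError("debit and credit accounts must differ")
    return TransferRegistered(transfer_number, debit_account,
                              credit_account, amount)
```

The caller would then hand the returned event to persistence, keyed by transfer_number.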
If the Domain Event that the function returns is the AR (since it is what has the globally unique identifier), then how can we interact with this Aggregate? The first article clearly states that all interaction with an Aggregate goes through the AR; if the AR is an event, then we can do nothing but react to it.
Answered above because the domain event is not the AR.
Is it right to say that the aggregate has two main jobs: apply the needed changes based on the input it received and the rules it knows; return the needed data to be persisted from the AR and/or raised in a Domain Event from the AR?
Yes; again, you can see how the static function honors this.
You could try to contact Mike Mogosanu. I am sure he could explain his approach better than I can.
I'm struggling with how/if to define "a set of aggregates". Aggregates are supposed to be stand alone and isolated but it's easy to think of a bigger set of aggregates that belong together. But is this a trap?
Using this "set of aggregates" it would be possible to for instance enumerate and index aggregates on a unique property within the set and have other domain rules that could be validated across all aggregates in the set. It's tempting but also feels a bit wrong.
Another approach would be to avoid this thinking completely: not allow/define a set of aggregates, not allow enumerating aggregates, and only load/save by aggregate-id. With this option it would be necessary to reference aggregates from other aggregates and, by doing so, build up an interconnected graph of aggregates.
The two approaches are similar to having aggregates in a folder on disk versus having an "internet" of aggregates where the references between them define the bigger set. In any case I'm really stuck on this problem. I have never read anything about this, and I guess nobody really cares that much? I'm not sure I am explaining this very well, but my question is whether there is any definition of the "set of aggregates", or whether we should think of aggregates as totally isolated, each on its own with only a unique aggregate-id (UUID).
The set of aggregates could for instance be the database being used under the surface. But what I'm wondering is if this database as in the information about what aggregates it contains has any definition in DDD or if we should think about a set of aggregates as an interconnected graph where only traversal of this graph can be used to enumerate all "associated" aggregates.
Aggregates are connected
In any application with sufficient complexity, Aggregates end up referencing one-another. And it is perfectly reasonable to use their unique identifiers as reference IDs to refer to each other.
But take care to load and persist aggregates outside the domain layer, typically in repositories. If you want to traverse links across aggregates and load them into memory, you will be doing that upfront before handing over control to the domain layer for the actual processing.
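A minimal Python sketch of that split, using plain dictionaries as stand-in repositories. The Order/Customer pair and the credit-limit rule are hypothetical, chosen only to have one aggregate that needs data from another.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Customer:
    customer_id: str
    credit_limit: int


@dataclass
class Order:
    order_id: str
    customer_id: str  # link to the other aggregate, by ID
    total: int
    approved: bool = False

    def approve(self, customer: Customer) -> None:
        # Domain layer: works only with objects it was handed.
        if customer.customer_id != self.customer_id:
            raise ValueError("wrong customer for this order")
        if self.total > customer.credit_limit:
            raise ValueError("order exceeds the customer's credit limit")
        self.approved = True


def approve_order(orders: Dict[str, Order], customers: Dict[str, Customer],
                  order_id: str) -> Order:
    """Application layer: follows the link and loads everything upfront."""
    order = orders[order_id]
    customer = customers[order.customer_id]  # traverse the reference here
    order.approve(customer)                  # then hand over to the domain
    orders[order.order_id] = order           # persist the changed aggregate
    return order
```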
Traversing the graph to get all related aggregates is correct, but this rarely spans across too many aggregate boundaries. You rarely find a single change or rule to be applied throughout the application. If you do have such a transaction, it is probably a sign of the domain design needing improvement, simply because you are spreading one responsibility/change amongst many aggregates.
The connectivity is so usual that you should watch out for aggregates that have no linkages with the rest of the system. They are either standalone libraries, or they probably belong to a different bounded context.
Aggregates can morph into different forms
They are aggregates because they form a clear invariant boundary, with their primary responsibility being to enforce invariants across state changes for all the entities within themselves. But they can morph into different kinds of DDD objects based on the requirement.
A good example is of a single Currency note. In most applications, they are value objects. But for the federal bank, they are aggregates with clear cut invariant rules. They are aggregates when they are created and referenced, but in a transaction that ships printed notes to banks, they may become value objects.
So you may have to evaluate whether you are talking about a domain entity in its aggregate form, or as a value object when you consider each linkage.
Aggregates are invariant boundaries
It is wrong to validate domain rules across aggregates.
Your aggregate boundary is an invariant boundary, meaning all the domain rules within it should be satisfied at all time. By that logic, you are going to incorrectly build up a structure that will need to ensure that all domain rules across aggregates are valid at all time. Doing so will impose considerable performance burden, not to mention the complexity in business logic.
But this is not to say that there cannot be domain rules that span aggregates. The correct way to handle them is eventual consistency and an event-driven approach.
The primary changing aggregate would validate and persist the data, and bubble up an event containing the state change. Other aggregates would then act on the event and bring themselves up-to-date. If an aggregate's domain rules break because of the change, there is usually a supplementary mechanism that allows correction of the problem (a preferred mechanism) or a rollback of the first state change (happens very rarely).
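Here is one way that flow could look, sketched in Python with an in-memory queue standing in for a real message bus. The Order/Stock aggregates, the event names, and the run dispatcher are all hypothetical; the corrective step shown is a compensating "reject" on the first aggregate.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    quantity: int


@dataclass(frozen=True)
class ReservationFailed:
    order_id: str


class Order:
    def __init__(self, order_id: str, quantity: int) -> None:
        self.order_id = order_id
        self.quantity = quantity
        self.status = "new"

    def place(self) -> OrderPlaced:
        # Primary aggregate: change own state, then announce it.
        self.status = "placed"
        return OrderPlaced(self.order_id, self.quantity)

    def reject(self) -> None:
        # Compensation applied later if another aggregate objects.
        self.status = "rejected"


class Stock:
    def __init__(self, available: int) -> None:
        self.available = available

    def on_order_placed(self, event: OrderPlaced) -> Optional[ReservationFailed]:
        # Second aggregate brings itself up to date, or reports a problem.
        if event.quantity > self.available:
            return ReservationFailed(event.order_id)
        self.available -= event.quantity
        return None


def run(queue: Deque[object], orders: Dict[str, Order], stock: Stock) -> None:
    """Drain the queue, routing each event to the aggregate that reacts."""
    while queue:
        event = queue.popleft()
        if isinstance(event, OrderPlaced):
            follow_up = stock.on_order_placed(event)
            if follow_up is not None:
                queue.append(follow_up)
        elif isinstance(event, ReservationFailed):
            orders[event.order_id].reject()
```

Between place() and the queue being drained, the two aggregates disagree; that window is the eventual-consistency part.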
Perhaps you can find a common denominator that the sets of aggregates share and work with that?
A simplified example: there is a set of Books and a set of Users that have nothing in common, except that you want to know when each was first registered. One option is to have an interface FirstRegistration; you can then choose either to extend Books/Users with it or to create a specific entity instead.
I'm struggling with how/if to define "a set of aggregates". Aggregates are supposed to be stand alone and isolated but it's easy to think of a bigger set of aggregates that belong together. But is this a trap?
I think you're struggling because indeed the idea of a set of aggregates (instances) is very generic, and the uses of such things are contextual and domain-specific. People don't talk specifically about it because of course you may have behaviors that operate on a collection of multiple aggregates, but that doesn't give such collections any particular common properties or requirements that would allow you, from a general DDD perspective, to characterize such collections more specifically than "a set of aggregates", "a list of distinct aggregates", or similar.
Using this "set of aggregates" it would be possible to for instance enumerate and index aggregates on a unique property within the set and have other domain rules that could be validated across all aggregates in the set. It's tempting but also feels a bit wrong.
Tempting why? You've couched the question in very abstract terms, so it's pretty much impossible to contradict you about the "it would be possible", but just because something may be possible doesn't mean it would be useful. In practice, I think you'll find that rules or behaviors that operate on collections of aggregates most naturally belong not to collections of aggregates in an abstract sense, but rather to other aggregate types in your domain model, to domain repositories, or to domain services.
It is entirely plausible that your domain model might want to handle particular sets of aggregates characterized by some rule. For example, if you're an airline, then one of the aggregates in your domain model might be a single seat on a flight, since that's the unit you sell. It makes sense in that case that there would be operations on all the seats on a particular flight, for example, but whatever rules and behaviors you might have about that are specifically about that kind of aggregate, selected in that particular way.
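To sketch the airline example in Python: each Seat is its own aggregate, and the "set" only shows up as a repository query plus a small domain service. The names (Seat, SeatRepository, for_flight, seats_remaining) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Seat:
    """One aggregate per seat, since a seat is the unit being sold."""
    flight_no: str
    seat_no: str
    sold: bool = False

    def sell(self) -> None:
        if self.sold:
            raise ValueError(f"seat {self.seat_no} is already sold")
        self.sold = True


class SeatRepository:
    """In-memory stand-in; the flight query defines the 'set'."""

    def __init__(self, seats: Iterable[Seat]) -> None:
        self._seats: Dict[Tuple[str, str], Seat] = {
            (s.flight_no, s.seat_no): s for s in seats
        }

    def get(self, flight_no: str, seat_no: str) -> Seat:
        return self._seats[(flight_no, seat_no)]

    def for_flight(self, flight_no: str) -> List[Seat]:
        return [s for s in self._seats.values() if s.flight_no == flight_no]


def seats_remaining(repo: SeatRepository, flight_no: str) -> int:
    """Domain service operating on all seats of one flight."""
    return sum(1 for seat in repo.for_flight(flight_no) if not seat.sold)
```

Nothing here needed a general notion of "set of aggregates"; the selection rule (same flight) is specific to this model.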
Another approach would be to avoid this thinking completely and not allow/define a set of aggregates and not allow enumerating aggregates but only load/save on aggregate-id.
It's surely counterproductive to forbid working with sets of aggregates. Just don't attribute more significance to it than is warranted. There is nothing particularly special about sets of aggregates in general.
Using this option it would be necessary to reference aggregates from other aggregates and by doing this build up an interconnected graph of aggregates.
I don't follow that. One certainly must be able to retrieve and store individual aggregates from persistence, as that's more or less the defining property of aggregates -- they are the unit of persistence. But that doesn't mean that you must reject the ability to work with collections of aggregates. However, sets of aggregates do not have identity in the same way that individual aggregates do, so yes, relationships between aggregates need to be modeled in terms of individual aggregates. Nevertheless, that does not inherently preclude 1:m or n:m relationships among aggregates.
I'm really stuck on this problem. I have never read anywhere about this and I guess nobody really cares that much?
You'll find all sorts of uses of various sets of aggregates in applications built and maintained based on DDD ideas, but there's not much to talk about at the level of abstraction of your question, and what there is is already summed up in the words "set" and "aggregate".
The set of aggregates could for instance be the database being used under the surface. But what I'm wondering is if this database as in the information about what aggregates it contains has any definition in DDD
Not to my knowledge. I suspect most DDD practitioners would just call it "the data", or something similar.
or if we should think about a set of aggregates as an interconnected graph where only traversal of this graph can be used to enumerate all "associated" aggregates.
I'm still not seeing why you set that up as a thing. Sure, depending on the domain model, you might be able to traverse all or substantial chunks of the data by traversing associations between aggregates, and that might be appropriate for some purposes, but DDD doesn't have to give a special name or special rules for sets of aggregates for you to work with them.
Like any useful methodology, DDD exists to solve problems. Its bread and butter is complex applications with complex data and evolving requirements. It is not to be interpreted as a straitjacket preventing designers and developers from (thoughtfully) writing designs and code that incorporate aspects of other design approaches, much less designs and code that provide for the application's idiosyncratic needs.
Somewhere far, far away in a domain galaxy there is mention of
'Measurement values' and 'Places'
Each 'Measurement value' comes from/belongs to a specific 'Place'
Each 'Measurement value' is registered on a given date & time and of a given, specific type (eg. waterflow, wind, etc)
Each 'Place' has a name and a collection of 'Measurement Values' that gets registered
Given my current model where 'Places' are the aggregate root that holds 'Measurement values' I have a dilemma:
Users wish to view one type of measurement value at a time, and there are quite a lot of measurement values.
Loading all measurement values when only some of them are needed seems unnecessary.
E.g. I'm stuck on how to organize/model the need "Show me waterflows (measurement values) in River X (Place) between time A and B".
Is it allowed to instantiate the River X aggregate root only partially loaded, with just the type of measurement values relevant to a given use case?
Are there other ways of modelling measurement values and their origin?
Please let me know your thoughts...
I think that your aggregate is consistent as it is. Your dilemma has nothing to do with the domain model; it is a presentation model concern.
I would consider writing each measurement out to a NoSQL instance; that way your presentation layer could filter and run any query without affecting the consistency of the domain layer.
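A rough Python sketch of such a presentation-side query, assuming the measurements have been copied into flat read-model rows. The MeasurementRow shape and the measurements_for function are hypothetical, and a plain list stands in for whatever store is actually used.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class MeasurementRow:
    """Flat read-model record; not part of the Place aggregate."""
    place: str
    kind: str  # e.g. "waterflow", "wind"
    taken_at: datetime
    value: float


def measurements_for(rows: Iterable[MeasurementRow], place: str, kind: str,
                     start: datetime, end: datetime) -> List[MeasurementRow]:
    """'Show me <kind> in <place> between <start> and <end>' (inclusive)."""
    selected = [
        row for row in rows
        if row.place == place and row.kind == kind
        and start <= row.taken_at <= end
    ]
    return sorted(selected, key=lambda row: row.taken_at)
```

The Place aggregate is never loaded for this; it stays in charge of registering new values only.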
Correct me if I'm wrong, but it sounds very much like the data model and storage are impacting the design of your system. If so, that may be the cause of your dilemma. A key benefit of modeling with aggregates is that the model is free of dependencies such as databases and data models. There is no direct 'view' of an aggregate, so it is not shaped by the view. This makes aggregates much easier to design: they are much more focused on solving the problem, and are therefore great candidates for doing complex stuff.
If it turns out you don't need aggregates to model your domain, you can just focus on an efficient storage and retrieval mechanism.
In other words...
Don't tie yourself up in knots doing DDD if you don't need to.
If it helps I created an infographic on common DDD mistakes. You may find it helpful. You can find it here.
By the way, I think DDD is a great way to go, but only if your domain warrants it. Apologies if I have misunderstood you.
I fail to see the real problem. You said that each Measurement is tied to a specific Place, so you don't have to load all Measurements.
With the correct data layer configuration, you can load the required Measurements by selecting/loading/instantiating only their parent (Place).
I've always avoided using aggregation because it seems so subjective which one-to-many relationships should be classed as aggregations. But I'm reviewing a model produced by someone else in which aggregations are used for many-to-many relationships (as in: a course consists of several modules, a module may be part of several courses). That strikes me as plain wrong, but I can't find a definitive rule against it. What's the official ruling?
Two things:
Are shared aggregations allowed? According to the UML spec, yes.
Is it useful in practice? Generally I'd say no.
I am not a fan of the UML Aggregation relationship. Whilst ownership is intuitively appealing, it is too subjective practically. I don't use it, and generally don't recommend it be used (although see footnote). Instead, focus on the important questions:
What's the cardinality?
What's the create/delete behaviour?
Why does the relationship exist? (i.e. what business fact/rule is the relationship capturing?)
All of the above can be done with straight associations. If the answer is (a) it's one to many, (b) the 'one' end is responsible for creating/deleting the 'many' end and (c) you really want to, then use the Composite association. Aggregation, however, doesn't generally improve readability of the model; it adds confusion and detracts from surfacing the underlying domain rules/requirements.
hth.
footnote: there is one scenario where Aggregation does have well-defined semantics and can be useful. Specifically, if you have a recursive relationship, Aggregation says the resultant object structure is acyclic (i.e. a DAG). Downside is relatively few people realise that property - certainly not business domain experts. So you typically have to highlight anyway, e.g. in a comment / constraint.
A good website for this is
http://www.uml-diagrams.org/class-diagrams.html
If you search there for "Shared and Composite Aggregation" you will read that shared parts can be modeled as aggregations. Even if the composite holding the part is discarded, the parts are allowed to survive.
This seems to make many to many relationships possible. For example sharing a part of a view for several view-components. Why not...
Personally this matches my understanding, that UML is very interpretative.
Let's set the terms. Aggregation is a metaterm in the UML standard, and means BOTH composition and shared aggregation (simply called "shared"). Too often the shared kind is incorrectly called just "aggregation". That is BAD, because composition is an aggregation, too. As I understand it, you mean "shared".
Again, if we look at the UML standards (look for the Superstructure documentation there), we find that "Precise semantics of shared aggregation varies by application area and modeler." So ANY strategy you choose is acceptable, and you can even use a different strategy for different projects.
But shared aggregation IS useful and CAN be used with multiplicities on both sides, and the empty diamond can even be on both sides.
An association in UML is an abstraction that can be realized in any language and in any way; the only requirement is that the realization conforms to the diagram.
Such association, as on the diagram, can be realized as following:
Every instance of Student has a list of courses he is registered to, and every instance of Courses has a list of registered students/participants.
But this is not the only way of realization. There could be arrays instead of lists, and even somebody can make it without any normal collection at all - simply using the addresses in memory in C++.
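The list-based realization could be sketched like this in Python. This is just one possible realization under my own naming (Student, Course, register); the diagram would allow others.

```python
from typing import List


class Student:
    def __init__(self, name: str) -> None:
        self.name = name
        self.courses: List["Course"] = []   # courses this student attends


class Course:
    def __init__(self, title: str) -> None:
        self.title = title
        self.students: List[Student] = []   # students registered here

    def register(self, student: Student) -> None:
        # Update both ends together so the two lists never disagree.
        if student not in self.students:
            self.students.append(student)
            student.courses.append(self)
```

A single association on the diagram covers both lists; how they are kept in sync is left to the coder.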
Of course, we could draw two associations, one for student's list of courses and the second for courses' list of students. But thus:
our diagram becomes more complex and, therefore, less readable.
we are describing thoroughly the elementary things that any coder will do anyway in 99% cases.
we are limiting the freedom of coders. And in 1% of cases they'll have to choose between not following the diagram and not coding effectively. It is simply not your job.
So, do as you wish. The only things forbidden are changing the strategy in the middle of a project and FORBIDDING others to use their own strategy.
DDD states that you should only ever access entities through their aggregate root. So say for instance that you have an aggregate root X which potentially has a lot of child Y entities. Now, for some scenario, you only really care about a subset of these Y entities at a time (maybe you're displaying them in a paged list or whatever).
Is it OK to implement a repository then, so that in such scenarios it returns an incomplete aggregate? I.e. an X object whose Ys collection contains only the Y instances we're interested in and not all of them? This could, for instance, cause methods on X that perform some calculation involving the Ys to not behave as expected.
Is this perhaps an indication that the Y entity in question should be promoted to an aggregate root?
My current idea (in C#) is to leverage the deferred execution of LINQ, so that my X object has an IQueryable to represent its relationship with Y. This way, I can have transparent lazy loading with filtering... But getting this to work with an ORM (LINQ to SQL in my case) might be a bit tricky.
Any other clever ideas?
I consider an aggregate root with a lot of child entities to be a code smell, or a DDD smell if you will. :-) Generally I look at two options.
Split your aggregate into many smaller aggregates. This means that my original design was not optimal and I need to identify some new entities.
Split your domain into multiple bounded contexts. This means that there are specific sets of scenarios that use a common subset of the entities in the aggregate, while there are other sets of scenarios that use a different subset.
Jimmy Nilsson hints in his book that instead of reading a complete aggregate you can read a snapshot of parts of it. But you are not supposed to be able to save changes in the snapshot classes to the database.
Jimmy Nilsson's book Chapter 6: Preparing for infrastructure - Querying. Page 226.
Snapshot pattern
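One way to read that hint, sketched in Python under stated assumptions: the snapshot is an immutable, read-only view of a few Y rows, with no path back to the database. The names `YSnapshot` and `page_of_ys` are hypothetical and are not taken from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class YSnapshot:
    y_id: int
    label: str


def page_of_ys(rows: list[tuple[int, str]], page: int, page_size: int) -> list[YSnapshot]:
    # 'rows' stands in for whatever the query returned; only one page is materialized.
    start = page * page_size
    return [YSnapshot(y_id, label) for y_id, label in rows[start:start + page_size]]
```

Because the snapshot class has no behaviour and cannot be modified, it is unlikely to be mistaken for the real aggregate when it is time to save changes.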
You're really asking two overlapping questions.
The title and first half of your question are philosophical/theoretical. I think the reason for accessing entities only through their "aggregate root" is to abstract away the kinds of implementation details you're describing. Access through the aggregate root is a way to reduce complexity by having a trusted point of access. You're eliminating friction/ambiguity/uncertainty by adhering to a convention. It doesn't matter how it's implemented within the root; you just know that when you ask for an entity it will be there. I don't think this perspective rules out a "filtered repository" as you describe. But to provide a pit of success for devs to fall into, it should be impossible to instantiate the repository without being explicit about its "filteredness". Likewise, if shared access to a repository instance is possible, the "filteredness" should be explicit in the caller's code.
The second half of your question is about implementation on a specific platform. I'm not sure why you mention delayed execution; I think that's really orthogonal to the filtering question. The filtering itself could be a bit tricky to implement with LINQ. Maybe, rather than inlining the Where lambdas, you could set up a collection of them and select one depending on the filter you need.
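The "collection of lambdas" idea can be sketched roughly as follows. This is Python rather than C#/LINQ, and the `Y` fields, the `FILTERS` table, and `find_ys` are all assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Y:
    id: int
    active: bool


# Hypothetical named filters: callers pick one by name instead of inlining a lambda.
FILTERS: dict[str, Callable[[Y], bool]] = {
    "all": lambda y: True,
    "active": lambda y: y.active,
    "inactive": lambda y: not y.active,
}


def find_ys(ys: list[Y], filter_name: str) -> list[Y]:
    predicate = FILTERS[filter_name]  # raises KeyError for an unknown filter name
    return [y for y in ys if predicate(y)]
```

Naming the filter at the call site, e.g. `find_ys(ys, "active")`, also keeps the "filteredness" explicit for the caller.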
You are allowed to, since the code will compile anyway, but if you're going for a pure DDD design you should not have incomplete instances of objects.
You should look into lazy loading if you're afraid of loading a huge object of which you will only use a small portion of the child entities.
Lazy loading delays the loading of whatever you decide to lazy-load until the moment it is accessed. It makes use of callbacks that invoke the loading method once the code asks for the data.
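A minimal sketch of that callback mechanism, in Python and without any ORM (the names `LazyChildren`, `load_children`, and `calls` are hypothetical): the loader runs on first access only, and its result is cached afterwards.

```python
from typing import Callable, Optional


class LazyChildren:
    def __init__(self, loader: Callable[[], list[str]]) -> None:
        self._loader = loader  # callback that performs the real load
        self._items: Optional[list[str]] = None  # nothing loaded yet

    @property
    def items(self) -> list[str]:
        if self._items is None:  # first access triggers the load
            self._items = self._loader()
        return self._items


calls: list[str] = []


def load_children() -> list[str]:
    calls.append("load")  # record each real load, standing in for a database query
    return ["y1", "y2"]


children = LazyChildren(load_children)
```

Creating `children` costs nothing; the load happens when `children.items` is first read. Real ORMs typically implement the same idea with generated proxy classes.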
"Is it OK to implement a repository then, so that in such scenarios it returns an incomplete aggregate?"
Not at all. An aggregate is a transactional boundary for changing the state of your system. Never use aggregates for querying data. Split the system into write and read sides (read about CQS and CQRS). When we think in "CRUD" terms, we build the system around some resource. Let's say you have an "Appointment" aggregate. Thinking "CRUD-ish" means we implement the use cases Create, Update, Delete, and GetAll appointments, so GetAll returns Appointment[]. When you think in terms of use cases (hexagonal architecture), your use cases would be ScheduleAppointment, RescheduleAppointment, and CancelAppointment. The query side, however, can be something like /myCalendar: we return all appointments for a specific user in a ClientCalendar object. Create separate DTOs for the query side. Never use aggregates for this purpose.
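The split might look roughly like this in Python. It is a sketch under stated assumptions: the row layout, the field names, and the function `my_calendar` are invented for illustration, and only the names Appointment and ClientCalendar come from the answer above.

```python
from dataclasses import dataclass
from datetime import datetime


class Appointment:
    """Write side: the aggregate guards state changes."""

    def __init__(self, appointment_id: str, user_id: str, starts_at: datetime) -> None:
        self.appointment_id = appointment_id
        self.user_id = user_id
        self.starts_at = starts_at
        self.cancelled = False

    def reschedule(self, new_start: datetime) -> None:
        if self.cancelled:
            raise ValueError("cannot reschedule a cancelled appointment")
        self.starts_at = new_start

    def cancel(self) -> None:
        self.cancelled = True


@dataclass(frozen=True)
class CalendarEntry:
    appointment_id: str
    starts_at: datetime


@dataclass(frozen=True)
class ClientCalendar:
    """Read side: a plain DTO with no behaviour."""

    user_id: str
    entries: tuple[CalendarEntry, ...]


def my_calendar(user_id: str, rows: list[dict]) -> ClientCalendar:
    # Built straight from stored rows; no Appointment aggregate is loaded.
    entries = tuple(
        CalendarEntry(r["appointment_id"], r["starts_at"])
        for r in rows
        if r["user_id"] == user_id and not r["cancelled"]
    )
    return ClientCalendar(user_id, entries)
```

Commands such as RescheduleAppointment would load one `Appointment` and call its methods, while a /myCalendar query would call `my_calendar` against whatever read store is available.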