Axon Framework: send command on aggregate load - domain-driven-design

We're building a microservices system with Axon Framework 4.1. In our domain, we have a label concept where we can attach labels to other entities. While labels are normally created and managed by the user, some of these labels are "special" and need to be hard-coded, but they need to be present in the event stream as well.
We have a bunch of aggregates that represent entities that can be labeled with these labels. Some of these aggregates will be used frequently, while others might be used infrequently or are even abandoned by the user.
Sometimes we come up with new special labels. We add them to the code, and then we also need to add them to the event stream. What is a good way to do that?
We can create a special command that we need to send when the updated service is started for the first time. It goes through all the labels and adds the ones that aren't in the event stream yet. This has two disadvantages. First, we need to actually send that command, which either requires us to not forget it, or to add some infrastructure for it outside of the code (e.g., in our build pipeline). Also, other services could have booted up faster with the new labels and started sending commands before we fired our special command. The other disadvantage is that this command will target all aggregates, including the abandoned ones, which could be wasteful of resources and be confusing to end users who might see activity in a document they thought was abandoned.
Ideally, we would like to be able to send the command when Axon has just loaded the aggregate. That way we would be certain that the labels are only introduced in aggregates that are actually used. Also, we could wire this up in code and it wouldn't require us to add infrastructure outside of the application and/or remember to do it manually.
Unfortunately, this feature doesn't seem to exist in Axon (yet) 😉.
Are there other (better) ways to achieve this?

I've got an idea which might help you out on this.
If I understand the use case correctly, the "Label" in your system, which users can introduce themselves but for which a couple of hard-coded versions also exist, is an Aggregate.
Based on that assumption, I suggest being smart with the Aggregate Identifier you are using.
The sole thing that Axon expects from you is that the Aggregate Identifier is (or can be made into) a String. Typically a UUID is used for the Aggregate Identifier, which is a reasonable starting point.
You can however wrap this UUID in a typed-id object. Taking your "Label" Aggregate, that would call for a LabelId.
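For illustration, a minimal sketch of such a typed id; the exact shape of the class is an assumption, not something Axon prescribes:

```java
import java.util.UUID;

// A minimal typed-id wrapper; the shape is illustrative. Axon only
// requires that the identifier can be represented as a String.
public final class LabelId {
    private final String value;

    public LabelId() {
        this.value = UUID.randomUUID().toString();
    }

    public LabelId(String value) {
        this.value = value;
    }

    @Override
    public String toString() {
        return value; // the String form serves as the Aggregate Identifier
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LabelId && value.equals(((LabelId) o).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}
```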
That said, let's first go back to verifying whether a given "Label" Aggregate exists within the Event Stream.
The concern you have is rather valid, I think; reading the entire Event Stream to figure out whether a given Aggregate instance exists is too big of a hassle.
However, the EventStore can be queried through two mechanisms:
The Event Stream from a given point in time (e.g. what the TrackingToken mechanism does).
The Event Stream for a given Aggregate instance, based on the Aggregate Identifier.
It's the second option that is far better suited to your scenario.
Just query the EventStore for a given "Label" Aggregate's Identifier. If you receive a non-empty Event Stream, you know it already exists.
Vice versa, if no Events are found, you are certain it's a new "Label" that needs to be introduced.
The crux here is in knowing the "Label's" Aggregate Identifier up front, which circles back to storing the Aggregate Identifier as a String inside a typed LabelId. What you could do is deviate in the LabelId object between a custom "Label" (I'd opt for a UUID here) and a hard-coded "Label".
For the latter, you could for example have the label-name, plus a UUID/counter if desired.
Doing so will ensure that all the Events published from a hard-coded "Label" will have an Aggregate Identifier you can anticipate during start-up.
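To make that concrete, here is a minimal sketch of such a start-up check, assuming Spring Boot and Axon 4; CreateLabelCommand, the identifier format, and the label names are illustrative assumptions:

```java
import org.axonframework.commandhandling.gateway.CommandGateway;
import org.axonframework.eventsourcing.eventstore.EventStore;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

import java.util.List;

// Runs once on start-up; introduces any hard-coded Label that does not
// have an Event Stream yet. CreateLabelCommand is an assumed command type.
@Component
public class HardCodedLabelInitializer {

    private static final List<String> HARD_CODED_LABELS = List.of("urgent", "archived");

    private final EventStore eventStore;
    private final CommandGateway commandGateway;

    public HardCodedLabelInitializer(EventStore eventStore, CommandGateway commandGateway) {
        this.eventStore = eventStore;
        this.commandGateway = commandGateway;
    }

    @EventListener(ApplicationReadyEvent.class)
    public void initializeHardCodedLabels() {
        for (String name : HARD_CODED_LABELS) {
            String labelId = "hard-coded-label-" + name; // the anticipated identifier
            // An empty Event Stream means this Label was never introduced.
            if (!eventStore.readEvents(labelId).hasNext()) {
                commandGateway.sendAndWait(new CreateLabelCommand(labelId, name));
            }
        }
    }
}
```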
Hope this is clear and all, if not, please comment on my response below.

Related

How can I design a bridge from a legacy CRUD oriented app to a CQRS and Event sourcing system?

I was asked to implement CQRS/Event sourcing patterns into a legacy web application, in order to prepare to migrate it from a monolithic/state oriented model to a distributed, service oriented app.
I have some questions on how I can design a Domain-oriented code bundle that would connect the legacy entities, strongly coupled to the database, with a new Event-sourced model.
The first things I did were:
writing a small "framework" for CQRS/ES, with classes like AggregateRoot, DomainEvent, Command, Handlers, Messaging, Eventstore, AggregateIds, etc.
trying to group and "migrate" the legacy Entities into some Aggregates, reconstructing all the history and states of the app into EventSourced Aggregates
plugging some Command dispatching into the old controllers, in order to let the app work as is but also feed the new CQRS/ES system on the side.
The context:
The legacy app contains several entities, mapped to the database, that hold the model layer. (Our domain is Human Resources / manpower.)
Let's say we have those existing entities:
Worker, with various fields and related entities (OneToOne, OneToMany), like:
    name
    address (1-1)
    competences (1-N)
Society, in which the worker works, with various fields and related entities (OneToOne, OneToMany), like:
    name
    address (1-1)
    hours
Contract, with various fields and related entities (OneToOne, OneToMany), like:
    address (1-1)
    Worker (1-1)
    Society (1-1)
    documents (1-N)
    days (1-N)
    hours
    etc.
From this legacy model, I designed a MissionAggregate that holds:
A db independent ID, like UUID
some Value objects: address, days (they were entities in the legacy model; they became VOs here)
I also designed a WorkerAggregate and a SocietyAggregate, with fields and UUIDS, and in the MissionAggregate I added:
a reference to WorkerAggregate's UUID
a reference to SocietyAggregate's UUID
As I said earlier, my aim is to leave the legacy app as is, but to introduce into the CRUD controllers' methods some calls that dispatch Commands to the new CQRS system.
For example:
After flushing a newly created Contract to the database, I want to dispatch a "CreateMissionCommand" to the new command bus.
It targets the appropriate Command Handler, that handles all the command's data, passes it to a newly created Aggregate with a new UUID and stores "MissionCreatedDomainEvent" in the EventStore.
The DomainEvent is indexed with an AggregateId, a playhead, and has a payload which contains the fields necessary to be applied to and build the MissionAggregate.
The newly created Contract now goes through its usual lifecycle, with all the updates that the legacy app performs on it. But I also need to reflect all those changes in the corresponding EventSourced Aggregate, so every time there is a flush to the database in the app, I dispatch a Command that translates the "CRUD-like operations" of the legacy app into a Domain-oriented/Command-oriented pattern.
To sum up, the workflow is:
A Crud legacy operation occurs and flushes some changes on the Contract Entity
With just one line of code in the controller, I dispatch a command built with the necessary fields (the AggregateId of the MissionAggregate... that I need to have stored somewhere... see the problems below) to the Domain command bus, so that the impact on the existing code base is very low.
The bus passes the command to the corresponding command handler
The handler loads the aggregate and applies the changes by calling the appropriate Aggregate method
Then, after some validation, the aggregate raises and stores the appropriate event (a sketch of the creation case follows below)
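For illustration, a compact sketch of the creation case described earlier (CreateMissionCommand, new aggregate, MissionCreatedDomainEvent); the types stand in for whatever my own small framework provides:

```java
// Illustrative sketch; EventStore, MissionAggregate and the command/event
// classes stand in for the small hand-written CQRS/ES framework.
public class CreateMissionCommandHandler {

    private final EventStore eventStore;

    public CreateMissionCommandHandler(EventStore eventStore) {
        this.eventStore = eventStore;
    }

    public void handle(CreateMissionCommand command) {
        // A new aggregate gets a db-independent UUID as its identity.
        MissionAggregate mission = MissionAggregate.create(
                command.getMissionId(),
                command.getAddress(),
                command.getWorkerId(),
                command.getSocietyId());
        // create() raised a MissionCreatedDomainEvent internally; append it
        // to the store, indexed by AggregateId and playhead.
        eventStore.append(mission.getId(), mission.getUncommittedEvents());
    }
}
```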
My problems and questions (some of them at least) are:
I feel like I am rewriting big portions of the legacy app, with the same kind of relations between the Aggregates that I have between the Entities, and with the same type of validations, checks, etc.
Having references to both the WorkerAggregate and SocietyAggregate UUIDs in MissionAggregate implies that I have to build those aggregates too (hence dispatch commands from the legacy app when the Worker and Society entities are flushed). Can't I have only references to the Worker's entity id and the Society's entity id?
How can I avoid an eternally growing MissionAggregate? The Contract Entity is quite huge; it has a lot of fields that are constantly updated (hours, days, documents, etc.). If I want to store all those events, I need a large MissionAggregate to reflect all those changes, and so I need tons of CommandHandlers that react to all the Commands of add, update, etc. that I am going to dispatch from the legacy app.
How "free" is an Aggregate from the Root entity it is supposed to refer to? For example, a Contract Entity needs to relate somewhere to its related Mission Aggregate, for example when I want to dispatch a Command from the app just after the legacy code has flushed something on the Entity. Where do I store this relation? In the Entity itself, in an AggregateId field? In the Aggregate, should I have a ContractId field? Or should I have some kind of Mapping Table somewhere that holds the relationship between the Contract ID and the MissionAggregate ID?
What to do with the past? Should I migrate all the existing data through a script that generates Aggregates and events on all the historical data?
Thanks in advance for your time.
You have a huge task ahead of you, let's try to break it down.
It's best to build this new part of the system in isolation from the legacy codebase, otherwise you're going to have your hands tied at every turn of the way.
Create a separate layer in your project for these new requirements. We're going to call it "bubble" from now on. This bubble will be like a greenfield project, with its own structure, dependencies, etc. There will be no direct communication between the bubble and the legacy; communication will happen through another dedicated translation layer, which we'll call "Anti-Corruption Layer" (ACL).
ACL
It is like an API between two systems.
It translates calls from the bubble to the legacy and vice-versa. Its purpose is to prevent one system from corrupting or influencing the other. This way you can keep building/maintaining each system independently from each other.
At the same time, the ACL allows one system to consume the other, and reuse logic, validations, rules, etc.
To answer your questions directly:
I feel like I am rewriting big portions of the legacy app, with the same kind of relations between the Aggregates that I have between the Entities, and with the same type of validations, checks, etc.
With the ACL, you can resort to calling validations and reuse implementations from the legacy code. This will allow you time to rewrite things as needed or as possible.
You may not need to rewrite the entire system, though. If your goal is to implement CQRS and Event Sourcing and you can achieve this goal by keeping most or part of the legacy system, I would say do it. Unless, of course, one of the goals is to completely replace the old system. Otherwise, keep it; write as little code as possible.
Suggested workflow:
Keep the CQRS and Event Sourcing system in the bubble
Do not bring these new frameworks into legacy
Make the legacy Controller issue method calls to the ACL
The ACL will convert these calls into Commands and dispatch them
Any events will be caught by your Event Sourcing framework
Results will be persisted to the bubble's database
The bubble's database can be a different schema in the same database or can be a different database altogether. But you'll have to think about synchronization, and that's a topic of its own. To reduce complexity, I recommend a different schema in the same database.
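To make the workflow above concrete, here is a minimal sketch; MissionAcl, CommandBus, Contract and CreateMissionCommand are illustrative names, not an existing API:

```java
import java.util.UUID;

// The Anti-Corruption Layer: called from the legacy controller right after
// the entity is flushed; translates the CRUD result into a domain command.
// All types except java.util.UUID are illustrative assumptions.
public class MissionAcl {

    private final CommandBus commandBus; // the bubble's command bus

    public MissionAcl(CommandBus commandBus) {
        this.commandBus = commandBus;
    }

    public void contractCreated(Contract contract) {
        commandBus.dispatch(new CreateMissionCommand(
                UUID.randomUUID().toString(), // new, db-independent aggregate id
                contract.getAddress(),
                contract.getWorkerId(),
                contract.getSocietyId()));
    }
}
```

The legacy controller then needs only one extra line after its flush, e.g. missionAcl.contractCreated(contract), which keeps the impact on the existing code base low.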
Having references to both the WorkerAggregate and SocietyAggregate UUIDs in MissionAggregate implies that I have to build those aggregates too (hence dispatch commands from the legacy app when the Worker and Society entities are flushed). Can't I have only references to the Worker's entity id and the Society's entity id?
How can I avoid an eternally growing MissionAggregate? The Contract Entity is quite huge; it has a lot of fields that are constantly updated (hours, days, documents, etc.). If I want to store all those events, I need a large MissionAggregate to reflect all those changes, and so I need tons of CommandHandlers that react to all the Commands of add, update, etc. that I am going to dispatch from the legacy app.
You should aim for small aggregates. Huge aggregates are likely to degrade performance and cause concurrency problems.
If you anticipate having a huge aggregate, it is best to rethink it and try to break it down. Ask what fields/properties change together - these are possibly a different aggregate.
Also, when you speak about CQRS, you generally lean towards a task-based way of doing things in your system.
Think of a traditional web application, where you have a huge page with lots of fields that are all sent to the server in one batch when the user saves.
Now, contrast it with a modern web app where the user changes small portions of data at each step. If you think about your system this way you'll find those smaller aggregates.
P.S. You don't need to rebuild your interfaces for this. If your legacy system has those huge pages, you could have logic in the controllers to detect which fields were changed and issue the appropriate commands, as sketched below.
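A minimal sketch of that controller logic; the command names and the acl reference are assumptions for illustration:

```java
// Inside the legacy controller: detect which fields actually changed and
// issue small, task-based commands instead of one catch-all update.
// `acl` is the Anti-Corruption Layer; all names are illustrative.
public void onContractSaved(Contract before, Contract after) {
    if (!after.getHours().equals(before.getHours())) {
        acl.dispatch(new LogContractHoursCommand(after.getId(), after.getHours()));
    }
    if (!after.getDocuments().equals(before.getDocuments())) {
        acl.dispatch(new AttachDocumentsCommand(after.getId(), after.getDocuments()));
    }
    // ...one intention-revealing command per kind of change.
}
```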
How "free" is an Aggregate from the Root entity it is supposed to refer to ? For example, a Contract Entity needs to relate somewhere to it's related Mission Aggregate, like for example when i want to dispatch a Command from the app, just after the legacy code having flushed something on the Entity. Where to store this relation ? In the Entity itself, in a AggregateId field ? in the Aggregate, should i have a ContratId field ? Or should i have some kind of Mapping Table somewhere that holds the relationship between Contract ID and MissionAggregate ID?
Aggregates represent a conceptual whole. They are like atoms, indivisible things. You should always refer to an aggregate by its Root Entity Id, and never to a Child Entity Id: looking from the outside, there are no children.
An aggregate should be loaded as a whole and persisted as a whole. One more reason to have small aggregates.
An aggregate can be comprised of a single entity. Or it can have more entities and value objects, forming a graph, but one entity will be elected as the Root and will hold references to its children. Child entities and value objects should not hold references to their parents. The dependency is not bi-directional.
If Contract is an entity inside the Mission aggregate, the Contract should not have a reference to its parent.
But, if your Contract and Mission are different aggregates, then they can reference each other by their Ids.
What to do with the past? Should I migrate all the existing data through a script that generates Aggregates and events on all the historical data?
That's a question for the business experts. Do they need it? If they don't, then don't implement it just for the sake of doing so. Every decision you make should be geared towards satisfying a business need and generating real value for it, considering the costs and tradeoffs.
Some people say that code is a liability, not an asset, and I agree to some extent: every line of code you write needs to be tested and supported. Don't write any code that is not really necessary.
Also, have a look at this article about the Strangler Pattern, which shows how to migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services.
If you have a chance, watch this course at Pluralsight (paid): Domain-Driven Design: Working with Legacy Projects. The author presents practical approaches for dealing with this kind of task.
I hope this has given you some insight.
I don't want to spoil your game. Everybody knows how cool it is to rewrite something from scratch. It's a challenge, it's fun, it's exciting. However...
migrate it from a monolithic/state oriented model to a distributed, service oriented app
CQRS/Event Sourcing won't solve any of your problems and it won't help you distribute the app in any reasonable way. If you just generate events on the CRUD operations you'll have a large tangled mess of dependencies between each part. Every part that needs data will have to call a couple of "services" (i.e. tables) to get it, then push data elsewhere and generate events that some other parts will react to. It will be a mess. Usually this is called a distributed monolith.
This is also the reason you already see problems with it. These problems won't go away, because you are essentially building the same system in the same way, but this time it'll be more complex.
Where to go from here
The very first thing is always: have a clear goal. You said you want a service-oriented architecture. Why? Are there parts that need different scaling, different resources? Are they managed by different teams with different life-cycles? Etc.? Maybe you already have all this, I don't know, but if not, that's your first task.
Then: the parts you do want to pull out can't be just CRUD things. Those will not be independent, so whatever your goal is (see the point above!), scaling or separate teams, you won't reach it. To be independent you'll have to pull out the behavior together with the data, and in a way that lets the service operate on its own.
You can't just throw buzzwords at it and hope for the best. I'd suggest to just ignore all the hype and buzzwords and think about the goal you want to reach.
For example: I need a million workers to log their time in under 10 minutes total. So that means I need a "service" to enable workers to log their time with a web interface. So let's create that as a completely independent piece with its own database, so it can be scaled to a 100 nodes when it needs to be. Export data to billing automatically every hour or so.

Stream aggregate relationship in an event sourced system

So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store. If we divide the events into streams, what exactly would a stream represent? In the context of a hypothetical inventory management system that tracks a collection of items, each with an ID, product code, and location, I'm having trouble visualizing the layout of the system.
From what I could gather on the internet, it could be described succinctly as "one stream per aggregate." So I would have an Inventory aggregate: a single stream with ItemAdded, ItemPulled, ItemRestocked, etc. events, each with serialized data containing the Item ID, quantity changed, location, etc. The aggregate root would contain a collection of InventoryItem objects (each with their respective quantity, product codes, location, etc.). That seems like it would allow for easily enforcing domain rules, but I see one major flaw in this: when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems to be very inefficient with a large number of items.
Another method would be to have one stream per InventoryItem, tracking all events pertaining to only that item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique or that you're not putting multiple items into the same location? It seems like you would now have to bring in a Read model, but isn't the whole point to keep commands and queries separate? It just feels wrong.
So my question is 'which is correct?' Partially both? Neither? Like most things, the more I learn, the more I learn that I don't know...
In a typical event store, each event stream is an isolated transaction boundary. Any time you change the model you lock the stream, append new events, and release the lock. (In designs that use optimistic concurrency, the boundaries are the same, but the "locking" mechanism is slightly different).
You will almost certainly want to ensure that any aggregate is enclosed within a single stream -- sharing an aggregate between two streams is analogous to sharing an aggregate across two databases.
A single stream can be dedicated to a single aggregate, to a collection of aggregates, or even to the entire model. Aggregates that are part of the same stream can be changed in the same transaction -- huzzah! -- at the cost of some contention and a bit of extra work to do when loading an aggregate from the stream.
The most commonly discussed design assigns each logical stream to a single aggregate.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw in this: when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems to be very inefficient with a large number of items.
There are a couple of possibilities; in some models, especially those with a strong temporal component, it makes sense to model some "entities" as a time series of aggregates. For example, in a scheduling system, rather than Bob's Calendar you might instead have Bob's March Calendar, Bob's April Calendar and so on. Chopping the life cycle into smaller installments can keep the event count in check.
Another possibility is snapshots, with an additional trick to it: each snapshot is annotated with metadata that describes where in the stream the snapshot was made, and you simply read the stream forward from that point.
This, of course, depends on having an event stream implementation that supports random access, or one that allows you to read the stream last-in, first-out.
Keep in mind that both of these are really performance optimizations, and the first rule of optimization is... don't.
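With that caveat stated, a sketch of the snapshot-plus-read-forward approach described above, assuming a store that can read a stream from a given position; all type and method names are illustrative:

```java
// Rebuild an aggregate from its latest snapshot, replaying only the
// events appended after the position recorded in the snapshot metadata.
// SnapshotStore, EventStore and the aggregate type are illustrative.
public class InventoryItemRepository {

    private final SnapshotStore snapshotStore;
    private final EventStore eventStore;

    public InventoryItemRepository(SnapshotStore snapshotStore, EventStore eventStore) {
        this.snapshotStore = snapshotStore;
        this.eventStore = eventStore;
    }

    public InventoryItemAggregate load(String streamId) {
        Snapshot snapshot = snapshotStore.latest(streamId);
        InventoryItemAggregate aggregate;
        long readFrom;
        if (snapshot != null) {
            // The snapshot's metadata records where in the stream it was made...
            aggregate = InventoryItemAggregate.fromSnapshot(snapshot);
            readFrom = snapshot.getStreamPosition() + 1;
        } else {
            aggregate = new InventoryItemAggregate(streamId);
            readFrom = 0;
        }
        // ...so we simply read the stream forward from that point.
        for (Event event : eventStore.readFrom(streamId, readFrom)) {
            aggregate.apply(event);
        }
        return aggregate;
    }
}
```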
So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store
The event store in a DDD project is designed around event-sourced Aggregates:
it provides efficient loading of all events previously emitted by an Aggregate root instance with a given ID
those events must be retrieved in the order they were emitted
it must not permit concurrent appends for the same Aggregate root instance
all events emitted as the result of a single command must be appended atomically; this means that they should all succeed or all fail
The 4th point could be implemented using transactions, but this is not a necessity. In fact, for scalability reasons, if you can, you should choose a persistence mechanism that provides atomicity without the use of transactions. For example, you could store the events in a MongoDB document, as MongoDB guarantees document-level atomicity.
The 3rd point can be implemented using optimistic locking, using a version column with a unique index per (version x AggregateType x AggregateId).
At the same time, there is a DDD rule regarding Aggregates: don't mutate more than one Aggregate per transaction. This rule helps you A LOT in designing a scalable system. Break it only if you don't need that scalability.
So, the solution to all these requirements is something called an Event stream, which contains all the events previously emitted by an Aggregate instance.
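As an illustration of the 2nd through 4th points, a sketch of an append operation over a table with a unique index on (version x AggregateType x AggregateId); Event, insertEventRow and serialize are assumed helpers:

```java
import java.util.List;

// Appends the events of one command; assumed to run inside a single
// transaction (or other atomic write) so they all succeed or all fail.
// Event, insertEventRow and serialize are illustrative.
public class SqlEventStore {

    public void append(String aggregateType, String aggregateId,
                       long expectedVersion, List<Event> newEvents) {
        long version = expectedVersion;
        for (Event event : newEvents) {
            version++;
            // If a concurrent command already appended this version for
            // this aggregate, the unique index rejects the insert: an
            // optimistic concurrency conflict instead of two interleaved
            // histories.
            insertEventRow(aggregateType, aggregateId, version, serialize(event));
        }
    }
}
```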
So I would have an Inventory aggregate
DDD takes precedence over the Event store. So, if you have some business rules that force you to decide that you must have a (big) Inventory aggregate, then yes, it would load ALL the previous events generated by itself, and the InventoryItem would be a nested entity that cannot emit events by itself.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw in this: when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems to be very inefficient with a large number of items.
Yes, indeed. The simplest thing would be for us all to have a single Aggregate, with a single instance. Then consistency would be the strongest possible. But this is not efficient, so you need to think harder about the real business requirements.
Another method would be to have one stream per InventoryItem, tracking all events pertaining to only that item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique or that you're not putting multiple items into the same location?
There is another possibility. You could model the assigning of product codes as a Business Process. For this you could use a Saga/Process manager that would orchestrate the entire process. This Saga could use a collection with a unique index on the product code column, in order to ensure that only one product uses a given product code.
You could design the Saga either to permit the allocation of an already-taken code to a product and compensate later, or to reject the invalid allocation in the first place, as sketched below.
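A minimal sketch of such a process manager; the event, command and store types (and the exception) are illustrative, and the unique index on the reservation collection does the actual enforcement:

```java
// Orchestrates product-code assignment; a unique index on the product
// code column of the reservation store enforces uniqueness. All types
// here are illustrative assumptions.
public class ProductCodeSaga {

    private final ReservationStore reservations;
    private final CommandBus commandBus;

    public ProductCodeSaga(ReservationStore reservations, CommandBus commandBus) {
        this.reservations = reservations;
        this.commandBus = commandBus;
    }

    public void on(ProductCodeRequested event) {
        try {
            // Succeeds only for the first requester of this code.
            reservations.insert(event.getProductCode(), event.getItemId());
            commandBus.dispatch(new ConfirmProductCode(event.getItemId(), event.getProductCode()));
        } catch (DuplicateKeyException e) {
            // Compensate: reject the allocation on the requesting item.
            commandBus.dispatch(new RejectProductCode(event.getItemId(), event.getProductCode()));
        }
    }
}
```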
It seems like you would now have to bring in a Read model, but isn't the whole point to keep commands and queries separate? It just feels wrong.
The Saga does indeed use private state, maintained from the domain events in an eventually consistent way, just like a Read model, but this does not feel wrong to me. It may use whatever it needs in order to (eventually) bring the system as a whole to a consistent state. It complements the Aggregates, whose purpose is to not allow the building blocks of the system to get into an invalid state.

Check command for validity with data from other aggregate

I am currently working on my first bigger DDD application. For now it works pretty well, but we have been stuck on an issue since the early days that I cannot stop thinking about:
In some of our aggregates we keep references to another aggregate root that is pretty essential for the whole application (based on their IDs, so there are no hard references; deletion is also handled via events/eventual consistency). Now, when we create a new Entity "Entity1", we send a new CreateEntity1Command that contains the ID of the referenced aggregate root.
Now how can I check whether this referenced ID is a valid one? Right now we check it by reading from the other aggregate (without modifying anything there), but this approach somehow feels dirty. I would like to just "trust" the commands, because the ID cannot be entered manually but must be selected. The problem is that our application is a web application, and it is not really safe to trust the user input you get there (even though it is not accessible by the public).
Did I overlook any possible solutions for this problem, or should I just ignore the feeling that there needs to be a better solution?
Verifying that another referenced Aggregate exists is not the responsibility of an Aggregate. It would break the Single Responsibility Principle. When the CreateEntity1Command arrives at the Aggregate, it should be assumed that the other referenced Aggregate is in a valid state, i.e. it exists.
Being outside the Aggregate's boundary, this check is eventually consistent. This means that, even if it initially passes, it could become invalid after that (e.g. the Aggregate is deleted, unpublished, or in some other invalid domain state). You need to ensure that:
the command is rejected if the referenced Aggregate does not yet exist. You do this check in the Application service that is responsible for the Use case, before dispatching the command to the Aggregate, using a Domain service (see the sketch below).
if the referenced Aggregate enters an invalid state afterwards, the correct actions are taken. You should do this inside a Saga/Process manager. If CQRS is used, you subscribe to the relevant events; if not, you use a cron job. What the correct action is depends on your domain, but the main idea is that it should be modeled as a process.
So, long story short, the responsibility of an Aggregate does not extend beyond its consistency boundary.
P.S. Resist the temptation to inject services (Domain or not) into Aggregates (through constructor or method arguments).
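A sketch of the first point, with the existence check living in the Application service; the service, domain service and exception names are illustrative:

```java
// The Application service performs the (eventually consistent) existence
// check before the command ever reaches the Aggregate. All names are
// illustrative assumptions.
public class CreateEntity1ApplicationService {

    private final ReferencedAggregateService referencedAggregates; // Domain service
    private final CommandBus commandBus;

    public CreateEntity1ApplicationService(ReferencedAggregateService referencedAggregates,
                                           CommandBus commandBus) {
        this.referencedAggregates = referencedAggregates;
        this.commandBus = commandBus;
    }

    public void createEntity1(CreateEntity1Command command) {
        if (!referencedAggregates.exists(command.getReferencedAggregateId())) {
            throw new ReferencedAggregateNotFoundException(command.getReferencedAggregateId());
        }
        // The Aggregate can now assume the reference was valid at dispatch time.
        commandBus.dispatch(command);
    }
}
```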
Direct Aggregate-to-Aggregate interaction is an anti-pattern in DDD. An aggregate A should not directly send a command or query to an aggregate B. Aggregates are strict consistency boundaries.
I can think of 2 solutions to your problem: Let's say you have 2 aggregate roots (AR) - A and B. Each AR has got a bunch of command handlers where each command raises 1 or more events. Your command handler in A depends on some data in B.
You can subscribe to the events raised by B and maintain the state of B in A. You can subscribe only to the events which dictate the validity.
You can have a completely independent service (S) coordinating between A and B. Instead of directly sending your request to A, send your request to S which would be responsible for a query from B (to check for validity of referenced ID) and then forward request to A. This is sometimes called a Process Manager (PM).
For example, in your case, when you are creating a new Entity "Entity1", send this request to a PM whose job is to validate that the data in your request is valid and then route the request to the aggregate responsible for creating "Entity1". Send the new CreateEntity1Command that contains the ID of the referenced aggregate root to this PM; the PM uses the ID of the referenced AR to make sure it's valid, and only if it is valid does it pass your request forward.
Useful Links: http://microservices.io/patterns/data/saga.html
Did I overlook any possible solutions for this problem
You did. "Domain Services" give you a possible loophole to play in.
Aggregates are consistency boundaries; their behaviors are constrained by
The current state of the aggregate
The arguments that they are passed.
If an aggregate needs to interact with something outside of its boundary, then you pass to the aggregate root a domain service to encapsulate that interaction. The aggregate, at its own discretion, can invoke methods provided by the domain service to achieve work.
Often, the domain service is just a wrapper around an application or infrastructure service. For instance, if the aggregate needed to know if some external data were available, then you could pass in a domain service that would support that query, checking against some cache of data.
But - here's the trick: you need to stay aware of the fact that data from outside of the aggregate boundary is necessarily stale. There might be another process changing the data even as you are querying a stale copy.
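A sketch of that pattern; the aggregate, command data and domain service interface are illustrative:

```java
// The domain service is passed in as a method argument, so the aggregate
// invokes it at its own discretion. All names are illustrative.
public class SomeAggregate {

    public void doSomething(SomeCommandData data, ExternalDataAvailability availability) {
        // Remember: the answer reflects a necessarily stale view of the
        // world outside this aggregate's boundary.
        if (!availability.isAvailable(data.getExternalRef())) {
            throw new IllegalStateException("external data not available");
        }
        // ...proceed with the state change.
    }
}

// Typically a thin wrapper around an application or infrastructure
// service, e.g. a lookup against some cache of data.
interface ExternalDataAvailability {
    boolean isAvailable(String externalRef);
}
```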
The problem is, that our application is a web-application and it is not really safe to trust the user input you get there (even though it is not accessibly by the public).
That's true, but it's not typically a domain problem. For instance, we might specify that an endpoint in our API requires a JSON representation of some command message -- but that doesn't mean that the domain model is responsible for taking a raw byte array and creating a DOM for it. The application layer would have that responsibility; the aggregate's responsibility is the domain concerns.
It can take some careful thinking to distinguish where the boundary between the different concerns is. Is this sequence of bytes a valid identifier for an aggregate? is clearly an application concern. Is the other aggregate in a state that permits some behavior? is clearly a domain concern. Does the aggregate exist at all...? could go either way.

Showing data on the UI in the Hexagonal architecture

I'm learning DDD and Hexagonal architecture, I think I got the basics. However, there's one thing I'm not sure how to solve: how am I showing data to the user?
So, for example, I got a simple domain with a Worker entity with some functionality (some methods cause the entity to change) and a WorkerRepository so I can persist Workers. I got an application layer with some commands and command bus to manipulate the domain (like creating Workers and updating their work hours, persisting the changes), and an infrastructure layer which has the implementation of the WorkerRepository and a GUI application.
In this application I want to show all workers with some of their data, and be able to modify them. How do I show the data?
I could give it a reference to the implementation of WorkerRepository.
I think it's not a good solution because this way I could insert new Workers in the repository skipping the command bus. I want all changes going through the command bus.
Okay then, I'd split the WorkerRepository into WorkerQueryRepository and WorkerCommandRepository (as per CQRS), and give out a reference only to the WorkerQueryRepository. It's still not a good solution, because the repo gives back Worker entities, which have methods that change them; and how will these changes be persisted?
Should I create two type of Repositories? One would be used in the domain and application layer, and the other would be used only for providing data to the outside world. The second one wouldn't return full-fledged Worker entities, only WorkerDTOs containing only the data the GUI needs. This way, the GUI has no other way to change Workers, only through the command bus.
Is the third approach the right way? Or am I wrong forcing that the changes must go through the command bus?
Should I create two type of Repositories? One would be used in the domain and application layer, and the other would be used only for providing data to the outside world. The second one wouldn't return full-fledged Worker entities, only WorkerDTOs containing only the data the GUI needs.
That's the CQRS approach; it works pretty well.
Greg Young (2010)
CQRS is simply the creation of two objects where there was previously only one. The separation occurs based upon whether the methods are a command or a query (the same definition that is used by Meyer in Command and Query Separation, a command is any method that mutates state and a query is any method that returns a value).
The current term for the WorkerDTO you propose is "Projection". You'll often have more than one; that is to say, you can have a separate projection for each view of a worker in the GUI. (That has the neat side effect of making the view easier -- it doesn't need to think about the data that it is given, because the data is already formatted usefully).
Another way of thinking of this, is that you have a "write-only" representation (the aggregate) and "read-only" representations (the projections). In both cases, you are reading the current state from the book of record (via the repository), and then using that state to construct the representation you need.
As the read models don't need to be saved, you are probably better off thinking factory, rather than repository, on the read side. (In 2009, Greg Young used "provider", for this same reason.)
Once you've taken the first step of separating the two objects, you can start to address their different use cases independently.
For instance, if you need to scale out read performance, you have the option to replicate the book of record to a bunch of slave copies, and have your projection factory load from the slaves, instead of the master. Or to start exploring whether a different persistence store (key value store, graph database, full text indexer) is more appropriate. Udi Dahan reviews a number of these ideas in CQRS - but different (2015).
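For illustration, a minimal sketch of the third approach from the question; the projection and factory names are assumptions:

```java
import java.util.List;

// A read-only projection shaped for one specific view; it carries no
// behavior that mutates a Worker, so all changes must go through the
// command bus. The class and factory names are illustrative.
public final class WorkerOverviewProjection {
    private final String workerId;
    private final String name;
    private final int hoursThisMonth;

    public WorkerOverviewProjection(String workerId, String name, int hoursThisMonth) {
        this.workerId = workerId;
        this.name = name;
        this.hoursThisMonth = hoursThisMonth;
    }

    public String getWorkerId() { return workerId; }
    public String getName() { return name; }
    public int getHoursThisMonth() { return hoursThisMonth; }
}

// "Factory rather than repository" on the read side: it builds, from the
// book of record, exactly the representation one view needs.
interface WorkerOverviewProjectionFactory {
    List<WorkerOverviewProjection> allWorkers();
}
```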
"read models don't need to be saved" Is not correct.
It is correct; but it isn't perhaps as clear and specific as it could be.
We don't need to create a durable representation of a read model, because all of the information that describes the variance between instances of the read model has already been captured by our writes.
We will often want to cache the read model (or a representation of it), so that we can amortize the work of creating the read model across many queries. And various trade offs may indicate that the cached representations should be stored durably.
But if a meteor comes along and destroys our cache of read models, we lose a work investment, but we don't lose information.

DDD, Move to trash, how to design it

I have a simple use case where the user can discard a profile. It is really easy to understand but raises some modeling questions.
1/ Is it okay to have a flag in my profile entity to indicate that it is in the trash?
I don't think so. So I would like to have two repositories: ProfileRepository and TrashRepository.
2/ So given those two repositories, in my application service I just have to remove the profile from its repository and add it to the trash. Seems natural, but it can cause trouble if I cannot have a transaction (though that is not the case in my app).
However, I'm using a relational database and a first idea would be to use a column to indicate if the row is in the trash or not and having the two repositories working on the same table. I'm not sure that it is a good idea.
I could also add a discard method to the ProfileRepository so that I don't need two repositories.
Which is the best solution?
Can I set a flag to indicate the status (discarded) in my entity, or is it better to have two different entities with different repositories?
Discard really is a business command and a command will always mutate the state of the domain. I believe that it's perfectly valid to have a status indicating that the profile has been discarded. What would be wrong is to introduce a property such as deleted or active when what you really mean is discarded.
However, some think that it's sometimes useful to model states explicitly: have an entirely different class to represent a discarded profile.
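A minimal sketch of the status approach; the enum and method names are illustrative:

```java
// Discard is a business command that mutates state; the state is named
// after what the business means ("discarded"), not a technical "deleted"
// or "active" flag. All names are illustrative.
public class Profile {

    private ProfileStatus status = ProfileStatus.ACTIVE;

    public void discard() {
        this.status = ProfileStatus.DISCARDED;
    }

    public boolean isDiscarded() {
        return status == ProfileStatus.DISCARDED;
    }
}

enum ProfileStatus { ACTIVE, DISCARDED }
```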
Here are a few links related to explicit state modeling:
http://codebetter.com/gregyoung/2010/03/09/state-pattern-misuse/
http://p2p.wrox.com/book-patterns-principles-practices-domain-driven-design/94718-ch16-explicit-state-modeling-identity-map.html
https://medium.com/@martinezdelariva/explicit-state-modeling-f6e534c33508
