Microservices: how to effectively deal with data dependencies between microservices - node.js

I am developing an application utilizing the microservices development approach with the mean stack. I am running into a situation where data needs to be shared between multiple microservices. For example, let's say I have user, video, message(sending/receiving,inbox, etc.) services. Now the video and message records belong to an account record. As users create video and send /receive message there is a foreign key(userId) that has to be associated with the video and message records they create. I have scenarios where I need to display the first, middle and last name associated with each video for example. Let's now say on the front end a user is scrolling through a list of videos uploaded to the system, 50 at a time. In the worst case scenario, I could see a situation where a pull of 50 occurs where each video is tied to a unique user.
There seems to be two approaches to this issue:
One, I make an api call to the user service and get each user tied to each video in the list. This seems inefficient as it could get really chatty if I am making one call per video. In the second of the api call scenario, I would get the list of video and send a distinct list of user foreign keys to query to get each user tied to each video. This seems more efficient but still seems like I am losing performance putting everything back together to send out for display or however it needs to be manipulated.
Two, whenever a new user is created, the account service sends a message with the user information each other service needs to a fanout queue and then it is the responsibility of the individual services to add the new user to a table in it's own database thus maintaining loose coupling. The extreme downside here would be the data duplication and having to have the fanout queue to handle when updates needs to be made to ensure eventual consistency. Though, in the long run, this approach seems like it would be the most efficient from a performance perspective.
I am torn between these two approaches, as they both have their share of tradeoffs. Which approach makes the most sense to implement and why?

I'm also interested in this question.
First of all, scenario that you described is very common. Users, videos and messages definitely three different microservices. There is no issue in how you broke down system into pieces.
Secondly, there are multiple options, how to solve data sharing problem. Take a look at great article from auth0: https://auth0.com/blog/introduction-to-microservices-part-4-dependencies/

Don't restrict your design decision to those 2 options you've outlined. The hardest thing about microservices is to get your head around what a service is and how to cut your application into chunks/services that make sense to be implemented as a 'microservice'.
Just because you have those 3 entities (user, video & message) doesn't mean you have to implement 3 services. If your actual use case shows that these services (or entities) depend heavily on each other to fulfil a simple request from the front-end than that's a clear signal that your cutting was not right.
From what I see from your example I'd design 1 microservice that fulfills the request. Remember that one of the design fundamentals of a microservice is to be as independent as possible.
There's no need to over complicate services, it's not SOA.
https://martinfowler.com/articles/microservices.html -> great read!
Regards,
Lars

Related

CQRS with REST APIs

I am building a REST service over CQRS using EventSourcing to distribute changes to my domain across services. I have the REST service up and running, with a POST endpoint for creating the initial model and then a series of PATCH endpoints to change the model. Each end-point has a command associated with it that the client sends as a Content-Type parameter. For example, Content-Type=application/json;domain-command=create-project. I have the following end-points for creating a Project record on my task/project management service.
api.foo.com/project
Verb: POST
Command: create-project
What it does: Inserts a new model in the event store with some default values set
api.foo.com/project/{projectId}
Verb: PATCH
Command: rename-project
What it does: Inserts a project-renamed event into the event store with the new project name.
api.foo.com/project/{projectId}
Verb: PATCH
Command: reschedule-project
What it does: Inserts a project-rescheduled event into the event store with the new project due date.
api.foo.com/project/{projectId}
Verb: PATCH
Command: set-project-status
What it does: Inserts a project-status-changed event into the event store with the new project status (Active, Planning, Archived etc).
api.foo.com/project/{projectId}
Verb: DELETE
Command: delete-project
What it does: Inserts a project-deleted event into the event store
Traditionally in a REST service you would offer a PUT endpoint so the record could be replaced. I'm not sure how that works in the event-sourcing + CQRS pattern. Would I only ever use POST and PATCH verbs?
I was concerned I was to granular and that every field didn't need a command associated with it. A PUT endpoint could be used to replace pieces. My concern though was that the event store would get out of sync so I just stuck with PATCH endpoints. Is this level of granularity typical? For a model with 6 properties on it I have 5 commands to adjust the properties of the model.
This is a common question that we get a lot of the time when helping developers getting started with CQRS/ES. We need to acknowledge that applying REST in a pure way is a really bad match for DDD/CQRS since the intention of the commands are not explicitly expressed in the verbs GET/POST/PUT/PATCH/DELETE (even though you can use content-type like you did). Also the C/R-side of the system are definitely different resources in a CQRS-system which does not match up with REST.
However, to use HTTP to provide an API for a CQRS/ES system is very practical.
We usually only use POST for sending commands, to either a /commands endpoint or to endpoints with the name of the command, i.e /commands/create-project. It's all about how strict you want to be. In this case we embed the command type in the payload or as a content-type.
However, it is all a matter of what matches the tech stack better and what you choose here usually does not make or break the solution. The more important part is usually to create a good domain model and get the whole team onboard with this way of thinking.
Good luck!
One question that comes to mind is, is REST the right paradigm for CQRS at all?
One completely different way to structure this is to not have action-focused endpoints, but instead structure your REST API as a series of events that you add new events to (with POST).
Events should be immutable and append-only, so maybe a DELETE method doesn't make that much sense for mutations.
If you're going all in with CQRS (good luck, I've heard the war stories) I would be inclined to build an API that reflects that model well.
Would I only ever use POST and PATCH verbs?
Most of the time, you would use POST.
PUT, and PATCH are defined with remote authoring semantics - they are methods used to copy new representations of a resource from the client to the server. For example, the client GETs a representation of /project/12345, makes local edits, and then uses PUT to request that the server accept the client's new representation of the resource as its own.
PATCH, semantically, is a similar exchange of messages - the difference being that instead of sending the full representation of the resource, the client returns a "patch-document" that the server can apply to its copy to make the changes.
Now, technically, the PATCH documentation does put any restrictions on what a "patch-document" is. In order for PATCH to be more useful that POST, however, we need patch document formats that are general purpose and widely recognized (for instance, application/merge-patch+json or application/json-patch+json).
And that's not really the use case you have here, where you are defining command messages that are specific to your domain.
Furthermore, remote authoring semantics don't align very well with "domain modeling" (which is part of the heritage of CQRS). When we're modeling a domain, we normally give the domain model the authority to decide how to integrate new information with what the server already knows. PUT and PATCH semantics are more like what you would use to write information into an anemic data store.
On the other hand, it is okay to use POST
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.” -- Fielding, 2009
It may help to recall that REST is the architectural style of the world wide web, and the only unsafe method supported by html is POST.
So replacing your PATCH commands with POST, and you're on the right path.
Fielding, 2008
I should also note that the above is not yet fully RESTful, at least how I use the term. All I have done is described the service interfaces, which is no more than any RPC. In order to make it RESTful, I would need to add hypertext to introduce and define the service, describe how to perform the mapping using forms and/or link templates, and provide code to combine the visualizations in useful ways. I could even go further and define these relationships as a standard, much like Atom has standardized a normal set of HTTP relationships with expected semantics
The same holds here - we aren't yet at "REST", but we have improved things by choosing standardized methods that are better aligned with our intended semantics.
One final note -- you should probably replace your use of DELETE with POST as well. DELETE is potentially a problem for two reasons -- the semantics aren't what you want, and the standard delete payload has no defined semantics
Expressed another way: DELETE is from the transferring documents over a network domain, not from your domain. A DELETE message sent to your resources should be understood to mean the same thing as a DELETE message sent to any other resource is understood. That's the uniform interface constraint at work: we all agree that the HTTP method tokens mean the same thing everywhere.
Relatively few resources allow the DELETE method -- its primary use is for remote authoring environments, where the user has some direction regarding its effect -- RFC 7231
As before: remote authoring semantics are not obviously a good fit for sending messages to a domain model.
This Google Cloud article API design: Understanding gRPC, OpenAPI and REST and when to use them clarifies the REST vs RPC debate. REST is more relevant for entity-centric API whereas RPC is more relevant for action-centric API (and CQRS). The most mature REST level 3 with hypermedia controls works well only for entities with simple state models.
Understand and evaluate first the benefits of REST for your case. Many APIs are REST-ish and not RESTful. OpenAPI is actually RPC mapped over and HTTP endpoints but it doesn't prevent it to be widely adopted.

DDD/Event sourcing, getting data from another microservice?

I wonder if you can help. I am writing an order system and currently have implemented an order microservice which takes care of placing an order. I am using DDD with event sourcing and CQRS.
The order service itself takes in commands that produce events, the actual order service listens to its own event to create a read model (The idea here is to use CQRS, so commands for writes and queries for reads)
After implementing the above, I ran into a problem and its probably just that I am not fully understanding the correct way of doing this.
An order actually has dependents, meaning an order needs a customer and a product/s. So i will have 2 additional microservices for customer and products.
To keep things simple, i would like to concentrate on the customer (although I have exactly the same issue with products but my thinking is that if I fix the customer issue then the other one is automatically fixed also).
So back to the problem at hand. To create an order the order needs a customer (and products), I currently have the customerId on the client, so sending down a command to the order service, I can pass in the customerId.
I would like to save the name and address of the customer with the order. How do I get the name and address of the customerId from the Customer Service in the Order Service ?
I suppose to summarize, when data from one service needs data from another service, how am I able to get this data.
Would it be the case of the order service creating an event for receiving a customer record ? This is going to introduce a lot of complexity (more events) in the system
The microservices are NOT coupled so the order service can't just call into the read model of the customer.
Anybody able to help me on this ?
If you are using DDD, first of all, please read about bounded context. Forget microservices, they are just implementation strategy.
Now back to your problem. Publish these events from Customer aggregate(in your case Customer microservice): CustomerRegistered, CustomerInfoUpdated, CustomerAccountRemoved, CustomerAddressChanged etc. Then subscribe your Order service(again in your case application service inside Order microservice) to listen all above events. Okay, not all, just what order needs.
Now, you may have a question, what if majority or some of my customers don't make orders? My order service will be full of unnecessary data. Is this a good approach?
Well, answer might vary. I would say, space in hard disk is cheaper than memory or a database query is faster than a network call in performance perspective. If your database host(or your server) is limited then you should not go with microservices. Moreover, I would make some business ideas with these unused customer data e.g. list all customers who never ordered anything, I will send them some offers to grow my business. Just kidding. Don't feel bothered with unused data in microservices.
My suggestion would be to gather the required data on the front-end and pass it along. The relevant customer details that you want to denormalize into the order would be a value object. The same goes for the product data (e.g. id, description) related to the order line.
It isn't impossible to have the systems interact to retrieve data but that does couple them on a lower level that seems necessary.
When data from one service needs data from another service, how am I able to get this data?
You copy it.
So somewhere in your design there needs to be a message that carries the data from where it is to where it needs to be.
That could mean that the order service is subscribing to events that are published by the customer service, and storing a copy of the information that it needs. Or it could be that the order service queries some API that has direct access to the data stored by the customer service.
Queries for the additional data that you need could be synchronous or asynchronous - maybe the work can be deferred until you have all of the data you need.
Another possibility is that you redesign your system so that the business capability you need is with the data, either moving the capability or moving the data. Why does ordering need customer data? Can the customer service do the work instead? Should ordering own the data?
There's a certain amount of complexity that is inherent in your decision to distribute the work across multiple services. The decision to distribute your system involves weighing various trade offs.

DDD Layers and External Api

Recently I've been trying to make my web application use separated layers.
If I understand the concept correctly I've managed to extract:
Domain layer
This is where my core domain entities, aggregate roots, value objects reside in. I'm forcing myself to have pure domain model, meaning i do not have any service definitions here. The only thing i define here is the repositories, which is actually hidden because axon framework implements that for me automatically.
Infrastructure layer
This is where the axon implements the repository definitions for my aggregates in the domain layer
Projection layer
This is where the event handlers are implemented to project the data for the read model using MongoDB to persist it. It does not know anything other than event model (plain data classes in kotlin)
Application layer
This is where the confusion starts.
Controller layer
This is where I'm implementing the GraphQL/REST controllers, this controller layer is using the command and query model, meaning it has knowledge about the Domain Layer commands as well as the Projection Layer query model.
As I've mentioned the confusion starts with the application layer, let me explain it a bit with simplified example.
Considering I want a domain model to implement Pokemon fighting logic. I need to use PokemonAPI that would provide me data of the Pokemon names stats etc, this would be an external API i would use to get some data.
Let's say that i would have domain implemented like this:
(Keep in mind that I've stretched this implementation so it forces some issues that i have in my own domain)
Pokemon {
id: ID
}
PokemonFight {
id: ID
pokemon_1: ID
pokemon_2: ID
handle(cmd: Create) {
publish(PokemonFightCreated)
}
handle(cmd: ProvidePokemonStats) {
//providing the stats for the pokemons
publish(PokemonStatsProvided)
}
handle(cmd: Start) {
//fights only when the both pokemon stats were provided
publish(PokemonsFought)
}
The flow of data between layers would be like this.
User -> [HTTP] -> Controller -> [CommandGateway] -> (Application | Domain) -> [EventGateway] -> (Application | Domain)
Let's assume that two of pokemons are created and the use case of pokemon fight is basically that when it gets created the stats are provided and then when the stats are provided the fight automatically starts.
This use case logic can be solved by using event processor or even saga.
However as you see in the PokemonFight aggregate, there is [ProvidePokemonStats] command, which basically provides their stats, however my domain do not know how to get such data, this data is provided with the PokemonAPI.
This confuses me a bit because the use case would need to be implemented on both layers, the application (so it provides the stats using the external api) and also in the domain? the domain use case would just use purely domain concepts. But shouldn't i have one place for the use cases?
If i think about it, the only purpose saga/event processor that lives in the application layer is to provide proper data to my domain, so it can continue with it's use cases. So when external API fails, i send command to the domain and then it can decide what to do.
For example i could just put every saga / event processor in the application, so when i decide to change some automation flow i exactly know what module i need to edit and where to find it.
The other confusion is where i have multiple domains, and i want to create use case that uses many of them and connects the data between them, it immediately rings in my brain that this should be application layer that would use domain APIs to control the use case, because I don't think that i should add dependency of different domain in the core one.
TL;DR
What layer should be responsible of implementing the automated process between aggregates (can be single but you know what i mean) if the process requires some external API data.
What layer should be responsible of implementing the automated process between aggregates that live in different domains / micro services.
Thank you in advance, and I'm also sorry if what I've wrote sounds confusing or it's too much of text, however any answers about layering the DDD applications and proper locations of the components i would highly appreciate.
I will try to put it clear. If you use CQRS:
In the Write Side (commands): The application services are the command handlers. A cmd handler accesses the domain (repositories, aggreagates, etc) in order to implement a use case.
If the use case needs to access data from another bounded context (microservice), it uses an infraestructure service (via dependency injection). You define the infraestructure service interface in the application service layer, and the implementation in the infra layer. The infra then access the remote microservice via http rest for example. Or integration through events.
In the Read Side (queries): The application service is the query method (I think you call it projection), which access the database directly. There's no domain here.
Hope it helps.
I do agree your wording might be a bit vague, but a couple of things do pop up in my mind which might steer you in the right direction.
Mind you, the wording makes it so that I am not 100% sure whether this is what you're looking for. If it isn't, please comment and correct my on the answer I'll provide, so I can update it accordingly.
Now, before your actual question, I'd firstly like to point out the following.
What I am guessing you're mixing is the notion of the Messages and your Domain Model belonging to the same layer. To me personally, the Messages (aka your Commands, Events and Queries) are your public API. They are the language your application speaks, so should be freely sharable with any component and/or service within your Bounded Context.
As such, any component in your 'application layer' contained in the same Bounded Context should be allowed to be aware of this public API. The one in charge of the API will be your Domain Model, that's true, but these concepts have to be shared to be able to communicate with one another.
That said, the component which will provide the states to your aggregate can be viewed from two directions I think.
It's a component that handles a specific 'Start Pokemon Match' Command. This component has the smarts to know to firstly retrieve the states prior to being able to dispatch a Create and ProvidePokemonStats command, thus ensuring it'll consistently create a working match with the stats in it by not dispatching any of both of the external stats-retrieval API fails.
Your angle in the question is to have an Event Handling Component that reacts on the creation of a Match. From here, I'd state a short-lived saga would be in place, as you'd need to deal with the fault scenario of not being able to retrieve the stats. A regular Event Handler is likely to lean to deal with this correctly.
Regardless of the two options you select, this service will deal with messages, a.k.a. your public API. As such it's within your application and not a component others will deal with directly, ever.
When it comes to your second question, I feel the some notion still holds. Two distinct applications/microservices only more so suggests your talking about two different Bounded Contexts. Certainly then a Saga would be in place to coordinate the operations between both contexts. Note that between Bounded Contexts, you want to share consciously when it comes to the public API, as you'd ideally not expose everything to the outside world.
Hope this helps you out and if not, like I said, please comment and provide me guidance how to answer your question properly.

Duplicate business logic in front-end with ddd microservice back-end

Here's an abstract question with real world implications.
I have two microservices; let's call them the CreditCardsService and the SubscriptionsService.
I also have a SPA that is supposed to use the SubscriptionsService so that customers can subscribe. To do that, the SubscriptionsService has an endpoint where you can POST a subscription model to create a subscription, and in that model is a creditCardId that points to a credit card that should pay for the subscription. There are certain business rules that say whether or not you can use said credit card for the subscription (expiration is more than 12 months away, it's a VISA, etc). These specific business rules are tied to the SubscriptionsService
The problem is that the team working on the SPA want a /CreditCards endpoint in the SubscriptonsService that returns all valid credit cards of the user that can be used in the subscriptions model. They don't want to implement the same business validation rules in the SPA that are in the SubscriptionsService itself.
To me this seems to go against the SOLID principles that are central to microservice design; specifically separation of concerns. I also ask myself, what precedent is this going to set? Are we going to have to add a /CreditCards endpoint to the OrdersService or any other service that might use creditCardId as a property of it's model?
So the main question is this: What is the best way to design this? Should the business validation logic be duplicated between the frontend and the backend? Should this new endpoint be added to the SubscriptionsService? Should we try to simplify the business logic?
It is a completely fair request and you should offer that endpoint. If you define the rules for what CC is valid for your service, then you should offer any and all help dealing with it too.
Logic should not be repeated. That tend to make systems unmaintainable.
This has less to do with SOLID, although SRP would also say, that if you are responsible for something then any related logic also belongs to you. This concern can not be separated from your service, since it is defined there.
As a solution option, I would perhaps look into whether I can get away with linking to the CC Service since you already have one. Can I redirect the client with a constructed query perhaps to the CC Service to get all relevant CCs, without actually knowing them in the Subscription Service.
What is the best way to design this? Should the business validation
logic be duplicated between the frontend and the backend? Should this
new endpoint be added to the SubscriptionsService? Should we try to
simplify the business logic?
From my point of view, I would integrate "Subscription BC" (S-BC) with "CreditCards BC" (CC-BC). CC-BC is upstream and S-BC is downstream. You could do it with REST API in CC-BC, or with a message queue.
But what I validate is the operation done with a CC, not the CC itself, i.e. validate "is this CC valid for subscription". And that validation is in S-BC.
If the SPA wants to retrieve "the CCs of a user that he/she can use for subscription", it is a functionality of the S-BC.
The client (SPA) should call the the S-BC API to use that functionality, and the S-BC performs the functionality getting the CCs from the CC-BC and doing the validation.
In microservices and DDD the subscriptions service should have a credit cards endpoint if that is data that is relevant to the bounded context of subscriptions.
The creditcards endpoint might serve a slightly different model of data than you would find in the credit cards service itself, because in the context of subscriptions a credit card might look or behave differently. The subscriptions service would have a creditcards table or backing store, probably, to support storing its own schema of creditcards and refer to some source of truth to keep that data in good shape (for example messages about card events on a bus, or some other mechanism).
This enables 3 things, firstly the subscriptions service wont be completely knocked out if cards goes down for a while, it can refer to its own table and work anyway. Secondly your domain code will be more focused as it will only have to deal with the properties of credit cards that really matter to solving the current problem. Finally if your cards store can even have extra domain specific properties that are computed and materialized on store.
Obligatory Fowler link : Bounded Context Pattern
Even when the source of truth is the domain model and ultimately you must have validation at the
domain model level, validation can still be handled at both the domain model level (server side) and
the UI (client side).
Client-side validation is a great convenience for users. It saves time they would otherwise spend
waiting for a round trip to the server that might return validation errors. In business terms, even a few
fractions of seconds multiplied hundreds of times each day adds up to a lot of time, expense, and
frustration. Straightforward and immediate validation enables users to work more efficiently and
produce better quality input and output.
Just as the view model and the domain model are different, view model validation and domain model
validation might be similar but serve a different purpose. If you are concerned about DRY (the Don’t
Repeat Yourself principle), consider that in this case code reuse might also mean coupling, and in enterprise applications it is more important not to couple the server side to the client side than to
follow the DRY principle. (NET-Microservices-Architecture-for-Containerized-NET-Applications book)

Design pattern to address invitation race condition

Before I begin describing the problem I'm facing, let me assure you that I've checked if there already is another thread where this has been talked about. After about 5-6 tries clicking on suggestions, I gave up, since it's hard to get an idea from threads with generic names like "What design pattern can I use?"
So I've given this question as descriptive a title as I could come up with. The reason for my concern about this being asked already is that it feels like it should be a fairly common problem (surely others would've encountered this in their client-server program).
=====
So here's my problem...
I've got a single server S, and several clients C1, C2, ..., Cn.
A client can do 1 of three things at any given time:
Create an event.
Invite other clients to created events.
Accept or reject invitations to events created by other clients.
A client sees names for events they've created (and possibly invited other clients to) as well as names for events they've accepted invitations to. The server processes all invitations; when a client invites another client to an event, the invitation goes through S but S knows nothing about an event E other than the name associated with it, the inviting client, and the invited clients. Let's symbolise the name of an event E as |E|.
Now for two events Ea and Eb, |Ea| != |Eb| does not imply Ea != Eb. That is, just because two events have different names does not mean they are different. I won't formally define what makes two events the same here, but as a use-case, let's say two events are the same if they have the same location/time. However the server never knows this info remember, only the clients do, but the clients may not communicate well enough beforehand with each other and so may choose different names to represent the same (intended) event.
My problem: I want to avoid a situation where a client Ca accepts an invitation from a client Cb to an event Eb, and Cb accepts an invitation from Ca to Ea, where Ea = Eb. This would lead to each client seeing both |Ea| and |Eb|, which actually represent the same event.
Question: How do I avoid the above? Is there a design pattern that can work on the server alone, client alone, or both server and client together? The solution can include dialogs/prompts for clients.
=====
A practical implementation for such a client-server setup could be discussion topics as events and employees as clients. Imagine a situation where Craig and Matt are colleagues who rarely see each other. They suddenly realise that their boss had asked them to look into why a recent software upgrade wasn't working for some of their customers. But neither knows the other person has been asked to look into the issue as well. So Craig creates the event 'Discuss recent upgrade', and Matt (who's done bit more research than Craig) suspecting it to be an (ahem) Adobe issue, creates 'Investigate new Adobe add-on'. They both invite each other to these topics, and both being very polite, readily accept. Confusion ensues.
How about the solution running core component # server?
I am trying to map this problem with the standard scheduling problem, with a variation of informing the conflicting appointments and finding conflict logic.
Here server keeps the list of events(or appointments). When a request for a new event comes, a conflict check is made. If a conflict is found (may be location and time or even something else - abstract this with a strategy pattern). Server completely conceals the complexity with a simple facade pattern.
#Client side a conflict can be coded to have dialog getting the user feedback to create a new or use existing event.
The next key thing will be, what and how an event will be stored. Again this majorly depends on use case. If we are designing for an event with following data
Name, Date, Start time, Location, Organizer, Description
The storage # server can be typical 'Outlook based calendar' way or 'list of events # buckets of time'.

Resources