MikroORM inter-service transactions in NestJS

I am evaluating MikroORM for a future project. There are several questions to which I either could not find an answer in the docs or did not fully understand the answer.
Let me describe a minimal but non-trivial example (NestJS): I have an order processing system with two entities, Orders and Invoices, as well as a counter table for sequential invoice numbers (a legal requirement). It's important to mention that the OrderService create method is not always called by a controller; it is also called via a cronjob/queue system. My questions are about the use case of creating a new order:
class OrderService {
  async createNewOrder(orderDto) {
    const order = new Order();
    order.customer = orderDto.customer;
    order.items = orderDto.items;
    const invoice = await this.invoiceService.create(orderDto.items);
    order.invoice = invoice;
    await this.em.persistAndFlush(order); // em: injected EntityManager
    return order;
  }
}
class InvoiceService {
  async create(items): Promise<Invoice> {
    const invoice = new Invoice();
    invoice.number = await this.invoiceNumberService.getNextInSequence();
    // the next two lines are external APIs; if they throw, the whole transaction should roll back
    const pdf = await this.pdfCreator.createPdf(invoice);
    const upload = await s3Api.upload(pdf);
    return invoice;
  }
}
class InvoiceNumberService {
  async getNextInSequence(): Promise<number> {
    return db.collection("counter").findOneAndUpdate({ type: "INVOICE" }, { $inc: { value: 1 } });
  }
}
The whole use case of creating a new order, with all subsequent service calls, should happen in one MikroORM transaction. So if anything throws in OrderService.createNewOrder() or in one of the subsequently called methods, the whole transaction should be rolled back.
MikroORM does not allow the atomic update-increment shown in InvoiceNumberService, so I can fall back to the native MongoDB driver. But how do I ensure that the call to collection.findOneAndUpdate() shares the same transaction as the entities managed by MikroORM?
MikroORM needs a unique request context. In the examples for NestJS, this unique context is created at the controller level. In the example above, the service methods are not necessarily called by a controller. So I would need a new context for each call to OrderService.createNewOrder(), with a lifetime scoped to the function call, correct? How can I achieve this?
How can I share the same request context between services? In the example above, InvoiceService and InvoiceNumberService would need the same context as OrderService for MikroORM to work properly.

I will start with the bad news: MongoDB transactions are not yet supported in MikroORM (although they will probably land within weeks; the PoC is already implemented). You can subscribe here for updates: https://github.com/mikro-orm/mikro-orm/issues/34
But let me answer the rest, as it will apply once transactions land:
You can use const collection = (em as EntityManager<MongoDriver>).getConnection().getCollection('counter'); to get the collection from the internal mongo connection instance. You can also use orm.em.getTransactionContext() to get the current transaction context (currently implemented only in SQL drivers, but in the future this will probably return the session object in mongo).
Also note that in the mongo driver, implicit transactions won't be enabled by default (it will be configurable, though), so you will need to use explicit transaction demarcation via em.transactional(...).
The RequestContext helper works automatically. You just register it as a middleware (done automatically in the NestJS ORM adapter) and then your request handler (route/endpoint/controller method) is run inside a domain that shares the context. Thanks to this, all services in the DI can share singleton instances of repositories, but they will automatically pick the right context from the domain.
You basically have this automatic request context, and then you can create new (nested) contexts manually via em.transactional(...).
https://mikro-orm.io/docs/transactions/#approach-2-explicitly
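To illustrate the cron/queue entry point, here is a rough sketch (not taken from the docs; the exact RequestContext.create/createAsync naming and import path vary between MikroORM versions) of giving each non-HTTP call its own context, which is what the middleware does for HTTP requests:
// Sketch only: give a cron/queue-triggered call its own MikroORM context.
// `orm` is the MikroORM instance and `orderService` comes from the Nest DI container;
// both names are assumptions for this example.
import { MikroORM, RequestContext } from 'mikro-orm'; // '@mikro-orm/core' in later versions

export async function runCreateNewOrderFromQueue(orm: MikroORM, orderService: OrderService, orderDto: any) {
  await RequestContext.createAsync(orm.em, async () => {
    // every service resolved from the container picks up this context automatically,
    // so OrderService, InvoiceService and InvoiceNumberService share one EntityManager fork
    await orderService.createNewOrder(orderDto);
    // explicit transaction demarcation (em.transactional(...)) can be added around the call
    // once Mongo transactions are supported
  });
}
The services themselves stay singletons; only the EntityManager they resolve through the context is per-call.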

Related

How to use CosmosClient.CreateAndInitializeAsync() with CosmosClientBuilder.Build()

This YouTube video, at 27:20, talks about populating the cache with routing info to avoid latency during a cold start.
You can either try to get a document you know doesn't exist, or you can use CosmosClient.CreateAndInitializeAsync().
I already have this code set up:
private async Task<Container> CreateContainerAsync(string endpoint, string authKey)
{
    var cosmosClientBuilder = new CosmosClientBuilder(
            accountEndpoint: endpoint,
            authKeyOrResourceToken: authKey)
        .WithConnectionModeDirect(portReuseMode: PortReuseMode.PrivatePortPool, idleTcpConnectionTimeout: TimeSpan.FromHours(1))
        .WithApplicationName(UserAgentSuffix)
        .WithConsistencyLevel(ConsistencyLevel.Session)
        .WithApplicationRegion(Regions.AustraliaEast)
        .WithRequestTimeout(TimeSpan.FromSeconds(DatabaseRequestTimeoutInSeconds))
        .WithThrottlingRetryOptions(TimeSpan.FromSeconds(DatabaseMaxRetryWaitTimeInSeconds), DatabaseMaxRetryAttemptsOnThrottledRequests);

    var client = cosmosClientBuilder.Build();
    var databaseResponse = await CreateDatabaseIfNotExistsAsync(client).ConfigureAwait(false);
    var containerResponse = await CreateContainerIfNotExistsAsync(databaseResponse.Database).ConfigureAwait(false);
    return containerResponse;
}
Is there any way to incorporate CosmosClient.CreateAndInitializeAsync() with it to populate the cache?
If not, is it ok to do this to populate the cache?
public class CosmosClientWrapper
{
    public CosmosClientWrapper(IKeyVaultFacade keyVaultFacade)
    {
        var container = CreateContainerAsync(endpoint, authenticationKey).GetAwaiter().GetResult();
        // Get a document that doesn't exist to populate the routing info:
        container.ReadItemAsync<object>(Guid.NewGuid().ToString(), PartitionKey.None).GetAwaiter().GetResult();
    }
}
The point of CreateAndInitialize or BuildAndInitialize is to pre-establish the connections required to perform Data Plane operations to the desired containers (reference https://learn.microsoft.com/azure/cosmos-db/nosql/sdk-connection-modes#routing).
If the containers do not exist, using CreateAndInitialize or BuildAndInitialize makes no sense: there are no target backend endpoints to connect to, so no connections can be pre-established/warmed up. That is why the container/database information is required; the only benefit is warming up the connections to the backend machines that serve those containers.
Please see CosmosClientBuilder.BuildAndInitializeAsync, which creates the Cosmos client and initializes the provided containers. I believe this is what you are looking for.

external service result mutates state of aggregate

My problem is that I don't know how to handle external calls that mutate the state but also need validation before executing them.
Here is my command handler
public async Task<IAggregateRoot> ExecuteAsync(Command command)
{
    var sandbox = await _aggregateStore.GetByIdAsync<Sandbox>(command.SandboxId);
    var response = await _azureService.CreateRedisInstance(sandbox.Id);
    if (response.IsSuccess)
    {
        sandbox.CreateRedisDetails(response);
        return sandbox;
    }
    sandbox.FailSetup(response.Errors.Select(e => e.Message));
    return sandbox;
}
The problem here is that the sandbox aggregate needs to be in a correct state before calling the external service, and I cannot satisfy both concerns. My only idea is to create a separate method, CanCreateRedisInstance, that checks whether the aggregate state is valid and only then calls the external service. What I don't like is that I am introducing validation methods:
public async Task<IAggregateRoot> ExecuteAsync(Command command)
{
    var sandbox = await _aggregateStore.GetByIdAsync<Sandbox>(command.SandboxId);
    if (!sandbox.CanCreateRedisInstance())
    {
        throw new ValidationException("something");
    }
    var response = await _azureService.CreateRedisInstance(sandbox.Id);
    if (response.IsSuccess)
    {
        sandbox.CreateRedisDetails(response);
        return sandbox;
    }
    sandbox.FailSetup(response.Errors.Select(e => e.Message));
    return sandbox;
}
The other approach I thought of is to make the whole process more CQRS-ish.
public async Task<IAggregateRoot> ExecuteAsync(Command command)
{
    var sandbox = await _aggregateStore.GetByIdAsync<Sandbox>(command.SandboxId);
    sandbox.ScheduleRedisInstanceCreation();
    return sandbox;
}

public void ScheduleRedisInstanceCreation()
{
    if (RedisInstanceDetails != null)
    {
        throw new ValidationException("something");
    }
    RedisInstanceDetails = RedisInstanceDetails.Scheduled(/* some arguments */);
    AddEvent(new RedisInstanceCreationScheduled(/* some arguments */));
}
The RedisInstanceCreationScheduled event is sent to a queue and picked up by an event handler, which calls the external service and, based on the result, creates other events:
public async Task ExecuteAsync(RedisInstanceCreationScheduled @event)
{
    var sandbox = await _aggregateStore.GetByIdAsync<Sandbox>(@event.SandboxId);
    var response = await _azureService.CreateRedisInstance(sandbox.Id);
    if (response.IsSuccess)
    {
        sandbox.CreateRedisDetails(response);
    }
    else
    {
        sandbox.FailSetup(response.Errors.Select(e => e.Message));
    }
    _aggregateStore.Save(sandbox);
}
However, this approach adds some extra complexity, and I am not quite sure whether an event handler should modify an aggregate.
Both approaches are possible.
Why shouldn't the validation stay in the handler? When you change something in the domain, the domain object also validates the action and denies it if it's not possible. Here you just need to interact with an external service to verify it.
The external service is just an interface in the domain layer that you implement with a concrete class in the infrastructure layer. Hence you will not have a direct binding to Azure, but a service, let's say CloudService, whose implementation uses Azure. This allows you to throw domain-related exceptions from classes that live in the infrastructure layer.
The CQRS approach is also valid, but you have to take care when you use it.
You can, for example, start a saga in which you ask the external service to create the instance (CreateRedisInstance) and then, according to the event you get back (success or failure), proceed with the next handler. But you really have to take care of the intermediate states: what should be done to handle failures between the two actions? You also need a rollback of the first action if the second one ends in failure.
That said, I would go with the first approach if there is no real need to handle a complex process. Moreover, it looks like everything is related to the same domain (no cross-domain actions are required), so there is no real need to add the complexity of a saga in which every success/failure status must be handled correctly.

Cloud Functions and Cloud Firestore USER ASSOCIATION FUNCTION

In short, imagine I have a Cloud Firestore DB where I store some user data such as email, geo-location data (as a geopoint) and some other things.
In Cloud Functions I have "myFunc", which tries to "link" two users based on a geo-query (I use GeoFirestore for it).
Now everything works well, but I cannot figure out how to avoid this kind of situation:
User A calls myFunc trying to find a person to be associated with, and finds User B as a possible match.
At the same time, User B calls myFunc too, trying to find a person to be associated with, BUT finds User C as a possible match.
In this case User A would be associated with User B, but User B would be associated with User C.
I already have a field called "associated" set to FALSE on each user initialization, that becomes TRUE whenever a new possible association has been found.
But this code cannot guarantee the right association if User A and User B trigger the function at the same time, because at the moment the function triggered by User A finds User B, the "associated" field of B will still be set to false, since B is still searching and has not found anybody yet.
I need to find a solution, otherwise I'll end up with wrong associations (User A pointing at User B, but User B pointing at User C).
I also thought about adding a snapshot listener to the user who is searching, so that if another user updated the searching user's document I could terminate the function, but I'm not really sure it would work as expected.
I'd be incredibly grateful if you could help me with this problem.
Thanks a lot!
Cheers,
David
HERE IS MY CODE:
exports.myFunction = functions.region('europe-west1').https.onCall(async (data, context) => {
  const userDoc = await firestore.collection('myCollection').doc(context.auth.token.email).get();
  if (!userDoc.exists) {
    return null;
  }
  const userData = userDoc.data();
  if (userData.associated) { // IF THE USER HAS ALREADY BEEN ASSOCIATED
    return null;
  }
  const latitude = userData.g.geopoint["latitude"];
  const longitude = userData.g.geopoint["longitude"];
  // Create a GeoQuery based on a location
  const query = geocollection.near({ center: new firebase.firestore.GeoPoint(latitude, longitude), radius: userData.maxDistance });
  // Run the query and keep the first user that has not been associated yet
  let otherUser = null;
  const value = await query.get();
  for (const doc of value.docs) {
    const data = doc.data();
    if (!data.associated && !otherUser) {
      otherUser = data; // SAVE ONLY THE FIRST USER FOUND
    }
  }
  // HERE I HAVE TO RETURN AN .update() OF DATA ON 2 DOCUMENTS, IN ORDER TO UPDATE THE "associated"
  // AND THE "userAssociated" FIELDS OF THE USER WHO WAS SEARCHING AND THE USER FOUND
  return ........update({
    associated: true,
    userAssociated: otherUser.name
  });
}); // END FUNCTION
You should use a transaction in your Cloud Function. Since Cloud Functions use the Admin SDK in the back end, transactions in a Cloud Function use pessimistic concurrency controls.
Pessimistic transactions use database locks to prevent other operations from modifying data.
See the doc for more details. In particular, you will read that:
In the server client libraries, transactions place locks on the documents they read. A transaction's lock on a document blocks other transactions, batched writes, and non-transactional writes from changing that document. A transaction releases its document locks at commit time. It also releases its locks if it times out or fails for any reason.
When a transaction locks a document, other write operations must wait for the transaction to release its lock. Transactions acquire their locks in chronological order.
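A minimal sketch of how that could look here, assuming the Admin SDK and the collection/field names from the question (the helper name associateUsers is made up): both user documents are re-read inside the transaction and only linked if both are still unassociated, so two concurrent invocations cannot both claim the same user.
// Sketch only: link two users atomically with the Admin SDK's pessimistic transactions.
// `db` is admin.firestore(); collection and field names follow the question.
async function associateUsers(db, searchingEmail, candidateEmail) {
  const userRef = db.collection('myCollection').doc(searchingEmail);
  const otherRef = db.collection('myCollection').doc(candidateEmail);
  return db.runTransaction(async (t) => {
    const [userSnap, otherSnap] = await t.getAll(userRef, otherRef); // these reads take locks
    if (!userSnap.exists || !otherSnap.exists) {
      return null;
    }
    const user = userSnap.data();
    const other = otherSnap.data();
    if (user.associated || other.associated) {
      return null; // another invocation won the race; retry with a different candidate
    }
    t.update(userRef, { associated: true, userAssociated: other.name });
    t.update(otherRef, { associated: true, userAssociated: user.name });
    return other.name;
  });
}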

should I open/close different Postgres connections in one node endpoint? making this work with OOP

I'm setting up the ability for my Node server to load the proper information from my DB (Postgres) to render a certain client view. I'm currently refactoring my server code to follow an object-oriented approach with class constructors.
I currently have it so that readers are a class of functions responsible for, well, running read queries on my database. I have inherited classes like MainViewReader and MatchViewReader, and they all inherit from a "Reader" class which instantiates a connection to Postgres using the pg-promise library.
The issue with this is that I can't use two view readers or they will open duplicate connections, so I find myself writing redundant code. I believe I have two design choices, and I was wondering which is more efficient:
Instead of organizing the readers by servlet view, organize them by the table they read, i.e. NewsTableReader, MatchTableReader. The pro is that none of the code is redundant and it can be used in different servlets; the con is that I would have to end the connection to Postgres on every instance of the Reader class before instantiating a new one, as such:
const newsTableReader = new NewsTableReader();
await newsTableReader.close();
const matchTableReader = new MatchTableReader();
await matchTableReader.close();
Just having view readers. The pro is that this is only one persisting connection; the con is that there is a lot of redundant code if I'm loading data from the same tables in different views, for example:
const matchViewReader = new MatchViewReader();
await matchViewReader.load_news();
await matchViewReader.load_matches();
Which approach is going to affect my performance negatively the most?
You've correctly ascertained that you should not create multiple connection pools with the same connection options. But this doesn't have to influence the structure of your code.
You could create a global pool and pass it to your Reader constructors as a kind of dependency injection:
class Reader {
  constructor(db) {
    this._db = db;
  }
}

class NewsTableReader extends Reader {}
class MatchTableReader extends Reader {}
const pgp = require('pg-promise')(/* library options */);
const db = pgp(/* connection options */);

const newsTableReader = new NewsTableReader(db);
const matchTableReader = new MatchTableReader(db);

await newsTableReader.load();
await matchTableReader.load();
// await Promise.all([newsTableReader.load(), matchTableReader.load()]);
Another way to go is to use the same classes with the extend event of the pg-promise library:
const pgp = require('pg-promise')({
  extend(obj, dc) {
    obj.newsTableReader = new NewsTableReader(obj);
    obj.matchTableReader = new MatchTableReader(obj);
  }
});
const db = pgp(/* connection options */);

await db.newsTableReader.load();

await db.tx(async t => {
  const news = await t.newsTableReader.load();
  const match = await t.matchTableReader.load();
  return { news, match };
});
The upside of the extend event is that you can use all of the functionality (e.g. transactions and tasks) provided by the pg-promise library across different models. The thing to keep in mind is that it creates new objects on every db.task(), db.tx() and db.connect() call.
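For completeness, a minimal sketch of what the readers' load() implementations could look like in either variant (the queries and table names are purely illustrative, not from the question):
// Illustrative only: the table names are assumptions.
class NewsTableReader extends Reader {
  load() {
    return this._db.any('SELECT * FROM news');
  }
}

class MatchTableReader extends Reader {
  load() {
    return this._db.any('SELECT * FROM matches');
  }
}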

One connection per user

I know that this question was asked already, but it seems that some more things have to be clarified. :)
The database is designed so that each user has the proper privileges to read documents, so the pool needs connections for different users, which falls outside the usual connection-pool concept. For optimization and performance I need to run a so-called "user preparation", which includes setting session variables, calculating and caching values, etc., and only then execute queries.
For now, I have two solutions. In the first solution, I first check that everything is prepared for the user and then execute one or more queries. If it is not prepared, I need to call the "user preparation" first and then execute the query or queries. With this solution I lose a lot of performance because I have to do the check every time, so I've decided on another solution.
The second solution uses a "database pool" where each pool is for one user. Only on the first connection (useCount === 0; I do not use {direct: true}) do I call the "user preparation" (a stored procedure that sets some session variables and prepares the cache), and then I execute SQL queries.
I do the user preparation in the connect event within the initOptions parameter used to initialize pgPromise. I used the pg-promise-demo, so I do not need to explain the rest of the code.
The code for pgp initialization with the wrapper of database pooling looks like this:
import * as promise from "bluebird";
import pgPromise from "pg-promise";
import { IDatabase, IMain, IOptions } from "pg-promise";
import { IExtensions, ProductsRepository, UsersRepository, Session, getUserFromJWT } from "../db/repos";
import { dbConfig } from "../server/config";

// pg-promise initialization options:
export const initOptions: IOptions<IExtensions> = {
  promiseLib: promise,
  async connect(client: any, dc: any, useCount: number) {
    if (useCount === 0) {
      try {
        await client.query(pgp.as.format("select prepareUser($1)", [getUserFromJWT(session.JWT)]));
      } catch (error) {
        console.error(error);
      }
    }
  },
  extend(obj: IExtensions, dc: any) {
    obj.users = new UsersRepository(obj);
    obj.products = new ProductsRepository(obj);
  }
};

type DB = IDatabase<IExtensions> & IExtensions;
const pgp: IMain = pgPromise(initOptions);

class DBPool {
  private pool = new Map();
  public get = (ct: any): DB => {
    const checkConfig = { ...dbConfig, ...ct };
    const { host, port, database, user } = checkConfig;
    const dbKey = JSON.stringify({ host, port, database, user });
    let db: DB = this.pool.get(dbKey) as DB;
    if (!db) {
      // const pgp: IMain = pgPromise(initOptions);
      db = pgp(checkConfig) as DB;
      this.pool.set(dbKey, db);
    }
    return db;
  }
}

export const dbPool = new DBPool();

import diagnostics = require("./diagnostics");
diagnostics.init(initOptions);
And the web API looks like:
GET("/api/getuser/:id", (req: Request) => {
  const user = getUserFromJWT(session.JWT);
  const db = dbPool.get({ user });
  return db.users.findById(req.params.id);
});
I'm interested in whether the source code instantiates pgp correctly, or whether it should be instantiated within the if block inside the get method (the commented line)?
I've seen that pg-promise uses a DatabasePool singleton exported from dbPool.js, which is similar to my DBPool class but exists to emit "WARNING: Creating a duplicate database object for the same connection". Is it possible to use that DatabasePool singleton instead of my dbPool singleton?
It seems to me that dbContext (the second parameter in the pgp initialization) could solve my problem, but only if it could be passed as a function, not as a value or object. Am I wrong, or can dbContext be dynamic when accessing a database object?
I wonder if there is a third (better) solution, or any other suggestion.
If you are troubled by this warning:
WARNING: Creating a duplicate database object for the same connection
but your intent is to maintain a separate pool per user, you can indicate so by providing any unique parameter for the connection. For example, you can include a custom property with the user name:
const cn = {
  database: 'my-db',
  port: 12345,
  user: 'my-login-user',
  password: 'my-login-password',
  ....
  my_dynamic_user: 'john-doe'
};
This will be enough for the library to see that there is something unique in your connection, which doesn't match the other connections, and so it won't produce that warning.
This will work for connection strings as well.
Please note that what you are trying to achieve can only work well when the total number of connections well exceeds the number of users, for example up to 100 connections for up to 10 users: you can then allocate 10 pools, each with up to 10 connections. Otherwise the scalability of your system will suffer, as the total number of connections is a very limited resource; you would typically never go beyond 100 connections, because running so many physical connections concurrently creates excessive load on the CPU. That's why sharing a single connection pool scales much better.
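As a rough sketch of how that sizing advice could be combined with the DBPool class from the question (the max option is passed through to the underlying pg pool; the 10-connection figure and the helper name are assumptions):
// Sketch only: derive a per-user connection from the base config.
function getDbForUser(userName: string) {
  return dbPool.get({
    user: userName,            // the per-user login role, as in the question
    my_dynamic_user: userName, // unique marker that avoids the duplicate-object warning
    max: 10                    // assumed per-user pool size, so 10 users stay within ~100 connections
  });
}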
