I'm learning the CQRS pattern, as we are going to use it in our new project, and I have a few questions so far:
Example task: I'll have a cron command to fetch information from different providers (different APIs), and the responsibilities of this cron command are:
fetch data from all providers;
make additional API calls to get images and videos;
process those videos and images (store them to AWS S3 and record them in the uploads table in the DB);
fetch existing data from the DB;
transform new API data into system entities, update existing entities and delete entities that no longer exist;
persist to the DB.
CQRS-related questions:
Can I have a few CQRS commands and queries inside one system request? In the example above I need to get existing data from the DB (query), persist data (command), and so on.
What about the logic of fetching data from APIs: can I consider it a CQRS query, since it's a process of getting data? Or is a CQRS query only the process of getting data from internal storage, not from an external API?
What about the process of storing videos to S3 and storing information to the uploads table: can I consider storing assets to S3 a CQRS command that returns the data I need to store later to uploads? I do not want to store it immediately, as the upload entity is part of an aggregate whose root is the main info entity. I know a command should return nothing or an entity ID, but here it would return all the data about the stored assets.
If the answers to all the questions above are yes, then I can make:
query to fetch API data
query to get existing data
command to process images/videos
command to insert/update/delete data
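To make it concrete, this is roughly how I imagine wiring it up (all function names are invented for illustration, with stubs so the sketch is self-contained):

// Stubs standing in for the real implementations (provider APIs, S3, DB):
const fetchProviderDataQuery   = async () => [/* provider records */];
const getExistingEntitiesQuery = async () => [/* rows from our DB */];
const processAssetsCommand     = async (apiData) => [/* stored asset info */];
const syncEntitiesCommand      = async (apiData, existing, assets) => {};

async function syncProvidersCron() {
  const apiData  = await fetchProviderDataQuery();      // query? (external APIs)
  const existing = await getExistingEntitiesQuery();    // query (our DB)
  const assets   = await processAssetsCommand(apiData); // command that returns data?
  await syncEntitiesCommand(apiData, existing, assets); // command (insert/update/delete)
}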
Don't judge me too strictly; I'm in the process of learning the concepts of DDD and related patterns, and I'm just asking about what isn't clear to me. Thank you very much.
Can I have a few CQRS commands and queries inside one system request?
In the example above I need to get existing data from the DB (query),
persist data (command), and so on.
No, you cannot. Each request is either one command or one query.
What about the logic of fetching data from APIs: can I consider it a
CQRS query, since it's a process of getting data? Or is a CQRS query
only the process of getting data from internal storage, not from an
external API?
Commands and queries refer to the local database. Fetching data from external services through a remote API is an integration with another BC (bounded context); see the DDD context mapping patterns.
What about the process of storing videos to S3 and storing information
to the uploads table: can I consider storing assets to S3 a CQRS
command that returns the data I need to store later to uploads?
Storing videos to S3 is not a command; it is an integration with an external service. You will have to integrate (again, a context mapping pattern).
I do not want to store it immediately, as the upload entity is part of
an aggregate whose root is the main info entity.
I don't know your domain model, but if uploads is a child entity in an aggregate, then storing things in your uploads table isn't a command either. A command refers to the aggregate; storing info in the uploads table would be part of that command.
IN CONCLUSION:
A command or a query is a transactional operation at the application layer boundary (an application service). They deal with data from your DB; each command/query is a transaction.
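As a rough sketch of that shape (every name below is a placeholder, not a real API), the cron use case from the question would orchestrate integrations around a single command, with the command itself being one transaction against the local DB:

// Integrations (provider APIs, S3) happen around the command;
// the command proper is one transaction over the aggregate(s).
async function handleSyncProviders({ providerClient, s3, db }) {
  const apiData = await providerClient.fetchAll(); // integration, not a query
  const assets  = await s3.storeAssets(apiData);   // integration, not a command

  await db.transaction(async (tx) => {             // the command: one transaction
    const existing = await tx.findAggregates(apiData.map((d) => d.id));
    for (const record of apiData) {
      const aggregate = existing.get(record.id) || tx.newAggregate(record.id);
      aggregate.applyProviderData(record);          // domain logic
      aggregate.attachUploads(assets[record.id]);   // child entities (uploads)
      await tx.save(aggregate);
    }
    // ...and remove aggregates that no longer exist upstream.
  });
}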
I have a Firebase storage bucket set up for the primary purpose of storing users' profile pictures. Fetching the profile picture of the currentUser is simple, as I know the .uid. However, fetching the profile pictures of other users is not so straightforward, as that first requires a query to my actual database (in this case a graph database) before I can even begin fetching their images. This process is aggravated by my backend having a three-tier architecture.
So my current process is this:
GET request to Node.js backend
Node.js queries graph database
Node.js sends data to frontend
frontend iteratively fetches profile pictures using the other users' uids
What seems slow is the fact that my frontend has to wait for the other uids before it can even begin fetching the images. Is this unavoidable? Ideally, the images would be fetched concurrently with the info about the users.
The title here is "Firebase fetching other user's Images efficiently", but you're using a non-Firebase database, which makes it a little difficult.
The way I believe you could handle this in Firebase/Firestore would be to have duplicate data (pretty common with NoSQL databases).
Example:
Say you have a timeline feed. You probably wouldn't query the list of posts and then query user info for each of the posts. Instead, I would have a list of timeline posts for a given UID (the customer accessing the system right now), and that list would include all the details needed to display the feed without another query: the users' names, the post description, and a link to their pictures based off a known bucket and directory structure plus the UIDs, something like gs://<my-bucket>/user-images/<a-uid>.jpg. Again, I don't have much exposure to graph databases, so I'm not sure how applicable the technique is there, but I believe it could work the same.
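For instance, a sketch of that idea (bucket name and field names are made up): each feed item carries the duplicated user details, and the image URL is derived from the UID rather than queried:

// Derive the picture location from a known bucket layout -- no lookup needed.
function profileImageUrl(uid) {
  return `https://storage.googleapis.com/my-bucket/user-images/${uid}.jpg`;
}

// A timeline post stored with duplicated user info:
const feedItem = {
  postId: 'abc123',
  text: 'hello world',
  authorUid: 'uid_42',
  authorName: 'Chris',                    // duplicated from the user record
  authorImage: profileImageUrl('uid_42'), // derivable, no extra query
};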
As it says in the documentation for the Microsoft Bot Framework, they have different types of data: dialogData, privateConversationData, conversationData and userData.
By default, it seems userData is/should be prepared to handle persistence across nodes, whereas dialogData should be used for temporary data.
As it says here: https://learn.microsoft.com/en-us/bot-framework/nodejs/bot-builder-nodejs-dialog-waterfall
If the bot is distributed across multiple compute nodes, each step of
the waterfall could be processed by a different node, therefore it's
important to store bot data in the appropriate data bag
So, basically, if I have two nodes, how/why should I use dialogData at all, if I cannot guarantee it will be kept across nodes? It seems that if you have more than one node, you should just use userData.
I've asked the docs team to remove the last portion of the sentence: "therefore it's important to store bot data in the appropriate data bag". It is misleading. The Bot Builder is RESTful and stateless. Each of dialogData, privateConversationData, conversationData and userData is stored in the State Service, so any "compute node" will be able to retrieve the data from any of these objects.
Please note: the default Connector State Service is intended only for prototyping, and should not be used with production bots. Please use the Azure Extensions or implement a custom state client.
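For example, with the botbuilder-azure package the custom state client looks roughly like this (table and account names are placeholders):

var builder = require('botbuilder');
var azure = require('botbuilder-azure');

// Point the bot's state at your own Azure Table storage instead of the
// default Connector State Service:
var tableClient = new azure.AzureTableClient('botdata', 'myAccount', 'myKey');
var tableStorage = new azure.AzureBotStorage({ gzipData: false }, tableClient);

var connector = new builder.ChatConnector({
    appId: process.env.MICROSOFT_APP_ID,
    appPassword: process.env.MICROSOFT_APP_PASSWORD
});
var bot = new builder.UniversalBot(connector)
    .set('storage', tableStorage); // dialogData/userData etc. now persist here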
This blog post might also be helpful: Saving State data with BotBuilder-Azure in Node.js
I have a scenario where there are multiple (~1000 - 5000) databases being created dynamically in CouchDB, similar to the "one database per user" strategy. Whenever a user creates a document in any DB, I need to hit an existing API and update that document. This need not be synchronous. A short delay is acceptable. I have thought of two ways to solve this:
Option 1 (see the sketch after these options):
Continuously listen to the changes feed of the _global_changes database.
Get the name of the DB which was updated from the feed.
Call the /{db}/_changes API with the since sequence (stored in Redis).
Fetch the changed document, call my external API and update the document.
Option 2:
Continuously replicate all databases into a single database.
Listen to the /_changes feed of this database.
Fetch the changed document, call my external API and update the document in the original database (I can easily keep track of which document originally belongs to which database).
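A rough sketch of option 1 using the nano CouchDB client and ioredis (callExternalApi stands in for the existing API):

const nano = require('nano')('http://admin:password@localhost:5984');
const Redis = require('ioredis');
const redis = new Redis();
const callExternalApi = async (doc) => doc; // placeholder for the real API call

// Follow _global_changes continuously; ids look like "updated:<dbname>".
nano.db.use('_global_changes').changesReader.start()
  .on('change', async (change) => {
    const [event, dbName] = change.id.split(':');
    if (event !== 'updated') return;

    const since = (await redis.get(`seq:${dbName}`)) || 0;
    const feed = await nano.db.changes(dbName, { since, include_docs: true });
    for (const row of feed.results) {
      const updated = await callExternalApi(row.doc);
      await nano.use(dbName).insert(updated); // write the result back
    }
    await redis.set(`seq:${dbName}`, feed.last_seq);
  });

Since the stored sequence is only advanced after the API calls succeed, a crash simply replays the batch, which gives at-least-once delivery to the API.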
Questions:
Does any of the above make sense? Will it scale to 5000 databases?
How do I handle failures? It is critical that the API be hit for all documents.
Thanks!
To start: I've tried Loopback. Loopback is nice, but it does not allow relations across multiple REST data services; instead, it makes a call to the initial data service and passes query parameters that ask it to perform the joined query.
Before I go reinventing the wheel and writing a massive wrapper around Loopback's loopback-rest-connector, I need to find out if there are any existing libraries or frameworks that already tackle this. My extensive Googling has turned up nothing so far.
In a true microservice environment, each service has its own database.
http://microservices.io/patterns/data/database-per-service.html
From this article:
Implementing queries that join data that is now in multiple databases
is challenging. There are various solutions:
Application-side joins - the application performs the join rather than
the database. For example, a service (or the API gateway) could
retrieve a customer and their orders by first retrieving the customer
from the customer service and then querying the order service to
return the customer’s most recent orders.
Command Query Responsibility Segregation (CQRS) - maintain one or more
materialized views that contain data from multiple services. The views
are kept by services that subscribe to events that each service
publishes when it updates its data. For example, the online store
could implement a query that finds customers in a particular region
and their recent orders by maintaining a view that joins customers and
orders. The view is updated by a service that subscribes to customer
and order events.
EXAMPLE:
I have 2 data microservices:
GET /pets - Returns an object like
{
  "name": "ugly",
  "type": "dog",
  "owner": "chris"
}
and on a completely different microservice....
GET /owners/{OWNER_NAME} - Returns the owner info
{
  "owner": "chris",
  "address": "under a bridge",
  "phone": "123-456-7890"
}
And I have an API-level microservice that is going to call these two data services. This is the microservice where I will apply this.
I'd like to be able to establish a model for Pet such that, when I query pets, upon a successful response from GET /pets it will "join" with owners (sending a GET /owners/{OWNER_NAME} for each response) and, to the user, simply return a list of pets that includes their owner's data.
So GET /pets (maybe something like Pets.find()) would return
{
  "name": "ugly",
  "type": "dog",
  "owner": "chris",
  "address": "under a bridge",
  "phone": "123-456-7890"
}
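For illustration, the hand-rolled application-side join I'm hoping a framework can generate for me looks something like this (base URLs are made up; assumes Node 18+ for global fetch and that GET /pets returns a list):

const PETS = 'http://pets-service/pets';
const OWNERS = 'http://owners-service/owners';

async function findPetsWithOwners() {
  const pets = await (await fetch(PETS)).json();
  const names = [...new Set(pets.map((p) => p.owner))];
  // Fetch all owners concurrently instead of one by one:
  const owners = await Promise.all(
    names.map((n) => fetch(`${OWNERS}/${n}`).then((r) => r.json()))
  );
  const byName = new Map(owners.map((o) => [o.owner, o]));
  // Merge each owner's fields into the matching pet record:
  return pets.map((p) => ({ ...p, ...byName.get(p.owner) }));
}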
Applying any model/domain logic in your API gateway is a bad decision and is considered bad practice. The API gateway should only handle your system's CAS (relying on an Auth service which holds that logic), convert incoming external requests into inner system requests (different headers / requester payload data), proxy the formatted requests to services for any other work, receive the responses, take care of encapsulating errors, and present every response in the proper external form.
Another point: if a lot of joins between two models are required for the application's core flow (validation/scoping etc.), then perhaps you should reconsider which business domain your models/services are bound to. If it's the same domain, perhaps they should live together. The principles of Domain-Driven Design helped me understand where the real boundaries between microservices are.
If you work with Loopback (like we do, and you face the same problem we faced: Loopback has no proper join implementation), you can have a separate report/combined-data service, which is the only one that can access all the service databases, and only for READ purposes, i.e. queries. Provide it with separately set-up, read-only, wide access to the DBs: instead of having only one datasource set up (a single database), it should be able to read from all the databases that are in scope of this query-join DB user.
Such a service should be able to generate proper joins with the expected output schema from configuration JSON, much like Loopback models (that's what I did in the same situation). Once the abstraction is done, it's pretty simple to build/add any query with any complex joins. It's clean, easy to reason about, and DBA-friendly. For me this approach has worked well so far.
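For example, a join definition in such a configuration could look something like this (the shape is invented for illustration):

const petsWithOwnersReport = {
  base:  { datasource: 'petsDb', model: 'Pet' },
  joins: [
    { datasource: 'ownersDb', model: 'Owner', on: { 'Pet.owner': 'Owner.owner' } }
  ],
  // Output schema of the combined, read-only query:
  fields: ['Pet.name', 'Pet.type', 'Owner.address', 'Owner.phone']
};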
We've got an application in Django running against a PGSQL database. One of the functions we've grown to support is real-time messaging to our UI when data is updated in the backend DB.
So... for example we show the contents of a customer table in our UI, as records are added/removed/updated from the backend customer DB table we echo those updates to our UI in real-time via some redis/socket.io/node.js magic.
Currently we've rolled our own solution for this entire thing using overloaded save() methods on the Django table models. That actually works pretty well for our current needs, but as tables grow into GBs of data, it is starting to slow down on some larger tables while our engine digs through the currently 'subscribed' UIs and works out which updates need to be messaged to which clients.
Curious what other options might exist here. I believe MongoDB and other NoSQL-type engines support constructs like this out of the box, but I'm not finding an exact hit when Googling for better solutions.
Currently we've rolled our own solution for this entire thing using
overloaded save() methods on the Django table models.
Instead of working on the app level you might want to work on the lower, database level.
Add a PostgreSQL trigger after row insertion, and use pg_notify to notify external apps of the change.
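The trigger side could look roughly like this, run once via node-postgres (table and channel names are placeholders that match the snippet below; note that NOTIFY payloads are limited to about 8 kB):

var { Client } = require('pg');

var setupSql = `
  CREATE OR REPLACE FUNCTION notify_customer_insert() RETURNS trigger AS $$
  BEGIN
    PERFORM pg_notify('channelName', row_to_json(NEW)::text);
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER customer_insert
  AFTER INSERT ON customer
  FOR EACH ROW EXECUTE PROCEDURE notify_customer_insert();
`;

var client = new Client({ connectionString: 'postgres://username@localhost/database' });
client.connect()
  .then(function () { return client.query(setupSql); })
  .then(function () { return client.end(); });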
Then in NodeJS:
var PGPubsub = require('pg-pubsub');
var pubsubInstance = new PGPubsub('postgres://username@localhost/database');

pubsubInstance.addChannel('channelName', function (channelPayload) {
  // Handle the notification and its payload
  // If the payload was JSON it has already been parsed for you
});
See the pg-pubsub documentation for more details.
And you can do the same in Python: https://pypi.python.org/pypi/pgpubsub/0.0.2.
Finally, you might want to look at data partitioning in PostgreSQL. Long story short, PostgreSQL already has everything you need :)