I have an Order aggregate root class containing child value objects:
class Order {
  val id: String
  val lines: Seq[OrderLine]
  val destination: Destination
  // ...other fields omitted
}
This is a CQRS read model, represented by an order-search microservice that is responsible for searching orders by some filter.
There is an OrderApplicationService that uses an OrderRepository (I am not sure it is a pure repository in DDD terms):
trait OrderRepository {
  def search(filter: OrderFilter): Seq[Order]
  def findById(orderId: String): Order
}
and an ElasticSearchOrderRepository, which uses Elasticsearch (ES) as the search engine.
Due to new requirements I need a new API method for the UI that searches for all destinations across the orders by some filter. It should be a /destinations endpoint that calls the repository to find all the data. Performance is important in this case, so searching for all orders and then mapping them to destinations doesn't seem like a good solution.
What is the most appropriate way to solve this?
1. Add a new method to OrderRepository, e.g. def searchOrderDestinations(filter: DestinationFilter): Seq[Destination]
2. Create a new repository:
trait OrderDestinationRepository {
  def searchOrderDestinations(filter: DestinationFilter): Seq[Destination]
}
The same goes for the application service: do I need to create a new DestinationAppService?
Are these options applicable? Or maybe there is some better solution?
Thanks in advance!
This is a CQRS read model
Perfect - create and update a list of your orders indexed by destination, and use that to serve the query results.
Think "relational database that includes the data you need to create the view, and an index". Queries go to the database, which acts as a cache for the information. A background (async) process runs to update the information in the database.
How often you run that process will depend on how stale the data in the view can be. How bad is it for the business if the view shows results as of 10 minutes ago? as of 1 minute ago? as of an hour ago?
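A minimal sketch of such a destination-indexed view, written in Python purely for illustration. The names OrderRow, DestinationView, apply and search are hypothetical, an in-memory dictionary stands in for the database or ES index, and the prefix filter is only a placeholder for whatever DestinationFilter actually does:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRow:
    """Hypothetical flattened order as the background process receives it."""
    order_id: str
    destination: str  # stand-in for the Destination value object


class DestinationView:
    """Read model that keeps destinations indexed, so queries never scan orders."""

    def __init__(self):
        self._orders_by_destination = defaultdict(set)
        self._destination_of = {}

    def apply(self, order):
        # Called by the background process for each new or changed order.
        previous = self._destination_of.get(order.order_id)
        if previous is not None and previous != order.destination:
            # The order moved: drop it from its old destination entry.
            self._orders_by_destination[previous].discard(order.order_id)
            if not self._orders_by_destination[previous]:
                del self._orders_by_destination[previous]
        self._destination_of[order.order_id] = order.destination
        self._orders_by_destination[order.destination].add(order.order_id)

    def search(self, prefix):
        # Placeholder filter: destinations starting with the given prefix.
        return sorted(d for d in self._orders_by_destination if d.startswith(prefix))


view = DestinationView()
for row in [OrderRow("o1", "Berlin"), OrderRow("o2", "Bern"), OrderRow("o3", "Berlin")]:
    view.apply(row)
```

The /destinations endpoint would then call only search, and how often apply runs determines how stale the results can be.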
Amazon QLDB allows querying the version history of a specific object by its ID. However, it also allows deleting objects. It seems like this can be used to bypass versioning by deleting and creating a new object instead of updating the object.
For example, let's say we need to track vehicle registrations by VIN.
INSERT INTO VehicleRegistration
<< {
'VIN' : '1N4AL11D75C109151',
'LicensePlateNumber' : 'LEWISR261LL'
} >>
Then our application can get a history of all LicensePlateNumber assignments for a VIN by querying:
SELECT * FROM _ql_committed_VehicleRegistration AS r
WHERE r.data.VIN = '1N4AL11D75C109151';
This will return all non-deleted document revisions, giving us an unforgeable history. The history function can be used similarly if you remember the document ID from the insert. However, if I wanted to maliciously bypass the history, I would simply delete the object and reinsert it:
DELETE FROM VehicleRegistration AS r WHERE VIN = '1N4AL11D75C109151';
INSERT INTO VehicleRegistration
<< {
'VIN' : '1N4AL11D75C109151',
'LicensePlateNumber' : 'ABC123'
} >>
Now there is no record that I have modified this vehicle registration, defeating the whole purpose of QLDB. The document ID of the new record will be different from the old, but QLDB won't be able to tell us that it has changed. We could use a separate system to track document IDs, but now that other system would be the authoritative one instead of QLDB. We're supposed to use QLDB to build these types of authoritative records, but the other system would have the exact same problem!
How can QLDB be used to reliably detect modifications to data?
There would be a record of the original record and its deletion in the ledger, which would be available through the history() function, as you pointed out. So there's no way to hide the bad behavior. It's a matter of hoping nobody knows to look for it. Again, as you pointed out.
You have a couple of options here. First, QLDB rolled out fine-grained access control last week (announcement here). This would let you, say, prohibit deletes on a given table. See the documentation.
Another thing you can do is look for deletions or other suspicious activity in real-time using streaming. You can associate your ledger with a Kinesis Data Stream. QLDB will push every committed transaction into the stream where you can react to it using a Lambda function.
If you don't need real-time detection, you can do something with QLDB's export feature. This feature dumps ledger blocks into S3, where you can extract and process the data. The blocks contain not just your revision data but also the PartiQL statements executed in each transaction. You can set up an EventBridge scheduler to kick off a periodic export (say, of the day's transactions) and then churn through it to look for suspicious deletes, etc. This lab might be helpful for that.
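As a rough sketch of the "churn through it" step, the Python function below scans blocks for DELETE statements. It assumes the blocks have already been decoded from Ion into plain dictionaries (that decoding is omitted here), and the field names transactionId, transactionInfo, statements and statement are assumptions that should be checked against your own exported or streamed data:

```python
import re

# Matches statements that begin with DELETE (case-insensitive, leading whitespace allowed).
DELETE_STATEMENT = re.compile(r"^\s*DELETE\b", re.IGNORECASE)


def find_delete_statements(blocks):
    """Return (transaction_id, statement) pairs for every DELETE found.

    Assumes each block is a dict shaped like
    {"transactionId": ..., "transactionInfo": {"statements": [{"statement": ...}]}}.
    """
    findings = []
    for block in blocks:
        statements = block.get("transactionInfo", {}).get("statements", [])
        for entry in statements:
            text = entry.get("statement", "")
            if DELETE_STATEMENT.match(text):
                findings.append((block.get("transactionId"), text))
    return findings


# Hand-written sample data for illustration only, not real ledger output.
sample_blocks = [
    {
        "transactionId": "tx-1",
        "transactionInfo": {"statements": [{"statement": "INSERT INTO VehicleRegistration ?"}]},
    },
    {
        "transactionId": "tx-2",
        "transactionInfo": {
            "statements": [{"statement": "delete from VehicleRegistration AS r WHERE r.VIN = ?"}]
        },
    },
]
suspicious = find_delete_statements(sample_blocks)
```

The same function could run inside a Lambda fed by the stream or over an exported batch; only the decoding step in front of it would differ.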
I think the best approach is to manage it with permissions. Keep developers out of production or make them assume a temporary role to get limited access.
I need to either add a new document with children or add a child to an already existing parent.
The only way I know how to do it is ugly:
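// "CaseParams" is a placeholder for the elided parameter type;
// "params" is a reserved keyword in C# and cannot be used as an identifier.
public async Task AddOrUpdateCase(CaseParams caseParams)
{
    try
    {
        await UpdateCase(caseParams);
    }
    catch (RequestFailedException ex)
    {
        // Anything other than "document not found" is a real failure.
        if (ex.Status != (int)HttpStatusCode.NotFound)
            throw;
        await AddCase(caseParams);
    }
}
private async Task UpdateCase(CaseParams caseParams)
{
    // This line throws when the document is not found.
    var caseResponse = await searchClient.GetDocumentAsync<Case>(caseParams.CaseId);
    // Need to add to the existing collection.
    caseResponse.Value.Children.Add(caseParams.child);
}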
I think there wouldn't be any problem if this document didn't contain a collection. You cannot use MergeOrUpload if there are child collections; you need to load them from the index and add the element.
Is there a better way to do it?
Azure Cognitive Search doesn't support partial updates to collection fields, so retrieving the entire document, modifying the relevant collection field, and sending the document back to the index is the only way to accomplish this.
The only improvement I would suggest to the code you've shown is to search for the documents you want to update instead of retrieving them one-by-one. That way, you can update them in batches. Index updates are much more expensive than queries, so to reduce overhead you should batch updates together wherever possible.
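A minimal sketch of that batching idea, in Python for illustration. The field names CaseId and Children follow the question; add_children and batches are hypothetical helpers, and the actual search and upload calls are left out. The batch size reflects the service limit of 1000 documents per indexing request:

```python
from itertools import islice

MAX_BATCH_SIZE = 1000  # at most 1000 documents per indexing request


def add_children(documents, new_children_by_case_id):
    """Return full copies of the documents that gained children.

    The whole document is rebuilt because collection fields
    cannot be partially updated.
    """
    updated = []
    for doc in documents:
        extra = new_children_by_case_id.get(doc["CaseId"], [])
        if extra:
            children = list(doc.get("Children", [])) + list(extra)
            updated.append({**doc, "Children": children})
    return updated


def batches(items, size=MAX_BATCH_SIZE):
    """Yield successive chunks of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Documents as they might come back from one search query (sample data).
existing = [
    {"CaseId": "1", "Children": [{"Name": "a"}]},
    {"CaseId": "2", "Children": []},
]
updated = add_children(existing, {"1": [{"Name": "b"}]})
uploads = list(batches(updated))  # each chunk would go to one indexing call
```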
Note that if you have all the data needed to re-construct the entire document at indexing time, you can skip the step of retrieving the document first, which would be a big improvement. Azure Cognitive Search doesn't yet support concurrency control for updating documents in the index, so you're better off having a single process writing to the index anyway. This should hopefully eliminate the need to read the documents before updating and writing them back. This is assuming you're not using the search index as your primary store, which you really should avoid.
If you need to add or update items in complex collections often, it's probably a sign that you need a different data model for your index. Complex collections have limitations (see "Maximum elements across all complex collections per document") that make them impractical for scenarios where the cardinality of the parent-to-child relationship is high. For situations like this, it's better to have a secondary index that includes the "child" entities as top-level documents instead of elements of a complex collection. That has benefits for incremental updates, but also for storage utilization and some types of queries.
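To make the secondary-index idea concrete, here is a small Python sketch that turns one parent document into one top-level document per child. The ChildId field, the key format and the helper name to_child_documents are all assumptions made for illustration:

```python
def to_child_documents(case):
    """Build one top-level document per child of the given parent.

    Assumes every child carries its own "ChildId", so the key stays stable
    and a single child can later be added or replaced on its own.
    """
    return [
        {**child, "Id": f"{case['CaseId']}_{child['ChildId']}", "CaseId": case["CaseId"]}
        for child in case.get("Children", [])
    ]


case = {
    "CaseId": "42",
    "Children": [
        {"ChildId": "a", "Note": "first"},
        {"ChildId": "b", "Note": "second"},
    ],
}
child_documents = to_child_documents(case)
```

With this shape, adding a child means uploading one small document instead of reading and rewriting the whole parent.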
I have a backend API built with Express. I've implemented logging with winston and morgan.
My next requirement is to record a user's activity in a MySQL database: the timestamp, the user, and the content they have fetched or changed. I've searched the web and found this, but since it has no answer yet, I'm asking here.
My Thought:
I can add another query that INSERTs all the information mentioned above right before I respond to the client in my route handlers. But I'm curious whether there is a more elegant way to achieve it.
Select the approach that best suits your system from the following cases.
Decide whether your activity log should be persistent or in memory, based on your use case. Let's assume it is persistent and the database is MySQL.
If your data is already in the database, there is no point in storing all of it again. You can store just the keys/ids that identify the rows on which you performed CRUD operations. Store them as foreign keys if the set of operations is always fixed, or otherwise as serialised JSON in the activity table.
For instance, the structure could look like the one below, where activity_data is a serialised JSON value.
ID | activity_name | activity_data | start_date | end_date |
If gathering the data again right before sending the response is a struggle, consider applying the activity functions in the database abstraction layer or in the wrapper module created for MySQL (assuming there is one).
For instance:
try {
  await query(`SELECT * FROM products`);
  // performActivity(insertion)
} catch {
  // performErrorActivity(insertion)
}
Here we need to accept a minor performance trade-off, as we perform an insertion at each step.
If we want to do it all at once, we need to maintain a collection that accumulates references to all activities, in something like request.activityPayload or maybe a cache, and perform the insertion at the end.
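The "collect first, insert once" variant could look roughly like the Python sketch below. It is only an illustration: sqlite3 stands in for MySQL so the example runs on its own, ActivityRecorder is a hypothetical name playing the role of request.activityPayload, and the columns follow the table structure shown earlier:

```python
import json
import sqlite3
from datetime import datetime, timezone


class ActivityRecorder:
    """Collects activity entries during one request and writes them in one go."""

    def __init__(self):
        self._pending = []

    def record(self, name, data):
        # Simplification: start and end timestamps are both set to "now".
        now = datetime.now(timezone.utc).isoformat()
        self._pending.append((name, json.dumps(data), now, now))

    def flush(self, connection):
        # One batched insert, intended to run right before the response is sent.
        with connection:
            connection.executemany(
                "INSERT INTO activity (activity_name, activity_data, start_date, end_date) "
                "VALUES (?, ?, ?, ?)",
                self._pending,
            )
        written = len(self._pending)
        self._pending.clear()
        return written


# In-memory SQLite database as a stand-in for the real MySQL instance.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE activity ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, activity_name TEXT, "
    "activity_data TEXT, start_date TEXT, end_date TEXT)"
)

recorder = ActivityRecorder()
recorder.record("product.list", {"user_id": 7})
recorder.record("product.update", {"user_id": 7, "product_id": 42})
recorder.flush(connection)
```

In Express the same shape would typically live in a middleware that attaches the recorder to the request and flushes it when the response finishes.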
If you are thinking of adding a new data source specifically for activity, a non-relational DB is a good candidate for storing/dumping such data (MongoDB, in my opinion). It does not enforce a schema the way a relational DB does, and it can give you performance benefits compared to MySQL, specifically for storing activity.
I have a MongoDB database with 2 collections:
groups: { group_slug, members }
users: { id, display name, groups }
All changes to groups are made by changing the members array of the group to include the users' ids.
I want to sync these changes across to the users collection by using map/reduce. How can I output the results of map/reduce into an existing collection (but not merging or reducing)?
My existing code is here: https://gist.github.com/morgante/5430907
How can I output the results of map/reduce into an existing collection
You really can't do it this way. Nor is this really suggested behaviour. There are other solutions:
Solution #1:
1. Output the map / reduce into a temporary collection
2. Run a follow-up task that updates the primary data store from the temporary collection
3. Clean up the temporary collection
Honestly, this is a safe way to do this. You can implement some basic retry logic in the whole loop.
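The three steps of Solution #1 can be sketched as follows. This is plain Python for illustration only: lists and a dictionary stand in for the MongoDB collections, compute_user_groups stands in for the map/reduce job, and all function names are hypothetical:

```python
def compute_user_groups(groups):
    """Map/reduce stand-in: invert group -> members into user id -> group slugs."""
    result = {}
    for group in groups:
        for user_id in group["members"]:
            result.setdefault(user_id, []).append(group["group_slug"])
    return {user_id: sorted(slugs) for user_id, slugs in result.items()}


def sync_users(groups, users, temp):
    # Step 1: write the computed result into the temporary collection.
    temp.clear()
    temp.update(compute_user_groups(groups))
    # Step 2: update the primary store from the temporary collection.
    for user in users:
        user["groups"] = temp.get(user["id"], [])
    # Step 3: clean up the temporary collection.
    temp.clear()


groups = [
    {"group_slug": "admins", "members": [1]},
    {"group_slug": "writers", "members": [1, 2]},
]
users = [
    {"id": 1, "groups": []},
    {"id": 2, "groups": ["stale"]},
    {"id": 3, "groups": ["stale"]},
]
temp_collection = {}
sync_users(groups, users, temp_collection)
```

Because every step overwrites rather than appends, rerunning the whole function after a failure gives the same result, which is what makes simple retry logic safe here.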
Solution #2:
1. Put the change on a queue (e.g. "user subscribes to group").
2. Update both collections from separate workers that listen for such events on the queue.
This solution requires a separate piece (the queue), but any large system is going to have such denormalization problems, so this will not be the only place you see this.
I did a little experiment with storing child objects in Azure Table storage today.
Something like Person.Project where Person is the table entity and Person is just a POCO. The only way I was able to achieve this was by serializing the Project into byte[]. It might be what is needed, but is there another way around?
Thanks
Rasmus
Personally I would prefer to store the Project in a different table with the same partition key that its parent has, which is its Person's partition key. It ensures that the person and underlying projects will be stored in the same storage cluster. On the code side, I would like to have some attributes on top of the reference properties, for example [Reference(typeof(Person))] and [Collection(typeof(Project))], and in the data context class I can use some extension method to retrieve the child elements on demand.
In terms of the original question though, you certainly can store both parent and child in the same table - were you seeing an error when trying to do so?
One other thing you sacrifice by separating parent and child into separate tables is the ability to group updates into a transaction. Say you create a new 'person' and add a number of projects for that person: if they are in the same table with the same partition key, you can send the multiple inserts as one atomic operation. With a multi-table approach, you're going to have to manage atomicity yourself (if that's a requirement of your data consistency model).
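A small Python sketch of the constraint behind that atomic operation: a batch can only contain entities from one table that share a partition key, and it holds at most 100 entities. The helper name group_into_transactions and the sample row keys are hypothetical, and the actual submission through the storage SDK is left out:

```python
from collections import defaultdict

MAX_ENTITIES_PER_TRANSACTION = 100  # limit for one entity group transaction


def group_into_transactions(entities):
    """Split entities into lists that could each be sent as one atomic batch.

    Entities are grouped by PartitionKey and then chunked to the size limit.
    """
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)

    transactions = []
    for partition_entities in by_partition.values():
        for start in range(0, len(partition_entities), MAX_ENTITIES_PER_TRANSACTION):
            end = start + MAX_ENTITIES_PER_TRANSACTION
            transactions.append(partition_entities[start:end])
    return transactions


# A person and their projects stored in the same table under one partition key.
entities = [
    {"PartitionKey": "person-1", "RowKey": "person", "Name": "Ada"},
    {"PartitionKey": "person-1", "RowKey": "project-1", "Title": "Storage"},
    {"PartitionKey": "person-1", "RowKey": "project-2", "Title": "Search"},
    {"PartitionKey": "person-2", "RowKey": "person", "Name": "Grace"},
]
transactions = group_into_transactions(entities)
```

Here the first person and both projects land in one batch, so they are written together or not at all, while the second person needs a separate batch.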
I'm presuming that when you say person is just a POCO you mean Project is just a POCO?
My preferred method is to store the child object in its own Azure table with the same partition key and row key as the parent. The main reason is that this allows you to run queries against the child object if you have to. You can't run a single query that uses properties from both parent and child, but at least you can run queries against the child entity. Another advantage is that the child class can take up more space, because the limit on how much data you can store in a single property is lower than the amount you can store in a row.
If neither of these things are a concern for you, then what you've done is perfectly acceptable.
I came across a similar problem and implemented a generic object flattener/recomposer API that flattens your complex entities into flat EntityProperty dictionaries and makes them writeable to Table Storage in the form of a DynamicTableEntity.
The same API then recomposes the entire complex object from the EntityProperty dictionary of the DynamicTableEntity.
Have a look at: https://www.nuget.org/packages/ObjectFlattenerRecomposer/
Usage:
//Flatten complex object (of type ie. Order) and convert it to EntityProperty Dictionary
Dictionary<string, EntityProperty> flattenedProperties = EntityPropertyConverter.Flatten(order);
// Create a DynamicTableEntity and set its PK and RK
DynamicTableEntity dynamicTableEntity = new DynamicTableEntity(partitionKey, rowKey);
dynamicTableEntity.Properties = flattenedProperties;
// Write the DynamicTableEntity to Azure Table Storage using client SDK
//Read the entity back from AzureTableStorage as DynamicTableEntity using the same PK and RK
DynamicTableEntity entity = [Read from Azure using the PK and RK];
//Convert the DynamicTableEntity back to original complex object.
Order order = EntityPropertyConverter.ConvertBack<Order>(entity.Properties);
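To show the general idea of flattening and recomposing (not the library's actual implementation), here is a minimal Python sketch. The function names, the "_" delimiter and the sample data are assumptions, and the sketch only handles nested dictionaries whose keys do not themselves contain the delimiter:

```python
def flatten(value, prefix="", delimiter="_"):
    """Flatten nested dicts into a single-level dict with joined key names."""
    flat = {}
    for key, item in value.items():
        name = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(item, dict):
            flat.update(flatten(item, name, delimiter))
        else:
            flat[name] = item
    return flat


def recompose(flat, delimiter="_"):
    """Rebuild the nested dict from the flattened key names."""
    nested = {}
    for name, item in flat.items():
        *parents, leaf = name.split(delimiter)
        node = nested
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = item
    return nested


person = {
    "Name": "Ada",
    "Project": {"Title": "Storage", "Owner": {"Name": "Grace"}},
}
flat_person = flatten(person)
restored = recompose(flat_person)
```

Each flattened key maps naturally to one table property, which avoids serializing the child object into a byte[].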