Azure table storage inverse relationship

I am using Azure table storage (note: NOT Azure SQL) and I have the following situation:
In my application I have a number of organisations that 'invite' users, and on the invite there is an associated 'Role' and 'Expiry'. Once an organisation has invited a user, I want the org to see the list of users that it has invited, and I want the user to see a list of organisations that they have been invited to.
I think that in my application, and in this case, the numbers would be low (i.e. an org would only invite a few users, and a user will generally only be invited by one org). However, is there a general pattern that people use to deal with this situation even with very large numbers?

I have three approaches that I currently use, depending on my needs:
Transactional
I store the forward and inverse relationship on the same partition... this means that EVERY entity is on the same partition (i.e. this method is rate limited by a single partition), but it means you can use a batch transaction to insert the forward and inverse relationship at the same time, so you know they will always be consistent.
public class OrganisationInvite : TableEntity
{
    // PartitionKey - string.Empty
    // RowKey - "Invite_" + OrganisationId + "_" + UserId
    public string Role { get; set; }
    public DateTime Expiry { get; set; }
}
public class OrganisationRequest : TableEntity
{
    // PartitionKey - string.Empty
    // RowKey - "Request_" + UserId + "_" + OrganisationId
    public string Role { get; set; }
    public DateTime Expiry { get; set; }
}
To query I use t.RowKey.StartsWith("Request_...") or t.RowKey.StartsWith("Invite_...") depending on whether I want the list of a user's or an org's invites.
I guess this is best used when the data is very critical.
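To make the batch idea concrete, here is a minimal sketch of inserting the forward and inverse entities together, assuming the classic Microsoft.WindowsAzure.Storage.Table SDK (which the TableEntity base class suggests); organisationId, userId, role, expiry and the table reference are placeholders, not from the original post:

// Both entities share the empty PartitionKey, so they can go in one batch
// and will be committed (or rejected) atomically.
var batch = new TableBatchOperation();
batch.Insert(new OrganisationInvite
{
    PartitionKey = string.Empty,
    RowKey = "Invite_" + organisationId + "_" + userId,
    Role = role,
    Expiry = expiry
});
batch.Insert(new OrganisationRequest
{
    PartitionKey = string.Empty,
    RowKey = "Request_" + userId + "_" + organisationId,
    Role = role,
    Expiry = expiry
});
await table.ExecuteBatchAsync(batch); // both rows succeed or fail together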
Eventual Consistency
I give both entities all the properties, but they live on different partitions. This gives you awesome scalability, but you lose the transaction. I use a messaging queue to update the inverse relationship to match the forward relationship, so eventually the data will match (but for a while it may not).
// Assume both in the same table, thus the prefix on the partition key
public class OrganisationInvite : TableEntity
{
    // PartitionKey - "Invite_" + OrganisationId
    // RowKey - UserId
    public string Role { get; set; }
    public DateTime Expiry { get; set; }
}
public class OrganisationRequest : TableEntity
{
    // PartitionKey - "Request_" + UserId
    // RowKey - OrganisationId
    public string Role { get; set; }
    public DateTime Expiry { get; set; }
}
To query I use t.PartitionKey == "Request_..." or t.PartitionKey == "Invite_..." depending on whether I want the list of a user's or an org's invites. Perhaps you would consider one of these the 'source of truth', so when a user accepts the invite you would look up the 'source of truth' and give the user that role etc.
This is the most scalable solution, and especially makes sense if you are using caching on top of it.
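As an illustration of the queued update, here is a rough sketch of the write path under the same classic storage SDK assumption; invitesTable, requestsQueue and the message shape are hypothetical names introduced for the example:

// 1. Write the forward relationship immediately.
var invite = new OrganisationInvite
{
    PartitionKey = "Invite_" + organisationId,
    RowKey = userId,
    Role = role,
    Expiry = expiry
};
await invitesTable.ExecuteAsync(TableOperation.Insert(invite));

// 2. Queue a message so a worker can create the inverse row later.
await requestsQueue.AddMessageAsync(new CloudQueueMessage(
    JsonConvert.SerializeObject(new { organisationId, userId, role, expiry })));

// 3. The worker (a separate process) dequeues the message and writes the inverse entity
//    (PartitionKey = "Request_" + userId, RowKey = organisationId), using
//    InsertOrReplace so retries stay idempotent.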
Source of truth
In this case I only give the properties to one entity, and only keep the keys of the inverse relationship on the other. You would give the full entities to whichever list is longest or queried the most... in this case I would say that is the invites for an org. Like the eventual consistency method, you would queue the creation of the inverse entity. This method gives you complete data consistency except for the short window after you add a new relationship (before the inverse relationship has been created), and it is highly scalable - there is a higher cost to read the inverse list, though.
// Assume both in the same table, thus the prefix on the partition key
public class OrganisationInvite : TableEntity
{
    // PartitionKey - "Invite_" + OrganisationId
    // RowKey - UserId
    public string Role { get; set; }
    public DateTime Expiry { get; set; }
}
public class OrganisationRequest : TableEntity
{
    // PartitionKey - "Request_" + UserId
    // RowKey - OrganisationId
}
You can trivially query the forward relationship using t.PartitionKey == "Invite_...". The inverse relationship is not trivial, though. You have to query using t.PartitionKey == "Request_..." and then make n parallel calls to fetch each item's forward data (in this case using the org id found in the inverse relationship's RowKey). If the forward item does not exist, you do not add it to your final list. This ensures that if the org changes a user's role, for example, the user will see this change on the next hit.
I think this method is useful if the inverse relationship is used rarely and it is critical that the data is up to date (I'm thinking of user permissions, etc.).
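For completeness, a sketch of that inverse read path (classic storage SDK assumed; table and userId are placeholders): query the key-only request rows, then fetch each forward entity in parallel and drop any that no longer exist.

// Key-only rows for this user's invitations.
var requests = table.CreateQuery<OrganisationRequest>()
    .Where(r => r.PartitionKey == "Request_" + userId)
    .ToList();

// The RowKey of each request is the OrganisationId, so fetch the forward
// entity for each one in parallel.
var lookups = requests.Select(r =>
    table.ExecuteAsync(TableOperation.Retrieve<OrganisationInvite>("Invite_" + r.RowKey, userId)));
var results = await Task.WhenAll(lookups);

// Skip relationships whose forward entity has been removed.
var invites = results
    .Select(r => r.Result as OrganisationInvite)
    .Where(i => i != null)
    .ToList();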

Related

Cosmos Document query to locate latest record per type

I have a collection where I am storing each asset's location and the timestamp at which it was recorded, using the following class:
public class TrackingInfo
{
    [JsonProperty("id")]
    public string Id { get; set; }
    [JsonProperty("_partition_key")]
    public string _PartitionKey { get; set; }
    [JsonProperty("asset_id")]
    public string AssetId { get; set; }
    [JsonProperty("unix_timestamp")]
    public double UnixTimestamp { get; set; }
    [JsonProperty("timestamp")]
    public string Timestamp { get; set; }
    [JsonProperty("location")]
    public Point Location { get; set; }
}
which is partitioned by _PartitionKey which contains a construct like this:
tracking._PartitionKey = $"tracking_{tracking.AssetId.ToLower()}_{DateTime.Today.ToString("D")}";
Looks like there is no way to do a Group by on the collection.
Can someone please help me create a SQL document query to find the latest entry for each AssetId, along with its Location and the Timestamp when the data was recorded?
Update 1:
what if I change the _PartitionKey to represent per day something like below:
tracking._PartitionKey = $"tracking_{DateTime.Today.ToString("D")}";
would it make it easier to get all assets and their latest tracking records?
As per my comment, my suggestion would be to solve your problem differently.
Assumption: You have a large number of assetIds and don't know the values beforehand:
Have one document that represents the latest state of your asset
Have another document that represents the location events of your asset
Update the first document whenever there is a new location event
You can put both types of documents in the same collection or separate them - both approaches have benefits. I would probably separate them.
Then do a query "what assets are within 1km of xxx" (Querying spatial types)
Sidenote: It might be a good idea to use the assetId as the partitionKey instead of your combined key. A combined key like that is very bad for queries.
If you only have very few assetIds, you can query per assetId, ordering by the timestamp field descending and taking only the first result, which returns just the latest item.
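For that few-assets case, here is a hedged sketch using the older DocumentDB SDK (the same DocumentClient style as the stored-procedure example in the next answer); the database/collection names and the assetId variable are assumptions:

// Latest tracking document for one known assetId: newest first, take one.
var query = client.CreateDocumentQuery<TrackingInfo>(
        UriFactory.CreateDocumentCollectionUri("db1", "coll1"),
        new SqlQuerySpec(
            "SELECT TOP 1 * FROM c WHERE c.asset_id = @assetId ORDER BY c.unix_timestamp DESC",
            new SqlParameterCollection { new SqlParameter("@assetId", assetId) }),
        new FeedOptions { EnableCrossPartitionQuery = true })
    .AsDocumentQuery();

var latest = (await query.ExecuteNextAsync<TrackingInfo>()).FirstOrDefault();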
Cosmos DB doesn't support a GROUP BY feature; you could vote for that feature request.
As a workaround, there is a third-party package, documentdb-lumenize, which supports grouping; it has a .NET example:
string configString = @"{
    cubeConfig: {
        groupBy: 'state',
        field: 'points',
        f: 'sum'
    },
    filterQuery: 'SELECT * FROM c'
}";
Object config = JsonConvert.DeserializeObject<Object>(configString);
dynamic result = await client.ExecuteStoredProcedureAsync<dynamic>("dbs/db1/colls/coll1/sprocs/cube", config);
Console.WriteLine(result.Response);
You could group by the assetId column and get the max timestamp.
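Purely as an illustration of that idea, the config above could be adapted along these lines; whether documentdb-lumenize accepts 'max' as the aggregation function is an assumption to verify against its documentation:

string configString = @"{
    cubeConfig: {
        groupBy: 'asset_id',
        field: 'unix_timestamp',
        f: 'max'
    },
    filterQuery: 'SELECT * FROM c'
}";
Object config = JsonConvert.DeserializeObject<Object>(configString);
dynamic result = await client.ExecuteStoredProcedureAsync<dynamic>("dbs/db1/colls/coll1/sprocs/cube", config);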

Query data across multiple repositories in DDD

I am using multiple aggregate roots inside a DDD bounded context.
For example
public class OrderAggregate
{
    public int ID { get; set; }
    public string Order_Name { get; set; }
    public int Created_By_UserID { get; set; }
}
public class UserAggregate
{
    public int ID { get; set; }
    public string Username { get; set; }
    public string First_Name { get; set; }
    public string Last_Name { get; set; }
}
I am using a SQL relational database to persist domain objects. Each aggregate root matches one repository.
In case I would like to find an order that was created by John Doe (a search across multiple aggregates), what would be the DDD way to go?
add First_Name and Last_Name into OrderAggregate in order to add a FindByUserFirstLastName method to OrderRepository, but that could raise data consistency issues between the two aggregate roots
create a raw sql query and access DB directly in order to span search accross "repositories"
use "finders" in order to join entities directly from DB
replicate the data necessary for the query into a new aggregate root, such as
public class QueryOrderAggregate
{
    public int ID { get; set; }
    public string Order_Name { get; set; }
    public int Created_By_UserID { get; set; }
    public string First_Name { get; set; }
    public string Last_Name { get; set; }
}
In case I would like to find an order that was created by John Doe (a search across multiple aggregates), what would be the DDD way to go?
Almost the same way that it goes with accessing an aggregate...
You create a Repository that provides the view/report (whatever the name for that concept is in your domain). It probably uses the UserId as the key to identify the report. In the implementation of the repository, the implementation can do whatever makes sense -- a SQL join is a reasonable starting point.
The View/Report is basically a value type; it is immutable, can provide data, but doesn't have any methods or any direct access to the aggregate roots. For example, the view might include the OrderId, but to actually get at the order aggregate root you would have to invoke a method on that repository.
A view that spans multiple aggregates is perfectly acceptable, because you can't actually modify anything using the view. Changes to the underlying state still go through the aggregate roots, which provide the consistency guarantees.
The view is a representation of a stale snapshot of your data. Consumers should not be expecting it to magically update -- if you want something more recent, go back to the repository for a new copy.
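A minimal sketch of that shape, with hypothetical names (OrderCreatedByUserView, IOrderReportRepository) chosen only for illustration:

// Immutable read model spanning Order and User data; no behaviour, no aggregate access.
public sealed class OrderCreatedByUserView
{
    public int OrderId { get; }
    public string Order_Name { get; }
    public string First_Name { get; }
    public string Last_Name { get; }

    public OrderCreatedByUserView(int orderId, string orderName, string firstName, string lastName)
    {
        OrderId = orderId;
        Order_Name = orderName;
        First_Name = firstName;
        Last_Name = lastName;
    }
}

public interface IOrderReportRepository
{
    // The implementation is free to use a plain SQL join, e.g.:
    //   SELECT o.ID, o.Order_Name, u.First_Name, u.Last_Name
    //   FROM Orders o JOIN Users u ON u.ID = o.Created_By_UserID
    //   WHERE u.First_Name = @first AND u.Last_Name = @last
    IReadOnlyList<OrderCreatedByUserView> FindByUserName(string firstName, string lastName);
}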

Entity Framework Linking tables

I'm using Entity Framework 5.0.
Scenario
"Organisation" has a list of "clients" and a list of "Periods" and a "CurrentPeriodID" At the start of each period some or all of the "Clients" are associated with that "Period", this I have done using a link table and this works OK so when I do "Organisation->Period->Clients" I get a list of "Clients" for the "Period".
Next I need to add some objects ("Activities") to the "Clients" for a "Period" so I get "Organisation->Period->Client->Activates" this won’t be the only one there will eventually be several other navigation properties that will need to be added to the "Clients" and the "Activities" and all of them have to be "Period" related, I also will have to be able to do (if possible) "Organisation->Period-Activities".
Question
What would be the best way of implementing the "Activities" for "Organisation->Period->Client"? I don't mind which way it is done (Code First, reverse engineering, etc.). Also, on creation of the "Organisation" object, could I load the current "Period" object using the "CurrentPeriodID" value which is stored in the "Organisation" object?
Thanks
To me this sounds like you need an additional entity that connects Period, Client and Activity; let's call it ClientActivityInPeriod. This entity - and the corresponding table - would have three foreign keys and three references (and no collections). I would make the primary key of that entity a composition of the three foreign keys because that combination must be unique, I guess. It would look like this (in Code-First style):
public class ClientActivityInPeriod
{
    [Key, ForeignKey("Period"), Column(Order = 1)]
    public int PeriodId { get; set; }
    [Key, ForeignKey("Client"), Column(Order = 2)]
    public int ClientId { get; set; }
    [Key, ForeignKey("Activity"), Column(Order = 3)]
    public int ActivityId { get; set; }
    public Period Period { get; set; }
    public Client Client { get; set; }
    public Activity Activity { get; set; }
}
All three foreign keys are required (because the properties are not nullable).
Period, Client and Activity can have collections referring to this entity (but they don't need to), for example in Period:
public class Period
{
    [Key]
    public int PeriodId { get; set; }
    public ICollection<ClientActivityInPeriod> ClientActivities { get; set; }
}
You can't have navigation properties like a collection of Clients in Period that would contain all clients that have any activities in the given period, because it would require a foreign key from Client to Period or a many-to-many link table between Client and Period. That foreign key or link table would only be populated if the client has activities in that Period. Neither EF nor the database is going to help you with such business logic. You would have to program it yourself and ensure that the relationship is updated correctly when activities are added to or removed from the period - which is error prone and a risk to your data consistency.
Instead you would fetch the clients that have activities in a given period (period 1 in this example) with a query, not a navigation property, for example:
var clientsWithActivitiesInPeriod1 = context.Periods
    .Where(p => p.PeriodId == 1)
    .SelectMany(p => p.ClientActivities.Select(ca => ca.Client))
    .Distinct()
    .ToList();
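For the "Organisation->Period->Activities" navigation the question asks about, a similar query works. This is a hedged sketch that assumes a DbSet<ClientActivityInPeriod> named ClientActivityInPeriods on the context and that organisation.CurrentPeriodID holds the current period's key:

// All activities linked to any client in the organisation's current period.
var activitiesInCurrentPeriod = context.ClientActivityInPeriods
    .Where(cap => cap.PeriodId == organisation.CurrentPeriodID)
    .Select(cap => cap.Activity)
    .Distinct()
    .ToList();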

DDD: Modeling simple domain with two aggregate roots

Let's say I want to create an auction web site where members would be able to bid for items. To model this domain I have three classes: Member, Item and Bid.
My brainstorming would go something like this:
Item can contain multiple bids
Bid is associated with one Item and one Member
Member can contain multiple bids
Member and Item can exist without bid instance
Bid instance can't exist without both Member and Item
Considering all this it is obvious that since Member and Item objects are independent we can consider them aggregate roots. Bid will be part of one of these aggregate. That is clear but what is confusing to me right now is which aggregate root should I choose? Item or Member?
This is an example from the Pro ASP.NET MVC 3 Framework book by Apress; the way they did it gives the following code:
public class Member
{
    public string LoginName { get; set; } // The unique key
    public int ReputationPoints { get; set; }
}
public class Item
{
    public int ItemID { get; private set; } // The unique key
    public string Title { get; set; }
    public string Description { get; set; }
    public DateTime AuctionEndDate { get; set; }
    public IList<Bid> Bids { get; set; }
}
public class Bid
{
    public Member Member { get; set; }
    public DateTime DatePlaced { get; set; }
    public decimal BidAmount { get; set; }
}
Member and Item are aggregate roots here and Bid is contained within Item.
Now let's say that I have the application use case: "Get all bids posted by a specific member". Does that mean that I would have to first get all Items (e.g. from the database via a repository interface) and then enumerate all bids for each Item trying to find a matching Member? Isn't that a bit inefficient? So a better way would then be to aggregate Bid objects inside of Member. But in that case consider the new use case: "Get all bids for a specific item". Now again we need to go the other way around to get all bids...
So taking into account that I need to implement both of these use cases in my application, what is the right and efficient way to model this domain then?
Your domain should really reflect only Command (CQRS) requirements (update/change data). I presume that you need Queries (read data, no update/change of data): "Get all bids for a specific item" and "Get all bids posted by a specific member". So this "querying" has nothing to do with the domain, as the query implementation is independent of the command implementation (a command is calling a domain method). This gives you the freedom to implement each query in an efficient way. My approach is to implement an efficient DB view getting you only the data you want to display in the UI. Then you create a new class called BidForItemDto (DTO = data transfer object) and you map data from the DB view into a collection of BidForItemDto (you can do it manually via ADO.NET or use NHibernate (preferred, does everything for you)). The same for the second query: create a new class called BidPostedByMemberDto.
So, if it is queries you need, just forget about the domain, realize that it's just data you want to display in the UI, and query it efficiently from the DB. Only when you do some action in the UI (click a button to place a bid, for instance) does it execute a command, "place a bid", which would in the end call the domain method Item.PlaceBid(Member member, DateTime date, decimal amount). And btw, IMHO it is the Item which "has many bids", and the domain method "place bid" would surely need to access previous bids to implement the whole logic correctly. Placing a bids collection inside Member does not make much sense to me...
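As a rough sketch of the read-side DTO described above (the property names simply mirror the view columns below and are an assumption, not taken from the original answer):

public class BidForItemDto
{
    public int ItemId { get; set; }
    public int BidId { get; set; }
    public int MemberId { get; set; }
    public DateTime DatePlaced { get; set; }
    public decimal BidAmount { get; set; }
}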
Off the top of my head, some examples of DB views and SQL queries:
Get all bids for specific item:
create view BidForItemDto
as
select
    i.ItemId,
    b.BidId,
    b.MemberId,
    b.DatePlaced,
    b.BidAmount
from Item i
join Bid b ON b.ItemId = i.ItemId
query:
SELECT *
from BidForItemDto
where ItemId = <provide item id>
Get all bids posted by specific member:
create view BidPostedByMemberDto
as
select
    m.MemberId,
    b.BidId,
    b.ItemId,
    b.DatePlaced,
    b.BidAmount
from Member m
join Bid b ON b.MemberId = m.MemberId
query:
SELECT *
from BidPostedByMemberDto
where MemberId = <provide member id>

Azure table storage - pattern for parent-child (self referencing schema)

I'm using Windows Azure Table Storage (WATS) and trying to update the app to use Azure. I've read many articles and am not sure of the best approach for this, that is, parent to child in a self-referencing model.
i.e. a single parent message could have many child sub-messages. In a DB model, it would be a self-referencing table.
How would I best structure this for WATS so that when I make a query "give me 10 parent records", it will also return all the child messages belonging to those parents?
The entity for the message / sub-message is as below. I've tried to define the PK and RK as follows:
public class TextCacheEntity : AzureTableEntity // custom table inherits AzureTableEntity
{
    public override void GenerateKeys()
    {
        PartitionKey = string.Format("{0}_{1}_{2}", MessageType, AccountId.PadThis(), ParentMessageId);
        RowKey = string.Format("{0}_{1}", DateOfMessage.Ticks.ReverseTicks(), MessageId);
    }
    public string MessageType { get; set; }
    public int AccountId { get; set; }
    public DateTime DateOfMessage { get; set; }
    public string MessageId { get; set; }
    public string ParentMessageId { get; set; }
    // other properties...
}
I thought of an implementation where the child messages store the ParentMessageId, and for a parent the ParentMessageId would be empty.
The pattern would then be
Get the parent messages:
.Where(o => o.PartitionKey == "Parent_000000000000001_").Take(10)
Get the child messages, iterating through all the parent messages using a parallel for loop:
.Where(o => o.PartitionKey == "Child_000000000000001_" + parentMessageId)
But the problem is that this will result in 11 queries!
See this example by Scott Densmore:
http://scottdensmore.typepad.com/blog/2011/04/multi-entity-schema-tables-in-windows-azure.html
You can do this by using the same PK for both. There are a couple of reasons to do this, but one good one is that you can then also issue batch commands for parent and children at once and achieve a type of consistent transaction. Also, when they share the same PK within the same table, it means they are going to be colocated and served from the same partition. You are less likely to get continuation tokens (but you should still expect them). To differentiate between parent and children you can either add an attribute or use the RowKey, perhaps.
The only trick to this (and the model you already have) is that if the parent and children are not the same CLR type, you will have issues with serialization in WCF Data Services. You can fix this of course by creating an uber-CLR type that has both child and parent properties, or you can override serialization with the ReadingEntity event and handle it yourself.
Anyhow, use the same PK for both children and parent. Then when you search PK ranges you will always get parents and children returned at once (you can discriminate with a Where clause predicate if you wish).
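A rough sketch of what that key scheme could look like in the question's own GenerateKeys method; the exact layout (the "Parent_"/"Child_" row-key prefixes in particular) is illustrative, not prescribed by the answer:

public override void GenerateKeys()
{
    // Parent and children share the same partition, so one partition query
    // (or a batch operation) covers the whole message thread for an account.
    PartitionKey = AccountId.PadThis();

    // Differentiate parent and child via the RowKey: parents sort by date,
    // children group under their parent's id.
    RowKey = string.IsNullOrEmpty(ParentMessageId)
        ? string.Format("Parent_{0}_{1}", DateOfMessage.Ticks.ReverseTicks(), MessageId)
        : string.Format("Child_{0}_{1}", ParentMessageId, MessageId);
}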
