DocumentDB - Storing telemetry data - Azure

A quick update on why I created this question.
We currently store the telemetry data from our devices in the field in Azure SQL Server. This works great (I have a ton of experience with EF, LINQ and relational DBs), BUT I'm aware this most likely isn't the best solution, especially for storing "big" data (the data is still small for now, but will grow within a year).
I have chosen DocumentDB as our possible solution for storing just our event history. The rest will stay in SQL - users, profiles, device info, SIM, vehicle etc. - as I don't want to completely halt development while we move 100% across to DocumentDB, and would rather just do what's best short term for cost and performance.
Going through this video, I finally came up with a possible solution for storing telemetry data - https://www.youtube.com/watch?v=-o_VGpJP-Q0
It recommends one document per time period (the example used one per hour). Is this still the recommended approach?
// (class wrapper added for completeness; the name is illustrative)
public class TelemetryEvent
{
    [Index]
    public DateTime TimestampUtc { get; set; }
    public DateTime ReceivedTimestampUtc { get; set; }

    [Index]
    public EventType EventType { get; set; }
    public Guid ConnectionId { get; set; }
    public string RawEventMessage { get; set; }

    [Index]
    public Sender Sender { get; set; }

    [Index]
    public Channel Channel { get; set; }

    public DbGeography Location { get; set; }
    public double? Speed { get; set; }
    public double? Altitude { get; set; }
    public short? Heading { get; set; }
    public byte? HDOP { get; set; }          // horizontal dilution of precision
    public byte? GPSFixStatus { get; set; }
    public byte? GPSFixType { get; set; }

    public string Serial { get; set; }
    public string HardwareVersion { get; set; }
    public string FirmwareVersion { get; set; }

    public string Relay1 { get; set; }
    public string Relay2 { get; set; }
    public string Relay3 { get; set; }
    public string Ign { get; set; }
    public string Doors { get; set; }
    public string Input1 { get; set; }
    public string Input2 { get; set; }
    public string Out1 { get; set; }
    public string Out2 { get; set; }
    public int V12 { get; set; }
    public int VBat { get; set; }
}

That's one of several possible alternatives. Which is best depends on what your data looks like. For instance, if you have events that vary in their start date/time and duration (or end date/time), or if you track all state changes of entities, then something like Richard Snodgrass' temporal data model is ideal. Interestingly, Microsoft SQL Server 2016 recently added direct support for temporal tables, though they've been in the SQL spec as TSQL2 for a while. Note that the TSQL2 spec includes both valid-time and transaction-time support, but I believe the recent SQL Server 2016 addition only supports valid time... that's OK, though, since valid time is what's most valuable. I only point it out because getting your head around how a valid-time table works is hard enough without the added complexity of transaction time.
The beauty of this approach is that you don't have to decide on the needed time granularity as the data is collected, only if/when you aggregate it.
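To make that concrete, here is a minimal sketch of a valid-time snapshot type in C# (the names are illustrative, not taken from TSQL2 or Lumenize):

// Each state change closes the current snapshot (by setting its ValidTo)
// and inserts a new one, so full history is kept at whatever granularity
// the data arrives in.
public class DeviceStateSnapshot
{
    public string DeviceSerial { get; set; }
    public DateTime ValidFrom { get; set; }  // when this state became true
    public DateTime? ValidTo { get; set; }   // null means "still the current state"
    public double? Speed { get; set; }
    public string Ign { get; set; }
    // ... any other fields whose state changes you want to track
}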
However, as you said, SQL is not ideal for such large data sets. So I've implemented a valid-time, Richard Snodgrass-style temporal model on top of DocumentDB in my Lumenize library, in particular the TimeSeriesCalculator and its other time-series functionality. Read pages 10-19 here for a backgrounder on the data model and common operations in Lumenize time-series analysis. That deck is for an implementation I did while at Rally, called the Lookback API, built on MongoDB, but the concepts are the same and I've now switched to DocumentDB (Rally hasn't).
Another comment on your proposed model: you might want to consider a separate document for every reading. It's a bit unclear from the example whether there is a document per minute or one per device. If it's one per device per hour, then you can be assured you'll never go past 60 minutes, which would be OK, but in just about every other way I can think of, it looks like you risk a single document growing unbounded, which is a big no-no in DocumentDB (and in all NoSQL data modeling). Also, as you say, even if it isn't unbounded, it would involve a lot of in-place updates. Since your system is likely to be write-heavy, I would suggest you might be better off with a single document per reading (a minimal sketch follows below). If you have to store denormalized aggregations for speed later on, you still have the option to do that. You may not even need it, though. Let the performance of the production system inform that decision.
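As a rough illustration of the per-reading approach (a sketch only; the property names and the id/partition choices are assumptions, not part of your model):

// One small, immutable document per reading: written once, never updated,
// so no document ever grows unbounded and writes stay cheap.
public class ReadingDocument
{
    public string Id { get; set; }            // e.g. "{Serial}:{TimestampUtc:O}"
    public string Serial { get; set; }        // natural partition-key candidate
    public DateTime TimestampUtc { get; set; }
    public double? Speed { get; set; }
    public double? Altitude { get; set; }
    // ... the remaining telemetry fields from your proposed model
}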
I suggest that you read up on time dimensions for star schemas. It looks a lot like what you are planning, but it's also ideal for the denormalized aggregation storage I describe. I have not seen any write-ups of star schema concepts for NoSQL, but here is one from the traditional SQL world that will help you with the concepts.
As I said, there are a lot of alternatives, and without knowing more about your situation, I cannot know which is best.

OK, so I think I am going for one document per event (for now one every 5 minutes, but it could change to one per second per device). The reason is that appending to a document should surely be costly, as you need to do a "replace" on that document (does DocumentDB support append/partial updates now?). Surely that involves a read and then an ever-growing replace, which would be more costly and slower than just adding a new document per event. The only concern is when we have millions/billions of documents... is this fine?

Related

Can an entity really belong to only one aggregate?

I am learning DDD and have just run into a problem that I can't solve.
Assume we have the following domain:
public class Hotel : AggregateRoot
{
    public List<Room> Rooms { get; private set; }
}

public class Room : Entity
{
    public string Name { get; set; }
    public int Number { get; set; }
}
and now we want to model RoomReservations:
public class RoomReservationRecord : Aggregate
{
    public string CustomerName { get; set; }
    public Room Room { get; set; } // <- this is the problem
    public DateTime DateFrom { get; set; }
    public DateTime DateTo { get; set; }
}
As is clearly visible, two aggregates contain (share) a single entity. It makes sense from a business perspective, but from a DDD perspective it looks like two aggregates share the same entity.
Is this approach correct, or does it violate the "an entity can be part of a single aggregate" rule?
Or is there a better (obvious) way to model such a requirement?
I will try to point out a few things, but at the end of the day this is a modeling exercise, and often there are compromises to be made.
Invariants - a big part (probably the primary one) of why you would group objects in an object graph is to make sure that certain rules are enforced. So if it were a business rule that NO ROOM CAN EVER BE DOUBLE BOOKED, then Hotel would probably be an aggregate root with rooms and reservations as entities on it.
Something like this... but it has some drawbacks, e.g.:
class Hotel
{
    private readonly List<Room> rooms;
    private readonly List<Reservation> reservations = new List<Reservation>();

    public Hotel(string name, ICollection<Room> rooms)
    {
        Name = name;
        this.rooms = new List<Room>(rooms);
    }

    public string Name { get; private set; }

    public bool TryMakeReservation(Reservation reservation)
    {
        // booked already if an existing reservation for the same room overlaps
        bool alreadyBooked = reservations.Any(r =>
            r.RoomNumber == reservation.RoomNumber &&
            r.DateFrom < reservation.DateTo &&
            reservation.DateFrom < r.DateTo);
        if (alreadyBooked)
            return false;

        reservations.Add(reservation);
        return true;
    }
}

class Room
{
    public Room(int number, bool isBooked)
    {
        Number = number;
        IsBooked = isBooked;
    }

    public int Number { get; private set; }
    public bool IsBooked { get; private set; }
}

class Reservation
{
    public Reservation(string customerName, int roomNumber, DateTime dateFrom, DateTime dateTo)
    {
        CustomerName = customerName;
        RoomNumber = roomNumber;
        DateFrom = dateFrom;
        DateTo = dateTo;
    }

    public string CustomerName { get; private set; }
    public int RoomNumber { get; private set; }
    public DateTime DateFrom { get; private set; }
    public DateTime DateTo { get; private set; }
}
NOTE: to use this model you would need to lock the hotel down each time you make a booking!
This might not be acceptable for a busy hotel. There are ways around this, like reserving the room for 5 minutes before automatically releasing it unless the reservation is confirmed. Or storing a list of events, and if two reservations for the same room exist over the same period without a checkout event, kicking off a process to notify someone to deal with the double booking.
Contexts - it might be that the list of rooms and the actual bookings belong in separate contexts. Think about how often bookings are made from multiple sources: Airbnb, Booking.com, the hotel website, and/or at the counter in person or over the phone. It might not make sense for bookings and the persistence of available rooms to be in the same context. And what about cleaning schedules? A room needs to be cleaned before it is available, but is that really handled in the Booking context?
Performance - as mentioned, sometimes the model we want just isn't possible because of the physics of getting that amount of data queried from a datastore. Users, Product Owners, etc. tend not to care how clean your model is if it affects performance too much.
Repositories - following on from the above point: since an aggregate should be consistent (i.e. the data in it should never be in an incorrect state), when you fetch your aggregate it should be consistent. If Hotel has a repository but contains a Room, and Room is an aggregate root with its own repository, and repositories are calling repositories, I would say you are failing at scaling complexity. The main point of DDD is to give you a set of patterns and practices that help you deal with complexity. If applying DDD principles increases complexity by a step that is never paid back by smaller future steps as new features are added, then DDD was probably not the right tool to use at that point in the project.
To speak to the linked article on Ids: just not using primitive types can mitigate a lot of the problems here. Focusing on finding Value Objects can bring a lot of clarity and really helps express your domain. Even if you don't use DDD it is a valuable practice, which is why I wrote a series on them without ever mentioning DDD... I think.
I hope this helped. DDD, more than even FP for me, has so many super valuable ideas in it that help create maintainable code that scales to requirements. Better than that, it focuses on soft elements outside the code, like collaboration and a shared language, which bring even more value. But these are guidelines to apply because you want a certain gain (and are willing to pay any incurred costs); they are not rules, and very seldom is there just a wrong and a right path.

More ServiceStack request DTO advice

This is a follow-up to:
ServiceStack Request DTO design
In the question above, the design was strictly about read operations. What about write operations? Say we wanted to add an operation for creating a new booking limit - would reusing the noun be appropriate here?
[Route("/bookinglimits/","POST")]
public class CreateBookingLimit : IReturn<BookingLimit>
{
BookingLimit newBookingLimit
}
-OR- Would this be a better design?
[Route("/bookinglimits", "POST")]
public class CreateBookingLimit : IReturn<BookingLimit>
{
    public int ShiftId { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
    public int Limit { get; set; }
}
Also, if we wanted to add editing, should insert and edit share the same model, with an added Id?
[Route("/bookinglimits/","POST")]
[Route("/bookinglimits/{Id}/","PUT")]
public class CreateBookingLimit : IReturn<BookingLimit>
{
public int Id { get; set; }
public int ShiftId { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int Limit { get; set; } }
}
I'm trying to wrap my head around when it makes the most sense to reuse POCOs and when it makes more sense to separate intentions.
Message-based API Design
There are a few things to bear in mind when designing the ideal message-based API, where your Services effectively end up serving 2 masters: a Native Client API and a REST API. Native Clients just send and receive messages in their original form, so they get a natural API for free, modelled using C# Request and Response DTOs that capture what information is required for the Service to perform its operation and what it should return.
Projecting messages into the ideal HTTP API
After designing your message-based API, you'll then want to focus on how best to project the messages into a REST API by annotating Request DTOs with [Route] attributes to define the custom endpoints for your Services.
This previous answer on Designing a REST-ful service with ServiceStack provides examples of which routes different Request DTOs map to. In general you'll want to design your APIs around Resources, where each operation "acts on a Resource"; this will make defining your custom routes easier. The ideal HTTP API for creating and updating a Booking Limit would look like:
POST /bookinglimits (Create Booking Limit)
PUT /bookinglimits/{id} (Update Booking Limit)
General recommendations on good API Design
Whilst not specifically about Web Services, this article on Ten Rules for Good API Design provides good recommendations on general (code or Service) API design. As API consumers are the intended audience of your APIs and the ones who'll primarily derive the most value from them, their design should be optimized so that they're self-descriptive, use consistent naming, are intuitive to use and can be evolved without breaking existing clients. Messages are naturally suited to versioning, but you still need to be mindful when making changes to existing published APIs that any additional properties are optional, with default fallback behavior if required.
For this reason, whilst you can save some code by returning a naked BookingLimit, my preference is to instead return a specific Response DTO for each Service, which allows the Service to return additional metadata without breaking existing clients whilst maintaining a consistent Request/Response pattern across all Services. This is just my preference, though - returning naked types is also fine.
ServiceStack Implementation
To implement this in ServiceStack, I wouldn't use the same Request DTO to support multiple verbs. Since the Request DTO is called Create*, that conveys that users should only send this Request DTO to create Booking Limits, which is typically done with a POST request, e.g.:
[Route("/bookinglimits", "POST")]
public class CreateBookingLimit : IReturn<CreateBookingLimitResponse>, IPost
{
public int ShiftId { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int Limit { get; set; }
}
public class CreateBookingLimitResponse
{
public BookingLimit Result { get; set; }
public ResponseStatus ResponseStatus { get; set; }
}
IPut and IPost are Verb interface markers that let both the user and the Service Client know which verb the message should be sent with, which makes it possible to have all messages sent in a single Service Gateway method.
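For example (a sketch only; the property values are illustrative and baseUrl is assumed to point at your host), the generic Send method picks the HTTP verb from the marker interface:

var client = new JsonServiceClient(baseUrl);

// Sent as a POST because CreateBookingLimit implements IPost
var response = client.Send(new CreateBookingLimit
{
    ShiftId = 1,
    StartDate = DateTime.UtcNow.Date,
    EndDate = DateTime.UtcNow.Date.AddDays(7),
    Limit = 10,
});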
If your Service also supports updating a Booking Limit then I'd create a separate Service for it which would look like:
[Route("/bookinglimits/{Id}", "PUT")]
public class UpdateBookingLimit : IReturn<UpdateBookingLimitResponse>, IPut
{
public int Id { get; set; }
public int ShiftId { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int Limit { get; set; }
}
public class UpdateBookingLimitResponse
{
public BookingLimit Result { get; set; }
public ResponseStatus ResponseStatus { get; set; }
}
By using separate Operations you can ensure Request DTOs contain only the properties relevant to that operation, reducing confusion for API consumers.
If it makes sense for your Service, e.g. the schemas for both operations remain the same, I'll merge both Create/Update operations into a single operation. When you do this, you should use a consistent verb that indicates the operation does both, e.g. Store* or CreateOrUpdate*:
[Route("/bookinglimits", "POST")]
public class StoreBookingLimit : IReturn<StoreBookingLimitResponse>, IPost
{
public int Id { get; set; }
public int ShiftId { get; set; }
public DateTime StartDate { get; set; }
public DateTime EndDate { get; set; }
public int Limit { get; set; }
}
public class StoreBookingLimitResponse
{
public BookingLimit Result { get; set; }
public ResponseStatus ResponseStatus { get; set; }
}
In most cases, where the Server generates the Id for the Resource, you should use POST. In the rare case where the client specifies the Id, e.g. a Slug or Guid, you can use PUT, which roughly translates to "PUT this resource at this location" and is possible when the client knows the URL of the resource.
Message-based API examples
Most of the time what messages should contain will be obvious from the Service requirements, and creating them becomes intuitive and natural over time. For examples of a comprehensive message-based API you can have a look at Amazon Web Services, who've effectively servicified their Web Services behind a message-based design that uses Service Clients to send messages to access all their APIs. E.g. the AWS DynamoDB API Reference lists each Action that's available, as well as the other DTO types the Services return; here are the DynamoDB APIs around creating, modifying and querying Items:
Actions
BatchGetItem
BatchWriteItem
DeleteItem
GetItem
PutItem
Query
Scan
UpdateItem
Data Types
AttributeDefinition
AttributeValue
AttributeValueUpdate
Condition
...
In ServiceStack, Actions are called Operations and are what you use Request DTOs to define, whilst AWS's Data Types are just called DTOs, which I keep in a Types namespace to differentiate them from Operations.
DynamoDb.ServiceModel (project)
/GetItem
/PutItem
/UpdateItem
/DeleteItem
/Query
/Scan
/Types
/AttributeDefinition
/AttributeValue
/AttributeValueUpdate
You typically wouldn't need additional explicit Services for batch requests, as you can get those for free using ServiceStack's Auto Batched Requests. ServiceStack also includes a number of other benefits: it's able to generate richer DTOs containing custom attributes and interfaces in the source DTOs, enabling a richer and more succinct end-to-end typed API with less boilerplate and generated code, and letting you use the same generic Service Client to call any ServiceStack Service with both sync and idiomatic async APIs. The additional metadata also enables seamless higher-level functionality like Encrypted Messaging, Cache Aware Clients, Multiple Formats, Service Gateway, HTTP Verb Interface Markers, etc.
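For instance (a sketch reusing the CreateBookingLimit DTO above; values are illustrative), a typed client can send a batch in a single HTTP request with SendAll:

var client = new JsonServiceClient(baseUrl);

// One HTTP call; the server executes the batch and returns all responses
var responses = client.SendAll(new[]
{
    new CreateBookingLimit { ShiftId = 1, Limit = 10 },
    new CreateBookingLimit { ShiftId = 2, Limit = 8 },
});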
Otherwise, AWS follows a very similar approach to ServiceStack for designing message-based APIs, using generic Service Clients to send DTOs native to each language.

CRUD and Query with ServiceStack - Need to get rid of some confusion

I am a bit confused by ServiceStack's 'old' and 'new' API and need some clarification and best practices, especially around Request/Response DTOs and routing. I watched some courses on Pluralsight and have the first three books listed on servicestack.net in my electronic bookshelf.
I'd like to 'restify' an existing application built using DDD patterns, which means I have a high level of abstraction. The client is WPF and follows the MVVM pattern. I have 'client side service', 'server side service' and repository classes (and some aggregates too). I use NHibernate 4 (with the fluent API and a code-first approach) as my ORM. Only my repository classes know about the ORM. I have DTOs for all my entity objects, and in my WPF client I work only with those DTOs in the ViewModel classes. I heavily use AutoMapper to 'transfer' entity objects to my DTOs and vice versa.
My confusion starts exactly with these DTOs and the Request/Response DTOs used in ServiceStack. Here is a very much simplified example of an Address entity which illustrates the problem:
All my entity objects derive from EntityBase, which contains basic properties used in all entities:
public abstract class EntityBase : IEntity
{
    public virtual Guid Id { get; protected set; }
    public virtual DateTime CDate { get; set; } // creation date
    public virtual string CUser { get; set; }   // creation user
    public virtual DateTime MDate { get; set; } // last modification date
    public virtual string MUser { get; set; }   // last modification user

    //
    // some operators and helper methods irrelevant to the question
    // ...
}
public class Address : EntityBase
{
    public string Street { get; private set; }
    public string AdrInfo1 { get; private set; }
    public string AdrInfo2 { get; private set; }
    public string ZipCode { get; private set; }
    public string City { get; private set; }
    public string Country { get; private set; }
}
Of course there are collections and references to related objects, which are ignored here, as are database mappers, naming conventions etc. The DTO I have looks like this:
public class AddressDto
{
    public Guid Id { get; set; } // NHibernate Guid.Comb, NO auto-increment ints!
    public DateTime CDate { get; set; }
    public string CUser { get; set; }
    public DateTime MDate { get; set; }
    public string MUser { get; set; }
    public string Street { get; set; }
    public string AdrInfo1 { get; set; }
    public string AdrInfo2 { get; set; }
    public string ZipCode { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
}
To use this with ServiceStack I need to support the following:
CRUD functionality
Filter / search functionality
So my 'Address service' should have the following methods:
GetAddresses (ALL, ById, ByZip, ByCountry, ByCity)
AddAddress (complete AddressDto without Id; CDate and CUser are filled automatically without user input)
UpdateAddress (complete AddressDto without CUser and CDate; MDate and MUser are filled automatically without user input)
DeleteAddress (Just the Id)
For me it is pretty clear that all requests return either a single AddressDto or a List<AddressDto> as the response DTO, except for the delete, which should just return a status object.
But how do I define all those Request DTOs? Do I really have to define one DTO for EACH scenario? In the books I only saw samples like:
[Route("/addresses", "GET")]
public class GetAddresses : IReturn<AddressesResponse> { }
[Route("/addresses/{Id}", "GET")]
public class GetAddressById : IReturn<AddressResponse>
{
public Guid Id { get; set; }
}
[Route("/addresses/{City}", "GET")]
public class GetAddressByCity : IReturn<AddressResponse>
{
public string City { get; set; }
}
// .... etc.
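(For completeness, the matching Response DTOs, whose shape I'm assuming since the books' samples don't show them, would presumably look something like:)

public class AddressResponse
{
    public AddressDto Result { get; set; }
    public ResponseStatus ResponseStatus { get; set; }
}

public class AddressesResponse
{
    public List<AddressDto> Results { get; set; }
    public ResponseStatus ResponseStatus { get; set; }
}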
This is a lot of boilerplate code and reminds me a lot of the old IDL compilers I used with C++ and CORBA...
Especially for Create and Update, I should be able to 'share' one DTO, or even better, reuse my existing DTO. For Delete there is probably not much choice...
And then the filters. I have other DTOs with a lot more properties, and a function-per-filter approach like that used in WCF, RPC etc. is hell to code...
In my repositories I pass in an entire DTO and use a predicate-builder class which composes the LINQ Where clause depending on which properties are filled. It looks something like this:
List<AddressDto> addresses;
Expression<Func<Address, bool>> filter = PredicateBuilder.True<Address>();
if (!string.IsNullOrEmpty(address.ZipCode))
    filter = filter.And(a => a.ZipCode == address.ZipCode);
// ... etc.: check all properties and dynamically build up the filter

addresses = NhSession.Query<Address>()
    .Where(filter)
    .Select(a => new AddressDto
    {
        Id = a.Id,
        CDate = a.CDate,
        // ... etc.
    }).ToList();
Is there anything similar I could do with my Request DTOs, and how should the routing be defined?
A lot of the questions raised here are covered in the existing linked answers below. The Request/Response DTOs are what you use to define your Service Contract; i.e. instead of using RPC method signatures, you define your contract with the messages your Service accepts (Request DTO) and returns (Response DTO). This previous example also walks through guidelines on designing HTTP APIs with ServiceStack.
The use of well-defined DTOs plays a very important role in Services:
You want to ensure all types your Services return are DTOs, since these, along with the base URL of where your Services are hosted, are all your Service consumers need to know in order to consume your Services. They can then use any of the .NET Service Clients to get an end-to-end typed API without code-gen, tooling or any other artificial machinery.
DTOs are what define your Service's contract; keeping them isolated from any server implementation is how your Service is able to encapsulate its capabilities (which can be of unbounded complexity) and make them available behind a remote facade. It separates what your Service provides from the complexity of how it realizes it. It defines the API for your Service and tells Service consumers the minimum they need to know to discover what functionality your Services provide and how to consume them (maintaining a similar role to header files in C/C++ source code). Well-defined Service contracts decoupled from implementation enforce interoperability, ensuring your Services don't mandate specific client implementations and can be consumed by any HTTP client on any platform. DTOs also define the shape and structure of your Service's wire format, ensuring responses can be cleanly deserialized into native data structures and eliminating the effort of manually parsing Service responses.
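As a small sketch of what that means in practice (the base URL is hypothetical; GetAddressById is the Request DTO from the question):

var client = new JsonServiceClient("https://api.example.org"); // hypothetical host

// The DTOs plus the base URL are the entire contract a consumer needs:
// IReturn<AddressResponse> on the Request DTO types the response.
Guid someAddressId = Guid.NewGuid(); // in practice, an Id obtained earlier
AddressResponse response = client.Get(new GetAddressById { Id = someAddressId });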
Auto Queryable Services
If you're building a lot of data-driven Services, I recommend taking a look at AutoQuery, which lets you define fully queryable Services without an implementation, using just your Service's Request DTO definition.
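For example, with the AutoQuery plugin enabled, a single Request DTO along these lines (a sketch; the filter properties are assumed from the question's AddressDto) replaces most of the hand-written Get* Services:

// Each property becomes an implicit filter, so requests like
// /addresses/search?City=Vienna work without any Service code.
[Route("/addresses/search")]
public class QueryAddresses : QueryDb<Address>
{
    public string City { get; set; }
    public string ZipCode { get; set; }
    public string Country { get; set; }
}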

EF6 and MVC5: Using two fields as a combined key

I have several objects (Product, Rule, PriceDetail, etc.) that manage and store information in a CRUD application. I want a way to keep a log of when the data is updated, and to that end I've created an Update class, referenced as ICollection<Update> Updates within each data class.
When the tables are all generated, EF creates an FK column for each class in the Updates table (Product_ID, Rule_ID, etc.). This seems horribly inefficient. Could I use a two-field key instead, such as an enum ObjectType and a long ID? Alternatively, can I use a string ID and force a pattern where the first N characters of the string identify the referencing object? If the latter, can the database auto-increment the string value?
Here's some example code, trimmed for placement here:
public class Update
{
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public long ID { get; set; }
    public string Reason { get; set; }
    public DateTime TimeOfUpdate { get; set; }
    public long Product_ID { get; set; }
    public long Rule_ID { get; set; }
}

public class Product
{
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public long ID { get; set; }
    public string Name { get; set; }
    public PriceDetail Price { get; set; }
    public ICollection<Update> Updates { get; set; }
}

public class Rule
{
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public long ID { get; set; }
    public string Name { get; set; }
    public ICollection<Condition> Conditions { get; set; }
    public ICollection<Update> Updates { get; set; }
}
There are multiple ways of handling auditing logic.
Do you anticipate storing update history for every table? If it's going to be limited to a few tables, your design might work fine. If, however, you want to audit many tables, you might want to try one of the options below.
1. Use three tables (Products, Updates and ProductUpdates). The Products table always holds the latest data. The Updates table gets a new row capturing the update timestamp every time an entry in Products is updated. ProductUpdates has a foreign key to the Updates table and holds the old row from the Products table, so you know exactly what the row looked like at any point in time. Extending this to any other table X requires adding an XUpdates table, but you avoid the pile of unnecessary foreign keys you mentioned.
2. Another option is to add IsActive, UpdatedBy, UpdatedTimestamp, etc. columns to the tables being audited. Every time you update a row, you mark it inactive and insert a new row with the latest data. You can also store the reason and rule columns if needed.
3. You can also redesign your entities so that their primary key is a foreign key to your Updates table. This eliminates the inelegance of the previous solutions: every time you update, you insert a row into the Updates table and use the generated Id as the primary key of a new row in your Products table.
Entity Framework can help you automate the processes laid out in options 2 and 3. The basic idea is to intercept the save requests for updates and force an insert (plus update) instead; see the sketch below.
Lastly, you might also be able to use CLR triggers to implement the audit functionality you want.
Each solution has its pros and cons. The best solution for you will depend on your specific use case.
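As a rough sketch of the Entity Framework interception idea (the context name, the Reason text and the audit-row shape are assumptions based on the question's classes, not a prescribed implementation):

public class ShopContext : DbContext
{
    public DbSet<Product> Products { get; set; }
    public DbSet<Update> Updates { get; set; }

    public override int SaveChanges()
    {
        // Capture an audit row for every modified Product before it is saved
        var modified = ChangeTracker.Entries<Product>()
            .Where(e => e.State == EntityState.Modified)
            .ToList();

        foreach (var entry in modified)
        {
            Updates.Add(new Update
            {
                Reason = "Product changed", // illustrative
                TimeOfUpdate = DateTime.UtcNow,
                Product_ID = entry.Entity.ID,
            });
        }

        return base.SaveChanges();
    }
}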

Including a base member doesn't seem to work in Entity Framework 5

Here are my entities:
public abstract class ResourceBase
{
    [Key]
    public int Id { get; set; }

    [ForeignKey("Resource")]
    public Guid ResourceId { get; set; }
    public virtual Resource Resource { get; set; }
}

public class Resource
{
    [Key]
    public Guid Id { get; set; }
    public string Type { get; set; }
}

public class Message : ResourceBase
{
    [MaxLength(300)]
    public string Text { get; set; }
}
And then my query is something like this:
var msgs = messages
    .Where(x => someRangeOfIds.Contains(x.Id))
    .Include(m => m.Resource)
    .Select(x => new
    {
        message = x,
        replyCount = messages.Count(msg => msg.Id == magicNumber)
    });
I am running this with proxy creation disabled, and the result is all the messages BUT with all the Resource properties NULL. I checked the database, and the Resources with matching Guids are there.
I drastically simplified my real-life scenario for illustration purposes, but I think you'll find you can reproduce the issue with just this.
Entity Framework 5 handles inherited properties well (by flattening the inheritance tree and including all the properties as columns in the entity's table).
The reason this query didn't work is the projection after the Include. Unfortunately, the Include statement only really works when you are returning entities. I did see mention of a solution which is tricky and involves invoking the Include after the shape of the returned data is specified... if anyone has more information on this, please reply.
The solution I came up with was to rephrase the query so I get all the messages in one query, and then, in a second trip to the database, another query that gets all the reply counts.
Two round trips when it really should only be one.
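For reference, the two-trip workaround looks roughly like this (a sketch; ReplyToId is a hypothetical property linking a reply to its parent message, so substitute the real relationship):

// Trip 1: return entities only, so Include is honored and Resource is populated
var messagesWithResources = context.Messages
    .Where(m => someRangeOfIds.Contains(m.Id))
    .Include(m => m.Resource)
    .ToList();

// Trip 2: reply counts per message, grouped on the (hypothetical) ReplyToId
var replyCounts = context.Messages
    .Where(m => m.ReplyToId.HasValue && someRangeOfIds.Contains(m.ReplyToId.Value))
    .GroupBy(m => m.ReplyToId.Value)
    .Select(g => new { MessageId = g.Key, Count = g.Count() })
    .ToDictionary(x => x.MessageId, x => x.Count);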
