How to embed documents in Azure Table Storage - azure

I'd like to be able to store objects that have child objects in Azure Table storage using a structure like so:
public class AzureTestDocument : TableServiceEntity
{
    public AzureTestDocument(int counter)
        : base("_default", counter.ToString())
    {
        Counter = counter;
        Child = new AzureTestChildDocument(counter);
    }
    public int Counter { get; set; }
    public AzureTestChildDocument Child { get; set; }
}

public class AzureTestChildDocument
{
    public AzureTestChildDocument(int counter)
    {
        Counter = counter;
    }
    public int Counter { get; set; }
}
Saving the parent document works fine if I remove the child document. Saving a structure like this results in a "One of the request inputs is not valid" exception. A little googling turned up this article about supported types, which may mean you can't embed any types other than that short list of supported ones.
Please clarify if this is the case or point me towards what I may be missing.

Azure Table Storage can only save entities whose properties are all primitive types. Any nested child objects need to be saved separately:
You can serialize the child objects into strings and save those strings as properties.
Alternatively, you can save those child objects as individual rows in Azure tables.
Alternatively, if you're dealing with documents, you can save those objects in Azure Blob storage.
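
As a rough illustration of the first option, here is a minimal sketch that serializes the child into a JSON string property. It uses only System.Text.Json and leaves the storage SDK out entirely; FlatDocument, ChildJson and the helper names are hypothetical, not part of any Azure library.

```csharp
using System;
using System.Text.Json;

public class ChildDocument
{
    public int Counter { get; set; }
}

// Hypothetical flat entity: every property is a plain string or int,
// so nothing nested has to be stored.
public class FlatDocument
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public int Counter { get; set; }
    public string ChildJson { get; set; }
}

public static class ChildSerialization
{
    // Builds the flat entity, storing the child as a JSON string.
    public static FlatDocument ToFlat(int counter)
    {
        var child = new ChildDocument { Counter = counter };
        return new FlatDocument
        {
            PartitionKey = "_default",
            RowKey = counter.ToString(),
            Counter = counter,
            ChildJson = JsonSerializer.Serialize(child)
        };
    }

    // Restores the child object after the entity has been read back.
    public static ChildDocument ReadChild(FlatDocument entity)
    {
        return JsonSerializer.Deserialize<ChildDocument>(entity.ChildJson);
    }
}
```

The trade-off is that the child's fields can no longer be filtered on in a table query, since the service only sees one opaque string.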


Best way to migrate (and Transform) data from Azure Table Storage to Cosmos DB Sql API

I have an Azure Table that is storing a Customer object with a nested Address object, as per the following.
public class Customer {
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

public class Address {
    public string AddressLine1 { get; set; }
    public string AddressLine2 { get; set; }
    public string City { get; set; }
    public string Postcode { get; set; }
}
The Customer object gets stored in an Azure Table with the child object flattened: because Table Storage doesn't support nested objects, the Address properties become Address_* columns at the same level as the Customer properties.
I want to migrate this to Cosmos DB SQL API. What's the best way to migrate this data so that I end up with a nested json document instead of a flat one with these underscore columns?
I want to migrate this data so that it looks something like this in Cosmos:
{
    Id: 2fca57ec-8c13-4f2c-81c7-d6b649ca6296,
    Name: "John Smith",
    Address: {
        AddressLine1: '123 Street',
        AddressLine2: '',
        City: 'City',
        Postcode: '1234'
    }
}
I have tried using the Cosmos Data Migration tool (deprecated?) and Azure Data Factory, but couldn't figure out how to convert the Address_* columns to a nested Address object instead of ending up with flat attributes in the JSON document.
Is there a way to easily map it to a nested child object, or will I have to write custom code to do the migration?
Unfortunately, there is no out-of-the-box solution for this kind of migration.
The easier option would be to write custom code that loops through the TableEntities, constructs the document object and adds the item to your Cosmos container.
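
The "construct the document object" step might look something like the sketch below. It assumes each table row has already been read into a plain dictionary of column name to value, and it treats every underscore as a nesting separator; EntityUnflattener is a hypothetical helper, and reading from Table Storage and writing to Cosmos are deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class EntityUnflattener
{
    // Turns { "Name": ..., "Address_City": ... } into
    // { "Name": ..., "Address": { "City": ... } }.
    public static Dictionary<string, object> Unflatten(
        IDictionary<string, object> flat, char separator = '_')
    {
        var root = new Dictionary<string, object>();
        foreach (var pair in flat)
        {
            string[] path = pair.Key.Split(separator);
            Dictionary<string, object> current = root;

            // Walk (or create) one nested dictionary per path segment
            // except the last, which names the value itself.
            for (int i = 0; i < path.Length - 1; i++)
            {
                Dictionary<string, object> child = null;
                object existing;
                if (current.TryGetValue(path[i], out existing))
                {
                    child = existing as Dictionary<string, object>;
                }
                if (child == null)
                {
                    child = new Dictionary<string, object>();
                    current[path[i]] = child;
                }
                current = child;
            }
            current[path[path.Length - 1]] = pair.Value;
        }
        return root;
    }

    // Serializes the nested structure as the JSON document body.
    public static string ToJson(IDictionary<string, object> flat)
    {
        return JsonSerializer.Serialize(Unflatten(flat));
    }
}
```

Note that a column whose name legitimately contains an underscore would also be split by this sketch, so a real migration would probably want an explicit list of prefixes (such as Address) to nest.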
There is no straightforward solution offered by Microsoft (Azure Storage Explorer) to overcome this challenge, but a third-party tool like Cerebrata could help you migrate your data from Azure Table Storage to Cosmos DB SQL API in a simple copy/paste model.
This way you can also avoid spending a good amount of time on custom coding and also view your migrated data in a table format rather than a complicated JSON format.
Disclaimer: It’s purely based on my experience

How to read and parse xml file present in azure data lake store using .NET?

I have a few XML files in Azure Data Lake Store. I want to fetch those files from Azure Data Lake and parse them.
How can I do that?
I've performed this type of operation with a custom extractor. There were two parts to it: (1) a class that maps the XML elements to properties using System.Xml.Serialization, and (2) the extractor method, which creates an XmlSerializer object for that class type and invokes its Deserialize method over input.BaseStream (the data). Finally, query with U-SQL using your custom extractor, which requires the library to be compiled, uploaded, registered, and finally referenced before it is invoked with the USING statement of your query.
Part 1:
public class MyDataClass
{
    public Guid Column0 { get; set; }
    public long Column1 { get; set; }
    public string Column2 { get; set; }
}
Part 2:
[SqlUserDefinedExtractor(AtomicFileProcessing = true)]
public class Xml : IExtractor
{
    public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
    {
        // Deserialize the whole file into the mapped class
        XmlSerializer ser = new XmlSerializer(typeof(MyDataClass));
        MyDataClass data = (MyDataClass)ser.Deserialize(input.BaseStream);
        //blah blah
        output.Set<Guid>(0, data.Column0);
        output.Set<long>(1, data.Column1);
        output.Set<string>(2, data.Column2);
        // Emit the populated row
        yield return output.AsReadOnly();
    }
}
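
To try the deserialization step on its own, outside U-SQL, something like the following sketch should do. It assumes the XML root element is named after the class and that the child elements match the property names; XmlParsing is a hypothetical helper, and the stream could equally be whatever you obtain from the Data Lake Store.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public class MyDataClass
{
    public Guid Column0 { get; set; }
    public long Column1 { get; set; }
    public string Column2 { get; set; }
}

public static class XmlParsing
{
    // Deserializes one MyDataClass document from any readable stream.
    public static MyDataClass Parse(Stream xml)
    {
        var serializer = new XmlSerializer(typeof(MyDataClass));
        return (MyDataClass)serializer.Deserialize(xml);
    }

    // Convenience overload for XML that is already in memory as text.
    public static MyDataClass ParseText(string xml)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return Parse(stream);
        }
    }
}
```

For example, XmlParsing.ParseText("<MyDataClass><Column1>42</Column1></MyDataClass>") yields an object whose Column1 is 42 and whose other properties keep their default values.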
You can use the Azure Data Lake Store .NET SDK. The following tutorial walks you through performing various file operations on files stored in Data Lake Store.

Orchard: how to persist a record without content

All right, this should be fairly easy.
I would like to persist some records for my module in Orchard (1.7.2) without those records being also a ContentPartRecord.
In other words, I would like to be able to persist in DB the following objects:
public class LogItemRecord
{
    public virtual string Message { get; set; }
}
..which is already mapped on to the db. But notice that this class is not derived from ContentPartRecord, as it is most certainly not one.
However, when I call IRepository instance's .Create method, all I get is a lousy nHibernate exception:
No persister for: MyModule.Models.LogItemRecord
...which disappears if I do declare the LogItem record as having been inherited from ContentPartRecord, but trying to persist that, apart from being hacky-tacky, runs into an exception of its own, where nHibernate again justly complains that the Id value for the record is zero, though in not so many words.
So... how do I play nicely with Orchard and use its API to persist objects of my own that are not ContentParts / ContentItems?
I'm running 1.7.3 (also tested in 1.7.2) and have successfully been able to persist the following class to the DB:
public class ContactRecord
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual string JobTitle { get; set; }
    public virtual string Email { get; set; }
    public virtual string Phone { get; set; }
}
Here are the relevant lines from Migrations.cs
table => table
    .Column<int>("Id", col => col.Identity().PrimaryKey())
I'm going to assume that the code you've shown for LogItemRecord is the complete class definition when making the following statement...
I think that any Record class you store in the DB needs an Id property, and that property should be marked as Identity and PrimaryKey in the table definition (as I've done above).
When you create a *Record class which inherits from ContentPartRecord and set up the table like
table => table
    .ContentPartRecord()
    // more column definitions
then you get the Id property/PK "for free" by inheritance and by calling .ContentPartRecord() in the Migration.
See the PersonRecord in the Orchard Training Demo Module for another example of storing a standard class as a record in the DB.

DDD: Modeling simple domain with two aggregate roots

Let's say I want to create an auction web site where members would be able to bid for items. To model this domain I have three classes: Member, Item and Bid.
My brainstorming would go something like this:
Item can contain multiple bids
Bid is associated with one Item and one Member
Member can contain multiple bids
Member and Item can exist without bid instance
Bid instance can't exist without both Member and Item
Considering all this, it is obvious that since Member and Item objects are independent, we can consider them aggregate roots. Bid will be part of one of these aggregates. That is clear, but what is confusing to me right now is which aggregate root I should choose: Item or Member?
This is an example from the Pro ASP.NET MVC 3 Framework book by Apress, and the way they modeled it gives the following code:
public class Member
{
    public string LoginName { get; set; } // The unique key
    public int ReputationPoints { get; set; }
}

public class Item
{
    public int ItemID { get; private set; } // The unique key
    public string Title { get; set; }
    public string Description { get; set; }
    public DateTime AuctionEndDate { get; set; }
    public IList<Bid> Bids { get; set; }
}

public class Bid
{
    public Member Member { get; set; }
    public DateTime DatePlaced { get; set; }
    public decimal BidAmount { get; set; }
}
Member and Item are aggregate roots here and Bid is contained within Item.
Now let's say that I have an application use case: "Get all bids posted by specific member". Does that mean that I would have to first get all Items (e.g. from the database via a repository interface) and then enumerate all bids for each Item, trying to find the matching Member? Isn't that a bit inefficient? So a better way would then be to aggregate Bid objects inside of Member. But in that case consider the new use case: "Get all bids for specific item". Now again we need to go the other way around to get all bids...
So taking into account that I need to implement both of these use cases in my application, what is the right and efficient way to model this domain then?
Your domain should really reflect only Command (CQRS) requirements (updating/changing data). I presume that what you need here are Queries (reading data, with no update/change of data): "Get all bids for specific item" and "Get all bids posted by specific member". This querying has nothing to do with the domain, as the query implementation is independent of the command implementation (a command calls a domain method). This gives you the freedom to implement each query in an efficient way. My approach is to implement an efficient DB view that gets you only the data you want to display in the UI. Then you create a new class called BidForItemDto (DTO = data transfer object) and map the data from the DB view into a collection of BidForItemDto (you can do it manually via ADO.NET or use NHibernate (preferred, does everything for you)). The same goes for the second query: create a new class called BidPostedByMemberDto.
So, if it is queries you need, just forget about the domain, realize that it's just data you want to display in the UI, and query it efficiently from the DB. Only when you perform some action in the UI (clicking a button to place a bid, for instance) does it execute a command, "place a bid", which would in the end call the domain method Item.PlaceBid(Member member, DateTime date, decimal amount). And btw, IMHO it is an Item which "has many bids", and the domain method "place bid" would surely need to access previous bids to implement the whole logic correctly. Placing the bids collection into Member does not make much sense to me...
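
To make the command side concrete, here is a minimal sketch of what Item.PlaceBid might look like. The two rules it enforces (no bids after the auction end date, and each bid must beat the current highest) are assumptions for illustration only; the real rules are whatever your domain requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Member
{
    public string LoginName { get; set; }
}

public class Bid
{
    public Bid(Member member, DateTime datePlaced, decimal bidAmount)
    {
        Member = member;
        DatePlaced = datePlaced;
        BidAmount = bidAmount;
    }

    public Member Member { get; private set; }
    public DateTime DatePlaced { get; private set; }
    public decimal BidAmount { get; private set; }
}

public class Item
{
    private readonly List<Bid> _bids = new List<Bid>();

    public DateTime AuctionEndDate { get; set; }
    public IReadOnlyList<Bid> Bids { get { return _bids; } }

    // The aggregate root owns its bids, so it can check every new bid
    // against the previous ones before accepting it.
    public void PlaceBid(Member member, DateTime date, decimal amount)
    {
        if (member == null) throw new ArgumentNullException(nameof(member));
        if (date > AuctionEndDate)
            throw new InvalidOperationException("The auction has ended.");

        // Assumed rule: a bid must exceed the current highest bid.
        decimal highest = _bids.Count == 0 ? 0m : _bids.Max(b => b.BidAmount);
        if (amount <= highest)
            throw new InvalidOperationException("Bid must exceed the current highest bid.");

        _bids.Add(new Bid(member, date, amount));
    }
}
```

Because the check needs the item's previous bids, it sits naturally on Item; a Member-owned bid collection would not have that information at hand.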
Off the top of my head, some examples of DB views and SQL queries:
Get all bids for specific item:
create view BidForItemDto
as
select ...
from Item i
join Bid b ON b.ItemId = i.ItemId

select ...
from BidForItemDto
where ItemId = <provide item id>
Get all bids posted by specific member:
create view BidPostedByMemberDto
as
select ...
from Member m
join Bid b ON b.MemberId = m.MemberId

select ...
from BidPostedByMemberDto
where MemberId = <provide member id>
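
The same idea can be sketched in C# with an in-memory list standing in for the database: both queries run over flat rows and never touch the Item or Member aggregates. BidRow and its fields are assumed names for illustration; in practice the rows would come from the views via ADO.NET or NHibernate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat read model: one row per bid, shaped for display only.
public class BidRow
{
    public int ItemId { get; set; }
    public string MemberLoginName { get; set; }
    public DateTime DatePlaced { get; set; }
    public decimal BidAmount { get; set; }
}

public static class BidQueries
{
    // "Get all bids for specific item"
    public static List<BidRow> ForItem(IEnumerable<BidRow> bids, int itemId)
    {
        return bids.Where(b => b.ItemId == itemId)
                   .OrderBy(b => b.DatePlaced)
                   .ToList();
    }

    // "Get all bids posted by specific member"
    public static List<BidRow> ByMember(IEnumerable<BidRow> bids, string loginName)
    {
        return bids.Where(b => b.MemberLoginName == loginName)
                   .OrderBy(b => b.DatePlaced)
                   .ToList();
    }
}
```

Neither query has to load an aggregate and walk its collection, which is the inefficiency the question was worried about.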

Azure table storage - pattern for parent-child (self referencing schema)

Using Windows Azure Table Storage (WATS) and trying to update the app to use Azure. I've read many articles, and am not sure on the best approach for this, that is parent to child in a self referencing model.
I.e. a single parent message could have many child sub-messages. In a DB model, it would be a self-referencing table.
How would I best structure this for WATS so that when I make a query "Give me 10 parent records", it will also return all the child-messages belonging to the parent...
The entity of the message / submessage as below. I've tried to define the PK and RK as below:
public class TextCacheEntity : AzureTableEntity // custom table inherits AzureTableEntity
{
    public override void GenerateKeys()
    {
        PartitionKey = string.Format("{0}_{1}_{2}", MessageType, AccountId.PadThis(), ParentMessageId);
        RowKey = string.Format("{0}_{1}", DateOfMessage.Ticks.ReverseTicks(), MessageId);
    }
    public string MessageType { get; set; }
    public int AccountId { get; set; }
    public DateTime DateOfMessage { get; set; }
    public string MessageId { get; set; }
    public string ParentMessageId { get; set; }
    // other properties...
}
I thought of an implementation so the child messages store the parentMessagesId, and the parent parentMessageId would be empty.
The pattern would then be
Get the parent messages:
    .Where(o => o.PartitionKey == "Parent_000000000000001_").Take(10)
Get the child messages. Iterate through all the parent messages, using a parallel for loop:
    .Where(o => o.PartitionKey == "Child_000000000000001_" + parentMessageId)
But the problem is that this will result in 11 queries!
See this example by Scott Densmore:
You can do this by using the same PK for both. There are a couple of reasons to do this, but one good one is that you can then also issue batch commands for parent and children at once and achieve a type of consistent transaction. Also, when they share the same PK within the same table, it means they are going to be colocated and served from the same partition. You are less likely to hit continuation tokens (but you should still expect them). To differentiate between parent and children you can either add an attribute or perhaps use the RowKey.
The only trick to this (and the model you already have) is that if the parent and children are not the same CLR type, you will have issues with serialization in WCF Data Services. You can fix this, of course, by creating an uber-CLR type that has both child and parent properties, or you can override serialization with the ReadingEntity event and handle it yourself.
Anyhow, use the same PK for both children and parent. Then when you search PK ranges you will always get parents and children returned at once (you can discriminate with a Where clause predicate if you wish).
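
Here is a rough in-memory sketch of that layout, with a plain list standing in for the table. The key scheme is an assumption for illustration: one partition per account, and a RowKey that starts with the parent message id so a parent and its children sort next to each other. MessageRow, MessageKeys and MessageQueries are hypothetical names, and the parent id must not contain an underscore for the grouping to work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MessageRow
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public bool IsParent { get; set; }   // attribute used to discriminate
    public string Text { get; set; }
}

public static class MessageKeys
{
    // Parent RowKey is "<parentId>_", children are "<parentId>_<childId>",
    // so the parent sorts immediately before its own children.
    public static MessageRow Parent(string account, string parentId, string text)
    {
        return new MessageRow { PartitionKey = account, RowKey = parentId + "_", IsParent = true, Text = text };
    }

    public static MessageRow Child(string account, string parentId, string childId, string text)
    {
        return new MessageRow { PartitionKey = account, RowKey = parentId + "_" + childId, IsParent = false, Text = text };
    }
}

public static class MessageQueries
{
    // Stand-in for one partition query; rows are returned in RowKey order.
    public static List<MessageRow> QueryPartition(IEnumerable<MessageRow> table, string partitionKey)
    {
        return table.Where(r => r.PartitionKey == partitionKey)
                    .OrderBy(r => r.RowKey, StringComparer.Ordinal)
                    .ToList();
    }

    // Splits the result of that single query back into parent/child threads.
    public static Dictionary<string, List<MessageRow>> GroupByParent(IEnumerable<MessageRow> rows)
    {
        return rows.GroupBy(r => r.RowKey.Substring(0, r.RowKey.IndexOf('_')))
                   .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

With this shape, one query over the partition brings back parents and children together, and the grouping happens client-side instead of in ten follow-up queries.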
