ServiceStack - OrmLite - high volume data loading

I am running into some issues with ServiceStack and OrmLite in high-volume data loading scenarios. Specifically:
1. I have a list of 1,000,000+ entities
2. I would like to insert them into the DB (using SQL Server) if the record does not exist yet
Thus,
public class Entity
{
    [AutoIncrement]
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
}
Now for the import logic,
List<Entity> entities = oneMillionEntities.ToList();
foreach (var entity in entities)
{
    if (!db.Exists<Entity>(ar => ar.Address == entity.Address))
    {
        db.Save(entity);
    }
}
The issue is that quite often the db is still busy with the save action, so db.Exists does not always produce the correct result. What is the best way of handling these scenarios?

Try
// Prepare a SqlExpression that selects each distinct Address already in the db
var ev = db.From<Entity>().Select(p => p.Address).GroupBy(p => p.Address);
// Execute it and load the single Address column into a HashSet for fast lookups
var dbAddresses = db.Column<string>(ev).ToHashSet();
// Keep only the local entities whose Address isn't in the db yet
var filteredEntities = oneMillionEntities.Where(p =>
    !dbAddresses.Contains(p.Address));
// Bulk insert
db.InsertAll(filteredEntities.ToList());
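If a single InsertAll over the filtered list proves slow at this volume, one common approach is to chunk the list and commit each chunk in its own transaction. A rough sketch (the batch size of 5,000 is an arbitrary assumption to tune for your environment):

const int batchSize = 5000; // assumption: tune for your setup
var toInsert = filteredEntities.ToList();
for (var i = 0; i < toInsert.Count; i += batchSize)
{
    using (var trans = db.OpenTransaction())
    {
        // commit a few thousand rows at a time instead of the whole million at once
        db.InsertAll(toInsert.Skip(i).Take(batchSize));
        trans.Commit();
    }
}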

Related

"Traditional" one-to-many Query with RavenDB

I know the include feature of RavenDB. It allows me to fetch a referenced document right away, in one round trip to the database. But my problem is: the document I fetch in the first place does not include a reference to the "other" documents. Rather, the "other" documents have references to the current document.
Imagine a setup where we have sites across the world. Each site may trigger various alarms. Each alarm has a reference to the site via siteId.
Now I would like to get a list of all the sites including all their alarms. But it looks like this is not possible with RavenDB, since include only accepts a "path" in the site document which holds an id (or an array of ids) to the referenced documents.
This could be solved by providing an array of alarmIds within the site document and referencing this array in include. But in contrast to a lot of examples featuring things like an `order` with `lineItems`, where the order is a self-contained thing, my `site` will be running for years, collecting anywhere between 0 and a million alarms. That seems like a bad idea to me.
Of course I could go the other way round: query all alarms and include the sites via siteId. But this would not return a site that has zero alarms.
So is this just a design error on my side? Do I misunderstand something? Or is it just not possible to do this in one query and avoid an "n+1 query" situation?
public class A
{
    public string Id { get; set; }
}

public class B
{
    public string Id { get; set; }
    public string A { get; set; }
}

public class MultiMapIndex : AbstractMultiMapIndexCreationTask<MultiMapIndex.Result>
{
    public class Result
    {
        public string Id { get; set; }
        public IEnumerable<string> Bs { get; set; }
    }

    public MultiMapIndex()
    {
        // Map every A to a result with no Bs, and every B to a result
        // keyed by the A it references, carrying its own Id
        AddMap<A>(items => from a in items
                           select new Result { Id = a.Id, Bs = new string[0] });
        AddMap<B>(items => from b in items
                           select new Result { Id = b.A, Bs = new[] { b.Id } });

        // Merge the entries per A-id, concatenating the referencing B-ids
        Reduce = results => from result in results
                            group result by result.Id
                            into g
                            select new Result { Id = g.Key, Bs = g.SelectMany(r => r.Bs) };
    }
}
[Fact]
public async Task TestCase()
{
    using var store = GetDocumentStore();
    await new MultiMapIndex().ExecuteAsync(store);

    using (var session = store.OpenAsyncSession())
    {
        await session.StoreAsync(new B { A = "a/1" }, "b/0");
        await session.StoreAsync(new A(), "a/1");
        await session.StoreAsync(new A(), "a/2");
        await session.SaveChangesAsync();
    }

    WaitForIndexing(store);

    using (var session = store.OpenAsyncSession())
    {
        // Query the index and include the referenced B documents in one round trip
        var results = await session.Query<MultiMapIndex.Result, MultiMapIndex>()
            .Include(r => r.Bs)
            .ToArrayAsync();

        // Loading the included Bs must not trigger any additional requests
        var before = session.Advanced.NumberOfRequests;
        var bs = await session.LoadAsync<B>(results[0].Bs);
        Assert.Equal(before, session.Advanced.NumberOfRequests);
    }
}
If you do choose to query all Alarms, as you mention,
then you can create a Map-Reduce index on the Alarms collection that groups by Site.
You can then query this Map-Reduce index and know, per Site, the count of Alarms it has.
https://demo.ravendb.net/demos/csharp/static-indexes/map-reduce-index
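For illustration, a minimal sketch of such a Map-Reduce index; the Alarm class and SiteId property are assumed names. Note it only yields sites that have at least one alarm, which is why the multi-map index above is needed when zero-alarm sites must also appear:

// Assumed document shape: each Alarm carries the SiteId it belongs to
public class Alarms_CountBySite : AbstractIndexCreationTask<Alarm, Alarms_CountBySite.Result>
{
    public class Result
    {
        public string SiteId { get; set; }
        public int Count { get; set; }
    }

    public Alarms_CountBySite()
    {
        // Emit one entry per alarm...
        Map = alarms => from alarm in alarms
                        select new Result { SiteId = alarm.SiteId, Count = 1 };

        // ...then sum the entries per site
        Reduce = results => from result in results
                            group result by result.SiteId into g
                            select new Result { SiteId = g.Key, Count = g.Sum(r => r.Count) };
    }
}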

Parameter lengths for parameterized queries in OrmLite

A POCO update in OrmLite executes SQL like this example:
(@P1 varchar(1043), @P2 varchar(6))
UPDATE table
SET FILEDATA=@P1
WHERE FILEID=@P2
But it leads to multiple query plans, because @P1 and @P2 are sized to each value and so vary in length between calls.
So, what's the best way to specify data types/lengths for parameterized queries in OrmLite, so that query plans are cached properly and multiple plans due to variable parameter lengths are avoided?
Here's a similar situation with having variable length strings: https://dba.stackexchange.com/questions/216330/parameterized-query-creating-many-plans
Update
Here's an example:
Database Table
dbo.Users
Id (PK, int, not null)
Email (nvarchar(150), not null)
POCO
[Alias("Users")]
public class User
{
    [PrimaryKey]
    [AutoIncrement]
    public int Id { get; set; }

    public string Email { get; set; }
}
Code
int userId = 1;
User user;

// get User
using (var db = DbConn.OpenDbConnection())
{
    user = db.SingleById<User>(userId);
}

// print User email (hi@example.com)
Console.WriteLine(user.Email);

// update User email
using (var db = DbConn.OpenDbConnection())
{
    user.Email = "tester@example.org";
    db.Update(user);
}
The update operation results in an SQL query similar to the one posted at the top, with variable parameter lengths, which causes SQL Server to create multiple query plans. Ideally, the query should use fixed parameter lengths so that one plan can be created, cached, and reused for the same operation (e.g. a User update) with varying parameter values (i.e. different emails).
The Size of string parameters is now specified as of this commit, where it takes the default string size of the configured StringConverter. This change is available from v5.5.1, which is now available on MyGet.
If needed, its behavior can be overridden by replacing the String Converter and overriding InitDbParam().
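For example, a rough sketch of that override for SQL Server; SqlServerStringConverter as the base class and the fixed size of 150 (matching the Email column above) are assumptions for this example:

public class FixedSizeStringConverter : SqlServerStringConverter
{
    public override void InitDbParam(IDbDataParameter p, Type fieldType)
    {
        base.InitDbParam(p, fieldType);
        p.Size = 150; // pin the parameter length so one plan is cached and reused
    }
}

// Register the replacement converter on the dialect provider at startup:
SqlServerDialect.Provider.RegisterConverter<string>(new FixedSizeStringConverter());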

Create table with custom name dynamically and insert with custom table name

I want to create a table with a custom name, but I cannot find any sample code. I notice the only way to create a table is by generic type, like db.CreateTable<T>(). May I know if there is a way to set the table name dynamically instead of using an Alias? The reason is that sometimes we want to store the same object type into different tables, like 2015_january_activity or 2015_february_activity.
Apart from this, db.Insert is also limited to the object type. Is there any way to insert by passing in the table name?
I think these features are very important, as they have existed in NoSQL solutions for a long time and are very flexible. Thanks.
OrmLite is primarily a code-first ORM which uses typed POCOs to create and query the schema of matching RDBMS tables. It also supports executing custom SQL using the Custom SQL APIs.
One option to use a different table name is to change the Alias at runtime, as seen in this previous answer, where you can create custom extension methods that modify the name of the table, e.g:
public static class GenericTableExtensions
{
    static object ExecWithAlias<T>(string table, Func<object> fn)
    {
        var modelDef = typeof(T).GetModelMetadata();
        lock (modelDef)
        {
            // Temporarily swap the model's Alias so the wrapped API call
            // targets the custom table name, then restore the original
            var hold = modelDef.Alias;
            try
            {
                modelDef.Alias = table;
                return fn();
            }
            finally
            {
                modelDef.Alias = hold;
            }
        }
    }

    public static void DropAndCreateTable<T>(this IDbConnection db, string table)
    {
        ExecWithAlias<T>(table, () => { db.DropAndCreateTable<T>(); return null; });
    }

    public static long Insert<T>(this IDbConnection db, string table, T obj, bool selectIdentity = false)
    {
        return (long)ExecWithAlias<T>(table, () => db.Insert(obj, selectIdentity));
    }

    public static List<T> Select<T>(this IDbConnection db, string table, Func<SqlExpression<T>, SqlExpression<T>> expression)
    {
        return (List<T>)ExecWithAlias<T>(table, () => db.Select(expression));
    }

    public static int Update<T>(this IDbConnection db, string table, T item, Expression<Func<T, bool>> where)
    {
        return (int)ExecWithAlias<T>(table, () => db.Update(item, where));
    }
}
These extension methods provide additional APIs that let you change the name of the table used, e.g:
var tableName = "TableA";
db.DropAndCreateTable<GenericEntity>(tableName);
db.Insert(tableName, new GenericEntity { Id = 1, ColumnA = "A" });

var rows = db.Select<GenericEntity>(tableName, q => q.Where(x => x.ColumnA == "A"));
rows.PrintDump();

db.Update(tableName, new GenericEntity { ColumnA = "B" }, where: q => q.ColumnA == "A");

rows = db.Select<GenericEntity>(tableName, q => q.Where(x => x.ColumnA == "B"));
rows.PrintDump();
This example is also available in the GenericTableExpressions.cs integration test.
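Applied to the month-per-table scenario from the question, the same extension methods could be used like this (GenericEntity stands in for the activity POCO; the table-name format is just illustrative):

// e.g. "2015_january_activity" (month name depends on the current culture)
var monthlyTable = $"{new DateTime(2015, 1, 1):yyyy_MMMM}_activity".ToLowerInvariant();
db.DropAndCreateTable<GenericEntity>(monthlyTable);
db.Insert(monthlyTable, new GenericEntity { Id = 1, ColumnA = "A" });
var januaryRows = db.Select<GenericEntity>(monthlyTable, q => q.Where(x => x.ColumnA == "A"));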

'Unexpected element: XX' during deserialization MongoDB C#

I'm trying to persist an object into MongoDB, using the following bit of code:
public class myClass
{
    public string Heading { get; set; }
    public string Body { get; set; }
}

static void Main(string[] args)
{
    var mongo = MongoServer.Create();
    var db = mongo.GetDatabase("myDb");
    var col = db.GetCollection<BsonDocument>("myCollection");
    var myinstance = new myClass();
    col.Insert(myinstance);

    var query = Query.And(Query.EQ("_id", new ObjectId("4df06c23f0e7e51f087611f7")));
    var res = col.Find(query);
    foreach (var doc in res)
    {
        var obj = BsonSerializer.Deserialize<myClass>(doc);
    }
}
However, I get the exception 'Unexpected element: _id' when trying to deserialize the document.
Do I need to deserialize in another way? What is the preferred way of doing this?
TIA
Søren
You are searching for a given document using an ObjectId, but when you save an instance of MyClass you aren't providing an Id property, so the driver creates one for you (you can make any property the id by adding the [BsonId] attribute to it). When you retrieve that document, your class has no Id member, so you get the deserialization error.
You can add the [BsonIgnoreExtraElements] attribute to the class as Chris said, but you should really add an Id property of type ObjectId to your class; you obviously need the Id, as you are using it in your query. Since the _id property is reserved for the primary key, you are only ever going to retrieve a single document, so you would be better off writing your query like this:
col.FindOneById(new ObjectId("4df06c23f0e7e51f087611f7"));
The fact that you are deserializing to an instance of MyClass once you retrieve the document lends itself to strongly typing the collection. So where you create an instance of the collection you can do this:
var col = db.GetCollection<MyClass>("myCollection");
so that when you retrieve the document using the FindOneById method, the driver takes care of the deserialization for you. Putting it all together (provided you add the Id property to the class) you could write:
var col = db.GetCollection<MyClass>("myCollection");
MyClass myClass = col.FindOneById(new ObjectId("4df06c23f0e7e51f087611f7"));
One final thing to note: as the _id property is created for you on save by the driver, if you were to leave it off your MyClass instance, every time you saved that document you would get a new Id and hence a new document. So if you saved it n times you would have n documents, which probably isn't what you want.
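For reference, a minimal sketch of the class this answer describes (the driver maps a property named Id to _id by convention; [BsonId] just makes the mapping explicit):

public class MyClass
{
    [BsonId]
    public ObjectId Id { get; set; } // maps to the document's _id

    public string Heading { get; set; }
    public string Body { get; set; }
}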
A slight variation on Projapati's answer. First, Mongo will happily deserialize the id value to a property named Id, which is more C#-ish. But you don't necessarily need to do this if you are just retrieving data.
You can add [BsonIgnoreExtraElements] to your class and it should work. This allows you to return a subset of the data, which is great for queries and view-models.
Try adding _id to your class.
This usually happens when your class doesn't have members for all fields in your document.
public class myClass
{
    public ObjectId _id { get; set; }
    public string Heading { get; set; }
    public string Body { get; set; }
}

Azure table storage - pattern for parent-child (self referencing schema)

I'm using Windows Azure Table Storage (WATS) while updating the app to use Azure. I've read many articles and am not sure of the best approach for parent-to-child in a self-referencing model.
I.e. a single parent message could have many child sub-messages; in a DB model it would be a self-referencing table.
How would I best structure this for WATS so that when I make a query such as "give me 10 parent records", it also returns all the child messages belonging to those parents?
The entity for the message / sub-message is as below. I've tried to define the PK and RK as follows:
public class TextCacheEntity : AzureTableEntity // custom table inherits AzureTableEntity
{
    public override void GenerateKeys()
    {
        PartitionKey = string.Format("{0}_{1}_{2}", MessageType, AccountId.PadThis(), ParentMessageId);
        RowKey = string.Format("{0}_{1}", DateOfMessage.Ticks.ReverseTicks(), MessageId);
    }

    public string MessageType { get; set; }
    public int AccountId { get; set; }
    public DateTime DateOfMessage { get; set; }
    public string MessageId { get; set; }
    public string ParentMessageId { get; set; }
    // other properties...
}
I thought of an implementation where the child messages store the parentMessageId, while the parent's own parentMessageId would be empty.
The pattern would then be:
Get the parent messages:
.Where(o => o.PartitionKey == "Parent_000000000000001_").Take(10)
Get the child messages, iterating through all the parent messages with a parallel for loop:
.Where(o => o.PartitionKey == "Child_000000000000001_" + parentMessageId)
But the problem is that this results in 11 queries!
See this example by Scott Densmore:
http://scottdensmore.typepad.com/blog/2011/04/multi-entity-schema-tables-in-windows-azure.html
You can do this by using the same PK for both. There are a couple of reasons to do this, but one good one is that you can then also issue batch commands for parent and children at once and achieve a kind of consistent transaction. Also, when they share the same PK within the same table, they are colocated together and served from the same partition, so you are less likely to get continuation tokens (though you should still expect them). To differentiate between parent and children you can either add an attribute or use the RowKey.
The only trick to this (and the model you already have) is that if the parent and children are not the same CLR type, you will have issues with serialization in WCF Data Services. You can fix this by creating an uber CLR type that has both child and parent properties, or you can override serialization with the ReadingEntity event and handle it yourself.
Anyhow, use the same PK for both children and parent. Then when you search PK ranges you will always get parents and children returned at once (you can discriminate with a Where clause predicate if you wish).
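As a rough sketch of what that could look like for the TextCacheEntity above (the "0"/"1" RowKey discriminators are assumptions for this example, not a prescribed convention):

public override void GenerateKeys()
{
    // One PartitionKey per account: parents and children are colocated,
    // so a single partition query returns both at once.
    PartitionKey = AccountId.PadThis();

    // The RowKey discriminates parent from child rows; "0" sorts before
    // "1_", so each parent's children sort directly after it.
    RowKey = string.IsNullOrEmpty(ParentMessageId)
        ? string.Format("{0}_0", MessageId)                         // parent row
        : string.Format("{0}_1_{1}", ParentMessageId, MessageId);   // child rows
}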
