Background (i.e. what the heck is a relative complement?)
Relative Complement
What I'm trying to do
Let's say I've got a custom Vehicle entity that has a VehicleType option set that is either "Car" or "Truck". There is a one-to-many relationship between Contact and Vehicle (i.e. ContactId is on the Vehicle entity). How do I write an XRM query (LINQ to CRM, QueryExpression, FetchXML, whatever) that returns the contacts that have only cars?
Option 1:
I'd prefer a modification of the proposal that AdamV makes above. I can't think of a way that you'd get this particular query answered using LINQ to CRM, Query Expressions, or FetchXML alone. Daryl doesn't say what the client is, but I would suppose that if LINQ and Query Expressions are acceptable offerings, .NET is on the table. Creating aggregate fields on the parent entity (Contact in this case) containing the count of the related entity offers more than the Boolean option: if the query requirements ever change to a threshold (more than X cars, fewer than Y trucks, between X and Y total vehicles), the Boolean option fails to deliver. The client in this question isn't known, but I can't think of many (any?) cases where pulling all the records to the client on a set of 500K+ rows is more efficient than the SQL query that CRM would make on your behalf against several integer fields with range clauses. (A sketch of such a client query appears after the lists below.)
Upside:
Maintains client purity in Query approach
Simple client query
Probably as performant as possible
Downside:
Setup of the aggregate fields
Workflow or plugin to manage the increment and decrement of the aggregate fields
SQL Script for initial load of the aggregates.
Risk that aggregate fields get out of sync (workflow or plugin fails)
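To make the payoff concrete, here is a minimal, untested sketch of the client query this buys you. The aggregate fields new_carcount and new_truckcount are hypothetical names for the fields the workflow/plugin would maintain:
// assumes Microsoft.Xrm.Sdk and Microsoft.Xrm.Sdk.Query
var query = new QueryExpression("contact")
{
    ColumnSet = new ColumnSet("fullname"),
    Criteria =
    {
        Conditions =
        {
            // at least one car, and (since VehicleType is only Car/Truck) no trucks
            new ConditionExpression("new_carcount", ConditionOperator.GreaterThan, 0),
            new ConditionExpression("new_truckcount", ConditionOperator.Equal, 0)
        }
    }
};
EntityCollection contactsWithOnlyCars = service.RetrieveMultiple(query);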
Option 2:
If purity within the client isn't essential, and .NET is on the table, skip the aggregate fields and the setup and just run SQL against the views. If you don't want to work with raw ADO.NET, a thin ORM like Dapper, Massive, or PetaPOCO can still give you an object model (see the Dapper sketch after the SQL below). As Andreas offers in his comment on the OP's first answer, it seems like something fairly trivial to do in SQL.
Sketching something off the top of my head:
SELECT c.*
FROM Contact c
WHERE c.ContactId IN (
    SELECT v.ContactId
    FROM Vehicle v
    WHERE v.Type = 'Car'
    GROUP BY v.ContactId
    HAVING COUNT(v.ContactId) >= 1
)
AND c.ContactId NOT IN (
    SELECT v.ContactId
    FROM Vehicle v
    WHERE v.Type <> 'Car'
    GROUP BY v.ContactId
    HAVING COUNT(v.ContactId) >= 1
)
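If you take the Dapper route, the call is about as thin as it gets. A sketch under assumptions: sql holds the query above, crmConnectionString points at the CRM database (ideally its filtered views), and ContactDto is an invented POCO:
// assumes System.Data.SqlClient, Dapper, System.Linq
public class ContactDto
{
    public Guid ContactId { get; set; }
    public string FullName { get; set; }
}

using (var conn = new SqlConnection(crmConnectionString))
{
    // Dapper maps the result columns onto the DTO by name
    List<ContactDto> contactsWithOnlyCars = conn.Query<ContactDto>(sql).ToList();
}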
Upside:
Much less work
CRM Entities get left alone
Downside:
Depending on the client and/or the application, mixing data-access methods is a bit kludgy.
Likely less performant than Option 1
Option 3:
Mix and match: take the aggregate fields from Option 1, but update them using a scheduled SQL job (or something similar) with a query similar to the initial-load job you'd need to write in Option 1.
Upside:
Takes most of the work and risk out of Option 1
Keeps all of the performance of Option 1
Downside:
Some will see this as an unsupported feature.
In order to perform a true relative complement query you need to be able to perform a subquery.
Your query would basically say: give me all the contacts with cars, and then, within those results, remove any contacts that have a vehicle that isn't a car. This is what the SQL in @JasonKoopmans' answer does. Unfortunately, CRM does not support subqueries.
Therefore, the only way to achieve this is to either perform the subquery on the client side, as I resorted to doing, or store the results of what would be the subquery in a manner that can be accessed through the main query (i.e. storing counts on the contact entity).
You could theoretically do this "on the fly" by making a SubQueryResult entity that stores a ContactId and SubQueryId. You'd first pull back the contacts that have at least 1 car, and create a SubQueryResult record for each record, with its contactId, and a single SubQueryId that is generated client side to tie them all together.
Then you'd do another query that says give me all the contacts that are in this SubQueryResult with this SubQueryId, that do not have any vehicles that aren't cars.
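A sketch of that idea, with every entity and field name invented for illustration:
// 1) pull back the contacts that have at least one car
EntityCollection carContacts = service.RetrieveMultiple(contactsWithCarsQuery);

// 2) tag them all with a client-generated SubQueryId
Guid subQueryId = Guid.NewGuid();
foreach (Entity c in carContacts.Entities)
{
    Entity row = new Entity("new_subqueryresult");
    row["new_contactid"] = new EntityReference("contact", c.Id);
    row["new_subqueryid"] = subQueryId.ToString();
    service.Create(row);
}

// 3) second query: contacts in this SubQueryResult set that have
//    no vehicles that aren't cars (outer join on vehicle, filtered)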
I could only assume that this wouldn't be any more efficient than performing the two separate queries and performing the filter client side. Although with the new ExecuteMultipleRequest in the new CRM release, it may be close.
I have resorted to pulling back all of my records in CRM and performing the check on the client side, since CRM 2011 doesn't support this via Query Expressions.
You could write two FetchXML statements, one to return all contacts and the count of their vehicles, and another to return all contacts and the count of their cars, then compare the lists on the client side. But once again, you're having to return every contact and filter it client side.
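A rough, untested sketch of that two-query comparison, embedding the FetchXML in C#. The vehicle/contactid/vehicletype names are guesses from the question, and the option set value for "Car" (here 100000000) is a placeholder:
// assumes Microsoft.Xrm.Sdk, Microsoft.Xrm.Sdk.Query, System.Linq
string totalsFetch = @"<fetch aggregate='true'>
  <entity name='vehicle'>
    <attribute name='vehicleid' aggregate='count' alias='cnt' />
    <attribute name='contactid' groupby='true' alias='contact' />
  </entity>
</fetch>";

// same aggregate, restricted to cars
string carsFetch = @"<fetch aggregate='true'>
  <entity name='vehicle'>
    <attribute name='vehicleid' aggregate='count' alias='cnt' />
    <attribute name='contactid' groupby='true' alias='contact' />
    <filter><condition attribute='vehicletype' operator='eq' value='100000000' /></filter>
  </entity>
</fetch>";

Func<string, Dictionary<Guid, int>> countsByContact = fetchXml =>
    service.RetrieveMultiple(new FetchExpression(fetchXml)).Entities.ToDictionary(
        e => ((EntityReference)((AliasedValue)e["contact"]).Value).Id,
        e => (int)((AliasedValue)e["cnt"]).Value);

Dictionary<Guid, int> totals = countsByContact(totalsFetch);
Dictionary<Guid, int> cars = countsByContact(carsFetch);

// a contact whose car count equals its total vehicle count has only cars
List<Guid> contactsWithOnlyCars = cars
    .Where(kv => totals[kv.Key] == kv.Value)
    .Select(kv => kv.Key)
    .ToList();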
It's not tested but how about this query expression? I'm linking in the Vehicle entity as an inner join, requiring that it's a Car. I'm assuming that the field VehicleType is a String because I'm a bit lazy and don't want to test it (I'm typing this hardcore style, no compilation - pure brain work).
Optionally, you might want to add a Criteria section as well to control which of the Contact instances actually get retrieved. Do tell how it went!
Sorry for the verbosity. I know you like it short. My brains work better when circumlocutory.
var query = new QueryExpression
{
EntityName = "contact",
ColumnSet = new ColumnSet("fullname"),
LinkEntities =
{
new LinkEntity
{
JoinOperator = JoinOperator.Inner,
LinkFromEntityName = "contact",
LinkFromAttributeName = "contactid",
LinkToEntityName = "vehicle",
LinkToAttributeName = "contactid",
Columns = new ColumnSet("vehicletype"),
EntityAlias = "Vroom",
//LinkCriteria = { Conditions =
//{
// new ConditionExpression(
// "vehicletype", ConditionOperator.Equal, "car")
//} }
LinkCriteria = { Conditions =
{
new ConditionExpression(
"vehicletype", ConditionOperator.NotEqual, "truck")
} }
}
}
};
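If the brain-compile holds up, executing it is then a one-liner (service being your IOrganizationService):
EntityCollection contacts = service.RetrieveMultiple(query);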
EDIT:
I've talked to my MVP Gustaf Westerlund and he's suggested the following work-around. Let me stress that it's not an answer to your original question. It's just a way to solve it. And it's cumbersome. :)
So, the hint is to add a flag to the Contact or Person entity. Then, every time you create a new instance of Vehicle, you need to fire a message and, using a plugin, update the information on the former about the creation of the latter.
This has several drawbacks.
It requires us to do stuff.
It's not the straight-forward do-this-and-that type of approach.
Maintenance is higher for every new type of Vehicle one adds.
Bugginess is elevated since there are many cases to regard (what happens to the flagification when a Vehicle instance is reassigned, deleted, etc.).
So, my answer to your question is changed to: "can't be done". This remains effective until (gladly) proven wrong by presented alternative solution. Duck!
Personally, I'd fetch (almost) everything and unleash the hounds of LINQ onto it. But I'd do that neither smiling nor proud. :)
Related
Does somebody know how to get the last modified row of a table?
For example:
I have a Service Builder with a "Car" entity; this entity has a column called "LastModified". I want something that gets the one "Car" that was last modified.
I don't know if creating a finder with a where clause is good practice.
Thank you!
First off, service builder entities have a column called "modifiedDate" by default. Just want to make sure you're aware of that so you aren't creating redundant columns: "LastModified" and "modifiedDate".
Secondly, you could use either a custom SQL query or a dynamic query to get the Car with the most recent modifiedDate. Both approaches are documented:
https://dev.liferay.com/develop/tutorials/-/knowledge_base/6-2/developing-custom-sql-queries
https://dev.liferay.com/develop/tutorials/-/knowledge_base/6-2/leveraging-hibernates-criteria-api
Personally, I'd try the dynamic query approach (leveraging hibernate's criteria API) first. I think it's slightly simpler.
In your finder method, you could do something like this:
Order order = OrderFactoryUtil.desc("modifiedDate");
DynamicQuery carQuery = DynamicQueryFactoryUtil.forClass(Car.class)
        .addOrder(order)
        .setLimit(0, 1);
List<Car> cars = CarLocalServiceUtil.dynamicQuery(carQuery);
The setLimit(0, 1) limits the result of the query to only the first Car.
Let's use the classic example of a blog context. In our domain we have the following scenarios: Users can write Posts. Posts must be cataloged in at least one Category. Posts can be described using Tags. Users can comment on Posts.
The four entities (Post, Category, Tag, Comment) are implemented as different aggregates because I have not detected any rule by which one entity's data should interfere with another's. So, for each aggregate I will have one repository that represents it. Also, each aggregate references others by their ids.
Following CQRS, from this scenario I have deduced typical use cases that result in commands such as WriteNewPostCommand, PublishPostCommand, DeletePostCommand, etc., along with their respective queries to get data from repositories: FindPostByIdQuery, FindTagByTagNameQuery, FindPostsByAuthorIdQuery, etc.
Depending on which side of the app we are on (backend or frontend) we will have queries that are more or less complex. So, if we are on the front page, maybe we need to build some widgets to get the last comments, the latest posts of a category, etc. These are queries that involve a simple Query object (few search criteria) and a very simple QueryHandler (a single repository as a dependency of the handler class).
But in other places these queries can be more complex. In an admin panel we need to show, in a table, the records that satisfy complex search criteria. It might be interesting to search posts by: author name (not id), category names, tag names, publish date... criteria that belong to different aggregates and different repositories.
In addition, in our table of posts we don't want to show the post along with the author ID or category IDs. We need to show all the information (user name, avatar, category name, category icon, etc.).
My questions are:
At the infrastructure layer, when we design repositories, should the search methods (findAll, findById, findByCriteria...) return the corresponding entity with references to all its association ids? I mean, if I have a method findPostById(uuid) or findPostByCustomFilter(filter), should it return a Post instance with references to all the category ids, tag ids, and the author id it has? Or should my repo have some kind of method that populates a given Post instance with the associations I want?
If I want to search posts created from 12/12/2014, written by John, categorised in the "News" and "Videos" categories with the tags "sci-fi" and "adventure", and get the full details of each aggregate, how should I create my Query and QueryHandler?
a) Create a Query with all my parameters (authorName, categoryNames, tagNames, and whether I want to retrieve the full User, Category, and Tag associations) and have its QueryHandler assemble the different read models into a single one. Or...
b) Create different Queries (FindCategoryByName, FindTagByName, FindUserByName) and have my web controller call them first, and then call FindPostQuery, passing it the authorId, categoryId, and tagId returned from the other queries?
Solution b) appears cleaner, but it seems more expensive to me.
On the query side, there are no entities. You are free to populate your read models in whatever way suits your requirements best. Whatever data you need to display on (a part of) the screen, you put it in the read model. It's not the command-side repositories that return these read models but specialized query-side data access objects.
You mentioned "complex search criteria" -- I recommend you model it with a corresponding SearchCriteria object. This object would be technology-agnostic, but it would be passed to your query-side data access object, which would know how to combine the criteria to build a lower-level query for the specific data store it targets.
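A minimal sketch of such a criteria object and the query-side DAO it feeds (all names invented):
// technology-agnostic criteria object, built by the UI/controller
public class PostSearchCriteria
{
    public string AuthorName { get; set; }
    public List<string> CategoryNames { get; set; } = new List<string>();
    public List<string> TagNames { get; set; } = new List<string>();
    public DateTime? PublishedFrom { get; set; }
    public int Page { get; set; } = 1;
    public int PageSize { get; set; } = 20;
}

// read model shaped for the screen, not for the domain
public class PostListItemReadModel
{
    public Guid PostId { get; set; }
    public string Title { get; set; }
    public string AuthorName { get; set; }
    public string AuthorAvatarUrl { get; set; }
    public List<string> CategoryNames { get; set; }
}

// the query-side DAO translates the criteria into the specific
// data store's query language (SQL, Elasticsearch, ...)
public interface IPostReadDao
{
    IReadOnlyList<PostListItemReadModel> Search(PostSearchCriteria criteria);
}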
With simple applications like this, it's easier not to get distracted by aggregates. Do event sourcing, and subscribe to the events with one set of tables that is easy to query the way you want.
In other words, it sounds like your main goal is to be able to query easily for the scenarios you describe. Start with that end goal. Now write your event handler to adjust your tables accordingly.
Start with events and the UI. Then everything else will fit easily. Google "Event Modeling", as it will help you formulate ideas about what and how you want to build this style of application.
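To make that concrete, a projection handler might look something like this (a sketch: the PostPublished event and PostList table are invented, and Dapper, mentioned elsewhere on this page, is used for brevity):
// assumes System.Data and Dapper
public class PostPublished
{
    public Guid PostId { get; set; }
    public string Title { get; set; }
    public string AuthorName { get; set; }
    public DateTime PublishedOn { get; set; }
}

public class PostListProjection
{
    private readonly IDbConnection _db;
    public PostListProjection(IDbConnection db) { _db = db; }

    // keep a flat, easy-to-query table in sync with the event stream
    public void Handle(PostPublished e)
    {
        _db.Execute(
            "insert into PostList (PostId, Title, AuthorName, PublishedOn) " +
            "values (@PostId, @Title, @AuthorName, @PublishedOn)", e);
    }
}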
I can see three problems in your approach and they need to be solved separately:
In CQRS the Queries are completely separate from the Commands. So, don't try to solve your queries with your command pipeline's repositories. The point of CQRS is precisely to allow you to solve the commands and queries in very different ways, as they have very different requirements.
You mention DDD in the question title, but you don't mention your Bounded Contexts in the question itself. If you follow DDD, you'll most likely have more than one BC. For example, in your question, it could be that CategoryName and AuthorName belong to two different BCs, which are also different from the BC where the blog posts are. If that is the case and each BC properly owns its own data, the data that you want to search by and show in the UI will be stored potentially in different databases, therefore implementing a query in the DB with a join might not even be possible.
Searching and Reading data are two different concerns and can/should be solved differently. When you search, you get some search criteria (including sorting and paging) and the result is basically a list of IDs (authorIds, postIds, commentIds). When you Read data, you get one or more Ids and the result is one or more DTOs with all the required data properties. It is normal that you need to read data from multiple BCs to populate a single page, that's called UI composition.
So if we agree on these 3 points, and especially focusing on point 3, I would suggest the following:
Figure out all the searches that you want to do and see if you can decompose them into simple searches by BC. For example, searching blog posts by author name is a problem, because the author information could be in a different BC than the blog posts. So, why not implement a SearchAuthorByName in the Authors BC and then a SearchPostsByAuthorId in the Posts BC (see the sketch after the next point). You can do this from the client itself or from the API. Doing it in the client gives the client a lot of flexibility, because there are many ways a client can get an authorId (from a MyFavourites list, from a paginated list, or from a search by name), and getting the posts by authorId is then a separate operation. You can do the same with tags, categories and other things. The Post will have ids, but not the extra details about those ids.
Potentially, you might want more complicated searches. As long as the search criteria (including sorting fields) contain fields from a single BC, you can easily create a read model and execute the search there. Note that this is only for the search criteria. If the search result needs data from multiple BCs you can solve it with UI composition. But if the search criteria contain fields from multiple BCs, then you'll need some sort of Search engine capable of indexing data coming from multiple sources. This is especially evident if you want to do full-text search, search by categories, tags, etc. with large quantities of data. You will need to use some specialized service like Elastic Search and it won't belong to any of your existing BCs, it'll be like a supporting service.
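Sketching the first point's two-step composition (the client/API names here are invented for illustration):
// Step 1: the Authors BC resolves a name to ids
Guid[] authorIds = authorsClient.SearchAuthorsByName("John");

// Step 2: the Posts BC searches by id; results carry ids only,
// the extra details arrive later via UI composition
PostSummary[] posts = postsClient.SearchPostsByAuthorIds(
    authorIds, publishedFrom: new DateTime(2014, 12, 12));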
With CQRS you will have a separate stack for Queries and Commands. Your query stack should be a separate module, namespace, DLL, or package in your project.
a) You will create one QueryModel and this query model will return whatever you need. If you are familiar with Entity Framework or NHibernate, you will create a façade to hold these queries together: a DbContext or Session.
b) You can create these separate queries, but again, if you are familiar with any ORM you should return the set that represents the model, expose every set as IQueryable, and use LINQ expression trees to make your query stack more dynamic.
Using Entity Framework and C# as an example:
public class QueryModelDatabase : DbContext, IQueryModelDatabase
{
public QueryModelDatabase() : base("dbname")
{
_products = base.Set<Product>();
_orders = base.Set<Order>();
}
private readonly DbSet<Order> _orders = null;
private readonly DbSet<Product> _products = null;
public IQueryable<Order> Orders
{
get { return this._orders.Include("Items").Include("Items.Product"); }
}
public IQueryable<Product> Products
{
get { return _products; }
}
}
Then you can do queries the way you need and return anything:
using (var db = new QueryModelDatabase())
{
var queryable = from o in db.Orders // Orders already includes Items and Items.Product (see the property above)
where o.OrderId == orderId
select new OrderFoundViewModel
{
Id = o.OrderId,
State = o.State.ToString(),
Total = o.Total,
OrderDate = o.Date,
Details = o.Items
};
try
{
var o = queryable.First();
return o;
}
catch (InvalidOperationException)
{
return new OrderFoundViewModel();
}
}
I'm developing a service which consumes CRM 2011 data via dynamic entities (as in, Microsoft.Xrm.Sdk.Entity, the late-binding method). I'm deliberately not using the Xrm.cs method (early binding) in an attempt to keep my solution generic.
Also, I want to avoid connecting to a CRM database directly (e.g. EDMX) as this would stop my solution being usable for a hosted CRM (e.g. with no direct DB access).
I have the following (simplified) requirement, I'm really struggling with the selection criteria:
A random 7% of records needs to be selected (and updated).
In SQL, the selection criteria would be relatively easy - I know how to select a random percentage of records. Something like:
SELECT TOP 7 PERCENT * FROM
(
SELECT TOP 1000 NEWID() AS Foo, [someColumns]
FROM [someTable]
)
AS Bar ORDER BY Bar.Foo ASC
This works perfectly. I gather the LINQ equivalent is something like:
from e in someEntities
orderby Guid.NewGuid()
select e;
There's a problem though, I don't know of a way to use LINQ with CRM 2011 dynamic entities - instead they insist on using either some restrictive QueryExpression classes/syntax, or fetchXML, as seen on this page (MSDN).
I've identified the following options for fulfilling this requirement:
Using dynamic entities, return the whole record set into a List, then simply choose a random selection by index. This however involves returning up to 10,000 records over an internet data service, which may be slow/insecure/etc.
Use a fetchXML statement. Unfortunately I don't know fetchXML, so I don't know if it's possible to do things like COUNT, TOP, PERCENT or NEWID().
Use Xrm.cs and LINQ, or use a Stored Procedure, or a SQL view. All of these options mean tying the solution down to either direct database connectivity and/or early binding, which is not desirable.
Say no to the customer.
Any advise would be greatly appreciated! Can fetchXML perform this query? Is there a better way to do this?
FetchXML does not support this, so you are down to either 1 or 3. And you are right, 3 would only work in the On Premise version, as you can't connect directly to SQL with the CRM Online product. However, that's the one I would go with unless you are absolutely sure the customer will be moving to CRM Online. If you must go with 1, you can at least limit the returned columns to only be the GUID of the record to decrease the payload size. Then when you select your random records, just go get their additional columns if needed (of course this could end up being slower due to "chattiness" depending on how many random records you are dealing with).
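A sketch of that GUID-only retrieve (untested), using the platform's paging so the 5,000-row page cap isn't a problem; the entity name salesorder is just an example:
// assumes Microsoft.Xrm.Sdk, Microsoft.Xrm.Sdk.Query, System.Linq
var query = new QueryExpression("salesorder")
{
    ColumnSet = new ColumnSet(false), // no columns; Entity.Id still comes back
    PageInfo = new PagingInfo { PageNumber = 1, Count = 5000 }
};

var ids = new List<Guid>();
EntityCollection page;
do
{
    page = service.RetrieveMultiple(query);
    ids.AddRange(page.Entities.Select(e => e.Id));
    query.PageInfo.PageNumber++;
    query.PageInfo.PagingCookie = page.PagingCookie;
} while (page.MoreRecords);

// then pick the random 7% client side
var rng = new Random();
List<Guid> sample = ids.OrderBy(_ => rng.Next())
                       .Take((int)Math.Ceiling(ids.Count * 0.07))
                       .ToList();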
Dynamics CRM 2011, at this point, can't give you the degree of querying power that SQL and other LINQ providers can give, so I really believe you'll want to say no to the customer and move to the on-premise version if he/she wants that kind of flexibility.
With that said, a variant of method #1 is to, rather than fetch all rows at once and then choose your random set, fetch a random set from the entity one row at a time until you have the number of rows you want. The downside of this method is that instead of one call to the DB, there are many, which slows down the overall retrieve speed. A POC is below.
As for #2, I believe it's possible to handle all of your requirements, with some degree of success, using FetchXML. In fact, the only way to get aggregated data is by using FetchXML, and it also supports paging.
As for #3, native SQL is your best bet to get everything you want out of your data at this point, but that notwithstanding, while the LINQ provider is limited, it's a lot easier to transition SQL statements to LINQ than to fetchXML, and it does support late-binding/dynamic entities.
//create a list of random numbers
List<int> randomNumbers = new List<int>();
//declare a percentage of records you'd like to retrieve
double pctg = 0.07;
//use FetchXML to count the # of rows in the table
string fetchXml = @"<fetch aggregate='true'>
<entity name='salesorder'>
<attribute name='salesorderid' aggregate='count' alias='countIds' distinct='false' />
</entity>
</fetch>";
EntityCollection result = _service.RetrieveMultiple(new FetchExpression(fetchXml));
int rowCount = int.Parse(result.Entities[0].FormattedValues["countIds"].Replace(",", ""));
//initialize the random number list for paging
for (int i = 0; i < Math.Ceiling(pctg * rowCount); i++)
{
randomNumbers.Add((new Random(unchecked((int)(DateTime.Now.Ticks >> i)))).Next(rowCount - 1));
}
randomNumbers.Sort();
//page through the rows one at a time until you have the number of rows you want
using (OrganizationServiceContext osc = new OrganizationServiceContext(_service))
{
foreach (int r in randomNumbers)
{
foreach (var er in (from c in osc.CreateQuery("salesorder")
//not especially useful to use the orderby option as you can only order by entity attributes
//orderby c.GetAttributeValue<string>("name")
select new
{
name = c.GetAttributeValue<string>("name")
}).Skip(r).Take(1))
{
Console.WriteLine(er.name);
}
}
}
I have three document types MainCategory, Category, SubCategory... each has a parentid which relates to the id of its parent document.
So I want to set up a view so that I can get a list of SubCategories which sit under the MainCategory (preferably just using a map function)... I haven't found a way to arrange the view so this is possible.
I currently have set up a view which gets the following output -
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which query string parameters would I use to get, say, the rows whose key starts with ["22945", ...], when all I have at query time is the id "11810"? (At query time I don't have knowledge of the id "22945".)
I hope that makes sense.
Thanks
The way you store your categories seems to be suboptimal for the query you are trying to perform on it.
MongoDB.org has a page on various strategies for implementing tree structures (they should apply to CouchDB and other document databases as well) - you should consider "Array of Ancestors", where you always store the full path to your node. This makes updating/moving categories more difficult, but querying is easy and fast.
I am looking for a query that will work on Sharepoint 2003 to show me all the documents created/touched by a given userID.
I have found tables with the documents (Docs) and tables for users (UserInfo, UserData)
but the relationship between them seems a bit odd - there are 99,000 records in our UserData table and 12,000 records in UserInfo - we have 400 users!
I suppose I was expecting a simple one-to-many relationship with a user table having 400 records, joining that to the documents table, but I see that's not the case.
Any help would be appreciated.
Edit:
Thanks Bjorn,
I have translated that query back to the Sharepoint 2003 structure:
select d.*
from userinfo u
join userdata d
  on u.tp_siteid = d.tp_siteid
  and u.tp_id = d.tp_author
where u.tp_login = 'userid'
  and d.tp_iscurrent = 1
This gets me a list of siteid/listid/tp_id's. I'll have to see if I can trace those back to a filename/path.
All: any additional help is still appreciated!
I've never looked at the database in SharePoint 2003, but in 2007 UserInfo is connected to Sites, which means that every user has a row in UserInfo for each site collection (or the equivalent 2003 concept). So to identify what a user does you need both the site id and the user's id within that site. In 2007, I would begin with something like this:
select d.* from userinfo u
join alluserdata d on u.tp_siteid = d.tp_siteid
and u.tp_id = d.tp_author
where u.tp_login = '[username]'
and d.tp_iscurrentversion = 1
Update: As others write here, it is not recommended to go directly into the SharePoint database, but I would say use your head and be careful. Updates are an all-caps no-no, but selects depend on the context.
DO NOT QUERY THE SHAREPOINT DATABASE DIRECTLY!
I wonder if I made that clear enough? :)
You really need to look at the object model available in C#: you will need to get an SPSite instance for a site collection, and then iterate over the SPList instances that belong to the SPSite and its SPWeb objects.
Once you have the SPList object, you will need to call GetListItems using a query that filters for the user you want.
That is the supported way of doing what you want.
You should never go to the database directly: SharePoint isn't designed for that at all, and there is no guarantee (actually, there's a specific warning) that the structure of the database will be the same between versions and upgrades. Additionally, when content is spread over several content databases in a farm, there is no guarantee that a query that runs on one content database will do what you expect on another content database.
When you look at the object model for iteration, also note that you will need to dispose() the SPSite and SPWeb objects that you create.
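Something along these lines (a rough sketch against the 2007-era object model; the CAML user filter and 2003 API details may need adjusting):
// assumes Microsoft.SharePoint; 42 stands in for the user's id within this site
using (SPSite site = new SPSite("http://server/sites/yoursite"))
using (SPWeb web = site.OpenWeb())
{
    foreach (SPList list in web.Lists)
    {
        SPQuery query = new SPQuery();
        // filter on the built-in Author (Created By) field
        query.Query =
            "<Where><Eq><FieldRef Name='Author' LookupId='TRUE'/>" +
            "<Value Type='Integer'>42</Value></Eq></Where>";

        foreach (SPListItem item in list.GetItems(query))
        {
            Console.WriteLine(item.Url);
        }
    }
}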
Oh, and yes you may have 400 users, but I would bet that you have 30 sites. The information is repeated in the database per site... 30 x 400 = 12,000 entries in the database.
If you are going to use that query in SharePoint, you should know that creating views on the content database or querying directly against the database seems to be a big no-no. A workaround could be some custom code that iterates through the object model and writes the results to your own database. This could either be timer-based or based on an event trigger.
You really shouldn't be doing SELECTs that take locks either, i.e. add WITH (NOLOCK) to your queries. Some parts of the system are very timeout-sensitive, and if you start introducing locks that the system wasn't expecting, you can see the system freak out.
But really, you should be doing this via the object model. Mess around with something like IronPython and experimenting with the OM is almost downright pleasant.