DocumentDB ORDER BY removing undefined - azure

I have a data set containing legacy data. We are sorting by a field which is only present in newer documents.
When I sort by this field, only the objects which have this value are returned, even when they pass the WHERE clause.
How do we get a sorted set at the top for objects which have the field, and all those that don't are un-sorted at the end?
I've been trying something like this, but it's not valid:
SELECT T.id, IS_DEFINED(T.createdDate) ? T.createdDate : "2000-01-01T00:00:00" AS createdDate FROM Themes T
WHERE T.customerId = 43333
AND T.isDefault = false
AND (NOT IS_DEFINED(T.isArchived) OR T.isArchived = false)
ORDER BY createdDate ASC
I have also tried:
SELECT T.id FROM Themes T
WHERE T.customerId = 43333
AND T.isDefault = false
AND (NOT IS_DEFINED(T.isArchived) OR T.isArchived = false)
ORDER BY IS_DEFINED(T.createdDate) ? T.createdDate : "2000-01-01T00:00:00" ASC

Short of populating that field in documents where it's missing, the only way I know to do it would be to return all of the data and do the sort client side.
I wonder if you could ORDER BY a projected field and/or if you can use SQL's case and value substitution features in DocumentDB's SQL subset... and if so, are they efficient?
The bottleneck for my system is DocumentDB RUs so I try to do as much as possible client-side, but I'd love to know if this is possible and efficient in DocumentDB's SQL.

In the schema-less databases, there is no defined convention on the type order, let alone on the undefined entries when sorting. Your requirement is still valid nonetheless.
Currently the best solution for this in DocumentDB, is to do two part queries -
Look for NOT IS_DEFINED documents
Query for ORDER BY
If the number of documents without the isArchived property are few and fixed, then it might be a good idea to prepare that list and keep it in a separate document so that the first query is much faster once the document is built one time. All you need is to query that documents by "id" for all the entries in that metadata document and retrieve them first.

Related

TypeORM querybuilder with take,skip and order by

I have an issue with a NestJs service that uses Typeorm and implements pagination logic.
When I use limit and offset on a query I execute using QueryBuilder, it does not return all results (i.e. limit 10 returns only 8 results)- it does work however, meaning that the relations aren't mapped yet still filtered on, orderby does order by a relation field without mapping it too.
The issue began when I changed my limit and offset to take and skip.
It gets buggy when I use order by on a relation column I innerJoin.
For example:
querybuilder = querybuilder.innerJoin(profile.service, service)
then try to do
queryBuilder = queryBuilder.orderBy("service.price", "ASC")
It returns an error
column distinctAlias.service_price does not exist
That's because it looks like you must leftJoinAndSelect the relation in order to have the field alias included in the query.
I don't want to select the relations but only the base entity. Is there any way around it?
Limit and offset as pagination utilities are not an option since in the docs itself it says to not use them when appending joins to the query.

What library do you use for postgres+jsonb in Node?

I would like to do more complex queries on jsonb/documents that contain arrays of objects. Is there any library anyone would recommend for Node? I am using pg but I want to do more advanced queries like select the document where a document has an array with an object with a certain key/value. If there aren't any libraries that do this, does anyone know how I could do it with json functions/etc in psql? or point me to a book/resource where I could learn this advanced querying?
If you need to do really complicated things you're going to be writing SQL no matter what. But for basic queries that involve working with JSONB fields Massive (full disclosure, it's my project) has you covered, and executing handwritten prepared statements is as easy as anything else since scripts are loaded into the API.
Searching an embedded array falls into the 'really complicated' category, unfortunately, but if you know your element positions you could do this quite simply with Massive:
await db.mytable.find({
'somejson.arrayfield[0].key': 'value'
});
This would return all records from mytable where the somejson column has an arrayfield array, the first element in which contains a "key": "value" pair.
For searching, check out the Postgres docs. The specific question you have requires a lateral join on the jsonb_array_elements function like so:
SELECT somejson
FROM mytable
JOIN LATERAL jsonb_array_elements(mytable.somejson->'arrayfield') AS elements
ON TRUE
WHERE elements->>'key' = $1;
With Massive, you'd put this query in a script in your application's /db directory and run it as db.myScriptName('value'). You can use folders to group similar scripts too.

Can I query documents with undefined properties more efficiently in DocumentDB than with IS_DEFINED

I have a documentdb which I need to query a date property. I could initialize the date property to null when i create the document, but i might have a few paths like this:
Document.PathA.Date
Document.PathB.Date
Document.PathC.Date
Not all documents may need PathA or PathC so i dont want to just insert nulls all over the place. I also may need to insert a new PathD after a document's initial creation.
When I use IS_DEFINED on one of these paths It uses a very long running scan, which is understandable but something i'd like to avoid. Is there a way i can change my index for the collection so that i can query missing paths faster?

How to preform a relative complement query in CRM?

Background (ie what the heck is a relative complement?)
Relative Complement
What I'm trying to do
Let's say I've got a custom Vehicle entity that has a VehicleType option set that is either "Car", or "Truck". There is a 1 to many relationship between Contact and Vehicle (ie. ContactId is on the vehicle entity). How do I write an XRM query (Linq To CRM, QueryExpression, fetch Xml, whatever) that returns the contacts with only cars?
Option 1:
I’d prefer a modification of the proposal that AdamV makes above. I can’t think of a way that you’d get this particular query answered using Linq to CRM, Query Expressions, FetchXML alone. Daryl doesn’t offer what the client is, but I would suppose if Linq and Query Expressions were acceptable offerings, .NET is on the table. Creating aggregate fields containing the count of the related entity on the parent entity (contact in this case) offers more than the Boolean option. If the query requirements ever changed to a threshold (more than X cars, less than Y trucks, between X and Y total vehicles) the Boolean options fails to deliver. The client in this question isn’t known, but I can’t think of many (any?) cases where pulling all the records to the client on a set of 500K+ rows is more efficient than a the SQL query that CRM would make on your behalf against several integer fields with range clauses.
Upside:
Maintains client purity in Query approach
Simple client query
Probably as performant as possible
Downside:
Setups for Aggregate fields
Workflow or plugin to manage the increment and decrement of the aggregate fields
SQL Script for initial load of the aggregates.
Risk that aggregate fields get out of sync (workflow or plugin fails)
Option 2:
If purity within the client isn’t essential, and .NET is on the table – skip the aggregate fields and the setup and just run SQL against the Views. If you don’t want to work with the ADO.NET, a thin ORM like Dapper, Massive, or PetaPOCO can still give you an object model. As Andreas offers in his comment on the OP’s first answer, it seems like something fairly trivial to do in SQL.
Sketching something from top of mind:
SELECT c.*
FROM Contact
WHERE C.Contactid in (
Select contactid
FROM Vehicle v
group by v.contactid , v.type
having v.type = ‘Car’ and count(contactid) > 1
)
AND NOT IN (
Select contactid
FROM Vehicle v
group by v.contactid , v.type
having v.type <> ‘Car’ and count(contactid) > 1
)
Upside:
Much less work
CRM Entities get left alone
Downside:
Depending on the client and/or the application mixing DataAccess methods is a bit kludgy.
Likely less performant than Option 1
Option 3:
Mix and Match: Take the aggregate fields from Option 1. But update them using a scheduled SQL job (or something similar) with a query similar to the initial load job you’d need to write in Option 1
Upside:
Takes most of the work and risk out of Option 1
Keeps all of the performance of Option 1
Downside:
Some will see this as an unsupported feature.
In order to order to perform a true Relative Complement Query you need to be able to perform a subquery.
Your query would basically say give me all the contacts with cars, and then, within those results, remove any contacts that have a vehicle that isn't a car. This is what the SQL in #JasonKoopmans answer does. Unfortunetly, CRM does not support SubQueries.
Therefore, the only way to achieve this is to either perform the sub query on the client side, as I resorted to doing, or storing the results of what would be the subquery in a manner that can be accessed through the main query (ie storing counts on the contact entity).
You could theoretically do this "on the fly" by making a SubQueryResult entity that stores a ContactId, and SubQueryId. You'd first pull back the contacts that have at least 1 car, and create a SubQueryResult record for each record, with it's contactId, and a single SubQueryId that is generated client side to tie them all together.
Then you'd do another query that says give me all the contacts that are in this SubQueryResult with this SubQueryId, that do not have any vehicles that aren't cars.
I could only assume that this wouldn't be any more efficient than performing the two separate queries and performing the filter client side. Although with the new ExecuteMultipleRequests in the new CRM release, it may be close.
I have resorted to pulling back all of my records in CRM, and performing the check on the client side since CRM 2011 doesn't support this via Query Expressions.
You could write two Fetch XML statements, one to return all contacts and the count of their vehicles, and another to return all contacts and the count of their cars, then compare the list on the client side. But once again, you're having to return every contact and filter it client side.
It's not tested but how about this query expression? I'm linking in the Vehicle entity as an inner join, requiring that it's a Car. I'm assuming that the field VehicleType is a String because I'm a bit lazy and don't want to test it (I'm typing this hardcore style, no compilation - pure brain work).
Optionally, you might want to add a Criteria section as well to control which of the Contact instances that actually get retrieved. Do tell how it went!
Sorry for the verbosity. I know you like it short. My brains work better when circumlocutory.
new QueryExpression
{
EntityName = "contact",
ColumnSet = new ColumnSet("fullname"),
LinkEntities =
{
new LinkEntity
{
JoinOperator = JoinOperator.Inner,
LinkFromEntityName = "contact",
LinkFromAttributeName = "contactid",
LinkToEntityName = "vehicle",
LinkToAttributeName = "contactid",
Columns = new ColumnSet("vehicletype"),
EntityAlias = "Vroom",
//LinkCriteria = { Conditions =
//{
// new ConditionExpression(
// "vehicletype", ConditionOperator.Equal, "car")
//} }
LinkCriteria = { Conditions =
{
new ConditionExpression(
"vehicletype", ConditionOperator.NotEqual, "truck")
} }
}
}
};
EDIT:
I've talk to my MVP Gustaf Westerlund and he's suggested the following work-around. Let me stress that it's not an answer to your original question. It's just a way to solve it. And it's cumbersome. :)
So, the hint is to add a flag in the Contact or Person entity. Then, every time you create a new instance of Vehicle, you need to fire a message and using a plugin, update the information on the first about the creation of the latter.
This has several drawbacks.
It requires us to do stuff.
It's not the straight-forward do-this-and-that type of approach.
Maintenance is higher for every new type of Vehicle one adds.
Buggibility is elevated since there are many cases to regard (what happens to the flagification when a Vehicle instance is reasigned, deleted etc.).
So, my answer to your question is changed to: "can't be done". This remains effective until (gladly) proven wrong by presented alternative solution. Duck!
Personally, I'd fetch (almost) everything and unleash the hounds of LINQ onto it. But I'd do that without smiling nor proud. :)

CouchDB views - Multiple join... Can it be done?

I have three document types MainCategory, Category, SubCategory... each have a parentid which relates to the id of their parent document.
So I want to set up a view so that I can get a list of SubCategories which sit under the MainCategory (preferably just using a map function)... I haven't found a way to arrange the view so this is possible.
I currently have set up a view which gets the following output -
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which QueryString parameters would I use to get say the rows which have a key that starts with ["22945".... When all I have (at query time) is the id "11810" (at query time I don't have knowledge of the id "22945").
If any of that makes sense.
Thanks
The way you store your categories seems to be suboptimal for the query you try to perform on it.
MongoDB.org has a page on various strategies to implement tree-structures (they should apply to Couch and other doc dbs as well) - you should consider Array of Ancestors, where you always store the full path to your node. This makes updating/moving categories more difficult, but querying is easy and fast.

Resources