Azure CosmosDB: how to ORDER BY id? - azure

Using a vanilla CosmosDB collection (all default), adding documents like this:
{
"id": "3",
"name": "Hannah"
}
I would like to retrieve records ordered by id, like this:
SELECT c.id FROM c
ORDER BY c.id
This give me the error Order-by item requires a range index to be defined on the corresponding index path.
I expect this is because /id is hash indexed and not range indexed. I've tried to change the Indexing Policy in various ways, but any change I make which would touch / or /id gets wiped when I save.
How can I retrieve documents ordered by ID?

The best way to do this is to store a duplicate property e.g. id2 that has the same value of id, and is indexed using a range index, then use that for sorting, i.e. query for SELECT * FROM c ORDER BY c.id2.
PS: The reason this is not supported is because id is part of a composite index (which is on partition key and row key; id is the row key part) The Cosmos DB team is working on a change that will allow sorting by id.
EDIT: new collections now support ORDER BY c.id as of 7/12/19

I found this page CosmosDB Indexing Policies , which has the below Note that may be helpful:
Azure Cosmos DB returns an error when a query uses ORDER BY but
doesn't have a Range index against the queried path with the maximum
precision.
Some other information from elsewhere in the document:
Range supports efficient equality queries, range queries (using >, <,
>=, <=, !=), and ORDER BY queries. ORDER By queries by default also require maximum index precision (-1). The data type can be String or
Number.
Some guidance on types of queries assisted by Range queries:
Range Range over /prop/? (or /) can be used to serve the following
queries efficiently:
SELECT FROM collection c WHERE c.prop = "value"
SELECT FROM collection c WHERE c.prop > 5
SELECT FROM collection c ORDER BY c.prop
And a code example from the docs also:
var rangeDefault = new DocumentCollection { Id = "rangeCollection" };
// Override the default policy for strings to Range indexing and "max" (-1) precision
rangeDefault.IndexingPolicy = new IndexingPolicy(new RangeIndex(DataType.String) { Precision = -1 });
await client.CreateDocumentCollectionAsync(UriFactory.CreateDatabaseUri("db"), rangeDefault);
Hope this helps,
J

Related

How to query CosmosDB for nested object value

How can I retrieve objects which match order_id = 9234029m, given this document in CosmosDB:
{
"order": {
"order_id": "9234029m",
"order_name": "name",
}
}
I have tried to query in CosmosDB Data Explorer, but it's not possible to simply query the nested order_id object like this:
SELECT * FROM c WHERE c.order.order_id = "9234029m"
(Err: "Syntax error, incorrect syntax near 'order'")
This seems like it should be so simple, yet it's not! (In CosmosDB Data Explorer, all queries need to start with SELECT * FROM c, but REST SQL is an alternative as well.)
As you discovered, order is a reserved keyword, which was tripping up the query parsing. However, you can get past that, and still query your data, with slightly different syntax (bracket notation):
SELECT *
FROM c
WHERE c["order"].order_id = "9234029m"
This was due, apparently, to order being a reserved keyword in CosmosDB SQL, even if used as above.

Cosmos: DISTINCT results with JOIN and ORDER BY

I'm trying to write a query that uses a JOIN to perform a geo-spatial match against locations in a array. I got it working, but added DISTINCT in order to de-duplicate (Query A):
SELECT DISTINCT VALUE
u
FROM
u
JOIN loc IN u.locations
WHERE
ST_WITHIN(
{'type':'Point','coordinates':[loc.longitude,loc.latitude]},
{'type':'Polygon','coordinates':[[[-108,-43],[-108,-40],[-110,-40],[-110,-43],[-108,-43]]]})
However, I then found that combining DISTINCT with continuation tokens isn't supported unless you also add ORDER BY:
System.ArgumentException: Distict query requires a matching order by in order to return a continuation token. If you would like to serve this query through continuation tokens, then please rewrite the query in the form 'SELECT DISTINCT VALUE c.blah FROM c ORDER BY c.blah' and please make sure that there is a range index on 'c.blah'.
So I tried adding ORDER BY like this (Query B):
SELECT DISTINCT VALUE
u
FROM
u
JOIN loc IN u.locations
WHERE
ST_WITHIN(
{'type':'Point','coordinates':[loc.longitude,loc.latitude]},
{'type':'Polygon','coordinates':[[[-108,-43],[-108,-40],[-110,-40],[-110,-43],[-108,-43]]]})
ORDER BY
u.created
The problem is, the DISTINCT no longer appears to be taking effect because it returns, for example, the same record twice.
To reproduce this, create a single document with this data:
{
"id": "b6dd3e9b-e6c5-4e5a-a257-371e386f1c2e",
"locations": [
{
"latitude": -42,
"longitude": -109
},
{
"latitude": -42,
"longitude": -109
}
],
"created": "2019-03-06T03:43:52.328Z"
}
Then run Query A above. You will get a single result, despite the fact that both locations match the predicate. If you remove the DISTINCT, you'll get the same document twice.
Now run Query B and you'll see it returns the same document twice, despite the DISTINCT clause.
What am I doing wrong here?
Reproduced your issue indeed,based on my researching,it seems a defect in cosmos db distinct query. Please refer to this link:Provide support for DISTINCT.
This feature is broke in the data explorer. Because cosmos can only
return 100 results per page at a time, the distinct keyword will only
apply to a single page. So, if your result set contains more than 100
results, you may still get duplicates back - they will simply be on
separately paged result sets.
You could describe your own situation and vote up this feedback case.

IndexedDB getAll() ordering

I'm using getAll() method to get all items from db.
db.transaction('history', 'readonly').objectStore('history').getAll().onsuccess = ...
My ObjectStore is defined as:
db.createObjectStore('history', { keyPath: 'id', autoIncrement: true });
Can I count on the ordering of the items I get? Will they always be sorted by primary key id?
(or is there a way to specify sort explicitly?)
I could not find any info about ordering in official docs
If the docs don't help, consult the specs:
getAll refers to "steps for retrieving multiple referenced values"
the retrieval steps refer to "first count records in index"
the specification of index contains the following paragraph:
The records in an index are always sorted according to the record's
key. However unlike object stores, a given index can contain multiple
records with the same key. Such records are additionally sorted
according to the index's record's value (meaning the key of the record
in the referenced object store).
Reading backwards: An index is sorted. getAll retrieves the first N of an index, i.e. it is order-preserving. Therefore the result itself should retain the sort order.

Index multiple MongoDB fields, make only one unique

I've got a MongoDB database of metadata for about 300,000 photos. Each has a native unique ID that needs to be unique to protect against duplication insertions. It also has a time stamp.
I frequently need to run aggregate queries to see how many photos I have for each day, so I also have a date field in the format YYYY-MM-DD. This is obviously not unique.
Right now I only have an index on the id property, like so (using the Node driver):
collection.ensureIndex(
{ id:1 },
{ unique:true, dropDups: true },
function(err, indexName) { /* etc etc */ }
);
The group query for getting the photos by date takes quite a long time, as one can imagine:
collection.group(
{ date: 1 },
{},
{ count: 0 },
function ( curr, result ) {
result.count++;
},
function(err, grouped) { /* etc etc */ }
);
I've read through the indexing strategy, and I think I need to also index the date property. But I don't want to make it unique, of course (though I suppose it's fine to make it unique in combine with the unique id). Should I do a regular compound index, or can I chain the .ensureIndex() function and only specify uniqueness for the id field?
MongoDB does not have "mixed" type indexes which can be partially unique. On the other hand why don't you use _id instead of your id field if possible. It's already indexed and unique by definition so it will prevent you from inserting duplicates.
Mongo can only use a single index in a query clause - important to consider when creating indexes. For this particular query and requirements I would suggest to have a separate unique index on id field which you would get if you use _id. Additionally, you can create a non-unique index on date field only. If you run query like this:
db.collection.find({"date": "01/02/2013"}).count();
Mongo will be able to use index only to answer the query (covered index query) which is the best performance you can get.
Note that Mongo won't be able to use compound index on (id, date) if you are searching by date only. You query has to match index prefix first, i.e. if you search by id then (id, date) index can be used.
Another option is to pre aggregate in the schema itself. Whenever you insert a photo you can increment this counter. This way you don't need to run any aggregation jobs. You can also run some tests to determine if this approach is more performant than aggregation.

Cannot link MS Access query with subquery

I have created a query with a subquery in Access, and cannot link it in Excel 2003: when I use the menu Data -> Import External Data -> Import Data... and select the mdb file, the query is not present in the list. If I use the menu Data -> Import External Data -> New Database Query..., I can see my query in the list, but at the end of the import wizard I get this error:
Too few parameters. Expected 2.
My guess is that the query syntax is causing the problem, in fact the query contains a subquery. So, I'll try to describe the query goal and the resulting syntax.
Table Positions
ID (Autonumber, Primary Key)
position (double)
currency_id (long) (references Currency.ID)
portfolio (long)
Table Currency
ID (Autonumber, Primary Key)
code (text)
Query Goal
Join the 2 tables
Filter by portfolio = 1
Filter by currency.code in ("A", "B")
Group by currency and calculate the sum of the positions for each currency group an call the result: sumOfPositions
Calculate abs(sumOfPositions) on each currency group
Calculate the sum of the previous results as a single result
Query
The query without the final sum can be created using the Design View. The resulting SQL is:
SELECT Currency.code, Sum(Positions.position) AS SumOfposition
FROM [Currency] INNER JOIN Positions ON Currency.ID = Positions.currency_id
WHERE (((Positions.portfolio)=1))
GROUP BY Currency.code
HAVING (((Currency.code) In ("A","B")));
in order to calculate the final SUM I did the following (in the SQL View):
SELECT Sum(Abs([temp].[SumOfposition])) AS sumAbs
FROM [SELECT Currency.code, Sum(Positions.position) AS SumOfposition
FROM [Currency] INNER JOIN Positions ON Currency.ID = Positions.currency_id
WHERE (((Positions.portfolio)=1))
GROUP BY Currency.code
HAVING (((Currency.code) In ("A","B")))]. AS temp;
So, the question is: is there a better way for structuring the query in order to make the export work?
I can't see too much wrong with it, but I would take out some of the junk Access puts in and scale down the query to this, hopefully this should run ok:
SELECT Sum(Abs(A.SumOfPosition)) As SumAbs
FROM (SELECT C.code, Sum(P.position) AS SumOfposition
FROM Currency As C INNER JOIN Positions As P ON C.ID = P.currency_id
WHERE P.portfolio=1
GROUP BY C.code
HAVING C.code In ("A","B")) As A
It might be worth trying to declare your parameters in the MS Access query definition and define their datatypes. This is especially important when you are trying to use the query outside of MS Access itself, since it can't auto-detect the parameter types. This approach is sometimes hit or miss, but worth a shot.
PARAMETERS [[Positions].[portfolio]] Long, [[Currency].[code]] Text ( 255 );
SELECT Sum(Abs([temp].[SumOfposition])) AS sumAbs
FROM [SELECT Currency.code, Sum(Positions.position) AS SumOfposition
FROM [Currency] INNER JOIN Positions ON Currency.ID = Positions.currency_id
WHERE (((Positions.portfolio)=1))
GROUP BY Currency.code
HAVING (((Currency.code) In ("A","B")))]. AS temp;
I have solved my problems thanks to the fact that the outer query is doing a trivial sum. When choosing New Database Query... in Excel, at the end of the process, after pressing Finish, an Import Data form pops up, asking
Where do you want to put the data?
you can click on Create a PivotTable report... . If you define the PivotTable properly, Excel will display only the outer sum.

Resources