Azure CosmosDB Unexpected Pagination Behaviour

We are using the Cosmos DB C# SDK.
We tried both "Microsoft.Azure.Cosmos 3.4.1" and "Microsoft.Azure.DocumentDB.Core 2.9.1 and 2.4.2".
We are getting invalid results, and the main problem is the ResponseContinuation:
[{"token":null,"range":{"min":"05C1DFFFFFFFFC","max":"FF"}}]
This started showing up in one of our smaller services, with only 14 documents.
In all queries we use the following headers:
"x-ms-documentdb-query-enablecrosspartition" = true
"x-ms-max-item-count" = 100
Query 1:
The query is the following: SELECT * FROM c.
We get the following response:
- 7 items
- ResponseContinuation [{"token":null,"range":{"min":"05C1DFFFFFFFFC","max":"FF"}}]
Then we use the continuation token to get the other 7 items.
Query 2:
If we modify the query to SELECT * FROM c ORDER BY c.property ASC, the order gets messed up! (responses are simplified)
- we get the following result: ["A", "B", "C", "D", "F"]
- and the second query returns ["C", "D", "G"]
Query 3:
If we want to find only one item with SELECT TOP 1 * FROM c WHERE c.name = #name, and the item is in the "second query result":
- we get nothing, and the RequestContinuation {"top":1,"sourceToken":"[{\"token\":null,\"range\":{\"min\":\"05C1DFFFFFFFFC\",\"max\":\"FF\"}}]"}
This is all really unexpected behaviour.
Why do ORDER BY and TOP even exist if we can't use them properly?
We can't afford to pull all the data from Cosmos to our server and then do the ordering ourselves, especially on bigger containers.
Edit: GitHub issue link: https://github.com/Azure/azure-cosmos-dotnet-v3/issues/1001
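Not an answer to the bug itself, but for reference, this is the page-draining pattern the continuation token is meant for, sketched with the azure-cosmos Python SDK (the question uses the C# SDK; account URL, key, database, and container names below are placeholders):

from azure.cosmos import CosmosClient

client = CosmosClient(url="https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("mycontainer")

# Iterate page by page; the pager carries the continuation token
# until the query is fully drained.
pager = container.query_items(
    query="SELECT * FROM c ORDER BY c.property ASC",
    enable_cross_partition_query=True,
    max_item_count=100,
).by_page()

for page in pager:
    for item in page:
        print(item)
    print("continuation:", pager.continuation_token)  # None once drained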

Related

Loop Through a list in python to query DynamoDB for each item

I have a list of items and would like to use each item as the pk (Primary Key) to query DynamoDB, using Python.
I have tried using a for loop, but I don't get any results. If I try the same query with an actual value from the group_id list, it does work, which means my query statement is correct.
group_name_query = []
for i in group_id:
    group_name_query = config_table.query(
        KeyConditionExpression=Key('pk').eq(i) & Key('sk').eq('GROUP')
    )
Here is a sample group_id = ['GROUP#6501e5ac-59b2-4d05-810a-ee63d2f4f826', 'GROUP#6501e5ac-59b2-4d05-810a-ee63d2sfdgd']
Not answering your issue, but I've got a suggestion: if you're querying the base table with pk and sk instead of querying a GSI, I would suggest the BatchGetItem API to get multiple items in one shot:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/example_dynamodb_BatchGetItem_section.html
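A minimal sketch of that suggestion with boto3 (assuming the table is named 'config_table', matching the variable in the question; adjust to your actual table name). Note, too, that the original loop reassigns group_name_query on every iteration, so even a successful Query would only keep the last result:

import boto3

dynamodb = boto3.resource('dynamodb')

group_id = ['GROUP#6501e5ac-59b2-4d05-810a-ee63d2f4f826',
            'GROUP#6501e5ac-59b2-4d05-810a-ee63d2sfdgd']

# One BatchGetItem round trip for up to 100 keys, instead of one Query per id.
response = dynamodb.batch_get_item(
    RequestItems={
        'config_table': {
            'Keys': [{'pk': i, 'sk': 'GROUP'} for i in group_id]
        }
    }
)
group_name_query = response['Responses']['config_table']
print(group_name_query)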

Turning a simple json array to a csv string using PostgreSQL

I tried looking through the already open questions, but unfortunately nothing helped.
I'm trying to convert a JSON array to a string separated by commas, using the jsonb_array_elements and string_agg functions, but I keep getting the same error no matter what I try: "ERROR: cannot extract elements from a scalar".
Let's assume that my JSON data looks like: { "id" : "hi", "list" : ["a", "b", "c"] }.
What I'm trying to get is a query that results in a row that looks like: [id, list participants] = [hi, a,b,c]
I tried some different methods but the most recent one was:
select (select string_agg(t->>0, ',') from jsonb_array_elements(data->'list') as t) from my_table
Would really appreciate any help with this
You can do this, for example:
with c as (
    select 1, '{ "id" : "hi", "list" : ["a", "b", "c"] }'::jsonb->>'id' as value
    union
    select 2, string_agg(t->>0, ',') from jsonb_array_elements('{ "id" : "hi", "list" : ["a", "b", "c"] }'::jsonb->'list') as t
)
select string_agg(c.value, ',') from c order by 1
So, after a lot of attempts with many different methods: every method attempted resulted in the same "ERROR: cannot extract elements from a scalar".
I eventually figured out that the query was not the problem; it was the data!
If one of the lists in the data is empty ( "{[]}" ), it results in that error. So after excluding those empty lists from the query using a WHERE clause, I finally got it working! Thanks, hopefully this post will help future frustrated query users :)
For reference, the method I eventually chose is:
SELECT t.tbl, d.list
FROM tbl t, LATERAL (
    SELECT string_agg(value::text, ', ') AS list
    FROM json_array_elements_text(t.data->'tags')
) d
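In case it helps anyone else, a minimal sketch of that WHERE-clause guard, run from Python with psycopg2 (table and column names taken from the question; the connection string is a placeholder). The jsonb_typeof check skips rows whose 'list' key is missing or holds a scalar instead of an array, which is exactly what raises "cannot extract elements from a scalar":

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder DSN
with conn, conn.cursor() as cur:
    # Aggregate each row's "list" array into a comma-separated string,
    # skipping rows where "list" is not actually an array.
    cur.execute("""
        SELECT data->>'id',
               (SELECT string_agg(t, ',')
                FROM jsonb_array_elements_text(data->'list') AS t)
        FROM my_table
        WHERE jsonb_typeof(data->'list') = 'array'
    """)
    for row in cur.fetchall():
        print(row)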

What can cause "java.lang.IndexOutOfBoundsException: Index: 1, Size: 1" in a prepared query?

This is in Cassandra 3.11.4.
I'm running a modified version of a query that previously worked fine in my app. The original query that is fine was like:
SELECT SerializedRecord FROM SxRecord WHERE Mark=?
I modified the query to have a range of a timestamp (which I also added an index for, though I don't think that is relevant):
SELECT SerializedRecord FROM SxRecord WHERE Mark=? AND Timestamp>=? AND Timestamp<=?
This results in:
ResponseError {reHost = datacenter1:rack1:127.0.0.1:9042, reTrace = Nothing, reWarn = [], reCause = ServerError "java.lang.IndexOutOfBoundsException: Index: 1, Size: 1"}
When this occurs, I don't see the query CQL being logged in system_traces.sessions, which is interesting, because if I put a syntax error into the query, it is still logged there.
Additionally, when I run an (as far as I know, identical, up to timestamps) query in cqlsh, there doesn't seem to be a problem:
cqlsh> SELECT SerializedRecord FROM test_fds.SxRecord WHERE Mark=8391 AND Timestamp >= '2021-03-06 00:00:00.000+0000' AND Timestamp <= '2021-03-09 00:00:00.000+0000';
serializedrecord
------------------
This results in the following query trace:
cqlsh> select parameters from system_traces.sessions;
parameters
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{'consistency_level': 'ONE', 'page_size': '100', 'query': 'SELECT SerializedRecord FROM test_fds.SxRecord WHERE Mark=8391 AND Timestamp >= ''2021-03-06 00:00:00.000+0000'' AND Timestamp <= ''2021-03-09 00:00:00.000+0000'';', 'serial_consistency_level': 'SERIAL'}
null
It seems that the query, executed inside a prepared/bound statement, is not receiving all the parameters it needs, OR is receiving too many (something bound in previous code).
The fact that you don't see the query traced comes from the fact that the driver does not even perform the query, as it has unbound parameters.
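To illustrate the binding rule, a minimal sketch with the DataStax Python driver (the question's app uses a different client, so treat the names here as illustrative): a statement prepared with three ? placeholders must be bound with exactly three values, in order.

from datetime import datetime
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('test_fds')

# Three placeholders -> bind exactly three values, in order.
prepared = session.prepare(
    "SELECT SerializedRecord FROM SxRecord "
    "WHERE Mark=? AND Timestamp>=? AND Timestamp<=?"
)

# Reusing the bind values from the old one-parameter query (just the mark)
# is the kind of mismatch that can surface as a server-side
# java.lang.IndexOutOfBoundsException instead of a clean client error.
rows = session.execute(prepared, (8391, datetime(2021, 3, 6), datetime(2021, 3, 9)))
for row in rows:
    print(row.serializedrecord)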

How to query CosmosDB for nested object value

How can I retrieve objects which match order_id = 9234029m, given this document in CosmosDB:
{
  "order": {
    "order_id": "9234029m",
    "order_name": "name"
  }
}
I have tried to query in CosmosDB Data Explorer, but it's not possible to simply query the nested order_id object like this:
SELECT * FROM c WHERE c.order.order_id = "9234029m"
(Err: "Syntax error, incorrect syntax near 'order'")
This seems like it should be so simple, yet it's not! (In CosmosDB Data Explorer, all queries need to start with SELECT * FROM c, but REST SQL is an alternative as well.)
As you discovered, order is a reserved keyword, which was tripping up the query parsing. However, you can get past that, and still query your data, with slightly different syntax (bracket notation):
SELECT *
FROM c
WHERE c["order"].order_id = "9234029m"
This was due, apparently, to order being a reserved keyword in CosmosDB SQL, even if used as above.
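For completeness, a minimal sketch of the same bracket-notation query issued from code, here with the azure-cosmos Python SDK (account URL, key, database, and container names are placeholders):

from azure.cosmos import CosmosClient

client = CosmosClient(url="https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("mycontainer")

# Bracket notation sidesteps the reserved keyword "order";
# the id is passed as a query parameter rather than a literal.
items = container.query_items(
    query='SELECT * FROM c WHERE c["order"].order_id = @id',
    parameters=[{"name": "@id", "value": "9234029m"}],
    enable_cross_partition_query=True,
)
for item in items:
    print(item)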

Syntax issue in Stream Analytics Query running in Azure: Invalid column name: 'payload'

I am having a syntax issue with my Stream Analytics query. Following is my Stream Analytics query, where I am trying to get the following fields from the events:
Vehicle Id
Diff of previous and current fuel level (for each vehicle)
Diff of current and previous odometer value (for each vehicle)
NON-WORKING QUERY
SELECT input.vehicleId,
FUEL_DIFF = LAG(input.Payload.FuelLevel) OVER (PARTITION BY vehicleId LIMIT DURATION(minute, 1)) - input.Payload.FuelLevel,
ODO_DIFF = input.Payload.OdometerValue - LAG(input.Payload.OdometerValue) OVER (PARTITION BY input.vehicleId LIMIT DURATION(minute, 1))
from input
Following is one sample input event from the series of events on which the above query/job is run:
{
  "IoTDeviceId": "DeviceId_1",
  "MessageId": "03494607-3aaa-4a82-8e2e-149f1261ebbb",
  "Payload": {
    "TimeStamp": "2017-01-23T11:16:02.2019077-08:00",
    "FuelLevel": 19.9,
    "OdometerValue": 10002
  },
  "Priority": 1,
  "Time": "2017-01-23T11:16:02.2019077-08:00",
  "VehicleId": "MyCar_1"
}
The following syntax error is thrown when the Stream Analytics job is run:
Invalid column name: 'payload'. Column with such name does not exist.
Ironically, the following query works just fine:
WORKING QUERY
SELECT input.vehicleId,
FUEL_DIFF = LAG(input.Payload.FuelLevel) OVER (PARTITION BY vehicleId LIMIT DURATION(second, 1)) - input.Payload.FuelLevel
from input
The only difference between the WORKING QUERY and the NON-WORKING QUERY is the number of LAG constructs used: the NON-WORKING QUERY has two LAG constructs, while the WORKING QUERY has just one.
I have referred to the Stream Analytics Query Language documentation, but it only has basic examples, and I have tried looking into multiple blogs as well. In addition, I have tried using the GetRecordPropertyValue() function, but no luck. Kindly suggest.
Thank you in advance!
This looks like a syntax bug indeed. Thank you for reporting - we will fix it in the upcoming updates.
Please consider using this query as a workaround:
WITH Step1 AS
(
    SELECT vehicleId, Payload.FuelLevel, Payload.OdometerValue
    FROM input
)
SELECT vehicleId,
    FUEL_DIFF = LAG(FuelLevel) OVER (PARTITION BY vehicleId LIMIT DURATION(minute, 1)) - FuelLevel,
    ODO_DIFF = OdometerValue - LAG(OdometerValue) OVER (PARTITION BY vehicleId LIMIT DURATION(minute, 1))
from Step1
