ArangoDB offset doesn't work in join

I have two collections: users_categories and users.
Each users_categories document contains a "users" field that holds user keys only, so I make a join:
FOR c IN users_categories
  FILTER c._key == '75a65608-7e9b-4e74-be19-76882209e388'
  FOR u IN c.users
    FOR u2 IN users
      FILTER u == u2._key
      LIMIT 0, 100
      RETURN u2
Result:
[
{
"_key": "5b1b68db-9848-4a0a-81b3-775007f16845",
"_id": "users/5b1b68db-9848-4a0a-81b3-775007f16845",
"_rev": "_VXo9gaC---",
"activated": true,
"blocked": false,
"citizenship": "RU",
"city": "Kalinigrad",
"deleted": false,
"email": "trigger.trigg#yandex.ru",
"lastActivityTime": 1501539830209,
"login": "triggerJK",
"name": "Max",
"passportId": "8736e8e4-9390-44e7-9e21-b17e18b1ebd9",
"phone": "89092132022",
"profileName": "Default profile",
"sex": 1,
"surname": "Max"
},
{
"_key": "0965a0d9-fc91-449f-90f8-9086944b1a86",
"_id": "users/0965a0d9-fc91-449f-90f8-9086944b1a86",
"_rev": "_VWjRYHe---",
"activated": true,
"blocked": false,
"citizenship": "AF",
"deleted": false,
"email": "megamozg4#mail.ru",
"lastActivityTime": 1501247531,
"login": "Megamozg4",
"passportId": "20ab7aad-d356-4437-86b2-6dfa9c4467e0",
"phone": "12312334555",
"profileName": "Default profile",
"sex": 1
}
]
If I set LIMIT 1 or LIMIT 0, 1 it returns only the first record, as I expect. However, if I set LIMIT 1, N (for any N) it returns an empty array, so the offset doesn't seem to work.
What am I doing wrong?
ArangoDB used: 3.1.10
UPD:
Somehow, LIMIT 1, N skips not just the first record but the first two.
If I have more than two records to return, the offset behaves strangely. I created an issue on GitHub.

Two bugs were reported regarding offsets:
https://github.com/arangodb/arangodb/issues/2928
https://github.com/arangodb/arangodb/issues/2879
The fixes for LIMIT are included in versions v3.1.27 and v3.2.1, so please update and test again.
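After upgrading, one quick way to re-test is from arangosh; this is just a sketch using the collection names and key from the question:
// arangosh sketch: re-run the offset query after upgrading
var result = db._query(`
  FOR c IN users_categories
    FILTER c._key == '75a65608-7e9b-4e74-be19-76882209e388'
    FOR u IN c.users
      FOR u2 IN users
        FILTER u == u2._key
        LIMIT 1, 100
        RETURN u2
`).toArray();
// With the fix in place, LIMIT 1, 100 should skip exactly one matching
// document rather than two.
print(result.length);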

Related

Is sorting rows by UUID a bad way in Cassandra?

I have a simple table and I want to sort it in descending order. I added a parent_id column whose value is always zero. Is this a bad way to order by?
CREATE TABLE postcards (
id uuid,
parent_id tinyint,
body text,
PRIMARY KEY (parent_id, id)
) WITH CLUSTERING ORDER BY (id DESC);

SELECT * FROM postcards;
And query result:
[
{
"parent_id": 0,
"id": "f6b53ed0-aa30-11ec-8dc2-38f3ab100fe8",
"body": "7"
},
{
"parent_id": 0,
"id": "f507fa4b-aa30-11ec-8dc1-38f3ab100fe8",
"body": "6"
},
{
"parent_id": 0,
"id": "f31a2ced-aa30-11ec-8dc0-38f3ab100fe8",
"body": "5"
},
{
"parent_id": 0,
"id": "f1ab7e36-aa30-11ec-8dbf-38f3ab100fe8",
"body": "4"
},
{
"parent_id": 0,
"id": "f0897c34-aa30-11ec-8dbe-38f3ab100fe8",
"body": "3"
},
{
"parent_id": 0,
"id": "ef61185e-aa30-11ec-8dbd-38f3ab100fe8",
"body": "2"
},
{
"parent_id": 0,
"id": "ee1f9399-aa30-11ec-8dbc-38f3ab100fe8",
"body": "1"
}
]
If you will only ever have that one parent_id, this makes little sense, because all the data will be owned by a single node in the cluster (plus its replicas).
Whether you sort the rows in ascending or descending order is neither bad nor good. The more appropriate question is "What problem are you trying to solve?", because we could give you a better answer with some context on what you are trying to achieve.
If you're able to update your question with those details, I'd be happy to update my answer. Cheers!

Azure CosmosDB sql join not returning results when the child contains empty array

We have the JSON structure below, containing nested arrays of objects. Some of the arrays may be empty.
[
{
"adjustments": [
{
"id": "1_0000001",
"clientID": 1,
"adjustmentID": "0000001",
"chargeID": "0000001",
"dateOfEntry": "2019-01-29T00:00:00",
"adjustmentAmount": 200
}
],
"payments": [
{
"id": "1_0000001",
"clientID": 1,
"paymentID": "0000001",
"chargeID": "0000001",
"dateOfDeposit": "2019-01-28T00:00:00",
"dateOfEntry": "2019-01-29T00:00:00",
"paymentAmount": 250,
},
{
"id": "1_0000002",
"clientID": 1,
"paymentID": "0000002",
"chargeID": "0000001",
"dateOfDeposit": "2019-01-28T00:00:00",
"dateOfEntry": "2019-01-29T00:00:00",
"paymentAmount": 50,
}
],
"id": "1_0000001",
"clientID": 1,
"chargeID": "0000001",
"encounterID": "0000001",
"patientID": "1234567"
"dateOfServiceBegin": "2019-01-20T00:00:00",
"dateOfServiceEnd": "2019-01-20T00:00:00",
"dateOfEntry": "2019-01-21T00:00:00",
"location": "Main Campus",
"chargeTotal": 500
},
{
"adjustments": [],
"payments": [],
"id": "1_0000001",
"clientID": 1,
"chargeID": "0000001",
"encounterID": "0000001",
"patientID": "1234567"
"dateOfServiceBegin": "2019-02-20T00:00:00",
"dateOfServiceEnd": "2019-02-20T00:00:00",
"dateOfEntry": "2019-02-21T00:00:00",
"location": "Main Campus",
"chargeTotal": 500
}
]
I am trying to execute the query below:
SELECT udf.getMonthAndYearPart(c.dateOfEntry) AS date,
       SUM(p.paymentAmount) AS paymentAmount,
       SUM(c.chargeTotal) AS chargeAmount,
       SUM(a.adjustmentAmount) AS adjustmentAmount
FROM c
JOIN p IN c.payments
JOIN a IN c.adjustments
WHERE c.dateOfEntry >= '2019-01-11T18:30:00.000Z' AND c.dateOfEntry <= '2020-12-30T18:30:00.000Z'
GROUP BY udf.getMonthAndYearPart(c.dateOfEntry)
I am expecting the result below:
[
{
"date": "January-2019",
"paymentAmount": 300,
"chargeAmount": 1000,
"adjustmentAmount": 400
},
{
"date": "February-2019",
"chargeAmount": 500,
}
]
But I get only the first object:
[
{
"date": "January-2019",
"paymentAmount": 300,
"chargeAmount": 1000,
"adjustmentAmount": 400
}
]
Is there anything I can do without the join? I want to calculate the sum of the child objects' amounts grouped by month. Please help.
I found a solution myself using subqueries and GROUP BY. Below is the query, in case anyone needs it.
SELECT SUM(k.totalPaymentAmount) AS totalPaymentAmount,
       SUM(k.totalAdjustmentAmount) AS totalAdjustmentAmount,
       SUM(k.totalCharge) AS totalCharge,
       k.date AS date
FROM (
    SELECT
        (SELECT VALUE SUM(c.paymentAmount) FROM c IN RevenueAnalytics.payments) AS totalPaymentAmount,
        (SELECT VALUE SUM(c.adjustmentAmount) FROM c IN RevenueAnalytics.adjustments) AS totalAdjustmentAmount,
        RevenueAnalytics.chargeTotal AS totalCharge,
        udf.getMonthAndYearPart(RevenueAnalytics.dateOfServiceBegin) AS date
    FROM RevenueAnalytics
) k
GROUP BY k.date
In your case you would need to do a LEFT JOIN to include the documents with empty adjustments or payments. LEFT JOIN is not supported at the moment; you can vote for this thread to have the feature included. In the meantime you can create a procedure and run two separate queries: one as you are doing, using joins, and the other without joins, filtering (in the WHERE clause) for entries whose adjustments and payments arrays have length 0, and then aggregate the two result sets.
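For what it's worth, here is a rough Node.js sketch of that two-query approach using the @azure/cosmos SDK. The endpoint, key, database and container names are placeholders, the JOIN query is the one from the question, and the second query expresses the "array length 0" filter with the built-in ARRAY_LENGTH function; merging the per-month rows from the two result sets is left to the client.
const { CosmosClient } = require("@azure/cosmos");

// Placeholder connection details
const client = new CosmosClient({ endpoint: "https://<account>.documents.azure.com", key: "<key>" });
const container = client.database("<database>").container("<container>");

// Query 1: documents that do have payments and adjustments (the JOIN query from the question)
const joinQuery = {
  query: `SELECT udf.getMonthAndYearPart(c.dateOfEntry) AS date,
                 SUM(p.paymentAmount) AS paymentAmount,
                 SUM(c.chargeTotal) AS chargeAmount,
                 SUM(a.adjustmentAmount) AS adjustmentAmount
          FROM c
          JOIN p IN c.payments
          JOIN a IN c.adjustments
          WHERE c.dateOfEntry >= '2019-01-11T18:30:00.000Z' AND c.dateOfEntry <= '2020-12-30T18:30:00.000Z'
          GROUP BY udf.getMonthAndYearPart(c.dateOfEntry)`
};

// Query 2: documents the JOIN drops because their child arrays are empty
const emptyChildrenQuery = {
  query: `SELECT udf.getMonthAndYearPart(c.dateOfEntry) AS date,
                 SUM(c.chargeTotal) AS chargeAmount
          FROM c
          WHERE ARRAY_LENGTH(c.payments) = 0 AND ARRAY_LENGTH(c.adjustments) = 0
            AND c.dateOfEntry >= '2019-01-11T18:30:00.000Z' AND c.dateOfEntry <= '2020-12-30T18:30:00.000Z'
          GROUP BY udf.getMonthAndYearPart(c.dateOfEntry)`
};

async function monthlyTotals() {
  const withChildren = (await container.items.query(joinQuery).fetchAll()).resources;
  const withoutChildren = (await container.items.query(emptyChildrenQuery).fetchAll()).resources;
  // Combine the two result sets; months appearing in both would need their rows merged here.
  return [...withChildren, ...withoutChildren];
}

monthlyTotals().then(console.log).catch(console.error);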

How to use pagination on dynamoDB

How can you make a paginated request (limit, offset, and sort_by) using DynamoDB?
In MySQL you can do:
SELECT ... ORDER BY created_date ASC LIMIT 10 OFFSET 1
I'm trying to do this with Node.js; in this case created_date isn't the primary key, so can I query using created_date as a sort key?
This is my users table
{
"user_id": "asa2311",
"created_date": "2019/01/18 15:05:59",
"status": "A",
"rab_item_id": "0",
"order_id": "1241241",
"description": "testajabroo",
"id": "e3f46600-1af7-11e9-ac22-8d3a3e79a693",
"title": "test"
},
{
"user_id": "asa2311",
"status_id": "D",
"created_date": "2019/01/18 14:17:46",
"order_id": "1241241",
"rab_item_id": "0",
"description": "testajabroo",
"id": "27b5b0d0-1af1-11e9-b843-77bf0166a09f",
"title": "test"
},
{
"user_id": "asa2311",
"created_date": "2019/01/18 15:05:35",
"status": "A",
"rab_item_id": "0",
"order_id": "1241241",
"description": "testajabroo",
"id": "d5879e70-1af7-11e9-8abb-0fa165e7ac53",
"title": "test"
}
Pagination in DynamoDB is handled by setting the ExclusiveStartKey parameter to the LastEvaluatedKey returned from the previous result. There is no way to start after a specific number of items like you can with OFFSET in MySQL.
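As a rough Node.js sketch of that cursor-style pagination using the AWS SDK's DocumentClient; it assumes a table (or a GSI) where user_id is the partition key and created_date is the sort key, which is an assumption beyond what the question shows:
const AWS = require("aws-sdk");
const docClient = new AWS.DynamoDB.DocumentClient();

// Assumed key schema: user_id = partition key, created_date = sort key
const baseParams = {
  TableName: "users",
  KeyConditionExpression: "user_id = :uid",
  ExpressionAttributeValues: { ":uid": "asa2311" },
  ScanIndexForward: true, // ascending by the sort key (created_date)
  Limit: 10               // page size, not an offset
};

async function getPage(exclusiveStartKey) {
  const params = { ...baseParams };
  if (exclusiveStartKey) {
    // Resume exactly where the previous page stopped
    params.ExclusiveStartKey = exclusiveStartKey;
  }
  return docClient.query(params).promise();
}

(async () => {
  const firstPage = await getPage();
  console.log(firstPage.Items);
  // LastEvaluatedKey is the cursor for the next page; it is absent on the last page
  if (firstPage.LastEvaluatedKey) {
    const secondPage = await getPage(firstPage.LastEvaluatedKey);
    console.log(secondPage.Items);
  }
})();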

I am changing indexing of Cosmos DB but it is not working as it is supposed to be

I am using the Azure Cosmos DB SQL API and my data is structured in the following way:
{
"deviceId": "123_123",
"comms": 0,
"engineSpdEnc": 0,
"currentTime": 1542185998605,
"deviceName": "mydevice2",
"siteId": 0,
"messageType": 2,
"data": {
"v5B3Freq": 0,
"v5B3Amp": 0,
"v5B4Freq": 0,
"v5B4Amp": 0,
"v5B5Freq": 0,
"v5B5Amp": 0,
"v6B6Freq": 0,
"v6B6Amp": 0,
"v6B7Freq": 0,
"v6B7Amp": 0,
"inletPres": 0
},
"EventProcessedUtcTime": "2018-11-14T09:01:42.6897624Z",
"PartitionId": 1,
"EventEnqueuedUtcTime": "2018-11-14T08:59:58.645Z",
"IoTHub": {
"MessageId": null,
"CorrelationId": null,
"ConnectionDeviceId": "device1",
"ConnectionDeviceGenerationId": "636758197942626855",
"EnqueuedTime": "2018-11-14T08:59:58.649Z",
"StreamId": null
},
"id": "1734dd0c-1bb5-d424-4946-e2c957bb3858",
"_rid": "lblPAOEu3xYCAAAAAAAAAA==",
"_self": "dbs/lblPAA==/colls/lblPAOEu3xY=/docs/lblPAOEu3xYCAAAAAAAAAA==/",
"_etag": "\"08008e15-0000-0000-0000-5bebe47c0000\"",
"_attachments": "attachments/",
"_ts": 1542186108 }
Using the Azure portal, I have changed the indexing policy from the default to the following:
{
"indexingMode": "lazy",
"automatic": false,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": 3
},
{
"kind": "Range",
"dataType": "String",
"precision": 3
},
{
"kind": "Spatial",
"dataType": "Point"
}
]
}
],
"excludedPaths": [
{
"path": "/data/*"
}
]
}
According to this, I have disabled automatic indexing and excluded the path /data/*, which means that if I run the query:
select * from c where c.data.v6B7Amp = 0
it should return nothing, since there is no index on that path, yet I am getting back all the matching records.
Is it because I am using the Azure portal to change the indexing, or is it something else?
Firstly, you don't need to turn automatic indexing off or set the indexingMode to lazy, unless you have a reason to.
It appears that equality checks can work even if the path is excluded.
Where the excluded path will kick in is when you try to do something like an order by against that field.
When a path is excluded from indexing, the query falls back to a full scan of all the documents in the collection to filter the results. That is why you are still seeing results for your query.
It should be "path": "/data/?" The question mark refers to the specific value of the path, whereas the asterisk represents one or more paths as determined by the wildcard.

Return distinct and sorted query in AQL

So I have two collections: one with cities, which have an array of postal codes as a property, and one with postal codes and their latitude and longitude.
I want to return the cities closest to a coordinate. This is easy enough with a geo index, but the issue I'm having is that the same city is returned multiple times, and sometimes it comes back as both the 1st and 3rd closest, because the postal code I'm searching in borders another city.
cities example data:
[
{
"_key": "30936019",
"_id": "cities/30936019",
"_rev": "30936019",
"countryCode": "US",
"label": "Colorado Springs, CO",
"name": "Colorado Springs",
"postalCodes": [
"80904",
"80927"
],
"region": "CO"
},
{
"_key": "30983621",
"_id": "cities/30983621",
"_rev": "30983621",
"countryCode": "US",
"label": "Manitou Springs, CO",
"name": "Manitou Springs",
"postalCodes": [
"80829"
],
"region": "CO"
}
]
postalCodes example data:
[
{
"_key": "32132856",
"_id": "postalCodes/32132856",
"_rev": "32132856",
"countryCode": "US",
"location": [
38.9286,
-104.6583
],
"postalCode": "80927"
},
{
"_key": "32147422",
"_id": "postalCodes/32147422",
"_rev": "32147422",
"countryCode": "US",
"location": [
38.8533,
-104.8595
],
"postalCode": "80904"
},
{
"_key": "32172144",
"_id": "postalCodes/32172144",
"_rev": "32172144",
"countryCode": "US",
"location": [
38.855,
-104.9058
],
"postalCode": "80829"
}
]
The following query works but as an ArangoDB newbie I'm wondering if there's a more efficient way to do this:
FOR p IN WITHIN(postalCodes, 38.8609, -104.8734, 30000, 'distance')
  FOR c IN cities
    FILTER p.postalCode IN c.postalCodes AND c.countryCode == p.countryCode
    COLLECT close = c._id AGGREGATE distance = MIN(p.distance)
    FOR c2 IN cities
      FILTER c2._id == close
      SORT distance
      RETURN c2
The first FOR in the query will use the geo index and probably return few documents (just the postal codes around the specified location).
The second FOR will look up the city for each postal code found. This may be an issue, depending on whether there is an index present on cities.postalCodes and cities.countryCode. If not, the second FOR has to do a full scan of the cities collection on each iteration, which will be inefficient. It may therefore be a good idea to create an index on the two attributes like this:
db.cities.ensureIndex({ type: "hash", fields: ["countryCode", "postalCodes[*]"] });
The third FOR can be removed entirely by COLLECTing on c instead of c._id:
FOR p IN WITHIN(postalCodes, 38.8609, -104.8734, 30000, 'distance')
  FOR c IN cities
    FILTER p.postalCode IN c.postalCodes AND c.countryCode == p.countryCode
    COLLECT city = c AGGREGATE distance = MIN(p.distance)
    SORT distance
    RETURN city
This will shorten the query string, but it may not improve efficiency much, as the third FOR would have used the primary index to look up the city documents anyway, which is O(1).
In general, when in doubt about whether a query uses indexes, you can run db._explain(queryString) to show which indexes the query will use.
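For example, in arangosh (a small sketch using the rewritten query from above):
// arangosh: print the execution plan and the indexes the optimizer picks
var query = `
  FOR p IN WITHIN(postalCodes, 38.8609, -104.8734, 30000, 'distance')
    FOR c IN cities
      FILTER p.postalCode IN c.postalCodes AND c.countryCode == p.countryCode
      COLLECT city = c AGGREGATE distance = MIN(p.distance)
      SORT distance
      RETURN city
`;
db._explain(query);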
