Limit on the Gremlin query result set size in Cosmos DB (Azure)

In my Cosmos DB graph I have a vertex labeled "student" which has the following properties.
query:
g.V().hasLabel('student').valueMap().limit(1)
output:
{
"StudentID": [
10000
],
"StudentName": [
"RITM0809903"
],
"Student_Age": [
"ritm0809903"
],
"Student_class": [
"Awaiting User Training"
],
"Student_long_description": [
"*******************HUGE STUDENT DESCRIPTION*****************************"
]
}
Note: "HUGE STUDENT DESCRIPTION" is a huge description about a student.
Total number of student vertices available are 9.
I am using the gremlinpython module to run the query against Cosmos DB and fetch the results.
But when I query valueMap('StudentID','StudentName','Student_long_description') to get all 9 vertices ("g.V().hasLabel('student').valueMap('StudentID','StudentName','Student_long_description')"), the output contains only 7 vertices. When I exclude the property "Student_long_description", I can see all 9 vertices.
Is this because of a limit on the result set size?
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/concepts/querylimits
But when I append fold() at the end ("g.V().hasLabel('student').valueMap('StudentID','StudentName','Student_long_description').fold()"), I can see all 9 vertices along with the property "Student_long_description", but folded.
Please let me know whether there is any option I can use to get all 9 vertices with all their properties without using fold() in the query.
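For reference, this is roughly how the query is submitted; a minimal gremlinpython sketch, assuming a standard Cosmos DB Gremlin endpoint (the account, database, graph, and key below are placeholders):

from gremlin_python.driver import client, serializer

# Placeholders: replace with your Cosmos DB Gremlin account values.
gremlin_client = client.Client(
    'wss://<your-account>.gremlin.cosmos.azure.com:443/', 'g',
    username='/dbs/<database>/colls/<graph>',
    password='<primary-key>',
    message_serializer=serializer.GraphSONSerializersV2d0())

query = ("g.V().hasLabel('student')"
         ".valueMap('StudentID','StudentName','Student_long_description')")
# all().result() drains every result page the client receives, so a short
# count here suggests the truncation happens before the client sees the data.
results = gremlin_client.submit(query).all().result()
print(len(results))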

Related

Retrieving the documents connected to this vertex

I would like to retrieve all documents connected to the vertex.
First of all, my idea was to find a vertex using the information stored in it. I was able to do it with the following query:
FOR doc IN spec
FILTER doc.serial_no == '"12345abc"'
RETURN doc
RESULT:
[
{
"_key": "3834670",
"_id": "spec/3834670",
"_rev": "_WP3-fvW---",
"type": "spec-type-545",
"name": "spec-name-957",
"serial_no": ""12345abc""
}
]
Now I would like to find all documents attached to this vertex. How can I do that?
Assuming you save connections from your vertex to its attached documents in edge collections, you can use a traversal.
A traversal starts at one specific document (startVertex) and follows all edges connected to this document. For all documents (vertices) that are targeted by these edges it will again follow all edges connected to them and so on.
In your case the startVertex is 'spec/3834670'. IN [min[..max]] defines the depth of the traversal; not specifying this option uses the default depth of 1. edgeCollection1, ..., edgeCollectionN is a list of all edge collections in use.
FOR v IN [min[..max]] ANY 'spec/3834670'
edgeCollection1, ..., edgeCollectionN
RETURN v._key
This is documented in the AQL Manual.
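Concretely, a depth 1..2 traversal from that vertex could be run like this; a minimal sketch using the python-arango driver, where the connection details and the edge collection name has_attachment are hypothetical:

from arango import ArangoClient

# Hypothetical connection details and edge collection name.
db = ArangoClient(hosts='http://localhost:8529').db(
    '_system', username='root', password='')

aql = """
FOR v IN 1..2 ANY 'spec/3834670'
  has_attachment
  RETURN v._key
"""
# Prints the keys of every document reachable within two edges.
for key in db.aql.execute(aql):
    print(key)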

Why does SELECT count(1) FROM c change values each time I query it in CosmosDB Document Explorer?

I have a database with about 600-700 thousand documents. When I am in the Document Explorer and I execute "SELECT value count(1) FROM c", it returns values ranging from 64,000 to 72,000, seemingly at random. When I execute this using the Python SDK, it returns the actual count I mentioned above. Why is this?
The count query is limited by the number of RUs allocated to your collection. The response you received will contain a continuation token. You have to keep fetching the next set of results and add the partial counts together, which gives you the final count. For example, I tried a count query on my Cosmos DB, and these were the results.
First execution
[
{
"$1": 184554
}
]
Next set of continuation. (By clicking Next button from Azure portal Data Explorer)
[
{
"$1": 181909
}
]
Next set of continuation. (By clicking Next button from Azure portal Data Explorer)
[
{
"$1": 25589
}
]
So, finally, the count is
184554 + 181909 + 25589 = 392,052
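This also explains the Python SDK behavior: its query iterator follows continuation tokens and aggregates the partial counts for you. A minimal azure-cosmos sketch (the account URL, key, and database/container names are placeholders):

from azure.cosmos import CosmosClient

client = CosmosClient('https://<account>.documents.azure.com:443/', '<key>')
container = client.get_database_client('<db>').get_container_client('<coll>')

# The SDK follows all continuations and returns the aggregated count.
count = list(container.query_items(
    query='SELECT VALUE COUNT(1) FROM c',
    enable_cross_partition_query=True))[0]
print(count)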

How to connect two documents in the same collection

In ArangoDB, how do I build a graph of relations between the documents in one collection? The documentation covers using multiple collections, but not how to do it inside one collection.
If I understand correctly, the requirement is to create a graph using a single vertex collection.
The following example (ArangoShell code) creates a graph named example, using a single vertex collection named v and an edge collection named e:
var g = require("org/arangodb/general-graph");
g._create("example", [ { collection: "e", "from": [ "v" ], "to" : [ "v" ] } ]);
g._graph("example");
This will bring up this graph definition:
[ Graph example EdgeDefinitions: [
"e: [v] -> [v]"
] VertexCollections: [ ] ]
Note that having a separate edge collection is still required, but vertices can be stored in a single collection (as shown above) or spread across multiple collections.
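To then connect two documents of that single collection, insert both into v and add an edge between them in e. A minimal sketch using the python-arango driver (connection details and document keys are hypothetical):

from arango import ArangoClient

# Hypothetical connection details.
db = ArangoClient(hosts='http://localhost:8529').db(
    '_system', username='root', password='')
graph = db.graph('example')

v = graph.vertex_collection('v')
v.insert({'_key': 'doc1'})
v.insert({'_key': 'doc2'})

# Both endpoints live in the same vertex collection v.
graph.edge_collection('e').insert({'_from': 'v/doc1', '_to': 'v/doc2'})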

Redundant query trigger when creating a graph?

Whenever I try to create a new graph with 700,000 to 2 million edges, it takes a long time. Thanks to the great new feature in the API,
/_api/query/current
I observed that the graph creation seems to automatically trigger some kind of cache loading, but twice?
[
{
"id": "70",
"query": "FOR x IN GRAPH_VERTICES(#graph, {}) SORT RAND() LIMIT #limit RETURN x",
"started": "2015-03-31T19:06:59Z",
"runTime": 41.95919394493103
},
{
"id": "71",
"query": "FOR x IN GRAPH_VERTICES(#graph, {}) SORT RAND() LIMIT #limit RETURN x",
"started": "2015-03-31T19:06:59Z",
"runTime": 41.95719385147095
}
]
Is this correct? Is there a more efficient way?
Thanks in advance!
The graph viewer issued the mentioned RAND() query two times:
- one instance is fired to determine a random vertex from the graph
- the other instance is fired to determine the attributes of some random vertices of the graph, in order to populate the search input field
The AQL that was used by the graph viewer was inefficient. It built a big list, sorted it randomly, and returned 1 (first query) or 10 (second query) documents from it. This has been fixed in commit c28575f202a58d5c93e6c36883effda48c2a7159, so it is much more efficient now.
The fix will be included in the next build (i.e. 2.5.2).

How do I keep existing data in couchbase and only update the new data without overwriting

So, say I have created some records/documents under a bucket, and the user updates only one column out of 10 in the RDBMS, so I am trying to send only that one column's data and update it in Couchbase. But the problem is that Couchbase is overwriting the entire record and putting NULLs for the rest of the columns.
One approach is to copy all the data from the existing record after fetching it from Couchbase, and then overwrite the new column while copying the data from the old record. But that doesn't look like an optimal approach.
Any suggestions?
You can use N1QL UPDATE statements (search for "Couchbase N1QL").
UPDATE replaces a document that already exists with updated values.
update:
UPDATE keyspace-ref [use-keys-clause] [set-clause] [unset-clause] [where-clause] [limit-clause] [returning-clause]
set-clause:
SET path = expression [update-for] [ , path = expression [update-for] ]*
update-for:
FOR variable (IN | WITHIN) path (, variable (IN | WITHIN) path)* [WHEN condition ] END
unset-clause:
UNSET path [update-for] (, path [ update-for ])*
keyspace-ref: Specifies the keyspace for which to update the document.
You can add an optional namespace-name to the keyspace-name in this way:
namespace-name:keyspace-name.
use-keys-clause: Specifies the keys of the data items to be updated. Optional. Keys can be any expression.
set-clause: Specifies the value for an attribute to be changed.
unset-clause: Removes the specified attribute from the document.
update-for: The update-for clause uses the FOR statement to iterate over a nested array and SET or UNSET the given attribute for every matching element in the array.
where-clause: Specifies the condition that needs to be met for data to be updated. Optional.
limit-clause: Specifies the greatest number of objects that can be updated. This clause must have a non-negative integer as its upper bound. Optional.
returning-clause: Returns the data you updated as specified in the result_expression.
RBAC Privileges
The user executing the UPDATE statement must have the Query Update privilege on the target keyspace. If the statement has any clauses that need data to be read, such as a SELECT clause or RETURNING clause, then the Query Select privilege is also required on the keyspaces referred to in the respective clauses. For more details about user roles, see Authorization.
For example,
To execute the following statement, the user must have the Query Update privilege on travel-sample.
UPDATE `travel-sample` SET foo = 5
To execute the following statement, the user must have the Query Update privilege on travel-sample and the Query Select privilege on beer-sample.
UPDATE `travel-sample`
SET foo = 9
WHERE city = (SELECT raw city FROM `beer-sample` WHERE type = "brewery")
To execute the following statement, the user must have the Query Update privilege on `travel-sample` and the Query Select privilege on `travel-sample`.
UPDATE `travel-sample`
SET city = "San Francisco"
WHERE lower(city) = "sanfrancisco"
RETURNING *
Example
The following statement changes the "type" of the product, "odwalla-juice1" to "product-juice".
UPDATE product USE KEYS "odwalla-juice1" SET type = "product-juice" RETURNING product.type
"results": [
{
"type": "product-juice"
}
]
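For completeness, this is roughly how such an UPDATE could be issued from application code; a minimal sketch with the Couchbase Python SDK, where the connection string and credentials are placeholders:

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

# Placeholder connection details.
cluster = Cluster('couchbase://localhost',
                  ClusterOptions(PasswordAuthenticator('user', 'password')))

result = cluster.query(
    'UPDATE product USE KEYS "odwalla-juice1" '
    'SET type = "product-juice" RETURNING product.type')
for row in result:
    print(row)  # {'type': 'product-juice'}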
This statement removes the "type" attribute from the "product" keyspace for the document with the "odwalla-juice1" key.
UPDATE product USE KEYS "odwalla-juice1" UNSET type RETURNING product.*
"results": [
{
"productId": "odwalla-juice1",
"unitPrice": 5.4
}
]
This statement unsets the "gender" attribute in the "children" array for the document with the key, "dave" in the tutorial keyspace.
UPDATE tutorial t USE KEYS "dave" UNSET c.gender FOR c IN children END RETURNING t
"results": [
{
"t": {
"age": 46,
"children": [
{
"age": 17,
"fname": "Aiden"
},
{
"age": 2,
"fname": "Bill"
}
],
"email": "dave#gmail.com",
"fname": "Dave",
"hobbies": [
"golf",
"surfing"
],
"lname": "Smith",
"relation": "friend",
"title": "Mr.",
"type": "contact"
}
}
]
Starting with version 4.5.1, the UPDATE statement has been improved to SET nested array elements. The FOR clause is enhanced to evaluate functions and expressions, and the new syntax supports multiple nested FOR expressions to access and update fields in nested arrays. Additional array levels are supported by chaining the FOR clauses.
Example
UPDATE default
SET i.subitems = ( ARRAY OBJECT_ADD(s, 'new', 'new_value' )
FOR s IN i.subitems END )
FOR s IN ARRAY_FLATTEN(ARRAY i.subitems
FOR i IN items END, 1) END;
If you're using structured (JSON) data, you need to read the existing record, update the field you want in your program's data structure, and then send the record up again. You can't update individual fields in the JSON structure without sending it all up again. There isn't a way around this that I'm aware of.
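A minimal sketch of that read-modify-write cycle with the Couchbase Python SDK (the bucket name, document key, and field are hypothetical):

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster('couchbase://localhost',
                  ClusterOptions(PasswordAuthenticator('user', 'password')))
collection = cluster.bucket('mybucket').default_collection()

# Fetch the whole document and change one field locally...
doc = collection.get('record::42').content_as[dict]
doc['status'] = 'updated'
# ...then write the full document back; the other fields are preserved only
# because they travel with the document.
collection.replace('record::42', doc)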
It is indeed true that to update individual items in a JSON document, you need to fetch the entire document and overwrite it.
We are working on adding individual item updates in the near future.