ArangoDB Version: 3.3.14
Heap Statistics for application:
{ total_heap_size: 39108608,
total_heap_size_executable: 3670016,
total_physical_size: 37433512,
total_available_size: 8735662896,
used_heap_size: 28891504,
heap_size_limit: 8769069275,
malloced_memory: 16384,
peak_malloced_memory: 168484640,
does_zap_garbage: 0 }
I have a traversal API which traverses through 3 vertices and returns around 300 vertex documents.
For 200 vertices I get the proper response, but when I increase the number of vertices to 300, the traversal API throws the error "invalid string length". I have increased the heap space for the application to 8 GB, but as can be seen above, used_heap_size is far below the limit. I am not sure whether this issue occurs during serialization to JSON, since there is sufficient heap memory available, or whether there are other options available to get the complete data.
AQL query (if applicable):
{"request":{"vertex":"start_vertex","start_vertex":"service_teams/9a2e582997494bee9066bbcf2aa52218","start_vertices":null,"opts":{"expander":"var connections = [];\n config.datasource.getOutEdges(vertex).forEach(function (e) {\n if( (e._id.indexOf(\"loc_has_parent_loc\") > -1) || (e._id.indexOf(\"loc_is_associated_with_org\") > -1) || (e._id.indexOf(\"st_is_allocated_loc\") > -1) ) {\n connections.push({ vertex: require(\"internal\").db._document(e._to), edge: e});\n }\n });\n return connections;"},"collection_name":"","edge_name":"","data":true,"path":false}},"err":{"code":500,"message":"Invalid string length"}}
The error message you're seeing here is a V8 error message, so this is a JavaScript implementation limitation: V8 allows strings up to 256 MB.
The old graph traversal API is implemented in JavaScript, and during processing it may stringify documents in between steps and thus hit this limit.
Please prefer AQL traversals instead; they also offer better performance.
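As a rough sketch, the traversal from the request above could look like this in AQL, run through the arangojs driver (the traversal depth of 1..3 and the edge-collection names are assumptions read off the question's expander):
const { Database } = require("arangojs");
const db = new Database(); // adjust URL and credentials for your setup

async function traverse() {
    // Traverse up to 3 levels outbound from the start vertex, restricted to the
    // three edge collections the old expander filtered on, returning the vertices.
    const cursor = await db.query(`
        FOR v IN 1..3 OUTBOUND 'service_teams/9a2e582997494bee9066bbcf2aa52218'
            loc_has_parent_loc, loc_is_associated_with_org, st_is_allocated_loc
            RETURN v
    `);
    return cursor.all();
}
Since the traversal then runs natively inside the server rather than in the JavaScript layer, the V8 string limit no longer applies.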
Given that Azure DocumentDB uses Request Units as a measurement of throughput, I would like to make sure my queries use as few RUs as possible to increase my throughput. Is there a tool that will tell me how many RUs a query will take and whether the query is actually using an index or not?
As you discovered, certain tools will provide RUs upon completion of a query. This is also available programmatically, as the x-ms-request-charge header is returned in the response and is easily retrievable via the DocumentDB SDKs.
For example, here's a snippet showing RU retrieval using JS/node:
var queryIterator = client.queryDocuments(collLink, querySpec);
queryIterator.executeNext(function (err, results, headers) {
    if (err) {
        // deal with error...
    } else {
        // deal with payload...
        var ruConsumed = headers['x-ms-request-charge'];
    }
});
As far as your question regarding indexing, and determining whether a property is indexed (which should then answer your question about a query using or not using an index): you may read the collection, which returns the indexing policy in the response.
For example: given some path dbs/<databaseId>/colls/<collectionId>:
var collLink = 'dbs/' + databaseId + '/colls/' + collectionId;
client.readCollection(collLink, function (err, coll) {
    if (err) {
        // deal with error
    } else {
        // compare indexingPolicy with your property, to see if it's included or excluded
        // this just shows you what these properties look like
        console.log("Included: " + JSON.stringify(coll.indexingPolicy.includedPaths));
        console.log("Excluded: " + JSON.stringify(coll.indexingPolicy.excludedPaths));
    }
});
You'll see includedPaths and excludedPaths looking something like this, and you can then search for your given property in any way you see fit:
Included: [{"path":"/*","indexes":[{"kind":"Range","dataType":"Number","precision":-1},{"kind":"Hash","dataType":"String","precision":3}]}]
Excluded: []
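If you want to automate that check, here is a naive sketch (the helper name and the exact-match logic are my own; real path matching also has to handle wildcard forms like "/myProp/?", so adapt as needed):
function looksIndexed(indexingPolicy, propertyPath) {
    // treat "/*" in includedPaths as a catch-all
    var included = indexingPolicy.includedPaths.some(function (p) {
        return p.path === "/*" || p.path === propertyPath;
    });
    // an exact match in excludedPaths wins over inclusion
    var excluded = indexingPolicy.excludedPaths.some(function (p) {
        return p.path === propertyPath;
    });
    return included && !excluded;
}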
I found DocumentDb Studio, which shows the response headers that provide the RUs on every query.
Another option is to use the emulator with the trace collection option turned on.
https://learn.microsoft.com/en-us/azure/cosmos-db/local-emulator
I was trying to profile LINQ aggregate queries, which currently seems to be impossible with the C# SDK.
Using the trace output from the emulator I was able to identify the request charges and a host of other metrics. There is a lot of data to wade through, though.
I found the request charge stored under this event key:
DocDBServer/Transport_Channel_Processortask/Genericoperation
Example output:
ThreadID="141,928" FormattedMessage="EndRequest DocumentServiceId localhost, ResourceType 2, OperationType 15, ResourceId 91M7AL+QPQA=, StatusCode 200, HRESULTHex 0, ResponseLength 61, Duration 70,546, HasQuery 1, PartitionId a4cb495b-38c8-11e6-8106-8cdcd42c33be, ReplicaId 1, ConsistencyLevel 3, RequestSessionToken 0:594, ResponseSessionToken 594, HasContinuation 0, HasPreTrigger 0, HasPostTrigger 0, IsFeedUnfiltered 0, IndexingDirective 5, XDate Fri, 09 Jun 2017 08:49:03 GMT, RetryAfterMilliseconds 0, MaxItemCount -1, ActualItemCount 1, ClientVersion 2017-02-22, UserAgent Microsoft.Azure.Documents.Common/1.13.58.2, RequestLength 131, NetworkBucket 2, SubscriptionId 00000000-0000-0000-0000-000000000000, Region South Central US, IpAddress 192.168.56.0, ChannelProtocol RNTBD, RequestCharge 51.424, etc...
This can then be correlated with data from another event which contains the query info:
DocDBServer/ServiceModuletask/Genericoperation
Note that you need PerfView to view the ETL log files. See here for more info:
https://github.com/Azure/azure-documentdb-dotnet/blob/master/docs/documentdb-sdk_capture_etl.md
I'm writing a web application with Express.js. When sending data in a request body, any Array or Object with numeric keys, such as:
{object1: {23: "abc", 45: "def"}, array1: ["a", "b"]}
gets parsed into:
{"object1[23]": "abc", "object1[45]": "def", "array1[0]": "a", "array1[1]": "b"}
This is really annoying, because when I try to retrieve the key object1 or array1 I get a "key does not exist" error and have to use object1[23] instead.
There are two things that could lead to this issue:
the array size is larger than 20 (Express's default limit is 20)
the array is at some deeper level in the object (Express's default limit is 5 levels)
To solve either of these issues:
First, require 'qs' in your app.js (the file where the app is initialized):
const qs = require('qs');
Then, before making the use calls (app.use(...)), set the limits (it will not work if written after the use calls):
app.set('query parser', function (str) {
    // allow objects nested up to 10 levels deep and arrays of up to 1000 elements
    return qs.parse(str, { depth: 10, arrayLimit: 1000 });
});
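For reference, here is how those qs defaults behave; a quick sketch you can run in a Node REPL (the query strings are made-up examples):
const qs = require('qs');

// indices above arrayLimit (default 20) become plain object keys instead of array slots
qs.parse('a[5]=x');                  // => { a: ['x'] }
qs.parse('a[100]=x');                // => { a: { '100': 'x' } }

// keys nested deeper than depth (default 5) are kept as one literal bracketed key
qs.parse('a[b][c]=x', { depth: 1 }); // => { a: { b: { '[c]': 'x' } } }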
I ran into a situation in production where
CloudContext.TableData.Where( A => A.PartitionKey == "MYKEY").ToList();
where TableData is
public DataServiceQuery<T> TableData { get { return CreateQuery<T>( _TableName ); } }
does not return the whole partition (I have fewer than 1000 records there).
In my case it returns 367 records while in VS2010 Server Explorer or in Azure Storage Explorer I get 414 records (condition is the same).
Did anyone experience the same problem?
Also, if I change the query and add RowKey into the condition, I get the required record with no problem.
You need to understand the Table Service better. The official documentation here lists other conditions that affect the number of records returned. If you want to retrieve the whole partition, you have to inspect the query result for a continuation token and use the provided token to execute the same query over and over again until all the results come back.
You can use an approach similar to the following:
private IEnumerable<MyEntityType> GetAllEntities()
{
    TableContinuationToken token = null;
    do
    {
        // a null token on the first call fetches the first segment
        var result = this._tables.GetSegmentedEntities(100, token);
        foreach (var ufs in result.Results)
        {
            yield return new MyEntityType(ufs.RowKey, ufs.WhateverOtherPropertyINeed);
        }
        // a segment can legitimately be empty while a continuation token is still
        // present, so loop on the token rather than on Results.Count
        token = result.ContinuationToken;
    } while (token != null);
}
Where GetSegmentedEntities(100, token) is defined as:
public TableQuerySegment<MyEntityType> GetSegmentedEntities(int pageSize, TableContinuationToken token)
{
    var partKey = "My_Desired_Partition_key_passed_via_Const_or_method_Param";
    TableQuery<MyEntityType> query = new TableQuery<MyEntityType>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partKey));
    query.TakeCount = pageSize;
    return this.azureTableReference.ExecuteQuerySegmented<MyEntityType>(query, token);
}
You can use and modify this code for your case.
This is a known and documented behavior. The Table service API will return either 1000 entities or as many entities as it can retrieve within 5 seconds. If the query takes longer than 5 seconds to execute, it returns a continuation token.
With the addition of RowKey you are making the query more specific, and hence faster, and as a result you are getting all the entities.
See TimeOuts and Pagination on MSDN for details
If you are getting partial result sets, there are three possible factors:
i) You have more than 1000 records matching the filter.
ii) Querying took more than 5 seconds.
iii) The query crosses a partition boundary.
As you have fewer than 1000 records, the first factor won't be an issue, and as you are retrieving based on PartitionKey equality, the third one won't cause any problem either. You are facing this problem because of the second factor.
To handle this you need to work with the continuation token. You can refer to this link for more info.
I tried setting both nodes and links at the same time this way:
var force = d3.layout.force()
.size([w, h])
.nodes(nodes)
.links(connections)
.start();
nodes = [{"name":"data_base_id", "kind":"subgenre"},...]
connections = [{"source":"name_of_node", "target":"name_of_other_node"},...]
I have data that may not have connections, so it is necessary to define the nodes so that all of them get rendered. And defining the genres is pretty easy,
but I get this error:
Cannot read property 'weight' of undefined
And when I comment out .links(connections), the graph renders (just a bunch of dots scattered throughout...). How do I get the connections/links to cooperate with d3?
I was reading the docs, and apparently the source and target have to be INDEXES of the nodes in the nodes array. Is there any way to change that, so I can use the name of a node rather than its index in the array?
I encountered the same problem before; it was due to null values in the source/target of links.
Printing out the nodes and links information might help to debug.
The force-directed layout uses edge weights to calculate the layout. Try adding a dummy "weight":1 to all of your connections.
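For example, a one-line sketch that stamps a dummy weight onto every connection from the question:
connections.forEach(function(c) { c.weight = 1; });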
The code that initializes the links looks like this:
links.forEach(function(d) {
if (typeof d.source == "number") { d.source = nodes[d.source]; }
if (typeof d.target == "number") { d.target = nodes[d.target]; }
});
Presumably you could tweak that (in the d3 source) to use any property/type.
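If you'd rather not patch the d3 source, a workaround is to resolve the name-based links to node objects yourself before handing them to the layout; here's a sketch using the nodes and connections arrays from the question:
// build a lookup table from node name to node object
var nodeByName = {};
nodes.forEach(function(n) { nodeByName[n.name] = n; });

// replace the string source/target with references to the node objects;
// the force layout accepts object references in links directly
connections.forEach(function(c) {
    c.source = nodeByName[c.source];
    c.target = nodeByName[c.target];
});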
In addition to the answers mentioning null values in the source/target of links, the cause could also be an out-of-range source/target, e.g. you have 10 nodes and you assign a target index of 10 or higher.
Thanks to the answers above which refer to null source or target values!
I've been testing out the graph from http://bl.ocks.org/mbostock/4062045, and found that my data referenced a missing node.
This may help others debug this issue:
d3.json("my-buggy-data.json", function(error, graph) {
    // Find blank links, which give the error
    // "Uncaught TypeError: Cannot read property 'weight' of undefined"
    graph.links.forEach(function(link, index, list) {
        if (typeof graph.nodes[link.source] === 'undefined') {
            console.log('undefined source', link);
        }
        if (typeof graph.nodes[link.target] === 'undefined') {
            console.log('undefined target', link);
        }
    });

    force
        .nodes(graph.nodes)
        .links(graph.links)
        .start();
});
I think you might have null values in your source and target. I had this bug too and fixed it by filtering out the null values.
I've had this issue pop up in a number of ways. Most recently, I had my edge list as follows:
{Source: 0, Target: 1}
instead of:
{source: 0, target: 1}
JavaFX (1.2.x and 1.3.x) doesn't seem to allow garbage collection of at least Nodes and Scenes. A Node object is not freed after being removed from a Scene, even when there is no other explicit reference to it.
Here is an example:
var buttonB: Button = Button {
    text: "i'm just hanging here"
}

var buttonC: Button = Button {
    text: "hit me to leak memory"
    action: function() {
        buttonB.managed = false;
        delete buttonB from mainBox.content;
        buttonB.skin = null;
        buttonB = null;
        java.lang.System.gc();
    }
}

def mainBox: HBox = HBox {
    hpos: HPos.CENTER
    nodeVPos: VPos.CENTER
    layoutInfo: LayoutInfo {
        width: 800
        height: 600
    }
    content: [buttonC, buttonB]
}
buttonB is never freed. Setting skin to null helps somewhat (in VisualVM most of the references to the button disappear) but doesn't fix the issue. I also tried nullifying all members using JavaFX reflection, with no luck.
Is it possible to make buttonB eligible for GC and how to do it?
Does the problem persist in JavaFX 2.0?
I found (through VisualVM inspection) that JavaFX 1.3 keeps SoftReferences to buffered images (which probably represent rendered versions of Nodes) for nodes that have been removed. For me this behaved like a memory leak, since soft references are only cleared depending on the amount of free memory. It isn't a true leak (an OutOfMemoryError will never happen because of it), but it caused very inefficient garbage collection.
You can use -XX:SoftRefLRUPolicyMSPerMB=<N> to reduce the time SoftReferences are kept, though at a possible (but unlikely) performance penalty. It sets the number of milliseconds per free MB of heap that a softly reachable object is kept. The default is 1000 ms.
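For example, starting the JVM with -XX:SoftRefLRUPolicyMSPerMB=100 (the value 100 is just an illustrative choice) makes softly reachable objects eligible for collection after roughly 100 ms per free megabyte of heap, so cached render buffers are reclaimed much sooner.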