Query ArangoDB for Arrays

I am having a problem querying ArangoDB from Java with an array of values. I have tried both String[] and ArrayList, with no success.
My query:
FOR document IN documents FILTER @categoriesArray IN document.categories[*].title RETURN document
BindParams:
Map<String, Object> bindVars = new MapBuilder().put("categoriesArray", categoriesArray).get();
categoriesArray contains a bunch of Strings. I'm not sure why it isn't returning any results, because if I query using:
FOR document IN documents FILTER "Politics" IN document.categories[*].title RETURN document
I get the results I am looking for. Just not when using an Array or ArrayList.
I also tried querying for:
FOR document IN documents FILTER ["Politics","Law"] IN document.categories[*].title RETURN document
in order to emulate an ArrayList, but this doesn't return any results. I would query using a bunch of individual Strings, but there are too many and I get an error from the Java driver when querying with a String that long. Thus, I must query using an Array or ArrayList.
An example of the categoriesArray:
["Politics", "Law", "Nature"]
A sample image of the database: [image not available]

The reason is that the IN operator works by comparing the value on its left-hand side with each member of the array on its right-hand side.
The following query works if "Politics" is a member of document.categories[*].title:
FOR document IN documents FILTER "Politics" IN document.categories[*].title RETURN document
However, the following query will not work, even if "Politics" is a member of document.categories[*].title:
FOR document IN documents FILTER [ "Politics", "Law" ] IN document.categories[*].title RETURN document
This is because the query searches for the exact value [ "Politics", "Law" ] among the members on the right side, and that value will not be present. What you are probably looking for is a comparison that looks for "Politics" and "Law" separately, e.g.:
FOR document IN documents
  LET contained = (
    FOR title IN [ "Politics", "Law" ] /* or @categoriesArray */
      FILTER title IN document.categories[*].title
      RETURN title
  )
  FILTER LENGTH(contained) > 0
  RETURN document
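To make the difference concrete, here is a minimal JavaScript sketch that models the membership semantics described above. It is only an illustration: the `titles` and `wanted` arrays are made-up stand-ins for document.categories[*].title and the bind parameter, and nothing here calls ArangoDB.

```javascript
// Assumed sample data, not taken from the real collection.
const titles = ["Politics", "Economy"]; // stands in for document.categories[*].title
const wanted = ["Politics", "Law"];     // stands in for @categoriesArray

// Scalar IN array: true if some member equals the scalar.
const scalarIn = titles.includes("Politics");

// Array IN array: looks for a member that equals the WHOLE array.
// No title is itself the array ["Politics", "Law"], so this is false.
const arrayIn = titles.some(
  (member) => JSON.stringify(member) === JSON.stringify(wanted)
);

// What the subquery computes: the wanted titles that are present.
const contained = wanted.filter((title) => titles.includes(title));
const matches = contained.length > 0;

console.log(scalarIn, arrayIn, matches); // prints: true false true
```

The last two lines of logic mirror the LET contained / FILTER LENGTH(contained) > 0 pair in the query.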

ArangoDB now also has array comparison operators, which allow searching with ALL IN, ANY IN, or NONE IN:
[ 1, 2, 3 ] ALL IN [ 2, 3, 4 ] // false
[ 1, 2, 3 ] ALL IN [ 1, 2, 3 ] // true
[ 1, 2, 3 ] NONE IN [ 3 ] // false
[ 1, 2, 3 ] NONE IN [ 23, 42 ] // true
[ 1, 2, 3 ] ANY IN [ 4, 5, 6 ] // false
[ 1, 2, 3 ] ANY IN [ 1, 42 ] // true
[ 1, 2, 3 ] ANY == 2 // true
[ 1, 2, 3 ] ANY == 4 // false
[ 1, 2, 3 ] ANY > 0 // true
[ 1, 2, 3 ] ANY <= 1 // true
[ 1, 2, 3 ] NONE < 99 // false
[ 1, 2, 3 ] NONE > 10 // true
[ 1, 2, 3 ] ALL > 2 // false
[ 1, 2, 3 ] ALL > 0 // true
[ 1, 2, 3 ] ALL >= 3 // false
["foo", "bar"] ALL != "moo" // true
["foo", "bar"] NONE == "bar" // false
["foo", "bar"] ANY == "foo" // true
So you could now filter by:
FOR document IN documents
  FILTER ["Politics", "Law"] ANY IN (document.categories[*].title)[**]
  RETURN document
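As a rough mental model, the three IN variants behave like the JavaScript helpers below. The helper names and the sample documents are hypothetical and exist only for this sketch; they are not part of any ArangoDB driver.

```javascript
// Hypothetical stand-ins for the AQL operators ANY IN, ALL IN and NONE IN.
const anyIn = (left, right) => left.some((v) => right.includes(v));
const allIn = (left, right) => left.every((v) => right.includes(v));
const noneIn = (left, right) => !anyIn(left, right);

// Assumed sample documents shaped like the ones in the question.
const documents = [
  { _key: "1", categories: [{ title: "Politics" }, { title: "Sports" }] },
  { _key: "2", categories: [{ title: "Nature" }] },
];
const wanted = ["Politics", "Law"];

// Equivalent of: FILTER wanted ANY IN document.categories[*].title
const hits = documents.filter((doc) =>
  anyIn(wanted, doc.categories.map((c) => c.title))
);

console.log(hits.map((d) => d._key)); // prints [ '1' ]
```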

Related

Transpose CSV data using nodejs

"A",1,2,3,4
"B",1,2,3,4
"C",1,2,3,4
I want to transpose and get the output as
"A""B""C"
111
222
333
444

Hi sai kiran bandari!
Please provide more information in your next question.
I assume you have a two-dimensional array, for which I made a solution. You iterate through the array using two loops and distribute the values into a new two-dimensional array, using the position inside each inner array as the index of the new outer array.
const data = [['A', 1, 2, 3, 4], ['B', 1, 2, 3, 4], ['C', 1, 2, 3, 4]]
const transposed = []
// Iterate through the two-dimensional array.
for (let i = 0; i < data.length; i++) {
  const arr = data[i]
  // Iterate through the inner array.
  for (let p = 0; p < arr.length; p++) {
    // Create a new inner array if there is not already one at the destination.
    if (!Array.isArray(transposed[p])) {
      transposed[p] = []
    }
    // Append the p-th value of row i to the p-th output row, so that
    // each input column ends up as an output row.
    transposed[p].push(arr[p])
  }
}
console.log(transposed)
/* Output => [
[ 'A', 'B', 'C' ],
[ 1, 1, 1 ],
[ 2, 2, 2 ],
[ 3, 3, 3 ],
[ 4, 4, 4 ]
]
*/
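If the data starts out as CSV text rather than as an array, a shorter variant could look like the sketch below. It assumes a rectangular input and fields without embedded commas or quotes (a real CSV parser would be needed otherwise); the variable names are made up for this example.

```javascript
// Assumed input: the three lines from the question as one string.
const csv = '"A",1,2,3,4\n"B",1,2,3,4\n"C",1,2,3,4';

// Naive parsing: split into lines, then into fields.
const rows = csv.split("\n").map((line) => line.split(","));

// Transpose: output row `col` collects field `col` of every input row.
const transposedRows = rows[0].map((_, col) => rows.map((row) => row[col]));

// Join back into CSV text.
const output = transposedRows.map((row) => row.join(",")).join("\n");
console.log(output);
```

The map-inside-map form does the same work as the two explicit loops above.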

Terraform filter based on map's minor key on lists of maps

I have 2 list of maps. Allow me to show you.
Let's call this one values_default:
[{A="abc", B="10"}]
and new_config:
[
{A="abc", B="9"},
{A="cdea", B="1000"},
{A="asd", B="otra cosa"},
]
Then I need to merge or concat them, but in a special way. What I need to do is filter out duplicates of A, keeping the entry for each A that has the smallest value of B. That is the desired final result. There may be another way to do this than what I am attempting, but as long as I get from those two inputs to the expected result, it's fine. For the example above, the result should be:
[
{A="abc", B="9"},
{A="cdea", B="1000"},
{A="asd", B="otra cosa"},
]
Here we took the A with the lower B value.
I did this so far:
locals {
tmp_cluster_parameters = concat(var.new_config, var.values_default)
final_cluster_parameters = distinct([for i in local.tmp_cluster_parameters: {
name = i.A
value = i.B
}])
}
This means I can filter out entries ONLY when the maps are exactly the same (A & B). I tried many more things, but cannot figure out how to get closer to my goal. For the example above, this would not filter out anything. The result would be this:
[
{A="abc", B="10"},
{A="abc", B="9"},
{A="cdea", B="1000"},
{A="asd", B="otra cosa"}
]
Ideally it should have removed the {A="abc", B="10"}
Edit 1: answer to the question from @macin. Here is the actual field as it is today, with some further explanation:
variable "cluster_parameters_default" {
description = ""
type = list(map(string))
default = [
{
name = "wait_timeout"
value = "800"
}
]
}
I will have a few more default parameters in the future. These are defaults for MySQL parameter groups. Here cluster_parameters_default is values_default, name is A and value is B. The idea is to have some ENFORCEABLE MySQL config defaults that we can overwrite with a greater or smaller value depending on what is allowed. These defaults would create a permissible walled garden for many configs of many DBs. For instance, the security team might require us to keep wait_timeout below 15 minutes. We should still be able to set a different value, as long as it is smaller than the 15 minutes required by the security team. In addition, many more values will come from new_config. I did not want to explain all this, as it is part of a much bigger and more complicated project. That is why I used values_default, A and B. I could change the FORMAT of values_default, as it is my own variable, but I cannot change new_config, as it is used by MANY other things outside my control.
Edit2: Other Data Sets(DSx).
DS1:
values_default:
[{A="abc", B="10"}]
Then new_config:
[
{A="cdea", B="1000"},
{A="asd", B="otra cosa"},
]
desired output here should be:
[
{A="abc", B="10"},
{A="cdea", B="1000"},
{A="asd", B="otra cosa"},
]
DS2:
values_default:
[
{A="abc", B="10"},
{A="cdea", B="111"},
]
Then new_config:
[
{A="abc", B="8"},
{A="cdea", B="1000"},
{A="asd", B="otra cosa"},
]
The desired output in this case would be:
[
{A="abc", B="8"},
{A="cdea", B="111"},
{A="asd", B="otra cosa"},
]
DS3:
values_default:
[
{A="abc", B="10"},
{A="cdea", B="111"},
]
Then new_config would be NOT DECLARED in this case.
The desired output in this case would be values_default:
[
{A="abc", B="10"},
{A="cdea", B="111"},
]

I'm not sure I fully understand, but based on your example you can achieve your outcome with the following. (Note that min applies only to numbers; I don't know how you want to compare strings using min.)
variable "values_default" {
default = [
{A="abc", B="10"},
{A="cdea", B="111"},
]
}
# FOR CASE 3, but works for other cases as well
variable "new_config" {
default = []
}
locals {
# get keys available in our vars
keys_default = [for v in var.values_default: v.A]
keys_new = [for v in var.new_config: v.A]
keys_all = distinct(concat(local.keys_default, local.keys_new))
# find common keys that need to be potentially overwritten
# using min of B values
keys_common = setintersection(local.keys_default, local.keys_new)
# construct overwritten values, if there are any
# keys must be unique in both vars (no duplicates present)
overwritten_values = [ for idx, key in local.keys_common:
{
A = key
B = min([for v in var.values_default: v.B if v.A == key][0],
[for v in var.new_config: v.B if v.A == key][0])
}
]
keys_different = setsubtract(local.keys_all, local.keys_common)
new_values = [ for idx, key in local.keys_different:
{
A = key
B = concat([for v in var.values_default: v.B if v.A == key],
[for v in var.new_config: v.B if v.A == key])
}
]
}
output "test" {
value = concat(local.overwritten_values, local.new_values)
}
gives:
test = [
{
"A" = "abc"
"B" = [
"10",
]
},
{
"A" = "cdea"
"B" = [
"111",
]
},
]
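For readers who want to check the intended merge rule outside Terraform, here is a small JavaScript model of it, run against data set DS2 from the question. It is a sketch of the rule only, not Terraform code: the function name `mergeByMin` is made up, and treating a non-numeric B as "keep the first value seen" is an assumption.

```javascript
// Hypothetical model of the rule: one entry per A, keeping the smallest numeric B.
function mergeByMin(valuesDefault, newConfig = []) {
  const merged = new Map(); // preserves first-seen order of A
  for (const { A, B } of [...valuesDefault, ...newConfig]) {
    const current = merged.get(A);
    // Keep the first B seen for A unless the new B is numerically smaller.
    // Non-numeric strings become NaN, and NaN comparisons are false (assumption).
    if (current === undefined || Number(B) < Number(current)) {
      merged.set(A, B);
    }
  }
  return [...merged].map(([A, B]) => ({ A, B }));
}

// DS2 from the question.
const valuesDefault = [{ A: "abc", B: "10" }, { A: "cdea", B: "111" }];
const newConfig = [
  { A: "abc", B: "8" },
  { A: "cdea", B: "1000" },
  { A: "asd", B: "otra cosa" },
];
console.log(mergeByMin(valuesDefault, newConfig));
```

Calling mergeByMin(valuesDefault) with no second argument models DS3, where new_config is not declared.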

Build an Octave struct by "rows"

I mean to use a struct to hold a "table":
% Sample data
% idx idxstr var1 var2 var3
% 1 i01 3.5 21.0 5
% 12 i12 6.5 1.0 3
The first row contains the field names.
I could enter these data by columns directly,
ds2 = struct( ...
'idx', { 1, 12 }, ...
'idxstr', { 'i01', 'i12' }, ...
'var1', { 3.5, 6.5 }, ...
'var2', { 21, 1 }, ...
'var3', { 5, 3 } ...
);
and by rows indirectly, creating a cell array, and converting to struct,
ds3 = cell2struct( ...
{ 1, 'i01', 3.5, 21.0, 5; ...
12, 'i12', 6.5, 1.0, 3 ...
}, { 'idx', 'idxstr', 'var1', 'var2', 'var3' }, 2 );
Is there a direct way to enter data by rows?
In addition,
why the different sizes?
>> size(ds2), size(ds3)
ans =
1 2
ans =
2 1

As I mentioned in your other post here, you are probably better off creating your 'table' as a struct of array fields, rather than an array of single-row structs.
However, for the sake of writing a useful answer, I will assume the reason you opted for this form to begin with may be that you already have your data as rows in 'cell' form (e.g. possibly the output of a csv2cell operation), and you'd like to convert it to such a "table".
Therefore, to create a nice "table as struct of arrays" from such a data structure, you could follow a strategy like the following:
Data = { 1, 'i01', 3.5, 21.0, 5; 12, 'i12', 6.5, 1.0, 3 };
d1 = struct( 'idx' , [Data{:,1}] ,
'idxstr', {{Data{:,2}}}, % note the 'enclosing' braces!
'var1' , [Data{:,3}] ,
'var2' , [Data{:,4}] ,
'var3' , [Data{:,5}]
);
or, using cell2struct if you prefer that syntax:
d2 = cell2struct( { [Data{:,1}],
{Data{:,2}}, % note the lack of enclosing braces here!
[Data{:,3}],
[Data{:,4}],
[Data{:,5}] },
{ 'idx', 'idxstr', 'var1', 'var2', 'var3' },
2
);
Note that you "do" need to know if a 'column' represents a numeric or string array, so that you wrap it in [] or {} respectively ... but I think knowing the data-type represented by each column is not an unreasonable requirement from a programmer.

Counting string occurrences with ArangoDB AQL

To count the number of objects containing a specific attribute value I can do something like:
FOR t IN thing
COLLECT other = t.name = "Other" WITH COUNT INTO otherCount
FILTER other != false
RETURN otherCount
But how can I count three other occurrences within the same query, without resorting to subqueries that run through the same dataset multiple times?
I've tried something like:
FOR t IN thing
COLLECT
other = t.name = "Other",
some = t.name = "Some",
thing = t.name = "Thing"
WITH COUNT INTO count
RETURN {
other, some, thing,
count
}
But I can't make sense of the results: I must be approaching this in the wrong way?

Split and count
You could split the string by the phrase and subtract 1 from the count. This works for any substring, which on the other hand means it does not respect word boundaries.
LET things = [
{name: "Here are SomeSome and Some Other Things, brOther!"},
{name: "There are no such substrings in here."},
{name: "some-Other-here-though!"}
]
FOR t IN things
LET Some = LENGTH(SPLIT(t.name, "Some"))-1
LET Other = LENGTH(SPLIT(t.name, "Other"))-1
LET Thing = LENGTH(SPLIT(t.name, "Thing"))-1
RETURN {
Some, Other, Thing
}
Result:
[
{
"Some": 3,
"Other": 2,
"Thing": 1
},
{
"Some": 0,
"Other": 0,
"Thing": 0
},
{
"Some": 0,
"Other": 1,
"Thing": 0
}
]
You can use SPLIT(LOWER(t.name), LOWER("...")) to make it case-insensitive.
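The split-and-count trick is not specific to AQL. The JavaScript sketch below applies the same idea to the sample strings above; `countSubstring` is a hypothetical helper name, and it assumes a non-empty search phrase.

```javascript
// Splitting on the phrase yields one more piece than there are occurrences.
const countSubstring = (text, phrase) => text.split(phrase).length - 1;

const things = [
  { name: "Here are SomeSome and Some Other Things, brOther!" },
  { name: "There are no such substrings in here." },
  { name: "some-Other-here-though!" },
];

const counts = things.map((t) => ({
  Some: countSubstring(t.name, "Some"),
  Other: countSubstring(t.name, "Other"),
  Thing: countSubstring(t.name, "Thing"),
}));
console.log(counts);

// Case-insensitive variant: lower-case both sides before splitting.
const countIgnoreCase = (text, phrase) =>
  countSubstring(text.toLowerCase(), phrase.toLowerCase());
```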
COLLECT words
The TOKENS() function can be utilized to split the input into word arrays, which can then be grouped and counted. Note that I changed the input slightly. An input "SomeSome" will not be counted because "somesome" != "some" (this variant is word and not substring based).
LET things = [
{name: "Here are SOME some and Some Other Things. More Other!"},
{name: "There are no such substrings in here."},
{name: "some-Other-here-though!"}
]
LET whitelist = TOKENS("Some Other Things", "text_en")
FOR t IN things
LET whitelisted = (FOR w IN TOKENS(t.name, "text_en") FILTER w IN whitelist RETURN w)
LET counts = MERGE(FOR w IN whitelisted
COLLECT word = w WITH COUNT INTO count
RETURN { [word]: count }
)
RETURN {
name: t.name,
some: counts.some || 0,
other: counts.other || 0,
things: counts.things || 0
}
Result:
[
{
"name": "Here are SOME some and Some Other Things. More Other!",
"some": 3,
"other": 2,
"things": 0
},
{
"name": "There are no such substrings in here.",
"some": 0,
"other": 0,
"things": 0
},
{
"name": "some-Other-here-though!",
"some": 1,
"other": 1,
"things": 0
}
]
This does use a subquery for the COLLECT, otherwise it would count the total number of occurrences for the entire input.
The whitelist step is not strictly necessary; you could also let it count all words. For larger input strings, it might save some memory not to count words you are not interested in anyway.
You might want to create a separate Analyzer with stemming disabled for the language if you want to match the words precisely. You can also turn off normalization ("accent": true, "case": "none"). An alternative would be to use REGEX_SPLIT() for typical whitespace and punctuation characters for a simpler tokenization, but that depends on your use case.
Other solutions
I don't think that it's possible to count each input object independently with COLLECT without subquery, unless you want a total count.
Splitting is a bit of a hack, but you could substitute SPLIT() by REGEX_SPLIT() and wrap the search phrases in \b to only match if word boundaries are on both sides. Then it should only match words (more or less):
LET things = [
{name: "Here are SomeSome and Some Other Things, brOther!"},
{name: "There are no such substrings in here."},
{name: "some-Other-here-though!"}
]
FOR t IN things
LET Some = LENGTH(REGEX_SPLIT(t.name, "\\bSome\\b"))-1
LET Other = LENGTH(REGEX_SPLIT(t.name, "\\bOther\\b"))-1
LET Thing = LENGTH(REGEX_SPLIT(t.name, "\\bThings\\b"))-1
RETURN {
Some, Other, Thing
}
Result:
[
{
"Some": 1,
"Other": 1,
"Thing": 1
},
{
"Some": 0,
"Other": 0,
"Thing": 0
},
{
"Some": 0,
"Other": 1,
"Thing": 0
}
]
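The word-boundary variant can be modelled the same way in JavaScript, where `\b` has the usual regex meaning. This is only a sketch: `countWord` is a made-up helper, and it assumes the search word contains no regex metacharacters.

```javascript
// Count whole-word matches by wrapping the word in \b ... \b.
const countWord = (text, word) =>
  (text.match(new RegExp(`\\b${word}\\b`, "g")) || []).length;

const samples = [
  "Here are SomeSome and Some Other Things, brOther!",
  "There are no such substrings in here.",
  "some-Other-here-though!",
];

const wordCounts = samples.map((name) => ({
  Some: countWord(name, "Some"),
  Other: countWord(name, "Other"),
  Thing: countWord(name, "Things"),
}));
console.log(wordCounts);
```

"SomeSome" and "brOther" no longer count, because neither has a word boundary on both sides of the search word.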
A more elegant solution would be to utilize ArangoSearch for word counting, but it doesn't have a feature to let you retrieve how often a word occurs. It might keep track of that already internally (Analyzer feature "frequency"), but it's definitely not exposed at this point in time.

Longest Substring Pair Sequence is it Longest Common Subsequence or what?

I have a pair of strings, for example abcabcabc and abcxxxabc, and a List of Common Substring Pairs (LCSP). In this case the LCSP has 6 pairs, because the three abc in the first string map to the two abc in the second string. Now I need to find the longest valid (incrementing) sequence of pairs. In this case there are three equally long solutions: 0:0,3:6; 0:0,6:6; 3:0,6:6 (the numbers are the starting positions of each pair in the original strings; the length of each substring is 3, the length of "abc"). I would call it the Longest Substring Pair Sequence, or LSPQ (Q is used to avoid confusing String and Sequence).
Here is the LCSP for this example:
LCSP('abcabcabc', 'abcxxxabc') =
[ [ 6, 6, 3 ],
[ 6, 0, 3 ],
[ 3, 6, 3 ],
[ 0, 6, 3 ],
[ 3, 0, 3 ],
[ 0, 0, 3 ] ]
LSPQ(LCSP('abcabcabc', 'abcxxxabc'), 0, 0, 0) =
[ { a: 0, b: 0, size: 3 }, { a: 3, b: 6, size: 3 } ]
Currently I find it by brute force, recursively trying all combinations, so I am limited to about 25 pairs; beyond that it is impractical. Size=[10,15,20,25,26,30], Time ms = [0,15,300,1000,2000,19000]
Is there a way to do this in linear time, or at least with less than quadratic complexity, so that a longer input LCSP (List of Common Substring Pairs) could be used?
This problem is similar to the "Longest Common Subsequence", but not exactly the same, because the input is not two strings but a list of common substrings sorted by their length. So I do not know where to look for existing solutions, or even whether they exist.
Here is my particular code (JavaScript):
function getChainSize(T) {
  var R = 0
  for (var i = 0; i < T.length; i++) R += T[i].size
  return R
}
function LSPQ(T, X, Y, id) {
  // X, Y are the first unused characters in str1, str2
  // id is the index of the current pair
  function findNextPossible() {
    var x = id
    while (x < T.length) {
      if (T[x][0] >= X && T[x][1] >= Y) return x
      x++
    }
    return -1
  }
  var id = findNextPossible()
  if (id < 0) return []
  var C = [{a:T[id][0], b:T[id][1], size:T[id][2] }]
  // with current
  var o = T[id]
  var A = C.concat(LSPQ(T, o[0]+o[2], o[1]+o[2], id+1))
  // without current
  var B = LSPQ(T, X, Y, id+1)
  if (getChainSize(A) < getChainSize(B)) return B
  return A
}
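One possible way to avoid the exponential search is a dynamic-programming pass over the pairs, sketched below. This is an assumption about what would help, not a known answer to the question: it is still quadratic in the number of pairs (not linear), it uses the same "next pair starts at or after the end of the previous one in both strings" rule as the code above, and `bestChain` is a made-up name.

```javascript
// Each pair is [aStart, bStart, size], as in the LCSP list above.
function bestChain(pairs) {
  // Sort by start in the first string; since size > 0, every valid
  // predecessor of a pair then appears before it.
  const sorted = [...pairs].sort((p, q) => p[0] - q[0] || p[1] - q[1]);
  const best = sorted.map((p) => p[2]); // best total size of a chain ending at i
  const prev = sorted.map(() => -1);    // predecessor index, for reconstruction
  let end = -1;
  for (let i = 0; i < sorted.length; i++) {
    for (let j = 0; j < i; j++) {
      const [a, b, size] = sorted[j];
      // Pair j may precede pair i if it ends before i starts in both strings.
      const fits = a + size <= sorted[i][0] && b + size <= sorted[i][1];
      if (fits && best[j] + sorted[i][2] > best[i]) {
        best[i] = best[j] + sorted[i][2];
        prev[i] = j;
      }
    }
    if (end < 0 || best[i] > best[end]) end = i;
  }
  // Walk the predecessor links back from the best end point.
  const chain = [];
  for (let i = end; i >= 0; i = prev[i]) {
    chain.unshift({ a: sorted[i][0], b: sorted[i][1], size: sorted[i][2] });
  }
  return chain;
}

const lcsp = [[6, 6, 3], [6, 0, 3], [3, 6, 3], [0, 6, 3], [3, 0, 3], [0, 0, 3]];
console.log(bestChain(lcsp));
```

Like getChainSize, this maximizes the summed size of the chain; with n pairs it does on the order of n² comparisons instead of trying every combination.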
