AQL: return minimum of two values - arangodb

i am very new to AQL so this question is very simple i guess.
I would like to return the minimum of two values via aql. But min(valA,valB) returns
[1541] invalid number of arguments for function '_AQL:MIN()'
Unfortunatly i coudn't find any functions like min, max in the documentation, so i dont know what "Invalid number of arguments" means.
Here is a minimal reproducable example:
for art in artikel
return {"Preis" : min(art.preis, art.preisE,1)}

AQL:MIN() is defined for arrays not for arbitrary many parameters.
Try to use:
for art in artikel
return {"Preis" : min([art.preis, art.preisE,1])}

Related

Not able to use for loop in ternary operator in arangodb

How do we write conditions in arango, that includes for loops. I can elaborate the requirement below.
My requirement is if a particular attribute(array type) exists in the arango collection, i would read data from the collection(that requires a loop) or else, might do the following :
return null
return empty string ""
do nothing.
Is this possible to achieve in arango?
The helping methods could be -->
-- has(collectionname, attributename)
-- The ternary operator ?:
let attribute1 = has(doc,"attribute1") ?(
for name in doc.attribute1.names
filter name.language == "xyz"
return name.name
) : ""
But this dosent work. Seems like arango compiler first attempts to compile the for loop, finds nulls and reports error as below. Instead, it should have compiled "has" function first for the ternary operator being used.
collection or array expected as operand to FOR loop; you provided a value of type 'null' (while executing)
If there is a better way of doing it, would appreciate the advice!!
Thanks in advance!
Nilotpal
Fakhrany here from ArangoDB.
Regarding your question, this is a known limitation.
From https://www.arangodb.com/docs/3.8/aql/fundamentals-limitations.html:
The following other limitations are known for AQL queries:
Subqueries that are used inside expressions are pulled out of these
expressions and executed beforehand. That means that subqueries do not
participate in lazy evaluation of operands, for example in the ternary
operator. Also see evaluation of subqueries.
Also noted here for the ternary operator:
https://www.arangodb.com/docs/3.8/aql/operators.html#ternary-operator.
An answer to the question what to do may be to use a FILTER before enumerating over the attributes:
FOR doc IN collection
/* the following filter will only let those documents passed in which "attribute1.names" is an array */
FILTER IS_ARRAY(doc.attribute1.names)
FOR name IN doc.attribute1.names
FILTER name.language == "xyz"
RETURN name.name
Other solutions are also possible. Depends a bit on the use case.

Search the best match comparing prefixes

I have the numbers codes and text codes like in table1 below. And I have the numbers to search like in table2
for which I want to get the best match for a prefix of minimun length of 3 comparing from left to rigth and show as answer the corresponding TEXT CODE.
If there is an exact match, that would be the answer.
If there is no any value that has at least 3 length prefix then answer would be "not found".
I show some comments explaining the conditions applied in answer expected for each Number to search next to table2.
My current attempt shows the exact matches, but I'm not sure how to compare the values to search for the other conditions, when there is no exact match.
ncode = ["88271","1893","107728","4482","3527","71290","404","5081","7129","33751","3","40489","107724"]
tcode = ["RI","NE","JH","XT","LF","NE","RI","XT","QS","XT","YU","WE","RP"]
tosearch = ["50923","712902","404","10772"]
out = []
out.append([])
out.append([])
for code in tosearch:
for nc in ncode:
if code == nc:
indexOfMatched = ncode.index(nc)
out[0].append(nc)
out[1].append(tcode[indexOfMatched])
>>> out
[['404'], ['RI']]
The expected output would be
out = [
['50923', '712902', '404', '10772'],
['NOT FOUND', 'NE', 'RI', 'JH' ]
]
A simple solution you might consider would be the fuzzy-match library. It compares strings and calculates a similarity score. It really shines with strings rather than numbers, but it could easily be applied to find similar results in your prefix numbers.
Check out fuzzy-match here.
Here is a well written fuzzy-match tutorial.

Can a comparator function be made from two conditions connected by an 'and' in python (For sorting)?

I have a list of type:
ans=[(a,[b,c]),(x,[y,z]),(p,[q,r])]
I need to sort the list by using the following condition :
if (ans[j][1][1]>ans[j+1][1][1]) or (ans[j][1][1]==ans[j+1][1][1] and ans[j][1][0]<ans[j+1][1][0]):
# do something (like swap(ans[j],ans[j+1]))
I was able to implement using bubble sort, but I want a faster sorting method.
Is there a way to sort my list using the sort() or sorted() (Using comparator or something similar) functions while pertaining to my condition ?
You can create a comparator function that retuns a tuple; tuples are compared from left to right until one of the elements is "larger" than the other. Your input/output example is quite lacking, but I believe this will result into what you want:
def my_compare(x):
return x[1][1], x[1][0]
ans.sort(key=my_compare)
# ans = sorted(ans, key=my_compare)
Essentially this will first compare the x[1][1] value of both ans[j] and ans[j+1], and if it's the same then it will compare the x[1][0] value. You can rearrange and add more comparators as you wish if this didn't match your ues case perfectly.

FuzzyWuzzy for very similar records in Python

I have a dataset with which I want to find the closest string match. For that purpose I'm using FuzzyWuzzy in this way
sol=process.extract(t,dev2,scorer=fuzz.token_sort_ratio)
Where t is the string and dev2 is the list to compare to. My problem is that sometimes it has very similar records and options provided by FuzzyWuzzy seems to be lacking. And I've tested with token_sort, token_set, partial_token sort and set, ratio, partial_ratio, and WRatio.
For example, the string Italy - Serie A gives me the following 2 closest matches.
Token_sort_ratio: (92, 'Italy - Serie D');(86, 'Italian - Serie A')
The one wanted is obviously the second one, but character by character is closer the first one, which is a different league.
This happens as well with teams. If, let's say I have a string Buchtholz I would obtains Buchtholz II before I get TSV Buchtholz.
My main guess now would be to try and weight the presence and absence of several characters more heavily, like single capital letters at the end of the string, so if there is a difference in the letter or an absence it is weighted as less close. Or for () and special characters.
I don't know if there is a way to take this into account or you guys have a better approach to get the string that really matches.
Similarity matches often require knowledge of the data being analysed. i.e. it is not just a blind single round of matching. I recommend that you pass your results through more steps of matching, starting with inclusive/optimistic approaches (like token_set_ratio) with low cut off scores and working toward more exclusive/pessimistic approaches with higher cut off scores until you have a clear winner. If you know more about the text you're analyzing, you can even modify the strings as you progress.
In a case I worked on, I did similarity matches of goods movement descriptions. In the descriptions the numbers sequences were more important than the text. e.g. when looking for a match for "SLURRY VALVE 250MM RAGMAX 2000" the 250 and 2000 part of the string are important, otherwise I get a "SLURRY VALVE 50MM RAGMAX 2000" as the best match instead of "VALVE B/F 250MM,RAGMAX 250RAG2000 RAGON" which is a better result.
I put the similarity match process through two steps: 1. Get a bunch of similar matches using an optimistic matching scorer (token_set_ratio) 2. get the number sequences of these results and pass them through another round of matching with a more strict scorer (token_sort_ratio). Doing this gave me the better result in the example I showed above.
Below is some blocks of code that could be of assistance:
here's a function to get numbers from the sequence. (In your case you might use this to exclude numbers from your string instead?)
def get_numbers_from_string(description):
numbers = ''.join((ch if ch in '0123456789.-' else ' ') for ch in description)
numbers = ' '.join([nr for nr in numbers.split()])
return numbers
and here is a portion of the code I used to put the description match through two rounds:
try:
# get close match from goods move that has material numbers
df_material = pd.DataFrame(process.extract(description,
corpus_material,
scorer=fuzz.token_set_ratio),
columns=['Similar Text','Score']
)
if df_material['Score'][df_material['Score']>=cut_off_accuracy_materials].count()>=1:
similar_text = df_material['Similar Text'].iloc[0]
score = df_material['Score'].iloc[0]
if nr_description_numbers>4:
# if there are multiple matches found, then get best number combination match
df_material = df_material[df_material['Score']>=cut_off_accuracy_materials]
new_corpus = list(df_material['Similar Text'])
new_corpus = np.vectorize(get_numbers_from_string)(new_corpus)
df_material['numbers'] = new_corpus
df_numbers = pd.DataFrame(process.extract(description_numbers,
new_corpus,
scorer=fuzz.token_sort_ratio),
columns=['numbers','Score']
)
similar_text = df_material['Similar Text'][df_material['numbers']==df_numbers['numbers'].iloc[0]].iloc[0]
nr_score = df_numbers['Score'].iloc[0]
hope it helps, and good luck

What do empty square brackets after a variable name mean in Groovy?

I'm fairly new to groovy, looking at some existing code, and I see this:
def timestamp = event.timestamp[]
I don't understand what the empty square brackets are doing on this line. Note that the timestamp being def'd here should receive a long value.
In this code, event is defined somewhere else in our huge code base, so I'm not sure what it is. I thought it was a map, but when I wrote some separate test code using this notation on a map, the square brackets result in an empty value being assigned to timestamp. In the code above, however, the brackets are necessary to get correct (non-null) values.
Some quick Googling didn't help much (hard to search on "[]").
EDIT: Turns out event and event.timestamp are both zero.core.groovysupport.GCAccessor objects, and as the answer below says, the [] must be calling getAt() on these objects and returning a value (in this case, a long).
The square brackets will invoke the underlying getAt(Object) method of that object, so that line is probably invoking that one.
I made a small script:
class A {
def getAt(p) {
println "getAt: $p"
p
}
}
def a = new A()
b = a[]
println b.getClass()
And it returned the value passed as a parameter. In this case, an ArrayList. Maybe that timestamp object has some metaprogramming on it. What does def timestamp contains after running the code?
Also check your groovy version.
Empty list, found this. Somewhat related/possibly helpful question here.
Not at a computer, but that looks like it's calling the method event.timestamp and passing an empty list as a parameter.
The same as:
def timestamp = event.timestamp( [] )

Resources