I know that when using a FILTER clause, I can apply LOWER()/UPPER() to match my search criterion case-insensitively.
FOR d IN doc
    FILTER LOWER(d.category) == "abc"
    LIMIT 10
    RETURN d
However, what if I use PHRASE() in ArangoSearch?
I wrote code like this, but I get error messages.
FOR d IN vKBS
    SEARCH ANALYZER(MIN_MATCH(
        PHRASE(LOWER(d["category"]), 'abc'),
        PHRASE(LOWER(d["name"]), 'abc'),
        1), 'text_en')
    SORT BM25(d)
    LIMIT 50
    RETURN d
How can I make the search case-insensitive in ArangoSearch?
Check the analyzer's options: the built-in text_en analyzer already lowercases its input, so you can drop the LOWER() call and just write PHRASE(d["category"], 'abc', 'text_en'). If an analyzer didn't lowercase, you could create another analyzer of type text with the case property set to lower. You can verify what an analyzer emits with RETURN TOKENS("ABC", "text_en"), which returns ["abc"].
You can check the analyzers docs here:
https://www.arangodb.com/docs/stable/arangosearch-analyzers.html#text
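If you want an analyzer that only lowercases, without tokenizing or stemming, you can create one of type norm instead. Here is a minimal sketch using the python-arango driver; the host, credentials, and the analyzer name norm_en are illustrative assumptions:

from arango import ArangoClient

# Connect to the target database (host and credentials are placeholders).
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("_system", username="root", password="passwd")

# A "norm" analyzer lowercases the whole value without tokenizing it;
# "text" analyzers also lowercase, but additionally tokenize and stem.
db.create_analyzer(
    name="norm_en",
    analyzer_type="norm",
    properties={"locale": "en", "case": "lower", "accent": False},
    features=["frequency", "norm"],
)

With such an analyzer in place you would drop the LOWER() call entirely and write, for example, ANALYZER(d.category == "abc", "norm_en") in the SEARCH expression.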
I have number codes and text codes as in table1 below, and numbers to search as in table2. For each number to search, I want the best match on a prefix of minimum length 3, comparing from left to right, and the answer should be the corresponding TEXT CODE.
If there is an exact match, that is the answer.
If no value shares a prefix of at least length 3, the answer should be "not found".
Next to table2 I show comments explaining the conditions behind the expected answer for each number to search.
My current attempt finds the exact matches, but I'm not sure how to compare the values for the other conditions when there is no exact match.
ncode = ["88271", "1893", "107728", "4482", "3527", "71290", "404", "5081", "7129", "33751", "3", "40489", "107724"]
tcode = ["RI", "NE", "JH", "XT", "LF", "NE", "RI", "XT", "QS", "XT", "YU", "WE", "RP"]
tosearch = ["50923", "712902", "404", "10772"]

out = [[], []]  # matched number codes, matching text codes
for code in tosearch:
    for i, nc in enumerate(ncode):
        if code == nc:  # exact matches only
            out[0].append(nc)
            out[1].append(tcode[i])
>>> out
[['404'], ['RI']]
The expected output would be:
out = [
    ['50923', '712902', '404', '10772'],
    ['NOT FOUND', 'NE', 'RI', 'JH']
]
A simple solution you might consider is the fuzzy-match library. It compares strings and calculates a similarity score. It shines with ordinary strings rather than numbers, but it can just as easily be applied to find the closest matches among your prefix numbers.
Check out fuzzy-match here.
Here is a well-written fuzzy-match tutorial.
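If you'd rather avoid a dependency, the rule described above (exact match wins, otherwise longest shared prefix of length at least 3) is also short to write directly. A sketch against the sample data, using os.path.commonprefix for the left-to-right comparison; the helper name best_match is made up:

from os.path import commonprefix

ncode = ["88271", "1893", "107728", "4482", "3527", "71290", "404", "5081", "7129", "33751", "3", "40489", "107724"]
tcode = ["RI", "NE", "JH", "XT", "LF", "NE", "RI", "XT", "QS", "XT", "YU", "WE", "RP"]
tosearch = ["50923", "712902", "404", "10772"]

def best_match(code):
    best_len, best_text = 0, "NOT FOUND"
    for nc, tc in zip(ncode, tcode):
        if code == nc:                        # an exact match wins outright
            return tc
        plen = len(commonprefix([code, nc]))  # length of the shared prefix
        if plen >= 3 and plen > best_len:     # keep the longest prefix >= 3
            best_len, best_text = plen, tc
    return best_text

out = [tosearch, [best_match(c) for c in tosearch]]
# out == [['50923', '712902', '404', '10772'], ['NOT FOUND', 'NE', 'RI', 'JH']]

On a tie ('10772' shares a 5-character prefix with both '107728' and '107724') the earlier entry wins, which matches the 'JH' in the expected output.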
I tried to use the SEARCH keyword in ArangoDB with the PHRASE function to run a fuzzy search query on a view. The problem is that when I use the AND keyword to combine multiple search conditions, the query returns an empty set. Is there something wrong with my query?
FOR a IN newView
    SEARCH (a.LocationStreet.StreetNumberText == 18
        AND PHRASE(a.StreetName, "GREY", "text_en")
        AND PHRASE(a.eng, "St.Johns", "text_en"))
    FOR v, e, p IN 5 OUTBOUND a GRAPH 'Dwellings'
        SORT BM25(a) DESC
        LIMIT 10
        RETURN [BM25(a), p.vertices]
I have a list of the form:
ans = [(a, [b, c]), (x, [y, z]), (p, [q, r])]
I need to sort the list using the following condition:
if (ans[j][1][1]>ans[j+1][1][1]) or (ans[j][1][1]==ans[j+1][1][1] and ans[j][1][0]<ans[j+1][1][0]):
# do something (like swap(ans[j],ans[j+1]))
I was able to implement this with bubble sort, but I want a faster sorting method.
Is there a way to sort my list with sort() or sorted() (using a key function or comparator) while preserving my condition?
You can create a key function that returns a tuple; tuples are compared element by element from left to right until one element differs. Your input/output example is quite sparse, but I believe this produces what you want:
def my_compare(x):
    # ascending on x[1][1]; negating x[1][0] breaks ties in descending
    # order, matching the swap condition (assumes numeric values)
    return x[1][1], -x[1][0]

ans.sort(key=my_compare)
# ans = sorted(ans, key=my_compare)
Essentially this first compares the x[1][1] values of ans[j] and ans[j+1]; only if they are equal does it compare the x[1][0] values, where the negation reverses the order to match your swap condition (if the values aren't numeric, sort twice instead, relying on sort stability). You can rearrange and add more key components if this doesn't match your use case perfectly.
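For example, with made-up numeric values:

ans = [("a", [2, 5]), ("x", [1, 5]), ("p", [4, 3])]
ans.sort(key=my_compare)
# ans == [('p', [4, 3]), ('a', [2, 5]), ('x', [1, 5])]

Both "a" and "x" have x[1][1] == 5, and "a" sorts first because its x[1][0] is larger, exactly as the swap condition demands.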
I am new to NLP and need guidance on how to solve this problem.
I am working on a filtering task where I need to brand data in a database as either correct or incorrect. I am given a structured data set, with columns and rows.
However, the filtering conditions are given to me in a text file.
An example filtering text file could be the following:
Values in the column ID which are bigger than 99
Values in the column Cash which are smaller than 10000
Values in the column EndDate that are smaller than values in StartDate
Values in the column Name that contain numeric characters
Any value that matches one of those conditions should be branded as bad.
I want to extract those conditions and feed them into the program I've written so far.
For instance, for the conditions above, I would like to produce
`if ID>99`
`if Cash<10000`
`if EndDate < StartDate`
`if Name LIKE %[1-9]%`
How can I achieve the above result using the Stanford NLP? (or any other NLP library).
This doesn't look like a machine learning problem; it's a simple parsing task. You have a restricted syntax from which you can easily extract the salient features:
column name
relationship
target value or target column
The resulting "action rule" is simply removing the "syntactic sugar" words and converting the relationship -- and possibly the target value -- to its symbolic form.
Enumerate all of your critical words for each position in a lexicon. Then use basic string manipulation operators in your chosen implementation language to find the three needed fields.
EXAMPLE
Given the data above, your lexicons might be like this:
column_trigger = "Values in the column"
relation_dict = {
    "are bigger than": ">",
    "are smaller than": "<",
    "contain": "LIKE",
    ...
}
value_desc = {
    "numeric characters": "%[1-9]%",
    ...
}
From here, use these items in standard parsing. If you're not familiar with that, please look up the basics of a simple sentence grammar in your favourite programming language, with rules such as
SENTENCE => SUBJ VERB OBJ
Does that get you going?
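To make that concrete, here is a minimal sketch of the whole pipeline; the function name parse_rule and the exact lexicon entries are illustrative, not a fixed API:

import re

COLUMN_TRIGGER = "Values in the column"
RELATION_DICT = {
    "are bigger than": ">",
    "are smaller than": "<",
    "contain": "LIKE",
}
VALUE_DESC = {
    "numeric characters": "%[1-9]%",
}

def parse_rule(sentence):
    """Turn one plain-English filter rule into a symbolic condition."""
    rest = sentence.strip()
    if not rest.startswith(COLUMN_TRIGGER):
        raise ValueError("unrecognised rule: " + sentence)
    rest = rest[len(COLUMN_TRIGGER):]
    # "which"/"that" are syntactic sugar -- drop them.
    rest = re.sub(r"\b(which|that)\b", " ", rest)
    column, remainder = rest.split(None, 1)
    remainder = remainder.strip()
    for phrase, symbol in RELATION_DICT.items():
        if remainder.startswith(phrase):
            target = remainder[len(phrase):].strip()
            if target.startswith("values in "):       # target is another column
                target = target[len("values in "):]
            target = VALUE_DESC.get(target, target)   # map descriptions to patterns
            return "%s %s %s" % (column, symbol, target)
    raise ValueError("no known relation in: " + sentence)

rules = [
    "Values in the column ID which are bigger than 99",
    "Values in the column Cash which are smaller than 10000",
    "Values in the column EndDate that are smaller than values in StartDate",
    "Values in the column Name that contain numeric characters",
]
for r in rules:
    print(parse_rule(r))
# ID > 99
# Cash < 10000
# EndDate < StartDate
# Name LIKE %[1-9]%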
Assuming I have the fields
textFieldA
textFieldB
specialC
in my index. Now I want to query these with
textFieldA:"searchVal" textFieldB:"searchVal" specialC:"somecode"
But I only want to boost matches on specialC if there were also matches on at least one of the other fields.
Example:
DocumentA:
textFieldA:"This is a test" textFieldB:"for clarification" specialC:"megacode"
DocumentB:
textFieldA:"Doesnt contain" textFieldB:"searched word here" specialC:"megacode"
DocumentC:
textFieldA:"But this again" textFieldB:"contains test" specialC:"supercode"
Now when searching for example with
textFieldA:"test" textFieldB:"test" specialC:"supercode"
I want the results
DocumentC
DocumentA
with DocumentC having the highest rank, and DocumentB being excluded.
How can this be achieved?
q=textFieldA:"test" OR textFieldB:"test" OR (textFieldA:"test" AND specialC:"supercode") OR (textFieldB:"test" AND specialC:"supercode")&bq=(specialC:"supercode")^100
This should return only DocumentC and DocumentA, in the desired order. bq is a boost query: documents matching it get a higher score; see https://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_negative_.28or_very_low.29_boost_to_documents_that_match_a_query.3F for details.
As far as I know, query boosting only works if you actually query for the thing you want to boost (which is kind of intuitive). That is why I added the last two clauses to the query.
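If you are sending this from application code, here is a sketch with the pysolr client; the core URL is a placeholder, pysolr forwards extra keyword arguments as request parameters, and bq is honoured by the dismax/edismax parsers, hence the defType:

import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/mycore")  # placeholder URL

q = ('textFieldA:"test" OR textFieldB:"test" '
     'OR (textFieldA:"test" AND specialC:"supercode") '
     'OR (textFieldB:"test" AND specialC:"supercode")')

# defType selects the edismax parser so that the bq boost takes effect.
results = solr.search(q, defType="edismax", bq='(specialC:"supercode")^100')
for doc in results:
    print(doc)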