I am working on the query below, trying to implement an ArangoDB wildcard search. The criteria are simple: I'd like to match records whose name or number field is similar to the search term, and limit the results to 25. The query works but is very slow, taking upwards of 30 seconds. The goal is to optimize it and get as close to sub-second as possible. I'd like the query to behave like a MySQL LIKE with the % wildcard on both sides.
https://www.arangodb.com/docs/stable/release-notes-new-features37.html#wildcard-search
Note, one thing I noticed is that in the release note examples, rather than using FILTER, they are using SEARCH.
Additional info:
name is alphanumeric
number is going to be an 8-digit number
LET str = CONCAT("%", "test", "%")
LET search = (
  FOR doc IN name_search
    FILTER ANALYZER(doc.name LIKE str, "text_en") OR
           ANALYZER(doc.number LIKE str, "text_en")
    LIMIT 25
    RETURN doc
)
RETURN search
FILTER doesn't use ArangoSearch View indexes. To speed up your wildcard queries, you have to create an ArangoSearch View over the collection and use the SEARCH keyword.
Feel free to check the following interactive tutorial (see "LIKE Support" section):
https://www.arangodb.com/learn/search/arangosearch-tutorial-3-7/
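As a rough sketch of the SEARCH-based form of the query, the Python helper below only builds the AQL text and its bind variables; it does not talk to a server. It assumes that name_search is an ArangoSearch View whose link indexes both name and number with the "text_en" analyzer, and that number is stored as a string. The helper name build_wildcard_search is made up for illustration.

```python
def build_wildcard_search(term, limit=25):
    """Return (aql, bind_vars) for a LIKE search against an ArangoSearch View.

    Assumption: 'name_search' is a View, not a plain collection, and both
    fields are indexed with the 'text_en' analyzer.
    """
    aql = (
        "FOR doc IN name_search\n"
        "  SEARCH ANALYZER(doc.name LIKE @pattern OR doc.number LIKE @pattern, \"text_en\")\n"
        "  LIMIT @limit\n"
        "  RETURN doc"
    )
    # Wrap the term in % on both sides, like MySQL's LIKE '%term%'.
    bind_vars = {"pattern": "%" + term + "%", "limit": limit}
    return aql, bind_vars


if __name__ == "__main__":
    aql, bind_vars = build_wildcard_search("test")
    print(aql)
    print(bind_vars)
```

The query text and bind variables can then be handed to whichever ArangoDB driver you use. Passing the pattern as a bind variable keeps user input out of the query string.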
It seems to be an issue with, or a bad configuration in, my Solr index.
In detail: when I search using several words in my query, the Solr result is fine and returns 50 entries.
Let me show an example :
Example 1)
url = http://mydomain:8983/solr/mycore/select?q=walk%20in%20the%20city
query = walk in the city
results = 231373, 231372, 231454, ....
Unfortunately, when I use a single word in my query, the Solr result is "truncated".
Let me show some examples:
Example 2)
url = http://mydomain:8983/solr/mycore/select?q=Walk
query = Walk
results = 231373, 231372
Example 3)
url = http://mydomain:8983/solr/mycore/select?q=city
query = city
results = 231373, 231372
As you can see, the words "Walk" and "city" both appear in my first query.
The results in examples 2 and 3 are the same.
I'm a beginner with Solr, so I have probably made some mistakes in the Solr configuration.
What should I check first in order to optimize the query?
Thanks in advance.
Best regards.
Sergio
I would suggest adding debugQuery=true to your query and looking at the debug node in the result. Under debug, look in particular at parsedquery to see what Solr is doing with your query: which fields it is searching and whether it is using AND or OR between expressions (e.g. +fieldName means the clause is required, as with AND).
Also, under debug there is an explain node that lists the documents that were found and why they matched. That should help you pinpoint why those records were returned. The explain output is pretty convoluted, but it holds a lot of useful information for this kind of issue.
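A minimal Python sketch of that workflow follows. It builds the debug URL and pulls parsedquery and the explained document ids out of a JSON response. The helper names are hypothetical, the sample response is invented for illustration (it is not real Solr output), and the sketch assumes the response was requested with wt=json.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query):
    """Append q, debugQuery=true and wt=json to a Solr select URL."""
    params = {"q": query, "debugQuery": "true", "wt": "json"}
    return base_url + "?" + urlencode(params)


def summarize_debug(response):
    """Extract the parsed query and the ids that have an explain entry."""
    debug = response.get("debug", {})
    return {
        "parsedquery": debug.get("parsedquery"),
        "explained_ids": sorted(debug.get("explain", {})),
    }


if __name__ == "__main__":
    print(build_debug_url("http://mydomain:8983/solr/mycore/select", "Walk"))
    # Invented response shape, only to show where the fields live.
    sample = {"debug": {"parsedquery": "text:walk",
                        "explain": {"231373": "...", "231372": "..."}}}
    print(summarize_debug(sample))
```

Comparing the parsedquery of the single-word and multi-word requests side by side is usually the quickest way to see which field each one is actually searching.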
(I realize this is not quite an answer to your question, but it's too long for a comment.)
When I query from Cassandra with a CQL statement of:
select * from abctpl where tpl like '1-1'
In the table, the tpl value I want is '1-1-1', and it is unique.
But I actually get 3 rows. The other two tpl values do not contain the string '1-1-1'. I guess Cassandra treats '-' as a wildcard character, because a tpl value like '11111111' is also selected.
So how can I edit the CQL so that it returns only the exact data?
select * from abctpl where tpl like '1-1';
I think the problem here is that you're not providing the LIKE wildcard character %. If your SASI index defaults to PREFIX mode, then this should work:
select * from abctpl where tpl like '1-1%';
Take a look through the DataStax docs on using SASI indexes: https://docs.datastax.com/en/dse/6.7/cql/cql/cql_using/useSASIIndex.html . That has some query examples, along with how to specify the mode at index creation.
make it query the exact data?
And if it's exact data that you're after, using equals (=) does a better job of that than LIKE does.
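To make the role of % concrete, here is a small pure-Python illustration of how its position changes what a LIKE pattern means. This is only a sketch of the wildcard semantics on raw strings. It is not Cassandra's implementation, and it ignores SASI index modes and analyzers, which also affect what a real query returns.

```python
def like_match(value, pattern):
    """Emulate LIKE where % may appear only at the start and/or end."""
    leading = pattern.startswith("%")
    trailing = len(pattern) > 1 and pattern.endswith("%")
    core = pattern[1 if leading else 0: len(pattern) - (1 if trailing else 0)]
    if leading and trailing:
        return core in value           # '%abc%' -> contains
    if leading:
        return value.endswith(core)    # '%abc'  -> suffix
    if trailing:
        return value.startswith(core)  # 'abc%'  -> prefix
    return value == core               # 'abc'   -> exact match


if __name__ == "__main__":
    for candidate in ["1-1-1", "11111111"]:
        print(candidate, like_match(candidate, "1-1%"))
```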
I've been trying to create a filter matching the end of the whole field text.
For example, taking a text field with the text: the brown fox jumped over the lazy dog
I would like it to match with a query that searches for fields with values ending with g. Something like:
{
"search":"*",
"queryType":"full",
"searchMode": "any",
...
"filter":"search.ismatchscoring('/g$/','MyField')"
}
The result contains only records where MyField has a word consisting of the single character g anywhere in the string.
Using the filter directly also produces no results:
{
"search":"*",
"queryType":"full",
"searchMode": "any",
...
"filter":"MyField eq '*g'"
}
As far as I can see, tokenization is always the basis for search and filter, which means that in the query above $ is completely ignored and matching happens per word, not per field.
I could probably use the keyword_v2 analyzer on this field, but then I would lose the tokenization that I rely on when searching normally.
One possible solution could be defining a second field in your index, with the same value as ‘MyField’, but with a different analyzer (e.g. keyword_v2). That way you may still search over the original field while filtering over the other.
Regardless, you might have simplified the filter for the sake of the example, but otherwise it seems redundant to use search.ismatchscoring() when it is not combined with another filter clause via ‘or’; one can use the search parameter directly.
Moreover, the regex might not be working because the default queryType for search.ismatchscoring() is simple, not full. Please see the documentation for search.ismatchscoring().
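Putting the two suggestions together, a request body might be built as in the Python sketch below. It assumes a second, hypothetical field named MyFieldKeyword that holds the same value as MyField but uses a keyword analyzer, and it passes 'full' and 'any' explicitly as the third and fourth arguments of search.ismatchscoring(). It uses /.*g/ rather than /g$/ on the assumption that Lucene regular expressions match against the whole term, so no end anchor is needed. The suffix is not escaped, so the sketch only suits plain alphanumeric suffixes.

```python
import json


def build_suffix_filter_request(keyword_field, suffix):
    """Build a search request that filters on values ending with `suffix`.

    Assumption: `keyword_field` is indexed with a keyword analyzer, so the
    whole field value is a single term.
    """
    regex = "/.*" + suffix + "/"
    filter_expr = (
        "search.ismatchscoring('" + regex + "', '" + keyword_field + "', 'full', 'any')"
    )
    return {
        "search": "*",
        "queryType": "full",
        "searchMode": "any",
        "filter": filter_expr,
    }


if __name__ == "__main__":
    print(json.dumps(build_suffix_filter_request("MyFieldKeyword", "g"), indent=2))
```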
I am currently using the Tire client for Elasticsearch. Let's say I have a field that is indexed as type long in my Elasticsearch mapping.
I am trying to achieve something like this:
search.query {|query| query.string "30*", :fields => ['id']}
Here 'id' is the long field I mentioned. But since I specify the fields in the query, the wildcard doesn't work and I end up getting only the exact match.
The same query works against _all, where the field type doesn't matter. I want this wildcard search to work while restricting the search to that particular field. Is there any way to do this without changing my mapping?
I see the following solutions:
use a multi-field and also index the value as a string type (but this requires a mapping change)
use range and translate the prefix into something like:
(exactly 30) or (from 300 to 309) or (from 3000 to 3099)
or (from 30000 to 30999) or ... (to max value)
use http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html and check this using scripting
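The range option can be generated mechanically. The Python sketch below expands a numeric prefix into the inclusive ranges that a "30*" style wildcard would cover, up to a chosen maximum. The function name and the max_value parameter are illustrative, and turning each pair into an actual range clause is left to the client code.

```python
from typing import List, Tuple


def prefix_ranges(prefix: int, max_value: int) -> List[Tuple[int, int]]:
    """Return inclusive (low, high) ranges of numbers starting with `prefix`."""
    if prefix <= 0:
        raise ValueError("prefix must be a positive integer")
    ranges = []
    scale = 1  # 1 covers the prefix itself, 10 adds one digit, and so on
    while prefix * scale <= max_value:
        low = prefix * scale
        high = min(low + scale - 1, max_value)
        ranges.append((low, high))
        scale *= 10
    return ranges


if __name__ == "__main__":
    print(prefix_ranges(30, 99999))
```

Each pair would become one range clause, with all clauses combined using OR. For a long field the list stays short, since it grows by one entry per extra digit.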
Thanks to #alex for that scripting tip. I finally found something that worked. Phew!
So I ended up doing this (briefly):
search.query do |query|
query.filtered do |f|
f.filter :script, {
:script => "doc['id'].value.toString() ~= '^30[0-9]*$'"
}
end
end
Hope it helps.
I am new to Lucene.
I'm using Lucene.NET version 2.9.4.
What is the difference between these queries?
the first is:
title:hello AND tags:word
the second is:
+title:hello +tags:word
I am testing some software, and I noticed that the first returns 3 records, while the second returns many records.
I observed that the first returns records where the title and tags fields are filled in, but the second returns records where title and tags can be empty.
Is that the difference?
There is no difference between the two. clause1 AND clause2 is effectively shorthand for +clause1 +clause2
Similarly: clause1 clause2 = clause1 OR clause2
Note, there is really no equivalent for +clause1 clause2 using the boolean operators.
Are you sending the query over the Internet? If you are, and you are not URL-encoding the request correctly, the '+' could be misinterpreted as an encoded space. Lucene would then run the second query as if the +'s were not there, which just ORs the two parts and gives the results you are seeing:
title:hello tags:word
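The '+' problem is easy to reproduce outside Lucene. The Python sketch below uses only the standard library's urllib.parse to show what a form-style decoder does to an unencoded '+', and how percent-encoding the query first preserves it. It says nothing about how your particular client or server decodes requests, which is something to verify separately.

```python
from urllib.parse import quote, unquote_plus

raw_query = "+title:hello +tags:word"

# Sent as-is, a form-style decoder turns every '+' into a space,
# so the required-clause markers disappear.
decoded_unencoded = unquote_plus(raw_query)

# Percent-encoding first turns '+' into %2B, which survives decoding.
encoded = quote(raw_query, safe="")
decoded_encoded = unquote_plus(encoded)

if __name__ == "__main__":
    print(repr(decoded_unencoded))
    print(encoded)
    print(repr(decoded_encoded))
```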