AZURE SEARCH, ismatch not filtering - azure

I'm using Azure Search for perform some customs search in a database.
I got this one field that have this kind of structure:
"STUFF": "05-05-16-00|"
but I'm having trouble by creating the filter, because its possible that I'll not have all the numbers that builds this structure. It all depends that what the final user will type. So I need a wildcard to fill the blanks with the missing numbers, like this
"05-05-??-??" -> the pipe is important, because this field can have more than 1 code inside.
Now I need to catch all the possible elements that STARTS WITH 05-05, like, for example: 05-05-11-01
I thought I suposed to use the search.ismatch() function, but it doesnt work.
here some code:
search.ismatch('05-05-??-??','STUFF');
And the results were:
"STUFF": "02-02-16-00|",
"STUFF": "02-02-14-00|",
this is driving me crazy, because I dont know why this results came back.
Maybe is important to know that Im performing a POST request to the Azure Search API with this code in 'filter'
Maybe i should to escape this especial characters like - and ? like this
search.ismatch('05\\-05\\-\\?\\?\\-\\?\\?','STUFF')
But the results were the same.
Can somebody please help me ?
EDIT 1
following this Article I change some things and make the following search:
search.ismatch('\"05-00*\"','STUFF','simple', 'all')
And I starting the get some results, but now this is my results:
"STUFF": "06-05-02-00|", //WRONG
"STUFF": "05-02-05-01|", //RIGHT
"STUFF": "05-02-02-07|", //RIGHT
For some reason, it's returing the right structure but not the in the front of the text.
EDIT 2
I made some changes and change all the "-" for the keyword "OU" and I'm trying to follow this question to make sore like a "contains", but i perfoming a POST request with the following parameters
{
"search": "*",
"filter": "search.ismatch('/.*08010000OU/.*','STUFF', 'full', 'all')",
"skip": "0",
"count": true
}
Im trying to use a wildcard in the begining of the query search because I still missing some information.

I believe you won't be able to solve this using the StandardAnalyzer. Try switching to WhitespaceAnalyzer for this particular field and it probably will work with "05-05*"

Related

Unable to full text search in Solr

I have some data in solr. I want to search which name is Chinmay Sahu See below I have 3 results in output. But I got 3 instead of 1. Because Content name searched partially.
I want to full search those name having Chinmay Sahu only that contents will come.
Output:
"docs": [
{
"id": "741fde46a654879949473b2cdc577913",
"content_id": "1277",
"name": "Chinmay Sahu",
"_version_": 1596995745829879800
},
{
"id": "4e98d680efaab3afe051f3ddc00dc5f2",
"content_id": "1825",
"name": "Chinmay Panda",
"_version_": 1596995745829879800
}
{
"id": "741fde46a654879949473b2cdc577913",
"content_id": "1259",
"name": "Sasmita Sahu",
"_version_": 1596995745829879800
}
]
Query:
name:Chinmay Sahu
Expected :
"docs": [
{
"id": "741fde46a654879949473b2cdc577913",
"content_id": "1277",
"name": "Chinmay Sahu",
"_version_": 1596995745829879800
},
]
Please help
Try doing this
name:"Chinmay Sahu"
You need to do a phrase query to match the exact name.
I am guessing in your case the name field is using Standard tokenizer which will split tokens if whitespace is there. So while indexing in all the 3 docs there will be a token called "chinmay".
While you search using
name:Chinmay Sahu
Solr will search it like this since if there is no fieldName specified before a token solr automatically searches it in default_field.(however default field is removed from solr 7.3, So it depends on what version of solr are you using.
)
Name:chinmay AND default_field:sahu
So since all the three docs are having chinmay as a token in the index,the query will match all 3 docs.
Now i dont know what your default field is? can you post your solr schema? That way we can explain why you are seeing those 3 docs.
Since root545 already explained that field:foo bar will search for foo in field and bar in the default search field, I'll suggest that it seems like you don't want to concern yourself with the exact Lucene syntax for searching. The edismax query parser is well suited for separating the typed search string from what fields are being searched and whether you want all tokens to match.
The query in that case would be just Chinmay Sahu, while you'd set q.op=AND (all terms must match), defType=edismax (use the edismax query parser) and qf=name (search the name field):
q=Chinmay Sahu&q.op=AND&defType=edismax&qf=name
You can also tune the different phrase parameters to make sure that names with the tokens in the exact same sequence will be boosted higher than those that have them in the opposite sequence (i.e. Sahu Chinmay).
If this is a programmatic search where no user is actually typing in the suggestion, using a phrase search as suggested is the way to go (name:"Chinmay Sahu").
I would suggest using query like
name:(Chinmay Sahu)
And make sure default operator is AND either in settings or query string like q.op=AND
With that approach you can use user input much easier since you don't need to parse it too much.

How can I get elastic search to return results inside angle brackets?

I'm new to elastic search. I'm trying to fix our search so that it will allow users to search on content within html tags. Currently, we're using a whitespace tokenizer because we need it to return results on hyphenated names. Consequently, aname123-suffix project is indexed as ["aname123-suffix", "project"] and a user search for "aname123-*" returns the correct results.
My problem arises because we also want to be able to search on content within html tags. So, for example for a project called <aname123>-suffix project, we'd like to be able to enter the search term <aname123>-* and get back the correct results.
The index has the correct tokens for a whitespace tokenizer, namely ["<aname123>-suffix", "project"] but when my search string is "\<aname123\>\-suffix" or "\\<aname123\\>\\-suffix" elastic search returns no results.
I think the solution lies either in
modifying the search string so that elastic search returns <aname123>-suffix when I ask for it; or
being able to index the content within the tag separately from the whitespace tokens, i.e. ["<aname123>-suffix", "project", "aname123", "suffix"]
So far I've been approaching it by changing the indexing, but I have not yet succeeded. A standard tokenizer will allow search results for content within tags, but it fails to return search results for aname123-*. Currently my analyzer settings look like this:
{ "analysis":
{ "analyzer":
{ "my_whitespace_analyzer" :
{"type": "custom"
{"tokenizer": "whitespace},
{"filter": ["standard", "lowercase", "stop"]}
}
},
{ "my_tag_analyzer":
{"type": "custom"
{"tokenizer": "standard"},
{"filter": ["standard", "lowercase", "stop"]}
}
}
}
}
I can create a custom char filter that strips out the < and the >, so my index contains aname123; but for some reason elastic search still does not return correct results when searching on <aname123>*. However, when I use instead a standard analyzer, the index contains aname123 and it returns the expected results for <aname123>* ... What is so special about angle brackets in elastic search?
You may want to take a look at the html_strip character filter:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html
An example from one of the elasticsearch developers is here:
https://gist.github.com/clintongormley/780895

Kibana and groovy scripting

I was looking for a way to calculate a ratio on Kibana. After many researches i found this way :
Using the "JSON Input" feature in a visualisation.
I have all my informations in an index, with 2 types of documents (boots and reboots).
I am looking for the script which count the number of documents with the type boots, same for the reboots type then divide the second by the first.
It sounds really easy, but i do not find any way to get it after my researches, and i am not used to groovy enough yet to do it by myself.
I found many ways to manipulate documents values (doc['mydocname'].values etc), but nothing about the type.
Thanks in advance.
EDIT : I tried this
{
"aggs" : {
"boots_count" : { "value_count" : { "_type" : "boots" } }
}
}
Which is supposed to count the number of fields (here the field _type) in the index. But when i put it into "JSON Input" in a visualisation, that results in an error :
Error: Request to Elasticsearch failed: {"error":"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[BbXJ0O6tRxa_OcyBfYCGJQ][informationbe][0]: SearchParseException[[informationbe][0]: from[-1],size[0]: Parse Failure [Failed to parse source [{\"size\":0,\"aggs\":{\"2\":{\"terms\":{\"field\":\"#sitePoste\",\"size\":5,\"order\":{\"1\":\"desc\"}},\"aggs\":{\"1\":{\"avg\":{\"script\":\"0\",\"lang\":\"expression\",\"ratio\":{\"boots_count\":{\"value_count\":{\"_type\":\"boots\"}}}}}}}}
I am wrong. But where ?
EDIT2 : In other hand, i am trying scripted fields, with something like this using lucene expression :
doc['_type:boots'].count / doc['_type:reboots'].count
but it doesnt work more, i am pretty confident about the "doc['_type:boots']" part, i guess the problem is on the "XXX.count" part.
After many attempts, i understand better and better how it works. Default scripted fields scope is on the document, not on the whole index, so i cant do a count action of whole values of the index from documents in it.
I am looking for a workaround, i'll post it it if find something interesting.
I finally solved my problem :
I added a scripted field, if the type of the document is boots, the scripted field = 1, else 0. Then i created a search with only boots and reboots documents (filter _type:boots _type:reboots) and calculated the average of the scripted field in a metric.
Everything works well !

How to find an object which is at nth nested level in mongoDB? (single collection, single document)

I am trying to find an nth object using '_id', which is in the same document.
Any suggestions or references or code samples would be appreciated.
(e.g)
Document will look as below:
{
"_id": "xxxxx",
"name": "One",
"pocket": [{
"_id": "xxx123",
"name": "NestedOne",
"pocket": []
}, {
"_id": "xxx1234",
"name": "NestedTwo",
"pocket": [{
"_id": "xxx123456",
"name": "NestedTwoNested",
"pocket": [{"_id": "xxx123666",
"name": "NestedNestedOne",
"pocket": []
}]
}]
}]
}
The pockets shall hold more pockets and it is dynamic.
Here, I would like to search "pocket" using "_id" , say "xxx123456", but without using static reference.
Thanks again.
I highly recommend you change your document structure to something easier to manage/search, as this will only become more of a pain to work with.
Why not use multiple collections, like explained in this answer?
So an easy way to think about this for your situation, which I hope is easier for you to reason about than dropping some schema code...
Store all of your things as children in the same document. Give them unique _ids.
Store all of the contents of their pockets as collections. The collections simply hold all the ids that would normally be inside the pocket (instead of referencing the pockets themselves).
That way, a lot of your work can happen outside of DB calls. You just batch pull out the items you need when you need them, instead of searching nested documents!
However, if you can work with the entire document:
Looks like you want to do a recursive search a certain of number of levels deep. I'll give you a general idea with some pseudocode, in hopes that you'll be able to figure the rest out.
Say your function will be:
function SearchNDeep(obj, n, id){
/**
You want to go down 1 level, to pocket
see if pocket has any things in it. If so:
Check all of the things...for further pockets...
Once you've checked one level of things, increment the counter.
When the counter reaches the right level, you'd want to then see if the object you're checking has a `'_id'` of `id`.
**/
}
That's the general idea. There is a cleaner, recursive way to do this where you call SearchNDeep while passing a number for how deep you are, base case being no more levels to go, or the object is found.
Remember to return false or undefined if you don't find it, and the right object if you do! Good luck!

search with startkey, endkey and array keys

I have a view wich returns several elements with array keys.
Example :
{"total_rows":4,"offset":0,"rows":[
{"id":"","key":[15,"2"],"value":1,"doc":{},
{"id":"","key":[20,"2"],"value":1,"doc":{},
{"id":"","key":[20,"3"],"value":1,"doc":{},
{"id":"","key":[20,"4"],"value":1,"doc":{}
]}
I'm trying to search through those elements. So if I do the following request :
/database/_design/element/_view/all/?
startkey=[15, "2"]&
endkey=[20, "3"]&
include_docs=true&reduce=false
Live example : http://jchris.couchone.com/keyhuh/_design/Record/_view/by_CreationDate_and_BoreholeName?startkey=[1267686720,%22sp4%22]&endkey=[1267686725,%22sp4\u9999%22]&include_docs=true&reduce=false
This one doesn't works. It returns me all the records, even the last one, which doesn't meets the second element of the array.
Strangely enough, it works with strings only.
Example :
{"total_rows":4,"offset":0,"rows":[
{"id":"","key":["15","2"],"value":1,"doc":{},
{"id":"","key":["20","2"],"value":1,"doc":{},
{"id":"","key":["20","3"],"value":1,"doc":{},
{"id":"","key":["20","4"],"value":1,"doc":{}
]}
if I do the following request :
/database/_design/element/_view/all/?
startkey=["15", "2"]&
endkey=["20", "3"]&
include_docs=true&
reduce=false
Live Example : http://jchris.couchone.com/keyhuh/_design/Record/_view/by_Client_and_BoreholeName?startkey=[%22Test1%22,%22sp4%22]&endkey=[%22Test1%22,%22sp4\u9999%22]&include_docs=true&reduce=false
Here it'll work well and only return the three first elements.
Am I missing something with couchdb's search for arrays with integers and strings ? Or have I fallen on a bug ?
Note : it does the same with CouchDB 0.10 and 0.11.
This looks wrong, and there are a few things it could be. Is it possible for you to share your code with us? If the data isn't proprietary you could replicate your db to http://jchris.couchone.com/keyhuh and I'll take a look at the whole thing there.
...
Thanks for posting the live data. This is the query that is busted?
http://jchris.couchone.com/keyhuh/_design/Record/_view/by_Client_and_BoreholeName?startkey=[%22Test1%22,%22sp4%22]&endkey=[%22Test1%22,%22sp4\u9999%22]&reduce=false
Because that looks fine to me. What am I missing?

Resources