How Solr search handles a term that starts with '+' - search

How does Solr search handle a term that starts with the '+' symbol? I am searching for phone numbers with '+32498516', where the phone numbers are saved in the database like: +32498516141, +32498516345
My concern here:
Starts-with search:
+32498516
q=cellPhone%3A%28%28%2B32498516*%29%29&start=0&rows=1048577&fl=firstName,lastName,indexType,userExtKey,resumeId,externalId,shareProfile,candidateId,publicIntranetNR,userId,id,score&fq=indexType%3ACANDIDATE+AND+%28shareProfile%3Atrue+OR+country%3AUS%29
32498516
q=cellPhone%3A%28%2832498516*%29%29&start=0&rows=1048577&fl=firstName,lastName,indexType,userExtKey,resumeId,externalId,shareProfile,candidateId,publicIntranetNR,userId,id,score&fq=indexType%3ACANDIDATE+AND+%28shareProfile%3Atrue+OR+country%3AUS%29
In both cases I get the same number of candidates in the result.
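A likely reason both queries behave the same: `%2B` decodes to `+`, and in the standard Lucene/Solr query syntax a leading `+` is the "required" operator, not a literal character. The sketch below escapes the special characters with a backslash before URL-encoding. The helper names `escape_term` and `prefix_query` are made up for illustration. Whether the escaped query then matches also depends on how the `cellPhone` field is analyzed, which the question does not show.

```python
from urllib.parse import quote

# Characters that carry special meaning in the Lucene/Solr standard query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_term(term):
    """Backslash-escape every query-syntax character in a user-supplied term."""
    return ''.join('\\' + ch if ch in LUCENE_SPECIAL else ch for ch in term)

def prefix_query(field, prefix):
    """Build a prefix query; the trailing * is added after escaping so it stays a wildcard."""
    return '%s:(%s*)' % (field, escape_term(prefix))

q = prefix_query('cellPhone', '+32498516')
encoded_q = quote(q, safe='')  # value for the q= request parameter
print(q)          # cellPhone:(\+32498516*)
print(encoded_q)
```

With the backslash in place, the `+` is treated as part of the term instead of as an operator.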

Related

elasticsearch stops adding documents after some point

I'm new to elasticsearch and want to index many sentences to search them efficiently.
At first I tried bulk adding to an index, but that didn't work for me, so now I'm adding sentences one by one using the following piece of (python) code:
import json
import pycurl

def add_document(c, index_name, js, _id):
    # POST a single document to /<index_name>/sentence/<_id>
    c.setopt(c.POST, 1)
    c.setopt(c.URL, 'localhost:9200/%s/sentence/%i' % (index_name, _id))
    c.setopt(c.POSTFIELDS, json.dumps(js))
    c.perform()

c = pycurl.Curl()
add_document(c, 'myIndexName', 'someJsonString', 99)
Where I'm incrementing the id, and an example of a json input string would be:
{"sentence_id": 2, "article_name": "Kegelschnitt", "paragraph_id": 1, "plaintext": "Ein Kegelschnitt ist der zweidimensionale Sonderfall einer Quadrik .", "postags": "Ein/ART Kegelschnitt/NN ist/VAFIN der/ART zweidimensionale/ADJA Sonderfall/NN einer/ART Quadrik/NE ./$."}
So far so good, seems to work. I suspect that getting this to work in a bulk import way is a lot more efficient, but since this is a one-time only process, efficiency is not my primary concern.
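For reference, a minimal sketch of what a bulk request body could look like. The `_bulk` endpoint expects newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The function name `bulk_body` and the running-counter id are assumptions for illustration. The `_type` entry mirrors the `sentence` type used in the URL above and only applies to Elasticsearch versions that still have mapping types.

```python
import json

def bulk_body(index_name, docs, first_id=0):
    """Build an NDJSON body for the _bulk endpoint (action line + source line per doc)."""
    lines = []
    for offset, doc in enumerate(docs):
        action = {"index": {"_index": index_name,
                            "_type": "sentence",
                            "_id": first_id + offset}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body("myIndexName", [
    {"sentence_id": 1, "article_name": "Kegelschnitt"},
    {"sentence_id": 2, "article_name": "Kegelschnitt"},
])
print(body)
```

The body would be sent with a POST to `localhost:9200/_bulk` and the header `Content-Type: application/x-ndjson`.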
I'm using this query (on the command line) to get an overview of my indices:
curl 'localhost:9200/_cat/indices?v'
Which gives me (for the relevant index):
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open wiki_dump_jan2019 5 1 795502 276551 528.1mb 528.1mb
Similarly, the query:
curl -XGET 'localhost:9200/wiki_dump_jan2019/sentence/_count?pretty' -H 'Content-Type: application/json' -d '{"query": {"match_all": {}}}'
returns
{
"count" : 795502,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
Telling me that I have 795,502 sentences in my index.
My problem is that in total I do over 23 million inserts. I realise that there may well be some duplicate sentences, but I checked this and found over 21 million unique sentences. My Python code executed fine, with no errors, and I checked the Elasticsearch logs and did not find anything alarming in there. I'm a bit unsure about the number of docs.deleted in the index (276,551, see above), but I understand that this may have to do with re-indexing or duplicates and should not necessarily be a problem (and in any case, the total number of docs plus docs.deleted is still way below my number of sentences).
The only thing I could find that comes close to my problem was this post: "elasticsearch stops indexing new documents after a while, using Tire", but the following query:
curl -XGET 'localhost:9200/_nodes/stats/process?filter_path=**.max_file_descriptors'
returns:
{"nodes":{"hoOAMZoCTkOgirg6_aIkUQ":{"process":{"max_file_descriptors":65536}}}}
so from what I understand upon installation it defaulted to the max value and this should not be the issue.
Anyone who can shed some light on this?
UPDATE: Ok, I guess I'm officially stupid. My issue was that I used the sentence_id as index id in the adding/inserting process. This sentence_id is coming from one particular document, so the max nr of docs (sentences) in my index would be the highest sentence_id (the longest document in my data set apparently had 795502 sentences). It just kept overwriting all entries after every document... Sorry for having wasted your time if you read this. NOT an elasticsearch issue; bug in my python code (outside of the displayed function above).
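The overwrite described in the update can be reproduced without Elasticsearch. The sketch below uses a plain dict as a stand-in for the index, since indexing a document under an existing id replaces the old one. The article name "Quadrik", the sentence texts, and the helper `index_sentences` are invented for illustration. The composite id is one possible fix, not necessarily the one the author used.

```python
def index_sentences(articles, make_id):
    """Simulate indexing: a later document with the same id replaces the earlier one."""
    index = {}
    for article_name, sentences in articles.items():
        for sentence_id, text in enumerate(sentences, start=1):
            index[make_id(article_name, sentence_id)] = text
    return index

articles = {
    "Kegelschnitt": ["sentence a", "sentence b", "sentence c"],
    "Quadrik": ["sentence d", "sentence e"],
}

# Per-article sentence_id reused as document id: later articles overwrite earlier ones.
reused = index_sentences(articles, lambda article, sid: sid)
# Article name combined with sentence_id: every sentence gets its own id.
unique = index_sentences(articles, lambda article, sid: "%s-%d" % (article, sid))

print(len(reused), len(unique))  # 3 5
```

With reused ids the document count can never exceed the sentence count of the longest article, which matches the 795502 seen above.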

Azure Search alphanumeric order

Is there a way to sort results in Azure Search in an order other than the default one?
In my case, I have entries whose numbers look something like this: L1-1, L1-2, L2-1, L10-1, D1-1, etc.
I would like to do two things:
Most of all, sort in natural (alphanumeric) order rather than plain alphabetical order, by which I mean that I want the order:
L1-1, L1-2, L2-1, L10-1
and not
L1-1, L1-2, L10-1, L2-1
Azure search default order gives me the second one.
If possible, I would also like to add some salt by specifying a custom order for first char (let's say L > D > Q).
For now, I retrieve all results, then order by that custom order, but that prevents me from building a paging or "infinite loading" system (I can't retrieve my results 10 by 10 if they are in the wrong order).
Cheers !
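The desired ordering can be written as a sort key. This is a minimal sketch that assumes every entry has the form letters, number, dash, number, and reads "L > D > Q" as "L first, then D, then Q". The names `PREFIX_RANK`, `sort_key` and `padded_key` are made up for illustration. The zero-padded string is one way to get the same order from a plain string sort, for example if it were stored in a separate sortable field at indexing time, which is an assumption and not something the question confirms is available.

```python
import re

# Custom order for the leading letter(s): lower rank sorts first.
PREFIX_RANK = {"L": 0, "D": 1, "Q": 2}
ENTRY_RE = re.compile(r"([A-Za-z]+)(\d+)-(\d+)")

def sort_key(entry):
    """Key that orders by custom prefix rank, then numerically by both numbers."""
    m = ENTRY_RE.fullmatch(entry)
    if m is None:
        # Entries that do not fit the assumed pattern go last, in plain string order.
        return (len(PREFIX_RANK) + 1, entry, 0, 0)
    prefix, major, minor = m.group(1), int(m.group(2)), int(m.group(3))
    return (PREFIX_RANK.get(prefix, len(PREFIX_RANK)), prefix, major, minor)

def padded_key(entry, width=6):
    """Zero-padded string whose alphabetical order matches sort_key for known prefixes."""
    rank, prefix, major, minor = sort_key(entry)
    return "%d-%s-%0*d-%0*d" % (rank, prefix, width, major, width, minor)

entries = ["L10-1", "L2-1", "D1-1", "L1-2", "L1-1"]
ordered = sorted(entries, key=sort_key)
print(ordered)  # ['L1-1', 'L1-2', 'L2-1', 'L10-1', 'D1-1']
```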

Partial search in Endeca Property

I have created an Endeca property, Address, with RecordSearch (with wildcard) enabled.
When I search for '487 Saxony' on this property in the pipeline, I get results (Matching Records: 10).
But when searching for '487 Saxo' or '487 Sax', I'm not getting any results (Matching Records: 0).
Can anyone tell me what changes are needed to get the desired result when searching for '487 Saxo' or '487 Sax'?
Note : I'm using version 6.1.3
Thanks in advance

Extract a particular string given its heading - regex

I am trying to extract a particular string from the data below.
I have to extract the observations mentioned in the data. The observation heading is written in various ways, for example: OBSERVATION, Observation, observed, OBSERVED ...
Please let me know how to extract this.
Data :
Preconditions :1)One valid navigation map should be installed on the MGU.2)One route must be active.\nActions/steps : 1)Press PTT button.\n2)Give voice command "Navigate to Alibaba Restaurant" and observe system\' s behavior.\n\nExpected result/behaviour:\n1) Confirmation prompt of given spoken command should be played. \n2)User shall get the list of POI\'s which are spoken by user for e.g. \n\n\nObserved result/behavior:\n1) Confirmation prompt of given spoken command is not played.\n2) User not able to select POI via speech commands.\n3)User getting the list of POI destination but user not able to select those point via spoken commands for e.g.
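One possible approach is a case-insensitive regular expression anchored on the heading. This is a sketch under two assumptions: the `\n` sequences in the data are real newlines, and the observation section is the last section of the text, so everything after the heading is captured. The sample text is shortened from the data above and the function name `extract_observation` is made up.

```python
import re

# Heading at the start of a line: "Observation(s)" or "Observed", any case,
# optionally followed by more words (e.g. "result/behavior") up to a colon.
OBSERVATION_RE = re.compile(
    r"^\s*observ(?:ations?|ed)\b[^:\n]*:\s*(.*)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)

def extract_observation(text):
    """Return the text following the observation heading, or None if there is none."""
    m = OBSERVATION_RE.search(text)
    return m.group(1).strip() if m else None

sample = (
    "Actions/steps : 1)Press PTT button.\n"
    "2)Give voice command and observe system's behavior.\n\n"
    "Expected result/behaviour:\n"
    "1) Confirmation prompt of given spoken command should be played.\n\n"
    "Observed result/behavior:\n"
    "1) Confirmation prompt of given spoken command is not played.\n"
    "2) User not able to select POI via speech commands."
)

observation = extract_observation(sample)
print(observation)
```

The word "observe" inside the steps is not matched, because the pattern only accepts "observation(s)" or "observed" at the start of a line.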

notesdocumentcollection.ftsearch and a search query with special characters

I am trying to write a search function in SSJS that looks like this:
notesdocumentcollection.ftsearch('"*' + searchword + '*"');
I have a document with the field value "Dr. Max Muster".
If I search for "dr", I get a result.
If I search for "dr. max", I don't get a result.
If I remove the wildcard and type "dr. max", I get a result.
I also tried it like this:
notesdocumentcollection.ftsearch('*' + searchword + '*');
Is there any way to get a result with wildcards and special characters in the search query?
P.S.
If I try this in the Notes client in the view, it works.
EDIT:
For the query "dr. ma" I got these debug results from the server:
IN FTGSearch option = 0x400089
[12CC:000A-1A30] Query: dr. ma
[12CC:000A-1A30] Engine Query: ("drma")
[12CC:000A-1A30] OUT FTGSearch error = F22
[12CC:000A-1A30] FTGSearch: found=0, returne
[12CC:000A-1A30] IN FTGSearch option = 0x40008C
[12CC:000A-1A30] Query: *"dr**ma"*
[12CC:000A-1A30] Engine Query: ("*dr**ma*")
[12CC:000A-1A30] OUT FTGSearch error = F22
[12CC:000A-1A30] FTGSearch: found=0, returned=0, start=0, count=0, limit=0
OK, first up: the search engine uses a trigram system, so searching for 2 characters will not work as expected. The wildcards may be helping, but there is no guarantee they will get everything.
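To illustrate why short terms are a problem for trigram-based indexing in general (this is a generic sketch, not how the Domino engine is actually implemented): a term is broken into overlapping three-character pieces, and a two-character term produces none.

```python
def trigrams(term):
    """Split a term into overlapping 3-character pieces (lowercased)."""
    t = term.lower()
    return [t[i:i + 3] for i in range(len(t) - 2)]

print(trigrams("Muster"))  # ['mus', 'ust', 'ste', 'ter']
print(trigrams("max"))     # ['max']
print(trigrams("dr"))      # []
```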
So, as I understand the next part: if you manually type the following into the Full Text Search bar in the Notes client, it works? (quotes included)
"*dr. max*"
One thing to be aware of in the Notes client is that you can activate two different search modes (a switch in the basic preferences): Web query and Notes query.
By default Web query is on (IIRC), so you search as you would with a standard internet search engine.
If you have switched it to Notes query, or the search starts with an all-capitals word, it uses the syntax that Notes has used previously.
So it is possible you are seeing differences in the client vs. XPages due to that.
To test this you can debug as follows. On the Domino server console type the following.
set config DEBUG_THREADID=1
set config CONSOLE_LOG_ENABLED=1
set config Debug_FTV_Search=1
Now do a search in the Notes client and in the XPage. It will generate something like the following on the Domino console (note: I added the numbers at the start of the important lines).
IN FTGSearch
[07FC:0048-0A94] option = 0x400219
1. [07FC:0048-0A94] Query: ("*test*")
2. [07FC:0048-0A94] Engine Query: ("*test*"%STEM)
3. [07FC:0048-0A94] GTR query performed in 6 ms. 5 documents found
4. [07FC:0048-0A94] 0 documents disualified by deletion
5. [07FC:0048-0A94] 0 documents disqualified by ACL
6. [07FC:0048-0A94] 0 documents disqualified by IDTable
7. [07FC:0048-0A94] 0 documents disqualified by NIF
8. [07FC:0048-0A94] Results marshalled in 3 ms. 5 documents left
9. [07FC:0048-0A94] OUT FTGSearch error = 0
[07FC:0048-0A94] FTGSearch: found=5, returned=5, start=0, count=0, limit=0
[07FC:0048-0A94] Total search time 10 ms.
Explanation of each bit:
1. String you sent to the search engine. In this case it was "test" (with quotes).
2. The compiled search string.
3. How long it took and total number of documents found.
4. Total discarded because they were flagged as deleted.
5. Total discarded because you did not have the rights to view them.
6. Total discarded because of the IDTable results.
7. Total discarded because they would not appear in the view you are searching from.
8. Time it took and remaining documents.
9. If any errors occurred.
So generate those two search results and post them if it is not obvious from them why the search didn't work.
The documentation for FTSearch says to enclose words and phrases in quotes. So try this (where you enclose the searchword variable in quotes - and not the wildcard star):
notesdocumentcollection.ftsearch('*"' + searchword + '"*');
The Notes full-text query syntax is a better-kept secret than the Disney timeshare apartments (if you have ever been to Disney you get the drift).
The official syntax guide is here: http://www-10.lotus.com/ldd/dominowiki.nsf/dx/full-text-syntax
What helped me a lot is to take the searchsite.ntf and rip it apart. Inside all concepts of FTSearch have been implemented in a working fashion (code that works beats documentation any time).
Hope that helps
