Marklogic search with ampersand - search

Suppose I am searching using one of the cts:query APIs. I am looking for documents containing the phrase "John and Jane". Some of my documents have "John & Jane" (actually John &amp; Jane) in them, and I want those to be returned as well. The reverse situation should work too.
Does MarkLogic provide any options to do that?

Queries expressed as cts:query items or XML are easy to rewrite with XQuery typeswitch expressions. The discussion list thread at http://markmail.org/message/6hxmuqnpnfm73j4n has an example of something similar.

Mike gives a good suggestion, but it might be worth taking a step back and looking at your problem first. From your comment on Mike's answer, I take it that you are looking for something like thesaurus expansion, but for 'and' and '&' rather than for other words.
I may be wrong, but to my knowledge MarkLogic doesn't provide features that take care of this automatically. Functions like search:search and search:parse are powerful, but they don't go that far. It is up to you to take a search string like yours, turn it into a cts:query (either by breaking it into parts manually or by using something like search:parse), and then use a trick like Mike's to walk through the query tree and expand the particular query nodes you want to expand.
The markmail thread Mike points to gives an example of how to walk a query tree and manipulate it. There is also a thesaurus module; it is a little heavy for this particular case, but it can help in various general cases. The following chapter of the Search Dev Guide explains its features and ends with a small example of how to apply it:
http://docs.marklogic.com/guide/search-dev/thesaurus#chapter
HTH!
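As a rough illustration of the expansion step only, here is a minimal Python sketch that generates the 'and' / '&' variants of a phrase. It is not MarkLogic code; the function name `expand_and_variants` is made up for this example, and the idea is that each variant would then be turned into its own word query and the results combined with an or-query on the MarkLogic side.

```python
from itertools import product
from typing import List

# The two spellings treated as interchangeable (assumption for this sketch).
AND_VARIANTS = ("and", "&")


def expand_and_variants(phrase: str) -> List[str]:
    """Return the phrase plus every variant with 'and' and '&' swapped."""
    options = []
    for token in phrase.split():
        if token.lower() in AND_VARIANTS:
            # Keep the original token first, then add the other spelling.
            others = [v for v in AND_VARIANTS if v != token.lower()]
            options.append([token] + others)
        else:
            options.append([token])
    # Cartesian product over the per-token alternatives.
    return [" ".join(combo) for combo in product(*options)]


print(expand_and_variants("John and Jane"))
print(expand_and_variants("John & Jane"))
```

Because the expansion works in both directions, the same function covers the reverse situation from the question.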

Assume the term you want to search for is "John & Jane".
To search for it, you can use the following lines:
let $inputSearchDetails := "John &amp; Jane"
let $InputXML := xdmp:unquote($inputSearchDetails, "", ("format-xml", "repair-full"))

Related

How do I find misspelled words with sphinx?

Let's say I have the word "catheter". A user tries to search on my web app for that word but spells it "cathiter" or "cattiter" instead. How can I use SphinxQL to match the word from my SQL database based on the incorrectly spelled word? What would my query look like? Do I need to enable something in my index on my conf file? From my understanding, enable_star has been deprecated.
Yes, enable_star=0 has been deprecated, but I'm not sure how that is relevant!
Anyway, it sounds like you want the CALL SUGGEST function:
http://sphinxsearch.com/blog/2016/10/03/2-3-2-feature-built-in-suggests/
The default settings are a good place to start...
CALL QSUGGEST('cathiter','yourindex');
... if you don't have min_infix_len defined on the index, you will need that. Also dict=keywords; for some reason that requirement is not mentioned in the blog post.

No result if search contains dot and wildcard

I use Azure Search and have some documents with a field like this: {"Nr": "123.334.93"}.
If I search for querytype=full&search=123.334.93, it finds multiple documents, and if I search for querytype=full&search="123.334.93", it finds one document. This is as expected.
But if I search for querytype=full&search=123.334.9*, I expect multiple documents starting with 123.334.9, but no results are returned.
Am I missing something?
The same happens when I use a regular expression like this: querytype=full&search=/123\.334\.9.*/
Your query looks correct to me and should work.
A couple of things you might look into.
1) Sometimes you need to escape the * like this:
querytype=full&search=123.334.9\*
Usually, this is only necessary if you have more search terms after the *.
2) You can also narrow the fields searched down to only the field you need (for better efficiency) like this:
querytype=full&search=Nr:123.334.9\*
Hope this helps.
Based on the comment from Yahnoosh:
The analyzer of the field was set to "de.microsoft". I changed that to "standard.lucene", recreated and refilled the index, and now it works as expected.
It seems that I have to be more careful when setting the analyzer and should only use language-specific ones for fields with language-specific content.
Thanks for your help.
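For anyone building these queries in code, here is a minimal Python sketch of assembling the query string so that characters such as `*` and `:` are URL-encoded. The parameter names `queryType` and `search` and the field `Nr` come from this thread; the helper name `build_search_query` is made up for the example, and the service URL, index name and API version are deliberately left out.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_query(term: str, field: Optional[str] = None) -> str:
    """Build the query-string part of a full-syntax search request."""
    # Optionally restrict the search to one field, e.g. Nr:123.334.9*
    search = f"{field}:{term}" if field else term
    # urlencode percent-encodes reserved characters such as '*' and ':'.
    return urlencode({"queryType": "full", "search": search})


print(build_search_query("123.334.9*"))
print(build_search_query("123.334.9*", field="Nr"))
```

This only covers the encoding of the request; whether the wildcard matches still depends on the analyzer configured for the field, as described above.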

Query wikipedia

I would like to query two or three terms in order to locate them in Wikipedia's entries. Specifically, I'm trying to see if some terms get repeated in the first paragraphs (abstract) across entries. This could be done directly or through DBpedia. Thanks
Using the MediaWiki API, you can find articles that contain those keywords.
Try the API:Search documentation.
To do what you want, you'd probably also need to find the articles that have those keywords and then parse the text to check whether they appear in the first paragraphs.
With this:
?action=parse&page=Nicolas_Cage&prop=text&section=0
you can get the HTML of the first section of a page (see this post).
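A minimal Python sketch of that second step, assuming the English Wikipedia endpoint and JSON output (`format=json`). The helper names `lead_section_url` and `terms_present` are made up for this example, and the actual HTTP request and the stripping of HTML markup are left out.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Assumption: English Wikipedia; other wikis use a different host.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def lead_section_url(page_title: str) -> str:
    """URL that asks the API for the parsed first section of a page."""
    params = {
        "action": "parse",
        "page": page_title,
        "prop": "text",
        "section": 0,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def terms_present(text: str, terms: List[str]) -> Dict[str, bool]:
    """Case-insensitive check of which terms occur in the given text."""
    lowered = text.lower()
    return {term: term.lower() in lowered for term in terms}


print(lead_section_url("Nicolas_Cage"))
print(terms_present("Nicolas Cage is an American actor.", ["actor", "director"]))
```

Running `terms_present` over the lead section of each article returned by the search would show which terms repeat across entries.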

Solr custom wildcard

I am pretty new to Solr and I am looking for a way to port the search features of my web application, which currently uses a regular database, to Solr indexes. My problem so far is that I have to customize the wildcard behaviour: for example, "?" should be "0 or 1 characters", not any character as it is now, "+" should mean any "white-space", "#" should be any digit, and so on. Any good pointers?
Thanks!
There is no simple answer that I know of, I am afraid.
For 0 or 1 characters, you can replace the original query with an 'OR' query. E.g. mp? in your database search use case becomes 'mp OR mp?' in Solr.
Whitespace is tokenized by default in a text field, so you can look at using a whitespace tokenizer as part of your custom 'text' field. There are several examples; text_ws in the sample schema does only whitespace tokenizing. You'd want to read up on tokenizers.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
There is no digit equivalent - you can do term1* OR term2* OR term3* ... etc. You can also use function queries that support numerical functions. http://wiki.apache.org/solr/FunctionQuery
It looks like the best choice in this case is to use regular expressions in the search. More details can be found here: http://1opensourcelover.wordpress.com/2013/09/29/solr-regex-tutorial/
It's not exactly what I was looking for, as I will have to build my own Solr query on the back end, and I have a feeling that heavy use of regular expressions will create a little more overhead on my server. In the tests I did, it looks pretty fast.
I will leave the question open for a while; maybe someone can come up with a better answer.
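A minimal sketch of the back-end translation step, written in Python and checked here against Python's own `re` module rather than Solr. The mapping table and the name `custom_wildcard_to_regex` are assumptions for this example. Solr's regex queries use Lucene's regular-expression dialect, which differs from Python's (for instance in which shorthand classes it accepts), so the generated patterns would need to be verified against the Solr version in use.

```python
import re

# Custom wildcard -> regex fragment (assumed mapping, from the question).
WILDCARDS = {
    "?": ".?",     # zero or one character
    "+": r"\s",    # one whitespace character
    "#": "[0-9]",  # one digit
}


def custom_wildcard_to_regex(pattern: str) -> str:
    """Translate the application's wildcard syntax into a regex string."""
    # Non-wildcard characters are escaped so they match literally.
    return "".join(WILDCARDS.get(ch, re.escape(ch)) for ch in pattern)


regex = custom_wildcard_to_regex("mp?")
print(regex)
print(bool(re.fullmatch(regex, "mp")), bool(re.fullmatch(regex, "mp3")))
```

Note that a regex query is matched against individual indexed terms, so the whitespace wildcard only makes sense on a field that is not split on whitespace.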

Solr not matching. Threshold setting, or something weird?

I'm using solr to search for articles. I created 2 test "body" sentences which have the common word "tall", but there is no match.
The Query---> Body:"There are tall people outside" AND !UserId:2
Does not match a post with:
Body: the KU tower is really tall
UserId:3
Is this simply a very low matching score, or is there something else going on here? In the case of a low matching score, should it really be that low? The body sentences are very short and share a common word, so I would have expected some match.
EDIT: I think the matching isn't happening as a result of having the !UserId:2 condition. If I try to match body sentences without that, it's very liberal. Can anyone explain this, and perhaps how best to structure a query to avoid this type of behavior?
Thanks!
I have seen some funky behavior with the ! operator with Solr. I would suggest you use the - (negative indicator) instead as shown in the SolrQuerySyntax Wiki Page. Try changing your original query to Body:"There are tall people outside" AND -UserId:2 to see if that works as you are expecting.
For those who come after me, I found a solution however not necessarily an explanation for its behavior.
The Solr query:
(PostBody:There are tall people outside) AND !UserId:2
worked as I desired above. Note that if the quotes are added around the body, it does not match. I believe Solr attempts to match such a query as a single string rather than individual words.
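To illustrate the difference between the two forms, here is a toy Python model of phrase matching versus matching on any individual term. It is not how Solr analyzes or scores documents; it only assumes lowercase whitespace tokenization and that unquoted terms are combined with OR, which depends on the query parser configuration. The function names are made up for this example.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Very rough stand-in for an analyzer: lowercase and split on spaces."""
    return text.lower().split()


def phrase_match(query: str, document: str) -> bool:
    """True only if all query words appear adjacent and in order."""
    q, d = tokenize(query), tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def any_term_match(query: str, document: str) -> bool:
    """True if at least one query word appears anywhere in the document."""
    return bool(set(tokenize(query)) & set(tokenize(document)))


query = "There are tall people outside"
body = "the KU tower is really tall"
print(phrase_match(query, body))    # False: the words are not adjacent in the body
print(any_term_match(query, body))  # True: both contain "tall"
```

In this model the quoted form behaves like `phrase_match` and the unquoted form like `any_term_match`, which is consistent with the behavior described above.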
