Elasticsearch: searching specifically for words inside brackets - search

I'm trying to run an Elasticsearch query for names that are inside brackets. I'm working with a database of names, and some of them include a maiden name within the first name field. Maiden names are indicated with brackets, like "Samantha [Murray]". My clients want our 'exact search' feature to work so that a search for "Murray" only returns results whose first name is Murray, excluding maiden names, while a search for "[Murray]" returns maiden names but NOT first name = Murray. In other words, "Murray" should match "Murray Smith" but not "Samantha [Murray] Jones", and "[Murray]" should do the reverse.
My problem so far is that Elasticsearch seems to be ignoring the brackets entirely. Here is my query:
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "default_field" : "first_name",
                "query" : "\\[Murray\\]"
              }
            }
          ]
        }
      }
    }
  }
}
but I get the same results that I do for "query" : "Murray" with no brackets at all. I tried a regexp but the results were even worse, the names I got weren't even close to "Murray" (I got things like "Rogers").
Is this type of request possible in Elastic? If so, what do I need to change?

Get familiar with analyzers: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html
If you are using the defaults, your analyzer is probably stripping out the brackets at index time.
You need to define an analyzer that keeps the brackets if you want to be able to search on them.
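To make the effect of the analyzer concrete, here is a rough sketch in plain JavaScript. The two tokenizer functions are simplified stand-ins written for this example, not Elasticsearch code, and the analyzer name `keep_brackets` is made up; only `whitespace` and `lowercase` are built-in tokenizer and filter names.

```javascript
// Simplified stand-in for a default-style analyzer: split on anything that
// is not a letter or digit, then lowercase. The brackets vanish here.
function standardLikeTokens(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Simplified stand-in for a whitespace tokenizer plus lowercase filter:
// the brackets stay attached to the token.
function whitespaceTokens(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Index settings for a custom analyzer that keeps brackets.
// "keep_brackets" is a hypothetical name chosen for this example.
const indexSettings = {
  settings: {
    analysis: {
      analyzer: {
        keep_brackets: {
          type: "custom",
          tokenizer: "whitespace",
          filter: ["lowercase"],
        },
      },
    },
  },
};

console.log(standardLikeTokens("Samantha [Murray]")); // [ 'samantha', 'murray' ]
console.log(whitespaceTokens("Samantha [Murray]"));   // [ 'samantha', '[murray]' ]
console.log(JSON.stringify(indexSettings, null, 2));
```

With an analyzer along these lines on first_name, "[murray]" and "murray" become different terms, which is what the exact search needs. The field has to be reindexed after the analyzer changes, and the brackets in the query string still need escaping as in the question.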

Related

Storing a complex Query within MongoDb Document [duplicate]

This is the case: a webshop in which I want to configure which items should be listed in the shop based on a set of parameters.
I want this to be configurable, because that allows me to experiment with different parameters and change their values easily.
I have a Product collection that I want to query based on multiple parameters.
A couple of these are found here:
within product:
"delivery" : {
  "maximum_delivery_days" : 30,
  "average_delivery_days" : 10,
  "source" : 1,
  "filling_rate" : 85,
  "stock" : 0
}
but also other parameters exist.
An example of such query to decide whether or not to include a product could be:
"$or" : [
  {
    "delivery.stock" : 1
  },
  {
    "$or" : [
      {
        "$and" : [
          {
            "delivery.maximum_delivery_days" : {
              "$lt" : 60
            }
          },
          {
            "delivery.filling_rate" : {
              "$gt" : 90
            }
          }
        ]
      },
      {
        "$and" : [
          {
            "delivery.maximum_delivery_days" : {
              "$lt" : 40
            }
          },
          {
            "delivery.filling_rate" : {
              "$gt" : 80
            }
          }
        ]
      },
      {
        "$and" : [
          {
            "delivery.delivery_days" : {
              "$lt" : 25
            }
          },
          {
            "delivery.filling_rate" : {
              "$gt" : 70
            }
          }
        ]
      }
    ]
  }
]
Now to make this configurable, I need to be able to handle boolean logic, parameters and values.
So, since such a query is itself JSON, I got the idea to store it in Mongo and have my Java app retrieve it.
Next thing is using it in the filter (e.g. find, or whatever) and work on the corresponding selection of products.
The advantage of this approach is that I can actually analyse the data and the effectiveness of the query outside of my program.
I would store it by name in the database. E.g.
{
  "name": "query1",
  "query": { the thing printed above starting with "$or"... }
}
using:
db.queries.insert({
  "name" : "query1",
  "query": { the thing printed above starting with "$or"... }
})
Which results in:
2016-03-27T14:43:37.265+0200 E QUERY Error: field names cannot start with $ [$or]
at Error (<anonymous>)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:161:19)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:165:18)
at insert (src/mongo/shell/bulk_api.js:646:20)
at DBCollection.insert (src/mongo/shell/collection.js:243:18)
at (shell):1:12 at src/mongo/shell/collection.js:161
But I CAN STORE it using Robomongo, but not always. Obviously I am doing something wrong. But I have NO IDEA what it is.
If it fails, and I create a brand new collection and try again, it succeeds. Weird stuff that goes beyond what I can comprehend.
But when I try updating values in the "query", changes are not going through. Never. Not even sometimes.
I can however create a new object and discard the previous one. So, the workaround is there.
db.queries.update(
  {"name": "query1"},
  {"$set": {
      ... update goes here ...
    }
  }
)
Doing this results in:
WriteResult({
  "nMatched" : 0,
  "nUpserted" : 0,
  "nModified" : 0,
  "writeError" : {
    "code" : 52,
    "errmsg" : "The dollar ($) prefixed field '$or' in 'action.$or' is not valid for storage."
  }
})
This seems pretty close to the other message above.
Needless to say, I am pretty clueless about what is going on here, so I hope some of the wizards here are able to shed some light on the matter.
I think the error message contains the important info you need to consider:
QUERY Error: field names cannot start with $
Since you are trying to store a query (or part of one) in a document, you'll end up with attribute names that contain mongo operator keywords (such as $or, $ne, $gt). The mongo documentation actually references this exact scenario - emphasis added
Field names cannot contain dots (i.e. .) or null characters, and they must not start with a dollar sign (i.e. $)...
I wouldn't trust 3rd party applications such as Robomongo in these instances. I suggest debugging/testing this issue directly in the mongo shell.
My suggestion would be to store an escaped version of the query in your document so that it does not interfere with reserved operator keywords. You can use JSON.stringify(my_obj) to encode your partial query into a string, and then decode it with JSON.parse(escaped_query_string_from_db) when you retrieve it later on.
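A minimal sketch of that round trip, using a shortened version of the filter from the question. The collection names in the comments (`queries`, `products`) are assumptions for the example.

```javascript
// The filter from the question, shortened. Keys such as "$or" are fine in a
// JavaScript object; they only become a problem as stored field names.
const filter = {
  $or: [
    { "delivery.stock": 1 },
    {
      $and: [
        { "delivery.maximum_delivery_days": { $lt: 60 } },
        { "delivery.filling_rate": { $gt: 90 } },
      ],
    },
  ],
};

// Document to store: the whole query travels as one string value.
// In the mongo shell this would be: db.queries.insert(stored)
const stored = { name: "query1", query: JSON.stringify(filter) };

// Later, after reading the document back, decode the string again.
// In the mongo shell this would be: db.products.find(restored)
const restored = JSON.parse(stored.query);

console.log(typeof stored.query);               // "string"
console.log(restored.$or[0]["delivery.stock"]); // 1
```

The trade-off is that the stored query is opaque to MongoDB, so individual parts of it cannot be changed with $set; the whole string has to be replaced.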
Your approach of storing the query as a JSON object in MongoDB is not viable.
You could potentially store your query logic and fields in MongoDB, but you have to have an external app build the query with the proper MongoDB syntax.
MongoDB queries contain operators, and some of those have special characters in them.
There are rules for MongoDB field names. These rules forbid names that start with a dollar sign or contain a dot.
Look here: https://docs.mongodb.org/manual/reference/limits/#Restrictions-on-Field-Names
The probable reason you can sometimes successfully create the doc using Robomongo is because Robomongo is transforming your query into a string and properly escaping the special characters as it sends it to MongoDB.
This also explains why your attempt to update them never works. You tried to create a document, but instead created something that is a string object, so your update conditions are probably not retrieving any docs.
I see two problems with your approach.
In the following query
db.queries.insert({
  "name" : "query1",
  "query": { the thing printed above starting with "$or"... }
})
valid JSON expects key/value pairs. Here, in "query", you are storing an object without a key. You have two options: either store the query as text or create another key inside the curly braces.
The second problem is that you are storing query values without wrapping them in quotes. All string values must be wrapped in quotes.
So your final document should look like this:
db.queries.insert({
  "name" : "query1",
  "query": 'the thing printed above starting with "$or"... '
})
Now try it; it should work.
Obviously my attempt to store a query in mongo the way I did was foolish, as became clear from the answers from both #bigdatakid and #lix. So what I finally did was this: I altered the naming of the fields to comply with the mongo requirements.
E.g. instead of $or I used _$or etc., and instead of using a . inside the name I used a #. I replace both in my Java code.
This way I can still easily try and test the queries outside of my program. In my Java program I just change the names back and use the query, which takes only 2 lines of code. It simply works now. Thanks guys for the suggestions you made.
String documentAsString = query.toJson().replaceAll("_\\$", "\\$").replaceAll("#", ".");
Object q = JSON.parse(documentAsString);
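For comparison, here is a sketch of the same renaming idea in JavaScript. Unlike the string replacement in the two Java lines, it rewrites keys only, so a # inside a value is left alone. The helper names are made up for this example, and it assumes no real field name starts with _$ or contains #.

```javascript
// Rewrite every key of a nested query object with the given function.
function mapKeys(value, renameKey) {
  if (Array.isArray(value)) {
    return value.map((item) => mapKeys(item, renameKey));
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [
        renameKey(key),
        mapKeys(inner, renameKey),
      ])
    );
  }
  return value; // numbers, strings, booleans, null
}

// Storage-safe form: "$or" -> "_$or", "delivery.stock" -> "delivery#stock".
// In a replacement string, "$$" stands for one literal "$".
const encodeKey = (key) => key.replace(/^\$/, "_$$").replace(/\./g, "#");
// Reverse mapping, applied just before the query is run.
const decodeKey = (key) => key.replace(/^_\$/, "$$").replace(/#/g, ".");

const query = {
  $or: [{ "delivery.stock": 1 }, { "delivery.filling_rate": { $gt: 90 } }],
};
const storable = mapKeys(query, encodeKey);
const runnable = mapKeys(storable, decodeKey);

console.log(JSON.stringify(storable));
// {"_$or":[{"delivery#stock":1},{"delivery#filling_rate":{"_$gt":90}}]}
console.log(JSON.stringify(runnable) === JSON.stringify(query)); // true
```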

How to fuzzy query against multiple fields in elasticsearch?

Here's my query as it stands:
"query":{
  "fuzzy":{
    "author":{
      "value":query,
      "fuzziness":2
    },
    "career_title":{
      "value":query,
      "fuzziness":2
    }
  }
}
This is part of a callback in Node.js. Query (which is being plugged in as a value to compare against) is set earlier in the function.
What I need it to be able to do is to check both the author and the career_title of a document, fuzzily, and return any documents that match in either field. The above statement never returns anything, and whenever I try to access the object it should create, it says it's undefined. I understand that I could write two queries, one to check each field, then sort the results by score, but I feel like searching every object for one field twice will be slower than searching every object for two fields once.
https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzzy-match-query.html
As that page shows, you can specify the fuzziness in a multi_match query:
{
  "query": {
    "multi_match": {
      "fields": [ "text", "title" ],
      "query": "SURPRIZE ME!",
      "fuzziness": "AUTO"
    }
  }
}
Something like this should work once you swap in your own fields. Hope this helps.
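Adapted to the fields from the question, the request body could be built roughly as follows. The function names are made up for this sketch; the second variant keeps the asker's term-level fuzzy clauses but places each one in its own `should` entry, since a single fuzzy clause takes one field.

```javascript
// Build a fuzzy search over several fields from one user-supplied string.
function buildFuzzyQuery(userInput) {
  return {
    query: {
      multi_match: {
        query: userInput,
        fields: ["author", "career_title"],
        fuzziness: "AUTO",
      },
    },
  };
}

// Alternative with the same intent: one fuzzy clause per field, combined
// with "should" so that a match in either field is enough.
function buildFuzzyBoolQuery(userInput) {
  const clause = (field) => ({
    fuzzy: { [field]: { value: userInput, fuzziness: 2 } },
  });
  return {
    query: {
      bool: {
        should: [clause("author"), clause("career_title")],
        minimum_should_match: 1,
      },
    },
  };
}

console.log(JSON.stringify(buildFuzzyQuery("tolkein"), null, 2));
console.log(JSON.stringify(buildFuzzyBoolQuery("tolkein"), null, 2));
```

Either way only one search request is sent. The multi_match form analyzes the input first, whereas fuzzy is a term-level query that compares the raw input against indexed terms, so the two can return different hits.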

How can I create an autocomplete with MongoDB full text search

I want to create an autocomplete input box that shows word suggestions as users type.
Basically, my problem is that when I use the $text operator for searching strings in a document, the queries will only match on complete stemmed words. This is for the same reason that if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries would match.
find = {$text: { $search: 'blue' } };
^ (doesn't match blueberry or bluebird on a document.)
I want to be able to do this. I want to match 'blueberry' or 'bluebird' with 'blue', and initially I thought this was possible by using a 'starts with' (^) regular expression, but it seems that $text and $search only accept a string, not a regexp.
I would like to know if there is a way to do this that is not excessively complex to implement/maintain. So far, I've only seen people trying to accomplish this by creating a new collection with the results of running a map/reduce across the collection with the text index.
I do not want to use ElasticSearch or Solr because I think it is overkill for what I am trying to do, and although I sometimes think that eventually I will have no other choice, I still cannot believe that there is not a simpler way to accomplish this.
MongoDB full text search matches whole words only, so it is inherently not suitable for auto complete.
The $text operator can search for words and phrases. The query matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries will match.
(Source: http://docs.mongodb.org/manual/core/index-text/)
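The 'starts with' idea from the question can still be applied outside of $text, as an ordinary $regex filter on the field itself. The sketch below is only that: the helper names and the field name `name` are made up, and it matches from the start of the whole field value, not from the start of each word inside it.

```javascript
// Escape characters that have a special meaning in a regular expression so
// that the typed text is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter meaning "field starts with the typed text".
function prefixFilter(field, typed) {
  return { [field]: { $regex: "^" + escapeRegex(typed) } };
}

console.log(JSON.stringify(prefixFilter("name", "blue")));
// {"name":{"$regex":"^blue"}}

// In the mongo shell this could be used as (collection name assumed):
// db.products.find(prefixFilter("name", "blue")).limit(10)
```

A case-sensitive, anchored prefix like this can make use of a regular index on the field, which keeps it cheap enough for suggestions on single-word values.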
You can now use Atlas Search natively in MongoDB Atlas to achieve this. You first have to add the autocomplete field mapping to your index definition before you can use the autocomplete operator in your query. This can be done through the Visual Editor or the JSON editor, and there's a tutorial that walks you through the implementation.
Here's the index definition template from the docs:
{
  "mappings": {
    "dynamic": true|false,
    "fields": {
      "<field-name>": [
        {
          "type": "autocomplete",
          "analyzer": "lucene.standard",
          "tokenization": "edgeGram|rightEdgeGram|nGram",
          "minGrams": <2>,
          "maxGrams": <15>,
          "foldDiacritics": true|false
        }
      ]
    }
  }
}
And the query, where you can also specify support for typo-tolerance via the fuzzy parameter:
{
  $search: {
    "index": "<index name>", // optional, defaults to "default"
    "autocomplete": {
      "query": "<search-string>",
      "path": "<field-to-search>",
      "tokenOrder": "any|sequential",
      "fuzzy": <options>,
      "score": <options>
    }
  }
}
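Filled in for the blueberry example, the template might look roughly like this. The field name `name`, the collection name in the comment, and the limit of 10 are assumptions for the sketch, and the field needs an autocomplete mapping in the search index as described above.

```javascript
// Aggregation pipeline that returns suggestions while the user types.
function autocompletePipeline(typed) {
  return [
    {
      // $search has to be the first stage of the pipeline.
      $search: {
        index: "default",
        autocomplete: {
          query: typed,
          path: "name",
          tokenOrder: "sequential",
          fuzzy: { maxEdits: 1 }, // tolerate one typo
        },
      },
    },
    { $limit: 10 },
    { $project: { _id: 0, name: 1 } },
  ];
}

console.log(JSON.stringify(autocompletePipeline("blue"), null, 2));

// With a driver or the shell this would be passed to aggregate, e.g.
// db.products.aggregate(autocompletePipeline("blue"))
```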

Search within single document using Elasticsearch

If I want to search an index I can use:
$ curl -XGET 'X/index1/_search?q=title:ES'
If I want to search a document type I can use:
$ curl -XGET 'X/index1/docType1/_search?q=title:ES'
But if I want to search a specific document, this doesn't work:
$ curl -XGET 'X/index1/docType1/documentID/_search?q=title:ES'
Is there a simple work around for this so that I can search within a single document as opposed to an entire index or an entire document type? To explain why I need this, I have to do some resource intensive queries to find what I'm looking for. Once I find the documents I need, I don't actually need the whole document, just the highlighted portion that matches the query. But I don't want to store all the highlighted hits in memory because I might not need them for a few hours and at times they could take up a lot of space (I would also prefer not to write them to disk). I'd rather store a list of document ids so that when I need the highlighted portion of a document I can just run the highlighted query on a specific document and get back the highlighted portion. Thanks in advance for your help!
You can index the document's id as a field, then when you query, include the unique document id as a term to narrow the results just to that single document.
$ curl -XPOST 'X/index1/docType1/_search' -d '{
  "query": {
    "bool": {
      "must": [
        {"match": {"doc": "223"}},
        {"match": {"title": "highlight me please"}}
      ]
    }
  }
}'
You can use the Ids Query in Elasticsearch to search on a single document. By default, Elasticsearch indexes a field called _uid, which is the combination of type and id, so that it can be used for queries, aggregations, scripts, and sorting.
So the query you need will be as follows:
curl -XGET 'X/index1/_search' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "ES"
          }
        },
        {
          "ids": {
            "type" : "docType1",
            "values": [
              "documentID"
            ]
          }
        }
      ]
    }
  }
}'
If you need to search on multiple documents, then specify doc_ids in the values array in ids query.
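Since the goal in the question is to get back only the highlighted portion, the ids clause can be combined with a highlight section. The sketch below builds such a request body; the function name is made up, and `_source: false` is an optional addition that leaves the document body out of the response.

```javascript
// Build a search body that is restricted to one document id and asks for
// highlighted fragments of one field.
function highlightInDocument(docId, field, text) {
  return {
    query: {
      bool: {
        must: [
          { ids: { values: [docId] } },
          { match: { [field]: text } },
        ],
      },
    },
    highlight: { fields: { [field]: {} } },
    _source: false, // only the highlights are needed
  };
}

console.log(JSON.stringify(highlightInDocument("documentID", "title", "ES"), null, 2));
```

The resulting JSON would be sent as the body of the same 'X/index1/_search' request shown above, so only the list of document ids has to be kept around between the expensive query and the later highlight lookup.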

elasticsearch prefix query for multiple words to solve the autocomplete use case

How do I get Elasticsearch to solve a simple autocomplete use case that involves multiple words?
Let's say I have a document with the following title: Elastic search is a great search tool built on top of lucene.
If I use the prefix query and construct it in the form
{
  "prefix" : { "title" : "Elas" }
}
it will return that document in the result set.
However, if I do a prefix search for
{
  "prefix" : { "title" : "Elastic sea" }
}
I get no results.
What sort of query do I need to construct so as to present that result to the user for a simple autocomplete use case?
A prefix query made on Elastic sea would match a term like Elastic search in the index, but that doesn't appear in your index if you tokenize on whitespaces. What you have is elastic and search as two different tokens. Have a look at the analyze api to find out how you are actually indexing your text.
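The effect can be imitated in a few lines of JavaScript. This is only a rough model of analysis and term matching, not Elasticsearch code, and it ignores case for simplicity.

```javascript
// Rough imitation of analysis: lowercase and split on non-alphanumerics.
function tokens(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// A prefix query compares its input against single terms, never against
// the whole text of the field.
function prefixMatches(text, prefix) {
  const wanted = prefix.toLowerCase();
  return tokens(text).some((term) => term.startsWith(wanted));
}

const title = "Elastic search is a great search tool built on top of lucene.";

console.log(tokens(title).slice(0, 2));            // [ 'elastic', 'search' ]
console.log(prefixMatches(title, "Elas"));         // true
console.log(prefixMatches(title, "Elastic sea"));  // false, no term contains a space
```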
Using a boolean query like in your answer, you wouldn't take the position of the terms into account. You would get the following document as a result, for example:
Elastic model is a framework to store your Moose object and search through them.
For autocomplete purposes you might want to make a phrase query and use the last term as a prefix. That's available out of the box using the phrase_prefix type in a match query (also known as match_phrase_prefix), which was made available exactly for your use case:
{
  "match" : {
    "message" : {
      "query" : "elastic sea",
      "type" : "phrase_prefix"
    }
  }
}
With this query your example document would match but mine wouldn't since elastic is not close to search there.
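The same query can also be written as a standalone match_phrase_prefix clause, which is the form to reach for where the "type" option on match is not accepted. A small sketch of building it from the typed text follows; the function name and the max_expansions value of 10 are choices made for this example.

```javascript
// Build a phrase-prefix search body: every word must match in order, and
// the last word is treated as a prefix.
function phrasePrefixQuery(field, typed) {
  return {
    query: {
      match_phrase_prefix: {
        [field]: {
          query: typed,
          max_expansions: 10, // cap on how many terms the last prefix expands to
        },
      },
    },
  };
}

console.log(JSON.stringify(phrasePrefixQuery("title", "elastic sea"), null, 2));
```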
To achieve that result, you will need to use a Boolean query. The partial word needs to go in a prefix query and the complete word or phrase needs to go in a match clause. The query can be tuned further with clauses such as must and should, applied as needed.
{
  "query": {
    "bool": {
      "must": [
        {
          "prefix": {
            "name": "sea"
          }
        },
        {
          "match": {
            "name": "elastic"
          }
        }
      ]
    }
  }
}
