ElasticSearch: Match Field Within Query - search

Searching for a string within an indexed document is simple with match. What about the opposite? I need to look for matches of a string field within a query. For example, searching for:
correct horse battery staple
Should match a document with a field with a value of horse battery, and only that. What is the query for that with ElasticSearch?
Edit: Here's a thread about someone wanting to do the same thing, but never received any replies: https://groups.google.com/d/topic/elasticsearch/IYDu5-0YD6E/discussion

Inverted index doesn't perform very well in knowing which multiple terms a document contains exactly. A solution found in the definitive guide was to index the term count and to query over the different possible combinations, which is very tedious.
Here is a related question (it's about filter, but the problematic is the same) with more developped answers.

The solution I came to was to use the porcolator API. I indexed the field value as a search query, and then matched it against a document that contained the query string. This method is working quite well. Here is how I'm creating the percolator:
curl -XPUT localhost:9200/myindex/.percolator/model-2332 -d '
{
"query": {
"match_phrase": {
"name": "horse battery"
}
}
}'
And how I'm querying for it:
curl -XGET localhost:9200/myindex/model/_percolate -d '
{
"doc": {
"name": "correct horse battery staple"
}
}'

Related

Querying mongoDB using pymongo (completely new to mongo/pymongo)

If this question seems too trivial then please let me know in the comments, I will do further research on how to solve it.
I have a collection called products where I store details of a particular product from different retailers. The schema of a document looks like this -
{
"_id": "uuid of a product",
"created_at": "timestamp",
"offers": [{
"retailer_id": 123,
"product_url": "url - of -a - product.com",
"price": "1"
},
{
"retailer_id": 456,
"product_url": "url - of -a - product.com",
"price": "1"
}
]
}
_id of a product is system generated. Consider a product like 'iPhone X'. This will be a single document with URLs and prices from multiple retailers like Amazon, eBay, etc.
Now if a new URL comes into the system, I need to make a query if this URL already exists in our database. The obvious way to do this is to iterate every offer of every product document and see if the product_url field matches with the input URL. That would require loading up all the documents into the memory and iterate through the offers of every product one by one. Now my question arises -
Is there a simpler method to achieve this? Using pymongo?
Or my database schema needs to be changed since this basic check_if_product_url_exists() is too complex?
MongoDB provides searching within arrays using dot notation.
So your query would be:
db.collection.find({'offers.product_url': 'url - of -a - product.com'})
The same syntax works in MongoDB shell or pymongo.

Unable to full text search in Solr

I have some data in solr. I want to search which name is Chinmay Sahu See below I have 3 results in output. But I got 3 instead of 1. Because Content name searched partially.
I want to full search those name having Chinmay Sahu only that contents will come.
Output:
"docs": [
{
"id": "741fde46a654879949473b2cdc577913",
"content_id": "1277",
"name": "Chinmay Sahu",
"_version_": 1596995745829879800
},
{
"id": "4e98d680efaab3afe051f3ddc00dc5f2",
"content_id": "1825",
"name": "Chinmay Panda",
"_version_": 1596995745829879800
}
{
"id": "741fde46a654879949473b2cdc577913",
"content_id": "1259",
"name": "Sasmita Sahu",
"_version_": 1596995745829879800
}
]
Query:
name:Chinmay Sahu
Expected :
"docs": [
{
"id": "741fde46a654879949473b2cdc577913",
"content_id": "1277",
"name": "Chinmay Sahu",
"_version_": 1596995745829879800
},
]
Please help
Try doing this
name:"Chinmay Sahu"
You need to do a phrase query to match the exact name.
I am guessing in your case the name field is using Standard tokenizer which will split tokens if whitespace is there. So while indexing in all the 3 docs there will be a token called "chinmay".
While you search using
name:Chinmay Sahu
Solr will search it like this since if there is no fieldName specified before a token solr automatically searches it in default_field.(however default field is removed from solr 7.3, So it depends on what version of solr are you using.
)
Name:chinmay AND default_field:sahu
So since all the three docs are having chinmay as a token in the index,the query will match all 3 docs.
Now i dont know what your default field is? can you post your solr schema? That way we can explain why you are seeing those 3 docs.
Since root545 already explained that field:foo bar will search for foo in field and bar in the default search field, I'll suggest that it seems like you don't want to concern yourself with the exact Lucene syntax for searching. The edismax query parser is well suited for separating the typed search string from what fields are being searched and whether you want all tokens to match.
The query in that case would be just Chinmay Sahu, while you'd set q.op=AND (all terms must match), defType=edismax (use the edismax query parser) and qf=name (search the name field):
q=Chinmay Sahu&q.op=AND&defType=edismax&qf=name
You can also tune the different phrase parameters to make sure that names with the tokens in the exact same sequence will be boosted higher than those that have them in the opposite sequence (i.e. Sahu Chinmay).
If this is a programmatic search where no user is actually typing in the suggestion, using a phrase search as suggested is the way to go (name:"Chinmay Sahu").
I would suggest using query like
name:(Chinmay Sahu)
And make sure default operator is AND either in settings or query string like q.op=AND
With that approach you can use user input much easier since you don't need to parse it too much.

CouchDB find by search term

I import a CSV file to CouchDB with the correct structure.
Now I would like to search for records matching one search term in ANY of the fields. Here is an example record :
{
"_id": "QW141401",
"_rev": "1-7aae4ce6f6c148d82d7d6e1e3ba28542",
"PART": {
"ONE": "QUA01135",
"TWO": "W/364",
"THREE": "QUA04384",
"FOUR": "QUA12167"
},
"FOO": {
"BAR": "C40"
},
"DÉSIGNATION": "THE QUICK BROWN FOX"
}
Now given a search term, for example QUA04384 this record should come up. Aloso for C40. And, if possible, also for a partial match like FOX
The keys under PART and FOO can change from record to record...
This could be a similar question. Probably you are looking for a Full Text Search mechanism.
Yo can try with couchdb-lucene or elasticseach
A stupid way to do this is to build an additional field (call it 'fulltext') in each Lucene document, containing the concatenation of all other field values. (Remember to build this completely dynamically so that every single field has its contents in this additional field no matter what the original field name was.) Then you can perform your searches on this 'fulltext' field.

Case insensitive search in arrays for CosmosDB / DocumentDB

Lets say I have these documents in my CosmosDB. (DocumentDB API, .NET SDK)
{
// partition key of the collection
"userId" : "0000-0000-0000-0000",
"emailAddresses": [
"someaddress#somedomain.com", "Another.Address#someotherdomain.com"
]
// some more fields
}
I now need to find out if I have a document for a given email address. However, I need the query to be case insensitive.
There are ways to search case insensitive on a field (they do a full scan however):
How to do a Case Insensitive search on Azure DocumentDb?
select * from json j where LOWER(j.name) = 'timbaktu'
e => e.Id.ToLower() == key.ToLower()
These do not work for arrays. Is there an alternative way? A user defined function looks like it could help.
I am mainly looking for a temporary low-effort solution to support the scenario (I have multiple collections like this). I probably need to switch to a data structure like this at some point:
{
"userId" : "0000-0000-0000-0000",
// Option A
"emailAddresses": [
{
"displayName": "someaddress#somedomain.com",
"normalizedName" : "someaddress#somedomain.com"
},
{
"displayName": "Another.Address#someotherdomain.com",
"normalizedName" : "another.address#someotherdomain.com"
}
],
// Option B
"emailAddressesNormalized": {
"someaddress#somedomain.com", "another.address#someotherdomain.com"
}
}
Unfortunately, my production database already contains documents that would need to be updated to support the new structure.
My production collections contain only 100s of these items, so I am even tempted to just get all items and do the comparison in memory on the client.
If performance matters then you should consider one of the normalization solution you have proposed yourself in question. Then you could index the normalized field and get results without doing a full scan.
If for some reason you really don't want to retouch the documents then perhaps the feature you are missing is simple join?
Example query which will do case-insensitive search from within array with a scan:
SELECT c FROM c
join email in c.emailAddresses
where lower(email) = lower('ANOTHER.ADDRESS#someotherdomain.com')
You can find more examples about joining from Getting started with SQL commands in Cosmos DB.
Note that where-criteria in given example cannot use an index, so consider using it only along another more selective (indexed) criteria.

Elasticsearch - Why am I not getting the same search results after updating a document?

Here's what I'm doing:
First, I make a search and get some documents
curl -XPOST index/type/_search
{
"query" : {
"match_all": {}
},
"size": 10
}
Then, I'm updating one of the documents resulted in the search
curl -XPOST index/type/_id/_update
{
"doc" : {
"some_field" : "Some modification goes here."
}
}
And finally, I'm doing exactly the same search as above.
But the curious thing is that I get all the previous documents, except the updated one. Why is it no longer among the documents in the search?
Thank you!
Since you're not sorting your documents, they are sorted by score. Your modification might have changed the document score after which the documents are sorted by default.
And since you're only taking the first 10 documents, you have no guarantee that your new document will come back in those 10 documents.

Resources