Elasticsearch - Why am I not getting the same search results after updating a document?

Here's what I'm doing:
First, I make a search and get some documents
curl -XPOST index/type/_search
{
"query" : {
"match_all": {}
},
"size": 10
}
Then, I'm updating one of the documents returned by the search
curl -XPOST index/type/_id/_update
{
"doc" : {
"some_field" : "Some modification goes here."
}
}
And finally, I'm doing exactly the same search as above.
But the curious thing is that I get all the previous documents, except the updated one. Why is it no longer among the documents in the search?
Thank you!

Since you're not sorting your documents, they are ordered by score, and your modification may have changed the updated document's score, which is what results are sorted by default.
And since you're only taking the first 10 documents, there is no guarantee that the updated document will still come back among those 10.
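One way to make the page stable is to sort on a field whose value doesn't change when you update the document. Here is a minimal sketch with the legacy elasticsearch-py client, assuming a hypothetical created_at field and the placeholder index/type names from the question:
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local node

response = es.search(
    index="index",
    doc_type="type",  # doc types only exist in older Elasticsearch versions
    body={
        "query": {"match_all": {}},
        # Sorting on a stable field ("created_at" is hypothetical) keeps the
        # page order independent of scoring, so the updated document stays
        # where it was.
        "sort": [{"created_at": {"order": "asc"}}],
        "size": 10,
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_id"])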

Related

Mongo: create if document doesn't exist, otherwise do nothing

I have a Mongo collection that has two fields, let's say "name" and "randomString".
I want to create a random string for a name, only if it doesn't exist already. So the first request for { name: "SomeName" } will result in saving e.g. { name: "someName", randomString: "abc" }. The second request will do nothing.
Is there a mongo command for this? All I could find are things like findOneAndUpdate, replaceOne etc, who all support an optional "upsert" but their behavior on match is to update, I want the behavior on match to be do nothing.
I'm not looking for an if-then solution like in this question, as I have a race condition issue - I need to be able to get multiple requests simultaneously without updating the document or failing any of the requests.
Yes, there is a way to do this: you can use the $addToSet operator.
For more info, please go through this link: https://docs.mongodb.com/manual/reference/operator/update/addToSet/
PS: If you still have any confusion regarding this question, please feel free to comment further.
Thanks
This is the solution I found in the end:
CustomerRandomString.findOneAndUpdate(
{ name: "someName" },
{
$setOnInsert: { randomString: generateRandomString() },
},
{ upsert: true },
);
The $setOnInsert operator only applies when creating a new document, which is exactly what I needed.
EDIT: per the docs, this solution requires a unique index on the field in order to fully avoid duplicates.
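For reference, a minimal pymongo sketch of the same pattern (the database and collection names are hypothetical, secrets.token_hex stands in for generateRandomString(), and it includes the unique index on name mentioned above):
import secrets

from pymongo import ASCENDING, MongoClient

client = MongoClient()  # assumes a local mongod
collection = client["mydb"]["customers"]  # hypothetical names

# Unique index on "name", per the docs, so concurrent upserts can't create duplicates.
collection.create_index([("name", ASCENDING)], unique=True)

# $setOnInsert only takes effect when the upsert actually inserts a new
# document; if a document with this name already exists, nothing changes.
collection.update_one(
    {"name": "someName"},
    {"$setOnInsert": {"randomString": secrets.token_hex(8)}},
    upsert=True,
)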
You can easily do it by using the $exists operator to check whether the randomString field is missing, and then using $set in an aggregation pipeline update to upsert that field.
db.collection.updateMany({ "name": someName, "randomString": { $exists: false } }, [{ $set: { "randomString": "abcd" } }], { upsert: true })
If the condition query doesn't match any documents, then it returns null.
Note: aggregation pipelines in updateMany() only work from MongoDB version 4.2 and above.
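If you're driving this from Python, the same pipeline-style update looks roughly like this with pymongo (database and collection names are placeholders, MongoDB 4.2+ required):
from pymongo import MongoClient

collection = MongoClient()["mydb"]["mycollection"]  # hypothetical names

# Pipeline form of update_many; only documents whose randomString field
# is missing are touched.
collection.update_many(
    {"name": "someName", "randomString": {"$exists": False}},
    [{"$set": {"randomString": "abcd"}}],
    upsert=True,
)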

Querying mongoDB using pymongo (completely new to mongo/pymongo)

If this question seems too trivial then please let me know in the comments, I will do further research on how to solve it.
I have a collection called products where I store details of a particular product from different retailers. The schema of a document looks like this -
{
"_id": "uuid of a product",
"created_at": "timestamp",
"offers": [{
"retailer_id": 123,
"product_url": "url - of -a - product.com",
"price": "1"
},
{
"retailer_id": 456,
"product_url": "url - of -a - product.com",
"price": "1"
}
]
}
_id of a product is system generated. Consider a product like 'iPhone X'. This will be a single document with URLs and prices from multiple retailers like Amazon, eBay, etc.
Now if a new URL comes into the system, I need to make a query if this URL already exists in our database. The obvious way to do this is to iterate every offer of every product document and see if the product_url field matches with the input URL. That would require loading up all the documents into the memory and iterate through the offers of every product one by one. Now my question arises -
Is there a simpler method to achieve this? Using pymongo?
Or my database schema needs to be changed since this basic check_if_product_url_exists() is too complex?
MongoDB provides searching within arrays using dot notation.
So your query would be:
db.collection.find({'offers.product_url': 'url-of-a-product.com'})
The same syntax works in MongoDB shell or pymongo.
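For example, a small pymongo sketch of that lookup (the connection details and database name are assumptions; the index is an optional addition so the query doesn't scan the whole collection):
from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod
products = client["mydb"]["products"]  # hypothetical database name

# A multikey index on the embedded field lets MongoDB answer the query
# without loading every document into memory.
products.create_index("offers.product_url")

def check_if_product_url_exists(url):
    # Dot notation reaches into every element of the "offers" array.
    return products.find_one({"offers.product_url": url}, {"_id": 1}) is not None

print(check_if_product_url_exists("url-of-a-product.com"))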

Return only _source from a search

Is it possible to only retrieve the _source document(s) when I execute a search query with the (official) nodejs-elasticsearch library? According to the documentation, there seems to be a way, sort of:
Use the /{index}/{type}/{id}/_source endpoint to get just the _source field of the document, without any additional content around it. For example:
curl -XGET 'http://localhost:9200/twitter/tweet/1/_source'
And the corresponding API call in the nodejs library is:
client.getSource([params, [callback]])
However, this method only seems to be able to retrieve documents on an ID basis. I need to issue a full search body (with filters and query_strings and whatnot), which this method doesn't support.
I'm running ES 1.4
You can use "fields" for this. See a simplified example below. Go ahead and customize your query as per your requirement:
{
"fields": [
"_source"
],
"query": {
"match_all": {}
}
}
The _index, _type, _id and _score fields will always be present in the Search API response.

ElasticSearch: Match Field Within Query

Searching for a string within an indexed document is simple with match. What about the opposite? I need to look for matches of a string field within a query. For example, searching for:
correct horse battery staple
Should match a document with a field with a value of horse battery, and only that. What is the query for that with ElasticSearch?
Edit: Here's a thread about someone wanting to do the same thing, but never received any replies: https://groups.google.com/d/topic/elasticsearch/IYDu5-0YD6E/discussion
An inverted index doesn't lend itself well to knowing exactly which terms a document contains. A solution found in the Definitive Guide is to index the term count and query over the different possible combinations, which is very tedious.
Here is a related question (it's about filters, but the problem is the same) with more developed answers.
The solution I came to was to use the percolator API. I indexed the field value as a search query, and then matched it against a document containing the query string. This method is working quite well. Here is how I'm creating the percolator:
curl -XPUT localhost:9200/myindex/.percolator/model-2332 -d '
{
"query": {
"match_phrase": {
"name": "horse battery"
}
}
}'
And how I'm querying for it:
curl -XGET localhost:9200/myindex/model/_percolate -d '
{
"doc": {
"name": "correct horse battery staple"
}
}'
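For completeness, roughly the same flow through a legacy elasticsearch-py client (the .percolator type and the percolate() call only exist in old Elasticsearch versions; the index and id names are the ones from the curl examples above):
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local node

# Register the stored field value as a percolator query.
es.index(
    index="myindex",
    doc_type=".percolator",
    id="model-2332",
    body={"query": {"match_phrase": {"name": "horse battery"}}},
)

# Percolate an incoming document against all stored queries.
result = es.percolate(
    index="myindex",
    doc_type="model",
    body={"doc": {"name": "correct horse battery staple"}},
)
print([match["_id"] for match in result["matches"]])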

Can I retrieve all revisions of a deleted document?

I know I can retrieve all revisions of an "available" document, but can I retrieve the last "available" version of a deleted document? I do not know the revision id prior to the delete. This is the command I am currently running...it returns {"error":"not_found","reason":"deleted"}.
curl -X GET http://localhost:5984/test_database/a213ccad?revs_info=true
I've had this problem while trying to recover a deleted document; here is my solution:
0) Until you run compaction, you can get the deletion history, e.g.:
curl http://example.iriscouch.com/test/_changes
1) You'll see the deleted documents with their $id and $rev; PUT an empty document as a new version, e.g.:
curl -X PUT http://example.iriscouch.com/test/$id?rev=$rev -H "Content-Type: application/json" -d '{}'
2) Now you can get all the revision info, e.g.:
curl http://example.iriscouch.com/test/$id?revs_info=true
See also Retrieve just deleted document
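The same three steps can be scripted; here is a rough sketch with Python's requests library (the database URL and document id are placeholders, and it only works before compaction):
import requests

base = "http://localhost:5984/test_database"  # placeholder
doc_id = "a213ccad"  # placeholder id from the question

# 0) Scan the changes feed for the document's deletion record.
deleted_rev = None
for row in requests.get(f"{base}/_changes").json()["results"]:
    if row["id"] == doc_id and row.get("deleted"):
        deleted_rev = row["changes"][0]["rev"]

# 1) PUT an empty document on top of the deletion stub.
requests.put(f"{base}/{doc_id}", params={"rev": deleted_rev}, json={})

# 2) The revision history is now reachable again.
print(requests.get(f"{base}/{doc_id}", params={"revs_info": "true"}).json()["_revs_info"])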
Besides _changes, another good way to do this is to use keys with _all_docs:
GET $MYDB/_all_docs?keys=["foo"] ->
{
"offset": 0,
"rows": [
{
"id": "foo",
"key": "foo",
"value": {
"deleted": true,
"rev": "2-eec205a9d413992850a6e32678485900"
}
}
],
"total_rows": 0
}
Note that it has to be keys; key will not work, because only keys returns info for deleted docs.
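From code, the same lookup with Python's requests looks roughly like this (the database URL is a placeholder; keys has to be a JSON-encoded array, and a POST with a {"keys": [...]} body also works):
import json

import requests

base = "http://localhost:5984/mydb"  # placeholder database URL

resp = requests.get(f"{base}/_all_docs", params={"keys": json.dumps(["foo"])}).json()
for row in resp["rows"]:
    value = row.get("value") or {}
    if value.get("deleted"):
        print(row["id"], "was deleted at rev", value["rev"])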
You can get the last revision of a deleted document, but first you must determine its revision id. To do that, query the _changes feed and scan for the document's deletion record; this will contain the last revision, and you can then fetch it using docid?rev=N-XXXXX.
I remember some mailinglist discussion of making this easier (as doing a full scan of the changes feed is obviously not ideal for routine usage), but I'm not sure anything came of it.
I've hit this several times recently, so for anyone else wandering by ...
This question typically results from a programming model that needs to know which document was deleted. Since user keys such as 'type' don't survive deletion and _id is best assigned by couch, it would often be nice to peek under the covers and see something about the doc that was deleted. An alternative is to have a process set deleted:True (no underscore) on documents, and to adjust any listener filters, etc., to look for deleted:True. One of the processes can then actually delete the document. This means that any process triggering on the document doesn't need to track an _id for eventual deletion.
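A rough sketch of that pattern with Python's requests (the URL and document id are placeholders): clients mark the document with a plain deleted field, and a single cleanup process issues the real DELETE later.
import requests

base = "http://localhost:5984/mydb"  # placeholder
doc_id = "some-doc"  # placeholder

# Mark the document instead of deleting it, so listeners can still read its fields.
doc = requests.get(f"{base}/{doc_id}").json()
doc["deleted"] = True  # a plain field, not the special _deleted flag
requests.put(f"{base}/{doc_id}", json=doc)

# Later, a dedicated cleanup process performs the actual deletion.
doc = requests.get(f"{base}/{doc_id}").json()
if doc.get("deleted"):
    requests.delete(f"{base}/{doc_id}", params={"rev": doc["_rev"]})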

Resources