Elasticsearch: order by count of nested objects

Beginner with Elasticsearch. I feel like this should be pretty simple, but I'm stuck here. I've got a mapping for Posts that looks like this:
[
  post1: {
    title: 'asdfasd',
    comments: [comment1, comment2, comment3]
  },
  post2: {
    title: 'asdf',
    comments: [comment1, comment2]
  },
  ...
]
And I'm trying to search for them by title and then order them by number of comments. I can search by title just fine, but I'm a little confused as to how to go about ordering the results by comments count. What would be the best way to go about doing this?

You have two options:
Option #1: Use a script to get the length of the array. You would do something like:
{
  "query" : {
    ....
  },
  "sort" : {
    "_script" : {
      "script" : "doc['comments'].values.length",
      "type" : "number",
      "order" : "desc"
    }
  }
}
Option #2: Keep an additional field for the number of comments: each time you add a comment, also increment the comments counter, and sort by it.
Option #2 is preferable if you have a lot of data. Running a script has overhead and can increase search time when it must be evaluated across a large collection of documents. Sorting by a regular field, on the other hand, performs much better. I would go with #2.
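For option #2, here is a sketch of what the search side could look like, assuming a `comments_count` field (the field name is illustrative, not part of the original mapping) that the application increments whenever a comment is added:

```json
{
  "query": {
    "match": { "title": "asdf" }
  },
  "sort": [
    { "comments_count": { "order": "desc" } }
  ]
}
```

Sorting on a plain numeric field like this avoids running the script against every matching document.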

Related

How do I get the top 20 most-searched queries in Elasticsearch?

I have stored sentences in Elasticsearch for autosuggestion, in this format:
{
  "text": "what is temperature in chicago"
}
It suggests correctly when "w", "wha", or "what" is typed, but I am wondering whether there is any way to fetch the most-searched sentences from Elasticsearch.
Sounds like what you need is a terms aggregation. Your request body should look something like this:
{
  "query": {
    //your query
  },
  "aggs": {
    "common" : {
      "terms" : { "field" : "text.keyword", "size": 20 }
    }
  }
}
If I understand your question correctly, you want the most common searches for a given input query; a simple solution can be implemented.
Just track which document the user finally selects and increment a counter for it, keyed by its _id.
A batch job that periodically syncs this data back into ES will leave each document with a counter value.
Use that value when serving suggestions, i.e. sort on the count field.
This will start working properly as users start using the system.
Your ES document would then look like:
{
  "text": "what is temperature in chicago",
  "count": 10
}
Note that this is a very raw solution; there are many alternatives, but it is a nice place to start.
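Assuming a `count` field like the one above is kept up to date, the suggestion query can sort on it directly. A sketch (prefix matching via `match_phrase_prefix` is just one option among several):

```json
{
  "query": {
    "match_phrase_prefix": { "text": "wha" }
  },
  "sort": [
    { "count": { "order": "desc" } }
  ]
}
```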

How to insert an Object into MongoDB

So I have an object using a dictionary to store products that a user has added to the cart in a shopping-cart application. I am taking this object and attempting to insert it into MongoDB with zero luck.
The piece of data I am attempting to insert looks like this:
products: '{"rJUg4uiGl":{"productPrice":"78.34","count":2},"BJ_7VOiGg":{"productPrice":"3","count":2}}' }
My attempt to insert it into MongoDB looks like this:
db.orders.insert("products":{"rJUg4uiGl":{"productPrice":"78.34","count":2},"BJ_7VOiGg":{"productPrice":"3","count":2}});
Currently with this approach I get the following error:
2016-12-15T18:11:43.862-0500 E QUERY [thread1] SyntaxError: missing ) after argument list #(shell):1:27
This implies there is some sort of formatting issue with the insert. I have moved quotation marks and parentheses around plenty, only to get either the above error or a `...` continuation prompt from the mongo shell implying that it is waiting for me to type something more.
Any chance anyone could help give some guidance in the best way to store this object in mongoDB?
My real question probably should have been about the Mongoose schema needed to store this data format. I hoped that learning how to insert it into MongoDB would be enough, but the way the data is being saved has me a bit confused. I know this is a bit of an awful question, but could I get some assistance with setting up my schema for this as well?
"products" : {
  "rJUg4uiGl" : {
    "productPrice" : "78.34",
    "count" : 2
  },
  "BJ_7VOiGg" : {
    "productPrice" : "3",
    "count" : 2
  }
}
This is what the data looks like when it is stored in Mongo. What is confusing me is the "rJUg4uiGl" portion of the data; I am unsure how exactly that is supposed to look in a Mongoose schema. Here are a few of my rather poor attempts:
products: {
  productId: {
    productPrice: Number,
    count: Number
  }
}
The above simply doesn't store anything in the database.
products: {
  productId: [{
    productPrice: Number,
    count: Number
  }]
}
The above gives:
"products" : {
  "productId" : [ ]
}
Again, I know that this is quite specific but any help at all would be extremely appreciated.
You need to wrap your insert data in {}:
db.orders.insert({"products":{"rJUg4uiGl":{"productPrice":"78.34","count":2},"BJ_7VOiGg":{"productPrice":"3","count":2}}});
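Another likely cause of trouble here is that the original `products` value was a JSON string rather than an object. A sketch of normalizing it before the insert (variable names are illustrative):

```javascript
// The cart data arrives as a JSON string (as shown in the question).
const raw = '{"rJUg4uiGl":{"productPrice":"78.34","count":2},"BJ_7VOiGg":{"productPrice":"3","count":2}}';

// Parse it into a real object so MongoDB stores nested documents,
// not one opaque string.
const products = JSON.parse(raw);

console.log(products.rJUg4uiGl.count);      // 2
console.log(Object.keys(products).length);  // 2

// In the shell you would then insert the wrapped object:
// db.orders.insert({ "products": products });
```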

Storing a complex Query within MongoDb Document [duplicate]

This is the case: a webshop in which I want to configure which items should be listed in the shop based on a set of parameters.
I want this to be configurable, because that allows me to experiment with different parameters and change their values easily.
I have a Product collection that I want to query based on multiple parameters.
A couple of these are found here:
within product:
"delivery" : {
  "maximum_delivery_days" : 30,
  "average_delivery_days" : 10,
  "source" : 1,
  "filling_rate" : 85,
  "stock" : 0
}
but also other parameters exist.
An example of such a query to decide whether or not to include a product could be:
{
  "$or" : [
    { "delivery.stock" : 1 },
    {
      "$or" : [
        {
          "$and" : [
            { "delivery.maximum_delivery_days" : { "$lt" : 60 } },
            { "delivery.filling_rate" : { "$gt" : 90 } }
          ]
        },
        {
          "$and" : [
            { "delivery.maximum_delivery_days" : { "$lt" : 40 } },
            { "delivery.filling_rate" : { "$gt" : 80 } }
          ]
        },
        {
          "$and" : [
            { "delivery.delivery_days" : { "$lt" : 25 } },
            { "delivery.filling_rate" : { "$gt" : 70 } }
          ]
        }
      ]
    }
  ]
}
Now, to make this configurable, I need to be able to handle boolean logic, parameters, and values.
So I got the idea, since such a query is itself JSON, to store it in Mongo and have my Java app retrieve it.
Next thing is using it in the filter (e.g. find, or whatever) and work on the corresponding selection of products.
The advantage of this approach is that I can actually analyse the data and the effectiveness of the query outside of my program.
I would store it by name in the database. E.g.
{
  "name": "query1",
  "query": { the thing printed above starting with "$or"... }
}
using:
db.queries.insert({
  "name" : "query1",
  "query": { the thing printed above starting with "$or"... }
})
Which results in:
2016-03-27T14:43:37.265+0200 E QUERY Error: field names cannot start with $ [$or]
at Error (<anonymous>)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:161:19)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:165:18)
at insert (src/mongo/shell/bulk_api.js:646:20)
at DBCollection.insert (src/mongo/shell/collection.js:243:18)
at (shell):1:12 at src/mongo/shell/collection.js:161
But I CAN store it using Robomongo, though not always. Obviously I am doing something wrong, but I have NO IDEA what it is.
If it fails and I create a brand-new collection and try again, it succeeds. Weird stuff that goes beyond what I can comprehend.
But when I try updating values in the "query", the changes are not going through. Never. Not even sometimes.
I can, however, create a new object and discard the previous one. So the workaround is there.
db.queries.update(
  {"name": "query1"},
  {"$set": {
    ... update goes here ...
  }}
)
Doing this results in:
WriteResult({
  "nMatched" : 0,
  "nUpserted" : 0,
  "nModified" : 0,
  "writeError" : {
    "code" : 52,
    "errmsg" : "The dollar ($) prefixed field '$or' in 'action.$or' is not valid for storage."
  }
})
This seems pretty close to the other message above.
Needless to say, I am pretty clueless about what is going on here, so I hope some of the wizards here are able to shed some light on the matter.
I think the error message contains the important info you need to consider:
QUERY Error: field names cannot start with $
Since you are trying to store a query (or part of one) in a document, you'll end up with attribute names that contain Mongo operator keywords (such as $or, $ne, $gt). The Mongo documentation actually references this exact scenario (emphasis added):
Field names cannot contain dots (i.e. .) or null characters, and they must not start with a dollar sign (i.e. $)...
I wouldn't trust 3rd party applications such as Robomongo in these instances. I suggest debugging/testing this issue directly in the mongo shell.
My suggestion would be to store an escaped version of the query in your document so as not to interfere with reserved operator keywords. You can use JSON.stringify(my_obj) to encode your partial query into a string, and then parse/decode it when you retrieve it later on: JSON.parse(escaped_query_string_from_db)
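A sketch of that encode/decode round trip (the collection and field names are illustrative):

```javascript
// The query uses operator keywords ($or, $and, $lt) as keys, which
// MongoDB rejects when they appear as stored field names.
const query = {
  "$or": [
    { "delivery.stock": 1 },
    {
      "$and": [
        { "delivery.maximum_delivery_days": { "$lt": 60 } },
        { "delivery.filling_rate": { "$gt": 90 } }
      ]
    }
  ]
};

// Encode the query to a plain string before storing it...
const doc = { name: "query1", query: JSON.stringify(query) };
// db.queries.insert(doc);  // the string is safe to store

// ...and decode it after retrieving the document.
const restored = JSON.parse(doc.query);
console.log(restored.$or.length);  // 2
```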
Your approach of storing the query as a JSON object in MongoDB is not viable.
You could potentially store your query logic and fields in MongoDB, but you have to have an external app build the query with the proper MongoDB syntax.
MongoDB queries contain operators, and some of those have special characters in them.
There are rules for MongoDB field names, and those rules do not allow for special characters.
Look here: https://docs.mongodb.org/manual/reference/limits/#Restrictions-on-Field-Names
The probable reason you can sometimes successfully create the doc using Robomongo is because Robomongo is transforming your query into a string and properly escaping the special characters as it sends it to MongoDB.
This also explains why your attempt to update them never works: you tried to create a document, but instead created something that is a string object, so your update conditions are probably not matching any docs.
I see two problems with your approach.
In the following query
db.queries.insert({
  "name" : "query1",
  "query": { the thing printed above starting with "$or"... }
})
valid JSON expects key/value pairs. Here, in "query", you are storing an object without a key. You have two options: either store the query as text or create another key inside the curly braces.
The second problem is that you are storing query values without wrapping them in quotes. All string values must be wrapped in quotes.
So your final document should look like:
db.queries.insert({
  "name" : "query1",
  "query": 'the thing printed above starting with "$or"... '
})
Now try it; it should work.
Obviously my attempt to store a query in Mongo the way I did was foolish, as became clear from the answers from both #bigdatakid and #lix. So what I finally did was this: I altered the naming of the fields to comply with the Mongo requirements.
E.g. instead of $or I used _$or, and instead of using a . inside a name I used a #, both of which I replace in my Java code.
This way I can still easily try and test the queries outside of my program. In my Java program I just change the names back and use the query, using just two lines of code. It simply works now. Thanks, guys, for the suggestions you made.
String documentAsString = query.toJson().replaceAll("_\\$", "\\$").replaceAll("#", ".");
Object q = JSON.parse(documentAsString);

In MongoDB, will an index help when a field is just being tested on its length?

I am creating a routine to check for interrupted processing and to carry on. During startup I'm performing the following search:
.find({"DocumentsPath": {$exists: true, $not: {$size: 0}}})
I want it to be as fast as possible; however, the documentation suggests that an index is for scanning within the data. I never need to search within "DocumentsPath", just use it if it's there. Creating an index seems like overhead I don't want, although having the index might speed up the size test.
My question is whether this field should be indexed within the DB?
I thought of commenting, but this does deserve an answer. Should this be indexed? Well, probably, but for other purposes. Does this make a difference here? No, it does not.
The big point to make is that your query terms are redundant (or could be better) in this case. Let's look at the example:
{ "DocumentsPath": { "$exists": true } }
That will tell you if there is actually an element in a document that matches the property specified. No, it does not and cannot use an index. You can use a "sparse" index, though, and then not even need that clause.
{ "DocumentsPath": { "$not": { "$size" : 0 } } }
This is a cute one. Yes, it tests the length of an array, but what you are really asking here is "I don't want the array to be empty".
So for the better solution.
Use a "sparse" index:
db.collection.ensureIndex({ "DocumentsPath": 1 }, { "sparse": true })
Query for the zeroth element of the array:
{ "DocumentsPath.0": { "$exists": true } }
There is still no index used for the "matching" per se, but at least the "sparse" index sorted out some of that by excluding documents, and the "dot notation" form here is actually more efficient to evaluate than $size.

What's the best way of saving a document with revisions in a key-value store?

I'm new to Key-Value Stores and I need your recommendation. We're working on a system that manages documents and their revisions. A bit like a wiki does. We're thinking about saving this data in a key value store.
Please don't just recommend the database you personally prefer; we want to abstract the storage layer so we can use many different key-value databases. We're using node.js so we can easily work with JSON.
My question is: what should the structure of the database look like? We have metadata for each document (timestamp, lasttext, id, latestrevision) and we have data for each revision (the change, the author, timestamp, etc.). So which key/value structure do you recommend?
Thanks!
Cribbed from the MongoDB groups. It is somewhat specific to MongoDB; however, the ideas are pretty generic.
Most of these history implementations break down to two common strategies.
Strategy 1: embed history
In theory, you can embed the history of a document inside of the document itself. This can even be done atomically.
> db.docs.save( { _id : 1, text : "Original Text" } )
> var doc = db.docs.findOne()
> db.docs.update( {_id: doc._id}, { $set : { text : 'New Text' }, $push : { hist : doc.text } } )
> db.docs.find()
{ "_id" : 1, "hist" : [ "Original Text" ], "text" : "New Text" }
Strategy 2: write history to separate collection
> db.docs.save( { _id : 1, text : "Original Text" } )
> var doc = db.docs.findOne()
> db.docs_hist.insert ( { orig_id : doc._id, ts : Math.round((new Date()).getTime() / 1000), data : doc } )
> db.docs.update( {_id:doc._id}, { $set : { text : 'New Text' } } )
Here you'll see that I do two writes: one to the master collection and one to the history collection.
To get fast history lookups, just grab the original ID:
> db.docs_hist.ensureIndex( { orig_id : 1, ts : 1 })
> db.docs_hist.find( { orig_id : 1 } ).sort( { ts : -1 } )
Both strategies can be enhanced by only displaying diffs.
You could also hybridize by adding a link from the history collection back to the original collection.
What's the best way of saving a document with revisions in a key-value store? It's hard to say there is a "best way"; there are obviously some trade-offs being made here.
Embedding:
- atomic changes on a single doc
- can result in large documents, and may break the reasonable size limits
- you probably have to enhance your code to avoid returning the full history when it is not needed
Separate collection:
- easier to write queries
- not atomic; needs two operations (do you have transactions?)
- more storage space (extra indexes on original docs)
I'd keep a hierarchy of the real data under each document with the revision data attached, for instance:
[
  {
    "timestamp" : "2011040711350621",
    "data" : { ... the real data here ... }
  },
  {
    "timestamp" : "2011040711350716",
    "data" : { ... the real data here ... }
  }
]
Then use the push operation to add new versions and periodically remove the old versions. You can use the last (or first) filter to only get the latest copy at any given time.
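That push-and-prune step can be sketched in plain code as follows (in MongoDB itself this maps to `$push` with the `$each`/`$slice` modifiers; the revision cap of 2 is just for the example):

```javascript
// Append a new revision and keep only the newest `cap` entries,
// mirroring $push with { $each: [rev], $slice: -cap }.
function pushRevision(doc, rev, cap) {
  doc.revisions.push(rev);
  doc.revisions = doc.revisions.slice(-cap);
  return doc;
}

const doc = {
  revisions: [
    { timestamp: "2011040711350621", data: { text: "v1" } },
    { timestamp: "2011040711350716", data: { text: "v2" } }
  ]
};

pushRevision(doc, { timestamp: "2011040711350830", data: { text: "v3" } }, 2);

// Only the two newest revisions remain; the last one is the latest.
console.log(doc.revisions.length);                               // 2
console.log(doc.revisions[doc.revisions.length - 1].data.text);  // "v3"
```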
I think there are multiple approaches, and this question is old, but I'll give my two cents as I was working on this earlier this year. I have been using MongoDB.
In my case, I had a User account that then had Profiles on different social networks. We wanted to track changes to social network profiles and wanted revisions of them so we created two structures to test out. Both methods had a User object that pointed to foreign objects. We did not want to embed objects from the get-go.
A User looked something like:
User {
  "tags" : [Tags],
  "notes" : "Notes",
  "facebook_profile" : <combo_foreign_key>,
  "linkedin_profile" : <same as above>
}
Then, for the combo_foreign_key, we used this pattern (using Ruby interpolation syntax for simplicity):
combo_foreign_key = "#{User.key}__#{new_profile.last_updated_at}"
facebook_profiles {
  combo_foreign_key: facebook_profile,
  ... and you keep adding your foreign objects in this pattern
}
This gave us O(1) lookup of the latest FacebookProfile of a User, but required us to keep the latest FK stored in the User object. If we wanted all of the FacebookProfiles, we would ask for all keys in the facebook_profiles collection with the prefix "#{User.key}__", which was O(N)...
The second strategy we tried was storing an array of those FacebookProfile keys on the User object so the structure of the User object changed from
"facebook_profile" : <combo_foreign_key>
to
"facebook_profile" : [<combo_foreign_key>]
Here we'd just append the new combo key when we added a new profile variation, then do a quick sort of the "facebook_profile" attribute and index on the largest one to get our latest profile copy. This method had to sort M strings and then look up the FacebookProfile keyed by the largest item in that sorted list. It is a little slower for grabbing the latest copy, but it gave us the advantage of knowing every version of a User's FacebookProfile in one swoop, and we did not have to worry about ensuring that the foreign key was really the latest profile object.
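The second strategy's "sort and take the largest key" step can be sketched like this (the key format follows the `#{User.key}__#{last_updated_at}` pattern above; the user key and timestamps are made-up values):

```javascript
// Combo keys embed the owner key and a sortable timestamp.
function comboKey(userKey, lastUpdatedAt) {
  return `${userKey}__${lastUpdatedAt}`;
}

const user = {
  facebook_profile: [
    comboKey("user42", "2011040711350621"),
    comboKey("user42", "2011040711350830"),
    comboKey("user42", "2011040711350716")
  ]
};

// Sort the stored keys and take the largest to find the latest profile.
// This works because the timestamp portion sorts lexicographically
// when it is zero-padded and fixed-width.
const sorted = [...user.facebook_profile].sort();
const latestKey = sorted[sorted.length - 1];
console.log(latestKey);  // "user42__2011040711350830"
```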
At first our revision counts were pretty small and they both worked pretty well. I think I prefer the first one over the second now.
Would love input from others on ways they went about solving this issue. The GIT idea suggested in another answer actually sounds really neat to me and for our use case would work quite well... Cool.
