Find field value that is found in most documents - search

Suppose there are documents that represent books, each with a field called author. Which aggregation(s) can retrieve the author value that appears in the most documents? In other words, which author has written the most books?
In case it's not clear from the tag, the question refers to Elasticsearch.
For example:
{
"name" : "Book1",
"author" : "John"
},
{
"name" : "Book3",
"author" : "Mike"
},
{
"name" : "Book2",
"author" : "John"
},
{
"name" : "Book4",
"author" : "Frank"
}
For the above data, John must be returned, since two documents have him as the author while the others have only one book each.
I've tried value_count and cardinality, but these only return a count and not the value itself.

Actually, I found that this is quite simple with a terms aggregation. I'm leaving the question up in case others find it useful.
Reference
For example, with the data above:
{
"aggs": {
"author_count": {
"terms": {
"size": 2,
"field": "book.author"
}
}
}
}
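To make the idea concrete, here is a minimal sketch in plain JavaScript of what a terms aggregation computes: it groups documents by a field value, counts them, and returns the most frequent values first. The helper name topTerms is hypothetical and is not part of any Elasticsearch client; it only mimics the shape of the buckets (key and doc_count). By default the real aggregation orders buckets by document count, descending, so the first bucket holds the most frequent author.

```javascript
// Sample documents from the question.
const books = [
  { name: "Book1", author: "John" },
  { name: "Book3", author: "Mike" },
  { name: "Book2", author: "John" },
  { name: "Book4", author: "Frank" },
];

// Hypothetical helper (not an Elasticsearch API): count documents per field
// value and keep the `size` most frequent values, like terms buckets.
function topTerms(docs, field, size) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([key, doc_count]) => ({ key, doc_count }))
    .sort((a, b) => b.doc_count - a.doc_count)
    .slice(0, size);
}

const buckets = topTerms(books, "author", 1);
console.log(buckets); // [ { key: 'John', doc_count: 2 } ]
```

If only the single most frequent author is needed, a size of 1 in the aggregation should be enough.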

Related

Null when filtering Many-to-Many relationship with JHipster

I have an issue with filtering in JHipster.
Here is my (relevant) jhipster-jdl.jh file :
entity Exercise {
name String required
}
entity Difficulty {
name String required
}
entity Language {
name String required
}
relationship ManyToMany {
Exercise{language(name)} to Language
}
relationship ManyToOne {
Exercise{difficulty} to Difficulty
}
filter Exercise
I generated the Spring Boot service with JHipster and did not change anything.
Let's say I have an exercise called "test" with difficulty "easy" and languages "spanish" and "dutch".
When I query the GET exercises endpoint with the filter name.equals=test:
http://localhost:8080/myservice/api/exercises?name.equals=test
I get this answer:
[
{
"id" : 1000,
"difficulty" : {
"id": 5,
"name": "easy"
},
"languages" : null,
"name" : "test"
}
]
As you can see, the issue is that the languages linked to my exercise are missing: the languages field is null.
Note that the difficulty field is fine because it is a many-to-one relationship.
The database is not the source of the issue, because if I query the GET exercises/{id} endpoint with the exercise's id:
http://localhost:8080/myservice/api/exercises/1000
I get the right result:
{
"id" : 1000,
"difficulty" : {
"id": 5,
"name": "easy"
},
"languages" : [
{
"id" : 200,
"name" : "spanish"
},
{
"id" : 205,
"name" : "dutch"
}
],
"name" : "test"
}
Now let's query the GET exercises endpoint with the filter languageId.greaterOrEqualThan=200 (for the sake of the example):
http://localhost:8080/myservice/api/exercises?languageId.greaterOrEqualThan=200
The response is then:
[
{
"id" : 1000,
"difficulty" : {
"id": 5,
"name": "easy"
},
"languages" : null,
"name" : "test"
},
{
"id" : 1000,
"difficulty" : {
"id": 5,
"name": "easy"
},
"languages" : null,
"name" : "test"
}
]
Notice that the exercise comes out twice (or n times if it has n languages meeting the constraint; I checked), which is problematic.
I feel like something in the JHipster generator is broken, but that seems unlikely because I did not find anybody else talking about this quite crippling issue.
Did I do something wrong when generating my JHipster project? Or is it a real issue?
Please feel free to ask for any other piece of code; I'm not sure what could be relevant. Thanks.
Note: I noticed that the exercise endpoint filters for the languages field use the singular (e.g. language.equals). I don't know if this is normal for a many-to-many relationship.

update array in mongoose which matches the condition

My schema looks like this:
{
qty:{
property1:{
//something
},
property2:[{
size:40,
color:"black",
enabled:"true"
}]
}
}
property2 is an array. What I want to do is update, in a single query, every array object whose enabled is "true".
I tried writing the following query:
db.col.update({
"qty.property2.enabled" = "true"
}, {
"qty.property2.color" = "green"
}, callback)
But it is not working.
Error:
[main] Error: can't have . in field names [qty.pro.size]
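db.col.update({"qty.property2.enabled":"true"},{$set: {'qty.property2.$.color': 'green'}}, {multi: true})
This is the way to update an element inside an array.
The equals sign '=' cannot be used inside an object literal; use ':' between keys and values.
Updating a matched array element is done with the positional $ operator. Note that $ only updates the first matching element in each document.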
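Because the positional $ operator touches only the first matching element per document, updating every enabled element needs a different form. One option, assuming MongoDB 3.6 or later, is the filtered positional operator $[<identifier>] together with the arrayFilters option. The sketch below only builds the three arguments; the helper name buildRecolorUpdate and the identifier item are made up for illustration.

```javascript
// Hypothetical helper: builds the filter, update, and options for an update
// that recolors every array element whose enabled field is "true".
function buildRecolorUpdate(newColor) {
  return {
    // Select documents that contain at least one enabled element.
    filter: { "qty.property2.enabled": "true" },
    // $[item] refers to each array element accepted by arrayFilters.
    update: { $set: { "qty.property2.$[item].color": newColor } },
    // Only elements with enabled === "true" are modified.
    options: { arrayFilters: [{ "item.enabled": "true" }] },
  };
}

const { filter, update, options } = buildRecolorUpdate("green");

// In the mongo shell these would be passed as:
//   db.col.updateMany(filter, update, options)
console.log(JSON.stringify({ filter, update, options }, null, 2));
```

Note that enabled is stored as the string "true" in the schema above, so the filters compare against a string rather than a boolean.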
Alternative solution for multiple conditions:
db.foo.update({
_id:"i1",
replies: { $elemMatch:{
_id: "s2",
update_password: "abc"
}}
},
{
"$set" : {"replies.$.text" : "blah"}
}
);
Why
I was looking for a solution similar to the one in this question, but in my case the array element had to match multiple conditions, and the currently provided answers resulted in changes to the wrong elements.
If you need to match multiple fields, for example, let's say we have a document like this:
{
"_id" : ObjectId("i1"),
"replies": [
{
"_id" : ObjectId("s1"),
"update_password": "abc",
"text": "some stuff"
},
{
"_id" : ObjectId("s2"),
"update_password": "abc",
"text": "some stuff"
}
]
}
Trying to update with
db.foo.update({
_id:"i1",
"replies._id":"s2",
"replies.update_password": "abc"
},
{
"$set" : {"replies.$.text" : "blah"}
}
);
would result in updating an element that matches only one of the conditions. For example, it would update s1 because it matches the update_password condition, which is clearly wrong. I might have done something wrong, but the $elemMatch solution solved problems like that.
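The difference can be illustrated with a small plain-JavaScript sketch of the two matching rules. It does not call MongoDB; the function names matchesDotted and matchesElemMatch are hypothetical, and the sample array is chosen (an assumption, not the data above) so that no single element satisfies both conditions.

```javascript
// Dotted conditions ("replies._id", "replies.update_password"): each
// condition may be satisfied by a different element of the array.
function matchesDotted(replies, conditions) {
  return Object.entries(conditions).every(([field, value]) =>
    replies.some((reply) => reply[field] === value)
  );
}

// $elemMatch-style: one single element must satisfy every condition.
function matchesElemMatch(replies, conditions) {
  return replies.some((reply) =>
    Object.entries(conditions).every(([field, value]) => reply[field] === value)
  );
}

const replies = [
  { _id: "s1", update_password: "abc" },
  { _id: "s2", update_password: "xyz" },
];
const wanted = { _id: "s2", update_password: "abc" };

console.log(matchesDotted(replies, wanted));    // true
console.log(matchesElemMatch(replies, wanted)); // false
```

With dotted conditions the document still matches even though no reply has both values, which is why the positional $ can end up pointing at an unintended element.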
Suppose your document looks like this:
{
"_id" : ObjectId("4f9808648859c65d"),
"array" : [
{"text" : "foo", "value" : 11},
{"text" : "foo", "value" : 22},
{"text" : "foobar", "value" : 33}
]
}
Then your query will be
db.foo.update({"array.value" : 22}, {"$set" : {"array.$.text" : "blah"}})
where the first pair of curly brackets holds the query criteria and the second sets the new value.

what it means by "populate" in mongoose

So, generally, what is "populate"? I assume it refers to some action on the database.
I have heard of it before but never really understood it.
If you have a document pointing to another document (i.e. contains an ID reference), populate will fetch the referenced document.
For instance, if you have:
{
"_id" : "a",
"className" : "astroPhysics",
"teacher" : "b"
}
and
{
"_id" : "b",
"teacherName" : "John Smith"
}
Getting a and populating teacher will give the following result:
{
"_id" : "a",
"className" : "astroPhysics",
"teacher" : {
"_id" : "b",
"teacherName" : "John Smith"
}
}
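Conceptually, populate is a lookup followed by a substitution. The sketch below imitates that in plain JavaScript with an in-memory Map standing in for the teachers collection; populateField is a hypothetical helper, not Mongoose's API. In Mongoose itself, the equivalent is typically a ref declared in the schema plus a call to .populate("teacher") on the query.

```javascript
// In-memory stand-in for the collection holding the referenced documents.
const teachers = new Map([
  ["b", { _id: "b", teacherName: "John Smith" }],
]);

const classDoc = { _id: "a", className: "astroPhysics", teacher: "b" };

// Hypothetical helper: return a copy of `doc` in which the ID stored in
// `field` is replaced by the document it points to. In this sketch a
// reference that cannot be resolved becomes null.
function populateField(doc, field, collection) {
  const referenced = collection.get(doc[field]);
  return { ...doc, [field]: referenced ?? null };
}

const populated = populateField(classDoc, "teacher", teachers);
console.log(populated.teacher.teacherName); // John Smith
```

The original document is left untouched; only the returned copy carries the embedded teacher.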

Querying a property that is in a deeply nested array

So I have this document within the course collection
{
"_id" : ObjectId("53580ff62e868947708073a9"),
"startDate" : ISODate("2014-04-23T19:08:32.401Z"),
"scoreId" : ObjectId("531f28fd495c533e5eaeb00b"),
"rewardId" : null,
"type" : "certificationCourse",
"description" : "This is a description",
"name" : "testingAutoSteps1",
"authorId" : ObjectId("532a121e518cf5402d5dc276"),
"steps" : [
{
"name" : "This is a step",
"description" : "This is a description",
"action" : "submitCategory",
"value" : "532368bc2ab8b9182716f339",
"statusId" : ObjectId("5357e26be86f746b68482c8a"),
"_id" : ObjectId("53580ff62e868947708073ac"),
"required" : true,
"quantity" : 1,
"userId" : [
ObjectId("53554b56e3a1e1dc17db903f")
]
},...
What I want to do is create a query that returns all courses that have a specific userId in the userId array nested inside the steps array. I've tried using $elemMatch like so:
Course.find({
"steps": {
"$elemMatch": {
"userId": {
"$elemMatch": "53554b56e3a1e1dc17db903f"
}
}
}
},
But it seems to return an empty result.
I think this will work for you. Your syntax is a bit off, and you need to use ObjectId():
db.Course.find({ steps : { $elemMatch: { userId:ObjectId("53554b56e3a1e1dc17db903f")} } })
$elemMatch is not necessary unless the nested array elements are compound sub-documents, and even then only when the value being matched could also appear in another sub-document of the array.
Since this is an ObjectId we are talking about, it is going to be unique, at least within this array. So just use the "dot-notation" form:
Course.find({
"steps.userId": ObjectId("53554b56e3a1e1dc17db903f")
},
Go back and look at the $elemMatch documentation. In this case, the direct "dot-notation" form is all you need.
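As a rough illustration of what the "steps.userId" path matches, here is a plain-JavaScript sketch: the document matches if any step has the wanted value anywhere in its userId array. The helper matchesStepsUserId is hypothetical, and IDs are plain strings here for simplicity, whereas the real query has to compare against ObjectId values.

```javascript
// Trimmed-down version of the course document from the question.
const course = {
  name: "testingAutoSteps1",
  steps: [
    { name: "This is a step", userId: ["53554b56e3a1e1dc17db903f"] },
  ],
};

// Hypothetical helper: true when any element of `steps` contains `userId`
// in its own userId array, which is what the dotted path expresses.
function matchesStepsUserId(doc, userId) {
  return doc.steps.some((step) => step.userId.includes(userId));
}

console.log(matchesStepsUserId(course, "53554b56e3a1e1dc17db903f")); // true
console.log(matchesStepsUserId(course, "000000000000000000000000")); // false
```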

Best document format for addressbook in CouchDB

I have tried really hard, but I can't understand CouchDB :( I need to record the contact details of several people. Should I put every contact in a single document?
"1st document"
{
"names" : [
"Jake", "Lock"
],
"numbers" : [
"Jake's number", "Lock's number"
]
}
Future records:
"1st document"
{
"names" : [
"Jake", "Lock", "Kate", "Jin", ...
],
"numbers" : [
"Jake's number", "Lock's number", "Kate's number", "Jin's number", ...
]
}
Or in separate documents?
"1st document"
{
"name" : "Jake",
"number" : "Jake's number"
}
"2nd document"
{
"name" : "Lock",
"number" : "Lock's number"
}
Future records:
"1st document"
{
"name" : "Jake",
"number" : "Jake's number"
}
"2nd document"
{
"name" : "Lock",
"number" : "Lock's number"
}
"3rd document"
{
"name" : "Kate",
"number" : "Kate's number"
}
"4th document"
{
"name" : "Jin",
"number" : "Jin's number"
}
...
I'm confused. Can somebody help me?
Thanks.
I assume you are storing these contacts to form some kind of address-book style application. Going with this assumption, I would say your second example is exactly what you want to be doing. The way I look at it, each "contact" is a single document. All the attributes for this contact belong within the document.
{
name: "John Smith",
number: "+44 1234 567890"
}
To take this a bit further, in the future you might decide you wish to store multiple numbers per person, perhaps of different types. I would embed these all inside the document for the particular contact:
{
name: "John Smith",
numbers: [
{ number: "+44 1234 567890", type: "home" },
{ number: "+44 7798 987654", type: "mobile" },
{ number: "+44 1234 987123", type: "work" }
]
}
I find a good way to approach designing a model for use in a document database is to consider what items you will wish to use independently. For those which make sense on their own, they should probably go inside their own document. For those which only make sense when viewed in the context of their "container" object, embed them within it.
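With one document per contact, lookups that cut across contacts (for example, finding a person by phone number) are usually handled by a view whose map function emits one row per embedded item. The sketch below imitates that outside CouchDB: emit is a local stand-in that collects rows, and the document IDs and the second contact are assumptions made up for illustration.

```javascript
// One document per contact, with the numbers embedded.
const contacts = [
  {
    _id: "contact-1", // hypothetical ID
    name: "John Smith",
    numbers: [
      { number: "+44 1234 567890", type: "home" },
      { number: "+44 7798 987654", type: "mobile" },
    ],
  },
  {
    _id: "contact-2", // hypothetical ID
    name: "Jake",
    numbers: [{ number: "Jake's number", type: "mobile" }],
  },
];

// Local stand-in for CouchDB's emit(): collects the rows a view would hold.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function in the style of a CouchDB view: one row per phone number,
// keyed by the number so a contact can be looked up from it.
function map(doc) {
  (doc.numbers || []).forEach((entry) => emit(entry.number, doc.name));
}

contacts.forEach(map);
console.log(rows.length); // 3
```

Each contact stays a self-contained document, while the view provides the cross-contact index.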
I hope this helps you.
