Hi, I'm new to Elasticsearch and I'm using version 5.6. As far as I know, the _id of every document in Elasticsearch is unique,
but while re-indexing logs I found that some documents have the same _id. For example, the two logs below
have the same _id. How is this possible?
{
"_index": "orders",
"_type": "pending",
"_id": "1473531",
"_score": 1,
"_routing": "44540",
"_parent": "44540",
"_source": {
"id": 1473531,
"level": "info",
"type": "pending",
"status": "",
"message": "Order marked cancelled by system"
}
}
{
"_index": "orders",
"_type": "confirmed",
"_id": "1473531",
"_score": 1,
"_source": {
"id": 1473531,
"source_address": "Independence, MO 64055",
"dest_address": "MO 64138",
"short_source": "Select Physical Therapy",
"short_dest": "Home",
"customer_remarks": null,
"source_lat_long": ["39.0334554", "-94.3761432"],
"dest_lat_long": ["38.986449", "-94.4661768"]
}
}
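This is because the two documents have different types within the same index.
The first document is in index orders with type pending, while the second is in the same index orders but with type confirmed. In Elasticsearch 5.x an _id only has to be unique within a type.
In recent Elasticsearch versions types have been removed; refer to "removal of types" in the Elasticsearch documentation for more information.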
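To make the point concrete, here is a minimal Python sketch (no Elasticsearch client involved) of how identity works in 5.x: the internal _uid field combines _type and _id, so the two hits above do not collide. The helper name uid_of is made up for illustration.

```python
# Sketch only: models how Elasticsearch 5.x distinguishes documents.
# In 5.x the internal _uid is built from _type and _id ("type#id"),
# so the same _id may appear once per type within one index.

def uid_of(hit):
    """Return (index, uid) for a search hit; uid_of is a hypothetical helper."""
    return hit["_index"], "{}#{}".format(hit["_type"], hit["_id"])


hits = [
    {"_index": "orders", "_type": "pending", "_id": "1473531"},
    {"_index": "orders", "_type": "confirmed", "_id": "1473531"},
]

keys = [uid_of(h) for h in hits]
same_id = hits[0]["_id"] == hits[1]["_id"]
distinct_docs = len(set(keys)) == len(keys)
print(same_id, distinct_docs)
```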
I have an orders table.
The MongoDB version is 3.0.15.
The orders table has these fields:
{
"status": "new",
"_id": "5fd320a8b3d5133c8bf00c4a",
"categoryId": "5fb2e20f5ea2aa2d15cca1ef",
"specialization": "Tesst",
"requestedBy": "5fd232ecd7ba652c3a85f818",
"vendorId": "5fa908773410d8591aa9f550",
"updated_at": 1607671976182,
"created_at": 1607671976182,
},
requestedBy references the second table, users.
I want to fetch the user's details (username and user address) from the users table.
The users table has these fields:
{
"_id": "5fd232ecd7ba652c3a85f818",
"phone": "1234567890",
"otp" : "123456",
"progress_pictures": [],
"createdAt": "2020-12-10T14:38:36.045Z",
"updatedAt": "2020-12-10T14:39:35.127Z",
}
Does anyone know how I can get this data?
Order.find(
{
status : status
},
{
}
).exec();
This is my order query. It works fine, but it returns only the orders table data.
I found a solution using populate:
Order.find(
{
status : status
},
{
}
).populate([
{
path: 'requestedBy', // in order table user ref id
select : 'name phone user_address'
},{
path : 'vendorId', // in order table vendor ref id
select : 'vendorName vendorMobile vendorAddress vendorBusinessName'
}]).exec();
----------------------- RESPONSE -----------------------
{
"status": "new",
"_id": "5fd320a8b3d5133c8bf00c4a",
"categoryId": "5fb2e20f5ea2aa2d15cca1ef",
"specialization": "Tesst",
"requestedBy": {
"user_address": [],
"name" : "nikhil",
"_id": "5fd232ecd7ba652c3a85f818",
"phone": "1234567890"
},
"vendorId": {
"_id": "5fa908773410d8591aa9f550",
"vendorMobile": "1234567890",
"vendorAddress" : "demo address",
"vendorBusinessName" : "testing hub",
"vendorName": "KL"
},
"updated_at": 1607671976182,
"created_at": 1607671976182,
},
I have data like
id, title
'Deploying SQL Server Databases from Test to Live'
'Deploying SQL Server Databases for clients'
'Merge SQL Server databases'
'SQL Server : gather data from different databases'
.......
.......
and so on, for millions of records.
My search query looks like this:
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
bd={
'query':{
'match': {
'title': "Deploying SQL Server Databases from Test to Live"
}
},
'sort': {
'_score': {
'order': 'desc'
}
}
}
res = es.search(index='abc-index', body=bd)
My search result :
{
"took": 1297,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": 38.9089,
"hits": [
{
"_index": "abc-index",
"_type": "_doc",
"_id": "1",
"_score": 38.9089, #normalized this value to some range [a,b]
"_source": {
"id": 1,
"title": "Deploying SQL Server Databases from Test to Live"
}
},
{
"_index": "abc-index",
"_type": "_doc",
"_id": "2",
"_score": 25.427029, #normalized this value to some range [a,b]
"_source": {
"id": 2,
"title": "Deploying SQL Server Databases for clients"
}
},
{
"_index": "abc-index",
"_type": "_doc",
"_id": "3",
"_score": 19.293251, #normalized this value to some range [a,b]
"_source": {
"id": 3,
"title": "Merge SQL Server databases"
}
},
{
"_index": "abc-index",
"_type": "_doc",
"_id": "4",
"_score": 18.969624, #normalized this value to some range [a,b]
"_source": {
"id": 4,
"title": "SQL Server : gather data from different databases"
}
}
.......... # 10,000 query result
]
}
}
I want the _score values normalized to some range [a,b], for example [0,2]. Can anyone please help me with how to do that?
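One possible approach, sketched here under assumptions: rescale the scores on the client after the search returns, using min-max scaling over the hits of that one response. normalize_scores and the _normalized_score key are names invented for this sketch, not part of the Elasticsearch client, and the rescaled values are only comparable within a single response.

```python
def normalize_scores(hits, a=0.0, b=2.0):
    """Min-max rescale each hit's _score into [a, b] (client-side sketch).

    hits is the list found at res['hits']['hits']; the input is not modified.
    """
    if not hits:
        return []
    scores = [h["_score"] for h in hits]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    out = []
    for h in hits:
        # When every score is equal there is no spread; map them all to b.
        ratio = (h["_score"] - lo) / span if span else 1.0
        scaled = dict(h)
        scaled["_normalized_score"] = a + ratio * (b - a)
        out.append(scaled)
    return out


sample_hits = [
    {"_id": "1", "_score": 38.9089},
    {"_id": "2", "_score": 25.427029},
    {"_id": "3", "_score": 19.293251},
    {"_id": "4", "_score": 18.969624},
]
normalized = normalize_scores(sample_hits, 0, 2)
```

With the response from the query above, this would be called as normalize_scores(res['hits']['hits'], 0, 2). The best hit maps to b and the worst hit in the response maps to a.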
A user has many payments, a payment has many debtors, a debtor belongs to a user.
I am trying to find a user's payments that relate to another particular user.
I have a query which gets a user's payments, populated with all the debtors and the user information for each debtor.
const user_1 = await this.userModel
.findOne({email: "geoffery.brown#gmail.com"})
.populate({path: 'payments', populate: {path: 'debtors', populate: {path: 'user'}}})
which returns something like this:
{
"payments": [
{
"debtors": [
{
"_id": "5a9531b0de918e42c94947cc",
"amount": 15,
"user": {
"payments": [],
"created_at": "2018-02-27T10:14:39.847Z",
"_id": "5a95300388740142774f49c9",
"first_name": "John",
"last_name": "Smith",
"email": "john.smith#gmail.com",
"__v": 0
},
"__v": 0
},
{
"_id": "5a9531b0de918e42c94947cd",
"amount": 10,
"user": {
"payments": [],
"created_at": "2018-02-27T10:14:39.847Z",
"_id": "5a95302188740142774f49ca",
"first_name": "Joe",
"last_name": "Blogs",
"email": "joe.blogs#hotmail.com",
"__v": 0
},
"__v": 0
}
],
"created_at": "2018-02-27T10:23:31.561Z",
"_id": "5a9531b0de918e42c94947ce",
"date": "2018-02-26T10:54:36.167Z",
"reference": "Food",
"__v": 0
}
],
"created_at": "2018-02-27T10:14:39.847Z",
"_id": "5a952fc488740142774f49c8",
"first_name": "Geoffery",
"last_name": "Brown",
"email": "geoffery.brown#gmail.com",
"__v": 0
}
I want my Mongo query to filter the debtors so that only those where email === "john.smith#gmail.com" are returned.
Is this possible with my current mongodb structure?
Mongoose populate does not support this. You can use an aggregation like the one below in MongoDB 3.4.
The concept is similar to populate, but all the heavy lifting is done in a single server call inside the aggregation framework.
The $lookup stage pulls data from the referenced collections.
The $unwind stage flattens the structure for the subsequent lookups.
The two $group stages push the debtors back into each payment, and then the payments array back into the main document.
this.userModel.aggregate([
{"$match":{"email": "geoffery.brown#gmail.com"}},
{"$lookup":{
"from":"payments", // name of the collection
"localField":"payments",
"foreignField":"_id",
"as":"payments"
}},
{"$unwind":"$payments"},
{"$lookup":{
"from":"debtors", // name of the collection
"localField":"payments.debtors",
"foreignField":"_id",
"as":"debtors"
}},
{"$project":{"payments.debtors":0}},
{"$unwind":"$debtors"},
{"$lookup":{
"from":"users", // name of the collection
"localField":"debtors.user",
"foreignField":"_id",
"as":"debtors.user"
}},
{"$unwind":"$debtors.user"},
{"$match":{"debtors.user.email":"john.smith#gmail.com"}},
{"$group":{
"_id":{id:"$_id",payment_id:"$payments._id"},
"created_at":{"$first":"$created_at"},
"first_name":{"$first":"$first_name"},
"last_name": {"$first":"$last_name"},
"email": {"$first":"$email"},
"payments":{"$first":"$payments"},
"debtors":{"$push":"$debtors"}
}},
{"$addFields":{"payments.debtors":"$debtors"}},
{"$project":{"debtors":0}},
{"$group":{
"_id":"$_id.id",
"created_at":{"$first":"$created_at"},
"first_name":{"$first":"$first_name"},
"last_name": {"$first":"$last_name"},
"email": {"$first":"$email"},
"payments":{"$push":"$payments"}
}}
]).exec(function() {...})
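For comparison, when a user has only a handful of payments, the same filtering can be done in application code on the already populated document. The sketch below uses plain Python dictionaries shaped like the sample output above; filter_debtors_by_email is a hypothetical helper, and unlike the aggregation it does the work on the client rather than on the server.

```python
def filter_debtors_by_email(user_doc, email):
    """Keep only debtors whose populated user has the given email.

    Payments left with no matching debtor are dropped, mirroring the
    $match stage in the pipeline above. The input is not modified.
    """
    payments = []
    for payment in user_doc.get("payments", []):
        debtors = [
            d for d in payment.get("debtors", [])
            # "user" may be missing or None if the reference did not resolve.
            if (d.get("user") or {}).get("email") == email
        ]
        if debtors:
            payments.append(dict(payment, debtors=debtors))
    return dict(user_doc, payments=payments)


user_1 = {
    "email": "geoffery.brown#gmail.com",
    "payments": [{
        "reference": "Food",
        "debtors": [
            {"amount": 15, "user": {"email": "john.smith#gmail.com"}},
            {"amount": 10, "user": {"email": "joe.blogs#hotmail.com"}},
        ],
    }],
}
filtered = filter_debtors_by_email(user_1, "john.smith#gmail.com")
```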
This is my Friends Collection
[
{
"_id": "59e4fbcac23f38cdfa6963a8",
"friend_id": "59e48f0af8c277d7a8886ed7",
"user_id": "59e1d36ad17ad5ad3d0453f7",
"__v": 0,
"created_at": "2017-10-16T18:34:50.875Z"
},
{
"_id": "59e5065f705a90cfa218c9e5",
"friend_id": "59e48f0af8c277d7a8886edd",
"user_id": "59e1d36ad17ad5ad3d0453f7",
"__v": 0,
"created_at": "2017-10-16T19:19:59.483Z"
}
]
This is my Scores collection:
[
{
"_id": "59e48f0af8c277d7a8886ed8",
"score": 19,
"user_id": "59e48f0af8c277d7a8886ed7",
"created_at": "2017-10-13T09:02:10.010Z"
},
{
"_id": "59e48f0af8c277d7a8886ed9",
"score": 24,
"user_id": "59e48f0af8c277d7a8886ed7",
"created_at": "2017-10-11T00:56:10.010Z"
},
{
"_id": "59e48f0af8c277d7a8886eda",
"score": 52,
"user_id": "59e48f0af8c277d7a8886ed7",
"created_at": "2017-10-24T09:16:10.010Z"
},
]
This is my Users collection.
[
{
"_id": "59e48f0af8c277d7a8886ed7",
"name": "testuser_0",
"thumbnail": "path_0"
},
{
"_id": "59e48f0af8c277d7a8886edd",
"name": "testuser_1",
"thumbnail": "path_1"
},
{
"_id": "59e48f0af8c277d7a8886ee3",
"name": "testuser_2",
"thumbnail": "path_2"
},
{
"_id": "59e48f0af8c277d7a8886ee9",
"name": "testuser_3",
"thumbnail": "path_3"
},
]
Finally, I need a list of friends sorted by high score for a particular time period (say, the last 24 hours), something like this:
[
{
"friend_id": "59e48f0af8c277d7a8886ed7",
"friend_name":"test_user_2",
"thumbnail":"image_path",
"highscore":15
},
{
"friend_id": "59e48f0af8c277d7a8886edd",
"friend_name":"test_user_3",
"thumbnail":"image_path",
"highscore":10
}
]
What's the best way to achieve this? I have tried the aggregation pipeline, but I am getting quite confused working with 3 collections.
Following your answers, an array of 500 entries in a document may not be a bad idea for storing the friends, as you would only store the friend id and the created date in each entry. It saves having a separate collection.
You should not run into many performance issues if you project the data in your query, selecting only the fields you want.
https://docs.mongodb.com/v3.2/tutorial/project-fields-from-query-results/#return-specified-fields-only
For the scores, which grow by 30 per day, it depends on what type of query you run.
It would take a while to reach the 16MB document size limit by adding 30 scores per day.
Regarding joining the different collections, there is a Stack Overflow question about it:
How do I perform the SQL Join equivalent in MongoDB?
or
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
You will need to use MongoDB's aggregation framework for it, not just a find command.
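As a rough sketch of what such an aggregation could look like for the three-collection layout in the question: the function below only builds the pipeline as plain Python data, so it runs without a database. It assumes MongoDB 3.2 or later (for $lookup), collections named scores and users, ids stored with the same type in all three collections, and created_at stored as a date rather than a string. friend_highscore_pipeline is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def friend_highscore_pipeline(user_id, since):
    """Build an aggregation pipeline to run on the friends collection.

    Collection and field names follow the samples in the question;
    adjust them if your schema differs.
    """
    return [
        # 1. Friends of the given user.
        {"$match": {"user_id": user_id}},
        # 2. Join every score recorded by each friend.
        {"$lookup": {
            "from": "scores",
            "localField": "friend_id",
            "foreignField": "user_id",
            "as": "score",
        }},
        {"$unwind": "$score"},
        # 3. Keep only scores inside the time window.
        {"$match": {"score.created_at": {"$gte": since}}},
        # 4. Best score per friend.
        {"$group": {"_id": "$friend_id", "highscore": {"$max": "$score.score"}}},
        # 5. Join the friend's user record for name and thumbnail.
        {"$lookup": {
            "from": "users",
            "localField": "_id",
            "foreignField": "_id",
            "as": "user",
        }},
        {"$unwind": "$user"},
        {"$project": {
            "_id": 0,
            "friend_id": "$_id",
            "friend_name": "$user.name",
            "thumbnail": "$user.thumbnail",
            "highscore": 1,
        }},
        {"$sort": {"highscore": -1}},
    ]


since = datetime.now(timezone.utc) - timedelta(hours=24)
pipeline = friend_highscore_pipeline("59e1d36ad17ad5ad3d0453f7", since)
```

The resulting list would be passed to the driver's aggregate call on the friends collection. Note that friends with no score inside the window drop out of the result, because $unwind and the second $match discard them.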
How do I positive-boost the absence of certain terms? I've asked this question before here but the response was not satisfactory because it wasn't generalizable enough.
Let's try again, with more nuance.
I want to be able to distinguish laptops from their accessories. In human language this is done by the absence of terms. That is, when you say lenovo thinkpad, you know that by omitting the word battery you mean the actual laptop. Compare this with a person saying lenovo thinkpad battery, where they mean the battery.
So suppose we have the index:
PUT test_index
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}
}
with mapping:
PUT test_index/_mapping/merchant
{
"properties": {
"title": {
"type": "string"
},
"category": {
"type": "string",
"index": "not_analyzed"
}
}
}
Put three items into it:
PUT test_index/merchant/3
{
"title": "macbook battery",
"category": "laptops accessories"
}
PUT test_index/merchant/2
{
"title": "lenovo thinkpad battery",
"category": "laptops accessories"
}
PUT test_index/merchant/1
{
"title": "lenovo thinkpad white/black",
"category": "laptops"
}
Now search lenovo thinkpad:
POST test_index/_search
{
"query":{
"match": { "title": "lenovo thinkpad" }
}
}
The result is:
"hits": [
{
"_index": "test_index",
"_type": "merchant",
"_id": "2",
"_score": 0.70710677,
"_source": {
"title": "lenovo thinkpad battery",
"category": "laptops accessories"
}
},
{
"_index": "test_index",
"_type": "merchant",
"_id": "1",
"_score": 0.70710677,
"_source": {
"title": "lenovo thinkpad white/black",
"category": "laptops"
}
}
]
Notice that lenovo thinkpad battery is listed above lenovo thinkpad white/black.
Now, I can see at least two reasonable ways to do this.
A) Use term frequency on a per-category basis to influence the relevance of the title match. For example, if for each category you extract the 95th-percentile terms, you find that battery is a high-frequency term in laptops accessories, and so the word battery should be negative-boosted on all title queries that do not contain it.
B) Use term frequency on a per-category basis to influence the relevance of the category match. For example, in addition to the title match, you automatically negative-boost results whose categories have 95th-percentile terms which aren't contained in your title query.
A and B aren't quite the same, but they both rely on the idea that certain absent words should be taken into account for relevance.
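The shared idea behind A and B can be sketched offline in plain Python. This is only an illustration under assumptions: it replaces the percentile cut with a simpler share threshold (a term counts as frequent when it appears in at least min_share of a category's titles), tokenizes on whitespace, and uses hypothetical function names. How the result is turned into a negative boost is left open.

```python
from collections import Counter, defaultdict


def frequent_terms_by_category(docs, min_share=0.95):
    """Terms that occur in at least min_share of a category's titles."""
    doc_counts = Counter()
    term_counts = defaultdict(Counter)
    for doc in docs:
        category = doc["category"]
        doc_counts[category] += 1
        # Count each term once per title (document frequency).
        term_counts[category].update(set(doc["title"].lower().split()))
    return {
        category: {t for t, n in counts.items()
                   if n / doc_counts[category] >= min_share}
        for category, counts in term_counts.items()
    }


def absent_marker_terms(query, frequent_terms):
    """Per category, the frequent terms that the query does not mention."""
    query_terms = set(query.lower().split())
    return {c: terms - query_terms for c, terms in frequent_terms.items()}


docs = [
    {"title": "macbook battery", "category": "laptops accessories"},
    {"title": "lenovo thinkpad battery", "category": "laptops accessories"},
    {"title": "lenovo thinkpad white/black", "category": "laptops"},
]
frequent = frequent_terms_by_category(docs)
absent = absent_marker_terms("lenovo thinkpad", frequent)
```

For the query lenovo thinkpad, the accessories category is flagged by the missing term battery. With only one title in the laptops category every one of its terms looks frequent, which shows that the statistic only becomes meaningful on a realistically sized catalogue.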
So...... thoughts?
My vote would be
C)
Fix the categories so that a battery doesn't have 'laptops' as a category (it's a 'laptopAccessory' or just 'accessory'). Alternatively, create an additional category (not called 'laptops') to indicate the actual machines themselves.
In your search, instead of trying to down-rank the accessories, you apply a boost to the 'laptops' category (no longer ambiguous). This will cause initial searches, such as your example of 'lenovo thinkpad', to bring the actual machines up above the accessories. A more precise search ('lenovo thinkpad battery') will still work as you'd expect.
Another nice UI/UX touch is to take the total set of categories returned in your results and provide easy filter links. So if your initial search returns 'laptops', 'accessories' and 'payment plans', then you'd have each of those as a link to a re-query that uses the original search plus a filter on that category.
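The filter links could be backed by query bodies like the ones built in this sketch. It assumes an Elasticsearch version whose bool query accepts a filter clause (2.0 or later) and that category is not_analyzed as in the question's mapping; category_filter_queries is a hypothetical helper that only builds dictionaries and does not talk to a cluster.

```python
def category_filter_queries(search_text, categories):
    """Map each category to a query body: original match plus a category filter."""
    return {
        category: {
            "query": {
                "bool": {
                    "must": [{"match": {"title": search_text}}],
                    # Exact match; category is not_analyzed in the mapping above.
                    "filter": [{"term": {"category": category}}],
                }
            }
        }
        for category in categories
    }


links = category_filter_queries(
    "lenovo thinkpad", ["laptops", "laptops accessories"]
)
```

Each value in links can be sent as the body of a search request when the user clicks the corresponding category link.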
Good luck!
Boost "that" category.
GET /test_index/merchant/_search
{
"from": 0,
"query": {
"bool": {
"must": [
{"match": {"title": "lenovo thinkpad"}}
],
"should": [
{
"match": {
"category": {
"boost": "2",
"query": "laptops"
}
}
}
]
}
},
"size": "10"
}
Result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.573319,
"hits": [
{
"_index": "index",
"_type": "merchant",
"_id": "1",
"_score": 1.573319,
"_source": {
"title": "lenovo thinkpad white/black",
"category": "laptops"
}
},
{
"_index": "index",
"_type": "merchant",
"_id": "2",
"_score": 0.15889977,
"_source": {
"title": "lenovo thinkpad battery",
"category": "laptops accessories"
}
}
]
}
}
More on boosting can be found here.
We can account for the absence of certain terms by using the boost property, which is supplied in the query for the term that should be favoured.
Please check the query below, with the boost property set to 10.
GET /test_index/students/_search
{
"from": 0,
"query": {
"bool": {
"must": [
{"match": {"age": "20"}}
],
"should": [
{
"match": {
"category": {
"boost": "10",
"query": "students"
}
}
}
]
}
},
"size": "10"
}