Below if a document from my collection of over 20,000,000 documents.
I need to find documents by a particular zip, out of these documents I need to select one record from each postal address (ADDR, CITY, STATE, ZIP, APT) and which has a age value of 18 or higher.
The results need to be limited to a number as well which is entered by the end-user.
{
"_id" : ObjectId("55e86e98f493590878bb45d7"),
"RecordID" : 84096380,
"FN" : "Michael",
"MI" : "",
"LN" : "Horn",
"NAME_PRE" : "MR",
"ADDR" : "160 Yankee Camp Rd",
"CITY" : "Telford",
"ST" : "TN",
"ZIP" : 37690,
"APT" : "",
"Z4" : 2200,
"DPC" : 605,
"CAR_RTE" : "R001",
"WALK_SEQ" : 228,
"LOT" : "0136A",
"FIPS_ST" : 47,
"FIPS_CTY" : 179,
"LATITUDE" : 36.292787,
"LONGITUDE" : -82.568171,
"ADDR_TYP" : 1,
"MSA" : 3660,
"CBSA" : 27740,
"ADDR_LINE" : 3,
"DMA_SUPPR" : "",
"GEO_MATCH" : 1,
"CENS_TRACT" : 61900,
"CENS_BLK_GRP" : 1,
"CENS_BLK" : 17,
"CENS_MED_HOME_VALUE" : 953,
"CENS_MED_HH_INCOME" : 304,
"CRA" : "",
"Z4_TYP" : "S",
"DSF_IND" : 1,
"DPD_IND" : "N",
"PHONE_FLAG" : "Y",
"PHONE" : NumberLong("4237730233"),
"TIME_ZN" : "E",
"GENDER" : "M",
"NEW_TO_BLD" : "",
"SOURCES" : 19,
"BASE_VER_DT" : 20101,
"COMP_ID" : NumberLong("3769001836"),
"IND_ID" : 1,
"INF_HH_RANK" : 1,
"HOME_OWNR_SRC" : "V",
"DOB_YR" : 1975,
"DOB_MON" : 7,
"DOB_DAY" : 10,
"EXACT_AGE" : 39,
"AGE" : 39,
"HH_INCOME" : "D"
}
if you are using mongoose, we can chain the operations by dot(.) operator. Since i see all your needs is conditional here is the example -
Person.
find({
ZIP: "37690",
ADDR : "",
STATE : "", //so on
AGE: { $gt: 18 }
}).
limit(10).
exec(callback);
more info - http://mongoosejs.com/docs/queries.html
You need to use aggregate operation.
var pipeline = [
{
$match: {ZIP: 37690, AGE: {$gt: 18}}
}, {
$group: {
_id: {ADDR: '$ADDR', CITY: '$CITY', STATE: '$STATE', ZIP: '$ZIP', APT: '$APT'},
PHONE: {$first: '$PHONE'}
}
},
{$limit: 10}
];
db.mycoll.aggregate(pipeline)
enhance the above to project whatever fields you require in results
I think This query will solve your problem.
Person.find({
ZIP: "37690",
AGE: { $gt: 18 }
}).
limit(50).
exec(callback);
Related
I have a document in the below format. The goal is to group the document by student name and sort it by rank in the ascending order. Once that is done, iterate through the rank(within a student) and if each subsequent rank is greater than the previous one, the version field needs to be incremented. As part of a pipeline, student_name will be passed to me so matching by student name should be good instead of grouping.
NOTE: Tried it with python and works to some extent. A python solution would also be great!
{
"_id" : ObjectId("5d389c7907bf860f5cd11220"),
"class" : "I",
"students" : [
{
"student_name" : "AAA",
"Version" : 2,
"scores" : [
{
"value" : "50",
"rank" : 2
},
{
"value" : "70",
"rank" : 1
}
]
},
{
"student_name" : "BBB",
"Version" : 5,
"scores" : [
{
"value" : 80,
"rank" : 2
},
{
"value" : 100,
"rank" : 1
},
{
"value" : 100,
"rank" : 1
}
]
}
]
}
I tried this piece of code to sort
def version(student_name):
db.column.aggregate(
[
{"$unwind": "$students"},
{"$unwind": "$students.scores"},
{"$sort" : {"students.scores.rank" : 1}},
{"$group" : {"students.student_name}
]
)
for i in range(0,(len(students.scores)-1)):
if students.scores[i].rank < students.scores[i+1].rank:
tag.update_many(
{"$inc" : {"students.Version":1}}
)
The expected output for student AAA should be
{
"_id" : ObjectId("5d389c7907bf860f5cd11220"),
"class" : "I",
"students" : [
{
"student_name" : "AAA",
"Version" : 3, #version incremented
"scores" : [
{
"value" : "70",
"rank" : 1
},
{
"value" : "50",
"rank" : 2
}
]
}
I was able to sort the document.
pipeline = [
{"$unwind": "$properties"},
{"$unwind": "$properties.values"},
{"$sort" : {"$properties.values.rank" : -1}},
{"$group": {"_id" : "$properties.property_name", "values" : {"$push" : "$properties.values"}}}
]
import pprint
pprint.pprint(list(db.column.aggregate(pipeline)))
If I have these objects :
{
"_id" : ObjectId("5caf2c1642e3731464c2c79d"),
"requested" : [],
"roomNo" : "E0-1-09",
"capacity" : 40,
"venueType" : "LR(M)",
"seatingType" : "TB",
"slotStart" : "8:30AM",
"slotEnd" : "9:50AM",
"__v" : 0
}
/* 2 */
{
"_id" : ObjectId("5caf2deb4a7f5222305b55d5"),
"requested" : [],
"roomNo" : "E0-1-09",
"capacity" : 40,
"venueType" : "LR(M)",
"seatingType" : "TB",
"slotStart" : "10:00AM",
"slotEnd" : "11:20AM",
"__v" : 0
}
is it possible to get something like this using aggregate in mongodb?
[{ roomNo: "E0-1-09" , availability : [{slotStart : "8:30AM", slotEnd: "9:50AM"} ,
{slotStart: "10:00AM", slotEnd : "11:20AM"}]
what im using currently:
db.getDB().collection(collection).aggregate([
{ $group: {_id:{roomNo: "$roomNo", availability :[{slotStart:"$slotStart", slotEnd:"$slotEnd"}]}}}
])
actually getting it twice like so :
[{ roomNo: "E0-1-09" , availability : [{slotStart : "8:30AM", slotEnd: "9:50AM"}]
[{ roomNo: "E0-1-09" , availability : [{slotStart: "10:00AM", slotEnd : "11:20AM"}]
You have to use $push accumulator
db.collection.aggregate([
{ "$group": {
"_id": "$roomNo",
"availability": {
"$push": {
"slotEnd": "$slotEnd",
"slotStart": "$slotStart"
}
}
}}
])
My MongoDB contains the following data
{
"_id" : ObjectId("5c1b742eb1829b69963029e8"),
"duration" : 12,
"cost" : 450,
"tax" : 81,
"tags" : [],
"participants" : [
ObjectId("5c1b6a8f348ddb15e4a8aac7"),
ObjectId("5c1b742eb1829b69963029e7")
],
"initiatorId" : ObjectId("5c1b6a8f348ddb15e4a8aac7"),
"context" : "coach",
"accountId" : ObjectId("5bdfe7b01cbf9460c9bb5d68"),
"status" : "over",
"webhook" : "http://d4bdc1ef.ngrok.io/api/v1/webhook_callback",
"hostId" : "5be002109a708109f862a03e",
"createdAt" : ISODate("2018-12-20T10:51:26.143Z"),
"updatedAt" : ISODate("2018-12-20T10:51:44.962Z"),
"__v" : 0,
"endedAt" : ISODate("2018-12-20T10:51:44.612Z"),
"startedAt" : ISODate("2018-12-20T10:51:32.992Z"),
"type" : "voip"
}
{
"_id" : ObjectId("5c1b7451b1829b69963029ea"),
"duration" : 1,
"cost" : 150,
"tax" : 27,
"tags" : [],
"participants" : [
ObjectId("5c1b6a8f348ddb15e4a8aac7"),
ObjectId("5c1b7451b1829b69963029e9")
],
"initiatorId" : ObjectId("5c1b6a8f348ddb15e4a8aac7"),
"context" : "coach",
"accountId" : ObjectId("5bdfe7b01cbf9460c9bb5d68"),
"status" : "over",
"webhook" : "http://d4bdc1ef.ngrok.io/api/v1/webhook_callback",
"hostId" : "5be002109a708109f862a03e",
"createdAt" : ISODate("2018-12-20T10:52:01.560Z"),
"updatedAt" : ISODate("2018-12-20T10:52:08.018Z"),
"__v" : 0,
"endedAt" : ISODate("2018-12-20T10:52:07.667Z"),
"startedAt" : ISODate("2018-12-20T10:52:06.762Z"),
"type" : "voip"
}
I want to get the total duration (sum of duration field) for a particular accountID where status is equals to "over" for a particular date range. Anyway to accomplish this using PyMongo? I am unable to form the query
Well I was doing some pretty basic mistakes while converting the query to PyMongo aggregation function. All I would say is be careful with the query structure format and especially the keys are to be encapsulated within quotes(""). To solve this all I have to do was
from bson.objectid import ObjectId
pipe = [
{"$match": {"accountId": ObjectId(accountId),
"status": "over",
"startedAt": {"$gte": startDate,
"$lte": EndDate
}
}},
{"$project": {"readableDate":
{"$dateToString":
{"format": "%Y-%m-%d", "date": "$startedAt"}},
"accountId": str("$accountId"),
"duration": "$duration"
}},
{"$group": {"_id": {"date": "$readableDate",
"accountId": str("$accountId")}, "totalCallDuration": {"$sum": "$duration"}}}]
for doc in db.VoiceCall.aggregate(pipe):
print(doc)
Just a reminder : the startDate and EndDate are in Python datetime format.
{
"_id" : ObjectId("5514ecc73910d3e808b9417c"),
"endingReciptBookNumber" : 2999,
"startingReciptBookNumber" : 2900,
"User" : 8,
"allRecipt" : [
{
"recipt_Number" : 2999,
"amount" : 24124,
"_id" : ObjectId("5514ecc73910d3e808b94180")
},
{
"recipt_Number" : 100,
"amount" : 2414,
"_id" : ObjectId("5514ecc73910d3e808b9417f")
},
{
"recipt_Number" : 101,
"amount" : 242,
"_id" : ObjectId("5514ecc73910d3e808b9417e")
},
{
"recipt_Number" : 102,
"amount" : 2424,
"_id" : ObjectId("5514ecc73910d3e808b9417d")
}
],
"__v" : 0
}
I have many documents like this in a collection in mongoose .I want to find a latest entered recipt_Number for a particular user. like in this case it should give me 102 as answer.
i have also attached snippet of lines of code. Its also a way to get same result.
db.topics.find( {'User': 8}, { 'allRecipt': { $slice: -1 },'startingReciptBookNumber':0,'endingReciptBookNumber':0,'User':0,'_id':0,'__v':0 } )
query result like below
{
"allRecipt" : [
{
"recipt_Number" : 102,
"amount" : 2424,
"_id" : ObjectId("5514ecc73910d3e808b9417d")
}
]
}
Though query won't give any single number in result but it will give desired outcome through result.allRecipt.0.recipt_Number, Your desired number will always get into in 0 index. I think this is your desired number.
Here $slice make a difference.
Thanks
I have a collection with a sub-document consisting of more than 40K records.
My aggregate query takes about 300 secs. I have tried optimizing the same using compound as well as multi-key indexing, which completes in 180 secs.
I still require a reduced query time execution.
here is my collection:
{
"_id" : ObjectId("545b32cc7e9b99112e7ddd97"),
"grp_id" : 654,
"user_id" : 2,
"mod_on" : ISODate("2014-11-06T08:35:40.857Z"),
"crtd_on" : ISODate("2014-11-06T08:35:24.791Z"),
"uploadTp" : 0,
"tp" : 1,
"status" : 3,
"id_url" : [
{"mid":"xyz12793"},
{"mid":"xyz12794"},
{"mid":"xyz12795"},
{"mid":"xyz12796"}
],
"incl" : 1,
"total_cnt" : 25,
"succ_cnt" : 25,
"fail_cnt" : 0
}
and following is my query
db.member_id_transactions.aggregate([ { '$match':
{ id_url: { '$elemMatch': { mid: 'xyz12794' } } } },
{ '$unwind': '$id_url' },
{ '$match': { grp_id: 654, 'id_url.mid': 'xyz12794' } } ])
has anyone faced the same issue?
here's the o/p for aggregate query with explain option
{
"result" : [
{
"_id" : ObjectId("546342467e6d1f4951b56285"),
"grp_id" : 685,
"user_id" : 2,
"mod_on" : ISODate("2014-11-12T11:24:01.336Z"),
"crtd_on" : ISODate("2014-11-12T11:19:34.682Z"),
"uploadTp" : 1,
"tp" : 1,
"status" : 3,
"id_url" : [
{"mid":"xyz12793"},
{"mid":"xyz12794"},
{"mid":"xyz12795"},
{"mid":"xyz12796"}
],
"incl" : 1,
"__v" : 0,
"total_cnt" : 21406,
"succ_cnt" : 21402,
"fail_cnt" : 4
}
],
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("545c8d37ab9cc679383a1b1b")
}
}
One way to reduce the number of records being filtered further is to include the field grp_id, in the first $match operator.
db.member_id_transactions.aggregate([
{$match:{ "id_url.mid": 'xyz12794',"grp_id": 654 } },
{$unwind: "$id_url" },
{$match: { "id_url.mid": "xyz12794" } }
])
See how the performance is now. Add grp_id to the index to get better response time.
The above aggregation query though it works, is unnecessary. since you are not altering the structure of the document, and you expect only one element in the array to match the filter condition, you could just use a simple find and project.
db.member_id_transactions.find(
{ "id_url.mid": "xyz12794","grp_id": 654 },
{"_id":0,"grp_id":1,"id_url":{$elemMatch:{"mid":"xyz12794"}},
"user_id":1,"mod_on":1,"crtd_on":1,"uploadTp":1,
"tp":1,"status":1,"incl":1,"total_cnt":1,
"succ_cnt":1,"fail_cnt":1
}
)