How can I group items close to each other without knowing the distribution? - node.js

I'm preparing some data about sold apartment prices for regression analysis. One category is what street the houses are on, but some streets have very different areas, so I want to make a category with the combination of construction year and street name. For example:
Broadway 1910
Broadway 2001
My challenge is that sometimes the construction spans several years. The data is from Sweden, known for huge centralized housing projects. I would like to group these houses together into a period somehow. This is my current code. I know it's not very efficient, but it will only run once on a dataset that is not huge.
(async () =>{
let client;
try {
client = await MongoClient;
let collection = client.db("booliscraper").collection("sold");
let docs = await collection.find();
await docs.forEach((sale) => {
sale.street = sale.location.address.streetAddress.split(/[0-9]/)[0] + sale.location.namedAreas[0]
sale.streetYear = sale.street+" "+sale.constructionYear
log(sale);
collection.replaceOne({_id: ObjectId(sale._id)}, sale)
});
client.close();
} catch(err) {
log(err)
}
})()

As you noted, your current code is inefficient on large datasets because it issues a separate replaceOne call to the server for every document inside the forEach loop. Instead, you can run an aggregate query that computes the category fields you want in a $group stage and pushes the _id of every document that falls into each category into an array, which you later use for a bulk update.
For the bulk update, use the collection's bulkWrite method with one updateMany operation per category.
The following operation puts this idea into practice:
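(async () => {
  try {
    let client = await MongoClient;
    let collection = client.db("booliscraper").collection("sold");
    let pipeline = [
      // Compute street first: a field cannot be referenced in the stage that defines it
      { '$addFields': {
        'street': {
          '$let': {
            'vars': {
              // $split only accepts a string delimiter, so $regexFind (MongoDB 4.2+)
              // is used to take everything before the first digit
              'prefix': { '$regexFind': {
                'input': '$location.address.streetAddress',
                'regex': /^[^0-9]*/
              } }
            },
            'in': { '$concat': [
              '$$prefix.match',
              { '$arrayElemAt': [ '$location.namedAreas', 0 ] }
            ] }
          }
        }
      } },
      { '$group': {
        '_id': {
          'street': '$street',
          // $concat only accepts strings, so the year is converted first
          'streetYear': { '$concat': ['$street', ' ', { '$toString': '$constructionYear' }] }
        },
        'ids': { '$push': '$_id' }
      } }
    ]
    // toArray() turns the cursor into a plain array that bulkWrite can accept
    let docs = await collection.aggregate(pipeline).toArray();
    let ops = docs.map(({ _id, ids }) => ({
      'updateMany': {
        'filter': { '_id': { '$in': ids } },
        'update': { '$set': {
          'street': _id.street, 'streetYear': _id.streetYear
        } }
      }
    }));
    let result = await collection.bulkWrite(ops);
    log(result)
    client.close()
  } catch(err) {
    log(err)
  }
})()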
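The pipeline above builds one category per exact construction year, so it does not yet merge projects that span several years. One possible way to do that, sketched here under assumptions, is gap-based grouping: sort the distinct years for a street and start a new period whenever the gap to the previous year exceeds a threshold. The helper names groupYearsByGap and periodLabel and the default maxGap of 2 are hypothetical and not part of the original question or answer.

```javascript
// Hypothetical helper: split a list of construction years into periods.
// A new period starts whenever the gap to the previous year exceeds maxGap.
function groupYearsByGap(years, maxGap = 2) {
  const sorted = [...new Set(years)].sort((a, b) => a - b);
  const periods = [];
  for (const year of sorted) {
    const current = periods[periods.length - 1];
    if (current && year - current[current.length - 1] <= maxGap) {
      current.push(year);
    } else {
      periods.push([year]);
    }
  }
  return periods;
}

// Label a period as "1910" or "1965-1967" for use in the streetYear category.
function periodLabel(period) {
  const first = period[0];
  const last = period[period.length - 1];
  return first === last ? String(first) : `${first}-${last}`;
}

// Made-up years for one street
const broadwayYears = [1910, 1965, 1966, 1967, 2001, 1966];
const labels = groupYearsByGap(broadwayYears).map(periodLabel);
console.log(labels); // [ '1910', '1965-1967', '2001' ]
```

This needs no knowledge of the distribution beyond the chosen gap, which would have to be tuned against the real data.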

Related

GET request with $function in Mongodb

Summary:
I am trying to combine the result of a MongoDB aggregation with data from a third-party API, and I can't find anything relevant to it.
Explanation:
The Express route below finds all games that come after the provided date and have not been cancelled. The next step is to get some data for each game from the third-party API, attach it to the object, and continue further in the pipeline.
Issue:
It seems that you can't make an XHR request inside $function (I didn't find anything about this in the official documentation, so I'm not sure).
const today = moment();
today.year(2021);
today.month(5);
let response = await Game.aggregate([
{
$match: {
$expr: {
$and: [
{ $gte: ["$date", moment(today).startOf('day').toDate()] },
{ $eq: ["$canceled", false] },
]
}
}
},
{ $sort: { date: 1 } },
{
$addFields: {
boxScore: {
$function:
{
body: async function (season, week, home_team) {
const result = await axios.get(
`SINGLEGAMEURL/${season}/${week}/${home_team}`,
{
headers: {
'Subscription-Key': 'SOMEKEY',
},
}
);
return result.data;
},
args: ["$season", '$week', '$home_team'],
lang: "js"
}
}
}
}
]);
I would really appreciate any help/direction on this, Cheers!
I doubt that you can use asynchronous functions in $function, because they return a promise that resolves to result.data, rather than the data themselves. Instead, consider performing the asynchronous operation in your express middleware, after the MongoDB operation. Something like this:
app.use("/path", async function(req, res) {
const today = moment();
today.year(2021);
today.month(5);
let response = await Game.aggregate([
{$match: ...},
{$sort: {date: 1}}
]); // awaiting a Mongoose aggregate already resolves to an array
await Promise.all(response.map(row => axios.get(
`SINGLEGAMEURL/${row.season}/${row.week}/${row.home_team}`,
{headers: {'Subscription-Key': 'SOMEKEY'}}
).then(function(result) {
row.boxScore = result.data;
})));
res.json(response);
});
(Probably the Promise.all can be avoided, but I'm not experienced enough with async/await to know how.)
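As a rough, self-contained sketch of the enrichment step, the two functions below attach a box score to each row, once concurrently with Promise.all and once sequentially with a plain for...of loop. The fetchBoxScore stub is a hypothetical stand-in for the axios call, and the sample rows are made up.

```javascript
// Hypothetical stand-in for the axios call; it resolves to a fake box score.
const fetchBoxScore = async ({ season, week, home_team }) =>
  ({ id: `${season}/${week}/${home_team}` });

// Concurrent version: start every request, then wait for all of them.
async function attachBoxScores(rows, fetcher) {
  const scores = await Promise.all(rows.map(row => fetcher(row)));
  return rows.map((row, i) => ({ ...row, boxScore: scores[i] }));
}

// Sequential version: no Promise.all, one request at a time.
async function attachBoxScoresSequentially(rows, fetcher) {
  const enriched = [];
  for (const row of rows) {
    enriched.push({ ...row, boxScore: await fetcher(row) });
  }
  return enriched;
}

// Made-up rows shaped like the aggregation result
const games = [
  { season: 2021, week: 1, home_team: 'NYG' },
  { season: 2021, week: 2, home_team: 'DAL' }
];
attachBoxScores(games, fetchBoxScore)
  .then(rows => console.log(rows[0].boxScore.id)); // 2021/1/NYG
```

The sequential version avoids Promise.all at the cost of waiting for each request before starting the next, so the concurrent one is usually the better fit when the API tolerates parallel calls.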

How do I find out the total number of items in a collection? mongoose

I have 100 items in the collection, and each item has a "money" field. I need to get the total amount of money across all of them, preferably without workarounds. I don't know how to do it.
const allMoney = async () => {
let count = 0;
await User.find({}).sort({ money: -1 }).forEach(plr => { count+=plr.money });
return count;
}
"user is my model"
Try using the Mongoose aggregation framework with the $group and $sum operators, like this:
user.aggregate([
{
$group: {
_id: null,
count: { $sum: "$money" }
}
}
])
The _id: null expression is necessary because a $group specification must include an _id; with null, all documents fall into a single group. Without it, the pipeline is rejected with an error.
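To make the effect of that stage concrete, here is a minimal in-memory analogue in plain JavaScript. It is only an illustration of what the $group stage computes, not how MongoDB implements it; the sample documents and the sumMoney name are made up.

```javascript
// Sample documents standing in for the users collection (made-up values).
const users = [
  { name: 'a', money: 100 },
  { name: 'b', money: 250 },
  { name: 'c' } // no money field
];

// In-memory analogue of { $group: { _id: null, count: { $sum: "$money" } } }.
// Like $sum, missing or non-numeric values are ignored.
function sumMoney(docs) {
  if (docs.length === 0) return []; // no input documents, no group
  const count = docs.reduce(
    (total, doc) => total + (typeof doc.money === 'number' ? doc.money : 0),
    0
  );
  return [{ _id: null, count }];
}

console.log(sumMoney(users)); // [ { _id: null, count: 350 } ]
```

The result is an array with a single document, so the total is read from the first element.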
You can use aggregate with a group stage, which will compute the total amount for you.
See https://docs.mongodb.com/manual/reference/operator/aggregation/group/ for more details.
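const allMoney = async () => {
  // User is the asker's model; _id: null puts every document into one group
  const result = await User.aggregate().group({
    _id: null,
    totalMoney: { $sum: '$money' }
  }).exec();
  // result is [{ _id: null, totalMoney: <sum> }], or [] for an empty collection
  return result.length ? result[0].totalMoney : 0;
}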

Can mongodb send query pipelining with no loop?

I'm new to Node.js and MongoDB.
I want to get the profiles of the users in one user's following list. With an RDB this would be simple with an equi-join, but I don't have much experience with MongoDB, so I don't know how.
Sample data below.
// list of users
[
{
_id: "oid_1",
nickname: "user_01",
link: "url/user_01"
},
{
_id: "oid_2",
nickname: "user_02",
link: "url/user_02"
},
{
_id: "oid_3",
nickname: "user_03",
link: "url/user_03"
}
...
]
user_01's followList
[
{
followOid: "foid_1",
userOid: "user_01"
},
{
followOid: "foid_2",
userOid: "user_02"
},
]
My solution is to get the follow list, then loop over it and call follows.findOne() for each entry, like below:
const dataSet = [];
Follow.getFollowerList(userId) // for pipeline, use promise
.exec()
.then( async (result) => { // no async-await, no data output...
for (let data of result) {
let temp = await Users.getUserInfo( // send query for each data, I think it's not effective
data.userId,
{ nickname: 1, link: 1 }
);
dataSet.push(temp);
}
return dataSet;
})
.then((data) => {
res.status(200).json(data);
})
.catch( ... )
I don't think this is the best solution. If you are good at MongoDB, please help :)
Thanks
One option would be to use aggregation.
const userId = 'Fill with UserId';
const pipe = [
{
'$match': {
'_id': userId
}
}, {
'$lookup': {
'from': 'followListCollectionName',
'localField': '_id',
'foreignField': 'userOid',
'as': 'followList'
}
}
];
const result = await UserModel.aggregate(pipe);
The result is an array containing the user with the given id (or more than one, if several share that id), and result[0].followList holds the matching follow objects as an array.
A second option is to use virtuals:
https://mongoosejs.com/docs/tutorials/virtuals.html
but this requires some changes to your collection's schema.
Good luck
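For readers unfamiliar with $lookup, the sketch below mimics in plain JavaScript what the stage returns: each local document gains an array field holding every foreign document whose foreignField equals its localField. This is only an in-memory illustration with made-up data, and the lookup function is hypothetical. Note that the sample data in the question stores values like user_01 in userOid, so the sketch joins on nickname; whichever field actually holds the matching value should be used as localField.

```javascript
// In-memory illustration of a $lookup stage (not how MongoDB executes it).
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map(doc => ({
    ...doc,
    // Every foreign document whose foreignField equals this document's localField
    [as]: foreignDocs.filter(f => f[foreignField] === doc[localField])
  }));
}

// Made-up data modelled on the question's samples
const userDocs = [
  { _id: 'oid_1', nickname: 'user_01', link: 'url/user_01' },
  { _id: 'oid_2', nickname: 'user_02', link: 'url/user_02' }
];
const followDocs = [
  { followOid: 'foid_1', userOid: 'user_01' },
  { followOid: 'foid_2', userOid: 'user_02' },
  { followOid: 'foid_3', userOid: 'user_01' }
];

const joined = lookup(userDocs, followDocs, 'nickname', 'userOid', 'followList');
console.log(joined[0].followList.length); // 2
```

A document with no matches still appears in the output with an empty array, which mirrors the left outer join behaviour of $lookup.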

Mongo/Node: Filtering By Single Properties?

I am dealing with a query with a criteria object that is being passed as the first argument to this query:
module.exports = (criteria, sortProperty, offset = 0, limit = 20) => {
// write a query that will follow sort, offset, limit options only
// do not worry about criteria yet
console.log(criteria);
const query = Artist.find({ age: { $gte: 19, $lte: 44 } })
.sort({ [sortProperty]: 1 })
.skip(offset)
.limit(limit);
return Promise.all([query, Artist.count()]).then(results => {
return {
all: results[0],
count: results[1],
offset: offset,
limit: limit
};
});
};
By default, the criteria object has a single name property that is an empty string.
The age property points to an object that has both min and max values assigned to it. I also have a yearsActive property inside of the criteria object and that also has a min and max value.
So three different properties: age, name and yearsActive.
This has been an extremely challenging one for me and if you look above that's as far as I got.
When the criteria object is logged to the console, it only has a name: { name: "" }. It has no yearsActive or age when it first starts. That is where the sliders come in: when I start moving them around on the frontend, age and yearsActive get added to the criteria object.
So I need to figure out how to update the query to take into account, for example, the different ages, and I have been considering using an if conditional inside a helper function.
Following up on the comment I left you:
There are at least three states to handle. The first is when you retrieve the data for the UI. In this case, I would recommend using aggregation so that the data comes back in the shape your business model needs.
For example, the problem you have is that sometimes the max or min value for age or yearsActive is unknown. You should also have an identifier, such as an ObjectId, which will be used to update the model identified by that property.
Artist.aggregate([
{
$match: { age: { $gte: 19, $lte: 44 } }
},
{
$sort: { yourProperty: 1 }
},
{
$skip: 10
},
{
$limit: 10
},
{
$project: {
// You set your properties to retrieve with the 1 as flag
propertieX: 1,
"another.property": 1,
"age.max": {
$cond: {
if: { $eq: [ "", "$age.max" ] },
then: 0, // Or the value that you want to set it
else: "$age.max"
}
}
}
}]);
The other state is when you run the query according to the parameters submitted from the form.
Make sure you always retrieve a model with the shape you want. For example, you could return this model in every request using $project, applying default values when a field doesn't exist, so that handling it on the front end, including in the search, is easy to manage.
{
ObjectId: YOUR_OBJECT_ID,
age: {
min: YOUR_MIN_VALUE,
max: YOUR_MAX_VALUE
},
yearsActive: {
min: YOUR_MIN_VALUE,
max: YOUR_MAX_VALUE
}
}
Finally, when you send the data to be saved, you should send the entire model that you received, but the most important thing is to identify the element by its ObjectId when doing the update.
NOTE: This is the approach I would take based on what I understand from your question. If my interpretation is wrong, let me know; if you can share more information or open a repository with the code, it would be easier for me to understand the problem.
So what I decided to do since the code would look messy to throw all inside the Artist.find({}) was to create a separate helper function:
const buildQuery = (criteria) => {
console.log(criteria);
};
This helper function is being called with the criteria object and I have to form up the object in such a way that it will represent the query the way in which I want to search the Artist collection.
What made this difficult to wrap my head around was that the criteria object is not well formed for searching a collection: properties such as age hold a min and a max, which MongoDB does not know how to interpret by default.
So inside the helper function I made a separate object to return, which represents the actual query that I want to send off to Mongo.
const buildQuery = (criteria) => {
console.log(criteria);
const query = {};
};
I am not modifying the criteria object in any way; I am just reading what the user wants to see from this UI object. So I made this object called query and added the idea of age.
const buildQuery = (criteria) => {
console.log(criteria);
const query = {};
query.age = {};
};
I decided to do an if conditional inside of the helper function for the specific age range that I want to find.
const buildQuery = (criteria) => {
console.log(criteria);
const query = {};
if (criteria.age) {
query.age = {};
}
};
This is where the Mongo query operators come into play. The two operators I am concerned with are greater than or equal to ($gte) and less than or equal to ($lte).
This is how I implemented it in practice:
const buildQuery = (criteria) => {
console.log(criteria);
const query = {};
if (criteria.age) {
query.age = {
$gte: criteria.age.min,
$lte: criteria.age.max
};
}
};
The query object here will eventually be returned from the buildQuery function:
const buildQuery = (criteria) => {
console.log(criteria);
const query = {};
if (criteria.age) {
query.age = {
$gte: criteria.age.min,
$lte: criteria.age.max
};
}
return query;
};
That query object will be passed off to the find operation:
module.exports = (criteria, sortProperty, offset = 0, limit = 20) => {
// write a query that will follow sort, offset, limit options only
// do not worry about criteria yet
const query = Artist.find(buildQuery(criteria))
.sort({ [sortProperty]: 1 })
.skip(offset)
.limit(limit);
return Promise.all([query, Artist.count()]).then(results => {
return {
all: results[0],
count: results[1],
offset: offset,
limit: limit
};
});
};
const buildQuery = (criteria) => {
console.log(criteria);
const query = {};
if (criteria.age) {
query.age = {
$gte: criteria.age.min,
$lte: criteria.age.max
};
}
return query;
};
So what I am doing here is building the equivalent of Artist.find({ age: { $gte: minAge, $lte: maxAge } }).
For yearsActive I decided to implement something nearly identical:
const buildQuery = criteria => {
console.log(criteria);
const query = {};
if (criteria.age) {
query.age = {
$gte: criteria.age.min,
$lte: criteria.age.max
};
}
if (criteria.yearsActive) {
}
return query;
};
So if the user changes the slider, I am going to expect my criteria object to have a yearsActive property defined on it like so:
const buildQuery = criteria => {
console.log(criteria);
const query = {};
if (criteria.age) {
query.age = {
$gte: criteria.age.min,
$lte: criteria.age.max
};
}
if (criteria.yearsActive) {
query.yearsActive = {
$gte: criteria.yearsActive.min,
$lte: criteria.yearsActive.max
}
}
return query;
};
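Since the age and yearsActive branches are identical apart from the field name, the helper could also be written as a loop over the range-style fields. The following is only a sketch of that variation: the RANGE_FIELDS list is an assumption introduced here, and the name property is deliberately left unhandled, as in the code above.

```javascript
// Range-style criteria fields taken from the question (list is an assumption).
const RANGE_FIELDS = ['age', 'yearsActive'];

const buildQuery = (criteria) => {
  const query = {};
  for (const field of RANGE_FIELDS) {
    const range = criteria[field];
    // Only add a filter once the slider has put the field on criteria
    if (range) {
      query[field] = { $gte: range.min, $lte: range.max };
    }
  }
  return query;
};

// Initial state: only an empty name, so no filters are produced
console.log(JSON.stringify(buildQuery({ name: '' }))); // {}

// After moving the age slider
console.log(JSON.stringify(buildQuery({ name: '', age: { min: 19, max: 44 } })));
// {"age":{"$gte":19,"$lte":44}}
```

Adding another slider-backed field then only requires extending the list.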

MongoDB - find one and add a new property

Background: I'm developing an app that shows analytics for inventory management.
An Excel file is uploaded, and as the file uploads the app converts it to an array of JSON objects. Then it compares each JSON object with the objects in the DB, changes the quantity according to the XLS file, and adds a timestamp to the stamps array, which contains the changes in quantity.
For example:
{"_id":"5c3f531baf4fe3182cf4f1f2",
"sku":123456,
"product_name":"Example",
"product_cost":10,
"product_price":60,
"product_quantity":100,
"Warehouse":4,
"stamps":[]
}
After the XLS upload, let's say we sold 10 units; it should look like this:
{"_id":"5c3f531baf4fe3182cf4f1f2",
"sku":123456,
"product_name":"Example",
"product_cost":10,
"product_price":60,
"product_quantity":90,
"Warehouse":4,
"stamps":[{"1548147562": -10}]
}
Right now I can't find the right MongoDB commands to do this. I'm developing in Node.js and Angular and would love to read some ideas.
for (let i = 0; i < products.length; i++) {
ProductsDatabase.findOneAndUpdate(
{"_id": products[i]['id']},
//CHANGE QUANTITY AND ADD A STAMP
...
}
You would need two operations here. The first gets an array of documents from the DB that match the ones in the JSON array. From that list you compare the 'product_quantity' keys and, if there is a change, create a new array of objects with the product id and the change in quantity.
The second operation is an update that uses this new array with the change in quantity for each matching product.
Armed with this new array of updated product properties, it would be ideal to use a bulk update, as looping through the list and sending each update request to the server separately can be costly.
Consider using the bulkWrite method on the model. It accepts an array of write operations and executes each of them; a typical update operation for your use case would have the following structure:
{ updateOne :
{
"filter" : <document>,
"update" : <document>,
"upsert" : <boolean>,
"collation": <document>,
"arrayFilters": [ <filterdocument1>, ... ]
}
}
So your operations would follow this pattern:
(async () => {
  let bulkOperations = []
  const ids = products.map(({ id }) => id)
  const matchedProducts = await ProductsDatabase.find({
    '_id': { '$in': ids }
  }).lean().exec()
  // for...of iterates the products themselves (for...in would yield array indexes)
  for (const product of products) {
    // _id is an ObjectId, so compare string forms rather than using ===
    const matchedProduct = matchedProducts.find(p => String(p._id) === String(product.id))
    if (!matchedProduct) continue // no stored document for this row
    const { _id, product_quantity } = matchedProduct
    const changeInQuantity = product.product_quantity - product_quantity
    if (changeInQuantity !== 0) {
      const stamps = { [(new Date()).getTime()] : changeInQuantity }
      bulkOperations.push({
        'updateOne': {
          'filter': { _id },
          'update': {
            '$inc': { 'product_quantity': changeInQuantity },
            '$push': { stamps }
          }
        }
      })
    }
  }
  // Skip the call when nothing changed
  if (bulkOperations.length > 0) {
    const bulkResult = await ProductsDatabase.bulkWrite(bulkOperations)
    console.log(bulkResult)
  }
})()
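The comparison step can also be isolated into a pure function, which makes it easy to check without a database. The sketch below is one possible shape under assumptions: buildBulkOps is a hypothetical helper, the spreadsheet rows are assumed to carry id and product_quantity, and the timestamp is passed in so that the output is predictable.

```javascript
// Hypothetical helper: diff spreadsheet rows against stored documents
// and return bulkWrite-style updateOne operations.
function buildBulkOps(xlsProducts, dbProducts, timestamp = Date.now()) {
  // Index stored products by id once instead of searching per row
  const byId = new Map(dbProducts.map(p => [String(p._id), p]));
  const ops = [];
  for (const product of xlsProducts) {
    const stored = byId.get(String(product.id));
    if (!stored) continue; // row has no stored document: skip it
    const change = product.product_quantity - stored.product_quantity;
    if (change === 0) continue; // nothing to record
    ops.push({
      updateOne: {
        filter: { _id: stored._id },
        update: {
          $inc: { product_quantity: change },
          $push: { stamps: { [timestamp]: change } }
        }
      }
    });
  }
  return ops;
}

// Values taken from the example document in the question
const stored = [{ _id: '5c3f531baf4fe3182cf4f1f2', product_quantity: 100 }];
const uploaded = [{ id: '5c3f531baf4fe3182cf4f1f2', product_quantity: 90 }];
const ops = buildBulkOps(uploaded, stored, 1548147562);
console.log(JSON.stringify(ops[0].updateOne.update.$push.stamps)); // {"1548147562":-10}
```

The returned array could then be handed to bulkWrite when it is not empty.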
You can use mongoose's findOneAndUpdate to update the existing value of a document.
"use strict";
// One update per spreadsheet row. As in the original snippet, the XLS quantity
// is treated as the number of units sold and subtracted from the stored quantity.
const operations = products.map(xlProductData => {
  const sold = xlProductData.product_quantity;
  return ProductsDatabase.findOneAndUpdate({
    _id: xlProductData.id
  }, {
    $set: {
      sku: xlProductData.sku,
      product_name: xlProductData.product_name,
      product_cost: xlProductData.product_cost,
      product_price: xlProductData.product_price,
      Warehouse: xlProductData.Warehouse,
      updated_at: new Date()
    },
    // $inc adjusts the stored quantity on the server, so no prior find is needed
    $inc: { product_quantity: -sold },
    $push: {
      stamps: {
        [new Date().getTime()]: -sold
      }
    }
  }, {
    upsert: false,
    new: true // Mongoose's option for returning the updated document
  }).exec();
});
Promise.all(operations).then(() => {
  console.log('All good');
}).catch(err => {
  console.log('err ', err);
});
