How to send bulk MongoDB count() queries? - node.js

My application has a search field, and to power an autocomplete I first fetch the distinct() values, then immediately send a count() query for each distinct value. There can be dozens of values to count, which means a lot of queries.
Any idea how I could avoid this large number of queries using MongoDB's NodeJS module?
For now, the query is as such:
const baseQuery = {
  "organization": organization,
  "status": "processed"
}
let domains = []

// A. Query to get the distinct values
MongoDB.getMainCollection().distinct(`location.hostname`, { organization, status: "processed" })
  // B. Got the values, now create a count() query for each
  .then(list => {
    domains = list.map((host, idx) => Object.assign({}, { domain: host, count: 0, idx: idx }))
    const countingPromises = list.map(host => MongoDB.getMainCollection().count(Object.assign({}, baseQuery, { "location.hostname": host })))
    return Promise.all(countingPromises)
  })
  // C. Putting it all together
  .then(values => {
    values.forEach((count, idx) => {
      const domain = domains.find(d => d.idx === idx)
      if (domain) {
        domain.count = count
      }
    })
    domains.sort((a, b) => b.count - a.count)
    resolve(domains)
  })
  .catch(err => reject(new AppError(`Error listing hostnames for #${organization}.`, 500, err, payload)))
P.S. This works as intended and returns what I want; I just want to avoid sending so many queries and bundle them if possible.

You can get all the distinct values and their counts in a single aggregate query:
MongoDB.getMainCollection().aggregate([
  // Filter for the desired docs
  { $match: baseQuery },
  // Group the docs by location.hostname and get a count for each
  { $group: { _id: '$location.hostname', count: { $sum: 1 } } }
])
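If it helps, here is a rough sketch of how that single pipeline could replace steps A-C above, reusing the resolve/reject, AppError and payload from your snippet and sorting in the pipeline instead of in JS:
MongoDB.getMainCollection().aggregate([
  { $match: baseQuery },
  { $group: { _id: '$location.hostname', count: { $sum: 1 } } },
  // Sort by count directly in the pipeline
  { $sort: { count: -1 } }
])
  .toArray()
  // Reshape to the { domain, count } objects the rest of the code expects
  .then(results => results.map(r => ({ domain: r._id, count: r.count })))
  .then(domains => resolve(domains))
  .catch(err => reject(new AppError(`Error listing hostnames for #${organization}.`, 500, err, payload)))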

Related

finding multiple documents with mongoose

This is what I see when I console.log() the ids, but when I try to return all documents with those ids, I just get the first document.
const followingUsers = await User.find({ _id: { $in: foundUser.followings } })

const getFeedData = async () => {
  for (let user of followingUsers) {
    for (let postId of user.posts) {
      console.log(postId)
    }
  }
}
When I run this code, console.log(postId) prints all of the post ids, but when I try to retrieve all the documents with those ids, only one document comes back.
findById will only return one record or null; an ID is the _id field on each document in a collection, which is a unique value.
find is the equivalent of a WHERE clause in SQL: it returns as many documents as match the query, or an empty array.
Passing $in in a query to find looks for documents matching any of the ids in the array.
So if you already know the document _ids, find will return those documents, as long as you pass an array of valid ObjectIds.
// (pretending these are real ids)
const arrayOfUserIds = [
  ObjectId("5af619de653438ba9c91b291"),
  ObjectId("5af619de653438ba9c91b293"),
  ObjectId("5af619de653438ba9c91b297")
]

const users = await User.find({ _id: { $in: arrayOfUserIds } })

console.log(users.length)
users.forEach((user, index) => {
  console.log(`${index} - `, user._id)
})

// => 3
// => 0 - ObjectId("5af619de653438ba9c91b291")
// => 1 - ObjectId("5af619de653438ba9c91b293")
// => 2 - ObjectId("5af619de653438ba9c91b297")
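Applying that to the feed code above, a minimal sketch (assuming a Post model whose _id values are what user.posts holds) collects the ids and fetches them with a single $in query:
const getFeedData = async () => {
  const followingUsers = await User.find({ _id: { $in: foundUser.followings } })

  // Gather every post id from every followed user
  const postIds = followingUsers.flatMap(user => user.posts)

  // One $in query returns all matching post documents, not just the first
  return Post.find({ _id: { $in: postIds } })
}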

Mongoose .find() with unique filter

I have a big DB with some duplicate items. Can I use .find() with a filter on a unique field? If I use .distinct('term') it returns an array of unique terms like ['term1', 'term2', 'term3'], but I want an array of the entire objects from the DB, filtered by unique 'term'.
I resolved it by filtering the response from .find()
responses = await Product.find({ product_id: data.input });

// Keep only the first document seen for each unique search_term
const container = responses.filter((value, index, self) => {
  return (
    self.findIndex((v) => v.search_term === value.search_term) === index
  );
});
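An alternative, if you'd rather deduplicate on the server, is an aggregation that groups on the term and keeps the first full document per group. This is only a sketch based on the field names in the snippet above:
const uniqueProducts = await Product.aggregate([
  { $match: { product_id: data.input } },
  // One group per distinct search_term, keeping the first whole document seen
  { $group: { _id: "$search_term", doc: { $first: "$$ROOT" } } },
  // Promote the stored document back to the top level
  { $replaceRoot: { newRoot: "$doc" } }
]);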

Get Firestore subcollections without use id

I have a problem getting data from Firestore with the following structure.
Here is how I currently get the orders collection:
app.get('/getProjectsNo', (request, response) => {
  response.set('Access-Control-Allow-Origin', '*')
  let orders = []
  db.collection('companies').doc('renaultsomaca').collection('orders').get().then(snapshot => {
    snapshot.forEach((doc) => {
      orders.push(doc.data())
    });
    response.send(orders)
  })
})
It gives me the orders list, but I need to get the orders without using doc('renaultsomaca'), because I need all orders, not only renaultsomaca's orders.
What you're describing is known as a collection group query, which queries across all collections with a specific name.
To get all documents from orders collections, no matter where they are in the database, you'd do:
const querySnapshot = await db.collectionGroup('orders').get();
querySnapshot.forEach((doc) => {
  console.log(doc.id, ' => ', doc.data());
});
There is no way to specify the path to the orders collection, so if you have multiple types of orders that you want to query separately, you'll have to give them distinct names.
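If it helps, here is a rough sketch of the original Express route rewritten around that collection group query; the response shape stays the same, there is just no company document in the path:
app.get('/getProjectsNo', async (request, response) => {
  response.set('Access-Control-Allow-Origin', '*')

  // Queries every 'orders' subcollection, no matter which company document it sits under
  const querySnapshot = await db.collectionGroup('orders').get()

  const orders = querySnapshot.docs.map((doc) => doc.data())
  response.send(orders)
})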

Mongo/Node: Filtering By Single Properties?

I am dealing with a criteria object that is being passed as the first argument to this function:
module.exports = (criteria, sortProperty, offset = 0, limit = 20) => {
  // write a query that will follow sort, offset, limit options only
  // do not worry about criteria yet
  console.log(criteria);
  const query = Artist.find({ age: { $gte: 19, $lte: 44 } })
    .sort({ [sortProperty]: 1 })
    .skip(offset)
    .limit(limit);
  return Promise.all([query, Artist.count]).then(results => {
    return {
      all: results[0],
      count: results[1],
      offset: offset,
      limit: limit
    };
  });
};
By default, the criteria object has a single name property that is an empty string.
The age property points to an object that has both min and max values assigned to it. I also have a yearsActive property inside of the criteria object and that also has a min and max value.
So three different properties: age, name and yearsActive.
This has been an extremely challenging one for me and if you look above that's as far as I got.
When my criteria object is console logged, it only has a name: { name: "" }. It has no yearsActive or age by default when it first starts; that is where the sliders come in. When I start moving the sliders around on the frontend, age and yearsActive get appended to the criteria object.
So I need to figure out how to update the query to account for, say, the different ages, and I have been considering using an if conditional inside a helper function.
Regarding the comment that I left you:
You have at least three states, one of which is when you retrieve the data for the UI. In this case, I would recommend using an aggregation so that you can return the data in the shape your business logic expects.
For example, the problem you have is that sometimes you don't know the max or min value for age or yearsActive; you should also have an identifier, such as an ObjectId, which will later be used to update the model identified by that property.
Artist.aggregate([
  {
    $match: { age: { $gte: 19, $lte: 44 } }
  },
  {
    $sort: { yourProperty: 1 }
  },
  {
    $skip: 10
  },
  {
    $limit: 10
  },
  {
    $project: {
      // Set the properties to retrieve, with 1 as the flag
      propertyX: 1,
      "another.property": 1,
      "age.max": {
        $cond: {
          if: { $eq: [ "", "$age.max" ] },
          then: 0, // Or whatever default value you want to set
          else: "$age.max"
        }
      }
    }
  }
]);
The other state is when you run the query according to the parameters submitted from the form.
If you make sure to always return a model shaped the way you want, for example by returning it from every request using $project and applying default values whenever a field was never set on the front-end side, then the searching becomes easy to manage.
{
  ObjectId: YOUR_OBJECT_ID,
  age: {
    min: YOUR_MIN_VALUE,
    max: YOUR_MAX_VALUE
  },
  yearsActive: {
    min: YOUR_MIN_VALUE,
    max: YOUR_MAX_VALUE
  }
}
Finally, when you send the data to be saved, you should send back the entire model that you returned, but the most important thing is to identify the element by its ObjectId when doing the update.
NOTE: This is an approach based on my reading of your question. If I have misinterpreted it, let me know, and if you want to share more information or open a repository, it would be easier for me to understand the problem in code.
So, since the code would look messy if I threw everything inside Artist.find({}), what I decided to do was create a separate helper function:
const buildQuery = (criteria) => {
  console.log(criteria);
};
This helper function is called with the criteria object, and I have to shape an object that represents the query I want to run against the Artist collection.
What made this difficult to wrap my head around was that the criteria object is not well formed for searching over a collection: it has properties such as age, with min and max values, which Mongo does not know how to deal with by default. MongoDB does not know what min and max mean.
So inside the helper function I made a separate object to return from it that's going to represent the actual query that I want to send off to Mongo.
const buildQuery = (criteria) => {
  console.log(criteria);
  const query = {};
};
I am not modifying the criteria object in any way; I am just reading the search parameters the user wants from this UI object. So I made this object called query and added the idea of age:
const buildQuery = (criteria) => {
  console.log(criteria);
  const query = {};
  query.age = {};
};
I decided to add an if conditional inside the helper function for the specific age range that I want to find:
const buildQuery = (criteria) => {
  console.log(criteria);
  const query = {};
  if (criteria.age) {
    query.age = {};
  }
};
So this is where the Mongo query operators come into play. The two operators I am concerned with are greater than or equal to ($gte) and less than or equal to ($lte).
This is how I actually implemented it in practice:
const buildQuery = (criteria) => {
  console.log(criteria);
  const query = {};
  if (criteria.age) {
    query.age = {
      $gte: criteria.age.min,
      $lte: criteria.age.max
    };
  }
};
The query object here will eventually be returned from the buildQuery function:
const buildQuery = (criteria) => {
  console.log(criteria);
  const query = {};
  if (criteria.age) {
    query.age = {
      $gte: criteria.age.min,
      $lte: criteria.age.max
    };
  }
  return query;
};
That query object will be passed off to the find operation:
module.exports = (criteria, sortProperty, offset = 0, limit = 20) => {
  const query = Artist.find(buildQuery(criteria))
    .sort({ [sortProperty]: 1 })
    .skip(offset)
    .limit(limit);

  // Note: the count must be invoked; a bare Artist.count is only a function reference
  return Promise.all([query, Artist.countDocuments(buildQuery(criteria))]).then(results => {
    return {
      all: results[0],
      count: results[1],
      offset: offset,
      limit: limit
    };
  });
};

const buildQuery = (criteria) => {
  console.log(criteria);
  const query = {};
  if (criteria.age) {
    query.age = {
      $gte: criteria.age.min,
      $lte: criteria.age.max
    };
  }
  return query;
};
So what I am doing here is getting the equivalent of Artist.find({ age: { $gte: minAge, $lte: maxAge } }).
So for yearsActive I decided to implement something that is nearly identical:
const buildQuery = criteria => {
  console.log(criteria);
  const query = {};
  if (criteria.age) {
    query.age = {
      $gte: criteria.age.min,
      $lte: criteria.age.max
    };
  }
  if (criteria.yearsActive) {
  }
  return query;
};
So if the user changes the slider, I am going to expect my criteria object to have a yearsActive property defined on it like so:
const buildQuery = criteria => {
  console.log(criteria);
  const query = {};
  if (criteria.age) {
    query.age = {
      $gte: criteria.age.min,
      $lte: criteria.age.max
    };
  }
  if (criteria.yearsActive) {
    query.yearsActive = {
      $gte: criteria.yearsActive.min,
      $lte: criteria.yearsActive.max
    }
  }
  return query;
};
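The criteria object also carries a name string (empty by default). I did not wire it in above, but a sketch of one possible way to handle it, assuming a text index on the name field (e.g. ArtistSchema.index({ name: 'text' }), which is an assumption rather than something from the question), is a $text search:
const buildQuery = (criteria) => {
  const query = {};

  // Assumes a text index exists on name, e.g. ArtistSchema.index({ name: 'text' })
  if (criteria.name) {
    query.$text = { $search: criteria.name };
  }

  if (criteria.age) {
    query.age = { $gte: criteria.age.min, $lte: criteria.age.max };
  }

  if (criteria.yearsActive) {
    query.yearsActive = { $gte: criteria.yearsActive.min, $lte: criteria.yearsActive.max };
  }

  return query;
};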

Limiting amount of async BigQuery Jobs running on express server

I have an express server that is pulling data from Google BigQuery. An array of objects is provided. I want to pull sales data for each store in a district, but the table holds sales information only by store and doesn't have district information. I was accomplishing this by sending one query per district, but once the array has more than 50 districts I get errors. The results are stored in individual CSV files by district, so it is convenient to send individual queries and dump each result into its CSV. BigQuery only allows 50 jobs at a given time. I am looking for the best way to adapt the code below so it calls asyncQuery(query) up to 50 times and only makes the next call when a previous call has returned. I have been trying to track the job status using job.getMetadata(), but no luck yet.
Thanks for any help you can offer
const array = [
  {
    district: "north",
    store: "1001,1002"
  },
  {
    district: "south",
    store: "1003"
  },
  {
    district: "west",
    store: "1004"
  }
]
function apiCall(array) {
  array.forEach(element => {
    let stores = element.store.toString()
    let query = `SELECT store, sku, tot_sales, price
      FROM big-query-table
      WHERE store IN (${stores})`
    asyncQuery(query)
      .then(resp => {
        console.log(resp)
      }).catch(err => {
        console.error('ERROR:', err);
      })
  })
  return "Running Jobs"
}
function asyncQuery(sqlQuery) {
  const options = {
    query: sqlQuery,
    useLegacySql: false,
  };
  let job;
  return bigquery
    .createQueryJob(options)
    .then(results => {
      job = results[0];
      console.log(`Job ${job.id} started.`);
      return job.promise();
    })
    .then(() => {
      // Get the job's status
      return job.getMetadata();
    })
    .then(metadata => {
      // Check the job's status for errors
      const errors = metadata[0].status.errors;
      if (errors && errors.length > 0) {
        throw errors;
      }
    })
    .then(() => {
      console.log(`Job ${job.id} completed.`);
      return job.getQueryResults();
    })
    .then(results => {
      const rows = results[0];
      return rows;
    })
    .catch(err => {
      console.error('ERROR:', err);
    });
}
With BigQuery - and any other columnar analytical database - you really want to avoid doing 50 queries like:
[*50] SELECT * FROM big-query-table
WHERE storeNumber = ${StoreNumber}
Instead, the best you could do is one query, specifying the columns you are looking for, and all the ids you're looking for:
SELECT col1, col2, col3
FROM big-query-table
WHERE storeNumber IN ('id1', 'id2', ..., 'id50')
Or filter with a subquery:
SELECT col1, col2, col3
FROM big-query-table
WHERE storeNumber IN (SELECT store_id FROM `table`)
Then you won't need to send 50 concurrent queries, and you'll get results in less time and at lower cost.
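For the per-district CSV files, here is a rough sketch of how the single-query result could still be split up client-side, reusing the array and asyncQuery from the question (the field and table names are assumptions taken from the snippet above):
async function apiCall(array) {
  // One query covering every store in every district
  const allStores = array.flatMap(element => element.store.split(','))
  const query = `SELECT store, sku, tot_sales, price
    FROM big-query-table
    WHERE store IN (${allStores.join(',')})`

  const rows = await asyncQuery(query)

  // Partition the rows back into districts using the original store-to-district mapping
  return array.map(element => {
    const stores = element.store.split(',')
    return {
      district: element.district,
      rows: rows.filter(row => stores.includes(String(row.store)))
    }
  })
}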
