node.js mongoose remove the subset of persisted documents not in a current list of those documents - node.js

I have a list of documents I retrieve from a web API. All documents in this list have the same structure, and two of the fields combined form a natural key.
I take this list and persist it into a collection.
A month or so later I will call the API for a fresh subset of documents based on a specific value of one of those two fields. However, the new subset may not include all of the documents previously persisted.
I need to identify and remove the old documents that are not in the fresh subset.
In SQL this is:
delete a from olderset a
left join newersubset b
on a.f1 = b.f1
and a.f2 = b.f2
where b.f2 is null
-- or something like that
Think of f1 as companyName and f2 as transactionID.
olderset will contain transactions for several different companyName values.
But my newer API call only returns the transactions of one specific company.
In mongoose, what is the best strategy to remove that company's older transactions from the olderset collection when they no longer exist in the newersubset list?
Can you offer a code example?
Sample data:
[
{ "f1": "f1a", "f2": "f2a", "f3": "f3a" }
, { "f1": "f1b", "f2": "f2b", "f3": "f3b" }
, { "f1": "f1c", "f2": "f2c", "f3": "f3c" }
, { "f1": "f1d", "f2": "f2d", "f3": "f3d" }
]
the second round:
[
{ "f1": "f1a", "f2": "f2a", "f3": "f3a" }
, { "f1": "f1b", "f2": "f2b", "f3": "f3b" }
, { "f1": "f1c", "f2": "f2c", "f3": "f3c" }
]

If you have a set of documents that you would like to use to replace ALL of the documents in an existing collection, the best and safest way to do this is by using a temporary collection.
The following steps assume your collection is called foo
Insert the new documents into a temporary collection called foo_temp
Once all the records have been added (in a callback or a then) rename the original foo collection to foo_old
Rename the foo_temp collection to foo
Drop the collection foo_old
Notes:
In MongoDB, the new collection will be created automatically on the first insert.
Performance should not be an issue, as you are only handling 1K records or so. Still, it wouldn't hurt to do this overnight.
The question notes that the IDs are set explicitly rather than auto-generated; if they were auto-generated, the new ones would not match the old ones.
References:
Inserting multiple documents
Renaming a collection
Dropping a collection
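Putting the steps above together, a rough sketch using mongoose's underlying native driver could look like this. The collection names (foo, foo_temp, foo_old) follow the steps; newDocs stands in for the fresh documents from the API:
const mongoose = require("mongoose");

// Assumes mongoose is already connected; newDocs is the fresh list from the API.
async function swapInNewDocuments(newDocs) {
  const db = mongoose.connection.db;

  // 1. Insert the new documents into a temporary collection (created automatically).
  await db.collection("foo_temp").insertMany(newDocs);

  // 2. Move the current collection out of the way.
  await db.renameCollection("foo", "foo_old");

  // 3. Promote the temporary collection.
  await db.renameCollection("foo_temp", "foo");

  // 4. Drop the old data.
  await db.dropCollection("foo_old");
}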

As you say, you could iterate over the subset of older documents for that company. Match each of those against the list of newer documents on the natural key; when you find an older document that is not in the newer list, delete it.
In LINQ this would be easy. Is that available to you?
I don't know how to do it in mongoose without iterating over one side or the other, and/or without LINQ.
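For what it's worth, a rough mongoose sketch of that iteration might look like the following. The model name (Transaction) is a placeholder, with f1 as companyName and f2 as transactionID per the question:
// Assumes a Transaction model with fields f1 (companyName) and f2 (transactionID).
async function removeStaleTransactions(companyName, newerSubset) {
  // Natural keys present in the fresh API result.
  const keep = new Set(newerSubset.map(d => `${d.f1}|${d.f2}`));

  // Iterate over the older documents for this company and delete the misses.
  const older = await Transaction.find({ f1: companyName }).lean();
  for (const doc of older) {
    if (!keep.has(`${doc.f1}|${doc.f2}`)) {
      await Transaction.deleteOne({ f1: doc.f1, f2: doc.f2 });
    }
  }
}
Alternatively, the same effect without iterating is a single query scoped to the company, e.g. Transaction.deleteMany({ f1: companyName, f2: { $nin: newerSubset.map(d => d.f2) } }).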

Related

ArangoDB populate relation as field over graph query

I recently started using Arango since I want to make use of the advantages of graph databases. However, I'm not yet sure what the most elegant and efficient approach is for querying an item from a document collection and applying fields to it that come from a relation.
I'm used to using population or joins in SQL and NoSQL databases, but I'm not sure how it works here.
I created a document collection called posts. For example, this is a post:
{
"title": "Foo",
"content": "Bar"
}
And I also have a document collection called tags. A post can have any amount of tags, and my goal is to fetch either all or specific posts, but with their tags included, so for example this as my returning query result:
{
"title": "Foo",
"content": "Bar",
"tags": ["tag1", "tag2"]
}
I tried creating those two document collections and an edge collection post-tags-relation, in which I added an edge from the post to each of its tags. I also created a graph, although I'm not yet sure what the vertex field is used for.
My query looked like this
FOR v, e, p IN 1..2 OUTBOUND 'posts/testPost' GRAPH 'post-tags-relation' RETURN v
And it did give me the tag, but my goal is to fetch a post and include its tags in the same document... The path vertices do contain all the tags and the post, but in separate arrays, which is neither nice nor easy to use (and probably not the right way). I'm probably missing something important here. Hopefully someone can help.
You're really close - it looks like your query to get the tags is correct. Now, just add a bit to return the source document:
FOR post IN posts
  FILTER post._key == 'testPost'
  LET tags = (
    FOR v IN 1..2 OUTBOUND post
    GRAPH 'post-tags-relation'
    RETURN v.value
  )
  RETURN MERGE(
    post,
    { tags }
  )
Or, if you want to skip the FOR/FILTER process:
LET post = DOCUMENT('posts/testPost')
LET tags = (
  FOR v IN 1..2 OUTBOUND post
  GRAPH 'post-tags-relation'
  RETURN v.value
)
RETURN MERGE(
  post,
  { tags }
)
As for graph definition, there are three required fields:
edge definitions (an edge collection)
from collections (where your edges come from)
to collections (where your edges point to)
The non-obvious vertex collections field is there to allow you to include a set of vertex-only documents in your graph. When these documents are searched and how they're filtered remains a mystery to me. Personally, I've never used this feature (my data has always been connected) so I can't say when it would be valuable, but someone thought it was important to include.
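For completeness, running the second query above from Node.js with the arangojs driver might look roughly like this. The connection URL and database name are placeholders, and the assumption that tags store their text in a value field comes from the answer above:
const { Database, aql } = require("arangojs");

const db = new Database({ url: "http://localhost:8529", databaseName: "mydb" });

async function getPostWithTags(postKey) {
  const cursor = await db.query(aql`
    LET post = DOCUMENT(CONCAT("posts/", ${postKey}))
    LET tags = (
      FOR v IN 1..2 OUTBOUND post
      GRAPH "post-tags-relation"
      RETURN v.value
    )
    RETURN MERGE(post, { tags })
  `);
  return cursor.next(); // e.g. { title: "Foo", content: "Bar", tags: ["tag1", "tag2"] }
}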

Is there a way to search In Firebase firestore without saving another field in lowercase for case-insensitive search? [duplicate]

This question already has answers here:
Cloud Firestore Case Insensitive Sorting Using Query
Are Cloud Firestore queries still case sensitive?
To support case-insensitive search, or any other canonicalization, do we need to write a separate field that contains the canonicalized version and query against that?
For example:
db.collection("users").where("name", "==", "Dan")
db.collection("users").where("name_lowercase", "==", "dan")
What I would do:
Before querying (maybe client-side): convert the search term into two or more variations (10 variations is the maximum). For example, the search term "dan" (a string) becomes the array ["dan", "DAN", "Dan"].
Then I would do an "in" query, searching for all of those variations in the same name field.
The "in" query type supports up to 10 equality (==) clauses with a logical "OR" operator. (documentation here)
This way, you can keep only one field "name" and query with possible variations on it.
It would look like this:
let query_variations = ["dan", "DAN", "Dan"]; // TODO: write a function that converts the query string into this kind of Array
let search = await db.collection("users").where("name", "in", query_variations).get();
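The TODO above is left to the reader; one possible (hypothetical) helper that produces such an array, capped at Firestore's limit of 10 values per "in" query, could be:
// Builds a few common case variations of a search term for an "in" query.
function caseVariations(term) {
  const lower = term.toLowerCase();
  const upper = term.toUpperCase();
  const capitalized = lower.charAt(0).toUpperCase() + lower.slice(1);
  // De-duplicate and respect the 10-value limit of "in" queries.
  return [...new Set([term, lower, upper, capitalized])].slice(0, 10);
}

caseVariations("dan"); // ["dan", "DAN", "Dan"]
Note that this only covers the common variations; it will not match arbitrary mixed case such as "dAn", which is the inherent limitation of this approach.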
In short, yes.
This is because Cloud Firestore (and the Firebase Realtime Database, when enabled) are indexed databases based on the values of each property in a document.
Rather than search through hundreds (if not thousands and thousands) of documents for matches, the index of the relevant property is queried for matching document IDs.
Consider the following "database" and its index based on the name in the documents:
const documents = {
"docId1": {
name: "dan"
},
"docId2": {
name: "dan"
},
"docId3": {
name: "Dan"
},
"docId4": {
name: "Dan"
}
}
const nameIndex = {
"dan": ["docId1, docId2"],
"Dan": ["docId3, docId4"]
}
Instead of calling Object.entries(documents).filter(([id, data]) => data.name === "dan") on the entire list of documents, you can just ask the index using nameIndex["dan"], yielding the final result ["docId1", "docId2"] near-instantly, ready to be retrieved.
Continuing that same example, calling nameIndex["daniel"] gives undefined (no documents with that name), which can quickly be used to say that the data doesn't exist in the database.
Firestore introduced composite indexes, which allow you to index across multiple properties such as "name" and "age", so you can also quickly and efficiently find documents where the name is "Dan" and the age is 42.
Further reading: The Firebase documentation covers one solution for text-based search here.
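For reference, the duplicated-field approach from the question itself, sketched with the Firebase Admin SDK (the collection and field names are the question's own; the document ID is a placeholder):
const admin = require("firebase-admin");
admin.initializeApp();
const db = admin.firestore();

async function example() {
  // Write the canonicalized copy alongside the original value.
  await db.collection("users").doc("someUserId").set({
    name: "Dan",
    name_lowercase: "dan"
  });

  // Query against the canonicalized field, lowercasing the search term first.
  const term = "DAN";
  const snapshot = await db
    .collection("users")
    .where("name_lowercase", "==", term.toLowerCase())
    .get();
  return snapshot.docs.map(doc => doc.data());
}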

Compare values inside same subdocument for findOne() [MongoDB]

I have a database full of objects that look ~exactly like this (simplified for clarity):
{
"_id": "GIFT100",
"price": 100,
"priceHistory": [
100, 110
],
"update": 1444183299242
}
What I'm trying to do is create a query document for MongoJS (or MongoDB and I can figure out the rest) that looks for the fact that priceHistory[0] < priceHistory[1].
I would want my query document to return the above record as a result. Alternatively, I could change my document code to compare price < priceHistory[0] but I believe this still leads to the same problem (comparing values inside the same document).
Any help would be appreciated, I've exhausted my Google-foo.
Edit:
I want to return a set of records that indicate a price drop since our last scan (performed daily). Basically a set of "sale" items from a data source I don't control.
You can use the $where clause, but be careful: it's slow, it cannot use your indexes, and it will perform a full collection scan. Pass in whatever JavaScript you want to use for the comparison:
db.collection.findOne({$where: "this.priceHistory[0] < this.priceHistory[1]"})
Additionally, you can skip the $where key if that's the only thing you're querying by:
db.collection.findOne("this.priceHistory[0] < this.priceHistory[1]")

Cloudant 1 to many function

I’ve just started to use Cloudant and I just can’t get my head around the map functions. I’ve been fiddling with the data below but it isn’t working out as I expected.
The relationship is: a user can have many vehicles, and a vehicle belongs to one user. The vehicle's 'userId' is the key of the user. There is a bit of redundancy, as in the user document the _id and userId are the same; I guess the latter is not required.
Anyhow, how can I find, for a given user (or every user), the vehicles which belong to it? The closest I've come through trial and error is a result which displays the owner of every vehicle, but I would like it the other way round: the user and the vehicles belonging to it. All the examples I've found use another document which 'joins' two or more documents, but I don't think I need to do that?
Any point in the right direction appreciated - I really have no idea.
function (doc) {
  if (doc.$doctype == "vehicle") {
    emit(doc.userId, {_id: doc.userId});
  }
}
EDIT: Getting closer. I'm not sure exactly what I was expecting, but the result seems a bit 'messy'. Row[0] is the user document, row[n > 0] are the vehicle documents. I guess it's fine when a startkey/endkey is used, but without one the results are a bit jumbled up.
function (doc) {
  if (doc.$doctype == 'user') {
    emit([doc._id, 0], doc);
  } else if (doc.$doctype == 'vehicle') {
    emit([doc.userId, 1, doc._id], doc);
  }
}
A user is described as,
{
"_id": "user:10",
"firstname": “firstnamehere",
"secondname": “secondnamehere",
"userId": "user:10",
"$doctype": "user"
}
a vehicle is described as,
{
"_id": "vehicle:4002”,
“name”: “avehicle”,
"userId": "user:10",
"$doctype": "vehicle",
}
You're getting in the right direction! You already got that right with the global IDs. Having the type of the document as part of the ID in some form is a very good idea, so that you don't get confused later (all documents are in the same "pot").
Here are some minor problems with your current solution (before getting to your actual question):
Don't emit the doc as the value in emit(key, value). You can always ask for the document that belongs to a view row by querying with include_docs=true. Having the doc as the view value increases the size of the view index a lot. When you don't need a specific value, use emit(key, null).
You also don't need the ID in the emit value. You'll get the ID of the document that belongs to a view row as part of the row anyway.
View Collation
Now to your problem of aggregating the vehicles with their user. You got the basic pattern right. This pattern is called view collation; you can read more about it in the CouchDB docs (ignore that it is in the "Couchapp" section).
The trick with view collation is that you return two or more types of documents, but make sure that they are sorted in a way that allows for direct grouping. Thus it is important to understand how CouchDB sorts the view result. See the collation specification for more information on that one. An important key to understanding view collation is that rows with array keys are sorted by key elements. So when two rows have the same key[0], they sort by key[1]. If that's equal as well, key[2] is considered, and so on.
Your map function first groups users and vehicles by user ID (key[0]). It then uses the fact that 0 sorts before 1 in the second element of the key, so your view will contain the following:
user 1
vehicle of user 1
vehicle of user 1
vehicle of user 1
user 2
user 3
vehicle of user 3
user 4
etc.
As you can see, the vehicles of a user immediately follow their user. Thus you can group this result into aggregates without performing expensive sort or lookup operations.
Note that users are sorted according to their ID, and vehicles within users also according to their ID. This is because you use the IDs in the key array.
Creating Queries
Now that view isn't worth much if you can't query according to your needs. A view as you have it supports the following queries:
Get all users with their vehicles
Get a range of users with their vehicles
Get a single user with its vehicles
Get a single user without vehicles (you could also use the _all_docs view for that though)
Example query for "all users between user 1 and user 3 (inclusive) with their vehicles"
We want to query for a range, so we use startkey and endkey in the query:
startkey=["user:1", 0]
endkey=["user:3", 1, {}]
Note the use of {} as a sentinel value, which is required so that the end key is larger than any row that has a key of ["user:3", 1, (anyConceivableVehicleId)].
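To make that concrete, here is a rough sketch of the range query plus client-side grouping with the nano CouchDB/Cloudant client. The design document and view names (app / vehicles_by_user) are assumptions for illustration, and include_docs=true pairs with the emit(key, null) advice above:
const nano = require("nano")("https://username:password@account.cloudant.com");
const db = nano.db.use("mydb");

async function usersWithVehicles(fromUser, toUser) {
  const result = await db.view("app", "vehicles_by_user", {
    startkey: [fromUser, 0],
    endkey: [toUser, 1, {}],
    include_docs: true
  });

  // Rows arrive collated: each user row is immediately followed by its vehicles.
  const users = [];
  for (const row of result.rows) {
    if (row.key[1] === 0) {
      users.push({ user: row.doc, vehicles: [] });
    } else {
      users[users.length - 1].vehicles.push(row.doc);
    }
  }
  return users;
}

// usersWithVehicles("user:1", "user:3") -> [{ user, vehicles: [...] }, ...]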

couchdb - Map Reduce - How to Join different documents and group results within a Reduce Function

I am struggling to implement a map / reduce function that joins two documents and sums the result with reduce.
The first document type is a category. Each category has an ID, and within the attributes I store a detail category, a main category and a division ("Bereich").
{
"_id": "a124",
"_rev": "8-089da95f148b446bd3b33a3182de709f",
"detCat": "Life_Ausgehen",
"mainCat": "COL_LEBEN",
"mainBereich": "COL",
"type": "Cash",
"dtCAT": true
}
The second document type is a transaction. The attributes show all the details for each transaction, including the field "newCat" which is a reference to the category ID.
{
"_id": "7568a6de86e5e7c6de0535d025069084",
"_rev": "2-501cd4eaf5f4dc56e906ea9f7ac05865",
"Value": 133.23,
"Sender": "Comtech",
"Booking Date": "11.02.2013",
"Detail": "Oki Drucker",
"newCat": "a124",
"dtTRA": true
}
Now I want to develop a map/reduce that gets the result in the form:
e.g.: "Name of main category", "Sum of all values in transactions".
I figured out that I could reference another document with "_id" and ?include_docs=true, but in that case I cannot use a reduce function.
I looked in other postings here, but couldn't find a suitable example.
Would be great if somebody has an idea how to solve this issue.
I understand that multiple category documents may have the same mainCat value. The technique called view collation is suitable for some cases where a single join would be used in the relational model. In your case it will not help: although you use two document schemas, you really have a three-level structure: main category <- category <- transaction. I think you should consider changing the DB design a bit.
Duplicating the data, by storing the mainCat value in the transaction document as well, would help. I also suggest using a meaningful ID for the transaction instead of a generated one. You could consider, for example, "COL_LEBEN-7568a6de86e5e" (mainCat concatenated with some random value, where the - delimiter never appears in mainCat). Then, with a simple parser in the map function, you emit ["COL_LEBEN", "7568a6de86e5e"] for transactions and ["COL_LEBEN"] for categories, and reduce to get the sum.
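To illustrate the suggestion, a rough sketch of such a map function follows. The exact ID scheme is the assumption described above, and the built-in _sum serves as the reduce function:
// map
function (doc) {
  if (doc.dtTRA) {
    // Transaction IDs are assumed to look like "COL_LEBEN-7568a6de86e5e".
    var parts = doc._id.split("-");
    emit([parts[0], parts[1]], doc.Value);   // e.g. ["COL_LEBEN", "7568a6de86e5e"] -> 133.23
  } else if (doc.dtCAT) {
    emit([doc.mainCat], 0);                  // category rows add nothing to the sum
  }
}

// reduce: _sum
// Query with ?group_level=1 to get one total per main category,
// e.g. ["COL_LEBEN"] -> 133.23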
