How to keep json object with highest value among duplicates with nodejs - node.js

I have JSON objects imported from an external system, some of which are duplicates in an ID value.
For example:
{
"ID": "1",
"name": "Bob",
"ink": "100"
},
{
"ID":"2",
"Name": "George",
"ink": "100"
},
{
"ID":"1",
"name": "Bob",
"ink":"200"
}
I am manipulating the information for each object, then pushing it into a new JSON array:
var array = {};
array.users = [];
for (let user of users) {
  // ... manipulate user
  array.users.push(user);
}
I need to remove all duplicates save the one with the highest value in the ink key.
I found solutions to do this for the array AFTER it is constructed, but that means I use system resources for nothing - no reason to manipulate users that will be removed anyway.
I am looking for a way to check, for each new user, whether a user with that ID already exists in the array.users[] array. If it does, I want to compare the values of the ink key and, if the new one is higher, remove the existing entry from the array. Then I can continue with my manipulation code and push the new user into the array.
Any ideas of what would be the most elegant/efficient/shortest way to accomplish this?

I am not really sure if I fully understood your question. If I understand correctly you don't want to pass through the entire array after it is constructed and check for duplicates?
"If in doubt throw a hash map at the problem". Use a map instead of a plain array. The map key stores the ID. And save your fields as the value. If a key already exists then you can just check which value is higher.
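The code should look roughly like this:
let userMap = new Map();
for (let user of users) {
  const existing = userMap.get(user["ID"]);
  // Store a new entry, or replace the stored one if this ink is bigger
  // (ink is a string in the input, so convert before comparing)
  if (existing === undefined || Number(user["ink"]) > Number(existing["ink"])) {
    userMap.set(user["ID"], user);
  }
}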
EDIT: My solution does require an extra step though and is not directly done in the original array. However, I still think that maps are probably one of the most efficient ways to handle this...
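To address the concern about manipulating users that get discarded anyway, one option is to deduplicate first and run the manipulation only on the survivors. A minimal sketch, assuming users is an array of objects shaped like the ones in the question and that ink always parses as a number; processUser is a hypothetical stand-in for the manipulation step, which the question does not show:

```javascript
// Sample input mirroring the question; "ink" arrives as a string.
const users = [
  { ID: "1", name: "Bob", ink: "100" },
  { ID: "2", name: "George", ink: "100" },
  { ID: "1", name: "Bob", ink: "200" },
];

// Hypothetical placeholder for the per-user manipulation from the question.
function processUser(user) {
  return { ...user, ink: Number(user.ink) };
}

// Pass 1: keep only the highest-ink record per ID (no manipulation yet).
const best = new Map();
for (const user of users) {
  const existing = best.get(user.ID);
  if (existing === undefined || Number(user.ink) > Number(existing.ink)) {
    best.set(user.ID, user);
  }
}

// Pass 2: manipulate only the survivors and collect them.
const result = { users: [] };
for (const user of best.values()) {
  result.users.push(processUser(user));
}

console.log(result.users);
```

The first pass only reads ID and ink, so the expensive work in the second pass runs once per distinct ID.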

var array = {};
array.users = users.filter((user)=>{
for (let userSecond of users) {
if(userSecond.ID === user.ID && +userSecond.ink > +user.ink){
return false;
}
}
return true;
});
Not the cleanest solution perhaps but it should do the job. Basically you filter through users. Within the filter you go through every user again to check if any of them has the same id and more ink, if so the current user should be discarded by returning false. If no user is found with same id and more ink the current user will stay in the array.

Related

Query CosmosDB when document contains Dictionary

I have a problem with querying CosmosDB document which contains a dictionary. This is an example document:
{
"siteAndDevices": {
"4cf0af44-6233-402a-b33a-e7e35dbbee6a": [
"f32d80d9-e93a-687e-97f5-676516649420",
"6a5eb9fa-c961-93a5-38cc-ecd74ada13ac",
"c90e9986-5aea-b552-e532-cd64a250ad10",
"7d4bfdca-547a-949b-ccb3-bbf0d6e5d727",
"fba51bfe-6a5e-7f25-e58a-7b0ced59b5d8",
"f2caac36-3590-020f-ebb7-5ccd04b4412c",
"1b446af7-ba74-3564-7237-05024c816a02",
"7ef3d931-131e-a639-10d4-f4dd5db834ca"
]
},
"id": "f9ef9fb6-4b70-7d3f-2bc8-c3d335018624"
}
I need to get all documents where the provided GUID is in the list, that is, in a dictionary value (I don't know the dictionary key). I found information somewhere here saying that it is not possible to iterate through the keys of a dictionary in CosmosDB (maybe that has changed since then, but I didn't find anything in the documentation), but maybe someone will have an idea. I cannot change the form of the document.
I tried to do it in LINQ, but I didn't get any results.
var query = _documentClient
.CreateDocumentQuery<Dto>(DocumentCollectionUri())
.Where(d => d.SiteAndDevices.Any(x => x.Value.Contains("f32d80d9-e93a-687e-97f5-676516649420")))
.AsDocumentQuery();
Not sure of the Linq query, but with SQL, you'd need something like this:
SELECT * FROM c
where array_contains(c.siteAndDevices['4cf0af44-6233-402a-b33a-e7e35dbbee6a'],"f32d80d9-e93a-687e-97f5-676516649420")
This is a strange document format though, as you've named your key with an id:
"siteAndDevices": {
"4cf0af44-6233-402a-b33a-e7e35dbbee6a": ["..."]
}
Your key is "4cf0af44-6233-402a-b33a-e7e35dbbee6a", which forces you to use a different syntax to reference it:
c.siteAndDevices['4cf0af44-6233-402a-b33a-e7e35dbbee6a']
You'd save yourself a lot of trouble refactoring this to something like:
{
"id": "dictionary1",
"siteAndDevices": {
"deviceId": "4cf0af44-6233-402a-b33a-e7e35dbbee6a",
"deviceValues": ["..."]
}
}
You can refactor further, such as using an array to contain multiple device id + value combos.
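If the document shape really cannot change and the dictionary key is unknown, the matching logic itself is simple to express on the client. This is only a sketch of that fallback, assuming the candidate documents have already been read into memory as plain objects; it does not push the filter down to CosmosDB, so it pays for reading every candidate document. The function name containsDevice is hypothetical.

```javascript
// Returns true when any array stored under any key of doc.siteAndDevices
// contains the given device GUID. The dictionary keys are never inspected.
function containsDevice(doc, deviceId) {
  const dictionary = doc.siteAndDevices || {};
  return Object.values(dictionary).some(
    (devices) => Array.isArray(devices) && devices.includes(deviceId)
  );
}

// Documents as they might look after being fetched (shortened device lists).
const docs = [
  {
    id: "f9ef9fb6-4b70-7d3f-2bc8-c3d335018624",
    siteAndDevices: {
      "4cf0af44-6233-402a-b33a-e7e35dbbee6a": [
        "f32d80d9-e93a-687e-97f5-676516649420",
        "6a5eb9fa-c961-93a5-38cc-ecd74ada13ac",
      ],
    },
  },
  { id: "other-doc", siteAndDevices: { "some-site": ["some-device"] } },
];

const matches = docs.filter((doc) =>
  containsDevice(doc, "f32d80d9-e93a-687e-97f5-676516649420")
);
console.log(matches.map((doc) => doc.id));
```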

Reduce output must shrink more rapidly -- Reducing to a list of documents

I have a few documents in my CouchDB database with JSON as below. The cId is different for each. I have created a view with map/reduce functions to filter out a few documents and return a list of JSON documents.
Document structure -
{
"_id": "ccf8a36e55913b7cf5b015d6c50009f7",
"_rev": "8-586130996ad60ccef54775c51599e73f",
"cId": 1,
"Status": true
}
Here is the sample map:
function(doc) {
if(doc.Key && doc.Value && doc.Status == true)
emit(null, doc);
}
Here is the sample reduce:
function(key, values, rereduce){
var kv = [];
values.forEach(function(value){
if(value.cId != <some_val>){
kv.push({"k": value.cId, "v" : value});
}
});
return kv;
}
If there are two documents and the reduce output is a list containing 1 document, this works fine. But if I add one more document (with cId = 2), it throws the error "reduce output must shrink more rapidly". Why is this caused? And how can I achieve what I intend to do?
The cause of the error is that the reduce function does not actually reduce anything (it is collecting objects instead). The documentation mentions this:
The way the B-tree storage works means that if you don’t actually
reduce your data in the reduce function, you end up having CouchDB
copy huge amounts of data around that grow linearly, if not faster
with the number of rows in your view.
CouchDB will be able to compute the final result, but only for views
with a few rows. Anything larger will experience a ridiculously slow
view build time. To help with that, CouchDB since version 0.10.0 will
throw an error if your reduce function does not reduce its input
values.
It is unclear to me what you intend to achieve.
Do you want to retrieve a list of docs based on certain criteria? In this case, a view without reduce should suffice.
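A view without a reduce might look like the sketch below. It assumes the filtering can be expressed through the key: the map function emits cId as the key and null as the value, and the documents are fetched with include_docs=true. The doc.Key and doc.Value checks from the question are dropped here because the sample document has no such fields; that is an assumption. The emit stub and rows array exist only so the function can be run outside CouchDB, and the name activeByCid is hypothetical.

```javascript
// Stand-in for CouchDB's emit so the map function can run locally.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: emit only active documents, keyed by cId. No reduce needed.
function activeByCid(doc) {
  if (doc.Status === true && doc.cId !== undefined) {
    emit(doc.cId, null);
  }
}

// Exercise the map function on a few sample documents.
[
  { _id: "a", cId: 1, Status: true },
  { _id: "b", cId: 2, Status: false },
  { _id: "c", cId: 3, Status: true },
].forEach(activeByCid);

console.log(rows); // [ { key: 1, value: null }, { key: 3, value: null } ]
```

Excluding a particular cId then becomes a matter of the key range used in the query rather than logic inside a reduce.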
Edit: If the desired result depends on a value stored in a certain document, then CouchDB has a feature called list. It is a design function that provides access to all docs of a given view if you pass include_docs=true.
A list URL follows this pattern:
/db/_design/foo/_list/list-name/view-name
Like views, lists are defined in a design document:
{
"_id" : "_design/foo",
"lists" : {
"bar" : "function(head, req) {
var row;
while (row = getRow()) {
if (row.doc._id === 'baz') // Do stuff based on a certain doc
}
}"
},
... // views and other design functions
}

reduce output must shrink more rapidly, on adding new document

I have a couple of documents in CouchDB, each having a cId field, such as -
{
"_id": "ccf8a36e55913b7cf5b015d6c50009f7",
"_rev": "8-586130996ad60ccef54775c51599e73f",
"cId": 1,
"Status": true
}
I have a simple view, which tries to return max of cId with map and reduce functions as follows -
Map
function(doc) {
emit(null, doc.cId);
}
Reduce
function(key, values, rereduce){
return Math.max.apply(null, values);
}
This works fine (output is 1) until I add one more document with cId = 2 to the db. I am expecting an output of 2, but it starts giving the error "Reduce output must shrink more rapidly". When I delete this document, things are back to normal again. What can be the issue here? Is there any alternative way to achieve this?
Note: There are more views in the db, which perform different roles, and a few return JSON as well. They also start failing on this change.
You could simply use the built-in _stats reduce function to get the maximum value. It is returned in the "max" field.
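As a sketch, a design document using the built-in reduce could look like this. The names _design/stats and cIdStats are hypothetical, and the typeof guard is an added assumption so that documents without a numeric cId are skipped, since _stats expects numeric values.

```javascript
// Design document sketch; "_design/stats" and "cIdStats" are made-up names.
const designDoc = {
  _id: "_design/stats",
  views: {
    cIdStats: {
      // Map functions are stored as strings inside a design document.
      map: "function (doc) { if (typeof doc.cId === 'number') { emit(null, doc.cId); } }",
      // Built-in reduce implemented by CouchDB itself.
      reduce: "_stats",
    },
  },
};

console.log(JSON.stringify(designDoc, null, 2));
```

Querying the view then returns a single row whose value is an object with sum, count, min, max and sumsqr fields; the maximum cId is the max field.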

Creating view to check a document fields for specific values (For a simple login)

I'm very new to Cloudant, so pardon me for this question. I am creating a simple mobile game login system which only checks for a username (email) and password.
I have several simple docs that are in this format
{
"_id": "xxx",
"_rev": "xxx",
"password": "3O+k+O8bxsxu0KUlSBUiww==", --encrypted by application beforehand
"type": "User",
"email": "asd#asd.com"
}
Right now I can't seem to get the correct 'Formula' for creating this view (map function) whereby I would do a network request and pass it both the email and password. If there is a doc that matches the email, then check the doc.password against the passed value. If it matches, the function should return a simple "YES".
For now my map function is as follows, but this just returns all the docs.
function(doc) {
if (doc.email){
index("password", doc.password, { store : true });
if (doc.password){
emit("YES");
}
}
}
It may be my request format is also wrong. Right now it is as follows. Values are not real, only for format checking
https:/etcetc/_design/app/_view/viewCheckLogin?q=email:"asd#asd.com"&password:"asd"
It looks like you have misunderstood how views are supposed to work. In general you cannot perform logic to return a different result based on the request. Query parameters in a view request can only be used to limit the result set of view entries returned or to return grouped information from the reduce function.
To determine if there is a match for a given username and password you could emit those values as keys and then query for them. This would return the view entry for those keys or an empty list if there was no match. However I'd be very cautious about the security here. Anyone with access to the view would be able to see all the view entries, i.e. all the usernames and passwords.
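A minimal sketch of the "emit those values as keys" idea follows, with the security caveat above still applying in full. The emit stub is there only so the map function can be run locally; byCredentials and credentialsQuery are hypothetical names. It assumes the view is queried with the standard key parameter, whose value is JSON and must be URL-encoded.

```javascript
// Stand-in for the database's emit so the map function can run locally.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: key each user document by [email, password]; no value needed.
function byCredentials(doc) {
  if (doc.type === "User" && doc.email && doc.password) {
    emit([doc.email, doc.password], null);
  }
}

// Build the query string for an exact key match on the view.
function credentialsQuery(email, password) {
  return "?key=" + encodeURIComponent(JSON.stringify([email, password]));
}

byCredentials({
  _id: "xxx",
  type: "User",
  email: "asd#asd.com",
  password: "3O+k+O8bxsxu0KUlSBUiww==",
});

console.log(rows);
console.log(credentialsQuery("asd#asd.com", "3O+k+O8bxsxu0KUlSBUiww=="));
```

The client treats an empty rows list in the response as "no match" and a non-empty one as a successful login; the view itself never returns "YES".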

emit doc twice with different key in couchdb

Say I have a doc to save with couchDB and the doc looks like this:
{
"email": "lorem#gmail.com",
"name": "lorem",
"id": "lorem",
"password": "sha1$bc5c595c$1$d0e9fa434048a5ae1dfd23ea470ef2bb83628ed6"
}
and I want to be able to query the doc either by 'id' or 'email'. So when I save this as a view, I write:
db.save('_design/users', {
byId: {
map: function(doc) {
if (doc.id && doc.email) {
emit(doc.id, doc);
emit(doc.email, doc);
}
}
}
});
And then I could query like this:
db.view('users/byId', {
key: key
}, function(err, data) {
if (err || data.length === 0) return def.reject(new Error('not found'));
data = data[0] || {};
data = data.value || {};
self.attrs = _.clone(data);
delete self.attrs._rev;
delete self.attrs._id;
def.resolve(data);
});
And it works just fine. I can load the data either by id or by email. But I'm not sure if I should do it this way.
I have another solution, which is to save the same doc in two different views, like byId and byEmail, but that way I save the same doc twice and it will obviously cost database space.
Not sure which solution is better.
The canonical solution would be to have two views, one by email and one by id. To not waste space for the document, you can just emit null as the value and then use the include_docs=true query parameter when you query the view.
Also, you might want to use _id instead of id. That way, CouchDB ensures that the ID will be unique and you don't have to use a view to look up documents.
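The two-view variant could be sketched as below. The views object has the same shape as the one passed to db.save in the question; the emit stub and rows array are only there to run the map functions locally and are not part of the design document.

```javascript
// Stand-in for CouchDB's emit so the map functions can run locally.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Two explicit views; each emits null so the document is not copied
// into the view index.
const views = {
  byId: {
    map: function (doc) {
      if (doc.id) emit(doc.id, null);
    },
  },
  byEmail: {
    map: function (doc) {
      if (doc.email) emit(doc.email, null);
    },
  },
};

const sample = { id: "lorem", email: "lorem#gmail.com", name: "lorem" };
views.byId.map(sample);
views.byEmail.map(sample);

console.log(rows);
```

When such a view is queried with include_docs=true, each returned row carries the full document in its doc field, so code that previously read the row's value would read doc instead.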
I'd change to the two separate views. That's explicit and clear. When you emit the same doc twice in a single view, by an id and by an e-mail, you're effectively combining the 2 views into one. You may think of it as a search tree with 2 root branches. I don't see any reason for doing that, and would suggest leaving the data access and storage optimization job to the database.
The views combination may also yield tricky bugs, when for some reason you confuse an id and an e-mail.
There is absolutely nothing wrong with emitting the same document multiple times with a different key. It's about what makes most sense for your application.
If id and email are always valid and interchangeable ways to identify a user then a single view is perfect. For example, when id is some sort of unique account reference and users are allowed to use that or their (more memorable) email address to login.
However, if you need to differentiate between the two values, e.g. id is only meant for application administrators, then separate views are probably better. (You could probably use a complex key instead ... but that's another answer.)
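For reference, the complex-key idea mentioned in passing might look like this sketch: a single view whose keys are arrays tagged with the kind of identifier, so an id can never be mistaken for an email. The tag strings "id" and "email" and the name byIdentifier are arbitrary choices, and the emit stub only exists to run the function locally.

```javascript
// Stand-in for CouchDB's emit so the map function can run locally.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// One view, two kinds of key: ["id", <id>] and ["email", <email>].
function byIdentifier(doc) {
  if (doc.id) emit(["id", doc.id], null);
  if (doc.email) emit(["email", doc.email], null);
}

byIdentifier({ id: "lorem", email: "lorem#gmail.com", name: "lorem" });

console.log(rows);
```

A lookup by email would then use the key ["email", "lorem#gmail.com"], and a lookup by id the key ["id", "lorem"].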
