Efficient way to determine if there is more than one distinct item using linq.js

I'm looking for an efficient way using linq.js to determine if a collection has more than one distinct value. I assume that the following approach is inefficient because it has to consider the entire collection.
if (Enumerable.From(collection).Distinct().Take(2).Count() > 1) {
//it's not unique, continue loop
}
My question is similar to this one:
Efficient Linq Enumerable's 'Count() == 1' test
Is there a more efficient linq.js-based technique? Thanks!

If you're specifically testing to see if a collection has more than one item in it, the idiomatic way to write it (IMHO) is to use Skip in conjunction with Any. Skip the first item and if there are any others in the collection, it has more than one. If it was empty, the Skip would effectively do nothing and there still wouldn't be any other items in the collection.
In your case, your condition would be:
if (Enumerable.From(collection).Distinct().Skip(1).Any()) {
//it's not unique, continue loop
}

var test = collection[0];
if (Enumerable
.From(collection)
.Skip(1)
.Any(function (e) { return e != test; })
) {
// more than one distinct value
}
Let me explain. If a collection holds at least two distinct values, then for any item there is at least one other item that is not equal to it. So pick the first item (any item would do; the first is just the most convenient) and check whether any of the remaining items differs from it.
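For comparison, the same early-exit idea can be sketched in plain JavaScript without linq.js. This is only an illustration: hasMoreThanOneDistinct is a made-up helper name, and it relies on Set equality (SameValueZero) rather than the != comparison used above, so the two can disagree for mixed types such as 1 and "1".

```javascript
// Returns true as soon as a second distinct value is seen.
// hasMoreThanOneDistinct is a hypothetical helper, not part of linq.js.
function hasMoreThanOneDistinct(items) {
  const seen = new Set();
  for (const item of items) {
    seen.add(item);
    if (seen.size > 1) {
      return true; // stop early; the rest of the collection is never visited
    }
  }
  return false; // empty, or every element is the same value
}

hasMoreThanOneDistinct([7, 7, 7]); // false
hasMoreThanOneDistinct([7, 8, 7]); // true, decided at the second element
```

Because the loop returns on the second distinct value, the cost depends on how soon the collection first varies, not on its total length.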

Related

Building a std::map and issue with using std::emplace

Code:
std::map<CString, S_DISCUSSION_HIST_ITEM> mapHistory;
// History list is in ascending date order
for (auto& sHistItem : m_listDiscussionItemHist)
{
if (m_bFullHistoryMode)
mapHistory.emplace(sHistItem.strName, sHistItem);
else if (sHistItem.eSchool == m_eActiveSchool)
mapHistory.emplace(sHistItem.strName, sHistItem);
}
// The map is sorted by Name (so reset by date later)
// The map has the latest assignment info for each Name now
Observation:
I now understand that std::map::emplace behaves like this:
The insertion only takes place if no other element in the container has a key equivalent to the one being emplaced (keys in a map container are unique).
Therefore my code is flawed. What I was hoping to achieve (in pseudo code) is:
For Each History Item
Is the name in the map?
No, so add to map with sHistItem
Yes, so replace the sHistItem with this one
End Loop
By the end of this loop I want to have the most recent sHistItem for each person. But as it is, it only adds an entry to the map if the name does not already exist.
What is the simplest way to get around this?

Use the insert_or_assign method (available since C++17) if the mapped type is assignable; it assigns the new value when the key already exists. Alternatively, use operator[] followed by assignment; it default-constructs the mapped value if the key does not exist yet.
For non-assignable types I'm afraid there's no convenient way.

Cloudant Custom Sort

I have my data as follows
{
"key":"adasd",
"col1":23,
"col2":3
}
I want to see the results sorted in descending order of the ratio of col1/sum(col2)
where sum(col2) refers to the sum of all values of col2. I am a bit new to cloudant so I don't know what the best way to approach this is. I can think of a few options.
1. Create a new column for sum(col2) and keep updating it with each new value of col2.
2. For each record, also create a new column col1/sum(col2). Then I can sort on this column.
3. Use views to calculate the ratio and sum on the fly. This way I don't have to store new columns, and I don't have to perform costly calculations on each update.
I tried to create a view and the map function is easy enough
function (doc) {
emit(doc._id, {"col1_value":doc.col1,"col2_value":doc.col2});
}
but I am confused by the reduce template
function (keys, values, rereduce) {
if (rereduce) {
return sum(values);
} else {
return values.length;
}
}
I have no idea on how to access the values of the two columns and then aggregate here. Is this even possible? Is there any other way to achieve the result I need?

Two comments:
Ordering by X/sum(Y) is the same as ordering by X (or by -X if sum(Y) is negative), because sum(Y) is the same constant for every document. So for ordering purposes, just order by X and save yourself a bunch of hassle.
Assuming you actually want to know the value of X/sum(Y), and not just order by it, there's no one-step way to accomplish this in CouchDB. The best I can think of is to create a map/reduce view that gives you the global sum(Y). Then you can fetch that sum with a simple query, and do the math in your application, when fetching your documents.
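A rough sketch of that two-step approach follows, using an in-memory array in place of a real database. The sample documents and the helper names globalSum and rankByRatio are invented for illustration; in CouchDB the sum would come from a view whose map function emits doc.col2 and whose reduce is the built-in _sum.

```javascript
// Hypothetical sample documents shaped like the one in the question.
const docs = [
  { key: "a", col1: 23, col2: 3 },
  { key: "b", col1: 10, col2: 5 },
  { key: "c", col1: 40, col2: 2 },
];

// Stand-in for the single value a map/reduce view would return.
function globalSum(documents) {
  return documents.reduce((total, doc) => total + doc.col2, 0);
}

// Application-side step: attach the ratio, then sort in descending order.
function rankByRatio(documents) {
  const total = globalSum(documents);
  return documents
    .map((doc) => ({ ...doc, ratio: doc.col1 / total }))
    .sort((a, b) => b.ratio - a.ratio);
}

const ranked = rankByRatio(docs);
console.log(ranked.map((doc) => doc.key)); // [ 'c', 'a', 'b' ]
```

The resulting order is the same as sorting by col1 alone, which is the point of the first comment; the division only matters if the ratio itself is needed.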

How to maintain counters with LinqToObjects?

I have the following c# code:
private XElement BuildXmlBlob(string id, Part part, out int counter)
{
// return some unique xml particular to the parameters passed
// remember to increment the counter also before returning.
}
Which is called by:
var counter = 0;
result.AddRange(from rec in listOfRecordings
from par in rec.Parts
let id = GetId("mods", rec.CKey + par.UniqueId)
select BuildXmlBlob(id, par, counter));
Above code samples are symbolic of what I am trying to achieve.
According to Eric Lippert, the out keyword and LINQ do not mix. OK, fair enough, but can someone help me refactor the above so that it works? A colleague at work mentioned accumulator and aggregate functions, but I am a novice with LINQ and my Google searches were not bearing any real fruit, so I thought I would ask here :).
To Clarify:
I am counting the number of parts I might have which could be any number of them each time the code is called. So every time the BuildXmlBlob() method is called, the resulting xml produced will have a unique element in there denoting the 'partNumber'.
So if the counter is currently on 7, that means we are processing 7th part so far!! That means XML returned from BuildXmlBlob() will have the counter value embedded in there somewhere. That's why I need it somehow to be passed and incremented every time the BuildXmlBlob() is called per run through.

If you want to keep this purely in LINQ and you need to maintain a running count for use within your queries, the cleanest way to do so is to use the Select() overload that passes the current index to the projection.
In this case, it would be cleaner to do a query which collects the inputs first, then use the overload to do the projection.
var inputs =
from recording in listOfRecordings
from part in recording.Parts
select new
{
Id = GetId("mods", recording.CKey + part.UniqueId),
Part = part,
};
result.AddRange(inputs.Select((x, i) => BuildXmlBlob(x.Id, x.Part, i)));
Then you wouldn't need to use the out/ref parameter.
XElement BuildXmlBlob(string id, Part part, int counter)
{
// implementation
}

Below is what I managed to figure out on my own:
result.AddRange(listOfRecordings.SelectMany(rec => rec.Parts, (rec, par) => new {rec, par})
.Select(@t => new
{
@t,
Id = GetStructMapItemId("mods", @t.rec.CKey + @t.par.UniqueId)
})
.Select((@t, i) => BuildPartsDmdSec(@t.Id, @t.@t.par, i)));
I used ReSharper to convert the query into a method chain, which gave me the basics of what I needed, and then I simply tacked the indexed Select onto the end.

How to search and sort with CouchDB in one map function

I'm stumbling a bit with my CouchDB knowledge.
I have a database of content that is tagged with an array of tags and has a created date.
I want to create a view that pulls a limited number of newest stories tagged with a specific tag.
For example, the newest 6 stories tagged "Business."
Ran across this question, which seems to get me almost to where I need to go, but I'm missing one key element, which I think is how to craft the query string to sort by one key while searching by the other.
Here's my map function.
function(doc) {
if (doc.published == "yes" && doc.type == "news") {
for (var i = 0; i < doc.tags.length; i++) {
if (doc.tags[i]) {
emit([doc.created, doc.tags[i]], doc);
}
}
}
}
So how do I query that view for all documents tagged "Business" that are the newest documents based on created?
The created attribute is in a sortable date format.

First, I would switch the order of your emit:
emit([doc.tags[i], doc.created]);
(leave out doc as well, you can just add include_docs=true to get the entire document, and your view won't take up so much disk-space in the process)
Now you can query for all the stories tagged as "Business" by using the following query string:
startkey=["Business"]&endkey=["Business",{}]
You'll get all the documents with the tag business, and they'll be sorted by date.
This takes advantage of view collation, which is the set of rules governing how indexes are sorted and queried. For complex keys like this, the array elements are compared one at a time (i.e. entries are sorted by the first element, then by the second among entries whose first elements are equal, and so on). This is why the order matters: you must always move from left to right when querying a view index.
If you want the 6 most recent, your query string will need to change:
descending=true&limit=6&endkey=["Business"]&startkey=["Business",{}]
Note: you need to swap the startkey/endkey values because descending=true makes the view scan from the high end of the range down to the low end. See the View reference page on the wiki for further explanation.
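As a sketch of how such a query string might be assembled in client code, the snippet below JSON-encodes the keys and lets URLSearchParams handle the URL escaping. The function name newestByTagQuery is invented for this example; the parameter names (descending, limit, startkey, endkey) are the ones used above.

```javascript
// Builds the query string for a view keyed as [tag, created].
// newestByTagQuery is a hypothetical helper name.
function newestByTagQuery(tag, limit) {
  const params = new URLSearchParams({
    descending: "true",
    limit: String(limit),
    // With descending=true the scan starts at the high end of the range,
    // so the "larger" key ([tag, {}]) goes in startkey.
    startkey: JSON.stringify([tag, {}]),
    endkey: JSON.stringify([tag]),
  });
  return params.toString();
}

const query = newestByTagQuery("Business", 6);
console.log(query);
```

The keys must be valid JSON, which is why the tag is wrapped with JSON.stringify rather than concatenated by hand.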

OK, I think I figured this out, but I'm not quite certain I fully understand it.
I found this story about complex keys and searching and sorting.
My map function looks like this:
function(doc) {
if (doc.published == "yes" && doc.type == "news") {
for (var i = 0; i < doc.tags.length; i++) {
if (doc.tags[i]) {
emit([doc.tags[i], doc.created], doc);
}
}
}
}
And to query and sort using it, the query looks like this.
http://localhost:5984/database/_design/story/_view/tagged?limit=10&startkey=["Business"]&endkey=["Business",{}]&descending=false
I'm getting the results I want, but I'm not entirely certain I understand it all.

Does CouchDB support multiple range queries?

How are multiple range queries implemented in CouchDB? For a single range condition, startkey and endkey combination works fine, but the same thing is not working with a multiple range condition.
My View function is like this:
"function(doc){
if ((doc['couchrest-type'] == 'Item')
&& doc['loan_name']&& doc['loan_period']&&
doc['loan_amount'])
{ emit([doc['template_id'],
doc['loan_name'],doc['loan_period'],
doc['loan_amount']],null);}}"
I need to get the whole docs with loan_period > 5 and
loan_amount > 30000. My startkey and endkey parameters are like this:
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0",nil,5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0",{},{},{}],:include_docs => true}
Here, I am not getting the desired result. I think my startkey and endkey params are wrong. Can anyone help me?

A CouchDB view is an ordered list of entries. Queries on a view return a contiguous slice of that list. As such, it's not possible to apply two inequality conditions.
Assuming that your loan_period is a discrete variable, this case would probably be best solved by emit'ing the loan_period first and then issuing one query for each period.
An alternative solution would be to use couchdb-lucene.

You're using arrays as your keys. CouchDB compares arrays element by element, from the first element onward, until it finds two elements that are not equal.
E.g. to compare [1,'a',5] and [1,'c',0] it will compare 1 with 1, then 'a' with 'c', and will decide that [1,'a',5] is less than [1,'c',0] without ever looking at the third element.
This explains why your range key query fails. For example:
["7446567e45dc5155353736cb3d6041c0",nil,6,10000] is greater than ["7446567e45dc5155353736cb3d6041c0",nil,5,30000], so it falls inside your range even though its loan_amount is below 30000.
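The element-by-element comparison can be illustrated with a simplified comparator. This is a sketch under assumptions, not CouchDB's actual implementation: it orders types as null < booleans < numbers < strings < arrays < objects, compares strings with plain JavaScript ordering instead of Unicode collation, and treats all objects as equal. The names typeRank, compareKeys and inRange, and the sample key, are made up for the example.

```javascript
// Rank of each JSON type in a simplified model of view collation.
function typeRank(value) {
  if (value === null) return 0;
  if (typeof value === "boolean") return 1;
  if (typeof value === "number") return 2;
  if (typeof value === "string") return 3;
  if (Array.isArray(value)) return 4;
  return 5; // plain object
}

// Negative if a sorts before b, positive if after, 0 if equal.
function compareKeys(a, b) {
  const rankDiff = typeRank(a) - typeRank(b);
  if (rankDiff !== 0) return rankDiff;
  if (Array.isArray(a)) {
    const shared = Math.min(a.length, b.length);
    for (let i = 0; i < shared; i++) {
      const diff = compareKeys(a[i], b[i]);
      if (diff !== 0) return diff; // the first differing element decides
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
  if (typeof a === "number" || typeof a === "boolean") return Number(a) - Number(b);
  if (typeof a === "string") return a < b ? -1 : a > b ? 1 : 0;
  return 0; // null vs null; objects are not ordered in this sketch
}

function inRange(key, startkey, endkey) {
  return compareKeys(key, startkey) >= 0 && compareKeys(key, endkey) <= 0;
}

const template = "7446567e45dc5155353736cb3d6041c0";
const startkey = [template, null, 5, 30000];
const endkey = [template, {}, {}, {}];

// Hypothetical document key: loan_period 3 and loan_amount 100.
const unwanted = [template, "car loan", 3, 100];
console.log(inRange(unwanted, startkey, endkey)); // true
```

The unwanted key is accepted because the comparison is settled at loan_name ("car loan" sorts after null and before {}), so loan_period and loan_amount are never examined.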

Your emit statement looks a little strange to me. The purpose of emit is to produce a key (i.e. an index) and then the document's values that you are interested in.
for example:
emit( doc.index, [doc.name, doc.address, ....] );
You are generating an array for the index and no data for the view.
Also, CouchDB doesn't provide for an intersection of views, as that doesn't fit the map/reduce paradigm very well. So your needs boil down to the following question:
Can I produce a unique index which I can then extract a particular range from? (using startkey & endkey)

Actually CouchDB allows views to have complex keys which are arrays of values as given in the question:
[template_id, loan_name, loan_period, loan_amount]
Have you tried
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0",nil,5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0",{}],:include_docs => true}
or perhaps
params = {:startkey =>["7446567e45dc5155353736cb3d6041c0","\u0000",5,30000],
:endkey=>["7446567e45dc5155353736cb3d6041c0","\u9999",{}],:include_docs => true}
