A non-loop, efficient way to erase from unordered_map with a predicate (C++11)?

Algorithms and member functions are suggested over hand-written loops for efficiency when working with containers. However, the associative containers (such as unordered_map) do not work with the erase(remove_if) idiom, so it appears that the common method is to fall back on a loop.
uom is a std::unordered_map:
for (auto it = uom.begin(); it != uom.end(); ) {
    if (it->second->toErase()) {
        delete it->second;  // omit the delete if the mapped type is std::unique_ptr
        uom.erase(it++);    // post-increment hands erase the old iterator while it moves on
    } else {
        ++it;
    }
}
// as per Scott Meyers, Effective STL, pg 45
Is this as efficient as possible? It seems like there should be a better way to do this, using something like the erase(remove_if) idiom but adapted to unordered_map (I understand that the associative containers cannot be "re-ordered", hence the lack of support for the remove_if algorithm). Is this really the best way to erase entries from an unordered_map using a predicate? Any suggestions?
Thank you in advance.

That is as efficient as possible. If you want something more convenient, you could use boost's erase_if template - see here. unordered_map maintains a linked list of nodes in each bucket, so erasing them is cheap. There's no need for the remove_if-style "compaction" that suits std::vector's use of contiguous memory.
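For reference, a minimal sketch of such a helper, in the spirit of the boost template mentioned above, relying only on the C++11 guarantee that erase returns the next valid iterator (C++20 later standardized a std::erase_if for the unordered containers along these lines):

#include <unordered_map>

// Erase every element satisfying pred; returns the number of erased elements.
template <typename Map, typename Predicate>
typename Map::size_type erase_if(Map& map, Predicate pred) {
    typename Map::size_type erased = 0;
    for (auto it = map.begin(); it != map.end(); ) {
        if (pred(*it)) {
            it = map.erase(it);  // erase returns the iterator following the removed element
            ++erased;
        } else {
            ++it;
        }
    }
    return erased;
}

Note that this hides the same bucket-by-bucket loop rather than eliminating it, so it is a convenience, not a speedup.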

Related

Best Practice: Use reference to objects in loop or plain array access?

I've got an array of 100 objects of type Data: Data data_array[100]. What would be the best practice for accessing these objects in a loop in C++98?
1.
for(int i=0;i<100;++i)
{
Data& data_obj = data_array[i];
// do a lot with it, call functions and so on
}
2.
for(int i=0;i<100;++i)
{
// do a lot with it, call functions and so on, but always use data_array[i]
}
Is there a performance decrease when using method 1 over 2? Or will the compiler optimizations eliminate any differences anyway?
What would be the preferred way to write code?
PS: I don't have a PC at hand to test out the performance myself.
If you have a disassembler, you can read the generated assembly and check that there is very little difference between the two ways.
In fact, data_array[i] ends up in a temporary either way; but with the first way, I think the code is more readable.

Efficient algorithm for grouping array of strings by prefixes

I wonder what is the best way to group an array of strings according to a list of prefixes (of arbitrary length).
For example, if we have this:
prefixes = ['GENERAL', 'COMMON', 'HY-PHE-NATED', 'UNDERSCORED_']
Then
tasks = ['COMMONA', 'COMMONB', 'GENERALA', 'HY-PHE-NATEDA', 'UNDERSCORED_A', 'HY-PHE-NATEDB']
Should be grouped this way:
[['GENERALA'], ['COMMONA', 'COMMONB'], ['HY-PHE-NATEDA', 'HY-PHE-NATEDB'], ['UNDERSCORED_A']]
The naïve approach is to loop through all the tasks and inner loop through prefixes (or vice versa, whatever) and test each task for each prefix.
Can anyone give me a hint on how to do this more efficiently?
It depends a bit on the size of your problem, of course, but your naive approach should be okay if you sort both your prefixes and your tasks and then build your sub-arrays by traversing both sorted lists only forwards.
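A minimal C++ sketch of that sorted, forward-only pass, assuming no prefix is itself a prefix of another (so each task matches at most one group); note that the groups come out in sorted-prefix order rather than the original list order:

#include <algorithm>
#include <string>
#include <vector>

std::vector<std::vector<std::string>> groupByPrefix(
        std::vector<std::string> prefixes,
        std::vector<std::string> tasks) {
    std::sort(prefixes.begin(), prefixes.end());
    std::sort(tasks.begin(), tasks.end());
    std::vector<std::vector<std::string>> groups(prefixes.size());
    std::size_t p = 0;
    for (const std::string& task : tasks) {
        // A matching prefix always compares <= its task, so a prefix that
        // compares less than the current task without matching it can never
        // match any later (larger) task and is skipped for good.
        while (p < prefixes.size() && prefixes[p] < task &&
               task.compare(0, prefixes[p].size(), prefixes[p]) != 0) {
            ++p;
        }
        if (p < prefixes.size() &&
            task.compare(0, prefixes[p].size(), prefixes[p]) == 0) {
            groups[p].push_back(task);  // task starts with prefixes[p]
        }
    }
    return groups;
}

This costs O(n log n + m log m) for the two sorts plus one linear merge-style pass, instead of the naive n*m comparisons.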
There are a few options, but you might be interested in looking into the trie data structure.
http://en.wikipedia.org/wiki/Trie
The trie data structure is easy to understand and implement and works well for this type of problem. If you find that it works for your situation, you can also look at Patricia tries, which achieve similar performance characteristics but typically have better memory utilization. They are a little more involved to implement, but not overly complex.
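As a concrete starting point, here is a minimal C++11 trie sketch, again assuming ASCII prefixes with no prefix nested inside another; it preserves the original prefix order, and the fixed child array per node is exactly what a Patricia trie would compress:

#include <array>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    int prefixIndex = -1;                             // index into the prefix list, or -1
    std::array<std::unique_ptr<TrieNode>, 128> next;  // one child slot per ASCII character
};

void insertPrefix(TrieNode& root, const std::string& prefix, int index) {
    TrieNode* node = &root;
    for (unsigned char c : prefix) {
        if (!node->next[c]) node->next[c].reset(new TrieNode);
        node = node->next[c].get();
    }
    node->prefixIndex = index;  // mark the node where this prefix ends
}

// Walk down the trie along the task; report the prefix whose end we pass, if any.
int matchPrefix(const TrieNode& root, const std::string& task) {
    const TrieNode* node = &root;
    for (unsigned char c : task) {
        if (node->prefixIndex != -1) return node->prefixIndex;
        if (!node->next[c]) return -1;
        node = node->next[c].get();
    }
    return node->prefixIndex;
}

std::vector<std::vector<std::string>> groupWithTrie(
        const std::vector<std::string>& prefixes,
        const std::vector<std::string>& tasks) {
    TrieNode root;
    for (std::size_t i = 0; i < prefixes.size(); ++i)
        insertPrefix(root, prefixes[i], static_cast<int>(i));
    std::vector<std::vector<std::string>> groups(prefixes.size());
    for (const std::string& task : tasks) {
        int i = matchPrefix(root, task);
        if (i >= 0) groups[i].push_back(task);  // each lookup is O(task length)
    }
    return groups;
}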

How to use async.map

I have two for loops, one nested in the other. I want to iterate over a single object and change a property in it to another value, something like this:
for(i=0;i<items.length;i++){
obj.changeThisAttribute = "abc";
for(j=0;j<items.anotherobj.length;j++){
items.anotherobj.changeThisAttribute = "dyz";
}
}
return items;
Is there any better way of doing this? I have read about Async.map and think that it would be a good solution; however, there is no good example of it. Please suggest a running example or an alternative way of achieving this.
You're not performing anything asynchronous here, so there is no point in async.map.
Unless this is very CPU intensive (it looks fine, but profile it; how many objects do you have?), your code is fine.
It's readable, straightforward and simple; no need to look for alternative ways.
(I'm assuming your inner loop goes through items[i].anotherobj and not items.anotherobj though)

String sorting - std::set or std::vector?

I'm going to have around 1000 strings that need to be sorted alphabetically.
std::set, from what I've read, is sorted; std::vector is not. std::set seems to be the simpler solution, but if I were to use a std::vector, all I would need to do is use std::sort to alphabetize the strings.
My application may or may not be performance critical, so performance isn't necessarily the issue here (yet), but since I'll need to iterate through the container to write the strings to a file, I've read that iterating through a std::set is a bit slower than iterating through a std::vector.
I know it probably won't matter, but I'd like to hear which one you all would go with in this situation.
Which STL container would best suit my needs? Thanks.
std::vector with a one-time call to std::sort seems like the simplest way to do what you are after, computationally speaking. std::set provides dynamic lookups by key, which you don't really need here, and things will get more complicated if you have to deal with duplicates.
Make sure you use reserve to pre-allocate the memory for the vector, since you know the size ahead of time in this case. This prevents memory reallocation as you add to the vector (very expensive).
Also, writing through [] instead of push_back() might save the capacity check that push_back() performs, though that requires resize() rather than reserve(); it's really academic and insubstantial, and perhaps not even measurable with compiler optimizations. :-)
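A minimal sketch of that approach, with hypothetical file names standing in for the real input and output:

#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> lines;
    lines.reserve(1000);  // size known up front: one allocation, no regrowth

    std::ifstream in("strings.txt");  // hypothetical input file
    std::string s;
    while (std::getline(in, s))
        lines.push_back(s);

    std::sort(lines.begin(), lines.end());  // one-time alphabetical sort

    std::ofstream out("sorted.txt");  // hypothetical output file
    for (const std::string& line : lines)
        out << line << '\n';
}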

Counter++ in Parallel.ForEach

I understand that using an iterator++ inside Parallel.ForEach is not a good option, but right now I'm forced to use a counter inside a Parallel.ForEach loop; the counter is used to pick up column names of a dynamic object at runtime. Any suggestion what the best option would be? I read somewhere on StackOverflow that using "Interlocked" inside Parallel.ForEach is again a bad design.
If you really need parallel processing, the indices will have to be pre-computed. Something like Enumerable.Range(0, cols.Length).ToArray(). Otherwise, each column will depend on the previous one, which obviously doesn't parallelize.
