Counter++ in Parallel.ForEach - c#-4.0

I understand that using an iterator++ inside Parallel.ForEach is not a good option, but right now I'm forced to use a counter inside a Parallel.ForEach loop. The counter is used to pick up the column names of a dynamic object at runtime. Any suggestions on what the best option would be? I read somewhere on Stack Overflow that using "Interlocked" inside Parallel.ForEach is also bad design.

If you really need parallel processing, the indices will have to be pre-computed. Something like Enumerable.Range(0, cols.Length).ToArray(). Otherwise, each column will depend on the previous one, which obviously doesn't parallelize.
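A minimal sketch of the pre-computed index idea, written in Python rather than C# purely for illustration. The column names and the process function are hypothetical; enumerate stands in for Enumerable.Range by pairing every item with its index before any parallel work starts, so no shared counter is ever incremented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column names; in the question these come from a dynamic object.
columns = ["id", "name", "price"]


def process(indexed_column):
    # Each task receives its own (index, column) pair, so no shared counter is needed.
    index, column = indexed_column
    return f"{index}:{column}"


# Indices are fixed up front, before the parallel work begins.
indexed = list(enumerate(columns))

with ThreadPoolExecutor() as pool:
    # Executor.map returns results in input order, regardless of completion order.
    results = list(pool.map(process, indexed))
```

Because each work item carries its own index, the tasks are independent of one another and can run in any order.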

Related

Best Practice: Use reference to objects in loop or plain array access?

I've got an array of Data objects, let's say 100 of them: Data data_array[100]. What would be the best practice for accessing these objects in a loop in C++98?
1.
for (int i = 0; i < 100; ++i)
{
    Data& data_obj = data_array[i];
    // do a lot with it, call functions and so on
}
2.
for (int i = 0; i < 100; ++i)
{
    // do a lot with it, call functions and so on, but always use data_array[i]
}
Is there a performance decrease when using method 1 over 2? Or will the compiler optimizations eliminate any differences anyway?
What would be the preferred way to write code?
PS: I don't have a PC at hand to test out the performance myself.
If you have a disassembler, you can read the generated assembly and check that there is very little difference between the two ways.
In fact, data_array[i] is typically turned into a temporary anyway, but I think the first way makes the code more readable.

How to implement efficient string interning in f#?

What is the best way to implement a custom string type in F# for interning strings? I have to read large CSV files into memory. Given that most of the columns are categorical, values repeat, so it makes sense to create a new string the first time it is encountered and only refer to it on subsequent occurrences to save memory.
In C# I do this by creating a global intern pool (a concurrent dictionary) and, before setting a value, looking it up in the dictionary. If it exists, I just point to the string already in the dictionary. If not, I add it to the dictionary and set the value to the string just added.
I'm new to F# and wondering what the best way to do this in F# is. I will be using the new string type in records, named tuples, etc., and it will have to work with concurrent processes.
Edit:
String.Intern uses the intern pool. My understanding is that it is not very efficient for large pools and is not garbage collected, i.e. any/all interned strings will remain in the intern pool for the lifetime of the app. Imagine an application where you read a file, perform some operations and write data. Using the intern pool will probably work. Now imagine you have to do the same 100 times and the strings in each file have little in common. If the memory is allocated on the heap, then after processing each file we can force the garbage collector to clear unnecessary strings.
I should have mentioned that I could not really figure out how to do the C# approach in F# (other than implementing a C# type and using it in F#).
The memoization pattern is slightly different from what I am looking for. We are not caching calculated results; we are ensuring each string object is created no more than once and all subsequent creations of the same string are just references to the original. Using a dictionary is one way to do this and using String.Intern is another.
Sorry if I am missing something obvious here.
I have a few things to say, so I'll post them as an answer.
First, I guess String.Intern works just as well in F# as in C#.
open System
open System.Text
let x = "abc"
let y = StringBuilder("a").Append("bc").ToString()
printfn "1 : %A" (LanguagePrimitives.PhysicalEquality x y) // false
let y2 = String.Intern y
printfn "2 : %A" (LanguagePrimitives.PhysicalEquality x y2) // true
Second, are you using a dictionary in combination with String.Intern in your C# solution? If so, why not just do s = String.Intern(s); after the string is ready following input from file?
To create a type for use in your business domain to handle string deduplication in general is a very bad idea. You don't want your business domain polluted by that kind of low level stuff.
As for rolling your own. I did that some years ago, probably to avoid that problem you mentioned with the strings not being garbage collected, but I never tested if that actually was a problem.
It might be a good idea to use a dictionary (or something) for each column (or type of column) where the same values are likely to repeat in great numbers. (This is pretty much what you said already.)
It makes sense to only keep these dictionaries live while you read the information from file, and stuff it into internal data structures. You might be thinking that you need the dictionaries for subsequent reads, but I am not so sure about that.
The important thing is to deduplicate the great majority of strings, and not necessarily every single duplicate. Because of this you can greatly simplify the solution as indicated. You most probably have nothing to gain by overcomplicating your solution to squeeze out the last fraction of memory savings.
Releasing the dictionaries after the file is read and structures filled, will have the advantage of not holding on to strings when they are no longer really needed. And of course you save memory by not holding onto the dictionaries.
I see no need to handle concurrency issues in the implementation here. String.Intern must necessarily be immune to concurrency issues. If you roll your own with the design suggested, you would not use it concurrently. Each file being read would have its own set of dictionaries for its columns.
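The per-column dictionary idea can be sketched roughly as follows, in Python for illustration only. The helper name make_deduplicator and the sample value are hypothetical, and this is one possible shape under the assumptions in the answer above (one pool per column, used by a single reader, dropped when the file has been read).

```python
def make_deduplicator():
    # One pool per column; a hypothetical helper, not part of any library.
    pool = {}

    def dedupe(value):
        # setdefault returns the stored string if an equal one already exists,
        # otherwise it stores this one and returns it.
        return pool.setdefault(value, value)

    return dedupe


dedupe_city = make_deduplicator()

# Build two equal strings at runtime, the way a CSV reader would produce them.
a = "".join(["Lon", "don"])
b = "".join(["Lon", "don"])

a_shared = dedupe_city(a)
b_shared = dedupe_city(b)
same_object = a_shared is b_shared  # both names now refer to one string object

# Dropping the deduplicator once the file is read releases the pool itself.
del dedupe_city
```

Once the pool is gone, the deduplicated strings stay alive only as long as the data structures that reference them.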

How to use async.map

I am having two for loops. One nested in another. I want to iterate on a single Object and change a property in it with another value, something like this:
for(i=0;i<items.length;i++){
obj.changeThisAttribute = "abc";
for(j=0;j<items.anotherobj.length;j++){
items.anotherobj.changeThisAttribute = "dyz";
}
}
return items;
Is there any better way of doing this? I have read about Async.map and think that it will be a good solution however there is no good example of the same. Please suggest a running example or any alternative way of achieving this.
You're not performing anything asynchronous here so there is no point in async.map.
Unless this is very CPU intensive (profile it; how many objects do you have?), your code looks fine.
It's readable, straightforward and simple, no need to look for alternative ways.
(I'm assuming your inner loop goes through items[i].anotherobj and not items.anotherobj though)
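For comparison, here is the same plain synchronous double loop as a Python sketch, with the inner loop walking each item's own nested list as assumed above. The data shape (a list of dicts, each with an anotherobj list) is a guess at what the question intends, not something it states.

```python
# Hypothetical data shaped like the question's items: each item owns a nested list.
items = [
    {"changeThisAttribute": "", "anotherobj": [{"changeThisAttribute": ""},
                                               {"changeThisAttribute": ""}]},
    {"changeThisAttribute": "", "anotherobj": [{"changeThisAttribute": ""}]},
]


def update(items):
    for item in items:
        item["changeThisAttribute"] = "abc"
        # The inner loop walks this item's own nested objects.
        for inner in item["anotherobj"]:
            inner["changeThisAttribute"] = "dyz"
    return items


updated = update(items)
```

Nothing here waits on I/O, so an asynchronous helper would add ceremony without changing the result.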

Why do some programming languages restrict you from editing the array you're looping through?

Pseudo-code:
for each x in someArray {
// possibly add an element to someArray
}
I forget the name of the exception this throws in some languages.
I'm curious to know why some languages prohibit this use case, whereas other languages allow it. Are the allowing languages unsafe -- open to some pitfall? Or are the prohibiting languages simply being overly cautious, or perhaps lazy (they could have implemented the language to gracefully handle this case, but simply didn't bother).
Thanks!
What would you want the behavior to be?
list = [1,2,3,4]
foreach x in list:
print x
if x == 2: list.remove(1)
Possible behaviors:
list is some linked-list type, where deletions don't affect your current iterator:
[1,2,3,4]
list is some array, where your iterator iterates via pointer increment:
[1,2,4]
Same as before, only the system tries to cache the iteration count:
[1,2,4,<segfault>]
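As a concrete data point, CPython's built-in list shows the second behavior above, because its iterator simply tracks an index into the list. The sketch below runs the same scenario; the variable names are illustrative.

```python
numbers = [1, 2, 3, 4]
seen = []

for x in numbers:
    seen.append(x)
    if x == 2:
        # Removes the value 1 (the first element), shifting the rest left by one.
        numbers.remove(1)

# The iterator's index moved on while the elements shifted under it,
# so the value 3 was never visited.
skipped = [n for n in (1, 2, 3, 4) if n not in seen]
```

No exception is raised here; the element is skipped silently, which is exactly the kind of surprise stricter languages try to prevent.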
The problem is that different collections implementing this enumerable/sequence interface that allows for foreach-looping have different behaviors.
Depending on the language (or platform, such as .NET), iteration may be implemented differently.
Typically a foreach creates an Iterator or Enumerator object on the array, which internally keeps its state about the iteration details. If you modify the array (by adding or deleting an element), the iterator state would be inconsistent in regard to the new state of the array.
Platforms such as .Net allow you to define your own enumerators which may not be susceptible to adding/removing elements of the underlying array.
A generic solution to the problem of adding/removing elements while iterating is to collect the elements in a new list/collection/array, and add/remove the collected elements after the enumeration has completed.
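A minimal Python sketch of that collect-then-apply approach, assuming the goal is to drop empty entries from a list. The inventory data is made up for illustration.

```python
inventory = ["apple", "", "pear", "", "plum"]

# Pass 1: only read the collection and remember what should go.
to_remove = []
for position, name in enumerate(inventory):
    if name == "":
        to_remove.append(position)

# Pass 2: the enumeration is over, so mutating is safe.
# Delete from the highest index down so earlier positions stay valid.
for position in reversed(to_remove):
    del inventory[position]
```

The same two-pass shape works for additions: collect the new elements in a separate list and extend the original after the loop.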
Suppose your array has 10 elements. You get to the 7th element, and decide there that you need to add a new element earlier in the array. Uh-oh! That element doesn't get iterated on! for each has the semantics, to me at least, of operating on each and every element of the array, once and only once.
Your pseudo example code would lead to an infinite loop. For each element you look at, you add one to the collection, hence if you have at least 1 element to start with, you will have i (iterative counter) + 1 elements.
Arrays typically have a fixed number of elements. You get flexible sizes through wrapper objects (such as List) that provide that flexibility. I suspect that a language may have issues if the mechanism it used created a whole new array to allow for the edit.
Many compiled languages implement "for" loops with the assumption that the number of iterations will be calculated once at loop startup (or better yet, compile time). This means that if you change the value of the "to" variable inside the "for i = 1 to x" loop, it won't change the number of iterations. Doing this allows a legion of loop optimizations, which are very important in speeding up number-crunching applications.
If you don't like that semantics, the idea is that you should use the language's "while" construct instead.
Note that in this view of the world, C and C++ don't have proper "for" loops, just fancy "while" loops.
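Python's for over range happens to follow the "computed once" semantics, while its while loop re-evaluates the condition on every pass, so the contrast can be sketched there. The variable names are illustrative.

```python
# for: the range object is built once, from the value limit has at loop entry.
limit = 3
for_iterations = 0
for i in range(limit):
    limit = 10            # rebinding limit does not affect the running loop
    for_iterations += 1

# while: the condition is re-checked each pass, so a changed bound takes effect.
bound = 3
while_iterations = 0
j = 0
while j < bound:
    if j == 0:
        bound = 5         # the loop now runs until j reaches 5
    while_iterations += 1
    j += 1
```

Here the for loop still runs three times, whereas the while loop runs five times after its bound is raised.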
Implementing lists and enumerators that handle this would mean a lot of overhead. This overhead would always be there, and it would only be useful in a vast minority of cases.
Also, whichever implementation was chosen would not always make sense. Take, for example, the simple case of inserting an item into the list while enumerating it: would the new item always be included in the enumeration, always excluded, or should that depend on where in the list the item was added? If I insert the item at the current position, would that change the value of the Current property of the enumerator, and should it skip the previously current item, which is then the next item?
This only happens within foreach blocks. Use a for loop with an index value and you'll be allowed to. Just make sure to iterate backwards so that you can delete items without causing issues.
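A short Python sketch of the backwards index loop described above, assuming the task is to delete every occurrence of one value. The data is made up.

```python
values = [1, 2, 2, 3, 2]

# Walk from the last index to the first. Deleting at index i only shifts
# elements at positions greater than i, which have already been visited.
for index in range(len(values) - 1, -1, -1):
    if values[index] == 2:
        del values[index]
```

Iterating forwards with the same deletions would skip the element that slides into the slot just vacated.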
Off the top of my head, there are two ways to implement iteration over a collection:
the iterator iterates over the collection for which it was created
the iterator iterates over a copy of the collection for which it was created
When changes are made to the collection on the fly, the first option must either update its iteration sequence (which could be very hard or even impossible to do reliably) or just deny the possibility (throw an exception). The latter is obviously the safe option.
With the second option, changes can be made to the original collection without disturbing the iteration sequence. But any changes will not be seen by the iteration, which might be confusing for users (a leaky abstraction).
I could imagine languages/libraries implementing any of these possibilities with equal merit.
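Both options can be observed in Python, which makes for a compact illustration. The dict iterator works on the live collection and refuses a size change by raising RuntimeError, while iterating over list(tasks) is the copy approach. The sample data is hypothetical.

```python
# Option 1 (iterate the live collection, deny changes): dict raises on a size change.
scores = {"a": 1, "b": 2}
error_message = None
try:
    for key in scores:
        scores[key + "_copy"] = 0   # adding a key while iterating
except RuntimeError as exc:
    error_message = str(exc)

# Option 2 (iterate a copy): changes to the original are allowed but not seen.
tasks = ["x", "y"]
visited = []
for task in list(tasks):            # list(tasks) is a snapshot taken up front
    visited.append(task)
    tasks.append(task + "!")        # mutates the original only
```

The second loop finishes after two iterations even though tasks has grown to four elements, which is the "changes are not seen" trade-off mentioned above.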

When to use If-else if-else over switch statements and vice versa [duplicate]

This question already has answers here:
Advantage of switch over if-else statement
Eliminating `switch` statements [closed]
Is there any significant difference between using if/else and switch-case in C#?
Why would you want to use a switch block over a series of if statements?
switch statements seem to do the same thing but take longer to type.
As with most things, you should pick which to use based on the context and on what is conceptually the correct way to go. A switch is really saying "pick one of these based on this variable's value", but an if statement is just a series of boolean checks.
As an example, if you were doing:
int value = // some value
if (value == 1) {
doThis();
} else if (value == 2) {
doThat();
} else {
doTheOther();
}
This would be much better represented as a switch, as it then makes it immediately obvious that the choice of action is based on the value of "value" and not on some arbitrary test.
Also, if you find yourself writing switches and if-elses and using an OO language you should be considering getting rid of them and using polymorphism to achieve the same result if possible.
Finally, regarding switch taking longer to type, I can't remember who said it but I did once read someone ask "is your typing speed really the thing that affects how quickly you code?" (paraphrased)
If you are switching on the value of a single variable then I'd use a switch every time, it's what the construct was made for.
Otherwise, stick with multiple if-else statements.
Concerning readability:
I typically prefer if/else constructs over switch statements, especially in languages that allow fall-through cases. What I've often found is that as projects age and multiple developers get involved, you start having trouble with the construction of a switch statement.
If the statements become anything more than simple, many programmers become lazy: instead of reading the entire statement to understand it, they simply pop in a case to cover whatever they're adding.
I've seen many cases where code is repeated in a switch statement: the person's test was already covered and a simple fall-through case would have sufficed, but laziness led them to add redundant code at the end instead of trying to understand the switch. I've also seen some nightmarish, poorly constructed switch statements with many cases, where simply trying to follow the logic becomes difficult because fall-through cases are dispersed among cases that don't fall through ... which in turn leads to the redundancy problem I described first.
Theoretically, the same problem could exist with if/else constructs, but in practice this just doesn't seem to happen as often. Maybe (just a guess) programmers are forced to read a bit more carefully because they need to understand the often more complex conditions being tested within the if/else construct? If you're writing something simple that you know others are unlikely ever to touch, and you can construct it well, then I guess it's a toss-up. In that case, whatever is more readable and feels best to you is probably the right answer, because you're likely to be the one maintaining that code.
Concerning speed:
Switch statements often perform faster than if-else constructs (but not always). Since the possible values of a switch statement are laid out beforehand, compilers are able to optimize performance by constructing jump tables. Each condition doesn't have to be tested in turn as in an if/else construct (well, until you find the right one, anyway).
However, this isn't always the case. If you have a simple switch with, say, possible values of 1 to 10, a jump table works well. As you add more values, the jump table has to grow and the switch becomes less efficient (not less efficient than an if/else, but less efficient than the comparatively simple switch statement). Also, if the values are widely spread out (i.e. instead of 1 to 10 you have 10 possible values of, say, 1, 1000, 10000, 100000, and so on up to 100000000000), the switch is less efficient than in the simpler case.
Hope this helps.
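As a loose analogy only, the difference between testing each condition in turn and selecting through a table can be sketched in Python. A dict lookup is a hash lookup, not a compiler-generated jump table, and the function names below simply mirror the earlier doThis/doThat example, so treat this as an illustration of the idea rather than of what any compiler emits.

```python
def do_this():
    return "this"


def do_that():
    return "that"


def do_the_other():
    return "the other"


def choose_with_if_chain(value):
    # Tests each condition in turn until one matches.
    if value == 1:
        return do_this()
    elif value == 2:
        return do_that()
    else:
        return do_the_other()


# All candidate values are laid out up front, as they are in a switch.
ACTIONS = {1: do_this, 2: do_that}


def choose_with_table(value):
    # One lookup selects the action; unknown values fall back to the default.
    return ACTIONS.get(value, do_the_other)()


results = [choose_with_table(v) for v in (1, 2, 99)]
```

Both functions return the same answers; only the way the matching branch is found differs.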
Switch statements are far easier to read and maintain, hands down, and are usually faster and less error-prone.
Use a switch whenever you have more than two conditions on a single variable. Take weekdays for example: if you have a different action for every weekday, you should use a switch.
In other situations (multiple variables or complex if clauses) you should use ifs, but there isn't a hard rule on where to use each.
I personally prefer to see switch statements over too many nested if-elses because they can be much easier to read. Switches are also better in readability terms for showing a state.
See also the comment in this post regarding pacman ifs.
This depends very much on the specific case. Preferably, I think one should use the switch over the if-else if there are many nested if-elses.
The question is how much is many?
Yesterday I was asking myself the same question:
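public enum ProgramType {
    NEW, OLD
}

// 1st if: tests both values explicitly
if (progType == ProgramType.OLD) {
    // ...
} else if (progType == ProgramType.NEW) {
    // ...
}

// 2nd if: NEW is handled implicitly by the else
if (progType == ProgramType.OLD) {
    // ...
} else {
    // ...
}

// switch: each value gets its own labelled case
switch (progType) {
    case OLD:
        // ...
        break;
    case NEW:
        // ...
        break;
    default:
        break;
}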
In this case, the 1st if has an unnecessary second test. The 2nd feels a little bad because it hides the NEW.
I ended up choosing the switch because it just reads better.
I have often thought that using elseif and dropping through case instances (where the language permits) are code odours, if not smells.
For myself, I have normally found that nested (if/then/else)s usually reflect things better than elseifs, and that for mutually exclusive cases (often where one combination of attributes takes precedence over another), case or something similar is clearer to read two years later.
I think the select statement used by Rexx is a particularly good example of how to do "Case" well (no drop-throughs) (silly example):
Select
When (Vehicle ¬= "Car") Then
Name = "Red Bus"
When (Colour == "Red") Then
Name = "Ferrari"
Otherwise
Name = "Plain old other car"
End
Oh, and if the optimisation isn't up to it, get a new compiler or language...
The tendency to avoid stuff because it takes longer to type is a bad thing, try to root it out. That said, overly verbose things are also difficult to read, so small and simple is important, but it's readability not writability that's important. Concise one-liners can often be more difficult to read than a simple well laid out 3 or 4 lines.
Use whichever construct best describes the logic of the operation.
Let's say you have decided to use switch as you are only working on a single variable which can have different values. If this would result in a small switch statement (2-3 cases), I'd say that is fine. If it seems you will end up with more I would recommend using polymorphism instead. An AbstractFactory pattern could be used here to create an object that would perform whatever action you were trying to do in the switches. The ugly switch statement will be abstracted away and you end up with cleaner code.
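A rough sketch of replacing a switch with polymorphism, in Python for illustration. It uses a simple factory function rather than a full Abstract Factory, and every class and function name here is hypothetical. The one remaining value-to-type mapping lives in the factory, and callers just invoke run.

```python
from abc import ABC, abstractmethod


class Handler(ABC):
    """Common interface; each former switch case becomes a subclass."""

    @abstractmethod
    def run(self) -> str:
        ...


class OldProgram(Handler):
    def run(self) -> str:
        return "handling old"


class NewProgram(Handler):
    def run(self) -> str:
        return "handling new"


def create_handler(kind: str) -> Handler:
    # The only place that maps a value to a concrete type.
    registry = {"OLD": OldProgram, "NEW": NewProgram}
    return registry[kind]()


# Callers no longer branch on the value; they call the shared method.
outcome = create_handler("NEW").run()
```

Adding a new case then means adding a subclass and one registry entry, rather than editing every switch that inspects the value.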
