Thread-safe mutable collection with fast element removal and random get - multithreading

I need a thread-safe data structure with three operations: remove, getRandom and reset.
I have only two ideas so far.
First: a Seq in a synchronized var.
import scala.util.Random
val all: Array[String] = ... //all possible.
@volatile var current: Array[String] = Array.empty[String] // @volatile so getRandom sees the latest array
def getRandom(): String = {
  val currentAvailable = current // read the reference once, so length and index use the same array
  currentAvailable(Random.nextInt(currentAvailable.length))
}
def remove(s: String): Unit = {
  this.synchronized {
    current = current diff Seq(s)
  }
}
def reset(): Unit = {
  this.synchronized {
    current = all
  }
}
Second:
Maintain a Map[String,Boolean], where the Boolean is true when the element is currently present. The main problem is making getRandom fast (not something like O(n) in the worst case).
Are there better ways to implement this?

Scala's TrieMap (scala.collection.concurrent.TrieMap) is a lock-free data structure that supports snapshots (similar to your currentAvailable) and fast removals.

Since I'm not a Scala expert, this answer is general; I used Java for the examples.
In short, the answer is yes.
Use two maps to store the data:
Map<Integer,String> map=new HashMap<Integer,String>(); //is used to get random in constant time
Map<String,Integer> map1=new HashMap<String,Integer>(); //is used to remove in constant time
The main idea is to keep the keys of map (the integers) contiguous, i.e. always exactly {0, ..., size of map - 1}.
For example, to fill this structure you need something like this:
int counter = 0; // global: number of stored elements, also the next free key
for (String s : all) { // all: every possible string
    map.put(counter++, s);
}
// then, if you want removal to be constant time, fill the second (reverse) map
for (Map.Entry<Integer, String> e : map.entrySet()) {
    map1.put(e.getValue(), e.getKey());
}
The code above is the initialization; you need to run it every time you reset the structure.
Then you can get a random value with O(1) complexity:
String getRandom() {
    int i = ThreadLocalRandom.current().nextInt(counter); // random key in [0, counter); needs java.util.concurrent.ThreadLocalRandom
    return map.get(i);
}
Now, to remove an element in constant time O(1), use map1 to find its key and move the last element into its slot:
void remove(String s) {
    Integer thisCounter = map1.remove(s); // key of s, removed from map1
    if (thisCounter == null)
        return; // s doesn't exist
    int last = counter - 1;          // key of the last element
    String val = map.remove(last);   // remove the last element from map
    if (thisCounter != last) {       // move the last element into the hole left by s
        map.put(thisCounter, val);
        map1.put(val, thisCounter);
    }
    counter--; // one element fewer
}
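Obviously, the main remaining issue is thread safety: getRandom, remove and reset all read and modify counter and both maps, so they must not run concurrently. The simplest way to ensure that is to make all three methods synchronized on the same object.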
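A minimal Java sketch of the same idea, under a few assumptions: an ArrayList stands in for Map<Integer,String> (its indices already are the contiguous keys 0..size-1), a HashMap named indexOf plays the role of map1, and every method is synchronized on the same object. The class name RandomPool is hypothetical, and returning null from getRandom on an empty pool is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: O(1) remove and O(1) getRandom, all guarded by one lock (this).
class RandomPool {
    private final List<String> all;                        // every possible value
    private final List<String> items = new ArrayList<>();  // current values at dense indices 0..size-1
    private final Map<String, Integer> indexOf = new HashMap<>(); // value -> index in items

    RandomPool(Collection<String> all) {
        this.all = new ArrayList<>(all);
        reset();
    }

    // Restores every value; O(n), like the initialization in the answer.
    synchronized void reset() {
        items.clear();
        indexOf.clear();
        for (String s : all) {
            if (!indexOf.containsKey(s)) {   // ignore duplicates
                indexOf.put(s, items.size());
                items.add(s);
            }
        }
    }

    // Returns a uniformly chosen current value, or null when the pool is empty.
    synchronized String getRandom() {
        if (items.isEmpty()) return null;
        return items.get(ThreadLocalRandom.current().nextInt(items.size()));
    }

    // Moves the last element into the slot of the removed one, then drops the tail.
    synchronized boolean remove(String s) {
        Integer idx = indexOf.remove(s);
        if (idx == null) return false;                 // s is not present
        String last = items.remove(items.size() - 1);  // O(1): removes the tail
        if (idx < items.size()) {                      // s was not itself the last element
            items.set(idx, last);
            indexOf.put(last, idx);
        }
        return true;
    }

    synchronized int size() {
        return items.size();
    }
}
```

Because all methods share one lock, concurrent getRandom calls also serialize; if reads dominate, a volatile snapshot like the one in the question may scale better.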


C++/CLI Parallel::For with thread-local variable - Error: too many arguments

I am trying to implement my first Parallel::For loop with a thread-local variable to sum the results of the loop. My code is based on an example in "Visual C++ 2010" by W. Saumweber and D. Louis (German), ch. 33, p. 804.
I get stuck in the implementation with syntax errors in the Parallel::For call. The errors are as follows, from left to right: a) expected a type specifier, b) too many arguments for generic class "System::Func", c) pointer to member is not valid for a managed class, d) no operator "&" matches these operands.
In line with the book, I create a collection of data, List<DataStructure^> numbers, which is subject to a calculation performed in the method computeSumScore, called by the Parallel::For routine in the method sumScore. All results are summed in the method finalizeSumScore using a lock.
Below is the full code of the .cpp part of the class, to show what I have. The data collection "numbers" may look a bit messy, but that's due to the organic growth of the program and my learning as I go along.
// constructor
DataCollection::DataCollection(Form1^ f1) // takes parameter of type Form1 to give access to variables on Form1
{
this->f1 = f1;
}
// initialize data set for parallel processing
void DataCollection::initNumbers(int cIdx)
{
DataStructure^ number;
numbers = gcnew List<DataStructure^>();
for (int i = 0; i < f1->myGenome->nGenes; i++)
{
number = gcnew DataStructure();
number->concentrationTF = f1->myOrgan->cellPtr[cIdx]->concTFA[i];
number->stringA->AddRange(f1->myGenome->cStruct[i]->gString->GetRange(0, f1->myGenome->cChars));
number->stringB->AddRange(f1->myGenome->cStruct[i]->pString);
if (f1->myGenome->cStruct[i]->inhibitFunc)
number->sign = -1;
else
number->sign = 1;
numbers->Add(number);
}
}
// parallel-for summation of scores
double DataCollection::sumScore()
{
Parallel::For<double>(0, numbers->Count, gcnew Func<double>(this, &GenomeV2::DataCollection::initSumScore),
gcnew Func<int, ParallelLoopState^, double, double>(this, &GenomeV2::DataCollection::computeSumScore),
gcnew Action<double>(this, &GenomeV2::DataCollection::finalizeSumScore));
return summation;
}
// returns start value
double DataCollection::initSumScore()
{
return 0.0;
}
// perform sequence alignment calculation
double DataCollection::computeSumScore(int k, ParallelLoopState^ status, double tempVal)
{
int nwScore;
double score = 0.0;
if (numbers[k]->concentrationTF > 0)
{
nwScore = NeedlemanWunsch::computeGlobalSequenceAlignment(numbers[k]->stringA, numbers[k]->stringB);
score = Mapping::getLinIntMapValue(nwScore); // mapped value (0-1)
score = (double) numbers[k]->sign * score * numbers[k]->concentrationTF;
}
return tempVal + score; // tempVal is this thread's running total: add to it instead of overwriting it
}
// locked addition
void DataCollection::finalizeSumScore(double tempVal)
{
// lock on an object shared by all threads (here this instance);
// a new Object created on every call would never block anyone
Monitor::Enter(this);
try
{
summation += tempVal;
}
finally
{
Monitor::Exit(this);
}
}
Once this problem is solved, I need to ensure that the functions called (computeGlobalSequenceAlignment and getLinIntMapValue) are thread-safe and that the program doesn't stall with multiple threads accessing the same (static) variables. But this needs to work first.
Hope you can help me out.
Hans Passant answered my question in the comments: include the full method name and add the missing comma. Since I cannot mark a comment as the answer, I am posting this answer to close the question.
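For comparison, a hedged Java sketch of the same localInit / body / localFinally pattern using plain threads (Java has no Parallel::For, so the class PartialSums and its parameters are hypothetical). Each thread accumulates into its own local variable, and only the final per-thread total is added under a lock. The essential point is that every thread locks the same object; a lock object created inside the finalizer on each call provides no mutual exclusion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch of thread-local partial sums combined under one shared lock.
class PartialSums {
    private final Object sumLock = new Object(); // ONE lock object shared by every thread
    private double summation = 0.0;

    double sumScore(int count, int threads, IntToDoubleFunction computeScore) {
        summation = 0.0;
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int first = t;
            Thread w = new Thread(() -> {
                double local = 0.0;                          // localInit: per-thread start value
                for (int k = first; k < count; k += threads) {
                    local += computeScore.applyAsDouble(k);  // body: no locking needed
                }
                synchronized (sumLock) {                     // localFinally: one locked add per thread
                    summation += local;
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join makes each worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return summation;
    }
}
```

Only one lock acquisition per thread happens, so contention stays low no matter how many items are scored.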

Count of nodes in BST

I am trying to count the number of nodes in a Binary Search Tree and was wondering what the most efficient means is. These are the options that I have found:
1. store an int count in the BST class
2. store an int children in each node of the tree, holding the number of nodes under it
3. write a method that counts the number of nodes in the BST
If using option 3, I've written:
int InOrder() {
    Node *cur = root;
    int count = 0;
    std::stack<Node*> s; // needs #include <stack>; the original Stack *s = null was never allocated
    bool done = false;
    while (!done) {
        if (cur != NULL) {
            s.push(cur);
            cur = cur->left;
        }
        else {
            if (!s.empty()) {
                cur = s.top();
                s.pop();
                count++;
                cur = cur->right;
            }
            else {
                done = true;
            }
        }
    }
    return count;
}
But looking at it, it seems like it would get stuck in an infinite loop between cur = cur->left; and cur = cur->right;
So which option is the most efficient, and if it is option 3, will this method work?
I think the first option is the quickest, and it only requires O(1) space. However, whenever you insert or delete an item, you need to update this value.
It will take O(1) time to get the number of nodes.
The second option would make the program much more complicated, since deleting or inserting a node would have to update all of its ancestors. Either you add a parent pointer so you can update each ancestor, or you need to go through all the nodes in the tree and recompute the numbers. Either way, I think this is the worst of the three options.
The third option is fine if you don't call it often: it takes O(n) time because you need to visit every node, whereas the first option is O(1).
In terms of your code, I think it's easier to write it recursively, like this:
int getCount(Node* n)
{
if (!n)
return 0;
return 1 + getCount(n->left) + getCount(n->right);
}
Hope this helps!
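As for the question's iterative loop: it does not loop forever, because every node is pushed once and popped once, and after a pop the walk only continues into that node's right subtree. A Java translation (hypothetical class TreeCount), with the stack actually allocated, next to the recursive version for comparison:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: the question's in-order loop and the recursive count side by side.
class TreeCount {
    static final class Node {
        Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Walk left pushing nodes; when stuck, pop one, count it, and step right.
    static int countIterative(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        Node cur = root;
        int count = 0;
        while (cur != null || !stack.isEmpty()) {
            if (cur != null) {
                stack.push(cur);
                cur = cur.left;
            } else {
                cur = stack.pop();
                count++;
                cur = cur.right;
            }
        }
        return count;
    }

    // The recursive version from the answer.
    static int countRecursive(Node n) {
        if (n == null) return 0;
        return 1 + countRecursive(n.left) + countRecursive(n.right);
    }
}
```

Both are O(n) time; the iterative form avoids deep recursion on a degenerate (list-shaped) tree.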

Performance difference in toString.map and toString.toArray.map

While solving Project Euler problems, I ran across something I find bizarre:
The method toString.map is slower than toString.toArray.map.
Here's an example:
def main(args: Array[String])
{
def toDigit(num : Int) = num.toString.map(_ - 48) //2137 ms
def toDigitFast(num : Int) = num.toString.toArray.map(_ - 48) //592 ms
val startTime = System.currentTimeMillis;
(1 to 1200000).map(toDigit)
println(System.currentTimeMillis - startTime)
}
Shouldn't map on a String fall back to a map over the array? Why is there such a noticeable difference? (Note that increasing the number even causes a stack overflow in the non-array case.)
Original
Could be because toString.map uses the WrappedString implicit, while toString.toArray.map uses the WrappedArray implicit to resolve map.
Let's see map, as defined in TraversableLike:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
b.sizeHint(this)
for (x <- this) b += f(x)
b.result
}
WrappedString uses a StringBuilder as builder:
def +=(x: Char): this.type = { append(x); this }
def append(x: Any): StringBuilder = {
underlying append String.valueOf(x)
this
}
The String.valueOf call for Any uses Java Object.toString on the Char instances, possibly getting boxed first. These extra ops might be the cause of speed difference, versus the supposedly shorter code paths of the Array builder.
This is a guess, though; it would have to be measured.
Edit
After revising, the general point still stands, but I referred to the wrong implicits, since the toDigit methods return an Int sequence (or similar), not a translated string as I misread.
toDigit uses LowPriorityImplicits.fallbackStringCanBuildFrom[T]: CanBuildFrom[String, T, immutable.IndexedSeq[T]], with T = Int, which just defers to a general IndexedSeq builder.
toDigitFast uses a direct Array implicit of type CanBuildFrom[Array[_], T, Array[T]], which is unarguably faster.
Passing the following CBF for toDigit explicitly makes the two methods on par:
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.ArrayBuilder
object FastStringToArrayBuild {
  def canBuildFrom[T : ClassManifest] = new CanBuildFrom[String, T, Array[T]] {
    private def newBuilder = ArrayBuilder.make[T]() // element type given explicitly
    def apply(from: String) = newBuilder
    def apply() = newBuilder
  }
}
You're being fooled by running out of memory. The toDigit version does create more intermediate objects, but if you have plenty of memory then the GC won't be heavily impacted (and it'll all run faster). For example, if instead of creating 1.2 million numbers, I create 12k 100x in a row, I get approximately equal times for the two methods. If I create 1.2k 5-digit numbers 1000x in a row, I find that toDigit is about 5% faster.
Given that the toDigit method produces an immutable collection, which is better when all else is equal since it is easier to reason about, and given that all else is equal for all but highly demanding tasks, I think the library is as it should be.
When trying to improve performance, of course, one needs to keep all sorts of tricks in mind; one of these is that arrays have better memory characteristics for collections of known length than the fancier collections in the Scala library. Also, one needs to know that map isn't the fastest way to get things done; if you really wanted this to be fast, you could write something like:
final def toDigitReallyFast(num: Int, accum: Long = 0L, iter: Int = 0): Array[Byte] = {
if (num==0) {
val ans = new Array[Byte](math.max(1,iter))
var i = 0
var ac = accum
while (i < ans.length) {
ans(i) = (ac & 0xF).toByte // the low nibble holds the most significant remaining digit
ac >>= 4
i += 1
}
ans
}
else {
val next = num/10
toDigitReallyFast(next, (accum << 4) | (num-10*next), iter+1)
}
}
which on my machine is 4x faster than either of the others. And you can get almost 3x faster yet again if you leave everything in a Long and pack the results in an array instead of using 1 to N:
final def toDigitExtremelyFast(num: Int, accum: Long = 0L, iter: Int = 0): Long = {
if (num==0) accum | (iter.toLong << 48)
else {
val next = num/10
toDigitExtremelyFast(next, accum | ((num-10*next).toLong<<(4*iter)), iter+1)
}
}
// loop, instead of 1 to N map, for the 1.2k number case
{
var i = 10000
val a = new Array[Long](1201)
while (i<=11200) {
a(i-10000) = toDigitExtremelyFast(i)
i += 1
}
a
}
As with many things, performance tuning is highly dependent on exactly what you want to do. In contrast, library design has to balance many different concerns. I do think it's worth noticing where the library is sub-optimal with respect to performance, but this isn't really one of those cases IMO; the flexibility is worth it for the common use cases.
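The arithmetic approach from this answer, sketched in Java for illustration (class and method names are hypothetical; non-negative input is assumed). toDigits returns the most significant digit first, like num.toString.map(_ - 48), and pack uses the same layout as toDigitExtremelyFast: digit i from the right in bits 4i..4i+3, digit count from bit 48 upward.

```java
// Hypothetical sketch: digits by arithmetic instead of String.map (num >= 0 assumed).
class Digits {
    // Most significant digit first, e.g. 4096 -> {4, 0, 9, 6}.
    static byte[] toDigits(int num) {
        int len = 1;
        for (int n = num / 10; n != 0; n /= 10) len++;  // count the digits
        byte[] ans = new byte[len];
        for (int i = len - 1; i >= 0; i--) {          // fill from the right
            ans[i] = (byte) (num % 10);
            num /= 10;
        }
        return ans;
    }

    // Packs digits into one long: digit i (0 = least significant) in bits 4i..4i+3,
    // digit count in bits 48 and up.
    static long pack(int num) {
        long accum = 0L;
        int iter = 0;
        while (num != 0) {
            accum |= (long) (num % 10) << (4 * iter);
            num /= 10;
            iter++;
        }
        return accum | ((long) iter << 48);
    }

    // Reads digit i (0 = least significant) back out of a packed value.
    static int digitAt(long packed, int i) {
        return (int) ((packed >>> (4 * i)) & 0xF);
    }

    static int digitCount(long packed) {
        return (int) (packed >>> 48);
    }
}
```

No intermediate String or boxed collection is created, which is where the Scala versions above get their speedup.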

Remove key/value from map while iterating

I'm creating a map like this:
def myMap = [:]
The map basically has an object as the key and an int as the value. When I iterate over the map, I decrement the value, and if it's 0, I remove the entry. I already tried myMap.remove(), but I get a ConcurrentModificationException, which is fair enough. So I moved on to using it.remove(), which is giving me weird results.
Basically, my code is this:
myMap.each {
it.value--;
if( it.value <= 0 )
it.remove();
}
Simple enough. My problem is that if I print myMap.size() before and after the remove, they're the same. If I call myMap.containsKey( key ), it gives me true: the key is still in there.
But, if I print out the map like this:
myMap.each { System.out.println( "$it.key: $it.value" ); }
I get nothing, and calling myMap.keySet() and myMap.values() return empty.
Anyone know what's going on?
This should be a bit more efficient than Tim's answer (because you only need to iterate over the map once). Unfortunately, it is also pretty verbose:
def map = [2:1, 3:4]
def iterator = map.entrySet().iterator()
while (iterator.hasNext()) {
    def entry = iterator.next()
    entry.value--              // decrement the stored value, as in the question
    if (entry.value <= 0) {
        iterator.remove()
    }
}
// test that it worked
assert map == [3:3]
Can you do something like this:
myMap = myMap.each { it.value-- }.findAll { it.value > 0 }
That will subtract one from every value, then return you a new map of only those entries where the value is greater than zero.
You shouldn't call the remove method on a Map entry: it is meant to be a private method used internally by the Map (see line 325 of the Java 7 implementation), so calling it yourself gets the enclosing Map into all sorts of trouble (it doesn't know that it is losing entries).
Groovy lets you call private methods, so you can do this sort of trickery behind the back of the Java classes.
Edit -- Iterator method
Another way would be:
myMap.iterator().with { iterator ->
iterator.each { entry ->
entry.value--
if( entry.value <= 0 ) iterator.remove()
}
}
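In Java, the same one-pass approach goes through an explicit Iterator: removing via Iterator.remove() and changing a value with Map.Entry.setValue() are both allowed during iteration, while calling map.remove() directly is what typically triggers ConcurrentModificationException. A sketch assuming Integer values (the helper name decrementAndPrune is hypothetical):

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical helper: decrement every value and drop entries that reach zero, in one pass.
class CountdownMap {
    static <K> void decrementAndPrune(Map<K, Integer> map) {
        Iterator<Map.Entry<K, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Integer> e = it.next();
            int v = e.getValue() - 1;
            if (v <= 0) {
                it.remove();     // structural removal through the iterator is allowed
            } else {
                e.setValue(v);   // updating a value is not a structural change
            }
        }
    }
}
```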

Sorting a string using another sorting order string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I saw this as an interview question:
Given a sorting order string, you are asked to sort the input string based on the given sorting order string.
For example, if the sorting order string is dfbcae
and the input string is abcdeeabc,
the output should be dbbccaaee.
Any ideas on how to do this efficiently?
The Counting Sort option is pretty cool, and fast when the string to be sorted is long compared to the sort order string.
Create an array where each index corresponds to a letter of the alphabet; this is the count array.
For each letter in the sort target, increment the entry of the count array that corresponds to that letter.
For each letter in the sort order string,
append that letter to the end of the output string as many times as its count in the count array.
The algorithmic complexity is O(n), where n is the length of the string to be sorted (strictly O(n + m), with m the length of the sort order string). As the Wikipedia article explains, we're able to beat the lower bound on standard comparison-based sorting because this isn't a comparison-based sort.
Here's some pseudocode.
int[] countArray = new int[26]; // one counter per lowercase letter
foreach(char c in sortTarget)
{
countArray[c - 'a']++;
}
int head = 0;
foreach(char c in sortOrder)
{
while(countArray[c - 'a'] > 0)
{
sortTarget[head] = c;
head++;
countArray[c - 'a']--;
}
}
Note: this implementation requires that both strings contain only lowercase characters, and that sortTarget is a mutable character array.
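A Java version of the counting sort above, as a sketch under the same assumption (only 'a'..'z' in both strings; the names are hypothetical). It builds a new string instead of overwriting the target, and characters of the target that do not appear in the order string are simply dropped here.

```java
// Hypothetical counting-sort sketch; assumes both strings use only 'a'..'z'.
class OrderSort {
    static String countingSort(String target, String order) {
        int[] count = new int[26];                  // count[c - 'a'] = occurrences of c
        for (char c : target.toCharArray()) count[c - 'a']++;
        StringBuilder out = new StringBuilder(target.length());
        for (char c : order.toCharArray()) {
            while (count[c - 'a'] > 0) {            // emit c as many times as it occurs
                out.append(c);
                count[c - 'a']--;
            }
        }
        return out.toString();
    }
}
```

For example, countingSort("abcdeeabc", "dfbcae") gives "dbbccaaee", matching the question.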
Here's a nice, easy-to-understand algorithm that has decent algorithmic complexity.
For each character in the sort order string:
scan the string to be sorted, starting at the first non-ordered character (you can keep track of this character with an index or pointer);
when you find an occurrence of the specified character, swap it with the first non-ordered character;
increment the index of the first non-ordered character.
This is O(n*m), where n is the length of the string to be sorted and m is the length of the sort order string. We're able to beat the lower bound on comparison-based sorting because this algorithm doesn't really use comparisons. Like counting sort, it relies on the fact that you have a predefined, finite external ordering set.
Here's some pseudocode:
int head = 0;
foreach(char c in sortOrder)
{
for(int i = head; i < sortTarget.length; i++)
{
if(sortTarget[i] == c)
{
// swap i with head
char temp = sortTarget[head];
sortTarget[head] = sortTarget[i];
sortTarget[i] = temp;
head++;
}
}
}
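The swap-based algorithm above, translated into Java as a sketch (hypothetical names). It sorts a copy of the characters in place and only ever compares characters against the current order character, never against each other.

```java
// Hypothetical sketch of the O(n*m) swap-to-front algorithm.
class SwapSort {
    static String swapSort(String target, String order) {
        char[] a = target.toCharArray();
        int head = 0;                           // first not-yet-ordered position
        for (char c : order.toCharArray()) {
            for (int i = head; i < a.length; i++) {
                if (a[i] == c) {
                    char tmp = a[head];         // swap a[i] with a[head]
                    a[head] = a[i];
                    a[i] = tmp;
                    head++;
                }
            }
        }
        return new String(a);
    }
}
```

Characters missing from the order string end up, in some order, after all the ordered ones.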
In Python, you can just create an index and use it as the sort key:
order = 'dfbcae'
text = 'abcdeeabc'
index = {c: i for i, c in enumerate(order)}  # position of each character in the order string
output = sorted(text, key=index.get)
print('input=', text)
print('output=', ''.join(output))
gives this output:
input= abcdeeabc
output= dbbccaaee
Use binary search to find all the "split points" between different letters, then use the length of each segment directly. This will be asymptotically faster than naive counting sort, but harder to implement:
Use an array of size 26*2 to store the beginning and end of each letter;
Inspect the middle element and see if it is different from the element to its left. If so, then this is the beginning of the middle element's segment and the end of the segment before it;
Throw away the segment with identical beginning and end (if there are any), and recursively apply this algorithm.
Since there are at most 25 "splits", you won't have to do the search for more than 25 segments, and for each segment it is O(log n). Since this is constant * O(log n), the algorithm is O(n log n).
And of course, just using counting sort is easier to implement:
Use an array of size 26 to record the number of different letters;
Scan the input string;
Output the string in the given sorting order.
This is O(n), n being the length of the string.
Interview questions are generally about thought process and don't usually care too much about language features, but I couldn't resist posting a VB.Net 4.0 version anyway.
"Efficient" can mean two different things. The first is "what's the fastest way to make a computer execute a task" and the second is "what's the fastest that we can get a task done". They might sound the same but the first can mean micro-optimizations like int vs short, running timers to compare execution times and spending a week tweaking every millisecond out of an algorithm. The second definition is about how much human time would it take to create the code that does the task (hopefully in a reasonable amount of time). If code A runs 20 times faster than code B but code B took 1/20th of the time to write, depending on the granularity of the timer (1ms vs 20ms, 1 week vs 20 weeks), each version could be considered "efficient".
Dim input = "abcdeeabc"
Dim sort = "dfbcae"
Dim SortChars = sort.ToList()
Dim output = New String((From c In input.ToList() Select c Order By SortChars.IndexOf(c)).ToArray())
Trace.WriteLine(output)
Here is my solution to the question:
import java.util.*;
import java.io.*;
class SortString
{
public static void main(String arg[])throws IOException
{
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
// System.out.println("Enter 1st String :");
// String s1=br.readLine();
// System.out.println("Enter 2nd String :");
// String s2=br.readLine();
String s1="tracctor";
String s2="car";
String com="";
String uncom="";
for(int i=0;i<s2.length();i++)
{
if(s1.contains(""+s2.charAt(i)))
{
com=com+s2.charAt(i);
}
}
System.out.println("Com :"+com);
for(int i=0;i<s1.length();i++)
if(!com.contains(""+s1.charAt(i)))
uncom=uncom+s1.charAt(i);
System.out.println("Uncom "+uncom);
System.out.println("Combined "+(com+uncom));
HashMap<String,Integer> h1=new HashMap<String,Integer>();
for(int i=0;i<s1.length();i++)
{
String m=""+s1.charAt(i);
if(h1.containsKey(m))
{
int val=(int)h1.get(m);
val=val+1;
h1.put(m,val);
}
else
{
h1.put(m, 1); // autoboxing; new Integer(1) is deprecated
}
}
StringBuilder x=new StringBuilder();
for(int i=0;i<com.length();i++)
{
if(h1.containsKey(""+com.charAt(i)))
{
int count=(int)h1.get(""+com.charAt(i));
while(count!=0)
{x.append(""+com.charAt(i));count--;}
}
}
x.append(uncom);
System.out.println("Sort "+x);
}
}
Here is my version, which is O(n) in time. Instead of an unordered_map, I could have used a fixed-size array, e.g. int char_count[26], and done ++char_count[ch - 'a'], assuming the input strings contain only lowercase ASCII letters.
#include <string>
#include <unordered_map>
using namespace std;
string SortOrder(const string& input, const string& sort_order) {
unordered_map<char, int> char_count;
for (auto ch : input) {
++char_count[ch];
}
string res = "";
for (auto ch : sort_order) {
unordered_map<char, int>::iterator it = char_count.find(ch);
if (it != char_count.end()) {
string s(it->second, it->first);
res += s;
}
}
return res;
}
private static String sort(String target, String reference) {
final Map<Character, Integer> referencesMap = new HashMap<Character, Integer>();
for (int i = 0; i < reference.length(); i++) {
char key = reference.charAt(i);
if (!referencesMap.containsKey(key)) {
referencesMap.put(key, i);
}
}
List<Character> chars = new ArrayList<Character>(target.length());
for (int i = 0; i < target.length(); i++) {
chars.add(target.charAt(i));
}
Collections.sort(chars, new Comparator<Character>() {
@Override
public int compare(Character o1, Character o2) {
return referencesMap.get(o1).compareTo(referencesMap.get(o2));
}
});
StringBuilder sb = new StringBuilder();
for (Character c : chars) {
sb.append(c);
}
return sb.toString();
}
In C#, I would just implement the IComparer interface and leave the sorting to Array.Sort:
void Main()
{
// we define the IComparer class to define the sort order
var sortOrder = new SortOrder("dfbcae");
var testOrder = "abcdeeabc".ToCharArray();
// sort the array using Array.Sort
Array.Sort(testOrder, sortOrder);
Console.WriteLine(new string(testOrder)); // char[].ToString() would print "System.Char[]"
}
public class SortOrder : IComparer
{
string sortOrder;
public SortOrder(string sortOrder)
{
this.sortOrder = sortOrder;
}
public int Compare(object obj1, object obj2)
{
var obj1Index = sortOrder.IndexOf((char)obj1);
var obj2Index = sortOrder.IndexOf((char)obj2);
if(obj1Index == -1 || obj2Index == -1)
{
throw new Exception("character not found");
}
if(obj1Index > obj2Index)
{
return 1;
}
else if (obj1Index == obj2Index)
{
return 0;
}
else
{
return -1;
}
}
}
