Julia multithreading "broadcast return"

I am new to the world of multithreading. I just want to write a function that compares (large) sorted 2D arrays:
function check_duplicates(sites, generated_structures)
    for i in eachindex(generated_structures)
        sites == generated_structures[i] && return false
    end
    return true
end
"generated_structures" might be large (5M+ elements) and to speed up things I was thinking about doing something like this:
function check_duplicates(sites, generated_structures)
    Threads.@threads for i in eachindex(generated_structures)
        sites == generated_structures[i] && return false
    end
    return true
end
The idea is that multiple threads each check a subpart of the large array. Is it possible to make the function stop and return false as soon as the condition is satisfied? Right now some threads return true because the subpart they checked didn't contain a match.

One way would be to have a common status flag that each thread can access and bail out if some other thread has found a match. To do this in a thread-safe manner, there is Threads.Atomic:
function hasdup(val, itr)
    status = Threads.Atomic{Bool}(false)
    Threads.@threads for x in itr
        status[] && break
        (x == val) && (status[] = true)
    end
    return status[]
end
Now, I'm not really sure why the access to status needs to be thread-safe, since it will only be written to in order to set it to true. Removing the Atomic wrapper still works, but is quite a bit slower.
I get a decent threading speedup from the above code, but the threading overhead is quite large, so if your match is early in the record, the threaded version will be much slower.
A multi-threading library that has much lower overhead is Polyester.jl, which will be much faster if the match comes early. It is, however, not compatible with the Threads library, so you cannot nest it within Threads.
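The shared-flag idea is not specific to Julia. As a rough illustration in Python (not a translation of the Julia code), the sketch below splits the input into chunks and lets each worker stop as soon as any worker has set a shared threading.Event. The names has_dup, scan, and n_workers are made up for this example. In CPython the global interpreter lock usually means threads will not speed up pure-Python comparisons, so treat this as a demonstration of the early-exit pattern, not a performance recipe.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def has_dup(val, items, n_workers=4):
    """Return True if val occurs in items; workers stop once a match is found."""
    found = threading.Event()  # shared status flag, safe to set/read from any thread

    def scan(chunk):
        for x in chunk:
            if found.is_set():  # another worker already found a match: bail out
                return
            if x == val:
                found.set()
                return

    # Split items into contiguous chunks, roughly one per worker.
    size = max(1, -(-len(items) // n_workers))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(scan, chunks))  # block until every worker has returned
    return found.is_set()
```

Note that this returns True when a match exists, which is the opposite sense of check_duplicates in the question.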

Related

Why were Logical Operators created?

Almost all programming languages have the concept of logical operators.
I have a question: why were logical operators created? I googled and found that they were created for condition-based operations, but that's a kind of usage, I think.
I am interested in what challenges people faced without these operators. Please explain with an example if possible.
I am interested in what challenges people faced without these operators.
Super-verbose deeply nested if() conditions, and especially loop conditions.
while (a && b) {
    a = something;
    b = something_else;
}
written without logical operators becomes:
while (a) {
    if (!b) break;  // or if(b){} else break; if you want to avoid logical ! as well
    a = something;
    b = something_else;
}
Or if you don't want a loop, do you want to write this?
if (c >= 'a') {
    if (c <= 'z') {
        stuff;
    }
}
No, of course you don't because it's horrible compared to if (c >= 'a' && c <= 'z'), especially if there's an else, or this is inside another nesting. Especially if your coding-style rules require 8-space indentation for each level of nesting, or the { on its own line making each level of nesting eat up even more vertical space.
Note that a&b is not equivalent to a&&b, even apart from short-circuit evaluation (where b isn't even evaluated if a is false). For example, 2 & 1 is false, because their integer bit patterns don't have any of the same bits set, whereas 2 && 1 is true.
Short-circuit evaluation allows loop conditions like while(p && p->data != 0) to check for a NULL pointer and then conditionally do something only on non-NULL.
Compact expressions were a big deal when computers were programmed over slow serial lines using paper teletypes.
Also note that these are purely high-level language-design considerations. CPU hardware doesn't have anything like logical operators; it usually takes multiple instructions to implement a ! on an integer (into a 0/1 integer, not when used as an if condition).
if (a && b) typically compiles to two test/branch instructions in a row.
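The same two points (bitwise vs. logical AND, and short-circuiting as a guard) can be tried out in Python, where the logical operator is spelled `and` and the null pointer is `None`. This is only a small sketch; the Node class and has_nonzero_data are hypothetical names introduced for the example.

```python
class Node:
    """Hypothetical stand-in for a struct with a data field."""

    def __init__(self, data):
        self.data = data


def has_nonzero_data(p):
    # `and` short-circuits: p.data is never evaluated when p is None,
    # mirroring while (p && p->data != 0) in C.
    return p is not None and p.data != 0


bitwise = 2 & 1          # 0, because 0b10 and 0b01 share no set bits
logical = bool(2 and 1)  # True, because both operands are truthy
```

Without short-circuiting, the guard would have to be written as two nested if statements to avoid touching p.data on a None value.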

Is this recursive?

Second attempt here, I just wanted to know if this is considered a recursive function.
The purpose of the function is to take a string and
if the first element is equal to the last element
then append the last element to a list and return nothing,
else call itself and pass the same string from index [1]
finally append the first element to the list
I know that error checking needs to be done on the if statement. However, I am only doing this to try and get my head around recursion... Struggling, to be honest.
Also, I would never write a program like this if it were anything but trivial; I just wanted to check if my understanding is correct so far.
def parse(theList):
    theList.reverse()
    parsedString = ''.join(theList)
    return parsedString

def recursiveMessage(theString):
    lastElement = theString[len(theString) - 1]
    if theString[0] == lastElement:
        buildString.append(theString[0])
        return None
    else:
        recursiveMessage(theString[1::])
        buildString.append(theString[0])

toPrint = "Hello Everyone!"
buildString = []
recursiveMessage(toPrint)
print(parse(buildString))
Thanks again.
Is this recursive?
If at any point in a function's execution it calls itself, then it is considered recursive. This happens in your example, so recursiveMessage is indeed recursive.
so which is quicker recursion or iteration?
Recursion is usually slower and consumes more space, because a new stack frame has to be created on the call stack for each recursive call. If you know your function will recurse deeply or be called many times, iteration is usually the better route.
As an interesting side note, many compilers optimize tail-recursive functions by turning them into loops anyway (CPython, however, does not).
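To make the comparison concrete, here is a sketch of the same logic written both ways. It is not the asker's exact code: it returns a string instead of appending to a global list, and the names prefix_recursive and prefix_iterative are invented for the example. Both assume a non-empty string and return everything up to and including the first character that equals the last character.

```python
def prefix_recursive(s):
    # Base case: the first character matches the last one.
    if s[0] == s[-1]:
        return s[0]
    # Recursive case: keep the first character and recurse on the rest.
    # Each call adds a stack frame and copies the remaining string.
    return s[0] + prefix_recursive(s[1:])


def prefix_iterative(s):
    # Same result with a plain loop: no extra stack frames.
    out = []
    for ch in s:
        out.append(ch)
        if ch == s[-1]:
            break
    return "".join(out)
```

In CPython the recursive version is also bounded by the interpreter's recursion limit (1000 frames by default), so very long inputs only work with the loop.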

Readable, controllable iterators?

I'm trying to craft an LL(1) parser for a deterministic context-free grammar. One of the things I'd like to be able to use, because it would enable much simpler, less greedy and more maintainable parsing of literal records like numbers, strings, comments and quotations is k tokens of lookahead, instead of just 1 token of lookahead.
Currently, my solution (which works but which I feel is suboptimal) is like (but not) the following:
for idx, tok in enumerate(toklist):
    if tok == "blah":
        do(stuff)
    elif tok == "notblah":
        try:
            toklist[idx + 1]
        except:
            whatever()
    else:
        something(else)
(You can see my actual, much larger implementation at the link above.)
Sometimes, like if the parser finds the beginning of a string or block comment, it would be nice to "jump" the iterator's current counter, such that many indices in the iterator would be skipped.
This can in theory be done with (for example) idx += idx - toklist[idx+1:].index(COMMENT), however in practice, each time the loop repeats, the idx and obj are reinitialised with toklist.next(), overwriting any changes to the variables.
The obvious solution is a while True: or while i < len(toklist): ... i += 1, but there are a few glaring problems with those:
Using while on an iterator like a list is really C-like and really not Pythonic, besides the fact it's horrendously unreadable and unclear compared to an enumerate on the iterator. (Also, for while True:, which may sometimes be desirable, you have to deal with list index out of range.)
For each cycle of the while, there are two ways to get the current token:
using toklist[i] everywhere (ugly, when you could just iterate)
assigning toklist[i] to a shorter, more readable, less typo-vulnerable name each cycle. This has the disadvantage of hogging memory and being slow and inefficient.
Perhaps it can be argued that a while loop is what I should use, but I think while loops are for doing things until a condition is no longer true, and for loops are for iterating and looping finitely over an iterator, and a(n iterative LL) parser should clearly implement the latter.
Is there a clean, Pythonic, efficient way to control and change arbitrarily the iterator's current index?
This is not a dupe of this because all those answers use complicated, unreadable while loops, which is what I don't want.
Is there a clean, Pythonic, efficient way to control and change arbitrarily the iterator's current index?
No, there isn't. You could implement your own iterator type though; it wouldn't operate at the same speed (being implemented in Python), but it's doable. For example:
from collections.abc import Iterator

class SequenceIterator(Iterator):
    def __init__(self, seq):
        self.seq = seq
        self.idx = 0

    def __next__(self):
        try:
            ret = self.seq[self.idx]
        except IndexError:
            raise StopIteration
        else:
            self.idx += 1
            return ret

    def seek(self, offset):
        self.idx += offset
To use it, you'd do something like:
# Created outside for loop so you have name to call seek on
myseqiter = SequenceIterator(myseq)
for x in myseqiter:
    if test(x):
        pass  # do stuff with x
    else:
        # Seek somehow, e.g.
        myseqiter.seek(1)  # Skips the next value
Adding behaviors like providing the index as well as value is left as an exercise.
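One possible way to do that exercise is sketched below: a variant that yields (index, value) pairs, so it can replace enumerate() while still supporting seek(). The class name IndexedSequenceIterator, the token list, and the comment markers are all invented for the example, and the bounds check is a small change from the try/except version so that a negative index ends iteration instead of wrapping around.

```python
from collections.abc import Iterator


class IndexedSequenceIterator(Iterator):
    """Hypothetical variant of SequenceIterator yielding (index, value) pairs."""

    def __init__(self, seq):
        self.seq = seq
        self.idx = 0

    def __next__(self):
        if not 0 <= self.idx < len(self.seq):
            raise StopIteration
        item = (self.idx, self.seq[self.idx])
        self.idx += 1
        return item

    def seek(self, offset):
        # Relative jump, measured from the item that would be yielded next.
        self.idx += offset


tokens = ["a", "/*", "x", "y", "*/", "b"]
it = IndexedSequenceIterator(tokens)
seen = []
for idx, tok in it:
    if tok == "/*":
        end = tokens.index("*/", idx)  # position of the closing marker
        it.seek(end - idx)             # next item yielded is the one after "*/"
        continue
    seen.append((idx, tok))
```

Here the loop stays a plain for loop, and the comment body is skipped in one call instead of being visited token by token.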

complete vs. simple i/o lua

I am trying to write a program to analyze data from a simulation. Since the simulation software I am using is what is running the Lua program, I am not sure if this is the right place to ask this question, but I am probably making a programming error.
I am struggling with the difference between using the simple and complete I/O models. I have a block of code, which works, and looks like this:
io.output([[filename_and_location]])
function segment.other_actions()
    if ion_splat ~= 0 then io.write(ion_px_mm, "\n") end
    io.close()
end
Note: ion_splat and ion_px_mm are pre-determined variables that take on number values. This code is run over and over again throughout the simulation.
Then I decided to try achieving the same thing using the complete I/O model like this:
f = io.open([[file_name_and_location]], "w")
function segment.other_actions()
    if ion_splat ~= 0 then f:write(ion_py_mm, "\n") end
    f:close()
end
This runs, but takes a lot longer than the other way. Why is that?
Example 1:
for i = 1, 1000 do
    io.output("test.txt")
    io.write("some data to be written\n")
    io.close()
end
Example 2:
for i = 1, 1000 do
    local f = io.open("test.txt", "w")
    f:write("some data to be written\n")
    f:close()
end
There is no measurable difference in the execution time.
The latter approach is usually preferable because the used file is identified explicitly.

Multithread+Recursion strategies

I am just starting to learn the ins and outs of multithreaded programming and have a few basic questions that, once answered, should keep me occupied for quite some time. I understand that multithreading loses its effectiveness once you have created more threads than there are cores (due to context switching and cache flushing). With that understood, I can think of two ways to multithread a recursive function, but I am not quite sure which is the common way to approach the problem. One seems much more complicated, perhaps with a higher payoff, but that's what I hope you will be able to tell me.
Below is pseudo-code for two different methods of multithreading a recursive function. I have used the terminology of merge sort for simplicity, but it's not that important. It is easy to see how to generalize the methods to other problems. Also, I will personally be employing these methods using the pthreads library in C, so the thread syntax mildly reflects this.
Method 1:
main ()
{
    A = array of length N
    NUM_CORES = get number of functional cores
    chunk[NUM_CORES] = array of indices partitioning A into (N / NUM_CORES) sized chunks
    thread_id[NUM_CORES] = array of thread id’s
    thread[NUM_CORES] = array of thread type
    //start NUM_CORES threads on working on each chunk of A
    for i = 0 to (NUM_CORES - 1) {
        thread_id[i] = thread_start(thread[i], MergeSort, chunk[i])
    }
    //wait for all threads to finish
    //Merge chunks appropriately
    exit
}
MergeSort ( chunk )
{
    MergeSort ( lowerSubChunk )
    MergeSort ( higherSubChunk )
    Merge(lowerSubChunk, higherSubChunk)
}
//Merge(,) not shown
Method 2:
main ()
{
    A = array of length N
    NUM_CORES = get number of functional cores
    chunk = indices 0 and N
    thread_id[NUM_CORES] = array of thread id’s
    thread[NUM_CORES] = array of thread type
    //lock variable aka mutex
    THREADS_IN_USE = 1
    MergeSort( chunk )
    exit
}
MergeSort ( chunk )
{
    lock THREADS_IN_USE
    if ( THREADS_IN_USE < NUM_CORES ) {
        FREE_CORE = find index of unused core
        thread_id[FREE_CORE] = thread_start(thread[FREE_CORE], MergeSort, lowerSubChunk)
        THREADS_IN_USE++
        unlock THREADS_IN_USE
        MergeSort( higherSubChunk )
        //wait for thread_id[FREE_CORE] and current thread to finish
        lock THREADS_IN_USE
        THREADS_IN_USE--
        unlock THREADS_IN_USE
        Merge(lowerSubChunk, higherSubChunk)
    }
    else {
        unlock THREADS_IN_USE
        MergeSort( lowerSubChunk )
        MergeSort( higherSubChunk )
        Merge(lowerSubChunk, higherSubChunk)
    }
}
//Merge(,) not shown
Visually, one can think of the differences between these two methods as follows:
Method 1: creates NUM_CORES separate recursion trees, each one having a single core traversing it.
Method 2: creates a single recursion tree but has all cores traversing it. In particular, whenever there is a free core, it is set to work on the "left child subtree" of the first node where MergeSort is called after the core is freed.
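For readers who want something runnable, Method 1 might look roughly like the following in Python. This is a sketch, not pthreads code. ThreadPoolExecutor stands in for the manual thread_start calls, heapq.merge stands in for the Merge that is not shown, and chunked_sort and num_workers are names made up here. In CPython the global interpreter lock limits the actual speedup, so it illustrates the structure only.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def merge_sort(a):
    """Plain single-threaded recursive merge sort."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return list(merge(merge_sort(a[:mid]), merge_sort(a[mid:])))


def chunked_sort(a, num_workers=4):
    """Method 1: one independent recursion tree per worker, merged at the end."""
    if not a:
        return []
    size = -(-len(a) // num_workers)  # ceiling division
    chunks = [a[i:i + size] for i in range(0, len(a), size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        sorted_chunks = list(pool.map(merge_sort, chunks))  # waits for all workers
    # Final single-threaded k-way merge of the sorted chunks.
    return list(merge(*sorted_chunks))
```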
The problem with Method 1 is that if it is the case that the running time of the recursive function varies with the distribution of values within each initial subchunk (i.e. the chunk[i]), one thread could finish much faster leaving a core sitting idle while the others finish. With Merge Sort this is not likely to be the case since the work of MergeSort happens in Merge whose runtime isn't affected much by the distribution of values in the (sorted) subchunks. However, with a more involved recursive function, the running time on one subchunk could be much longer!
With Method 2 it is possible to have the same problem. Again, with merge sort it's not clear, since the running time for each subchunk is likely to be similar, but the line //wait for thread_id[FREE_CORE] and current thread to finish would also require one core to wait for the other. However, with Method 2, all calls to Merge run ASAP, as opposed to Method 1, where one must wait for NUM_CORES calls to MergeSort to finish and then do NUM_CORES - 1 merges afterward (although you can multithread this as well, to an extent).
(though the syntax might not be completely correct)
Are both of these methods used in practice? Are there situations where one is more beneficial over the other? Is this the correct way to implement Method 2? (in this case, THREADS_IN_USE is a semaphore?)
Thanks so much for your help!

Resources