What is wrong with my batching when using @sync @spawn loop on multiple threads, odd behavior when using += operator - multithreading

QUESTION TITLE UPDATED from the original. The real problem wasn't a multithreading issue; it was how I was batching the work, which caused some iterations to be repeated.
I am getting the wrong result when using the += operator in a loop.
I have created a simple script that reproduces the problem. The array should be filled with array[i,j] = i+j. The first pass assigns half of that value with =, and the second pass adds the other half with the += operator. There are also array_2, array_3, and array_4, which show the correct result when two separate arrays are added together at the end.
Can anyone explain why this is happening / what the correct way to do this is?
using Base.Threads
function function_1(cartesian_index)
i = cartesian_index[1]
j = cartesian_index[2]
return (i + j)/2
end
function main()
array_1 = zeros(Float64, 15, 15)
array_2 = zeros(Float64, 15, 15)
array_3 = zeros(Float64, 15, 15)
cartesian_indecies = CartesianIndices(array_1)
number_of_indecies = length(cartesian_indecies)
n_threads = Threads.nthreads()
batch_size = ceil(Int, number_of_indecies / n_threads)
cartesian_indicies = CartesianIndices(array_1)
@sync for batch_index in 1:batch_size:number_of_indecies
Threads.@spawn begin
for view_index in batch_index:min(number_of_indecies, batch_index + batch_size)
cartesian_index = cartesian_indecies[view_index]
array_1[cartesian_index] = function_1(cartesian_index)
array_2[cartesian_index] = function_1(cartesian_index)
end
end
end
for index in CartesianIndices(array_1)
if array_1[index] != (index[1] + index[2])/2
println("Error part 1 at index $index: $(array_1[index]) != $(index[1] + index[2])")
end
end
@sync for batch_index in 1:batch_size:number_of_indecies
Threads.@spawn begin
for view_index in batch_index:min(number_of_indecies, batch_index + batch_size)
cartesian_index = cartesian_indecies[view_index]
array_1[cartesian_index] += function_1(cartesian_index)
array_3[cartesian_index] = function_1(cartesian_index)
end
end
end
for index in CartesianIndices(array_1)
if array_1[index] != index[1] + index[2]
println("Error at index $index: $(array_1[index]) != $(index[1] + index[2])")
end
end
array_4 = array_2 + array_3
for index in CartesianIndices(array_4)
if array_4[index] != index[1] + index[2]
println("Array 4 Error at index $index: $(array_1[index]) != $(index[1] + index[2])")
end
end
end
main()
This works fine when running with one thread.
But when running it as:
julia --threads 3 ./parallel_issues_test.jl
the output is:
Error at index CartesianIndex(1, 6): 10.5 != 7
Error at index CartesianIndex(1, 11): 18.0 != 12

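Because your partitioning has a bug: the range batch_index:min(number_of_indecies, batch_index + batch_size) is inclusive at both ends, so each batch also covers the first index of the next batch. Collecting the visited indices shows the duplicates: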
test = []
for batch_index in 1:batch_size:number_of_indecies
for view_index in batch_index:min(number_of_indecies, batch_index + batch_size)
cartesian_index = cartesian_indecies[view_index]
push!(test, cartesian_index)
end
end
@show length(test), length(unique(test))
# shows
(length(test), length(unique(test))) = (227, 225)
There are a few clues:
= works because it doesn't rely on the previous value
you have 2 errors, and this shows you have exactly 2 extra iterations
if it had something to do with multi-threading (a race condition), the error location or value would probably change from run to run
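The overlap can also be reproduced with plain arithmetic, outside Julia. The sketch below is in Python and models Julia's 1-based inclusive ranges with `range(start, stop + 1)`. It assumes the 15×15 array and 3 threads from the question; the helper names `inclusive_batches` and `to_cartesian` are made up for illustration.

```python
import math

def inclusive_batches(n_items, n_threads):
    """Mimic `batch_index:min(n, batch_index + batch_size)` with step batch_size."""
    batch_size = math.ceil(n_items / n_threads)
    batches = []
    for start in range(1, n_items + 1, batch_size):
        stop = min(n_items, start + batch_size)  # inclusive end, as in Julia
        batches.append(list(range(start, stop + 1)))
    return batches

def to_cartesian(linear_index, n_rows):
    """Column-major, 1-based linear index -> (row, column)."""
    return ((linear_index - 1) % n_rows + 1, (linear_index - 1) // n_rows + 1)

batches = inclusive_batches(225, 3)
visited = [i for batch in batches for i in batch]
duplicates = sorted({i for i in visited if visited.count(i) > 1})

print(len(visited), len(set(visited)))            # 227 225
print([to_cartesian(i, 15) for i in duplicates])  # [(1, 6), (1, 11)]
```

The two duplicated linear indices, 76 and 151, map to exactly the two positions reported in the error output.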

A fixed version, based on @jling's answer.
The key change is in the top-level loops, which now step by batch_size + 1 so that consecutive batches no longer share an index:
@sync for batch_index in 1:batch_size+1:number_of_indecies
using Base.Threads
function function_1(cartesian_index)
i = cartesian_index[1]
j = cartesian_index[2]
return (i + j)/2
end
function main()
array_1 = zeros(Float64, 15, 15)
array_2 = zeros(Float64, 15, 15)
array_3 = zeros(Float64, 15, 15)
cartesian_indecies = CartesianIndices(array_1)
number_of_indecies = length(cartesian_indecies)
n_threads = Threads.nthreads()
batch_size = ceil(Int, number_of_indecies / n_threads)
@sync for batch_index in 1:batch_size+1:number_of_indecies
Threads.@spawn begin
for view_index in batch_index:min(number_of_indecies, batch_index + batch_size)
cartesian_index = cartesian_indecies[view_index]
array_1[cartesian_index] = function_1(cartesian_index)
array_2[cartesian_index] = function_1(cartesian_index)
end
end
end
for index in CartesianIndices(array_1)
if array_1[index] != (index[1] + index[2])/2
println("Error part 1 at index $index: $(array_1[index]) != $(index[1] + index[2])")
end
end
@sync for batch_index in 1:batch_size+1:number_of_indecies
Threads.@spawn begin
for view_index in batch_index:min(number_of_indecies, batch_index + batch_size)
cartesian_index = cartesian_indecies[view_index]
array_1[cartesian_index] += function_1(cartesian_index)
array_3[cartesian_index] = function_1(cartesian_index)
end
end
end
for index in CartesianIndices(array_1)
if array_1[index] != index[1] + index[2]
println("Error at index $index: $(array_1[index]) != $(index[1] + index[2])")
end
end
array_4 = array_2 + array_3
for index in CartesianIndices(array_4)
if array_4[index] != index[1] + index[2]
println("Array 4 Error at index $index: $(array_1[index]) != $(index[1] + index[2])")
end
end
println("number_of_indecies = $number_of_indecies")
println("n_threads = $n_threads")
println("batch_size = $batch_size")
test = []
@sync for batch_index in 1:batch_size+1:number_of_indecies
Threads.@spawn begin
#print thread number
end_index = min(number_of_indecies, batch_index + batch_size)
println("Thread $(Threads.threadid()) $batch_index:$end_index")
for view_index in batch_index:end_index
cartesian_index = cartesian_indecies[view_index]
push!(test, cartesian_index)
end
end
end
@show length(test), length(unique(test))
end
main()
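An alternative to widening the step is to keep the step at batch_size and end each batch one element earlier. The following is a hedged Python sketch of that partition, using 1-based inclusive bounds to match Julia. `partition_bounds` is a hypothetical helper, not part of the original script.

```python
import math

def partition_bounds(n_items, n_batches):
    """Return 1-based inclusive (start, stop) pairs that cover 1..n_items once."""
    batch_size = math.ceil(n_items / n_batches)
    return [
        (start, min(n_items, start + batch_size - 1))
        for start in range(1, n_items + 1, batch_size)
    ]

# Every index must be visited exactly once, whatever the batch count.
for n_batches in (1, 2, 3, 4, 7, 16):
    bounds = partition_bounds(225, n_batches)
    covered = [i for start, stop in bounds for i in range(start, stop + 1)]
    assert covered == list(range(1, 226)), n_batches

print(partition_bounds(225, 3))  # [(1, 75), (76, 150), (151, 225)]
```

With this variant each batch holds at most batch_size elements, whereas stepping by batch_size + 1 yields batches of up to batch_size + 1 elements. Both cover every index exactly once.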

Related

Misunderstanding about Julia threading race conditions

This question is a follow up of one of my previous questions: Fastest way to compare lists with threshold in Julia
I am trying to parallelise the code using threads. Here is a before/after comparison. Before:
for i = 1:l
B[i] || continue
for j = i + 1:l
B[j] || continue
if all(abs(P[i, k] - P[j, k]) <= 0.001 for k=1:1540)
B[j] = false
end
end
end
after:
for i = 1:l
B[i] || continue
Threads.@threads for j = i + 1:l
B[j] || continue
if all(abs(P[i, k] - P[j, k]) <= 0.001 for k=1:1540)
B[j] = false
end
end
end
Here B is a BitVector of size m and P is a matrix of size (m, 1540).
Just threading the loop improved performance and seemed to work fine with a large array (shape of P = (400000, 1540)), but when testing smaller arrays with a higher number of threads (m = 1056, nthreads > 8) I realised that I wasn't getting the same answer anymore. Some entries of B that should be false weren't set correctly.
What I don't understand is that there shouldn't be a race condition here: no two threads ever access the same j value, do they?
Changing to the code below fixes the problem without hurting performance, but I don't really understand why.
for i = 1:l
B[i] || continue
o = ReentrantLock()
Threads.@threads for j = i + 1:l
B[j] || continue
if all(abs(P[i, k] - P[j, k]) <= 0.001 for k=1:1540)
Threads.lock(o) do
B[j] = false
end
end
end
end
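One likely explanation, stated here as an assumption because the post does not confirm it: a Julia BitVector packs its flags into 64-bit chunks, so `B[j] = false` is a read-modify-write of a whole chunk, and two threads updating different j values that live in the same chunk can overwrite each other's change. The Python sketch below only simulates that interleaving deterministically with a plain integer. It does not use threads, and the helper names are illustrative.

```python
def clear_bit(word, bit):
    """Return `word` with the given bit set to 0."""
    return word & ~(1 << bit)

def simulate_lost_update():
    word = 0b1111                   # four flags packed into one word, all True
    seen_by_a = word                # task A reads the word
    seen_by_b = word                # task B reads it before A writes back
    word = clear_bit(seen_by_a, 1)  # A writes back with flag 1 cleared
    word = clear_bit(seen_by_b, 2)  # B writes its stale copy with flag 2 cleared
    return word

result = simulate_lost_update()
print(format(result, "04b"))  # 1011: flag 1 is True again, A's update was lost
```

Under that assumption, the lock serialises the chunk updates, which would explain why it fixes the result. A Vector{Bool}, which stores one flag per byte, would avoid sharing a word between neighbouring flags.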

in fit_generator, training_generator was influenced by validation_generator

I met a strange problem in fit_generator
model.fit_generator(generate_arrays_from_file(trainSizeListImgDic[s], s, batchSize), steps_per_epoch=math.floor(size / batchSize), epochs=20,
verbose=2, validation_data=generate_arrays_from_file(testSizeListImgDic[s], s, batchSize),
validation_steps=vs,callbacks=[EarlyStoppingByAccVal(monitor='val_acc', value=0.90, verbose=1),checkpointer])
and my generate_arrays_from_file reads:
def generate_arrays_from_file(SizeListImg, img_size, batch):
    size = len(SizeListImg.images)
    dim = re.split('[,()]', img_size)
    dataX = np.zeros((batch, int(dim[1]), int(dim[2]), 1), dtype=np.float32)
    dataY = np.zeros((batch, num_classes), dtype=np.float32)
    loopcount = math.floor(size / batch) - 1
    if loopcount == 0:
        loopcount = 1
    counter = 0
    while True:
        i = random.randint(0, loopcount)
        for ind in range(i * batch, (i + 1) * batch):
            try:
                dataX[counter, :, :, 0] = SizeListImg.images[ind]
            except:
                print('dim=' + str(dim))
                print('error counter=' + str(counter) + " i=" + str(i) + " ind=" + str(ind) + " batch=" + str(batch) + "\n")
                print("SizeListImg.images=" + str(len(SizeListImg.images)))
                print("img0 = " + SizeListImg.images_names[0])
                print("img1 = " + SizeListImg.images_names[1])
                print("img2 = " + SizeListImg.images_names[2])
                print("img3 = " + SizeListImg.images_names[3])
            for j, imgClass in enumerate(imgClasses):
                dataY[counter, j] = (SizeListImg.labels[ind] == imgClass)
            counter += 1
            if counter >= batch:
                yield (dataX, dataY)
                counter = 0
                dataX = np.zeros((batch, int(dim[1]), int(dim[2]), 1), dtype=np.float32)  # note the difference: np.zeros, not tf.zeros((25, 200, 200, 1))
                dataY = np.zeros((batch, num_classes), dtype=np.float32)
During training I found that the number of training images (139) appeared to be reduced to the number of validation images (22), which led to an index error, even though the images did come from the training set. However, if I reduce the batch size from 20 to 10, there is no error.
Any conspiracy of fit_generator against me?
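The index arithmetic in generate_arrays_from_file can be checked in isolation. The sketch below copies only the loopcount and range logic from the generator (the helper name `max_index_touched` is made up). It assumes the error is raised by the validation generator, which runs over the 22 validation images with the same batch size; the post does not confirm this.

```python
import math

def max_index_touched(size, batch):
    """Largest `ind` the generator's inner loop can reach for `size` images."""
    loopcount = math.floor(size / batch) - 1
    if loopcount == 0:
        loopcount = 1
    # random.randint(0, loopcount) is inclusive, so i can equal loopcount,
    # and the inner loop then runs up to (loopcount + 1) * batch - 1.
    return (loopcount + 1) * batch - 1

for size, batch in [(139, 20), (22, 20), (22, 10)]:
    worst = max_index_touched(size, batch)
    print(size, batch, worst, "out of range" if worst >= size else "ok")
```

With 22 images and a batch of 20, loopcount is forced from 0 to 1, so i = 1 requests indices 20 to 39. With a batch of 10 the largest index is 19, which would match the observation that the smaller batch runs without error.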

how to get start index for maximum sum sub array

I am using the following program to find the maximum sum and the indices of the sub-array that produces it. I am able to get the right index but am having trouble finding the left index.
def max_sum_new(a):
    max_current = max_global = a[0]
    r_index = 0
    for i in range(1, len(a)):
        max_current = max(a[i], max_current+a[i])
        if max_current > max_global:
            max_global = max_current
            r_index = i
    return (max_global, r_index)
#Driver Code:
my_arr = [-2, 3, 2, -1]
print(max_sum_new(my_arr))
I am getting
(5, 2)
which is expected, but I also want to get the starting index, which in this case should be 1,
so I expect
(5, 1, 2)
Is there any way to get a start index here? Where should I put the variable that records the start index?
You just need to apply the same logic as you did with r_index. Whenever max_current restarts from a[i], the start of the candidate sub-array changes. Copy that candidate into left_index only when max_global is updated, otherwise a later restart that never beats the maximum would overwrite the answer. In the end it looks like this:
max_current = max_global = a[0]
r_index = 0
left_index = 0
current_start = 0
for i in range(1, len(a)):
    if a[i] > max_current + a[i]:
        # Here is where the candidate start index changes
        current_start = i
        max_current = a[i]
    else:
        max_current = a[i] + max_current
    if max_current > max_global:
        max_global = max_current
        # Record the start only when a new maximum is found
        left_index = current_start
        r_index = i
print(max_global, left_index, r_index)
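One way to sanity-check the start and end tracking is to compare against a brute-force search over all sub-arrays. The sketch below keeps a candidate start and records it only when the global maximum improves. The function names are chosen for illustration, and the input is assumed to be a non-empty list.

```python
def max_subarray(a):
    """Return (best_sum, start, end) with inclusive indices, for a non-empty list."""
    best_sum = current_sum = a[0]
    best_start = best_end = current_start = 0
    for i in range(1, len(a)):
        if a[i] > current_sum + a[i]:
            # Starting fresh at i beats extending the previous run
            current_start = i
            current_sum = a[i]
        else:
            current_sum += a[i]
        if current_sum > best_sum:
            best_sum, best_start, best_end = current_sum, current_start, i
    return best_sum, best_start, best_end

def brute_force_best_sum(a):
    """O(n^3) reference: the largest sum over every contiguous slice."""
    return max(sum(a[l:r + 1]) for l in range(len(a)) for r in range(l, len(a)))

for arr in ([-2, 3, 2, -1], [3, 2, -10, 1], [-5, -1, -3], [1, -2, 3, 10, -4, 7, 2, -5]):
    total, left, right = max_subarray(arr)
    assert total == brute_force_best_sum(arr)
    assert sum(arr[left:right + 1]) == total

print(max_subarray([-2, 3, 2, -1]))  # (5, 1, 2)
```

The input [3, 2, -10, 1] is the interesting case: the run restarts at index 3 but never beats the sum 5, so the reported start must stay at 0.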

Pascal Triangle Array

I tried to make a Pascal triangle using Python arrays. It works, but in the output I get zeros where the blank spaces should be. I want to remove the zeros but I don't know how. Here is my code:
n = int(input("range: "))
t = [([0] * n) for i in range(n)]
t[0][0] = 1
for i in range(1, n):
    t[i][0] = 1
    t[i][i] = 1
    for j in range(1, i):
        t[i][j] = t[i - 1][j] + t[i - 1][j - 1]
for i in range(n):
    for j in range(n):
        print(t[i][j], end="")
    print("\n")
The output is :
range: 5
10000
11000
12100
13310
14641
Initialise the table with empty strings instead of zeros, so that the unused cells print as nothing:
n = int(input("range: "))
t = [([""] * n) for i in range(n)]
t[0][0] = 1
for i in range(1, n):
    t[i][0] = 1
    t[i][i] = 1
    for j in range(1, i):
        t[i][j] = t[i - 1][j] + t[i - 1][j - 1]
for i in range(n):
    for j in range(n):
        print(t[i][j], end="")
    print("\n")

I'm trying to get the code to sort using the median of three method

I'm trying to get the code to sort using the median-of-three method and I'm running into a few problems.
alist[first], alist[pivotindex] = alist[pivotindex], alist[first]
is reported as invalid syntax and I'm not sure why.
def quickSort(alist):
    quickSortHelper(alist,0,len(alist)-1)
def quickSortHelper(alist,first,last):
    if first<last:
        splitpoint = partition(alist,first,last)
        quickSortHelper(alist,first,splitpoint-1)
        quickSortHelper(alist,splitpoint+1,last)
def partition(alist,first,last):
    pivotindex = median(alist, first, last, (first + last //2)
    alist[first], alist[pivotindex] = alist[pivotindex], alist[first]
    pivotvalue = alist[first]
    leftmark = first+1
    rightmark = last
    done = False
    while not done:
        while leftmark <= rightmark and \
                alist[leftmark] <= pivotvalue:
            leftmark = leftmark + 1
            print(alist)
        while alist[rightmark] >= pivotvalue and \
                rightmark >= leftmark:
            rightmark = rightmark -1
            print(alist)
        if rightmark < leftmark:
            done = True
        else:
            temp = alist[leftmark]
            alist[leftmark] = alist[rightmark]
            alist[rightmark] = temp
            print(alist)
    temp = alist[first]
    alist[first] = alist[rightmark]
    alist[rightmark] = temp
    return rightmark
def median(a, i, j, k):
    if a[i] < a[j]:
        return j if a[j] < a[k] else k
    else:
        return i if a[i] < a[k] else k
alist = [54,26,93,17,77,31,44,55,20]
quickSort(alist)
print(alist)
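Because the line above it, pivotindex = median(alist, first, last, (first + last //2), is missing a closing ). Python keeps reading the expression onto the next line, so the error is reported there (newer versions may instead point at the unclosed parenthesis).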
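For reference, here is a minimal sketch of a median-of-three quicksort with the parentheses balanced. It assumes the midpoint was meant to be (first + last) // 2, and it replaces the question's median helper with a hypothetical `median_of_three` that sorts the three candidate indices by value. It is one possible implementation, not the asker's final code.

```python
def median_of_three(a, i, j, k):
    """Return whichever of the indices i, j, k holds the median of the three values."""
    return sorted((i, j, k), key=lambda idx: a[idx])[1]

def quick_sort(a, first=0, last=None):
    """Sort the list `a` in place between indices first and last (inclusive)."""
    if last is None:
        last = len(a) - 1
    if first >= last:
        return
    # Note the parentheses: (first + last) // 2, not first + last // 2
    pivot_index = median_of_three(a, first, last, (first + last) // 2)
    a[first], a[pivot_index] = a[pivot_index], a[first]
    pivot_value = a[first]
    left, right = first + 1, last
    while True:
        while left <= right and a[left] <= pivot_value:
            left += 1
        while right >= left and a[right] >= pivot_value:
            right -= 1
        if right < left:
            break
        a[left], a[right] = a[right], a[left]
    # Put the pivot into its final position
    a[first], a[right] = a[right], a[first]
    quick_sort(a, first, right - 1)
    quick_sort(a, right + 1, last)

data = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quick_sort(data)
print(data)  # [17, 20, 26, 31, 44, 54, 55, 77, 93]
```

Checking the bounds before indexing (`right >= left` first) also keeps the second inner loop from reading past the partition.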
