Best way to store outputs from multi-threaded function calls - multithreading

I have a function f() which returns a DataFrame, the number of rows of which I don't know in advance. I'm calling f() in a multi-threaded context. I'm storing the results like this:
results = [DataFrame() for _ in 1:100]
Threads.@threads for hi in 1:100
    results[hi] = f(df)
end
When I run this code, the memory usage blows up, presumably because results is having to constantly resize itself when it gets the size of the DataFrame [EDIT: this isn't true]. What is the best way to pre-allocate the results array so that the memory doesn't blow up?
**** UPDATE with MWE ****
function func(df::DataFrame)
    X = df[:time]
    indices = findall(X .> 0)
end
# read in R data
rds = "blablab.rds"
objs = load(rds);
params = collect(0.5:0.005:0.7);
for i in 1:length(objs)
    cols = [string(name) for name in names(objs.data[i]) if occursin("blabla",string(name))]
    hypers = [(a,b) for a in cols, b in params]
    results = [DataFrame() for _ in 1:length(hypers)]
    # HERE IS WHERE THE MEMORY BLOWS UP
    Threads.@threads for hi in 1:length(hypers)
        name, val = hypers[hi]
        results[hi] = func(objs.data[i])
    end
end
df is 0.7GB. When I run this piece of code my memory usage goes up to ~30GB!!! It seems like just accessing a column of df inside func() is copying the whole thing?

Below are two versions of the same code, single-threaded and multi-threaded. Each builds one DataFrame from a set of DataFrames of random length returned by f().
using Random
using DataFrames
using BenchmarkTools
function f(rngs::Vector{Random.MersenneTwister}, offset)::DataFrame
    t = Threads.threadid()
    n = rand(rngs[t+offset], 1:20)
    DataFrame(a=1:n,b=21:(20+n),t=t+offset)
end
function test_threads(rngs::Vector{Random.MersenneTwister})
    res = DataFrame([Int,Int,Int],[:a,:b,:t],0)
    lock = Threads.SpinLock()
    Threads.@threads for i in 1:100
        df = f(rngs,0)
        Threads.lock(lock)
        append!(res,df)
        Threads.unlock(lock)
    end
    res
end
function test_normal(rngs::Vector{Random.MersenneTwister})
    res = DataFrame([Int,Int,Int],[:a,:b,:t],0)
    for i in 1:100
        append!(res,f(rngs, i%2))
    end
    res
end
Now let us do the testing:
julia> rngs = [Random.MersenneTwister(i) for i in 1:2];
julia> @btime test_normal($rngs);
  891.306 μs (5983 allocations: 476.67 KiB)
julia> rngs = [Random.MersenneTwister(i) for i in 1:Threads.nthreads()];
julia> @btime test_threads($rngs);
  674.559 μs (5549 allocations: 425.69 KiB)

Related

How to compute the average of a string of floats

temp = "75.1,77.7,83.2,82.5,81.0,79.5,85.7"
I am stuck in this assignment and unable to find a relevant answer to help.
I’ve used .split(",") and float()
and I am still stuck here.
temp = "75.1,77.7,83.2,82.5,81.0,79.5,85.7"
li = temp.split(",")
def avr(li):
    av = 0
    for i in li:
        av += float(i)
    return av/len(li)
print(avr(li))
You can use sum() to add the elements of a tuple of floats:
temp = "75.1,77.7,83.2,82.5,81.0,79.5,85.7"
def average(s_vals):
    vals = tuple(float(v) for v in s_vals.split(","))
    return sum(vals) / len(vals)
print(average(temp))
Admittedly similar to the answer by @emacsdrivesmenuts (GMTA).
However, this version uses the built-in map function, which should scale well for larger strings. It removes the explicit for loop and the per-value float() call, and hands those operations to the lower-level (highly optimised) C implementation.
For example:
def mean(s):
    vals = tuple(map(float, s.split(',')))
    return sum(vals) / len(vals)
Example use:
temp = '75.1,77.7,83.2,82.5,81.0,79.5,85.7'
mean(temp)
>>> 80.67142857142858
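On Python 3.8 or later, the standard library can also do the averaging itself. A minimal sketch, assuming the helper name mean_of_csv (it does not appear in the answers above) and using statistics.fmean, which returns the arithmetic mean of its inputs as a float:

```python
from statistics import fmean

def mean_of_csv(s):
    """Average the comma-separated numbers in the string s."""
    # Convert each piece to float lazily; fmean consumes the generator.
    return fmean(float(v) for v in s.split(","))

temp = "75.1,77.7,83.2,82.5,81.0,79.5,85.7"
result = mean_of_csv(temp)
```

This keeps the split-then-convert idea from the answers above and only swaps the manual sum/len division for a library call.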

Setting seeds in multi-threading loop in Julia

I want to generate random numbers in Julia using multi-threading. I am using the
Threads.@threads macro to accomplish it. However, I am struggling to fix the seeds so that I obtain the same result every time I run the code. Here is my attempt:
Random.seed!(1234)
a = [Float64[] for _ in 1:10]
Threads.@threads for i = 1:10
    push!(a[Threads.threadid()],rand())
end
sum(reduce(vcat, a))
The script above delivers different results every time I run it. By contrast, I get the same results if I use a plain for loop:
Random.seed!(12445)
b = []
for i = 1:10
    push!(b,rand())
end
sum(b)
I have the impression that the solution to this issue must be easy. Still, I couldn't find it. Any help is much appreciated.
Thank you.
You need to generate a separate random stream for each thread.
The simplest way is to have a random number generator with a different seed:
using Random
rngs = [MersenneTwister(i) for i in 1:Threads.nthreads()];
Threads.@threads for i = 1:10
    val = rand(rngs[Threads.threadid()])
    # do something with val
end
If you do not want to risk correlation between streams started from different seeds, you can instead jump ahead within a single random number generator:
julia> rngs2 = Future.randjump.(Ref(MersenneTwister(0)), big(10)^20 .* (1:Threads.nthreads()))
4-element Vector{MersenneTwister}:
MersenneTwister(0, (200000000000000000000, 0))
MersenneTwister(0, (400000000000000000000, 0))
MersenneTwister(0, (600000000000000000000, 0))
MersenneTwister(0, (800000000000000000000, 0))
Ciao Fabrizio. In BetaML I solved this problem with:
"""
generateParallelRngs(rng::AbstractRNG, n::Integer;reSeed=false)
For multi-threaded models, return n independent random number generators (one per thread) to be used in threaded computations.
Note that each rng is a _copy_ of the original random rng. This means that code that _uses_ these RNGs will not change the original RNG state.
Use it with `rngs = generateParallelRngs(rng,Threads.nthreads())` to have a separate rng per thread.
By default the function doesn't re-seed the RNG, as you may want to have a loop index based re-seeding strategy rather than a threadid-based one (to guarantee the same result independently of the number of threads).
If you prefer, you can instead re-seed the RNG here (using the parameter `reSeed=true`), such that each thread has a different seed. Be aware however that the stream of numbers generated will depend on the number of threads at run time.
"""
function generateParallelRngs(rng::AbstractRNG, n::Integer;reSeed=false)
    if reSeed
        seeds = [rand(rng,100:18446744073709551615) for i in 1:n] # some RNGs have issues with too small seed
        rngs = [deepcopy(rng) for i in 1:n]
        return Random.seed!.(rngs,seeds)
    else
        return [deepcopy(rng) for i in 1:n]
    end
end
Combined with re-seeding based on the loop index, the function above delivers the same results regardless of the number of threads Julia runs with. It can be used, for example, like this:
using Test, Random, Statistics # Random provides MersenneTwister, Statistics provides mean
TESTRNG = MersenneTwister(123)
println("** Testing generateParallelRngs()...")
x = rand(copy(TESTRNG),100)
function innerFunction(bootstrappedx; rng=Random.GLOBAL_RNG)
    sum(bootstrappedx .* rand(rng) ./ 0.5)
end
function outerFunction(x;rng = Random.GLOBAL_RNG)
    masterSeed = rand(rng,100:9999999999999) # important: with some RNGs this must be done before generateParallelRngs to guarantee independence from the number of threads
    rngs = generateParallelRngs(rng,Threads.nthreads()) # make new copy instances
    results = Array{Float64,1}(undef,30)
    Threads.@threads for i in 1:30
        tsrng = rngs[Threads.threadid()] # Thread-safe random number generator: one RNG per thread
        Random.seed!(tsrng,masterSeed+i*10) # But the seeding depends on the loop index i, not the thread: we get the same results independently of the number of threads
        toSample = rand(tsrng, 1:100,100)
        bootstrappedx = x[toSample]
        innerResult = innerFunction(bootstrappedx, rng=tsrng)
        results[i] = innerResult
    end
    overallResult = mean(results)
    return overallResult
end
# Different sequences..
@test outerFunction(x) != outerFunction(x)
# Different values, but same sequence
mainRng = copy(TESTRNG)
a = outerFunction(x, rng=mainRng)
b = outerFunction(x, rng=mainRng)
mainRng = copy(TESTRNG)
A = outerFunction(x, rng=mainRng)
B = outerFunction(x, rng=mainRng)
@test a != b && a == A && b == B
# Same value at each call
a = outerFunction(x,rng=copy(TESTRNG))
b = outerFunction(x,rng=copy(TESTRNG))
@test a == b
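The central idea of this answer, seeding from the loop index rather than from the thread, does not depend on Julia. A minimal sketch in Python, kept to the standard library so it runs as is; the names draw and run are illustrative assumptions, and random.Random stands in for the Julia RNG:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def draw(master_seed, i):
    """Return one random number for task i."""
    # The seed depends only on the task index, never on the worker thread.
    rng = random.Random(master_seed + 10 * i)
    return rng.random()

def run(master_seed, n_tasks, n_workers):
    """Run all tasks on a thread pool and return results in task order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda i: draw(master_seed, i), range(n_tasks)))

one_worker = run(1234, 30, 1)
four_workers = run(1234, 30, 4)
```

Because each task builds its own generator from master_seed and i, the two result lists are identical no matter how many workers are used or how tasks are scheduled.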
Assuming you are on Julia 1.6 you can do e.g. the following:
julia> using Random
julia> foreach(i -> Random.seed!(Random.default_rng(i), i), 1:Threads.nthreads())
The point is that currently Julia already has a separate random number generator per thread so you do not need to generate your own (of course you could do it as in the other answers, but you do not have to).
Also note that in future versions of Julia the
Threads.@threads for i = 1:10
    push!(a[Threads.threadid()],rand())
end
part is not guaranteed to produce reproducible results. In Julia 1.6 Threads.@threads uses static scheduling, but as you can read in its docstring, this is subject to change.

How do I speed up this nested for loop in Python?

The function shown below runs quite slowly even though I used swifter to call it. Does anyone know how to speed it up? My Python knowledge is limited at this point and I would appreciate any help I can get. I tried using the map() function but somehow it didn't work for me. I guess the nested for loop is what makes it slow, right?
BR,
Hannes
def polyData(uniqueIds):
    for index in range(len(uniqueIds) - 1):
        element = uniqueIds[index]
        polyData1 = df[df['id'] == element]
        poly1 = build_poly(polyData1)
        poly1 = poly1.buffer(0)
        for secondIndex in range(index + 1, len(uniqueIds)):
            otherElement = uniqueIds[secondIndex]
            polyData2 = df[df['id'] == otherElement]
            poly2 = build_poly(polyData2)
            poly2 = poly2.buffer(0)
            # Calculate overlap percentage wise
            overlap_pct = poly1.intersection(poly2).area/poly1.area
            # Form new DF
            df_ol = pd.DataFrame({'id_1':[element],'id_2':[otherElement],'overlap_pct':[overlap_pct]})
            # Write to SQL database
            df_ol.to_sql(name='df_overlap', con=e,if_exists='append',index=False)
This function is inherently slow for large amounts of data due to its complexity (trying every 2-combination of a set). However, you're calculating the 'poly' for the same ids multiple times, even though it seems that you can calculate them only once beforehand (which might be expensive) and store them for later usage. So try to extract the building of the polys.
def getPolyForUniqueId(uid):
    polyData = df[df['id'] == uid]
    poly = build_poly(polyData)
    poly = poly.buffer(0)
    return poly  # return the built polygon, not the raw rows
def polyData(uniqueIds):
    # Build every polygon exactly once, up front
    polyDataList = [getPolyForUniqueId(uid) for uid in uniqueIds]
    for index in range(len(uniqueIds) - 1):
        id_1 = uniqueIds[index]
        poly_1 = polyDataList[index]
        for secondIndex in range(index + 1, len(uniqueIds)):
            id_2 = uniqueIds[secondIndex]
            poly_2 = polyDataList[secondIndex]
            ...
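The same build-once idea can be written more compactly with itertools.combinations. This is a sketch under stated assumptions, not the asker's code: pairwise_overlaps, build_shape and overlap are hypothetical names, and 1-D intervals stand in for the polygons so the example runs without pandas or a geometry library.

```python
from itertools import combinations

def pairwise_overlaps(unique_ids, build_shape, overlap):
    """Build each shape once, then score every unordered pair of ids."""
    shapes = {uid: build_shape(uid) for uid in unique_ids}  # one build per id
    rows = []
    for id_1, id_2 in combinations(unique_ids, 2):
        rows.append((id_1, id_2, overlap(shapes[id_1], shapes[id_2])))
    return rows

# Stand-in data: intervals (start, end) instead of polygons.
intervals = {"a": (0, 4), "b": (2, 6), "c": (5, 9)}

def interval_overlap(p, q):
    """Fraction of p's length that is covered by q."""
    shared = max(0, min(p[1], q[1]) - max(p[0], q[0]))
    return shared / (p[1] - p[0])

rows = pairwise_overlaps(list(intervals), intervals.__getitem__, interval_overlap)
```

Collecting the rows in a list also makes it possible to write them to the database in one call at the end, instead of one to_sql call per pair.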

How to accelerate the application of the following for loop and function?

I have the following for loop:
for j in range(len(list_list_int)):
    arr_1_, arr_2_, arr_3_ = foo(bar, list_of_ints[j])
    arr_1[j,:] = arr_1_.data.numpy()
    arr_2[j,:] = arr_2_.data.numpy()
    arr_3[j,:] = arr_3_.data.numpy()
I would like to apply foo with multiprocessing, mainly because it is taking a lot of time to finish. I tried to do it in batches with funcy's chunks method:
for j in chunks(1000, list_list_int):
    arr_1_, arr_2_, arr_3_ = foo(bar, list_of_ints[j])
    arr_1[j,:] = arr_1_.data.numpy()
    arr_2[j,:] = arr_2_.data.numpy()
    arr_3[j,:] = arr_3_.data.numpy()
However, I am getting list object cannot be interpreted as an integer. What is the correct way of applying foo using multiprocessing?
list_list_int = [1,2,3,4,5,6]
for j in chunks(2, list_list_int):
    for i in j:
        avg_, max_, last_ = foo(bar, i)
I don't have chunks installed, but from the docs I suspect that for size 2 chunks it turns the alist below into the two j values that follow:
alist = [[1,2],[3,4],[5,6],[7,8]]
j = [[1,2],[3,4]]
j = [[5,6],[7,8]]
which would produce an error:
In [116]: alist[j]
TypeError: list indices must be integers or slices, not list
And if your foo can't work with the full list of lists, I don't see how it will work with that list split into chunks. Apparently it can only work with one sublist at a time.
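To make the failure concrete, here is a small sketch with a hand-written stand-in for chunks. The helper chunked and the toy foo are assumptions for illustration, not funcy's implementation. Each chunk is itself a list, so it has to be iterated rather than used as an index:

```python
def chunked(size, seq):
    """Yield consecutive slices of seq with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def foo(value):
    # Toy stand-in for the real computation.
    return value * 10

data = [1, 2, 3, 4, 5, 6]
batches = list(chunked(2, data))

# Indexing with a whole batch reproduces the kind of error in the question.
try:
    data[batches[0]]
    index_error = None
except TypeError as exc:
    index_error = exc

results = []
for batch in batches:      # batch is a list such as [1, 2]
    for item in batch:     # iterate inside the batch; do not index with it
        results.append(foo(item))
```

The inner loop is the part the original attempt was missing: the chunk only groups the work, and foo still receives one element at a time.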
If you are looking to perform parallel operations on a numpy array, then I would use Dask.
With just a few lines of code, your operation can be run on multiple processes, and the Dask scheduler will balance the load for you. A big benefit of Dask compared to other parallel libraries like joblib is that it keeps the native numpy API.
import dask.array as da
# Setting up a random array with dimensions 10K rows and 10 columns
# This data is stored distributed across 10 chunks, and the columns are kept together (1_000, 10)
x = da.random.random((10_000, 10), chunks=(1_000, 10))
x = x.persist() # Allow the entire array to persist in memory to speed up calculation
def foo(x):
    return x / 10
# apply_along_axis mirrors the numpy function; with axis 0, foo receives each column of the matrix (use axis 1 to apply foo to each row)
result_foo = da.apply_along_axis(foo, 0, x)
# View original contents
x[0:10].compute()
# View sample of results
result_foo = result_foo.compute()
result_foo[0:10]

Julia: Unique sets of n elements with replacement

Given a vector v = [1,..,n], I am trying to compute all unique sets of n elements with replacement in Julia.
Since I want to do this for larger values of n, I'm looking for an efficient solution, possibly using iterators.
For example, let's consider v = [1, 2, 3]: This should result in [1,1,1], [1,1,2], [1,1,3], [1,2,2], [1,2,3], [1,3,3], [2,2,2], [2,2,3], [2,3,3], [3,3,3]. By unique, I mean that if [1,1,2] is a solution, none of its permutations [1,2,1], [2,1,1] is.
My current solution is based on the partitions function, but does not allow me to restrict the computation on the elements [1,..,n]
for i in n:n^2
    for j in partitions(i, n)
        ## ignore sets which exceed the range [1,n]
        if maximum(j) <= n
            ## accept as solution
        end
    end
end
In Julia v0.5.0, Combinatorics.jl has a with_replacement_combinations method.
julia> collect(with_replacement_combinations(1:4,3))
20-element Array{Array{Int64,1},1}:
[1,1,1]
[1,1,2]
[1,1,3]
[1,1,4]
[1,2,2]
[1,2,3]
[1,2,4]
[1,3,3]
[1,3,4]
[1,4,4]
[2,2,2]
[2,2,3]
[2,2,4]
[2,3,3]
[2,3,4]
[2,4,4]
[3,3,3]
[3,3,4]
[3,4,4]
[4,4,4]
I guess, it doesn't get shorter than one-line (using Iterators).
using IterTools
import Combinatorics.combinations
n=3
collect(imap(c -> Int[c[k]-k+1 for k=1:length(c)],combinations(1:(2n-1),n)))
I believe you're looking for the product function from the Iterators package. In your case product(v,v,v) should do what's required.
Here is a function to calculate the required collection:
function calcset(n=3)
    res = []
    for c in combinations([1:(2n-1)],n-1)
        c3 = [c,2n].-[0,c]
        push!(res,vcat([fill(i,c3[n-i+1]-1) for i=1:n]...))
    end
    return res
end
calcset(3)
There is probably some better way to code this, but this should be enough.
Notice the result is generated through repeated push!s, so this is easily turned into an iterator, if necessary.
And in iterator form:
import Base: start, next, done, eltype, length
type ImageTypeIterator
inneritr::Base.Combinations{Array{Int64,1}}
n::Int
end
imagetype(n::Int) = ImageTypeIterator(combinations([1:(2n-1)],n-1),n)
eltype(itr::ImageTypeIterator) = Array{Int64,1}
start(itr::ImageTypeIterator) = start(itr.inneritr)
function next(itr::ImageTypeIterator,s)
(c,s) = next(itr.inneritr,s)
c3 = [c,2*itr.n].-[0,c]
(vcat([fill(i,c3[itr.n-i+1]-1) for i=1:itr.n]...),s)
end
done(itr::ImageTypeIterator,s) = done(itr.inneritr,s)
length(itr::ImageTypeIterator) = length(itr.inneritr)
# test with [1,2,3]
for t in imagetype(3) println(t) ; end
The test at the end should print the collection set in the question.
BTW the name ImageTypeIterator is an attempt to characterize the collection as the distinct types of sizes of preimages when looking at a function f : [1:n] -> [1:n]. But a different interpretation might be appropriate. Other name suggestions are welcome in the comments.
A faster?/clearer? implementation could use:
imagetype(n::Int) = ImageTypeIterator(combinations([1:(2n-1)],n),n)
function next(itr::ImageTypeIterator,s)
    (c,s) = next(itr.inneritr,s)
    v = Array(Int,itr.n)
    j = 1 ; p = 1
    for k=1:itr.n
        while !(j in c) j += 1 ; p += 1 ; end
        v[k] = p
        j += 1
    end
    (v,s)
end
It's the same logic as above, but without too much slicing. The logic takes an n-element subset of 1:(2n-1) and views non-gaps as repeated values and gaps as a trigger to advance to the next value.
OK, a simpler version using Iterators.jl:
using Iterators
function ff(c)
    v = Array(Int,length(c))
    j = 1 ; p = 1
    for k=1:length(c)
        while !(j in c) j += 1 ; p += 1 ; end
        v[k] = p
        j += 1
    end
    v
end
# test
n = 3
for t in imap(ff,combinations([1:(2n-1)],n)) println(t) ; end
This is perhaps the simplest version, although equivalent in methods to the other answers.
And in the spirit of brevity:
using Iterators
ff(c) = begin
j=1;p=1; [(while !(j in c) j+=1;p+=1 ; end ; j+=1 ; p) for k=1:length(c)]
end
n = 3 # test
for t in imap(ff,combinations([1:(2n-1)],n)) println(t) ; end
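The mapping behind these answers (take each n-element subset c of 1:(2n-1) and emit c[k]-k+1) can be checked independently of the old Julia syntax. A minimal sketch in Python, chosen so it runs with the standard library alone; the function name multisets_via_subsets is an assumption, and itertools.combinations_with_replacement serves as the reference result:

```python
from itertools import combinations, combinations_with_replacement

def multisets_via_subsets(n):
    """Map each n-subset c of 1..2n-1 to the multiset c[k] - k (k from 0)."""
    # With 0-based k, c[k] - k equals the 1-based formula c[k] - k + 1.
    return [tuple(c[k] - k for k in range(n))
            for c in combinations(range(1, 2 * n), n)]

result = multisets_via_subsets(3)
reference = list(combinations_with_replacement(range(1, 4), 3))
```

Subtracting the position turns a strictly increasing subset into a non-decreasing sequence with values in 1..n, which is why every multiset appears exactly once.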

Resources