Setting seeds in a multi-threaded loop in Julia

I want to generate random numbers in Julia using multi-threading. I am using the
Threads.@threads macro to accomplish it. However, I struggle to fix the seed so that I obtain the same result every time I run the code. Here is my attempt:
Random.seed!(1234)
a = [Float64[] for _ in 1:10]
Threads.@threads for i = 1:10
    push!(a[Threads.threadid()], rand())
end
sum(reduce(vcat, a))
The script above delivers different results every time I run it. By contrast, I get the same results if I use a plain for loop:
Random.seed!(12445)
b = []
for i = 1:10
    push!(b, rand())
end
sum(b)
I have the impression that the solution to this issue must be easy. Still, I couldn't find it. Any help is much appreciated.
Thank you.

You need to generate a separate random stream for each thread.
The simplest way is to give each thread its own random number generator, each with a different seed:
using Random
rngs = [MersenneTwister(i) for i in 1:Threads.nthreads()];
Threads.@threads for i = 1:10
    val = rand(rngs[Threads.threadid()])
    # do something with val
end
If you do not want to risk correlation between streams started from different seeds, you can instead take large, non-overlapping jumps within a single generator (this uses the Future standard library, i.e. `using Future`):
julia> rngs2 = Future.randjump.(Ref(MersenneTwister(0)), big(10)^20 .* (1:Threads.nthreads()))
4-element Vector{MersenneTwister}:
MersenneTwister(0, (200000000000000000000, 0))
MersenneTwister(0, (400000000000000000000, 0))
MersenneTwister(0, (600000000000000000000, 0))
MersenneTwister(0, (800000000000000000000, 0))

Ciao Fabrizio. In BetaML I solved this problem with:
"""
generateParallelRngs(rng::AbstractRNG, n::Integer;reSeed=false)
For multi-threaded models, return n independent random number generators (one per thread) to be used in threaded computations.
Note that each returned RNG is a _copy_ of the original RNG. This means that code that _uses_ these RNGs will not change the original RNG's state.
Use it with `rngs = generateParallelRngs(rng,Threads.nthreads())` to have a separate rng per thread.
By default the function doesn't re-seed the RNG, as you may want to have a loop index based re-seeding strategy rather than a threadid-based one (to guarantee the same result independently of the number of threads).
If you prefer, you can instead re-seed the RNG here (using the parameter `reSeed=true`), such that each thread has a different seed. Be aware, however, that the stream of numbers generated will then depend on the number of threads at run time.
"""
function generateParallelRngs(rng::AbstractRNG, n::Integer; reSeed=false)
    if reSeed
        seeds = [rand(rng, 100:18446744073709551615) for i in 1:n] # some RNGs have issues with too small seed
        rngs  = [deepcopy(rng) for i in 1:n]
        return Random.seed!.(rngs, seeds)
    else
        return [deepcopy(rng) for i in 1:n]
    end
end
The function above delivers the same results independently of the number of threads used in Julia, and it can then be used, for example, like this:
using Random, Statistics, Test
TESTRNG = MersenneTwister(123)
println("** Testing generateParallelRngs()...")
x = rand(copy(TESTRNG),100)
function innerFunction(bootstrappedx; rng=Random.GLOBAL_RNG)
    sum(bootstrappedx .* rand(rng) ./ 0.5)
end
function outerFunction(x; rng=Random.GLOBAL_RNG)
    masterSeed = rand(rng, 100:9999999999999) # important: with some RNGs this must happen before generateParallelRngs to guarantee independence from the number of threads
    rngs = generateParallelRngs(rng, Threads.nthreads()) # make new copy instances
    results = Array{Float64,1}(undef, 30)
    Threads.@threads for i in 1:30
        tsrng = rngs[Threads.threadid()] # thread-safe random number generator: one RNG per thread
        Random.seed!(tsrng, masterSeed + i*10) # but the seeding depends on the i of the loop, not the thread: we get the same results independently of the number of threads
        toSample = rand(tsrng, 1:100, 100)
        bootstrappedx = x[toSample]
        innerResult = innerFunction(bootstrappedx, rng=tsrng)
        results[i] = innerResult
    end
    overallResult = mean(results)
    return overallResult
end
# Different sequences..
@test outerFunction(x) != outerFunction(x)
# Different values, but same sequence
mainRng = copy(TESTRNG)
a = outerFunction(x, rng=mainRng)
b = outerFunction(x, rng=mainRng)
mainRng = copy(TESTRNG)
A = outerFunction(x, rng=mainRng)
B = outerFunction(x, rng=mainRng)
@test a != b && a == A && b == B
# Same value at each call
a = outerFunction(x,rng=copy(TESTRNG))
b = outerFunction(x,rng=copy(TESTRNG))
@test a == b

Assuming you are on Julia 1.6 you can do e.g. the following:
julia> using Random
julia> foreach(i -> Random.seed!(Random.default_rng(i), i), 1:Threads.nthreads())
The point is that currently Julia already has a separate random number generator per thread so you do not need to generate your own (of course you could do it as in the other answers, but you do not have to).
Also note that in future versions of Julia the:
Threads.@threads for i = 1:10
    push!(a[Threads.threadid()],rand())
end
part is not guaranteed to produce reproducible results. In Julia 1.6 Threads.@threads uses static scheduling, but as you can read in its docstring it is subject to change.

Related

Speed up for loop iterating over two large (10000s) lists

I'm trying to check one list of IP addresses against another list of Networks that it could belong to. The lengths:
len(IP_addresses_list) = 31995
len(Network_list) = 54099
big_dict = {}
for ip in IP_addresses_list:
    address_and_networks = self.is_subnet_of(ip, Network_list)
    big_dict.update(address_and_networks)
df = pd.DataFrame.from_dict(big_dict, orient="index")
And the loop is verifying whether it belongs to the network by:
def is_subnet_of(self, ip, Network_list):
    address_and_networks = {}
    address = ipaddress.ip_address(ip)
    for net in Network_list:
        network = ipaddress.ip_network(net)
        res = network.supernet_of(ipaddress.ip_network(f"{address}/{address.max_prefixlen}"))
        if res:
            if ip in address_and_networks:
                address_and_networks[ip].append(net)
            else:
                address_and_networks[ip] = [net]
    return address_and_networks
The address_and_networks dict may look like:
{
"xxx.xxx.xxx.xxx": ["xxx.xxx.xxx.xxx/24", "xxx.xxx.xxx.xxx/23"],
"yyy.yyy.yyy.yyy": ["yyy.yyy.yyy.yyy/24", "yyy.yyy.yyy.yyy/23"]
}
This method is currently painfully slow; so slow that it's just not feasible to use. I'd like to accelerate it somehow, perhaps by dumping the original lists (IP_addresses_list, Network_list) into a DataFrame and then performing some sweeping operation on the DataFrame by applying the is_subnet_of method (maybe something like dataframe.select or dataframe.apply). Any idea how I can speed this up?
EDIT
I streamlined the code further, but I'm still resorting to looping over the dataframes:
df = pd.DataFrame({"IP_Address": ip_s.map(ipaddress.ip_address),
"Network": net_s.map(ipaddress.ip_network),
"Associated": np.nan})
for i, address in df["IP_Address"].iteritems():
    if address != address:  # NaN check: skip missing addresses
        continue
    net_list = []
    for j, network in df["Network"].iteritems():
        if network.supernet_of(ipaddress.ip_network(f"{address}/{address.max_prefixlen}")):
            net_list.append(str(network))
    df.loc[i, "Associated"] = net_list
Example data:
Addresses = ['172.16.56.40','172.16.16.16']
Networks = ['172.16.56.0/24', '172.16.56.32/27']
Bit operations will make this much faster; they also allow some expensive operations to be performed only once.
Consider the function below:
def is_subnet_of(ip: str, network: str) -> bool:
    def bin_ip(ip: str) -> int:
        return int("".join(map(lambda n: bin(int(n)).replace("0b", "").zfill(8), ip.split("."))), 2)
    net, mask_len = network.split("/")
    mask_len = int(mask_len)
    # Convert ip and net to binary format
    ip_bin = bin_ip(ip)
    net_bin = bin_ip(net)
    # Build mask
    mask = int(mask_len * '1' + (32 - mask_len) * '0', 2)
    # check (bit operations are fast)
    return net_bin ^ (mask & ip_bin) == 0
# `is_subnet_of("172.16.56.40", "172.16.56.0/24")` will return True.
It takes three steps to decide whether the given ip is in the network:
1. Convert ip and net to binary format (we need another function, bin_ip, for this)
2. Build the net mask
3. Do the check: (net ^ (mask & ip)) == 0
Even better, steps 1 and 2 only have to be performed once, which saves most of the time:
from typing import Tuple

def check_subnet(ip, network_list):
    def bin_ip(ip: str) -> int:
        return int(
            "".join(map(lambda n: bin(int(n)).replace("0b", "").zfill(8), ip.split("."))), 2
        )
    def bin_net(net: str) -> Tuple[int, int]:
        net_ip, mask_len = net.split("/")
        mask_len = int(mask_len)
        net_bin = bin_ip(net_ip)
        mask = int(mask_len * "1" + (32 - mask_len) * "0", 2)
        return net_bin, mask
    def is_subnet_of(ip: int, network: Tuple[int, int]) -> bool:
        return network[0] ^ (network[1] & ip) == 0
    ip = bin_ip(ip)
    networks = tuple(map(bin_net, network_list))
    return tuple(is_subnet_of(ip, network) for network in networks)
# Below is the test section
address_and_networks = {"172.16.56.40": ["172.16.56.0/24", "172.16.56.32/27"]}
result = tuple(check_subnet(ip, network_list) for ip, network_list in address_and_networks.items())
print(result)
If you insist on using pandas vectorization operations, then I suggest:
Use the bin_ip function I mentioned above to construct arrays of ips, networks, and masks (this step is O(n))
Use numpy matrix operations (as we all know, pandas actually uses numpy to perform numerical calculations) to calculate the result: result = ((ip & mask) == net.T)
In this way, we do the O(n) part ourselves and leave the O(n^2) part to pandas/numpy vectorization operations.
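For concreteness, here is a minimal sketch of that numpy idea (my own illustration, not the asker's code), using the example data from the question and ipaddress for the integer conversion instead of the bin_ip helper:
import ipaddress
import numpy as np

Addresses = ['172.16.56.40', '172.16.16.16']
Networks = ['172.16.56.0/24', '172.16.56.32/27']

# Convert every address and network to an integer once (the O(n) part)
ips = np.array([int(ipaddress.ip_address(a)) for a in Addresses])
nets = np.array([int(ipaddress.ip_network(n).network_address) for n in Networks])
masks = np.array([int(ipaddress.ip_network(n).netmask) for n in Networks])

# Broadcasting builds the whole (n_ip, n_net) membership matrix in one vectorized pass:
# membership[i, j] is True when Addresses[i] falls inside Networks[j]
membership = (ips[:, None] & masks[None, :]) == nets[None, :]
print(membership)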
You are running into O(n^2) complexity.
As you mentioned, len(IP_addresses_list) = 31995 and len(Network_list) = 54099. Currently you are looping over all 31995 * 54099 combinations, which causes the slowness.
Just to give a rough idea, a loop of the same length doing nothing (just a pass statement) took almost 4 minutes on an online Python compiler website. It still took a little over 1 minute on my machine.
You need to reduce the iterations happening inside is_subnet_of function.
One approach here is to break out of the loop whenever possible. Do you see any condition under which your result is complete and you no longer need to continue the loop over Network_list (inside the is_subnet_of function)?
Another approach is to reduce your search list (Network_list). Convert it into a dictionary keyed by the first three octets of the network address, and in is_subnet_of loop only over the shorter list that matches the first three octets of the IP address.
For example:
Addresses = ['172.16.56.40','172.16.16.16']
Networks = ['172.16.56.0/24', '172.16.56.32/27']
Convert your Networks list into a dictionary:
Networks = ['172.16.56.0/24', '172.16.56.32/27']
network_dict = {}
for n in Networks:
    network_dict.setdefault(n[:n.rfind(".")], []).append(n)
print(network_dict)
Output:
{'172.16.56': ['172.16.56.0/24', '172.16.56.32/27']}
So your is_subnet_of function will only loop over the matching bucket for 172.16.56.40, and there will be nothing to loop over for 172.16.16.16.
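As a rough sketch of how the lookup could then look (my illustration, written as a standalone function rather than a method, and assuming every network is /24 or longer so the bucketing by the first three octets is valid):
import ipaddress

def is_subnet_of(ip, network_dict):
    address_and_networks = {}
    address = ipaddress.ip_address(ip)
    # Only consider networks sharing the first three octets with this address
    candidates = network_dict.get(ip[:ip.rfind(".")], [])
    for net in candidates:
        network = ipaddress.ip_network(net)
        if network.supernet_of(ipaddress.ip_network(f"{address}/{address.max_prefixlen}")):
            address_and_networks.setdefault(ip, []).append(net)
    return address_and_networks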
Without the actual data, it is a bit hard to understand what exactly is happening, but some general advice could be:
DataFrame.merge with how='inner' allows you to relatively fast check which values appear in two separate lists. So if you have one column containing the first three parts of IP addresses and another column containing network IPs (again only XXX.XXX.XXX and not the last part), this could already do the job. You might need to do some checking afterwards, but the number of operations would be much less. merge is vectorized making it a suitable candidate.
dask.apply could be used on a Dask DataFrame. This could be combined with merge similar to the above-mentioned idea.
Once you provide example data, the suggestions above could be refined.
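For instance, a hedged sketch of the merge idea using the example data from the question (the column names are invented here, and it assumes /24-or-longer networks so that an address and any network containing it share their first three octets):
import pandas as pd

Addresses = ['172.16.56.40', '172.16.16.16']
Networks = ['172.16.56.0/24', '172.16.56.32/27']

ips = pd.DataFrame({"ip": Addresses})
nets = pd.DataFrame({"network": Networks})
# Key both frames on the first three octets
ips["prefix"] = ips["ip"].str.rsplit(".", n=1).str[0]
nets["prefix"] = nets["network"].str.split("/").str[0].str.rsplit(".", n=1).str[0]

# The inner merge keeps only (ip, network) pairs that share a prefix;
# the exact supernet_of check then only runs on this much smaller candidate set
candidates = ips.merge(nets, on="prefix", how="inner")
print(candidates)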

Speeding up CVXPY processing

I have written some code that uses the cvxpy library to solve an integer programming problem; however, the code is taking so long to run that I was wondering if there is any way to make it faster.
The integer programming problem in this case takes in a matrix of shape (1,569 x 3,071), and it has 3,071 constraints to satisfy. The code is as follows:
import sys
import json
import time
import numpy as np
import cvxpy as cp

mat_f = sys.argv[1]
matIdx2genome_dic_f = sys.argv[2]
genomes_f = sys.argv[3]
with open(matIdx2genome_dic_f, 'r') as in_f:
    matIdx2genome_dic = json.load(in_f)
M = np.load(mat_f)
selection = cp.Variable(M.shape[1], boolean=True)
ones_vec = np.ones(M.shape[1])
constraints = []
for i in range(len(M)):
    constraints.append(M[i] * selection >= 1)
total_genomes = ones_vec * selection
problem = cp.Problem(cp.Minimize(total_genomes), constraints)
print('solving the integer programming problem: ')
start = time.time()
problem.solve(parallel=True)
print('problem solved in: ' + str(time.time() - start))
solution = selection.value
solution = list(map(round, solution))
solution = np.array(solution)
which_genomes = np.where(solution == 1.0)[0]
with open(genomes_f, 'w') as out_f:
    for idx in which_genomes:
        out_f.write(matIdx2genome_dic[idx] + '\n')
The first command line argument is what's important here, it's a numpy binary matrix that is of shape (1569, 3071).
The problem here is to select the minimum number of columns of the matrix such that every row has at least one 1 among the selected columns.
My question is: how can I write this script so that it runs faster? Is there a way to parallelize it? I have set the parallel parameter to True in the solve method, but I don't think it's doing much, since I'm monitoring the CPU utilization and it's only at 100%.
Or is there another way (solver that I should call maybe) that would solve this faster?
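Not part of the original script, but as a sketch of one common speedup: the same set-cover model can be expressed with a single vectorized constraint instead of a Python loop that appends one constraint per row, which reduces cvxpy's model-building overhead. The random matrix and the solver choice below are placeholders, not the asker's data or setup.
import numpy as np
import cvxpy as cp

# Small random stand-in for the real (1569, 3071) binary matrix loaded from disk
rng = np.random.default_rng(0)
M = (rng.random((50, 120)) < 0.1).astype(int)

selection = cp.Variable(M.shape[1], boolean=True)
constraints = [M @ selection >= 1]  # one vectorized covering constraint: every row gets at least one selected 1
problem = cp.Problem(cp.Minimize(cp.sum(selection)), constraints)
# ECOS_BB is only chosen so the sketch runs with a default install;
# a dedicated MIP solver (GLPK_MI, CBC, Gurobi, ...) is usually much faster
problem.solve(solver=cp.ECOS_BB)
print(int(problem.value))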

How to accelerate the application of the following for loop and function?

I have the following for loop:
for j in range(len(list_list_int)):
    arr_1_, arr_2_, arr_3_ = foo(bar, list_of_ints[j])
    arr_1[j,:] = arr_1_.data.numpy()
    arr_2[j,:] = arr_2_.data.numpy()
    arr_3[j,:] = arr_3_.data.numpy()
I would like to apply foo with multiprocessing, mainly because it is taking a lot of time to finish. I tried to do it in batches with funcy's chunks method:
for j in chunks(1000, list_list_int):
    arr_1_, arr_2_, arr_3_ = foo(bar, list_of_ints[j])
    arr_1[j,:] = arr_1_.data.numpy()
    arr_2[j,:] = arr_2_.data.numpy()
    arr_3[j,:] = arr_3_.data.numpy()
However, I am getting list object cannot be interpreted as an integer. What is the correct way of applying foo using multiprocessing?
list_list_int = [1,2,3,4,5,6]
for j in chunks(2, list_list_int):
    for i in j:
        avg_, max_, last_ = foo(bar, i)
I don't have chunks installed, but from the docs I suspect that, for size-2 chunks from
alist = [[1,2],[3,4],[5,6],[7,8]]
it produces
j = [[1,2],[3,4]]
j = [[5,6],[7,8]]
which would produce an error:
In [116]: alist[j]
TypeError: list indices must be integers or slices, not list
And if your foo can't work with the full list of lists, I don't see how it will work with that list split into chunks. Apparently it can only work with one sublist at a time.
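To address the multiprocessing part of the question directly, here is a minimal sketch of my own (not the asker's code), with a stand-in foo, assuming the real foo and bar are picklable and each call is independent:
from functools import partial
from multiprocessing import Pool

import numpy as np

def foo(bar, i):
    # Stand-in for the real foo from the question, which returns three array-like results
    v = np.full(3, float(i)) * bar
    return v, v * 2, v * 3

if __name__ == "__main__":
    bar = 2.0
    list_of_ints = list(range(6))
    with Pool() as pool:
        # One worker call per element; results come back in the original order
        results = pool.map(partial(foo, bar), list_of_ints)
    arr_1 = np.stack([r[0] for r in results])
    arr_2 = np.stack([r[1] for r in results])
    arr_3 = np.stack([r[2] for r in results])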
If you are looking to perform parallel operations on a numpy array, then I would use Dask.
With just a few lines of code, your operation can easily be run on multiple processes, and the highly developed Dask scheduler will balance the load for you. A huge benefit of Dask compared to other parallel libraries like joblib is that it maintains the native numpy API.
import dask.array as da
# Setting up a random array with dimensions 10K rows and 10 columns
# This data is stored distributed across 10 chunks, and the columns are kept together (1_000, 10)
x = da.random.random((10_000, 10), chunks=(1_000, 10))
x = x.persist() # Allow the entire array to persist in memory to speed up calculation
def foo(x):
    return x / 10
# Using the Dask analogue of numpy's apply_along_axis to apply foo to each 1-D slice of the array in parallel
result_foo = da.apply_along_axis(foo, 0, x)
# View original contents
x[0:10].compute()
# View sample of results
result_foo = result_foo.compute()
result_foo[0:10]

Changing the value of a for-loop iterator at a certain condition in Python

Hello friends. While learning Python, a question came to my mind: is there any way by which we can directly jump to a particular value of the iterator without iterating through the values in between? For example:
a = range(1, 10)  # or (1,2,3,4,5,6,7,8,9)
for i in a:
    print("value of i:", i)
    if (certain condition):
        # this condition will make the iterator jump directly to a certain value of
        # the loop; say, if currently i=2, after this it will jump directly to the
        # iteration value i=8, bypassing the iterations from 3 to 7 and
        # saving CPU cycles
There is a solution; however, it involves complicating your code somewhat.
It does not require an if function, but it does require both a while loop and a try/except block.
If you wish to change the numbers skipped, you simply change the for _ in range() statement.
This is the code:
a = [1,2,3,4,5,6,7,8,9,10]
at = iter(a)
while True:
    try:
        a_next = next(at)
        print(a_next)
        if a_next == 3:
            for _ in range(4, 8):
                a_next = next(at)
            a_next = str(a_next)
            print(a_next)
    except StopIteration:
        break
The iterator interface is based on the next method. Multiple next calls are necessary to advance the iteration by more than one element. There is no shortcut.
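To illustrate (a sketch of mine, not from the answer above): itertools.islice only wraps those repeated next calls, so a "jump" still consumes the skipped elements one by one:
from itertools import islice

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
it = iter(a)
for i in it:
    print(i)
    if i == 2:
        # Silently consume the next five elements (3..7); the loop then resumes at 8
        next(islice(it, 5, 5), None)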
If you iterate over sequences only, you may abandon the iterator and write old-fashioned C-like code that allows you to move the index:
a = [1,2,3,4,5,6,7,8,9,10]
a_len = len(a)
i = 0
while i < a_len:
    print(a[i])
    if i == 2:
        i = 8
        continue
    i += 1

Combination of one of every string group in all possible combinations and orders in matlab

So I forgot a string, and I know there are three substrings in it, and I know a few possibilities for each substring. So all I need to do is go through all possible combinations and orders until I find the one I forgot. But since humans can only hold four items in their working memory (definitely an upper limit for me), I can't keep tabs on which ones I have examined.
So, say I have n sets of m strings: how do I get all strings of n substrings, consisting of one string from each set, in any order?
I saw an example of how to do it with nested loops, but then I have to specify the order. The example is for n = 3 with different m's. I'm not sure how to make this more general:
first = {'Hoi','Hi','Hallo'};
second = {'Jij','You','Du'};
third = {'Daar','There','Da','LengthIsDifferent'};
for iF = 1:length(first)
    for iS = 1:length(second)
        for iT = 1:length(third)
            [first{iF}, second{iS}, third{iT}]
        end
    end
end
About this question: it does not solve this problem because it presumes that the order of the sets to choose from is known.
This generates the Cartesian product of the indices using ndgrid, then uses some cellfun magic to get all the strings. Afterwards it just cycles through all the permutations and appends those.
first = {'Hoi','Hi','Hallo'};
second = {'Jij','You','Du'};
third = {'Daar','There','Da','LengthIsDifferent'};
Vs = {first, second, third};
%% Create cartesian product
Indices = cellfun(@(X) 1:numel(X), Vs, 'uni', 0);
[cartesianProductInd{1:numel(Vs)}] = ndgrid(Indices{:});
AllStringCombinations = cellfun(@(A,I) A(I(:)), Vs, cartesianProductInd,'uni',0);
AllStringCombinations = cat(1, AllStringCombinations{:}).';%.'
%% Permute what we got
AllStringCombinationsPermuted = [];
permutations = perms(1:numel(Vs));
for i = 1:size(permutations,1)
    AllStringCombinationsPermuted = [AllStringCombinationsPermuted; ...
        AllStringCombinations(:,permutations(i,:));];
end
