How to run Locust task n times - performance-testing

I want to run a nested Locust task for n iterations inside every sequential task set.
The only way I've found for Locust 1.x is to repeat the task in the tasks list:
class TaskA(SequentialTaskSet):
    ...

class TaskB(SequentialTaskSet):
    n = 5
    tasks = [TaskA for _ in range(n)]
What is the better and more correct way to run TaskA n times?
Options 1 & 2 below don't work in 1.3.1:
1:
class TaskB(SequentialTaskSet):
    for _ in range(n):
        tasks = [TaskA]
2:
class TaskB(TaskSet):
    for _ in range(n):
        tasks = {TaskA: n}

Usually the answer to
How do I do X with a SequentialTaskSet?
is
Don't use a SequentialTaskSet, just use Python!
Can you achieve the same thing with a regular for loop & functions?
tasks = [TaskA for _ in range(n)] seems ok though, unless it doesn't do what you want it to?
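For example, here is a minimal sketch of the plain-Python approach for Locust 1.x. The endpoint paths and class names other than TaskA/TaskB are illustrative, not from the question: instead of repeating TaskA in tasks, you loop n times inside a single task method.

from locust import HttpUser, SequentialTaskSet, task, between

class TaskB(SequentialTaskSet):
    n = 5  # number of repetitions

    @task
    def do_task_a_n_times(self):
        # the body of TaskA, repeated with an ordinary Python loop
        for _ in range(self.n):
            self.client.get("/task-a-endpoint")  # hypothetical endpoint

    @task
    def next_step(self):
        self.client.get("/next-step")  # hypothetical endpoint

class MyUser(HttpUser):
    tasks = [TaskB]
    wait_time = between(1, 2)

Because the loop runs inside a single task, n no longer has to be baked into the class body at definition time.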

Related

Setting seeds in multi-threading loop in Julia

I want to generate random numbers in Julia using multi-threading. I am using the
Threads.@threads macro to accomplish this. However, I am struggling to fix the seed so that I obtain the same result every time I run the code. Here is my attempt:
Random.seed!(1234)
a = [Float64[] for _ in 1:10]
Threads.@threads for i = 1:10
    push!(a[Threads.threadid()], rand())
end
sum(reduce(vcat, a))
The script above delivers different results every time I run it. By contrast, I get the same results if I use a plain for loop:
Random.seed!(12445)
b = []
for i = 1:10
    push!(b, rand())
end
sum(b)
I have the impression that the solution to this issue must be easy. Still, I couldn't find it. Any help is much appreciated.
Thank you.
You need to generate a separate random stream for each thread.
The simplest way is to have a random number generator with a different seed:
using Random
rngs = [MersenneTwister(i) for i in 1:Threads.nthreads()];
Threads.@threads for i = 1:10
    val = rand(rngs[Threads.threadid()])
    # do something with val
end
If you do not want to risk correlations between streams started from different seeds, you can instead jump ahead within a single random number generator (randjump lives in the Future standard library):
julia> using Future

julia> rngs2 = Future.randjump.(Ref(MersenneTwister(0)), big(10)^20 .* (1:Threads.nthreads()))
4-element Vector{MersenneTwister}:
MersenneTwister(0, (200000000000000000000, 0))
MersenneTwister(0, (400000000000000000000, 0))
MersenneTwister(0, (600000000000000000000, 0))
MersenneTwister(0, (800000000000000000000, 0))
Hi Fabrizio. In BetaML I solved this problem with:
"""
generateParallelRngs(rng::AbstractRNG, n::Integer;reSeed=false)
For multi-threaded models, return n independent random number generators (one per thread) to be used in threaded computations.
Note that each ring is a _copy_ of the original random ring. This means that code that _use_ these RNGs will not change the original RNG state.
Use it with `rngs = generateParallelRngs(rng,Threads.nthreads())` to have a separate rng per thread.
By default the function doesn't re-seed the RNG, as you may want to have a loop index based re-seeding strategy rather than a threadid-based one (to guarantee the same result independently of the number of threads).
If you prefer, you can instead re-seed the RNG here (using the parameter `reSeed=true`), such that each thread has a different seed. Be aware however that the stream of number generated will depend from the number of threads at run time.
"""
function generateParallelRngs(rng::AbstractRNG, n::Integer;reSeed=false)
if reSeed
seeds = [rand(rng,100:18446744073709551615) for i in 1:n] # some RNGs have issues with too small seed
rngs = [deepcopy(rng) for i in 1:n]
return Random.seed!.(rngs,seeds)
else
return [deepcopy(rng) for i in 1:n]
end
end
The function above delivers the same results independently of the number of threads used in Julia, and can then be used, for example, like this:
using Random, Statistics, Test

TESTRNG = MersenneTwister(123)

println("** Testing generateParallelRngs()...")
x = rand(copy(TESTRNG), 100)

function innerFunction(bootstrappedx; rng=Random.GLOBAL_RNG)
    sum(bootstrappedx .* rand(rng) ./ 0.5)
end

function outerFunction(x; rng=Random.GLOBAL_RNG)
    masterSeed = rand(rng, 100:9999999999999) # important: with some RNGs it matters to do this before generateParallelRngs to guarantee independence from the number of threads
    rngs = generateParallelRngs(rng, Threads.nthreads()) # make new copy instances
    results = Array{Float64,1}(undef, 30)
    Threads.@threads for i in 1:30
        tsrng = rngs[Threads.threadid()] # thread-safe random number generator: one RNG per thread
        Random.seed!(tsrng, masterSeed + i*10) # but the seeding depends on the i of the loop, not on the thread: we get the same results independently of the number of threads
        toSample = rand(tsrng, 1:100, 100)
        bootstrappedx = x[toSample]
        innerResult = innerFunction(bootstrappedx, rng=tsrng)
        results[i] = innerResult
    end
    overallResult = mean(results)
    return overallResult
end

# Different sequences..
@test outerFunction(x) != outerFunction(x)

# Different values, but same sequence
mainRng = copy(TESTRNG)
a = outerFunction(x, rng=mainRng)
b = outerFunction(x, rng=mainRng)
mainRng = copy(TESTRNG)
A = outerFunction(x, rng=mainRng)
B = outerFunction(x, rng=mainRng)
@test a != b && a == A && b == B

# Same value at each call
a = outerFunction(x, rng=copy(TESTRNG))
b = outerFunction(x, rng=copy(TESTRNG))
@test a == b
Assuming you are on Julia 1.6 you can do e.g. the following:
julia> using Random
julia> foreach(i -> Random.seed!(Random.default_rng(i), i), 1:Threads.nthreads())
The point is that currently Julia already has a separate random number generator per thread so you do not need to generate your own (of course you could do it as in the other answers, but you do not have to).
Also note that in future versions of Julia the:
Threads.@threads for i = 1:10
    push!(a[Threads.threadid()], rand())
end
part is not guaranteed to produce reproducible results. In Julia 1.6 Threads.@threads uses static scheduling, but as you can read in its docstring this is subject to change.

Converting Iterative Code to Recursive Code Python3

I'm a beginner programming student and I've been studying recursive functions in Python 3 lately. I'm working on code that computes the minimum number of steps required to turn a number N into M using the operations add 1, divide by 2, or multiply by 10. I wrote an iterative function that works well, but as a beginner student of recursion I want to convert it to a recursive version, and so far I have not succeeded.
I've been reading about this process, but as I said the conversion is hard for my current skills. I know that to convert an iterative function I should use the main loop condition as my base case and the body of the loop as the recursive step, and that is all I know.
I would really appreciate it if you could help me find the base case and the recursive step of this code. I don't want you to write my code; I want you to help me reach my goal.
ITERATIVE CODE
def scape(N, M, steps=0):
    if N == M:
        return 0
    currentoptions = [N]
    while True:
        if M in currentoptions:
            break
        thisround = currentoptions[:]
        currentoptions = []
        for i in thisround:
            if (i % 2) == 0:
                currentoptions.append(i // 2)
            currentoptions.append(i + 1)
            currentoptions.append(i * 10)
        steps += 1
    return steps
EXAMPLE
print(scape(8,1))
OUTPUT -> 3
Because 8/2 -> 4/2 -> 2/2 = 1
It is difficult to use pure recursion here (without passing around auxiliary data structures). You could do something along the following lines:
def scape(opts, M, steps=0):
    if M in opts:
        return steps
    opts_ = []
    for N in opts:
        if not N % 2:
            opts_.append(N // 2)
        opts_.extend((N + 1, N * 10))
    return scape(opts_, M, steps + 1)
>>> scape([8], 1)
3
Or in order to keep the signature (and not pass around redundant arguments), you could use a recursive helper function:
def scape(N, M):
    steps = 0
    def helper(opts):
        nonlocal steps
        if M in opts:
            return steps
        opts_ = []
        for N in opts:
            if not N % 2:
                opts_.append(N // 2)
            opts_.extend((N + 1, N * 10))
        steps += 1
        return helper(opts_)
    return helper([N])
>>> scape(8, 1)
3

(itertools.combinations) Python Shell hangs in a bigger than specific amount size and throws memory error

I am trying to run some code to find the sum of all the possible combinations of a list that comes from a .in file. The same code runs perfectly with relatively small files, but with bigger files it hangs and after a while throws a MemoryError.
import itertools
file = open("c_medium.in", "r")
if file.mode == 'r':
    content = file.readlines()

maxSlices, numberOfPizza = map(int, content[0].split())
numberOfSlices = tuple(map(int, content[1].split()))
print(maxSlices)
print(numberOfSlices)

sol = []
sumOfSlices = []
for x in range(1, len(numberOfSlices) + 1):
    print(x)
    for y in itertools.combinations(numberOfSlices, x):
        if sum(y) <= maxSlices:
            sumOfSlices.append(sum(y))

sumOfSlices.sort()
print(sumOfSlices)
checkSum = sumOfSlices[len(sumOfSlices) - 1]
print(checkSum)

found = False
if found == False:
    for x in range(1, len(numberOfSlices) + 1):
        print(x)
        for y in itertools.combinations(numberOfSlices, x):
            if found == False:
                if sum(y) == checkSum:
                    for z in y:
                        sol.append(numberOfSlices.index(z))
                    found = True

solution = tuple(map(str, sol))
print(solution)
The number of combinations of N elements grows very, very fast with N.
Regarding your code in particular: if sum(y) <= maxSlices is always true, then you'll generate a list with 2^len(numberOfSlices) elements, i.e. a count that already overflows a 32-bit integer when len(numberOfSlices) is 32.
I'd recommend trying to solve your task without explicitly building a list. If you describe what your code is doing, maybe someone can help.
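As an illustration of that advice, here is a minimal sketch (the function name best_combination is made up; maxSlices and numberOfSlices are assumed to be loaded from the file as in the question) that keeps only the best sum and the combination that achieves it while iterating the combinations lazily, instead of storing every sum in a list:

import itertools

def best_combination(numberOfSlices, maxSlices):
    # Track only the best sum <= maxSlices and the combination achieving it,
    # so no list of all sums is ever materialized.
    best_sum = 0
    best_combo = ()
    for x in range(1, len(numberOfSlices) + 1):
        for y in itertools.combinations(numberOfSlices, x):
            s = sum(y)
            if best_sum < s <= maxSlices:
                best_sum, best_combo = s, y
                if best_sum == maxSlices:  # cannot do better, stop early
                    return best_sum, best_combo
    return best_sum, best_combo

This still walks through up to 2^N combinations in the worst case, so the running time remains exponential, but it removes the memory blow-up and the second search pass: the winning combination is already at hand, so its indices can be recovered from it directly.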

How to accelerate the application of the following for loop and function?

I have the following for loop:
for j in range(len(list_list_int)):
    arr_1_, arr_2_, arr_3_ = foo(bar, list_of_ints[j])
    arr_1[j,:] = arr_1_.data.numpy()
    arr_2[j,:] = arr_2_.data.numpy()
    arr_3[j,:] = arr_3_.data.numpy()
I would like to apply foo with multiprocessing, mainly because it is taking a lot of time to finish. I tried to do it in batches with funcy's chunks method:
for j in chunks(1000, list_list_int):
    arr_1_, arr_2_, arr_3_ = foo(bar, list_of_ints[j])
    arr_1[j,:] = arr_1_.data.numpy()
    arr_2[j,:] = arr_2_.data.numpy()
    arr_3[j,:] = arr_3_.data.numpy()
However, I am getting list object cannot be interpreted as an integer. What is the correct way of applying foo using multiprocessing?
list_list_int = [1,2,3,4,5,6]
for j in chunks(2, list_list_int):
    for i in j:
        avg_, max_, last_ = foo(bar, i)
I don't have chunks installed, but from the docs I suspect that, for size-2 chunks of
alist = [[1,2],[3,4],[5,6],[7,8]]
it produces
j = [[1,2],[3,4]]
j = [[5,6],[7,8]]
Using such a j as an index would produce an error:
In [116]: alist[j]
TypeError: list indices must be integers or slices, not list
And if your foo can't work with the full list of lists, I don't see how it will work with that list split into chunks. Apparently it can only work with one sublist at a time.
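For the multiprocessing part of the question, a minimal sketch along these lines may help. It assumes foo, bar, list_of_ints and the arr_1/arr_2/arr_3 arrays are defined as in the question, that bar and foo's results can be pickled, and the worker function and pool size are illustrative:

import multiprocessing as mp
from functools import partial

def worker(j, bar):
    # run foo for one index and return plain NumPy arrays,
    # which are cheaper to send back between processes
    a, b, c = foo(bar, list_of_ints[j])
    return a.data.numpy(), b.data.numpy(), c.data.numpy()

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool:
        results = pool.map(partial(worker, bar=bar), range(len(list_of_ints)))
    for j, (a, b, c) in enumerate(results):
        arr_1[j, :] = a
        arr_2[j, :] = b
        arr_3[j, :] = c

Whether this actually speeds things up depends on how expensive foo is compared to the cost of pickling its inputs and outputs between processes.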
If you are looking to perform parallel operations on a numpy array, then I would use Dask.
With just a few lines of code, your operation can easily be run on multiple processes, and the highly developed Dask scheduler will balance the load for you. A huge benefit of Dask compared to other parallel libraries like joblib is that it maintains the native numpy API.
import dask.array as da
# Setting up a random array with dimensions 10K rows and 10 columns
# This data is stored distributed across 10 chunks, and the columns are kept together (1_000, 10)
x = da.random.random((10_000, 10), chunks=(1_000, 10))
x = x.persist() # Allow the entire array to persist in memory to speed up calculation
def foo(x):
    return x / 10
# Using the native numpy function, apply_along_axis, applying foo to each row in the matrix in parallel
result_foo = da.apply_along_axis(foo, 0, x)
# View original contents
x[0:10].compute()
# View sample of results
result_foo = result_foo.compute()
result_foo[0:10]

Python fails to parallelize buffer reads

I'm having performance issues with multi-threading.
I have a code snippet that reads 8MB buffers in parallel:
import copy
import itertools
import threading
import time
# Basic implementation of thread pool.
# Based on multiprocessing.Pool
class ThreadPool:
    def __init__(self, nb_threads):
        self.nb_threads = nb_threads

    def map(self, fun, iter):
        if self.nb_threads <= 1:
            return map(fun, iter)
        nb_threads = min(self.nb_threads, len(iter))
        # ensure 'iter' does not evaluate lazily
        # (generator or xrange...)
        iter = list(iter)
        # map to results list
        results = [None] * nb_threads
        def wrapper(i):
            def f(args):
                results[i] = map(fun, args)
            return f
        # slice iter in chunks
        chunks = [iter[i::nb_threads] for i in range(nb_threads)]
        # create threads
        threads = [threading.Thread(target=wrapper(i), args=[chunk])
                   for i, chunk in enumerate(chunks)]
        # start and join threads
        [thread.start() for thread in threads]
        [thread.join() for thread in threads]
        # reorder results
        r = list(itertools.chain.from_iterable(map(None, *results)))
        return r

payload = [0] * (1000 * 1000)  # 8 MB
payloads = [copy.deepcopy(payload) for _ in range(40)]

def process(i):
    for i in payloads[i]:
        j = i + 1

if __name__ == '__main__':
    for nb_threads in [1, 2, 4, 8, 20]:
        t = time.time()
        c = time.clock()
        pool = ThreadPool(nb_threads)
        pool.map(process, xrange(40))
        t = time.time() - t
        c = time.clock() - c
        print nb_threads, t, c
Output:
1 1.04805707932 1.05
2 1.45473504066 2.23
4 2.01357698441 3.98
8 1.56527090073 3.66
20 1.9085559845 4.15
Why does the threading module miserably fail at parallelizing mere buffer reads?
Is it because of the GIL? Or because of some weird configuration on my machine, one process
is allowed only one access to the RAM at a time (I get a decent speed-up if I swap ThreadPool for multiprocessing.Pool in the code above)?
I'm using CPython 2.7.8 on a linux distro.
Yes, Python's GIL prevents Python code from running in parallel across multiple threads. You describe your code as doing "buffer reads", but it's really running arbitrary Python code (in this case, iterating over a list adding 1 to other integers). If your threads were making blocking system calls (like reading from a file, or from a network socket), then the GIL would usually be released while the thread blocked waiting on the external data. But since most operations on Python objects can have side effects, you can't do several of them in parallel.
One important reason for this is that CPython's garbage collector uses reference counting as its main way to know when an object can be cleaned up. If several threads try to update the reference count of the same object at the same time, they might end up in a race condition and leave the object with the wrong count. The GIL prevents that from happening, as only one thread can be making such internal changes at a time. Every time your process code does j = i + 1, it's going to be updating the reference counts of the integer objects 0 and 1 a couple of times each. That's exactly the kind of thing the GIL exists to guard.
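For reference, here is a minimal sketch (written in Python 3 syntax, whereas the question uses CPython 2.7) of the multiprocessing.Pool swap the asker mentions. Because the workers run in separate processes, each with its own interpreter and its own GIL, the CPU-bound loop does scale across cores:

import multiprocessing
import time

payload = [0] * (1000 * 1000)
payloads = [list(payload) for _ in range(40)]

def process(i):
    # same CPU-bound loop as in the question
    for value in payloads[i]:
        j = value + 1

if __name__ == '__main__':
    for nb_procs in [1, 2, 4, 8]:
        t = time.time()
        with multiprocessing.Pool(nb_procs) as pool:
            pool.map(process, range(40))
        print(nb_procs, time.time() - t)

The trade-off is that each worker process needs its own copy of the data (or must receive it via pickling), which is the usual price of escaping the GIL with processes instead of threads.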

Resources