Safe way for parallel random sampling in python3 - python-3.x

I need to repeat a scientific simulation based on random sampling N times. In its simplest form:
results = [mysimulation() for i in range(N)]
Since every simulation requires minutes, I'd like to parallelize them in order to reduce the execution time. Some weeks ago I successfully analyzed some simpler cases, for which I wrote my code in C using OpenMP and functions like rand_r() to avoid overlapping seeds. How could I obtain a similar effect in Python?
I tried reading more about Python 3 multithreading/parallelization, but I found nothing concerning random number generation. Likewise, the numpy.random documentation does not suggest anything in this direction (as far as I could find).
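One possible approach, sketched here under the assumption that NumPy 1.17 or later is available, is to derive one child seed per simulation from a single root seed with numpy.random.SeedSequence.spawn and give each task its own generator. The body of mysimulation, the helper names make_seeds and run_parallel, and the root seed value are placeholders for illustration, not part of the original question.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def mysimulation(seed_seq):
    """Hypothetical stand-in for the real simulation: mean of 1000 uniform draws."""
    rng = np.random.default_rng(seed_seq)  # private generator for this task only
    return float(rng.random(1000).mean())


def make_seeds(n, root_seed=12345):
    """Derive n child seeds from one root seed (root_seed is an arbitrary choice)."""
    return np.random.SeedSequence(root_seed).spawn(n)


def run_parallel(n, workers=4):
    """Run n simulations in separate processes, one child seed per simulation."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(mysimulation, make_seeds(n)))
```

Calling results = run_parallel(N) would then replace the list comprehension. On platforms that start worker processes by re-importing the main module (Windows, for example), that call should sit under an if __name__ == "__main__": guard. Because the seeds are derived from the root seed, rerunning with the same root seed should reproduce the same results regardless of how the tasks are scheduled.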

Related

Accelerating diagonalization with GPU and multiprocessing

In my python code, I have to diagonalize hundreds of arrays, each of size around ~1000*1000. However, each array is independent, so it would seem that I can accelerate this process using parallel programming. A minimal (pseudo-code) example would be something of the form
arr_list = [a0, a1, a2, ..., a199] # each arr is of shape 1000*1000
for idx, arr in enumerate(arr_list):
    evals, evecs = np.linalg.eigh(arr)
    arr_list[idx] = evals
I'm not very familiar with CUDA, numba, CuPy or multiprocessing, but some quick research seems to tell me that CuPy is mainly used for accelerating basic operations such as addition, multiplication, diagonalization, etc. and only really has a significant jump in time relative to numpy if the array size is much larger than 1000. Multiprocessing, in contrast, seems to utilize the multiple cores (6-8) on a CPU, but it seems that numpy diagonalization is already a multi-core process (correct me if wrong), so it may not have a larger decrease in time.
I'm not very familiar with parallel programming, so I am wondering if someone with more experience could give a few pointers on such a problem. Maybe a direction to research.
EDIT. Unfortunately, I'm on Windows so jax doesn't seem to work.
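As a starting point that needs neither a GPU nor multiprocessing, np.linalg.eigh and np.linalg.eigvalsh accept a stacked array of shape (..., M, M) and diagonalize every matrix in one call. The loop in the question keeps only the eigenvalues, so eigvalsh is enough there. The sketch below uses small random symmetric matrices as stand-in data; whether the batched call is noticeably faster than the loop for 1000*1000 matrices depends on the LAPACK build and is not something this example demonstrates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 5 symmetric 50x50 matrices
# (the question has 200 matrices of size 1000x1000).
raw = rng.standard_normal((5, 50, 50))
stack = (raw + raw.transpose(0, 2, 1)) / 2  # each slice is symmetric

# One call handles the whole stack; eigvalsh skips the eigenvectors.
evals = np.linalg.eigvalsh(stack)  # shape (5, 50)

# Reference: the explicit loop from the question gives the same eigenvalues.
evals_loop = np.array([np.linalg.eigh(a)[0] for a in stack])
```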

How to efficiently perform billions of Bernoulli extractions using numpy?

I am working on a thesis about epidemiology, and I have to simulate an SI epidemic in a temporal network. At each time step there's a probability ~ Bernoulli(beta) to perform an extraction between an infected and a susceptible node. I am using np.random.binomial(size=whatever, n=1, p=beta) to make the computer decide. Now, I have to simulate the epidemic in the same network by making it start from each one of the nodes. This should be repeated K times to get some statistically relevant results for each node, and, since the temporal network is stochastic too, everything should be repeated NET_REALIZATION times.
So, in a network with N = 100, if K=500 and NET_REALIZATION=500, the epidemic should be repeated 25,000,000 times. If T=100, it means 2,500,000,000 extractions per set of S-I couples (which of course varies in time). If beta is small, which is often the case, this leads to a very time-consuming computation.
Considering that, on my computer, a single Bernoulli extraction takes 3.63 µs, this means I have to wait hours to get some results, which is really limiting the development of my thesis.
The problem is that more than half of the time is just spent in random extractions.
I should use numpy since the results of extractions interact with other data structures. I tried to use numba, but it didn't seem to improve extractions' speed.
Is there a faster way to get the same results? I was thinking about doing one very big extraction once and for all, something like 10^12 extractions of 0s and 1s, and just importing a part of them for each different simulation (this should be repeated for several values of beta), but I wonder if there's a smarter move.
Thanks for help
If you can express your betas as increments of 2^-N (for example, increments of 1/256 if N is 8.), then extract random N-bit chunks and determine whether each chunk is less than beta * 2^N. This works better if 32 is evenly divisible by N.
Note that numpy.random.uniform produces random floating-point numbers, and is expected to be slower than producing random integers or bits. This is especially because generating random floating-point numbers depends on generating random integers — not the other way around.
The following is an example of how this idea works.
import numpy
# Fixed seed for demonstration purposes
rs = numpy.random.RandomState(777778)
# Generate 10 integers in [0, 256)
ri = rs.randint(0, 256, 10)
# Now each integer x can be expressed, say, as a Bernoulli(5/256)
# variable which is 1 if x < 5, and 0 otherwise. I haven't tested
# the following, which is similar to an example you gave in a
# comment.
rbern = (ri < 5) * 1
If you can use NumPy 1.17 or later, the following alternative exists:
import numpy
rs = numpy.random.default_rng()
ri = rs.integers(0, 256, 10)
Note also that NumPy 1.17 introduces a new random number generation system alongside the legacy one. Perhaps it has better performance generating Bernoulli and binomial variables than the old one, especially because its default RNG, PCG64, is lighter-weight than the legacy system's default, Mersenne Twister. The following is an example.
import numpy
beta = 5.0/256
rs = numpy.random.default_rng()
rbinom = rs.binomial(10, beta)
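For large batches, the chunk idea can be combined with the newer generator by drawing single bytes and comparing them against the threshold in one vectorized step. This is a minimal sketch assuming beta is a multiple of 1/256; the helper name bernoulli_bits and the batch size are illustrative choices, and no timing claim is made here.

```python
import numpy as np


def bernoulli_bits(rng, k, size):
    """Draw `size` Bernoulli(k/256) variables, one random byte per variable."""
    chunks = rng.integers(0, 256, size=size, dtype=np.uint8)
    return (chunks < k).astype(np.uint8)  # 1 with probability k/256, else 0


rng = np.random.default_rng(777778)  # fixed seed for demonstration purposes
draws = bernoulli_bits(rng, 5, 1_000_000)
print(draws.mean())  # should be close to 5/256
```

Timing this against np.random.binomial(size=..., n=1, p=beta) on the target machine would show whether it helps in practice.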

Ideas on filtering out consistent time series data

So I have two subsets of data that represent two situations. The one that looks more consistent needs to be filtered out (it is noise), while the one that looks random should be kept (it is motion). The method I was using was to define a moving window of 10 samples and suppress the data whenever its standard deviation within the window was smaller than some threshold. However, this method could not filter out all the "consistent" noise, and it also hurt the inconsistent data (real motion). I was hoping to use some kind of statistical model rather than machine learning to accomplish this. Any suggestions would be appreciated!
[Figure: noise]
[Figure: real motion]
The Kolmogorov–Smirnov test compares two samples to determine whether they come from the same distribution. I realized that real-world data would never be uniform, so instead of comparing my noise data against the uniform distribution, I used the scipy.stats.ks_2samp function to compare each burst against one real motion burst. I then muted a burst if the returned p-value was significantly small, meaning I could reject the hypothesis that the two samples come from the same distribution.
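The comparison described above might look roughly like the following. The data are synthetic stand-ins (Gaussian bursts with different spreads), and the helper name is_noise and the threshold alpha=0.01 are assumptions made for illustration, not values from the original answer.

```python
import numpy as np
from scipy.stats import ks_2samp


def is_noise(burst, reference_motion, alpha=0.01):
    """True if `burst` is unlikely to share a distribution with the motion reference."""
    result = ks_2samp(burst, reference_motion)
    return bool(result.pvalue < alpha)


rng = np.random.default_rng(0)
reference_motion = rng.normal(0.0, 1.0, 200)   # hypothetical real-motion burst
consistent_burst = rng.normal(0.0, 0.05, 200)  # hypothetical low-variance noise burst

print(is_noise(consistent_burst, reference_motion))  # candidate for muting
print(is_noise(reference_motion, reference_motion))  # identical samples, kept
```

The result depends on how representative the single reference burst is, so trying a few different reference bursts before settling on a threshold seems prudent.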

Mersenne twister: limitations used in agent based models

I am using the Mersenne Twister as the engine to generate random numbers in an agent-based model: it is fast and has an extremely long period before repeating.
Recently I did a literature review on this. While the Colt library Java API recommends the Mersenne Twister, I came across two limitations:
the seed should not be 0. Is this something suggested in the Apache Commons Math library ?
based on a cryptography paper, it was mentioned that "if the initial state has too many zeros then the generated sequence may also contain many zeros for more than 10000 generations and if the seeds are chosen systematically such as 0, 20, 30….. the output sequences will be correlated".
Has anyone come across such issues, or is it something fixed and not the case anymore ?
Is there any literature showing the spectral analysis of the Mersenne Twister vs the others like the Linear Congruential Generator?
SFMT recovers better from a zero-excess initial state.
A usual tip to get rid of zero-excess initialization of the seed is to use another PRNG (which might have near-equal probability of zeros and ones in the output) to generate the seed itself.
See also a comment on "How to properly seed a mersenne twister RNG?"
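The seeding tip can be illustrated in Python with NumPy (1.17 or later assumed), even though the question concerns Java libraries. In this sketch an auxiliary PCG64 generator expands a simple user seed into 624 32-bit words, which then seed the Mersenne Twister. The function names are hypothetical, and NumPy's MT19937 already hashes its seed internally, so the extra step is shown only to make the idea concrete.

```python
import numpy as np


def make_mt_generator(user_seed):
    """Seed a Mersenne Twister from words produced by another PRNG (PCG64)."""
    aux = np.random.Generator(np.random.PCG64(user_seed))
    seed_words = aux.integers(0, 2**32, size=624, dtype=np.uint32)
    return np.random.Generator(np.random.MT19937(seed_words))


def fraction_of_one_bits(gen):
    """Share of 1 bits in the generator's 624-word internal state."""
    key = np.ascontiguousarray(gen.bit_generator.state["state"]["key"])
    return float(np.unpackbits(key.view(np.uint8)).mean())


# Systematically chosen seeds, as in the quoted concern.
for user_seed in (0, 20, 30):
    gen = make_mt_generator(user_seed)
    print(user_seed, round(fraction_of_one_bits(gen), 3))
```

A share of 1 bits near one half for each seed suggests that the initial state is not zero-heavy, which is the property the tip aims for.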

Can I display the number of iterations and each iteration result when using fmin_tnc?

I've been tasked with using fmin_tnc for an optimization problem. For the time being, I am only allowed to use fmin_tnc. I would like to display the number of iterations whether or not the results converge. Upon convergence (or line search error, which I often get), I receive a string of numbers, but they are not labeled so I'm not sure if one of them is an iteration number or function evaluation.
Additionally, I would like to store the values of my function output every time fmin_tnc iterates it.
So far, I can only find answers revolving around "fmin" but not "fmin_tnc". The code for the optimizer is as follows (unfortunately, I am not allowed to show more than this):
optimize.fmin_tnc(func1, x0, approx_grad=True, bounds=(bounds), epsilon=0.001, messages=15, stepmx=21, ftol=1e-06, xtol=1e-6)
Thank you very much!
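For reference, fmin_tnc returns a tuple (x, nfeval, rc): the solution, the number of function evaluations, and a return code. It does not return an iteration count, but it accepts a callback argument that is called after each iteration with the current x. A minimal sketch of recording iterations that way follows; func1, x0, and bounds here are hypothetical stand-ins for the undisclosed problem, and messages=0 only silences the printed log.

```python
import numpy as np
from scipy import optimize


def func1(x):
    """Hypothetical objective standing in for the undisclosed one."""
    return float(np.sum((np.asarray(x) - 1.0) ** 2))


iterates = []  # x after each iteration
values = []    # func1(x) after each iteration


def record(xk):
    """Callback: store a copy of the current point and its function value."""
    xk = np.array(xk, dtype=float)
    iterates.append(xk)
    values.append(func1(xk))


x0 = [3.0, -2.0]
bounds = [(-5.0, 5.0), (-5.0, 5.0)]

x_opt, nfeval, rc = optimize.fmin_tnc(
    func1, x0, approx_grad=True, bounds=bounds, messages=0, callback=record
)

print("iterations:", len(iterates))
print("function evaluations:", nfeval)
print("return code:", rc)
```

The list values holds one function value per iteration. To log every evaluation instead, including those used for the finite-difference gradient, func1 itself could be wrapped so that it appends to a list on each call.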
