Reliable and fast pseudo-random number generator (PRNG) for Monte Carlo simulation in .NET

I use pseudo-random number generators (PRNGs) to run Monte Carlo simulations of a queueing-type system. I use System.Random because it is fast, but I found that it has some odd correlations between subsequent draws, which interfere with the results (it is not random enough).
Now I am using the Mersenne Twister (http://takel.jp/mt/MersenneTwister.cs), which has so far proven random enough for my purposes. It is 50% slower, but that is a price I am willing to pay for reliable results.
What PRNG for .NET is most suitable for Monte Carlo simulation? I am looking for a reliable PRNG that is not too slow.

The Mersenne Twister has been optimized for use in Monte Carlo simulations in a number of fields, so I would stick with it.
If performance is an issue and going parallel is not an option, I would go for a xorshift generator: a very good (and fast) family of random number generators from George Marsaglia.
The details are in his paper "Xorshift RNGs" (Journal of Statistical Software, 2003).
That is probably your best bet if you need a good and fast PRNG for Monte Carlo or other statistical simulations, though not for cryptography.
In this SO post you can find a very simple Java port, but it should not be hard to rewrite it in C# or find an existing C# implementation on the net.
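For reference, here is a minimal C# sketch of Marsaglia's xor128 generator from that paper. The shift constants (11, 8, 19) and the default state come from the paper; the seed-mixing constructor is my own simplification, so treat this as a starting point rather than a drop-in library:

```csharp
using System;

// Minimal sketch of Marsaglia's xor128 xorshift generator (period 2^128 - 1).
// Fast and statistically decent for Monte Carlo work; NOT for cryptography.
public sealed class XorShift128
{
    // Default state from Marsaglia's paper; the state must never be all zero.
    private uint x = 123456789, y = 362436069, z = 521288629, w = 88675123;

    public XorShift128(uint seed)
    {
        // Simple seed mixing (my simplification, not from the paper):
        // perturb the default state so distinct seeds give distinct streams.
        x ^= seed;
        if (x == 0) x = 123456789; // keep the state away from all-zero
    }

    public uint NextUInt()
    {
        uint t = x ^ (x << 11);
        x = y; y = z; z = w;
        return w = w ^ (w >> 19) ^ t ^ (t >> 8);
    }

    // Uniform double in [0, 1), the usual building block for Monte Carlo draws.
    public double NextDouble() => NextUInt() / 4294967296.0;
}
```

Each draw is just a handful of shifts and XORs, which is why this family tends to be fast; as always, benchmark it against your Mersenne Twister port before switching.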

You can also use the SIMD-oriented Fast Mersenne Twister (SFMT), which is very fast; it uses SIMD instructions to generate random numbers in parallel.
It can be found on Makoto Matsumoto's home page:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html

Related

Does sklearn.linear_model.LogisticRegression always converge to the best solution?

When using this code, I've noticed that it converges unbelievably quickly (a small fraction of one second), even when the model and/or the data is very large. I suspect that in some cases I am not getting anything close to the best solution, but this is hard to prove. It would be nice to have the option of some type of global optimizer such as the basin hopping algorithm, even if this consumed 100 to 1,000 times as much CPU. Does anyone have any thoughts on this subject?
This is a very complex question and this answer might be incomplete, but it should give you some hints (as your question also indicates some knowledge gaps):
(1) First, I disagree with the desire for "some type of global optimizer such as the basin hopping algorithm, even if this consumed 100 to 1,000 times as much CPU", as this does not help in most cases (in the ML world): the differences are so subtle, and the optimization error will often be negligible compared to the other error components (model power; empirical risk).
Read "Stochastic Gradient Descent Tricks" (Bottou) for an overview (and the error components!).
He even gives a very important reason to use fast approximate algorithms (not necessarily a good fit in your case if 1000x training time is not a problem): approximate optimization can achieve better expected risk because more training examples can be processed during the allowed time.
(2) Basin-hopping is one of those highly heuristic global-optimization tools (looking for global minima instead of local minima) without any guarantees at all (touching NP-hardness and co.). It's the last algorithm you would want to use here (see point (3))!
(3) Logistic regression is a convex optimization problem!
Any local minimum is also the global minimum, which follows from convexity (I'm ignoring details like strictly convex/unique solutions and co.)!
Therefore you should always use something tuned for convex optimization, and never basin-hopping (there is a toy sketch of this principle at the end of this answer)!
(4) There are different solvers, and each supports different variants of the problem (different regularizations and co.). We don't know exactly what you are optimizing, but of course these solvers behave differently in regards to convergence:
Take the following comments with a grain of salt:
liblinear: probably uses some CG-based (conjugate gradient) algorithm, which means convergence is highly dependent on the data
whether accurate convergence is achieved depends solely on the exact implementation (liblinear is high quality)
as it's a first-order method, I would call the general accuracy medium
sag/saga: seem to have better convergence theory (I did not check it much), but again: they are dependent on your data, as mentioned in sklearn's docs, and whether solutions are accurate depends heavily on the implementation details
as these are first-order methods: general accuracy medium
newton-cg: an inexact Newton method
in general much more robust in terms of convergence, as line searches replace heuristics or constant learning rates (line searches are costly in first-order optimization)
second-order method with an inexact core: expected accuracy medium-high
lbfgs: a quasi-Newton method
again, in general much more robust in terms of convergence, like newton-cg
second-order method: expected accuracy medium-high
Of course second-order methods are hurt more by large-scale data (even complexity-wise), and as mentioned, not all solvers support every logreg optimization problem available in sklearn.
I hope you get the idea of how complex this question is (because of the highly complex solver internals).
Most important things:
LogReg is convex -> use solvers tuned for unconstrained convex optimization
If you want medium-high accuracy: use the second-order-based methods available and do many iterations (that's a parameter)
If you want high accuracy: use second-order-based methods that are even more conservative/careful (no Hessian approximation, no inverse-Hessian approximation, no truncating...):
e.g. any off-the-shelf solver from convex optimization
Open source: cvxopt, ecos and co.
Commercial: Mosek
(but you need to formulate the model yourself in their frameworks or via some wrapper; there are probably some examples for classic logistic regression available)
As expected: some methods will get very slow with large amounts of data.
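To make point (3) concrete, here is a toy sketch of plain batch gradient descent on the logistic loss (in C#, to match the rest of this page; the names, step size, and iteration count are illustrative, not sklearn's internals). Because the loss is convex, every descent run ends at the same global minimum no matter where it starts, which is exactly why heavy global optimizers buy you nothing here:

```csharp
using System;

public static class LogRegSketch
{
    // Toy batch gradient descent on the (convex) unregularized logistic loss.
    // X: n samples of d features; y: labels in {0, 1}. Real solvers
    // (liblinear, lbfgs, ...) are far more sophisticated, but the convexity
    // argument is the same.
    public static double[] Fit(double[][] X, int[] y, double lr = 0.1, int iters = 1000)
    {
        int n = X.Length, d = X[0].Length;
        var w = new double[d];
        for (int it = 0; it < iters; it++)
        {
            var grad = new double[d];
            for (int i = 0; i < n; i++)
            {
                double z = 0;
                for (int j = 0; j < d; j++) z += w[j] * X[i][j];
                double p = 1.0 / (1.0 + Math.Exp(-z)); // sigmoid prediction
                for (int j = 0; j < d; j++) grad[j] += (p - y[i]) * X[i][j];
            }
            for (int j = 0; j < d; j++) w[j] -= lr * grad[j] / n; // average-gradient step
        }
        return w;
    }
}
```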

Mersenne Twister: limitations when used in agent-based models

I am using the Mersenne Twister as the engine to generate random numbers in an agent-based model: it is fast and has an extremely long period before repeating.
Recently I did a literature review on this; while the Colt library Java API recommends the Mersenne Twister, I came across two limitations:
the seed should not be 0. Is this something suggested in the Apache Commons Math library?
based on a cryptography paper, it was mentioned that "if the initial state has too many zeros then the generated sequence may also contain many zeros for more than 10000 generations and if the seeds are chosen systematically such as 0, 20, 30... the output sequences will be correlated".
Has anyone come across such issues, or is this something that has been fixed and is no longer the case?
Is there any literature showing the spectral analysis of the Mersenne Twister vs. others like the linear congruential generator?
SFMT behaves better with a zero-excess initial state (it recovers from one much faster).
A usual tip to avoid a zero-excess initialization is to use another PRNG (one with a near-equal probability of zeros and ones in its output) to generate the seed itself.
See also a comment on "How to properly seed a mersenne twister RNG?"
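To illustrate that tip, here is a hedged C# sketch that expands a single seed through SplitMix64 steps (a tiny, well-mixed PRNG, using its standard constants) into an init array with a roughly even mix of zero and one bits. `MersenneTwister(uint[] initKey)` is a hypothetical stand-in for the init-by-array constructor that most MT ports provide:

```csharp
// Expand one seed into a well-mixed init array using SplitMix64 steps,
// so the Mersenne Twister never starts from a zero-heavy state.
static uint[] ExpandSeed(ulong seed, int words = 16)
{
    var key = new uint[words];
    for (int i = 0; i < words; i++)
    {
        // One SplitMix64 step (standard constants).
        seed += 0x9E3779B97F4A7C15UL;
        ulong z = seed;
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
        key[i] = (uint)(z ^ (z >> 31));
    }
    return key;
}

// Usage (hypothetical constructor name, matching the init-by-array idiom):
// var mt = new MersenneTwister(ExpandSeed(42));
```

Systematically chosen seeds (0, 20, 30, ...) then map to init arrays that look unrelated, which sidesteps both issues raised in the question.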

Multiplying small matrices in parallel

I have been writing code to multiply matrices in parallel using POSIX threads and I have been seeing great speedup when operating on large matrices; however, as I shrink the size of the matrices the naive sequential O(n^3) matrix multiplication algorithm begins to overtake the performance of the parallel implementation.
Is this normal or does it indicate a poor quality algorithm? Is it simply me noticing the extra overhead of creating and handling threads and that past a certain point that extra time dominates the computation?
Note that this is for homework, so I won't be posting my code as I don't want to breach my University's Academic Integrity Policies.
It is not possible to give an exact answer without seeing the code (or at least a detailed description of the algorithm), but in general it is normal for simple algorithms to perform better on small inputs because of a smaller constant factor. Moreover, thread creation and context switches are not free, so it can take longer to create a thread than to perform some simple computations. So if your algorithm works much faster than the naive one on large inputs, there is no reason to worry about it.
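The usual remedy is a size cutoff: below it, call the sequential kernel so thread overhead never dominates. The question is about POSIX threads in C, but the pattern is language-agnostic; here is a C# sketch (to match the rest of this page) assuming square matrices, with a guessed cutoff you would tune by measurement:

```csharp
using System.Threading.Tasks;

public static class MatMul
{
    // Guessed threshold; benchmark on the target machine to pick a real one.
    private const int ParallelCutoff = 64;

    public static void Multiply(double[,] a, double[,] b, double[,] c)
    {
        int n = a.GetLength(0); // assumes square n x n matrices
        if (n < ParallelCutoff)
        {
            MultiplyRows(a, b, c, 0, n); // small: thread overhead would dominate
            return;
        }
        // Large: one parallel iteration per output row.
        Parallel.For(0, n, i => MultiplyRows(a, b, c, i, i + 1));
    }

    private static void MultiplyRows(double[,] a, double[,] b, double[,] c,
                                     int rowStart, int rowEnd)
    {
        int n = a.GetLength(0);
        for (int i = rowStart; i < rowEnd; i++)
            for (int j = 0; j < n; j++)
            {
                double sum = 0;
                for (int k = 0; k < n; k++) sum += a[i, k] * b[k, j];
                c[i, j] = sum;
            }
    }
}
```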

Singular value decomposition (SVD) using multithreading

I am running the partial SVD of a large (120k x 600k) and sparse (0.1 of non-zero values) matrix on a 3.5 GHz/3.9 GHz (6 cores / 12 threads) server with 128 GB of RAM, using SVDLIBC.
Is it possible to speed up the process a little bit using multithreading, so as to take full advantage of my server configuration?
I have no experience with multithreading; therefore I am asking for friendly advice and/or pointers to manuals/tutorials.
[EDIT] I am open to alternatives too (matlab/octave, R, etc.)
In Matlab, for sparse matrices, you have svds. This implementation benefits from multithreaded computation (1).
See irlba (fast partial SVD by implicitly restarted Lanczos bidiagonalization) in R. It calculates only the first user-specified number of dimensions. I had good experience with it in the past. But I then used it on a commercial version of R that was compiled to take advantage of multithreading, so I can't vouch for how much of the speed improvement comes from multithreading.

Compare and Contrast Monte-Carlo Method and Evolutionary Algorithms

What's the relationship between the Monte-Carlo Method and Evolutionary Algorithms? On the face of it they seem to be unrelated simulation methods used to solve complex problems. Which kinds of problems is each best suited for? Can they solve the same set of problems? What is the relationship between the two (if there is one)?
"Monte Carlo" is, in my experience, a heavily overloaded term. People seem to use it for any technique that uses a random number generator (global optimization, scenario analysis (Google "Excel Monte Carlo simulation"), stochastic integration (the Pi calculation that everybody uses to demonstrate MC). I believe, because you mentioned evolutionary algorithms in your question, that you are talking about Monte Carlo techniques for mathematical optimization: You have a some sort of fitness function with several input parameters and you want to minimize (or maximize) that function.
If your function is well behaved (there is a single global minimum that you will arrive at no matter which inputs you start with) then you are best off using a deterministic minimization technique such as the conjugate gradient method. Many machine learning classification techniques involve finding parameters that minimize the least-squares error for a hyperplane with respect to a training set. The function being minimized in this case is a smooth, well-behaved paraboloid in n-dimensional space. Calculate the gradient and roll downhill. Easy peasy.
If, however, your input parameters are discrete (or if your fitness function has discontinuities) then it is no longer possible to calculate gradients accurately. This can happen if your fitness function is calculated using tabular data for one or more variables (if variable X is less than 0.5, use this table, else use that table). Alternatively, you may have a program that you got from NASA that is made up of 20 modules written by different teams and that you run as a batch job. You supply it with input and it spits out a number (think black box). Depending on the input parameters you start with, you may end up in a false minimum. Global optimization techniques attempt to address these types of problems.
Evolutionary Algorithms form one class of global optimization techniques. Global optimization techniques typically involve some sort of "hill climbing" (accepting a configuration with a higher (worse) fitness function). This hill climbing typically involves some randomness/stochastic-ness/monte-carlo-ness. In general, these techniques are more likely to accept less optimal configurations early on and, as the optimization progresses, they are less likely to accept inferior configurations.
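The acceptance rule behind that hill climbing is tiny; here is a C# sketch of the Metropolis criterion (the form used in simulated annealing), where a falling temperature `t` makes uphill moves progressively less likely:

```csharp
using System;

// Metropolis acceptance: always take improvements; take a worse candidate
// with probability exp(-delta / t), which shrinks as the temperature t drops.
static bool Accept(double currentCost, double candidateCost, double t, Random rng)
{
    if (candidateCost <= currentCost) return true; // downhill: always accept
    double p = Math.Exp(-(candidateCost - currentCost) / t);
    return rng.NextDouble() < p;                   // uphill: accept sometimes
}
```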
Evolutionary algorithms are loosely based on evolutionary analogies. Simulated annealing is based upon analogies to annealing in metals. Particle swarm techniques are also inspired by biological systems. In all cases you should compare results to a simple random (a.k.a. "monte carlo") sampling of configurations...this will often yield equivalent results.
My advice is to start off using a deterministic gradient-based technique since they generally require far fewer function evaluations than stochastic/monte-carlo techniques. When you hear hoof steps think horses not zebras. Run the optimization from several different starting points and, unless you are dealing with a particularly nasty problem, you should end up with roughly the same minimum. If not, then you might have zebras and should consider using a global optimization method.
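That multi-start advice is also cheap to implement; a hedged C# sketch, where `localMinimize` stands in for whatever deterministic gradient-based solver you use (a hypothetical delegate, not a specific library API):

```csharp
using System;

public static class MultiStartSketch
{
    // Run a deterministic local optimizer from several random starting points
    // and keep the best minimum found. If all runs agree, the problem is
    // probably benign; if they disagree, consider a global method.
    public static (double[] x, double f) MultiStart(
        Func<double[], (double[] x, double f)> localMinimize,
        Func<Random, double[]> randomStart,
        int restarts, int seed = 0)
    {
        var rng = new Random(seed);
        (double[] x, double f) best = (null, double.PositiveInfinity);
        for (int i = 0; i < restarts; i++)
        {
            var run = localMinimize(randomStart(rng));
            if (run.f < best.f) best = run; // keep the lowest minimum seen
        }
        return best;
    }
}
```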
Well, I think "Monte Carlo methods" is the general name for methods that use random numbers in order to solve optimization problems. In this way, even evolutionary algorithms are a type of Monte Carlo method, insofar as they use random numbers (and in fact they do).
Other Monte Carlo methods are: Metropolis, Wang-Landau, parallel tempering, etc.
OTOH, evolutionary methods use 'techniques' borrowed from nature, such as mutation, cross-over, etc.
