Singular value decomposition (SVD) using multithreading

I am running a partial SVD of a large (120k x 600k) and sparse (0.1 of the values non-zero) matrix on a 3.5 GHz/3.9 GHz (6 cores / 12 threads) server with 128 GB of RAM, using SVDLIBC.
Is it possible to speed up the process a little bit using multithreading so as to take full advantage of my server configuration?
I have no experience with multithreading, so I am asking for friendly advice and/or pointers to manuals/tutorials.
[EDIT] I am open to alternatives too (MATLAB/Octave, R, etc.)

In MATLAB, for sparse matrices, you have svds. This implementation benefits from multithreaded computation (1).
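For readers working in Python rather than MATLAB, SciPy has a comparable routine, `scipy.sparse.linalg.svds`, which computes only the k largest singular triplets of a sparse matrix. The sketch below is illustrative only: the matrix is a small random stand-in for the 120k x 600k one in the question, the helpers `random_sparse` and `top_singular_triplets` are hypothetical names, and it makes no claim that SciPy's routine is multithreaded.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds


def random_sparse(m, n, density, seed=0):
    """Hypothetical helper: an m x n CSR matrix with roughly density*m*n non-zeros."""
    rng = np.random.default_rng(seed)
    nnz = int(density * m * n)
    rows = rng.integers(0, m, nnz)
    cols = rng.integers(0, n, nnz)
    vals = rng.random(nnz)
    # duplicate (row, col) pairs are summed, so the real non-zero count may be slightly lower
    return sp.csr_matrix((vals, (rows, cols)), shape=(m, n))


def top_singular_triplets(A, k):
    """Partial SVD: only the k largest singular values and their singular vectors."""
    u, s, vt = svds(A, k=k)
    order = np.argsort(s)[::-1]  # sort descending explicitly instead of relying on svds' order
    return u[:, order], s[order], vt[order, :]


A = random_sparse(2000, 5000, density=0.01)
u, s, vt = top_singular_triplets(A, k=10)
print(s)
```

Only k triplets are ever formed, so memory stays proportional to k times the matrix dimensions rather than to the full dense factorization.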

See irlba: Fast partial SVD by implicitly-restarted Lanczos bidiagonalization, in R. It calculates only the first user-specified number of dimensions. I had a good experience with it in the past. However, I used it on a commercial version of R that was compiled to take advantage of multithreading, so I can't vouch for how much of the speed improvement was due to multithreading.
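irlba itself is an R package; the following Python sketch shows only the core idea it builds on, Golub-Kahan-Lanczos bidiagonalization, and deliberately leaves out the implicit restarts that give irlba its name (without restarts, memory grows with the number of steps). The function name is hypothetical, and full reorthogonalization is used for numerical stability at the cost of extra work. Because it only needs products with A and its transpose, it also accepts a SciPy sparse matrix.

```python
import numpy as np


def lanczos_bidiag_svals(A, steps, seed=0):
    """Approximate the largest singular values of A from a `steps`-step
    Golub-Kahan-Lanczos bidiagonalization (no restarts). Only A @ x and
    A.T @ y are used, so A may be dense or a SciPy sparse matrix."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    U = np.zeros((m, steps))
    V = np.zeros((n, steps))
    alphas = np.zeros(steps)
    betas = np.zeros(steps - 1)

    v = rng.standard_normal(n)
    V[:, 0] = v / np.linalg.norm(v)
    p = A @ V[:, 0]
    alphas[0] = np.linalg.norm(p)
    U[:, 0] = p / alphas[0]

    for j in range(steps - 1):
        # next right vector: A^T u_j minus its component along v_j
        r = A.T @ U[:, j] - alphas[j] * V[:, j]
        for _ in range(2):  # full reorthogonalization against previous v's, done twice
            r -= V[:, : j + 1] @ (V[:, : j + 1].T @ r)
        betas[j] = np.linalg.norm(r)
        V[:, j + 1] = r / betas[j]

        # next left vector: A v_{j+1} minus its component along u_j
        p = A @ V[:, j + 1] - betas[j] * U[:, j]
        for _ in range(2):  # full reorthogonalization against previous u's, done twice
            p -= U[:, : j + 1] @ (U[:, : j + 1].T @ p)
        alphas[j + 1] = np.linalg.norm(p)
        U[:, j + 1] = p / alphas[j + 1]

    # A V = U B with B upper bidiagonal; B's singular values approximate A's largest ones
    B = np.diag(alphas) + np.diag(betas, k=1)
    return np.linalg.svd(B, compute_uv=False)  # descending order


rng = np.random.default_rng(1)
M = rng.random((120, 80))
approx = lanczos_bidiag_svals(M, steps=20)
```

The largest singular values converge first, which is why a few dozen steps can be enough for a partial SVD; irlba's restarts keep the basis small while continuing to refine them.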

Related

Is there a simple way to use Orange3 with an Nvidia GPU?

I need to cluster a high-dimensional dataset in the Orange3 app, and too much time is spent calculating the distance matrix between the objects. If I could use a graphics card for this task, it would take much less time. Does anyone know a workaround to do this?
No. Orange uses NumPy arrays and computes distances on the CPU. Short of reimplementing the routine that calculates distances (which in itself is rather short and simple), there's nothing you can do about it.
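To illustrate how short such a routine can be, here is a hedged sketch of one common formulation (not Orange's own code): the squared Euclidean distance is expanded as ||x||² + ||y||² - 2x·y, so most of the work becomes a single matrix product. CuPy mirrors much of the NumPy API, so code written this way is a plausible candidate for porting to the GPU, but that port is not shown or tested here. The function name is hypothetical.

```python
import numpy as np


def euclidean_distance_matrix(X):
    """Pairwise Euclidean distances between the rows of X (n_objects x n_features)."""
    sq_norms = np.einsum("ij,ij->i", X, X)           # ||x_i||^2 for every row
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)                      # clip tiny negatives caused by round-off
    np.fill_diagonal(d2, 0.0)                        # a row's distance to itself is exactly 0
    return np.sqrt(d2)


X = np.random.default_rng(0).random((500, 200))
D = euclidean_distance_matrix(X)
```

The trade-off is some loss of precision for nearly identical rows (the subtraction cancels), in exchange for routing the heavy lifting through an optimized matrix multiply.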
Orange will start using Dask in the not-too-distant future, but until then, try reducing your dataset. You may not need all dimensions and/or objects for your clustering.
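One way to act on the suggestion to drop dimensions is to project the data onto its leading principal components before clustering. The sketch below does this with NumPy's SVD; the function name and the choice of k are illustrative assumptions, and whether the reduced data still clusters well depends on the dataset.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                          # center each feature
    # thin SVD: rows of vt are the principal directions, largest variance first
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T                             # shape (n_objects, k)


X = np.random.default_rng(0).random((1000, 300))
X_small = pca_reduce(X, k=20)
print(X_small.shape)  # (1000, 20)
```

Distances computed on X_small cost roughly k/300 of the original per pair, at the price of discarding the low-variance directions.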

Accelerating diagonalization with GPU and multiprocessing

In my Python code, I have to diagonalize hundreds of arrays, each around 1000 x 1000 in size. However, each array is independent, so it seems I could accelerate this process using parallel programming. A minimal example would be something of the form:
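import numpy as np

rng = np.random.default_rng(0)
arr_list = [rng.random((1000, 1000)) for _ in range(200)]  # stand-ins for a0, a1, ..., a199
for idx, arr in enumerate(arr_list):
    evals, evecs = np.linalg.eigh(arr)  # eigh reads only the lower triangle (UPLO='L'), i.e. treats arr as symmetric
    arr_list[idx] = evals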
I'm not very familiar with CUDA, Numba, CuPy or multiprocessing, but some quick research suggests that CuPy is mainly used for accelerating basic operations such as addition, multiplication, diagonalization, etc., and only gives a significant speedup over NumPy when the array size is much larger than 1000. Multiprocessing, in contrast, seems to use the multiple cores (6-8) of a CPU, but it seems that NumPy's diagonalization is already multi-core (correct me if I'm wrong), so multiprocessing may not reduce the time much further.
I'm not very familiar with parallel programming, so I am wondering if someone with more experience could give a few pointers on such a problem, or even a direction to research.
EDIT: Unfortunately, I'm on Windows, so JAX doesn't seem to work.
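Two hedged options that stay within NumPy (and so should work on Windows): `np.linalg.eigvalsh` accepts a stack of matrices of shape (k, M, M) and returns all eigenvalue sets in one call, and a thread pool can run independent calls concurrently. Both use `eigvalsh` instead of `eigh` because the loop above keeps only the eigenvalues. The thread-pool version assumes NumPy releases the GIL inside its LAPACK calls; if the installed BLAS/LAPACK is already multithreaded, the pool's threads may compete with it for cores, so it is worth timing both. Sizes and function names are illustrative.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def eigvals_batched(arrays):
    """All eigenvalue sets in one call on a (k, M, M) stack of symmetric matrices."""
    return np.linalg.eigvalsh(np.stack(arrays))  # shape (k, M), each row ascending


def eigvals_threaded(arrays, workers=4):
    """One eigvalsh call per matrix, spread over a fixed pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(np.linalg.eigvalsh, arrays))


def random_symmetric(n, rng):
    a = rng.random((n, n))
    return (a + a.T) / 2  # symmetric input, as eigh/eigvalsh expect


rng = np.random.default_rng(0)
arr_list = [random_symmetric(300, rng) for _ in range(20)]  # small stand-ins for the real arrays
batched = eigvals_batched(arr_list)
threaded = eigvals_threaded(arr_list)
```

Keeping the worker count at or below the number of physical cores avoids oversubscription; if BLAS is already using every core, the batched call alone may be the faster choice.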

Multithreading in divide and conquer matrix multiplication

I have a divide-and-conquer matrix multiplication program that takes quite a lot of time when the input matrix size is close to 4k. Multithreading could be a way to reduce the running time. But should we create Thread objects and pass them the work, or implement a Runnable class? Both approaches seem to work, but when we create more than a certain number of threads, the running time gets worse.
Could someone please explain why this happens, ideally with an implementation in Java or Python?
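A common explanation for the slowdown is that spawning a new thread for every recursive subproblem creates far more threads than cores, and the cost of creating and scheduling them outweighs the work each one does. A hedged Python sketch of the usual remedy: parallelize only the top level of the recursion on a fixed-size pool and recurse sequentially below it. It assumes square matrices whose size is a power of two, uses NumPy's `@` for small base-case blocks (assumed to release the GIL while it runs), and the names and `cutoff` value are illustrative. In Java the same structure would use a fixed-size `ExecutorService` rather than one `Thread` per subproblem.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def halves(n):
    """Row/column slices for the two halves of a dimension of size n."""
    return slice(0, n // 2), slice(n // 2, n)


def dc_matmul(A, B, cutoff=64):
    """Sequential divide-and-conquer product of square power-of-two matrices."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B  # base case: multiply the small blocks directly
    s = halves(n)
    C = np.empty((n, n), dtype=np.result_type(A, B))
    for i in range(2):
        for j in range(2):
            # C_ij = A_i0 B_0j + A_i1 B_1j
            C[s[i], s[j]] = (dc_matmul(A[s[i], s[0]], B[s[0], s[j]], cutoff)
                             + dc_matmul(A[s[i], s[1]], B[s[1], s[j]], cutoff))
    return C


def parallel_dc_matmul(A, B, workers=4, cutoff=64):
    """Compute only the four top-level quadrants concurrently on a bounded pool."""
    n = A.shape[0]
    assert A.shape == B.shape == (n, n) and n & (n - 1) == 0, "square power-of-two size assumed"
    if n <= cutoff:
        return A @ B
    s = halves(n)

    def quadrant(ij):
        i, j = ij
        return (dc_matmul(A[s[i], s[0]], B[s[0], s[j]], cutoff)
                + dc_matmul(A[s[i], s[1]], B[s[1], s[j]], cutoff))

    blocks = [(0, 0), (0, 1), (1, 0), (1, 1)]
    C = np.empty((n, n), dtype=np.result_type(A, B))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for (i, j), block in zip(blocks, pool.map(quadrant, blocks)):
            C[s[i], s[j]] = block
    return C


rng = np.random.default_rng(0)
A = rng.random((256, 256))
B = rng.random((256, 256))
C = parallel_dc_matmul(A, B, cutoff=32)
```

The thread count is fixed by `workers`, no matter how deep the recursion goes, which is the property a thread-per-call design lacks.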

Multiplying small matrices in parallel

I have been writing code to multiply matrices in parallel using POSIX threads and I have been seeing great speedup when operating on large matrices; however, as I shrink the size of the matrices the naive sequential O(n^3) matrix multiplication algorithm begins to overtake the performance of the parallel implementation.
Is this normal or does it indicate a poor quality algorithm? Is it simply me noticing the extra overhead of creating and handling threads and that past a certain point that extra time dominates the computation?
Note that this is for homework, so I won't be posting my code as I don't want to breach my University's Academic Integrity Policies.
It is not possible to give an exact answer without seeing the code (or at least a detailed description of the algorithm), but in general it is normal for simple algorithms to perform better on small inputs because of a smaller constant factor. Moreover, thread creation and context switches are not free, so creating a thread can take longer than performing some simple computations. So if your algorithm is much faster than the naive one on large inputs, there is no reason to worry about it.
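To see the overhead the answer describes on your own machine, you can time thread creation against a small computation. This is a rough Python sketch, not a benchmark of POSIX threads in C: CPython's `threading.Thread` wraps an OS thread but adds interpreter overhead of its own, and timings vary between runs and machines, so no expected numbers are given. Function names and sizes are illustrative.

```python
import threading
import time

import numpy as np


def thread_start_join_seconds(repeats=200):
    """Average wall time to create, start and join a thread that does no work."""
    start = time.perf_counter()
    for _ in range(repeats):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / repeats


def matmul_seconds(n, repeats=200, seed=0):
    """Average wall time of one n x n matrix product with NumPy."""
    rng = np.random.default_rng(seed)
    A, B = rng.random((n, n)), rng.random((n, n))
    start = time.perf_counter()
    for _ in range(repeats):
        A @ B
    return (time.perf_counter() - start) / repeats


overhead = thread_start_join_seconds()
for n in (8, 64, 512):
    print(f"n={n}: matmul {matmul_seconds(n, repeats=20):.2e} s, thread start+join {overhead:.2e} s")
```

If the thread cost is comparable to or larger than each thread's share of the work at small n, that is consistent with the crossover described in the question.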

Compare and Contrast Monte-Carlo Method and Evolutionary Algorithms

What's the relationship between the Monte-Carlo Method and Evolutionary Algorithms? On the face of it they seem to be unrelated simulation methods used to solve complex problems. Which kinds of problems is each best suited for? Can they solve the same set of problems? What is the relationship between the two (if there is one)?
"Monte Carlo" is, in my experience, a heavily overloaded term. People seem to use it for any technique that uses a random number generator (global optimization, scenario analysis (Google "Excel Monte Carlo simulation"), stochastic integration (the Pi calculation that everybody uses to demonstrate MC). I believe, because you mentioned evolutionary algorithms in your question, that you are talking about Monte Carlo techniques for mathematical optimization: You have a some sort of fitness function with several input parameters and you want to minimize (or maximize) that function.
If your function is well behaved (there is a single, global minimum that you will arrive at no matter which inputs you start with), then you are best off using a deterministic minimization technique such as the conjugate gradient method. Many machine learning classification techniques involve finding parameters that minimize the least-squares error for a hyperplane with respect to a training set. The function being minimized in this case is a smooth, well-behaved paraboloid in n-dimensional space. Calculate the gradient and roll downhill. Easy peasy.
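For the well-behaved least-squares case, "calculate the gradient and roll downhill" can be written out directly. The sketch below uses plain gradient descent (not the conjugate gradient method named above) on f(w) = ||Xw - y||², whose gradient is 2Xᵀ(Xw - y); a fixed step of 1/L, with L twice the largest eigenvalue of XᵀX, is a standard safe choice for this quadratic. Names and data are illustrative.

```python
import numpy as np


def least_squares_gd(X, y, iters=2000):
    """Minimize ||Xw - y||^2 by gradient descent with a fixed step of 1/L."""
    L = 2.0 * np.linalg.eigvalsh(X.T @ X).max()  # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = 2.0 * X.T @ (X @ w - y)           # gradient of the squared error
        w -= grad / L                            # step downhill
    return w


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)
w_gd = least_squares_gd(X, y)
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the surface is a single paraboloid, any starting point leads to the same answer, which matches the closed-form least-squares solution.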
If, however, your input parameters are discrete (or your fitness function has discontinuities), then it is no longer possible to calculate gradients accurately. This can happen if your fitness function is calculated using tabular data for one or more variables (if variable X is less than 0.5 use this table, else use that table). Alternatively, you may have a program that you got from NASA that is made up of 20 modules written by different teams that you run as a batch job. You supply it with input and it spits out a number (think black box). Depending on the input parameters that you start with, you may end up in a false minimum. Global optimization techniques attempt to address these types of problems.
Evolutionary algorithms form one class of global optimization techniques. Global optimization techniques typically involve some sort of "hill climbing" (accepting a configuration with a higher, i.e. worse, fitness value). This hill climbing typically involves some randomness/stochastic-ness/Monte-Carlo-ness. In general, these techniques are more likely to accept less optimal configurations early on and, as the optimization progresses, less likely to accept inferior configurations.
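Accepting worse configurations often early on and rarely later is exactly what simulated annealing's Metropolis acceptance rule does under a falling temperature. A hedged one-dimensional sketch: the test function, geometric cooling schedule, and step size are arbitrary choices for illustration; the function has a local minimum near x ≈ 1.13 and the global one near x ≈ -1.30.

```python
import math

import numpy as np


def f(x):
    """Test function: local minimum near x = 1.13, global minimum near x = -1.30."""
    return x**4 - 3 * x**2 + x


def accept_probability(delta, temperature):
    """Metropolis rule: always accept an improvement, accept a worse move with exp(-delta/T)."""
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)


def simulated_annealing(x0, steps=5000, t_start=2.0, t_end=1e-3, step_size=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for k in range(steps):
        # geometric cooling: high T early (worse moves often accepted), low T late (rarely)
        T = t_start * (t_end / t_start) ** (k / (steps - 1))
        cand = x + step_size * rng.standard_normal()
        fc = f(cand)
        if rng.random() < accept_probability(fc - fx, T):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f


best_x, best_f = simulated_annealing(1.5)  # start in the basin of the local minimum
```

A pure gradient step from x = 1.5 would stop at the local minimum; the early uphill moves are what let the search cross the barrier.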
Evolutionary algorithms are loosely based on evolutionary analogies. Simulated annealing is based on analogies to annealing in metals. Particle swarm techniques are also inspired by biological systems. In all cases you should compare results to a simple random (a.k.a. "Monte Carlo") sampling of configurations; this will often yield equivalent results.
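A pure random-sampling baseline of the kind suggested above takes only a few lines, and it is fairest when given the same number of function evaluations as the method it is compared against. The sketch assumes box bounds on each parameter; the names and the example objective are hypothetical.

```python
import numpy as np


def random_search(objective, lower, upper, n_samples=10_000, seed=0):
    """Evaluate the objective at uniformly random points in a box and keep the best."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    best_x, best_f = None, np.inf
    for _ in range(n_samples):
        x = lower + (upper - lower) * rng.random(lower.shape)
        fx = objective(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def sphere(x):
    return float(np.sum(x ** 2))  # example objective with its minimum at the origin


best_x, best_f = random_search(sphere, [-5, -5], [5, 5])
```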
My advice is to start off with a deterministic gradient-based technique, since these generally require far fewer function evaluations than stochastic/Monte Carlo techniques. When you hear hoofbeats, think horses, not zebras. Run the optimization from several different starting points and, unless you are dealing with a particularly nasty problem, you should end up with roughly the same minimum. If not, then you might have zebras and should consider using a global optimization method.
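Running a gradient-based minimizer from several starting points and comparing where it ends up can be sketched as follows. The one-dimensional function, learning rate, and starting points are illustrative assumptions; the function has two minima (near x ≈ -1.30 and x ≈ 1.13), so different starts land in different places, which is the "zebras" signal described above.

```python
import numpy as np


def f(x):
    return x**4 - 3 * x**2 + x


def grad_f(x):
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, iters=2000):
    """Plain fixed-step gradient descent from a single starting point."""
    x = x0
    for _ in range(iters):
        x -= lr * grad_f(x)
    return x


starts = [-2.0, -1.0, 0.0, 1.0, 2.0]
minima = [gradient_descent(x0) for x0 in starts]
distinct = np.unique(np.round(minima, 3))  # how many different minima were reached
best = min(minima, key=f)
print(distinct, best, f(best))
```

If every start had converged to the same point, a plain local method would likely be enough; two distinct endpoints suggest a global method (or at least many more starts) is worth trying.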
Well, I think "Monte Carlo methods" is the general name for methods that use random numbers to solve optimization problems. In this sense, even evolutionary algorithms are a type of Monte Carlo method if they use random numbers (and in fact they do).
Other Monte Carlo methods include Metropolis, Wang-Landau, parallel tempering, etc.
On the other hand, evolutionary methods use 'techniques' borrowed from nature, such as mutation, crossover, etc.
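The mutation and crossover operators mentioned above can be seen in a toy genetic algorithm. This sketch maximizes the number of 1-bits in a bit string (the classic "OneMax" toy problem); tournament selection, one-point crossover, the mutation rate, and carrying the best individual forward each generation (elitism) are illustrative choices, not the only way to build an evolutionary algorithm. Every operator draws random numbers, which is the sense in which the answer calls such algorithms Monte Carlo methods.

```python
import numpy as np


def onemax_ga(n_bits=30, pop_size=40, generations=200, mutation_rate=0.02, seed=0):
    """Toy genetic algorithm maximizing the number of 1-bits (OneMax)."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    for _ in range(generations):
        fitness = pop.sum(axis=1)
        # selection: binary tournaments, the fitter of two random individuals becomes a parent
        a = rng.integers(0, pop_size, pop_size)
        b = rng.integers(0, pop_size, pop_size)
        parents = np.where((fitness[a] >= fitness[b])[:, None], pop[a], pop[b])
        # crossover: swap the tails of consecutive parents after a random cut point
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_bits)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
        # mutation: flip each bit independently with probability mutation_rate
        flips = rng.random(children.shape) < mutation_rate
        children = np.where(flips, 1 - children, children)
        # elitism: carry the best individual of this generation over unchanged
        children[0] = pop[fitness.argmax()]
        pop = children
    fitness = pop.sum(axis=1)
    best = fitness.argmax()
    return pop[best], int(fitness[best])


best_bits, best_fitness = onemax_ga()
print(best_fitness, best_bits)
```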
