Multithreaded Eigen linear solver: using the IncompleteLU preconditioner with BiCGSTAB

I am trying to solve a large sparse linear system with BiCGSTAB in Eigen. I have to run the code in parallel, and it seems the IncompleteLU preconditioner is the only one with which my solution converges. However, when I use BiCGSTAB with the IncompleteLU preconditioner, the code runs sequentially.
Is it possible to change BiCGSTAB.h and use IncompleteLU instead of DiagonalPreconditioner?

BiCGSTAB itself can run in parallel in Eigen if your code is compiled with OpenMP. This is true regardless of your choice of preconditioner. However, what I believe you've observed is that the ILUT preconditioner implementation in Eigen is serial. This is an algorithmic limitation of ILUT, not an implementation issue: applying an incomplete LU preconditioner requires a forward and a backward triangular solve at every iteration, and each row of those solves depends on rows solved before it. So the only way around it is to use another preconditioner. For reference, ViennaCL, another linear algebra library, implements some interesting parallel preconditioners: http://viennacl.sourceforge.net/doc/manual-algorithms.html#manual-algorithms-preconditioners-parallel-ilu0 . If you're specifically interested in an LU-type preconditioner, you may want to read the paper by Chow and Patel, which is implemented in ViennaCL: https://www.cc.gatech.edu/~echow/pubs/parilu-sisc.pdf .
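To make the data dependency concrete, here is a minimal Rust sketch (not Eigen's code, whose storage and solve routines differ) of forward substitution with a sparse lower-triangular factor stored in compressed sparse row (CSR) form. `CsrLower` and `forward_substitute` are hypothetical names for illustration. Each `x[i]` reads entries of `x` produced by earlier iterations, so the outer loop cannot simply be split across threads.

```rust
/// Hypothetical minimal CSR storage for a lower-triangular factor L.
pub struct CsrLower {
    pub row_ptr: Vec<usize>, // row i occupies col_idx/vals[row_ptr[i]..row_ptr[i + 1]]
    pub col_idx: Vec<usize>,
    pub vals: Vec<f64>,
}

/// Solves L x = b by forward substitution, one row at a time.
pub fn forward_substitute(l: &CsrLower, b: &[f64]) -> Vec<f64> {
    let n = b.len();
    assert_eq!(l.row_ptr.len(), n + 1, "row_ptr must have n + 1 entries");
    let mut x = vec![0.0; n];
    for i in 0..n {
        let mut sum = b[i];
        let mut diag = 0.0;
        for k in l.row_ptr[i]..l.row_ptr[i + 1] {
            let j = l.col_idx[k];
            assert!(j <= i, "matrix is not lower triangular");
            if j == i {
                diag = l.vals[k];
            } else {
                // Loop-carried dependency: x[j] (j < i) was computed
                // in an earlier iteration of the outer loop.
                sum -= l.vals[k] * x[j];
            }
        }
        assert!(diag != 0.0, "zero or missing diagonal in row {i}");
        x[i] = sum / diag;
    }
    x
}
```

The backward solve with U has the same structure, processed in reverse row order.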

Related

How to permute a Rust SIMD vector by another variable vector?

OK. I've recently started learning Rust by rewriting my program in Rust, though I may have started with something too hard: I am using the portable SIMD module and spending restless nights looking for the right functions. Today I looked for a way to permute an i8x16 vector by another i8x16 vector, like _mm_shuffle_epi8 from SSSE3, but sadly didn't find anything.
Please help me find a function in std::simd that permutes a vector by another variable vector.
Preferably it should be as fast as possible, and I want to stay within SSE or AVX, not AVX2 or wider.
Based on this issue, it is not yet implemented for std::simd: Introduce "dynamic swizzling" into LLVMIR and Rust intrinsics.
For now, you will have to resort to calling std::arch::{x86,x86_64}::_mm_shuffle_epi8 (or whatever you were after) manually, with cfg-wrapping of course if you want it to stay portable.
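A minimal sketch of that approach, assuming the i8x16 values are moved in and out as plain `[i8; 16]` arrays (on nightly, `Simd::from_array` and `to_array` can do that conversion). The function names are hypothetical. The SSSE3 path is taken only after runtime detection with `is_x86_feature_detected!`, and a scalar fallback with the same semantics as `pshufb` keeps the code portable: a lane whose index byte has its top bit set becomes 0; otherwise only the low four bits of the index select a byte.

```rust
/// Scalar reference with pshufb semantics (hypothetical helper).
pub fn shuffle_bytes_scalar(table: [i8; 16], idx: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        let k = idx[i] as u8; // reinterpret the index byte as unsigned
        out[i] = if k & 0x80 != 0 { 0 } else { table[(k & 0x0F) as usize] };
    }
    out
}

/// SSSE3 version; the caller must ensure the CPU supports SSSE3.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "ssse3")]
unsafe fn shuffle_bytes_ssse3(table: [i8; 16], idx: [i8; 16]) -> [i8; 16] {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::{__m128i, _mm_loadu_si128, _mm_shuffle_epi8, _mm_storeu_si128};
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_shuffle_epi8, _mm_storeu_si128};

    let mut out = [0i8; 16];
    unsafe {
        // Unaligned loads/stores, so plain arrays are fine.
        let t = _mm_loadu_si128(table.as_ptr() as *const __m128i);
        let i = _mm_loadu_si128(idx.as_ptr() as *const __m128i);
        let r = _mm_shuffle_epi8(t, i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
    }
    out
}

/// Dispatches to SSSE3 when available, otherwise falls back to scalar code.
pub fn shuffle_bytes(table: [i8; 16], idx: [i8; 16]) -> [i8; 16] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was just checked at runtime.
            return unsafe { shuffle_bytes_ssse3(table, idx) };
        }
    }
    shuffle_bytes_scalar(table, idx)
}
```

If the binary is always built with SSSE3 enabled, the runtime check could instead be replaced by `#[cfg(target_feature = "ssse3")]`.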

Difference between R::runif() and Rcpp::runif()

I'm learning to use Rcpp in R. Could you please explain the difference between R::runif() and Rcpp::runif()?
I have three questions:
Do these two functions produce the same stream of random numbers, given that we set the same seed before running each of them?
Which function is preferable when using Rcpp? It seems to me that the two functions produce the same thing, but Rcpp::runif() runs faster.
How do I call Rcpp::runif() from a .R file? Is it true that Rcpp::runif() can only be called from a .cpp file and not from R? (It seems to me that Rcpp::runif() is mostly used to write other C++ functions, which I then import into R with sourceCpp().)
Thank you very much for your help!
I suspect this question is a duplicate, so I may close it, but here goes:
Yes, they do. The whole point of the RNG interfaces is to guarantee exactly that.
Entirely up to you. Sometimes you want to use or wrap the C API, and you have R::runif() for that. Sometimes you want efficient vector operations, for which you have Rcpp::runif().
You write a C++ function that accesses the C++ API. Note that not all of those functions will be faster than calling what R offers when what R offers is already vectorised. Your wrapper around Rcpp::runif() will not perform much differently from calling stats::runif(). You use the C++ accessor in C++ code when writing something you cannot easily get from R.
Edit: This Rcpp Gallery post has some background and examples.

Equivalent terms of LLVM IR for watermarking by renumbering?

I want to apply an algorithm for watermarking that basically reorders equivalent terms of a programming language:
https://books.google.dk/books?id=mig-bH3u0Z0C&pg=PT595&lpg=PT595&dq=obfuscation+renumbering+register&source=bl&ots=b3vMhp-yTq&sig=RERdnDNewRqBi7ZmSNMlsnPy-Hw&hl=da&sa=X&ved=0ahUKEwiLw-zWrpnSAhWEHJoKHXCpAkMQ6AEIGTAA#v=onepage&q=obfuscation%20renumbering%20register&f=false
Say T1, T2, ..., Tn are equivalent terms of the language; then the watermark is a permutation f such that f(Ti) = Tj.
In this case the programming language is LLVM IR, which is an intermediate language.
The book gives an example of renumbering registers by applying a permutation. However, aren't registers out of scope for LLVM IR, since they are a lower-level detail?
I've been thinking about equivalent terms in LLVM IR but cannot come up with any. The more the better, since more equivalent terms allow a more flexible degree of watermarking.
Can you think of equivalent terms of LLVM IR such that each could be substituted for some other? Or is it only possible to do such watermarking at the machine code level?
Even if you perform this at the IR level (and you could, by changing patterns), you won't get far, since the passes at the machine instruction level (instruction selection, scheduling, register allocation) would reshuffle everything. You are better off writing a (potentially post-RA) machine instruction level pass.
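Whichever level is chosen, the watermark itself can be carried by the ordering of n interchangeable items (registers at the machine level, for example). A minimal Rust sketch of one way to do that, assuming the factorial number system (Lehmer code) as the encoding, which is my choice and not necessarily the scheme in the book: an integer below n! maps to a unique permutation of 0..n and back, so n items carry about log2(n!) bits. `encode_permutation` and `decode_permutation` are hypothetical names.

```rust
/// Encodes watermark `w` (< n!) as a permutation of 0..n (Lehmer code).
pub fn encode_permutation(mut w: u64, n: usize) -> Vec<usize> {
    assert!(n <= 20, "n! must fit in a u64");
    let mut fact = vec![1u64; n + 1]; // fact[k] = k!
    for k in 1..=n {
        fact[k] = fact[k - 1] * k as u64;
    }
    assert!(w < fact[n], "watermark too large for {n} items");
    let mut pool: Vec<usize> = (0..n).collect();
    let mut perm = Vec::with_capacity(n);
    for i in (0..n).rev() {
        // Next mixed-radix digit selects which remaining item comes next.
        let digit = (w / fact[i]) as usize;
        w %= fact[i];
        perm.push(pool.remove(digit));
    }
    perm
}

/// Recovers the watermark from a permutation of 0..n.
pub fn decode_permutation(perm: &[usize]) -> u64 {
    let n = perm.len();
    let mut pool: Vec<usize> = (0..n).collect();
    let mut w = 0u64;
    for (pos, &p) in perm.iter().enumerate() {
        let digit = pool.iter().position(|&x| x == p).expect("not a permutation");
        pool.remove(digit);
        // Horner step in the mixed radix n, n-1, ..., 1.
        w = w * (n - pos) as u64 + digit as u64;
    }
    w
}
```

Applying the watermark then means emitting the interchangeable items in the order given by `encode_permutation`; extracting it means reading that order back and calling `decode_permutation`.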

Excel Solver (Simplex LP) binary constraints

I am solving an optimization problem. The problem has binary constraints. Solver is (during iteration) setting those binary variables to decimals between 0 and 1 (approximating a relaxed gradient search). I wish to tell Solver that it should only search over the discrete values 0 and 1.
Is there a way to do this?
Alternatively, is there an algorithm in OpenSolver which does this, mimics Simplex LP, and provides a global optimum?
The cheap way to do it is to write a for-loop and iterate over the values. I was wondering if there was a way to phrase it so that a nonlinear problem becomes a linear problem.
Thanks.
The GRG Nonlinear and Simplex LP methods both use the Branch & Bound method when faced with integer constraints. This method "relaxes" the integer requirement first, finds a solution, then fixes one of the variables to an integer value and finds a new solution. See the Solver online documentation.
It is a brute force search method and can take a considerable amount of time.
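A minimal Rust sketch of the relax-then-branch idea on a hypothetical 0-1 knapsack problem (an illustration of the technique, not Solver's actual implementation). At each node, the binary variables not yet fixed are relaxed to the interval [0, 1]; for a knapsack that relaxation has a closed-form greedy solution, and its value bounds the best integer answer in the subtree, so branches that cannot beat the best solution found so far are pruned. Positive weights are assumed, and `Item` and `knapsack_01` are hypothetical names.

```rust
/// Hypothetical 0-1 item: maximize total value subject to total weight <= capacity.
#[derive(Clone, Copy)]
pub struct Item {
    pub value: f64,
    pub weight: f64, // assumed > 0
}

/// LP-relaxation bound: remaining x_i may take any value in [0, 1].
/// Items must be sorted by value/weight, descending.
fn relaxed_bound(items: &[Item], start: usize, mut cap: f64, mut val: f64) -> f64 {
    for it in &items[start..] {
        if it.weight <= cap {
            cap -= it.weight;
            val += it.value;
        } else {
            return val + it.value * cap / it.weight; // one fractional item
        }
    }
    val
}

fn branch(items: &[Item], i: usize, cap: f64, val: f64, best: &mut f64) {
    if val > *best {
        *best = val; // new incumbent
    }
    // Prune: even the relaxed problem cannot beat the incumbent.
    if i == items.len() || relaxed_bound(items, i, cap, val) <= *best {
        return;
    }
    if items[i].weight <= cap {
        branch(items, i + 1, cap - items[i].weight, val + items[i].value, best); // x_i = 1
    }
    branch(items, i + 1, cap, val, best); // x_i = 0
}

/// Returns the optimal total value.
pub fn knapsack_01(mut items: Vec<Item>, cap: f64) -> f64 {
    items.sort_by(|a, b| (b.value / b.weight).partial_cmp(&(a.value / a.weight)).unwrap());
    let mut best = 0.0;
    branch(&items, 0, cap, 0.0, &mut best);
    best
}
```

Pruning only helps when the relaxed bound is tight; in the worst case every combination is still explored, which is why this approach can take a considerable amount of time.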
The Evolutionary method uses its own algorithm for dealing with integer constraints and is typically much faster than the other two methods.
You ask about linearizing a non-linear problem - you would need to provide more specific information in order to answer that (e.g. What is your equation? How have you set up your solver problem? etc.)

Hypothesis testing and GPGPU

I'm very new to GPGPU and programming. I'd like to know whether statistical hypothesis tests such as the one-sample Kolmogorov–Smirnov test (K–S test) and Levene's test could be implemented on a GPGPU (SIMD) using CUDA. If so, what would the limitations be?
I just read web definitions about these tests, but, if I understood correctly, they can be properly accelerated by the kind of parallelism expressed by SIMD (in particular as implemented by CUDA).
In the K-S test, one has to compute the difference between the hypothesized distribution function and the empirical distribution function estimated from the N samples, then take the maximum difference. In other words, one has to perform the same operation on N different values, which is exactly SIMD (single instruction, multiple data).
In Levene's test, the same kind of per-value operations (absolute deviations from the group center, then sums of squares) are again applied to N different values.
What SIMD can do is a sort of FOR statement over N value sets, provided that the iterations are independent of each other. In CUDA, for example, each iteration becomes a thread of a kernel, and the threads are distributed over the processing elements of the graphics device; as long as there are enough processing elements for all the data, the loop runs in roughly the time of a single iteration.
The CUDA toolkit provides a specific C/C++ compiler (NVCC) in which specially marked functions (kernels) are compiled for and dispatched to the GPGPU rather than the CPU, and therefore distributed across its parallel processing elements.
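A minimal sketch of the map-then-reduce structure for the one-sample K-S statistic, D = max over i of max(i/N - F(x_(i)), F(x_(i)) - (i-1)/N), written as a sequential Rust iterator chain rather than CUDA. It assumes the samples are already sorted (sorting is itself parallelisable on a GPU but is not a simple per-element map) and that the caller supplies the hypothesized CDF `F`; `ks_statistic` is a hypothetical name. The `map` step corresponds to one GPU thread per sample, the `fold` to a parallel max-reduction.

```rust
/// One-sample K-S statistic for `sorted` samples against the CDF `cdf`.
pub fn ks_statistic<F: Fn(f64) -> f64>(sorted: &[f64], cdf: F) -> f64 {
    let n = sorted.len() as f64;
    sorted
        .iter()
        .enumerate()
        // Map: each sample is handled independently (0-based index i).
        .map(|(i, &x)| {
            let f = cdf(x);
            let above = (i as f64 + 1.0) / n - f; // ECDF just after x
            let below = f - i as f64 / n; // ECDF just before x
            above.max(below)
        })
        // Reduce: maximum over all per-sample differences.
        .fold(0.0, f64::max)
}
```

For example, testing sorted samples against the uniform distribution on [0, 1] only needs `ks_statistic(&samples, |x| x)`.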
