Transform matrix of 3D positions with corresponding transformation matrix - transform

I have a matrix of 3D points (positions), in which every column represents a 3D point expressed in a local frame at a specific time instance.
The transforms (row)vector contains the transformation matrix of the moving local frame at each time instance, i.e. the ith transformation matrix corresponds with the ith column of positions.
I want to calculate the position in the global frame (transformed) by applying each transformation matrix to its corresponding point.
This can be done with a for loop as follows:
Eigen::Matrix<Eigen::Isometry3d, 1, Eigen::Dynamic> transforms;
Eigen::Matrix<double, 3, Eigen::Dynamic> positions, transformed;
for (int i = 0; i < positions.cols(); ++i)
    transformed.col(i) = transforms(i) * positions.col(i);
I was wondering if it is possible to perform the same operation avoiding the for loop. I tried the following two approaches, but they are giving me compilation errors:
Apply the transformation columnwise:
transformed = transforms.colwise() * positions.colwise ();
error: invalid operands to binary expression (ColwiseReturnType (aka VectorwiseOp<Eigen::Matrix<Eigen::Transform<double, 3, 1, 0>, 1, -1, 1, 1, -1>, Vertical>) and ColwiseReturnType (aka VectorwiseOp<Eigen::Matrix<double, 3, -1, 0, 3, -1>, Vertical>))
Apply the transformation using arrays:
transformed = transforms.array() * positions.array().colwise ();
error: invalid operands to binary expression (ArrayWrapper<Eigen::Matrix<Eigen::Transform<double, 3, 1, 0>, 1, -1, 1, 1, -1> > and ColwiseReturnType (aka VectorwiseOp<Eigen::ArrayWrapper<Eigen::Matrix<double, 3, -1, 0, 3, -1> >, Vertical>))
Question: How can I rewrite the for loop to eliminate the (explicit) for loop?

That's not easy, but it is doable. First, you have to tell Eigen that you allow scalar products between an Isometry3d and a Vector3d, and that the result is a Vector3d:
namespace Eigen {
template<>
struct ScalarBinaryOpTraits<Isometry3d,Vector3d,internal::scalar_product_op<Isometry3d,Vector3d> > {
    typedef Vector3d ReturnType;
};
}
Then, you need to interpret your 3xN matrices as a vector of Vector3d using Map:
auto as_vec_of_vec3 = [] (Matrix3Xd& v) { return Matrix<Vector3d,1,Dynamic>::Map(reinterpret_cast<Vector3d*>(v.data()), v.cols()); };
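The Map trick relies on the storage layout: a column-major 3xN matrix of doubles keeps each column as three consecutive values, which is the same layout as N packed 3-vectors. Below is a minimal sketch of that layout without Eigen. `Vec3` and `column` are hypothetical names introduced only for this illustration, and the sketch copies values instead of casting pointers.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 3D point (not Eigen::Vector3d itself).
using Vec3 = std::array<double, 3>;

// Return column j of a column-major 3xN buffer.
// Column j occupies data[3*j], data[3*j + 1], data[3*j + 2].
Vec3 column(const std::vector<double>& data, std::size_t j) {
    return Vec3{{data[3 * j], data[3 * j + 1], data[3 * j + 2]}};
}
```

With a buffer holding {1, 2, 3, 4, 5, 6}, `column(data, 1)` yields the point (4, 5, 6), which is what element 1 of the mapped vector of Vector3d refers to.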
Finally, you can use cwiseProduct to carry out all products at once:
as_vec_of_vec3(transformed) = transforms.cwiseProduct(as_vec_of_vec3(positions));
Putting everything together:
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;
using namespace std;
namespace Eigen {
template<>
struct ScalarBinaryOpTraits<Isometry3d,Vector3d,internal::scalar_product_op<Isometry3d,Vector3d> > {
    typedef Vector3d ReturnType;
};
}
int main()
{
    int n = 10;
    Matrix<Isometry3d, 1, Dynamic> transforms(n);
    Matrix<double, 3, Dynamic> positions(3,n), transformed(3,n);
    positions.setRandom();
    for (int i = 0; i < n; ++i)
        transforms(i).matrix().setRandom();
    auto as_vec_of_vec3 = [] (Matrix3Xd& v) { return Matrix<Vector3d,1,Dynamic>::Map(reinterpret_cast<Vector3d*>(v.data()), v.cols()); };
    as_vec_of_vec3(transformed) = transforms.cwiseProduct(as_vec_of_vec3(positions));
    cout << transformed << "\n\n";
}

This answer extends ggael's accepted answer to make it compatible with versions of Eigen older than 3.3.
Pre Eigen 3.3 compatibility
ScalarBinaryOpTraits was introduced in Eigen 3.3 as a replacement for internal::scalar_product_traits. Therefore, with versions before Eigen 3.3, one should specialize internal::scalar_product_traits instead:
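namespace Eigen {
namespace internal {
// Specialize inside Eigen::internal so that older compilers accept it
template<>
struct scalar_product_traits<Isometry3d,Vector3d> {
    enum {Defined = 1};
    typedef Vector3d ReturnType;
};
}
}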

Related

Eigen, parallel ConjugateGradient failed: More threads, more costs

I want to use a parallel ConjugateGradient in Eigen 3.3.7 (gitlab) to solve Ax=b, but the measured time increases as I add threads.
I tested the code from this question and changed the matrix dimension from 90000 to 9000000. Here is the code (the file is named test-cg-parallel.cpp):
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#include <Eigen/Sparse>
using namespace Eigen;
using namespace std;
// Use RowMajor to make use of multi-threading
typedef SparseMatrix<double, RowMajor> SpMat;
typedef Triplet<double> T;
// Assemble sparse matrix from
// https://eigen.tuxfamily.org/dox/TutorialSparse_example_details.html
void insertCoefficient(int id, int i, int j, double w, vector<T>& coeffs,
                       VectorXd& b, const VectorXd& boundary)
{
    int n = int(boundary.size());
    int id1 = i+j*n;
    if(i==-1 || i==n) b(id) -= w * boundary(j); // constrained coefficient
    else if(j==-1 || j==n) b(id) -= w * boundary(i); // constrained coefficient
    else coeffs.push_back(T(id,id1,w)); // unknown coefficient
}
void buildProblem(vector<T>& coefficients, VectorXd& b, int n)
{
    b.setZero();
    ArrayXd boundary = ArrayXd::LinSpaced(n, 0,M_PI).sin().pow(2);
    for(int j=0; j<n; ++j)
    {
        for(int i=0; i<n; ++i)
        {
            int id = i+j*n;
            insertCoefficient(id, i-1,j, -1, coefficients, b, boundary);
            insertCoefficient(id, i+1,j, -1, coefficients, b, boundary);
            insertCoefficient(id, i,j-1, -1, coefficients, b, boundary);
            insertCoefficient(id, i,j+1, -1, coefficients, b, boundary);
            insertCoefficient(id, i,j, 4, coefficients, b, boundary);
        }
    }
}
int main()
{
    int n = 3000; // size of the image
    int m = n*n; // number of unknowns (=number of pixels)
    // Assembly:
    vector<T> coefficients; // list of non-zeros coefficients
    VectorXd b(m); // the right hand side-vector resulting from the constraints
    buildProblem(coefficients, b, n);
    SpMat A(m,m);
    A.setFromTriplets(coefficients.begin(), coefficients.end());
    // Solving:
    // Use ConjugateGradient with Lower|Upper as the UpLo template parameter to make use of multi-threading
    clock_t time_start, time_end;
    time_start=clock();
    ConjugateGradient<SpMat, Lower|Upper> solver(A);
    VectorXd x = solver.solve(b); // use the factorization to solve for the given right hand side
    time_end=clock();
    cout<<"time use:"<<1000*(time_end-time_start)/(double)CLOCKS_PER_SEC<<"ms"<<endl;
    return 0;
}
I compiled the code with gcc 7.4.0 on an Intel Xeon E2186G CPU with 6 cores (12 threads). The compile and run details are as follows:
liu#liu-Precision-3630-Tower:~/test$ g++ test-cg-parallel.cpp -O3 -fopenmp -o cg
liu#liu-Precision-3630-Tower:~/test$ OMP_NUM_THREADS=1 ./cg
time use:747563ms
liu#liu-Precision-3630-Tower:~/test$ OMP_NUM_THREADS=4 ./cg
time use: 1.49821e+06ms
liu#liu-Precision-3630-Tower:~/test$ OMP_NUM_THREADS=8 ./cg
time use: 2.60207e+06ms
Can anyone give me some advice? Thanks a lot.

Check BIC via RcppArmadillo(Rcpp)

I want to check the usage of BIC in statistics.
My small example, saved as check_bic.cpp, is as follows:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
List check_bic(const int N = 10, const int p = 20, const double seed=0){
    arma_rng::set_seed(seed); // for reproducibility
    arma::mat Beta = randu(p,N); //randu/randn:random values(uniform and normal distributions)
    arma::vec Bic = randu(N);
    uvec ii = find(Bic == min(Bic)); // may be just one or several elements
    int id = ii(ii.n_elem); // fetch the last one element
    vec behat = Beta.col(id); // fetch the id column of matrix Beta
    List ret;
    ret["Bic"] = Bic;
    ret["ii"] = ii;
    ret["id"] = id;
    ret["Beta"] = Beta;
    ret["behat"] = behat;
    return ret;
}
Then I compile check_bic.cpp in R by
library(Rcpp)
library(RcppArmadillo);
sourceCpp("check_bic.cpp")
and the compilation passes successfully.
However, when I run
check_bic(10,20,0)
in R, it shows the following error:
error: Mat::operator(): index out of bounds
Error in check_bic(10, 20, 0) : Mat::operator(): index out of bounds
I checked the .cpp code line by line, and I guess the problem probably
happens at
uvec ii = find(Bic == min(Bic)); // may be just one or several elements
int id = ii(ii.n_elem); // fetch the last one element
since, if uvec ii has only one element, ii.n_elem may be NaN or something
else in Rcpp (while this is fine in Matlab), and I don't know how to
deal with this case. Any help?
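A likely cause, stated as an assumption since the thread has no accepted answer here: C++ containers are indexed from 0, so the last valid index of ii is ii.n_elem - 1, and ii(ii.n_elem) is always one past the end (n_elem is an unsigned element count and is never NaN). The same rule can be sketched with a plain std::vector. `last_min_index` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// 0-based index of the last occurrence of the minimum value.
std::size_t last_min_index(const std::vector<double>& v) {
    if (v.empty()) throw std::invalid_argument("empty vector");
    std::size_t best = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i] <= v[best]) best = i;  // "<=" keeps the last of equal minima
    }
    return best;  // always in [0, v.size() - 1]
}
```

In the Armadillo code above, the corresponding change would be `int id = ii(ii.n_elem - 1);`.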

RcppArrayFire passing a matrix row as af::array input

In this simple example I would like to subset a matrix by row and pass it to another cpp function; the example demonstrates this works by passing an input array to the other function first.
#include "RcppArrayFire.h"
using namespace Rcpp;
af::array theta_check_cpp( af::array theta){
    if(*theta(1).host<double>() >= 1){
        theta(1) = 0;
    }
    return theta;
}
// [[Rcpp::export]]
af::array theta_check(RcppArrayFire::typed_array<f64> theta){
    const int theta_size = theta.dims()[0];
    af::array X(2, theta_size);
    X(0, af::seq(theta_size)) = theta_check_cpp( theta );
    X(1, af::seq(theta_size)) = theta;
    // return X;
    Rcpp::Rcout << " works till here";
    return theta_check_cpp( X.row(1) );
}
/*** R
theta <- c( 2, 2, 2)
theta_check(theta)
*/
The constructor you are using to create X has an argument ty for the data type, which defaults to f32. Therefore X uses 32 bit floats and you cannot extract a 64 bit host pointer from that. Either use
af::array X(2, theta_size, f64);
to create an array using 64 bit doubles, or extract a 32 bit host pointer via
if(*theta(1).host<float>() >= 1){
...

Creating a Templated Function to Fill a Vector with another depending on Size

Is there a base function in Rcpp that:
Fills entirely by a single value if size of a vector is 1.
Fills the other vector completely if same length.
Fills with an NA value if the vectors are not the same length and no vector is of size 1.
I've written the above criteria as a function below using a NumericVector as an example. If there isn't a base function in Rcpp that performs said operations there should be a way to template the function so that given any type of vector (e.g. numeric, character and so on) the above logic would be able to be executed.
// [[Rcpp::export]]
NumericVector cppvectorize(NumericVector x,NumericVector y) {
    NumericVector y_out(y.size());
    if(x.size() == 1) {
        for(int i = 0; i < y_out.size(); i++) {
            y_out[i] = x[0];
        }
    } else if(x.size() == y_out.size()) {
        for(int i = 0; i < y_out.size(); i++) {
            y_out[i] = x[i];
        }
    } else {
        for(int i = 0; i < y_out.size(); i++) {
            y_out[i] = NA_REAL;
        }
    }
    return y_out;
}
Unfortunately, the closest you will come to such a function is one of the rep variants that Rcpp supports. However, none of the variants match the desired output. Therefore, the only option is to implement a templated version of your desired function.
To create the templated function, we first create a routing function that handles the dispatch of SEXP objects. The rationale behind the routing function is that SEXP objects can be passed in from R and returned to R using Rcpp Attributes, whereas a templated function cannot be exported directly. As a result, we need to specify the SEXPTYPE (used as RTYPE) dispatches that are possible. The TYPEOF() macro retrieves the type code of an object. Using a switch statement, we can dispatch this code to the appropriate case.
After dispatching, we arrive at the templated function. The templated function makes use of the base Vector class of Rcpp to simplify the data flow. From here, the notable novelty is the use of ::traits::get_na<RTYPE>() to retrieve the appropriate NA value for the type and fill the output with it.
With the plan in place, let's look at the code:
#include <Rcpp.h>
using namespace Rcpp;
// ---- Templated Function
template <int RTYPE>
Vector<RTYPE> vec_helper(const Vector<RTYPE>& x, const Vector<RTYPE>& y) {
    Vector<RTYPE> y_out(y.size());
    if(x.size() == 1){
        y_out.fill(x[0]);
    } else if (x.size() == y.size()) {
        y_out = x;
    } else {
        y_out.fill(::traits::get_na<RTYPE>());
    }
    return y_out;
}
// ---- Dispatch function
// [[Rcpp::export]]
SEXP cppvectorize(SEXP x, SEXP y) {
    switch (TYPEOF(x)) {
        case INTSXP: return vec_helper<INTSXP>(x, y);
        case REALSXP: return vec_helper<REALSXP>(x, y);
        case STRSXP: return vec_helper<STRSXP>(x, y);
        default: Rcpp::stop("SEXP Type Not Supported.");
    }
    // Need to return a value even though this will never be triggered
    // to quiet the compiler.
    return R_NilValue;
}
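The same three-way rule can be sketched without Rcpp, which may help when reading the template above. This is only an illustration under stated assumptions: `vectorize_like` is a hypothetical name, std::vector stands in for Rcpp's Vector class, and the NA marker is passed in by the caller because plain C++ types have no built-in NA.

```cpp
#include <cstddef>
#include <vector>

// Recycle x to the length of y: broadcast a single value, copy on equal
// lengths, otherwise fill with the caller-supplied `na` marker.
template <typename T>
std::vector<T> vectorize_like(const std::vector<T>& x,
                              const std::vector<T>& y,
                              const T& na) {
    std::vector<T> out(y.size(), na);   // start from "all NA"
    if (x.size() == 1) {
        out.assign(y.size(), x[0]);     // broadcast the single value
    } else if (x.size() == y.size()) {
        out = x;                        // same length: plain copy
    }
    return out;
}
```

One function template covers every element type here because the type is known at compile time. The Rcpp version needs the extra switch only because the type of a SEXP is known at run time.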
Sample Tests
Here we conduct a few sample tests on each of the supported data types:
# Case 1: length(x) != length(y) && length(x) > 1
x = 1:5
y = 2
cppvectorize(x, y)
## [1] NA
# Case 2: length(x) == length(y)
x = letters[1:5]
y = letters[6:10]
cppvectorize(x, y)
## [1] "a" "b" "c" "d" "e"
# Case 3: length(x) == 1
x = 1.5
y = 2.5:6.5
cppvectorize(x, y)
## [1] 1.5 1.5 1.5 1.5 1.5

Draw cube vertices with fewest number of steps

What's the fewest number of steps needed to draw all of the cube's vertices, without picking up the pen from the paper?
So far I have reduced it to 16 steps:
0, 0, 0
0, 0, 1
0, 1, 1
1, 1, 1
1, 1, 0
0, 1, 0
0, 0, 0
1, 0, 0
1, 0, 1
0, 0, 1
0, 1, 1
0, 1, 0
1, 1, 0
1, 0, 0
1, 0, 1
1, 1, 1
I presume it can be reduced to fewer than 16 steps, as there are only 12 edges to be drawn.
You can view a working example in three.js javascript here:
http://jsfiddle.net/kmturley/5aeucehf/show/
Well, I coded a small brute-force solver for this:
the best solution has 16 vertexes
it took about 11.6 sec to compute
everything is in C++ (visualization by OpenGL)
First the cube representation:
//---------------------------------------------------------------------------
#define a 0.5
double pnt[]=
{
    -a,-a,-a, // point 0
    -a,-a,+a,
    -a,+a,-a,
    -a,+a,+a,
    +a,-a,-a,
    +a,-a,+a,
    +a,+a,-a,
    +a,+a,+a, // point 7
    1e101,1e101,1e101, // end tag
};
#undef a
int lin[]=
{
    0,1,
    0,2,
    0,4,
    1,3,
    1,5,
    2,3,
    2,6,
    3,7,
    4,5,
    4,6,
    5,7,
    6,7,
    -1,-1, // end tag
};
// int solution[]={ 0, 1, 3, 1, 5, 4, 0, 2, 3, 7, 5, 4, 6, 2, 6, 7, -1 }; // found polyline solution
//---------------------------------------------------------------------------
void draw_lin(double *pnt,int *lin)
{
    glBegin(GL_LINES);
    for (int i=0;lin[i]>=0;)
    {
        glVertex3dv(pnt+(lin[i]*3)); i++;
        glVertex3dv(pnt+(lin[i]*3)); i++;
    }
    glEnd();
}
//---------------------------------------------------------------------------
void draw_pol(double *pnt,int *pol)
{
    glBegin(GL_LINE_STRIP);
    for (int i=0;pol[i]>=0;i++) glVertex3dv(pnt+(pol[i]*3));
    glEnd();
}
//---------------------------------------------------------------------------
Now the solver:
//---------------------------------------------------------------------------
struct _vtx // vertex
{
    List<int> i; // connected to (vertexes...)
    _vtx(){}; _vtx(_vtx& a){ *this=a; }; ~_vtx(){}; _vtx* operator = (const _vtx *a) { *this=*a; return this; }; /*_vtx* operator = (const _vtx &a) { ...copy... return this; };*/
};
const int _max=16; // known solution size (do not bother to find longer solutions)
int use[_max],uses=0; // temp line usage flag
int pol[_max+2],pols=0; // temp solution (+2: the solution test reads one entry past the last point)
int sol[_max+2],sols=0; // best found solution
List<_vtx> vtx; // model vertexes + connection info
//---------------------------------------------------------------------------
void _solve(int a)
{
    _vtx *v; int i,j,k,l,a0,a1,b0,b1;
    // add point to the current polyline
    pol[pols]=a; pols++; v=&vtx[a];
    // test for solution
    for (l=0,i=0;i<uses;i++) use[i]=0;
    for (a0=pol[0],a1=pol[1],i=1;i<pols;i++,a0=a1,a1=pol[i])
        for (j=0,k=0;k<uses;k++)
        {
            b0=lin[j]; j++;
            b1=lin[j]; j++;
            if (!use[k]) if (((a0==b0)&&(a1==b1))||((a0==b1)&&(a1==b0))) { use[k]=1; l++; }
        }
    if (l==uses) // better solution found
        if ((pols<sols)||(sol[0]==-1))
            for (sols=0;sols<pols;sols++) sol[sols]=pol[sols];
    // recursion only if pol not too big
    if (pols+1<sols) for (i=0;i<v->i.num;i++) _solve(v->i.dat[i]);
    // back to previous state
    pols--; pol[pols]=-1;
}
//---------------------------------------------------------------------------
void solve(double *pnt,int *lin)
{
    int i,j,a0,a1;
    // init sizes
    for (i=0;i<_max;i++) { use[i]=0; pol[i]=-1; sol[i]=-1; }
    for(i=0,j=0;pnt[i]<1e100;i+=3,j++); vtx.allocate(j); vtx.num=j;
    for(i=0;i<vtx.num;i++) vtx[i].i.num=0;
    // init connections
    for(uses=0,i=0;lin[i]>=0;uses++)
    {
        a0=lin[i]; i++;
        a1=lin[i]; i++;
        vtx[a0].i.add(a1);
        vtx[a1].i.add(a0);
    }
    // start actual solution (does not matter which vertex on cube is first)
    pols=0; sols=_max+1; _solve(0);
    sol[sols]=-1; if (sol[0]<0) sols=0;
}
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
Usage:
solve(pnt,lin); // call once to compute the solution
glColor3f(0.2,0.2,0.2); draw_lin(pnt,lin); // draw gray outline
glColor3f(1.0,1.0,1.0); draw_pol(pnt,sol); // overwrite by solution to visually check correctness (Z-buffer must pass also on equal values!!!)
List
is just my own template for a dynamic array
List<int> x is equivalent to int x[]
x.add(5) ... adds 5 to the end of the list
x.num is the used size of the list in entries
x.allocate(100) preallocates the list to 100 entries (to avoid reallocation slowdowns)
solve(pnt,lin) algorithm
first prepare the vertex data
each vertex vtx[i] corresponds to the i-th point in the pnt table
its i[] list contains the index of every vertex connected to this vertex
start with vertex 0 (on a cube the start point is irrelevant)
otherwise there would be a for loop through every vertex as the start point
_solve(a)
it adds a vertex index to the current solution pol[pols]
then it tests how many lines are present in the current solution
and if all lines from lin[] are drawn and the solution is shorter than the one already found
it copies it as the new solution
after the test, if the current solution is not too long, it recursively adds the next vertex
as one of the vertexes connected to the last vertex used
to limit the number of combinations
at the end sol[] holds the solution vertex index list
sols is the number of vertexes used (number of lines + 1)
[Notes]
the code is not very clean but it works (sorry for that)
hope I did not forget to copy something
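The brute-force result can be cross-checked with a short parity argument: a single pen stroke can start and end at odd-degree vertices only, so at most two vertices may stay odd, and every retraced edge fixes the parity of two vertices. All 8 cube vertices have degree 3, so at least (8 - 2) / 2 = 3 edges must be retraced, giving 12 + 3 = 15 line segments, that is 16 points. Below is a minimal sketch of that bound together with a checker for a candidate polyline. `kCubeEdges`, `covers_all_edges` and `min_polyline_points` are hypothetical names, and the vertex numbering follows lin[] above.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// The 12 cube edges, same vertex numbering as lin[] in the solver.
const std::vector<Edge> kCubeEdges = {
    {0, 1}, {0, 2}, {0, 4}, {1, 3}, {1, 5}, {2, 3},
    {2, 6}, {3, 7}, {4, 5}, {4, 6}, {5, 7}, {6, 7}};

// Store an edge with the smaller vertex index first.
Edge normalized(int u, int v) { return Edge(std::min(u, v), std::max(u, v)); }

// True if every step of the path runs along an edge and all edges are drawn.
bool covers_all_edges(const std::vector<int>& path, const std::vector<Edge>& edges) {
    std::set<Edge> all;
    for (const Edge& e : edges) all.insert(normalized(e.first, e.second));
    std::set<Edge> seen;
    for (std::size_t i = 1; i < path.size(); ++i) {
        const Edge e = normalized(path[i - 1], path[i]);
        if (all.count(e) == 0) return false;  // step is not along an edge
        seen.insert(e);
    }
    return seen.size() == all.size();
}

// Lower bound on the number of points of one polyline that draws all edges.
std::size_t min_polyline_points(std::size_t vertex_count, const std::vector<Edge>& edges) {
    std::vector<int> degree(vertex_count, 0);
    for (const Edge& e : edges) { ++degree[e.first]; ++degree[e.second]; }
    std::size_t odd = 0;
    for (int d : degree) if (d % 2 != 0) ++odd;
    const std::size_t retraced = odd > 2 ? (odd - 2) / 2 : 0;
    return edges.size() + retraced + 1;  // segments + 1 = points
}
```

For the cube the bound is 16 points, and the polyline found by the solver (0, 1, 3, 1, 5, 4, 0, 2, 3, 7, 5, 4, 6, 2, 6, 7) passes the coverage check, so 16 cannot be improved.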
