Sort elements of a NumericMatrix by dim names - rcpp

I have a NumericMatrix m. Say m is the following (the dim names are listed below the matrix)
7 9 8
4 6 5
1 3 2
with column names = {"x", "z", "y"}, row names = {"z", "y", "x"}
I want the following output
1 2 3
4 5 6
7 8 9
with column names = {"x", "y", "z"}, row names = {"x", "y", "z"}
So what I want to do is the following -
Sort elements of each row according to the column names
Permute the rows such that their corresponding row names are sorted
Is there an easy way to do this in Rcpp for a general NumericMatrix?

This isn't necessarily the simplest approach, but it appears to work:
#include <Rcpp.h>
#include <map>
#include <string>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::NumericMatrix dim_sort(const Rcpp::NumericMatrix& m) {
    Rcpp::Function rownames("rownames");
    Rcpp::Function colnames("colnames");
    Rcpp::CharacterVector rn = rownames(m);
    Rcpp::CharacterVector cn = colnames(m);
    Rcpp::NumericMatrix result(Rcpp::clone(m));
    Rcpp::CharacterVector srn(Rcpp::clone(rn));
    Rcpp::CharacterVector scn(Rcpp::clone(cn));
    // name -> original position; a std::map iterates in sorted key order
    std::map<std::string, int> row_map;
    std::map<std::string, int> col_map;
    // fill the two maps separately so that non-square matrices also work
    for (int i = 0; i < rn.size(); i++) {
        row_map.insert(std::pair<std::string, int>(Rcpp::as<std::string>(rn[i]), i));
    }
    for (int j = 0; j < cn.size(); j++) {
        col_map.insert(std::pair<std::string, int>(Rcpp::as<std::string>(cn[j]), j));
    }
    typedef std::map<std::string, int>::const_iterator cit;
    int I = 0;                       // destination row
    for (cit rm_it = row_map.begin(); rm_it != row_map.end(); ++rm_it) {
        int i = rm_it->second;       // original row
        srn[I] = rm_it->first;
        int J = 0;                   // destination column
        for (cit cm_it = col_map.begin(); cm_it != col_map.end(); ++cm_it) {
            int j = cm_it->second;   // original column
            result(I, J) = m(i, j);
            scn[J] = cm_it->first;
            J++;
        }
        I++;
    }
    result.attr("dimnames") = Rcpp::List::create(srn, scn);
    return result;
}
/*** R
x <- matrix(
    c(7,9,8,4,6,5,1,3,2),
    nrow = 3,
    dimnames = list(
        c("z", "y", "x"),
        c("x", "z", "y")
    ),
    byrow = TRUE
)
R> x
  x z y
z 7 9 8
y 4 6 5
x 1 3 2
R> dim_sort(x)
  x y z
x 1 2 3
y 4 5 6
z 7 8 9
*/
I used a std::map<std::string, int> for two reasons:
maps automatically maintain a sorted order based on their keys, so by using the dim names as keys, the container does the sorting for us.
Letting a key's corresponding value be an integer representing the order in which it was added, we have an index for retrieving the appropriate value along a given dimension.
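To see the two points above in isolation, here is a minimal sketch in plain C++ (no Rcpp). The helper name sorted_positions is made up for illustration, and it assumes the names are unique, since a std::map silently keeps only one entry per key.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Returns the original positions of `names`, listed in sorted-name order.
// Assumes the names are unique: std::map keeps only one entry per key.
std::vector<int> sorted_positions(const std::vector<std::string>& names) {
    std::map<std::string, int> index_of;             // name -> original position
    for (std::size_t i = 0; i < names.size(); ++i) {
        index_of.insert({names[i], static_cast<int>(i)});
    }
    assert(index_of.size() == names.size());         // fails if a name was duplicated
    std::vector<int> order;
    order.reserve(names.size());
    for (const auto& kv : index_of) {                // iteration follows sorted key order
        order.push_back(kv.second);
    }
    return order;
}
```

For the row names {"z", "y", "x"} this yields the positions 2, 1, 0, i.e. the row named "x" (originally last) comes first in the sorted matrix.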

Related

In Rcpp, how to create a NumericMatrix from a NumericVector?

In Rcpp, how do I create a NumericMatrix from a NumericVector?
Something like
// vector_1 has 16 element
NumericMatrix mat = NumericMatrix(vector_1, nrow = 4);
Thanks.
Edit: I knew we had something better. See below for update.
Looks like we do not have a matching convenience constructor for this. But you can just drop in a helper function -- the following is minimally viable (one should check that n * k == length(vector)) and taken from one of the unit tests:
// [[Rcpp::export]]
Rcpp::NumericMatrix vec2mat(Rcpp::NumericVector vec, int n, int k) {
Rcpp::NumericMatrix mat = Rcpp::no_init(n, k);
for (auto i = 0; i < n * k; i++) mat[i] = vec[i];
return mat;
}
Another constructor takes the explicit dimensions and then copies the payload for you (via memcpy()), removing the need for the loop:
// [[Rcpp::export]]
Rcpp::NumericMatrix vec2mat2(Rcpp::NumericVector s, int n, int k) {
Rcpp::NumericMatrix mat(n, k, s.begin());
return mat;
}
Full example below:
> Rcpp::sourceCpp("~/git/stackoverflow/66720922/answer.cpp")
> v <- (1:9) * 1.0 # numeric
> vec2mat(v, 3, 3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> vec2mat2(v, 3, 3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
>
Full source code below.
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericMatrix vec2mat(Rcpp::NumericVector vec, int n, int k) {
Rcpp::NumericMatrix mat = Rcpp::no_init(n, k);
for (auto i = 0; i < n * k; i++) mat[i] = vec[i];
return mat;
}
// [[Rcpp::export]]
Rcpp::NumericMatrix vec2mat2(Rcpp::NumericVector s, int n, int k) {
Rcpp::NumericMatrix mat(n, k, s.begin());
return mat;
}
/*** R
v <- (1:9) * 1.0 # numeric
vec2mat(v, 3, 3)
vec2mat2(v, 3, 3)
*/
Depending on what you want to do with the matrix object (linear algebra?) you may want to consider RcppArmadillo (or RcppEigen) as those packages also have plenty of vector/matrix converters.
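Both helpers fill the matrix column by column, which is why 1:9 comes out with 1, 2, 3 in the first column in the output above. As a rough illustration of that layout, here is a small sketch in plain C++ (no Rcpp); the function at_column_major is hypothetical and only mirrors the indexing rule, including the n * k length check mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reads element (row, col) of an n_rows x n_cols matrix
// stored column by column in a flat vector, the layout R uses for matrices.
double at_column_major(const std::vector<double>& data,
                       std::size_t n_rows, std::size_t n_cols,
                       std::size_t row, std::size_t col) {
    assert(data.size() == n_rows * n_cols);   // the n * k == length check
    assert(row < n_rows && col < n_cols);
    return data[row + col * n_rows];          // columns are contiguous
}
```

With the nine values 1..9 and a 3 x 3 shape, element (0, 1) is 4 and element (1, 2) is 8, matching the printed matrices.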

string split into all possible combination

For a given string "ABC", I wish to get all the ways of splitting it into contiguous pieces, keeping the characters in order and skipping none. The result should be: ["A","B","C"],["AB","C"],["ABC"],["A","BC"]
Any idea how I can achieve this? I was thinking of using a nested for loop to get all the components:
string input="ABCD";
List<string> component=new List<string>();
for(int i=0;i<=input.Length;i++){
for(int j=1;j<=(input.Length-i);j++){
component.Add(input.Substring(i,j));
}
}
But I have no idea how to put them into groups like the result above. Any advice is appreciated.
You can go about this in several ways.
One way is recursion. Keep a current list of substrings and an overall results list. At the top level, iterate over all the possible gaps. Split the string into a substring and the rest. This should include the "gap" at the end, where you split the string into itself and the empty string as rest. Add the (non-empty) substring to the current list and recurse on the rest of the string. When the rest of the string is empty, add the current list to the overall results list. This will give you all 2ⁿ possibilities for a string with n + 1 letters.
Pseudocode:
// recursive function
function splits_r(str, current, res)
{
if (str.length == 0) {
res += [current]
} else {
for (i = 0; i < str.length; i++) {
splits_r(str.substr(i + 1, end),
current + [str.substr(0, i + 1)], res)
}
}
}
// wrapper to get the recursion going
function splits(str)
{
res = [];
splits_r(str, [], res);
return res;
}
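The recursive pseudocode above could be turned into C++ roughly as follows. This is only a sketch: the names Split, splits_r and splits mirror the pseudocode rather than any library, and the current list is shared and backtracked instead of copied at each level.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Split = std::vector<std::string>;

// Recursive helper: `rest` is the unsplit tail, `current` the pieces chosen so far.
void splits_r(const std::string& rest, Split& current, std::vector<Split>& res) {
    if (rest.empty()) {
        res.push_back(current);                    // one complete split
        return;
    }
    for (std::size_t len = 1; len <= rest.size(); ++len) {
        current.push_back(rest.substr(0, len));    // take a first piece of length len
        splits_r(rest.substr(len), current, res);  // recurse on the remainder
        current.pop_back();                        // backtrack
    }
}

// Wrapper to get the recursion going.
std::vector<Split> splits(const std::string& str) {
    std::vector<Split> res;
    Split current;
    splits_r(str, current, res);
    return res;
}
```

For "ABC" this produces the four splits ["A","B","C"], ["A","BC"], ["AB","C"], ["ABC"], in that order.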
Another way is enumeration of all possibilities. There are 2ⁿ possibilities for a string with n + 1 letters. You can consider one individual possibility as a combination of splits and non-splits. For example:
enum     splits           result
0 0 0    A B C D          "ABCD"
0 0 1    A B C | D        "ABC", "D"
0 1 0    A B | C D        "AB", "CD"
0 1 1    A B | C | D      "AB", "C", "D"
1 0 0    A | B C D        "A", "BCD"
1 0 1    A | B C | D      "A", "BC", "D"
1 1 0    A | B | C D      "A", "B", "CD"
1 1 1    A | B | C | D    "A", "B", "C", "D"
The enumeration uses 0 for no split and 1 for a split. It can be seen as a binary number. If you are familiar with bitwise operations, you can now enumerate all values from 0 to 2ⁿ − 1 and find out where the splits are.
Pseudocode:
function splits(str)
{
    let m = str.length - 1;         // possible gap positions
    let n = (1 << m);               // == pow(2, m)
    let res = []
    for (i = 0; i < n; i++) {
        let last = 0
        let current = []
        for (j = 0; j < m; j++) {   // loop over all gaps
            if (i & (1 << j)) {     // test for split
                current.append(str.substr(last, j + 1));   // characters last .. j
                last = j + 1;
            }
        }
        current.append(str.substr(last, end))   // remaining tail
        res.append(current);
    }
    return res;
}
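A possible C++ rendering of the bitmask enumeration is sketched below. The function name splits_bitmask is made up here, and the sketch assumes the string is short enough that the number of splits fits in a std::size_t. Note that std::string::substr takes a start and a length, not a start and an end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Enumerates every way to cut `str` into contiguous non-empty pieces.
// Bit j of `mask` set means "cut after character j".
std::vector<std::vector<std::string>> splits_bitmask(const std::string& str) {
    std::vector<std::vector<std::string>> res;
    if (str.empty()) return res;                   // nothing to split
    const std::size_t gaps = str.size() - 1;       // possible cut positions
    assert(gaps < 8 * sizeof(std::size_t));        // assumption: mask fits in size_t
    const std::size_t total = std::size_t{1} << gaps;
    for (std::size_t mask = 0; mask < total; ++mask) {
        std::vector<std::string> current;
        std::size_t last = 0;                      // start of the current piece
        for (std::size_t j = 0; j < gaps; ++j) {
            if (mask & (std::size_t{1} << j)) {
                current.push_back(str.substr(last, j + 1 - last));
                last = j + 1;
            }
        }
        current.push_back(str.substr(last));       // final piece
        res.push_back(current);
    }
    return res;
}
```

The results come out in mask order, so for "ABC" the unsplit string ["ABC"] is first and ["A","B","C"] is last.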

subset NumericMatrix by row and column names in Rcpp

I am trying to create a function in Rcpp that will take as input a pairwise numeric matrix, as well as a list of vectors, each element being a subset of row/column names. I would like this function to identify the subset of the matrix that matches those names, and return the mean of the values.
Below I generated some dummy data that resembles the sort of data I have, and follow with an attempt of a Rcpp function.
library(Rcpp)
dat <- c(spA = 4, spB = 10, spC = 8, spD = 1, spE = 5, spF = 9)
pdist <- as.matrix(dist(dat))
pdist[upper.tri(pdist, diag = TRUE)] <- NA
Here I have a list made up of character vectors of various subsets of the row/column names in pdist
subsetList <- replicate(10, sample(names(dat), 4), simplify=FALSE)
For each of these sets of names, I would like to identify the subset of the pairwise matrix and take the mean of the values
Here is what I have so far, which does not work, but I think it illustrates where I am trying to get.
cppFunction('
List meanDistByCell(List input, NumericMatrix pairmat) {
int n = input.size();
List out(n);
List dimnames = pairmat.attr( "dimnames" );
CharacterVector colnames = dimnames[1];
for (int i = 0; i < n; i++) {
CharacterVector sp = as< CharacterVector >(input[i]);
if (sp.size() > 0) {
out[i] = double(mean(pairmat(sp, sp)));
} else {
out[i] = NA_REAL;
}
}
return out;
}
')
Any help would be greatly appreciated! Thanks!
Although (contiguous) range-based subsetting is available (e.g. x(Range(first_row, last_row), Range(first_col, last_col))), as coatless pointed out, subsetting by CharacterVector is not currently supported, so you will have to roll your own for the time being. A general-ish approach might look something like this:
template <int RTYPE> inline Matrix<RTYPE>
Subset2D(const Matrix<RTYPE>& x, CharacterVector crows, CharacterVector ccols) {
R_xlen_t i = 0, j = 0, rr = crows.length(), rc = ccols.length(), pos;
Matrix<RTYPE> res(rr, rc);
CharacterVector xrows = rownames(x), xcols = colnames(x);
IntegerVector rows = match(crows, xrows), cols = match(ccols, xcols);
for (; j < rc; j++) {
// NB: match returns 1-based indices
pos = cols[j] - 1;
for (i = 0; i < rr; i++) {
res(i, j) = x(rows[i] - 1, pos);
}
}
rownames(res) = crows;
colnames(res) = ccols;
return res;
}
// [[Rcpp::export]]
NumericMatrix subset2d(NumericMatrix x, CharacterVector rows, CharacterVector cols) {
return Subset2D(x, rows, cols);
}
This assumes that the input matrix has both row and column names, and that the row and column lookup vectors are valid subsets of those dimnames; additional defensive code could be added to make this more robust. To demonstrate,
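As a sketch of what that defensive lookup could look like, here is the same idea in plain C++ without Rcpp. Everything in it (the NamedMatrix struct, match_names, and this standalone subset2d) is a hypothetical stand-in for illustration, not Rcpp API; it uses 0-based positions and throws on a name that is not present instead of reading out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a named matrix: column-major values plus dimnames.
struct NamedMatrix {
    std::vector<std::string> row_names, col_names;
    std::vector<double> values;   // size == row_names.size() * col_names.size()
    double at(std::size_t i, std::size_t j) const {
        return values[i + j * row_names.size()];
    }
};

// Maps each key to its 0-based position in `names`; throws on unknown keys.
std::vector<std::size_t> match_names(const std::vector<std::string>& keys,
                                     const std::vector<std::string>& names) {
    std::unordered_map<std::string, std::size_t> pos;
    for (std::size_t i = 0; i < names.size(); ++i) pos.emplace(names[i], i);
    std::vector<std::size_t> out;
    for (const auto& key : keys) {
        auto it = pos.find(key);
        if (it == pos.end()) throw std::out_of_range("unknown name: " + key);
        out.push_back(it->second);
    }
    return out;
}

NamedMatrix subset2d(const NamedMatrix& x,
                     const std::vector<std::string>& rows,
                     const std::vector<std::string>& cols) {
    assert(x.values.size() == x.row_names.size() * x.col_names.size());
    const auto ri = match_names(rows, x.row_names);
    const auto ci = match_names(cols, x.col_names);
    NamedMatrix res{rows, cols, std::vector<double>(rows.size() * cols.size())};
    for (std::size_t j = 0; j < ci.size(); ++j)
        for (std::size_t i = 0; i < ri.size(); ++i)
            res.values[i + j * ri.size()] = x.at(ri[i], ci[j]);
    return res;
}
```

In the Rcpp version, the equivalent check would be to test the result of match for NA before subtracting 1.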
subset2d(pdist, subsetList[[1]], subsetList[[1]])
# spB spD spE spC
# spB NA NA NA NA
# spD 9 NA NA 7
# spE 5 4 NA 3
# spC 2 NA NA NA
pdist[subsetList[[1]], subsetList[[1]]]
# spB spD spE spC
# spB NA NA NA NA
# spD 9 NA NA 7
# spE 5 4 NA 3
# spC 2 NA NA NA
Subset2D takes care of most of the boilerplate involved in implementing meanDistByCell; all that remains is to loop over the input list, apply this to each list element, and store the mean of the result in the output list:
// [[Rcpp::export]]
List meanDistByCell(List keys, NumericMatrix x, bool na_rm = false) {
R_xlen_t i = 0, sz = keys.size();
List res(sz);
if (!na_rm) {
for (; i < sz; i++) {
res[i] = NumericVector::create(
mean(Subset2D(x, keys[i], keys[i]))
);
}
} else {
for (; i < sz; i++) {
res[i] = NumericVector::create(
mean(na_omit(Subset2D(x, keys[i], keys[i])))
);
}
}
return res;
}
all.equal(
lapply(subsetList, function(x) mean(pdist[x, x], na.rm = TRUE)),
meanDistByCell(subsetList, pdist, TRUE)
)
# [1] TRUE
Although the use of Subset2D allows for a much cleaner implementation of meanDistByCell, in this situation it is inefficient for at least a couple of reasons:
It sets the dimnames of the return object (rownames(res) = crows;, colnames(res) = ccols;), which you have no need for here.
It makes a call to match to obtain indices for each of rownames and colnames, which is unnecessary since you know in advance that rownames(x) == colnames(x).
You will incur the cost of both of these points k times, for an input list with length k.
A more efficient -- but consequently less concise -- approach would be to essentially implement only the aspects of Subset2D that are needed, inline inside of meanDistByCell:
// [[Rcpp::export]]
List meanDistByCell2(List keys, NumericMatrix x, bool na_rm = false) {
R_xlen_t k = 0, sz = keys.size(), i = 0, j = 0, nidx, pos;
List res(sz);
CharacterVector cx = colnames(x);
if (!na_rm) {
for (; k < sz; k++) {
// NB: match returns 1-based indices
IntegerVector idx = match(as<CharacterVector>(keys[k]), cx) - 1;
nidx = idx.size();
NumericVector tmp(nidx * nidx);
for (j = 0; j < nidx; j++) {
pos = idx[j];
for (i = 0; i < nidx; i++) {
tmp[nidx * j + i] = x(idx[i], pos);
}
}
res[k] = NumericVector::create(mean(tmp));
}
} else {
for (; k < sz; k++) {
IntegerVector idx = match(as<CharacterVector>(keys[k]), cx) - 1;
nidx = idx.size();
NumericVector tmp(nidx * nidx);
for (j = 0; j < nidx; j++) {
pos = idx[j];
for (i = 0; i < nidx; i++) {
tmp[nidx * j + i] = x(idx[i], pos);
}
}
res[k] = NumericVector::create(mean(na_omit(tmp)));
}
}
return res;
}
all.equal(
meanDistByCell(subsetList, pdist, TRUE),
meanDistByCell2(subsetList, pdist, TRUE)
)
# [1] TRUE

CodeJam 2014: How to solve task "New Lottery Game"?

I want to know an efficient approach for the New Lottery Game problem.
The Lottery is changing! The Lottery used to have a machine to generate a random winning number. But due to cheating problems, the Lottery has decided to add another machine. The new winning number will be the result of the bitwise-AND operation between the two random numbers generated by the two machines.
To find the bitwise-AND of X and Y, write them both in binary; then a bit in the result in binary has a 1 if the corresponding bits of X and Y were both 1, and a 0 otherwise. In most programming languages, the bitwise-AND of X and Y is written X&Y.
For example:
The old machine generates the number 7 = 0111.
The new machine generates the number 11 = 1011.
The winning number will be (7 AND 11) = (0111 AND 1011) = 0011 = 3.
With this measure, the Lottery expects to reduce the cases of fraudulent claims, but unfortunately an employee from the Lottery company has leaked the following information: the old machine will always generate a non-negative integer less than A and the new one will always generate a non-negative integer less than B.
Catalina wants to win this lottery and to give it a try she decided to buy all non-negative integers less than K.
Given A, B and K, Catalina would like to know in how many different ways the machines can generate a pair of numbers that will make her a winner.
For the small input we can check all possible pairs, but how can this be done for the large input? I guess we first represent the numbers as binary strings and then count the combinations whose AND is less than K, but I can't figure out how to count the possible combinations of two binary strings.
I used a general DP technique that I described in a lot of detail in another answer.
We want to count the pairs (a, b) such that a < A, b < B and a & b < K.
The first step is to convert the numbers to binary and to pad them to the same size by adding leading zeroes. I just padded them to a fixed size of 40. The idea is to build up the valid a and b bit by bit.
Let f(i, loA, loB, loK) be the number of valid suffix pairs of a and b of size 40 - i. If loA is true, it means that the prefix up to i is already strictly smaller than the corresponding prefix of A. In that case there is no restriction on the next possible bit for a. If loA is false, A[i] is an upper bound on the next bit we can place at the end of the current prefix. loB and loK have an analogous meaning.
Now we have the following transition:
long long f(int i, bool loA, bool loB, bool loK) {
// TODO add memoization
if (i == 40)
return loA && loB && loK;
int hiA = loA ? 1: A[i]-'0'; // upper bound on the next bit in a
int hiB = loB ? 1: B[i]-'0'; // upper bound on the next bit in b
int hiK = loK ? 1: K[i]-'0'; // upper bound on the next bit in a & b
long long res = 0;
for (int a = 0; a <= hiA; ++a)
for (int b = 0; b <= hiB; ++b) {
int k = a & b;
if (k > hiK) continue;
res += f(i+1, loA || a < A[i]-'0',
loB || b < B[i]-'0',
loK || k < K[i]-'0');
}
return res;
}
The result is f(0, false, false, false).
The runtime is O(max(log A, log B)) if memoization is added to ensure that every subproblem is only solved once.
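For completeness, here is one way the "TODO add memoization" could be filled in. This is a sketch under a few assumptions: the bits are kept in int arrays instead of the padded strings A, B, K, the wrapper name count_winning_pairs is made up, inputs are below 2^40, and the answer fits in a long long (true for the contest limits of 10^9).

```cpp
#include <cassert>
#include <cstring>

namespace {
const int BITS = 40;             // fixed width, as in the text
int bitA[BITS], bitB[BITS], bitK[BITS];
long long memo[BITS][2][2][2];   // -1 marks "not computed yet"

long long f(int i, bool loA, bool loB, bool loK) {
    if (i == BITS) return (loA && loB && loK) ? 1 : 0;
    long long& cached = memo[i][loA][loB][loK];
    if (cached != -1) return cached;
    const int hiA = loA ? 1 : bitA[i];   // upper bound on the next bit in a
    const int hiB = loB ? 1 : bitB[i];   // upper bound on the next bit in b
    const int hiK = loK ? 1 : bitK[i];   // upper bound on the next bit in a & b
    long long res = 0;
    for (int a = 0; a <= hiA; ++a)
        for (int b = 0; b <= hiB; ++b) {
            const int k = a & b;
            if (k > hiK) continue;
            res += f(i + 1, loA || a < bitA[i], loB || b < bitB[i], loK || k < bitK[i]);
        }
    cached = res;
    return res;
}
}  // namespace

// Counts pairs (a, b) with 0 <= a < A, 0 <= b < B and (a & b) < K.
long long count_winning_pairs(long long A, long long B, long long K) {
    assert(A >= 0 && B >= 0 && K >= 0);
    assert(A < (1LL << BITS) && B < (1LL << BITS) && K < (1LL << BITS));
    for (int i = 0; i < BITS; ++i) {
        const int shift = BITS - 1 - i;  // index 0 holds the most significant bit
        bitA[i] = (A >> shift) & 1;
        bitB[i] = (B >> shift) & 1;
        bitK[i] = (K >> shift) & 1;
    }
    std::memset(memo, -1, sizeof(memo)); // all bytes 0xFF, i.e. every entry is -1
    return f(0, false, false, false);
}
```

There are only 40 * 2 * 2 * 2 states and each does constant work, so the table is filled almost instantly. For A = 3, B = 4, K = 2 the count is 10: of the 12 pairs, only (2, 2) and (2, 3) have an AND of at least 2.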
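What I did was first identify when the answer is simply A * B, which is the case when K is larger than A or B.
Otherwise, every pair with a < K or b < K wins automatically, so I count those directly and brute-force only the rest. This code passed the large input.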
// for each test cases
long count = 0;
if ((K > A) || (K > B)) {
count = A * B;
continue; // print count and go to the next test case
}
count = A * B - (A-K) * (B-K);
for (int i = K; i < A; i++) {
for (int j = K; j < B; j++) {
if ((i&j) < K) count++;
}
}
I hope this helps!
Just as Niklas B. said.
The complete solution is:
#include <algorithm>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
#define MAX_SIZE 32
int A, B, K;
int arr_a[MAX_SIZE];
int arr_b[MAX_SIZE];
int arr_k[MAX_SIZE];
bool flag [MAX_SIZE][2][2][2];
long long matrix[MAX_SIZE][2][2][2];
long long
get_result();
int main(int argc, char *argv[])
{
int case_amount = 0;
cin >> case_amount;
for (int i = 0; i < case_amount; ++i)
{
const long long result = get_result();
cout << "Case #" << 1 + i << ": " << result << endl;
}
return 0;
}
long long
dp(const int h,
const bool can_A_choose_1,
const bool can_B_choose_1,
const bool can_K_choose_1)
{
if (MAX_SIZE == h)
return can_A_choose_1 && can_B_choose_1 && can_K_choose_1;
if (flag[h][can_A_choose_1][can_B_choose_1][can_K_choose_1])
return matrix[h][can_A_choose_1][can_B_choose_1][can_K_choose_1];
int cnt_A_max = arr_a[h];
int cnt_B_max = arr_b[h];
int cnt_K_max = arr_k[h];
if (can_A_choose_1)
cnt_A_max = 1;
if (can_B_choose_1)
cnt_B_max = 1;
if (can_K_choose_1)
cnt_K_max = 1;
long long res = 0;
for (int i = 0; i <= cnt_A_max; ++i)
{
for (int j = 0; j <= cnt_B_max; ++j)
{
int k = i & j;
if (k > cnt_K_max)
continue;
res += dp(h + 1,
can_A_choose_1 || (i < cnt_A_max),
can_B_choose_1 || (j < cnt_B_max),
can_K_choose_1 || (k < cnt_K_max));
}
}
flag[h][can_A_choose_1][can_B_choose_1][can_K_choose_1] = true;
matrix[h][can_A_choose_1][can_B_choose_1][can_K_choose_1] = res;
return res;
}
long long
get_result()
{
cin >> A >> B >> K;
memset(arr_a, 0, sizeof(arr_a));
memset(arr_b, 0, sizeof(arr_b));
memset(arr_k, 0, sizeof(arr_k));
memset(flag, 0, sizeof(flag));
memset(matrix, 0, sizeof(matrix));
int i = 31;
while (i >= 1)
{
arr_a[i] = A % 2;
A /= 2;
arr_b[i] = B % 2;
B /= 2;
arr_k[i] = K % 2;
K /= 2;
i--;
}
return dp(1, 0, 0, 0);
}

Generate all compositions of an integer into k parts

I can't figure out how to generate all compositions (http://en.wikipedia.org/wiki/Composition_%28number_theory%29) of an integer N into K parts, but only doing it one at a time. That is, I need a function that given the previous composition generated, returns the next one in the sequence. The reason is that memory is limited for my application. This would be much easier if I could use Python and its generator functionality, but I'm stuck with C++.
This is similar to Next Composition of n into k parts - does anyone have a working algorithm?
Any assistance would be greatly appreciated.
Preliminary remarks
First start from the observation that [1,1,...,1,n-k+1] is the first composition (in lexicographic order) of n over k parts, and [n-k+1,1,1,...,1] is the last one.
Now consider an example: the composition [2,4,3,1,1], here n = 11 and k = 5. Which is the next one in lexicographic order? Obviously the rightmost part to be incremented is 4, because [3,1,1] is the last composition of 5 over 3 parts.
4 is at the left of 3, the rightmost part different from 1.
So turn 4 into 5, and replace [3,1,1] by [1,1,2], the first composition of the remainder (3+1+1)-1 , giving [2,5,1,1,2]
Generation program (in C)
The following C program shows how to compute such compositions on demand in lexicographic order
#include <stdio.h>
#include <stdbool.h>
bool get_first_composition(int n, int k, int composition[k])
{
if (n < k) {
return false;
}
for (int i = 0; i < k - 1; i++) {
composition[i] = 1;
}
composition[k - 1] = n - k + 1;
return true;
}
bool get_next_composition(int n, int k, int composition[k])
{
if (composition[0] == n - k + 1) {
return false;
}
// there's an i with composition[i] > 1, and i is not 0.
// find the last one
int last = k - 1;
while (composition[last] == 1) {
last--;
}
// turn a b ... y z 1 1 ... 1
// ^ last
// into a b ... (y+1) 1 1 1 ... (z-1)
// be careful, there may be no 1's at the end
int z = composition[last];
composition[last - 1] += 1;
composition[last] = 1;
composition[k - 1] = z - 1;
return true;
}
void display_composition(int k, int composition[k])
{
char *separator = "[";
for (int i = 0; i < k; i++) {
printf("%s%d", separator, composition[i]);
separator = ",";
}
printf("]\n");
}
void display_all_compositions(int n, int k)
{
int composition[k]; // VLA. Please don't use silly values for k
for (bool exists = get_first_composition(n, k, composition);
exists;
exists = get_next_composition(n, k, composition)) {
display_composition(k, composition);
}
}
int main()
{
display_all_compositions(5, 3);
}
Results
[1,1,3]
[1,2,2]
[1,3,1]
[2,1,2]
[2,2,1]
[3,1,1]
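Since the question asks for C++, the same stepping logic could be ported to std::vector roughly as follows. This is a sketch rather than a drop-in replacement: the names first_composition and next_composition are chosen here, and next_composition assumes it is handed a valid composition (all parts at least 1), from which it can work without being told n and k.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills `c` with the lexicographically first composition of n into k parts.
// Returns false when no composition exists (k < 1 or n < k).
bool first_composition(int n, int k, std::vector<int>& c) {
    if (k < 1 || n < k) return false;
    c.assign(k, 1);
    c[k - 1] = n - k + 1;
    return true;
}

// Advances `c` in place to the next composition in lexicographic order.
// Returns false (leaving `c` unchanged) when `c` is already the last one.
bool next_composition(std::vector<int>& c) {
    // Find the rightmost part greater than 1.
    std::size_t last = c.size();
    while (last > 0 && c[last - 1] == 1) --last;
    if (last <= 1) return false;   // all ones, or only c[0] exceeds 1
    --last;                        // index of that part, guaranteed >= 1
    const int z = c[last];
    c[last - 1] += 1;
    c[last] = 1;
    c.back() = z - 1;              // overwrites c[last] when it is the final slot
    return true;
}
```

Only one vector of k integers is alive at any time, so a loop such as `while (next_composition(c)) { ... }` after a successful first_composition visits every composition in constant extra memory.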
Weak compositions
A similar algorithm works for weak compositions (where 0 is allowed).
bool get_first_weak_composition(int n, int k, int composition[k])
{
if (n < 0 || k < 1) { // unlike above, n < k is fine because parts may be 0
return false;
}
for (int i = 0; i < k - 1; i++) {
composition[i] = 0;
}
composition[k - 1] = n;
return true;
}
bool get_next_weak_composition(int n, int k, int composition[k])
{
if (composition[0] == n) {
return false;
}
// there's an i with composition[i] > 0, and i is not 0.
// find the last one
int last = k - 1;
while (composition[last] == 0) {
last--;
}
// turn a b ... y z 0 0 ... 0
// ^ last
// into a b ... (y+1) 0 0 0 ... (z-1)
// be careful, there may be no 0's at the end
int z = composition[last];
composition[last - 1] += 1;
composition[last] = 0;
composition[k - 1] = z - 1;
return true;
}
Results for n=5 k=3
[0,0,5]
[0,1,4]
[0,2,3]
[0,3,2]
[0,4,1]
[0,5,0]
[1,0,4]
[1,1,3]
[1,2,2]
[1,3,1]
[1,4,0]
[2,0,3]
[2,1,2]
[2,2,1]
[2,3,0]
[3,0,2]
[3,1,1]
[3,2,0]
[4,0,1]
[4,1,0]
[5,0,0]
Similar algorithms can be written for compositions of n into k parts greater than some fixed value.
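One way to get there without a new algorithm, sketched below in C++, is to shift: a composition of n into k parts that are each at least m corresponds to a weak composition of n - k*m with m added to every part. Reading "greater than some fixed value" as "at least min_part" is my interpretation, and the names next_weak_composition and compositions_with_min are made up for this sketch. The second function collects everything into a vector only to make the result easy to inspect; in a memory-limited setting one would process each shifted composition inside the loop instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Advances a weak composition (parts >= 0) to its lexicographic successor.
// Returns false when `c` is already the last one.
bool next_weak_composition(std::vector<int>& c) {
    std::size_t last = c.size();
    while (last > 0 && c[last - 1] == 0) --last;   // rightmost non-zero part
    if (last <= 1) return false;                   // all zeros, or everything in c[0]
    --last;
    const int z = c[last];
    c[last - 1] += 1;
    c[last] = 0;
    c.back() = z - 1;
    return true;
}

// Lists all compositions of n into k parts, each part >= min_part.
std::vector<std::vector<int>> compositions_with_min(int n, int k, int min_part) {
    std::vector<std::vector<int>> out;
    const int slack = n - k * min_part;   // left over after reserving min_part per slot
    if (k < 1 || slack < 0) return out;
    std::vector<int> weak(k, 0);
    weak[k - 1] = slack;                  // first weak composition of slack
    do {
        std::vector<int> shifted(weak);
        for (int& part : shifted) part += min_part;
        out.push_back(shifted);
    } while (next_weak_composition(weak));
    return out;
}
```

With min_part = 1 this reproduces the six compositions of 5 into 3 parts listed earlier, and with min_part = 0 the 21 weak compositions.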
You could try something like this:
Start with the array [1,1,...,1,N-K+1] of (K-1) ones and one entry with the remainder. The next composition can be created by incrementing the (K-1)th element and decreasing the last element. Repeat this as long as the last element is bigger than the second to last.
When the last element becomes smaller, increment the (K-2)th element, set the (K-1)th element to the same value and set the last element to the remainder again. Repeat the process and apply the same principle to the other elements when necessary.
You end up with an array that always stays sorted, which avoids duplicate compositions.
