how many vectors can be added in DataFrame::create( vec1, vec2 ... )? - rcpp

I am creating a DataFrame to hold parsed haproxy HTTP log files, which have quite a few fields (25+).
If I add more than 20 vectors (one for each field), I get the compilation error:
no matching function call to 'create'
The create method:
return DataFrame::create(
_["clientIp"] = clientIp,
_["clientPort"] = clientPort,
_["acceptDate"] = acceptDate,
_["frontendName"] = frontendName,
_["backendName"] = backendName,
_["serverName"] = serverName,
_["tq"] = tq,
_["tw"] = tw,
_["tc"] = tc,
_["tr"] = tr,
_["tt"] = tt,
_["status_code"] = statusCode,
_["bytes_read"] = bytesRead,
#if CAPTURED_REQUEST_COOKIE_FIELD == 1
_["capturedRequestCookie"] = capturedRequestCookie,
#endif
#if CAPTURED_REQUEST_COOKIE_FIELD == 1
_["capturedResponseCookie"] = capturedResponseCookie,
#endif
_["terminationState"] = terminationState,
_["actconn"] = actconn,
_["feconn"] = feconn,
_["beconn"] = beconn,
_["srv_conn"] = srvConn,
_["retries"] = retries,
_["serverQueue"] = serverQueue,
_["backendQueue"] = backendQueue
);
Questions:
Have I hit a hard limit?
Is there a workaround to allow me to add more than 20 vectors to a data frame?

Yes, you have hit a hard limit -- Rcpp is limited by the C++98 standard, which requires explicit code bloat to support 'variadic' arguments. Essentially, a new overload must be generated for each create function used, and to avoid choking the compiler Rcpp just provides up to 20.
A workaround would be to use a 'builder' class, where you successively add elements, and then convert to DataFrame at the end. A simple example of such a class -- we create a ListBuilder object, for which we successively add new columns. Try running Rcpp::sourceCpp() with this file to see the output.
#include <Rcpp.h>
using namespace Rcpp;
class ListBuilder {
public:
ListBuilder() {};
~ListBuilder() {};
inline ListBuilder& add(std::string const& name, SEXP x) {
names.push_back(name);
// NOTE: we need to protect the SEXPs we pass in; there is
// probably a nicer way to handle this but ...
elements.push_back(PROTECT(x));
return *this;
}
inline operator List() const {
List result(elements.size());
for (size_t i = 0; i < elements.size(); ++i) {
result[i] = elements[i];
}
result.attr("names") = wrap(names);
UNPROTECT(elements.size());
return result;
}
inline operator DataFrame() const {
List result = static_cast<List>(*this);
result.attr("class") = "data.frame";
result.attr("row.names") = IntegerVector::create(NA_INTEGER, XLENGTH(elements[0]));
return result;
}
private:
std::vector<std::string> names;
std::vector<SEXP> elements;
ListBuilder(ListBuilder const&) {}; // not safe to copy
};
// [[Rcpp::export]]
DataFrame test_builder(SEXP x, SEXP y, SEXP z) {
return ListBuilder()
.add("foo", x)
.add("bar", y)
.add("baz", z);
}
/*** R
test_builder(1:5, letters[1:5], rnorm(5))
*/
PS: With Rcpp11, we have variadic functions and hence the limitations are removed.
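To illustrate why variadic templates remove the fixed arity limit, here is a minimal sketch in plain C++11. It does not use Rcpp or Rcpp11 at all: Column, append_columns and make_table are hypothetical names made up for this example, and the columns are simplified to vectors of doubles. The point is only that one template definition covers every argument count, so no overload has to be written out per arity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a named column; this is not an Rcpp type.
struct Column {
    std::string name;
    std::vector<double> values;
};

// Base case: the parameter pack is exhausted, nothing left to append.
inline void append_columns(std::vector<Column>&) {}

// Recursive case: append the first column, then recurse on the rest.
template <typename... Rest>
void append_columns(std::vector<Column>& out, Column first, Rest... rest) {
    out.push_back(std::move(first));
    append_columns(out, std::move(rest)...);
}

// One variadic template accepts any number of columns, so there is
// no need to generate a separate overload for 1, 2, ..., 20 arguments.
template <typename... Cols>
std::vector<Column> make_table(Cols... cols) {
    std::vector<Column> table;
    table.reserve(sizeof...(Cols));
    append_columns(table, std::move(cols)...);
    assert(table.size() == sizeof...(Cols));  // every argument was appended
    return table;
}
```

A call such as make_table(Column{"tq", {1.0}}, Column{"tw", {2.0}}) works the same way with two columns or with thirty.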

The other common approach with Rcpp is to just use an outer list containing as many DataFrame objects as you need, with each one limited by the number of elements provided via the old-school macro expansion / repetition in the corresponding header.
In (untested) code:
Rcpp::DataFrame a = Rcpp::DataFrame::create(/* ... */);
Rcpp::DataFrame b = Rcpp::DataFrame::create(/* ... */);
Rcpp::DataFrame c = Rcpp::DataFrame::create(/* ... */);
return Rcpp::List::create(Rcpp::Named("a") = a,
Rcpp::Named("b") = b,
Rcpp::Named("c") = c);

Related

Build variable length arguments array for @call

I've recently started learning Zig.
As a little project I wanted to implement a small QuickCheck [1] style helper library for writing randomized tests.
However, I can't figure out how to write a generic way to call a function with an arbitrary number of arguments.
Here's a simplified version that can test functions with two arguments:
const std = @import("std");
const Prng = std.rand.DefaultPrng;
const Random = std.rand.Random;
const expect = std.testing.expect;
// the thing we want to test
fn some_property(a: u64, b: u64) !void {
var tmp: u64 = undefined;
var c1 = @addWithOverflow(u64, a, b, &tmp);
var c2 = @addWithOverflow(u64, a, b, &tmp);
try expect(c1 == c2);
}
// helper for generating random arguments for the function under test
fn gen(comptime T: ?type, rnd: Random) (T orelse undefined) {
switch (T orelse undefined) {
u64 => return rnd.int(u64),
f64 => return rnd.float(f64),
else => @compileError("unsupported type"),
}
}
/// tests if 'property' holds.
fn for_all(property: anytype) !void {
var rnd = Prng.init(0);
const arg_types = @typeInfo(@TypeOf(property)).Fn.args;
var i: usize = 0;
while (i < 100) {
var a = gen(arg_types[0].arg_type, rnd.random());
var b = gen(arg_types[1].arg_type, rnd.random());
var args = .{a, b}; // <-- how do I build args for functions with any number of arguments?
try @call(.{}, property, args);
i += 1;
}
}
test "test" {
try for_all(some_property);
}
I've tried a few different things, but I can't figure out how to get the above code to work for functions with any number of arguments.
Things I've tried:
Make args an array and fill it with an inline for loop. Doesn't work since []anytype is not a valid type.
Use a bit of comptime magic to build a struct type whose fields hold the arguments for @call. This hits a TODO in the compiler: error: TODO: struct args.
Write generic functions that return an appropriate argument tuple for the call. I don't really like this one, since you need one function for every arity you want to support. But it doesn't seem to work anyway, since anytype is not a valid return type.
I'm on Zig 0.9.1.
Any insight would be appreciated.
[1] https://hackage.haskell.org/package/QuickCheck
This can be done with std.meta.ArgsTuple (defined in this file of the zig standard library)
const Args = std.meta.ArgsTuple(@TypeOf(property));
var i: usize = 0;
while (i < 1000) : (i += 1) {
var args: Args = undefined;
inline for (std.meta.fields(Args)) |field, index| {
args[index] = gen(field.field_type, rnd.random());
}
try @call(.{}, property, args);
}
The way this works internally is that it constructs a tuple type with @Type(). We can then fill it with values and use it to call the function.

In Rcpp, how to get a user-defined structure from C into R

I am using the Rcpp package and can get my C function to compile and run in R, but now I want to return a large, user-defined data structure to R. The fields in the structure are either numbers or strings - no new or odd types within the structure. The example below is simplified and doesn't compile, but it conveys the idea of my problem.
typedef struct {
char* firstname[128];
char* lastname[128];
int nbrOfSamples;
} HEADER_INFO;
// [[Rcpp::export]]
HEADER_INFO* read_header(Rcpp::StringVector strings) {
FILE *fp;
HEADER_INFO *header;
char * filename = (char*)(strings(0));
char * password = (char*)(strings(1));
header = (HEADER_INFO*)malloc(sizeof(HEADER_INFO));
memset(header, 0, sizeof(HEADER_INFO));
fp = fopen(filename, "r");
(void)read_header(header, password);
return header;
}
I'm pretty sure that I could package the entries in the header back into a StringVector, but that seems like a brute-force approach. My question is whether a more elegant solution exists. It is not clear to me what form such a structure would even have in R: a named List?
Thanks!
The right structure in R depends on what exactly your struct looks like. A named list is the most general one. Here is a simple sample implementation of a wrap function, as referred to in the comments:
#include <RcppCommon.h>
typedef struct {
char* firstname[128];
char* lastname[128];
int nbrOfSamples;
} HEADER_INFO;
namespace Rcpp {
template <>
SEXP wrap(const HEADER_INFO& x);
}
#include <Rcpp.h>
namespace Rcpp {
template <>
SEXP wrap(const HEADER_INFO& x) {
Rcpp::CharacterVector firstname(x.firstname, x.firstname + x.nbrOfSamples);
Rcpp::CharacterVector lastname(x.lastname, x.lastname + x.nbrOfSamples);
return Rcpp::wrap(Rcpp::List::create(Rcpp::Named("firstname") = firstname,
Rcpp::Named("lastname") = lastname,
Rcpp::Named("nbrOfSamples") = Rcpp::wrap(x.nbrOfSamples)));
};
}
// [[Rcpp::export]]
HEADER_INFO getHeaderInfo() {
HEADER_INFO header;
header.firstname[0] = (char*)"Albert";
header.lastname[0] = (char*)"Einstein";
header.firstname[1] = (char*)"Niels";
header.lastname[1] = (char*)"Bohr";
header.firstname[2] = (char*)"Werner";
header.lastname[2] = (char*)"Heisenberg";
header.nbrOfSamples = 3;
return header;
}
/*** R
getHeaderInfo()
*/
Output:
> getHeaderInfo()
$firstname
[1] "Albert" "Niels" "Werner"
$lastname
[1] "Einstein" "Bohr" "Heisenberg"
$nbrOfSamples
[1] 3
However, for this particular case a data.frame would be more natural to use, which can be achieved by replacing above wrap with:
template <>
SEXP wrap(const HEADER_INFO& x) {
Rcpp::CharacterVector firstname(x.firstname, x.firstname + x.nbrOfSamples);
Rcpp::CharacterVector lastname(x.lastname, x.lastname + x.nbrOfSamples);
return Rcpp::wrap(Rcpp::DataFrame::create(Rcpp::Named("firstname") = firstname,
Rcpp::Named("lastname") = lastname));
};
Output:
> getHeaderInfo()
firstname lastname
1 Albert Einstein
2 Niels Bohr
3 Werner Heisenberg

duktape how to parse argument of type String Object (similarly Number object) in duktape c function

How can I type-check String object and Number object arguments in a duktape C function and extract the underlying value from them? There is a generic API like duk_is_object(), but I need to know the exact object type in order to parse the value.
ex:
ecmascript code
var str1 = new String("duktape");
var version = new Number(2.2);
dukFunPrintArgs(str1, version);
duktape c function :
dukFunPrintArgs(ctx)
{
// code to know whether the args is of type String Object / Number Object
}
Where did you find the information how to register a C function in duktape? That place certainly also has details on how to access the parameters passed to it. Already on the homepage of duktape.org you can find a Getting Started example:
3 Add C function bindings
To call a C function from Ecmascript code, first declare your C functions:
/* Being an embeddable engine, Duktape doesn't provide I/O
* bindings by default. Here's a simple one argument print()
* function.
*/
static duk_ret_t native_print(duk_context *ctx) {
printf("%s\n", duk_to_string(ctx, 0));
return 0; /* no return value (= undefined) */
}
/* Adder: add argument values. */
static duk_ret_t native_adder(duk_context *ctx) {
int i;
int n = duk_get_top(ctx); /* #args */
double res = 0.0;
for (i = 0; i < n; i++) {
res += duk_to_number(ctx, i);
}
duk_push_number(ctx, res);
return 1; /* one return value */
}
Register your functions e.g. into the global object:
duk_push_c_function(ctx, native_print, 1 /*nargs*/);
duk_put_global_string(ctx, "print");
duk_push_c_function(ctx, native_adder, DUK_VARARGS);
duk_put_global_string(ctx, "adder");
You can then call your function from Ecmascript code:
duk_eval_string_noresult(ctx, "print('2+3=' + adder(2, 3));");
One of the core concepts in duktape is the stack. The value stack is where parameters are stored. Read more on the Getting Started page.

Why do setters need to return a value in Haxe?

I was recently tripped up by the fact that the expected type of a setter that sets an Int is Int -> Int.
Why does a setter return a value? What significance does this value have?
Small addition to other answers over here, it allows you to do this:
x = y = z = 5;
The value that the setter returns is the value of the assignment expression.
For instance, if you were to do something like this:
public var x(getX, setX): Int;
public function setX(val: Int): Int {
return 5;
}
static function main() {
neko.Lib.print(x = 2); // Prints 5
}
Haxe would print out 5. The setter returns the value of the set expression. Of course, it's a nonsensical value in this case.
In the real world, it permits a way of implementing copy by value on assignment, eg:
public var pos(default, set_pos):Point;
function set_pos(newPos:Point) {
pos.x = newPos.x;
pos.y = newPos.y;
return pos;
}
And also of having a sensible return when implementing setters that can fail, eg:
public var positiveX(default, set_positiveX):Int;
function set_positiveX(newX:Int) {
if (newX >= 0) positiveX = newX;
return positiveX;
}
trace(positiveX = 10); // 10
trace(positiveX = -4); // 10
trace(positiveX); // 10
Whereas you would otherwise get something like this:
trace(positiveX = 10); // 10
trace(positiveX = -4); // -4
trace(positiveX); // 10
That is what happens in AS3, even if the setter does not modify the value. Of the two, the first behavior, where positiveX = -4 evaluates to 10, is clearly better :)

C++ lambdas for std::sort and std::lower_bound/equal_range on a struct element in a sorted vector of structs

I have a std::vector of this struct:
struct MS
{
double aT;
double bT;
double cT;
};
which I want to use std::sort on as well as std::lower_bound/equal_range etc...
I need to be able to sort it and look it up on either of the first two elements of the struct. So at the moment I have this:
class MSaTLess
{
public:
bool operator() (const MS &lhs, const MS &rhs) const
{
return TLess(lhs.aT, rhs.aT);
}
bool operator() (const MS &lhs, const double d) const
{
return TLess(lhs.aT, d);
}
bool operator() (const double d, const MS &rhs) const
{
return TLess(d, rhs.aT);
}
private:
bool TLess(const double& d1, const double& d2) const
{
return d1 < d2;
}
};
class MSbTLess
{
public:
bool operator() (const MS &lhs, const MS &rhs) const
{
return TLess(lhs.bT, rhs.bT);
}
bool operator() (const MS &lhs, const double d) const
{
return TLess(lhs.bT, d);
}
bool operator() (const double d, const MS &rhs) const
{
return TLess(d, rhs.bT);
}
private:
bool TLess(const double& d1, const double& d2) const
{
return d1 < d2;
}
};
This allows me to call both std::sort and std::lower_bound with MSaTLess() to sort/lookup based on the aT element and with MSbTLess() to sort/lookup based on the bT element.
I'd like to get away from the functors and use C++0x lambdas instead. For sort that is relatively straightforward as the lambda will take two objects of type MS as arguments.
What about for the lower_bound and other binary search lookup algorithms though? They need to be able to call a comparator with (MS, double) arguments and also the reverse, (double, MS), right? How can I best provide these with a lambda in a call to lower_bound? I know I could create an MS dummy object with the required key value being searched for and then use the same lambda as with std::sort but is there a way to do it without using dummy objects?
It's a little awkward, but if you check the definitions of lower_bound and upper_bound from the standard, you'll see that the definition of lower_bound puts the dereferenced iterator as the first parameter of the comparison (and the value second), whereas upper_bound puts the dereferenced iterator second (and the value first).
So, I haven't tested this but I think you'd want:
std::lower_bound(vec.begin(), vec.end(), 3.142, [](const MS &lhs, double rhs) {
return lhs.aT < rhs;
});
and
std::upper_bound(vec.begin(), vec.end(), 3.142, [](double lhs, const MS &rhs) {
return lhs < rhs.aT;
});
This is pretty nasty, and without looking up a few more things I'm not sure you're actually entitled to assume that the implementation uses the comparator only in the way it's described in the text - that's a definition of the result, not the means to get there. It also doesn't help with binary_search or equal_range.
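As a concrete check of the argument order described above, here is a small self-contained sketch. It relies on the C++11 wording, under which lower_bound calls the comparator as comp(element, value) and upper_bound calls it as comp(value, element). The helper names sorted_by_aT, lower_index_by_aT and upper_index_by_aT are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct MS {
    double aT;
    double bT;
    double cT;
};

// Helper used only to state the precondition of the searches below.
inline bool sorted_by_aT(const std::vector<MS>& v) {
    return std::is_sorted(v.begin(), v.end(),
        [](const MS& l, const MS& r) { return l.aT < r.aT; });
}

// Index of the first element whose aT is not less than key.
inline std::size_t lower_index_by_aT(const std::vector<MS>& v, double key) {
    assert(sorted_by_aT(v));
    // lower_bound passes the element first and the searched value second.
    auto it = std::lower_bound(v.begin(), v.end(), key,
        [](const MS& lhs, double rhs) { return lhs.aT < rhs; });
    return static_cast<std::size_t>(it - v.begin());
}

// Index of the first element whose aT is greater than key.
inline std::size_t upper_index_by_aT(const std::vector<MS>& v, double key) {
    assert(sorted_by_aT(v));
    // upper_bound passes the searched value first and the element second.
    auto it = std::upper_bound(v.begin(), v.end(), key,
        [](double lhs, const MS& rhs) { return lhs < rhs.aT; });
    return static_cast<std::size_t>(it - v.begin());
}
```

Swapping the two lambdas between the calls fails to compile, which is a quick way to confirm which order each algorithm expects.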
It's not explicitly stated in 25.3.3.1 that the iterator's value type must be convertible to T, but it's sort of implied by the fact that the requirement for the algorithm is that T (in this case, double) must be LessThanComparable, not that T must be comparable to the value type of the iterator in any particular order.
So I think it's better just to always use a lambda (or functor) that compares two MS structs, and instead of passing a double as a value, pass a dummy MS with the correct field set to the value you're looking for:
std::upper_bound(vec.begin(), vec.end(), MS(3.142,0,0), [](const MS &lhs, const MS &rhs) {
return lhs.aT < rhs.aT;
});
If you don't want to give MS a constructor (because you want it to be POD), then you can write a function to create your MS object:
MS findA(double d) {
MS result = {d, 0, 0};
return result;
}
MS findB(double d) {
MS result = {0, d, 0};
return result;
}
Really, now that there are lambdas, for this job we want a version of binary search that takes a unary "comparator":
double d = something();
unary_upper_bound(vec.begin(), vec.end(), [d](const MS &rhs) {
return d < rhs.aT;
});
C++0x doesn't provide it, though.
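For what it's worth, C++11 does ship std::partition_point in <algorithm>, which takes a unary predicate and can play the role of the unary_upper_bound imagined above. A minimal sketch, assuming the vector is sorted by aT; the function names upper_index_unary and lower_index_unary are made up for this example:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct MS {
    double aT;
    double bT;
    double cT;
};

// Helper used only to state the precondition of the searches below.
inline bool sorted_by_aT(const std::vector<MS>& v) {
    return std::is_sorted(v.begin(), v.end(),
        [](const MS& l, const MS& r) { return l.aT < r.aT; });
}

// Equivalent of upper_bound: index of the first element with d < aT.
inline std::size_t upper_index_unary(const std::vector<MS>& v, double d) {
    assert(sorted_by_aT(v));
    // partition_point returns the first element for which the predicate is
    // false, so the predicate must hold for the leading elements (aT <= d).
    auto it = std::partition_point(v.begin(), v.end(),
        [d](const MS& m) { return !(d < m.aT); });
    return static_cast<std::size_t>(it - v.begin());
}

// Equivalent of lower_bound: index of the first element with aT >= d.
inline std::size_t lower_index_unary(const std::vector<MS>& v, double d) {
    assert(sorted_by_aT(v));
    auto it = std::partition_point(v.begin(), v.end(),
        [d](const MS& m) { return m.aT < d; });
    return static_cast<std::size_t>(it - v.begin());
}
```

Because the key is captured by the lambda, no dummy MS object and no two-argument comparator are needed.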
The algorithms std::sort, std::lower_bound, and std::binary_search take a predicate that compares two elements of the container. Any lambda that compares two MS objects and returns true when they are in order should work for all three algorithms.
Not directly relevant to what you're saying about lambdas, but this might be an idea for using the binary search functions:
#include <iostream>
#include <algorithm>
#include <vector>
struct MS
{
double aT;
double bT;
double cT;
MS(double a, double b, double c) : aT(a), bT(b), cT(c) {}
};
// template parameter is a data member of MS, of type double
template <double MS::*F>
struct Find {
double d;
Find(double d) : d(d) {}
};
template <double MS::*F>
bool operator<(const Find<F> &lhs, const Find<F> &rhs) {
return lhs.d < rhs.d;
}
template <double MS::*F>
bool operator<(const Find<F> &lhs, const MS &rhs) {
return lhs.d < rhs.*F;
}
template <double MS::*F>
bool operator<(const MS &lhs, const Find<F> &rhs) {
return lhs.*F < rhs.d;
}
int main() {
std::cout << (Find<&MS::bT>(1) < Find<&MS::bT>(2)) << "\n";
std::cout << (Find<&MS::bT>(1) < MS(1,0,0)) << "\n";
std::cout << (MS(1,0,0) < Find<&MS::bT>(1)) << "\n";
std::vector<MS> vec;
vec.push_back(MS(1,0,0));
vec.push_back(MS(0,1,0));
std::lower_bound(vec.begin(), vec.end(), Find<&MS::bT>(0.5));
std::upper_bound(vec.begin(), vec.end(), Find<&MS::bT>(0.5));
}
Basically, by using Find as the value, we don't have to supply a comparator, because Find compares to MS using the field that we specify. This is the same kind of thing as the answer you saw over here: how to sort STL vector, but using the value rather than the comparator as in that case. Not sure if it'd be all that great to use, but it might be, since it specifies the value to search for and the field to search in a single short expression.
I had the same problem for std::equal_range and came up with an alternative solution.
I have a collection of pointers to objects sorted on a type field. I need to find the range of objects for a given type.
const auto range = std::equal_range (next, blocks.end(), nullptr,
[type] (Object* o1, Object* o2)
{
return (o1 ? o1->Type() : type) < (o2 ? o2->Type() : type);
});
Although it is less efficient than a dedicated predicate as it introduces an unnecessary nullptr test for each object in my collection, it does provide an interesting alternative.
As an aside, when I do use a class as in your example, I tend to do the following. As well as being shorter, this allows me to add additional types with only 1 function per type rather than 4 operators per type.
class MSbTLess
{
private:
static inline const double& value (const MS& val)
{
return val.bT;
}
static inline const double& value (const double& val)
{
return val;
}
public:
template <typename T1, typename T2>
bool operator() (const T1& lhs, const T2& rhs) const
{
return value (lhs) < value (rhs);
}
};
In the definition of lower_bound and other STL Algorithms the Compare function is such that the first type must match that of the Forward Iterator and the second type must match that of T (i.e., of the value).
template< class ForwardIt, class T, class Compare >
ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value, Compare comp );
So one can indeed compare objects of different types (which is what the other response called a unary comparator). In C++11:
vector<MS> v = SomeSortedVectorofMSByFieldaT();
double a_key;
auto it = std::lower_bound(v.begin(),
v.end(),
a_key,
[](const MS& m, const double& a) {
return m.aT < a;
});
And this can be used with other STL algorithm functions as well.
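One caveat: std::equal_range calls the comparator in both orders, (element, value) and (value, element), so a single C++11 lambda with fixed parameter types is not enough there. A minimal sketch of one way around that is a small functor with overloads; the name ByAT and the helper range_by_aT are made up for this example, and the (MS, MS) overload is included so the same functor also works with std::sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct MS {
    double aT;
    double bT;
    double cT;
};

// Comparator on the aT field, callable in every order the algorithms need.
struct ByAT {
    bool operator()(const MS& l, const MS& r) const { return l.aT < r.aT; }
    bool operator()(const MS& m, double key) const { return m.aT < key; }
    bool operator()(double key, const MS& m) const { return key < m.aT; }
};

// Returns [first, last) indices of the elements whose aT equals key.
// Precondition: v is sorted with ByAT.
inline std::pair<std::size_t, std::size_t>
range_by_aT(const std::vector<MS>& v, double key) {
    assert(std::is_sorted(v.begin(), v.end(), ByAT()));
    auto r = std::equal_range(v.begin(), v.end(), key, ByAT());
    return std::make_pair(static_cast<std::size_t>(r.first - v.begin()),
                          static_cast<std::size_t>(r.second - v.begin()));
}
```

With C++14 or later, a generic lambda combined with overloaded key-extraction helpers could probably replace the functor, but that is beyond what C++11 offers.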

Resources