Divide large malloc-block into smaller "partitions" - struct

Is there a way to use pointer arithmetic on a large malloc block, so you can assign multiple structs or primitive data types to that already-allocated area? I'm writing something like this but it isn't working (trying to assign 200 structs to a 15000-byte malloc area):
char *primDataPtr = NULL;
typedef struct Metadata METADATA;
struct Metadata {
.
.
.
};/*struct Metadata*/
.
.
.
primDataPtr = (void*)(malloc(15000));
if(primDataPtr == NULL) {
exit(1);
}
char *tempPtr = primDataPtr;
int x;
for(x=0;x<200;x++) {
METADATA *md = (void*)(primDataPtr + (sizeof(METADATA) * x));
}//end x -for

The only thing I can see is that:
METADATA *md = (void*)(primDataPtr + (sizeof(METADATA) * x));
should be:
METADATA *md = (METADATA *)(primDataPtr + (sizeof(METADATA) * x));
I think?
PS: your malloc could also just allocate 200 * sizeof(METADATA).
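For instance (a minimal sketch; the struct body here is just a placeholder):
#include <stdlib.h>

typedef struct Metadata { int id; } METADATA; /* placeholder body */

int main(void) {
    /* size the allocation from the element count rather than a
       hard-coded byte count */
    METADATA *md = (METADATA *)malloc(200 * sizeof(METADATA));
    if (md == NULL) {
        exit(1);
    }
    for (int x = 0; x < 200; x++) {
        md[x].id = x; /* md[x] is the x-th struct in the block */
    }
    free(md);
    return 0;
}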

In C, indexing through a pointer to something uses the same syntax as indexing an array of something. You just need to be careful with the index ranges:
#include <assert.h>
#include <stdlib.h>

#define ARRAY_SIZE_IN_BYTES (15000)
void *primDataPtr = malloc(ARRAY_SIZE_IN_BYTES);
assert(primDataPtr);
METADATA *md = (METADATA *)primDataPtr;
for (int x = 0; x < (int)(ARRAY_SIZE_IN_BYTES / sizeof(METADATA)); x++) {
do_something_with(md[x]);
}

Related

Assign memory blob to PyTorch output tensor (C++ API)

I am training a linear model using PyTorch and saving it to a file with the save function. I have other code that loads the model in C++ and performs inference.
I would like to instruct the Torch C++ library to use a specific memory blob for the final output tensor. Is this even possible? If yes, how? Below you can see a small example of what I am trying to achieve.
#include <iostream>
#include <memory>
#include <torch/script.h>
int main(int argc, const char* argv[]) {
if (argc != 3) {
std::cerr << "usage: example-app <path-to-exported-script-module> <size-in-MB>\n";
return -1;
}
long numElements = (1024*1024)/sizeof(float) * atoi(argv[2]);
float *a = new float[numElements];
float *b = new float[numElements];
float *c = new float[numElements*4];
for (int i = 0; i < numElements; i++){
a[i] = i;
b[i] = -i;
}
//auto options = torch::TensorOptions().dtype(torch::kFloat64);
at::Tensor a_t = torch::from_blob((float*) a, {numElements,1});
at::Tensor b_t = torch::from_blob((float*) b, {numElements,1});
at::Tensor out = torch::from_blob((float*) c, {numElements,4});
at::Tensor c_t = at::cat({a_t,b_t}, 1);
at::Tensor d_t = at::reshape(c_t, {numElements,2});
torch::jit::script::Module module;
try {
module = torch::jit::load(argv[1]);
}
catch (const c10::Error& e) {
return -1;
}
out = module.forward({d_t}).toTensor();
std::cout<< out.sizes() << "\n";
delete [] a;
delete [] b;
delete [] c;
return 0;
}
So, I am allocating memory into "c" and then creating a tensor out of that memory, which I name "out". I load the model and call the forward method. I observe that the resulting data are copied/moved into the "out" tensor. However, I would like to instruct Torch to store the result directly into "out"'s memory. Is this possible?
Somewhere in the libtorch source code (I don't remember where; I'll try to find the file), there is an operator like the one below (notice the trailing &&)
torch::Tensor& operator=(torch::Tensor rhs) &&;
which does what you need, if I remember correctly. Basically, torch assumes that if you assign a tensor rhs to an rvalue reference tensor, then you actually mean to copy rhs into the underlying storage.
So in your case, that would be
std::move(out) = module.forward({d_t}).toTensor();
or
torch::from_blob((float*) c, {numElements,4}) = module.forward({d_t}).toTensor();
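As a quick sanity check (a sketch; it assumes the model's output really is a float tensor of shape {numElements, 4}), you can verify that the result landed in the caller-owned buffer by comparing pointers:
std::move(out) = module.forward({d_t}).toTensor();
// if the rvalue assignment wrote into the blob, "out" still
// aliases the raw buffer allocated with new
std::cout << "same storage: " << (out.data_ptr<float>() == c) << "\n";
std::cout << "c[0] = " << c[0] << "\n";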

How to read from or write to a socketConnection? (newbie)

I have never used C++ before, so this may be a dumb question.
In R I use this function to read from a socket:
socket_bin_reader <- function(in_sock) {
string_read <- raw(0)
while((rd <- readBin(in_sock, what = "raw", n=1)) > 0) {
if (rd == 0xff) rd <- readBin(in_sock, what = "raw", n =1)
string_read <- c(string_read, rd)
}
return(string_read)
}
This function does exactly what I need, but it has the disadvantage that it takes a lot of time to read large quantities of data. Therefore I am looking for ways to use C++.
I found this example on how to read byte-by-byte from a file (cpp-byte-file-reading)
The body of my function will probably be based on this example. My guess is that it will look like:
// [[Rcpp::export]]
NumericVector socket_bin_reader_C(??? in_sock) {
NumericVector out = NumericVector::create(??);
ifstream infile(in_sock, ios::in | ios::binary);
while (rd = infile.read(char*) > 0) {
if (rd == 0xff) rd = infile.read(char*);
add rd to out;
}
}
But I have two questions:
In Rcpp you have to provide a class for each parameter. What is the class for a socketConnection?
I know that in C or C++ you have to allocate memory. How can I dynamically reallocate more memory for the return vector?
Ben

RcppArrayFire passing a matrix row as af::array input

In this simple example I would like to subset a matrix by row and pass it to another C++ function; the example demonstrates this works by first passing an input array to the other function.
#include "RcppArrayFire.h"
using namespace Rcpp;
af::array theta_check_cpp( af::array theta){
if(*theta(1).host<double>() >= 1){
theta(1) = 0;
}
return theta;
}
// [[Rcpp::export]]
af::array theta_check(RcppArrayFire::typed_array<f64> theta){
const int theta_size = theta.dims()[0];
af::array X(2, theta_size);
X(0, af::seq(theta_size)) = theta_check_cpp( theta );
X(1, af::seq(theta_size)) = theta;
// return X;
Rcpp::Rcout << " works till here";
return theta_check_cpp( X.row(1) );
}
/*** R
theta <- c( 2, 2, 2)
theta_check(theta)
*/
The constructor you are using to create X has an argument ty for the data type, which defaults to f32. Therefore X uses 32 bit floats and you cannot extract a 64 bit host pointer from that. Either use
af::array X(2, theta_size, f64);
to create an array using 64 bit doubles, or extract a 32 bit host pointer via
if(*theta(1).host<float>() >= 1){
...

Algorithm for doing many substring reversals?

Suppose I have a string S of length N, and I want to perform M of the following operations:
choose 1 <= L,R <= N and reverse the substring S[L..R]
I am interested in what the final string looks like after all M operations. The obvious approach is to do the actual swapping, which leads to O(MN) worst-case behavior. Is there a faster way? I'm trying to just keep track of where an index ends up, but I cannot find a way to reduce the running time (though I have a gut feeling O(M lg N + N) -- for the operations and the final reading -- is possible).
Yeah, it's possible. Make a binary tree structure like
#include <stdbool.h>

struct node {
struct node *child[2];
struct node *parent;
char label;
bool subtree_flipped;
};
Then you can have a logical getter/setter for left/right child:
struct node *get_child(struct node *u, bool right) {
return u->child[u->subtree_flipped ^ right];
}
void set_child(struct node *u, bool right, struct node *c) {
u->child[u->subtree_flipped ^ right] = c;
if (c != NULL) { c->parent = u; }
}
Rotations have to preserve flipped bits:
struct node *detach(struct node *u, bool right) {
struct node *c = get_child(u, right);
if (c != NULL) { c->subtree_flipped ^= u->subtree_flipped; }
return c;
}
void attach(struct node *u, bool right, struct node *c) {
set_child(u, right, c);
if (c != NULL) { c->subtree_flipped ^= u->subtree_flipped; }
}
// rotates one of |p|'s child up.
// does not fix up the pointer to |p|.
void rotate(struct node *p, bool right) {
struct node *u = detach(p, right);
struct node *c = detach(u, !right);
attach(p, right, c);
attach(u, !right, p);
}
Implement splay with rotations. It should take a "guard" pointer that is treated as a NULL parent for the purpose of splaying, so that you can splay one node to the root and another to its right child. Once you have that, you can splay both endpoints of the region to be flipped and then toggle the flip bits for the root and for the two subtrees corresponding to the segments left unaffected.
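One common way to finish the operation is to isolate the segment as a middle subtree and toggle a single flip bit. Here is a sketch: kth(root, k) (the k-th node in symmetric order, which must follow the flip bits) and splay(u, guard) are assumed to exist as just described, and sentinel nodes are assumed at positions 0 and n+1 so both endpoints always exist.
/* assumed to exist, as described above */
struct node *kth(struct node *root, int k);
void splay(struct node *u, struct node *guard);

struct node *reverse_range(struct node *root, int l, int r) {
    struct node *left  = kth(root, l - 1);  /* node just before S[l] */
    struct node *right = kth(root, r + 1);  /* node just after S[r] */
    splay(left, NULL);            /* left endpoint becomes the root */
    splay(right, left);           /* right endpoint becomes its right child */
    /* the logical left subtree of |right| now holds exactly S[l..r] */
    struct node *mid = get_child(right, false);
    if (mid != NULL) { mid->subtree_flipped ^= true; } /* O(1) lazy reversal */
    return left;                  /* new root */
}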
Traversal looks like this.
void traverse(struct node *u, bool flipped) {
if (u == NULL) { return; }
flipped ^= u->subtree_flipped;
traverse(u->child[flipped], flipped);
visit(u);
traverse(u->child[!flipped], flipped);
}
A splay tree may help you; it supports the reverse operation on an array with total complexity O(M log N).
@F. Ju is right: splay trees are one of the best data structures to achieve your goal.
However, if you don't want to implement them, or if a solution in O((N + M) * sqrt(M)) is good enough, you can do the following:
Perform sqrt(M) consecutive queries, then rebuild the array from scratch in O(N) time.
In order to do that, for each query, store the information that the queried segment [a, b] is reversed or not (if you reverse some range of elements twice, they become unreversed).
The key here is to maintain the information for disjoint segments. Notice that since we perform at most sqrt(M) queries before rebuilding the array, we have at most sqrt(M) disjoint segments, and we can perform a query operation on sqrt(M) segments in sqrt(M) time. Let me know if you need a detailed explanation of how to "reverse" these disjoint segments.
This trick is very useful when solving problems like this, and it is worth knowing.
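For illustration, the O(N) rebuild step could look like this (a sketch; it assumes that after sqrt(M) queries the logical string is described by an ordered list of disjoint segments of the current array, each carrying a flipped flag):
#include <stdbool.h>

struct segment { int start, end; bool flipped; }; /* inclusive indices */

/* copy the segments of src into dst in logical order, reversing the
   flipped ones; O(N) total since the segments are disjoint */
void rebuild(char *dst, const char *src,
             const struct segment *segs, int nsegs) {
    int out = 0;
    for (int i = 0; i < nsegs; i++) {
        if (!segs[i].flipped) {
            for (int j = segs[i].start; j <= segs[i].end; j++)
                dst[out++] = src[j];
        } else {
            for (int j = segs[i].end; j >= segs[i].start; j--)
                dst[out++] = src[j];
        }
    }
}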
UPDATE:
I solved the problem exactly corresponding to yours on HackerRank, during their contest, using the method I described.
Here is the problem
Here is my solution in C++.
Here is the discussion about the problem and a brief description of my method; please check my 3rd message there.
"I'm trying to just keep track of where an index ends up"
If you're just trying to follow one entry of the starting array, it's easy to do that in O(M) time.
I was going to just write pseudocode, but no hand-waving was needed so I ended up with what's probably valid C++.
// untested C++, but it does compile to code that looks right.
struct swap {
int l, r;
// or make these non-member functions for C
bool covers(int pos) { return l <= pos && pos <= r; }
int apply_if_covering(int pos) {
// startpos - l = r - endpos;
// endpos = l - startpos + r
if(covers(pos))
pos = l - pos + r;
return pos;
}
};
int follow_swaps (int pos, int len, struct swap swaps[], int num_swaps)
{
// pos = starting position of the element we want to track
// return value = where it will be after all the swaps
for (int i = 0 ; i < num_swaps ; i++) {
pos = swaps[i].apply_if_covering(pos);
}
return pos;
}
This compiles to very efficient-looking code.
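A quick usage example (hypothetical values, easy to check by hand):
// element starting at index 3 of a 10-element array, two reversals
struct swap ops[] = { {2, 5}, {0, 9} };
int final_pos = follow_swaps(3, 10, ops, 2);
// first op covers 3: 3 -> 2 - 3 + 5 = 4
// second op covers 4: 4 -> 0 - 4 + 9 = 5, so final_pos == 5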

Is qHash consistent across computers?

I have a database table with multiple text columns that collectively have to be unique, and I don't want to use a multicolumn key, so I was thinking of hashing the strings together into an int and using that as the primary key. I was wondering whether it would be a better idea to take advantage of uint qHash(const QString &key) or to write my own function, given that the database will need to be edited by different people in different places. (Also, if the whole approach is bad, please help.)
qHash is implemented as below:
static uint hash(const uchar *p, int n)
{
uint h = 0;
uint g;
while (n--) {
h = (h << 4) + *p++;
if ((g = (h & 0xf0000000)) != 0)
h ^= g >> 23;
h &= ~g;
}
return h;
}
static uint hash(const QChar *p, int n)
{
uint h = 0;
uint g;
while (n--) {
h = (h << 4) + (*p++).unicode();
if ((g = (h & 0xf0000000)) != 0)
h ^= g >> 23;
h &= ~g;
}
return h;
}
There is nothing platform-specific in that code. However, a hash algorithm does not guarantee uniqueness the way a database constraint does. It does its best to avoid collisions, but collisions are always possible. That is why most hash containers use buckets and reallocation algorithms.
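If you want to convince yourself, here is a self-contained sketch (it simply copies the byte-oriented overload shown above) demonstrating that the value depends only on the bytes of the key:
#include <cstdio>

typedef unsigned int uint;
typedef unsigned char uchar;

static uint hash(const uchar *p, int n)
{
    uint h = 0;
    uint g;
    while (n--) {
        h = (h << 4) + *p++;
        if ((g = (h & 0xf0000000)) != 0)
            h ^= g >> 23;
        h &= ~g;
    }
    return h;
}

int main()
{
    const uchar key[] = "example";
    // same input bytes give the same 32-bit value on any platform
    // where uint is 32 bits wide
    std::printf("%u\n", hash(key, 7));
    return 0;
}
Note that using the result as a persistent primary key also assumes the algorithm never changes between library versions, which is not something Qt promises.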
