Slices vs Arrays in Carbon language - carbon-lang

What are the differences between Array and Slice in Carbon? I found a document in the official repo, but it is currently incomplete.
The following is example code from the official Carbon repo.
// Carbon:
package Geometry api;
import Math;
class Circle {
var r: f32;
}
fn PrintTotalArea(circles: Slice(Circle)) {
var area: f32 = 0;
for (c: Circle in circles) {
area += Math.Pi * c.r * c.r;
}
Print("Total area: {0}", area);
}
fn Main() -> i32 {
// A dynamically sized array, like `std::vector`.
var circles: Array(Circle) = ({.r = 1.0}, {.r = 2.0});
// Implicitly constructs `Slice` from `Array`.
PrintTotalArea(circles);
return 0;
}
I guess that Slice is something like std::span, since the C++ version of the code above uses std::span where the Carbon version uses Slice. Am I correct?
// C++:
#include <math.h>
#include <iostream>
#include <span>
#include <vector>
struct Circle {
float r;
};
void PrintTotalArea(std::span<Circle> circles) {
float area = 0;
for (const Circle& c : circles) {
area += M_PI * c.r * c.r;
}
std::cout << "Total area: " << area << "\n";
}
auto main(int argc, char** argv) -> int {
std::vector<Circle> circles = {{1.0}, {2.0}};
// Implicitly constructs `span` from `vector`.
PrintTotalArea(circles);
return 0;
}

Currently, a first version of Arrays exists and is still being worked on, while the Slice type is basically not defined yet, so it is too early to give you a real answer.
As you guessed, based on some design discussions, slices may eventually have similarities with std::span, but nothing has been approved yet.
You can check the open design proposals on GitHub directly.
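To make the std::span comparison concrete, here is a minimal C++ sketch of a span-like, non-owning view. The class name Slice, its members and the TotalRadius function are hypothetical and chosen only for illustration; this is not Carbon's Slice (whose design is still undecided), just a hand-rolled analogue of what std::span (C++20) provides: a pointer plus a length that borrows storage owned by a container such as std::vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical span-like view (illustration only, not Carbon's Slice).
template <class T>
class Slice {
public:
    // Implicit on purpose, mirroring "implicitly constructs Slice from Array":
    // the view borrows the vector's buffer and copies no elements.
    template <class U>
    Slice(std::vector<U>& v) : data_(v.data()), size_(v.size()) {}
    Slice(T* data, std::size_t size) : data_(data), size_(size) {}

    T* begin() const { return data_; }
    T* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) const { return data_[i]; }

    // Sub-view over the same storage; offset must be <= size().
    Slice subslice(std::size_t offset) const {
        return Slice(data_ + offset, size_ - offset);
    }

private:
    T* data_;  // not owned: the vector keeps ownership
    std::size_t size_;
};

struct Circle { float r; };

// Takes a read-only view, so it accepts any contiguous run of Circles.
float TotalRadius(Slice<const Circle> circles) {
    float sum = 0.0f;
    for (const Circle& c : circles) sum += c.r;
    return sum;
}
```

As with std::span, such a view must not outlive the vector it borrows from, and anything that reallocates the vector (such as push_back) leaves it dangling. Writes through a mutable view are visible in the owning vector, because both refer to the same elements.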

Related

Rcpp: how to use unwind protection?

I was wondering how I could make some Rcpp code use automatic unwind protection for all Rcpp object creations.
For example, suppose I have some code like this:
#include <stdint.h>
#include <Rcpp.h>
class MyObj {
public:
int val;
MyObj(int val) : val(val) {};
~MyObj() {
std::cout << "I'm being destructed - value was: " << val << std::endl;
}
};
// [[Rcpp::export]]
Rcpp::NumericVector crashme(unsigned int seed)
{
srand(seed);
MyObj obj1(rand());
Rcpp::NumericVector out(INT64_MAX-1, 100.);
return out;
}
When I call crashme, obj1 doesn't get destructed before the function ends, due to R's long jumps which I want to protect against.
I see there is a function Rcpp::unwindProtect, but it's implemented as something that takes a callback.
I'm not 100% sure if I'm doing it right, but I managed to add unwind protection like this:
#include <stdint.h>
#include <Rcpp.h>
#include <Rcpp/unwindProtect.h>
// [[Rcpp::plugins(unwindProtect)]]
class MyObj {
public:
int val;
MyObj(int val) : val(val) {};
~MyObj() {
std::cout << "I'm being destructed - value was: " << val << std::endl;
}
};
struct NumVecArgs {
size_t size;
double fillwith;
};
SEXP alloc_NumVec(void *data)
{
NumVecArgs *args = (NumVecArgs*)data;
return Rcpp::NumericVector(args->size, args->fillwith);
}
// [[Rcpp::export]]
Rcpp::NumericVector crashme(unsigned int seed)
{
srand(seed);
MyObj obj1(rand());
NumVecArgs args = {INT64_MAX-1, 100.};
Rcpp::NumericVector out = Rcpp::unwindProtect(alloc_NumVec, (void*)&args);
return out;
}
Now calling crashme will successfully destruct obj1 and print the destructor message.
But this is very inconvenient, since I have a series of different Rcpp object allocations taking different constructor arguments, which would mean either defining a different struct and callback for each one of them, or translating all the calls into lengthy lambda functions.
Is there any way to automatically make all calls to constructors of e.g. Rcpp::NumericVector and Rcpp::IntegerVector have unwind protection?
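One way to avoid writing a separate struct and callback for every constructor is a single variadic template that packs the constructor arguments into a std::tuple and passes them through the void* parameter of a C-style callback API. The sketch below is Rcpp-free so it compiles on its own: protect_call is a hypothetical stand-in for a callback-taking function such as Rcpp::unwindProtect, and make_protected is likewise a hypothetical name. It has not been checked against Rcpp; with Rcpp, the callback would presumably have to return the SEXP of the constructed object rather than storing the object itself. It assumes C++17 (std::make_from_tuple, std::optional).

```cpp
#include <optional>
#include <tuple>
#include <utility>

// Hypothetical stand-in for a C-style "run this callback under protection"
// API: a plain function pointer plus an opaque data pointer.
void* protect_call(void* (*callback)(void*), void* data) {
    // A real implementation would install its protection here.
    return callback(data);
}

// Hypothetical generic helper: constructs T(args...) inside the callback, so
// one template replaces a hand-written struct + callback per constructor.
template <class T, class... Args>
T make_protected(Args... args) {
    struct Payload {
        std::tuple<Args...> args;  // constructor arguments, packed
        std::optional<T> result;   // filled in by the callback
    };
    Payload payload{std::tuple<Args...>(std::move(args)...), std::nullopt};
    // A captureless lambda converts to a plain function pointer.
    protect_call(
        [](void* data) -> void* {
            auto* p = static_cast<Payload*>(data);
            p->result.emplace(std::make_from_tuple<T>(std::move(p->args)));
            return nullptr;
        },
        &payload);
    // Assumes the callback ran to completion; a real protection API would
    // throw instead of returning normally if the call was aborted.
    return std::move(*payload.result);
}
```

For example, make_protected<std::vector<double>>(std::size_t{3}, 100.0) would build a three-element vector through the callback. Arguments are taken by value, so each one is copied or moved once into the tuple.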

How to increase speed of large for loops

Right now I'm trying to run very large for loops for a task, about 8e+12 iterations in total. I tried using C++11 threading, but it does not seem to run as fast as required. I am using a system with 8 GB RAM, an i5 CPU and an Intel Graphics 4000 card. Would OpenMP be better, or do I have to use an NVIDIA GPU and CUDA for this task? My code is as below:
#include <ros/ros.h>
// PCL specific includes
#include <sensor_msgs/PointCloud2.h>
#include <pcl_conversions/pcl_conversions.h>
#include <pcl/point_types.h>
#include <pcl/filters/voxel_grid.h>
#include <visualization_msgs/Marker.h>
#include <rosbag/bag.h>
#include <std_msgs/Int32.h>
#include <rosbag/view.h>
#include <boost/foreach.hpp>
#define foreach BOOST_FOREACH
#include <fstream>
#include <pcl/point_cloud.h>
#include <pcl/octree/octree_pointcloud_changedetector.h>
#include <pcl/io/pcd_io.h>
#include <iostream>
#include <vector>
#include <ctime>
#include <cmath> // sqrt, pow, isnan
#include <thread>
ros::Publisher marker_publisher;
int frame_index = 0;
using namespace std;
int x[200000];
void thread_function(pcl::PointCloud<pcl::PointXYZRGB>::ConstPtr cloudB, const vector<int>& v, int p0) { // const reference avoids copying the index vector on every call
for(size_t p1=0;p1<v.size() && ros::ok();++p1) {
int p0p1 = sqrt( pow(cloudB->points[v[p1]].x-cloudB->points[v[p0]].x,2)
+pow(cloudB->points[v[p1]].y-cloudB->points[v[p0]].y,2)
+pow(cloudB->points[v[p1]].z-cloudB->points[v[p0]].z,2) ) * 1000;
if(p0p1>10) {
for(size_t p2=0;p2<v.size() && ros::ok();++p2) {
int p0p2 = sqrt( pow(cloudB->points[v[p2]].x-cloudB->points[v[p0]].x,2)
+pow(cloudB->points[v[p2]].y-cloudB->points[v[p0]].y,2)
+pow(cloudB->points[v[p2]].z-cloudB->points[v[p0]].z,2) ) * 1000;
int p1p2 = sqrt( pow(cloudB->points[v[p2]].x-cloudB->points[v[p1]].x,2)
+pow(cloudB->points[v[p2]].y-cloudB->points[v[p1]].y,2)
+pow(cloudB->points[v[p2]].z-cloudB->points[v[p1]].z,2) ) * 1000;
if(p0p2>10 && p1p2>10) {
}
}
}
}
x[p0] = 3;
cout<<"ended thread="<<p0<<endl;
}
void cloud_cb (const sensor_msgs::PointCloud2ConstPtr& input)
{
frame_index++;
pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloudB (new pcl::PointCloud<pcl::PointXYZRGB> );
pcl::fromROSMsg(*input,*cloudB);
// Initializing Marker parameters which will be used in rviz
vector<visualization_msgs::Marker> line_list, marker, text_view_facing;
line_list.resize(4); marker.resize(4); text_view_facing.resize(4);
for(int i=0;i<line_list.size();i++) {
marker[i].header.frame_id = line_list[i].header.frame_id = text_view_facing[i].header.frame_id = "/X3/base_link";
marker[i].header.stamp = line_list[i].header.stamp = text_view_facing[i].header.stamp =ros::Time();
marker[i].ns = line_list[i].ns = text_view_facing[i].ns ="lines";
marker[i].action = line_list[i].action = text_view_facing[i].action = visualization_msgs::Marker::ADD;
marker[i].pose.orientation.w = line_list[i].pose.orientation.w = text_view_facing[i].pose.orientation.w = 1;
marker[i].id = i+4;
line_list[i].id = i;
marker[i].type = visualization_msgs::Marker::POINTS;
line_list[i].type = visualization_msgs::Marker::LINE_LIST;
line_list[i].color.r = 1; line_list[i].color.g = 1; line_list[i].color. b = 1; line_list[i].color.a = 1;
marker[i].scale.x = 0.003;
marker[i].scale.y = 0.003;
marker[i].scale.z = 0.003;
text_view_facing[i].id = i+8;
text_view_facing[i].type = visualization_msgs::Marker::TEXT_VIEW_FACING;
text_view_facing[i].color.b = 1; text_view_facing[i].color.a = 1.0; text_view_facing[i].color.g = 1.0; text_view_facing[i].color.r = 1.0;
text_view_facing[i].scale.z = 0.015;
}
marker[3].scale.x = 0.05;
marker[3].scale.y = 0.05;
marker[3].scale.z = 0.05;
if(frame_index==10) // Saving the point cloud for only one time to find moved object in it
{
pcl::io::savePCDFileASCII ("test_pcd.pcd", *cloudB);
}
if(frame_index>10) // Reading above point cloud file after saving for once to compare it with newly arriving point clouds
{
pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud (new pcl::PointCloud<pcl::PointXYZRGB>);
if (pcl::io::loadPCDFile<pcl::PointXYZRGB> ("test_pcd.pcd", *cloud) == -1) //* load the file
{
PCL_ERROR ("Couldn't read file test_pcd.pcd \n");
}
else {
srand ((unsigned int) time (NULL));
// Octree resolution - side length of octree voxels
double resolution = 0.1;
// Instantiate octree-based point cloud change detection class
pcl::octree::OctreePointCloudChangeDetector<pcl::PointXYZRGB> octree (resolution);
// Add points from cloudA to octree
octree.setInputCloud (cloud);
octree.addPointsFromInputCloud ();
// Switch octree buffers: This resets octree but keeps previous tree structure in memory.
octree.switchBuffers ();
// Add points from cloudB to octree
octree.setInputCloud (cloudB);
octree.addPointsFromInputCloud ();
std::vector<int> newPointIdxVector;
// Get vector of point indices from octree voxels which did not exist in previous buffer
octree.getPointIndicesFromNewVoxels (newPointIdxVector);
geometry_msgs::Point p; std_msgs::ColorRGBA c;
for (size_t i = 0; i < newPointIdxVector.size (); ++i)
{
p.x = cloudB->points[newPointIdxVector[i]].x;
p.y = cloudB->points[newPointIdxVector[i]].y;
p.z = cloudB->points[newPointIdxVector[i]].z;
c.r = cloudB->points[newPointIdxVector[i]].r/255.0;
c.g = cloudB->points[newPointIdxVector[i]].g/255.0;
c.b = cloudB->points[newPointIdxVector[i]].b/255.0;
c.a = 1;
//cout<<newPointIdxVector.size()<<"\t"<<p.x<<"\t"<<p.y<<"\t"<<p.z<<endl;
if(!isnan(p.x) && !isnan(p.y) && !isnan(p.z)) {
marker[3].points.push_back(p);
marker[3].colors.push_back(c);
}
}
marker_publisher.publish(marker[3]);
pcl::PointCloud<pcl::PointXYZRGB> P;
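vector<thread> t(newPointIdxVector.size()); // std::vector instead of a non-standard variable-length array
for(int p0=0;p0<newPointIdxVector.size();++p0) { // For each voxel in moved object
// the lambda captures the index vector by reference; the threads are joined below, before it goes out of scope
t[p0] = thread([&cloudB, &newPointIdxVector, p0] { thread_function(cloudB, newPointIdxVector, p0); });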
}
for(int p0=0;p0<newPointIdxVector.size();++p0) { // For each voxel in moved object
t[p0].join();
cout<<"joined"<<"\t"<<p0<<"\t"<<x[p0]<<endl;
}
}
}
}
int main (int argc, char** argv)
{
ros::init (argc, argv, "training");
ros::NodeHandle nh;
ros::Subscriber sub = nh.subscribe<sensor_msgs::PointCloud2> ("input", 1, cloud_cb);
marker_publisher = nh.advertise<visualization_msgs::Marker> ("visualization_marker",1);
// Spin
ros::spin ();
}
This task is really important for my algorithm. I need suggestions on how to make these loops run much faster.
In the code above, thread_function is the main function where I'm currently putting the for loops. Is there any way to increase its performance?
OpenMP is the easiest to implement and try. Just add a couple of lines to your CMakeLists.txt to enable it, an include, and the famous #pragma omp parallel for line just before your for loop.
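Here is a minimal sketch of what that could look like for the triple loop in thread_function, assuming the points have first been copied into a plain std::vector of a hypothetical Point struct (the PCL types are left out so it compiles on its own). It parallelizes only the outer loop, lets OpenMP's reduction clause combine the per-thread counts, and compares squared distances instead of calling sqrt and pow. Counting triples is a hypothetical placeholder for whatever the empty if body should do, and the min_dist threshold stands in for the original `* 1000 > 10` integer test without being exactly equivalent to it. Compile with -fopenmp (GCC/Clang); without it the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <vector>

struct Point { float x, y, z; };  // hypothetical, stands in for pcl::PointXYZRGB

// Squared Euclidean distance: no sqrt/pow calls in the hot loop.
inline float dist2(const Point& a, const Point& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Counts ordered triples (p0, p1, p2) whose pairwise distances all exceed
// min_dist. Each p0 is independent, so OpenMP can split the outer loop.
long long count_triples(const std::vector<Point>& pts, float min_dist) {
    const float min2 = min_dist * min_dist;
    const long long n = static_cast<long long>(pts.size());
    long long count = 0;
#pragma omp parallel for reduction(+ : count) schedule(dynamic)
    for (long long p0 = 0; p0 < n; ++p0) {
        for (long long p1 = 0; p1 < n; ++p1) {
            if (dist2(pts[p0], pts[p1]) <= min2) continue;
            for (long long p2 = 0; p2 < n; ++p2) {
                if (dist2(pts[p0], pts[p2]) > min2 &&
                    dist2(pts[p1], pts[p2]) > min2) {
                    ++count;
                }
            }
        }
    }
    return count;
}
```

Note that the work is still cubic in the number of points, so parallelism alone can speed it up by at most roughly the number of cores.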
Threading itself is not necessarily a guarantee of speed. If your process is mostly sequential, there is nothing to be done in parallel. In your case, it looks like you have a loop whose iterations might be done independently in parallel, but because each iteration is small and consists mostly of simple mathematical operations, the overhead of giving each item its own thread might not save you much (if any) time. The algorithm itself might need an overhaul (i.e. doing this an entirely different way), but threading could potentially solve your issue if your loop is huge and you can break it into, say, 4 chunks and process the 4 chunks in parallel (e.g. one thread does items 0-100, another 101-200, etc.). Just be aware that one thread might finish before another, so if some other process relies on the completion of the whole data set, you'll need to ensure that all 4 threads are done before continuing. And if you manipulate the shared data (e.g. shifting, adding or removing elements) in the parallel threads, you can end up corrupting the data another thread is working on. Hope that helps!
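The chunking idea can be sketched with plain std::thread: split the index range into a few contiguous chunks, give each thread its own output slot so no locking is needed, and combine the partial results only after every thread has been joined. run_in_chunks is a hypothetical helper name, and work(i) stands for whatever per-item computation returns a number; it must be safe to call from several threads at once.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: runs work(i) for every i in [0, n), split into
// num_chunks contiguous ranges, each processed on its own std::thread.
template <class Work>
long long run_in_chunks(std::size_t n, unsigned num_chunks, Work work) {
    num_chunks = std::max(1u, num_chunks);  // at least one chunk
    std::vector<long long> partial(num_chunks, 0);  // one slot per thread
    std::vector<std::thread> threads;
    threads.reserve(num_chunks);
    const std::size_t chunk = (n + num_chunks - 1) / num_chunks;
    for (unsigned c = 0; c < num_chunks; ++c) {
        const std::size_t begin = std::min(n, c * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back([&partial, &work, c, begin, end] {
            long long local = 0;  // accumulate locally, write the slot once
            for (std::size_t i = begin; i < end; ++i) local += work(i);
            partial[c] = local;
        });
    }
    for (std::thread& t : threads) t.join();  // wait for every chunk
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A natural choice for num_chunks is std::thread::hardware_concurrency(), which may return 0; that is why the sketch clamps it to at least one chunk.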

Qt C++ Displaying images outside the GUI thread (Boost thread)

I am developing a C++ library whose interface is built with Qt, using VS2015. On the library side, 3 boost threads continuously load images from 3 folders. I am trying to display these images in 3 different QLabels (or equivalent QWidgets), so the thread body contains this functionality, in particular by calling the setPixmap method. Although the call to the function is protected by a boost mutex, I get exceptions, probably due to thread synchronization. While looking for a solution, I learned that QPixmap is not "thread-safe" (non-reentrant). I also tried QGraphicsView, but it in turn relies on QPixmap, so I ran into the same problem.
So my question is: does an alternative to QPixmap exist to display images in Qt in a thread-safe manner?
I would recommend not doing multi-threading directly in GUI code. Although Qt provides multi-threading support in general, IMHO the widgets are not well prepared for it.
Thus, to achieve image loaders which run concurrently in separate threads I would suggest the following concept:
Each threaded image loader fills a private buffer. The GUI inspects these buffers from time to time (using a QTimer) and updates its QPixmaps. Since the buffers are accessed both from the respective image loader thread and from the GUI thread, they have to be mutex-guarded, of course.
My sample code testLoadImageMT.cc:
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <QtWidgets>
// manually added types (normally provided by glib)
typedef unsigned guint;
typedef unsigned char guint8;
// the fluffy-cat image sample
struct Image {
guint width;
guint height;
guint bytes_per_pixel; /* 3:RGB, 4:RGBA */
guint8 pixel_data[1];
};
extern "C" const Image fluffyCat;
class ImageLoader {
private:
const Image &_img;
std::atomic<bool> _exit;
std::mutex _lock;
QImage _qImg;
std::thread _thread;
public: // main thread API
ImageLoader(const Image &img = fluffyCat):
_img(img),
_exit(false),
_qImg(img.width, img.height, QImage::Format_RGB888),
_thread(&ImageLoader::loadImage, this) // initializers follow declaration order; the thread starts last
{ }
~ImageLoader()
{
_exit = true;
_thread.join();
}
ImageLoader(const ImageLoader&) = delete;
void applyImage(QLabel &qLblImg)
{
std::lock_guard<std::mutex> lock(_lock);
qLblImg.setPixmap(QPixmap::fromImage(_qImg));
}
private: // thread private
void loadImage()
{
for (;;) {
{ std::lock_guard<std::mutex> lock(_lock);
_qImg.fill(0);
}
size_t i = 0;
for (int y = 0; y < (int)_img.height; ++y) {
for (int x = 0; x < (int)_img.width; ++x) {
const quint32 value
= _img.pixel_data[i + 2]
| (_img.pixel_data[i + 1] << 8)
| (_img.pixel_data[i + 0] << 16)
| (0xffu << 24); // unsigned literal: 0xff << 24 would overflow a 32-bit signed int
i += _img.bytes_per_pixel;
{ std::lock_guard<std::mutex> lock(_lock);
_qImg.setPixel(x, y, value);
}
if (_exit) return; // important: make thread co-operative
}
std::this_thread::sleep_for(std::chrono::milliseconds(100)); // slow down CPU cooler
}
}
}
};
int main(int argc, char **argv)
{
// settings:
enum { N = 3 }; // number of images loaded/displayed
enum { Interval = 50 }; // update rate for GUI 50 ms -> 20 Hz (round about)
// build appl.
qDebug() << "Qt Version: " << QT_VERSION_STR;
QApplication app(argc, argv);
// build GUI
QWidget qMainWin;
QVBoxLayout qVBox;
QLabel *pQLblImgs[N];
for (int i = 0; i < N; ++i) {
qVBox.addWidget(
new QLabel(QString::fromUtf8("Image %1").arg(i + 1)));
qVBox.addWidget(
pQLblImgs[i] = new QLabel());
}
qMainWin.setLayout(&qVBox);
qMainWin.show();
// build image loaders
ImageLoader imgLoader[N];
// install timer
QTimer qTimer;
qTimer.setInterval(Interval); // ms
QObject::connect(&qTimer, &QTimer::timeout,
[&imgLoader, &pQLblImgs]() {
for (int i = 0; i < N; ++i) {
imgLoader[i].applyImage(*pQLblImgs[i]);
}
});
qTimer.start();
// exec. application
return app.exec();
}
Sorry, I used std::thread instead of boost::thread, as I have no experience with the latter, nor a working installation. I believe (hope) the differences are marginal. QThread would have been the "Qt native" alternative, but again: no experience.
To keep things simple, I just copied the data out of a linked binary image (instead of loading one from a file or from anywhere else). Hence, a second file, fluffyCat.cc, has to be compiled and linked to make this an MCVE:
/* GIMP RGB C-Source image dump (fluffyCat.cc) */
// manually added types (normally provided by glib)
typedef unsigned guint;
typedef unsigned char guint8;
extern "C" const struct {
guint width;
guint height;
guint bytes_per_pixel; /* 3:RGB, 4:RGBA */
guint8 pixel_data[16 * 16 * 3 + 1];
} fluffyCat = {
16, 16, 3,
"x\211s\215\232\200gw`fx`at[cx^cw^fu\\itZerWn|ap~cv\204jnzedq^fr^kzfhv^Ra"
"GRbMWdR\\jXer^qw_\311\256\226\271\253\235\275\264\252\315\277\260\304\255"
"\231u~i\213\225\207l{fly`jx\\^nRlz_z\206nlx`t~i\221\211s\372\276\243\375"
"\336\275\376\352\340\356\312\301\235\216\212judgwcl~f\212\226u}\206h\212"
"\224q\231\237z\232\236{\216\225v\225\230\200\306\274\244\376\360\327\376"
"\361\331\376\360\341\326\275\272\253\240\244{\203p\202\220xp~e{\204^\222"
"\230n\212\217g\240\242{\234\236z\214\222r\270\271\247\360\353\340\376\370"
"\336\376\363\334\375\357\336\310\254\262\232\223\234\\gRfrX\204\220z\212"
"\225g\225\232j\254\255\177\252\250{\225\226u\304\302\265\374\365\351\376"
"\375\366\376\367\341\376\361\320\374\346\324\306\241\242\237\232\235n{fj"
"xckyfu~fUX#VZCfnT\231\231\207\374\374\371\377\372\354\376\376\374\376\376"
"\372\376\362\332\375\340\301\341\300\264\260\253\262jvdbq\\XkVJTDNTCCG8O"
"TE\322\321\313\377\377\375\376\376\373\376\377\376\376\376\375\376\374\362"
"\376\360\342\344\311\306\250\244\254R_PL^HXkT<#2OP#`dP\217\220\177\374\374"
"\370\377\377\374\376\375\371\377\377\376\376\374\360\377\367\336\376\350"
"\316\342\303\274\246\236\245jtbXdQTdNQYGU\\KchV\317\315\302\377\376\372\377"
"\376\367\376\373\360\377\376\367\376\366\337\376\355\312\374\331\271\323"
"\263\251\216\214\214\\hTP^HL\\FR[LMXI^dW\355\352\342\376\375\366\377\374"
"\360\376\374\361\376\374\361\376\356\321\374\331\264\374\330\266\330\270"
"\260\200||Y`SLVE>K9BJ<CN?VYP\347\330\322\376\366\345\376\363\330\376\367"
"\337\377\372\350\374\342\314\326\243\210\375\350\314\352\317\304shc^`TV`"
"RVbT>B4IS?PTD\244\232\216\374\355\320\376\354\311\376\351\306\376\362\332"
"\374\344\321\267\206u\375\362\337\326\274\272\\POMNBT]LNZH:<*<A*TV>OI;\242"
"\222\207\340\304\243\375\335\262\372\336\272\376\361\334\320\241\212\374"
"\352\322\266\233\237c\\WFH;MR>\\`F~xP\220\214[pqE\211\202\\g]=\230\214`\313"
"\266\207\344\303\240\362\336\274\323\257\201\333\304\240\305\252\204\254"
"\232p\216\206\\\206\203U\232\224b\234\244b\246\257m\220\232`\224\227h~\202"
"W\206\213]\204\210W\227\227i|\177RvzNlsGrtJwtLz}N{\204RlxF",
};
I compiled and tested it in VS2013 with Qt 5.9.2 on Windows 10 (64 bit).
I solved it using signals/slots: the "non-GUI" thread emits a signal instead of displaying the images, and the connected slot updates the QLabel inside the GUI thread!
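A cross-thread signal/slot connection works because Qt queues the slot call and runs it from the event loop of the thread the receiving object lives in, here the GUI thread. The following Qt-free C++ sketch mimics that idea with a mutex-guarded queue of tasks: worker threads only post work, and only the main ("GUI") thread executes it. MainThreadQueue and demo are hypothetical names, and the std::string slots merely stand in for the QLabels; in real Qt code the event loop plays the role of drain().

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop's cross-thread queue.
class MainThreadQueue {
public:
    // Called from worker threads: enqueue work for the "GUI" thread.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }
    // Called only from the "GUI" thread: run everything queued so far.
    void drain() {
        std::queue<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(pending, tasks_);  // run tasks without holding the lock
        }
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
        }
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Three "loader" threads each produce an image name and post the display
// update; only the main thread ever writes to `labels`.
std::vector<std::string> demo() {
    MainThreadQueue queue;
    std::vector<std::string> labels(3);  // stands in for three QLabels
    std::vector<std::thread> loaders;
    for (int i = 0; i < 3; ++i) {
        loaders.emplace_back([&queue, &labels, i] {
            std::string image = "image from folder " + std::to_string(i);
            queue.post([&labels, i, image] { labels[i] = image; });
        });
    }
    for (std::thread& t : loaders) t.join();
    queue.drain();  // in Qt, the event loop does this for queued connections
    return labels;
}
```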

Copy struct with function pointer to device

I have a struct containing the parameters of a linear function, as well as the function itself. What I want to do is copy this struct to the device and then evaluate the linear function. The following example doesn't make sense but it is sufficient to describe the difficulties I have:
struct model
{
double* params;
double (*func)(double*, double);
};
I don't know how to copy this struct to the device.
Here are my functions:
Init function
// init function for struct model
__host__ void model_init(model* m, double* params, double(*func)(double*,double))
{
if(m)
{
m->params = params;
m->func = func;
}
}
Evaluation function
__device__ double model_evaluate(model* m, double x)
{
if(m)
{
return m->func(m->params, x);
}
return 0.0;
}
The actual function
__host__ __device__ double linear_function(double* params, double x)
{
return params[0] + params[1] * x;
}
Function called inside kernel
__device__ double compute(model *d_linear_model)
{
return model_evaluate(d_linear_model,1.0);
}
The kernel itself
__global__ void kernel(double *array, model *d_linear_model, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < N)
{
array[idx] = compute(d_linear_model);
}
}
I know how to copy an array from host to device but I don't know how to do this for this concrete struct which contains a function.
The kernel call in main then looks like this:
int block_size = 4;
int n_blocks = N_array/block_size + (N_array % block_size == 0 ? 0:1);
kernel<<<n_blocks, block_size>>>(device_array, d_linear_model, N_array);
You've outlined two items that I consider to be somewhat more difficult than beginner-level CUDA programming:
use of device function pointers
a "deep copy" operation (on the embedded params pointer in your model structure)
Both of these topics have been covered in other questions. For example this question/answer discusses deep copy operations - when a data structure has embedded pointers to other data. And this question/answer links to a variety of resources on device function pointer usage.
But I'll go ahead and offer a possible solution for your posted case. Most of your code is usable as-is (at least for demonstration purposes). As mentioned already, your model structure will present two challenges:
struct model
{
double* params; // requires a "deep copy" operation
double (*func)(double*, double); // requires special handling for device function pointers
};
As a result, although most of your code is usable as-is, your "init" function is not. That might work for a host realization, but not for a device realization.
The deep copy operation requires us to copy the overall structure, plus separately copy the data pointed to by the embedded pointer, plus separately copy or "fixup" the embedded pointer itself.
The usage of a device function pointer is restricted by the fact that we cannot grab the actual device function pointer in host code - that is illegal in CUDA. So one possible solution is to use a __device__ construct to "capture" the device function pointer, then do a cudaMemcpyFromSymbol operation in host code, to retrieve the numerical value of the device function pointer, which can then be moved about in ordinary fashion.
Here's a worked example, building on what you have shown, that demonstrates the two concepts above. I have not created a "device init" function, but all the code necessary to do that is in the main function. Once you've grasped the concepts, you can take whatever code you wish out of the main function below and craft it into your own "device init" function:
$ cat t968.cu
#include <iostream>
#define NUM_PARAMS 2
#define ARR_SIZE 1
#define nTPB 256
struct model
{
double* params;
double (*func)(double*, double);
};
// init function for struct model -- not using this for device operations
__host__ void model_init(model* m, double* params, double(*func)(double*,double))
{
if(m)
{
m->params = params;
m->func = func;
}
}
__device__ double model_evaluate(model* m, double x)
{
if(m)
{
return m->func(m->params, x);
}
return 0.0;
}
__host__ __device__ double linear_function(double* params, double x)
{
return params[0] + params[1] * x;
}
__device__ double compute(model *d_linear_model)
{
return model_evaluate(d_linear_model,1.0);
}
__global__ void kernel(double *array, model *d_linear_model, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < N)
{
array[idx] = compute(d_linear_model);
}
}
__device__ double (*linear_function_ptr)(double*, double) = linear_function;
int main(){
// grab function pointer from device code
double (*my_fp)(double*, double);
cudaMemcpyFromSymbol(&my_fp, linear_function_ptr, sizeof(void *));
// setup model
model my_model;
my_model.params = new double[NUM_PARAMS];
my_model.params[0] = 1.0;
my_model.params[1] = 2.0;
my_model.func = my_fp;
// setup for device copy of model
model *d_model;
cudaMalloc(&d_model, sizeof(model));
// setup "deep copy" for params
double *d_params;
cudaMalloc(&d_params, NUM_PARAMS*sizeof(double));
cudaMemcpy(d_params, my_model.params, NUM_PARAMS*sizeof(double), cudaMemcpyHostToDevice);
// copy model to device
cudaMemcpy(d_model, &my_model, sizeof(model), cudaMemcpyHostToDevice);
// fixup device params pointer in device model
cudaMemcpy(&(d_model->params), &d_params, sizeof(double *), cudaMemcpyHostToDevice);
// run test
double *d_array, *h_array;
cudaMalloc(&d_array, ARR_SIZE*sizeof(double));
h_array = new double[ARR_SIZE];
for (int i = 0; i < ARR_SIZE; i++) h_array[i] = i;
cudaMemcpy(d_array, h_array, ARR_SIZE*sizeof(double), cudaMemcpyHostToDevice);
kernel<<<(ARR_SIZE+nTPB-1)/nTPB,nTPB>>>(d_array, d_model, ARR_SIZE);
cudaMemcpy(h_array, d_array, ARR_SIZE*sizeof(double), cudaMemcpyDeviceToHost);
std::cout << "Results: " << std::endl;
for (int i = 0; i < ARR_SIZE; i++) std::cout << h_array[i] << " ";
std::cout << std::endl;
return 0;
}
$ nvcc -o t968 t968.cu
$ cuda-memcheck ./t968
========= CUDA-MEMCHECK
Results:
3
========= ERROR SUMMARY: 0 errors
$
For brevity, I've dispensed with proper CUDA error checking (instead, I have run the code with cuda-memcheck to demonstrate that it runs without runtime errors), but I would recommend proper error checking whenever you're having any trouble with your code.

split a string using find_if

I found the following code in the book "Accelerated C++" (Chapter 6.1.1), but I can't compile it. The problem is with the find_if lines. I have the necessary includes (vector, string, algorithm, cctype). Any idea?
Thanks, Jabba
bool space(char c) {
return isspace(c);
}
bool not_space(char c) {
return !isspace(c);
}
vector<string> split_v3(const string& str)
{
typedef string::const_iterator iter;
vector<string> ret;
iter i, j;
i = str.begin();
while (i != str.end())
{
// ignore leading blanks
i = find_if(i, str.end(), not_space);
// find end of next word
j = find_if(i, str.end(), space);
// copy the characters in [i, j)
if (i != str.end()) {
ret.push_back(string(i, j));
}
i = j;
}
return ret;
}
Writing this in a more STL-like manner,
#include <algorithm>
#include <cctype>
#include <functional>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
using namespace std;
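template<class P, class T>
void split(const string &str, P pred, T output) {
// not1/ptr_fun are deprecated in C++11 and removed in C++17; a lambda negates pred instead
auto not_pred = [&pred](char c) { return !pred(c); };
for (string::const_iterator i, j = str.begin(), str_end = str.end();
(i = find_if(j, str_end, not_pred)) != str_end;)
*output++ = string(i, j = find_if(i, str_end, pred));
}
int main() {
string input;
// getline keeps the spaces; cin >> input would already have split on them
while (getline(cin, input)) {
vector<string> words;
split(input, [](char c) { return isspace(static_cast<unsigned char>(c)) != 0; }, back_inserter(words));
copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}
return 0;
}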
There is no problem in the code you posted. There is a very obvious problem with the real code you linked to: is_space and space are member functions, and they cannot be called without an instance of Split2. This requirement doesn't make sense, though, so at least you should make those functions static.
(Actually it doesn't make much sense for split_v3 to be a member function either. What does having a class called Split2 achieve over having just a free function - possibly in a namespace?)
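Following the suggestion to make the predicates static, here is a sketch of the member-function version with static predicates. The class name Splitter is hypothetical, since the linked Split2 code is not reproduced here. Because static member functions need no object, find_if can take them like ordinary function pointers. The unsigned char cast guards isspace against negative char values, which would be undefined behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical class; only the static predicates matter here.
class Splitter {
public:
    static std::vector<std::string> split(const std::string& str) {
        std::vector<std::string> ret;
        auto i = str.begin();
        while (i != str.end()) {
            i = std::find_if(i, str.end(), not_space);  // skip leading blanks
            auto j = std::find_if(i, str.end(), space);  // end of next word
            if (i != str.end()) ret.push_back(std::string(i, j));  // [i, j)
            i = j;
        }
        return ret;
    }

private:
    // static: no Splitter instance is needed to call these.
    static bool space(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    static bool not_space(char c) { return !space(c); }
};
```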
As requested:
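class SplitV2 {
public:
void foo();
private:
struct space {
bool operator() (char c) const { return std::isspace(static_cast<unsigned char>(c)) != 0; }
};
struct not_space {
SplitV2::space space; // qualified name: an unqualified `space space;` would change the meaning of `space` inside not_space
bool operator() (char c) const { return !space(c); }
};
};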
Use them with std::find_if(it, it2, space()) or std::find_if(it, it2, not_space()).
Notice that not_space has a default-constructed space as a member variable. Constructing a fresh space inside every call to bool not_space::operator() might not be wise, though the compiler could probably take care of that. If the syntax for overloading operator() confuses you and you would like to know more about using structs as predicates, have a look at operator overloading and some guidelines on the STL.
Off hand, I would say it should probably be
i = str.find_if( ...
j = str.find_if( ...
