PyTorch: "RuntimeError: CUDA error: device-side assert triggered"

Snippet from my code:
max = torch.tensor([3])
if USE_CUDA: max = max.cuda()
max_embedding = self.max_embedding(max)  # dim of max_embedding: 1*5
item_dict = {}
for item in item_list:
    item = torch.tensor(item)
    if USE_CUDA: item = item.cuda()
    item_embedding = self.item_embedding(item)  # dim of item_embedding: 1*20
    embedded = torch.cat((max_embedding, item_embedding), 1)
But I get the error "RuntimeError: CUDA error: device-side assert triggered".
The output after setting CUDA_LAUNCH_BLOCKING=1:
/pytorch/aten/src/THC/THCTensorIndex.cu:308: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:308: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:308: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:308: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:308: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "mytest.py", line 33, in <module>
    if USE_CUDA: item = item.cuda()
RuntimeError: CUDA error: device-side assert triggered
How to fix it?

This is a typical case of an index-out-of-bounds error manifesting itself in the context of embeddings: one of the indices passed to self.item_embedding exceeds the size of the embedding table, which triggers the srcIndex < srcSelectDimSize assertion on the device. Check this link for a solution to a similar problem.
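To locate the bad index, a minimal sketch (using the question's item_list; the num_embeddings attribute is standard nn.Embedding API) is to validate every index against the table size before the lookup, so the failure is a readable Python exception rather than a device-side assert:

# Sketch: check all indices against the embedding table size before the lookup.
num_embeddings = self.item_embedding.num_embeddings  # size of the lookup table
for item in item_list:
    t = torch.tensor(item)
    if t.min() < 0 or t.max() >= num_embeddings:
        raise IndexError(f"item index {t.tolist()} out of range for embedding table of size {num_embeddings}")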

Related

Python ctypes dll - pass pointer to C structure

I am trying to extract data from a Hamamatsu C15713 MEMS-FPI spectrometer with its dll, through Python ctypes.
The dll function I am trying to call is:
HRESULT hpkfpi_getdevcount(HPKFPI_INITPARAM* initparam);
where the parameter initparam should be a pointer to a structure:
#define HPKFPI_INITARY_NUMBER 127
#define HPKFPI_INITARY_STRLEN 32
typedef struct {
    int32_t sizeof_parameter;
    int32_t devicecount;
    char buf_VID[HPKFPI_INITARY_NUMBER][HPKFPI_INITARY_STRLEN];
    char buf_PID[HPKFPI_INITARY_NUMBER][HPKFPI_INITARY_STRLEN];
    char buf_SERIAL[HPKFPI_INITARY_NUMBER][HPKFPI_INITARY_STRLEN];
} HPKFPI_INITPARAM;
My current code:
from ctypes import *

class HPKFPI_INITPARAM(Structure):
    # Define the structure object
    _fields_ = [("sizeof_parameter", c_uint32), ("devicecount", c_uint32),
                ('buf_VID', (c_char*127)*32), ('buf_PID', (c_char*127)*32),
                ('buf_SERIAL', (c_char*127)*32)]

class Hamamatsu_C15713:
    def __init__(self, dll_path):
        dll_filename = os.path.join(dll_path, 'HPKFPI_x64.dll')
        self._dll = windll.LoadLibrary(dll_filename)
        logging.debug(f'dll loaded')
        # Initialize the dll
        self._dll.hpkfpi_init()
        # Try to get the device count
        deviceCount = HPKFPI_INITPARAM()
        self._dll.hpkfpi_getdevcount(byref(deviceCount))
This code does not produce errors; however, the fields of deviceCount all remain 0.
The numeric values are 0 and the char arrays are byte strings of value 0.
What I have tried to solve this:
POINTER(deviceCount) instead of byref(deviceCount) returns an error (TypeError: Not Hashable).
Whatever object I make as deviceCount, it is not changed by the function.
Can anyone give me advice?
With the help of a colleague, we managed to make progress on this problem.
Important point: there is an error in the documentation of the Hamamatsu C15713 MEMS-FPI spectrometer dll: the dimensions of initparam are wrong. 127 should be 255.
import os
import logging
from ctypes import *
from ctypes.wintypes import HANDLE

HPKFPI_INITARY_NUMBER = 255
HPKFPI_INITARY_STRLEN = 32

class HPKFPI_INITPARAM(Structure):
    _fields_ = [
        ("sizeof_parameter", c_int32),
        ("devicecount", c_int32),
        ('buf_VID', (c_char * HPKFPI_INITARY_STRLEN) * HPKFPI_INITARY_NUMBER),
        ('buf_PID', (c_char * HPKFPI_INITARY_STRLEN) * HPKFPI_INITARY_NUMBER),
        ('buf_SERIAL', (c_char * HPKFPI_INITARY_STRLEN) * HPKFPI_INITARY_NUMBER)
    ]

class HPKFPI_OPENPARAM(Structure):
    _fields_ = [
        ("sizeof_parameter", c_int32),
        ("id", c_int32),
        ("hfpi", HANDLE)
    ]

class Hamamatsu_C15713:
    """
    This class is intended to be a wrapper around the Hamamatsu C15713 dll
    """
    def __init__(self, dll_path):
        logging.debug("Loading dll %s", dll_path)
        # add dll dir to os path env var so dependent dlls can be found
        os.environ['PATH'] = str(DLL_DIR) + ';' + os.environ['PATH']
        self._dll = windll.LoadLibrary(str(dll_path))
        logging.debug('dll loaded')
        # Initialize the dll
        ret = self._dll.hpkfpi_init()
        logging.debug("init ret: %s", ret)
        if ret < 0:
            raise SystemError(f"hpkfpi_init failed, ret code: {ret}")
        # Try to get the device count
        init_param = HPKFPI_INITPARAM()
        init_param.sizeof_parameter = sizeof(init_param)
        logging.debug("devcount sizeof: %s", init_param.sizeof_parameter)
        ret = self._dll.hpkfpi_getdevcount(byref(init_param))
        logging.debug("hpkfpi_getdevcount ret: %s", ret)
        if ret < 0:
            raise SystemError(f"hpkfpi_getdevcount failed, ret code: {ret}")
        logging.debug(
            "Extracted devicecount: %d, VID %s, PID %s, SERIAL %s", init_param.devicecount,
            bytes(init_param.buf_VID[0]).decode(),
            bytes(init_param.buf_PID[0]).decode(),
            bytes(init_param.buf_SERIAL[0]).decode())
        open_param = HPKFPI_OPENPARAM()
        open_param.id = 0  # device 0
        open_param.sizeof_parameter = sizeof(open_param)
        ret = self._dll.hpkfpi_open(byref(open_param))
        logging.debug("hpkfpi_open ret: %s", ret)
        if ret < 0:
            raise SystemError(f"hpkfpi_open failed, ret code: {ret}")
        self.dev_handle = open_param.hfpi
        logging.debug("Open handle: %d", self.dev_handle)

Numba Invalid use of BoundFunction on np.astype

I'm trying to compile a function that does some computation on an image patch using numba. Here is part of the code:
@jit(nopython=True, parallel=True)
def value_at_patch(img, coords, imgsize, patch_radius):
    x_center = coords[0]; y_center = coords[1]
    r = patch_radius
    s = 2*r + 1
    xvec = np.arange(x_center-r, x_center+r+1)
    xvec[xvec <= 0] = 0  # prevent negative index
    xvec = xvec.astype(int)
    yvec = np.arange(y_center-r, y_center+r+1)
    yvec[yvec <= 0] = 0
    yvec = yvec.astype(int)
    A = np.zeros((s, s))
    # do some parallel computation on A
    p = np.any(A)
    return p
I'm able to compile the function, but when I run it, I get the following error message:
Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of BoundFunction(array.astype for array(float64, 1d, C)) with parameters (Function(<class 'int'>))
* parameterized
[1] During: resolving callee type: BoundFunction(array.astype for array(float64, 1d, C))
[2] During: typing of call at <ipython-input-17-90e27ac302a8> (42)
File "<ipython-input-17-90e27ac302a8>", line 42:
def value_at_patch(img, coords, imgsize, patch_radius):
<source elided>
xvec[xvec <= 0] = 0 #prevent negative index
xvec = xvec.astype(int)
^
I checked the Numba documentation, and astype should be supported with just one argument. Do you know what could be causing the problem?
Use np.int64 in place of the built-in int in the following places:
xvec = xvec.astype(np.int64)
yvec = yvec.astype(np.int64)
The error message shows that Numba's nopython frontend cannot type astype when it is given the Python int class (Function(<class 'int'>)); passing an explicit NumPy dtype such as np.int64 resolves it.
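For context, here is the corrected function in full (a minimal sketch; the body is unchanged from the question apart from the two astype calls):

from numba import jit
import numpy as np

@jit(nopython=True, parallel=True)
def value_at_patch(img, coords, imgsize, patch_radius):
    x_center = coords[0]; y_center = coords[1]
    r = patch_radius
    s = 2*r + 1
    xvec = np.arange(x_center-r, x_center+r+1)
    xvec[xvec <= 0] = 0  # prevent negative index
    xvec = xvec.astype(np.int64)  # np.int64 instead of the Python builtin int
    yvec = np.arange(y_center-r, y_center+r+1)
    yvec[yvec <= 0] = 0
    yvec = yvec.astype(np.int64)
    A = np.zeros((s, s))
    # do some parallel computation on A
    p = np.any(A)
    return p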

Parallel QuickSort, can someone help me?

I am trying to parallelize quicksort by writing a kernel that separates the list into two sublists relative to the pivot. I am having problems with the syntax, and with saving the sizes of the two new lists at the end of the kernel. How do I fix the syntax errors and retrieve the list sizes at the end of the kernel?
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda import gpuarray, compiler
from pycuda.compiler import SourceModule
import time
import numpy as np

def quickSort_paralleloGlobal(listElements: list) -> list:
    if len(listElements) <= 1:
        return listElements
    else:
        pivo = listElements.pop()
        list1 = []
        list2 = []
        kernel_code_template = """
        __global__ void separateQuick(int *listElements, int *list1, int *list2, int pivo)
        {
            int index1 = 0, index2 = 0;
            int index = blockIdx.x * blockDim.x + threadIdx.x;
            int stride = blockDim.x * gridDim.x;
            for (int i = index; i < %(ARRAY_SIZE)s; i += stride)
                if (lista[i] < pivo
                {
                    list1[index2] = listElements[i];
                    index1++;
                }
                else
                {
                    list2[index2] = listElements[i];
                    index2++;
                }
        }
        """
        SIZE = len(listElements)
        listElements = np.asarray(listElements)
        listElements = listElements.astype(np.int)
        lista_gpu = cuda.mem_alloc(listElements.nbytes)
        cuda.memcpy_htod(lista_gpu, listElements)
        list1_gpu = cuda.mem_alloc(listElements.nbytes)
        list2_gpu = cuda.mem_alloc(listElements.nbytes)
        BLOCK_SIZE = 256
        NUM_BLOCKS = (SIZE + BLOCK_SIZE - 1) // BLOCK_SIZE
        kernel_code = kernel_code_template % {
            'ARRAY_SIZE': SIZE
        }
        mod = compiler.SourceModule(kernel_code)
        arraysQuick = mod.get_function("separateQuick")
        arraysQuick(lista_gpu, list1_gpu, list2_gpu, pivo, block=(BLOCK_SIZE, 1, 1), grid=(NUM_BLOCKS, 1))
        list1 = list1_gpu.get()
        list2 = list2_gpu.get()
        np.allclose(list1, list1_gpu.get())
        np.allclose(list2, list2_gpu.get())
        return quickSort_paralleloGlobal(list1) + [pivo] + quickSort_paralleloGlobal(list2)
Here is the runtime error:
Traceback (most recent call last):
  File "C:/Users/mateu/Documents/GitHub/ppc_Sorting_and_Merging/quickSort.py", line 104, in <module>
    print(quickSort_paraleloGlobal([1, 5, 4, 2, 0]))
  File "C:/Users/mateu/Documents/GitHub/ppc_Sorting_and_Merging/quickSort.py", line 60, in quickSort_paraleloGlobal
    mod = compiler.SourceModule(kernel_code)
  File "C:\Users\mateu\Documents\GitHub\ppc_Sorting_and_Merging\venv\lib\site-packages\pycuda\compiler.py", line 291, in __init__
    arch, code, cache_dir, include_dirs)
  File "C:\Users\mateu\Documents\GitHub\ppc_Sorting_and_Merging\venv\lib\site-packages\pycuda\compiler.py", line 254, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "C:\Users\mateu\Documents\GitHub\ppc_Sorting_and_Merging\venv\lib\site-packages\pycuda\compiler.py", line 137, in compile_plain
    stderr=stderr.decode("utf-8", "replace"))
pycuda.driver.CompileError: nvcc compilation of C:\Users\mateu\AppData\Local\Temp\tmpefxgkfkk\kernel.cu failed
[command: nvcc --cubin -arch sm_61 -m64 -Ic:\users\mateu\documents\github\ppc_sorting_and_merging\venv\lib\site-packages\pycuda\cuda kernel.cu]
[stdout:
kernel.cu
]
[stderr:
kernel.cu(10): error: expected a ")"
kernel.cu(19): warning: parsing restarts here after previous syntax error
kernel.cu(19): error: expected a statement
kernel.cu(5): warning: variable "indexMenor" was declared but never referenced
kernel.cu(5): warning: variable "indexMaior" was declared but never referenced
2 errors detected in the compilation of "C:/Users/mateu/AppData/Local/Temp/tmpxft_00004260_00000000-10_kernel.cpp1.ii".
]
Process finished with exit code 1
There are a number of problems with your code; I don't think I will be able to list them all. However, one of the central problems is that you have attempted a naive conversion of a serial quicksort into a thread-parallel quicksort, and such a simple conversion is not possible.
Allowing threads to work in a parallel fashion, while dividing up an input list into one of two separate output lists, requires a number of changes to your kernel code.
However, we can address most of the other issues by limiting your kernel launches to one thread each.
With that idea, the following code appears to sort the given input correctly:
$ cat t18.py
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda import gpuarray, compiler
from pycuda.compiler import SourceModule
import time
import numpy as np

def quickSort_paralleloGlobal(listElements):
    if len(listElements) <= 1:
        return listElements
    else:
        pivo = listElements.pop()
        pivo = np.int32(pivo)
        kernel_code_template = """
        __global__ void separateQuick(int *listElements, int *list1, int *list2, int *l1_size, int *l2_size, int pivo)
        {
            int index1 = 0, index2 = 0;
            int index = blockIdx.x * blockDim.x + threadIdx.x;
            int stride = blockDim.x * gridDim.x;
            for (int i = index; i < %(ARRAY_SIZE)s; i += stride)
                if (listElements[i] < pivo)
                {
                    list1[index1] = listElements[i];
                    index1++;
                }
                else
                {
                    list2[index2] = listElements[i];
                    index2++;
                }
            *l1_size = index1;
            *l2_size = index2;
        }
        """
        SIZE = len(listElements)
        listElements = np.asarray(listElements)
        listElements = listElements.astype(np.int32)
        lista_gpu = cuda.mem_alloc(listElements.nbytes)
        cuda.memcpy_htod(lista_gpu, listElements)
        list1_gpu = cuda.mem_alloc(listElements.nbytes)
        list2_gpu = cuda.mem_alloc(listElements.nbytes)
        l1_size = cuda.mem_alloc(4)
        l2_size = cuda.mem_alloc(4)
        BLOCK_SIZE = 1
        NUM_BLOCKS = 1
        kernel_code = kernel_code_template % {
            'ARRAY_SIZE': SIZE
        }
        mod = compiler.SourceModule(kernel_code)
        arraysQuick = mod.get_function("separateQuick")
        arraysQuick(lista_gpu, list1_gpu, list2_gpu, l1_size, l2_size, pivo, block=(BLOCK_SIZE, 1, 1), grid=(NUM_BLOCKS, 1))
        l1_sh = np.zeros(1, dtype=np.int32)
        l2_sh = np.zeros(1, dtype=np.int32)
        cuda.memcpy_dtoh(l1_sh, l1_size)
        cuda.memcpy_dtoh(l2_sh, l2_size)
        list1 = np.zeros(l1_sh, dtype=np.int32)
        list2 = np.zeros(l2_sh, dtype=np.int32)
        cuda.memcpy_dtoh(list1, list1_gpu)
        cuda.memcpy_dtoh(list2, list2_gpu)
        list1 = list1.tolist()
        list2 = list2.tolist()
        return quickSort_paralleloGlobal(list1) + [pivo] + quickSort_paralleloGlobal(list2)

print(quickSort_paralleloGlobal([1, 5, 4, 2, 0]))
$ python t18.py
[0, 1, 2, 4, 5]
$
The next step in the porting process would be to convert your naive serial kernel to one that could operate in a thread-parallel fashion. One relatively simple approach would be to use atomics to manage all output data (both lists, as well as updates to the sizes of each list).
Here is one possible approach:
$ cat t18.py
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda import gpuarray, compiler
from pycuda.compiler import SourceModule
import time
import numpy as np

def quickSort_paralleloGlobal(listElements):
    if len(listElements) <= 1:
        return listElements
    else:
        pivo = listElements.pop()
        pivo = np.int32(pivo)
        kernel_code_template = """
        __global__ void separateQuick(int *listElements, int *list1, int *list2, int *l1_size, int *l2_size, int pivo)
        {
            int index = blockIdx.x * blockDim.x + threadIdx.x;
            int stride = blockDim.x * gridDim.x;
            for (int i = index; i < %(ARRAY_SIZE)s; i += stride)
                if (listElements[i] < pivo)
                {
                    list1[atomicAdd(l1_size, 1)] = listElements[i];
                }
                else
                {
                    list2[atomicAdd(l2_size, 1)] = listElements[i];
                }
        }
        """
        SIZE = len(listElements)
        listElements = np.asarray(listElements)
        listElements = listElements.astype(np.int32)
        lista_gpu = cuda.mem_alloc(listElements.nbytes)
        cuda.memcpy_htod(lista_gpu, listElements)
        list1_gpu = cuda.mem_alloc(listElements.nbytes)
        list2_gpu = cuda.mem_alloc(listElements.nbytes)
        l1_size = cuda.mem_alloc(4)
        l2_size = cuda.mem_alloc(4)
        BLOCK_SIZE = 256
        NUM_BLOCKS = (SIZE + BLOCK_SIZE - 1) // BLOCK_SIZE
        kernel_code = kernel_code_template % {
            'ARRAY_SIZE': SIZE
        }
        mod = compiler.SourceModule(kernel_code)
        arraysQuick = mod.get_function("separateQuick")
        l1_sh = np.zeros(1, dtype=np.int32)
        l2_sh = np.zeros(1, dtype=np.int32)
        cuda.memcpy_htod(l1_size, l1_sh)
        cuda.memcpy_htod(l2_size, l2_sh)
        arraysQuick(lista_gpu, list1_gpu, list2_gpu, l1_size, l2_size, pivo, block=(BLOCK_SIZE, 1, 1), grid=(NUM_BLOCKS, 1))
        cuda.memcpy_dtoh(l1_sh, l1_size)
        cuda.memcpy_dtoh(l2_sh, l2_size)
        list1 = np.zeros(l1_sh, dtype=np.int32)
        list2 = np.zeros(l2_sh, dtype=np.int32)
        cuda.memcpy_dtoh(list1, list1_gpu)
        cuda.memcpy_dtoh(list2, list2_gpu)
        list1 = list1.tolist()
        list2 = list2.tolist()
        return quickSort_paralleloGlobal(list1) + [pivo] + quickSort_paralleloGlobal(list2)

print(quickSort_paralleloGlobal([1, 5, 4, 2, 0]))
$ python t18.py
[0, 1, 2, 4, 5]
$
I'm not suggesting that the above examples are perfect or defect-free. Also, I have not identified each and every change I made to your code; I suggest you study the differences between these examples and your posted code.
I should also mention that this isn't a fast or efficient way to sort numbers on the GPU; I assume this is for a learning exercise. If you're interested in fast parallel sorting, you are encouraged to use a library implementation. If you want to do this from Python, one possible implementation is provided by CuPy.
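For illustration, a minimal CuPy sketch (my addition, not part of the original answer; assumes CuPy is installed against a matching CUDA toolkit):

import cupy as cp

a = cp.asarray([1, 5, 4, 2, 0], dtype=cp.int32)  # host list -> device array
a_sorted = cp.sort(a)                            # sort runs on the GPU
print(cp.asnumpy(a_sorted))                      # [0 1 2 4 5]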

PyOpenCL how to modify a matrix locally within the kernel function

I am trying to modify a matrix (Pbis) locally within a PyOpenCL kernel function, and filling this matrix with 0 alters the result matrix R: when executing this code we get weird values in the R matrix. It is probably due to memory allocation, but we cannot figure out how to fix it. Normally R should be composed exclusively of the init value.
program = cl.Program(context, """
__kernel void generate_paths(__global float *P, ushort const n,
                             ushort N, ushort init, __global float *R){
    int i = get_global_id(0);
    __private float* Pbis;
    for (int k=0; k<n; k++){
        Pbis[k] = 0;
    }
    for (int j=0; j<n; j++)
    {
        R[i*(n+1) + j] = init;
    }
    R[i*(n+1) + n] = init;
}
""").build()
The parameters for the generation are:
program.generate_paths(queue, res_np.shape, None, P_buf, np.uint16(n), np.uint16(N), np.uint16(init), res_buf)
Here is the entire code for reproducibility:
import numpy as np
import pyopencl as cl
import numpy.linalg as la
import os

os.environ['PYOPENCL_COMPILER_OUTPUT'] = '1'
os.environ['PYOPENCL_CTX'] = '1'

(n, N) = (3, 6)
U = np.random.uniform(0, 1, size=(n+1)*N)
U = U.astype(np.float32)
P = np.matrix([[0, 1/3, 1/3, 1/3], [1/3, 0, 1/3, 1/3], [1/3, 1/3, 0, 1/3], [1/3, 1/3, 1/3, 0]])
P = P.astype(np.float32)
res_np = np.zeros((N, n+1), dtype=np.float32)

platform = cl.get_platforms()[0]
device = platform.get_devices()[0]
context = cl.Context([device])
queue = cl.CommandQueue(context)
mf = cl.mem_flags
U_buf = cl.Buffer(context, mf.COPY_HOST_PTR | mf.COPY_HOST_PTR, hostbuf=U)
P_buf = cl.Buffer(context, mf.COPY_HOST_PTR | mf.COPY_HOST_PTR, hostbuf=P)
res_buf = cl.Buffer(context, mf.WRITE_ONLY, res_np.nbytes)
init = 0

program = cl.Program(context, """
__kernel void generate_paths(__global const float *U, __global float *P, ushort const n,
                             ushort N, ushort init, __global float *R){
    int i = get_global_id(0);
    int current = init;
    __private float* Pbis;
    for (int k=0; k<n; k++){
        Pbis[k] = 0;
    }
    for (int j=0; j<n; j++)
    {
        R[i*(n+1) + j] = current;
    }
    R[i*(n+1) + n] = init;
}
""").build()

#prg.multiply(queue, c.shape, None,
#             np.uint16(n), np.uint16(m), np.uint16(p),
#             a_buf, b_buf, c_buf)
# a_mul_b = np.empty_like(c)
# cl.enqueue_copy(queue, a_mul_b, c_buf)

program.generate_paths(queue, res_np.shape, None, U_buf, P_buf, np.uint16(n), np.uint16(N), np.uint16(init), res_buf)
chem_gen = np.empty_like(res_np)
cl.enqueue_copy(queue, chem_gen, res_buf)

print("Platform Selected = %s" % platform.name)
print("Device Selected = %s" % device.name)
print("Generated Paths:")
print(chem_gen)
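No answer is attached above, but note that __private float* Pbis; declares an uninitialized pointer, so Pbis[k] = 0 writes through a garbage address (undefined behavior, which would explain the corrupted R). A minimal sketch of one common workaround (my assumption, not from the thread): give the private array a compile-time size, substituted from Python, and keep n below that bound:

MAX_N = 16  # hypothetical upper bound on n
kernel_src = """
#define MAX_N %d
__kernel void generate_paths(__global float *P, ushort const n,
                             ushort N, ushort init, __global float *R){
    int i = get_global_id(0);
    float Pbis[MAX_N];        // private array with a fixed, compile-time size
    for (int k=0; k<n; k++){  // caller must ensure n <= MAX_N
        Pbis[k] = 0.0f;
    }
    for (int j=0; j<n; j++){
        R[i*(n+1) + j] = init;
    }
    R[i*(n+1) + n] = init;
}
""" % MAX_N
program = cl.Program(context, kernel_src).build()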

Call one Rcpp function inside another passing through DataFrame

I've been converting some R code to Rcpp functions recently. I'm simulating people parking their cars in parking lots. I have a function which picks which lot a person will park in based on what gate they enter through and the fullness of the parking lots.
#include <Rcpp.h>
#include <numeric>
#include <chrono>
#include <random>   // needed for std::mt19937 / std::uniform_real_distribution
// [[Rcpp::plugins(cpp11)]]
using namespace Rcpp;

// [[Rcpp::export]]
std::string pickLotcpp(std::string gate, DataFrame dist, DataFrame curr, NumericVector maxDist = 0.005) {
    std::vector<std::string> gates = Rcpp::as<std::vector<std::string> >(dist["inGate"]);
    std::vector<std::string> lots = Rcpp::as<std::vector<std::string> >(dist["lot"]);
    NumericVector d = dist["dist"];
    std::vector<std::string> currLots = Rcpp::as<std::vector<std::string> >(curr["Lot"]);
    NumericVector cap = curr["Cap"];
    NumericVector util = curr["Util"];
    NumericVector percFree = (cap - util)/cap;
    std::vector<std::string> relLot;
    NumericVector relD;
    int n = gates.size();
    for(int i = 0; i < n; i++){
        if(gates[i] == gate){
            if(d[i] <= maxDist[0]){
                relLot.push_back(lots[i]);
                relD.push_back(pow(d[i], -2));
            }
        }
    }
    n = relLot.size();
    int n2 = currLots.size();
    NumericVector relPerc;
    for(int i = 0; i < n; i++){
        for(int j = 0; j < n2; j++){
            if(relLot[i] == currLots[j]){
                relPerc.push_back(percFree[j]);
            }
        }
    }
    relD = relD * relPerc;
    NumericVector csV(relD.size());
    std::partial_sum(relD.begin(), relD.end(), csV.begin());
    NumericVector::iterator mv;
    mv = std::max_element(csV.begin(), csV.end());
    double maxV = *mv;
    unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
    std::mt19937 gen(seed);
    std::uniform_real_distribution<> dis(0, maxV);
    double rv = dis(gen);
    int done = 0;
    int i = 0;
    std::string selGate;
    while(done < 1){
        if(csV[i] >= rv){
            selGate = relLot[i];
            done = 1;
        }
        i++;
    }
    return selGate;
}
which works great in R:
fakeDist = structure(list(inGate = c("A", "A", "B", "B"), lot = c("Y", "Z", "Y", "Z"), dist = c(0.001, 0.003, 0.003, 0.001)), .Names = c("inGate", "lot", "dist"), row.names = c(NA, 4L), class = c("tbl_df", "tbl", "data.frame"))
fakeStatus = structure(list(Lot = c("Y", "Z"), Cap = c(100, 100), Util = c(0, 0)), .Names = c("Lot", "Cap", "Util"), row.names = c(NA, 2L), class = c("tbl_df", "tbl", "data.frame"))
pickLotcpp("A",fakeDist,fakeStatus)
#> [1] "Y"
Now I'm trying to write the function which will loop through all the gate activity and park people sequentially. So I have this Rcpp function:
// [[Rcpp::export]]
List test(DataFrame records, DataFrame currentLoc, DataFrame dist,
          DataFrame currentStatus, NumericVector times){
    List out(times.size());
    NumericVector recID = records["ID"];
    NumericVector recTime = records["Time"];
    NumericVector recDir = records["Dir"];
    std::vector<std::string> Gate = Rcpp::as<std::vector<std::string> >(records["Gate"]);
    NumericVector currState = currentLoc["State"];
    std::vector<std::string> currLot = Rcpp::as<std::vector<std::string> >(currentLoc["Lot"]);
    out[0] = pickLotcpp(Gate[0], dist, currentStatus);
    return out;
}
This is in the same file, below pickLotcpp. It compiles fine, but when called it causes R to crash.
fakeData = structure(list(ID = c(1, 2, 3), Time = c(1, 2, 3), Dir = c(1, 1, 1), Gate = c("A", "A", "B")), .Names = c("ID", "Time", "Dir", "Gate"), row.names = c(NA, 3L), class = c("tbl_df", "tbl", "data.frame"))
fakeLoc = structure(list(ID = c(1, 2, 3), State = c(0, 0, 0), Lot = c("", "", "")), .Names = c("ID", "State", "Lot"), row.names = c(NA, 3L), class = c("tbl_df", "tbl", "data.frame"))
a = test(fakeData, fakeLoc, fakeDist, fakeStatus, 10)
I've written other Rcpp code where functions call functions, and they work fine. The only thing I can think of is that I'm passing a DataFrame that was an input directly to the other function, but I can't find anything that says I can't. I'm not an expert C++ programmer; I just started hacking in it a couple of weeks ago, and this has me stumped.
How can I call pickLotcpp from test, passing along the needed distance and status data frames?
It seems not related to Rcpp. Can you double-check the line
selGate = relLot[i];
relLot might be empty. If no row of dist matches the gate within maxDist, then relLot and csV are empty, the while loop never finds csV[i] >= rv, and i runs past the end of the vectors, which is undefined behavior and can crash R.
