"JNI ERROR (app bug): weak global reference table overflow" why? - android-ndk

I keep getting this error, supposedly due to a leak in my native code, according to this thread:
ReferenceTable overflow (max=512) JNI
Yet it seems to me that AttachCurrentThread itself is what leaks. I tried this code and it leaks:
// this code LEAKS!
// C++:
void Engine::UpdateCamera(float x, float y, float z) {
    JNIEnv *jni;
    app_->activity->vm->AttachCurrentThread(&jni, NULL);
    // Do nothing
    app_->activity->vm->DetachCurrentThread();
    return;
}

// Java:
public void updateCamera(final float x, final float y, final float z) {
    if (_label2 == null)
        return;
    StackTraceElement trace = new Exception().getStackTrace()[0];
    Log.e(APP_TAG, "Called:" +
          trace.getClassName() + "->" + trace.getMethodName() + ":" + trace.getLineNumber());
}
Then I simply commented out everything and the program stopped leaking and ran forever :(
// this code never leaks, but it does not do anything either
void Engine::UpdateCamera(float x, float y, float z) {
    JNIEnv *jni;
    //app_->activity->vm->AttachCurrentThread(&jni, NULL);
    //app_->activity->vm->DetachCurrentThread();
    return;
}
Has anyone experienced leaking issues with AttachCurrentThread?
thank you.

Are you connected to the debugger? If so, disconnect and you may find the weak global reference table returns to a reasonable size. While a debug session is attached, the debugging machinery holds weak global references to the objects it has seen, so the table keeps growing.
I had this same problem; it only occurs when I run with the debugger attached.

The attachTestMemoryLeak() function below has a native memory leak, and I still have not figured out the reason, but I did find another way to avoid it; see attachTestOK() below.
//c++ code
void attachTestMemoryLeak() {
    for (int i = 0; i < 100000; i++) {
        JNIEnv *env = nullptr;
        // native thread tries to attach to the Java environment
        int getEnvStat = g_VM->GetEnv((void **)&env, JNI_VERSION_1_4);
        if (getEnvStat == JNI_EDETACHED) {
            jint attachStat = g_VM->AttachCurrentThread(&env, NULL);
            if (attachStat == JNI_OK) {
                LOG_E("index=%d, attach ok", i);
            } else {
                LOG_E("index=%d, attach failed", i);
            }
        }
        // do something, call Java functions...
        // detach the native thread from the Java environment
        jint detachStat = g_VM->DetachCurrentThread();
        if (detachStat == JNI_OK) {
            LOG_E("detach ok, index=%d, detachStat=%d", i, detachStat);
        } else {
            LOG_E("detach failed, index=%d, detachStat=%d", i, detachStat);
        }
        env = NULL;
    }
}
The function below works fine; https://www.jianshu.com/p/1f17ab192940 gives the explanation: the thread is attached only once, and DetachCurrentThread is deferred to a pthread key destructor that runs when the native thread exits, instead of being called on every iteration.
static pthread_key_t detachKey = 0;

void detachKeyDestructor(void* arg)
{
    // Runs automatically when the native thread exits.
    pthread_t thd = pthread_self();
    JavaVM* jvm = (JavaVM*)arg;
    LOG_E("detach thread, thd=%u", thd);
    jvm->DetachCurrentThread();
}

void attachTestOK() {
    for (int i = 0; i < 1000000; i++)
    {
        JNIEnv *env = nullptr;
        int getEnvStat = g_VM->GetEnv((void **)&env, JNI_VERSION_1_4);
        if (getEnvStat == JNI_EDETACHED) {
            if (detachKey == 0) {
                LOG_E("index=%d, create thread key", i);
                pthread_key_create(&detachKey, detachKeyDestructor);
            }
            jint attachStat = g_VM->AttachCurrentThread(&env, NULL);
            // store the JavaVM so the key destructor can detach this thread later
            pthread_setspecific(detachKey, g_VM);
            if (attachStat == JNI_OK) {
                LOG_E("index=%d, attach ok", i);
            } else {
                LOG_E("index=%d, attach failed", i);
            }
        }
        LOG_E("index=%d, getEnvStat=%d", i, getEnvStat);
        // do something, call Java functions...
        // note: no DetachCurrentThread here; the key destructor handles it on thread exit
        env = NULL;
    }
}
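The same idea is often wrapped in a small helper that hands out a JNIEnv* for the calling thread and attaches at most once per thread. A minimal sketch of such a helper, assuming g_VM is the JavaVM* saved in JNI_OnLoad (the helper name and structure are illustrative, not from the original post):

#include <jni.h>
#include <pthread.h>

extern JavaVM* g_VM;                       // assumed to be stored in JNI_OnLoad

static pthread_key_t s_envKey;
static pthread_once_t s_envKeyOnce = PTHREAD_ONCE_INIT;

static void detachThread(void* vm)
{
    // Runs when the native thread exits.
    ((JavaVM*)vm)->DetachCurrentThread();
}

static void makeEnvKey()
{
    pthread_key_create(&s_envKey, detachThread);
}

// Returns a JNIEnv* valid on the calling thread, or nullptr on failure.
JNIEnv* getJniEnv()
{
    pthread_once(&s_envKeyOnce, makeEnvKey);
    JNIEnv* env = nullptr;
    if (g_VM->GetEnv((void**)&env, JNI_VERSION_1_4) == JNI_EDETACHED) {
        if (g_VM->AttachCurrentThread(&env, NULL) != JNI_OK)
            return nullptr;
        // Remember that this thread must be detached when it exits.
        pthread_setspecific(s_envKey, g_VM);
    }
    return env;
}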

Related

'PTX JIT compilation failed' from cuModuleLoadData

Below is the code:
#define FILENAME "kernel.code"
#define kernel_name "hello_world"
#define THREADS 4

std::vector<char> load_file()
{
    std::ifstream file(FILENAME, std::ios::binary | std::ios::ate);
    std::streamsize fsize = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> buffer(fsize);
    if (!file.read(buffer.data(), fsize)) {
        failed("could not open code object '%s'\n", FILENAME);
    }
    return buffer;
}

struct joinable_thread : std::thread
{
    template <class... Xs>
    joinable_thread(Xs&&... xs) : std::thread(std::forward<Xs>(xs)...) // NOLINT
    {
    }
    joinable_thread& operator=(joinable_thread&& other) = default;
    joinable_thread(joinable_thread&& other) = default;
    ~joinable_thread()
    {
        if (this->joinable())
            this->join();
    }
};

void run(const std::vector<char>& buffer) {
    CUdevice device;
    CUDACHECK(cuDeviceGet(&device, 0));
    CUcontext context;
    CUDACHECK(cuCtxCreate(&context, 0, device));
    CUmodule Module;
    CUDACHECK(cuModuleLoadData(&Module, &buffer[0]));
    ...
}

void run_multi_threads(uint32_t n) {
    {
        auto buffer = load_file();
        std::vector<joinable_thread> threads;
        for (uint32_t i = 0; i < n; i++) {
            threads.emplace_back(std::thread{[&, i, buffer] {
                run(buffer);
            }});
        }
    }
}

int main() {
    CUDACHECK(cuInit(0));
    run_multi_threads(THREADS);
}
And the kernel.cu code used to generate the PTX is as follows:
#include "cuda_runtime.h"

extern "C" __global__ void hello_world(float* a, float* b) {
    int tx = threadIdx.x;
    b[tx] = a[tx];
}
I am generating the PTX like this:
nvcc --ptx kernel.cu -o kernel.code
I am using a machine with a GeForce GTX TITAN X.
I am getting the "PTX JIT compilation failed" error from cuModuleLoadData only when I try to use this with multiple threads. If I remove the multi-threading part and run it normally, the error does not occur.
Can anyone please tell me what is going wrong and how to overcome it?
As mentioned in the comments, I was able to get it to work by moving the load_file() call into main, so that the buffer read from the file stays valid, and then passing only that buffer to all the threads.
In the original code, the buffer is destroyed once it leaves the '{...}' scope, so by the time a thread starts running it may be reading an invalid buffer.
If you put the buffer in main, it is not destroyed or freed until the program exits.
So yes, the failure happens because an invalid buffer (which may have already been freed) is passed to the CUDA driver code.
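A minimal sketch of that fix, reusing the names from the question (the kernel launch inside run() is still elided, and error handling is unchanged):

int main() {
    CUDACHECK(cuInit(0));
    // Load the code object once, up front, so the buffer outlives every
    // thread that uses it.
    std::vector<char> buffer = load_file();
    {
        std::vector<joinable_thread> threads;
        for (uint32_t i = 0; i < THREADS; i++)
            threads.emplace_back([&buffer] { run(buffer); });
    }   // joinable_thread destructors join here, while buffer is still alive
    return 0;
}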

c++11 update pthread based code to std::thread or boost::thread

I have the following code that I would like to update to be more portable and C++11 friendly. However, I'm stuck on how to replace the pthread calls. I can use std::this_thread::get_id() to get the thread id, but I can't tell whether that thread is still alive.
pthread_t activeThread = 0;

pthread_t getCurrentThread() {
    return pthread_self();
}

bool isActiveThreadAlive() {
    if (activeThread == 0) {
        return false;
    }
    return pthread_kill(activeThread, 0) != ESRCH;
}
Potential std::thread version...
std::thread::id activeThread = std::thread::id();

std::thread::id getCurrentThread() {
    return std::this_thread::get_id();
}

bool isActiveThreadAlive() {
    if (activeThread == std::thread::id()) {
        return false;
    }
    return pthread_kill(activeThread, 0) != ESRCH; // <--- need replacement!!!
}
What the code really needs to do is know if the thread has died from an exception or some other error that caused it to terminate without releasing the object. As in the following...
std::unique_lock<std::mutex> uLock = getLock();
while (activeThread != 0) {
    if (threadWait.wait_for(uLock, std::chrono::seconds(30)) == std::cv_status::timeout) {
        if (!isActiveThreadAlive()) {
            activeThread = 0;
        }
    }
}
activeThread = getCurrentThread();
uLock.unlock();
try {
    // do stuff here.
}
catch (const std::exception&) {
}
uLock.lock();
activeThread = 0;
And before anyone asks I do not have a guarantee of control over when, where or how the threads are created. The threads that call the functions may be from anywhere.
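Since std::thread offers no portable way to ask whether an arbitrary thread is still alive, one option is to make the worker itself clear the activeThread marker when it exits, including an exit caused by an exception, by using a scope guard. Then isActiveThreadAlive() becomes a plain comparison. A minimal sketch of that idea, with illustrative names (it assumes the surrounding lock and condition variable shown above):

#include <mutex>
#include <thread>

std::mutex stateMutex;                      // guards activeThread
std::thread::id activeThread;               // default id means "no active thread"

// RAII guard: registers the calling thread on construction and always
// unregisters it on destruction, even if the guarded work throws.
struct ActiveThreadGuard {
    ActiveThreadGuard() {
        std::lock_guard<std::mutex> lock(stateMutex);
        activeThread = std::this_thread::get_id();
    }
    ~ActiveThreadGuard() {
        std::lock_guard<std::mutex> lock(stateMutex);
        activeThread = std::thread::id();   // mark "not alive"
        // a condition_variable could be notified here as well
    }
};

bool isActiveThreadAlive() {
    std::lock_guard<std::mutex> lock(stateMutex);
    return activeThread != std::thread::id();
}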

Try to compare 2 methods to implement bounded blocking queue

The bounded blocking queue is a classic, of course. There are mainly two ways to implement it, and I am trying to understand which one is better:
Method 1: use counting semaphore
void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        sem_wait(&empty);
        sem_wait(&mutex);
        put(i);
        sem_post(&mutex);
        sem_post(&full);
    }
}

void *consumer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        sem_wait(&full);
        sem_wait(&mutex);
        int tmp = get();
        sem_post(&mutex);
        sem_post(&empty);
        printf("%d\n", tmp);
    }
}
Method 2: classic monitor pattern
class BoundedBuffer {
private:
    int buffer[MAX];
    int fill, use;
    int fullEntries;
    pthread_mutex_t monitor; // monitor lock
    pthread_cond_t empty;
    pthread_cond_t full;

public:
    BoundedBuffer() {
        use = fill = fullEntries = 0;
    }

    void produce(int element) {
        pthread_mutex_lock(&monitor);
        while (fullEntries == MAX)
            pthread_cond_wait(&empty, &monitor);
        // do something (store the element)
        pthread_cond_signal(&full);
        pthread_mutex_unlock(&monitor);
    }

    int consume() {
        int tmp = 0;
        pthread_mutex_lock(&monitor);
        while (fullEntries == 0)
            pthread_cond_wait(&full, &monitor);
        // do something (read the element into tmp)
        pthread_cond_signal(&empty);
        pthread_mutex_unlock(&monitor);
        return tmp;
    }
};
I understand the second method can solve a lot of other problems as well, but how do these two methods compare? It looks like they can both do the job.
Is there a link to a detailed comparison?
I appreciate your help. Thanks.
The big difference between these two methods is that the first one does not use pthread-specific functions (semaphores are not part of pthreads), and as such is not guaranteed to work in a multithreaded environment.
In particular, semaphores do not protect memory ordering, so things written in one thread might not be readable in another. Mutexes are better suited to a multi-threaded message queue.
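For comparison, here is a minimal C++11 sketch of the same monitor pattern written with std::mutex and std::condition_variable (a fixed-capacity queue of int; this is an illustration, not code from the original post):

#include <condition_variable>
#include <mutex>
#include <queue>

class BoundedBufferCpp {
    std::queue<int> buffer;
    const std::size_t capacity = 16;
    std::mutex monitor;
    std::condition_variable notFull, notEmpty;

public:
    void produce(int element) {
        std::unique_lock<std::mutex> lock(monitor);
        notFull.wait(lock, [&] { return buffer.size() < capacity; });
        buffer.push(element);
        notEmpty.notify_one();               // wake one waiting consumer
    }

    int consume() {
        std::unique_lock<std::mutex> lock(monitor);
        notEmpty.wait(lock, [&] { return !buffer.empty(); });
        int tmp = buffer.front();
        buffer.pop();
        notFull.notify_one();                // wake one waiting producer
        return tmp;
    }
};

The predicate form of wait() replaces the explicit while loops around pthread_cond_wait, and the mutex plus the two condition variables play the role of the monitor lock and the empty/full conditions.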

CreateThread doesn't work in my code

I wonder how to solve this problem.
I notice that CreateThread() doesn't work well in this code:
DWORD threadFunc1(LPVOID lParam)
{
    int cur = (int)lParam;
    while (1)
    {
        //Job1...
        //Reason
        break;
    }
    Start(cur + 1);
    Sleep(100);
}

void Start(int startNum)
{
    ....
    CreateThread(NULL, NULL, (LPTHREAD_START_ROUTINE)threadFunc1, &startNum, 0, &dwThreadId);
    ...
}

void btnClicking()
{
    Start(0);
}
In this code, a thread is created by Start(), and it calls Start() again when its work ends.
The second created thread does not work. I think the first thread disappears and the second thread is destroyed with it.
What is the best way to solve this?
OS: Windows 7 64-bit Ultimate.
Tool: Visual Studio 2008.
It does not work because your code has bugs in it. The signature of your thread function is wrong, and you are passing the startNum value to the thread in the wrong way: you pass the address of a local variable, which may already be out of scope by the time the new thread reads it.
Try this instead:
DWORD WINAPI threadFunc1(LPVOID lParameter)
{
    int cur = (int) lParameter;
    while (1)
    {
        //Job1...
        //Reason
        break;
    }
    Start(cur + 1);
    Sleep(100);
    return 0; // the thread's exit code
}

void Start(int startNum)
{
    ....
    HANDLE hThread = CreateThread(NULL, 0, &threadFunc1, (LPVOID) startNum, 0, &dwThreadId);
    if (hThread != NULL)
    {
        // ... store it somewhere or close it, otherwise you are leaking it...
    }
    ...
}

void btnClicking()
{
    Start(0);
}
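If more than one value has to be passed, or if the compiler warns about the pointer-size casts on 64-bit builds, a common alternative is to heap-allocate a small parameter block that the new thread then owns and frees. A minimal sketch of that variant (this is an illustration, not part of the original answer):

#include <windows.h>

struct ThreadParams
{
    int startNum;
};

DWORD WINAPI threadFunc1(LPVOID lParameter)
{
    ThreadParams* params = static_cast<ThreadParams*>(lParameter);
    int cur = params->startNum;
    delete params;              // the thread owns the block now
    // ... do the work with cur ...
    return 0;
}

void Start(int startNum)
{
    DWORD dwThreadId = 0;
    ThreadParams* params = new ThreadParams;
    params->startNum = startNum;
    HANDLE hThread = CreateThread(NULL, 0, &threadFunc1, params, 0, &dwThreadId);
    if (hThread == NULL)
        delete params;          // the thread never started, so reclaim the block here
    else
        CloseHandle(hThread);   // or keep the handle if it is needed later
}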

Copying objects in C++/CLI and message passing in multithreading

I'm trying to turn a command-line program I have into a more visual program with a GUI to make it easier to use. The original code was in C++, so I'm using the Visual C++ that comes with Visual Studio Express 2012, but I'm having trouble understanding the "new" managed C++/CLI way of handling objects. Being new to CLI and managed C++, I was wondering if someone could explain what I'm doing wrong and why it doesn't work. Here is a description of the code and the problem.
The program is essentially an optimization program:
There are multiple boxes (modes) in a system; each mode, depending on its type, has a few numerical coefficients that control its behavior and the way it responds to outside excitation.
The program asks the user to specify the number of boxes and the type of each box, then tries to find the numerical coefficients that minimize the difference between the system response and the one obtained experimentally.
So the UI has means for the user to open the experimental result files, specify the number of modes, and specify the type of each mode. The user can then initiate the processing by clicking a start button, which starts a background worker.
Following the example given in MSDN, I created a class that performs the work:
ref class curveFit
{
public:
    ref class CurrentState {
    public:
        int percentage;
        int iterationNo;
        int stage;
        bool done;
        multimode systemModel;
    };

public:
    int modes;
    int returncode;
    array<double>^ expExcitations;
    array<double>^ expResults;
    multimode systemModel;

private:
    void fcn(int, int, double*, double*, int*);
    double totalError(std::vector<double>&);

public:
    delegate void fcndelegate(int, int, double*, double*, int*);

public:
    curveFit(void);
    curveFit^ fit(System::ComponentModel::BackgroundWorker^, System::ComponentModel::DoWorkEventArgs^, Options^);
};
multimode is just a container class: a list of different boxes.
ref class multimode
{
private:
    Collections::Generic::List<genericBoxModel^>^ models;
    int modes;

public:
    multimode(void);
    multimode(const multimode%);
    int modeNo(void);
    void Add(genericBoxModel^);
    void Clear();
    genericBoxModel^ operator[](int);
    multimode% operator=(const multimode%);
    double result(double);
    bool isValid();
    std::vector<double> MapData();
    void MapData(std::vector<double>&);
};
multimode::multimode(void)
{
    models = gcnew Collections::Generic::List<genericBoxModel^>();
    modes = 0;
}

multimode::multimode(const multimode% rhs)
{
    models = gcnew Collections::Generic::List<genericBoxModel^>();
    for (int ind = 0; ind < rhs.modes; ind++)
        models->Add(rhs.models[ind]);
    modes = rhs.modes;
}

int multimode::modeNo(void)
{
    return modes;
}

void multimode::Add(genericBoxModel^ model)
{
    models->Add(model);
    modes++;
}

void multimode::Clear()
{
    models->Clear();
    modes = 0;
}

genericBoxModel^ multimode::operator[](int ind)
{
    return models[ind];
}

multimode% multimode::operator=(const multimode% rhs)
{
    models->Clear();
    for (int ind = 0; ind < rhs.modes; ind++)
        models->Add(rhs.models[ind]);
    modes = rhs.modes;
    return *this;
}

double multimode::result(double excitation)
{
    double temp = 0.0;
    for (int ind = 0; ind < modes; ind++)
        temp += models[ind]->result(excitation);
    return temp;
}

bool multimode::isValid()
{
    bool isvalid = true;
    if (modes < 1)
        return false;
    for (int ind = 0; ind < modes; ind++)
        isvalid = (isvalid && models[ind]->isValid());
    return isvalid;
}

std::vector<double> multimode::MapData()
{
    //Map the model coefficients to a vector of doubles
    ...
}

void multimode::MapData(std::vector<double>& data)
{
    //Map a vector of doubles to the model coefficients
    ...
}
and genericBoxModel is an abstract class that all box models are based on.
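One detail worth keeping in mind about the copy constructor and operator= above: they copy handles, not objects, so two multimode instances built this way end up referring to the same genericBoxModel objects. For example, using the types from this question purely as an illustration:

multimode a;
a.Add(gcnew model0);     // model0 is one of the concrete box models
multimode b(a);          // copies only the handle stored in the list
// a[0] and b[0] now refer to the same model0 instance, so a change made
// through one handle is visible through the other.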
The curveFit::fit function does the optimization based on the options passed to it:
curveFit^ curveFit::fit(System::ComponentModel::BackgroundWorker^ worker, System::ComponentModel::DoWorkEventArgs^ e, Options^ opts)
{
    fcndelegate^ del = gcnew fcndelegate(this, &curveFit::fcn);
    std::vector<double> data;
    CurrentState^ state = gcnew CurrentState;
    state->done = false;
    state->stage = 0;
    state->percentage = 0;
    state->systemModel = systemModel;
    worker->ReportProgress(state->percentage, state);
    switch (opts->optimizationMethod)
    {
    case 0:
        while (iterationNo < maxIterations)
        {
            data = systemModel.MapData();
            OptimizationMethod0::step(some_parameters, data, (optmethods::costfunction)Runtime::InteropServices::Marshal::GetFunctionPointerForDelegate(del).ToPointer());
            systemModel.MapData(data);
            iterationNo++;
            state->percentage = 0;
            state->systemModel = systemModel;
            worker->ReportProgress(state->percentage, state);
        }
        ...
    }
}
I'm passing the system model inside the state so that I can display the results of the latest step on the screen, which doesn't work, but that is another question :-)
The start button handler calls curveFit::fit after initializing the system model:
private: System::Void btnStart_Click(System::Object^ sender, System::EventArgs^ e) {
    systemModel.Clear();
    for (int mode = 0; mode < modes; mode++)
    {
        switch (model)
        {
        case 0:
            systemModel.Add(gcnew model0);
            systemModel[mode]->coefficients[0] = 100.0 / double(mode + 1);
            ...
            break;
        ...
        }
    }
    btnStart->Enabled = false;
    stStatusText->Text = "Calculating!";
    Application::UseWaitCursor = true;
    curveFit^ cf = gcnew curveFit;
    fitCurve->RunWorkerAsync(cf);
}
private: System::Void fitCurve_DoWork(System::Object^ sender, System::ComponentModel::DoWorkEventArgs^ e) {
    System::ComponentModel::BackgroundWorker^ worker;
    worker = dynamic_cast<System::ComponentModel::BackgroundWorker^>(sender);
    curveFit^ cf = safe_cast<curveFit^>(e->Argument);
    cf->expExcitations = gcnew array<double>(expExcitations.Count);
    expExcitations.CopyTo(cf->expExcitations);
    cf->expResults = gcnew array<double>(expResults.Count);
    expResults.CopyTo(cf->expResults);
    cf->systemModel = systemModel;
    cf->modes = modes;
    e->Result = cf->fit(worker, e, options);
}
This works perfectly! But in order to make the optimization process faster and more successful, I wanted to use the results of previous optimizations as the initial guess for the next run (if possible):
multimode oldmodel(systemModel);
systemModel.Clear();
for (int mode = 0; mode < modes; mode++)
{
    switch (model)
    {
    case 0:
        if (mode < oldmodel.modeNo() && oldmodel.isValid() && (oldmodel[mode]->model == 0))
            systemModel.Add(oldmodel[mode]);
        else
        {
            systemModel.Add(gcnew model0);
            systemModel[mode]->coefficients[0] = 100.0 / double(mode + 1);
            ...
        }
        break;
    ...
Now, my problem is that after this change the messages no longer seem to get passed correctly: the first time the start button is clicked everything works as it should, but from then on, whenever the statement systemModel.Add(oldmodel[mode]); is executed, the results remain the same as the initial guesses and are not updated after the fit function is called.
So why do these two lines (Add(oldmodel[mode]) and Add(gcnew model0)) give such different results?
