Strange Memory Behavior in JNI


I have found some strange behavior concerning multidimensional arrays in JNI and after hours of research I still have no idea how to solve my problem. I have the following JNI code:
JNIEXPORT jobject JNICALL Java_leaktest_NativeClass_nativeCalculation
(JNIEnv *env, jobject thiz, jobjectArray arr) {
double** multidimArray = new double*[512];
for (int i = 0;i < 512;i++) {
multidimArray[i] = new double[512];
for (int j = 0;j < 512;j++) {
multidimArray[i][j] = i * j;
}
}
jobjectArray jMultidimArray = env->NewObjectArray(512, env->FindClass("[D"), 0);
for (int i = 0;i < 512;i++) {
jdoubleArray row = env->NewDoubleArray(512);
jdouble* elems = (jdouble*)multidimArray[i];
env->SetDoubleArrayRegion(row, 0, 512, elems);
env->SetObjectArrayElement(jMultidimArray, i, row);
env->DeleteLocalRef(row);
}
jclass arrayClass = env->FindClass("leaktest/ArrayClass");
jobject arrObj = env->NewObject(arrayClass, env->GetMethodID(arrayClass, "<init>", "(II[[D)V"), 512, 512, jMultidimArray);
for (int i = 0;i < 512;i++) {
delete[] multidimArray[i]; // arrays allocated with new[] must be freed with delete[]
}
delete[] multidimArray;
return arrObj;
}
In Java, I simply call this native method repeatedly. With every call, the displayed amount of RAM occupied by the JVM increases by about 1 to 2 MB. It looks to me as if I am allocating memory somewhere in the C++ part and never releasing it, but I have no idea where that could happen.
Michael

My guess is that it happens here:
jclass arrayClass = env->FindClass("leaktest/ArrayClass");
jobject arrObj = env->NewObject(arrayClass, env->GetMethodID(arrayClass, "<init>", "(II[[D)V"), 512, 512, jMultidimArray);
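Independently of where the JVM's growth comes from, the C++ side can avoid manual new[]/delete[] entirely. A minimal sketch, assuming the 512×512 layout from the question: the matrix lives in one contiguous std::vector<double> that is freed automatically, and each row is handed out as a pointer (for example to SetDoubleArrayRegion). The names buildMatrix, rowPtr, kRows and kCols are illustrative, not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dimensions matching the question's 512x512 matrix.
constexpr std::size_t kRows = 512;
constexpr std::size_t kCols = 512;

// Row-major matrix stored contiguously; it is released when the vector goes
// out of scope, so there is no delete/delete[] to get wrong.
std::vector<double> buildMatrix()
{
    std::vector<double> m(kRows * kCols);
    for (std::size_t i = 0; i < kRows; ++i)
        for (std::size_t j = 0; j < kCols; ++j)
            m[i * kCols + j] = static_cast<double>(i * j);
    return m;
}

// Pointer to the first element of row i, e.g. for
// env->SetDoubleArrayRegion(row, 0, kCols, rowPtr(m, i)) in the JNI code.
const double* rowPtr(const std::vector<double>& m, std::size_t i)
{
    return m.data() + i * kCols;
}
```

Since jdouble is a typedef for double, the row pointer can be passed to the JNI call without a cast.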

Related

stringstream serialization of a large set of floats on CentOS is faster than 4 pthreads serializing chunks; std::threads are faster on Windows

I have been tasked with optimizing the serialization of large sets of floats to a hard disk.
My initial approach is the following:
class StringStreamDataSerializer
{
public:
void serializeRawData(const vector<float>& data);
void saveToFileStream(std::fstream& file);
private:
stringstream _stringStream;
};
void StringStreamDataSerializer::serializeRawData(const vector<float>& data)
{
for (float currentFloat : data)
_stringStream << currentFloat;
}
void StringStreamDataSerializer::saveToFileStream(std::fstream& file)
{
file << _stringStream.str().c_str();
file.close();
}
I wanted to split the serialization across 4 threads to make it faster. Here's how:
struct st_args
{
const vector<float>* data;
size_t from;
size_t to;
size_t segment;
} ;
string outputs[4];
std::mutex g_display_mutex;
void* serializeLocal(void *context) // pthread start routines must have type void* (*)(void*)
{
struct st_args *readParams = (st_args*)context;
for (auto i = readParams->from; i < readParams->to; i++)
{
string currentFloat = std::to_string( readParams->data->at(i));
currentFloat.erase(currentFloat.find_last_not_of('0') + 1,
std::string::npos);
outputs[readParams->segment] += currentFloat;
}
return nullptr;
}
void SImplePThreadedSerializer::serializeRawData(const vector<float>& data)
{
const int N = 4;
size_t totalFloats = data.size();
st_args* seg;
pthread_t* chunk;
chunk = (pthread_t *) malloc(N*sizeof(pthread_t));
seg = (st_args *) malloc(N*sizeof(st_args));
size_t from = 0;
for(int i = 0; i < N; i++)
{
seg[i].from = 0;
seg[i].data = &data;
}
int i = 0;
for (; i < N - 1; ++i)
{
seg[i].from = from;
seg[i].to = seg[i].from + totalFloats / N;
seg[i].segment = i;
pthread_create(&chunk[i], NULL, (void *(*)(void *)) serializeLocal,
(void *) &(seg[i]));
from += totalFloats / N;
}
seg[i].from = from;
seg[i].to = totalFloats;
seg[i].segment = i;
pthread_create(&chunk[i], NULL, (void *(*)(void *)) serializeLocal, (void *)
&(seg[i]));
size_t totalBuffered = 0;
for (int k = 0; k < N; k++)
{
pthread_join(chunk[k], NULL);
totalBuffered += outputs[k].size();
}
str.reserve(totalBuffered);
for (int k = 0; k < N; k++)
{
str+= outputs[k];
}
free(chunk);
free(seg);
}
It turns out that the stringstream version is faster than the 4-thread version on Linux. On Windows I get a speedup with the presented approach (using std::thread), but on Linux I get the opposite result. Any explanation of why would be helpful and appreciated.
Here are the results on CentOS:
* Serialization of 10000000 floats on the hard disk *
StringStreamDataSerializer flushes data in file in 0.55 seconds.
StringStreamDataSerializer Finished in 3.28 seconds.
SImplePThreadedSerializer flushes data in file in 0.46 seconds.
SImplePThreadedSerializer Finished in 6.96 seconds.
On Windows, the multithreaded serialization is done by 4 std::threads, and they actually speed up the serialization:
static void serializeChunk(string& output, const vector<float>& data, size_t
from, size_t to)
{
for (auto i = from; i < to; i++)
{
string currentFloat = std::to_string(data[i]);
// trim the trailing zeroes
currentFloat.erase(currentFloat.find_last_not_of('0') + 1,
std::string::npos);
output += currentFloat;
}
}
void SimpleMultiThreadedSerializer::serializeRawData(const vector<float>&
data)
{
const int N = 4;
thread t[N]; // say, 4 CPUs.
string outputs[N];
size_t totalFloats = data.size();
size_t from = 0;
int i = 0;
for (; i < N - 1; ++i)
{
t[i] = thread(serializeChunk, std::ref(outputs[i]), std::cref(data), from, from +
totalFloats / N); // std::cref: std::thread would otherwise copy the whole vector
from += totalFloats / N;
}
t[i] = thread(serializeChunk, std::ref(outputs[i]), std::cref(data), from,
totalFloats);
for (i = 0; i < N; ++i)
t[i].join();
size_t totalBuffered = 0;
for (int i = 0; i < N; ++i)
totalBuffered += outputs[i].size();
str.reserve(totalBuffered);
for (int i = 0; i < N; ++i)
str += outputs[i];
}
And the results:
* Serialization of 1000000 floats on the hard disk *
StringStreamDataSerializer flushes data in file in 0.116 seconds.
StringStreamDataSerializer Finished in 10.236 seconds.
SimpleMultiThreadedSerializer flushes data in file in 0.105 seconds.
SimpleMultiThreadedSerializer Finished in 3.01 seconds.

Conversion between binary floating point and decimal text is very expensive. If performance is a concern, you should serialize the data in binary (possibly after endianness conversion, so that you at least get interoperability across IEEE 754 systems).
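A minimal sketch of the binary approach, assuming the writer and reader share the same float format and byte order (the endianness conversion mentioned above is deliberately left out). writeBinary and readBinary are illustrative names, and the file path is a placeholder.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Writes the raw bytes of the floats; no text conversion, no byte swapping.
bool writeBinary(const std::string& path, const std::vector<float>& data)
{
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(float)));
    out.close();  // flush now so write errors show up in the stream state
    return !out.fail();
}

// Reads back a file produced by writeBinary; returns an empty vector on error.
std::vector<float> readBinary(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        return {};
    const std::streamoff bytes = in.tellg();  // opened at end: position == size
    if (bytes <= 0)
        return {};
    in.seekg(0, std::ios::beg);
    std::vector<float> data(static_cast<std::size_t>(bytes) / sizeof(float));
    in.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(float)));
    return data;
}
```

Each float occupies exactly sizeof(float) bytes, so the file size is predictable and reading it back needs no parsing.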
The poor threaded performance on GNU/Linux is likely a known performance issue with locale object handling: in multi-threaded mode, stringstream currently uses a process-wide, heavily contended reference counter for locale handling.
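For reference, here is a self-contained sketch of the chunked std::thread approach from the question, written as a free function. Two choices here are assumptions, not the asker's code: a newline is written after each value (the original concatenates values with no delimiter, which makes the output impossible to parse back), and the vector is passed with std::cref, because std::thread copies its arguments and would otherwise copy the whole vector into every thread. serializeParallel is an illustrative name.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Serializes data[from, to) into output, one value per line.
static void serializeChunk(std::string& output, const std::vector<float>& data,
                           std::size_t from, std::size_t to)
{
    for (std::size_t i = from; i < to; ++i) {
        std::string s = std::to_string(data[i]);
        s.erase(s.find_last_not_of('0') + 1);  // "2.000000" -> "2."
        output += s;
        output += '\n';
    }
}

// Splits the work into n chunks (the last one takes the remainder), runs each
// chunk on its own thread, then concatenates the results in order.
std::string serializeParallel(const std::vector<float>& data, unsigned n = 4)
{
    if (n == 0)
        n = 1;
    std::vector<std::string> outputs(n);
    std::vector<std::thread> threads;
    const std::size_t total = data.size();
    const std::size_t step = total / n;
    for (unsigned k = 0; k < n; ++k) {
        const std::size_t from = k * step;
        const std::size_t to = (k + 1 == n) ? total : from + step;
        threads.emplace_back(serializeChunk, std::ref(outputs[k]),
                             std::cref(data), from, to);
    }
    for (auto& t : threads)
        t.join();

    std::size_t size = 0;
    for (const auto& o : outputs)
        size += o.size();
    std::string result;
    result.reserve(size);
    for (const auto& o : outputs)
        result += o;
    return result;
}
```

Each thread writes only to its own string, so no mutex is needed; the join loop is the only synchronization point.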

Import FBX Vertex And Index Buffer To DirectX 11

OK, I'm still trying to figure out how to correctly import the FBX vertex and index buffers into DirectX 11. I wrote a controller that does this and passes the vertex and index buffers to the DX11 renderer. The output should look like a cube, but it doesn't: I only see triangles that don't make sense.
The code is shown below. Note that I do multiply the Z values by -1.
What do I need to change to get it to render correctly?
#pragma once
#include "Array.h"
#include "Vector.h"
#include "fbxsdk.h"
#include <assert.h>
#include "constants.h"
class FbxController
{
public:
FbxController();
~FbxController();
void Import(const char* lFilename)
{
lImporter = FbxImporter::Create(lSdkManager, "");
bool lImportStatus = lImporter->Initialize(lFilename, -1, lSdkManager->GetIOSettings());
if (!lImportStatus) {
printf("Call to FbxImporter::Initialize() failed.\n");
printf("Error returned: %s\n\n", lImporter->GetStatus().GetErrorString());
exit(-1);
}
lScene = FbxScene::Create(lSdkManager, "myScene");
lImporter->Import(lScene);
FbxNode* lRootNode = lScene->GetRootNode();
int childCount = lRootNode->GetChildCount();
FbxNode *node1 = lRootNode->GetChild(0);
const char* nodeName1 = node1->GetName();
fbxsdk::FbxMesh *mesh = node1->GetMesh();
int cpCount1 = mesh->GetControlPointsCount();
fbxsdk::FbxVector4 *controlPoints = mesh->GetControlPoints();
for (int i = 0; i < cpCount1; i++)
{
fbxsdk::FbxVector4 cpitem = controlPoints[i];
printf("%f, %f, %f, %f\n", cpitem[0], cpitem[1], cpitem[2], cpitem[3]); // components are doubles, so %d would be wrong
VERTEXPOSCOLOR vpc;
vpc.Color.x = 0.5f;
vpc.Color.y = 0.5f;
vpc.Color.z = 0.5f;
vpc.Position.x = cpitem[0];
vpc.Position.y = cpitem[1];
vpc.Position.z = cpitem[2] * -1.0f;
m_vertices.add(vpc);
}
int pvCount = mesh->GetPolygonVertexCount();
int polyCount = mesh->GetPolygonCount();
for (int i = 0; i < polyCount; i++)
{
int polyItemSize = mesh->GetPolygonSize(i);
assert(polyItemSize == 3);
for (int j = 0; j < polyItemSize; j++)
{
int cpIndex = mesh->GetPolygonVertex(i, j);
m_indices.add(cpIndex);
float x = controlPoints[cpIndex].mData[0];
float y = controlPoints[cpIndex].mData[1];
float z = controlPoints[cpIndex].mData[2];
}
}
fbxsdk::FbxMesh *mesh2;
bool isT = mesh->IsTriangleMesh();
FbxNode *node2 = lRootNode->GetChild(1);
FbxNode *node3 = lRootNode->GetChild(2);
//lImporter->Destroy();
}
Array<VERTEXPOSCOLOR> GetVertexPosColors()
{
return m_vertices;
}
Array<unsigned int> getIndexBuffer()
{
return m_indices;
}
protected:
FbxManager *lSdkManager;
FbxIOSettings *ios;
FbxImporter *lImporter;
bool lImportStatu;
FbxScene *lScene;
private:
Array<VERTEXPOSCOLOR> m_vertices;
Array<unsigned int> m_indices;
};

I think there is a problem with how you create your index buffer.
You simply give an index for each vertex, and an index buffer doesn't work that way.
Let me know if you solve this.
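One further thing worth checking, offered as an assumption rather than a diagnosis: negating Z mirrors the mesh, and a mirror reverses the winding order of every triangle. With Direct3D 11's default rasterizer state (back-face culling, clockwise triangles treated as front-facing), mirrored triangles can end up culled or showing the wrong side. A common remedy is to swap two indices per triangle after the mirror. This sketch uses std::vector in place of the asker's Array type, and flipWinding is a hypothetical name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Reverses the winding of every triangle in a triangle-list index buffer by
// swapping the second and third index of each triple.
void flipWinding(std::vector<unsigned int>& indices)
{
    for (std::size_t t = 0; t + 2 < indices.size(); t += 3)
        std::swap(indices[t + 1], indices[t + 2]);
}
```

This would be applied once to the finished index list, before the index buffer is created.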

How to return several strings from threads and concatenate them in C++/CLI

How can I return several Strings from threads and join them into one String?
I'm using C++/CLI with threads in a Windows Forms application. This code should divide the user's message into n (nThreads) parts and encipher each part in its own thread.
Finally, it must concatenate all the results into one string.
So far I have done something like this:
public ref class ThreadExample
{
public:
static String^ inputString;
static String^ outputString;
static array<String^>^ arrayOfThreads = gcnew array <String^>(nThreads);
static int iterator;
static void ThreadEncipher()
{
string input, output;
MarshalString(inputString, input);
output = CaesarCipher::encipher(input);
outputString = gcnew String(output.c_str());
arrayOfThreads[iterator] = outputString;
}
};
And this is the function where I use the threads:
array<String^>^ ThreadEncipherFuncCpp(int nThreads, string str2){
array<String^>^ arrayOfThreads = gcnew array <String^>(nThreads);
string loopSubstring;
messageLength = str2.length();
int numberOfSubstring = messageLength / nThreads;
int isModulo = messageLength % nThreads;
array<Thread^>^ xThread = gcnew array < Thread^ >(nThreads);
int j;
//loop dividing text to threads
for (int i = 0; i < nThreads; i++)
{
j = i;
if (i == 0 && numberOfSubstring != 0)
loopSubstring = str2.substr(0, numberOfSubstring);
else if ((i == nThreads - 1) && numberOfSubstring != 0){
if (isModulo != 0)
loopSubstring = str2.substr(numberOfSubstring*i, numberOfSubstring + isModulo);
else
loopSubstring = str2.substr(numberOfSubstring*i, numberOfSubstring);
}
else if (numberOfSubstring == 0){
loopSubstring = str2.substr(0, isModulo);
i = nThreads - 1;
}
else
loopSubstring = str2.substr(numberOfSubstring*i, numberOfSubstring);
xThread[i] = gcnew Thread(gcnew ThreadStart(&ThreadExample::ThreadEncipher));
}
auto start = chrono::system_clock::now();
for (int i = 0; i < nThreads; i++){
ThreadExample::iterator = i;
ThreadExample::inputString = gcnew String(loopSubstring.c_str());
xThread[i]->Start();
}
for (int i = 0; i < nThreads; i++){
xThread[i]->Join();
}
auto elapsed = chrono::system_clock::now() - start;
long long elapsedMicroseconds = chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
cppTimer = elapsedMicroseconds; // the value is in microseconds, not milliseconds
arrayOfThreads = ThreadExample::arrayOfThreads;
delete xThread;
return arrayOfThreads;
}

I'm going to take a guess here and say that the program ran without error, but your output was blank.
The output is blank because of the static class initializer. It runs earlier than you might think: as soon as you reference the class in any way, the static initializer runs. Therefore, when you try to execute ThreadExample::inputString = "Some example text. Some example text2.";, the static class initializer has already run, and your array of threads is already set.
To fix this, move that code out of the static initializer, and into the method where you create the threads.
Also, a more general note on C++/CLI: if you're trying to learn C++, please don't use C++/CLI. C++/CLI is not the same thing as C++. It has all the complexities of C++, all the complexities of C#, and some complexities of its own thrown in for good measure. It should be used when you need to interface .NET code with C++ code, not as a primary development language.
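In standard C++ (not C++/CLI), the usual way to get a result back from each thread is to have each task return its string through a std::future, then concatenate the futures in order. A minimal sketch: caesarEncipher is a hypothetical stand-in for the asker's CaesarCipher::encipher (a shift-3 Caesar cipher), and the split mirrors the question's scheme, with the last part taking the remainder.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for CaesarCipher::encipher: shifts letters by `shift`.
std::string caesarEncipher(const std::string& text, int shift = 3)
{
    std::string out = text;
    for (char& c : out) {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>('a' + (c - 'a' + shift) % 26);
        else if (c >= 'A' && c <= 'Z')
            c = static_cast<char>('A' + (c - 'A' + shift) % 26);
    }
    return out;
}

// Enciphers message in nThreads parts concurrently and joins the results.
std::string encipherParallel(const std::string& message, std::size_t nThreads)
{
    if (nThreads == 0)
        nThreads = 1;
    const std::size_t step = message.size() / nThreads;
    std::vector<std::future<std::string>> parts;
    for (std::size_t i = 0; i < nThreads; ++i) {
        const std::size_t from = i * step;
        const std::size_t len = (i + 1 == nThreads) ? message.size() - from : step;
        // Each task receives its own copy of its substring, so there is no
        // shared static state (like inputString or iterator) to race on.
        parts.push_back(std::async(std::launch::async, caesarEncipher,
                                   message.substr(from, len), 3));
    }
    std::string result;
    for (auto& p : parts)
        result += p.get();  // get() waits for the task; order is preserved
    return result;
}
```

Because the cipher works character by character, enciphering the parts separately and concatenating gives the same result as enciphering the whole message at once.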

How to initialize a number of variables within a for loop in Arduino

I'd like to initialize a bunch of variables by looping over their names like this. The effect I'm hoping for is that at the end I'd have three variables: aVar = 1, bVar = 2, and cVar = 3.
char* variables[] = { "aVar", "bVar", "cVar"};
int values[] = { 1, 2, 3};
void setup(){
for (int i = 0; i < 3; i++){
int String(variables[i]) = values [i];
Serial.println(variables[i]);
}
}
Is there a way to do this?

What you seem to be describing is creating a variable at run time whose name is itself only known at run time, which is not possible in C++. What you could do instead is create a map whose keys are the entries of the variables array and whose values are the entries of the values array.
#include <map>
#include <string>
using namespace std;
int main()
{
const char* variables[] = { "aVar", "bVar", "cVar"};
int values[] = { 1, 2, 3};
map<string, int> VariablesMap;
for(int i = 0; i < 3 ; i ++)
{
VariablesMap[variables[i]] = values[i];
}
// VariablesMap["bVar"] is now 2
return 0;
}
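If std::map is not available on your board (the AVR-based Arduino cores, for example, generally ship without the C++ standard library containers), a fixed table of name/value pairs with a lookup function gives the same effect. A minimal sketch; NamedValue and lookup are illustrative names.

```cpp
#include <string.h>

// One entry per "variable": its name and its current value.
struct NamedValue {
    const char* name;
    int value;
};

NamedValue variables[] = { {"aVar", 1}, {"bVar", 2}, {"cVar", 3} };
const int variableCount = sizeof(variables) / sizeof(variables[0]);

// Returns a pointer to the value stored under `name`, or nullptr if absent.
int* lookup(const char* name)
{
    for (int i = 0; i < variableCount; i++) {
        if (strcmp(variables[i].name, name) == 0)
            return &variables[i].value;
    }
    return nullptr;
}
```

In setup() you could then write Serial.println(*lookup("bVar")); and assign through the returned pointer to change a value.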

Multithreading in C++ using reference classes - ThreadStart constructor issues?

I appreciate any help, and would like to thank you in advance. I'm working on a project for one of my classes: essentially, performing merge sort using multithreading and reference classes. In main I'm just trying to create an initial thread that will begin the recursive merge sort. Each time the array is split, a new thread is spawned to handle that half. I don't need all of it done; I just don't understand why my Thread constructor and ThreadStart delegate are not working. Thanks again!
#include <iostream>
#include <vector>
#include <string>
#include <time.h>
#include <cstdlib>
using namespace System;
using namespace System::Threading;
public ref class MergeSort
{
private: int cnt;
public: MergeSort()
{
cnt = 0;
}
public: void mergeSort(char a[], int from, int to)
{
Thread^ current = Thread::CurrentThread;
if(from == to)
return;
int mid = (from + to)/2;
//Sort the first and the second half
//addThread(a, from, mid);
//addThread(a, mid+1, to);
//threads[0]->Join();
//threads[1]->Join();
merge(a, from, mid, to);
}
public: void merge(char a[], int from, int mid, int to)
{
Thread^ current = Thread::CurrentThread;
while (current ->ThreadState == ThreadState::Running)
{
int n = to-from + 1; // Size of range to be merged
std::vector<char> b(n);
int i1 = from; //Next element to consider in the first half
int i2 = mid + 1; //Next element to consider in the second half
int j = 0; //Next open position in b
//As long as neither i1 nor i2 is past the end, move the smaller element into b
while(i1 <= mid && i2 <= to)
{
if(a[i1] < a[i2])
{
b[j] = a[i1];
i1++;
}
else
{
b[j] = a[i2];
i2++;
}
j++;
}
//Copy any remaining entries of the first half
while(i1 <= mid)
{
b[j] = a[i1];
i1++;
j++;
}
while(i2 <= to)
{
b[j] = a[i2];
i2++;
j++;
}
//Copy back from temporary vector
for(j = 0; j < n; j++)
a[from+j] = b[j];
}
}
};
void main()
{
char A[10];
for(int i = 0; i < 10; i++)
{
A[i] = ((char) ((rand() % (122-65)) + 65));
}
array<Thread^>^ tr = gcnew array<Thread^>(10);
MergeSort^ ms1 = gcnew MergeSort();
ThreadStart^ TS = gcnew ThreadStart(ms1, &MergeSort::mergeSort(A, 0, 10));
tr[0] = gcnew Thread(TS);
tr[0] -> Start();
system("pause");
}

The issue you are facing here is how to construct a ThreadStart delegate. You are trying to do too many things in the ThreadStart constructor. You cannot pass in arguments at this point because all it is looking for is a start location for the thread.
The delegate should be:
ThreadStart^ TS = gcnew ThreadStart(ms1, &MergeSort::mergeSort);
However, since you are passing in some state, I would recommend doing a bit more research on how that is done in C++/CLI. This MSDN topic should give you a start.
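For comparison, here is a standard C++ sketch (not C++/CLI, so no delegates are involved) of the same threaded merge sort. std::thread takes the function and its arguments together, which is exactly what a ThreadStart cannot do. Like the question's merge, `to` is inclusive, so sorting 10 elements starts with to = 9. The depth parameter, which stops spawning threads after a few levels, is an assumption added here rather than part of the original design.

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Sorts a[from..to] (inclusive). While depth > 0, the left half is sorted on
// a new thread and the right half on the current one.
void mergeSort(std::vector<char>& a, int from, int to, int depth)
{
    if (from >= to)
        return;
    const int mid = from + (to - from) / 2;
    if (depth > 0) {
        // Arguments are passed straight to the thread; std::ref is needed
        // because the vector is taken by reference.
        std::thread left(mergeSort, std::ref(a), from, mid, depth - 1);
        mergeSort(a, mid + 1, to, depth - 1);
        left.join();  // both halves must be sorted before merging
    } else {
        mergeSort(a, from, mid, 0);
        mergeSort(a, mid + 1, to, 0);
    }
    std::inplace_merge(a.begin() + from, a.begin() + mid + 1, a.begin() + to + 1);
}
```

The two halves are disjoint ranges of the same vector, and the vector is never resized, so the threads do not race.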

Edit:
Never mind, the problem was that I had to change the parameter of the method I tried to pass from Int32 to Object^.
I'm having a similar issue, though I don't think my problem is the arguments; I'm passing those through thread->Start().
I think my problem is rather that I'm trying to start the thread using a method of a ref class.
invalid delegate initializer -- function does not match the delegate type
is the error I'm getting. Any ideas?
void AddForcesAll() {
for (int index = 0; index < n; index++) {
Thread^ thread = gcnew Thread (gcnew ParameterizedThreadStart(this, &Bodies::AddForces));
thread->Start(index);
}
}
The syntax worked fine for me with non-ref classes.
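For comparison, in standard C++ (not C++/CLI), std::thread accepts a pointer to member function, the object pointer, and the arguments directly, so there is no delegate type and no Object^ parameter to match. A minimal sketch; this Bodies class is a hypothetical stand-in for the commenter's, and AddForces does placeholder work. Unlike the loop above, the threads are also joined before AddForcesAll returns.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the commenter's Bodies class.
class Bodies {
public:
    explicit Bodies(int n) : n(n), forces(static_cast<std::size_t>(n), 0.0) {}

    // Placeholder work: each call writes only its own element, so the
    // threads never touch the same memory.
    void AddForces(int index)
    {
        forces[static_cast<std::size_t>(index)] = 2.0 * index;
    }

    void AddForcesAll()
    {
        std::vector<std::thread> threads;
        for (int index = 0; index < n; index++)
            // Member function pointer, then the object, then the argument.
            threads.emplace_back(&Bodies::AddForces, this, index);
        for (auto& t : threads)
            t.join();
    }

    int n;
    std::vector<double> forces;
};
```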

Develop Reference
