How do you flatten image coordinates into a 1D array? - graphics

My source code is from Heterogeneous Computing with OpenCL Chapter 4 Basic OpenCL Examples > Image Rotation. The book leaves out several critical details.
My major problem is that I don't know how to initialize the array that I supply to their kernel (they don't tell you how). What I have is:
int W = inImage.width();
int H = inImage.height();
float *myImage = new float[W*H];
for(int row = 0; row < H; row++)
for(int col = 0; col < W; col++)
myImage[row*W+col] = col;
which I supply to this kernel:
__kernel void img_rotate(__global float* dest_data, __global float* src_data, int W, int H, float sinTheta, float cosTheta)
{
const int ix = get_global_id(0);
const int iy = get_global_id(1);
float x0 = W/2.0f;
float y0 = H/2.0f;
float xoff = ix-x0;
float yoff = iy-y0;
int xpos = (int)(xoff*cosTheta + yoff*sinTheta + x0);
int ypos = (int)(yoff*cosTheta - xoff*sinTheta + y0);
if(((int)xpos>=0) && ((int)xpos < W) && ((int)ypos>=0) && ((int)ypos<H))
{
dest_data[iy*W+ix] = src_data[ypos*W+xpos];
//dest_data[iy*W+ix] = src_data[iy*W+ix];
}
}
I'm having trouble finding the right value for theta too. An integer would be an appropriate value for theta, right?
float theta = 45; // 45 degrees, right?
float cos_theta = cos(theta);
float sin_theta = sin(theta);

When writing my OpenCL code, I always treat each kernel as reading a 3D set of data, regardless if the data is 1D, 2D, or 3D:
__kernel void TestKernel(__global float *Data){
k = get_global_id(0); //also z
j = get_global_id(1); //also y
i = get_global_id(2); //also x
//Convert 3D to 1D
int linear_coord = i + get_global_size(0)*j + get_global_size(0)*get_global_size(1)*k;
//do stuff
}
When doing the clEnqueueNDKernelRange(...), just set the dimension to be:
int X = 500;
int Y = 300;
int Z = 1;
size_t GlobalDim = {Z, Y, X};
This let's all of my kernels work easily in all dimensions.

Related

Nan building and looping over an array

I'm able to execute a hello world example, but beyond that I'm new to nan and node add-ons.
I'm concerned about memory leaks so if I'm causing any please let me
know.
And how do I push an array onto that out array similar to
[].push([0, 1]). I'm not sure how to do it in the cleanest way possible without creating a new variable to store it - if possible.
Also if there's anything else I'm doing that's not best practice please let me know! I've been researching this for a while now.
Here's the code I have so far
#include <nan.h>
void Method(const Nan::FunctionCallbackInfo <v8::Value> &info) {
v8::Local <v8::Context> context = info.GetIsolate()->GetCurrentContext();
v8::Local <v8::Array> coordinate = v8::Local<v8::Array>::Cast(info[0]);
unsigned int radius = info[2]->Uint32Value(context).FromJust();
// Also if creating the array is wasteful this way by giving it the max possible size
v8::Local <v8::Array> out = Nan::New<v8::Array>(x * y);
for (int x = -radius; x <= radius; ++x) {
for (int y = -radius; y <= radius; ++y) {
if (x * x + y * y <= radius * radius) {
// I need to push something like [x + coordinate->Get(context, 0), y + coordinate->Get(context, 0)];
out->push_back();
}
}
}
}
I was later able to write this.. If anyone can point out if I approached it correctly and/or if there are any memory issues I need to watch out for.
#include <nan.h>
void Method(const Nan::FunctionCallbackInfo <v8::Value> &info) {
v8::Local <v8::Context> context = info.GetIsolate()->GetCurrentContext();
v8::Local <v8::Array> coordinates v8::Local<v8::Array>::Cast(info[0]);
int radius = info[1]->Int32Value(context).FromJust();
v8::Local <v8::Array> out = Nan::New<v8::Array>();
int index = 0;
for (unsigned int i = 0; i < coordinates->Length(); i++) {
v8::Local <v8::Array> coordinate = v8::Local<v8::Array>::Cast(coordinates->Get(context, i).ToLocalChecked());
int xArg = coordinate->Get(context, 0).ToLocalChecked()->Int32Value(context).FromJust();
int yArg = coordinate->Get(context, 1).ToLocalChecked()->Int32Value(context).FromJust();
for (int xPos = -radius; xPos <= radius; ++xPos) {
for (int yPos = -radius; yPos <= radius; ++yPos) {
if (xPos * xPos + yPos * yPos <= radius * radius) {
v8::Local <v8::Array> xy = Nan::New<v8::Array>();
(void) xy->Set(context, 0, Nan::New(xPos + xArg));
(void) xy->Set(context, 1, Nan::New(yPos + yArg));
(void) out->Set(context, index++, xy);
}
}
}
}
info.GetReturnValue().Set(out);
}
I don't think you have any leaks - in fact there is no implicit memory allocation at all in your code - but in case you need it, I suggest you check the gyp files of my addons for more information on how to build them with asan with g++ or clang. As far as I am concerned, it is a mandatory step when creating Node addons.
https://github.com/mmomtchev/node-gdal-async/blob/master/binding.gyp
https://github.com/mmomtchev/exprtk.js/blob/main/binding.gyp
The option is called --enable_asan

checking edge for circular movement

I want to make my dot program turn around when they reach edge
so basically i just simply calculate
x = width/2+cos(a)*20;
y = height/2+sin(a)*20;
it's make circular movement. so i want to make this turn around by checking the edge. i also already make sure that y reach the if condition using println command
class particles {
float x, y, a, r, cosx, siny;
particles() {
x = width/2; y = height/2; a = 0; r = 20;
}
void display() {
ellipse(x, y, 20, 20);
}
void explode() {
a = a + 0.1;
cosx = cos(a)*r;
siny = sin(a)*r;
x = x + cosx;
y = y + siny;
}
void edge() {
if (x>width||x<0) cosx*=-1;
if (y>height||y<0) siny*=-1;
}
}
//setup() and draw() function
particles part;
void setup(){
size (600,400);
part = new particles();
}
void draw(){
background(40);
part.display();
part.explode();
part.edge();
}
they just ignore the if condition
There is no problem with your check, the problem is with the fact that presumably the very next time through draw() you ignore what you did in response to the check by resetting the values of cosx and siny.
I recommend creating two new variables, dx and dy ("d" for "direction") which will always be either +1 and -1 and change these variables in response to your edge check. Here is a minimal example:
float a,x,y,cosx,siny;
float dx,dy;
void setup(){
size(400,400);
background(0);
stroke(255);
noFill();
x = width/2;
y = height/2;
dx = 1;
dy = 1;
a = 0;
}
void draw(){
ellipse(x,y,10,10);
cosx = dx*20*cos(a);
siny = dy*20*sin(a);
a += 0.1;
x += cosx;
y += siny;
if (x > width || x < 0)
dx = -1*dx;
if (y > height || y < 0)
dy = -1*dy;
}
When you run this code you will observe the circles bouncing off the edges:

Reaction-Diffusion algorithm on Processing + Multithreading

I have made an implementation of the Reaction-Diffusion algorithm on Processing 3.1.1, following a video tutorial. I have made some adaptations on my code, like implementing it on a torus space, instead of a bounded box, like the video.
However, I ran into this annoying issue, that the code runs really slow, proportional to the canvas size (larger, slower). With that, I tried optmizing the code, according to my (limited) knowledge. The main thing I did was to reduce the number of loops running.
Even then, my code still ran quite slow.
Since I have noticed that with a canvas of 50 x 50 in size, the algorithm ran at a good speed, I tried making it multithreaded, in such a way that the canvas would be divided between the threads, and each thread would run the algorithm for a small region of the canvas.
All threads read from the current state of the canvas, and all write to the future state of the canvas. The canvas is then updated using Processing's pixel array.
However, even with multithreading, I didn't see any performance improvement. By the contrary, I saw it getting worse. Now sometimes the canvas flicker between a rendered state and completely white, and in some cases, it doesn't even render.
I'm quite sure that I'm doing something wrong, or I may be taking the wrong approach to optimizing this algorithm. And now, I'm asking for help to understand what I'm doing wrong, and how I could fix or improve my code.
Edit: Implementing ahead of time calculation and rendering using a buffer of PImage objects has removed flickering, but the calculation step on the background doesn't run fast enough to fill the buffer.
My Processing Sketch is below, and thanks in advance.
ArrayList<PImage> buffer = new ArrayList<PImage>();
Thread t;
Buffer b;
PImage currentImage;
Point[][] grid; //current state
Point[][] next; //future state
//Reaction-Diffusion algorithm parameters
final float dA = 1.0;
final float dB = 0.5;
//default: f = 0.055; k = 0.062
//mitosis: f = 0.0367; k = 0.0649
float feed = 0.055;
float kill = 0.062;
float dt = 1.0;
//multi-threading parameters to divide canvas
int threadSizeX = 50;
int threadSizeY = 50;
//red shading colors
color red = color(255, 0, 0);
color white = color(255, 255, 255);
color black = color(0, 0, 0);
//if redShader is false, rendering will use a simple grayscale mode
boolean redShader = true;
//simple class to hold chemicals A and B amounts
class Point
{
float a;
float b;
Point(float a, float b)
{
this.a = a;
this.b = b;
}
}
void setup()
{
size(300, 300);
//initialize matrices with A = 1 and B = 0
grid = new Point[width][];
next = new Point[width][];
for (int x = 0; x < width; x++)
{
grid[x] = new Point[height];
next[x] = new Point[height];
for (int y = 0; y < height; y++)
{
grid[x][y] = new Point(1.0, 0.0);
next[x][y] = new Point(1.0, 0.0);
}
}
int a = (int) random(1, 20); //seed some areas with B = 1.0
for (int amount = 0; amount < a; amount++)
{
int siz = 2;
int x = (int)random(width);
int y = (int)random(height);
for (int i = x - siz/2; i < x + siz/2; i++)
{
for (int j = y - siz/2; j < y + siz/2; j++)
{
int i2 = i;
int j2 = j;
if (i < 0)
{
i2 = width + i;
} else if (i >= width)
{
i2 = i - width;
}
if (j < 0)
{
j2 = height + j;
} else if (j >= height)
{
j2 = j - height;
}
grid[i2][j2].b = 1.0;
}
}
}
initializeThreads();
}
/**
* Divide canvas between threads
*/
void initializeThreads()
{
ArrayList<Reaction> reactions = new ArrayList<Reaction>();
for (int x1 = 0; x1 < width; x1 += threadSizeX)
{
for (int y1 = 0; y1 < height; y1 += threadSizeY)
{
int x2 = x1 + threadSizeX;
int y2 = y1 + threadSizeY;
if (x2 > width - 1)
{
x2 = width - 1;
}
if (y2 > height - 1)
{
y2 = height - 1;
}
Reaction r = new Reaction(x1, y1, x2, y2);
reactions.add(r);
}
}
b = new Buffer(reactions);
t = new Thread(b);
t.start();
}
void draw()
{
if (buffer.size() == 0)
{
return;
}
PImage i = buffer.get(0);
image(i, 0, 0);
buffer.remove(i);
//println(frameRate);
println(buffer.size());
//saveFrame("output/######.png");
}
/**
* Faster than calling built in pow() function
*/
float pow5(float x)
{
return x * x * x * x * x;
}
class Buffer implements Runnable
{
ArrayList<Reaction> reactions;
boolean calculating = false;
public Buffer(ArrayList<Reaction> reactions)
{
this.reactions = reactions;
}
public void run()
{
while (true)
{
if (buffer.size() < 1000)
{
calculate();
if (isDone())
{
buffer.add(currentImage);
Point[][] temp;
temp = grid;
grid = next;
next = temp;
calculating = false;
}
}
}
}
boolean isDone()
{
for (Reaction r : reactions)
{
if (!r.isDone())
{
return false;
}
}
return true;
}
void calculate()
{
if (calculating)
{
return;
}
currentImage = new PImage(width, height);
for (Reaction r : reactions)
{
r.calculate();
}
calculating = true;
}
}
class Reaction
{
int x1;
int x2;
int y1;
int y2;
Thread t;
public Reaction(int x1, int y1, int x2, int y2)
{
this.x1 = x1;
this.x2 = x2;
this.y1 = y1;
this.y2 = y2;
}
public void calculate()
{
Calculator c = new Calculator(x1, y1, x2, y2);
t = new Thread(c);
t.start();
}
public boolean isDone()
{
if (t.getState() == Thread.State.TERMINATED)
{
return true;
} else
{
return false;
}
}
}
class Calculator implements Runnable
{
int x1;
int x2;
int y1;
int y2;
//weights for calculating the Laplacian for A and B
final float[][] laplacianWeights = {{0.05, 0.2, 0.05},
{0.2, -1, 0.2},
{0.05, 0.2, 0.05}};
/**
* x1, x2, y1, y2 delimit a rectangle. The object will only work within it
*/
public Calculator(int x1, int y1, int x2, int y2)
{
this.x1 = x1;
this.x2 = x2;
this.y1 = y1;
this.y2 = y2;
//println("x1: " + x1 + ", y1: " + y1 + ", x2: " + x2 + ", y2: " + y2);
}
#Override
public void run()
{
reaction();
show();
}
public void reaction()
{
for (int x = x1; x <= x2; x++)
{
for (int y = y1; y <= y2; y++)
{
float a = grid[x][y].a;
float b = grid[x][y].b;
float[] l = laplaceAB(x, y);
float a2 = reactionDiffusionA(a, b, l[0]);
float b2 = reactionDiffusionB(a, b, l[1]);
next[x][y].a = a2;
next[x][y].b = b2;
}
}
}
float reactionDiffusionA(float a, float b, float lA)
{
return a + ((dA * lA) - (a * b * b) + (feed * (1 - a))) * dt;
}
float reactionDiffusionB(float a, float b, float lB)
{
return b + ((dB * lB) + (a * b * b) - ((kill + feed) * b)) * dt;
}
/**
* Calculates Laplacian for both A and B at same time, to reduce amount of loops executed
*/
float[] laplaceAB(int x, int y)
{
float[] l = {0.0, 0.0};
for (int i = x - 1; i < x + 2; i++)
{
for (int j = y - 1; j < y + 2; j++)
{
int i2 = i;
int j2 = j;
if (i < 0)
{
i2 = width + i;
} else if (i >= width)
{
i2 = i - width;
}
if (j < 0)
{
j2 = height + j;
} else if (j >= height)
{
j2 = j - height;
}
int weightX = (i - x) + 1;
int weightY = (j - y) + 1;
l[0] += laplacianWeights[weightX][weightY] * grid[i2][j2].a;
l[1] += laplacianWeights[weightX][weightY] * grid[i2][j2].b;
}
}
return l;
}
public void show()
{
currentImage.loadPixels();
//renders the canvas using the pixel array
for (int x = 0; x < width; x++)
{
for (int y = 0; y < height; y++)
{
float a = next[x][y].a;
float b = next[x][y].b;
int pix = x + y * width;
float diff = (a - b);
color c;
if (redShader) //aply red shading
{
float thresh = 0.5;
if (diff < thresh)
{
float diff2 = map(pow5(diff), 0, pow5(thresh), 0, 1);
c = lerpColor(black, red, diff2);
} else
{
float diff2 = map(1 - pow5(-diff + 1), 1 - pow5(-thresh + 1), 1, 0, 1);
c = lerpColor(red, white, diff2);
}
} else //apply gray scale shading
{
c = color(diff * 255, diff * 255, diff * 255);
}
currentImage.pixels[pix] = c;
}
}
currentImage.updatePixels();
}
}
A programmer had a problem. He thought “I know, I’ll solve it with threads!”. has Now problems. two he
Processing uses a single rendering thread.
It does this for good reason, and most other renderers do the same thing. In fact, I don't know of any multi-threaded renderers.
You should only change what's on the screen from Processing's main rendering thread. In other words, you should only change stuff from Processing's functions, not your own thread. This is what's causing the flickering you're seeing. You're changing stuff as it's being drawn to the screen, which is a horrible idea. (And it's why Processing uses a single rendering thread in the first place.)
You could try to use your multiple threads to do the processing, not the rendering. But I highly doubt that's going to be worth it, and like you saw, it might even make things worse.
If you want to speed up your sketch, you might also consider doing the processing ahead of time instead of in real time. Do all your calculations at the beginning of the sketch, and then just reference the results of the calculations when it's time to draw the frame. Or you could draw to a PImage ahead of time, and then just draw those.

Errors with repeated FFTW calls

I'm having a strange issue that I can't resolve. I made this as a simple example that demonstrates the problem. I have a sine wave defined between [0, 2*pi]. I take the Fourier transform using FFTW. Then I have a for loop where I repeatedly take the inverse Fourier transform. In each iteration, I take the average of my solution and print the results. I expect that the average stays the same with each iteration because there is no change to solution, y. However, when I pick N = 256 and other even values of N, I note that the average grows as if there are numerical errors. However, if I choose, say, N = 255 or N = 257, this is not the case and I get what is expect (avg = 0.0 for each iteration).
Code:
#include <stdio.h>
#include <stdlib.h>
#include <fftw3.h>
#include <math.h>
int main(void)
{
int N = 256;
double dx = 2.0 * M_PI / (double)N, dt = 1.0e-3;
double *x, *y;
x = (double *) malloc (sizeof (double) * N);
y = (double *) malloc (sizeof (double) * N);
// initial conditions
for (int i = 0; i < N; i++) {
x[i] = (double)i * dx;
y[i] = sin(x[i]);
}
fftw_complex yhat[N/2 + 1];
fftw_plan fftwplan, fftwplan2;
// forward plan
fftwplan = fftw_plan_dft_r2c_1d(N, y, yhat, FFTW_ESTIMATE);
fftw_execute(fftwplan);
// set N/2th mode to zero if N is even
if (N % 2 < 1.0e-13) {
yhat[N/2][0] = 0.0;
yhat[N/2][1] = 0.0;
}
// backward plan
fftwplan2 = fftw_plan_dft_c2r_1d(N, yhat, y, FFTW_ESTIMATE);
for (int i = 0; i < 50; i++) {
// yhat to y
fftw_execute(fftwplan2);
// rescale
for (int j = 0; j < N; j++) {
y[j] = y[j] / (double)N;
}
double avg = 0.0;
for (int j = 0; j < N; j++) {
avg += y[j];
}
printf("%.15f\n", avg/N);
}
fftw_destroy_plan(fftwplan);
fftw_destroy_plan(fftwplan2);
void fftw_cleanup(void);
free(x);
free(y);
return 0;
}
Output for N = 256:
0.000000000000000
0.000000000000000
0.000000000000000
-0.000000000000000
0.000000000000000
0.000000000000022
-0.000000000000007
-0.000000000000039
0.000000000000161
-0.000000000000314
0.000000000000369
0.000000000004775
-0.000000000007390
-0.000000000079126
-0.000000000009457
-0.000000000462023
0.000000000900855
-0.000000000196451
0.000000000931323
-0.000000009895302
0.000000039348379
0.000000133179128
0.000000260770321
-0.000003233551979
0.000008285045624
-0.000016331672668
0.000067450106144
-0.000166893005371
0.001059055328369
-0.002521514892578
0.005493164062500
-0.029907226562500
0.093383789062500
-0.339111328125000
1.208251953125000
-3.937500000000000
13.654296875000000
-43.812500000000000
161.109375000000000
-479.250000000000000
1785.500000000000000
-5369.000000000000000
19376.000000000000000
-66372.000000000000000
221104.000000000000000
-753792.000000000000000
2387712.000000000000000
-8603776.000000000000000
29706240.000000000000000
-96833536.000000000000000
Any ideas?
libfftw has the odious habit of modifying its inputs. Back up yhat if you want to do repeated inverse transforms.
OTOH, it's perverse, but why are you repeating the same operation if you don't expect it give different results? (Despite this being the case)
As indicated in comments: "if you want to keep the input data unchanged, use the FFTW_PRESERVE_INPUT flag. Per http://www.fftw.org/doc/Planner-Flags.html"
For example:
// backward plan
fftwplan2 = fftw_plan_dft_c2r_1d(N, yhat, y, FFTW_ESTIMATE | FFTW_PRESERVE_INPUT);

j2me drawChar abnormal character spacing?

I write some code to draw a text on a j2me canvas without using drawString.
For some reasons I can't use drawString method.
So, when I run my program, I deal with abnormal character spacing.
Please help me to solve the problem. This is my code:
public void paint(Graphics g) {
...
String str = ... ;
int x0 = 10;
int y0 = getHeight() - 50;
Font f = g.getFont();
int charWidth = 0;
for (int i = 0; i < str.length(); i++) {
char ch = str.charAt(i);
charWidth = f.charWidth(ch);
x0 += charWidth;
g.drawChar(ch, x0, y0, 0);
}
...
}
instead use this:
public void paint(Graphics g) {
...
String str = ... ;
int x0 = 10;
int y0 = getHeight() - 50;
Font f = g.getFont();
int lastWidth = 0;
for (int i = 0; i < str.length(); i++) {
char ch = str.charAt(i);
g.drawChar(ch, x0 + lastWidth, y0, 0);
lastWidth += f.charWidth(ch);
}
...
}
In your drawChar method,you use 0(it is equal to Graphics.TOP|Graphics.LEFT) so you would increase "lastWidth" after draw current char,or use another anchor(for example Graphics.TOP|Graphics.RIGHT) for drawChar.

Resources