Bandits with Rcpp - rcpp

This is a second attempt at correcting my earlier version that lives here. I am translating the epsilon-greedy algorithm for multiarmed bandits.
A summary of the code is as follows. Basically, we have a set of arms, each of which pays out a reward with a pre-defined probability and our job is to show that by drawing at random from the arms while drawing the arm with the best reward intermittently eventually allows us to converge on to the best arm.
The original algorithm can be found here.
#define ARMA_64BIT_WORD
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp11)]]
struct EpsilonGreedy {
double epsilon;
arma::uvec counts;
arma::vec values;
};
int index_max(arma::uvec& v) {
return v.index_max();
}
int index_rand(arma::vec& v) {
int s = arma::randi<int>(arma::distr_param(0, v.n_elem-1));
return s;
}
int select_arm(EpsilonGreedy& algo) {
if (R::runif(0, 1) > algo.epsilon) {
return index_max(algo.values);
} else {
return index_rand(algo.values);
}
}
void update(EpsilonGreedy& algo, int chosen_arm, double reward) {
algo.counts[chosen_arm] += 1;
int n = algo.counts[chosen_arm];
double value = algo.values[chosen_arm];
algo.values[chosen_arm] = ((n-1)/n) * value + (1/n) * reward;
}
struct BernoulliArm {
double p;
};
int draw(BernoulliArm arm) {
if (R::runif(0, 1) > arm.p) {
return 0;
} else {
return 1;
}
}
// [[Rcpp::export]]
DataFrame test_algorithm(double epsilon, std::vector<double>& means, int
n_sims, int horizon) {
std::vector<BernoulliArm> arms;
for (auto& mu : means) {
BernoulliArm b = {mu};
arms.push_back(b);
}
std::vector<int> sim_num, time, chosen_arms;
std::vector<double> rewards;
for (int sim = 1; sim <= n_sims; ++sim) {
arma::uvec counts(means.size(), arma::fill::zeros);
arma::vec values(means.size(), arma::fill::zeros);
EpsilonGreedy algo = {epsilon, counts, values};
for (int t = 1; t <= horizon; ++t) {
int chosen_arm = select_arm(algo);
double reward = draw(arms[chosen_arm]);
update(algo, chosen_arm, reward);
sim_num.push_back(sim);
time.push_back(t);
chosen_arms.push_back(chosen_arm);
rewards.push_back(reward);
}
}
DataFrame results = DataFrame::create(Named("sim_num") = sim_num,
Named("time") = time,
Named("chosen_arm") = chosen_arms,
Named("reward") = rewards);
return results;
}
/***R
library(tidyverse)
means <- c(0.1, 0.1, 0.1, 0.1, 0.9)
total_results <- data.frame(sim_num = integer(), time = integer(),
chosen_arm = integer(),
reward = numeric(), epsilon = numeric())
for (epsilon in seq(0.1, 0.5, length.out = 5)) {
cat("Starting with ", epsilon, " at: ", format(Sys.time(), "%H:%M"), "\n")
results <- test_algorithm(epsilon, means, 5000, 250)
results$epsilon <- epsilon
total_results <- rbind(total_results, results)
}
avg_reward <- total_results %>% group_by(time, epsilon) %>%
summarize(avg_reward = mean(reward))
dev.new()
ggplot(avg_reward) +
geom_line(aes(x = time, y = avg_reward,
group = epsilon, color = epsilon), size = 1) +
scale_color_gradient(low = "grey", high = "black") +
labs(x = "Time",
y = "Average reward",
title = "Performance of the Epsilon-Greedy Algorithm",
color = "epsilon\n")
The above code returns the following plot:
This plot is just wrong! However, I am unable to zero-in on a logical flaw in the code.... Where am I going off-track?
Edit:
As per the comments, the following is the expected plot:

In this piece of code:
int n = algo.counts[chosen_arm];
//...
algo.values[chosen_arm] = ((n-1)/n) * value + (1/n) * reward;
n is declared as an integer, so (n-1)/n and 1/n will be integer expressions that both evaluate to 0. You can fix this by changing the 1 to 1.0 which is a floating-point constant, to force the expressions to be evaluated as double:
algo.values[chosen_arm] = ((n-1.0)/n) * value + (1.0/n) * reward;

Related

Palindrome operations on a string

You are given a string S initially and some Q queries. For each query you will have 2 integers L and R. For each query, you have to perform the following operations:
Arrange the letters from L to R inclusive to make a Palindrome. If you can form many such palindromes, then take the one that is lexicographically minimum. Ignore the query if no palindrome is possible on rearranging the letters.
You have to find the final string after all the queries.
Constraints:
1 <= length(S) <= 10^5
1 <= Q <= 10^5
1<= L <= R <= length(S)
Sample Input :
4
mmcs 1
1 3
Sample Output:
mcms
Explanation:
The initial string is mmcs, there is 1 query which asks to make a palindrome from 1 3, so the palindrome will be mcm. Therefore the string will mcms.
If each query takes O(N) time, the overall time complexity would be O(NQ) which will give TLE. So each query should take around O(logn) time. But I am not able to think of anything which will solve this question. I think since we only need to find the final string rather than what every query result into, I guess there must be some other way to approach this question. Can anybody help me?
We can solve this problem using Lazy Segment Tree with range updates.
We will make Segment Tree for each character , so there will be a total of 26 segment trees.
In each node of segment tree we will store the frequency of that character over the range of that node and also keep a track of whether to update that range or not.
So for each query do the following ->
We are given a range L to R
So first we will find frequency of each character over L to R (this will take O(26*log(n)) time )
Now from above frequencies count number of characters who have odd frequency.
If count > 1 , we cannot form palindrome, otherwise we can form palindrome
If we can form palindrome then,first we will assign 0 over L to R for each character in Segment Tree and then we will start from smallest character and assign it over (L,L+count/2-1) and (R-count/2+1,R) and then update L += count/2 and R -= count/2
So the time complexity of each query is O(26log(n)) and for building Segment Tree time complexity is O(nlog(n)) so overall time complexity is O(nlogn + q26logn).
For a better understanding please see my code,
#include <bits/stdc++.h>
using namespace std;
#define enl '\n'
#define int long long
#define sz(s) (int)s.size()
#define all(v) (v).begin(),(v).end()
#define input(vec) for (auto &el : vec) cin >> el;
#define print(vec) for (auto &el : vec) cout << el << " "; cout << "\n";
const int mod = 1e9+7;
const int inf = 1e18;
struct SegTree {
vector<pair<bool,int>>lazy;
vector<int>cnt;
SegTree () {}
SegTree(int n) {
lazy.assign(4*n,{false,0});
cnt.assign(4*n,0);
}
int query(int l,int r,int st,int en,int node) {
int mid = (st+en)/2;
if(st!=en and lazy[node].first) {
if(lazy[node].second) {
cnt[2*node] = mid - st + 1;
cnt[2*node+1] = en - mid;
}
else {
cnt[2*node] = cnt[2*node+1] = 0;
}
lazy[2*node] = lazy[2*node+1] = lazy[node];
lazy[node] = {false,0};
}
if(st>r or en<l) return 0;
if(st>=l and en<=r) return cnt[node];
return query(l,r,st,mid,2*node) + query(l,r,mid+1,en,2*node+1);
}
void update(int l,int r,int val,int st,int en,int node) {
int mid = (st+en)/2;
if(st!=en and lazy[node].first) {
if(lazy[node].second) {
cnt[2*node] = mid - st + 1;
cnt[2*node+1] = en - mid;
}
else {
cnt[2*node] = cnt[2*node+1] = 0;
}
lazy[2*node] = lazy[2*node+1] = lazy[node];
lazy[node] = {false,0};
}
if(st>r or en<l) return;
if(st>=l and en<=r) {
cnt[node] = (en - st + 1)*val;
lazy[node] = {true,val};
return;
}
update(l,r,val,st,mid,2*node);
update(l,r,val,mid+1,en,2*node+1);
cnt[node] = cnt[2*node] + cnt[2*node+1];
}
};
void solve() {
int n;
cin>>n;
string s;
cin>>s;
vector<SegTree>tr(26,SegTree(n));
for(int i=0;i<n;i++) {
tr[s[i]-'a'].update(i,i,1,0,n-1,1);
}
int q;
cin>>q;
while(q--) {
int l,r;
cin>>l>>r;
vector<int>cnt(26);
for(int i=0;i<26;i++) {
cnt[i] = tr[i].query(l,r,0,n-1,1);
}
int odd = 0;
for(auto u:cnt) odd += u%2;
if(odd>1) continue;
for(int i=0;i<26;i++) {
tr[i].update(l,r,0,0,n-1,1);
}
int x = l,y = r;
for(int i=0;i<26;i++) {
if(cnt[i]/2) {
tr[i].update(x,x+cnt[i]/2-1,1,0,n-1,1);
tr[i].update(y-cnt[i]/2+1,y,1,0,n-1,1);
x += cnt[i]/2;
y -= cnt[i]/2;
cnt[i]%=2;
}
}
for(int i=0;i<26;i++) {
if(cnt[i]) {
tr[i].update(x,x,1,0,n-1,1);
}
}
}
string ans(n,'a');
for(int i=0;i<26;i++) {
for(int j=0;j<n;j++) {
if(tr[i].query(j,j,0,n-1,1)) {
ans[j] = (char)('a'+i);
}
}
}
cout<<ans<<enl;
}
signed main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);cout.tie(nullptr);
int testcases = 1;
cin>>testcases;
while(testcases--) solve();
return 0;
}

checking edge for circular movement

I want to make my dot program turn around when they reach edge
so basically i just simply calculate
x = width/2+cos(a)*20;
y = height/2+sin(a)*20;
it's make circular movement. so i want to make this turn around by checking the edge. i also already make sure that y reach the if condition using println command
class particles {
float x, y, a, r, cosx, siny;
particles() {
x = width/2; y = height/2; a = 0; r = 20;
}
void display() {
ellipse(x, y, 20, 20);
}
void explode() {
a = a + 0.1;
cosx = cos(a)*r;
siny = sin(a)*r;
x = x + cosx;
y = y + siny;
}
void edge() {
if (x>width||x<0) cosx*=-1;
if (y>height||y<0) siny*=-1;
}
}
//setup() and draw() function
particles part;
void setup(){
size (600,400);
part = new particles();
}
void draw(){
background(40);
part.display();
part.explode();
part.edge();
}
they just ignore the if condition
There is no problem with your check, the problem is with the fact that presumably the very next time through draw() you ignore what you did in response to the check by resetting the values of cosx and siny.
I recommend creating two new variables, dx and dy ("d" for "direction") which will always be either +1 and -1 and change these variables in response to your edge check. Here is a minimal example:
float a,x,y,cosx,siny;
float dx,dy;
void setup(){
size(400,400);
background(0);
stroke(255);
noFill();
x = width/2;
y = height/2;
dx = 1;
dy = 1;
a = 0;
}
void draw(){
ellipse(x,y,10,10);
cosx = dx*20*cos(a);
siny = dy*20*sin(a);
a += 0.1;
x += cosx;
y += siny;
if (x > width || x < 0)
dx = -1*dx;
if (y > height || y < 0)
dy = -1*dy;
}
When you run this code you will observe the circles bouncing off the edges:

RcppAramadillo Cube::operator() : index out of bounds

I have been fiddling with the following C++ code for integration with R code that I have written (too much to include here), but keep getting an error that the Cube::operator() index is out of bounds and I am unsure as to why this is occurring. My suspicion is that the 3D array is not being filled correctly as described in
making 3d array with arma::cube in Rcpp shows cube error
but I am uncertain how to properly solve the issue.
Below is my full C++ code:
// [[Rcpp::depends(RcppArmadillo)]]
#define ARMA_DONT_PRINT_OPENMP_WARNING
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <set>
using namespace Rcpp;
int sample_one(int n) {
return n * unif_rand();
}
int sample_n_distinct(const IntegerVector& x,
int k,
const int * pop_ptr) {
IntegerVector ind_index = RcppArmadillo::sample(x, k, false);
std::set<int> distinct_container;
for (int i = 0; i < k; i++) {
distinct_container.insert(pop_ptr[ind_index[i]]);
}
return distinct_container.size();
}
// [[Rcpp::export]]
arma::Cube<int> fillCube(const arma::Cube<int>& pop,
const IntegerVector& specs,
int perms,
int K) {
int num_specs = specs.size();
arma::Cube<int> res(perms, num_specs, K);
IntegerVector specs_C = specs - 1;
const int * pop_ptr;
int i, j, k;
for (i = 0; i < K; i++) {
for (k = 0; k < num_specs; k++) {
for (j = 0; j < perms; j++) {
pop_ptr = &(pop(0, sample_one(perms), sample_one(K)));
res(j, k, i) = sample_n_distinct(specs_C, k + 1, pop_ptr);
}
}
}
return res;
}
Does someone have an idea as to what may be producing the said error?
Below is the R code with a call to the C++ function (including a commented-out triply-nested 'for' loop that the C++ code reproduces).
## Set up container(s) to hold the identity of each individual from each permutation ##
num.specs <- ceiling(N / K)
## Create an ID for each haplotype ##
haps <- 1:Hstar
## Assign individuals (N) to each subpopulation (K) ##
specs <- 1:num.specs
## Generate permutations, assume each permutation has N individuals, and sample those individuals' haplotypes from the probabilities ##
gen.perms <- function() {
sample(haps, size = num.specs, replace = TRUE, prob = probs)
}
pop <- array(dim = c(perms, num.specs, K))
for (i in 1:K) {
pop[,, i] <- replicate(perms, gen.perms())
}
## Make a matrix to hold individuals from each permutation ##
# HAC.mat <- array(dim = c(perms, num.specs, K))
## Perform haplotype accumulation ##
# for (k in specs) {
# for (j in 1:perms) {
# for (i in 1:K) {
# select.perm <- sample(1:nrow(pop), size = 1, replace = TRUE) # randomly sample a permutation
# ind.index <- sample(specs, size = k, replace = FALSE) # randomly sample individuals
# select.subpop <- sample(i, size = 1, replace = TRUE) # randomly sample a subpopulation
# hap.plot <- pop[select.perm, ind.index, select.subpop] # extract data
# HAC.mat[j, k, i] <- length(unique(hap.plot)) # how many haplotypes are recovered
# }
# }
# }
HAC.mat <- fillCube(pop, specs, perms, K)
This is an out-of-bounds error. The gist of problem is the call
pop_ptr = &(pop(0, sample_one(perms), sample_one(K)));
since
sample_one(perms)
is being placed as an access index where the max length is num_specs. This is seen by how res is defined:
arma::Cube<int> res(perms, num_specs, K);
Thus, moving out perms out of num_specs place should resolve the issue.
// [[Rcpp::export]]
arma::Cube<int> fillCube(const arma::Cube<int>& pop,
const IntegerVector& specs,
int perms,
int K) {
int num_specs = specs.size();
arma::Cube<int> res(perms, num_specs, K);
IntegerVector specs_C = specs - 1;
const int * pop_ptr;
int i, j, k;
for (i = 0; i < K; i++) {
for (k = 0; k < num_specs; k++) {
for (j = 0; j < perms; j++) {
// swapped location
pop_ptr = &(pop(sample_one(perms), 0, sample_one(K)));
// should the middle index be 0?
res(j, k, i) = sample_n_distinct(specs_C, k + 1, pop_ptr);
}
}
}
return res;
}

Reaction-Diffusion algorithm on Processing + Multithreading

I have made an implementation of the Reaction-Diffusion algorithm on Processing 3.1.1, following a video tutorial. I have made some adaptations on my code, like implementing it on a torus space, instead of a bounded box, like the video.
However, I ran into this annoying issue, that the code runs really slow, proportional to the canvas size (larger, slower). With that, I tried optmizing the code, according to my (limited) knowledge. The main thing I did was to reduce the number of loops running.
Even then, my code still ran quite slow.
Since I have noticed that with a canvas of 50 x 50 in size, the algorithm ran at a good speed, I tried making it multithreaded, in such a way that the canvas would be divided between the threads, and each thread would run the algorithm for a small region of the canvas.
All threads read from the current state of the canvas, and all write to the future state of the canvas. The canvas is then updated using Processing's pixel array.
However, even with multithreading, I didn't see any performance improvement. By the contrary, I saw it getting worse. Now sometimes the canvas flicker between a rendered state and completely white, and in some cases, it doesn't even render.
I'm quite sure that I'm doing something wrong, or I may be taking the wrong approach to optimizing this algorithm. And now, I'm asking for help to understand what I'm doing wrong, and how I could fix or improve my code.
Edit: Implementing ahead of time calculation and rendering using a buffer of PImage objects has removed flickering, but the calculation step on the background doesn't run fast enough to fill the buffer.
My Processing Sketch is below, and thanks in advance.
ArrayList<PImage> buffer = new ArrayList<PImage>();
Thread t;
Buffer b;
PImage currentImage;
Point[][] grid; //current state
Point[][] next; //future state
//Reaction-Diffusion algorithm parameters
final float dA = 1.0;
final float dB = 0.5;
//default: f = 0.055; k = 0.062
//mitosis: f = 0.0367; k = 0.0649
float feed = 0.055;
float kill = 0.062;
float dt = 1.0;
//multi-threading parameters to divide canvas
int threadSizeX = 50;
int threadSizeY = 50;
//red shading colors
color red = color(255, 0, 0);
color white = color(255, 255, 255);
color black = color(0, 0, 0);
//if redShader is false, rendering will use a simple grayscale mode
boolean redShader = true;
//simple class to hold chemicals A and B amounts
class Point
{
float a;
float b;
Point(float a, float b)
{
this.a = a;
this.b = b;
}
}
void setup()
{
size(300, 300);
//initialize matrices with A = 1 and B = 0
grid = new Point[width][];
next = new Point[width][];
for (int x = 0; x < width; x++)
{
grid[x] = new Point[height];
next[x] = new Point[height];
for (int y = 0; y < height; y++)
{
grid[x][y] = new Point(1.0, 0.0);
next[x][y] = new Point(1.0, 0.0);
}
}
int a = (int) random(1, 20); //seed some areas with B = 1.0
for (int amount = 0; amount < a; amount++)
{
int siz = 2;
int x = (int)random(width);
int y = (int)random(height);
for (int i = x - siz/2; i < x + siz/2; i++)
{
for (int j = y - siz/2; j < y + siz/2; j++)
{
int i2 = i;
int j2 = j;
if (i < 0)
{
i2 = width + i;
} else if (i >= width)
{
i2 = i - width;
}
if (j < 0)
{
j2 = height + j;
} else if (j >= height)
{
j2 = j - height;
}
grid[i2][j2].b = 1.0;
}
}
}
initializeThreads();
}
/**
* Divide canvas between threads
*/
void initializeThreads()
{
ArrayList<Reaction> reactions = new ArrayList<Reaction>();
for (int x1 = 0; x1 < width; x1 += threadSizeX)
{
for (int y1 = 0; y1 < height; y1 += threadSizeY)
{
int x2 = x1 + threadSizeX;
int y2 = y1 + threadSizeY;
if (x2 > width - 1)
{
x2 = width - 1;
}
if (y2 > height - 1)
{
y2 = height - 1;
}
Reaction r = new Reaction(x1, y1, x2, y2);
reactions.add(r);
}
}
b = new Buffer(reactions);
t = new Thread(b);
t.start();
}
void draw()
{
if (buffer.size() == 0)
{
return;
}
PImage i = buffer.get(0);
image(i, 0, 0);
buffer.remove(i);
//println(frameRate);
println(buffer.size());
//saveFrame("output/######.png");
}
/**
* Faster than calling built in pow() function
*/
float pow5(float x)
{
return x * x * x * x * x;
}
class Buffer implements Runnable
{
ArrayList<Reaction> reactions;
boolean calculating = false;
public Buffer(ArrayList<Reaction> reactions)
{
this.reactions = reactions;
}
public void run()
{
while (true)
{
if (buffer.size() < 1000)
{
calculate();
if (isDone())
{
buffer.add(currentImage);
Point[][] temp;
temp = grid;
grid = next;
next = temp;
calculating = false;
}
}
}
}
boolean isDone()
{
for (Reaction r : reactions)
{
if (!r.isDone())
{
return false;
}
}
return true;
}
void calculate()
{
if (calculating)
{
return;
}
currentImage = new PImage(width, height);
for (Reaction r : reactions)
{
r.calculate();
}
calculating = true;
}
}
class Reaction
{
int x1;
int x2;
int y1;
int y2;
Thread t;
public Reaction(int x1, int y1, int x2, int y2)
{
this.x1 = x1;
this.x2 = x2;
this.y1 = y1;
this.y2 = y2;
}
public void calculate()
{
Calculator c = new Calculator(x1, y1, x2, y2);
t = new Thread(c);
t.start();
}
public boolean isDone()
{
if (t.getState() == Thread.State.TERMINATED)
{
return true;
} else
{
return false;
}
}
}
class Calculator implements Runnable
{
int x1;
int x2;
int y1;
int y2;
//weights for calculating the Laplacian for A and B
final float[][] laplacianWeights = {{0.05, 0.2, 0.05},
{0.2, -1, 0.2},
{0.05, 0.2, 0.05}};
/**
* x1, x2, y1, y2 delimit a rectangle. The object will only work within it
*/
public Calculator(int x1, int y1, int x2, int y2)
{
this.x1 = x1;
this.x2 = x2;
this.y1 = y1;
this.y2 = y2;
//println("x1: " + x1 + ", y1: " + y1 + ", x2: " + x2 + ", y2: " + y2);
}
#Override
public void run()
{
reaction();
show();
}
public void reaction()
{
for (int x = x1; x <= x2; x++)
{
for (int y = y1; y <= y2; y++)
{
float a = grid[x][y].a;
float b = grid[x][y].b;
float[] l = laplaceAB(x, y);
float a2 = reactionDiffusionA(a, b, l[0]);
float b2 = reactionDiffusionB(a, b, l[1]);
next[x][y].a = a2;
next[x][y].b = b2;
}
}
}
float reactionDiffusionA(float a, float b, float lA)
{
return a + ((dA * lA) - (a * b * b) + (feed * (1 - a))) * dt;
}
float reactionDiffusionB(float a, float b, float lB)
{
return b + ((dB * lB) + (a * b * b) - ((kill + feed) * b)) * dt;
}
/**
* Calculates Laplacian for both A and B at same time, to reduce amount of loops executed
*/
float[] laplaceAB(int x, int y)
{
float[] l = {0.0, 0.0};
for (int i = x - 1; i < x + 2; i++)
{
for (int j = y - 1; j < y + 2; j++)
{
int i2 = i;
int j2 = j;
if (i < 0)
{
i2 = width + i;
} else if (i >= width)
{
i2 = i - width;
}
if (j < 0)
{
j2 = height + j;
} else if (j >= height)
{
j2 = j - height;
}
int weightX = (i - x) + 1;
int weightY = (j - y) + 1;
l[0] += laplacianWeights[weightX][weightY] * grid[i2][j2].a;
l[1] += laplacianWeights[weightX][weightY] * grid[i2][j2].b;
}
}
return l;
}
public void show()
{
currentImage.loadPixels();
//renders the canvas using the pixel array
for (int x = 0; x < width; x++)
{
for (int y = 0; y < height; y++)
{
float a = next[x][y].a;
float b = next[x][y].b;
int pix = x + y * width;
float diff = (a - b);
color c;
if (redShader) //aply red shading
{
float thresh = 0.5;
if (diff < thresh)
{
float diff2 = map(pow5(diff), 0, pow5(thresh), 0, 1);
c = lerpColor(black, red, diff2);
} else
{
float diff2 = map(1 - pow5(-diff + 1), 1 - pow5(-thresh + 1), 1, 0, 1);
c = lerpColor(red, white, diff2);
}
} else //apply gray scale shading
{
c = color(diff * 255, diff * 255, diff * 255);
}
currentImage.pixels[pix] = c;
}
}
currentImage.updatePixels();
}
}
A programmer had a problem. He thought “I know, I’ll solve it with threads!”. has Now problems. two he
Processing uses a single rendering thread.
It does this for good reason, and most other renderers do the same thing. In fact, I don't know of any multi-threaded renderers.
You should only change what's on the screen from Processing's main rendering thread. In other words, you should only change stuff from Processing's functions, not your own thread. This is what's causing the flickering you're seeing. You're changing stuff as it's being drawn to the screen, which is a horrible idea. (And it's why Processing uses a single rendering thread in the first place.)
You could try to use your multiple threads to do the processing, not the rendering. But I highly doubt that's going to be worth it, and like you saw, it might even make things worse.
If you want to speed up your sketch, you might also consider doing the processing ahead of time instead of in real time. Do all your calculations at the beginning of the sketch, and then just reference the results of the calculations when it's time to draw the frame. Or you could draw to a PImage ahead of time, and then just draw those.

Errors with repeated FFTW calls

I'm having a strange issue that I can't resolve. I made this as a simple example that demonstrates the problem. I have a sine wave defined between [0, 2*pi]. I take the Fourier transform using FFTW. Then I have a for loop where I repeatedly take the inverse Fourier transform. In each iteration, I take the average of my solution and print the results. I expect that the average stays the same with each iteration because there is no change to solution, y. However, when I pick N = 256 and other even values of N, I note that the average grows as if there are numerical errors. However, if I choose, say, N = 255 or N = 257, this is not the case and I get what is expect (avg = 0.0 for each iteration).
Code:
#include <stdio.h>
#include <stdlib.h>
#include <fftw3.h>
#include <math.h>
int main(void)
{
int N = 256;
double dx = 2.0 * M_PI / (double)N, dt = 1.0e-3;
double *x, *y;
x = (double *) malloc (sizeof (double) * N);
y = (double *) malloc (sizeof (double) * N);
// initial conditions
for (int i = 0; i < N; i++) {
x[i] = (double)i * dx;
y[i] = sin(x[i]);
}
fftw_complex yhat[N/2 + 1];
fftw_plan fftwplan, fftwplan2;
// forward plan
fftwplan = fftw_plan_dft_r2c_1d(N, y, yhat, FFTW_ESTIMATE);
fftw_execute(fftwplan);
// set N/2th mode to zero if N is even
if (N % 2 < 1.0e-13) {
yhat[N/2][0] = 0.0;
yhat[N/2][1] = 0.0;
}
// backward plan
fftwplan2 = fftw_plan_dft_c2r_1d(N, yhat, y, FFTW_ESTIMATE);
for (int i = 0; i < 50; i++) {
// yhat to y
fftw_execute(fftwplan2);
// rescale
for (int j = 0; j < N; j++) {
y[j] = y[j] / (double)N;
}
double avg = 0.0;
for (int j = 0; j < N; j++) {
avg += y[j];
}
printf("%.15f\n", avg/N);
}
fftw_destroy_plan(fftwplan);
fftw_destroy_plan(fftwplan2);
void fftw_cleanup(void);
free(x);
free(y);
return 0;
}
Output for N = 256:
0.000000000000000
0.000000000000000
0.000000000000000
-0.000000000000000
0.000000000000000
0.000000000000022
-0.000000000000007
-0.000000000000039
0.000000000000161
-0.000000000000314
0.000000000000369
0.000000000004775
-0.000000000007390
-0.000000000079126
-0.000000000009457
-0.000000000462023
0.000000000900855
-0.000000000196451
0.000000000931323
-0.000000009895302
0.000000039348379
0.000000133179128
0.000000260770321
-0.000003233551979
0.000008285045624
-0.000016331672668
0.000067450106144
-0.000166893005371
0.001059055328369
-0.002521514892578
0.005493164062500
-0.029907226562500
0.093383789062500
-0.339111328125000
1.208251953125000
-3.937500000000000
13.654296875000000
-43.812500000000000
161.109375000000000
-479.250000000000000
1785.500000000000000
-5369.000000000000000
19376.000000000000000
-66372.000000000000000
221104.000000000000000
-753792.000000000000000
2387712.000000000000000
-8603776.000000000000000
29706240.000000000000000
-96833536.000000000000000
Any ideas?
libfftw has the odious habit of modifying its inputs. Back up yhat if you want to do repeated inverse transforms.
OTOH, it's perverse, but why are you repeating the same operation if you don't expect it give different results? (Despite this being the case)
As indicated in comments: "if you want to keep the input data unchanged, use the FFTW_PRESERVE_INPUT flag. Per http://www.fftw.org/doc/Planner-Flags.html"
For example:
// backward plan
fftwplan2 = fftw_plan_dft_c2r_1d(N, yhat, y, FFTW_ESTIMATE | FFTW_PRESERVE_INPUT);

Resources