I have a few questions about setting up NLopt with non-linear constraints:
If the number of constraints is bigger than the number of variables, how can we set grad[] in the constraint function? Is there any (automatic) method to solve the problem without introducing Lagrange multipliers?
I know we can solve the problem using Lagrange multipliers. However, with Lagrange multipliers we have to obtain my_constraint_data manually, which makes it difficult to solve large-scale problems.
For example, suppose I want to minimize the function
f(x1,x2) = -x1^3 - 2*x2^2 + 10*x1 - 6 - 2*x2^3
subject to the following constraints:
Constraint 1: c1 = 10 - x1*x2 >= 0
Constraint 2: c2 = x1*x2^2 - 5 >= 0
Constraint 3: c3 = x2 - x1*x2^3 >= 0
From the NLopt tutorial, we understood that grad[0] = d(c1)/d(x1) and grad[1] = d(c2)/d(x2) give the gradient of the constraints. Then we set grad as follows:
double myconstraint(unsigned n, const double *x, double *grad, void *data) {
    my_constraint_data *d = (my_constraint_data *)data;
    if (grad) {
        grad[0] = -x[1];        // grad[0] = d(c1)/d(x1)
        grad[1] = 2*x[0]*x[1];  // grad[1] = d(c2)/d(x2)
        grad[2] = ???;          // grad[2] = d(c3)/d(?) but we only have 2 variables, (x1) and (x2)
    }
    return (10-x[0]*x[1], x[0]*x[1]*x[1]-5, x[1]-x[0]*x[1]*x[1]*x[1]);
}
The problem is that we do not know how to set grad[] (especially for c3) when the number of constraints is larger than the number of variables.
Of course we can solve the problem with a non-automatic method like the one below, using Lagrange multipliers (l1, l2, l3), where
grad[0] = -l1*(d(c1)/d(x1)) - l2*(d(c2)/d(x1)) - l3*(d(c3)/d(x1))
and
grad[1] = -l1*(d(c1)/d(x2)) - l2*(d(c2)/d(x2)) - l3*(d(c3)/d(x2))
double myconstraint(unsigned n, const double *x, double *grad, void *data) {
    my_constraint_data *d = (my_constraint_data *)data;
    // l1, l2, and l3 are the Lagrange multiplier parameters
    double l1 = d->l1, l2 = d->l2, l3 = d->l3;
    if (grad) {
        grad[0] = l1*x[1] - l2*x[1]*x[1] + l3*x[1]*x[1]*x[1];
        grad[1] = l1*x[0] - 2*l2*x[0]*x[1] - l3 + 3*l3*x[0]*x[1]*x[1];
    }
    return (10-x[0]*x[1], x[0]*x[1]*x[1]-5, x[1]-x[0]*x[1]*x[1]*x[1]);
}
Meanwhile, it is not easy to apply this non-automatic method to large-scale problems, because it is inefficient and complicated to program.
Is there any method for solving nonlinear simultaneous equations using NLopt? (When Lagrange multipliers are applied and the number of constraints is larger than the number of variables, a system of nonlinear simultaneous equations has to be solved.)
We would appreciate your answer; it would be really helpful to us. Thank you for your kindness.
I think you've got the constraints and the variables you are minimizing mixed up. If I understand your question correctly, you need to create three separate constraint functions for your three constraints. For example:
double c1(unsigned n, const double *x, double *grad, void *data)
{
/* Enforces the constraint
*
* 10 - x1*x2 >= 0
*
* Note we compute x1*x2 - 10 instead of 10 - x1*x2 since nlopt expects
* inequality constraints to be of the form h(x) <= 0. */
if (grad) {
grad[0] = x[1]; // grad[0] = d(c1)/dx1
grad[1] = x[0]; // grad[1] = d(c1)/dx2
}
return x[0]*x[1] - 10;
}
double c2(unsigned n, const double *x, double *grad, void *data)
{
/* Enforces the constraint
*
* x1*x2^2 - 5 >= 0
*
* Note we compute -x1*x2^2 + 5 instead of x1*x2^2 - 5 since nlopt expects
* inequality constraints to be of the form h(x) <= 0. */
if (grad) {
grad[0] = -x[1]*x[1];
grad[1] = -2*x[0]*x[1];
}
return -x[0]*x[1]*x[1] + 5;
}
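A third function for constraint 3 follows the same pattern. Here is only a sketch; the flipped sign and the derivative expressions are mine, so double-check them against your problem:
double c3(unsigned n, const double *x, double *grad, void *data)
{
    /* Enforces the constraint
     *
     * x2 - x1*x2^3 >= 0
     *
     * rewritten as x1*x2^3 - x2 <= 0 for nlopt. */
    if (grad) {
        grad[0] = x[1]*x[1]*x[1];        // d/dx1 of (x1*x2^3 - x2)
        grad[1] = 3*x[0]*x[1]*x[1] - 1;  // d/dx2 of (x1*x2^3 - x2)
    }
    return x[0]*x[1]*x[1]*x[1] - x[1];
}
Note that each constraint function only ever fills grad[0..n-1], one entry per optimization variable, no matter how many constraints you have.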
Then, in your main function you need to add each inequality constraint separately:
int main(int argc, char **argv)
{
// set up nlopt here
/* Add our constraints. */
nlopt_add_inequality_constraint(opt, c1, NULL, 1e-8);
nlopt_add_inequality_constraint(opt, c2, NULL, 1e-8);
// etc.
}
While searching for methods of determining whether a point is within a circumcircle, I came across this answer, which used an interesting method of constructing a quadrilateral between the point and triangle, and testing the flip condition to see if the new point makes a better Delaunay triangle, and therefore is within the original triangle's circumcircle.
The Delaunay flip condition deals with angles; however, the answer I found instead just calculates the cosines of the angles. Rather than checking that the sum of the angles is less than or equal to 180°, it takes the minimum of all the (negated) cosines and compares the two results to decide whether the point is in the circle.
Here is the code from that answer (copied here for convenience):
#include <array>
#include <algorithm>
struct pnt_t
{
int x, y;
pnt_t ccw90() const
{ return { -y, x }; }
double length() const
{ return std::hypot(x, y); }
pnt_t &operator -=(const pnt_t &rhs)
{
x -= rhs.x;
y -= rhs.y;
return *this;
}
friend pnt_t operator -(const pnt_t &lhs, const pnt_t &rhs)
{ return pnt_t(lhs) -= rhs; }
friend int operator *(const pnt_t &lhs, const pnt_t &rhs)
{ return lhs.x * rhs.x + lhs.y * rhs.y; }
};
int side(const pnt_t &a, const pnt_t &b, const pnt_t &p)
{
int cp = (b - a).ccw90() * (p - a);
return (cp > 0) - (cp < 0);
}
void make_ccw(std::array<pnt_t, 3> &t)
{
if (side(t[0], t[1], t[2]) < 0)
std::swap(t[0], t[1]);
}
double ncos(pnt_t a, const pnt_t &o, pnt_t b)
{
a -= o;
b -= o;
return -(a * b) / (a.length() * b.length());
}
bool inside_circle(std::array<pnt_t, 3> t, const pnt_t &p)
{
make_ccw(t);
std::array<int, 3> s =
{ side(t[0], t[1], p), side(t[1], t[2], p), side(t[2], t[0], p) };
unsigned outside = std::count(std::begin(s), std::end(s), -1);
if (outside != 1)
return outside == 0;
while (s[0] >= 0)
{
std::rotate(std::begin(t), std::begin(t) + 1, std::end(t));
std::rotate(std::begin(s), std::begin(s) + 1, std::end(s));
}
double
min_org = std::min({
ncos(t[0], t[1], t[2]), ncos(t[2], t[0], t[1]),
ncos(t[1], t[0], p), ncos(p, t[1], t[0]) }),
min_alt = std::min({
ncos(t[1], t[2], p), ncos(p, t[2], t[0]),
ncos(t[0], p, t[2]), ncos(t[2], p, t[1]) });
return min_org <= min_alt;
}
I'm having trouble understanding how this works.
How do "sum of angles" and "minimum of all cosines" relate? Cosines of certain angles are always negative, and I would think you could position your triangle to arbitrarily fall within that negative range. So how is this test valid?
Additionally, after collecting the two sets of "minimum cosines" (rather than the two sets of angle sums), the final test is to see which minimum is smallest. Again, I don't see how this relates to the original test of determining whether a triangle is valid by using the flip condition.
What am I missing?
The good news is that there is a well-known function for finding out if a Point D lies within the circumcircle of triangle ABC by computing the determinant shown below. If InCircle comes back greater than zero, then D lies within the circumcircle and a flip is required. The equation does assume that the triangle ABC is given in counterclockwise order (and, so, has a positive area).
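For reference, that determinant can be written directly in terms of the pnt_t struct from your code. This is only a sketch of the standard formula (the function name and layout are mine); it returns a positive value when p lies inside the circumcircle of the counterclockwise triangle a, b, c:
double in_circle(const pnt_t &a, const pnt_t &b, const pnt_t &c, const pnt_t &p)
{
    // Translate so that p sits at the origin, then expand the 3x3 determinant
    // | ax  ay  ax*ax + ay*ay |
    // | bx  by  bx*bx + by*by |
    // | cx  cy  cx*cx + cy*cy |
    double ax = a.x - p.x, ay = a.y - p.y;
    double bx = b.x - p.x, by = b.y - p.y;
    double cx = c.x - p.x, cy = c.y - p.y;
    return (ax*ax + ay*ay) * (bx*cy - cx*by)
         - (bx*bx + by*by) * (ax*cy - cx*ay)
         + (cx*cx + cy*cy) * (ax*by - bx*ay);
}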
I got this equation from the book Cheng, et al. "Delaunay Mesh Generation" (2013), but you should be able to find it in other places. An open-source Java implementation is available at https://github.com/gwlucastrig/Tinfour, but I'm sure you can find examples elsewhere, some of which may be better suited to your needs.
I have been experimenting with the RcppArrayFire package, mostly rewriting some cost functions from RcppArmadillo, and I can't seem to get past "no viable conversion from 'af::array' to 'float'". I have also been getting some backend errors; the example below seems free of those.
This cov/var example is deliberately written crudely, just to exercise all the relevant coding pieces from my actual cost function. As of now it is the only addition to a package generated by RcppArrayFire.package.skeleton().
#include "RcppArrayFire.h"
#include <Rcpp.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
float example_ols(const RcppArrayFire::typed_array<f32>& X_vect, const RcppArrayFire::typed_array<f32>& Y_vect){
int Len = X_vect.dims()[0];
int Len_Y = Y_vect.dims()[0];
while( Len_Y < Len){
Len --;
}
float mean_X = af::sum(X_vect)/Len;
float mean_Y = af::sum(Y_vect)/Len;
RcppArrayFire::typed_array<f32> temp(Len);
RcppArrayFire::typed_array<f32> temp_x(Len);
for( int f = 0; f < Len; f++){
temp(f) = (X_vect(f) - mean_X)*(Y_vect(f) - mean_Y);
temp_x(f) = af::pow(X_vect(f) -mean_X, 2);
}
return af::sum(temp)/af::sum(temp_x);
}
/*** R
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
The first thing to consider is the af::sum function, which comes in different forms: an af::sum(af::array) that returns an af::array in device memory, and a templated af::sum<T>(af::array) that returns a T in host memory. So the minimal change to your example would be using af::sum<float>:
#include "RcppArrayFire.h"
#include <Rcpp.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
float example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
int Len = X_vect.dims()[0];
int Len_Y = Y_vect.dims()[0];
while( Len_Y < Len){
Len --;
}
float mean_X = af::sum<float>(X_vect)/Len;
float mean_Y = af::sum<float>(Y_vect)/Len;
RcppArrayFire::typed_array<f32> temp(Len);
RcppArrayFire::typed_array<f32> temp_x(Len);
for( int f = 0; f < Len; f++){
temp(f) = (X_vect(f) - mean_X)*(Y_vect(f) - mean_Y);
temp_x(f) = af::pow(X_vect(f) -mean_X, 2);
}
return af::sum<float>(temp)/af::sum<float>(temp_x);
}
/*** R
set.seed(1)
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
However, there are more things one can improve. In no particular order:
You don't need to include Rcpp.h.
There is an af::mean function for computing the mean of an af::array.
In general RcppArrayFire::typed_array<T> is only needed for getting arrays from R into C++. Within C++ and for the way back you can use af::array.
Even when your device does not support double, you can still use double values on the host.
In order to get good performance, you should avoid for loops and use vectorized functions, just like in R. You do have to impose equal dimensions for X and Y, though (a small guard for that is sketched after this list).
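One way to impose the equal-dimension requirement is a simple guard at the top of the function. This is only an illustrative sketch, not part of the original code, and it assumes the Rcpp headers are already pulled in via RcppArrayFire.h:
// stop early if the two input vectors do not have the same length
if (X_vect.dims()[0] != Y_vect.dims()[0])
    Rcpp::stop("X and Y must have the same length");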
Interestingly I get a different result when I use vectorized functions. Right now I am not sure why this is the case, but the following form makes more sense to me. You should verify that the result is what you want to get:
#include <RcppArrayFire.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
double example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
double mean_X = af::mean<double>(X_vect);
double mean_Y = af::mean<double>(Y_vect);
af::array temp = (X_vect - mean_X) * (Y_vect - mean_Y);
af::array temp_x = af::pow(X_vect - mean_X, 2.0);
return af::sum<double>(temp)/af::sum<double>(temp_x);
}
/*** R
set.seed(1)
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
BTW, an even shorter version would be:
#include <RcppArrayFire.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
af::array example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
return af::cov(X_vect, Y_vect) / af::var(X_vect);
}
Generally it is a good idea to use the built-in functions as much as possible.
Consider a simple depth-of-field filter (my actual use case is similar). It loops over the image and scatters every pixel over a circular neighborhood of it. The radius of the neighborhood depends on the depth of the pixel: the closer it is to the focal plane, the smaller the radius.
Note that I said "scatters" and not "gathers". In simpler image-processing applications, you normally use the "gather" technique to perform a uniform Gaussian blur. IOW, you loop over the neighborhood of each pixel and "gather" the nearby values into a weighted average. This works fine in that case, but if you make the blur kernel vary between pixels while still using "gathering", you'll get a somewhat unrealistic effect. Such "space-variant filtering" scenarios are where "scattering" differs from "gathering".
To be clear: the scatter algo is something like this:
init resultImage to black
loop over sourceImage
    var c = fetch current pixel from sourceImage
    var toAdd = c * weight // weight < 1
    loop over circular neighbourhood of current source pixel
        add toAdd to current neighbor in resultImage
My question is: if I do a direct translation of this pseudocode to OpenCL, will there be synchronization issues due to different work-items simultaneously writing to the same output pixel?
Does the answer vary depending on whether I'm using Buffers or Images?
The course I'm reading suggests that there will be synchronization issues. But OTOH I read the source of Mandelbulber 1.21-2, which does a straightforward OpenCL DOF just like my above pseudocode, and it seems to work fine.
(the relevant code is in mandelbulber-opencl-1.21-2.orig/usr/share/cl/cl_DOF.cl and it's as follows)
//*********************************************************
// MANDELBULBER
// kernel for DOF effect
//
//
// author: Krzysztof Marczak
// contact: buddhi1980#gmail.com
// licence: GNU GPL v3.0
//
//*********************************************************
typedef struct
{
int width;
int height;
float focus;
float radius;
} sParamsDOF;
typedef struct
{
float z;
int i;
} sSortZ;
//------------------ MAIN RENDER FUNCTION --------------------
kernel void DOF(__global ushort4 *in_image, __global ushort4 *out_image, __global sSortZ *zBuffer, sParamsDOF p)
{
const unsigned int i = get_global_id(0);
uint index = p.height * p.width - i - 1;
int ii = zBuffer[index].i;
int2 scr = (int2){ii % p.width, ii / p.width};
float z = zBuffer[index].z;
float blur = fabs(z - p.focus) / z * p.radius;
blur = min(blur, 500.0f);
float4 center = convert_float4(in_image[scr.x + scr.y * p.width]);
float factor = blur * blur * sqrt(blur)* M_PI_F/3.0f;
int blurInt = (int)blur;
int2 scr2;
int2 start = (int2){scr.x - blurInt, scr.y - blurInt};
start = max(start, 0);
int2 end = (int2){scr.x + blurInt, scr.y + blurInt};
end = min(end, (int2){p.width - 1, p.height - 1});
for (scr2.y = start.y; scr2.y <= end.y; scr2.y++)
{
for(scr2.x = start.x; scr2.x <= end.x; scr2.x++)
{
float2 d = scr - scr2;
float r = length(d);
float op = (blur - r) / factor;
op = clamp(op, 0.0f, 1.0f);
float opN = 1.0f - op;
uint address = scr2.x + scr2.y * p.width;
float4 old = convert_float4(out_image[address]);
out_image[address] = convert_ushort4(opN * old + op * center);
}
}
}
No, you can't without worrying about synchronization. If two work items scatter to the same location without synchronization, you have a race condition and won't get the correct results. Same for both buffers and images. With buffers you could use atomics, but they can slow down your code, especially when there is contention (but even when not). AFAIK, read/write images don't have atomic operations.
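If you do want to keep the scatter formulation with buffers, a common workaround is to accumulate into an integer buffer with atomic_add using a fixed-point scale, and convert back to your output format in a second pass. Below is only a rough sketch of the idea; the kernel name, the single-channel buffer layout, the 3x3 neighbourhood and the scale factor are all illustrative, not taken from Mandelbulber:
kernel void scatter_dof(__global const float *src,      // one channel for brevity
                        __global volatile int *accum,   // fixed-point accumulator, one int per pixel
                        const int width, const int height)
{
    int gid = get_global_id(0);
    int x = gid % width;
    int y = gid / width;
    float toAdd = src[gid] * 0.25f;   // weight < 1
    const float SCALE = 65536.0f;     // fixed-point scale factor

    // scatter into a small square neighbourhood as a stand-in for the circular one
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
        {
            int nx = clamp(x + dx, 0, width - 1);
            int ny = clamp(y + dy, 0, height - 1);
            // atomic_add serializes concurrent writers to the same pixel,
            // so the sum is correct regardless of scheduling
            atomic_add(&accum[ny * width + nx], (int)(toAdd * SCALE));
        }
}
Whether this ends up faster than restructuring the algorithm as a gather depends heavily on how much the scatter radii overlap (i.e. on contention).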
I am using the following code to convert a Bitmap to Complex and vice versa.
Even though these were copied directly from the Accord.NET framework, while testing these static methods I discovered that repeated use of them causes data loss; as a result, the final output becomes distorted.
public partial class ImageDataConverter
{
#region private static Complex[,] FromBitmapData(BitmapData bmpData)
private static Complex[,] ToComplex(BitmapData bmpData)
{
Complex[,] comp = null;
if (bmpData.PixelFormat == PixelFormat.Format8bppIndexed)
{
int width = bmpData.Width;
int height = bmpData.Height;
int offset = bmpData.Stride - (width * 1);//1 === 1 byte per pixel.
if ((!Tools.IsPowerOf2(width)) || (!Tools.IsPowerOf2(height)))
{
throw new Exception("Imager width and height should be n of 2.");
}
comp = new Complex[width, height];
unsafe
{
byte* src = (byte*)bmpData.Scan0.ToPointer();
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++, src++)
{
comp[y, x] = new Complex((float)*src / 255,
comp[y, x].Imaginary);
}
src += offset;
}
}
}
else
{
throw new Exception("EightBppIndexedImageRequired");
}
return comp;
}
#endregion
public static Complex[,] ToComplex(Bitmap bmp)
{
Complex[,] comp = null;
if (bmp.PixelFormat == PixelFormat.Format8bppIndexed)
{
BitmapData bmpData = bmp.LockBits( new Rectangle(0, 0, bmp.Width, bmp.Height),
ImageLockMode.ReadOnly,
PixelFormat.Format8bppIndexed);
try
{
comp = ToComplex(bmpData);
}
finally
{
bmp.UnlockBits(bmpData);
}
}
else
{
throw new Exception("EightBppIndexedImageRequired");
}
return comp;
}
public static Bitmap ToBitmap(Complex[,] image, bool fourierTransformed)
{
int width = image.GetLength(0);
int height = image.GetLength(1);
Bitmap bmp = Imager.CreateGrayscaleImage(width, height);
BitmapData bmpData = bmp.LockBits(
new Rectangle(0, 0, width, height),
ImageLockMode.ReadWrite,
PixelFormat.Format8bppIndexed);
int offset = bmpData.Stride - width;
double scale = (fourierTransformed) ? Math.Sqrt(width * height) : 1;
unsafe
{
byte* address = (byte*)bmpData.Scan0.ToPointer();
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++, address++)
{
double min = System.Math.Min(255, image[y, x].Magnitude * scale * 255);
*address = (byte)System.Math.Max(0, min);
}
address += offset;
}
}
bmp.UnlockBits(bmpData);
return bmp;
}
}
(The DotNetFiddle link of the complete source code)
(ImageDataConverter)
Output:
As you can see, the FFT is working correctly, but the inverse FFT isn't.
That is because the bitmap-to-complex conversion (and vice versa) isn't working as expected.
What can be done to correct the ToComplex() and ToBitmap() functions so that they don't lose data?
I do not code in C# so handle this answer with extreme prejudice!
Just from a quick look I spotted a few problems:
ToComplex()
This converts a BMP into a 2D complex matrix. When converting, you leave the imaginary part unchanged, but at the start of the same function you have:
Complex[,] complex2D = null;
complex2D = new Complex[width, height];
So the imaginary parts are either undefined or zero, depending on your complex class constructor. This means you are missing half of the data needed for reconstruction! You should restore the original complex matrix from two images: one for the real part and one for the imaginary part of the result.
ToBitmap()
You are saving the magnitude, which is sqrt(Re*Re + Im*Im), so it is the power spectrum, not the original complex values, and you cannot reconstruct them back... You should store Re and Im in two separate images.
8bit per pixel
That is not much precision and can cause significant round-off errors after FFT/IFFT, so the reconstruction can be really distorted.
[Edit1] Remedy
There are more options to repair this for example:
Use a floating-point complex matrix for computations and bitmaps only for visualization.
This is the safest way because you avoid additional conversion round-offs, and it gives the best precision. But you need to rewrite your DIP/CV algorithms to support complex-domain matrices instead of bitmaps, which requires a fair amount of work.
Rewrite your conversions to use separate real- and imaginary-part images.
Your current conversion does not store/restore the real and imaginary parts as it should, and it also does not account for negative values (as far as I can see, they are simply clamped to zero, which is wrong). I would rewrite the conversion to something like this:
// conversion scales
float Re_ofset = 256.0f, Re_scale = 512.0f / 255.0f;
float Im_ofset = 256.0f, Im_scale = 512.0f / 255.0f;
private static Complex[,] ToComplex(BitmapData bmpRe, BitmapData bmpIm)
{
    //...
    byte* srcRe = (byte*)bmpRe.Scan0.ToPointer();
    byte* srcIm = (byte*)bmpIm.Scan0.ToPointer();
    // for each line
    for (int y = 0; y < height; y++)
    {
        // for each pixel
        for (int x = 0; x < width; x++, srcRe++, srcIm++)
        {
            // rescale the 8-bit pixel values back into the signed floating-point range
            complex2D[y, x] = new Complex((*srcRe) * Re_scale - Re_ofset,
                                          (*srcIm) * Im_scale - Im_ofset);
        }
        srcRe += offset;
        srcIm += offset;
    }
    //...
}
public static Bitmap ToBitmapRe(Complex[,] complex2D)
{
    //...
    double Re = (complex2D[y, x].Real + Re_ofset) / Re_scale;
    Re = Math.Min(Re, 255.0);
    Re = Math.Max(Re, 0.0);
    *address = (byte)Re;
    //...
}
public static Bitmap ToBitmapIm(Complex[,] complex2D)
{
    //...
    double Im = (complex2D[y, x].Imaginary + Im_ofset) / Im_scale;
    Im = Math.Min(Im, 255.0);
    Im = Math.Max(Im, 0.0);
    *address = (byte)Im;
    //...
}
Where:
Re_ofset = -min(complex2D[,].Real);
Im_ofset = -min(complex2D[,].Imaginary);
Re_scale = (max(complex2D[,].Real )-min(complex2D[,].Real ))/255.0;
Im_scale = (max(complex2D[,].Imaginary)-min(complex2D[,].Imaginary))/255.0;
or any offset/scale pair that covers a bigger interval than the complex matrix values.
You can also encode both the real and imaginary parts into a single image; for example, the first half of the image could hold the real part and the second half the imaginary part. In that case you do not need to change the function headers or names at all, but you would need to handle the images as two joined squares, each with a different meaning.
You can also use RGB images where R = Real, B = Imaginary, or any other encoding that suits you.
[Edit2] Some examples to make my points clearer
Example of approach #1
The image is kept as a floating-point 2D complex matrix, and bitmaps are created only for visualization. There is little rounding error this way. The values are not normalized, so the range is <0.0,255.0> per pixel/cell at first, but after the transforms and scaling it can change greatly.
As you can see, I added scaling (all pixels are multiplied by 315) so you can actually see anything, because the FFT output values are small except for a few cells. This is only for visualization; the complex matrix itself is unchanged.
Example of approach #2
As I mentioned before, you do not handle negative values; you normalize values to the range <0,1> and back by scaling and rounding, and you use only 8 bits per pixel to store the sub-results. I tried to simulate that with my code, and here is what I got (using the complex domain instead of the power spectrum you wrongly used). Here is the C++ source, only as a template example, since you do not have the functions and classes behind it:
transform t;
cplx_2D c;
rgb2i(bmp0);
c.ld(bmp0,bmp0);
null_im(c);
c.mul(1.0/255.0);
c.mul(255.0); c.st(bmp0,bmp1); c.ld(bmp0,bmp1); i2iii(bmp0); i2iii(bmp1); c.mul(1.0/255.0);
bmp0->SaveToFile("_out0_Re.bmp");
bmp1->SaveToFile("_out0_Im.bmp");
t. DFFT(c,c);
c.wrap();
c.mul(255.0); c.st(bmp0,bmp1); c.ld(bmp0,bmp1); i2iii(bmp0); i2iii(bmp1); c.mul(1.0/255.0);
bmp0->SaveToFile("_out1_Re.bmp");
bmp1->SaveToFile("_out1_Im.bmp");
c.wrap();
t.iDFFT(c,c);
c.mul(255.0); c.st(bmp0,bmp1); c.ld(bmp0,bmp1); i2iii(bmp0); i2iii(bmp1); c.mul(1.0/255.0);
bmp0->SaveToFile("_out2_Re.bmp");
bmp1->SaveToFile("_out2_Im.bmp");
And here the sub results:
As you can see, after the DFFT and wrap the image is really dark and most of the values are rounded off. So the result after unwrap and IDFFT is really poor.
Here are some explanations of the code:
c.st(bmpre,bmpim) is the same as your ToBitmap
c.ld(bmpre,bmpim) is the same as your ToComplex
c.mul(scale) multiplies complex matrix c by scale
rgb2i converts RGB to grayscale intensity <0,255>
i2iii converts grayscale intensity to a grayscale RGB image
I'm not really good at these puzzles, but double-check this division:
comp[y, x] = new Complex((float)*src / 255, comp[y, x].Imaginary);
You can lose precision, as described in the Remarks section of the Complex class documentation.
Maybe this is what happens in your case.
Hope this helps.
I am making a simple program which abstracts complex numbers and complex-number operations. Out of ignorance, I started out with integer data types for the real and imaginary parts of my complex numbers; after coding addition, subtraction, and multiplication successfully, I realized that for division I would need to use doubles. When I switched to doubles, I got bad results from the previous three calculations, which had worked wonderfully when the values were stored as ints. Can someone please explain what is so fundamentally different about ints and doubles in C++ that makes my code work fine for int but break when I try using doubles?
I have pasted my code for reference.
#include "Complex.h"
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
Complex::Complex(){
real = 0;
imaginary = 0;
}
Complex::Complex(double givenReal, double givenImaginary)
{
real = givenReal;
imaginary = givenImaginary;
}
double Complex::getImaginary(){
return imaginary;
}
double Complex::getReal(){
return real;
}
double Complex::getMagnitude(){
//magnitude = sqrt(pow(real,2)+pow(magnitude,2));
return magnitude;
}
Complex Complex::operator+(Complex n){
Complex j = Complex();
j.real = real + n.real;
j.imaginary = imaginary + n.imaginary;
return j;
}
Complex Complex::operator-(Complex n){
Complex j = Complex();
j.real = real - n.real;
j.imaginary = imaginary - n.imaginary;
return j;
}
Complex Complex::operator*(Complex n){
Complex j = Complex();
j.real = (real * n.real)-(imaginary * n.imaginary);
j.imaginary = (real * n.imaginary) + (imaginary * n.real);
return j;
}
Complex Complex::operator/(Complex n){
Complex j = Complex();
j.real = ((real * n.real) + (imaginary * n.imaginary))/(n.real*n.real + n.imaginary*n.imaginary);
j.imaginary = ((imaginary*n.real)-(real * n.imaginary))/(n.real*n.real + n.imaginary*n.imaginary);
return j;
}
int main(){
Complex a = Complex(1, 3);
Complex b = Complex(4, 8);
Complex c = a+b;
printf("Adding a and b\nExpected: (5,11)\nActual: (%d,%d)\n",c.getReal(), c.getImaginary());
c = a-b;
printf("Subtracting b from a\nExpected: (-3,-5)\nActual: (%d,%d)\n",c.getReal(), c.getImaginary());
c = a*b;
printf("Multiplying a and b\nExpected: (-20,20)\nActual: (%d,%d)\n",c.getReal(), c.getImaginary());
c = a/b;
printf("Dividing a by b\nExpected: (.35,.05)\nActual: (%d,%d)\n",c.getReal(), c.getImaginary());
system ("pause");
}
Output:
Adding a and b
Expected: (5,11)
Actual: (0,1075052544)
Subtracting b from a
Expected: (-3,-5)
Actual: (0,-1073217536)
Multiplying a and b
Expected: (-20,20)
Actual: (0,-1070333952)
Dividing a by b
Expected: (.35,.05)
Actual: (1610612736,1071015526)
What every C/C++ programmer should know about printf format specifiers is: %d is for int, %f is for double.
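For example, a minimal fix to the first printf call (the other three follow the same pattern) would be:
// getReal() and getImaginary() return double, so print them with %f instead of %d
printf("Adding a and b\nExpected: (5,11)\nActual: (%f,%f)\n",
       c.getReal(), c.getImaginary());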