Numerical differentiation using Cauchy (CIF) - python-3.x

I am trying to create a module with a mathematical class for Taylor series, to have it easily accessible for other projects. Hence I wish to optimize it as far as I can.
For those who are not too familiar with Taylor series, it will be a necessity to be able to differentiate a function in a point many times. Given that the normal definition of the mathematical derivative of a function will require immense precision for higher order derivatives, I've decided to use Cauchy's integral formula instead. With a little bit of work, I've managed to rearrange the formula a little bit, as you can see on this picture: Rearranged formula. This provided me with much more accurate results on higher order derivatives than the traditional definition of the derivative. Here is the function i am currently using to differentiate a function in a point:
def myDerivative(f, x, dTheta, degree):
riemannSum = 0
theta = 0
while theta < 2*np.pi:
functionArgument = np.complex128(x + np.exp(1j*theta))
secondFactor = np.complex128(np.exp(-1j * degree * theta))
riemannSum += f(functionArgument) * secondFactor * dTheta
theta += dTheta
return factorial(degree)/(2*np.pi) * riemannSum.real
I've tested this function in my main function with a carefully thought out mathematical function which I know the derivatives of, namely f(x) = sin(x).
def main():
print(myDerivative(f, 0, 2*np.pi/(4*4096), 16))
pass
These derivatives seems to freak out at around the derivative of degree 16. I've also tried to play around with dTheta, but with no luck. I would like to have higher orders as well, but I fear I've run into some kind of machine precission.
My question is in it's simplest form: What can I do to improve this function in order to get higher order of my derivatives?

I seem to have come up with a solution to the problem. I did this by rearranging Cauchy's integral formula in a different way, by exploiting that the initial contour integral can be an arbitrarily large circle around the point of differentiation. Be aware that it is very important that the function is analytic in the complex plane for this to be valid.
New formula
Also this gives a new function for differentiation:
def myDerivative(f, x, dTheta, degree, contourRadius):
riemannSum = 0
theta = 0
while theta < 2*np.pi:
functionArgument = np.complex128(x + contourRadius*np.exp(1j*theta))
secondFactor = (1/contourRadius)**degree*np.complex128(np.exp(-1j * degree * theta))
riemannSum += f(functionArgument) * secondFactor * dTheta
theta += dTheta
return factorial(degree) * riemannSum.real / (2*np.pi)
This gives me a very accurate differentiation of high orders. For instance I am able to differentiate f(x)=e^x 50 times without a problem.

Well, since you are working with a discrete approximation of the derivative (via dTheta), sooner or later you must run into trouble. I'm surprised you were able to get at least 15 accurate derivatives -- good work! But to get derivatives of all orders, either you have to put a limit on what you're willing to accept and say it's good enough, or else compute the derivatives symbolically. Take a look at Sympy for that. Sympy probably has some functions for computing Taylor series too.

Related

Digital Watermarking: Blind detection with linear correlation

I'm trying to understand how a blind detection (detection without cover work) works, by applying linear correlation.
This is my understand so far:
Embedding (one-bit):
we generate a reference pattern w_r by using watermarking key
W_m:we multiply w_r with an strength factor a and take the negative values if we want to embedd a zero bit.
Then: C = C_0 + W_m + N,where N is noise
Blind detection (found in literature):
We need to calculate the linear correlation between w_r and C, to detect the appearance of w_r in C. Linear correlation in genereal is the normalizez scalar product = 1/(j*i) *C*w_r
C consists of C_0*w_r + W_m*w_r + w_*r*N. It is said that, because the left and the right term is probably small, but W_m*w_r has large magnitude, therefore LC(C,w_r) = +-a * |w_r|^2/(ji)
This makes no sense to me. Why should we only consider +-a * |w_r|^2/(ji) for detecting watermarks, without using C ?. This term LC(C,w_r) = +-a * |w_r|^2/(ji) is independent from C?
Or does this only explain why we can say that low linear correlation corresponds to zero-bit and high value to one-bit and we just compute LC(C,w_r) like we usually do by using the scalar product?
Thanks!

Scipy.integrate gives odd results; are there best practices?

I am still struggling with scipy.integrate.quad.
Sparing all the details, I have an integral to evaluate. The function is of the form of the integral of a product of functions in x, like so:
Z(k) = f(x) g(k/x) / abs(x)
I know for certain the range of integration is between tow positive numbers. Oddly, when I pick a wide range that I know must contain all values of x that are positive - like integrating from 1 to 10,000,000 - it intgrates fast and gives an answer which looks right. But when I fingure out the exact limits - which I know sice f(x) is zero over a lot of the real line - and use those, I get another answer that is different. They aren't very different, though I know the second is more accurate.
After much fiddling I got it to work OK, but then needed to add in an exopnentiation - I was at least getting a 'smooth' answer for the computed function of z. I had this working in an OK way before I added in the exponentiation (which is needed), but now the function that gets generated (z) becomes more and more oscillatory and peculiar.
Any idea what is happening here? I know this code comes from an old Fortran library, so there must be some known issues, but I can't find references.
Here is the core code:
def normal(x, mu, sigma) :
return (1.0/((2.0*3.14159*sigma**2)**0.5)*exp(-(x-
mu)**2/(2*sigma**2)))
def integrand(x, z, mu, sigma, f) :
return np.exp(normal(z/x, mu, sigma)) * getP(x, f._x, f._y) / abs(x)
for _z in range (int(z_min), int(z_max) + 1, 1000):
z.append(_z)
pResult = quad(integrand, lb, ub,
args=(float(_z), MU-SIGMA**2/2, SIGMA, X),
points = [100000.0],
epsabs = 1, epsrel = .01) # drop error estimate of tuple
p.append(pResult[0]) # drop error estimate of tuple
By the way, getP() returns a linearly interpolated, piecewise continuous,but non-smooth function to give the integrator values that smoothly fit between the discrete 'buckets' of the histogram.
As with many numerical methods, it can be very sensitive to asymptotes, zeros, etc. The only choice is to keep giving it 'hints' if it will accept them.

Parallelization of Piecewise Polynomial Evaluation

I am trying to evaluate points in a large piecewise polynomial, which is obtained from a cubic-spline. This takes a long time to do and I would like to speed it up.
As such, I would like to evaluate a points on a piecewise polynomial with parallel processes, rather than sequentially.
Code:
z = zeros(1e6, 1) ; % preallocate some memory for speed
Y = rand(11220,161) ; %some data, rand for generating a working example
X = 0 : 0.0125 : 2 ; % vector of data sites
pp = spline(X, Y) ; % get the piecewise polynomial form of the cubic spline.
The resulting structure is large.
for t = 1 : 1e6 % big number
hcurrent = ppval(pp,t); %evaluate the piecewise polynomial at t
z(t) = sum(x(t:t+M-1).*hcurrent,1) ; % do some operation of the interpolated value. Most likely not relevant to this question.
end
Unfortunately, with matrix form and using:
hcurrent = flipud(ppval(pp, 1: 1e6 ))
requires too much memory to process, so cannot be done. Is there a way that I can batch process this code to speed it up?
For scalar second arguments, as in your example, you're dealing with two issues. First, there's a good amount of function call overhead and redundant computation (e.g., unmkpp(pp) is called every loop iteration). Second, ppval is written to be general so it's not fully vectorized and does a lot of things that aren't necessary in your case.
Below is vectorized code code that take advantage of some of the structure of your problem (e.g., t is an integer greater than 0), avoids function call overhead, move some calculations outside of your main for loop (at the cost of a bit of extra memory), and gets rid of a for loop inside of ppval:
n = 1e6;
z = zeros(n,1);
X = 0:0.0125:2;
Y = rand(11220,numel(X));
pp = spline(X,Y);
[b,c,l,k,dd] = unmkpp(pp);
T = 1:n;
idx = discretize(T,[-Inf b(2:l) Inf]); % Or: [~,idx] = histc(T,[-Inf b(2:l) Inf]);
x = bsxfun(#power,T-b(idx),(k-1:-1:0).').';
idx = dd*idx;
d = 1-dd:0;
for t = T
hcurrent = sum(bsxfun(#times,c(idx(t)+d,:),x(t,:)),2);
z(t) = ...;
end
The resultant code takes ~34% of the time of your example for n=1e6. Note that because of the vectorization, calculations are performed in a different order. This will result in slight differences between outputs from ppval and my optimized version due to the nature of floating point math. Any differences should be on the order of a few times eps(hcurrent). You can still try using parfor to further speed up the calculation (with four already running workers, my system took just 12% of your code's original time).
I consider the above a proof of concept. I may have over-optmized the code above if your example doesn't correspond well to your actual code and data. In that case, I suggest creating your own optimized version. You can start by looking at the code for ppval by typing edit ppval in your Command Window. You may be able to implement further optimizations by looking at the structure of your problem and what you ultimately want in your z vector.
Internally, ppval still uses histc, which has been deprecated. My code above uses discretize to perform the same task, as suggested by the documentation.
Use parfor command for parallel loops. see here, also precompute z vector as z(j) = x(j:j+M-1) and hcurrent in parfor for speed up.
The Spline Parameters estimation can be written in Matrix form.
Once you write it in Matrix form and solve it you can use the Model Matrix to evaluate the Spline on all data point using Matrix Multiplication which is probably the most tuned operation in MATLAB.

Is it possible to do an algebraic curve fit with just a single pass of the sample data?

I would like to do an algebraic curve fit of 2D data points, but for various reasons - it isn't really possible to have much of the sample data in memory at once, and iterating through all of it is an expensive process.
(The reason for this is that actually I need to fit thousands of curves simultaneously based on gigabytes of data which I'm reading off disk, and which is therefore sloooooow).
Note that the number of polynomial coefficients will be limited (perhaps 5-10), so an exact fit will be extremely unlikely, but this is ok as I'm trying to find an underlying pattern in data with a lot of random noise.
I understand how one can use a genetic algorithm to fit a curve to a dataset, but this requires many passes through the sample data, and thus isn't practical for my application.
Is there a way to fit a curve with a single pass of the data, where the state that must be maintained from sample to sample is minimal?
I should add that the nature of the data is that the points may lie anywhere on the X axis between 0.0 and 1.0, but the Y values will always be either 1.0 or 0.0.
So, in Java, I'm looking for a class with the following interface:
public interface CurveFit {
public void addData(double x, double y);
public List<Double> getBestFit(); // Returns the polynomial coefficients
}
The class that implements this must not need to keep much data in its instance fields, no more than a kilobyte even for millions of data points. This means that you can't just store the data as you get it to do multiple passes through it later.
edit: Some have suggested that finding an optimal curve in a single pass may be impossible, however an optimal fit is not required, just as close as we can get it in a single pass.
The bare bones of an approach might be if we have a way to start with a curve, and then a way to modify it to get it slightly closer to new data points as they come in - effectively a form of gradient descent. It is hoped that with sufficient data (and the data will be plentiful), we get a pretty good curve. Perhaps this inspires someone to a solution.
Yes, it is a projection. For
y = X beta + error
where lowercased terms are vectors, and X is a matrix, you have the solution vector
\hat{beta} = inverse(X'X) X' y
as per the OLS page. You almost never want to compute this directly but rather use LR, QR or SVD decompositions. References are plentiful in the statistics literature.
If your problem has only one parameter (and x is hence a vector as well) then this reduces to just summation of cross-products between y and x.
If you don't mind that you'll get a straight line "curve", then you only need six variables for any amount of data. Here's the source code that's going into my upcoming book; I'm sure that you can figure out how the DataPoint class works:
Interpolation.h:
#ifndef __INTERPOLATION_H
#define __INTERPOLATION_H
#include "DataPoint.h"
class Interpolation
{
private:
int m_count;
double m_sumX;
double m_sumXX; /* sum of X*X */
double m_sumXY; /* sum of X*Y */
double m_sumY;
double m_sumYY; /* sum of Y*Y */
public:
Interpolation();
void addData(const DataPoint& dp);
double slope() const;
double intercept() const;
double interpolate(double x) const;
double correlate() const;
};
#endif // __INTERPOLATION_H
Interpolation.cpp:
#include <cmath>
#include "Interpolation.h"
Interpolation::Interpolation()
{
m_count = 0;
m_sumX = 0.0;
m_sumXX = 0.0;
m_sumXY = 0.0;
m_sumY = 0.0;
m_sumYY = 0.0;
}
void Interpolation::addData(const DataPoint& dp)
{
m_count++;
m_sumX += dp.getX();
m_sumXX += dp.getX() * dp.getX();
m_sumXY += dp.getX() * dp.getY();
m_sumY += dp.getY();
m_sumYY += dp.getY() * dp.getY();
}
double Interpolation::slope() const
{
return (m_sumXY - (m_sumX * m_sumY / m_count)) /
(m_sumXX - (m_sumX * m_sumX / m_count));
}
double Interpolation::intercept() const
{
return (m_sumY / m_count) - slope() * (m_sumX / m_count);
}
double Interpolation::interpolate(double X) const
{
return intercept() + slope() * X;
}
double Interpolation::correlate() const
{
return m_sumXY / sqrt(m_sumXX * m_sumYY);
}
Why not use a ring buffer of some fixed size (say, the last 1000 points) and do a standard QR decomposition-based least squares fit to the buffered data? Once the buffer fills, each time you get a new point you replace the oldest and re-fit. That way you have a bounded working set that still has some data locality, without all the challenges of live stream (memoryless) processing.
Are you limiting the number of polynomial coefficients (i.e. fitting to a max power of x in your polynomial)?
If not, then you don't need a "best fit" algorithm - you can always fit N data points EXACTLY to a polynomial of N coefficients.
Just use matrices to solve N simultaneous equations for N unknowns (the N coefficients of the polynomial).
If you are limiting to a max number of coefficients, what is your max?
Following your comments and edit:
What you want is a low-pass filter to filter out noise, not fit a polynomial to the noise.
Given the nature of your data:
the points may lie anywhere on the X axis between 0.0 and 1.0, but the Y values will always be either 1.0 or 0.0.
Then you don't need even a single pass, as these two lines will pass exactly through every point:
X = [0.0 ... 1.0], Y = 0.0
X = [0.0 ... 1.0], Y = 1.0
Two short line segments, unit length, and every point falls on one line or the other.
Admittedly, an algorithm to find a good curve fit for arbitrary points in a single pass is interesting, but (based on your question), that's not what you need.
Assuming that you don't know which point should belong to which curve, something like a Hough Transform might provide what you need.
The Hough Transform is a technique that allows you to identify structure within a data set. One use is for computer vision, where it allows easy identification of lines and borders within the field of sight.
Advantages for this situation:
Each point need be considered only once
You don't need to keep a data structure for each candidate line, just one (complex, multi-dimensional) structure
Processing of each line is simple
You can stop at any point and output a set of good matches
You never discard any data, so it's not reliant on any accidental locality of references
You can trade off between accuracy and memory requirements
Isn't limited to exact matches, but will highlight partial matches too.
An approach
To find cubic fits, you'd construct a 4-dimensional Hough space, into which you'd project each of your data-points. Hotspots within Hough space would give you the parameters for the cubic through those points.
You need the solution to an overdetermined linear system. The popular methods are Normal Equations (not usually recommended), QR factorization, and singular value decomposition (SVD). Wikipedia has decent explanations, Trefethen and Bau is very good. Your options:
Out-of-core implementation via the normal equations. This requires the product A'A where A has many more rows than columns (so the result is very small). The matrix A is completely defined by the sample locations so you don't have to store it, thus computing A'A is reasonably cheap (very cheap if you don't need to hit memory for the node locations). Once A'A is computed, you get the solution in one pass through your input data, but the method can be unstable.
Implement an out-of-core QR factorization. Classical Gram-Schmidt will be fastest, but you have to be careful about stability.
Do it in-core with distributed memory (if you have the hardware available). Libraries like PLAPACK and SCALAPACK can do this, the performance should be much better than 1. The parallel scalability is not fantastic, but will be fine if it's a problem size that you would even think about doing in serial.
Use iterative methods to compute an SVD. Depending on the spectral properties of your system (maybe after preconditioning) this could converge very fast and does not require storage for the matrix (which in your case has 5-10 columns each of which are the size of your input data. A good library for this is SLEPc, you only have to find a the product of the Vandermonde matrix with a vector (so you only need to store the sample locations). This is very scalable in parallel.
I believe I found the answer to my own question based on a modified version of this code. For those interested, my Java code is here.

Given a set of points, how do I approximate the major axis of its shape?

Given a "shape" drawn by the user, I would like to "normalize" it so they all have similar size and orientation. What we have is a set of points. I can approximate the size using bounding box or circle, but the orientation is a bit more tricky.
The right way to do it, I think, is to calculate the majoraxis of its bounding ellipse. To do that you need to calculate the eigenvector of the covariance matrix. Doing so likely will be way too complicated for my need, since I am looking for some good-enough estimate. Picking min, max, and 20 random points could be some starter. Is there an easy way to approximate this?
Edit:
I found Power method to iteratively approximate eigenvector. Wikipedia article.
So far I am liking David's answer.
You'd be calculating the eigenvectors of a 2x2 matrix, which can be done with a few simple formulas, so it's not that complicated. In pseudocode:
// sums are over all points
b = -(sum(x * x) - sum(y * y)) / (2 * sum(x * y))
evec1_x = b + sqrt(b ** 2 + 1)
evec1_y = 1
evec2_x = b - sqrt(b ** 2 + 1)
evec2_y = 1
You could even do this by summing over only some of the points to get an estimate, if you expect that your chosen subset of points would be representative of the full set.
Edit: I think x and y must be translated to zero-mean, i.e. subtract mean from all x, y first (eed3si9n).
Here's a thought... What if you performed a linear regression on the points and used the slope of the resulting line? If not all of the points, at least a sample of them.
The r^2 value would also give you information about the general shape. The closer to 0, the more circular/uniform the shape is (circle/square). The closer to 1, the more stretched out the shape is (oval/rectangle).
The ultimate solution to this problem is running PCA
I wish I could find a nice little implementation for you to refer to...
Here you go! (assuming x is a nx2 vector)
def majAxis(x):
e,v = np.linalg.eig(np.cov(x.T)); return v[:,np.argmax(e)]

Resources