Unexpected solution using JiTCDDE - python-3.x

I'm trying to investigate the behavior of the following Delayed Differential Equation using Python:
y''(t) = -y(t)/τ^2 - 2y'(t)/τ - Nd*f(y(t-T))/τ^2,
where f is a cut-off function which is essentially equal to the identity when the absolute value of its argument is between 1 and 10 and otherwise is equal to 0 (see figure 1), and Nd, τ and T are constants.
For this I'm using the package JiTCDDE. This provides a reasonable solution to the above equation. Nevertheless, when I try to add a noise on the right hand side of the equation, I obtain a solution which stabilize to a non-zero constant after a few oscillations. This is not a mathematical solution of the equation (the only possible constant solution being equal to zero). I don't understand why this problem arises and if it is possible to solve it.
I reproduce my code below. Here, for the sake of simplicity, I substituted the noise with an high-frequency cosine, which is introduced in the system of equation as the initial condition for a dummy variable (the cosine could have been introduced directly in the system, but for a general noise this doesn't seem possible). To simplify further the problem, I removed also the term involving the f function, as the problem arises also without it. Figure 2 shows the plot of the function given by the code.
from jitcdde import jitcdde, y, t
import numpy as np
from matplotlib import pyplot as plt
import math
from chspy import CubicHermiteSpline
# Definition of function f:
def functionf(x):
return x/4*(1+symengine.erf(x**2-Bmin**2))*(1-symengine.erf(x**2-Bmax**2))
#parameters:
τ = 42.9
T = 35.33
Nd = 8.32
# Definition of the initial conditions:
dt = .01 # Time step.
totT = 10000. # Total time.
Nmax = int(totT / dt) # Number of time steps.
Vt = np.linspace(0., totT, Nmax) # Vector of times.
# Definition of the "noise"
X = np.zeros(Nmax)
for i in range(Nmax):
X[i]=math.cos(Vt[i])
past=CubicHermiteSpline(n=3)
for time, datum in zip(Vt,X):
regular_past = [10.,0.]
past.append((
time-totT,
np.hstack((regular_past,datum)),
np.zeros(3)
))
noise= lambda t: y(2,t-totT)
# Integration of the DDE
g = [
y(1),
-y(0)/τ**2-2*y(1)/τ+0.008*noise(t)
]
g.append(0)
DDE = jitcdde(g)
DDE.add_past_points(past)
DDE.adjust_diff()
data = []
for time in np.arange(DDE.t, DDE.t+totT, 1):
data.append( DDE.integrate(time)[0] )
plt.plot(data)
plt.show()
Incidentally, I noticed that even without noise, the solution seems to be discontinuous at the point zero (y is set to be equal to zero for negative times), and I don't understand why.

As the comments unveiled, your problem eventually boiled down to this:
step_on_discontinuities assumes delays that are small with respect to the integration time and performs steps that are placed on those times where the delayed components points to the integration start (0 in your case). This way initial discontinuities are handled.
However, implementing an input with a delayed dummy variable introduces a large delay into the system, totT in your case.
The respective step for step_on_discontinuities would be at totT itself, i.e., after the desired integration time.
Thus when you reach for time in np.arange(DDE.t, DDE.t+totT, 1): in your code, DDE.t is totT.
Therefore you have made a big step before you actually start integrating and observing which may seem like a discontinuity and lead to weird results, in particular you do not see the effect of your input, because it has already “ended” at this point.
To avoid this, use adjust_diff or integrate_blindly instead of step_on_discontinuities.

Related

Mixed integer nonlinear programming with gekko python

I want to solve the following optimization problem using Gekko in python 3.7 window version.
Original Problem
Here, x_s are continuous variables, D and Epsilon are deterministic and they are also parameters.
However, since minimization function exists in the objective function, I remove it using binary variables(z1, z2) and then the problem becomes MINLP as follows.
Modified problem
With Gekko,
(1) Can both original problem & modified problem be solved?
(2) How can I code summation in the objective function and also D & epsilon which are parameters in Gekko?
Thanks in advance.
Both problems should be feasible with Gekko but the original appears easier to solve. Here are a few suggestions for the original problem:
Use m.Maximize() for the objective
Use sum() for the inner summation and m.sum() for outer summation for the objective function. I switch to m.sum() when the summation would create an expression that is over 15,000 characters. Using sum() creates one long expression and m.sum() breaks the summation into pieces but takes longer to compile.
Use m.min3() for the min(Dt,xs) terms or slack variables s with x[i]+s[i]=D[i]. It appears that Dt (size 30) is an upper bound, but it has different dimensions that xs (size 100). Slack variables are much more efficient than using binary variables.
D = np.array(100)
x = m.Array(m.Var,100,lb=0,ub=2000000)
The modified problem has 6000 binary variables and 100 continuous variables. There are 2^6000 potential combinations of those variables so it may take a while to solve, even with the efficient branch and bound method of APOPT. Here are a few suggestions for the modified problem:
Use matrix multiplications when possible. Below is an example of matrix operations with Gekko.
from gekko import GEKKO
import numpy as np
m = GEKKO(remote=False)
ni = 3; nj = 2; nk = 4
# solve AX=B
A = m.Array(m.Var,(ni,nj),lb=0)
X = m.Array(m.Var,(nj,nk),lb=0)
AX = np.dot(A,X)
B = m.Array(m.Var,(ni,nk),lb=0)
# equality constraints
m.Equations([AX[i,j]==B[i,j] for i in range(ni) \
for j in range(nk)])
m.Equation(5==m.sum([m.sum([A[i][j] for i in range(ni)]) \
for j in range(nj)]))
m.Equation(2==m.sum([m.sum([X[i][j] for i in range(nj)]) \
for j in range(nk)]))
# objective function
m.Minimize(m.sum([m.sum([B[i][j] for i in range(ni)]) \
for j in range(nk)]))
m.solve()
print(A)
print(X)
print(B)
Declare z1 and z2 variables as integer type with integer=True. Here is more information on using the integer type.
Solve locally with m=GEKKO(remote=False). The processing time will be large and the public server resets connections and deletes jobs every day. Switch to local mode to avoid a potential disruption.

Is there a logic error with my implementation of odeint?

I am getting the wrong solution and am unsure if odeint is the correct tool for solving this system of ODEs.
I am trying to model a simple first order chemical reaction by solving a system of ODEs. From a logic standpoint my functions are correct and I can solve this problem in MATLAB with little issue. I would like to also be able to do this work in python as well. I think odeint is the tool for the job but I could be wrong. My solution should not converge at independent variable = 10 every time but it always does regardless of inputs.
from matplotlib.pyplot import (plot,grid,xlabel,ylabel,show,legend)
import numpy as np
from scipy.integrate import odeint
wght= np.linspace(0,20)
# reaction is A -> B
def PBR(fun,W):
X,y = fun
P_0=20;#%bar
v_0=5; #%m^3/min
y_A0=1; #unitless
k=.005; #m^3/kg/min
alpha=0.1; #1/kg
epi=.13; #unitless
R=8.314; #J/mol/K
F_A0= .5 ;#mol/min
ra = -k *y*(1-X)/(1+epi*X)
dX = (-ra)/F_A0
dy = -alpha*(1+epi*X)/(2*y)
return [dX,dy]
X0 = 0.0
y0 = 1.0
sol = odeint(PBR, [X0, y0],wght)
plot(wght, sol[:, 0], 'b', label='X')
plot(wght, sol[:, 1], 'g', label='y')
legend(loc='best')
xlabel('W')
grid()
show()
print(sol)
Output graph
Your graphic is not reproducible, in python 3.7, scipy 1.4.1, I get a divergence to practically infinity at around 12.5, and after cleaning the work space, both graphs moving to zero after time 10.
The problem is your division by 2*y in the second equation. In effect that means that y is the square root of some other function, and that type of function behaves badly numerically at small values as the tangent becomes vertical, changing to zero shortly after.
However, one can desingularize the division in a simple way
dy = -alpha*(1+epi*X)*y/(1e-10+2*y*y)
or in a more complicated way that leaves the solution far away from zero really unchanged
dy = -alpha*(1+epi*X)*y/(y*y+max(1e-12,y*y))
resulting in the graph
If that is still not the expected behavior, then something else is different in your ODE system relative to the Matlab version.
That the singular point is at about w=10 depends almost completely on alpha=0.1. As epi is small and X is initially small, the second equation is close to
d(y^2)/dW = -alpha ==> y^2 = 1 - alpha*W
and this hits zero at about W=10 where the solution would have to end as the square of y can not take negative values.

Scipy.integrate gives odd results; are there best practices?

I am still struggling with scipy.integrate.quad.
Sparing all the details, I have an integral to evaluate. The function is of the form of the integral of a product of functions in x, like so:
Z(k) = f(x) g(k/x) / abs(x)
I know for certain the range of integration is between tow positive numbers. Oddly, when I pick a wide range that I know must contain all values of x that are positive - like integrating from 1 to 10,000,000 - it intgrates fast and gives an answer which looks right. But when I fingure out the exact limits - which I know sice f(x) is zero over a lot of the real line - and use those, I get another answer that is different. They aren't very different, though I know the second is more accurate.
After much fiddling I got it to work OK, but then needed to add in an exopnentiation - I was at least getting a 'smooth' answer for the computed function of z. I had this working in an OK way before I added in the exponentiation (which is needed), but now the function that gets generated (z) becomes more and more oscillatory and peculiar.
Any idea what is happening here? I know this code comes from an old Fortran library, so there must be some known issues, but I can't find references.
Here is the core code:
def normal(x, mu, sigma) :
return (1.0/((2.0*3.14159*sigma**2)**0.5)*exp(-(x-
mu)**2/(2*sigma**2)))
def integrand(x, z, mu, sigma, f) :
return np.exp(normal(z/x, mu, sigma)) * getP(x, f._x, f._y) / abs(x)
for _z in range (int(z_min), int(z_max) + 1, 1000):
z.append(_z)
pResult = quad(integrand, lb, ub,
args=(float(_z), MU-SIGMA**2/2, SIGMA, X),
points = [100000.0],
epsabs = 1, epsrel = .01) # drop error estimate of tuple
p.append(pResult[0]) # drop error estimate of tuple
By the way, getP() returns a linearly interpolated, piecewise continuous,but non-smooth function to give the integrator values that smoothly fit between the discrete 'buckets' of the histogram.
As with many numerical methods, it can be very sensitive to asymptotes, zeros, etc. The only choice is to keep giving it 'hints' if it will accept them.

Parallelization of Piecewise Polynomial Evaluation

I am trying to evaluate points in a large piecewise polynomial, which is obtained from a cubic-spline. This takes a long time to do and I would like to speed it up.
As such, I would like to evaluate a points on a piecewise polynomial with parallel processes, rather than sequentially.
Code:
z = zeros(1e6, 1) ; % preallocate some memory for speed
Y = rand(11220,161) ; %some data, rand for generating a working example
X = 0 : 0.0125 : 2 ; % vector of data sites
pp = spline(X, Y) ; % get the piecewise polynomial form of the cubic spline.
The resulting structure is large.
for t = 1 : 1e6 % big number
hcurrent = ppval(pp,t); %evaluate the piecewise polynomial at t
z(t) = sum(x(t:t+M-1).*hcurrent,1) ; % do some operation of the interpolated value. Most likely not relevant to this question.
end
Unfortunately, with matrix form and using:
hcurrent = flipud(ppval(pp, 1: 1e6 ))
requires too much memory to process, so cannot be done. Is there a way that I can batch process this code to speed it up?
For scalar second arguments, as in your example, you're dealing with two issues. First, there's a good amount of function call overhead and redundant computation (e.g., unmkpp(pp) is called every loop iteration). Second, ppval is written to be general so it's not fully vectorized and does a lot of things that aren't necessary in your case.
Below is vectorized code code that take advantage of some of the structure of your problem (e.g., t is an integer greater than 0), avoids function call overhead, move some calculations outside of your main for loop (at the cost of a bit of extra memory), and gets rid of a for loop inside of ppval:
n = 1e6;
z = zeros(n,1);
X = 0:0.0125:2;
Y = rand(11220,numel(X));
pp = spline(X,Y);
[b,c,l,k,dd] = unmkpp(pp);
T = 1:n;
idx = discretize(T,[-Inf b(2:l) Inf]); % Or: [~,idx] = histc(T,[-Inf b(2:l) Inf]);
x = bsxfun(#power,T-b(idx),(k-1:-1:0).').';
idx = dd*idx;
d = 1-dd:0;
for t = T
hcurrent = sum(bsxfun(#times,c(idx(t)+d,:),x(t,:)),2);
z(t) = ...;
end
The resultant code takes ~34% of the time of your example for n=1e6. Note that because of the vectorization, calculations are performed in a different order. This will result in slight differences between outputs from ppval and my optimized version due to the nature of floating point math. Any differences should be on the order of a few times eps(hcurrent). You can still try using parfor to further speed up the calculation (with four already running workers, my system took just 12% of your code's original time).
I consider the above a proof of concept. I may have over-optmized the code above if your example doesn't correspond well to your actual code and data. In that case, I suggest creating your own optimized version. You can start by looking at the code for ppval by typing edit ppval in your Command Window. You may be able to implement further optimizations by looking at the structure of your problem and what you ultimately want in your z vector.
Internally, ppval still uses histc, which has been deprecated. My code above uses discretize to perform the same task, as suggested by the documentation.
Use parfor command for parallel loops. see here, also precompute z vector as z(j) = x(j:j+M-1) and hcurrent in parfor for speed up.
The Spline Parameters estimation can be written in Matrix form.
Once you write it in Matrix form and solve it you can use the Model Matrix to evaluate the Spline on all data point using Matrix Multiplication which is probably the most tuned operation in MATLAB.

Filtering signal: how to restrict filter that last point of output must equal the last point of input

Please help my poor knowledge of signal processing.
I want to smoothen some data. Here is my code:
import numpy as np
from scipy.signal import butter, filtfilt
def testButterworth(nyf, x, y):
b, a = butter(4, 1.5/nyf)
fl = filtfilt(b, a, y)
return fl
if __name__ == '__main__':
positions_recorded = np.loadtxt('original_positions.txt', delimiter='\n')
number_of_points = len(positions_recorded)
end = 10
dt = end/float(number_of_points)
nyf = 0.5/dt
x = np.linspace(0, end, number_of_points)
y = positions_recorded
fl = testButterworth(nyf, x, y)
I am pretty satisfied with results except one point:
it is absolutely crucial to me that the start and end point in returned values equal to the start and end point of input. How can I introduce this restriction?
UPD 15-Dec-14 12:04:
my original data looks like this
Applying the filter and zooming into last part of the graph gives following result:
So, at the moment I just care about the last point that must be equal to original point. I try to append copy of data to the end of original list this way:
the result is as expected even worse.
Then I try to append data this way:
And the slice where one period ends and next one begins, looks like that:
To do this, you're always going to cheat somehow, since the true filter applied to the true data doesn't behave the way you require.
One of the best ways to cheat with your data is to assume it's periodic. This has the advantages that: 1) it's consistent with the data you actually have and all your changing is to append data to the region you don't know about (so assuming it's periodic as as reasonable as anything else -- although may violate some unstated or implicit assumptions); 2) the result will be consistent with your filter.
You can usually get by with this by appending copies of your data to the beginning and end of your real data, or just small pieces, depending on your filter.
Since the FFT assumes that the data is periodic anyway, that's often a quick and easy approach, and is fully accurate (whereas concatenating the data is an estimation of an infinitely periodic waveform). Here's an example of the FFT approach for a step filter.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 128)
y = (np.sin(.22*(x+10))>0).astype(np.float)
# filter
y2 = np.fft.fft(y)
f0 = np.fft.fftfreq(len(x))
y2[(f0<-.25) | (f0>.25)] = 0
y3 = abs(np.fft.ifft(y2))
plt.plot(x, y)
plt.plot(x, y3)
plt.xlim(-10, 140)
plt.ylim(-.1, 1.1)
plt.show()
Note how the end points bend towards each other at either end, even though this is not consistent with the periodicity of the waveform (since the segments at either end are very truncated). This can also be seen by adjusting waveform so that the ends are the same (here I used x+30 instead of x+10, and here the ends don't need to bend to match-up so they stay at level with the end of the data.
Note, also, to have the endpoints actually be exactly equal you would have to extend this plot by one point (at either end), since it periodic with exactly the wavelength of the original waveform. Doing this is not ad hoc though, and the result will be entirely consistent with your analysis, but just representing one extra point of what was assumed to be infinite repeats all along.
Finally, this FFT trick works best with waveforms of length 2n. Other lengths may be zero padded in the FFT. In this case, just doing concatenations to either end as I mentioned at first might be the best way to go.
The question is how to filter data and require that the left endpoint of the filtered result matches the left endpoint of the data, and same for the right endpoint. (That is, in general, the filtered result should be close to most of the data points, but not necessarily exactly match any of them, but what if you need a match at both endpoints?)
To make the filtered result exactly match the endpoints of a curve, one could add a padding of points at either end of the curve and adjust the y-position of this padding so that the endpoints of the valid part of the filter exactly matched the end points of the original data (without the padding).
In general, this can be done by either iterating towards a solution, adjusting the padding y-position until the ends line up, or by calculating a few values and then interpolating to determine the y-positions that would be required for the matched endpoints. I'll do the second approach.
Here's the code I used, where I simulated the data as a sine wave with two flat pieces on either side (note, that these flat pieces are not the padding, but I'm just trying to make data that looks a bit like the OPs).
import numpy as np
from scipy.signal import butter, filtfilt
import matplotlib.pyplot as plt
#### op's code
def testButterworth(nyf, x, y):
#b, a = butter(4, 1.5/nyf)
b, a = butter(4, 1.5/nyf)
fl = filtfilt(b, a, y)
return fl
def do_fit(data):
positions_recorded = data
#positions_recorded = np.loadtxt('original_positions.txt', delimiter='\n')
number_of_points = len(positions_recorded)
end = 10
dt = end/float(number_of_points)
nyf = 0.5/dt
x = np.linspace(0, end, number_of_points)
y = positions_recorded
fx = testButterworth(nyf, x, y)
return fx
### simulate some data (op should have done this too!)
def sim_data():
t = np.linspace(.1*np.pi, (2.-.1)*np.pi, 100)
y = np.sin(t)
c = np.ones(10, dtype=np.float)
z = np.concatenate((c*y[0], y, c*y[-1]))
return z
### code to find the required offset padding
def fit_with_pads(v, data, n=1):
c = np.ones(n, dtype=np.float)
z = np.concatenate((c*v[0], data, c*v[1]))
fx = do_fit(z)
return fx
def get_errors(data, fx):
n = (len(fx)-len(data))//2
return np.array((fx[n]-data[0], fx[-n]-data[-1]))
def vary_padding(data, span=.005, n=100):
errors = np.zeros((4, n)) # Lpad, Rpad, Lerror, Rerror
offsets = np.linspace(-span, span, n)
for i in range(n):
vL, vR = data[0]+offsets[i], data[-1]+offsets[i]
fx = fit_with_pads((vL, vR), data, n=1)
errs = get_errors(data, fx)
errors[:,i] = np.array((vL, vR, errs[0], errs[1]))
return errors
if __name__ == '__main__':
data = sim_data()
fx = do_fit(data)
errors = vary_padding(data)
plt.plot(errors[0], errors[2], 'x-')
plt.plot(errors[1], errors[3], 'o-')
oR = -0.30958
oL = 0.30887
fp = fit_with_pads((oL, oR), data, n=1)[1:-1]
plt.figure()
plt.plot(data, 'b')
plt.plot(fx, 'g')
plt.plot(fp, 'r')
plt.show()
Here, for the padding I only used a single point on either side (n=1). Then I calculate the error for a range of values shifting the padding up and down from the first and last data points.
For the plots:
First I plot the offset vs error (between the fit and the desired data value). To find the offset to use, I just zoomed in on the two lines to find the x-value of the y zero crossing, but to do this more accurately, one could calculate the zero crossing from this data:
Here's the plot of the original "data", the fit (green) and the adjusted fit (red):
and zoomed in the RHS:
The important point here is that the red (adjusted fit) and blue (original data) endpoints match, even though the pure fit doesn't.
Is this a valid approach? Of the various options, this seems the most reasonable since one isn't usually making any claims about the data that isn't being shown, and also for show region has an accurately applied filter. For example, FFTs usually assume the data is zero or periodic beyond the boundaries. Certainly, though, to be precise one should explain what was done.

Resources