scipy.ndimage.correlate but updating of values while computing - python-3.x

I need an array of the sums of 3x3 neighboring cells with products based on a kernel of a different array with the same size (this is exactly scipy.ndimage.correlate up to this point). But when a value for the new array is calculated it has to be updated immediately instead of using the value from the original array for the next computation involving that value. I have written this slow code to implement it myself, which is working perfectly fine (although too slow for me) and delivering the expected result:
for x in range(width):
for y in range(height):
AArr[y,x] += laplaceNeighborDifference(x,y)
def laplaceNeighborDifference(x,y,z):
global w, h, AArr
return -AArr[y,x]+AArr[(y+1)%h,x]*.2+AArr[(y-1)%h,x]*.2+AArr[y,(x+1)%w]*.2+AArr[y,(x-1)%w]*.2+AArr[(y+1)%h,(x+1)%w]*.05+AArr[(y-1)%h,(x+1)%w]*.05+AArr[(y+1)%h,(x-1)%w]*.05+AArr[(y-1)%h,(x-1)%w]*.05
In my approach the kernel is coded directly. Although as an array (to be used as a kernel) it would be written like this:
[[.05,.2,.05],
[.2 ,-1,.2 ],
[.05,.2,.05]]
The SciPy implementation would work like this:
AArr += correlate(AArr, kernel, mode='wrap')
But obviously when I use scipy.ndimage.correlate it calculates the values entirely based on the original array and doesn't update them as it computes them. At least I think that is the difference between my implementation and the SciPy implementation, feel free to point out other differences if I've missed one. My question is if there is a similar function to the aforementioned with desired results or if there is an approach to code it which is faster than mine?
Thank you for your time!

You can use Numba to do that efficiently:
import numba as nb
#nb.njit
def laplaceNeighborDifference(AArr,w,h,x,y):
return -AArr[y,x]+AArr[(y+1)%h,x]*.2+AArr[(y-1)%h,x]*.2+AArr[y,(x+1)%w]*.2+AArr[y,(x-1)%w]*.2+AArr[(y+1)%h,(x+1)%w]*.05+AArr[(y-1)%h,(x+1)%w]*.05+AArr[(y+1)%h,(x-1)%w]*.05+AArr[(y-1)%h,(x-1)%w]*.05
#nb.njit('void(float64[:,::1],int64,int64)')
def compute(AArr,width,height):
for x in range(width):
for y in range(height):
AArr[y,x] += laplaceNeighborDifference(AArr,width,height,x,y)
Note that modulus are generally very slow. This is better to remove them by computing the border separately of the main loop. The resulting code should be much faster without any modulus.

Related

How to apply a Pearson Correlation Analysis over all pairs of pixels of a DataArray as a Correlation Matrix?

I am facing serious difficulties in generating a correlation matrix (pixel by pixel) of a single Netcdf with dimensions ('lon', 'lat', 'time'). My final intent is to generate what one calls a Teleconnectivity Map.
This Map is composed of correlation coefficients. Each pixel has a value that represents the highest correlation value (in module) found in the correlation matrix over all pairs of pixels in the DataArray.
Therefore, in order to create my Teleconnectivity Map, instead of looping over every longitude ('lon') and every latitude ('lat') and later checking all possible combinations of correlation for which one was higher in magnitude, I was thinking of applying the xr.apply_ufunction with a wrapped correlation function inside.
Despite my efforts, I still don't get what is truly happening behind the scenes in the xr.apply_ufunc. All I managed to do was as a single Resultant Matrix with all pixels equals to 1 (perfect correlation).
See code below:
import numpy as np
import xarray as xr
def correlation(x, y):
return np.corrcoef(x, y)[0,0] # to return a single correlation index, instead of a matriz
def wrapped_correlation(da, x, coord='time'):
"""Finds the correlation along a given dimension of a dataarray."""
from functools import partial
fpartial = partial(correlation, x.values)
return xr.apply_ufunc(fpartial,
da,
input_core_dims=[[coord]] ,
output_core_dims=[[]],
vectorize=True,
output_dtypes=[float]
)
# testing the wrapped correlation for a sample data:
ds = xr.tutorial.open_dataset('air_temperature').load()
# testing for a single point in space.
x = ds['air'].sel(dict(lon=1, lat=92), method='nearest')
# over all points in the DataArray
Corr_over_x = wrapped_correlation(ds['air'], x)
Corr_over_x# notice that the resultant DataArray is composed solely of ones (perfect correlation match). This is impossible. I would expect to have different values of correlation for each pixel in here
# if one would plot the data, I would be composed of a variety of correlation values (see example below):
Corr_over_x.plot()
This is an important asset for meteorologists and Remote Sensing researches. It allows the evaluation of potential geophysical patterns over a given area of study.
I thank you for your time, and I hope hearing from you soon.
Sincerely yours,
Firstly, you need to use np.corrcoef(x, y)[0,1]. In the end, you don't need to use partial at all, see below:
def correlation(x1, x2):
return np.corrcoef(x1, x2)[0,1] # to return a single correlation index, instead of a matriz
def wrapped_correlation(da, x, coord='time'):
"""Finds the correlation along a given dimension of a dataarray."""
return xr.apply_ufunc(correlation,
da,
x,
input_core_dims=[[coord],[coord]] ,
output_core_dims=[[]],
vectorize=True,
output_dtypes=[float]
)
I have managed to solve my question. The script has become a bit long. Nevertheless, it does what it was previously intended.
The code is adapted from this reference
Since it is too long to show a snippet in here, I am posting a link to my Github account in which the algorithm (organized in a package named Teleconnection_using_xarray_data) can be checked here.
The package has two modules with similar results.
The first module (teleconnection_with_connecting_pathways) is slower than the second (teleconnection_via_numpy), but it allows to evaluate the connecting pathways between the partial teleconnection maps.
The second, only returns the resultant teleconnection map, without the connecting lines (geopandas-Linestrings), though it is much faster.
Feel free to colaborate. If possible, I would like to combine both modules ensuring speed and pathway analyses in the Teleconnection algorithm.
Sincerely yours,
Philipe Leal

Random Index from a Tensor (Sampling with Replacement from a Tensor)

I'm trying to manipulate individual weights of different neural nets to see how their performance degrades. As part of these experiments, I'm required to sample randomly from their weight tensors, which I've come to understand as sampling with replacement (in the statistical sense). However, since it's high-dimensional, I've been stumped by how to do this in a fair manner. Here are the approaches and research I've put into considering this problem:
This was previously implemented by selecting a random layer and then selecting a random weight in that layer (ignore the implementation of picking a random weight). Since layers are different sizes, we discovered that weights were being sampled unevenly.
I considered what would happen if we sampled according to the numpy.shape of the tensor; however, I realize now that this encounters the same problem as above.
Consider what happens to a rank 2 tensor like this:
[[*, *, *],
[*, *, *, *]]
Selecting a row randomly and then a value from that row results in an unfair selection. This method could work if you're able to assert that this scenario never occurs, but it's far from a general solution.
Note that this possible duplicate actually implements it in this fashion.
I found people suggesting flattening the tensor and use numpy.random.choice to select randomly from a 1D array. That's a simple solution, except I have no idea how to invert the flattened tensor back into its original shape. Further, flattening millions of weights would be a somewhat slow implementation.
I found someone discussing tf.random.multinomial here, but I don't understand enough of it to know whether it's applicable or not.
I ran into this paper about resevoir sampling, but again, it went over my head.
I found another paper which specifically discusses tensors and sampling techniques, but it went even further over my head.
A teammate found this other paper which talks about random sampling from a tensor, but it's only for rank 3 tensors.
Any help understanding how to do this? I'm working in Python with Keras, but I'll take an algorithm in any form that it exists. Thank you in advance.
Before I forget to document the solution we arrived at, I'll talk about the two different paths I see for implementing this:
Use a total ordering on scalar elements of the tensor. This is effectively enumerating your elements, i.e. flattening them. However, you can do this while maintaining the original shape. Consider this pseudocode (in Python-like syntax):
def sample_tensor(tensor, chosen_index: int) -> Tuple[int]:
"""Maps a chosen random number to its index in the given tensor.
Args:
tensor: A ragged-array n-tensor.
chosen_index: An integer in [0, num_scalar_elements_in_tensor).
Returns:
The index that accesses this element in the tensor.
NOTE: Entirely untested, expect it to be fundamentally flawed.
"""
remaining = chosen_index
for (i, sub_list) in enumerate(tensor):
if type(sub_list) is an iterable:
if |sub_list| > remaining:
remaining -= |sub_list|
else:
return i joined with sample_tensor(sub_list, remaining)
else:
if len(sub_list) <= remaining:
return tuple(remaining)
First of all, I'm aware this isn't a sound algorithm. The idea is to count down until you reach your element, with bookkeeping for indices.
We need to make crucial assumptions here. 1) All lists will eventually contain only scalars. 2) By direct consequence, if a list contains lists, assume that it also doesn't contain scalars at the same level. (Stop and convince yourself for (2).)
We also need to make a critical note here too: We are unable to measure the number of scalars in any given list, unless the list is homogeneously consisting of scalars. In order to avoid measuring this magnitude at every point, my algorithm above should be refactored to descend first, and subtract later.
This algorithm has some consequences:
It's the fastest in its entire style of approaching the problem. If you want to write a function f: [0, total_elems) -> Tuple[int], you must know the number of preceding scalar elements along the total ordering of the tensor. This is effectively bound at Theta(l) where l is the number of lists in the tensor (since we can call len on a list of scalars).
It's slow. It's too slow compared to sampling nicer tensors that have a defined shape to them.
It begs the question: can we do better? See the next solution.
Use a probability distribution in conjunction with numpy.random.choice. The idea here is that if we know ahead of time what the distribution of scalars is already like, we can sample fairly at each level of descending the tensor. The hard problem here is building this distribution.
I won't write pseudocode for this, but lay out some objectives:
This can be called only once to build the data structure.
The algorithm needs to combine iterative and recursive techniques to a) build distributions for sibling lists and b) build distributions for descendants, respectively.
The algorithm will need to map indices to a probability distribution respective to sibling lists (note the assumptions discussed above). This does require knowing the number of elements in an arbitrary sub-tensor.
At lower levels where lists contain only scalars, we can simplify by just storing the number of elements in said list (as opposed to storing probabilities of selecting scalars randomly from a 1D array).
You will likely need 2-3 functions: one that utilizes the probability distribution to return an index, a function that builds the distribution object, and possibly a function that just counts elements to help build the distribution.
This is also faster at O(n) where n is the rank of the tensor. I'm convinced this is the fastest possible algorithm, but I lack the time to try to prove it.
You might choose to store the distribution as an ordered dictionary that maps a probability to either another dictionary or the number of elements in a 1D array. I think this might be the most sensible structure.
Note that (2) is truly the same as (1), but we pre-compute knowledge about the densities of the tensor.
I hope this helps.

Apply function on coordinate pair along particular axis using multiple variables in Xarray

My xarray Dataset has three dimensions,
x,y,t
and 2 variables,
foo, bar
I would like to apply function baz() on every x, y coordinate pair's time series t
baz() will accept an array of foo-s and and array of bar-s for a given (x, y)
I'm having a tough time understanding whether or not built in structures to handle/distribute this exists in either, xarray, pandas, numpy, or dask.
Any hints?
My current approach is writing a python array iterator, or exploring ufunc:
The issues with iterating over something in python is that I have to deal with concurrency myself and that there is overhead in the iteration.
The issues with ufunc is that I don't understand it enough and seems like it's a function that is applied to individual elements of the array, not subsets along axes.
The hopeful part of me is looking for something that is xarray native or daskable.
Sounds (vaguely) like you are looking for Dataset.reduce:
ds.reduce(baz, dim=('x', 'y'))

Problems with a function and odeint in python

For a few months I started working with python, considering the great advantages it has. But recently, i used odeint from scipy to solve a system of differential equations. But during the integration process the implemented function doesn't work as expected.
In this case, I want to solve a system of differential equations where one of the initial conditions (x[0]) varies (between 4-5) depending on the value that the variable reaches during the integration process (It is programmed inside of the function by means of the if structure).
#Control of oxygen
SO2_lower=4
SO2_upper=5
if x[0]<=SO2_lower:
x[0]=SO2_upper
When the function is used by odeint, some lines of code inside the function are obviated, even when the functions changes the value of x[0]. Here is all my code:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
plt.ion()
# Stoichiometric parameters
YSB_OHO_Ox=0.67 #Yield for XOHO growth per SB (Aerobic)
YSB_Stor_Ox=0.85 #Yield for XOHO,Stor formation per SB (Aerobic)
YStor_OHO_Ox=0.63 #Yield for XOHO growth per XOHO,Stor (Aerobic)
fXU_Bio_lys=0.2 #Fraction of XU generated in biomass decay
iN_XU=0.02 #N content of XU
iN_XBio=0.07 #N content of XBio
iN_SB=0.03 #N content of SB
fSTO=0.67 #Stored fraction of SB
#Kinetic parameters
qSB_Stor=5 #Rate constant for XOHO,Stor storage of SB
uOHO_Max=2 #Maximum growth rate of XOHO
KSB_OHO=2 #Half-saturation coefficient for SB
KStor_OHO=1 #Half-saturation coefficient for XOHO,Stor/XOHO
mOHO_Ox=0.2 #Endogenous respiration rate of XOHO (Aerobic)
mStor_Ox=0.2 #Endogenous respiration rate of XOHO,Stor (Aerobic)
KO2_OHO=0.2 #Half-saturation coefficient for SO2
KNHx_OHO=0.01 #Half-saturation coefficient for SNHx
#Other parameters
DT=1/86400.0
def f(x,t):
#Control of oxygen
SO2_lower=4
SO2_upper=5
if x[0]<=SO2_lower:
x[0]=SO2_upper
M=np.matrix([[-(1.0-YSB_Stor_Ox),-1,iN_SB,0,0,YSB_Stor_Ox],
[-(1.0-YSB_OHO_Ox)/YSB_OHO_Ox,-1/YSB_OHO_Ox,iN_SB/YSB_OHO_Ox-iN_XBio,0,1,0],
[-(1.0-YStor_OHO_Ox)/YStor_OHO_Ox,0,-iN_XBio,0,1,-1/YStor_OHO_Ox],
[-(1.0-fXU_Bio_lys),0,iN_XBio-fXU_Bio_lys*iN_XU,fXU_Bio_lys,-1,0],
[-1,0,0,0,0,-1]])
R=np.matrix([[DT*fSTO*qSB_Stor*(x[0]/(KO2_OHO+x[0]))*(x[1]/(KSB_OHO+x[1]))*x[4]],
[DT*(1-fSTO)*uOHO_Max*(x[0]/(KO2_OHO+x[0]))*(x[1]/(KSB_OHO+x[1]))* (x[2]/(KNHx_OHO+x[2]))*x[4]],
[DT*uOHO_Max*(x[0]/(KO2_OHO+x[0]))*(x[2]/(KNHx_OHO+x[2]))*((x[5]/x[4])/(KStor_OHO+(x[5]/x[4])))*(KSB_OHO/(KSB_OHO+x[1]))*x[4]],
[DT*mOHO_Ox*(x[0]/(KO2_OHO+x[0]))*x[4]],
[DT*mStor_Ox*(x[0]/(KO2_OHO+x[0]))*x[5]]])
Mt=M.transpose()
MxRm=Mt*R
MxR=MxRm.tolist()
return ([MxR[0][0],
MxR[1][0],
MxR[2][0],
MxR[3][0],
MxR[4][0],
MxR[5][0]])
#ODE solution
t=np.linspace(0.0,3600,3600)
#Initial conditions
y0=np.array([5,176,5,30,100,5])
Var=odeint(f,y0,t,args=(),h0=1,hmin=1,hmax=1,atol=1e-5,rtol=1e-5)
Sol=Var.tolist()
plt.plot(t,Var[:,0])
Thanks very much in advance!!!!!
Short answer:
You should not modify input state vector inside your ODE function. Instead try the following and verify your results:
x0 = x[0]
if x0<=SO2_lower:
x0=SO2_upper
# use x0 instead of x[0] in the rest of this function body
I suppose that this is your problem, but I am not sure, since you did not explain what exactly was wrong with the results. Moreover, you do not change "initial condition". Initial condition is
y0=np.array([5,176,5,30,100,5])
you just change the input state vector.
Detailed answer:
Your odeint integrator is probably using one of the higher order adaptive Runge-Kutta methods. This algorithm requires multiple ODE function evaluations to calculate single integration step, therefore changing the input state vector may lead to undefined results. In C++ boost::odeint this is even not possible to do so, because input variable is "const". Python however is not as strict as C++ and I suppose that it is possible to make this kind of bug unintentionally (I did not try it, though).
EDIT:
OK, I understand what you want to achieve.
Your variable x[0] is constrained by modular algebra and it is not possible to express in the form
x' = f(x,t)
which is one of the possible definitions of the Ordinary Differential Equation, that ondeint library is meant to solve. However, few possible "hacks" can be used here to bypass this limitation.
One possibility is to use a fixed step and low order (because for higher order solvers you need to know, which part of the algorithm you are actually in, see RK4 for example) solver and change your dx[0] equation (in your code it is MxR[0][0] element) to:
# at the beginning of your system
if (x[0] > S02_lower): # everything is normal here
x0 = x[0]
dx0 = # normal equation for dx0
else: # x[0] is too low, we must somehow force it to become S02_upper again
dx0 = (x[0] - S02_upper)/step_size // assuming that x0_{n+1} = x0_{n} + dx0*step_size
x0 = S02_upper
# remember to use x0 in the rest of your code and also remember to return dx0
However, I do not recommend this technique, because it makes you strongly dependent on the algorithm and you must know the exact step size (although, I may recommend it for saturation constraints). Another possibility is to perform a single integration step at a time and correct your x0 each time it is necessary:
// 1 do_step(sys, in, dxdtin, t, out, dt);
// 2 do something with output
// 3 in = out
// 4 return to 1 or finish
Sorry for C++ syntax, here is the exhaustive documentation (C++ odeint steppers), and here is its equivalent in python (Python ode class). C++ odeint interface is better for your task, however you may achieve exactly the same in python. Just look for:
integrate(t[, step, relax])
set_initial_value(y[, t])
in docs.

How to find a regression line for a closed set of data with 4 parameters in matlab or excel?

I have a set of data I have acquired from simulations. There are 3 parameters that go into my simulations and I get one result out.
I can graph the data from the small subset i have and see the trends for each input, but I need to be able to extrapolate this and get some form of a regression equation seeing as the simulation takes a long time.
In matlab or excel, is it possible to list the inputs and outputs to obtain a 4 parameter regression line for a given set of information?
Before this gets flagged as a duplicate, i understand polyfit will give me an equation of best fit and will be as accurate as i want it, but i need the equation to correspond to the inputs, not just a regression line.
In other words if i 20 simulations of inputs a, b, c and output y, is there a way to obtain a "best fit":
y=B0+B1*a+B2*b+B3*c
using the data?
My usual recommendation for higher-dimensional curve fitting is to pose the problem as a minimization problem (that may be unneeded here with the nice linear model you've proposed, but I'm a hammer-nail guy sometimes).
It starts by creating a correlation function (the functional form you think maps your inputs to the output) given a vector of fit parameters p and input data xData:
correl = #(p,xData) p(1) + p(2)*xData(:,1) + p(3)*xData(:2) + p(4)*xData(:,3)
Then you need to define a function to minimize given the parameter vector, which I call the objective; this is typically your correlation minus you output data.
The details of this function are determined from the solver you'll use (see below).
All of the method need a starting vector pGuess, which is dependent on the trends you see.
For nonlinear correlation function, finding a good pGuess can be a trial but necessary for a good solution.
fminsearch
To use fminsearch, the data must be collapsed to a scalar value using some norm (2 here):
x = [a,b,c]; % your input data as columns of x
objective = #(p) norm(correl(p,x) - y,2);
p = fminsearch(objective,pGuess); % you need to define a good pGuess
lsqnonlin
To use lsqnonlin (which solves the same problem as above in different ways), the norm-ing of the objective is not needed:
objective = #(p) correl(p,x) - y ;
p = lsqnonlin(objective,pGuess); % you need to define a good pGuess
(You can also specify lower and upper bounds on the parameter solution, which is nice.)
lsqcurvefit
To use lsqcurvefit (which is simply a wrapper for lsqnonlin), only the correlation function is needed along with the data:
p = lsqcurvefit(correl,pGuess,x,y); % you need to define a good pGuess

Resources