AR model with MATLAB - statistics

I'm using the following code, taken from the MATLAB documentation, to estimate the parameters of an AR model:
y = sin([1:300]') + 0.5 * randn(300, 1);
y = iddata(y);
mb = ar(y, 4, 'burg');
At this point, if I type mb, this is what I get:
Discrete-time IDPOLY model:
A(q)y(t) = e(t)
A(q) = 1 - 0.2764 q^-1 + 0.2069 q^-2 + 0.4804 q^-3 + 0.1424 q^-4
Estimated using AR ('burg'/'now') from data set y
Loss function 0.314965 and FPE 0.323364
Sampling interval: 1
How can I use the variable mb I obtained to generate samples with those coefficients?
mb doesn't look like a vector.
In particular, how can I handle missing data?

Use sim(mb, input).
From the documentation for sim:
Simulate linear models.
y = sim(m,ue)
[y, ysd] = sim(m,ue,init)
m is an arbitrary idmodel object.
ue is an iddata object, containing inputs only. The number of input
channels in ue must either be equal to the number of inputs of the
model m, or equal to the sum of the number of inputs and noise sources
(= number of outputs). In the latter case the last inputs in ue are
regarded as noise sources and a noise-corrupted simulation is
obtained. The noise is scaled according to the property
m.NoiseVariance in m, so in order to obtain the right noise level
according to the model, the noise inputs should be white noise with
zero mean and unit covariance matrix. If no noise sources are
contained in ue, a noise-free simulation is obtained.
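
For readers working outside MATLAB, the noise-corrupted simulation described above can be sketched in Python. This is only an illustration, assuming NumPy: it drives the recursion A(q)y(t) = e(t) with white noise, using the coefficients printed for mb. The helper name simulate_ar is hypothetical, and taking the noise variance from the reported loss function is an assumption about how that value relates to m.NoiseVariance.

```python
import numpy as np

def simulate_ar(a, n_samples, noise_var=1.0, seed=None):
    """Simulate A(q) y(t) = e(t), where a = [1, a1, ..., ap] (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    p = len(a) - 1
    # White noise with zero mean, scaled to the assumed noise variance
    e = np.sqrt(noise_var) * rng.standard_normal(n_samples)
    y = np.zeros(n_samples)
    for t in range(n_samples):
        # y(t) = e(t) - a1*y(t-1) - ... - ap*y(t-p), with zero initial conditions
        acc = e[t]
        for k in range(1, p + 1):
            if t - k >= 0:
                acc -= a[k] * y[t - k]
        y[t] = acc
    return y, e

# Coefficients of A(q) as printed for mb in the question
a = [1.0, -0.2764, 0.2069, 0.4804, 0.1424]
# noise_var is assumed to equal the reported loss function value
y_sim, e_sim = simulate_ar(a, n_samples=300, noise_var=0.314965, seed=0)
```

Setting noise_var to 0 would correspond to the noise-free case, which for a pure AR model with zero initial conditions is identically zero.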


How to fix the issue of plotting a 2D sine wave in python

I want to generate a 2D travelling sine wave. To do this, I've set the parameters for the plane wave and generate the wave at any time instant as follows:
import numpy as np
import random
import matplotlib.pyplot as plt
f = 10 # frequency
fs = 100 # sample frequency
Ts = 1/fs # sample period
t = np.arange(0,0.5, Ts) # time index
c = 50 # speed of wave
w = 2*np.pi *f # angular frequency
k = w/c # wave number
resolution = 0.02
x = np.arange(-5, 5, resolution)
y = np.arange(-5, 5, resolution)
dx = np.array(x); M = len(dx)
dy = np.array(y); N = len(dy)
[xx, yy] = np.meshgrid(x, y);
theta = np.pi / 4 # direction of propagation
kx = k* np.cos(theta)
ky = k * np.sin(theta)
So, the plane wave would be
plane_wave = np.sin(kx * xx + ky * yy - w * t[1])
plt.imshow(plane_wave,cmap='seismic',origin='lower', aspect='auto')
which gives a smooth plane wave. The variation of the sine wave along one row is plotted with plt.figure(); plt.plot(plane_wave[2,:]).
However, when I append plane waves at different time instants, discontinuities arise (figures 03 and 04), and I want to get rid of them.
I'm new to Python and any help will be highly appreciated. Thanks in advance.
arr = []
for count in range(len(t)):
    p = np.sin(kx * xx + ky * yy - w * t[count])  # plane wave at time t[count]
    arr.append(p)
arr = np.array(arr)
pp, q, r = arr.shape
sig = np.reshape(arr, (-1, r))
print('The signal shape is :', sig.shape)
plt.figure(); plt.imshow(sig.transpose(), cmap='seismic', origin='lower', aspect='auto')
plt.xlabel('X'); plt.ylabel('Y')
plt.figure(); plt.plot(sig[2,:])
This is not that much a problem of programming. It has to do more with the fact that you are using the physical quantities in a somewhat unusual way. Your plots are absolutely fine and correct.
What you seem to have overlooked is that this is a 2D problem with a third dimension added for time. There is nothing wrong with that, but if you append the snapshots of the 2D wave side by side, you are using the x spatial dimension (again) to represent temporal variation, which makes the use of that coordinate axis inconsistent. To make this more intuitive, consider two time instants separately. Does it not match your intuition that every point on the 2D plane has a different amplitude at the two instants (unless, of course, time has advanced by a multiple of the wave's period)? This is indeed the case, so when you append the two snapshots a discontinuity appears. To avoid it you would have to use either a time step equal to one period, which I believe is of no practical use, or a constant time step that makes the phase of the wave on the left border of the image at the current time equal to the phase on the right border of the image at the previous time step. Yet this will always be a constant time step, alternating the phase (on the edges of the image) between the two said values.
The same applies to the 1D case because you use the two coordinate axes to represent the wave (x is the x spatial dimension and y is used to represent the amplitude). This is what can be seen in your last plot.
Now, what would be the solution you may ask. The solution is provided by simple inspection of the mathematical formula of the wave function. In 2D, it is a scalar function of three variables (that is, takes as input three values and outputs one) and so you need at least four dimensions to represent it. Alas, we can't perceive a fourth spatial dimension, but this is not a problem in your case as the output of the function is represented with colors. Then there are three dimensions that could be used to represent the temporal evolution of your function. All you have to do is to create a 3D array where the third dimension represents time and all 2D snapshots will be stored in the first two dimensions.
When it comes to visual representation of the results you could either use some kind of waterfall plots where the z-axis will represent time or utilize the fourth dimension we can perceive, time that is, to create an animation of the evolution of the wave.
I am not very familiar with Python, so I will only provide a generic, naive implementation. I am sure a lot of people here could simplify or optimise the following snippet. I assume that everything in your first two blocks of code is available, so only the last block you present has to change:
arr = np.zeros(xx.shape + (len(t),))  # Initialise the array that holds the temporal evolution of the snapshots
for i in range(len(t)):
    arr[:, :, i] = np.sin(kx * xx + ky * yy - w * t[i])
# Below you can plot the figures with any function you prefer or make an animation out of it
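
As one possible simplification of the loop above, the same 3D array can be built in a single expression with NumPy broadcasting. This is a sketch rather than a definitive implementation: it repeats the parameters from the question, uses a coarser grid (an assumption, only to keep the array small), and the name snapshots is hypothetical.

```python
import numpy as np

# Wave parameters from the question
f, fs, c = 10, 100, 50            # frequency, sample frequency, wave speed
t = np.arange(0, 0.5, 1 / fs)     # time index
w = 2 * np.pi * f                 # angular frequency
k = w / c                         # wave number
theta = np.pi / 4                 # direction of propagation
kx, ky = k * np.cos(theta), k * np.sin(theta)

# Coarser grid than in the question, to keep the 3D array small
x = np.arange(-5, 5, 0.1)
y = np.arange(-5, 5, 0.1)
xx, yy = np.meshgrid(x, y)

# A trailing axis on the grids lets t broadcast along the third dimension
snapshots = np.sin(kx * xx[:, :, None] + ky * yy[:, :, None] - w * t[None, None, :])

# snapshots[:, :, i] is the 2D wave at time t[i]
frame = snapshots[:, :, 1]
```

Each frame can then be passed to plt.imshow on its own, so the spatial axes are never reused to represent time.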

FFT loss in PyTorch

I want to compute the loss between the GT and the output of my network (called TDN) in the frequency domain by computing 2D FFT. The tensors are of dim batch x channel x height x width
amp_ip, phase_ip = 2DFFT(TDN(ip))
amp_gt, phase_gt = 2DFFT(TDN(gt))
loss = ||amp_ip - amp_gt||
For computing the FFT I can use torch.fft(ip, signal_ndim = 2). But the output is in a + j b format, i.e. rectangular coordinates, and NOT decomposed into phase and amplitude. How can I convert a + j b into amp exp(j phase) format in PyTorch? A side concern is whether signal_ndim should be kept at 2 to compute the 2D FFT, or set to something else.
The following description, which describes the loss that I plan to implement, may be useful.
The question is answered by the GitHub code file shared by @akshayk07 in the comments. Extracting the relevant information from that code, the concise answer to the question is:
fft_im = torch.rfft(img.clone(), signal_ndim=2, onesided=False)
# fft_im: size should be bx3xhxwx2
fft_amp = fft_im[:,:,:,:,0]**2 + fft_im[:,:,:,:,1]**2
fft_amp = torch.sqrt(fft_amp) # this is the amplitude
fft_pha = torch.atan2( fft_im[:,:,:,:,1], fft_im[:,:,:,:,0] ) # this is the phase
As of PyTorch 1.7.1, choose torch.rfft over torch.fft, as the latter does not work off the shelf with the real-valued tensors propagating in CNNs. It is also a good idea to use the normalized flag of torch.rfft.
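
In PyTorch 1.8 and later, torch.rfft was removed in favour of the torch.fft module, so the snippet above needs adapting there. A minimal sketch of the same amplitude/phase decomposition follows, assuming torch.fft.fft2 is available; the helper names fft_amp_phase and amplitude_loss are hypothetical, and the mean absolute difference is only one assumed choice for the norm in the question.

```python
import torch

def fft_amp_phase(img):
    """Return amplitude and phase of the 2D FFT over the last two dims (hypothetical helper)."""
    spec = torch.fft.fft2(img, dim=(-2, -1))   # complex tensor, same shape as img
    return torch.abs(spec), torch.angle(spec)

def amplitude_loss(pred, target):
    """Mean absolute difference between FFT amplitudes (assumed choice of norm)."""
    amp_pred, _ = fft_amp_phase(pred)
    amp_tgt, _ = fft_amp_phase(target)
    return torch.mean(torch.abs(amp_pred - amp_tgt))

# Random stand-ins for the network output and the ground truth: batch x channel x height x width
pred = torch.randn(2, 3, 8, 8, requires_grad=True)
target = torch.randn(2, 3, 8, 8)

loss = amplitude_loss(pred, target)
loss.backward()   # gradients flow back through the FFT to pred
```

Here torch.abs and torch.angle play the role of the explicit sqrt and atan2 computations on the real and imaginary parts.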

What is the upsampling method called 'area' used for?

The PyTorch function torch.nn.functional.interpolate contains several modes for upsampling, such as: nearest, linear, bilinear, bicubic, trilinear, area.
What is the area upsampling mode used for?
As jodag said, it is resizing using adaptive average pooling. While the answer at the link aims to explain what adaptive average pooling is, I find the explanation a bit vague.
TL;DR the area mode of torch.nn.functional.interpolate is probably one of the most intuitive ways to think of when one wants to downsample an image.
You can think of it as applying an averaging Low-Pass Filter(LPF) to the original image and then sampling. Applying an LPF before sampling is to prevent potential aliasing in the downsampled image. Aliasing can result in Moiré patterns in the downscaled image.
It is probably called "area" because it (roughly) preserves the area ratio between the input and output shapes when averaging the input pixels. More specifically, every pixel in the output image is the average of a corresponding region in the input image, and the reciprocal of that region's area is roughly the ratio between the output image's area and the input image's area.
Furthermore, the interpolate function with mode = 'area' calls the source function adaptive_avg_pool2d (implemented in C++), which assigns each pixel in the output tensor the average of all pixel intensities within a computed region of the input. That region is computed per pixel and can vary in size for different pixels. It is computed by multiplying the output pixel's row and column indices by the ratio between the input and output height and width, respectively, and then taking the floor (for the region's starting index) and, using the index plus one, the ceil (for the region's ending index) of the resulting value.
Here's an in-depth analysis of what happens in nn.AdaptiveAvgPool2d:
First of all, as stated there you can find the source code for adaptive average pooling (in C++) here: source
Taking a look at the function where the magic happens (or at least the magic on CPU for a single frame), static void adaptive_avg_pool2d_single_out_frame, we have 5 nested loops, running over channel dimension, then width, then height and within the body of the 3rd loop the magic happens:
First compute the region within the input image which is used to calculate the value of the current pixel (recall we had width and height loop to run over all pixels in the output).
How is this done?
Using a simple computation of start and end indices for height and width as follows: floor((input_height/output_height) * current_output_pixel_height) for the start and ceil((input_height/output_height) * (current_output_pixel_height+1)) and similarly for the width.
Then, all that is done is to simply average the intensities of all pixels in that region and current channel and place the result in the current output pixel.
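
To make the index computation above concrete, here is a small pure-Python sketch along a single dimension. The function name pooling_regions is hypothetical; it only reproduces the floor/ceil formulas quoted above and does not call PyTorch.

```python
def pooling_regions(n_in, n_out):
    """Return the [start, end) input index range averaged for each output index."""
    regions = []
    for i in range(n_out):
        start = (i * n_in) // n_out            # floor(i * n_in / n_out)
        end = -((-(i + 1) * n_in) // n_out)    # ceil((i + 1) * n_in / n_out)
        regions.append((start, end))
    return regions

print(pooling_regions(5, 3))   # [(0, 2), (1, 4), (3, 5)]
print(pooling_regions(6, 3))   # [(0, 2), (2, 4), (4, 6)]
```

When the input size is not a multiple of the output size (5 to 3), the regions differ in size and overlap; when it is (6 to 3), they are equal and disjoint, as in ordinary average pooling.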
I wrote a simple Python snippet that does the same thing, in the same fashion (loops, naive) and produces equivalent results. It takes tensor a and uses adaptive average pool to resize a to shape output_shape in 2 ways - once using the built-in nn.AdaptiveAvgPool2d and once with my translation into Python of the source function in C++: static void adaptive_avg_pool2d_single_out_frame. Built-in function's result is saved into b and my translation is saved into b_hat. You can see that the results are equivalent (you can further play with the spatial shapes and validate this):
import torch
from math import floor, ceil
from torch import nn
a = torch.randn(1, 3, 15, 17)
out_shape = (10, 11)
b = nn.AdaptiveAvgPool2d(out_shape)(a)
b_hat = torch.zeros(b.shape)
for d in range(a.shape[1]):
    for w in range(b_hat.shape[3]):
        for h in range(b_hat.shape[2]):
            startW = floor(w * a.shape[3] / out_shape[1])
            endW = ceil((w + 1) * a.shape[3] / out_shape[1])
            startH = floor(h * a.shape[2] / out_shape[0])
            endH = ceil((h + 1) * a.shape[2] / out_shape[0])
            b_hat[0, d, h, w] = torch.mean(a[0, d, startH: endH, startW: endW])
# Prints a mean squared error of 0 (or a very small number, due to precision error),
# which shows that both outputs are the same:
print(nn.MSELoss()(b_hat, b))
Looking at the source code it appears area interpolation is equivalent to resizing a tensor via adaptive average pooling. You can refer to this question for an explanation of adaptive average pooling. Therefore area interpolation is more applicable to downsampling than upsampling.
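
This equivalence can be checked directly. The sketch below assumes a 4D input and an explicit target size; it compares torch.nn.functional.interpolate in area mode with torch.nn.functional.adaptive_avg_pool2d on a random tensor.

```python
import torch
import torch.nn.functional as F

a = torch.randn(1, 3, 15, 17)
out_shape = (10, 11)

# Downsample once with area interpolation and once with adaptive average pooling
area = F.interpolate(a, size=out_shape, mode='area')
pooled = F.adaptive_avg_pool2d(a, out_shape)

# Should print True if the two paths compute the same result
print(torch.allclose(area, pooled))
```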

How do I build a probability matrix output layer in Keras

Suppose I need to build a network that takes two inputs:
A patient's information, represented as an array of features
Selected treatment, represented as one-hot encoded array
Now how do I build a network that outputs a 2D probability matrix A where A[i,j] represents the probability the patient will end up at state j under treatment i. Let's say there are n possible states, and under any treatment, the total probability of all n states sums up to 1.
I wanted to do this because I was motivated by a similar network, where the inputs are the same as above, but the output is a 1d array representing the expected lifetime after treatment i is delivered. And such network is built as follows:
def default_dense(feature_shape, n_treatments):
    feature_input = keras.layers.Input(feature_shape)
    treatment_input = keras.layers.Input((n_treatments,))
    hidden_1 = keras.layers.Dense(16, activation = 'relu')(feature_input)
    hidden_2 = keras.layers.Dense(16, activation = 'relu')(hidden_1)
    output = keras.layers.Dense(n_treatments)(hidden_2)
    output_on_action = keras.layers.multiply([output, treatment_input])
    model = keras.models.Model([feature_input, treatment_input], output_on_action)
    return model
And the training is simply model.fit(x = [features, encoded_treatments], y = encoded_treatments * lifetime[:, np.newaxis], verbose = 0)
This is super handy because when predicting, I can use np.ones() as the encoded_treatments, and the network gives expected lifetimes under all treatments, thus choosing the best one is one-step. Certainly I can create multiple networks, each for a treatment, but it would be much less efficient.
Now the question is, can I do the same for a probability output?
I have figured it out myself. The trick is to use RepeatVector() and Permute() layers to generate a matrix mask for treatments.
The output is an element-wise Multiply() of the mask and a Softmax() of same size.
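
A minimal sketch of how these layers might fit together is given below. It is one possible arrangement, not necessarily the exact network used: the function name probability_matrix_model, the single hidden layer, and the Reshape step that turns a flat Dense output into an n_treatments x n_states matrix are all assumptions.

```python
import numpy as np
try:
    from tensorflow import keras
except ImportError:  # fall back to a standalone Keras installation
    import keras

def probability_matrix_model(n_features, n_treatments, n_states):
    """Hypothetical model returning a masked (n_treatments, n_states) probability matrix."""
    feature_input = keras.layers.Input(shape=(n_features,))
    treatment_input = keras.layers.Input(shape=(n_treatments,))

    hidden = keras.layers.Dense(16, activation='relu')(feature_input)
    logits = keras.layers.Dense(n_treatments * n_states)(hidden)
    logits = keras.layers.Reshape((n_treatments, n_states))(logits)
    probs = keras.layers.Softmax(axis=-1)(logits)   # each treatment row sums to 1

    # Build the mask: (batch, n_treatments) -> (batch, n_states, n_treatments)
    mask = keras.layers.RepeatVector(n_states)(treatment_input)
    # Swap the last two axes -> (batch, n_treatments, n_states)
    mask = keras.layers.Permute((2, 1))(mask)

    output = keras.layers.Multiply()([probs, mask])
    return keras.models.Model([feature_input, treatment_input], output)

model = probability_matrix_model(n_features=5, n_treatments=3, n_states=4)
features = np.random.rand(2, 5).astype('float32')

# An all-ones treatment vector returns the full matrix A for every patient
all_treatments = np.ones((2, 3), dtype='float32')
A = model.predict([features, all_treatments], verbose=0)   # shape (2, 3, 4)
```

With a one-hot treatment vector only the selected row of A is non-zero, which mirrors how the lifetime network in the question is trained and queried.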

sklearn customized standardization of data

Suppose I have a 2D numpy array:
X = np.array([
[..., ...],
[..., ...]])
And I want to standardize the data either with:
X = StandardScaler().fit_transform(X)
X = (X - X.mean())/X.std()
The results are different. Why are they different?
Assuming X is a feature matrix of shape (n x m) (n instances and m features). We want to scale each feature so its instances are distributed with a mean of zero and with unit variance.
To do this you need to calculate the mean and standard deviation of each feature over the provided instances (each column of X) and then calculate the scaled feature vectors. Currently you are calculating the mean and standard deviation of the whole dataset and scaling the data using these values: this will give you meaningless results in all but a few special cases (e.g., when every feature already has the same mean and standard deviation).
Practically, to calculate these statistics for each feature you need to set the axis parameter of the .mean() and .std() methods to 0. This performs the calculations along the columns and returns a (1 x m) shaped array (actually an (m,) array, but that's another story), where each value is the mean or standard deviation of the given column. You can then use numpy broadcasting to correctly scale the feature vectors.
The below example shows how you can correctly implement it manually. x1 and x2 are 2 features with 100 training instances. We store them in a feature matrix X.
import numpy as np
from sklearn.preprocessing import StandardScaler
x1 = np.linspace(0, 100, 100)
x2 = 10 * np.random.normal(size=100)
X = np.c_[x1, x2]
# scale the data using the sklearn implementation
X_scaled = StandardScaler().fit_transform(X)
# scale the data taking mean and std along columns
X_scaled_manual = (X - X.mean(axis=0)) / X.std(axis=0)
If you print the two you will see that they match; the difference between X_scaled and X_scaled_manual is 0.0 (up to floating-point error).
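
To see why the global version differs, the per-column statistics of both results can be compared. The sketch below uses only NumPy; the seeded generator and the names X_global and X_column are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two features on very different scales, 100 instances each
X = np.c_[np.linspace(0, 100, 100), 10 * rng.normal(size=100)]

# One mean and one standard deviation for the whole array
X_global = (X - X.mean()) / X.std()
# One mean and one standard deviation per feature (column)
X_column = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_global.mean(axis=0), X_global.std(axis=0))   # generally not 0 and 1 per column
print(X_column.mean(axis=0), X_column.std(axis=0))   # approximately 0 and 1 per column
```

Only the per-column version gives every feature zero mean and unit variance, which is the scaling StandardScaler is meant to apply.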
