I have a data set generated by a sensor that records the pressure over a period of time at a specific location. The sensor I'm using is low-cost, not research-grade. I also have access to a data set from a research-grade sensor, and the two data sets are correlated. However, there is a large offset and a scaling difference between them. I would like to know if it's possible to use Python to fit my data set to the research-grade data set by determining the offset and the scaling factor. The two data sets have a similar shape; the only differences are the offset and the factor.
Thanks
Assuming you are looking for a simple scaling factor:
import numpy as np

def getScalingFactor(data1, data2, order=2):
    # Ratio of the norms of the two data sets
    return np.linalg.norm(data1, order) / np.linalg.norm(data2, order)
# For example, if your data is
def f(x):
    return x**2
x = np.arange(-5,6)
# y1 is your sensor data
y1 = f(x)
# y2 is your 'research-graded' sensor data having a simple scale with respect to y1
y2 = f(x) * 2. # 2.0 is your example scale
# You would expect `getScalingFactor` to return 2.0
out_scale = getScalingFactor(y2,y1,order=2)
Since you explicitly mentioned Python, I have used numpy.linalg.norm to calculate the norms.
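The norm ratio above recovers a scale only. If an offset is present as well, one option is an ordinary least-squares line fit. The following is a minimal sketch, assuming the relation is linear (reference ≈ scale * low_cost + offset) and that both series are sampled at the same instants. The function name fit_scale_and_offset and the synthetic data are made up for illustration.

```python
import numpy as np

def fit_scale_and_offset(low_cost, reference):
    """Least-squares fit of reference ~ scale * low_cost + offset (hypothetical helper)."""
    scale, offset = np.polyfit(low_cost, reference, deg=1)
    return scale, offset

# Synthetic example: the "true" relation is reference = 2.5 * low_cost + 30
rng = np.random.default_rng(0)
reference = 1000.0 + 5.0 * np.sin(np.linspace(0.0, 10.0, 200))
low_cost = (reference - 30.0) / 2.5 + rng.normal(0.0, 0.01, 200)

scale, offset = fit_scale_and_offset(low_cost, reference)
calibrated = scale * low_cost + offset  # low-cost data mapped onto the reference scale
```

If the two sensors are not sampled at the same times, the series would first have to be aligned, for example by interpolating one onto the time stamps of the other.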
I am using the scipy.signal.find_peaks function to determine the peaks of a given signal.
The signal is loaded in a dataframe like this
x = df["sig_coord"] # the x coordinate of a signal , time msec
Y = df["sig_value"] # value of a signal f(x) - Measured Voltage at the moment x, V
The dataframe has about 10k points for one signal.
peaks, properties = signal.find_peaks(Y, prominence=2, distance=40)
As I understand it, the distance parameter is the minimum separation required between neighbouring peaks. However, it must be given as a number of samples (points of the signal).
My x-scale is not uniformly spaced: the distance between points changes in a particular way (non-equidistant measurements), so it's inconvenient to express the distance as a number of samples in this case. It would be much better to give the distance in time units (msec in my case). Is it possible to do that?
Maybe there is a way to use the find_peaks functionality with the distance given in coordinates that have physical meaning, not strictly as a number of samples?
Or maybe it's possible to recalculate those values in a simple manner?
The difficulty is that the ratio of msec to signal points differs between parts of the signal.
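One possible workaround, sketched here under assumptions, is to resample the signal onto a uniform time grid with linear interpolation and then convert the desired separation in msec into a number of samples. The helper name find_peaks_by_time and the synthetic signal are hypothetical. The grid step is taken as the smallest positive spacing in the data, which may be wasteful for very uneven sampling.

```python
import numpy as np
from scipy import signal

def find_peaks_by_time(t_ms, y, min_separation_ms, **find_peaks_kwargs):
    """Hypothetical helper: peak search with the minimum separation given in msec."""
    t_ms = np.asarray(t_ms, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(t_ms)              # np.interp needs increasing x values
    t_ms, y = t_ms[order], y[order]
    steps = np.diff(t_ms)
    dt = steps[steps > 0].min()           # assumed grid step: finest spacing present
    t_uniform = np.arange(t_ms[0], t_ms[-1] + dt / 2, dt)
    y_uniform = np.interp(t_uniform, t_ms, y)
    distance = max(1, int(round(min_separation_ms / dt)))  # msec -> samples
    peaks, _ = signal.find_peaks(y_uniform, distance=distance, **find_peaks_kwargs)
    return t_uniform[peaks], y_uniform[peaks]

# Synthetic signal: dense sampling up to 50 msec, sparse sampling afterwards
t = np.concatenate([np.linspace(0, 50, 200, endpoint=False), np.linspace(50, 100, 51)])
y = np.exp(-((t - 20) / 2) ** 2) + np.exp(-((t - 70) / 3) ** 2)
peak_times, peak_values = find_peaks_by_time(t, y, min_separation_ms=10, prominence=0.5)
```

The returned peak positions are times on the resampled grid, so they can differ slightly from the original sample times.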
I want to generate a 2D travelling sine wave. To do this, I've set the parameters for the plane wave and generate the wave at any time instant as follows:
import numpy as np
import random
import matplotlib.pyplot as plt
f = 10 # frequency
fs = 100 # sample frequency
Ts = 1/fs # sample period
t = np.arange(0,0.5, Ts) # time index
c = 50 # speed of wave
w = 2*np.pi *f # angular frequency
k = w/c # wave number
resolution = 0.02
x = np.arange(-5, 5, resolution)
y = np.arange(-5, 5, resolution)
dx = np.array(x); M = len(dx)
dy = np.array(y); N = len(dy)
[xx, yy] = np.meshgrid(x, y);
theta = np.pi / 4 # direction of propagation
kx = k* np.cos(theta)
ky = k * np.sin(theta)
So, the plane wave would be
plane_wave = np.sin(kx * xx + ky * yy - w * t[1])
plt.figure();
plt.imshow(plane_wave,cmap='seismic',origin='lower', aspect='auto')
This gives a smooth plane wave. The sine wave variation along one row can be plotted with plt.figure(); plt.plot(plane_wave[2,:]).
However, when I append plane waves at different time instants, discontinuities appear (figures 03 and 04), and I want to get rid of them.
I'm new to Python and any help will be highly appreciated. Thanks in advance.
arr = []
for count in range(len(t)):
    p = np.sin(kx * xx + ky * yy - w * t[count])  # plane wave
    arr.append(p)
arr = np.array(arr)
print(arr.shape)
pp,q,r = arr.shape
sig = np.reshape(arr, (-1, r))
print('The signal shape is :', sig.shape)
plt.figure(); plt.imshow(sig.transpose(),cmap='seismic',origin='lower', aspect='auto')
plt.xlabel('X'); plt.ylabel('Y')
plt.figure(); plt.plot(sig[2,:])
This is not that much a problem of programming. It has to do more with the fact that you are using the physical quantities in a somewhat unusual way. Your plots are absolutely fine and correct.
What you seem to have misunderstood is the fact that you are talking about a 2D problem with a third dimension added for time. This is by no means wrong but if you try to append the snapshot of the 2D wave side-by-side you are using (again) the x spatial dimension to represent temporal variations. This leads to an inconsistency of the use of that coordinate axis. Now, to make this more intuitive, consider the two time instances separately. Does it not coincide with your intuition that all points on the 2D plane must have different amplitudes (unless of course the time has progressed by a multiple of the period of the wave)? This is the case indeed. Thus, when you try to append the two snapshots, a discontinuity is exhibited. In order to avoid that you have to either use a time step equal to one period, which I believe is of no practical use, or a constant time step that will make the phase of the wave on the left border of the image in the current time equal to the phase of the wave on the right border of the image in the previous time step. Yet, this will always be a constant time step, alternating the phase (on the edges of the image) between the two said values.
The same applies to the 1D case because you use the two coordinate axes to represent the wave (x is the x spatial dimension and y is used to represent the amplitude). This is what can be seen in your last plot.
Now, what would be the solution you may ask. The solution is provided by simple inspection of the mathematical formula of the wave function. In 2D, it is a scalar function of three variables (that is, takes as input three values and outputs one) and so you need at least four dimensions to represent it. Alas, we can't perceive a fourth spatial dimension, but this is not a problem in your case as the output of the function is represented with colors. Then there are three dimensions that could be used to represent the temporal evolution of your function. All you have to do is to create a 3D array where the third dimension represents time and all 2D snapshots will be stored in the first two dimensions.
When it comes to visual representation of the results you could either use some kind of waterfall plots where the z-axis will represent time or utilize the fourth dimension we can perceive, time that is, to create an animation of the evolution of the wave.
I am not very familiar with Python, so I will only provide a generic naive implementation. I am sure a lot of people here could simplify and/or optimise the following snippet. I assume that everything in your first two blocks of code is available, so changes are needed only in the last block you present:
arr = np.zeros(xx.shape + (len(t),))  # 3D array holding the temporal evolution of the snapshots
for i in range(len(t)):
    arr[:, :, i] = np.sin(kx * xx + ky * yy - w * t[i])
# Below you can plot the figures with any function you prefer or make an animation out of it
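The same 3D array can also be built without an explicit loop by letting NumPy broadcast the time axis. This is a minimal sketch that reuses the parameters from the question, with a coarser grid (0.1 instead of 0.02) assumed only to keep the array small. The names wave and xt_slice are introduced here for illustration.

```python
import numpy as np

# Wave parameters as in the question
f, fs, c = 10.0, 100.0, 50.0
t = np.arange(0, 0.5, 1 / fs)            # time samples
w = 2 * np.pi * f                        # angular frequency
k = w / c                                # wave number
theta = np.pi / 4                        # direction of propagation
kx, ky = k * np.cos(theta), k * np.sin(theta)

# Coarser spatial grid than in the question (assumption, to limit memory)
x = np.arange(-5, 5, 0.1)
y = np.arange(-5, 5, 0.1)
xx, yy = np.meshgrid(x, y)

# Broadcasting: the spatial phase gets a trailing axis, time fills that axis
wave = np.sin(kx * xx[:, :, None] + ky * yy[:, :, None] - w * t[None, None, :])

# Space-time view of one row: x along the first axis, time along the second
xt_slice = wave[2, :, :]
```

Each wave[:, :, i] is one snapshot that can be passed to plt.imshow, and xt_slice shows the temporal evolution of a single row on its own time axis instead of stacking snapshots along a spatial axis.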
I used log-transformed data (dependent variable = count) in my generalised additive model (using mgcv) and tried to plot the response using "trans=plogis", as for logistic GAMs, but the results don't seem right. Am I forgetting something here? When I first used linear models for my data, I plotted the least-squares means. Any idea how I could plot the output of my GAMs in a more interpretable way than on the log scale?
Cheers
Are you running a logistic regression for count data? Logistic regression is normally used for a binary outcome or a proportion of binary outcomes.
That being said, the real question here is that you want to backtransform a variable that was fit on the log scale back to the original scale for plotting. That can be easily done using the itsadug package. I've simulated some silly data here just to show the code required.
With itsadug, you can visually inspect many aspects of GAM models. I'd encourage you to look at this: https://cran.r-project.org/web/packages/itsadug/vignettes/inspect.html
The transform argument of plot_smooth() can also be used with custom functions written in R. This can be useful if you have both centred and logged a dependent variable.
library(mgcv)
library(itsadug)
# Setting seed so it's reproducible
set.seed(123)
# Generating 50 samples from a uniform distribution
x <- runif(50, min = 20, max = 50)
# Taking the sin of x to create a dependent variable
y <- sin(x)
# Binding them to a dataframe
d <- data.frame(x, y)
# Logging the dependent variable after adding a constant to prevent negative values
d$log_y <- log(d$y + 1)
# Fitting a GAM to the transformed dependent variable
model_fit <- gam(log_y ~ s(x),
                 data = d)
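# Using the plot_smooth function from itsadug to backtransform to the original y scale
# (exp undoes the log; subtracting 1 undoes the constant added before logging)
plot_smooth(model_fit,
            view = "x",
            transform = function(v) exp(v) - 1)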
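As a short check of the back-transformation, assuming the response was logged after adding a constant c (c = 1 in the simulated data) and writing \hat{\eta}(x) for the fitted value on the log scale:

```latex
\log(y + c) = \hat{\eta}(x)
\quad\Longrightarrow\quad
y + c = \exp\bigl(\hat{\eta}(x)\bigr)
\quad\Longrightarrow\quad
\hat{y}(x) = \exp\bigl(\hat{\eta}(x)\bigr) - c
```

With c = 1, a plain exp returns y + 1, so the constant has to be subtracted to land on the original scale. If no constant was added before logging, c = 0 and exp alone is the inverse.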
You can specify the trans function for back-transforming as trans = function(x){exp(coef(gam)[1]+x)}, where gam is your fitted model and coef(gam)[1] is the intercept.
Python developers
I am working on spectroscopy at a university. My experimental 1-D data sometimes shows "cosmic rays": 3-pixel-wide spikes of ultra-high intensity, which are not what I want to analyze. So I want to remove these spurious peaks.
Does anybody know how to fix this issue in Python 3?
Thanks in advance!!
A simple solution could be to use the algorithm proposed by Whitaker and Hayes, which uses modified z scores on the derivative of the spectrum. This Medium post explains how it works and how to implement it in Python: https://towardsdatascience.com/removing-spikes-from-raman-spectra-8a9fdda0ac22 .
The idea is to calculate the modified z scores of the spectrum's derivative and apply a threshold to detect the cosmic spikes. Afterwards, a fixer removes the spike points and replaces them with the mean value of the surrounding pixels.
import numpy as np

# Modified z score, based on the median and the median absolute deviation (MAD).
def modified_z_score(intensity):
    median_int = np.median(intensity)
    mad_int = np.median(np.abs(intensity - median_int))
    modified_z_scores = 0.6745 * (intensity - median_int) / mad_int
    return modified_z_scores

# Once the spike detection works, the spectrum can be fixed by averaging the
# non-spike points around each spike. y holds the intensity values of a spectrum
# (a NumPy array); m is the half-width of the window used to calculate the mean.
def fixer(y, m):
    threshold = 7  # binarization threshold
    spikes = abs(np.array(modified_z_score(np.diff(y)))) > threshold
    y_out = y.copy()  # so we don't overwrite y
    for i in np.arange(len(spikes)):
        if spikes[i] != 0:  # if we have a spike in position i
            # select up to 2 m + 1 points around the spike, clipped to the valid index range
            w = np.arange(max(i - m, 0), min(i + 1 + m, len(spikes)))
            w2 = w[spikes[w] == 0]  # from that interval, keep the points that are not spikes
            y_out[i] = np.mean(y[w2])  # and average their values
    return y_out
The answer depends on what your data looks like. If you have access to the two-dimensional CCD readouts that the one-dimensional spectra were created from, then you can use the lacosmic module to get rid of the cosmic rays there. If you have only one-dimensional spectra, but multiple spectra from the same source, then a quick ad-hoc fix is to make a rough normalisation of the spectra and remove those pixels that are several times brighter than the corresponding pixels in the other spectra. If you have only one one-dimensional spectrum from each source, then a less reliable option is to remove all pixels that are much brighter than their neighbours. (Depending on the shape of your cosmics, you may even want to remove the nearest 5 pixels or so, to catch the wings of the cosmic ray peak as well.)
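The last option (a single spectrum, removing pixels much brighter than their neighbours) could be sketched with a running median as the local reference. This is only one way to do it, not the method of any particular package. The function name despike_median, the kernel size and the threshold are assumptions. The kernel has to be more than twice as wide as the widest spike so that the median is not dominated by spike pixels.

```python
import numpy as np
from scipy.signal import medfilt

def despike_median(y, kernel_size=9, n_sigma=6.0):
    """Hypothetical helper: replace pixels far above a running median."""
    y = np.asarray(y, dtype=float)
    baseline = medfilt(y, kernel_size=kernel_size)   # local reference level
    resid = y - baseline
    # Robust noise estimate from the median absolute deviation of the residuals
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    spikes = resid > n_sigma * sigma                 # only bright outliers are flagged
    y_out = y.copy()
    y_out[spikes] = baseline[spikes]                 # replace spikes by the local median
    return y_out, spikes

# Synthetic spectrum: one broad line, unit noise, and a 3-pixel "cosmic ray"
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)
clean = 100.0 + 50.0 * np.exp(-((x - 0.5) / 0.1) ** 2)
spiked = clean + rng.normal(0.0, 1.0, x.size)
spiked[200:203] += 1000.0
fixed, mask = despike_median(spiked)
```

Very narrow real spectral lines can be flagged too, so the kernel size and threshold need tuning against the actual line widths.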
My data has two columns: date (in Month/Year format) and the corresponding value. I plotted this data on an x-log(y) scale using gnuplot, and it looks very close to a straight line. I would like to draw a straight line through it using curve fitting. I tried a few fit functions without success.
I tried the following fit functions:
f(x) = a * x + b (f(x) is not linear as scale is x-log(y))
f(x) = a*10**x + b (overflow error)
Any help in this regard would be appreciated.
The overflow error is probably due to at least one large value of x. If you can rescale the x data so that there is no overflow when calculating 10**x, the fit might work. As a test, try something like:
x_scaled = x / 1000.0
f(x_scaled) = a*10**x_scaled + b
Inspecting the maximum value of x will give you an idea of the scaling value, shown as 1000.0 in my example.
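A different route, shown here in Python rather than gnuplot, is to fit the straight line directly on the log scale. A straight line on an x-log(y) plot corresponds to log10(y) = a*x + b, that is y = 10**(a*x + b). Note that this is not the same model as a*10**x + b. The sketch below assumes the dates have already been converted to a numeric month index and that all values are positive. The function name fit_semilog_line and the data are made up.

```python
import numpy as np

def fit_semilog_line(x, y):
    """Hypothetical helper: least-squares line through (x, log10(y))."""
    slope, intercept = np.polyfit(x, np.log10(y), deg=1)
    return slope, intercept

# Synthetic data: 24 months of exactly exponential growth
months = np.arange(24)
values = 3.0 * 10 ** (0.05 * months)

slope, intercept = fit_semilog_line(months, values)
fitted = 10 ** (slope * months + intercept)   # back on the original y scale
```

Because the fit works on log10(y), no large powers of ten are evaluated during the fit, which sidesteps the overflow.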