I’m writing a GAM using the mgcv package that predicts burrow abundance and distribution of two different species on an island using data obtained during a field trip and images taken from the Sentinel satellite. 101 plots were surveyed. 922 burrows belonging to species 1 were recorded in 66 plots and 29 burrows belonging to species 2 were recorded in 8 plots.
I used a negative binomial distribution for species 1 because a Poisson distribution resulted in an overdispersed model. The maximal model was:
gam(Species_1 ~ s(x, y, bs="ts") +
Sentinel2_band_1 + Sentinel2_band_2 + Sentinel2_band_3 + Sentinel2_band_4 + Sentinel2_band_5 +
Sentinel2_band_6 + Sentinel2_band_7 + Sentinel2_band_8 + Sentinel2_band_9 + Sentinel2_band_10 +
I(Sentinel2_band_1^2) + I(Sentinel2_band_2^2) + I(Sentinel2_band_3^2) + I(Sentinel2_band_4^2) + I(Sentinel2_band_5^2) +
I(Sentinel2_band_6^2) + I(Sentinel2_band_7^2) + I(Sentinel2_band_8^2) + I(Sentinel2_band_9^2) + I(Sentinel2_band_10^2) +
aspect + elevation + slope +
I(aspect^2) + I(elevation^2) + I(slope^2) +
aspect:elevation + aspect:slope + elevation:slope,
data = dat,
family = nb(1))
The model selection process has resulted in a model that gives acceptable results.
When I run the same model using species 2 as the response variable, I get the following warning:
Warning message:
In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L, :
Fitting terminated with step failure - check results carefully
The diagnostic plots also look pretty dodgy:
My assumption is that the issue I’m encountering is due to the much smaller sample size for species 2.
Any ideas what I can do to resolve this problem?
I have a large spreadsheet with a number of formulas, and they all make complete sense apart from one, which is listed below. Does anyone have any idea what this NORMDIST calculation is trying to achieve or tell me? It has relevance to HE
=MAX(1,NORMDIST(3,N18,N18/4,TRUE)-NORMDIST(0,N18,N18/4,TRUE) + 2*(NORMDIST(6,N18,N18/4,TRUE)-NORMDIST(3,N18,N18/4,TRUE)) + 3*(NORMDIST(9,N18,N18/4,TRUE)-NORMDIST(6,N18,N18/4,TRUE)) + 4*(NORMDIST(12,N18,N18/4,TRUE)-NORMDIST(9,N18,N18/4,TRUE)) + 5*(NORMDIST(15,N18,N18/4,TRUE)-NORMDIST(12,N18,N18/4,TRUE)) + 6*(NORMDIST(18,N18,N18/4,TRUE)-NORMDIST(15,N18,N18/4,TRUE)) + 7*(NORMDIST(21,N18,N18/4,TRUE)-NORMDIST(18,N18,N18/4,TRUE)) + 8*(NORMDIST(24,N18,N18/4,TRUE)-NORMDIST(21,N18,N18/4,TRUE)) + 9*(NORMDIST(27,N18,N18/4,TRUE)-NORMDIST(24,N18,N18/4,TRUE)) + 10*(NORMDIST(30,N18,N18/4,TRUE)-NORMDIST(27,N18,N18/4,TRUE)) + 11*(NORMDIST(33,N18,N18/4,TRUE)-NORMDIST(30,N18,N18/4,TRUE)) + 12*(NORMDIST(36,N18,N18/4,TRUE)-NORMDIST(33,N18,N18/4,TRUE)) + 13*(NORMDIST(39,N18,N18/4,TRUE)-NORMDIST(36,N18,N18/4,TRUE)) + 14*(NORMDIST(42,N18,N18/4,TRUE)-NORMDIST(39,N18,N18/4,TRUE)) + 15*(NORMDIST(45,N18,N18/4,TRUE)-NORMDIST(42,N18,N18/4,TRUE)) + 16*(NORMDIST(48,N18,N18/4,TRUE)-NORMDIST(45,N18,N18/4,TRUE)) + 17*(NORMDIST(51,N18,N18/4,TRUE)-NORMDIST(48,N18,N18/4,TRUE)) + 18*(NORMDIST(54,N18,N18/4,TRUE)-NORMDIST(51,N18,N18/4,TRUE)) + 19*(NORMDIST(57,N18,N18/4,TRUE)-NORMDIST(54,N18,N18/4,TRUE)) + 20*(NORMDIST(60,N18,N18/4,TRUE)-NORMDIST(57,N18,N18/4,TRUE)) + 21*(NORMDIST(63,N18,N18/4,TRUE)-NORMDIST(60,N18,N18/4,TRUE)) + 22*(NORMDIST(66,N18,N18/4,TRUE)-NORMDIST(63,N18,N18/4,TRUE)) + 23*(NORMDIST(69,N18,N18/4,TRUE)-NORMDIST(66,N18,N18/4,TRUE)))
Strange question I know, but could not think of where else to ask!!!
Cheers
The equation has a series of terms of the form N*[NORMDIST(3N,mu,sigma)-NORMDIST(3N-3,mu,sigma)] where mu is the mean (N18 in the equation), sigma is the standard deviation (N18/4), and N goes from 1 to 23. Each bracket is the probability that a normally distributed value falls between 3N-3 and 3N, so the sum is a probability-weighted average of the bin number N, i.e. an estimate of the mean of the normal distribution measured in bins of width 3. It would be more rigorous for N to go from minus infinity to plus infinity, and it's not clear why this formula truncates the range to 1..23. Nevertheless, if the person who wrote the equation was calculating the average, then from the properties of the normal distribution you can derive an approximate closed-form solution:
Total of all NORMDIST terms = mu/3 + 1/2
This will be accurate as long as mu (N18) is between 0 and 30. If you plug this into the equation you get
=MAX(1,N18/3+0.5)
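A quick numerical check of that simplification can be sketched in Python. This is only an illustration: the helper names `norm_cdf` and `binned_sum` are made up here, and the normal CDF is built from `math.erf` rather than taken from Excel. It evaluates the same 23-term sum with sigma = mu/4 and prints it next to mu/3 + 0.5.

```python
import math

def norm_cdf(x, mu, sigma):
    # Cumulative normal distribution, the equivalent of NORMDIST(x, mu, sigma, TRUE).
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binned_sum(mu, n_terms=23, width=3):
    # Sum of N * [CDF(3N) - CDF(3N - 3)] for N = 1..23, with sigma = mu / 4
    # as in the spreadsheet formula (hypothetical helper for illustration).
    sigma = mu / 4
    return sum(
        n * (norm_cdf(width * n, mu, sigma) - norm_cdf(width * (n - 1), mu, sigma))
        for n in range(1, n_terms + 1)
    )

# Compare the full sum with the closed-form approximation for a few means.
for mu in (6, 15, 30):
    print(mu, binned_sum(mu), mu / 3 + 0.5)
```

For means well inside the 0 to 30 range the two columns should agree closely; the agreement is expected to degrade as mu moves beyond 30, because the sum stops at 69.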
Hope that helps.
From the docs...
NORMDIST function
Returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing.
Important: This function has been replaced with one or more new functions that may provide improved accuracy and whose names better reflect their usage. Although this function is still available for backward compatibility, you should consider using the new functions from now on, because this function may not be available in future versions of Excel.
For more information about the new function, see NORM.DIST function.
Here is my code
a = [10,10,20]
b = [2,5,4]
print(sum(a) / sum(b))
print(sum([i/j for i,j in zip(a,b)])/3)
The output is
3.6363636363636362
4.0
My question is: how do I make the first calculation right, and why is there such a difference?
Thanks.
The first one is (10+10+20)/(2+5+4) = 40/11 ≈ 3.6364.
The second one is (10/2 + 10/5 + 20/4)/3 = (5 + 2 + 5)/3 = 4.
Those are two different calculations. There is no reason to assume there should not be any difference.
Nothing is wrong with the calculation.
In the first case, i.e. in
print(sum(a) / sum(b))
you first add up the numerators and the denominators separately and then divide the two sums.
Let [a, b, c] and [d, e, f] be your list elements. In the first case, you are computing
(a+b+c)/(d+e+f)
while in the second case, you are computing
(a/d + b/e + c/f)/3
which is why you get two different answers.
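To make the relationship between the two numbers concrete, here is a small sketch using the lists from the question (the variable names are just for illustration). The ratio of sums is the same as a mean of the individual ratios weighted by the denominators, whereas the second calculation weights every ratio equally.

```python
a = [10, 10, 20]
b = [2, 5, 4]

# Ratio of sums: (10 + 10 + 20) / (2 + 5 + 4)
ratio_of_sums = sum(a) / sum(b)

# Unweighted mean of the element-wise ratios: (10/2 + 10/5 + 20/4) / 3
mean_of_ratios = sum(x / y for x, y in zip(a, b)) / len(a)

# Mean of the same ratios, but weighted by the denominators in b.
# Algebraically this is identical to the ratio of sums.
weighted_mean_of_ratios = sum((x / y) * y for x, y in zip(a, b)) / sum(b)

print(ratio_of_sums)
print(mean_of_ratios)
print(weighted_mean_of_ratios)
```

Which one is "right" depends on whether each pair should count equally or in proportion to its denominator.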
Let us say I have log_a1=-1000, log_a2=-1001, and log_a3=-1002.
n=3
I want to get the harmonic mean (HM) of a1, a2 and a3 (not log_a1, log_a2 and log_a3) such that HM = n/[1/exp(log_a1) + 1/exp(log_a2) + 1/exp(log_a3)].
However, due to floating-point underflow, exp(log_a1)=exp(-1000)=0 and accordingly 1/exp(log_a1)=inf and HM=0.
Is there any mathematical trick I can use? It is okay to get either HM or log(HM).
The best approach is probably to keep things in log scale. Many scientific languages have a log-add-exp function (e.g. numpy.logaddexp in python) that does what you want to high precision, with both the input and the result in log form.
The idea is that you want to compute e^-1000 + e^-1001 + e^-1002, so you factor it to e^-1000 (1 + e^-1 + e^-2) and take the log. The result is -1000 + log(1 + e^-1 + e^-2), which can be computed without loss of precision.
log(HM)=log(n)-log(1)+log_a_max - log(sum(1./exp(log_ai - log_a_max)))
For log_a=[-1000, -1001, -1002]:
log(HM)=-1001.309
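The same calculation can be sketched in plain Python. The function name `log_harmonic_mean` is made up for this example, and it uses only the `math` module rather than a library log-sum-exp routine. One deliberate difference from the formula above: it factors out the largest value of -log_ai (that is, the smallest log_ai) so that every exponent is at most zero, which should also stay safe when the log values are spread far apart.

```python
import math

def log_harmonic_mean(log_values):
    # log(HM) = log(n) - log(sum_i exp(-log_a_i)), evaluated in log space.
    neg = [-v for v in log_values]
    shift = max(neg)  # factor out the largest exponent so exp() cannot overflow
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in neg))
    return math.log(len(log_values)) - log_sum

log_a = [-1000.0, -1001.0, -1002.0]
print(log_harmonic_mean(log_a))
```

Exponentiating the result would underflow again for these inputs, so it makes sense to keep working with log(HM).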