In ANOVA, the square root of MSE is equal to the pooled SD across groups (or, equivalently, the SE of the marginal means * sqrt(n)).
But in ANCOVA, the pooled adjusted SD (SE of the adjusted marginal mean * sqrt(n)) is not equal to sqrt(MSE), although it is very close. What causes the difference?
I assume the adjustment is applied to both statistics in the same way, so why is there a difference?
(The issue became a practical one when we calculated an SMD from an ANCOVA.)
Here is a reproducible example:
library("effects") # for calculating the adjusted statistics
dat <- data.frame(
  gp = factor(c(1,1,1,1,2,2,2,2)),
  pre = c(6,3,1,3,3,7,2,3),
  post = c(7,6,2,4,6,10,5,4)
)
# ANOVA
reg.res <- lm(post ~ gp, dat)
anova.res <- anova(reg.res)
adj.res.anova <- effect("gp", reg.res, se = TRUE)
sqrt(anova.res$`Mean Sq`[2]) # pooled SD from MSE
[1] 2.43242
adj.res.anova$se * sqrt(4) # pooled SD from SE * sqrt(n)
[1] 2.43242 2.43242
# ANCOVA
reg.res.with.cov <- lm(post ~ pre + gp, dat)
ancova.res <- anova(reg.res.with.cov)
adj.res.ancova <- effect("gp", reg.res.with.cov, se = TRUE)
sqrt(ancova.res$`Mean Sq`[3]) # pooled SD from MSE
adj.res.ancova$se * sqrt(4) # pooled SD from SE * sqrt(n)
[1] 1.097073 1.097073
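One possible explanation, sketched as a hand computation for the single-covariate case (this is not part of the original question): the SE of an adjusted mean also carries the uncertainty of the estimated common slope, SE(adjusted mean_g) = sqrt(MSE * (1/n_g + (xbar_g - xbar)^2 / E_xx)), where E_xx is the pooled within-group sum of squares of the covariate. Hence SE * sqrt(n_g) = sqrt(MSE) * sqrt(1 + n_g (xbar_g - xbar)^2 / E_xx), which is never smaller than sqrt(MSE). The sketch assumes that effect() evaluates the covariate at its overall mean (my understanding of the effects package default); the helper name within_sp is hypothetical.

```python
import numpy as np

# data from the R example above
gp = np.array([1, 1, 1, 1, 2, 2, 2, 2])
pre = np.array([6, 3, 1, 3, 3, 7, 2, 3], dtype=float)
post = np.array([7, 6, 2, 4, 6, 10, 5, 4], dtype=float)
groups = np.unique(gp)

def within_sp(a, b):
    """Pooled within-group sum of cross-products of a and b (hypothetical helper)."""
    return sum(np.sum((a[gp == g] - a[gp == g].mean()) *
                      (b[gp == g] - b[gp == g].mean())) for g in groups)

exx = within_sp(pre, pre)     # within-group SS of the covariate (E_xx)
exy = within_sp(pre, post)    # within-group cross-products (E_xy)
eyy = within_sp(post, post)   # within-group SS of the outcome (E_yy)

# ANOVA: residual mean square = pooled within-group variance
mse_anova = eyy / (len(post) - len(groups))

# ANCOVA: residual SS after removing the common slope; one more df is spent
mse = (eyy - exy ** 2 / exx) / (len(post) - len(groups) - 1)
sqrt_mse = np.sqrt(mse)

pooled_adj_sd = {}
for g in groups:
    n_g = np.sum(gp == g)
    dist2 = (pre[gp == g].mean() - pre.mean()) ** 2
    # Var(adjusted mean) = MSE * (1/n_g + (xbar_g - xbar)^2 / E_xx);
    # the second term is the contribution of the estimated slope
    se_adj = np.sqrt(mse * (1.0 / n_g + dist2 / exx))
    pooled_adj_sd[int(g)] = se_adj * np.sqrt(n_g)

print(np.sqrt(mse_anova), sqrt_mse, pooled_adj_sd)
```

With these data the sketch reproduces 2.43242 and 1.097073 and gives sqrt(MSE) of about 1.0921 for the ANCOVA. Both groups' covariate means (3.25 and 3.75) lie 0.25 from the grand mean 3.5, which would explain why the two adjusted SEs are identical.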


Canonical correlation analysis on covariance matrices instead of raw data

Due to privacy issues I don't have the original raw data matrices; instead I can get the covariance matrices of the x and y datasets (x'x, y'y, x'y), or the correlation matrix between the two of them (or any other kind of matrix that is not the original data matrix).
I need a way to apply canonical correlation analysis directly to those matrices. Browsing the net I didn't find any solution to my problem. Is there already an implemented algorithm that works on this kind of input? R would be best, but other languages are OK.
Example from the R tutorial for the CCA package:
mm <- read.csv("")
colnames(mm) <- c("Control", "Concept", "Motivation", "Read", "Write", "Math",
"Science", "Sex")
You divide the dataset into x and y:
x <- mm[, 1:3]
y <- mm[, 4:8]
Then the function takes these two datasets as input: cc(x,y) (note that the function standardizes the data by itself).
What I want to know is whether there is a way to perform CCA starting by centering the matrices around the mean:
x = scale(x, scale = F)
y = scale(y, scale = F)
And then computing the cross-product matrices x'x, y'y, x'y:
cvx = crossprod(x); cvy = crossprod(y); cvxy = crossprod(x,y)
The algorithm should then take those matrices as input and compute the canonical variates and correlation coefficients,
like: f(cvx, cvy, cvxy)
This article describes a solution starting from covariance matrices, for example, but I don't know whether it is just theory or someone has actually implemented it.
I hope this is detailed enough!
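A minimal sketch of such an f(cvx, cvy, cvxy) in NumPy, using the standard linear-algebra route: the canonical correlations are the singular values of Cxx^(-1/2) Cxy Cyy^(-1/2). The names inv_sqrt and cca_from_cov are hypothetical, and the sketch assumes cvx and cvy are full rank (no regularization). The raw data in the usage part are simulated and are only used to build the matrices and to check the result.

```python
import numpy as np

def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca_from_cov(cvx, cvy, cvxy):
    """CCA using only x'x, y'y and x'y (or covariances, or correlations).

    Returns the canonical correlations r and coefficient matrices a, b;
    for centered data the canonical variates would be x @ a and y @ b.
    """
    kx, ky = inv_sqrt(cvx), inv_sqrt(cvy)
    # canonical correlations = singular values of Cxx^-1/2 Cxy Cyy^-1/2
    u, r, vt = np.linalg.svd(kx @ cvxy @ ky, full_matrices=False)
    return r, kx @ u, ky @ vt.T

# usage with simulated data
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = x[:, :2] @ rng.normal(size=(2, 5)) + rng.normal(size=(200, 5))
x = x - x.mean(axis=0)                       # like scale(x, scale = F)
y = y - y.mean(axis=0)
cvx, cvy, cvxy = x.T @ x, y.T @ y, x.T @ y   # like crossprod()
r, a, b = cca_from_cov(cvx, cvy, cvxy)
print(r)
```

Because the correlations are scale-free, dividing all three matrices by n - 1 (covariances) gives the same r; only a and b change scale.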
In short: correlations (or covariances) are used internally in most (probably all) CCA implementations.
In long: you will need to work out how to do that depending on your case. Let me show you an example below.
What is Canonical-correlation analysis (CCA)?
Canonical-correlation analysis (CCA) helps you identify the best possible linear relations you can build between two datasets. See Wikipedia, and see the references for examples. I will follow this post for the data and the libraries used.
Set up the libraries, load the data, select some variables, remove NaNs, and standardize the data.
import pandas as pd
import numpy as np
df = pd.read_csv('2016 School Explorer.csv')
# choose relevant features
df = df[['Rigorous Instruction %',
'Collaborative Teachers %',
'Supportive Environment %',
'Effective School Leadership %',
'Strong Family-Community Ties %',
'Trust %','Average ELA Proficiency',
'Average Math Proficiency']]
# drop missing values
df = df.dropna()
# separate X and Y groups
X = df[['Rigorous Instruction %',
        'Collaborative Teachers %',
        'Supportive Environment %',
        'Effective School Leadership %',
        'Strong Family-Community Ties %',
        'Trust %']].copy()
Y = df[['Average ELA Proficiency',
        'Average Math Proficiency']]
# strip the '%' sign and convert the percentages to integers
for col in X.columns:
    X[col] = X[col].str.strip('%')
    X[col] = X[col].astype('int')
# Standardise the data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler(with_mean=True, with_std=True)
X_sc = sc.fit_transform(X)
Y_sc = sc.fit_transform(Y)
What are Correlations?
I am pausing here to talk about the idea and the implementation.
First of all, CCA is naturally based on that idea; however, there are different ways to solve it numerically.
The definition is on Wikipedia.
I am talking about this because I am going to modify a function from that library, and I want you to pay close attention to it.
See Eq. 4 in Bilenko et al. 2016, but be careful about where exactly that computation is placed.
Notice that, strictly speaking, you do not need the correlations.
Let me show the function in the pyrcca library that works out that expression:
def kcca(data, reg=0., numCC=None, kernelcca=True, ktype='linear',
         gausigma=1.0, degree=2):
    """Set up and solve the kernel CCA eigenproblem
    """
    if kernelcca:
        kernel = [_make_kernel(d, ktype=ktype, gausigma=gausigma,
                               degree=degree) for d in data]
    else:
        kernel = [d.T for d in data]
    nDs = len(kernel)
    nFs = [k.shape[0] for k in kernel]
    numCC = min([k.shape[1] for k in kernel]) if numCC is None else numCC
    # Get the auto- and cross-covariance matrices
    crosscovs = [np.dot(ki, kj.T) for ki in kernel for kj in kernel]
    # Allocate left-hand side (LH) and right-hand side (RH):
    LH = np.zeros((sum(nFs), sum(nFs)))
    RH = np.zeros((sum(nFs), sum(nFs)))
    # Fill the left and right sides of the eigenvalue problem
    for i in range(nDs):
        RH[sum(nFs[:i]) : sum(nFs[:i+1]),
           sum(nFs[:i]) : sum(nFs[:i+1])] = (crosscovs[i * (nDs + 1)]
                                             + reg * np.eye(nFs[i]))
        for j in range(nDs):
            if i != j:
                LH[sum(nFs[:j]) : sum(nFs[:j+1]),
                   sum(nFs[:i]) : sum(nFs[:i+1])] = crosscovs[nDs * j + i]
    LH = (LH + LH.T) / 2.
    RH = (RH + RH.T) / 2.
    maxCC = LH.shape[0]
    r, Vs = eigh(LH, RH, eigvals=(maxCC - numCC, maxCC - 1))
    r[np.isnan(r)] = 0
    rindex = np.argsort(r)[::-1]
    comp = []
    Vs = Vs[:, rindex]
    for i in range(nDs):
        comp.append(Vs[sum(nFs[:i]):sum(nFs[:i + 1]), :numCC])
    return comp
The output here is the Canonical Covariates (comp), which are a and b in Eq. 4 of Bilenko et al. 2016.
I just want you to pay attention to this:
# Get the auto- and cross-covariance matrices
crosscovs = [np.dot(ki, kj.T) for ki in kernel for kj in kernel]
That is exactly where that operation happens. Notice that it is not exactly the definition from Wikipedia; however, it is mathematically equivalent.
Calculation of the correlations
I am going to calculate the correlations as in Wikipedia, but later I will modify that function, so there are a couple of details to watch, to make sure this answers the original question clearly.
# Get the auto- and cross-covariance matrices
crosscovs = [np.dot(ki, kj.T) for ki in kernel for kj in kernel]
[array([[1217. , 746.04496925, 736.14178336, 575.21073838,
517.52474332, 641.25363806],
[ 746.04496925, 1217. , 732.6297358 , 1094.38480773,
572.95747557, 1073.96490387],
[ 736.14178336, 732.6297358 , 1217. , 559.5753228 ,
682.15312862, 774.36607617],
[ 575.21073838, 1094.38480773, 559.5753228 , 1217. ,
495.79248754, 1047.31981248],
[ 517.52474332, 572.95747557, 682.15312862, 495.79248754,
1217. , 632.75610906],
[ 641.25363806, 1073.96490387, 774.36607617, 1047.31981248,
632.75610906, 1217. ]]), array([[367.74099904, 391.82683717],
[348.78464015, 355.81358426],
[440.88117453, 514.22183796],
[326.32173163, 311.97282341],
[216.32441793, 269.72859023],
[288.27601974, 304.20209135]]), array([[367.74099904, 348.78464015, 440.88117453, 326.32173163,
216.32441793, 288.27601974],
[391.82683717, 355.81358426, 514.22183796, 311.97282341,
269.72859023, 304.20209135]]), array([[1217. , 1139.05867099],
[1139.05867099, 1217. ]])]
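Have a look at the output; I am going to change it a bit so that it lies between -1 and 1. Again, this modification is minor. Following the definition from Wikipedia, the authors only compute the numerator, and I am now going to include the denominator as well. Because the data are standardized, every diagonal entry equals the number of rows (1217 here) and no entry can exceed it, so dividing by the largest entry turns these cross-products into correlations.
max_unit = 0
for crosscov in crosscovs:
    max_unit = np.max([max_unit, np.max(crosscov)])
# Normalize
crosscovs_new = []
for crosscov in crosscovs:
    crosscovs_new.append(crosscov / max_unit)
print(crosscovs_new)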
[array([[1. , 0.6130197 , 0.60488232, 0.47264646, 0.4252463 ,
0.52691342],
[0.6130197 , 1. , 0.6019965 , 0.89924799, 0.47079497,
0.88246911],
[0.60488232, 0.6019965 , 1. , 0.45979895, 0.56052024,
0.63629094],
[0.47264646, 0.89924799, 0.45979895, 1. , 0.40738906,
0.86057503],
[0.4252463 , 0.47079497, 0.56052024, 0.40738906, 1. ,
0.51993107],
[0.52691342, 0.88246911, 0.63629094, 0.86057503, 0.51993107,
1. ]]), array([[0.30217009, 0.32196125],
[0.28659379, 0.29236942],
[0.36226884, 0.42253232],
[0.26813618, 0.25634579],
[0.17775219, 0.22163401],
[0.2368743 , 0.24996063]]), array([[0.30217009, 0.28659379, 0.36226884, 0.26813618, 0.17775219,
0.2368743 ],
[0.32196125, 0.29236942, 0.42253232, 0.25634579, 0.22163401,
0.24996063]]), array([[1. , 0.93595618],
[0.93595618, 1. ]])]
For clarity, here is the same result shown in a slightly different way, to check that the numbers are indeed the correlations of the original data.
Average ELA Proficiency Average Math Proficiency
Average ELA Proficiency 1.000000 0.935956
Average Math Proficiency 0.935956 1.000000
This view also shows the variable names. I just want to show you that the numbers above make sense and are what you are calling correlations.
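A small self-contained check of the normalization step, on simulated data rather than the School Explorer file: after standardizing with the population SD (as StandardScaler does), the largest cross-product entry is the number of rows, and dividing by it reproduces the ordinary Pearson correlation matrix. The data, sizes and the helper name standardize are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 3))
y = x[:, :2] + rng.normal(size=(n, 2))

def standardize(a):
    # ddof=0 standard deviation, which is what StandardScaler uses
    return (a - a.mean(axis=0)) / a.std(axis=0)

xs, ys = standardize(x), standardize(y)
# same blocks as crosscovs in kcca (there kernel = [d.T for d in data])
crosscovs = [ki.T @ kj for ki in (xs, ys) for kj in (xs, ys)]
max_unit = max(np.max(c) for c in crosscovs)
crosscovs_new = [c / max_unit for c in crosscovs]

# reference: ordinary Pearson correlations of the raw columns
corr = np.corrcoef(np.hstack([x, y]), rowvar=False)
print(max_unit, np.allclose(crosscovs_new[1], corr[:3, 3:]))
```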
Calculations of the CCA
So now I will modify the kcca function from pyrcca a bit, so that it accepts the previously calculated correlation matrices.
import numpy as np
from rcca import _make_kernel
from scipy.linalg import eigh
def kcca_working(data, reg=0., numCC=None, kernelcca=True, ktype='linear',
                 gausigma=1.0, degree=2, crosscovs=None):
    """Set up and solve the kernel CCA eigenproblem.
    If crosscovs is given, use those precomputed auto- and cross-covariance
    (or correlation) matrices instead of computing them from the data.
    """
    if kernelcca:
        kernel = [_make_kernel(d, ktype=ktype, gausigma=gausigma,
                               degree=degree) for d in data]
    else:
        kernel = [d.T for d in data]
    nDs = len(kernel)
    nFs = [k.shape[0] for k in kernel]
    numCC = min([k.shape[1] for k in kernel]) if numCC is None else numCC
    if crosscovs is None:
        # Get the auto- and cross-covariance matrices
        crosscovs = [np.dot(ki, kj.T) for ki in kernel for kj in kernel]
    # Allocate left-hand side (LH) and right-hand side (RH):
    LH = np.zeros((sum(nFs), sum(nFs)))
    RH = np.zeros((sum(nFs), sum(nFs)))
    # Fill the left and right sides of the eigenvalue problem
    for i in range(nDs):
        RH[sum(nFs[:i]) : sum(nFs[:i+1]),
           sum(nFs[:i]) : sum(nFs[:i+1])] = (crosscovs[i * (nDs + 1)]
                                             + reg * np.eye(nFs[i]))
        for j in range(nDs):
            if i != j:
                LH[sum(nFs[:j]) : sum(nFs[:j+1]),
                   sum(nFs[:i]) : sum(nFs[:i+1])] = crosscovs[nDs * j + i]
    LH = (LH + LH.T) / 2.
    RH = (RH + RH.T) / 2.
    maxCC = LH.shape[0]
    # subset_by_index replaces the older eigvals= argument in recent SciPy
    r, Vs = eigh(LH, RH, subset_by_index=[maxCC - numCC, maxCC - 1])
    r[np.isnan(r)] = 0
    rindex = np.argsort(r)[::-1]
    comp = []
    Vs = Vs[:, rindex]
    for i in range(nDs):
        comp.append(Vs[sum(nFs[:i]):sum(nFs[:i + 1]), :numCC])
    return comp, crosscovs
Let's run the function:
comp, crosscovs = kcca_working([X_sc, Y_sc], reg=0.,
numCC=2, kernelcca=False, ktype='linear',
gausigma=1.0, degree=2, crosscovs = crosscovs_new)
[array([[-0.00375779, 0.0078263 ],
[ 0.00061439, -0.00357358],
[-0.02054012, -0.0083491 ],
[-0.01252477, 0.02976148],
[ 0.00046503, -0.00905069],
[ 0.01415084, -0.01264106]]), array([[ 0.00632283, 0.05721601],
[-0.02606459, -0.05132531]])]
So I took the original function and made it possible to pass in the correlations; I also return them, just for checking.
The printed values are the Canonical Covariates (comp), which are a and b in Eq. 4 of Bilenko et al. 2016.
Comparing results
Now I am going to compare results from the original and the modified function. I will show you that the results are equivalent.
I can obtain the original results this way, with crosscovs = None, so that they are calculated as in the original function instead of being passed in:
comp, crosscovs = kcca_working([X_sc, Y_sc], reg=0.,
numCC=2, kernelcca=False, ktype='linear',
gausigma=1.0, degree=2, crosscovs = None)
[array([[-0.13109264, 0.27302457],
[ 0.02143325, -0.12466608],
[-0.71655285, -0.2912628 ],
[-0.43693303, 1.03824477],
[ 0.01622265, -0.31573818],
[ 0.49365965, -0.44098996]]), array([[ 0.2205752 , 1.99601077],
[-0.90927705, -1.79051045]])]
These are the Canonical Covariates (comp), which are a' and b' in Eq. 4 of Bilenko et al. 2016.
a, b and a', b' are different, but they differ only in scale, so for all purposes they are equivalent. This follows from the definition of the correlations.
To show that, let me pick numbers from each case and calculate the ratio:
The ratio is the same for every entry, so the two results are equivalent.
Once the function is modified this way, you can build on top of it.
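A sketch of that ratio check, using the coefficients printed in the two outputs above. Every entry should give the same ratio, about sqrt(1217) ≈ 34.89, where 1217 is the max_unit used in the normalization step: dividing LH and RH by the same constant leaves the generalized eigenvalues unchanged and, since scipy.linalg.eigh normalizes the eigenvectors against RH, only rescales them by the square root of that constant.

```python
import numpy as np

# comp with crosscovs = crosscovs_new (normalized), copied from the output above
a_new = np.array([[-0.00375779, 0.0078263], [0.00061439, -0.00357358],
                  [-0.02054012, -0.0083491], [-0.01252477, 0.02976148],
                  [0.00046503, -0.00905069], [0.01415084, -0.01264106]])
b_new = np.array([[0.00632283, 0.05721601], [-0.02606459, -0.05132531]])

# comp with crosscovs = None (raw cross-products), copied from the output above
a_raw = np.array([[-0.13109264, 0.27302457], [0.02143325, -0.12466608],
                  [-0.71655285, -0.2912628], [-0.43693303, 1.03824477],
                  [0.01622265, -0.31573818], [0.49365965, -0.44098996]])
b_raw = np.array([[0.2205752, 1.99601077], [-0.90927705, -1.79051045]])

# element-wise ratios between the two solutions
ratio_a = a_raw / a_new
ratio_b = b_raw / b_new
print(ratio_a.round(3), ratio_b.round(3), np.sqrt(1217))
```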
Cool post with an example and explanations in Python, using the pyrcca library.
Paper in which pyrcca is explained: Bilenko, Natalia Y., and Jack L. Gallant. "Pyrcca: regularized kernel canonical correlation analysis in Python and its applications to neuroimaging." Frontiers in Neuroinformatics 10 (2016): 49.

p-values for estimates in flexsurvreg

I fitted a survival model using an inverse Weibull distribution in flexsurvreg:
if (require("actuar")){
  invweibull <- list(name="invweibull",
                     pars=c("shape","scale"), location="scale",
                     transforms=c(log, log),
                     inv.transforms=c(exp, exp),
                     inits=function(t){ c(1, median(t)) })
  invweibull <- flexsurvreg(formula = kpnsurv ~ iaas, data = kpnrs2,
                            dist = invweibull)
}
And I got the following output:
flexsurvreg(formula = kpnsurv ~ iaas, data = kpnrs2, dist = invweibull)
data. mean. est L95% U95% se exp(est) L95% U95%
shape NA 0.4870 0.4002 0.5927 0.0488 NA NA NA
scale NA 62.6297 36.6327 107.0758 17.1371 NA NA NA
iaas 0.4470 -0.6764 -1.2138 -0.1391 0.2742 0.5084 0.2971 0.8701
N = 302, Events: 54, Censored: 248
Total time at risk: 4279
Log-likelihood = -286.7507, df = 3
AIC = 579.5015
How can I get the p-value of the covariate estimate (in this case iaas)? Thank you for your help.
Just in case this is still useful to anyone, this worked for me. First extract the matrix of coefficient information from the model:
invweibull.res <- invweibull$res
Then divide the estimated coefficients by their standard errors to calculate the Wald statistics, which have asymptotic standard normal distributions:
invweibull.wald <- invweibull.res[,1]/invweibull.res[,4]
Finally, get the p-values:
invweibull.p <- 2*pnorm(-abs(invweibull.wald))
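The same Wald computation, sketched in Python with only the standard library, may help show what the two R lines do: z = est / se, and the two-sided p-value 2 * pnorm(-|z|) equals erfc(|z| / sqrt(2)). The estimate and standard error are the iaas values from the output above; the function name wald_p_value is hypothetical.

```python
import math

def wald_p_value(est, se):
    """Two-sided p-value of a Wald z statistic, like 2 * pnorm(-abs(est / se)) in R."""
    z = est / se
    return math.erfc(abs(z) / math.sqrt(2.0))

# iaas estimate and standard error, copied from the flexsurvreg output above
p_iaas = wald_p_value(-0.6764, 0.2742)
print(round(p_iaas, 4))
```

For iaas this gives z ≈ -2.47 and p ≈ 0.014.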

Calculating p-values with pnorm(). What makes p-values differ if data is transformed?

I am comparing two alternatives for calculating p-values with R's pnorm() function.
xbar <- 2.1
mu <- 2
sigma <- 0.25
n = 35
# z-transformation
z <- (xbar - mu) / (sigma / sqrt(n))
# Alternative I using transformed values
pval1 <- pnorm(q = z)
# Alternative II using untransformed values
pval2 <- pnorm(q = xbar, mean = mu, sd = sigma)
How come the two calculated p-values are not the same? Shouldn't they be?
They are different because you use two different estimates of the standard deviation.
In the z-transformation calculation you use sigma / sqrt(n) as the standard deviation, but in the untransformed calculation you use sd = sigma, ignoring n.
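A minimal Python sketch of the fix, with a hypothetical pnorm helper written to mimic R's lower-tail pnorm(q, mean, sd): once the untransformed version also uses the standard error sigma / sqrt(n), both alternatives give the same probability.

```python
import math

def pnorm(q, mean=0.0, sd=1.0):
    """Lower-tail normal probability, like R's pnorm(q, mean, sd)."""
    return 0.5 * math.erfc(-(q - mean) / (sd * math.sqrt(2.0)))

xbar, mu, sigma, n = 2.1, 2.0, 0.25, 35
se = sigma / math.sqrt(n)               # standard error of the mean

z = (xbar - mu) / se
pval1 = pnorm(z)                        # Alternative I: standardized
pval2 = pnorm(xbar, mean=mu, sd=se)     # Alternative II with sd = standard error
pval2_original = pnorm(xbar, mean=mu, sd=sigma)  # sd = sigma ignores n
print(pval1, pval2, pval2_original)
```

Note that pnorm() returns the lower-tail probability; for a one-sided test of mu > 2 the p-value would be 1 - pnorm(z), and for a two-sided test twice the smaller tail.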

Random effects modeling using mgcv and using lmer. Basically identical fits but VERY different likelihoods and DF. Which to use for testing?

I am aware that there is a duality between random effects and smooth curve estimation. At this link, Simon Wood describes how to specify random effects using mgcv. Of particular note is the following passage:
For example if g is a factor then s(g,bs="re") produces a random coefficient for each level of g, with the random coefficients all modelled as i.i.d. normal.
After a quick simulation, I can see this is correct, and that the model fits are almost identical. However, the likelihoods and degrees of freedom are VERY different. Can anyone explain the difference? Which one should be used for testing?
library(mgcv)  # gam()
library(lme4)  # lmer()
x <- rnorm(1000)
ID <- rep(1:200,each=5)
y <- x
for(i in 1:200) y[which(ID==i)] <- y[which(ID==i)] + rnorm(1)
y <- y + rnorm(1000)
ID <- as.factor(ID)
# gam (mgcv)
m <- gam(y ~ x + s(ID,bs="re"))
# lmer
m2 <- lmer(y ~ x + (1|ID))
mean( abs( fitted(m)-fitted(m2) ) )
Full disclosure: I encountered this problem because I want to fit a GAM that also includes random effects (repeated measures), but need to know if I can trust likelihood-based tests under those models.

How to find the fundamental frequency of a guitar string sound?

I want to build a guitar tuner app for iPhone. My goal is to find the fundamental frequency of the sound generated by a guitar string. I have used bits of code from the aurioTouch sample provided by Apple to calculate the frequency spectrum, and I find the frequency with the highest amplitude. It works fine for pure sounds (the ones that have only one frequency), but for sounds from a guitar string it produces wrong results. I have read that this is because the overtones generated by the guitar string might have higher amplitudes than the fundamental. How can I find the fundamental frequency so that it works for guitar strings? Is there an open-source library in C/C++/Obj-C for sound analysis (or signal processing)?
You can use the signal's autocorrelation, which is the inverse transform of the magnitude squared of the DFT. If you're sampling at 44100 samples/s, then an 82.4 Hz fundamental is about 535 samples, whereas 1479.98 Hz is about 30 samples. Look for the peak positive lag in that range (e.g. from 28 to 560). Make sure your window is at least two periods of the longest fundamental, which would be 1070 samples here. Rounded up to the next power of two, that's a 2048-sample buffer. For better frequency resolution and a less biased estimate, use a longer buffer, but not so long that the signal is no longer approximately stationary. Here's an example in Python:
from pylab import *
import wave
fs = 44100.0 # sample rate
K = 3        # number of windows
L = 8192     # 1st pass window overlap, 50%
M = 16384    # 1st pass window length
N = 32768    # 1st pass DFT length: acyclic correlation
# load a sample of guitar playing an open string 6
# with a fundamental frequency of 82.4 Hz (in theory),
# but this sample is actually at about 81.97 Hz
# (assumes a 16-bit mono WAV file)
g = frombuffer(wave.open('dist_gtr_6.wav').readframes(-1), dtype='int16')
g = g.astype(float64)
g = g / max(abs(g)) # normalize to +/- 1.0
mi = len(g) // 4 # start index
def welch(x, w, L, N):
    # Welch's method
    M = len(w)
    K = (len(x) - L) // (M - L)
    Xsq = zeros(N // 2 + 1) # len(N-point rfft) = N/2+1
    for k in range(K):
        m = k * (M - L)
        xt = w * x[m:m+M]
        # use rfft for efficiency (assumes x is real-valued)
        Xsq = Xsq + abs(rfft(xt, N)) ** 2
    Xsq = Xsq / K
    Wsq = abs(rfft(w, N)) ** 2
    bias = irfft(Wsq) # for unbiasing Rxx and Sxx
    p = dot(x, x) / len(x) # avg power, used as a check
    return Xsq, bias, p
# first pass: acyclic autocorrelation
x = g[mi:mi + K*M - (K-1)*L] # len(x) = 32768
w = hamming(M) # hamming[m] = 0.54 - 0.46*cos(2*pi*m/M)
               # reduces the side lobes in DFT
Xsq, bias, p = welch(x, w, L, N)
Rxx = irfft(Xsq) # acyclic autocorrelation
Rxx = Rxx / bias # unbias (bias is tapered)
mp = argmax(Rxx[28:561]) + 28 # index of 1st peak in 28 to 560
# 2nd pass: cyclic autocorrelation
N = M = L - (L % mp) # window an integer number of periods
                     # shortened to ~8192 for stationarity
x = g[mi:mi+K*M] # data for K windows
w = ones(M); L = 0 # rectangular, non-overlapping
Xsq, bias, p = welch(x, w, L, N)
Rxx = irfft(Xsq) # cyclic autocorrelation
Rxx = Rxx / bias # unbias (bias is constant)
mp = argmax(Rxx[28:561]) + 28 # index of 1st peak in 28 to 560
Sxx = Xsq / bias[0]
Sxx[1:-1] = 2 * Sxx[1:-1] # fold the freq axis
Sxx = Sxx / N # normalize S for avg power
n0 = N // mp
np = argmax(Sxx[n0-2:n0+3]) + n0-2 # bin of the nearest peak power
# check
print("\nAverage Power")
print("  p:", p)
print("Rxx:", Rxx[0])         # should equal dot product, p
print("Sxx:", sum(Sxx), '\n') # should equal Rxx[0]
subplot2grid((2,1), (0,0))
title('Autocorrelation, R$_{xx}$'); xlabel('Lags')
mr = r_[:3 * mp]
plot(Rxx[mr]); plot(mp, Rxx[mp], 'ro')
xticks(mp/2 * r_[1:6])
grid(); axis('tight'); ylim(1.25*min(Rxx), 1.25*max(Rxx))
subplot2grid((2,1), (1,0))
title('Power Spectral Density, S$_{xx}$'); xlabel('Frequency (Hz)')
fr = r_[:5 * np]; f = fs * fr / N
vlines(f, 0, Sxx[fr], colors='b', linewidth=2)
xticks((fs * np/N * r_[1:5]).round(3))
grid(); axis('tight'); ylim(0, 1.25*max(Sxx[fr]))
show()
Average Power
p: 0.0410611012542
Rxx: 0.0410611012542
Sxx: 0.0410611012542
The peak lag is 538, which is 44100/538 = 81.97 Hz. The first-pass acyclic DFT shows the fundamental at bin 61, which is 82.10 +/- 0.67 Hz. The 2nd pass uses a window length of 538*15 = 8070, so the DFT frequencies include the fundamental period and harmonics of the string. This enables an unbiased cyclic autocorrelation for an improved PSD estimate with less harmonic spreading (i.e. the correlation can wrap around the window periodically).
Edit: Updated to use Welch's method to estimate the autocorrelation. Overlapping the windows compensates for the Hamming window. I also calculate the tapered bias of the Hamming window to unbias the autocorrelation.
Edit: Added a 2nd pass with cyclic correlation to clean up the power spectral density. This pass uses 3 non-overlapping, rectangular windows of length 538*15 = 8070 (short enough to be nearly stationary). The bias for cyclic correlation is a constant, instead of the Hamming window's tapered bias.
Finding the musical pitches in a chord is far more difficult than estimating the pitch of one single string or note played at a time. The overtones for the multiple notes in a chord might all be overlapping and interleaving. And all the notes in common chords may themselves be at overtone frequencies for one or more non-existent lower pitched notes.
For single notes, autocorrelation is a common technique used by some guitar tuners. But with autocorrelation, you have to be aware of some potential octave uncertainty, as guitars may produce inharmonic and decaying overtones which thus don't exactly match from pitch period to pitch period. Cepstrum and Harmonic Product Spectrum are two other pitch estimation methods which may or may not have different problems, depending on the guitar and the note.
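The Harmonic Product Spectrum mentioned above can be sketched in a few lines of NumPy: the magnitude spectrum is multiplied by copies of itself compressed by 2, 3, ..., so that the harmonics k*f0 all land on the f0 bin and reinforce each other even when the fundamental itself is weak. This is a minimal sketch on a synthetic test tone, not a real guitar recording; the function name hps_pitch, the number of harmonics, the 60 Hz lower limit and the harmonic amplitudes are all illustrative assumptions.

```python
import numpy as np

def hps_pitch(signal, fs, n_harmonics=4, fmin=60.0):
    """Estimate f0 with a Harmonic Product Spectrum (HPS)."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    hps = spectrum.copy()
    # multiply by the spectrum decimated by 2, 3, ... (compressed frequency axis)
    for k in range(2, n_harmonics + 1):
        decimated = spectrum[::k]
        hps[:len(decimated)] *= decimated
    valid = len(spectrum[::n_harmonics])   # bins that received every factor
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    lo = np.searchsorted(freqs, fmin)      # skip DC and sub-harmonic bins
    return freqs[lo + np.argmax(hps[lo:valid])]

# synthetic "string": 82.4 Hz fundamental that is weaker than its overtones
fs = 44100
t = np.arange(fs) / fs                     # 1 s of samples -> 1 Hz bins
amps = [0.3, 1.0, 0.8, 0.6, 0.4]           # amplitudes of harmonics 1..5
tone = sum(a * np.sin(2 * np.pi * 82.4 * (h + 1) * t) for h, a in enumerate(amps))

freqs = np.fft.rfftfreq(len(tone), d=1.0 / fs)
naive = freqs[np.argmax(np.abs(np.fft.rfft(tone)))]   # strongest single peak
f0 = hps_pitch(tone, fs)
print(naive, f0)
```

Here the strongest single peak is the 2nd harmonic (the failure described in the question), while the HPS peak lands on the bin nearest 82.4 Hz. With real strings, inharmonicity and noise can still produce octave errors, as noted above.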
RAPT appears to be one published algorithm for more robust pitch estimation. YIN is another.
Also, Objective-C is a superset of ANSI C, so you can use any C DSP routines you find for pitch estimation within an Objective-C app.
Use libaubio and be happy. Trying to implement a fundamental frequency estimator myself was one of my biggest time sinks. If you want to do it yourself, I advise you to follow the YINFFT method.
