multipanel plot with labels extracted from htest R objects - statistics

ks.test returns a list with class "htest", but I cannot find a way to store those lists, with their class intact, in a vector. The code I am using is:
random.sim <- read.delim("ABC_searStatsForModelFit_model0_RandomValidation.txt")
labels <- names(random.sim)
par(mfrow=c(4,3), oma=c(0.5, 0.75, 2, 0.25), mar=c(4, 4, 4, 4))
pdf("posterior_bias_random.pdf",width=9,height=13)
ks = vector("list",12)
i=1
for (n in c(6,11,16,21,26,31,36,41,46,51,56,61)) {
ks[i]<-ks.test(random.sim[,n], "qunif")
i=i+1
hist(random.sim[,n], main="", xlab=labels[n], ylab="Frequency")
add_label(0.4, 0.07, paste("K-S test = ", ks[i], sep=""))
}
title("2CAB+CJAfg", outer=T)
dev.off()
I know I'm doing something wrong because none of the ks[i] elements has class htest.
Please note that "add_label" is a small function that I borrowed from somewhere else (sorry, I do not recall where, but possibly from Stack Overflow) to align the labels within the plots:
add_label <- function(xfrac, yfrac, label, pos = 4, ...) {
u <- par("usr")
x <- u[1] + xfrac * (u[2] - u[1])
y <- u[4] - yfrac * (u[4] - u[3])
text(x, y, label, pos = pos, ...)
}
Thanks for any help.
Pablo
I was able to solve the plotting part by labeling on the fly:
pdf("posterior_bias_random.pdf",width=9,height=13)
par(mfrow=c(4,3), oma=c(0.5, 0.75, 2, 0.25), mar=c(4, 4, 4, 4))
for (n in c(6,11,16,21,26,31,36,41,46,51,56,61)) {
ks<-ks.test(random.sim[,n], "punif")  # ks.test expects a CDF name ("punif"), not the quantile function ("qunif")
hist(random.sim[,n], main="", xlab=labels[n], ylab="Frequency")
add_label(0.4, 0.07, paste("K-S test = ", signif(ks$statistic, digits=3), sep=""))
}
title("2CAB+CJAfg", outer=T)
dev.off()
However, I still wonder how to store such a list. I have read the question "How to store htest list into a matrix", but I cannot use the solutions. Josh's only keeps the last test list, whereas Bruno's own answer does not clarify how to store the non-numeric info.
Thanks anyway for maintaining this great forum. It is probably my preferred source for solving R code questions.
Pablo

Related

In drc() package, drm fct = L.4 finds wrong intercept parameters, even though the graph is right

I have a problem with the following code.
It calculates the drc curve correctly, but the ec50 wrongly, although they are closely related...
library(drc)      # drm(), L.4(), ED()
library(ggplot2)  # ggplot(), stat_smooth()
x <- c(-1, -0.114074, 0.187521, 0.363612, 0.488551, 0.585461, 0.664642, 0.730782, 0.788875, 0.840106, 0.885926, 0.92737, 0.965202, 1)
y <- c(100, 3.978395643, 0.851717911, 0.697307565, 0.512455497, 0.512455497, 0.482273052, 0.479293487, 0.361024717, 0.355324864, 0.303120838, 0.286539832, 0.465692047, 0.358045152)
mat <- cbind(x, y)
df <- as.data.frame(mat)
calc <- drm(
formula = y ~ x,
data = df,
fct = L.4(names = c("hill", "min_value", "max_value", "ec50"))
)
plot <- ggplot(df, aes(x=x, y=y), color="black") +
geom_point() +
labs(x = "x", y = "y") +
theme(
axis.title.x = element_text(color="black", size=10),
axis.title.y = element_text(color="black", size=10),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black")
) +
stat_smooth(
formula = y ~ x,
method = "drm", color="black",
method.args = list(fct = L.4(names = c("hill", "min_value", "max_value", "ec50"))),
se = FALSE
) +
theme(panel.background=element_rect(fill="white"))+
ylim(0, NA)
ec50 <- ED(calc,50)
print(ec50)
print(calc)
print(plot)
This is the graph I obtain:
But if I print the parameters of the function L.4, I have the following result:
hill:(Intercept) 6.3181
min_value:(Intercept) 0.3943
max_value:(Intercept) 111.0511
ec50:(Intercept) -0.6520
max_value:(Intercept) is obviously wrong (it has to be 100), and, as a consequence, ec50 is wrong too.
I would also add that for other sets of data, the min_value:(Intercept) is wrong too (with values < 0...)
I cannot find the mistake, because the graph derived from the same function L.4 shows the right values.
Thank you very much for your help!
A 4PL fit assumes a curve that is symmetric about its inflection point, so the curve approaches the lower and the upper asymptote at the same rate.
Your data may top out at 100, but the model places the upper asymptote beyond that (at 111) because that is where the fitted curve levels off, not where your data end.
So the graph follows your data, but the estimated parameters describe the full symmetric curve that was fitted to them, which pushes the upper asymptote up. This also shifts the EC50.
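As a rough check of the point above: drc documents L.4 as the four-parameter logistic written below. Assuming your names map onto its b, c, d, e parameters in the order you gave them, plugging in your printed estimates (worked by hand, so the rounding is approximate) shows the fitted curve passing through your first data point at x = -1 even though the asymptote itself sits higher:

```latex
f(x) = c + \frac{d - c}{1 + \exp\bigl(b\,(x - e)\bigr)},
\qquad b = \text{hill},\; c = \text{min\_value},\; d = \text{max\_value},\; e = \text{ec50}

f(-1) = 0.3943 + \frac{111.0511 - 0.3943}{1 + \exp\bigl(6.3181\,(-1 + 0.6520)\bigr)}
      \approx 0.3943 + \frac{110.6568}{1.1109}
      \approx 100.0
```

In this parameterisation e is the x value at which the response is halfway between c and d, so it moves whenever the estimated asymptotes move.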

SolverStudio how to reference 1 column in a 2D list in a for loop (PuLP)

I have 2 data sets x1 and x2. I want to be able to get a total sum of all the products of x1 and x2 only in the rows where the From column has Auckland in it.
see here
The final answer should be (5*1) + (2*1) + (3*1) + (4*1) or 14. The PuLP code that I wrote to do this is given below
# Import PuLP modeller functions
from pulp import *
varFinal = sum([x1[a] * x2[a] for a in Arcs if a == Nodes[0]])
print Nodes[0]
print Arcs[0]
Final = varFinal
The output that gets printed to the console is
Auckland
('Auckland', 'Albany')
I realise that my final value is zero because Arcs[some number] does not equal Nodes[some number]. Is there any way to change the code so my final value is 14?
Any help is appreciated.
Welcome to Stack Overflow! Because you've only posted part of your code, I have to guess at what data types you're using. From the output, I'm guessing your Nodes are strings and your Arcs are tuples of strings.
Your attempt is very close: you want the From column to have Auckland in it. You can index into a tuple the same way you would into an array, so you want to do: a[0] == Nodes[0].
Below is a self-contained example with the first bit of your data in it, which outputs the following (note that I've changed to Python 3.x print calls, with parentheses):
Output:
Auckland
('Auckland', 'Albany')
14
Code:
# Import PuLP modeller functions
from pulp import *
# Data
Nodes = ['Auckland',
'Wellington',
'Hamilton',
'Kansas City',
'Christchuch',
'Albany',
'Whangarei',
'Rotorua',
'New Plymouth']
Arcs = [('Auckland','Albany'),
('Auckland','Hamilton'),
('Auckland','Kansas City'),
('Auckland','Christchuch'),
('Wellington','Hamilton'),
('Hamilton','Albany'),
('Kansas City','Whangarei'),
('Christchuch','Rotorua')]
x1_vals = [1, 2, 3, 4, 5, 9, 11, 13]
x2_vals = [5, 1, 1, 1, 1, 1, 1, 1]
x1 = dict((Arcs[i], x1_vals[i]) for i in range(len(Arcs)))
x2 = dict((Arcs[i], x2_vals[i]) for i in range(len(Arcs)))
varFinal = sum([x1[a] * x2[a] for a in Arcs if a[0] == Nodes[0]])
print(Nodes[0])
print(Arcs[0])
print(varFinal)
For future reference, answers are most likely to be forthcoming if you include code which others can run without external data dependencies. That way people can try it, fix it, and re-post it.

How to plot a Cramer’s V heatmap for categorical features?

The association between categorical variables should be computed using Cramér's V. I found the following code to plot it, but I don't know why the author plotted it for "contribution", which is a numeric variable.
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as ss
import seaborn as sns

def cramers_corrected_stat(confusion_matrix):
    """Calculate Cramér's V statistic for categorical-categorical association.

    Uses the bias correction from Wicher Bergsma,
    Journal of the Korean Statistical Society 42 (2013): 323-328.
    """
    chi2 = ss.chi2_contingency(confusion_matrix)[0]
    n = confusion_matrix.sum().sum()
    phi2 = chi2/n
    r,k = confusion_matrix.shape
    phi2corr = max(0, phi2 - ((k-1)*(r-1))/(n-1))
    rcorr = r - ((r-1)**2)/(n-1)
    kcorr = k - ((k-1)**2)/(n-1)
    return np.sqrt(phi2corr / min( (kcorr-1), (rcorr-1)))

# df is the DataFrame holding the data (not shown in the question)
cols = ["Party", "Vote", "contrib"]
corrM = np.zeros((len(cols),len(cols)))
# there's probably a nice pandas way to do this
for col1, col2 in itertools.combinations(cols, 2):
    idx1, idx2 = cols.index(col1), cols.index(col2)
    corrM[idx1, idx2] = cramers_corrected_stat(pd.crosstab(df[col1], df[col2]))
    corrM[idx2, idx1] = corrM[idx1, idx2]
corr = pd.DataFrame(corrM, index=cols, columns=cols)
fig, ax = plt.subplots(figsize=(7, 6))
ax = sns.heatmap(corr, annot=True, ax=ax)
ax.set_title("Cramer V Correlation between Variables")
I also found Bokeh. However, I am not sure whether it uses Cramér's V to plot the heatmap.
In fact, I have two categorical features: the first one has 2 categories and the second one has 37 categories. Could you please let me know how to plot a Cramér's V heatmap?
Some part of my dataset is here.
Thanks in advance.
What's the problem? The code is absolutely right.
The heatmap drawn on ax shows the association matrix between the variables.
Using "contribution" is not correct, as you can see in the article quoted below.
Quote:
"This isn't right to do on the Contribution variable, but we'll do more with a model later."
The author shows this variable as an example only.
In your case, what is the reason for plotting Cramér's V? You have just two variables (as far as I can see), so you will get only one Cramér's V coefficient.
But of course you can run the same code on your data and get a Cramér's V heatmap.
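To make the "only one coefficient" point concrete, here is a minimal sketch that computes the plain (uncorrected) Cramér's V for a single pair of categorical features using only the standard library. The function name cramers_v and the count tables are made up for illustration; they are not from the question's data, and the sketch leaves out the bias correction used in the question's code.

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    r, k = len(table), len(table[0])
    return math.sqrt(chi2 / (n * min(r - 1, k - 1)))

# Hypothetical counts: a 2-category feature crossed with a 3-category feature.
independent = [[10, 20, 30], [20, 40, 60]]  # rows proportional: no association
dependent = [[30, 0, 0], [0, 30, 30]]       # first feature fully determined by second

v_independent = cramers_v(independent)
v_dependent = cramers_v(dependent)
print(v_independent, v_dependent)
```

With a 2 x 37 table the call is the same; the result is still a single number, so a heatmap over just those two features would show one off-diagonal value mirrored across the diagonal.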

Applying function to cartesian product of two unequal vectors

I am trying to avoid looping by using a documented apply function, but have not been able to find any examples that suit my purpose. I have two vectors, x which is (1 x p) and y which is (1 x q), and would like to feed the Cartesian product of their elements into a function. Here is a minimal example:
require(kernlab)
x = c("cranapple", "pear", "orange-aid", "mango", "kiwi",
"strawberry-kiwi", "fruit-punch", "pomegranate")
y = c("apple", "cranberry", "orange", "peach")
sk <- stringdot(type="boundrange", length = l, normalized=TRUE)
sk_map = function(x, y){return(sk(x, y))}
I realize I could use an apply function over one dimension and loop for the other, but I feel like there has to be a way to do it in one step... any ideas?
Is this what you had in mind:
sk <- stringdot(type="boundrange", length = 2, normalized=TRUE)
# Create data frame with every combination of x and y
dat = expand.grid(x=x,y=y)
# Apply sk by row
sk_map = apply(dat, 1, function(dat_row) sk(dat_row[1],dat_row[2]))
You can use the outer function for this if your function is vectorized, and you can use the Vectorize function to create a vectorized function if it is not.
outer(x,y,FUN=sk)
or
outer(x,y, FUN=Vectorize(sk))

R: Call matrixes from a vector of string names?

Imagine I've got 100 numeric matrices with 5 columns each.
I keep the names of those matrices in a vector or list:
Mat <- c("GON1EU", "GON2EU", "GON3EU", "NEW4", ....)
I also have a vector of coefficients "coef",
coef <- c(1, 2, 2, 1, ...)
And I want to calculate a resulting vector in this way:
coef[1]*GON1EU[,1]+coef[2]*GON2EU[,1]+coef[3]*GON3EU[,1]+coef[4]*NEW4[,1]+.....
How can I do it in a compact way, using the vector of names?
Something like:
coef*(Object(Mat)[,1])
I think the key is how to call an object from a string containing its name and then use vectorized notation. But I don't know how.
get() allows you to refer to an object by a string. It will only get you so far though; you'll still need to construct the repeated call to get() on the list of matrices etc. However, I wonder if an alternative approach might be feasible? Instead of storing the matrices separately in the workspace, why not store the matrices in a list?
Then you can use sapply() on the list to extract the first column of each matrix in the list. The sapply() step returns a matrix with one column per original matrix. Multiplying that matrix by the coefficient vector with %*% weights column j by the j-th coefficient and sums across the columns, which gives the values you appear to want from your above description. (A plain * would recycle the vector down the rows instead, which is not what you described.) At least I'm assuming that coef[1]*GON1EU[,1] is a vector of length(GON1EU[,1]), etc.
Here's some code implementing this idea.
vec <- 1:4 ## don't use coef - there is a function with that name
mat <- matrix(1:12, ncol = 3)
myList <- list(mat1 = mat, mat2 = mat, mat3 = mat, mat4 = mat)
drop(sapply(myList, function(x) x[, 1]) %*% vec)
Here is some output:
> sapply(myList, function(x) x[, 1])
     mat1 mat2 mat3 mat4
[1,]    1    1    1    1
[2,]    2    2    2    2
[3,]    3    3    3    3
[4,]    4    4    4    4
> drop(sapply(myList, function(x) x[, 1]) %*% vec)
[1] 10 20 30 40
The above example suggests you create, or read in, your 100 matrices as components of a list from the very beginning of your analysis. This will require you to alter the code you used to generate the 100 matrices. Seeing as you already have your 100 matrices in your workspace, to get myList from these matrices we can use the vector of names you already have and use a loop:
Mat <- c("mat","mat","mat","mat")
## loop
myList <- vector("list", length(Mat))
for(i in seq_along(Mat)) {
  myList[[i]] <- get(Mat[i])
}
## or as lapply call - Kudos to Ritchie Cotton for pointing that one out!
## myList <- lapply(Mat, get)
myList <- setNames(myList, paste(Mat, 1:4, sep = ""))
## You only need:
myList <- setNames(myList, Mat)
## as you have the proper names of the matrices
I used "mat" repeatedly in Mat as that is the name of my matrix above. You would use your own Mat. If vec contains what you have in coef, and you create myList using the for loop above, then all you should need to do is:
drop(sapply(myList, function(x) x[, 1]) %*% vec)
to get the answer you wanted.
See help(get) and that's that.
If you'd given us a reproducible example I'd have said a bit more. For example:
> a=1;b=2;c=3;d=4
> M=letters[1:4]
> M
[1] "a" "b" "c" "d"
> sum = 0 ; for(i in 1:4){sum = sum + i * get(M[i])}
> sum
[1] 30
Put whatever you need in the loop, or use apply over the vector M and get the object:
> sum(unlist(lapply(M,function(n){get(n)^2})))
[1] 30

Resources