The n-th "Invalid parent value" error - jags

I hope you can help me with a very simple model I'm running right now in Rjags.
The data I have are as follows:
> print(data)
$R
225738 184094 66275 24861 11266
228662 199379 70308 27511 12229
246808 224814 78255 30447 13425
254823 236063 83099 33148 13961
263772 250706 89182 35450 14750
272844 262707 96918 37116 15715
280101 271612 102604 38692 16682
291493 283018 111125 40996 18064
310474 299315 119354 44552 19707
340975 322054 126901 47757 21510
347597 332946 127708 49103 21354
354252 355994 130561 51925 22421
366818 393534 140628 56562 23711
346430 400629 146037 59594 25313
316438 399545 150733 62414 26720
303294 405876 161793 67060 29545
$N
9597000 8843000 9154000 9956000 11329000
9854932 9349814 9532373 10195193 11357751
9908897 9676950 9303113 10263930 11141510
9981879 9916245 9248586 10270193 10903446
10086567 10093723 9307104 10193818 10660101
10242793 10190641 9479080 10041145 10453320
10434789 10222806 9712544 9835154 10411620
10597293 10238784 10014422 9611918 10489448
10731326 10270163 10229259 9559334 10502839
10805148 10339566 10393532 9625879 10437809
10804571 10459413 10466871 9800559 10292169
10696317 10611599 10477448 10030407 10085603
10540942 10860363 10539271 10245334 9850488
10411836 11053751 10569913 10435763 9797028
10336667 11152428 10652017 10613341 9850533
10283624 11172747 10826549 10719741 9981814
$n
[1] 16
$na
[1] 5
$pbeta
[1] 0.70 0.95
and the model is as follows:
cat('model{
  ## likelihoods ##
  for(k in 1:na){ for(w in 1:n){ R[w,k] ~ dbin( theta[w,k], N[w,k] ) }}
  for(k in 1:na){ for(w in 1:n){ theta[w,k] <- 0.5*beta[w,k]*0.5 }}
  for(k in 1:na){
    beta[1,k] ~ dunif(pbeta[1], pbeta[2])
    beta.plus[1,k] <- beta[1,k]
    for (w in 2:n){
      beta.plus[w,k] ~ dunif(beta[(w-1),k], 0.95)
      beta[w,k] <- beta.plus[w,k]
    }
  }
}',
file='model1.bug')
######## initial random values for beta
bbb = bb.plus = matrix(rep(NA, 16*5), byrow=T, ncol=5)
for(k in 1:5){
  bbb[1,k] = runif(1, 0.7, 0.95)
  for (w in 2:16){
    bb.plus[w,k] = runif(1, bbb[w-1,k], 0.95)
    bbb[w,k] = bb.plus[w,k]
  }
}
## data & initial values
library(rjags)
inits1 <- list('beta' = bbb)
jags_mod <- jags.model('model1.bug', data=data, inits=inits1, n.chains=1, n.adapt=1000)
update(jags_mod, n.iter=1000)
posts <- coda.samples(model=jags_mod, variable.names=c('beta','theta'),
                      n.iter=niter, thin=1000)  # niter must be set beforehand (not shown in the question)
Super easy. This is actually a scaled-down version of a more complex model, which gives me exactly the same error message I describe below.
Whenever I run this model as written, there are no problems at all.
You will notice that the priors for the parameter beta are written so that beta increases from 0.7 towards 0.95.
Now I would like to "shut off" the likelihood for R by commenting out the first line of the model. I'd like to do so to see how the parameter theta gets estimated in that case (basically I should find theta = beta/4, and that would be fine with me).
When I do that, I get an "Invalid parent" error for the parameter beta, generally in the bottom rows (rows 15 or 16) of the matrix.
Actually it's more subtle than that: sometimes I get the error and sometimes I don't (mostly, I do).
I don't understand why this happens: shouldn't the values of beta be generated independently of the presence/absence of a likelihood?
Sorry if this is a naive question, I really hope you can help me sort it out.
Thanks, best!
Emanuele

After playing around with the model a bit more, I think I found the cause of your problem. One requirement of the uniform distribution dunif(a, b) is that a < b. Because your model makes each successive uniform interval smaller and smaller, the lower bound a gets pushed closer and closer to b. Sometimes it never reaches it, but other times a equals b and you get the invalid parent values error. For example, if you include this in your model:
example ~ dunif(0.4,0.4)
You will get "Error in node example, Invalid parent values".
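To see how quickly those chained intervals collapse, here is a minimal sketch in plain R (not JAGS) of the same construction, extended to 50 steps so the eventual collapse is easy to see; the exact step at which it happens depends on the seed:
set.seed(1)
b <- numeric(50)
b[1] <- runif(1, 0.70, 0.95)                       # same starting prior as beta[1,k]
for (w in 2:50) b[w] <- runif(1, b[w - 1], 0.95)   # same chaining as beta.plus[w,k]
width <- 0.95 - b                                  # room left below the upper bound
signif(width[c(1, 10, 20, 30, 40, 50)], 3)         # shrinks roughly geometrically
any(width == 0)                                    # typically TRUE by 50 steps
Plain R's runif() simply returns the endpoint once the bounds coincide, but JAGS's dunif() treats a == b as invalid parent values.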
So, to solve this I think it will be easier to adjust how you specify your priors instead of chaining them through ever-shrinking uniforms. You could do this with the beta distribution. At the first step, beta(23.48, 4.98) covers most of the range from 0.7 to 0.95, and we can truncate it to make sure it stays within that range. Then, as the row index w increases, you lower 4.98 so that the prior shrinks towards 0.95. The model below does this. After inspecting samples from the priors, theta does indeed turn out to equal beta/4.
data.list <- list(n = 16, na = 5,
                  B = rev(seq(0.1, 4.98, length.out = 16)))
cat('model{
  ## likelihoods ##
  #for(k in 1:na){ for(w in 1:n){ R[w,k] ~ dbin( theta[w,k], N[w,k] ) }}
  for(k in 1:na){ for(w in 1:n){ theta[w,k] <- 0.5*beta[w,k]*0.5 }}
  for(k in 1:na){
    for(w in 1:n){
      beta[w,k] ~ dbeta(23.48, B[w]) T(0.7,0.95)
    }
  }
}',
file='model1.bug')
jags_mod <- jags.model('model1.bug', data=data.list,
                       inits=inits1, n.chains=1, n.adapt=1000)
update(jags_mod, n.iter=1000)
posts <- coda.samples(model=jags_mod,
                      variable.names=c('beta','theta'), n.iter=10000, thin=10)
Looking at some of the output from this model we get
     beta[1,1] theta[1,1]
[1,] 0.9448125  0.2362031
[2,] 0.7788794  0.1947198
[3,] 0.9498806  0.2374702

> 0.9448125/4
[1] 0.2362031
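To see why lowering B[w] pushes the prior towards 0.95, here is a quick look at the untruncated prior means (the mean of a Beta(a, b) distribution is a/(a + b)):
B <- rev(seq(0.1, 4.98, length.out = 16))
round(23.48 / (23.48 + B), 3)
# rises from about 0.825 at w = 1 to about 0.996 at w = 16, so after
# truncation to (0.7, 0.95) the prior mass piles up ever closer to 0.95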
Since I don't really know what you are trying to use the model for, I do not know whether the beta distribution suits your needs, but the above method will mimic what you are trying to do.

Related

(Wald)Test for comparing two nested feols/plm models

I want to test if
model_1 <- feols(auth ~ dummy_past1 + dummy_past2 + dummy_past3
| region + date_f, # Fixed Effects
data=df)
is just as good as
model_2 <- feols(auth ~ i(dummy_past1,specific_regions) + i(dummy_past2,specific_regions) + i(dummy_past3, specific_regions) # interaction(dummies * specific regions)
| region + date_f, # Fixed Effects
data=df)
If this were a normal linear regression, I would conduct an lrtest (likelihood ratio test). As I have heard that a Wald test can also be applied to panel data, I ran
waldtest(model_1, model_2)
However, when doing that I get the error: "Error in solve.default(vc[ovar, ovar]) : system is computationally singular: reciprocal condition number = 2.10749e-44"
Does someone know whether a Wald test is the correct approach here, or whether there are other tests I could run on plm or feols regressions? It would also be very helpful if you have ideas on how I could get the waldtest working.

How does sklearn.linear_model.LinearRegression work with insufficient data?

To solve a 5-parameter model, I need at least 5 data points to get a unique solution. For the x and y data below:
import numpy as np
x = np.array([[-0.24155831, 0.37083184, -1.69002708, 1.4578805 , 0.91790011,
0.31648635, -0.15957368],
[-0.37541846, -0.14572825, -2.19695883, 1.01136142, 0.57288752,
0.32080956, -0.82986857],
[ 0.33815532, 3.1123936 , -0.29317028, 3.01493602, 1.64978158,
0.56301755, 1.3958912 ],
[ 0.84486735, 4.74567324, 0.7982888 , 3.56604097, 1.47633894,
1.38743513, 3.0679506 ],
[-0.2752026 , 2.9110031 , 0.19218081, 2.0691105 , 0.49240373,
1.63213241, 2.4235483 ],
[ 0.89942508, 5.09052174, 1.26048572, 3.73477373, 1.4302902 ,
1.91907482, 3.70126468]])
y = np.array([-0.81388378, -1.59719762, -0.08256274, 0.61297275, 0.99359647,
1.11315445])
I used only 6 data points to fit an 8-parameter model (7 slopes and 1 intercept).
from sklearn.linear_model import LinearRegression
lr = LinearRegression().fit(x, y)
print(lr.coef_)
array([-0.83916772, -0.57249998, 0.73025938, -0.02065629, 0.47637768,
-0.36962192, 0.99128474])
print(lr.intercept_)
0.2978781587718828
Clearly, it's using some kind of additional assumption to reduce the degrees of freedom. I tried to look into the source code but couldn't find anything about that. What method does it use to find the parameters of an under-specified model?
You don't need to reduce the degrees of freedom: it simply finds a solution to the least squares problem min over beta of sum_i (dot(beta, x_i) + beta_0 - y_i)**2. For example, in the non-sparse case it uses scipy.linalg.lstsq, whose default solver for this optimization problem is the gelsd LAPACK driver. If
A = np.concatenate((np.ones((x.shape[0], 1)), x), axis=1)
is the array augmented with ones as its first column, then a solution is given by
beta = np.linalg.pinv(A.T @ A) @ A.T @ y
where we use the pseudoinverse precisely because the matrix may not be of full rank. Of course, the solver doesn't actually use this formula; it uses the singular value decomposition of A to evaluate it.
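As a quick numerical check, reusing the x and y arrays from the question (treat this as a sketch: for a dense x with the default fit_intercept=True, LinearRegression centers x and y and then calls scipy.linalg.lstsq, which returns the minimum-norm solution):
import numpy as np
from sklearn.linear_model import LinearRegression

lr = LinearRegression().fit(x, y)            # x, y as defined in the question

# Reproduce the slopes with the pseudoinverse of the centered design matrix
xc = x - x.mean(axis=0)
yc = y - y.mean()
coef = np.linalg.pinv(xc) @ yc               # minimum-norm least-squares slopes
intercept = y.mean() - x.mean(axis=0) @ coef

print(np.allclose(coef, lr.coef_), np.allclose(intercept, lr.intercept_))
# With 8 parameters and only 6 rows the fit is exact: the residuals are ~0
print(np.allclose(lr.predict(x), y))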

How to use custom mean, median, mode functions with array of 2500 in python?

So I am trying to solve the mean, median and mode challenge on HackerRank. I defined 3 functions to calculate the mean, median and mode for a given array with a length between 10 and 2500, inclusive.
I get a wrong answer with an array of 2500 integers and I'm not sure why. I looked into the Python documentation and found no mention of a maximum length for lists. I know I could use the statistics module, but I'm trying the hard way and being stubborn, I guess. Any help and criticism of my code is appreciated. Please be honest and brutal if need be. Thanks
N = int(input())
var_list = [int(x) for x in input().split()]

def mean(sample_list):
    mean = sum(sample_list)/N
    print(mean)
    return

def median(sample_list):
    sorted_list = sorted(sample_list)
    if N%2 != 0:
        median = sorted_list[(N//2)]
    else:
        median = (sorted_list[N//2] + sorted_list[(N//2)-1])/2
    print(median)
    return

def mode(sample_list):
    sorted_list = sorted(sample_list)
    mode = min(sorted_list)
    max_count = sorted_list.count(mode)
    for i in sorted_list:
        if (i <= mode) and (sorted_list.count(i) >= max_count):
            mode = i
    print(mode)
    return

mean(var_list)
median(var_list)
mode(var_list)
Compiler Message
Wrong Answer
Input (stdin)
2500
19325 74348 68955 98497 26622 32516 97390 64601 64410 10205 5173 25044 23966 60492 71098 13852 27371 40577 74997 42548 95799 26783 51505 25284 49987 99134 33865 25198 24497 19837 53534 44961 93979 76075 57999 93564 71865 90141 5736 54600 58914 72031 78758 30015 21729 57992 35083 33079 6932 96145 73623 55226 18447 15526 41033 46267 52486 64081 3705 51675 97470 64777 31060 90341 55108 77695 16588 64492 21642 56200 48312 5279 15252 20428 57224 38086 19494 57178 49084 37239 32317 68884 98127 79085 77820 2664 37698 84039 63449 63987 20771 3946 862 1311 77463 19216 57974 73012 78016 9412 90919 40744 24322 68755 59072 57407 4026 15452 82125 91125 99024 49150 90465 62477 30556 39943 44421 68568 31056 66870 63203 43521 78523 58464 38319 30682 77207 86684 44876 81896 58623 24624 14808 73395 92533 4398 8767 72743 1999 6507 49353 81676 71188 78019 88429 68320 59395 95307 95770 32034 57015 26439 2878 40394 33748 41552 64939 49762 71841 40393 38293 48853 81628 52111 49934 74061 98537 83075 83920 42792 96943 3357 83393{-truncated-}
Expected Output
49921.5
49253.5
2184
Your issue seems to be that you are using standard list operations rather than calculating things on the fly while looping through the data once (for the average). sum(sample_list) will almost surely give you something which exceeds the double limit; in other words, it becomes really big.
Further reading
Calculating the mean, variance, skewness, and kurtosis on the fly
How do I determine the standard deviation (stddev) of a set of values?
Rolling variance algorithm
What is a good solution for calculating an average where the sum of all values exceeds a double's limits?
How to efficiently compute average on the fly (moving average)?
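For illustration, here is a minimal sketch of the kind of on-the-fly (running) mean those links describe, using the first few values from the test input:
# Running mean: one pass over the data, no separately accumulated sum.
def running_mean(values):
    mean = 0.0
    for count, v in enumerate(values, start=1):
        mean += (v - mean) / count   # incremental update
    return mean

print(running_mean([19325, 74348, 68955, 98497]))   # about 65281.25, same as sum/len up to floating point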
I figured out that you forgot to update the max_count variable inside the if block; that is probably what causes the wrong result. I tested the debugged version on my computer and it seems to work well when I compare its results with scipy's built-in functions. The corrected mode function should be
def mode(sample_list):
    N = len(sample_list)
    sorted_list = sorted(sample_list)
    mode = min(sorted_list)
    max_count = sorted_list.count(mode)
    for i in sorted_list:
        if (sorted_list.count(i) >= max_count):
            mode = i
            max_count = sorted_list.count(i)
    print(mode)
I was busy with some stuff and now came back to completing this. I am happy to say that I have matured enough as a coder and solved this issue.
Here is the solution:
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Input an array of numbers, convert it to integer array
n = int(input())
my_array = list(map(int, input().split()))
my_array.sort()

# Find mean
array_mean = sum(my_array) / n
print(array_mean)

# Find median
if (n%2) != 0:
    array_median = my_array[n//2]
else:
    array_median = (my_array[n//2 - 1] + my_array[n//2]) / 2
print(array_median)

# Find mode (I could do this using multimode method of statistics module for python 3.8)
def sort_second(array):
    return array[1]

modes = [[i, my_array.count(i)] for i in my_array]
modes.sort(key=sort_second, reverse=True)
array_mode = modes[0][0]
print(array_mode)
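For comparison, here is a compact cross-check with the standard library (this assumes Python 3.8+ for statistics.multimode; HackerRank may expect specific output formatting, so treat it as a sketch rather than a drop-in submission):
import statistics

n = int(input())
data = sorted(map(int, input().split()))

print(sum(data) / n)                     # mean
print(statistics.median(data))           # median
print(min(statistics.multimode(data)))   # smallest of the most frequent values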

linearK error in seq.default(): 'to' cannot be NA, NaN or infinite

I am trying to learn linearK estimates on a small linnet object from the CRC spatstat book (chapter 17), and when I use the linearK function, spatstat throws an error. I have documented the process in the comments in the R code below. The error is as below.
Error in seq.default(from = 0, to = right, length.out = npos + 1L) : 'to' cannot be NA, NaN or infinite
I do not understand how to resolve this. I am following this process:
# I have data of points for each day of the week
# d1 is district 1 of the city.
# I did the step below because otherwise it was giving me a tbl class
d1_data = lapply(split(d1, d1$openDatefactor), as.data.frame)
# I previously created a linnet and divided it into districts of the city
d1_linnet = districts_linnet[["d1"]]
# I create a point pattern for each day
d1_ppp = lapply(d1_data, function(x) as.ppp(x, W=Window(d1_linnet)))
plot(d1_ppp[[1]], which.marks="type")
# I then convert the point pattern to a point pattern on the linear network
d1_lpp <- as.lpp(d1_ppp[[1]], L=d1_linnet, W=Window(d1_linnet))
d1_lpp
Point pattern on linear network
3 points
15 columns of marks: ‘status’, ‘number_of_’, ‘zip’, ‘ward’,
‘police_dis’, ‘community_’, ‘type’, ‘days’, ‘NAME’,
‘DISTRICT’, ‘openDatefactor’, ‘OpenDate’, ‘coseDatefactor’,
‘closeDate’ and ‘instance’
Linear network with 4286 vertices and 6183 lines
Enclosing window: polygonal boundary
enclosing rectangle: [441140.9, 448217.7] x [4640080, 4652557] units
# the errors start from plotting this lpp object
plot(d1_lpp)
"show.all" is not a graphical parameter
Error in plot.window(...) : need finite 'xlim' values
coords(d1_lpp)
x y seg tp
441649.2 4649853 5426 0.5774863
445716.9 4648692 5250 0.5435492
444724.6 4646320 677 0.9189631
3 rows
And then, consequently, I also get an error from linearK(d1_lpp):
Error in seq.default(from = 0, to = right, length.out = npos + 1L) : 'to' cannot be NA, NaN or infinite
I feel the lpp object is the problem, but I find it hard to interpret these errors and work out how to resolve them. Could someone please guide me?
Thanks
I can confirm there is a bug in plot.lpp when trying to plot the marked point pattern on the linear network. That will hopefully be fixed soon. You can plot the unmarked point pattern using
plot(unmark(d1_lpp))
I cannot reproduce the problem with linearK. Which version of spatstat are you running? In the development version on my laptop (spatstat 1.51-0.073) everything works. There have been changes to this code recently, so it is likely that this will be solved by updating to the development version (see https://github.com/spatstat/spatstat).
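For reference, here is a self-contained sketch of the same workflow on spatstat's built-in simplenet network (assuming a reasonably recent spatstat); if this runs on your installation but your own data still fail, the problem lies in the d1_lpp object rather than in linearK itself:
library(spatstat)
X <- runiflpp(20, simplenet)   # 20 uniform random points on the bundled example network
plot(X)
K <- linearK(X)
plot(K)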

R simplify heatmap to pdf

I want to plot a simplified heatmap that is not so difficult to edit with the scalable vector graphics program I am using (Inkscape). The original heatmap, as produced below, contains lots of rectangles, and I wonder if they could be merged together within the different sectors to simplify the output pdf file:
nentries = 100000
ci = rainbow(nentries)
set.seed(1)
mean = 10
## Generate some data (4 factors)
i = data.frame(
  a = round(abs(rnorm(nentries, mean-2))),
  b = round(abs(rnorm(nentries, mean-1))),
  c = round(abs(rnorm(nentries, mean+1))),
  d = round(abs(rnorm(nentries, mean+2)))
)
minvalue = 10
# Discretise values to 1 or 0
m0 = matrix(as.numeric(i>minvalue),nrow=nrow(i))
# Remove rows with all zeros
m = m0[rowSums(m0)>0,]
# Reorder with 1,1,1,1 on top
ms =m[order(as.vector(m %*% matrix(2^((ncol(m)-1):0),ncol=1)), decreasing=TRUE),]
rowci = rainbow(nrow(ms))
colci = rainbow(ncol(ms))
colnames(ms)=LETTERS[1:4]
limits=c(which(!duplicated(ms)),nrow(ms))
l=length(limits)
toname=round((limits[-l]+ limits[-1])/2)
freq=(limits[-1]-limits[-l])/nrow(ms)
rn=rep("", nrow(ms))
for(i in toname) rn[i]=paste(colnames(ms)[which(ms[i,]==1)],collapse="")
rn[toname]=paste(rn[toname], ": ", sprintf( "%.5f", freq ), "%")
heatmap(ms,
        Rowv = NA,
        labRow = rn,
        keep.dendro = FALSE,
        col = c("black", "red"),
        RowSideColors = rowci,
        ColSideColors = colci)
dev.copy2pdf(file="/tmp/file.pdf")
Why don't you try RSvgDevice? Using it you could save your image as an svg file, which is much more convenient for Inkscape than pdf.
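If RSvgDevice turns out not to be available, a minimal sketch with the base grDevices svg() device does the same job, reusing the objects built in the question:
svg("/tmp/file.svg", width = 7, height = 7)   # base R svg device (grDevices)
heatmap(ms,
        Rowv = NA,
        labRow = rn,
        keep.dendro = FALSE,
        col = c("black", "red"),
        RowSideColors = rowci,
        ColSideColors = colci)
dev.off()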
I use the Cairo package for producing svg. It's incredibly easy. Here is a much simpler plot than the one you have in your example:
require(Cairo)
CairoSVG(file = "tmp.svg", width = 6, height = 6)
plot(1:10)
dev.off()
Upon opening in Inkscape, you can ungroup the elements and edit as you like.
Example (point moved, swirl added):
I don't think we (the internet) are being clear enough on this one. Let me just start off with a successful export example:
png("heatmap.png") # Ruby devs: think of this as a bit like opening a `File.open("asdfsd") do |f|` block
heatmap(sample_matrix, Rowv=NA, Colv=NA, col=terrain.colors(256), scale="column", margins=c(5,10))
dev.off()
The dev.off() bit, to my mind, is like the end of a Ruby block or method, in that the last plot drawn in the enclosed code (between png() and dev.off()) is what gets dumped into the png file.
For example, if you ran this code:
png("heatmap4.png")
heatmap(sample_matrix, Rowv=NA, Colv=NA, col=terrain.colors(32), scale="column", margins=c(5,15))
heatmap(sample_matrix, Rowv=NA, Colv=NA, col=greenred(32), scale="column", margins=c(5,15))  # greenred() comes from the gplots package
dev.off()
it would output the 2nd heatmap (the greenred color scheme; I just tested it) to the heatmap4.png file, just like how a Ruby method returns its last line by default.
