Dodging error bars in marginsplot in Stata

I am using marginsplot to draw some error bars for two different groups. The error bars overlap, though, so I'm trying to dodge them slightly left or right of one another.
Here is an example slightly edited from the marginsplot help that illustrates the problem:
use http://www.stata-press.com/data/r13/nhanes2
quietly regress bpsystol agegrp##sex
quietly margins agegrp#sex
marginsplot, recast(scatter) ciopts(recast(rspike))
Is there any easy way to dodge the blue Male points and bars slightly to the left and the red Female points and bars slightly to the right (or vice versa), like what is done in dodged bar charts?
Here it would work out fine to recast the confidence intervals to an area and make it slightly transparent, as in the help example further down. For my actual use case, however, I would like to keep the points and spikes.

Here is an approach using the community-contributed commands parmest and eclplot.
The trick is to shift the values of the group variable by a small amount (half of the desired offset, here 0.05) and then to use the supby() option of eclplot, whose spaceby() suboption applies the full offset between the groups:
** a short version
use http://www.stata-press.com/data/r13/nhanes2
qui reg bpsystol agegrp##sex
qui margins agegrp#sex
qui parmest , bmat(r(b)) vmat(r(V)) level(95) fast
qui split parm, parse( . # )
qui destring parm*, replace
replace parm1 = parm1 - ( 0.05 )
eclplot estimate min95 max95 parm1, eplot(sc) rplottype(rspike) supby(parm3, spaceby(0.1))
However, the problem with this approach is that all the value labels get lost, and I do not know of a good way to restore them other than by brute force.
The following is an extended version of the code, where I tried to automate reapplying all the value labels by a brute-force method:
use http://www.stata-press.com/data/r13/nhanes2, clear
** specify parameters and variables
local cilevel = 95
local groupvar agegrp
local typevar sex
local ytitle "Linear Prediction"
local title "Adjust Predictions of `groupvar'#`typevar' with `cilevel'% CIs"
local eplot scatter
local rplottype rspike
local spaceby 0.1 /* use this param to control the dodge */
** store labels of groupvar ("agegrp") and typevar ("sex")
local varlist `groupvar' `typevar'
foreach vv of var `varlist' {
    local `vv'_varlab : var lab `vv'
    qui levelsof `vv', local( `vv'_vals )
    foreach vl of local `vv'_vals {
        local `vv'_`vl'lab : lab `vv' `vl'
        lab def `vv'_vallab `vl' "``vv'_`vl'lab'", add
    }
}
** run analysis
qui reg bpsystol `groupvar'##`typevar'
margins `groupvar'#`typevar'
** use parmest to store estimates
preserve
parmest , bmat(r(b)) vmat(r(V)) level( `cilevel' ) fast
lab var estimate "`ytitle'"
split parm, parse( . # )
qui destring parm*, replace
rename parm1 `groupvar'
rename parm3 `typevar'
** reapply stored labels
foreach vv of var `varlist' {
    lab var `vv' "``vv'_varlab'"
    lab val `vv' `vv'_vallab
}
** dodge and plot
replace `groupvar' = `groupvar' - ( `spaceby' / 2 )
eclplot estimate min95 max95 `groupvar' ///
    , eplot( `eplot' ) rplottype( `rplottype' ) ///
    supby( `typevar', spaceby( `spaceby' ) ) ///
    estopts1( mcolor( navy ) ) estopts2( mcolor( maroon ) ) ///
    ciopts1( lcolor( navy ) ) ciopts2( lcolor( maroon ) ) ///
    title( "`title'" )
restore

Related

Adding/multiplying heatmaps in gnuplot

Is it possible with gnuplot to perform an operation (adding/multiplying) on data from two data files and generate a heatmap with the result of the operation?
Ex: I have two files each with 4 columns, where
Col1: X coordinate
Col2: Y coordinate
Col3: Value
Col4: Uncertainty
I want to multiply column 3 of each file.
I wondered if something similar exists/would work in gnuplot, like ...
splot 'first.dat' using 1:2:(v=$3), 'second.dat' using 1:2:(v*$3)
I have been able to do this with two columns from the same file
splot 'first.dat' using 1:2:($3*$4)
A very similar question has already been answered:
gnuplot plot data from two files
In your case it will look like this:
splot "<paste first.dat second.dat" u 1:2:($3*$6)
Note that all columns from both files are present in the pasted result, so the column numbers of the second file are offset by the number of columns in the first file and you have to "skip" over those of the first file.
The OP apparently runs Linux or macOS. @Eldrad's nice and short solution won't work on Windows. Of course, you can install additional programs like GnuWin, awk, etc.
A platform-independent, gnuplot-only (but a bit more complicated) solution is the following.
You load the files unchanged into datablocks and then print them into a new datablock, appending each line of the second file to the corresponding line of the first. The assumption, of course, is that the two files have the same number of lines.
Code:
### plot data from different files combined with mathematical operation
# platform independent gnuplot-only solution
reset session
Windows = GPVAL_SYSNAME[:7] eq "Windows" ? 1 : 0 # otherwise Linux or MacOS
FILE = "first.dat"
Data = "$Data1"
if (Windows) { load '< echo '.Data.' ^<^<EOD & type "'.FILE.'"' }
else { load '< echo "\'.Data.' <<EOD" & cat "'.FILE.'"' }
FILE = "second.dat"
Data = "$Data2"
if (Windows) { load '< echo '.Data.' ^<^<EOD & type "'.FILE.'"' }
else { load '< echo "\'.Data.' <<EOD" & cat "'.FILE.'"' }
set print $Data
do for [i=1:|$Data1|] {
print $Data1[i][1:strlen($Data1[i])-1]."\t".$Data2[i]
}
set print
splot $Data u 1:2:($3*$6)
### end of code

The n-th "Invalid parent value" error

I hope you can help me with a very simple model I'm running right now in Rjags.
The data I have are as follows:
> print(data)
$R
225738 184094 66275 24861 11266
228662 199379 70308 27511 12229
246808 224814 78255 30447 13425
254823 236063 83099 33148 13961
263772 250706 89182 35450 14750
272844 262707 96918 37116 15715
280101 271612 102604 38692 16682
291493 283018 111125 40996 18064
310474 299315 119354 44552 19707
340975 322054 126901 47757 21510
347597 332946 127708 49103 21354
354252 355994 130561 51925 22421
366818 393534 140628 56562 23711
346430 400629 146037 59594 25313
316438 399545 150733 62414 26720
303294 405876 161793 67060 29545
$N
9597000 8843000 9154000 9956000 11329000
9854932 9349814 9532373 10195193 11357751
9908897 9676950 9303113 10263930 11141510
9981879 9916245 9248586 10270193 10903446
10086567 10093723 9307104 10193818 10660101
10242793 10190641 9479080 10041145 10453320
10434789 10222806 9712544 9835154 10411620
10597293 10238784 10014422 9611918 10489448
10731326 10270163 10229259 9559334 10502839
10805148 10339566 10393532 9625879 10437809
10804571 10459413 10466871 9800559 10292169
10696317 10611599 10477448 10030407 10085603
10540942 10860363 10539271 10245334 9850488
10411836 11053751 10569913 10435763 9797028
10336667 11152428 10652017 10613341 9850533
10283624 11172747 10826549 10719741 9981814
$n
[1] 16
$na
[1] 5
$pbeta
[1] 0.70 0.95
and the model is as follows:
cat('model{
## likelihoods ##
for(k in 1:na){ for(w in 1:n){ R[w,k] ~ dbin( theta[w,k], N[w,k] ) }}
for(k in 1:na){ for(w in 1:n){ theta[w,k] <- 0.5*beta[w,k]*0.5 }}
for(k in 1:na){
beta[1,k] ~ dunif(pbeta[1], pbeta[2])
beta.plus[1,k] <- beta[1,k]
for (w in 2:n){
beta.plus[w,k] ~ dunif(beta[(w-1),k], 0.95)
beta[w,k] <- beta.plus[w,k]} } }',
file='model1.bug')
######## initial random values for beta
bbb=bb.plus=matrix(rep(NA, 16*5), byrow=T, ncol=5);
for(k in 1:5){
bbb[1,k]=runif(1, 0.7,0.95);
for (w in 2:16){
bb.plus[w,k] = runif(1, bbb[w-1,k], 0.95);
bbb[w,k]=bb.plus[w,k]} }
## data & initial values
library(rjags)
inits1 <- list('beta'= bbb )
jags_mod <- jags.model('model1.bug', data=data, inits=inits1, n.chains=1, n.adapt=1000)
update(jags_mod, n.iter=1000)
posts=coda.samples(model=jags_mod,variable.names=c('beta','theta'), n.iter=niter, thin=1000)
Super easy. This is actually a scaled-down version of a more complex model, which gives me exactly the same error message I get here.
Whenever I run this model as written, there are no problems at all.
You will notice that the priors for the parameter beta are written in such a way as to be increasing from 0.7 to 0.95.
Now I would like to "shut off" the likelihood for R by commenting out the first line of the model. I'd like to do so to see how the parameter theta gets estimated in that case (basically I should find theta = beta/4, which would be fine with me).
When I do that, I get an "Invalid parent" error for parameter beta, generally in the bottom rows (rows 15 or 16) of the matrix.
Actually it's more subtle than that: sometimes I get the error and sometimes I don't (mostly, I do).
I don't understand why this happens: shouldn't the values of beta be generated independently of the presence or absence of a likelihood?
Sorry if this is a naive question, I really hope you can help me sort it out.
Thanks, best!
Emanuele
After playing around with the model a bit more, I think I found the cause of your problem. One necessary property of the uniform distribution unif(a, b) is that a < b. As you make the uniform distribution smaller and smaller within your model, you bring a closer and closer to b. At times a does not reach b, but other times a equals b and you get the invalid parent values error. For example, if you include the following in your model:
example ~ dunif(0.4,0.4)
You will get "Error in node example, Invalid parent values".
So, to solve this I think it will be easier to adjust how you specify your priors instead of assigning them randomly. You could do this with the beta distribution. At the first step, beta(23.48, 4.98) covers most of the range from 0.7 to 0.95, but we can truncate it to make sure it stays within that range. Then, as w increases, you can lower 4.98 so that the prior shrinks towards 0.95. The model below does this. After inspecting the sampled values, it does turn out that theta equals beta/4.
data.list <- list( n = 16, na = 5,
B = rev(seq(0.1, 4.98, length.out = 16)))
cat('model{
## likelihoods ##
#for(k in 1:na){ for(w in 1:n){ R[w,k] ~ dbin( theta[w,k], N[w,k] ) }}
for(k in 1:na){ for(w in 1:n){ theta[w,k] <- 0.5*beta[w,k]*0.5 }}
for(k in 1:na){
for(w in 1:n){
beta[w,k] ~ dbeta(23.48, B[w]) T(0.7,0.95)
} } }',
file='model1.bug')
jags_mod <- jags.model('model1.bug', data=data.list,
inits=inits1, n.chains=1, n.adapt=1000)
update(jags_mod, n.iter=1000)
posts=coda.samples(model=jags_mod,
variable.names=c('beta','theta'), n.iter=10000, thin=10)
Looking at some of the output from this model we get
beta[1,1] theta[1,1]
[1,] 0.9448125 0.2362031
[2,] 0.7788794 0.1947198
[3,] 0.9498806 0.2374702
0.9448125/4
[1] 0.2362031
Since I don't really know what you are trying to use the model for, I do not know whether the beta distribution suits your needs, but the above method mimics what you are trying to do.
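If you want to see what those priors look like, here is a minimal base-R sketch of how lowering the second shape parameter pushes dbeta(23.48, .) towards 0.95 (the shape values mirror the B vector in data.list above; the dashed lines mark the truncation bounds):
B <- rev(seq(0.1, 4.98, length.out = 16))  # same sequence as in data.list
x <- seq(0.5, 1, length.out = 500)
# density at the first step (widest prior), then overlay all later steps
plot(x, dbeta(x, 23.48, B[1]), type = "l", ylim = c(0, 25),
     xlab = "beta", ylab = "prior density")
for (w in 2:16) lines(x, dbeta(x, 23.48, B[w]), col = grey(w / 20))
abline(v = c(0.7, 0.95), lty = 2)  # truncation bounds used in the model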

In Stata, how can I group coefplot's means across categorical variable?

I'm working with the coefplot command (source, docs) in Stata, plotting means of a continuous variable over categories.
Small reproducible example:
sysuse auto, clear
drop if rep78 < 3
la de rep78 3 "Three" 4 "Four" 5 "Five"
la val rep78 rep78
mean mpg if foreign == 0, over(rep78)
eststo Domestic
mean mpg if foreign == 1, over(rep78)
eststo Foreign
su mpg, mean
coefplot Domestic Foreign , xtitle(Mpg) xline(`r(mean)')
This gives me the following result:
What I'd like to add is an extra 'group' label on the y axis. Trying options from the regression examples doesn't seem to do the job:
coefplot Domestic Foreign , headings(0.rep78 = "Repair Record 1978")
coefplot Domestic Foreign , groups(?.rep78 = "Repair Record 1978")
Any other possibilities?
This seems to do the job:
coefplot Domestic Foreign , xtitle(Mpg) xline(`r(mean)') ///
groups(Three Four Five = "Repair Record 1978")
I don't know, however, how it will handle situations where different categorical variables share the same value labels.

R simplify heatmap to pdf

I want to plot a simplified heatmap that is not so difficult to edit in the scalable vector graphics program I am using (Inkscape). The original heatmap, as produced by the code below, contains lots of rectangles, and I wonder if they could be merged together within the different sectors to simplify the output pdf file:
nentries=100000
ci=rainbow(nentries)
set.seed(1)
mean=10
## Generate some data (4 factors)
i = data.frame(
a=round(abs(rnorm(nentries,mean-2))),
b=round(abs(rnorm(nentries,mean-1))),
c=round(abs(rnorm(nentries,mean+1))),
d=round(abs(rnorm(nentries,mean+2)))
)
minvalue = 10
# Discretise values to 1 or 0
m0 = matrix(as.numeric(i>minvalue),nrow=nrow(i))
# Remove rows with all zeros
m = m0[rowSums(m0)>0,]
# Reorder with 1,1,1,1 on top
ms =m[order(as.vector(m %*% matrix(2^((ncol(m)-1):0),ncol=1)), decreasing=TRUE),]
rowci = rainbow(nrow(ms))
colci = rainbow(ncol(ms))
colnames(ms)=LETTERS[1:4]
limits=c(which(!duplicated(ms)),nrow(ms))
l=length(limits)
toname=round((limits[-l]+ limits[-1])/2)
freq=(limits[-1]-limits[-l])/nrow(ms)
rn=rep("", nrow(ms))
for(i in toname) rn[i]=paste(colnames(ms)[which(ms[i,]==1)],collapse="")
rn[toname]=paste(rn[toname], ": ", sprintf( "%.5f", freq ), "%")
heatmap(ms,
Rowv=NA,
labRow=rn,
keep.dendro = FALSE,
col=c("black","red"),
RowSideColors=rowci,
ColSideColors=colci
)
dev.copy2pdf(file="/tmp/file.pdf")
Why don't you try RSvgDevice? Using it you could save your image as an SVG file, which is much more convenient for Inkscape than PDF.
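For example, a minimal sketch of that workflow, reusing the objects from your code and assuming the usual open-draw-close pattern of the devSVG() device:
library(RSvgDevice)
# open an SVG device instead of copying to pdf; the file can be edited directly in Inkscape
devSVG(file = "heatmap.svg", width = 8, height = 8)
heatmap(ms,
        Rowv = NA,
        labRow = rn,
        col = c("black", "red"),
        RowSideColors = rowci,
        ColSideColors = colci)
dev.off()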
I use the Cairo package for producing svg. It's incredibly easy. Here is a much simpler plot than the one you have in your example:
require(Cairo)
CairoSVG(file = "tmp.svg", width = 6, height = 6)
plot(1:10)
dev.off()
Upon opening in Inkscape, you can ungroup the elements and edit as you like.
Example (point moved, swirl added):
I don't think we (the internet) are being clear enough on this one.
Let me just start off with a successful export example:
png("heatmap.png") # Ruby devs: think of this as kind of like opening a `File.open("asdfsd") do |f|` block
heatmap(sample_matrix, Rowv=NA, Colv=NA, col=terrain.colors(256), scale="column", margins=c(5,10))
dev.off()
The dev.off() bit, in my mind, is like the closing end of a Ruby block or method, in that the last plot produced by the enclosed code (between png() and dev.off()) is what gets dumped into the png file.
For example, if you ran this code:
png("heatmap4.png")
heatmap(sample_matrix, Rowv=NA, Colv=NA, col=terrain.colors(32), scale="column", margins=c(5,15))
heatmap(sample_matrix, Rowv=NA, Colv=NA, col=greenred(32), scale="column", margins=c(5,15))
dev.off()
it would output the second heatmap (the greenred color scheme; I just tested it) to the heatmap4.png file, just as a Ruby method returns its last line by default.

Spatially Subsetting Images in batch mode using IDL and ENVI

I would like to spatially subset Landsat images in ENVI using an IDL program. I have over 150 images that I would like to subset, so I'd like to run the program in batch mode (with no interaction). I know how to do it manually, but what command would I use to spatially subset the images via lat/long coordinates in IDL code?
Here is some inspiration for a single file. You can do the same for a large number of files by building up a list of filenames and looping over it.
; define the image to be opened (could be in a loop), I believe it can also be a tif, img...
img_file='path/to/image.hdr'
envi_open_file,img_file,r_fid=fid
if (fid eq -1) then begin
print, 'Error when opening file ',img_file
return
endif
; let's define some coordinates
XMap=[-70.0580916, -70.5006694]
YMap=[-32.6030694, -32.9797194]
; now convert coordinates into pixel position:
; the transformation function uses the image geographic information:
ENVI_CONVERT_FILE_COORDINATES, FID, XF, YF, XMap, YMap
; pixel positions must be integers; think twice here, maybe you need floor() or ceil() instead of round()
XF=ROUND(XF)
YF=ROUND(YF)
; read the image
envi_file_query, fid, DIMS=DIMS, NB=NB, NL=NL, NS=NS
pos = lindgen(nb)
; and store it in an array
image=fltarr(NS, NL, NB)
; read each band sequentially
FOR i=0, NB-1 DO BEGIN
image[*,*,i]= envi_get_data(fid=fid, dims=dims, pos=pos[i])
endfor
; simply crop the data with array indexing, keeping all bands
; (this assumes XF[0] <= XF[1] and YF[0] <= YF[1]; swap the bounds if not)
imagen = image[XF[0]:XF[1], YF[0]:YF[1], *]
nl2 = YF[1] - YF[0] + 1
ns2 = XF[1] - XF[0] + 1
; read mapinfo to save it in the final file
map_info=envi_get_map_info(fid=fid)
envi_write_envi_file, imagen, data_type=4, $
descrip = 'cropped', $
map_info = map_info, $
nl=nl2, ns=ns2, nb=nb, r_fid=r_fid, $
OUT_NAME = 'path/to/cropped.hdr'
