This follows on from my original question, which has been closed:
how-to-store-a-variable-in-gnuplot-and-use-it-in-xrange-and-in-set-arrow
The stats command worked for me as
stats case.".data.agr" u 4
but it prints the stats output to the terminal like this:
* FILE:
Records: 33656
Out of range: 0
Invalid: 0
Column headers: 0
Blank: 0
Data Blocks: 1
* COLUMN:
Mean: 1.9161
Std Dev: 1.1081
Sample StdDev: 1.1081
Skewness: 0.0001
Kurtosis: 1.8016
Avg Dev: 0.9592
Sum: 64487.8293
Sum Sq.: 164887.7903
Mean Err.: 0.0060
Std Dev Err.: 0.0043
Skewness Err.: 0.0134
Kurtosis Err.: 0.0267
Minimum: 0.0000 [ 0]
Maximum: 3.8335 [33655]
Quartile: 0.9581
Median: 1.9168
Quartile: 2.8732
which I do not want to see.
Could someone please tell me how to set up the stats command so that it does not print the stats data to the terminal?
Add nooutput:
stats case.".data.agr" u 4 nooutput
You can get help on gnuplot commands with the help command at the gnuplot prompt; in your case: help stats.
Let's assume that we have a survfit object as follows.
fit = survfit(Surv(data$time_12m, data$status_12m) ~ data$group)
fit
Call: survfit(formula = Surv(data$time_12m, data$status_12m) ~ data$group)
n events median 0.95LCL 0.95UCL
data$group=HF 10000 3534 NA NA NA
data$group=IGT 70 20 NA NA NA
The fit object does not show CIs. How can I calculate confidence intervals for the survival rates? Which R packages and code should be used?
The print result of survfit gives confidence intervals by group for median survival time. I'm guessing the NAs for the estimates of median times are occurring because your groups do not have enough events to actually reach a median survival. You should show the output of plot(fit) to see whether my guess is correct.
You might try to plot the KM curves, noting that the plot.survfit function does have a confidence interval option constructed around proportions:
plot(fit, conf.int=0.95, col=1:2)
Please read ?summary.survfit. It is one of the generic summary methods that package authors typically use to deliver parameter estimates and confidence intervals. There you will see that it is not "rates" which are summarized by summary.survfit, but rather estimates of survival proportion. These can either be medians (in which case the estimate is on the time scale) or estimates at particular times (in which case the estimates are of proportions).
If you actually do want rates, then use a function designed for that sort of model, perhaps ?survreg. Compare what you get from using survreg versus survfit on the supplied dataset ovarian:
> reg.fit <- survreg( Surv(futime, fustat)~rx, data=ovarian)
> summary(reg.fit)
Call:
survreg(formula = Surv(futime, fustat) ~ rx, data = ovarian)
Value Std. Error z p
(Intercept) 6.265 0.778 8.05 8.3e-16
rx 0.559 0.529 1.06 0.29
Log(scale) -0.121 0.251 -0.48 0.63
Scale= 0.886
Weibull distribution
Loglik(model)= -97.4 Loglik(intercept only)= -98
Chisq= 1.18 on 1 degrees of freedom, p= 0.28
Number of Newton-Raphson Iterations: 5
n= 26
#-------------
> fit <- survfit( Surv(futime, fustat)~rx, data=ovarian)
> summary(fit)
Call: survfit(formula = Surv(futime, fustat) ~ rx, data = ovarian)
rx=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
59 13 1 0.923 0.0739 0.789 1.000
115 12 1 0.846 0.1001 0.671 1.000
156 11 1 0.769 0.1169 0.571 1.000
268 10 1 0.692 0.1280 0.482 0.995
329 9 1 0.615 0.1349 0.400 0.946
431 8 1 0.538 0.1383 0.326 0.891
638 5 1 0.431 0.1467 0.221 0.840
rx=2
time n.risk n.event survival std.err lower 95% CI upper 95% CI
353 13 1 0.923 0.0739 0.789 1.000
365 12 1 0.846 0.1001 0.671 1.000
464 9 1 0.752 0.1256 0.542 1.000
475 8 1 0.658 0.1407 0.433 1.000
563 7 1 0.564 0.1488 0.336 0.946
It might have been easier if I had used "exponential" instead of "weibull" as the distribution type. Exponential fits have a single estimated parameter and are more easily back-transformed to give estimates of rates.
Note: I answered an earlier question about survfit, although the request was for survival times rather than for rates. Extract survival probabilities in Survfit by groups
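The back-transformation mentioned for exponential fits can be sketched outside R. This assumes survreg's accelerated-failure-time parameterization, in which an exponential fit models log(time) as intercept + coef * x, so that the event rate per unit time is exp(-(intercept + coef * x)). The function name and the coefficient value below are hypothetical and are not taken from the ovarian fit shown above, which used the Weibull distribution.

```python
import math


def exponential_rate(intercept, coef=0.0, x=0.0):
    """Event rate implied by an exponential AFT fit: exp(-(intercept + coef * x))."""
    return math.exp(-(intercept + coef * x))


# Hypothetical intercept, chosen only to illustrate the transformation.
rate = exponential_rate(intercept=math.log(100.0))
print(rate)
```

With a covariate, the same call with x set to each group's value gives one rate per group.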
I'm seeing some weird behavior with the stats command when using ranges.
Consider the following simple example.
$Data<<EOD
1 10
2 20
3 30
4 40
5 50
6 60
7 70
8 80
9 90
10 100
EOD
stats [1:5] $Data u 1 nooutput
print STATS_records # Result: 10; Expected: 5
stats [1:5] $Data u 1:2 nooutput
print STATS_records # It works fine
Why doesn't the first stats command return the expected value?
This affects all stats results. Is it a bug? Am I missing something? I'm using version 5.2 patchlevel 8.
My attempt to explain this behavior:
If you do
stats [1:5] $Data u 1:2 nooutput
Column 1 corresponds to x and column 2 corresponds to y.
With [1:5] you limit x from 1 to 5, hence 5 records.
If you do
stats [1:5] $Data u 1 nooutput
Column 1 is "kind of" y and the pseudocolumn 0 is "kind of" x,
however, you are limiting x but not the pseudocolumn 0, hence 10 records.
So, if you do
stats [1:5] $Data u 1:1 nooutput
you will get the expected results and the expected statistics on column 1.
To have a look at all the STATS values, type show var STATS.
But I'm just guessing... I'm sure @Ethan can tell.
Say, I have a large data file that starts at index 1 and ends at more than 10000, like this:
1 -35000 44312 53750 97500 67687 5000 1.64
2 33500 -12937 -68000 -37250 -35937 -96750 1.64
3 -37750 43125 53500 95250 66937 4500 1.64
4 29000 -15437 -69000 -39750 -36562 -97250 1.64
5 -39000 43062 52250 93000 65750 3750 1.64
.
.
.
100000 29250 -14250 -69250 -41500 -37500 -98000 1.64
I use this command to monitor the data online:
plot 'data.raw' using 0:3 title 'Reference' w lp ls 1, \
'data.raw' using 0:7 title 'Temperature' w lp ls 7
set xrange [0: ]
pause 0.5
replot
reread
As the number of data points increases, I barely see a change in the graph, because I plot the whole file from X=0. How can I plot only a certain interval, e.g. deltaX = 300 points, with autoupdate? I would then see 0-300, 300-600, and so on in the gnuplot plot window.
Thank You!
Not sure if this is what you're after. Say that I have some data file with 1000 entries (generated with bash):
for i in `seq 1 1 1000`; do echo $i $RANDOM >> data; done
Now I plot in intervals of 100 points and show each interval for 2 seconds:
do for [i=1:10] {
set xrange[100*(i-1):100*i]
set title "Interval no. ".i
plot "data" w l
pause 2
}
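The window arithmetic used in the loop can be checked separately. This is a minimal Python sketch, not gnuplot code; the helper names are made up for illustration. The first function reproduces the bounds 100*(i-1) to 100*i, and the second picks the interval that holds the newest point, which is what an auto-updating plot with deltaX = 300 would need.

```python
def interval_bounds(i, width=100):
    """Return (xmin, xmax) of the i-th interval, counting from 1."""
    return (width * (i - 1), width * i)


def current_interval(n_points, width=300):
    """Index (from 1) of the interval that contains the newest point."""
    # Ceiling division; an empty file still maps to the first interval.
    return max(1, -(-n_points // width))


# Example: with 650 points and a width of 300, the newest point
# lies in the third interval.
print(interval_bounds(current_interval(650, 300), 300))
```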
I am encountering a problem with gnuplot 5.0.3 when using the smooth frequency option. I am applying it to data that looks like the sample row below, with 4140 rows, using the attached gnuplot script.
I use smooth frequency within the plot command twice: once inside the set table wrapper to write the frequency data to $DB, and once in the actual plotting command.
In both cases the output contains two counts for the same bin or category. The column accval_col contains discrete integer data from 1 to 9, and the task is to count the occurrences of these categories and plot them in a histogram.
If I run the script on the first 3400 records it counts correctly; after that the problem occurs. The split counts for each category sum to the total count for that category, but I have no explanation for why this split happens or how to eliminate it.
I would appreciate any help on this, thanks.
reset
reset session
#input file
accfile=ARG1
#out_path=ARG2
#columns
accval_col=21
misses_col=13
#########################################################################
set term qt #pdfcairo font "Arial,7"
set datafile separator ";"
#print out_path
set output out_path
stats accfile u (((stringcolumn(misses_col)eq "\N") || (column(misses_col)==0))?column(accval_col):1/0) name "acc" #nooutput
set table $DB
plot accfile u (((stringcolumn(misses_col)eq "\N") || (column(misses_col)==0))?column(accval_col):1/0):(1.0*100/acc_records) smooth frequency
unset table
print $DB
set multiplot
unset key
set yrange [0:100]
set xrange [0.5:8.5]
set xtics ("<=3m" 1, "<=7,5m" 2, "<=12m" 3, "<=15m" 4, "<=20" 5, "<=30m" 6, "<=40m" 7, "<=50m" 8)
set boxwidth 0.5
set style fill solid 0.5
set xlabel "Kategorie"
set ylabel "Häufigkeit in %"
set title "Häufigkeitsdiagramm von Fehler Kategorien"
unset grid
set margins 8, 8, 6, 6
plot accfile u (((stringcolumn(misses_col)eq "\N") || (column(misses_col)==0))?column(accval_col):1/0):(1.0*100/acc_records) smooth frequency w boxes
set yrange[0:100]
unset xlabel
unset ylabel
unset title
unset ytics
unset xtics
set datafile separator
plot $DB u (stringcolumn(3) ne "u"?column(1):1/0):(column(2)+ 1.5):2 w labels
unset multiplot
set o
A sample line of the input data; the only column relevant to the script is 21 (2nd from the end):
2016-03-17;86347;1;26.68;300.14;;;;; ;3949600;FRA_VC_CAT10_SMR_W_032016;1;2476187;\N;2476187;-8.53550654341171;5.943441933097802;FRA_VC_CAT10_SMR_W_032016;6125718;3;10.400931398905074
Output on STDOUT:
* FILE:
Records: 4138
Out of range: 0
Invalid: 2
Blank: 0
Data Blocks: 1
* COLUMN:
Mean: 3.3971
Std Dev: 1.2643
Sample StdDev: 1.2645
Skewness: 0.9423
Kurtosis: 4.1839
Avg Dev: 1.0162
Sum: 14057.0000
Sum Sq.: 54367.0000
Mean Err.: 0.0197
Std Dev Err.: 0.0139
Skewness Err.: 0.0381
Kurtosis Err.: 0.0762
Minimum: 1.0000 [ 6]
Maximum: 9.0000 [4002]
Quartile: 3.0000
Median: 3.0000
Quartile: 4.0000
# Curve 0 of 1, 20 points
# Curve title: "accfile u (((stringcolumn(misses_col)eq "\N") || (column(misses_col)==0))?column(accval_col):1/0):(1.0*100/acc_records)"
# x y type
1 1.08748 i
2 21.7013 i
3 33.0594 i
4 13.0256 i
5 10.5848 i
6 2.27163 i
7 0.652489 i
8 0.314161 i
9 0.338328 i
1 0.0241663 u
1 0.144998 i
2 1.57081 i
3 5.58241 i
4 3.67327 i
5 3.67327 i
6 2.1508 i
7 0.120831 i
8 0.0241663 i
9 0.0241663 i
1 0.0241663 u
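The intended result, one percentage per category, can be cross-checked independently of gnuplot. This is a minimal Python sketch that assumes the filtered values of column 21 have already been read into a list. It does not explain the split seen in the output above; it only shows the single count per category that the plot is meant to produce. The function name and sample values are made up.

```python
from collections import Counter


def category_percentages(values):
    """Percentage of records per category, one entry per category."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in sorted(counts.items())}


sample = [3, 3, 2, 4, 3, 2, 1, 3]  # made-up category values
for cat, pct in category_percentages(sample).items():
    print(cat, pct)
```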
So I need to plot planes (as in, they HAVE to be FLAT) defined by three points which all come from my equation. I can redefine the code such that there is a space between the three points. I choose not to. I just added the comments for clarity of where the points are divided. They're not really there.
# surface 1
1.000 0.000 0.000
-46.777 -0.702 -1.692
0.000 3.000 5.500
# surface 2
0.998 0.030 0.055
-46.451 -2.099 -5.068
-0.468 2.993 5.483
# surface 3
0.991 0.060 0.110
-45.804 -3.471 -8.400
-0.932 2.972 5.432
# surface 4
0.979 0.089 0.164
-44.842 -4.803 -11.659
-1.390 2.937 5.348
# surface 5
0.963 0.119 0.217
-43.574 -6.079 -14.816
-1.839 2.889 5.232
#... and so on
Now I can plot just ONE surface using this code:
set dgrid3d 10,10
set style data lines
set pm3d
i=0
splot '5surf' every ::i::(i+2) pal
But when I plug it into a do loop:
n = 1000
unset key
set terminal gif size 800,600 crop
outtmpl = 'pic/output%07d.gif'
set dgrid3d 10,10
set style data lines
set pm3d
do for [i=0:n:3] {
set output sprintf(outtmpl, i)
splot '5surf' every ::i::(i+2) pal
print i
}
set output
I got curved surfaces with this, which is plain wrong. (pun intended)
The surface, according to my analysis, has to look a bit like it's rotating.
EDIT: I threw the dgrid3d out the window. Filledcurves didn't work. I was able to make a square with these points
1 1 4.8
-1 1 5.6
-1 -1 2.4
1 -1 1.6
1 1 4.8
using polygon, but I can't make it read from file.
Last edit: If anyone stumbling across this is curious as to how I found the four points from the original set of three, it was a matter of finding the equation of the plane containing the three points and plugging (±1, ±1, z) into it. Solve for z and consider all four cases. A basic calculus problem, really.
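The plane calculation described in that edit can be sketched in Python. The function names are made up for illustration. The normal of the plane is the cross product of two edge vectors, and the plane equation is then solved for z at each corner (±1, ±1). Three corners of the square listed in the question serve as a check: the fourth corner (1, -1) should come out at z = 1.6.

```python
def plane_z(p1, p2, p3, x, y):
    """z-coordinate at (x, y) of the plane through points p1, p2 and p3."""
    # Two edge vectors lying in the plane.
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Normal vector n = u x v.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("plane is vertical; z is not a function of (x, y)")
    # Solve n . (p - p1) = 0 for z.
    return p1[2] - (nx * (x - p1[0]) + ny * (y - p1[1])) / nz


def square_corners(p1, p2, p3):
    """Corners (x, y, z) of the plane over (1,1), (-1,1), (-1,-1), (1,-1)."""
    corners = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
    return [(x, y, plane_z(p1, p2, p3, x, y)) for x, y in corners]


for corner in square_corners((1, 1, 4.8), (-1, 1, 5.6), (-1, -1, 2.4)):
    print(corner)
```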
With the points for the square you must also insert an empty line to make pm3d work properly:
surface.dat:
-1 -1 2.4
-1 1 5.6

1 -1 1.6
1 1 4.8
Note that the y-values are always in the same order (-1, 1) in both blocks. Plot this with
set pm3d
splot 'surface.dat'
If you want to plot several surfaces from one file, you can separate consecutive surfaces with two empty lines and then access each one with index:
surfaces.dat:
-1 -1 2.4
-1 1 5.6

1 -1 1.6
1 1 4.8


-1 -1 1.4
-1 1 4.6

1 -1 0.6
1 1 3.8
You can use stats to count the number of blocks:
stats 'surfaces.dat' nooutput
set pm3d
do for [i=0:STATS_blocks - 1] {
splot 'surfaces.dat' index i
}
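A file in this layout can be generated rather than typed by hand. This is a minimal Python sketch with a made-up function name. It assumes each surface is given as four (x, y, z) corners, sorts them so that rows with the same x form one block with y in ascending order, puts one empty line between blocks, and puts two empty lines between surfaces so that index can address them.

```python
from itertools import groupby


def format_surfaces(surfaces):
    """Return file text for a list of surfaces, each a list of (x, y, z)."""
    chunks = []
    for corners in surfaces:
        # Sort by x, then y, so every block lists y in the same order.
        ordered = sorted(corners, key=lambda p: (p[0], p[1]))
        blocks = []
        for _, pts in groupby(ordered, key=lambda p: p[0]):
            blocks.append("\n".join("%g %g %g" % tuple(p) for p in pts))
        # One empty line between the blocks of a surface.
        chunks.append("\n\n".join(blocks))
    # Two empty lines between surfaces.
    return "\n\n\n".join(chunks) + "\n"


square = [(1, 1, 4.8), (-1, 1, 5.6), (-1, -1, 2.4), (1, -1, 1.6)]
print(format_surfaces([square]), end="")
```

Writing the returned text to surfaces.dat gives a file that the stats and splot loop above can read.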