Colour points in X,Y scatter based on value of continuous data in another column

My question is similar to this one:
vary point color based on column value for multiple data blocks gnuplot
Except that no explanation was given there for the syntax used and what it means.
The data looks like this, with columns separated by commas and one row per line:
0, 0F_0F_0F_0F_0F, 0_0_0_0_0_0_0_0_0_0, 1_0_0_0_0_0_0_0_0_0
4.046025985, 0F_2Fo_0F_2Fo_0F, 0_0_1_0_0_0_0_0_1_0, 1_1_0_0_0_0_1_0_0_0
2.941144083, 0F_0F_0F_0F_0F, 0_0_1_0_0_1_0_0_0_1, 1_0_0_0_1_0_0_0_0_0
1.836301245, 0F_0F_0F_2Fo_0F, 0_0_0_0_0_0_0_0_0_0, 1_0_0_0_0_0_0_0_0_0
0.90317579, 0F_0F_0F_2Fo_0F, 0_0_0_1_0_0_0_1_0_0, 1_0_1_0_0_1_0_0_1_0
3.826663156, 0F_0F_0F_0F_0F, 0_1_0_0_1_0_1_0_0_1, 1_0_1_0_0_0_0_0_0_0
In my datafile there are 100 rows; column 1 is to be used for the colour palette, and columns 2-4 are labels for the X,Y axes on two different plots.
What I want is an X,Y scatter of columns 3 and 4, with column 1 used to colour each point on the plot.
Here is my script attempt:
set title "K and W Occupancy \n KcsA, Replica 0, 0 mV "
set xlabel "POT" font ",18"
set ylabel "Water" font ",18"
set cblabel "Free energy (kT)" font ",18"
set xtics rotate by -45
set xtics out font ", 13" nomirror
set ytics out font ", 13" nomirror
set pointsize 0.4
set xrange [0:100]
set yrange [0:100]
set cbrange [0:10]
# MATLAB jet color palette --> from https://github.com/Gnuplotting/gnuplot-palettes/blob/master/jet.pal
# palette
set palette defined (0 0.0 0.0 0.5, \
1 0.0 0.0 1.0, \
2 0.0 0.5 1.0, \
3 0.0 1.0 1.0, \
4 0.5 1.0 0.5, \
5 1.0 1.0 0.0, \
6 1.0 0.5 0.0, \
7 1.0 0.0 0.0, \
8 0.5 0.0 0.0 )
splot '$filename' using 3:4:($1 <= 10 ? 0 : 1) w p pointtype 5 pointsize 1 palette linewidth 10
I do not really know what this means:
($1 <= 10 ? 0 : 1)
Why does the script plot a 3D graph with the data incorrectly placed?
I expected a 2D plot with unique entries along the X and Y axes, with each point coloured according to a colour scale.
The attempt described above results in a 3D plot and the points are incorrect.
Multiple answers to similar questions I have read do not explain what each term in the gnuplot script means, including:
Plotting style based on an entry in a data-file
gnuplot splot colors based on a fourth column of the data file
vary point color based on column value for multiple data blocks gnuplot

We don't have your data (if possible please always add minimized data) and we don't see your graph output.
I do not really know what this means: ($1 <= 10 ? 0 : 1)
This is the ternary operator. Check help ternary. If the value in column 1 ($1) is less than or equal to 10, the expression returns 0; otherwise it returns 1.
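As a minimal sketch, the same expression can be tried outside of a plot command. The function name flag is made up for this illustration; in the plot command, $1 takes the place of v.

```gnuplot
# general form:  condition ? value_if_true : value_if_false
flag(v) = (v <= 10 ? 0 : 1)
print flag(4.046025985)   # 4.046... <= 10, so the result is 0
print flag(12.5)          # 12.5 > 10, so the result is 1
```

In the six sample rows shown in the question, every value in column 1 is below 10, so this expression would evaluate to 0 for all of them.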
Why does the script plot a 3D graph with the data incorrectly placed?
Because you told gnuplot to do so. Mind the difference between splot and plot. Check help splot and help plot. splot requires x,y,z input, and your z is ($1 <= 10 ? 0 : 1).
So, without being able to test your case, your command probably should be something like this:
plot '$filename' u 3:4:1 w p pt 5 ps 1 lc palette
Addition:
If I understood your question correctly, I guess there is no off-the-shelf plotting style for this.
You need to:
create lists of unique elements (by (mis-)using stats, check help stats) for x and for y (in your case column 3 and 4). The list will be in the order of occurrence in the datafile. Unfortunately, gnuplot does not offer an internal alphanumerical sort of a list. If you want it sorted you need to either use external tools or a cumbersome gnuplot-only workaround.
define a function by (mis-)using sum (check help sum) which determines the index of a given item and use this index either as x- or y-coordinate
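The index lookup from the second step can be tried on its own before running a full script. This is only a sketch: the list uniq is a hypothetical hand-written stand-in for what the stats pass would collect, and getIdx is the helper described above.

```gnuplot
# getIdx(): position of string s in a space-separated list (NaN if s is not in the list)
getIdx(list,s) = (_c=NaN, sum[_i=1:words(list)] (word(list,_i) eq s ? _c=_i : NaN), _c)

uniq = ' 0_0_0_0 0_1_1_1 0_1_0_1 '   # hypothetical list, in order of first occurrence
print getIdx(uniq, '0_1_0_1')       # looks up the third word of the list
```

The returned index is what serves as the numeric x- or y-coordinate, while the original string is kept as the tic label.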
Script:
### scatter plot with x,y strings
reset session
$Data <<EOD
0.00, 0F_0F_0F, 0_0_0_0, 0_0_0_0
0.43, 0F_0F_0F, 0_1_1_1, 1_0_1_1
0.64, 0F_0F_0F, 0_1_1_1, 1_1_0_0
0.73, 0F_0F_0F, 0_1_1_1, 0_1_1_1
0.29, 0F_0F_0F, 0_1_0_1, 1_0_1_1
0.34, 0F_0F_0F, 0_1_0_1, 1_1_1_1
0.45, 0F_0F_0F, 1_1_1_1, 1_0_1_1
0.10, 0F_0F_0F, 1_1_1_1, 0_1_1_1
0.99, 0F_0F_0F, 0_0_1_1, 1_1_0_0
EOD
uniqX = uniqY = ' '
addToList(uniq,col) = uniq.(strstrt(uniq,' '.strcol(col).' ') ? '' : strcol(col).' ' )
getIdx(list,s) = (_c=NaN, sum[_i=1:words(list)] (word(list,_i) eq s ? _c=_i : NaN) , _c)
set datafile separator comma
stats $Data u (uniqX=addToList(uniqX,3), uniqY=addToList(uniqY,4)) nooutput
set key noautotitle
set xtic noenhanced rotate by 90 right
set ytic noenhanced
set offsets 0.5,0.5,0.5,0.5
set bmargin 4
set size ratio -1
set grid x,y
set palette rgb 33,13,10
plot $Data u (getIdx(uniqX,strcol(3))):(getIdx(uniqY,strcol(4))):1:xtic(3):ytic(4) w p pt 5 ps 7 lc palette
### end of script
Result:

Related

How to display y-labels on top of histogram bars on gnuplot

I designed a histogram in gnuplot; however, the y-scale needs to be in log2 because of the huge differences between the values. Therefore, to improve the readability of the plot, I intend to display the concrete values on top of each bar. The values represent bytes, so I would like these values also to be in log2 and to be formatted to display kB, MB, ... as is being done on the y-axis.
How can I achieve this?
These are the commands I'm currently using:
set terminal postscript eps enhanced dash color "" 13
reset
set datafile separator ","
set title "Bytes per Protocol"
set xlabel "Protocol"
set ylabel "Bytes" rotate by 90
set yrange [0:1342177280]
set logscale y 2
set format y '%.0s%cB'
set style data histogram
set boxwidth 0.5
set style fill solid
set xtics format ""
set grid ytics
set style data histogram
set style histogram clustered gap 2
set grid ytics
set tic scale 0
set size 1,0.9
set size ratio 0.5
set key autotitle columnhead
set output "ex_a_1_BIG.eps"
plot "ex_a_1_BIG.csv" using ($3):xtic(1) title "IN", \
'' using ($5):xtic(1) title "OUT", \
'' using 0:($3):($3) with labels center offset -2,1 notitle, \
'' using 0:($5):($5) with labels center offset 2,1 notitle
This is the content of the csv I want to plot (I only want the bytes in and out):
protocol,packets in,bytes in,packets out,bytes out
ICMP,1833,141562,979,60334
IGMP,0,0,283,14006
TCP,158214,129221151,130101,47734355
UDP,68476,9571677,72530,24310734

Check help format_specifiers and help gprintf, and see the example below.
It is a bit unfortunate that in gnuplot the prefix for 1 to 999 is apparently a single space instead of an empty string.
For example, the format '%.1s %cB' gives two spaces for 1-999 B and one space for the others, e.g. 1 kB. However, '%.1s%cB' gives one space for 1-999 B and no space for the others, e.g. 100kB. As far as I know, the correct form is one space between the number and the unit. I'm not sure whether there is an easy fix for this.
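A quick way to check this behaviour in your own gnuplot version is to print the formatted strings between brackets. This is just a sketch; the values 123 and 123456 are arbitrary.

```gnuplot
# square brackets make leading and trailing spaces visible
print '[' . gprintf('%.1s%cB', 123)    . ']'   # value in the 1-999 range
print '[' . gprintf('%.1s%cB', 123456) . ']'   # value with a k prefix
print '[' . gprintf('%.1s %cB', 123)   . ']'   # same value, format with an explicit space
```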
Code:
### prefixes
reset session
$Data <<EOD
1 1
2 12
3 123
4 1234
5 12345
6 123456
7 1234567
8 12345678
9 123456789
10 1234567890
11 12345678901
12 123456789012
13 1234567890123
EOD
set boxwidth 0.7
set style fill solid 1.0
set xtics 1
set yrange [0.5:8e13]
set multiplot layout 2,1
set logscale y # base of 10
set format y '%.0s %cB'
plot $Data u 1:2 w boxes lc rgb "green" notitle, \
'' u 1:2:(gprintf('%.1s %cB',$2)) w labels offset 0,1 not
set logscale y 2 # base of 2
set format y '%.0b %BB'
plot $Data u 1:2 w boxes lc rgb "red" notitle, \
'' u 1:2:(gprintf('%.1b %BB',$2)) w labels offset 0,1 not
unset multiplot
### end of code
Result:
Addition:
A workaround for the number/unit spacing issue, at least for the labels in the graph, would be:
myFmt(c) = column(c)>=1 && column(c)<1000 ? \
gprintf('%.1s%cB',column(c)) : gprintf('%.1s %cB',column(c))
and
plot $Data u 1:2 w boxes lc rgb "green" notitle, \
'' u 1:2:(myFmt(2)) w labels offset 0,1 not
But for the ytics labels I still don't have an idea.

Plotting intersecting lines in GNUplot

I haven't been able to find any example of what I'm trying to do in GNUplot from raking through the docs and demos.
Essentially I want to plot the Blue, Green, and Red lines I manually drew on this output (for demonstration) at the 10/50/90% marks.
EDIT: For clarity, I'm looking to determine where the distribution lines hit the cumulative distribution at 0.1/0.5/0.9 to know which co-ordinates to draw the lines at. Thanks!
set terminal png size 1600,800 font "Consolas" 16
set output "test.png"
set title "PDF and CDF - 1000 Simulations"
set grid y2
set ylabel "Date Probability"
set y2range [0:1.00]
set y2tics 0.1
set y2label "Cumulative Distribution"
set xtics rotate by 90 offset 0,-5
set bmargin 6
plot "data.txt" using 1:3:xtic(2) notitle with boxes axes x1y1,'' using 1:4 notitle with linespoints axes x1y2

Depending on the number of points in your cumulative data curve you might need interpolation. The following example is chosen such that no original data point will be at your levels 10%, 50%, 90%. If your data is not steadily increasing, it will take the last value which matches your level(s).
The procedure is as follows:
plot your data to a dummy table.
check when Level is between two successive y-values (y0,y1).
remember the interpolated x-value in xp.
draw arrows from the borders of the graph to the point (xp,Level) (or instead use the partly outside rectangle "trick" from #Ethan).
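The interpolation step is ordinary linear interpolation between two neighbouring points. As a small worked sketch, with made-up values for (x0,y0) and (x1,y1):

```gnuplot
# x-position at which the straight line from (x0,y0) to (x1,y1) reaches yi
Interpolate(yi) = x0 + (x1-x0)*(yi-y0)/(y1-y0)

x0 = 1.0; y0 = 0.25   # made-up point below the level
x1 = 2.0; y1 = 0.75   # made-up point above the level
print Interpolate(0.5)   # 1.0 + 1.0*0.25/0.5 = 1.5
```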
Code:
### linear interpolation of data
reset session
set colorsequence classic
set key left
# create some dummy data
set sample 10
set table $Data
plot [-2:2] '+' u 1:(norm(x)) with table
unset table
Interpolate(yi) = x0 + (x1-x0)*(yi-y0)/(y1-y0)
Levels = "0.1 0.5 0.9"
do for [i=1:words(Levels)] {
Level = word(Levels,i)
x0 = x1 = y0 = y1 = NaN
set table $Dummy
plot $Data u (x0=x1,x1=$1,y0=y1,y1=$2, (y0<=Level && Level<=y1)? (xp=Interpolate(Level)):NaN ): (Level) w table
unset table
set arrow i*2 from xp, graph 0 to xp,Level nohead lc i
set arrow i*2+1 from xp,Level to graph 1,Level nohead lc i
}
plot $Data u 1:2 w lp pt 7 lc 0 t "Original data"
### end code
Result:

It is not clear if you are asking how to find the x-coordinates at which your cumulative distribution line hits 0.1, 0.5, 0.9 (hard to do so I will leave that for now) or asking how to draw the lines once you know those x values. The latter part is easy. Think of the lines you want to draw as the unclipped portion of a rectangle that extends off the plot to the lower right:
set object 1 rectangle from x1, 0.1 to graph 2, -2 fillstyle empty border lc "blue"
set object 2 rectangle from x2, 0.5 to graph 2, -2 fillstyle empty border lc "green"
set object 3 rectangle from x3, 0.9 to graph 2, -2 fillstyle empty border lc "red"
plot ...

Plotting Condition Lines

Suppose I have the following data:
"1,5"
"2,10"
""
"3,4"
"4,2"
""
"5,6"
"6,10"
I want to graph this using gnuplot with a line between each condition, similar to this display:
How might this be accomplished? I have looked into gridlines, but that does not seem to suit my need. I am also looking for a solution that will automatically draw condition / phase lines between each break in the data set.

As mentioned in the comments and explained in the linked question and its answers, you can draw arbitrary lines manually via set arrow ... (check help arrow).
However, if possible I don't want to adjust the lines manually every time I change the data or if I have many different plots.
But, hey, you are using gnuplot, so, make it automated!
To be honest, in the time it took to figure out how this can be done I could have changed a "few" lines and labels manually ;-). But now, this might be helpful for others.
The script below is written in such a way that it doesn't matter whether you have zero, one or two or more empty lines between the different blocks.
Comments:
the function valid(1) returns 1 if column(1) contains a valid number and 0 otherwise (check help valid).
the vertical lines are plotted with vectors (check help vectors). The x-position is taken as average of the x-value before the label line and the x-value after the label line. The y-value LevelY is determined beforehand via stats (check help stats).
the labels are plotted with labels (check help labels) and positioned at the first x-value after each label line and at a y-value of LevelY with an offset.
Script:
### automatic vertical lines and labels
reset session
$Data <<EOD
Baseline
1 10.0
2 12.0
3 10.5
4 11.0 # zero empty lines follow
Treatment
5 45.0
6 35.0
7 32.5
8 31.0 # one empty line follows

Baseline
9 14.0
10 12.8
11 12.0
12 11.3 # two empty lines follow


Treatment
13 35.0
14 45.0
15 45.0
16 37.0
EOD
set offset 1,1,1,1
set border 3
set title "Student Performance" font ",14"
set xlabel "Sessions"
set xtics 1 out nomirror
set ylabel "Number of Responses"
set yrange [0:]
set ytics out nomirror
set key noautotitle
set grid x,y
stats $Data u 2 nooutput
LevelY = STATS_max # get the max y-level
getLinePosX(col) = (v0=v1,(v1=valid(col))?(x0=x1,x1=column(1)):0, v0==0?(x0+x1)/2:NaN)
getLabel(col) = (v0=v1,(v1=valid(col))?0:(h1=strcol(1),h0=h1),column(1))
plot x1=NaN $Data u (y0=(valid(1)?$2:NaN),$1):(y0) w lp pt 13 ps 2 lw 2 lc "red", \
x1=v1=NaN '' u (getLinePosX(1)):(0):(0):(LevelY) w vec nohead lc "black" lw 1.5 dt 2, \
v1=NaN '' u (getLabel(1)):(LevelY):(sprintf("%s",v0==0?h0:'')) w labels left offset 0,1.5 font ",12"
### end of script
Result:

gnuplot: string values xticlabel & adjusting fontsize

I have data I would like to plot in a histogram style with a "cumulated" curve on top. I have the following problem:
My data consists of one column with the categories ("discharge") and one column with the number of values ("probability") that belong to the respective category. The last value of the category column is ">100", summarizing all power plants that have a bigger discharge than the last numeric value ("100 m^3/s"). I have not found a way to plot this last category and its values with the command plot 'datafile.dat' using 1:2 with boxes ... because (I assume) in this case only numerical values are read for the x-tic labels, so the last category is missing. If I plot it with the command plot 'datafile.dat' using 2:xtics(1) with boxes ... the last category ">100" is plotted just fine.
BUT: if I use the latter command, the x-axis labels appear in the normal font size, even though I have the line set format x '\footnotesize \%10.0f' in my code.
I have read that explicit labels in the plot command override a format set earlier (see: Changing ytic font size in gnuplot epslatex (multiplot)), but I was not able to adapt that to my code.
Do you have an idea how to do this?
Excel screenshot to visualize what I want to achieve
'datafile.dat'
discharge probability cumulated
10 20 20%
20 10 10%
30 5 5%
40 6 6%
50 4 4%
60 12 12%
70 8 8%
80 15 15%
90 20 20%
100 6 6%
>100 4 4%
[terminal=epslatex,terminaloptions={size 15cm, 8cm font ",10"}]
set xrange [*:*]
set yrange [0:20]
set y2range [0:100]
set xlabel 'Discharge$' offset 0,-1
set ylabel 'No. of power plants' offset 10.5
set y2label 'Cumulated probability' offset -10
set format xy '$\%g$'
set format x '\footnotesize \%10.0f'
set format y '\footnotesize \%10.0f'
set format y2 '\footnotesize \%10.0f'
set xtics rotate by 45 center offset 0,-1
set style fill pattern border -1
set boxwidth 0.3 relative
set style line 1 lt 1 lc rgb 'black' lw 2 pt 6 ps 1 dt 2
plot 'datafile.dat' using 1:2 with boxes axes x1y1 fs pattern 6 lc black notitle, \
'datafile.dat' using 1:3 with linespoints axes x1y2 ls 1 notitle

I am confused by your datafile; the numbers in the third column do not seem to be cumulative, and do not add up to 100%. Here is a solution that uses only the first two columns of your file:
set term epslatex standalone header "\\usepackage[T1]{fontenc}"
set output 'test.tex'
stats "datafile.dat" using 2
total = STATS_sum
set xlabel "Discharge" offset 0, 1.5
set xtics rotate
set ylabel "No. of power plants"
set ytics nomirror
set yrange [0:*]
set y2label "Cumulative probability"
set y2tics
set y2range [0:]
set boxwidth 0.3 relative
set style line 1 lt 1 lc rgb 'black' lw 2 pt 6 ps 1 dt 2
plot \
'datafile.dat' using 2:xtic("\\footnotesize " . stringcolumn(1)) with boxes axes x1y1 fs pattern 6 lc black notitle, \
'datafile.dat' using ($2/total) smooth cumulative with linespoints axes x1y2 ls 1 notitle
set output
The trick is to add the latex command \footnotesize in front of each label in the using command. It also first computes the total number of power plants so that it can compute probabilities, and computes cumulative values with the smooth cumulative option.
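The cumulative part can be tried in isolation with an inline datablock. This is a minimal sketch with made-up numbers; $Data and its three rows are not the question's data, and the row index (column 0) is used as the x-value because the category column contains the non-numeric entry ">100".

```gnuplot
$Data <<EOD
10    20
20    10
>100  10
EOD
stats $Data using 2 nooutput        # sums column 2 without printing the statistics
total = STATS_sum                   # 40 for this made-up block
set yrange [0:1.1]
# each row contributes its share of the total; smooth cumulative adds the shares up
plot $Data using 0:($2/total) smooth cumulative with linespoints notitle
```

With these numbers the curve should pass through 0.5, 0.75 and 1.0.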

Add a single point at an existing plot

I am using the following script to fit a function to data. In the output plot I would like to add a single labelled point on the fitted curve, say the point f(3.25). I have read that in gnuplot it is very tricky to add a single point to a plot, particularly when the plot shows a fitted function.
Does anyone have an idea how to add this single point to the existing plot?
set xlabel "1000/T (K^-^1)" font "Helvetica,20"
#set ylabel "-log(tau_c)" font "Helvetica,20"
set ylabel "-log{/Symbol t}_c (ns)" font "Helvetica,20"
set title "$system $type $method" font "Helvetica,24"
set xtics font "Helvetica Bold, 18"
set ytics font "Helvetica Bold, 18"
#set xrange[0:4]
set border linewidth 3
set xtic auto # set xtics automatically
set ytic auto # set ytics automatically
#set key on bottom box lw 3 width 8 height .5 spacing 4 font "Helvetica, 24"
set key box lw 3 width 4 height .5 spacing 4 font "Helvetica, 24"
set yrange[-5:]
set xrange[1.5:8]
f(x)=A+B*x/(1000-C*x)
A=1 ;B=-227 ; C=245
fit f(x) "$plot1" u (1000/\$1):(-log10(\$2)) via A,B,C
plot [1.5:8] f(x) ti "VFT" lw 4, "$plot1" u (1000/\$1):(-log10(\$2)) ti "$system $type" lw 10
#set key on bottom box lw 3 width 8 height .5 spacing 4 font "Helvetica, 24"
set terminal postscript eps color dl 2 lw 1 enhanced # font "Helvetica,20"
set output "KWW.eps"
replot

There are several possibilities to set a point/dot:
1. set object
If you have simple points, like a circle, circle wedge or a square, you can use set object, which must be defined before the respective plot command:
set object circle at first -5,5 radius char 0.5 \
fillstyle empty border lc rgb '#aa1100' lw 2
set object circle at graph 0.5,0.9 radius char 1 arc [0:-90] \
fillcolor rgb 'red' fillstyle solid noborder
set object rectangle at screen 0.6, 0.2 size char 1, char 0.6 \
fillcolor rgb 'blue' fillstyle solid border lt 2 lw 2
plot x
To add a label, you need to use set label.
This may be cumbersome, but has the advantage that you can use different line and fill colors, and you can use different coordinate systems (first, graph, screen etc).
The result with 4.6.4 is:
2. Set an empty label with point option
The set label command has a point option, which can be used to set a point using the existing point types at a certain coordinate:
set label at xPos, yPos, zPos "" point pointtype 7 pointsize 2
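Applied to the question, this option might look like the sketch below. It assumes the function has already been defined (and fitted); the stand-in f(x) = x**2, the variable xp and the label text are illustrative choices, not taken from the question.

```gnuplot
f(x) = x**2          # stand-in for the fitted function (assumption)
xp = 3.25            # x-position of the point to mark
# label text plus a filled circle (pointtype 7) placed at (xp, f(xp))
set label sprintf("f(%.2f)", xp) at xp, f(xp) point pointtype 7 pointsize 2 offset char 1,0
plot [1.5:8] f(x) notitle
```

Because the label is set before the plot command, it is drawn again on every replot.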
3. plot with '+'
The last possibility is to use the special filename +, which generates a set of coordinates, which are then filtered and plotted using the labels plotting style (or points if no label is requested):
f(x) = x**2
x1 = 2
set xrange[-5:5]
set style line 1 pointtype 7 linecolor rgb '#22aa22' pointsize 2
plot f(x), \
'+' using ($0 == 0 ? x1 : NaN):(f(x1)):(sprintf('f(%.1f)', x1)) \
with labels offset char 1,-0.2 left textcolor rgb 'blue' \
point linestyle 1 notitle
$0, or equivalently column(0), is the coordinate index. In the using statement only the first one is taken as valid, all other ones are skipped (using NaN).
Note, that using + requires setting a fixed xrange.
This has the advantages (or disadvantages?):
You can use the usual pointtype.
You can only use the axis values as coordinates (like first or second for the objects above).
It may become more difficult to place different point types.
It is more involved using different border and fill colors.
The result is:

Adding to Christoph's excellent answer:
4. use stdin to pipe in the one point
replot "-" using 1:(f($1))
2.0
e
and use the method from option 3 to label it.
5. bake a named datablock
(gnuplot 5.0 or later) that contains the one point, then you can replot without resupplying it every time:
$point << EOD
2.0
EOD
replot $point using 1:(f($1)):(sprintf("%.2f",f($1))) with labels

6. A solution using a dummy array of length one:
array point[1]
pl [-5:5] x**2, point us (2):(3) pt 7 lc 3
7. Or through a shell command (see help piped-data):
pl [-5:5] x**2, "<echo e" us (2):(3) pt 7 lc 3
pl [-5:5] x**2, "<echo 2 3" pt 7 lc 3
8. Special filename '+'
pl [-5:5] x**2, "+" us (2):(3) pt 7 lc 3
It seems to be the shortest solution. But note that, while it looks like a single point, it is actually many points (one per sample, see show samples) plotted at the same position.
To have only one point the sampling needs to be temporarily adjusted (see help plot sampling)
pl [-5:5] x**2, [0:0:1] "+" us (2):(3) pt 7 lc 3
9. Function with zero sampling range length
Shortest to type, but it plots as many points on top of each other as specified with samples.
pl [-5:5] x**2, [2:2] 3 w p pt 7 lc 3
