gnuplot: drawing duplicated points and coloring

I have the below data in gnuplot:
2012-09-18 0 2 12
2012-03-15 1 4 5
2012-12-18 24 8 11
2012-09-18 2 8 11
2012-03-15 16 5 5
2011-12-06 5 2 3
2012-12-18 3 12 8
2012-09-18 4 4 8
2012-03-29 11 6 2
2011-12-06 9 7 3
2012-12-18 6 7 8
2012-09-18 4 3 8
2012-02-09 27 2 1
2012-12-18 2 1 8
2012-09-18 6 14 8
1st column; x (date)
2nd column; y
3rd column; the point color
4th column; number of occurrences(the point is duplicated)
I need to write a gnuplot program which:
Draws my (x,y) points.
Gives each point a different color depending on the 3rd column value (maybe over 50 different colors).
If the 4th column is greater than 0, the point is duplicated: it must be drawn n times, and each copy's x,y must get a random position within a small margin, for example (rand(x)-0.5, rand(y)-0.5).
Another question, what is the best and fastest way/tool to learn gnuplot?

This is supposed to be an extension to my answer for your other question drawing duplicated points in gnuplot with small margin:
You need to have the first column interpreted as time data. For this you need
set xdata time
set timefmt '%Y-%m-%d'
In order to set the point color, it is best to define a palette and then use linecolor palette, which sets the point color based on its value in the palette.
So, using the explanations from drawing duplicated points in gnuplot with small margin the final script is:
reset
filename = 'data.dat'
stats filename using 4 nooutput
set xdata time
set timefmt '%Y-%m-%d'
set format x '%Y-%m'
rand_x(x) = x + 60*60*24*7 * (rand(0) - 0.5)
rand_y(y) = y + (rand(0) - 0.5)
plot for [i=0:int(STATS_max)-1] filename \
using (rand_x(timecolumn(1))):(i < $4 ? rand_y($2) : 1/0):3 pointtype 7 linecolor palette notitle
Some other things to keep in mind:
The stats call must come before set xdata time, because the statistics don't work with time data.
When calculating with time data in the using statement, one needs to use the timecolumn function (as opposed to column or $.. in generic cases). This gives the time as a timestamp (i.e. in seconds).
For that reason you need two different random functions for x and y, because of the very different scalings. Here, I used a 'jitter' of one week (60*60*24*7 seconds) on the time axis.
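The duplication trick in the plot command can be hard to read at first, so here is a rough sketch of the same logic in Python rather than gnuplot. It is only an illustration under assumptions: the function name jittered_points is made up, timestamps are Unix seconds in UTC, and only three rows of the data are used. Each row is emitted once per pass i as long as i is smaller than its 4th column, with a jitter of one week on x and of 1 on y.

```python
import random
from datetime import datetime, timezone

WEEK = 60 * 60 * 24 * 7  # one week in seconds, the jitter width on the time axis

def jittered_points(rows, rng):
    """Return one (timestamp, y, colour) tuple per occurrence of each row.

    Hypothetical helper that mirrors the plot command: the outer loop is
    `plot for [i=0:int(STATS_max)-1]`, the test is `i < $4 ? ... : 1/0`.
    """
    out = []
    max_count = max(count for _, _, _, count in rows)
    for i in range(max_count):
        for date, y, colour, count in rows:
            if i >= count:  # gnuplot yields 1/0 (undefined) here, so nothing is drawn
                continue
            t = datetime.strptime(date, "%Y-%m-%d").replace(tzinfo=timezone.utc).timestamp()
            out.append((t + WEEK * (rng.random() - 0.5),  # rand_x
                        y + (rng.random() - 0.5),         # rand_y
                        colour))
    return out

# Three rows taken from the data above: date, y, colour, occurrences
rows = [("2012-09-18", 0, 2, 12), ("2012-03-15", 1, 4, 5), ("2012-02-09", 27, 2, 1)]
points = jittered_points(rows, random.Random(0))
print(len(points))  # 18, i.e. 12 + 5 + 1
```

The number of points drawn equals the sum of the 4th column, and every copy stays within half a week and half a y-unit of its original position.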
The result with 4.6.4 is:
Some remarks on your question about learning gnuplot: try to solve your problems yourself first, and then post more concrete questions. Browse through the gnuplot demos to see what is possible, note which feature or plotting style is used, and look it up in the documentation to see which options and settings are offered. Play around with those demos and try to apply them to your own data sets. In the end it's all about practice (I've been using gnuplot for 12 years...).

Related

Understanding the use of the keyword every

My question regards the keyword every that is used to sample an input data file (i.e., .csv, .dat etc.). I am reading the documentation of the keyword that says the following:
plot 'file' every {<point_incr>}
{:{<block_incr>}
{:{<start_point>}
{:{<start_block>}
{:{<end_point>}
{:<end_block>}}}}}
The thing is, I cannot fully work out how to apply this to a data set. For instance, suppose I have the following dummy data that I want to use to create a bar chart:
# first bars group
#x axis #y axis
0 2
0.2 3
0.4 4
0.6 5
0.8 6

#second bars group
1 1
1.2 2
1.4 3
1.6 4
1.8 5

#etc.
3 10
3.2 20
3.4 30
3.6 40
3.8 50

4 20
4.2 30
4.4 40
4.6 50
4.8 60
And let's say that I want to create four bar clusters from the data, one for every block. How can I use the syntax of the keyword? Could someone give me some examples to better understand its use? Thank you in advance.
As you've found, the every keyword allows you to cherry-pick a subset of points (one per line) and blocks (groups of points separated by a single blank line) from your datafile. Your example datafile shows 20 points divided into 4 blocks.
So to plot the first block (indexed 0 in gnuplot), you only need to specify the end block, and use the default values for the other every parameters. Try:
plot 'data.txt' every :::::0 with boxes
It seems your goal is to plot each block with separate styling. Here's how you could do that with a few extra styling commands. (Note my use of gnuplot's shorthand for some keywords.)
set key left top
set boxwidth 0.2
p 'data.txt' ev :::0::0 w boxes t 'first',\
'data.txt' ev :::1::1 w boxes t 'second',\
'data.txt' ev :::2::2 w boxes t 'third',\
'data.txt' ev :::3::3 w boxes t 'fourth'
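To make the six colon-separated fields more concrete, here is a rough Python model of the selection that every performs. It is a sketch under assumptions, not gnuplot's implementation: the function name and keyword arguments are invented for illustration, points and blocks are numbered from 0, and an omitted field falls back to "start at 0, step 1, run to the end".

```python
def every(blocks, point_incr=1, block_incr=1,
          start_point=0, start_block=0, end_point=None, end_block=None):
    """Hypothetical model of `every`: loop over blocks, then over points."""
    last_block = len(blocks) - 1 if end_block is None else min(end_block, len(blocks) - 1)
    selected = []
    for b in range(start_block, last_block + 1, block_incr):
        block = blocks[b]
        last_point = len(block) - 1 if end_point is None else min(end_point, len(block) - 1)
        for p in range(start_point, last_point + 1, point_incr):
            selected.append(block[p])
    return selected

# The four blocks of the example datafile
blocks = [
    [(0, 2), (0.2, 3), (0.4, 4), (0.6, 5), (0.8, 6)],
    [(1, 1), (1.2, 2), (1.4, 3), (1.6, 4), (1.8, 5)],
    [(3, 10), (3.2, 20), (3.4, 30), (3.6, 40), (3.8, 50)],
    [(4, 20), (4.2, 30), (4.4, 40), (4.6, 50), (4.8, 60)],
]

first = every(blocks, end_block=0)                  # every :::::0
third = every(blocks, start_block=2, end_block=2)   # every :::2::2
every_other = every(blocks, point_incr=2,
                    start_block=1, end_block=1)     # every 2:::1::1
print(every_other)  # [(1, 1), (1.4, 3), (1.8, 5)]
```

Read this way, every :::i::i is simply "start block i, end block i, all points".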
From help every:
The data points to be plotted are selected according to a loop from
<start_point> to <end_point> with increment <point_incr> and the
blocks according to a loop from <start_block> to <end_block> with
increment <block_incr>.
This should be pretty clear. However, you have to know that if blocks are separated by two (or more) empty lines, you have to address them differently; check help index. In my opinion the documentation is a bit confusing about datablock, (sub-)block, dataset, etc.
Check the following example. I assume this is not your final graph, but still needs some tuning. Depending on your detailed requirements you also might want to check help histograms.
For example, every :::i::i will plot all datapoints in block i, i.e. from block i to block i.
Code:
### plotting using "every"
reset session
$Data <<EOD
# first bars group
#x axis #y axis
0 2
0.2 3
0.4 4
0.6 5
0.8 6

#second bars group
1 1
1.2 2
1.4 3
1.6 4
1.8 5

#etc.
3 10
3.2 20
3.4 30
3.6 40
3.8 50

4 20
4.2 30
4.4 40
4.6 50
4.8 60
EOD
set key top left
set boxwidth 0.2
set key out noautotitles
set style fill solid 0.3
set yrange [:70]
plot for [i=0:3] $Data u 1:2 every :::i::i w boxes
### end of code
Result:

Jitter points in gnuplot. Data input file format

I am able to successfully reproduce Jitter examples from here: http://gnuplot.sourceforge.net/demo/violinplot.html
However, when I try to use my own data, the points are not "jittered".
Here is the data file (data.dat):
10 1 1 3 8 8 8
20 2 2 3 8 8 8
30 3 3 3 8 8 8
Here is a minimal gnuplot input file:
set jitter
plot 'data.dat' using 1:2 with points, '' u 1:3 with points, '' u 1:4 with points, '' u 1:5 with points, '' u 1:6 with points, '' u 1:7 with points
The points are right on top of each other, whereas I want points that are in the same place to be slightly offset (x-axis).
I've installed the latest version of gnuplot:
$ gnuplot --version
gnuplot 5.2 patchlevel 6
EDIT WITH SOLUTION:
@Ethan's comment cleared it up for me. I'm able to get the jittering by reorganizing my input data file so that it's a single dataset which contains internal "collisions", rather than reading in lots of individual data sets, e.g.:
10 1
10 1
10 3
10 3
20 2
20 2
30 8
30 8
And my gnuplot file is now just:
set jitter
plot 'data.dat' using 1:2 with points
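Reorganizing a wide file by hand gets tedious, so here is a small Python sketch of one way to do it. It is not the conversion the asker actually used; the function name wide_to_long is made up, and it assumes whitespace-separated columns with x in the first column and comment lines starting with #.

```python
def wide_to_long(lines):
    """Turn 'x y1 y2 ...' rows into (x, y) pairs forming a single dataset."""
    pairs = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        x, ys = fields[0], fields[1:]
        pairs.extend((x, y) for y in ys)
    return pairs

# The original multi-column data.dat from the question
wide = ["10 1 1 3 8 8 8", "20 2 2 3 8 8 8", "30 3 3 3 8 8 8"]
long_rows = wide_to_long(wide)
for x, y in long_rows:
    print(x, y)  # write these lines to a new file and plot it with `using 1:2`
```

Because identical (x, y) pairs now sit in the same dataset, set jitter can detect them as collisions.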
"set jitter" will not work across multiple data sets as noted in the comment. You could do something similar by adding a random displacement in the 'using' specifier.
plot for [col=2:7] 'data.dat' using 1:(column(col) + (rand(0)-0.5)/2.) with points
This is different from "set jitter" because all points will be randomly displaced, whereas with jitter only overlapping points are shifted and the displacement is not random.
Alternatively, since in your case the columns are distinct perhaps you want to shift systematically based on the column number:
plot for [col=2:7] 'data.dat' using (column(1)+col/4.) : (column(col))

Larger color variance for frequent values GNUplot

I've got data in the following format:
x y value
1 1 3
1 2 3
1 3 3
2 1 4
2 2 4
2 3 4
3 1 5
3 2 6
3 3 7
In this example, values 3 and 4 occur most frequently in the 3*3 grid. The goal is to create an image of a surface in which the color of each coordinate depends on the 'value'. Please keep in mind that this is just an example; in practice I have a grid of about 500*150 coordinates with values ranging up to roughly 1000. I currently plot an image like this:
plot "data.csv" u 1:2:3 with points pt 5 ps variable palette
The thing is, given the example, the colors of 3 and 4 would be similar using a standard palette. This gets worse as the number of different values increases.
What I would like is for the colors of the values with the highest frequency to differ most from each other. I want this because, due to the nature of the data, there are often values that are close to each other and at the same time much more frequent than the rest of the data. So given the example, I would for instance want 3 to be blue and 4 to be red. Even if there were more rarely occurring values like 5, 6 and 7, I'd still like the colors of 3 and 4 to be far apart, again like blue and red. The whole point is that high-frequency values should be easy to distinguish by their color.
I think the above will be difficult to do, so here is an alternative: I would like a 'fast' gradient for low values, for example a rainbow gradient over the values 1 to 10, excluding for instance blue. Then, for the values 11-1000, I'd like a 'slow' gradient towards the remaining color (blue in this example, or perhaps some range of remaining colors). This would also more or less suffice, because low values would get more distinguishable colors, and due to the nature of the data lower values tend to occur more often. I imagine this to be the easier solution, so if you have a solution for this but not for the first one, please mention it.
Thanks in advance!
Here is one possible palette as starting point:
set terminal postscript eps color
set output 'test.eps'
set palette defined (0 1 0 1, 1 1 0 0, 2 1 1 0, 3 0 1 0, 100 0 0 1) maxcolor 2**12
test palette
set output
This gives the result:
I used the postscript terminal, because most of the other terminals support only 256 colors when using coloring by palette. Of course, you could use lc rgb variable instead. Then the last column is interpreted as integer representation of an rgb value, but you would have to specify the complete color function including interpolation etc. by yourself.
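To see why this palette gives a 'fast' gradient for low values and a 'slow' one for the rest, the following Python sketch evaluates the defined palette by linear interpolation between its stops. It rests on assumptions: the color range is taken to be exactly [0:100] so that data values coincide with the gray positions in the set palette line, the quantization introduced by the limited number of colors is ignored, and the names STOPS and palette_rgb are made up for this illustration.

```python
# (gray position, (r, g, b)) stops copied from the `set palette defined` line
STOPS = [
    (0, (1, 0, 1)),    # magenta
    (1, (1, 0, 0)),    # red
    (2, (1, 1, 0)),    # yellow
    (3, (0, 1, 0)),    # green
    (100, (0, 0, 1)),  # blue
]

def palette_rgb(value):
    """Linearly interpolate the RGB color for a value, clamped to the stop range."""
    value = min(max(value, STOPS[0][0]), STOPS[-1][0])
    for (p0, c0), (p1, c1) in zip(STOPS, STOPS[1:]):
        if value <= p1:
            f = (value - p0) / (p1 - p0)
            return tuple(a + f * (b - a) for a, b in zip(c0, c1))

for v in (0, 1, 2, 3, 51.5, 100):
    print(v, palette_rgb(v))
```

The values 0 to 3 sweep through magenta, red, yellow and green, while the whole range from 3 to 100 is spent on the single transition from green to blue. Moving the stop positions changes where the fast part ends.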

gnuplot pm3d gets white lines

I draw a spectrogram with gnuplot version 4.6.
I ensured it is the newest version here:
http://www.gnuplot.info/download.html
Gnuplot is installed from Debian repository.
The plot area and the scale on the right include strange white lines. They seem to separate the data. In plot areas with dense data they look like a checkered pattern:
The lines are less visible on the scale, but they are there too. There are only horizontal lines on the scale.
I thought it was a matter of monitor gamut or something, but the lines also occur with monochrome pdf.
My code is:
#!/usr/bin/gnuplot
set term pdf
# set style fill noborder # checked with and without this line
set output "../results1/fft.pdf"
set pm3d map
splot "../results1/fft.dat"
As you can see there, I tried the noborder option, but the lines exist both with and without it.
Example data can be:
0 1 3
0 2 3
0 3 4
0 4 2
0 5 2
0 6 3
0 7 4
0 8 3
1 1 3
1 2 2
1 3 4
1 4 2
1 5 2
1 6 3
1 7 4
1 8 4
2 1 2
2 2 4
2 3 4
2 4 3
2 5 2
2 6 4
2 7 2
2 8 2
3 1 2
3 2 3
3 3 3
3 4 4
3 5 2
3 6 2
3 7 2
3 8 2
Do you have any idea how to get rid of these lines?
This has plagued me for some time as well. The best workaround I have is inspired by this post. Basically you can use
plot "datafile" with image
instead of the splot command. There are some subtle differences in how the data is plotted, but this creates a .pdf without those nasty gaps between the colored areas, and a much smaller file.
Note this only really works for data which forms a rectangular grid! If the datafile is in a matrix format (no x or y data except for the number of data points) which for your example would be
3 3 2 2
3 2 4 3
...
(z-value from first row of each block makes the first row, etc.) you can use the command
plot 'datafile' using ($1*xincr - x0):($2*yincr - y0):3 matrix with image
The stuff in parens is optional, but allows you to scale the resulting plot to the correct x and y values. (xincr is a gnuplot variable you would have to set for the x-increment in the data, x0 is the x offset, etc.)
If this results in a white gap around the plot, use the plot command once, rescale the axes, and plot again, e.g.
set terminal unknown
plot 'datafile' using ($1*xincr - x0):($2*yincr - y0):3 matrix with image
set xr[GPVAL_DATA_X_MIN:GPVAL_DATA_X_MAX]
set yr[GPVAL_DATA_Y_MIN:GPVAL_DATA_Y_MAX]
set terminal pdf
set output "fft.pdf"
replot
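For data stored as x y z triples, like the example in the question, the matrix layout described above (the z-values for the first y make the first row, and so on) can be produced with a few lines of Python. This is only a sketch: the function name triples_to_matrix is hypothetical, it assumes a complete rectangular grid, and it uses just the first two y-values of the example data.

```python
def triples_to_matrix(triples):
    """Arrange (x, y, z) triples into rows of z: one row per y, one column per x."""
    xs = sorted({x for x, _, _ in triples})
    ys = sorted({y for _, y, _ in triples})
    z = {(x, y): v for x, y, v in triples}
    return [[z[(x, y)] for x in xs] for y in ys]

# First two y-values of each block from the question's example data
triples = [
    (0, 1, 3), (0, 2, 3),
    (1, 1, 3), (1, 2, 2),
    (2, 1, 2), (2, 2, 4),
    (3, 1, 2), (3, 2, 3),
]
matrix = triples_to_matrix(triples)
for row in matrix:
    print(" ".join(str(v) for v in row))
# 3 3 2 2
# 3 2 4 3
```

A missing grid point raises a KeyError, which fits the caveat that with image only really works for rectangular grids.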
Second answer
I find that this effect is pdf-viewer-dependent. For instance, it shows up in evince under linux but not okular. Depending on your application you could just use a different viewer.
For me, replacing
set pm3d map
with
unset surface; set pm3d at b; set view map
corrects this problem. Tested on gnuplot 4.6 patchlevel 2 and gnuplot 5.4 patchlevel 1.
I can't give much insight as to why, but the manual reveals that
set pm3d map
is a shortcut for
set pm3d at b; set view map; set style data pm3d; set style func pm3d
Simple testing reveals that the problem occurs if either of the "set style" instructions are included. Their removal corrects the problem, but leaves the surface visible - hence "unset surface".

Gnuplot xrange not really a range?

I am trying to make a plot in gnuplot whose x-axis has no real numeric order.
--------------------->
1 4 2 20 17 12 10 8
It's therefore not a real function in the mathematical sense, but it has some sort of index on its x-axis which is not in numerical order: it runs from 1 to 20, but 20 could come first or in the middle; everything may be mixed.
I hope you understand what I mean, because I am hoping gnuplot can handle that.
Maybe I can write my data file so that point 2 contains the data that should be there on the y-axis and just move the labels around on the x-axis?
You could e.g. write a datafile "data" containing such values
1 1.5
4 2
2 3.2
20 2.2
17 0.4
12 4.3
The second column holds the y-values; the first column holds the labels of the x-axis (xtics).
Now try to plot this data with:
plot './data' u 2:xticlabel(1)
Is that what you want?
The solution is to use xticlabels and add an extra column to the data file, i.e.
#xdata ydata label
0 2 1
1 3 14
2 10 0
3 8 20
etc.
command: plot "data.dat" using 1:2:xticlabels(3) with lp

Resources