How to sum an entire column in gnuplot?

I have multiple CSV files like this (just one column):
3
4
2.3
0.1
Now I want to create a gnuplot bar chart that has <filename>:<sum of the column>.
But currently I struggle with summing up a single column:
plot 'data1.txt' using 0:(sum [col = 0:MAXCOL] (col)) with linespoint;

The command you show is summing each row rather than each column.
(1) If you can transpose rows and columns in your CSV file before feeding it to gnuplot, this command would produce a plot close to what you ask for. Note that MAXCOL is really the number of rows (not columns) in the original data file:
set boxwidth 0.5
set style fill solid
plot 'transpose_of_original' using 0:(sum [col=1:MAXCOL] column(col)) with boxes
(2) Alternatively, you can do the summing in gnuplot by first accumulating the sums and then plotting them afterward:
# get number of columns
stats 'data1.txt' nooutput
NCOL = STATS_columns
array SUM[NCOL]
# get sum for each column
do for [col=1:NCOL] {
    stats 'data1.txt' using col nooutput
    SUM[col] = STATS_sum
}
# Now we plot the sums in a bar chart
set style fill solid
set boxwidth 0.5
set xlabel "Column"
set ylabel "Sum"
plot SUM using 1:2 with boxes

With help from @Ethan, I was able to solve my problem:
array files = ['data1.txt', 'data2.txt']
array SUM[|files|]
do for [i=1:|files|] {
    stats files[i] using 1 nooutput
    SUM[i] = STATS_sum
}
set style fill solid
set boxwidth 0.5
set xlabel 'File'
set ylabel 'Sum'
set yrange [0:]
plot SUM using 1:2:xticlabels(files[column(0)+1]) with boxes
data1.txt:
11
22
33
44
data2.txt:
11
2
33
4
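As a quick cross-check outside gnuplot, the sums that stats reports for the two sample files can be reproduced with a few lines of Python. This is only a sketch: the helper name column_sum is made up for illustration, and it assumes one numeric value per line.

```python
def column_sum(text):
    """Sum a single-column data file given as a string (one number per line)."""
    return sum(float(line) for line in text.splitlines() if line.strip())

# Contents of the two sample files shown above
data1 = "11\n22\n33\n44\n"
data2 = "11\n2\n33\n4\n"

sums = {"data1.txt": column_sum(data1), "data2.txt": column_sum(data2)}
print(sums)
```

These are the bar heights the gnuplot script should end up with for the two files.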

Related

Gnuplot stacked histogram skipping the first bin

I'm trying to use gnuplot to plot a stacked histogram of some data but it skips the first bin (the first row of the data file).
The data is:
1 0.2512 0.0103 0.9679
2 0.4730 0.2432 0.8468
3 0.6669 0.2826 0.6895
4 0.6304 0.2268 0.7424
And the plot code is
set title "Data"
set key invert reverse Left outside
set key autotitle columnheader
set style data histogram
set style histogram rowstacked
set style fill solid border -1
#set boxwidth 0.75
plot 'data.dat' using 2:xtic(1) title 'X', '' using 3 title 'Y', '' using 4 title 'Z'
In the output, I checked that the data of the 2nd, 3rd and 4th rows of the data file is displayed correctly. Why am I missing the first bin?
Thanks a lot!
I already checked this with no help: Using gnuplot for stacked histograms

As it turns out, it was a very simple mistake, which I've fixed mostly thanks to Azad's comment about the titles.
The new code is:
set title "Position error along the three axis"
set key invert reverse Left outside
#set key autotitle columnheader
set style data histogram
set style histogram rowstacked
set style fill solid border -1
#set boxwidth 0.75
plot 'data.dat' using 2:xtic(1), '' using 3, '' using 4
Titles have been removed from the code. Gnuplot was taking the first row (which should have been the first bin) as the column titles, which were then overwritten by the explicit titles 'X', 'Y' and 'Z'.
The new data looks like this:
0 X Y Z
1 0.2512 0.0103 0.9679
2 0.4730 0.2432 0.8468
3 0.6669 0.2826 0.6895
4 0.6304 0.2268 0.7424
This fixed the problem, now all the bins are correctly displayed!

Plotting data vertically with gnuplot

I have a data file with a single column of data. By default, gnuplot renders this on the x-axis from left to right. However, I want to plot this data vertically from top to bottom. How can I do this?
The relevant excerpt from my plot file:
set size 1.0, 1.0
set terminal postscript eps enhanced color dashed lw 1 "Helvetica" 14
set output "ocean-diffuse.eps"
set autoscale
set xtic auto
set ytic auto
plot '0000086400.dat' using 1 with line, \
'0000172800.dat' using 1 with line

In order to have the single column used as x-value, use:
plot '0000086400.dat' using 1:0
That uses the row number (column 0) as y-values. You can also apply any scaling or computation to the row number, for example:
f(x) = x
plot '0000086400.dat' using 1:(f($0))
To have the y-axis reversed, use
set yrange [*:*] reverse
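To make the mapping concrete: using 1:0 turns each value into the point (value, row index), with the row index starting at 0. A small Python sketch of the same pairing follows. The function name vertical_points and the sample values are hypothetical, chosen only for illustration.

```python
def vertical_points(values, f=lambda row: row):
    """Pair each value (x) with its zero-based row index (y), optionally transformed by f."""
    return [(v, f(row)) for row, v in enumerate(values)]

sample = [3.0, 4.0, 2.3, 0.1]  # made-up single-column data
points = vertical_points(sample)
# Negating the row index is one way to make the curve run from top to bottom
flipped = vertical_points(sample, f=lambda row: -row)
print(points)
```

Passing a different f corresponds to wrapping $0 in a function in the gnuplot using specification.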

Gnuplot interchanging Axes

I would like to reproduce this plot with gnuplot:
My data has this format:
Data
1: time
2: price
3: volume
I tried this:
plot file using 1:2 with lines, '' using 1:3 axes x1y2 with impulses
Which gives a normal time series chart with y1 as price and y2 as volume.
Next, I tried:
plot file using 2:1 with lines, '' using 2:3 axes x1y2 with impulses
Which gives prices series with y1 as time and y2 as volume.
However, I need the price to remain at y1 and volume at x2.
Maybe something like:
plot file using 1:2 with lines,' ' using 2:3 axes y1x2 with impulses
However, that does not give what I want.

Gnuplot has no official way to draw this kind of horizontal box plot. However, you can use the boxxyerrorbars style (shorthand boxxy) to achieve this.
As I don't have any test data for your actual example, I generated a data file from a Gaussian random walk. To generate the data, run the following Python script:
from numpy import zeros, savetxt, random
N = 500
g = zeros(N)
# each step adds a normally distributed increment to the previous position
for i in range(1, N):
    g[i] = g[i-1] + random.normal()
savetxt('randomwalk.dat', g, delimiter='\t', fmt='%.3f')
Next, I bin the 'position data' (which in your case would be the volume data). For this one can use smooth frequency, which computes the sum of the y-values that share the same x-value. So first I use a binning function that returns the same value for a certain range (x ± binwidth/2). The output data is saved to a file, because for the plotting we must exchange the x and y values:
binwidth = 2
hist(x) = floor(x/binwidth + 0.5)
set output "| head -n -2 > randomwalk.hist"
set table
plot 'randomwalk.dat' using (hist($1)):(1) smooth frequency
unset table
unset output
Normally one should be able to use set table "randomwalk.hist", but due to a bug, one needs this workaround to filter out the last entry of the table output, see my answer to Why does the 'set table' option in Gnuplot re-write the first entry in the last line?.
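The binning step can be sketched in Python to see what smooth frequency accumulates. The sketch assumes the common convention that a value x falls into bin index floor(x/binwidth + 0.5), so each bin is centered on index*binwidth and spans binwidth/2 to either side. The function name bin_counts and the sample values are made up.

```python
from collections import Counter
from math import floor

def bin_counts(values, binwidth):
    """Count values per bin; bin k is centered at k*binwidth."""
    counts = Counter(floor(v / binwidth + 0.5) for v in values)
    # Return (bin center, count) pairs sorted by bin, like a table of x and summed y
    return [(k * binwidth, counts[k]) for k in sorted(counts)]

sample = [0.2, 0.9, 1.1, 2.8, 3.0, -0.7, -1.4]  # made-up positions
table = bin_counts(sample, binwidth=2)
print(table)
```

Each pair corresponds to one row of the table file: the bin position and the number of samples that fell into it.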
Now the actual plotting part is:
unset key
set x2tics
set xtics nomirror
set xlabel 'time step'
set ylabel 'position value'
set x2label 'frequency'
set style fill solid 1.0 border lt -1
set terminal pngcairo
set output 'randomwalk.png'
plot 'randomwalk.hist' using ($2/2.0):($1*binwidth):($2/2.0):(binwidth/2.0) with boxxy lc rgb '#00cc00' axes x2y1,\
'randomwalk.dat' with lines lc rgb 'black'
which gives the result (with 4.6.3, depends of course on your random data):
So, for your data structure, the following script should work:
reset
binwidth = 2
hist(x) = floor(x/binwidth + 0.5)
file = 'data.txt'
histfile = 'pricevolume.hist'
set table histfile
plot file using (hist($2)):($3) smooth unique
unset table
# get the number of records to skip the last one
stats histfile using 1 nooutput
unset key
set x2tics
set xtics nomirror
set xlabel 'time'
set ylabel 'price'
set x2label 'volume'
set style fill solid 1.0 border lt -1
plot histfile using ($2/2.0):($1*binwidth):($2/2.0):(binwidth/2.0) every ::::(STATS_records-2) with boxxy lc rgb '#00cc00' axes x2y1,\
file using 1:2 with lines lc rgb 'black'
Note that this time the last table entry is skipped by counting all entries with the stats command and dropping the last one with every (yes, STATS_records-2 is correct, because point numbering starts at 0). This variant doesn't need any external tool.
I also use smooth unique, which computes the average of the y-values for each x-value, instead of the sum (which is done with smooth frequency).
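The difference between the two smoothing modes can be illustrated with a short Python sketch. It only mimics the grouping behavior described here (sum per x for frequency, mean per x for unique). The function name smooth and the sample points are hypothetical.

```python
from collections import defaultdict

def smooth(points, mode):
    """Group (x, y) points by x; 'frequency' sums the y values, 'unique' averages them."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    if mode == "frequency":
        return [(x, sum(ys)) for x, ys in sorted(groups.items())]
    if mode == "unique":
        return [(x, sum(ys) / len(ys)) for x, ys in sorted(groups.items())]
    raise ValueError("mode must be 'frequency' or 'unique'")

points = [(1, 10.0), (1, 30.0), (2, 5.0)]  # made-up (price bin, volume) pairs
summed = smooth(points, "frequency")
averaged = smooth(points, "unique")
print(summed, averaged)
```

Which mode fits depends on whether the bars should show total or typical volume per price bin.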

How to label (x,y) data points in Gnuplot 4.2 with integer numbers

I have a text file with 2 columns of numbers corresponding to (x,y) coords.
4 1
4 5
1 1
1 5
2.5 3
How do I tell gnuplot to plot these points and label each point with its corresponding row #? (Please keep in mind I'm going to apply this to a much larger file with 100 points, so I'm looking for a way to do it automagically, rather than have to create a 3rd column of data corresponding to row numbers).

You can use the with labels style in the plot command. By default this places the label instead of the point at the position where the point would be. with labels takes the offset option (and any option you can pass to set label), so you can place the label next to the point. Here is an example script:
#!/usr/bin/env gnuplot
reset
set terminal pngcairo
set output 'test.png'
set xr [0:5]
set yr [0:6]
plot 'data.dat' pt 7, \
'data.dat' using 1:2:($0+1) with labels offset 1 notitle
which produces this output:

x range for non-numerical data in Gnuplot

When running the following script, I get an error message:
set terminal postscript enhanced color
set output '| ps2pdf - histogram_categorie.pdf'
set auto x
set key off
set yrange [0:20]
set style fill solid border -1
set boxwidth 5
unset border
unset ytic
set xtics nomirror
plot "categorie.dat" using 1:2 ti col with boxes
The error message that I get is
smeik:plots nvcleemp$ gnuplot categorie.gnuplot
plot "categorie.dat" using 1:2 ti col with boxes
^
"categorie.gnuplot", line 13: x range is invalid
The content of the file categorie.dat is
categorie aantal
poussin 13
pupil 9
miniem 15
cadet 15
junior 6
senior 5
veteraan 8
I understand that the problem is that I haven't defined an x range. How can I make it use the first column as values for the x range? Or do I need to take the row numbers as the x range and use the first column as labels? I'm using Gnuplot 4.4.
I'm ultimately trying to get a plot that looks the same as the plot I made before this one. That one worked fine, but had numerical data on the x axis.
set terminal postscript enhanced color
set output '| ps2pdf - histogram_geboorte.pdf'
set auto x
set key off
set yrange [0:40]
set xrange [1935:2005]
set style fill solid border -1
set boxwidth 5
unset border
unset ytic
set xtics nomirror
plot "geboorte.dat" using 1:2 ti col with boxes,\
"geboorte.dat" using 1:($2+2):2 with labels
and the content of the file geboorte.dat is
decennium aantal
1940 2
1950 1
1960 3
1970 2
1980 3
1990 29
2000 30

The boxes style expects the x-values to be numeric. That's an easy one: we can give it the pseudo-column 0, which is essentially the row number of each data point within the file:
plot "categorie.dat" using (column(0)):2 ti col with boxes
Now you probably want the information in the first column on the plot somehow. I'll assume you want those strings to become the x-tics:
plot "categorie.dat" using (column(0)):2:xtic(1) ti col with boxes
*Careful here: this might not work with your current boxwidth setting. You might want to consider set boxwidth 1 or plot ... using (5*column(0)):2:xtic(1) ....
EDIT -- Taking your datafiles posted above, I've tested both of the above changes to get the boxwidth correct, and both seemed to work.
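The idea behind using (column(0)):2:xtic(1) can be sketched in Python: the categories get consecutive numeric positions, and the strings are kept only as tick labels. The function name categorical_axis is made up, and the sketch assumes the positions start at 0 for the first data row after the header.

```python
def categorical_axis(rows):
    """Return (x positions, heights, tick labels) for rows of (label, value)."""
    xs = list(range(len(rows)))            # like pseudo-column 0: 0, 1, 2, ...
    heights = [value for _, value in rows]
    labels = [label for label, _ in rows]  # like xtic(1)
    return xs, heights, labels

# Data rows of categorie.dat (header line omitted)
rows = [("poussin", 13), ("pupil", 9), ("miniem", 15), ("cadet", 15),
        ("junior", 6), ("senior", 5), ("veteraan", 8)]
xs, heights, labels = categorical_axis(rows)
print(list(zip(xs, labels, heights)))
```

With positions spaced one unit apart, a box width of 5 would make the boxes overlap, which is why the box width or the spacing needs adjusting.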
