column with empty datapoints - gnuplot

date daily weekly monthly
1 11 88
2 12
3 45 44
4 54
5 45
6 45 66
7 77
8 78
9 71 99 88
For empty data points in the weekly column, the plot is plotting values from the monthly column.
Monthly column plot and daily column plot are perfect.
Please suggest something beyond set datafile missing ' ' and set datafile separator "\t".

Alas, Gnuplot doesn't support fixed-width (field-based) data files; the only current solution is to preprocess the file. awk is well suited for the task (note that if the file contains hard tabs you need to adjust FIELDWIDTHS):
awk '$3 ~ /^ *$/ { $3 = "?" } $4 ~ /^ *$/ { $4 = "?" } 1' FIELDWIDTHS='6 7 8 7' infile > outfile
This replaces empty fields (/^ *$/) in columns 3 and 4 with question marks, which Gnuplot treats as undefined. The 1 at the end of the awk script invokes the default action: { print $0 }.
With awk's output redirected to outfile, you can now, for example, plot the file like this:
set key autotitle columnhead out
set style data linespoint
plot 'outfile' using 1:2, '' using 1:3, '' using 1:4

If anyone runs into this, I recommend updating to at least Gnuplot 4.6.5.
This is because the Gnuplot 4.6.4 release notes include:
* CHANGE treat empty fields in a csv file as "missing" rather than "bad"
And there seemed to be a (related?) bugfix in 4.6.5:
* FIX empty first field in a tab-separated-values file was incorrectly ignored
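So if you can update and your file really uses hard tabs as separators, an untested gnuplot-only sketch (assuming gnuplot >= 4.6.5 and the original, unprocessed infile) would be:
set datafile separator "\t"
set key autotitle columnhead out
set style data linespoints
plot 'infile' using 1:2, '' using 1:3, '' using 1:4
The empty cells in the weekly/monthly columns should then be read as missing instead of shifting values between columns.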

Gnuplot: store last data point as variable

I was wondering if there is an easy way, using Gnuplot, to get the last point in a data file say data.txt and store the values in a variable.
From this question,
accessing the nth datapoint in a datafile using gnuplot
I know that I can get the x-value using stats and the GPVAL_DATA_X_MAX variable, but is there a simple trick to get the corresponding y-value?
A third possibility is to write each ordinate value into the same user variable during plotting; the last value stays in it:
plot dataf using 1:(lasty=$2)
print lasty
If you want to use Gnuplot, you can
plot 'data.txt'
plot[GPVAL_DATA_X_MAX:] 'data.txt'
show variable GPVAL_DATA_Y_MAX
OR
plot 'data.txt'
plot[GPVAL_DATA_X_MAX:] 'data.txt'
print GPVAL_DATA_Y_MAX
If you know how your file is organised (separators, trailing empty lines) and you have access to standard Unix tools, you can make use of Gnuplot’s system command. For example, if you have no trailing newlines and your values are separated by tabs, you can do the following:
x = system("tail -n 1 data.txt | cut -f 1")
y = system("tail -n 1 data.txt | cut -f 2")
(tail prints the last n lines of a file; cut extracts field f.)
Note that x and y are strings if obtained this way, but for most applications this should not matter. If you must convert them, you can still add zero.
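For instance, a small sketch of that conversion (real() is gnuplot's explicit string-to-number function):
x = system("tail -n 1 data.txt | cut -f 1") + 0    # adding zero forces a numeric value
y = real(system("tail -n 1 data.txt | cut -f 2"))  # real() converts the string explicitly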
Let me add a 4th solution, because:
To be very precise, the OP asked about the last x-value and the corresponding y-value.
@TomSolid's solution will return the maximum x-value and its corresponding y-value.
However, strictly speaking the maximum value might not necessarily be the last value, unless the x-data is sorted in ascending order.
The result for the example below would be 10 and 14 instead of 8 and 18.
@Karl's solution will return the last y-value but also plots something, although you may just want to extract the value and plot something else. Ideally, you could combine extraction and plotting.
@Wrzlprmft's solution uses the Unix tool tail, which is not platform-independent (on Windows you would first have to install such utilities).
Hence, here is a solution:
platform-independent and gnuplot-only
returns the last x-value and corresponding y-value
doesn't create any dummy plot
Script:
### get the last x-value and corresponding y-value
reset session
$Data <<EOD
1 11
2 12
3 13
10 14
5 15
6 16
7 17
8 18
EOD
stats $Data u (lastX=$1,lastY=$2) nooutput
print lastX, lastY
### end of script
Result:
8.0 18.0

Gnuplot plot specific lines from data file

I have a data file with 24 lines and 3 columns. How can I plot only the data from specific lines, e.g. 3,7,9,14,18,21?
Now I use the following command
plot 'xy.dat' using 0:2:3:xticlabels(1) with boxerrorbars ls 2
which plots all 24 lines.
I tried the every command but couldn't figure out a way that works.
Untested, but something like this
plot "<(sed -n -e 3p -e 7p -e 9p xy.dat)" using ...
Another option may be to annotate your datafile, if as it seems, it contains multiple datasets. Let's say you created your datafile like this:
1 2 3
2 1 3 # SetA
2 7 3 # SetB
2 2 1 # SetA SetB SetC
Then if you wanted just SetA you would use this sed command in the plot statement
sed -ne '/SetA/s/#.*//p' xy.dat
2 1 3
2 2 1
That says: "in general, don't print anything (-n), but if you do see a line containing SetA, delete the hash sign and everything after it and print the line".
or if you wanted SetB, you would use
sed -ne '/SetB/s/#.*//p' xy.dat
2 7 3
2 2 1
or if you wanted the whole data file, but stripped of our comments
sed -e 's/#.*//' xy.dat
If you wanted SetB and SetC, use
sed -ne '/Set[BC]/s/#.*//p' xy.dat
2 7 3
2 2 1
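Those filtered sets can also be fed straight into gnuplot via its pipe syntax, e.g. a sketch reusing the SetA filter and the plot options from the question:
plot "<sed -ne '/SetA/s/#.*//p' xy.dat" using 0:2:3:xticlabels(1) with boxerrorbars ls 2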
If the lines you want have something in common that you can evaluate, e.g. the label in column 1 begins with an "a"
plot dataf using (strcol(1)[1:1] eq "a" ? $0 : NaN):2:xticlabels(1)
you can skip all other lines by letting the using statement return NaN for them.
This here is an ugly hack you can use in case the desired line numbers are just arbitrary:
linnum = " 1 3 7 12 16 21 "
plot dataf using (strstrt(linnum," ".int($0)." ") != 0 ? $0 : NaN):2
strstrt(a,b) returns the position of string b in string a, zero if it does not appear. I add the two spaces to make the line numbers unique.
But I would recommend using an external program to preprocess the data in that case, see the other answer.
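For example, one possible preprocessing step (just a sketch; NR is awk's current line number, and xy_selected.dat is a hypothetical output name):
awk 'NR==3 || NR==7 || NR==9 || NR==14 || NR==18 || NR==21' xy.dat > xy_selected.dat
plot 'xy_selected.dat' using 0:2:3:xticlabels(1) with boxerrorbars ls 2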
Yes, there is a solution with every. Since you want to plot with boxerrorbars it can be done in a plot for-loop.
no external tools, i.e. gnuplot-only and hence platform-independent
no strictly increasing line numbers, but arbitrary sequence of lines possible
Script:
### plot only certain lines appearing in a list
reset session
# create some random test data
set print $Data
do for [i=1:24] {
print sprintf("line%02d %g %g", i, rand(0)*5+1, rand(0)*0.5)
}
set print
myLines = "3 7 9 14 18 21"
myLine(i) = int(word(myLines,i)-1)
set offsets 0.5,0.5,0,0
set style fill solid 0.3
set boxwidth 0.6
set xtics out
set key noautotitle
set yrange [0:]
plot for [i=1:words(myLines)] $Data u (i):2:3:xtic(1) \
every ::myLine(i)::myLine(i) w boxerrorbars lc "blue"
### end of script
Result:

Mapping lines to columns in *nix

I have a text file that was created when someone pasted from Excel into a text-only email message. There were originally five columns.
Column header 1
Column header 2
...
Column header 5
Row 1, column 1
Row 1, column 2
etc
Some of the data is single-word, some has spaces. What's the best way to get this data into column-formatted text with unix utils?
Edit: I'm looking for the following output:
Column header 1 Column header 2 ... Column header 5
Row 1 column 1 Row 1 column 2 ...
...
I was able to achieve this output by manually converting the data to CSV in vim by adding a comma to the end of each line, then manually joining each set of 5 lines with J. Then I ran the csv through column -ts, to get the desired output. But there's got to be a better way next time this comes up.
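A shorter route next time could be the following sketch (assuming exactly five fields per record, no blank lines, and a hypothetical input file pasted.txt): paste rejoins every five consecutive lines with tabs, and column -t then aligns the result:
paste - - - - - < pasted.txt | column -t -s "$(printf '\t')"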
Perhaps a perl-one-liner ain't "the best" way, but it should work:
perl -ne 'BEGIN { $fields_per_row = 5; $field_separator = "\t"; $line_break = "\n" }
          chomp;
          print $_, $. % $fields_per_row ? $field_separator : $line_break;
          END { print $line_break }' INFILE > OUTFILE.CSV
Just substitute the "5", "\t" (tab), "\n" (newline) as needed.
You could also use a script with read and a counter. When the loop reaches the line you want, use the cut command with a space as the delimiter to get the field you want:
counter=0
lineNumber=3
while read -r line
do
    counter=$((counter + 1))
    if [ "$counter" -eq "$lineNumber" ]
    then
        echo "$line" | cut -d" " -f 4
    fi
done < INFILE

fitting a function with multiple data sets using gnuplot

I would like to fit a function using many data sets. For example, I repeat an experiment many times, and each time I obtain a pair of data columns (x, y). I put all these columns in a file named 'data.txt':
first experiment: x = column 1, y = column 2
second experiment: x = column 3, y = column 4
third experiment: x = column 5, y = column 6
...
Now I wish to fit a single function y = f(x) to all these data sets. I do not know whether Gnuplot can do that. If it is possible, could you please help me correct the following command? It does not work.
fit f(x) "data.txt" u 1:2:(0.25), "data.txt" u 3:4:(0.25), "data.txt" u 5:6:(0.25) via a, b
You can process your data so that columns 1, 3 and 5 all become the same column 1, and columns 2, 4 and 6 all become the same column 2. It's easy with awk, and you can do it outside gnuplot:
awk '{print $1, $2} {print $3, $4} {print $5, $6}' data.txt > data2.txt
and then fit it within gnuplot:
f(x)=a*x+b
fit f(x) "data2.txt" u 1:2:(0.25) via a,b
Or you can do it completely within gnuplot without any intermediate file:
f(x)=a*x+b
fit f(x) "< awk '{print $1, $2} {print $3, $4} {print $5, $6}' data.txt" u 1:2:(0.25) via a,b

How to extract multiple params from string using sed or awk

I have a log file which looks like this:
2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts
2010/01/12/ 15:00 some un related alapha 129495 and no interesting value
2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts
I'd like to plot the date-time string vs. the interesting value using gnuplot. In order to do that I'm trying to parse the above log file into a CSV file which looks like this (not all lines in the log have a plottable value):
2010/01/12/ 12:00, 45
2010/01/13/ 09:00, 60
How can I do this with sed or awk?
I can extract the initial characters with something like:
cat partial.log | sed -e 's/^\(.\{17\}\).*/\1/'
but how can I extract the values at the end?
I've been trying to do this to no avail!
Thanks
Although this is a really old question with many answers, you can do it without the use of external tools like sed or awk (hence platform-independent). You can "simply" do it with gnuplot alone (even with the version current at the time of the OP's question: gnuplot 4.4.0, March 2010).
However, from your example data and description it is not clear whether the value of interest
is strictly in the 12th column or
is always in the last column or
could be in any column but always trailed with pts
For all 3 cases there are gnuplot-only (hence platform-independent) solutions.
Assumption is that column separator is space.
ad 1. The simplest solution: with u 1:12, gnuplot will simply ignore non-numerical trailing characters in a column value, so e.g. 45pts will be interpreted as 45.
ad 2. and 3. If you extract the last column as a string, gnuplot will fail and stop if you try to convert a non-numerical value into a floating point number via real(). Hence, you have to check yourself, via your own function isNumber(), whether the column value at least starts with a number and hence can be converted by real(). In case the string is not a number you could set the value to 1/0 or NaN; however, in earlier gnuplot versions the line of a lines(points) plot will then be interrupted.
In newer gnuplot versions (>=4.6.0) you could set the value to NaN and avoid interruptions via set datafile missing NaN, which, however, is not available in gnuplot 4.4.
Furthermore, in gnuplot 4.4 NaN is simply set to 0.0 (GPVAL_NAN = 0.0).
You can workaround this with this "trick" which is also used below.
Data: SO7353702.dat
2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts
2010/01/12/ 15:00 some un related alapha 129495 and no interesting value
2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts
2010/01/15/ 09:00 some un related alapha 345678 62pts and nothing
2010/01/17/ 09:00 some un related alapha 345678 and nothing
2010/01/18/ 09:00 some un related alapha 345678 and the interesting value 70.5pts
2010/01/19/ 09:00 some un related alapha 345678 and the interesting value extra extra 64pts
2010/01/20/ 09:00 some un related alapha 345678 and the interesting value 0.66e2pts
Script: (works for gnuplot>=4.4.0, March 2010)
### extract numbers without external tools
reset
FILE = "SO7353702.dat"
set xdata time
set timefmt "%Y/%m/%d/ %H:%M"
set format x "%b %d"
isNumber(s) = strstrt('+-.',s[1:1])>0 && strstrt('0123456789',s[2:2])>0 \
|| strstrt('0123456789',s[1:1])>0
# Version 1:
plot FILE u 1:12 w lp pt 7 ti "value in the 12th column"
pause -1
# Version 2:
set datafile separator "\t"
getLastValue(col) = (s=word(strcol(col),words(strcol(col))), \
isNumber(s) ? (t0=t1, real(s)) : (y0))
plot t0=NaN FILE u (t1=timecolumn(1), y0=getLastValue(1), t0) : (y0) w lp pt 7 \
ti "value in the last column"
pause -1
# Version 3:
getPts(s) = (c=strstrt(s,"pts"), c>0 ? (r=s[1:c-1], p=word(r,words(r)), isNumber(p) ? \
(t0=t1, real(p)) : y0) : y0)
plot t0=NaN FILE u (t1=timecolumn(1),y0=getPts(strcol(1)),t0):(y0) w lp pt 7 \
ti "value anywhere with trailing 'pts'"
### end of script
Result: plots for Version 1, Version 2 and Version 3.
Bash
#!/bin/bash
while read -r a b line
do
[[ $line =~ ([0-9]+)pts$ ]] && echo "$a $b, ${BASH_REMATCH[1]}"
done < file
try:
awk 'NF==12{sub(/pts/,"",$12);printf "%s %s, %s ", $1, $2, $12}' file
Input:
2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts
2010/01/12/ 15:00 some un related alapha 129495 and no interesting value
2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts
Output:
2010/01/12/ 12:00, 45 2010/01/13/ 09:00, 60
Updated for your new requirements:
Command:
awk 'NF==12{gsub(/\//,"-",$1); sub(/pts/,"",$12); printf "%s%s %s \n", $1, $2, $12}' file
Output:
2010-01-12-12:00 45
2010-01-13-09:00 60
HTH Chris
It is indeed possible. A regex such as this one, for instance:
sed -nE 's!([0-9]{4}/[0-9]{2}/[0-9]{2}/ [0-9]{2}:[0-9]{2}).* ([0-9]+)pts!\1, \2!p'
awk '/pts/{ gsub(/pts/,"",$12);print $1,$2", "$12}' yourFile
output:
2010/01/12/ 12:00, 45
2010/01/13/ 09:00, 60
[Update: based on your new requirement]
How can I modify the above to look like:
2010-01-12-12:00 45
2010-01-13-09:00 60
awk '/pts/{ gsub(/pts/,"",$12);a=$1$2OFS$12;gsub(/\//,"-",a);print a}' yourFile
The command above will give you:
2010-01-12-12:00 45
2010-01-13-09:00 60
sed can be made more readable:
nn='[0-9]+'
n6='[0-9]{6}'
n4='[0-9]{4}'
n2='[0-9]{2}'
rx="^($n4/$n2/$n2/ $n2:$n2) .+ $n6 .+ ($nn)pts$"
sed -nre "s|$rx|\1 \2|p" file
output
2010/01/12/ 12:00 45
2010/01/13/ 09:00 60
I'd do that in two pipeline stages, first awk then sed:
awk '$NF ~ /[[:digit:]]+pts/ { print $1, $2", "$NF }' |
sed 's/pts$//'
By using $NF instead of a fixed number, you work with the final field, regardless of what the unrelated text looks like and how many fields it occupies.
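Since the end goal was plotting, here is a sketch of that last step in gnuplot, assuming the output of one of the commands above was redirected to a hypothetical data.csv in the "date, value" format from the question:
set datafile separator ","
set xdata time
set timefmt "%Y/%m/%d/ %H:%M"
set format x "%b %d"
plot 'data.csv' using 1:2 with linespoints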
