gnuplot: fetching a variable value from different row/column for calculations - gnuplot

I want to get a specific value from another row & column to normalize my data. The tricky part is, that this value changes for every data point in my data set.
Here what my data set looks like:
64 22370 1 585 1 10
128 47547 1 4681 1 10
256 291761 1 37449 1 10
128 48446 1.019 4681 1 10
256 480937 1.648 37449 1 10
128 7765 0.163 777 0.166 10
256 7164 0.025 1393 0.037 10
128 37078 0.780 4681 1 10
256 334372 1.146 37449 1 10
128 45543 0.958 4681 1 10
128 5579 0.117 649 0.139 10
128 40121 0.844 4529 0.968 10
128 49494 1.041 4681 1 10
# --> here it starts to repeat
64 48788 1 585 1 20
128 110860 1 4681 1 20
256 717797 1 37449 1 20
128 101666 0.917 4681 1 20
......
......
This data file contains all points for in total 13 different sets, so I plot it with something like this:
plot\
'../logs.dat' every 13::1 u 6:2 title '' with lines lt 3 lc 'black' lw 1,\
'../logs.dat' every 13::3 u 6:2 title '' with lines lt 3 lc 'black' lw 1,\
Now I try to normalize my data. The interesting value is respectively the 1st row 2nd column (starting to count at 0) $1:$2 and then adds 13 to the rows for every data point
For example: The first data set I want to plot would be
(10:47547/47547)
(20:110860/110860)
...
The second plot should be
(10:48446/47547)
(20:101666/110860)
...
And so on.
In pseudo code I would read something like
plot\
'../logs.dat' every 13::1 u 6:($2 / take i:$2 for i = i + 13 ) title '' with lines lt 3 lc 'black' lw 1,\
'../logs.dat' every 13::3 u 6:($2 / take i:$2 for i = i + 13 ) title '' with lines lt 3 lc 'black' lw 1,\
I hope I could make clear what I try to archive.
Thank you for any help!

If the value you want to use for normalisation is the very first to be plotted, then something like this is possible:
plot y0=-1e10, "data" using 1:(y0 == -1e10 ? (y0 = $2, 1) : $2/y0)
The normalisation value y0 is initialised to -1e10 on every replot. Check the help for ternary operator and serial evaluation.
But really you'd better pre-process your data.

If I understood your question correctly you want to normalize some of your data in a special way.
For the first plot you want to start from the second line (row-index 1) and divide the value in the column by itself and continue for every 13th row.
So, this is dividing the values of the second column for the following row indices: 1/1, 14/14, 27/27, ..., (n*13+1)/(n*13+1). This is trivial because it will always be 1.
For the second plot you want to start with the value in column 2 from row index 3 and divide it by the value in column2 of row index 1 and repeat this for every 13th row.
i.e. involved rows-indices: 3/1, 16/14, 29/27, ..., (n*13+3)/(n*13+1)
For the second case, a construct with every 13 will not work because you need every 13th value and every 13th shifted by 2 rows.
So, what you can do:
If you pass by row-index 1 (and every 13th row later), remember the value in column 2 and when you pass by row-index 3, divide this value by the remembered value and plot it, otherwise plot NaN. Repeat this for all rows cycled by 13. You can use the pseudocolumn 0 (check help pseudocolumns) and the modulo operator (check help operators binary).
If you want a continuous line with lines or linespoints you need to set datafile missing NaN because NaN values would interrupt the lines (check help missing). However, this works only for gnuplot>=5.0.6. For gnuplot 5.0.0 (version at OP's question) you have to use some workaround.
Script:
### special normalization of data
reset session
$Data <<EOD
1 900 3 4 5 10
2 1000 3 4 5 10
3 1050 3 4 5 10
4 1100 3 4 5 10
5 1150 3 4 5 10
6 1200 3 4 5 10
7 1250 3 4 5 10
8 1300 3 4 5 10
9 1350 3 4 5 10
10 1400 3 4 5 10
11 1450 3 4 5 10
12 1500 3 4 5 10
13 1550 3 4 5 10
#
1 1900 3 4 5 20
2 2000 3 4 5 20
3 2050 3 4 5 20
4 2100 3 4 5 20
5 2150 3 4 5 20
6 2200 3 4 5 20
7 2250 3 4 5 20
8 2300 3 4 5 20
9 2350 3 4 5 20
10 2400 3 4 5 20
11 2450 3 4 5 20
12 2500 3 4 5 20
13 2550 3 4 5 20
#
1 2900 3 4 5 30
2 3000 3 4 5 30
3 3050 3 4 5 30
4 3100 3 4 5 30
5 3150 3 4 5 30
6 3200 3 4 5 30
7 3250 3 4 5 30
8 3300 3 4 5 30
9 3350 3 4 5 30
10 3400 3 4 5 30
11 3450 3 4 5 30
12 3500 3 4 5 30
13 3550 3 4 5 30
EOD
M = 13 # cycle of your data
set datafile missing NaN # only for gnuplot>=5.0.6
plot $Data u 6:(1) every M w lp pt 7 lc "red" ti "Normalized 1/1", \
'' u 6:(int($0)%M==1?y0=$2:0,int($0)%M==3?$2/y0:NaN) w lp pt 7 lc "blue" ti "Normalized 3/1"
### end of code
Result:

Related

Calculate mean value by interval coordinates in pandas

I have a dataframe such as :
Name Position Value
A 1 10
A 2 11
A 3 10
A 4 8
A 5 6
A 6 12
A 7 10
A 8 9
A 9 9
A 10 9
A 11 9
A 12 9
and I woulde like for each interval of 3 position, to calculate the mean of Values.
And create a new df with start and end coordinates (of length 3 then), with the Mean_value column.
Name Start End Mean_value
A 1 3 10.33 <---- here this is (10+11+10)/3 = 10.33
A 4 6 8.7
A 7 9 9.3
A 10 13 9
Does someone have an idea using pandas please ?
Solution for get each 3 rows (if exist) per Name groups - first get counter by GroupBy.cumcount with integer division and pass it to named aggregations:
g = df.groupby('Name').cumcount() // 3
df = df.groupby(['Name',g]).agg(Start=('Position','first'),
End=('Position','last'),
Value=('Value','mean')).droplevel(1).reset_index()
print (df)
Name Start End Value
0 A 1 3 10.333333
1 A 4 6 8.666667
2 A 7 9 9.333333
3 A 10 12 9.000000

Create a frequency diagram using a dataframe in Pandas (Python3)

I currently have a list of the number of items and their frequency stored in a data frame called transactioncount_freq.
Item Frequency
0 1 3474
1 2 2964
2 3 1532
3 4 937
4 5 360
5 6 168
6 7 57
7 8 25
8 9 5
9 10 5
10 11 3
11 12 1
How would I make a bar chart using the item values as the x axis and the frequency values as the y axis using pandas and matplotlib.pyplot?
You can plot it easily like this
transactioncount_freq.plot(x='Item', y='Frequency', kind='bar')

Variable string formatting in python 3

Input is a number, e.g. 9 and I want to print decimal, octal, hex and binary value from 1 to 9 like:
1 1 1 1
2 2 2 10
3 3 3 11
4 4 4 100
5 5 5 101
6 6 6 110
7 7 7 111
8 10 8 1000
9 11 9 1001
How can I achieve this in python3 using syntax like
dm, oc, hx, bn = len(str(9)), len(bin(9)[2:]), ...
print("{:dm%d} {:oc%s}" % (i, oct(i[2:]))
I mean if number is 999 so I want decimal 10 to be printed like ' 10' and binary equivalent of 999 is 1111100111 so I want 10 like ' 1010'.
You can use str.format() and its mini-language to do the whole thing for you:
for i in range(1, 10):
print("{v} {v:>6o} {v:>6x} {v:>6b}".format(v=i))
Which will print:
1 1 1 1
2 2 2 10
3 3 3 11
4 4 4 100
5 5 5 101
6 6 6 110
7 7 7 111
8 10 8 1000
9 11 9 1001
UPDATE: To define field 'widths' in a variable you can use a format-within-format structure:
w = 5 # field width, i.e. offset to the right for all octal/hex/binary values
for i in range(1, 10):
print("{v} {v:>{w}o} {v:>{w}x} {v:>{w}b}".format(v=i, w=w))
Or define a different width variable for each field type if you want them non-uniformly spaced.
Btw. since you've tagged your question with python-3.x, if you're using Python 3.6 or newer, you can use Literal String Interpolation to simplify it even more:
w = 5 # field width, i.e. offset to the right for all octal/hex/binary values
for v in range(1, 10):
print(f"{v} {v:>{w}o} {v:>{w}x} {v:>{w}b}")

image style plot issue with latest gnuplot 5.0.3

Recently I upgrade gnuplot to 5.0.3 and find that image style plot generate strange output with none square data. This does not happen in previous version (5.0.2). here is the minimal example
set term png
set output "a.png"
plot "-" using 1:2:3 with image title ""
#data
1 1 2
1 2 3
1 3 1
2 1 1
2 2 2
2 3 3
3 1 8
3 2 6
3 3 4
4 1 8
4 2 6
4 3 4
the output image is a.png
When dealing with square data like this
set term png
set output "b.png"
plot "-" using 1:2:3 with image title ""
#data
1 1 2
1 2 3
1 3 1
2 1 1
2 2 2
2 3 3
3 1 8
3 2 6
3 3 4
everything is fine b.png
is this a bug?
This may be a problem of sorting.
Try to sort by 2nd column like this:
1 1 2
2 1 1
3 1 8
4 1 8
1 2 3
2 2 2
3 2 6
4 2 6
1 3 1
2 3 3
3 3 4
4 3 4
I am facing the same problem. At the moment, I switch the x and y axes. I use "u 2:1:3 w image" instead of using "u 1:2:3 w image"

Average of multiple files with unequal row sizes in Shell

I have 15 datafiles with unequal row sizes, but number of columns in each file is same. e.g.
ifile1.dat ifile2.dat ifile3.dat and so on ............
0 0 0 0 1 6
1 2 5 3 2 7
2 5 6 10 4 6
5 2 8 9 5 9
10 2 10 3 8 2
In each file 1st column represents the index number.
I would like to compute average of all these files for each index number in column 1. i.e.
ofile.txt
0 0 [This is computed as (0+0)/2]
1 4 [This is computed as (2+6)/2]
2 6 [This is computed as (5+7)/2]
3 [no value]
4 6 [This is computed as (6)/1]
5 4.66 [This is computed as (2+3+9)/3]
6 10
7
8 5.5
9
10 2.5
I can't think of any simple method to do it. I was thinking of a method, but seems very lengthy. Taking the average after converting all the files with same row size, .e.g.
ifile1.dat ifile2.dat ifile3.dat and so on ............
0 0 0 0 0 0
1 2 1 1 6
2 5 2 2 7
3 3 3
4 4 4 6
5 2 5 3 5 9
6 6 10 6
7 7 7
8 8 9 8 2
9 9 9
10 2 10 3 10
$ awk '{s[$1]+=$2; c[$1]++;} END{for (i in s) print i,s[i]/c[i];}' ifile*.dat
0 0
1 4
2 6
4 6
5 4.66667
6 10
8 5.5
10 2.5
In the above code, there are two arrays, s and c. s[i] is the sum of all entries with index i and c[i] is the number of entries with index i. After we have read all the files, we print the average, s[i]/c[i], for each index i.

Resources