for loop incorrectly plotting boxplot on same figure - python-3.x

I have the code below, where I'm trying to create a separate figure with a boxplot for each column in the list. When I run this code it plots all three boxplots on top of each other in the same figure. If I change it to a histogram instead, it works perfectly, creating a separate figure for each plot. Can someone please let me know how to fix this? I've also included some sample data below.
Code:
for i in ['Fresh', 'Milk', 'Grocery']:
    data_df.boxplot(column=i)
Data:
print(data_df[:10])
Channel Region Fresh Milk Grocery Frozen Detergents_Paper \
0 2 3 12669 9656 7561 214 2674
1 2 3 7057 9810 9568 1762 3293
2 2 3 6353 8808 7684 2405 3516
3 1 3 13265 1196 4221 6404 507
4 2 3 22615 5410 7198 3915 1777
5 2 3 9413 8259 5126 666 1795
6 2 3 12126 3199 6975 480 3140
7 2 3 7579 4956 9426 1669 3321
8 1 3 5963 3648 6192 425 1716
9 2 3 6006 11093 18881 1159 7425
Delicatessen
0 1338
1 1776
2 7844
3 1788
4 5185
5 1451
6 545
7 2566
8 750
9 2098

You can try this:
import matplotlib.pyplot as plt
data_df[['Fresh', 'Milk', 'Grocery']].plot.box(subplots=True)
plt.tight_layout()
Output:

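For completeness: the original loop draws everything onto one set of axes because `DataFrame.boxplot` reuses the current axes. If you specifically want three separate figures, opening a new figure on each iteration also works — a minimal sketch using stand-in sample data in place of the real `data_df`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# stand-in for data_df, using the first rows of the sample data
data_df = pd.DataFrame({"Fresh": [12669, 7057, 6353],
                        "Milk": [9656, 9810, 8808],
                        "Grocery": [7561, 9568, 7684]})

for col in ["Fresh", "Milk", "Grocery"]:
    plt.figure()  # start a new figure, so each boxplot gets its own
    data_df.boxplot(column=col)

print(len(plt.get_fignums()))  # one figure per column
```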
Related

Assigning ID to column name in merging distance matrix to dataframe

I have this issue I haven't been able to solve and I was hoping to get some insights here.
I have this geopandas dataframe:
GEO =
id geometry_zone \
0 A001DFD POLYGON ((48.08793 50.93755, 48.08793 49.18650...
1 A001DG POLYGON ((60.96434 49.05222, 59.86796 49.29929...
2 A001DS007 POLYGON ((53.16200 50.20131, 52.84363 48.45026...
3 A001DS01 POLYGON ((59.04953 49.34561, 58.77158 47.52346...
4 A001DS02 POLYGON ((58.12301 49.46915, 57.79873 47.67788...
5 A001DS03 POLYGON ((57.07498 49.66937, 56.79702 47.84722...
6 A001DS04 POLYGON ((56.13302 49.80835, 55.83962 48.00164...
7 A001DS05 POLYGON ((55.16017 49.93189, 54.89766 48.18694...
8 A001DS06 POLYGON ((54.14099 50.05542, 53.86304 48.27959...
9 A001DS08 POLYGON ((52.22678 50.36050, 51.94821 48.52985...
10 A001DS09 POLYGON ((50.93339 48.70894, 51.96811 48.52985...
11 A001DS10 POLYGON ((50.23695 50.67887, 49.91857 48.84823...
12 A001DS11 POLYGON ((50.23695 50.67887, 49.60020 50.75847...
13 A001FS01 POLYGON ((46.47617 48.94772, 46.47617 47.63443...
14 A001FS02 POLYGON ((46.49606 50.04213, 46.47617 48.94772...
centroid
0 POINT (48.75295 49.98494)
1 POINT (60.27696 48.21993)
2 POINT (53.49869 49.22928)
3 POINT (59.29040 48.38586)
4 POINT (58.42620 48.49535)
5 POINT (57.43469 48.68996)
6 POINT (56.46528 48.82210)
7 POINT (55.50608 48.98701)
8 POINT (54.51093 49.10232)
9 POINT (52.52668 49.40021)
10 POINT (51.59314 49.51614)
11 POINT (50.57522 49.68396)
12 POINT (49.74105 49.81923)
13 POINT (47.00679 48.58955)
14 POINT (47.23437 49.55921)
where the points are the geometry_zone centroids. Now, I know how to calculate the distance between every point, i.e. compute the distance matrix:
GEO_distances
0 1 2 3 4 5 6 \
0 0.000000 11.063874 4.299228 10.275246 9.312075 8.274448 7.312941
1 10.983097 0.000000 6.348082 0.616036 1.399226 2.373198 3.374784
2 4.132203 6.259105 0.000000 5.469828 4.507633 3.469029 2.507443
3 9.982697 0.409114 5.348195 0.000000 0.399280 1.373252 2.374671
4 9.112541 1.279148 4.477119 0.487986 0.000000 0.504366 1.503677
5 8.102334 2.289412 3.468492 1.497509 0.538514 0.000000 0.494605
6 7.124643 3.266993 2.490125 2.475753 1.515950 0.474954 0.000000
7 6.151367 4.240258 1.517485 3.448859 2.489192 1.448060 0.487174
8 5.151208 5.240246 0.515855 4.450013 3.488962 2.449214 1.487936
9 3.145284 7.246023 0.481768 6.456493 5.494540 4.455695 3.494278
10 2.205711 8.185458 1.420986 7.396838 6.433798 5.396039 4.434327
11 1.174092 9.217045 2.452510 8.428427 7.465334 6.427628 5.465988
12 0.329081 10.062023 3.297427 9.273461 8.310263 7.272662 6.311059
13 1.235000 12.579303 5.838504 11.812993 10.830385 9.818336 8.852372
14 0.853558 12.484730 5.717153 11.712257 10.730567 9.711458 8.743639
7 8 9 10 11 12 13 \
0 6.343811 5.312333 3.377798 2.368462 1.343153 0.675055 1.051959
1 4.353762 5.318769 7.388784 8.269175 9.305375 10.325337 12.247130
2 1.538467 0.506829 0.544190 1.416284 2.454398 3.479383 5.430826
3 3.353424 4.318400 6.388838 7.269062 8.304972 9.325272 11.250890
4 2.482659 3.447704 5.519952 6.398068 7.434796 8.456133 10.381205
5 1.473030 2.437971 4.509526 5.388997 6.424600 7.445701 9.379809
6 0.494829 1.459821 3.533033 4.410650 5.446892 6.468964 8.405156
7 0.000000 0.486633 2.560113 3.437762 4.473614 5.495941 7.440721
8 0.518599 0.000000 1.561677 2.436310 3.473427 4.497171 6.443451
9 2.525085 1.493574 0.000000 0.429875 1.467480 2.492644 4.463771
10 3.465481 2.433809 0.499402 0.000000 0.527884 1.554218 3.540493
11 4.497042 3.465439 1.530986 0.521601 0.000000 0.523013 2.556065
12 5.342058 4.310497 2.376017 1.366597 0.341276 0.000000 1.788666
13 7.901132 6.863255 4.941781 3.928417 2.923256 2.273971 0.000000
14 7.782154 6.746808 4.815043 3.790372 2.766326 2.077512 0.492253
14
0 0.703212
1 12.250335
2 5.430658
3 11.253792
4 10.383930
5 9.382000
6 8.406976
7 7.441895
8 6.444094
9 4.461567
10 3.531133
11 2.517604
12 1.686975
13 0.444277
14 0.000000
(So the first row contains the distances from the first point to every point in the centroid column, including itself.)
What I actually want is to merge this matrix into the dataframe AND to have the column names be the ids from GEO.
Now, I know how to merge:
new = GEO.merge(GEO_distances, on=['index'])
which returns:
index id geometry_zone \
0 0 A001DFD POLYGON ((48.08793 50.93755, 48.08793 49.18650...
1 1 A001DG POLYGON ((60.96434 49.05222, 59.86796 49.29929...
2 2 A001DS007 POLYGON ((53.16200 50.20131, 52.84363 48.45026...
3 3 A001DS01 POLYGON ((59.04953 49.34561, 58.77158 47.52346...
4 4 A001DS02 POLYGON ((58.12301 49.46915, 57.79873 47.67788...
5 5 A001DS03 POLYGON ((57.07498 49.66937, 56.79702 47.84722...
6 6 A001DS04 POLYGON ((56.13302 49.80835, 55.83962 48.00164...
7 7 A001DS05 POLYGON ((55.16017 49.93189, 54.89766 48.18694...
8 8 A001DS06 POLYGON ((54.14099 50.05542, 53.86304 48.27959...
9 9 A001DS08 POLYGON ((52.22678 50.36050, 51.94821 48.52985...
10 10 A001DS09 POLYGON ((50.93339 48.70894, 51.96811 48.52985...
11 11 A001DS10 POLYGON ((50.23695 50.67887, 49.91857 48.84823...
12 12 A001DS11 POLYGON ((50.23695 50.67887, 49.60020 50.75847...
13 13 A001FS01 POLYGON ((46.47617 48.94772, 46.47617 47.63443...
14 14 A001FS02 POLYGON ((46.49606 50.04213, 46.47617 48.94772...
centroid 0 1 2 3 \
0 POINT (48.75295 49.98494) 0.000000 11.063874 4.299228 10.275246
1 POINT (60.27696 48.21993) 10.983097 0.000000 6.348082 0.616036
2 POINT (53.49869 49.22928) 4.132203 6.259105 0.000000 5.469828
3 POINT (59.29040 48.38586) 9.982697 0.409114 5.348195 0.000000
4 POINT (58.42620 48.49535) 9.112541 1.279148 4.477119 0.487986
5 POINT (57.43469 48.68996) 8.102334 2.289412 3.468492 1.497509
6 POINT (56.46528 48.82210) 7.124643 3.266993 2.490125 2.475753
7 POINT (55.50608 48.98701) 6.151367 4.240258 1.517485 3.448859
8 POINT (54.51093 49.10232) 5.151208 5.240246 0.515855 4.450013
9 POINT (52.52668 49.40021) 3.145284 7.246023 0.481768 6.456493
10 POINT (51.59314 49.51614) 2.205711 8.185458 1.420986 7.396838
11 POINT (50.57522 49.68396) 1.174092 9.217045 2.452510 8.428427
12 POINT (49.74105 49.81923) 0.329081 10.062023 3.297427 9.273461
13 POINT (47.00679 48.58955) 1.235000 12.579303 5.838504 11.812993
14 POINT (47.23437 49.55921) 0.853558 12.484730 5.717153 11.712257
4 5 6 7 8 9 10 \
0 9.312075 8.274448 7.312941 6.343811 5.312333 3.377798 2.368462
1 1.399226 2.373198 3.374784 4.353762 5.318769 7.388784 8.269175
2 4.507633 3.469029 2.507443 1.538467 0.506829 0.544190 1.416284
3 0.399280 1.373252 2.374671 3.353424 4.318400 6.388838 7.269062
4 0.000000 0.504366 1.503677 2.482659 3.447704 5.519952 6.398068
5 0.538514 0.000000 0.494605 1.473030 2.437971 4.509526 5.388997
6 1.515950 0.474954 0.000000 0.494829 1.459821 3.533033 4.410650
7 2.489192 1.448060 0.487174 0.000000 0.486633 2.560113 3.437762
8 3.488962 2.449214 1.487936 0.518599 0.000000 1.561677 2.436310
9 5.494540 4.455695 3.494278 2.525085 1.493574 0.000000 0.429875
10 6.433798 5.396039 4.434327 3.465481 2.433809 0.499402 0.000000
11 7.465334 6.427628 5.465988 4.497042 3.465439 1.530986 0.521601
12 8.310263 7.272662 6.311059 5.342058 4.310497 2.376017 1.366597
13 10.830385 9.818336 8.852372 7.901132 6.863255 4.941781 3.928417
14 10.730567 9.711458 8.743639 7.782154 6.746808 4.815043 3.790372
11 12 13 14
0 1.343153 0.675055 1.051959 0.703212
1 9.305375 10.325337 12.247130 12.250335
2 2.454398 3.479383 5.430826 5.430658
3 8.304972 9.325272 11.250890 11.253792
4 7.434796 8.456133 10.381205 10.383930
5 6.424600 7.445701 9.379809 9.382000
6 5.446892 6.468964 8.405156 8.406976
7 4.473614 5.495941 7.440721 7.441895
8 3.473427 4.497171 6.443451 6.444094
9 1.467480 2.492644 4.463771 4.461567
10 0.527884 1.554218 3.540493 3.531133
11 0.000000 0.523013 2.556065 2.517604
12 0.341276 0.000000 1.788666 1.686975
13 2.923256 2.273971 0.000000 0.444277
14 2.766326 2.077512 0.492253 0.000000
But, how do I give the column the id names in a simple way? Manually renaming 18 000 columns is not my idea of a fun afternoon.
I have found an answer to my question, but I still wonder if there is a better and more elegant way to do this (e.g. on the fly). What I did was this:
new_column_name = GEO.id.to_list()
columnlist = GEO_distances.columns.to_list()
cols_remove = ['index','id','geometry_zone','centroid']
old_column_names = [x for x in columnlist if (x not in cols_remove)]
col_rename_dict = {i:j for i,j in zip(old_column_names,new_column_name)}
GEO_distances.rename(columns=col_rename_dict, inplace=True)
which gives:
index id geometry_zone \
0 0 A001DFD POLYGON ((48.08793 50.93755, 48.08793 49.18650...
1 1 A001DG POLYGON ((60.96434 49.05222, 59.86796 49.29929...
2 2 A001DS007 POLYGON ((53.16200 50.20131, 52.84363 48.45026...
3 3 A001DS01 POLYGON ((59.04953 49.34561, 58.77158 47.52346...
4 4 A001DS02 POLYGON ((58.12301 49.46915, 57.79873 47.67788...
... ... ... ...
1790 1790 R13C1G POLYGON ((63.72846 54.07087, 61.04155 54.02454...
1791 1791 R13D1A POLYGON ((63.03727 60.43190, 65.27641 57.78312...
1792 1792 R13D1D POLYGON ((68.90781 67.16844, 68.95414 60.51294...
1793 1793 R13D1F POLYGON ((61.42043 67.16403, 75.48019 67.22166...
1794 1794 R13D1G POLYGON ((61.40300 67.15300, 61.43388 63.43148...
centroid A001DFD A001DG A001DS007 A001DS01 \
0 POINT (48.75295 49.98494) 0.000000 11.063874 4.299228 10.275246
1 POINT (60.27696 48.21993) 10.983097 0.000000 6.348082 0.616036
2 POINT (53.49869 49.22928) 4.132203 6.259105 0.000000 5.469828
3 POINT (59.29040 48.38586) 9.982697 0.409114 5.348195 0.000000
4 POINT (58.42620 48.49535) 9.112541 1.279148 4.477119 0.487986
... ... ... ... ... ...
1790 POINT (62.36165 51.28081) 12.814471 2.630419 8.337061 3.267367
1791 POINT (69.85889 59.16021) 21.991462 13.464194 18.191827 14.124815
1792 POINT (72.22137 63.86261) 26.206982 18.602918 22.776510 19.187716
1793 POINT (68.46954 68.61039) 26.045757 20.948750 23.468535 21.237352
1794 POINT (65.33358 63.93216) 20.589210 15.508162 17.853344 15.717912
A001DS02 A001DS03 A001DS04 A001DS05 A001DS06 A001DS08 \
0 9.312075 8.274448 7.312941 6.343811 5.312333 3.377798
1 1.399226 2.373198 3.374784 4.353762 5.318769 7.388784
2 4.507633 3.469029 2.507443 1.538467 0.506829 0.544190
3 0.399280 1.373252 2.374671 3.353424 4.318400 6.388838
4 0.000000 0.504366 1.503677 2.482659 3.447704 5.519952
... ... ... ... ... ... ...
1790 3.862726 4.616091 5.526811 6.415345 7.329588 9.280236
1791 14.623171 15.220831 15.921820 16.621702 17.363742 18.956701
1792 19.622822 20.146582 20.757196 21.374153 22.035882 23.454339
1793 21.458123 21.751888 22.104256 22.496392 22.947829 23.939341
1794 15.894837 16.153433 16.481252 16.864660 17.318735 18.347265
Any other more efficient solution is welcome.
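A shorter variant of the same idea: since the distance matrix's columns 0..N-1 are in the same order as GEO's rows, the rename mapping can be built directly with `zip` before merging. A sketch on toy stand-in data:

```python
import pandas as pd

# toy stand-ins: three zone ids and their 3x3 distance matrix
ids = ["A001DFD", "A001DG", "A001DS007"]  # stands in for GEO["id"]
GEO_distances = pd.DataFrame([[0.0, 11.1, 4.3],
                              [11.0, 0.0, 6.3],
                              [4.1, 6.3, 0.0]])

# map the positional labels 0..N-1 to the ids in one step
GEO_distances = GEO_distances.rename(
    columns=dict(zip(GEO_distances.columns, ids)))
print(list(GEO_distances.columns))
```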

Year On Year Growth Using Pandas - Traverse N rows Back

I have a lot of parameters on which I have to calculate the year on year growth.
Type 2006-Q1 2006-Q2 2006-Q3 2006-Q4 2007-Q1 2007-Q2 2007-Q3 2007-Q4 2008-Q1 2008-Q2 2008-Q3 2008-Q4
MonMkt_IntRt 3.44 3.60 3.99 4.40 4.61 4.73 5.11 4.97 4.92 4.89 5.29 4.51
RtlVol 97.08 97.94 98.25 99.15 99.63 100.29 100.71 101.18 102.04 101.56 101.05 99.49
IntRt 4.44 5.60 6.99 7.40 8.61 9.73 9.11 9.97 9.92 9.89 7.29 9.51
GMR 9.08 9.94 9.25 9.15 9.63 10.29 10.71 10.18 10.04 10.56 10.05 9.49
I need to calculate the growth; i.e., in column 2007-Q1 I need to find the growth from 2006-Q1. The formula is (2007-Q1 / 2006-Q1) - 1.
I have gone through the link below and tried to code
Calculating year over year growth by group in Pandas
df = pd.read_csv('c:/Econometric/EconoModel.csv')
df.set_index('Type',inplace=True)
df.sort_index(axis=1, inplace=True)
df_t = df.T
df_output = (df_t / df_t.shift(4)) - 1
The output is as below
Type         2006-Q1 2006-Q2 2006-Q3 2006-Q4 2007-Q1 2007-Q2 2007-Q3 2007-Q4 2008-Q1 2008-Q2 2008-Q3 2008-Q4
MonMkt_IntRt     NaN     NaN     NaN     NaN  0.3398  0.3159  0.2806  0.1285  0.0661  0.0340  0.0363 -0.0912
RtlVol           NaN     NaN     NaN     NaN  0.0261  0.0240  0.0249  0.0204  0.0242  0.0126  0.0033 -0.0166
IntRt            NaN     NaN     NaN     NaN  0.6666  0.5375  0.3919  0.2310  0.1579  0.0195  0.0856 -0.2688
GMR              NaN     NaN     NaN     NaN  0.0077 -0.031   0.1124  0.1704  0.0571 -0.024  -0.014  -0.0127
Use iloc to shift data slices. See an example on a test df.
df= pd.DataFrame({i:[0+i,1+i,2+i] for i in range(0,12)})
print(df)
0 1 2 3 4 5 6 7 8 9 10 11
0 0 1 2 3 4 5 6 7 8 9 10 11
1 1 2 3 4 5 6 7 8 9 10 11 12
2 2 3 4 5 6 7 8 9 10 11 12 13
df.iloc[:,list(range(3,12))] = df.iloc[:,list(range(3,12))].values/ df.iloc[:,list(range(0,9))].values - 1
print(df)
0 1 2 3 4 5 6 7 8 9 10
0 0 1 2 inf 3.0 1.50 1.00 0.75 0.600000 0.500000 0.428571
1 1 2 3 3.0 1.5 1.00 0.75 0.60 0.500000 0.428571 0.375000
2 2 3 4 1.5 1.0 0.75 0.60 0.50 0.428571 0.375000 0.333333
11
0 0.375000
1 0.333333
2 0.300000
I could not find any issue with your code.
I simply added axis=1 to the DataFrame.shift() call, since you are comparing across columns.
I executed the following code and it gives the result you expected.
def getSampleDataframe():
    df_economy_model = pd.DataFrame(
        {
            'Type': ['MonMkt_IntRt', 'RtlVol', 'IntRt', 'GMR'],
            '2006-Q1': [3.44, 97.08, 4.44, 9.08],
            '2006-Q2': [3.6, 97.94, 5.6, 9.94],
            '2006-Q3': [3.99, 98.25, 6.99, 9.25],
            '2006-Q4': [4.4, 99.15, 7.4, 9.15],
            '2007-Q1': [4.61, 99.63, 8.61, 9.63],
            '2007-Q2': [4.73, 100.29, 9.73, 10.29],
            '2007-Q3': [5.11, 100.71, 9.11, 10.71],
            '2007-Q4': [4.97, 101.18, 9.97, 10.18],
            '2008-Q1': [4.92, 102.04, 9.92, 10.04],
            '2008-Q2': [4.89, 101.56, 9.89, 10.56],
            '2008-Q3': [5.29, 101.05, 7.29, 10.05],
            '2008-Q4': [4.51, 99.49, 9.51, 9.49]
        })  # Your data
    return df_economy_model

df_cd_americas = getSampleDataframe()
df_cd_americas.set_index('Type', inplace=True)
df_yearly_growth = (df_cd_americas / df_cd_americas.shift(4, axis=1)) - 1
print(df_cd_americas)
print(df_yearly_growth)
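To sanity-check the formula numerically, here is the first Type on its own (values taken from the sample data above):

```python
import pandas as pd

df = pd.DataFrame({"2006-Q1": [3.44], "2006-Q2": [3.60],
                   "2006-Q3": [3.99], "2006-Q4": [4.40],
                   "2007-Q1": [4.61]},
                  index=["MonMkt_IntRt"])

# year-on-year growth: each quarter divided by the same quarter 4 columns earlier
growth = df / df.shift(4, axis=1) - 1
print(growth["2007-Q1"]["MonMkt_IntRt"])  # (4.61 / 3.44) - 1
```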

How to create a Chord diagram out of this dataset format?

I have a dataset which consists of passes made and received by a player with every teammate. A sample dataset looks like this:
ter Stegen Pique Rakitic Busquets Coutinho Suarez Messi \
ter Stegen 0 8 0 2 0 1 1
Pique 12 0 2 20 0 0 1
Rakitic 3 3 0 13 5 2 6
Busquets 1 1 9 0 0 0 8
Coutinho 0 0 2 1 0 4 6
Suarez 0 0 2 1 2 0 1
Messi 0 2 5 1 3 4 0
Lenglet 4 6 8 8 1 0 0
Alba 1 1 8 4 5 8 5
Roberto 4 11 5 4 0 4 6
Vidal 1 10 5 8 3 2 7
Lenglet Alba Roberto Vidal
ter Stegen 4 3 5 5
Pique 9 2 10 5
Rakitic 4 8 2 5
Busquets 4 8 7 12
Coutinho 0 3 0 1
Suarez 0 5 3 3
Messi 0 4 3 4
Lenglet 0 4 0 4
Alba 6 0 1 4
Roberto 1 0 0 8
Vidal 5 7 6 0
How do I visualize this in the form of a chord diagram which shows the flow of passes from every player to every other? I've tried using Holoviews and Plotly but I can't crack how to work with data in this format. Any help would be appreciated.
Here's the entire code:
import pandas as pd
import holoviews as hv
from holoviews import opts, dim
from bokeh.plotting import show, output_file
import numpy as np
pd.set_option("display.max_columns",11)
hv.extension('bokeh')
hv.output(size = 200)
df = pd.read_csv(r"C:\Users\ADMIN\Desktop\Abhishek\BarLiv.csv")
df = df.set_index("0")
df.index.name = None
#print(df)
# Declare a gridded HoloViews dataset and call dframe to flatten it
players = list(df.columns)
data = hv.Dataset((players, players, df), ['source', 'target']).dframe()
#print(players)
# Now create your Chord diagram from the flattened data
chord = hv.Chord(data)
chord.opts(
    node_color='index', edge_color='source', label_index='index',
    cmap='Category10', edge_cmap='Category10', width=500, height=500)
output_file('chordtest.html')
show(hv.render(chord))
Edit 1: Here's what I'm getting after implementing @philippjfr's solution
HoloViews provides a neat little trick that makes this pretty easy: you can declare a gridded Dataset from your dataframe and then flatten it:
df = pd.read_csv('/Users/philippjfr/Downloads/BarLiv.csv', index_col=0)
# Declare a gridded HoloViews dataset and call dframe to flatten it
data = hv.Dataset((list(df.columns), list(df.index), df),
                  ['source', 'target'], 'value').dframe()
# Now create your Chord diagram from the flattened data
chord = hv.Chord(data)
chord.opts(
    node_color='index', edge_color='source', label_index='index',
    cmap='Category10', edge_cmap='Category10', width=500, height=500)
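If you prefer plain pandas for the flattening step, `stack()` produces the same long (source, target, value) format — a sketch on a 2x2 corner of the pass matrix:

```python
import pandas as pd

# 2x2 corner of the pass matrix, standing in for the full df
df = pd.DataFrame([[0, 8], [12, 0]],
                  index=["ter Stegen", "Pique"],
                  columns=["ter Stegen", "Pique"])

# stack() flattens the matrix into one row per (source, target) pair
edges = df.stack().rename_axis(["source", "target"]).reset_index(name="value")
print(edges)
```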

Create a frequency diagram using a dataframe in Pandas (Python3)

I currently have a list of the number of items and their frequency stored in a data frame called transactioncount_freq.
Item Frequency
0 1 3474
1 2 2964
2 3 1532
3 4 937
4 5 360
5 6 168
6 7 57
7 8 25
8 9 5
9 10 5
10 11 3
11 12 1
How would I make a bar chart using the item values as the x axis and the frequency values as the y axis using pandas and matplotlib.pyplot?
You can plot it easily like this:
transactioncount_freq.plot(x='Item', y='Frequency', kind='bar')
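A runnable version of that one-liner on the first few rows of the sample data (using the Agg backend so it works headless):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# first rows of the sample transactioncount_freq data
transactioncount_freq = pd.DataFrame({"Item": [1, 2, 3, 4],
                                      "Frequency": [3474, 2964, 1532, 937]})

ax = transactioncount_freq.plot(x="Item", y="Frequency", kind="bar", legend=False)
ax.set_ylabel("Frequency")
plt.tight_layout()
print(len(ax.patches))  # one bar per Item
```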

gnuplot: fetching a variable value from different row/column for calculations

I want to get a specific value from another row & column to normalize my data. The tricky part is, that this value changes for every data point in my data set.
Here what my data set looks like:
64 22370 1 585 1 10
128 47547 1 4681 1 10
256 291761 1 37449 1 10
128 48446 1.019 4681 1 10
256 480937 1.648 37449 1 10
128 7765 0.163 777 0.166 10
256 7164 0.025 1393 0.037 10
128 37078 0.780 4681 1 10
256 334372 1.146 37449 1 10
128 45543 0.958 4681 1 10
128 5579 0.117 649 0.139 10
128 40121 0.844 4529 0.968 10
128 49494 1.041 4681 1 10
# --> here it starts to repeat
64 48788 1 585 1 20
128 110860 1 4681 1 20
256 717797 1 37449 1 20
128 101666 0.917 4681 1 20
......
......
This data file contains all points for in total 13 different sets, so I plot it with something like this:
plot\
'../logs.dat' every 13::1 u 6:2 title '' with lines lt 3 lc 'black' lw 1,\
'../logs.dat' every 13::3 u 6:2 title '' with lines lt 3 lc 'black' lw 1,\
Now I try to normalize my data. The interesting value is in the 1st row, 2nd column (counting rows from 0), i.e. $1:$2, and the row index advances by 13 for every further data point.
For example: The first data set I want to plot would be
(10:47547/47547)
(20:110860/110860)
...
The second plot should be
(10:48446/47547)
(20:101666/110860)
...
And so on.
In pseudo code I would read something like
plot\
'../logs.dat' every 13::1 u 6:($2 / take i:$2 for i = i + 13 ) title '' with lines lt 3 lc 'black' lw 1,\
'../logs.dat' every 13::3 u 6:($2 / take i:$2 for i = i + 13 ) title '' with lines lt 3 lc 'black' lw 1,\
I hope I made clear what I am trying to achieve.
Thank you for any help!
If the value you want to use for normalisation is the very first to be plotted, then something like this is possible:
plot y0=-1e10, "data" using 1:(y0 == -1e10 ? (y0 = $2, 1) : $2/y0)
The normalisation value y0 is initialised to -1e10 on every replot. Check the help for ternary operator and serial evaluation.
But really you'd better pre-process your data.
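A pre-processing sketch in Python (an assumption-laden illustration: 13-row cycles, and column 2 normalized by the value in each cycle's second row, as described in the question):

```python
def normalize(rows, cycle=13, ref_row=1, col=1):
    """Append rows[i][col] / base to each row, where base is the value
    at (cycle start + ref_row, col) of that row's cycle."""
    out = []
    for start in range(0, len(rows), cycle):
        block = rows[start:start + cycle]
        base = block[ref_row][col]
        out.extend(r + [r[col] / base] for r in block)
    return out

# tiny demo with a cycle of 2 instead of 13
rows = [[64, 100.0], [128, 200.0],
        [64, 300.0], [128, 400.0]]
print(normalize(rows, cycle=2))
```

Write the result back to a file with the extra column, then plot that column directly in gnuplot without any on-the-fly arithmetic.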
If I understood your question correctly, you want to normalize some of your data in a special way.
For the first plot you want to start from the second line (row-index 1), divide the value in the second column by itself, and continue for every 13th row.
So, this is dividing the values of the second column for the following row indices: 1/1, 14/14, 27/27, ..., (n*13+1)/(n*13+1). This is trivial because it will always be 1.
For the second plot you want to start with the value in column 2 at row-index 3 and divide it by the value in column 2 at row-index 1, repeating for every 13th row,
i.e. the involved row-indices are: 3/1, 16/14, 29/27, ..., (n*13+3)/(n*13+1).
For this second case, a construct with every 13 alone will not work, because you need both every 13th value and every 13th value shifted by 2 rows.
So, what you can do:
If you pass by row-index 1 (and every 13th row later), remember the value in column 2 and when you pass by row-index 3, divide this value by the remembered value and plot it, otherwise plot NaN. Repeat this for all rows cycled by 13. You can use the pseudocolumn 0 (check help pseudocolumns) and the modulo operator (check help operators binary).
If you want a continuous line with lines or linespoints you need to set datafile missing NaN because NaN values would interrupt the lines (check help missing). However, this works only for gnuplot>=5.0.6. For gnuplot 5.0.0 (version at OP's question) you have to use some workaround.
Script:
### special normalization of data
reset session
$Data <<EOD
1 900 3 4 5 10
2 1000 3 4 5 10
3 1050 3 4 5 10
4 1100 3 4 5 10
5 1150 3 4 5 10
6 1200 3 4 5 10
7 1250 3 4 5 10
8 1300 3 4 5 10
9 1350 3 4 5 10
10 1400 3 4 5 10
11 1450 3 4 5 10
12 1500 3 4 5 10
13 1550 3 4 5 10
#
1 1900 3 4 5 20
2 2000 3 4 5 20
3 2050 3 4 5 20
4 2100 3 4 5 20
5 2150 3 4 5 20
6 2200 3 4 5 20
7 2250 3 4 5 20
8 2300 3 4 5 20
9 2350 3 4 5 20
10 2400 3 4 5 20
11 2450 3 4 5 20
12 2500 3 4 5 20
13 2550 3 4 5 20
#
1 2900 3 4 5 30
2 3000 3 4 5 30
3 3050 3 4 5 30
4 3100 3 4 5 30
5 3150 3 4 5 30
6 3200 3 4 5 30
7 3250 3 4 5 30
8 3300 3 4 5 30
9 3350 3 4 5 30
10 3400 3 4 5 30
11 3450 3 4 5 30
12 3500 3 4 5 30
13 3550 3 4 5 30
EOD
M = 13 # cycle of your data
set datafile missing NaN # only for gnuplot>=5.0.6
plot $Data u 6:(1) every M w lp pt 7 lc "red" ti "Normalized 1/1", \
'' u 6:(int($0)%M==1?y0=$2:0,int($0)%M==3?$2/y0:NaN) w lp pt 7 lc "blue" ti "Normalized 3/1"
### end of code
Result:
