I have a data file with 109 columns and around 3000 rows. I would like to plot the average of x1 through x108 (ignoring the y and z columns). The data looks like:
time x1 y1 z1 x2 y2 z2 x3 y3 z3 ... x108 y108 z108
With just a few columns it works well, for example:
time x1 y1 z1 x2 y2 z2 x3 y3 z3
plot 'file.dat' u 1:(($2+$5+$8)/3) with lines ls 4
The problem is that this does not scale to 108 columns or more. I would rather not type the expression by hand, because the data will grow later.
I have tried:
plot for [i=2:108:3] 'file.dat' u 1:(column(i)) with lines ls 4
But this draws a separate curve for each selected column, which is not what I want. So how can I plot just the average of x1 ... x108 (ignoring y and z)?
Thanks.
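The following should do what you're looking for. The example below averages z1, ..., z3. Here ColStart is the first column to include, ColStep is the distance between the columns being averaged, and ColCount is how many columns are averaged. In your case the parameters for averaging x1, ..., x108 would be ColStart=2, ColStep=3, and ColCount=108.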
Also check help summation.
Code:
### average over several columns
reset session
$Data <<EOD
#n x1 y1 z1 x2 y2 z2 x3 y3 z3
1 1.11 1.21 1.31 2.11 2.21 2.31 3.11 3.21 3.31
2 1.12 1.22 1.32 2.12 2.22 2.32 3.12 3.22 3.32
3 1.13 1.23 1.33 2.13 2.23 2.33 3.13 3.23 3.33
4 1.14 1.24 1.34 2.14 2.24 2.34 3.14 3.24 3.34
5 1.15 1.25 1.35 2.15 2.25 2.35 3.15 3.25 3.35
6 1.16 1.26 1.36 2.16 2.26 2.36 3.16 3.26 3.36
7 1.17 1.27 1.37 2.17 2.27 2.37 3.17 3.27 3.37
8 1.18 1.28 1.38 2.18 2.28 2.38 3.18 3.28 3.38
9 1.19 1.29 1.39 2.19 2.29 2.39 3.19 3.29 3.39
EOD
ColStart = 4
ColStep = 3
ColCount = 3
plot $Data u 1:((sum[i=0:ColCount-1] column(i*ColStep+ColStart))/ColCount) w lp pt 7 notitle
### end of code
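As a cross-check outside gnuplot, the same strided average can be sketched in plain Python. This is only an illustration and not part of the answer above: the helper name strided_mean is hypothetical, and its column numbers are 1-based so that they match gnuplot's column().

```python
def strided_mean(row, col_start, col_step, col_count):
    """Average col_count values of row, starting at the 1-based column
    col_start and advancing col_step columns each time."""
    total = sum(row[col_start - 1 + i * col_step] for i in range(col_count))
    return total / col_count

# First row of the sample data: n x1 y1 z1 x2 y2 z2 x3 y3 z3
row = [1, 1.11, 1.21, 1.31, 2.11, 2.21, 2.31, 3.11, 3.21, 3.31]

z_mean = strided_mean(row, col_start=4, col_step=3, col_count=3)  # z1, z2, z3
x_mean = strided_mean(row, col_start=2, col_step=3, col_count=3)  # x1, x2, x3
print(round(z_mean, 2), round(x_mean, 2))
```

For the first data row this should agree with the first point that gnuplot draws for the z columns.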
I have a DataFrame with groundwater level time series and I am trying to remove outliers from the data. The outlier removal method I want to use is the generalized Extreme Studentized Deviate (ESD) test. Because my time series are sometimes not normally distributed, I want to apply it over a rolling window (12 or 24 months of monthly data) to get better results.
from __future__ import print_function, division
import numpy as np
import matplotlib.pylab as plt
from PyAstronomy import pyasl
# Convert data given at:
# http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h3.htm
# to array.
x = np.array([float(x) for x in "-0.25 0.68 0.94 1.15 1.20 1.26 1.26 1.34 1.38 1.43 1.49 1.49 \
1.55 1.56 1.58 1.65 1.69 1.70 1.76 1.77 1.81 1.91 1.94 1.96 \
1.99 2.06 2.09 2.10 2.14 2.15 2.23 2.24 2.26 2.35 2.37 2.40 \
2.47 2.54 2.62 2.64 2.90 2.92 2.92 2.93 3.21 3.26 3.30 3.59 \
3.68 4.30 4.64 5.34 5.42 6.01".split()])
# Apply the generalized ESD
r = pyasl.generalizedESD(x, 10, 0.05, fullOutput=True)
print("Number of outliers: ", r[0])
print("Indices of outliers: ", r[1])
print(" R Lambda")
for i in range(len(r[2])):
    print("%2d %8.5f %8.5f" % ((i+1), r[2][i], r[3][i]))
# Plot the "data"
plt.plot(x, 'b.')
# and mark the outliers.
for i in range(r[0]):
    plt.plot(r[1][i], x[r[1][i]], 'rp')
plt.show()
I simply want to apply the code above to a rolling window in my DataFrame and remove the outliers.
Thank you.
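One possible way to structure this, sketched under assumptions: slide a fixed-length window over the series, run an outlier detector on each window, and collect every position that gets flagged. The helper rolling_outlier_positions and the stand-in detector toy_detector are hypothetical names. With PyAstronomy installed, the detector could instead be a small wrapper that returns pyasl.generalizedESD(chunk, 10, 0.05, fullOutput=True)[1], as in the call above, probably with a smaller maximum number of outliers for a 12-point window.

```python
import statistics

def rolling_outlier_positions(values, window, detect):
    """Run detect() on every full window and return the sorted global
    positions that were flagged as outliers in at least one window."""
    flagged = set()
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        for local_index in detect(chunk):
            flagged.add(start + local_index)
    return sorted(flagged)

def toy_detector(chunk, threshold=3.0):
    """Stand-in for generalized ESD (an assumption, not the real test):
    flag points far from the window median, in units of the MAD."""
    med = statistics.median(chunk)
    mad = statistics.median(abs(v - med) for v in chunk)
    if mad == 0:
        return []
    return [i for i, v in enumerate(chunk) if abs(v - med) / mad > threshold]

# Made-up monthly levels with one obvious spike at position 5
levels = [10.0, 10.1, 9.9, 10.2, 10.0, 25.0, 10.1,
          9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 10.3]
flagged = rolling_outlier_positions(levels, window=12, detect=toy_detector)
cleaned = [v for i, v in enumerate(levels) if i not in flagged]
print(flagged, len(cleaned))
```

For a pandas Series the same loop could run over series.to_numpy(), and the flagged positions could then be dropped or set to NaN.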
I am generating line charts with the following syntax:
df2 = df2[['runtime','per','dev','var']]
op = f"/tmp/image.png"
fig, ax = plt.subplots(facecolor='darkslategrey')
df2.plot(x='runtime',xlabel="Date", kind='line', marker='o',linewidth=2,alpha=.7,subplots=True,color=['khaki', 'lightcyan','thistle'])
plt.style.use('dark_background')
plt.suptitle(f'Historical Data:', fontsize=12,fontname = 'monospace')
#file output
plt.savefig(op, transparent=False,bbox_inches="tight")
plt.close('all')
Where df2 dataframe sample:
runtime per dev var
1 2021-05-28 50.85 2.11 2.13
1 2021-05-30 50.85 2.11 2.13
1 2021-06-02 51.13 2.16 2.11
1 2021-06-04 51.13 2.16 2.11
1 2021-06-07 51.13 2.16 2.11
1 2021-06-09 51.11 2.13 2.10
1 2021-06-10 51.11 2.13 2.10
1 2021-06-14 51.11 2.13 2.10
1 2021-06-16 51.34 2.12 2.10
1 2021-06-18 51.34 2.12 2.10
1 2021-06-21 51.34 2.12 2.10
1 2021-06-23 51.69 1.97 2.17
1 2021-06-25 51.69 1.97 2.17
1 2021-06-28 51.69 1.97 2.17
1 2021-06-30 56.46 1.74 2.14
1 2021-07-02 56.46 1.74 2.14
1 2021-07-05 56.46 1.74 2.14
1 2021-07-07 55.10 1.84 2.08
1 2021-07-09 55.10 1.84 2.08
1 2021-07-12 55.10 1.84 2.08
1 2021-07-14 54.58 1.85 2.07
1 2021-07-16 54.58 1.85 2.07
1 2021-07-19 54.58 1.85 2.07
1 2021-07-21 54.33 1.87 2.06
1 2021-07-23 54.33 1.87 2.06
1 2021-07-26 54.33 1.87 2.06
1 2021-07-28 54.98 1.91 2.19
1 2021-07-30 54.98 1.91 2.19
This works great.
Now I would like to color a point red if its value is "abnormal", specifically if per < 90.00 or per > 10.00, or if dev < 10.00, or if var < 10.00.
Is this possible?
Instead of drawing the 3 subplots in one call, they could be drawn one-by-one. First draw the subplot as before, and on top of it a scatter plot, only with the "abnormal" points. zorder=3 makes sure that the scatter dots appear on top of the existing dots.
Here is some example code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df2 = pd.DataFrame({'runtime': pd.date_range('20210101', freq='D', periods=100),
                    'per': np.random.uniform(1, 99, 100),
                    'dev': np.random.uniform(1, 11, 100),
                    'var': np.random.uniform(2, 11, 100)})
fig, axs = plt.subplots(nrows=3, figsize=(6, 10), facecolor='darkslategrey', sharex=True)
# one subplot per column, each with its own color and "normal" range
for ax, column, color, (min_normal, max_normal) in zip(axs,
                                                       ['per', 'dev', 'var'],
                                                       ['khaki', 'lightcyan', 'thistle'],
                                                       [(10, 90), (-np.inf, 10), (-np.inf, 10)]):
    df2.plot(x='runtime', xlabel="Date", y=column, ylabel=column,
             kind='line', marker='o', linewidth=2, alpha=.7, color=color, legend=False, ax=ax)
    # rows whose value lies outside the normal range are redrawn in red on top
    df_abnormal = df2[(df2[column] < min_normal) | (df2[column] > max_normal)]
    df_abnormal.plot(x='runtime', xlabel="Date", y=column, ylabel=column,
                     kind='scatter', marker='o', color='red', legend=False, zorder=3, ax=ax)
plt.style.use('dark_background')
plt.suptitle(f'Historical Data:', fontsize=12, fontname='monospace')
plt.tight_layout()
plt.show()
I am trying to use the summation expression in gnuplot, but it is not working properly. I have the following data structure with many rows:
t x1 y1 z1 x2 y2 z2 x3 y3 z3 ... x98 y98 z98
I would like to plot the following equation:
u = (sqrt(sum(x)**2 + sum(y)**2 + sum(z)**2))/98
98 is the number of points (x,y,z).
What I have so far is how to plot the average of columns x1, x2, x3, ... as follows:
plot 'data file' u 1:((sum[i=0:ColCount-1] column(i*ColStep+ColStart))/ColCount) w lines ls 4 notitle
where ColCount = 98, ColStep = 3 and ColStart = 2.
However, my attempts to plot the equation above have not worked. I would really appreciate any help.
What the following script does:
It takes the square root of the sum of (x1+x2+x3)**2 and (y1+y2+y3)**2 and (z1+z2+z3)**2. This you can adapt to your column numbers.
But I'm still not sure whether this is what you want. Please clarify.
Code:
### summing up columns
reset session
$Data <<EOD
#t x1 y1 z1 x2 y2 z2 x3 y3 z3
1 1.11 1.21 1.31 2.11 2.21 2.31 3.11 3.21 3.31
2 1.12 1.22 1.32 2.12 2.22 2.32 3.12 3.22 3.32
3 1.13 1.23 1.33 2.13 2.23 2.33 3.13 3.23 3.33
4 1.14 1.24 1.34 2.14 2.24 2.34 3.14 3.24 3.34
5 1.15 1.25 1.35 2.15 2.25 2.35 3.15 3.25 3.35
6 1.16 1.26 1.36 2.16 2.26 2.36 3.16 3.26 3.36
7 1.17 1.27 1.37 2.17 2.27 2.37 3.17 3.27 3.37
8 1.18 1.28 1.38 2.18 2.28 2.38 3.18 3.28 3.38
9 1.19 1.29 1.39 2.19 2.29 2.39 3.19 3.29 3.39
EOD
ColStep = 3
ColCount = 3
mySum(ColStart) = sum[i=0:ColCount-1] column(i*ColStep+ColStart)
plot $Data u 1:(sqrt(mySum(2)**2 + mySum(3)**2 + mySum(4)**2)) w lp pt 7 notitle
### end of code
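To check the numbers by hand, the same calculation can be sketched in plain Python for the first data row. The helper column_sum is a hypothetical counterpart of mySum with 1-based column numbers; it is an illustration only and not part of the gnuplot script.

```python
import math

def column_sum(row, col_start, col_step=3, col_count=3):
    """Sum col_count values starting at the 1-based column col_start,
    advancing col_step columns each time (like mySum in the script)."""
    return sum(row[col_start - 1 + i * col_step] for i in range(col_count))

# First row of the sample data: t x1 y1 z1 x2 y2 z2 x3 y3 z3
row = [1, 1.11, 1.21, 1.31, 2.11, 2.21, 2.31, 3.11, 3.21, 3.31]

sx, sy, sz = (column_sum(row, start) for start in (2, 3, 4))
u = math.sqrt(sx**2 + sy**2 + sz**2)
print(round(u, 4))
```

The formula in the question additionally divides by the number of points; in this sketch that would be u / 3.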
Floating-point numbers with finite precision are printed with a different number of digits under identical conditions.
This was observed and tested with Python 3.x under Linux and Windows, and it negatively affects subsequent calculations.
for i in range(100):
    k = 1 + i / 100
    print(k)
1.0
1.01
1.02
1.03
1.04
1.05
1.06
1.07
1.08
1.09
1.1
1.11
1.12
1.13
1.1400000000000001
1.15
1.16
1.17
1.18
1.19
1.2
1.21
1.22
1.23
1.24
1.25
1.26
1.27
1.28
1.29
1.3
1.31
1.32
1.33
1.34
1.35
1.3599999999999999
1.37
1.38
1.3900000000000001
1.4
1.41
1.42
1.43
1.44
1.45
1.46
1.47
1.48
1.49
1.5
1.51
1.52
1.53
1.54
1.55
1.56
1.5699999999999998
1.58
1.5899999999999999
1.6
1.6099999999999999
1.62
1.63
1.6400000000000001
1.65
1.6600000000000001
1.67
1.6800000000000002
1.69
1.7
1.71
1.72
1.73
1.74
1.75
1.76
1.77
1.78
1.79
1.8
1.81
1.8199999999999998
1.83
1.8399999999999999
1.85
1.8599999999999999
1.87
1.88
1.8900000000000001
1.9
1.9100000000000001
1.92
1.9300000000000002
1.94
1.95
1.96
1.97
1.98
1.99
It is possible to set the printed precision in the following way:
N = 2  # number of decimal places (example value)
for i in range(100):
    k = 1 + i / 100
    print("%.*f" % (N, k))
where N is the number of decimal places.
Keep in mind that you usually don't need many decimal places, even though N can be made very large.
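To see where the extra digits come from, the following sketch prints the exact binary value stored for one of the affected numbers and compares two ways of computing it. It uses only the standard decimal module, and i = 14 is simply one of the affected values from the listing above.

```python
from decimal import Decimal

i = 14
stepwise = 1 + i / 100     # two roundings: the division, then the addition
single = (100 + i) / 100   # one rounding: a single division of exact integers

print(stepwise)            # 1.1400000000000001, as in the listing above
print(single)              # 1.14
print(Decimal(stepwise))   # exact value stored for stepwise
print(Decimal(single))     # exact value stored for single
print(round(stepwise, 2) == single)   # True
```

Neither value is exactly 1.14, because 1.14 has no finite binary representation. The two results are neighbouring doubles, and Python prints the shortest decimal string that reads back as the same double. Formatting or rounding only changes what is displayed or stored afterwards, so comparisons in later calculations are safer with a tolerance, for example math.isclose.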
I have a file long.dat as follows:
#x1 y1 sd1 x2 y2 sd2 x3 y3 sd3
2.50 9.04 0.03 2.51 16.08 0.04 2.50 26.96 0.07
2.25 9.06 0.05 1.84 16.01 0.16 1.91 26.94 0.21
1.11 9.12 0.19 1.06 15.90 0.14 1.30 26.41 0.10
0.71 9.97 0.18 0.86 16.47 0.33 0.92 28.59 0.92
0.60 11.36 0.24 0.77 17.31 0.18 0.73 33.55 1.40
0.56 12.44 0.55 0.72 18.25 0.25 0.65 37.82 2.16
0.50 14.23 0.37 0.71 18.73 0.49 0.57 44.75 2.69
0.43 16.93 1.20 0.63 20.55 0.64 0.51 52.11 1.01
0.38 19.18 1.12 0.57 22.27 0.94 0.47 58.01 2.17
0.32 24.83 2.26 0.52 25.04 0.53 0.42 65.92 2.62
0.30 28.87 1.39 0.46 29.75 2.41 0.38 71.60 1.81
0.25 34.23 2.07 0.41 37.92 1.49 0.34 75.81 0.68
0.21 39.52 0.53 0.37 43.33 1.81 0.32 77.12 0.68
0.16 44.10 1.81 0.32 47.22 0.57 0.28 79.87 2.03
0.13 49.73 1.19 0.28 49.36 0.99 0.22 85.93 1.32
0.13 49.73 1.19 0.22 53.94 0.98 0.19 89.10 2.14
0.13 49.73 1.19 0.18 57.28 1.56 0.16 96.48 1.28
0.13 49.73 1.19 0.14 63.66 1.90 0.14 100.09 1.46
0.13 49.73 1.19 0.12 67.92 0.64 0.12 103.90 0.48
0.13 49.73 1.19 0.12 67.92 0.64 0.12 103.90 0.48
I tried to fit my data with a second-order polynomial. I am having the following problems:
(1) My x1, y1, sd1 data columns are shorter than x2, y2, sd2, so I had to pad x1, y1, sd1 by repeating the last row (x1 = 0.13). Otherwise the shorter columns are misread, resulting in a wrong plot. Is there any way to avoid this other than padding with the same values?
(2) In my plot, the fit f8(x) extends beyond its last data point at about x = 7.5 to match f12(x) at about x = 8.25. If I set the x-range to [0:100], all the fits extend to x = 100. How can I control this?
Here is the code:
set key left
f8(x) = a8*x*x+b8*x+c8
fit f8(x) 'long.dat' u (1/$1):($2/800**3) via a8,b8,c8
plot f8(x), 'long.dat' u (1/$1):($2/800**3):($3/800**3) w errorbars not
f10(x) = a10*x*x+b10*x+c10
fit f10(x) 'long.dat' u (1/$4):($5/1000**3) via a10,b10,c10
replot f10(x), 'long.dat' u (1/$4):($5/1000**3):($6/1000**3) w errorbars not
f12(x) = a12*x*x+b12*x+c12
fit f12(x) 'long.dat' u (1/$7):($8/1200**3) via a12,b12,c12
replot f12(x), '' u (1/$7):($8/1200**3):($9/1200**3) w errorbars not
(3) I tried to use a logistic fit g(x) = a/(1+b*exp(-k*x)) on the x1, y1 data set, but it failed badly. The code is here:
set key left
g(x) = a/(1+b*exp(-k*x))
fit g(x) 'long.dat' u (1/$1):($2/800**3) via a,b,k
plot g(x), 'long.dat' u (1/$1):($2/800**3): ($3/800**3) w errorbars not
Any comment/suggestion would be highly appreciated! Many many thanks for going through this big post and any feedback in advance!
1) You can use the NaN keyword for the missing points: gnuplot will ignore them.
2) If what you want to plot is a function, it is by definition defined for every x, so it will extend over the whole x-range.
What you might want to do is store the fitted points in a file, something like:
set table "func.txt"
plot [0.5:7.5] f(x)
unset table
and then plot the file rather than the function. You might want to adjust the number of samples to tune the result: type "help samples".
Some more suggestions besides @bibi's answer:
How should gnuplot know that in a certain row the first number it encounters belongs to column 4? To make this unambiguous you can use, for example, a comma as the column delimiter:
0.16, 44.10, 1.81, 0.32, 47.22, 0.57, 0.28, 79.87, 2.03
0.13, 49.73, 1.19, 0.28, 49.36, 0.99, 0.22, 85.93, 1.32
, , , 0.22, 53.94, 0.98, 0.19, 89.10, 2.14
And tell gnuplot about it:
set datafile separator ','
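The effect of an explicit separator can be illustrated with a few lines of Python. This is only a sketch of the idea and not how gnuplot is implemented; parse_row is a hypothetical helper. Empty fields become NaN, so the remaining values stay in their original columns.

```python
import math

def parse_row(line, separator=","):
    """Split a line on the separator; empty fields become NaN so that
    the remaining values keep their column positions."""
    fields = [field.strip() for field in line.split(separator)]
    return [float(field) if field else math.nan for field in fields]

row = parse_row(" , , , 0.22, 53.94, 0.98, 0.19, 89.10, 2.14")
print(len(row), row[3])   # 9 columns; the fourth one is 0.22
```

With whitespace as the separator, the same line would yield only six fields, and 0.22 would be read as column 1.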
All functions are drawn with the same xrange. You can use different limits for a function by returning 1/0 when outside the desired range:
f(x) = a*x**2 + b*x + c
f_p(x, min, max) = (x >= min && x <= max) ? f(x) : 1/0
plot f_p(x, 0.5, 7.5)
You can use stats to extract the limits:
stats 'long.dat' using (1/$1) name 'A_' nooutput
plot f_p(x, A_min, A_max)
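The same clipping idea, sketched in Python for comparison: a wrapper returns NaN outside the data range, which plotting libraries typically skip, much like gnuplot skips 1/0. The helper make_clipped is hypothetical, and the coefficients a, b, c are placeholders rather than fit results. The x1 values are the distinct entries of the first column of long.dat.

```python
import math

def make_clipped(f, x_min, x_max):
    """Return a version of f that yields NaN outside [x_min, x_max]."""
    def clipped(x):
        return f(x) if x_min <= x <= x_max else math.nan
    return clipped

# Distinct x1 values from long.dat; the fit is done against 1/x1
x1 = [2.50, 2.25, 1.11, 0.71, 0.60, 0.56, 0.50, 0.43,
      0.38, 0.32, 0.30, 0.25, 0.21, 0.16, 0.13]
inverse = [1 / x for x in x1]
lower, upper = min(inverse), max(inverse)   # counterpart of A_min, A_max

a, b, c = 1.0, 0.0, 0.0   # placeholder coefficients, not fit results

def f(x):
    return a * x**2 + b * x + c

f_p = make_clipped(f, lower, upper)
print(f_p(2.0), f_p(10.0))   # 4.0 nan
```

Here the upper limit is 1/0.13, about 7.69, which matches the point near x = 7.5 where the first data set ends.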
For fitting, gnuplot uses 1 as the starting value for any parameter that hasn't been assigned an explicit value. As you can imagine, with a=1 you are far from your values of about 1e-7. For nonlinear fitting there isn't one unique solution that is reached from all starting values. So it's all about finding good starting values and a proper model function.
With the starting values a=1e-7; b = 50; k = 1 you get a solution, but the fit isn't very good.