I have the following dataset, a small sample of a much larger dataset.
PM2.5 is the dependent variable, while the other seven columns
represent the independent variables: AOD, BLH, RH, WS, Prec, Temp, and SLP.
I am looking to use support vector machine (SVM) regression
to find the best-fit multiple-variable regression equation in Python.
I would appreciate your help a lot.
PM2.5 AOD BLH RH WS Prec Temp SLP
43.52 0.42 0.39 0.74 1.2 0.4 4.95 1.03
18.4 0.31 0.41 0.71 2.9 0.0 13.4 1.02
53.36 0.30 0.91 0.75 3.21 2.8 17.2 1.01
18.83 0.36 0.29 0.48 1.7 0.6 20.5 1.02
21.2 0.39 0.36 0.52 0.93 0.1 22.0 1.02
12.17 0.15 0.69 0.52 0.55 0.1 18.67 1.01
8.75 0.11 0.42 0.59 4.98 0.1 18.67 1.01
7.7 0.31 0.048 0.52 0.95 0.0 22.44 1.02
6.58 0.05 0.48 0.57 2.75 0.0 32.38 1.02
Data as an xls file is here
Thanks a lot in advance
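A minimal sketch of SVM regression with scikit-learn's SVR (the file name sample.csv and the hyperparameter values are assumptions; point it at your own export of the xls file). Note that an RBF-kernel SVR does not yield a single closed-form equation; a linear kernel at least exposes coefficients:

import pandas as pd
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# "sample.csv" is a placeholder; use pd.read_excel(...) for the xls file.
df = pd.read_csv("sample.csv", sep=r"\s+")

X = df[["AOD", "BLH", "RH", "WS", "Prec", "Temp", "SLP"]]
y = df["PM2.5"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVR is scale-sensitive, so standardize the features inside a pipeline.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))

# A linear kernel gives an explicit equation (in scaled-feature space):
lin = make_pipeline(StandardScaler(), SVR(kernel="linear"))
lin.fit(X_train, y_train)
print("coefficients:", lin.named_steps["svr"].coef_,
      "intercept:", lin.named_steps["svr"].intercept_)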
I'm trying to train a DNN model on a dataset whose features differ hugely in standard deviation. I tested the following scalers, but none of them worked: MinMaxScaler, StandardScaler, RobustScaler, PowerTransformer. By "didn't work" I mean that the resulting models achieved high predictive performance on the validation sets but had little predictivity on external test sets. The dataset has more than 10,000 rows and 200 columns. Here is part of the summary statistics of the dataset:
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11
mean 11.31 -1.04 11.31 0.21 0.55 359.01 337.64 358.58 131.70 0.01 0.09
std 2.72 1.42 2.72 0.24 0.20 139.86 131.40 139.67 52.25 0.14 0.47
min 2.00 -10.98 2.00 0.00 0.02 59.11 50.04 59.07 26.00 0.00 0.00
5% 5.24 -4.07 5.24 0.01 0.19 190.25 178.15 190.10 70.00 0.00 0.00
25% 10.79 -1.35 10.79 0.05 0.41 269.73 254.14 269.16 98.00 0.00 0.00
50% 12.15 -0.64 12.15 0.13 0.58 335.47 316.23 335.15 122.00 0.00 0.00
75% 12.99 -0.21 12.99 0.27 0.72 419.42 394.30 419.01 154.00 0.00 0.00
95% 14.17 0.64 14.17 0.73 0.85 594.71 560.37 594.10 220.00 0.00 1.00
max 19.28 2.00 19.28 5.69 0.95 2924.47 2642.23 2922.13 1168.00 6.00 16.00
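Whichever scaler is used, it should be fit on the training split only and then reused unchanged on validation and external test data; fitting on pooled data leaks distributional information and inflates validation scores. A minimal sketch (the arrays are stand-ins for your real splits):

import numpy as np
from sklearn.preprocessing import PowerTransformer

# Stand-in data with long-tailed features; replace with your own splits.
rng = np.random.default_rng(0)
X_train = rng.lognormal(size=(8000, 200))
X_test = rng.lognormal(size=(2000, 200))

scaler = PowerTransformer()                  # or StandardScaler(), RobustScaler(), ...
X_train_s = scaler.fit_transform(X_train)    # fit on training data only
X_test_s = scaler.transform(X_test)          # reuse the fitted parameters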
I am trying to see whether MLflow is the right place to store my metrics for model tracking. According to the docs, log_metric takes a single key and value, and log_metrics takes a dict of key-value pairs. I am wondering how to log something like the classification report below into MLflow so it can be visualized meaningfully.
precision recall f1-score support
class1 0.89 0.98 0.93 174
class2 0.96 0.90 0.93 30
class3 0.96 0.90 0.93 30
class4 1.00 1.00 1.00 7
class5 0.93 1.00 0.96 13
class6 1.00 0.73 0.85 15
class7 0.95 0.97 0.96 39
class8 0.80 0.67 0.73 6
class9 0.97 0.86 0.91 37
class10 0.95 0.81 0.88 26
class11 0.50 1.00 0.67 5
class12 0.93 0.89 0.91 28
class13 0.73 0.84 0.78 19
class14 1.00 1.00 1.00 6
class15 0.45 0.83 0.59 6
class16 0.97 0.98 0.97 245
class17 0.93 0.86 0.89 206
accuracy 0.92 892
macro avg 0.88 0.90 0.88 892
weighted avg 0.93 0.92 0.92 892
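One approach, assuming the report comes from scikit-learn: generate it as a dict with output_dict=True and flatten it into namespaced metric keys, so MLflow can chart each class's metrics separately. A sketch with dummy labels (substitute your real y_true and y_pred):

import mlflow
from sklearn.metrics import classification_report

# Dummy labels for illustration only.
y_true = ["class1", "class1", "class2", "class2", "class3"]
y_pred = ["class1", "class2", "class2", "class2", "class3"]

report = classification_report(y_true, y_pred, output_dict=True)

with mlflow.start_run():
    for name, metrics in report.items():
        if isinstance(metrics, dict):      # per-class rows and the avg rows
            for metric_name, value in metrics.items():
                mlflow.log_metric(f"{name}_{metric_name}", value)
        else:                              # "accuracy" is a bare float
            mlflow.log_metric(name, metrics)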
I am trying to create a column that contains, for each row, the minimum across a few columns. For example:
A0 A1 A2 B0 B1 B2 C0 C1
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72
Here I want to create a column containing, for each row, the minimum of columns B0, B1, and B2.
The output would look like this:
A0 A1 A2 B0 B1 B2 C0 C1 Minimum
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75 0.42
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39 0.39
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73 0.00
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03 0.51
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61 0.51
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53 0.17
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72 0.01
Here is part of the code, but it is not doing what I want it to do:
for i in range(0, 2):
    df['Minimum'] = df.loc[0, 'B' + str(i)].min()
This is a one-liner; you just need to use the axis argument of min to tell it to work across the columns rather than down them:
df['Minimum'] = df.loc[:, ['B0', 'B1', 'B2']].min(axis=1)
If you need to use this solution for different numbers of columns, you can use a for loop or list comprehension to construct the list of columns:
n_columns = 3
cols_to_use = ['B' + str(i) for i in range(n_columns)]
df['Minimum'] = df.loc[:, cols_to_use].min(axis=1)
For my tasks, a universal and flexible approach is the following:
df['Minimum'] = df[['B0', 'B1', 'B2']].apply(lambda x: min(x.iloc[0], x.iloc[1], x.iloc[2]), axis=1)
The target column 'Minimum' is assigned the result of a lambda function applied to the selected columns ['B0', 'B1', 'B2']. Inside the function, each row arrives as a Series, and its elements are accessed by position via .iloc (plain integer indexing on a label-indexed Series is deprecated in recent pandas). Be sure to specify axis=1, which makes the calculation run row by row.
This is very convenient when you need to make complex calculations.
However, I suspect that such a solution may be inferior in speed.
As for selecting the columns, in addition to the 'for' method, I can suggest using a filter like this:
cols_to_use = list(filter(lambda f: 'B' in f, df.columns))
Literally, a filter is applied to the list of DataFrame columns through a lambda function that checks for the occurrence of the letter 'B' in each column name.
After that, the first example can be written as follows:
cols_to_use = list(filter(lambda f: 'B' in f, df.columns))
df['Minimum'] = df[cols_to_use].apply(lambda x: min(x), axis=1)
Although, after pre-selecting the columns, this would be preferable:
df['Minimum'] = df[cols_to_use].min(axis=1)
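To put rough numbers on the speed claim above, a quick micro-benchmark sketch (the DataFrame is synthetic; timings will vary by machine):

import numpy as np
import pandas as pd
from timeit import timeit

df = pd.DataFrame(np.random.rand(100_000, 3), columns=['B0', 'B1', 'B2'])

t_apply = timeit(lambda: df.apply(lambda x: min(x), axis=1), number=3)
t_min = timeit(lambda: df.min(axis=1), number=3)
print(f"apply: {t_apply:.3f}s   vectorized min: {t_min:.3f}s")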
I am studying a container loading algorithm. When I have a loading plan, I use gnuplot to plot it in 3D, as in the attachment. As all goods are cuboid, I want to plot the border lines of the cubes in alternating colors: one yellow, the next brown, then yellow again, and so on. Of course, the colors could be anything; my purpose is just to see the loading plan more clearly. Currently, I can only plot everything in the same color.
Ideally, the container's own border line would get a distinct color of its own as well.
Part of my test data is at /2/:
/2/
++++++container 40 feet data###########
0 0 0
12.0 0 0
12.0 2.3 0
0 2.3 0
0 0 0
0 0 0
0 0 2.5
12.0 0 2.5
12.0 2.3 2.5
0 2.3 2.5
0 0 2.5
### container 40 feet data#########
##########first cubic #############
0.0 0.0 0.0
0.64 0.0 0.0
0.64 0.66 0.0
0.0 0.66 0.0
0.0 0.0 0.0
0.0 0.0 1.93
0.64 0.0 1.93
0.64 0.66 1.93
0.0 0.66 1.93
0.0 0.0 1.93
0.64 0.0 0.0
0.64 0.0 1.93
0.64 0.66 0.0
0.64 0.66 1.93
0.0 0.66 0.0
0.0 0.66 1.93
################# Second cubic#################
0.64 0.0 0.0
1.27 0.0 0.0
1.27 0.66 0.0
0.64 0.66 0.0
0.64 0.0 0.0
0.64 0.0 1.93
1.27 0.0 1.93
1.27 0.66 1.93
0.64 0.66 1.93
0.64 0.0 1.93
1.27 0.0 0.0
1.27 0.0 1.93
1.27 0.66 0.0
1.27 0.66 1.93
0.64 0.66 0.0
0.64 0.66 1.93
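One possible approach, assuming the data file separates the container and each cube into their own blocks with two blank lines between them, so gnuplot sees them as index blocks (the file name boxes.dat and the block count N are assumptions): iterate over the blocks and derive the line color from the block number.

# Sketch: "boxes.dat" holds one cuboid per index block (blocks separated
# by two blank lines); block 0 is assumed to be the container itself.
N = 2   # number of cargo cubes in the file (assumption)
splot "boxes.dat" index 0 with lines lc rgb "black" title "container", \
      for [i=1:N] "boxes.dat" index i with lines \
          lc rgb (i % 2 ? "yellow" : "brown") notitle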
I have a data file:
0.4 -0.97
0.41 -0.96
0.42 -0.95
0.43 -0.93
0.44 -0.92
0.45 -0.91
0.46 -0.90
0.47 -0.88
0.48 -0.87
0.49 -0.86
0.5 -0.84
0.51 -0.83
0.52 -0.82
0.53 -0.81
0.54 -0.80
0.55 -0.78
0.56 -0.77
0.57 -0.76
0.58 -0.74
0.59 -0.73
0.6 -0.72
0.61 -0.71
0.62 -0.70
0.63 -0.69
0.64 -0.67
0.65 -0.66
0.66 -0.65
0.67 -0.64
0.68 -0.62
0.69 -0.61
0.7 -0.60
0.71 -0.59
0.72 -0.58
0.73 -0.56
0.74 -0.55
0.75 -0.54
0.76 -0.53
0.77 -0.52
0.78 -0.51
0.79 -0.50
0.8 -0.49
0.81 -0.47
0.82 -0.47
0.83 -0.46
0.84 -0.44
0.85 -0.43
0.86 -0.42
0.87 -0.41
0.88 -0.40
0.89 -0.39
0.9 -0.38
0.91 -0.49
0.92 -0.48
0.93 -0.47
0.94 -0.46
0.95 -0.44
0.96 -0.43
0.97 -0.42
0.98 -0.41
0.99 -0.40
1.0 -0.39
1.01 -0.38
1.02 -0.37
1.03 -0.36
1.04 -0.35
1.05 -0.34
1.06 -0.33
1.07 -0.32
1.08 -0.31
1.09 -0.30
1.1 -0.30
1.11 -0.29
1.12 -0.28
1.13 -0.27
1.14 -0.26
1.15 -0.25
1.16 -0.24
1.17 -0.24
1.18 -0.23
1.19 -0.22
1.2 -0.21
1.21 -0.20
1.22 -0.20
1.23 -0.19
1.24 -0.18
1.25 -0.17
1.26 -0.17
1.27 -0.16
1.28 -0.15
1.29 -0.14
1.3 -0.13
1.31 -0.12
1.32 -0.11
1.33 -0.11
1.34 -0.10
1.35 -0.09
1.36 -0.08
1.37 -0.08
1.38 -0.07
1.39 -0.06
1.4 -0.05
1.41 -0.04
1.42 -0.03
1.43 -0.03
1.44 -0.02
1.45 -0.01
1.46 -0.01
1.47 -0.00
1.48 0.00
1.49 0.01
1.5 0.02
1.51 0.03
1.52 0.04
1.53 0.04
1.54 0.05
1.55 0.06
1.56 0.06
1.57 0.07
1.58 0.08
1.59 0.08
1.6 0.09
1.61 0.09
1.62 0.10
1.63 0.10
1.64 0.10
1.65 0.11
1.66 0.11
1.67 0.12
1.68 0.12
1.69 0.13
1.7 0.14
1.71 0.14
1.72 0.14
1.73 0.15
1.74 0.15
1.75 0.16
1.76 0.16
1.77 0.17
1.78 0.17
1.79 0.18
1.8 0.19
1.81 0.20
1.82 0.20
1.83 0.21
1.84 0.21
1.85 0.22
1.86 0.22
1.87 0.23
1.88 0.24
1.89 0.24
1.9 0.25
1.91 0.25
1.92 0.26
1.93 0.26
1.94 0.26
1.95 0.27
1.96 0.28
1.97 0.28
1.98 0.28
1.99 0.29
2.0 0.29
2.01 0.29
2.02 0.29
2.03 0.30
2.04 0.30
2.05 0.30
2.06 0.31
2.07 0.32
2.08 0.32
2.09 0.33
2.1 0.33
2.11 0.33
2.12 0.34
2.13 0.34
2.14 0.34
2.15 0.35
2.16 0.35
2.17 0.36
2.18 0.36
2.19 0.36
2.2 0.37
2.21 0.37
2.22 0.37
2.23 0.38
2.24 0.38
2.25 0.38
2.26 0.38
2.27 0.39
2.28 0.39
2.29 0.39
2.3 0.40
2.31 0.40
2.32 0.40
2.33 0.40
2.34 0.41
2.35 0.41
2.36 0.42
2.37 0.42
2.38 0.43
2.39 0.43
2.4 0.43
2.41 0.43
2.42 0.44
2.43 0.44
2.44 0.44
2.45 0.44
2.46 0.45
2.47 0.45
2.48 0.45
2.49 0.45
2.5 0.46
2.51 0.46
2.52 0.46
2.53 0.47
2.54 0.47
2.55 0.47
2.56 0.48
2.57 0.48
2.58 0.49
2.59 0.36
2.6 0.36
2.61 0.36
2.62 0.36
2.63 0.37
2.64 0.37
2.65 0.37
2.66 0.37
2.67 0.38
2.68 0.38
2.69 0.38
2.7 0.38
2.71 0.38
2.72 0.38
2.73 0.38
2.74 0.38
2.75 0.38
2.76 0.38
2.77 0.38
2.78 0.38
2.79 0.39
2.8 0.39
2.81 0.39
2.82 0.39
2.83 0.39
2.84 0.39
2.85 0.28
2.86 0.28
2.87 0.28
2.88 0.28
2.89 0.28
2.9 0.28
2.91 0.28
2.92 0.28
2.93 0.29
2.94 0.29
2.95 0.29
2.96 0.29
2.97 0.29
2.98 0.29
2.99 0.29
3.0 0.19
3.01 0.19
3.02 0.19
3.03 0.19
3.04 0.19
3.05 0.19
3.06 0.19
3.07 0.19
3.08 0.20
3.09 0.20
3.1 0.20
3.11 0.20
3.12 0.20
3.13 0.20
3.14 0.20
3.15 0.20
3.16 0.20
3.17 0.20
3.18 0.21
3.19 0.21
3.2 0.21
3.21 0.21
3.22 0.21
3.23 0.21
3.24 0.21
3.25 0.21
3.26 0.21
3.27 0.21
3.28 0.21
3.29 0.21
3.3 0.21
3.31 0.21
3.32 0.21
3.33 0.21
3.34 0.21
3.35 0.21
3.36 0.21
3.37 0.22
3.38 0.22
3.39 0.22
3.4 0.22
3.41 0.22
3.42 0.22
3.43 0.22
3.44 0.22
3.45 0.22
3.46 0.22
3.47 0.22
3.48 0.22
3.49 0.22
3.5 0.22
3.51 0.23
3.52 0.23
3.53 0.23
3.54 0.23
3.55 0.23
3.56 0.13
3.57 0.13
3.58 0.13
3.59 0.13
3.6 0.13
3.61 0.13
3.62 0.13
3.63 0.13
3.64 0.13
3.65 0.13
3.66 0.13
3.67 0.13
3.68 0.13
3.69 0.13
3.7 0.13
3.71 0.13
3.72 0.14
3.73 0.14
3.74 0.14
3.75 0.14
3.76 0.05
3.77 0.05
3.78 0.05
3.79 0.05
3.8 0.05
3.81 -0.04
3.82 -0.04
3.83 -0.04
3.84 -0.04
3.85 -0.04
3.86 -0.04
3.87 -0.04
3.88 -0.04
3.89 -0.04
3.9 -0.04
3.91 -0.04
3.92 -0.04
3.93 -0.04
3.94 -0.04
3.95 -0.12
3.96 -0.12
3.97 -0.12
3.98 -0.12
3.99 -0.12
4.0 -0.12
4.01 -0.12
4.02 -0.12
4.03 -0.12
4.04 -0.12
4.05 -0.19
4.06 -0.19
4.07 -0.19
4.08 -0.19
4.09 -0.19
4.1 -0.19
4.11 -0.19
4.12 -0.41
4.13 -0.41
4.14 -0.41
4.15 -0.47
4.16 -0.47
4.17 -0.47
4.18 -0.47
4.19 -0.47
4.2 -0.47
4.21 -0.47
4.22 -0.54
4.23 -0.54
4.24 -0.60
4.25 -0.65
4.26 -0.65
4.27 -0.65
4.28 -0.65
4.29 -0.65
4.3 -0.65
4.31 -0.65
4.32 -0.65
4.33 -0.65
4.34 -0.65
4.35 -0.65
4.36 -0.65
4.37 -0.65
4.38 -0.71
4.39 -0.71
4.4 -0.71
4.41 -0.71
4.42 -0.71
4.43 -0.71
4.44 -0.71
4.45 -0.71
4.46 -0.71
4.47 -0.71
4.48 -0.71
4.49 -0.71
4.5 -0.71
4.51 -0.71
4.52 -0.71
4.53 -0.76
4.54 -0.76
4.55 -0.82
4.56 -0.82
4.57 -0.87
4.58 -0.87
4.59 -0.87
4.6 -0.87
4.61 -0.92
4.62 -0.97
4.63 -1.06
4.64 -1.06
4.65 -1.06
4.66 -1.06
4.67 -1.06
4.68 -1.06
4.69 -1.06
4.7 -1.06
4.71 -1.06
4.72 -1.06
4.73 -1.06
4.74 -1.11
4.75 -1.11
4.76 -1.11
4.77 -1.11
4.78 -1.11
4.79 -1.11
4.8 -1.11
4.81 -1.11
4.82 -1.11
4.83 -1.11
4.84 -1.11
4.85 -1.15
4.86 -1.15
4.87 -1.15
4.88 -1.15
I wish to create a nicely smoothed curve, so I use
plot "for_gnuplot" lw 3 w l sm b title ""
I get the following image:
This is very nice, but I wish to mark the maximum in some way. I know that with sm b (Bezier smoothing) the maximum is not the real maximum of the raw data, but I don't know how to find and mark this new maximum value.
Thanks
You can write the (x,y) data of the smoothed plot to a temporary file, do some statistics on this file, and plot the results:
# Generate the data for the smooth plot
set samples 1000
set table "temp.dat"
plot "for_gnuplot" lw 3 w l sm b title "1"
unset table
# Get maximum values and indices of maximum values:
# A_max_y, A_index_max_y, B_max_y, B_index_max_y
stats "for_gnuplot" prefix "A"
stats "temp.dat" using 1:2 prefix "B"
# Calculate positions from indices.
# We need the x-value (first column) at B_index_max_y. We know that the first
# column of "temp.dat" consists of equidistant x-values. So we just fit a
# linear function to map from index to position. (Could be done analytically.)
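# (Analytically, since "temp.dat" is sampled equidistantly in x, one would
#  expect pos_from_index(i) = B_min_x + i*(B_max_x - B_min_x)/(B_records - 1);
#  an assumption to verify against the fitted a and b.)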
pos_from_index(x) = a*x + b
fit pos_from_index(x) "for_gnuplot" using 0:1 via a, b
A_xvalue_max_y = pos_from_index(A_index_max_y)
fit pos_from_index(x) "temp.dat" using 0:1 via a, b
B_xvalue_max_y = pos_from_index(B_index_max_y)
# Make some arrows to indicate maximal values
set arrow 1 from A_xvalue_max_y, graph 0.99 to A_xvalue_max_y, A_max_y fill lw 2
set arrow 2 from B_xvalue_max_y, graph 0.8 to B_xvalue_max_y, B_max_y fill lw 2
set label 1 at A_xvalue_max_y, graph 0.99 "max raw" offset 0.2, -0.3
set label 2 at B_xvalue_max_y, graph 0.8 "max smooth" center offset 0, -0.4
# Finally plot the graphs
set terminal png
set output "graph.png"
plot "for_gnuplot" lw 2 w l title "raw" ,\
"for_gnuplot" lw 2 w l sm b title "smooth"
This produces the following output:
PS: I would be interested if there is a more direct way to access a value from a file at a specific index.
Here is a link: http://www.phyast.pitt.edu/~zov1/gnuplot/html/statistics.html
Scroll to "Determining the position of the minimum and maximum".
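As a side note on the PS above: in newer gnuplot (5.x), stats on two-column data also defines the x position of the extrema directly (see `help stats`), which would remove the need for the index-to-position fit entirely; a sketch, with the variable name to be verified against your version:

stats "temp.dat" using 1:2 prefix "B"
set arrow 2 from B_pos_max_y, graph 0.8 to B_pos_max_y, B_max_y filled lw 2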