My Data:
File 1:
2015-08-01 07:00 0.23 0.52 0.00 0.52 9 14.6 14.6 14.6 67 8.5 0.0 --- 0.00 0.0 --- 14.6 14.1 14.1 16.3 1016.2 0.00 0.0 156 0.22 156 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.05 23 1 100.0 1 1.8797836153192153 660.7143449269239
File 2:
2015-08-01 07:00 0.23 0.52 0.00 0.52 9 14.6 14.6 14.6 67 8.5 0.0 --- 0.00 0.0 --- 14.6 14.1 14.1 16.3 1016.2 0.00 0.0 156 0.22 156 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.05 23 1 100.0 1 1.8797836153192153 660.7143449269239
..... and so on.
So the .csv files cover multiple days, and from those days I created a scatter plot using 3:43:0
I used the 0 as a dummy so I could use variable line colors (if I hadn't done that, the colors would have repeated themselves after line 9)
The scatter plot looks great, but now I want to fit a curve to the plot. There are 2 similar questions covering fitting data from multiple files, but when trying the cat or awk command I always end up with an error telling me: cannot create pipe for data
So what I tried was:
fit f(x) '< cat file1.csv file2.csv file3.csv file4.csv file5.csv' u 3:43:0 a,b
Am I missing something here?
Could this be an OS problem? I run Windows 7.
Both cat and awk are Unix commands. The Windows equivalent of cat is type. For instance, the following should work:
fit f(x) '< type file1.csv file2.csv' u 3:43:0 via a,b
If, for some reason, you need to use a tool like gawk (the GNU equivalent of awk), grep, or sed on Windows, take a look at GnuWin32.
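If the pipe keeps failing, another route is to merge the files beforehand and point fit at the merged file, so no shell command is involved at all. A minimal sketch in Python, assuming plain-text data files; the helper name concatenate_files and the output name combined.csv are hypothetical and not part of the original question:

```python
from pathlib import Path


def concatenate_files(inputs, output):
    """Write the contents of every input file to one output file, in order."""
    with open(output, "w", encoding="utf-8") as out:
        for name in inputs:
            text = Path(name).read_text(encoding="utf-8")
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return output


# Example call (file names taken from the question, output name assumed):
# concatenate_files(["file1.csv", "file2.csv", "file3.csv"], "combined.csv")
```

The merged file can then be read directly, e.g. fit f(x) 'combined.csv' u 3:43:0 via a,b, which sidesteps the pipe entirely.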
I was wondering whether there is a way to separate the table below into multiple sub-dfs using the periodicity in the first column, e.g. each cycle running from ~5 down to ~0.
before:
a b c
5.10 1.00 0.00
4.20 2.00 0.00
3.01 3.00 0.00
2.10 4.00 0.00
1.20 5.00 0.00
0.52 6.00 0.00
0.02 6.00 1.00
5.30 7.00 0.40
4.20 8.00 0.00
3.10 9.00 0.00
2.40 10.00 0.00
1.30 11.00 0.00
0.20 12.00 0.00
5.98 13.00 0.00
4.23 14.00 0.30
3.33 15.00 0.00
2.11 16.00 0.00
1.30 17.00 0.00
0.30 18.00 0.00
5.50 13.00 0.00
output after separating into multiple dfs :
"sub_df1"
5.10 1.00 0.00
4.20 2.00 0.00
3.01 3.00 0.00
2.10 4.00 0.00
1.20 5.00 0.00
0.52 6.00 0.00
0.02 6.00 0.00
"sub_df2"
5.30 7.00 0.00
4.20 8.00 0.00
3.10 9.00 0.00
2.40 10.00 0.00
1.30 11.00 0.00
0.20 12.00 0.00
"sub_df3"
5.98 13.00 0.00
4.23 14.00 0.00
3.33 15.00 0.00
2.11 16.00 0.00
1.30 17.00 0.00
0.30 18.00 0.00
"sub_df4"
5.50 13.00 0.00
The periodicity is variable in length, so I cannot assume a fixed length to separate on. Therefore, I thought I would first add another column 'id' like this:
df['id']=(df['a'].shift(1)>df['a']).astype(int)
This could at least show me from where (1st "0") to where (2nd "0") to append the values. However, I don't quite know how to continue from here.
a b c id
0 4.20 2.0 0.0 0
1 3.01 3.0 0.0 1
2 2.10 4.0 0.0 1
3 1.20 5.0 0.0 1
4 0.52 6.0 0.0 1
5 0.02 6.0 1.0 1
6 5.30 7.0 0.4 0
7 4.20 8.0 0.0 1
8 3.10 9.0 0.0 1
9 2.40 10.0 0.0 1
10 1.30 11.0 0.0 1
11 0.20 12.0 0.0 1
12 5.98 13.0 0.0 0
13 4.23 14.0 0.3 1
14 3.33 15.0 0.0 1
15 2.11 16.0 0.0 1
16 1.30 17.0 0.0 1
17 0.30 18.0 0.0 1
18 5.50 13.0 0.0 0
You can create a series s to identify the different groups. From there, you can create multiple dataframes and add them to a dictionary of dataframes df_dict. The print statement shows how to access them:
# A new group starts wherever 'a' is larger than the previous value
s = (df['a'] > df['a'].shift()).cumsum() + 1
df_dict = {}
for frame, data in df.groupby(s):
    df_dict[f'df{frame}'] = data
print(df_dict['df1'], '\n\n',
      df_dict['df2'], '\n\n',
      df_dict['df3'], '\n\n',
      df_dict['df4'])
a b c
0 5.10 1.0 0.0
1 4.20 2.0 0.0
2 3.01 3.0 0.0
3 2.10 4.0 0.0
4 1.20 5.0 0.0
5 0.52 6.0 0.0
6 0.02 6.0 1.0
a b c
7 5.3 7.0 0.4
8 4.2 8.0 0.0
9 3.1 9.0 0.0
10 2.4 10.0 0.0
11 1.3 11.0 0.0
12 0.2 12.0 0.0
a b c
13 5.98 13.0 0.0
14 4.23 14.0 0.3
15 3.33 15.0 0.0
16 2.11 16.0 0.0
17 1.30 17.0 0.0
18 0.30 18.0 0.0
a b c
19 5.5 13.0 0.0
Try this:
listofdfs = [y for x,y in df.groupby(df['a'].diff().gt(0).cumsum())]
or
dict(list(df.groupby(df['a'].diff().gt(0).cumsum())))
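To see why diff().gt(0).cumsum() works as a grouping key, here is a small self-contained sketch. The values are a shortened sample of the question's first two columns; the names group_id, sub_dfs and sizes are only for illustration:

```python
import pandas as pd

# Shortened sample of the question's data: 'a' falls within a cycle
# and jumps back up when the next cycle starts.
df = pd.DataFrame({
    "a": [5.10, 4.20, 0.02, 5.30, 0.20, 5.98, 0.30, 5.50],
    "b": [1.0, 2.0, 6.0, 7.0, 12.0, 13.0, 18.0, 13.0],
})

# diff() > 0 is True exactly where 'a' jumps back up (a new cycle starts);
# cumsum() turns those jumps into a running group number 0, 1, 2, ...
group_id = df["a"].diff().gt(0).cumsum()

# One sub-DataFrame per cycle, in order of appearance.
sub_dfs = [group for _, group in df.groupby(group_id)]
sizes = [len(group) for group in sub_dfs]
print(sizes)
```

Because the key is derived from the jumps themselves, cycles of different lengths are handled without assuming a fixed period.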
I have the following dataset, which is a small part of a bigger dataset.
PM2.5 is the dependent variable, while the other seven columns
are the independent variables: AOD, BLH, RH, WS, Prec, Temp and SLP.
I would like to use support vector machine (SVM) regression in Python
to find the best-fit multiple-variable regression equation.
I would appreciate your help a lot.
PM2.5 AOD BLH RH WS Prec Temp SLP
43.52 0.42 0.39 0.74 1.2 0.4 4.95 1.03
18.4 0.31 0.41 0.71 2.9 0.0 13.4 1.02
53.36 0.30 0.91 0.75 3.21 2.8 17.2 1.01
18.83 0.36 0.29 0.48 1.7 0.6 20.5 1.02
21.2 0.39 0.36 0.52 0.93 0.1 22.0 1.02
12.17 0.15 0.69 0.52 0.55 0.1 18.67 1.01
8.75 0.11 0.42 0.59 4.98 0.1 18.67 1.01
7.7 0.31 0.048 0.52 0.95 0.0 22.44 1.02
6.58 0.05 0.48 0.57 2.75 0.0 32.38 1.02
Data as an xls file is here
Thanks a lot in advance
I am studying a container loading algorithm. When I have a loading plan, I use gnuplot to plot it in 3D, as in the attachment. As all goods are box-shaped, I want to draw the border lines of one box in yellow, the next in brown, then yellow again, then brown, and so on. Of course, the colors could be anything. My goal is to see the loading plan more clearly. Currently, I can only plot everything in the same color.
Ideally, the border lines of the container itself would also have their own color.
Part of my test data is at /2/
/2/
++++++container 40 feet data###########
0 0 0
12.0 0 0
12.0 2.3 0
0 2.3 0
0 0 0
0 0 0
0 0 2.5
12.0 0 2.5
12.0 2.3 2.5
0 2.3 2.5
### container 40 feet data#########
##########first cubic #############
0 0 2.5
0.0 0.0 0.0
0.64 0.0 0.0
0.64 0.66 0.0
0.0 0.66 0.0
0.0 0.0 0.0
0.0 0.0 1.93
0.64 0.0 1.93
0.64 0.66 1.93
0.0 0.66 1.93
0.0 0.0 1.93
0.64 0.0 0.0
0.64 0.0 1.93
0.64 0.66 0.0
0.64 0.66 1.93
0.0 0.66 0.0
0.0 0.66 1.93
################# Second cubic#################
0.64 0.0 0.0
1.27 0.0 0.0
1.27 0.66 0.0
0.64 0.66 0.0
0.64 0.0 0.0
0.64 0.0 1.93
1.27 0.0 1.93
1.27 0.66 1.93
0.64 0.66 1.93
0.64 0.0 1.93
1.27 0.0 0.0
1.27 0.0 1.93
1.27 0.66 0.0
1.27 0.66 1.93
0.64 0.66 0.0
0.64 0.66 1.93
My Data looks like this:
2015-08-01 07:00 0.23 0.52 0.00 0.52 9 14.6 14.6 14.6 67 8.5 0.0 --- 0.00 0.0 --- 14.6 14.1 14.1 16.3 1016.2 0.00 0.0 156 0.22 156 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.05 23 1 100.0 1 1.8797836153192153 660.7143449269239
2015-08-01 07:01 0.25 0.53 0.00 0.53 0 14.6 14.6 14.6 67 8.5 0.0 --- 0.00 0.0 --- 14.6 14.1 14.1 16.3 1016.2 0.00 0.0 153 0.22 153 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.00 23 1 100.0 1 1.8894284951616422 657.3416264126714 105 73 121 163
2015-08-01 07:02 0.25 0.52 0.00 0.52 0 14.7 14.7 14.6 67 8.6 0.0 --- 0.00 0.0 --- 14.7 14.2 14.2 16.1 1016.2 0.00 0.0 139 0.20 139 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.00 24 1 100.0 1 1.8976360559992214 654.4985251906015
2015-08-01 07:03 0.26 0.53 0.00 0.53 0 14.7 14.7 14.7 67 8.6 0.0 --- 0.00 0.0 --- 14.7 14.2 14.2 16.1 1016.3 0.00 0.0 139 0.20 144 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.00 23 1 100.0 1 1.9047561611790007 652.0519661851259
2015-08-01 07:04 0.25 0.53 0.00 0.53 0 14.7 14.7 14.7 67 8.7 0.0 --- 0.00 0.0 --- 14.7 14.2 14.2 16.2 1016.3 0.00 0.0 141 0.20 141 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.00 24 1 100.0 1 1.903537153899393 652.4695341279602
2015-08-01 07:05 0.25 0.52 0.00 0.52 0 14.8 14.8 14.7 67 8.7 0.0 --- 0.00 0.0 --- 14.8 14.3 14.3 16.3 1016.3 0.00 0.0 148 0.21 148 0.0 0.00 0.0 0.002 0.000 23.9 39 9.1 23.4 0.00 23 1 100.0 1 1.897596925383499 654.5120216976508
........
........
I've got multiple files that look like this, with data from 2015-08-01, 2015-06-05 and so on.
I want to plot the 43rd column in relation to the 3rd and 25th columns, in some kind of heat map style, from all those files in ONE plot. So these are the columns I want to pick out of each file:
0.23 156 660.7143449269239
0.25 153 660.7143449269239
0.25 139 654.4985251906015
0.26 139 652.0519661851259
I got the format right using dgrid3d, and this is my output so far:
Here's my code:
set dgrid3d
set grid
set palette model HSV defined ( 0 0 1 1, 1 1 1 1 )
set pm3d map
unset surf
set pm3d at b
splot "data_AIT_lvl1_20150604.csv" every ::121::600 using 3:25:43 lc palette title '{/Symbol l}average 20150604',\
"data1.csv" every ::121::361 using 3:25:43 lc palette title '{/Symbol l}average 20150605',\
"data2" every ::121::361 using 3:25:43 lc palette title '{/Symbol l}average 20150606',\
"data3.csv" every ::121::361 using 3:25:43 lc palette title '{/Symbol l}average 20150703',\
and so on for multiple files.
I like the output, but I'd like to know if there's a way to improve the overlapping areas in the plot so the values can be distinguished better. Is there a gnuplot way to write all the data I want to plot from each file into one big table and then plot the data from that table as a heat map? I tried a few things but lost track of all my trial-and-error steps, so I thought maybe one of you could help me out with a clean approach.
Thanks for the answers so far. I'll try my best to specify my second question a bit more:
Right now I have the values of multiple days plotted in the graph. It looks good, but some parts overlap, so I can't see the values (hue) of all the days in the plot.
In my experience I tend to overcomplicate problems like this, so I decided to ask whether there's a way to solve that.
I thought that by putting all the days into one big table, all the data might be plotted on one level, so I'd get a simple colored heat map.
I tried Joce's table solution, which works flawlessly, but Joce was right: it didn't actually solve my problem.
As you can see, there's now a huge block of data with different colors, but you can't distinguish between the different days. Also, the gap from the first picture (between the big purple block on the left and the orange block in the center) is gone, and everything has merged into one big block.
So I think what I'm trying to ask is whether there's another, better way, maybe with contours, to get what I want.
What you ask for is
set table
set output "one_big_table"
splot "file1" using c1:c2:c3:..., \
"file2" using C1:C2:C3:...., \
...
unset table
This will create as many blocks as you have files, so I am not sure your final goal will be so easy to achieve. That's a different issue though.
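If a single block without per-file separation is preferred, the table could also be built outside gnuplot. A minimal sketch in Python, assuming whitespace-separated files like the sample rows in the question and the columns 3, 25 and 43 from the splot command; the function names and the output file are hypothetical, and the every ::121::600 row selection is not reproduced here:

```python
def extract_columns(lines, columns=(3, 25, 43)):
    """Return the selected 1-based, whitespace-separated columns of each line."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < max(columns):
            continue  # skip blank or truncated lines
        rows.append([fields[c - 1] for c in columns])
    return rows


def build_table(paths, output, columns=(3, 25, 43)):
    """Write the selected columns of all files into one plain-text table."""
    with open(output, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as src:
                for row in extract_columns(src, columns):
                    out.write(" ".join(row) + "\n")


# Example call (output name assumed):
# build_table(["data1.csv", "data3.csv"], "one_big_table.dat")
```

The result is one contiguous block of x, y, z triples that can be plotted with using 1:2:3. Whether that makes the individual days easier to tell apart is a separate question, as noted above.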
I want to store the output of the command below in a variable.
This is my command:
load=$(sar -q | awk -F'[\t] +\\' '{print $1,$2,$3,$4,$5,$6,$7 }')
When I try to store the output in a variable, I get this:
Linux
3.13.0-45-generic
(vr1tel-Inspiron-3542)
03/27/2015
_i686_
(4
CPU)
06:47:44
AM
LINUX
RESTART
06:55:01
AM
runq-sz
plist-sz
ldavg-1
ldavg-5
ldavg-15
blocked
07:05:01
AM
2
449
1.08
1.01
0.76
0
07:15:01
AM
3
438
1.09
1.11
0.93
0
07:25:01
AM
0
434
0.29
0.69
0.85
0
Average:
2
440
0.82
0.94
but I want the output to look like this:
Linux 3.13.0-45-generic (vr1tel-Inspiron-3542) 03/27/2015 _i686_ (4 CPU)
06:47:44 AM LINUX RESTART
06:55:01 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
07:05:01 AM 2 449 1.08 1.01 0.76 0
07:15:01 AM 3 438 1.09 1.11 0.93 0
07:25:01 AM 0 434 0.29 0.69 0.85 0
Average: 2 440 0.82 0.94 0.85 0
08:08:13 AM LINUX RESTART
08:34:19 AM LINUX RESTART
08:35:01 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
08:45:01 AM 0 437 0.26 0.51 0.46 1
08:55:01 AM 0 418 0.30 0.32 0.40 0
09:05:01 AM 0 348 1.18 0.60 0.48 0
09:15:01 AM 0 364 0.23 0.55 0.55 0
09:25:01 AM 0 364 0.42 0.39 0.46 0
09:35:01 AM 0 439 0.33 0.26 0.34 0
09:45:01 AM 0 469 0.38 0.40 0.36 0