I'm trying to train a DNN model on a dataset whose columns differ hugely in standard deviation. I tested the following scalers, but none of them worked: MinMaxScaler, StandardScaler, RobustScaler, PowerTransformer. With each of them, the model achieved high predictive performance on the validation sets but had little predictive power on external test sets. The dataset has more than 10,000 rows and 200 columns. Here is a part of the statistics of the dataset.
         Var1     Var2     Var3     Var4     Var5     Var6     Var7     Var8     Var9    Var10    Var11
mean    11.31    -1.04    11.31     0.21     0.55   359.01   337.64   358.58   131.70     0.01     0.09
std      2.72     1.42     2.72     0.24     0.20   139.86   131.40   139.67    52.25     0.14     0.47
min      2.00   -10.98     2.00     0.00     0.02    59.11    50.04    59.07    26.00     0.00     0.00
5%       5.24    -4.07     5.24     0.01     0.19   190.25   178.15   190.10    70.00     0.00     0.00
25%     10.79    -1.35    10.79     0.05     0.41   269.73   254.14   269.16    98.00     0.00     0.00
50%     12.15    -0.64    12.15     0.13     0.58   335.47   316.23   335.15   122.00     0.00     0.00
75%     12.99    -0.21    12.99     0.27     0.72   419.42   394.30   419.01   154.00     0.00     0.00
95%     14.17     0.64    14.17     0.73     0.85   594.71   560.37   594.10   220.00     0.00     1.00
max     19.28     2.00    19.28     5.69     0.95  2924.47  2642.23  2922.13  1168.00     6.00    16.00
I am simulating something and want to figure out the influence of two parameters. Therefore I vary them both and look for the result on each pair of parameter values and get a result like:
0 1000 2000 3000 4000 5000 ....
0 13.2 14.8 19.9 25.5 27.3 ...
1000 21.3 25.9 32.3 etc.
2000 etc.
3000
4000
....
To visualize them, I use gnuplot, creating a heatmap, which works perfectly fine, showing me colors and height:
reset
set terminal qt
set title "Test"
unset key
set tic scale 0
set palette rgbformula 7,5,15
set cbrange [0:100]
set cblabel "Transmission"
set pm3d at s interpolate 1,1
unset surf
set xlabel "U_{Lense} [V]"
set ylabel "E_{Start} [eV]"
set datafile separator "\t"
splot "UT500test.csv" matrix rowheaders columnheaders
Now I want to look at some areas of my heatmap in more detail and vary my parameters in steps of 100 instead of 1000 as in the table above. But because the simulation takes quite a long time, I only do this for some areas, so my table looks like this:
0 1000 2000 2100 2200 2300 2400 ... 2900 3000 4000 ...
...
Now I want to show this in the heatmap, too. But every time I try this, all the bins on the heatmap have the same width, no matter whether the step is 1000 or 100. I want the bins with a step of 100 to be only 1/10 as wide as the ones with a step of 1000. Is there a way to do this?
The extra steps with stats are not necessary.
You can access the true coordinates directly as a nonuniform matrix:
set offset 100,100,100,100
plot $Data matrix nonuniform using 1:2:3 with points pt 5 lc palette
The missing piece is to fill in the full area rather than plotting single points. You can do this using pm3d:
set pm3d corners2color mean
set view map
splot $Data matrix nonuniform with pm3d
The colors do not match the previous plot because pm3d considers all 4 corners of each box when assigning a color. I told it to take the mean value (that's the default) but many other variants are possible. You could smooth the coloring further with set pm3d interpolate 3,3
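As a minimal, self-contained sketch of the above (the $Grid values are made up for illustration and are not taken from the question), this picks one of the other corners2color variants instead of the mean. The top-left entry of the matrix is not used as data, just as in the $Data block from the question.

```gnuplot
# Made-up nonuniform matrix (illustration only):
# first row = x coordinates, first column = y coordinates.
$Grid <<EOD
0     0  1000  2000  2100  2200  3000
0     1     2     3     4     5     6
1000  2     3     4     5     6     7
2000  3     4     9     5     7     8
3000  4     5     6     7     8     9
EOD
set view map
# take the color of each quadrangle from a single corner instead of the mean
set pm3d corners2color c1
splot $Grid matrix nonuniform with pm3d notitle
```

Switching c1 back to mean reproduces the behavior described above. Either way the quadrangles span the true coordinates, so the 100-wide columns come out narrower than the 1000-wide ones.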
You could use the plotting style with boxxyerror. It's pretty straightforward, except for getting the x-coordinates into an array, which is used later during plotting. Maybe there are smarter solutions.
Script:
### heatmap with irregular spacing
reset session
unset key
$Data <<EOD
0.00 0.00 1000 2000 2100 2200 2300 2400 3000 4000
1000 0.75 0.75 0.43 0.34 0.61 0.74 0.66 0.97 0.58
1100 0.82 0.90 0.18 0.12 0.87 0.15 0.01 0.57 0.97
1200 0.10 0.15 0.68 0.73 0.55 0.07 0.98 0.89 0.01
1300 0.67 0.38 0.41 0.85 0.37 0.45 0.49 0.21 0.98
1400 0.76 0.53 0.68 0.09 0.22 0.40 0.59 0.33 0.08
2000 0.37 0.32 0.30 NaN 0.33 NaN 0.73 0.94 0.96
3000 0.07 0.61 0.37 0.54 0.32 0.28 0.62 0.51 0.48
4000 0.79 0.98 0.78 0.06 0.16 0.45 0.83 0.50 0.10
5000 0.49 0.95 0.29 0.59 0.55 0.88 0.29 0.47 0.93
EOD
stats $Data nooutput
BoxHalfWidth=50
# put first row into array
array ArrayX[STATS_columns]
set table $Dummy
plot for [i=1:STATS_columns] $Data u (ArrayX[i]=column(i)) every ::0::0 with table
unset table
plot for [i=2:STATS_columns] $Data u (ArrayX[i]):1:(BoxHalfWidth):(BoxHalfWidth):i every ::1 with boxxyerror fs solid 1.0 palette
### end of script
Result:
Edit:
With a little bit more effort you can as well generate a plot which covers the whole area.
In contrast to the simpler code from @Ethan, the rectangles are centered on the datapoint coordinates and have the color of the actual datapoint z-value. Furthermore, the datapoint (2200,2000) is also plotted. The borders of the rectangles are halfway between matrix points. The outer rectangles have dimensions equal to the x and y distance to the next inner matrix point.
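To illustrate the "halfway between matrix points" rule on its own, here is a small sketch that only prints the box borders for a few x-values. The helper names x(), lo() and hi() are made up for this illustration and are not used in the scripts in this answer.

```gnuplot
# Sketch: box borders halfway between irregularly spaced coordinates.
Xs = "0 1000 2000 2100 2200"      # a few x-values from the example data
n  = words(Xs)
x(i)  = real(word(Xs,i))
# inner borders lie halfway to the neighbor; the outermost boxes are
# extended outwards by the half distance to the next inner point
lo(i) = i==1 ? x(1)-(x(2)-x(1))/2.   : (x(i)+x(i-1))/2.
hi(i) = i==n ? x(n)+(x(n)-x(n-1))/2. : (x(i)+x(i+1))/2.
do for [i=1:n] {
    print sprintf("%g: [%g, %g]", x(i), lo(i), hi(i))
}
```

For these values it prints the intervals [-500, 500], [500, 1500], [1500, 2050], [2050, 2150] and [2150, 2250], so the boxes with 100 spacing come out 100 wide and neighboring boxes share their borders.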
Revision: (simplified version, works for gnuplot>=5.0.1)
The following solution works for gnuplot 5.0.1, but not for 5.0.0 (haven't found out yet why).
There will be a warning "matrix contains missing or undefined values", which can be ignored.
I noticed that there seems to be a bug(?!) with the matrix column index, but you can fix it with:
colIdxFix(n) = (r0=r1,r1=column(-1),r0==r1?c=c+1:c=1) # fix for missing column index in a matrix
plot r1=c=0 $Data nonuniform matrix u 1:2:(colIdxFix(0)) ....
Script: (works with gnuplot>=5.0.1)
### heatmap with irregular spacing with filled area
# compatible with gnuplot>=5.0.1
reset session
$Data <<EOD
0.00 0.00 1000 2000 2100 2200 2300 2400 3000 4000
1000 0.75 0.75 0.43 0.34 0.61 0.74 0.66 0.97 0.58
1100 0.82 0.90 0.18 0.12 0.87 0.15 0.01 0.57 0.97
1200 0.10 0.15 0.68 0.73 0.55 0.07 0.98 0.89 0.01
1300 0.67 0.38 0.41 0.85 0.37 0.45 0.49 0.21 0.98
1400 0.76 0.53 0.68 0.09 0.22 0.40 0.59 0.33 0.08
2000 0.37 0.32 0.30 NaN 0.33 NaN 0.73 0.94 0.96
3000 0.07 0.61 0.37 0.54 0.32 0.28 0.62 0.51 0.48
4000 0.79 0.98 0.78 0.06 0.16 0.45 0.83 0.50 0.10
5000 0.49 0.95 0.29 0.59 0.55 0.88 0.29 0.47 0.93
EOD
# get irregular x- and y-values into string
Xs = Ys = ""
stats $Data matrix u ($1==0 ? Ys=Ys.sprintf(" %g",$3) : 0, \
$2==0 ? Xs=Xs.sprintf(" %g",$3) : 0) nooutput
# box extension d in dn (negative) and dp (positive) direction
d(vs,n0,n1) = abs(real(word(vs,n0+1))-real(word(vs,n1+1)))/2.
dn(vs,n) = (n==1 ? (n0=1,n1=2) : (n0=n,n1=n-1), -d(vs,n0,n1))
dp(vs,n) = (Ns=words(vs)-1, n>=Ns ? (n0=Ns-1,n1=Ns) : (n0=n,n1=n+1), d(vs,n0,n1))
unset key
set offset 1,1,1,1
set style fill solid 1.0
colIdxFix(n) = (r0=r1,r1=column(-1),r0==r1?c=c+1:c=1) # fix for missing column index in a matrix (bug?!)
plot r1=c=0 $Data nonuniform matrix u 1:2:($1+dn(Xs,colIdxFix(0))):($1+dp(Xs,c)): \
($2+dn(Ys,int(column(-1))+1)):($2+dp(Ys,int(column(-1))+1)):3 w boxxy palette
### end of script
Result:
Edit2: (I leave this here for gnuplot 5.0.0)
Just for fun, here is the "retro-version" for gnuplot 5.0:
gnuplot 5.0 does not support arrays. It does support datablocks, but apparently indexing like $Datablock[1] does not work. So the workaround is to put the matrix X,Y coordinates into the strings CoordsX and CoordsY and get the coordinates with word(). Unless there is some other limitation with strings and word(), the following worked with gnuplot 5.0 and gave the same result as above.
Script:
### heatmap with irregular spacing with filled area
# compatible with gnuplot 5.0
reset session
unset key
$Data <<EOD
0.00 0.00 1000 2000 2100 2200 2300 2400 3000 4000
1000 0.75 0.75 0.43 0.34 0.61 0.74 0.66 0.97 0.58
1100 0.82 0.90 0.18 0.12 0.87 0.15 0.01 0.57 0.97
1200 0.10 0.15 0.68 0.73 0.55 0.07 0.98 0.89 0.01
1300 0.67 0.38 0.41 0.85 0.37 0.45 0.49 0.21 0.98
1400 0.76 0.53 0.68 0.09 0.22 0.40 0.59 0.33 0.08
2000 0.37 0.32 0.30 NaN 0.33 NaN 0.73 0.94 0.96
3000 0.07 0.61 0.37 0.54 0.32 0.28 0.62 0.51 0.48
4000 0.79 0.98 0.78 0.06 0.16 0.45 0.83 0.50 0.10
5000 0.49 0.95 0.29 0.59 0.55 0.88 0.29 0.47 0.93
EOD
stats $Data nooutput
ColCount = int(STATS_columns-1)
RowCount = int(STATS_records-1)
# put first row and column into strings (gnuplot 5.0 has no arrays)
CoordsX = ""
set table $Dummy
set xrange[0:1] # to avoid warnings
do for [i=2:ColCount+1] {
plot $Data u (Value=column(i)) every ::0::0 with table
CoordsX = CoordsX.sprintf("%g",Value)." "
}
unset table
CoordsY = ""
set table $Dummy
do for [i=1:RowCount] {
plot $Data u (Value=$1) every ::i::i with table
CoordsY= CoordsY.sprintf("%g",Value)." "
}
unset table
dx(i) = (word(CoordsX,i)-word(CoordsX,i-1))*0.5
dy(i) = (word(CoordsY,i)-word(CoordsY,i-1))*0.5
ndx(i,j) = word(CoordsX,i) - (i-1<1 ? dx(i+1) : dx(i))
pdx(i,j) = word(CoordsX,i) + (i+1>ColCount ? dx(i) : dx(i+1))
ndy(i,j) = word(CoordsY,j) - (j-1<1 ? dy(j+1) : dy(j))
pdy(i,j) = word(CoordsY,j) + (j+1>RowCount ? dy(j) : dy(j+1))
set xrange[ndx(1,1):pdx(ColCount,1)]
set yrange[ndy(1,1):pdy(1,RowCount)]
set tic out
plot for [i=2:ColCount+1] $Data u (real(word(CoordsX,i-1))):1:(ndx(i-1,int($0))):(pdx(i-1,int($0))): \
(ndy(i-1,int($0+1))):(pdy(i-1,int($0+1))):i every ::1 with boxxyerror fs solid 1.0 palette
### end of script
I'm parsing Linux sar output and I have a dat file which looks like this:
07:09:49 CPU %usr %nice %sys %iowait %steal %irq %soft %guest %idle
07:09:51 all 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
07:09:53 all 11.82 0.00 0.13 0.00 0.00 0.00 0.00 0.00 88.05
07:09:55 all 53.99 0.00 0.63 0.00 0.13 0.00 0.13 0.00 45.12
07:09:57 all 55.18 0.00 0.25 0.00 0.00 0.00 0.00 0.00 44.57
07:09:59 all 66.58 0.00 0.51 0.00 0.00 0.00 0.13 0.00 32.78
07:10:01 all 71.90 0.00 0.63 0.13 0.00 0.00 0.13 0.00 27.22
07:10:03 all 70.24 0.00 0.63 0.00 0.00 0.00 0.13 0.00 29.00
07:10:05 all 55.39 0.00 0.63 0.00 0.00 0.00 0.13 0.00 43.85
07:10:07 all 72.90 0.00 0.38 0.00 0.00 0.00 0.00 0.00 26.73
07:10:09 all 60.96 0.00 0.38 0.00 0.13 0.00 0.13 0.00 38.40
07:10:11 all 76.60 0.00 0.63 0.00 0.00 0.00 0.13 0.00 22.65
07:10:13 all 53.87 0.00 0.76 0.00 0.00 0.00 0.13 0.00 45.25
07:10:15 all 46.73 0.00 0.63 0.00 0.00 0.00 0.00 0.00 52.64
07:10:17 all 56.37 0.00 0.50 0.00 0.00 0.00 0.13 0.00 43.00
07:10:19 all 58.15 0.00 0.63 0.00 0.00 0.00 0.13 0.00 41.09
07:10:21 all 61.26 0.00 0.75 0.00 0.00 0.00 0.13 0.00 37.86
07:10:23 all 51.50 0.00 0.75 0.12 0.12 0.00 0.25 0.00 47.2
set title ' CPU usage'
set xdata time
set timefmt '%H:%M:%S'
set xlabel 'time'
set ylabel 'CPU Usage'
set style data lines
plot 'filename.dat' using 1:3 title '0.6'
pause -1
The tic labels on the x-axis are not related to the times given in the file.
You have to set the formatting of the tic labels:
set format x '%H:%M:%S'
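Put together, a minimal sketch could look like this. It uses the first few data rows from the question as an inline datablock (named $Sar here just for the example) instead of filename.dat, and the title '%usr' is my choice for column 3.

```gnuplot
# Sketch: time axis for sar output, a few rows copied from the question
$Sar <<EOD
07:09:51 all 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
07:09:53 all 11.82 0.00 0.13 0.00 0.00 0.00 0.00 0.00 88.05
07:09:55 all 53.99 0.00 0.63 0.00 0.13 0.00 0.13 0.00 45.12
07:09:57 all 55.18 0.00 0.25 0.00 0.00 0.00 0.00 0.00 44.57
EOD
set xdata time                # column 1 holds time values
set timefmt '%H:%M:%S'        # how the time is read from the data
set format x '%H:%M:%S'       # how the tic labels are printed
set xlabel 'time'
set ylabel 'CPU Usage'
plot $Sar using 1:3 with lines title '%usr'
```

timefmt only controls how the input is parsed, while format x controls how the labels are written, which is why both settings are needed.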
I am using a pm3d map to plot a data file with 3 columns x, y, z. The final plot shows some region in 2D, and I have another data file with columns x, y which holds discrete coordinates of some of the points on the boundary of the region. I want to plot these points on top of the plot generated by pm3d map. If I simply try replot after plotting the pm3d map, it doesn't show those points in the plot. Can anybody tell me how I can achieve this?
Thanks in advance.
Edit: Here is the minimal example. The data file is something like this:
0.00 -0.50 4
0.00 -0.25 4
0.00 0.00 4
0.00 0.25 4
0.00 0.50 4
0.25 -0.50 1
0.25 -0.25 1
0.25 0.00 1
0.25 0.25 1
0.25 0.50 1
0.50 -0.50 0
0.50 -0.25 0
0.50 0.00 0
0.50 0.25 0
0.50 0.50 0
0.75 -0.50 0
0.75 -0.25 0
0.75 0.00 0
0.75 0.25 0
0.75 0.50 0
1.00 -0.50 3
1.00 -0.25 4
1.00 0.00 4
1.00 0.25 5
1.00 0.50 5
I am plotting this with the following commands:
set pm3d map
set pm3d corners2color c1
spl 'file.dat'
I also have another file border.dat which contains discrete points like this:
0.00 -0.25
0.25 0.25
1.00 0.00
Now I want to plot the points (x and y coordinates) given in this file on top of the plot that pm3d map (I am not using with pm3d; it's pm3d map!) generates for file.dat.
How can I achieve this?
Thank you
I have a program that generates Gnuplot scripts to display the pattern of a Jacobian matrix.
For a dense matrix, I obtain this kind of code, which does exactly what I want:
set terminal wxt persist
set title 'The Answer'
set palette defined(0 "white",1 "blue")
set grid front
set xrange [0:7]
set yrange [0:7] reverse
set size ratio -1
unset colorbox
plot '-' using ($1+0.5):($2+0.5):($3 == 0 ? 0 : 1) matrix with image notitle
1.00 0.00 0.00 0.00 1.00 1.00 0.00
1.00 0.00 1.00 0.00 0.00 0.00 1.00
1.00 0.00 1.00 0.00 0.00 0.00 1.00
1.00 1.00 1.00 0.00 0.00 1.00 0.00
0.00 0.00 1.00 0.00 1.00 0.00 0.00
0.00 0.00 1.00 0.00 1.00 0.00 0.00
0.00 0.00 1.00 0.00 1.00 1.00 1.00
e
Since the rest of my code now handles sparse matrices, I have been trying to adapt the generation of Gnuplot scripts to achieve the same result with Gnuplot's binary data format. However, so far I have not succeeded without using an external file. How should I go about this?