Reading right ascension/declination coordinates in gnuplot

I have a two column file with right ascension/declination coordinates:
18:42:21.8 -23:04:52
20:55:00.8 -17:23:19
I can read the first column by specifying the data as 'timefmt', but there seems to be no way to read angular data in a similar manner. I could, of course, delete the :'s and plot ($1+$2/60+$3/3600), but I wonder if there is a more elegant way.

You can define a function that does the job for you, which is a bit more convenient and keeps the plot command shorter.
Convert your hours, minutes, seconds or degrees, minutes, seconds into seconds via strptime() or timecolumn(). In the gnuplot console, check help strptime, help timecolumn and help time_specifiers. Use the relative-time specifiers %tH:%tM:%tS, not the date/time specifiers %H:%M:%S.
However, you have to be careful how gnuplot interprets negative times:
if your input time is for example -00:17:56.7 gnuplot will interpret this as +00:17:56.7 which is not what you expect. Apparently, -00 is equal to +00 and hence 17 is interpreted as positive, although you intended it to be negative. A workaround in this special case would be the following:
Create a function myTimeSign(s) that returns -1 if the hours are 0 and the first character of the time string is -, and 1 otherwise.
myTimeSign(s) = strptime("%tH",s)==0 && s[1:1] eq '-' ? -1 : 1
Multiply this with your time. This will do here as workaround, but not in general.
Update:
This has been reported as bug (https://sourceforge.net/p/gnuplot/bugs/2245/) and is already fixed in the development version of gnuplot.
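For comparison, the sign handling described above can be sketched outside gnuplot as a pre-processing step. This is only an illustration in Python, not part of the gnuplot answer; the function name sexagesimal_to_decimal is made up for this example. The point it shows is that the sign must be read from the leading character of the string, not from the numeric value of the first field.

```python
def sexagesimal_to_decimal(text):
    """Convert 'DD:MM:SS.s' or 'HH:MM:SS.s' to decimal degrees or hours.

    The sign is taken from the leading character, not from the numeric
    value of the first field, so '-00:17:56.7' stays negative.
    """
    text = text.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    fields = text.lstrip("+-").split(":")
    # Pad missing minutes/seconds with zero so that "06:12" is accepted too.
    fields += ["0"] * (3 - len(fields))
    whole, minutes, seconds = (float(field) for field in fields[:3])
    return sign * (whole + minutes / 60.0 + seconds / 3600.0)


# Declination of Mintaka (the -00 case discussed above), in degrees
print(sexagesimal_to_decimal("-00:17:56.7"))
```

The result is in degrees (or hours for right ascension), so it could be written to a plain numeric column and plotted without any time format.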
Code:
### time / angle conversion
reset session
set size square
set object 1 rect from graph 0,0 to graph 1,1 fc rgb "black"
$Orion <<EOD
05:55:10.29 +07:24:25.3 0.42 Betelgeuse
05:14:32.27 -08:12:05.9 0.18 Rigel
05:25:07.87 +06:20:59.0 1.64 Bellatrix
05:32:00.40 -00:17:56.7 2.20 Mintaka
05:36:12.81 -01:12:06.9 1.69 Alnilam
05:40:45.52 -01:56:33.3 1.88 Alnitak
05:47:45.39 -09:40:10.6 2.07 Saiph
05:35:08.28 +09:56:03.0 3.47 Meissa
EOD
myTimeFmt = "%tH:%tM:%tS"
RA(n) = timecolumn(n,myTimeFmt)
myTimeSign(s) = strptime("%tH",s)==0 && s[1:1] eq '-' ? -1 : 1 # returns -1 if hours are -00
Dec(n) = timecolumn(n,myTimeFmt)*myTimeSign(strcol(n))
set xrange[strptime(myTimeFmt,"06:12"):strptime(myTimeFmt,"05:00")] reverse
set format x "%H^h%M^m" time
set yrange[strptime(myTimeFmt,"-12:00"):strptime(myTimeFmt,"+12:00")]
set format y "%tH°%tM'" time
set tics out
plot $Orion u (RA(1)):(Dec(2)):(-log10($3)+1.5) w p pt 7 ps var lc rgb "yellow" notitle
### end of code
Result:

Related

Plotting Average curve for points in gnuplot

[Current]
I am importing a text file in which the first column has simulation time (0~150) the second column has the delay (0.01~0.02).
1.000000 0.010007
1.000000 0.010010
2.000000 0.010013
2.000000 0.010016
.
.
.
149.000000 0.010045
149.000000 0.010048
150.000000 0.010052
150.000000 0.010055
which gives me the plot:
[Desired]
I need to plot an average line on it like shown in the following image with red line:
Here is a gnuplot only solution with sample data:
set table "test.data"
set samples 1000
plot rand(0)+sin(x)
unset table
You should check the gnuplot demo page for a running average. I'm going to generalize this demo by building the functions dynamically. This makes it much easier to change the number of points included in the average.
This is the script:
# number of points in moving average
n = 50
# initialize the variables
do for [i=1:n] {
eval(sprintf("back%d=0", i))
}
# build shift function (back_n = back_n-1, ..., back1=x)
shift = "("
do for [i=n:2:-1] {
shift = sprintf("%sback%d = back%d, ", shift, i, i-1)
}
shift = shift."back1 = x)"
# uncomment the next line for a check
# print shift
# build sum function (back1 + ... + backn)
sum = "(back1"
do for [i=2:n] {
sum = sprintf("%s+back%d", sum, i)
}
sum = sum.")"
# uncomment the next line for a check
# print sum
# define the functions like in the gnuplot demo
# use macro expansion for turning the strings into real functions
# (gnuplot versions before 5.0 need "set macros" first)
samples(x) = $0 > (n-1) ? n : ($0+1)
avg_n(x) = (shift_n(x), @sum/samples($0))
shift_n(x) = @shift
# the final plot command looks quite simple
set terminal pngcairo
set output "moving_average.png"
plot "test.data" using 1:2 w l notitle, \
"test.data" using 1:(avg_n($2)) w l lc rgb "red" lw 3 title "avg\\_".n
This is the result:
The average lags quite a bit behind the datapoints as expected from the algorithm. Maybe 50 points are too many. Alternatively, one could think about implementing a centered moving average, but this is beyond the scope of this question.
And, I also think that you are more flexible with an external program :)
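As a rough idea of what such an external program could look like, here is a minimal Python sketch that appends a trailing moving average as a third column. It is an illustration only: the function names trailing_average and add_average_column are hypothetical, and the input is assumed to be whitespace-separated "x y" lines with optional # comment lines.

```python
from collections import deque


def trailing_average(values, window):
    """Yield the mean of the last `window` values seen so far.

    For the first window-1 items the mean is taken over fewer points,
    which mirrors samples(x) in the gnuplot script above.
    """
    recent = deque(maxlen=window)  # the oldest value drops out automatically
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)


def add_average_column(lines, window):
    """Turn 'x y' text lines into 'x y avg' lines, skipping comments and blanks."""
    rows = [line.split() for line in lines
            if line.strip() and not line.startswith("#")]
    ys = [float(row[1]) for row in rows]
    averages = trailing_average(ys, window)
    return [f"{row[0]} {row[1]} {avg:g}" for row, avg in zip(rows, averages)]


print(add_average_column(["1 2", "2 4", "3 9"], window=2))
```

The resulting three-column file could then be plotted with using 1:2 for the data and using 1:3 for the average.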
Here's some replacement code for the top answer, which makes this also work for 1000+ points and much much faster. Only works in gnuplot 5.2 and later I guess
# number of points in moving average
n = 5000
array A[n]
samples(x) = $0 > (n-1) ? n : int($0+1)
mod(x) = int(x) % n
avg_n(x) = (A[mod($0)+1]=x, (sum [i=1:samples($0)] A[i]) / samples($0))
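The array trick in this replacement code is a ring buffer: each new value overwrites the oldest slot, selected by the row counter modulo n. The following Python sketch shows the same logic in isolation; the class name RingAverage is made up for this illustration, and its count attribute plays the role of $0.

```python
class RingAverage:
    """Running mean over the last `size` values using a fixed-size ring buffer."""

    def __init__(self, size):
        self.buffer = [0.0] * size
        self.count = 0  # number of values seen so far (the role of $0)

    def add(self, value):
        # Overwrite the oldest slot; the index wraps around via modulo.
        self.buffer[self.count % len(self.buffer)] = value
        self.count += 1
        # Until the buffer is full, average only over the filled slots.
        filled = min(self.count, len(self.buffer))
        return sum(self.buffer[:filled]) / filled


ring = RingAverage(3)
print([ring.add(v) for v in [3, 6, 9, 12]])
```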
Edit
The updated question is about a moving average.
You can do this in a limited way with gnuplot alone, according to this demo.
But in my opinion, it would be more flexible to pre-process your data using a programming language like python or ruby and add an extra column for whatever kind of moving average you require.
The original answer is preserved below:
You can use fit. It seems you want to fit to a constant function. Like this:
f(x) = c
fit f(x) 'S1_delay_120_LT100_LU15_MU5.txt' using 1:2 every 5 via c
Then you can plot them both.
plot 'S1_delay_120_LT100_LU15_MU5.txt' using 1:2 every 5, \
f(x) with lines
Note that this technique can be used with arbitrary functions, not just constant or linear functions.
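For the constant model f(x) = c, the least-squares solution is simply the arithmetic mean of the y-values, so the fitted c should come out close to the mean. The small Python sketch below illustrates this; the function names are hypothetical and the check is numerical, not a reimplementation of gnuplot's fit.

```python
def fit_constant(ys):
    """Least-squares estimate of c in f(x) = c: the arithmetic mean."""
    return sum(ys) / len(ys)


def sum_squared_residuals(ys, c):
    """Sum of squared deviations of the data from the constant c."""
    return sum((y - c) ** 2 for y in ys)


delays = [0.010007, 0.010010, 0.010013, 0.010016]  # first rows of the question's data
c = fit_constant(delays)
# Moving c away from the mean in either direction increases the residual sum.
print(sum_squared_residuals(delays, c) <= sum_squared_residuals(delays, c + 1e-6))
```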
I wanted to comment on Franky_GT, but somehow stackoverflow didn't let me.
However, Franky_GT, your answer works great!
A note for people plotting .xvg files (e.g. after doing analysis of MD simulations), if you don't add the following line:
set datafile commentschars "#@&"
Franky_GT's moving average code will result in this error:
unknown type in imag()
I hope this is of use to anyone.
For gnuplot >=5.2, probably the most efficient solution is using an array like @Franky_GT's solution.
However, it uses the pseudocolumn 0 (see help pseudocolumns). In case you have some empty lines in your data $0 will be reset to 0 which eventually might mess up your average.
This solution uses an index t to count up the datalines and a second array X[] in case a centered moving average is desired. Datapoints don't have to be equidistant in x.
At the beginning there are not yet enough datapoints for a centered average over N points, so for the x-value only every second point is used and the others are NaN. That is why set datafile missing NaN is necessary to plot a connected line at the beginning.
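The idea of pairing each average with the x-value from the middle of its window can be sketched as follows. This is a simplified Python illustration, not a translation of the gnuplot functions: the name centered_moving_average is made up, and the warm-up phase is handled differently (no NaN values are produced).

```python
def centered_moving_average(xs, ys, window):
    """Average the last `window` y-values and attach the x-value at the
    middle of that window. The x-values need not be equidistant."""
    points = []
    for end in range(len(ys)):
        start = max(0, end - window + 1)  # shorter window during warm-up
        chunk = ys[start:end + 1]
        middle = (start + end) // 2       # integer midpoint of the window
        points.append((xs[middle], sum(chunk) / len(chunk)))
    return points


print(centered_moving_average([0, 1, 2, 3, 4], [2, 4, 6, 8, 10], window=3))
```

During warm-up, two successive windows can share the same midpoint in this sketch, which is the situation the gnuplot script handles by emitting NaN for every second point.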
Code:
### moving average over N points
reset session
# create some test data
set print $Data
y = 0
do for [i=1:5000] {
print sprintf("%g %g", i, y=y+rand(0)*2-1)
}
set print
# average over N values
N = 250
array Avg[N]
array X[N]
MovAvg(col) = (Avg[(t-1)%N+1]=column(col), n = t<N ? t : N, t=t+1, (sum [i=1:n] Avg[i])/n)
MovAvgCenterX(col) = (X[(t-1)%N+1]=column(col), n = t<N ? t%2 ? NaN : (t+1)/2 : ((t+1)-N/2)%N+1, n==n ? X[n] : NaN) # be aware: gnuplot does integer division here
set datafile missing NaN
plot $Data u 1:2 w l ti "Data", \
t=1 '' u 1:(MovAvg(2)) w l lc rgb "red" ti sprintf("Moving average over %d",N), \
t=1 '' u (MovAvgCenterX(1)):(MovAvg(2)) w l lw 2 lc rgb "green" ti sprintf("Moving average centered over %d",N)
### end of code
Result:

Gnuplot: Expression in input data

Is there a way to specify that input data is an expression that needs to be evaluated?
In my case the data is rational numbers encoded in the format "n/d". Is there a way to tell gnuplot to interpret "n/d" as "n divided by d"?
Example input data:
1/9 1
1/8 2
1/7 3
1/6 4
I tried plot "data" using ($1):2 but this truncates "n/d" to "n".
Update: After some digging in the manual, I found that in this case I can tell gnuplot to interpret "/" as a column separator and then divide the first number by the second as follows: plot "data" using ($1/$2):3 '%lf/%lf %lf'
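If pre-processing the file is acceptable, the "n/d" notation can also be converted before plotting. A minimal Python sketch, assuming whitespace-separated columns with the rational number in the first column; the function names parse_rational and convert_lines are made up for this example, while Fraction is part of Python's standard library.

```python
from fractions import Fraction


def parse_rational(token):
    """Convert 'n/d' (or a plain number such as '3') to a float."""
    return float(Fraction(token))


def convert_lines(lines):
    """Replace the first column of each 'n/d y' line by its decimal value."""
    converted = []
    for line in lines:
        first, rest = line.split(maxsplit=1)
        converted.append(f"{parse_rational(first):g} {rest.strip()}")
    return converted


print(convert_lines(["1/8 2", "1/4 3"]))
```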
I don't know a gnuplot only answer. But you can use the system command to let another program do the work. For example the bc program on linux. The following script works for me:
result(s) = system(sprintf('echo "%s" | bc -l ~/.bcrc', s)) + 0
set table "data.eval"
plot "data.dat" using 1:(result(strcol(2)))
unset table
This is the datafile:
1 1/2
2 1/2.0
3 4+4
4 4*5-1
5 4*(5-1)-(3-7)
6 sin(3.1415)
This is the output:
# Curve 0 of 1, 6 points
# Curve title: ""data.dat" using 1:(result(strcol(2)))"
# x y type
1 0.5 i
2 0.5 i
3 8 i
4 19 i
5 20 i
6 9.26536e-05 i
Notes:
The set table "data.eval" prints the values into a file, now it is easier to check the results.
strcol(2) reads the entries of the second column as a string. The expression must not contain white space.
The function result transfers the string to bc. The string itself must be quoted, else the shell would complain for example about brackets as in line 5 or 6 of the datafile.
The option -l on bc enables floating point evaluation of expressions like in the first line (1/2 = 0.5 instead of 1/2 = 0), and it defines functions like s(x) for sine and e(x) for exp(x).
~/.bcrc reads some function definitions
The system command returns a string. The string is promoted to a floating point number by adding 0.
My ~/.bcrc looks like this:
pi=4*a(1)
e=e(1)
define ln(x)
{return(l(x))}
define lg(x)
{return(l(x)/l(10))}
define exp(x)
{return(e(x))}
define sin(x)
{return(s(x))}
define fac(x)
{if (x<=1) return(1);
return(fac(x-1)*x)}
define ncr(n,r)
{return(fac(n)/(fac(r)*fac(n-r)))}
Tested with gnuplot 4.6 and bc 1.06.95 on Debian Jessie. On Windows you have the set command for integer calculations. It seems that Google knows some other commandline calculators.
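If bc is not available, the same idea (treat a string from the data file as arithmetic) can be sketched in Python as a pre-processing step. This is an illustration under stated assumptions, not part of the answer above: the function name evaluate and the whitelist of operators and functions are invented for this example. Note that / is always float division here, unlike in gnuplot.

```python
import ast
import math
import operator

# Whitelists: anything not listed here is rejected.
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt,
          "exp": math.exp, "ln": math.log}


def evaluate(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*(walk(arg) for arg in node.args))
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval").body)


for text in ["1/2", "4+4", "4*5-1", "4*(5-1)-(3-7)", "sin(3.1415)"]:
    print(text, evaluate(text))
```

Walking the syntax tree with a whitelist avoids running arbitrary code from the data file, which a bare eval() would allow.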
It wouldn't be gnuplot if there wasn't a gnuplot-only solution.
Simply collect your expressions in a string by "mis"using stats, then evaluate them one by one via eval in a do for loop. Write the results into a second string, convert each value to a number via real, and plot them.
Check help stats, help do, help eval, help real and the example below. Most of the data is taken from @maij's answer. The script works for gnuplot>=5.0 and, with some adaptations, probably with earlier versions.
Script: (works for gnuplot>=5.0, Jan 2015)
### evaluate expressions in input data
reset session
$Data <<EOD
1 1/2 # integer division
2 1/2.0 # float division
3 4+4
4 4*5-1
5 4*(5-1)-(3-7)
6 sin(3.1415/2)
7 2**3
8 sqrt(9)
EOD
myCol = 2
myExprs = ''
stats $Data u (myExprs=myExprs.sprintf(' "v=%s"',strcol(myCol))) nooutput
myValues = ''
do for [i=1:words(myExprs)] {
eval word(myExprs,i)
myValues = myValues.sprintf(" %g",v)
}
myValue(n) = real(word(myValues,int(column(n)+1)))
set offsets 0.5,0.5,2,0
plot $Data u 1:(myValue(0)) w lp pt 7 lc "red" ti "Expressions", \
'' u 1:(myValue(0)):2 w labels offset 0,1 notitle
### end of script
Result:

Different number of samples for different functions

plot x+3 , x**2+5*x+12
Is it possible to set x+3 to have only 2 samples and x**2+5*x+12 to have say 1000 samples in the same plot?
It can be done, but not out-of-the-box.
The first variant uses a temporary file to save one function with a low sampling rate and plotting it later together with the high-resolution function:
set samples 2
set table 'tmp.dat'
plot x+3
unset table
set samples 1000
plot 'tmp.dat' w lp t 'x+3', x**2 + 5*x + 12
This has the advantage, that you can use any sampling rates for both functions.
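The temporary table does not have to come from gnuplot itself. As a sketch under assumptions, a small Python helper could produce equivalent sample points; the name sample_table is hypothetical and the x range [-10, 10] is assumed here.

```python
def sample_table(func, xmin, xmax, samples):
    """Return `samples` equally spaced (x, f(x)) pairs, endpoints included.

    Requires samples >= 2.
    """
    step = (xmax - xmin) / (samples - 1)
    return [(xmin + i * step, func(xmin + i * step)) for i in range(samples)]


# Two samples of x+3 over the assumed range
for x, y in sample_table(lambda x: x + 3, -10, 10, 2):
    print(f"{x:g} {y:g}")
```

Writing these lines to a file gives a two-point data set that can be plotted with linespoints next to the densely sampled function.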
For your special case of 2 samples for one function, it can be done without an external file, but it involves quite a bit of trickery:
set xrange [-10:10]
s = 1000
set samples s
f1(x) = x + 3
set style func linespoints
set style data linespoints
plot '+' using (x0 = (($0 == 0 || $0 == (s-1) )? $1 : x0), \
($0 < (s-2) ? 1/0 : x0)):(f1(x0)) t 'x+3',\
x**2 + 5*x + 12
What I did here is:
Use the special filename + to generate a set of coordinates in the current xrange. This must be set, no autoscaling is possible.
Skipping all points but the first and the last by giving them the value 1/0 doesn't work, because the two remaining points aren't connected.
So I store the first x-value (when $0, or column(0), equals 0) and use it when I encounter the second-to-last point. For the last point, the usual values are used.
That works for your special case of 2 samples.
You must keep in mind, that the first function is treated as data, so you must use both set style data and set style func (just to show it).
The result with 4.6.4 is:
I am not sure if different samplings (as opposed to different ranges) are possible with gnuplot 5.x. If I missed that please let me know.
Here is a suggestion to have two different samplings in the same plot command without temporary files (or datablocks from gnuplot 5.0 on).
A requirement is a known xrange, i.e. it will work with autoscale only if you plot and replot the graph to automatically get xmin and xmax. For the second function you could also use '+' u 1:(f2($1)) w lp.
Script: (works for gnuplot>=4.4.0, March 2010)
### different samplings in one plot command
reset
set xrange[xmin=-10:xmax=10]
f1(x) = x+3
f2(x) = x**2 + 5*x + 12
s1 = 3 # sampling 1
s2 = 101 # sampling 2
set samples (s1>s2?s1:s2) # the higher value
dx1 = real(xmax-xmin)/(s1-1) # determine dx1 for f1
plot '+' u (x0=xmin+$0*dx1):(f1(x0)) every ::0::s1-1 w lp pt 7 ti sprintf("%d samples",s1), \
f2(x) w lp pt 7 ti sprintf("%d samples",s2)
### end of script
Result:

Is there a way to have gnuplot use xaxis time data, but skip certain intervals (e.g. non-trading hours)

I'm collecting pricing data on stocks and options during trading hours and appending them to a data file that I plot with gnuplot. The file looks like:
2013-01-30--15:58:14 38.68 0.64
2013-01-30--15:58:44 38.70 0.64
2013-01-30--15:59:15 38.70 0.64
2013-01-30--15:59:45 38.69 0.64
I end up with large periods of time that I don't collect any data for since the markets are closed.
When I plot this data with gnuplot, using xdata as timefmt, it displays large gaps from the end of one day to the start of another.
I'd prefer to have it skip those times during the days where there is no actual data... Is there a way to do this?
I've been able to come close by not plotting the data against the time value in the first column, but I'd like to show the time data AS WELL AS skip those times when the data was not collected.
I hope this makes sense and appreciate your help.
If I understood correctly, you can make good use of a broken axis on x.
There are two ways to obtain a broken axis. The first one relies on ternary operators to plot the data only in the region of interest (which in your case should not even be necessary) and on shifting the xtics left in order to shrink the empty region.
This is a nice tutorial:
http://gnuplot-tricks.blogspot.com/2009/06/broken-axis-revisited.html
The second one makes use of multiplot instead. This is probably better suited to your needs.
http://gnuplot-tricks.blogspot.com/2010/06/broken-axis-once-more.html
Hope it helps.
There are similar but slightly different questions:
GNUPLOT Plotting 5 day financial week
I have non-contiguous date/time X data and want non-contiguous X scale
The question is not about breaking the axis, but skipping time intervals with no data.
This can simply be done by plotting the y-data versus the row index, i.e. pseudocolumn 0 (check help pseudocolumns). The challenge then is to get some reasonable xtics. Here are two suggestions.
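The underlying idea is to use the row index as the x-coordinate and to place a tic label only where a new calendar day starts. A minimal Python sketch of that bookkeeping follows; it is an illustration only, the function name day_ticks is made up, and the timestamp format is the one from the question.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d--%H:%M:%S"  # format used in the question's data file


def day_ticks(timestamps):
    """Return (row_index, label) for the first row of each calendar day."""
    ticks = []
    previous_day = None
    for index, stamp in enumerate(timestamps):
        moment = datetime.strptime(stamp, TIME_FORMAT)
        if moment.date() != previous_day:
            ticks.append((index, moment.strftime("%a %d")))
            previous_day = moment.date()
    return ticks


stamps = ["2013-01-30--15:58:14", "2013-01-30--15:59:45", "2013-01-31--09:30:00"]
print(day_ticks(stamps))
```

Because the x-coordinate is the row index, gaps between trading sessions take up no space on the axis, while the labels still show the dates.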
Script: (works for gnuplot>=5.0.0, Jan. 2015)
### skip non-trading hours
reset session
FILE = "SO14618708.dat"
myTimeFmt = "%Y-%m-%d--%H:%M:%S"
# create some random test data
set print FILE
t0 = time(0)
y0 = 100
do for [i=0:400] {
t = t0 + i*1800
isOpen(t) = tm_wday(t)>0 && tm_wday(t)<6 && tm_hour(t)>=9 && tm_hour(t)<=17
if (isOpen(t)) {
print sprintf("%s %g",strftime(myTimeFmt,t),y0=y0+rand(0)*2-1)
}
}
set print
set format x "%a\n%d" timedate
set grid x,y
set ytics 5
set key noautotitle
set multiplot layout 3,1
set title "with non-trading hours"
plot FILE u (timecolumn(1,myTimeFmt)):2 w l lc "red"
set title "without non-trading hours, but possible duplicates in day tics"
set format x "\n" timedate
myXtic(col) = strftime("%a\n%d",strptime(myTimeFmt,strcol(col)))
N = 15
plot FILE u 0:2 w l lc "web-green", \
'' u ($0*N):(NaN):xtic(myXtic(1)) every N
N = 1
set title sprintf("with tics only every Nth day (here: N=%d)",N)
SecPerDay = 3600*24
isNewDay(col) = (t0=t1,t1=timecolumn(col,myTimeFmt),t0!=t0 || int(t1)/SecPerDay-int(t0)/SecPerDay>0)
everyNthNewDay(col) = (isNewDay(col) ? d0=d0+1 : 0, d0==N ? (d0=0,1) : 0)
myXtic(col) = everyNthNewDay(col) ? strftime("%a\n%d",t1) : NaN
plot FILE u 0:2 w l lc "blue", \
t1=(d0=0,NaN) '' u 0:(NaN):xtic(myXtic(1))
unset multiplot
### end of script
Result:
Script: (version for the time of OP's question. Works for gnuplot>=4.6.0, March 2012)
Creating reasonable time and string test data is difficult in gnuplot 4.6, so that part is skipped here; it is assumed that you have a suitable datafile.
Although, in the lowest plot, I've only managed either to not display the very first tic (Thu 22) or to show it incorrectly.
### skip non-trading hours
reset
FILE = "SO14618708.dat"
myTimeFmt = "%Y-%m-%d--%H:%M:%S"
set format x "%a\n%d"
set grid x,y
set ytics 5
set key noautotitle
set timefmt "%Y-%m-%d--%H:%M:%S"
set xdata time
set multiplot layout 3,1
set title "with non-trading hours"
plot FILE u (timecolumn(1)):2 w l lc rgb "red"
set title "without non-trading hours, but possible duplicates in day tics"
set format x "\n"
myXtic(col) = strftime("%a\n%d",strptime(myTimeFmt,strcol(col)))
N = 15
plot FILE u 0:2 w l lc rgb "web-green", \
'' u ($0*N):(NaN):xtic(myXtic(1)) every N
N = 1
set title sprintf("with tics only every Nth day (here: N=%d)",N)
SecPerDay = 3600*24
isNewDay(col) = (t0=t1,t1=strptime(myTimeFmt,strcol(col)),(t0!=t0) || ((int(t1)/SecPerDay-int(t0)/SecPerDay)>0))
everyNthNewDay(col) = (isNewDay(col) ? d0=d0+1 : 0, d0==N ? (d0=0,1) : 0)
myXtic(c) = c ? strftime("%a\n%d",t1) : ' '
plot FILE u 0:2 w l lc rgb "blue", \
t1=(d0=0,NaN) '' u ((c=everyNthNewDay(1)) ? $0 : NaN):(NaN):xtic(myXtic(c)) w p
unset multiplot
### end of script
Result: (created with gnuplot4.6.0)

Normalizing histogram bins in gnuplot

I'm trying to plot a histogram whose bins are normalized by the number of elements in the bin.
I'm using the following
binwidth=5
bin(x,width)=width*floor(x/width) + binwidth/2.0
plot 'file' using (bin($2, binwidth)):($4) smooth freq with boxes
to get a basic histogram, but I want the value of each bin to be divided by the size of the bin. How can I go about this in gnuplot, or using external tools if necessary?
In gnuplot 4.4, functions take on a different property, in that they can execute multiple successive commands and then return a value (see gnuplot tricks). This means that you can actually calculate the number of points, n, within the gnuplot file without having to know it in advance. This code runs for a file, "out.dat", containing one column: a list of n samples from a normal distribution:
binwidth = 0.1
set boxwidth binwidth
sum = 0
s(x) = ((sum=sum+1), 0)
bin(x, width) = width*floor(x/width) + width/2.0
plot "out.dat" u ($1):(s($1))
plot "out.dat" u (bin($1, binwidth)):(1.0/(binwidth*sum)) smooth freq w boxes
The first plot statement reads through the datafile and increments sum once for each point, plotting a zero.
The second plot statement actually uses the value of sum to normalise the histogram.
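The normalization used here gives each sample a weight of 1/(binwidth*n), so the bar areas add up to 1. A short Python sketch of the same computation may make this easier to verify; the function name density_histogram is hypothetical.

```python
import math
from collections import Counter


def density_histogram(samples, binwidth):
    """Map each bin centre to count / (n * binwidth), so the bar areas sum to 1."""
    counts = Counter(math.floor(x / binwidth) for x in samples)
    n = len(samples)
    return {binwidth * k + binwidth / 2.0: c / (n * binwidth)
            for k, c in sorted(counts.items())}


hist = density_histogram([0.2, 0.3, 1.2, 2.7], binwidth=1.0)
print(hist)
# Total area: sum of (height * binwidth) over all bins
print(sum(height * 1.0 for height in hist.values()))
```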
In gnuplot 4.6, you can count the number of points with the stats command, which is faster than plot. You then do not need the trick s(x)=((sum=sum+1),0) at all: after running stats 'out.dat' u 1, the count is available directly in the variable STATS_records.
Here is how I would do it, with n=500 random Gaussian variates generated from R with the following command:
Rscript -e 'cat(rnorm(500), sep="\\n")' > rnd.dat
I use much the same idea as yours for defining a normalized histogram, where y is defined as 1/(binwidth * n), except that I use int instead of floor and I didn't recenter at the bin value. In short, this is a quick adaptation from the smooth.dem demo script, and a similar approach is described in Janert's textbook, Gnuplot in Action (Chapter 13, p. 257, freely available). You can replace my sample data file with random-points, which is available in the demo folder coming with Gnuplot. Note that we need to specify the number of points, as Gnuplot has no counting facilities for records in a file.
bw1=0.1
bw2=0.3
n=500
bin(x,width)=width*int(x/width)
set xrange [-3:3]
set yrange [0:1]
tstr(n)=sprintf("Binwidth = %1.1f\n", n)
set multiplot layout 1,2
set boxwidth bw1
plot 'rnd.dat' using (bin($1,bw1)):(1./(bw1*n)) smooth frequency with boxes t tstr(bw1)
set boxwidth bw2
plot 'rnd.dat' using (bin($1,bw2)):(1./(bw2*n)) smooth frequency with boxes t tstr(bw2)
unset multiplot
Here is the result, with two bin widths:
Besides, this really is a rough approach to histograms, and more elaborate solutions are readily available in R. Indeed, the problem is how to define a good bin width, and this issue has already been discussed on stats.stackexchange.com: using the Freedman-Diaconis binning rule should not be too difficult to implement, although you'll need to compute the inter-quartile range.
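The Freedman-Diaconis rule sets the bin width to 2*IQR/n^(1/3). A minimal Python sketch of that formula is shown below; the function name is made up for this example, and quartile conventions differ between tools, so the result need not match R's digit for digit.

```python
import statistics


def freedman_diaconis_width(samples):
    """Bin width 2 * IQR / n**(1/3) (Freedman-Diaconis rule)."""
    # statistics.quantiles (Python 3.8+) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    return 2.0 * (q3 - q1) / len(samples) ** (1.0 / 3.0)


print(freedman_diaconis_width(list(range(1, 28))))
```

The resulting value could be passed to gnuplot as the binwidth variable used in the scripts above.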
Here is how R would proceed with the same data set, with default option (Sturges rule, because in this particular case, this won't make a difference) and equally spaced bin like the ones used above.
The R code that was used is given below:
rnd <- scan("rnd.dat")  # read the samples generated above
par(mfrow=c(1,2), las=1)
hist(rnd, main="Sturges", xlab="", ylab="", prob=TRUE)
hist(rnd, breaks=seq(-3.5,3.5,by=.1), main="Binwidth = 0.1",
xlab="", ylab="", prob=TRUE)
You can even look at how R does its job, by inspecting the values returned when calling hist():
> str(hist(rnd, plot=FALSE))
List of 7
$ breaks : num [1:14] -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 ...
$ counts : int [1:13] 1 1 12 20 49 79 108 87 71 43 ...
$ intensities: num [1:13] 0.004 0.004 0.048 0.08 0.196 0.316 0.432 0.348 0.284 0.172 ...
$ density : num [1:13] 0.004 0.004 0.048 0.08 0.196 0.316 0.432 0.348 0.284 0.172 ...
$ mids : num [1:13] -3.25 -2.75 -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 ...
$ xname : chr "rnd"
$ equidist : logi TRUE
- attr(*, "class")= chr "histogram"
All that to say that you can use R results to process your data with Gnuplot if you like (although I would recommend using R directly :-).
Another way of counting the number of data points in a file is by using a system command. This proves useful if you are plotting multiple files, and you don't know the number of points beforehand. I used:
countpoints(file) = system( sprintf("grep -v '^#' %s| wc -l", file) )
file1count = countpoints (file1)
file2count = countpoints (file2)
file3count = countpoints (file3)
...
The countpoints function avoids counting lines that start with '#'. You would then use the already mentioned functions to plot the normalized histogram.
Here's a complete example:
n=100
xmin=-50.
xmax=50.
binwidth=(xmax-xmin)/n
bin(x,width)=width*floor(x/width)+width/2.0
countpoints(file) = system( sprintf("grep -v '^#' %s| wc -l", file) )
file1count = countpoints (file1)
file2count = countpoints (file2)
file3count = countpoints (file3)
plot file1 using (bin(($1),binwidth)):(1.0/(binwidth*file1count)) smooth freq with boxes,\
file2 using (bin(($1),binwidth)):(1.0/(binwidth*file2count)) smooth freq with boxes,\
file3 using (bin(($1),binwidth)):(1.0/(binwidth*file3count)) smooth freq with boxes
...
Simply
plot 'file' using (bin($2, binwidth)):($4/$4) smooth freq with boxes
