Gnuplot CCDF plotting and log-log scale - gnuplot

My data file is a single column of sorted values:
1
1
2
2
2
3
...
999
1000
1000
I can plot the CDF with a command like this (assuming 10000 lines in the file):
plot "file" using 1:(1/10000.) smooth cumulative title "CDF"
I can also put the x axis on a log scale with:
set logscale x
My problem is: how can I plot a CCDF with gnuplot?
In addition, the CDF with a log-log scale (set logscale xy) does not give me any output. What if I want a log-log CCDF plot?
Many thanks!

I found a workaround for this problem, because I do not think you can plot a CCDF using gnuplot alone.
Briefly, I preprocessed my data with bash to create a dataset in which the cumulative counts are explicit; gnuplot can then simply plot the new dataset. As an example, assuming that your file contains the (numerical) values you want to accumulate, I would run this in a bash environment:
cat data | sort -n | uniq --count | awk 'BEGIN{sum=0}{sum=sum+$1; print $2,$1,sum}' > parsed.dat
This command reads the dataset (cat data), sorts the numerical data using their value (sort -n), counts the occurrences of each sample (uniq --count) and creates a new dataset, calculating as well the cumulative sum of each data value (the awk command).
This new dataset contains 3 columns: the first column ($1 in gnuplot) contains the unique values of your dataset, the second column ($2) contains the number of occurrences of each value, and the third column ($3) contains the cumulative sum of those counts.
Finally, in gnuplot, you can do this:
stats "parsed.dat" using 3;
plot "parsed.dat" using 1:($3/STATS_max) with lines title "CDF",\
"" using 1:(1-$3/STATS_max) with lines title "CCDF",\
"" using 1:($2/STATS_max) with boxes title "PDF"
The stats command of gnuplot analyzes the third column (the one with the cumulative sum) and stores the values to some variables. STATS_max is the max value of this column (so it is the final cumulative sum). Now you have all the data you need to plot not only the CDF, but also the CCDF (which is 1 - CDF) and also the PDF (or the normalized histogram, for discrete values).
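For the log-log part of the question, note that 1 - CDF reaches exactly 0 at the largest value, and a logarithmic y axis cannot show a zero. Below is a minimal Python sketch of the same preprocessing idea that also emits P(X >= x), which stays positive. Treat it as an illustration under assumptions rather than the method used above: the function name ccdf_table and the choice of P(X >= x) as the CCDF variant are mine, not part of the original answer.

```python
from collections import Counter


def ccdf_table(values):
    """Return rows (value, count, cdf, ccdf) for the distinct values, ascending.

    cdf  = P(X <= value)
    ccdf = P(X >= value)  (assumed variant; never zero, so it survives a log scale)
    """
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    seen = 0  # number of samples strictly below the current value
    for value in sorted(counts):
        n = counts[value]
        rows.append((value, n, (seen + n) / total, (total - seen) / total))
        seen += n
    return rows


if __name__ == "__main__":
    # Tiny stand-in for the single-column data file from the question.
    sample = [1, 1, 2, 2, 2, 3]
    for value, n, cdf, ccdf in ccdf_table(sample):
        print(value, n, cdf, ccdf)
```

If the printed rows are redirected to a file, the fourth column could then be plotted in gnuplot after set logscale xy, for example with using 1:4.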

Related

Gnuplot - plotting series based on label in third column

I have data in the format:
1 1 A
2 3 ab
1 2 A
3 3 x
4 1 x
2 3 A
and so on. The third column indicates the series. In the case above there are 3 distinct data series: one designated A, another designated ab and the last designated x. Is there a way to plot the three data series from such a data structure in gnuplot without using e.g. awk? The difficulty here is that the number of categories (here denoted A, ab, x) is quite large and it is not feasible to write them out by hand.
I was thinking along the lines:
plot data u 1:2:3 w dots
but that does not work and I get warning: Skipping data file with no valid points (I tried quoted and unquoted versions of the third column). A similar question requires manually defining the palette, which is undesirable.
With a little bit of work you can make a list of unique categories from within gnuplot without using external tools. The following code snippet first assembles a list of the entire third column of the data file, and then loops over it to generate a list of unique category names. If memory use or processing time become an issue then one could probably combine these steps and avoid forming a single string with the entire third column.
delimiter = "#" # some character that does not appear in category name
categories = ""
stats "test.dat" using (categories = categories." ".delimiter.strcol(3).delimiter) nooutput
unique_categories = ""
do for [cat in categories] {
    if (strstrt (unique_categories, cat) ==0) {
        unique_categories = unique_categories." ".cat
    }
}
set xrange[0:5]
set yrange [0:4]
plot for [cat in unique_categories] "test.dat" using 1:(delimiter.strcol(3).delimiter eq cat ? $2 : NaN) title cat[2:strlen(cat)-1]
Take a look at the contents of the string variables categories and unique_categories to get a better idea of what this code does.
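To make the logic easier to follow outside gnuplot, here is a rough Python sketch of the same de-duplication idea. It is only an illustration: the function name unique_categories and the sample labels are assumptions, and the substring test stands in for strstrt(...) == 0. It also shows why each name is wrapped in the delimiter: without it, a category such as "a" would be mistaken for part of "ab".

```python
def unique_categories(labels, delimiter="#"):
    """Build a space-separated string of delimiter-wrapped unique labels.

    Mirrors the gnuplot loop: a label is appended only when the wrapped
    form is not yet a substring of the accumulated string.
    """
    unique = ""
    for label in labels:
        cat = delimiter + label + delimiter
        if cat not in unique:  # plays the role of strstrt(unique, cat) == 0
            unique = unique + " " + cat
    return unique


if __name__ == "__main__":
    # Third column of the example data in the question.
    result = unique_categories(["A", "ab", "A", "x", "x", "A"])
    print(result)
    # Strip the delimiters again, like cat[2:strlen(cat)-1] in gnuplot.
    print([cat[1:-1] for cat in result.split()])
```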

Scatter plot for every pairs in a two column matrix

I have a matrix which contains the atom numbers of the pairs of atoms which are in contact with each other. My matrix is like this:
column 1: atom number i;
column 2: atom number j
i,j runs from 1 to 800.
If there is a pair i-j in the matrix, place a dot corresponding to the position (i,j) of the matrix.
How do I plot such matrix?
Example:
A= [1,3; 3,8; 3,1; 6,2; 2,6; 1,2; 5,2; 8,3; 2,5; 2,1]
I want to plot the matrix A, where the X and Y axes run from 1 to 8, placing a dot for every combination of X and Y that is present in A.
I want a plot like this:
Isn't this just a scatter plot?
If your m x 2 matrix is saved in a text file then this is trivial.
Here are the contents of an example data file "input.dat":
4 3
3 4
5 3
3 5
8 2
2 8
All you need to do is open the data file in xmgrace using xmgrace input.dat.
Initially it will be a line plot. Go to 'Plot' > 'Set Appearance' and, with the only set already selected, set 'Symbol Properties' 'Type:' to Diamond and 'Line Properties' 'Type:' to None. Setting the symbol fill to solid red, tweaking the axis ranges and showing major tick grid lines will then give a plot like the one you gave as an example.
You can save a parameters file and in future load the parameters at the beginning using
xmgrace -param template.par input2.dat.
But, having said all this, why not just plot it in matlab?

gnuplot auto sorts times

I have a file which looks as follows:
19:40:47,2772
19:41:50,2896
19:42:50,2870
19:43:51,2851
19:44:53,2824
19:45:55,2891
.
.
.
07:52:53,2772
07:53:56,2767
07:55:00,2709
07:56:01,2713
07:57:04,2844
07:58:04,2750
07:59:05,2744
08:00:08,2812
08:01:11,2728
08:02:14,2852
and I'm trying to do the simple task of making a graph with time on the X axis and the number on the Y axis.
The code is as follows:
#!/usr/bin/gnuplot
unset multiplot
set xdata time
set datafile separator ","
set timefmt "%H:%M:%S"
set format x "%H:%M"
set title "defect number"
set xlabel "X"
set ylabel "Y"
plot "Defect_number_03-03-16_08.04.49.csv" using 1:2 w lines
pause -1
The problem is that gnuplot auto-sorts the times and my chart looks like this:
I want to make a chart according to the order in the file; any help would be great =)
When you give the plot command
plot datafile u 1:2
you are telling gnuplot that the first column is your x-value and the second is your y-value. Naturally, earlier times are further to the left (as you didn't post your full data, I have used only the part you did post - this will cause a "skip" in the axis labels).
You can use a pseudocolumn to use the line number as your x-value. The 0 column corresponds to the line number (see help pseudocolumns).
Thus plot datafile u 0:2 will use the line number as the x-coordinate and the 2nd column as the y-coordinate.
We still need to add the correct x-axis labels, and can't rely on them to be generated correctly in this case. We would use the xtic function to do this (see note 1):
plot datafile u 0:2:xtic(1)
which tells gnuplot to use the value in column 1 as an xtic, but it will read this literally and not format it as you have desired with the time. To do this, we can manually cast this to the correct string
plot datafile u 0:2:xtic(strftime("%H:%M",strptime("%H:%M:%S",strcol(1)))) w lines
Here, the strcol function reads column 1 as a string, the strptime function turns this into the internal time representation using the specified format string for reading it, and finally the strftime formats this as time string using the specified output string.
As Christoph stated in his answer, these solutions will cause uniform spacing of the points. If the points are already uniformly spaced, this is not a problem, and if the points are very close to uniformly spaced, it is probably acceptable as well (it looks like your points are about 1 minute apart, give or take a couple of seconds).
However, if we want the absolutely correct spacing, we will need to add a date to the lines. This could be done in the original data file during the creation, or we could use an external process to add the dates only when needed leaving the original file exactly the same.
As you are only marking off the time and not the day in your tic marks, the actual day doesn't matter. It only matters that the times from the next morning fall on the day after the times from the previous night.
We can use an external program to add dates. The following python 3 program reads the data file and adds a date to it (using Jan 1st, 2015 for the first date - as previously mentioned this date doesn't really matter). If a time occurs earlier in the day from the previous one, it moves to the next day. Here is the program adddates.py:
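from datetime import datetime, timedelta
from sys import argv

last = None
offset = timedelta(days=0)  # whole days added so far
with open(argv[1], "r") as infile:
    for line in infile:
        vals = line.split(",")
        # Attach an arbitrary start date, shifted by the days accumulated so far.
        dte = datetime.strptime("01/01/2015 " + vals[0], "%m/%d/%Y %H:%M:%S") + offset
        # A time earlier than the previous one means we crossed midnight.
        if last is not None and last > dte:
            offset += timedelta(days=1)
            dte += timedelta(days=1)
        last = dte
        # vals[1] still ends with the newline, hence end="".
        print(dte.strftime("%Y-%m-%d %H:%M:%S"), vals[1], sep=",", end="")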
The output from running this on the data file looks like:
2015-01-01 19:40:47,2772
2015-01-01 19:41:50,2896
2015-01-01 19:42:50,2870
2015-01-01 19:43:51,2851
2015-01-01 19:44:53,2824
2015-01-01 19:45:55,2891
...
2015-01-02 07:52:53,2772
2015-01-02 07:53:56,2767
...
We can now read data from this program by opening a pipe in our plot command.
set timefmt "%Y-%m-%d %H:%M:%S"
plot "< python3 adddates.py datafile" u 1:2 with lines
Note 1: This also causes labels to overlap, as it uses all of them. To use every other one, we could have used xtic(int($0) % 2 == 0 ? strcol(1):""). A similar technique can be used with the version that formats the labels correctly.
A proper solution is to save your data with full date and time, or as timestamps.
All other solutions that use $0 and label the xtics with xticlabel require your data to be spaced equidistantly, which doesn't seem to be the case.
So, just save your data as e.g. UNIX timestamp and you can use all nice gnuplot features without fiddling.
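As a rough sketch of what "save as UNIX timestamp" could look like for the file in the question, the Python snippet below converts each HH:MM:SS entry to epoch seconds and moves to the next day whenever the clock time jumps backwards. The function name to_timestamps, the UTC time zone and the start date of 2015-01-01 are all assumptions made for illustration; the real recording date is not given in the question.

```python
from datetime import datetime, timedelta, timezone


def to_timestamps(lines, start_day=datetime(2015, 1, 1, tzinfo=timezone.utc)):
    """Convert 'HH:MM:SS,value' lines to (epoch_seconds, value) pairs.

    start_day is an arbitrary, assumed date; a clock time earlier than the
    previous one is interpreted as having crossed midnight.
    """
    out = []
    day = start_day
    last = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        clock, value = line.split(",", 1)
        t = datetime.strptime(clock, "%H:%M:%S")
        stamp = day.replace(hour=t.hour, minute=t.minute, second=t.second)
        if last is not None and stamp < last:
            day += timedelta(days=1)
            stamp += timedelta(days=1)
        last = stamp
        out.append((int(stamp.timestamp()), value))
    return out


if __name__ == "__main__":
    for seconds, value in to_timestamps(["19:40:47,2772", "07:52:53,2772"]):
        print(f"{seconds},{value}")
```

With data in this form, gnuplot should be able to read the first column with set timefmt "%s" and keep the true spacing between samples.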

Plotting multiple graphs depending on column value with gnuplot

I have the following data, which I want to plot using GNUPLOT:
#TIME #VALUE #SOURCE
1 100 A
1 88 B
2 115 A
2 100 B
3 130 A
3 210 B
I want to have two lines drawn, depending on the value of column #SOURCE. One line for A and one line for B. Is this possible with GNUPLOT and if yes how?
Is it also possible to draw a summation of column #VALUE over column #TIME? That is, for all equal entries in #TIME, the values in #VALUE would be summed up.
Thanks in advance,
Frank
One way to do it would be to use grep to locate lines ending with A or B and plot the result. You can do this in a single plot line with a for loop if you know the characters lines will end in:
plot for [s in 'A B'] sprintf("<(grep '%s$' data.dat)", s) u 1:2 w l
This plots the data you provided (saved in data.dat) as two different lines.
You could also change the for part to [s in 'word1 word2 word3'] or any other string you like. If you don't know which characters or words the lines end with, you would probably need two passes over the file: a first one to determine the string for the for loop and a second one to do the plotting.
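The first pass, and the per-time summation asked about in the question, could be sketched in Python roughly as follows. This is an illustration only, not part of the grep approach above: the function name summarize is hypothetical, and it assumes whitespace-separated TIME VALUE SOURCE rows with # marking comment lines.

```python
def summarize(lines):
    """Collect the distinct SOURCE labels and the sum of VALUE per TIME.

    Returns (sources, totals): sources keeps first-seen order, totals maps
    each TIME string to the summed VALUE.
    """
    sources = []
    totals = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        time, value, source = line.split()
        if source not in sources:
            sources.append(source)
        totals[time] = totals.get(time, 0.0) + float(value)
    return sources, totals


if __name__ == "__main__":
    data = """#TIME #VALUE #SOURCE
1 100 A
1 88 B
2 115 A
2 100 B
3 130 A
3 210 B""".splitlines()
    sources, totals = summarize(data)
    print(" ".join(sources))  # candidate string for the gnuplot for loop
    for time, total in totals.items():
        print(time, total)
```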

Gnuplot: binary plotting two 1D records against each other

I want to plot a single curve of data values versus time values, both of which come from a binary file. Both of these sets (data and time) are stored as 59, 4-byte floats. The data values are found first by skipping 52192 bytes into the file, then reading the next 59 values. The time values are stored similarly at 181676 bytes from the beginning of the file in a set of 59, 4-byte floats.
I'm able to plot each of these sets by themselves against the coordinate number, but I'm not able to plot the data versus the time. Here are the two lines that work as I would expect:
plot 'file.bin' binary endian=big record=59 skip=52192 using 1 title "data-1" with lines
plot 'file.bin' binary endian=big record=59 skip=181676 using 1 title "time-1" with lines
Here is how I'm trying to plot data-1 versus time-1 (the 129248 skip value is 181676 - 52192 - 4*59, i.e. the gap between the end of the data record and the start of the time record):
plot 'file.bin' binary endian=big record=59:59 skip=52192:129248 using 2:1 title "data vs time" with lines
However, this seems to concatenate the two records into the 1 column, rather than storing the first record in column 1 and the second in column 2. I'm not sure how to prevent this concatenation.
I've read Plotting 1D binary array, but it ultimately plots against the coordinate values instead of plotting the first record against the second.
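To make the layout concrete, here is a small Python sketch that reads the two records at the stated byte offsets and pairs them up as (time, data) rows, which could then be written out as two text columns. It is a workaround under assumptions, not a fix for the gnuplot binary syntax: the function names read_floats and pair_records are hypothetical, and the records are taken to be big-endian 4-byte floats as the endian=big option suggests.

```python
import struct


def read_floats(path, offset, count=59):
    """Read `count` big-endian 4-byte floats starting `offset` bytes into the file."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        raw = fh.read(4 * count)
    if len(raw) != 4 * count:
        raise ValueError("file is too short for the requested record")
    return struct.unpack(f">{count}f", raw)


def pair_records(path, data_offset=52192, time_offset=181676, count=59):
    """Return a list of (time, data) tuples built from the two records."""
    data = read_floats(path, data_offset, count)
    time = read_floats(path, time_offset, count)
    return list(zip(time, data))
```

Printing each pair on its own line would give an ordinary two-column text file that gnuplot can plot with using 1:2.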

Resources