Using the third column as a label for the segment range [duplicate]

This question already has an answer here:
gnuplot specify column for label
(1 answer)
Closed 4 years ago.
I have a plain-text file that contains N segments in this format:
#SegmentX SegmentY SegmentRangeX
100 100
300 100 200
where SegmentRangeX represents (x2-x1).
I need to plot N such segments, with a label next to each segment showing its SegmentRangeX. How do I fetch the value for the label?

Thanks to @Christoph, I found exactly what I wanted:
'' index i using 1:2:3 with labels offset 2 title ''

Related

How can I subtract the value in first row from all other values in the column?

I want to subtract the value in the first row of a file from all the other rows, and then plot the result. How can I do this kind of arithmetic in gnuplot?
Here is an example of what I want to do:
Let's say I have a file with two columns and 1000 rows. I want a script that subtracts the second-column value of the first row from every value in the second column.

I am pretty sure that there are similar questions on SO; however, they are apparently not so easy to find.
I would have searched for "normalization" or "offset".
The following example works even if you have single or double empty lines in your data. The expression in the plot command uses serial evaluation: the comma-separated parts are evaluated from left to right and the value of the last part is used (check help operators binary). On the first data row, y0 is set to the value of column 2; every row then yields $2-y0.
You might also see similar solutions using pseudocolumn 0 (check help pseudocolumns); however, these might lead to wrong results if you have empty lines in your data.
Script:
### offset data: subtraction of first value in a column
reset session
$Data <<EOD
0 10
1 11
2 12
3 13
4 14
5 15
6 16
7 17
EOD
plot t=0 $Data u 1:(t==0?y0=$2:0,t=t+1,$2-y0) w lp pt 7 lc "red"
### end of script
Result:

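For readers who find the serial-evaluation expression hard to follow, the same logic can be sketched in Python. This is only an illustration of what the gnuplot expression computes, not part of the original answer; the function name subtract_first and the sample input are hypothetical, and the sketch assumes whitespace-separated columns with blank and comment lines skipped.

```python
def subtract_first(lines):
    """Return (x, y - y0) pairs, where y0 is the y value of the first data row."""
    t = 0        # row counter, mirrors t in the gnuplot expression
    y0 = None    # first y value, assigned only on the first data row
    result = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # empty and comment lines do not advance the counter
        x, y = float(fields[0]), float(fields[1])
        if t == 0:
            y0 = y   # corresponds to t==0 ? y0=$2 : 0
        t += 1       # corresponds to t=t+1
        result.append((x, y - y0))  # corresponds to $2-y0
    return result


# Hypothetical sample with single and double empty lines in between
data = ["0 10", "1 11", "", "2 12", "", "", "3 13"]
print(subtract_first(data))
# [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
```

Because the counter only advances on real data rows, empty lines cannot change which row supplies the offset.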
dropna() not working for axis = 1 with the given threshold [duplicate]

This question already has answers here:
thresh in dropna for DataFrame in pandas in python
(3 answers)
Closed 2 years ago.
For the given dataset
I performed a dropna on axis=1 with thresh=2:
df.dropna(thresh=2,axis=1)
The output was
which does not seem correct. I expected the columns with index 1 and 2 to be dropped, given that both columns have two or more NaN occurrences.
The code works perfectly fine with axis=0.

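Try df.dropna(thresh=6,axis=1) on the same DataFrame. The thresh argument is the minimum number of non-NaN values a column must contain to be kept, not the number of NaN values that causes it to be dropped.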
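The rule behind thresh can be sketched in plain Python. This is a minimal illustration of the documented behavior (keep a column only if it has at least thresh non-NaN values), not pandas' actual implementation; the helper drop_columns_by_thresh and the sample data are hypothetical, since the original dataset is not shown.

```python
import math


def drop_columns_by_thresh(columns, thresh):
    """Keep only the columns that have at least `thresh` non-NaN values."""
    kept = {}
    for name, values in columns.items():
        non_nan = sum(1 for v in values if not math.isnan(v))
        if non_nan >= thresh:
            kept[name] = values
    return kept


nan = float("nan")
# Hypothetical data: 8 rows, columns 1 and 2 contain several NaNs.
columns = {
    0: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],  # 8 non-NaN values
    1: [1.0, nan, nan, 4.0, nan, 6.0, 7.0, 8.0],  # 5 non-NaN values
    2: [nan, nan, 3.0, nan, nan, 6.0, nan, 8.0],  # 3 non-NaN values
}

print(list(drop_columns_by_thresh(columns, thresh=2)))  # [0, 1, 2]
print(list(drop_columns_by_thresh(columns, thresh=6)))  # [0]
```

With thresh=2 every column survives because each has at least two valid values, which matches the behavior the question describes; raising thresh is what removes the sparse columns.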

Plot all columns in a file using gnuplot without specifying number of columns

I have a large number of data files which I want to plot using gnuplot. The files are plain text, organized in multiple columns. I want gnuplot to plot all columns in a given file without having to specify which columns to plot, or even the total number of columns in the file, since the number of columns varies between my files. Is there some way I could do this using gnuplot?

There are different ways you can go about this, some more and some less elegant.
Take the following file data as an example:
1 2 3
2 4 5
3 1 3
4 5 2
5 9 5
6 4 2
This has 3 columns, but you want to write a general script without the assumption of any particular number. The way I would go about it would be to use awk to get the number of columns in your file within the gnuplot script by a system() call:
N = system("awk 'NR==1{print NF}' data")
plot for [i=1:N] "data" u 0:i w l title "Column ".i
Say that you don't want to use a system() call and know that the number of columns will always be below a certain maximum, for instance 10:
plot for [i=1:10] "data" u 0:i w l title "Column ".i
Then gnuplot will complain about non-existent data but will plot columns 1 to 3 nonetheless.

Now you can use the "*" symbol as an open-ended upper limit of the iteration:
plot for [i=1:*] 'data' using 0:i with lines title 'Column '.i

How to get the value of a specific column in a specific line in any time of processing in gnuplot?

I have a data file in a format like this:
# begin
16 1
15 2
14 3
13 4
12 5
11 6
Now I want to use gnuplot to draw a line through the points:
(1, (16/16)) (2, (16/15)) (3, (16/14)) ... (6, (16/11))
As you can see, the x axis covers the range [1:6], and each y value is obtained by dividing the number in the first column of the first line (i.e. 16 in this example) by the number in the first column of each line.
The problem is that I don't know how to get the value in the first column of the first line (16), so that I could do something like
plot "datafile" using 2:(16/$1) with linespoints
I have searched a lot for how to achieve this, but with no luck. It seems that gnuplot doesn't provide a flexible way to select arbitrary data. Any ideas how to do that? Or am I just stuck on a not-so-common problem?
Thanks for your help in advance.

You can use the stats command to extract a single numerical value from your data file. The row is selected with the every option, the column with using. Since only one value is selected, STATS_min is that value:
col = 1
row = 0
stats 'datafile' every ::row::row using col nooutput
value = STATS_min
plot "datafile" using 2:(value/$1) w lp
Note that column numbering starts at 1 and row numbering at 0 (comment lines are skipped and aren't counted).

Gnuplot: Plotting multiple series on graph, but number of different series to overlay unknown ahead of time

I am trying to write a script wrapping gnuplot that will take a dataset and produce an overlaid graph, where the number of series to be plotted is based on the number of distinct values in a given column, or on the number of different datasets in the file. An example file would be:
#SeriesName x y
Series1 0 10
Series1 1 11
Series1 2 13
...
SeriesN 0 14
SeriesN 1 19
SeriesN 2 15
I have this in one continuous set of lines, but I can split it into index-able chunks if necessary. The problem is that I don't know the different SeriesName values ahead of time, nor how many distinct values there will be. But I want one line on the graph per distinct value of SeriesName. I can see how to make graphs if I know the different values of SeriesName ahead of time, but I don't know how to tell gnuplot to "make one line per value of series, and label each line with the name that is the value of SeriesName that was used for each line."
Can gnuplot do this? Otherwise, I can make two passes through the data, gathering the unique values of SeriesName in the first pass, and then use bash/perl/python to explicitly build a 'plot' statement, but it seems like gnuplot should have some functionality that lets a user avoid that. Am I missing something?
Thanks in advance.
Update: I also posted to a forum where the author of Gnuplot in Action (Philipp Janert) posts, and I posted a workaround to my own problem there. I don't think it qualifies as an answer, because what it ultimately does is make a second run through the data and then apply a source code filter to gnuplot commands to make a gnuplot script that fits a particular dataset. I would think that there is a better answer using just gnuplot syntax. For reference, here is the link: http://www.manning-sandbox.com/thread.jspa?messageID=122752#122752

Just for the record, here is a solution which works with gnuplot>=4.4.0 and gnuplot 5.x.
Whenever the series label in column 1 changes, the new label is appended to a string. This string is used afterwards to create the legend.
Data: SO8812078.dat
#SeriesName x y
Series1 0 10
Series1 1 11
Series1 2 13
Series2 0 12
Series2 1 13
Series2 2 14
SeriesN 0 14
SeriesN 1 19
Script: (works with gnuplot>=4.4.0, March 2010)
### take legend from column
reset
FILE = "SO8812078.dat"
myTitles = ''
set key noautotitle
plot t1='' FILE u (t0=t1,t1=strcol(1),t0 ne t1?myTitles=myTitles.' '.t1:0,$2):3:(words(myTitles)) w lp pt 7 lc var, \
for [i=0:words(myTitles)] 1/0 w lp pt 7 lc i ti word(myTitles,i)
### end of script
Result: (created with gnuplot 4.4.0)
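For comparison, the two-pass alternative mentioned in the question can be sketched in Python. This is only one possible shape of such a wrapper, not the asker's actual workaround: the function name build_plot_command is hypothetical, and the sketch assumes that the rows of each series are contiguous in the file, so that filtering the other rows with 1/0 does not interrupt the line of a series.

```python
def build_plot_command(path, lines):
    """Build a gnuplot plot command with one curve per distinct series name."""
    names = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        if fields[0] not in names:
            names.append(fields[0])  # keep the order of first appearance
    # One plot element per series: rows of other series become undefined (1/0).
    parts = [
        "'{0}' using 2:(strcol(1) eq '{1}' ? $3 : 1/0) "
        "with linespoints title '{1}'".format(path, name)
        for name in names
    ]
    return "plot " + ", \\\n     ".join(parts)


# First pass over the data (here given inline instead of read from a file)
sample = [
    "#SeriesName x y",
    "Series1 0 10",
    "Series1 1 11",
    "Series2 0 12",
    "Series2 1 13",
]
print(build_plot_command("SO8812078.dat", sample))
```

The printed command can be written to a script file or piped to gnuplot. The gnuplot-only solution above avoids this extra pass by collecting the titles while the data is being read.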
