the following data sets are generated from program:
1 **1 0.11111**
1 **2 0.22222**
1 **3 0.33333**
1 **4 0.44444**
2 1 0.00185
2 2 0.00005
2 3 0.12355
2 4 0.68124
3 1 0.54875
3 2 0.62155
3 3 0.35895
3 4 0.41588
My question: How do I plot the first 4 row(bold) in 2-dimensional figure? i.e. the following point should be plotted:
(1, 0.11111)
(2, 0.22222)
(3, 0.33333)
(4, 0.44444)
I know I can use "index" directive to plot multiple data sets, if so, double blank lines must come up from the file (in order to distinguish data sets). But I don't want any blank lines to come up. thanks
you can use every ::0::3 to plot up to the 4th row: Gnuplot plotting data from a file up to some row, and using 2:3 to plot using the 2nd and 3rd column.
Related
I have a dataset in a table format that looks like this:
test frequency
1 test40 3
2 test33 5
3 test19 2
4 test4521 1
5 test34 1
6 test27 3
7 test42 3
8 test35 1
....
If I use this command:
library(ggplot2)
ggplot(t, aes("frequency")) +
geom_histogram()
("t" is the name of my table)
Then RStudio says: "StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"?"
I just want to see how many times a 3 or a 5 etc. occurs.
Thanks for your help.
It looks like your data is already aggregated? Maybe the ggplot2::geom_histogram() function might not appropriate for you to use? Have you tried the geom_col() function? This simply takes the numbers declared in the input data frame, and displays a column plot with that data.
Using the below code
# Declare data frame
t <- data.frame(test = c("test40", "test33", "test19", "test4521",
"test34", "test27", "test42", "test35"),
frequency = c(3, 5, 2, 1,
1, 3, 3, 1))
returns the data frame like this
# View data
print(t)
test frequency
1 test40 3
2 test33 5
3 test19 2
4 test4521 1
5 test34 1
6 test27 3
7 test42 3
8 test35 1
and therefore you can plot it like this
# Load package
library(ggplot2)
# Generate column plot
ggplot(t, aes(test, frequency)) +
geom_col()
If you simply wanted a count of the times that the number 2 or the number 3 occurred in your data frame, then yes the geom_histogram() is the correct function to use. See, the geom_histogram() function counts the frequency that a term occurs in the data frame, then returns the result. It has an internal validation that looks at the type of data that you are trying to plot across the x-axis, and notices that if it is discrete, then you need to parse the parameter stat="count" in the function. If you don't include this parameter, then ggplot will try to bin your data to create the histogram, which is illogical because all you want is a count.
Check out this link for a description of the difference between continuous and discrete data: What is the difference between discrete data and continuous data?
With this in mind, you can plot the histogram like this
# Generate histogram plot
ggplot(t, aes(frequency)) +
geom_histogram(stat="count")
I hope that helps mate.
How frustrating is Excel.. working on this for half an hour now.
I simply try to make a frequency plot of two groups, with different colours. On the x-axis I would like to display the subject.ids per bar.
However, if I select a different range for the horizontal x axis per series (series 1 = blue, series 2 = orange) with the subject id, it changes the x-axis in the other series to the same. What in hell am i doing wrong?
3007 1
23121 1
3009 1
3005 1
3011 2
23171 2
3207 2
3102 3
3207 6
13302 7
2411 11
23191 11
3008 11
3106 12
110031 1
110031 1
110030 1
110017 1
110014 1
110008 1
110004 1
110007 2
110035 4
110020 4
110003 4
110036 10
110019 11
110015 21
AFAIK, you cannot put 2 series onto the x axis.
You have 2 alternate ways to solve your problem:
Concatenate each positional pair into a new column and use this as the x-axis label series. It will look like this:
You could use data labels for each series. However, this will add the data to the columns themselves and not the axis (you could put it at the base of the column). To do so, you will need to right click on the graph, select 'Add Data Labels'. By default it adds the value as the label, but you can select the labels, right click to format the data labels and use the 'values from cells' option. Once you do this and play around with the orientation and location of the labels, it will look like this:
For simplicity, I'd go with the first method
Adding a 3rd option; simply put the columns for the axis labels beside each other and when selecting the Data for the Axis Labels, just select both columns instead of the usual 1. It will look like this:
I have data like the following:
x y f
1 1 1.2
1 2 1.4
1 3 1.6
3 1 3.2
3 2 3.4
3 3 3.6
5 1 5.2
5 2 5.4
5 3 5.6
If you insert a pivot chart, you can plot f vs x and y using a line chart, and the plot has two stacked x-axes where the lower x-axes values are 1 3 5 corresponding to x, and the upper x-axes has values 1 2 3 for each value of the lower x-axes, representing x = 1 and y = 1 2 3, then x = 2 and y = 1 2 3, and x = 3 and y = 1 2 3. The plot should show a single continuous line from left to right. What I would like is for the line to break when x changes values, so there are three short lines showing the influence of y for constant values of x.
This link makes a chart similar to what I'm describing in the answer. In terms of that figure, what I want is for the link to break every time the year changes. But the answer they have, and discussion doesn't get what I'm looking for. The only approach that I can think of is to modify the PivotTable data by hand and add a row at the location the data breaks. I tried to do something like that at work, but before modifying the table, I copied the table as values to a separate location. With the new data table, I was not able to create the plot with two x axis. If I created the plot, I could put a second value in when y = 3, and for f have NA(), which should create the break in the proper location.
For something that looks like:
Select each of the second and subsequent y 1 values (individually):
and Format Data Point..., Line, No line.
(BTW IMO better suited to Super User.)
I have large number of files of data which I want to plot using gnuplot. The files are in text form, in the form of multiple columns. I wanted to use gnuplot to plot all columns in a given file, without the need for having to identify the number of the columns to be plotted or even then total number of columns in the file, since the total number of columns tend to vary between the files I am having. Is there some way I could do this using gnuplot?
There are different ways you can go about this, some more and some less elegant.
Take the following file data as an example:
1 2 3
2 4 5
3 1 3
4 5 2
5 9 5
6 4 2
This has 3 columns, but you want to write a general script without the assumption of any particular number. The way I would go about it would be to use awk to get the number of columns in your file within the gnuplot script by a system() call:
N = system("awk 'NR==1{print NF}' data")
plot for [i=1:N] "data" u 0:i w l title "Column ".i
Say that you don't want to use a system() call and know that the number of columns will always be below a certain maximum, for instance 10:
plot for [i=1:10] "data" u 0:i w l title "Column ".i
Then gnuplot will complain about non-existent data but will plot columns 1 to 3 nonetheless.
Now you can use "*" symbol:
plot for [i=1:*] 'data' using 0:i with lines title 'Column '.i
I got a data file in the format like this:
# begin
16 1
15 2
14 3
13 4
12 5
11 6
Now I want to use gnuplot to draw a line through the points:
(1, (16/16)) (2, (16/15)) (3, (16/14)) ... (6, (16/11))
As you see, the x axis is the range [1:6] and the Y axis corresponds the values obtained from the number in the first line at the first column(ie. 16 in this example) divided by the number in each line at the first column.
The problem is that I don't know how to get the value of the number at the first column in the first line (16), so that I could do something like
plot "datafile" using 2:(16/$1) with linespoints
I have done a lot of search about how to achieve that but with no luck. It seems that gnuplot doesn't provide some flexible ways to allow arbitrary data selection. Any ideas how to do that? Or maybe I just got stuck into a not so common problem?
Thanks for your help in advance.
You can use the stats command to extract a single numerical value from your data file. The row is selected with the every option, the column with the using:
col = 1
row = 0
stats 'datafile' every ::row::row using col nooutput
value = STATS_min
plot "datafile" using 2:(value/$1) w lp
Note, that column numbering starts at 1, and row numbering at 0 (comment lines are skipped and aren't counted).