How to build a scatter graph in Excel with the average y value for each x value - excel

I am not sure this is the best place to ask,
but I have summarized my program's performance data in an Excel file and I want to build a scatter graph.
For each x value I have 6 y values, and I want the graph to show the average of those 6 for each x.
Is there a way to do this in Excel?
For example: I have
X Y
1 0.2
1 0
1 0
1 0.8
1 1.4
1 0
2 0.2
2 1.2
2 1
2 2.2
2 0
2 2.2
3 0.8
3 1.6
3 0
3 3.6
3 1.2
3 0.6
For each x I want my graph to contain the average y.
Thanks

I am not certain what you want, but I suggest inserting a column (assumed to be B) between your two existing ones, so that X stays in column A and Y moves to column C, and filling it down from row 2 (assuming headers in row 1) with:
=AVERAGEIF(A:A,A2,C:C)
then plotting X against those values.
Or, perhaps better, apply Subtotal at each change in X, using Average on Y, and plot that.
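
For readers who want to check the numbers outside Excel, the per-x averaging that AVERAGEIF performs can be sketched in a few lines of Python. This is only an illustration using the sample data from the question; the variable names (rows, groups, averages) are made up for this sketch and are not part of any Excel feature.

```python
from collections import defaultdict
from statistics import mean

# (x, y) pairs copied from the example in the question.
rows = [
    (1, 0.2), (1, 0), (1, 0), (1, 0.8), (1, 1.4), (1, 0),
    (2, 0.2), (2, 1.2), (2, 1), (2, 2.2), (2, 0), (2, 2.2),
    (3, 0.8), (3, 1.6), (3, 0), (3, 3.6), (3, 1.2), (3, 0.6),
]

# Collect the y values that share an x, which is the grouping
# AVERAGEIF applies when it matches on column A.
groups = defaultdict(list)
for x, y in rows:
    groups[x].append(y)

# One averaged point per distinct x value.
averages = {x: mean(ys) for x, ys in sorted(groups.items())}

for x, avg in averages.items():
    print(x, round(avg, 4))
```

For the sample data the averages come out to about 0.4, 1.13 and 1.3 for x = 1, 2 and 3, which are the three points the scatter graph should show.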

Related

How can I display data from 3 tables into a single table?

I have 3 datasets a, b, c and 3 tables that give values based on the strength of their relationships: (a,b), (a,c), (b,c). a has 5 items, b has 7 items, and c has 4 items. I am trying to find a way to display all the data from these tables in a single graphic. The axes would form a Y shape, with each leg representing a different dataset and the cells representing the relationship strength.
I have looked through the Excel chart types and haven't found anything that could represent this. Is there another program that would be better? I am looking for something simple to use.
Below is an example of the 3 tables and the type of data they contain. I want a way to show the scores for each relationship in a grid.
Table 1
A B Score
1 x 3
2 x 5
3 x 0
1 y 2
2 y 6
3 y 0
1 z 5
2 z 8
3 z 0
Table 2
A C Score
1 blue 3
2 blue 8
3 blue 2
1 red 0
2 red 4
3 red 1
1 yellow 3
2 yellow 3
3 yellow 9
Table 3
B C Score
x blue 2
x red 1
x yellow 5
y blue 0
y red 3
y yellow 7
z blue 0
z red 1
z yellow 3
Here is an example of what I am trying to do with the data
I manually created this type of visual in AutoCAD.
This works as a one-off but it doesn't scale and is very tedious. Hoping there is a programmatic way to create something similar.
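
As a possible starting point for a programmatic approach, each long-format table can be pivoted into a two-dimensional grid before any drawing is attempted. The sketch below does this for Table 1 in plain Python; it does not produce the Y-shaped graphic itself, and the helper names to_grid and format_grid are hypothetical, chosen only for this illustration.

```python
# Table 1 from the question as (A, B, Score) records.
table1 = [
    (1, "x", 3), (2, "x", 5), (3, "x", 0),
    (1, "y", 2), (2, "y", 6), (3, "y", 0),
    (1, "z", 5), (2, "z", 8), (3, "z", 0),
]

def to_grid(records):
    """Pivot (row_key, col_key, score) records into a nested dict grid."""
    grid = {}
    for row_key, col_key, score in records:
        grid.setdefault(row_key, {})[col_key] = score
    return grid

def format_grid(grid):
    """Render the grid as plain text, one line per row key."""
    col_keys = sorted({c for cols in grid.values() for c in cols})
    lines = ["  " + " ".join(str(c) for c in col_keys)]
    for row_key in sorted(grid):
        # Missing combinations are shown as "-".
        cells = " ".join(str(grid[row_key].get(c, "-")) for c in col_keys)
        lines.append(f"{row_key} {cells}")
    return "\n".join(lines)

grid_ab = to_grid(table1)
print(format_grid(grid_ab))
```

The same two helpers would apply unchanged to Table 2 and Table 3, giving three grids that a plotting tool could then place along the three legs of the Y.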

How do you sum every nth column of data in gnuplot?

I would like to take an average over several columns of a data set in Gnuplot. The problem is that I want to average every other column (starting from the second column of my dataset). I was thinking of using every somehow but I still don't really understand when and where to use every. To help visualise my question: my data looks something like this:
x y1 z1 y2 z2
2 0.6 0 0.6 0
1 0.7 0 0.7 1
1 0.8 2 0.8 1
1 0.9 0 0.9 0
and I would like to average y1 and y2 and plot the result by doing something like:
stats filename nooutput
plot filename u 1:sum[col = every :2::2::STATS_columns] / ((STATS_columns-1)/2)
I am not sure whether this is anywhere close to doable. It would also be nice to find the number of columns without any a priori knowledge of what the data looks like. In the example I used my knowledge of the data to know that the average is taken over ((STATS_columns-1)/2) points.
Thank you for your response
From your code I assume you want to average y1 and y2 for each row and then plot it versus x (column 1). Since you have several identical x values, there would be another average, namely an average over the columns and over all identical x values.
I modified your data to better illustrate the difference.
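I guess you were asking for the red circles. The blue triangles are basically the average of the averages, i.e. the average of the red points at each x.
Check help summation and help smooth. The index of sum has no step size, so instead of stepping by 2 the code below lets i run from 1 to Count and addresses every second column as column(i*2).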
From gnuplot help:
sum [<var> = <start> : <end>] <expression>
Code:
### average over columns and smooth
reset session
$Data <<EOD
#x y1 z1 y2 z2
1 2.0 0 4.0 0
1 2.2 0 4.2 1
1 2.9 2 4.9 1
2 2.1 0 4.1 0
2 2.3 0 4.3 0
2 2.8 0 4.8 0
3 2.2 0 4.2 0
3 2.3 0 4.3 0
3 2.7 0 4.7 0
EOD
stats $Data nooutput
set offsets 0.5,0.5,0.5,0.5
Count = (STATS_columns-1)/2
plot $Data u 1:((sum[i=1:Count] column(i*2))/Count) w p pt 7 lc rgb "red" ti "average over y1,y2 columns for each row",\
$Data u 1:((sum[i=1:Count] column(i*2))/Count) smooth unique w p pt 9 lc rgb "blue" ti "average over y1,y2 for each x"
### end of code
Result: (plot image not included)

Need to plot histogram in Pandas such that x axis is categorical and y axis is sum of some column

I have a data frame in Pandas (using Python 3.7) as shown below:
# actuals probability bucket
# 0 0.0 0.116375 2
# 1 0.0 0.239069 3
# 2 1.0 0.591988 6
# 3 0.0 0.273709 3
# 4 1.0 0.929855 10
Here 'bucket' can take discrete values from 1 to 10, and 'actuals' can take only 2 values, either 1 or 0.
I need to plot a histogram such that the x-axis is 'bucket' (i.e. 1 to 10) and the y-axis is the sum of 'actuals'. How can I do that?
Use groupby.sum with plot:
df.groupby('bucket')['actuals'].sum().plot(kind='bar')
If you need a histogram instead, use kind='hist'; note that this bins the summed values rather than drawing one bar per bucket.
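
A self-contained version of the groupby approach might look like the sketch below. It rebuilds the five sample rows from the question and, as an addition not present in the answer above, reindexes the result so that every bucket from 1 to 10 appears even when it has no rows. The plotting lines are left commented out because they need matplotlib installed.

```python
import pandas as pd

# The five sample rows shown in the question.
df = pd.DataFrame({
    "actuals": [0.0, 0.0, 1.0, 0.0, 1.0],
    "probability": [0.116375, 0.239069, 0.591988, 0.273709, 0.929855],
    "bucket": [2, 3, 6, 3, 10],
})

# Sum of 'actuals' per bucket; reindex so that all buckets 1..10 are
# present, with 0 for buckets that do not occur in the data.
sums = (
    df.groupby("bucket")["actuals"]
      .sum()
      .reindex(range(1, 11), fill_value=0)
)
print(sums)

# Plotting requires matplotlib; uncomment to draw the bar chart.
# ax = sums.plot(kind="bar")
# ax.set_xlabel("bucket")
# ax.set_ylabel("sum of actuals")
```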

python-3: how to create a new pandas column as subtraction of two consecutive rows of another column?

I have a pandas dataframe
x
1
3
4
7
10
I want to create a new column y as y[i] = x[i] - x[i-1] (and y[0] = x[0]).
So the above data frame will become:
x y
1 1
3 2
4 1
7 3
10 3
How can I do that in Python 3? Many thanks.
Using .shift() and fillna():
df['y'] = (df['x'] - df['x'].shift(1)).fillna(df['x'])
To explain what this is doing, if we print(df['x'].shift(1)) we get the following series:
0 NaN
1 1.0
2 3.0
3 4.0
4 7.0
These are your values from 'x' shifted down one row. The first row gets NaN because there is no value above it to shift down. So, when we do:
print(df['x'] - df['x'].shift(1))
We get:
0 NaN
1 2.0
2 1.0
3 3.0
4 3.0
These are your subtracted values, but the first row is NaN again. To clear this, we use .fillna(), telling it to take the value from df['x'] wherever a null value is encountered.
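
Put together as a runnable script, the steps above might look like the following sketch. It also shows Series.diff(), which computes the same row-to-row difference in one call, and an optional cast back to integers, since the intermediate NaN turns the column into floats. The extra column name y_diff is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 3, 4, 7, 10]})

# Difference to the previous row; the first row has no predecessor,
# so its NaN is replaced by the value of x itself.
df["y"] = (df["x"] - df["x"].shift(1)).fillna(df["x"])

# Series.diff() gives the same row-to-row difference directly.
df["y_diff"] = df["x"].diff().fillna(df["x"])

# Optional: cast back to integers after the NaN has been filled.
df["y"] = df["y"].astype(int)

print(df)
```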

heatmap color not relating with data in gnuplot

I am trying to create a heatmap using gnuplot, and my data file looks like this:
6 5 4 3 1 0
3 2 2 0 0 1
0 0 0 0 1 0
0 0 0 0 2 3
0 0 1 2 4 3
The cell values are z values; the columns represent the y-axis and the rows the x-axis. That means the first value, 6, is the z value at the 5th y position for x label zero. However, when I plot the heatmap, the colors do not correspond to the z values. I also get five bins on the x-axis (there should be 6) and 4 bins on the y-axis (there should be 5). My simple code is below:
set pm3d map
splot 'm.txt' matrix
Please help me sort out this confusion.
Thanks.
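
One possible explanation, offered as an assumption rather than a confirmed diagnosis of this particular setup: pm3d treats the matrix entries as grid vertices and draws one quadrangle between each 2x2 group of neighbouring values, coloured by default from the mean of its four corners. A matrix with 5 rows and 6 columns would then yield 4 by 5 cells whose colors are averages, which matches the counts described above. The Python sketch below only reproduces that arithmetic (the function name corner_mean_cells is hypothetical); in gnuplot itself, the image plotting style, e.g. plot 'm.txt' matrix with image, draws one cell per value instead.

```python
# Matrix from the question (5 rows, 6 columns).
matrix = [
    [6, 5, 4, 3, 1, 0],
    [3, 2, 2, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 2, 3],
    [0, 0, 1, 2, 4, 3],
]

def corner_mean_cells(m):
    """Average each 2x2 block of neighbouring values into one cell."""
    return [
        [(m[r][c] + m[r][c + 1] + m[r + 1][c] + m[r + 1][c + 1]) / 4
         for c in range(len(m[0]) - 1)]
        for r in range(len(m) - 1)
    ]

cells = corner_mean_cells(matrix)

# One fewer cell than values along each direction.
print(len(cells), "rows of", len(cells[0]), "cells")

# The first cell averages the corner values 6, 5, 3 and 2.
print(cells[0][0])
```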
