How to make PivotChart with line breaks - excel

I have data like the following:
x y f
1 1 1.2
1 2 1.4
1 3 1.6
3 1 3.2
3 2 3.4
3 3 3.6
5 1 5.2
5 2 5.4
5 3 5.6
If you insert a pivot chart, you can plot f vs x and y using a line chart. The plot has two stacked x-axes: the lower axis shows the values 1 3 5 corresponding to x, and the upper axis shows the values 1 2 3 for each value on the lower axis, representing x = 1 and y = 1 2 3, then x = 3 and y = 1 2 3, and x = 5 and y = 1 2 3. The plot shows a single continuous line from left to right. What I would like is for the line to break when x changes value, so there are three short lines showing the influence of y for constant values of x.
This link has a chart in its answer similar to what I'm describing. In terms of that figure, what I want is for the line to break every time the year changes. But the answer there, and its discussion, doesn't get what I'm looking for. The only approach I can think of is to modify the PivotTable data by hand and add a row at the location where the line should break. I tried something like that at work, but before modifying the table I copied it as values to a separate location. With the new data table, I was not able to create the plot with two x-axes. If I could create the plot, I could add a second row at y = 3 with NA() for f, which should create the break in the proper location.

For a chart where the line breaks at each change of x, select each of the second and subsequent y = 1 data points (individually) and choose Format Data Point..., Line, No line.
(BTW IMO better suited to Super User.)
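The gap-row idea from the question can also be sketched outside Excel. This is only a minimal Python sketch, assuming the question's data as (x, y, f) tuples; the helper name with_gap_rows is hypothetical, and None stands in for a missing f value. It builds the helper table with one extra row each time x changes, which is the layout the question describes adding by hand.

```python
def with_gap_rows(rows):
    """Return rows with an (x, None, None) gap row inserted whenever x changes."""
    out = []
    prev_x = None
    for x, y, f in rows:
        # A change in x marks the end of one short line, so add a gap row first.
        if prev_x is not None and x != prev_x:
            out.append((prev_x, None, None))
        out.append((x, y, f))
        prev_x = x
    return out


# Data copied from the question.
rows = [
    (1, 1, 1.2), (1, 2, 1.4), (1, 3, 1.6),
    (3, 1, 3.2), (3, 2, 3.4), (3, 3, 3.6),
    (5, 1, 5.2), (5, 2, 5.4), (5, 3, 5.6),
]

table = with_gap_rows(rows)
for row in table:
    print(row)
```

The result has 11 rows: the 9 original ones plus a gap row after the x = 1 group and another after the x = 3 group.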

Related

Excel Formula for finding Y maximum in a given X range

I have an XY table in excel. I would like to find the maximum Y value in a given X range. Example data given below. What equation can I use, in an unrelated cell, to output the max Y value in between the X range 2:6
X Y
1 4
2 7
3 0
4 8
5 4
6 3
Using MAXIFS:
=MAXIFS(B:B,A:A,">=2",A:A,"<=6")
As noted by @ScottCraner, if your version of Excel does not support MAXIFS, see this thread for alternatives.
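To sanity-check what the MAXIFS formula should return for the example data, here is a small Python sketch of the same filter-then-max logic. It assumes X is in column A and Y in column B, as in the formula; the function name max_y_in_x_range is made up for illustration.

```python
def max_y_in_x_range(xs, ys, lo, hi):
    """Largest y whose paired x satisfies lo <= x <= hi (both bounds inclusive)."""
    return max(y for x, y in zip(xs, ys) if lo <= x <= hi)


# Example data from the question.
xs = [1, 2, 3, 4, 5, 6]
ys = [4, 7, 0, 8, 4, 3]

result = max_y_in_x_range(xs, ys, 2, 6)
print(result)  # 8
```

The Y value 4 at X = 1 is excluded by the lower bound, and the largest remaining Y is 8 at X = 4.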
I understand your question as follows: you want the largest number in column Y that also falls within the range of values in X:
=MAXIFS(B2:B7;B2:B7;">="&MIN(A2:A7);B2:B7;"<="&MAX(A2:A7))

Histogram with ggplot2 requires a continuous x variable

I have a dataset in a table format that looks like this:
test frequency
1 test40 3
2 test33 5
3 test19 2
4 test4521 1
5 test34 1
6 test27 3
7 test42 3
8 test35 1
....
If I use this command:
library(ggplot2)
ggplot(t, aes("frequency")) +
geom_histogram()
("t" is the name of my table)
Then RStudio says: "StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"?"
I just want to see how many times a 3 or a 5 etc. occurs.
Thanks for your help.
It looks like your data is already aggregated? In that case the ggplot2::geom_histogram() function might not be appropriate for you to use. Have you tried the geom_col() function? It simply takes the numbers declared in the input data frame and displays a column plot with that data.
Using the below code
# Declare data frame
t <- data.frame(test = c("test40", "test33", "test19", "test4521",
                         "test34", "test27", "test42", "test35"),
                frequency = c(3, 5, 2, 1,
                              1, 3, 3, 1))
returns the data frame like this
# View data
print(t)
test frequency
1 test40 3
2 test33 5
3 test19 2
4 test4521 1
5 test34 1
6 test27 3
7 test42 3
8 test35 1
and therefore you can plot it like this
# Load package
library(ggplot2)
# Generate column plot
ggplot(t, aes(test, frequency)) +
geom_col()
If you simply wanted a count of the times that the number 2 or the number 3 occurred in your data frame, then yes, geom_histogram() is the correct function to use. The geom_histogram() function counts how often each value occurs in the data frame, then returns the result. It has an internal validation that looks at the type of data you are trying to plot along the x-axis; if the data is discrete, you need to pass the parameter stat="count" to the function. If you don't include this parameter, ggplot will try to bin your data to create the histogram, which is illogical because all you want is a count.
Check out this link for a description of the difference between continuous and discrete data: What is the difference between discrete data and continuous data?
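With this in mind, you can plot the histogram like this (note the unquoted column name: aes("frequency") maps a constant string rather than the column, which is what made the x variable discrete in the original call)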
# Generate histogram plot
ggplot(t, aes(frequency)) +
geom_histogram(stat="count")
I hope that helps mate.
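To make the "count" idea concrete, here is a rough Python sketch of the tally that a count statistic boils down to, using the frequency column from the example data frame. It only illustrates the counting step and is not how ggplot2 is implemented.

```python
from collections import Counter

# The frequency column from the example data frame.
frequency = [3, 5, 2, 1, 1, 3, 3, 1]

# Tally how many times each distinct value occurs.
counts = Counter(frequency)

for value in sorted(counts):
    print(value, counts[value])
```

For this data the values 1 and 3 each occur three times, while 2 and 5 occur once each; those tallies are the bar heights in the count plot.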

Add horizontal axis per series in excel

How frustrating is Excel.. working on this for half an hour now.
I am simply trying to make a frequency plot of two groups, with different colours. On the x-axis I would like to display the subject IDs, one per bar.
However, if I select a different range of subject IDs for the horizontal x-axis of each series (series 1 = blue, series 2 = orange), it changes the x-axis of the other series to the same range. What in hell am I doing wrong?
3007 1
23121 1
3009 1
3005 1
3011 2
23171 2
3207 2
3102 3
3207 6
13302 7
2411 11
23191 11
3008 11
3106 12
110031 1
110031 1
110030 1
110017 1
110014 1
110008 1
110004 1
110007 2
110035 4
110020 4
110003 4
110036 10
110019 11
110015 21
AFAIK, you cannot put 2 series onto the x axis.
You have 2 alternate ways to solve your problem:
Concatenate each positional pair into a new column and use this as the x-axis label series. It will look like this:
You could use data labels for each series. However, this will add the data to the columns themselves and not the axis (you could put it at the base of the column). To do so, you will need to right click on the graph, select 'Add Data Labels'. By default it adds the value as the label, but you can select the labels, right click to format the data labels and use the 'values from cells' option. Once you do this and play around with the orientation and location of the labels, it will look like this:
For simplicity, I'd go with the first method
Adding a 3rd option; simply put the columns for the axis labels beside each other and when selecting the Data for the Axis Labels, just select both columns instead of the usual 1. It will look like this:

fetching data from excel in matlab

I am trying to fetch a column from Excel with more than 17500 rows. The problem is that when I read it in MATLAB, it does not give me the whole matrix with all the data; it fetches data from somewhere in the middle.
The real problem is that I have to add up 4 numbers in the column and get the average, save it in another column, then proceed to the next consecutive set of 4 numbers and repeat until the end. How could I do that in MATLAB? Please help me solve this problem, as I am just a rookie. Thank you.
What I have done so far is this:
clc
g=xlsread('Data.xlsx',1,'E1:E17500');
x=1;
for i = 1:(17500/4) %as steps has to be stepped at 4 since we need avg of 4
y{i}=((g{x}+g{x+1}+g{x+2}+g{x+3})/4);
x=x+4;
end
xlswrite('Data.xlsx', y, 1, 'F1:F4375');
I see several things here: xlsread with one output gives you a numeric matrix of doubles (not a cell array). Therefore you should address entries with () and not with {}. The for-loop can be omitted if we use reshape to create a matrix with dimensions 4x4375. Then we calculate the average of the 4 values in each column directly with mean (evaluated over the first dimension). To get a column vector again, we have to transpose the result of mean using '.
Here is the code:
g = xlsread('Data.xlsx',1,'E1:E17500');
y = mean(reshape(g,4,[]),1)';
xlswrite('Data.xlsx',y,1,'F1:F4375');
To see in detail what happens within the code, let's see the results of each step using random data for g:
Code:
rng(4);
g = randi(10,12,1)
a = reshape(g,4,[])
b = mean(a,1)
y = b'
Result:
g =
10
6
10
8
7
3
10
1
3
5
8
2
a =
10 7 3
6 3 5
10 10 8
8 1 2
b =
8.5000 5.2500 4.5000
y =
8.5000
5.2500
4.5000
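For comparison, the same block-averaging step can be sketched in plain Python. This is a minimal sketch, not a replacement for the MATLAB code; the function name block_means is made up, and the data is the g vector from the example output above.

```python
def block_means(values, size):
    """Average each consecutive, non-overlapping block of `size` values."""
    if len(values) % size != 0:
        raise ValueError("length of values must be a multiple of size")
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values), size)
    ]


# The g vector from the example above.
g = [10, 6, 10, 8, 7, 3, 10, 1, 3, 5, 8, 2]

y = block_means(g, 4)
print(y)  # [8.5, 5.25, 4.5]
```

Like reshape(g,4,[]), this refuses input whose length is not a multiple of 4, so a trailing partial block is never averaged silently.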

Ignore #N/As in Excel LINEST function with multiple independent variables (known_x's)

I am trying to find the equation of a plane of best fit to a set of x,y,z data using the LINEST function. Some of the z data is missing, meaning that there are #N/As in the z column. For example:
     A    B    C
    (x)  (y)  (z)
1    1    1    5.1
2    2    1    5.4
3    3    1    5.7
4    1    2    #N/A
5    2    2    5.2
6    3    2    5.5
7    1    3    4.7
8    2    3    5
9    3    3    5.3
I would like to do =LINEST(C1:C9,A1:B9), but the #N/A causes this to return a value error.
I found a solution for a single independent variable (one column of known_x's, i.e. fitting a line to x,y data), but I have not been able to extend it for two independent variables (two known_x's columns, i.e. fitting a plane to x,y,z data). The solution I found is here: http://www.excelforum.com/excel-general/647448-linest-question.html, and the formula (slightly modified for my application) is:
=LINEST(
N(OFFSET(C1:C9,SMALL(IF(ISNUMBER(C1:C9),ROW(C1:C9)-ROW(C1)),
ROW(INDIRECT("1:"&COUNT(C1:C9)))),0,1)),
N(OFFSET(A1:A9,SMALL(IF(ISNUMBER(C1:C9),ROW(C1:C9)-ROW(C1)),
ROW(INDIRECT("1:"&COUNT(C1:C9)))),0,1))
)
which is equivalent to =LINEST(C1:C9,A1:A9), ignoring the row containing the #N/A.
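The formula from the posted link could probably be adapted, but it is unwieldy. Least squares with missing data can be viewed as a weighted regression with weight 1 for numeric values and weight 0 for non-numeric values. Because the weight also has to apply to the intercept, the formula below supplies its own column of 1s via CHOOSE and leaves the last LINEST argument (const) empty, so that no separate intercept is added. Based on this observation you could try this (with Ctrl+Shift+Enter in a 1x3 range):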
=LINEST(IF(ISNUMBER(C1:C9),C1:C9,),IF(ISNUMBER(C1:C9),CHOOSE({1,2,3},1,A1:A9,B1:B9),),)
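This returns -0.2, 0.3 and 5. LINEST lists the coefficients in reverse order of the known_x's columns, so with x in column A and y in column B the equation of the plane is z=0.3x-0.2y+5, which can be checked against the result of LINEST(C1:C8,A1:B8) with the error row removed.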
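The weight 1 / weight 0 observation can be checked numerically outside Excel. Below is a minimal Python sketch under stated assumptions: it fits the plane by solving the normal equations with a small hand-written Gauss-Jordan solver (not what LINEST does internally), the helper names solve and fit_plane are hypothetical, and None stands in for #N/A. It fits once with the incomplete row dropped and once with that row zeroed out across the 1s column, x, y and z.

```python
def solve(a, b):
    """Solve the square system a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_plane(rows):
    """Least-squares fit over rows of (c, x, y, z); returns coefficients for [c, x, y]."""
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * r[3] for r in rows) for i in range(3)]
    return solve(xtx, xtz)


# (x, y, z) from columns A, B, C of the question; None marks the #N/A.
data = [
    (1, 1, 5.1), (2, 1, 5.4), (3, 1, 5.7),
    (1, 2, None), (2, 2, 5.2), (3, 2, 5.5),
    (1, 3, 4.7), (2, 3, 5.0), (3, 3, 5.3),
]

# Variant 1: drop the incomplete row entirely.
dropped = [(1.0, x, y, z) for x, y, z in data if z is not None]
# Variant 2: keep the row but give it weight 0 (all entries zeroed, including the 1s column).
zeroed = [(1.0, x, y, z) if z is not None else (0.0, 0.0, 0.0, 0.0) for x, y, z in data]

coef_dropped = fit_plane(dropped)
coef_zeroed = fit_plane(zeroed)
print([round(c, 6) for c in coef_dropped])
print([round(c, 6) for c in coef_zeroed])
```

Both variants give the same coefficients, approximately 5 for the constant, 0.3 for column A and -0.2 for column B, because the zeroed row adds nothing to any of the sums in the normal equations.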
