I have a list of data in Excel, with values attributed to the different samples. I would like to subset the top 5% from all my data. How can I do this in Excel?
sample value
a 0.6001437980
b 0.0983224370
c 0.0493093160
d 0.0427906350
e 0.0413478790
f 0.0299204810
g 0.0259600660
h 0.0215505810
i 0.0167398000
j 0.0131496290
k 0.0105364240
l 0.0082647980
m 0.0068507060
n 0.0065234580
o 0.0050233730
In cell C2, enter
=B2>=PERCENTILE($B$2:$B$63,0.95)
You can then copy this down to C3:C63.
Column C now shows TRUE only for those rows with a B value in the top 5%.
Additionally you may like to apply a filter.
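If you want to sanity-check the cut-off outside Excel, here is a minimal Python sketch run on the 15 sample rows from the question. The helper name `percentile_inc` is my own (an assumption, not an Excel or library function); it uses linear interpolation between ranks, which is how PERCENTILE interpolates as far as I know.

```python
# Sample data from the question (sample name -> value).
data = {
    "a": 0.6001437980, "b": 0.0983224370, "c": 0.0493093160,
    "d": 0.0427906350, "e": 0.0413478790, "f": 0.0299204810,
    "g": 0.0259600660, "h": 0.0215505810, "i": 0.0167398000,
    "j": 0.0131496290, "k": 0.0105364240, "l": 0.0082647980,
    "m": 0.0068507060, "n": 0.0065234580, "o": 0.0050233730,
}


def percentile_inc(values, k):
    """Hypothetical helper: k-th percentile (0 <= k <= 1) with linear
    interpolation between the two nearest ranks of the sorted values."""
    ordered = sorted(values)
    rank = k * (len(ordered) - 1)           # zero-based fractional rank
    lower = int(rank)                       # index just below the rank
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Same test as =B2>=PERCENTILE(range,0.95), applied to every row.
threshold = percentile_inc(data.values(), 0.95)
top_5_percent = [name for name, value in data.items() if value >= threshold]

print(threshold, top_5_percent)
```

With only 15 rows, a single sample ends up above the 95th percentile, so expect very few TRUE values on small data sets.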
You can specify the range of your data and then color it with very little effort.
Here is an example where you can color the top N records:
Hope it helps :)
Here's a link to a screenshot with the formula used in Column B and some sample data
I have a spreadsheet with 48 rows of data in column A
The values range from 0 to 19
The average of these 48 rows = 8.71
The standard deviation of the population = 3.77
I've used the STANDARDIZE function in Excel in column B to return the z-score of each item in column A, given that I know the mean (8.71), the standard deviation (3.77), and x (whatever is in column A).
For example (row 2) has:
x = 2
z = -1.779
Using the z value, I want to create a lower (4) and an upper (24) boundary and calculate what the value would be in a third column.
Essentially, if x = 0 (min value), then z = -2.3096, and columnC = 4 (lower boundary condition)
Conversely, if x = 19 (max value), then z = 2.9947, and columnC = 24 (upper boundary condition)
and then all other values between 0 to 19 would be calculated....
Any ideas how I can accomplish this with a formula in the column C?
So if your lowest original value is 0 and your highest is 19, and you want to redistribute them from 4 to 24, and we assume that both are linear, that means:
Since both are linear we have to use these formulas:
We solve the first for c, so we get
and replace the c in the second equation with that, so we get
and solve this for m as follows
If we put this together with our third equation above we get:
So we finally have equations for m and c, and we can use the numbers from our old and new lower and upper bounds to get:
You can use these values with
where x is your old value in column A and y is the new distributed value in column B:
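The individual equations are not shown in this text, so here is a hedged reconstruction of the derivation, assuming the straight-line form y = m·x + c that the answer names. The symbols x_min, x_max, y_min and y_max are my own labels for the old and new bounds (0, 19, 4 and 24).

```latex
\begin{align*}
y_{\min} &= m\,x_{\min} + c, \qquad y_{\max} = m\,x_{\max} + c \\
c &= y_{\min} - m\,x_{\min} \\
y_{\max} &= m\,x_{\max} + y_{\min} - m\,x_{\min} \\
m &= \frac{y_{\max} - y_{\min}}{x_{\max} - x_{\min}} = \frac{24 - 4}{19 - 0} = \frac{20}{19} \approx 1.0526 \\
c &= y_{\min} - m\,x_{\min} = 4 - \frac{20}{19}\cdot 0 = 4 \\
y &= \frac{20}{19}\,x + 4
\end{align*}
```

Under these assumptions the worksheet formula would be something like =20/19*A2+4, copied down. As a check, x = 0 gives 4 and x = 19 gives 24.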
Some visualization if you change the boundaries:
Idea for a non-linear solution
If you want 4 and 24 as the boundaries and the mean should be 12, the solution cannot be linear. But you could use any other formula, for example
So you can use this formula for column D (y2) with the following values a, b and c, and also calculate the mean, min and max over column D (y2).
Then use the Solver:
Goal: the mean in $M$15 should be 12
Constraints: $M$16 = 4 (lower boundary) and $M$17 = 24 (upper boundary)
Variable cells are a, b and c: $M$11:$M$13
The Solver will now adjust the values a, b and c so that you get very close to your goal, with these results:
The min is 4, the max is almost 24 and the mean is almost 12, which is probably the closest you can get with a numeric method.
I have no idea if this is the right place to ask this, but I am really struggling with Excel. I am trying to define two formulas in Excel, and then make a two-variable data table to run these formulas through.
My formulas are:
Q = SQRT( 2*U*A ) / SQRT(h) , and if you use that best quantity Q then the acquisition and holding costs yield a corresponding TOTAL COST = SQRT( 2*U*A ) * SQRT(h).
We are then given a range of values for U and h, with A constant.
How do I define these equations in Excel?
Here's a start, type this in (without the A B C... and 1,2,3... at top):
     A     B   C     D                            E
1    A     U   h     Q                            Total
2    123   1   100   =SQRT(2*B2*$A$2)/SQRT(C2)    =SQRT(2*B2*$A$2)*SQRT(C2)
3          2   200   =SQRT(2*B3*$A$2)/SQRT(C3)    =SQRT(2*B3*$A$2)*SQRT(C3)
4          3   300   =SQRT(2*B4*$A$2)/SQRT(C4)    =SQRT(2*B4*$A$2)*SQRT(C4)
Also, have a look at https://faculty.fuqua.duke.edu/~pecklund/ExcelReview/2001_Documents/2001XLGettingStarted.pdf
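To see what a two-variable table of these formulas should contain, here is a small Python sketch that evaluates Q and the total cost for every combination of U and h. The function names and the grid are my own assumptions; A = 123 and the U and h values are just the ones typed into the sheet above.

```python
import math


def best_quantity(U, A, h):
    """Q = SQRT(2*U*A) / SQRT(h), as defined in the question."""
    return math.sqrt(2 * U * A) / math.sqrt(h)


def total_cost(U, A, h):
    """TOTAL COST = SQRT(2*U*A) * SQRT(h), as defined in the question."""
    return math.sqrt(2 * U * A) * math.sqrt(h)


A = 123                    # constant, taken from cell A2 in the example sheet
U_values = [1, 2, 3]       # row inputs (example values)
h_values = [100, 200, 300] # column inputs (example values)

# One entry per (U, h) pair, like the body of a two-variable data table.
q_table = {(U, h): best_quantity(U, A, h) for U in U_values for h in h_values}
cost_table = {(U, h): total_cost(U, A, h) for U in U_values for h in h_values}

for U in U_values:
    print(U, [round(cost_table[(U, h)], 2) for h in h_values])
```

In Excel itself the equivalent is to put one formula in the corner cell of the grid and point the data table's row and column input cells at U and h.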
I have a sheet which has Dates as first column and time as second column. Then in other columns more details (which are not part of the problem).
The problem is on a given date there are "n" number of rows (each with same date in 1st column). BUT, the time is not chronological.
Say, on 7th Jan there are 4 rows of data with times such as
7-jan-2016 14:25:33 x y z
7-jan-2016 10:43:51 v t s
7-jan-2016 13:01:02 h m p
7-jan-2016 12:48:15 l p l
9-jan-2016 problem same as above
I need to rearrange the rows chronologically FOR EACH DATE. Such that above looks like this:
7-jan-2016 10:43:51 v t s
7-jan-2016 12:48:15 l p l
7-jan-2016 13:01:02 h m p
7-jan-2016 14:25:33 x y z
9-jan-2016 no more problems.. and as above..
How can I achieve this without manually cutting and pasting thousands of rows?
Sort by the first column, then select Add Level and sort by the second column.
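The two-level sort behaves like sorting on a (date, time) pair. A rough Python sketch of the same idea, using the 7-jan rows from the question; the two 9-jan rows are hypothetical, added only to show that each date is ordered separately.

```python
from datetime import datetime

rows = [
    ("7-jan-2016", "14:25:33", "x y z"),
    ("7-jan-2016", "10:43:51", "v t s"),
    ("7-jan-2016", "13:01:02", "h m p"),
    ("7-jan-2016", "12:48:15", "l p l"),
    ("9-jan-2016", "16:00:00", "hypothetical row 1"),  # made-up example data
    ("9-jan-2016", "09:30:00", "hypothetical row 2"),  # made-up example data
]


def sort_key(row):
    """First level: the date in column 1. Second level: the time in column 2."""
    date = datetime.strptime(row[0], "%d-%b-%Y").date()
    time = datetime.strptime(row[1], "%H:%M:%S").time()
    return (date, time)


sorted_rows = sorted(rows, key=sort_key)

for row in sorted_rows:
    print(*row)
```

The remaining columns travel with their row, which is also what Excel does as long as the whole data range is selected before sorting.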
I have had some success matching data from various columns and creating a new data output. Here is what I typically start with;
COL A   COL B   COL C   COL D
ITEM    VALUE   ITEM    VALUE2
----    ----    ----    ----
A       1       B       100
B       2       A       200
C       3       F       300
G       4       E       400
H       5       C       500
J       6       M       600
And I can achieve this result using VLOOKUP;
COL E   COL F   COL G
ITEM    VALUE   VALUE2
----    ----    ----
A       1       200
B       2       100
C       3       500
G       4
H       5
J       6
But what I'm really after is this: both matching AND merging;
COL E   COL F   COL G
ITEM    VALUE   VALUE2
----    ----    ----
A       1       200
B       2       100
C       3       500
E               400
F               300
G       4
H       5
J       6
M               600
I'm punching a tad above my weight class with this one, so any help would be greatly appreciated.
With something like:
=COUNTIF(A:A,C2)
in a column and copied down you should get indication of which items are not already present in ColumnA. Sort with C:D on that basis and simply copy the items associated with 0 to ColumnA and their values to ColumnD, then sort and apply your VLOOKUP.
The following is based on: http://www.get-digital-help.com/2009/05/25/create-a-drop-down-list-containing-only-unique-distinct-alphabetically-sorted-text-values-using-excel-array-formula/#comment-74005
It's a great site for clever use of Excel formulas and does a good job of explaining the monster formulas below, creating dynamic named ranges, and how to work with array formulas. Look at it!
My solution is not quite as elegant as I'd like but it works. So, you'll need to create some named ranges.
One for the items in COL A (e.g. "Items1") and one for the items in COL C (e.g. "Items2")
In COL G we'll have another List "AllItems1". In COL H we'll have another List "AllItems2". You'll need to make a named range of "AllItems1" too.
The array formula in COL G is:
=IFERROR(IFERROR(INDEX(Items1,MATCH(0,IF(MAX(NOT(COUNTIF($G$1:$G1,Items1))*(COUNTIF(Items1,">"&Items1)+1))=(COUNTIF(Items1,">"&Items1)+1),0,1),0)),INDEX(Items2,MATCH(0,IF(MAX(NOT(COUNTIF($G$1:$G1,Items2))*(COUNTIF(Items2,">"&Items2)+1))=(COUNTIF(Items2,">"&Items2)+1),0,1),0))),"")
Which is a mouthful. This cascades through each list (i.e. Items1 and Items2) and gives you a non-repetitive sorta alphabetized list of the items in "Items1" and "Items2".
COL G is really just an intermediate step to get us to COL H. I haven't found a way to combine two lists without doing this step. Let me know if you find a better way.
To get a non-repetitive AND alphabetized list of the items we put the following array formula in COL H:
=IFERROR(INDEX(AllItems1,MATCH(0,IF(MAX(NOT(COUNTIF($H$1:$H1,AllItems1))*(COUNTIF(AllItems1,">"&AllItems1)+1))=(COUNTIF(AllItems1,">"&AllItems1)+1),0,1),0)),"")
Then we do VLOOKUPs off of "AllItems2".
Put the following VLOOKUP in COL I:
=IFERROR(VLOOKUP($H2,$A$2:$B$7,2,0),"")
And put the following VLOOKUP in COL J:
=IFERROR(VLOOKUP($H2,$C$2:$D$7,2,0),"")
Of course, you could make "AllItems2" a named range also and use that in the VLOOKUPs.
I think that gets you what you want.
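For checking the result, the "match AND merge" being asked for is a full outer join on ITEM. Here is a minimal Python sketch with the sample data from the question; the variable names are my own, and a blank is represented by an empty string.

```python
# COL A/B and COL C/D from the question, as item -> value lookups.
values1 = {"A": 1, "B": 2, "C": 3, "G": 4, "H": 5, "J": 6}
values2 = {"B": 100, "A": 200, "F": 300, "E": 400, "C": 500, "M": 600}

# Unique, alphabetized list of every item that appears in either list
# (the role played by COL H in the formulas above).
all_items = sorted(set(values1) | set(values2))

# Look each item up in both lists and leave a blank where there is no match
# (the role played by the two IFERROR(VLOOKUP(...),"") columns).
merged = [
    (item, values1.get(item, ""), values2.get(item, ""))
    for item in all_items
]

for item, value, value2 in merged:
    print(item, value, value2)
```

The output rows should line up with the ITEM / VALUE / VALUE2 table in the question, including the blanks for E, F, G, H, J and M.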
I have a database of people who may or may not have multiple entries, and I'd like to know how to count the total number of people who are male and who also meet another criterion, using a formula. I currently use the
=SUMPRODUCT((MelanomaEth="U")/COUNTIF(MelMRN,MelMRN&""))
formula to count the number of unique entries with a "U" in the MelanomaEth column. However, I'd like to go further and determine how many of these U's are male and how many are female.
I tried to use:
=IF(MelSex="M",SUMPRODUCT((MelanomaEth="U")/COUNTIF(MelMRN,MelMRN&"")))
but it gives me the incorrect number.
Here is a "dummy" sheet:
MRN Date Sex Ethnicity
A 8/1/2013 M U
B 8/2/2013 F N
C 8/2/2013 F N
A 9/2/2013 M U
A 9/3/2013 M U
C 8/31/2013 F N
B 8/15/2013 F N
D 10/5/2013 M U
If I wanted to know the number of unique names who are M and U, I should get 2. The number of names who are F and U should be 0, FN should be 2, and 0 MN.
Any suggestions would be appreciated.
Thanks!
Try this:
=SUMPRODUCT((MelanomaEth="U")*(MelSex="M")/COUNTIF(MelMRN,MelMRN&""))
What you're looking for is a SUMPRODUCT with multiple criteria. Usually the format is something like this:
= SUMPRODUCT((RANGE CONDITION)*(RANGE2 CONDITION2))
= SUMPRODUCT(( D1:E5 > 1 )*( D1:E5 < 10 ))
If either condition is false for a row, the product for that row is 0 and the row won't be counted.
Since I'm not sure what your names represent, I can't be sure the formula above will work for you.
I got it!
It was simpler than I thought. For those who need this,
=SUMPRODUCT((MelanomaEth="U")*(MelSex="F")/COUNTIF(MelMRN,MelMRN&""))
This gives the number of unique MRNs that meet the criteria F and U.
=SUMPRODUCT((O10:O21<>"")/COUNTIF($O$10:$O$21,O10:O21&"")) counts unique cells in the range
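To see why dividing by COUNTIF gives a unique count, here is a short Python sketch of the same arithmetic on the dummy sheet: every row contributes 1/(number of rows with the same MRN), so each MRN adds up to 1. The function name `count_unique` is my own, and the sketch assumes Sex and Ethnicity are the same on every row of a given MRN (otherwise the sum is not a whole number).

```python
from collections import Counter
from fractions import Fraction

# (MRN, Sex, Ethnicity) rows from the dummy sheet; dates are not needed here.
rows = [
    ("A", "M", "U"), ("B", "F", "N"), ("C", "F", "N"), ("A", "M", "U"),
    ("A", "M", "U"), ("C", "F", "N"), ("B", "F", "N"), ("D", "M", "U"),
]


def count_unique(rows, sex, ethnicity):
    """Mimic SUMPRODUCT((Eth=e)*(Sex=s)/COUNTIF(MRN,MRN)) with exact fractions."""
    occurrences = Counter(mrn for mrn, _, _ in rows)  # COUNTIF(MelMRN, MelMRN)
    total = sum(
        Fraction(1, occurrences[mrn])                 # this row's share of its MRN
        for mrn, row_sex, row_eth in rows
        if row_sex == sex and row_eth == ethnicity    # both criteria must hold
    )
    return int(total)


print(count_unique(rows, "M", "U"))
print(count_unique(rows, "F", "N"))
```

On the dummy sheet this reproduces the counts the question expects for M/U, F/U, F/N and M/N.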