Select non-contiguous cells within a T.Test - excel

After some searching I suspect this is not possible, but here is my question:
I have a row of values, which correspond to a quantity we calculate every day. We alternate the experiments between product A and product B.
Now I want to run Student's t-tests on the averages of 3 consecutive experiments, comparing product A with product B. A toy example can be found below:
I would like to run a T.Test to compare the sets of values (B2,D2,F2) to (C2,E2,G2). However, nothing seems to work, at least not the following:
=T.Test((B2,D2,F2);(C2,E2,G2);2;1)
=T.Test((B2;D2;F2);(C2;E2;G2);2;1)
=T.Test({"B2";"D2";"F2"};{"C2";"E2";"G2"};2;1)
Note that I was able to get my test results by reorganizing the data so that the ranges are contiguous, but this has definitely made me curious to see whether there is a solution.

For those with access to HSTACK() (or VSTACK() for that matter), try:
=T.TEST(HSTACK(B2,D2,F2),HSTACK(C2,E2,G2),2,1)
If not available, try:
=T.TEST(INDEX(B2:G2,{1,3,5}),INDEX(B2:G2,{2,4,6}),2,1)
EDIT: for those with older versions of Excel, as per @JosWoolley you'd indeed need to persuade Excel to use arrays:
=T.TEST(INDEX(B2:G2,N(IF(1,{1,3,5}))),INDEX(B2:G2,N(IF(1,{2,4,6}))),2,1)
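To make explicit what the formulas above ask Excel to do, here is a minimal Python sketch of the same idea: pick every other value from one row and run a paired comparison. The function name and the sample numbers are made up for illustration. It only returns the t statistic and the degrees of freedom, not the two-tailed p-value that T.TEST reports, because the standard library has no t-distribution.

```python
import math
import statistics


def paired_t_statistic(row):
    """Return (t, degrees_of_freedom) for a paired test on interleaved values.

    Assumes the row alternates A, B, A, B, ... like cells B2:G2 in the question.
    """
    a = row[0::2]  # positions 1, 3, 5 -> product A
    b = row[1::2]  # positions 2, 4, 6 -> product B
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample standard deviation
    return mean_diff / std_err, n - 1


# Hypothetical sample row standing in for B2:G2.
t, df = paired_t_statistic([1, 2, 3, 4, 5, 7])
print(t, df)
```

The slicing `row[0::2]` and `row[1::2]` plays the role of `INDEX(B2:G2,{1,3,5})` and `INDEX(B2:G2,{2,4,6})`.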

Related

Rank equally across multiple columns with different data types using a single formula. Is it possible?

See the data set below. It has 3 columns (PTQ, % Growth, $ Growth) that all need to be ranked individually, then summed, and then ranked again to give a total power rank for each region. Is there any way I can do this with a single formula? I do this a lot and it would be nice not to have to rank everything individually each time. Basically I need to end up at the "Power Rank" column.
To clarify, I do not want to rank first on one column then another, they all need to be ranked equally together.
Thought I'd add a solution for non-365 users:
Edit: slight amendment so that the returned ranking mimics that of the RANK function, with thanks to @Scott Craner
=MMULT(N(MMULT(CHOOSE({1,2,3},RANK(A2:A10,A2:A10),RANK(B2:B10,B2:B10),RANK(C2:C10,C2:C10)),{1;1;1})>TRANSPOSE(MMULT(CHOOSE({1,2,3},RANK(A2:A10,A2:A10),RANK(B2:B10,B2:B10),RANK(C2:C10,C2:C10)),{1;1;1}))),ROW(A2:A10)^0)+1
which may require committing with CTRL+SHIFT+ENTER, depending on the version of Excel being used.
The static {1;1;1} relates to the fact that 3 columns are being queried, and could of course be replaced with a dynamic alternative.
We can use Office 365 LET and SORT. RANK does not allow the use of arrays, so we need to improvise with MATCH and SORT.
=LET(
ptq,A2:A10,
grwt,B2:B10,
grwthd,C2:C10,
ptqrnk,RANK(ptq,ptq),
gwtprnk,RANK(grwt,grwt),
gwtdrnk,RANK(grwthd,grwthd),
sm,ptqrnk+gwtprnk+gwtdrnk,
MATCH(sm,SORT(sm),0))
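For readers who want to check the logic outside Excel, the rank, sum, rank-again steps can be sketched in Python as below. This is an illustration under assumptions, not a translation of either formula: the function names and sample numbers are invented, each column is ranked descending with ties sharing a rank (RANK's default), and the summed ranks are then ranked ascending so the lowest total gets power rank 1.

```python
def rank_descending(values):
    """Largest value gets rank 1; ties share the same rank."""
    return [1 + sum(1 for other in values if other > v) for v in values]


def rank_ascending(values):
    """Smallest value gets rank 1; ties share the same rank."""
    return [1 + sum(1 for other in values if other < v) for v in values]


def power_rank(columns):
    """Rank every column on its own, add the ranks per row, rank the totals."""
    per_column = [rank_descending(col) for col in columns]
    totals = [sum(ranks) for ranks in zip(*per_column)]
    return rank_ascending(totals)


# Hypothetical data for three regions: PTQ, % Growth, $ Growth.
ptq = [10, 20, 30]
pct_growth = [0.1, 0.3, 0.2]
usd_growth = [5, 15, 10]
print(power_rank([ptq, pct_growth, usd_growth]))
```

Adding a fourth metric only means passing one more list, which is the "dynamic alternative" to hard-coding the number of columns.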

COUNTIFS with or condition

I have a list of cities (this is a simplified version)
London
Manchester
Nottingham
I want to write something like the following (obviously the formula is not correct; it is just to show what I am trying to do):
=COUNTIFS(MachineData!$X:$X,"Y",MachineData!$N:$N,"*london*" OR MachineData!$N:$N, "*Manchester*" OR MachineData!$N:$N,"*Nottingham*")
Basically, I want to count a row if it's London OR Manchester OR Nottingham, but only if X:X is also Y.
How can I do this without having to repeat the same instruction 3 times? Is there a way to tell it to check all the city conditions at once?
Update: I have been trying something like
=COUNTIFS(MachineData!$X:$X,"Y",SUM(COUNTIFS(MachineData!$N:$N,{"London","*Manchester*","Oxford","*Nottingham*"})))
But this is not working for me
You are very close to a solution yourself. Try:
=SUM(COUNTIFS(MachineData!X:X,"Y",MachineData!N:N,{"*london*","*manchester*","*nottingham*"}))
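The formula works because COUNTIFS returns one count per pattern in the array constant and SUM adds them up. A rough Python sketch of that behaviour is below. The function name and sample rows are invented, and `fnmatchcase` is only an approximation of Excel wildcards (it also gives `[` and `]` a special meaning).

```python
from fnmatch import fnmatchcase


def count_with_or(rows, flag, patterns):
    """Sum one AND-style count per pattern, like SUM(COUNTIFS(..., {array})).

    rows is a list of (flag, city) pairs. Matching is case-insensitive.
    """
    total = 0
    for pattern in patterns:
        total += sum(
            1
            for row_flag, city in rows
            if row_flag.lower() == flag.lower()
            and fnmatchcase(city.lower(), pattern.lower())
        )
    return total


# Hypothetical sample rows standing in for columns X and N.
rows = [
    ("Y", "Greater London"),
    ("N", "London"),
    ("Y", "Manchester City"),
    ("Y", "Cardiff"),
    ("y", "nottingham"),
]
print(count_with_or(rows, "Y", ["*london*", "*manchester*", "*nottingham*"]))
```

One consequence of counting per pattern: a row whose text matches two of the patterns is counted twice, which matters if the wildcard patterns can overlap.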
Edit: use @JvdV's solution. It is far more elegant and less of a wall of text.
I created a small table as an example, five lines long reading london, manchester, nottingham, cardiff, london. The table name is table1, the column header I used was "target city"
With that in mind, this will bring the desired results into a single cell.
=SUM(COUNTIF(Table1[Target city],"london"),COUNTIF(Table1[Target city],"manchester"),COUNTIF(Table1[Target city],"nottingham"))
Result = 4
Warning: Depending on the length of the target cities you want to bring in and how frequently that list changes this might become prohibitively large/time consuming.
If you need something that will scale dynamically with a changing list of cities, something like Python may be a better solution.
No, COUNTIFS is restricted to AND and cannot use OR.
=SUM(COUNTIFS(Table2[Category 2],"Y",Table2[Category 1],{"London","Manchester","Nottingham"}))
Suppose you have the following named ranges:
There are a couple of ways to crack this question:
Solution1:
=SUMPRODUCT(((N:N=City1)+(N:N=City2)+(N:N=City3))*(X:X=Criteria1))
For this type of question, I normally use SUMPRODUCT as a first instinct, but it is not as fast as using SUM+COUNTIFS if you are working on a large dataset.
Solution2:
=SUM(--ISNUMBER(MATCH(N:N,Cities,0)*MATCH(X:X,Criteria1,0)))
Instead of checking on individual cities, it would be faster to put the cities in a named range and check all of them at once and then add the results up.
Solution3:
=SUM(COUNTIFS(X:X,Criteria1,N:N,TRANSPOSE(Cities)))
This is the solution proposed by JvdV using named ranges.
Solution4:
=SUM((N:N=TRANSPOSE(Cities))*(X:X=Criteria1))
This will do the trick too, but I am not sure how it compares to SUM+COUNTIFS in terms of calculation efficiency on a large dataset.
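Solutions 2 and 4 differ from the SUM+COUNTIFS approach in that they test every row once against the whole list of cities. A minimal Python sketch of that row-wise test, assuming exact (case-insensitive) city names rather than wildcards and using invented names and data:

```python
def count_rows(rows, flag, cities):
    """Count rows whose flag matches and whose city is in the wanted list.

    Each row is tested once, so no row can be counted twice.
    """
    wanted = {city.lower() for city in cities}
    return sum(
        1
        for row_flag, city in rows
        if row_flag.lower() == flag.lower() and city.lower() in wanted
    )


# Hypothetical rows: (column X, column N).
rows = [("Y", "London"), ("Y", "Cardiff"), ("N", "Manchester"), ("Y", "nottingham")]
print(count_rows(rows, "Y", ["London", "Manchester", "Nottingham"]))
```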

Can I use MINIFS or INDEX/MATCH on two non-contiguous ranges...?

The problem is straightforward, but the solution is escaping me. Hopefully some master here can provide insight.
I have a big data grid with prices. Those prices are ordered by location (rows) and business name (cols). I need to match the location/row by looking at two criteria (location name and a second column). Once the matching row is found (there will always be a match), I need to get the minimum/lowest price from two ranges within the grid.
The last point is the real challenge. Unlike a normal INDEX or MINIFS scenario, the columns I need to MIN aren't contiguous... for example, I need to know what the MIN value is between I4:J1331 and Q4:U1331. It's not an intersection, it's a contiguous set of values across two different arrays.
You're probably saying "hey, why don't you just reorder your table to make them contiguous"... not an option. I have a lot of data, and this spreadsheet is used for a bunch of other stuff. So, I have to work with the format I have, and that means figuring out how to do a lookup/min across multiple non-contiguous ranges. My latest attempt:
=MINIFS(AND($I$4:$J$1331,$K$4:$P$1331),$B$4:$B$1331,$A2,$E$4:$E$1331,$B2)
It didn't work, but it should make it clearer what I'm trying to do. There has GOT to be an easy way to just tell Excel "use these two ranges instead of one".
Thanks,
Rick
Figured it out. For anyone else who's interested, there doesn't seem to be any easy way to just "AND" arrays together for a search (Hello MS, backlog please). So, what I did instead was to just create multiple INDEX/MATCH arrays inside of a MIN function and take the result. Like this:
MIN((INDEX/MATCH ARRAY 1),(INDEX/MATCH ARRAY 2))
They both have identical criteria, the only difference is the set of arrays being indexed in each function. That basically gives me this:
MIN((match array),(match array))
And Min can then pull the lowest value from either.
Not as elegant as I'd like... lots of redundant code, but at least it works.
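The shape of that workaround, find the matching row once and then take the minimum over each block of columns, can be sketched in Python as follows. The field names, the sample grid and the column positions are all hypothetical; only the MIN-of-several-lookups structure comes from the answer above.

```python
def min_for_match(rows, location, second_key, column_groups):
    """Return the lowest price in the given column blocks of the matching row.

    column_groups is a list of (start, stop) index pairs, one per block of
    columns, so the blocks do not need to be next to each other.
    """
    for row in rows:
        if row["location"] == location and row["key"] == second_key:
            # One minimum per block, then the minimum of those minimums.
            return min(min(row["prices"][start:stop]) for start, stop in column_groups)
    raise LookupError("no matching row")


# Hypothetical grid: two rows, six price columns each.
grid = [
    {"location": "Austin", "key": "A", "prices": [9, 8, 7, 6, 5, 4]},
    {"location": "Boston", "key": "B", "prices": [5, 3, 1, 9, 4, 6]},
]
# Use columns 0-1 and 3-5, skipping column 2.
print(min_for_match(grid, "Boston", "B", [(0, 2), (3, 6)]))
```

Because the row is located once and reused for every block, the criteria are not repeated per block as they are in the nested INDEX/MATCH version.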
-rt

Define Status depending on Criteria

I have advanced Excel/Google Sheets skills. I have more of a conceptual question. I am happy with any solution (Excel or for Sheets, no difference for me).
I have a sheet where various coworkers have access and work with. It is used to define which product needs to go through which steps. Then when a part of a job is done, the status of the product is changed depending on criteria.
You can also think of it as projects and the status of a project.
The 3 examples show how the data is entered by the workers. Sometimes the "No" cells are empty, sometimes they contain "No", and sometimes, for the same product, one criterion is empty while another has a "No".
If I use a nested IF formula, I would have to create 32 of them (I believe, since it's 5 criteria with 2 options each).
Obviously I can do that. I was wondering whether anyone has a better solution for me? Something more practical.
Thanks in advance!
Based on the data you've provided, it looks like your statuses are based on the number of Yes's in the input columns. You also don't show a status for zero Yes's, so I'll add an additional one for that.
Given that assumption you can use a combination of the COUNTIF function (to count the Yes's), and the IFS function (to manage nested Ifs better) to drastically reduce the size of your function.
To make this cleaner I suggest you add a column and hide it containing: =COUNTIF([InputCriteria1to5Range],"Yes")
For the next formula assume the formula above is in B2. In your status column put the following:
=IFS(B2=5, Status1, B2=4, Status2, B2=3, Status3, B2=2, Status4, B2=1, Status5, B2=0, Status6)
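The same count-then-look-up idea, sketched in Python for anyone who wants to test the mapping outside the sheet. The status names are the placeholders from the formula above, and treating blank or missing cells the same as "No" follows from only counting "Yes" values.

```python
# Placeholder status names, taken from the IFS formula above.
STATUS_BY_YES_COUNT = {
    5: "Status1",
    4: "Status2",
    3: "Status3",
    2: "Status4",
    1: "Status5",
    0: "Status6",
}


def status_for(criteria):
    """Count the "Yes" cells (case-insensitive) and map the count to a status."""
    yes_count = sum(1 for value in criteria if (value or "").lower() == "yes")
    return STATUS_BY_YES_COUNT[yes_count]


# Blank strings and None stand in for empty cells.
print(status_for(["Yes", "No", "", "yes", None]))
```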
Solution: Thanks to all for your help. I ended up first creating ALL scenarios, which was actually the most complex part. See https://www.mrexcel.com/forum/excel-questions/654871-how-generate-all-possible-combinations-two-lists-without-macro.html (answer from "Tusharm"); I had to repeat this process 5 times to get all possible outcomes. In the end, there were 192 combinations.
Then, I assigned a status for each combination.
Finally, for each product/row, I created another column where I concatenated the different criteria so that it looks exactly like my above combinations. Then finally index match the concatenated criteria to my combinations.
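A small Python sketch of this concatenate-and-look-up approach is below. It is an illustration under assumptions: it uses only "Yes" and "No" as options (which gives 32 combinations, not the 192 the real sheet ended up with), and the separator, the status text and the function names are invented.

```python
from itertools import product

OPTIONS = ("Yes", "No")  # assumption: two options per criterion
CRITERIA_COUNT = 5


def make_key(criteria):
    """Concatenate the criteria into one lookup key, like the helper column."""
    return "|".join(criteria)


# Step 1: generate every combination, with a default status.
status_table = {
    make_key(combo): "unassigned" for combo in product(OPTIONS, repeat=CRITERIA_COUNT)
}

# Step 2: assign a status to a combination (hypothetical status text).
status_table[make_key(("Yes", "No", "Yes", "No", "No"))] = "In progress"


def lookup_status(criteria):
    """Step 3: build the key for a product row and look it up."""
    return status_table[make_key(criteria)]


print(len(status_table))
print(lookup_status(["Yes", "No", "Yes", "No", "No"]))
```

The dictionary lookup corresponds to the final INDEX/MATCH on the concatenated column.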

How to optimize COUNTIFS with very large data

I would like to create a report that looks like the picture below.
My data has around 500,000 cells (and it will continue to grow).
Right now I'm using the COUNTIFS function in Excel, but it takes a very long time to calculate. (I cannot turn off automatic calculation.)
The main value is stored as a date, and the dates span about 3 years, so I need a lot of formulas to cover the whole range.
result
The picture below is the data source. The top one cannot be changed, while the bottom one is the one I created myself (and can change). I use WEEKNUM to convert the date to a week number.
data
Is there a better formula or any other way to make this file faster? All kinds of suggestions are welcome!
I was thinking about using a Pivot Table, but I don't know how to build one from this kind of data source.
PS. VBA is the last option.
You can download example file here: https://www.mediafire.com/?t21s8ngn9mlme2d
I will post this answer with the disclaimer that it is entirely dependent on the size of the data set. Turning automatic calculation off and on is the best way, but your question rules that out, so keep reading.
Your question made me curious, so I gave it a try and timed it. I essentially set up two columns of over 100,000 random numbers between 1 and 1000 and then ran a COUNTIFS on the two columns to count the rows where they were equal. I made a macro that turns off automatic calculation, inserts the start time, calculates, and then inserts the finish time. I highlighted the time difference in yellow.
First I tried your way, two criteria, countifs:
Then I tried to combine (concatenate) the two columns to see whether having only one COUNTIF criterion and one data set would make it faster. It doesn't. See the result below:
Finally, realizing what was going on, I decided to make the criteria match only the FIRST character of the number to look for. I was essentially reducing the number of characters to check per cell. This had a positive result. See below:
Therefore my suggestion is to limit the length of the values you are comparing in any way possible. You are mostly looking at dates, so you might have to get creative, but this seems to be the best option short of switching to manual calculation.
I have worked with Excel sheets of a similar size. Especially if you are using the data on a regular basis, I would heartily recommend switching to a proper database: SQL-based, Access, or whatever fits your purpose. It does wonders for the speed, and you also won't run into the size limits of Excel. :-)
You can import the data you have now fairly easily.
I am happy as a clam with my postgresql db.
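To give an idea of what the database route looks like, here is a minimal sketch that uses SQLite from Python's standard library instead of PostgreSQL, purely so it runs without a server. The table name, the column names and the sample rows are hypothetical. The point is that one GROUP BY query produces every count in a single pass, instead of one COUNTIFS per report cell.

```python
import sqlite3


def weekly_counts(records):
    """Count records per (week, category) with a single GROUP BY query.

    records is a list of (week_number, category) pairs.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE events (week INTEGER, category TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT week, category, COUNT(*) "
        "FROM events GROUP BY week, category "
        "ORDER BY week, category"
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample data: week number and some category to report on.
sample = [(1, "A"), (1, "A"), (1, "B"), (2, "A")]
for week, category, count in weekly_counts(sample):
    print(week, category, count)
```

The resulting rows have the same week-by-category shape a Pivot Table would produce.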
