Filter filter criteria and then apply in countif statement in Excel - excel

I have a table of filter criteria like this:
+----------+----------+------+------+------+
| Category | SpecName | Spec | Pass | Fail |
+----------+----------+------+------+------+
| A | S1 | 3 | | |
| A | S2 | 4 | | |
| B | S1 | 5 | | |
| C | S1 | 2 | | |
+----------+----------+------+------+------+
I have a table I want to apply the filter criteria to like this:
+----------+----+----+
| Category | S1 | S2 |
+----------+----+----+
| A | 5 | 3 |
| B | 4 | |
| A | 5 | 5 |
| C | 2 | |
| A | 2 | 6 |
+----------+----+----+
I want to fill the Pass and Fail columns in the filter criteria table with a count of the rows in the second table whose values are >= the corresponding spec, like so:
+----------+----------+------+------+------+
| Category | SpecName | Spec | Pass | Fail |
+----------+----------+------+------+------+
| A | S1 | 3 | 1 | 2 |
| A | S2 | 4 | 1 | 2 |
| B | S1 | 5 | 0 | 1 |
| C | S1 | 2 | 1 | 0 |
+----------+----------+------+------+------+
Here are steps for how I might do it in a scripting language:
Filter first table to get all spec filter criteria for the Category on that row, as follows for the first row.
+----------+----------+------+
| Category | SpecName | Spec |
+----------+----------+------+
| A | S1 | 3 |
| A | S2 | 4 |
+----------+----------+------+
Copy table 2 to a variable iTable
+----------+----+----+
| Category | S1 | S2 |
+----------+----+----+
| A | 5 | 3 |
| B | 4 | |
| A | 5 | 5 |
| C | 2 | |
| A | 2 | 6 |
+----------+----+----+
For each spec name:
Find column in iTable with spec name
Filter spec name column in iTable by spec
After all filters applied, we would have:
+----------+----+----+
| Category | S1 | S2 |
+----------+----+----+
| A | 5 | 5 |
+----------+----+----+
Then just count the rows in iTable and assign to the cell in Pass column of the criteria table
Is this possible with Excel formulas?
If not, does anyone know how to do it with VBA?
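As a rough illustration of the steps above, here is a minimal Python sketch of the filter-and-count logic. The function name pass_fail and the use of None for blank cells are assumptions made for the example; treating a blank as a failed spec is also an assumption.

```python
# Criteria table rows: (category, spec_name, minimum_value).
criteria = [("A", "S1", 3), ("A", "S2", 4), ("B", "S1", 5), ("C", "S1", 2)]

# Data table rows; None stands for a blank cell (an assumption).
data = [
    {"Category": "A", "S1": 5, "S2": 3},
    {"Category": "B", "S1": 4, "S2": None},
    {"Category": "A", "S1": 5, "S2": 5},
    {"Category": "C", "S1": 2, "S2": None},
    {"Category": "A", "S1": 2, "S2": 6},
]


def pass_fail(category):
    """Return (pass_count, fail_count) for one category."""
    # Collect every spec that applies to this category.
    specs = {name: minimum for cat, name, minimum in criteria if cat == category}
    # Keep only the data rows of this category.
    rows = [row for row in data if row["Category"] == category]
    # A row passes when every spec column is filled and meets its minimum.
    passed = sum(
        all(row[name] is not None and row[name] >= minimum
            for name, minimum in specs.items())
        for row in rows
    )
    return passed, len(rows) - passed


results = {cat: pass_fail(cat) for cat in ("A", "B", "C")}
print(results)
```

With the sample tables this reproduces the Pass and Fail counts shown in the expected output.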

Consider an alternative layout for your spec criteria. Expand your columns to suit your needs.
With each spec criteria being its own column life gets really easy. You just need to adjust your formula to match the number of criteria you have.
Based on the table at the end for layout, place the following formula in D3 and copy down as required.
=SUMPRODUCT(($G$2:$G$6=A3)*($H$2:$H$6>=B3)*($I$2:$I$6>=C3))
That will give you a count of the rows passing all criteria. SUMPRODUCT is a function that performs array-like calculations. A similar formula could be used in the next column, but to reduce the dependency on array calculations and potentially speed things up, depending on the amount of data to check, place the following at the top of the Fail column and copy down as required:
=COUNTIF($G$2:$G$6,A3)-D3
Basically it subtracts the passes from the total count. This assumes you can only have PASS and FAIL as options.
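To make the arithmetic behind the two formulas concrete, here is a small Python sketch of the same idea: multiply the 0/1 results of each test per row, sum the products, then subtract from the category count. The data list stands in for the range G2:I6, and the helper names sumproduct_pass and countif are invented for the illustration. Blank cells are modelled as None and treated as failing.

```python
# Stand-in for the data range G2:I6: (category, S1, S2); None marks a blank cell.
data = [("A", 5, 3), ("B", 4, None), ("A", 5, 5), ("C", 2, None), ("A", 2, 6)]


def sumproduct_pass(category, spec1, spec2):
    """Mimic SUMPRODUCT: multiply three 0/1 tests per row and add the products."""
    return sum(
        int(cat == category)
        * int(s1 is not None and s1 >= spec1)
        * int(s2 is not None and s2 >= spec2)
        for cat, s1, s2 in data
    )


def countif(category):
    """Mimic COUNTIF: count the rows belonging to the category."""
    return sum(1 for cat, _, _ in data if cat == category)


passes = sumproduct_pass("A", 3, 4)
fails = countif("A") - passes
print(passes, fails)
```

The subtraction only gives the right Fail count because every row of a category is either a pass or a fail, as the answer notes.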

Related

Find cell address of value found in range

tl;dr In Google Sheets/Excel, how do I find the address of a cell with a specified value within a specified range where value may be in any row or column?
My best guess is
=CELL("address",LOOKUP("My search value", $search:$range))
but it doesn't work. When it finds a value at all, it returns the rightmost column every time, rather than the column of the cell it found.
I have a sheet of pretty, formatted tables that represent various concepts. Each table consists of
| Title |
+------+------+-------+------+------+-------+------+------+-------+
| Sub | Prop | Name | Sub | Prop | Name | Sub | Prop | Name |
+------+------+-------+------+------+-------+------+------+-------+
| Sub prop | value | Sub prop | value | Sub prop | value |
+------+------+-------+------+------+-------+------+------+-------+
| data | data | data | data | data | data | data | data | data |
| data | data | data | data | data | data | data | data | data |
⋮
I have 8 such tables of variable height arranged in a grid within the sheet 3 tables wide and 3 tables tall except the last column which has only 2 tables--see image. These fill the range C2:AI78.
Now I have a table off to the right consisting in AK2:AO11 of
| Table title | Table title address | ... |
+---------------+-----------------------+-----+
| Table 1 Title | | ... |
| Table 2 Title | | ... |
⋮
| Table 8 Title | | ... |
I want to fill out the Table title address column. (Would it be easier to do this manually for all of 8 values? Absolutely. Did I need to in order to write this question? Yes. But using static values is not the StackOverflow way, now, is it?)
Based on very limited Excel/Google Sheets experience, I believe I need to use CELL() and LOOKUP() for this.
=CELL("address",LOOKUP($AK4, $C$2:$AI$78))
This retrieves the wrong value. For AL4 (looking for value Death Wave), LOOKUP($AK4, $C$2:$AI$78) should retrieve cell C2 but it finds AI2 instead.
| Max Levels |
+------------------+---------------+----+--+----+
| UW | Table Address | | | |
+------------------+---------------+----+--+----+
| Death Wave | $AI$3 | 3 | | 15 |
| Poison Swamp | $AI$30 | | | |
| Smart Missiles | $AI$56 | | | |
| Black Hole | #N/A | 1 | | |
| Inner Land Mines | $AI$3 | | | |
| Chain Lightning | #N/A | | | |
| Golden Tower | $AI$3 | | | |
| Chrono Field | #N/A | 25 | | |
The error message for the #N/A rows is
Did not find value '<Table Title>' in LOOKUP evaluation.
My expected table is
| Max Levels |
+------------------+---------------+----+--+----+
| UW | Table Address | | | |
+------------------+---------------+----+--+----+
| Death Wave | $C$2 | 3 | | 15 |
| Poison Swamp | $C$28 | | | |
| Smart Missiles | $C$54 | | | |
| Black Hole | $O$2 | 1 | | |
| Inner Land Mines | $O$28 | | | |
| Chain Lightning | $O$54 | | | |
| Golden Tower | $AA$2 | | | |
| Chrono Field | $AA$39 | 25 | | |
try:
=INDEX(ADDRESS(
VLOOKUP(A2:A3, SPLIT(FLATTEN(D2:F4&"​"&ROW(D2:F4)), "​"), 2, ),
VLOOKUP(A2:A3, SPLIT(FLATTEN(D2:F4&"​"&COLUMN(D2:F4)), "​"), 2, ), 4))
or if you want to create jump links:
=INDEX(LAMBDA(x, HYPERLINK("#gid=1273961649&range="&x, x))(ADDRESS(
VLOOKUP(A2:A3, SPLIT(FLATTEN(D2:F4&"​"&ROW(D2:F4)), "​"), 2, ),
VLOOKUP(A2:A3, SPLIT(FLATTEN(D2:F4&"​"&COLUMN(D2:F4)), "​"), 2, ), 4)))
Try this:
=QUERY(
FLATTEN(
ARRAYFORMULA(
IF(
C:AI=$AK4,
ADDRESS(ROW(C:AI), COLUMN(C:AI)),
""
)
)
), "
SELECT
Col1
WHERE
Col1<>''
"
, 0)
Basically, convert every cell in the search range to its address if it equals the search term, and to an empty string otherwise. Then flatten that 2D range and filter out the empty strings.
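The same scan-then-filter idea can be sketched in Python. This is only an illustration of the logic, not of how Sheets evaluates the formula. The names column_letters and find_addresses and the small sample grid are assumptions made for the example.

```python
def column_letters(col):
    """Convert a 1-based column number to letters (1 -> A, 27 -> AA)."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def find_addresses(grid, target, first_row=1, first_col=1):
    """Return the absolute address of every cell in grid equal to target."""
    return [
        f"${column_letters(first_col + c)}${first_row + r}"
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value == target
    ]


# Hypothetical 3x3 range whose top-left cell is C2.
grid = [
    ["Death Wave", "", "Black Hole"],
    ["", "", ""],
    ["Poison Swamp", "", ""],
]
addresses = find_addresses(grid, "Black Hole", first_row=2, first_col=3)
print(addresses)
```

Because every cell is tested, the position of the match within the range does not matter, unlike LOOKUP, which expects sorted data.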

Looking up values in third column by two criteria

I have a list similar to this one:
NO | Cat1 | Cat2 | | Crit1 | Crit2 |
---|------|------| | A | O |
5 | A | O |
3 | K | Y |
6 | K | Y |
7 | F | K |
8 | A | O |
9 | J | H |
10 | K | Y |
5 | F | T |
50 | A | O |
8 | L | E |
1 | R | D |
Based on two criteria, I want a dynamic list which updates every time the contents or the criteria are changed.
If criteria is A O then the list should be as below,
|List|
|----|
| 5 |
| 8 |
| 50 |
If any other criteria is selected the list will be longer or shorter and if nothing is present it is shown as a blank cell.
I have tried some MATCH and INDEX formulas but I cannot make it work correctly.
=IFERROR(INDEX(LookUpList;MATCH(0;COUNTIF(NewList;LookUpList)+IF(Cat1<>Crit1;1;0)+IF(Cat2<>Crit2;1;0);0));"")
Sorted ascending:
=IFERROR(AGGREGATE(15,7,A$2:A$12/((B$2:B$12=G$1)*(C$2:C$12=G$2)),ROW(1:1)), "")
Ordered by row:
=IFERROR(INDEX(A:A, AGGREGATE(15, 7, ROW(A:A)/((B$1:B$12=G$1)*(C$1:C$12=G$2)), ROW(1:1))), "")
Pick one formula then fill down for subsequent matches.
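For readers who find the AGGREGATE formulas hard to follow, the logic can be sketched in Python: collect the values whose two categories match, optionally sort them, and return the k-th one or an empty string when there are fewer than k matches. The function names matches and kth are made up for this sketch.

```python
# Rows of (NO, Cat1, Cat2) in sheet order, taken from the question.
rows = [
    (5, "A", "O"), (3, "K", "Y"), (6, "K", "Y"), (7, "F", "K"),
    (8, "A", "O"), (9, "J", "H"), (10, "K", "Y"), (5, "F", "T"),
    (50, "A", "O"), (8, "L", "E"), (1, "R", "D"),
]


def matches(crit1, crit2, ascending=False):
    """Values whose categories match both criteria, in row or ascending order."""
    found = [no for no, c1, c2 in rows if c1 == crit1 and c2 == crit2]
    return sorted(found) if ascending else found


def kth(values, k):
    """k-th match (1-based), or "" when there is none, like IFERROR(..., "")."""
    return values[k - 1] if k <= len(values) else ""


found = matches("A", "O")
print([kth(found, k) for k in range(1, 5)])
```

Here k plays the role of ROW(1:1), which grows by one each time the formula is filled down a row.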

Determine range for one value in a column, use to run function over same range in another

Summary
I want to have a column in my spreadsheet that does 2 things.
1) In an ordered column, it will return the range where the column contains a specified value.
2) It will run a function (i.e., =SUM(), =AVERAGE(), etc.) over that same range in a different column.
Examples
Original
| NAME | VAL | FOO |
|-------|-----|-----|
| A | 3 | |
| A | 2 | |
| A | 4 | |
| A | 3 | |
| B | 2 | |
| B | 2 | |
| B | 1 | |
| C | 6 | |
| C | 5 | |
Average
I would want to get the average of VAL for each NAME. I would want the result to be:
| NAME | VAL | FOO |
|-------|-----|-----|
| A | 3 | 3 |
| A | 2 | 3 |
| A | 4 | 3 |
| A | 3 | 3 |
| B | 2 | 1.7 |
| B | 2 | 1.7 |
| B | 1 | 1.7 |
| C | 6 | 5.5 |
| C | 5 | 5.5 |
Sum
Another example would be to get the sum of VAL for each NAME.
| NAME | VAL | FOO |
|-------|-----|-----|
| A | 3 | 12 |
| A | 2 | 12 |
| A | 4 | 12 |
| A | 3 | 12 |
| B | 2 | 5 |
| B | 2 | 5 |
| B | 1 | 5 |
| C | 6 | 11 |
| C | 5 | 11 |
Having "NAME" ordered makes it easy. If "NAME" is in A1, enter this into C2 for the sum, then fill down:
=IF(A2=A3,C3,SUMIF($A$2:A2,A2,$B$2:B2))
Enter this into C2 for the average, then fill down:
=IF(A2=A3,C3,AVERAGEIF($A$2:A2,A2,$B$2:B2))
Note that the result in C2 won't be what you want until you fill down.
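The effect of these fill-down formulas, one aggregate per run of equal names repeated on every row of the run, can be sketched in Python. This assumes the names are already ordered, as in the question. fill_by_group is a hypothetical helper name.

```python
from itertools import groupby
from statistics import mean

names = ["A", "A", "A", "A", "B", "B", "B", "C", "C"]
vals = [3, 2, 4, 3, 2, 2, 1, 6, 5]


def fill_by_group(names, vals, func):
    """Apply func to each run of equal names and repeat the result per row."""
    out = []
    for _, group in groupby(zip(names, vals), key=lambda pair: pair[0]):
        group_vals = [v for _, v in group]
        out.extend([func(group_vals)] * len(group_vals))
    return out


sums = fill_by_group(names, vals, sum)
averages = fill_by_group(names, vals, mean)
print(sums)
print(averages)
```

Passing max as func gives the maximum per name in the same way.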
Update for MAXIF
If you don't have Excel 2016, you'll have to use an array formula (commit with ctrl+shift+enter):
=IF(A2=A3,C3,MAX(IF($A$2:A2=A2,$B$2:B2)))

PySpark getting distinct values over a wide range of columns

I have data with a large number of custom columns, the content of which I poorly understand. The columns are named evar1 to evar250. What I'd like to get is a single table with all distinct values, a count of how often each occurs, and the name of the column.
------------------------------------------------
| columnname | value | count |
|------------|-----------------------|---------|
| evar1 | en-GB | 7654321 |
| evar1 | en-US | 1234567 |
| evar2 | www.myclient.com | 123 |
| evar2 | app.myclient.com | 456 |
| ...
The best way I can think of doing this feels terrible, as I believe I have to read this data once per column (there are actually about 400 such columns).
from pyspark.sql import functions as fn

i = 1
df_evars = None
while i <= 30:
    colname = "evar" + str(i)
    # Count the rows for each distinct value in this column.
    df_temp = df.groupBy(colname).agg(fn.count("*").alias("rows"))\
        .withColumn("colName", fn.lit(colname))
    if df_evars is not None:
        df_evars = df_evars.union(df_temp)
    else:
        df_evars = df_temp
    i += 1  # move on to the next column
display(df_evars)
Am I missing a better solution?
Update
This has been marked as a duplicate but the two responses IMO only solve part of my question.
I am looking at potentially very wide tables with potentially a large number of values. I need a simple way (i.e. 3 columns that show the source column, the value and the count of the value in the source column).
The first of the responses only gives me an approximation of the number of distinct values. Which is pretty useless to me.
The second response seems less relevant than the first. To clarify, source data like this:
-----------------------
| evar1 | evar2 | ... |
|---------------|-----|
| A | A | ... |
| B | A | ... |
| B | B | ... |
| B | B | ... |
| ...
Should result in the output
--------------------------------
| columnname | value | count |
|------------|-------|---------|
| evar1 | A | 1 |
| evar1 | B | 3 |
| evar2 | A | 2 |
| evar2 | B | 2 |
| ...
Using the melt helper function borrowed from another answer (its definition is not reproduced here):
from pyspark.sql.functions import col
melt(
df.select([col(c).cast("string") for c in df.columns]),
id_vars=[], value_vars=df.columns
).groupBy("variable", "value").count()
Adapted from the answer by user6910411.
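To show what the melt-then-count step produces, here is a plain-Python sketch of the same reshaping on the small sample from the question. It is not PySpark and says nothing about performance on wide tables; it only illustrates turning each cell into a (column, value) pair and counting the pairs. The rows list is a stand-in for the DataFrame.

```python
from collections import Counter

# Sample source data from the question, one dict per row.
rows = [
    {"evar1": "A", "evar2": "A"},
    {"evar1": "B", "evar2": "A"},
    {"evar1": "B", "evar2": "B"},
    {"evar1": "B", "evar2": "B"},
]

# "Melt": emit one (column name, value as string) pair per cell, then count.
counts = Counter(
    (column, str(value))
    for row in rows
    for column, value in row.items()
)

# Long-format result: (columnname, value, count), sorted for display.
table = sorted((col, val, n) for (col, val), n in counts.items())
for line in table:
    print(line)
```

Casting every value to a string mirrors the cast("string") in the answer, which lets values from differently typed columns share one value column.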

Transform values without VBA but with Index and Match

I'm trying to find a solution without macros in Excel for the following problem:
There is a table containing ratings of a student for different time periods.
So the rating of the student with ID=1 was 1 from January to April and 3 from May to June.
Two other students had a constant ranking (6 and 9) from January to June.
| A | B | C |D |
---| ----|------------|------------|-------|
1 | ID | START | END |RANKING|
2 | 1 | 01.01.2014 | 30.04.2014 | 1 |
3 | 1 | 01.05.2014 | 30.06.2014 | 3 |
4 | 2 | 01.01.2014 | 30.06.2014 | 6 |
5 | 3 | 01.01.2014 | 30.06.2014 | 9 |
Next table contains IDs (y axis) and Months (x axis)
| F | G | H | I | J | K | L |
---| ----|--------|--------|--------|--------|--------|--------|
1 | ID | 201401 | 201402 | 201403 | 201404 | 201405 | 201406 |
2 | 1 | | | | | | |
3 | 2 | | | | | | |
4 | 3 | | | | | | |
And I wish to fill this second table like this:
| ID | 201401 | 201402 | 201403 | 201404 | 201405 | 201406 |
| ----|--------|--------|--------|--------|--------|--------|
| 1 | 1 | 1 | 1 | 1 | 3 | 3 |
| 2 | 6 | 6 | 6 | 6 | 6 | 6 |
| 3 | 9 | 9 | 9 | 9 | 9 | 9 |
I tried to use Index and Match, but without any good results because I haven't found a possibility to use IF (if (
Could anybody help?
You can get what you're looking for with SUMPRODUCT
Given the layout you provided, this formula should work when put in G2 and filled down and over
=SUMPRODUCT(--($A:$A=$F2),--($B:$B<=G$1),--($C:$C>G$1),$D:$D)
That looks in column A for an ID matching F2, then for each matching row:
It checks that the start date in column B is on or before the date in G1
It checks that the end date in column C is after the date in G1
If all criteria match, it returns the value in column D
This assumes you only have one entry for each period, otherwise it will sum them.
Also, you can use SUMIFS, it's a little less easy to read but I think it's slightly more efficient than SUMPRODUCT (I'm not positive, just anecdotal evidence from usage)
=SUMIFS($D:$D,$A:$A,"="&$F2,$B:$B,"<="&G$1,$C:$C,">"&G$1)
It does the exact same thing, just with different syntax.
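As a cross-check of the logic, here is a minimal Python sketch of the same conditional sum. It assumes the month headers are stored as real dates (the first day of each month), which both formulas need in order to compare against the START and END columns. ranking_for is a hypothetical helper name.

```python
from datetime import date

# (ID, START, END, RANKING) rows from the first table.
ratings = [
    (1, date(2014, 1, 1), date(2014, 4, 30), 1),
    (1, date(2014, 5, 1), date(2014, 6, 30), 3),
    (2, date(2014, 1, 1), date(2014, 6, 30), 6),
    (3, date(2014, 1, 1), date(2014, 6, 30), 9),
]


def ranking_for(student_id, month_start):
    """Sum the rankings whose period covers month_start (like SUMIFS)."""
    return sum(
        rank
        for sid, start, end, rank in ratings
        if sid == student_id and start <= month_start and end > month_start
    )


# Header row assumed to hold the first day of each month, January to June 2014.
months = [date(2014, m, 1) for m in range(1, 7)]
grid = {sid: [ranking_for(sid, m) for m in months] for sid in (1, 2, 3)}
print(grid)
```

As with the formulas, overlapping periods for one ID would be summed rather than flagged.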

Resources