Index/Match with Varied Offset - excel

Running into some trouble performing an Index/Match where the offset rows could be spaced 1 row apart, 2 rows apart, or 3 rows apart. Below is an example of the setup:
Sheet1:
| A | B | C | D | E | F |
-------------------------------------------------
| | | | | Apple | |
-------------------------------------------------
| Ser1 | | | | | |
-------------------------------------------------
| | | | | Orange| |
-------------------------------------------------
| Ser2 | | Ser3 | | Ser4 | |
-------------------------------------------------
| Ser5 | | | | | |
Sheet2:
| A |
---------
| Ser1 |
---------
| Ser2 |
---------
| Ser3 |
---------
| Ser4 |
---------
| Ser5 |
I have a list of the serial numbers (Ser1, Ser2, etc.) in another sheet, and for each one I need to return the value in column E that sits 1, 2, or 3 rows above the serial number's row. As you can see, serial numbers can be in column A, C, or E.
For example, Ser1 should match Apple, while Ser2, Ser3, Ser4, and Ser5 should match Orange.
I can't figure out an Index/Match that works in every case, because the offset at the end of the formula adds or subtracts a fixed number of rows.

Rough solution:
In the sheet with your serial numbers, make a structure like this:
| A | Row | Lookup |
---------
| Ser1 |
---------
| Ser2 |
---------
| Ser3 |
---------
| Ser4 |
---------
| Ser5 |
In the Row column (B), enter:
=SUMPRODUCT((Sheet1!$A$1:$E$5=A2)*ROW(Sheet1!$A$1:$E$5))
This calculates the row in which the serial number occurs in your data range. Then, in the Lookup column, enter:
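As a rough illustration of what the SUMPRODUCT row formula computes, here is a minimal Python sketch. It models Sheet1 as a list of rows; `SHEET1` and `serial_row` are hypothetical names used only for this illustration. Every matching cell contributes its row number, so the result is the serial's row only when the serial occurs exactly once; a duplicated serial would return the sum of both rows.

```python
# Sheet1 modelled as rows 1..5, columns A..F (hypothetical stand-in for the real sheet)
SHEET1 = [
    ["", "", "", "", "Apple", ""],
    ["Ser1", "", "", "", "", ""],
    ["", "", "", "", "Orange", ""],
    ["Ser2", "", "Ser3", "", "Ser4", ""],
    ["Ser5", "", "", "", "", ""],
]


def serial_row(grid, serial):
    """Mimic SUMPRODUCT((range=serial)*ROW(range)): add up the 1-based row
    number of every cell equal to `serial` (0 when there is no match)."""
    return sum(
        row_number
        for row_number, row in enumerate(grid, start=1)
        for cell in row
        if cell == serial
    )


for s in ["Ser1", "Ser2", "Ser3", "Ser4", "Ser5"]:
    print(s, serial_row(SHEET1, s))
```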
=LOOKUP(2,(1/(INDEX(Sheet1!$E$1:$E$5,1,1):INDEX(Sheet1!$E$1:$E$5,B2-1,1)<>"")),Sheet1!$E$1:$E$5)
This formula returns the last non-empty cell in column E within the range above the selected serial number.
This is only a partial solution: for "Ser5" it returns "Ser4", because E4 is the last non-empty cell above row 5. To overcome that, you could perform an additional VLOOKUP on the results.
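The idea behind the LOOKUP formula, plus one way around the "Ser5" problem, can be sketched in Python as follows. This is only an illustration under an assumption not made by the answer: a column-E value that is itself in the serial list is treated as "not a label" and skipped while walking upward. `SHEET1`, `COL_E`, `SERIALS` and `label_above` are hypothetical names.

```python
# Sheet1 modelled as rows 1..5, columns A..F (hypothetical stand-in for the real sheet)
SHEET1 = [
    ["", "", "", "", "Apple", ""],
    ["Ser1", "", "", "", "", ""],
    ["", "", "", "", "Orange", ""],
    ["Ser2", "", "Ser3", "", "Ser4", ""],
    ["Ser5", "", "", "", "", ""],
]
COL_E = 4  # zero-based index of column E


def label_above(grid, serial, serials):
    """Return the nearest non-empty column-E value above the serial's row,
    skipping values that are themselves serial numbers (assumed rule)."""
    for r, row in enumerate(grid):
        if serial in row:
            # walk upwards, starting from the row just above the serial
            for above in range(r - 1, -1, -1):
                value = grid[above][COL_E]
                if value and value not in serials:
                    return value
            return None  # nothing suitable above the serial
    return None  # serial not found anywhere


SERIALS = {"Ser1", "Ser2", "Ser3", "Ser4", "Ser5"}
for s in sorted(SERIALS):
    print(s, label_above(SHEET1, s, SERIALS))
```

In Excel terms, the skip rule would correspond to adding a second condition to the LOOKUP array that excludes values found in the serial list, instead of running a follow-up VLOOKUP.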

Related

Filter filter criteria and then apply in countif statement in Excel

I have a table of filter criteria like this:
+----------+----------+------+------+------+
| Category | SpecName | Spec | Pass | Fail |
+----------+----------+------+------+------+
| A | S1 | 3 | | |
| A | S2 | 4 | | |
| B | S1 | 5 | | |
| C | S1 | 2 | | |
+----------+----------+------+------+------+
I have a table I want to apply the filter criteria to like this:
+----------+----+----+
| Category | S1 | S2 |
+----------+----+----+
| A | 5 | 3 |
| B | 4 | |
| A | 5 | 5 |
| C | 2 | |
| A | 2 | 6 |
+----------+----+----+
I want to fill the Pass and Fail columns of the filter criteria table with counts from the second table: Pass is the number of rows of that category whose values are >= every spec listed for the category, and Fail is the rest, like so.
+----------+----------+------+------+------+
| Category | SpecName | Spec | Pass | Fail |
+----------+----------+------+------+------+
| A | S1 | 3 | 1 | 2 |
| A | S2 | 4 | 1 | 2 |
| B | S1 | 5 | 0 | 1 |
| C | S1 | 2 | 1 | 0 |
+----------+----------+------+------+------+
Here are steps for how I might do it in a scripting language:
Filter the first table to get all spec criteria for the category on that row. For the first row:
+----------+----------+------+
| Category | SpecName | Spec |
+----------+----------+------+
| A | S1 | 3 |
| A | S2 | 4 |
+----------+----------+------+
Copy table 2 to a variable, iTable:
+----------+----+----+
| Category | S1 | S2 |
+----------+----+----+
| A | 5 | 3 |
| B | 4 | |
| A | 5 | 5 |
| C | 2 | |
| A | 2 | 6 |
+----------+----+----+
For each spec name:
    Find the column in iTable with that spec name
    Filter that column in iTable to values >= the spec
After all filters applied, we would have:
+----------+----+----+
| Category | S1 | S2 |
+----------+----+----+
| A | 5 | 5 |
+----------+----+----+
Then count the rows in iTable and assign the count to the Pass cell of the criteria table.
Is this possible with Excel formulas?
If not, does anyone know how to do it with VBA?
Consider an alternative layout for your spec criteria, and expand the columns to suit your needs.
With each spec criterion in its own column, life gets much easier. You just need to adjust the formula to match the number of criteria you have.
In this layout the criteria table sits in A:C (Category, S1 spec, S2 spec, starting in row 3) and the data table in G2:I6 (Category, S1, S2). Place the following formula in D3 and copy down as required.
=SUMPRODUCT(($G$2:$G$6=A3)*($H$2:$H$6>=B3)*($I$2:$I$6>=C3))
That gives you the count of rows passing all criteria. SUMPRODUCT performs array-like calculations. You could repeat it in the next column, but to reduce the dependency on array calculation, and potentially speed things up depending on the amount of data to check, place the following at the top of the Fail column and copy down as required:
=COUNTIF($G$2:$G$6,A3)-D3
It subtracts the passes from the total count for that category. This assumes PASS and FAIL are the only possible outcomes.
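The asker's scripting steps and the answer's Pass/Fail split can be sketched together in Python. This is a minimal illustration on the sample tables, assuming blank cells are represented as `None` and always fail a spec; `CRITERIA`, `DATA` and `pass_fail` are hypothetical names. Pass counts the rows of a category that meet every spec for that category, and Fail is the category total minus Pass, mirroring `COUNTIF(...)-D3`.

```python
from collections import defaultdict

# Criteria table: (Category, SpecName, Spec)
CRITERIA = [("A", "S1", 3), ("A", "S2", 4), ("B", "S1", 5), ("C", "S1", 2)]
# Data table; None stands in for a blank cell
DATA = [
    {"Category": "A", "S1": 5, "S2": 3},
    {"Category": "B", "S1": 4, "S2": None},
    {"Category": "A", "S1": 5, "S2": 5},
    {"Category": "C", "S1": 2, "S2": None},
    {"Category": "A", "S1": 2, "S2": 6},
]


def pass_fail(criteria, data):
    # Collect every spec that applies to each category
    specs = defaultdict(dict)
    for category, spec_name, spec in criteria:
        specs[category][spec_name] = spec

    results = []
    for category, spec_name, spec in criteria:
        rows = [r for r in data if r["Category"] == category]
        # A row passes only if it meets every spec of its category
        passed = sum(
            all(r.get(name) is not None and r[name] >= limit
                for name, limit in specs[category].items())
            for r in rows
        )
        # Fail = total rows for the category minus passes
        results.append((category, spec_name, spec, passed, len(rows) - passed))
    return results


for row in pass_fail(CRITERIA, DATA):
    print(row)
```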

Return unique column headers matching criteria

Consider the following data below:
| 1st | 2nd | A | B | C | D | E | F | G | H |
|-----|-----|---|---|---|---|---|---|---|---|
| y | x | | | 1 | | | | | |
| y | x | | | 1 | | | | | |
| y | x | | | | 1 | | | | |
| | x | 1 | | | | | | | |
| y | | 1 | 1 | 1 | | | | | |
| y | x | | | | | | 1 | | |
| y | | | | | | | | 1 | |
| | x | | | | | 1 | | | |
| | x | | | | | | | | 1 |
| y | x | | | | | | | | 1 |
What I want is to return all column headers (from A to H) that meet the following condition: the column contains a 1 in at least one row where the first two columns hold y and x.
I already have a working array formula to do this, which is as follows:
{=INDEX($C$1:$J$1,SMALL(IF(($A$2:$A$11="y")*($B$2:$B$11="x")*($C$2:$J$11=1),COLUMN($C$1:$J$1)-COLUMN($B$1)),ROW(1:1)))}
However, when I drag this down, it returns C twice and D, F and H once each.
This is because two 1s under header C meet the condition. What I want is unique values, so C should be returned only once. I tried using MATCH with an additional COUNTIF instead of the SMALL function, but it returns an error, and Excel's 'Evaluate Formula' feature isn't helping. Below is the erroneous formula I experimented with:
{=INDEX($C$1:$J$1,MATCH(0,IF(($A$2:$A$11="y")*($B$2:$B$11="x")*($C$2:$J$11=1),COUNTIF($N$1:N1,COLUMN($C$1:$J$1)-COLUMN($B$1))),0))}
A workaround I am currently using is to make my first formula a "helper column" and then create another formula based on the first formula's results to return only the unique values. However, the double array formula is heavily affecting Excel's calculation performance due to the huge volume of data I'm dealing with.
Any help or suggestions are welcome (no VBA, please, since I believe it isn't needed here). Thanks!
Insert a helper row. I put it just under your header row, before your data. In this row you check whether there is a 1 that lines up with an x and a y. I assumed any non-blank value counts; if they must be specific values, change the formula from <>"" to ="y" or =134, as the case may be. Place the following formula under the first column header you are interested in and copy it right.
=--(0<SUMPRODUCT(($B$3:$B$12<>"")*($C$3:$C$12<>"")*(D3:D12=1)))
Then, where you want to generate your list, use the following formula and copy it down as required. It lists the headings in a single column, without gaps, in the order they appear from left to right:
=IFERROR(INDEX($1:$1,AGGREGATE(15,6,COLUMN($D$2:$K$2)/$D$2:$K$2,ROW(A1))),"")
The above formula returns a blank once you have copied it down beyond the number of applicable columns.
The above formulas are based on a proof-of-concept sheet with the headers in row 1, the helper row in row 2, and the data in rows 3 to 12. Adjust the ranges to suit your needs.
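To show why the helper-row approach yields unique headers while the SMALL-based formula repeats C, here is a small Python sketch of both on the sample data. It checks for the literal values "y" and "x" as in the question (the answer's helper formula only checks for non-blank). `HEADERS`, `ROWS`, `per_cell_hits` and `unique_headers` are hypothetical names.

```python
HEADERS = ["A", "B", "C", "D", "E", "F", "G", "H"]
# Each row: (1st, 2nd, set of headers whose cell holds a 1)
ROWS = [
    ("y", "x", {"C"}),
    ("y", "x", {"C"}),
    ("y", "x", {"D"}),
    ("", "x", {"A"}),
    ("y", "", {"A", "B", "C"}),
    ("y", "x", {"F"}),
    ("y", "", {"G"}),
    ("", "x", {"E"}),
    ("", "x", {"H"}),
    ("y", "x", {"H"}),
]


def per_cell_hits(headers, rows):
    """Like the SMALL-based array formula: one entry per qualifying cell,
    ordered by column, so a header appears once per matching row."""
    hits = [
        headers.index(h)
        for first, second, ones in rows
        if first == "y" and second == "x"
        for h in ones
    ]
    return [headers[i] for i in sorted(hits)]


def unique_headers(headers, rows):
    """Like the helper-row approach: one flag per column, then list the
    flagged headers left to right, so each appears at most once."""
    flags = [
        any(first == "y" and second == "x" and h in ones
            for first, second, ones in rows)
        for h in headers
    ]
    return [h for h, flag in zip(headers, flags) if flag]


print(per_cell_hits(HEADERS, ROWS))
print(unique_headers(HEADERS, ROWS))
```

The design point is that the de-duplication happens per column (one flag each) before the list is built, rather than after.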
Have you tried it without an array formula? I don't know how large the data actually is, but this might be what you are looking for:
=IF(COUNTIFS($A:$A,"y",$B:$B,"x",C:C,1)>0,C1,"")
This assumes column A is "1st" and header "H" is your last column, in column J. Paste the formula into K1 and drag it right to R1.

PySpark getting distinct values over a wide range of columns

I have data with a large number of custom columns whose content I poorly understand. The columns are named evar1 to evar250. What I'd like to get is a single table with all distinct values, a count of how often each occurs, and the name of the column.
------------------------------------------------
| columnname | value | count |
|------------|-----------------------|---------|
| evar1 | en-GB | 7654321 |
| evar1 | en-US | 1234567 |
| evar2 | www.myclient.com | 123 |
| evar2 | app.myclient.com | 456 |
| ...
The best way I can think of doing this feels terrible, as I believe I have to read the data once per column (there are actually about 400 such columns):
from pyspark.sql import functions as fn

i = 1
df_evars = None
while i <= 30:
    colname = "evar" + str(i)
    # One aggregation, and so one pass over the data, per column
    df_temp = df.groupBy(colname).agg(fn.count("*").alias("rows"))\
        .withColumn("colName", fn.lit(colname))
    if df_evars is not None:
        df_evars = df_evars.union(df_temp)
    else:
        df_evars = df_temp
    i += 1  # without this the loop never ends
display(df_evars)
Am I missing a better solution?
Update
This has been marked as a duplicate but the two responses IMO only solve part of my question.
I am looking at potentially very wide tables with potentially a large number of values. I need a simple way to get 3 columns: the source column, the value, and the count of that value in the source column.
The first response only gives me an approximation of the number of distinct values, which is pretty useless to me.
The second response seems less relevant than the first. To clarify, source data like this:
-----------------------
| evar1 | evar2 | ... |
|---------------|-----|
| A | A | ... |
| B | A | ... |
| B | B | ... |
| B | B | ... |
| ...
Should result in this output:
--------------------------------
| columnname | value | count |
|------------|-------|---------|
| evar1 | A | 1 |
| evar1 | B | 3 |
| evar2 | A | 2 |
| evar2 | B | 2 |
| ...
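Using a melt helper function borrowed from another answer (the standalone melt(df, id_vars, value_vars) call is not a built-in PySpark function):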
from pyspark.sql.functions import col
melt(
df.select([col(c).cast("string") for c in df.columns]),
id_vars=[], value_vars=df.columns
).groupBy("variable", "value").count()
Adapted from the answer by user6910411.
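The melt-then-groupBy idea turns every (row, column) cell into a (column name, value) pair and then counts the pairs, so the data is read once instead of once per column. The plain-Python sketch below is not Spark code; it only illustrates the same transformation on the small example above. `ROWS`, `melt` and `value_counts` are hypothetical names.

```python
from collections import Counter

# Hypothetical sample matching the example above: one dict per source row
ROWS = [
    {"evar1": "A", "evar2": "A"},
    {"evar1": "B", "evar2": "A"},
    {"evar1": "B", "evar2": "B"},
    {"evar1": "B", "evar2": "B"},
]


def melt(rows, value_vars):
    """Unpivot: yield one (column name, value) pair per cell, converting
    values to strings like the cast("string") in the Spark version."""
    for row in rows:
        for name in value_vars:
            yield name, str(row[name])


def value_counts(rows, value_vars):
    """Group the melted pairs by (column name, value) and count them."""
    counts = Counter(melt(rows, value_vars))
    return sorted((name, value, n) for (name, value), n in counts.items())


for name, value, n in value_counts(ROWS, ["evar1", "evar2"]):
    print(name, value, n)
```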

How to display the value which is found in vlookup

I have an Excel sheet like this:
---------
| 1 | a |
---------
| 2 | b |
---------
| 3 | c |
---------
| 4 | d |
---------
| 5 | e |
---------
and a table like this:
---------
| 4 | d |
---------
| 3 | k |
---------
| 2 | b |
---------
| 1 | a |
---------
Now I want to check: if the value in the first column appears in both tables but the second column differs, it should display the value from the other table. Like this:
---------------------
| 1 | a | correct |
---------------------
| 2 | b | correct |
---------------------
| 3 | c | k |
---------------------
| 4 | d | correct |
---------------------
| 5 | e | not found |
---------------------
This is what I already have:
=IFERROR(IF(VLOOKUP(F2;A:B;2;FALSE)=G2;"Correct";"Wrong");"Not Found")
The "Wrong" needs to be replaced by some sort of formula.
Thanks in advance!
You already have it in your formula. The part below gives you the value in the second table corresponding to the number, and you are checking whether it matches the value in the first table. If it does not match, print this value; otherwise print "Correct".
VLOOKUP(F3,$F$8:$G$11,2,FALSE)
The formula should be
=IFERROR(IF(VLOOKUP(F3,$F$8:$G$11,2,FALSE)=G3,"Correct",VLOOKUP(F3,$F$8:$G$11,2,FALSE)),"Not Found")
Replace "Wrong" with VLOOKUP(F2;A:B;2;FALSE)
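The logic both answers implement can be sketched in Python with a dictionary standing in for the VLOOKUP range. `TABLE1`, `TABLE2` and `compare` are hypothetical names; note that Python's `==` is case-sensitive, unlike Excel's `=` comparison of text.

```python
# The two tables from the question; the second is keyed by its first column
TABLE1 = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
TABLE2 = {4: "d", 3: "k", 2: "b", 1: "a"}


def compare(key, value, other):
    """Mirror IFERROR(IF(VLOOKUP(...)=value, "correct", VLOOKUP(...)), "not found")."""
    if key not in other:
        return "not found"  # VLOOKUP would return #N/A, caught by IFERROR
    found = other[key]
    return "correct" if found == value else found


for key, value in TABLE1:
    print(key, value, compare(key, value, TABLE2))
```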

Lookup Two Columns Against Two Columns in Excel

With two sheets, I'm looking to compare columns B and C from Sheet1 to columns A and B of Sheet2. If there is a match, record the value in column A from Sheet1 in column C of Sheet2.
Specifically, what would be a formula to place in Column C on Sheet2 to calculate the corresponding value from Column A on Sheet1?
Sheet1
| A | B | C |
| 1 | 1000 | A |
| 2 | 2000 | B |
| 3 | 3000 | C |
| 4 | 4000 | D |
Sheet2
| A | B | C |
| 3000 | C | |
| 2000 | B | |
| 3000 | C | |
| 1000 | A | |
Sheet2 (desired output)
| A | B | C |
| 3000 | C | 3 |
| 2000 | B | 2 |
| 3000 | C | 3 |
| 1000 | A | 1 |
Apologies if this particular issue has already been answered. I feel like this should be very simple, but I'm just not very experienced in these types of lookups.
The easiest way is to insert a helper column in each sheet that defines a unique key.
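To do so, insert a new column C in Sheet2 and populate it with this formula: =A1&";"&B1. In Sheet1, the key has to sit to the left of the value you want to return, so insert a new column A there and populate it with =C1&";"&D1 (the original columns B and C have moved to C and D).
Then, enter this formula in D1 (formerly C1) of Sheet2: =VLOOKUP(C1,Sheet1!$A:$B,2,0)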
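The helper-key idea can be sketched in Python: build the same "number;letter" key on both sides and use it to look up column A of Sheet1. `SHEET1`, `SHEET2`, `make_key`, `lookup` and `result` are hypothetical names; `setdefault` keeps the first match for a key, as an exact-match VLOOKUP does, and `None` stands in for #N/A.

```python
# Sheet1 rows: (A, B, C); Sheet2 rows: (A, B)
SHEET1 = [(1, 1000, "A"), (2, 2000, "B"), (3, 3000, "C"), (4, 4000, "D")]
SHEET2 = [(3000, "C"), (2000, "B"), (3000, "C"), (1000, "A")]


def make_key(number, letter):
    """Build the composite key, like the helper formula =X1&";"&Y1."""
    return f"{number};{letter}"


# Index Sheet1 by the key built from columns B and C, returning column A;
# setdefault keeps the first occurrence of a key, like VLOOKUP(..., 0)
lookup = {}
for a, b, c in SHEET1:
    lookup.setdefault(make_key(b, c), a)

# For each Sheet2 row, look up its key; None stands in for #N/A
result = [lookup.get(make_key(a, b)) for a, b in SHEET2]
print(result)
```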

Resources