how to vlookup if prefix found in the list? - excel

Hi.
How can I return the "company name" (Column H) in Column B if any of the "Prefix" values (Column G) is found in the "con no" (Column A)?
A sample of the outcome needed is in Column B.
Sample:
620011113 = DD
CN1234 = BB
thanks

=INDEX($H:$H,AGGREGATE(15,6,ROW($G$1:$G$7)/(--(FIND($G$1:$G$7,$A2)=1)*--(LEN($G$1:$G$7)>0)),1),1)
Breaking this down, the INDEX retrieves the Nth item from Column H (Company name). To find the value of N, we are using the AGGREGATE function.
AGGREGATE is a weird function - it lets us use things like MAX or LARGE or SUM while ignoring any error values. In this case, we will be using it for SMALL (first argument, 15), while Ignoring Error Values (second argument, 6). We will want the very smallest value, so the fourth argument will be 1. (If we wanted the second smallest, it would be 2, and so on)
=INDEX($H:$H,AGGREGATE(15,6, <SOMETHING> ,1),1)
So, all we need now is a list of values to compare! To make things slightly simpler, I'll break that bit of the code out for you here:
ROW($G$1:$G$7) / (--(FIND($G$1:$G$7,$A2)=1) * --(LEN($G$1:$G$7)>0))
There are 3 parts to this. The first, ROW($G$1:$G$7), is the actual value we want to retrieve - these will be the Row Numbers for each Prefix that matches your value. On its own, however, it will be all the row numbers. Since we are skipping errors, we want any Rows that don't match the prefix to throw an error. The easiest way to do this is to Divide by Zero.
At the start of --(FIND($G$1:$G$7,$A2)=1) and --(LEN($G$1:$G$7)>0) we have a double-negative. This is a quick way to convert True and False to 1 and 0. Only when both tests are True will we not divide by 0, as this table shows:
A | B | A*B
1 | 1 | 1
1 | 0 | 0
0 | 1 | 0
0 | 0 | 0
Starting with the second test first (it's easier), we have LEN($G$1:$G$7)>0 - basically "don't look at blank cells".
The other test (FIND($G$1:$G$7,$A2)=1) will search for the Prefix in the Con No, and return where it is found (or a #VALUE! error if it isn't). We then check "is this at position 1" - in other words, "Is this at the start of the Con No, rather than in the middle". We don't want to say Con No CNQ6060 is part of Company AA instead of Company BB by mistake!
So, if the Prefix is at the Start of the Con No, AND it isn't Blank (because there is an infinite amount of Nothing Before, After, and Between every number and letter), then we get it added to our list of Rows. We then take the smallest row (i.e. closest to the top - change AGGREGATE(15 to AGGREGATE(14 if you want the closest to the bottom!), and use that to get the Company Name
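For readers who find the logic easier to follow as code, here is a minimal Python sketch of what the formula does. It is not part of the original answer: the function name `first_prefix_match` and the small prefix table are made up for illustration (the real table lives in columns G and H).

```python
def first_prefix_match(con_no, prefixes, companies):
    """Return the company for the topmost non-blank prefix that con_no starts with."""
    matching_rows = [
        row
        for row, prefix in enumerate(prefixes)
        # Mirrors LEN(prefix)>0 and FIND(prefix, con_no)=1 (case-sensitive, like FIND)
        if prefix and con_no.startswith(prefix)
    ]
    if not matching_rows:
        return None  # the Excel formula would show an error here
    # Mirrors AGGREGATE(15, 6, ..., 1): the smallest surviving row number
    return companies[min(matching_rows)]


# Hypothetical table; blank rows are skipped just like in the formula.
prefixes = ["6200", "CN", "", "EX"]
companies = ["DD", "BB", "", "CC"]

print(first_prefix_match("620011113", prefixes, companies))
print(first_prefix_match("CN1234", prefixes, companies))
```

Using `max(matching_rows)` instead of `min` would correspond to switching AGGREGATE(15 to AGGREGATE(14.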

You could try the formula below:
=VLOOKUP(IF(LEFT(A3,1)="6",LEFT(A3,4),IF(LEFT(A3,1)="C",LEFT(A3,2),IF(LEFT(A3,1)="E",LEFT(A3,7)))),$G$3:$H$7,2,0)
Keep in mind that you have to type ' before the cell values in columns A and G so that they are stored as text; otherwise VLOOKUP will not return the correct results.

Related

How do I group rows based on a fixed sum of values in Excel?

I am trying to find another solution to below Excel formula that was already provided here:
How do I create groups based on the sum of values?
It is the same requirement, but the grouping criteria needs to be an exact value.
Here's the sample data:
Column A | Column B
Item A | 1
Item B | 2
Item C | 3
Item D | 4
Item E | 5
Item F | 1
Item G | 2
Item H | 3
Item I | 4
Item J | 5
I need to group the rows if their Column B sum = 5.
Expected result:
Group 1 = Item A, Item D (1 + 4) = 5
Group 2 = Item B, Item C (2 + 3) = 5
Group 3 = Item E = 5
Group 4 = Item F, Item I (1 + 4) = 5
Group 5 = Item G, Item H (2 + 3) = 5
Group 6 = Item J = 5
If a row's Column B exceeds 5 or does not have another matching row to equal 5 when added then it will have no Group value.
Groupings can be interchangeable, ie. Group 1 = Item A, Item I can be made since 1 + 4 = 5.
I assume this can be achieved using Excel formulas but I am struggling to find which formula(s) can be used. Any help is appreciated!
I believe I understood your question after we exchanged some comments. Still, I would recommend updating the question: it is an interesting problem, but the question was difficult to follow.
Before looking for an Excel solution, I modelled the problem as a state machine, with transitions from one state to another. I considered the following states, which represent the position of the item in the group. A group is defined as consecutive items whose sum is equal to 5.
EMPTY: Just the initial situation
START: Start of the group
MIDDLE: A middle element of the group
END: The end of the group
START-END: A group with a single element
NA: Not applicable group
I follow the same idea as "How do I create groups based on the sum of values?", but with slightly different helper columns:
Total (Column D): for this case the following formula is used: IF(SUM(C3,D2)>5,C3,SUM(C3,D2))
Status or item position within Group (Column G): this is where the corresponding status for each element is calculated.
Checks for Valid Groups (Column H): evaluates whether a group is valid. When there is no match to 5, the group is not valid. The result is shown on the row that begins the group (START or START-END states): TRUE means a valid group, FALSE means an invalid group, and NA corresponds to an NA value in the Status column. An empty cell marks any element of the group that is not the first one.
Group # (Column I): identifies the group the row (Item) belongs to. Notice that groups are counted from 1, and I also consider the case where a group cannot be formed (NA).
Here is the formula in G3:
=LET(total, D3, prevS, G2, QTY, C3,
IF(C3="", "",
IF(OR(AND(total=5, QTY<5, prevS="START"), AND(total=5, prevS="MIDDLE")), "END",
IF(OR(AND(total>5, total=QTY, OR(prevS="START", prevS="MIDDLE")),AND(total>5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "NA",
IF(OR(AND(total<5, total=QTY, OR(prevS="START", prevS="MIDDLE")),AND(total<5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "START",
IF(AND(total<5, OR(prevS="START", prevS="MIDDLE")), "MIDDLE",
IF(OR(AND(total=5, total= QTY, OR(prevS="START", prevS="MIDDLE")),AND(total=5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "START-END", "UNDEFINED")
)
)
)
)
)
)
Notes:
The LET Excel function is used to make the formula more readable.
The IF blocks should be ordered from the most specific case of total and QTY values to the most generic ones. For cases with the same total condition, make sure the second conditions on prevS are not repeated.
An UNDEFINED case was added as a last resort, to check whether any transition was not covered. If it shows up, the formula has to be reviewed; so far, all cases in the sample data are covered.
Columns K-Q are just for documentation purposes, to identify all possible transitions. Columns K-M list all possible transitions organized by previous status. Columns O-Q list all possible transitions ordered by current status, so it is easier to formulate each portion of the IF blocks.
Maybe the formula can be simplified. It is more complex than the solution provided for the similar question, but this question has more specific conditions. Some transitions may not be relevant for the final result, but it is preferable to consider all positions in the group to make sure all transitions are covered.
The following state machine diagram shows all possible transitions:
Notes:
As you can see, the solution also considers cases where a group cannot be created, or invalid groups (NA values). The solution assumes that the Item column has only positive values; the question does not state any restriction, but in the example they are all positive. To handle zero values, this solution needs to be adjusted.
The Checks for Valid Groups column is calculated as follows:
= IF(G3="", "",
IF(G3="START-END", TRUE,
IF(G3="NA", "NA",
IF(G3="START",
LET(endRow, IFNA(MATCH("START", LEFT(G4:$G$1000,5),0), MATCH("", LEFT(G4:$G$1000,5),0))+ ROW()-1,
value, VLOOKUP("END", G4:INDIRECT( "G" & endRow),1,0),
IF(ISNA(value), FALSE, TRUE)
), ""
)
)
)
)
It identifies the start and end of the candidate group, and then looks up END within that range; if the lookup returns #N/A, it is not a valid group. If the end of the candidate group is not found (the first MATCH returns #N/A), it searches until a blank row.
The Group # column is calculated as follows:
=IF(C3="","", LET(value, MAX($I$2:I2), IF(G3="NA", "NA",
IF(H3=TRUE, value + 1, IF(H3=FALSE, "NA",
IF(I2="NA", "NA", value))))))
This way only valid transitions are considered, i.e. status sequences that start with START but do not end in END (START->NA, START->MIDDLE[one or more]...->NA), as well as NA itself, are not considered valid groups (NA).
I added more examples to the original sample file provided; more can be added to further test all possible scenarios, but I guess you get the idea of this approach. As you stated, "I assume this can be achieved using Excel formulas": yes, it is possible, but for more complex conditions I would suggest implementing a state machine algorithm in VBA. Even though it is possible to do it with Excel functions, you have to deal with several nested IF blocks and helper columns, which is something that can be achieved with a simple for-loop in VBA.
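To illustrate the for-loop idea, here is a minimal sketch in Python rather than VBA. It is an assumption-laden simplification, not a translation of the formulas above: it assumes positive values, treats a group as consecutive items whose running total hits the target exactly, and returns `None` where the spreadsheet would show NA. The function name `group_consecutive` is hypothetical.

```python
def group_consecutive(values, target=5):
    """Label each item with its group number, or None when it fits no valid group."""
    labels = [None] * len(values)
    group_no = 0
    start = 0   # index where the current candidate group begins
    total = 0   # running total of the current candidate group
    for i, value in enumerate(values):
        if total + value > target:
            # The candidate overshoots: abandon it and restart at the current item,
            # like IF(SUM(C3,D2)>5, C3, SUM(C3,D2)) in the Total column.
            start, total = i, 0
        total += value
        if total == target:
            # Candidate closed exactly on the target: a valid group.
            group_no += 1
            for j in range(start, i + 1):
                labels[j] = group_no
            start, total = i + 1, 0
        elif total > target:
            # A single item larger than the target can never belong to a group.
            start, total = i + 1, 0
    return labels


print(group_consecutive([2, 3, 5, 1, 4, 6, 1]))
```

Items left in an unfinished candidate at the end of the list stay `None`, which corresponds to a START that never reaches END.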
Here is a link to online Excel file I used.

lookup value based on partial value

I have a set of batch numbers in a sheet which are alphanumeric code as follows
sdc234
fgh345
ght587
jki876
The letters of the batch number represent a product code. For example
sdc = 20499999
fgh = 45999999
ght = 67999992
jki = 56700000
The above relation is in another sheet.
I want to match product code with batch number directly. How do I look up a product code based on this partial info?
You can sort your second table in alphabetical order and use VLOOKUP with TRUE (approximate match) as the fourth argument.
Assuming the second table is in column A and B:
D E
sdc234 =VLOOKUP(D1,A:B,2,TRUE)
fgh345 =VLOOKUP(D2,A:B,2,TRUE)
ght587 =VLOOKUP(D3,A:B,2,TRUE)
jki876 =VLOOKUP(D4,A:B,2,TRUE)
The output is as below:
D E
sdc234 20499999
fgh345 45999999
ght587 67999992
jki876 56700000
EDIT:
Assuming your product code is always 3 letters, you can use the LEFT function to get the first 3 letters and then use that as the lookup value. This way you can use exact match (FALSE) as the fourth argument:
sdc234 =VLOOKUP(LEFT(D1,3),A:B,2,FALSE)
fgh345 =VLOOKUP(LEFT(D2,3),A:B,2,FALSE)
ght587 =VLOOKUP(LEFT(D3,3),A:B,2,FALSE)
jki876 =VLOOKUP(LEFT(D4,3),A:B,2,FALSE)
Credits to Mladen Savic's comment for making me think of this solution.
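The LEFT-then-exact-match idea can be sketched in Python as a plain dictionary lookup. This is only an illustration of the logic, using the codes from the question; the function name `product_code_for` is made up.

```python
# Prefix -> product code, as given in the question.
product_codes = {
    "sdc": 20499999,
    "fgh": 45999999,
    "ght": 67999992,
    "jki": 56700000,
}


def product_code_for(batch_number, prefix_length=3):
    """Take the first prefix_length characters (like LEFT) and look them up exactly."""
    return product_codes.get(batch_number[:prefix_length])  # None if not found


for batch in ["sdc234", "fgh345", "ght587", "jki876"]:
    print(batch, product_code_for(batch))
```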

how to conditionally match in excel

I've got two data sets: Data-A and Data-B.
Data-A
A B C D Start_Date End_Date
N C P 1 23-05-2015 27-05-2015
N C K 1 30-05-2015 07-06-2015
N C Ke 1 09-06-2015 28-06-2015
N C Ch 1 14-07-2015 25-07-2015
N C Th 1 29-06-2015 13-07-2015
N C Po 2 23-05-2015 27-05-2015
N C Kan 2 30-05-2015 08-06-2015
Data-B
X D Date A B C
444 1 09-07-2015
455 1 20-07-2015
1542 1 28-06-2015
2321 1 21-07-2015
2744 1 01-07-2015
7455 2 25-05-2015
12454 2 02-06-2015
18568 2 24-05-2015
28329 2 03-06-2015
28661 2 31-05-2015
Values in Data-B are missing and I need to fill them using conditional index matching/vlookup such that column D (Data-B) is matched along with Date (Data-B) such that Start_Date <= Date <= End_Date.
Desired Output:
X D Date A B C
444 1 09-07-2015 N C Th
455 1 20-07-2015 N C Ch
1542 1 28-06-2015 N C Ke
2321 1 21-07-2015 N C Ch
2744 1 01-07-2015 N C Th
7455 2 25-05-2015 N C Po
12454 2 02-06-2015 N C Kan
18568 2 24-05-2015 N C Po
28329 2 03-06-2015 N C Kan
28661 2 31-05-2015 N C Kan
Proof of Concept
In order to achieve the above I used the AGGREGATE function. It is a normal formula that performs array like calculations. The following formula will return the results from the first row that matches your criteria.
=INDEX(A$2:A$8,AGGREGATE(15,6,ROW($D$2:$D$8)/(($J2=$D$2:$D$8)*($E$2:$E$8<=$K2)*($K2<=$F$2:$F$8)),1)-1)
This assumes your table Data-A starts in A1 and includes 1 header row. The formula can be placed in the first cell under A in Data-B and copied down and to the right as needed.
UPDATE Formula explained
The aggregate function performs array calculations within its brackets for certain subfunctions. There are 19 different subfunctions. Subfunctions 14 and 15 are both array calculations. This is a nice feature since it does array-like calculations while being a regular formula.
Since I wanted the first row that met your criteria, I opted to use the SMALL function, or subfunction 15, for the first argument. Basically I am telling the aggregate function to generate a list and sort it in ascending order.
The second argument has a value of 6, which tells the aggregate to ignore any results from the array that generate errors. This will come in very handy if we can make the results we do not want turn into errors.
Now we are getting into the array portion of the formula. You can take this next part of the equation and highlight the appropriate rows in a neighbouring column and enter it as a CONTROL+SHIFT+ENTER (CSE) formula. As long as you do this in the top cell the array formula will propagate to the remainder of the selected cells and show you the results of the array. Also check the formula bar to see if { } appeared around your formula. You cannot add the { } manually.
{=ROW($D$2:$D$8)/(($J2=$D$2:$D$8)*($E$2:$E$8<=$K2)*($K2<=$F$2:$F$8))}
What this will do is determine the current row and then will divide it by the results of our conditions. You can also try each of the following conditions in a separate column as CSE formulas in the same manner described above to see their results.
($J2=$D$2:$D$8)
($E$2:$E$8<=$K2)
($K2<=$F$2:$F$8)
These on their own will provide you with either TRUE or FALSE as each row is checked. Now the interesting thing is, and this applies to Excel formulas in general, when you perform a math operation on a Boolean, TRUE is converted to 1 and FALSE to 0 (and in the other direction, 0 is treated as FALSE and any other number as TRUE). You will also note that each of the logic checks is separated by *. In this case * is acting like an AND operator, as only when all results are TRUE will you get an answer of 1. (+ will act like an OR operator)
Now if you remember from earlier 6 said to ignore all errors. So any row that does not meet our logic check will result in a division by 0 since not all logic checks results in TRUE or 1. All the checks that wound up false wind up getting ignored. So now after doing that, a list of only row numbers that met our criteria is left inside the aggregates array.
After the logic check there is a ,1 for the next argument. In this case we are telling the aggregate to return the 1st number in the list which is the first row number that met our criteria. If we wanted the third number, this would be ,3 instead.
So aggregate is returning the first row number of the results we want. When this is paired with an INDEX function, we can use the result to tell us what row of the INDEX range to look in. In this case we said we wanted to look in the range A$2:A$8. The aggregate function is telling us how many rows to go down in that range. If the range had started in row 1 we would not have to do anything. But since there is a header row, we need to adjust the result from the aggregate function by subtracting 1 for the header row (in reality you need to subtract the number of the row above the start of your data). This is why you see the -1 after the aggregate function.
Now if you pay attention to the lock on the range you will notice I did not lock the A in A$2:A$8. I did this so that I could copy the formula to the right and the column A address would update as I did. This only works because you were keeping the columns in the same order. If the order has changed I would have changed the index from a 1D array to a 2D array and used a MATCH function to line up the column headers.
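As a cross-check of the logic, here is a minimal Python sketch of the same "first row that satisfies all conditions" lookup, using the Data-A rows from the question. It is an illustration rather than the Excel method itself; the names `data_a`, `parse` and `first_match` are made up.

```python
from datetime import datetime


def parse(text):
    """Parse a dd-mm-yyyy date string as used in the question."""
    return datetime.strptime(text, "%d-%m-%Y").date()


# (A, B, C, D, Start_Date, End_Date) rows of Data-A
data_a = [
    ("N", "C", "P", 1, parse("23-05-2015"), parse("27-05-2015")),
    ("N", "C", "K", 1, parse("30-05-2015"), parse("07-06-2015")),
    ("N", "C", "Ke", 1, parse("09-06-2015"), parse("28-06-2015")),
    ("N", "C", "Ch", 1, parse("14-07-2015"), parse("25-07-2015")),
    ("N", "C", "Th", 1, parse("29-06-2015"), parse("13-07-2015")),
    ("N", "C", "Po", 2, parse("23-05-2015"), parse("27-05-2015")),
    ("N", "C", "Kan", 2, parse("30-05-2015"), parse("08-06-2015")),
]


def first_match(d, when):
    """Return (A, B, C) of the first row where D matches and Start <= when <= End."""
    for a, b, c, row_d, start, end in data_a:
        # All three checks must hold, like multiplying the conditions with *
        if row_d == d and start <= when <= end:
            return a, b, c
    return None  # the Excel formula would return an error here


print(first_match(1, parse("09-07-2015")))
print(first_match(2, parse("02-06-2015")))
```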

Extracting a substring from a string of arbitrary length

I have just a hair over 30,000 tweets. I have one column that has the actual tweet. There are two things that I would like to accomplish with this column.
First here is a snippet of sample data:
RT #Just_Sports: Cool page for fans of early pro #baseball. https://t.co/QCMYFQNSq8 #mlb #vintage #Chicago #Detroit #Boston #Brooklyn #Phil…
#brettjuliano you already know #unity #newengland #hiphop #boston #watertown #network
I have a column that uses the following formula to see if the message starts out with RT meaning a re-tweet. It returns 1 for yes and 0 for no.
What I would like to accomplish is to create a formula in two columns. One that will get the username if the RT column has a value of 1 and in the second column the username if the RT column has a value of 0. Since usernames are of arbitrary length I am unsure of how to go about this.
Example
RT #Just_Sports: | 1 | #Just_Sports | 0
#brettjuliano | 0 | | #brettjuliano
Take a look at Excel's FIND function. You can use this to identify the position of the #, then using a specified delimiter, match the end of the user name:
=MID(A1, FIND("#",A1), FIND(":",A1,FIND("#",A1)) - FIND("#",A1))
Where A1 is the cell containing the tweet, and ":" is your delimiter.
You can use the same function to check for the existence of the "RT" identifier. FIND returns a #VALUE! error when the text is not found, so wrap it in ISNUMBER:
=ISNUMBER(FIND("RT",A1))
Which returns TRUE if "RT" is found and FALSE otherwise. You may want to consider a search for " RT " (spaces), or some other variation, since there is no standard for using this in a tweet:
=OR(ISNUMBER(FIND("RT",A1)),ISNUMBER(FIND(" RT",A1)),ISNUMBER(FIND("RT ",A1)),ISNUMBER(FIND(" RT ",A1)))
But beware of false positives: ART, START, ARTOO, etc...
Additionally, your "RT" may be lower/upper/mixed case, in which case you'll want to normalize that search:
=OR(ISNUMBER(FIND("RT",UPPER(A1))),ISNUMBER(FIND(" RT",UPPER(A1))),ISNUMBER(FIND("RT ",UPPER(A1))),ISNUMBER(FIND(" RT ",UPPER(A1))))
My OR check is different from the 0/1 check you say you already have, so you can just add IF to convert it to 0/1 as needed:
=IF(OR(ISNUMBER(FIND("RT",A1)),ISNUMBER(FIND(" RT",A1)),ISNUMBER(FIND("RT ",A1)),ISNUMBER(FIND(" RT ",A1))),1,0)
Once you know you have the RT check correct, and your second column is filled properly, you can add to my original formula:
Case for 1 in 2nd column:
=IF(B1=1,MID(A1, FIND("#",A1), FIND(":",A1,FIND("#",A1)) - FIND("#",A1)),"")
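Case for 0 in 2nd column (in the sample data a non-retweet username ends at a space rather than at ":", so the delimiter changes; A1&" " guards against a tweet that has no space after the username):
=IF(B1=0,MID(A1, FIND("#",A1), FIND(" ",A1&" ",FIND("#",A1)) - FIND("#",A1)),"")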
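The same extraction can be sketched in Python to make the steps explicit. This is a sketch under assumptions that are not in the original answer: a retweet is taken to start with "RT ", a retweeted username ends at ":", and any other username ends at the first space. The function name `extract_username` is hypothetical.

```python
def extract_username(tweet):
    """Return (is_retweet, username); username is '' when no '#' is present."""
    is_retweet = tweet.startswith("RT ")  # assumption: retweets begin with "RT "
    start = tweet.find("#")               # like FIND("#", A1), but -1 when missing
    if start == -1:
        return is_retweet, ""
    delimiter = ":" if is_retweet else " "
    end = tweet.find(delimiter, start)    # like FIND(delimiter, A1, FIND("#", A1))
    if end == -1:
        end = len(tweet)                  # username runs to the end of the tweet
    return is_retweet, tweet[start:end]


print(extract_username("RT #Just_Sports: Cool page for fans of early pro #baseball."))
print(extract_username("#brettjuliano you already know #unity"))
```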

Destination, prefix lookup via Phone number - Excel

I have two tables.
Table one contains: phone number list
Table Two contains: prefix and destination list
I want to look up the prefix and destination for each phone number.
The raw data tables and the result table are given below.
Table 01 ( Phone Number List)
Phone Number
------------
12426454407
12865456546
12846546564
14415332165
14426546545
16496564654
16896546564
16643216564
Table 02 (Prefix and Destination List)
PREFIX |COUNTRY
-------+---------------------
1 |Canada_USA_Fixed
1242 |Bahamas
1246 |Barbados
1268 |Antigua
1284 |Tortola
1340 |Virgin Islands - US
1345 |Cayman Island
144153 |Bermuda-Mobile
1473 |Grenada
1649 |Turks and Caicos
1664 |Montserrat
Table 03 (Result)
Phone Number | PREFIX | COUNTRY
--------------+--------+-------------------
12426454407 | 1242 | Bahamas
12865456546 | 1 | Canada_USA_Fixed
12846546564 | 1284 | Tortola
14415332165 | 144153 | Bermuda-Mobile
14426546545 | 1 | Canada_USA_Fixed
16496564654 | 1649 | Turks and Caicos
16896546564 | 1 | Canada_USA_Fixed
16643216564 | 1664 | Montserrat
Let's assume the phone numbers are in column A; in column B you then need to extract the prefix. Something like this:
=LEFT(A1, 4)
However your Canada_USA_Fixed creates problems as does the Antigua mobile. I'll let you solve this issue yourself. Start with IF statements.
Now that you have extracted the prefix you can easily use VLOOKUP() to get the country.
Assuming that the longest prefix is 6 digits long, you can add 6 columns (B:G) next to the column with the phone numbers in table 1 (I assume this is column A). In column B you'd show the first 6 characters using =LEFT(A2,6), in the next column you show 5 chars, etc.
Then you add another 6 columns (H:M) , each doing a =MATCH(B2,Table2!A:A,0) to see if this prefix is in the list of prefixes.
Now if any of the 6 potential prefixes match, you'll get the row number of the prefix - else you'll get an #N/A error. Put the following formula in column N: {=INDEX(H2:M2,MATCH(FALSE,ISERROR(H2:M2),0))} - enter the formula as an array formula, i.e. instead of pressing Enter after entering it, press Ctrl-Shift-Enter - you'll then see these {} around the formula, so don't enter those manually!
Column N now contains the row of the matching prefix or #N/A if no prefix matches. Therefore, put =IF(ISNA(N2),"No matching prefix",INDEX(Table2!B:B,N2)) in the next column and you'll be done.
You could also use the above approach with fewer columns but more complex formulas, but I wouldn't recommend it.
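The helper-column approach boils down to trying the 6-character prefix first, then 5, and so on, and keeping the first hit. A minimal Python sketch of that idea, using the prefix table from the question (the names `prefix_table` and `longest_prefix` are made up for illustration):

```python
# Prefix -> destination, from Table 02 in the question.
prefix_table = {
    "1": "Canada_USA_Fixed",
    "1242": "Bahamas",
    "1246": "Barbados",
    "1268": "Antigua",
    "1284": "Tortola",
    "1340": "Virgin Islands - US",
    "1345": "Cayman Island",
    "144153": "Bermuda-Mobile",
    "1473": "Grenada",
    "1649": "Turks and Caicos",
    "1664": "Montserrat",
}


def longest_prefix(number, table, max_len=6):
    """Try LEFT(number, 6), LEFT(number, 5), ... and return the first prefix found."""
    for length in range(min(max_len, len(number)), 0, -1):
        candidate = number[:length]
        if candidate in table:          # like MATCH(candidate, Table2!A:A, 0)
            return candidate, table[candidate]
    return None, "No matching prefix"


for phone in ["12426454407", "12865456546", "14415332165"]:
    print(phone, *longest_prefix(phone, prefix_table))
```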
I'm also doing longest prefix matches and, like everyone else that Google has turned up, it's also for international phone number prefixes!
My solution is working for my table of 200 prefixes (including world zone 1, ie. having 1 for US/Canada and 1242 for Bahamas, etc).
Firstly you need this array formula (which I'm going to call "X" in the following but you'll want to type out in full)
(LEFT(ValueToFind,LEN(PrefixArray))=PrefixArray)*LEN(PrefixArray)
This uses the trick of multiplying a logical value by an integer, so the result is zero if there's no match. You use this to find the maximum value in one cell (which I'm calling "MaxValue").
{=MAX(X)}
If MaxValue is more than zero (and therefore some sort of match was found), you can then find the position of the maximum value in your prefix array.
{=MATCH(MaxValue,X,0)}
I've not worried about duplicates here - you can check for them in your PrefixArray separately.
Notes for neophytes:
PrefixArray should be an absolute reference, either stated with lots of $ or as a "named range".
I'm assuming you'll make ValueToFind, MaxValue and the resultant index into PrefixArray as cells on the same row, and therefore have a $ against their column letter but not their row number. This allows easy pasting for lots of rows of ValueToFind.
Array formulas are indicated by curly braces, but are entered by typing the text without the curly braces and then hitting Ctrl-Shift-Enter.
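The "X", MaxValue and MATCH steps above can be sketched in Python as follows. This is only an illustration of the trick, with a hypothetical function name `best_prefix_index` and a shortened prefix list; note that Python positions are 0-based, whereas MATCH returns a 1-based position.

```python
def best_prefix_index(value_to_find, prefix_array):
    """Return the 0-based position of the longest matching prefix, or None."""
    # X: (LEFT(ValueToFind, LEN(Prefix)) = Prefix) * LEN(Prefix)
    # A boolean times an integer gives the prefix length on a match, else 0.
    scores = [
        (value_to_find[:len(prefix)] == prefix) * len(prefix)
        for prefix in prefix_array
    ]
    max_value = max(scores, default=0)      # {=MAX(X)}
    if max_value == 0:
        return None                         # no prefix matched at all
    return scores.index(max_value)          # {=MATCH(MaxValue, X, 0)}, but 0-based


prefixes = ["1", "1242", "1246", "144153", "1664"]
print(best_prefix_index("12426454407", prefixes))
print(best_prefix_index("12865456546", prefixes))
```

Like MATCH with an exact match type, `scores.index` returns the first position when duplicates exist.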
