How to MATCH text with a wildcard in the target matrix - excel

I want to assign a transaction type to a list of credit card transactions. The type depends on the description of each transaction, which is a text value. I use INDEX MATCH to look up the type in a description index table, which maps transaction descriptions to types.
Here is the problem: I want to avoid a long list of very similar entries in the description index table. I already have more than ten entries that contain the substring "amazon". Therefore, I thought about using wildcards in the MATCH so that all transactions that contain the substring "amazon" (or similar) are mapped to the same type. Unfortunately, MATCH supports wildcards only in the search value, not in the target matrix. Therefore it seems to be impossible to maintain a description index table that uses wildcard matching patterns.
Can this be done?
Consider the following sample tables. I want to match Description against Pattern to find the Type.
Transaction Table
| Description | Amount |
|-------------------|--------|
|Amazon Merchant XY | 100.00 |
|Amazon online //> | 89.99 |
|Amazon.com | 32.64 |
Lookup Table
| Pattern | Type |
|-----------|--------------|
|*amazon* |Shopping |
|*itunes* |Entertainment |

Here is a solution for you:
=INDEX($F$2:$F$3,MAX(ROW($A$1:$A$2)*IF(ISERROR(FIND($E$2:$E$3,A2)),0,1)))
This needs to be entered with Ctrl+Shift+Enter rather than just Enter.
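It assumes the following setup: the transaction descriptions are in column A starting at A2, the patterns are in E2:E3, and the types are in F2:F3. Note that FIND is case-sensitive and does not support wildcards, so the patterns must be plain substrings (without the *); use SEARCH instead of FIND for case-insensitive matching.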
Explanation
IF(ISERROR(FIND($E$2:$E$3,A2)),0,1)
This loops through your lookup table and creates an array of ones and zeros, depending on whether each lookup value was found in the string in A2. In the case of A2, the array would be {1,0}.
Multiplying this array by ROW($A$1:$A$2) effectively gives you the row position that you can use with the INDEX function:
{1,0}*{1,2} = {1,0}. Note that in the case of A4 we would have something like this: {0,1}*{1,2} = {0,2}
So taking the MAX of this gives you the number 1 or 2, depending on which string was found, and you can use that as normal with an INDEX function. Note that using the MAX function means that if you have more than one match, it will take the last match in the list (the one with the highest row number).
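The array arithmetic above can be traced outside Excel. The following Python sketch only illustrates the logic and is not part of the answer; the function name `last_match_type` and the sample lists are hypothetical, and the patterns are assumed to be plain substrings matched case-sensitively, as FIND does.

```python
def last_match_type(description, patterns, types):
    """Mimic INDEX(types, MAX(ROW * IF(ISERROR(FIND(pattern, description)), 0, 1)))."""
    # 1 if the pattern occurs in the description (case-sensitive, like FIND), else 0
    flags = [1 if pattern in description else 0 for pattern in patterns]
    # Multiply each flag by its 1-based row position, e.g. [1, 0] * [1, 2] -> [1, 0]
    positions = [flag * row for row, flag in enumerate(flags, start=1)]
    best = max(positions)
    if best == 0:
        return None  # no pattern matched
    return types[best - 1]


patterns = ["Amazon", "iTunes"]        # hypothetical lookup table, column E
types = ["Shopping", "Entertainment"]  # hypothetical lookup table, column F
print(last_match_type("Amazon Merchant XY", patterns, types))
```

When a description contains several patterns, the highest row position wins, which is the "last match" behaviour described above.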

Related

Search for a substring in cell and return value of matrix

The following Excel tables are given:
Sheet01
String: A
Output: B
+---------------------+--------------+
| String | Output |
+---------------------+--------------+
| ABC Test01 | It is Test01 |
| DEF Test01 | It is Test01 |
| Test01 GHI | It is Test01 |
| Hellow Test02 World | Wow Test02 |
| Test02 Sum Sing | Wow Test02 |
+---------------------+--------------+
Sheet02
Search Criteria: A
OutputThis: B
+-----------------+--------------+
| Search Criteria | OutputThis |
+-----------------+--------------+
| Test01 | It is Test01 |
| Test02 | Wow Test02 |
+-----------------+--------------+
So basically I want to find out if Search Criteria in Sheet02 can be found in String in Sheet01. If so, display in Output (Sheet01) value of OutputThis (Sheet02).
Following works for exact match:
=INDEX(Sheet02!B:B,MATCH(A2,Sheet02!A:A,0))
Now I simply tried to put in the like operator, which doesn't make any sense, because Excel can't know what part of String is to be found in Search Criteria.
What I'm looking for is something
=New_Function(SearchCriteriaMatrix, SearchCell, OutputMatrix)
EDIT:
I just used the following code and it somehow works:
=INDEX(Sheet02!B:B,Match(A2,Sheet02!A:A,-1))
The key is "-1". It changes the criteria from an exact match to a broad match. At least that's what I want it to do, but it doesn't work out well. Perhaps someone can use this and help out.
SEARCH or FIND can find a substring within a string. So something like:
=LOOKUP(2,1/SEARCH(srchCritTbl[Search Criteria],A2),srchCritTbl[OutputThis])
will work.
I made your Sheet02 table into a "real" table and I'm using structured references. It will also work with discrete references, but the references should encompass only the active part of the data table, not the entire column.
If you were to use direct references to the table on Sheet02, it would look like:
=LOOKUP(2,1/SEARCH(Sheet02!$A$2:$A$3,Sheet01!$A2),Sheet02!$B$2:$B$3)
The way this works:
SEARCH returns an array of numbers and errors, depending on whether or not it finds each entry of table 2 within the referenced cell in table 1.
1/SEARCH(...) will then return either an error or some number which has to be no greater than 1.
Using 2 as the lookup criteria in LOOKUP guarantees that it will be greater than any value returned in the lookup_vector.
If the LOOKUP function can't find the lookup_value, the function matches the largest value in lookup_vector that is less than or equal to lookup_value. EDIT: As pointed out by #Gregory, this applies if lookup_vector is sorted ascending. When it is not, then the last entry that is less than or equal to lookup_value gets matched.
Since result_vector references the OutputThis column, the matching entry in that column is returned.
One could also guarantee, in this situation, that lookup_value would be greater than any value in lookup_vector by using a very large number, and eliminating the 1/... portion:
=LOOKUP(9.9E+307,SEARCH(srchCritTbl[Search Criteria],Sheet01!$A2),srchCritTbl[OutputThis])
I used the other form out of habit as it is more generally useful.
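To make the LOOKUP(2,1/SEARCH(...)) mechanics concrete, here is a minimal Python sketch of the same selection rule. It is an illustration under assumptions, not the Excel implementation: `lookup_last_match` is a hypothetical name, `str.find` stands in for SEARCH (case-insensitive here, but without SEARCH's wildcard support), and a failed search is simply skipped where Excel would carry an error value.

```python
def lookup_last_match(text, criteria, outputs):
    """Mimic LOOKUP(2, 1/SEARCH(criteria, text), outputs)."""
    result = None
    for criterion, output in zip(criteria, outputs):
        # SEARCH is case-insensitive; str.find returns -1 where SEARCH gives an error
        position = text.lower().find(criterion.lower())
        if position == -1:
            continue  # error entries in the lookup vector are ignored
        # 1/position is at most 1, so the lookup value 2 is never found and
        # the last numeric entry in the vector is the one that gets matched
        result = output
    return result


criteria = ["Test01", "Test02"]
outputs = ["It is Test01", "Wow Test02"]
print(lookup_last_match("Hellow Test02 World", criteria, outputs))
```

If a string contains more than one search criterion, the entry that appears last in the criteria list is returned.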
Here's a single cell array formula:
In cell Sheet01!B2 type:
=INDEX(Sheet02!B:B,MATCH(1,IF(ISERROR(SEARCH(Sheet02!A:A,A2)),0,1),0))
Press Ctrl+Shift+Enter to complete the formula as a single-cell array formula
Copy this cell down to the remaining range Sheet01!B3:B6
Note: It gives the output for the first Sheet02 Search Criteria that matches, and not the first one found in the string. So the string Test02 and Test01 would output It is Test01 because Test01 is listed before Test02 on Sheet02.
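The ordering rule in the note can be checked with a short Python sketch. This is only an approximation of MATCH(1,...,0) over the criteria list; the function name `first_match_output` is hypothetical and the substring test ignores SEARCH's wildcard handling.

```python
def first_match_output(text, criteria, outputs):
    """Return the output for the first criterion (in list order) found in text."""
    lowered = text.lower()  # SEARCH is case-insensitive
    for criterion, output in zip(criteria, outputs):
        if criterion.lower() in lowered:
            return output  # MATCH(1, ..., 0) stops at the first 1 in the array
    return None


criteria = ["Test01", "Test02"]
outputs = ["It is Test01", "Wow Test02"]
print(first_match_output("Test02 and Test01", criteria, outputs))
```

The position of the terms inside the string plays no role; only the order of the criteria list does.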

Excel - count text occurrences if string is present in adjacent column

I'm trying to count values in Column A if Column B matches a certain string of text.
assets status
-----------------------------
1 | itemThing | yes
2 | |
3 | itemThing |
4 | |
5 | itemThing | yes
The above example would ideally return 2.
I want to count how many times "item" shows up in column A ONLY if column B says "yes"
I've tried something with =SUMPRODUCT but it doesn't seem to work correctly. It is currently returning 4 when there are 5 matching criteria.
I have =SUMPRODUCT((assets=A1)*(status=B1)) where assets and status are custom names for the column ranges created with Name Manager.
Edit: I noticed that it has to be an exact string match for it to count correctly. How do I do partial string matches (e.g. search terms)? For example: =SUMPRODUCT((assets="*item*")*(status=B1))
Two ways here for your reference:
SUMPRODUCT:
=SUMPRODUCT((ISNUMBER(SEARCH("*itemThing*",assets)))*(status="yes"))
COUNTIFS:
=COUNTIFS(status,"yes",assets,"*itemThing*")
For a partial match, use the wildcard *, as in "*itemThing*", and that should do the trick for you.
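As a cross-check of what both formulas count, here is a small Python sketch using the sample data from the question. The function name `count_matching` is hypothetical, and the substring test is a simplification of Excel's `*term*` wildcard (case-insensitive, like COUNTIFS).

```python
def count_matching(assets, status, term, wanted="yes"):
    """Count rows where the asset contains `term` and the status equals `wanted`."""
    term = term.lower()
    wanted = wanted.lower()
    return sum(
        1
        for asset, state in zip(assets, status)
        if term in asset.lower() and state.lower() == wanted
    )


assets = ["itemThing", "", "itemThing", "", "itemThing"]
status = ["yes", "", "", "", "yes"]
print(count_matching(assets, status, "item"))
```

Rows 1 and 5 satisfy both conditions, so the count for the sample data is 2.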

Simplify Multiple COUNTIFS Used for Search in Excel 2010

Introduction:
I have an Excel workbook I'm using to track stats for the game Hearthstone. One sheet contains the data of each individual game (wins, losses, etc.). The other sheet allows the user to search for win/loss statistics based on user input search criteria.
My Question:
In the search sheet I am using COUNTIFS formulas. These formulas are getting rather long. Is there any way to simplify the COUNTIFS formulas?
The Setup: How the Excel Search Sheet Looks:
| Column K | Column L |
|____________________|______________________________|
Row 5 |Date Start | User input goes in Column L |
Row 6 |Date End | |
Row 7 |Player's Class | |
... |Turn Number | |
|Deck Name | |
|Opponent's Class | |
|Opponent's Username | |
|Match Type 1 | |
|Match Type 2 | |
|Match Type 3 | |
|Match Type 4 | |
... |Match Type 5 | |
Row 17 |Match Type 6 | |
|____________________|______________________________|
| Column K | Column L |
|______________________|_______________________________________________|
Row 21 | Total Matches Played | Data is displayed based on the user's input. |
Row 22 | Total Wins | The code that needs simplifying goes here. |
Row 23 | Total Losses | |
Row 24 | Win to Loss Ratio | |
Row 25 | Win Percentage | |
Row 26 | Loss Percentage | |
|______________________|_______________________________________________|
The code that needs simplifying. This Code goes in Row 22 Column L:
=(COUNTIFS('Indiv. Match Stats'!I:I,"Win",'Indiv. Match Stats'!H:H,L12,
'Indiv. Match Stats'!L:L,L7,'Indiv. Match Stats'!T:T,L9,'Indiv. Match
Stats'!Q:Q,L10,'Indiv. Match Stats'!P:P,L11,'Indiv. Match Stats'!C:C,
">="&L5,'Indiv. Match Stats'!C:C,"<="&L6,'Indiv. Match Stats'!N:N,L8))
+
(COUNTIFS('Indiv. Match Stats'!I:I,"Win",'Indiv. Match Stats'!H:H,L13,
'Indiv. Match Stats'!L:L,L7,'Indiv. Match Stats'!T:T,L9,'Indiv. Match
Stats'!Q:Q,L10,'Indiv. Match Stats'!P:P,L11,'Indiv. Match Stats'!C:C,
">="&L5,'Indiv. Match Stats'!C:C,"<="&L6,'Indiv. Match Stats'!N:N,L8))
+
(The code repeats the above four more times. Basically each block of code
stands for one Match Type in Column K)
Explanation of Worksheet and Code:
The user inputs criteria in Rows 5 through 17, Column L. Anything left blank is treated as a wildcard. The user input criteria narrows the search results and determines the data displayed in Rows 21 through 26, Column L.
The code shown above, references a separate sheet named Indiv. Match Stats many times. The COUNTIFS narrow down the search by date, player class, turn number, deck name, ..., and match type. Unfortunately all those criteria must be repeated, once for each match type and then the code adds the results, giving the final result (the proper amount of wins, losses, etc. for the given criteria). It is a large block of code, being added to another block of code.
Is there any better way to do this, or just some way to visually simplify the code? Is there a way to make similar blocks of the code equal some variable, so that those similar parts don't have to be typed over and over?
You can effectively use an "OR" in COUNTIFS - assuming you want to count if column H = any of L12:L17 then use this version
=SUMPRODUCT(COUNTIFS('Indiv. Match Stats'!I:I,"Win",'Indiv. Match Stats'!H:H,L12:L17,
'Indiv. Match Stats'!L:L,L7,'Indiv. Match Stats'!T:T,L9,'Indiv. Match
Stats'!Q:Q,L10,'Indiv. Match Stats'!P:P,L11,'Indiv. Match Stats'!C:C,
">="&L5,'Indiv. Match Stats'!C:C,"<="&L6,'Indiv. Match Stats'!N:N,L8))
The COUNTIFS now returns an array of 6 values (one each for L12:L17) and then SUMPRODUCT is used to sum that array because it doesn't require "array entry" as SUM would.
Note1: SUMPRODUCT is simply summing 6 values, so there is no performance "hit" from using it in this context - all the "heavy lifting" is done by COUNTIFS
Note2: If any value is repeated in L12:L17 then you will get "double-counting" just as your original formula does
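The double-counting effect can be reproduced in a few lines of Python. This is a sketch under assumptions: `count_any_type` is a hypothetical name, the match-type labels are made up, the rows are assumed to have already passed all the other criteria, and the list of wanted types is assumed to contain no blanks. It also shows that weighting each count by 1 divided by the number of times its criterion occurs in the list removes the double count.

```python
from collections import Counter


def count_any_type(row_types, wanted_types):
    """Count rows whose type equals any entry of wanted_types, two ways."""
    # COUNTIFS with a range as criterion returns one count per criterion entry
    per_entry = [sum(1 for t in row_types if t == wanted) for wanted in wanted_types]
    plain_total = sum(per_entry)  # what SUMPRODUCT(COUNTIFS(...)) adds up

    # Weight each entry by 1 / (how often that entry occurs in the criteria list)
    occurrences = Counter(wanted_types)
    weighted_total = sum(
        count / occurrences[wanted] for count, wanted in zip(per_entry, wanted_types)
    )
    return plain_total, weighted_total


row_types = ["Ranked", "Ranked", "Casual", "Arena"]  # hypothetical match types
print(count_any_type(row_types, ["Ranked", "Casual"]))            # no repeats
print(count_any_type(row_types, ["Ranked", "Ranked", "Casual"]))  # "Ranked" listed twice
```

With "Ranked" listed twice, the plain total counts the two ranked rows twice, while the weighted total stays the same as in the list without repeats.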
To avoid double-counting use this formula - note the additional COUNTIF function at the end:
=SUMPRODUCT(COUNTIFS('Indiv. Match Stats'!I:I,"Win",'Indiv. Match Stats'!H:H,L12:L17,
'Indiv. Match Stats'!L:L,L7,'Indiv. Match Stats'!T:T,L9,'Indiv. Match
Stats'!Q:Q,L10,'Indiv. Match Stats'!P:P,L11,'Indiv. Match Stats'!C:C,
">="&L5,'Indiv. Match Stats'!C:C,"<="&L6,'Indiv. Match Stats'!N:N,L8),1/COUNTIF(L12:L17,L12:L17&""))
A dead simple approach to shorten the code is to shorten the title of the tab "Indiv. Match Stats" to, say, IMS. That shortens things significantly:
=(COUNTIFS('IMS'!I:I,"Win",'IMS'!H:H,L12,'IMS'!L:L,L7,'IMS'!T:T,L9,'IMS'!Q:Q,L10,'IMS'!P:P,L11,'IMS'!C:C,">="&L5,'IMS'!C:C,"<="&L6,'IMS'!N:N,L8))
+ (COUNTIFS('IMS'!I:I,"Win",'IMS'!H:H,L13,'IMS'!L:L,L7,'IMS'!T:T,L9,'IMS'!Q:Q,L10,'IMS'!P:P,L11,'IMS'!C:C,">="&L5,'IMS'!C:C,"<="&L6,'IMS'!N:N,L8))
Another, prettier way to do this is to use Excel named ranges. Highlight each range, like 'Indiv. Match Stats'!I:I, click in the Name box to the left of the formula bar, and type a name for the range, like IMSI. Repeat with 'Indiv. Match Stats'!N:N -> IMSN and so on.
That would give you code like this:
=(COUNTIFS(IMSI,"Win",IMSH,L12,IMSL,L7,IMST,L9,IMSQ,L10,IMSP,L11,IMSC,">="&L5,IMSC,"<="&L6,IMSN,L8))
+ (COUNTIFS(IMSI,"Win",IMSH,L13,IMSL,L7,IMST,L9,IMSQ,L10,IMSP,L11,IMSC,">="&L5,IMSC,"<="&L6,IMSN,L8))
I post this answer not as my suggestion, but to show what the problem is. The problem is, that there is no OR shortcut functionality in COUNTIFS. So you can't say COUNTIFS('Indiv. Match Stats'!H:H;L12 OR L13 OR L14...).
It is possible to make the formula shorter with an array formula using SUMPRODUCT. This works because an OR can be expressed by summing Boolean results: the sum is 1 if exactly one of the comparisons is true. The formula would be:
=SUMPRODUCT(
('Indiv. Match Stats'!I:I="win")
*(
('Indiv. Match Stats'!H:H=L12)+('Indiv. Match Stats'!H:H=L13)
+('Indiv. Match Stats'!H:H=L14)+('Indiv. Match Stats'!H:H=L15)
+('Indiv. Match Stats'!H:H=L16)+('Indiv. Match Stats'!H:H=L17)
)
*('Indiv. Match Stats'!L:L=L7)
*('Indiv. Match Stats'!T:T=L9)
*('Indiv. Match Stats'!Q:Q=L10)
*('Indiv. Match Stats'!P:P=L11)
*('Indiv. Match Stats'!C:C>=L5)
*('Indiv. Match Stats'!C:C<=L6)
*('Indiv. Match Stats'!N:N=L8)
)
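The structure of this SUMPRODUCT can be sketched in Python: multiplication plays the role of AND and addition the role of OR. The function name `count_wins`, the reduced set of columns (result, match type, player class) and the sample rows are assumptions made for illustration only.

```python
def count_wins(rows, wanted_types, player_class):
    """Sum per-row products of Boolean tests, as SUMPRODUCT does."""
    total = 0
    for result, match_type, row_class in rows:
        # Adding comparisons acts as OR: 1 if the type equals one listed type
        type_hit = sum(match_type == wanted for wanted in wanted_types)
        # Multiplying comparisons acts as AND: any False (0) zeroes the row
        total += (result.lower() == "win") * type_hit * (row_class == player_class)
    return total


rows = [
    ("Win", "Ranked", "Mage"),
    ("Loss", "Ranked", "Mage"),
    ("Win", "Casual", "Mage"),
    ("Win", "Arena", "Mage"),
    ("Win", "Ranked", "Priest"),
]
print(count_wins(rows, ["Ranked", "Casual"], "Mage"))
```

As with the COUNTIFS version, a type listed twice makes the sum of comparisons 2 for matching rows, so those rows are counted twice.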
But this will perform very badly. Such array formulas are very slow, especially on whole columns.
So BKay's suggestions are the best ones, in my opinion as well.
Greetings
Axel

Extracting a substring from a string of arbitrary length

I have just a hair over 30,000 tweets. I have one column that has the actual tweet. There are two things that I would like to accomplish with this column.
First here is a snippet of sample data:
RT #Just_Sports: Cool page for fans of early pro #baseball. https://t.co/QCMYFQNSq8 #mlb #vintage #Chicago #Detroit #Boston #Brooklyn #Phil…
#brettjuliano you already know #unity #newengland #hiphop #boston #watertown #network
I have a column that uses the following formula to see if the message starts out with RT meaning a re-tweet. It returns 1 for yes and 0 for no.
What I would like to accomplish is to create a formula in two columns. One that will get the username if the RT column has a value of 1 and in the second column the username if the RT column has a value of 0. Since usernames are of arbitrary length I am unsure of how to go about this.
Example
RT #Just_Sports: | 1 | #Just_Sports | 0
#brettjuliano | 0 | | #brettjuliano
Take a look at Excel's FIND function. You can use this to identify the position of the #, then using a specified delimiter, match the end of the user name:
=MID(A1, FIND("#",A1), FIND(":",A1,FIND("#",A1)) - FIND("#",A1))
Where A1 is the cell containing the tweet, and ":" is your delimiter.
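For readers who want to see the MID/FIND arithmetic step by step, the following Python sketch performs the same extraction. It is an illustration only: `extract_between` is a hypothetical name, and Python positions are 0-based where FIND's are 1-based.

```python
def extract_between(text, start_marker="#", delimiter=":"):
    """Return the text from start_marker up to (not including) the next delimiter."""
    start = text.find(start_marker)    # FIND("#", A1); -1 if the marker is absent
    if start == -1:
        return None
    end = text.find(delimiter, start)  # FIND(":", A1, FIND("#", A1))
    if end == -1:
        return None
    return text[start:end]             # MID(A1, start, end - start)


tweet = "RT #Just_Sports: Cool page for fans of early pro #baseball."
print(extract_between(tweet))
```

Passing a different `delimiter`, such as a space, handles names that are not followed by a colon.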
You can use the same feature to check for the existence of the "RT" identifier.
=ISNUMBER(FIND("RT",A1))
Which returns TRUE if "RT" is found and FALSE otherwise (FIND on its own returns a #VALUE! error, not 0, when there is no match, so it is wrapped in ISNUMBER). You may want to consider a search for " RT " (spaces), or some other variation, since there is no standard for using this in a tweet:
=OR(ISNUMBER(FIND("RT",A1)),ISNUMBER(FIND(" RT",A1)),ISNUMBER(FIND("RT ",A1)),ISNUMBER(FIND(" RT ",A1)))
But beware of false positives: ART, START, ARTOO, etc...
Additionally, your "RT" may be lower/upper/mixed case, in which case you'll want to normalize that search:
=OR(ISNUMBER(FIND("RT",UPPER(A1))),ISNUMBER(FIND(" RT",UPPER(A1))),ISNUMBER(FIND("RT ",UPPER(A1))),ISNUMBER(FIND(" RT ",UPPER(A1))))
My OR check is different than the 0/1 check you say you already have, so you can just add IF to that to convert to the 0/1 as needed:
=IF(OR(ISNUMBER(FIND("RT",A1)),ISNUMBER(FIND(" RT",A1)),ISNUMBER(FIND("RT ",A1)),ISNUMBER(FIND(" RT ",A1))),1,0)
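One way to sidestep the false positives mentioned above is to test for "RT" as a standalone word rather than as a substring. The Python sketch below uses whitespace tokenization, which is a different technique from the FIND-based formulas; `is_retweet` is a hypothetical name, and the rule assumes "RT" is separated from the rest of the tweet by spaces.

```python
def is_retweet(tweet):
    """Return 1 if "RT" appears as a standalone word (any case), else 0."""
    words = tweet.upper().split()  # normalize case, then split on whitespace
    return 1 if "RT" in words else 0


print(is_retweet("RT #Just_Sports: Cool page for fans of early pro #baseball."))
print(is_retweet("START the ART show"))
```

Words such as START or ART are not flagged because the comparison is against whole tokens.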
Once you know you have the RT check correct, and your second column is filled properly, you can add to my original formula:
Case for 1 in 2nd column:
=IF(B1=1,MID(A1, FIND("#",A1), FIND(":",A1,FIND("#",A1)) - FIND("#",A1)),"")
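Case for 0 in 2nd column (the sample non-retweet has no colon after the username, so this version assumes a space as the delimiter):
=IF(B1=0,MID(A1, FIND("#",A1), FIND(" ",A1,FIND("#",A1)) - FIND("#",A1)),"")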

Match text from column within a certain cell - Excel

I have a column of a few thousand filenames that are not uniform. For instance:
| Column A | Column B |
===============================
| junk_City1_abunc | City1 |
-------------------------------
| nunk_City1_blahb | City1 |
-------------------------------
| small=City2_jdjf | City2 |
-------------------------------
| mozrmcity3_somet | City3 |
I would like to identify the city within the text in column A and return it in Column B.
I've come up with a complex formula that does the trick, but it is difficult to adjust if more cities are added within the filenames in new entries within column A.
Here is an example:
=IF(ISNA(MATCH("*"&$W$3&"*",I248,0)),IF(ISNA(MATCH("*"&$W$4&"*",I248,0)),IF(ISNA(MATCH("*"&$W$5&"*",I248,0)),IF(ISNA(MATCH("*"&$W$6&"*",I248,0)),IF(ISNA(MATCH("*"&$W$7&"*",I248,0)),IF(ISNA(MATCH("*"&$W$8&"*",I248,0)),"Austin","Orlando"),"Las Vegas"),"Chicago"),"Boston"),"Los Angeles"),"National")
It seems like there should be an easier way to do it, but I just can't figure it out.
(To make matters worse, not only am I identifying a city within the filename, I'm looking for other attributes to populate other columns)
Can anyone help?
Use the formula =IFERROR(LOOKUP(1E+100,SEARCH($E$2:$E$11,A2),$E$2:$E$11),A2)
This does NOT have to be array-entered.
Here $E$2:$E$11 is the list of names you want returned and A2 is the cell to test.
If no match is found, the formula returns the full filename from column A instead of an error.
If you want errors, or expect never to have any, you can just use:
=LOOKUP(1E+100,SEARCH($E$2:$E$11,A2),$E$2:$E$11)
Here's a roundabout way that works. It's not all my own work but a mishmash of bits from other sources.
Assuming the filenames are in column A (starting at A2) and the list of search terms is in C2:C8, the formula to use is below. It must be entered using Ctrl+Shift+Enter:
=INDEX($C$2:$C$8,MAX(IF(ISERROR(SEARCH($C$2:$C$8,A2)),-1,1)*(ROW($C$2:$C$8)-ROW($C$2)+1)))
Annotated version:
=INDEX([List of search terms],MAX(IF(ISERROR(SEARCH([List of search terms],[Cell to search])),-1,1)*(ROW([List of search terms])-ROW([Cell containing first search term])+1)))
