How to find duplicates by row in Excel

I've been looking for a way to see which rows have duplicate words in them.
If a word matches in column A and column C, I would like to add an "X" to column B. The whole cell shouldn't have to be exactly the same, for example John Miller and Miller, J. This needs to match words only within the same row, not across the entire column. I have 50k+ rows to work through, so I'm looking for a better way. Any help would really be appreciated.
Here's what it looks like:
| A | B | C |
|---|---|---|
| Jf Wepener | | Lourens Johannes Stephanus |
| Me Horn | x | Horn Maria Elizabeth |
| Jg Waldeck | x | Waldeck Johan George |
| Pj Du Preez | x | Preez Paulus Jacobus Du |

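In plain Excel this needs fairly long formulas to work well. The scheme is as follows.
First, split the string in column A into separate words for searching, using these columns (the formulas use ; as the argument separator, as in many European locales; use , if your Excel expects commas):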
D: =TRIM(IFERROR(LEFT(A2;SEARCH(" ";A2;1));A2))
E: =TRIM(IFERROR(LEFT(MID(A2;LEN(D2)+2;99);SEARCH(" ";MID(A2;LEN(D2)+2;99);1));MID(A2;LEN(D2)+2;99)))
F: =TRIM(IFERROR(LEFT(MID(A2;LEN(D2)+LEN(E2)+3;99);SEARCH(" ";MID(A2;LEN(D2)+LEN(E2)+3;99);1));MID(A2;LEN(D2)+LEN(E2)+3;99)))
In column B:
=IF(OR(AND(ISNUMBER(SEARCH(D2;C2));IF(D2="";FALSE;TRUE));AND(ISNUMBER(SEARCH(E2;C2));IF(E2="";FALSE;TRUE));AND(ISNUMBER(SEARCH(F2;C2));IF(F2="";FALSE;TRUE)));"x";"")
This works whether or not the cell has a trailing space. If column A can contain more than three words, add another splitting column (and another term in the OR) for each extra word. You can hide the helper columns, since they are only needed to generate the result.
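To make the per-row logic explicit, here is a minimal Python sketch of what the helper columns and the column B formula compute together, assuming the data has been exported as (A, C) text pairs. `mark_row` is a hypothetical name. Like SEARCH, it does a case-insensitive substring test, but it splits column A into any number of words instead of a fixed three.

```python
def mark_row(a: str, c: str) -> str:
    """Return "x" if any word of a appears (case-insensitively) inside c, else ""."""
    words = a.split()          # plays the role of helper columns D:F, for any word count
    c_lower = c.lower()        # SEARCH ignores case, so compare in lower case
    return "x" if any(w.lower() in c_lower for w in words) else ""


rows = [
    ("Jf Wepener", "Lourens Johannes Stephanus"),
    ("Me Horn", "Horn Maria Elizabeth"),
    ("Jg Waldeck", "Waldeck Johan George"),
    ("Pj Du Preez", "Preez Paulus Jacobus Du"),
]
for a, c in rows:
    print(f"{a} | {mark_row(a, c)} | {c}")
```

Because this is a substring test (as with SEARCH), very short tokens such as "Du" can also match inside longer words, so a whole-word comparison may be worth considering for real data.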


Is there a way to find any one of a set of characters using an excel formula

I have data that uses either a range or a less-than symbol (meaning 'between 0 and the number'). But multiple characters are used for the same purpose.
It looks like below (first two columns), plus a column showing the results I want:
| Country | Average hotdog consumption | Desired output |
|---|---|---|
| Madeupaland | 10-200 | 105 |
| Exampledesh | 50—1000 | 525 |
| Republic of Notreal | <1000 | 500 |
| Inventia | ≤5000 | 2500 |
Plus many rows where the data in the second column is purely numerical and doesn't need finessing into a number
I can use this formula to calculate the midpoint where there is a range:
=IFERROR(AVERAGE(LEFT(C2,FIND("–",C2)-1),RIGHT(C2, LEN(C2)-FIND("–",C2))), A2)
But this only covers one kind of dash (- and not —). Similarly, if I want to halve the numbers in rows with < and ≤, I'd need to duplicate the formula for each symbol.
Is there a way of finding any one of several different characters from a set? My understanding is that FIND looks for the whole string of characters. SUBSTITUTE is a workaround, but I'd have to substitute every different value in the 'character set'.
In regex this would just be [-—].
I'm using Excel 2013, if that matters.
It's not a perfect solution, but you can try the following. It replaces those patterns of text with substitutions that encode which formula to use.
Create a reference table (I have put it in I1:K5):
|Pattern |Pattern Name |Substitution Rule |
|------- |------------ |----------------- |
|— |double dash |/2+0.5* |
|- |dash |/2+0.5* |
|< |lt |0.5* |
|≤ |lte |0.5* |
In your third column, enter the following array formula (confirm it with Ctrl + Shift + Enter):
=IF(ISNUMBER(B2),B2,"'="&SUBSTITUTE(B2,INDEX($I$2:$I$5,MIN(IF(ISNUMBER(FIND($I$2:$I$5,B2)),ROW($I$2:$I$5)-1,99))),INDEX($K$2:$K$5,MIN(IF(ISNUMBER(FIND($I$2:$I$5,B2)),ROW($I$2:$I$5),99)-1))))
Copy your third column and paste values into a fourth column.
Use Find & Replace (Ctrl + H) to replace every ' with nothing, so that Excel evaluates the expressions.
My Result:
| Country | Average hotdog consumption | Desired output | Formula Paste | Output after replacing 's |
|---|---|---|---|---|
| Madeupaland | 10-200 | 105 | '=10/2+0.5*200 | 105 |
| Exampledesh | 50—1000 | 525 | '=50/2+0.5*1000 | 525 |
| Republic of Notreal | <1000 | 500 | '=0.5*1000 | 500 |
| Inventia | ≤5000 | 2500 | '=0.5*5000 | 2500 |
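Outside Excel 2013, the regex character class the question mentions does this directly. Here is a minimal Python sketch, assuming each value is either a plain number, "low<dash>high" with any of hyphen, en dash or em dash, or "<N"/"≤N". The patterns and the `estimate` name are my own, not part of the answer above.

```python
import re

NUM = r"(\d+(?:\.\d+)?)"
# One character class per rule: any dash variant (hyphen, en dash, em dash) ...
RANGE = re.compile(rf"^\s*{NUM}\s*[-\u2013\u2014]\s*{NUM}\s*$")
# ... and any "less than" variant (< or the less-than-or-equal sign).
UPPER = re.compile(rf"^\s*[<\u2264]\s*{NUM}\s*$")


def estimate(value: str) -> float:
    """Midpoint for a range, half of N for '<N'/'≤N', otherwise the number itself."""
    m = RANGE.match(value)
    if m:
        low, high = float(m.group(1)), float(m.group(2))
        return (low + high) / 2
    m = UPPER.match(value)
    if m:
        return float(m.group(1)) / 2  # "between 0 and N" -> N/2
    return float(value)  # already purely numerical


for v in ["10-200", "50\u20141000", "<1000", "\u22645000", "42"]:
    print(v, estimate(v))
```

Adding another symbol means adding one character to a class rather than another SUBSTITUTE or reference-table row.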

Excel: How to find six different combinations of words in string?

I have been working for several days on this and have researched everything looking for this answer. I'd appreciate any help you can give.
In Excel I am searching a string of text in column A:
Bought 1 HD Sep 3 2021 325.0 Call # 2.75
I am detecting the first word (in this case "Bought") and detecting the last word before "#" symbol (in this case "Call").
I am then detecting the price following the "#" symbol (in this case "2.75"). This number will go into column B (header "Open") or column C (header "Close") depending on the combination of words found:
Sold/Put=Close
Sold/Call=Open
Bought/Put=Open
Bought/Call=Close
Sold (by itself)=Open
Sold (by itself)=Close.
Bought 1 HD Sep 3 2021 325.0 Call # 2.75
The combination found in the above string is: "Bought Call". Therefore the number at the end ("2.75") goes into the "Open" column.
Here's another example:
Sold 4 AI Sep 17 2021 50.0 Put # 1.5
The combination found in the above string is: "Sold Put". Therefore the number at the end ("1.5") goes into "Close" column.
I am currently using this formula to determine if the string contains "Sold" and "Call" and get the desired number and it does work:
=IF(AND(
ISNUMBER(SEARCH({"Sold","Call"},A10))),
TRIM(MID(A10,SEARCH("#",A10)+LEN("#"),255))," ")
But I don't know how to search for all the other possible combinations.
The point behind this is to be able to paste the transaction from the broker and have most of the entry process automated. I'm sure many will benefit from this as I've not found anything like this.
I'd appreciate any help and if possible, an explanation of the formula so I can better learn.
Thanks!
I think you have the right idea, but I would just extend the IF statement.
Something like the below might work for you:
=IF(ISNUMBER(SEARCH("Call", $A1)),
IF(ISNUMBER(SEARCH({"Bought","Sold"}, $A1)),
NUMBERVALUE(RIGHT($A1, LEN($A1)-SEARCH("#", $A1))),""),
IF(ISNUMBER(SEARCH({"!!!","!!!","Bought","Sold"}, $A1)),
NUMBERVALUE(RIGHT($A1, LEN($A1)-SEARCH("#", $A1))),""))
Enter it in column B and drag down; columns B through E should fill as needed.
Note that "!!!" is just an arbitrary placeholder; it can be any text that you don't expect to appear in the string.
(The following requires an Office 365-compatible version of Excel.)
Main lookup
=LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($J$7:$J$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))
EDIT:
Other Excel versions:
=IFERROR(INDEX($J$7:$J$12,MATCH(1,IF($I$7:$I$12="",MATCH("*"&$H$7:$H$12&"*",B4,0)*ISNUMBER(MATCH("*"&$I$7:$I$12&"*",B4,0)),MATCH("*"&$H$7:$H$12&"*",B4,0)*MATCH("*"&$I$7:$I$12&"*",B4,0)),0)),)
(All that falls away is LET: fn_1 and fn_2 are replaced with the MATCH expressions they stand for, which makes the formula somewhat longer but otherwise identical.)
Example applications
Here are two examples of how you might customize this to insert the number into one of the columns. (The key part of this question is really how to do the lookup in the first place; from there on it's a matter of fine-tuning and taking the appropriate action.)
Assuming calls/buys are a "long" position whose strike price goes in the first column (here, D), and puts/sales are a "short" position whose strike price goes in the second column (here, E):
Long - insert strike price col D
=IF(LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))=1,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
EDIT
Other Excel versions:
=IF(IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",MATCH("*"&$H$7:$H$12&"*",B4,0)*ISNUMBER(MATCH("*"&$I$7:$I$12&"*",B4,0)),MATCH("*"&$H$7:$H$12&"*",B4,0)*MATCH("*"&$I$7:$I$12&"*",B4,0)),0)),)=1,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
Short - insert strike price col E
=IF(LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))=2,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
EDIT
Other Excel versions:
Follow the same routine as in the previous edits (remove LET and replace fn_1 and fn_2 with their respective formulas).
Note the similarity among the three formulas above: the 2nd and 3rd contain the 1st. They effectively wrap a big IF statement around it, use the second lookup column (here, column K), and use MID/SEARCH to extract the rate after the "#".
This assumes there are no other "#" characters in the string.
Customize as required.
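As a cross-check on the lookup-table idea, here is a minimal Python sketch of the same logic: take the first word and the last word before "#", look the pair up in a rules table, and return the target column with the price. `RULES` and `classify` are hypothetical names. The mapping copies the question's list (where Bought/Call is Close, although the question's worked example puts it in Open), and the two "by itself" rules are left out because the list gives them conflicting outcomes.

```python
from typing import Dict, Optional, Tuple

# (first word, last word before "#") -> target column.
# Copied from the question's list; adjust to the rules you actually intend.
RULES: Dict[Tuple[str, str], str] = {
    ("Sold", "Put"): "Close",
    ("Sold", "Call"): "Open",
    ("Bought", "Put"): "Open",
    ("Bought", "Call"): "Close",
}


def classify(line: str, rules: Dict[Tuple[str, str], str] = RULES) -> Optional[Tuple[str, float]]:
    """Return (column, price) for a broker line, or None if no rule applies."""
    if "#" not in line:
        return None
    before, after = line.split("#", 1)  # like the answers, assume only one "#"
    words = before.split()
    if not words:
        return None
    action, option = words[0], words[-1]  # e.g. "Sold", "Put"
    column = rules.get((action, option))
    if column is None:
        return None
    return column, float(after.strip())


print(classify("Sold 4 AI Sep 17 2021 50.0 Put # 1.5"))
```

Keeping the rules in one table (as the H:J range does in the answer) means a change in the mapping is a data edit, not a formula edit.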

Is there a way to compare text strings in Excel and output a complete/partial/no match column (with the information missing listed)?

I have a large spreadsheet (upwards of 119K rows) of mismatched data. Column A contains a list of full names (and occasionally a Trustee or company name), and Column B contains first/middle initials with full surnames (and occasionally Trustee or company names).
I do not currently have a way to compare them short of doing so manually, as there are many variables, so I am looking for some assistance.
So far I have tried using a VBA script from (How do I fuzzy match just adjacent cells?) to see if it can output the difference (which would allow me to eliminate the cells in Column B that had no matching data), but this did not work as intended.
I have also tried various LEFT/RIGHT formulas to trim the names from Column A and then match them to Column B, but this has also not worked due to the variation in text in Column A.
Here are some examples of the cells. Note that the names in Column A are not always in alphabetical order, but Column B is:
Example (complete match):

| Column A | Column B |
|---|---|
| Smith Marcus John | J M Smith |
| Page Binder Book, Quoth Nevermore Raven | B B Page, R N Quoth |
| Orange Apple Banana, Orange Pear Plum | A B & P P Orange |
| Koala Bear, Koala Marsupial Pouch, Koala Gum Tree | B, P M & T G Koala |
| S & P Limited | S & P Limited |
| S & P Limited | A D Cumin (S & P Limited) |

Example (partial):

| Column A | Column B |
|---|---|
| Page Binder Book, Quoth Nevermore Raven | B B Page |
| Orange Apple Banana, Orange Pear Plum | A B & P P Orange (Fruit 2019 Limited) |
| Koala Bear, Koala Marsupial Pouch, Goanna Gumtree, Koala Gum Tree | B, P M & T G Koala |

Example (no match):

| Column A | Column B |
|---|---|
| Smith Marcus John | H J Hyde |
| Sheppard Garrus Thane | B B Page, R N Quoth |
What I am hoping to do:
Firstly, I am hoping to correctly mark each cell in Column B as complete/partial/no match with a fill (green/yellow/red). Secondly, for partial matches (whether Column A has extra information, or Column B is missing information) I want to output in Column C the missing information, like so:
| Column A | Column B | Column C |
|---|---|---|
| Page Binder Book, Quoth Nevermore Raven | B B Page | Quoth Nevermore Raven |
| Orange Apple Banana, Orange Pear Plum | A B & P P Orange (Fruit 2019 Limited) | (Fruit 2019 Limited) |
Is this kind of thing even possible, or are there just too many variations in the way the data is presented in each column?
I'm very new to both this site and Excel functions in general; this is my first task!
Thank you for your assistance/knowledge/time.
Importing and using this VBA module, https://github.com/kyledeer-32/vba_fuzzymatching, which contains several user-defined functions (UDFs), will get you a near-optimal solution (you will still have to review the matches). You can easily fuzzy match, then calculate the similarity between strings, and then rank them with a simple IF function.
I noticed that "Koala Bear..." in Column A matched "S & P..." in Column B, when I expected the value in Column B ending in "...Koala" to match. I checked the script, and the Levenshtein edit distance was actually equal for both. This rare occurrence means you will need to review your matches, but you can do this quickly by ranking your results by string similarity.
To import the VBA module linked at the beginning of this answer, follow this guide: https://www.excelcampus.com/vba/copy-import-vba-code/
Note: after importing this module, you will need to enable the "Microsoft Scripting Runtime" library in the Visual Basic Editor window for it to run. Steps to do this (takes less than a minute):
From Excel Workbook:
Select Developer tab on ribbon
Select Visual Basic
Select Tools on the Toolbar
Select References
Scroll down until you see Microsoft Scripting Runtime, then check the box
Press OK
Then you're all set! You can use the UDFs just as you would use normal Excel functions. Hope this helps!
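The answer's matching relies on Levenshtein edit distance. As an illustration of the idea only (this is not the module's code), here is a minimal Python sketch that computes the distance, normalises it to a 0-to-1 similarity, and grades a pair as complete/partial/no match. The function names and the `grade` thresholds of 0.9 and 0.5 are assumptions to tune against a reviewed sample of your data.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute; cost 1 each)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, falling towards 0.0 as they differ."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def grade(a: str, b: str, full: float = 0.9, partial: float = 0.5) -> str:
    # Hypothetical thresholds: review a sample and adjust before trusting them.
    s = similarity(a.lower(), b.lower())
    if s >= full:
        return "complete"
    if s >= partial:
        return "partial"
    return "no match"


print(grade("S & P Limited", "S & P Limited"))
print(grade("Smith Marcus John", "H J Hyde"))
```

Plain edit distance penalises reordered words and initials (for example "Smith Marcus John" against "J M Smith"), which is one reason the results still need the manual review the answer recommends.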

How to find numeric values Before & After a String in an Excel Cell

I hope I can get some assistance as to which formula to use. In the three rows below, I am trying to pull values from the right.
In the first line you can see 10X50, meaning 10 packages of 50 items each, so I need to extract the values before and after the X.
They could go in two cells: the value before the X in one cell and the value after the X in the next. Sometimes the X is a few spaces before the last word. Can anyone help, please?
DEXTROSE 50% 2G/ML 10X50 LSSYR
LEVETIRACETAM INJ USP 500MG SSOL 25X5
DOBUTAMINE 100 INJ 1X5 ML AMP SAM (PF)
This should work for you. It assumes the measurement is at or near the end and looks for the last occurrence of "x", so it will not work if another x appears after the measurement. Your examples also only had numbers between 1 and 99 (no more than two digits), so this formula will not work if the measurement is longer than 5 characters: aaXbb is OK, aaaXbb is not. SUBSTITUTE is case-sensitive, so the formula uses UPPER to catch both x and X; without it, the all-caps examples above would return #VALUE!.
=TRIM(RIGHT(LEFT(A1,SEARCH("^^",SUBSTITUTE(UPPER(A1),"X","^^",LEN(A1)-LEN(SUBSTITUTE(UPPER(A1),"X",""))))+2),5))
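The question also asks for the two numbers in separate cells. Here is a minimal Python sketch of that split, using a regular expression for the last digits-X-digits token; `PACK` and `pack_size` are hypothetical names. Unlike the formula, it has no five-character limit, though it still assumes the pack size is the last such token.

```python
import re
from typing import Optional, Tuple

# Digits, an x or X (optionally spaced), digits, as a standalone token: 10X50, 1X5, 25X5.
PACK = re.compile(r"\b(\d+)\s*[xX]\s*(\d+)\b")


def pack_size(desc: str) -> Optional[Tuple[int, int]]:
    """Return (packages, items_per_package) from the LAST NxM token, or None."""
    matches = PACK.findall(desc)
    if not matches:
        return None
    before, after = matches[-1]
    return int(before), int(after)


for d in [
    "DEXTROSE 50% 2G/ML 10X50 LSSYR",
    "LEVETIRACETAM INJ USP 500MG SSOL 25X5",
    "DOBUTAMINE 100 INJ 1X5 ML AMP SAM (PF)",
]:
    print(d, pack_size(d))
```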

Need advice on VLOOKUP, LOOKUP or MATCH

I have four columns Name, Y/N, NameList and Result, e.g.:
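| Name (A) | Y/N (B) | NameList (C) | Result (D) |
|---|---|---|---|
| Abc | Y | Xyz | N |
| Xyz | N | Wto | N.A |
| Def | Y | Abc | Y |
| | | Tow | N.A |
| | | Wtf | N.A |
| | | Qrz | N.A |
| | | Def | Y |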
I want to fill column D (Result) with the value from column B wherever the entry in column C matches an entry in column A.
I have tried LOOKUP, VLOOKUP and MATCH but still do not get what I want, e.g.:
=INDEX($B$2:$B$51,MATCH($A$2:$A$51,$C$2:$C$75,0))
What am I doing wrong here?
If you are prepared to replace the spaces in Column A (with nothing), then
=IFERROR(VLOOKUP(REPLACE(C2,SEARCH(" ",C2),1,""),A$2:B$5,2,FALSE),VLOOKUP(C2,A$2:B$5,2,FALSE))
should work for entries in the NameList that include a single space, as well as those with no spaces, but you might want to apply TRIM to NameList first.
NOTE: Chris Neilsen's solution (in a comment on the OP's own answer) is a much better solution (once the requirements were clarified!)
I have found a solution to my own question, but it is not perfect because the values need to match exactly, including spaces. I will stick with it for now. If anyone has a better answer, please advise.
=VLOOKUP(C2,$A$2:$B$4,2,FALSE)
Thanks
P.S. What should I do to also match abc/123 and abc /123? Because of the space, they currently don't match.
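The space problem in the P.S. comes down to normalising both sides before comparing. Here is a minimal Python sketch of that idea with hypothetical names (`normalise`, `fill_result`): it strips all whitespace, lower-cases (VLOOKUP's exact match ignores case), and returns the first matching Y/N flag or "N.A", mimicking the Result column in the question.

```python
from typing import Dict, List


def normalise(name: str) -> str:
    """Drop all whitespace and ignore case, so 'abc /123' equals 'abc/123'."""
    return "".join(name.split()).lower()


def fill_result(names: List[str], flags: List[str], name_list: List[str],
                missing: str = "N.A") -> List[str]:
    """For each NameList entry, return the flag of the matching Name, else `missing`."""
    lookup: Dict[str, str] = {}
    for name, flag in zip(names, flags):
        lookup.setdefault(normalise(name), flag)  # first match wins, like VLOOKUP
    return [lookup.get(normalise(n), missing) for n in name_list]


print(fill_result(["Abc", "Xyz", "Def"], ["Y", "N", "Y"],
                  ["Xyz", "Wto", "Abc", "Tow", "Wtf", "Qrz", "Def"]))
```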
