Remove duplicates within Excel cell - excel

Say I have the following text string in one single Excel cell:
John John John Mary Mary
I want to create a formula (so no menu functions or VBA, please) that would give me, on another cell
John Mary
How can I do this?
So far I have searched the internet and SO, and all I could find were solutions involving Excel's built-in duplicate removal, or COUNTIF combined with replacing duplicates with "". I've also looked through the list of Excel functions, especially those in the "Text" category, but couldn't find anything useful that works on a single cell.

The answer is here: https://www.extendoffice.com/documents/excel/2133-excel-remove-duplicate-characters-in-string.html
Function RemoveDupes2(txt As String, Optional delim As String = " ") As String
Dim x
'Updateby20140924
With CreateObject("Scripting.Dictionary")
.CompareMode = vbTextCompare
For Each x In Split(txt, delim)
If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
Next
If .Count > 0 Then RemoveDupes2 = Join(.keys, delim)
End With
End Function
Put the code above in a module.
Use =RemoveDupes2(A2,","), where A2 contains repeated text separated by commas.
You may change the delimiter
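For readers who want to check the logic outside Excel, here is a minimal Python sketch of what the VBA function above appears to do: split on the delimiter, trim each piece, and keep only the first occurrence of each item, compared case-insensitively. The name remove_dupes is hypothetical, and lowercasing is only a rough stand-in for vbTextCompare.

```python
def remove_dupes(txt, delim=" "):
    """Keep the first occurrence of each item, ignoring case when comparing."""
    seen = {}  # dicts keep insertion order, like the Dictionary's keys
    for item in txt.split(delim):
        item = item.strip()
        key = item.lower()  # assumption: approximates vbTextCompare
        if item and key not in seen:
            seen[key] = item
    return delim.join(seen.values())


print(remove_dupes("John John John Mary Mary"))  # John Mary
print(remove_dupes("a, b,a ,c", ","))            # a,b,c
```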

Assuming you'll never have more than two distinct names in a cell, this should work:
=MID(A1&" ",1,FIND(" ",A1&" "))&
MID(SUBSTITUTE(A1&" ",MID(A1&" ",1,FIND(" ",A1&" ")),"")&" ",1,
FIND(" ",SUBSTITUTE(A1&" ",MID(A1&" ",1,FIND(" ",A1&" "))&" ","")))
It will show John Mary for all of these:
John John John Mary Mary
John Mary
John Mary John Mary
John Mary Mary
John John Mary
It will show John for all of these:
John
John John
John John John
And it will show nothing if A1 is blank.

As I wrote, it is trivial to solve with VBA. If you cannot use VBA, one method is to use helper columns.
Assume: Your string is in A1
Enter the following formulas:
C1: =IFERROR(INDEX(TRIM(MID(SUBSTITUTE($A$1," ",REPT(" ",99)),(ROW(INDIRECT("1:" & LEN($A$1)-LEN(SUBSTITUTE($A$1," ",""))+1))-1)*99+((ROW(INDIRECT("1:" & LEN($A$1)-LEN(SUBSTITUTE($A$1," ",""))+1))=1)),99)),ROWS($1:1),1),"")
D1: =IF(COUNTIF(C1:$C$5,C1)=1,C1,"")
Select C1 and D1 and fill down until you start getting blanks
E1: =D1
E2: =TRIM(CONCATENATE(D2," ",E1))
Select E2 and fill down.
The contents of the last cell filled in column E will be your result.
If you want to have a cell which automatically returns the contents of the last cell in column E range, you can use a formula like:
=LOOKUP(REPT("z",99),$E$1:$E$100)

Without a formula: Text to Columns with space as the delimiter, transpose the output, apply Remove Duplicates to each of the columns individually, then transpose the result.

I found a solution that might work if you are also the one building the list.
If you build the list by combining the cell above with the current line, you can check whether the value is already in the cell above using the following formula:
if(iserror(find(value_to_be_added,previous_concatenation)),
previous_concatenation&" "&value_to_be_added,previous_concatenation)
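As a rough illustration of the running-concatenation idea above, here is a Python sketch. The function name append_if_new is hypothetical, and stripping the leading space for an empty starting value is an addition that the formula itself does not make.

```python
def append_if_new(previous, value):
    """Append value to the running text unless it is already found in it."""
    if value in previous:  # substring test, analogous to FIND
        return previous
    # assumption: strip so an empty starting cell does not leave a leading space
    return (previous + " " + value).strip()


running = ""
for name in ["John", "John", "Mary", "John", "Mary"]:
    running = append_if_new(running, name)
print(running)  # John Mary
```

Because the check is a substring search, a value such as "Mary" would also count as already present if the running text contained "Maryann".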

Did you try the textjoin function? (available in Excel 2016, not sure about previous versions). Was just looking for something similar and this seems to do the job for me on a column where I have multiple values more than once.
=TEXTJOIN(delimiter;ignore_empty;text)
define delimiter in any way you need it
ignore empty can be true or false, depending on what serves your needs
text would be your array of values - using the unique function within here (see example below) will filter out any multiples of any string (I am using it for numbers and it works)
Example:
=TEXTJOIN(" ";TRUE;UNIQUE($A$1:$A$16))
Guess this might be Excel's equivalent to Google Sheets' JOIN function. TEXTJOIN comes up if you type in =join - I took the formula provided in user11308575's post above but removed the parentheses and their content, then went from there.
Hope this helps (even though the thread is already old) ;)

If one has access to TEXTJOIN one could use:
=TEXTJOIN(" ",,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[not(preceding::*=.)]"))

I found the answer below in this thread: https://superuser.com/questions/643909/remove-duplicate-entries-in-one-cell (note that JOIN and SPLIT are Google Sheets functions, so this will not work in Excel as written)
=join(" ",unique(transpose(split(A1," "))))

Related

How to extract email & name from multi-line strings in cells in excel

Example of cells below. As you can see some cells have more info including emails, middle initials or names, but some don't. They're all in the same column.
Cell 1
Smith, James
#129432
123 N. Street Road
Libertyville, IL, 60048
(810) 955-9721
claudie.predov@hotmail.com
Cell 2
Evette Tar Rudnick
#7928253
1308 Stutler Lane
Tidioute, PA, 16351
Cell 3
David Ponce C
#1234567
2855 Retreat Avenue
Frenchboro, ME, 04635
(313) 204-6364
Any help is appreciated. Thank you.
You may try in this way,
• Formula used in cell C1
=FILTERXML("<a><b>"&SUBSTITUTE(A1,CHAR(10),"</b><b>")&"</b></a>","//b[1]")
• Formula used in cell D1
=IF(ISNUMBER(FIND("@",A1)),FILTERXML("<a><b>"&SUBSTITUTE(A1,CHAR(10),"</b><b>")&"</b></a>","//b[last()]"),"")
Note: I have assumed the strings within a cell are separated by line breaks, which is why CHAR(10) is used.
Or, if you have access to O365 and are currently on the Insider Beta Channel, you may try the TEXTBEFORE() and TEXTSPLIT() functions as well,
• Formula used in cell E1
=TEXTBEFORE(A1,CHAR(10),1)
• Formula used in cell F1
=IFERROR(INDEX(TEXTSPLIT(A1,,CHAR(10)),6),"")
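The extraction logic can be sketched in Python as follows, assuming (as the answer does) that the name is the first line of the cell and that the email, when present, is the last line. The function name name_and_email is hypothetical, and recognising an email by the "@" character is an assumption.

```python
def name_and_email(cell):
    """Return (first line, last line if the cell contains an email, else "")."""
    lines = cell.split("\n")  # "\n" plays the role of CHAR(10)
    name = lines[0]
    # assumption: an "@" anywhere in the cell means the last line is the email
    email = lines[-1] if "@" in cell else ""
    return name, email


cell_1 = "Smith, James\n#129432\n123 N. Street Road\nLibertyville, IL, 60048\n(810) 955-9721\nclaudie.predov@hotmail.com"
cell_2 = "Evette Tar Rudnick\n#7928253\n1308 Stutler Lane\nTidioute, PA, 16351"
print(name_and_email(cell_1))
print(name_and_email(cell_2))
```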

Remove values in a cell found in another cell

I am wondering if there is a formula, or several, that can remove values within a cell (not delete the cell entirely) that are defined in another cell(s).
Example:
"123 New York St Chicago IL 98765" to be "New York St Chicago IL" by using a formula that looks at the cell with the street # "123" and zip "98765" values and remove it from the cell with the full address.
Using Text to Columns is not an option at this point because there are no commas separating values, and all addresses are not in similar formats.
Any help appreciated.
I believe you're looking for Substitute(). This would need to be a helper cell, as formulas in the worksheet cannot remove/delete data.
Example:
     A                        B
1    cat                      =Substitute(A$4,A1,"")    'output: The chased the dog.
2    dog                      =Substitute(A$4,A2,"")    'output: The cat chased the .
3
4    The cat chased the dog.
If you are looking to actually remove/delete data, you would need VBA, which isn't appropriate for Google-sheets.
in Google Sheets you can do:
=ARRAYFORMULA(TRIM(REGEXREPLACE(A1:A; "\d+"; )))
Or using Textjoin+Filterxml function, of which Textjoin available in Office365
In B1, formula copied down :
=TEXTJOIN(" ",1,INDEX(FILTERXML("<a><b>"&SUBSTITUTE(A1," ","</b><b>")&"</b></a>","//b[.!=0+.]"),0))
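The idea behind the last two formulas, dropping the purely numeric pieces of the address and joining what is left, can be sketched in Python. The function name strip_numeric_tokens is hypothetical, and treating only digits-only words as numeric is a simplifying assumption.

```python
def strip_numeric_tokens(address):
    """Remove space-separated words that consist only of digits."""
    kept = [word for word in address.split() if not word.isdigit()]
    return " ".join(kept)


print(strip_numeric_tokens("123 New York St Chicago IL 98765"))  # New York St Chicago IL
```

Note that this removes every numeric word, not just the street number and zip code, so an address containing another standalone number would lose that too.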

Mode Text 2nd most common text value

IFERROR(INDEX($I$7:$I,MODE(IF($I$7:$I<>"",MATCH($I$7:$I,$I$7:$I,0)))),"No data")
With this formula, which calculates the most common text value, I need to have the 2nd most common.
Column I content:
Apple
Orange
Apple
Apple
Orange
In this example, I need to get Orange. How is that possible? I can't figure how.
A PivotTable might suit; it also copes with ties for rank.
You can extract the most frequent item in the list with an array formula.
=INDEX(MyList,MATCH(MAX(COUNTIF(MyList,MyList)),COUNTIF(MyList,MyList),0))
Note that an array formula must be confirmed with Ctrl+Shift+Enter instead of the plain Enter used for normal formulas. When entered wrongly it will display a #NUM! error.
For simplicity's sake I have used a named range MyList in the formula. However, if you prefer, you can replace the name with something like $I$7:$I$1000.
To extract the second-most frequent item in the list, you can use a formula constructed analogously to the one above.
=INDEX(MyList,MATCH(LARGE(COUNTIF(MyList,MyList),MAX(COUNTIF(MyList,MyList))+1),COUNTIF(MyList,MyList),0))
This formula relies on the fact that if n is the highest number of occurrences, the most frequent item contributes n entries, each equal to n, to the COUNTIF array. The n largest values in that array therefore all belong to the top item, and the (n + 1)th largest value, LARGE(COUNTIF(MyList,MyList),MAX(COUNTIF(MyList,MyList))+1) in the above formula, is the second-highest count. The third-ranked item could be extracted by the same method.
You can embed these formulas in an IFERROR() function.
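To make the n + 1 reasoning concrete, here is a Python sketch that follows the second formula step by step. The function name second_most_frequent is hypothetical; returning None stands in for the error Excel would show when there is no second-ranked item.

```python
def second_most_frequent(items):
    """Mirror INDEX/MATCH/LARGE over a COUNTIF array."""
    counts = [items.count(x) for x in items]  # COUNTIF(MyList, MyList)
    n = max(counts)                           # MAX(COUNTIF(...))
    ranked = sorted(counts, reverse=True)
    if n + 1 > len(ranked):
        return None                           # LARGE has no (n + 1)th value
    target = ranked[n]                        # LARGE(counts, n + 1), 1-based rank
    return items[counts.index(target)]        # INDEX(MyList, MATCH(target, counts, 0))


print(second_most_frequent(["Apple", "Orange", "Apple", "Apple", "Orange"]))  # Orange
```

In this sketch, if two items tie for first place, the (n + 1)th largest count is still the top count, so the first of the tied items is returned rather than a true runner-up.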
I found this on Mr Excel
Return most common, 2nd most common, 3rd most common, etc. text string in an array
Spreadsheet Formulas
Cell  Formula  (note that the cells are B2, D2, E2; column C is blank)
B2    =IF(A2="","",IF(COUNTIF(A$2:A2,A2)=COUNTIF($A$2:$A$100,A2),COUNTIF($A$2:$A$100,A2)+(ROW()/1000),""))
D2    =IF(ROWS($1:1)>COUNT(B:B),"",INDEX(A:A,MATCH(LARGE(B:B,ROWS($1:1)),B:B,0)))
E2    =IF(D2="","",COUNTIF($A$2:$A$100,D2))
Results
    A          B       C  D         E
1   Data Set:  Helper     Name      Occurrences
2   Harmon                Williams  4
3   Smith                 Smith     3
4   Smith                 Harmon    2
5   Harmon     2.005
6   Williams
7   Williams
8   Smith      3.008
9   Williams
10  Williams   4.010
you can try to tie this all together in a single formula but it's simpler and more agile in a spreadsheet environment to just break out the problem in a few separate steps.
take a given column of values you're wanting to count/rank - i'll call it RankList in examples below.
if you're not setting named ranges (do yourself a favor and use named ranges) you'll want this to be your column range - i.e. A:A
now in another column use
=unique(RankList)
there's your list of unique values, now we just need to count the instances of each unique value in the original RankList - this is simple - in the next column over simply use
=countif(RankList,B1)
B1 above represents the cell adjacent to the formula, wherever that might be on your sheet. now autofill the formula in with the relative cell value for each item. now all of your items are counted by instance.
now we want to sort them by value, highest to lowest. create another named range, selecting the two columns containing the =unique(RankList) and =countif(RankList,B1) formulas that were just created, i'll refer to it as UniqueCount
use the following
=sort(UniqueCount, 2, false)
that's it. again you can accomplish this by stacking formulas like in the above examples, but in practice i've found that you won't know what you'll want to do additionally with your data/sheet later on. keeping it broken up in discrete steps like this makes it much easier to make adjustments.
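The three steps above (unique values, a count per value, then a sort on the count column) can be sketched in Python like this. The function name ranked_counts is hypothetical, and the sample data is borrowed from the Mr Excel example earlier in this thread.

```python
def ranked_counts(rank_list):
    """Return (value, count) pairs sorted by count, highest first."""
    uniques = list(dict.fromkeys(rank_list))              # like =unique(RankList)
    counted = [(u, rank_list.count(u)) for u in uniques]  # like =countif(RankList, B1) filled down
    # like =sort(UniqueCount, 2, false): sort on the second column, descending
    return sorted(counted, key=lambda pair: pair[1], reverse=True)


data = ["Harmon", "Smith", "Smith", "Harmon", "Williams",
        "Williams", "Smith", "Williams", "Williams"]
for value, count in ranked_counts(data):
    print(value, count)
```

The second row of the result is the second most common value, the third row the third most common, and so on.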

Excel, append one range to the end of another in one column

I have two columns of data in Excel. I would like to add a third column which combines the first and second. How can I do this with a formula such that I can add or remove data from columns A and B without ever having to touch column C?
Column A Column B Column C
Bob Mary Bob
Joe Melissa Joe
Jim Jackie Jim
Mary
Melissa
Jackie
The question explicitly mentions Microsoft Office Excel, but it is worth adding that if you are using Google Sheets, a simpler solution is to use the curly-brackets operator, as mentioned by Lake at https://stackoverflow.com/a/14151000/1802726.
Here is a simple solution using FILTERXML and TEXTJOIN that can append MULTIPLE RANGES OF ANY SIZE, ARRAY FORMULAS AND REGULAR FORMULAS. Just replace YOUR_RANGES with the ranges or dynamic arrays you wish to join:
Simple version that ignores empty cells:
=FILTERXML("<A><B>" & TEXTJOIN("</B><B>",TRUE,YOUR_RANGES) & "</B></A>", "//B")
This one includes empty cells:
=IFERROR(FILTERXML("<A><B>" & TEXTJOIN("</B><B>",FALSE,YOUR_RANGE) & "</B></A>", "//B"), "")
If your input data contains the "<" character, the formulas above will return an error, so use this one instead:
=IFERROR(SUBSTITUTE(FILTERXML("<A><B>" & SUBSTITUTE(SUBSTITUTE(TEXTJOIN("ΨΨ",FALSE,YOUR_RANGE),"<","ЉЉ"),"ΨΨ","</B><B>")&"</B></A>","//B"),"ЉЉ","<"),"")
Note: you can change the FALSE to TRUE to ignore empty cells.
Note 2: You can replace the characters ЉЉ and ΨΨ by any character(s). I used these specific characters because it is very unlikely that your input data will contain ЉЉ or ΨΨ, which would cause errors.
NOTES:
Tested on:
Excel 365
EXAMPLE:
Using the simple version of the formula:
=FILTERXML("<A><B>" & TEXTJOIN("</B><B>",TRUE,A1:A3,B1:B3,C1:C3) & "</B></A>", "//B")
As a result you will get a dynamic array with the joined/appended ranges:
You can then apply any dynamic array formula (like UNIQUE) to the result.
HOW THIS WORKS:
The TEXTJOIN function takes your ranges and joins them into one text string with the delimiter "</B><B>". Then, after adding "<A><B>" to the beginning and "</B></A>" to the end, we have XML-formatted text:
<A><B>1</B><B>2</B><B>3</B><B>A</B><B>B</B><B>C</B><B>!</B><B>#</B><B>#</B></A>
Finally, the FILTERXML will separate the tags into a dynamic array which will be the joined/appended ranges.
Enter the following formula into cell C1
=IF(ROW()>COUNTA(A:B),"",IF(ROW()<=COUNTA(A:A),INDEX(A:A,ROW()),INDEX(B:B,ROW()-COUNTA(A:A))))
Then copy down as far as you need.
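The row arithmetic of this fill-down approach can be checked with a small Python sketch. It assumes both columns start in row 1 with no header and no blank cells, and that rows past the end of column A index into column B offset by the number of entries in column A. The function name appended_value is hypothetical.

```python
def appended_value(row, col_a, col_b):
    """Value for a 1-based row of the combined column, or "" past the end."""
    if row > len(col_a) + len(col_b):  # ROW() > COUNTA(A:B)
        return ""
    if row <= len(col_a):              # ROW() <= COUNTA(A:A)
        return col_a[row - 1]
    return col_b[row - len(col_a) - 1]  # offset by the length of column A


col_a = ["Bob", "Joe", "Jim"]
col_b = ["Mary", "Melissa", "Jackie"]
print([appended_value(r, col_a, col_b) for r in range(1, 8)])
```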
Here's a nice way of interleaving the two columns.
In other words, turning this:
A X
B Y
C Z
into this:
X
A
Y
B
Z
C
Say the above table is in columns one and two, you'd do:
=IF(MOD(ROW(),2)=0,INDIRECT(ADDRESS(INT(ROW()/2), 1)),
INDIRECT(ADDRESS(INT(ROW()/2)+1, 2)))
Explanation
Let's break that down a little. The first part is MOD(ROW(), 2) which returns a zero if the current row is even, and a one if it's odd.
So the IF goes FALSE/TRUE/FALSE/TRUE as we go down the column.
Next, the ADDRESS(INT(ROW()/2), 1) returns us a string representation of the address of the cell at column 1 and at half the current row. (Rounded down). This piece on its own looks like:
#VALUE!
$A$1
$A$1
$A$2
$A$2
$A$3
$A$3
(That first #VALUE error is because 1/2 = 0.5 which rounds down to zero. There's no row zero!)
The INDIRECT function returns whatever value is found at that address.
The rest is pretty clear.
NOTE: There may be a slicker way than using INDIRECT and ADDRESS. Suggestions welcome.
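The index arithmetic in the explanation above can be traced with a short Python sketch. The function name interleaved_value is hypothetical, and the two lists stand in for columns one and two of the example table.

```python
def interleaved_value(row, col1, col2):
    """Value for a 1-based output row, alternating between the two columns."""
    if row % 2 == 0:                # MOD(ROW(), 2) = 0, an even row
        return col1[row // 2 - 1]   # ADDRESS(INT(ROW()/2), 1), converted to 0-based
    return col2[row // 2]           # ADDRESS(INT(ROW()/2)+1, 2), converted to 0-based


col1 = ["A", "B", "C"]
col2 = ["X", "Y", "Z"]
print([interleaved_value(r, col1, col2) for r in range(1, 7)])
# ['X', 'A', 'Y', 'B', 'Z', 'C']
```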

Combining COUNT IF AND VLOOK UP EXCEL

I have multiple spreadsheets in a workbook and I would like the following in basic English talk:
IF worksheet1(cell)A3, appears in 'worksheet2' column B - count how many times it appears in column b 'worksheet 2'
So in other words - Lets say A3 = BOB smith - in work sheet 1
and appears 4 times in worksheet 2 - I want the formula to count the fact that A3 'Bob smith' is in worksheet 2 4 times, and come back and tell me 4.
I have attempted to do separate calculations - with use of Vlookups - then in another cell to count/do if statement
for example
=COUNTIF(VLOOKUP(A9,'To retire'!J:J,9,1))
=IF(J228=O233, 'worksheet2'!F440,0)
=VLOOKUP(A3,'worksheet2'!A:A,1,1)
Help would be very much appreciated, I am very stuck - I am unsure if I am looking into this too deeply or not enough! Thank you in advance
This is trivial when you use SUMPRODUCT. For example:
=SUMPRODUCT((worksheet2!A:A=A3)*1)
You could put the above formula in cell B3, where A3 is the name you want to find in worksheet2.
=COUNTIF() Is the function you are looking for
In a column adjacent to Worksheet1 column A:
=countif(worksheet2!B:B,worksheet1!A3)
This will search worksheet 2 ALL of column B for whatever you have in cell A3
See the MS Office reference for =COUNTIF(range,criteria).
You can combine this all into one formula, but you need to use a regular IF first to find out if the VLOOKUP came back with something, then use your COUNTIF if it did.
=IF(ISERROR(VLOOKUP(B1,Sheet2!A1:A9,1,FALSE)),"Not there",COUNTIF(Sheet2!A1:A9,B1))
In this case, Sheet2-A1:A9 is the range I was searching, and Sheet1-B1 had the value I was looking for ("To retire" in your case).
Try this:
=IF(NOT(ISERROR(MATCH(A3,worksheet2!A:A,0))),COUNTIF(worksheet2!A:A,A3),"No Match Found")
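As a plain-language check of what the combined formulas return, here is a Python sketch: count how often a name appears in a column, or report that it is missing. The function name count_or_message is hypothetical, and the comparison ignores case on the assumption that this matches how COUNTIF and MATCH compare text.

```python
def count_or_message(name, column):
    """Count case-insensitive matches of name, or return a message if none."""
    wanted = name.casefold()
    hits = sum(1 for cell in column if cell.casefold() == wanted)
    return hits if hits else "No Match Found"


worksheet2_names = ["Bob Smith", "bob smith", "Jane Doe", "BOB SMITH", "Bob smith"]
print(count_or_message("BOB smith", worksheet2_names))  # 4
print(count_or_message("Ann Lee", worksheet2_names))    # No Match Found
```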
If you are referring to two worksheets, please use this formula
=COUNTIF(Worksheet2!$A$1:$A$50,Worksheet1!A1)
If you are referring to more than two worksheets, please use this formula
=COUNTIF(Worksheet2!$A$1:$A$50,Worksheet1!A1)+
COUNTIF(Worksheet3!$A$1:$A$50,Worksheet1!A1)+
COUNTIF(Worksheet4!$A$1:$A$50,Worksheet1!A1)

Resources