How to remove duplicates in a string - excel-formula

I have a file containing 38,000 records, and each row ends with two or more ';' characters. Is there a formula to remove the repeated ';' at the end, in Excel or in any other tool?

To remove repeated characters (semicolons in this case):
1. Press CTRL+H.
2. Find what: ;; (two semicolons)
3. Replace with: ; (one semicolon)
4. Click Replace All.
5. Repeat step 4 until no more matches are found.
Now the document will have no more than one semicolon in a row.
Remove repeated characters using a VBA function:
The following function does the same thing using VBA, and for any character you choose:
Function removeDoubleChars(txt As String, doubleChar As String) As String
    'removes all multiple-consecutive [doubleChar] within [txt]
    Do
        txt = Replace(txt, doubleChar & doubleChar, doubleChar)
    Loop While InStr(txt, doubleChar & doubleChar) > 0
    removeDoubleChars = txt
End Function
You would use this like Range("A1") = removeDoubleChars(Range("A1"), ";") to remove consecutive semicolons from cell A1.
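Since the question also allows "any other tool", here is a minimal Python sketch of the same idea, offered as an illustration rather than a definitive implementation. The function names collapse_repeats and strip_trailing are hypothetical. The first mirrors the find-and-replace approach above; the second removes every trailing semicolon, in case that is what the rows actually need.

```python
import re


def collapse_repeats(text: str, ch: str = ";") -> str:
    """Collapse every run of two or more `ch` into a single `ch`."""
    pattern = "(?:" + re.escape(ch) + "){2,}"
    return re.sub(pattern, ch, text)


def strip_trailing(text: str, ch: str = ";") -> str:
    """Remove every `ch` found at the very end of the string."""
    return text.rstrip(ch)


if __name__ == "__main__":
    row = "a;b;;c;;;"
    print(collapse_repeats(row))  # a;b;c;
    print(strip_trailing(row))    # a;b;;c
```

Unlike the repeated Replace All, the regular expression handles a run of any length in a single pass.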

Related

Extract substrings from irregular text in Excel cell

I am trying to solve this problem:
Suppose I have text like this in a single column in Excel:
#22-atr$$1 AM**01-May-2015&&
$21-atr#10-Jan-2007*6 PM&
&&56-atr#11 PM$$8-Jan-2016*
**4 PM#68-atr#21-Mar-2022&&
and I want to write functions to have separate columns as follows
Can someone help me do that please?
The only thing I have managed so far is extracting the month by using =MID(A1,FIND("-",A1)+1,3)
One option for formulae would be using new functions, currently available in the BETA-channel for insiders:
Formula in B1:
=LET(A,TEXTSPLIT(A1,{"#","$","&","*","#"},,1),B,SORTBY(A,IFERROR(MATCH(RIGHT(A),{"r","M"},0),3)),C,HSTACK(TAKE(B,,2),TEXTSPLIT(TEXT(--INDEX(B,3),"YYYY-Mmm-D"),"-")),IFERROR(--C,C))
The idea is to:
Use LET() throughout to store variables;
TEXTSPLIT() the value in column A using all available delimiters into columns and skip empty values in the resulting array;
Then SORTBY() the resulting three elements by their rightmost character using MATCH(). The IFERROR() catches the date string, which ends in neither "r" nor "M", and sorts it last;
We can then HSTACK() the 1st and 2nd columns with the result of splitting the 3rd element, after first formatting it as YYYY-Mmm-D;
Finally, a double unary (--) tries to convert each element of the resulting array to a number. Where that fails, IFERROR() keeps the original content from the previous variable.
Notes:
I formatted column C to hold time-value in AM/PM.
I changed the text to hold dutch month-names to have Excel recognize the dates for demonstration purposes. Should work the same with English names.
For fun an UDF using regular expressions:
Public Function GetPart(inp As String, prt As Long) As Variant
    Dim Pat As String
    Select Case prt
        Case 0
            Pat = "(\d+-atr)"
        Case 1
            Pat = "(\d+\s*[AP]M)"
        Case 2
            Pat = "-(\d{4})"
        Case 3
            Pat = "-(\w+)-"
        Case 4
            Pat = "(\d+)-\w+-"
        Case Else
            Pat = ""
    End Select
    With CreateObject("vbscript.regexp")
        'Lazy .*? so the capture group keeps all leading digits;
        'a greedy .* would swallow all but the last digit (e.g. "2-atr" instead of "22-atr")
        .Pattern = ".*?" & Pat & ".*"
        GetPart = .Replace(inp, "$1")
    End With
End Function
Invoke through =GetPart(A1,0). Choices are 0-4, in the order of your column headers.
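For comparison, here is a rough Python sketch of the same per-part extraction. It searches for the part directly instead of replacing the whole string. The name get_part and the PATTERNS table are hypothetical, and the meaning of each part number is an assumption, since the column headers from the question are not shown.

```python
import re

# Hypothetical part numbers, mirroring the Select Case in the UDF above.
PATTERNS = {
    0: r"\d+-atr",            # code such as 22-atr
    1: r"\d+\s*[AP]M",        # time such as 1 AM
    2: r"-(\d{4})",           # year
    3: r"-([A-Za-z]+)-",      # month name
    4: r"(\d+)-[A-Za-z]+-",   # day
}


def get_part(text: str, part: int) -> str:
    """Return the requested part of `text`, or an empty string if absent."""
    match = re.search(PATTERNS[part], text)
    if match is None:
        return ""
    # Use the capture group when the pattern has one, else the whole match.
    return match.group(match.lastindex or 0)


if __name__ == "__main__":
    sample = "#22-atr$$1 AM**01-May-2015&&"
    print([get_part(sample, p) for p in range(5)])
```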
You can achieve what you wish by applying a few simple transformations.
Replace #, $, * and & with a single common character that is guaranteed not to appear in the data sections (e.g. #)
Replace every run of two or more # characters with a single #
Trim the # from the start and end of the string
Split the string into an array using # as the delimiter (VBA.Split)
Use For Each to loop over the array
In the loop, run a set of three tests
Test 1 checks the string for the occurrence of "-atr"
Test 2 checks the string for the occurrence of "-XXX-", where XXX is a three-letter month. You then split the date at the - to give an array with Day/Month/Year
Test 3 checks whether the string contains " AM" or " PM"
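The transformations above can be sketched in Python as follows. This is only an illustration under assumptions: the function name split_record and the dictionary keys (code, time, day, month, year) are hypothetical, because the desired column headers are not shown in the question.

```python
import re


def split_record(text: str) -> dict:
    """Split one irregular cell value into its labelled parts."""
    # Treat any run of delimiter characters as one separator and drop
    # the empty strings produced by leading/trailing delimiters.
    tokens = [t for t in re.split(r"[#$*&]+", text) if t]

    record = {}
    for token in tokens:
        if "-atr" in token:                                     # test 1
            record["code"] = token
        elif re.fullmatch(r"\d{1,2}-[A-Za-z]{3}-\d{4}", token):  # test 2
            day, month, year = token.split("-")
            record.update(day=int(day), month=month, year=int(year))
        elif token.endswith((" AM", " PM")):                    # test 3
            record["time"] = token
    return record


if __name__ == "__main__":
    for line in ["#22-atr$$1 AM**01-May-2015&&",
                 "**4 PM#68-atr#21-Mar-2022&&"]:
        print(split_record(line))
```

Because the tokens are classified by content rather than by position, the order in which the parts appear in the cell does not matter.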

Splitting very large string separated with comma and i need to split 50 items only per row

I have a very long string in the first row. It contains many comma-separated items, like below:
12345,54322,44444,222222222,444444,121,333,44444,........
I need to split this so that every row holds at most 50 items. Let's assume there are 700 items separated by commas: I want to keep the first 50 items in the 1st row, the next 50 in the 2nd row, and so on.
I tried the code below, which seems to split up to item 50, but I'm not sure it will keep working going forward, so I need help with this.
OutData = Split(InpData, ",")(50)
MsgBox OutData
You can do this in many more ways, but one would be to replace every nth comma. For example through Regular Expressions:
Sub Test()
    Dim s As String: s = "1,2,3,4,5,6,7,8,9,10,11"
    Dim n As Long: n = 2
    Dim arr() As String
    With CreateObject("vbscript.regexp")
        .Global = True
        .Pattern = "([^,]*(?:,[^,]*){" & n - 1 & "}),"
        arr = Split(.Replace(s, "$1|"), "|")
    End With
End Sub
The pattern used means:
( - Open 1st capture group;
[^,]* - Match 0+ (Greedy) characters other than comma;
(?: - Open a nested non-capture group;
,[^,]* - Match a comma and again 0+ characters other than comma;
){1} - Close the non-capture group and match n-1 times (1 time in the given example);
), - Close the capture group and match a literal comma.
Replace every match with the content of the 1st capture group followed by a character you know is not in the full string (| here), so we can split on that character.
I suppose you can do whatever you like with the resulting array. You probably want to transpose it into the worksheet.
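As an alternative to replacing every nth comma, the items can be split first and then regrouped by slicing. Here is a minimal Python sketch of that variant; the function name chunk_csv is hypothetical, and it assumes no item itself contains a comma.

```python
def chunk_csv(line: str, size: int = 50) -> list:
    """Split a comma-separated string into rows of at most `size` items."""
    items = line.split(",")
    return [",".join(items[i:i + size]) for i in range(0, len(items), size)]


if __name__ == "__main__":
    print(chunk_csv("1,2,3,4,5,6,7,8,9,10,11", size=2))
    # 700 items with the default size of 50 give 14 rows.
    print(len(chunk_csv(",".join(str(i) for i in range(700)))))
```

The last row simply holds whatever is left over when the item count is not a multiple of the row size.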

Excel - reverse text to columns

I am struggling to find a good way of doing this.
I have a file where some lines are hundreds of words long (comma separated) and some are only a few words long.
So performing a text to columns produces hundreds of columns, most of which are blank.
I have done my edits on the second column and now need to join everything back so that each line can be read in a text file and all words are comma separated once more.
Is there a formula that will know how many columns are in each line that are not blank and bring them all back into the first cell with comma separation?
I should add, I only have Excel 2010
I'd be happy to try a good powershell script solution if possible
Many thanks,
K
eg.
54325,354354,786756,6543,73644,23323,544,7233,64537,654,56,3456,754,876666,78,788
122,433
655,766
1233,7374,65436,65444,6577,85488,56767,8585876,6755,544445,67,67783,2233,466636
I use text to columns so I can work on contents of column 2, but then need it back in this format once done.
If I simply save as csv and open as text file, there are commas for every blank cell in each column
Given a text file like the one you describe, you could simply split the lines manually:
Get-Content .\input.txt |ForEach-Object {
    # split line into individual values
    $values = $_ -split ','
    # modify as needed (only when the line actually has a second column)
    if($values.Length -ge 2){
        $values[1] = "Make changes to column 2 here"
    }
    # stitch line back together
    $values -join ','
} |Set-Content .\output.txt
Not sure why you need to strip off the trailing commas from the short rows, as they don't produce blank columns, only empty cells. The columns have content from another row. But here is VBA code to do what you request, in case that is a possibility for your usage:
Assumption: Starting point is your worksheet with each element of the CSV data you show above, in a separate cell.
None of your words includes a comma, linefeed, or other character that requires enclosing the word in quotes.
This VBA code:
Option Explicit
Sub createCSV()
    Dim v, x, S As String, I As Long, RE As Object
    v = ThisWorkbook.Worksheets("Sheet2").Cells(1, 1).CurrentRegion
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Pattern = ",*$"
        .MultiLine = True
        .Global = True
    End With
    ReDim x(1 To UBound(v, 1), 1 To 1)
    With WorksheetFunction
        For I = 1 To UBound(v, 1)
            'Replace trailing commas with nothing
            x(I, 1) = RE.Replace(Join(.Index(v, I, 0), ","), "")
        Next I
    End With
    Open "D:\Users\Ron\Desktop\NewFile.csv" For Output As #1
    For I = 1 To UBound(x)
        Print #1, x(I, 1)
    Next I
    Close #1
End Sub
I used a Regular Expression to remove the trailing commas.
results (in Notepad++):
54325,354354,786756,6543,73644,23323,544,7233,64537,654,56,3456,754,876666,78,788
122,433
655,766
1233,7374,65436,65444,6577,85488,56767,8585876,6755,544445,67,67783,2233,466636
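The core of this approach, joining a row's cells and dropping only the blank cells at the end, can be sketched in Python as well. This is an illustration under assumptions: the function name row_to_csv is hypothetical, and every cell is assumed to already be a string, with "" standing for a blank cell.

```python
def row_to_csv(cells) -> str:
    """Join cells with commas, dropping blank cells at the end of the row."""
    trimmed = list(cells)
    # Only trailing blanks are removed; blanks in the middle are kept
    # so that the remaining values stay in their original columns.
    while trimmed and trimmed[-1] == "":
        trimmed.pop()
    return ",".join(trimmed)


if __name__ == "__main__":
    print(row_to_csv(["122", "433", "", "", ""]))   # 122,433
    print(row_to_csv(["1", "", "3", ""]))           # 1,,3
```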

Find count of multiline in an Excel cell starting with delimiter -

I am looking for a formula that counts how many lines in a multiline cell begin with - (hyphen).
For example, if the cell contains:
how are you keeping up
-I am well and need toy
-"You" are asking wrong question
<you are wrong>
-why should i reply you
the count of qualifying lines is 3.
Can anyone help me out here, please?
If your first line never starts with a hyphen, or at least should not count towards the total, then try:
Formula in B1:
=(LEN(A1)-LEN(SUBSTITUTE(A1,CHAR(10)&"-","")))/2
If your first line can also start with a hyphen and therefore count towards the total, try:
=(LEN(CHAR(10)&A1)-LEN(SUBSTITUTE(CHAR(10)&A1,CHAR(10)&"-","")))/2
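The length arithmetic behind both formulas may be easier to follow in a short Python sketch. The function names are hypothetical. count_by_substitute reproduces the LEN/SUBSTITUTE trick: removing each two-character "line feed + hyphen" pair shortens the text by 2, so the length difference divided by 2 is the count. count_by_lines is a direct line-by-line check for comparison.

```python
def count_by_substitute(text: str, include_first_line: bool = False) -> int:
    """Count lines starting with '-' via the length-difference trick."""
    marker = "\n-"
    if include_first_line:
        # Prepending a line feed lets a hyphen on line 1 match the marker too.
        text = "\n" + text
    return (len(text) - len(text.replace(marker, ""))) // len(marker)


def count_by_lines(text: str) -> int:
    """Count lines starting with '-' by inspecting every line."""
    return sum(line.startswith("-") for line in text.split("\n"))


if __name__ == "__main__":
    cell = ("how are you keeping up\n"
            "-I am well and need toy\n"
            '-"You" are asking wrong question\n'
            "<you are wrong>\n"
            "-why should i reply you")
    print(count_by_substitute(cell), count_by_lines(cell))
```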
Here is a VBA solution:
Function CountLines(text As String, Optional flag As String = "") As Long
    'counts all lines in text which start with flag
    Dim i As Long, count As Long
    Dim lines As Variant
    lines = Split(text, vbLf)
    For i = LBound(lines) To UBound(lines)
        If Mid(lines(i), 1, Len(flag)) = flag Then
            count = count + 1
        End If
    Next i
    CountLines = count
End Function
If this is in a standard code module, the example text is in A1, and you enter the formula =CountLines(A1,"-") in B1, it will evaluate to 3.
If you want to include the first line in the potential count, then, in Windows Excel 2013+, you can try:
=COUNTA(FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,">","&gt;"),"<","&lt;"),"""","&quot;"),CHAR(10),"</s><s>") & "</s></t>","//s[starts-with(text(),'-')]"))
Replace illegal xml characters ",<, and >
Create an XML by splitting into nodes based on the LF character
Use xpath //s[starts-with(text(),'-')] to return only those nodes that start with a hyphen.
COUNTA to return the count of those nodes

Excel replace characters in string before and after 'x'

Hello, I have a column with strings (names of products) in it.
These are formatted as Name LengthxWidth, for example Green box 20x30. I need to swap the 20 and the 30 in this example so that I get Green box 30x20. Any ideas how I can achieve this?
Thanks
Here are both a formula solution and a VBA solution using Regular Expressions:
Formula
=LEFT(A1,FIND(TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",99)),99)),A1)-1)&
MID(TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",99)),99)),SEARCH("x",TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",99)),99)))+1,99)&
"x"&
LEFT(TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",99)),99)),SEARCH("x",TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",99)),99)))-1)
UDF
Option Explicit
Function RevWL(S As String)
    Dim RE As Object
    Const sPat As String = "(\d+.?\d*)x(\d+.?\d*)"
    'If L or W might start with a decimal point, and not a digit,
    'Then change sPat to: (\d*.?\d+)x(\d*.?\d+)
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .ignorecase = True
        .Pattern = sPat
        RevWL = .Replace(S, "$2x$1")
    End With
End Function
Here is an example of the kinds of data I tested with:
The Formula works by looking at the last space-separated substring which would be LxW, then reversing the portion after and before the x, then concatenating everything back together.
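The regex pattern captures the two numbers (integers or decimals, so long as they start with a digit, although that could be changed if needed) and swaps them. Note that the . in the pattern is unescaped, so it matches any single character rather than only a decimal point; write \. if only a literal decimal point should be accepted.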
Here is a more detailed explanation of the regex (and the replacement string) with links to a tutorial:
(\d+.?\d*)x(\d+.?\d*)
Options: Case insensitive; ^$ don’t match at line breaks
Match the regex below and capture its match into backreference number 1 (\d+.?\d*)
    Match a single character that is a “digit” \d+
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
    Match any single character that is NOT a line break character .?
        Between zero and one times, as many times as possible, giving back as needed (greedy) ?
    Match a single character that is a “digit” \d*
        Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “x” literally x
Match the regex below and capture its match into backreference number 2 (\d+.?\d*)
    Match a single character that is a “digit” \d+
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
    Match any single character that is NOT a line break character .?
        Between zero and one times, as many times as possible, giving back as needed (greedy) ?
    Match a single character that is a “digit” \d*
        Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
$2x$1
    Insert the text that was last matched by capturing group number 2 $2
    Insert the character “x” literally x
    Insert the text that was last matched by capturing group number 1 $1
Created with RegexBuddy
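For readers who want to try the capture-and-swap idea outside Excel, here is a small Python sketch. It is a variation rather than a copy of the pattern above: the name swap_dims is hypothetical, and the decimal point is escaped so that only a literal "." is accepted inside a number.

```python
import re

# Two numbers (integer or decimal) separated by an x, in either case.
DIMS = re.compile(r"(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)", re.IGNORECASE)


def swap_dims(name: str) -> str:
    """Swap the two numbers around the x, e.g. 20x30 becomes 30x20."""
    return DIMS.sub(r"\2x\1", name)


if __name__ == "__main__":
    print(swap_dims("Green box 20x30"))   # Green box 30x20
    print(swap_dims("Crate 2.5x10"))      # Crate 10x2.5
```

The replacement string \2x\1 plays the same role as $2x$1 in the VBScript version.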
Here is a VBA solution that will work for you:
Option Explicit
'Named SwapDims rather than Switch, because Switch is already a built-in VBA function
'and newer Excel versions have a SWITCH worksheet function
Function SwapDims(r As Range) As String
    Dim measurement As String
    Dim firstPart As String
    Dim secondPart As String
    'the measurement is everything after the last space
    measurement = Right(r, Len(r) - InStrRev(r, " "))
    secondPart = Right(measurement, Len(measurement) - InStr(1, measurement, "x"))
    firstPart = Left(measurement, InStr(1, measurement, "x") - 1)
    SwapDims = Left(r, InStrRev(r, " ") - 1) & " " & secondPart & "x" & firstPart
End Function
You can paste this in a regular module in the VBE (Visual Basic Editor) and use it as a regular function/formula. If your value is in cell A1, then type =SwapDims(A1) in cell B1. Hope it helps!
OK, so it really is easier to use VBA, but if you want only formulas, you can use some helper columns to split your text and then concatenate the cells.
Here is a little example:
Of course, B1-4 are optional. They are there only to make things more readable; you can also use a single formula:
=CONCATENATE(LEFT(A1, SEARCH(" ",A1,1)-1)," ",RIGHT(RIGHT(A1,LEN(A1)-SEARCH(" ",A1,1)),LEN(RIGHT(A1,LEN(A1)-SEARCH(" ",A1,1)))-SEARCH("x",RIGHT(A1,LEN(A1)-SEARCH(" ",A1,1)),1)),"x",LEFT(RIGHT(A1,LEN(A1)-SEARCH(" ",A1,1)), SEARCH("x",RIGHT(A1,LEN(A1)-SEARCH(" ",A1,1)),1)-1))
If your names contain several spaces, you can use this formula, which searches for the last space in the text:
=CONCATENATE(LEFT(A1, SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))-1)," ",RIGHT(RIGHT(A1,LEN(A1)-SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))),LEN(RIGHT(A1,LEN(A1)-SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))))-SEARCH("x",RIGHT(A1,LEN(A1)-SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))),1)),"x",LEFT(RIGHT(A1,LEN(A1)-SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))), SEARCH("x",RIGHT(A1,LEN(A1)-SEARCH("^^",SUBSTITUTE(A1," ","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))),1)-1))
