I have a few thousand rows I would like to clean. I would like to replace every run of repeated commas with a single comma. I am using Excel. Examples can be seen below.
Current Data | Desired Output
,,,,,,,,,, | (empty)
one,,,,,,two,,,,,three | one, two, three
two,,,,one,,,,,, | two, one
two,,, one,one,two,,,one | two, one, one, two, one
You can apply the function below to a cell as a user-defined function; if you have a lot of data, it will be faster to use a procedure that processes the whole column at once.
Sub test()
Dim vDB As Variant
Dim vResult() As Variant
Dim i As Long, r As Long
Dim str As String
vDB = Range("a1", Range("a" & Rows.Count).End(xlUp))
r = UBound(vDB, 1)
ReDim vResult(1 To r, 1 To 1)
For i = 1 To r
str = vDB(i, 1)
vResult(i, 1) = myresult(str)
Next i
Range("b1").Resize(r) = vResult
End Sub
Function myresult(str As String) As String
    Dim vR(), vS, v
    Dim n As Long
    vS = Split(str, ",")
    For Each v In vS
        v = Trim(v)          ' drop stray spaces, as in ", one"
        If v <> "" Then      ' skip the empty pieces between repeated commas
            n = n + 1
            ReDim Preserve vR(1 To n)
            vR(n) = v
        End If
    Next v
    If n Then
        myresult = Join(vR, ", ")
    Else
        myresult = ""
    End If
End Function
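A quick way to check the split/skip/join idea against the sample rows from the question. This is only a minimal sketch: `CollapseCommas` and `CheckCollapseCommas` are hypothetical names, and it assumes each piece should be trimmed so that the last sample (which contains `, one` with a leading space) matches the desired output.

```vba
' Hypothetical helper: split on commas, drop empty pieces,
' trim the rest and re-join them with ", ".
Function CollapseCommas(ByVal source As String) As String
    Dim token As Variant
    Dim result As String

    For Each token In Split(source, ",")
        token = Trim$(token)
        If Len(token) > 0 Then
            If Len(result) > 0 Then result = result & ", "
            result = result & token
        End If
    Next token

    CollapseCommas = result
End Function

Sub CheckCollapseCommas()
    ' Sample rows taken from the question
    Debug.Assert CollapseCommas(",,,,,,,,,,") = ""
    Debug.Assert CollapseCommas("one,,,,,,two,,,,,three") = "one, two, three"
    Debug.Assert CollapseCommas("two,,,,one,,,,,,") = "two, one"
    Debug.Assert CollapseCommas("two,,, one,one,two,,,one") = "two, one, one, two, one"
End Sub
```

Running `CheckCollapseCommas` from the VBA editor stops on the first assertion that fails, if any.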
Related
I have the following data
0/3, 1/1, 3/4
1/3, 3/2, 6/2
12/1, 3/6, 3/4
3/4, 0/3, 1/1 'will be considered is duplicate with the first data
Is there any way to find and remove duplicate data like this?
My current method is to split into 3 strings based on "," then check with the following condition.
'I got each String value by mid command.
'FrstStr1: First String of String 1
'SecStr1: Second String of String 1
'ThrStr1: Third String of String 1
'FrstStr2: First String of String 2
'SecStr2: Second String of String 2
'ThrStr2: Third String of String 2
if (FrstStr1 = FrstStr2 and SecStr1 = SecStr2 and ThrStr1 = ThrStr2) or
(FrstStr1 = FrstStr2 and SecStr1 = ThrStr2 and ThrStr1 = SecStr2) or
() or () .... then
I listed 6 possible cases and put them into if condition like above.
Split each row into an array, using the comma as the delimiter.
Sort the array with a helper function.
Check for duplicates with a Dictionary, using the sorted, re-joined string as the key.
## Code ##
Sub test()
Dim vR(), vDB
Dim dic As Object
Dim v As Variant
Dim s As String
Dim i As Long, n As Long
Set dic = CreateObject("Scripting.Dictionary")
vDB = Range("a1").CurrentRegion
For i = 1 To UBound(vDB, 1)
v = Split(vDB(i, 1), ",")
s = newArray(v)
If Not dic.exists(s) Then
dic.Add s, s
n = n + 1
ReDim Preserve vR(1 To n)
vR(n) = vDB(i, 1)
End If
Next i
If n Then
Range("e1").Resize(n) = WorksheetFunction.Transpose(vR)
End If
End Sub
Function newArray(v As Variant)
Dim temp As String
Dim r As Integer, i As Integer, j As Integer
r = UBound(v)
For i = LBound(v) To r - 1
For j = i + 1 To r
v(i) = Trim(v(i))
v(j) = Trim(v(j))
If v(i) > v(j) Then
temp = v(j)
v(j) = v(i)
v(i) = temp
End If
Next j
Next i
newArray = Join(v, ",")
End Function
Exploiting Dictionary and SortedList objects could lead to very compact (and maintainable) code:
Sub RemoveDuplicatedDataWithoutCountingOrder()
Dim dict As Object: Set dict = CreateObject("Scripting.Dictionary")
Dim j As Long
Dim key As String
Dim datum As Variant, couple As Variant
For Each datum In Range("A1").CurrentRegion.Value
key = vbNullString
With CreateObject("System.Collections.SortedList")
For Each couple In Split(Replace(datum, " ", vbNullString), ",")
.Add couple, 0
Next
For j = 0 To .Count - 1
key = key & .getkey(j) & "," ' the separator keeps e.g. 1/1 + 13/4 distinct from 1/11 + 3/4
Next
If Not dict.exists(key) Then dict.Add key, datum
End With
Next
Range("C1").Resize(dict.Count) = Application.Transpose(dict.items)
End Sub
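One thing to watch with the SortedList approach: as far as I know, `SortedList.Add` raises an error when the same key is added twice, so a row such as `1/1, 1/1, 3/4` would stop the loop. The following is a minimal sketch of an order-independent key that tolerates repeated items. `CanonicalKey` is a hypothetical name, and the `|` separator is an assumption (any character that never occurs in the data would do).

```vba
' Hypothetical helper: returns the same key for any ordering
' of the comma-separated items in rowText.
Function CanonicalKey(ByVal rowText As String) As String
    Dim items() As String
    Dim current As String
    Dim i As Long, j As Long

    ' Remove blanks, then split into items
    items = Split(Replace(rowText, " ", vbNullString), ",")

    ' Insertion sort; fine for the handful of items in one row
    For i = LBound(items) + 1 To UBound(items)
        current = items(i)
        j = i - 1
        Do While j >= LBound(items)
            If StrComp(items(j), current, vbBinaryCompare) <= 0 Then Exit Do
            items(j + 1) = items(j)
            j = j - 1
        Loop
        items(j + 1) = current
    Next i

    ' "|" is an assumed separator that does not occur in the data
    CanonicalKey = Join(items, "|")
End Function
```

Inside the loop over the rows, the key could then be used as `If Not dict.exists(CanonicalKey(datum)) Then dict.Add CanonicalKey(datum), datum`.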
How do I count the total number of "alt" and "first" occurrences in a cell, and do the same for the other cells while ignoring empty cells? For instance, if a cell contains first, first, alt, first, first, first, it should give me firstcounter = 5 (where firstcounter is the total count for first) and altcounter = 1 (where altcounter is the total count for alt). After that I can concatenate the values of firstcounter and altcounter into a string, as shown in column B, in the form "first-" & firstcounter, "alt-" & altcounter.
Dim ia As Long
Dim lastrow2 As Long
Dim ws1 As Worksheet
Set ws1 = ThisWorkbook.Sheets(1)
lastrow2 = ws1.Range("A" & ws1.Rows.count).End(xlUp).Row
For ia = 2 To lastrow2
Dim arr() As Variant
' Split the string to an array
arr = Split(ws1.Cells(ia, "A"), ",").Value
'what should i do after split
Enter the following into a code module...
Function CountWords$(r)
Dim a&, f&, w
For Each w In Split(r, ",")
If Trim$(w) = "alt" Then a = a + 1 ' Trim$ handles the space after each comma
If Trim$(w) = "first" Then f = f + 1
Next
If (a + f) Then CountWords = "first-" & f & ",alt-" & a
End Function
Then in cell B2 enter this formula:
=CountWords(A2)
Now copy it downwards as far as you need.
Update
To use the above function from VBA without entering formulas in the worksheet you can do it like this...
Sub Cena()
Dim i&, v
With [a2:a8]
v = .Value2
For i = 1 To UBound(v)
v(i, 1) = CountWords(v(i, 1))
Next
.Offset(, 1) = v
End With
End Sub
Function CountWords$(r)
Dim a&, f&, w
For Each w In Split(r, ",")
If Trim$(w) = "alt" Then a = a + 1 ' Trim$ handles the space after each comma
If Trim$(w) = "first" Then f = f + 1
Next
If (a + f) Then CountWords = "first-" & f & ",alt-" & a
End Function
Update #2
In response to your questions in the comments, you can use this variation instead...
Sub Cena()
Dim i&, v
With [a2].Resize(Cells(Rows.Count, 1).End(xlUp).Row - 1)
v = .Value2
For i = 1 To UBound(v)
v(i, 1) = CountWords(v(i, 1))
Next
.Cells = v
End With
End Sub
Function CountWords$(r)
Dim a&, f&, w
For Each w In Split(r, ",")
If Trim$(w) = "alt" Then a = a + 1 ' Trim$ handles the space after each comma
If Trim$(w) = "first" Then f = f + 1
Next
If (a + f) Then CountWords = "first-" & f & ",alt-" & a
End Function
To make this independent of the specific words alt and first, and of any whitespace in the string, I would use the following functions:
Option Explicit
'Add a reference to Microsoft VBScript Regular Expressions 5.5
Public Function RemoveWhiteSpace(target As String) As String
With New RegExp
.Pattern = "\s"
.MultiLine = True
.Global = True
RemoveWhiteSpace = .Replace(target, vbNullString)
End With
End Function
'Add a reference to Microsoft Scripting Runtime
Function CountWordsA(rg As Range) As String
On Error GoTo EH
Dim dict As Dictionary
Set dict = New Dictionary
Dim vDat As Variant
vDat = RemoveWhiteSpace(rg.Value)
vDat = Split(vDat, ",")
Dim i As Long
For i = LBound(vDat) To UBound(vDat)
If dict.Exists(vDat(i)) Then
dict(vDat(i)) = dict(vDat(i)) + 1
Else
dict.Add vDat(i), 1
End If
Next i
Dim vKey As Variant
ReDim vDat(1 To dict.Count)
i = 1
For Each vKey In dict.Keys
vDat(i) = vKey & "-" & dict(vKey)
i = i + 1
Next vKey
CountWordsA = Join(vDat, ",")
Exit Function
EH:
CountWordsA = ""
End Function
Sub TestIt()
Dim rg As Range
Set rg = Range("A2:A8")
Dim sngCell As Range
For Each sngCell In rg
sngCell.Offset(, 1) = CountWordsA(sngCell)
Next sngCell
End Sub
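The functions above need two references to be set in the VBA editor. If that is not an option, the same counting idea can be sketched with late binding. This is an illustration under stated assumptions, not the answer's code: `CountTokens` is a hypothetical name, it takes a String instead of a Range so it can be tried from the Immediate window, and it uses `Trim$` rather than a regular expression, so only leading and trailing blanks are removed.

```vba
' Hypothetical late-bound variant: no references required.
Function CountTokens(ByVal source As String) As String
    Dim dict As Object
    Dim token As Variant, key As Variant
    Dim parts() As String
    Dim i As Long

    Set dict = CreateObject("Scripting.Dictionary")

    ' Count every non-empty, trimmed token
    For Each token In Split(source, ",")
        token = Trim$(token)
        If Len(token) > 0 Then dict(token) = dict(token) + 1
    Next token

    If dict.Count = 0 Then Exit Function    ' nothing counted: returns ""

    ' Build "word-count" pairs in first-seen order
    ReDim parts(0 To dict.Count - 1)
    For Each key In dict.Keys
        parts(i) = key & "-" & dict(key)
        i = i + 1
    Next key

    CountTokens = Join(parts, ",")
End Function
```

For example, `CountTokens("first, first, alt, first, first, first")` returns `first-5,alt-1`.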
More about dictionaries and regular expressions
Alternative using Filter() function
This demonstrates the use of the Filter() function to count words via function UBound():
Function CountTerms() (usable also in formulae)
Function CountTerms(ByVal WordList As String, Optional TermList As String = "first,alt", Optional DELIM As String = ",") As String
'Purpose: count found terms in wordlist and return result as list
'[1] assign lists to arrays
Dim words, terms
words = Split(WordList, DELIM): terms = Split(TermList, DELIM)
'[2] count filtered search terms
Dim i As Long
For i = 0 To UBound(terms)
terms(i) = terms(i) & "-" & UBound(Filter(words, terms(i), True, vbTextCompare)) + 1
Next i
'[3] return terms as joined list, e.g. "first-5,alt-1"
CountTerms = Join(terms, ",")
End Function
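Note that Filter() matches substrings, so with the term first a word such as firstly would be counted too. Where whole-word matches are needed, something along these lines could be used instead. It is a sketch only: `CountExactTerms` is a hypothetical name, and it assumes that blanks around words should be ignored and that case does not matter.

```vba
' Hypothetical variant of CountTerms that counts whole words only.
Function CountExactTerms(ByVal WordList As String, _
                         Optional ByVal TermList As String = "first,alt", _
                         Optional ByVal DELIM As String = ",") As String
    Dim terms() As String
    Dim word As Variant
    Dim i As Long, hits As Long

    terms = Split(TermList, DELIM)
    For i = LBound(terms) To UBound(terms)
        hits = 0
        For Each word In Split(WordList, DELIM)
            ' whole-word, case-insensitive comparison
            If StrComp(Trim$(word), Trim$(terms(i)), vbTextCompare) = 0 Then hits = hits + 1
        Next word
        terms(i) = Trim$(terms(i)) & "-" & hits
    Next i

    CountExactTerms = Join(terms, DELIM)
End Function
```

With the input `first, firstly, alt, first` this returns `first-2,alt-1`, whereas a substring match would report three hits for first.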
Example call (due to comment) & help function getRange()
In order to loop over the entire range and replace the original data with the results list:
Sub ExampleCall()
'[1] get range data assigning them to variant temporary array
Dim rng As Range, tmp
Set rng = getRange(Sheet1, tmp) ' << change to sheet's Code(Name)
'[2] loop through array values and get counts
Dim i As Long
For i = 1 To UBound(tmp)
tmp(i, 1) = CountTerms(tmp(i, 1))
Next i
'[3] write to target (here: overwriting due to comment)
rng.Offset(ColumnOffset:=0) = tmp
End Sub
Function getRange(mySheet As Worksheet, tmp) As Range
'Purpose: assign current column A:A data to referenced tmp array
With mySheet
Set getRange = .Range("A2:A" & .Cells(.Rows.Count, 1).End(xlUp).Row)
tmp = getRange ' assign range data to referenced tmp array
End With
End Function
I have code that compares the first two sheets and then outputs the differences in another sheet. I am now trying to figure out how to also output the similarities into another worksheet.
Here is my current code:
Option Explicit
Sub CompareIt()
Dim ar As Variant
Dim arr As Variant
Dim Var As Variant
Dim v()
Dim i As Long
Dim n As Long
Dim j As Long
Dim str As String
ar = Sheet1.Cells(10, 1).CurrentRegion.Value
With CreateObject("Scripting.Dictionary")
.CompareMode = 1
ReDim v(1 To UBound(ar, 2))
For i = 2 To UBound(ar, 1)
For n = 1 To UBound(ar, 2)
str = str & Chr(2) & ar(i, n)
v(n) = ar(i, n)
Next
.Item(str) = v: str = ""
Next
ar = Sheet2.Cells(10, 1).CurrentRegion.Resize(, UBound(v)).Value
For i = 2 To UBound(ar, 1)
For n = 1 To UBound(ar, 2)
str = str & Chr(2) & ar(i, n)
v(n) = ar(i, n)
Next
If .exists(str) Then
.Item(str) = Empty
Else
.Item(str) = v
End If
str = ""
Next
For Each arr In .keys
If IsEmpty(.Item(arr)) Then .Remove arr
Next
Var = .items: j = .Count
End With
With Sheet3.Range("a10").Resize(, UBound(ar, 2))
.CurrentRegion.ClearContents
.Value = ar
If j > 0 Then
.Offset(1).Resize(j).Value = Application.Transpose(Application.Transpose(Var))
End If
End With
Sheet3.Activate
End Sub
Any ideas?
Since your question is:
Any ideas?
I do have an idea, though it depends on:
Your Excel license (the TEXTJOIN function is available if you have Office 2019 or an Office 365 subscription)
Your data size (if the resulting string exceeds 32,767 characters, the cell limit, TEXTJOIN returns a #VALUE! error)
But it's an idea :)
Sheet1 & Sheet2
Run this code:
Sub Test()
Dim Var() As String
With ThisWorkbook.Sheets("Sheet3")
Var() = Split(Evaluate("=TEXTJOIN("","",TRUE,IF(Sheet1!A1:A6=TRANSPOSE(Sheet2!A1:A5),Sheet1!A1:A6,""""))"), ",")
.Cells(1, 1).Resize(UBound(Var) + 1).Value = Application.Transpose(Var)
End With
End Sub
Output on sheet3:
Obviously it's simplified, but you can add variables in the EVALUATE.
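If TEXTJOIN is not available, the dictionary idea from the question can be turned around to collect the rows that both sheets share. The following is a rough sketch, not a drop-in replacement: `CommonRows` and `RowKey` are hypothetical names, the inputs are assumed to be 2-D arrays such as those returned by `Range.Value`, and whole rows are compared case-insensitively using the same `Chr(2)` joining as the question's code.

```vba
' Hypothetical helper: returns the row indexes of b whose rows also occur in a.
Function CommonRows(a As Variant, b As Variant) As Collection
    Dim seen As Object
    Dim result As New Collection
    Dim i As Long

    Set seen = CreateObject("Scripting.Dictionary")
    seen.CompareMode = 1                       ' text compare, as in the question

    ' Remember every row of a by its joined key
    For i = LBound(a, 1) To UBound(a, 1)
        seen(RowKey(a, i)) = True
    Next i

    ' Keep the index of each row of b that was seen in a
    For i = LBound(b, 1) To UBound(b, 1)
        If seen.Exists(RowKey(b, i)) Then result.Add i
    Next i

    Set CommonRows = result
End Function

' Joins all cells of one row into a single key
Private Function RowKey(data As Variant, ByVal r As Long) As String
    Dim c As Long
    For c = LBound(data, 2) To UBound(data, 2)
        RowKey = RowKey & Chr(2) & data(r, c)
    Next c
End Function
```

A call might look like `Set hits = CommonRows(Sheet1.Cells(10, 1).CurrentRegion.Value, Sheet2.Cells(10, 1).CurrentRegion.Value)`; identical header rows would be reported as a match as well, so the first index may need to be skipped.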
I need to do the following:
I have a table where the 13th column contains strings such as
acbd,ef,xyz
qwe,rtyu,tqyuiop
What I want is to create new rows in order to separate those values:
acbd
ef
xyz
qwe
rtyu
tqyuiop
Meaning I would have now 6 rows instead of 2, and all the other information on cells would remain the same (i.e. all the other values of the row would repeat themselves through all the new rows).
What I have tried is the following:
Sub test()
    Dim coma As Integer
    Dim finalString As String, modString As String
    Dim sh As Worksheet, rw As Range
    Set sh = ActiveSheet
    For Each rw In sh.Rows
        ' If a comma is found, copy the row, insert a new row, and paste into this new row
        If InStr(1, sh.Cells(rw.Row, 13).Value, ",") Then
            Rows(rw.Row).Copy
            Rows(rw.Row).Insert shift:=xlShiftDown
            Rows(rw.Row).PasteSpecial xlPasteValues
            ' Find the position of the comma, then assign what is before it
            ' to finalString and what is after it to modString
            coma = InStr(1, sh.Cells(rw.Row, 13).Value, ",")
            finalString = Left(sh.Cells(rw.Row, 13).Value, coma - 1)
            modString = Right(sh.Cells(rw.Row, 13).Value, Len(sh.Cells(rw.Row, 13).Value) - coma)
            ' Replace the values:
            sh.Cells(rw.Row, 13).Value = modString
            sh.Cells(rw.Row - 1, 13).Value = finalString
        End If
    Next rw
    MsgBox ("End")
End Sub
This code works perfectly well, except that for tables with 400 rows it takes 15 ± 5 seconds to complete.
I would like some suggestions on how to improve the performance of this. Thank you!
With data in column L, give this a try:
Sub LongList()
Dim wf As WorksheetFunction, arr, s As String
Set wf = Application.WorksheetFunction
s = wf.TextJoin(",", True, Range("L:L"))
arr = Split(s, ",")
Range("M1").Resize(UBound(arr) + 1, 1).Value = wf.Transpose(arr)
End Sub
Note:
No looping over cells. No looping within cells. This process can also be accomplished with just worksheet formulas; VBA is not needed.
Try this.
Sub test()
Dim vDB, vR(), vS, s
Dim i As Long, j As Integer, n As Long
vDB = Range("a1").CurrentRegion
For i = 1 To UBound(vDB, 1)
vS = Split(vDB(i, 13), ",")
For Each s In vS
n = n + 1
ReDim Preserve vR(1 To 13, 1 To n)
For j = 1 To 12
vR(j, n) = vDB(i, j)
Next j
vR(13, n) = s
Next s
Next i
Range("a1").Resize(n, 13) = WorksheetFunction.Transpose(vR)
End Sub
If you have more columns, do it like this.
Sub test()
Dim vDB, vR(), vS, s
Dim i As Long, j As Integer, n As Long
Dim c As Integer
vDB = Range("a1").CurrentRegion
c = UBound(vDB, 2)
For i = 1 To UBound(vDB, 1)
vS = Split(vDB(i, 13), ",")
For Each s In vS
n = n + 1
ReDim Preserve vR(1 To c, 1 To n)
For j = 1 To c
vR(j, n) = vDB(i, j)
Next j
vR(13, n) = s
Next s
Next i
Range("a1").Resize(n, c) = WorksheetFunction.Transpose(vR)
End Sub
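A variation on the same idea counts the output rows first, so the result array can be sized once in row-by-column order and written back without ReDim Preserve or Transpose. This is a sketch under assumptions: `ExpandRows` is a hypothetical name, and unlike the code above it keeps a row whose split cell is empty (as a single row) and trims each split value.

```vba
' Hypothetical helper: data is a 2-D array (e.g. Range.Value),
' splitCol is the column holding the comma-separated values.
Function ExpandRows(data As Variant, ByVal splitCol As Long) As Variant
    Dim parts() As String
    Dim out() As Variant
    Dim i As Long, j As Long, p As Long
    Dim n As Long, total As Long

    ' Pass 1: count output rows (an empty cell still yields one row)
    For i = LBound(data, 1) To UBound(data, 1)
        parts = Split(CStr(data(i, splitCol)), ",")
        If UBound(parts) < 0 Then total = total + 1 Else total = total + UBound(parts) + 1
    Next i

    ReDim out(1 To total, LBound(data, 2) To UBound(data, 2))

    ' Pass 2: repeat each source row once per split value
    For i = LBound(data, 1) To UBound(data, 1)
        parts = Split(CStr(data(i, splitCol)), ",")
        If UBound(parts) < 0 Then ReDim parts(0 To 0)
        For p = 0 To UBound(parts)
            n = n + 1
            For j = LBound(data, 2) To UBound(data, 2)
                out(n, j) = data(i, j)
            Next j
            out(n, splitCol) = Trim$(parts(p))
        Next p
    Next i

    ExpandRows = out
End Function
```

It could be called as `v = ExpandRows(Range("a1").CurrentRegion.Value, 13)` followed by `Range("a1").Resize(UBound(v, 1), UBound(v, 2)).Value = v`.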
If you want an immediate boost in performance without having to adjust any other code, just turn off these Application settings at the beginning...
With Application
.DisplayAlerts = False
.ScreenUpdating = False
End With
and be sure to turn them back on at the end of the code...
With Application
.DisplayAlerts = True
.ScreenUpdating = True
End With
These two simple statements usually speed up code considerably.
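If the macro stops with an error halfway through, the settings would stay switched off. One way to guard against that is to route errors through a clean-up label. This is a minimal sketch, and `RunFast` and `DoTheWork` are hypothetical names standing in for your own procedures.

```vba
Sub RunFast()
    On Error GoTo CleanUp

    With Application
        .DisplayAlerts = False
        .ScreenUpdating = False
    End With

    DoTheWork    ' hypothetical: the row-splitting code goes here

CleanUp:
    ' Reached both after normal completion and after an error
    With Application
        .DisplayAlerts = True
        .ScreenUpdating = True
    End With
    If Err.Number <> 0 Then MsgBox "Stopped: " & Err.Description
End Sub

Sub DoTheWork()
    ' placeholder for the actual macro
End Sub
```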
This should look for comma-delimited values in column M and overwrite the values in column M with the split values (basically what your code was doing).
Option Explicit
Sub splitValues()
Dim sourceSheet As Worksheet
Set sourceSheet = ActiveSheet
With sourceSheet
Dim lastRow As Long
lastRow = .Cells(.Rows.Count, "M").End(xlUp).Row
Dim inputValues() As Variant
inputValues = .Range("M1:M" & lastRow).Value2
Dim splitString() As String
Dim rowIndex As Long
Dim outputArray As Variant
Dim outputRowIndex As Long
outputRowIndex = 1
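Dim partCount As Long
For rowIndex = LBound(inputValues, 1) To UBound(inputValues, 1)
    splitString = VBA.Strings.Split(inputValues(rowIndex, 1), ",", -1, vbBinaryCompare)
    partCount = UBound(splitString) + 1
    If partCount > 0 Then
        ' Size the target from the element count: Transpose of a one-element array is not 2-D
        .Cells(outputRowIndex, "M").Resize(partCount, 1).Value2 = Application.Transpose(splitString)
    Else
        .Cells(outputRowIndex, "M").ClearContents ' empty cell: keep one blank row
        partCount = 1
    End If
    outputRowIndex = outputRowIndex + partCount
Next rowIndex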
End With
End Sub
Column A contains the labels or outcome value. Columns B-N contain comma-separated values of varying length, but the range for each column is the same (i.e., 1-64). The goal is to convert this to a new table with Column A representing the value range (1-64) and Columns B-N the labels/outcomes from the original table.
A semi-related solution was sought here, but without use of macros.
I will let you modify this code:
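Sub splitThem()
    Dim i As Long, j As Long, k As Long, x As Long
    Dim src As Worksheet, dst As Worksheet
    Set src = ActiveSheet ' remember the data sheet: Sheets.Add activates the new sheet
    Set dst = Sheets.Add
    dst.Name = "newsheet"
    For j = 2 To src.Cells(1, src.Columns.Count).End(xlToLeft).Column
        x = 1
        For i = 1 To src.Cells(src.Rows.Count, 2).End(xlUp).Row
            If src.Cells(i, j) <> "" Then
                ' one output row per comma-separated value in the cell
                For k = 1 To Len(src.Cells(i, j)) - Len(Replace(src.Cells(i, j), ",", "")) + 1
                    dst.Cells(x, j) = src.Cells(i, 1)
                    x = x + 1
                Next k
            End If
        Next i
    Next j
End Sub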
Try this code.
Sub test()
Dim vDB, vR()
Dim vSplit, v As Variant
Dim Ws As Worksheet
Dim i As Long, n As Long, j As Integer, c As Integer
vDB = Range("a2").CurrentRegion
n = UBound(vDB, 1)
c = UBound(vDB, 2)
ReDim vR(1 To 64, 1 To c)
For i = 1 To 64
vR(i, 1) = i
Next i
For i = 2 To n
For j = 2 To c
vSplit = Split(vDB(i, j), ",")
For Each v In vSplit
vR(v, j) = vDB(i, 1)
Next v
Next j
Next i
Set Ws = Sheets.Add '<~~ replace your sheet : Sheets(2)
With Ws
For i = 1 To c
.Range("b1")(1, i) = "COND" & i
Next i
.Range("a2").Resize(64, c) = vR
End With
End Sub