Count 2- and 3-word strings frequency in Excel - excel

Hello smart human beings out there
I have this setup in my Excel
Basically, what I'm trying to achieve here is automatically grab every single string from column A (and paste to column H) and return the frequency in column I. The script is below
Sub WordCountTester()
Dim d As Object, k, i As Long, ws As Worksheet
Set ws = ActiveSheet
With ws.ListObjects("Table3")
If Not .DataBodyRange Is Nothing Then
.DataBodyRange.Delete
End If
End With
Set d = WordCounts(ws.Range("A2:A" & ws.Cells(Rows.Count, "A").End(xlUp).Row), _
ws.Range("F2:F" & ws.Cells(Rows.Count, "F").End(xlUp).Row))
'list words and frequencies
For Each k In d.keys
ws.Range("H2").Resize(1, 2).Offset(i, 0).Value = Array(k, d(k))
i = i + 1
Next k
End Sub
'rngTexts = range with text to be word-counted, defined in set d= above
'rngExclude = 'range with words to exclude from count, defined in set d= above
Public Function WordCounts(rngTexts As Range, rngExclude As Range) As Object 'dictionary
Dim words, c As Range, dict As Object, regexp As Object, w, wd As String, m
Set dict = CreateObject("scripting.dictionary")
Set regexp = CreateObject("VBScript.RegExp") 'see link below for reference
With regexp
.Global = True
.MultiLine = True
.ignorecase = True
.Pattern = "[\dA-Z-]{3,}" 'at least 3 characters
End With
'loop over input range
For Each c In rngTexts.Cells
If Len(c.Value) > 0 Then
Set words = regexp.Execute(LCase(c.Value))
'loop over matches
For Each w In words
wd = w.Value 'the text of the match
If Len(wd) > 1 Then 'EDIT: ignore single characters
'increment count if the word is not found in the "excluded" range
If IsError(Application.Match(wd, rngExclude, 0)) Then
dict(wd) = dict(wd) + 1
End If
End If '>1 char
Next w
End If
Next c
Set WordCounts = dict
End Function
However, it currently count the string with 1 word only. I want to count strings with 2 and 3 words (and I will consider drive-by as 2 words). Can someone please tell me where in this code I have to fix to achieve that? I still want to keep column F there because there can be 2- or 3- word strings that I want to exclude. Thanks!

If you changed your mind and consider that also two words pairs 2-3, 4-5, 6-7 and so on are necessary, please test the next solution:
Private Sub WordPairsCountTester()
Dim d As Object, k, i As Long, ws As Worksheet, arrFin
Set ws = ActiveSheet
'Attention, please! The last parameter of the called function means How Many Consecutive Words to be counted
Set d = WordPairCountsSp(ws.Range("A2:A" & ws.cells(rows.count, "A").End(xlUp).row), _
ws.Range("F2:F" & ws.cells(rows.count, "F").End(xlUp).row), 3)
arrFin = Application.Transpose(Array(d.Keys, d.items)) 'place the dictionary in an array
'clear contents of the columns where a previous result was returned, if any...:
ws.Range("H2:I" & ws.Range("H" & ws.rows.count).End(xlUp).row).ClearContents
ws.Range("H2").Resize(UBound(arrFin), 2).Value = arrFin 'drop the array content at once
End Sub
Private Function WordPairCountsSp(rngTexts As Range, rngExclude As Range, nrNeigh As Long) As Object
Dim dict As Object, arr, arrCell, i As Long, pairWd As String, j As Long, k As Long
arr = rngTexts.Value 'place the range in an array for faster iteration
Set dict = CreateObject("scripting.dictionary")
For i = 1 To UBound(arr) 'iterate between the array elements
arrCell = Split(Replace(Replace(Replace(Replace(arr(i, 1), ",", ""), ".", ""), "?", ""), "!", "")) 'split the string by default delimiter (space)
If UBound(arrCell) + 1 >= nrNeigh Then
For j = 0 To UBound(arrCell) - nrNeigh + 1 'iterate between the array elements
pairWd = arrCell(j)
For k = 1 To nrNeigh - 1
pairWd = pairWd & " " & arrCell(j + k) 'create a string from nrNeigh neighbour words
Next k
If IsError(Application.match(pairWd, rngExclude, 0)) Then
dict(pairWd) = dict(pairWd) + 1 'place the unique pairs as keys and add occurrences as items
End If
Next j
End If
Next i
Set WordPairCountsSp = dict 'return the above created dictionary
End Function

Related

VBA code that concatenates "unique" values

I'm trying to write a VBA code that will take values from a selection and concatenate the cell values with a line seperator. I also wish to not include and duplicates.
Ex. as follows:
Say I have a data set like the below. I would like to type =ConcatenateUnique(A1:B2,",") and have it return One,Two,Three
Column A
Column B
One
Two
Three
One
I tried the below, although I'm aware if it did work it would only return Two,Three
Function CONCATENATEUNIQUE(Ref As Range, Separator As String) As String
Dim Cell As Range
Dim Result As String
For Each Cell In Ref
If WorksheetFunction.CountIf(Ref, Cell.Value) <= 1 Then
Result = Result & Cell.Value & Separator
End If
Next Cell
CONCATENATEMULTIPLE = Left(Result, Len(Result) - 1)
End Function
For this type of task a Scripting Dictionary is useful (but note this won't work on a Mac):
Function UniqueList(rng As Range, Optional sep As String = ",")
Dim arr, r As Long, c As Long, v, dict As Object
If rng.Count = 1 Then 'handle single-cell case
UniqueList = rng.Value
Exit Function
End If
arr = rng.Value 'get values into an array
Set dict = CreateObject("Scripting.Dictionary")
For r = 1 To UBound(arr, 1)
For c = 1 To UBound(arr, 2)
v = arr(r, c)
If Not IsError(v) Then
If Len(v) > 0 Then dict(v) = True
End If
Next c
Next r
UniqueList = Join(Application.Transpose( _
Application.Transpose(dict.Keys)), sep)
End Function

VBA Find all regEx matches in array

I have a large list and want to find all entries for the same project names.
My data looks like this:
A header
Another header
Project names
First
row1
AA_Bla_ABCDEF
Second
Blah
XY_Blah_ABCDEF
Fourth
Again this project name
AA_Bla_ABCDEF
Third
Blubb
12_Blubb_ABCDEF
Therefore, I have this code, which gets all the possible filter criteria (Project names):
lastRow = Range(CStr("C" & ActiveSheet.Rows.Count)).End(xlUp).Row
Dim data(), dict As Object, r As Long
Set dict = CreateObject("Scripting.Dictionary")
data = ActiveSheet.Range("C2", "C" & CStr(lastRow)).Columns(1).Value
For r = 1 To UBound(data)
dict(data(r, 1)) = Empty
Next
data = WorksheetFunction.Transpose(dict.keys())
End Sub
I can access the list of project names like:
Debug.Print data(1, 1) ' AA_Bla_ABCDEF
Debug.Print data(2, 1) ' XY_Blah_ABCDEF
Debug.Print data(3, 1) ' 12_Blubb_ABCDEF
Now, I would like to search in data for all entries that fulfill certain criteria.
I want to exclude all items that do not start with letters. startPattern = "(^[A-Z]{2})"
I want to find in all remaining items those who have the same last 6 symbols (numbers, chars, underscores...) projectPattern = "(.$){6}"
Therefore, I thought of regEx and tried:
Dim regEx As Object
Set regEx = CreateObject("VBScript.RegExp") ' Automatic reference binding
For r = 1 To UBound(data)
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = projectPattern
End With
' If data.find(regEx).count > 1 (if I have this pattern more than once)
' similarEntries = data.find(regEx) ...
How can I search the array for all matches that occur more than once?
In the example list it would be only: AA_Bla_ABCDEF
Using LIKE "[A-Z][A-Z]" to exclude some items and RIGHT(string,6) as dictionary key to count duplicates.
Option Explicit
Sub Macro1()
Dim ws As Worksheet
Dim dict As Object, name As String, key, ar
Dim r As Long, lastrow As Long
Set ws = ActiveSheet
Set dict = CreateObject("Scripting.Dictionary")
lastrow = ws.Cells(Rows.Count, "C").End(xlUp).Row
For r = 2 To lastrow
name = Trim(ws.Cells(r, "C"))
If UCase(Left(name, 2)) Like "[A-Z][A-Z]" Then
key = Right(name, 6)
If dict.exists(key) Then
dict(key) = dict(key) & vbTab & name
Else
dict(key) = name
End If
End If
Next
' show results on sheet2
r = 1
For Each key In dict
ar = Split(dict(key), vbTab)
If UBound(ar) > 0 Then
Sheet2.Cells(r, 1) = key
Sheet2.Cells(r, 2) = UBound(ar) + 1
Sheet2.Cells(r, 3).Resize(1, UBound(ar) + 1) = ar
r= r + 1
End If
Next
End Sub

How to count the total number of specific words in a cell and do the same for other cells as well using VBA?

How do I count the total number of "alt" and "first" that appeared in a cell and do the same for other cells as well while ignoring empty cells in the process? For instance, if a cell has first, first, alt, first, first, first, it should give me firstcounter = 5 (where firstcounter is the total count for first) and altcounter= 1(altcounter is the total count for alt). After that I can use the value of firstcounter and altcounter found to concatenate them into a string as shown in column B in the form of "first-" & firstcounter, "alt-"& altcounter.
Dim ia As Long
Dim lastrow2 As Long
Dim ws1 As Worksheet
Set ws1 = ThisWorkbook.Sheets(1)
lastrow2 = ws1.Range("A" & ws1.Rows.count).End(xlUp).Row
For ia = 2 To lastrow2
Dim arr() As Variant
' Split the string to an array
arr = Split(ws1.Cells(ia, "A"), ",").Value
'what should i do after split
Enter the following into a code module...
Function CountWords$(r)
Dim a&, f&, w
For Each w In Split(r, ",")
If w = "alt" Then a = a + 1
If w = "first" Then f = f + 1
Next
If (a + f) Then CountWords = "first-" & f & ",alt-" & a
End Function
Then in cell B2 enter this formula:
=CountWords(A2)
Now copy it downwards as far as you need.
Update
To use the above function from VBA without entering formulas in the worksheet you can do it like this...
Sub Cena()
Dim i&, v
With [a2:a8]
v = .Value2
For i = 1 To UBound(v)
v(i, 1) = CountWords(v(i, 1))
Next
.Offset(, 1) = v
End With
End Sub
Function CountWords$(r)
Dim a&, f&, w
For Each w In Split(r, ",")
If w = "alt" Then a = a + 1
If w = "first" Then f = f + 1
Next
If (a + f) Then CountWords = "first-" & f & ",alt-" & a
End Function
Update #2
In response to your questions in the comments, you can use this variation instead...
Sub Cena()
Dim i&, v
With [a2].Resize(Cells(Rows.Count, 1).End(xlUp).Row - 1)
v = .Value2
For i = 1 To UBound(v)
v(i, 1) = CountWords(v(i, 1))
Next
.Cells = v
End With
End Sub
Function CountWords$(r)
Dim a&, f&, w
For Each w In Split(r, ",")
If w = "alt" Then a = a + 1
If w = "first" Then f = f + 1
Next
If (a + f) Then CountWords = "first-" & f & ",alt-" & a
End Function
In order to make this independent from the words alt and first and whitespaces in the string I would use the following functions
Option Explicit
'Add a reference to Microsoft VBScript Regular Expressions 5.5
Public Function RemoveWhiteSpace(target As String) As String
With New RegExp
.Pattern = "\s"
.MultiLine = True
.Global = True
RemoveWhiteSpace = .Replace(target, vbNullString)
End With
End Function
'Add a reference to Microsoft Scripting Runtime
Function CountWordsA(rg As Range) As String
On Error GoTo EH
Dim dict As Dictionary
Set dict = New Dictionary
Dim vDat As Variant
vDat = RemoveWhiteSpace(rg.Value)
vDat = Split(vDat, ",")
Dim i As Long
For i = LBound(vDat) To UBound(vDat)
If dict.Exists(vDat(i)) Then
dict(vDat(i)) = dict(vDat(i)) + 1
Else
dict.Add vDat(i), 1
End If
Next i
Dim vKey As Variant
ReDim vDat(1 To dict.Count)
i = 1
For Each vKey In dict.Keys
vDat(i) = vKey & "-" & dict(vKey)
i = i + 1
Next vKey
CountWordsA = Join(vDat, ",")
Exit Function
EH:
CountWordsA = ""
End Function
Sub TestIt()
Dim rg As Range
Set rg = Range("A2:A8")
Dim sngCell As Range
For Each sngCell In rg
sngCell.Offset(, 1) = CountWordsA(sngCell)
Next sngCell
End Sub
More about dictionaries and regular expressions
Alternative using Filter() function
This demonstrates the use of the Filter() function to count words via function UBound():
Function CountTerms() (usable also in formulae)
Function CountTerms(ByVal WordList As String, Optional TermList As String = "first,alt", Optional DELIM As String = ",") As String
'Purpose: count found terms in wordlist and return result as list
'[1] assign lists to arrays
Dim words, terms
words = Split(WordList, DELIM): terms = Split(TermList, DELIM)
'[2] count filtered search terms
Dim i As Long
For i = 0 To UBound(terms)
terms(i) = terms(i) & "-" & UBound(Filter(words, terms(i), True, vbTextCompare)) + 1
Next i
'[3] return terms as joined list, e.g. "first-5,alt-1"
CountTerms = Join(terms, ",")
End Function
Example call (due to comment) & help function getRange()
In order to loop over the entire range and replace the original data with the results list:
Sub ExampleCall()
'[1] get range data assigning them to variant temporary array
Dim rng As Range, tmp
Set rng = getRange(Sheet1, tmp) ' << change to sheet's Code(Name)
'[2] loop through array values and get counts
Dim i As Long
For i = 1 To UBound(tmp)
tmp(i, 1) = CountTerms(tmp(i, 1))
Next i
'[3] write to target (here: overwriting due to comment)
rng.Offset(ColumnOffset:=0) = tmp
End Sub
Function getRange(mySheet As Worksheet, tmp) As Range
'Purpose: assign current column A:A data to referenced tmp array
With mySheet
Set getRange = .Range("A2:A" & .Cells(.Rows.Count, 1).End(xlUp).Row)
tmp = getRange ' assign range data to referenced tmp array
End With
End Function

Create a table with all potential combinations from a given list with two columns (excel)

Is there a way (vba code or excel trick) to manipulate a 2 columnar list so that I get a table with all potential combinations depending on a unique identifier in the first column?
E.g. I have one column with Company Names and another with Country Locations. What I need is every set if combinations of the countries per company (see screenshot attached).
This vba module should solve your problem.
Just copy the code to a new module, declare the input and output columns and the number of the first row of your list.
Note that the code will stop once it hits a line where the "Unique Identifier" Cell is empty.
Also, it requires that your list is sorted with respect to your "Unique Identifier".
If a Unique Identifier only appears once, it will still be written into the output list, but only once and with the outColNation2 staying empty in that row. If this is not desired and it should be left out entirely, just delete the commented if-statement.
Example Image of output
Also note, that a unique identifier can repeat at most 100 times. I assume none of them appears that often as that would create a ridiculously long output list.
Option Compare Text
Sub COMBINATIONS()
Dim i As Long, j As Long, k As Long, l As Long, n As Long
Dim arr(100) As String
Dim UI As String
Dim inColUI As Integer, inColNation As Integer
Dim outColUI As Integer, outColNation1 As Integer, outColNation2 As Integer
Dim FirstRowOfData As Integer
Dim YourWS As Worksheet
inColUI = 1 'Column of the "Unique Identifier"
inColNation = 2 'Column of the "Nations" in your example
outColUI = 4
outColNation1 = 5 'output columns
outColNation2 = 6
FirstRowOfData = 2 'First Row of data
Set YourWS = Application.Worksheets("Sheet1") 'Put in your Worksheet Name here.
i = FirstRowOfData
n = FirstRowOfData
With YourWS
Do Until .Cells(i, inColUI) = ""
j = 0
UI = .Cells(i, inColUI)
Do Until .Cells(i - 1, inColUI) <> .Cells(i, inColUI) And j > 0 Or .Cells(i, inColUI) = ""
arr(j + 1) = .Cells(i, inColNation)
i = i + 1
j = j + 1
Loop
If j = 1 Then '<- remove this if-statement and the following marked lines if single appearing UIs should be omitted entirely
.Cells(n, outColUI) = UI '<---
.Cells(n, outColNation1) = arr(1) '<---
n = n + 1 '<---
Else '<---
For k = 1 To j
For l = 1 To j
If arr(k) <> arr(l) Then
.Cells(n, outColUI) = UI
.Cells(n, outColNation1) = arr(k)
.Cells(n, outColNation2) = arr(l)
n = n + 1
End If
Next l
Next k
End If '<---
Loop
End With
End Sub
Edit: cleaned up the code a little bit
Something like the following shows how to iterate through 2 ranges of cells
Dim Rng1 as Range, Rng2 as Range
Dim SrcCell as Range, OthrCell as Range
Dim FullList as string
Rng1 = Range("A1:A12")
Rng2 = Range("B1:B12")
FullList = ""
For Each SrcCell in Rng1
For Each OthrCell in Rng2
FullList = IIF(FullList="","",FullList & vbCrLf) & SrcCell.Value & OthrCell.Value
Next OthrCell
Next srcCell
The FullList string now contains all the combinations but you may require something else. Only intended to give you a start
You need to add code yourself to filter out duplicates
You can do the following (see code below). As another commentee mentioned, when there is only one record of company vs country, it will not show in the output.
The solutions is based on creating a dictionary, each entry is a company and the value is a comma separated string of countries. After the dictionary is created, the dictionary is looped, and a list of countries is then iterated over a nested loop. If the index of the outer loop is the same as the inner index of the loop then the loop is skipped i.e. that would be a Country 1 vs Country 1 combination. Otherwise is added to the output list.
Columns A,B is input and columns D,E,F is output.
Option Explicit
Public Sub sCombine()
Dim r As Range, dest As Range
Dim d As New Dictionary
Dim key As Variant
Dim countries() As String
Dim i As Integer, j As Integer
On Error GoTo error_next
Set r = Sheet1.Range("A1")
Set dest = Sheet1.Range("D:F")
dest.ClearContents
Set dest = Sheet1.Range("D1")
While r.Value <> ""
If d.Exists(r.Value) Then
d(r.Value) = d(r.Value) & "," & r.Offset(0, 1)
Else
d.Add r.Value, r.Offset(0, 1).Value
End If
Set r = r.Offset(1, 0)
Wend
For Each key In d.Keys
countries = Split(d(key), ",")
For i = LBound(countries) To UBound(countries)
For j = LBound(countries) To UBound(countries)
If i <> j Then
dest.Value = key
dest.Offset(0, 1).Value = countries(i)
dest.Offset(0, 2).Value = countries(j)
Set dest = dest.Offset(1, 0)
End If
Next j
Next i
Next key
Exit Sub
error_next:
MsgBox Err.Description
End Sub

Frequency of Words within Cells within Range

I have a column of about 50 cells. Each cell contains a block of text, anywhere from 3-8 sentences.
Id like to populate a list of words being used and obtain their frequencies for the entire range (A1:A50).
Ive tried to manipulate other codes I've found in other posts but they seem to be tailored to cells that contain one word rather than multiple words.
This is the code I found that I was attempting to use.
Sub Ftable()
Dim BigString As String, I As Long, J As Long, K As Long
Dim Selection As Range
Set Selection = ThisWorkbook.Sheets("Sheet1").Columns("A")
BigString = ""
For Each r In Selection
BigString = BigString & " " & r.Value
Next r
BigString = Trim(BigString)
ary = Split(BigString, " ")
Dim cl As Collection
Set cl = New Collection
For Each a In ary
On Error Resume Next
cl.Add a, CStr(a)
Next a
For I = 1 To cl.Count
v = cl(I)
ThisWorkbook.Sheets("Sheet2").Cells(I, "B").Value = v
J = 0
For Each a In ary
If a = v Then J = J + 1
Next a
ThisWorkbook.Sheets("Sheet2").Cells(I, "C") = J
Next I
End Sub
Here you go, a dictionary is the best way to handle this I think as you can test if the dictionary already contains an item. Post back if there's anything you don't get.
Sub CountWords()
Dim dictionary As Object
Dim sentence() As String
Dim arrayPos As Integer
Dim lastRow, rowCounter As Long
Dim ws, destination As Worksheet
Set ws = Sheets("Put the source sheet name here")
Set destination = Sheets("Put the destination sheet name here")
rowCounter = 2
arrayPos = 0
lastRow = ws.Range("A1000000").End(xlUp).Row
Set dictionary = CreateObject("Scripting.dictionary")
For x = 2 To lastRow
sentence = Split(ws.Cells(x, 1), " ")
For y = 0 To UBound(sentence)
If Not dictionary.Exists(sentence(y)) Then
dictionary.Add sentence(y), 1
Else
dictionary.Item(sentence(y)) = dictionary.Item(sentence(y)) + 1
End If
Next y
Next x
For Each Item In dictionary
destination.Cells(rowCounter, 1) = Item
destination.Cells(rowCounter, 2) = dictionary.Item(Item)
rowCounter = rowCounter + 1
Next Item
End Sub
Try this (works for me with some long blocks of Lorem Ipsum text):
Sub Ftable()
Dim BigString As String, I As Long, J As Long, K As Long
Dim countRange As Range
Set countRange = ThisWorkbook.Sheets("Sheet1").Range("A1:A50")
BigString = ""
For Each r In countRange.Cells
BigString = BigString & " " & r.Value
Next r
BigString = Trim(BigString)
ary = Split(BigString, " ")
Dim cl As Collection
Set cl = New Collection
For Each a In ary
On Error Resume Next
cl.Add a, CStr(a)
Next a
For I = 1 To cl.Count
v = cl(I)
ThisWorkbook.Sheets("Sheet2").Cells(I, "B").Value = v
J = 0
For Each a In ary
If a = v Then J = J + 1
Next a
ThisWorkbook.Sheets("Sheet2").Cells(I, "C") = J
Next I
End Sub
I took it down to only looking at the 50 cells where you have data, as opposed to all >1 million in that column. I also fixed an issue where r was getting a length 1 array instead of a Range. And I renamed "Selection" to "countRange" because Selection is already defined in the application, so it was bad naming.
Also, notice that your code pulls from "Sheet1" and outputs into columns B and C of "Sheet2". Make sure you rename your worksheets or edit these values, or you'll get errors/data corruption.
This is how I'd approach the problem:
Sub Ftable()
Dim wordDict As New Dictionary
Dim r As Range
Dim countRange As Range
Dim str As Variant
Dim strArray() As String
Set countRange = ThisWorkbook.Sheets("Sheet1").Range("A1:A50")
For Each r In countRange
strArray = Split(Trim(r.Value), " ")
For Each str In strArray
str = LCase(str)
If wordDict.Exists(str) Then
wordDict(str) = wordDict(str) + 1
Else
wordDict.Add str, 1
End If
Next str
Next r
Set r = ThisWorkbook.Sheets("Sheet2").Range("B1")
For Each str In wordDict.Keys()
r.Value = str
r.Offset(0, 1).Value = wordDict(str)
Set r = r.Offset(1, 0)
Next str
Set wordDict = Nothing
End Sub
It uses a dictionary, so make sure you add a reference to the library (Tools > Add Reference > Microsoft Scripting Library). It also forces everything to lowercase - one big issue of the old code was that it failed to count capitalized and uncapitalized versions correctly, meaning it missed many words. Remove str = LCase(str) if you don't want this.
Bonus: this method ran about 8 times faster on my test sheet.

Resources