Counting the Frequencies of Words in Excel Strings - excel

Suppose I have a column of arbitrary length where each cell contains a string of text. Is there a way to determine what words appear most frequently in the column (not knowing in advance which words to check) and subsequently order these words along with their frequencies in a two column table? Would VBA be best for this task?
As an example, a cell might contain the string "This is a string, and the # of characters inthis string is>0." (errors intentional)

Select a portion of column A and run this small macro ( the table will be placed in cols. B & C :
Sub Ftable()
Dim BigString As String, I As Long, J As Long, K As Long
BigString = ""
' Add code to sum both "All" and "all"
' Add code to separate "." "!" etc. from the word preceeding them so that word
' is also counted in the total. For example: "all." should not be reported as 1 ' "all." but "all" be added to the total count of "all" words.
' Would you publish this new code?
For Each r In Selection
BigString = BigString & " " & r.Value
Next r
BigString = Trim(BigString)
ary = Split(BigString, " ")
Dim cl As Collection
Set cl = New Collection
For Each a In ary
On Error Resume Next
cl.Add a, CStr(a)
Next a
For I = 1 To cl.Count
v = cl(I)
Cells(I, "B").Value = v
J = 0
For Each a In ary
If a = v Then J = J + 1
Next a
Cells(I, "C") = J
Next I
End Sub

Given this:
I'll use a pivot table to get this:
Best part is, if I got more, it's easy to get Top 5, 10, etc. And it'll always result to unique indices. From there, there are all manners of editing and calculation you can do. :)

Using Google Sheets:
index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),1,$A6+1)&":"&index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),2,$A6+1)
In the above $B$2 contains the text string
$A6 = 1 will give you the most used word
$A6 = 2 will give you the second most used word
etc.
This is set to do 20 most frequent. If you want more, increase the limit value to whatever you want.

Here's a tiny fix plus an enhancement to the script kindly offered by "Gary's Student". The fix is that while building the collection is apparently not case-sensitive (and this is correct--we probably don't want new items added to the collection that differ only in case from existing items), the IF statement that does the counting IS case-sensitive as written, so it doesn't count correctly. Just change that line to...
If LCase(a) = LCase(v) Then J = J + 1
And here's my enhancement. To use it, you first select one or more columns but NOT their (first) header/label rows. Then run the script, and it gives results for each selected column in a new worksheet--along with that header/label row so you know what you're looking at.
I'm just a dabbler. I just hack stuff when I need to get a job done, so it's not elegant, I'm sure...
Sub FrequencyV2() 'Modified from: https://stackoverflow.com/questions/21858874/counting-the-frequencies-of-words-in-excel-strings
'It determines the frequency of words found in each selected column.
'Puts results in new worksheets.
'Before running, select one or more columns but not the header rows.
Dim rng As Range
Dim row As Range
Dim col As Range
Dim cell As Range
Dim ws As Worksheet
Dim wsNumber As Long 'Used to put a number in the names of the newly created worksheets
wsNumber = 1
Set rng = Selection
For Each col In rng.Columns
Dim BigString As String, I As Long, J As Long, K As Long
BigString = ""
For Each cell In col.Cells
BigString = BigString & " " & cell.Value
Next cell
BigString = Trim(BigString)
ary = Split(BigString, " ")
Dim cl As Collection
Set cl = New Collection
For Each a In ary
On Error Resume Next 'This works because an error occurs if item already exists in the collection.
'Note that it's not case sensitive. Differently capitalized items will be identified as already belonging to collection.
cl.Add a, CStr(a)
Next a
Set ws = Sheets.Add(After:=Sheets(Sheets.Count))
ws.Name = "F" & CStr(wsNumber)
wsNumber = wsNumber + 1
Worksheets(ws.Name).Cells(1, "A").Value = col.Cells(1, 1).Offset(-1, 0).Value 'Copies the table header text for current column to new worksheet.
For I = 1 To cl.Count
v = cl(I)
Worksheets(ws.Name).Cells(I + 1, "A").Value = v 'The +1 needed because header text takes up row 1.
J = 0
For Each a In ary
If LCase(a) = LCase(v) Then J = J + 1
Next a
Worksheets(ws.Name).Cells(I + 1, "B") = J 'The +1 needed because header text takes up row 1.
Next I
Next col
End Sub

Related

Assign a variable to cells to compare mutliple numbers

I have a data set where I need to compare the first number in each transect against each other.
For example, in the below data set I need to compare cells D2, D7, D12 and D17 and assign a value based on which one is the smallest number, the next smallest and so on. This will be used to assign the transect numbers in column A.
My issue is that the number of sections (in this example 4) and the number of transects (also 4 in this example) will vary. So the cells I need to compare will change.
I have written the code that calculates the number of transects, which is:
Dim tlength As Worksheet
Dim tb As Long *'tb=transect break*
Sub tlength_start_stop_coords()
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("sheet1")
Set tlength = ThisWorkbook.Worksheets("transect lengths") *' assigns the sheet to a variable
for efficient coding*
tb = 0 *'counter to calculate the number of transects*
j = 2 *'counter for row*
Lastrow = Lastrow + 1 *'add a row to last row so that the last row includes a blank line.*
*'the following for loop is used to calculate the number of transects*
For i = 2 To Lastrow
If tlength.Range("D" & i) = vbNullString Then
If tlength.Range("D" & i - 1) <> vbNullString Then
tb = tb + 1 *'updates the counter*
End If
End If
Next i
tbtotal = tb *'stores the total counter in variable tbtotal for later use*
I think I may need to use a loop. But I am stuck trying to figure out how to manage the unknown number of comparisons in changing cell locations.
The desired result is in the below screenshot of the expected outcome, with results in column A. To begin with, I only need to get the number for the first row of each transect. Once I have that, I can copy using xldown.
Expected outcome:
Another possible data set and outcome expected might be:
enter image description here
with an expected outcome of:
enter image description here
Worked for me using your second set of sample data:
Sub Tester()
Dim tlength As Worksheet, i As Long, tNum As Long, v, vPrev, arr
Dim col As New Collection, e, c As Range, rng As Range
Set tlength = ThisWorkbook.Worksheets("transect lengths")
'collect all the Section 1 Latitudes and row numbers
For i = 2 To tlength.Cells(Rows.Count, "B").End(xlUp).Row
If tlength.Cells(i, "B") = 1 Then
col.Add Array(i, tlength.Cells(i, "D").Value) 'store start row and first Latitude
End If
Next i
SortCollection col, 2 'sort collection by second element in each array
tNum = 0
'loop over the sorted collection and assign the order of the transects
For Each e In col
tNum = tNum + 1
Set c = tlength.Cells(e(0), "B")
'following assumes all transects have at least 2 sections...
tlength.Range(c, c.End(xlDown)).Offset(0, -1).Value = tNum
Next e
End Sub
'Sort a collection of 0-based arrays in ascending order by the n'th element of each array
'Adapted from https://stackoverflow.com/a/3588073/478884
Sub SortCollection(col As Collection, n As Long)
Dim i As Long, j As Long, vTemp As Variant
For i = 1 To col.Count - 1 'Two loops to bubble sort
For j = i + 1 To col.Count
If col(i)(n - 1) < col(j)(n - 1) Then 'change to > for ascending sort
vTemp = col(j) 'store the lesser item
col.Remove j 'remove the lesser item
col.Add Item:=vTemp, before:=i 're-add the lesser item before the greater Item
End If
Next j
Next i
End Sub

Loop through name list and if names exist in selection start after last name

I apologize, this is my first crack at Excel VBA so excuse my lack of knowledge!
So I have a list of (currently) 3 names to assign to the days in column A in a repeating order in Excel.
Currently my VBA code allows it to populate the selected cells with the names in a repeating pattern (this part is good), however there are two pieces I need help with.
1- with current code, once it reaches the bottom of the names it checks for the blank box that would end that list and starts over at the tops as directed but it puts a blank cell first (see screenshot). How can I have it put next name without adding blank cell first?
2- I want to be able to (once this gets going)select the entire D column through what dates need to be filled and:
-check the lowest non blank box
-match to list and set the
counter to name below that so
it continues the name order
from the last person who was
assigned
This is code I have now:
Sub EXAMPLE()
Dim count As Integer
count = 0
For Each c In Selection
c.Value = Range("X1").Offset(count, 0).Value
If c.Value = "" Then count = -1 And c.Value = Range("x1").Offset(count, 0).Value
count = count + 1
Next c
End Sub
Sorry I know that was long, I hope this makes sense.
I think it's worth reading about arrays, as this task is ideally suited to their use. Your best bet would be to read the names into an array and then build a recurring array whose dimension is equal to the number of rows in your dates column (or selection, or however you want to define the size of the output range).
Code would look a little like this:
Dim v As Variant
Dim people() As Variant, output() As Variant
Dim rowCount As Long, i As Long, j As Long
Dim endRange As Range
'Read the list of names into an array.
'This just takes all data in column "X" -> amend as desired
With Sheet1
Set endRange = .Cells(.Rows.Count, "X").End(xlUp)
v = .Range(.Cells(1, "X"), endRange).Value
End With
'Sense check on the names data.
If IsEmpty(v) Then
MsgBox "No names in Column ""X"""
Exit Sub
End If
If Not IsArray(v) Then
ReDim people(1 To 1, 1 To 1)
people(1, 1) = v
Else
people = v
End If
'Acquire the number of rows for repeating list of names.
'This just takes all data in column "A" -> amend as desired
With Sheet1
Set endRange = .Cells(.Rows.Count, "A").End(xlUp)
rowCount = .Range(.Cells(3, "A"), endRange).Rows.Count
End With
'Sense check date data.
If endRange.Row < 3 Then
MsgBox "No dates in Column ""A"""
Exit Sub
End If
'Make a recurring array.
ReDim output(1 To rowCount, 1 To 1)
i = 1
Do While i <= rowCount
For j = 1 To UBound(people, 1)
output(i, 1) = people(j, 1)
i = i + 1
If i > rowCount Then Exit Do
Next
Loop
'Write the output to column "D"
Sheet1.Range("D3").Resize(UBound(output, 1)).Value = output

special case - deleting rows on excel (VBA)

I have a worksheet, and the worksheet has multiple columns,multiple rows. How can I create a script such that, if a cell in any of those columns does not have a phrase, for example,'cat', the whole row gets deleted?
There are many solutions online, but they usually ask you to define a range, such as which column you want to search in and until how many rows. I can't have these restrictions as my different worksheets have different columns and different number of rows, but the same concept, where I delete a row if a certain phrase isn't existent in that row.
I am assuming that the word cat must appear somewhere in the row. If the row is completely "feline-free" then that row will be deleted. Here is a typical approach:
Sub KeepOnlyCatRows()
Dim i As Long, N As Long, s As String, c As String
N = Cells(Rows.Count, "A").End(xlUp).Row
c = Chr(1)
For i = N To 1 Step -1
s = c & Application.WorksheetFunction.TextJoin(c, True, Rows(i)) & c
If InStr(s, c & "cat" & c) = 0 Then
Rows(i).Delete
End If
Next i
End Sub
NOTES:
If the word cat must appear in every cell in the row, then ignore this answer.
I am assuming that the "cat-cell" contains only the word cat and no other text.
EDIT#1:
In order to allow an arbitrary word rather than cat, try the following:
Sub KeepOnlySpecialRows()
Dim i As Long, N As Long, s As String, c As String
Dim SpecialWord As String
N = Cells(Rows.Count, "A").End(xlUp).Row
c = Chr(1)
SpecialWord = Application.InputBox(Prompt:="Enter the special word:", Type:=2)
For i = N To 1 Step -1
s = c & Application.WorksheetFunction.TextJoin(c, True, Rows(i)) & c
If InStr(s, c & SpecialWord & c) = 0 Then
Rows(i).Delete
End If
Next i
End Sub

Extract the first word OR the first and second words from a range of cells

I have a column with text (Words and numbers) separated by spaces. There are two cases:
Case 1 (3 words separated by 2 space): BALDOR 3 hp-4
Case 2(4 words separated by 3 space): US ELECTRICAL 75 hp-232
I need to extract the Bolded word(s) (they aren't bolded in the data i have, it's just to illustrate) so I figured I would reverse the order of the words then get rid of the first two (3 hp4 and 75 hp232) which will always output the bolded words.
I might be going about it the wrong way with reversing the order of the words so If you have another method that you think is better do tell.
This is what I have so far:
Sub ExtractMissingInfo2()
Dim TypeCell As Variant
Dim Manufacturer As String
Dim MFG As Variant
Dim MFGrev As Variant
Dim MFGout As Variant
Dim RowCount As Variant
Dim Rng As Range
Dim a As Variant
Dim I As Variant
Dim wbdata As Workbook
Dim wsData As Worksheet
Set wbdata = Workbooks("trial1")
Set wsData = wbdata.Worksheets("Final Data")
wsData.Activate
'Counts how many cells in the chosen column
RowCount = Cells(Rows.Count, 4).End(xlUp).Row
For a = 2 To RowCount
If Not IsEmpty(Cells(a, 4)) Then
TypeCell = wsData.Cells(a, 4).Text 'cells with information
MFG = Split(TypeCell, " ") 'separate them
'Reverse the order of the words
For I = UBound(MFG) To 0 Step -1
MFGrev = MFGrev + "" + MFG(I) + " "
'Use the last iteration which will include all the words in reverse order
If I = 0 Then
MFGout = MFGrev
End If
Next
'This part I am not sure about
Manufacturer = Split(MFGout.Text, " ")(0)
'Insert extracted words into new column
Cells(a, 16) = WorksheetFunction.Transpose(Manufacturer)
Else
MsgBox ("Is empty... row " & a)
End If
Next
End Sub
So my First issue is that when looping, it keeps adding every string of every cell to the next instead of going through each cell one by one and outputting the words in reverse order.
My second issue is that I am not sure how to delete the first two words after reversing the order.
This is my first question on here so if i made mistakes in the formatting let me know.
Thank you in advance for any help!
EDIT:
What I am trying to do is extract the manufacturers' names for a list of equipment. The names can have one or two words in it so that is what i need to extract. I then am pasting those in another column.
The cases I gave where just examples to show the two cases that arise in that list and ask how to deal with them.
I guess you're after this code (explanations in comments)
Option Explicit
Sub ExtractMissingInfo2()
Dim MFG As Variant
Dim cell As Range
With Workbooks("trial1").Worksheets("Final Data") 'reference your wanted workbook and worksheet
For Each cell In .Range("D2", .Cells(Rows.Count, 4).End(xlUp)).SpecialCells(xlCellTypeConstants) 'loop thorugh referenced sheet column D not empty cells from row 2 down to last not empty row
MFG = Split(cell.Text, " ") ' separate each word
If UBound(MFG) > 1 Then ReDim Preserve MFG(0 To UBound(MFG) - 2) ' if there were more than two words, keep all but the last two ones
cell.Offset(, 12).Value = Join(MFG, " ") ' write remaining words into column P same row of current cell
Next
End With
End Sub
try this code, it works for both cases :
Sub test()
Dim myarray As Variant
myarray = Array("US ELECTRICAL 3 hp-2", "BALDOR 3 hp-4")
For j = 0 To UBound(myarray)
x = ""
t = Split(myarray(j))
For i = 0 To UBound(t) - 2
x = x & " " & t(i)
Next i
MsgBox myarray(j) & " ---- " & x
Next j
End Sub
Assuming you want to remove the first 3 letters of a string in A1, give this a gander:
Dim n As Integer
n = 3
Cells(2, 2).Value = Right(Range("A1"), Len(Range("A1")) - n)
This assumes the length of the first part of the string is constant which it seems like it is from your example, though you may wish to clarify this. It's not too clear what you're after.

Split cell with multiple lines into rows

I have a workbook with cells that have linebreaks (entered via ALT + ENTER).
I have to separate them into individual rows. All the cells are in column A.
Each line in the cell has a bullet point (eg. "* ") up front, which could serve as a beacon to break the line at this point.
You can use split with Chr(10) or VbLf
Dim cell_value As Variant
Dim counter As Integer
'Row counter
counter = 1
'Looping trough A column define max value
For i = 1 To 10
'Take cell at the time
cell_value = ThisWorkbook.ActiveSheet.Cells(i, 1).Value
'Split cell contents
Dim WrdArray() As String
WrdArray() = Split(cell_value, vbLf)
'Place values to the B column
For Each Item In WrdArray
ThisWorkbook.ActiveSheet.Cells(counter, 2).Value = Item
counter = counter + 1
Next Item
Next i
No you have array to place each row to different cell
There is no need of code for this, lets make it simple.
Follow the bellow steps.
Select the data-set you want to split -> Go to Data Tab -> Select "Text to columns" -> from this pop-up select "Delimited" -> Select which delimiter is separating your texts -> Select the destination cell -> Click "OK"
Try This.
Regards,
Ashwin
Edit from Markus: For the newline as delimiter use "Ctr-J"
If you select the cell and run the macro you would get what you want on the next column like this:
Option Explicit
Public Sub selection_into_rows()
Dim k As Variant
Dim l_counter As Long
k = Split(Selection, Chr(10))
For l_counter = LBound(k) To UBound(k)
Cells(l_counter + 1, Selection.Column + 1) = k(l_counter)
Next l_counter
End Sub
This will work on one row only after selecting it (but should get you started):
Option Explicit
Public Sub SelectionIntoRows()
Dim k() As String
Dim l As Long
Dim i As Long
k() = Split(Range("A1"), " ")
i = 1
For l = 0 To UBound(k)
Cells(i, 1) = k(l)
i = i + 1
Next l
End Sub
Sub extract()
'Query extract data in cell B divided by ALT+Enter, Comma space
'Mandatory to create in front Sheet1, Sheet2, and Sheet3
'ATTENTION! if field B is empty return no data!! Manually add column A (with empty column B)if needed!!
'manually remove empty cell in results (Sheet2)
'before START Query remove duplicate from input data!!
'Doesn't work with full stop
'When finished Msg Done will be display
Dim c As Long, r As Range, I As Long, d As Long, Temp() As String
d = 0
For Each r In Range("B2:B" & Range("B" & Rows.Count).End(xlUp).Row) ' Change this to suit your range..
c = 2
Temp = Split((r.Value), Chr(10))
For i = LBound(Temp) To UBound(Temp)
Sheets("Sheet2").Cells(r.Row, c - 1).Offset(d, 0).Value = Cells(r.Row, r.Column - 1).Value
Sheets("Sheet2").Cells(r.Row, c).Offset(d, 0).Value = Temp(i)
Cells(r.Row, c).Offset(d, 0).Select
ActiveCell.Value = Trim(ActiveCell.Value)
d = d + 1
Next
d = d - 1
Next
Sheets("Sheet2").Select
Columns("A:B").Select
ActiveSheet.Range("$A$1:$B$62856").RemoveDuplicates Columns:=Array(1, 2), _
Header:=xlYes
Range("A1").Select
I had a half dozen of these blobs from an poor Acrobat PDF to XLSX conversion, peppered throughout 500 rows.
I copied the text blob (like OP) into Notepad, removed "" from the start and end line, and pasted to a new spreadsheet. That tells me how many lines to insert. Then pasted into that hole.
"This was all of
my input text"
Then I could use TextToCol.

Resources