I am looking for a formula that counts how many lines in a multiline cell begin with - (a hyphen).
For example, if the cell contains:
how are you keeping up
-I am well and need toy
-"You" are asking wrong question
<you are wrong>
-why should i reply you
the count of qualifying lines is 3.
Can anyone help me out here, please?
If your first line never starts with a hyphen, or at least should not count towards the total, then try:
Formula in B1:
=(LEN(A1)-LEN(SUBSTITUTE(A1,CHAR(10)&"-","")))/2
If your first line can also start with a hyphen and should therefore count towards the total, try:
=(LEN(CHAR(10)&A1)-LEN(SUBSTITUTE(CHAR(10)&A1,CHAR(10)&"-","")))/2
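To sanity-check the arithmetic behind both formulas outside Excel, here is a rough Python sketch of the same length-difference trick. The function name and the `include_first_line` switch are made up for illustration; `"\n"` stands in for CHAR(10).

```python
def count_hyphen_lines(cell_text: str, include_first_line: bool = True) -> int:
    """Count lines starting with '-' via the LEN/SUBSTITUTE length difference."""
    lf = "\n"                      # Excel's CHAR(10)
    needle = lf + "-"              # a line break immediately followed by a hyphen
    # Prepending a line break lets the first line be matched like any other line.
    text = lf + cell_text if include_first_line else cell_text
    removed_chars = len(text) - len(text.replace(needle, ""))
    return removed_chars // len(needle)   # each match removes 2 characters


sample = "\n".join([
    "how are you keeping up",
    "-I am well and need toy",
    '-"You" are asking wrong question',
    "<you are wrong>",
    "-why should i reply you",
])
print(count_hyphen_lines(sample))  # prints 3
```

Dividing by 2 in the formulas corresponds to dividing by `len(needle)` here, since every removed match is one line break plus one hyphen.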
Here is a VBA solution:
Function CountLines(text As String, Optional flag As String = "") As Long
'counts all lines in text which starts with flag
Dim i As Long, count As Long
Dim lines As Variant
lines = Split(text, vbLf)
For i = LBound(lines) To UBound(lines)
If Mid(lines(i), 1, Len(flag)) = flag Then
count = count + 1
End If
Next i
CountLines = count
End Function
If this is placed in a standard code module, with the example text in A1, entering the formula =CountLines(A1,"-") in B1 will evaluate to 3.
If you want to include the first line in the potential count, then, in Windows Excel 2013+, you can try:
=COUNTA(FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,">","&gt;"),"<","&lt;"),"""","&quot;"),CHAR(10),"</s><s>") & "</s></t>","//s[starts-with(text(),'-')]"))
Replace illegal xml characters ",<, and >
Create an XML by splitting into nodes based on the LF character
Use xpath //s[starts-with(text(),'-')] to return only those nodes that start with a hyphen.
COUNTA to return the count of those nodes
I have VBA that compares 2 cells. Each cell can contain between 1 and 3 different parameters, separated by ",". The comparison is made by a simple double For loop (see the code below). What I can't figure out is how to modify the code to get the number of unique entries. For example:
cell 1 [music, art, science]; cell 2 [art, music]. When I run my For loops I get 2 matches (which is fine), but how do I count the number of unique words, which in this case should be 3?
I have tried adding num_possible = num_possible + 1 to the code, but it does not work well:
game_tags_parts = Split(Cells(11, 2), ",")
game_tags_parts_j = Split(Cells(11, j), ",")
num_matches = 0
num_possible = 0
For m = LBound(game_tags_parts) To UBound(game_tags_parts)
num_possible = num_possible + 1
For n = LBound(game_tags_parts_j) To UBound(game_tags_parts_j)
If Trim(game_tags_parts(m)) = Trim(game_tags_parts_j(n)) Then
num_matches = num_matches + 1
End If
Next n
Next m
The actual result should be the number of unique words used in those cells. In some cases I get 3 matches, for example cell 1 [scifi, space, star] and cell 2 [star, space, scifi], which is 3 matches in total; the modification should give me 3 as the number of unique words used across both cells. In the case where cell 1 is [art, music, science] and cell 2 is [scifi, space, star], the program gives me 0 matching words, and the modification should give me 6 as the number of unique words.
One easy way to get a unique count is to use a Dictionary object:
game_tags_parts = Split(Cells(11, 2), ",")
game_tags_parts_j = Split(Cells(11, j), ",")
Dim myDict As Object, v As Variant
Set myDict = CreateObject("Scripting.Dictionary")
'Trim each tag so that " art" and "art" count as the same word
For Each v In game_tags_parts
    If Not myDict.Exists(Trim(v)) Then myDict.Add Trim(v), Trim(v)
Next v
For Each v In game_tags_parts_j
    If Not myDict.Exists(Trim(v)) Then myDict.Add Trim(v), Trim(v)
Next v
MsgBox "unique count: " & myDict.Count
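For comparison, the same idea can be sketched in Python, where a set plays the role of the Dictionary. This is only an illustration: the function name is hypothetical, and it assumes tags should be trimmed before they are stored, so that " art" and "art" count as the same word.

```python
def unique_tag_count(cell_a: str, cell_b: str, sep: str = ",") -> int:
    """Count distinct trimmed tags across two delimited cell values."""
    tags = set()
    for cell in (cell_a, cell_b):
        for part in cell.split(sep):
            tag = part.strip()      # same role as VBA's Trim
            if tag:                 # ignore empty pieces such as a trailing comma
                tags.add(tag)
    return len(tags)


print(unique_tag_count("music, art, science", "art, music"))          # prints 3
print(unique_tag_count("art, music, science", "scifi, space, star"))  # prints 6
```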
I have addresses in columns U and V. I want to check whether they are somewhat similar: if they are, return "Update"; if not, return "Omit".
For example 246 N High street in U and 246 North High St in V would return a value of Update.
246 N High Street in U and 458 Auburn Drive in V would return a value of Omit.
Any ideas?
There are a lot of algorithms for doing fuzzy matching. One of the easier ones to implement in Excel is N-Gram.
To perform an n-gram match, we have to break each address up into a list of overlapping substrings of a fixed length. A 2-gram list of your address 246 N High street would look like 24,46,6 , N,N , H,Hi,ig,gh,h , s,st,tr,re,ee,et. We could do the same with a 3-gram: 246,46 ,6 N, N ,N H, Hi,Hig,igh,gh ,h s, st,str,tre,ree,eet
We do this with both addresses, then we can check each item in the first address's list to see if it appears in the second address's list; count the matches and divide that by the number of items in the first list. That will give you a percentage of how close they are.
You could get fancy with cell formulas mid() and countif() to do this with sheet formulas, but I think it's easier to just write it out in VBA and make it a UDF.
Function NGramCompare(string1 As String, string2 As String, intGram As Integer) As Double
'Take in two strings and the N-gram
Dim intChar As Integer, intGramMatch As Integer
Dim ngramList1 As String, ngramList2 As String, nGram As Variant
Dim nGramArr1 As Variant
'split the first string into a list of ngrams
For intChar = 1 To Len(string1) - (intGram-1)
If ngramList1 <> "" Then ngramList1 = ngramList1 & ","
ngramList1 = ngramList1 & Mid(string1, intChar, intGram)
Next intChar
'split the second string into a list of ngrams
For intChar = 1 To Len(string2) - (intGram-1)
If ngramList2 <> "" Then ngramList2 = ngramList2 & ","
ngramList2 = ngramList2 & Mid(string2, intChar, intGram)
Next intChar
'Split the ngramlist1 into an array through which we can iterate
nGramArr1 = Split(ngramList1, ",")
'Iterate through array and compare values to ngramlist2
For Each nGram In nGramArr1
If InStr(1, ngramList2, nGram) Then
'we found a match, add to the counter
intGramMatch = intGramMatch + 1
End If
Next nGram
'output the percentage of grams matching.
NGramCompare = intGramMatch / (UBound(nGramArr1) + 1)
End Function
If you've never used a UDF:
Go to visual basic editor (VBE) with Alt+F11
In the VBA Project window, find your workbook and right click on the name
Choose: Insert>>Module
Double click the new module in the list to bring up its code window
Paste this function in and save your workbook
Then, assuming address1 is in A1 and address2 is in B1 you can put, in C1:
=NGramCompare(A1, B1, 2)
For your first pair of addresses, this returns about 56%, which seems like a reasonably good match. If you find you are getting too many positive hits, you can change your 2-gram to a 3-gram by changing that last parameter.
To take it a step further so it will say "Update" or "Omit" you could do:
=If(NGramCompare(A1, B1, 2)>.30, "Update", "Omit")
I just set that so that it will consider a match anything above 30%, but you can adjust as necessary. No matter where you set it, you will probably end up with a percentage of compares that are false positives or false negatives, but that's the way fuzzy matching goes.
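To check the numbers outside Excel, here is a small Python sketch of the n-gram comparison described above. It is not a port of the UDF: the function names are hypothetical, and it tests membership in a set of n-grams rather than using InStr on a comma-joined string, which gives the same result for these examples. Like InStr with the default compare mode, it is case-sensitive.

```python
def ngrams(text: str, n: int) -> list:
    """Return all overlapping substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_compare(string1: str, string2: str, n: int) -> float:
    """Fraction of string1's n-grams that also occur among string2's n-grams."""
    grams1 = ngrams(string1, n)
    if not grams1:                      # string1 shorter than n: nothing to compare
        return 0.0
    grams2 = set(ngrams(string2, n))
    matches = sum(1 for gram in grams1 if gram in grams2)
    return matches / len(grams1)


def classify(string1: str, string2: str, n: int = 2, threshold: float = 0.30) -> str:
    """Apply the same 30% cut-off as the worksheet formula."""
    return "Update" if ngram_compare(string1, string2, n) > threshold else "Omit"


print(ngram_compare("246 N High street", "246 North High St", 2))  # prints 0.5625
print(classify("246 N High street", "246 North High St"))          # prints Update
print(classify("246 N High Street", "458 Auburn Drive"))           # prints Omit
```

Nine of the sixteen 2-grams of the first address appear in the second, hence 9/16 = 0.5625. Lower-casing both strings first would also match "street" against "St", at the cost of a few more false positives.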
Some of the naive approaches can be to compare the first few characters
=LEFT(A1,5)=LEFT(B1,5)
or to replace parts until they match
=(SUBSTITUTE(SUBSTITUTE(LOWER(A2)," street"," ST")," north "," N ")=SUBSTITUTE(SUBSTITUTE(LOWER(B2)," street"," ST")," north "," N "))
Both will probably turn into a big, ugly formula once adjusted to cover most cases.
I have a list of strings in Excel such as:
a>b>b>d>c>a
a>b>c>d
b>b>b>d>d>a
etc.
I want to extract the last c or the last d from each string, whichever comes last, e.g.
a>b>b>d>c>a = c
a>b>c>d = d
b>b>b>d>d>a = d
How would I do this using VBA (or just plain Excel, if that is possible)?
You could use an Excel formula as follows.
To help explain, I will start with just one letter and then show the full formula at the end.
First, find the number of occurrences of c
= LEN(A1) - LEN(SUBSTITUTE(A1,"c",""))
Use this count to replace the last c with a unique character ($ as an example)
=SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c","")))
Next find this unique character
= FIND("$",SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c",""))))
This gives the position of the last c, now you can use this in a mid function to return this last c
= MID(A1,FIND("$",SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c","")))),1)
Finally to account for both c and d, use a max to bring back which comes last
= MID(A1,MAX(IFERROR(FIND("$",SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c","")))),0),IFERROR(FIND("$",SUBSTITUTE(A1,"d","$",LEN(A1) - LEN(SUBSTITUTE(A1,"d","")))),0)),1)
Assuming c/d are just examples:
?LastEither("b>b>b>d>d>a", "c", "d")
d
Using
Function LastEither(testStr As String, find1 As String, find2 As String) As String
Dim p1 As Long: p1 = InStrRev(testStr, find1)
Dim p2 As Long: p2 = InStrRev(testStr, find2)
If (p1 > p2) Then
LastEither = find1
ElseIf (p2 > 0) Then LastEither = find2
End If
End Function
General solution:
?FindLastMatch("b>b>b>d>d>a>q>ZZ", ">", "c", "d")
d
?FindLastMatch("b>b>b>d>d>a>q>ZZ", ">", "c", "d", "q")
q
?FindLastMatch("b>b>b>d>d>a>q>ZZ>ppp", ">", "c", "d", "ZZ", "q")
ZZ
Using
Function FindLastMatch(testStr As String, delimiter As String, ParamArray findTokens() As Variant) As String
Dim tokens() As String, i As Long, j As Long
tokens = Split(testStr, delimiter)
For i = UBound(tokens) To 0 Step -1
For j = 0 To UBound(findTokens)
If tokens(i) = findTokens(j) Then
FindLastMatch = tokens(i)
Exit Function
End If
Next
Next
End Function
And here is an array formula to do the same thing. (Changed formula to avoid problem with original pointed out by Grade 'Eh' Bacon)
=MID(A1,MAX((MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)={"c","d"})*ROW(INDIRECT("1:"&LEN(A1)))),1)
An array formula is entered by holding down ctrl+shift while hitting enter. If you do it correctly, Excel will place braces {...} around the formula which you can see in the formula bar.
The formula will return a #VALUE! error if there is neither c nor d in the string.
EDIT: Having seen from some of your comments that you might want to use more than single character words, I present the following User Defined Function. It allows you to use words of any length, and also you are not limited to just two words -- you can use an arbitrary number of words.
You would enter a formula such as:
=LastOne(A8,"Charlie","Delta")
or
=LastOne(A8,$I1:$I2)
where I1 and I2 contain the words you wish to check for.
The words need to be separated by some delimiter that is neither a letter nor a digit.
A Regular Expression (regex) is constructed which consists of a pipe-separated | list of the words or phrases. The pipe | , in a regex, is the same as an OR. The \b at the beginning and end of the regex indicates a word boundary -- that is the point at which a digit or letter is adjacent to a non-digit or non-letter, or the beginning or end of the string. Hence the actual delimiter does not matter, so long as it is not a letter or digit.
All of the matches are placed in a Match Collection, and we only need the last item in it. There will be MC.Count matches and, since the collection is zero-based, we subtract one to get the index of the last match.
Here is the code:
===========================================
Option Explicit
Function LastOne(sSearch As String, ParamArray WordList() As Variant) As String
Dim RE As Object, MC As Object
Dim sPat As String
Dim RNG, C
For Each RNG In WordList
If IsArray(RNG) Or IsObject(RNG) Then
For Each C In RNG
sPat = sPat & "|" & C
Next C
Else
sPat = sPat & "|" & RNG
End If
Next RNG
sPat = "\b(?:" & Mid(sPat, 2) & ")\b"
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = sPat
.ignorecase = True
If .test(sSearch) = True Then
Set MC = .Execute(sSearch)
LastOne = MC(MC.Count - 1)
End If
End With
End Function
===========================================
Note that an absence of a WordList word will result in a blank cell. One could produce an error if that is preferable.
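The word-boundary pattern can be tried out quickly in Python as well. This is a loose sketch rather than a translation of the UDF: the function name is hypothetical, and unlike the VBA it escapes each word with `re.escape`, so characters such as "." in a word are matched literally.

```python
import re
from typing import Iterable


def last_one(search: str, words: Iterable[str]) -> str:
    """Return the last whole-word match of any word in `words`, or '' if none."""
    # Build \b(?:word1|word2|...)\b, the same shape of pattern the UDF constructs.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    matches = re.findall(pattern, search, flags=re.IGNORECASE)
    return matches[-1] if matches else ""


print(last_one("alpha>Charlie>delta>bravo", ["Charlie", "Delta"]))  # prints delta
```

As in the UDF, the match is returned as it appears in the searched string, not as it was written in the word list.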
In VBA you can do this using the following simple logic.
Dim str As String
str = "a>b>b>d>c>a"
Dim Cet As Variant
Cet = Split(str, ">")
Dim i As Long
'Walk backwards so the first hit is the last c or d in the string
For i = UBound(Cet) To LBound(Cet) Step -1
    If LCase(Cet(i)) = "c" Or LCase(Cet(i)) = "d" Then
        MsgBox Cet(i)
        Exit For
    End If
Next i
Assuming your string is in cell A1, and there are no uses of the tilde (~) character in it, you can use the following in a worksheet:
=IF(IFERROR(FIND("~",SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c","")))),0)>IFERROR(FIND("~",SUBSTITUTE(A1,"d","~",LEN(A1)-LEN(SUBSTITUTE(A1,"d","")))),0),"c","d")
EDIT:
In response to a comment, here's an explanation of how this works. I've also neatened up the formula slightly having looked back at it again. The two formulae for c and d are identical, so the explanation will apply for both. So, working outwards
LEN(A1)-LEN(SUBSTITUTE(A1,"c",""))
Here we remove all instances of c from the string. By comparing the length of this calculated string and the original string, we calculate the number of times c appears in the original string.
SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c","")))
Now that we know the number of times c appears in our string, we replace the last occurrence of c with the tilde character (here we assume the tilde isn't otherwise used in the string).
FIND("~",SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c",""))))
We then find the position of the tilde in the string, which is equivalent to the position of the last c in the string.
IFERROR(FIND("~",SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c","")))),0)
Wrapping this in an IFERROR ensures that we don't have errors coming through the formula - setting the value to 0 if no c exists ensures that we still get a correct answer if our string contains c but not d (and vice versa).
We then apply the same calculation to d and compare the two to see which occurs later in our string. Note: this will give an incorrect answer if there is neither c nor d in the string.
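The comparison of the two last positions can be sketched in Python to check the expected results. Python's `str.rfind` stands in here for the SUBSTITUTE/FIND trick (it is not how the worksheet formula works internally), the function name is hypothetical, and the sketch returns an empty string for the neither-c-nor-d case instead of a wrong answer.

```python
def last_of(text: str, first: str = "c", second: str = "d") -> str:
    """Return whichever of `first` / `second` occurs last in `text`."""
    # rfind gives -1 when missing; adding 1 yields FIND-style positions with 0 = absent,
    # which mirrors the IFERROR(..., 0) wrapper in the formula.
    pos_first = text.rfind(first) + 1
    pos_second = text.rfind(second) + 1
    if pos_first == 0 and pos_second == 0:
        return ""                       # neither character is present
    return first if pos_first > pos_second else second


for s in ("a>b>b>d>c>a", "a>b>c>d", "b>b>b>d>d>a"):
    print(s, "=", last_of(s))   # prints c, d, d respectively
```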
I have a list of around 1500 items with dimensions, but the dimensions do not all have the same format. The dimensions I want to keep are listed as L x W x H. How can I sort the dimensions listed like this from the stuff I don't want (some are listed as only L x H, Diameter, or just gibberish, etc.) Thank you.
If by gibberish you mean text values that could include <space>x<space> then you have some real problems. However, if it can be reasonably assumed that the L x W x H format is what you want and that the only values containing 2 occurrences of <space>x<space> are valid ones, then a helper column would identify the valid entries.
In an unused column to the right put this formula into the second row.
=ISNUMBER(FIND(" x ", $A2, FIND(" x ", $A2) + 3))
Fill down as necessary.
Use Data ► Sort & Filter ► Filter to filter your helper column for FALSE. These entries can be deleted, and when you turn the filter off you will be left with the valid entries.
Elaborating on Jeeped's answer, if you are dealing with data from an external source, you might want to relax your rules to allow other valid input formats:
There must be exactly three numbers, all non-negative integers.
A decimal point is allowed, but no digits after the decimal point.
They can be separated by "x" or "X" or "*".
They can have extra spaces before, after or between the numbers, but not between the digits.
That would mean these values would all be OK:
17x12x13
100 * 50 * 2
100. X 200. X 300
Problems of this sort are ideally suited to regular expressions. In the VBA editor, add the RegExp library with Tools > References, then check "Microsoft VBScript Regular Expressions 5.5". Then try this VBA function:
Public Function IsNxNxN(s As String) As Boolean
With New RegExp
.Pattern = "^\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*$"
With .Execute(s)
IsNxNxN = (.Count = 1)
End With
End With
End Function
In jeeped's sample worksheet, you would replace the B2 formula with:
=IsNxNxN(A2)
If you are trying to clean up the data as well as filter it, you could use this:
Public Function CleanupNxNxN(s As String) As String
With New RegExp
.Pattern = "^\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*$"
With .Execute(s)
If .Count = 1 Then
With .Item(0)
CleanupNxNxN = .SubMatches(0) & " x " & _
.SubMatches(1) & " x " & _
.SubMatches(2)
End With
End If
End With
End With
End Function
and set the formula for C2 to:
=CleanupNxNxN(A2)
Any dimension values that are invalid will report False in column B and blank in Column C. Valid dimensions such as " 10. x 20X30 " would be reformatted as "10 x 20 x 30".
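The pattern itself can be exercised outside Excel with a short Python sketch. The function names are hypothetical mirrors of the two UDFs, and `re.ASCII` is an assumption made so that `\d` and `\s` stay close to what the VBScript engine matches.

```python
import re

# Same pattern as in the VBA functions: three integers, each optionally followed
# by a bare decimal point, separated by x, X or *, with optional whitespace.
DIMENSIONS = re.compile(
    r"^\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*$", re.ASCII
)


def is_nxnxn(s: str) -> bool:
    """True when s is a valid N x N x N dimension string."""
    return DIMENSIONS.match(s) is not None


def cleanup_nxnxn(s: str) -> str:
    """Normalise a valid dimension string to 'N x N x N'; return '' otherwise."""
    m = DIMENSIONS.match(s)
    return " x ".join(m.groups()) if m else ""


for value in ("17x12x13", "100 * 50 * 2", "100. X 200. X 300", "10 x 20"):
    print(repr(value), is_nxnxn(value), repr(cleanup_nxnxn(value)))
```

The first three sample values are accepted and normalised; "10 x 20" has only two numbers and is rejected.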
If you would like to allow extra "gibberish" before or after the dimensions, you could remove the "^" and "$" anchor characters from .Pattern, and get:
"approx. Size: 10*20*30 feet" would yield: True, "10 x 20 x 30"