Column A contains the list of all counters and column D (onward) contains some related information which I'm extracting from form XML. Now I want to return those variables in column B for which there are multiple values in column D and beyond
For example:
for counter1, nothing should be returned
for counter2, "D" should be
returned as there are 2 unique values(4 and 5) for it
for counter3, "B,C,D" should be returned as there are multiple values for those 3 variables.
Format is standard, variable-value pairs are separated by comma and there is equal sign between variable and value, but there can be as many number of pairs and as many number of key pair instances(one counter can have up to 100+ such cells in front of them)
Going by the assumption that you want to
Search for multiple assignments to a variable of the form X=n and return all variable names that get assigned at least 2 different values. Multiple assignments within one cell are separated by a defined delimiter. A range of multiple cells that do or do not contain one or more assignments may serve as input.
The below function FindDuplicateAssignmentsInCSVRange(inputRange, listDelimiter, assignmentOperator accepts a range of cells and optionally allows to define an alternate list separator (default: comma) and an alternate assignment operator (default: equality symbol).
It outputs as a comma-separated list (or whatever alternate separator has been passed in) the names of all variables that get assigned at least 2 different values. If no variable name fulfills that requirement, an empty string is returned instead.
Option Explicit
Private Type TKeyValuePair
key As String
val As String
End Type
Private Function FindDuplicateAssignmentsInCSVRange(inpRange As Range, _
Optional ByVal listDelimiter As String = ",", _
Optional ByVal assingmentOperator As String = "=") As String
' this function needs a reference to the Microsoft Scripting Runtime to work
Const ReturnIndicator As String = "Return this value's key when done"
Dim dic As Scripting.Dictionary
Set dic = New Scripting.Dictionary
' Go into each cell in the given range
Dim c As Range
For Each c In inpRange.Cells
' split cell contents and iterate over each particle
Dim particle As Variant
For Each particle In Split(c.Value, listDelimiter)
' split each fragment "A=1" into variable and value; kvp = key-value pair
Dim kvp As TKeyValuePair
kvp.key = Split(particle, assingmentOperator)(0)
kvp.val = Split(particle, assingmentOperator)(1)
If Not dic.Exists(kvp.key) Then
' add new keys/variables to dictionary to keep track
dic.Add kvp.key, kvp.val
Else
' check values of existing keys and mark key if values differ ( = there are 2 or more different values)
Dim AlreadyMarkedForReturn As Boolean
AlreadyMarkedForReturn = (dic(kvp.key) = ReturnIndicator)
If dic(kvp.key) <> kvp.val And Not AlreadyMarkedForReturn Then
dic(kvp.key) = ReturnIndicator
End If
End If
Next particle
Next c
' clean up dictionary to only contain the return entries
Dim k As Variant
For Each k In dic.Keys
If dic(k) <> ReturnIndicator Then dic.Remove k
Next k
' return remaining keys as comma(or whatever)-separated list
FindDuplicateAssignmentsInCSVRange = Join(dic.Keys, listDelimiter)
End Function
Example usage:
Public Sub SO_51607467()
Debug.Print "D2:D2 Variables: '" & FindDuplicateAssignmentsInCSVRange(ThisWorkbook.Sheets(1).Range("D2")) & "'"
Debug.Print "D3:E3 Variables: '" & FindDuplicateAssignmentsInCSVRange(ThisWorkbook.Sheets(1).Range("D3:E3")) & "'"
Debug.Print "D4:F4 Variables: '" & FindDuplicateAssignmentsInCSVRange(ThisWorkbook.Sheets(1).Range("D4:F4")) & "'"
End Sub
Output of running SO_51607467 from within an approximate (due to copy-typing) copy of the workbook you showed:
D2:D2 Variables: ''
D3:E3 Variables: 'D'
D4:F4 Variables: 'B,C'
Related
As mentioned in the title, I wonder if there is any way to use built-in functions in excel to see whether a cell contains a specific number and count the total numbers in the cell. The cell can contain a list of numbers seperated by comas, for instance, "1,4,7" or ranges "10-25" or a combination of both. See the print screen.
No, there is not, but you could write a VBA function to do that, something like:
Function NumberInValues(number As String, values As String) As Boolean
Dim n As Integer
n = CInt(number)
Dim parts() As String
parts = Split(values, ",")
For i = LBound(parts) To UBound(parts)
parts(i) = Replace(parts(i), " ", "")
Next
Dim p() As String
Dim first As Integer
Dim last As Integer
Dim tmp As Integer
For i = LBound(parts) To UBound(parts)
p = Split(parts(i), "-")
' If there is only one entry, check for equality:
If UBound(p) - LBound(p) = 0 Then
If n = CInt(p(LBound(p))) Then
NumberInValues = True
Exit Function
End If
Else
' Check against the range of values: assumes the entry is first-last, does not
' check for last > first.
first = CInt(p(LBound(p)))
last = CInt(p(UBound(p)))
If n >= first And n <= last Then
NumberInValues = True
Exit Function
End If
End If
Next
NumberInValues = False
End Function
and then your cell C2 would be
=NumberInValues(B2,A2)
Calculating how many numbers there are in the ranges would be more complicated as numbers and ranges could overlap.
The key part of implementing this is to create a List or Array of individual numbers that includes all the Numbers represented in the first column.
Once that is done, it is trivial to check for an included, or do a count.
This VBA routine returns a list of the numbers
Option Explicit
Function createNumberList(s)
Dim AL As Object
Dim v, w, x, y, I As Long
Set AL = CreateObject("System.Collections.ArrayList")
v = Split(s, ",")
For Each w In v
'If you need to avoid duplicate entries in the array
'uncomment the If Not lines below and remove the terminal double-quote
If IsNumeric(w) Then
'If Not AL.contains(w) Then _"
AL.Add CLng(w)
Else
x = Split(w, "-")
For I = x(0) To x(1)
'If Not AL.contains(I) Then _"
AL.Add I
Next I
End If
Next w
createNumberList = AL.toarray
End Function
IF your numeric ranges might be overlapping, you will need to create a Unique array. You can do that by changing the AL.Add function to first check if the number is contained in the list. In the code above, you can see instructions for that modification.
You can then use this UDF in your table:
C2: =OR($B2=createNumberList($A2))
D2: =COUNT(createNumberList($A2))
Here is a possible formula solution using filterxml as suggested in the comment:
=LET(split,FILTERXML("<s><t>+"&SUBSTITUTE(A2,",","</t><t>+")&"</t></s>","//s/t"),
leftn,LEFT(split,FIND("-",split&"-")-1),
rightn,IFERROR(RIGHT(split,LEN(split)-FIND("-",split)),leftn),
SUM(rightn-leftn+1))
The columns from F onwards show the steps for the string in A2. I had to put plus signs in because Excel converted a substring like "10-15" etc. into a date as usual.
Then to find if a number (in C2 say) is present:
=LET(split,FILTERXML("<s><t>+"&SUBSTITUTE(A2,",","</t><t>+")&"</t></s>","//s/t"),
leftn,LEFT(split,FIND("-",split&"-")-1),
rightn,IFERROR(RIGHT(split,LEN(split)-FIND("-",split)),leftn),
SUM((--leftn<=C2)*(--rightn>=C2))>0)
As noted by #Ron Rosenfeld, it's possible that there may be duplication within the list: the Count formula would be susceptible to double counting in this case, but the Check (to see if a number was in the list) would give the correct result. So the assumptions are:
(1) No duplication (I think it would be fairly straightforward to check for duplication, but less easy to correct it)
(2) No range in wrong order like 15-10 (although this could easily be fixed by putting ABS around the subtraction in the first formula).
Here is a little cheeky piece of code for a VBA solution:
Function pageCount(s As String)
s = Replace(s, ",", ",A")
s = Replace(s, "-", ":A")
s = "A" & s
' s now looks like a list of ranges e.g. "1,2-3" would give "A1,A2:A3"
pageCount = Union(Range(s), Range(s)).Count
End Function
because after all the ranges in the question behave exactly like Excel ranges don't they?
and for inclusion (of a single page)
Function includes(s As String, m As String) As Boolean
Dim isect As Range
s = Replace(s, ",", ",A")
s = Replace(s, "-", ":A")
s = "A" & s
Set isect = Application.Intersect(Range(s), Range("A" & m))
includes = Not (isect Is Nothing)
End Function
I need your support to get the result in column B, it is basically see if there is 5 digits between a comma and a dash sign ( , & - ) in A cells and delete the text from , to -, the rest of the text need to be same without any changes as shown in the picture. I am looking for a normal excel equation not a vba code. Thanks in advance!
Based on your screenshot and not what you wrote, you seem to want to retain substrings that are in the format of aaaa-nnnn where nnnnn represents four or more digits. If you have the FILTERXML and TEXTJOIN functions, you can use this formula:
=SUBSTITUTE(TEXTJOIN(",",TRUE,FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(A1,"-",",-"),",","</s><s>") & "</s></t>","//s[number(.)<-999] /preceding::*[1] | //s[number(.)<-999]")),",-","-")
Create an XML splitting on the commas and also the hyphen (but retain the hyphen)
Construct an xPath which selects both the node preceding a numeric node with a value of less than -999 and that numeric node itself.
The negative number comes from retaining the hyphen
Put the values back together using TEXTJOIN with a comma separator
Remove the comma that is now preceding the hyphen.
If your version of Excel does not have those functions, VBA or possibly Power Query would be a better solution.
If wind up preferring a VBA solution, I suggest looking for substrings that meet your apparent criteria of a hyphenated substring where the right half is a number > 999. If necessary, checking the left side for being all capital letters could be easily added.
Option Explicit
Function getStr(S As String) As String
Dim V, W
Dim sTemp As String
V = Split(S, ",")
For Each W In V
If Val(Split(W, "-")(1)) > 999 Then _
sTemp = sTemp & "," & W
Next W
getStr = Mid(sTemp, 2)
End Function
Regrettably Substitute Doesn't Allow Wild Characters
Using SEARCH and REPLACE the only thing I could come up with was the following formula:
=IF(ISNUMBER(SEARCH("????-, ",A1)),REPLACE(A1,SEARCH("????-, ",A1),7,""),IF(ISNUMBER(SEARCH(", ????-",RIGHT(A1,7))),REPLACE(A1,LEN(A1)-6,7,""),A1))
which is removing only the first occurrence of the strings you want removed.
As a consolation I'm offering a simple VBA solution, which by default removes all 5-character sub strings in a ", "-delimited string.
In VBA (CTRL-F11) insert a new module into the workbook where you need it. In the code sheet of the module (probably Module1) copy/paste the following code:
Option Explicit
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Purpose: From a delimited string, removes all sub strings containing
' a specified number of characters and returns the remainder
' of the string.
' Returns: A string, if there are any substrings with a different number
' of characters than the specified number of characters,
' or "", if not.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function FilterC(SourceValue As Variant, _
Optional NumberOfCharacters As Long = 5, _
Optional Delimiter As String = ", ") As String
Dim vntS As Variant ' Source Array
Dim vntT As Variant ' Target Array
Dim i As Long ' Source Array Elements Counter
Dim iTA As Long ' Target Array Elements Counter
Dim strC As String ' Current String
' Check if SourceValue is text.
If VarType(SourceValue) <> vbString Then Exit Function
' Check if SourceValue is "". For a cell in Excel this refers to an empty
' cell or a cell with a formula evaluating to "".
If SourceValue = "" Then Exit Function
' Initialize Target Array Elements Counter.
iTA = -1
' Write SourceValue to elements of Source Array (using 'Split').
vntS = Split(SourceValue, Delimiter)
' Loop through elements of Source Array.
For i = 0 To UBound(vntS)
' Write current element in Source Array to Current String.
strC = vntS(i)
' Check if the length of Current String is NOT equal
' to NumberOfCharacters.
If Len(strC) <> 5 Then GoSub TargetArray
Next
' If only 'NumberOfCharacters' character strings are found.
If iTA = -1 Then Exit Function
' Write elements of Target Array to FilterC (using "Join").
FilterC = Join(vntT, Delimiter)
Exit Function
' Write String to Target Array.
TargetArray:
' Increase Target Array Elements Counter.
iTA = iTA + 1
' Check if Target Array Elements Counter is greater than 0 i.e. if
' there already are any elements in Target Array.
If iTA > 0 Then
' All, except the first element.
ReDim Preserve vntT(iTA)
Else
' Only the first element.
ReDim vntT(0)
End If
' Write Current String to Target Array.
vntT(iTA) = strC
' Note: Target Array Elements Counter (iTA) was initalized with -1, so when
' the first time the code redirects to TargetArray (Subroutine),
' iTA will be 0 and only this time run through the Else clause
' and finally write Current String to Target Array.
Return
End Function
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
In Excel for a result from A1 use the following formula:
=FilterC(A1)
which is the short, default behavior of the fully qualified formula:
=FilterC(A1,5,", ")
I'm relatively new to Excel-Vb. I basically want to extract multiple values from a single cell and assign it to multiple variables. For eg.,
A single cell in an excel contains value in the format (0.1:0.2:10). I need to assign these values in a cell to three different variables. My code should look something like this,
Public DeviceInfo (Optional ByVal rng As Range = Nothing)
If (Len(rng.Cells(1)) > 0 Then
rng.Cells(1) = 'This is where I'm not sure how to delimit colon
'in my cell value and assign it to different
'variables
End If
End Sub
So ideally, the o/p should be the values in the cell 0.1, 0.2, 10 should assigned to Variable 1, variable2, variable3 respectively. Can someone help?
As per understanding of your problem, You can write code as below it works
you need to supply cell as range and position value like if (A1="0.1:0.2:10")
=DeviceInfo(A1,0)'output is 0.1
=DeviceInfo(A1,1)'output is 0.2
=DeviceInfo(A1,2)'output is 10
'''
Public Function DeviceInfo(rng As Range, val As Integer)
'This function split string "0.1:0.2:10" on the basis of char ":"
'required input rng as informat of "0.1:0.2:10" and val as position of string after split
Dim LArray() As String 'Dim for output array
LArray = Split(rng.Value, ":") 'split on basis of ":" char
If Application.WorksheetFunction.IsNumber(val) = True Then
DeviceInfo = LArray(val)
Else
DeviceInfo = "Not Number"
End If
End Function
I have two files one is a Project Register that holds key information on a project and the other is a Risk log.
There is a 1:m relationship between entries in the Register and the Risk log. What I need to do is combine all of a project risks into one cell inside the project register file.
The matching field in both files is the Project ID field
Is there a way I can do this using a vlookup variant or multiple nested vlookups?
Here's the user-defined function approach I mentioned (adapted from a different VLOOKUP-variant I already had made):
' Acts like VLOOKUP in a 1-to-many scenario by concatenating all values in matching rows
' instead of just returning the first match
Public Function VLOOKUP_MANY(lookup_value As String, lookup_range As Range, column_number As Integer, Optional delimiter As Variant) As Variant
Dim vArr As Variant
Dim i As Long
Dim found As Boolean: found = False
' Set default delimiter
If IsMissing(delimiter) Then delimiter = ", "
' Get values
vArr = lookup_range.Value2
' If column_number is outside of the specified range, return #REF
If column_number < LBound(vArr, 2) Or column_number > UBound(vArr, 2) Then
VLOOKUP_MANY = CVErr(xlErrRef)
Exit Function
End If
' Search for matches and build a concatenated list
VLOOKUP_MANY = ""
For i = 1 To UBound(vArr, 1)
If UCase(vArr(i, 1)) = UCase(lookup_value) Then
VLOOKUP_MANY = VLOOKUP_MANY & delimiter & vArr(i, column_number)
found = True ' Mark at least 1 result
End If
Next
If found Then
VLOOKUP_MANY = Right(VLOOKUP_MANY, Len(VLOOKUP_MANY) - Len(delimiter)) ' Remove first delimiter
Else
VLOOKUP_MANY = CVErr(xlErrNA) ' If no matches found, return #N/A
End If
End Function
This will search the first column in the specified range for the specified value (same as VLOOKUP), but returns the values in the specified column number concatenated. It will return #N/A when no matches are found, and #REF if an invalid value is specified for the column number (e.g. you choose column 5 but only had a 4-column table).
In case you don't know about user-defined functions - you can just copy this VBA code into the VBE for a module in your workbook. Hit Alt+F11, go to Insert > Module at the top of the screen, then paste this code into the blank file that opens up. When you go to save, you'll have to save your workbook as Macro-Enabled (.xlsm) to keep the code working - Excel will remind you about this in the save screen.
Be forewarned: it's going to be slower than VLOOKUP as a result of having to look through the entire lookup range instead of being able to stop at the first match it finds.
If you're open to using an array formula instead, there are ways to speed up this sort of functionality for very large datasets...
Different version that leverages some of the benefits of array formulas to store lookup values and speedup subsequent calls:
' Acts like VLOOKUP in a 1-to-many scenario by concatenating all values in matching rows
' instead of just returning the first match
' Utilizes a dictionary to speedup multiple matches (great for array formulas)
Public Function VLOOKUP_MANY_ARRAY(lookup_values As Range, lookup_range As Range, column_number As Integer, Optional delimiter As Variant) As Variant
Dim vHaystack As Variant, vNeedles As Variant
Dim i As Long
Dim found As Boolean: found = False
Dim dict As Object: Set dict = CreateObject("Scripting.Dictionary")
' Set default delimiter
If IsMissing(delimiter) Then delimiter = ", "
' Get values
vHaystack = lookup_range
vNeedles = lookup_values
' If column_number is outside of the specified range, return #REF
If column_number < LBound(vHaystack, 2) Or column_number > UBound(vHaystack, 2) Then
VLOOKUP_MANY_ARRAY = CVErr(xlErrRef)
Exit Function
End If
' Add values to a lookup dictionary
For i = 1 To UBound(vHaystack, 1)
If dict.Exists(UCase(vHaystack(i, 1))) Then
dict.Item(UCase(vHaystack(i, 1))) = dict.Item(UCase(vHaystack(i, 1))) & delimiter & vHaystack(i, column_number)
Else
dict.Add UCase(vHaystack(i, 1)), vHaystack(i, column_number)
End If
Next
Dim outArr As Variant
If IsArray(vNeedles) Then ' Check number of lookup cells
' Build output array
ReDim outArr(1 To UBound(vNeedles, 1), 1 To 1) As Variant
For i = 1 To UBound(vNeedles, 1)
If dict.Exists(UCase(vNeedles(i, 1))) Then
outArr(i, 1) = dict.Item(UCase(vNeedles(i, 1)))
Else
outArr(i, 1) = CVErr(xlErrNA)
End If
Next
Else
' Single output value
If dict.Exists(UCase(vNeedles)) Then
outArr = dict.Item(UCase(vNeedles))
Else
outArr = CVErr(xlErrNA)
End If
End If
VLOOKUP_MANY_ARRAY = outArr
End Function
This creates a Dictionary, which is a special structure that's really good for looking up values. There's a little extra overhead involved in building it, but once you have the structure, you can do lookups into it very quickly. This is especially nice with array formulas, which is basically when the exact same formula gets put into a whole collection of cells, then the function executes once and returns values for every cell (instead of just executing once, separately, for a bunch of cells). Enter it like an array formula with CTRL+SHIFT+ENTER, and make the first argument refer to all your lookup values instead of just one.
It will work without being used as an array formula, but it will be somewhat slower than the first function in that situation. However, if you use it in an array formula, you'll see huge speedups.
RE-EDIT:
You might need to write a user defined function or write a macro (code on same link)
I have an excel Spreadsheet of values. I am trying to build a string of values which will look at all the records in the sheet and determine which ones are the same (based on a sequence)..
As you can see by the picture, there are three columns (E, F, G) which contain the source data. (source ID, target ID and Connection ID).. essentially there can only be one combination of source to target relationships, so I will need to merge any duplicate connections.
so far I have managed to find when they are duplicates by:
concatenating the source and target (Col H)
looking for duplicates (and ordering them) using the formula
=IF(COUNTIF(H:H,H2)>1,COUNTIF(H$2:H2,H2),1)
and Now I am trying to build a string which will be used to merge the records.
Essentially I am trying to build a function which looks for all exact strings in Col H, and then looks at the sequence(I) and builds a string like so:
34~62~65 (which tells me that connection 34 must merge with 62 and then 65)
Problem is that I have not managed to do this.
current formula in Col J is:
=IF(H2=H3,IF(I3=I2+1,G3&"~"&G2,""))
but as you can see its only pairwise, not actually looking for the duplicates in sequence (i.e. 1 then 2 then 3 etc)
A while ago I wrote a quite an extensive UDF for a friend of mine to deal with this problem. It is supposed to look exactly like a VLookup, except for an additional parameter UniqueOnly and a Separator.
What it does is it looks up a value based on a different cell just like VLookup, but unlike Vlookup it returns all possible values as a result, not just one.
It is used like this:
=LookupConcatenate(LookupValue,LookupRange,LookupColumn, [Optional UniqueOnly = 0], [Optional Separator = ", "])
And the code is:
Public Function LookupConcatenate(LookupValue As Range, LookupRange As Range, Column As Integer, Optional UniqueOnly As Boolean = False, Optional Separator As String = ", ") As String
' by Marek Stejskal
Dim rngMatch As Range
Dim rngLookup As Range
Dim varMatch As Variant
Dim varIndex As Variant
Dim intFoundAll As Integer
Dim strFoundAll() As String
Dim intFoundUnique As Integer
Dim strFoundUnique() As String
Dim blnFound As Boolean
Dim strResult As String
Dim i As Integer
On Error GoTo ErrHandler:
Set rngLookup = LookupRange
Set rngMatch = rngLookup.Columns(1)
Do While 1 = 1
' Match function
varMatch = Application.Match(LookupValue, rngMatch, 0)
' Exit checking if MATCH returned no value
If IsError(varMatch) Then Exit Do
' Index function
varIndex = Application.Index(rngLookup, varMatch, Column)
intFoundAll = intFoundAll + 1
' Adding space to ALL array
ReDim Preserve strFoundAll(1 To intFoundAll)
' Checking if the new result is in ALL array
blnFound = False
For i = 1 To UBound(strFoundAll)
If strFoundAll(i) = CStr(varIndex) Then
blnFound = True
Exit For
End If
Next
' If new result is unique add it to UNIQUE array
If blnFound = False Then
intFoundUnique = intFoundUnique + 1
ReDim Preserve strFoundUnique(1 To intFoundUnique)
strFoundUnique(intFoundUnique) = CStr(varIndex)
End If
' Add the new result to ALL array
strFoundAll(intFoundAll) = CStr(varIndex)
' Shortening ranges
Set rngLookup = rngLookup.Resize(rngLookup.Rows.Count - varMatch).Offset(varMatch)
Set rngMatch = rngLookup.Columns(1)
Loop
' Creating result string
If UniqueOnly = True Then
If intFoundUnique = 0 Then
strResult = ""
Else
For i = 1 To UBound(strFoundUnique)
strResult = strResult & IIf(strResult = "", "", Separator) & strFoundUnique(i)
Next i
End If
Else
If intFoundAll = 0 Then
strResult = ""
Else
For i = 1 To UBound(strFoundAll)
strResult = strResult & IIf(strResult = "", "", Separator) & strFoundAll(i)
Next i
End If
End If
LookupConcatenate = strResult
Exit Function
ErrHandler:
LookupConcatenate = Err.Description
End Function
To make this work for you, you will first need to switch the order of Connection and ID and then you can put on row 2 the formula like this:
=LookupConcatenate(G2, G2:J100, 2, 0, "~")
So if you want to do this without VBA, the only way is to build the string as you go down each row. What I mean is the final data would look like:
This does not meet the full requirements of all of column "F" containing the full concatenated string. But the last unique row of ID would contain the final string.
The formula to put in column F (assuming your data is aligned as in the picture here)
=IF(ISERROR(MATCH($D2,INDIRECT("D1:D"&ROW()-1),0)),""&$C2,IFERROR(INDEX(F:F,MATCH($D2,INDIRECT("D1:D"&ROW()-1),1)),INDEX(F:F,MATCH($D2,INDIRECT("D1:D"&ROW()-1),0)))&"~"&$C2)
This works even if the rows are not sorted, (and it actually does not use the sequence column at all). Here is a picture with additional rows added as test data:
You actually then could create the column you are searching for, by adding a column containing:
=IF(COUNTIF($F:$F,SUBSTITUTE($F2,"~","*")&"*")=1,$F2,FALSE)
That would give the following final result: