So imagine there is the following string in a cell in excel:
A1 = "Company 1 Company 2 Company 1 Company 2 Company 3"
and the desired result for now is removing the duplicates:
A1 = "Company 1 Company 2 Company 3" (I imagine that this one doesn't require a macro)
the ideal one would be to put the distinct values in different cells in a vertical way:
A1 = "Company 1"
A2 = "Company 2"
A3 = "Company 3"
(which would require definitely programming but since I never used vba i'm not experienced enough I think to elaborate such code)
Is it feasible?
EDIT: the delimiter can be changed from a space " " to other, for example, a semicolon ";" to prevent errors and to be easier to solve this one.
Asumption is you have a delimiter between the strings to tell apart you could use the following code
Option Explicit
Sub RemoveDuplicates()
Const SEPARATOR = ","
Dim vDat As Variant
vDat = Split(Range("A1"), SEPARATOR)
' remove trailing blanks if necessary
Dim i As Long
For i = LBound(vDat) To UBound(vDat)
vDat(i) = Trim(vDat(i))
Next i
Dim dic As Object
Set dic = CreateObject("Scripting.Dictionary")
Dim vItem As Variant
For Each vItem In vDat
If Not dic.Exists(vItem) Then
dic.Add vItem, vItem
End If
Next
vDat = dic.Keys
' Write data to column B
Range("B1").Resize(UBound(vDat) + 1) = WorksheetFunction.Transpose(vDat)
'Debug.Print Join(vDat, SEPARATOR)
End Sub
Tested with the following data
A1 = Company 1, Company 2, Company 1, Company 2 , Company 3
or
A1 = IBM, Apple, Microsoft, Apple , IBM
With an unambiguous string, and by that I mean:
delimiter not included in the substrings, OR
each entry surrounded by double quotes
you can use Power Query in Excel 2010, 2013 or Data Get & Transform in Excel 2016, to do all of that.
Split the cell on the delimiter
Define the quote mark as the text qualifier if necessary
Rows - remove duplicates
So with data like:
Company 1;Company 2;Company 1;Company 2;Company 3
or (space delimiter)
"Company 1" "Company 2" "Company 1" "Company 2" "Company 3"
you can easily accomplish what you require without using VBA.
And if, as in your examples, there are extraneous spaces at the beginning or end of the data, Power Query has a Text.Trim function that will be useful.
Alternate solution using UDF (commented for clarity):
Public Function UNIQUELIST(ByVal arg_vOriginalList As String, ByVal arg_sDelimiter As String, ByVal arg_lReturnIndex As Long) As Variant
Dim oDict As Object
Dim vElement As Variant
Dim i As Long
'Use a dictionary to extract unique elements
Set oDict = CreateObject("Scripting.Dictionary")
i = 0 'This is a counter to keep track until we reach the appropriate return index
'Loop through each element
For Each vElement In Split(arg_vOriginalList, arg_sDelimiter)
'Check the trimmed, lowercase element against the keys of the dictionary
If Not oDict.Exists(LCase(Trim(vElement))) Then
'Unique element found
i = i + 1
If i = arg_lReturnIndex Then
'Found appropriate unique element, output and exit function
UNIQUELIST = Trim(vElement)
Exit Function
End If
'Not correct return index, add element to dictionary
'Lowercase the key (so uniques aren't case sensitive) and trim both the key and the value
oDict.Add LCase(Trim(vElement)), Trim(vElement)
End If
Next vElement
'arg_lReturnIndex was less than 1 or greater than the number of unique values, return blank
UNIQUELIST = vbNullString
End Function
Then in a cell where you want the output to start (for example, B1), put this formula and copy down (adjust the "," to be the correct delimiter):
=UNIQUELIST($A$1,",",ROW(A1))
Approach using same delimiters as in OP
I assume the same space delimiters as in your original post: As you want to get your company strings in groups of two, I slightly modified the good solution of #Storax by concatenating the Split result in steps of 2 first and demonstrate a shorter way to write results back to sheet (cf. section [5]).
Example Code
Option Explicit ' declaration head of your code module
Sub SplitCompanies()
' [0] declare variables and set objects
Dim v, vItem
Dim i As Integer, n As Integer
Dim s, str As String
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Tabelle5") ' << change to your sheet name
Dim dict As Object ' late binding of dictionary
Set dict = CreateObject("Scripting.Dictionary")
' [1] get cell value and split it (space delimited as in Original Post)
str = ws.Range("A1") ' cell value, e.g. "Company 1 Company 2 Company 1 Company 2 Company 3"
s = Split(str, " ") ' split cell value (space delimiter)
' [2] count all companies and redimension helper array
n = Int((UBound(s) + 1) / 2) - 1 ' items counter equals 1/2 of split items
ReDim v(0 To n) ' redim zero-based 1-dim helper array
' [3] concatenate partial strings in helper array
For i = 0 To n
v(i) = s(i * 2) & " " & s(i * 2 + 1)
Next i
' [4] build dictionary with unique items
For Each vItem In v
If Not dict.Exists(vItem) Then
dict.Add vItem, vItem
End If
Next
' [5] Write data to column B
ws.Range("B1:B" & dict.Count) = Application.Transpose(dict.Keys)
' [6] clear memory
Set dict = Nothing: Set ws = Nothing
End Sub
Related
I have an excel Spreadsheet of values. I am trying to build a string of values which will look at all the records in the sheet and determine which ones are the same (based on a sequence)..
As you can see by the picture, there are three columns (E, F, G) which contain the source data. (source ID, target ID and Connection ID).. essentially there can only be one combination of source to target relationships, so I will need to merge any duplicate connections.
so far I have managed to find when they are duplicates by:
concatenating the source and target (Col H)
looking for duplicates (and ordering them) using the formula
=IF(COUNTIF(H:H,H2)>1,COUNTIF(H$2:H2,H2),1)
and Now I am trying to build a string which will be used to merge the records.
Essentially I am trying to build a function which looks for all exact strings in Col H, and then looks at the sequence(I) and builds a string like so:
34~62~65 (which tells me that connection 34 must merge with 62 and then 65)
Problem is that I have not managed to do this.
current formula in Col J is:
=IF(H2=H3,IF(I3=I2+1,G3&"~"&G2,""))
but as you can see its only pairwise, not actually looking for the duplicates in sequence (i.e. 1 then 2 then 3 etc)
A while ago I wrote a quite an extensive UDF for a friend of mine to deal with this problem. It is supposed to look exactly like a VLookup, except for an additional parameter UniqueOnly and a Separator.
What it does is it looks up a value based on a different cell just like VLookup, but unlike Vlookup it returns all possible values as a result, not just one.
It is used like this:
=LookupConcatenate(LookupValue,LookupRange,LookupColumn, [Optional UniqueOnly = 0], [Optional Separator = ", "])
And the code is:
Public Function LookupConcatenate(LookupValue As Range, LookupRange As Range, Column As Integer, Optional UniqueOnly As Boolean = False, Optional Separator As String = ", ") As String
' by Marek Stejskal
Dim rngMatch As Range
Dim rngLookup As Range
Dim varMatch As Variant
Dim varIndex As Variant
Dim intFoundAll As Integer
Dim strFoundAll() As String
Dim intFoundUnique As Integer
Dim strFoundUnique() As String
Dim blnFound As Boolean
Dim strResult As String
Dim i As Integer
On Error GoTo ErrHandler:
Set rngLookup = LookupRange
Set rngMatch = rngLookup.Columns(1)
Do While 1 = 1
' Match function
varMatch = Application.Match(LookupValue, rngMatch, 0)
' Exit checking if MATCH returned no value
If IsError(varMatch) Then Exit Do
' Index function
varIndex = Application.Index(rngLookup, varMatch, Column)
intFoundAll = intFoundAll + 1
' Adding space to ALL array
ReDim Preserve strFoundAll(1 To intFoundAll)
' Checking if the new result is in ALL array
blnFound = False
For i = 1 To UBound(strFoundAll)
If strFoundAll(i) = CStr(varIndex) Then
blnFound = True
Exit For
End If
Next
' If new result is unique add it to UNIQUE array
If blnFound = False Then
intFoundUnique = intFoundUnique + 1
ReDim Preserve strFoundUnique(1 To intFoundUnique)
strFoundUnique(intFoundUnique) = CStr(varIndex)
End If
' Add the new result to ALL array
strFoundAll(intFoundAll) = CStr(varIndex)
' Shortening ranges
Set rngLookup = rngLookup.Resize(rngLookup.Rows.Count - varMatch).Offset(varMatch)
Set rngMatch = rngLookup.Columns(1)
Loop
' Creating result string
If UniqueOnly = True Then
If intFoundUnique = 0 Then
strResult = ""
Else
For i = 1 To UBound(strFoundUnique)
strResult = strResult & IIf(strResult = "", "", Separator) & strFoundUnique(i)
Next i
End If
Else
If intFoundAll = 0 Then
strResult = ""
Else
For i = 1 To UBound(strFoundAll)
strResult = strResult & IIf(strResult = "", "", Separator) & strFoundAll(i)
Next i
End If
End If
LookupConcatenate = strResult
Exit Function
ErrHandler:
LookupConcatenate = Err.Description
End Function
To make this work for you, you will first need to switch the order of Connection and ID and then you can put on row 2 the formula like this:
=LookupConcatenate(G2, G2:J100, 2, 0, "~")
So if you want to do this without VBA, the only way is to build the string as you go down each row. What I mean is the final data would look like:
This does not meet the full requirements of all of column "F" containing the full concatenated string. But the last unique row of ID would contain the final string.
The formula to put in column F (assuming your data is aligned as in the picture here)
=IF(ISERROR(MATCH($D2,INDIRECT("D1:D"&ROW()-1),0)),""&$C2,IFERROR(INDEX(F:F,MATCH($D2,INDIRECT("D1:D"&ROW()-1),1)),INDEX(F:F,MATCH($D2,INDIRECT("D1:D"&ROW()-1),0)))&"~"&$C2)
This works even if the rows are not sorted, (and it actually does not use the sequence column at all). Here is a picture with additional rows added as test data:
You actually then could create the column you are searching for, by adding a column containing:
=IF(COUNTIF($F:$F,SUBSTITUTE($F2,"~","*")&"*")=1,$F2,FALSE)
That would give the following final result:
I have an excel column that looks like this:
Person 1, Person 2
Person 3
Person 4
Person 5
Person 1
Person 1, Person 4
So in each cell, there is either a single person or a list of people. I want to count these up to be able to generate a count of each person like this from the structure above
Person 1: 3
Person 2: 1
Person 3: 1
Person 4: 2
Person 5: 1
Is there any good way of doing this in VBA?
In order to use this you'll need to add a reference to Microsoft Scripting Runtime (for the Dictionary object)
Sub countU()
Dim c As Range
Dim s As String, v As Variant, i As Integer
Dim arr() As String
Dim dic As Dictionary
Set dic = New Dictionary
'call your actual range here
For Each c In ActiveSheet.Range("A1:A5")
s = c.Value
s = Replace(s, "Person", "")
s = Replace(s, " ", "")
arr = Split(s, ",")
For Each v In arr
If dic.Exists(v) Then
dic.Item(v) = dic.Item(v) + 1
Else
dic.Add v, 1
End If
Next v
Next c
For Each strKey In dic.Keys()
'output to your desired location here
Debug.Print "Person " & strKey & ": " & dic.Item(strKey)
Next strKey
End Sub
What might be an improvement on this is making it a function accepting a range (your range to iterate over (what I am calling A1:A5 in my sample) then set the out put of the function to a range populated with your discovered values.
This will work if all your cells contain only one person, and you can achieve this by going to Data > Text to Columns, selecting Delimited, and checking comma. In the case where people aren't separated by a comma, you would need a way to separate them.
Use a two dimensional array, storing the person in the first dimension and the count in the second dimension of each element. Loop through the cells that contains your list, and using Mid(), check to see if that cell has a substring equal to any of the strings in the first dimension in your array. If it does, increase the count of that element by 1.
In VBA look into the split() function.
Loop through your rows, and do
array() = split("cell string", ",")
split returns an array of strings you can then loop through and count.
Hope this helps you get on the right track.
I have a simple problem that I'm hoping to resolve without using VBA but if that's the only way it can be solved, so be it.
I have a file with multiple rows (all one column). Each row has data that looks something like this:
1 7.82E-13 >gi|297848936|ref|XP_00| 4-hydroxide gi|297338191|gb|23343|randomrandom
2 5.09E-09 >gi|168010496|ref|xp_00| 2-pyruvate
etc...
What I want is some way to extract the string of numbers that begin with "gi|" and end with a "|". For some rows this might mean as many as 5 gi numbers, for others it'll just be one.
What I would hope the output would look like would be something like:
297848936,297338191
168010496
etc...
Here is a very flexible VBA answer using the regex object. What the function does is extract every single sub-group match it finds (stuff inside the parenthesis), separated by whatever string you want (default is ", "). You can find info on regular expressions here: http://www.regular-expressions.info/
You would call it like this, assuming that first string is in A1:
=RegexExtract(A1,"gi[|](\d+)[|]")
Since this looks for all occurance of "gi|" followed by a series of numbers and then another "|", for the first line in your question, this would give you this result:
297848936, 297338191
Just run this down the column and you're all done!
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional separator As String = ", ") As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long, j As Long
Dim result As String
RE.pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.count - 1
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & (separator & allMatches.Item(i).submatches.Item(j))
Next
Next
If Len(result) <> 0 Then
result = Right$(result, Len(result) - Len(separator))
End If
RegexExtract = result
End Function
Here it is (assuming data is in column A)
=VALUE(LEFT(RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2),
FIND("|",RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2)) -1 ))
Not the nicest formula, but it will work to extract the number.
I just noticed since you have two values per row with output separated by commas. You will need to check if there is a second match, third match etc. to make it work for multiple numbers per cell.
In reference to your exact sample (assuming 2 values maximum per cell) the following code will work:
=IF(ISNUMBER(FIND("gi|",$A1,FIND("gi|", $A1)+1)),CONCATENATE(LEFT(RIGHT($A1,LEN($A1)
- FIND("gi|",$A1) - 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ),
", ",LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1)
- 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1) - 2))
-1 )),LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2),
FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ))
How's that for ugly? A VBA solution may be better for you, but I'll leave this here for you.
To go up to 5 numbers, well, study the pattern and recurse manually in the formula. IT will get long!
I'd probably split the data first on the | delimiter using the convert text to columns wizard.
In Excel 2007 that is on the Data tab, Data Tools group and then choose Text to Columns. Specify Other: and | as the delimiter.
From the sample data you posted it looks like after you do this the numbers will all be in the same columns so you could then just delete the columns you don't want.
As the other guys presented the solution without VBA... I'll present the one that does use. Now, is your call to use it or no.
Just saw that #Issun presented the solution with regex, very nice! Either way, will present a 'modest' solution for the question, using only 'plain' VBA.
Option Explicit
Option Base 0
Sub findGi()
Dim oCell As Excel.Range
Set oCell = Sheets(1).Range("A1")
'Loops through every row until empty cell
While Not oCell.Value = ""
oCell.Offset(0, 1).Value2 = GetGi(oCell.Value)
Set oCell = oCell.Offset(1, 0)
Wend
End Sub
Private Function GetGi(ByVal sValue As String) As String
Dim sResult As String
Dim vArray As Variant
Dim vItem As Variant
Dim iCount As Integer
vArray = Split(sValue, "|")
iCount = 0
'Loops through the array...
For Each vItem In vArray
'Searches for the 'Gi' factor...
If vItem Like "*gi" And UBound(vArray) > iCount + 1 Then
'Concatenates the results...
sResult = sResult & vArray(iCount + 1) & ","
End If
iCount = iCount + 1
Next vItem
'And removes trail comma
If Len(sResult) > 0 Then
sResult = Left(sResult, Len(sResult) - 1)
End If
GetGi = sResult
End Function
open your excel in Google Sheets and use the regular expression with REGEXEXTRACT
Sample Usage
=REGEXEXTRACT("My favorite number is 241, but my friend's is 17", "\d+")
Tip: REGEXEXTRACT will return 241 in this example because it returns the first matching case.
In your case
=REGEXEXTRACT(A1,"gi[|](\d+)[|]")
I am trying to use a vlookup or similar function to search a worksheet, match account numbers, then return a specified value. My problem is there are duplicate account numbers and I would like the result to concatenate the results into one string.
Acct No CropType
------- ---------
0001 Grain
0001 OilSeed
0001 Hay
0002 Grain
Is in the first worksheet, on the 2nd worksheet I have the Acct No with other information and I need to get all the matching results into one column on the 2nd worksheet ie. "Grain Oilseed Hay"
Here is a function that will do it for you. It's a little different from Vlookup in that you will only give it the search column, not the whole range, then as the third parameter you will tell it how many columns to go left (negative numbers) or right (positive) in order to get your return value.
I also added the option to use a seperator, in your case you will use " ". Here is the function call for you, assuming the first row with Acct No. is A and the results is row B:
=vlookupall("0001", A:A, 1, " ")
Here is the function:
Function VLookupAll(ByVal lookup_value As String, _
ByVal lookup_column As range, _
ByVal return_value_column As Long, _
Optional seperator As String = ", ") As String
Dim i As Long
Dim result As String
For i = 1 To lookup_column.Rows.count
If Len(lookup_column(i, 1).text) <> 0 Then
If lookup_column(i, 1).text = lookup_value Then
result = result & (lookup_column(i).offset(0, return_value_column).text & seperator)
End If
End If
Next
If Len(result) <> 0 Then
result = Left(result, Len(result) - Len(seperator))
End If
VLookupAll = result
End Function
Notes:
I made ", " the default seperator for results if you don't enter one.
If there is one or more hits, I added some checking at the end to
make sure the string doesn't end with an extra seperator.
I've used A:A as the range since I don't know your range, but
obviously it's faster if you enter the actual range.
One way to do this would be to use an array formula to populate all of the matches into a hidden column and then concatenate those values into your string for display:
=IFERROR(INDEX(cropTypeValues,SMALL(IF(accLookup=accNumValues,ROW(accNumValues)-MIN(ROW(accNumValues))+1,""),ROW(A1))),"")
cropTypeValues: Named range holding the list of your crop types.
accLookup: Named range holding the account number to lookup.
accNumValues: Named range holding the list of your account
numbers.
Enter as an array formula (Ctrl+Shift+Enter) and then copy down as far as necessary.
Let me know if you need any part of the formula explaining.
I've just had a similar problem and I have looked up similar solutions for a long time, nothing really convinced me though. Either you had to write a macro, or some special function, while yet, for my needs the easiest solution is to use a pivot table in e.g. Excel.
If you create a new pivot table from your data and first add "Acct No" as row label and then add "CropType" as RowLabel you will have a very nice grouping that lists for each account all the crop types. It won't do that in a single cell though.
Here is my code which even better than an excel vlookup because you can choose to criterie colum, and for sure a separator (Carriege return too)...
Function Lookup_concat(source As String, tableau As Range, separator As String, colSRC As Integer, colDST As Integer) As String
Dim i, y As Integer
Dim result As String
If separator = "CRLF" Then
separator = Chr(10)
End If
y = tableau.Rows.Count
result = ""
For i = 1 To y
If (tableau.Cells(i, colSRC) = source) Then
If result = "" Then
result = tableau.Cells(i, colDST)
Else
result = result & separator & tableau.Cells(i, colDST)
End If
End If
Next
Lookup_concat = result
End Function
And a gift, you can make also a lookup on multiple element of the same cell (based on the same separator). Really usefull
Function Concat_Lookup(source As String, tableau As Range, separator As String, colSRC As Integer, colDST As Integer) As String
Dim i, y As Integer
Dim result As String
Dim Splitted As Variant
If separator = "CRLF" Then
separator = Chr(10)
End If
Splitted = split(source, separator)
y = tableau.Rows.Count
result = ""
For i = 1 To y
For Each word In Splitted
If (tableau.Cells(i, colSRC) = word) Then
If result = "" Then
result = tableau.Cells(i, colDST)
Else
Dim Splitted1 As Variant
Splitted1 = split(result, separator)
If IsInArray(tableau.Cells(i, colDST), Splitted1) = False Then
result = result & separator & tableau.Cells(i, colDST)
End If
End If
End If
Next
Next
Concat_Lookup = result
End Function
Previous sub needs this function
Function IsInArray(stringToBeFound As String, arr As Variant) As Boolean
IsInArray = (UBound(Filter(arr, stringToBeFound)) > -1)
End Function
Function VLookupAll(vValue, rngAll As Range, iCol As Integer, Optional sSep As String = ", ")
Dim rCell As Range
Dim rng As Range
On Error GoTo ErrHandler
Set rng = Intersect(rngAll, rngAll.Columns(1))
For Each rCell In rng
If rCell.Value = vValue Then
VLookupAll = VLookupAll & sSep & rCell.Offset(0, iCol - 1).Value
End If
Next rCell
If VLookupAll = "" Then
VLookupAll = CVErr(xlErrNA)
Else
VLookupAll = Right(VLookupAll, Len(VLookupAll) - Len(sSep))
End If
ErrHandler:
If Err.Number <> 0 Then VLookupAll = CVErr(xlErrValue)
End Function
Use like this:
=VLookupAll(K1, A1:C25, 3)
to look up all occurrences of the value of K1 in the range A1:A25 and to return the corresponding values from column C, separated by commas.
If you want to sum values, you can use SUMIF, for example
=SUMIF(A1:A25, K1, C1:C25)
to sum the values in C1:C25 where the corresponding values in column A equal the value of K1.
ALL D BEST.
Need help in creating an Excel Macro.I have an Excel sheet.The Excel sheet is not consistent.
I am planning to make it uniform and structured.
Eg.
A B C D
1 test tester tester
2 hai test
3 Bye test tested
4 GN test tested Fine
A B C D
1 test testertester
2 hai test
3 Bye testtested
4 GN testtestedFine
Basically I have to find the last cell where element is placed so based on that I can write my CONCATENATE funciton.
In this case it would be Column D and hence my concatenate function would have been
=CONCATENATE(B1,C1,D1)
Again I would like the result to be in B1 but not a problem if I have to hide.
Can anyone help me in doing this?
You could use the following VBA function which joins (concatenates) the values from an arbitrary range of cells, with an optional delimiter.
Public Function Join(source As Range, Optional delimiter As String)
Dim text As String
Dim cell As Range: For Each cell In source.Cells
If cell.Value = "" Then GoTo nextCell
text = text & cell.Value & delimiter
nextCell:
Next cell
If text <> "" And delimiter <> "" Then
text = Mid(text, 1, Len(text) - Len(delimiter))
End If
Join = text
End Function
For an example of how to use the function, enter =JOIN(A1:D1) into a cell anywhere on the spreadsheet.
=B1&C1&D1
or
Adam's function that I have optimized.
Function Join(source As Range, Optional delimiter As String) As String
'
' Join Macro
' Joins (concatenates) the values from an arbitrary range of cells, with an optional delimiter.
'
'optimized for strings
' check len is faster than checking for ""
' string Mid$ is faster than variant Mid
' nested ifs allows for short-circuit
' + is faster than &
Dim sResult As String
Dim oCell As Range
For Each oCell In source.Cells
If Len(oCell.Value) > 0 Then
sResult = sResult + CStr(oCell.Value) + delimiter
End If
Next
If Len(sResult) > 0 Then
If Len(delimiter) > 0 Then
sResult = Mid$(sResult, 1, Len(sResult) - Len(delimiter))
End If
End If
Join = sResult
End Function