Get value between multiple parenthesis with excel/airtable formula - excel

I'm trying to get all the content between multiple parenthesis and comma delimiting them. So for example
A1 contains
thisfile.jpg (/path/to/file.jpg), thisfile2.jpg (/path/to/file2.jpg)
and B1 should look like
/path/to/file.jpg, /path/to/file2.jpg
If it's just one entry I can get what I need with this:
MID(A1,FIND("(",A1)+1,FIND(")",A1)-FIND("(",A1)-1)
But that only returns the first one, I need to be for each parenthesis. The amount of parenthesis in each row will vary.

I am sure there are better solutions out there with formulas only. Yet, I cannot help you there. But the following UDF is surely also a feasible solution. Just copy this code into an empty module:
Option Explicit
Public Function GetPaths(strTMP As String)
Dim i As Long
Dim varArray As Variant
varArray = Split(strTMP, "(")
For i = LBound(varArray) To UBound(varArray)
If InStr(1, varArray(i), ")") > 0 Then
GetPaths = GetPaths & ", " & Mid(varArray(i), 1, InStr(1, varArray(i), ")") - 1)
End If
Next i
GetPaths = Mid(GetPaths, 3)
End Function
Afterwards, you can use this formula in column B as follows: =GetPaths(A1).

Related

Concatenate values of more cells in a single variable in vba

I have an excel file with four columns: name, surname, address, area.
There are a lot of rows.
Is there a way to concatenate all the values of every single row in a variable, using vba?
I need a variable that should contain something like this:
(name1, surname1, address1, area1); (name2, surname2, address2, area2); (name3, surname3, address3, area3)...
If you have the following data in your worksheet
Then the following code will read the data into an array …
Option Explicit
Public Sub Example()
Dim RangeData() As Variant ' declare an array
RangeData = Range("A1:D5").Value2 ' read data into array
End Sub
… with the following structure:
Alternatively you can do something like
Public Sub Example()
Dim DataRange As Range
Set DataRange = Range("A2:D5")
Dim RetVal As String
Dim Row As Range
For Each Row In DataRange.Rows
RetVal = RetVal & "(" & Join(Application.Transpose(Application.Transpose(Row.Value2)), ",") & "); "
Next Row
Debug.Print RetVal
End Sub
To get this output:
(name1, surname1, address1, area1); (name2, surname2, address2, area2); (name3, surname3, address3, area3); (name4, surname4, address4, area4);
.. is there a way to write the result like a sort of list that shows all the values of the cells of the range?
Yes, there is. In addition to PEH's valid answers and disposing of Excel version MS365 you might also use
Dim s as String
s = Evaluate("ArrayToText(A2:D5, 1)") ' arg. value 1 representing strict format
resulting in the following output string:
{"name1","surname1","address1","area1";"name2","surname2","address2","area2";"name3","surname3","address3","area3";"name4","surname4","address4","area4"}
Syntax
ARRAYTOTEXT(array, [format])
The ARRAYTOTEXT function returns an array of text values from any specified range. It passes text values unchanged, and converts non-text values to text.
The format argument has two values, 0 (concise default format) and 1 (strict format to be used here to distinguish different rows, too):
Strict format, i.e. value 1 includes escape characters and row delimiters. Generates a string that can be parsed when entered into the formula bar. Encapsulates returned strings in quotes except for Booleans, Numbers and Errors.
Thank you for your answers, suggestions, ideas and hints. I am sorry if my question was not so clear, all the solutions you added were perfect and extremely elegant.
In the end I found a way - a dumber way in comparison to all the things you wrote - and I solved with a for statement.
I did like this:
totRow = ActiveSheet.UsedRange.Rows.Count
For i = 1 To totRow
name = Cells(i, 1)
surname = Cells(i, 2)
address = Cells(i, 3)
area = Cells(i, 4)
Example = Example & "(" & name & ", " & surname & ", " & address & ", " & area & "); "
Next i
Range("E1").Value = Example
It works (it does what I wanted to do), but I noticed a little limit: if the rows are a lot I can't keep the whole text in the variable.

Is there built-in excel functions that can check for: "10" (or any number) in "1,3,5-9,13,16-20,23"?

As mentioned in the title, I wonder if there is any way to use built-in functions in excel to see whether a cell contains a specific number and count the total numbers in the cell. The cell can contain a list of numbers seperated by comas, for instance, "1,4,7" or ranges "10-25" or a combination of both. See the print screen.
No, there is not, but you could write a VBA function to do that, something like:
Function NumberInValues(number As String, values As String) As Boolean
Dim n As Integer
n = CInt(number)
Dim parts() As String
parts = Split(values, ",")
For i = LBound(parts) To UBound(parts)
parts(i) = Replace(parts(i), " ", "")
Next
Dim p() As String
Dim first As Integer
Dim last As Integer
Dim tmp As Integer
For i = LBound(parts) To UBound(parts)
p = Split(parts(i), "-")
' If there is only one entry, check for equality:
If UBound(p) - LBound(p) = 0 Then
If n = CInt(p(LBound(p))) Then
NumberInValues = True
Exit Function
End If
Else
' Check against the range of values: assumes the entry is first-last, does not
' check for last > first.
first = CInt(p(LBound(p)))
last = CInt(p(UBound(p)))
If n >= first And n <= last Then
NumberInValues = True
Exit Function
End If
End If
Next
NumberInValues = False
End Function
and then your cell C2 would be
=NumberInValues(B2,A2)
Calculating how many numbers there are in the ranges would be more complicated as numbers and ranges could overlap.
The key part of implementing this is to create a List or Array of individual numbers that includes all the Numbers represented in the first column.
Once that is done, it is trivial to check for an included, or do a count.
This VBA routine returns a list of the numbers
Option Explicit
Function createNumberList(s)
Dim AL As Object
Dim v, w, x, y, I As Long
Set AL = CreateObject("System.Collections.ArrayList")
v = Split(s, ",")
For Each w In v
'If you need to avoid duplicate entries in the array
'uncomment the If Not lines below and remove the terminal double-quote
If IsNumeric(w) Then
'If Not AL.contains(w) Then _"
AL.Add CLng(w)
Else
x = Split(w, "-")
For I = x(0) To x(1)
'If Not AL.contains(I) Then _"
AL.Add I
Next I
End If
Next w
createNumberList = AL.toarray
End Function
IF your numeric ranges might be overlapping, you will need to create a Unique array. You can do that by changing the AL.Add function to first check if the number is contained in the list. In the code above, you can see instructions for that modification.
You can then use this UDF in your table:
C2: =OR($B2=createNumberList($A2))
D2: =COUNT(createNumberList($A2))
Here is a possible formula solution using filterxml as suggested in the comment:
=LET(split,FILTERXML("<s><t>+"&SUBSTITUTE(A2,",","</t><t>+")&"</t></s>","//s/t"),
leftn,LEFT(split,FIND("-",split&"-")-1),
rightn,IFERROR(RIGHT(split,LEN(split)-FIND("-",split)),leftn),
SUM(rightn-leftn+1))
The columns from F onwards show the steps for the string in A2. I had to put plus signs in because Excel converted a substring like "10-15" etc. into a date as usual.
Then to find if a number (in C2 say) is present:
=LET(split,FILTERXML("<s><t>+"&SUBSTITUTE(A2,",","</t><t>+")&"</t></s>","//s/t"),
leftn,LEFT(split,FIND("-",split&"-")-1),
rightn,IFERROR(RIGHT(split,LEN(split)-FIND("-",split)),leftn),
SUM((--leftn<=C2)*(--rightn>=C2))>0)
As noted by #Ron Rosenfeld, it's possible that there may be duplication within the list: the Count formula would be susceptible to double counting in this case, but the Check (to see if a number was in the list) would give the correct result. So the assumptions are:
(1) No duplication (I think it would be fairly straightforward to check for duplication, but less easy to correct it)
(2) No range in wrong order like 15-10 (although this could easily be fixed by putting ABS around the subtraction in the first formula).
Here is a little cheeky piece of code for a VBA solution:
Function pageCount(s As String)
s = Replace(s, ",", ",A")
s = Replace(s, "-", ":A")
s = "A" & s
' s now looks like a list of ranges e.g. "1,2-3" would give "A1,A2:A3"
pageCount = Union(Range(s), Range(s)).Count
End Function
because after all the ranges in the question behave exactly like Excel ranges don't they?
and for inclusion (of a single page)
Function includes(s As String, m As String) As Boolean
Dim isect As Range
s = Replace(s, ",", ",A")
s = Replace(s, "-", ":A")
s = "A" & s
Set isect = Application.Intersect(Range(s), Range("A" & m))
includes = Not (isect Is Nothing)
End Function

Using Excel Proper Function with exception | Excel

Essentially I have multiple strings within my Excel Spreadsheet that are structured the following way:
JOHN-MD-HOPKINS
REC-PW-RESIN
I would like to use the proper function but exclude the part of the string that is within the dashes (-).
The end result should look like the following:
John-MD-Hopkins
Rec-PW-Resin
Is there an excel formula that is capable of doing this?
You may need to create your own VBA function to do this, that checks if there are two hyphens in the data, and if so converts the first and last words to proper case without touching the middle word, otherwise just converts the string to proper case.
Paste the following into a module within Excel:
Function fProperCase(strData As String) As String
Dim aData() As String
aData() = Split(strData, "-")
If UBound(aData) - LBound(aData) = 2 Then ' has two hyphens in the original data
fProperCase = StrConv(aData(LBound(aData)), vbProperCase) & "-" & aData(LBound(aData) + 1) & "-" & StrConv(aData(UBound(aData)), vbProperCase)
Else ' just do a normal string conversion to proper case
fProperCase = StrConv(strData, vbProperCase)
End If
End Function
Then, in your worksheet, you can use this just as you would any built-in formula, so if "JOHN-MD-HOPKINS" is in cell A1, you would use this as a formula in another cell:
=fProperCase(A1)
Which would display John-MD-Hopkins as required.
EDITED CODE
As the requirement is to leave the second word, then this modified VBA function, which "walks" the array should work instead:
Function fProperCase2(strData As String) As String
Dim aData() As String
Dim lngLoop1 As Long
aData() = Split(strData, "-")
For lngLoop1 = LBound(aData) To UBound(aData)
If (lngLoop1 = LBound(aData) + 1) And (lngLoop1 <> UBound(aData)) Then
aData(lngLoop1) = aData(lngLoop1)
Else
aData(lngLoop1) = StrConv(aData(lngLoop1), vbProperCase)
End If
Next lngLoop1
fProperCase2 = Join(aData, "-")
End Function
It basically looks to see if the array element being dealt with is the second (lngLoop1=LBound(aData)+1) and also not the last (lngLoop1<>UBound(aData)).
Regards,

Find how many words from cell are found in an array

I have two columns with data. The first one has some terms and the other one contains single words.
what I have
I'm looking for a way to identify which words from each cell from the first column appear in the second, so the result should look something like this (I don't need the commas):
what I need
My question is somehow similar to Excel find cells from range where search value is within the cell but not exactly, because I need to identify which words are appearing in the second column and there can be more than one word.
I also tried =INDEX($D$2:$D$7;MATCH(1=1;INDEX(ISNUMBER(SEARCH($D$2:$D$7;A2));0);))
but it also returns only one word.
If you are willing to use VBA, then you can define a user defined function:
Public Function SearchForWords(strTerm As String, rngWords As Range) As String
Dim cstrDelimiter As String: cstrDelimiter = Chr(1) ' A rarely used character
strTerm = cstrDelimiter & Replace(strTerm, " ", cstrDelimiter) & cstrDelimiter ' replace any other possible delimiter here
SearchForWords = vbNullString
Dim varWords As Variant: varWords = rngWords.Value
Dim i As Long: For i = LBound(varWords, 1) To UBound(varWords, 1)
Dim j As Long: For j = LBound(varWords, 2) To UBound(varWords, 2)
If InStr(1, strTerm, cstrDelimiter & varWords(i, j) & cstrDelimiter) <> 0 Then
SearchForWords = SearchForWords & varWords(i, j) & ", "
End If
Next j
Next i
Dim iLeft As Long: iLeft = Len(SearchForWords) - 2
If 0 < iLeft Then
SearchForWords = Left(SearchForWords, Len(SearchForWords) - 2)
End If
End Function
And you can use it from the Excel table like this:
=SearchForWords(A2;$D$2:$D$7)
I have a partial solution:
=IF(1-ISERROR(SEARCH(" "&D2:D7&" "," "&A2&" ")),D2:D7&", ","")
This formula returns an array of the words contained in the cell (ranges are according to your picture). This array is sparse: it contains empty strings for each missing word. And it assumes that words are always separated by one space (this may be improved if necessary).
However, native Excel functions are not capable of concatenating an array, so I think the rest is not possible with native formulas only.
You would need VBA but if you use VBA you should not bother with the first part at all, since you can do anything.
You can create a table with the words you want to find across the top and use a formula populate the cells below each word if it's found. See screenshot.
[edit] I've noticed that it's incorrectly picking up "board" in "blackboard" but that should be easily fixed.
=IFERROR(IF(FIND(C$1,$A2,1)>0,C$1 & ", "),"")
Simply concatinate the results
=CONCATENATE(C2,D2,E2,F2,G2,H2)
or
=LEFT(CONCATENATE(C2,D2,E2,F2,G2,H2),LEN(CONCATENATE(C2,D2,E2,F2,G2,H2))-2)
to take off the last comma and space
I've edited this to fix the problem with "blackboard"
new formula for C2
=IF(OR(C$1=$A2,ISNUMBER(SEARCH(" "&C$1&" ",$A2,1)),C$1 & " "=LEFT($A2,LEN(C$1)+1)," " & C$1=RIGHT($A2,LEN(C$1)+1)),C$1 & ", ","")
New formula for B2 to catch the error if there are no words
=IFERROR(LEFT(CONCATENATE(C2,D2,E2,F2,G2,H2,I2),LEN(CONCATENATE(C2,D2,E2,F2,G2,H2,I2))-2),"")

How to extract text within a string of text

I have a simple problem that I'm hoping to resolve without using VBA but if that's the only way it can be solved, so be it.
I have a file with multiple rows (all one column). Each row has data that looks something like this:
1 7.82E-13 >gi|297848936|ref|XP_00| 4-hydroxide gi|297338191|gb|23343|randomrandom
2 5.09E-09 >gi|168010496|ref|xp_00| 2-pyruvate
etc...
What I want is some way to extract the string of numbers that begin with "gi|" and end with a "|". For some rows this might mean as many as 5 gi numbers, for others it'll just be one.
What I would hope the output would look like would be something like:
297848936,297338191
168010496
etc...
Here is a very flexible VBA answer using the regex object. What the function does is extract every single sub-group match it finds (stuff inside the parenthesis), separated by whatever string you want (default is ", "). You can find info on regular expressions here: http://www.regular-expressions.info/
You would call it like this, assuming that first string is in A1:
=RegexExtract(A1,"gi[|](\d+)[|]")
Since this looks for all occurance of "gi|" followed by a series of numbers and then another "|", for the first line in your question, this would give you this result:
297848936, 297338191
Just run this down the column and you're all done!
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional separator As String = ", ") As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long, j As Long
Dim result As String
RE.pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.count - 1
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & (separator & allMatches.Item(i).submatches.Item(j))
Next
Next
If Len(result) <> 0 Then
result = Right$(result, Len(result) - Len(separator))
End If
RegexExtract = result
End Function
Here it is (assuming data is in column A)
=VALUE(LEFT(RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2),
FIND("|",RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2)) -1 ))
Not the nicest formula, but it will work to extract the number.
I just noticed since you have two values per row with output separated by commas. You will need to check if there is a second match, third match etc. to make it work for multiple numbers per cell.
In reference to your exact sample (assuming 2 values maximum per cell) the following code will work:
=IF(ISNUMBER(FIND("gi|",$A1,FIND("gi|", $A1)+1)),CONCATENATE(LEFT(RIGHT($A1,LEN($A1)
- FIND("gi|",$A1) - 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ),
", ",LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1)
- 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1) - 2))
-1 )),LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2),
FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ))
How's that for ugly? A VBA solution may be better for you, but I'll leave this here for you.
To go up to 5 numbers, well, study the pattern and recurse manually in the formula. IT will get long!
I'd probably split the data first on the | delimiter using the convert text to columns wizard.
In Excel 2007 that is on the Data tab, Data Tools group and then choose Text to Columns. Specify Other: and | as the delimiter.
From the sample data you posted it looks like after you do this the numbers will all be in the same columns so you could then just delete the columns you don't want.
As the other guys presented the solution without VBA... I'll present the one that does use. Now, is your call to use it or no.
Just saw that #Issun presented the solution with regex, very nice! Either way, will present a 'modest' solution for the question, using only 'plain' VBA.
Option Explicit
Option Base 0
Sub findGi()
Dim oCell As Excel.Range
Set oCell = Sheets(1).Range("A1")
'Loops through every row until empty cell
While Not oCell.Value = ""
oCell.Offset(0, 1).Value2 = GetGi(oCell.Value)
Set oCell = oCell.Offset(1, 0)
Wend
End Sub
Private Function GetGi(ByVal sValue As String) As String
Dim sResult As String
Dim vArray As Variant
Dim vItem As Variant
Dim iCount As Integer
vArray = Split(sValue, "|")
iCount = 0
'Loops through the array...
For Each vItem In vArray
'Searches for the 'Gi' factor...
If vItem Like "*gi" And UBound(vArray) > iCount + 1 Then
'Concatenates the results...
sResult = sResult & vArray(iCount + 1) & ","
End If
iCount = iCount + 1
Next vItem
'And removes trail comma
If Len(sResult) > 0 Then
sResult = Left(sResult, Len(sResult) - 1)
End If
GetGi = sResult
End Function
open your excel in Google Sheets and use the regular expression with REGEXEXTRACT
Sample Usage
=REGEXEXTRACT("My favorite number is 241, but my friend's is 17", "\d+")
Tip: REGEXEXTRACT will return 241 in this example because it returns the first matching case.
In your case
=REGEXEXTRACT(A1,"gi[|](\d+)[|]")

Resources