How to search for a specific string in a cell array

I would like to search for a specific string in a MATLAB cell array. For example, my cell array contains a column of strings like this:
variable(:,5) = {'10';'10;20';'20';'10;20';'10';'10';'20'};
I would like to find all cells that contain only '10' and delete them.
I tried this statement for searching:
is10 = ~cellfun(@isempty, strfind(variable(:,5), '10'));
But this returns all cells containing '10' (including the ones with '10;20').
I would like to get just the cells whose value is exactly '10'.
What is the best way to do this?

It is not working as you expect because strfind allows for a partial string match. What you want is an exact match. You can do this using strcmp. Also, the input to strcmp can actually be a cell array of strings so you can use it the following way.
A = {'10';'10;20';'20';'10;20';'10';'10';'20'};
is10 = strcmp(A, '10');
%// 1 0 0 0 1 1 0
You could also use ismember to do the same thing.
is10 = ismember(A, '10');
%// 1 0 0 0 1 1 0
As a side note, most string functions (including strfind) can actually accept a cell array of strings as input. So in your initial post, the wrapping of strfind inside of cellfun is unnecessary.
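The difference between a partial and an exact match can also be shown outside MATLAB. The following is a minimal Python sketch, given only for illustration; the variable names are assumptions and not part of the original post. The substring test plays the role of strfind, the equality test plays the role of strcmp, and the last step drops the cells that are exactly '10', as the question asks.

```python
# Same data as the column in the question.
cells = ['10', '10;20', '20', '10;20', '10', '10', '20']

# Partial match (analogous to strfind): true wherever '10' occurs in the cell.
partial = ['10' in c for c in cells]

# Exact match (analogous to strcmp): true only when the whole cell equals '10'.
is10 = [c == '10' for c in cells]

# Delete the cells that are exactly '10'.
kept = [c for c, hit in zip(cells, is10) if not hit]

print(partial)
print(is10)
print(kept)
```

In MATLAB the deletion step would typically be done with logical indexing on the strcmp result.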

Related

Count Patterns In one Cell Excel

I need some help. I'm currently extracting some data, and I have to count the number of Call IDs. A Call ID has the following format: 9129572520020000711. The pattern is 19 characters long, starts with 9 and ends in 1.
I want to count how many times this pattern appears in one cell.
For example, this is the value in one cell, and I want to count how many times the pattern appears in it:
1912957252002000071129129545183410000711391295381628700007114912959791875000071159129597085000000711691295892838400007117912958908933000071189129452513730000711
To solve this with formulae you need to know:
The starting character
The ending character
The length of your Call ID
Finding all possible Call IDs
Let B1 be your number string and B2 be the call ID (or pattern) you are looking for. In B5 enter the formula =MID($B$2,1,1) to find the starting character you are looking for. In B6 enter =RIGHT($B$2,1) for the end character. In B7 enter =LEN($B$2) for the length of the call ID.
In Column A we'll enter the position of every starting character. The first formula is a simple FIND() in A10: =FIND($B$5,$B$1,1). To find the other starting characters, start each FIND() at the position just after the previous starting character: =FIND($B$5,$B$1,$A10+1) in A11. Copy this down the column a few dozen times (or more).
In Column B we'll see if the next X characters (where X is the length of the Call ID) meet the criteria for a Call ID:
=IF(MID($B$1,$A10+($B$7-1),1)=$B$6,TRUE,FALSE)
The comparison MID($B$1,$A10+($B$7-1),1)=$B$6 checks whether the character at the end of this possible Call ID is the end character we're looking for. $A10+($B$7-1) calculates the position of the last character of the possible Call ID, and $B$6 is the end character.
In Column C we can return the actual Call ID if there is a match. This isn't necessary to find the count, but will be useful later. Simply check if the value in Column B is True and, if yes, return the calculated string: =IF(B10,MID($B$1,$A10,$B$7),"").
To actually count the number of valid Call IDs, use COUNTIF() on the TRUE/FALSE column (Column B) to count the TRUE values, for example =COUNTIF($B$10:$B$100,TRUE), adjusting the range to cover the rows you filled down.
If you don't want all the #VALUE! errors, just wrap each formula in IFERROR(...,"").
Finding all consecutive Call IDs
However, some of these Call IDs overlap. Operating on the assumption that Call IDs cannot overlap, we simply have to start our search after the end character of a found ID, not after its start. Insert an "Ending Position" column as Column B (the earlier column B references then shift to column C) with the formula =$A10+($C$7-1), starting in B10. Alter A11 to =FIND($C$5,$C$1,$B10+1) and copy down. Don't change A10, as it finds the first starting position and does not depend on anything but the original text.
Which ones are valid?
I don't know, that depends on other criteria for your Call IDs. If you receive them consecutively, then the second method is best and the other possible ones found are by coincidence. If not, then you'll have to apply some other validation criteria to the first method, hence why we identified each ID.
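The two spreadsheet methods above can be sketched in Python to make the logic easier to follow. This is only an illustration under stated assumptions: the function names are hypothetical, positions are 0-based (the worksheet uses 1-based positions), and, like the formulas, the check looks only at the first and last character of each candidate. The second function resumes after the end of each valid ID, which is a simplification of the "Ending Position" column.

```python
# The cell value from the question, split into chunks only for readability.
sample = (
    "19129572520020000711" "29129545183410000711"
    "39129538162870000711" "49129597918750000711"
    "59129597085000000711" "69129589283840000711"
    "79129589089330000711" "89129452513730000711"
)

def candidate_starts(text, id_len=19, start_char="9", end_char="1"):
    """All possible Call IDs: every start character whose id_len-th character is the end character."""
    hits = []
    for pos, ch in enumerate(text):
        end = pos + id_len - 1
        if ch == start_char and end < len(text) and text[end] == end_char:
            hits.append(pos)
    return hits

def consecutive_starts(text, id_len=19, start_char="9", end_char="1"):
    """Non-overlapping Call IDs: after a hit, resume the search behind its last character."""
    hits, pos = [], 0
    while pos + id_len <= len(text):
        if text[pos] == start_char and text[pos + id_len - 1] == end_char:
            hits.append(pos)
            pos += id_len
        else:
            pos += 1
    return hits

print(len(candidate_starts(sample)), len(consecutive_starts(sample)))
```

For the sample cell this gives 15 possible Call IDs when overlaps are allowed and 8 when they are not.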
You can solve this simply with a UDF using a regular expression.
Option Explicit
Function callIDcount(S As String) As Long
    Dim RE As Object, MC As Object
    Const sPat As String = "9\d{17}1"
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .Pattern = sPat
        Set MC = .Execute(S)
        callIDcount = MC.Count
    End With
End Function
Using your example, this returns a count of 8
The regular expression engine captures all of the matches that match the pattern, into the match collection. To see how many are there, we merely return the count of that collection.
Trivial modifications would allow one to return the actual ID's also, should that be necessary.
The regex:
9\d{17}1
9 : match the character "9" literally
\d{17} : match a single character that is a digit (ASCII 0-9 only), exactly 17 times
1 : match the character "1" literally
Created with RegexBuddy
EDIT Reading through TheFizh's post, he considered that you might want the count to include overlapping CallID's. In other words, given:
9129572520020000711291
We see that includes:
9129572520020000711
9572520020000711291
where the second overlaps with the first, but both meet your requirements.
Should that be what you want, merely change the regex so it does not "consume" the match:
Const sPat As String = "9(?=\d{17}1)"
and the function will return 15 instead of 8, which is the count for the non-overlapping pattern.
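The effect of the lookahead can be checked with a short Python sketch. It uses Python's re module instead of VBScript.RegExp, so it is an illustration of the same two patterns rather than the UDF itself; the variable names are assumptions.

```python
import re

# The cell value from the question, split into chunks only for readability.
sample = (
    "19129572520020000711" "29129545183410000711"
    "39129538162870000711" "49129597918750000711"
    "59129597085000000711" "69129589283840000711"
    "79129589089330000711" "89129452513730000711"
)

# Consuming pattern: each match uses up its 19 characters, so matches cannot overlap.
non_overlapping = re.findall(r"9\d{17}1", sample)

# Lookahead pattern: only the leading 9 is consumed, so overlapping candidates are counted too.
overlapping = re.findall(r"9(?=\d{17}1)", sample)

print(len(non_overlapping), len(overlapping))
```

With the sample value the counts are 8 and 15, matching the numbers quoted in the answer.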
Do you mean something like what's following?
Sub CallID_noPatterns()
    Dim CallID As String, CallIDLen As Integer
    CallID = "9#################1"
    CallIDLen = Len(CallID) 'the CallID's length
    'Say that you want to get the value of "A1" cell and deal with its value
    Dim CellVal As String, CellLen As Integer
    CellVal = CStr(Range("A1").Text) 'get its value as a string
    CellLen = Len(CellVal) 'get its length
    'You Have 2 options:-
    '1-The value is smaller than your CallID length. (Not Applicable)
    '2-The value is longer than or equal to your CallID length
    'So just run your code for the 2nd option
    Dim i As Integer, num_checks, num_patterns
    i = 0
    num_patterns = 0
    'imagine both of them as 2 arrays, every array consists of sequenced elements
    'and your job is to take a sub-array from your value, of a length
    ' equals to CallID's length
    'then compare your sub-array with CallID
    num_checks = CellLen - CallIDLen + 1
    If CellLen >= CallIDLen Then
        For i = 0 To num_checks - 1 Step 19
            For j = i To num_checks - 1
                If Mid(CellVal, (j + 1), CallIDLen) Like CallID Then
                    num_patterns = num_patterns + 1
                    Exit For
                End If
            Next j
        Next i
    End If
    'Display your result
    MsgBox "Number of Patterns: " & Str(num_patterns)
End Sub

Find index of cells containing my string

I have a cell array C which contains numbers and strings like this:
1 0 'C:\user' 41.57
2 0 'C:\user' 46.25
3 0 'C:\user' 48
4 0 'C:\user' 48.33
I want to get the index of the cell that is equal to a name I specify.
I tried something like this, but it didn't work:
idx=find(strcmp([C{:,:}],'C:\User\..')
I need help, please.
To use strcmp, you first have to convert the doubles to strings with num2str. Set UniformOutput to false (abbreviated as 'un', 0 below), since C contains both numbers and strings.
idx = find(strcmp(cellfun(@num2str, C, 'un', 0), 'C:\user'));
[row, col] = ind2sub(size(C), idx);
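As a cross-check of the idea (compare every cell against the target and report row and column), here is a minimal Python sketch. It is an illustration only: the nested list stands in for the cell array, the names are assumptions, and the indices are 0-based, whereas MATLAB's ind2sub returns 1-based subscripts. Instead of converting numbers to strings, it skips non-string cells.

```python
# Stand-in for the mixed cell array C from the question.
C = [
    [1, 0, r"C:\user", 41.57],
    [2, 0, r"C:\user", 46.25],
    [3, 0, r"C:\user", 48],
    [4, 0, r"C:\user", 48.33],
]
target = r"C:\user"

# Collect (row, col) for every cell that is a string equal to the target.
hits = [
    (r, c)
    for r, row in enumerate(C)
    for c, value in enumerate(row)
    if isinstance(value, str) and value == target
]

print(hits)
```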

What is the most efficient format for storing strings from a for loop?

I have a script that runs through a series of strings and using regex pulls out certain strings (approx 4 output strings per input string).
e.g. HelloStackOverflowWorld
-> Hello; Stack; Overflow; World;
The final output would ideally be a table where I can filter based upon the strings in the columns. Using the case above, column 1 row 1 would have 'Hello', column 2 row 1 would have 'Stack' and so on.
The problem is, the size of the output will change depending on the input so I am unsure of what output format to use.
At the moment I used something similar to this:
if strfind(missing{ii},'hello')
miss.exch = [miss.exch;'hello'];
temp.exc = regexp(missing{ii},'(?<=\d[Q|T])(\w*?)(?=[q])','match');
miss.exc = [miss.exc;temp.exc];
temp.TQ= regexp(missing{ii},'(Qc|Tc)','match');
if strcmp(temp.TQ{1,1}, 'Tc')
miss.TQ = [miss.TQ;'variableA'];
elseif temp.TQ{1,1} == 'Qc'
miss.TQ = [miss.TQ;'variableB'];
end
else if .........
end
Which obviously results in a 1x1 struct consisting of a number of fields each with many cells. This makes filtering on strings an issue!
How can I define and add data into a 'table of strings' that I can then filter?
I think you are just looking for a cell array. Here is a simple example of what they can do:
C = {'Abc','Bcd';'Cde',[]}
strcmp(C,'Cde')
Results in:
ans =
0 0
1 0
Make sure to check doc cell to see how you can access them.
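To illustrate the "table of strings" idea with rows of varying length, here is a small Python sketch. It is not MATLAB code, and apart from the first row (taken from the question) the data is made up for the example. Short rows are padded with empty strings so that every row has the same number of columns, and the table can then be filtered on any column.

```python
# Each input string yields a variable number of output strings.
rows = [
    ["Hello", "Stack", "Overflow", "World"],
    ["Hello", "World"],          # hypothetical shorter result
    ["Stack", "Overflow"],       # hypothetical shorter result
]

# Pad every row to the width of the longest one.
width = max(len(r) for r in rows)
table = [r + [""] * (width - len(r)) for r in rows]

# Filter: keep the rows whose first column is 'Hello'.
hello_rows = [r for r in table if r[0] == "Hello"]

print(table)
print(hello_rows)
```

In MATLAB the same layout would be a cell array with one row per input string, with strcmp on a column providing the filter mask.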

How to count up elements in excel

I have a column of chemical formulas with about 40,000 entries, and I want to count how many atoms of each element each formula contains. For example:
EXACT_MASS FORMULA
626.491026 C40H66O5
275.173274 C13H25NO5
For this, I need some kind of formula that will return with the result of
C H O
40 66 5
13 25 5
all as separate columns for the different elements and in rows for the different entries. Is there a formula that can do this?
You could make your own formula.
Open the VBA editor with ALT and F11 and insert a new module.
Add a reference to Microsoft VBScript Regular Expressions 5.5 by clicking Tools, then references.
Now add the following code:
Public Function FormulaSplit(theFormula As String, theLetter As String) As String
    Dim RE As Object
    Set RE = CreateObject("VBScript.RegExp")
    With RE
        .Global = True
        .MultiLine = False
        .IgnoreCase = False
        .Pattern = "[A-Z]{1}[a-z]?"
    End With
    Dim Matches As Object
    Set Matches = RE.Execute(theFormula)
    Dim TheCollection As Collection
    Set TheCollection = New Collection
    Dim i As Integer
    Dim Match As Object
    For i = (Matches.Count - 1) To 0 Step -1
        Set Match = Matches.Item(i)
        TheCollection.Add Mid(theFormula, Match.FirstIndex + (Len(Match.Value) + 1)), UCase(Trim(Match.Value))
        theFormula = Left(theFormula, Match.FirstIndex)
    Next
    FormulaSplit = "Not found"
    On Error Resume Next
    FormulaSplit = TheCollection.Item(UCase(Trim(theLetter)))
    On Error GoTo 0
    If FormulaSplit = "" Then
        FormulaSplit = "1"
    End If
    Set RE = Nothing
    Set Matches = Nothing
    Set Match = Nothing
    Set TheCollection = Nothing
End Function
Usage:
FormulaSplit("C40H66O5", "H") would return 66.
FormulaSplit("C40H66O5", "O") would return 5.
FormulaSplit("C40H66O5", "blah") would return "Not found".
You can use this formula directly in your workbook.
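The parsing idea behind the function (an uppercase letter, an optional lowercase letter, then an optional count) can be sketched in Python as follows. This is an illustration with a hypothetical function name, not a port of the VBA: it returns all elements at once as a dictionary, treats a missing count as 1, and sums repeated symbols.

```python
import re

def formula_counts(formula):
    """Map each element symbol in a simple formula like 'C13H25NO5' to its atom count."""
    counts = {}
    # Each match is (symbol, digits); digits is '' when the element has no explicit count.
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts

print(formula_counts("C40H66O5"))
print(formula_counts("C13H25NO5"))
```

Formulas with brackets or hydrates are outside the scope of this sketch, as they are for the VBA function.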
I've had a stab at doing this in a formula and came up with the following:
=IFERROR((MID($C18,FIND(D17,$C18)+1,2))*1,IFERROR((MID($C18,FIND(D17,$C18)+1,1))*1,IFERROR(IF(FIND(D17,$C18)>0,1),0)))
It's not very neat, and it would have to be expanded further if any of your elements appear more than 99 times. I also used a random placement on my worksheet, so the titles H, C and O are in row 17. I would personally go with Jamie's answer, but I wanted to see whether this was possible in a formula and figured it was worth sharing as another perspective.
Even though this has an excellent (and accepted) VBA solution, I couldn't resist the challenge to do this without using VBA.
I posted a solution earlier, which wouldn't work in all cases. This new code should always work:
=MAX(
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0),
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
)
Enter as an array formula: Ctrl + Shift + Enter
Output:
The formula outputs 0 when not found, and I simply used conditional formatting to turn zeroes gray.
How it works
This part of the formula looks for the element, followed by a number between 1 and 99. If found, the number of atoms is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0)
In the case of C13H25NO5, a search for "C" returns this array:
{1,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,...,0}
1 is the first array element, because C1 is a match. 13 is the thirteenth array element, and that's what we're interested in.
The next part of the formula looks for the element, followed by an uppercase letter, which indicates a new element. (The letters A through Z are characters 65 through 90.) If found, the number 1 is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
"Z" is appended to the chemical formula, so that a match will be found when its last element has no number. (For example, "H2O".) There is no element "Z" in the Periodic Table, so this won't cause a problem.
In the case of C13H25NO5, a search for "N" returns this array:
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0}
1 is the 15th element in the array. That's because it found the letters "NO", and O is the 15th letter of the alphabet.
Taking the maximum value from each array gives us the number of atoms as desired.
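The two arrays and the final MAX can be mirrored step by step in Python. This is a sketch for illustration only; the function name is hypothetical, and the substring test stands in for FIND (both are case-sensitive).

```python
def atom_count(formula, element):
    """Mimic the array formula: largest matching number 1..99, or 1 if followed by a capital, else 0."""
    # Part 1: element followed by a number between 1 and 99; keep the number when found.
    numbered = [n if (element + str(n)) in formula else 0 for n in range(1, 100)]
    # Part 2: element followed by an uppercase letter (characters 65..90);
    # 'Z' is appended so a trailing element without a number is still found.
    padded = formula + "Z"
    bare = [1 if (element + chr(code)) in padded else 0 for code in range(65, 91)]
    return max(numbered + bare)

for element in ("C", "H", "N", "O", "S"):
    print(element, atom_count("C13H25NO5", element))
```

For C13H25NO5 this gives 13, 25, 1, 5 and 0, in line with the walk-through above.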

Counting the occurrence of substrings in MATLAB

I have a cell array, something like this: P = {Face1 Face6 Scene6 Both9 Face9 Scene11 Both12 Face15}. I would like to count how many Face values, Scene values and Both values are in P. I don't care about the numeric values after the string (i.e., Face1 and Face23 would be counted as two Face values). I've tried the following (for Face), but I got the error "If any of the input arguments are cell arrays, the first must be a cell array of strings and the second must be a character array".
strToSearch='Face';
numel(strfind(P,strToSearch));
Does anyone have any suggestion? Thank you!
Use regexp to find strings that start (^) with the desired text (such as 'Face'). The result will be a cell array, where each cell contains 1 if there is a match, or [] otherwise. So determine if each cell is nonempty (~cellfun('isempty', ...): will give a logical 1 for nonempty cells, and 0 for empty cells), and sum the results (sum):
>> P = {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
>> sum(~cellfun('isempty', regexp(P, '^Face')))
ans =
4
>> sum(~cellfun('isempty', regexp(P, '^Scene')))
ans =
2
Your example should work with some small tweaks, provided all of P contains strings, but may give the error you get if there are any non-string values in the cell array.
P= {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
strToSearch='Face';
n = strfind(P,strToSearch);
numel([n{:}])
(returns 4)
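For comparison, the same counting can be done for all prefixes in one pass. The following Python sketch is only an illustration; it assumes every entry starts with letters followed by a number, as in the question, and groups the entries by that leading run of letters.

```python
import re
from collections import Counter

P = ['Face1', 'Face6', 'Scene6', 'Both9', 'Face9', 'Scene11', 'Both12', 'Face15']

# Take the leading run of letters of each entry and count how often each one occurs.
counts = Counter(re.match(r'[A-Za-z]+', name).group() for name in P)

print(counts['Face'], counts['Scene'], counts['Both'])
```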

Resources