How to extract substrings with different lengths?

I have an n by 2 matrix that contains start and end indices of substrings of a specified string. How can I extract the n by 1 cell array of substrings without a for-loop?
string = 'Hello World!';
ranges = [1 1;
2 3;
4 5;
3 7];
substrings = cell(size(ranges, 1), 1);
for i=1:size(ranges, 1)
substrings{i} = string(ranges(i, 1):ranges(i, 2));
end
The expected result:
substrings =
'H'
'el'
'lo'
'llo W'

You can use cellfun to make it a single-line operation:
str = 'Hello World!';
ranges = [ 1 1;
2 3;
4 5;
3 7];
% first convert "ranges" to a cell object
Cranges = mat2cell(ranges,ones(size(ranges,1),1),2);
% call "cellfun" on every row/entry of "Cranges"
cellfun(@(x)str(x(1):x(2)),Cranges, 'UniformOutput',false)
ans =
4×1 cell array
{'H' }
{'el' }
{'lo' }
{'llo W'}
I have changed the variable string to str because string is a native function in MATLAB (converting the input to the type string).
Although this is a single-line operation, that doesn't mean it is more efficient. Timing one million repetitions of each approach (for-loop first, cellfun second):
Num = 1000000;
substrings = cell(size(ranges, 1), 1);
% time for-loop
tic
for j = 1:Num
for i = 1:size(ranges, 1)
substrings{i} = str(ranges(i, 1):ranges(i, 2));
end
end
toc;
Cranges = mat2cell(ranges,ones(size(ranges,1),1),2);
% time function-call
tic
for j = 1:Num
substrings = cellfun(@(x)str(x(1):x(2)),Cranges, 'UniformOutput',false);
end
toc;
Elapsed time is 3.929622 seconds.
Elapsed time is 50.319609 seconds.

Related

Maximize number of substring such that no substring has characters from other substring

So I was recently asked an interesting question about strings and substrings, and I'm still trying to find the optimal answer. I'd prefer an answer in Java, though any pseudo-code or language is fine as well.
The question is:
I am given a string S. I have to divide it into the maximum number of substrings (not subsequences) such that no substring contains a character that is present in another substring.
Examples:
1.
S = "aaaabbbcd"
Substrings = ["aaaa","bbb","c","d"]
2.
S = "ababcccdde"
Substrings = ["abab","ccc","dd","e"]
3.
S = "aaabbcccddda"
Substrings = ["aaabbcccddda"]
I would be really glad to get a solution that is better than O(n^2).
Thanks for the help.
It can be done in O(n) time.
The idea behind it is to predict where each substring will end. We know that if we read a char, then the last occurrence of this char must be in the same substring as the char itself (otherwise the same char would appear in two distinct substrings).
Let's use abbacacd as example. Suppose we know the first and the last occurrences of every char in the string.
01234567
abbacacd (reading a at index 0)
- we know that our substring must be at least abbaca (last occurrence of a);
- the end of our substring will be the maximum of the last occurrences of
all the chars inside the substring itself;
- we iterate through the substring:
012345 (we found b at index 1)
abbaca substring_end = maximum(5, last occurrence of b = 2)
substring_end = 5.
012345 (we found b at index 2)
abbaca substring_end = maximum(5, last occurrence of b = 2)
substring_end = 5.
012345 (we found a at index 3)
abbaca substring_end = maximum(5, last occurrence of a = 5)
substring_end = 5.
012345 (we found c at index 4)
abbaca substring_end = maximum(5, last occurrence of c = 6)
substring_end = 6.
0123456 (we found a at index 5)
abbacac substring_end = maximum(6, last occurrence of a = 5)
substring_end = 6.
0123456 (we found c at index 6)
abbacac substring_end = maximum(6, last occurrence of c = 6)
substring_end = 6.
---END OF FIRST SUBSTRING---
01234567
abbacacd [reading d]
- the first and last occurrence of d is the same index.
- d is an atomic substring.
The O(n) solution is:
#include <bits/stdc++.h>
using namespace std;
int main(){
int pos[26][2];
int index;
memset(pos, -1, sizeof(pos));
string s = "aaabbcccddda";
for(int i = 0; i < s.size(); i++){
index = s[i] - 'a';
if(pos[index][0] == -1) pos[index][0] = i;
pos[index][1] = i;
}
int substr_end;
for(int i = 0; i < s.size(); i++){
index = s[i] - 'a';
if(pos[index][0] == pos[index][1]) cout<<s[i]<<endl;
else{
substr_end = pos[index][1];
for(int j = i + 1; j < substr_end; j++){
substr_end = max(substr_end, pos[s[j] - 'a'][1]);
}
cout<<s.substr(i, substr_end - i + 1)<<endl;
i = substr_end;
}
}
}
You can do it with two passes. On the 1st you determine the max index of each character in the string. On the 2nd you keep track of the max index of each encountered character. If the max equals the current index you've reached the end of a unique substring.
Here's some Java code to illustrate:
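String s = "aaaabbbcd";
char[] c = s.toCharArray();
int[] max = new int[26];
// 1st pass: record the last index of each character
for(int i=0; i<c.length; i++) max[c[i]-'a'] = i;
// 2nd pass: m is the furthest last index seen so far, lm is the start of the current substring
for(int i=0, m=0, lm=0; i<c.length;)
if((m = Math.max(m, max[c[i]-'a'])) == i++)
System.out.format("%s ", s.substring(lm, lm = i));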
Output:
aaaa bbb c d
And for the other 2 strings:
abab ccc dd e
aaabbcccddda
The accepted answer includes some unnecessary complexity in its implementation of the algorithm. It is very straightforward to divide a string (such as the examples posted by the OP in the question) into the maximum number of substrings such that no substring contains a character that is present in another substring.
Algorithm:
(assumption: the input string is non-null and has 1 or more characters, all within 'a' to 'z' inclusive)
1. Record the last position of each character of the input string.
2. Assume the first substring's end position is 0.
3. Iterate through the string and, for every character in the input string:
a). If the current character's last position is greater than the substring end position, then update the substring end position to the current character's last position.
b). Add (or print) the current character as part of the current substring.
c). If the substring end position is equal to the position of the current character, then this is the end of a unique substring, and a new substring starts from the next character.
4. Repeat step 3 until the end of the input string.
Implementation:
#include <stdio.h>
#include <string.h>
void unique_substr(const char * pst) {
size_t ch_last_pos[26] = {0};
size_t subst_end_pos = 0;
size_t len = strlen(pst);
printf ("%s -> ", pst);
for (size_t i = 0; i < len; i++) {
ch_last_pos[pst[i] - 'a'] = i;
}
for (size_t i = 0; i < len; i++) {
size_t pos = ch_last_pos[pst[i] - 'a'];
if (pos > subst_end_pos) {
subst_end_pos = pos;
}
printf ("%c", pst[i]);
if (subst_end_pos == i) {
printf (" ");
}
}
printf ("\n");
}
//Driver program
int main(void) {
//base cases
unique_substr ("b");
unique_substr ("ab");
//strings posted by OP in question
unique_substr ("aaaabbbcd");
unique_substr ("ababcccdde");
unique_substr ("aaabbcccddda");
return 0;
}
Output:
# ./a.out
b -> b
ab -> a b
aaaabbbcd -> aaaa bbb c d
ababcccdde -> abab ccc dd e
aaabbcccddda -> aaabbcccddda
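For comparison, the two-pass idea shared by the answers above can be sketched in Python. This is only an illustrative sketch: the function name partition_unique is made up for this example, and it returns the pieces as a list instead of printing them.

```python
def partition_unique(s):
    """Split s into the maximum number of substrings that share no characters."""
    # Pass 1: remember the last index at which each character occurs.
    last = {ch: i for i, ch in enumerate(s)}

    parts = []
    start = 0  # start index of the substring currently being built
    end = 0    # furthest "last occurrence" among the characters seen so far
    # Pass 2: extend the current substring until it covers every last occurrence.
    for i, ch in enumerate(s):
        end = max(end, last[ch])
        if i == end:
            parts.append(s[start:i + 1])
            start = i + 1
    return parts
```

Calling partition_unique("abbacacd") walks through the same steps as the abbacacd trace in the first answer. Using a dictionary instead of a 26-element array also drops the assumption that the input only contains 'a' to 'z'.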

Subfunctions in matlab

I have a function called Assignment in MATLAB with PsychToolBox. This function shows a random color to the participant, requires the participant to name the color, and records this data.
The function should return 2 outputs as a string:
the RGB code of the random color, like: trial(1).color = [5 5 5]
a matrix which corresponds to the sound recording.
I wrote the main function and the color part is okay, but I cannot integrate the recording function into the main function.
In the main function I use this line: trial.data = recording(1,0,5)
and then I wrote a subfunction named "recording"
function recording (wavfilename, voicetrigger, maxsecs)
bla, bla
end
However, the main function does not recognize the subfunction. Am I making a logical error? The error message is below:
Error: File: assignment.m Line: 40 Column: 27
Unexpected MATLAB expression.
line 40 = trial.data = recording(1,0,5)
function ass8(trial)
Screen('Preference', 'SkipSyncTests', 1)
ListenChar(2);
Screen('HideCursorHelper', 0, 0)
[myWin, rect]=Screen('OpenWindow',0,[128,128,128]);
centerX=rect(3)/2;
centerY=rect(4)/2;
for trial = 1:100
Screen('TextSize', myWin, 30);
Screen('TextFont', myWin, 'Times');
[normBoundsRect, offsetBoundsRect] = Screen('TextBounds',myWin, 'What is the color of the rectangle?');
Screen('DrawText', myWin, 'What is the color of the rectangle?', (centerX-(normBoundsRect(3)/2)),(centerY-(normBoundsRect(4)/2+150)), [0,0,0]);
Screen('Flip', myWin)
WaitSecs(1)% inter stimulus interval
color = randi(255,1,3)
while 1
Screen('FillRect', myWin, color ,[583, 284, 783, 484])
% [ (centerX-100), (centerY-100), (centerX+100),(centerY+100)]);
Screen('Flip', myWin)
WaitSecs(3)
trial.color = color % I told it to store the trial's color
trial.data = reco(1,0 5)% the trial's name is 1, and the recording duration is 3 s
if Waitsecs(3)==1
break; % Terminates the loop if the condition is % satisfied
end
end
pause(.05);
% [clicks, x, y, buttons] = GetClicks(myWin);
%
% buttons=0;
% while ~buttons
% [x, y, buttons] = GetMouse(myWin);
% end
% while 1
% [x,y,buttons] = GetMouse(myWin);
% if ~buttons(1)
% break;
% end
% end
Screen('CloseAll')
end
end
function reco(wavfilename, voicetrigger, maxsecs)
%
% AssertOpenGL;
if nargin < 1
wavfilename = [];
end
if nargin < 2
voicetrigger = [];
end
if isempty(voicetrigger)
voicetrigger = 0;
end
if nargin < 3
maxsecs = [];
end
if isempty(maxsecs)
maxsecs = inf;
end
InitializePsychSound;
freq = 44100;
pahandle = PsychPortAudio('Open', [], 2, 0, freq, 2);
PsychPortAudio('GetAudioData', pahandle, 10);
PsychPortAudio('Start', pahandle, 0, 0, 1);
if voicetrigger > 0
% Yes. Fetch audio data and check against threshold:
level = 0;
% Repeat as long as below trigger-threshold:
while level < voicetrigger
% Fetch current audiodata:
[audiodata offset overflow tCaptureStart] = PsychPortAudio('GetAudioData', pahandle);
% Compute maximum signal amplitude in this chunk of data:
if ~isempty(audiodata)
level = max(abs(audiodata(1,:)));
else
level = 0;
end
% Below trigger-threshold?
if level < voicetrigger
% Wait for a millisecond before next scan:
WaitSecs(0.0001);
end
end
else
% Start with empty sound vector:
recordedaudio = [];
end
s = PsychPortAudio('GetStatus', pahandle)
while ~KbCheck && ((length(recordedaudio) / s.SampleRate) < maxsecs)
% Wait a second...
WaitSecs(1);
% Query current capture status and print it to the Matlab window:
s = PsychPortAudio('GetStatus', pahandle);
% Print it:
fprintf('\n\nAudio capture started, press any key for about 1 second to quit.\n');
fprintf('This is some status output of PsychPortAudio:\n');
disp(s);
% Retrieve pending audio data from the drivers internal ringbuffer:
audiodata = PsychPortAudio('GetAudioData', pahandle);
nrsamples = size(audiodata, 2);
% Plot it, just for the fun of it:
plot(1:nrsamples, audiodata(1,:), 'r', 1:nrsamples, audiodata(2,:), 'b');
drawnow;
% And attach it to our full sound vector:
recordedaudio = [recordedaudio audiodata]; %#ok<AGROW>
end
PsychPortAudio('Stop', pahandle);
audiodata = PsychPortAudio('GetAudioData', pahandle);
recordedaudio = [recordedaudio audiodata];
PsychPortAudio('Close', pahandle);
if ~isempty(wavfilename)
psychwavwrite(transpose(recordedaudio), 44100, 16, wavfilename)
end
fprintf('helal lan!\n');
ListenChar(2);
end

Modified longest common substring

Given two strings, what is an efficient algorithm to find the number and length of the longest common substrings, where two substrings are called common if:
1) at least x% of their characters are the same and at the same position.
2) the start and end indices of the substrings are the same.
Ex :
String 1 -> abedefkhj
String 2 -> kbfdfjhlo
suppose the x% being asked is 40,then, ans is,
5 1
where 5 is the longest length and 1 is the number of sub-strings in each string satisfying the given property. Sub-String is "abede" in string 1 and "kbfdf" in string 2.
You can use something like the Levenshtein distance without deletions and insertions.
Build a table where every element [i, j] is the error (the number of mismatched characters) for the substring from position [i] to position [j].
foo(string a, string b, int x):
    n = min(a.length, b.length)
    // error[start][end] = number of mismatches between a[start..end] and b[start..end]
    for (start: [0 -> n-1]):
        errors = 0
        for (end: [start -> n-1]):
            if a[end] != b[end]:
                errors = errors + 1
            error[start][end] = errors
    best_len = 0;
    best_pos = 0;
    for (end: [0 -> n-1]):
        for (start: [end -> 0]):
            len = end - start + 1
            error_percent = 100 * error[start][end] / len
            // at least x% matching means at most (100 - x)% mismatching
            if (error_percent <= 100 - x and len > best_len):
                best_len = len
                best_pos = start
    return (best_len, best_pos)
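A runnable variant of the same idea, sketched in Python. Instead of a full two-dimensional error table it uses a prefix sum of matching positions, which gives the same per-window counts in O(n^2) total time. The function name longest_similar_substring and the (length, count) return value are assumptions made for this sketch; the pseudocode above returns a position rather than a count.

```python
def longest_similar_substring(a, b, x):
    """Return (best_len, count): the longest window length at which at least
    x percent of the aligned characters of a and b match, and the number of
    windows of that length that satisfy the condition."""
    n = min(len(a), len(b))

    # prefix[i] = number of positions k < i with a[k] == b[k]
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + (1 if a[i] == b[i] else 0)

    best_len = 0
    count = 0
    for start in range(n):
        for end in range(start, n):
            length = end - start + 1
            matches = prefix[end + 1] - prefix[start]
            # Integer form of matches / length >= x / 100.
            if 100 * matches >= x * length:
                if length > best_len:
                    best_len, count = length, 1
                elif length == best_len:
                    count += 1
    return best_len, count
```

For the strings in the question, longest_similar_substring("abedefkhj", "kbfdfjhlo", 40) gives a longest length of 5. How windows of equal length should be counted depends on how the original problem statement defines the count, so treat the second value as one possible reading.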

Finding minimum moves required for making 2 strings equal

This is a question from an online coding challenge (which has already finished).
I just need some guidance on how to approach it.
Problem Statement:
We have two strings A and B with the same super set of characters. We need to change these strings to obtain two equal strings. In each move we can perform one of the following operations:
1. swap two consecutive characters of a string
2. swap the first and the last characters of a string
A move can be performed on either string.
What is the minimum number of moves that we need in order to obtain two equal strings?
Input Format and Constraints:
The first and the second line of the input contain the two strings A and B. It is guaranteed that both strings are made up of the same characters.
1 <= length(A) = length(B) <= 2000
All the input characters are between 'a' and 'z'
Output Format:
Print the minimum number of moves to the only line of the output
Sample input:
aab
baa
Sample output:
1
Explanation:
Swap the first and last character of the string aab to convert it to baa. The two strings are now equal.
EDIT: Here is my first try, but I'm getting the wrong output. Can someone tell me what is wrong with my approach?
int minStringMoves(char* a, char* b) {
int length, pos, i, j, moves=0;
char *ptr;
length = strlen(a);
for(i=0;i<length;i++) {
// Find the first occurrence of b[i] in a
ptr = strchr(a,b[i]);
pos = ptr - a;
// If its the last element, swap with the first
if(i==0 && pos == length-1) {
swap(&a[0], &a[length-1]);
moves++;
}
// Else swap from current index till pos
else {
for(j=pos;j>i;j--) {
swap(&a[j],&a[j-1]);
moves++;
}
}
// If equal, break
if(strcmp(a,b) == 0)
break;
}
return moves;
}
Take a look at this example:
aaaaaaaaab
abaaaaaaaa
Your solution: 8
aaaaaaaaab -> aaaaaaaaba -> aaaaaaabaa -> aaaaaabaaa -> aaaaabaaaa ->
aaaabaaaaa -> aaabaaaaaa -> aabaaaaaaa -> abaaaaaaaa
Proper solution: 2
aaaaaaaaab -> baaaaaaaaa -> abaaaaaaaa
You should check if swapping in the other direction would give you better result.
But sometimes you will also ruin the previous part of the string. eg:
caaaaaaaab
cbaaaaaaaa
caaaaaaaab -> baaaaaaaac -> abaaaaaaac
You need another swap here to put back the 'c' to the first place.
The proper algorithm is probably even more complex, but you can see now what's wrong in your solution.
The A* algorithm might work for this problem.
The initial node will be the original string.
The goal node will be the target string.
Each child of a node will be all possible transformations of that string.
The current cost g(x) is simply the number of transformations thus far.
The heuristic h(x) is half the number of characters in the wrong position.
Since h(x) is admissible (because a single transformation can't put more than 2 characters in their correct positions), the path to the target string will give the least number of transformations possible.
However, an elementary implementation will likely be too slow. Calculating all possible transformations of a string would be rather expensive.
Note that there's a lot of similarity between a node's siblings (its parent's children) and its children. So you may be able to just calculate all transformations of the original string and, from there, simply copy and recalculate data involving changed characters.
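A minimal sketch of this A* search in Python, with the elementary (slow) neighbour generation that the answer warns about. The names neighbours and astar_min_moves are made up for this example. The heuristic is the one described above (half the number of misplaced characters), rounded up, which stays admissible because the number of moves is an integer.

```python
import heapq


def neighbours(s):
    """All strings reachable from s in one move."""
    n = len(s)
    result = []
    for i in range(n - 1):  # swap two consecutive characters
        c = list(s)
        c[i], c[i + 1] = c[i + 1], c[i]
        result.append("".join(c))
    if n > 2:  # swap first and last (for n == 2 this equals the swap above)
        c = list(s)
        c[0], c[-1] = c[-1], c[0]
        result.append("".join(c))
    return result


def astar_min_moves(start, goal):
    """Minimum number of moves turning start into goal, or None if impossible."""
    def h(s):
        mismatched = sum(1 for p, q in zip(s, goal) if p != q)
        return (mismatched + 1) // 2  # one move fixes at most 2 positions

    best = {start: 0}  # cheapest known cost g(x) per string
    heap = [(h(start), 0, start)]
    while heap:
        _, g, s = heapq.heappop(heap)
        if s == goal:
            return g
        if g > best[s]:
            continue  # stale queue entry
        for t in neighbours(s):
            if g + 1 < best.get(t, float("inf")):
                best[t] = g + 1
                heapq.heappush(heap, (g + 1 + h(t), g + 1, t))
    return None
```

Because every move is reversible, moves applied to B can be replayed in reverse on A, so searching from A to B alone covers the "either string" rule. The search space still grows very quickly, so this sketch is only practical for short strings, not for the stated limit of 2000 characters.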
You can use dynamic programming. Go over all swap possibilities while storing all the intermediate results along with the minimal number of steps it took to get there. In effect, you calculate the minimum number of steps for every possible target string that can be obtained by applying the given rules any number of times. Once you have calculated it all, you can print the minimum number of steps needed to reach the target string. Here's the sample code in JavaScript, and its usage for the "aab" and "baa" example:
function swap(str, i, j) {
var s = str.split("");
s[i] = str[j];
s[j] = str[i];
return s.join("");
}
function calcMinimumSteps(current, stepsCount)
{
if (typeof(memory[current]) !== "undefined") {
if (memory[current] > stepsCount) {
memory[current] = stepsCount;
} else if (memory[current] < stepsCount) {
stepsCount = memory[current];
}
} else {
memory[current] = stepsCount;
calcMinimumSteps(swap(current, 0, current.length-1), stepsCount+1);
for (var i = 0; i < current.length - 1; ++i) {
calcMinimumSteps(swap(current, i, i + 1), stepsCount+1);
}
}
}
var memory = {};
calcMinimumSteps("aab", 0);
alert("Minimum steps count: " + memory["baa"]);
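The same exhaustive exploration can also be written as a breadth-first search, sketched here in Python. This is a swapped-in technique rather than a translation of the JavaScript above: BFS visits strings in order of increasing move count, so the first time a string is reached its count is already minimal and no stored value needs to be revised. The name min_moves_bfs is made up for this example.

```python
from collections import deque


def min_moves_bfs(a, b):
    """Minimum number of moves turning a into b, or None if b is unreachable."""
    if sorted(a) != sorted(b):
        return None  # different characters: no sequence of swaps can help

    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        s, steps = queue.popleft()
        if s == b:
            return steps
        n = len(s)
        swaps = [(i, i + 1) for i in range(n - 1)]  # consecutive characters
        if n > 2:
            swaps.append((0, n - 1))                # first and last character
        for i, j in swaps:
            c = list(s)
            c[i], c[j] = c[j], c[i]
            t = "".join(c)
            if t not in seen:
                seen.add(t)
                queue.append((t, steps + 1))
    return None
```

Like the JavaScript version, this enumerates every reachable string in the worst case, so it only suits short inputs.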
Here is the Ruby logic for this problem; copy this code into an .rb file and execute it.
str1 = "education" #Sample first string
str2 = "cnatdeiou" #Sample second string
moves_count = 0
no_swap = 0
count = str1.length - 1
def ends_swap(str1,str2)
str2 = swap_strings(str2,str2.length-1,0)
return str2
end
def swap_strings(str2,cp,np)
current_string = str2[cp]
new_string = str2[np]
str2[cp] = new_string
str2[np] = current_string
return str2
end
def consecutive_swap(str,current_position, target_position)
counter=0
diff = current_position > target_position ? -1 : 1
while current_position!=target_position
new_position = current_position + diff
str = swap_strings(str,current_position,new_position)
# p "-------"
# p "CP: #{current_position} NP: #{new_position} TP: #{target_position} String: #{str}"
current_position+=diff
counter+=1
end
return counter,str
end
while(str1 != str2 && count!=0)
counter = 1
if str1[-1]==str2[0]
# p "cross match"
str2 = ends_swap(str1,str2)
else
# p "No match for #{str2}-- Count: #{count}, TC: #{str1[count]}, CP: #{str2.index(str1[count])}"
str = str2[0..count]
cp = str.rindex(str1[count])
tp = count
counter, str2 = consecutive_swap(str2,cp,tp)
count-=1
end
moves_count+=counter
# p "Step: #{moves_count}"
# p str2
end
p "Total moves: #{moves_count}"
Please feel free to suggest any improvements in this code.
Try this code. Hope this will help you.
import java.util.Scanner;
public class TwoStringIdentical {
static int lcs(String str1, String str2, int m, int n) {
int L[][] = new int[m + 1][n + 1];
int i, j;
for (i = 0; i <= m; i++) {
for (j = 0; j <= n; j++) {
if (i == 0 || j == 0)
L[i][j] = 0;
else if (str1.charAt(i - 1) == str2.charAt(j - 1))
L[i][j] = L[i - 1][j - 1] + 1;
else
L[i][j] = Math.max(L[i - 1][j], L[i][j - 1]);
}
}
return L[m][n];
}
static void printMinTransformation(String str1, String str2) {
int m = str1.length();
int n = str2.length();
int len = lcs(str1, str2, m, n);
System.out.println((m - len)+(n - len));
}
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
String str1 = scan.nextLine();
String str2 = scan.nextLine();
printMinTransformation(str1, str2); // use the two lines read from standard input
}
}

How do you sort and efficiently find elements in a cell array (of strings) in Octave?

Is there built-in functionality for this?
GNU Octave search a cell array of strings in linear time O(n):
(The 15-year-old code in this answer was tested and is correct on GNU Octave 3.8.2, 5.2.0 and 7.1.0)
The other answer uses cellidx, which has been deprecated by Octave; it still runs, but they say to use ismember instead, like this:
%linear time string index search.
a = ["hello"; "unsorted"; "world"; "moobar"]
b = cellstr(a)
%b =
%{
% [1,1] = hello
% [2,1] = unsorted
% [3,1] = world
% [4,1] = moobar
%}
find(ismember(b, 'world')) %returns 3
ismember finds 'world' in index slot 3. This is an expensive linear-time O(n) operation because it has to iterate through all elements whether or not the value is found.
To achieve a logarithmic-time O(log n) solution, your list needs to come pre-sorted, and then you can use binary search.
If your cell array is already sorted, you can do O(log n) worst case:
function i = binsearch(array, val, low, high)
%binary search algorithm for numerics, Usage:
%myarray = [ 30, 40, 50.15 ]; %already sorted list
%binsearch(myarray, 30, 1, 3) %item 30 is in slot 1
if ( high < low )
i = 0;
else
mid = floor((low + high) / 2);
if ( array(mid) > val )
i = binsearch(array, val, low, mid-1);
elseif ( array(mid) < val )
i = binsearch(array, val, mid+1, high);
else
i = mid;
endif
endif
endfunction
function i = binsearch_str(array, val, low, high)
% binary search for strings, usage:
%myarray2 = [ "abc"; "def"; "ghi"]; #already sorted list
%binsearch_str(myarray2, "abc", 1, 3) #item abc is in slot 1
if ( high < low )
i = 0;
else
mid = floor((low + high) / 2);
if ( mystrcmp(array(mid, [1:end]), val) == 1 )
i = binsearch_str(array, val, low, mid-1);
elseif ( mystrcmp(array(mid, [1:end]), val) == -1 )
i = binsearch_str(array, val, mid+1, high);
else
i = mid;
endif
endif
endfunction
function ret = mystrcmp(a, b)
%this function is just an octave string compare, its behavior follows the
%strcmp(str1,str2)'s in C and java.lang.String.compareTo(...)'s in Java,
%that is:
% -returns 1 if string a > b
% -returns 0 if string a == b
% -return -1 if string a < b
% The gt() operator does not support cell array. If the single word
% is passed as an one-element cell array, converts it to a string.
a_as_string = a;
if iscellstr( a )
a_as_string = a{1}; %a was passed as a single-element cell array.
endif
% The gt() operator does not support cell array. If the single word
% is passed as an one-element cell array, converts it to a string.
b_as_string = b;
if iscellstr( b )
b_as_string = b{1}; %b was passed as a single-element cell array.
endif
% Space-pad the shortest word so as they can be used with gt() and lt() operators.
if length(a_as_string) > length( b_as_string )
b_as_string( length( b_as_string ) + 1 : length( a_as_string ) ) = " ";
elseif length(a_as_string) < length( b_as_string )
a_as_string( length( a_as_string ) + 1 : length( b_as_string ) ) = " ";
endif
letters_gt = gt(a_as_string, b_as_string); %list of boolean a > b
letters_lt = lt(a_as_string, b_as_string); %list of boolean a < b
ret = 0;
%octave makes us roll our own string compare because
%strings are arrays of numerics
len = length(letters_gt);
for i = 1:len
if letters_gt(i) > letters_lt(i)
ret = 1;
return
elseif letters_gt(i) < letters_lt(i)
ret = -1;
return
endif
end;
endfunction
%Assuming that myarray is already sorted (it must be for binary
%search to finish in logarithmic O(log n) worst-case time), then do
myarray = [ 30, 40, 50.15 ]; %already sorted list
binsearch(myarray, 30, 1, 3) %item 30 is in slot 1
binsearch(myarray, 40, 1, 3) %item 40 is in slot 2
binsearch(myarray, 50, 1, 3) %50 does not exist so return 0
binsearch(myarray, 50.15, 1, 3) %50.15 is in slot 3
%same but for strings:
myarray2 = [ "abc"; "def"; "ghi"]; %already sorted list
binsearch_str(myarray2, "abc", 1, 3) %item abc is in slot 1
binsearch_str(myarray2, "def", 1, 3) %item def is in slot 2
binsearch_str(myarray2, "zzz", 1, 3) %zzz does not exist so return 0
binsearch_str(myarray2, "ghi", 1, 3) %item ghi is in slot 3
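For comparison, the same binary search over an already sorted list of strings can be sketched in Python with the standard bisect module. The function name find_sorted is made up for this example, and note two differences from the Octave code above: indices are 0-based, and a missing value is reported as -1 rather than 0.

```python
from bisect import bisect_left


def find_sorted(items, value):
    """Return the index of value in the sorted list items, or -1 if absent."""
    # bisect_left returns the leftmost position where value could be inserted
    # while keeping items sorted; it performs O(log n) comparisons.
    i = bisect_left(items, value)
    if i < len(items) and items[i] == value:
        return i
    return -1
```

Python compares strings lexicographically out of the box, so no hand-written comparison helper like mystrcmp is needed here.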
To sort your array if it isn't already:
The complexity of sorting depends on the kind of data you have and on whatever sorting algorithm the GNU Octave authors selected; it's somewhere between O(n*log(n)) and O(n*n).
myarray = [ 9, 40, -3, 3.14, 20 ]; %not sorted list
myarray = sort(myarray)
myarray2 = [ "the"; "cat"; "sat"; "on"; "the"; "mat"]; %not sorted list
myarray2 = sortrows(myarray2)
Credit for the code fixes that make this backward compatible with GNU Octave 3, 5 and 7 goes to @Paulo Carvalho in the other answer here.
Yes check this: http://www.obihiro.ac.jp/~suzukim/masuda/octave/html3/octave_36.html#SEC75
a = ["hello"; "world"];
c = cellstr (a)
⇒ c =
{
[1,1] = hello
[2,1] = world
}
>>> cellidx(c, 'hello')
ans = 1
>>> cellidx(c, 'world')
ans = 2
The cellidx solution does not meet the OP's efficiency requirement, and is deprecated (as noted by help cellidx).
Håvard Geithus in a comment suggested using the lookup() function on a sorted cell array of strings, which is significantly more efficient than cellidx. It's still a binary search though, whereas most modern languages (and even many 20 year old ones) give us easy access to associative arrays, which would be a much better approach.
While Octave doesn't obviously have associative arrays, that's effectively what the interpreter is using for Octave's variables, including structs, so you can make use of that, as described here:
http://math-blog.com/2011/05/09/associative-arrays-and-cellular-automata-in-octave/
Built-in Function: struct ("field", value, "field", value,...)
Built-in Function: isstruct (expr)
Built-in Function: rmfield (s, f)
Function File: [k1,..., v1] = setfield (s, k1, v1,...)
Function File: [t, p] = orderfields (s1, s2)
Built-in Function: fieldnames (struct)
Built-in Function: isfield (expr, name)
Function File: [v1,...] = getfield (s, key,...)
Function File: substruct (type, subs,...)
Converting Matlab to Octave is there a containers.Map equivalent? suggests using javaObject("java.util.Hashtable"). That would come with some setup overhead, but would be a performance win if you're using it a lot. It may even be viable to link in some library written in C or C++? Do think about whether this is a maintainable option though.
Caveat: I'm relatively new to Octave, and writing this up as I research it myself (which is how I wound up here). I haven't yet run tests on the efficiency of these techniques, and while I've got a fair knowledge of the underlying algorithms, I may be making unreasonable assumptions about what's actually efficient in Octave.
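To make the associative-array idea concrete, here is a small sketch in Python, where the built-in dict plays the role that a struct or java.util.Hashtable would play in Octave. The function name build_index is made up for this example, and nothing here measures how the Octave equivalents actually perform.

```python
def build_index(items):
    """Map each string to the position of its first occurrence in items."""
    index = {}
    for pos, item in enumerate(items):
        # setdefault keeps the first position when a string is repeated.
        index.setdefault(item, pos)
    return index
```

Building the index is a one-off O(n) pass; afterwards a lookup such as build_index(words).get("world") is a hash lookup and does not require the list to be sorted. Positions are 0-based here, unlike Octave's 1-based indices.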
This is a version of mystrcmp() that works in Octave of recent version (7.1.0):
function ret = mystrcmp(a, b)
%this function is just an octave string compare, its behavior follows the
%strcmp(str1,str2)'s in C and java.lang.String.compareTo(...)'s in Java,
%that is:
% -returns 1 if string a > b
% -returns 0 if string a == b
% -return -1 if string a < b
% The gt() operator does not support cell array. If the single word
% is passed as an one-element cell array, converts it to a string.
a_as_string = a;
if iscellstr( a )
a_as_string = a{1}; %a was passed as a single-element cell array.
endif
% The gt() operator does not support cell array. If the single word
% is passed as an one-element cell array, converts it to a string.
b_as_string = b;
if iscellstr( b )
b_as_string = b{1}; %b was passed as a single-element cell array.
endif
% Space-pad the shortest word so as they can be used with gt() and lt() operators.
if length(a_as_string) > length( b_as_string )
b_as_string( length( b_as_string ) + 1 : length( a_as_string ) ) = " ";
elseif length(a_as_string) < length( b_as_string )
a_as_string( length( a_as_string ) + 1 : length( b_as_string ) ) = " ";
endif
letters_gt = gt(a_as_string, b_as_string); %list of boolean a > b
letters_lt = lt(a_as_string, b_as_string); %list of boolean a < b
ret = 0;
%octave makes us roll our own string compare because
%strings are arrays of numerics
len = length(letters_gt);
for i = 1:len
if letters_gt(i) > letters_lt(i)
ret = 1;
return
elseif letters_gt(i) < letters_lt(i)
ret = -1;
return
endif
end;
endfunction
