How can I change only vowels from uppercase to lowercase and vice versa (MATLAB) - string

I have to change every vowel in a string to upper or lower case depending on what it already is, so "UPPERCASE lowercase" becomes "uPPeRCaSe lOwErcAsE".
So far I have had no success with this approach:
str = 'UPPERCASE lowercase';
vow = 'aeiou';
vowm = 'AEIOU';
for k = 1:5
    if str(str == vow(k))
        str(str == vow(k)) = vowm(k);
    else
        if str(str == vowm(k))
            str(str == vowm(k)) = vow(k);
        end
    end
end
Expected output: "uPPeRCaSe lOwErcAsE"
Actual output: "uPPERCASE lOwErCAsE"
I am extremely new to MATLAB and I'm kind of lost.
I appreciate your help.

Use ismember to find all occurrences of each type of vowel (uppercase or lowercase), and then use upper and lower to convert them:
str = 'UPPERCASE lowercase'; %// original string
indl = ismember(str, 'aeiou'); %// locations of lowercase vowels
indu = ismember(str, 'AEIOU'); %// locations of uppercase vowels
str(indl) = upper(str(indl)); %// convert from lower to upper
str(indu) = lower(str(indu)); %// convert from upper to lower

As listed in the question, I am assuming the following as the inputs -
%// Inputs
str= 'UPPERCASE lowercase'
vow = 'aeiou'
vowm = 'AEIOU'
Approach #1
One approach based on changem that is used to substitute values -
%// Create maps from input string to reflect changes from lower to upper
%// and vice versa
map1 = changem(str,vowm,vow)
map2 = changem(str,vow,vowm)
%// Find indices to be changed for lower to upper change and vice versa change
idx1 = find(map1~=str)
idx2 = find(map2~=str)
%// Selectively change input string based on the indices to be changed and maps
str(idx1) = map1(idx1)
str(idx2) = map2(idx2)
Approach #2
With bsxfun -
%// Find indices to be changed for lower to upper change and vice versa change
[~,idx1] = find(bsxfun(@eq,str,vow'))
[~,idx2] = find(bsxfun(@eq,str,vowm'))
%// Selectively change input string based on the indices to be changed and maps
str(idx1) = str(idx1)-32
str(idx2) = str(idx2)+32

You could use regular expressions as well.
I don't know how different this is from the other answers, though...
str= 'UPPERCASE lowercase';
vow = '[aeiou]';
vowm = '[AEIOU]';
indl = regexp(str,vow);
indu = regexp(str,vowm);
str(indl) = upper(str(indl));
str(indu) = lower(str(indu));

Related

Converting letters into NATO alphabet in MATLAB

I want to write a code in MATLAB that converts a letter into NATO alphabet. Such as the word 'hello' would be re-written as Hotel-Echo-Lima-Lima-Oscar. I have been having some trouble with the code. So far I have the following:
function natoText = textToNato(plaintext)
plaintext = lower(plaintext);
r = zeros(1, length(plaintext))
%Define my NATO alphabet
natalph = ["Alpha","Bravo","Charlie","Delta","Echo","Foxtrot","Golf", ...
"Hotel","India","Juliet","Kilo","Lima","Mike","November","Oscar", ...
"Papa","Quebec","Romeo","Sierra","Tango","Uniform","Victor",...
"Whiskey","Xray","Yankee","Zulu"];
%Define the normal lower alphabet
noralpha = ['a' : 'z'];
%Now we need to make a loop for matlab to check for each letter
for i = 1:length(text)
for j = 1:26
n = r(i) == natalph(j);
if noralpha(j) == text(i) : n
else r(i) = r(i)
natoText = ''
end
end
end
for v = 1:length(plaintext)
natoText = natoText + r(v) + ''
natoText = natoText(:,-1)
end
end
I know the above code is a mess, and I am not really sure what I have been doing. Is there anyone who knows a better way of doing this? Can I modify the above code so that it works?
Right now, when I run the code, I get an empty plot, and I don't know why, because I have not asked for a plot in any line.
You can actually do your conversion in one line. Given your string array natalph:
plaintext = 'hello'; % Your input; could also be "hello"
natoText = strjoin(natalph(char(lower(plaintext))-96), '-');
And the result:
natoText =
string
"Hotel-Echo-Lima-Lima-Oscar"
This uses a trick that character arrays can be treated as numeric arrays of their ASCII equivalent values. The code char(lower(plaintext))-96 converts plaintext to lowercase, then to a character array (if it isn't already) and implicitly converts it to a numeric vector of ASCII values by subtracting 96. Since 'a' is equal to 97, this creates an index vector containing the values 1 ('a') through 26 ('z'). This is used to index the string array natalph, and these are then joined together with hyphens.

Is there a way to substring, which is between two words in the string in Python?

My question is more or less similar to:
Is there a way to substring a string in Python?
but it's more specifically oriented.
How can I get the part of a string that is located between two known words in the initial string?
Example:
myString = "this is the initial string"
Substring = "initial"
knowing that "the" and "string" are the two known words in the string that can be used to get the substring.
Thank you!
You can start with simple string manipulation here. str.index is your best friend there, as it will tell you the position of a substring within a string; and you can also start searching somewhere later in the string:
>>> myString = "this is the initial string"
>>> myString.index('the')
8
>>> myString.index('string', 8)
20
Looking at the slice [8:20], we already get close to what we want:
>>> myString[8:20]
'the initial '
Of course, since we found the beginning position of 'the', we need to account for its length. And finally, we might want to strip whitespace:
>>> myString[8 + 3:20]
' initial '
>>> myString[8 + 3:20].strip()
'initial'
Combined, you would do this:
startIndex = myString.index('the')
substring = myString[startIndex + 3 : myString.index('string', startIndex)].strip()
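The combined steps above can be wrapped in a small helper. This is only a minimal sketch: the name `between` and the choice to return `None` when a marker is missing are assumptions made for illustration, not part of the original answer.

```python
def between(text, start_word, end_word):
    """Return the stripped text between the first start_word and the next end_word.

    Hypothetical helper: returns None if either marker is missing.
    """
    try:
        # Skip past the start marker so it is not part of the result
        start = text.index(start_word) + len(start_word)
        end = text.index(end_word, start)
    except ValueError:
        # str.index raises ValueError when the substring is not found
        return None
    return text[start:end].strip()


print(between("this is the initial string", "the", "string"))  # initial
```

Using `len(start_word)` instead of the literal `3` keeps the slice correct if the start marker changes.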
If you want to look for matches multiple times, then you just need to repeat doing this while looking only at the rest of the string. Since str.index will only ever find the first match, you can use this to scan the string very efficiently:
searchString = 'this is the initial string but I added the relevant string pair a few more times into the search string.'
startWord = 'the'
endWord = 'string'
results = []
index = 0
while True:
    try:
        startIndex = searchString.index(startWord, index)
        endIndex = searchString.index(endWord, startIndex)
        results.append(searchString[startIndex + len(startWord):endIndex].strip())
        # move the index to the end
        index = endIndex + len(endWord)
    except ValueError:
        # str.index raises a ValueError if there is no match; in that
        # case we know that we’re done looking at the string, so we can
        # break out of the loop
        break
print(results)
# ['initial', 'relevant', 'search']
You can also try something like this:
mystring = "this is the initial string"
mystring = mystring.strip().split(" ")
for i in range(1, len(mystring) - 1):
    if mystring[i-1] == "the" and mystring[i+1] == "string":
        print(mystring[i])
I suggest using a combination of list, split and join methods.
This should help if you are looking for more than 1 word in the substring.
Turn the string into a list of words:
words = string.split()
Get the index of your opening and closing markers then return the substring:
start = words.index('the')
end = words.index('string')
substring = ' '.join(words[start + 1:end])
You may want to check that both markers are present before proceeding, since list.index raises a ValueError when a word is not found.
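A minimal sketch of that validity check for the word-list approach. The function name `words_between` and the `None` return value are hypothetical choices for illustration:

```python
def words_between(text, start_word, end_word):
    """Return the words between two marker words, joined by single spaces.

    Hypothetical helper: returns None if the start marker is missing or
    no end marker follows it.
    """
    words = text.split()
    if start_word not in words:
        return None
    start = words.index(start_word)
    try:
        # Only look for the end marker after the start marker
        end = words.index(end_word, start + 1)
    except ValueError:
        return None
    return " ".join(words[start + 1:end])


print(words_between("this is the very first initial string", "the", "string"))
# very first initial
```

Because the text is split into whole words, a marker such as 'the' cannot accidentally match inside a longer word like 'other'.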
If your problem gets more complex, e.g. multiple occurrences of the marker pair, I suggest using a regular expression.
import re
substring = ''.join(re.findall(r'the (.+?) string', string))
re.findall returns the matches as a list, so drop the join if you want to keep the substrings separate.
The spaces in the pattern keep the surrounding whitespace out of the captured group; you can modify the pattern to suit your needs.
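As a hedged variation on the regular-expression idea, the sketch below keeps the matches as a list and escapes the markers so they are treated as literal text. The helper name `find_between` and the use of word boundaries are assumptions for illustration, not something the answer prescribes:

```python
import re


def find_between(text, start_word, end_word):
    """Return every stripped span found between start_word and end_word.

    Hypothetical helper; the markers must appear as whole words.
    """
    # re.escape treats the markers literally; the word boundaries stop
    # 'the' from matching inside words such as 'another' or 'then'.
    pattern = r"\b{}\b(.+?)\b{}\b".format(re.escape(start_word), re.escape(end_word))
    return [match.strip() for match in re.findall(pattern, text)]


print(find_between("this is the initial string and the other string", "the", "string"))
# ['initial', 'other']
```

The lazy quantifier `.+?` makes each match stop at the nearest end marker rather than the last one in the text.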

I need help converting characters in a string to numerical values in Matlab [duplicate]

I'm looking for a quick way to convert a large character array of lowercase letters, spaces and periods into a set of integers and vice-versa in MATLAB.
Usually I would use the double and char functions, but I would like to use a special set of integers to represent each letter (so that 'a' matches with '1', 'b' matches with '2'.... 'z' matches with 26, ' ' matches with 27, and '.' matches with 28)
The current method that I have is:
text = 'quick brown fox jumps over dirty dog';
alphabet ='abcdefghijklmnopqrstuvwxyz .';
converted_text = double(text);
converted_alphabet = double(alphabet);
numbers = nan(28,1)
for i = 1:28
numbers(converted_text(i)==converted_alphabet(i)) = i;
end
newtext = nan(size(numbers))
for i = 1:size(numbers,1)
newtext(numbers==i) = alphabet(i)
end
Unfortunately this takes quite a bit of time for large arrays, and I'm wondering if there is a quicker way to do this in MATLAB?
An easy way is to use ismember():
[~,pos] = ismember(text,alphabet)
Or use the implicit conversion carried out by -:
out = text - 'a' + 1;
note that blanks will have -64 and full stops -50, which means that you will need:
out(out == -64) = 27;
out(out == -50) = 28;
Speed considerations:
For small arrays, the latter solution is slightly faster if you are happy to leave blanks and full stops with their negative index.
For large arrays (1e4 times longer, on my machine), the latter solution is twice as fast as ismember().
Going back:
alphabet(out)

How to calculate word co-occurence

I have a string of characters of length 50 say representing a sequence abbcda.... for alphabets taken from the set A={a,b,c,d}.
I want to calculate how many times b is followed by another b (n-grams) where n=2.
Similarly, how many times a particular character is repeated thrice n=3 consecutively, say in the input string abbbcbbb etc so here the number of times b occurs in a sequence of 3 letters is 2.
To find the number of non-overlapping 2-grams you can use
numel(regexp(str, 'b{2}'))
and for 3-grams
numel(regexp(str, 'b{3}'))
to count overlapping 2-grams use positive lookahead
numel(regexp(str, '(b)(?=b{1})'))
and for overlapping n-grams
numel(regexp(str, ['(b)(?=b{' num2str(n-1) '})']))
EDIT
In order to find number of occurrences of an arbitrary sequence use the first element in first parenthesis and the rest after equality sign, to find ba use
numel(regexp(str, '(b)(?=a)'))
to find bda use
numel(regexp(str, '(b)(?=da)'))
Building on the proposal by Magla:
str = 'abcdabbcdaabbbabbbb'; % for example
index_single = ismember(str, 'b');
index_digram = index_single(1:end-1)&index_single(2:end);
index_trigram = index_single(1:end-2)&index_single(2:end-1)&index_single(3:end);
You may try this piece of code that uses strfind, which returns the starting index of every (possibly overlapping) occurrence of a pattern.
%generate string (50 char, 'a' to 'd')
str = char(floor(97 + (101-97).*rand(1,50)))
%digram case
index_digram = strfind(str, 'aa');
%trigram case
index_trigram = strfind(str, 'aaa');
EDIT
Probabilities can be computed with
proba = numel(index_digram)/(length(str)-1);
This will find all n-grams and count them:
numberOfGrams = 5;
alphabetSize = 4; % letters 'a' to 'd'
s = char(floor(rand(1,1000)*alphabetSize)+double('a'));
ngrams = cell(1, numberOfGrams);
for n = 2:numberOfGrams
    strLength = size(s,2)-n+1;
    indices = repmat((1:strLength)',1,n)+repmat(1:n,strLength,1)-1;
    grams = s(indices);
    % encode each n-gram in base alphabetSize so distinct grams get distinct codes
    gramNumbers = (double(grams)-double('a'))*(alphabetSize.^(0:n-1))';
    [uniqueGrams, gramInd] = unique(gramNumbers);
    count = hist(gramNumbers,uniqueGrams);
    ngrams(n) = {struct('gram',grams(gramInd,:),'count',count)};
end
edit:
the result will be:
ngrams{n}.gram %a list of all n letter sequences in the string
ngrams{n}.count(x) %the number of times the sequence ngrams{n}.gram(x,:) appears
