Replace substring that lies between two positions - string

I have a string S in MATLAB. How can I replace a substring of S with some pattern P? I only know the first and last indices of the substring in S. What is the approach?

How about that?
str = 'My dog is called Jim'; %// original string
a = 4; %// starting index
b = 6; %// last index
replace = 'hamster'; %// new pattern
newstr = [str(1:a-1) replace str(b+1:end)]
returns:
newstr = My hamster is called Jim
If the substring you want to replace has the same number of characters as the new pattern, you can use simple indexed assignment:
str(a:b) = 'cat'
returns:
str = My cat is called Jim

Related

Check if text contains a string and keep matched words from original text:

a = "Beauty Store is all you need!"
b = "beautystore"
test1 = ''.join(e for e in a if e.isalnum())
test2 = test1.lower()
test3 = [test2]
match = [s for s in test3 if b in s]
if match != []:
    print(match)
>>>['beautystoreisallyouneed']
What I want is: "Beauty Store"
I search for the keyword in the string and I want to return the keyword from the string in the original format (with capital letter and space between, whatever) of the string, but only the part that contains the keyword.
If the keyword only occurs once, this will give you the right solution:
a = "Beauty Store is all you need!"
b = "beautystore"
ind = range(len(a))
joined = [(letter, number) for letter, number in zip(a, ind) if letter.isalnum()]
searchtext = ''.join(el[0].lower() for el in joined)
pos = searchtext.find(b)
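# end the slice on the last matched character, so that no trailing
# separator is included and a match at the very end does not fail
original_text = a[joined[pos][1]:joined[pos + len(b) - 1][1] + 1]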
It saves the original position of each letter, joins them to the lowercase string, finds the position and then looks up the original positions again.
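The position bookkeeping described above can be wrapped in a small function. This is only a sketch: the name `find_original` and the choice to return `None` when the keyword is absent are assumptions made for illustration, not part of the original answer.

```python
def find_original(text, keyword):
    """Return the slice of text whose alphanumeric characters spell keyword.

    Hypothetical helper: returns None when the keyword is empty or not found.
    """
    needle = keyword.lower()
    if not needle:
        return None
    # Pair every lowercased alphanumeric character with its index in text.
    kept = [(low, i) for i, ch in enumerate(text) if ch.isalnum()
            for low in ch.lower()]
    squeezed = ''.join(low for low, _ in kept)
    pos = squeezed.find(needle)
    if pos == -1:
        return None
    first = kept[pos][1]                     # index of the first matched character
    last = kept[pos + len(needle) - 1][1]    # index of the last matched character
    return text[first:last + 1]


print(find_original("Beauty Store is all you need!", "beautystore"))
# Beauty Store
```

Like the answer above, this reports only the first occurrence of the keyword.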

Is there a way to get the substring between two words of a string in Python?

My question is more or less similar to:
Is there a way to substring a string in Python?
but it's more specifically oriented.
How can I get the part of a string that is located between two known words in the initial string?
Example:
myString = "this is the initial string"
Substring = "initial"
knowing that "the" and "string" are the two known words in the string that can be used to get the substring.
Thank you!
You can start with simple string manipulation here. str.index is your best friend there, as it will tell you the position of a substring within a string; and you can also start searching somewhere later in the string:
>>> myString = "this is the initial string"
>>> myString.index('the')
8
>>> myString.index('string', 8)
20
Looking at the slice [8:20], we already get close to what we want:
>>> myString[8:20]
'the initial '
Of course, since we found the beginning position of 'the', we need to account for its length. And finally, we might want to strip whitespace:
>>> myString[8 + 3:20]
' initial '
>>> myString[8 + 3:20].strip()
'initial'
Combined, you would do this:
startIndex = myString.index('the')
substring = myString[startIndex + 3 : myString.index('string', startIndex)].strip()
If you want to look for matches multiple times, then you just need to repeat doing this while looking only at the rest of the string. Since str.index will only ever find the first match, you can use this to scan the string very efficiently:
searchString = 'this is the initial string but I added the relevant string pair a few more times into the search string.'
startWord = 'the'
endWord = 'string'
results = []
index = 0
while True:
    try:
        startIndex = searchString.index(startWord, index)
        endIndex = searchString.index(endWord, startIndex)
        results.append(searchString[startIndex + len(startWord):endIndex].strip())
        # move the index to the end
        index = endIndex + len(endWord)
    except ValueError:
        # str.index raises a ValueError if there is no match; in that
        # case we know that we’re done looking at the string, so we can
        # break out of the loop
        break
print(results)
# ['initial', 'relevant', 'search']
You can also try something like this:
mystring = "this is the initial string"
mystring = mystring.strip().split(" ")
for i in range(1, len(mystring) - 1):
    if mystring[i-1] == "the" and mystring[i+1] == "string":
        print(mystring[i])
I suggest using a combination of list, split and join methods.
This should help if you are looking for more than 1 word in the substring.
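Turn the string into a list of words:
string = "this is the initial string"  # example input
words = string.split()
Get the indices of your opening and closing markers, then return the substring:
start = words.index('the')
end = words.index('string')
# join with a space so that a multi-word substring keeps its word boundaries
substring = ' '.join(words[start + 1:end])
You may want to check that both markers are present before proceeding, since list.index raises a ValueError otherwise.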
If your problem gets more complex, i.e., multiple occurrences of the pair of markers, I suggest using a regular expression:
import re
substring = ''.join(re.findall(r'the (.+?) string', string))
re.findall returns each matched substring as a separate list element, so drop the join if you want to keep them apart.
The spaces around the group in the pattern keep the spaces next to the markers out of the match; you can adapt this to your needs.
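As a rough sketch of the regular-expression route with several marker pairs, the snippet below keeps the matches as a list. The sample sentence is borrowed from the str.index answer above; the `\b` word boundaries and `\s+` separators are my own additions, assumed to be what you want when the markers must be whole words.

```python
import re

text = ('this is the initial string but I added the relevant string pair '
        'a few more times into the search string.')

# Non-greedy group: capture as little as possible between the two markers.
# \b stops 'the' and 'string' from matching inside longer words.
pattern = re.compile(r'\bthe\b\s+(.+?)\s+\bstring\b')

matches = pattern.findall(text)
print(matches)
# ['initial', 'relevant', 'search']
```

Because the group is non-greedy, each match stops at the nearest closing marker, and a multi-word substring such as 'very long' in 'the very long string' is returned in one piece.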

Python Join String to Produce Combinations For All Words in String

If my string is this: 'this is a string', how can I produce all possible combinations by joining each word with its neighboring word?
What this output would look like:
this is a string
thisis a string
thisisa string
thisisastring
thisis astring
this isa string
this isastring
this is astring
What I have tried:
s = 'this is a string'.split()
for i, l in enumerate(s):
    ''.join(s[0:i])+' '.join(s[i:])
This produces:
'this is a string'
'thisis a string'
'thisisa string'
'thisisastring'
I realize I need to change the s[0:i] part because it's statically anchored at 0, but I don't know how to move on to the next word (is) while still including the first word (this) in the output.
A simpler (and 3x faster than the accepted answer) way, using itertools.product:
import itertools
s = 'this is a string'
# escape literal % signs, then turn every space into a %s placeholder
s2 = s.replace('%', '%%').replace(' ', '%s')
for i in itertools.product((' ', ''), repeat=s.count(' ')):
    print(s2 % i)
You can also use itertools.product():
import itertools
s = 'this is a string'
words = s.split()
for t in itertools.product(range(2), repeat=len(words)-1):
    print(''.join([words[i]+t[i]*' ' for i in range(len(t))])+words[-1])
Well, it took me a little longer than I expected... this is actually trickier than I thought :)
The main idea:
The number of spaces in the string is the length of the split array minus 1. In our example there are 3 spaces:
'this is a string'
     ^  ^ ^
We'll take the binary representation of all the options to keep or drop each of the spaces, so in our case it'll be:
000
001
010
011
100
101
...
and for each option we'll generate the sentence respectively, where 111 represents all 3 spaces: 'this is a string' and 000 represents no-space at all: 'thisisastring'
def binaries(n):
    # all n-bit strings, from '00...0' to '11...1' (there are 2 ** n of them)
    res = []
    for x in range(2 ** n):
        tmp = bin(x)
        res.append(tmp.replace('0b', '').zfill(n))
    return res
def generate(arr, bins):
    res = []
    for bits in bins:
        tmp = arr[0]
        i = 1
        for digit in bits:
            if digit == '1':
                # 1 means: keep the space before the next word
                tmp = tmp + " " + arr[i]
            else:
                tmp = tmp + arr[i]
            i += 1
        res.append(tmp)
    return res
def combinations(string):
    s = string.split(' ')
    bins = binaries(len(s) - 1)
    res = generate(s, bins)
    return res
print(combinations('this is a string'))
# ['thisisastring', 'thisisa string', 'thisis astring', 'thisis a string', 'this isastring', 'this isa string', 'this is astring', 'this is a string']
UPDATE:
I now see that Amadan thought of the same idea - kudos for being quicker than me to think about! Great minds think alike ;)
The easiest is to do it recursively.
Terminating condition: Schrödinger join of a single element list is that word.
Recurring condition: say that L is the Schrödinger join of all the words but the first. Then the Schrödinger join of the list consists of all elements from L with the first word directly prepended, and all elements from L with the first word prepended with an intervening space.
(Assuming you are missing thisis astring by accident. If it is deliberate, I am sure I have no idea what the question is :P )
Another, non-recursive way you can do it is to enumerate all numbers from 0 to 2^(number of words - 1) - 1, then use the binary representation of each number as a selector whether or not a space needs to be present. So, for example, the abovementioned thisis astring corresponds to 0b010, for "nospace, space, nospace".
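The recursive description above can be sketched as follows. The function name `schrodinger_join` is made up here (borrowed from the answer's wording), and the sketch assumes a non-empty list of words.

```python
def schrodinger_join(words):
    """Return every way of joining words with or without a space between neighbours.

    Hypothetical helper; assumes words is a non-empty list.
    """
    if len(words) == 1:
        # Terminating condition: a single word joins to itself.
        return [words[0]]
    first = words[0]
    rest = schrodinger_join(words[1:])
    # First word prepended directly, then prepended with an intervening space.
    return ([first + tail for tail in rest]
            + [first + ' ' + tail for tail in rest])


for line in schrodinger_join('this is a string'.split()):
    print(line)
```

For the four-word example this yields 2^(4 - 1) = 8 lines, including thisis astring.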

How can I change only vowels from uppercase to lowercase and vice versa (MATLAB)

I have to change every vowel in a string to upper or lower case, depending on what it already is, so "UPPERCASE lowercase" becomes "uPPeRCaSe lOwErcAsE".
So far I have had no success with this approach:
str= 'UPPERCASE lowercase';
vow = 'aeiou';
vowm = 'AEIOU';
for k = 1:5
if str(str == vow(k))
str(str == vow(k))= vowm(k);
else
if str(str == vowm(k))
str(str == vowm(k))= vow(k);
Expected output: "uPPeRCaSe lOwErcAsE"
Actual output: "uPPERCASE lOwErCAsE"
I am extremely new to MATLAB and I'm kind of lost.
I appreciate your help.
Use ismember to find all occurrences of each type of vowels (uppercase or lowercase), and then upper and lower to convert them:
str = 'UPPERCASE lowercase'; %// original string
indl = ismember(str, 'aeiou'); %// locations of lowercase vowels
indu = ismember(str, 'AEIOU'); %// locations of uppercase vowels
str(indl) = upper(str(indl)); %// convert from lower to upper
str(indu) = lower(str(indu)); %// convert from upper to lower
As listed in the question, I am assuming the following as the inputs -
%// Inputs
str= 'UPPERCASE lowercase'
vow = 'aeiou'
vowm = 'AEIOU'
Approach #1
One approach based on changem that is used to substitute values -
%// Create maps from input string to reflect changes from lower to upper
%// and vice versa
map1 = changem(str,vowm,vow)
map2 = changem(str,vow,vowm)
%// Find indices to be changed for lower to upper change and vice versa change
idx1 = find(map1~=str)
idx2 = find(map2~=str)
%// Selectively change input string based on the indices to be changed and maps
str(idx1) = map1(idx1)
str(idx2) = map2(idx2)
Approach #2
With bsxfun -
%// Find indices to be changed for lower to upper change and vice versa change
[~,idx1] = find(bsxfun(@eq,str,vow'))
[~,idx2] = find(bsxfun(@eq,str,vowm'))
%// Selectively change input string based on the indices to be changed and maps
str(idx1) = str(idx1)-32
str(idx2) = str(idx2)+32
You could use regular expressions as well.
I don't know how different this is to the other answers though...
str= 'UPPERCASE lowercase';
vow = '[aeiou]';
vowm = '[AEIOU]';
indl = regexp(str,vow);
indu = regexp(str,vowm);
str(indl) = upper(str(indl));
str(indu) = lower(str(indu));

string matching in matlab

I have two strings, a short one (S, of size 1x10) and a very long one (L, of size 1x1000), and I want to find the locations in L that match S.
In this specific matching, I am only interested in matching some specific characters of S (the black ones). Is there any function or method in MATLAB that can match only specific positions (for example, positions 1, 5 and 9 in S)?
If I understand your question correctly, you want to find substrings in L that contain the same letters (characters) as S in certain positions (let's say given by array idx). Regular expressions are ideal here, so I suggest using regexp.
In regular expressions, a dot (.) matches any character, and curly braces ({}) optionally specify the number of desired occurrences. For example, to match a string of length 6, where the second character is 'a' and the fifth is 'b', our regular expression could be any of the following syntaxes:
.a..b.
.a.{2}b.
.{1}a.{2}b.{1}
All of these are equivalent. So let's construct a regular expression pattern first:
in = num2cell(diff([0; idx(:); numel(S) + 1]) - 1); %// Intervals
ch = num2cell(S(idx(:))); %// Matched characters
C = [in(:)'; ch(:)', {''}];
pat = sprintf('.{%d}%c', C{:}); %// Pattern for regexp
Now all that is left is to feed regexp with L and the desired pattern:
loc = regexp(L, pat)
and voila!
Example
Let's assume that:
S = 'wbzder'
L = 'gabcdexybhdef'
idx = [2 4 5]
First we build a pattern:
in = num2cell(diff([0; idx(:); numel(S) + 1]) - 1);
ch = num2cell(S(idx(:)));
C = [in(:)'; ch(:)', {''}];
pat = sprintf('.{%d}%c', C{:});
The pattern we get is:
pat =
.{1}b.{1}d.{0}e.{1}
Obviously we can add code that beautifies this pattern into .b.de., but this is really an unnecessary optimization (regexp can handle the former just as well).
After we do:
loc = regexp(L, pat)
we get the following result:
loc =
2 8
Seems correct.
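For readers who prefer Python (used in the other questions on this page), the same pattern construction can be sketched with the re module. This is an illustration, not a translation of the MATLAB code: the helper name `build_pattern` is hypothetical, idx is assumed to be 1-based as in the example, and the fixed characters are passed through re.escape, which the MATLAB version does not do.

```python
import re


def build_pattern(S, idx):
    """Build a regex that matches S only at the 1-based positions in idx."""
    parts = []
    prev = 0
    for i in sorted(idx):
        parts.append('.{%d}' % (i - prev - 1))  # free positions before S(i)
        parts.append(re.escape(S[i - 1]))       # fixed character (1-based to 0-based)
        prev = i
    parts.append('.{%d}' % (len(S) - prev))     # free positions after the last fixed one
    return ''.join(parts)


S = 'wbzder'
L = 'gabcdexybhdef'
pat = build_pattern(S, [2, 4, 5])
# Add 1 to each start so the positions are 1-based, as in the MATLAB output.
loc = [m.start() + 1 for m in re.finditer(pat, L, flags=re.DOTALL)]
print(pat)  # .{1}b.{1}d.{0}e.{1}
print(loc)  # [2, 8]
```

re.finditer reports non-overlapping matches only, so overlapping occurrences would need a different approach such as a lookahead.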

Resources