How to tangle/scramble/rearrange a string in MATLAB?

For an example exam question, I've been asked to "tangle" a string as shown:
tangledWord('today')='otady'
tangledWord('12345678')='21436587'
I understand this is an extremely simple problem, but it's got me stumped.
I can make it produce a tangled word when the length is even, but I'm having trouble when it's odd. Here's my function:
function tangledWord(s)
n=length(s);
a=s(1:2:n);
b=s(2:2:n);
s(1:2:n)=b;
s(2:2:n)=a;
disp(s);
end

For odd word length, you need to reduce n by 1 to leave the last char untouched. Use mod to detect odd word length.
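The same idea can be sketched in Python, purely as an illustration: trim the working length to the largest even number, then exchange the two interleaved halves. The function name tangled_word is hypothetical and not part of the original exam question.

```python
def tangled_word(s: str) -> str:
    """Swap each adjacent pair of characters; a trailing odd character stays put."""
    n = len(s) - len(s) % 2          # largest even length (the mod trick from the answer)
    chars = list(s)
    # Exchange the even-indexed and odd-indexed characters of the first n positions.
    chars[0:n:2], chars[1:n:2] = chars[1:n:2], chars[0:n:2]
    return "".join(chars)


print(tangled_word("today"))      # otady
print(tangled_word("12345678"))   # 21436587
```

Trimming n once avoids writing separate branches for the even and odd cases.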

If you want to scramble every char randomly, you can try:
string = '1234567';
shuffled = string(randperm(numel(string)))
shuffled = 5741326
If you want to swap the first two chars:
tangled = [string(2) string(1) string(3:end)]
tangled = 2134567
If you want to swap every pair of chars:
n = numel(string) - mod(numel(string), 2);    % largest even length
pairs = flipud(reshape(string(1:n), 2, []));  % one pair per column, rows swapped
tangled2 = [pairs(:).' string(n+1:end)]       % flatten, then re-attach any leftover char
tangled2 = 2143657

function tangledWord(s)
n = length(s);
n = n - mod(n, 2);   % ignore the last character when the length is odd
a = s(1:2:n);        % characters at odd positions
b = s(2:2:n);        % characters at even positions
s(1:2:n) = b;
s(2:2:n) = a;
disp(s)
end

Related

Dealing with problems where memory isn't sufficient. Dynamic programming

I was solving a problem using Python. A repeating string "abc" is built so that each character's run doubles in every round, like "abcaabbccaaaabbbbcccc...", and I had to find the nth character. The constraint was n <= 10^9. When I tried to store the string there was a memory error, because the string was too large (I tried to store all the characters up to the round where each character is repeated 2^30 times). Can somebody help me with an approach to tackle this situation?
t=' '
for i in range(0 , 30):
    t = t +'a'*(2**i)
    t = t +'b'*(2**i)
    t = t +'c'*(2**i)
Obviously, you can't do this the straightforward, brute-force way. Instead, you need to count along a virtual string to find where your given index appears. I'll lay this out in too much detail so you can see the logic:
n = 314159265 # Pick a large value for illustration
rem = n
for i in range(0 , 30):
    size = 2**i
    # print(size, rem)
    rem -= size
    if rem <= 0:
        char = 'a'
        break
    rem -= size
    if rem <= 0:
        char = 'b'
        break
    rem -= size
    if rem <= 0:
        char = 'c'
        break
print("Character", n, "is", char)
Output:
Character 314159265 is b
You can shorten this with a better loop body; I'll leave that as a further exercise. If you get insightful with your arithmetic, you can simply compute the appropriate letter from the chunk sizes you generate.
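One possible way to do that arithmetic, offered as a sketch rather than the answerer's intended solution: rounds 0 to k-1 contain 3 * (2**k - 1) characters in total, so the round holding position n can be read off the bit length of ceil(n / 3). The function name nth_char is hypothetical, and n is assumed to be 1-based as in the loop above.

```python
def nth_char(n: int) -> str:
    """Return the n-th character (1-based) of 'abc' + 'aabbcc' + 'aaaabbbbcccc' + ..."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Rounds 0..k-1 hold 3 * (2**k - 1) characters, so n falls in round k
    # exactly when 2**k <= ceil(n / 3) < 2**(k + 1).
    k = ((n + 2) // 3).bit_length() - 1
    offset = n - 3 * (2**k - 1) - 1   # 0-based position inside round k
    return "abc"[offset >> k]         # each letter occupies 2**k positions


print("Character", 314159265, "is", nth_char(314159265))   # Character 314159265 is b
```

This uses constant memory and no loop, so it stays fast for any n within the stated constraint.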

How to find positions of the last occurrence of a pattern in a string, and use these to extract a substring from another string

I need some help with a specific problem, which I cannot seem to find on this website.
I have a result which looks something like this:
result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"
This is a transmembrane prediction. For this string, I have another string of the same length that holds the amino acid code, for example:
amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"
I want to do some research on the last "M" region. This can vary in length, as well as the "o" that comes after. So in this case I need to extract "PLAFLVFK" from the last string, which corresponds to the last "M" region.
I have something like this already, but I cannot figure out how to obtain the start position, and I also believe a simpler (or computationally better) solution is possible.
end = result.rfind('M')
start = ?
region_I_need = amino_acid_code[start:end]
Thanks in advance
To also find the start position, slice result up to end and call rfind again to locate the last 'o' before the final "M" region:
result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"
amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"
# add 1 to the indices to get the correct positions
end = result.rfind('M') + 1
start = result[:end].rfind('o') + 1
region_I_need = amino_acid_code[start:end]
print(start, end)
print(amino_acid_code[start:end])
Output:
62 70
PLAFLVFK

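As an alternative to the two rfind calls in the transmembrane answer above, the last run of "M" can also be located with a regular expression. This is only a sketch: the helper name last_region and its label parameter are hypothetical, and it assumes the prediction and the sequence have the same length.

```python
import re


def last_region(prediction: str, sequence: str, label: str = "M"):
    """Return (start, end, substring) for the last run of `label` in `prediction`."""
    runs = list(re.finditer(re.escape(label) + "+", prediction))
    if not runs:
        return None                  # the prediction contains no such region
    start, end = runs[-1].span()     # end is exclusive, so it can be used for slicing
    return start, end, sequence[start:end]


result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"
amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"
print(last_region(result, amino_acid_code))   # (62, 70, 'PLAFLVFK')
```

Because finditer yields every run, the same helper could be adapted to return the first or the n-th region instead of the last.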
I need help converting characters in a string to numerical values in Matlab [duplicate]

I'm looking for a quick way to convert a large character array of lowercase letters, spaces and periods into a set of integers and vice-versa in MATLAB.
Usually I would use the double and char functions, but I would like to use a special set of integers to represent each letter (so that 'a' matches with '1', 'b' matches with '2'.... 'z' matches with 26, ' ' matches with 27, and '.' matches with 28)
The current method that I have is:
text = 'quick brown fox jumps over dirty dog';
alphabet ='abcdefghijklmnopqrstuvwxyz .';
converted_text = double(text);
converted_alphabet = double(alphabet);
numbers = nan(size(text));
for i = 1:28
    numbers(converted_text==converted_alphabet(i)) = i;   % every position holding the i-th symbol
end
newtext = blanks(numel(numbers));
for i = 1:28
    newtext(numbers==i) = alphabet(i);
end
Unfortunately this takes quite a bit of time for large arrays, and I'm wondering if there is a quicker way to do this in MATLAB?
An easy way is to use ismember():
[~,pos] = ismember(text,alphabet)
Or use the implicit conversion carried out by -:
out = text - 'a' + 1;
Note that blanks will map to -64 and full stops to -50, which means that you will need:
out(out == -64) = 27;
out(out == -50) = 28;
Speed considerations:
For small arrays the latter solution is slightly faster IF you are happy to leave blanks and full stops with their negative index.
For big arrays (on my machine, 1e4 times longer), the latter solution is twice as fast as ismember().
Going back:
alphabet(out)
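For comparison, the lookup idea behind ismember() can be sketched in Python with a dictionary that maps each symbol to its 1-based code. The names ALPHABET, CODE_OF, encode and decode are hypothetical, chosen only for this illustration.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz ."
# 1-based codes, matching the question: 'a' -> 1, ..., 'z' -> 26, ' ' -> 27, '.' -> 28
CODE_OF = {ch: i for i, ch in enumerate(ALPHABET, start=1)}


def encode(text: str) -> list:
    """Map each character to its custom integer code (KeyError on unknown symbols)."""
    return [CODE_OF[ch] for ch in text]


def decode(codes: list) -> str:
    """Map integer codes back to characters; the inverse of encode."""
    return "".join(ALPHABET[c - 1] for c in codes)


codes = encode("quick brown fox jumps over dirty dog")
print(codes[:6])        # [17, 21, 9, 3, 11, 27]
print(decode(codes))    # quick brown fox jumps over dirty dog
```

A single table lookup per character avoids the special-casing of blanks and full stops that the subtraction approach needs.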

Getting the largest and smallest word in a string

When I run this code the output is (" "," "), but it should be ("I","love"), and there are no errors. What should I do to fix it?
sen="I love dogs"
function Longest_word(sen)
x=" "
maxw=" "
minw=" "
minl=1
maxl=length(sen)
p=0
for i=1:length(sen)
if(sen[i]!=" ")
x=[x[1]...,sen[i]...]
else
p=length(x)
if p<min1
minl=p
minw=x
end
if p>maxl
maxl=p
maxw=x
end
x=" "
end
end
return minw,maxw
end
As @David mentioned, another and maybe better solution is to use the split function:
function longest_word(sentence)
    sp=split(sentence)
    len=map(length,sp)
    return (sp[indmin(len)],sp[indmax(len)])  # indmin/indmax are named argmin/argmax since Julia 0.7
end
The idea of your code is good, but there are a few mistakes.
You can see what's going wrong by debugging a bit. The easiest way to do this is with @show, which prints out the value of variables. When code doesn't work like you expect, this is the first thing to do -- just ask it what it's doing by printing everything out!
E.g. if you put
if(sen[i]!=" ")
    x=[x[1]...,sen[i]...]
    @show x
and run the function with
Longest_word("I love dogs")
you will see that it is not doing what you want it to do, which (I believe) is add the ith letter to the string x.
Note that the ith letter accessed like sen[i] is a character not a string.
You can try converting it to a string with
string(sen[i])
but this gives a Unicode string, not an ASCII string, in recent versions of Julia.
In fact, it would be better not to iterate over the string using
for i in 1:length(sen)
but iterate over the characters in the string (which will also work if the string is Unicode):
for c in sen
Then you can initialise the string x as
x = UTF8String("")   # in Julia 0.5 and later, write x = "" (UTF8String was merged into String)
and update it with
x = string(x, c)
Try out some of these possibilities and see if they help.
Also, you have maxl and minl defined wrong initially -- they should be the other way round. Also, the names of the variables are not very helpful for understanding what should happen. And the strings should be initialised to empty strings, "", not a string with a space, " ".
@daycaster is correct that there seems to be a min1 that should be minl.
However, in fact there is an easier way to solve the problem, using the split function, which divides a string into words.
Let us know if you still have a problem.
Here is a working version following your idea:
function longest_word(sentence)
    x = UTF8String("")
    maxw = ""
    minw = ""
    maxl = 0 # counterintuitive! start the "wrong" way round
    minl = length(sentence)
    for i in 1:length(sentence) # or: for c in sentence
        if sentence[i] != ' ' # or: if c != ' '
            x = string(x, sentence[i]) # or: x = string(x, c)
        else
            p = length(x)
            if p < minl
                minl = p
                minw = x
            end
            if p > maxl
                maxl = p
                maxw = x
            end
            x = ""
        end
    end
    return minw, maxw
end
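Note that this function never examines the last word, because there is no space after it to trigger the comparison, so it does not work if the longest (or shortest) word is at the end of the string. How could you modify it for this case?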
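One possible answer to that exercise, sketched in Python rather than Julia: append a sentinel space so that the final word is compared like every other word. The function name shortest_and_longest is hypothetical, and ties are resolved in favour of the first word encountered, which is an assumption made here.

```python
def shortest_and_longest(sentence: str):
    """Return (shortest, longest) word of a space-separated sentence."""
    shortest = longest = ""
    word = ""
    for c in sentence + " ":          # the extra space flushes the final word
        if c != " ":
            word += c                 # still inside a word: keep collecting
        elif word:                    # a word just ended (skip repeated spaces)
            if not shortest or len(word) < len(shortest):
                shortest = word
            if len(word) > len(longest):
                longest = word
            word = ""
    return shortest, longest


print(shortest_and_longest("I love dogs"))          # ('I', 'love')
print(shortest_and_longest("dogs are wonderful"))   # ('are', 'wonderful')
```

Running the comparison once more after the loop would work equally well; the sentinel just keeps the logic in one place.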
