I need help converting characters in a string to numerical values in Matlab [duplicate] - string

I'm looking for a quick way to convert a large character array of lowercase letters, spaces and periods into a set of integers and vice-versa in MATLAB.
Usually I would use the double and char functions, but I would like to use a special set of integers to represent each letter (so that 'a' matches with '1', 'b' matches with '2'.... 'z' matches with 26, ' ' matches with 27, and '.' matches with 28)
The current method that I have is:
text = 'quick brown fox jumps over dirty dog';
alphabet ='abcdefghijklmnopqrstuvwxyz .';
converted_text = double(text);
converted_alphabet = double(alphabet);
numbers = nan(28,1)
for i = 1:28
numbers(converted_text(i)==converted_alphabet(i)) = i;
end
newtext = nan(size(numbers))
for i = 1:size(numbers,1)
newtext(numbers==i) = alphabet(i)
end
Unfortunately this takes quite a bit of time for large arrays, and I'm wondering if there is a quicker way to do this in MATLAB?

An easy way is to use ismember():
[~,pos] = ismember(text,alphabet)
Or use the implicit conversion carried out by -:
out = text - 'a' + 1;
note that blanks will have -64 and full stops -50, which means that you will need:
out(out == -64) = 27;
out(out == -50) = 28;
Speed considerations:
For small sized arrays the latter solution is slightly faster IF you are happy to leave blanks and full stops with their negative index.
For big arrays, on my machine 1e4 times longer, the latter solution is twice faster than ismember().
Going back:
alphabet(out)

Related

Dealing with problems where memory isn't sufficient. Dynamic programming

I was solving a Problem using python, here i was storing a repetitive string "abc" in a string with everytime each character getting double like "abcaabbccaaaabbbbcccc.......... , and i had to find the nth character. The constraints were n<=10^9 , Now when i tried to store this their was memory error as the string was to too large (i tried to store all the charaters till the charater 2^30 times repeated). CAn somebody help me with the approach to tackle this situation.
t=' '
for i in range(0 , 30):
t = t +'a'*(2**i)
t = t +'b'*(2**i)
t = t +'c'*(2**i)
Obviously, you can't do this the straightforward, brute-force way. Instead, you need to count along a virtual string to find where your given index appears. I'll lay this out in too much detail so you can see the logic:
n = 314159265 # Pick a large value for illustration
rem = n
for i in range(0 , 30):
size = 2**i
# print(size, rem)
rem -= size
if rem <= 0:
char = 'a'
break
rem -= size
if rem <= 0:
char = 'b'
break
rem -= size
if rem <= 0:
char = 'c'
break
print("Character", n, "is", char)
Output:
Character 314159265 is b
You can shorten this with a better loop body; I'll leave that as a further exercise. If you get insightful with your arithmetic, you can simply compute the appropriate letter from the chunk sizes you generate.

How to tangle/scramble/rearrange a string in MATLAB?

For an example exam question, I've been asked to "tangle" a string as shown:
tangledWord('today')='otady'
tangledWord('12345678')='21436587'
I understand this is an extremely simple problem but it's got me stumped.
I can make it produce a tangled word when the length is even, but I'm having trouble when it's odd, here's my function:
function tangledWord(s)
n=length(s);
a=s(1:2:n);
b=s(2:2:n);
s(1:2:n)=b;
s(2:2:n)=a;
disp(s);
end
For odd word length, you need to reduce n by 1 to leave the last char untouched. Use mod to detect odd word length.
If you want to scramble every char randomly, you can try:
string = '1234567';
shuffled = string(randperm(numel(string)))
shuffled = 5741326
If you want to change the first two chars:
tangled = [string(2) string(1) string(3:end)]
tangled = 2134567
If you want to change every two chars:
n = ( numel(string)-mod(numel(string),2));
tangled2 = [flipud(reshape(string(1:n),[],n/2))(:); string(n+1:end)]'
tangled2 = 2143657
function tangledWord(s)
n=length(s);
if mod(n,2) == 0
a=s(1:2:n);
b=s(2:2:n);
s(1:2:n)=b;
s(2:2:n)=a;
disp(s)
elseif mod(n,2) ~= 0
a=s(1:2:end-1);
b=s(2:2:end-1);
s(1:2:end-1)=b;
s(2:2:end-1)=a;
disp(s)
end
end

How to compute word scores in Scrabble using MATLAB

I have a homework program I have run into a problem with. We basically have to take a word (such as MATLAB) and have the function give us the correct score value for it using the rules of Scrabble. There are other things involved such as double word and double point values, but what I'm struggling with is converting to ASCII. I need to get my string into ASCII form and then sum up those values. We only know the bare basics of strings and our teacher is pretty useless. I've tried converting the string into numbers, but that's not exactly working out. Any suggestions?
function[score] = scrabble(word, letterPoints)
doubleword = '#';
doubleletter = '!';
doublew = [findstr(word, doubleword)]
trouble = [findstr(word, doubleletter)]
word = char(word)
gameplay = word;
ASCII = double(gameplay)
score = lower(sum(ASCII));
Building on Francis's post, what I would recommend you do is create a lookup array. You can certainly convert each character into its ASCII equivalent, but then what I would do is have an array where the input is the ASCII code of the character you want (with a bit of modification), and the output will be the point value of the character. Once you find this, you can sum over the points to get your final point score.
I'm going to leave out double points, double letters, blank tiles and that whole gamut of fun stuff in Scrabble for now in order to get what you want working. By consulting Wikipedia, this is the point distribution for each letter encountered in Scrabble.
1 point: A, E, I, O, N, R, T, L, S, U
2 points: D, G
3 points: B, C, M, P
4 points: F, H, V, W, Y
5 points: K
8 points: J, X
10 points: Q, Z
What we're going to do is convert your word into lower case to ensure consistency. Now, if you take a look at the letter a, this corresponds to ASCII code 97. You can verify that by using the double function we talked about earlier:
>> double('a')
97
As there are 26 letters in the alphabet, this means that going from a to z should go from 97 to 122. Because MATLAB starts indexing arrays at 1, what we can do is subtract each of our characters by 96 so that we'll be able to figure out the numerical position of these characters from 1 to 26.
Let's start by building our lookup table. First, I'm going to define a whole bunch of strings. Each string denotes the letters that are associated with each point in Scrabble:
string1point = 'aeionrtlsu';
string2point = 'dg';
string3point = 'bcmp';
string4point = 'fhvwy';
string5point = 'k';
string8point = 'jx';
string10point = 'qz';
Now, we can use each of the strings, convert to double, subtract by 96 then assign each of the corresponding locations to the points for each letter. Let's create our lookup table like so:
lookup = zeros(1,26);
lookup(double(string1point) - 96) = 1;
lookup(double(string2point) - 96) = 2;
lookup(double(string3point) - 96) = 3;
lookup(double(string4point) - 96) = 4;
lookup(double(string5point) - 96) = 5;
lookup(double(string8point) - 96) = 8;
lookup(double(string10point) - 96) = 10;
I first create an array of length 26 through the zeros function. I then figure out where each letter goes and assign to each letter their point values.
Now, the last thing you need to do is take a string, take the lower case to be sure, then convert each character into its ASCII equivalent, subtract by 96, then sum up the values. If we are given... say... MATLAB:
stringToConvert = 'MATLAB';
stringToConvert = lower(stringToConvert);
ASCII = double(stringToConvert) - 96;
value = sum(lookup(ASCII));
Lo and behold... we get:
value =
10
The last line of the above code is crucial. Basically, ASCII will contain a bunch of indexing locations where each number corresponds to the numerical position of where the letter occurs in the alphabet. We use these positions to look up what point / score each letter gives us, and we sum over all of these values.
Part #2
The next part where double point values and double words come to play can be found in my other StackOverflow post here:
Calculate Scrabble word scores for double letters and double words MATLAB
Convert from string to ASCII:
>> myString = 'hello, world';
>> ASCII = double(myString)
ASCII =
104 101 108 108 111 44 32 119 111 114 108 100
Sum up the values:
>> total = sum(ASCII)
total =
1160
The MATLAB help for char() says (emphasis added):
S = char(X) converts array X of nonnegative integer codes into a character array. Valid codes range from 0 to 65535, where codes 0 through 127 correspond to 7-bit ASCII characters. The characters that MATLABĀ® can process (other than 7-bit ASCII characters) depend upon your current locale setting. To convert characters into a numeric array, use the double function.
ASCII chart here.

MATLAB: Quickest Way To Convert Characters to A Custom Set Numbers and Back

I'm looking for a quick way to convert a large character array of lowercase letters, spaces and periods into a set of integers and vice-versa in MATLAB.
Usually I would use the double and char functions, but I would like to use a special set of integers to represent each letter (so that 'a' matches with '1', 'b' matches with '2'.... 'z' matches with 26, ' ' matches with 27, and '.' matches with 28)
The current method that I have is:
text = 'quick brown fox jumps over dirty dog';
alphabet ='abcdefghijklmnopqrstuvwxyz .';
converted_text = double(text);
converted_alphabet = double(alphabet);
numbers = nan(28,1)
for i = 1:28
numbers(converted_text(i)==converted_alphabet(i)) = i;
end
newtext = nan(size(numbers))
for i = 1:size(numbers,1)
newtext(numbers==i) = alphabet(i)
end
Unfortunately this takes quite a bit of time for large arrays, and I'm wondering if there is a quicker way to do this in MATLAB?
An easy way is to use ismember():
[~,pos] = ismember(text,alphabet)
Or use the implicit conversion carried out by -:
out = text - 'a' + 1;
note that blanks will have -64 and full stops -50, which means that you will need:
out(out == -64) = 27;
out(out == -50) = 28;
Speed considerations:
For small sized arrays the latter solution is slightly faster IF you are happy to leave blanks and full stops with their negative index.
For big arrays, on my machine 1e4 times longer, the latter solution is twice faster than ismember().
Going back:
alphabet(out)

How compiler is converting integer to string and vice versa

Many languages have functions for converting string to integer and vice versa. So what happens there? What algorithm is being executed during conversion?
I don't ask in specific language because I think it should be similar in all of them.
To convert a string to an integer, take each character in turn and if it's in the range '0' through '9', convert it to its decimal equivalent. Usually that's simply subtracting the character value of '0'. Now multiply any previous results by 10 and add the new value. Repeat until there are no digits left. If there was a leading '-' minus sign, invert the result.
To convert an integer to a string, start by inverting the number if it is negative. Divide the integer by 10 and save the remainder. Convert the remainder to a character by adding the character value of '0'. Push this to the beginning of the string; now repeat with the value that you obtained from the division. Repeat until the divided value is zero. Put out a leading '-' minus sign if the number started out negative.
Here are concrete implementations in Python, which in my opinion is the language closest to pseudo-code.
def string_to_int(s):
i = 0
sign = 1
if s[0] == '-':
sign = -1
s = s[1:]
for c in s:
if not ('0' <= c <= '9'):
raise ValueError
i = 10 * i + ord(c) - ord('0')
return sign * i
def int_to_string(i):
s = ''
sign = ''
if i < 0:
sign = '-'
i = -i
while True:
remainder = i % 10
i = i / 10
s = chr(ord('0') + remainder) + s
if i == 0:
break
return sign + s
I wouldn't call it an algorithm per se, but depending on the language it will involve the conversion of characters into their integral equivalent. Many languages will either stop on the first character that cannot be represented as an integer (e.g. the letter a), will blindly convert all characters into their ASCII value (e.g. the letter a becomes 97), or will ignore characters that cannot be represented as integers and only convert the ones that can - or return 0 / empty. You have to get more specific on the framework/language to provide more information.
String to integer:
Many (most) languages represent strings, on some level or another, as an array (or list) of characters, which are also short integers. Map the ones corresponding to number characters to their number value. For example, '0' in ascii is represented by 48. So you map 48 to 0, 49 to 1, and so on to 9.
Starting from the left, you multiply your current total by 10, add the next character's value, and move on. (You can make a larger or smaller map, change the number you multiply by at each step, and convert strings of any base you like.)
Integer to string is a longer process involving base conversion to 10. I suppose that since most integers have limited bits (32 or 64, usually), you know that it will come to a certain number of characters at most in a string (20?). So you can set up your own adder and iterate through each place for each bit after calculating its value (2^place).

Resources