Not Returning ASCII characters when encrypting - python-3.x

I am trying to write an encryption code using Python. It encrypts but it does not use the ASCII characters and I am not sure why. I am very new to python and would love some help
def encrypt(text, shift):
cipher=""
for char in range(len(text)):
char = text[char]
if (char.isupper()):
cipher += chr((ord(char) + shift - 65) % 26 + 65)
else:
cipher += chr((ord(char) + shift - 97) % 26 + 97)
return cipher
It is encrypting but not returning ASCII characters

The formula
enc = (char + shift - offset) % m + offset
implicitly defines an alphabet within whose boundaries the encryption takes place, i.e. a character within this alphabet is mapped to another character of this aphabet. For example, the uppercase letters have an offset of 65 and a modulus of 26 (number of characters in the alphabet). This defines the alphabet as the range between 65 (A) and 65 + 26 - 1 = 90 (Z). A character between incl. A and Z is always mapped to a character between incl. A and Z and never to a character outside the alphabet. The same applies to lowercase letters. For this reason:
print("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
print(encrypt("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz", 5));
results in the output:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
FGHIJKLMNOPQRSTUVWXYZABCDEfghijklmnopqrstuvwxyzabcde
If the special characters are also to be included, the range must be choosen accordingly, e.g. as a contiguous range between incl. 32 (Space) and incl. 126 (~). This corresponds to an offset of 32 and a modulus of 126 - 32 + 1 = 95 (number of characters in alphabet). The if-statement is no longer necessary because of the contiguous range, so that simply applies:
cipher += chr((ord(char) + shift - 32) % 95 + 32)
The following code:
print(" !\"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~");
print(encrypt(" !\"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~", 5));
then produces the output:
!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ !"#$
which now also contains the special characters.

Related

Comparing index of a string with character

How do we compare an element of an index of a string to a characters?
string a;
int j;
for (j = 1; j <= Length(a); j = j + 1)
if ((a[j] >= ‘t’) && (a[j] <= ‘z’))
a[j] = a[j] – 32;
Return(a);
}
Do we use ASCII as a part of the solution? and we change characters according to their equivalent ascii after the operation
What you are doing there is taking a single letter and if it is between a lowercase t and z in the latin alphabet and converting it to an upper-case (capital) version of itself.
To give a more specific answer you need need to let us know what programming language you are using and what you want to achieve as this is essentially pseudo-code
Edit - okay, yes you are using the ASCII character table (see https://www.asciitable.com/). Each character in the string has a numerical equivalent (as all strings are stored in memory as numbers anyway) and subtracting 32 from the numerical value of the character will convert it to upper-case.
Letter 'a' = 97
97 - 32 = 65
65 = 'A'

Reed solomon encoding over Galois Field

I am passing a secret key (length of 16 ASCII chars = 128 bits) through Reed Solomon encoder which is operating over Galois field 16 (2^16).
My question is: should this key be considered as 128 bits or 256?
I got lost here because I know that ASCII character = 8 bits so 16 ASCII character = 128 bits.
I read an article where it says once you passed the key through GF(16) then it will be 256, not 128 and I should pass only a key with 8 ASCII character instead. is this correct?
please see the function that I am using below where i used Matlab communication toolbox
function code = errorCorrectingCode(data, LENGTH)
% ERRORCORRECTINGCODE takes input data and runs it through Reed-Solomon Code
% See the following link:
% http://www.mathworks.com/help/comm/ref/encode.html
% >>> example: rsdec(rsenc(gf([1 2 3 ; 4 5 6],3),7,3),7,3);
% Apply Reed-Solomon encoding operation to data to obtain ECCed code.
% convert to GF(2^16) elements:
% (two characters map to one element, so 16char = 8)
elements = keyToField(data);
msg = elements';
% create encoding:
code = rsenc(msg,LENGTH,length(msg));
end
function gfArray = keyToField(keystr)
% KEYTOFIELD takes a key string composed of N characters,
% converting every two characters to a field array in the
% field GF(2^16).
% define number of elements in gfArray as floor(N / 2):
numElts = floor(length(keystr) / 2);
% define flag that checks if keystr is odd or not:
oddLength = mod(length(keystr),2);
% initialize pre-output:
gfArray_bin = zeros((numElts + oddLength),16);
% loop thru pairs of chars in the key:
for idx=1:numElts
curr = 2 * idx;
gfArray_bin(idx,:) = [dec2bin(double(keystr(curr - 1)),8) dec2bin(double(keystr(curr)),8)] - 48;
end
% take care of last element if odd:
if (oddLength)
gfArray_bin(end,:) = dec2bin(double(keystr(end)),16) - 48;
end
% convert everything to decimal again:
gfArray_dec = zeros(size(gfArray_bin,1),1);
for jdx=1:size(gfArray_bin,1)
% shorthand for current row:
bitrow = gfArray_bin(jdx,:);
% convert from a row of 1's and 0's to decimal representation:
gfArray_dec(jdx) = sum(bitrow .* 2.^(numel(bitrow)-1:-1:0));
end
% generate output by wrapping gfArray_dec in field array:
gfArray = gf(gfArray_dec,16);
end

How to compute word scores in Scrabble using MATLAB

I have a homework program I have run into a problem with. We basically have to take a word (such as MATLAB) and have the function give us the correct score value for it using the rules of Scrabble. There are other things involved such as double word and double point values, but what I'm struggling with is converting to ASCII. I need to get my string into ASCII form and then sum up those values. We only know the bare basics of strings and our teacher is pretty useless. I've tried converting the string into numbers, but that's not exactly working out. Any suggestions?
function[score] = scrabble(word, letterPoints)
doubleword = '#';
doubleletter = '!';
doublew = [findstr(word, doubleword)]
trouble = [findstr(word, doubleletter)]
word = char(word)
gameplay = word;
ASCII = double(gameplay)
score = lower(sum(ASCII));
Building on Francis's post, what I would recommend you do is create a lookup array. You can certainly convert each character into its ASCII equivalent, but then what I would do is have an array where the input is the ASCII code of the character you want (with a bit of modification), and the output will be the point value of the character. Once you find this, you can sum over the points to get your final point score.
I'm going to leave out double points, double letters, blank tiles and that whole gamut of fun stuff in Scrabble for now in order to get what you want working. By consulting Wikipedia, this is the point distribution for each letter encountered in Scrabble.
1 point: A, E, I, O, N, R, T, L, S, U
2 points: D, G
3 points: B, C, M, P
4 points: F, H, V, W, Y
5 points: K
8 points: J, X
10 points: Q, Z
What we're going to do is convert your word into lower case to ensure consistency. Now, if you take a look at the letter a, this corresponds to ASCII code 97. You can verify that by using the double function we talked about earlier:
>> double('a')
97
As there are 26 letters in the alphabet, this means that going from a to z should go from 97 to 122. Because MATLAB starts indexing arrays at 1, what we can do is subtract each of our characters by 96 so that we'll be able to figure out the numerical position of these characters from 1 to 26.
Let's start by building our lookup table. First, I'm going to define a whole bunch of strings. Each string denotes the letters that are associated with each point in Scrabble:
string1point = 'aeionrtlsu';
string2point = 'dg';
string3point = 'bcmp';
string4point = 'fhvwy';
string5point = 'k';
string8point = 'jx';
string10point = 'qz';
Now, we can use each of the strings, convert to double, subtract by 96 then assign each of the corresponding locations to the points for each letter. Let's create our lookup table like so:
lookup = zeros(1,26);
lookup(double(string1point) - 96) = 1;
lookup(double(string2point) - 96) = 2;
lookup(double(string3point) - 96) = 3;
lookup(double(string4point) - 96) = 4;
lookup(double(string5point) - 96) = 5;
lookup(double(string8point) - 96) = 8;
lookup(double(string10point) - 96) = 10;
I first create an array of length 26 through the zeros function. I then figure out where each letter goes and assign to each letter their point values.
Now, the last thing you need to do is take a string, take the lower case to be sure, then convert each character into its ASCII equivalent, subtract by 96, then sum up the values. If we are given... say... MATLAB:
stringToConvert = 'MATLAB';
stringToConvert = lower(stringToConvert);
ASCII = double(stringToConvert) - 96;
value = sum(lookup(ASCII));
Lo and behold... we get:
value =
10
The last line of the above code is crucial. Basically, ASCII will contain a bunch of indexing locations where each number corresponds to the numerical position of where the letter occurs in the alphabet. We use these positions to look up what point / score each letter gives us, and we sum over all of these values.
Part #2
The next part where double point values and double words come to play can be found in my other StackOverflow post here:
Calculate Scrabble word scores for double letters and double words MATLAB
Convert from string to ASCII:
>> myString = 'hello, world';
>> ASCII = double(myString)
ASCII =
104 101 108 108 111 44 32 119 111 114 108 100
Sum up the values:
>> total = sum(ASCII)
total =
1160
The MATLAB help for char() says (emphasis added):
S = char(X) converts array X of nonnegative integer codes into a character array. Valid codes range from 0 to 65535, where codes 0 through 127 correspond to 7-bit ASCII characters. The characters that MATLAB® can process (other than 7-bit ASCII characters) depend upon your current locale setting. To convert characters into a numeric array, use the double function.
ASCII chart here.

Base64 length calculation?

After reading the base64 wiki ...
I'm trying to figure out how's the formula working :
Given a string with length of n , the base64 length will be
Which is : 4*Math.Ceiling(((double)s.Length/3)))
I already know that base64 length must be %4==0 to allow the decoder know what was the original text length.
The max number of padding for a sequence can be = or ==.
wiki :The number of output bytes per input byte is approximately 4 / 3 (33%
overhead)
Question:
How does the information above settle with the output length ?
Each character is used to represent 6 bits (log2(64) = 6).
Therefore 4 chars are used to represent 4 * 6 = 24 bits = 3 bytes.
So you need 4*(n/3) chars to represent n bytes, and this needs to be rounded up to a multiple of 4.
The number of unused padding chars resulting from the rounding up to a multiple of 4 will obviously be 0, 1, 2 or 3.
4 * n / 3 gives unpadded length.
And round up to the nearest multiple of 4 for padding, and as 4 is a power of 2 can use bitwise logical operations.
((4 * n / 3) + 3) & ~3
For reference, the Base64 encoder's length formula is as follows:
As you said, a Base64 encoder given n bytes of data will produce a string of 4n/3 Base64 characters. Put another way, every 3 bytes of data will result in 4 Base64 characters. EDIT: A comment correctly points out that my previous graphic did not account for padding; the correct formula for padding is 4(Ceiling(n/3)).
The Wikipedia article shows exactly how the ASCII string Man encoded into the Base64 string TWFu in its example. The input string is 3 bytes, or 24 bits, in size, so the formula correctly predicts the output will be 4 bytes (or 32 bits) long: TWFu. The process encodes every 6 bits of data into one of the 64 Base64 characters, so the 24-bit input divided by 6 results in 4 Base64 characters.
You ask in a comment what the size of encoding 123456 would be. Keeping in mind that every every character of that string is 1 byte, or 8 bits, in size (assuming ASCII/UTF8 encoding), we are encoding 6 bytes, or 48 bits, of data. According to the equation, we expect the output length to be (6 bytes / 3 bytes) * 4 characters = 8 characters.
Putting 123456 into a Base64 encoder creates MTIzNDU2, which is 8 characters long, just as we expected.
Integers
Generally we don't want to use doubles because we don't want to use the floating point ops, rounding errors etc. They are just not necessary.
For this it is a good idea to remember how to perform the ceiling division: ceil(x / y) in doubles can be written as (x + y - 1) / y (while avoiding negative numbers, but beware of overflow).
Readable
If you go for readability you can of course also program it like this (example in Java, for C you could use macro's, of course):
public static int ceilDiv(int x, int y) {
return (x + y - 1) / y;
}
public static int paddedBase64(int n) {
int blocks = ceilDiv(n, 3);
return blocks * 4;
}
public static int unpaddedBase64(int n) {
int bits = 8 * n;
return ceilDiv(bits, 6);
}
// test only
public static void main(String[] args) {
for (int n = 0; n < 21; n++) {
System.out.println("Base 64 padded: " + paddedBase64(n));
System.out.println("Base 64 unpadded: " + unpaddedBase64(n));
}
}
Inlined
Padded
We know that we need 4 characters blocks at the time for each 3 bytes (or less). So then the formula becomes (for x = n and y = 3):
blocks = (bytes + 3 - 1) / 3
chars = blocks * 4
or combined:
chars = ((bytes + 3 - 1) / 3) * 4
your compiler will optimize out the 3 - 1, so just leave it like this to maintain readability.
Unpadded
Less common is the unpadded variant, for this we remember that each we need a character for each 6 bits, rounded up:
bits = bytes * 8
chars = (bits + 6 - 1) / 6
or combined:
chars = (bytes * 8 + 6 - 1) / 6
we can however still divide by two (if we want to):
chars = (bytes * 4 + 3 - 1) / 3
Unreadable
In case you don't trust your compiler to do the final optimizations for you (or if you want to confuse your colleagues):
Padded
((n + 2) / 3) << 2
Unpadded
((n << 2) | 2) / 3
So there we are, two logical ways of calculation, and we don't need any branches, bit-ops or modulo ops - unless we really want to.
Notes:
Obviously you may need to add 1 to the calculations to include a null termination byte.
For Mime you may need to take care of possible line termination characters and such (look for other answers for that).
(In an attempt to give a succinct yet complete derivation.)
Every input byte has 8 bits, so for n input bytes we get:
n × 8      input bits
Every 6 bits is an output byte, so:
ceil(n × 8 / 6)  =  ceil(n × 4 / 3)      output bytes
This is without padding.
With padding, we round that up to multiple-of-four output bytes:
ceil(ceil(n × 4 / 3) / 4) × 4  =  ceil(n × 4 / 3 / 4) × 4  =  ceil(n / 3) × 4      output bytes
See Nested Divisions (Wikipedia) for the first equivalence.
Using integer arithmetics, ceil(n / m) can be calculated as (n + m – 1) div m,
hence we get:
(n * 4 + 2) div 3      without padding
(n + 2) div 3 * 4      with padding
For illustration:
n with padding (n + 2) div 3 * 4 without padding (n * 4 + 2) div 3
------------------------------------------------------------------------------
0 0 0
1 AA== 4 AA 2
2 AAA= 4 AAA 3
3 AAAA 4 AAAA 4
4 AAAAAA== 8 AAAAAA 6
5 AAAAAAA= 8 AAAAAAA 7
6 AAAAAAAA 8 AAAAAAAA 8
7 AAAAAAAAAA== 12 AAAAAAAAAA 10
8 AAAAAAAAAAA= 12 AAAAAAAAAAA 11
9 AAAAAAAAAAAA 12 AAAAAAAAAAAA 12
10 AAAAAAAAAAAAAA== 16 AAAAAAAAAAAAAA 14
11 AAAAAAAAAAAAAAA= 16 AAAAAAAAAAAAAAA 15
12 AAAAAAAAAAAAAAAA 16 AAAAAAAAAAAAAAAA 16
Finally, in the case of MIME Base64 encoding, two additional bytes (CR LF) are needed per every 76 output bytes, rounded up or down depending on whether a terminating newline is required.
Here is a function to calculate the original size of an encoded Base 64 file as a String in KB:
private Double calcBase64SizeInKBytes(String base64String) {
Double result = -1.0;
if(StringUtils.isNotEmpty(base64String)) {
Integer padding = 0;
if(base64String.endsWith("==")) {
padding = 2;
}
else {
if (base64String.endsWith("=")) padding = 1;
}
result = (Math.ceil(base64String.length() / 4) * 3 ) - padding;
}
return result / 1000;
}
I think the given answers miss the point of the original question, which is how much space needs to be allocated to fit the base64 encoding for a given binary string of length n bytes.
The answer is (floor(n / 3) + 1) * 4 + 1
This includes padding and a terminating null character. You may not need the floor call if you are doing integer arithmetic.
Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately.
For all people who speak C, take a look at these two macros:
// calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 encoding operation
#define B64ENCODE_OUT_SAFESIZE(x) ((((x) + 3 - 1)/3) * 4 + 1)
// calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 decoding operation
#define B64DECODE_OUT_SAFESIZE(x) (((x)*3)/4)
Taken from here.
While everyone else is debating algebraic formulas, I'd rather just use BASE64 itself to tell me:
$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately."| wc -c
525
$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately." | base64 | wc -c
710
So it seems the formula of 3 bytes being represented by 4 base64 characters seems correct.
I don't see the simplified formula in other responses. The logic is covered but I wanted a most basic form for my embedded use:
Unpadded = ((4 * n) + 2) / 3
Padded = 4 * ((n + 2) / 3)
NOTE: When calculating the unpadded count we round up the integer division i.e. add Divisor-1 which is +2 in this case
Seems to me that the right formula should be:
n64 = 4 * (n / 3) + (n % 3 != 0 ? 4 : 0)
I believe that this one is an exact answer if n%3 not zero, no ?
(n + 3-n%3)
4 * ---------
3
Mathematica version :
SizeB64[n_] := If[Mod[n, 3] == 0, 4 n/3, 4 (n + 3 - Mod[n, 3])/3]
Have fun
GI
Simple implementantion in javascript
function sizeOfBase64String(base64String) {
if (!base64String) return 0;
const padding = (base64String.match(/(=*)$/) || [])[1].length;
return 4 * Math.ceil((base64String.length / 3)) - padding;
}
If there is someone interested in achieve the #Pedro Silva solution in JS, I just ported this same solution for it:
const getBase64Size = (base64) => {
let padding = base64.length
? getBase64Padding(base64)
: 0
return ((Math.ceil(base64.length / 4) * 3 ) - padding) / 1000
}
const getBase64Padding = (base64) => {
return endsWith(base64, '==')
? 2
: 1
}
const endsWith = (str, end) => {
let charsFromEnd = end.length
let extractedEnd = str.slice(-charsFromEnd)
return extractedEnd === end
}
In windows - I wanted to estimate size of mime64 sized buffer, but all precise calculation formula's did not work for me - finally I've ended up with approximate formula like this:
Mine64 string allocation size (approximate)
= (((4 * ((binary buffer size) + 1)) / 3) + 1)
So last +1 - it's used for ascii-zero - last character needs to allocated to store zero ending - but why "binary buffer size" is + 1 - I suspect that there is some mime64 termination character ? Or may be this is some alignment issue.

How compiler is converting integer to string and vice versa

Many languages have functions for converting string to integer and vice versa. So what happens there? What algorithm is being executed during conversion?
I don't ask in specific language because I think it should be similar in all of them.
To convert a string to an integer, take each character in turn and if it's in the range '0' through '9', convert it to its decimal equivalent. Usually that's simply subtracting the character value of '0'. Now multiply any previous results by 10 and add the new value. Repeat until there are no digits left. If there was a leading '-' minus sign, invert the result.
To convert an integer to a string, start by inverting the number if it is negative. Divide the integer by 10 and save the remainder. Convert the remainder to a character by adding the character value of '0'. Push this to the beginning of the string; now repeat with the value that you obtained from the division. Repeat until the divided value is zero. Put out a leading '-' minus sign if the number started out negative.
Here are concrete implementations in Python, which in my opinion is the language closest to pseudo-code.
def string_to_int(s):
i = 0
sign = 1
if s[0] == '-':
sign = -1
s = s[1:]
for c in s:
if not ('0' <= c <= '9'):
raise ValueError
i = 10 * i + ord(c) - ord('0')
return sign * i
def int_to_string(i):
s = ''
sign = ''
if i < 0:
sign = '-'
i = -i
while True:
remainder = i % 10
i = i / 10
s = chr(ord('0') + remainder) + s
if i == 0:
break
return sign + s
I wouldn't call it an algorithm per se, but depending on the language it will involve the conversion of characters into their integral equivalent. Many languages will either stop on the first character that cannot be represented as an integer (e.g. the letter a), will blindly convert all characters into their ASCII value (e.g. the letter a becomes 97), or will ignore characters that cannot be represented as integers and only convert the ones that can - or return 0 / empty. You have to get more specific on the framework/language to provide more information.
String to integer:
Many (most) languages represent strings, on some level or another, as an array (or list) of characters, which are also short integers. Map the ones corresponding to number characters to their number value. For example, '0' in ascii is represented by 48. So you map 48 to 0, 49 to 1, and so on to 9.
Starting from the left, you multiply your current total by 10, add the next character's value, and move on. (You can make a larger or smaller map, change the number you multiply by at each step, and convert strings of any base you like.)
Integer to string is a longer process involving base conversion to 10. I suppose that since most integers have limited bits (32 or 64, usually), you know that it will come to a certain number of characters at most in a string (20?). So you can set up your own adder and iterate through each place for each bit after calculating its value (2^place).

Resources