Generate an integer for encryption from a string and vice versa - python-3.x

I am trying to write RSA code in Python 3. I need to turn user-input strings (containing any characters, not only digits) into integers so I can encrypt them. What is the best way to turn a string into an integer in Python 3.6 without third-party modules?

how to encode a string as an integer is far from unique... there are many ways! this is one of them:
strg = 'user input'
i = int.from_bytes(strg.encode('utf-8'), byteorder='big')
the conversion in the other direction then is:
s = int.to_bytes(i, length=len(strg.encode('utf-8')), byteorder='big').decode('utf-8')
and yes, you need to know the length of the encoded byte string before converting back (for pure ASCII text it equals len(strg)). if length is too large, the result will be padded with chr(0) from the left (with byteorder='big'); if length is too small, int.to_bytes will raise an OverflowError: int too big to convert.
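a quick round trip illustrating that behaviour (my own example; 'hi' encodes to the two bytes 0x68 0x69):
i = int.from_bytes('hi'.encode('utf-8'), byteorder='big')
print(i)  # 26729
print(repr(int.to_bytes(i, length=2, byteorder='big').decode('utf-8')))  # 'hi'
print(repr(int.to_bytes(i, length=4, byteorder='big').decode('utf-8')))  # '\x00\x00hi' (left-padded)
int.to_bytes(i, length=1, byteorder='big')  # OverflowError: int too big to convert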

hiro protagonist's answer requires knowing the length of the string. So I looked for another solution and found good answers here: Python3 convert Unicode String to int representation. I'll just summarize my favourite solutions here:
import binascii

def str2num(string):
    return int(binascii.hexlify(string.encode("utf-8")), 16)

def num2str(number):
    hex_str = format(number, "x")
    if len(hex_str) % 2:  # unhexlify needs an even number of hex digits
        hex_str = "0" + hex_str
    return binascii.unhexlify(hex_str.encode("utf-8")).decode("utf-8")
def numfy(s, max_code=0x110000):
    # 0x110000 is one more than the highest Unicode code point (0x10FFFF)
    number = 0
    for e in [ord(c) for c in s]:
        number = (number * max_code) + e
    return number

def denumfy(number, max_code=0x110000):
    chars = []
    while number != 0:
        chars.append(chr(number % max_code))
        number = number // max_code
    return ''.join(reversed(chars))
Interesting: testing some cases shows me that
str2num(s) == numfy(s, max_code=256) whenever every ord(s[i]) < 128 (i.e. pure ASCII)
and
str2num(s) == int.from_bytes(s.encode('utf-8'), byteorder='big') (hiro protagonist's answer)
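A quick sanity check of these equivalences (my own test, not from the linked answers):
s = 'hello'
assert str2num(s) == numfy(s, max_code=256)                        # ASCII input only
assert str2num(s) == int.from_bytes(s.encode('utf-8'), byteorder='big')
assert num2str(str2num(s)) == s                                    # round trip
assert denumfy(numfy(s)) == s                                      # round trip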

Related

Understanding Python sequence

I am doing a HackerRank exercise called Flipping bits: given a list of 32-bit unsigned integers, flip all the bits (1->0 and 0->1) and return the result as an unsigned integer.
The correct code is:
def flippingBits(n):
    seq = format(n, '032b')
    return int(''.join(['0' if bit == '1' else '1' for bit in seq]), 2)
I don't understand the last line: what does the ''. part do, and why is there a , 2 at the end?
I have understood most of the code but need help understanding the last part.
what does the ''. part do
'' represents an empty string, which is used as the separator when joining the collection's elements into a single string (some examples can be found here)
and why is there a ,2 at the end?
from int docs:
class int(x=0)
class int(x, base=10)
Return an integer object constructed from a number or string x
In this case it parses the provided string as a binary number (i.e. base 2) and returns the int.
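For instance (an added one-line illustration):
int('1010', 2)  # 10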
I hope the below explanation helps:
def flippingBits(n):
    seq = format(n, '032b')  # format n in base 2, zero-padded to a 32-character string
    return int(''.join(['0' if bit == '1' else '1' for bit in seq]), 2)
    # ['0' if bit == '1' else '1' for bit in seq] builds a list of characters from the
    #     string "seq", turning every '1' into '0' and every '0' into '1'; then
    # ''.join(char_list) builds a string by joining the characters in char_list with
    #     nothing between them ('' is the empty separator); then
    # int(num_string, 2) converts num_string from a string to an integer in base 2
Notice that you can do the bit flipping with bitwise operations, without converting to a string and back.
def flippingBits(n):
    inverted_n = ~n  # flip all bits: 0 to 1 and 1 to 0 (in Python, ~n == -n - 1)
    return inverted_n + 2**32  # shift back into the unsigned 32-bit range: 2**32 - n - 1
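Equivalently (a common idiom, not from the original answer), XOR with a 32-bit mask of all ones flips every bit in one step:
def flippingBits(n):
    return n ^ 0xFFFFFFFF  # XOR with 32 one-bits inverts each of the 32 bits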

How to multiply the numbers in a string by their position in the string

I am a newbie at this. I am trying to multiply every element of the string below ('10010010') by 2 raised to the power of that element's position in the string, and then sum all the products. So far I have this, but I cannot figure out how to make it work.
def decodingvalue(str1):
    # read each character in input string
    for ch in str1:
        q = sum(2^(ch-1)*ch.isdigit())
    return q
Function call
print(decodingvalue('10010010'))
Thanks a lot for your help!
I think you are trying to convert binary to int. If so, you can do the following:
s = '101110101'  # renamed from str to avoid shadowing the built-in
# length counts 1 to n; decrementing by 1 gives positions 0 to (n-1)
c = len(s) - 1
q = 0
for ch in s:
    print(q, c, ch)
    q = q + (int(ch) * (2**c))  # in Python, power is '**'
    c = c - 1
    if c == -1:
        break
print(q)
you can of course optimize it and finish in fewer lines.
Note that in Python, ^ (the caret operator) is bitwise XOR, not exponentiation.
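A sketch of the shorter version hinted at above (my condensation, assuming the leftmost character is the most significant bit, as in the loop):
s = '101110101'
q = sum(int(ch) * 2**c for c, ch in enumerate(reversed(s)))
print(q)          # 373
print(int(s, 2))  # 373 -- the built-in does the same conversion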

Recreating JS bitwise integer handling in Python3

I need to translate a hash function from JavaScript to Python.
The function is as follows:
function getIndex(string) {
    var length = 27;
    string = string.toLowerCase();
    var hash = 0;
    for (var i = 0; i < string.length; i++) {
        hash = string.charCodeAt(i) + (hash << 6) + (hash << 16) - hash;
    }
    var index = Math.abs(hash % length);
    return index;
}
console.log(getIndex(window.prompt("Enter a string to hash")));
This function is Objectively Correct™. It is perfection itself. I can't change it, I just have to recreate it. Whatever it outputs, my Python script must also output.
However - I'm having a couple of problems, and I think that it's all to do with the way that the two languages handle signed integers.
JS bitwise operators treat their operands as a sequence of 32 bits. Python, however, has no concept of bit limitation and just keeps going like an absolute madlad. I think that this is the one relevant difference between the two languages.
I can limit the length of hash in Python by masking it to 32 bits with hash & 0xFFFFFFFF.
I can also negate hash if it's above 0x7FFFFFFF with hash = hash ^ 0xFFFFFFFF (or hash = ~hash - they both seem to do the same thing). I believe that this simulates negative numbers.
I apply both of these restrictions to the hash with a function called t.
Here's my Python code so far:
def nickColor(string):
    length = 27

    def t(x):
        x = x & 0xFFFFFFFF
        if x > 0x7FFFFFFF:
            x = x ^ 0xFFFFFFFF
        return x

    string = string.lower()
    hash = t(0)
    for letter in string:
        hash = t(hash)
        hash = t(t(ord(letter)) + t(hash << 6) + t(hash << 16) - t(hash))
    index = hash % length
    return index
It seems to work up until the point that a hash needs to become negative, at which point the two scripts diverge. This normally happens about 4 letters into the string.
I'm assuming that my problem lies in recreating JS negative numbers in Python. How can I get rid of this problem?
Here is a working translation:
def nickColor(string):
    length = 27

    def t(x):
        x &= 0xFFFF_FFFF
        if x > 0x7FFF_FFFF:
            x -= 0x1_0000_0000
        return float(x)

    data = string.lower().encode('utf-16-le')  # renamed from bytes to avoid shadowing the built-in
    hash = 0.0
    for i in range(0, len(data), 2):
        char_code = data[i] + 256 * data[i+1]
        hash = char_code + t(int(hash) << 6) + t(int(hash) << 16) - hash
    return int(hash % length if hash >= 0 else abs(hash % length - length))
The point is that only the shifts (<<) are calculated as 32-bit integer operations; their result is converted back to double before entering the additions and subtractions. I'm not familiar with the rules for double-precision floating-point representation in the two languages, but it's safe to assume that on all personal computing devices and web servers it is the same for both, namely IEEE 754 double precision. For very long strings (thousands of characters) the hash could lose some bits of precision, which of course affects the final result, but in the same way in JS as in Python (not what the author of the Objectively Correct™ function intended, but that's the way it is…). The last line corrects for the different definition of the % operator for negative operands in JavaScript and Python.
Furthermore (thanks to Mark Ransom for reminding me of this), to fully emulate JavaScript you also have to consider its string encoding, which is UTF-16, with surrogate pairs handled as if they consisted of 2 characters. Encoding the string as utf-16-le makes sure that the first byte in each 16-bit “word” is the least significant one; plus, you don't get the BOM that you would get if you used utf-16 tout court (thank you Martijn Pieters).
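A small illustration of that last point (my own example): Python's % always returns a non-negative result for a positive modulus, while JavaScript's keeps the dividend's sign, which is what the final line compensates for:
print(-5 % 27)            # 22 in Python
print(abs(-5 % 27 - 27))  # 5, matching JavaScript's Math.abs(-5 % 27)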

Python3 write string as binary

For a Python 3 programming assignment I have to work with Huffman coding. It's simple enough to generate the correct codes which result in a long string of 0's and 1's.
Now my problem is actually writing this string out as binary, not as text. I attempted to do this:
result = "01010101 ... " #really long string of 0's and 1's
filewrt = open(output_file, "wb") #appending b to w should write as binary, should it not?
filewrt.write(result)
filewrt.close()
however I'm still getting a large text file of 0 and 1 characters. How do I fix this?
EDIT: It seems as if I simply don't understand how to represent arbitrary bits in Python 3.
Based on this SO question I devised this ugly monstrosity:
for char in result:
    filewrt.write(bytes(int(char, 2)))
Instead of getting anywhere close to working, it output a zeroed file that was twice as large as my input file. Can someone please explain to me how to represent bits arbitrarily? And in the context of creating a Huffman tree, how do I go about concatenating or joining bits based on their leaf locations, if I should not use a string to do so?
def intToTextBytes(n, stLen=0):
    bs = b''
    while n > 0:
        bs = bytes([n & 0xff]) + bs  # prepend the lowest byte
        n >>= 8
    return bs.rjust(stLen, b'\x00')

num = 0b01010101111111111111110000000000000011111111111111
bs = intToTextBytes(num)
print(bs)
open(output_file, "wb").write(bs)
EDIT: A more complicated, but faster (about 3 times), way:
from math import log, ceil

intToTextBytes = lambda n, stLen=0: bytes([
    (n >> (i << 3)) & 0xff for i in range(int(ceil(log(n, 256))) - 1, -1, -1)
]).rjust(stLen, b'\x00')
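For comparison, a sketch using only the standard library (not part of the original answer): since Python 3.2, int.to_bytes does this directly:
num = 0b01010101111111111111110000000000000011111111111111
n_bytes = (num.bit_length() + 7) // 8  # round the bit count up to whole bytes
print(num.to_bytes(n_bytes, byteorder='big'))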

How a compiler converts an integer to a string and vice versa

Many languages have functions for converting a string to an integer and vice versa. So what happens there? What algorithm is executed during the conversion?
I'm not asking about a specific language because I think it should be similar in all of them.
To convert a string to an integer, take each character in turn and if it's in the range '0' through '9', convert it to its decimal equivalent. Usually that's simply subtracting the character value of '0'. Now multiply any previous results by 10 and add the new value. Repeat until there are no digits left. If there was a leading '-' minus sign, invert the result.
To convert an integer to a string, start by inverting the number if it is negative. Divide the integer by 10 and save the remainder. Convert the remainder to a character by adding the character value of '0'. Push this to the beginning of the string; now repeat with the value that you obtained from the division. Repeat until the divided value is zero. Put out a leading '-' minus sign if the number started out negative.
Here are concrete implementations in Python, which in my opinion is the language closest to pseudo-code.
def string_to_int(s):
    i = 0
    sign = 1
    if s[0] == '-':
        sign = -1
        s = s[1:]
    for c in s:
        if not ('0' <= c <= '9'):
            raise ValueError
        i = 10 * i + ord(c) - ord('0')
    return sign * i
def int_to_string(i):
    s = ''
    sign = ''
    if i < 0:
        sign = '-'
        i = -i
    while True:
        remainder = i % 10
        i = i // 10  # integer division; plain / would produce a float in Python 3
        s = chr(ord('0') + remainder) + s
        if i == 0:
            break
    return sign + s
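A quick round-trip check of the two functions above (my own usage example):
print(string_to_int('-123'))                  # -123
print(int_to_string(-123))                    # -123
print(int_to_string(string_to_int('40750')))  # 40750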
I wouldn't call it an algorithm per se, but depending on the language it will involve converting characters to their integral equivalents. Many languages will either stop at the first character that cannot be represented as an integer (e.g. the letter a), blindly convert all characters to their ASCII values (e.g. the letter a becomes 97), or ignore the characters that cannot be represented as integers and convert only the ones that can, or return 0 / empty. You have to get more specific about the framework/language to provide more information.
String to integer:
Many (most) languages represent strings, on some level or another, as an array (or list) of characters, which are also short integers. Map the ones corresponding to digit characters to their numeric values. For example, '0' in ASCII is represented by 48. So you map 48 to 0, 49 to 1, and so on up to 9.
Starting from the left, you multiply your current total by 10, add the next character's value, and move on. (You can make a larger or smaller map, change the number you multiply by at each step, and convert strings of any base you like.)
Integer to string is a longer process involving base conversion to 10. I suppose that since most integers have a limited number of bits (32 or 64, usually), you know the string will come to at most a certain number of characters (20?). So you can set up your own adder and iterate through each place for each bit after calculating its value (2^place).
