How to convert hex encoded bytes to String in Python3? - python-3.x

I read some value from Windows Registry (SAM) with Python3. As far as I can tell it looks like hex encoded bytes:
>>> b = b'A\x00d\x00m\x00i\x00n\x00i\x00s\x00t\x00r\x00a\x00t\x00o\x00r\x00'
>>> print(b)
A d m i n i s t r a t o r
Now how would I convert that to a String (should be "Administrator")? Using "print" just gives me "A d m i n i s t r a t o r". How to do the conversion correctly without using dirty tricks?

b = b'A\x00d\x00m\x00i\x00n\x00i\x00s\x00t\x00r\x00a\x00t\x00o\x00r\x00'
b = b.replace(b'\x00', b'')
print(b)
# b'Administrator'

I probably should have used UTF-16 decoding:
>>> b = b'A\x00d\x00m\x00i\x00n\x00i\x00s\x00t\x00r\x00a\x00t\x00o\x00r\x00'
>>> print(b.decode('utf-16'))
Administrator
SORRY!
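A small sketch of the same decoding with the byte order spelled out. It assumes the registry value is little-endian UTF-16 without a byte order mark, which is what the sample bytes look like. Stripping a trailing NUL is an extra assumption, for values that carry a terminator.

```python
raw = b'A\x00d\x00m\x00i\x00n\x00i\x00s\x00t\x00r\x00a\x00t\x00o\x00r\x00'

# 'utf-16-le' names the byte order explicitly, so the result does not
# depend on the native byte order that plain 'utf-16' falls back to
# when the data has no byte order mark.
name = raw.decode('utf-16-le')

# Assumption: some values end with a NUL terminator; strip it if present.
name = name.rstrip('\x00')
print(name)
```

The `replace(b'\x00', b'')` approach only works while every character fits in one byte; decoding handles any text stored as UTF-16.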

Related

converting hex to base64 - what really happens in this process?

in attempt to make simple code to convert a hex string to base64:
my thought was: hex -> integer -> binary -> base64
so i wrote this little code:
import string

def bit(integer):
    # To binary: bin() returns a string such as '0b1010', so drop the '0b' prefix
    return bin(integer)[2:]

# Each hex digit is multiplied by a power of 16 depending on its position:
# 0xAB = A*(16**1) + B*(16**0) = 10*16 + 11
# 0x3a2f = 3*(16**3) + 10*(16**2) + 2*(16**1) + 15

def removeletter(list):
    # "abcdef" = "10 11 12 13 14 15"
    for i, letter in enumerate(list):
        if letter in hextable.keys():
            list[i] = hextable[letter]
    return list

def todecimal(h):
    power = 0
    l = [num for num in str(h)]  # ['3', 'a', '2', 'f']
    l = removeletter(l)
    l.reverse()  # [15, '2', 10, '3']
    for i, n in enumerate(l):
        number = int(n)
        l[i] = number*(16**power)
        power += 1
    l.reverse()
    return sum(l)

lowers = string.ascii_lowercase
hextable = {}
for number, letter in enumerate(lowers[:6]):
    hextable[letter] = number + 10
in this little challenge i am doing, it says:
The string:
49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
Should produce:
SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
ok,
print(bit(todecimal('49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d')))
this should get the binary of the hex string, which if I put through a binary to base64 converter, should return SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t. or so i thought.
└─$ python3 hextobase64.py
10010010010011101101101001000000110101101101001011011000110110001101001011011100110011100100000011110010110111101110101011100100010000001100010011100100110000101101001011011100010000001101100011010010110101101100101001000000110000100100000011100000110111101101001011100110110111101101110011011110111010101110011001000000110110101110101011100110110100001110010011011110110111101101101
after checking using a hex to binary converter, i can see that the binary is correct.
now, if i put this through a binary to base64 converter, it should return SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t, right?
the thing is, this binary to base64 converter gave me kk7aQNbS2NjS3M5A8t7q5EDE5MLS3EDY0tbKQMJA4N7S5t7c3urmQNrq5tDk3t7a which is odd to me. so I must be doing something wrong. from my understanding, hexadecimal can be represented in binary, and so can base64. base64 takes the binary and groups the binary by 6 digits to produce its own representation. so obviously if I have the binary, it should be interchangeable, but something is wrong.
what am i doing wrong?
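One likely explanation, offered as a sketch rather than a definitive diagnosis: the hex string has 96 digits, so it stands for 384 bits, but bin() drops leading zeros. The first byte 0x49 is 01001001, so the printed binary starts with 1001001 and is one bit short, and every 6-bit group the converter reads is shifted by one position. Padding the bit string back to 4 bits per hex digit, or skipping the integer step and converting the hex straight to bytes, avoids that. The variable names below are illustrative.

```python
import base64

hex_string = '49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d'

# bin() returns the shortest representation, so the leading zero bit is lost.
bits = bin(int(hex_string, 16))[2:]

# Restore the leading zeros: every hex digit stands for exactly 4 bits.
padded = bits.zfill(len(hex_string) * 4)
print(len(bits), len(padded))
print(padded[:6])  # the first 6-bit group that base64 would read

# Direct route: hex -> bytes -> base64, without going through an integer.
encoded = base64.b64encode(bytes.fromhex(hex_string)).decode('ascii')
print(encoded)
```

Working on bytes rather than on one large integer also keeps any leading zero bytes, which an integer conversion would silently discard.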

How do i convert all the variables into an int straightaway

a, b, c = input().split()
I need a way to convert a, b and c to int immediately, without having to do something like
a = int(a)
b = int(b)
c = int(c)
please help :)
l = input().split(',')
l = [ int(i) for i in l ]
a, b, c = l
Run the code and enter 3 integers separated by ',' (for example 1,2,3); a, b and c will then contain the 3 integers.
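The same idea can be sketched in one step with map or a small helper. The helper name parse_ints is made up for illustration, and literal strings stand in for input() so the snippet runs without a prompt.

```python
def parse_ints(text, sep=None):
    """Split text on sep (any whitespace when sep is None) and convert each piece to int."""
    return [int(piece) for piece in text.split(sep)]

# Whitespace-separated values, as in the question.
a, b, c = parse_ints("3 4 5")

# Comma-separated values, unpacked directly from map().
x, y, z = map(int, "7,8,9".split(','))

print(a, b, c, x, y, z)
```

With real input this would read `a, b, c = map(int, input().split())`. Unpacking raises ValueError if the user does not enter exactly three values.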

len(str) giving wrong result after retrieving str from filename

Any idea why I am getting a length of 6 instead of 5?
I created a file called björn-100.png and ran the code using python3:
import os
for f in os.listdir("."):
    p = f.find("-")
    name = f[:p]
    print("name")
    print(name)
    length = len(name)
    print(length)
    for a in name:
        print(a)
prints out the following:
name
björn
6
b
j
o
̈
r
n
instead of printing out
name
björn
5
b
j
ö
r
n
If you're using python 2.7, you can simply decode the file name as UTF-8 first:
length = len(name.decode('utf-8'))
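But since you're using Python 3, where a str is already Unicode text and has no decode method, I recommend using unicodedata to normalize the string. The file name here stores "ö" as two code points (an "o" followed by a combining diaeresis), and NFC normalization composes them into one.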
import unicodedata
length = len(unicodedata.normalize('NFC', name))
The way to get the correct string with the two dots inside the o char is:
import unicodedata
name = unicodedata.normalize('NFC', name)
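A self-contained sketch of the effect, using a hard-coded decomposed string in place of the real file name (an assumption, so the example does not depend on the file system):

```python
import unicodedata

# 'o' followed by U+0308 COMBINING DIAERESIS: renders as one glyph, counts as two code points.
decomposed = "bjo\u0308rn"

# NFC composes the pair into the single code point U+00F6.
composed = unicodedata.normalize('NFC', decomposed)

print(len(decomposed), len(composed))
print(composed == "bj\u00f6rn")
```

Normalizing both sides before comparing file names avoids mismatches between visually identical strings.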

Can I print symbols in hex with more than 2 characters?

I am using Python 3.5.2
I know that with print("\x00") (where 0 is an ASCII character) I can print symbols with hex format. But how can I print number 500,000 (in hex: 7A120) when print("\x00") takes only 2 characters?
To print a constant hexadecimal expression, prefix the number with 0x; it resolves to an int with the equivalent base-10 value:
>>> print(0x7A120)
500000
If you have a string of hexadecimal digits and want its value, parse it with int:
>>> a = "7A120"
>>> print(int(a, 16))
500000
The second argument to int is the base to parse the string from, in this case base 16 (hex).
To print an integer in hexadecimal format, use the % formatting operator:
>>> a = 0x7A120
>>> print("%x" % a)
7a120
You can change the "x" in "%x" to uppercase to print a through f in uppercase:
>>> b = 0xABCDEF
>>> print("%x" % b)
abcdef
>>> print("%X" % b)
ABCDEF
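A short sketch of two related points, assuming the goal is either to format a number as hex or to write a character whose code point needs more than two hex digits. In a string literal `\x` takes exactly two hex digits, `\u` takes four and `\U` takes eight. chr() accepts any code point up to 0x10FFFF. The characters chosen below are only illustrative.

```python
n = 500000

print(format(n, 'x'))    # lowercase hex digits, no prefix
print(format(n, '08X'))  # uppercase, zero-padded to 8 digits
print(hex(n))            # with the 0x prefix

# A code point above 0xFF needs \u (4 digits) or \U (8 digits), or chr().
alpha = "\u03b1"
print(alpha == chr(0x3B1))

# \U with eight digits still produces a single character.
print(len("\U0001F600"))
```

format() is used instead of an f-string because f-strings need Python 3.6, and the question mentions 3.5.2.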

Python SpellCheck str object has no attribute read

Trying to get this spellchecker I came across online to work, but no luck. Any help Would be appreciated. Original code from http://norvig.com/spell-correct.html
import re, collections, codecs

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

file = codecs.open('C:\88888\88888\88888\88888\8888\A Word.txt', encoding='utf-8', mode='r')
NWORDS = train(words(file.read()))
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)
Error:
File "C:\8888\8888\8888\8888\88888\SpellCheck.py", line 11
file = codecs.open('C:\888\888\888\8888\88888\A Word.txt', encoding='utf-8', mode='r')
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
OK, try this:
take a string value '\x' and try to do something with it,
or try
string('\x.....')
That returns your error, right?
So if you have a string defined as, say,
x = string('\y\o\u \c\a\n \n\e\v\e\r \c\h\a\n\g\e \t\h\i\s \i\n \p\y\t\h\o\n')
then you are just out of luck.
It would be a bummer if the user decided to type a '\' as any character of the input.
To fix the problem you could try using some looping or recursive code like:
How to remove illegal characters from path and filenames?
C:\88888\88888\88888\88888\8888\A Word.txt - that's the strangest path I've seen this year :)
Try replacing it with C:\\88888\\88888\\88888\\88888\\8888\\A Word.txt
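A minimal sketch of the usual ways to write a Windows path so that backslashes are not read as escape sequences. The path below is a made-up placeholder, not the asker's real one. The error message mentions \UXXXXXXXX, which suggests the real path contains a \U sequence (as in C:\Users), although the anonymised path in the question does not show it.

```python
from pathlib import PureWindowsPath

# Hypothetical path, used only for illustration.
escaped = 'C:\\Users\\example\\A Word.txt'  # every backslash doubled
raw = r'C:\Users\example\A Word.txt'        # raw string: backslashes kept literally
forward = 'C:/Users/example/A Word.txt'     # forward slashes need no escaping

# The doubled-backslash literal and the raw literal are the same string.
print(escaped == raw)

# PureWindowsPath treats both separators as equivalent.
print(PureWindowsPath(forward) == PureWindowsPath(raw))
```

Escape processing applies only to string literals in source code. A path read from input() or from a file arrives with its backslashes intact.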
