Hexadecimal byte string conversion to unicode in python - python-3.x

I read a string from a text file which contains many special characters in 'rb' mode and store it in a variable. But when I insert it into a textbox in tkinter some characters go missing.
inp = tut.encrypt_text(tut.getKey(password.get()),text.get("1.0", END))
The code passes a string to the encrypting function to encrypt the text and then gets the encrypted text in hex byte string.
outfile.write(encryptor.encrypt(chunk))
this code is used for writing the encrypted text to a file and then the same is being read from the file and returned back to the call made above.
with open(outputFile, 'rb') as out:
output =out.read()
return output
The encrypted text in file is:
0000000000000011P”=êäS£¾1:–ø{pâRÆA<,ã É˜“Uê
Here itself some characters are missing. When i read it in 'rb' mode and store in variable output and return to the call made, the variable has the value
b'0000000000000011P\x94=\xea\xe4S\x0f\xa3\xbe1\x18:\x96\xf8{p\xe2R\xc6A<,\xe3\xa0\xc9\x8d\x98\x93\x04\x08U\xea'
But when I insert it in the textbox using the code
text.delete('1.0', END)
text.insert(END, inp)
The text that is being printed in the textbox is
0000000000000011P=êäS£¾1:ø{pâRÆA<,ã ÉUê.
This text is different from the one which was in the file. How can I get the same string which is in the text file in the textbox?

Related

generate a hex file by converting strings into hex from a text file in Python

I have a Python tool that generates a text file in which each line has a string. I want to generate a hex file using this text file. The file has lines the following line:
-5.139488050547036391e-01
3.181812818225058681e+00
475.465798764
abc[0]
abc[0]*abc[10]
I tried using binascii.hexlify(b'<String>'), which works when I manually enter the strings, but when I do that:
with open("strings.txt", "r") as a_file:
for line in a_file:
if not line.strip():
continue
stripped_line = line.strip()
hex_= binascii.hexlify(b'<'+ stripped_line +'>')
print(hex_)
I get this error:
TypeError: can't concat str to bytes
How can I convert those strings of different types into hex and generate a .hex file?
To convert a string (line in file) to bytes object you have to encode it. In your case, that only means to replace this line from your code
hex_= binascii.hexlify(b'<'+ stripped_line +'>')
with this line
hex_= binascii.hexlify(stripped_line.encode())
Your code ran into error, because you tried to concatenate 'b<' (byte object) with stripped_line (string object) and it has no meaning in python.

How to convert Hex to original file format?

I have a .tgz file that was formatted as shell code, it looks like this (Hex):
"\x1F\x8B\x08\x00\x44\x7A\x91\x4F\x00\x03\xED\x59\xED\x72.."
It was generated this way (python3):
import os
def main():
dump_src = "MyPlugin.tgz"
fc = ""
try:
with open(dump_src, 'rb') as fd:
fcr = fd.read()
for byte in bytearray(fcr):
fc += "\\x{:02x}".format(byte)
except:
fcr = dump_src
for byte in bytearray(fcr):
fc += "\\x{:02x}".format(byte)
print(fc)
# failed attempt:
fcback = bytes(int(fc[i+2:i+4], 16) for i in range(0, len(fc), 4))
print (fcback)
if __name__ == "__main__":
main()
How can I convert this back to the original tgz archive?
Edit: failed attempt in the last section outputs this:
b'\x8b\x00\x10]\x03\x93o0\x85%\xe2!\xa4H\xf1Fi\xa7\x15\xf61&\x13N\xd9[\xfag\x11V\x97\xd3\xfb%\xf7\xe3\\\xae\xc2\xff\xa4>\xaf\x11\xcc\x93\xf1\x0c\x93\xa4\x1b\xefxj\xc3?\xf9\xc1\xe8\xd1\xd9\x01\x97qB"\x1a\x08\x9cO\x7f\xe9\x19\xe3\x9c\x05\xf2\x04a\xaa\x00A,\x15"RN-\xb6\x18K\x85\xa1\x11\x83\xac/\xffR\x8a\xa19\xde\x10\x0b\x08\x85\x93\xfc]\x8a^\xd2-T\x92\x9a\xcc-W\xc7|\xba\x9c\xb3\xa6V0V H1\x98\xde\x03#\x14\'\n 1Y\xf7R\x14\xe2#\xbe*:\xe0\xc8\xbb\xc9\x0bo\x8bm\xed.\xfd\xae\xef\x9fT&\xa1\xf4\xcf\xa7F\xf4\xef\xbb"8"\xb5\xab,\x9c\xbb\xfc3\x8b\xf5\x88\xf4A\x0ek%5eO\xf4:f\x0b\xd6\x1bi\xb6\xf3\xbf\xf7\xf9\xad\xb5[\xdba7\xb8\xf9\xcd\xba\xdd,;c\x0b\xaaT"\xd4\x96\x17\xda\x07\x87& \xceH\xd6\xbf\xd2\xeb\xb4\xaf\xbd\xc2\xee\xfc\'3zU\x17>\xde\x06u\xe3G\x7f\x1e\xf3\xdf\xb6\x04\x10A\x04\x10A\x04\x10A\x04\x10A\xff\x9f\xab\xe8(\x00'
And when I output it to a file (e.g. via python3 main.py > MyFile.tgz) the file is corrupted.
Since you know the format of the data (each byte is encoded as a string of 4 characters in the format "\xAB") it's easy to revert the conversion and get the original bytes again. It'll only take one line of Python code:
data = bytes(int(fc[i+2:i+4], 16) for i in range(0, len(fc), 4))
This uses:
range(start, stop, step) with step 4 to iterate in groups of 4 characters through your string
slicing to get each group of 2 hexadecimal digits
int(x, base) to convert the hexadecimal string to an integer
a generator expression to immediately pass the converted elements to:
bytes() to create a bytes object with the data
The variable data is now of type bytes and you could directly write it to a file (to decompress with an external zip program), or pass it to zlib.decompress() (to further process it in Python).
UPDATE (follow-up on the comments and updated question):
Firstly, I have tested the above code and it does result in the same bytes as the input. Are you really sure that the example output in your question is the actual result of the code in your question? Please try to be careful when copying code and/or output. A few remarks:
Your code is not properly formatted, so I cannot run it without making modifications. And when I have made modifications to the code, I might run different code than you do, yielding different results. So next time please copy-paste your exact (working, tested) code without modifications.
The format string in your code uses lowercase hexadecimal format, and your first example output uses uppercase. So that output cannot be from this code.
I don't have access to your file "MyPlugin.tgz", but when I test your code with another .tgz file (after fixing the IndentationErrors), my output is correct. It starts with \x1f\x8b as expected (this is the magic number in the gzip header). I can't explain why your output is different...
Secondly, it seems like you don't fully understand how bytes and string representations work. When you write print(fcback), a string representation of the Python object fcback (in this case a bytes object) is printed. The string representation of a bytes object is not the same as the binary data! When printing a bytes object, each byte that corresponds to a printable ASCII character is replaced by that character, other bytes are escaped (similar to the formatted string that your code generates). Also, it starts with b' and ends with '.
You cannot print binary data to your terminal and then pipe the output to a file. This will result in a different file. The correct way to write the data to a file is using file.write(data) in your Python code.
Here's a fully working example:
def binary_to_text(data):
"""Convert a bytes object to a formatted text string."""
text = ""
for byte in data:
text += "\\x{:02x}".format(byte)
return text
def text_to_binary(text):
"""Convert a formatted text string to a bytes object."""
return bytes(int(text[i+2:i+4], 16) for i in range(0, len(text), 4))
def main():
# Read the binary data from input file:
with open('MyPlugin.tgz', 'rb') as input_file:
input_data = input_file.read()
# Convert binary to text (based on your original code):
text = binary_to_text(input_data)
print(text[0:100])
# Convert the text back to binary:
output_data = text_to_binary(text)
print(output_data[0:100])
# Write the binary data back to a file:
with open('MyPlugin-restored.tgz', 'wb') as output_file:
output_file.write(output_data)
if __name__ == '__main__':
main()
Note that I only print the first 100 elements to keep the output short. Also notice that the second print-statement prints a much longer text. This is because the first print gets 100 characters (which are printed "as is"), while the second print gets 100 bytes (of which most bytes are escaped, causing the output to be longer).

Trying to output hex data as readable text in Python 3.6

I am trying to read hex values from specific offsets in a file, and then show that as normal text. Upon reading the data from the file and saving it to a variable named uName, and then printing it, this is what I get:
Card name is: b'\x95\xdc\x00'
Here's the code:
cardPath = str(input("Enter card path: "))
print("Card name is: ", end="")
with open(cardPath, "rb+") as f:
f.seek(0x00000042)
uName = f.read(3)
print(uName)
How can remove the 'b' I am getting at the beginning? And how can I remove the '\x'es so that b'\x95\xdc\x00' becomes 95dc00? If I can do that, then I guess I can convert it to text using binascii.
I am sorry if my mistake is really really stupid because I don't have much experience with Python.
Those string started with b in python is a byte string.
Usually, you can use decode() or str(byte_string,'UTF-8) to decode the byte string(i.e. the string start with b') to string.
EXAMPLE
str(b'\x70\x79\x74\x68\x6F\x6E','UTF-8')
'python'
b'\x70\x79\x74\x68\x6F\x6E'.decode()
'python'
However, for your case, it raised an UnicodeDecodeError during decoding.
str(b'\x95\xdc\x00','UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte
I guess you need to find out the encoding for your file and then specify it when you open the file, like below:
open("u.item", encoding="THE_ENCODING_YOU_FOUND")

Writing to files in ASCII with Python3, not UTF8

I have a program that I created with two sections.
The first one copies a text file with an integer in the middle of the file name in this format.
file = "Filename" + "str(int)" + ".txt"
the user can create as many copies of the file that they would like.
The second part of the program is what I am having the problem with. There is an integer at the very bottom of the file that is to correspond with the integer in the file name. After the first part is done, I open each file one at a time in "r+" read/write format. So I can file.seek(1000) to about where the integer is in the file.
Now in my opinion the next part should be easy. I should just simply have to write str(int) into the file right here. But it wasn't that easy. It worked just fine doing it like that in Linux at home, but at work on Windows it proved difficult. What I ended up having to do after file.seek(1000) is write to the file using Unicode UTF-8. I accomplished this with this code snippet of the rest of the program. I will document it so that it is able to be understood what is going on. Instead of having to write this in Unicode, I would love to be able to write this in good old regular English ASCII characters. Eventually this program will be expanded to include a lot more data at the bottom of each file. Having to write the data in Unicode is going to make things extremely difficult. If I just write the data without turning it into Unicode this is the result. This string is supposed to say #2 =1534, instead it says #2 =ㄠ㌵433.
If someone can show me what I am doing wrong that would be great. I would love to just use something like file.write('1534') to write the data to the file instead of having to do it in Unicode UTF-8.
while a1 < d1 :
file = "file" + str(a1) + ".par"
f = open(file, "r+")
f.seek(1011)
data = f.read() #reads the data from that point in the file into a variable.
numList= list(str(a1)) # "a1" is the integer in the file name. I had to turn the integer into a list to accomplish the next task.
replaceData = '\x00' + numList[0] + '\x00' + numList[1] + '\x00' + numList[2] + '\x00' + numList[3] + '\x00' #This line turns the integer into Utf 8 Unicode. I am by no means a Unicode expert.
currentData = data #probably didn't need to be done now that I'm looking at this.
data = data.replace(currentData, replaceData) #replaces the Utf 8 string in the "data" variable with the new Utf 8 string in "replaceData."
f.seek(1011) # Return to where I need to be in the file to write the data.
f.write(data) # Write the new Unicode data to the file
f.close() #close the file
f.close() #make sure the file is closed (sometimes it seems that this fails in Windows.)
a1 += 1 #advances the integer, and then return to the top of the loop
This is an example of writing to a file in ASCII. You need to open the file in byte mode, and using the .encode method for strings is a convenient way to get the end result you want.
s = '12345'
ascii = s.encode('ascii')
with open('somefile', 'wb') as f:
f.write(ascii)
You can obviously also open in rb+ (read and write byte mode) in your case if the file already exists.
with open('somefile', 'rb+') as f:
existing = f.read()
f.write(b'ascii without encoding!')
You can also just pass string literals with the b prefix, and they will be encoded with ascii as shown in the second example.

Highlight the text in python and save it in word file

I am trying to take the text from word file and highlighted the required text and aging want to save the text into new word file.
I am able to highlight the text using ANSI escape sequences but I am unable add it back to the word file.
from docx import Document
doc = Document('t.docx')
##string present in t.docx '''gnjdkgdf helloworld dnvjk dsfgdzfh jsdfKSf klasdfdf sdfvgzjcv'''
if 'helloworld' in doc.paragraphs[0].text:
high=doc.paragraphs[0].text.replace('helloworld', '\033[43m{}\033[m'.format('helloworld'))
doc.add_paragraph(high)
doc.save('t1.docx')
getting this error.
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
Instead of using ANSI escape sequences, you could use python-docx's built-in Font highlight color:
from docx import Document
from docx.enum.text import WD_COLOR_INDEX
doc = Document('t.docx')
##string present in t.docx '''gnjdkgdf helloworld dnvjk dsfgdzfh jsdfKSf klasdfdf sdfvgzjcv'''
# Get the first paragraph's text
p1_text = doc.paragraphs[0].text
# Create a new paragraph with "helloworld" highlighted
p2 = doc.add_paragraph()
substrings = p1_text.split('helloworld')
for substring in substrings[:-1]:
p2.add_run(substring)
font = p2.add_run('helloworld').font
font.highlight_color = WD_COLOR_INDEX.YELLOW
p2.add_run(substrings[-1])
# Save document under new name
doc.save('t1.docx')

Resources