Converting 16-digit hexadecimal string into double value in Python 3 - python-3.x

I want to convert 16-digit hexadecimal numbers into doubles. I actually did the reverse of this before which worked fine:
import struct
import wrap
def double_to_hex(doublein):
return hex(struct.unpack('<Q', struct.pack('<d', doublein))[0])
for i in modified_list:
encoded_list.append(double_to_hex(i))
modified_list.clear()
encoded_msg = ''.join(encoded_list).replace('0x', '')
encoded_list.clear()
print_command('encode', encoded_message)
And now I want to sort of do the reverse. I tried this without success:
from textwrap import wrap
import struct
import binascii
MESSAGE = 'c030a85d52ae57eac0129263c4fffc34'
#Splitting up message into n 16-bit strings
MSGLIST = wrap(MESSAGE, 16)
doubles = []
print(MSGLIST)
for msg in MSGLIST:
doubles.append(struct.unpack('d', binascii.unhexlify(msg)))
print(doubles)
However, when I run this, I get crazy values, which are of course not what I put in:
[(-1.8561629252326087e+204,), (1.8922789420412524e-53,)]

Were your original numbers -16.657673995556173 and -4.642958715557189 ?
If so, then the problem is that your hex strings contain the big-endian (most-significant byte first) representation of the double, but the 'd' format string in your unpack call specifies conversion using your system's native format, which happens to be little-endian (least-significant byte first). The result is that unpack reads and processes the bytes of the unhexlify'ed string from the wrong end. Unsurprisingly, that will produce the wrong value.
To fix, do one of:
convert the hex string into little-endian format (reverse the bytes, so c030a85d52ae57ea becomes ea57ae525da830c0) before passing it to binascii.unhexlify, or
reverse the bytes produced by unhexlify (change binascii.unhexlify(msg) to binascii.unhexlify(msg)[::-1]) before you pass them to unpack, or
tell unpack to do the conversion using big-endian order (replace the format string 'd' with '>d')
I'd go with the last one, replacing the format string.

Related

can not decoed using utf-8 after encoding with utf-8

In a situation I had to store data as utf-8 and now when I want to fetch and decode('utf-8') data it's just simply does not work. Consider line below as an example:
\x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87
You can simply copy the line below to convert the string above to the human readable format:
b"\x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87".decode("utf-8")
However could not find a way to convert the string to bytestring without corrupting the string. I tried following methods but all of them failed:
.decode("utf-8")
.decode()
.bytes()
Up until this point I could not find solution in OS or other places. Appreciate any help.
x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87
b'x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87'
The above lines (both given in the question) are particular instances of String and Bytes literals (respectively):
\xhh Character with hex value hh (2, 3)
2 Unlike in Standard C, exactly two hex digits are
required.
3 In a bytes literal, hexadecimal and octal escapes denote
the byte with the given value. In a string literal, these escapes
denote a Unicode character with the given value.
Let's check the string defined in such a way (inside Python prompt):
>>> xstr = "\x0d\x0a\xd8\xb3\xd8\xa7\xd9\x82\xdb\x8c\xe2\x80\x8c\xd9\x86\xd8\xa7\xd9\x85\xd9\x87"
>>> xstr
'\r\nساÙ\x82Û\x8câ\x80\x8cÙ\x86اÙ\x85Ù\x87'
>>> print( xstr)
ساÙÛâÙاÙ
Ù
>>>
Apparently, the print( xstr) output does not resemble a word in any known language however all its characters belong (by definition) to Unicode range r'[\u0000-\u00ff]' i.e. the first 256 of characters in Unicode, and voila - it's iso-8859-1 aka 'latin1'.
We need to get an encoded version of the xstr string as a bytes object, e.g. using str.encode method or built-in bytes() function. Then
print( bytes(xstr,'latin1').decode()); print(xstr.encode("latin1").decode())
ساقی‌نامه
ساقی‌نامه

How to replace hex value in a string

While importing data from a flat file, I noticed some embedded hex-values in the string (<0x00>, <0x01>).
I want to replace them with specific characters, but am unable to do so. Removing them won't work either.
What it looks like in the exported flat file: https://i.imgur.com/7MQpoMH.png
Another example: https://i.imgur.com/3ZUSGIr.png
This is what I've tried:
(and mind, <0x01> represents a none-editable entity. It's not recognized here.)
import io
with io.open('1.txt', 'r+', encoding="utf-8") as p:
s=p.read()
# included in case it bears any significance
import re
import binascii
s = "Some string with hex: <0x01>"
s = s.encode('latin1').decode('utf-8')
# throws e.g.: >>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 114: invalid start byte
s = re.sub(r'<0x01>', r'.', s)
s = re.sub(r'\\0x01', r'.', s)
s = re.sub(r'\\\\0x01', r'.', s)
s = s.replace('\0x01', '.')
s = s.replace('<0x01>', '.')
s = s.replace('0x01', '.')
or something along these lines in hopes to get a grasp of it while iterating through the whole string:
for x in s:
try:
base64.encodebytes(x)
base64.decodebytes(x)
s.strip(binascii.unhexlify(x))
s.decode('utf-8')
s.encode('latin1').decode('utf-8')
except:
pass
Nothing seems to get the job done.
I'd expect the characters to be replacable with the methods I've dug up, but they are not. What am I missing?
NB: I have to preserve umlauts (äöüÄÖÜ)
-- edit:
Could I introduce the hex-values in the first place when exporting? If so, is there a way to avoid that?
with io.open('out.txt', 'w', encoding="utf-8") as temp:
temp.write(s)
Judging from the images, these are actually control characters.
Your editor displays them in this greyed-out way showing you the value of the bytes using hex notation.
You don't have the characters "0x01" in your data, but really a single byte with the value 1, so unhexlify and friends won't help.
In Python, these characters can be produced in string literals with escape sequences using the notation \xHH, with two hexadecimal digits.
The fragment from the first image is probably equal to the following string:
"sich z\x01 B. irgendeine"
Your attempts to remove them were close.
s = s.replace('\x01', '.') should work.

unpack syntax in python 3

I am trying to convert hex numbers into decimals using unpack.
When I use:
from struct import *
unpack("<H",b"\xe2\x07")
The output is: 2018, which is what I want.
The thing is I have my hex data in a list as a string in the form of:
asd = ['e2','07']
My question is is there a simple way of using unpack without the backslashes, the x? Something like so:
unpack("<H","e207")
I know this doesn't work but I hope you get the idea.
For clarification I know I could get the data in the form of b'\x11' in the list but then it's interpreted as ASCII, which I don't want, that's why I have it in the format I showed.
You have hex-encoded data, in a text object. So, to go back to raw hex bytes, you can decode the text string. Please note that this is not the usual convention in Python 3.x (generally, text strings are already decoded).
>>> codecs.decode('e207', 'hex')
b'\xe2\x07'
A convenience function for the same thing:
>>> bytes.fromhex('e207')
b'\xe2\x07'
Now you can struct.unpack those bytes. Putting it all together:
>>> asd = ['e2','07']
>>> text = ''.join(asd)
>>> encoded = codecs.decode(text, 'hex')
>>> struct.unpack("<H", encoded)
(2018,)

Why is this error appearing?

AttributeError: 'builtin_function_or_method' object has no attribute 'encode'
I'm trying to make a text to code converter as an example for an assignment and this is some code based off of some I found in my research,
import binascii
text = input('Message Input: ')
data = binascii.b2a_base64.encode(text)
text = binascii.a2b_base64.encode(data)
print (text), "<=>", repr(data)
data = binascii.b2a_uu(text)
text = binascii.a2b_uu(data)
print (text), "<=>", repr(data)
data = binascii.b2a_hqx(text)
text = binascii.a2b_hqx(data)
print (text), "<=>", repr(data)
can anyone help me get it working? it's supposed to take an input in and then convert it into hex and others and display those...
I am using Python 3.6 but I am also a little out of practice...
TL;DR:
data = binascii.b2a_base64(text.encode())
text = binascii.a2b_base64(data).decode()
print (text, "<=>", repr(data))
You've hit on a common problem in the Python3 - str object vs bytes object. The bytes object contains sequence of bytes. One byte can contain any number from 0 to 255. Usually those number are translated through the ASCII table into a characters like english letters. Usually in the Python you should use bytes for working with binary data.
On the other hand the str object contains sequence of code points. One code point usually represent one character printed on your screen when you call print. Internally it is sequence of bytes so the Chinese symbol 的 is internally saved as 3 bytes long sequence.
Now to the your problem. The function requires as input the bytes object but you've got a str object from the function input. To convert str into bytes you have to call str.encode() method on the str object.
data = binascii.b2a_base64(text.encode())
Your original call binascii.b2a_base64.encode(text) means call method encode of the object binascii.b2a_base64 with parameter text.
The function binascii.b2a_base64 returns bytes contains original input encoded with the base64 algorithms. Now to get back the original str from encoded data you have to call this:
# Take base64 encoded data and return it decoded as bytes object
decoded_data = binascii.a2b_base64(data)
# Convert bytes object into str
text = decoded_data.decode()
It can be written as one line
decoded_data = binascii.a2b_base64(data).decode()
WARNING: Your call of print is invalid for Python 3 (it will work only in the python console)

Hexadecimal string with float number to float

Iam working in python 3.6
I receive from serial comunication an string '3F8E353F'. This is a float number 1.111. How can I convert this string to a float number?
Thank you
Ah yes. Since this is 32-bits, unpack it into an int first then:
x='3F8E353F'
struct.unpack('f',struct.pack('i',int(x,16)))
On my system this gives:
>>> x='3F8E353F'
>>> struct.unpack('f',struct.pack('i',int(x,16)))
(1.1109999418258667,)
>>>
Very close to the expected value. However, this can give 'backwards' results based on the 'endianness' of bytes in your system. Some systems store their bytes least significant byte first, others most significant byte first. See this reference page for the descriptors to format based on byte order.
I used struct.unpack('f',struct.pack('i',int(x,16))) to convert Hex value to float but for negative value I am getting below error
struct.error: argument out of range
To resolve this I used below code which converts Hex (0xc395aa3d) to float value (-299.33). It works for both positive as well for negative value.
x = 0xc395aa3d
struct.unpack('f', struct.pack('I', int(x,16) ))
Another way is to use bytes.fromhex()
import struct
hexstring = '3F8E353F'
struct.unpack('!f', bytes.fromhex(hexstring))[0]
#answer:1.1109999418258667
Note: The form '!' is available for those poor souls who claim they can’t remember whether network byte order is big-endian or little-endian (from struct docs).

Resources