unpack syntax in python 3 - python-3.x

I am trying to convert hex numbers into decimals using unpack.
When I use:
from struct import *
unpack("<H",b"\xe2\x07")
The output is: 2018, which is what I want.
The thing is I have my hex data in a list as a string in the form of:
asd = ['e2','07']
My question is is there a simple way of using unpack without the backslashes, the x? Something like so:
unpack("<H","e207")
I know this doesn't work but I hope you get the idea.
For clarification I know I could get the data in the form of b'\x11' in the list but then it's interpreted as ASCII, which I don't want, that's why I have it in the format I showed.

You have hex-encoded data, in a text object. So, to go back to raw hex bytes, you can decode the text string. Please note that this is not the usual convention in Python 3.x (generally, text strings are already decoded).
>>> codecs.decode('e207', 'hex')
b'\xe2\x07'
A convenience function for the same thing:
>>> bytes.fromhex('e207')
b'\xe2\x07'
Now you can struct.unpack those bytes. Putting it all together:
>>> asd = ['e2','07']
>>> text = ''.join(asd)
>>> encoded = codecs.decode(text, 'hex')
>>> struct.unpack("<H", encoded)
(2018,)

Related

string to single byte in python

I have a python string that looks like this:
byte_string = "01100110"
I want to turn this into a byte like so:
byte_byte = some_function(byte_string) # byte_byte = 0b01100110
How can I achieve this?
It appears you want to turn that string into a single number. For that you just convert to int with a base of 2.
byte_byte = int(byte_string, 2)
After reflecting on the question some more, I think I originally mis-understood your intent, which is reflected in my below "Prior Answer".
I'm thinking this prior SO question may address your question as well, as it sounds like you want to convert a string of 8 bits into a single byte.
And, as #Mark Ransom has indicated, it would look like this as a decimal code:
int(byte_string, 2)
output: 102
Or like this as the corresponding ascii character:
chr(int(byte_string, 2))
output: 'f'
----Prior Answer Based on Assumption of Encoding a String into an array of bytes------
You can encode a string to bytes like this, optionally passing the specific encoding if you want:
byte_string = "01100110"
byte_string.encode("utf-8")
bytes(byte_string, 'utf-8')
Result:
b'01100110'
Link to python docs for encoding / decoding strings.
Another option would be to create a bytearray:
bytearray(byte_string, 'utf-8')
output: bytearray(b'01100110')
And here is another StackOverflow thread that might be helpful as well, describing the difference between bytes and bytearray objects in relation to their use in converting to and from strings.

Converting 16-digit hexadecimal string into double value in Python 3

I want to convert 16-digit hexadecimal numbers into doubles. I actually did the reverse of this before which worked fine:
import struct
import wrap
def double_to_hex(doublein):
return hex(struct.unpack('<Q', struct.pack('<d', doublein))[0])
for i in modified_list:
encoded_list.append(double_to_hex(i))
modified_list.clear()
encoded_msg = ''.join(encoded_list).replace('0x', '')
encoded_list.clear()
print_command('encode', encoded_message)
And now I want to sort of do the reverse. I tried this without success:
from textwrap import wrap
import struct
import binascii
MESSAGE = 'c030a85d52ae57eac0129263c4fffc34'
#Splitting up message into n 16-bit strings
MSGLIST = wrap(MESSAGE, 16)
doubles = []
print(MSGLIST)
for msg in MSGLIST:
doubles.append(struct.unpack('d', binascii.unhexlify(msg)))
print(doubles)
However, when I run this, I get crazy values, which are of course not what I put in:
[(-1.8561629252326087e+204,), (1.8922789420412524e-53,)]
Were your original numbers -16.657673995556173 and -4.642958715557189 ?
If so, then the problem is that your hex strings contain the big-endian (most-significant byte first) representation of the double, but the 'd' format string in your unpack call specifies conversion using your system's native format, which happens to be little-endian (least-significant byte first). The result is that unpack reads and processes the bytes of the unhexlify'ed string from the wrong end. Unsurprisingly, that will produce the wrong value.
To fix, do one of:
convert the hex string into little-endian format (reverse the bytes, so c030a85d52ae57ea becomes ea57ae525da830c0) before passing it to binascii.unhexlify, or
reverse the bytes produced by unhexlify (change binascii.unhexlify(msg) to binascii.unhexlify(msg)[::-1]) before you pass them to unpack, or
tell unpack to do the conversion using big-endian order (replace the format string 'd' with '>d')
I'd go with the last one, replacing the format string.

Convert string with "\x" character to float

I'm converting strings to floats using float(x). However for some reason, one of the strings is "71.2\x0060". I've tried following this answer, but it does not remove the bytes character
>>> s = "71.2\x0060"
>>> "".join([x for x in s if ord(x) < 127])
'71.2\x0060'
Other methods I've tried are:
>>> s.split("\\x")
['71.2\x0060']
>>> s.split("\x")
ValueError: invalid \x escape
I'm not sure why this string is not formatted correctly, but I'd like to get as much precision from this string and move on.
Going off of wim's comment, the answer might be this:
>>> s.split("\x00")
['71.2', '60']
So I should do:
>>> float(s.split("\x00")[0])
71.2
Unfortunately the POSIX group \p{XDigit} does not exist in the re module. To remove the hex control characters with regular expressions anyway, you can try the following.
impore re
re.sub(r'[\x00-\x1F]', r'', '71.2\x0060') # or:
re.sub(r'\\x[0-9a-fA-F]{2}', r'', r'71.2\x0060')
Output:
'71.260'
'71.260'
r means raw. Take a look at the control characters up to hex 1F in the ASCII table: https://www.torsten-horn.de/techdocs/ascii.htm

How to replace hex value in a string

While importing data from a flat file, I noticed some embedded hex-values in the string (<0x00>, <0x01>).
I want to replace them with specific characters, but am unable to do so. Removing them won't work either.
What it looks like in the exported flat file: https://i.imgur.com/7MQpoMH.png
Another example: https://i.imgur.com/3ZUSGIr.png
This is what I've tried:
(and mind, <0x01> represents a none-editable entity. It's not recognized here.)
import io
with io.open('1.txt', 'r+', encoding="utf-8") as p:
s=p.read()
# included in case it bears any significance
import re
import binascii
s = "Some string with hex: <0x01>"
s = s.encode('latin1').decode('utf-8')
# throws e.g.: >>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 114: invalid start byte
s = re.sub(r'<0x01>', r'.', s)
s = re.sub(r'\\0x01', r'.', s)
s = re.sub(r'\\\\0x01', r'.', s)
s = s.replace('\0x01', '.')
s = s.replace('<0x01>', '.')
s = s.replace('0x01', '.')
or something along these lines in hopes to get a grasp of it while iterating through the whole string:
for x in s:
try:
base64.encodebytes(x)
base64.decodebytes(x)
s.strip(binascii.unhexlify(x))
s.decode('utf-8')
s.encode('latin1').decode('utf-8')
except:
pass
Nothing seems to get the job done.
I'd expect the characters to be replacable with the methods I've dug up, but they are not. What am I missing?
NB: I have to preserve umlauts (äöüÄÖÜ)
-- edit:
Could I introduce the hex-values in the first place when exporting? If so, is there a way to avoid that?
with io.open('out.txt', 'w', encoding="utf-8") as temp:
temp.write(s)
Judging from the images, these are actually control characters.
Your editor displays them in this greyed-out way showing you the value of the bytes using hex notation.
You don't have the characters "0x01" in your data, but really a single byte with the value 1, so unhexlify and friends won't help.
In Python, these characters can be produced in string literals with escape sequences using the notation \xHH, with two hexadecimal digits.
The fragment from the first image is probably equal to the following string:
"sich z\x01 B. irgendeine"
Your attempts to remove them were close.
s = s.replace('\x01', '.') should work.

Convert hexadecimal to normal string

I'm using Python 3.3.2 and I want convert a hex to a string.
This is my code:
junk = "\x41" * 50 # A
eip = pack("<L", 0x0015FCC4)
buffer = junk + eip
I've tried use
>>> binascii.unhexlify("4142")
b'AB'
... but I want the output "AB", no "b'AB'". What can I do?
Edit:
buffer = junk + binascii.unhexlify(eip).decode('ascii')
binascii.Error: Non-hexadecimal digit found
The problem is I can't concatenate junk + eip.
Thank you.
What that b stands for is to denote that is a bytes class, i.e. a string of bytes. If you want to convert that into a string you want to use the decode method.
>>> type(binascii.unhexlify(b"4142"))
<class 'bytes'>
>>> binascii.unhexlify(b"4142").decode('ascii')
'AB'
This results in a string, which is a string of unicode characters.
Edit:
If you want to work purely with binary data, don't do decode, stick with using the bytes type, so in your edited example:
>>> #- junk = "\x41" * 50 # A
>>> junk = b"\x41" * 50 # A
>>> eip = pack("<L", 0x0015FCC4)
>>> buffer = junk + eip
>>> buffer
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xc4\xfc\x15\x00'
Note the b in b"\x41", which denote that as a binary string, i.e. standard string type in python2, or literally a string of bytes rather than a string of unicode characters which are two completely different things.
That's just a literal representation. Don't worry about the b, as it's not actually part of the string itself.
See What does the 'b' character do in front of a string literal?

Resources