string to single byte in python - python-3.x

I have a python string that looks like this:
byte_string = "01100110"
I want to turn this into a byte like so:
byte_byte = some_function(byte_string) # byte_byte = 0b01100110
How can I achieve this?

It appears you want to turn that string into a single number. For that, just convert it to an int with base 2:
byte_byte = int(byte_string, 2)
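If you need an actual bytes object of length 1 rather than an int, you can convert with int(..., 2) and then wrap the value with bytes([...]). Here is a minimal sketch. The helper name bits_to_byte is hypothetical, and value.to_bytes(1, "big") would work just as well:

```python
def bits_to_byte(bit_string):
    """Convert a string of exactly 8 '0'/'1' characters to a 1-byte bytes object.

    bits_to_byte is a hypothetical helper name used for illustration.
    """
    # reject anything that is not exactly eight binary digits
    if len(bit_string) != 8 or set(bit_string) - {"0", "1"}:
        raise ValueError("expected 8 binary digits, got %r" % bit_string)
    value = int(bit_string, 2)   # "01100110" -> 102
    return bytes([value])        # same result as value.to_bytes(1, "big")


byte_string = "01100110"
byte_byte = bits_to_byte(byte_string)
print(byte_byte)      # b'f'
print(byte_byte[0])   # 102
```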

After reflecting on the question some more, I think I originally misunderstood your intent, which shows in my "Prior Answer" below.
I'm thinking this prior SO question may address your question as well, as it sounds like you want to convert a string of 8 bits into a single byte.
And, as Mark Ransom has indicated, it would look like this as a decimal value:
int(byte_string, 2)
output: 102
Or like this as the corresponding ascii character:
chr(int(byte_string, 2))
output: 'f'
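Keep in mind that chr() returns a str, not a byte. For values from 128 to 255 the difference becomes visible once you encode the result. Here is a quick check, assuming UTF-8 is the encoding you would otherwise reach for:

```python
value = int("11100110", 2)        # 230

as_char = chr(value)              # a one-character str (U+00E6)
as_byte = bytes([value])          # a one-byte bytes object

print(as_char.encode("utf-8"))    # b'\xc3\xa6'  (UTF-8 needs two bytes here)
print(as_char.encode("latin-1"))  # b'\xe6'      (latin-1 maps 0-255 one-to-one)
print(as_byte)                    # b'\xe6'
```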
----Prior Answer Based on Assumption of Encoding a String into an array of bytes------
You can encode a string to bytes like this, optionally passing the specific encoding if you want:
byte_string = "01100110"
byte_string.encode("utf-8")
bytes(byte_string, 'utf-8')
Result:
b'01100110'
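This is a different thing from the single byte asked about. Encoding produces one byte per character, namely the ASCII codes of '0' (48) and '1' (49). A short check:

```python
byte_string = "01100110"
encoded = byte_string.encode("utf-8")

print(len(encoded))    # 8, one byte per character
print(list(encoded))   # [48, 49, 49, 48, 48, 49, 49, 48]
print(encoded == bytes(byte_string, "utf-8"))  # True, both spellings agree
```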
Link to python docs for encoding / decoding strings.
Another option would be to create a bytearray:
bytearray(byte_string, 'utf-8')
output: bytearray(b'01100110')
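The practical difference between the two is mutability: a bytearray can be changed in place, while bytes cannot. A small illustration:

```python
byte_string = "01100110"
frozen = bytes(byte_string, "utf-8")
mutable = bytearray(byte_string, "utf-8")

mutable[0] = ord("1")      # bytearray supports item assignment
mutable.append(ord("0"))   # ...and growing in place
print(mutable)             # bytearray(b'111001100')

try:
    frozen[0] = ord("1")   # bytes is immutable, so this raises TypeError
except TypeError as exc:
    print("bytes:", exc)
```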
And here is another StackOverflow thread that might be helpful as well, describing the difference between bytes and bytearray objects in relation to their use in converting to and from strings.

Related

How to convert Hex into Signed Integer using python3

What is the simplest way to get the following result in Python 3?
I have a hex string s="FFFC".
In Python, if I use this line: print(int(s,16))
I expect the result to be -4 (the signed interpretation), but that is not what happens: it displays the unsigned value, 65,532.
What is the easiest way to convert it?
Thank you in advance.
There are several ways, but you could just do the math explicitly (assuming s has no more than 4 characters, otherwise use s[-4:]):
i = int(s, 16)
if i >= 0x8000:    # sign bit of a 16-bit value is set
    i -= 0x10000   # subtract 2**16 to get the two's-complement value
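The same arithmetic works for any width if you assume each hex digit contributes 4 bits, which puts the sign bit at the top of 4 * len(s) bits. Here is a sketch under that assumption (hex_to_signed is a hypothetical name):

```python
def hex_to_signed(s):
    """Interpret a hex string as a two's-complement integer.

    Assumes the width is 4 bits per hex digit, so 'FFFC' is a 16-bit value.
    """
    bits = 4 * len(s)
    value = int(s, 16)
    if value >= 1 << (bits - 1):   # sign bit set
        value -= 1 << bits
    return value


print(hex_to_signed("FFFC"))   # -4
print(hex_to_signed("7FFF"))   # 32767
print(hex_to_signed("FC"))     # -4 (treated as 8-bit)
```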
You can use the bytes.fromhex and int.from_bytes class methods.
s = bytes.fromhex('FFFC')
i = int.from_bytes(s, 'big', signed=True)
print(i)
This is pretty self-explanatory. The only thing that might need clarification is the 'big' argument, which just means that the bytes object s has its most significant byte first.
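To see what byte order changes, here are the same two bytes read both ways:

```python
raw = bytes.fromhex("FFFC")                         # b'\xff\xfc'

print(int.from_bytes(raw, "big", signed=True))      # -4    (0xFFFC)
print(int.from_bytes(raw, "little", signed=True))   # -769  (0xFCFF)
print(int.from_bytes(raw, "big", signed=False))     # 65532
```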

How to Turn string into bytes?

Using python3, I've got a string that is displayed as bytes:
strategyName=\xe7\x99\xbe\xe5\xba\xa6
I need to turn it into readable Chinese characters with decode:
orig=b'strategyName=\xe7\x99\xbe\xe5\xba\xa6'
result=orig.decode('UTF-8')
print(result)
which shows this, and it is what I want:
strategyName=百度
But if I store it in a str instead, it behaves differently:
str0='strategyName=\xe7\x99\xbe\xe5\xba\xa6'
result_byte=str0.encode('UTF-8')
result_str=result_byte.decode('UTF-8')
print(result_str)
strategyName=ç¾åº¦é£é©ç­ç¥
Please help me understand why this happens, and how I can fix it.
Thanks a lot
Your problem is using a str literal when you're trying to store the UTF-8 encoded bytes of your string. You should just use the bytes literal, but if that str form is necessary, the correct approach is to encode in latin-1 (which is a 1-1 converter for all ordinals below 256 to the matching byte value) to get the bytes with utf-8 encoded data, then decode as utf-8:
str0 = 'strategyName=\xe7\x99\xbe\xe5\xba\xa6'
result_byte = str0.encode('latin-1') # Only changed line
result_str = result_byte.decode('UTF-8')
print(result_str)
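Why latin-1 works: it maps code points 0 to 255 one-to-one onto byte values, so it exactly reverses a wrong latin-1 decode. The sketch below assumes the str came about that way (UTF-8 bytes decoded as latin-1). That is one common cause, but not necessarily what happened in the question. The helper name is hypothetical:

```python
def repair_utf8_read_as_latin1(text):
    """Undo a str produced by decoding UTF-8 bytes as latin-1 (hypothetical helper)."""
    # latin-1 turns each code point 0-255 back into the identical byte value,
    # recovering the original UTF-8 bytes, which are then decoded properly
    return text.encode("latin-1").decode("utf-8")


original = b"strategyName=\xe7\x99\xbe\xe5\xba\xa6"
garbled = original.decode("latin-1")   # yields the same str as the str0 literal
fixed = repair_utf8_read_as_latin1(garbled)

print(garbled == "strategyName=\xe7\x99\xbe\xe5\xba\xa6")  # True
print(fixed == "strategyName=\u767e\u5ea6")                # True
```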
Of course, the other approach could be to just type the Unicode escapes you wanted in the first place instead of byte level escapes that correspond to a UTF-8 encoding:
result_str = 'strategyName=\u767e\u5ea6'
No rigmarole needed.
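If you have the characters and want to know which \u escapes to type, the unicode_escape codec will show them. Here is a quick check that the three spellings agree:

```python
text = "百度"

print(text.encode("unicode_escape"))                        # b'\\u767e\\u5ea6'
print(text == "\u767e\u5ea6")                               # True
print(text.encode("utf-8") == b"\xe7\x99\xbe\xe5\xba\xa6")  # True
```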

unpack syntax in python 3

I am trying to convert hex numbers into decimals using unpack.
When I use:
from struct import *
unpack("<H",b"\xe2\x07")
The output is: 2018, which is what I want.
The thing is I have my hex data in a list as a string in the form of:
asd = ['e2','07']
My question is: is there a simple way of using unpack without the backslashes and the x? Something like this:
unpack("<H","e207")
I know this doesn't work but I hope you get the idea.
For clarification: I know I could get the data in the form of b'\x11' in the list, but then it's interpreted as ASCII, which I don't want. That's why I have it in the format shown.
You have hex-encoded data in a text object. So, to get back the raw bytes, you can decode the text string. Please note that this is not the usual convention in Python 3.x (generally, text strings are already decoded).
>>> import codecs
>>> codecs.decode('e207', 'hex')
b'\xe2\x07'
A convenience function for the same thing:
>>> bytes.fromhex('e207')
b'\xe2\x07'
Now you can struct.unpack those bytes. Putting it all together:
>>> import codecs, struct
>>> asd = ['e2','07']
>>> text = ''.join(asd)
>>> encoded = codecs.decode(text, 'hex')
>>> struct.unpack("<H", encoded)
(2018,)
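Here is the same idea wrapped up as a function (the name is hypothetical). It uses bytes.fromhex, so no codecs import is needed. int.from_bytes gives the same number without a format string:

```python
import struct


def unpack_hex_list(hex_parts, fmt="<H"):
    """Join hex byte strings like ['e2', '07'] and unpack them with struct.

    unpack_hex_list is a hypothetical helper; fmt must match the byte count.
    """
    raw = bytes.fromhex("".join(hex_parts))   # ['e2', '07'] -> b'\xe2\x07'
    return struct.unpack(fmt, raw)


asd = ["e2", "07"]
print(unpack_hex_list(asd))                                    # (2018,)
print(int.from_bytes(bytes.fromhex("".join(asd)), "little"))   # 2018
```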

Python3: Converting or 'Casting' byte array string from string to bytes array

New to this python thing.
A little while ago I saved off output from an external device that was connected to a serial port as I would not be able to keep that device. I read in the data at the serial port as bytes with the intention of creating an emulator for that device.
The data was saved to a file one 'byte' per line, as in the example extract below.
b'\x9a'
b'X'
b'}'
b'}'
b'x'
b'\x8c'
I would like to read in each line from the data capture and append what would have been the original byte to a byte array.
I have tried various append() and concatenation operations (+=) on a bytearray but the above lines are python string objects and these operations fail.
Is there an easy way (a built-in way?) to add each of the original byte values of these lines to a byte array?
Thanks.
M
Update
I came across the .encode() string method and have created a crude function to meet my needs.
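def string2byte(line):
    # each 'byte' line has 5, 6 or 8 characters including the newline,
    # as in b',' or b'\r' or b'\x0A'
    if len(line) == 5:                  # plain character, e.g. b'X'
        return line[2].encode('latin-1')
    elif len(line) == 6:                # backslash escape, e.g. b'\r' or b'\\'
        escapes = {'n': b'\n', 'r': b'\r', 't': b'\t'}
        return escapes.get(line[3], line[3].encode('latin-1'))
    else:                               # hex escape, e.g. b'\x0A' -> b'\n'
        return bytes.fromhex(line[4:6])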
...well, it is functional.
If anyone knows of a more elegant solution perhaps you would be kind enough to post this.
b'\x9a' is a literal representation of the byte 0x9a in Python source code. If your file literally contains these seven characters, b'\x9a', then it is wasteful, because you could have saved it using only one byte. You can convert it to a byte using ast.literal_eval():
import ast
with open('input') as file:
    a = b"".join(map(ast.literal_eval, file))  # assume 1 byte literal per line
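To check the literal_eval approach against the sample lines from the question, here is a sketch that reads from an in-memory file, with io.StringIO standing in for the real capture file. It strips each line and skips blank ones, on the assumption that the capture may end with an empty line. It also appends to a bytearray, as the question wanted:

```python
import ast
import io

# Stand-in for the capture file: one bytes literal per line, as in the question.
capture = io.StringIO("b'\\x9a'\nb'X'\nb'}'\nb'}'\nb'x'\nb'\\x8c'\nb'\\r'\n")

data = bytearray()
for line in capture:
    line = line.strip()
    if not line:                     # assumption: ignore blank lines
        continue
    data += ast.literal_eval(line)   # text "b'\x9a'" -> the single byte 0x9a

print(data)   # bytearray(b'\x9aX}}x\x8c\r')
```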

Buffer constructor not treating encoding correctly for buffer length

I am trying to construct a UTF-16LE string from a JavaScript string as a new Buffer object.
It appears that new Buffer('xxxxxxxxxx', 'utf16le') ends up with only half the expected length: we only see 5 x's in the console logs.
var test = new Buffer('xxxxxxxxxx','utf16le');
for (var i = 0; i < test.length; i++) {
    console.log(i + ':' + String.fromCharCode(test[i]));
}
Node version is v0.8.6
It is really unclear what you want to accomplish here. Your statement can mean (at least) 2 things:
1. How to convert a JS string into a UTF-16-LE byte array
2. How to convert a byte array containing a UTF-16-LE string into a JS string
What your code sample actually does is encode the JS string as UTF-16-LE and store the resulting bytes in a buffer. Until you actually state what you want to accomplish, you have 0 chance of getting a coherent answer.
new Buffer('FF', 'hex') will yield a buffer of length 1 with all bits of the octet set, which is likely the opposite of what you think it does.
