Write binary data to file in python3 - python-3.x

I've been having a LOT of trouble with this and the other questions don't seem to be what I'm looking for. So basically I have a list of bytes gotten from
bytes = struct.pack('I',4)
bList = list(bytes)
# bList ends up being [0,0,0,4]
# Perform some operation that switches position of bytes in list, etc
So now I want to write this to a file
f = open('/path/to/file','wb')
for i in range(0,len(bList)):
    f.write(bList[i])
But I keep getting the error
TypeError: 'int' does not support the buffer interface
I've also tried writing:
bytes(bList[i]) # Seems to write the incorrect number.
str(bList[i]).encode() # Seems to just write the string value instead of byte

Oh boy, I had to jump through hoops to solve this. So basically I had to instead do
bList = bytes()
bList += struct.pack('I',4)
# Perform whatever byte operations I need to
byteList = []
# I know, there's probably a list comprehension to do this more elegantly
for i in range(0,len(bList)):
    byteList.append(bList[i])
f.write(bytes(byteList))
So bytes() accepts a list of integer byte values (each in the range 0 to 255, however they are written in the source) and converts it into a proper bytes object that can be written to the file.
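A minimal sketch of the same idea without the manual copy loop. The output path (a file in a temporary directory), the explicit big-endian format '>I', and the byte reversal are assumptions made for the demo, not part of the original answer:

```python
import os
import struct
import tempfile

# Pack one unsigned 32-bit integer; '>' forces big-endian, so the bytes are 0, 0, 0, 4.
packed = struct.pack('>I', 4)

# Iterating a bytes object yields integers, so list() gives [0, 0, 0, 4].
byte_values = list(packed)

# Example manipulation: reverse the byte order.
byte_values.reverse()

# Hypothetical output path inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

# bytes() turns the list of ints back into binary data in a single call.
with open(out_path, 'wb') as f:
    f.write(bytes(byte_values))

# Read the file back to check what was written.
with open(out_path, 'rb') as f:
    written = f.read()
```

If the values need to stay mutable throughout, a bytearray can likely replace the list, since f.write() accepts it directly.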

Related

Python 3: for x in bytes as words

I have a python3 script which reads data into a buffer with
fp = open("filename", 'rb')
data = fp.read(count)
I don't fully understand (even after reading the documentation) what read() returns. It appears to be some kind of binary data which is iterable. But it is not a list.
Confusingly, elsewhere in the script, lists are used for binary data.
frames = []
# then later... inside a loop
for ...
data = b''.join(frames)
Regardless... I want to iterate over the object returned by read() in units of word (aka 2 byte blocks)
At the moment the script contains this for loop
for c in data:
    # do something
Is it possible to change c such that this loop iterates over words (2 byte blocks) rather than individual bytes?
I cannot use read() in a loop to read 2 bytes at a time.
We can explicitly read (up to) n bytes from a file in binary mode with .read(n) (just as it would read n Unicode code points from a file opened in text mode). This is a blocking call and will only read fewer bytes at the end of the file.
We can use the two-argument form of iter to build an iterator that repeatedly calls a callable:
>>> help(iter)
Help on built-in function iter in module builtins:
iter(...)
    iter(iterable) -> iterator
    iter(callable, sentinel) -> iterator
    Get an iterator from an object. In the first form, the argument must
    supply its own iterator, or be a sequence.
    In the second form, the callable is called until it returns the sentinel.
read at the end of the file will start returning empty results and not raise an exception, so we can use that for our sentinel.
Putting it together, we get:
for pair in iter(lambda: fp.read(2), b''):
    # do something with pair
Inside the loop, we will get bytes objects that represent two bytes of data. You should check the documentation to understand how to work with these.
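As a small self-contained sketch of the two-argument iter form, the example below uses io.BytesIO as a stand-in for a file opened in 'rb' mode (that substitution and the sample data are assumptions for the demo):

```python
import io

# io.BytesIO behaves like a binary file object for the purposes of read().
fp = io.BytesIO(b'\x00\x01\x02\x03\x04')

# fp.read(2) is called repeatedly until it returns the sentinel b'' at end of data.
pairs = list(iter(lambda: fp.read(2), b''))
```

Note that the last chunk holds only one byte when the data has an odd length, so code that expects full words should check len(pair).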
When reading a file in binary mode, a bytes object is returned, which is one of the standard Python built-in types. In source code it looks like a string literal with a b prefix, as in b"...". When you print it, each byte is shown either as an escape of the form \xNN, where NN is two hex digits giving the byte's value (0 to 255), or directly as the printable ASCII character with that code point. You can read more about this, and about the methods of bytes (which are similar to those of strings), in the bytes docs.
There already seems to be a very popular question on Stack Overflow about how to iterate over a bytes object. The currently accepted answer gives this example for creating a list of individual bytes in the bytes object:
L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))]
I suppose that modifying it like this will work for you :
L = [bytes_obj[i:i+2] for i in range(0, len(bytes_obj), 2)]
For example :
by = b"\x00\x01\x02\x03\x04\x05\x06"
# The object returned by file.read() is also bytes, like the one above
words = [by[i:i+2] for i in range(0, len(by), 2)]
print(words)
# Output --> [b'\x00\x01', b'\x02\x03', b'\x04\x05', b'\x06']
Or create a generator that yields words in the same way if your list is likely to be too large to efficiently store at once:
def get_words(bytesobject):
    for i in range(0, len(bytesobject), 2):
        yield bytesobject[i:i+2]
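If the two-byte words are meant to be treated as numbers, the generator can be combined with int.from_bytes. This is only a sketch: the big-endian byte order and the sample data are assumptions, and the real byte order depends on how the file was written.

```python
def get_words(bytesobject):
    # Yield consecutive two-byte slices of the input.
    for i in range(0, len(bytesobject), 2):
        yield bytesobject[i:i+2]

data = b'\x00\x01\x02\x03'

# Interpret each two-byte word as an unsigned big-endian integer (assumed order).
values = [int.from_bytes(word, 'big') for word in get_words(data)]
```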
In the most simple literal sense, something like this gives you a two byte at a time loop.
with open("/etc/passwd", "rb") as f:
    w = f.read(2)
    while len(w) > 0:
        print(w)
        w = f.read(2)
As for what you are getting from read, it's a bytes object, because you specified 'b' in the mode passed to open.
I think a more Pythonic way to express it would be via an iterator or generator.

XOR two strings of different length

So I am trying to XOR two strings together but am unsure if I am doing it correctly when the strings are different length.
The method I am using is as follows.
def xor_two_str(a,b):
    xored = []
    for i in range(max(len(a), len(b))):
        xored_value = ord(a[i%len(a)]) ^ ord(b[i%len(b)])
        xored.append(hex(xored_value)[2:])
    return ''.join(xored)
I get output like so.
abc XOR abc: 000
abc XOR ab: 002
ab XOR abc: 5a
space XOR space: 0
I know something is wrong and I will eventually want to convert the hex value to ascii so am worried the foundation is wrong. Any help would be greatly appreciated.
Your code looks mostly correct (assuming the goal is to reuse the shorter input by cycling back to the beginning), but your output has a minor problem: it's not fixed width per character, so you could get the same output from two pairs of characters with a small (< 16) difference as from a single pair of characters with a large difference.
Assuming you're only working with "bytes-like" strings (all inputs have ordinal values below 256), you'll want to zero-pad your hex output to a fixed width of two, changing:
xored.append(hex(xored_value)[2:])
to:
xored.append('{:02x}'.format(xored_value))
which saves a temporary string (hex + slice makes the longer string then slices off the prefix, when format strings can directly produce the result without the prefix) and zero-pads to a width of two.
There are other improvements possible for more Pythonic/performant code, but that should be enough to make your code produce usable results.
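To make the ambiguity concrete, here is a small sketch comparing the unpadded and padded forms. The specific XOR values (1, 2 and 0x12) are chosen only for illustration:

```python
# Unpadded: hex() drops leading zeros, so the boundaries between values are lost.
unpadded_two_small = hex(0x1)[2:] + hex(0x2)[2:]   # two XOR results: 1 and 2
unpadded_one_large = hex(0x12)[2:]                 # one XOR result: 0x12

# Padded: every value occupies exactly two hex digits, so the outputs differ.
padded_two_small = '{:02x}'.format(0x1) + '{:02x}'.format(0x2)
padded_one_large = '{:02x}'.format(0x12)
```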
Side-note: When running your original code, xor_two_str('abc', 'ab') and xor_two_str('ab', 'abc') both produced the same output, 002, which is what you'd expect (since XOR is commutative and you cycle the shorter input, reversing the arguments to any call should produce the same result). Not sure why you think it produced 5a. My fixed code just makes the outputs 000000, 000002, 000002, and 00: padded properly, but otherwise unchanged from your results.
As far as other improvements to make, manually converting character by character, and manually cycling the shorter input via remainder-and-indexing is a surprisingly costly part of this code, relative to the actual work performed. You can do a few things to reduce this overhead, including:
Convert from str to bytes once, up-front, in bulk (runs in roughly one seventh the time of the fastest character by character conversion)
Determine up front which string is shorter, and use itertools.cycle to extend it as needed, and zip to directly iterate over paired byte values rather than indexing at all
Together, this gets you:
from itertools import cycle
def xor_two_str(a,b):
    # Convert to bytes so we iterate by ordinal, determine which is longer
    short, long = sorted((a.encode('latin-1'), b.encode('latin-1')), key=len)
    xored = []
    for x, y in zip(long, cycle(short)):
        xored_value = x ^ y
        xored.append('{:02x}'.format(xored_value))
    return ''.join(xored)
or to make it even more concise/fast, we just make the bytes object without converting to hex (and just for fun, use map+operator.xor to avoid the need for Python level loops entirely, pushing all the work to the C layer in the CPython reference interpreter), then convert to hex str in bulk with the (new in 3.5) bytes.hex method:
from itertools import cycle
from operator import xor
def xor_two_str(a,b):
    short, long = sorted((a.encode('latin-1'), b.encode('latin-1')), key=len)
    xored = bytes(map(xor, long, cycle(short)))
    return xored.hex()
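Since the question mentions eventually converting the hex output back, here is a hedged sketch of the round trip. It assumes fixed-width hex as produced by bytes.hex, and that the shorter input ('ab' here) is known when decoding; the sample strings are illustrative only:

```python
from itertools import cycle
from operator import xor

def xor_two_str(a, b):
    # Encode both inputs, cycle the shorter one, and return fixed-width hex.
    short, long = sorted((a.encode('latin-1'), b.encode('latin-1')), key=len)
    return bytes(map(xor, long, cycle(short))).hex()

hex_out = xor_two_str('abc', 'ab')

# bytes.fromhex reverses bytes.hex; this only works because each byte is two digits.
raw = bytes.fromhex(hex_out)

# XOR-ing the result with the cycled shorter input recovers the longer input.
recovered = bytes(map(xor, raw, cycle(b'ab'))).decode('latin-1')
```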

Python bytes concatenation

I want to concatenate the first byte of a bytes string to the end of the string:
a = b'\x14\xf6'
a += a[0]
I get an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to int
When I type bytes(a[0]) I get:
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
And bytes({a[0]}) gives the correct b'\x14'.
Why do I need {} ?
If you want to change your byte sequence, you should use a bytearray. It is mutable and has the .append method:
>>> a = bytearray(b'\x14\xf6')
>>> a.append(a[0])
>>> a
bytearray(b'\x14\xf6\x14')
What happens in your approach: when you do
a += a[0]
you are trying to add an integer to a bytes object. That doesn't make sense, since you are trying to add different types.
If you do
bytes(a[0])
you get a bytes object of length 20, as the documentation describes:
If [the argument] is an integer, the array will have that size and will be initialized with null bytes.
If you use curly braces, you are creating a set, and a different option in the constructor is chosen:
If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.
Bytes don't work quite like strings. When you index with a single value (rather than a slice), you get an integer, rather than a length-one bytes instance. In your case, a[0] is 20 (hex 0x14).
A similar issue happens with the bytes constructor. If you pass a single integer in as the argument (rather than an iterable), you get a bytes instance that consists of that many null bytes ("\x00"). This explains why bytes(a[0]) gives you twenty null bytes. The version with the curly brackets works because it creates a set (which is iterable).
To do what you want, I suggest slicing a[0:1] rather than indexing with a single value. This will give you a bytes instance that you can concatenate onto your existing value.
a += a[0:1]
bytes is a sequence type. Its individual elements are integers. You can't do a + a[0] for the same reason you can't do a + a[0] if a is a list. You can only concatenate a sequence with another sequence.
bytes(a[0]) gives you that because a[0] is an integer, and as documented doing bytes(someInteger) gives you a sequence of that many zero bytes (e.g., bytes(3) gives you 3 zero bytes).
{a[0]} is a set. When you do bytes({a[0]}) you convert the contents of that set into a bytes object. This is not a great way to do it in general, because sets are unordered, so if you try to do it with more than one byte in there you may not get what you expect.
The easiest way to do what you want is a + a[:1]. You could also do a + bytes([a[0]]). There is no shortcut for creating a single-element bytes object; you have to either use a slice or make a length-one sequence of that byte.
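The behaviours described in these answers can be checked side by side. This is a short sketch using the value from the question; the variable names are chosen here for illustration:

```python
a = b'\x14\xf6'

first_as_int = a[0]          # indexing yields an int (0x14 == 20)
first_as_bytes = a[:1]       # slicing yields a length-one bytes object
zeros = bytes(3)             # an int argument means "this many null bytes"
from_list = bytes([a[0]])    # an iterable of ints becomes those byte values

# Concatenation works once both operands are bytes.
appended = a + a[:1]
```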
Try this
values = [0x49, 0x7A]
concat = (values[0] << 8) + values[1]
print(hex(concat))
You should get 0x497a (hex() prints lowercase digits).

Python3: Converting or 'Casting' byte array string from string to bytes array

New to this python thing.
A little while ago I saved off output from an external device that was connected to a serial port as I would not be able to keep that device. I read in the data at the serial port as bytes with the intention of creating an emulator for that device.
The data was saved one 'byte' per line to a file as example extract below.
b'\x9a'
b'X'
b'}'
b'}'
b'x'
b'\x8c'
I would like to read in each line from the data capture and append what would have been the original byte to a byte array.
I have tried various append() and concatenation operations (+=) on a bytearray but the above lines are python string objects and these operations fail.
Is there an easy way (a built-in way?) to add each of the original byte values of these lines to a byte array?
Thanks.
M
Update
I came across the .encode() string method and have created a crude function to meet my needs.
def string2byte(str):
    # the 'byte' string will consist of 5, 6 or 8 characters including the newline
    # as in b',' or b'\r' or b'\x0A'
    if len(str) == 5:
        return str[2].encode('Latin-1')
    elif len(str) == 6:
        return str[3].encode('Latin-1')
    else:
        return str[4:6].encode('Latin-1')
...well, it is functional.
If anyone knows of a more elegant solution perhaps you would be kind enough to post this.
b'\x9a' is a literal representation of the byte 0x9a in Python source code. If your file literally contains the seven characters b'\x9a', that is wasteful, because the same value could have been saved as a single byte. You can convert each such line back to a byte using ast.literal_eval():
import ast
with open('input') as file:
    a = b"".join(map(ast.literal_eval, file)) # assume 1 byte literal per line
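A self-contained sketch of the same approach, appending to a bytearray as the question asks. Here io.StringIO stands in for the capture file, and skipping blank lines is an added assumption about how the file might look:

```python
import ast
import io

# Stand-in for the capture file: one bytes literal per line, stored as text.
capture = io.StringIO("b'\\x9a'\nb'X'\nb'}'\nb'}'\nb'x'\nb'\\x8c'\n")

data = bytearray()
for line in capture:
    line = line.strip()
    if not line:
        continue  # tolerate blank lines (assumption, not in the original answer)
    # literal_eval safely parses the text b'...' into a real bytes object.
    data += ast.literal_eval(line)
```

ast.literal_eval only evaluates literals, so it is a much safer choice than eval for text read from a file.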

Erlang howto make a list from this binary <<"a,b,c">>

I have a binary <<"a,b,c">> and I would like to extract the information from this binary.
So I would like to have something like A=a, B=b and so on.
I need a general approach on this because the binary string always changes.
So it could be <<"aaa","bbb","ccc">>...
I tried to generate a list
erlang:binary_to_list(<<"a","b","c">>)
but I get string as a result.
"abc"
Thank you.
You did use the right method.
binary_to_list(Binary) -> [char()]
Returns a list of integers which correspond to the bytes of Binary.
There is no string type in Erlang: http://www.erlang.org/doc/reference_manual/data_types.html#id63119. The console just displays the lists in string representation as a courtesy, if all elements are in printable ASCII range.
You should read Erlang's "Bit Syntax Expressions" documentation to understand how to work on binaries.
Do not convert the whole binary into a list if you don't need it in list representation!
To extract the first three bytes you could use
<<A, B, C, Rest/binary>> = <<"aaa","bbb","ccc">>.
If you want to iterate over the binary data, you can use binary comprehension.
<< <<(F(X))>> || <<X>> <= <<"aaa","bbb","ccc">> >>.
Pattern matching is possible, too:
test(<<A, Tail/binary>>, Accu) -> test(Tail, Accu+A);
test(_, Accu) -> Accu.
882 = test(<<"aaa","bbb","ccc">>, 0).
Even for reading one UTF-8 character at once. So to convert a binary UTF-8 string into Erlang's "list of codepoints" format, you could use:
test(<<A/utf8, Tail/binary>>, Accu) -> test(Tail, [A|Accu]);
test(_, Accu) -> lists:reverse(Accu).
[97,97,97,600,99,99,99] = test(<<"aaa", 16#0258/utf8, "ccc">>, "").
(Note that <<"aaa","bbb","ccc">> = <<"aaabbbccc">>. Don't actually use the last code snippet; use the linked method instead.)

Resources