I want to concatenate the first byte of a bytes string to the end of the string:
a = b'\x14\xf6'
a += a[0]
I get an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to int
When I type bytes(a[0]) I get:
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
And bytes({a[0]}) gives the correct b'\x14'.
Why do I need {} ?
If you want to change your byte sequence, you should use a bytearray. It is mutable and has the .append method:
>>> a = bytearray(b'\x14\xf6')
>>> a.append(a[0])
>>> a
bytearray(b'\x14\xf6\x14')
What happens in your approach: when you do
a += a[0]
you are trying to add an integer to a bytes object. That doesn't make sense, since you are trying to add different types.
If you do
bytes(a[0])
you get a bytes object of length 20, as the documentation describes:
If [the argument] is an integer, the array will have that size and will be initialized with null bytes.
If you use curly braces, you are creating a set, and a different option in the constructor is chosen:
If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.
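The two constructor branches quoted above can be seen side by side in a small sketch (the variable names are only illustrative, not from the question):

```python
a = b'\x14\xf6'

first = a[0]                 # indexing a bytes object yields an int
zeros = bytes(first)         # int argument: that many null bytes
from_list = bytes([first])   # iterable argument: the ints become the contents
from_set = bytes({first})    # a one-element set is also an iterable of ints

print(first)        # 20
print(len(zeros))   # 20
print(from_list)    # b'\x14'
print(from_set)     # b'\x14'
```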
Bytes don't work quite like strings. When you index with a single value (rather than a slice), you get an integer, rather than a length-one bytes instance. In your case, a[0] is 20 (hex 0x14).
A similar issue happens with the bytes constructor. If you pass a single integer in as the argument (rather than an iterable), you get a bytes instance that consists of that many null bytes ("\x00"). This explains why bytes(a[0]) gives you twenty null bytes. The version with the curly brackets works because it creates a set (which is iterable).
To do what you want, I suggest slicing a[0:1] rather than indexing with a single value. This will give you a bytes instance that you can concatenate onto your existing value.
a += a[0:1]
bytes is a sequence type. Its individual elements are integers. You can't do a + a[0] for the same reason you can't do a + a[0] if a is a list. You can only concatenate a sequence with another sequence.
bytes(a[0]) gives you that because a[0] is an integer, and as documented, bytes(someInteger) gives you a sequence of that many zero bytes (e.g., bytes(3) gives you 3 zero bytes).
{a[0]} is a set. When you do bytes({a[0]}) you convert the contents of that set into a bytes object. This is not a great way to do it in general, because sets are unordered, so if you try to do it with more than one byte in there you may not get what you expect.
The easiest way to do what you want is a + a[:1]. You could also do a + bytes([a[0]]). There is no shortcut for creating a single-element bytes object; you have to either use a slice or make a length-one sequence of that byte.
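As a rough sketch of the two options just mentioned, and of why a set is a fragile way to build bytes (variable names here are illustrative only):

```python
a = b'\x14\xf6'

by_slice = a + a[:1]          # a slice of bytes is itself a bytes object
by_list = a + bytes([a[0]])   # wrap the int in a one-element list

print(by_slice)   # b'\x14\xf6\x14'
print(by_list)    # b'\x14\xf6\x14'

# A set drops duplicates and promises no order, so it cannot
# faithfully represent an arbitrary byte sequence.
collapsed = bytes({0x14, 0x14})
print(len(collapsed))   # 1
```

A list keeps both the order and the duplicates you give it, which is why bytes([...]) is the safer spelling.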
Try this
values = [0x49, 0x7A]
concat = (values[0] << 8) + values[1]
print(hex(concat))
You should get 0x497a (hex() prints lowercase digits).
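For comparison, the same two-byte combination can probably be written with int.from_bytes, assuming the first byte is the most significant one (big-endian), as in the shift above:

```python
values = [0x49, 0x7A]

shifted = (values[0] << 8) + values[1]             # manual shift-and-add
converted = int.from_bytes(bytes(values), 'big')   # big-endian: first byte is most significant

print(hex(shifted))           # 0x497a
print(shifted == converted)   # True
```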
I have a python3 script which reads data into a buffer with
fp = open("filename", 'rb')
data = fp.read(count)
I don't fully understand (even after reading the documentation) what read() returns. It appears to be some kind of binary data which is iterable. But it is not a list.
Confusingly, elsewhere in the script, lists are used for binary data.
frames = []
# then later... inside a loop
for ...
data = b''.join(frames)
Regardless... I want to iterate over the object returned by read() in units of word (aka 2 byte blocks)
At the moment the script contains this for loop
for c in data:
# do something
Is it possible to change c such that this loop iterates over words (2 byte blocks) rather than individual bytes?
I cannot use read() in a loop to read 2 bytes at a time.
We can explicitly read (up to) n bytes from a file in binary mode with .read(n) (just as it would read n Unicode code points from a file opened in text mode). This is a blocking call and will only read fewer bytes at the end of the file.
We can use the two-argument form of iter to build an iterator that repeatedly calls a callable:
>>> help(iter)
Help on built-in function iter in module builtins:
iter(...)
iter(iterable) -> iterator
iter(callable, sentinel) -> iterator
Get an iterator from an object. In the first form, the argument must
supply its own iterator, or be a sequence.
In the second form, the callable is called until it returns the sentinel.
read at the end of the file will start returning empty results and not raise an exception, so we can use that for our sentinel.
Putting it together, we get:
for pair in iter(lambda: fp.read(2), b''):
    # do something with pair
Inside the loop, we will get bytes objects that represent two bytes of data. You should check the documentation to understand how to work with these.
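Here is a minimal runnable sketch of that loop. It assumes io.BytesIO as a stand-in for the opened file and assumes big-endian byte order when turning each pair into an integer; neither detail comes from the question.

```python
import io

# io.BytesIO stands in for the file opened in 'rb' mode (assumption for this sketch)
fp = io.BytesIO(b'\x00\x01\x02\x03\x04')

chunks = []
for pair in iter(lambda: fp.read(2), b''):
    # pair is a bytes object of length 2, or length 1 for a trailing odd byte
    chunks.append(pair)

# Convert only the complete pairs to integers (big-endian is an assumption)
words = [int.from_bytes(p, 'big') for p in chunks if len(p) == 2]

print(chunks)   # [b'\x00\x01', b'\x02\x03', b'\x04']
print(words)    # [1, 515]
```

Note that the last read may return a single byte when the data has odd length, so the loop body should be prepared for that.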
When reading a file in binary mode, read() returns a bytes object, which is one of the standard Python builtins. In source code it looks like a string literal with a b prefix, as in b"...". When you print it, each byte is displayed either as an escape like \x** (where ** are the two hex digits of the byte's value, from 0 to 255) or, if the value corresponds to a printable ASCII character, as that character. You can read more about this, and about the methods of bytes (many of which mirror those of strings), in the bytes docs.
There already seems to be a very popular question on Stack Overflow about how to iterate over a bytes object. The currently accepted answer gives this example for creating a list of individual bytes in the bytes object:
L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))]
I suppose that modifying it like this will work for you:
L = [bytes_obj[i:i+2] for i in range(0, len(bytes_obj), 2)]
For example:
by = b"\x00\x01\x02\x03\x04\x05\x06"
# The object returned by file.read() is also bytes, like the one above
words = [by[i:i+2] for i in range(0, len(by), 2)]
print(words)
# Output --> [b'\x00\x01', b'\x02\x03', b'\x04\x05', b'\x06']
Or create a generator that yields words in the same way if your list is likely to be too large to efficiently store at once:
def get_words(bytesobject):
    for i in range(0, len(bytesobject), 2):
        yield bytesobject[i:i+2]
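If the words are ultimately needed as integers rather than two-byte bytes objects, struct.iter_unpack from the standard library is one possible alternative to slicing. The sketch below assumes little-endian unsigned 16-bit words and simply ignores a trailing odd byte; the helper name iter_words_le is made up for illustration.

```python
import struct

def iter_words_le(data):
    """Yield unsigned 16-bit little-endian integers, ignoring a trailing odd byte."""
    usable = len(data) - (len(data) % 2)   # iter_unpack needs a whole number of items
    for (word,) in struct.iter_unpack('<H', data[:usable]):
        yield word

by = b"\x00\x01\x02\x03\x04\x05\x06"
int_words = list(iter_words_le(by))
print(int_words)   # [256, 770, 1284]
```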
In the simplest literal sense, something like this gives you a loop that reads two bytes at a time:
with open("/etc/passwd", "rb") as f:
    w = f.read(2)
    while len(w) > 0:
        print(w)
        w = f.read(2)
As for what you are getting from read, it's a bytes object, because you specified 'b' in the mode passed to open.
I think a more Pythonic way to express it would be via an iterator or generator.
So I am trying to XOR two strings together but am unsure if I am doing it correctly when the strings are different length.
The method I am using is as follows.
def xor_two_str(a,b):
    xored = []
    for i in range(max(len(a), len(b))):
        xored_value = ord(a[i%len(a)]) ^ ord(b[i%len(b)])
        xored.append(hex(xored_value)[2:])
    return ''.join(xored)
I get output like so.
abc XOR abc: 000
abc XOR ab: 002
ab XOR abc: 5a
space XOR space: 0
I know something is wrong and I will eventually want to convert the hex value to ascii so am worried the foundation is wrong. Any help would be greatly appreciated.
Your code looks mostly correct (assuming the goal is to reuse the shorter input by cycling back to the beginning), but your output has a minor problem: it's not fixed width per character, so you could get the same output from two pairs of characters with a small (< 16) difference as from a single pair of characters with a large difference.
Assuming you're only working with "bytes-like" strings (all inputs have ordinal values below 256), you'll want to pad your hex output to a fixed width of two with leading zeroes, changing:
xored.append(hex(xored_value)[2:])
to:
xored.append('{:02x}'.format(xored_value))
which saves a temporary string (hex + slice makes the longer string then slices off the prefix, when format strings can directly produce the result without the prefix) and zero-pads to a width of two.
There are other improvements possible for more Pythonic/performant code, but that should be enough to make your code produce usable results.
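To make the ambiguity concrete, here is a small sketch; the helper names unpadded and padded are made up for illustration and operate directly on the XOR results:

```python
def unpadded(values):
    """Join hex digits with no fixed width, as the original code does."""
    return ''.join(hex(v)[2:] for v in values)

def padded(values):
    """Join hex digits zero-padded to two characters per value."""
    return ''.join('{:02x}'.format(v) for v in values)

print(unpadded([0x1, 0x2]), unpadded([0x12]))   # 12 12  (indistinguishable)
print(padded([0x1, 0x2]), padded([0x12]))       # 0102 12
```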
Side-note: When running your original code, xor_two_str('abc', 'ab') and xor_two_str('ab', 'abc') both produced the same output, 002, which is what you'd expect (since XOR is commutative and you cycle the shorter input, reversing the arguments to any call should produce the same result). Not sure why you think it produced 5a. My fixed code just makes the outputs 000000, 000002, 000002, and 00: padded properly, but otherwise unchanged from your results.
As far as other improvements to make, manually converting character by character, and manually cycling the shorter input via remainder-and-indexing is a surprisingly costly part of this code, relative to the actual work performed. You can do a few things to reduce this overhead, including:
Convert from str to bytes once, up-front, in bulk (runs in roughly one seventh the time of the fastest character by character conversion)
Determine up front which string is shortest, and use itertools.cycle to extend it as needed, and zip to directly iterate over paired byte values rather than indexing at all
Together, this gets you:
from itertools import cycle
def xor_two_str(a,b):
    # Convert to bytes so we iterate by ordinal, determine which is longer
    short, long = sorted((a.encode('latin-1'), b.encode('latin-1')), key=len)
    xored = []
    for x, y in zip(long, cycle(short)):
        xored_value = x ^ y
        xored.append('{:02x}'.format(xored_value))
    return ''.join(xored)
or to make it even more concise/fast, we just make the bytes object without converting to hex (and just for fun, use map+operator.xor to avoid the need for Python level loops entirely, pushing all the work to the C layer in the CPython reference interpreter), then convert to hex str in bulk with the (new in 3.5) bytes.hex method:
from itertools import cycle
from operator import xor
def xor_two_str(a,b):
    short, long = sorted((a.encode('latin-1'), b.encode('latin-1')), key=len)
    xored = bytes(map(xor, long, cycle(short)))
    return xored.hex()
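Since the question mentions eventually converting the hex back to ASCII, here is a hedged sketch of the round trip. It uses a hypothetical helper, xor_bytes, which always cycles the second argument (treated as the key) rather than picking the shorter input, and it assumes latin-1 text as in the code above.

```python
from itertools import cycle
from operator import xor

def xor_bytes(data, key):
    """XOR data with key, repeating key as needed (key must be non-empty)."""
    return bytes(map(xor, data, cycle(key)))

plaintext = 'abc'.encode('latin-1')
key = 'ab'.encode('latin-1')

hex_result = xor_bytes(plaintext, key).hex()
# XOR-ing with the same key a second time undoes the first XOR
recovered = xor_bytes(bytes.fromhex(hex_result), key)

print(hex_result)                    # 000002
print(recovered.decode('latin-1'))   # abc
```

bytes.fromhex only works reliably here because every value is padded to two hex digits.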
What is the difference between string and character class in MATLAB?
a = 'AX'; % This is a character.
b = string(a) % This is a string.
The documentation suggests:
There are two ways to represent text in MATLAB®. You can store text in character arrays. A typical use is to store short pieces of text as character vectors. And starting in Release 2016b, you can also store multiple pieces of text in string arrays. String arrays provide a set of functions for working with text as data.
This is how the two representations differ:
Element access. To represent char vectors of different length, one had to use cell arrays, e.g. ch = {'a', 'ab', 'abc'}. With strings, they can be created in actual arrays: str = [string('a'), string('ab'), string('abc')].
However, to index characters in a string array directly, the curly bracket notation has to be used:
str{3}(2) % == 'b'
Memory use. Chars use exactly two bytes per character. Strings have additional overhead:
a = 'abc'
b = string('abc')
whos a b
returns
  Name      Size      Bytes  Class     Attributes
  a         1x3           6  char
  b         1x1         132  string
The best place to start for understanding the difference is the documentation. The key difference, as stated there:
A character array is a sequence of characters, just as a numeric array is a sequence of numbers. A typical use is to store short pieces of text as character vectors, such as c = 'Hello World';.
A string array is a container for pieces of text. String arrays provide a set of functions for working with text as data. To convert text to string arrays, use the string function.
Here are a few more key points about their differences:
They are different classes (i.e. types): char versus string. As such they will have different sets of methods defined for each. Think about what sort of operations you want to do on your text, then choose the one that best supports those.
Since a string is a container class, be mindful of how its size differs from an equivalent character array representation. Using your example:
>> a = 'AX'; % This is a character.
>> b = string(a) % This is a string.
>> whos
  Name      Size      Bytes  Class     Attributes
  a         1x2           4  char
  b         1x1         134  string
Notice that the string container lists its size as 1x1 (and takes up more bytes in memory) while the character array is, as its name implies, a 1x2 array of characters.
They can't always be used interchangeably, and you may need to convert between the two for certain operations. For example, string objects can't be used as dynamic field names for structure indexing:
>> s = struct('a', 1);
>> name = string('a');
>> s.(name)
Argument to dynamic structure reference must evaluate to a valid field name.
>> s.(char(name))
ans =
1
Strings do have a bit of overhead, but their size still grows by 2 bytes per character. The size of the variable increases in steps, after every 8 characters. The red line in the figure is y = 2x + 127.
The figure is created using:
v=[];N=100;
for ct = 1:N
    s=char(randi([0 255],[1,ct]));
    s=string(s);
    a=whos('s');v(ct)=a.bytes;
end
figure(1);clf
plot(v)
xlabel('# characters')
ylabel('# bytes')
p=polyfit(1:N,v,1);
hold on
plot([0,N],[127,2*N+127],'r')
hold off
One important practical thing to note is that strings and chars behave differently when used inside square brackets. This can be especially confusing when coming from Python. Consider the following example:
>> ['asdf' '123']
ans =
'asdf123'
>> ["asdf" "123"]
ans =
1×2 string array
"asdf" "123"
I have two bytes in:
b'T'
and
b'\x40' (only bit #6 is set)
I need to perform a check on the first byte to see if bit #6 is set. For example, on [A-Za-z] it would be set, but on some other characters it would not be set.
if (b'T' & b'\x40') != 0:
    print("set");
does not work ...
Byte values, when indexed, give integer values. Use that to your advantage:
value = b'T'
if value[0] & 0x40:
    print('set')
You cannot use the & operator on bytes, but it works just fine on integers.
See the documentation on the bytes type:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]
…
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer[.]
Note that non-zero numbers always test as true in a boolean context, so there is no need to explicitly test for != 0 here either.
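As a small sketch of the same idea applied to several bytes at once (the helper name bit_is_set is made up for illustration; 1 << 6 equals 0x40):

```python
def bit_is_set(byte_value, bit):
    """Return True if the given bit (0 = least significant) is set in byte_value."""
    return bool(byte_value & (1 << bit))

# Iterating over a bytes object yields integers, so no conversion is needed
results = [bit_is_set(b, 6) for b in b'T1a ']
print(results)   # [True, False, True, False]
```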
You are looking for the ord built-in function, which converts single-character strings (byte or unicode) to the corresponding numeric codepoint.
if ord(b'T') & 0x40:
    print("set")
I've been having a LOT of trouble with this and the other questions don't seem to be what I'm looking for. So basically I have a list of bytes gotten from
bytes = struct.pack('I',4)
bList = list(bytes)
# bList ends up being [0,0,0,4]
# Perform some operation that switches position of bytes in list, etc
So now I want to write this to a file
f = open('/path/to/file','wb')
for i in range(0,len(bList)):
    f.write(bList[i])
But I keep getting the error
TypeError: 'int' does not support the buffer interface
I've also tried writing:
bytes(bList[i]) # Seems to write the incorrect number.
str(bList[i]).encode() # Seems to just write the string value instead of byte
Oh boy, I had to jump through hoops to solve this. So basically I had to instead do
bList = bytes()
bList += struct.pack('I',4)
# Perform whatever byte operations I need to
byteList = []
# I know, there's probably a list comprehension to do this more elegantly
for i in range(0,len(bList)):
    byteList.append(bList[i])
f.write(bytes(byteList))
So bytes can take a list of byte values (even if they're written in decimal form in the list) and convert it to a proper bytes object.
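A shorter version of the same idea might look like the sketch below. It assumes big-endian packing ('>I') so that the list matches the [0,0,0,4] from the question, uses a list reversal as a stand-in for the unspecified byte operation, and writes to a temporary file chosen only for this example.

```python
import os
import struct
import tempfile

packed = struct.pack('>I', 4)   # '>' forces big-endian: b'\x00\x00\x00\x04'
byte_list = list(packed)        # [0, 0, 0, 4]
byte_list.reverse()             # stand-in for "some operation" on the list

# Temporary location used only for this sketch
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.bin')
    with open(path, 'wb') as f:
        f.write(bytes(byte_list))   # one write of the whole list of ints
    with open(path, 'rb') as f:
        written = f.read()

print(written)   # b'\x04\x00\x00\x00'
```

Passing the list itself to bytes avoids both the per-element loop and the bytes(int) pitfall, which would write that many null bytes instead.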