Python3 adding an extra byte to the byte string

import binascii
import struct

file_1 = r'res\test.png'
with open(file_1, 'rb') as file_1_:
    file_1_read = file_1_.read()
file_1_hex = binascii.hexlify(file_1_read)
print('Hexlifying test.png..')
pack = "test.packet"
file_1_size_bytes = len(file_1_read)
print("test.png is", file_1_size_bytes, "bytes.")
struct.pack('i', file_1_size_bytes)  # result is discarded
file_1_size_bytes_hex = binascii.hexlify(struct.pack('>i', file_1_size_bytes))
print("Hexlifyed length - (", file_1_size_bytes_hex, ").")
with open(pack, 'ab') as header_1_:
    header_1_.write(binascii.unhexlify(file_1_size_bytes_hex))
print("(", binascii.unhexlify(file_1_size_bytes_hex), ")")
with open(pack, 'ab') as header_head_1:
    header_head_1.write(binascii.unhexlify("0000020000000D007200650073002F00000074006500730074002E0070006E006700000000"))
print("Header part 1 added.")
So this writes "0000020000000D007200650073002F00000074006500730074002E0070006E006700000000(00)" to the pack, unhexlified.
There's an extra "00" byte at the end. This is breaking everything I'm trying to do, because the packet's length is referred back to when loading it, and I have about 13 extra "00" bytes at the end of each string I write to the file. So in turn my file is 13 bytes longer than it should be. Not to mention that the header's byte length isn't being read properly, because the padding is off by 1 byte.

You seem to be saying that binascii.unhexlify does not really condense the input string. I have trouble believing that. Here is a minimal complete runnable example and the output I get with 3.4.2 on Win 7.
import binascii
import io
b = binascii.unhexlify(
"000000030000000100000000000000040041004E0049004D00000000000000")
print(b) # bytes
bf = io.BytesIO()
bf.write(b)
print(bf.getvalue())
>>>
b'\x00\x00\x00\x03\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x04\x00A\x00N\x00I\x00M\x00\x00\x00\x00\x00\x00\x00'
b'\x00\x00\x00\x03\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x04\x00A\x00N\x00I\x00M\x00\x00\x00\x00\x00\x00\x00'
Unhexlify has converted each pair of hex characters to the byte expected.
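As a further check, here is a minimal sketch that builds the same header in memory and compares the byte counts with what the hex string implies. The payload size of 1234 and the use of io.BytesIO as a stand-in for test.packet are assumptions made for illustration; neither comes from the question.

```python
import binascii
import io
import struct

HEADER_HEX = "0000020000000D007200650073002F00000074006500730074002E0070006E006700000000"

payload_size = 1234  # hypothetical file size, for illustration only

buf = io.BytesIO()  # in-memory stand-in for the real packet file
size_field = struct.pack('>i', payload_size)  # 4-byte big-endian signed int
header = binascii.unhexlify(HEADER_HEX)       # one byte per pair of hex digits
buf.write(size_field)
buf.write(header)

written = buf.getvalue()
print(len(size_field), len(header), len(written))
```

If the lengths agree here but the file on disk is longer, one thing worth checking is the 'ab' mode: it appends, so bytes left over from an earlier run stay in the file.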

Related

How can I create a binary file in python?

I want to create a new binary file by using python according to the following format:
< Part1: 8 bytes > < Part2: 4 bytes > < Part3: 16 bytes>
I will write a value to each part, and if the value is shorter than the part, the rest of that part should be padded with zeros.
I'm looking for the best and most efficient way to do this.
I read on the internet that I can do something like this:
f = open('file', 'w+b')
res = struct.pack(">l", 0000)
f.write(res)
but I'm not sure whether this lets me reserve the space for each part in advance.

Let's start with some terminology when working with Python data before getting to your code to write a binary file.
Note: The experiments below are using the Python REPL
An integer in Python can be written as a denary/decimal number (e.g. 1)
>>> type(1)
<class 'int'>
It can also be written in hex by adding a leading 0x:
>>> 0x1
1
>>> type(0x1)
<class 'int'>
Leading zeros have no effect in a hex integer literal, whereas in a denary literal they raise an error:
>>> x = 0x0001
>>> print(x)
1
>>> x = 0001
x = 0001
^^^
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
When writing to a binary file it is bytes that need to get written. If the content is an integer it can be converted to bytes with either the int.to_bytes functionality or the struct library functionality.
To convert 1 to bytes using int.to_bytes:
>>> int(1).to_bytes(length=1, byteorder='little')
b'\x01'
With a length of 1 the byte order (endianness) is not important. For numbers stored in more bytes it is important.
>>> int(1).to_bytes(length=4, byteorder='little')
b'\x01\x00\x00\x00'
>>> int(1).to_bytes(length=4, byteorder='big')
b'\x00\x00\x00\x01'
The same result can be achieved with the struct library:
>>> struct.pack('<l', 1)
b'\x01\x00\x00\x00'
>>> struct.pack('>l', 1)
b'\x00\x00\x00\x01'
The other common way to see values written is as a hex string. The denary value 1 can be written as 01000000 or 00000001, depending on the byte order used to store it in 4 bytes.
>>> int(1).to_bytes(length=4, byteorder='big').hex()
'00000001'
>>> int(1).to_bytes(length=4, byteorder='little').hex()
'01000000'
In your question you have written 0000 for the value to be converted using struct and written to a file.
f = open('file', 'w+b')
res = struct.pack(">l", 0000)
f.write(res)
0000 will work, but 0001 raises SyntaxError: leading zeros in decimal integer literals are not permitted.
I think what you have in your question is the hex string representation of the value you want written.
If it is a hex string you are trying to input then the following will work:
with open('file', 'w+b') as f:
    res = bytes.fromhex('0001')
    f.write(res)
The other piece in your question was about padding the values to a certain byte length.
If your hex string represents the correct byte length then you are good.
However the example you gave was only 2 bytes long:
>>> bytes.fromhex('0001')
b'\x00\x01'
>>> len(bytes.fromhex('0001'))
2
And you wanted fields of 4, 8, or 16 bytes, in which case the bytes have to be "padded" with zero-valued bytes to get the correct length, e.g.
>>> bytes.fromhex('0001').rjust(4, b'\x00')
b'\x00\x00\x00\x01'
>>> bytes.fromhex('0001').rjust(8, b'\x00')
b'\x00\x00\x00\x00\x00\x00\x00\x01'
>>> bytes.fromhex('0001').rjust(16, b'\x00')
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
If the hex string is in little-endian format, then ljust is required instead:
>>> bytes.fromhex('0100').ljust(4, b'\x00')
b'\x01\x00\x00\x00'
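Putting these pieces together for the 8/4/16-byte layout from the question, a minimal sketch might look like the following. The helper name build_record, the FIELD_SIZES tuple, and the sample values are hypothetical, and the sketch assumes big-endian values (padding on the left); it writes to an in-memory buffer rather than a real file.

```python
import io

FIELD_SIZES = (8, 4, 16)  # Part1, Part2, Part3 sizes in bytes


def build_record(parts, sizes=FIELD_SIZES):
    """Left-pad each part with zero bytes to its fixed field size."""
    if len(parts) != len(sizes):
        raise ValueError("expected one value per field")
    out = bytearray()
    for value, size in zip(parts, sizes):
        if len(value) > size:
            raise ValueError("value does not fit in its field")
        out += value.rjust(size, b'\x00')
    return bytes(out)


# Sample values: a 2-byte value, a 1-byte value, and an empty field
record = build_record([bytes.fromhex('0001'), bytes.fromhex('02'), b''])

f = io.BytesIO()  # stand-in for open('file', 'w+b')
f.write(record)
print(len(record), record.hex())
```

An empty value still occupies its full field, so every record has the same total length and each part can be found at a fixed offset.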

How to convert string into fixed number of bytes?

I want to create an 8 bytes sized variable that will include my string.
byte = 8_bytes_variable
str = 'hello'
# Put str inside byte while byte still remains of size 8 bytes.

You can pad the encoded string with spaces at the beginning. Note that a character does not always take 1 byte: in UTF-8, ASCII characters take 1 byte each, but others (Chinese characters, for example) take more, so both the length check and the padding are done on the encoded bytes.
str = 'hello'
encoded = str.encode('utf-8')
if len(encoded) > 8:
    print("This is not possible!")
else:
    byte = encoded.rjust(8, b' ')  # pads with spaces at the beginning, up to 8 bytes
In order to get the original string later, you can use lstrip():
str2 = byte.decode()
str = str2.lstrip()
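A different technique, not the one used above, is to pad with null bytes on the right using the struct module's '8s' format. The sketch below is only an illustration: the variable names are made up, and it assumes the text itself contains no trailing null bytes.

```python
import struct

text = 'hello'
encoded = text.encode('utf-8')
if len(encoded) > 8:
    # struct.pack would silently truncate, so check the size first
    raise ValueError('does not fit in 8 bytes')

packed = struct.pack('8s', encoded)  # right-padded with null bytes to 8 bytes

# Recover the original text by dropping the padding
restored = struct.unpack('8s', packed)[0].rstrip(b'\x00').decode('utf-8')
print(packed, restored)
```

Null padding avoids confusion with strings that legitimately start or end with spaces.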

Trying to while-loop my way through a binary file until a certain byte but the loop never ends

I've been working on an import script and have managed to hammer out most of the issues up to this point. I need to loop through the vertices until I reach the byte header that follows them, but despite trying re.match, re.search, and !=, the while loop simply continues until the end of the file. I'm not sure where I went wrong, given that the regex works with the if statement prior to this section of code.
while re.match(b'\x05\xC0.x\6E', byte) is None:
    #Fill the vertex list by converting byte to its little endian float value
    vertex[0] = struct.unpack('<f', byte)
    byte = f.read(4)
    vertex[1] = struct.unpack('<f', byte)
    byte = f.read(4)
    vertex[2] = struct.unpack('<f', byte)
    #Append the vertices list with the completed vertex
    vertices.append(vertex)
    vertex_count += 1
    #Read in what will either be the next X coordinate or a file header
    byte = f.read(4)

The code reads 4 bytes each time, but the pattern is 6 bytes long.
>>> len(b'\x05\xC0.x\6E')
6
>>> b'\x05\xC0.x\6E' == b'\x05' + b'\xC0' + b'.' + b'x' + b'\6' + b'E'
True
The pattern will never match, which is why the loop continues until the end of the file.
I think you meant this (swapping the last \ and x):
b'\x05\xC0.\x6E'
>>> import re
>>> re.match(b'\x05\xC0.x\6E', b'\x05\xC0\x00\x6E') # no match
>>> re.match(b'\x05\xC0.\x6E', b'\x05\xC0\x00\x6E') # match
<_sre.SRE_Match object; span=(0, 4), match=b'\x05\xc0\x00n'>
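With the corrected pattern, the read loop could be sketched as follows. This is an illustration under assumptions, not the asker's importer: the function name read_vertices and the sample data are made up, an in-memory buffer stands in for the file, and re.DOTALL is added so that the "." also matches a 0x0A byte.

```python
import io
import re
import struct

# 4-byte header: 0x05, 0xC0, any byte, 0x6E
HEADER = re.compile(b'\x05\xC0.\x6E', re.DOTALL)


def read_vertices(f):
    """Read little-endian (x, y, z) floats until the header or end of data."""
    vertices = []
    chunk = f.read(4)
    while len(chunk) == 4 and HEADER.match(chunk) is None:
        (x,) = struct.unpack('<f', chunk)         # unpack returns a tuple
        y, z = struct.unpack('<2f', f.read(8))    # assumes a complete vertex follows
        vertices.append((x, y, z))
        # Either the next X coordinate or the header
        chunk = f.read(4)
    return vertices


# Made-up sample: two vertices followed by the header
data = struct.pack('<6f', 1.0, 2.0, 3.0, 4.0, 5.0, 6.0) + b'\x05\xC0\x00\x6E'
vertices = read_vertices(io.BytesIO(data))
print(vertices)
```

Building a new tuple per vertex also avoids appending the same list object repeatedly, and the length check stops the loop at end of file if no header is found.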

Python - part of str [urllib3 data]

I'm trying to delete 2 chars from the start of the str and 1 char from the end.
import urllib3
target_url="www.klimi.hys.cz/nalada.txt"
http = urllib3.PoolManager()
r = http.request('GET', target_url)
print(r.status)
print(r.data)
print()
And the output is:
200
b'smutne'
I need the output to be only "smutne", without the leading " b' " and the trailing " ' ".

When you have bytes, you need to decode them into a string with the appropriate encoding. For example, if you have ASCII characters as bytes, you can do:
>>> foo = b'mystring'
>>> print(foo)
b'mystring'
>>> print(foo.decode('ascii'))
mystring
Or, more commonly, the bytes are UTF-8 encoded text (ASCII is a subset of UTF-8, so this also handles plain ASCII):
>>> print(foo.decode('utf-8'))
mystring
This will work if you have glyphs with accents and such.
More on Python encoding/decoding here: https://docs.python.org/3/howto/unicode.html
In the particular case of urllib3, r.data returns bytes that you'll need to decode in order to use as a string if that's what you want.
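Applied to the question, a minimal sketch looks like this. The bytes literal stands in for r.data so the example runs without a network request, and UTF-8 is an assumption about how the server encodes the file.

```python
data = b'smutne'  # stand-in for r.data from the urllib3 response

text = data.decode('utf-8')  # assumed encoding; bytes -> str
print(text)
```

Decoding is preferable to slicing characters off str(data), because the b'...' wrapper is only the printed representation of the bytes object, not part of its content.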

How to slice off a certain number of bytes from a string in Python?

I am trying to write a specific number of bytes of a string to a file. In C, this would be trivial: since each character is 1 byte, I would simply write however many characters from the string I want.
In Python, however, since apparently each character/string is an object, they are of varying sizes, and I have not been able to find how to slice the string at byte-level specificity.
Things I have tried:
Bytearray:
(For $, read >>>, which messes up the formatting.)
$ barray = bytearray('a')
$ import sys
$ sys.getsizeof(barray[0])
24
So turning a character into a bytearray doesn't turn it into an array of bytes as I expected and it's not clear to me how to isolate individual bytes.
Slicing byte objects as described here:
$ value = b'a'
$ sys.getsizeof(value[:1])
34
Again, a size of 34 is clearly not 1 byte.
memoryview:
$ value = b'a'
$ mv = memoryview(value)
$ sys.getsizeof(mv[0])
34
$ sys.getsizeof(mv[0][0])
34
ord():
$ n = ord('a')
$ sys.getsizeof(n)
24
$ sys.getsizeof(n[0])
Traceback (most recent call last):
File "<pyshell#29>", line 1, in <module>
sys.getsizeof(n[0])
TypeError: 'int' object has no attribute '__getitem__'
So how can I slice a string into a particular number of bytes? I don't care if slicing the string actually leads to individual characters being preserved or anything as with C; it just has to be the same each time.

Make sure the string is encoded as a byte string (in Python 2.7 a plain str already is one).
Then just slice the string object and write the result to the file.
In [26]: s = '一二三四'
In [27]: len(s)
Out[27]: 12
In [28]: with open('test', 'wb') as f:
....: f.write(s[:2])
....:
In [29]: !ls -lh test
-rw-r--r-- 1 satoru wheel 2B Aug 24 08:41 test
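In Python 3 the same idea needs an explicit encode step, since str holds characters rather than bytes. The sketch below assumes UTF-8 and writes to a temporary directory; the variable names are illustrative. Note that len() reports the number of bytes in a bytes object, whereas sys.getsizeof() reports the memory footprint of the Python object, which is why the sizes in the question looked wrong.

```python
import os
import tempfile

s = '一二三四'
encoded = s.encode('utf-8')  # 3 bytes per character in UTF-8
first_two = encoded[:2]      # slice by byte count, not by character

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test')
    with open(path, 'wb') as f:
        f.write(first_two)
    size_on_disk = os.path.getsize(path)

print(len(s), len(encoded), size_on_disk)
```

A slice that cuts through a multi-byte character will not decode back to text, which is fine here since only a fixed byte count is required.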
