Python tuple data byte to string easy way? - python-3.x

I'm using socket and struct to receive a bytes message over TCP/IP and unpack it. The result is a tuple that contains numeric data as well as bytes, in the order defined by the contract.
Example data is shown below.
Example:
# Receive buffer data over TCP/IP
buffer = sock.recv(61)
# Unpack the bytes into the predefined struct format
tup_data = struct.unpack("<lll8s17sh22s", buffer)
tup_data
(61, 12000, 4000, b'msg\x00\x00\x00\x00\x00', b'anther msg\x00\x00\x00\x00\x00\x00\x00', 4, b'yet another msg\x00\x00\x00\x00\x00\x00\x00')
Since the data is streamed at a high rate and execution time matters, I don't want to load the CPU with any looping or isinstance() checks.
Since the positions of the bytes fields are fixed, I'm currently using:
processed_data = (*tup_data[:3],
tup_data[3].strip(b"\x00").decode(),
tup_data[4].strip(b"\x00").decode(),
tup_data[5],
tup_data[6].strip(b"\x00").decode())
processed_data
(61,12000,4000,"msg","anther msg",4,"yet another msg")
Is there a way to convert the bytes fields to the required strings in one shot, given that their positions are known?

Because you are unpacking the buffer with struct.unpack, and per the format-characters table in the struct documentation the s format always yields bytes, you cannot get str objects directly as output. You should therefore either strip the extra \x00 padding at the source, or use a generator expression like the following to convert the items that are instances of bytes (here t is the unpacked tuple, tup_data in the question).
In [12]: tuple(i.strip(b'\x00').decode() if isinstance(i, bytes) else i for i in t)
Out[12]: (61, 12000, 4000, 'msg', 'anther msg', 4, 'yet another msg')
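Since the question asks specifically to avoid isinstance() checks, here is a minimal sketch of the position-based alternative. It assumes the record layout from the question; the names RECORD, decode_field and parse_record are made up for illustration, and rstrip is used instead of strip so that only trailing padding is removed.

```python
import struct

# Precompile the format once; the layout is the one given in the question.
RECORD = struct.Struct("<lll8s17sh22s")

def decode_field(raw: bytes) -> str:
    # Drop the trailing NUL padding, then decode with the default codec (UTF-8).
    return raw.rstrip(b"\x00").decode()

def parse_record(buffer: bytes) -> tuple:
    # Unpack straight into names, so the bytes fields are known by position
    # and no per-item type check is needed.
    a, b, c, s1, s2, d, s3 = RECORD.unpack(buffer)
    return (a, b, c, decode_field(s1), decode_field(s2), d, decode_field(s3))

# Build a sample 61-byte record; the 's' format pads short values with NULs.
sample = RECORD.pack(61, 12000, 4000, b"msg", b"anther msg", 4, b"yet another msg")
print(parse_record(sample))
# (61, 12000, 4000, 'msg', 'anther msg', 4, 'yet another msg')
```

Whether this is measurably faster than the generator expression depends on the workload, so it is worth timing both on real data.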

Related

Python 3: for x in bytes as words

I have a python3 script which reads data into a buffer with
fp = open("filename", 'rb')
data = fp.read(count)
I don't fully understand (even after reading the documentation) what read() returns. It appears to be some kind of binary data which is iterable. But it is not a list.
Confusingly, elsewhere in the script, lists are used for binary data.
frames = []
# then later... inside a loop
for ...
data = b''.join(frames)
Regardless... I want to iterate over the object returned by read() in units of word (aka 2 byte blocks)
At the moment the script contains this for loop
for c in data:
    # do something
Is it possible to change c such that this loop iterates over words (2 byte blocks) rather than individual bytes?
I cannot use read() in a loop to read 2 bytes at a time.
We can explicitly read (up to) n bytes from a file in binary mode with .read(n) (just as it would read n Unicode code points from a file opened in text mode). This is a blocking call and will only read fewer bytes at the end of the file.
We can use the two-argument form of iter to build an iterator that repeatedly calls a callable:
>>> help(iter)
Help on built-in function iter in module builtins:
iter(...)
iter(iterable) -> iterator
iter(callable, sentinel) -> iterator
Get an iterator from an object. In the first form, the argument must
supply its own iterator, or be a sequence.
In the second form, the callable is called until it returns the sentinel.
read at the end of the file will start returning empty results and not raise an exception, so we can use that for our sentinel.
Putting it together, we get:
for pair in iter(lambda: fp.read(2), b''):
    ...  # process pair here
Inside the loop, we will get bytes objects that represent two bytes of data. You should check the documentation to understand how to work with these.
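As a rough illustration of the two-argument iter approach, the sketch below uses io.BytesIO as a stand-in for a file opened in binary mode. The helper name iter_words and the little-endian interpretation of each word are assumptions made for the example.

```python
import io

def iter_words(fp, size=2):
    """Yield successive chunks of `size` bytes until read() returns b''."""
    return iter(lambda: fp.read(size), b"")

# io.BytesIO behaves like a file opened with mode 'rb'.
fp = io.BytesIO(b"\x00\x01\x02\x03\x04")
words = list(iter_words(fp))
print(words)  # [b'\x00\x01', b'\x02\x03', b'\x04']

# Interpret each complete word as an unsigned little-endian integer.
values = [int.from_bytes(w, "little") for w in words if len(w) == 2]
print(values)  # [256, 770]
```

Note that the last chunk can be shorter than two bytes when the data has an odd length, so the loop body should be prepared for that.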
When a file is read in binary mode, read() returns a bytes object, which is one of the standard Python built-in types. Its literal looks like that of a string, except that it is prefixed with b, as in b"...". When you print it, each byte is displayed either as an escape such as \xNN, where NN are the two hex digits of the byte's value (0 to 255), or directly as the printable ASCII character with that code point. You can read more about this, and about the methods of bytes (which are similar to those of str), in the bytes docs.
There already seems to be a very popular question on Stack Overflow about how to iterate over a bytes object. The currently accepted answer gives this example for creating a list of the individual bytes in a bytes object:
L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))]
I suppose that modifying it like this will work for you :
L = [bytes_obj[i:i+2] for i in range(0, len(bytes_obj), 2)]
For example :
by = b"\x00\x01\x02\x03\x04\x05\x06"
# The object returned by file.read() is also bytes, like the one above
words = [by[i:i+2] for i in range(0, len(by), 2)]
print(words)
# Output --> [b'\x00\x01', b'\x02\x03', b'\x04\x05', b'\x06']
Or create a generator that yields words in the same way if your list is likely to be too large to efficiently store at once:
def get_words(bytesobject):
    for i in range(0, len(bytesobject), 2):
        yield bytesobject[i:i+2]
In the simplest literal sense, something like this gives you a loop over two bytes at a time.
with open("/etc/passwd", "rb") as f:
    w = f.read(2)
    while len(w) > 0:
        print(w)
        w = f.read(2)
As for what you are getting from read(), it's a bytes object, because you specified 'b' in the mode passed to open().
I think a more Pythonic way to express this would be with an iterator or generator.

How to append chunks of a pickled pandas DataFrame

I have pickled a pandas DataFrame on my server and I'm sending it over a socket connection. I can receive the data, but I can't seem to put the chunks back together into the original DataFrame, which is all I'm trying to achieve. I have a feeling the problem is the way I'm appending, since the result ends up as a list because of data = []. I also tried starting from an empty pandas DataFrame and that didn't work, so I'm a bit lost as to how I should append these values.
data = []
FATPACKET = 0
bytelength = self.s.recv(BUFFERSIZE)
length = int(pickle.loads(bytelength))
print(length)
ammo = 0
while True:
    print("Getting Data....")
    packet = self.s.recv(1100)
    FATPACKET = int(sys.getsizeof(packet))
    ammo += FATPACKET
    print(str(FATPACKET) + ' Got this much of data out of ' + str(length))
    print("Getting Data.....")
    data.append(packet)
    print(ammo)
    if not ammo > length:
        break
print(data)
unpickled = pickle.loads(data)
self.s.close()
print("Closing Connection!")
print(unpickled)
When I try this code, I constantly run into this:
TypeError: a bytes-like object is required, not 'list'
or this:
_pickle.UnpicklingError: invalid load key, '\x00'
which is the first few bytes of my pickled DataFrame. Sorry, this is my first time working with the pickle module, so I'm not very knowledgeable.
It would help if we could also see exactly what you're doing on the sending end. However, it's apparent that you have several problems.
First, in the initial recv, it's obvious you intended that to only obtain the initial pickle object you used to encode the length of the remaining bytes. However, that recv might also receive an initial segment of the remaining bytes (or even all of the remaining bytes, depending on how large that is). So how much of it should you give to the initial pickle.loads?
You would be better off creating a fixed length field to contain the size of the remaining data. That is often done with the struct module. On the sending side:
import struct
# Pickle the data to be sent
data = pickle.dumps(python_obj)
data_len = len(data)
# Pack data length into 4 bytes, encoded in network byte order
data_len_to_send = struct.pack('!I', data_len)
# Send exactly 4 bytes (due to 'I'), then send data
conn.sendall(data_len_to_send)
conn.sendall(data)
On the receiving side, as the exception said, pickle.loads takes a byte string not a list. So part of solving this will be to concatenate all the list elements into a single byte string before calling loads:
unpickled = pickle.loads(b''.join(data))
Other issues on receiving side: use len(packet) to get the buffer size. sys.getsizeof provides the internal memory used by the bytes object which includes unspecified interpreter overhead and isn't what you need here.
After recv, the first thing you should do is check for an empty buffer, which indicates end-of-stream (len(packet) == 0, or packet == b'', or simply not packet). That would happen, for example, if the sender got killed before completing the send (or the network link went down, or there was some bug on the sender side, etc.). Otherwise, if the connection ends prematurely, your program will never reach the break, and hence it will spin in a very tight infinite loop.
So, altogether you could do something like this:
import pickle
import struct

# First, obtain the fixed-size content length
buf = b''
while len(buf) < 4:
    tbuf = self.s.recv(4 - len(buf))
    if tbuf == b'':  # recv returns bytes, so compare against b'', not ''
        raise RuntimeError("Lost connection with peer")
    buf += tbuf
# Decode (unpack) the length (note that unpack returns a tuple)
len_to_recv = struct.unpack('!I', buf)[0]
data = []
len_recvd = 0
while len_recvd < len_to_recv:
    buf = self.s.recv(min(len_to_recv - len_recvd, BUFFERSIZE))
    if buf == b'':
        raise RuntimeError("Lost connection with peer")
    data.append(buf)
    len_recvd += len(buf)
unpickled_obj = pickle.loads(b''.join(data))
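To see both halves of the length-prefixed scheme working together, here is a small self-contained sketch. It uses socket.socketpair() as a stand-in for the real client/server connection, and the helper names recv_exact, send_obj and recv_obj, as well as the 4096-byte read size, are assumptions for illustration only.

```python
import pickle
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length in network byte order

def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:  # empty bytes means end-of-stream
            raise ConnectionError("Lost connection with peer")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def send_obj(sock, obj):
    payload = pickle.dumps(obj)
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_obj(sock):
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return pickle.loads(recv_exact(sock, length))

# A connected pair of sockets stands in for the sender and the receiver.
a, b = socket.socketpair()
send_obj(a, {"rows": [1, 2, 3]})
result = recv_obj(b)
a.close()
b.close()
print(result)  # {'rows': [1, 2, 3]}
```

The same recv_obj call should work unchanged for a pickled DataFrame, since the receiver only depends on the length header and not on what was pickled.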

Python3: Converting or 'Casting' byte array string from string to bytes array

New to this python thing.
A little while ago I saved off output from an external device that was connected to a serial port as I would not be able to keep that device. I read in the data at the serial port as bytes with the intention of creating an emulator for that device.
The data was saved one 'byte' per line to a file as example extract below.
b'\x9a'
b'X'
b'}'
b'}'
b'x'
b'\x8c'
I would like to read in each line from the data capture and append what would have been the original byte to a byte array.
I have tried various append() and concatenation operations (+=) on a bytearray but the above lines are python string objects and these operations fail.
Is there an easy way (a built-in way?) to add each of the original byte values of these lines to a byte array?
Thanks.
M
Update
I came across the .encode() string method and have created a crude function to meet my needs.
def string2byte(str):
    # the 'byte' string will consist of 5, 6 or 8 characters including the newline
    # as in b',' or b'\r' or b'\x0A'
    if len(str) == 5:
        return str[2].encode('Latin-1')
    elif len(str) == 6:
        return str[3].encode('Latin-1')
    else:
        return str[4:6].encode('Latin-1')
...well, it is functional.
If anyone knows of a more elegant solution perhaps you would be kind enough to post this.
b'\x9a' is the literal representation of the byte 0x9a in Python source code. If your file literally contains the seven characters b'\x9a', that is wasteful, because the same value could have been saved as a single byte. You can convert each line back to bytes with ast.literal_eval():
import ast
with open('input') as file:
    a = b"".join(map(ast.literal_eval, file))  # assumes one bytes literal per line
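A quick way to try this without the capture file is to feed the same kind of lines from an in-memory text stream. In the sketch below, io.StringIO stands in for the opened file, and the three sample lines are taken from the extract in the question.

```python
import ast
import io

# Each line holds the textual repr of a single byte, as in the capture file.
captured = io.StringIO("b'\\x9a'\nb'X'\nb'}'\n")

buf = bytearray()
for line in captured:
    line = line.strip()
    if line:  # skip blank lines, if any
        buf += ast.literal_eval(line)  # turns "b'X'" into the bytes object b'X'

print(bytes(buf))  # b'\x9aX}'
```

ast.literal_eval only evaluates Python literals, so unlike eval it will not execute arbitrary code found in the file.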

Write binary data to file in python3

I've been having a LOT of trouble with this and the other questions don't seem to be what I'm looking for. So basically I have a list of bytes gotten from
bytes = struct.pack('I',4)
bList = list(bytes)
# bList ends up being [0,0,0,4]
# Perform some operation that switches position of bytes in list, etc
So now I want to write this to a file
f = open('/path/to/file','wb')
for i in range(0,len(bList)):
    f.write(bList[i])
But I keep getting the error
TypeError: 'int' does not support the buffer interface
I've also tried writing:
bytes(bList[i]) # Seems to write the incorrect number.
str(bList[i]).encode() # Seems to just write the string value instead of byte
Oh boy, I had to jump through hoops to solve this. So basically I had to instead do
bList = bytes()
bList += struct.pack('I',4)
# Perform whatever byte operations I need to
byteList = []
# I know, there's probably a list comprehension to do this more elegantly
for i in range(0,len(bList)):
    byteList.append(bList[i])
f.write(bytes(byteList))
So bytes() can take a list of byte values (integers from 0 to 255, even if they are written in decimal form in the list) and convert it into a proper bytes object.
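A shorter route to the same result is to pass the whole list of integers to bytes() and write it in one call. The sketch below assumes an explicit big-endian format (">I") so that the list really is [0, 0, 0, 4] on any machine, uses a reversal as a stand-in for the unspecified byte operation, and writes to a temporary file. It also shows why bytes(bList[i]) seemed to write the wrong number: called with a single integer n, bytes(n) creates n zero bytes.

```python
import os
import struct
import tempfile

values = list(struct.pack(">I", 4))  # [0, 0, 0, 4] with explicit big-endian order
values.reverse()                     # stand-in for "some operation" on the bytes

# bytes() accepts an iterable of integers in range(256).
path = os.path.join(tempfile.mkdtemp(), "out.bin")
with open(path, "wb") as f:
    f.write(bytes(values))

with open(path, "rb") as f:
    written = f.read()
print(written)   # b'\x04\x00\x00\x00'

# With a single int argument, bytes(n) is n zero bytes, not the byte value n.
print(bytes(4))  # b'\x00\x00\x00\x00'
```

If the values need to be modified in place before writing, a bytearray supports item assignment and can be passed to f.write() directly.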

Is it possible to base64-encode a file in chunks?

I'm trying to base64-encode a huge input file and end up with a text output file, and I'm trying to find out whether it's possible to encode the input file bit by bit, or whether I need to encode the entire thing at once.
This will be done on the AS/400 (iSeries), if that makes any difference. I'm using my own base64 encoding routine (written in RPG) which works excellently, and, were it not a case of size limitations, would be fine.
It's not possible bit by bit, but 3 bytes at a time, or multiples of 3 bytes at a time, will do!
In other words, if you split your input file into chunks whose sizes are multiples of 3 bytes, you can encode the chunks separately and piece the resulting B64-encoded pieces together (in the corresponding order, of course). Note that the last chunk needn't be an exact multiple of 3 bytes in size. Depending on the modulo 3 value of its size, its B64 encoding will end with one or two padding characters (typically the equal sign), but that's OK, as this will be the only piece that has (and needs) such padding.
In the decoding direction, it is the same idea, except that you need to split the B64-encoded data into chunks whose sizes are multiples of 4 characters. Decode them in parallel or individually as desired, and rebuild the original data by appending the decoded parts together (again in the same order).
Example:
"File" contents =
"Never argue with the data." (Jimmy Neutron).
Straight encoding = Ik5ldmVyIGFyZ3VlIHdpdGggdGhlIGRhdGEuIiAoSmltbXkgTmV1dHJvbik=
Now, in chunks:
"Never argue --> Ik5ldmVyIGFyZ3Vl
with the --> IHdpdGggdGhl
data." (Jimmy Neutron) --> IGRhdGEuIiAoSmltbXkgTmV1dHJvbik=
As you can see, when pieced together in that order, the 3 encoded chunks amount to the same text as the encoding produced for the whole file.
Decoding is done similarly, with arbitrary chunk sizes, provided they are multiples of 4 characters. There is absolutely no need for any kind of correspondence between the chunk sizes used for encoding and those used for decoding (although standardizing on one single size for each direction, say 300 and 400, may make things more uniform and easier to manage).
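The question is about RPG on the AS/400, but the chunking rule itself is language-independent, so here is a minimal Python sketch of it using the standard base64 module. The function names and the default chunk sizes (12 bytes for encoding, 16 characters for decoding) are arbitrary choices made for the example.

```python
import base64

def b64encode_chunked(data: bytes, chunk_size: int = 12) -> bytes:
    """Encode data piecewise; chunk_size must be a multiple of 3 bytes."""
    if chunk_size % 3:
        raise ValueError("chunk_size must be a multiple of 3")
    return b"".join(
        base64.b64encode(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    )

def b64decode_chunked(text: bytes, chunk_size: int = 16) -> bytes:
    """Decode text piecewise; chunk_size must be a multiple of 4 characters."""
    if chunk_size % 4:
        raise ValueError("chunk_size must be a multiple of 4")
    return b"".join(
        base64.b64decode(text[i:i + chunk_size])
        for i in range(0, len(text), chunk_size)
    )

data = b'"Never argue with the data." (Jimmy Neutron)'
encoded = b64encode_chunked(data)
decoded = b64decode_chunked(encoded)

# The chunked result matches encoding the whole input at once.
print(encoded == base64.b64encode(data))  # True
print(decoded == data)                    # True
```

Only the final chunk can carry padding, which is why the pieces can simply be concatenated.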
It is a trivial effort to split any given bytestream into chunks.
You can base64 any chunk of bytes without problem.
The problem you are faced with is that unless you place specific requirements on your chunks (multiples of 3 bytes), the sequence of base64-encoded chunks will be different than the actual output you want.
In C#, this is one (sloppy) way you could do it lazily. The execution is actually deferred until string.Concat is called, so you can do anything you want with the chunked strings. (If you plug this into LINQPad you will see the output)
void Main()
{
    var data = "lorum ipsum etc lol this is an example!!";
    var bytes = Encoding.ASCII.GetBytes(data);
    var testFinal = Convert.ToBase64String(bytes);
    var chunkedBytes = bytes.Chunk(3);
    var base64chunks = chunkedBytes.Select(i => Convert.ToBase64String(i.ToArray()));
    var final = string.Concat(base64chunks);
    testFinal.Dump(); //output
    final.Dump(); //output
}
public static class Extensions
{
    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> list, int chunkSize)
    {
        while(list.Take(1).Count() > 0)
        {
            yield return list.Take(chunkSize);
            list = list.Skip(chunkSize);
        }
    }
}
Output
bG9ydW0gaXBzdW0gZXRjIGxvbCB0aGlzIGlzIGFuIGV4YW1wbGUhIQ==
bG9ydW0gaXBzdW0gZXRjIGxvbCB0aGlzIGlzIGFuIGV4YW1wbGUhIQ==
Hmmm, if you wrote the base64 conversion yourself you should have noticed the obvious thing: each sequence of 3 octets is represented by 4 characters in base64.
So you can split the base64 data at every multiple of four characters, and it will be possible to convert these chunks back to their original bits.
I don't know how character files and byte files are handled on an AS/400, but if it has both concepts, this should be very easy.
Are text files limited in the length of each line?
Are text files line-oriented, or are they just character streams?
How many bits does one byte have?
Are byte files padded at the end, so that one can only create files that span whole disk sectors?
If you can answer all these questions, what exact difficulties do you have left?
