Convert bytes to string while replacing SOH character - python-3.x

Using Python 3.6. I have a bytes array coming over a socket (a FIX message) that contains the byte 0x01 (SOH, start of heading) as a delimiter. The raw bytes look like
b'8=FIX.4.4\x019=65\x0135=0\x0152=20220809-21:37:06.893\x0149=TRADEWEB\x0156=ABFIXREPO\x01347=UTF-8\x0110=045\x01'
I want to store this bytes array to a file for logging. I have seen FIX messages where this delimiter is converted to ^A control character. The final string I would like to have is -
8=FIX.4.4^A9=65^A35=0^A52=20220809-21:37:06.893^A49=TRADEWEB^A56=ABFIXREPO^A347=UTF-8^A10=045^A
I have tried various ways to achieve this without success, for example repr(bytes) and (ord(b) in bytes).
Any pointers are highly appreciated.
Thanks.

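One way I can think of doing this is to decode the bytes, split the result on the SOH character, and join the pieces back together with the "^A" notation. Joining (rather than appending "^A" to every piece in a loop) avoids a doubled "^A" at the end, because the trailing delimiter leaves an empty last element after the split.
data = b'8=FIX.4.4\x019=65\x0135=0\x0152=20220809-21:37:06.893\x0149=TRADEWEB\x0156=ABFIXREPO\x01347=UTF-8\x0110=045\x01'
# Decode, then split on SOH; the trailing SOH yields an empty last element.
fields = data.decode().split("\x01")
print(fields)
ctl_char = "^A"
# Joining re-inserts one "^A" per original SOH, including the trailing one.
string = ctl_char.join(fields)
print(string)
Please note there may be a better way of doing it.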
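A shorter alternative, offered as a sketch rather than the definitive approach: replace the SOH byte directly on the bytes object and decode afterwards. The helper name fix_to_log_line is made up for illustration, and decoding as ASCII with errors="replace" is an assumption about what the feed can contain.

```python
SOH = b"\x01"

def fix_to_log_line(raw: bytes) -> str:
    """Return raw FIX bytes as text, with each SOH shown as the two characters '^A'."""
    # Replace on the bytes first, then decode. errors="replace" is an assumption:
    # it keeps logging from raising if a field ever carries non-ASCII data.
    return raw.replace(SOH, b"^A").decode("ascii", errors="replace")

data = b'8=FIX.4.4\x019=65\x0135=0\x0110=045\x01'
print(fix_to_log_line(data))
```

Note that "^A" here is plain text (a caret followed by the letter A), which is how many tools display the SOH byte, so the logged line cannot be parsed back unambiguously if a field value itself contains "^A".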

Related

How to get packet.tcp.payload and packet.http.data as string?

The return values of these attributes are in hex format, separated by ':'
E.g. 70:79:f6:2e: something like this.
When I am trying to decode it to plain string ( human readable ) it doesn't work. What encoding is being used? I tried various different methods like codecs.decode(), binascii.unhexlify(), bytes.fromhex() also different encodings ASCII and UTF-8. Nothing worked, any help is appreciated. I am using python 3.6
Thanks for your question! I believe you're wanting to read the payload in chunks of two hex places. The functions you tried are not able to parse the : delimiter out-of-the-box. Something like splitting the string by the : delimiter, converting their values to human-readable characters, and joining the "list" to a string should do the trick.
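hex_string = '70:79:f6:2e'
# Split into two-digit hex values: ['70', '79', 'f6', '2e']
hex_split = hex_string.split(':')
# Map each value to the character with that code point (same result as a Latin-1 decode)
hex_as_chars = map(lambda pair: chr(int(pair, 16)), hex_split)
human_readable = ''.join(hex_as_chars)
print(human_readable)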
Is this what you have in mind?
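As a variation on the answer above, bytes.fromhex() can be used once the colons are removed, which gives real bytes that can then be decoded with an explicit encoding. This is only a sketch: the helper name colon_hex_to_bytes is invented here, and whether the payload is text at all (and in which encoding) is an assumption you would need to check against the protocol.

```python
def colon_hex_to_bytes(hex_string: str) -> bytes:
    """Convert a string such as '70:79:f6:2e' into the bytes it describes."""
    # bytes.fromhex() does not accept ':' separators, so strip them first.
    return bytes.fromhex(hex_string.replace(":", ""))

raw = colon_hex_to_bytes("70:79:f6:2e")
print(raw)

# Assumption: the payload is UTF-8 text. Invalid bytes become U+FFFD
# instead of raising UnicodeDecodeError.
print(raw.decode("utf-8", errors="replace"))
```

If the decoded result is mostly replacement characters, the payload is probably binary (compressed, encrypted, or a non-text protocol) rather than text in a different encoding.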

How to use f'string bytes'string together? [duplicate]

I'm looking for a formatted byte string literal. Specifically, something equivalent to
name = "Hello"
bytes(f"Some format string {name}")
Possibly something like fb"Some format string {name}".
Does such a thing exist?
No. The idea is explicitly dismissed in PEP 498:
For the same reason that we don't support bytes.format(), you may
not combine 'f' with 'b' string literals. The primary problem
is that an object's __format__() method may return Unicode data
that is not compatible with a bytes string.
Binary f-strings would first require a solution for
bytes.format(). This idea has been proposed in the past, most
recently in PEP 461. The discussions of such a feature usually
suggest either
adding a method such as __bformat__() so an object can control how it is converted to bytes, or
having bytes.format() not be as general purpose or extensible as str.format().
Both of these remain as options in the future, if such functionality
is desired.
In 3.6+ you can do:
>>> a = 123
>>> f'{a}'.encode()
b'123'
You were actually super close in your suggestion; if you add an encoding kwarg to your bytes() call, then you get the desired behavior:
>>> name = "Hello"
>>> bytes(f"Some format string {name}", encoding="utf-8")
b'Some format string Hello'
Caveat: this works for me on 3.8, but the note at the bottom of the Bytes Objects section in the docs seems to suggest that this should work with any method of string formatting in all of 3.x (using str.format() for versions before 3.6, since that is when f-strings were added; the OP specifically asks about 3.6+).
Percent formatting for bytes (PEP 461, added in Python 3.5) works for some use cases:
print(b"Some stuff %a. Some other stuff" % my_byte_or_unicode_string)
But as AXO commented:
This is not the same. %a (or %r) will give the representation of the string, not the string itself. For example b'%a' % b'bytes' will give b"b'bytes'", not b'bytes'.
Which may or may not matter depending on if you need to just present the formatted byte_or_unicode_string in a UI or if you potentially need to do further manipulation.
As noted here, you can format this way:
>>> name = b"Hello"
>>> b"Some format string %b World" % name
b'Some format string Hello World'
You can see more details in PEP 461
Note that in your example you could simply do something like:
>>> name = b"Hello"
>>> b"Some format string " + name
b'Some format string Hello'
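To put the options from these answers side by side, here is a small sketch that builds the same bytes value two ways: formatting as str and encoding at the end, and using bytes %-formatting directly. The variable names and the UTF-8 choice are assumptions made for the example.

```python
name = "Hello"
count = 3

# Option 1: format as text with an f-string, then encode once at the end.
via_fstring = f"Some format string {name} x{count}".encode("utf-8")

# Option 2: bytes %-formatting (PEP 461). %b needs a bytes-like object,
# so the str has to be encoded first; %d accepts an int directly.
via_percent = b"Some format string %b x%d" % (name.encode("utf-8"), count)

print(via_fstring)
print(via_percent)
```

Formatting as text and encoding at the end is usually the simpler choice when all the inputs are str; %b is more useful when the pieces are already bytes.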
This was one of the bigger changes made from Python 2 to Python 3: they handle Unicode and strings differently.
This is how you'd convert to bytes.
string = "some string format"
encoded = string.encode()  # str -> bytes (UTF-8 by default)
print(encoded)
This is how you'd decode back to a string.
encoded.decode()  # bytes -> str
I gained a better appreciation for how Python 2 and 3 differ in their handling of Unicode through this Coursera lecture by Charles Severance. You can watch the entire 17-minute video or fast-forward to around 10:30 if you want to get to the differences between Python 2 and 3 and how they handle characters, specifically Unicode.
I understand your actual question is how to format a string that combines both strings and bytes.
inBytes = b"testing"
inString = 'Hello'
type(inString)  # This will yield <class 'str'>
type(inBytes)  # This will yield <class 'bytes'>
Here you can see that I have a string variable and a bytes variable.
This is how you would combine a bytes object and a string into one string.
formattedString = inString + ' ' + inBytes.decode()  # decode the bytes to str first
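The same idea works in both directions. As a minimal sketch (the snake_case names and the explicit UTF-8 encoding are choices made for this example), decode the bytes when the result should be a str, and encode the str when the result should be bytes:

```python
in_bytes = b"testing"
in_string = "Hello"

# Result as str: decode the bytes side before concatenating.
as_text = in_string + " " + in_bytes.decode("utf-8")

# Result as bytes: encode the str side and use bytes literals for the glue.
as_bytes = in_string.encode("utf-8") + b" " + in_bytes

print(as_text)
print(as_bytes)
```

Mixing the two types directly, as in in_string + in_bytes, raises TypeError in Python 3, which is why one side always has to be converted first.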

Python replace characters in string

Maybe this is a trivial issue, but I can't find a solution.
I use a Raspberry Pi to read values from a serial port. The input from serial looks like " b'1\r\n' ".
I need only the number from this input. I tried this code:
data = str(data)
data = data[2:7]
data = data.replace("\r\n","")
print(data)
Output of this code is : "1\r\n". I can't get rid of this part of string, replace function just doesn't work and I don't know why.
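You have bytes, so you can use the decode method of bytes to get back a string, and then the rstrip method of str to remove the trailing newline characters. Calling str() on bytes instead gives you the text of its repr, b'1\r\n', in which \r and \n are literal backslash sequences rather than real control characters, which is why your replace("\r\n", "") found nothing to replace.
data = b'1\r\n'
print(data)
data = data.decode('utf-8').rstrip()
print(data)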
OUTPUT
b'1\r\n'
1
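If the value is needed as a number rather than as text, the decode step can be followed by int(). This is a sketch under the assumption that each line from the serial port is an ASCII-encoded integer; parse_reading is a hypothetical helper name, not part of any serial library.

```python
def parse_reading(line: bytes) -> int:
    """Turn one raw serial line such as b'1\\r\\n' into an int."""
    # Decode to str, drop surrounding whitespace (including \r\n), then convert.
    return int(line.decode("ascii").strip())

raw = b'1\r\n'

# str() on bytes yields the repr text, 8 characters with literal backslashes,
# so it contains no real carriage return or newline to replace.
print(len(str(raw)))

print(parse_reading(raw))
```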

How to get MIME type base64 encoded string in nodejs?

When I convert the data to base64, it gives a single line of base64 string.
image = body.toString('base64');
How can I get base64 string used in MIME types which is wrapped at every 76 characters?
Is there any default method in node to achieve that?
There is no built-in method in Node.js for encoding to base64 with line breaks, but the mimelib library can do this:
To add line breaks
mimelib.foldLine(str, 76)
To encode to base64 with line breaks
mimelib.encodeBase64(str)
To break the resulting base-64 string into lines of no more than 76 characters, one can use replace(), e.g.,
body.toString('base64').replace(/.{76}/g, '$&\n')
. = match any character other than newline
{76} = repeat that match exactly 76 times, i.e., split the string into 76-character chunks
g = globally, i.e., keep going until out of data in the string
$& = insert the matched substring
\n = followed by a newline

Convert Binary data from file to readable string

I have binary data stored in a file. I am doing this:
byte[] fileBytes = File.ReadAllBytes(@"c:\carlist.dat");
string ascii = Encoding.ASCII.GetString(fileBytes);
This gives me the following result, with a lot of invalid characters. What am I doing wrong?
?D{F ?x#??4????? NBR-OF-CARSNUMBER-OF-CARS!"#??? NBR-OF-CARS$%??1y0#123?G??#$ NBR-OF-CARS%45??1y#  NUMBER-OF-CARSd?
Hmm... it looks like the file was saved from a byte buffer in which numeric data was written after NBR-OF-CARS. If you have access to the code that saves the file, check whether it writes numbers there and, if so, whether it converts them to strings before writing the values into the binary stream. Numbers written in raw binary form will not decode to readable ASCII text.

Resources