Having trouble converting hex string to double - struct

I have a little-endian hex string (for example, 'E61000003C9BFAE53893') that I'm trying to convert to a double. I've tried the following:
struct.unpack('<d', binascii.unhexlify('E61000003C9BFAE53893'))
but I keep getting
struct.error: unpack requires a buffer of 8 bytes
I checked the output of binascii.unhexlify('E61000003C9BFAE53893'), and it looks correct:
>>> print(binascii.unhexlify('E61000003C9BFAE53893'))
b'\xe6\x10\x00\x00<\x9b\xfa\xe58\x93'
so I'm not sure what the issue is.
For some context, I have a bunch of coordinate data encoded as WKB, but geopandas only supports WKT. I thought it would be easy to write a function to convert one to the other (or WKB to floats), but it's proving more challenging than I expected.

0xE61000003C9BFAE53893 is too long to be a double. A double is exactly 8 bytes, and that buffer is 10 (20 hex digits). It's easy to miscount from the printed repr: b'\xe58' near the end is actually two bytes, \xe5 followed by the ASCII character '8'.
struct.unpack('<d', ...) only accepts an 8-byte buffer, as the error message says.
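A quick way to see the mismatch (a minimal sketch; which 8 of the 10 bytes actually form your double depends on where the WKB stream was sliced, so the slice below is only for illustration):

import binascii
import struct

raw = binascii.unhexlify('E61000003C9BFAE53893')
print(len(raw)) # 10 -- two bytes too many for one double
print(struct.calcsize('<d')) # 8 -- what unpack('<d', ...) requires

# Slicing out exactly 8 bytes unpacks without error (this boundary is a guess):
value, = struct.unpack('<d', raw[2:])
print(value)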

Related

How can I combine nom parsers to get a more bit-oriented interface to the data?

I'm working on decoding AIS messages in Rust using nom.
AIS messages are made up of a bit vector; the various fields in each message are an arbitrary number of bits long, and they don't always align on byte boundaries.
This bit vector is then ASCII encoded, and embedded in an NMEA sentence.
From http://catb.org/gpsd/AIVDM.html:
The data payload is an ASCII-encoded bit vector. Each character represents six bits of data. To recover the six bits, subtract 48 from the ASCII character value; if the result is greater than 40 subtract 8. According to [IEC-PAS], the valid ASCII characters for this encoding begin with "0" (64) and end with "w" (87); however, the intermediate range "X" (88) to "_" (95) is not used.
Example
!AIVDM,1,1,,A,D03Ovk1T1N>5N8ffqMhNfp0,0*68 is the NMEA sentence
D03Ovk1T1N>5N8ffqMhNfp0 is the encoded AIS data
010100000000000011011111111110110011000001100100000001011110001110000101011110001000101110101110111001011101110000011110101110111000000000 is the decoded AIS data as a bit vector
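The quoted decode rule is easy to try out on the example payload; here is a minimal sketch in Python (just to illustrate the six-bit decoding, independent of the nom questions below):

def decode_payload(payload):
    # Each character encodes six bits: subtract 48 from the ASCII value,
    # and subtract 8 more if the result is greater than 40.
    bits = []
    for ch in payload:
        v = ord(ch) - 48
        if v > 40:
            v -= 8
        bits.append(format(v, '06b'))
    return ''.join(bits)

print(decode_payload('D03Ovk1T1N>5N8ffqMhNfp0')) # 010100000000000011011111... as above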
Problems
I list these together because I think they may be related...
1. Decoding ASCII to bit vector
I can do this manually by iterating over the characters, subtracting the appropriate values, and building up a byte array with a lot of bit-shifting work. That's fine, but it seems like I should be able to do this inside nom and chain it with the actual AIS bit parser, eliminating the interim byte array.
2. Reading arbitrary number of bits
It's possible to read, say, 3 bits from a byte array in nom. But, each call to bits! seems to consume a full byte at once (if reading into a u8).
For example:
named!(take_3_bits<u8>, bits!(take_bits!(u8, 3)));
will read 3 bits into a u8. But if I run take_3_bits twice, I'll have consumed 16 bits of my stream.
I can combine reads:
named!(get_field_1_and_2<(u8, u8)>, bits!(pair!(take_bits!(u8, 2), take_bits!(u8, 3))));
Calling get_field_1_and_2 will get me a (u8, u8) tuple, where the first item contains the first 2 bits, and the second item contains the next 3 bits, but nom will then still advance a full byte after that read.
I can use peek! to prevent nom's read pointer from advancing and then manage it manually, but again, that seems like unnecessary extra work.

Discrepancies in Python hard-coded strings vs str() methods

Okay. Here is my minimal working example. When I type this into Python 3.6.2:
foo = '0.670'
str(foo)
I get
>>>'0.670'
but when I type
foo = 0.670
str(foo)
I get
>>>'0.67'
What gives? It's stripping off the trailing zero, which I believe has to do with how floats are represented on a computer in general. But when using the str() method, why does it retain the extra 0 in the first case?
You are mixing strings and floats. A string is a sequence of code points (one code point per character) representing some text, and the interpreter processes it as text. A string literal is always written inside single quotes or double quotes (e.g. 'Hello'). A float is a number, and Python knows that 1.0000 is the same value as 1.0.
In the first case you stored a string in foo. Calling str() on a string just returns the string as-is.
In the second case you stored 0.670 as a float (because it isn't wrapped in quotes). When Python converts a float to a string, it produces the shortest string that converts back to exactly the same float.
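You can see this behavior directly:

print(0.670 == 0.67) # True -- both literals are the same IEEE 754 double
print(str(0.670)) # 0.67 -- the shortest string that converts back to that double
print(float('0.67') == 0.670) # True -- and it does convert back exactly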
Why does Python automatically drop the trailing zero?
To store a real number in a computer's memory, you have to convert it into a binary representation. Usually (though there are some exceptions) it's stored in the format described by the IEEE 754 standard, and Python uses that format for floats too.
Let's look at an example:
from struct import pack
x = -1.53
y = -1.53000
print("X:", pack(">d", x).hex())
print("Y:", pack(">d", y).hex())
The pack() function takes a value and converts it into bytes according to the given format (>d, a big-endian double). In this case it takes a float and shows us exactly how it is stored in memory. If you run the code you will see that x and y are stored in memory in the same way. The memory contains no information about how the number was written in the source.
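If you run it, both lines print the same hex string (the IEEE 754 big-endian encoding of -1.53):
X: bff87ae147ae147b
Y: bff87ae147ae147b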
Of course you could store extra information about the original formatting, but:
It would take additional memory, and it's good practice to use only as much memory as you actually need.
What would the result of 0.10 + 0.1 be: should it be 0.2 or 0.20?
For scientific purposes and significant figures, shouldn't it leave the value as the user defined it?
It doesn't matter how you wrote the input number. What matters is the format you want to use for presentation. As I said, str() always produces the shortest possible string, which is fine for simple scripts or tests. For scientific purposes (or any use where a particular representation is required) you can convert your numbers to strings however you want or need.
For example:
x = -1655484.4584631
y = 42.0
# always print the number with a sign and exactly 5 digits after the decimal point
print("{:+.5f}".format(x)) # -1655484.45846
print("{:+.5f}".format(y)) # +42.00000
# always print the number in scientific notation with two digits of precision
# (the sign is shown only when the number is negative)
print("{:.2e}".format(x)) # -1.66e+06
print("{:.2e}".format(y)) # 4.20e+01
For more information about formatting numbers and other types, see Python's documentation.

Convert 32-bit binary number to decimal in Python

I'm trying to convert a 32-bit number to decimal in Python. I'm quite new to Python, so I'm not sure how to go about it. What I have so far is something like
file = open('filepath', 'rb')
num = file.read(4)
The value for num looks something like
b'\x05\x00\x00\x00'
How can I easily convert this to an integer value that can be stored? Eventually I will want to read in every value in this file and store them to be plotted later.
Thanks!
There is a module called struct which is helpful for unpacking bytes into an integer or any other form.
import struct
struct.unpack('<i', b'\x05\x00\x00\x00') # '<i' is a little-endian 32-bit signed integer
This gives (5,), a one-element tuple that you can unpack into a variable or use directly as needed. (Plain 'i' also works on little-endian machines, but '<i' makes the byte order of the file data explicit.)
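For the follow-up part of the question (reading every value in the file to plot later), here is a minimal sketch, assuming the file contains nothing but consecutive little-endian 32-bit integers and that 'filepath' is a placeholder:

import struct

values = []
with open('filepath', 'rb') as f:
    while True:
        chunk = f.read(4)
        if len(chunk) < 4:
            break # end of file (or a trailing partial record)
        values.append(struct.unpack('<i', chunk)[0])

print(values) # e.g. [5, ...]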

Is it safe to cast binary data from a byte array to a string and back in golang?

Maybe a stupid question, but if I have some arbitrary binary data, can I cast it to string and back to byte array without corrupting it?
Is []byte(string(byte_array)) always the same as byte_array?
The expression []byte(string(byte_slice)) evaluates to a slice with the same length and contents as byte_slice. The capacity of the two slices may be different.
Although some language features assume that strings contain valid UTF-8 encoded text, a string can contain arbitrary bytes.

Compress bytes into a readable string (no null or end-of-line)

I'm searching for the most appropriate encoding or method to pack bytes into characters that can be read with a ReadLine-like command, which only accepts readable characters and terminates on an end-of-line character. There is probably a common practice for achieving this, but I don't know much about encodings.
Currently I'm outputting bytes as a string of hex, so I need 2 characters to represent 1 byte. It works well, but it is slow. For example, a byte with the value 255 is represented as 'FF'.
I'm sure the output could be made quite a bit smaller (though there's a limit, since I'm outputting MP3 data, which is already compressed), but I don't know how. Should I just zip my string, or would there be too much overhead?
Will Ascii85 output contain random null bytes or end-of-line characters, or am I safe with it?
Don't zip MP3 data; that will not gain much (or anything at all), since MP3 is already compressed.
I'm a bit disappointed that you did not read up on Ascii85 before asking, as the Wikipedia article explains fairly clearly that it uses only printable ASCII characters; so, no line endings or null bytes. It is efficient, and the conversion is also fairly simple and quick: split your data into 4-byte integers; each converts to just five Ascii85 digits by repeatedly dividing the integer value by 85 and taking the ASCII character of the remainder + 33.
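To make the 4-bytes-to-5-digits step concrete, here is a small Python sketch (the helper name a85_group is made up for illustration), checked against the standard library's base64.a85encode:

import base64
import struct

def a85_group(four_bytes):
    # Treat the group as one big-endian 32-bit integer...
    n = struct.unpack('>I', four_bytes)[0]
    # ...and peel off five base-85 digits, least significant first, offset by 33 ('!').
    digits = []
    for _ in range(5):
        n, rem = divmod(n, 85)
        digits.append(chr(rem + 33))
    return ''.join(reversed(digits))

print(a85_group(b'\xff\xff\xff\xff')) # s8W-!
print(base64.a85encode(b'\xff\xff\xff\xff')) # b's8W-!' -- the stdlib agrees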
You can also consider Base64 or uuencode. These are fairly popular (used in email attachments, for example), so you will find many libraries implementing them, but they are less efficient: 4 characters per 3 bytes, versus Ascii85's 5 characters per 4.
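For a rough size comparison of the options mentioned, on sample data whose length divides evenly into 3- and 4-byte groups:

import base64

data = bytes(range(256)) * 12 # 3072 bytes of sample data
print(len(data.hex())) # 6144 characters -- 2.00x the raw size
print(len(base64.b64encode(data))) # 4096 -- 1.33x
print(len(base64.a85encode(data))) # 3840 -- 1.25x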
