What are UInt16LE, UInt16BE, etc. in Node JS? - node.js

In all of my time programming I have squeaked by without ever learning this stuff. Would love to know more about what these are and how they are used:
UInt8
UInt16LE
UInt16BE
UInt32LE
UInt32BE
Int8
Int16LE
Int16BE
Int32LE
Int32BE
FloatLE
FloatBE
DoubleLE
DoubleBE
See https://nodejs.org/api/buffer.html#buffer_buf_readuint8_offset_noassert for where Node uses these.

These datatypes describe how a number is represented in bytes, including the byte order. This typically matters for:
Network protocols
Binary file formats
It matters because the writer must lay out integers and floats so that the reader recovers the same value. Which format to use is simply a convention agreed between the two sides (writer and reader).
What the abbreviations mean:
BE suffix stands for big-endian (most significant byte first)
LE stands for little-endian (least significant byte first)
Int is a signed integer
UInt is an unsigned integer
The number in an integer type's name is its width in bits (8, 16 or 32). Float is a 32-bit and Double a 64-bit IEEE 754 floating-point number.
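To make the byte-order point concrete, here is a minimal sketch using Python's struct module rather than Node's Buffer API (the format codes "<H" and ">H" play the role of UInt16LE and UInt16BE; the variable names are only illustrative):

```python
import struct

value = 0x1234  # 4660 in decimal

# "<H" = little-endian unsigned 16-bit, the counterpart of UInt16LE
little = struct.pack("<H", value)
# ">H" = big-endian unsigned 16-bit, the counterpart of UInt16BE
big = struct.pack(">H", value)

print(little.hex())  # 3412 (least significant byte first)
print(big.hex())     # 1234 (most significant byte first)

# Reading with the wrong byte order silently yields a different number.
misread = struct.unpack(">H", little)[0]
print(hex(misread))  # 0x3412

# Signed vs. unsigned: the same byte, interpreted two ways.
as_unsigned = struct.unpack("B", b"\xff")[0]  # UInt8 reading: 255
as_signed = struct.unpack("b", b"\xff")[0]    # Int8 reading: -1
print(as_unsigned, as_signed)
```

The same two bytes mean different numbers depending on the agreed order, which is why the reader and writer have to pick one and stick to it.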

Related

How can I bit-convert between Int and Word quickly?

The Haskell base documentation says that "A Word is an unsigned integral type, with the same size as Int."
How can I take an Int and cast its bit representation to a Word, so I get a Word value with the same bit representation as the original Int (even though the number values they represent will be different)?
I can't use fromIntegral because that will change the bit representation.
I could loop through the bits with the Bits class, but I suspect that will be very slow - and I don't need to do any kind of bit manipulation. I want some kind of function that will be compiled down to a no-op (or close to it), because no conversion is done.
Motivation
I want to use IntSet as a fast integer set implementation - however, what I really want to store in it are Words. I feel that I could create a WordSet which is backed by an IntSet, by converting between them quickly. The trouble is, I don't want to convert by value, because I don't want to truncate the top half of Word values: I just want to keep the bit representation the same.
int2Word#/word2Int# in GHC.Prim perform bit casting. You can implement wrapper functions which cast between boxed Int/Word using them easily.
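As an illustration of what "same bits, different value" means (this is not the Haskell solution itself, just a Python sketch of the concept, assuming a 64-bit Int as on typical 64-bit GHC platforms; int_to_word and word_to_int are hypothetical helper names):

```python
import struct

def int_to_word(n: int) -> int:
    """Reinterpret a signed 64-bit value as unsigned, keeping the same 8 bytes."""
    return struct.unpack("=Q", struct.pack("=q", n))[0]

def word_to_int(w: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed, keeping the same 8 bytes."""
    return struct.unpack("=q", struct.pack("=Q", w))[0]

print(int_to_word(-1))          # 18446744073709551615, i.e. 2**64 - 1
print(word_to_int(2**64 - 1))   # -1
print(int_to_word(5))           # 5 (non-negative values are unchanged)
```

The round trip is lossless in both directions, which is the property a WordSet backed by an IntSet would rely on.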

How to use leading_zeros/trailing_zeros in platform independent way?

I want to find the first non-zero bit in the binary representation of a u32. leading_zeros/trailing_zeros looks like what I want:
let x: u32 = 0b01000;
println!("{}", x.trailing_zeros());
This prints 3 as expected and described in the docs. What will happen on big-endian machines, will it be 3 or some other number?
The documentation says
Returns the number of trailing zeros in the binary representation
Is it related to machine binary representation (so the result of trailing_zeros depends on architecture) or base-2 numeral system (so result will be always 3)?
The type u32 represents binary numbers with 32 bits as an abstract concept. You can imagine them as abstract, mathematical numbers in the range from 0 to 2^32-1. The binary representation of these numbers is written in the usual convention of starting with the most significant bit (MSB) and ending with the least significant bit (LSB), and the trailing_zeros() method returns the number of trailing zeros in that representation.
Endianness only comes into play when serializing such an integer to bytes, e.g. for writing it to a bytes buffer, a file or the network. You are not doing any of this in your code, so it doesn't matter here.
As mentioned above, writing a number starting with the MSB is also just a convention, but this convention is pretty much universal today for numbers written in positional notation. For programming, this convention is only relevant when formatting a number for display, parsing a number from a string, and maybe for naming methods like trailing_zeros(). When storing a u32 in a register, the bits don't have any defined order.
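The distinction between a value's bits and its serialized bytes can be sketched in Python (not Rust; trailing_zeros_u32 is a hypothetical helper written to mirror the behaviour described above, including returning 32 for zero):

```python
import struct

def trailing_zeros_u32(x: int) -> int:
    """Count trailing zero bits of a value in the range 0 .. 2**32 - 1."""
    if x == 0:
        return 32
    # x & -x isolates the lowest set bit; its position is the answer.
    return (x & -x).bit_length() - 1

x = 0b01000
tz = trailing_zeros_u32(x)
print(tz)  # 3

# Endianness only shows up once the value is serialized to bytes.
le_bytes = struct.pack("<I", x)
be_bytes = struct.pack(">I", x)
print(le_bytes.hex())  # 08000000
print(be_bytes.hex())  # 00000008

# Decoding with the matching byte order gives back the same value,
# so the count of trailing zeros is the same either way.
tz_le = trailing_zeros_u32(struct.unpack("<I", le_bytes)[0])
tz_be = trailing_zeros_u32(struct.unpack(">I", be_bytes)[0])
print(tz_le, tz_be)  # 3 3
```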

Is it possible to convert a float to double?

I’m working on a TXT to SPC converter, and certain values have to be stored as hex of double, but Python only works with float and struct.unpack('<d', struct.pack('<f', value))/any other unpack and pack matryoshka doll I can conceive doesn’t work because of the difference in byte size.
The SPC library unpacks said values from SPC as <d and converts them to float through float()
What do I do?
I think you may be getting confused by different programming languages' naming strategies.
There's a class of data types known as "floating point numbers". Two floating-point number types defined by IEEE-754 are "binary32" and "binary64". In C and C++, those two types are exposed as the types float and double, respectively. In Python, only "binary64" is natively supported as a built-in type; it's known as float.
Python's struct module supports both binary32 and binary64, and uses C/C++'s nomenclature to refer to them. f specifies binary32 and d specifies binary64. Regardless of which you're using, the module packs from and unpacks to Python's native float type (which, remember, is binary64). In the case of d that's exact; in the case of f it converts the type under the hood. You don't need to fool Python into doing the conversion.
Now, I'm just going to assume you're wrong about "stored as hex of double". What I think you probably mean is "stored as double" -- namely, 64 bits in a file -- as opposed to stored as "hex of double", namely sixteen human-readable ASCII characters. That latter one just doesn't happen.
All of which is to say, if you want to store things as binary64, it's just a matter of struct.pack('d', value).
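A short sketch of the point above, assuming the file wants little-endian values as in the question's '<d' (the sample value 0.1 is arbitrary):

```python
import struct

value = 0.1  # a Python float, which is already binary64

as_double = struct.pack("<d", value)  # 8 bytes, stored exactly
as_single = struct.pack("<f", value)  # 4 bytes, rounded to binary32
print(len(as_double), len(as_single))  # 8 4

# Unpacking always returns a Python float (binary64).
round_trip_double = struct.unpack("<d", as_double)[0]
round_trip_single = struct.unpack("<f", as_single)[0]
print(round_trip_double == value)  # True
print(round_trip_single == value)  # False: precision was lost in binary32

# A value read as binary32 can be written as binary64 directly;
# no nested pack/unpack trick is needed.
widened = struct.pack("<d", round_trip_single)
print(len(widened))  # 8
```

The nested call in the question fails only because '<d' expects 8 bytes and '<f' produces 4; unpacking with the same code that was used for packing avoids the size mismatch.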

Is there a difference between datatypes on different bit-size OSes?

I have a C program that I know works on 32-bit systems. On 64-Bit systems (at least mine) it works to a point and then stops. Reading some forums the program may not be 64-bit safe? I assume it has to do with differences of data types between 32-bit and 64-bit systems.
Is a char the same on both? what about int or long or their unsigned variants? Is there any other way a 32-bit program wouldn't be 64-bit safe? If I wanted to verify the application is 64-bit safe, what steps should I take?
Regular data types in C have minimum ranges of values rather than specific bit widths. For example, a short has to be able to represent, at a minimum, -32767 thru 32767 inclusive.
So, yes, if your code depends on values wrapping around at 32768, it's unlikely to behave well if the short is some big honking 128-bit behemoth.
If you want specific-width data types, look into stdint.h for things like int64_t and so on. There are a wide variety to choose from: specific widths, "at-least" widths, and so on. The exact-width types are also required to use two's complement, unlike the "regular" integral types. The standard groups them as:
integer types having certain exact widths;
integer types having at least certain specified widths;
fastest integer types having at least certain specified widths;
integer types wide enough to hold pointers to objects;
integer types having greatest width.
For example, from C11 7.20.1.1 Exact-width integer types:
The typedef name intN_t designates a signed integer type with width N, no padding
bits, and a two’s complement representation. Thus, int8_t denotes such a signed
integer type with a width of exactly 8 bits.
Provided you have followed the rules (things like not casting pointers to integers), your code should compile and run on any implementation, and any architecture.
If it doesn't, you'll just have to start debugging, then post the detailed information and the code that seems to be causing the problem on a forum site dedicated to such things. Now where have I seen one of those recently? :-)
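One quick way to see which widths vary on a given machine is to ask Python's ctypes module, which reports the sizes used by the platform's C compiler. This is only a sketch in Python rather than C; the dictionary names are illustrative:

```python
import ctypes

# Exact-width types: the same size on every platform.
fixed = {
    "int8_t": ctypes.sizeof(ctypes.c_int8),
    "int16_t": ctypes.sizeof(ctypes.c_int16),
    "int32_t": ctypes.sizeof(ctypes.c_int32),
    "int64_t": ctypes.sizeof(ctypes.c_int64),
}

# "Regular" types: sizes depend on the platform and compiler.
# long and pointers are the usual 32-bit vs. 64-bit troublemakers.
platform_dependent = {
    "char": ctypes.sizeof(ctypes.c_char),
    "short": ctypes.sizeof(ctypes.c_short),
    "int": ctypes.sizeof(ctypes.c_int),
    "long": ctypes.sizeof(ctypes.c_long),
    "void *": ctypes.sizeof(ctypes.c_void_p),
}

for name, size in {**fixed, **platform_dependent}.items():
    print(f"{name:8s} {size} bytes")
```

Code that stores a pointer or a long in an int is the typical case that works on 32-bit systems and breaks where those types are 8 bytes wide.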

Why are there so many string types in MFC?

LPTSTR* arrpsz = new LPTSTR[ m_iNumColumns ];
arrpsz[ 0 ] = new TCHAR[ lstrlen( pszText ) + 1 ];
(void)lstrcpy( arrpsz[ 0 ], pszText );
This is a code snippet about strings in MFC, and there is also _T("HELLO"). Why are there so many string types in MFC? What are they used for?
Strictly speaking, what you're showing here are Windows-specific strings, not MFC string types (but your point is even better taken if you add in CString and std::string). It's more complex than it needs to be, largely for historical reasons.
tchar.h is definitely worth looking at -- also search for TCHAR on MSDN.
There's an old joke about string processing in C that you may find amusing: string handling in C is so efficient because there's no string type.
Historical reasons.
The original Windows APIs were in C (unless the real originals were in Pascal and have been lost in the mists). Microsoft created its own datatypes to represent C datatypes, likely because C datatypes are not standard in their size. (For C integral types, char is at least 8 bits, short is at least 16 bits and at least as big as a char, int is at least 16 bits and at least as big as a short, and long is at least 32 bits and at least as big as an int.) Since Windows ran first on essentially 16-bit systems and later 32-bit, C compilers didn't necessarily agree on sizes. Microsoft further designated more complex types, so a C char * would be referred to as an LPSTR, and a const char * as an LPCSTR.
Thing is, an 8-bit character is not suitable for Unicode, as UTF-8 is not easy to retrofit into C or C++. Therefore, they needed a wide character type, which in C would be referred to as wchar_t, but which got a set of Microsoft datatypes corresponding to the earlier ones. Furthermore, since people might want to compile sometimes in Unicode and sometimes in ASCII, they made the TCHAR character type, and corresponding string types, which would be based on either char (for ASCII compilation) or wchar_t (for Unicode).
Then came MFC and C++ (sigh of relief) and Microsoft wanted a string type. Since this was before the standardization of C++, there was no std::string, so they invented CString. (They also had container classes that weren't compatible with what came to be the STL and then the containers part of the library.)
Like any mature and heavily used application or API, there's a whole lot in it that would be done completely differently if it were possible to do it over from scratch.
See Generic-Text Mappings in TCHAR.H and the description of LPTSTR in Windows Data Types.

Resources