I have a PHP class for reading binary data that I'm converting to NodeJS, or at least I need the NodeJS equivalent of a couple of its functions. The functions I'm interested in from this BinaryReader class are ReadULong and ReadUShort. I believe these read an unsigned long integer (4 bytes) and an unsigned short integer (2 bytes). While looking for the NodeJS equivalents, I got confused about which of these functions to use:
buf.readUInt16LE(offset, [noAssert])
buf.readUInt16BE(offset, [noAssert])
buf.readUInt32LE(offset, [noAssert])
buf.readUInt32BE(offset, [noAssert])
What would LE or BE stand for in this case?
The Buffer docs are located here, but I was unable to find an explanation of those suffixes there.
Also, I've found a constant in the PHP class that says const DEFAULT_BYTE_ORDER = 'L';. Is this L the same as the L in readUInt32LE? Is this whole thing about byte order?
So far I've read these articles:
Good Source at cplusplus.com for looking up variable types.
PHP bytewise tutorial and binary math
How to read binary files byte by byte in Node.js question at stackoverflow
If I could be given a couple more references to read about binary reading that would be much appreciated!
BE and LE stand for big endian and little endian. In big endian, the most significant byte is stored at the lowest address; in little endian, the least significant byte is stored at the lowest address. So yes, endianness is the byte order. You can see the pattern in one of the examples in the documentation:
var buf = Buffer.alloc(2); // new Buffer(size) is deprecated
buf[0] = 0x3;
buf[1] = 0x4;
buf.readUInt16BE(0); // 0x0304
buf.readUInt16LE(0); // 0x0403
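To connect this back to the PHP class, here is a minimal sketch of what the two methods could look like in Node. The class shape and the littleEndian flag are hypothetical, and it assumes that 'L' in DEFAULT_BYTE_ORDER means little endian, which the question does not confirm.

```javascript
// Hypothetical minimal reader; method names mirror the PHP class in the question.
class BinaryReader {
  constructor(buffer, littleEndian = true) {
    this.buffer = buffer;
    // Assumption: DEFAULT_BYTE_ORDER = 'L' in the PHP class means little endian.
    this.littleEndian = littleEndian;
    this.pos = 0;
  }

  // Unsigned 16-bit integer (2 bytes).
  readUShort() {
    const value = this.littleEndian
      ? this.buffer.readUInt16LE(this.pos)
      : this.buffer.readUInt16BE(this.pos);
    this.pos += 2;
    return value;
  }

  // Unsigned 32-bit integer (4 bytes).
  readULong() {
    const value = this.littleEndian
      ? this.buffer.readUInt32LE(this.pos)
      : this.buffer.readUInt32BE(this.pos);
    this.pos += 4;
    return value;
  }
}

const reader = new BinaryReader(Buffer.from([0x03, 0x04, 0x01, 0x00, 0x00, 0x00]));
console.log(reader.readUShort().toString(16)); // 403
console.log(reader.readULong()); // 1
```

If the file format turns out to be big endian, passing false as the second constructor argument switches both methods to the BE variants.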
I am programming a Node JS client that's querying a third-party server for information. Part of the message that I receive back from the third party server is a UINT 128 (specified in documentation). However, from what I can tell, NodeJS buffers do not support reading 128 bit ints from buffers. I have checked this page and the two most likely functions (reading BIGINTS and reading unspecified byte length ints) both don't support 128 bit integers.
I've already tried installing and using the big-integer module, but if that is the correct method, I couldn't understand how to read the bytes out of the buffer into a big-integer object.
How can I read a UINT128 out of a NodeJS buffer? I'm willing to use bit arithmetic if necessary, I'm just not sure how.
I did it by calling getBigUint64 on a DataView twice and combining the two 64-bit BigInts with bitwise operators. Note that calling toString on both halves, concatenating the strings and parsing the result as a BigInt does not give the right value, because decimal digits do not line up with 64-bit boundaries.
readUInt128() {
  const first = this.readUInt64();
  const second = this.readUInt64();
  // The first 8 bytes are the low half in little endian and the high half in big endian.
  return this.littleEndian
    ? (second << 64n) | first
    : (first << 64n) | second;
}
readUInt64() {
  const result = this.view.getBigUint64(this.pos, this.littleEndian);
  this.pos += 8;
  return result;
}
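Since the question asks about a Node Buffer rather than a DataView, the same idea can be sketched with the Buffer's own 64-bit readers. The helper names readUInt128BE and readUInt128LE are hypothetical, and this assumes a Node version that has readBigUInt64BE and readBigUInt64LE.

```javascript
// Hypothetical helpers: combine two 64-bit halves into one 128-bit BigInt.
function readUInt128BE(buf, offset = 0) {
  const high = buf.readBigUInt64BE(offset);     // most significant 8 bytes come first
  const low = buf.readBigUInt64BE(offset + 8);
  return (high << 64n) | low;
}

function readUInt128LE(buf, offset = 0) {
  const low = buf.readBigUInt64LE(offset);      // least significant 8 bytes come first
  const high = buf.readBigUInt64LE(offset + 8);
  return (high << 64n) | low;
}

const bytes = Buffer.alloc(16);
bytes[15] = 0x01; // only the last byte is set
console.log(readUInt128BE(bytes)); // 1n
console.log(readUInt128LE(bytes) === 1n << 120n); // true
```

Which of the two applies depends on the byte order the third-party protocol specifies.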
Background
I am reading buffers using the Node.js buffer native API. This API has two functions called readUIntBE and readUIntLE for Big Endian and Little Endian respectively.
https://nodejs.org/api/buffer.html#buffer_buf_readuintbe_offset_bytelength_noassert
Problem
By reading the docs, I stumbled upon the following lines:
byteLength Number of bytes to read. Must satisfy: 0 < byteLength <= 6.
If I understand correctly, this means that I can only read 6 bytes at a time using this function, which makes it useless for my use case, as I need to read a timestamp made up of 8 bytes.
Questions
Is this a documentation typo?
If not, what is the reason for such an arbitrary limitation?
How do I read 8 bytes in a row (or, more generally, sequences longer than 6 bytes)?
Answer
After asking in the official Node.js repo, I got the following response from one of the members:
No it is not a typo
The byteLength corresponds to e.g. 8bit, 16bit, 24bit, 32bit, 40bit and 48bit. More is not possible since JS numbers are only safe up to Number.MAX_SAFE_INTEGER.
If you want to read 8 bytes, you can read multiple entries by adding the offset.
Source: https://github.com/nodejs/node/issues/20249#issuecomment-383899009
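As a rough sketch of what "read multiple entries by adding the offset" can look like for an 8-byte big endian timestamp: the sample bytes below are made up, and the two-read version is only exact while the result stays at or below Number.MAX_SAFE_INTEGER. On Node versions that provide it, readBigUInt64BE is an alternative that returns a BigInt.

```javascript
// Made-up sample: an 8-byte big endian value.
const stamp = Buffer.from([0x00, 0x00, 0x01, 0x8b, 0xcf, 0xe5, 0x68, 0x00]);

// Option 1: two 32-bit reads at offsets 0 and 4, combined into a Number.
// Exact only while the result is <= Number.MAX_SAFE_INTEGER.
const high = stamp.readUInt32BE(0);
const low = stamp.readUInt32BE(4);
const asNumber = high * 2 ** 32 + low;

// Option 2: a single 64-bit read that returns a BigInt.
const asBigInt = stamp.readBigUInt64BE(0);

console.log(Number.isSafeInteger(asNumber)); // true
console.log(asNumber === Number(asBigInt)); // true
```

Multiplying by 2 ** 32 is used instead of a shift because JavaScript's bitwise operators on Numbers truncate to 32 bits.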
I have a node.js application that contains the following piece of code:
const crypto = require('crypto');
const randomBytes=crypto.randomBytes(15);
Now what I want to do is read each of the 15 bytes individually, because I want to perform a low-level operation such as sending them via USB in the manner that this answer explains.
But how I can do that?
crypto.randomBytes returns a Buffer, so you can read individual bytes from it with these methods:
randomBytesInABuffer.readUInt8(offset);
randomBytesInABuffer.readInt8(offset)
Keep in mind that in lower-level languages, specifically C and C++, the types char and unsigned char are also 1 byte long and can be used as 1-byte integers.
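A short sketch of walking over all 15 bytes with those methods. It relies on a Buffer being a Uint8Array, so plain indexing gives the same unsigned value as readUInt8.

```javascript
const crypto = require('crypto');

const randomBytes = crypto.randomBytes(15);

for (let i = 0; i < randomBytes.length; i++) {
  const unsigned = randomBytes[i];        // same value as randomBytes.readUInt8(i), 0 to 255
  const signed = randomBytes.readInt8(i); // same byte interpreted as -128 to 127
  console.log(i, unsigned, signed);
}
```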
I am trying to convert a Ruby program to NodeJS, but I seem to be getting stuck with buffers.
I have
rounds = header_bytes[120..-1].unpack('L*').first
in Ruby, which takes a buffer (header_bytes) and gets bytes 120 to 124 (or in this case to -1, which means the remainder), then unpacks them as an unsigned 32-bit integer.
I am trying to do the same thing in JS, but I can't seem to get it to work. I have
rounds = header.slice(120,124).toString('ucs2');
I've tried all the different formats in toString and nothing returns the same result as Ruby.
Assuming that header is an instance of Node's Buffer then you have a variety of functions for reading from a buffer as various sizes of integer, including
buf.readUInt32LE
buf.readUInt32BE
These both take an offset from which to read the bytes. The Ruby L specifier means native byte order, so you need one function or the other depending on whether the Ruby code runs on a big or little endian platform. For example, on an x86 machine you'd do
header.readUInt32LE(120)
Protocols normally specify big or little endian (traditionally, network byte order is big endian).
You can check the platform endianness with os.endianness(), which returns 'BE' or 'LE'.
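A small sketch of picking the reader from os.endianness(). The helper name readUInt32Native and the 124-byte stand-in header are hypothetical, and it only matches the Ruby result if the data was written on a machine with the same byte order as the one running Node.

```javascript
const os = require('os');

// Hypothetical helper mirroring Ruby's 'L' (native byte order, unsigned 32-bit).
function readUInt32Native(buf, offset) {
  return os.endianness() === 'LE'
    ? buf.readUInt32LE(offset)
    : buf.readUInt32BE(offset);
}

const header = Buffer.alloc(124); // stand-in for the real header bytes
header.writeUInt32LE(6000, 120);

console.log(header.readUInt32LE(120)); // 6000
console.log(readUInt32Native(header, 120)); // 6000 on a little endian machine
```

If the file format defines its own byte order, calling readUInt32LE or readUInt32BE directly is the safer choice.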
I'm writing a client and server program with Linux socket programming, and I'm confused about something. Although sizeof(char) is guaranteed to be 1, I know the real size of a char may differ between computers. It may be 8 bits, 16 bits or some other size. What happens if the client and server have different char sizes? For example, the client's char is 8 bits and the server's char is 16 bits. The client calls write(socket_fd, *c, sizeof(char)) and the server calls read(socket_fd, *c, sizeof(char)). Does the client send 8 bits while the server expects to receive 16 bits? If so, what will happen?
Another question: Is it good for me to pass text between client and server because I don't need to consider the big endian and little endian problem?
Thanks in advance.
What system are you communicating with that has 16 bits in a byte? In any case, if you want to know exactly how many bits you have, use int8_t instead.
@Basile is right. A char is always eight bits in Linux. I found this in the book Linux Kernel Development. This book also states some other rules:
Although there is no rule that the int type be 32 bits, it is in Linux on all currently supported architectures.
The same goes for the short type, which is 16 bits on all current architectures, although no rule explicitly decrees that.
Never assume the size of a pointer or a long, which can be either 32 or 64 bits on the currently supported machines in Linux.
Because the size of a long varies on different architectures, never assume that sizeof(int) is equal to sizeof(long).
Likewise, do not assume that a pointer and an int are the same size.
On the choice between passing binary data or text data over the network, the book UNIX Network Programming, Volume 1 gives two solutions:
Pass all numeric data as text strings.
Explicitly define the binary formats of the supported datatypes (number of bits, big- or little-endian) and pass all data between the client and server in this format. RPC packages normally use this technique. RFC 1832 [Srinivasan 1995] describes the External Data Representation (XDR) standard that is used with the Sun RPC package.
The C definition of char as the size of a memory cell is different from the definition used in Unicode.
A Unicode code point can, depending on the encoding used, require up to 4 bytes of storage (the original UTF-8 design allowed sequences of up to 6 bytes).
This is a slightly different problem than byte order and word size differences between different architectures, etc.
If you wish to express complex structures (containing Unicode text), it's probably a good idea to implement a message protocol that encodes messages to a byte array, which can then be sent over any communication channel.
A simple client/server mechanism is to send a fixed-size header containing the length of the following message. It's a nice exercise to build something like this in C... :-)
Depending on what you are trying to do, it may be worthwhile to look at existing technologies for the message interface: Etch, Thrift, SWIG, *-rpc, ASN.1, SOAP, XML, JSON, CORBA, etc.
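To illustrate the fixed-size header idea, here is a minimal sketch written in Node to match the other examples on this page rather than in C. The function names and the choice of a 4-byte big endian length followed by UTF-8 text are assumptions made for the example, not something the answer prescribes.

```javascript
// Hypothetical framing helpers: 4-byte big endian length header, then a UTF-8 payload.
function encodeMessage(text) {
  const payload = Buffer.from(text, 'utf8');
  const header = Buffer.alloc(4);
  header.writeUInt32BE(payload.length, 0); // length in bytes, not characters
  return Buffer.concat([header, payload]);
}

// Returns { message, rest }, or null when more bytes are still needed.
function decodeMessage(buf) {
  if (buf.length < 4) return null;
  const length = buf.readUInt32BE(0);
  if (buf.length < 4 + length) return null;
  return {
    message: buf.toString('utf8', 4, 4 + length),
    rest: buf.subarray(4 + length), // bytes belonging to the next message
  };
}

const frame = encodeMessage('héllo');
console.log(frame.length); // 10
console.log(decodeMessage(frame).message); // héllo
```

Because the header fixes both the width and the byte order of the length field, neither side has to care about the other's native integer layout.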