How to append binary data to a buffer in node.js

How to append binary data to a buffer in node.js - node.js

I have a buffer with some binary data:
var b = new Buffer ([0x00, 0x01, 0x02]);
and I want to append 0x03.
How can I append more binary data? I'm searching in the documentation but for appending data it must be a string, if not, an error occurs (TypeError: Argument must be a string):
var b = new Buffer (256);
b.write ("hola");
console.log (b.toString ("utf8", 0, 4)); //hola
b.write (", adios", 4);
console.log (b.toString ("utf8", 0, 11)); //hola, adios
Then, the only solution I can see here is to create a new buffer for every appended binary data and copy it to the major buffer with the correct offset:
var b = new Buffer (4); //4 for having a nice printed buffer, but the size will be 16KB
new Buffer ([0x00, 0x01, 0x02]).copy (b);
console.log (b); //<Buffer 00 01 02 00>
new Buffer ([0x03]).copy (b, 3);
console.log (b); //<Buffer 00 01 02 03>
But this seems a bit inefficient because I have to instantiate a new buffer for every append.
Do you know a better way for appending binary data?
EDIT
I've written a BufferedWriter that writes bytes to a file using internal buffers. Same as BufferedReader but for writing.
A quick example:
//The BufferedWriter truncates the file because append == false
new BufferedWriter ("file")
.on ("error", function (error){
console.log (error);
})
//From the beginning of the file:
.write ([0x00, 0x01, 0x02], 0, 3) //Writes 0x00, 0x01, 0x02
.write (new Buffer ([0x03, 0x04]), 1, 1) //Writes 0x04
.write (0x05) //Writes 0x05
.close (); //Closes the writer. A flush is implicitly done.
//The BufferedWriter appends content to the end of the file because append == true
new BufferedWriter ("file", true)
.on ("error", function (error){
console.log (error);
})
//From the end of the file:
.write (0xFF) //Writes 0xFF
.close (); //Closes the writer. A flush is implicitly done.
//The file contains: 0x00, 0x01, 0x02, 0x04, 0x05, 0xFF
LAST UPDATE
Use concat.

Updated Answer for Node.js ~>0.8
Node is able to concatenate buffers on its own now.
var newBuffer = Buffer.concat([buffer1, buffer2]);
Old Answer for Node.js ~0.6
I use a module to add a .concat function, among others:
https://github.com/coolaj86/node-bufferjs
I know it isn't a "pure" solution, but it works very well for my purposes.

Buffers are always of fixed size, there is no built in way to resize them dynamically, so your approach of copying it to a larger Buffer is the only way.
However, to be more efficient, you could make the Buffer larger than the original contents, so it contains some "free" space where you can add data without reallocating the Buffer. That way you don't need to create a new Buffer and copy the contents on each append operation.

This is to help anyone who comes here looking for a solution that wants a pure approach. I would recommend understanding this problem because it can happen in lots of different places not just with a JS Buffer object. By understanding why the problem exists and how to solve it you will improve your ability to solve other problems in the future since this one is so fundamental.
For those of us that have to deal with these problems in other languages it is quite natural to devise a solution, but there are people who may not realize how to abstract away the complexities and implement a generally efficient dynamic buffer. The code below may have potential to be optimized further.
I have left the read method unimplemented to keep the example small in size.
The realloc function in C (or any language dealing with intrinsic allocations) does not guarantee that the allocation will be expanded in size with out moving the existing data - although sometimes it is possible. Therefore most applications when needing to store a unknown amount of data will use a method like below and not constantly reallocate, unless the reallocation is very infrequent. This is essentially how most file systems handle writing data to a file. The file system simply allocates another node and keeps all the nodes linked together, and when you read from it the complexity is abstracted away so that the file/buffer appears to be a single contiguous buffer.
For those of you who wish to understand the difficulty in just simply providing a high performance dynamic buffer you only need to view the code below, and also do some research on memory heap algorithms and how the memory heap works for programs.
Most languages will provide a fixed size buffer for performance reasons, and then provide another version that is dynamic in size. Some language systems opt for a third-party system where they keep the core functionality minimal (core distribution) and encourage developers to create libraries to solve additional or higher level problems. This is why you may question why a language does not provide some functionality. This small core functionality allows costs to be reduced in maintaining and enhancing the language, however you end up having to write your own implementations or depending on a third-party.
var Buffer_A1 = function (chunk_size) {
this.buffer_list = [];
this.total_size = 0;
this.cur_size = 0;
this.cur_buffer = [];
this.chunk_size = chunk_size || 4096;
this.buffer_list.push(new Buffer(this.chunk_size));
};
Buffer_A1.prototype.writeByteArrayLimited = function (data, offset, length) {
var can_write = length > (this.chunk_size - this.cur_size) ? (this.chunk_size - this.cur_size) : length;
var lastbuf = this.buffer_list.length - 1;
for (var x = 0; x < can_write; ++x) {
this.buffer_list[lastbuf][this.cur_size + x] = data[x + offset];
}
this.cur_size += can_write;
this.total_size += can_write;
if (this.cur_size == this.chunk_size) {
this.buffer_list.push(new Buffer(this.chunk_size));
this.cur_size = 0;
}
return can_write;
};
/*
The `data` parameter can be anything that is array like. It just must
support indexing and a length and produce an acceptable value to be
used with Buffer.
*/
Buffer_A1.prototype.writeByteArray = function (data, offset, length) {
offset = offset == undefined ? 0 : offset;
length = length == undefined ? data.length : length;
var rem = length;
while (rem > 0) {
rem -= this.writeByteArrayLimited(data, length - rem, rem);
}
};
Buffer_A1.prototype.readByteArray = function (data, offset, length) {
/*
If you really wanted to implement some read functionality
then you would have to deal with unaligned reads which could
span two buffers.
*/
};
Buffer_A1.prototype.getSingleBuffer = function () {
var obuf = new Buffer(this.total_size);
var cur_off = 0;
var x;
for (x = 0; x < this.buffer_list.length - 1; ++x) {
this.buffer_list[x].copy(obuf, cur_off);
cur_off += this.buffer_list[x].length;
}
this.buffer_list[x].copy(obuf, cur_off, 0, this.cur_size);
return obuf;
};

insert byte to specific place.
insertToArray(arr,index,item) {
return Buffer.concat([arr.slice(0,index),Buffer.from(item,"utf-8"),arr.slice(index)]);
}

Related

How to improve the memory usage in nodejs code?

I tried one code at hackerearth : https://www.hackerearth.com/practice/data-structures/stacks/basics-of-stacks/practice-problems/algorithm/fight-for-laddus/description/
The speed seems fine however the memory usage exceed the 256mb limit by nearly 2.8 times.
In java and python the memory is 5 times less however the time is nearly twice.
What factor can be used to optimise the memory usage in nodejs code implementation?
Here is nodejs implementation:
// Sample code to perform I/O:
process.stdin.resume();
process.stdin.setEncoding("utf-8");
var stdin_input = "";
process.stdin.on("data", function (input) {
stdin_input += input; // Reading input from STDIN
});
process.stdin.on("end", function () {
main(stdin_input);
});
function main(input) {
let arr = input.split("\n");
let testCases = parseInt(arr[0], 10);
arr.splice(0,1);
finalStr = "";
while(testCases > 0){
let inputArray = (arr[arr.length - testCases*2 + 1]).split(" ");
let inputArrayLength = inputArray.length;
testCases = testCases - 1;
frequencyObject = { };
for(let i = 0; i < inputArrayLength; ++i) {
if(!frequencyObject[inputArray[i]])
{
frequencyObject[inputArray[i]] = 0;
}
++frequencyObject[inputArray[i]];
}
let finalArray = [];
finalArray[inputArrayLength-1] = -1;
let stack = [];
stack.push(inputArrayLength-1);
for(let i = inputArrayLength-2; i>=0; i--)
{
let stackLength = stack.length;
while(stackLength > 0 && frequencyObject[inputArray[stack[stackLength-1]]] <= frequencyObject[inputArray[i]])
{
stack.pop();
stackLength--;
}
if (stackLength > 0) {
finalArray[i] = inputArray[stack[stackLength-1]];
} else {
finalArray[i] = -1;
}
stack.push(i);
}
console.log(finalArray.join(" ") + "\n")
}
}

What factor can be used to optimize the memory usage in nodejs code implementation?
Here are some things to consider:
Don't buffer any more input data than you need to before you process it or output it.
Try to avoid making copies of data. Use the data in place if possible. Remember that all string operations create a new string that is likely a copy of the original data. And, many array operations like .map(), .filter(), etc... create new copies of the original array.
Keep in mind that garbage collection is delayed and is typically done during idle time. So, for example, modifying strings in a loop may create a lot of temporary objects that all must exist at once, even though most or all of them will be garbage collected when the loop is done. This creates a poor peak memory usage.
Buffering
The first thing I notice is that you read the entire input file into memory before you process any of it. Right away for large input files, you're going to use a lot of memory. Instead, what you want to do is read enough of a chunk to get the next testCase and then process it.
FYI, this incremental reading/processing will make the code significantly more complicated to write (I've written an implementation myself) because you have to handle partially read lines, but it will hold down memory use a bunch and that's what you asked for.
Copies of Data
After reading the entire input file into memory, you immediately make a copy of it all with this:
let arr = input.split("\n");
So, now you've more than doubled the amount of memory the input data is taking up. Instead of just one string for all the input, you now still have all of that in memory, but you've now broken it up into hundreds of other strings (each with a little overhead of its own for a new string and of course a copy of each line).
Modifying Strings in a Loop
When you're creating your final result which you call finalStr, you're doing this over and over again:
finalStr = finalStr + finalArray.join(" ") + "\n"
This is going to create tons and tons of incremental strings that will likely end up all in memory at once because garbage collection probably won't run until the loop is over. As an example, if you had 100 lines of output that were each 100 characters long so the total output (not count line termination characters) was 100 x 100 = 10,000 characters, then constructing this in a loop like you are would create temporary strings of 100, 200, 300, 400, ... 10,000 which would consume 5000 (avg length) * 100 (number of temporary strings) = 500,000 characters. That's 50x the total output size consumed in temporary string objects.
So, not only does this create tons of incremental strings each one larger than the previous one (since you're adding onto it), it also creates your entire output in memory before writing any of it out to stdout.
Instead, you can incremental output each line to stdout as you construct each line. This will put the worst case memory usage at probably about 2x the output size whereas you're at 50x or worse.

Remove NodeJs Stream padding

I'm writing an application where I need to strip the first X and last Y bytes from a stream. So what I need is basically a function I can pass to pipe that takes X and Y as parameters and removes the desired number of bytes from the stream as it comes through. My simplified setup is like this:
const rs = fs.createReadStream('some_file')
const ws = fs.createWriteStream('some_other_file')
rs.pipe(streamPadding(128, 512)).pipe(ws)
After that, some_other_fileshould contain all the contents of some_fileminus the first 128 Bytes and the last 512 bytes. I've read up on streams, but couldn't figure out how to properly do this, so that it also handles errors during the transfer and does backpressure correctly.
As far as I know, I'd need a duplex stream, that, whenever I read from it, reads from its input stream, keeps track of where in the stream we are and skips the first 128 bytes before emitting data. Some tips on how to implement that would be very helpful.
The second part seems more difficult, if not impossible to do, because how would I know whether I already reached the last 512 bytes or not, before the input stream actually closed. I suspect that might not be possible, but I'm sure there must be a way to solve this problem, so if you have any advice on that, I'd be very thankful!

You can create a new Transform Stream which does what you wish. As for losing the last x bytes, you can always keep the last x bytes buffered and just ignore them when the stream ends.
Something like this (assuming you're working with buffers).
const {Transform} = require('stream');
const ignoreFirst = 128,
ignoreLast = 512;
let lastBuff,
cnt = 0;
const MyTrimmer = new Transform({
transform(chunk,encoding,callback) {
let len = Buffer.byteLength(chunk);
// If we haven't ignored the first bit yet, make sure we do
if(cnt <= ignoreFirst) {
let diff = ignoreFirst - cnt;
// If we have more than we want to ignore, adjust pointer
if(len > diff)
chunk = chunk.slice(diff,len);
// Otherwise unset chunk for later
else
chunk = undefined;
}
// Keep track of how many bytes we've seen
cnt += len;
// If we have nothing to push after trimming, just get out
if(!chunk)
return callback();
// If we already have a saved buff, concat it with the chunk
if(lastBuff)
chunk = Buffer.concat([lastBuff,chunk]);
// Get the new chunk length
len = Buffer.byteLength(chunk);
// If the length is less than what we ignore at the end, save it and get out
if(len < ignoreLast) {
lastBuff = chunk;
return callback();
}
// Otherwise save the piece we might want to ignore and push the rest through
lastBuff = chunk.slice(len-ignoreLast,len);
this.push(chunk.slice(0,len-ignoreLast));
callback();
}
});
Then you add that your pipeline, assuming you're reading from a file and writing to a file:
const rs = fs.createReadStream('some_file')
const ws = fs.createWriteStream('some_other_file')
myTrimmer.pipe(ws);
rs.pipe(myTrimmer);

SPI linux driver

I am trying to learn how to write a basic SPI driver and below is the probe function that I wrote.
What I am trying to do here is setup the spi device for fram(datasheet) and use the spi_sync_transfer()api description to get the manufacturer's id from the chip.
When I execute this code, I can see the data on the SPI bus using logic analyzer but I am unable to read it using the rx buffer. Am I missing something here? Could someone please help me with this?
static int fram_probe(struct spi_device *spi)
{
int err;
unsigned char ch16[] = {0x9F,0x00,0x00,0x00};// 0x9F => 10011111
unsigned char rx16[] = {0x00,0x00,0x00,0x00};
printk("[FRAM DRIVER] fram_probe called \n");
spi->max_speed_hz = 1000000;
spi->bits_per_word = 8;
spi->mode = (3);
err = spi_setup(spi);
if (err < 0) {
printk("[FRAM DRIVER::fram_probe spi_setup failed!\n");
return err;
}
printk("[FRAM DRIVER] spi_setup ok, cs: %d\n", spi->chip_select);
spi_element[0].tx_buf = ch16;
spi_element[1].rx_buf = rx16;
err = spi_sync_transfer(spi, spi_element, ARRAY_SIZE(spi_element)/2);
printk("rx16=%x %x %x %x\n",rx16[0],rx16[1],rx16[2],rx16[3]);
if (err < 0) {
printk("[FRAM DRIVER]::fram_probe spi_sync_transfer failed!\n");
return err;
}
return 0;
}

spi_element is not declared in this example. You should show that and also how all elements of that are array are filled. But just from the code that's there I see a couple mistakes.
You need to set the len parameter of spi_transfer. You've assigned the TX or RX buffer to ch16 or rx16 but not set the length of the buffer in either case.
You should zero out all the fields not used in the spi_transfer.
If you set the length to four, you would not be sending the proper command according to the datasheet. RDID expects a one byte command after which will follow four bytes of output data. You are writing a four byte command in your first transfer and then reading four bytes of data. The tx_buf in the first transfer should just be one byte.
And finally the number of transfers specified as the last argument to spi_sync_transfer() is incorrect. It should be 2 in this case because you have defined two, spi_element[0] and spi_element[1]. You could use ARRAY_SIZE() if spi_element was declared for the purpose of this message and you want to sent all transfers in the array.
Consider this as a way to better fill in the spi_transfers. It will take care of zeroing out fields that are not used, defines the transfers in a easy to see way, and changing the buffer sizes or the number of transfers is automatically accounted for in remaining code.
const char ch16[] = { 0x8f };
char rx16[4];
struct spi_transfer rdid[] = {
{ .tx_buf = ch16, .len = sizeof(ch16) },
{ .rx_buf = rx16, .len = sizeof(rx16) },
};
spi_transfer(spi, rdid, ARRAY_SIZE(rdid));
Since you have a scope, be sure to check that this operation happens under a single chip select pulse. I have found more than one Linux SPI driver to have a bug that pulses chip select when it should not. In some cases switching from TX to RX (like done above) will trigger a CS pulse. In other cases a CS pulse is generated for every word (8 bits here) of data.
Another thing you should change is use dev_info(&spi->dev, "device version %d", id)' and also dev_err() to print messages. This inserts the device name in a standard way instead of your hard-coded non-standard and inconsistent "[FRAME DRIVER]::" text. And sets the level of the message as appropriate.
Also, consider supporting device tree in your driver to read device properties. Then you can do things like change the SPI bus frequency for this device without rebuilding the kernel driver.

Binary file I/O

How to read and write to binary files in D language? In C would be:
FILE *fp = fopen("/home/peu/Desktop/bla.bin", "wb");
char x[4] = "RIFF";
fwrite(x, sizeof(char), 4, fp);
I found rawWrite at D docs, but I don't know the usage, nor if does what I think. fread is from C:
T[] rawRead(T)(T[] buffer);
If the file is not opened, throws an exception. Otherwise, calls fread for the file handle and throws on error.
rawRead always read in binary mode on Windows.

rawRead and rawWrite should behave exactly like fread, fwrite, only they are templates to take care of argument sizes and lengths.
e.g.
auto stream = File("filename","r+");
auto outstring = "abcd";
stream.rawWrite(outstring);
stream.rewind();
auto inbytes = new char[4];
stream.rawRead(inbytes);
assert(inbytes[3] == outstring[3]);
rawRead is implemented in terms of fread as
T[] rawRead(T)(T[] buffer)
{
enforce(buffer.length, "rawRead must take a non-empty buffer");
immutable result =
.fread(buffer.ptr, T.sizeof, buffer.length, p.handle);
errnoEnforce(!error);
return result ? buffer[0 .. result] : null;
}

If you just want to read in a big buffer of values (say, ints), you can simply do:
int[] ints = cast(int[]) std.file.read("ints.bin", numInts * int.sizeof);
and
std.file.write("ints.bin", ints);
Of course, if you have more structured data then Scott Wales' answer is more appropriate.

Looking for a lock-free RT-safe single-reader single-writer structure

I'm looking for a lock-free design conforming to these requisites:
a single writer writes into a structure and a single reader reads from this structure (this structure exists already and is safe for simultaneous read/write)
but at some time, the structure needs to be changed by the writer, which then initialises, switches and writes into a new structure (of the same type but with new content)
and at the next time the reader reads, it switches to this new structure (if the writer multiply switches to a new lock-free structure, the reader discards these structures, ignoring their data).
The structures must be reused, i.e. no heap memory allocation/free is allowed during write/read/switch operation, for RT purposes.
I have currently implemented a ringbuffer containing multiple instances of these structures; but this implementation suffers from the fact that when the writer has used all the structures present in the ringbuffer, there is no more place to change from structure... But the rest of the ringbuffer contains some data which don't have to be read by the reader but can't be re-used by the writer. As a consequence, the ringbuffer does not fit this purpose.
Any idea (name or pseudo-implementation) of a lock-free design? Thanks for having considered this problem.

Here's one. The keys are that there are three buffers and the reader reserves the buffer it is reading from. The writer writes to one of the other two buffers. The risk of collision is minimal. Plus, this expands. Just make your member arrays one element longer than the number of readers plus the number of writers.
class RingBuffer
{
RingBuffer():lastFullWrite(0)
{
//Initialize the elements of dataBeingRead to false
for(unsigned int i=0; i<DATA_COUNT; i++)
{
dataBeingRead[i] = false;
}
}
Data read()
{
// You may want to check to make sure write has been called once here
// to prevent read from grabbing junk data. Else, initialize the elements
// of dataArray to something valid.
unsigned int indexToRead = lastFullWriteIndex;
Data dataCopy;
dataBeingRead[indexToRead] = true;
dataCopy = dataArray[indexToRead];
dataBeingRead[indexToRead] = false;
return dataCopy;
}
void write( const Data& dataArg )
{
unsigned int writeIndex(0);
//Search for an unused piece of data.
// It's O(n), but plenty fast enough for small arrays.
while( true == dataBeingRead[writeIndex] && writeIndex < DATA_COUNT )
{
writeIndex++;
}
dataArray[writeIndex] = dataArg;
lastFullWrite = &dataArray[writeIndex];
}
private:
static const unsigned int DATA_COUNT;
unsigned int lastFullWrite;
Data dataArray[DATA_COUNT];
bool dataBeingRead[DATA_COUNT];
};
Note: The way it's written here, there are two copies to read your data. If you pass your data out of the read function through a reference argument, you can cut that down to one copy.

You're on the right track.
Lock free communication of fixed messages between threads/processes/processors
fixed size ring buffers can be used in lock free communications between threads, processes or processors if there is one producer and one consumer. Some checks to perform:
head variable is written only by producer (as an atomic action after writing)
tail variable is written only by consumer (as an atomic action after reading)
Pitfall: introduction of a size variable or buffer full/empty flag; these are typically written by both producer and consumer and hence will give you an issue.
I generally use ring buffers for this purpoee. Most important lesson I've learned is that a ring buffer of can never contain more than elements. This way a head and tail variable are written by producer respectively consumer.
Extension for large/variable size blocks
To use buffers in a real time environment, you can either use memory pools (often available in optimized form in real time operating systems) or decouple allocation from usage. The latter fits to the question, I believe.
If you need to exchange large blocks, I suggest to use a pool with buffer blocks and communicate pointers to buffers using a queue. So use a 3rd queue with buffer pointers. This way the allocates can be done in application (background) and you real time portion has access to a variable amount of memory.
Application
while (blockQueue.full != true)
{
buf = allocate block of memory from heap or buffer pool
msg = { .... , buf };
blockQueue.Put(msg)
}
Producer:
pBuf = blockQueue.Get()
pQueue.Put()
Consumer
if (pQueue.Empty == false)
{
msg=pQueue.Get()
// use info in msg, with buf pointer
// optionally indicate that buf is no longer used
}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string