Buffers filled with unicode zeroes - node.js

I'm trying to synchronously read parameters from the console in Node, and I managed to come up with the following:
const fs = require('fs');

var load = function () {
  const BUFFER_LENGTH = 1024;
  const stdin = fs.openSync('/dev/stdin', 'rs');
  const buffer = Buffer.alloc(BUFFER_LENGTH);
  console.log('Provide parameter: ');
  fs.readSync(stdin, buffer, 0, BUFFER_LENGTH);
  fs.closeSync(stdin);
  return buffer.toString().replace(/\n*/, '');
};
It works, but here's a strange thing:
var loadedValue = load();
console.log(loadedValue); // displays "a", if I typed "a", so the result is correct
console.log({loadedValue}); // displays { loadedValue: 'a\n\u0000\u0000...' }
When I wrap the value in an object, the remaining bytes of the buffer show up in the string. Why is that? How can I get rid of them? Running a regexp over the string before putting it into an object doesn't help.

Buffer.alloc(BUFFER_LENGTH) creates a buffer of a particular length (1024 in your case) and fills it with NULL characters (as documented here).
Next, you read some (say 2) bytes from stdin into that buffer, which replaces the first two of those NULLs with the characters read from stdin. The rest of the buffer still consists of NULLs.
If you don't truncate the buffer to the number of bytes read, your function returns a 1024-character string that is mostly NULLs. Since those aren't printable, they don't show up in the first console.log(), but they're still there.
So after reading from stdin, you should truncate the buffer to the number of bytes actually read (readSync returns that count):
const bytesRead = fs.readSync(stdin, buffer, 0, BUFFER_LENGTH);
const data = buffer.slice(0, bytesRead);
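Putting it together, a corrected load might look like this (a sketch; it keeps the question's POSIX-only /dev/stdin approach and additionally strips the trailing newline):
const fs = require('fs');

function load() {
  const BUFFER_LENGTH = 1024;
  const stdin = fs.openSync('/dev/stdin', 'rs');
  const buffer = Buffer.alloc(BUFFER_LENGTH);
  console.log('Provide parameter: ');
  const bytesRead = fs.readSync(stdin, buffer, 0, BUFFER_LENGTH);
  fs.closeSync(stdin);
  // Only decode the bytes that were actually read, then drop trailing newlines.
  return buffer.toString('utf8', 0, bytesRead).replace(/\n+$/, '');
}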

Related

Remove NodeJs Stream padding

I'm writing an application where I need to strip the first X and last Y bytes from a stream. So what I need is basically a function I can pass to pipe that takes X and Y as parameters and removes the desired number of bytes from the stream as it comes through. My simplified setup is like this:
const rs = fs.createReadStream('some_file')
const ws = fs.createWriteStream('some_other_file')
rs.pipe(streamPadding(128, 512)).pipe(ws)
After that, some_other_file should contain all the contents of some_file minus the first 128 bytes and the last 512 bytes. I've read up on streams, but couldn't figure out how to do this properly, so that it also handles errors during the transfer and applies backpressure correctly.
As far as I know, I'd need a duplex stream that, whenever I read from it, reads from its input stream, keeps track of where in the stream we are, and skips the first 128 bytes before emitting data. Some tips on how to implement that would be very helpful.
The second part seems more difficult, if not impossible: how would I know whether I've already reached the last 512 bytes before the input stream actually closes? I suspect that might not be possible, but I'm sure there must be a way to solve this problem, so if you have any advice on that, I'd be very thankful!
You can create a new Transform stream which does what you wish. As for dropping the last y bytes, you can always keep the last y bytes buffered and just ignore them when the stream ends.
Something like this (assuming you're working with buffers).
const {Transform} = require('stream');

const ignoreFirst = 128,
      ignoreLast = 512;
let lastBuff,
    cnt = 0;

const myTrimmer = new Transform({
  transform(chunk, encoding, callback) {
    let len = Buffer.byteLength(chunk);
    // If we haven't ignored the first bit yet, make sure we do
    if (cnt <= ignoreFirst) {
      let diff = ignoreFirst - cnt;
      // If we have more than we want to ignore, adjust pointer
      if (len > diff)
        chunk = chunk.slice(diff, len);
      // Otherwise unset chunk for later
      else
        chunk = undefined;
    }
    // Keep track of how many bytes we've seen
    cnt += len;
    // If we have nothing to push after trimming, just get out
    if (!chunk)
      return callback();
    // If we already have a saved buff, concat it with the chunk
    if (lastBuff)
      chunk = Buffer.concat([lastBuff, chunk]);
    // Get the new chunk length
    len = Buffer.byteLength(chunk);
    // If the length is less than what we ignore at the end, save it and get out
    if (len < ignoreLast) {
      lastBuff = chunk;
      return callback();
    }
    // Otherwise save the piece we might want to ignore and push the rest through
    lastBuff = chunk.slice(len - ignoreLast, len);
    this.push(chunk.slice(0, len - ignoreLast));
    callback();
  }
});
Then you add that to your pipeline, assuming you're reading from a file and writing to a file:
const rs = fs.createReadStream('some_file');
const ws = fs.createWriteStream('some_other_file');
rs.pipe(myTrimmer).pipe(ws);
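Note that pipe() on its own doesn't forward errors down the chain. If you're on Node 10 or later, a sketch like the following, using stream.pipeline, gives you error propagation and cleanup for free (backpressure is already handled by the Transform machinery):
const { pipeline } = require('stream');

pipeline(rs, myTrimmer, ws, (err) => {
  if (err) console.error('Transfer failed:', err);
  else console.log('Transfer complete');
});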

Can anyone explain the arguments of these functions?

I am working with Node.js and trying to write and read binary files.
I am having a headache with the Node.js documentation, which does not provide much explanation.
In particular, I want to understand
fs.writeSync(fd, buffer, offset, length, position)
I know 'fd' and 'buffer', but I'm confused by 'offset' and 'position'.
fs.readSync(fd, buffer, offset, length, position)
I guess this one works the same way.
Can anyone explain this to me?
Thanks
offset is the position in the buffer at which to start reading the data to be written (therefore, offset + length must be less than or equal to the buffer's size).
position is the position in the file at which to start writing the output.
The following toy example shows how it works:
const fs = require('fs')
var fd = fs.openSync("test.txt", "w")
var buf = Buffer.alloc(5, 'abcde')
fs.writeSync(fd, buf, 0, buf.length, 0)
// buffer's elements [0-4] are written to the file at position 0
// test.txt holds 'abcde'
buf = Buffer.alloc(5, 'fghij')
fs.writeSync(fd, buf, 2, buf.length - 2, 2)
// buffer's elements [2-4] are written to the file at position 2
// test.txt holds 'abhij'
fs.closeSync(fd)
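For completeness, fs.readSync mirrors this: offset is where the data lands in the destination buffer, and position is where in the file reading starts. A small sketch, assuming the test.txt produced above:
const rfd = fs.openSync("test.txt", "r")
const out = Buffer.alloc(5)
// Read 3 bytes from file position 2 into the buffer, starting at buffer offset 0
const n = fs.readSync(rfd, out, 0, 3, 2)
console.log(out.toString('utf8', 0, n)) // 'hij'
fs.closeSync(rfd)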

Node.js buffer string serialization

I want to serialize a buffer to a string without any overhead (one character per byte) and be able to deserialize it into a buffer again.
var b = new Buffer(4);
var s = b.toString();
var b2 = new Buffer(s);
This produces the same results only for byte values below 128. I want to use the whole 0-255 range.
I know I can write it in a loop with String.fromCharCode() when serializing and String.charCodeAt() when deserializing, but I'm looking for a native implementation if there is one.
You can use the 'latin1' encoding, but you should generally try to avoid it because converting a Buffer to a binary string has some extra computational overhead.
Example:
var b = Buffer.alloc(4);
var s = b.toString('latin1');
var b2 = Buffer.from(s, 'latin1');
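A quick sanity check that the round trip is lossless across the whole 0-255 range:
var b = Buffer.from([0x00, 0x7f, 0x80, 0xff]);
var s = b.toString('latin1');        // 4 characters, one per byte
var b2 = Buffer.from(s, 'latin1');
console.log(b.equals(b2));           // true
console.log(s.length === b.length);  // true: no overhead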

How to append binary data to a buffer in node.js

I have a buffer with some binary data:
var b = new Buffer ([0x00, 0x01, 0x02]);
and I want to append 0x03.
How can I append more binary data? I've searched the documentation, but the data to append must be a string; if not, an error occurs (TypeError: Argument must be a string):
var b = new Buffer (256);
b.write ("hola");
console.log (b.toString ("utf8", 0, 4)); //hola
b.write (", adios", 4);
console.log (b.toString ("utf8", 0, 11)); //hola, adios
Then, the only solution I can see here is to create a new buffer for every piece of appended binary data and copy it to the main buffer at the correct offset:
var b = new Buffer (4); //4 for having a nice printed buffer, but the size will be 16KB
new Buffer ([0x00, 0x01, 0x02]).copy (b);
console.log (b); //<Buffer 00 01 02 00>
new Buffer ([0x03]).copy (b, 3);
console.log (b); //<Buffer 00 01 02 03>
But this seems a bit inefficient because I have to instantiate a new buffer for every append.
Do you know a better way for appending binary data?
EDIT
I've written a BufferedWriter that writes bytes to a file using internal buffers. Same as BufferedReader but for writing.
A quick example:
//The BufferedWriter truncates the file because append == false
new BufferedWriter ("file")
  .on ("error", function (error){
    console.log (error);
  })
  //From the beginning of the file:
  .write ([0x00, 0x01, 0x02], 0, 3) //Writes 0x00, 0x01, 0x02
  .write (new Buffer ([0x03, 0x04]), 1, 1) //Writes 0x04
  .write (0x05) //Writes 0x05
  .close (); //Closes the writer. A flush is implicitly done.

//The BufferedWriter appends content to the end of the file because append == true
new BufferedWriter ("file", true)
  .on ("error", function (error){
    console.log (error);
  })
  //From the end of the file:
  .write (0xFF) //Writes 0xFF
  .close (); //Closes the writer. A flush is implicitly done.

//The file contains: 0x00, 0x01, 0x02, 0x04, 0x05, 0xFF
LAST UPDATE
Use concat.
Updated Answer for Node.js ~>0.8
Node is able to concatenate buffers on its own now.
var newBuffer = Buffer.concat([buffer1, buffer2]);
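Applied to the original question, appending the byte 0x03 becomes a one-liner (keep in mind concat still allocates a fresh Buffer under the hood):
var b = Buffer.from([0x00, 0x01, 0x02]);
b = Buffer.concat([b, Buffer.from([0x03])]);
console.log(b); // <Buffer 00 01 02 03>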
Old Answer for Node.js ~0.6
I use a module to add a .concat function, among others:
https://github.com/coolaj86/node-bufferjs
I know it isn't a "pure" solution, but it works very well for my purposes.
Buffers are always of fixed size; there is no built-in way to resize them dynamically, so your approach of copying into a larger Buffer is the only way.
However, to be more efficient, you could make the Buffer larger than the original contents, so it contains some "free" space where you can add data without reallocating. That way you don't need to create a new Buffer and copy the contents on each append operation.
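To illustrate that strategy, here is a minimal sketch (the GrowableBuffer name and the capacity-doubling policy are choices of this example, not a standard API): it over-allocates, tracks the used length separately, and only reallocates when an append would overflow.
// Over-allocating buffer: appends are cheap until capacity runs out.
function GrowableBuffer(initialCapacity) {
  this.buf = Buffer.alloc(initialCapacity || 64);
  this.length = 0; // bytes actually used
}
GrowableBuffer.prototype.append = function (bytes) {
  const src = Buffer.from(bytes);
  if (this.length + src.length > this.buf.length) {
    // Double the capacity until the new data fits, then copy once.
    let cap = this.buf.length * 2;
    while (cap < this.length + src.length) cap *= 2;
    const bigger = Buffer.alloc(cap);
    this.buf.copy(bigger, 0, 0, this.length);
    this.buf = bigger;
  }
  src.copy(this.buf, this.length);
  this.length += src.length;
};
GrowableBuffer.prototype.contents = function () {
  return this.buf.slice(0, this.length);
};

// Usage:
const gb = new GrowableBuffer(4);
gb.append([0x00, 0x01, 0x02]);
gb.append([0x03]);
console.log(gb.contents()); // <Buffer 00 01 02 03>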
This is to help anyone who comes here looking for a pure approach. I would recommend understanding this problem, because it can come up in lots of different places, not just with a JS Buffer object. By understanding why the problem exists and how to solve it, you will improve your ability to solve other problems in the future, since this one is so fundamental.
For those of us who have to deal with these problems in other languages, it is quite natural to devise a solution, but there are people who may not realize how to abstract away the complexities and implement a generally efficient dynamic buffer. The code below may have potential to be optimized further.
I have left the read method unimplemented to keep the example small in size.
The realloc function in C (or any language dealing with intrinsic allocations) does not guarantee that an allocation can be expanded in place without moving the existing data, although sometimes that is possible. Therefore, most applications that need to store an unknown amount of data use a method like the one below rather than constantly reallocating, unless reallocation is very infrequent. This is essentially how most file systems handle writing data to a file: the file system simply allocates another node and keeps all the nodes linked together, and when you read from it the complexity is abstracted away, so the file/buffer appears to be a single contiguous whole.
If you wish to understand the difficulty in providing a high-performance dynamic buffer, you only need to view the code below and do some research on memory heap algorithms and how a program's memory heap works.
Most languages provide a fixed-size buffer for performance reasons and then another version that is dynamic in size. Some language systems opt for a third-party approach, where they keep the core functionality minimal (core distribution) and encourage developers to create libraries to solve additional or higher-level problems. This is why you may question why a language does not provide some functionality: the small core reduces the cost of maintaining and enhancing the language, but you end up having to write your own implementations or depend on a third party.
var Buffer_A1 = function (chunk_size) {
  this.buffer_list = [];
  this.total_size = 0;
  this.cur_size = 0;
  this.chunk_size = chunk_size || 4096;
  this.buffer_list.push(new Buffer(this.chunk_size));
};

// Writes at most what fits into the current chunk; returns the number
// of bytes actually written.
Buffer_A1.prototype.writeByteArrayLimited = function (data, offset, length) {
  var can_write = length > (this.chunk_size - this.cur_size) ? (this.chunk_size - this.cur_size) : length;
  var lastbuf = this.buffer_list.length - 1;
  for (var x = 0; x < can_write; ++x) {
    this.buffer_list[lastbuf][this.cur_size + x] = data[x + offset];
  }
  this.cur_size += can_write;
  this.total_size += can_write;
  if (this.cur_size == this.chunk_size) {
    this.buffer_list.push(new Buffer(this.chunk_size));
    this.cur_size = 0;
  }
  return can_write;
};

/*
  The `data` parameter can be anything that is array-like. It just must
  support indexing and a length, and produce an acceptable value to be
  used with Buffer.
*/
Buffer_A1.prototype.writeByteArray = function (data, offset, length) {
  offset = offset == undefined ? 0 : offset;
  length = length == undefined ? data.length : length;
  var rem = length;
  while (rem > 0) {
    rem -= this.writeByteArrayLimited(data, offset + (length - rem), rem);
  }
};

Buffer_A1.prototype.readByteArray = function (data, offset, length) {
  /*
    If you really wanted to implement some read functionality
    then you would have to deal with unaligned reads, which could
    span two buffers.
  */
};

// Concatenates all chunks into one contiguous Buffer of total_size bytes.
Buffer_A1.prototype.getSingleBuffer = function () {
  var obuf = new Buffer(this.total_size);
  var cur_off = 0;
  var x;
  for (x = 0; x < this.buffer_list.length - 1; ++x) {
    this.buffer_list[x].copy(obuf, cur_off);
    cur_off += this.buffer_list[x].length;
  }
  this.buffer_list[x].copy(obuf, cur_off, 0, this.cur_size);
  return obuf;
};
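A hypothetical usage of the chunked buffer above, with a deliberately tiny chunk size to force several internal buffers:
var dyn = new Buffer_A1(2);          // tiny chunk size for demonstration
dyn.writeByteArray([0x00, 0x01, 0x02]);
console.log(dyn.getSingleBuffer());  // <Buffer 00 01 02>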
Insert a byte at a specific place:
function insertToArray(arr, index, item) {
  return Buffer.concat([arr.slice(0, index), Buffer.from(item, "utf-8"), arr.slice(index)]);
}
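For example, inserting the UTF-8 byte of 'X' at index 1:
var buf = Buffer.from([0x00, 0x01, 0x02]);
console.log(insertToArray(buf, 1, 'X')); // <Buffer 00 58 01 02>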

Making a WCHAR null terminated

I've got this
WCHAR fileName[1];
as the return value from a function (it's a Windows system function, so I am not able to change the return type). I need fileName to be null-terminated, so I am trying to append '\0' to it, but nothing seems to work.
Once I get a null terminated WCHAR I will need to pass it to another sys 32 function so I need it to stay as WCHAR.
Could anyone give me any suggestion please?
================================================
Thanks a lot for all your help. It looks like my problem involves more than a missing null terminator.
//This works:
WCHAR szPath1[50] = L"\\Invalid2.txt.txt";
dwResult = FbwfCommitFile(szDrive, szPath1); //Successful
//This does not:
std::wstring l_fn(L"\\");
//Because Cache_detail->fileName is \Invalid2.txt.txt and I need two backslashes
l_fn.append(Cache_detail->fileName);
l_fn += L""; //To ensure null terminated
fprintf(output, "l_fn.c_str: %ls\n", l_fn.c_str()); //Prints "\\Invalid2.txt.txt"
iCommitErr = FbwfCommitFile(L"C:", (WCHAR*)l_fn.c_str()); //Unsuccessful
//Then when I do a comparison on these two they are unequal.
int iCompareResult = l_fn.compare(szPath1); // returns -1
So I need to figure out how these two ended up to be different.
Thanks a lot!
Since you mentioned FbwfFindFirst/FbwfFindNext in a comment, you're talking about the file name returned in FbwfCacheDetail. The fileNameLength field gives you the length of fileName in bytes, so its length in WCHARs is fileNameLength/sizeof(WCHAR). The simple answer is that you can set
fileName[fileNameLength/sizeof(WCHAR)] = L'\0';
Now, this is important: you need to make sure the buffer you pass as the cacheDetail parameter to FbwfFindFirst/FbwfFindNext is sizeof(WCHAR) bytes larger than you need, otherwise the snippet above may write outside the bounds of your array. So for the size parameter of FbwfFindFirst/FbwfFindNext, pass in the buffer size minus sizeof(WCHAR).
For example this:
// *** Caution: This example has no error checking, nor has it been compiled ***
ULONG error;
ULONG size;
FbwfCacheDetail *cacheDetail;

// Make an initial call to find how big of a buffer we need
size = 0;
error = FbwfFindFirst(volume, NULL, &size);
if (error == ERROR_MORE_DATA) {
    // Allocate more than we need
    cacheDetail = (FbwfCacheDetail*)malloc(size + sizeof(WCHAR));
    // Don't tell this call about the bytes we allocated for the null
    error = FbwfFindFirst(volume, cacheDetail, &size);
    cacheDetail->fileName[cacheDetail->fileNameLength/sizeof(WCHAR)] = L'\0';
    // ... Use fileName as a null terminated string ...
    // Have to free what we allocate
    free(cacheDetail);
}
Of course you'll have to change a good bit of this to fit in with your code (plus you'll have to call FbwfFindNext as well).
If you are interested in why the FbwfCacheDetail struct ends with a WCHAR[1] field, see this blog post. It's a pretty common pattern in the Windows API.
Use L'\0', not '\0'.
As each WCHAR is 16 bits in size, you should perhaps append \0\0 (two zero bytes) to it, but I'm not sure if this works. By the way, WCHAR fileName[1]; creates an array of just one WCHAR; perhaps you want something like WCHAR fileName[1024]; instead.
WCHAR fileName[1]; is an array of 1 character, so if null terminated it will contain only the null terminator L'\0'.
Which API function are you calling?
Edited
The fileName member in FbwfCacheDetail is only 1 character, which is a common technique used when the length of the array is unknown and the member is the last member in a structure. As you have likely already noticed, if your allocated buffer is only sizeof(FbwfCacheDetail) long, then FbwfFindFirst returns ERROR_NOT_ENOUGH_MEMORY.
So if I understand correctly, what you want to do is output the non-null-terminated filename using fwprintf. This can be done as follows:
fprintf (outputfile, L"%.*ls", cacheDetail.fileNameLength, cacheDetail.fileName);
This will print only the first fileNameLength characters of fileName.
An alternative approach would be to append a null terminator to the end of fileName. First you'll need to ensure that the buffer is long enough, which can be done by subtracting sizeof(WCHAR) from the size argument you pass to FbwfFindFirst. So if you allocate a buffer of 1000 bytes, you'll pass 998 to FbwfFindFirst, reserving the last two bytes of the buffer for your own use. Then, to add the null terminator and output the file name, use
cacheDetail.fileName[cacheDetail.fileNameLength / sizeof (WCHAR)] = L'\0';
fwprintf (outputfile, L"%ls", cacheDetail.fileName);
