Node.js buffer string serialization

I want to serialize a buffer to a string without any overhead (one character per byte) and be able to deserialize it back into a buffer.
var b = new Buffer(4);
var s = b.toString();
var b2 = new Buffer(s);
This produces the same result only for byte values below 128. I want to use the whole range 0-255.
I know I can write it in a loop with String.fromCharCode() when serializing and String.charCodeAt() when deserializing, but I'm looking for a native implementation in some module, if there is one.

You can use the 'latin1' encoding, but you should generally try to avoid it because converting a Buffer to a binary string has some extra computational overhead.
Example:
var b = Buffer.alloc(4);
var s = b.toString('latin1');
var b2 = Buffer.from(s, 'latin1');
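For example, a quick round-trip check with byte values above 127 (the values here are just for illustration):
// Bytes >= 0x80 survive the latin1 round trip unchanged
var b = Buffer.from([0x00, 0x7f, 0x80, 0xff]);
var s = b.toString('latin1');
var b2 = Buffer.from(s, 'latin1');
console.log(b.equals(b2)); // true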

Related

Nim: work with read-only memory mapped files

I've only just started with Nim, so this may be a simple question. We need to do many lookups into data stored in a file. Some of these files are too large to load into memory, hence the mmapped approach. I'm able to mmap the file by means of memfiles and have either a pointer or a MemSlice at hand. The file and the memory region are read-only and hence have a fixed size. I was hoping to access the data as immutable fixed-size byte and char arrays without copying them, leveraging all the existing functionality available for seqs, arrays, strings etc. All the MemSlice / string methods copy the data, which is fair, but not what I want (and in my use case don't need).
I understand that array, string etc. types have a pointer to the data and a len field, but I couldn't find a way to create them from a pointer and a len. I assume it has something to do with ownership and refs to memory that may outlive my slice.
let mm = memfiles.open(...)
let myImmutableFixedSizeArr = ?? # cast[ptr array[fsize, char]](mm.mem) doesn't compile as fsize needs to be const. Neither could I find something like let x: [char] = array_from(mm.mem, fsize)
let myImmutableFixedSizeString = mm[20, 30].to_fixed_size_immutable_string # Create something that is string like so that I can use all the existing string methods.
UPDATE: I did find https://forum.nim-lang.org/t/4680#29226 which explains how to use OpenArray, but OpenArray is only allowed as a function argument, and - if I'm not mistaken - it doesn't behave like a normal array.
Thanks for your help
It is not possible to convert a raw char array in memory (ptr UncheckedArray[char]) to a string without copying, only to an openArray[char] (or cstring).
So it won't be possible to use procs that expect a string, only those that accept openArray[T] or openArray[char].
Happily, an openArray[T] behaves exactly like a seq[T] when sent to a proc.
({.experimental: "views".} does let you assign an openArray[T] to a local variable, but it's not anywhere near ready for production.)
You can use the memSlices iterator to loop over delimited chunks in a MemFile without copying:
import memfiles

template toOpenArray(ms: MemSlice, T: typedesc = byte): openArray[T] =
  ## template because openArray isn't a valid return type yet
  toOpenArray(cast[ptr UncheckedArray[T]](ms.data), 0, (ms.size div sizeof(T)) - 1)

func process(slice: openArray[char]) =
  ## your code here, but e.g.
  ## count number of A's
  var nA: int
  for ch in slice.items:
    if ch == 'A': inc nA
  debugEcho nA

let mm = memfiles.open("file.txt")
for slice in mm.memSlices:
  process slice.toOpenArray(char)
Or, to work with some char array represented in the middle of the file, you can use pointer arithmetic.
import memfiles

template extractImpl(typ, pntr, offset) =
  cast[typ](cast[ByteAddress](pntr) + offset)

template checkFileLen(memfile, len, offset) =
  if offset + len > memfile.size:
    raise newException(IndexDefect, "file too short")

func extract*(mm: MemFile, T: typedesc, offset: Natural): ptr T =
  checkFileLen(mm, sizeof(T), offset)
  result = extractImpl(ptr T, mm.mem, offset)

func extract*[U](mm: MemFile, T: typedesc[ptr U], offset: Natural): T =
  extractImpl(T, mm.mem, offset)

let mm = memfiles.open("file.txt")

# to extract a compile-time known length string:
let mystring_offset = 3
const mystring_len = 10
type MyStringT = array[mystring_len, char]
let myString: ptr MyStringT = mm.extract(MyStringT, mystring_offset)
process myString[]

# to extract a dynamic length string:
let size_offset = 14
let string_offset = 18
let sz: ptr int32 = mm.extract(int32, size_offset)
let str: ptr UncheckedArray[char] = mm.extract(ptr UncheckedArray[char], string_offset)
checkFileLen(mm, sz[], string_offset)
process str.toOpenArray(0, sz[] - 1)

Remove NodeJs Stream padding

I'm writing an application where I need to strip the first X and last Y bytes from a stream. So what I need is basically a function I can pass to pipe that takes X and Y as parameters and removes the desired number of bytes from the stream as it comes through. My simplified setup is like this:
const rs = fs.createReadStream('some_file')
const ws = fs.createWriteStream('some_other_file')
rs.pipe(streamPadding(128, 512)).pipe(ws)
After that, some_other_file should contain all the contents of some_file minus the first 128 bytes and the last 512 bytes. I've read up on streams, but couldn't figure out how to properly do this so that it also handles errors during the transfer and does backpressure correctly.
As far as I know, I'd need a duplex stream, that, whenever I read from it, reads from its input stream, keeps track of where in the stream we are and skips the first 128 bytes before emitting data. Some tips on how to implement that would be very helpful.
The second part seems more difficult, if not impossible, because how would I know whether I have already reached the last 512 bytes before the input stream actually closes? I suspect that might not be possible, but I'm sure there must be a way to solve this problem, so if you have any advice on that, I'd be very thankful!
You can create a new Transform Stream which does what you wish. As for losing the last x bytes, you can always keep the last x bytes buffered and just ignore them when the stream ends.
Something like this (assuming you're working with buffers):
const {Transform} = require('stream');

const ignoreFirst = 128,
      ignoreLast = 512;

let lastBuff,
    cnt = 0;

const myTrimmer = new Transform({
  transform(chunk, encoding, callback) {
    let len = Buffer.byteLength(chunk);
    // If we haven't ignored the first bit yet, make sure we do
    if (cnt <= ignoreFirst) {
      let diff = ignoreFirst - cnt;
      // If we have more than we want to ignore, adjust pointer
      if (len > diff)
        chunk = chunk.slice(diff, len);
      // Otherwise unset chunk for later
      else
        chunk = undefined;
    }
    // Keep track of how many bytes we've seen
    cnt += len;
    // If we have nothing to push after trimming, just get out
    if (!chunk)
      return callback();
    // If we already have a saved buff, concat it with the chunk
    if (lastBuff)
      chunk = Buffer.concat([lastBuff, chunk]);
    // Get the new chunk length
    len = Buffer.byteLength(chunk);
    // If the length is less than what we ignore at the end, save it and get out
    if (len < ignoreLast) {
      lastBuff = chunk;
      return callback();
    }
    // Otherwise save the piece we might want to ignore and push the rest through
    lastBuff = chunk.slice(len - ignoreLast, len);
    this.push(chunk.slice(0, len - ignoreLast));
    callback();
  }
});
Then you add that to your pipeline, assuming you're reading from a file and writing to a file:
const fs = require('fs');

const rs = fs.createReadStream('some_file');
const ws = fs.createWriteStream('some_other_file');

rs.pipe(myTrimmer);
myTrimmer.pipe(ws);
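For the error handling and backpressure concerns from the question, one option (a sketch, not part of the original answer; requires Node 10+) is to wire everything up with stream.pipeline, which forwards errors from any stream in the chain and handles cleanup, while backpressure works the same as with pipe:
const { pipeline } = require('stream');

pipeline(rs, myTrimmer, ws, (err) => {
  if (err) console.error('Trimming failed:', err);
  else console.log('Trimming finished');
});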

Buffers filled with unicode zeroes

I'm trying to synchronously read parameters from console in node, I managed to do the following:
var load = function () {
  const BUFFER_LENGTH = 1024;
  const stdin = fs.openSync('/dev/stdin', 'rs');
  const buffer = Buffer.alloc(BUFFER_LENGTH);
  console.log('Provide parameter: ');
  fs.readSync(stdin, buffer, 0, BUFFER_LENGTH);
  fs.closeSync(stdin);
  return buffer.toString().replace(/\n*/, '');
}
It works, but here's a strange thing:
var loadedValue = load();
console.log(loadedValue); // displays "a", if I typed "a", so the result is correct
console.log({loadedValue}); // displays {a: 'a\n\u0000\u0000....'}
When I wrap the value in an object, the remaining buffer bytes show up in the string. Why is that? How can I get rid of them? Running a regexp on the string before making the object doesn't work.
Buffer.alloc(BUFFER_LENGTH) creates a buffer of a particular length (1024 in your case), and fills that buffer with NULL characters (as documented here).
Next, you read some (say 2) bytes from stdin into that buffer, which replaces the first two of those NULL characters with the characters read from stdin. The rest of the buffer still consists of NULL's.
If you don't truncate the buffer to the number of bytes read, your function returns a 1024-character string, mostly filled with NULL's. Since those aren't printable, they don't show up in the first console.log(), but they're still there.
So after reading from stdin, you should truncate the buffer to the right size:
let bytesRead = fs.readSync(stdin, buffer, 0, BUFFER_LENGTH);
buffer = buffer.slice(0, bytesRead);
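Putting that together, a corrected version of the load() function from the question might look like this (a sketch; the trailing-newline regexp is anchored to the end of the string here):
const fs = require('fs');

const load = function () {
  const BUFFER_LENGTH = 1024;
  const stdin = fs.openSync('/dev/stdin', 'rs');
  const buffer = Buffer.alloc(BUFFER_LENGTH);
  console.log('Provide parameter: ');
  const bytesRead = fs.readSync(stdin, buffer, 0, BUFFER_LENGTH);
  fs.closeSync(stdin);
  // Keep only the bytes actually read, then strip trailing newlines
  return buffer.slice(0, bytesRead).toString().replace(/\n*$/, '');
};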

How to append binary data to a buffer in node.js

I have a buffer with some binary data:
var b = new Buffer([0x00, 0x01, 0x02]);
and I want to append 0x03.
How can I append more binary data? I've searched the documentation, but the data to append must be a string; otherwise an error occurs (TypeError: Argument must be a string):
var b = new Buffer(256);
b.write("hola");
console.log(b.toString("utf8", 0, 4)); // hola
b.write(", adios", 4);
console.log(b.toString("utf8", 0, 11)); // hola, adios
Then the only solution I can see is to create a new buffer for every piece of appended binary data and copy it into the main buffer at the correct offset:
var b = new Buffer(4); // 4 for having a nice printed buffer, but the size will be 16KB
new Buffer([0x00, 0x01, 0x02]).copy(b);
console.log(b); // <Buffer 00 01 02 00>
new Buffer([0x03]).copy(b, 3);
console.log(b); // <Buffer 00 01 02 03>
But this seems a bit inefficient because I have to instantiate a new buffer for every append.
Do you know a better way for appending binary data?
EDIT
I've written a BufferedWriter that writes bytes to a file using internal buffers. Same as BufferedReader but for writing.
A quick example:
// The BufferedWriter truncates the file because append == false
new BufferedWriter("file")
  .on("error", function (error) {
    console.log(error);
  })
  // From the beginning of the file:
  .write([0x00, 0x01, 0x02], 0, 3)        // Writes 0x00, 0x01, 0x02
  .write(new Buffer([0x03, 0x04]), 1, 1)  // Writes 0x04
  .write(0x05)                            // Writes 0x05
  .close();                               // Closes the writer. A flush is implicitly done.

// The BufferedWriter appends content to the end of the file because append == true
new BufferedWriter("file", true)
  .on("error", function (error) {
    console.log(error);
  })
  // From the end of the file:
  .write(0xFF)                            // Writes 0xFF
  .close();                               // Closes the writer. A flush is implicitly done.

// The file contains: 0x00, 0x01, 0x02, 0x04, 0x05, 0xFF
LAST UPDATE
Use concat.
Updated Answer for Node.js ~>0.8
Node is able to concatenate buffers on its own now.
var newBuffer = Buffer.concat([buffer1, buffer2]);
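Applied to the original example (a minimal sketch using the modern Buffer.from/Buffer.concat API), appending 0x03 becomes:
var b = Buffer.from([0x00, 0x01, 0x02]);
b = Buffer.concat([b, Buffer.from([0x03])]);
console.log(b); // <Buffer 00 01 02 03>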
Old Answer for Node.js ~0.6
I use a module to add a .concat function, among others:
https://github.com/coolaj86/node-bufferjs
I know it isn't a "pure" solution, but it works very well for my purposes.
Buffers are always of fixed size; there is no built-in way to resize them dynamically, so your approach of copying into a larger Buffer is the only way.
However, to be more efficient, you could make the Buffer larger than the original contents, so it contains some "free" space where you can add data without reallocating. That way you don't need to create a new Buffer and copy the contents on each append operation.
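A minimal sketch of that idea (the GrowableBuffer helper here is hypothetical, not a standard API): keep a larger backing Buffer plus a count of the bytes used, and only reallocate, e.g. by doubling, when the free space runs out.
// Hypothetical helper illustrating the "preallocate free space" approach
function GrowableBuffer(initialSize) {
  this.buf = Buffer.alloc(initialSize || 1024); // backing storage with free space
  this.length = 0;                              // bytes actually used
}

GrowableBuffer.prototype.append = function (data) {
  // Grow (by doubling) only when the free space is exhausted
  while (this.length + data.length > this.buf.length) {
    var bigger = Buffer.alloc(this.buf.length * 2);
    this.buf.copy(bigger, 0, 0, this.length);
    this.buf = bigger;
  }
  data.copy(this.buf, this.length);
  this.length += data.length;
};

GrowableBuffer.prototype.toBuffer = function () {
  return this.buf.slice(0, this.length); // view of the bytes written so far
};

// Usage: most appends only copy into existing free space
var gb = new GrowableBuffer(16);
gb.append(Buffer.from([0x00, 0x01, 0x02]));
gb.append(Buffer.from([0x03]));
console.log(gb.toBuffer()); // <Buffer 00 01 02 03>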
This is to help anyone who comes here looking for a pure approach. I would recommend understanding this problem, because it can happen in lots of different places, not just with a JS Buffer object. By understanding why the problem exists and how to solve it, you will improve your ability to solve other problems in the future, since this one is so fundamental.
For those of us who have had to deal with these problems in other languages, it is quite natural to devise a solution, but there are people who may not realize how to abstract away the complexities and implement a generally efficient dynamic buffer. The code below may have potential to be optimized further.
I have left the read method unimplemented to keep the example small in size.
The realloc function in C (or any language dealing with intrinsic allocations) does not guarantee that the allocation will be expanded in size without moving the existing data, although sometimes it is possible. Therefore most applications that need to store an unknown amount of data will use a method like the one below rather than constantly reallocating, unless the reallocation would be very infrequent. This is essentially how most file systems handle writing data to a file. The file system simply allocates another node and keeps all the nodes linked together, and when you read from it the complexity is abstracted away so that the file/buffer appears to be a single contiguous buffer.
For those of you who wish to understand the difficulty in just simply providing a high performance dynamic buffer you only need to view the code below, and also do some research on memory heap algorithms and how the memory heap works for programs.
Most languages will provide a fixed-size buffer for performance reasons, and then provide another version that is dynamic in size. Some language systems opt for a third-party approach, keeping the core functionality minimal (core distribution) and encouraging developers to create libraries to solve additional or higher-level problems. This is why you may question why a language does not provide some functionality. Keeping the core small reduces the cost of maintaining and enhancing the language, but you end up having to write your own implementations or depend on a third party.
var Buffer_A1 = function (chunk_size) {
  this.buffer_list = [];
  this.total_size = 0;
  this.cur_size = 0;
  this.cur_buffer = [];
  this.chunk_size = chunk_size || 4096;
  this.buffer_list.push(new Buffer(this.chunk_size));
};

Buffer_A1.prototype.writeByteArrayLimited = function (data, offset, length) {
  var can_write = length > (this.chunk_size - this.cur_size) ? (this.chunk_size - this.cur_size) : length;
  var lastbuf = this.buffer_list.length - 1;
  for (var x = 0; x < can_write; ++x) {
    this.buffer_list[lastbuf][this.cur_size + x] = data[x + offset];
  }
  this.cur_size += can_write;
  this.total_size += can_write;
  if (this.cur_size == this.chunk_size) {
    this.buffer_list.push(new Buffer(this.chunk_size));
    this.cur_size = 0;
  }
  return can_write;
};

/*
  The `data` parameter can be anything that is array like. It just must
  support indexing and a length and produce an acceptable value to be
  used with Buffer.
*/
Buffer_A1.prototype.writeByteArray = function (data, offset, length) {
  offset = offset == undefined ? 0 : offset;
  length = length == undefined ? data.length : length;
  var rem = length;
  while (rem > 0) {
    // advance through `data` from `offset` as chunks are consumed
    rem -= this.writeByteArrayLimited(data, offset + (length - rem), rem);
  }
};

Buffer_A1.prototype.readByteArray = function (data, offset, length) {
  /*
    If you really wanted to implement some read functionality
    then you would have to deal with unaligned reads which could
    span two buffers.
  */
};

Buffer_A1.prototype.getSingleBuffer = function () {
  var obuf = new Buffer(this.total_size);
  var cur_off = 0;
  var x;
  for (x = 0; x < this.buffer_list.length - 1; ++x) {
    this.buffer_list[x].copy(obuf, cur_off);
    cur_off += this.buffer_list[x].length;
  }
  this.buffer_list[x].copy(obuf, cur_off, 0, this.cur_size);
  return obuf;
};
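A possible usage sketch for the class above (the chunk size and byte values are arbitrary):
var dyn = new Buffer_A1(8);              // small chunk size just for illustration
dyn.writeByteArray([0x00, 0x01, 0x02]);  // append three bytes
dyn.writeByteArray([0x03, 0x04]);        // append two more
console.log(dyn.getSingleBuffer());      // <Buffer 00 01 02 03 04>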
Insert a byte at a specific place:
function insertToArray(arr, index, item) {
  return Buffer.concat([arr.slice(0, index), Buffer.from(item, "utf-8"), arr.slice(index)]);
}
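For example (a small usage sketch of the helper above; the values are arbitrary):
var buf = Buffer.from([0x61, 0x62, 0x64]); // "abd"
console.log(insertToArray(buf, 2, "c"));   // <Buffer 61 62 63 64>, i.e. "abcd"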

D (Tango) Read all standard input and assign it to a string

In the D language, how can I read all standard input and assign it to a string (with the Tango library)?
Copied straight from http://www.dsource.org/projects/tango/wiki/ChapterIoConsole:
import tango.text.stream.LineIterator;
foreach (line; new LineIterator!(char)(Cin.stream))
    // do something with each line
If only 1 line is required, use
auto line = Cin.copyln();
Another, probably more efficient, way of dumping the contents of Stdin would be something like this:
module dumpstdin;

import tango.io.Console : Cin;
import tango.io.device.Array : Array;
import tango.io.model.IConduit : InputStream;

const BufferInitialSize = 4096u;
const BufferGrowingStep = 4096u;

ubyte[] dumpStream(InputStream ins)
{
    auto buffer = new Array(BufferInitialSize, BufferGrowingStep);
    buffer.copy(ins);
    return cast(ubyte[]) buffer.slice();
}

import tango.io.Stdout : Stdout;

void main()
{
    auto contentsOfStdin
        = cast(char[]) dumpStream(Cin.stream);

    Stdout
        ("Finished reading Stdin.").newline()
        ("Contents of Stdin was:").newline()
        ("<<")(contentsOfStdin)(">>").newline();
}
Some notes:
The second parameter to Array is necessary; if you omit it, Array will not grow in size.
I used 4096 since that's generally the size of a page of memory.
dumpStream returns a ubyte[] because char[] is defined as a UTF-8 string, which Stdin doesn't necessarily need to be. For example, if someone piped a binary file to your program, you would end up with an invalid char[] that could throw an exception if anything checks it for validity. If you only care about text, then casting the result to a char[] is fine.
copy is a method on the OutputStream interface that causes it to drain the provided InputStream of all input.
