Can you pass meta data along with a stream? - node.js

When I pipe something like an image file through a stream, is there any way to send a meta object along with it?
My server gets sent an image from a user. The image gets pushed through a set of streams that perform various actions.
The final stream emits a data event and passes the resulting image buffer into a callback, but I lose all context for the user. I need to keep the resulting image tied to the user's id and some other metadata.
Ideal:
stream.on('data', function(img, meta){
...
})
Thanks for any possible solutions!

In short, no, there's nothing built into Node.js to support including metadata with streams. You do have some other options, though, including:
You could use a closure to track the meta data separately from the stream. For example:
function handleImage(imageStream) {
    var meta = {...};
    imageStream.pipe(otherStreams).on('data', function(image) {
        // you now have `image` and `meta` variables at your disposal here.
    });
}
The downside of this is that the metadata is not available to your otherStreams.
This is a good solution if your other streams are third-party code outside of your control, or if they don't need to know about the metadata.
You could do something similar to HTTP headers, where all the data up to a certain point is metadata, and everything after it is the image. (In HTTP, the delimiter is wherever \n\n occurs first.) All of your streams in the chain have to know about this and handle it, though.
If you know your metadata will always be in one chunk and none of your streams split or merge chunks, then you could simplify this a bit and just say that the first (or last) chunk is always metadata.
Switch to an object stream, as Amoli mentioned in his answer. Here you would pass {image: imgData, meta: {...}}. You would then have to update your other streams to expect this format (see the sketch after this list of options).
The main downside of this method, though, is that you either have to pass the metadata multiple times, cache it somewhere for each stream that needs it, or pass your entire image as one chunk (which kind of kills the entire point of "streams"). And, from what I've been told, node.js can optimize text/binary streams better than object streams. So, this probably isn't a good approach for your situation.
https://github.com/dominictarr/mux-demux might be helpful here. It combines multiple streams into one, so you could have separate image and meta streams. I'm not sure how well it would work for your situation though. You'd probably need to update all of your streams to be aware of it.
I know I said that all but the first option require modifying the other streams, but there is a way around that: you could create a generic "stream wrapper" that splits up the image and meta data and passes just the image data through the main stream, and has the meta data bypass it and go on to the next one down the chain. This gets ugly fast though, so probably not the best idea.
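To make the object-stream option more concrete, here is a minimal sketch with hypothetical names (watermark, userId, imageSource): every chunk flowing through the chain is an object of the form { image, meta }, so each stream in the chain can see the metadata alongside the image data.
const { Transform } = require('stream');

// Hypothetical object-mode transform: each chunk is { image: Buffer, meta: {...} },
// so the metadata rides along with the image through the whole chain.
const watermark = new Transform({
    objectMode: true,
    transform(chunk, encoding, callback) {
        const image = chunk.image; // the image buffer (or a piece of it)
        const meta = chunk.meta;   // e.g. { userId: ... } stays attached, untouched
        // ...do whatever image work you need here...
        callback(null, { image, meta });
    }
});

// Upstream, wrap the raw buffer once before it enters the chain, e.g.:
// imageSource.on('data', (buf) => watermark.write({ image: buf, meta: { userId } }));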

Basically, whenever you want to read or write any objects which are not strings or buffers, you'll need to put your stream into objectMode.
Example (source):
var stream = require('stream');
var util = require('util');

function S3Lister (s3, options) {
    options || (options = {});
    stream.Readable.call(this, { objectMode : true });

    this.s3         = s3; // a knox-like client.
    this.marker     = options.start;
    this.connecting = false;
    this.ended      = false;
}
util.inherits(S3Lister, stream.Readable);
We set the stream to use objectMode as we want to return not just data but also some metadata.
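For comparison, here is a much simpler, hypothetical object-mode readable (not part of the S3Lister source): it pushes plain objects, data plus metadata, instead of strings or buffers.
const { Readable } = require('stream');

// Hypothetical example: an object-mode readable that emits whole objects.
class MetaReadable extends Readable {
    constructor(items) {
        super({ objectMode: true });
        this.items = items.slice();
    }
    _read() {
        // push(null) signals the end of the stream
        this.push(this.items.length ? this.items.shift() : null);
    }
}

new MetaReadable([{ key: 'photo.jpg', size: 1024 }])
    .on('data', (obj) => console.log(obj.key, obj.size));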
For more information:
Node.js Docs stream object mode
An introduction to Node's streams

I created a module called metastream for this type of thing. (It is in npm).

Related

Unable to use one readable stream to write to two different targets in Node JS

I have a client side app where users can upload an image. I receive this image in my Node JS app as readable data and then manipulate it before saving like this:
uploadPhoto: async (server, request) => {
    try {
        const randomString = `${uuidv4()}.jpg`;
        const stream = Fse.createWriteStream(`${rootUploadPath}/${userId}/${randomString}`);
        const resizer = Sharp()
            .resize({
                width: 450
            });
        await data.file
            .pipe(resizer)
            .pipe(stream);
This works fine, and writes the file to the projects local directory. The problem comes when I try to use the same readable data again in the same async function. Please note, all of this code is in a try block.
const stream2 = Fse.createWriteStream(`${rootUploadPath}/${userId}/thumb_${randomString}`);
const resizer2 = Sharp()
    .resize({
        width: 45
    });
await data.file
    .pipe(resizer2)
    .pipe(stream2);
The second file is written, but when I check the file, it seems corrupted or didn't successfully write the data. The first image is always fine.
I've tried a few things, and found one method that seems to work but I don't understand why. I add this code just before I create the second write stream:
data.file.on('end', () => {
    console.log('There will be no more data.');
});
Putting the code for the second write stream inside the on-end callback block doesn't make a difference; however, if I leave the code outside of the block, between the first write stream code and the second write stream code, then it works, and both files are successfully written.
It doesn't feel right leaving the code the way it is. Is there a better way I can write the second thumbnail image? I've tried to use the Sharp module to read the file after the first write stream writes the data, and then create a smaller version of it, but it doesn't work. The file doesn't ever seem to be ready to use.
You have two alternatives, which depend on how your software is designed.
If possible, I would avoid executing two transform operations on the same stream in the same "context", e.g. an API endpoint. I would rather separate those two transforms so they do not work on the same input stream.
If that is not possible or would require too many changes, the solution is to fork the input stream and then pipe it into two different Writable streams. I normally use Highland.js fork for these tasks.
Please also see my comments on how to properly handle streams with async/await to check when the write operation is finished.
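If Highland.js is more than you need, here is a rough sketch of the same fork idea using only core streams. It reuses data.file, Sharp, Fse and the path variables from the question, assumes Node 15+ for stream/promises, and runs inside the same async handler.
const { PassThrough } = require('stream');
const { pipeline } = require('stream/promises');

// Tee the upload into two PassThrough streams; each branch receives every chunk.
const branchA = new PassThrough();
const branchB = new PassThrough();
data.file.pipe(branchA);
data.file.pipe(branchB);

// pipeline() resolves only after each write stream has finished,
// which also answers the "when is the write done?" question.
await Promise.all([
    pipeline(
        branchA,
        Sharp().resize({ width: 450 }),
        Fse.createWriteStream(`${rootUploadPath}/${userId}/${randomString}`)
    ),
    pipeline(
        branchB,
        Sharp().resize({ width: 45 }),
        Fse.createWriteStream(`${rootUploadPath}/${userId}/thumb_${randomString}`)
    )
]);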

How to save records with Asset field use server-to-server cloudkit.js

I want to use server-to-server CloudKit JS to save a record with an Asset field.
The Asset field is an m4a audio file. After saving, the audio file is corrupted and won't play.
Apple's docs are not clear about the Asset field:
In a record that is being saved to the database, the value of an Asset field must be a window.Blob type. In the code fragment above, the type of the assetFile variable is window.File.
Docs:
https://developer.apple.com/documentation/cloudkitjs/cloudkit/database/1628735-saverecords
But in Node.js there is no Blob or File, so I filled the field with a Buffer, like this:
var dstFile = path.join(__dirname, "../test.m4a");
var data = fs.readFileSync(dstFile);
let buffer = Buffer.from(data);
var rec = {
    recordType: "MyAttachment",
    fields: {
        ext: { value: ".m4a" },
        file: { value: buffer }
    }
};
//console.debug(rec);
mydatabase.newRecordsBatch().create(rec).commit().then(function (response) {
    if (response.hasErrors) {
        console.log(">>> saveAttachFile record failed");
        console.warn(response.errors[0]);
    } else {
        var createdRecord = response.records[0];
        console.log(">>> saveAttachFile record success:", createdRecord);
    }
});
The record is saved successfully.
But when I download the audio from icloud.developer.apple.com/dashboard, the audio file is corrupted and won't play.
What's wrong with it? Thanks for any replies.
I was having the same problem and have found a working solution!
Remembering that CloudKitJS needs you to define your own fetch method, I implemented a custom one to see what was going on. I then attached a debugger on the custom fetch to inspect the data that was passing through it.
After stepping through the caller, I found that all asset values are transformed using their toString() method, but only when the library is embedded in Node.js. This is determined by the absence of the global window object.
When toString() is called on a Buffer, its contents are encoded to UTF-8 (by default), which causes binary assets to become malformed. If you're using node-fetch for your fetch implementation, it supports Buffer and stream.Readable, so this toString() call does nothing but harm.
The most unobtrusive fix I've found is to swap the toString() method on any Buffer or stream.Readable instances passed as asset field values. You should probably use stream.Readable, by the way, so that you don't load the entire asset into memory when uploading.
Anyway, here's what it looks like in practice:
// Put this somewhere in your implementation
const swizzleBuffer = (buffer) => {
    buffer.toString = () => buffer;
    return buffer;
};

// Use this asset value instead
{ asset: swizzleBuffer(fs.readFileSync(path)) }
Please be aware that this workaround mutates a Buffer in an ugly way (since Buffer apparently can't be extended). It's probably a good idea to design an API which doesn't take Buffer arguments, so that you only mutate instances you create yourself and avoid unintended side effects anywhere else in your code.
Also, be sure to vendor (make a local copy of) CloudKitJS in your project, as the behavior may change in the future.
ORIGINAL ANSWER
I ran into the same problem and solved it by encoding my data using Base64. It appears that there's a bug in their SDK which mangles Buffer instances containing non-ascii characters (which, um, seems problematic).
Anyway, try something like this:
const assetField = { value: Buffer.from(data.toString('base64'), 'ascii') }
Side note:
You'll need to decode the asset(s) on the device before using them. There's no way to do this efficiently without writing your own routines, as the methods included on Data / NSData instances require all data to be in memory.
This is a problem with CloudKitJS (and not the native CloudKit client / service), so the other option is to write your own routine to upload assets.
Neither of these options seems particularly great, but rolling your own at least means there aren't extra steps for clients to take in order to use the asset.

read (pull) vs pipe(control flow) vs data(push)

Node.js has different options to consume the data:
Streams 0, 1, 2, 3 and so on...
My question is about the real-life application of these different options. I fairly well understand the difference between readable/read, the data event, and pipe, but I am not very confident about selecting a specific method.
For example, if I want to use flow control, read with some manual work as well as pipe can be used. The data event ignores flow control, so should I stop using the plain data event?
For most things, you should be able to use
src.pipe(dest);
If you look at the source code for the Stream.prototype.pipe implementation, you can see that it's just a very handy wrapper that sets everything up for you.
For all the work I do with streams, I generally just choose the proper stream type (Readable, Writable, Duplex, Transform, or PassThrough) and then define the proper methods (_read, _write, and/or _transform) on the stream. Lastly, I use .pipe to connect everything together.
It's very common to see stream setups that appear to be "circular":
client.pipe(encoder).pipe(server).pipe(decoder).pipe(client)
As an example, here's a stream I'm using in my burro module. You can write objects to this stream, and you can read JSON strings from it.
// burro/encoder.js
var stream = require("stream"),
    util = require("util");

var Encoder = module.exports = function Encoder() {
    stream.Transform.call(this, {objectMode: true});
};
util.inherits(Encoder, stream.Transform);

Encoder.prototype._transform = function _transform(obj, encoding, callback) {
    this.push(JSON.stringify(obj));
    callback(null);
};
As a general recommendation, you will almost always write your Streams like this. That is, you write your own "class" that inherits from one of the built-in streams. It is not really practical for you to use a built-in stream directly.
To demonstrate how you might use this, start by creating a new instance of the stream
var encoder = new Encoder();
See what the encoder outputs by piping it to stdout
encoder.pipe(process.stdout);
Write some sample objects to it
encoder.write({foo: "bar", a: "b"});
// '{"foo":"bar","a":"b"}'
encoder.write({hello: "world"});
// '{"hello":"world"}'

How can I rotate the file I'm writing to in node.js?

I'm writing records to a file in node.js and I need to rotate the file with a new one every so many lines or after a duration, but I can't lose any lines in the process. If I try to create a new stream with fs.createWriteStream, I end up losing lines by overwriting the old stream. Any advice would be much appreciated.
Don't overwrite the old stream. Create the new stream as a separate resource.
var activestream;

function startup() {
    activestream = fs.createWriteStream('path');
}

function record(line) {
    activestream.write(line);
}

function rotate() {
    var newstream = fs.createWriteStream('path2');
    activestream.end();
    activestream = newstream;
}
... something like that should work. Obviously you'll have to figure out how to manage the paths.
I am going to give an unconventional answer here.
You can consider a ready-made library like winston, which comes with well-tested functionality to do exactly what you want. Granted, it's meant for writing logs, but you can write your CSV entries just as easily.
Another big advantage of using winston is that it supports multiple transports, so you can not only write to files and rotate them, you can also write to other media such as MongoDB and a few others.
It also does things like conditional writing, and you can define custom log levels to write different functional records to your file.
I recommend you check it out before cooking your own solution.
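If you go this route, a minimal sketch, assuming winston 3.x, might look like the following. The built-in File transport rotates by size via maxsize/maxFiles; time-based rotation is usually handled with the separate winston-daily-rotate-file transport.
const winston = require('winston');

const recorder = winston.createLogger({
    // write the raw line only, no timestamps or levels
    format: winston.format.printf(({ message }) => message),
    transports: [
        new winston.transports.File({
            filename: 'records.csv',
            maxsize: 5 * 1024 * 1024, // rotate after roughly 5 MB
            maxFiles: 10,
            tailable: true
        })
    ]
});

recorder.info('field1,field2,field3'); // one CSV record per call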
I would recommend writing your own stream manager similar to Jason's, but instead of ending the stream when rotate is requested, let the write finish, pause the stream, rotate the file, then resume the stream. Only one stream per file should be required and you shouldn't need to recreate it.
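One way to sketch that idea with core streams is to keep a single PassThrough as the stable front end that callers always write to, and swap the file stream behind it when rotating; the PassThrough buffers incoming lines while the swap happens, so nothing is lost. Names like RotatingWriter and pathFor are illustrative only.
const fs = require('fs');
const { PassThrough } = require('stream');

class RotatingWriter {
    constructor(pathFor) {
        this.pathFor = pathFor;          // returns the path for the next file
        this.buffer = new PassThrough(); // callers write here, never to the file directly
        this.target = null;
        this.rotate();
    }

    write(line) {
        return this.buffer.write(line);
    }

    rotate() {
        if (this.target) {
            this.buffer.unpipe(this.target); // incoming lines now queue in the PassThrough
            this.target.end();               // flush what was already handed to the old file
        }
        this.target = fs.createWriteStream(this.pathFor());
        this.buffer.pipe(this.target);       // resume flow into the new file
    }
}

// Usage sketch: rotate hourly. Because piping is asynchronous the exact boundary
// is approximate, but every line ends up in exactly one file.
let n = 0;
const writer = new RotatingWriter(() => `records-${n++}.log`);
writer.write('a record line\n');
setInterval(() => writer.rotate(), 60 * 60 * 1000);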

Websockets with Streaming Archives

So this is the setup I'm working with:
I am on an express server which must stream an archived binary payload to a browser (does not matter if it is zip, tar or tar.gz - although zip would be nice).
On this server, I have a websocket open that connects to another server which is sending me binary payloads of individual files in a directory. I get these payloads streamed, piece-by-piece, as buffers, and I'm doing this serially (that is - file-by-file - there aren't multiple websockets open at one time, and there is one websocket per file). This is the websocket library I'm using: https://github.com/einaros/ws
I would like to go through each file, open a websocket, and then append the buffers to an archiver as they come through the websockets. When data is appended to the archiver, it would be nice if I could stream the output of the archiver to the browser (via the response object with response.write). So, basically, as I'm getting the payload from the websocket, I would like that payload streamed through an archiver and then to the response. :-)
Some things I have looked into:
node-zipstream - This is nice because it gives me an output stream I can pipe directly to response.write. However, it doesn't appear to support nested files/folders, and, more importantly, it only accepts an input stream. I have looked at the source code (which is quite terse and readable), and it seems as though, if I were able to have access to the update function within ZipStream.prototype.addFile, I could just call that each time on the message event when I get a binary buffer from the websocket. This is quite messy/hacky though, and, given that this library already doesn't seem to support nested files/folders, I'm not sure I will be going with it.
node-archiver - This suffers from the same issue as node-zipstream (probably because it was inspired by it) where it allows me to pipe the output, but I cannot append multiple buffers for the same file within the archive (and then manually signal when the last buffer has been appended for a given file). However, it does allow me to have nested folders, which is a clear win over node-zipstream.
Is there something I'm not aware of, or is this just a really crazy thing that I want to do?
The only alternative I see at this point is to wait for the entire payload to be streamed through a websocket and then append with node-archiver, but I really would like to reap the benefit of true streaming/archiving on-the-fly.
I've also thought about the possibility of creating a read stream of sorts just to serve as a proxy object that I can pass into node-archiver and then just append the buffers I get from the websocket to this read stream. Looking at various read streams, I'm not sure how to do this though. The only way I could think of was creating a writestream, piping buffers to it, and having a readstream read from that writestream. Am I on the correct thought process here?
As always, thanks for any help/direction you can offer SO community.
EDIT:
Since I just opened this question, and I'm new to node, there may be a better answer than the one I provided. I will keep this question open and accept a better answer if one presents itself within a few days. As always, I will upvote any other answers, even if they're ridiculous, as long as they're correct and allow me to stream on-the-fly as mine does.
I figured out a way to get this working with node-archiver. :-)
It was based on my hunch of creating a temporary "proxy stream" of sorts, inspired by this SO question: How to create streams from string in Node.Js?
The basic gist is (coffeescript syntax):
archive = archiver 'zip'
archive.pipe response # where response is the http response

# and then for each file...
fileName = ... # known file name
fileSize = ... # known file size
ws = .... # create websocket
proxyStream = new Stream()
numBytesStreamed = 0

archive.append proxyStream, name: fileName

ws.on 'message', (dataBuffer) ->
    numBytesStreamed += dataBuffer.length
    proxyStream.emit 'data', dataBuffer
    if numBytesStreamed is fileSize
        proxyStream.emit 'end'
        # function/indicator to do this for the next file in the folder

# and then when you're completely done...
archive.finalize (err, bytesOfArchive) ->
    if err?
        # do whatever
    else
        # unless you somehow knew this ahead of time
        res.addTrailers
            'Content-Length': bytesOfArchive
        res.end()
Note that this is not the complete solution I implemented. There is still a lot of logic dealing with getting the files, their paths, etc. Not to mention error-handling.
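For anyone doing this in plain JavaScript today: a core PassThrough stream can play the role of the proxy stream instead of a bare Stream with manual emits. Here is a rough sketch of the same gist, where fileName, fileSize, ws and response are the same assumed variables as in the CoffeeScript above.
const archiver = require('archiver');
const { PassThrough } = require('stream');

const archive = archiver('zip');
archive.pipe(response);                    // response is the HTTP response

// ...for each file, one at a time...
const proxy = new PassThrough();           // stands in for the bare `new Stream()` proxy
archive.append(proxy, { name: fileName });

let received = 0;
ws.on('message', (chunk) => {
    received += chunk.length;
    proxy.write(chunk);
    if (received === fileSize) {
        proxy.end();                       // tells the archiver this entry is complete
        // ...kick off the next file here...
    }
});

// ...and when every file has been appended...
archive.finalize();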
