The following will not work properly:
var http = require('http');
var fs = require('fs');
var theIndex = fs.createReadStream('index.html');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  theIndex.pipe(res);
}).listen(9000);
It works fine on the first request, but on every subsequent request no index.html is sent to the client. The createReadStream call apparently needs to be inside the createServer callback. I think I can conceptualize why, but can you articulate it in words? Is it that once the stream has completed, the file handle is closed and the stream must be created again? It can't simply be "restarted"? Is this correct?
Thanks
Streams carry internal state that tracks their progress: in the case of a file stream, there is a file descriptor, a read buffer, and the current position the file has been read to. It doesn't make sense to "rewind" a Node.js stream because Node.js is an asynchronous environment; this is an important point to keep in mind, as it means that two HTTP requests can be in the middle of processing at the same time.
If one HTTP request causes the stream to begin streaming from disk, and midway through the streaming process another HTTP request came in, there would be no way to use the same stream in the second HTTP request (the internal record-keeping would incorrectly send the second HTTP response the wrong data). Similarly, rewinding the stream when the second HTTP request is processed would cause the wrong data to be sent to the original HTTP request.
If Node.js were not an asynchronous environment, and it was guaranteed that the stream was completely used up before you rewound it, it might make sense to be able to rewind a stream (though there are other considerations, such as the timing of the open, end, and close events).
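For completeness, the simple fix the question alludes to is just to create the stream per request; a minimal sketch of that approach:

var http = require('http');
var fs = require('fs');

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  // a fresh stream (and file descriptor) is created for each request
  fs.createReadStream('index.html').pipe(res);
}).listen(9000);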
You do have access to the low-level fs.read mechanisms, so you could theoretically create an API that only opened a single file descriptor but spawned multiple streams; each stream would contain its own buffer and read position, but share a file descriptor. Perhaps something like:
var http = require('http');
var fs = require('fs');
var theIndexSpawner = createStreamSpawner('index.html');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  theIndexSpawner.spawnStream().pipe(res);
}).listen(9000);
Of course, you'll have to figure out when it's time to close the file descriptor, making sure you don't hold onto it for too long, etc. Unless you find that opening the file multiple times is an actual bottleneck in your application, it's probably not worth the mental overhead.
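For illustration, here is a minimal sketch (the helper names are hypothetical, and error handling and descriptor cleanup are omitted) of how such a createStreamSpawner could be built on fs.open and positioned fs.read calls, so that every spawned stream keeps its own read offset while sharing one file descriptor:

var fs = require('fs');
var stream = require('stream');

function createStreamSpawner(path) {
  var fd = null;
  var waiting = [];

  // open the file descriptor exactly once
  fs.open(path, 'r', function (err, openedFd) {
    if (err) throw err;
    fd = openedFd;
    waiting.forEach(function (cb) { cb(fd); });
    waiting = [];
  });

  // run cb now if the descriptor is ready, otherwise once it opens
  function withFd(cb) {
    if (fd !== null) cb(fd);
    else waiting.push(cb);
  }

  return {
    spawnStream: function () {
      var position = 0;                    // each stream tracks its own offset
      var readable = new stream.Readable();

      readable._read = function (size) {
        var self = this;
        withFd(function (fd) {
          var buffer = new Buffer(size);
          // positioned read: does not move the shared descriptor's offset
          fs.read(fd, buffer, 0, size, position, function (err, bytesRead) {
            if (err) return self.emit('error', err);
            if (bytesRead === 0) return self.push(null);  // end of file
            position += bytesRead;
            self.push(buffer.slice(0, bytesRead));
          });
        });
      };

      return readable;
    }
  };
}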
Related
The documentation for node suggests that the new, preferred way to read streams is as follows:
var readable = getReadableStreamSomehow();
readable.on('readable', function() {
  var chunk;
  while (null !== (chunk = readable.read())) {
    console.log('got %d bytes of data', chunk.length);
  }
});
To me this seems to cause a blocking while loop. This would mean that if node is responding to an http request by reading and sending a file, the process would have to block while the chunk is read before it could be sent.
Isn't this blocking IO which node.js tries to avoid?
The important thing to note here is that it's not blocking in the sense that it's waiting for more input to arrive on the stream. It's simply retrieving the current contents of the stream's internal buffer. This kind of loop will finish pretty quickly since there is no waiting on I/O at all.
A stream can be either synchronous or asynchronous. If a readable stream pushes data into its internal buffer synchronously, you get a synchronous stream, and in that case, if it pushes a lot of data synchronously, node's event loop won't be able to run until all of that data has been read.
Interestingly, even if you remove the while loop from the 'readable' callback, the stream module internally runs a loop of its own and keeps reading until all of the pushed data has been consumed.
For asynchronous I/O sources (e.g. the http or fs modules), data is pushed into the buffer asynchronously. So the while loop only runs when data has been pushed into the buffer, and it stops as soon as you've read the entire buffer.
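To make that concrete, here is a small sketch (assuming a local file named 'example.txt') showing that the 'readable' loop only drains whatever is already buffered and then returns to the event loop:

var fs = require('fs');
var readable = fs.createReadStream('example.txt');

readable.on('readable', function() {
  var chunk;
  // read() is synchronous but never waits for I/O: it returns whatever is
  // already in the internal buffer, or null when that buffer is empty
  while (null !== (chunk = readable.read())) {
    console.log('got %d bytes of data', chunk.length);
  }
  // control returns to the event loop here; 'readable' fires again once the
  // underlying fs reads push more data into the buffer
});

readable.on('end', function() {
  console.log('no more data');
});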
I know that one can synchronously read a file in NodeJS like this:
var fs = require('fs');
var content = fs.readFileSync('myfilename');
console.log(content);
I am instead interested in being able to read the contents from a stream into a string synchronously. Thoughts?
Streams in node.js are event driven, not synchronous, so you can't collect their results into a string synchronously.
If you have no choice but to use a stream, then you have no choice but to deal with the contents of the stream asynchronously.
If you can change the source of the data to a synchronous source such as the fs.readFileSync() that you show, then you can do that (though generally not recommended for a multi-user server process).
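If you do stay with the stream, a sketch of the asynchronous approach (the helper name readStreamToString is made up for illustration) is to collect the chunks and use the full string once the 'end' event fires:

var fs = require('fs');

function readStreamToString(stream, callback) {
  var chunks = [];
  stream.on('data', function (chunk) { chunks.push(chunk); });
  stream.on('end', function () {
    callback(null, Buffer.concat(chunks).toString('utf8'));
  });
  stream.on('error', callback);
}

readStreamToString(fs.createReadStream('myfilename'), function (err, content) {
  if (err) throw err;
  console.log(content);
});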
I have an Express URL which has to wait for data to arrive from an external device over a serial port (or another network connection). This can take up to two seconds. I understand that if my get function blocks, it blocks the entire Node process, so I want to avoid this:
app.get('/ext-data', function(req, res){
  var data = wait_for_external_data();
  res.send(data);
});
I do have an emitter for the external data, so I can get a callback when external data arrive.
I'm unclear on how to let Express do other things while my code waits for the external data to become available, and how to pass the data on to the response object once I have it.
Generally you would pass a callback to your wait_for_external_data function to be invoked once the data is received, and you need to write wait_for_external_data so that it does not block. To do this, use the event emitter to get the data, as you mentioned. I can give more detail if you elaborate on which library you are using to get the data.
app.get('/ext-data', function(req, res){
  wait_for_external_data(function(data){
    res.send(data);
  });
});
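For reference, here is a sketch of what wait_for_external_data itself might look like, assuming the emitter you mention is an ordinary EventEmitter (the name externalDataEmitter is hypothetical):

var EventEmitter = require('events').EventEmitter;

// stands in for the emitter for the external data mentioned in the question
var externalDataEmitter = new EventEmitter();

function wait_for_external_data(callback) {
  // nothing here blocks: we register interest and return immediately, so
  // Node is free to serve other requests until the device responds
  externalDataEmitter.once('data', function (data) {
    callback(data);
  });
}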
I'm writing a small node.js application that receives a multipart POST from an HTML form and pipes the incoming data to Amazon S3. The formidable module provides the multipart parsing, exposing each part as a node Stream. The knox module handles the PUT to s3.
var form = new formidable.IncomingForm()
, s3 = knox.createClient(conf);
form.onPart = function(part) {
  var put = s3.putStream(part, filename, headers, handleResponse);
  put.on('progress', handleProgress);
};

form.parse(req);
I'm reporting the upload progress to the browser client via socket.io, but am having difficulty getting these numbers to reflect the real progress of the node to s3 upload.
When the browser-to-node upload happens near instantaneously, as it does when the node process is running on the local network, the progress indicator reaches 100% immediately. If the file is large, e.g. 300MB, the progress indicator rises more slowly, but still faster than our upstream bandwidth would allow. After hitting 100% progress, the client then hangs, presumably waiting for the s3 upload to finish.
I know putStream uses Node's stream.pipe method internally, but I don't understand the detail of how this really works. My assumption is that node gobbles up the incoming data as fast as it can, throwing it into memory. If the write stream can take the data fast enough, little data is kept in memory at once, since it can be written and discarded. If the write stream is slow though, as it is here, we presumably have to keep all that incoming data in memory until it can be written. Since we're listening for data events on the read stream in order to emit progress, we end up reporting the upload as going faster than it really is.
Is my understanding of this problem anywhere close to the mark? How might I go about fixing it? Do I need to get down and dirty with write, drain and pause?
Your problem is that stream.pause isn't implemented on the part, which is a very simple readstream of the output from the multipart form parser.
Knox instructs the s3 request to emit "progress" events whenever the part emits "data". However since the part stream ignores pause, the progress events are emitted as fast as the form data is uploaded and parsed.
The formidable form, however, does know how to both pause and resume (it proxies the calls to the request it's parsing).
Something like this should fix your problem:
form.onPart = function(part) {
  // once pause is implemented, the part will be able to throttle the speed
  // of the incoming request
  part.pause = function() {
    form.pause();
  };

  // resume is the counterpart to pause, and will fire after the `put` emits
  // "drain", letting us know that it's ok to start emitting "data" again
  part.resume = function() {
    form.resume();
  };

  var put = s3.putStream(part, filename, headers, handleResponse);
  put.on('progress', handleProgress);
};
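For background, the backpressure handshake that pipe (and the fix above) relies on looks roughly like this simplified, illustrative sketch, using plain fs streams rather than formidable or knox:

var fs = require('fs');
var readable = fs.createReadStream('input.dat');   // placeholder source
var writable = fs.createWriteStream('output.dat'); // placeholder destination

readable.on('data', function (chunk) {
  var ok = writable.write(chunk); // false means the write buffer is full
  if (!ok) readable.pause();      // stop producing until the buffer drains
});

writable.on('drain', function () {
  readable.resume();              // safe to start emitting 'data' again
});

readable.on('end', function () {
  writable.end();
});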
I'm trying to make a server based on the net module. What I don't understand is which event I'm supposed to put the response code on:
on('data', function()) could fire while I'm still in the middle of receiving more data from the stream (so it might be too early to reply),
and on('end', function()) fires after the connection is closed.
Thank you for your help
The socket's 'data' event calls the callback function every time an incoming chunk of data is ready for reading, and it hands that buffered data to the callback, so use this:
socket.on('data', function(data){
  // each 'data' event may deliver only part of a message, so detect
  // complete messages in the incoming stream here before responding
});
This may help for node v0.6.5: http://nodejs.org/docs/v0.6.5/api/net.html#event_data_
And this is useful for a clearer understanding of readable streams:
http://nodejs.org/docs/v0.6.5/api/streams.html#readable_Stream
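To make the boundary-detection point concrete, here is a small sketch that assumes a newline-delimited protocol (an assumption about the application, since TCP itself has no message boundaries): it buffers 'data' chunks and replies only once a complete line has arrived.

var net = require('net');

var server = net.createServer(function (socket) {
  var buffered = '';
  socket.on('data', function (data) {
    buffered += data.toString();
    var newlineIndex;
    // each newline marks the end of one complete request
    while ((newlineIndex = buffered.indexOf('\n')) !== -1) {
      var message = buffered.slice(0, newlineIndex);
      buffered = buffered.slice(newlineIndex + 1);
      socket.write('got: ' + message + '\n'); // respond here
    }
  });
});

server.listen(9000);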