Node.js Multipart File Upload, Resulting File Larger Than Original

I wanted to write a file uploader for a Node.js server using Express 4. I didn't want to use any middleware because this was more of an academic exercise to better understand how Node.js and multipart uploads work.
Below is just the main bit of code for a route in Express 4 that collects the client data and writes it out.
var clientData = [];

// When Data Arrives
req.on('data', function(data) {
    clientData.push(data);
});

// Done
req.on('end', function() {
    var output = Buffer.concat(clientData);
    fs.writeFile('Thisisthesong.mp3', output, 'binary', function(err) {
        if (err) throw err;
        debug('Wrote out song');
    });
});
My issue is that when the file is finally written out, it is larger than the original. For example, if I upload an MP3 that is originally 10.5 MB with this code, the result is 11 MB. I believe it has something to do with switching encodings back and forth between reading the body and writing it out. I also understand that Node does not truly have a binary encoding; could that be an issue?
I also thought it could be because I'm not stripping the boundaries or the Content-Disposition header from the data (that would be the next step once this is working), but the boundary and the Content-Disposition only account for about 300 bytes, not 500 KB. If anybody has an explanation or can point out what I'm doing incorrectly, I would greatly appreciate it.
Other Info:
+ Express 4
+ I'm not using any middleware at the moment besides cookie-parser
+ Ubuntu 12.04
+ Node v0.10.31

Double-check that you're comparing apples to apples here. Different tools on the operating system calculate size in different ways, which can show a difference of hundreds of kilobytes for the exact same file: Finder reports sizes in decimal megabytes while ls -h in Terminal reports binary mebibytes, so 10.5 MiB works out to roughly 11.0 MB. For example, I have a file on my computer right now which shows 2.3MB in Finder but 2.2MB in Terminal when using ls -h.
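To rule out display rounding entirely, compare raw byte counts. A minimal sketch, assuming placeholder paths original.mp3 and Thisisthesong.mp3 for the source file and the uploaded copy:

var fs = require('fs');

// Compare exact byte counts instead of human-readable sizes.
// Both paths are placeholders; point them at the real files.
var originalSize = fs.statSync('original.mp3').size;
var uploadedSize = fs.statSync('Thisisthesong.mp3').size;

console.log('original:', originalSize, 'bytes');
console.log('uploaded:', uploadedSize, 'bytes');
console.log('difference:', uploadedSize - originalSize, 'bytes');

If the two byte counts match, the difference you saw was only a units/rounding artifact.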

Related

Slowness with Node.js FS: how can I list files faster?

I just want to read file names from a dir:
const fs = require('fs');

fs.readdir("repo/_posts", (err, files) => {
    files.forEach(file => {
        res.write(file + "\n");
    });
});
With only 15 files it is very slow; it takes several seconds to display the file names. What did I do wrong?
Edit: as suggested by @Darin Dimitrov in the comments, I tried replacing res.write with console.log, and then it's fast. Is res.write bad practice in a loop, or something like that?
Thanks :)
Most browsers will buffer output received from the server for a variety of reasons (including content encoding detection) and some of them may buffer more than others. If you can access the same url via a utility such as cURL and you see the expected output in a much more timely manner, then this confirms the browser buffering "issue."
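If cURL isn't handy, the same check can be done from Node itself. A rough sketch, assuming the route above is served at http://localhost:3000/posts (substitute your actual URL):

var http = require('http');

// Request the same URL the browser hits and log when each chunk arrives.
// If chunks show up here immediately, the server is fast and the delay
// in the browser is just response buffering on its side.
http.get('http://localhost:3000/posts', function(res) {
    res.on('data', function(chunk) {
        console.log(new Date().toISOString(), 'received', chunk.length, 'bytes');
    });
    res.on('end', function() {
        console.log('response ended');
    });
});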

How can I stream multiple remote images to a zip file and stream that to browser with ExpressJS?

I've got a small web app built in ExpressJs that allows people in our company to browse product information. A recent feature request requires that users be able to download batches of images (potentially hundreds at a time). These are stored on another server.
Ideally I think I need to stream the batch of files into a zip file and stream that to the end user's browser as a download, all preferably without having to store the files on the server. The idea is to reduce load on the server as much as possible.
Is it possible to do this or do I need to look at another approach? I've been experimenting with the 'request' module for the initial download.
If anyone can point me in the right direction or recommend any NPM modules that might help it would be very much appreciated.
Thanks.
One useful module for this is archiver, but I'm sure there are others as well.
Here's an example program that shows:
+ how to retrieve a list of URLs (I'm using async to handle the requests, and also to limit the number of concurrent HTTP requests to 3);
+ how to add the responses for those URLs to a ZIP file;
+ how to stream the final ZIP file somewhere (in this case to stdout, but with Express you can pipe to the response object).
Example:
var async = require('async');
var request = require('request');
var archiver = require('archiver');

function zipURLs(urls, outStream) {
    var zipArchive = archiver.create('zip');

    // Pipe the archive to the output stream before adding entries.
    zipArchive.pipe(outStream);

    async.eachLimit(urls, 3, function(url, done) {
        var stream = request.get(url);

        stream.on('error', function(err) {
            return done(err);
        }).on('end', function() {
            return done();
        });

        // Use the last part of the URL as the filename within the ZIP archive.
        zipArchive.append(stream, { name: url.replace(/^.*\//, '') });
    }, function(err) {
        if (err) throw err;
        zipArchive.finalize();
    });
}

zipURLs([
    'http://example.com/image1.jpg',
    'http://example.com/image2.jpg',
    ...
], process.stdout);
Do note that although this doesn't require the image files to be locally stored, it does build the ZIP file entirely in memory. Perhaps there are other ZIP modules that would allow you to work around that, although (AFAIK) the ZIP file format isn't really great in terms of streaming, as it depends on metadata being appended to the end of the file.
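For completeness, here's a rough sketch of how zipURLs might be hooked into an Express route; the /download-batch path, the imageUrls array, and the archive name are made-up placeholders, not part of the original answer:

// Hypothetical Express route that streams the ZIP straight to the browser.
app.get('/download-batch', function(req, res) {
    var imageUrls = [
        'http://example.com/image1.jpg',
        'http://example.com/image2.jpg'
    ];

    // res.attachment sets Content-Disposition so the browser treats it as a download.
    res.attachment('images.zip');

    zipURLs(imageUrls, res);
});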

how to read an incomplete file and wait for new data in nodejs

I have a UDP client that grabs some data from another source and writes it to a file on the server. Since this is a large amount of data, I don't want the end user to wait until it is fully written to the server before they can download it. So I made a Node.js server that grabs the latest data from the file and sends it to the user.
Here is the code:
var stream = fs.createReadStream(filename)
    .on("data", function(data) {
        response.write(data);
    });
The problem here is that if the download starts when the file is only, for example, 10 MB, the read stream will only read the file up to 10 MB. Even if two minutes later the file has grown to 100 MB, the stream will never know about the new data. How can I do this in Node? I would like to somehow refresh the fs state, or perhaps wait for new data using the fs module. Or is there some kind of fs file-content watcher?
EDIT:
I think the code below better describes what I would like to achieve; however, it keeps reading forever and I don't have anything from fs.read that helps me stop:
fs.open(filename, 'r', function(err, fd) {
    var bufferSize = 1000,
        chunkSize = 512,
        buffer = new Buffer(bufferSize),
        bytesRead = 0;

    while (true) { // check if the file has new content inside
        fs.read(fd, buffer, 0, chunkSize, bytesRead);
        bytesRead += buffer.length;
    }
});
Node has a built-in method for this in the fs module. It is tagged as unstable, so it can change in the future.
It's called fs.watchFile(filename[, options], listener).
You can read more about it here: https://nodejs.org/api/fs.html#fs_fs_watchfile_filename_options_listener
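As a rough illustration of how fs.watchFile could be combined with offset reads to keep sending newly appended data (the sendNewBytes helper is an assumption of mine; filename and response are taken from the question's code):

var fs = require('fs');

var bytesSent = 0;

// Send whatever has been appended since the last read to the response.
function sendNewBytes() {
    var size = fs.statSync(filename).size;
    if (size <= bytesSent) return;

    fs.createReadStream(filename, { start: bytesSent, end: size - 1 })
        .on('data', function(chunk) {
            response.write(chunk);
        });
    bytesSent = size;
}

sendNewBytes(); // send what is already there

// Re-check whenever the file changes (fs.watchFile is polling-based; interval in ms).
fs.watchFile(filename, { interval: 500 }, function(curr, prev) {
    if (curr.size > prev.size) {
        sendNewBytes();
    }
});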
But I highly suggest you use one of the good, actively maintained modules, like watchr.
From its readme:
Better file system watching for Node.js. Provides a normalised API over the file watching APIs of different node versions, nested/recursive file and directory watching, and accurate detailed events for file/directory changes, deletions and creations.
The module page is here: https://github.com/bevry/watchr
(I've used the module in a couple of projects and it works great; I'm not related to it in any other way.)
You need to store the last known size of the file somewhere (a database, for example).
Read the file size first, then load your file.
Then make a script that checks whether the file has changed.
You can poll the size with jquery.post, compare it with the stored result, and decide in JavaScript whether you need to reload.
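A rough sketch of what the server side of that polling approach could look like; the Express app, the /filesize route, and the data/output.bin path are all hypothetical:

var fs = require('fs');
var express = require('express');
var app = express();

// Report the current size of the file being written, so the client can
// compare it with the last size it saw and decide whether to fetch again.
app.post('/filesize', function(req, res) {
    fs.stat('data/output.bin', function(err, stats) {
        if (err) return res.status(500).json({ error: err.message });
        res.json({ size: stats.size });
    });
});

app.listen(3000);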

Handling chunked responses from process.stdout 'data' event

I have some code which I can't seem to fix. It looks as follows:
var childProcess = require('child_process');
var spawn = childProcess.spawn;

var child = spawn('./simulator', []);

child.stdout.on('data', function(data) {
    console.log(data);
});
This is all at the backend of my web application, which runs a specific type of simulation. The simulator executable is a C program that runs a loop waiting to be passed data via its standard input. When the inputs for the simulation come in (i.e. from the client), I parse them and then write data to the child process's stdin as follows:
child.stdin.write(INPUTS);
Now the data coming back is 40,000 bytes, give or take, but it seems to be getting broken into chunks of 8192 bytes. I've tried changing the standard output buffer of the C program, but that doesn't fix it. Is there a limit on the size of a 'data' event imposed by node.js? I need the data to come back as one chunk.
The buffer chunk sizes are applied in node. Nothing you do outside of node will solve the problem. There is no way to get what you want from node without a little extra work in your messaging protocol. Any message larger than the chunk size will be chunked. There are two ways you can handle this issue.
+ If you know the total output size before you start streaming out of C, prepend the message length to the data so the node process knows how many bytes to collect before the message is complete.
+ Determine a special character you can append to each message you send from the C program. When node sees that character, it knows it has reached the end of that message (see the sketch below).
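A minimal sketch of the delimiter approach on the node side, assuming the C program terminates each message with a newline; handleCompleteMessage is a placeholder for whatever you do with a full message:

var buffered = '';

child.stdout.on('data', function(chunk) {
    buffered += chunk;

    // Split on the delimiter; the last element is an incomplete message
    // (or an empty string) and stays buffered for the next chunk.
    var messages = buffered.split('\n');
    buffered = messages.pop();

    messages.forEach(function(message) {
        handleCompleteMessage(message); // hypothetical handler for one full message
    });
});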
If you are dealing with IO in a web application, you really want to stick with the async methods. You need something like the following (untested); there is a good example of how to consume the Stream API in the docs.
var data = '';

child.stdout.on('data', function(chunk) {
    data += chunk;
});

child.stdout.on('end', function() {
    // do something with var data
});
I ran into the same problem. I tried many different things and was starting to get annoyed. I tried prepending and appending with special characters. Maybe I was stupid but I just couldn't get it right.
Then I ran into a module called linerstream, which basically splits every chunk on line endings, so each 'data' event is one complete line rather than an arbitrary chunk. You can use it like this:
var Linerstream = require('linerstream');

child.stdout.pipe(new Linerstream()).on('data', (data) => {
    // data here is complete and not chunked
});
The important part is that you do have to write each message to stdout as a line that ends with a newline; otherwise it doesn't know where the message ends.
I can say this worked for me. Hopefully it helps other people.
ppejovic's solution works, but I prefer concat-stream.
var concat = require('concat-stream');

child.stdout.pipe(concat(function(data) {
    // all your data, ready to be used
}));
There are a number of good stream helpers worth looking into based on your problem area. Take a look at substack's stream-handbook.

Memcache in node.js is returning object with varying size

Long-time reader, first-time poster.
I am using node v0.6.6 on OS X 10.7. I have not yet tried this in any other environment. I am using this client: https://github.com/elbart/node-memcache
When I use the following code, data randomly contains a few more bytes (as reported by console.log()), which leads to this image: http://imgur.com/NuaK4 (and many other JPGs do this). The favicon seems OK and HTML/CSS/JavaScript all work.
In other words: if I request the image, ~70% of the time it is returned correctly; the other 30% of the time, data reports a few more bytes and the image appears corrupt in the browser.
client.get(key, function(err, data) {
    if (err) throw err;
    if (data) {
        res.writeHead(200, {'Content-Type': type, 'Content-Length': data.length});
        console.log('Sending with length: ' + data.length);
        res.end(data, 'binary');
    }
});
I have been messing with this for several hours and I can honestly say I am stumped; I am hoping someone can show me the error of my ways. I tried searching for a way to properly store binary data with memcache, but couldn't find anything relevant.
Extra information: it happens with various JPG images; all images are around 100-300 KB or less in file size. For example, one image has reported the following sizes: 286442, 286443, and 286441 bytes. This problem DOES NOT occur if I read the data straight from disk and serve it with node.
Thanks in advance.
Edit: I updated my node version and the issue persists. The actual test source photo and corrupt photo can be found in my comment below (Stack Overflow doesn't permit more links).
Elbart's node-memcache does not handle binary values correctly for the reasons Steve Campbell suggests: node-memcache does not give the client direct access to the buffer. By stringifying the buffers, binary data is corrupted.
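A small illustration (not from the original answer) of why round-tripping a binary buffer through a string both corrupts it and inflates the byte count, which matches the "a few more bytes" symptom (exact replacement behaviour can vary slightly between Node versions):

// Bytes like these are invalid as UTF-8; decoding them to a string replaces
// each one with the 3-byte U+FFFD replacement character, so the re-encoded
// buffer is larger than, and different from, the original.
var original = new Buffer([0xff, 0xfe, 0xfd, 0xfc]);
var roundTripped = new Buffer(original.toString('utf8'), 'utf8');

console.log(original.length);              // 4
console.log(roundTripped.length);          // 12
console.log(original.toString('hex'));     // fffefdfc
console.log(roundTripped.toString('hex')); // efbfbdefbfbdefbfbdefbfbd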
Use the 'mc' npm module (npm install mc).
Caveat: I'm the author of the 'mc' module. I wrote it specifically to handle binary values over memcache's text protocol.
