Node request write to file corrupt - node.js

I have a GET request in Node that successfully receives data from an API.
When I pipe that response directly to a file like this, it works: the file created is a valid, readable PDF (as I expect to receive from the API).
var http = require('request');
var fs = require('fs');
http.get(
{
url:'',
headers:{}
})
.pipe(fs.createWriteStream('./report.pdf'));
Simple. However, the file gets corrupted if I use the event emitters of the request like this:
http.get(
{
url:'',
headers:{}
})
.on('error', function (err) {
console.log(err);
})
.on('data', function(data) {
file += data;
})
.on('end', function() {
var stream = fs.createWriteStream('./report.pdf');
stream.write(file, function() {
stream.end();
});
});
I have tried all manner of writing this file and it always ends in a totally blank PDF; the only time the PDF is valid is via the pipe method.
When I console.log the events, the sequence seems to be correct, i.e. all chunks are received and then end fires last.
This makes it impossible to do anything after the pipe. What is pipe doing differently from the write stream?

I assume that you initialize file as a string:
var file = '';
Then, in your data handler, you add the new chunk of data to it:
file += data;
However, this performs an implicit conversion of each chunk to a (UTF-8-encoded) string. If the data is actually binary, as with a PDF, byte sequences that are not valid UTF-8 get replaced during the conversion, which corrupts the output data.
Instead, you want to collect the data chunks, which are Buffer instances, and use Buffer.concat() to concatenate all those buffers into one large (binary) buffer:
var file = [];
...
.on('data', function(data) {
file.push(data);
})
.on('end', function() {
file = Buffer.concat(file);
...
});
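The difference between the two approaches can be reproduced without any network request. The following is a minimal sketch; the byte values are made up for illustration and simply include a few bytes that are not valid UTF-8 on their own.

```javascript
// Sketch: why `file += chunk` corrupts binary data while Buffer.concat() does not.
// The bytes are invented for this example; 0xff, 0xfe and 0x80 are not valid UTF-8 alone.
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xff, 0xfe, 0x00, 0x80]);

// Pretend the response arrived in two chunks.
const chunks = [original.subarray(0, 5), original.subarray(5)];

// Approach from the question: implicit Buffer -> string conversion on every chunk.
let asString = '';
for (const chunk of chunks) {
  asString += chunk; // equivalent to asString += chunk.toString('utf8')
}
const fromString = Buffer.from(asString, 'utf8');

// Approach from the answer: keep the chunks as Buffers and join them once.
const fromBuffers = Buffer.concat(chunks);

console.log(fromBuffers.equals(original)); // true
console.log(fromString.equals(original));  // false
```

Each invalid byte becomes the Unicode replacement character when converted to a string, so the round trip through a string cannot restore the original bytes.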

If you wanted to do something after the file is done being written by pipe, you can add an event listener for finish on the object returned by pipe.
.pipe(fs.createWriteStream('./report.pdf'))
.on('finish', function done() { /* the file has been written */ });
Source: https://nodejs.org/api/stream.html#stream_event_finish
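As a rough sketch of how the finish event can be used to run code after the pipe has completed, the helper below wraps it in a Promise. The name saveStream, the temporary file path and the in-memory stand-in for the HTTP response are all assumptions made for this example, not part of the original answer.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable } = require('stream');

// Hypothetical helper: pipe `source` into a file and resolve once the
// write stream emits 'finish' (all data has been flushed to the file).
function saveStream(source, destPath) {
  return new Promise((resolve, reject) => {
    const out = fs.createWriteStream(destPath);
    source.on('error', reject); // pipe() does not forward errors from the source
    out.on('error', reject);
    source.pipe(out).on('finish', () => resolve(destPath));
  });
}

// Usage with an in-memory stream standing in for the HTTP response.
const demoPath = path.join(os.tmpdir(), `report-${process.pid}.pdf`);
const fakeResponse = Readable.from([Buffer.from('%PDF-'), Buffer.from([0xff, 0x00])]);
const saved = saveStream(fakeResponse, demoPath).then((p) => {
  console.log('written', fs.statSync(p).size, 'bytes');
  return p;
});
```

Because the chunks stay Buffers all the way through, the binary content is written unchanged.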

Related

createWriteStream in node seems to execute 'end' event before 'data'

I am unable to understand how the event loop is processing my snippet. What I am trying to achieve is:
1. read from a CSV
2. download a resource found in the CSV
3. upload it to S3
4. write it into a new CSV file
const readAndUpload = () => {
fs.createReadStream('filename.csv')
.pipe(csv())
.on('data', ((row: any) => {
const file = fs.createWriteStream("file.jpg");
var url = new URL(row.imageURL)
// choose whether to make an http or https request
let client = (url.protocol=="https:") ? https : http
const request = client.get(row.imageURL, function(response:any) {
// file save
response.pipe(file);
console.log('file saved')
let filePath = "file.jpg";
let params = {
Bucket: 'bucket-name',
Body : fs.createReadStream(filePath),
Key : "filename.jpg"
};
// upload to s3
s3.upload(params, function (err: any, data: any) {
//handle error
if (err) {
console.log("Error", err);
}
//success
if (data) {
console.log("Uploaded in:", data.Location);
row.imageURL = data.Location
writeData.push(row)
// console.log(writeData)
}
});
});
}))
.on('end', () => {
console.log("done reading")
const csvWriter = createCsvWriter({
path: 'out.csv',
header: [
{id: 'id', title: 'some title'}
]
});
csvWriter
.writeRecords(writeData)
.then(() => console.log("The CSV file was written successfully"))
})
}
Going by the log statements that I have added, "done reading" and "The CSV file was written successfully" are printed before "file saved". My understanding was that the end event is called after the data event, so I am unsure where I am going wrong.
Thank you for reading!
I'm not sure if this is part of the problem or not, but you've got an extra set of parens in this part of the code. Change this:
.on('data', ((row: any) => {
.....
})).on('end', () => {
to this:
.on('data', (row: any) => {
.....
}).on('end', () => {
And, if the event handlers are set up properly, your .on('data', ...) event handler gets called before the .on('end', ....) for the same stream. If you put this:
console.log('at start of data event handler');
as the first line in that event handler, you will see it get called first.
But your data event handler makes multiple asynchronous calls, and nothing in your code makes the end event wait for all of that processing to finish. Since that processing takes a while, it's natural that the end event occurs before the asynchronous code started in the data handler has completed.
In addition, if you can ever have more than one data event (which one normally would), you're going to have multiple data events in flight at the same time, and since you're using a fixed filename, they will probably overwrite each other.
The usual way to solve something like this is to call stream.pause() to pause the read stream at the start of the data event processing and then, when all your asynchronous work is done, call stream.resume() to let it continue.
You will need to get the right stream in order to pause and resume. You could do something like this:
let stream = fs.createReadStream('filename.csv')
.pipe(csv());
stream.on('data', (row: any) => {
stream.pause();
....
});
Then, deep inside your s3.upload() callback, you can call stream.resume(). You will also need much, much better error handling than you have, or things will just get stuck if you get an error.
It also looks like you have other concurrency issues too where you call:
response.pipe(file);
And you then attempt to use the file without actually waiting for that .pipe() operation to be done (which is also asynchronous). Overall, this whole logic really needs a major cleanup. I don't understand what exactly you're trying to do in all the different steps to know how to write a totally clean and simpler version.
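A different technique from the pause()/resume() approach described above is to consume the stream with async iteration, which pulls the next row only after the loop body has finished. The sketch below assumes a Node version that supports for await on readable streams. processRow is a hypothetical stand-in for the download and upload steps, and an in-memory stream replaces the CSV parser.

```javascript
const { Readable } = require('stream');

// Hypothetical stand-in for the download + upload steps: any async work per row.
function processRow(row) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ ...row, imageURL: `uploaded:${row.id}` }), 10);
  });
}

async function readAndProcess(rowStream) {
  const results = [];
  // The loop body must finish before the next row is pulled from the stream,
  // so rows are handled one at a time and never overlap.
  for await (const row of rowStream) {
    results.push(await processRow(row));
  }
  // Reached only after every row has been fully processed.
  return results;
}

// Usage with an in-memory stream in place of fs.createReadStream(...).pipe(csv()).
const rows = Readable.from([{ id: 1 }, { id: 2 }, { id: 3 }]);
const done = readAndProcess(rows).then((results) => {
  console.log(results.map((r) => r.imageURL));
  return results;
});
```

With this shape, the code that writes the output CSV can go after the loop instead of in an end handler, so it cannot run while a row is still being processed.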

Why the streams are not seen as string on client side

I have made a simple server and client program where the server reads data from a file and sends it to the client through a TCP socket. But the data I am getting is an object and not a simple string.
So why can't I see the data as plain text, as it is in my data.txt file?
An explanation with an example would be appreciated.
Here is my code :-
SERVER CODE
const fs = require('fs');
const net = require('net');
const readableData = fs.createReadStream('data.txt', 'utf8');
const server = net.createServer(socket => {
socket.on('data', chunk => {
console.log(chunk.toString());
socket.write(JSON.stringify(readableData));
});
socket.on('end', () => {
console.log("done");
})
socket.on('close', () => {
console.log("closed")
})
});
server.listen(3000);
CLIENT CODE
const fs = require('fs');
const net = require('net');
const client = new net.Socket();
client.connect('3000', () => {
console.log("connected");
client.write("Server please send the data");
});
client.on('data', chunk => {
console.log("Data recieved:" + chunk.toString());
});
client.on('finish', () => {
console.log("Work completed");
})
client.on('close', () => {
console.log("connection closed");
})
And here is my data.txt file which has simple data
Hello client how are you ?
And the output I'm getting is here :-
Data recieved:{"_readableState":{"objectMode":false,"highWaterMark":65536,"buffer":{"head":{"data":"Hello client how are you ?","next":null},"tail":{"data":"Hello client how are you ?","next":null},"length":1},"length":26,"pipes":null,"pipesCount":0,"flowing":null,"ended":true,"endEmitted":false,"reading":false,"sync":false,"needReadable":false,"emittedReadable":false,"readableListening":false,"resumeScheduled":false,"paused":true,"emitClose":false,"autoDestroy":false,"destroyed":false,"defaultEncoding":"utf8","awaitDrain":0,"readingMore":false,"decoder":{"encoding":"utf8"},"encoding":"utf8"},"readable":true,"_events":{},"_eventsCount":1,"path":"data.txt","fd":35,"flags":"r","mode":438,"end":null,"autoClose":true,"bytesRead":26,"closed":false}
The question is: why am I not able to see the data as plain text on the client side, as it is in the data.txt file?
Your variable readableData contains a node.js stream object. That's what that variable is. It's only of use in the current node.js instance so it doesn't do anything useful to try to send that stream object to the client.
If you want to get all the data from that 'data.txt' file, you have several choices.
1. You can just read the whole file into a local variable with fs.readFile() and then send all that data with socket.write().
2. You can create a new stream attached to the file for each new incoming request and then as the data comes in on the readStream, you can send it out on the socket (this is often referred to as piping one stream into another). If you use higher level server constructs such as an http server, they make piping real easy.
Option #1 would look like this:
const server = net.createServer(socket => {
socket.on('data', chunk => {
console.log(chunk.toString());
fs.readFile('data.txt', 'utf8', (err, data) => {
if (err) {
// insert error handling here
console.log(err);
} else {
socket.write(data);
}
});
});
socket.on('end', () => {
console.log("done");
})
socket.on('close', () => {
console.log("closed")
})
});
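The second option, piping a fresh read stream into the socket, might look roughly like the sketch below. The function name createFileServer and its filePath parameter are assumptions for this example; the question's file is 'data.txt'.

```javascript
const fs = require('fs');
const net = require('net');

// Sketch of option 2: open a fresh read stream per connection and pipe it
// into the socket. `filePath` is a parameter here; the question uses 'data.txt'.
function createFileServer(filePath) {
  return net.createServer((socket) => {
    // React to the first request chunk only, so the file is sent once per connection.
    socket.once('data', (chunk) => {
      console.log(chunk.toString());
      const fileStream = fs.createReadStream(filePath);
      fileStream.on('error', (err) => {
        console.log(err);
        socket.destroy();
      });
      // pipe() ends the socket once the whole file has been sent.
      fileStream.pipe(socket);
    });
    socket.on('error', (err) => console.log(err));
  });
}
```

To mirror the question's setup it would be started with createFileServer('data.txt').listen(3000). Unlike option 1, the file is never held in memory as a whole.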
FYI, you should also know that socket.on('data', chunk => {...}) can give you any size chunk of data. TCP streams do not make any guarantees about delivering the exact same chunks of data in the same pieces that they were originally sent in. They will come in order, but if you sent three 1k chunks from the other end, they might arrive as three separate 1k chunks, they might arrive as one 3k chunk or they might arrive as a whole bunch of much smaller chunks. How they arrive will often depend upon what intermediate transports and routers they had to travel over and if there were any recoverable issues along that transmission. For example, data sent over a satellite internet connection will probably arrive in small chunks because the needs of the transport broke it up into smaller pieces.
This means that reading any data over a plain TCP connection generally needs some sort of protocol so that the reader knows when they've gotten a full, meaningful chunk that they can process. If the data is plain text, it might be as simple a protocol as every message ends with a line feed character. But, if the data is more complex, then the protocol may need to be more complex.
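The simple "every message ends with a line feed" protocol mentioned above could be sketched as follows. createLineSplitter is a hypothetical helper name, and the example assumes the messages are UTF-8 text that never contains a line feed.

```javascript
const { StringDecoder } = require('string_decoder');

// Sketch of newline-delimited framing: feed it whatever chunks the socket
// delivers and it calls `onMessage` once per complete line.
function createLineSplitter(onMessage) {
  const decoder = new StringDecoder('utf8'); // keeps split multi-byte characters intact
  let pending = '';
  return function feed(chunk) {
    pending += decoder.write(chunk);
    let index;
    while ((index = pending.indexOf('\n')) !== -1) {
      onMessage(pending.slice(0, index));
      pending = pending.slice(index + 1);
    }
  };
}

// The same two messages, cut at arbitrary points as TCP might deliver them.
const messages = [];
const feed = createLineSplitter((msg) => messages.push(msg));
['Hel', 'lo client\nhow a', 're you ?\n'].forEach((piece) => feed(Buffer.from(piece)));
console.log(messages); // [ 'Hello client', 'how are you ?' ]
```

On a real connection the feed function would be passed to socket.on('data', feed), so the handler sees whole messages regardless of how the chunks arrive.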

How to await fields before processing the file stream in multipart form

I'm using SendGrid for receiving files via email. SendGrid parses the incoming emails and sends the files in a multipart form to an endpoint I have set up.
I don't want the files on my local disk, so I stream them straight to Amazon S3. This works perfectly.
But before I can stream to S3 I need to get hold of the destination mail address so I can work out the correct S3 folder. This is sent in a field named "to" in the form post. Unfortunately this field sometimes arrives after the files start arriving, so I need a way to await the to-field before I'm ready to take the stream.
I thought I could wrap the onField in a promise and await the to-field from within the onFile. But this concept seems to lock itself up when the field arrives after the file.
I'm new to both streams and promises. I would really appreciate it if someone could tell me how to do this.
This is the non working pseudoish code:
function sendGridUpload(req, res, next) {
var busboy = new Busboy({ headers: req.headers });
var awaitEmailAddress = new Promise(function(resolve, reject) {
busboy.on('field', function(fieldname, val, fieldnameTruncated, valTruncated) {
if(fieldname === 'to') {
resolve(val);
} else {
return;
}
});
});
busboy.on('file', function(fieldname, file, filename, encoding, mimetype) {
function findInbox(emailAddress) {
console.log('Got email address: ' + emailAddress);
..find the inbox and generate an s3Key
return s3Key;
}
function saveFileStream(s3Key) {
..pipe the file directly to S3
}
awaitEmailAddress.then(findInbox)
.then(saveFileStream)
.catch(function(err) {
log.error(err)
});
});
req.pipe(busboy);
}
I finally got this working. The solution is not very pretty, and I have actually switched to another concept (described at the end of the post).
To buffer the incoming data until the "to"-field arrives I used stream-buffers by @samcday. When I get hold of the to-field I release the readable stream to the pipes lined up for the data.
Here is the code (some parts omitted, but essential parts are there).
var streamBuffers = require('stream-buffers');
function postInboundMail(req, res, next) {
var busboy = new Busboy({ headers: req.headers});
//Sometimes the fields arrives after the files are streamed.
//We need the "to"-field before we are ready for the files
//Therefore the onField is wrapped in a promise which gets
//resolved when the to field arrives
var awaitEmailAddress = new Promise(function(resolve, reject) {
busboy.on('field', function(fieldname, val, fieldnameTruncated, valTruncated) {
var emailAddress;
if(fieldname === 'to') {
try {
emailAddress = emailRegexp.exec(val)[1]
resolve(emailAddress)
} catch(err) {
return reject(err);
}
} else {
return;
}
});
});
busboy.on('file', function(fieldname, file, filename, encoding, mimetype) {
var inbox;
//I'm using readableStreamBuffer to accumulate the data before
//I get the email field so I can send the stream through to S3
var readBuf = new streamBuffers.ReadableStreamBuffer();
//I have to pause readBuf immediately. Otherwise stream-buffers starts
//sending as soon as I put data in it with put().
readBuf.pause();
function getInbox(emailAddress) {
return model.inbox.findOne({email: emailAddress})
.then(function(result) {
if(!result) return Promise.reject(new Error(`Inbox not found for ${emailAddress}`))
inbox = result;
return Promise.resolve();
});
}
function saveFileStream() {
console.log('=========== starting stream to S3 ========= ' + filename)
//Have to resume readBuf since we paused it before
readBuf.resume();
//file.save will approximately do the following:
// readBuf.pipe(gzip).pipe(encrypt).pipe(S3)
return model.file.save({
inbox: inbox,
fileStream: readBuf
});
}
awaitEmailAddress.then(getInbox)
.then(saveFileStream)
.catch(function(err) {
log.error(err)
});
file.on('data', function(data) {
//Fill readBuf with data as it arrives
readBuf.put(data);
});
file.on('end', function() {
//This was the only way I found to get the S3 streaming finished.
//destroySoon will let the pipes finish reading, but no more writes are allowed
readBuf.destroySoon()
});
});
busboy.on('finish', function() {
res.writeHead(202, { Connection: 'close', Location: '/' });
res.end();
});
req.pipe(busboy);
}
I would really like feedback on this solution, even though I'm not using it. I have a feeling that this can be done in a much simpler and more elegant way.
New solution:
Instead of waiting for the to-field I send the stream directly to S3. I figured that the more stuff I put in between the incoming stream and the S3 saving, the higher the risk of losing the incoming file due to a bug in my code. (SendGrid will eventually resend the file if I'm not responding with 200, but it will take some time.)
This is how I do it:
1. Save a placeholder for the file in the database
2. Pipe the stream to S3
3. Update the placeholder with more information as it arrives
This solution also gives me the opportunity to easily get hold of unsuccessful uploads since the placeholders for unsuccessful uploads will be incomplete.
//Michael

Node pipe stops working

My client sends an image file to the server. It works 5 times and then it suddenly stops. I am pretty new using streams and pipe so I am not sure what I am doing wrong.
Server Code
http.createServer(function(req, res) {
console.log("File received");
// This opens up the writeable stream to `output`
var name = "./test"+i+".jpg";
var writeStream = fs.createWriteStream(name);
// This pipes the POST data to the file
req.pipe(writeStream);
req.on('end', function () {
console.log("File saved");
i++;
});
// This is here incase any errors occur
writeStream.on('error', function (err) {
console.log(err);
});
}).listen(3000);
Client code
var request = require('request');
var fs = require('fs');
setInterval(function () {
var readStream = fs.createReadStream('./test.jpg');
readStream.on('open', function () {
// This just pipes the read stream to the response object (which goes to the client)
readStream.pipe(request.post('http://192.168.1.100:3000/test'));
console.log("Send file to server");
});
}, 1000);
This behaves like a resource exhaustion issue. I'm not sure which calls throw errors and which just return. Does the server accept the connection on the 6th call? Does the write stream open? Does the pipe start?
Try ending the connection and closing the pipe after the image is saved. Maybe close the write stream too; I don't remember whether Node garbage-collects file descriptors.
I had to do the following on the server side to make this work :
res.statusCode = 200;
res.end();
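Put together with the original server, the fix might look like the sketch below. One plausible explanation, not confirmed in the thread, is that the client pools a limited number of sockets per host and each unanswered request keeps one of them occupied. createUploadServer and makePath are hypothetical names introduced for this example.

```javascript
const fs = require('fs');
const http = require('http');

// Sketch: save each upload and always send a response once the file is written.
// `makePath` is a hypothetical callback that returns the destination file name.
function createUploadServer(makePath) {
  return http.createServer((req, res) => {
    const writeStream = fs.createWriteStream(makePath());
    req.pipe(writeStream);
    writeStream.on('finish', () => {
      res.statusCode = 200;
      res.end(); // without a response the client's request never completes
    });
    writeStream.on('error', (err) => {
      console.log(err);
      res.statusCode = 500;
      res.end();
    });
  });
}
```

Responding from the finish handler of the write stream, rather than from the end event of the request, also means the client is told about success only after the data has been written.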

Download File, save it and read it again --> Error

I would like to download a file, write it to a temporary file, read it and give the readFileSync Buffer to a function. I tried this:
var file = fs.createWriteStream("temp.pdf")
var request = http.get(linkArray[1], function(response) {
response.on('data', function(data){
file.write(data)
}).on('end', function(){
postData(fs.readFileSync('temp.pdf'))
})
});
Sometimes it works, but sometimes it doesn't. My guess is that the file isn't written completely when it is read. (But then the 'end' event shouldn't be fired?!)
As you can see, I would like to download a bunch of files and do this. Do you have any advice on how to solve this? Maybe this isn't the best way to do it...
You shouldn't link streams with on('data'); you should use pipe. Pipe links the source stream's data events to writes on the destination and its end event to the destination's end().
var file = fs.createWriteStream("temp.pdf");
var request = http.get(linkArray[1], function(response) {
response.pipe(file).on('close', function(){
postData(fs.readFileSync('temp.pdf'));
});
});
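Also, you should consider using https://github.com/mikeal/request. Pass encoding: null so that body is a Buffer rather than a string, which matters for binary files such as PDFs:
var request = require('request');
// encoding: null makes request return the body as a Buffer
request.get({ url: linkArray[i], encoding: null }, function (err, response, body) {
if (err) return console.log(err);
postData(body);
});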
or
var request = require('request');
var file = fs.createWriteStream("temp.pdf");
request.get(linkArray[i]).pipe(file).on('close', function () {
postData(fs.readFileSync('temp.pdf'));
});
You need to call file.end(); at the top of your .on('end', ...) handler. The end() method itself is asynchronous, though, so you'll want to read the file once that's complete. E.g.,
var file = fs.createWriteStream("temp.pdf")
var request = http.get(linkArray[1], function(response) {
response.on('data', function(data){
file.write(data)
}).on('end', function(){
file.end(function() {
postData(fs.readFileSync('temp.pdf'))
});
})
});
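On newer Node versions, the same "read the file only after it is fully written" guarantee can be obtained with pipeline from stream/promises, which also propagates errors from either stream. This is a sketch under that assumption; download is a hypothetical helper name and plain http is assumed for the URL.

```javascript
const fs = require('fs');
const http = require('http');
const { pipeline } = require('stream/promises');

// Sketch: resolve only after the whole response body has been written to the file.
function download(url, destPath) {
  return new Promise((resolve, reject) => {
    http.get(url, (response) => {
      pipeline(response, fs.createWriteStream(destPath))
        .then(() => resolve(destPath), reject);
    }).on('error', reject);
  });
}
```

With the names from the question, usage would be along the lines of download(linkArray[1], 'temp.pdf').then(function (p) { postData(fs.readFileSync(p)); }).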

Resources