I am working on a parser in node.js: I request a website and parse the HTML.
I am using require("htmlparser") for the parsing and require('follow-redirects').http for the requests.
requestSite(options);
console.log("Done\n");
parser.done();
function requestSite(options) {
http.get(options, function(res) {
console.log("Got response: " + res.statusCode);
res.setEncoding('utf8');
res.on('data', function (chunk) {
parser.parseChunk(chunk.toString('utf8'));
});
}).on('error', function(e) {
console.log("Got error: " + e.message);
});
}
My problem is that done() is called before the requestSite function has finished processing its chunks, resulting in the following error:
Writing to the handler after done() called is not allowed without
calling a reset()
How can I wait for the chunks to finish?
You are not taking into account the asynchronous nature of node.js. It will call requestSite and then move on to execute the next statement, calling parser.done() before requestSite has finished. Do this instead:
requestSite(options, parser);
console.log("Done\n");
function requestSite(options, parser) {
http.get(options, function(res) {
console.log("Got response: " + res.statusCode);
res.setEncoding('utf8');
res.on('data', function (chunk) {
parser.parseChunk(chunk.toString('utf8'));
})
.on("end", function(){
parser.done();
})
}).on('error', function(e) {
console.log("Got error: " + e.message);
});
}
Well, this is the basics of node.js and its event-driven architecture.
Node is not line-by-line programming like PHP, Python, etc.
Look at this simple example:
console.log(1);
setTimeout(function () {
console.log(2);
}, 0);
console.log(3);
You might think it should print 1, 2, 3,
but it will actually print 1, 3, 2.
In your example you should move the
parser.done();
to the 'end' event of the HTTP response.
Currently you only have a handler for the 'data' chunks, so add a handler for the 'end' event and call parser.done() there.
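For example, a minimal sketch reusing the parser calls from the question:
res.on('data', function (chunk) {
    parser.parseChunk(chunk.toString('utf8'));
});
res.on('end', function () {
    // all chunks have been received, so it is now safe to finish
    parser.done();
});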
Related
I am implementing https.request per these instructions (https://nodejs.org/api/https.html#httpsrequesturl-options-callback), except instead of writing to stdout, I am writing to a file. I want the end to wait until the file writing process is complete. How can I do this?
process.stdout.write(d);
is changed to
fs.writeFile(path, d, err => {
if (err) {
console.error(err);
} else {
console.log("data => " path)
}
})
This is the entire code
const https = require('node:https');
const options = {
hostname: 'encrypted.google.com',
port: 443,
path: '/',
method: 'GET'
};
const req = https.request(options, (res) => {
console.log('statusCode:', res.statusCode);
console.log('headers:', res.headers);
res.on('data', (d) => {
process.stdout.write(d);
});
});
req.on('error', (e) => {
console.error(e);
});
req.end();
UPDATE
MTN posted a solution that works, but I realized my code is slightly more complex: I read the response in chunks and save it at the end. MTN's solution finishes early. Here is the code. Can anyone help me fix it?
const request = https.request(url, (response, error) => {
if (error) {console.log(error)}
let data = '';
response.on('data', (chunk) => {
data = data + chunk.toString();
});
response.on('end', () => {
fs.writeFile(path, data, err => {
if (err) {
console.error(err);
} else {
console.log("data => " path)
}
})
})
})
request.on('error', (error) => {
console.error(error);
})
request.end()
The immediate answer would be that whatever should happen after the file was written would have to go into the callback, after your console.log. (There is nothing in your code that looks like it's supposed to run afterwards though.)
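For example, a minimal sketch reusing the writeFile call from your question (path and d are the ones you already have):
fs.writeFile(path, d, err => {
    if (err) {
        console.error(err);
        return;
    }
    console.log("data => " + path);
    // whatever should happen after the file was written goes here
});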
But:
Your code would be a lot simpler if you'd:
1. Use a library for sending HTTP requests instead of the raw https module. (For example: got, axios, node-fetch, ...) Not only do these take care of things like reading the body for you, they also have a promise interface which allows you to do point 2.
2. Rewrite the code to use async/await.
Here is an example with got:
import got from 'got'
import { writeFile } from 'fs/promises'
const response = await got('https://encrypted.google.com').text()
await writeFile('test.html', response)
// Whatever should happen after the file was written would simply go here!
Note: This has to be an ES module because I used top-level await and import, and got doesn't even support CommonJS anymore. So either your package.json would have to have "type": "module" or the file extension would have to be .mjs.
You can use fs.writeFileSync() instead. It's synchronous, so it waits for the writing to be finished.
res.on("data", (d) => { fs.writeFile(/* Enter params here */) })
If you stick with fs.writeFile, add whatever you want to do next inside its callback function.
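A minimal sketch of that idea, assuming the chunks are buffered first and the file is written once the response ends (path is whatever target file you use):
const fs = require('fs');

let data = '';
res.on('data', (d) => {
    data += d;
});
res.on('end', () => {
    // writeFileSync blocks until the data is on disk,
    // so anything placed after it runs once the file is complete
    fs.writeFileSync(path, data);
    console.log("data => " + path);
});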
I read the node.js documentation; it says that the only difference between these two functions is that http.get will execute req.end() automatically. But I ran into a weird problem. I wrote some code like this:
http.get(url,function(res){
var data="";
res.on('data',function(chunk){
data+=chunk;
});
res.on('end',function(){
console.log(data);
});
}).on("error",function(){
});
Here, the data works fine. But when I use http.request, something is wrong:
var pReq = http.request(options, function(pRes) {
var data=" ";
pRes.on('data',function (chunk) {
data+=chunk;
});
pRes.on('end',function() {
console.log(data)
});
}).on('error', function(e) {
});
Here, I always get garbled output. I'm new to node; are there any mistakes in the second one?
I'm aware that there are several questions related to mine, but I didn't find any of them useful:
this one doesn't apply to my case; I'm actually getting the answer, it's the contents that I can't get.
on this one, on the other hand, the problem is wrong handling of an asynchronous call, which is not my case.
there, well, I really didn't fully understand that question.
And so on...
Then, I think this is a legitimate question. I'm actually performing some encryption in my server (express routing in node) through a post request:
app.post('/encrypt', encrypt);
Encrypt is doing:
function encrypt(req,res) {
if(req.body.key && req.body.message) {
var encryptedMessage = Encrypter.encrypt(req.body.key,req.body.message);
return res.status(200).json({ message: encryptedMessage });
}
res.status(409).json({ message: 'the message could not be encrypted, no key found' });
}
So, I tested this via console.log, and it's working. When the server receives the request, the encrypted message is being generated.
At the same time, I'm testing my thing with mocha and I'm doing it like so:
describe('# Here is where the fun starts ', function () {
/**
* Start and stop the server
*/
before(function () {
server.listen(port);
});
after(function () {
server.close();
});
it('Requesting an encrypted message', function(done) {
var postData = querystring.stringify({
key : key,
message : message
});
var options = {
hostname: hostname,
port: port,
path: '/encrypt',
method: 'POST',
headers: {
'Content-Type': 'application/x-www-form-urlencoded',
'Content-Length': postData.length
}
};
var req = http.request(options, function(res) {
res.statusCode.should.equal(200);
var encryptedMessage = res.message;
encryptedMessage.should.not.equal(message);
done();
});
req.on('error', function(e) {
//I'm aware should.fail doesn't work like this
should.fail('problem with request: ' + e.message);
});
req.write(postData);
req.end();
});
});
So, whenever I execute the tests, it fails with Uncaught TypeError: Cannot read property 'should' of undefined because res.message does not exist.
None of the res.on events (data, end, ...) is working, so I suppose the data should be available from there. First I had this:
var req = http.request(options, function(res) {
res.statusCode.should.equal(200);
var encryptedMessage;
res.on('data', function (chunk) {
console.log('BODY: ' + chunk);
encryptedMessage = chunk.message;
});
encryptedMessage.should.not.equal(message);
done();
});
But res.on was never reached (the console.log didn't show anything). I'm therefore a bit stuck here. I'm surely doing some basic stuff wrong, but I don't have a clue, and the many questions I found don't seem to apply to my case.
Weirdly enough, if I launch a test server and then curl it
curl --data "key=secret&message=veryimportantstuffiabsolutellyneedtoprotect" localhost:2409/encrypt
curl just waits ad aeternam.
Actually I was doing it properly at the beginning, and the problem was indeed the same as in the second question I mentioned: I was "clearing" my context with done() before the post data arrived. The solution is:
var req = http.request(options, function(res) {
res.statusCode.should.equal(200);
res.on('data', function(data) {
encryptedMessage = JSON.parse(data).message;
encryptedMessage.should.not.equal(message);
done();
});
});
This way, done() is only called once the data has been processed. Otherwise, mocha will not wait for the answer.
I am trying to learn node.js.
I am trying to understand streams and piping.
Is it possible to pipe the response of an http request to console.log?
I know how to do this by binding a handler to the data event but I am more interested in streaming it to the console.
http.get(url, function(response) {
response.pipe(console.log);
response.on('end', function() {
console.log('finished');
});
});
console.log is just a function that writes to the process's stdout stream, adding a newline.
Note that the following is example code
console.log = function(d) {
process.stdout.write(d + '\n');
};
Piping to process.stdout does exactly the same thing.
http.get(url, function(response) {
response.pipe(process.stdout);
response.on('end', function() {
console.log('finished');
});
});
Note that you can also write chunks directly, since process.stdout.write accepts strings and Buffers:
response.on('data', (d) => process.stdout.write(d));
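If you want console.log-style formatting applied to each chunk, one option (just a sketch, reusing the url variable from above) is to pipe through a small Transform stream:
const { Transform } = require('stream');

// append a newline to every chunk, roughly like console.log does
const newlinePerChunk = new Transform({
    transform(chunk, encoding, callback) {
        callback(null, chunk.toString() + '\n');
    }
});

http.get(url, function (response) {
    response.pipe(newlinePerChunk).pipe(process.stdout);
});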
I successfully wrote an app in node.js using http.get. The problem is that if a page doesn't exist, it raises an error that terminates the app. Is there any way to declare a timeout? After the timeout it should stop waiting for the response and let the app continue (as if it were written synchronously).
Thanks for any advice...
I'm answering on the assumption that you are trying to retrieve info via http.get.
You can validate using the response status code.
http.get(url, function(res) {
if (res.statusCode !== 200) {
// your custom handler
} else {
var body = '';
res.on('data', function(chunk) {
body += chunk;
});
res.on('end', function() {
console.log(body);
});
}
}).on('error', function(e) {
console.log("error: ", e);
});
A fully implemented example can be found here: https://gist.github.com/pasupulaphani/9630789
You can set a timeout and emit an error/abort if the request takes too long, but this has to be done in application logic.
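For example, a minimal sketch using the request object's setTimeout (the 5000 ms value is arbitrary):
var req = http.get(url, function (res) {
    // handle the response as usual
});

// abort the request if the socket is idle for 5 seconds
req.setTimeout(5000, function () {
    req.abort();
});

req.on('error', function (e) {
    // aborting triggers the error handler, so the app keeps running
    console.log("error: ", e);
});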
I found a better, more sophisticated solution for handling timeouts in another post:
https://stackoverflow.com/a/12815321/2073176