How to best extract content body from a nodejs net response? - node.js

Using the following code:
var net = require("net");
var client = new net.Socket();

client.connect(8080, "localhost", function() {
    client.write("GET /edsa-jox/testjson.json HTTP/1.1\r\n");
    client.write("Accept-Encoding: gzip\r\n");
    client.write("Host: localhost:8080\r\n\r\n");
});

client.on("data", function(data) {
    console.log(data.toString("utf-8", 0, data.length));
});
I get the following response:
HTTP/1.1 200 OK
Date: Thu, 20 May 2021 22:45:26 GMT
Server: Apache/2.4.25 (Win32) PHP/5.6.30
Last-Modified: Thu, 20 May 2021 20:14:17 GMT
ETag: "1f-5c2c89677c5c7"
Accept-Ranges: bytes
Content-Length: 31
Content-Type: application/json

{"message":"message from json"}
And this response is shown in the console immediately. But since it is coming from the "data" event, I guess it would have arrived in chunks if the response had been bigger.
So I also tested with the following (all else equal):
var data = "";

client.on("data", function(d) {
    console.log("1");
    data += d.toString("utf-8", 0, d.length);
});

client.on("end", function() {
    console.log(data);
});
I was thinking I could use the "end" event to be sure that I had the full set of data before doing something else. That worked, I guess, but the unexpected thing was that the "1" was shown immediately while it took a couple of seconds before the "end" event was triggered.
Question 1) Why is there such a delay between the last executed "data" event and the "end" event? Is there a better way to do it?
Question 2) Given the above response, which contains both a bunch of headers and a content body, what is the best approach to extract the body part?
Note: I want to do this with the net library, not fetch or http (or any other abstraction). I want it to be as fast as possible.

I can only see two valid reasons to do it all by hand:
extreme speed needs => then you should consider using Go or another compiled language
learning (always interesting)
I would recommend using express, or any other npm package, to deal with all of this without reinventing the wheel.
However, I'll help you with what I know.
The first thing is to properly decode UTF-8 strings. You need to use string_decoder, because if a data chunk is incomplete and you call data.toString('utf8'), you will get a mangled character appended. It doesn't happen often, but it is hard to debug.
Here is a valid way to do it:
const { StringDecoder } = require('string_decoder');
var decoder = new StringDecoder('utf8');
var stdout = '';
stream.on('data', (data) => {
    stdout += decoder.write(data);
});
stream.on('end', () => {
    stdout += decoder.end(); // flush any buffered partial character
});
https://blog.raphaelpiccolo.com/post/827
Then, to answer your questions:
1) I don't know; it may be related to gzip. The server can be slow to close the connection, or it's the client's fault, or the network itself. I would try with other clients/servers to be sure, and start profiling.
2) You need to read the HTTP specifications to handle all edge cases (HTTP/1.x, WebSockets, HTTP/2). But I think you are lucky: headers are always separated from the body by a blank line (\r\n\r\n on the wire, though a lenient parser may also accept \n\n). If you loop through the decoded data coming from the stream, you can search for that separator; anything coming after it is the body.
One special case I can think of is keep-alive: if the client and server are in keep-alive mode, the connection won't be closed between calls, so you may need to parse the "Content-Length" header to know how many bytes of body to wait for.
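Putting those two points together, here is a minimal sketch (only a sketch: it assumes a single HTTP/1.1 response with a Content-Length header, no chunked transfer encoding, and no gzip actually applied, which is why the Accept-Encoding header is left out):
const net = require("net");
const { StringDecoder } = require("string_decoder");

const client = new net.Socket();
const decoder = new StringDecoder("utf8");

let raw = "";
let headers = null;
let contentLength = null;

client.connect(8080, "localhost", function() {
    client.write("GET /edsa-jox/testjson.json HTTP/1.1\r\n");
    client.write("Host: localhost:8080\r\n\r\n");
});

client.on("data", function(chunk) {
    raw += decoder.write(chunk);

    // Split headers from body once the blank line has arrived.
    if (headers === null) {
        const sep = raw.indexOf("\r\n\r\n");
        if (sep === -1) return;                      // headers not complete yet
        headers = raw.slice(0, sep);
        raw = raw.slice(sep + 4);                    // keep only the body part
        const m = headers.match(/^Content-Length:\s*(\d+)/im);
        contentLength = m ? parseInt(m[1], 10) : null;
    }

    // With keep-alive the "end" event may come much later, so rely on
    // Content-Length instead of waiting for it.
    if (contentLength !== null && Buffer.byteLength(raw, "utf8") >= contentLength) {
        console.log(raw);                            // {"message":"message from json"}
        client.end();                                // close the connection ourselves
    }
});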

Related

server response to webform: how to answer duplicates?

I'm running a small server that needs to receive webforms. The server checks the request and sends back "success" or "fail" which is then displayed on the form (client screen).
Now, checking the form may take a few seconds, so the user may be tempted to send the form again.
What is the correct way to ignore the second request?
So far I have come up with these solutions. If the form is a duplicate of the previous one:
1) Don't check it and send some error back (like 429, or 102, or some other one).
2) Close the connection directly: req.destroy(); res.destroy();
3) Ignore the request and return from the requestListener function.
With solutions 1 and 2, the form (on the client's browser) displays an error message (even though the first request they sent was correct, and so are the duplicates). So it's not a good option.
Solution 3 gives the desired outcome... but I'm not sure it is the right way to go about it: basically leaving req and res untouched instead of destroying them. Could this cause issues, or slow down the server (like... do they stack up?)? Of course the first request, once it has been checked, will be answered with the outcome code. My concern is with the duplicate requests, which I neither destroy nor answer...
Some details on the setup: a Node.js application using the very default code of the http module.
const http = require("http");

const requestListener = function (req, res) {
    var requestBody = '';
    req.on('data', (data) => {
        requestBody += data;
    });
    req.on('end', () => {
        if (isduplicate(requestBody))
            return;
        else
            evalRequest(requestBody, res);
    });
};
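For reference, the three options could look roughly like this inside the 'end' handler (a sketch only: isduplicate and evalRequest are the question's own functions, and the status codes are just examples):
req.on('end', () => {
    if (isduplicate(requestBody)) {
        // Option 1: answer with an error status (the client sees it as a failure)
        // res.writeHead(429); res.end();

        // Option 2: drop the connection without any response
        // req.destroy(); res.destroy();

        // Option 3: just return; the socket stays open until the client or the
        // server's own timeout closes it, so req/res do linger for a while, but
        // they are released once the connection is eventually torn down.
        return;
    }
    evalRequest(requestBody, res);
});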

Why is my node.js server chopping off the response body?

My Node server shows strange behaviour when it comes to a GET endpoint that replies with a big JSON (30-35 MB).
I am not using any npm package, just the core API.
The unexpected behaviour only happens when querying the server from the Internet; it behaves fine when queried from the local network.
The problem is that the server stops writing to the response after it writes the first 1260 bytes of the content body. It does not close the connection nor throw an error. Insomnia (the REST client I use for testing) just states that it received a 1260 B chunk. If I query the same endpoint from a local machine, it says that it received more and bigger chunks (a few KB each).
I don't even think the problem is caused by Node, but since I am on a clean Raspberry Pi (I installed Raspbian and then just Node v13.0.1) and the only process I run is node.js, I don't know how to find the source of the problem; there is no load balancer or web server to blame. Also, the public IP seems OK, and every other endpoint is working fine (they reply with less than 1260 B per request).
The code for that endpoint looks like this
const text = url.parse(req.url, true).query.text;
if (text.length > 4) {
    let results = await models.fullTextSearch(text);
    results = await results.map(async result => {
        result.Data = await models.FindData(result.ProductID, 30);
        return result;
    });
    results = await Promise.all(results);
    results = JSON.stringify(results);
    res.writeHead(200, {
        'Content-Type': 'application/json',
        'Transfer-Encoding': 'chunked',
        'Access-Control-Allow-Origin': '*',
        'Cache-Control': 'max-age=600'
    });
    res.write(results);
    res.end();
    break;
}
res.writeHead(403, {'Content-Type': 'text/plain', 'Access-Control-Allow-Origin': '*'});
res.write("You made an invalid request!");
break;
Here are a number of things to do in order to debug this:
Add console.log(results.length) to make sure the length of the data is what you expect it to be.
Add a callback to res.end(function() { console.log('finished sending response')}) to see if the http library thinks it is done sending the response.
Check the return value from res.write(). If it is false (indicating that not all data has yet been sent), add a handler for the drain event and see if it gets called (see the sketch after this list).
Try increasing the sending timeout with res.setTimeout() in case it's just taking too long to send all the data.
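As a rough illustration of the res.write()/drain check (a sketch only, with a hypothetical sendLargeBody helper rather than the question's route handler):
function sendLargeBody(res, body) {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    const flushed = res.write(body);
    console.log('res.write returned', flushed);      // false => data is still buffered
    if (!flushed) {
        // 'drain' fires once the buffered data has actually been handed to the socket.
        res.once('drain', () => console.log('response drained'));
    }
    res.end(() => console.log('finished sending response'));
}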

nodejs: http listen interferes with serialport reads

I'm trying to read in data from an Arduino using serialport, and serve it to a web browser.
Without the web server (i.e. if I just leave out that 'listen' call at the end), the serial data gets streamed in constantly, with the expected 5 updates per second shown in the console.
But when I add the 'listen' call, nothing is shown on the console until I make a request to the server with my web browser, at which point the console gets at most one log entry added (but sometimes still nothing).
The data shown in the web browser is the 'old' data from whenever the last request was made, not the current latest data from the Arduino. In other words, the serial data is processed a little after each HTTP request is served - not very useful.
const http = require('http');
const serialport = require('serialport');

var serial = new serialport('/dev/ttyUSB0', {
    baudRate: 115200
});

var jsonStr = '';
var jsonObj = {};

function handleData(data) {
    jsonStr += data;
    if (data.indexOf('}') > -1) {
        try {
            jsonObj = JSON.parse(jsonStr);
            console.log(jsonObj);
        }
        catch (e) {}
        jsonStr = '';
    }
}

serial.on('data', function (data) {
    handleData(data);
});

const app = http.createServer((request, response) => {
    response.writeHead(200, {"Content-Type": "text/html"});
    response.write(JSON.stringify(jsonObj));
    response.end();
});

app.listen(3000);
(The data coming from the Arduino is already a JSON string, which is why I'm looking for a '}' to start parsing it.)
I also tried using the 'readable' event for getting the serial data but it makes no difference:
serial.on('readable', function () {
    handleData(serial.read());
});
If I understand it correctly, the listen call itself is not blocking, it merely registers an event listener/callback to be triggered later. As an accepted answer in a related question says: "Think of server.listen(port) as being kinda similar to someElement.addEventListener('click', handler) in the browser."
If node.js is single threaded then why does server.listen() return?
So why is that 'listen' preventing the serial connection from receiving anything, except for briefly each time a request is served? Is there no way I can use these two features without them interfering with each other?
I discovered that the code worked as expected on a different computer, even though the other computer was using the exact same operating system (Fedora 20) and the exact same version of node.js (v10.15.0) which had been installed in the exact same way (built from source).
I also found that it worked ok on the original computer with a more recent version of Fedora (29).
This likely points to some slight difference in usb/serial drivers which I don't have the time, knowledge or need to delve into. I'll just use the configurations I know will work.

How to get realtime stock market quotes through an http request without flooding/hitting request limit (Algotrading)

I made a simple program that uses the Google Finance API to grab stock data through HTTP requests and does some calculations on them.
The Google API looks like this (it adds a new block of data every minute during trading hours):
https://www.google.com/finance/getprices?i=60&p=0d&f=d,o,h,l,c,v&df=cpct&q=AAPL
This works fine, however I have a huge list of stock tickers I need to get data for. In order to loop through them without hitting a request limit, I set a time interval of 2 seconds between the requests. There are over 5000 stocks, so this takes forever, and I need it done in under 5 minutes for the algorithm to be useful.
I was wondering if there is a way to achieve this with HTTP requests, or if I'm tackling this the wrong way. I can't download the data beforehand and do it on the client side, as I need the data as soon as the first quotes come out in the morning.
It's programmed in JavaScript (Node.js), but answers in any language are fine. Here's the function that I call at 2-second intervals:
var getStockData = function(ticker, day, cb) {
    var http = require('http');
    var options = {
        host: "www.google.com",
        path: "/"
    };
    ticker = ticker.replace(/\s+/g, '');
    var data = '';
    options.path = "/finance/getprices?i=60&p=" + day + "d&f=d,o,h,l,c,v&df=cpct&q=" + ticker;

    var callback = function(response) {
        response.on('data', function(chunk) {
            data += chunk;
        });
        response.on('end', function() {
            var data_clean = cleanUp(data);
            if (data_clean === -1) console.log('we couldnt find anything for this ticker');
            cb(data_clean);
        });
    };

    http.request(options, callback).end();
};
Any help would be greatly appreciated.
If designing against a certain API with policy thresholds (refresh-rate ceiling, bandwidth limit, etc.):
avoid re-fetching data the node has already received
Using the as-is URL above, a huge block of data is being (re-)fetched, most rows of which, if not all, were already known from an "identical URL" call just 2 seconds before:
EXCHANGE%3DNASDAQ
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=60
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-300
a1482330600,116.84,116.84,116.8,116.8,225329
1,116.99,117,116.8,116.84,81304
2,117.26,117.28,116.99,117,225262
3,117.32,117.35,117.205,117.28,153225
4,117.28,117.33,117.22,117.32,104072
.
..
...
..
.
149,116.98,117,116.98,116.98,8175
150,116.994,117,116.98,116.99,2751
151,117,117.005,116.9901,116.9937,7774
152,117.01,117.02,116.99,116.995,13011
153,117.0199,117.02,117.005,117.02,9313
review the API specifications carefully, so as to send smarter requests that yield minimum-footprint data
watch for API end-of-life signals, so you can find another source before the API stops provisioning data
(cit.:) The Google Finance APIs are no longer available. Thank you for your interest.
As noted in the second comment below, the inherent inefficiency of repetitively re-fetching the growing block of already-known data is to be avoided.
A professional data-pump design ought to use the API details to do this:
adding ts=1482330600, a timestamp (Unix format, in seconds), to define the start of the "new" data to be retrieved, leaving rows already seen before that timestamp out of the transmitted block.
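A rough sketch of that idea (illustrative only: the getprices endpoint has since been retired, and the bookkeeping of the last-seen timestamp is an assumption about how the asker's parsing step would track it):
// Request only rows newer than the last timestamp already received for a ticker.
var https = require('https');
var lastSeen = {};                                   // ticker -> last Unix timestamp seen (assumed bookkeeping)

function fetchNewRows(ticker, cb) {
    var ts = lastSeen[ticker];                       // undefined on the very first call
    var path = "/finance/getprices?i=60&p=0d&f=d,o,h,l,c,v&df=cpct&q=" + ticker +
               (ts ? "&ts=" + ts : "");              // ask only for rows after ts
    https.get({ host: "www.google.com", path: path }, function (res) {
        var data = '';
        res.on('data', function (chunk) { data += chunk; });
        res.on('end', function () {
            // the parsing step would update lastSeen[ticker] from the newest row here
            cb(data);
        });
    });
}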

Response encoding with node.js "request" module

I am trying to get data from the Bing search API, and since the existing libraries seem to be based on old, discontinued APIs, I thought I'd try it myself using the request library, which appears to be the most common library for this.
My code looks like this:
var SKEY = "myKey....",
    ServiceRootURL = 'https://api.datamarket.azure.com/Bing/Search/v1/Composite';

function getBingData(query, top, skip, cb) {
    var params = {
            Sources: "'web'",
            Query: "'" + query + "'",
            '$format': "JSON",
            '$top': top,
            '$skip': skip
        },
        req = request.get(ServiceRootURL).auth(SKEY, SKEY, false).qs(params);
    request(req, cb);
}

getBingData("bookline.hu", 50, 0, someCallbackWhichParsesTheBody)
Bing returns some JSON and I can work with it sometimes, but if the response body contains a large amount of non-ASCII characters, JSON.parse complains that the string is malformed. I tried switching to an Atom content type, but there was no difference: the XML was invalid. Inspecting the response body as available in the request() callback actually shows garbled content.
So I tried the same request with some Python code, and that appears to work fine all the time. For reference:
r = requests.get(
'https://api.datamarket.azure.com/Bing/Search/v1/Composite?Sources=%27web%27&Query=%27sexy%20cosplay%20girls%27&$format=json',
auth=HTTPBasicAuth(SKEY,SKEY))
stuffWithResponse(r.json())
I am unable to reproduce the problem with smaller responses (e.g. limiting the number of results) and unable to identify a single result which causes the issue (by stepping up the offset).
My impression is that the response gets read in chunks, transcoded somehow and reassembled back in a bad way, which means the json/atom data becomes invalid if some multibyte character gets split, which happens on larger responses but not small ones.
Being new to node, I am not sure if there is something I should be doing (setting the encoding somewhere? Bing returns UTF-8, so this doesn't seem needed).
Anyone has any idea of what is going on?
FWIW, I'm on OSX 10.8, node is v0.8.20 installed via macports, request is v2.14.0 installed via npm.
I'm not sure about the request library, but the default Node.js one works well for me. It also seems a lot easier to read than your library, and the data does indeed come back in chunks.
http://nodejs.org/api/http.html#http_http_request_options_callback
or for https (like your req) http://nodejs.org/api/https.html#https_https_request_options_callback (the same really though)
For the options, a little tip: use url.parse.
var url = require('url');

var params = '{}';
var dataURL = url.parse(ServiceRootURL);
var post_options = {
    hostname: dataURL.hostname,
    port: dataURL.port || 443,   // ServiceRootURL is https, so default to 443 rather than 80
    path: dataURL.path,
    method: 'GET',
    headers: {
        'Content-Type': 'application/json; charset=utf-8',
        'Content-Length': params.length
    }
};
Obviously, params needs to be the data you want to send.
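To complete that suggestion (a sketch only; the https/Buffer handling below is an assumption about how one might finish it, and stuffWithResponse stands in for the asker's own handling): collect the raw chunks as Buffers and decode them in one go, so a multibyte UTF-8 character split across two chunks cannot be mangled:
var https = require('https');                        // the Bing endpoint is https

post_options.auth = SKEY + ':' + SKEY;               // basic auth, as in the Python example

var req = https.request(post_options, function (res) {
    var chunks = [];
    res.on('data', function (chunk) {
        chunks.push(chunk);                          // keep raw Buffers, do not decode yet
    });
    res.on('end', function () {
        // Decoding once, after all chunks have arrived, avoids splitting multibyte characters.
        var body = Buffer.concat(chunks).toString('utf8');
        stuffWithResponse(JSON.parse(body));
    });
});
req.end();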
I think your request authentication is incorrect. Authentication has to be provided before request.get.
See the documentation for request HTTP authentication. qs is an object that has to be passed to request options just like url and auth.
Also, you are using the same req for a second request. You should know that request.get returns a stream for the GET of the given URL, so your next request using req will go wrong.
If you only need HTTPBasicAuth, this should also work
// remove req = request.get and the subsequent request
request.get('http://some.server.com/', {
    'auth': {
        'user': 'username',
        'pass': 'password',
        'sendImmediately': false
    }
}, function (error, response, body) {
});
The callback argument gets 3 arguments. The first is an error when applicable (usually from the http.Client option not the http.ClientRequest object). The second is an http.ClientResponse object. The third is the response body String or Buffer.
The second object is the response stream. To use it you must use events 'data', 'end', 'error' and 'close'.
Be sure to use the arguments correctly.
You have to pass the option {json:true} to enable json parsing of the response
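For example (a sketch based on request's documented qs, auth and json options, reusing ServiceRootURL, params and SKEY from the question):
var request = require('request');

request.get(ServiceRootURL, {
    qs: params,                                      // the query parameters object from the question
    auth: { user: SKEY, pass: SKEY, sendImmediately: false },
    json: true                                       // request parses the JSON body for you
}, function (error, response, body) {
    if (error) return console.error(error);
    console.log(body);                               // already a parsed object, no JSON.parse needed
});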
