What is the best approach to stream JSON from a REST API to an Express app? - node.js

I have a moleculer-based microservice with an endpoint that outputs a large JSON array (tens of thousands of objects).
The JSON is structured and I know beforehand what it is going to look like:
[
  {
    "fileSize": 1155624,
    "name": "Gyo v1-001.jpg",
    "path": "./userdata/expanded/Gyo v01 (2003)"
  },
  {
    "fileSize": 308145,
    "name": "Gyo v1-002.jpg",
    "path": "./userdata/expanded/Gyo v01 (2003) (Digital)"
  }
  // ... tens of thousands of these
]
I researched JSON streaming and made some headway: I know how to consume a Node.js ReadableStream client-side, and I know I can use oboe to parse the JSON stream.
To that end, this is the code in my Express-based app:
router.route("/getComicCovers").post(async (req: Request, res: Response) => {
typeof req.body.extractionOptions === "object"
? req.body.extractionOptions
: {};
oboe({
url: "http://localhost:3000/api/import/getComicCovers",
method: "POST",
body: {
extractionOptions: req.body.extractionOptions,
walkedFolders: req.body.walkedFolders,
},
}).on("node", ".*", (data) => {
console.log(data);
res.write(JSON.stringify(data));
});
});
This is the endpoint in moleculer
getComicCovers: {
  rest: "POST /getComicCovers",
  params: {
    extractionOptions: "object",
    walkedFolders: "array",
  },
  async handler(
    ctx: Context<{
      extractionOptions: IExtractionOptions;
      walkedFolders: IFolderData[];
    }>
  ) {
    const comicBooksForImport = await getCovers(
      ctx.params.extractionOptions,
      ctx.params.walkedFolders
    );
    // comicBooksForImport is the aforementioned array of objects.
    // How do I stream it from here to the Express app object-by-object?
  },
},
My question is: How do I stream this gigantic JSON from the REST endpoint to the Express app so I can parse it on the client end?
UPDATE
I went with a socket.io implementation per JuanCaicedo's suggestion. I have it set up on both the server and the client end.
However, I'm having trouble with this piece of code:
map(walkedFolders, async (folder, idx) => {
  let foo = await extractArchive(extractionOptions, folder);
  let fo = new JsonStreamStringify({ foo });
  fo.pipe(res);
  if (+idx === walkedFolders.length - 1) {
    res.end();
  }
});
I get an Error [ERR_STREAM_WRITE_AFTER_END]: write after end error. I understand that this happens because the response is terminated before the next iteration attempts to pipe the updated value of foo (which is a stream) into the response.
How do I get around this?
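A minimal sketch of one way around this (assuming extractArchive and JsonStreamStringify behave as in the snippet above): process the folders sequentially, pipe with end: false so that pipe() doesn't close the response, and end the response once after the last folder.
for (const folder of walkedFolders) {
  const extracted = await extractArchive(extractionOptions, folder);
  await new Promise((resolve, reject) => {
    const json = new JsonStreamStringify({ foo: extracted });
    json.pipe(res, { end: false }); // don't let pipe() end the response early
    json.on("end", resolve).on("error", reject);
  });
}
res.end(); // close the response once, after the last folder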

Are you asking for a general approach recommendation, or for support with the particular solution you have?
If it's the first, then I think your best bet for communicating between the server and the client is websockets, perhaps with something like Socket.io. A long-lived connection will serve you well here, since it will take a long time to transmit all your data across.
Then you can send data from the server to the client any time you like. At that point you can read your data on the server as a Node.js stream and emit the items one at a time.
The problem with using Oboe and writing to the response on every node is that it requires a long-running response, and there's a high likelihood the connection could get interrupted before you've sent all the data across.
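To make that concrete, here is a minimal sketch of the emit-one-at-a-time idea (the io server instance, the event names, and the use of Readable.from are assumptions for illustration; getCovers and comicBooksForImport come from the question):
// Server side: turn the big array into a stream and emit one item at a time
// over a long-lived socket.io connection.
const { Readable } = require("stream");

io.on("connection", (socket) => {
  socket.on("getComicCovers", async (opts) => {
    const comicBooksForImport = await getCovers(
      opts.extractionOptions,
      opts.walkedFolders
    );
    const source = Readable.from(comicBooksForImport); // read the data as a stream
    for await (const item of source) {
      socket.emit("comicCover", item); // client listens for "comicCover" and handles items one by one
    }
    socket.emit("comicCoversDone"); // tell the client there is nothing left
  });
});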

Related

How to send audio saved as a Buffer, from my api, to my React client and play it?

I've been chasing my tail for two days trying to figure out the best way to send the <Buffer ... > object generated by Google's Text-To-Speech service from my express-api to my React app. I've come across tons of opinionated resources that point me in different directions and only potentially "solve" isolated parts of the bigger process. At the end of all of this, while I've learned a lot more about ArrayBuffer, Buffer, binary arrays, etc., I still feel just as lost as before when it comes to implementation.
At its simplest, all I aim to do is provide one or more strings of text to TTS, generate the audio files, send the audio files from my express-api to my React client, and then automatically play the audio in the background on the browser when appropriate.
I am successfully calling Google's TTS and triggering it to generate the audio files. It responds with a <Buffer ...> representing the binary data of the file. It arrives in my express-api endpoint; from there I'm not sure if I should:
convert the Buffer to a string and send it to the browser?
send it as a Buffer object to the browser?
set up a websocket using socket.io and stream it?
then once it's on the browser,
do I use an <audio /> tag?
should I convert it to something else?
I suppose the problem I'm having is that trying to find answers to this results in information overload: lots of different answers, written over the past 10 years, using different approaches and technologies. I really don't know where one starts and the next ends, what's a bad practice, what's a best practice, and what is actually suitable for my case. I could really use some guidance here.
Synthesise function from Google
// returns: <Buffer ff f3 44 c4 ... />
const synthesizeSentence = async (sentence) => {
  const request = {
    input: { text: sentence },
    voice: { languageCode: "en-US", ssmlGender: "NEUTRAL" },
    audioConfig: { audioEncoding: "MP3" },
  };
  const response = await client.synthesizeSpeech(request);
  return response[0].audioContent;
};
(current shape) of express-api POST endpoint
app.post("/generate-story-support", async (req, res) => {
try {
// ? generating the post here for simplicity, eventually the client
// ? would dictate the sentences to send ...
const ttsResponse: any = await axios.post("http://localhost:8060/", {
sentences: SAMPLE_SENTENCES,
});
// a resource said to send the response as a string and then convert
// it on the client to an Array buffer? -- no idea if this is a good practice
return res.status(201).send(ttsResponse.data[0].data.toString());
} catch (error) {
console.log("error", error);
return res.status(400).send(`Error: ${error}`);
}
});
React client (adapted from another SO post):
useEffect(() => {
  const fetchData = async () => {
    const data = await axios.post(
      "http://localhost:8000/generate-story-support"
    );
    // converting it to an ArrayBuffer per another SO post
    const encoder = new TextEncoder();
    const encodedData = encoder.encode(data.data);
    setAudio(encodedData);
    return data.data;
  };
  fetchData();
}, []);
// no idea what to do from here, if this is even the right path :/
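For reference, a minimal sketch of one common approach (names are reused from the snippets above where possible, the rest is assumed, and the proxy hop to the separate TTS service is skipped for brevity): send the Buffer as raw binary with an audio Content-Type, request it as an ArrayBuffer, and play it through a Blob URL.
// express-api: send the MP3 bytes as binary instead of a string
app.post("/generate-story-support", async (req, res) => {
  const audioBuffer = await synthesizeSentence("Hello there"); // <Buffer ff f3 ...>
  res.set("Content-Type", "audio/mpeg");
  return res.status(201).send(audioBuffer);
});

// React client (inside fetchData): ask for binary data and play it when it arrives
const { data } = await axios.post(
  "http://localhost:8000/generate-story-support",
  {},
  { responseType: "arraybuffer" } // keep the bytes intact, no string conversion
);
const url = URL.createObjectURL(new Blob([data], { type: "audio/mpeg" }));
new Audio(url).play(); // or hand `url` to an <audio src={url} /> element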

Axios request with chunked response stream from Node

I have a client app in React, and a server in Node (with Express).
At the server side, I have an endpoint like the following (it's not the real endpoint, just an idea of what I'm doing):
function endpoint(req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/plain',
    'Transfer-Encoding': 'chunked'
  });
  for (let x = 0; x < 1000; x++) {
    res.write(some_string + '\n');
    wait(a_couple_of_seconds); // just to make the process slower for testing purposes
  }
  res.end();
}
This works perfectly; when I call this endpoint, I receive the whole stream with all 1,000 rows.
The thing is that I cannot manage to get this data in chunks (per 'write', or per bunch of 'writes') in order to show it on the frontend as soon as I receive it (think of a table that shows the rows as soon as I get them from the endpoint call).
In the frontend I'm using Axios to call the API with the following code:
async function getDataFromStream(_data): Promise<any> {
  const { data, headers } = await Axios({
    url: `http://the.api.url/endpoint`,
    method: 'GET',
    responseType: 'stream',
    timeout: 0,
  });
  // this next line doesn't work. it says that 'on' is not a function
  data.on('data', chunk => console.log('chunk', chunk));
  // data actually has the whole response data (all the rows)
  return Promise.resolve();
}
The thing is that the Axios call returns the whole data object only after res.end() is called on the server, but I need to get the data as soon as the server starts sending chunks (on each res.write, or whenever the server decides to send a bunch of chunks).
I have also tried not using await and reading the value of the promise in the then() of the Axios call, but the behavior is the same: the 'data' value arrives with all the 'writes' together once the server calls res.end().
So, what am I doing wrong here? Maybe this is not possible with Axios or Node and I should use something like websockets to solve it.
Any help would be much appreciated, because I've read a lot but couldn't get a working solution yet.
For anyone interested in this, what I ended up doing is the following:
On the client side, I used the Axios onDownloadProgress handler, which allows handling of progress events for downloads.
So, I implemented something like this:
function getDataFromStream(_data): Promise<any> {
  return Axios({
    url: `http://the.api.url/endpoint`,
    method: 'GET',
    onDownloadProgress: progressEvent => {
      const dataChunk = progressEvent.currentTarget.response;
      // dataChunk contains the data that has been obtained so far (the whole data so far)..
      // So here we do whatever we want with this partial data..
      // In my case I'm storing that on a redux store that is used to
      // render a table, so now, table rows are rendered as soon as
      // they are obtained from the endpoint.
    }
  }).then(({ data }) => Promise.resolve(data));
}
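Since progressEvent.currentTarget.response accumulates everything received so far, here is a small sketch of how one might pull out only the newly arrived rows (the offset tracking is an addition for illustration, not part of the original answer):
let seen = 0; // how many characters have already been processed

function onDownloadProgress(progressEvent) {
  const soFar = progressEvent.currentTarget.response; // full text received so far
  const fresh = soFar.slice(seen); // only the part that arrived since last time
  seen = soFar.length;
  fresh.split('\n').filter(Boolean).forEach(row => {
    // e.g. dispatch an action here to append the row to the table
    console.log('new row:', row);
  });
}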

Getting net::ERR_INCOMPLETE_CHUNKED_ENCODING 200 when consuming event-stream using EventSource in ReactJs

I have a very simple node service exposing an endpoint aimed to use Server Send Events (SSE) connection and a very basic ReactJs client consuming it via EventSource.onmessage.
Firstly, when I set a breakpoint in updateAmountState (Chrome DevTools) I never see it invoked.
Secondly, I am getting net::ERR_INCOMPLETE_CHUNKED_ENCODING 200 (OK). According to https://github.com/aspnet/KestrelHttpServer/issues/1858, "ERR_INCOMPLETE_CHUNKED_ENCODING in chrome usually means that an uncaught exception was thrown from the application in the middle of writing to the response body". So I checked the server side for errors: I set breakpoints in a few places in server.js, in both setTimeout(() => {... blocks, and I see them run periodically, whereas I would expect each one to run only once. So it seems the frontend is permanently re-calling the backend and getting some error.
The whole application, both the ReactJs frontend and the NodeJs server, can be found at https://github.com/jimisdrpc/hello-pocker-coins.
backend:
const http = require("http");

http
  .createServer((request, response) => {
    console.log("Requested url: " + request.url);
    if (request.url.toLowerCase() === "/coins") {
      response.writeHead(200, {
        Connection: "keep-alive",
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache"
      });
      setTimeout(() => {
        response.write('data: {"player": "Player1", "amount": "90"}');
        response.write("\n\n");
      }, 3000);
      setTimeout(() => {
        response.write('data: {"player": "Player2", "amount": "95"}');
        response.write("\n\n");
      }, 6000);
    } else {
      response.writeHead(404);
      response.end();
    }
  })
  .listen(5000, () => {
    console.log("Server running at http://127.0.0.1:5000/");
  });
frontend:
import React, { Component } from "react";
import ReactTable from "react-table";
import "react-table/react-table.css";
import { getInitialCoinsData } from "./DataProvider";

class App extends Component {
  constructor(props) {
    super(props);
    this.state = {
      data: getInitialCoinsData()
    };
    this.columns = [
      {
        Header: "Player",
        accessor: "player"
      },
      {
        Header: "Amount",
        accessor: "amount"
      }
    ];
    this.eventSource = new EventSource("coins");
  }

  componentDidMount() {
    this.eventSource.onmessage = e =>
      this.updateAmountState(JSON.parse(e.data));
  }

  updateAmountState(amountState) {
    let newData = this.state.data.map(item => {
      if (item.amount === amountState.amount) {
        item.state = amountState.state;
      }
      return item;
    });
    this.setState(Object.assign({}, { data: newData }));
  }

  render() {
    return (
      <div className="App">
        <ReactTable data={this.state.data} columns={this.columns} />
      </div>
    );
  }
}

export default App;
The exception I can see in Chrome is shown in the screenshot (not reproduced here).
So my straightforward question is: why am I getting ERR_INCOMPLETE_CHUNKED_ENCODING 200? Am I missing something in the backend or in the frontend?
Some tips that may help me:
Why do I see a websocket in pending status, since I am not using websockets at all? I know the basic difference (a websocket is two-way, front to back and back to front, and is a different protocol, while SSE runs over HTTP and is only back to front). But it is not my intention to use websockets at all. (See the blue line in the screenshot.)
Why do I see the eventsource requests, with 0 bytes and 236 bytes, both failed? I understand that eventsource is exactly what I am trying to use when I wrote "this.eventSource = new EventSource("coins");". (See the red line in the screenshot.)
Very strange, at least to me: sometimes when I kill the server I can see the updateAmountState method invoked.
If I call localhost:5000/coins in the browser I can see the server return the response (both JSON strings). Can I assume that I coded the server properly and the error is something exclusively in the frontend?
Here are the answers to your questions.
The websocket you see running is not related to the code you have posted here. It may be related to another NPM package that you are using in your app. You might be able to figure out where it is coming from by looking at the headers in the network request.
The most likely cause of the eventsource requests failing is that they are timing out. The Chrome browser will kill an inactive stream after two minutes of inactivity. If you want to keep it alive, you need to add some code to send something from the server to the browser at least once every two minutes. Just to be safe, it is probably best to send something every minute. An example of what you need is below. It should do what you need if you add it after your second setTimeout in your server code.
const intervalId = setInterval(() => {
  response.write(`data: keep connection alive\n\n`);
  // (response.flush() is only available/needed when something like the
  // compression middleware is in use; a plain http response has no flush())
}, 60 * 1000);

request.on('close', () => {
  // Make sure to clean up after yourself when the connection is closed
  clearInterval(intervalId);
});
I'm not sure why you are sometimes seeing the updateAmountState method being invoked. If you are not seeing it consistently, it's probably not a major concern, but it might help to clean up the setTimeouts in the case that the server stops before they complete. You can do this by declaring them as variables and then passing the variable names to a clearTimeout in a close event handler similar to what I did for the interval in the example in #2 above.
Your code is set up properly, and the error you are seeing is due to the Chrome browser timeouts. Use something like the code in answer #2 above if you want to stop the errors from happening.
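As a rough sketch of the timeout cleanup mentioned in point 3 (variable names are illustrative):
const firstMessage = setTimeout(() => {
  response.write('data: {"player": "Player1", "amount": "90"}\n\n');
}, 3000);
const secondMessage = setTimeout(() => {
  response.write('data: {"player": "Player2", "amount": "95"}\n\n');
}, 6000);

request.on('close', () => {
  // If the browser disconnects early, don't try to write to a closed response
  clearTimeout(firstMessage);
  clearTimeout(secondMessage);
});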
I'm not a Node.js expert myself, but it looks like you're missing "'Connection': 'keep-alive'" and a "\n" after that, i.e.:
response.writeHead(200, {
  'Content-Type': 'text/event-stream',
  'Cache-Control': 'no-cache',
  'Connection': 'keep-alive'
});
response.write('\n');
see https://jasonbutz.info/2018/08/server-sent-events-with-node/. Hope it works!

NodeJS Stream: Transforming JSON from Input Stream to Output Stream and change some values

I've been scratching my head for days over this problem. I want to change the value of some key in a relatively big JSON string streamed from an HTTP request, and then stream it on to the client. Pretend this is a big JSON:
{
  "name": "George",
  "country": {
    "home": "United States",
    "current": "Canada"
  }
}
And I want output like this, by changing country.current:
{
  "name": "George",
  "country": {
    "home": "United States",
    "current": "Indonesia"
  }
}
The transformation is done within a restify handler:
let proxyHandler = function(req, res, next) {
  let proxyReq = http.request(opt, r => {
    r.on('data', data => {
      // transform here and send the data using res.write();
      // and close the response object when the parsing ends
    });
  });
  proxyReq.end();
  next();
};
I cannot use JSON.parse because the JSON is big, so I need to stream/parse/transform it as it arrives. Is there any library out there able to do this?
I've tried using stream-json, however it's very slow when I need to combine it with a Transform stream. When I initiate a huge number of requests it just crawls and then times out.
Because the client is not sent a Content-Length header, the server needs to close the stream.
UPDATE:
I understand that there are streaming JSON parsers. However, what I need is not only a parser but also an emitter. The process would be:
JSON -> Parse (event based) -> Transform parse events -> Emit the transformed JSON. All of this needs to be done with NodeJS streams.
As I mentioned above, I've used stream-json and written my own stack-based emitter, but it was slow and created backpressure when a lot of requests came in. What I'm asking is whether there's any node library out there that can do this in one go. Ideally, the library could be used like below:
// JSONTransform is a hypothetical library class
result
  .pipe(new JSONTransform('country.current', (val) => 'Indonesia'))
  .pipe(response)
Suppose you want to change the property at foo.bar.jar. The pseudo-steps could be as follows:
Step 1: Buffer your data until you find a "{" tag (e.g. Buffer = 'foo: {').
Step 2: Get the property name from the buffer, stream the buffer down the response, and clear it.
Step 3: Does the property name match 'foo'?
If not, continue streaming down the response until you find the closing "}" tag (skipping all properties in between), then repeat steps 1 to 3.
If yes, do step 4.
Step 4: Repeat steps 1 to 3, only this time checking for a property name matching 'bar'.
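As a rough sketch of the streaming-replace idea (note this is a text-level shortcut, not the token-based parse/transform/emit pipeline the question asks about; it assumes the key name is unique in the document, contains no regex metacharacters, and has a short plain-string value):
const { Transform } = require("stream");

function replaceStringValue(key, newValue) {
  const pattern = new RegExp(`("${key}"\\s*:\\s*)"[^"]*"`, "g");
  const prefix = `"${key}":"`;

  // Could `tail` be the beginning of a match that hasn't finished arriving yet?
  const isPartialMatch = (tail) => {
    const compact = tail.replace(/\s+/g, "");
    if (compact.length <= prefix.length) return prefix.startsWith(compact);
    return compact.startsWith(prefix) && !compact.slice(prefix.length).includes('"');
  };

  let carry = "";
  return new Transform({
    transform(chunk, _enc, cb) {
      const text = (carry + chunk.toString("utf8"))
        .replace(pattern, `$1${JSON.stringify(newValue)}`);
      // Hold back any trailing quote-started fragment that might still become a match.
      let cut = text.length;
      for (let i = text.length - 1; i >= 0 && text.length - i < 256; i--) {
        if (text[i] === '"' && isPartialMatch(text.slice(i))) cut = i;
      }
      carry = text.slice(cut);
      if (cut > 0) this.push(text.slice(0, cut));
      cb();
    },
    flush(cb) {
      if (carry) this.push(carry); // nothing left that could complete a match
      cb();
    },
  });
}

// e.g. inside the proxy handler:
// proxyReq = http.request(opt, r => { r.pipe(replaceStringValue("current", "Indonesia")).pipe(res); });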

Reasonable design of using socket.io for RPC

I am building a web app that uses socket.io to trigger remote procedure calls that pass session-specific data back to the client. As my app gets bigger and gains more users, I wanted to check whether my design is reasonable.
The server that receives websocket communications and triggers RPCs looks something like this:
s.on('run', function(input) {
  client.invoke(input.method, input.params, s.id, function(error, res, more) {
    s.emit('output', {
      method: input.method,
      error: error,
      response: res,
      more: more,
      id: s.id
    });
  });
});
However, this means that the client has to first emit the method invocation, and then listen to all method returns and pluck out its correct return value:
socket.on('output', function(res) {
  if (res.id === socket.sessionid) {
    if (!res.error) {
      if (res.method === 'myMethod') {
        var myResponse = res.response;
        // Do more stuff with my response
      }
    }
  }
});
It is starting to seem like a messy design as I add more and more functions... is there a better way to do this?
The traditional AJAX way of attaching a callback to each function would be a lot nicer, but I want to take advantage of the benefits of websockets (e.g. less overhead for rapid communications).
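One pattern worth sketching here (illustrative only, not taken from the post): socket.io acknowledgement callbacks give each emitted call its own response callback, much like the AJAX-style callbacks mentioned above, so no shared 'output' listener or id filtering is needed.
// Server: pass the per-call acknowledgement callback straight through to the RPC
s.on('run', function(input, ack) {
  client.invoke(input.method, input.params, s.id, function(error, res, more) {
    ack({ error: error, response: res, more: more });
  });
});

// Client: each invocation gets its own callback, no global 'output' listener
socket.emit('run', { method: 'myMethod', params: params }, function(result) {
  if (!result.error) {
    var myResponse = result.response;
    // Do more stuff with my response
  }
});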
