Stream an array of objects nested inside objects in Node.js

I fetch data from an API using Node.js.
I get a response with the following structure (the response is saved via a stream into a JSON file):
{"data":{"total":40,"data":[{"date":"20220914","country":"PL","data1":1,"data2":2,"data3":3,"data4":"4"},{"date":"20220914","country":"DE","data1":21,"data2":22,"data3":23,"data4":"24"},{"date":"20220914","country":"DE","data1":21,"data2":22,"data3":23,"data4":"24"},{"date":"20220914","country":"PL","data1":1,"data2":2,"data3":3,"data4":"4"}], "total_page":1,"page":1,"page_size":100},"success":true,"code":"0","request_id":"123"}
Now I would like to read the file as a stream and do some transforms on each object; however, I am not able to retrieve it object by object.
The problem is that the array I'm interested in is nested under the .data.data keys, and I don't know how to get each element of the array one by one and modify it.
import { pipeline, Transform } from 'stream';
import { promisify } from 'util';
import fs from 'fs';

public async processData() {
  await this.api.getReport();
  const reader = fs.createReadStream('./response.json');
  const writer = fs.createWriteStream('properFormat.txt');
  const asyncPipeline = promisify(pipeline);
  const newFormatedData = (object: Record<string, string>) => {
    // Here I would like to take into consideration only values for example with the keys: date, country and data1
    console.log(object.toString());
  };
  const formatData = new Transform({
    objectMode: true,
    transform(chunk, encoding, done) {
      this.push(newFormatedData(chunk));
      done();
    },
  });
  asyncPipeline(reader, formatData, writer);
}
Thank you for any hints on this!
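One possible direction (a rough sketch, untested; it assumes the stream-json and stream-chain packages, and the date/country/data1 fields mentioned in the comment) is to let a streaming JSON parser descend to data.data and emit the array elements one by one:
import fs from 'fs';
import { chain } from 'stream-chain';
import { parser } from 'stream-json';
import { pick } from 'stream-json/filters/Pick';
import { streamArray } from 'stream-json/streamers/StreamArray';

const records = chain([
  fs.createReadStream('./response.json'),
  parser(),                      // tokenise the JSON document
  pick({ filter: 'data.data' }), // descend to the nested array
  streamArray(),                 // emits { key, value } for each array element
  (data: { value: any }) => {
    const { date, country, data1 } = data.value;
    // keep only the fields of interest, one JSON line per record
    return JSON.stringify({ date, country, data1 }) + '\n';
  },
]);

records.pipe(fs.createWriteStream('properFormat.txt'));
Each element of the nested array passes through the function above individually, so the transform never has to hold the whole response in memory.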

Related

Nodejs forking stream, redundant data

I'm new to Node.js streams, and I don't understand why, when I try to fork a stream after a transform, I get the same data multiple times. This is the example:
import { createReadStream, createWriteStream } from "fs";
import { Transform } from "stream";

const inputStream = createReadStream("assets/input.txt");
const outputStream1 = createWriteStream("assets/output1.txt");
const outputStream2 = createWriteStream("assets/output2.txt");

const t1 = new Transform({
  transform(chunk, encoding, callback) {
    this.push(chunk.toString().toUpperCase());
    callback();
  }
});

inputStream.pipe(t1).pipe(outputStream1);
inputStream.pipe(t1).pipe(outputStream2);
I would expect to get the data just once, but these are the resulting files:
Input:
hello world
output1:
HELLO WORLDHELLO WORLD
output2:
HELLO WORLDHELLO WORLD
Thank you in advance for the help.
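The duplication comes from calling inputStream.pipe(t1) twice: every input chunk is written into t1 once per pipe call, and t1's single output is then delivered to both destinations. A sketch of forking after the transform without feeding it twice (same streams as above):
inputStream.pipe(t1);   // feed the transform exactly once
t1.pipe(outputStream1); // a readable can be piped to several writables
t1.pipe(outputStream2);
With this wiring, each output file should contain HELLO WORLD once.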

IPFS streams not buffers

I'm trying to create an API GET URL that can be called with <img src='...'/> and that will actually load the image from IPFS.
I'm getting the file from IPFS, and I can send it as a buffer via Fastify, but I can't send it as a stream.
Here's the working buffer version using ipfs.cat:
import { concat as uint8ArrayConcat } from "uint8arrays/concat";
import all from "it-all";

fastify.get(
  "/v1/files/:username/:cid",
  async function (request: any, reply: any) {
    const { cid }: { cid: string } = request.params;
    const ipfs = create();
    const data = uint8ArrayConcat(await all(ipfs.cat(cid)));
    reply.type("image/png").send(data);
  }
);
Docs for ipfs cat
Docs for fastify reply buffers
I also tried sending it as a stream, to avoid loading the file into the server's memory...
import { concat as uint8ArrayConcat } from "uint8arrays/concat";
import all from "it-all";
import { Readable } from "stream";

...

fastify.get(
  "/v1/files/:username/:cid",
  async function (request: any, reply: any) {
    const { cid }: { cid: string } = request.params;
    const ipfs = create();
    const bufferToStream = async (buffer: any) => {
      const readable = new Readable({
        read() {
          this.push(buffer);
          this.push(null);
        },
      });
      return readable;
    };
    const data = uint8ArrayConcat(await all(ipfs.cat(cid)));
    const str = await bufferToStream(data);
    reply.send(str);
  }
);
This fails with a new error:
Error [ERR_STREAM_WRITE_AFTER_END]: write after end
Here I'm trying to push into the stream chunk by chunk:
import { concat as uint8ArrayConcat } from "uint8arrays/concat";
import all from "it-all";
import { Readable } from "stream";

fastify.get(
  "/v1/files/:username/:cid",
  async function (request: any, reply: any) {
    const { cid }: { cid: string } = request.params;
    const ipfs = create();
    const myStream = new Readable();
    myStream._read = () => {};
    const pushChunks = async () => {
      for await (const chunk of ipfs.cat(cid)) {
        myStream.push(chunk);
      }
    };
    pushChunks();
    reply.send(myStream);
  }
);
The error now is:
INFO (9617): stream closed prematurely
And here I try to dump everything into the stream at once:
import { concat as uint8ArrayConcat } from "uint8arrays/concat";
import all from "it-all";
import { Readable } from "stream";

fastify.get(
  "/v1/files/:username/:cid",
  async function (request: any, reply: any) {
    const { cid }: { cid: string } = request.params;
    const ipfs = create();
    var myStream = new Readable();
    myStream._read = () => {};
    myStream.push(uint8ArrayConcat(await all(ipfs.cat(cid))));
    myStream.push(null);
    reply.send(myStream);
  }
);
Which fails with the error:
WARN (14295): response terminated with an error with headers already sent
Is there any benefit to converting it to a stream? Hasn't IPFS already loaded it into memory?
The ipfs module returns the file as many chunks (byte arrays), so the file is the sum of these chunks.
Now, if you push all these chunks into an array and then call uint8ArrayConcat, all the chunks are actually in your server's memory.
So, if the file is 10 GB, your server holds a 10 GB byte array in memory.
Since this is surely unwanted, you should push every chunk from the IPFS file to the response as it arrives. This way each chunk passes through the server's memory only transiently and is not retained, so you never hold the 10 GB file in memory, only a tiny slice of it at a time.
Since ipfs.cat returns an async iterator, you could handle it manually or use something like async-iterator-to-stream and write:
const ipfsStream = asyncIteratorToStream(ipfs.cat(cid))
return reply.send(ipfsStream)
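For the manual route, Node's built-in Readable.from (Node 12+) can wrap the async iterator directly; a minimal sketch of the handler body, assuming the same route and ipfs client as in the question:
import { Readable } from "stream";

// inside the route handler: wrap the async iterator returned by ipfs.cat()
// so chunks flow to the response without buffering the whole file
const ipfsStream = Readable.from(ipfs.cat(cid));
return reply.type("image/png").send(ipfsStream);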
As a follow-up, I'm sharing this awesome resource about Node.js streams and buffers.

remix passing args to async server side functions

I'm really a beginner with this stuff, so I beg your pardon for my (silly) questions.
I want to use async functions inside TSX pages; specifically, those functions are fetch calls to Shopify to get data and ioredis calls to write and read some data.
I know that Remix uses action/loader functions, so to manage the Shopify calls I figured out this:
export const loader: LoaderFunction = async ({ params }) => {
  return json(await GetProductById(params.id as string));
};

async function GetProductById(id: string) {
  const ops = ...;
  const endpoint = ...;
  const response = await fetch(endpoint, ops);
  const json = await response.json();
  return json;
};

export function FetchGetProductById(id: number) {
  const fetcher = useFetcher();
  useEffect(() => {
    fetcher.load(`/query/getproductid/${id}`);
  }, []);
  return fetcher.data;
}
With this solution I can get the data whenever I want just by calling FetchGetProductById, but my problem is that I need to send more complex data to the loader (like objects).
How may I do that?
In Remix, the loader only handles GET requests, so data must be in the URL, either via params or searchParams (query string).
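For that first option, one way is to serialize the object into a query-string parameter and parse it in the loader. A rough sketch (the route path, the filters parameter, and the getProducts helper are placeholders; imports assume Remix v1 package names):
import { json } from "@remix-run/node";
import type { LoaderFunction } from "@remix-run/node";

// loader: read a JSON-encoded object from the query string
export const loader: LoaderFunction = async ({ request }) => {
  const url = new URL(request.url);
  const filters = JSON.parse(url.searchParams.get("filters") ?? "{}");
  return json(await getProducts(filters)); // getProducts is a hypothetical helper
};

// client side, e.g. with useFetcher:
// fetcher.load(`/query/getproducts?filters=${encodeURIComponent(JSON.stringify({ a: 1 }))}`);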
If you want to pass data in the body of the request, then you'll need to use POST and create an action.
NOTE: Remix uses FormData and not JSON to send data. You will need to convert your JSON into a string.
// imports assume Remix v1 package names
import { json } from "@remix-run/node";
import type { ActionArgs } from "@remix-run/node";
import { useFetcher } from "@remix-run/react";
import { useEffect } from "react";

export const action = async ({ request }: ActionArgs) => {
  const formData = await request.formData();
  const object = JSON.parse(formData.get("json") as string);
  return json(object);
};

export default function Route() {
  const fetcher = useFetcher();
  useEffect(() => {
    if (fetcher.state !== 'idle' || fetcher.data) return;
    fetcher.submit(
      {
        json: JSON.stringify({ a: 1, message: "hello", b: true }),
      },
      { method: "post" }
    );
  }, [fetcher]);
  return <pre>{JSON.stringify(fetcher.data, null, 2)}</pre>;
}

Using csv-parse with highlandjs

I would like to do a bit of parsing on CSV files to convert them to JSON and extract data from them. I'm using Highland as a stream processing library. I am creating an array of CSV parsing streams using:
import { readdir as readdirCb, createReadStream } from 'fs';
import { promisify } from 'util';
import _ from 'highland';
import parse from 'csv-parse';

const readdir = promisify(readdirCb);

const LOGS_DIR = './logs';
const options = '-maxdepth 1';

async function main() {
  const files = await readdir(LOGS_DIR)
  const stream = _(files)
    .map(filename => createReadStream(`${LOGS_DIR}/${filename}`))
    .map(parse)
}

main();
main();
I have tried to use the stream like this:
const stream = _(files)
  .map(filename => createReadStream(`${LOGS_DIR}/${filename}`))
  .map(parse)
  .each(stream => {
    stream.on('parseable', () => {
      let record
      while (record = stream.read()) { console.log(record) }
    })
  })
This does not produce any records. I am not sure how to proceed in order to receive the JSON for each row of each CSV file.
EDIT:
Writing a function like this works for an individual file:
import parse from 'csv-parse';
import transform from 'stream-transform';
import { createReadStream } from 'fs';

export default function retrieveApplicationIds(filename) {
  console.log('Parsing file', filename);
  return createReadStream(filename).pipe(parser).pipe(getApplicationId).pipe(recordUniqueId);
}
Edit 2:
I have tried using the concat streams approach:
import { PassThrough } from 'stream';

const LOGS_DIR = './logs';

function concatStreams(streamArray, streamCounter = streamArray.length) {
  return streamArray.reduce((mergedStream, stream) => {
    // pipe each stream of the array into the merged stream
    // prevent the automated 'end' event from firing
    mergedStream = stream.pipe(mergedStream, { end: false });
    // rewrite the 'end' event handler
    // Every time one of the streams ends, the counter is decremented.
    // Once the counter reaches 0, the merged stream can emit its 'end' event.
    stream.once('end', () => --streamCounter === 0 && mergedStream.emit('end'));
    return mergedStream;
  }, new PassThrough());
}

async function main() {
  const files = await readdir(LOGS_DIR)
  const streams = files.map(parseFile);
  const combinedStream = concatStreams(streams);
  combinedStream.pipe(process.stdout);
}

main();
When I use this, I get the error:
(node:1050) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 unpipe listeners added to [Transformer]. Use emitter.setMaxListeners() to increase limit
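One possible direction (a sketch only; it assumes Highland's sequence() and that the imported csv-parse function returns a record stream when called with options, as in the version imported above) is to wrap each parsed file back into a Highland stream and flatten the stream of streams, so records from every file arrive one by one:
import { readdir as readdirCb, createReadStream } from 'fs';
import { promisify } from 'util';
import _ from 'highland';
import parse from 'csv-parse';

const readdir = promisify(readdirCb);
const LOGS_DIR = './logs';

async function main() {
  const files = await readdir(LOGS_DIR);
  _(files)
    // one record stream per file: with columns: true, csv-parse emits one object per row
    .map(filename =>
      _(createReadStream(`${LOGS_DIR}/${filename}`).pipe(parse({ columns: true })))
    )
    // flatten the stream of streams so the records are emitted file after file
    .sequence()
    .each(record => console.log(record));
}

main();
Because every file's records end up in a single Highland stream, the manual PassThrough merging (where the listener warning comes from) is not needed.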

Nodejs module that returns a stream with a pipe

I am trying to create a Node.js module that returns a Transform stream. It takes a readable stream as its input. However, I would like the output to be passed through another stream before being returned. For example:
const { Transform } = require('stream')
const JSONStream = require('JSONStream')

let myTransform = new Transform({
  objectMode: true,
  transform: function(chunk, encoding, callback) {
    callback(null, chunk.foo + 1)
  }
})

module.exports = myTransform.pipe(JSONStream.stringify('[', ',', ']'))
When I do this, the stream myTransform is ignored. I realize I could move the pipe to JSONStream elsewhere, for example request('https://...').pipe(myTransform).pipe(JSONStream...), but I would like to keep that part inside the module.
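What gets exported above is the JSONStream side of the pipe, so anything piped into the export bypasses myTransform. One way to export a single duplex that writes into myTransform and reads from the JSONStream output is sketched below; it assumes the pumpify package (duplexer2, or stream.compose on newer Node versions, would work similarly):
const pumpify = require('pumpify')
const { Transform } = require('stream')
const JSONStream = require('JSONStream')

let myTransform = new Transform({
  objectMode: true,
  transform: function (chunk, encoding, callback) {
    callback(null, chunk.foo + 1)
  }
})

// pumpify.obj returns one object-mode duplex: writes go into myTransform,
// reads come out of the JSONStream stringifier
module.exports = pumpify.obj(myTransform, JSONStream.stringify('[', ',', ']'))
Consumers can then write request('https://...').pipe(myModule) and read the stringified JSON from the other end.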
