How do I read a CSV file line by line, modify each line, and write the result to another file in Node.js?

I recently used the event-stream library for Node.js to parse a huge CSV file and save the results to a database.
How do I solve the task of not just reading a file, but modifying each line and writing the result to a new file?
Is it some combination of the through and map methods, or a duplex stream? Any help is highly appreciated.

If you use event-stream to read the file, you can use its split() method to process the CSV line by line, then modify each line and write it to a new writable stream.
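var fs = require('fs');
var es = require('event-stream');
fs.createReadStream('old.csv')
  .pipe(es.split()) // split the input into lines (the newline is stripped)
  .pipe(es.mapSync(function (line) {
    // modify the line the way you want, then return it with its newline restored
    return line + '\n';
  }))
  // piping ends the write stream only after the input has been fully read;
  // calling end() right after setting up the pipe would close it too early
  .pipe(fs.createWriteStream('new.csv'));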
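For comparison, the same read, modify, write flow can be sketched with Node core modules only, swapping event-stream for readline. This is a minimal sketch, not the answer's method: transformLines and its parameters are hypothetical names, and it ignores backpressure on the output, which is fine for modest files but worth handling for huge ones.

```javascript
const fs = require('fs');
const readline = require('readline');

// Read inputPath line by line, apply transform to each line, and write the
// results to outputPath. Resolves once the output file has been flushed.
// transformLines is a hypothetical helper name for this sketch.
function transformLines(inputPath, outputPath, transform) {
  return new Promise((resolve, reject) => {
    const input = fs.createReadStream(inputPath);
    const output = fs.createWriteStream(outputPath);
    const rl = readline.createInterface({ input, crlfDelay: Infinity });

    input.on('error', reject);
    output.on('error', reject);
    output.on('finish', resolve);

    rl.on('line', (line) => {
      // readline strips the line terminator, so add it back
      output.write(transform(line) + '\n');
    });
    // End the output only after the last line has been read
    rl.on('close', () => output.end());
  });
}
```

It could be called as transformLines('old.csv', 'new.csv', (line) => line.toUpperCase()), with the returned promise resolving once new.csv is complete.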

Related

How do I export the data generated by this code to a CSV file in puppeteer?

I need to export the data generated by this code into a CSV file. I am new to node.js/puppeteer so I am struggling on generating a CSV file.
I understand I can use the fs write function and tried adding this to the end of my code to no avail:
const fs = require('fs');
const csv = await page.$$eval('.product_desc_txt', function(products){
// Iterate over product descriptions
let csvLines = products.map(function(product){
// Inside of each product find product SKU and its price
// (query the current product element, not the whole document, so every row gets its own values)
let productId = product.querySelector(".custom-body-copy").innerText.trim();
let productPrice = product.querySelector("span[data-wishlist-linkfee]").innerText.trim();
// Format them as a csv line
return `${productId};${productPrice}`;
});
// Join all lines into one file
return csvLines.join("\n");
});
fs.writeFileSync("test.csv", csv)
});
You already have the CSV data from puppeteer in csv, but you never use it. Just write it to a file:
fs.writeFileSync("test.csv", csv);
Also, writing this to the file
'${productId};${productPrice}'
won't work: there are no such variables at that point, and even if there were, the correct way to interpolate variables into a string is with backticks:
`${productId};${productPrice}`
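The difference between the two quoting styles can be seen in a small sketch. The sample values below are made up for illustration and merely stand in for the scraped SKU and price.

```javascript
// Hypothetical sample values standing in for the scraped SKU and price
const productId = 'SKU-123';
const productPrice = '19.99';

// Single quotes: no interpolation, the placeholder text is kept literally
const literal = '${productId};${productPrice}';

// Backticks: a template literal, so the variables are substituted
const interpolated = `${productId};${productPrice}`;

console.log(literal);      // ${productId};${productPrice}
console.log(interpolated); // SKU-123;19.99
```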

How can I read json values from a file?

So basically I have these json values in my config.json file, but how can I read them from a .txt file, for example:
{"prefix": $}
This would set a variable configPrefix to $. Any help?
You can use require() to read and parse your JSON file in one step:
let configPrefix = require("./config.json").prefix;
Or, if you wanted to get multiple values from that config:
const configData = require("./config.json");
let configPrefix = configData.prefix;
If your data is not actually JSON formatted, then you have to read the file yourself with something like fs.readFile() or fs.readFileSync() and then parse it yourself according to whatever formatting rules you have for the file.
If you are only going to read this file at the start of the program, go ahead and use require (or import, if you use Babel). Just a tip: surround the require with a try/catch block to handle possible errors.
let config
try {
config = require('path.to.file.json')
} catch (error) {
// handle error
config = {}
}
If the file will be changed externally and you need to pick up those changes, then apart from reading it at startup you will need a function that uses fs.readFile. Prefer that over fs.readFileSync unless you need to block the program until the config file has been read.
After all of that you can do const configPrefix = config.prefix which will have the value '$'.
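A function that re-reads the config with fs.readFile might look roughly like this. It is only a sketch: loadConfig is a hypothetical helper name, and it assumes the file contains valid JSON such as {"prefix": "$"}.

```javascript
const fs = require('fs');

// Re-read and parse a JSON config file on demand. Unlike require(), which
// caches the first result, this picks up changes made to the file later.
// loadConfig is a hypothetical helper name used for this sketch.
function loadConfig(filePath, callback) {
  fs.readFile(filePath, 'utf8', (err, text) => {
    if (err) return callback(err);
    let config;
    try {
      config = JSON.parse(text);
    } catch (parseErr) {
      // the file exists but is not valid JSON
      return callback(parseErr);
    }
    callback(null, config);
  });
}
```

A call such as loadConfig('./config.json', (err, config) => { ... }) then hands you either the error or the freshly parsed object, and config.prefix can be read from it as before.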

gunzip partials read from read-stream

I use Node.JS to fetch files from my S3 bucket.
The files over there are gzipped (gz).
I know that the content of each file is made up of lines, where each line is a JSON record that failed to be put on Kinesis.
Each file consists of ~12K such records, and I would like to be able to process the records while the file is being downloaded.
If the file were not gzipped, that could easily be done using streams and the readline module.
So the only thing stopping me from doing this is the gunzip process which, to my knowledge, needs to be executed on the whole file.
Is there any way of gunzipping part of a file?
Thanks.
EDIT 1: (bad example)
Trying what @Mark Adler suggested:
const fileStream = s3.getObject(params).createReadStream();
const lineReader = readline.createInterface({input: fileStream});
lineReader.on('line', line => {
const gunzipped = zlib.gunzipSync(line);
console.log(gunzipped);
})
I get the following error:
Error: incorrect header check
at Zlib._handle.onerror (zlib.js:363:17)
Yes. node.js has a complete interface to zlib, which allows you to decompress as much of a gzip file at a time as you like.
A working example that fixes the code above by piping the download through zlib.createGunzip() before it reaches readline:
const fileStream = s3.getObject(params).createReadStream().pipe(zlib.createGunzip());
const lineReader = readline.createInterface({input: fileStream});
lineReader.on('line', gunzippedLine => {
console.log(gunzippedLine);
})
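The same idea can be tried without S3. In the sketch below an in-memory gzip buffer stands in for the download stream; forEachGunzippedLine is a hypothetical helper name, and the two sample records are made up.

```javascript
const zlib = require('zlib');
const readline = require('readline');
const { Readable } = require('stream');

// Decompress a gzip stream on the fly and call onLine for every line.
// Resolves when the whole input has been consumed.
// forEachGunzippedLine is a hypothetical helper name.
function forEachGunzippedLine(gzipStream, onLine) {
  return new Promise((resolve, reject) => {
    const gunzip = zlib.createGunzip();
    gzipStream.on('error', reject);
    gunzip.on('error', reject);
    const rl = readline.createInterface({
      input: gzipStream.pipe(gunzip),
      crlfDelay: Infinity,
    });
    rl.on('line', onLine);
    rl.on('close', resolve);
  });
}

// Stand-in for the S3 download: two made-up JSON records, gzipped in memory
const sample = zlib.gzipSync('{"id":1}\n{"id":2}\n');
forEachGunzippedLine(Readable.from([sample]), (line) => {
  console.log(JSON.parse(line).id);
});
```

Because the decompression happens chunk by chunk, each line is available as soon as enough compressed data has arrived, not only after the whole file is there.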

Writing long strings to file (node js)

I have a string which is 169 million chars long, which I need to write to a file and then read from another process.
I have read about WriteStream and ReadStream, but how do I write the string to a file when it has no method 'pipe'?
Creating a write stream is a good idea. You can use it like this:
var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');
wstream.write('Hello world!\n');
wstream.write('Another line\n');
wstream.end();
You can call write as many times as you need, with parts of that 16 million chars string. Once you have finished writing the file, you can create a read stream to read it in chunks.
However, 16 million chars is not that much; I would say you could read and write it at once and keep the whole file in memory.
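If you do want to write the string in parts, one way is to slice it and respect the return value of write(). The following is a sketch under assumptions: writeLargeString and chunkSize are hypothetical names, and the default slice size of 64 * 1024 characters is an arbitrary choice.

```javascript
const fs = require('fs');

// Write a large string to filePath in slices, waiting for 'drain' whenever
// the stream's internal buffer is full.
// writeLargeString and chunkSize are hypothetical names for this sketch.
function writeLargeString(filePath, text, chunkSize = 64 * 1024) {
  return new Promise((resolve, reject) => {
    const out = fs.createWriteStream(filePath);
    let offset = 0;

    out.on('error', reject);
    out.on('finish', resolve);

    function writeNext() {
      while (offset < text.length) {
        let end = Math.min(offset + chunkSize, text.length);
        // do not split a surrogate pair across two slices
        const last = text.charCodeAt(end - 1);
        if (end < text.length && last >= 0xd800 && last <= 0xdbff) end++;
        const chunk = text.slice(offset, end);
        offset = end;
        // write() returns false when the buffer is full: pause until 'drain'
        if (!out.write(chunk)) {
          out.once('drain', writeNext);
          return;
        }
      }
      out.end(); // everything has been handed to the stream
    }

    writeNext();
  });
}
```

The returned promise resolves once the file has been flushed, so the other process can be told to start reading at that point.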
Update: as requested in the comments, here is an example that pipes the stream through gzip on the fly:
var zlib = require('zlib');
var gzip = zlib.createGzip();
var fs = require('fs');
var out = fs.createWriteStream('input.txt.gz');
gzip.pipe(out);
gzip.write('Hello world!\n');
gzip.write('Another line\n');
gzip.end();
This will create a gz file containing a single file with the same name (without the .gz at the end).
This might solve your problem
var fs = require('fs');
var request = require('request');
var stream = request('http://i.imgur.com/dmetFjf.jpg');
var writeStream = fs.createWriteStream('./testimg.jpg');
stream.pipe(writeStream);
Follow the link for more details
http://neethack.com/2013/12/understand-node-stream-what-i-learned-when-fixing-aws-sdk-bug/
If you're worried about writing what's called a blocking process, i.e. something that prevents you from doing anything else in the meantime, approaching it asynchronously is the best solution (and the reason Node.js is good at this type of problem). With that said, avoid the fs.*Sync methods, as those are synchronous. fs.writeFile is what I believe you're looking for. Read the docs.

Node.js import csv with blank fields

I'm trying to import and parse a CSV file using the csv-parse package, but I'm having difficulty requiring the CSV file in the first place.
When I do input = require('../../path-to-my-csv-file')
I get an error due to consecutive commas because some fields are blank:
e","17110","CTSZ16","Slitzer™ 16pc Cutlery Set in Wood Block",,"Spice up
^
SyntaxError: Unexpected token ,
How do I import the CSV file into the node environment to begin with?
Package examples are Here
To solve your first problem, reading a CSV with empty entries:
Use the 'fast-csv' node package. It will parse CSV with empty entries.
To answer your second question, how to import a CSV into node:
You don't really "import" CSV files into node. You should fs.open the file or use fs.createReadStream to read the CSV file at the appropriate location.
Below is a script that uses fs.createReadStream to parse a CSV called 'test.csv' that is one directory up from the script that is running it.
The first section sets up our program and makes basic declarations of the objects we're going to use to store our parsed list.
var csv = require('fast-csv') // require fast-csv module
var fs = require('fs') // require the fs, filesystem module
var uniqueindex = 0 // just an index for our array
var dataJSON = {} // our JSON object, (make it an array if you wish)
This next section declares a stream that will intercept data as it's read from our CSV file and do stuff to it. In this case we're intercepting the data and storing it in a JSON object and then saving that JSON object once the stream is done. It's basically a filter that intercepts data and can do what it wants with it.
var csvStream = csv.parse() // - create a fast-csv parser stream (current releases need csv.parse(); plain csv() only worked in old ones)
  .on('data', function (data) { // - called once for every parsed row
    dataJSON[uniqueindex] = data; // - store the row in our object dataJSON
    uniqueindex++ // - advance the index for the next row
  })
  .on('end', function () { // - called when the whole file has been parsed
    console.log(dataJSON) // - log our whole object on console
    fs.writeFile('../test.json', // - use fs module to write a file
      JSON.stringify(dataJSON, null, 4), // - turn our object into a string that can be written
      function (err) { // - runs once we're done saving the file; err is null if there is no error
        if (err) throw err // - if there's an error while saving the file, throw it
        console.log('data saved as JSON yay!')
      })
  })
This section creates what is called a "readStream" from our CSV file. The path to the file is relative. A stream is just a way of reading a file. It's pretty powerful, though, because the data from a stream can be piped into another stream.
So we'll create a stream that reads the data from our CSV file, and then we'll pipe it into the parser stream / filter we defined in the second section.
var stream = fs.createReadStream('../test.csv')
stream.pipe(csvStream)
This will create a file called 'test.json' one directory up from the place where our csv parsing script is. test.json will contain the parsed CSV list inside a JSON object. The order in which the code appears here is how it should appear in a script you make.
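To see why blank fields are no problem once the file is read as text (require fails because it tries to evaluate the CSV as JavaScript), here is a minimal hand-written sketch of splitting a single CSV line. It is not how fast-csv works internally; splitCsvLine is a hypothetical helper, the sample line is made up, and it does not handle quoted fields that span several lines, so a real parser remains the better choice.

```javascript
// Split one CSV line into fields, honouring double-quoted fields and
// keeping empty fields (two consecutive commas) as empty strings.
// splitCsvLine is a hypothetical helper for this sketch.
function splitCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(current); // field boundary, possibly an empty field
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current); // last field on the line
  return fields;
}

// Made-up sample line with a blank fourth field
console.log(splitCsvLine('"17110","CTSZ16","Cutlery Set",,"Spice up"'));
```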
