NodeJS - read CSV file to array returns []

I'm trying to use the promised-csv module (https://www.npmjs.com/package/promised-csv) to read the rows of a CSV file to an array of strings for a unit test:
const inputFile = '.\\test\\example_result.csv';
const CsvReader = require('promised-csv');
function readCSV(inputFile){
    var reader = new CsvReader();
    var output = [];
    reader.on('row', function (data) {
        //console.log(data);
        output.push(data[0]);
    });
    reader.read(inputFile, output);
    return output;
}
I would like to call this function later in a unit test.
it("Should store the elements of the array", async () => {
var resultSet = readCSV(inputFile);
console.log(resultSet);
});
However, resultSet yields an empty array. I am also open to using any other module, as long as I can get an array of strings as a result.

The code should look something like this, according to the docs.
const inputFile = './test/example_result.csv';
const CsvReader = require('promised-csv');
function readCSV(inputFile) {
    return new Promise((resolve, reject) => {
        var reader = new CsvReader();
        var output = [];
        reader.on('row', data => {
            // data is an array of data. You should
            // concatenate it to the data set to compile it.
            output = output.concat(data);
        });
        reader.on('done', () => {
            // output will be the compiled data set.
            resolve(output);
        });
        reader.on('error', err => reject(err));
        reader.read(inputFile);
    });
}
it("Should store the elements of the array", async () => {
var resultSet = await readCSV(inputFile);
console.log(resultSet);
});

readCSV() returns a Promise. There are two ways that you can access the data it returns upon completion.
As Roland Starke suggests, use async and await.
var resultSet = await readCSV(inputFile);
This will wait for the Promise to resolve before returning a value.
More here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await
Use Promise.prototype.then() - this is similar to async/await, but can also be chained with other promises and Promise.prototype.catch().
The most important thing to remember is that the function passed to .then() will not be executed until readCSV() has resolved.
readCSV(inputFile).then((data) => { return data; }).catch((err) => { console.log(err); });
More here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/then
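For example, the same unit test from the question written with .then() instead of await might look like this (a sketch, assuming the Mocha-style it() block shown above; returning the promise lets the test runner wait for it):
it("Should store the elements of the array", () => {
    return readCSV(inputFile).then(resultSet => {
        console.log(resultSet);
    });
});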

Related

Comparing two arrays from async functions?

I have read a lot of posts on how to solve this problem, but I cannot understand it.
I have a database (psql) and a CSV. I have two functions: one to read a list of domains from psql, and another to read a different list of domains from the CSV.
Both functions are async operations that live in separate modules.
Goal: to bring the results of both reader functions (which are arrays) into the same file and compare them for duplicates.
Currently, I have made progress using Promise.all. However, I cannot seem to isolate the two separate arrays so I can use them.
Solution Function (not working):
This is where I am trying to read in both lists into two separate arrays.
The CSVList variable has a console.log that logs the array when the CSVList.filter call is not present, which leads me to believe that the array is actually there? Maybe?
const allData = async function () {
    let [result1, result2] = await Promise.all([readCSV, DBList]);
    const DBLists = result2(async (domainlist) => {
        return domainlist;
    });
    const CSVList = result1(async (csv) => {
        const csvArr = await csv.map((x) => {
            return x[0];
        });
        console.log(csvArr);
        return csvArr;
    });
    const main = await CSVList.filter((val) => !DBLists.includes(vals)); // this doesn't work. it says that filter is not a function. I understand why filter is not a function. What I do not understand is why the array is not being returned?
};
allData();
psql reader:
const { pool } = require("./pgConnect");
//
const DBList = async (callback) => {
    await pool
        .query(
            `
            SELECT website
            FROM domains
            limit 5
            `
        )
        .then(async (data) => {
            const domainList = await data.rows.map((x) => {
                return x.website;
            });
            callback(domainList);
        });
};
csv reader:
const { parseFile } = require("@fast-csv/parse");
const path = require("path");
const fs = require("fs");
const domainPath = path.join(__dirname, "domains.csv");
// reads initial domain list and pushes the domains to an array
// on end, calls a callback function with the domain data
const readCSV = async (callback) => {
    let domainList = [];
    let csvStream = parseFile(domainPath, { headers: false })
        .on("data", (data) => {
            // push csv data to domainList array
            domainList.push(data);
            // console.log(data);
        })
        .on("end", () => {
            callback(domainList);
        });
};
I took jfriend00's advice and updated my code a bit.
The biggest issue was the readCSV function. fast-csv doesn't seem to return a promise on its own, so I wrapped it in a new Promise manually and then resolved that promise, passing the domain list as an argument to resolve.
updated CSV Reader:
const readCSV2 = new Promise((resolve, reject) => {
    let domainList = [];
    let csvStream = parseFile(domainPath, { headers: false })
        .on("data", (data) => {
            // push csv data to domainList array
            domainList.push(data[0]);
            // console.log(data);
        })
        .on("end", () => {
            resolve(domainList);
        });
});
Updated Solution for comparing the two lists
const allData = async function () {
    // get the values from the DB and CSV in one place
    let [result1, result2] = await Promise.all([readCSV2, DBList]);
    const CSVDomains = await result1;
    const DBDomains = await result2();
    // final list compares the two lists and returns the list of non duplicated domains.
    const finalList = await CSVDomains.filter(
        (val) => !DBDomains.includes(val)
    );
    console.log("The new list is: " + finalList);
};
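Note (my addition, not part of the original post): for await result2() above to yield the domain array, DBList also has to return the list rather than only handing it to a callback. A promise-returning variant might look like this:
const DBList = async () => {
    const data = await pool.query(`
        SELECT website
        FROM domains
        limit 5
    `);
    return data.rows.map((x) => x.website);
};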
Quick aside: I could have accomplished the same result by using psql's ON CONFLICT DO NOTHING. This would have ignored duplicates when inserting into the database, because I have a UNIQUE constraint on the domain column.
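For reference, a minimal sketch of that alternative (my addition; it assumes the same pool from ./pgConnect and that the UNIQUE constraint sits on the website column used in the SELECT above):
// Let PostgreSQL drop duplicates on insert instead of filtering in JavaScript.
const insertDomains = async (domainList) => {
    for (const domain of domainList) {
        await pool.query(
            "INSERT INTO domains (website) VALUES ($1) ON CONFLICT (website) DO NOTHING",
            [domain]
        );
    }
};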

How to return Data from readFileAsync

I have a function that needs to read from a file, then convert the contents to a JSON object with JSON.parse(string), and lastly return it to be used elsewhere.
This object is written to a txt file using JSON.stringify():
let User = {
    Username: "#",
    Password: "#",
    FilePaths: [
        {Year: '#', Month: '#', FileName: '#'}
    ]
}
The writing works fine; my problem is the reading.
Here is my code:
const readFileAsync = promisify(fs.readFile);
async function GetUserData(User) {
    const Read = async () => {
        return await (await readFileAsync('UserData\\' + User.Username + '\\UserData.txt')).toString();
    }
    return Read().then(data => { return JSON.parse(data); });
}
console.log(GetUserData(User));
The result I get back is:
Promise {[[PromiseState]]: 'pending', [[PromiseResult]]: undefined}
How can I make this work so that I get the parsed object back instead of a pending promise?
In this instance, GetUserData returns a promise, so you'll need to use .then to get the data once resolved.
GetUserData(User).then(data => {
    // do something with data
});
Alternatively, fs has a sync function you can use: rather than relying on readFile (which is async), use readFileSync. Usage below:
function GetUserData(User) {
    const data = fs.readFileSync('UserData\\' + User.Username + '\\UserData.txt').toString();
    return JSON.parse(data);
}
console.log(GetUserData(User));
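A third option (not part of the answer above; a sketch using Node's built-in fs.promises API) keeps the read asynchronous and awaits it from an async context, so you get the parsed object rather than a pending promise:
const fs = require('fs');

async function GetUserData(User) {
    const data = await fs.promises.readFile('UserData\\' + User.Username + '\\UserData.txt', 'utf8');
    return JSON.parse(data);
}

(async () => {
    console.log(await GetUserData(User)); // logs the parsed object
})();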

Can't add key from function to dictionary

My code:
var price = {};
function getPrice(price) {
    const https = require('https');
    var item = ('M4A1-S | Decimator (Field-Tested)')
    var body = '';
    var price = {};
    https.get('https://steamcommunity.com/market/priceoverview/?appid=730&market_hash_name=' + item, res => {
        res.on('data', data => {
            body += data;
        })
        res.on('end', () => price['value'] = parseFloat(JSON.parse(body).median_price.substr(1))); //doesnt add to dict
    }).on('error', error => console.error(error.message));
}
price['test'] = "123" //adds to dict fine
getPrice(price)
console.log(price);
Output:
{ test: '123' }
As you can see, the "test: 123" gets added, but the "value: xxx" from the function doesn't. Why is that?
There are two main problems here:
You're redeclaring the price variable inside your function, so you're creating a separate, new variable and modifying that one; the higher-scoped variable never gets your .value property.
You're assigning the property inside an asynchronous callback that runs sometime later, after your function has returned, so your console.log() runs too soon, before you have even obtained the value. This is a classic issue with returning asynchronously obtained data from a function in JavaScript. You will need to communicate that data back with a callback or with a promise.
I would also suggest that you use a higher-level, promise-based library for making your HTTP request and parsing the results. There are many that already support promises, already read the whole response, already offer JSON parsing built-in, do appropriate error detection and propagation, etc. You don't need to write all that yourself. My favorite library for this is got(), but you can see a list of many good choices here. I would strongly advise that you use promises to communicate back your asynchronous result.
My suggestion for fixing this would be this code:
const got = require('got');
async function getPrice() {
    const item = 'M4A1-S | Decimator (Field-Tested)';
    const url = 'https://steamcommunity.com/market/priceoverview/?appid=730&market_hash_name=' + item;
    const body = await got(url).json();
    if (!body.success || !body.median_price) {
        throw new Error('Could not obtain price');
    }
    return parseFloat(body.median_price.substr(1));
}
getPrice().then(value => {
    // use value here
    console.log(value);
}).catch(err => {
    console.log(err);
});
When I run this, it logs 5.2.
You're actually console.logging .price before you're setting .value; .value isn't set until the asynchronous call fires.
You are declaring price again inside the function and also not waiting for the asynchronous task to finish.
const https = require("https");
const getPrice = () =>
    new Promise((resolve, reject) => {
        const item = "M4A1-S | Decimator (Field-Tested)";
        let body = "";
        return https
            .get(
                `https://steamcommunity.com/market/priceoverview/?appid=730&market_hash_name=${item}`,
                res => {
                    res.on("data", data => {
                        body += data;
                    });
                    res.on("end", () =>
                        resolve(
                            parseFloat(JSON.parse(body).median_price.substr(1))
                        )
                    );
                }
            )
            .on("error", error => reject(error));
    });
const main = async () => {
    try {
        const price = await getPrice();
        // use the price value to do something
    } catch (error) {
        console.error(error);
    }
};
main();

How to make sure all lines are executed before function return

Calling readCSV from index.js:
const productIds = await readCSV();
in another file:
async function readCSV() {
    const filepath = path.resolve('src/input_csv.csv');
    const readstream = await fs.createReadStream(filepath);
    const stream = await readstream.pipe(parser());
    let productIds = [];
    await stream.on('data', data => {
        productIds.push(data.SourceProductId);
        console.log('SourceProductId', data.SourceProductId);
    });
    await stream.on('end', () => {
        console.log(productIds);
    });
    if (productIds.length > 0) return productIds;
    else return 'no products found';
}
it is giving the output:
in main: []
SourceProductId 1000050429
SourceProductId 1132353
SourceProductId 999915195
SourceProductId 50162873
SourceProductId 1000661087
[ '1000050429', '1132353', '999915195', '50162873', '1000661087' ]
I'm expecting the function to return an array of all values read from the CSV, but it seems like it is returning before the stream.on handlers execute. How do I make sure all lines are read before returning? I placed await before every statement, but no luck.
stream.on doesn't return a promise, so you cannot await it.
You can work around this by returning a promise from your readCSV function:
function readCSV() {
    return new Promise(resolve => {
        const filepath = path.resolve('src/input_csv.csv');
        const readstream = fs.createReadStream(filepath);
        const stream = readstream.pipe(parser());
        let productIds = [];
        stream.on('data', data => {
            productIds.push(data.SourceProductId);
            console.log('SourceProductId', data.SourceProductId);
        });
        stream.on('end', () => {
            console.log(productIds);
            if (productIds.length > 0) resolve(productIds);
            else resolve('no products found');
        });
    });
}
You create a promise by passing a callback function as argument. That callback function gets an argument resolve, which is another callback function that you call when your asynchronous operation is done, passing the result.
In the example above, we call this resolve callback with the product IDs after the file read stream has finished.
Since readCSV now returns a promise, you can await it like you did in your code example:
const productIds = await readCSV();
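One caveat worth adding (my note, not part of the answer above): the promise never rejects, so a missing file or a parse error would leave the await hanging. A minimal sketch wiring up a reject callback as well, assuming the same path/fs/parser imports as the question:
function readCSV() {
    return new Promise((resolve, reject) => {
        const filepath = path.resolve('src/input_csv.csv');
        const readstream = fs.createReadStream(filepath);
        const stream = readstream.pipe(parser());
        let productIds = [];
        readstream.on('error', err => reject(err)); // file-read errors don't propagate through pipe()
        stream.on('error', err => reject(err));     // parser errors
        stream.on('data', data => productIds.push(data.SourceProductId));
        stream.on('end', () => {
            resolve(productIds.length > 0 ? productIds : 'no products found');
        });
    });
}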

NodeJS, promises, streams - processing large CSV files

I need to build a function for processing large CSV files for use in a bluebird.map() call. Given the potential sizes of the file, I'd like to use streaming.
This function should accept a stream (a CSV file) and a function (that processes the chunks from the stream) and return a promise when the file is read to end (resolved) or errors (rejected).
So, I start with:
'use strict';
var _ = require('lodash');
var promise = require('bluebird');
var csv = require('csv');
var stream = require('stream');
var pgp = require('pg-promise')({promiseLib: promise});
api.parsers.processCsvStream = function(passedStream, processor) {
    var parser = csv.parse(passedStream, {trim: true});
    passedStream.pipe(parser);
    // use readable or data event?
    parser.on('readable', function() {
        // call processor, which may be async
        // how do I throttle the amount of promises generated
    });
    var db = pgp(api.config.mailroom.fileMakerDbConfig);
    return new Promise(function(resolve, reject) {
        parser.on('end', resolve);
        parser.on('error', reject);
    });
}
Now, I have two inter-related issues:
I need to throttle the actual amount of data being processed, so as to not create memory pressure.
The function passed as the processor param is often going to be async, such as saving the contents of the file to the db via a library that is promise-based (right now: pg-promise). As such, it will create a promise in memory and move on, repeatedly.
The pg-promise library has functions to manage this, like page(), but I'm not able to wrap my head around how to mix stream event handlers with these promise methods. Right now, I return a promise in the readable handler after each read(), which means I create a huge number of promised database operations and eventually fault out because I hit a process memory limit.
Does anyone have a working example of this that I can use as a jumping point?
UPDATE: Probably more than one way to skin the cat, but this works:
'use strict';
var _ = require('lodash');
var promise = require('bluebird');
var csv = require('csv');
var stream = require('stream');
var pgp = require('pg-promise')({promiseLib: promise});
api.parsers.processCsvStream = function(passedStream, processor) {
    // some checks trimmed out for example
    var db = pgp(api.config.mailroom.fileMakerDbConfig);
    var parser = csv.parse(passedStream, {trim: true});
    passedStream.pipe(parser);
    var readDataFromStream = function(index, data, delay) {
        var records = [];
        var record;
        do {
            record = parser.read();
            if (record != null)
                records.push(record);
        } while (record != null && (records.length < api.config.mailroom.fileParserConcurrency))
        parser.pause();
        if (records.length)
            return records;
    };
    var processData = function(index, data, delay) {
        console.log('processData(' + index + ') > data: ', data);
        parser.resume();
    };
    parser.on('readable', function() {
        db.task(function(tsk) {
            this.page(readDataFromStream, processData);
        });
    });
    return new Promise(function(resolve, reject) {
        parser.on('end', resolve);
        parser.on('error', reject);
    });
}
Does anyone see a potential problem with this approach?
You might want to look at promise-streams
var ps = require('promise-streams');
passedStream
    .pipe(csv.parse({trim: true}))
    .pipe(ps.map({concurrent: 4}, row => processRowDataWhichMightBeAsyncAndReturnPromise(row)))
    .wait().then(_ => {
        console.log("All done!");
    });
Works with backpressure and everything.
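And since .wait() hands back a promise, the rejection path from the original requirement can be consumed with .catch() as well (a small extension of the snippet above, not from the original answer):
passedStream
    .pipe(csv.parse({trim: true}))
    .pipe(ps.map({concurrent: 4}, row => processRowDataWhichMightBeAsyncAndReturnPromise(row)))
    .wait()
    .then(_ => console.log("All done!"))
    .catch(err => console.error("Processing failed:", err));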
Find below a complete application that correctly executes the same kind of task as you want: it reads a file as a stream, parses it as CSV, and inserts each row into the database.
const fs = require('fs');
const promise = require('bluebird');
const csv = require('csv-parse');
const pgp = require('pg-promise')({promiseLib: promise});
const cn = "postgres://postgres:password@localhost:5432/test_db";
const rs = fs.createReadStream('primes.csv');
const db = pgp(cn);
function receiver(_, data) {
    function source(index) {
        if (index < data.length) {
            // here we insert just the first column value that contains a prime number;
            return this.none('insert into primes values($1)', data[index][0]);
        }
    }
    return this.sequence(source);
}
db.task(t => {
        return pgp.spex.stream.read.call(t, rs.pipe(csv()), receiver);
    })
    .then(data => {
        console.log('DATA:', data);
    })
    .catch(error => {
        console.log('ERROR:', error);
    });
Note the changes: I used the csv-parse library instead of csv, as a better alternative, and added use of method stream.read from the spex library, which properly serves a Readable stream for use with promises.
I found a slightly better way of doing the same thing, with more control. This is a minimal skeleton with precise parallelism control. With a parallel value of one, all records are processed in sequence without holding the entire file in memory; we can increase the parallel value for faster processing.
const csv = require('csv');
const csvParser = require('csv-parser')
const fs = require('fs');
const readStream = fs.createReadStream('IN');
const writeStream = fs.createWriteStream('OUT');
const transform = csv.transform({ parallel: 1 }, (record, done) => {
    asyncTask(...) // return Promise
        .then(result => {
            // ... do something when success
            return done(null, record);
        }, (err) => {
            // ... do something when error
            return done(null, record);
        });
});
readStream
    .pipe(csvParser())
    .pipe(transform)
    .pipe(csv.stringify())
    .pipe(writeStream);
This allows doing an async task for each record.
To return a promise instead, we can return an empty promise and complete it when the stream finishes.
.on('end', function() {
    // do something with csvData
    console.log(csvData);
});
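A sketch of that idea, reusing the readStream / transform / writeStream names from the skeleton above (this wiring is my addition, not part of the original answer):
function processCsv() {
    return new Promise((resolve, reject) => {
        readStream
            .pipe(csvParser())
            .pipe(transform)
            .pipe(csv.stringify())
            .pipe(writeStream)
            .on('finish', () => resolve())   // writable streams emit 'finish' once everything is flushed
            .on('error', err => reject(err));
    });
}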
So you're saying you don't want streaming, but some kind of data chunks? ;-)
Do you know https://github.com/substack/stream-handbook?
I think the simplest approach without changing your architecture would be some kind of promise pool. e.g. https://github.com/timdp/es6-promise-pool
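A minimal sketch of that idea with es6-promise-pool (the rows array and processRow function are placeholders, not from the question): the producer hands out one promise per row and the pool caps how many run at once.
const PromisePool = require('es6-promise-pool');

function processAllRows(rows, processRow, concurrency) {
    let i = 0;
    // Producer: return the next promise, or null when there is nothing left to do.
    const producer = () => (i < rows.length ? processRow(rows[i++]) : null);
    const pool = new PromisePool(producer, concurrency);
    return pool.start(); // resolves when every row has been processed
}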
