I have the following function which processes my CSV file. Unfortunately, the CSV file has one column that also uses the comma as a thousands separator (I have no influence over this exported CSV file and its structure). So in every file, from a certain row onward, there will be one extra column.
In the on('data', ...) handler, I've already fixed this value by joining the two fields together and deleting the redundant field. But in the end, this still results in rows with an extra column; the 4th column will just be empty.
I would like every field to 'shift' to the left when the field is deleted. Is it possible to manipulate this, or do I need an extra function which processes the output and ignores all 'null' fields?
function readLines(file, options, cb) {
  let results = [];
  fs.createReadStream(file)
    .pipe(csv(options))
    .on('data', (data) => {
      if (Object.keys(data).length === 59) {
        data['2'] = data['2'] + data['3'];
        delete data['3'];
      }
      results.push(data);
    })
    .on('end', () => {
      cb(results);
    });
}
I've fixed it by filtering the return object in the callback function:
cb(Object.values(results).map((r) => {
  return r.filter((x) => {
    return x != null && x !== "";
  });
}));
Probably not the most efficient, but the best I could come up with so far.
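Alternatively, the shift can happen inside the data handler itself. A minimal sketch, assuming the parser emits each row as an object keyed by column index (as the length-59 check suggests): rebuild the row as an array and splice out the redundant field, so every later value shifts left and no empty column remains.
.on('data', (data) => {
  let row = Object.values(data);
  if (row.length === 59) {
    // re-join the value that was split on the thousands comma
    row[2] = row[2] + row[3];
    // splice removes the redundant field and shifts later fields left
    row.splice(3, 1);
  }
  results.push(row);
})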
I want to replace a string in a particular column of a .csv file in Node.js.
Here is my .csv data:
ID,Name,openingBalance,closingBalance
"27G","HARRIS TODD",23.22,465.22
"28G","ANGELO RALPH",124.31,555.20
"28N","GRODKO STEVEN",45.22,
"29A","FOWLER ROBERT",65.25,666.00
"29G","PROVOST BRIAN",,253.11
"300","BECKMAN JUDITH",114.21,878.21
In the closingBalance column there is a blank which I need to replace with 0.00.
I am able to do the replacement across the whole data, but not for the specific column.
Can anyone please help?
I used this to replace the string:
var fs = require('fs');

fs.readFile(someFile, 'utf8', function (err, data) {
  if (err) {
    return console.log(err);
  }
  var result = data.replace(/string to be replaced/g, 'replacement');
  fs.writeFile(someFile, result, 'utf8', function (err) {
    if (err) return console.log(err);
  });
});
You can use a regular expression to replace blank cells, but that's easy only for the first/last columns:
data.replace(/,$/gm, ',0.00');
See Regex101 for further details and a playground.
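The first-column case mentioned above would anchor at the start of each line instead; for example (hypothetical, for a CSV whose first column may be empty):
data.replace(/^,/gm, '0.00,');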
Another way is to parse the CSV into an array of arrays (AoA):
const csv = `ID,Name,openingBalance,closingBalance
"27G","HARRIS TODD",23.22,465.22
"28G","ANGELO RALPH",124.31,555.20
"28N","GRODKO STEVEN",45.22,
"29A","FOWLER ROBERT",65.25,666.00
"29G","PROVOST BRIAN",,253.11
"300","BECKMAN JUDITH",114.21,878.21`;

const aoa = csv
  .split(/\r?\n/g)
  .map((row) => {
    let [ID, Name, openingBalance, closingBalance] = row.split(',');
    // Fix empty values for the "openingBalance" column
    if (!openingBalance.trim()) {
      openingBalance = '0.00';
    }
    // Fix empty values for the "closingBalance" column
    if (!closingBalance.trim()) {
      closingBalance = '0.00';
    }
    return [ID, Name, openingBalance, closingBalance];
  });

// now you have an AoA with fixed values
console.log(aoa.map((row) => row.join(',')).join('\n'));
This way, you can pre-process any column with any code.
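For instance, a hypothetical extra rule could trim stray whitespace from every cell before rejoining; any per-column logic slots in the same way:
// illustrative only: any per-cell rule can be applied over the parsed AoA
const cleaned = aoa.map((row) => row.map((cell) => String(cell).trim()));
console.log(cleaned.map((row) => row.join(',')).join('\n'));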
You can use a regex to find the blank cells in the closingBalance column.
The cells of this particular column are the last ones in each record, which can easily be found with \n and $ in a regex.
So to do this:
const result = data.replace(/(,)(?=(\n|$))/g, '$10.00');
Or, if you want to find any blank cells, you can use the following regex:
/(,)(?=(,|\n|$))/g
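As a quick check, here is that regex applied to the sample data from the question (a sketch; data is assumed to hold the CSV text from above):
const result = data.replace(/(,)(?=(,|\n|$))/g, '$10.00');
// "28N","GRODKO STEVEN",45.22,   becomes   "28N","GRODKO STEVEN",45.22,0.00
// "29G","PROVOST BRIAN",,253.11  becomes   "29G","PROVOST BRIAN",0.00,253.11
console.log(result);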
I'm trying to build a database with Bluebird and sqlite3 to manage a lot of "ingredients".
So far I've managed to parse a file and extract some data from it using a regex.
Every time a line matches the regex, I want to search the database for an element with the same name; if one has already been inserted, the element is skipped, otherwise it must be inserted.
The problem is that some elements get inserted more than once.
The code partially works. I'm saying it partially works because if I remove the line of code where I check for the existence of an element with the same name, there are many more duplicate rows.
Here's the piece of code:
lineReader.eachLine(FILE_NAME, (line, last, cb) => {
  let match = regex.exec(line);
  if (match != null) {
    let matchedName = match[1].trim();
    // This function returns a Promise for all the rows with a matching name
    ingredientsRepo.getByName(matchedName)
      .then((entries) => {
        if (entries.length > 0) {
          console.log("ALREADY INSERTED INGREDIENT: " + matchedName);
        } else {
          console.log("ADDING " + matchedName);
          ingredientsRepo.create(matchedName);
        }
      });
  }
});
I know I'm missing something about Promises, but I can't understand what I'm doing wrong.
Here's the code of both getByName(name) and create(name):
create(name) {
  return this.dao.run(
    `INSERT INTO ingredients (name) VALUES (?)`,
    [name]
  );
}

getByName(name) {
  return this.dao.all(
    'SELECT * FROM ingredients WHERE name == ?',
    [name]
  );
}
ingredientsRepo.getByName(matchedName) returns a promise, which means it's asynchronous. I am also guessing that ingredientsRepo.create(matchedName) is asynchronous, because you are inserting something into a DB.
Your loop doesn't wait for these async functions to complete, so the loop could already be on the 50th iteration when the .then(...) is called for the first iteration.
So let's say the first 10 elements have the same name. Since the asynchronous functions take some time to complete, the name will not be inserted into the DB until, say, the 20th iteration of the loop. So for the first 10 elements, the name does not yet exist in the DB.
It's a timing issue.
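One way to fix it, sketched below assuming lineReader is the line-reader module (where calling cb() advances to the next line, as in the question's code): only move on once the current line's lookup and insert have settled, so each name is committed before the next line is checked.
lineReader.eachLine(FILE_NAME, (line, last, cb) => {
  let match = regex.exec(line);
  if (match == null) return cb(); // nothing to do, advance immediately
  let matchedName = match[1].trim();
  ingredientsRepo.getByName(matchedName)
    .then((entries) => {
      if (entries.length > 0) {
        console.log("ALREADY INSERTED INGREDIENT: " + matchedName);
      } else {
        console.log("ADDING " + matchedName);
        return ingredientsRepo.create(matchedName);
      }
    })
    .then(() => cb()) // advance only after the DB work for this line settles
    .catch((err) => {
      console.error(err);
      cb(false); // stop reading on failure
    });
});
A UNIQUE constraint on ingredients.name would also guard against duplicates at the database level, independently of timing.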
I want to use the file names produced by drive.files.list as strings for downloading all the files in a folder.
I am having trouble accessing an array of filenames, basically through lack of knowledge, so I am lost by the logic of the mouse-over reference for map in files.map(). See the code below.
My code:
if (files.length) {
  // map calls the callback one time for each element in the array
  files.map((file) => {
    // there are x elements (a filename/id), so this runs x times;
    // I want to access one filename at a time and return a plain string
    // filename for a downloadFile() function
    var names = [];
    files.forEach(function (file, i) {
      names[i] = file.name;
    });
    // rough test to produce the desired output: with files.length = 1; this
    // prints the first and only filename (correct result!!!), but it produces
    // 'undefined' for array indexes greater than 0, e.g. names[1]; without
    // files.length = 1; the filename is outputted x times
    console.log(names[0]);
  });
}
I don't have much experience with OOP or Node.js, but using various tests (code changes), the largest output looked like an array of arrays. I want to narrow it down to an array of filenames that I can JSON.stringify before using it for downloading.
Mouse over of 'map'
(method) Array<drive_v3.Schema$File>.map<void>(callbackfn: (value: drive_v3.Schema$File, index: number, array: drive_v3.Schema$File[]) => void, thisArg?: any): void[]
Calls a defined callback function on each element of an array, and returns an array that contains the results.
#param callbackfn — A function that accepts up to three arguments. The map method calls the callbackfn function one time for each element in the array.
#param thisArg — An object to which the this keyword can refer in the callbackfn function. If thisArg is omitted, undefined is used as the this value.
Any suggestions appreciated.
Try replacing the code with this one:
if (files.length) {
  var names = [];
  files.forEach(function (file, i) {
    names.push(file.name);
  });
  console.log('filenames: ', names);
}
and then you can read the filenames by looping through the names array and pass them to other functions for processing:
names.forEach((fileName, i) => {
  console.log('filename: ', fileName, ' index', i);
  // pass the name to other functions for processing
  download(fileName);
});
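Since map "returns an array that contains the results" (per the mouse-over text above), the forEach/push pair can also be written as a one-liner; the callback's return value becomes each element of the new array:
// each file object is mapped to its name, yielding a plain array of strings
var names = files.map((file) => file.name);
console.log(JSON.stringify(names));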
Using Node.js and NodeCSV.
I am looping over multiple CSV streams, and trying to merge them into a single CSV, with a header line as the first line. I am also filtering to get only records whose time column is later than a given time, though I am pretty sure this part isn't relevant to my problem.
The input CSVs have header lines, and some of them might not contain the expected columns, in which case they should pass through without emitting any records.
The code (modified for brevity, please excuse copy/paste/munge errors) is like this:
const csv = require('csv');

let header = true; // first record should emit a header line
const streams = [_array, _of, _read, _streams];
const lastIndex = streams.length - 1;
const sinceTime = Date.parse(someTime);

for (const [i, s] of streams.entries()) {
  s
    .pipe(csv.parse({columns: true}))
    .pipe(csv.transform(function (record) {
      const recordTime = Date.parse(record.time);
      if (recordTime > sinceTime) {
        return record;
      }
    }))
    .pipe(csv.stringify({
      columns: ['time', 'col'],
      header: header
    }))
    .pipe(output, {end: i === lastIndex});
  // this should only be cleared if the file actually emitted a header, but how do I know!?
  header = false;
}
The problem occurs when the first input file doesn't contain the time and col columns, let's say it's a csv with X and Y columns instead.
In this case, the file emits no records (as desired) but as a result it emits no header line, either. But I don't know that, and happily set the header boolean to false. Then my output contains no header line, which is not what I want.
If the first file is with the expected columns time and col, everything is great, and I get the header line from the first file.
How can I handle this case, and emit a header line even if the first file emits no records?
Update:
I thought of the following solution, which works, but it seems like there must be a better way. I tried generating an empty CSV with a header before the loop, like this:
const csv = require('csv');

const streams = [_array, _of, _read, _streams];
const lastIndex = streams.length - 1;
const sinceTime = Date.parse(someTime);
const columns = ['time', 'col'];

// generate the header line with an empty CSV
csv
  .generate({length: 0})
  .pipe(csv.parse({columns: true}))
  .pipe(csv.stringify({
    columns: columns,
    header: true
  }))
  .pipe(output, {end: false});

for (const [i, s] of streams.entries()) {
  s
    .pipe(csv.parse({columns: true}))
    .pipe(csv.transform(function (record) {
      const recordTime = Date.parse(record.time);
      if (recordTime > sinceTime) {
        return record;
      }
    }))
    .pipe(csv.stringify({
      columns: columns,
      header: false
    }))
    .pipe(output, {end: i === lastIndex});
}
Ugly, but it's doing what I want it to do. Is there a cleaner way?
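A simpler variant, assuming output is a plain writable stream under your control: skip the generate/parse/stringify pipeline and write the header line yourself before the loop.
// emit the header row directly; every stream in the loop keeps header: false
output.write(columns.join(',') + '\n');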
I have an ajax call in my add-in which should create or update a table in Excel. If the table already exists, it should remove its rows and add the new results.
When deleting the rows in a loop, it deletes some rows and then I get the following error:
Debug info: {"code":"InvalidArgument","message":"The argument is invalid or missing or has an incorrect format.","errorLocation":"TableRowCollection.getItemAt"}
My ajax call in my Excel web add-in looks like this:
$.ajax({
  //....
}).done(function (data) {
  Excel.run(function (ctx) {
    var odataTable = ctx.workbook.tables.getItemOrNullObject("odataTable");
    // row items are not available at this point; load them and sync the context
    odataTable.rows.load();
    return ctx.sync().then(function () {
      if (odataTable.rows.items == null) {
        odataTable.delete();
        odataTable = ctx.workbook.tables.add('B2:G2', true);
        odataTable.name = "odataTable";
      } else {
        console.log("Rows items: " + odataTable.rows.items.length);
        odataTable.rows.items.forEach(function (item) {
          console.log("Removing row item: " + item.values);
          item.delete();
        });
        console.log("rows cleaned");
      }
    }).then(function () {
      // add rows to the table
    }).then(ctx.sync);
  }).catch(errorHandler);
}).fail(function (status) {
  showNotification('Error', 'Could not communicate with the server. ' + JSON.stringify(status));
}).always(function () {
  $('#refresh-button').prop('disabled', false);
});
Iterable collections consist of indexed items, and once you remove an item mid-iteration, the remaining items are re-indexed and the collection no longer matches what the loop expects.
In your case, you are deleting with a for-each loop, so after the first deletion the iteration is broken. Thus, you need another approach.
Another approach:
Start looping with a normal for loop, reversed. E.g.:
for (let i = totalRows - 1; i >= 0; i--) {
  if (/* row i should be removed */) {
    // delete row i
  }
}
This has already been answered but I recently solved this issue myself and came here to see if anyone had posted a question about it.
When you delete a row, Excel will reorder the row index for each row: i.e. when you delete row 1, row 2 becomes row 1 and all other rows get shifted down 1 index. Because these deletions are pushed to a batch to be completed, when the second deletion is executed, your second row has become row one, so it actually skips the second row and executes what you think is your third row.
If you start from the last row and work backwards, this reordering doesn't occur and neither does the error.
For completeness' sake, the above example would become:
return ctx.sync().then(function () {
  if (odataTable.rows.items == null) {
    odataTable.delete();
    odataTable = ctx.workbook.tables.add('B2:G2', true);
    odataTable.name = "odataTable";
  } else {
    console.log("Rows items: " + odataTable.rows.items.length);
    // reverse loop: walk from the last row to the first
    for (let i = odataTable.rows.items.length - 1; i >= 0; i--) {
      odataTable.rows.items[i].delete();
    }
  }
  console.log("rows cleaned");
});