Parse xlsx file containing hyperlink with node.js - node.js

I want to parse an xlsx file which contains hyperlinks on my node.js server.
I tried some xlsx parsers from npm (such as 'excel-parser' and 'xlsx'), but I could only get the text value, not the hyperlink target.
Does anyone know how to extract the hyperlink using node.js?

If you dislike the existing solutions, you can always unzip the file (Office Open XML files are zipped directories containing several files) and parse the main part yourself in search of links.
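As a rough sketch of that do-it-yourself route: assuming you have already unzipped the workbook and read two parts into strings (the worksheet, typically xl/worksheets/sheet1.xml, and its relationships file, typically xl/worksheets/_rels/sheet1.xml.rels), the links can be matched up by relationship id. The function names below are made up for illustration, and the regex-based attribute parsing is a shortcut that only handles double-quoted attributes; a real XML parser would be more robust.

```javascript
// Decode the five predefined XML entities (ampersand last to avoid double decoding).
function decodeXml(s) {
  return s
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

// Collect name="value" pairs from a single tag into a plain object.
function readAttributes(tag) {
  const attrs = {};
  const re = /([\w:]+)\s*=\s*"([^"]*)"/g;
  let m;
  while ((m = re.exec(tag)) !== null) {
    attrs[m[1]] = decodeXml(m[2]);
  }
  return attrs;
}

// Hypothetical helper: returns an object mapping cell refs (e.g. "A1") to link targets.
function extractHyperlinks(sheetXml, relsXml) {
  // Relationship Id -> Target (the URL for external links)
  const targets = {};
  for (const tag of relsXml.match(/<Relationship\b[^>]*>/g) || []) {
    const a = readAttributes(tag);
    targets[a.Id] = a.Target;
  }
  const links = {};
  for (const tag of sheetXml.match(/<hyperlink\b[^>]*>/g) || []) {
    const a = readAttributes(tag);
    // External links carry r:id; links inside the workbook use a location attribute.
    let target;
    if (a['r:id']) {
      target = targets[a['r:id']];
    } else if (a.location) {
      target = '#' + a.location;
    }
    if (target !== undefined) {
      links[a.ref] = target;
    }
  }
  return links;
}
```

The visible cell text lives elsewhere (in the cell itself or in the shared strings part), so this only recovers the link targets keyed by cell reference.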

An old question, but one without an answer that I could easily find elsewhere after half an hour of looking.
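The code below reads an XLSX file and dumps it to the console row by row. Each entry in row.values is either plain text or an object with keys formula and result, the former containing the hyperlink and the latter the visible text. (Cells linked through Excel's link dialog rather than a HYPERLINK formula may instead come back as an object with keys text and hyperlink.)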
I've only just found exceljs so ymmv, but it seems straightforward and has a large, but not overwhelming, variety of options for getting the job done quickly.
const xl = require('exceljs');
const xlsxPath = 'NTA Transcripts.xlsx';
// await is only valid inside an async function in a CommonJS script
async function dumpRows() {
  const workbook = new xl.Workbook();
  await workbook.xlsx.readFile(xlsxPath);
  const worksheet = workbook.getWorksheet(1);
  worksheet.eachRow({ includeEmpty: true }, (row, rowNumber) => {
    console.log("Row ", rowNumber, ": ", JSON.stringify(row.values, null, 2));
  });
}
dumpRows().catch(console.error);
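To turn those dumped values into something usable, a small helper along these lines might do. It is only a sketch: linkFromCellValue is a hypothetical name, and it assumes cell values arrive in one of three shapes (a plain value, an object with text and hyperlink keys, or an object with formula and result keys where the formula is a HYPERLINK call with quoted literal arguments). Anything else, such as rich text or a HYPERLINK formula built from cell references, is not handled.

```javascript
// Hypothetical helper: normalise one cell value into { text, url }.
function linkFromCellValue(value) {
  // Plain values (strings, numbers, null, undefined) carry no link.
  if (value === null || typeof value !== 'object') {
    return { text: value == null ? '' : String(value), url: null };
  }
  // Assumed shape for cells with an attached link: { text, hyperlink }.
  if (typeof value.hyperlink === 'string') {
    const text = typeof value.text === 'string' ? value.text : '';
    return { text: text, url: value.hyperlink };
  }
  // Assumed shape for formula cells: { formula, result }.
  if (typeof value.formula === 'string') {
    const m = /^=?HYPERLINK\(\s*"([^"]*)"\s*(?:,\s*"([^"]*)"\s*)?\)$/i.exec(value.formula);
    if (m) {
      // Prefer the cached result, then the friendly name, then the URL itself.
      let text;
      if (value.result != null) {
        text = String(value.result);
      } else if (m[2] !== undefined) {
        text = m[2];
      } else {
        text = m[1];
      }
      return { text: text, url: m[1] };
    }
    return { text: value.result == null ? '' : String(value.result), url: null };
  }
  // Dates and other objects: fall back to their string form.
  return { text: String(value), url: null };
}
```

Inside the eachRow callback you could then call linkFromCellValue on each entry of row.values (skipping index 0, which is unused because columns are 1-based).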

Related

How do I add a script to a Google Sheet to automatically take data from an XLSX file on Google Drive?

I'm trying to automatically update a Google Sheet from a separate XLSX file, since the XLSX file gets regularly updated, but I need to do some data cleaning. I tried QUERY and IMPORTRANGE, neither of which can read data from an XLSX file.
It seems like I need to write a script on the Google Sheet to automatically take the data from the xlsx. Where do I add this, and how would I go about getting started? I have access to both files, so permissions shouldn't be an issue.
Suggestion: Temporarily Convert the Excel File to Google Sheets File to Extract Data
Unfortunately, there is no direct way to extract data from Excel files to Google Sheets using Google Apps Script. As a workaround, you first need to convert your Excel file to a Google Sheets file and then copy the data from the converted file into your output Google Sheets file. You may use the following script as a basis for yours:
function importData() {
var xlsxName = "Test 1.xlsx"; //Change source file name accordingly
var convertID = convert(xlsxName).toString();
var xLSX = SpreadsheetApp.openById(convertID).getSheetByName("Input");
var ss = SpreadsheetApp.openById("<output Sheet ID>").getSheetByName("Output"); //Change output sheet ID
var lastColumn = xLSX.getLastColumn();
var lastRow = xLSX.getLastRow();
ss.getRange(1, 1, lastRow, lastColumn).setValues(xLSX.getDataRange().getValues()); //Sets values from converted xlsx data to output sheet
DriveApp.getFileById(convertID).setTrashed(true); //deletes temporary file
}
function convert(excelFileName) {
var files = DriveApp.getFilesByName(excelFileName);
if (!files.hasNext()) throw new Error("File not found: " + excelFileName); //fail early instead of calling getBlob() on null
var excelFile = files.next();
var blob = excelFile.getBlob();
var config = {
title: "[Converted File] " + excelFile.getName(), //sets the title of the converted file
parents: [{ id: excelFile.getParents().next().getId() }],
mimeType: MimeType.GOOGLE_SHEETS
};
var spreadsheet = Drive.Files.insert(config, blob);
return (spreadsheet.id); //Returns the ID of the converted file
}
This script involves:
Converting the Excel file to a temporary Google Sheets file.
Importing the data from the temporary Google Sheets file to the desired/output Google Sheets file.
Deleting the temporary Google Sheets file.
NOTE:
Expect a longer runtime when applying this script to a larger Excel file.
You may modify the script to suit your current issue.
The script should be added to your desired output Google Sheets file.
Do not forget to add the Drive API service to your script. Drive.Files.insert belongs to version 2 of the advanced Drive service.

SheetJS / excel4node modify xlsx file without changing anything else

I have JSON data like:
data = [{
name: "test",
age: 50,
country: "America"
}]
And I read an Excel file that looks like this:
https://imgur.com/a/InyUXxv
(The real file is more complex: it has more static images and a lot more text, around 1000 filled cells.)
The problem is that I need to fill the JSON data into the Excel file.
The Excel file will always be the same, and the data will always go into the same cells.
I can read this Excel template file, update it, and write it back to a new file, but if I do that I lose the images. The new file has no images.
With excel4node I can write separate images to an Excel file, but I don't know how to read the template file and then write the same content back.
Below is a code example for the xlsx npm package, where I lose the images when writing the same file back to Excel.
Can someone help me? I have been stuck here for a few days and can't find a solution.
Node.js code:
var xlsx = require("xlsx");
exports.generateExcel = async () => {
// readFile is synchronous, so no await is needed
var excelFile = xlsx.readFile("./utilities/template.xlsx");
const { SheetNames: sheetNames } = excelFile;
var data = xlsx.utils.sheet_to_json(excelFile.Sheets[sheetNames[0]]);
console.log(data);
var ws = xlsx.utils.json_to_sheet(data);
var wb = xlsx.utils.book_new();
xlsx.utils.book_append_sheet(wb, ws, "Tests"); // append the worksheet, not the workbook
// Writing the template workbook back out is where the images are lost
xlsx.writeFile(excelFile, "./utilities/novooo.xlsx");
};
So when I write the file, it is written without images. template.xlsx does contain images (I read this file at the beginning of the code and store it in a variable).

Convert array to blob, blob to legacy XLS file

I have an apps script that generates a 2D array. I would like to export this array to a folder on my Google Drive in legacy .XLS format, ideally without first creating a Google Sheet and then converting that sheet.
I thought I could turn my array into a CSV string and convert that to blob with the appropriate MimeType, and save that in Drive.
However, when I download the file from Drive and open it, the values aren't separated (tried "," and ";" as delimiter).
My script below, with a simplified array for example.
function createXls() {
var data = [["a","b","c"],["d","e","f"]];
var csvString = toCsv(data);
var xlsName = "here goes the filename";
var driveFolder = DriveApp.getFolderById("hereGoesTheFolderId");
var blob = Utilities.newBlob(csvString, MimeType.MICROSOFT_EXCEL_LEGACY);
blob.setName(xlsName + ".xls");
driveFolder.createFile(blob);
};
function toCsv(arr) {
return arr.map(row =>
row.map(val => val).join(';')
).join('\n');
};
Am I missing something here, or is there no way around putting the data in a sheet first and converting that sheet to xls?
Thank you!
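One thing that may be worth trying, offered as an untested assumption rather than a confirmed fix: Excel commonly treats a text file carrying an .xls extension as tab-delimited, so joining the values with tabs instead of "," or ";" might make the columns separate. The toTsv function below is a hypothetical stand-in for toCsv. Note that the result is still plain text with an .xls name, not a true legacy Excel binary, and Excel may warn about the mismatch when opening it.

```javascript
// Hypothetical replacement for toCsv: tab-separated values, one row per line.
function toTsv(rows) {
  return rows
    .map(function (row) {
      return row
        .map(function (val) {
          // Tabs and line breaks inside a value would break the layout, so flatten them.
          var text = val == null ? '' : String(val);
          return text.replace(/[\t\r\n]+/g, ' ');
        })
        .join('\t');
    })
    .join('\r\n');
}
```

In createXls you would then build the string with toTsv(data) and leave the rest of the script unchanged.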

Ignoring formatting while importing to xls using CasperJS

I have a script in casperJS that scrapes data from a webpage, then puts that information into a variable
var lbdes = casper.fetchText(x('//*[@id="product_overview"]'));
and I then write that variable out using fs to create an Excel document.
casper.then(function() {
var f = fs.open('scrapetest.xls', 'w');
f.write(lbdes);
f.close();
});
The problem is that when the string is opened in Excel, it spans three or more rows and three or more columns (the original image is not available). Ideally I would love to keep that formatting but somehow force the string to stay in one cell.

My Excel file of CSV data from Google Apps Script doesn't mirror my Google Spreadsheet

I'm trying to write a script that passes information from a Google Spreadsheet, compiles it into a CSV file and emails that file.
My problem: the CSV file, when opened in Excel, looks very different from my Google Spreadsheet (dead link).
This is what my Excel file looks like, pasted into another Google Spreadsheet.
The code I am using is below:
function myFunction() {
//get active sheet, the last row where data has been entered, define the range and use that range to get the values (called data)
var sheet = SpreadsheetApp.getActiveSheet();
var lastRow=sheet.getLastRow();
var range = sheet.getRange(1,1,lastRow,91);
var data = range.getValues();
//define a string called csv
var csv = "";
//run for loop through the data and join the values together separated by a comma
for (var i = 0; i < data.length; ++i) {
csv += data[i].join(",") + "\r\n";
}
var csvFiles = [{fileName:"example.csv", content:csv}];
MailApp.sendEmail(Session.getUser().getEmail(), "New Journey Information", "", {attachments: csvFiles});
}
You need to ensure that individual cells' data is atomic. For instance, what you see as a time on the sheet contains a Date object when read by your script, then when that's written to a CSV it may be converted to a date & time string with commas, depending on your locale. (Jan 4, 2013 14:34 for example.) To be safe with punctuation that may be interpreted as delimiters by Excel, you should enclose each element with quotes.
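A minimal sketch of that quoting step, assuming the same 2D array shape that range.getValues() returns. The names csvField and toQuotedCsv are made up for illustration. Every value is converted to a string, embedded double quotes are doubled, and the whole field is wrapped in quotes, so commas inside a date string no longer split the cell.

```javascript
// Quote a single value so commas, quotes and line breaks stay inside one cell.
function csvField(value) {
  var text = value == null ? '' : String(value);
  return '"' + text.replace(/"/g, '""') + '"';
}

// Build CSV text from a 2D array, one CRLF-terminated line per row.
function toQuotedCsv(rows) {
  var csv = '';
  for (var i = 0; i < rows.length; ++i) {
    csv += rows[i].map(csvField).join(',') + '\r\n';
  }
  return csv;
}
```

In the question's loop this would amount to replacing data[i].join(",") with data[i].map(csvField).join(","). Date objects are still written in whatever form String() gives them, so you may want to format them explicitly first.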
You don't need to write this yourself, as the problem has already been solved in one of the examples provided in the Docslist Tutorial. The easiest solution is to replace your code with that example.
Change the first bit of the provided saveAsCSV() as follows, and it will operate either with user input or by passing a filename as a parameter.
function saveAsCSV(filename) {
// Prompts the user for the file name, if filename parameter not provided
fileName = filename || Browser.inputBox("Save CSV file as (e.g. myCSVFile):");
...
