Read from a specific row onwards from Excel File

I have an Excel file with approximately 7000 rows to read. The file contains a Table of Contents followed by the actual content data in detail.
I would like to skip all the Table of Contents rows and start reading from the actual content data. This is because if I need to read the data for "CPU_INFO", the loop finds the search string twice: 1] in the Table of Contents and 2] in the actual content.
So is there any way to specify a start row index for reading the content of the Excel file, thus skipping the whole Table of Contents section?

As taken from the Apache POI documentation on iterating over rows and cells:
In some cases, when iterating, you need full control over how missing or blank rows or cells are treated, and you need to ensure you visit every cell and not just those defined in the file. (The CellIterator will only return the cells defined in the file, which is largely those with values or stylings, but it depends on Excel).
In cases such as these, you should fetch the first and last column information for a row, then call getCell(int, MissingCellPolicy) to fetch the cell. Use a MissingCellPolicy to control how blank or null cells are handled.
If we take the example code from that documentation and tweak it for your requirement to start at row 7000, and assuming you don't want to go past 15k rows, we get:
// Decide which rows to process (POI row indexes are 0-based; getLastRowNum() is inclusive)
int rowStart = Math.max(7000, sheet.getFirstRowNum());
int rowEnd = Math.min(15000, sheet.getLastRowNum());
for (int rowNum = rowStart; rowNum <= rowEnd; rowNum++) {
    Row r = sheet.getRow(rowNum);
    if (r == null) {
        // This whole row is empty
        continue;
    }
    int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);
    for (int cn = 0; cn < lastColumn; cn++) {
        Cell c = r.getCell(cn, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
        if (c == null) {
            // The spreadsheet is empty in this cell
        } else {
            // Do something useful with the cell's contents
        }
    }
}
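If you don't know the exact start row in advance, one option is to derive it from the data itself. The sketch below assumes each heading such as "CPU_INFO" appears once in the Table of Contents and once more where the real content begins, and that the column-A text has already been read into a list (for example with POI's DataFormatter). The class and method names are hypothetical. It returns the row index of the n-th match, which could then be used as rowStart in the loop above.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: finds the 0-based row index of the n-th row whose
 * first-column text equals a marker such as "CPU_INFO". Assumes the
 * column-A values were already read into a list, one entry per row.
 */
public class ContentStartFinder {

    /** Returns the row index of the n-th (1-based) match, or -1 if there are fewer matches. */
    public static int nthOccurrence(List<String> columnA, String marker, int n) {
        int seen = 0;
        for (int row = 0; row < columnA.size(); row++) {
            String value = columnA.get(row);
            if (value != null && value.trim().equals(marker)) {
                seen++;
                if (seen == n) {
                    return row;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Rows 0-3 model the Table of Contents; the remaining rows model the content.
        List<String> columnA = Arrays.asList(
                "Table of Contents", "CPU_INFO", "MEM_INFO", "",
                "CPU_INFO", "cores: 8", "MEM_INFO", "total: 16G");
        // The first hit is the TOC entry; the second is where the real data starts.
        int contentRow = nthOccurrence(columnA, "CPU_INFO", 2);
        System.out.println(contentRow); // prints 4
    }
}
```

Matching on the n-th occurrence keeps the Table of Contents out of the search without hard-coding a row number, at the cost of relying on the heading layout described above.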

Related

C++ Excel Automation : How to use SetFormula function for the cell elements whose row & column were unknown but will be given by the program later on?

I am working on legacy code that uses C++ Excel Automation to output our analysis data to an Excel spreadsheet. From the following article,
https://support.microsoft.com/en-us/topic/how-to-use-mfc-to-automate-excel-and-create-and-format-a-new-workbook-6f2450bc-ba35-a36a-df2f-c9dd53d7aef1
I learned that we can use the range.SetFormula() function to calculate formula results from specific cells, for example:
range = sheet.GetRange(COleVariant("C2"), COleVariant("C6"));
range.SetFormula(COleVariant("=A2 & \" \" & B2"));
My question is how I can use the SetFormula function to refer to cells whose row and column are unknown in advance but are determined as the program runs. Specifically, I have a number of cells that are populated as my analysis runs, and different analyses output different numbers of elements to the Excel spreadsheet. For example, if I have kw data items, the output occupies kw rows by 6 columns, and I also need to output some summary results based on these elements underneath them. Something like this:
int kw = var_length; // the row changes depending on different analysis
DWORD numElements[2];
Range range;
range = sheet.GetRange(COleVariant(_T("A3")),COleVariant(_T("A3")));
numElements[0]= kw; //Number of rows in the range.
numElements[1]= 6; //Number of columns in the range.
saRet.Create(VT_R8, 2, numElements);
for(int iRow = 0;iRow < kw; iRow++)
{
for (iCol = 0; iCol < 6; iCol++)
{
index[0] = iRow;
index[1] = iCol;
saRet.PutElement(index, &somevalue);
}
}
range.SetValue2(COleVariant(saRet));
CString TStr;
TStr.Format(_T("A%d"), kw+2);
range = sheet.GetRange(COleVariant(TStr), COleVariant(TStr));
CString t1, t2;
t1.Format(_T("A%d"), kw/2);
t2.Format(_T("A%d"), kw);
range.SetFormula(COleVariant(L"=SUM(A&t1: A&t2)")); // Calculate the sum of second half of whole elements, Apparently, this didn't work, How can I fix this?
Here I want to sum the second half of all the elements, but in the SetFormula call I don't know the exact row numbers of these elements in advance (e.g. A25 - A50). The row numbers depend on kw, which is given as input by the program and differs between analyses. I tried using TStr.Format to get the row number, but it CAN NOT be used inside the SetFormula function. Ideally I want to use a formula for my summary output so that if I change the populated element values, the summary output updates accordingly. I searched the MSDN website but couldn't find any solution.
Can someone help me with the issue?
Thanks in advance.
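A note on why the last SetFormula call fails: Excel receives the literal text =SUM(A&t1: A&t2), because t1 and t2 are never substituted into the string. The row numbers have to be formatted into the complete formula text before it is passed to SetFormula. The sketch below shows that string-building step in Java (hypothetical class and method names); in MFC the same idea would be a CString::Format call with a pattern like _T("=SUM(A%d:A%d)"). It assumes, as in the question, that the kw data rows start at row 3, so they occupy A3 to A(kw + 2).

```java
/**
 * Sketch: build an A1-style SUM formula whose row numbers are only known
 * at run time. Assumes the kw data rows start at firstDataRow (row 3 in
 * the question), so they occupy rows firstDataRow .. firstDataRow + kw - 1.
 */
public class FormulaBuilder {

    /** Returns e.g. "=SUM(A28:A52)" for column "A", rows 28 to 52. */
    public static String sumFormula(String column, int firstRow, int lastRow) {
        return String.format("=SUM(%s%d:%s%d)", column, firstRow, column, lastRow);
    }

    /** Formula that sums the second half of kw values starting at firstDataRow. */
    public static String secondHalfSum(String column, int firstDataRow, int kw) {
        int lastDataRow = firstDataRow + kw - 1;
        int firstOfSecondHalf = firstDataRow + kw / 2;
        return sumFormula(column, firstOfSecondHalf, lastDataRow);
    }

    public static void main(String[] args) {
        int kw = 50; // number of data rows, known only at run time
        System.out.println(secondHalfSum("A", 3, kw)); // prints =SUM(A28:A52)
    }
}
```

The resulting string is what would be wrapped in a COleVariant and handed to range.SetFormula, so the cell keeps a live formula that recalculates when the populated values change.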

Excel Online Workbook Links - Linking full row range

I am using Excel Online in the browser and have set up a workbook link from a source file to my main file. In my main file I have table headers and additional columns with formulas. I only need A2 to AC, all the way down. The issue is that the source file changes daily: there might be more rows the next day, or fewer. I need to be able to reference set columns, detect how many rows are in the data source, and update the main file.
So far, I have something like this
='https://sharepoint.com/personal/myFolder/Documents/[data_source.xlsx]in'!A2
The same formula in B2 and C2 loads the first row. I can select a range from the source data so that it loads all of it, but if there are more rows the next day it won't load them, and if there are fewer, the extra cells display as blanks.
How can I tell the formula to select columns A2 to C2 and extend down, or refresh the data like Excel desktop does when using data connections?
As you can see in the source data, Day 2 has extra rows that won't be loaded into my main file.
You can use Power Automate and two Office Scripts to link the two workbooks together.
Start with a Recurrence trigger, where you pick how often you'd like the flow to run (weekly, daily, etc.).
After you set the recurrence, you have to write an Office Script that works with the table data. You can get the table's data body range with the table's getRangeBetweenHeaderAndTotal() method, and once you have that, you can resize the range to get just the data you need. Next, you get the values with the getValues() method. getValues() returns a 2D array, which you can't return from a Power Automate Run script action. You can return a string, though, so you get around this by converting the 2D array to a JSON string. You can see the code below:
function main(workbook: ExcelScript.Workbook): string {
let sh: ExcelScript.Worksheet = workbook.getActiveWorksheet();
//get table
let tbl: ExcelScript.Table = sh.getTable("Table1");
//get table's column count
let tblColumnCount: number = tbl.getColumns().length;
//set number of columns to keep
let columnsToKeep: number = 3;
//set the number of rows to remove
let rowsToRemove: number = 0;
//resize the table range
let tblRange: ExcelScript.Range = tbl.getRangeBetweenHeaderAndTotal().getResizedRange(rowsToRemove,columnsToKeep - tblColumnCount);
//get the table values
let tblRangeValues: string[][] = tblRange.getValues() as string[][];
//create a JSON string
let result: string = JSON.stringify(tblRangeValues);
//return JSON string
return result;
}
Once you've created your script, consider naming it something you'll remember when you call it in Power Automate (I called mine getTableValues). Next, after the recurrence in Power Automate, add a Run script step. Fill out the values and select the script like so:
Next, you have to create the second script, which takes the string returned from the first script and completes the final steps. This script needs a parameter that receives that string (I called it tableValues in mine). In the script, you parse the JSON string back into a 2D array, resize the initial range to match it, and then set the values of the resized range. You can see a script that does that below:
function main(workbook: ExcelScript.Workbook, tableValues: string)
{
let sh: ExcelScript.Worksheet = workbook.getWorksheet("Sheet1")
//parses the JSON string to create array
let tableValuesArray: string[][] = JSON.parse(tableValues);
//gets row count from the array
let valuesRowCount: number = tableValuesArray.length - 1
//gets column count from the array
let valuesColumnCount: number = tableValuesArray[0].length - 1
//resizes the range
let rang: ExcelScript.Range = sh.getRange("A1").getResizedRange(valuesRowCount,valuesColumnCount)
//sets the value of the resized range to the array
rang.setValues(tableValuesArray)
}
In Power Automate, create a second Run script step. After you've selected the second script, you should be prompted for a value to enter (the value is called tableValues in my step). In the tableValues input, enter the Result value from the first step's dynamic content. Once this is done, you can save the flow and test it.
One thing to note is that the second script doesn't delete old range values from previous runs. This can be done in a number of ways, and the preferred way may depend on how the workbook is structured. So I'd recommend adding code at the beginning of the second script to clear the range. Or better yet, write the output of the first script into an Excel table, and empty the table every time you run the second script.
If you'd like to see how you might do that, you can take a look at this post here
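One detail in the second script that is easy to get wrong: getResizedRange takes the number of rows and columns to add to the starting cell, not the final size, which is why valuesRowCount and valuesColumnCount are both computed with "- 1". The arithmetic is sketched below in Java with hypothetical class and method names; it only computes the A1 address that starting at A1 and resizing by those deltas would cover, and does not call any Office Scripts API.

```java
/**
 * Sketch of the resize arithmetic used by the second script: starting at A1
 * and growing by (rows - 1, columns - 1) covers exactly the values array.
 */
public class ResizedAddress {

    /** Converts a 1-based column number to letters: 1 -> "A", 26 -> "Z", 27 -> "AA". */
    public static String columnLetters(int column) {
        StringBuilder sb = new StringBuilder();
        while (column > 0) {
            int rem = (column - 1) % 26;
            sb.insert(0, (char) ('A' + rem));
            column = (column - 1) / 26;
        }
        return sb.toString();
    }

    /** Address covered by A1 resized by (values.length - 1, values[0].length - 1). */
    public static String targetAddress(String[][] values) {
        int rowDelta = values.length - 1;     // corresponds to valuesRowCount
        int colDelta = values[0].length - 1;  // corresponds to valuesColumnCount
        // A1 is row 1, column 1; the deltas extend it down and to the right.
        return "A1:" + columnLetters(1 + colDelta) + (1 + rowDelta);
    }

    public static void main(String[] args) {
        String[][] values = { {"a", "b", "c"}, {"d", "e", "f"} };
        System.out.println(targetAddress(values)); // prints A1:C2
    }
}
```

Passing the full row and column counts instead of the deltas would make the range one row and one column too large, and setValues would then fail because the array dimensions no longer match the range.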

Apache POI - How to set correct column width in Word table

I have an existing Word document containing a table. The first row of the table has two cells, but all the other rows have four cells and each cell has a different width.
I need to insert new rows via POI that also have four cells with widths that match those of the existing 4-cell rows.
The basic code is:
XWPFTable table = doc.getTableArray(0);
XWPFTableRow oldRow = table.getRow(2);
table.insertNewTableRow(3);
XWPFTableRow newRow = table.getRow(3);
XWPFTableCell cell;
for (int i = 0; i < oldRow.getTableCells().size(); i++) {
cell = newRow.createCell();
CTTblWidth cellWidth = cell.getCTTc().addNewTcPr().addNewTcW();
BigInteger width = oldRow.getCell(i).getCTTc().getTcPr().getTcW().getW();
cellWidth.setW(width); // sets width
XWPFRun run = cell.getParagraphs().get(0).createRun();
run.setText("NewRow C" + i);
}
The result of this is that row 3 has four cells but their widths do not match those of row 2. The total new row width ends up being the same as the total width of the first three cells of row 2. (Sorry, I don't know how to paste the Word table here).
However, if I first manually edit the source document so that the first table row also has four cells, then everything works perfectly. Similarly, if I get a reference to an existing row and add it to the table, then the cell widths are also correct (but I have the same row object twice so can't modify it).
It seems that the number of cells in the first row influences how other rows are inserted. Does this make sense to anyone and can you suggest how to override it? Also, is there a document anywhere that I can study to understand how this works? Thanks.
According to your description ("The first row of the table has two cells, but all the other rows have four cells and each cell has a different width."), I suspect this is a very messy table. Although Word supports such tables, I would try to avoid them. But if it must be, you need to know that even such messy tables have a table grid. Unzip the *.docx and look at /word/document.xml; you will find it there as the w:tblGrid element.
So if we want to insert rows into such messy tables, we must also respect the table grid. For this there is a GridSpan element in the CTTcPr, which we must copy from the oldRow in addition to the CTTblWidth.
The CTTblWidth also has a type as well as a width, and we should copy that too.
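The grid bookkeeping behind this can be sketched without POI. Each cell occupies gridSpan grid columns, and a cell without a gridSpan element occupies exactly one, so the spans of a row should add up to the number of grid columns (this sketch ignores row-level w:gridBefore/w:gridAfter). The class below uses hypothetical names and models a row as a list of spans, with null meaning "no gridSpan element".

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of table-grid bookkeeping: every cell occupies gridSpan grid
 * columns, and a cell with no gridSpan element occupies exactly one.
 */
public class GridSpanCheck {

    /** Sums the spans of one row; null stands for "no gridSpan element" (span 1). */
    public static int gridColumnsUsed(List<Integer> spans) {
        int total = 0;
        for (Integer span : spans) {
            total += (span == null) ? 1 : span;
        }
        return total;
    }

    public static void main(String[] args) {
        // A row laid out like "Cell 2 1" .. "Cell 2 4": spans 3, 3, none, 3.
        List<Integer> copiedRow = Arrays.asList(3, 3, null, 3);
        System.out.println(gridColumnsUsed(copiedRow)); // prints 10

        // The same four cells created without copying gridSpan.
        List<Integer> naiveRow = Arrays.asList(null, null, null, null);
        System.out.println(gridColumnsUsed(naiveRow)); // prints 4
    }
}
```

On a 10-column grid, the second row only covers 4 grid columns, which is why copying the width alone is not enough to reproduce the old row's layout.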
Example:
The source.docx looks like this:
As you can see, the table grid has 10 columns in total. "Cell 2 1" spans 3 columns, "Cell 2 2" spans 3 columns, "Cell 2 3" has no gridSpan and so occupies a single grid column, and "Cell 2 4" spans 3 columns (3 + 3 + 1 + 3 = 10).
With code:
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTblWidth;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTcPr;

public class WordInsertTableRow {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the document and the output stream
        try (XWPFDocument doc = new XWPFDocument(new FileInputStream("source.docx"));
             FileOutputStream out = new FileOutputStream("result.docx")) {
            XWPFTable table = doc.getTableArray(0);
            XWPFTableRow oldRow = table.getRow(2);
            table.insertNewTableRow(3);
            XWPFTableRow newRow = table.getRow(3);
            for (int i = 0; i < oldRow.getTableCells().size(); i++) {
                XWPFTableCell cell = newRow.createCell();
                CTTcPr oldTcPr = oldRow.getCell(i).getCTTc().getTcPr();
                CTTcPr ctTcPr = cell.getCTTc().addNewTcPr();
                if (oldTcPr != null && oldTcPr.getTcW() != null) {
                    CTTblWidth cellWidth = ctTcPr.addNewTcW();
                    cellWidth.setType(oldTcPr.getTcW().getType()); // sets type of width
                    cellWidth.setW(oldTcPr.getTcW().getW());       // sets width
                }
                if (oldTcPr != null && oldTcPr.getGridSpan() != null) {
                    ctTcPr.setGridSpan(oldTcPr.getGridSpan());     // sets grid span if any
                }
                XWPFRun run = cell.getParagraphs().get(0).createRun();
                run.setText("NewRow C" + i);
            }
            doc.write(out);
        }
        System.out.println("Done");
    }
}
The result.docx looks like:

In Google Spreadsheets, if cell A > cell B, replace Cell B with Cell A

I currently have a column labeled "Current Rank" in column A, and a column labeled "Highest Rank" in column B. If current rank > highest rank, I'd like to replace highest rank with current rank. Is there any way to do this while getting around the self-reference errors?
You can list all the scores and use the =MAX() formula to find the biggest one, but to actually replace the value in column B with the bigger value, the only way is with a script (Google's version of Excel macros). You'd have to make something like this and run it after you enter the new values (this is my first Google script, and there is almost certainly a cleaner way, but this code gets the job done):
function myFunction() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheets()[0];
  // Read columns A (current) and B (highest) for rows 1-100 in a single call
  var values = sheet.getRange("A1:B100").getValues();
  for (var j = 0; j < values.length; j++) {
    var current = values[j][0];
    var highest = values[j][1];
    // Skip rows with an empty column A; otherwise copy A into B when A is larger
    if (current !== "" && highest < current) {
      sheet.getRange(j + 1, 2).setValue(current);
    }
  }
}
To create the script, open Tools > Script editor.
You need a script, but this is not a hard one.
An onEdit trigger is a little cleaner because it runs automatically. Open the script editor via Tools and paste this into an empty file; that's all. This script evaluates columns A and B every time a change is made on the sheet.
Enter data in column A, and the maximum will appear in column B (this script assumes there is a header row, so data in row 1 won't be evaluated).
function onEdit(e) {
var ss = e.source;
var sheet = ss.getActiveSheet();
var range = sheet.getDataRange();
var values = range.getValues();
for(var i = 1; i < values.length; i++) {
if (values[i][0] > values[i][1]) {
sheet.getRange(i + 1, 2, 1, 1).setValue(values[i][0]);
}
}
}
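The per-row rule both scripts apply is small enough to express as a pure function, which also makes it testable without a spreadsheet. The sketch below is in Java with hypothetical names: each row is a pair [current, highest], row 0 stands in for the header row and is skipped (as in the onEdit version), and highest is raised whenever current exceeds it.

```java
import java.util.Arrays;

/** Sketch of the "if A > B then B = A" rule applied to every data row. */
public class HighestRank {

    /**
     * Updates rows in place. rows[i][0] is the current value (column A),
     * rows[i][1] the highest value so far (column B). Row 0 is treated as
     * the header row and left untouched.
     */
    public static void updateHighest(double[][] rows) {
        for (int i = 1; i < rows.length; i++) {
            if (rows[i][0] > rows[i][1]) {
                rows[i][1] = rows[i][0];
            }
        }
    }

    public static void main(String[] args) {
        double[][] rows = { {0, 0}, {5, 3}, {2, 7}, {9, 9} };
        updateHighest(rows);
        System.out.println(Arrays.deepToString(rows));
    }
}
```

Because the rule only ever raises column B, running it repeatedly (as an onEdit trigger does) is safe: the stored value is never lowered by a later, smaller current value.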

While Converting excel to dataset using excelreader.Asdataset(), Sometimes after read an empty cell in excel, the next cell is read as System.DBNull

I am converting Excel file data to a DataSet using the following code:
if (String.Compare(Path.GetExtension(filePath), ".xlsx", StringComparison.OrdinalIgnoreCase) == 0) {
    excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
    if (excelReader != null) {
        excelReader.IsFirstRowAsColumnNames = true;
        var dsresult = new DataSet();
        try { dsresult = excelReader.AsDataSet(); }
        finally { excelReader.Close(); }
    }
}
But sometimes, after reading an empty cell in Excel, the next cell is read as System.DBNull.
For example, with this data in Excel:
Col A = 1, Col B is blank, Col C = 2
the values in the DataSet after conversion are:
Col A = 1, Col B is blank, Col C is blank
After searching, it seems that there is a problem with the Excel reader. Please suggest a proper solution, or at least a workaround, for this issue.
Thanks
Deepak
There seems to be an issue with old versions of ExcelDataReader. I had the same problem as you: I tried excelReader.AsDataSet() and also looping manually with excelReader.Read(), but I was still getting empty results. As soon as I updated the DLL to version 2.1, the issue went away.
