Java Excel POI - Delete rows with empty cells exception

I'm programming an interface in Java with the Apache POI library, and I have a problem deleting empty rows. My code is:
public class ExcelDeleteRowsCols {
final short ROW_START = 0;
final short COL_START = 0;
public void deleteRows() {
try {
// Open file
FileInputStream inf = new FileInputStream("in.xls");
Workbook wb = WorkbookFactory.create(inf);
// Loop every sheets of workbook
for (Sheet sheet : wb){
// Loop every rows of this sheet
int lastIndex = sheet.getLastRowNum();
for (int i = ROW_START; i <= lastIndex; i++) {
if (sheet.getRow(i) == null || sheet.getRow(i).getCell(COL_START) == null || sheet.getRow(i).getCell(COL_START).toString().equals("")){
sheet.removeRow(sheet.getRow(i)); //sheet.shiftRows(i, lastIndex, 2);
}
}
}
// Save as in another file
FileOutputStream fileOut = new FileOutputStream("out.xls");
wb.write(fileOut);
fileOut.flush();
fileOut.close();
System.out.println("Finished!");
} catch (IOException ioe) {
System.out.println(ioe);
} catch (Exception e) {
System.out.println(e);
}
}
}
The problem is that rows with empty cells trigger a java.lang.NullPointerException, and I don't understand why. Example Excel sheet:
"Empty cell"
Line2
Line3
Line4
Line5
"Empty cell"
Line7
Line8
Line9
Line10
Line11
Line12
Line13
When there are no empty cells, the code works fine.
Could you please help me?
Thanks in advance.

You can't call
sheet.removeRow(row)
when row is null: sheet.getRow(i) returns null for a row that was never created, and removeRow fails on it.
From your code, it looks like you want to keep only the non-empty rows of "in.xls" and write them to "out.xls", so I would advise changing the algorithm like this:
Workbook wbOut = new HSSFWorkbook(); // or whatever workbook you'd like to use
for (Sheet sheet : wb){
Sheet newSheet = wbOut.createSheet();
int newI = 0;
// Loop every rows of this sheet
int lastIndex = sheet.getLastRowNum();
for (int i = ROW_START; i <= lastIndex; i++) {
// the exact opposite condition
if (sheet.getRow(i) != null &&
sheet.getRow(i).getCell(COL_START) != null &&
!sheet.getRow(i).getCell(COL_START).toString().equals("")){
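// row is not null, so copy its cells to a new row in the new sheet
// (values are copied as text here; assigning sheet.getRow(i) to newRow would copy nothing)
Row newRow = newSheet.createRow(newI++);
for (Cell cell : sheet.getRow(i)) {
newRow.createCell(cell.getColumnIndex()).setCellValue(cell.toString());
}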
}
}
}
// Save as in another file
FileOutputStream fileOut = new FileOutputStream("out.xls");
wbOut.write(fileOut);
fileOut.flush();
fileOut.close();
This way only the non-empty rows are copied into a new workbook, which you can then write out.
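The order of the checks in the condition above matters: the row itself is tested for null before its first cell is touched. A minimal sketch of that filter in plain Java, where a sheet is modeled as a list of string arrays and a null entry stands for a row that does not exist in the file (the class and method names are made up for illustration and are not POI API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowFilterSketch {
    // Keeps only rows that exist and whose cell at firstCol is neither missing nor blank.
    static List<String[]> keepNonEmpty(List<String[]> rows, int firstCol) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            // Check the row first: reading row[firstCol] on a null row would throw.
            if (row == null) continue;
            if (firstCol >= row.length) continue;
            String cell = row[firstCol];
            if (cell == null || cell.isEmpty()) continue;
            kept.add(row);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            null,                    // row never created in the file
            new String[] {"Line2"},
            new String[] {""},       // row exists but its first cell is blank
            new String[] {"Line4"});
        for (String[] row : keepNonEmpty(rows, 0)) {
            System.out.println(row[0]);
        }
    }
}
```

With real POI objects the same three checks apply in the same order: the Row, then the Cell, then the cell text.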

I have this xls document:
My objective is to delete every empty row, and I think the best option for this is POI in Java. The code is:
public class ExcelDeleteRowsCols {
final short ROW_START = 0;
final short COL_START = 0;
public void deleteRows() {
try {
// Open file
FileInputStream inf = new FileInputStream("in.xls");
Workbook wb = WorkbookFactory.create(inf);
// Loop every sheets of workbook
for (Sheet sheet : wb){
// Loop every rows of this sheet
int lastIndex = sheet.getLastRowNum();
for (int i = ROW_START; i <= lastIndex; i++) {
if (sheet.getRow(i) == null || sheet.getRow(i).getCell(COL_START) == null || sheet.getRow(i).getCell(COL_START).toString().equals("")){
sheet.removeRow(sheet.getRow(i));
}
}
}
// Save as in another file
FileOutputStream fileOut = new FileOutputStream("out.xls");
wb.write(fileOut);
fileOut.flush();
fileOut.close();
System.out.println("Finished!");
} catch (IOException ioe) {
System.out.println(ioe);
} catch (Exception e) {
System.out.println(e);
}
}
}
When the row exists but is empty, the code works fine; the problem arises when the row is null (sheet.getRow(i) == null). For example, in this xls rows 2, 12, 15 and 16 are null, so they are not deleted, because the call sheet.removeRow(sheet.getRow(i)); throws a NullPointerException.
Is there any way to delete a row whose value is null?

So rather than deleting rows, you really want to close up empty rows in your spreadsheet. That is, shift rows containing data up so that there are no blank rows in between.
I have totally changed this answer to take this into consideration.
FileInputStream inf = new FileInputStream("Row_Delete_Test.xlsx");
Workbook wb = WorkbookFactory.create(inf);
for (Sheet sh : wb) {
int previousIndex = sh.getFirstRowNum();
if (previousIndex > 0) {
sh.shiftRows(previousIndex, sh.getLastRowNum(), -previousIndex);
previousIndex = 0;
}
for (Row row : sh) {
boolean deleteRow = true;
for (Cell cell : row) {
if (!cell.toString().trim().equals("")) {
deleteRow = false;
break;
}
}
int currentIndex = row.getRowNum();
if (deleteRow) {
sh.removeRow(row);
} else {
if (currentIndex > previousIndex + 1) {
sh.shiftRows(row.getRowNum(), sh.getLastRowNum(), previousIndex - currentIndex + 1);
currentIndex = previousIndex + 1;
}
previousIndex = currentIndex;
}
}
}
FileOutputStream fileOut = new FileOutputStream("Row_Delete_Test.xlsx");
wb.write(fileOut);
wb.close();
fileOut.close();
This will have the effect of "deleting" rows from the spreadsheet.
Note: an Excel spreadsheet only really contains the rows with cells, and only contains cells with data. That data can be almost anything, including blanks, so if you also want to "delete" rows where the only cell values are blanks, then you will have to search for that.
Here is some example data (note this is from sheet1.xml inside the Row_Delete_Test.xlsx file)
<sheetData>
<row r="2" spans="2:3" x14ac:dyDescent="0.25">
<c r="B2" t="s">
<v>0</v>
</c>
</row>
<row r="5" spans="2:3" x14ac:dyDescent="0.25">
<c r="B5" t="s">
<v>1</v>
</c>
</row>
<row r="6" spans="2:3" x14ac:dyDescent="0.25">
<c r="C6" t="s">
<v>4</v>
</c>
</row>
<row r="7" spans="2:3" x14ac:dyDescent="0.25">
<c r="B7" t="s">
<v>2</v>
</c>
</row>
<row r="8" spans="2:3" x14ac:dyDescent="0.25">
<c r="B8" t="s">
<v>3</v>
</c>
</row>
<row r="9" spans="2:3" x14ac:dyDescent="0.25">
<c r="C9" t="s">
<v>4</v>
</c>
</row>
</sheetData>
I am going to just tell you, as there is no way you could know other than looking at the shared strings table, that shared string 4, designated by <v>4</v> is just a blank value. The other shared string values are <v>0</v> = 'Row 1', <v>1</v> = 'Row 2', <v>2</v> = 'Row 3', and <v>3</v> = 'Row 4'. So here rows 2, and 5 through 9 are populated, each row has a single cell with data in it. Rows 6 and 9 each have a cell with a blank value in column C.
After running the above code, the sheetData looks like this
<sheetData>
<row r="1" spans="2:3" x14ac:dyDescent="0.25">
<c r="B1" t="s">
<v>0</v>
</c>
</row>
<row r="2" spans="2:3" x14ac:dyDescent="0.25">
<c r="B2" t="s">
<v>1</v>
</c>
</row>
<row r="3" spans="2:3" x14ac:dyDescent="0.25">
<c r="B3" t="s">
<v>2</v>
</c>
</row>
<row r="4" spans="2:3" x14ac:dyDescent="0.25">
<c r="B4" t="s">
<v>3</v>
</c>
</row>
</sheetData>
Now only rows 1-4 are in the spreadsheet. Row 2 has been moved to row 1, 5 to 2, 7 to 3, and 8 to 4.
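The renumbering described above can be modeled without POI. The sketch below is an illustration under stated assumptions: the sheet is a sorted map from 1-based row number to the text of the row's single cell, a blank shared string is represented by a space, and the result is built as a new map instead of shifting rows in place as the POI code does. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class RowCompactionSketch {
    // Missing keys are rows that do not exist in the sheet XML.
    static TreeMap<Integer, String> compact(TreeMap<Integer, String> rows) {
        TreeMap<Integer, String> result = new TreeMap<>();
        int next = 1; // next free row number in the output
        for (Map.Entry<Integer, String> entry : rows.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.trim().isEmpty()) {
                continue; // the row holds only a blank value: drop it
            }
            result.put(next++, value);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> rows = new TreeMap<>();
        rows.put(2, "Row 1");
        rows.put(5, "Row 2");
        rows.put(6, " ");
        rows.put(7, "Row 3");
        rows.put(8, "Row 4");
        rows.put(9, " ");
        System.out.println(compact(rows)); // {1=Row 1, 2=Row 2, 3=Row 3, 4=Row 4}
    }
}
```

Rows 6 and 9 disappear because their only cell is blank, and the gaps before rows 2 and 5 close up, matching the before and after sheetData listings.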

Related

Apache-poi how to unhide column upon creation of Excel file

I am trying to generate an Excel workbook which will be a template.
For now I am trying to generate a workbook with one sheet that holds only header cells with values, with specific height and width values. The problem is not that I cannot do it, but that when I create the .xlsx file the columns are collapsed, and I have to click once per cell (13 times for 13 cells) to display them all.
[Example of how Cells are hidden]
[1]: https://i.stack.imgur.com/xYh1P.png
And this is how I want them to be displayed when the file is created:
[Example of how I wish them to be displayed upon creation]
[2]: https://i.stack.imgur.com/FWILw.png
The code is as follows
//creating workbook
private static void createWorkBookFile() throws IOException {
String filePath = "C:\\UltimateMapper\\UltimateMapperProject\\";
filePath+="\\WriteTestFiles";
System.out.print("Enter name of file: ");
String fileName = scan.nextLine();
filePath+="\\"+fileName+".xlsx";
XSSFWorkbook workbook = new XSSFWorkbook();
XSSFSheet currentSheet = workbook.createSheet("File to STG");
XSSFRow row;
XSSFCell cell;
//TO DO START FORM HERE FINISH AND THEN INSIDE LOOP
String[] fileToSTGCellValues_Names = new String[] {"Source Location_Schema","Source File_Table Name",
"Source_Field_Column Name","Start Pos","End Pos",
"Source Field Length","Source Field_Column Data Type",
"Transformation","Target Location_Schema","Target File_Table Name",
"Target Field_Column Name","Target Field_Column Data Type","Comments"};
HashMap<String,Integer> fileToSTGCellValues_Widths = new HashMap<String,Integer>();
fileToSTGCellValues_Widths.put("Source Location_Schema",26);
fileToSTGCellValues_Widths.put("Source File_Table Name",26);
fileToSTGCellValues_Widths.put("Source_Field_Column Name",35);
fileToSTGCellValues_Widths.put("Start Pos",10);
fileToSTGCellValues_Widths.put("End Pos",10);
fileToSTGCellValues_Widths.put("Source Field Length",18);
fileToSTGCellValues_Widths.put("Source Field_Column Data Type",21);
fileToSTGCellValues_Widths.put("Transformation",43);
fileToSTGCellValues_Widths.put("Target Location_Schema",18);
fileToSTGCellValues_Widths.put("Target File_Table Name",36);
fileToSTGCellValues_Widths.put("Target Field_Column Name",36);
fileToSTGCellValues_Widths.put("Target Field_Column Data Type",20);
fileToSTGCellValues_Widths.put("Comments",47);
int headeRowNumber = 0;
row = currentSheet.createRow(headeRowNumber);
row.setHeightInPoints(28.50f);
for(int i =0;i<13;i++) {
currentSheet.setColumnWidth(i, fileToSTGCellValues_Widths.get(fileToSTGCellValues_Names[i]));
cell = row.createCell(i);
cell.setCellValue(fileToSTGCellValues_Names[i]);
}
FileOutputStream fout = new FileOutputStream(filePath);
workbook.write(fout);
fout.close();
System.out.println("File created");
}
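One thing worth checking in the code above: Sheet.setColumnWidth takes its width in units of 1/256 of a character width, so a value such as 26 gives a column roughly a tenth of a character wide, which Excel shows as nearly collapsed. If the numbers in the map are meant as character counts (an assumption, since the question does not say), they would need to be scaled. A small sketch of that conversion, with a hypothetical helper name:

```java
class ColumnWidthSketch {
    // POI documents 255 characters as the widest a column can be.
    static final int MAX_CHARS = 255;

    // Converts a width given in characters to the 1/256-character units POI expects.
    static int toPoiWidthUnits(int characters) {
        int clamped = Math.max(0, Math.min(characters, MAX_CHARS));
        return clamped * 256;
    }

    public static void main(String[] args) {
        System.out.println(toPoiWidthUnits(26)); // 6656
        // In the loop above this might be used as:
        // currentSheet.setColumnWidth(i,
        //     toPoiWidthUnits(fileToSTGCellValues_Widths.get(fileToSTGCellValues_Names[i])));
    }
}
```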

apache POI: dataValidation (or style) for entire column, except for the header row?

For the CellRange we can pass -1 for both the start/end row parameters to apply styles and dataValidators to the entire column.
But how to skip the header?
The ideal solution would be a CellRangeAddressList created with "A1:A$", but it only has int constructors.
I tried assuming that -1 is a special value, but CellRangeAddressList(1, -1, ...) fails with a "start row > finish row" error. Then I tried assuming -1 meant the last cell, but going from last to 1 with CellRangeAddressList(-1, 1, ...) resulted in no cell being selected.
Lastly I tried to remove the first row from CellRangeAddressList(-1, -1, ...), but as far as I could tell from the docs it is not possible to manipulate the ranges after creation.

Creating a CellRangeAddress for a whole column except the first row means the CellRangeAddress starts at row 2 and runs to the maximum row count. This depends on the SpreadsheetVersion: in EXCEL2007 the maximum row count is 2^20 = 1048576; in EXCEL97 it is 2^16 = 65536.
The workbook's SpreadsheetVersion gives us the maximum row count for its format.
Example:
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.util.CellRangeAddressList;
import org.apache.poi.ss.SpreadsheetVersion;
class CreateCellRangeAddressList {
public static void main(String[] args) throws Exception {
//Workbook workbook = new XSSFWorkbook();
Workbook workbook = new HSSFWorkbook();
// ...
int lastRow = workbook.getSpreadsheetVersion().getLastRowIndex();
CellRangeAddressList cellRangeAddressList = new CellRangeAddressList(
1, // row 2
lastRow,
2, // column C
2);
System.out.println(cellRangeAddressList.getCellRangeAddress(0));
//C2:C1048576 or C2:C65536 dependent on SpreadsheetVersion
// ...
}
}
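The arithmetic behind the two printed ranges can be checked without POI. This is only a sketch of the numbers quoted above; the class and method names are made up, and the boolean flag stands in for the SpreadsheetVersion.

```java
class MaxRowsSketch {
    // 2^20 rows for the .xlsx format (EXCEL2007), 2^16 rows for .xls (EXCEL97).
    static int maxRows(boolean xlsx) {
        return xlsx ? 1 << 20 : 1 << 16;
    }

    // The 0-based index of the last row, which is what the int constructor expects.
    static int lastRowIndex(boolean xlsx) {
        return maxRows(xlsx) - 1;
    }

    // A1-style reference for a whole column minus its header row.
    static String columnBelowHeader(String columnLetter, boolean xlsx) {
        return columnLetter + "2:" + columnLetter + maxRows(xlsx);
    }

    public static void main(String[] args) {
        System.out.println(columnBelowHeader("C", true));  // C2:C1048576
        System.out.println(columnBelowHeader("C", false)); // C2:C65536
    }
}
```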
Because the question was about data validation for a whole column except the first row, here is an example of that.
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.util.CellRangeAddressList;
class CreateExcelDataValidationListsWholeColumn {
public static void main(String[] args) throws Exception {
//Workbook workbook = new HSSFWorkbook();
Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet("Sheet1");
sheet.createRow(0).createCell(1).setCellValue("col2Head");
//data validation in column B, except first row:
DataValidationHelper dvHelper = sheet.getDataValidationHelper();
DataValidationConstraint dvConstraint = dvHelper.createExplicitListConstraint(new String[]{"X", "Y"}) ;
int lastRow = workbook.getSpreadsheetVersion().getLastRowIndex();
CellRangeAddressList addressList = new CellRangeAddressList(1, lastRow, 1, 1); //B2:B1048576
DataValidation validation = dvHelper.createValidation(dvConstraint, addressList);
sheet.addValidationData(validation); // data validation for B2:B1048576
FileOutputStream out = null;
if (workbook instanceof HSSFWorkbook) {
out = new FileOutputStream("CreateExcelDataValidationListsWholeColumn.xls");
} else if (workbook instanceof XSSFWorkbook) {
out = new FileOutputStream("CreateExcelDataValidationListsWholeColumn.xlsx");
}
workbook.write(out);
workbook.close();
out.close();
}
}
This results in sheet XML as follows:
<worksheet>
<dimension ref="B1"/>
<sheetViews>
<sheetView workbookViewId="0" tabSelected="true"/>
</sheetViews>
<sheetFormatPr defaultRowHeight="15.0"/>
<sheetData>
<row r="1"><c r="B1" t="s"><v>0</v></c></row>
</sheetData>
<dataValidations count="1">
<dataValidation type="list" sqref="B2:B1048576" allowBlank="true" errorStyle="stop">
<formula1>"X,Y"</formula1>
</dataValidation>
</dataValidations>
<pageMargins bottom="0.75" footer="0.3" header="0.3" left="0.7" right="0.7" top="0.75"/>
</worksheet>
And using HSSFWorkbook the resulting CreateExcelDataValidationListsWholeColumn.xls is 4 KByte in size.

EPPlus DataField with PercentOfTotal

I'm using EPPlus to create a pivot table in Excel, and I want to show the data as a percent of the total in one of my DataFields. How can I do that?
public static void createTableMotivo(Worksheet ws, ExcelRangeBase range)
{
const string FORMATCURRENCY = "#,###;[Red](#,###)";
ExcelWorksheet worksheet = ws.EPPlusSheet;
//The pivot table
ExcelPivotTable pivotTable = worksheet.PivotTables.Add(worksheet.Cells["B12"], range, "pivot_table1");
//The label row field
pivotTable.RowFields.Add(pivotTable.Fields["FIELD1"]);
pivotTable.DataOnRows = false;
pivotTable.ShowCalcMember = true;
//The data fields
ExcelPivotTableDataField fieldSum = pivotTable.DataFields.Add(pivotTable.Fields["FIELD2"]);
fieldSum.Name = "Quantidade de Faturas";
fieldSum.Function = DataFieldFunctions.Sum;
fieldSum.Format = FORMATCURRENCY;
ExcelPivotTableDataField fieldPercent = pivotTable.DataFields.Add(pivotTable.Fields["FIELD2"]);
fieldPercent.Name = "%";
fieldPercent.Function = DataFieldFunctions.None;
fieldPercent.Format = "0.00%";
pivotTable.PageFields.Add(pivotTable.Fields["FIELD3"]);
pivotTable.PageFields.Add(pivotTable.Fields["FIELD4"]);
}

I am trying to do something similar. The only way I can get this to work is by manipulating the xml of the Excel file.
This was interesting in itself: I can't remember where I saw that you can rename .xlsx to .zip and then view all of the xml files inside that zip.
In my case, I first set "Show Values As" to "% of Grand Total" on the pivot table of the actual Excel file. Then, after changing the file extension from .xlsx to .zip and extracting, there was a pivotTable3.xml file that had:
<dataFields count="2">
<dataField name="Number of Orders" fld="10" subtotal="count"/>
<dataField name="Percent of Total" fld="10" subtotal="count" showDataAs="percentOfTotal" numFmtId="10"/>
</dataFields>
The goal is to get showDataAs="percentOfTotal" in the dataField element whose name is "Percent of Total".
I tried to use an xpath expression to get the dataField element, but it returns null:
pivotTable.PivotTableXml.SelectSingleNode("//dataField[name='Percent of Total']")
So I had to fall back to walking down the xml:
pivotTable.DataFields[1].Format = "#0.00%";
pivotTable.DataFields[1].Function = DataFieldFunctions.Count;
pivotTable.DataFields[1].Name = "Percent of Total";
foreach (XmlElement documentElementChild in pivotTable.PivotTableXml.DocumentElement.ChildNodes)
{
if (documentElementChild.Name.Equals("dataFields"))
{
foreach (XmlElement dataFieldChild in documentElementChild.ChildNodes)
{
foreach (XmlAttribute attribute in dataFieldChild.Attributes)
{
if (attribute.Value.Equals("Percent of Total"))
{
// found our dataField element; add the attribute
dataFieldChild.SetAttribute("showDataAs", "percentOfTotal");
break;
}
}
}
}
}
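A likely reason the XPath expression above finds nothing: [name='...'] tests a child element called name, whereas the attribute is addressed as @name, and the pivot table XML sits in the spreadsheetml default namespace, which an unprefixed dataField step does not match. The sketch below shows both effects with the JDK's own XML APIs. It is written in Java to match the other examples on this page, uses a trimmed copy of the XML from this answer, and is not EPPlus code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DataFieldXPathSketch {
    static final String XML =
        "<pivotTableDefinition xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
        + "<dataFields count=\"2\">"
        + "<dataField name=\"Number of Orders\" fld=\"10\" subtotal=\"count\"/>"
        + "<dataField name=\"Percent of Total\" fld=\"10\" subtotal=\"count\"/>"
        + "</dataFields></pivotTableDefinition>";

    // Evaluates the expression against the sample XML; returns null when nothing matches.
    static Element find(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (Element) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Child-element test and no namespace handling: no match.
        System.out.println(find("//dataField[name='Percent of Total']"));
        // Attribute test, with local-name() to sidestep the default namespace.
        Element field = find("//*[local-name()='dataField'][@name='Percent of Total']");
        field.setAttribute("showDataAs", "percentOfTotal");
        System.out.println(field.getAttribute("showDataAs"));
    }
}
```

In .NET the equivalent would presumably use @name together with an XmlNamespaceManager or local-name(), but that has not been tested here.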

How to set formula have table column field in Apache POI

I created an XSSFTable with below example code:
https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/usermodel/examples/CreateTable.java
One column in my XSSFTable holds a formula that references another column in the same table.
For example, in XSSFTable TBL column ColumnA, the formula is: =[#[ColumnB]]. I can set the formula on each cell in ColumnA via cell.setCellFormula("TBL[[#This Row],[ColumnB]]"), but the file then has a problem when opened in Excel, and Excel has to remove the formula in order to display the worksheet correctly.
This problem only happens when creating a blank new XSSFWorkbook. If the workbook is loaded from an existing .xlsx file created by Excel, I can modify the formula via cell.setCellFormula() and the file opens in Excel correctly.
Is there any sample code that works correctly in this situation?

The main problem with the linked example is that it gives every column the same name, "Column":
...
for(int i=0; i<3; i++) {
//Create column
column = columns.addNewTableColumn();
column.setName("Column");
column.setId(i+1);
...
So the formula parser cannot tell them apart.
Also, filling the table column headers and the sheet contents in a single loop is hard to follow. So here is a more appropriate example:
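import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.util.AreaReference;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFTable;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTTable;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTTableColumn;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTTableColumns;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTTableStyleInfo;
public class CreateTable {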
public static void main(String[] args) throws IOException {
Workbook wb = new XSSFWorkbook();
XSSFSheet sheet = (XSSFSheet) wb.createSheet();
//Create
XSSFTable table = sheet.createTable();
table.setDisplayName("Test");
CTTable cttable = table.getCTTable();
//Style configurations
CTTableStyleInfo style = cttable.addNewTableStyleInfo();
style.setName("TableStyleMedium2");
style.setShowColumnStripes(false);
style.setShowRowStripes(true);
//Set which area the table should be placed in
AreaReference reference = new AreaReference(new CellReference(0, 0),
new CellReference(4,2));
cttable.setRef(reference.formatAsString());
cttable.setId(1);
cttable.setName("Test");
cttable.setTotalsRowCount(1);
CTTableColumns columns = cttable.addNewTableColumns();
columns.setCount(3);
CTTableColumn column;
XSSFRow row;
XSSFCell cell;
//Create 3 columns in table
for(int i=0; i<3; i++) {
column = columns.addNewTableColumn();
column.setName("Column"+i);
column.setId(i+1);
}
//Create sheet contents
for(int i=0; i<5; i++) {//Create 5 rows
row = sheet.createRow(i);
for(int j=0; j<3; j++) {//Create 3 cells each row
cell = row.createCell(j);
if(i == 0) { //first row is for column headers
cell.setCellValue("Column"+j);
} else if(i<4){ //next rows except last row are data rows, last row is totals row so don't put something in
if (j<2) cell.setCellValue((i+1)*(j+1)); //two data columns
else cell.setCellFormula("Test[[#This Row],[Column0]]*Test[[#This Row],[Column1]]"); //one formula column
}
}
}
FileOutputStream fileOut = new FileOutputStream("ooxml-table.xlsx");
wb.write(fileOut);
fileOut.close();
wb.close();
}
}

What indicates an Office Open XML Cell contains a Date/Time value?

I'm reading an .xlsx file using the Office Open XML SDK and am confused about reading Date/Time values. One of my spreadsheets has this markup (generated by Excel 2010)
<x:row r="2" spans="1:22" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<x:c r="A2" t="s">
<x:v>56</x:v>
</x:c>
<x:c r="B2" t="s">
<x:v>64</x:v>
</x:c>
.
.
.
<x:c r="J2" s="9">
<x:v>17145</x:v>
</x:c>
Cell J2 has a date serial value in it and a style attribute s="9". However, the Office Open XML Specification says that 9 corresponds to a followed hyperlink. This is a screen shot from page 4,999 of ECMA-376, Second Edition, Part 1 - Fundamentals And Markup Language Reference.pdf.
The presetCellStyles.xml file included with the spec also refers to builtinId 9 as a followed hyperlink.
<followedHyperlink builtinId="9">
All of the styles in the spec are simply visual formatting styles, not number styles. Where are the number styles defined and how does one differentiate a style reference s="9" from indicating a cell formatting (visual) style vs a number style?
Obviously I'm looking in the wrong place to match styles on cells with their number formats. Where's the right place to find this information?

The s attribute references a style xf entry in styles.xml. The style xf in turn references a number format mask. To identify a cell that contains a date, you need to perform the style xf -> numberformat lookup, then identify whether that numberformat mask is a date/time numberformat mask (rather than, for example, a percentage or an accounting numberformat mask).
The styles.xml file has elements like:
<xf numFmtId="14" ... applyNumberFormat="1" />
<xf numFmtId="1" ... applyNumberFormat="1" />
These are the xf entries, which in turn give you a numFmtId that references the number format mask.
You should find the numFmts section somewhere near the top of styles.xml, as part of the styleSheet element:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<numFmts count="3">
<numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;#" />
<numFmt numFmtId="165" formatCode="0.000" />
<numFmt numFmtId="166" formatCode="#,##0.000" />
</numFmts>
The number format id may be here, or it may be one of the built-in formats. Number format codes (numFmtId) less than 164 are "built-in".
The list that I have is incomplete:
0 = 'General';
1 = '0';
2 = '0.00';
3 = '#,##0';
4 = '#,##0.00';
9 = '0%';
10 = '0.00%';
11 = '0.00E+00';
12 = '# ?/?';
13 = '# ??/??';
14 = 'mm-dd-yy';
15 = 'd-mmm-yy';
16 = 'd-mmm';
17 = 'mmm-yy';
18 = 'h:mm AM/PM';
19 = 'h:mm:ss AM/PM';
20 = 'h:mm';
21 = 'h:mm:ss';
22 = 'm/d/yy h:mm';
37 = '#,##0 ;(#,##0)';
38 = '#,##0 ;[Red](#,##0)';
39 = '#,##0.00;(#,##0.00)';
40 = '#,##0.00;[Red](#,##0.00)';
44 = '_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(#_)';
45 = 'mm:ss';
46 = '[h]:mm:ss';
47 = 'mmss.0';
48 = '##0.0E+0';
49 = '#';
27 = '[$-404]e/m/d';
30 = 'm/d/yy';
36 = '[$-404]e/m/d';
50 = '[$-404]e/m/d';
57 = '[$-404]e/m/d';
59 = 't0';
60 = 't0.00';
61 = 't#,##0';
62 = 't#,##0.00';
67 = 't0%';
68 = 't0.00%';
69 = 't# ?/?';
70 = 't# ??/??';
The missing values are mainly related to East Asian variant formats.
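The lookup described above (cell style, then numFmtId, then format mask) ends with a yes/no decision on the mask. Below is a heuristic sketch of that last step in Java. The built-in ID set is taken only from the incomplete list in this answer, and the custom-mask scan simply looks for d, m, y, h or s outside quoted text, backslash escapes and square brackets. All names are hypothetical and this is not a complete implementation of the format grammar.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class DateFormatSketch {
    // Built-in IDs from the list above whose masks contain date or time tokens.
    private static final Set<Integer> BUILT_IN_DATE_IDS = new HashSet<>(Arrays.asList(
        14, 15, 16, 17, 18, 19, 20, 21, 22, 27, 30, 36, 45, 46, 47, 50, 57));

    // formatCode is the custom mask from numFmts; it is ignored for built-in IDs.
    static boolean isDateFormat(int numFmtId, String formatCode) {
        if (numFmtId < 164) {
            return BUILT_IN_DATE_IDS.contains(numFmtId);
        }
        return formatCode != null && looksLikeDateMask(formatCode);
    }

    static boolean looksLikeDateMask(String mask) {
        boolean inQuotes = false;
        boolean inBrackets = false;
        for (int i = 0; i < mask.length(); i++) {
            char c = mask.charAt(i);
            if (c == '\\') { i++; continue; }              // escaped literal character
            if (c == '"') { inQuotes = !inQuotes; continue; }
            if (inQuotes) continue;                        // quoted literal text
            if (c == '[') { inBrackets = true; continue; } // colour or locale section
            if (c == ']') { inBrackets = false; continue; }
            if (inBrackets) continue;
            if ("dmyhsDMYHS".indexOf(c) >= 0) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isDateFormat(14, null));
        System.out.println(isDateFormat(164, "[$-414]mmmm\\ yyyy;#"));
        System.out.println(isDateFormat(166, "#,##0.000"));
    }
}
```

Skipping bracketed sections keeps a colour tag such as [Red] from being mistaken for a day token, at the cost of relying on the tokens that follow an elapsed-time section such as [h].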

The chosen answer is spot-on, but note that Excel defines some number format (numFmt) codes differently from the OpenXML spec. Per the Open XML SDK 2.5 Productivity Tool's documentation (on the "Implementer Notes" tab for the NumberingFormat class):
The standard defines built-in format ID 14: "mm-dd-yy"; 22: "m/d/yy h:mm"; 37: "#,##0 ;(#,##0)"; 38: "#,##0 ;[Red]"; 39: "#,##0.00;(#,##0.00)"; 40: "#,##0.00;[Red]"; 47: "mmss.0"; KOR fmt 55: "yyyy-mm-dd".
Excel defines built-in format ID
14: "m/d/yyyy"
22: "m/d/yyyy h:mm"
37: "#,##0_);(#,##0)"
38: "#,##0_);[Red]"
39: "#,##0.00_);(#,##0.00)"
40: "#,##0.00_);[Red]"
47: "mm:ss.0"
55: "yyyy/mm/dd"
Most are minor variations, but #14 is a doozy. I wasted a couple of hours troubleshooting why leading zeros weren't being added to single-digits months and days (e.g. 01/05/14 vs. 1/5/14).

I thought I'd add the solution I put together to determine whether the double value passed to FromOADate is really a date. The reason is that my Excel file also contains a zip code. The numberingFormat will be null if it's text.
Alternatively you could use the numberingFormatId and check against a list of Ids that Excel uses for dates.
In my case I've explicitly determined the formatting of all fields for the client.
/// <summary>
/// Creates the datatable and parses the file into a datatable
/// </summary>
/// <param name="fileName">the file upload's filename</param>
private void ReadAsDataTable(string fileName)
{
try
{
DataTable dt = new DataTable();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(string.Format("{0}/{1}", UploadPath, fileName), false))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
Worksheet workSheet = worksheetPart.Worksheet;
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats;
var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats;
// columns omitted for brevity
// skip first row as this row is column header names
foreach (Row row in rows.Skip(1))
{
DataRow dataRow = dt.NewRow();
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
bool isDate = false;
var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value;
var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex);
if (cellFormat.NumberFormatId != null)
{
var numberFormatId = cellFormat.NumberFormatId.Value;
var numberingFormat = numberingFormats.Cast<NumberingFormat>()
.SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId);
// Here's yer string! Example: $#,##0.00_);[Red]($#,##0.00)
if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("mm/dd/yy"))
{
string formatString = numberingFormat.FormatCode.Value;
isDate = true;
}
}
// replace '-' with empty string
string value = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i), isDate);
dataRow[i] = value.Equals("-") ? string.Empty : value;
}
dt.Rows.Add(dataRow);
}
}
this.InsertMembers(dt);
dt.Clear();
}
catch (Exception ex)
{
LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);
}
}
/// <summary>
/// Reads the cell's value
/// </summary>
/// <param name="document">current document</param>
/// <param name="cell">the cell to read</param>
/// <returns>cell's value</returns>
private string GetCellValue(SpreadsheetDocument document, Cell cell, bool isDate)
{
string value = string.Empty;
try
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
// check if this is a date or zip.
// integers will be passed into this else statement as well.
if (isDate)
{
value = DateTime.FromOADate(double.Parse(value)).ToString();
}
return value;
}
}
catch (Exception ex)
{
LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);
}
return value;
}
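DateTime.FromOADate in the code above is a .NET call. For readers working in Java, the same serial-to-date conversion can be sketched with java.time. This assumes the workbook uses the 1900 date system and that serials are above 60; below that, Excel's fictitious 29 February 1900 shifts the result by a day. The class and method names are hypothetical.

```java
import java.time.LocalDateTime;

class ExcelSerialSketch {
    // Day 0 of OLE Automation dates.
    private static final LocalDateTime EPOCH = LocalDateTime.of(1899, 12, 30, 0, 0);

    // The integer part is days since the epoch; the fraction is the time of day.
    static LocalDateTime fromSerial(double serial) {
        long wholeDays = (long) Math.floor(serial);
        long seconds = Math.round((serial - wholeDays) * 86_400);
        return EPOCH.plusDays(wholeDays).plusSeconds(seconds);
    }

    public static void main(String[] args) {
        // 17145 is the value of cell J2 in the question.
        System.out.println(fromSerial(17145)); // 1946-12-09T00:00
    }
}
```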

In styles.xml see if there is a numFmt node. I think that will hold a numFmtId of "9" which will relate to the date format that's used.
I don't know where that is in the ECMA, but if you search for numFmt, you might find it.

It was unclear to me how to reliably determine whether a cell has date/time value. After spending some time experimenting I had come up with the code (see post) that would look for both built-in and custom date/time formats.

In case anyone else is having a hard time with this, here is what I've done:
1) Create a new Excel file and put a date/time string in cell A1
2) Change the formatting on the cell to whatever you want, then save the file.
3) Run the following PowerShell script to extract the stylesheet from the .xlsx file
[Reflection.Assembly]::LoadWithPartialName("DocumentFormat.OpenXml")
$xlsx = (ls C:\PATH\TO\FILE.xlsx).FullName
$package = [DocumentFormat.OpenXml.Packaging.SpreadsheetDocument]::Open($xlsx, $true)
[xml]$style = $package.WorkbookPart.WorkbookStylesPart.Stylesheet.OuterXml
Out-File -InputObject $style.OuterXml -FilePath "style.xml"
style.xml now contains the information that you can inject to DocumentFormat.OpenXml.Spreadsheet.Stylesheet(string outerXml), leading to
4) Use the extracted file to construct excel object model
var style = File.ReadAllText(@"c:\PATH\TO\EXTRACTED\Style.xml");
var stylesheetPart = WorkbookPart_REFERENCE.AddNewPart<WorkbookStylesPart>();
stylesheetPart.Stylesheet = new Stylesheet(style);
stylesheetPart.Stylesheet.Save();

@RobScott, regarding your code snippet:
I always get null for the style index of a particular cell:
var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value;
My requirement is to read the Excel sheet shown below and transform the row and column data to JSON.
Excel reference:
StockInvoiceNo  StockInvoiceOn          Name   Description
DC3320012989    23-01-2021 00:00:00:00  item1  description
DC3320012989    24-01-2021 00:00:00:00  item2  description
DC3320012989    25-01-2021 00:00:00:00  item3  description
