apache poi - XSSF read formatted cell value


Is there any way I can get the formatted value that Excel shows in a cell, rather than the raw value I am getting back from the stream?
Or would this fall under the "formula evaluation" category, which this does not support?

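If you have the Cell that you're trying to get the data out of, try the following:
import org.apache.poi.ss.usermodel.DataFormatter;
DataFormatter formatter = new DataFormatter();
String formattedCellValue = formatter.formatCellValue(myCell); // the text as Excel displays it
If that doesn't give exactly what you're looking for, the DataFormatter class has a number of other methods that may do the trick. For example, formatCellValue(cell, evaluator) takes a FormulaEvaluator so that formula cells are shown as their computed result rather than as the formula text. Check out the API.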
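To illustrate the idea behind DataFormatter (raw double plus format string in, displayed text out), here is a stdlib-only analogy using java.text.DecimalFormat. It is not POI's implementation: the helper name applyNumericFormat is hypothetical, and it only handles simple numeric patterns such as "0.00" or "#,##0", not dates, colours, conditions or multi-section formats.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Stdlib analogy only, NOT POI's DataFormatter: applies a simple numeric pattern to a raw double.
class FormatStringDemo {

    // Hypothetical helper: handles patterns like "0.00" or "#,##0" only.
    static String applyNumericFormat(double raw, String pattern) {
        // Fixed US symbols so the output does not depend on the JVM's default locale
        DecimalFormat df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        // Excel rounds halves away from zero when displaying; HALF_UP approximates that
        df.setRoundingMode(RoundingMode.HALF_UP);
        return df.format(raw);
    }

    public static void main(String[] args) {
        System.out.println(applyNumericFormat(3.14159, "0.00"));    // 3.14
        System.out.println(applyNumericFormat(1234567.0, "#,##0")); // 1,234,567
    }
}
```

The raw value stays the same double in every call; only the pattern changes what is shown, which is the same split between stored value and displayed text that the question runs into.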

Related

Using POI to find out format for a cell

What is the correct way to get the CellFormatType for a cell (DATE, NUMBER, TEXT, GENERAL) ?
cell.getCellStyle().getDataFormat() returns a short value, which does not map to the above constants.
I cannot just use cell.getCellType for the following reason. For certain rows, there may be a string prefix like <> or > in front of the value. In that case, getCellType will return CELL_TYPE_STRING. So the only way to get the underlying type appears to be to look at the format for the column.
Thanks

I don't know a direct route from Cell to CellFormatType, but there are various ways to determine particular types:
General? getDataFormat will be 0 (or more reliably, getDataFormatString will be "General")
Date? Check out DateUtil.isCellDateFormatted
Text? Typically you'd check for Cell.getCellType = CELL_TYPE_STRING, but since you say you can't do this (see below), you could also try checking whether getDataFormatString is "@", which is Excel's built-in Text format.
Number? Again, usually you'd just check for CELL_TYPE_NUMERIC. Note that dates are also considered numbers, so check for dates first.
(If the cell type is CELL_TYPE_FORMULA you should check getCachedFormulaResultType instead)
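The checks above can be sketched without POI as a rough classifier over the string returned by getDataFormatString. The Kind enum and classify method are hypothetical, and the date test is a simplified stand-in for POI's DateUtil.isADateFormat / isCellDateFormatted, which handle many more edge cases; in real code, prefer DateUtil.

```java
import java.util.Locale;

class FormatClassifier {

    // Hypothetical categories mirroring the question: GENERAL, TEXT, DATE, NUMBER
    enum Kind { GENERAL, TEXT, DATE, NUMBER }

    // Rough classification of a format string such as "General", "@", "m/d/yy" or "#,##0.00".
    static Kind classify(String fmt) {
        if (fmt == null || fmt.equalsIgnoreCase("General")) {
            return Kind.GENERAL;
        }
        if (fmt.equals("@")) {
            return Kind.TEXT; // Excel's built-in Text format
        }
        // Remove quoted literals ("days") and bracketed parts ([Red], [$-409], [h])
        // so their letters are not mistaken for date/time codes.
        String codes = fmt.replaceAll("\"[^\"]*\"", "")
                          .replaceAll("\\[[^\\]]*\\]", "")
                          .toLowerCase(Locale.ROOT);
        // Date/time formats use d, m, y, h or s codes; plain number formats do not.
        for (char c : codes.toCharArray()) {
            if ("dmyhs".indexOf(c) >= 0) {
                return Kind.DATE;
            }
        }
        return Kind.NUMBER;
    }

    public static void main(String[] args) {
        for (String f : new String[] {"General", "@", "m/d/yy", "#,##0.00", "[h]:mm:ss"}) {
            System.out.println(f + " -> " + classify(f));
        }
    }
}
```

As the answer notes, dates are numbers too, so the date check runs before falling back to NUMBER.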
As for your point that you cannot just use cell.getCellType because some rows may have a string prefix like <> or > in front of the value:
I'm not familiar with this problem. Perhaps you have a custom format which prepends characters before the number? My notes on text detection above might help you in this case, but I suggest you double-check that your spreadsheet actually has the relevant information (i.e. it isn't just Excel being clever when you re-open the sheet): log the cell type, format and format string, and confirm that there actually is a difference on the relevant cells. It's possible that your cell is actually being saved as text, with no distinguishing mark. In this case, you'll need to add some special heuristics for your specific scenario.
Finally, since you seem confused about the value returned by CellStyle.getDataFormat: it's just an index into a workbook-wide index → format-string table. You can (and I'd say usually should) use getDataFormatString to get the format string directly. The standard cell formats are listed in BuiltinFormats, which you can use as a reference for which format strings you might find (but always check the format string, not the ID; otherwise you'll fail to detect custom formats correctly).
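A minimal sketch of that index → format-string lookup, assuming a workbook-level map of custom formats (the customFormats parameter and formatStringFor helper are hypothetical; with POI you would simply call getDataFormatString). The built-in table below is only a small excerpt of BuiltinFormats; custom formats in XLSX files typically use IDs from 164 upward.

```java
import java.util.Map;

class FormatIndexDemo {

    // Small excerpt of Excel's built-in number formats (POI's BuiltinFormats has the full list)
    private static final Map<Integer, String> BUILTIN = Map.of(
            0, "General",
            1, "0",
            2, "0.00",
            3, "#,##0",
            4, "#,##0.00",
            14, "m/d/yy",
            49, "@");

    // Consults the workbook's own table first, then the built-in excerpt;
    // returns null if the index is in neither.
    static String formatStringFor(int index, Map<Integer, String> customFormats) {
        String custom = customFormats.get(index);
        return custom != null ? custom : BUILTIN.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical workbook table with one custom format at ID 164
        Map<Integer, String> custom = Map.of(164, "0.000%");
        System.out.println(formatStringFor(2, custom));   // 0.00
        System.out.println(formatStringFor(164, custom)); // 0.000%
    }
}
```

This is why comparing IDs alone is fragile: ID 164 means nothing without the workbook's own table, while the format string is always meaningful.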

When does FlexCel's TCellValue.IsDateTime always return false?

I am using TMS' FlexCel Excel document component. I am - however - having a problem with dates. As I understand it, Excel has three types: Strings, numbers and datetimes. This is also represented in FlexCel's TCellValue's methods, such as IsString, IsNumber and IsDateTime.
However, IsDateTime always returns False. TCellValue has a ToDateTime function that will always return a TDateTime. If the original cell value is not a date, the time will be "1899-12-31T23:59:59".
I am, however, not keen on the idea of using DateUtils's YearOf() to check for 1899 and detect whether it truly was a date. What if the cell contained a float (double)? That would become a proper TDateTime.
I have made sure that the format for these cells is a date-time format. When exporting to Excel's XML Spreadsheet 2003 format, I can see that Excel properly detects the cells as date-times. Why doesn't FlexCel detect the same thing?
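For context, the Excel file format stores a date as an ordinary number: a serial count of days, with the time of day as the fractional part. Whether it is shown as a date depends on the cell's number format, so any number converts to some date-time, which matches the observation that a plain float would also become a valid TDateTime. The sketch below shows that conversion in Java rather than FlexCel/Delphi; fromSerial is a hypothetical helper, and it assumes the 1900 date system and serials from 61 (1900-03-01) upward.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

class ExcelSerialDate {

    // Day 0 for serials >= 61 in the 1900 date system. Excel counts a non-existent
    // 1900-02-29 (serial 60), so serials below 61 would be off by one with this epoch.
    private static final LocalDateTime EPOCH = LocalDate.of(1899, 12, 30).atStartOfDay();

    static LocalDateTime fromSerial(double serial) {
        long days = (long) Math.floor(serial);
        // The fractional part is the time of day; round it to whole milliseconds.
        long millis = Math.round((serial - days) * 86_400_000d);
        return EPOCH.plusDays(days).plusNanos(millis * 1_000_000L);
    }

    public static void main(String[] args) {
        System.out.println(fromSerial(25569.0)); // 1970-01-01T00:00
        System.out.println(fromSerial(45000.5)); // 2023-03-15T12:00
    }
}
```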

Using Apache POI to manipulate excel files

I'm trying to write a test using the Marathon testing tool with Jython, and I'm using Apache POI to read/write Excel files. I'm very new to Jython and Apache POI, so this question may seem very simple to some, but I can't get past it. I'm using the getCell() method of the Row interface, and it grabs the cell just fine, but the contents it prints are not what I want: I want the integer value, but it returns a floating-point value.
for r in range(1, rowsBusiness):
    row = sheetBusiness.getRow(r)
    idNum = row.getCell(0)  # it is returning double values here
    print idNum
    print idNum.getStringCellValue()
I'm okay with it returning doubles as long as I can convert them to a string or integer, because the application I'm testing converts from string to integer or throws an error, but I can't figure out how to convert from a double and get rid of the decimal point. The getStringCellValue() function doesn't work on idNum: it just leaves it blank and the test gets stuck. I also formatted the Excel file so that the cells I'm referring to only take integer values. For example, in the Excel file I have the value 1 (formatted to show no decimal places), but print idNum prints 1.0.
Any helpful hints on how to get this to a string or integer? Or any other ideas that might contribute to a successful workaround?

The Excel file format only has a few basic types, like Numeric and String. Most numbers are stored as a floating-point value plus formatting rules. That's why you're getting back a double, not an int: that's simply what lives in the file!
Apache POI provides the DataFormatter class, which takes a floating point value and the format string, and does its best to return a string of the number as shown in Excel. For your case, calling that is likely to yield what you're after.
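If you only need whole numbers without the trailing ".0", a plain Java alternative (not part of POI) is to go through BigDecimal, starting from the double that getNumericCellValue() returns. This ignores the cell's format, so DataFormatter remains the better choice when you want exactly what Excel shows; toPlainText is a hypothetical helper.

```java
import java.math.BigDecimal;

class WholeNumberText {

    // Converts a numeric cell value to text without a trailing ".0": 1.0 -> "1", 2.5 -> "2.5".
    // BigDecimal.valueOf goes through Double.toString, so the digits match Java's usual
    // double printing; toPlainString avoids scientific notation such as "1E+2".
    static String toPlainText(double value) {
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText(1.0));   // 1
        System.out.println(toPlainText(2.5));   // 2.5
        System.out.println(toPlainText(100.0)); // 100
    }
}
```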

Reading mix between numeric and non-numeric data from excel into Matlab

I have a matrix where the first column contains dates and the first row contains maturities, which are alphanumeric (e.g. 16year).
The rest of the cells contain the rates for each day, which are double precision numbers.
Now I believe xlsread() can only handle numeric data so I think I will need something else or a combination of functions?
I would like to be able to read the table from excel into MATLAB as one array or perhaps a struct() so that I can keep all the data together.
The other problem is that some of the rates are given as '#N/A'. I want to keep the cells where these values are stored, but change the value to a blank (" ").
What is the best way to do this? Can it be done as part of the input process?

Well, from the MATLAB reference for xlsread, you can use the form
[num,txt,raw] = xlsread(FILENAME)
and then num holds your numeric data as a matrix, txt holds the text data (i.e. your text headers), and raw holds all of your data unprocessed (including the text headers).
So I guess you could use the raw array, or a combination of the num and txt.
For your other problem, if your rates are 'pulled' from some other source, you can use
=IFERROR(RATE DATA,"")
and then there will be a blank instead of the error code #N/A.
Another solution (Windows only) is the form of xlsread() that runs a function on your imported data:
[num,txt,raw,custom] = xlsread(filename,sheet,xlRange,'',functionHandler)
and let that function replace the NaN values with blanks (the result is returned in the custom output).

How to get the max no. of columns filled in an XLSX file using POI?

I know we can get the maximum number of columns by iterating over all the rows and calling getLastCellNum() on each row object, but this approach requires iterating over every row, which I want to avoid because it will take a lot of time for files with a million rows (that's the kind of file I expect to read).
When POI reads an Excel file, it stores the sheet dimensions (first row number, last row number, first column number, last column number) in a DimensionsRecord object. So if I get this object, I have what I need. These objects can be obtained from POI's internal Sheet class. I was able to extract what I need for XLS files, but I have hit a roadblock for XLSX files.
Does POI maintain a DimensionsRecord object for XLSX as well? If so, has anybody tried to extract it? Or is there some other way to do this? Please help!
I also wanted to ask whether my approach is correct: I am using POI's internal classes (it gets the job done). Is that acceptable, or should I rely solely on the public APIs (which is too time-consuming)?

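There's a dimension object on XSSF sheets too. Try:
// sheet is an XSSFSheet; CTSheetDimension is org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSheetDimension
CTSheetDimension dimension = sheet.getCTWorksheet().getDimension(); // may be null if the file has no dimension element
String sheetDimensions = dimension.getRef(); // e.g. "A1:F1000"
The one issue that springs to mind is that I'm not sure the dimension (CTSheetDimension or DimensionsRecord) is required to always be correct...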
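Once you have the ref string, turning it into a column count is simple string work. The sketch below is a hypothetical helper that assumes an A1-style ref such as "A1:F1000" (a single-cell ref like "A1" is also accepted); POI's CellRangeAddress.valueOf(ref) can do the same parsing for you. Since the dimension may not always be accurate, treat the result as a hint rather than a guarantee.

```java
class DimensionRef {

    // Column letters of an A1-style reference to a 0-based index: "A1" -> 0, "Z9" -> 25, "AA3" -> 26.
    static int columnIndex(String cellRef) {
        int col = 0;
        for (char c : cellRef.toCharArray()) {
            if (!Character.isLetter(c)) {
                break; // the row number starts here
            }
            col = col * 26 + (Character.toUpperCase(c) - 'A' + 1);
        }
        return col - 1;
    }

    // Number of columns spanned by a ref such as "A1:F1000" (or a single cell such as "A1").
    static int columnCount(String ref) {
        String[] corners = ref.replace("$", "").split(":");
        int first = columnIndex(corners[0]);
        int last = columnIndex(corners[corners.length - 1]);
        return last - first + 1;
    }

    public static void main(String[] args) {
        System.out.println(columnCount("A1:F1000")); // 6
        System.out.println(columnCount("B2:AA10"));  // 26
    }
}
```

If you want a getLastCellNum-style value (counted from column A), use columnIndex of the second corner plus one instead of the span.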
