Using POI to find out format for a cell - apache-poi

What is the correct way to get the CellFormatType for a cell (DATE, NUMBER, TEXT, GENERAL)?
cell.getCellStyle().getDataFormat() returns a short value, which does not map to the above constants.
I cannot just use cell.getCellType, for the following reason: for certain rows, there may be a string prefix like <> or > in front of the value. In that case, getCellType will return CELL_TYPE_STRING. So the only way to get the underlying type appears to be to look at the format for the column.
Thanks

I don't know a direct route from Cell to CellFormatType, but there are various ways to determine particular types:
General? getDataFormat will be 0 (or, more reliably, getDataFormatString will be "General").
Date? Check out DateUtil.isCellDateFormatted.
Text? Typically you'd check for Cell.getCellType == CELL_TYPE_STRING, but since you say you can't do this (see below), you could also try checking whether getDataFormatString is "@" or "text" (both are possibilities for plain text).
Number? Again, usually you'd just check for CELL_TYPE_NUMERIC. Note that dates are also considered numbers, so check for dates first.
(If the cell type is CELL_TYPE_FORMULA, you should check getCachedFormulaResultType instead.)
I cannot just use cell.getCellType for the following reason. For certain rows, there may be a string prefix like <> or > in front of the value
I'm not familiar with this problem. Perhaps you have a custom format which prepends characters before the number? My notes on text detection above might help you in this case, but I suggest you double-check that your spreadsheet actually has the relevant information (i.e. it isn't just Excel being clever when you re-open the sheet): log the cell type, format and format string, and confirm that there actually is a difference on the relevant cells. It's possible that your cell is actually being saved as text, with no distinguishing mark. In this case, you'll need to add some special heuristics for your specific scenario.
Finally, since you seem confused about the value returned by CellStyle.getDataFormat: it's just an index into a workbook-wide index → format-string table. You can (and I'd say usually should) use getDataFormatString to get the format string directly. The standard cell formats are listed in BuiltinFormats, which you can use as a reference to see which format strings you might find (but always check the format string, not the ID; otherwise you'll fail to detect custom formats correctly).
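A rough sketch of how the checks above could be combined, written in Python so it runs without POI on the classpath. It works on the string that getDataFormatString() would return. The helper name classify_format and its simplified date test (drop quoted literals, [..] codes and escaped characters, then look for y, m, d, h or s) are assumptions of this sketch, a crude stand-in for what POI's DateUtil does, not POI's actual logic.

```python
import re


def classify_format(fmt):
    """Roughly classify an Excel number-format string (hypothetical helper).

    Returns "GENERAL", "TEXT", "DATE" or "NUMBER". This is a simplified
    heuristic, not a port of POI's DateUtil.
    """
    if fmt is None or fmt.strip().lower() == "general":
        return "GENERAL"
    if fmt.strip().lower() in ("@", "text"):
        return "TEXT"
    # Only the first section (positive numbers) is inspected: a simplification.
    section = fmt.split(";")[0]
    section = re.sub(r'"[^"]*"', "", section)     # quoted literals, e.g. "Qty: "
    section = re.sub(r"\[[^\]]*\]", "", section)  # [Red], [$-409], [h] ...
    section = re.sub(r"\\.", "", section)         # backslash-escaped characters
    # Any remaining date/time code letter means a date (or time) format
    if re.search(r"[ymdhs]", section, re.IGNORECASE):
        return "DATE"
    return "NUMBER"


for f in ["General", "@", "m/d/yy", "[h]:mm:ss", "#,##0.00", '"Qty: "0']:
    print(f, "->", classify_format(f))
```

In POI you would feed it cell.getCellStyle().getDataFormatString(); for formula cells, look at getCachedFormulaResultType first, and when you have a real Cell, DateUtil.isCellDateFormatted remains the more reliable date test.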

Related

Counting number of occurrences of a specific search string

I'm building a monitoring system that takes a log (where people register their work in a set format) and returns a counter, which I can use for analysis. The monitor and log are two separate workbooks. The log has entries like this: INITIALS;DATE;HOUR:RESULT|
Each cell can contain multiple entries.
My first attempt was to do a simple countif and look for a string (note that I use ; instead of , in formulas since I work on a Dutch excel):
=COUNTIF('LOCATION'!Table[LOG];"*NB;??/??/????;??:??:#A*|*")
This worked fine, but the formula only counted the number of cells where this string was present, not the actual number of occurrences. I then tried this solution.
=SUM(LEN('LOCATION'!Tabel13[LOG])-LEN(SUBSTITUTE('LOCATION'!Tabel13[LOG];"NB";"")))
This indeed counted the number of times "NB" was present in the LOG. However, when I tried to use the original search string, this solution stopped working:
=SUM(LEN('LOCATION'!Tabel13[LOG])-LEN(SUBSTITUTE('LOCATION'!Tabel13[LOG];"*NB;??/??/????;??:??:#A*|*";"")))
It seems to me that SUM does not recognize symbols like ? or * which are necessary to define the correct search string. Where did I go wrong? Or can this be solved in another way? I can still look into VBA, but the workbooks are slow as hell already.
"?" and "*" are wildcards. Some functions support these (like COUNTIFS()) where others don't. Like you found out, SUBSTITUTE() does not.
Here is one way to count, assuming Microsoft 365:
Formula in C1:
=REDUCE(0,A1:A2,LAMBDA(a,b,a+LET(X,SEQUENCE(LEN(b)),SUM(--(IFERROR(SEARCH("NB;??/??/????;??:??:#A*|*",b,X),0)=X)))))
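Note: I removed the asterisk in front of "NB" so that the position SEARCH returns can be compared with what I called variable "X". With a leading *, a match would begin at every position X before an entry, so the same entry would be counted many times.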
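To see what the REDUCE formula computes, here is a minimal Python model of it. It assumes SEARCH's wildcard rules: case-insensitive, ? matches one character, * matches any run of characters, ~ escapes the next character, and # is just a literal. For every position X it asks whether a match begins exactly at X, which is what SEARCH(...,b,X)=X tests. The helper names and the sample log lines are made up for illustration.

```python
import re


def wildcard_to_regex(pattern):
    """Translate an Excel wildcard pattern (?, *, ~ escape) into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "~" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # ~? ~* ~~ are literals
            i += 2
            continue
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def count_occurrences(cells, pattern):
    """Count positions where the pattern starts, like the REDUCE/SEARCH formula."""
    rx = re.compile(wildcard_to_regex(pattern), re.IGNORECASE | re.DOTALL)
    total = 0
    for text in cells:
        # SEARCH(pattern, text, X) = X  <=>  a match begins exactly at X
        total += sum(1 for x in range(len(text)) if rx.match(text, x))
    return total


log = ["NB;01/02/2023;10:15:#A|NB;03/02/2023;11:00:#A|",
       "JD;01/02/2023;09:00:#A|"]
print(count_occurrences(log, "NB;??/??/????;??:??:#A*|*"))  # 2
```

Putting a * back at the front of the pattern in this model makes every position before an "NB" entry count as a match, which is the over-counting the note above avoids.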

Locale-independent Text function in Excel

I need to format dates in Excel, and I'm trying to use the TEXT formula. The problem is that Excel's interpretation of the arguments changes when the locale changes.
For example: if I have a date in cell A1 that I'd like to convert to text in the year-month-day format, I have to use =TEXT(A1, "yyyy-mm-dd") if my PC has an English-language locale, but =TEXT(A1, "jjjj-MM-tt") (I kid you not, the M has to be upper case) if it has a German-language locale. This makes the document unportable. (The second argument is plain text and therefore not converted when changing locale.)
Remarks:
This is just an example, I know I could do the long =YEAR(A1) & "-" & TEXT(MONTH(A1), "00") & "-" & TEXT(DAY(A1), "00") in this case. I'm wondering about the more general case.
The date should not just be displayed in a certain format, it should actually be a string. For someone viewing the file this doesn't make a difference, but when using it in other formulas, it does.
I could write a UDF in VBA to solve the issue, but I cannot use VBA in this document.
I do not care about changing the names of the months etc. It's fine, if the name of the month is June or Juni depending on the locale.
I want to stress that the issue occurs due to the PC's locale - not due to the GUI language of the MS Office version. In the example above, Excel's GUI and formulas were in English in both examples; I just changed the locale on the machine.
Many thanks
Here is a slightly cheaty method: use a VLOOKUP on a value that will change based on your system language, for example TEXT(1,"MMMM"):
=VLOOKUP(TEXT(1,"MMMM"),{"January","yyyy-MM-dd";"Januar","jjjj-MM-tt"},2,FALSE)
In English: Text(1,"MMMM") = "January", so we do a VLOOKUP on the Array below to get "yyyy-MM-dd"
"January" , "yyyy-MM-dd" ;
"Januar" , "jjjj-MM-tt"
In German, TEXT(1,"MMMM") = "Januar", so we do a VLOOKUP (SVERWEIS in German Excel) on the array above to get "jjjj-MM-tt"! :)
Then, just use that in your TEXT function:
=TEXT(A1, VLOOKUP(TEXT(1,"MMMM"),{"January","yyyy-MM-dd";"Januar","jjjj-MM-tt"},2,FALSE))
Obviously, the main reason this works is that TEXT(1,"MMMM") is valid for both German and English. If you are using something like Filipino (where "Month" is "Buwan") then you might find some issues finding a mutually intelligible formatting input.
I found another possibility. It is not perfect in all cases (see below), but it also makes number formats locale-independent. I have the same issue with mixed-language versions.
For this, you write your own function in VBA. Open the Visual Basic Editor with Alt+F11 and insert a new module. Inside the module, paste something like this:
Function FormatString(inputData As Variant, formatingString As String) As String
    ' Pass the value and format string on to VBA's built-in Format function
    FormatString = Format(inputData, formatingString)
End Function
Then you can use it in cell formulas with English formatting strings, like:
=FormatString(A1; "yyyy-mm-dd")
Advantage: it also works with number formats, even if (as in Germany) your decimal separator is not a ".":
=FormatString(A1; "00.00")
Drawbacks:
1. Not identical to the TEXT function
Date formatting doesn't always work as you might expect, and it is not exactly the same as the TEXT function:
=FormatString(1; "MMMM")
does not return "January" but "December", because VBA takes the 1 as a date, namely 31.12.1899 (in Excel, serial number 1 is 1 January 1900).
2. Has to be saved with macros
You have to save the file as *.xlsm for this to work.
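Drawback 1 comes down to Excel and VBA counting days from different starting points. A small Python sketch of both conventions, assuming Excel's default 1900 date system (not the 1904 system), including its fictitious 29 February 1900 at serial 60; the function names are hypothetical.

```python
from datetime import date, timedelta

# Day 0 of VBA / OLE Automation dates
OLE_EPOCH = date(1899, 12, 30)


def vba_serial_to_date(serial):
    """Interpret a whole number the way VBA does (OLE Automation dates)."""
    return OLE_EPOCH + timedelta(days=serial)


def excel_serial_to_date(serial):
    """Interpret a whole number like Excel's default 1900 date system."""
    if serial < 1:
        raise ValueError("only serials >= 1 are handled in this sketch")
    if serial == 60:
        raise ValueError("serial 60 is Excel's fictitious 29 February 1900")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    return OLE_EPOCH + timedelta(days=serial)


print(excel_serial_to_date(1))   # 1900-01-01, so TEXT(1,"MMMM") gives January
print(vba_serial_to_date(1))     # 1899-12-31, so FormatString(1; "MMMM") gives December
print(excel_serial_to_date(61), vba_serial_to_date(61))  # 1900-03-01 1900-03-01
```

From serial 61 (1 March 1900) onwards both conventions agree, so the difference mostly shows up with small numbers such as the 1 in TEXT(1,"MMMM").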
Note (1): this answers only the case for locale-independent TEXT to format numbers with decimal symbols and digit grouping symbols. For date formatting, see Chronocidal's answer.
Note (2): this answer does not use VBA functions, which would require enabling macros. Enabling macros may not be possible depending on the company's security policy. If enabling macros is an option, Uwe Hafner's answer would be easier.
You can detect the decimal symbol and digit grouping symbol as follows. Enter the number 1 in a specific cell (e.g. A1) and the number 1000 in another cell (e.g. A2).
Decimal symbol: =IF(TEXT(INDIRECT("A1"),"0,00")="001",".",",")
Digit grouping symbol (1): =IF(TEXT(INDIRECT("A2"),"#,###")="1000,",".",",")
This is assuming that the decimal symbol is either . or , and the digit grouping symbol is either , or . respectively. This will not detect unusual digit grouping symbols like (space) or ' (apostrophe).
With this information, you can set up a cell (or cells) with a formula that results in the format code you need to apply.
Suppose you need to format a number to two decimal digits and using the digit grouping symbol. You can assume that if the decimal symbol is . then the digit grouping symbol will be , and vice versa. You can do the following:
A1: 1
A2 (the formatting string): =IF(TEXT(INDIRECT("A1"),"0,00")="001","#,##0.00","#.##0,00")
A3 (contains an arbitrary number you wish to format)
A4 (the formatted number): =TEXT(A3,A2)
Technical note: the INDIRECT function is used intentionally because it is a volatile function. This guarantees that the formatting string and anything dependent on it is recalculated even if no data changed in the Excel document. If INDIRECT is not used, Excel caches results and will not recalculate the formatting string when the Excel document is opened on a PC with different locale settings.
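A Python sketch of what the A2 formula is meant to achieve: whichever locale the file is opened in, the chosen format code renders the same value with that locale's symbols. Like the answer, it assumes the grouping symbol is simply the other one of "," and "."; render_like_text is a hypothetical approximation of TEXT, not a re-implementation of Excel's formatter.

```python
def format_code_for(decimal_sep):
    """Mirror the A2 formula: pick the format code for the detected decimal symbol."""
    return "#,##0.00" if decimal_sep == "." else "#.##0,00"


def render_like_text(x, decimal_sep):
    """Approximate what TEXT(x, format_code_for(decimal_sep)) displays in that locale.

    Assumes the grouping symbol is whichever of "," and "." is not the
    decimal symbol.
    """
    grouping = "," if decimal_sep == "." else "."
    s = f"{x:,.2f}"  # Python uses "," for grouping and "." for decimals here
    # Swap both symbols in a single pass so they don't overwrite each other
    return s.translate(str.maketrans({",": grouping, ".": decimal_sep}))


for sep in (".", ","):
    print(format_code_for(sep), "->", render_like_text(1234.5, sep))
# #,##0.00 -> 1,234.50
# #.##0,00 -> 1.234,50
```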
1 - Also known as Thousands separator
The easy fix, whether directly custom formatting a cell or using TEXT(), is to use a country code for a language you know the proper formatting codes for.
For instance, I am in the US, have a US version of Excel, and am familiar with its date code formats. So I'd want to use them and to ensure they "come out" regardless of anyone's Windows or Excel version, or the country they are in, I'd do it like the following (for TEXT(), let's say, but it'd be the same idea in custom formatting):
=TEXT(A1,"[$-en-US]yyyy-mm-dd")
The function would take the value in A1 and ask Excel to treat it as a date. Excel would accept it (i.e. the value is, say, 43857 and not "horse") because it is a positive number, which is a requirement for anything to be treated as a date, and let the function move on to rendering it as a date in the prescribed manner, rather than giving an error as it would for "horse" or -6.
The function would then read the formatting string and see the language code. It would then drop the usual set of formatting codes it loaded upon starting up and load in the formatting codes for English ("en") and in particular, US English ("US"). The rest of the string uses codes from that set so it would interpret them properly and send an appropriate string back to TEXT() for it to display in the cell (and pass on to other formulas if such exist).
I have no way to test the following, but I assume that if one were to use a format that displayed day-of-week names or month names, they would be from the same language set. In other words, Excel would not assume that, even though you specified a country and language, you still wanted, say, Dutch or Congolese month names. So that kind of thing would still need to be addressed, but it would be an easy fix too, just involving, say, a simple lookup one could add, though it'd be "fun" setting up the lookup table for each language one wanted to accommodate...
However, the basic issue that arises with this problem in general is very, very easily solved with the country codes. They aren't even hard or arcane anymore now that the [$-409] syntax has been replaced with things like [$-en-us] and [$-he-IL] and so on.

Need to understand formula re: Excel search for a string in a table and return string if true

I've adapted this solution from a couple of years ago:
=LOOKUP(2^15,FIND(Keywords,A2),Categories)
I use this for searching within a description field for keywords in a named list, in order to return a corresponding category from an adjacent named list.
However I do not understand the significance of 2^15. Can someone explain?
Also it's unclear in what order the search operates. If two keyword options were "check" and "deposit," and they were assigned to different categories, but both appeared in the same description field cell, how do I know which will be found first? Is it placement in the string, or order in the list?
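2^15 (that is, 32768) is simply a large number, which LOOKUP attempts to find; when it can't find it, it falls back to a smaller number instead. It is chosen to be larger than any position FIND can return, since a cell holds at most 32,767 characters.
Effectively, your formula looks for each entry of Keywords in A2. FIND returns an array with a position number for every keyword that occurs in A2 and a #VALUE! error for every keyword that doesn't. LOOKUP skips the errors and, because 2^15 is larger than every position in the array, returns the Categories entry that lines up with the last number in the array. That also answers your second question: when both "check" and "deposit" are found, the keyword that comes later in the Keywords list wins, regardless of where it appears in the description. This seems a weird way of doing it, though; it is likely a holdover from pre-2007 Excel, as LOOKUP is generally used now only for backwards compatibility. Using 1 instead of 2^15 worked for a couple of simple cases that I tried when writing this up, but that can only succeed when a keyword is found at position 1, because LOOKUP never matches a number larger than the lookup value.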
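A Python model of the formula, under the widely relied-on assumption that LOOKUP skips errors and, when the lookup value is larger than every number, returns the entry for the last number. The names find and lookup_category are illustrative; find mimics FIND's 1-based, case-sensitive search, and None stands in for Excel's error values.

```python
LOOKUP_VALUE = 2 ** 15  # 32768: larger than any position FIND can return


def find(needle, haystack):
    """Like Excel FIND: 1-based and case-sensitive; None stands in for #VALUE!."""
    i = haystack.find(needle)
    return i + 1 if i >= 0 else None


def lookup_category(description, keywords, categories):
    """Model of =LOOKUP(2^15, FIND(Keywords, A2), Categories)."""
    result = None  # stands in for #N/A when no keyword is found
    for keyword, category in zip(keywords, categories):
        pos = find(keyword, description)
        if pos is not None and pos <= LOOKUP_VALUE:
            result = category  # a later keyword in the list overrides an earlier one
    return result


keywords = ["check", "deposit"]
categories = ["Checks", "Deposits"]
print(lookup_category("mobile deposit of check 1042", keywords, categories))  # Deposits
```

Reversing the order of the two named lists flips the result to "Checks", even though the description text is unchanged.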

Number representation by Excel

I'm building a VBA program in Excel 2007 that takes long strings of numbers (UPCs) as input. The program usually works fine, but sometimes the number string seems to be converted to scientific notation, and I want to avoid this, since I then look the values up with VLOOKUP.
So, I'd like to treat a textbox input as an exact string. No scientific notation, no number interpretation.
On a related note, this one really gets weird. I have two identical UPCs: both show the same value (as far as I or any text editor can tell), yet one of them gives a successful VLOOKUP and the other does not.
Does anybody have suggestions on this one? Thanks for your time.
Long strings that look like numbers can be a pain in Excel. If you're not doing any math on the "number", it should really be treated as text. As you've discovered, when you want to force Excel to treat something as a string, precede it with an apostrophe.
There are a couple of common problems with VLOOKUP. The one you found, extra whitespace, can be avoided by using a formula such as
=VLOOKUP(TRIM(A1),B1:C100,2,FALSE)
The TRIM function will remove those extraneous spaces. The other common problem with VLOOKUP is that one argument is a string and the other is a number. I run into this one a lot with imported data. You can use the TEXT function to do the VLOOKUP without having to change the raw data
=VLOOKUP(TEXT(A1,"00000"),B1:C100,2,FALSE)
will convert A1 to a five digit string before it tries to look it up in column B. And, of course, if your data is a real mess, you may need
=VLOOKUP(TEXT(TRIM(A1),"00000"),B1:C100,2,FALSE)
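A Python sketch of the same normalization idea applied to 12-digit UPC-A codes: trim the way Excel's TRIM does, and pad numbers with leading zeros the way TEXT(...,"000000000000") would, so that a numeric key and a text key compare equal. upc_key and the sample table are hypothetical. Note that TRIM only removes the ordinary space (character 32), so a non-breaking space (character 160), a frequent cause of "identical" values that won't match, survives both in Excel and in this sketch.

```python
def excel_trim(text):
    """Like Excel TRIM: drop leading/trailing spaces and collapse runs of spaces.

    Only the ordinary space (char 32) is handled, as in Excel; a
    non-breaking space (char 160) is left alone, just as it is there.
    """
    return " ".join(part for part in text.split(" ") if part)


def upc_key(value, width=12):
    """Normalize a UPC to a fixed-width digit string (hypothetical helper).

    Numbers are zero-padded like TEXT(A1, "000000000000"); strings are trimmed first.
    """
    if isinstance(value, str):
        value = excel_trim(value)
        return value.zfill(width) if value.isdigit() else value
    return f"{int(value):0{width}d}"


table = {"012345678905": "Widget"}
print(table.get(upc_key(12345678905)))       # Widget (the number had lost its leading zero)
print(table.get(upc_key(" 012345678905 ")))  # Widget (stray spaces removed)
```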

Prevent comma-separated list of numbers being interpreted as single large value

The original value was 33266500,332665100,332665200,332665300, and the cell should look like this: 33266500,332665100,332665200,332665300. But what I see as the cell value in Excel is 3.32665E+34.
So the question is: I want to convert it back into the original string. I found the Format function on Google and used it like this:
format(3.32665E+34,"standard")
giving 332,6650,033,266,510,000,000,000
How do I parse it or get back the original string? I believe Format is the function in VBA.
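Excel has a 15-digit precision limit. Here it read the commas as thousands separators and stored the whole text as a single 35-digit number, of which only the first 15 significant digits are kept. If the numbers are already shown like this when you access the file, there is no way to get the number back - you have already lost some digits. VBA code and formulas will not help you.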
If this is not the case, you can add a single quote ' mark before the number to store it as text. This will ensure Excel does not try to treat it as a number and thus lose precision.
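A short Python illustration of why the digits cannot be recovered, assuming (as the 3.32665E+34 suggests) that the commas were read as thousands separators. Python's Decimal with 15 digits of precision stands in for Excel here, with digits past the 15th dropped; splitting the original text, by contrast, keeps all four values exactly.

```python
from decimal import Context, Decimal, ROUND_DOWN

original = "33266500,332665100,332665200,332665300"

# Keeping the text and splitting it preserves every value exactly.
values = [int(part) for part in original.split(",")]

# Reading the commas as thousands separators gives one 35-digit number...
as_one_number = Decimal(original.replace(",", ""))
# ...of which only 15 significant digits survive (the rest become zeros).
excel_view = Context(prec=15, rounding=ROUND_DOWN).plus(as_one_number)

print(values)      # [33266500, 332665100, 332665200, 332665300]
print(excel_view)  # 3.32665003326651E+34
print(int(excel_view) == int(as_one_number))  # False: the trailing digits are gone
```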
If you want the value kept exactly, store the data as a string, not as a number. The data type you are using simply doesn't have the ability to do what you are asking it to do.
If you're starting with an Excel file that has already been created then you've already lost the information: Excel has tried to understand what it was given and its best guess has turned out to be wrong. All you can do (if you can't get the source data) is go back to the creator of the Excel file and tell them what's wrong.
If you're starting with, say, a text file that you're importing, then the news is much better:
If you're importing manually using the Text Import Wizard, then at "Step 3 of 3" you need to set "Column Data Format" for the problem field to "Text".
If you're using a macro, you'll need to specify a value for the TextFileColumnDataTypes property that does the same thing. The easiest way to get it right is to use the Macro Recorder.
If you want the four values in the string to be separate cells, then again, look at the Text Import Wizard settings: in Step 1 of 3 you need to set "Delimited" data type (usually the default) and in Step 2 make sure that "Comma" is checked.
The value needs to be entered into the cell as a string. You need to make whatever it is that inserts the value precede the value with a '.
