Is this an Apache POI bug in Java?

When I read all the data in row 2, I noticed that columns C, E, and F were missing.
Details follow:
My Excel file is a new file. I entered data only in columns A, B, D, G, and H. I never touched columns C, E, and F (I did not click on or edit those cells).
Then I use the following call to get a row and the data in its columns:
Row row = sheet.getRow(i);
At this point, I noticed that the data of columns C, E, and F was lost. Specifically, the row contains only columns A, B, D, G, and H and their respective data.
If I fill the data row with white or any other color, or click on columns C, E, and F (without writing data in them), and then re-read the file, the row contains all columns A through H, with empty data in columns C, E, and F.
What can I do so that sheet.getRow() does not lose columns?

In an Excel sheet, only those rows and cells that are not completely unused are stored. A row is stored if at least one cell in that row needs to be stored. A cell needs to be stored if it has a cell value or a special cell style (data formatting, font formatting, interior formatting...). All other possible rows and cells are not stored.
So what you observed is not a bug in Apache POI but shows how Excel uses its storage for rows and cells in sheets. Storing all possible cells would be a horrible waste of storage.
For rows that are not stored, Sheet.getRow returns null. For cells that are not stored, Row.getCell returns null. Rows and cells that are not stored are also not reachable using iterators.
So, if you need to iterate over cells with control over missing / blank cells, the iterators are not usable. See Iterate over rows and cells -> Iterate over cells, with control of missing / blank cells.
Sample code provided there:
// Decide which rows to process
int rowStart = Math.min(15, sheet.getFirstRowNum());
int rowEnd = Math.max(1400, sheet.getLastRowNum());
for (int rowNum = rowStart; rowNum < rowEnd; rowNum++) {
    Row r = sheet.getRow(rowNum);
    if (r == null) {
        // This whole row is empty
        // Handle it as needed
        continue;
    }
    int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);
    for (int cn = 0; cn < lastColumn; cn++) {
        // Current POI releases expose this policy as Row.MissingCellPolicy.RETURN_BLANK_AS_NULL;
        // older releases also offered Row.RETURN_BLANK_AS_NULL
        Cell c = r.getCell(cn, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
        if (c == null) {
            // The spreadsheet is empty in this cell
        } else {
            // Do something useful with the cell's contents
        }
    }
}
As you see, you need to know the first and last row you need as well as the column count you need. The sample code always starts at column 0 (A). If you also know the first column you need, you can start there instead. And you need to null-check the row as well as each cell you get.
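The index-based approach described above can be illustrated without POI itself. The following is only a minimal sketch: the class SparseRowSketch and the method toDense are hypothetical names, and a plain Map stands in for a row that stores only some of its cells. It shows why walking a fixed column range and null-checking each lookup yields every column, while iterating over the stored entries would skip C, E, and F.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a sparsely stored row; not part of Apache POI.
final class SparseRowSketch {

    /**
     * Expands a sparse "column index -> value" map into a dense list covering
     * columnCount columns starting at firstCol. Cells that were never stored
     * are reported as empty strings.
     */
    static List<String> toDense(Map<Integer, String> storedCells, int firstCol, int columnCount) {
        List<String> dense = new ArrayList<>();
        for (int col = firstCol; col < firstCol + columnCount; col++) {
            String value = storedCells.get(col); // null when the cell was never stored
            dense.add(value == null ? "" : value);
        }
        return dense;
    }

    public static void main(String[] args) {
        // Only A, B, D, G, H (indexes 0, 1, 3, 6, 7) were ever filled in.
        Map<Integer, String> row2 = new TreeMap<>();
        row2.put(0, "a");
        row2.put(1, "b");
        row2.put(3, "d");
        row2.put(6, "g");
        row2.put(7, "h");
        System.out.println(toDense(row2, 0, 8));
    }
}
```

With real POI objects the same loop would call Row.getCell for each index and treat a null result as a missing cell, as in the sample code above.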

Related

Excel VBA Macro to loop, copy, paste, offset based on extracted cell value

I've been trying so hard to clean up this CSV data for a coworker.
I’m going to walk through what the data usually looks like and then walk through the steps I’ve done and then bring up what I’m currently struggling with… Bear with me as this is my first post (and I have no background in vba and everything is self-taught by Google).
So the data export is a csv which can be opened in excel broken out by several columns. The column in question is column G, which essentially has multiple data sets (1 – 219) for the same menu item (row).
For example:
A B C D E F G
Chicken Soup {1;$6.00;59;$9.00;88;$6.00}
Beef Soup {1;$8.00;59;$12.00;88;$8.00}
Duck Soup {1;$6.00;59;$6.00;88;$6.00}
Egg Soup {1;$8.00;59;$9.00;88;$8.00}
Water {1;$0.00}
French Onion Soup {1;$16.00;59;$15.00;88;$12.00}
Chili Soup {1;$17.00;84;$17.00}
So in column G, as you can tell, there are multiple prices. The format is:
{Column Number ; $ Price ; Column Number ; $ Price ; etc.}
Regex: .[0-9]{1,2},[$][0-9]{1,3}[.][0-9][0-9].|[0-9]{1,2},[$][0-9]{1,3}[.][0-9][0-9]
The first goal was to parse out the data in the column into the row, in a format that is true to the csv (so it can be combined and resubmitted).
For example: (imagine there is a semi colon between each data set, as there should be in the final result)
{1;$21.00}
{1;$16.00}
{1;$12.00 5;$12.00 8;$12.00}
{1;$18.00 6;$18.00 8;$18.00}
{1;$10.00 6;$7.00 9;$12.00 11;$10.00}
{1;$20.00 6;$20.00 8;$20.00}
{1;$5.49 3;$3.99 10;$4.99 12;$4.99}
{1;$18.99}
{1;$21.00}
{1;$21.00}
To accomplish this goal, I wrote a macro that:
Copies column G from “Sheet1” and inputs to new sheet “Sheet2” in A1
Replace all “;$” with “,$” to help separate each data set by itself instead of having it broken out column name then dollar sign in two different columns
Text to columns macro splitting on “;” (and inputs results starting B1 so I can keep A1 with all the data sets in one column in case I need it) – also if you know how to keep the semi colon here, that would be helpful so I don’t have to re-add it in the future
Replace All from b1 to end of data set "," to ";" <-- to bring it back to original formatting
Copies the data from B1 to the last cell with data (data is not in order, the 50th row could have 219 columns and then the last row could have only 150) and pastes this data into column G of “rp items” (therefore overriding the existing data and shifting the columns as far right as the last column used).
However, when I showed my coworker what I’ve done, he wanted the leading number (column number) to correspond to the columns (since data starts in column G, this will be column 1, H will be 2, etc.). It should therefore look something like this, so he can filter a column for all the items that have that column number:
For example, this photo is how the outcome should look
So, now the goal is to create a macro that…
Loops through B1:B in sheet “STEP ONE” (column B starting at B1 then C1 then when blank in that row go to next row)
While (B1 (or next row) is blank, do nothing, end macro)
If B1 (or active cell) is not blank, read the cell value to extract column; copy the cell’s contents, paste in “STEP TWO” sheet in the same row as the active cell, but offset by the column number from cell value.
Toggle back to main sheet, goes to next cell within that row – if blank, go to next row and repeat until all data is done.
To give you some background, I have more than 25,000 lines of data (menu items) and the longest column (I believe is 219). So far, I’ve mostly been trying pieces of scripts I’ve found online but none of them are doing similar to what I need and I don’t know how to write enough code to just write the script out myself. I believe I’ll need to have to establish a variable: the column name (not sure if I can extract this using the regex code I found out) and then use that in the offset...
But my range needs to be dynamic and loop…
Please let me know if you can assist – I’ve been stuck on this for like a week!
Thank you all so much for reading – if I can provide extra detail please let me know.
For example you could do it this way:
Sub Tester()
    Dim arr, i As Long, c As Range, v, col, price
    For Each c In Range("G2:G4").Cells
        v = Replace(Replace(c.Value, "{", ""), "}", "") 'remove braces
        If Len(c.Value) > 0 Then 'anything to process?
            arr = Split(v, ";") 'split on ;
            For i = 0 To UBound(arr) - 1 Step 2 'loop 2 at a time
                col = CLng(Trim(arr(i))) 'column number
                price = Trim(arr(i + 1)) 'price
                c.Offset(0, col).Value = col & ";" & price
            Next i
        End If
    Next c
End Sub
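The parsing step in the macro above (strip the braces, split on ";", read the pieces two at a time) can also be sketched outside Excel. This is only an illustration in Java, not a replacement for the VBA: the class PriceListParser and its parse method are hypothetical names, and the input is assumed to follow the {column;price;column;price} layout shown in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper mirroring the split-in-pairs logic of the macro above.
final class PriceListParser {

    /**
     * Parses text such as "{1;$6.00;59;$9.00}" into a map of
     * column number -> price text, preserving the source order.
     * A trailing piece without a partner is ignored.
     */
    static Map<Integer, String> parse(String raw) {
        Map<Integer, String> byColumn = new LinkedHashMap<>();
        String body = raw.replace("{", "").replace("}", "").trim(); // remove braces
        if (body.isEmpty()) {
            return byColumn; // nothing to process
        }
        String[] parts = body.split(";");
        for (int i = 0; i + 1 < parts.length; i += 2) { // step two at a time
            int column = Integer.parseInt(parts[i].trim());
            byColumn.put(column, parts[i + 1].trim());
        }
        return byColumn;
    }

    public static void main(String[] args) {
        // Prints {1=$6.00, 59=$9.00, 88=$6.00}
        System.out.println(parse("{1;$6.00;59;$9.00;88;$6.00}"));
    }
}
```

Each key is the offset the macro uses when it writes the pair into the row, so key 59 corresponds to the cell 59 columns to the right of column G.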

Excel Match Numbers in 2 Columns to a Number in a 3rd

I've run into a bit of a road block. I get a .PDF output from an accounting program and copy/paste the data into excel, then convert text to columns. I am trying to match the GL code with the totals for that specific account. Columns A, B, and C show the state of my data prior to sorting it, and the lines under Intended Output show how I would like the data to output.
I am trying to automate this process, so I can paste data into columns A, B, & C in the raw format and have it automatically spit out the required numbers in the format of the Intended Output. The GL codes remain the same, but the numbers and the number of rows will change. I've color coded them for ease of review.
Thank you very much in advance!
Using a combination of the following functions you can create a list of filtered results. It works on the principle that the Data1 text you want to pull is the only text with a "-" in it, and that the totals you are pulling from Data2 and Data3 are the only numbers in their columns. Any change to that pattern will most likely break the formulas. Note that the formulas will not copy formatting.
IFERROR
INDEX
AGGREGATE
ROW
ISNUMBER
FIND
Let's assume the output will be placed in a small table with E2 being the upper left data location.
In E2 use the following formula and copy down as needed:
=IFERROR(INDEX(A:A,AGGREGATE(15,6,ROW($A$1:$A$30)/ISNUMBER(FIND("-",$A$1:$A$30)),ROW(A1))),"")
In F2 use the following formula and copy to the right 1 column and down as needed:
=IFERROR(INDEX(B:B,AGGREGATE(15,6,ROW($A$1:$A$30)/ISNUMBER(B$1:B$30),ROW(A1))),"")
AGGREGATE performs array-like calculations. As such, do not use full column references such as A:A inside it, as that can lead to excess calculations. Be sure to limit it to the range you are looking at.
Try this procedure:
Public Sub bruce_wayne()
    'Assumptions
    '1. Data spreadsheet will ALWAYS have the structure shown in the question
    '2. The key word "Total" (or whatever else it might be) is otherwise NOT found
    '   anywhere else in the 1st data column
    '3. output is written to the same sheet as the data
    '4. As written, invoked when data sheet is the active sheet
    '5. set the 1st 3 constants to the appropriate values
    Const sData2ReadTopLeft = "A1" 'Top left cell of data to process
    Const sData2WriteTopLeft = "J2" 'Top left cell of where to write output
    Const sSearchText = "Total" 'Keyword for summary data
    '*******************
    Const sReplaceText = "Wakanda"
    Dim r2Search As Range
    Dim sAccountCode As String
    Dim rSearchText As Range
    Dim iRowsProcessed As Integer
    Set r2Search = Range(sData2ReadTopLeft).EntireColumn
    sAccountCode = Range(sData2ReadTopLeft).Offset(1, 0).Value
    iRowsProcessed = 0
    Do While Application.WorksheetFunction.CountIf(r2Search, sSearchText) > 0
        Set rSearchText = r2Search.Find(sSearchText)
        Range(sData2WriteTopLeft).Offset(iRowsProcessed, 0) = sAccountCode
        Range(sData2WriteTopLeft).Offset(iRowsProcessed, 1) = rSearchText.Offset(0, 1).Value
        Range(sData2WriteTopLeft).Offset(iRowsProcessed, 2) = rSearchText.Offset(0, 2).Value ' add this if there are more summary columns to return
        'last two lines could be collapsed into a single line; at the expense of readability..
        rSearchText.Value = sReplaceText 'so that next search will find the next instance of the trigger text
        iRowsProcessed = iRowsProcessed + 1
        sAccountCode = rSearchText.Offset(1, 0).Value
    Loop
    r2Search.Replace what:=sReplaceText, Replacement:=sSearchText
End Sub

Increment column reference in formula

I have data which I have to enter into a table that has the following layout:
I need to store it in another worksheet that has this layout:
I am trying to create a macro that will accomplish 3 things:
Copy the contents in the first table to a column in another worksheet
clear the original table data (but not the headers q, w, e, etc)
Increment the column reference so that the next time the macro is run, it will copy the data into column C in the second table, the third time into column D, and so on.
This is what I have tried:
Line 1: Sheets("sheet2").Range("B1").Value = Sheets("sheet1").Range("A5").Value
Line 2: Sheets("sheet1").Range("A5:D6").ClearContents
In order to accomplish 3), I have to manually change "A5" to "B5" in the code, and to do so for each cell (of which there are about 60). Is there a way to automate this?
You can use the Offset property of the Range object.
To step from A5 to B5 and onward, you can do:
For x = 1 To N
    Sheets("sheet2").Range("B1").Value = Sheets("sheet1").Range("A5").Offset(0, x).Value
Next x
If you want to do it only once, you do not need the For loop.

How to find DataValidation list for an Excel cell using EPPlus

I have a workbook with a number of cells that have data validation specified as a dropdown list of allowed values. Using EPPlus, I want to be able to get for each such cell, the list of allowed values.
So far I've got:
ExcelWorksheet.DataValidations gives me an ExcelDataValidationCollection, which is a collection of IExcelDataValidation items for the worksheet.
Each IExcelDataValidation has an Address property of type ExcelAddress which presumably references all cells that have that validation rule.
The step I'm stuck on is finding if a given cell is one of the cells included in the ExcelAddress
Any takers?
I'm currently using EPPlus 3.1.1.0, but can upgrade to a more recent version if necessary.
UPDATE
I didn't explain this clearly enough. Here's my situation in more detail.
Assume column C has some cells with list data validation. Some cells allow, say, "A, B, C"; other cells allow "D, E, F" etc. The range of cells for each data validation list is not contiguous, so, for example:
C2, C4, C7-C10, C20 may allow "A,B,C"
C3, C5-C6, C15 may allow "D,E,F"
I'm trying to determine which cells allow "A,B,C" and which allow "D,E,F" etc.
ExcelWorksheet.DataValidations contains ExcelDataValidationList items, one with values "A,B,C", one with values "D,E,F", etc.
ExcelDataValidationList.Address for the list "A,B,C" contains an ExcelAddress whose Address property looks something like: "C4 C7:C10 C2 C20 ...".
I want to determine if a given cell (say C6) is included in the range specified by this address "C4 C7:C10 C2 C20 ...".
Of course, I can String.Split on whitespace, and parse each item in the resulting list. But I was hoping there would be some more direct way of doing this, e.g.
ExcelAddress.Contains("C6")
or
ExcelAddress.Contains(6, 3) // row 6, col 3 = C6
Almost there, just check the IExcelDataValidation's specific type. Tested with EPPlus 4.1.0.0:
using (var package = new ExcelPackage(new FileInfo(path)))
{
    var sheet = package.Workbook.Worksheets[1];
    var validations = sheet.DataValidations;
    foreach (var validation in validations)
    {
        var list = validation as ExcelDataValidationList;
        if (list != null)
        {
            var range = sheet.Cells[list.Formula.ExcelFormula];
            var rowStart = range.Start.Row;
            var rowEnd = range.End.Row;
            // allowed values probably only in one column....
            var colStart = range.Start.Column;
            var colEnd = range.End.Column;
            for (int row = rowStart; row <= rowEnd; ++row)
            {
                for (int col = colStart; col <= colEnd; col++)
                {
                    Console.WriteLine(sheet.Cells[row, col].Value);
                }
            }
        }
    }
}
Output:
one
two
three
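The membership check the question asks about (is C6 covered by an address such as "C4 C7:C10 C2 C20") comes down to splitting on whitespace and testing each single cell or range. The sketch below shows that logic in Java purely as an illustration. AddressListSketch, contains, and parseCell are hypothetical names and not part of EPPlus, and the sketch assumes plain A1-style references. Whole-row or whole-column references such as C:C are not handled.

```java
import java.util.Locale;

// Hypothetical helper; it only illustrates the parsing logic, not an EPPlus API.
final class AddressListSketch {

    /** Returns true if (row, col), both 1-based, lies inside any part of the space-separated list. */
    static boolean contains(String addressList, int row, int col) {
        for (String part : addressList.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue;
            }
            String[] ends = part.split(":");
            int[] a = parseCell(ends[0]);
            int[] b = parseCell(ends[ends.length - 1]); // same cell when there is no ":"
            boolean rowInside = row >= Math.min(a[0], b[0]) && row <= Math.max(a[0], b[0]);
            boolean colInside = col >= Math.min(a[1], b[1]) && col <= Math.max(a[1], b[1]);
            if (rowInside && colInside) {
                return true;
            }
        }
        return false;
    }

    /** Converts "C7" (or "$C$7") into {row, column}, both 1-based, e.g. {7, 3}. */
    static int[] parseCell(String ref) {
        String s = ref.replace("$", "").toUpperCase(Locale.ROOT);
        int i = 0;
        int col = 0;
        while (i < s.length() && Character.isLetter(s.charAt(i))) {
            col = col * 26 + (s.charAt(i) - 'A' + 1); // base-26 column letters
            i++;
        }
        int row = Integer.parseInt(s.substring(i));
        return new int[] {row, col};
    }

    public static void main(String[] args) {
        String address = "C4 C7:C10 C2 C20";
        System.out.println(contains(address, 6, 3)); // C6 is not listed: false
        System.out.println(contains(address, 8, 3)); // C8 is inside C7:C10: true
    }
}
```

The same split-and-compare steps translate directly to C#, using the Address string of each validation as the input.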

How can I make this Excel macro loop through only visible filtered data?

I have very little experience with VBA and I'm now stumped by what I'm trying to accomplish with a macro. Excel 2010.
I have 3 relevant columns. B, C, and AD. (Columns 2, 3, and 30)
My data is filtered down to about 30 rows, almost none of which are contiguous (about 500 rows total).
I want the following to happen:
A formula is entered in ONLY THE VISIBLE ROWS in column AD, which will look at the value in column B of that same row and check for that value in all of the VISIBLE CELLS in column C. It cannot look at all of the cells in column C, only the visible ones.
If the value from column B in that row is found anywhere in the VISIBLE CELLS in column C, then "True" should be returned in column AD. I don't care about what is returned when the value is not found, as I will be filtering for the "True" values only.
As an added requirement, if the first 3 characters of the value in column B are "010" I need it to return a value of "True" in column AD. I then need this formula copied down column AD for each VISIBLE row.
Right now, I have a formula that will conduct the search in column C for the value in column B. (found on stackoverflow)
=NOT(ISNA(VLOOKUP(B4,C:C,1,0)))
This provides a "True" in column AD when the value from column B is found somewhere in column C. With the "010" constraint, the formula looks like this:
=IF(LEFT(B4,3)="010","True",NOT(ISNA(VLOOKUP(B4,C:C,1,0))))
I am having a problem in that this looks at even the hidden (filtered out) rows. Every one of my values in column B will appear in column C at some point, so I'm only getting "True" for all my entries.
I think there must be a better way to do this than just having a macro paste the formula down (even considering I can't get the formula to work). So, 2 questions:
Is the formula the right way to go in this case, and, if so, can anyone tell me how to get it to only search the visible cells in column C?
If code is the best way (I'm guessing it is), can anyone show me an example of code that might work?
You're almost there. On its own, COUNTIF does not have this capability, but if you throw user-defined functions (UDFs) into the mix, you have an elegant solution.
Add the following code to a standard module in the VBA editor.
Function Vis(Rin As Range) As Range
    'Returns the subset of Rin that is visible
    Dim Cell As Range
    Application.Volatile
    Set Vis = Nothing
    For Each Cell In Rin
        If Not (Cell.EntireRow.Hidden Or Cell.EntireColumn.Hidden) Then
            If Vis Is Nothing Then
                Set Vis = Cell
            Else
                Set Vis = Union(Vis, Cell)
            End If
        End If
    Next Cell
End Function
Function COUNTIFv(Rin As Range, Condition As Variant) As Long
    'Same as Excel COUNTIF worksheet function, except does not count
    'cells that are hidden
    Dim A As Range
    Dim VisibleCells As Range
    Dim Csum As Long
    Csum = 0
    Set VisibleCells = Vis(Rin)
    If Not VisibleCells Is Nothing Then 'Vis returns Nothing when every cell is hidden
        For Each A In VisibleCells.Areas
            Csum = Csum + WorksheetFunction.CountIf(A, Condition)
        Next A
    End If
    COUNTIFv = Csum
End Function
Now you can use the new COUNTIFv() function to do the same as COUNTIF, but counting only visible cells. This code example was adapted from Damon Ostrander's answer to a similar question, so you will probably need to tweak it slightly. You can either use the COUNTIFv function in the macro itself, or write a visible-only variant of VLOOKUP in a similar fashion and use it in the worksheet formula you already have. Neither method is really better than the other, so either should work for you.
