Apache POI Difference between SXSSFSheet.addMergedRegion() & SXSSFSheet.addMergedRegionUnsafe()? - apache-poi

I am using Apache POI to build excel file. Since the data might be very large I decided to use SXSSFWorkbook instead of XSSFWorkbook. In application runtime I can see that SXSSFSheet.addMergedRegionUnsafe() run faster than SXSSFSheet.addMergedRegion() but I can't find any docs related to it. So I wonder is there any risk of using SXSSFSheet.addMergedRegionUnsafe()?

SXSSFSheet.addMergedRegionUnsafe simply calls XSSFSheet.addMergedRegionUnsafe. See source code of SXSSFSheet:
...
#Override
public int addMergedRegionUnsafe(CellRangeAddress region) {
return _sh.addMergedRegionUnsafe(region);
}
...
There _sh is XSSFSheet.
And in XSSFSheet.addMergedRegionUnsafe the behavior is documented:
Adds a merged region of cells (hence those cells form one). Skips
validation. It is possible to create overlapping merged regions or
create a merged region that intersects a multi-cell array formula with
this formula, which may result in a corrupt workbook. To check for
merged regions overlapping array formulas or other merged regions
after addMergedRegionUnsafe has been called, call
validateMergedRegions(), which runs in O(n^2) time.
So addMergedRegionUnsafe runs faster because it skips validation. So using this your program needs considering not to create overlapping merged regions and not to create a merged region that intersects a multi-cell array formula with this formula. Else the result workbook gets corrupted.
Source for XSSFSheet.addMergedRegionUnsafe for completeness.

Related

Returning a column reference from MATCH to avoid using INDIRECT with a Named Range

TL;DR: I'm basically trying to obtain a column range such as 'Sheet 1'!$A:$A where the A is obtained by matching the contents of a given cell to a 1:1 range within a sheet referenced by another given cell, for use in a dynamic range.
In the highly probable case where that made zero sense, here's an illustration:
PARAMETERS: A2 = "LIST" | C2 = "FirstName" | Desired result: 'LIST'!$A:$A
And I've obtained that, BUT, I can't use that output ('LIST'!$A:$A) within formulas (namely to create a dynamic range). For instance, here 'LIST'!$A:$A contains 101 cells with values in them:
V3 = NamedFormula = 'LIST'!$A:$A
COUNTA(INDIRECT(V3)) = 101
COUNTA(INDIRECT(NamedFormula)) = 1 because it evaluates to #VALUE and that is a singular result.
Before delving into the topic of using INDIRECT with a Named Range (which I've read about and am still getting over my confused grief), I'm realizing my Names are getting a bit out of hand. I tend to use Excel like a mad scientist. So, in case there's a much simpler solution to what I'm trying to do, here's my actual mission:
0. I'm building a tool to simplify a process where email addresses are built from different data, which needs to run without any scripts, only formulas.
1. A tab with no imposed name would contain a user database with minimally (firstname and lastname OR IDs) AND (potentially other data columns) in no specific order. Tool users would import that tab from wherever the data got to them depending on the client, and would only need to copy-paste relevant headers to the main tab without changing anything else here for data integrity.
2. The main tab would have specific input fields where tool users would paste in the name of the imported tab as well as the labels of the columns they need (for instance, the labels in the first row of the columns containing the first name and the last name), and an input field for the domain name to use to build those email addresses.
3. A Data tab is referenced for cleaning and preparing strings for email address formats.
4. The Export tab would spew out a list of clean email addresses that can be exported to CSV.
The Data tab is just 2 columns to use with SUBSTITUTE so that for instance apostrophes are removed but accented letters are normalized (é -> e). I've used LAMBDA within Names to get there. The problem is to tie everything in - to get those Named ranges into the final formula.
The Names I'm using so far (I'd like to use fewer but testing specific parts extended beyond simple usage I fear):
ALPH ={"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"M";"N";"O";"P";"Q";"R";"S";"T";"U";"V";"W";"X";"Y";"Z"}
LABELS =LAMBDA(labelname,ADDRESS(2,MATCH(labelname,INDIRECT("'"&PARAMETERS!$A$2&"'!$1:$1"),0),1,1,PARAMETERS!$A$2))
RANGECOL =LAMBDA(labelname,COLUMN(INDIRECT(LABELS(labelname))))
RNCOL =LAMBDA(label,"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,RANGECOL(label))&":$"&INDEX(ALPH,RANGECOL(label)))
I haven't tied everything in the Data tab yet - I'm still trying to automate my main tab before pushing further and using the Data tab substitutions on top of everything. That will be the next step, not my current focus. But, for the curious and interested, on the Data tab I'm using something something I found on ablebits which works wonders =]
So, now if I use the offset range with a static LIST!A:A it works:
=IF($C$2<>"",LOWER(INDEX(OFFSET(INDIRECT(ADDRESS(2,MATCH($C$2,INDIRECT("'"&$A$2&"'!$1:$1"),0),1,1,$A$2)),0,0,COUNTA(LIST!A:A)-1,1),ROW())),"") &IF($C$3<>"","."&LOWER(INDEX(OFFSET(INDIRECT(ADDRESS(2,MATCH($C$3,INDIRECT("'"&$A$2&"'!$1:$1"),0),1,1,$A$2)),0,0,COUNTA(LIST!A:A)-1,1),ROW())),"") &"#"&$C$4
But when I try to use the dynamic RNCOL($C$3) it does not:
=IF($C$2<>"",LOWER(INDEX(OFFSET(INDIRECT(LABELS($C$2)),0,0,COUNTA(INDIRECT(RNCOL($C$2)))-1,1),ROW())),"") &IF($C$3<>"","."&LOWER(INDEX(OFFSET(INDIRECT(LABELS($C$3)),0,0,COUNTA(INDIRECT(RNCOL($C$3)))-1,1),ROW())),"") &"#"&$C$4
This just gives #REF, and evaluating shows the digression starting at INDIRECT(RNCOL($C$3)) equating to #VALUE.
I'm starting to see double here but my undying and completely normal love for Excel prevents me from going home from work as I'm way too far down the rabbit hole to let my obsession die here.
Any pointers as to how this can work?
Note - all of the names in the supplied sheet were generated by an online fake name generator, nothing in here is actual user data #GDPR
Thanks in advance! <3
Test sheet is available via Google Drive.
Your current set-up is not good for many reasons, and in my opinion would require a complete overhaul, the scope of which lies beyond a response on this website.
As to a 'quick fix' to your current issue, the reason your formula in E1 is currently returning an error is due to the fact that, as you can see via stepping through with the Evaluate Formula tool, the part
COUNTA(INDIRECT(RNCOL($C$2)))-1
is resolving to
COUNTA(INDIRECT({"'LIST'!$A:$A"}))-1
and this is not the same as
COUNTA(INDIRECT("'LIST'!$A:$A"))-1
in that the value being passed to INDIRECT is an array in the former though not in the latter. Although INDIRECT can accept arrays, it is only within certain constructions in conjunction with other suitable functions; here it will simply error.
And the reason that it is returning an array is due to the fact that RNCOL($C$2) is returning an array, and that is because that function is defined as
=LAMBDA(label,"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,RANGECOL(label))&":$"&INDEX(ALPH,RANGECOL(label)))
and, since RANGECOL($C$2) resolves to 1 here, the above is equivalent to
"'PARAMETERS!$A$2'!$"&INDEX(ALPH,1)&":$"&INDEX(ALPH,1)
Here, because you are omitting the column_num parameter from INDEX, the part
INDEX(ALPH,1)
is resolving to
{"A"}
which is an array (albeit one comprising a single value) and technically different from
"A"
In most circumstances, this is not an issue. As such, it is almost always unnecessary to pass both a row_num and column_num parameter to INDEX when indexing a one-dimensional array. Here, however, it matters.
You can resolve this by explicitly including a column_num parameter, i.e. redefine RNCOL as
=LAMBDA(label,"'"&PARAMETERS!$A$2&"'!$"&INDEX(ALPH,RANGECOL(label),1)&":$"&INDEX(ALPH,RANGECOL(label),1))

Excel design consideration - infrequent change and impact for VBA/formula/add-on

nothing stuck or broken, just I am inspired after a discussion with another Excel author.
His situation:
Read from an existing Excel monster file (column FG), and hard-coded the following
Range("FF:FG").Copy
Potential issue:
data in FF:FG will be pushed to GF:GG every couple of months because newer columns will be inserted in between. (It's a pivot-like design... sorry, but end-users need this appearance, but categories are increasing, summary need to be at right end side)
He has 2 other choices (if he don't want to maintain VBA code every few months):
A: Store "FF","FG" in a Cell (fixed location!), then read the location parameter using VBA
B: Read a second dedicated CSV file (copy/paste from the monster file, consumed by another user so available readily), it only has the 2 columns required..
To me, none is obviously better than the other, just a matter of preference.
Similar but simpler Scene of mine
I produce the monstrous file by lots of Vlookup from manual data sources (inherited the design... and I refactored the design using another automation tool but there is license consideration atm).
In a column there is a formula doing something like
=if(A1="SALES PERSON SICK","void result",(if(A1="MACHINE BROKEN",C2*0.8),"")..
say 5000 rows with this formula
To reduce hard-coding I moved
"SALES PERSON SICK","MACHINE BROKEN" to a reference sheet cell A1,A2, and changed formula to:
=if(A1=Ref!$A$1,"void result",(if(A1=Ref!$A$2,C2*0.8),"")
I feel it's a good practice.
Question: Is method A or B better? Considering column position will move every ~3-6 months, still worth choosing 1 from A/B?
data in FF:FG will be pushed to GF:GG every couple of months because newer columns will be inserted in between
Then you should use named ranges in what you call "monster file" (see Define and use names in formulas) and use them in your VBA.
Eg define a name for Columns FF:FG like CopySource (use a name that describes the data in that columns) and finally you can use that in your VBA code.
Range("CopySource").Copy
Whenever the range moves because new columns are inseted before, the named range moves too, so it still points to the same data.

Excel Match Function with two Criteria but differing match types

Basically I am trying to improve a spreadsheet that current uses fixed IF functions within IF functions to determine where to find data, then originally used the VLOOKUP function to return the appropriate cable cleat size. Where "Cleat Diameter">"Cable Diameter".
I've been using this for a while, however excel quickly runs out of resources with all the remaining calculations being performed. As a result, I've opted to put all data a single table, and try to use the match function to retrieve the necessary row. Then Simply use the =INDIRECT function to retrieve data from the appropriate column of the associated row.
Unfortunately I believe the issue relates to the fact that I first need to perform at MATCH Type 0 (exact match), followed by a type -1 for the size to identify the next size up that can accommodate a specific cable size.
I've managed a simple lookup on another dataset using (for exact matches):
=MATCH($B3,'Current Raw Data'!A:A,0)+ROW('Current Raw Data'!A:A)-1
However when I attempt the same thing with two types of matches I get errors. The closest I get it using the following array formula, but it does not work unless the data set is arranged so that the contents of Cell C3 is the first occurring item in the dataset in column A:A:
{=MATCH(C3,($B3='Lookup - Cleats'!A:A)*('Lookup - Cleats'!B:B),-1)}
Main sheet:
Dataset Example:
With this array formula (click Ctrl + Shift + Enter together inside formula bar), you should be able to get your results:
=IFERROR(INDEX('Lookup - Cleats'!C$3:C$26,MATCH($B3&$C3,'Lookup - Cleats'!$A$3:$A$26&'Lookup - Cleats'!$B$3:$B$26,0)),"")
I tried my best to use your data setup but maybe miss one or two things that you will need to adjust accordingly. Let me know if this is not working.

Which is the most flexible way for end-user to create a report querying an SSAS Cube?

I know that you can query a cube by connecting from Excel to Analysis database and using formulas like cubevalue() or cubemember().
I also know that after converting the power pivot to formulas, you can access the attribute and the value related only by writing a text.
Example: for Branch Dimension, instead of writing
cubemember("connections";"[DimBranch].[Name].[All].[London])" )
you can write in the cell only "London". However, this won't work if you have a parent-child dimension and want to retrieve the amount for one of the intermediate levels.
Did anyone know about how can you avoid writing these formulas directly by the end-user ?
I´ve done this several times. I can describe the approach I´ve used in the past.
You need to define the input from the user, for example Fiscal Year, Fiscal Period,... and create a tab with a friendly layout for this solely purpose.
After that you create your pivot/pivots against the cube/s you want to query with the data extraction that you need (obiously these pivots will refer to the Fiscal Year and Period). Define the desired layout for these pivots (as you want them you be shown). Once You´ve done this, you need to convert into formulas these pivots, including parameters. You´ll see that values placed in the measure box contain a formula cubevalue which makes reference to the connection itself and several cubemembers. It makes reference to the measure as well.
The trickiest part comes now, when you need to modify your cubemember function to take value from the input, something like cubemember("connections";"[DimBranch].[Name].[All].["+MyReferenceCell+")" ), and you have your dimension picked up from a value entered by a user.
If you have as a matter of fact, serveral pivots referencing common dimensions, you need to make the reference this modified cubemember cell affected by user input, otherwise you have to modify all affected cubemembers in your pivots converted to formulas.
This has a really good advantage: depending on the excel´s version, all pivots are automatically recalulated if you change the value in the user´s input cell without hitting the refresh all button.
On the other hand, the disadvantage comes when your report in excel contains a list of values from a dimension which can grow. After converting pivots into formulas those new values for the dimension are lost. That´s why you can always place a control if you have the total value int he pivots regardless the dimension value.
I hope this helps.
Regards

Chart always complains about invalid references - Excel 2007

I made a XY plot that shows points from one data set in two different colors, depending on a set of conditions. I achieved this by making the source table three columns instead of two. First column is the X. Second column is Y is one set of conditions apply, third column is Y is the other set of conditions apply. So the second and third columns have formulas like this in them, respectively:
=IF(ConditionApplies,YValue,"")
=IF(ConditionApplies,"",YValue)
(So the graph actually has two series, each of which is not a contiguous block of numbers - each is interspersed with "nothing")
When I make a change that affects the ConditionApplies, the table reacts properly. Then I switch to the chart (on a different sheet) and it always says: "A formula in this worksheet contains one or more invalid references...". Click OK.
The chart itself always looks the way I would expect, with two different sets of points according to the Conditions I devised. If I inspect the data source fields, all the references are intact and proper.
Basically everything works, I would just like to avoid this annoying pop-up.
Had the same problem. Deleted a data column and the chart that referenced it kept complaining.
Solution was to move the chart to its own page. then copy the chart and put it back into worksheet.
Hope it helps.
I 100% understand everything you've said here and, on the surface, it sounds like it's not any kind of bug. It seems like you are actually referencing something you shouldn't. If that's, in fact, the case that's obviously something you want to fix.
My first guess would be to look at your "ConditionApplies" formulas. Under certain cases, would they create invalid references (referencing data of the wrong type, dividing by zero, circular references, etc.). The most common cause of problems like that would be dragging formulas but not having the "$" signs in the appropriate places. So your cell references change when you expected they'd stay the same.
For example:
=SUM(A1:G25)
should be something like the following to prevent the column and row from incrementing when dragged:
=SUM($A$1:$G$25)
Recommendation
Look at the "ConditionApplies" formulas (or better yet, post them here) and aggressively place $ where ever they don't break things. Then "re-drag" your new formulas, updating the previous ones.
There is a microsoft KB 931389! about this problem with status "Confirmed, not fixed".
In my situation with chart and two series collection problem solved by adding a code to delete all seriesCollection before adding new data:
While Sheets(3).ChartObjects(1).Chart.SeriesCollection.Count > 0
Sheets(3).ChartObjects(1).Chart.SeriesCollection(Sheets(3).ChartObjects(1).Chart.SeriesCollection.Coun t).Delete
Wend

Resources