Chart always complains about invalid references - Excel 2007 - excel

I made an XY plot that shows points from one data set in two different colors, depending on a set of conditions. I achieved this by making the source table three columns instead of two. The first column is X. The second column is Y if one set of conditions applies, and the third column is Y if the other set of conditions applies. So the second and third columns have formulas like this in them, respectively:
=IF(ConditionApplies,YValue,"")
=IF(ConditionApplies,"",YValue)
(So the graph actually has two series, each of which is not a contiguous block of numbers - each is interspersed with "nothing")
When I make a change that affects the ConditionApplies, the table reacts properly. Then I switch to the chart (on a different sheet) and it always says: "A formula in this worksheet contains one or more invalid references...". Click OK.
The chart itself always looks the way I would expect, with two different sets of points according to the Conditions I devised. If I inspect the data source fields, all the references are intact and proper.
Basically everything works, I would just like to avoid this annoying pop-up.
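For readers who want to see the three-column layout outside Excel, here is a minimal Python sketch of the same split. The function name split_series and the sample values are made up for illustration; None stands in for the "" that the IF formulas return.

```python
def split_series(ys, condition_applies):
    """Split one list of Y values into two gapped series of the same length.

    Mirrors =IF(ConditionApplies,YValue,"") and =IF(ConditionApplies,"",YValue):
    each point lands in exactly one series, and the other series gets a gap.
    """
    first = [y if c else None for y, c in zip(ys, condition_applies)]
    second = [None if c else y for y, c in zip(ys, condition_applies)]
    return first, second


# Hypothetical sample data, not taken from the question.
ys = [1.0, 2.5, 3.0, 4.2]
condition_applies = [True, False, False, True]
series_a, series_b = split_series(ys, condition_applies)
print(series_a)  # [1.0, None, None, 4.2]
print(series_b)  # [None, 2.5, 3.0, None]
```

Each series keeps the full length of the X column, which is why neither one is a contiguous block of numbers.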

Had the same problem. Deleted a data column and the chart that referenced it kept complaining.
The solution was to move the chart to its own sheet, then copy the chart and paste it back into the worksheet.
Hope it helps.

I 100% understand everything you've said here and, on the surface, it sounds like it's not any kind of bug. It seems like you are actually referencing something you shouldn't. If that's, in fact, the case that's obviously something you want to fix.
My first guess would be to look at your "ConditionApplies" formulas. Under certain conditions, could they create invalid references (referencing data of the wrong type, dividing by zero, circular references, etc.)? The most common cause of problems like that is dragging formulas without having the "$" signs in the appropriate places, so your cell references change when you expected them to stay the same.
For example:
=SUM(A1:G25)
should be something like the following to prevent the column and row from incrementing when dragged:
=SUM($A$1:$G$25)
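To make the effect of the "$" signs concrete, here is a rough Python sketch of what dragging a formula down does to its row numbers. It is a simplification under stated assumptions: fill_down is a hypothetical helper, it only handles vertical fills, and its regular expression would also match function names that look like cell references (LOG10, for instance).

```python
import re

# Matches an A1-style reference such as A1, $A1, A$1 or $A$1.
CELL_REF = re.compile(r"(\$?)([A-Z]{1,3})(\$?)(\d+)")


def fill_down(formula, rows):
    """Shift every relative row number in `formula` by `rows`."""
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        if row_abs:
            # A "$" in front of the row number pins the row, so leave it alone.
            return match.group(0)
        return f"{col_abs}{col}{int(row) + rows}"

    return CELL_REF.sub(shift, formula)


print(fill_down("=SUM(A1:G25)", 1))      # =SUM(A2:G26)
print(fill_down("=SUM($A$1:$G$25)", 1))  # =SUM($A$1:$G$25)
```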
Recommendation
Look at the "ConditionApplies" formulas (or better yet, post them here) and aggressively place $ wherever it doesn't break things. Then "re-drag" your new formulas, updating the previous ones.

There is a Microsoft KB article, 931389, about this problem, with status "Confirmed, not fixed".
In my situation, with a chart and two series, the problem was solved by adding code that deletes every series in the SeriesCollection before adding new data:
While Sheets(3).ChartObjects(1).Chart.SeriesCollection.Count > 0
    Sheets(3).ChartObjects(1).Chart.SeriesCollection(Sheets(3).ChartObjects(1).Chart.SeriesCollection.Count).Delete
Wend

Related

How to INDEX(MATCH from two tables

I have two tables stacked vertically, one above the other. I build the following third table from the formula below (also see picture):
=INDEX($C$3:$C$30,MATCH(1,($I3=$A$3:$A$30)*($K3=$E$3:$E$30)*(L$2=$D$3:$D$30),0))
What I need help with is how to make this formula get data from a horizontal set of tables or tables from different worksheets:
Can I chain together ranges like so?
=INDEX($D$3:$D$14:$M$17:$M$28,MATCH(1,($J3=$A$3:$A$14:$J$17:$J$28)*($L3=$F$3:$F$14:$O$17:$O$28)*(M$2=$E$3:$E$14:$N$17:$N$28),0))
It's not working and I know there MUST be a way to do it.
The information I provided was incorrect for the second table in the Day column, which might have been the reason it wasn't initially working. I have fixed it and used the suggestion given by Scott Craner!
The function that works is:
=IFERROR(INDEX($D$3:$D$14,MATCH(1,($J3=$A$3:$A$14)*($L3=$F$3:$F$14)*(M$2=$E$3:$E$14),0)), INDEX($M$17:$M$28,MATCH(1,($J3=$J$17:$J$28)*($L3=$O$17:$O$28)*(M$2=$N$17:$N$28),0)))
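The logic of that IFERROR(INDEX(MATCH(...)), INDEX(MATCH(...))) chain, sketched in Python for clarity: try the first table, and fall through to the next one only when nothing matches. The field names (name, kind, day, value) and the sample rows are assumptions for illustration, since the real column headings are only visible in the missing picture.

```python
def index_match(rows, name, kind, day):
    """Return the value of the first row matching all three criteria, else None."""
    for row in rows:
        if (row["name"], row["kind"], row["day"]) == (name, kind, day):
            return row["value"]
    return None


def lookup_across(tables, name, kind, day):
    """Search each table in order, like nesting one INDEX/MATCH per table in IFERROR."""
    for rows in tables:
        found = index_match(rows, name, kind, day)
        if found is not None:
            return found
    return None


# Hypothetical data standing in for the two source tables.
table_1 = [{"name": "Ann", "kind": "A", "day": "Mon", "value": 10}]
table_2 = [{"name": "Bob", "kind": "B", "day": "Tue", "value": 20}]

print(lookup_across([table_1, table_2], "Ann", "A", "Mon"))  # 10
print(lookup_across([table_1, table_2], "Bob", "B", "Tue"))  # 20
```

Adding a third table would mean one more IFERROR level in the formula, or one more list entry in the sketch.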

Column to rows and highlight difference between values in the same group

I have a huge table with data structured like this:
And I would like to display them in Spotfire Analyst 7.11 as follows:
Basically I need to display the columns that contain "ANTE" below the others in order to make a comparison. Values that have variations for the same ID must be highlighted.
I also have the fields "START_DATE_ANTE" and "END_DATE_ANTE" which have been omitted in the example image.
Amusingly, if you were limited to just what the title asks, this would be a very simple answer.
If you wanted this in a table where the rows are displayed as usual, and the cells are highlighted, you can do this by going to properties, adding a new Grouping where you select VAL_1 and VAL_1_ANTE, and adding a Rule with Rule type "Boolean expression", where the value is:
[VAL_1] - [VAL_1_ANTE] <> 0
This will highlight the affected cells, which you can place next to each other. You can even throw in a calculated column showing the difference between the two columns, and slap it on right next to it. This gives you the further option to filter down to only showing rows with discrepancies, or sorting by these values.
However, if you actually need it to display the POSTs on different lines from the ANTEs, as formatted above, things get a little tricky.
My personal preference would be to pivot (split/union/etc) the data before pulling it in to Spotfire, with an indicator flag on "is this different", yes/no. However, I know a lot of Spotfire users either aren't using a database or don't have leeway to perform the SQL themselves.
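As a rough idea of what that pre-processing could look like, here is a Python sketch that splits each wide row into a POST row and an ANTE row and carries a "changed" flag per value column. The ID, VAL_1 and VAL_1_ANTE names come from the question; the VERSION column, the _CHANGED suffix and the sample rows are made up for the example.

```python
def unpivot_ante(rows, value_columns):
    """Turn each wide row into two rows (POST, then ANTE) with change flags."""
    out = []
    for row in rows:
        post = {"ID": row["ID"], "VERSION": "POST"}
        ante = {"ID": row["ID"], "VERSION": "ANTE"}
        for col in value_columns:
            changed = row[col] != row[col + "_ANTE"]
            post[col] = row[col]
            ante[col] = row[col + "_ANTE"]
            # The same flag goes on both rows so either one can be highlighted.
            post[col + "_CHANGED"] = changed
            ante[col + "_CHANGED"] = changed
        out.extend([post, ante])
    return out


# Hypothetical sample rows.
wide = [
    {"ID": 1, "VAL_1": 5, "VAL_1_ANTE": 5},
    {"ID": 2, "VAL_1": 7, "VAL_1_ANTE": 9},
]
long_rows = unpivot_ante(wide, ["VAL_1"])
for r in long_rows:
    print(r)
```

With the data in this shape, the highlighting rule only has to look at the flag column.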
In fact, if you try to do it in Spotfire using custom expressions alone, it becomes so tricky, I'm not sure how to answer it right off. I'm inclined to think you should be able to do it in a cross table, using Subsets, but I haven't figured out a way to identify which subset you're in while inside the custom expressions.
Other options include generating a table using IronPython, if you're up to that.

Update Excel ListObject header names without breaking pivots

I have an Excel sheet with a very wide table on it. Due to developer friendlyness I'd like to use a certain style of column header naming (much like proper Hungarian notation), where I suffix each header name with "column type" tags. This allows me to easily spot where e.g. apples and oranges are compared. There are also pivot table reports based on this table.
An example to illustrate this: say you have 2 monetary columns, column A being expressed in another currency than column B. The model should thus never combine them without first applying appropriate exchange rates. To spot this I name these columns e.g. Earned - Cur1 and Saved - Cur2. Any calculation like =[#[Earned - Cur1]] + [#[Saved - Cur2]] is illegal, but due to the tags this can be picked up easily in an audit. I have several such tag groups in use already, and they already prevented some errors creeping in.
However...
The file also needs to be distributed to lots of not-so-savvy end users, and they need to fill in this table and refer to some of the outcome columns. Most intermediate columns we already hide, but the column names are now far from being user-friendly (like: fill out Actual - NK/Q1/EC/%, please?).
And this needs to run in Excel 2010.
What are my options?
Option 1
Add an extra row above the table, putting human readable names in there, and just hide the table header row. This works, but now the users can't sort and filter the table anymore, so that's a no-go.
Option 2
Augment option 1 by prepending a newline to each column name, and make the table header row 1 character high. The header cells would still be there to drive sorting and filtering and the users have human readable names in the row above. The actual header cells would appear like 'empty' buttons. Could work, but then the complex formulas become unreadable due to all the newlines from the column names all over the place.
Option 3
Add a macro that swaps the headers in the table with alternative headers in another row above the table. The macro should be run just before sending out the file to the users, and run again when they return it filled in. I happily coded this option into the file, and it works wonderfully! But then I realized this (and thus option 2 as well) breaks all the derived pivot tables, since Excel links the data by the names used in the table: update the name, and that section of the pivot will be dropped...
I'd really like the option of having our development-oriented column names in there when we ourselves work with the file, but being able to switch out the headers when needed. And of course without rebuilding all the pivots after each such switch.
An opening here would be that pivots seem to only drop the columns once they're refreshed. I could use this to update the header names, then do some magic on the pivots to remap their fields, and only then refresh them, but it seems there's no way from within VBA to accomplish that (PivotField.SourceName is read only).
Hopefully someone can think of an alternative, or am I SOL? I'm totally open to other workarounds.
Workaround 1
Insert null-terminating characters in the header names such that they show normally in the formulas, but do not show in the table header row. If only it were that simple though... Turns out Excel throws up from a =Char(0)&"abc", and things like =Char(8)&"abc" (tab anyone?) give Unicode replacement characters when pasted into a header cell... (?)
Workaround 2
A last resort seems to be to unzip the excel file, and plough through the xml data to update everything in one go there, then rezip the file. But this code also needs to be executed by less skilled users, and I see too many ifs and buts to make me feel safe using this setup.
Workaround 3
For now I just use a variation on option 2; I have some VBA that 'empties' the header cells instead of prepending a newline to them. By 'emptying' I mean setting the font size to 1, subscript, non-bold, and then making the font color identical to the background color, followed by setting its row height to the default 14.5. The cryptic names do leak out however; column header cell drop down arrows for sorting & filtering show the cryptic name, as well as the pivot field settings and of course the formula bar when you just click such a cell. But I guess it's the best I can do?
And then again I'm probably just perfectionizing this thing faaar too much :) But from this point on it's about the challenge!
Make sure you tick the box "Add this data to the Data Model" when creating your pivot(s).
AFAIK when your pivots are connected to the Data Model instead of directly to the range/table, you can change the column names in the table and your pivots will stay fine. You could even use other names in your pivot.

Excel interpolation with results in situ

Extrapolation in Excel is easy: have a list of numbers (and optionally their paired "X-values"), and it can easily generate further entries in the list with the GROWTH() function.
GROWTH() works for interpolation too: you just need to tell it the intermediate X-values that you want it to calculate for. My problem with it is the appearance of the data in the spreadsheet. Here's an example:
Say I have some inputs, and through some process get some outputs. Only, there were gaps in the experiment so no outputs were generated for some values:
Out of curiosity, I copied the data to the right, and used Excel's "Extend with Growth Trend": I highlighted the first two entries (only), then right-click-dragged-down the little square over the next four cells (overriding the final value there) and chose "Growth Trend" in the context menu. To remind myself that the values were Excel-generated, I gave them a grey background:
Hmm. The generated values (unsurprisingly) aren't a good extrapolation, since they don't factor in the later value. It's out by over 40%! Also note that this Extend feature of Excel is an ease-of-input mechanism, not a calculation tool in its own right - Excel enters the data as raw numbers (to multiple decimal places).
So I formalised the Extend column by using the GROWTH() function - again only factoring in the first two values, but also using their paired X-values and the desired interpolation entry as parameters:
D4: =GROWTH(D$2:D$3,$A$2:$A$3,$A4)
D5: =GROWTH(D$2:D$3,$A$2:$A$3,$A5)
D6: =GROWTH(D$2:D$3,$A$2:$A$3,$A6)
Thankfully, the results mimic those of the previous column (Microsoft use the same mechanism for both features!) I didn't overwrite the last entry, since after all it has the value that I actually want! The fact that the calculated values are the same as before is the problem I'm trying to fix, and that this question is about.
To improve the calculated values, I need to incorporate the last value - but at the same time I want the "natural" sequence of input values to be maintained. In other words, I want the interpolated values to be placed in situ. That implies that the arguments to the GROWTH() function need to be discontiguous ranges, which Excel does by using the (Range,Range,...) syntax. I tried it, and got #REF! errors. I then tried using a named discontiguous range: same result.
After a bit of Googling (and StackOverflowing!) I found references to using INDIRECT() - a particularly problematic 'solution', since it evaluates strings that would need to be manually maintained. Nevertheless:
E4: =GROWTH(INDIRECT({"E2:E3","E7"}),INDIRECT({"A2:A3","A7"}),A4)
E5: =GROWTH(INDIRECT({"E2:E3","E7"}),INDIRECT({"A2:A3","A7"}),A5)
E6: =GROWTH(INDIRECT({"E2:E3","E7"}),INDIRECT({"A2:A3","A7"}),A6)
…and after all that it didn't work anyway! The values remained the same as the previous version, that didn't incorporate the last value. Maybe the last value doesn't make for better interpolation results? So, as an experiment, I ignored the "in situ" requirement and generated an "ex situ" version, with the known values followed by the desired values, allowing me to use simple ranges. Success! But to highlight that the data is in the wrong order, I asked Excel to create an X-Y plot of the data too:
B13: =GROWTH(B$10:B$12,$A$10:$A$12,$A13)
B14: =GROWTH(B$10:B$12,$A$10:$A$12,$A14)
B15: =GROWTH(B$10:B$12,$A$10:$A$12,$A15)
Of course, the results are exponential not linear, so setting the Y-axis to logarithmic generates a very readable result - and it effectively masks the back-and-forth of the data. But deep down, we both know that the data is wrong - just look at the table!
Maybe, just maybe, if I used Excel's "Sort Data" feature it would break up the range for me, and show me how I should have written the formulae? Sadly, although it looks like it worked, I get a "Circular reference" error for B12 - the range wasn't modified to make it discontiguous, and now B12's result is dependent on the original range which includes itself! I coloured it below to indicate that this isn't a viable solution:
So, my "final" solution is to maintain the previous "ex situ" version, and simply have an "in situ" column as well that does a VLOOKUP() on the ExSitu (named) table - and I needed to tell it to do an exact match with the FALSE parameter, since the list isn't sorted:
F4: =VLOOKUP($A4,ExSitu,2,FALSE)
F5: =VLOOKUP($A5,ExSitu,2,FALSE)
F6: =VLOOKUP($A6,ExSitu,2,FALSE)
Note that I labelled the column with an asterisk since it's a cheat: the values are only in situ by copying from another table.
Phew! After all that, my question:
Is there a way to directly interpolate the "in situ" values, without having to have an "ex situ" lookup table to generate the results? The above example was deliberately straightforward: you can easily imagine a longer list with more gaps to be filled in.
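For reference, here is a Python sketch of what I understand GROWTH() to compute (a least-squares fit of y = b*m^x, done as a straight-line fit on ln(y)), plus a helper that fills the gaps in place. The names growth and fill_gaps are hypothetical, and the sketch is only meant to show that the known points do not have to be contiguous for the fit itself; it says nothing about how to persuade Excel to accept such ranges.

```python
import math


def growth(known_ys, known_xs, new_xs):
    """Fit y = b * m**x by least squares on ln(y), then evaluate at new_xs."""
    n = len(known_xs)
    logs = [math.log(y) for y in known_ys]
    mean_x = sum(known_xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(known_xs, logs))
    slope = sxy / sxx                      # ln(m)
    intercept = mean_log - slope * mean_x  # ln(b)
    return [math.exp(intercept + slope * x) for x in new_xs]


def fill_gaps(xs, ys):
    """Replace None entries in ys with fitted values, keeping the row order."""
    known_xs = [x for x, y in zip(xs, ys) if y is not None]
    known_ys = [y for y in ys if y is not None]
    missing = [x for x, y in zip(xs, ys) if y is None]
    fitted = dict(zip(missing, growth(known_ys, known_xs, missing)))
    return [fitted[x] if y is None else y for x, y in zip(xs, ys)]


# The inputs and known outputs from the example above; None marks the gaps.
inputs = [360, 370, 380, 390, 400, 410]
outputs = [7.16, 28.9, None, None, None, 5380.0]
print(fill_gaps(inputs, outputs))
```

All three known points (the first two and the last) take part in the fit, and the results come back in the original "in situ" order.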
Since you have good data sense, I'll share my discovery path on this case. I'm more of a visual person; I don't see things 'that' clearly in tables. Here is what I did with your data points:
Input Raw
360 7.16
370 28.9
380
390
400
410 5,380.00
Highlight all and press my favorite button: F11. I choose the line chart type. Then, with the plus button on the top left of the chart, I add trendline > more options... From there I choose 'polynomial' and 'exponential', plus a tick on 'display equation on chart'. As you can see in the links, both fits seem OK. Just take the equation and plug in the other values as needed.
Three things I've noticed :
The polynomial and exponential fit is close enough to what I need. But it doesn't exactly 'map' on the ( 410, 5380.00 ) point.
By having the formula I find it easier to make sense of whether or not the trendline 'proposed' by excel is a close fit to my need. As you play around you can see how far-off the linear & logarithmic trendline can be.
The trendline equation doesn't really map to 360,370,410... point as the x value, it assumes x is 0,1,2,3... (try to test it with the 'equation' of the excel proposed trendline)
IMHO, use excel trend with care. My next best fitting tool -> wolframalpha logarithmic fit.
For the original question :
Is there a way to directly interpolate the "in situ" values, without having to have an "ex situ" lookup table to generate the results?
I think my simple answer will be : Indirectly, Yes. Directly? not sure.
Hope this heals/helps in some ways.. ( :

How to Differentiate a Data from a Column/Header in an Excel File

I hope someone can help me come up with an algorithm.
I'm still very new to Apache POI, and I was assigned to come up with an algorithm to read a template (Excel) and separate the headers/column names from the data itself.
The following must be taken into account:
There can be multiple headers/column names in just one sheet of an Excel file.
Headers can be horizontal AND/OR vertical in nature. This means that there could be a mixture of vertical and horizontal headers in one sheet.
Headers don't necessarily have to be in the very first row of the file. There could be introductions or banner images there.
The system must allow ANY kind of Excel format, so there is no control over the formatting of the cells, the naming convention, etc.
Some headers are alphanumeric in nature, which means it also contains numbers.
Some cells are merged to make room for a specific header.
Any ideas and suggestions are very much welcome. Just let me know if you have further clarifications.
(I know nothing about Apache, but some about Excel Interop working)
If the sheets to be detected are yours, I'd recommend NAMING those header cells. To name a cell in Excel, use the field at the top left of the screen where the cell coordinates (like "A1" or "B2") normally appear. Type a name there, and you will be able to identify that cell via code by its name ('Worksheet.Range("Name")' is how you get those cells via code).
To manage names, go to "Insert - Names" or "Formulas - Name manager", depending on what version of excel.
(Personally, I never work with sheets via code without naming headers, then I use "Offset" to get the data cells corresponding to those headers - This allows me to freely edit the sheet later without breaking the code)
If the sheets aren't yours, then, you'll need to find out the extents of the data. (Last row and last column)
Then check for the first line that contains all columns filled, none of them blank. That's a probable horizontal header.
As well as check for the first columns that contains all lines filled. That's a probable vertical header.
You could, as well, search for completely blank lines and/or columns to find headers that are AFTER some data, in case of sheets containing multiple horizontal headers, or vertical.
You could use some formatting properties (Range.Interior or Range.Font for examples) of those cells to identify if they are headers (usually headers have different format, color, borders and so on).
If you're sure there's no numeric header, I mean, all headers contains text, check for the type of data in the cells. If all are strings, header probability increases.
Even so, that's a tricky thing to do. If the sheets don't follow some pattern, once in a while one of them can deceive your code and bring false results. I'd recommend, if allowed, adding a human verification step to confirm the results after the process is done.
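The "first line with all columns filled" check could be sketched like this in Python. It assumes the sheet has already been read into a list of rows (a hypothetical grid, with None or "" for blank cells); how you get that grid out of POI or Interop is not shown.

```python
def first_full_row(grid):
    """Return the index of the first row with no blank cell, or None."""
    width = max((len(row) for row in grid), default=0)
    for index, row in enumerate(grid):
        # Pad short rows so that missing trailing cells count as blank.
        padded = list(row) + [None] * (width - len(row))
        if width and all(cell not in (None, "") for cell in padded):
            return index
    return None


# Hypothetical sheet: a banner line, a blank line, then the table.
grid = [
    ["Report", None, None],
    [None, None, None],
    ["Name", "Qty", "Price"],
    ["apple", 1, 2.0],
]
print(first_full_row(grid))  # 2
```

The same function applied to the transposed grid gives the first fully filled column, i.e. the probable vertical header.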
The solution to this problem involves taking away two of these freedoms. Such constraints applied will make this a tractable problem. Most of such freedoms come from overcautious thinking.
The freedoms are given as quotes below:-
Headers can be horizontal AND/OR vertical in nature. This means that there could be a mixture of vertical and horizontal headers in one sheet.
Typically, vertical headers are not used in Excel Files where there is a need to programmatically detect headers. As the primary, most common and sometimes the only reason for such detection is to upload/transform the tabular data.
Funny things happen when vertical headers are introduced:
They become Labels of Forms. This implies that such forms are used for data entry rather than storage. The data from such forms is stored in horizontal/columnar headers and rowwise/vertical records of data . Thus obviating the need for Upload/Transformation of the data entry sheet.
Excel is designed to have only horizontal headers. Vertical Headers cease to have autofilter support.
Even when Vertical Headers are present, a top horizontal header row can still be introduced to mark the headers themselves as descriptions / categories.
Staying true to the core need for autodetection of headers, once our requirement states that headers can be placed only in a horizontal alignment, the solution becomes slightly more tractable, but not fully so.
Some cells are merged to make room for a specific header.
Merging cells is poison and anathema to the entire reason for transformation/upload of data. This is a pill I steadfastly have refused to take in my entire career with Excel & SQL jugglery. You may kindly merge all that you want to for all I care, however thee shall not pass into my beloved SQL Server.
For the aforementioned reasons of prejudice and ill-will towards all mergers and mergees alike, I'd respectfully suggest that you too take this course.
Solution
Staying true to the above requirements after taking away the 2 freedoms, the pseudo algorithm (solution) is to:
Take a sample of, say, c x r cells of the sheet, for example 200 x 201 rows and columns.
Find the counts of non-empty cells using an inbuilt formula like COUNTA whose contents have a non-zero length. The Count of such non-empty cells in each row is maintained as a data structure.
The type of data ie:- Number, Date, String should also be maintained in the above data structure capable of expressing the following:
Row# 22 contains
30 non-empty cells of which
28 are alphanumeric,
1 is a Date and
1 is a Number.
The First specific row that contains the maximum number of such non empty cells with the maximum number of strings should very likely be the header row.
Converting all of the above to a specific algorithm in any given language should be a deliciously occupying task for any young developer in their prime.
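One possible conversion, as a Python sketch rather than a definitive implementation: build the per-row profile described above, then pick the first row with the most non-empty cells, using the string count to break ties. It assumes the sampled cells are already in a list of rows (a hypothetical grid with None for empty cells), and the names profile_row and likely_header_row are made up.

```python
from datetime import date, datetime


def profile_row(row):
    """Count the non-empty cells in a row, broken down by data type."""
    counts = {"non_empty": 0, "string": 0, "number": 0, "date": 0}
    for cell in row:
        if cell is None or (isinstance(cell, str) and cell.strip() == ""):
            continue
        counts["non_empty"] += 1
        if isinstance(cell, (date, datetime)):
            counts["date"] += 1
        elif isinstance(cell, (int, float)) and not isinstance(cell, bool):
            counts["number"] += 1
        elif isinstance(cell, str):
            counts["string"] += 1
    return counts


def likely_header_row(grid):
    """Index of the first row with the most non-empty cells and most strings."""
    profiles = [profile_row(row) for row in grid]
    return max(
        range(len(grid)),
        # -i makes the earliest row win when the counts are tied.
        key=lambda i: (profiles[i]["non_empty"], profiles[i]["string"], -i),
        default=None,
    )


# Hypothetical sample of a sheet.
grid = [
    ["Quarterly report", None, None],
    ["Name", "Qty", "Date"],
    ["apple", 3, date(2020, 1, 1)],
    ["pear", 5, date(2020, 1, 2)],
]
print(likely_header_row(grid))  # 1
print(profile_row(grid[2]))     # {'non_empty': 3, 'string': 1, 'number': 1, 'date': 1}
```

Ranking by non-empty count first and string count second is one reading of "maximum number of non-empty cells with the maximum number of strings"; a weighted score would be another.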
