Has someone created a language which can be used to track/analyze dependencies between grid cells in a generic way?
I'm trying to write a spreadsheet which uses a functional language. What I'm after is something similar to what Excel might use to manage references between cells. The language will be used to create a model which can be analysed for optimisation.
There is a Lisp library that has been ported to many Lisps (and some other languages):
http://common-lisp.net/project/cells/
At least some ideas there are worth borrowing.
I've done this with the ancient sc calculator, a very long time ago. You build a dependency graph based on the contents of the cells. I know the cells have two-dimensional names, but for simplicity I'm going to name them with single characters.
Suppose cell X contains the formula Y+1. Then you add an edge from X to Y in the dependency graph. If X contains the formula Y+0.15*Z then you add two edges: from X to Y and from X to Z.
When you've visited every cell, you've built the entire graph. Do a topological sort. If there are no cycles, you're in luck—you can recalculate in topological order.
You can use the same dependency graph for analysis and optimization. You can also update the dependency graph incrementally as the contents of cells change.
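To make the steps above concrete, here is a minimal sketch in Python (3.9+ for `graphlib`). It is not how sc or Excel does it; the `sheet` contents, the single-letter cell names and the helper names are all made up for illustration, and `eval` is used only because this is a toy with trusted input.

```python
import re
from graphlib import TopologicalSorter, CycleError

# Hypothetical sheet: cell name -> formula text (or a plain number).
sheet = {
    "X": "Y + 0.15 * Z",
    "Y": "Z + 1",
    "Z": "10",
}

def references(formula):
    """Return the set of single-letter cell names used in a formula."""
    return set(re.findall(r"[A-Z]", formula))

def recalculation_order(cells):
    """Topologically sort cells so each one comes after the cells it reads."""
    # TopologicalSorter expects node -> predecessors, i.e. cell -> cells it reads.
    graph = {name: references(formula) for name, formula in cells.items()}
    return list(TopologicalSorter(graph).static_order())

def recalculate(cells):
    """Evaluate every cell in dependency order; raises CycleError on a cycle."""
    values = {}
    for name in recalculation_order(cells):
        # Already-computed cells are visible to the formula as local names.
        values[name] = eval(cells[name], {"__builtins__": {}}, values)
    return values

print(recalculation_order(sheet))
print(recalculate(sheet))
```

A circular sheet such as `{"A": "B", "B": "A"}` makes `recalculation_order` raise `CycleError`, which is the "no luck" case mentioned above.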
I hope this is enough to get you started.
Alright this should be a simple one.
I apologize in case it has been already solved, but I can only find posts related to solving this issue with programming languages and not specifically to EXCEL.
Furthermore, I could find posts that address a sub-problem of my question (e.g. regarding limitation of certain EXCEL functions) and should solve/invalidate my request but maybe, just maybe, there is a workaround.
Problem statement:
I want to calculate the minimum value for each column in an EXCEL matrix. Simply enough, I want to input a 2D array (mxn matrix) in a function and output an array with dimension 1xn where each item is the minimum value MIN(nj) of column nj.
However, I want to solve this with specific constraints:
Avoid using VBA and other non-function scripting: that I could devise myself;
All in one function: what I want to achieve here is to have one function and one function only, not split the problem into multiple steps (for example, copy-pasting a MIN() function below each column would not do);
The result should be a transposable array (which is already ok, I assume);
Where I am stranded with my solution so far:
The main issue here is that any function I am trying to use takes the entire matrix as a single array input and would calculate the MIN() of the entire matrix, not each column. My current (not working) function for an exemplary 4x4 matrix in range A1:D4 would be as below (the part in bold is where it is clearly not working):
=MIN(INDEX(A1:D4,SEQUENCE(4,4,1,1)))
which of course does not work, because INDEX() probably does not "understand" SEQUENCE() as an array of items to take into account. Another approach, also not working, is to input a series of ranges (A1:A4;B1:B4;C1:C4;D1:D4) so that INDEX() "understands" the ranges as single columns, but it does not, and I honestly do not know how to formulate that. I could use INDIRECT() in some way to reference the array of ranges, but I do not know how and could not find a way by searching online.
Fundamental question is: can a function, which works with single arrays, also work with multiple arrays? Basically, I do not know how to communicate an EXCEL array formula, that each batch of data I am inputting is a single array and must be evaluated separately (this is very easily solved with for() cycles, I know).
Many thanks for any suggestion or workaround; any function or solution works as long as it fits the constraints defined above (maybe a LAMBDA() function? I don't know).
This is of course a simplification of a far more complex problem (I am trying to calculate the annual mean temperature evolution for a specific location by finding the value - for each year from 1950 to 2021 - that is associated with the lat/lon coordinates nearest to those of the location entered, given a netCDF-imported grid of time-arrayed data; the MIN() function is used to select the nearest location, which is then used, via INDEX(), to find the temp data). I need to do this in one hit (meaning just pasting the function, which evaluates a matrix of data that is referenced by a fixed range), so that I can use it modularly for other data sets. I already have a working solution, which is "elegant"* enough, but not as "elegant"* as the one I could develop by solving this issue.
*where "elegant"= it saves me one click every time for 1000+ datasets when applying the function.
If I understand your problem correctly, then this should solve it:
=BYCOL(A1:D4,LAMBDA(d,MIN(d)))
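For anyone who thinks in for() loops, as the question mentions, the formula does roughly what the Python sketch below does: it hands each column to the lambda separately and collects one result per column. The `grid` values and the function name are invented for illustration; this is only an analogy, not what Excel runs internally.

```python
def min_by_column(matrix):
    """Return one minimum per column, analogous to BYCOL(range, LAMBDA(d, MIN(d)))."""
    # zip(*matrix) turns the list of rows into a sequence of columns.
    return [min(column) for column in zip(*matrix)]

# Made-up 4x4 matrix standing in for A1:D4.
grid = [
    [4, 9, 2, 7],
    [3, 8, 6, 1],
    [5, 0, 9, 4],
    [6, 7, 3, 8],
]

print(min_by_column(grid))  # [3, 0, 2, 1]
```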
How can I recognize (and SUM) chemical formulas in which the same element occurs more than once, such as AsH2(C4H9), where H appears in two places, or CH2COOH, where only one of the repeated occurrences actually gets counted?
This is explained well in the threads below, but they do not cover compounds with repeated elements, where the occurrences have to be summed properly; otherwise one occurrence is always omitted.
So those threads only cover counting elements and simple calculations for compounds, not compounds with brackets and repeated elements.
How to count up elements in excel
Calculating Molecular Weight Using Excel
And is it possible to do it without VBA coding?
This is the formula that I have been using:
=E$2*MAX(IFERROR(IF(FIND(E$1&ROW($1:$99);$A3);ROW($1:$99);0);0);IFERROR(IF(FIND(E$1&CHAR(ROW($65:$90));$A3&"Z");1;0);0))
I have included an example data set; the problematic cases are shown in red. The formula "matches" all the elements as if there were no brackets, and this is not what I want.
Column B is only the sum of all the components on the right-hand side.
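To pin down what the expected counts are, here is a small Python sketch of the counting logic I am after. It is not an Excel solution; the function name and token pattern are my own, and it assumes a well-formed formula with round brackets only.

```python
import re
from collections import Counter

# Either an element with an optional count, "(", or ")" with an optional multiplier.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")

def count_atoms(formula):
    """Count atoms per element, summing repeats and expanding bracketed groups."""
    stack = [Counter()]  # one counter per open bracket level
    for element, count, opening, closing, multiplier in TOKEN.findall(formula):
        if element:
            stack[-1][element] += int(count or 1)
        elif opening:
            stack.append(Counter())
        elif closing:
            group = stack.pop()
            for name, n in group.items():
                stack[-1][name] += n * int(multiplier or 1)
    return dict(stack[0])

print(count_atoms("AsH2(C4H9)"))  # {'As': 1, 'H': 11, 'C': 4}
print(count_atoms("CH2COOH"))     # {'C': 2, 'H': 3, 'O': 2}
```

So H in AsH2(C4H9) should come out as 11, not 2 or 9, and each count would then be multiplied by the atomic weight in row 2.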
Extrapolation in Excel is easy: have a list of numbers (and optionally their paired "X-values"), and it can easily generate further entries in the list with the GROWTH() function.
GROWTH() works for interpolation too: you just need to tell it the intermediate X-values that you want it to calculate for. My problem with it is the appearance of the data in the spreadsheet. Here's an example:
Say I have some inputs, and through some process get some outputs. Only, there were gaps in the experiment so no outputs were generated for some values:
Out of curiosity, I copied the data to the right, and used Excel's "Extend with Growth Trend": I highlighted the first two entries (only), then right-click-dragged-down the little square over the next four cells (overriding the final value there) and chose "Growth Trend" in the context menu. To remind myself that the values were Excel-generated, I gave them a grey background:
Hmm. The generated values (unsurprisingly) aren't a good extrapolation, since they don't factor in the later value. It's out by over 40%! Also note that this Extend feature of Excel is an ease-of-input mechanism, not a calculation tool in its own right - Excel enters the data as raw numbers (to multiple decimal places).
So I formalised the Extend column by using the GROWTH() function - again only factoring in the first two values, but also using their paired X-values and the desired interpolation entry as parameters:
D4: =GROWTH(D$2:D$3,$A$2:$A$3,$A4)
D5: =GROWTH(D$2:D$3,$A$2:$A$3,$A5)
D6: =GROWTH(D$2:D$3,$A$2:$A$3,$A6)
Thankfully, the results mimic those of the previous column (Microsoft use the same mechanism for both features!) I didn't overwrite the last entry, since after all it has the value that I actually want! The fact that the calculated values are the same as before is the problem I'm trying to fix, and that this question is about.
To improve the calculated values, I need to incorporate the last value - but at the same time I want the "natural" sequence of input values to be maintained. In other words, I want the interpolated values to be placed in situ. That implies that the arguments to the GROWTH() function need to be discontiguous ranges, which Excel does by using the (Range,Range,...) syntax. I tried it, and got #REF! errors. I then tried using a named discontiguous range: same result.
After a bit of Googling (and StackOverflowing!) I found references to using INDIRECT() - a particularly problematic 'solution', since it evaluates strings that would need to be manually maintained. Nevertheless:
E4: =GROWTH(INDIRECT({"E2:E3","E7"}),INDIRECT({"A2:A3","A7"}),A4)
E5: =GROWTH(INDIRECT({"E2:E3","E7"}),INDIRECT({"A2:A3","A7"}),A5)
E6: =GROWTH(INDIRECT({"E2:E3","E7"}),INDIRECT({"A2:A3","A7"}),A6)
…and after all that it didn't work anyway! The values remained the same as the previous version, that didn't incorporate the last value. Maybe the last value doesn't make for better interpolation results? So, as an experiment, I ignored the "in situ" requirement and generated an "ex situ" version, with the known values followed by the desired values, allowing me to use simple ranges. Success! But to highlight that the data is in the wrong order, I asked Excel to create an X-Y plot of the data too:
B13: =GROWTH(B$10:B$12,$A$10:$A$12,$A13)
B14: =GROWTH(B$10:B$12,$A$10:$A$12,$A14)
B15: =GROWTH(B$10:B$12,$A$10:$A$12,$A15)
Of course, the results are exponential not linear, so setting the Y-axis to logarithmic generates a very readable result - and it effectively masks the back-and-forth of the data. But deep down, we both know that the data is wrong - just look at the table!
Maybe, just maybe, if I used Excel's "Sort Data" feature it would break up the range for me, and show me how I should have written the formulae? Sadly, although it looks like it worked, I get a "Circular reference" error for B12 - the range wasn't modified to make it discontiguous, and now B12's result is dependent on the original range which includes itself! I coloured it below to indicate that this isn't a viable solution:
So, my "final" solution is to maintain the previous "ex situ" version, and simply have an "in situ" column as well that does a VLOOKUP() on the ExSitu (named) table - and I needed to tell it to do an exact match with the FALSE parameter, since the list isn't sorted:
F4: =VLOOKUP($A4,ExSitu,2,FALSE)
F5: =VLOOKUP($A5,ExSitu,2,FALSE)
F6: =VLOOKUP($A6,ExSitu,2,FALSE)
Note that I labelled the column with an asterisk since it's a cheat: the values are only in situ by copying from another table.
Phew! After all that, my question:
Is there a way to directly interpolate the "in situ" values, without having to have an "ex situ" lookup table to generate the results? The above example was deliberately straightforward: you can easily imagine a longer list with more gaps to be filled in.
Since you have good data sense, I'll share how I explored this case. I'm more of a visual person; I don't see things that clearly in tables. Here is what I did with your data points:
Input Raw
360 7.16
370 28.9
380
390
400
410 5,380.00
Highlight everything and press my favorite button, F11. I choose the line chart type. Then, with the plus button on the top left of the chart, I add Trendline > More options. From there I choose 'Polynomial' and 'Exponential', and tick 'Display equation on chart'. As you can see in the links, both fits seem OK. Just take the equation and plug in the other values as needed.
Three things I've noticed :
The polynomial and exponential fits are close enough to what I need, but neither passes exactly through the (410, 5380.00) point.
Having the formula makes it easier to judge whether the trendline 'proposed' by Excel is a close fit for my needs. As you play around, you can see how far off the linear and logarithmic trendlines can be.
The trendline equation doesn't use 360, 370, 410... as the x values; it assumes x is 0, 1, 2, 3... (try testing it with the 'equation' of the trendline Excel proposes).
IMHO, use excel trend with care. My next best fitting tool -> wolframalpha logarithmic fit.
For the original question :
Is there a way to directly interpolate the "in situ" values, without having to have an "ex situ" lookup table to generate the results?
I think my simple answer is: indirectly, yes. Directly? Not sure.
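To show what an "in situ" fill would have to compute, here is a rough Python sketch. It assumes GROWTH() behaves like a least-squares fit of ln(y) against x, i.e. a curve of the form y = b * m^x; the function names are mine and the data is the table above. It is a sketch of the arithmetic, not an Excel formula.

```python
import math

def growth_fit(xs, ys):
    """Fit y = b * m**x by least squares on ln(y); return a predictor function."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (v - mean_log) for x, v in zip(xs, logs))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_x
    return lambda x: math.exp(intercept + slope * x)

def fill_gaps(xs, ys):
    """Keep known values where they are and replace None with the fitted value."""
    known = [(x, y) for x, y in zip(xs, ys) if y is not None]
    predict = growth_fit([x for x, _ in known], [y for _, y in known])
    return [y if y is not None else predict(x) for x, y in zip(xs, ys)]

inputs = [360, 370, 380, 390, 400, 410]
raw = [7.16, 28.9, None, None, None, 5380.0]

for x, y in zip(inputs, fill_gaps(inputs, raw)):
    print(x, round(y, 2))
```

The known values stay in their original rows and only the gaps are filled, using all three known points rather than just the first two.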
Hope this heals/helps in some ways.. ( :
I have an Excel spreadsheet where each column is a certain variable. At the end of my columns I have a special last column called "Type" which can be A, B, C, or D.
Each row is a data point with different variables that ends up in a certain "Type" bucket (A/B/C/D) recorded in the last column.
I need a way to examine all entries of a certain type (say, "C" or "C"|"D") and find out which of the variable(s) is a good predictor of this last column, and which are better predictors than others.
Some variables are numbers, others are fixed strings (from a set of strings), so it's not just a number/number correlation.
Is Excel 2003 a good tool for that, or are there better statistical programs that make this easier? Do I create a Pivot/Histogram for each category, or is there a better way to run these queries? Thanks
You can do some filtering in Microsoft Excel, especially to clean the data (I mean, to convert the values to a single type, string or numeric). Excel also offers some data mining. However, for the kind of problem you have, a good tool that I recommend is WEKA. With it, you can run associative classification (i.e., class association rule mining) over all data instances (rows) and so determine which items fall into A/B/C/D. Your special "Type" column will be your class attribute.
I'm autofilling some columns in Excel (one at a time). These columns use a UDF I wrote in ExcelDna. Using Task Manager, I notice that only half of the cores are being used. The Excel setting is "use all processors on this computer", so I can't figure out why only half the cores are in use. Thoughts?
This page contains some insightful explanations about multi-processing in Excel 2007+
http://msdn.microsoft.com/en-us/library/office/aa730921(v=office.12).aspx#office2007excelperf_ExcelPerformanceImprovements
*Ref: "Multithreaded Calculations"
"Some Excel features do not use multithreaded calculation, for example:
Data table calculation (but structured references to tables do use MTC).
User-defined functions (but XLL functions can be multithread-enabled).
XLM functions.
INDIRECT, CELL functions that use either the format2 or address options.
GETPIVOTDATA and other functions referring to PivotTables or cubes.
Range.Calculate and Range.CalculateRowMajorOrder.
Cells in circular reference loops. "
Since it doesn't specifically describe "autofill" - I'll take a crack at it from a concurrency perspective. Please forgive if some of this is speculation..
Certain functions require an "in-order" task that cannot easily be split across processors, and we might suspect that fill is one of them. (It requires sequential operation in some of its modes, for example stepping 1, 1.2, 1.4, etc.) In this example, processor 2 can't just start from the middle of the page without performing a new/custom independent calculation. Special functions would have to be designed. Perhaps they decided not to code for these kinds of scenarios. With formulas I can't see this even being possible, because you could be creating a formula tree.
Other operations are independent, based on the formula trees (see link). This strongly implies Excel won't multi-process a complex formula tree either, so if you have numerous connected formulas, that tree will only be processed, in order, by one processor.
Granted, there are all kinds of complex workarounds for these situations (perhaps similar to compilers, SQL servers, etc.), but the documentation above strongly suggests that Microsoft designed it to streamline independent tasks only.