Excel: How to parse/cast text as a formula? - excel

Is it possible to parse/cast text (like "=A1+A2") as a formula in MS Excel? I want to build a formula from pieces of text - some of which will only be typed in later by a user.
If the INDIRECT() function did not only work for referencing cells, then I could have typed this =INDIRECT("=A1+A2").
I know you can a work around this problem by simply adding a lot more hidden columns to do sub calculations. But for the sake scalability and efficiency, I would rather do something like the above.
I found a similar questions here and here, yet they don't solve my problem.
The Real-world problem:
Read on for a better understanding as to why you would want to do the above
Scenario
Each item in the list consists of a string, which contains anywhere from 1 to 5 account names each. Each account name is followed by an account number in brackets. The length of the number determines the type of account. Part of the account number is a date, of which the date format depends on the type of account. Further more, each account type may have more that 1 account-number length associated with it, although each number-length[*] is only associated with 1 account type.
Objectives
Extract account-names and their respective account-numbers and account-types from a list.
Make an assumption as to the account-type from the account-number
Validate this assumption by inspecting the build of the number and elements in the name
Check the validity of the account-numbers depending on their type.
The tricky part (this is where my problem lies)
The account-types and their respective account-number-lengths are not known before hand, and are typed into a table by the user of the sheet, specifying a type of account and the number-lengths associated with this account-type. The user should type this into a list - not go and tinker around with delicate formulas
Done so far
Column A: Contains the raw data (each cell has up to 5 names and numbers)
Columns B..F: Each column extracts 1 name, remains empty if all are already extracted
Columns G..K: Each column extracts 1 number corresponding to its name in columns B..F, remains empty if all are already extracted
Columns L..P: Each column calculates the length of the corresponding number in columns
G..K
Now the user would type the following details into a table which assigns certain number-lengths an account type:
TYPE2, BUSINESS, (OR(length=13,length=6))
where length will later be replaced with the cell address which contains the calculated account number-length.
What I want to do now
Columns Q..U:
Should all indicate the account-type of the corresponding account-number in columns G..K. The idea is to build a nested if-elseIf-elseIf formula using the criteria typed in by the user as specified above. Example of one of the elseIF statements:
SUBSTITUTE(CONCATENATE("=IF(",criteria,",",type,",",errCode)),"length","O10"))
All of these elseIf statements will then be concatenated together to form a master formula which will then need to be parsed/cast as a formula to calculate the account-type
This proposal uses only 5 columns (1 for each account-number, containing the master formula) and a table specifying account-types and criteria, also keeping the user away from formulas. Editing 1 line of code (the criteria) will update all formulas. Efficient & Scalable.
Since the user should never tinker around with the formulas under the hood, a simple 1 column if-elseIf-elseIf is out of the question. The alternative to the above would be to make a separate column to test for each account-type for each account-number. Separating/Abstracting out each test to its own column results in much better readability, easier editing & much less debugging - Unless you like multi-screen-wide-formulas. Example: 5 account-numbers * 10 possible account types = 50 extra columns.
Each edit to any criteria needs to copied to 4 other non-adjacent columns and drag-filled down 10,000 rows (columns can not be adjacent since it is effectively a 5x5 array of columns). Not Efficient nor scalable. Unless I'm missing some elegant way of updating non-adjacent formulas in a single click
The rest of the validations error indications are trivial.
Sample data
Tshepo Trust (6901/2005) Marlene Mead (8602250646085)
Great Force Inv 67 Pty Ltd (200602258007)
Jane (870811) Livingstone (6901/2005) Janette Appel (8503250647056) James (900111)
I know all this would probably be much easier to achieve with clever usage of VBA, eliminating all the need to simulate abstraction, encapsulation, multi-dimensional arrays and functional programming on a spreadsheet. But until I can program in VBA, worksheet formulas will be my refuge.
[*]: account number-length could also be described as the amount of digits in the number or as indicated by this formula: LEN(accNumber)

In VBA you have access to Cell.Formula.
I usually used Range to peek a cell by address.

I'm not sure if this would answer your question(it's a very detailed question!), but if your user was entering the account numbers in a table (I'm calling it 'RefTable') , that was:
Length of account number | business type
----------------------------------------
6 | Accountant
8 | Advisor
Then you could just use a vlookup on the length of the account number, given you've already separated them out.
=vlookup(len(accNumber), Reftable, 2, false)
Make sure that you either use a dynamic range name, or specify plenty of space below in RefTable, so that when your users add types, they don't get lost.
Also, if you have two different accounts with the same length, this could get you into trouble.

Related

Deal with Ties when Using Index/Match

I'm currently pulling the top (5) number of numerical values from one sheet and inputting them into a different sheet. Each number is within its own column and there is a name matching that column, EX:
And so, having a tie is common with the data that I'm working with, so it nearly deprecates my formulas.
For getting the name:
=INDEX('Total Cases by Categories'!$B$18:$B$50, MATCH(LARGE('Total Cases by Categories'!$H$18:$H$50, A39),'Total Cases by Categories'!$H$18:$H$50, 0))
For getting the numerical value associated with the name:
=LARGE('Total Cases by Categories'!$H$18:$H, A39)
And so, when there are 2 people with the same numerical value associated within a category, then that person appears twice, I assume because of their position within the sheet.
So something like this happens:
So in the event of a tie, I would want to list both names that have the same amount of points instead of the first name that shows up with the duplicated value.
Any help would be appreciated!
Actually, LARGE will give you both of tied names. It's MATCH that can't look beyond the first. To the best of my knowledge there is no way around that (the difficult one being not to use MATCH). Therefore the solution is to have no ties.
This is achieved with helper columns that contain no identical numbers. This can be achieved by adding an insignificant decimal. Since you are dealing with integers, adding 0.1 would be insignificant for your purposes but 13.1 is different from 13.2. If you need to extract the "real" number from this use INT(13.2).
Using the row number to generate an insignificant decimal is popular for this purpose. In row 1 ROW()/10 will return 0.1. But in row 10 ROW()/10 will return 1.0 which isn't an insignificant number anymore. Therefore you have to work with ROW()/100 or an even larger divisor, depending upon how many rows you have. Try ROW()/10^6 - any decimal will do the tie-breaking job.
You may not like that using ROW() will list tied participants in the order in which they appear in the worksheet. The differentiating decimals can be created by any other means that doesn't create ties in itself.
Normally, the helper columns with the decimals added will be hidden. They contain a formula like =D23 + (ROW()/10000) which manages itself. You can then use that column for the MATCH function to list all participants in the order of LARGE using the helper column or the original. Just make sure that MATCH refers to the helper column.

Sum of Questionnaire Scores Based on a Domain Table

I have created a questionnaire that consists of around 100 questions. Participants are asked to fill them in online, where the items are shuffled each time. These items are separated into 6 domains where, for the sake of easier understanding, let's just call them Domain 1 - 6.
I have them typed in one specific table called "Correspondence", with format like below:
(An example)
Question No.|Domain
   1  |Domain A
   2  |Domain C
   3  |Domain A
   4  |Domain B
   5  |Domain A
   6  |Domain C
I used Google Form to generate a spreadsheet of RAW data of respondents, where it will help me mark the RAW Scores, for each item on a separate column:
(An example)
Submission ID|Question 1|Question 2|Question 3|Question 4|Question
5|Question 6
Participant 1 |  2  |  3  |  5   |  1   |  2   |  4   |
Participant 2 |  5  |  4  |  5   |  3   |  5   |  1   |
Participant 3 |  1  |  1  |  1   |  2   |  2   |  2   |
The next thing I need to do is generate another table that sums up the Domain totals for each participant. So from the example above, I need to sum 1,3,5 as Domain A, 4 as Domain B and 2 & 6 as Domain C:
(An example)
Participant 1
    |Domain A|Domain B|Domain C|
Total |  9  |  1  |  7   |
The hardest thing is to find a proper method to kick start this process. Can anyone point me in the right direction? Either formulas or VBAs would be fine too. Thanks!
This can be done if you are able to create a helper row.
First, I created a table to link the question to domain. That is named in my example as "Correspondence". This table is somewhat the answer key. From your description of the problem, you need a table like this to establish which question is associated with the domain/category/point system you want to use.
I then created a helper row for the survey results shown on row 9. This has =INDEX($B$3:$C$8,MATCH(B$10,$B$3:$B$8,0),2) in cell B9 as the code to reference the question to the domain. This is immediately above the questions in the example, but you can put it on a separate sheet if needed.
Then you can just sum them up.
=SUMPRODUCT(SUMIFS(INDIRECT(MATCH($E3,$A:$A,0)&":"&MATCH($E3,$A:$A,0)),$9:$9,F$2))
This formula uses MATCH, which returns an integer, inside INDIRECT to be used as a dynamic row reference. This will fail if the participant names are not unique. The SUMIFS inside the SUMPRODUCT allows the row to be treated like an array without using an array formula. So you can recreate the example I have and copy/paste or drag and paste the formulas as you wish.
A different approach may be that you want to sum up the points to the questions first and then do the conversion from question to domain. That way you don't ever have to manipulate the raw data, just the reports. That may be the better approach for you, actually.
Edit: Added information about the formulas and the example.

Create an Excel Formula that uses filtered data

I'm trying to design a second page that shows % results of my data on page 1.
For example, Column F & G allow manual entry of numbers 1-4 which are based off data the user types in at another location.
This is being used for trade tracking in investments so there will be quite a few numbers but the end result will be a row will show a specific stock, it's subsequent data, whether it made or lost money, etc.
What I want to do in page 2 is using the numbers 1-4 which were typed in at columns F & G, translate that into an edge on page 2.
For example, if there were 50 columns of data typed out for trades executed, I could take the number of winning trades of a certain setup (say number 3) and divide that by the total trades of 50 to come out with a win % for that setup.
However, I have no clue to how to translate that forumla into a filter formula so that on page 2 I could see that of the numbers 1-4 (4 different setups) I could easily see the highest and lowest win % to determine the best setup to use.
I'm not the best in excel but I understand enough to code most of that, I simply have no idea how to take that end formula and add a filter to it so that it only uses partial results. I've got 4 other formulas I want to use on page 2 as well to help build something that could really benefit myself, but if someone could just show me how to filter data into a formula, I think I could take it form there.
Thanks for the help
Ben
You can also do something like this with array formulas
=MAX(IF(Sheet1!$F$2:$F$50=$A2,$E$2:$E$50))
(Press Ctrl+Shift+Enter [CSE], instead of just Enter when entering Array Formulas)
Also, take a look a the SUMPRODUCT function. It comes in very handy for filtering data. Here are some helpful links...
https://www.get-digital-help.com/2017/12/07/sumproduct-multiple-criteria/
https://www.get-digital-help.com/2017/12/08/sumproduct-and-if-function/
https://www.get-digital-help.com/2010/09/01/extract-a-unique-distinct-list-by-matching-items-that-meet-a-criterion-in-excel/

SUMIFS with intermediate VLOOKUP in the criteria

I have 3 tables, 1 of which I want to fill in columns with data based on the other 2. Tables are roughly structured as follows:
Table 1 (Semi-Static Data)
SubGroup Group
----------- -----------
subgroup(1) group(a)
subgroup(2) group(b)
subgroup(3) group(b)
subgroup(4) group(c)
etc.
Table 2 (Variable Data)
SubGroup DataValue
----------- -----------
subgroup(1) datavalue(i)
subgroup(2) datavalue(ii)
subgroup(3) datavalue(iii)
subgroup(4) datavalue(iv)
etc.
Table 3 (Results)
Group TotalValue
----------- -----------
group(a) totalvalue(m)
group(b) totalvalue(n)
group(c) totalvalue(o)
etc.
Where the TotalValue is the sum of all DataValue's for all subgroups that belong to that particular Group.
e.g. for group(b) ---> totalvalue(n) = datavalue(ii) + datavalue(iii)
I am looking to achieve this calculation without adding any additional columns to the Data tables nor using VBA.
Basically I need to perform a COUNTIFS where there is an additional VLOOKUP matching the subgroup criteria range to the group it belongs to, and then only summing for datavalue's that match the group being evaluated. I have tried using array formulas but I'm having issues making it work. Any assistance would be very appreciated. Thank you,
EDIT: Wanted to add some details surrounding my question. First all Google searches did not provide a suitable answer. All the links had solutions to a slightly different problem were the VLOOKUP term is not dependent on the SUMIFS criteria but rather another single static variable. Stack Overflow offered similar solutions. Please let me know if anymore details are required to make my post suitable for this forum. Thank you again.
You can use the SUMPRODUCT function to do it all at once. The first reference $B$2:$B$5 is for the Group names, the second reference $E$2:$E$5 is for the datavalues. The G2 reference is for the group names in the third table, you can enter this formula for the first reference and then drag and fill for the rest.
=SUMPRODUCT($E$2:$E$5 * (G2 = $B$2:$B$5))
Some cell references, and sample data, would be helpful but something like this might be what you want:
=SUMIF(C:C,"="&INDEX(A:A,MATCH(E5,B:B,0)),D:D)
WADR & IMHO, this is simply bad worksheet design. For lack of a single cross-reference column in Table2, any solution would have to be a VBA User Defined Formula or an overly complicated array formula (the latter of which I am not even sure is possible). The data tables are not normalized database tables you can INNER JOIN or GROUP BY ... HAVING.
The formula you are trying to achieve is akin to,
=SUMPRODUCT(SUMIF(D:D, {"subgroup(2)","subgroup(3)"}, E:E))
That only works with hard-coded values as arrayed constants (e.g. {"subgroup(2)","subgroup(3)"}). I know of no way to spit a dynamic list back into the formula using additional native Excel functions but VBA offers some possibilities.
HOWEVER,
The simple addition of one more column to Table2 with a very basic VLOOKUP reduces all of your problems to a SUMIF.
     
The formula in the new column D, row 2 is,
=VLOOKUP(E2, A:B, 2, FALSE)
The formula in I2 is,
=SUMIF(D:D, H2,F:F )
Fill each down as necessary. Sorry if that is not what you wanted to hear.
Thank you everyone that responded and reviewed this post. I have managed to resolve this using an array formula and some matrix algebra. Please note that I am not using VLOOKUP (this operator cannot be performed on arrays) nor SUMIFS as my title states.
My final formula looks like this:
{=SUM(IF([Table2.xlsx]Sheet1!SubGroup=TRANSPOSE(IF([Table1.xlsx]Sheet1!Group=G2,[Table1.xlsx]Sheet1!SubGroup,"")),[Table2.xlsx]Sheet1!DataValue))}
Very simply, I create an array variable that compares the Group being evaluated (e.g. cell G2) with the Groups column for Table 1 and outputs the corresponding matching SubGroups. This results in an array with as many rows as Table 1 had (N) and 1 column: Nx1. I then transpose that array (1xN) and compare it to the SubGroups column (Mx1, M being the number of rows in Table 2) and output the DataValues column for the rows that have a corresponding SubGroup (MxN). Then I perform a sum of the whole array to return a single value.
Notice that as I didn't include a value_if_false output return on either IF operators, it will just populate with FALSE in the arrays were the conditions are not met. This does not matter though for the final result. In the first IF, FALSE will not match the SubGroups so will be ignored. For the second all values FALSE passed to SUM will be calculated as 0. The more complicated question is that it grows the amount of memory required to process as we are not filtering to just have the values we want.
For this application I decided against filtering the subarray as the trade-off in resource utilization was acceptable. If the data sets were any bigger though, I would definitely try doing it. Another concern was that I did not understand fully the filtering logic that I was using based on http://exceltactics.com/make-filtered-list-sub-arrays-excel-using-small/ so decided to simplify. Will revisit this concept latter as I think it will work. I might have completed this solution but was missing transposing the array to compare properly so abandoned this route.

Excel Data Validation as input to another Data Validation

Assuming I have a worksheet with the following information:
Manager Division
Gustavo 1
John 2
Jack 2
Paul 1
Simona 2
I have a data validation list that allows the user to select a division.
If the user selects 1, then in another data validation list I want to list Gustavo and Paul. IF the user selects 2, then in another data validation list I want to list John, Jack and Simona.
Moreover, the data might scale. What I mean is: maybe below Simona another user can be added, let's say: Berry 1. Then if the user selects 1, then Gustavo, Paul and Berry will be the options for the other data validation list.
I have already implemented the first validation list. The problem I am having is with the second part of the problem. Most solutions out there uses name managers. This is a problem for me because the way my data is laid out and because I need to keep constantly updating name managers. I would like to make it more dynamic, when a user adds or remove data, it always shows the current list for that division. I will have a third validation list afterwards, however, if I can learn how to do this one, then I should be able to solve the rest of the problem.
OK, so you can do this in a couple of steps with a working range to hold the validation list:
Somewhere in your workbook, you create an array formula in multiple cells
select cells e.g. F2:F6 (I'm using 5 possible Managers per division in the example, but you can change that)
With them all selected, enter the ARRAY formula (i.e. use Ctrl+Shift+Enter to enter it) =SMALL(IF($B$2:$B$6=$D$4,ROW($B$2:$B$6),""),ROW(INDIRECT("1:5"))) - the managers' divisions are in B2:B6 - the selected one is in D4... the 1:5 is a effectively a counter, going up to our 5 possible managers.
This should give you a list of row numbers where the Division is the same as selected and then some #NUM! errors
In the next column (G2:G6) enter the (normal) formula =IFERROR(INDIRECT("A"&F2),""). This will give you a list of the names of Managers in the selected division.
In the cell you want the selection, use the source =OFFSET($G$2,0,0,COUNT($F$2:$F$6)) - this references the list of names, but gets rid of the blanks at the bottom
Hope this makes sense! Here is a picture of the layout:

Resources