I have the following data set:
Row  Name  Class  Amount  Position
---  ----  -----  ------  --------
1    I1    A      10      P
2    I1    O              R
3    I1    A      -5      R
4    I2    O              P
5    I2    A      7       P
6    I2    A      -11     R
Amount typically has missing data. I want to fill in the missing values from records that match my record on Name and Position, so in row 2 I want -5 and in row 4 I want 7. If I combine Name and Position into a Key column, I can do a VLOOKUP or INDEX/MATCH on it, or I can skip the key and do an INDEX/MATCH with multiple criteria.
What I want to know is what is the most sensible way to go about it?
Option 1: Do I create a new column, say Amount-2, and write a formula in it that pulls the value from Amount and does an INDEX/MATCH if it is missing? I don't want to do this because I don't want to add more columns to my dataset.
Option 2: Should I sort the dataset on Amount so that all the blank cells are bunched together at the top, and write formulas in them that reference the populated cells below in the same column? Will this method be robust to later sorting the dataset on other fields, or will my formula references go haywire?
Option 3: Write a macro that populates the missing cells? Is that overkill?
The reason I ask is that I don't want to make my sheet bigger by adding columns like Key or Amount-2, and I want to know a robust and clean way to fill in the gaps.
Option 1 (helper column such as Amount-2):
Pros - Nice and simple, guaranteed to work with no surprises.
Cons - Adds to the dataset.
Option 2 (sort the blanks to the top and reference the cells below):
Pros - Avoids adding to the dataset.
Cons - Sounds a bit convoluted to me, and later sorting may be an issue, though a copy/paste-values step afterwards would mitigate that.
Option 3 (macro):
Pros - With VBA you can do whatever you want; no extra columns or cutting/pasting required.
Cons - If you're not conversant with VBA, it could be a (small) challenge to write, and new users may need training.
My personal thought is to just go ahead and write a macro. Even if you're not VBA-savvy, recording a macro while you carry out your option #1 (deleting the helper column at the end of the procedure to avoid the extra field) would get you a long way in this particular instance. You'd definitely want to go in and clean up the recorded code afterwards, but you'd be a good part of the way there.
Map that macro to a shortcut key, and updating your new data becomes a trivial task.
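The lookup that option #1 or a macro would perform can be sketched in Python. This is only a model of the logic, not Excel or VBA code, and the `fill_missing_amounts` function and the tuple layout are hypothetical. Like INDEX/MATCH with match type 0, it takes the first populated row that has the same Name and Position.

```python
from typing import Optional

# Hypothetical row layout mirroring the question's table: (name, class, amount, position).
# None stands in for a blank Amount cell.
Row = tuple[str, str, Optional[float], str]


def fill_missing_amounts(rows: list[Row]) -> list[Row]:
    """Fill blank amounts from the first populated row with the same (Name, Position)."""
    # Build the lookup once; the key plays the role of the suggested Key column.
    lookup: dict[tuple[str, str], float] = {}
    for name, _cls, amount, position in rows:
        if amount is not None:
            lookup.setdefault((name, position), amount)  # keep the first match only

    filled: list[Row] = []
    for name, cls, amount, position in rows:
        if amount is None:
            # Stays None (blank) if no populated row shares the key.
            amount = lookup.get((name, position))
        filled.append((name, cls, amount, position))
    return filled


data: list[Row] = [
    ("I1", "A", 10, "P"),
    ("I1", "O", None, "R"),
    ("I1", "A", -5, "R"),
    ("I2", "O", None, "P"),
    ("I2", "A", 7, "P"),
    ("I2", "A", -11, "R"),
]
for row in fill_missing_amounts(data):
    print(row)
```

Building the lookup first and then filling is the same two-pass shape a macro would take. A macro would also write the values back, so no helper column is left behind.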
I would like to sum date periods, i.e. total the number of days per item.
The input data will grow over time and new item categories can appear, so the items (number of rows) shown in the expected report cannot be hard-coded.
The input parameters are the from and to dates, which determine the period to consider. You can picture this as a moving date window over the input data grid.
I am a Java programmer, and I am sure I could write a proper SQL query that groups and sums the data to generate the result. I could also write a Java program that does the job, but I really want to do this calculation in Excel.
Is there any way to generate the report using only a combination of built-in Excel formulas, without writing any Visual Basic code (macros)?
If so, could you point me in the right direction and tell me which formulas to use? I can then figure out how to use them.
I hope the following helps explain what I would like to have:
Try:
Formula in F3:
=SUM(COUNTIFS(C:C,E3,A:A,"<="&SEQUENCE(H$2-F$2,,F$2),B:B,">="&SEQUENCE(H$2-F$2,,F$2)))
Note that whole-column references such as A:A take a long time to process on large data. The formula also works when date periods overlap. SEQUENCE requires Excel 365 or Excel 2021.
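A Python model of what the F3 formula computes may help when checking results. It assumes that periods are `(start, end, item)` tuples with both dates inclusive, matching columns A, B and C; that layout and the function name `item_days` are hypothetical. As with `SEQUENCE(H$2-F$2,,F$2)`, the window runs from F2 up to, but not including, H2. Overlapping periods each contribute their days, just as COUNTIFS counts every matching row.

```python
from datetime import date, timedelta


def item_days(periods, item, window_start, window_end):
    """Model of SUM(COUNTIFS(C:C,item, A:A,"<="&d, B:B,">="&d)) over the window days d.

    periods: list of (start, end, item) tuples, start and end inclusive.
    The days d are window_start .. window_end - 1, like SEQUENCE(H2-F2,,F2).
    """
    total = 0
    n_days = (window_end - window_start).days  # H$2-F$2
    for offset in range(n_days):
        day = window_start + timedelta(days=offset)
        # COUNTIFS: how many periods of this item cover `day`.
        total += sum(1 for start, end, name in periods
                     if name == item and start <= day <= end)
    return total


# Hypothetical sample data: the two "X" periods overlap on 5-6 January.
periods = [
    (date(2023, 1, 1), date(2023, 1, 10), "X"),
    (date(2023, 1, 5), date(2023, 1, 6), "X"),
    (date(2023, 1, 3), date(2023, 1, 4), "Y"),
]
print(item_days(periods, "X", date(2023, 1, 1), date(2023, 1, 8)))
```

If the window should include the H2 date itself, the number of days would need to be one larger (H$2-F$2+1 in the worksheet formula).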
I'm trying to design a second page that shows % results of my data on page 1.
For example, columns F & G allow manual entry of numbers 1-4, which are based on data the user types in at another location.
This is being used for tracking investment trades, so there will be quite a few numbers. The end result is that each row will show a specific stock, its subsequent data, whether it made or lost money, etc.
What I want to do on page 2 is take the numbers 1-4 typed in columns F & G and translate them into an edge.
For example, if 50 rows of data were typed out for executed trades, I could take the number of winning trades of a certain setup (say number 3) and divide it by the total of 50 trades to get a win % for that setup.
However, I have no idea how to turn that formula into a filtered formula, so that on page 2 I could see the win % for each of the numbers 1-4 (4 different setups) and easily spot the highest and lowest to determine the best setup to use.
I'm not the best at Excel, but I understand enough to code most of that. I simply have no idea how to take that end formula and add a filter to it so that it only uses part of the results. I have 4 other formulas I want to use on page 2 as well, to build something that could really benefit me, but if someone could just show me how to filter data into a formula, I think I could take it from there.
Thanks for the help
Ben
You can also do something like this with array formulas:
=MAX(IF(Sheet1!$F$2:$F$50=$A2,Sheet1!$E$2:$E$50))
(Press Ctrl+Shift+Enter [CSE], instead of just Enter when entering Array Formulas)
Also, take a look at the SUMPRODUCT function. It comes in very handy for filtering data. Here are some helpful links:
https://www.get-digital-help.com/2017/12/07/sumproduct-multiple-criteria/
https://www.get-digital-help.com/2017/12/08/sumproduct-and-if-function/
https://www.get-digital-help.com/2010/09/01/extract-a-unique-distinct-list-by-matching-items-that-meet-a-criterion-in-excel/
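The filtering the question asks for (count the trades whose setup matches, then count the winners among them) can be sketched in Python. The `(setup, profit)` layout, the rule that a win means profit > 0, and the function name are assumptions for illustration. In a worksheet, the counting step corresponds to SUMPRODUCT or COUNTIFS with a condition on the setup column.

```python
from collections import defaultdict


def win_pct_by_setup(trades, per_setup=False):
    """Return {setup: win %} for a list of (setup, profit) trades.

    per_setup=False divides by all trades, as described in the question;
    per_setup=True divides by the number of trades that used that setup.
    """
    wins = defaultdict(int)
    counts = defaultdict(int)
    for setup, profit in trades:
        counts[setup] += 1
        if profit > 0:  # hypothetical win rule
            wins[setup] += 1
    total = len(trades)
    return {
        setup: 100.0 * wins[setup] / (counts[setup] if per_setup else total)
        for setup in counts
    }


# Hypothetical trades: (setup number 1-4, profit).
trades = [(3, 100), (3, -50), (1, 20), (3, 10), (2, -5)]
pct = win_pct_by_setup(trades)
best = max(pct, key=pct.get)
worst = min(pct, key=pct.get)
print(pct, best, worst)
```

A setup that never appears (4 in this sample) is simply absent from the result. Dividing by the setup's own trade count (`per_setup=True`) gives a win rate that is easier to compare across setups.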
I have 3 tables, 1 of which I want to fill in columns with data based on the other 2. Tables are roughly structured as follows:
Table 1 (Semi-Static Data)
SubGroup Group
----------- -----------
subgroup(1) group(a)
subgroup(2) group(b)
subgroup(3) group(b)
subgroup(4) group(c)
etc.
Table 2 (Variable Data)
SubGroup DataValue
----------- -----------
subgroup(1) datavalue(i)
subgroup(2) datavalue(ii)
subgroup(3) datavalue(iii)
subgroup(4) datavalue(iv)
etc.
Table 3 (Results)
Group TotalValue
----------- -----------
group(a) totalvalue(m)
group(b) totalvalue(n)
group(c) totalvalue(o)
etc.
where TotalValue is the sum of the DataValues of all subgroups that belong to that particular Group.
e.g. for group(b) ---> totalvalue(n) = datavalue(ii) + datavalue(iii)
I am looking to achieve this calculation without adding any additional columns to the Data tables nor using VBA.
Basically, I need to perform a SUMIFS with an additional VLOOKUP that maps the subgroup criteria range to the group each subgroup belongs to, and then sum only the DataValues that match the group being evaluated. I have tried using array formulas, but I'm having trouble making them work. Any assistance would be much appreciated. Thank you,
EDIT: I wanted to add some details about my question. First, the Google searches I tried did not provide a suitable answer. All the links had solutions to a slightly different problem, where the VLOOKUP term does not depend on the SUMIFS criteria but on a single static value. Stack Overflow offered similar solutions. Please let me know if any more details are required to make my post suitable for this forum. Thank you again.
You can use the SUMPRODUCT function to do it all at once. In the formula below, $E$2:$E$5 holds the data values, $B$2:$B$5 holds the group names, and G2 is the group name in the third (results) table. Enter the formula in the first row of the results table, then drag it down to fill the rest.
=SUMPRODUCT($E$2:$E$5 * (G2 = $B$2:$B$5))
Some cell references and sample data would be helpful, but something like this might be what you want:
=SUMIF(C:C,"="&INDEX(A:A,MATCH(E5,B:B,0)),D:D)
With all due respect, and in my humble opinion, this is simply bad worksheet design. Lacking a single cross-reference column in Table2, any solution would have to be a VBA User Defined Function or an overly complicated array formula (and I am not even sure the latter is possible). The data tables are not normalized database tables that you can INNER JOIN or GROUP BY ... HAVING.
The formula you are trying to achieve is akin to,
=SUMPRODUCT(SUMIF(D:D, {"subgroup(2)","subgroup(3)"}, E:E))
That only works with hard-coded values as array constants (e.g. {"subgroup(2)","subgroup(3)"}). I know of no way to feed a dynamic list back into the formula using other native Excel functions, but VBA offers some possibilities.
HOWEVER,
The simple addition of one more column to Table2 with a very basic VLOOKUP reduces all of your problems to a SUMIF.
The formula in the new column D, row 2 is,
=VLOOKUP(E2, A:B, 2, FALSE)
The formula in I2 is,
=SUMIF(D:D, H2, F:F)
Fill each down as necessary. Sorry if that is not what you wanted to hear.
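The helper-column approach amounts to a lookup join followed by a grouped sum. Here is a minimal Python sketch of that idea. It assumes Table 1 is held as a dict from SubGroup to Group and Table 2 as a list of (SubGroup, DataValue) pairs; both representations and the function name are hypothetical.

```python
from collections import defaultdict


def totals_by_group(subgroup_to_group, data_values):
    """Sum DataValue per Group via a SubGroup -> Group lookup."""
    totals = defaultdict(float)
    for subgroup, value in data_values:
        group = subgroup_to_group.get(subgroup)  # the VLOOKUP helper column
        if group is not None:                    # VLOOKUP would show #N/A here
            totals[group] += value               # SUMIF on the helper column
    return dict(totals)


table1 = {
    "subgroup(1)": "group(a)",
    "subgroup(2)": "group(b)",
    "subgroup(3)": "group(b)",
    "subgroup(4)": "group(c)",
}
# Hypothetical numbers standing in for datavalue(i)..datavalue(iv).
table2 = [("subgroup(1)", 1), ("subgroup(2)", 2), ("subgroup(3)", 3), ("subgroup(4)", 4)]
print(totals_by_group(table1, table2))
```

The lookup runs once per Table 2 row, which is why the extra column makes the final step a plain SUMIF.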
Thank you to everyone who responded to and reviewed this post. I have managed to resolve this using an array formula and some matrix algebra. Please note that I am not using VLOOKUP (it cannot be performed on arrays), nor SUMIFS, despite what my title says.
My final formula looks like this:
{=SUM(IF([Table2.xlsx]Sheet1!SubGroup=TRANSPOSE(IF([Table1.xlsx]Sheet1!Group=G2,[Table1.xlsx]Sheet1!SubGroup,"")),[Table2.xlsx]Sheet1!DataValue))}
Very simply, I create an array that compares the Group being evaluated (e.g. cell G2) with the Group column of Table 1 and outputs the corresponding matching SubGroups. This results in an array with as many rows as Table 1 (N) and 1 column: Nx1. I then transpose that array (1xN), compare it to the SubGroup column (Mx1, M being the number of rows in Table 2), and output the DataValue for each row that has a matching SubGroup (MxN). Finally, I sum the whole array to return a single value.
Notice that where the conditions are not met, the first IF returns an empty string (""), and the second IF, which has no value_if_false argument, returns FALSE. This does not matter for the final result. In the first IF, the empty strings will not match the (non-blank) SubGroups, so they are ignored. In the second, SUM ignores the FALSE values. The bigger concern is that this increases the memory required, since we are not filtering down to just the values we want.
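The matrix logic of the array formula can be modelled in plain Python to show why it works. This is an illustrative sketch, not Excel code, and the function and argument names are hypothetical stand-ins for the named ranges. The inner IF builds the N-entry list. Comparing each Table 2 SubGroup with every entry of the transposed list then gives the M x N grid that SUM collapses.

```python
def sum_for_group(group, t1_groups, t1_subgroups, t2_subgroups, t2_values):
    """Model of {=SUM(IF(T2.SubGroup=TRANSPOSE(IF(T1.Group=G2,T1.SubGroup,"")),T2.DataValue))}."""
    # Inner IF over Table 1 (N x 1): the SubGroup where the Group matches, else "".
    candidates = [sub if grp == group else ""
                  for grp, sub in zip(t1_groups, t1_subgroups)]
    # Comparing Table 2's M x 1 SubGroup column with the transposed 1 x N list
    # gives an M x N grid; the outer IF has no value_if_false, so misses are False.
    grid = [[value if sub2 == cand else False for cand in candidates]
            for sub2, value in zip(t2_subgroups, t2_values)]
    # SUM ignores the logical FALSE entries and adds the matched DataValues.
    return sum(cell for row in grid for cell in row if cell is not False)


t1_groups = ["group(a)", "group(b)", "group(b)", "group(c)"]
t1_subgroups = ["subgroup(1)", "subgroup(2)", "subgroup(3)", "subgroup(4)"]
# Table 2 rows need not be in the same order as Table 1 (hypothetical values).
t2_subgroups = ["subgroup(3)", "subgroup(1)", "subgroup(4)", "subgroup(2)"]
t2_values = [3, 1, 4, 2]
print(sum_for_group("group(b)", t1_groups, t1_subgroups, t2_subgroups, t2_values))
```

The grid has M x N cells even though only a few of them contribute. That is the memory cost the answer mentions, and filtering the sub-array first would avoid it.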
For this application I decided against filtering the sub-array, as the trade-off in resource utilization was acceptable. If the data sets were any bigger, though, I would definitely try it. Another concern was that I did not fully understand the filtering logic I was using, based on http://exceltactics.com/make-filtered-list-sub-arrays-excel-using-small/, so I decided to simplify. I will revisit this concept later, as I think it will work. I might have completed that solution, but I had missed transposing the array to compare it properly, so I abandoned that route.
Is it possible to parse/cast text (like "=A1+A2") as a formula in MS Excel? I want to build a formula from pieces of text - some of which will only be typed in later by a user.
If INDIRECT() worked for more than just cell references, I could have typed =INDIRECT("=A1+A2").
I know you can work around this problem by simply adding a lot more hidden columns for sub-calculations, but for the sake of scalability and efficiency I would rather do something like the above.
I found similar questions here and here, but they don't solve my problem.
The Real-world problem:
Read on for a better understanding of why you would want to do the above.
Scenario
Each item in the list is a string that contains anywhere from 1 to 5 account names. Each account name is followed by an account number in brackets. The length of the number determines the type of account. Part of the account number is a date, whose format depends on the type of account. Furthermore, each account type may have more than 1 account-number length associated with it, although each number length[*] is associated with only 1 account type.
Objectives
Extract account-names and their respective account-numbers and account-types from a list.
Make an assumption as to the account-type from the account-number
Validate this assumption by inspecting the build of the number and elements in the name
Check the validity of the account-numbers depending on their type.
The tricky part (this is where my problem lies)
The account types and their respective account-number lengths are not known beforehand. The user of the sheet types them into a table, specifying a type of account and the number lengths associated with it. The user should type this into a list, not go and tinker with delicate formulas.
Done so far
Column A: Contains the raw data (each cell has up to 5 names and numbers)
Columns B..F: Each column extracts 1 name, remains empty if all are already extracted
Columns G..K: Each column extracts 1 number corresponding to its name in columns B..F, remains empty if all are already extracted
Columns L..P: Each column calculates the length of the corresponding number in columns G..K
Now the user would type the following details into a table which assigns certain number-lengths an account type:
TYPE2, BUSINESS, (OR(length=13,length=6))
where length will later be replaced with the cell address which contains the calculated account number-length.
What I want to do now
Columns Q..U:
Should all indicate the account-type of the corresponding account-number in columns G..K. The idea is to build a nested if-elseIf-elseIf formula using the criteria typed in by the user as specified above. Example of one of the elseIF statements:
SUBSTITUTE(CONCATENATE("=IF(",criteria,",",type,",",errCode),"length","O10")
All of these elseIf statements will then be concatenated together to form a master formula which will then need to be parsed/cast as a formula to calculate the account-type
This proposal uses only 5 columns (1 for each account-number, containing the master formula) and a table specifying account-types and criteria, also keeping the user away from formulas. Editing 1 line of code (the criteria) will update all formulas. Efficient & Scalable.
Since the user should never tinker around with the formulas under the hood, a simple 1 column if-elseIf-elseIf is out of the question. The alternative to the above would be to make a separate column to test for each account-type for each account-number. Separating/Abstracting out each test to its own column results in much better readability, easier editing & much less debugging - Unless you like multi-screen-wide-formulas. Example: 5 account-numbers * 10 possible account types = 50 extra columns.
Each edit to any criterion needs to be copied to 4 other non-adjacent columns and drag-filled down 10,000 rows (the columns cannot be adjacent, since it is effectively a 5x5 array of columns). That is neither efficient nor scalable, unless I'm missing some elegant way of updating non-adjacent formulas in a single click.
The rest of the validation and error indications are trivial.
Sample data
Tshepo Trust (6901/2005) Marlene Mead (8602250646085)
Great Force Inv 67 Pty Ltd (200602258007)
Jane (870811) Livingstone (6901/2005) Janette Appel (8503250647056) James (900111)
I know all this would probably be much easier to achieve with clever usage of VBA, eliminating all the need to simulate abstraction, encapsulation, multi-dimensional arrays and functional programming on a spreadsheet. But until I can program in VBA, worksheet formulas will be my refuge.
[*]: The account-number length could also be described as the number of digits in the number, i.e. the result of LEN(accNumber).
In VBA you have access to a cell's formula through Range.Formula.
I usually use Range to get a cell by its address.
I'm not sure this answers your question (it's a very detailed question!), but if your user entered the account-number lengths in a table (I'm calling it 'RefTable') like this:
Length of account number | business type
----------------------------------------
6 | Accountant
8 | Advisor
Then you could just use a vlookup on the length of the account number, given you've already separated them out.
=VLOOKUP(LEN(accNumber), RefTable, 2, FALSE)
Make sure that you either use a dynamic range name, or specify plenty of space below in RefTable, so that when your users add types, they don't get lost.
Also, if two different account types share the same number length, this could get you into trouble.
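The answer's length lookup can be combined with the name and number extraction that the question already does in columns B..K. A Python sketch of the whole idea follows. The regular expression assumes every account number sits in brackets directly after its name, as in the sample data. `classify_accounts` is a hypothetical name, and `ref_table` reuses the illustrative lengths from RefTable above. A plain lookup table like this keeps users out of the formulas without having to evaluate text as a formula.

```python
import re

# "Name (number)" pairs: names may contain spaces, numbers may contain "/".
PAIR = re.compile(r"([^()]+?)\s*\(([^)]*)\)")


def classify_accounts(raw, length_to_type):
    """Split a raw cell into (name, number, type) triples.

    length_to_type plays the role of the user-maintained RefTable
    (account-number length -> account type). Unknown lengths give None,
    where VLOOKUP would return #N/A.
    """
    result = []
    for match in PAIR.finditer(raw):
        name, number = match.group(1).strip(), match.group(2)
        result.append((name, number, length_to_type.get(len(number))))
    return result


ref_table = {6: "Accountant", 8: "Advisor"}  # illustrative values only
samples = [
    "Tshepo Trust (6901/2005) Marlene Mead (8602250646085)",
    "Great Force Inv 67 Pty Ltd (200602258007)",
    "Jane (870811) Livingstone (6901/2005) Janette Appel (8503250647056) James (900111)",
]
for raw in samples:
    print(classify_accounts(raw, ref_table))
```

To support several lengths per type, the user only adds more rows to the table. Each length must still map to exactly one type, as the question requires.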
I'm using excel 2007.
I have a list of tasks (200-500) that I need to group into different categories/sections etc. (multiple filters). All the data is in an Excel table, so I can apply Excel's built-in table filters to display exactly the data I need.
However, it is always difficult to apply multiple filters to display the expected data, especially as I need to do it very frequently. To make things simpler, I'm planning to number each record like
a.b.c.d.e.f
Where a, b, c, d, e, f are simple numbers. List looks like:
1
1.1
1.2
1.2.1
1.2.1.1
1.2.2
1.3
& so on.
The problem is that Excel treats 1.1 as a number with a single decimal, but as soon as I add a second decimal point, Excel treats it as text, which is the expected general behavior.
However, in this special case I need Excel to treat both the same way, either as numbers or as text. Numbers are preferable because I want to sort them, which might be difficult as text.
To make things a little more complex, when filtering the table I'd like to be able to use a pattern such as 1.* to display all numbers that start with 1.
Is this possible with Excel's default behavior, without VBA?
If not, is it possible with VBA? If so, any clue is appreciated. I don't need a whole program, since I can write basic VBA; just a clue about how it can be done.
I sort mine by adding a helper column that puts a letter in front and sorting on that, e.g. 1 becomes f1, 1.1 becomes f1.1, etc. Then everything is sorted as text.
You can use the formula ="f" & A1.
My sample:
Then the data sorted:
And the filter:
If I were to try this without VBA, my first step would be to use the Text to Columns feature on the Data tab, with "." as the delimiter, to split each number into its levels.
Next, make sure all empty cells in the split columns are filled with zeros.
Then sort the data by those columns, using the first column as the first sort level, the second column as the next level, and so on.
As long as you left your original data in the same row as the split data (I didn't in the images posted, to focus on the process), your items should now be in order.
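One caveat with sorting such labels as text is that text comparison goes character by character, so "1.10" sorts before "1.2". Splitting each label at the dots and comparing the parts as integers avoids this; it is the same idea as splitting into columns in the worksheet. The Python sketch below shows that sort key and a `1.*`-style filter. The function names are hypothetical.

```python
def outline_key(label):
    """Sort key for labels like "1.2.10": compare each level as an integer."""
    return tuple(int(part) for part in label.split("."))


def matches_prefix(label, prefix):
    """The 1.* filter: True for the prefix itself and its descendants,
    so "1" matches "1" and "1.2" but not "10.1"."""
    return label == prefix or label.startswith(prefix + ".")


labels = ["1.2.1", "10", "1.10", "1", "2", "1.2", "1.3", "1.2.1.1", "1.1"]
# Numeric, level-by-level order:
# ['1', '1.1', '1.2', '1.2.1', '1.2.1.1', '1.3', '1.10', '2', '10']
print(sorted(labels, key=outline_key))
# Plain text order puts '1.10' before '1.2' and '10' before '2'.
print(sorted(labels))
print([label for label in labels if matches_prefix(label, "1")])
```

Tuples of different lengths compare level by level, and a shorter tuple sorts before any longer tuple that starts with it. This ordering matches padding the missing levels with zeros, as in the answer above.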