I want to calculate the SUMPRODUCT shown in the "Revenue" column of the table. However, the dataset is fairly big, and I'm limited to Excel and its standard functions.
It should find all columns whose headers contain the keyword "weightl" or "sell", multiply each matching pair, and sum the products per row. In row 3, for example:
2*3+3*6+2*3 = 30
I thought of using a kind of dictionary to vary the search terms and go through each column, but I have no clue how to put it all together.
I used this
{=+isnumber(search("weightl";F2:N2))+isnumber(search("sell";F2:N2))}
to create the 1/0 table of the original one in the hope this could lead me somewhere
and
=SUM(IF(IFERROR(SEARCH("weight";G2:M2);0)>0;IF(G3:M8<>8888;G3:M8)))
to calculate the total sum of the weight values, but this doesn't help much here.
Can this even be done with standard functions? If not, what could a solution in VBA look like?
If your "weight" and "sell" columns are always two columns apart, then you can use this array formula which looks for the "weight" column and then multiplies it by the column 2 cells to the right:
hdrs refers to the range $A$1:$I$1, which contains the headers. It could also refer to the entire row, or a much larger portion of row 1.
=SUM(IFERROR(SEARCH("*weight*",hdrs)*A2:G2,0) * IFERROR(SEARCH("*weight*",hdrs)*C2:I2,0))
If there might be a variable number of columns between "weight" and "sell", then you can try this array formula which looks for the "weight" and "sell" columns separately:
=SUM(INDEX(A2:I2,1,N(IF(1,AGGREGATE(15,6,SEARCH("*weight*",hdrs)*COLUMN(hdrs),ROW(INDIRECT("1:"&COUNTIF(hdrs,"*weight*")))))))*INDEX(A2:I2,1,N(IF(1,AGGREGATE(15,6,SEARCH("*sell*",hdrs)*COLUMN(hdrs),ROW(INDIRECT("1:"&COUNTIF(hdrs,"*weight*"))))))))
Since this is an array formula, you need to "confirm" it by holding down Ctrl + Shift while hitting Enter. If you do this correctly, Excel will place braces {...} around the formula, as shown in the formula bar.
Note: I just noticed you want to match "weight1", so just make the obvious change in the above formulas.
Here is a formula that should do the matching in the way that you're thinking:
=SUM(A2:I2*ISNUMBER(FIND("weight",A1:I1))*IFERROR(INDEX(A2:I2,N(IF({1},MATCH("*sell"&RIGHT(A1:I1,LEN(A1:I1)-FIND("weightl",A1:I1)-6),A1:I1,0)))),0))
It must be entered as an array formula using Ctrl+Shift+Enter.
Note I'm finding the 'sell' header which matches the 'weightl' header, so weightl1_1_4 will match with sell1_1_4 etc., but I'm now wondering if this is necessary - maybe the weight just matches with the next sell, which would be easier.
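To make the matching logic in the formulas above easier to follow, here is a minimal Python sketch of the same idea: pair each "weightl" header with the "sell" header that carries the same suffix, then sum the products for one row. The function name revenue and the sample headers and values are hypothetical and only chosen to reproduce the 2*3+3*6+2*3 = 30 example from the question.

```python
def revenue(headers, row, weight_key="weightl", sell_key="sell"):
    """Sum weight * sell over header pairs that share the same suffix."""
    # Map each "sell" suffix (the text after the keyword) to its column index.
    sell_cols = {}
    for i, header in enumerate(headers):
        pos = header.find(sell_key)
        if pos >= 0:
            sell_cols[header[pos + len(sell_key):]] = i

    total = 0
    for i, header in enumerate(headers):
        pos = header.find(weight_key)
        if pos < 0:
            continue  # not a weight column
        j = sell_cols.get(header[pos + len(weight_key):])
        if j is not None:  # skip weights without a matching sell column
            total += row[i] * row[j]
    return total


# Hypothetical header row and data row (not taken from the original sheet).
headers = ["id", "weightl1_1_1", "x", "sell1_1_1",
           "weightl1_1_2", "sell1_1_2", "weightl1_1_3", "sell1_1_3"]
row = [7, 2, 99, 3, 3, 6, 2, 3]

print(revenue(headers, row))  # 30
```

Matching by suffix means the number of columns between a weight and its sell column does not matter, which corresponds to the third formula rather than the fixed two-column offset of the first.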
I have a (large) array of data in Excel of which I need to compute the average value of certain values in one column, based on the values of another column. For example, here's a snippet of my data:
So specifically, I want to take the average of the F635 mean values corresponding with Row values of 1. To take it a step further, I want this to continue to Row values of 2, Row values of 3 etc.
I'm not familiar with how to run code in Excel but have attempted to solve this by using the following:
=IF($C = "1", AVERAGE($D:$D), "")
which (to my understanding) can be interpreted as "if the values (anywhere) in column C are equal to 1, then take the average of the corresponding values in column D."
Of course, as I try this I get a formula error from Excel.
Any guidance would be incredibly appreciated. Thanks in advance.
For more complicated cases, I would use an array formula. This one is simple enough for the AVERAGEIF function. For instance: =AVERAGEIF(A1:A23;1;B1:B23)
Array formulas allow for more elaborate IFs. To replicate the above, you could use =SUM(IF($A$1:$A$23=1;$B$1:$B$23;0))/COUNT(IF($A$1:$A$23=1;$B$1:$B$23)). Note that the IF inside COUNT has no else-value: a 0 there would itself be counted as a number and inflate the denominator.
It looks like more work, but you can build extremely elaborate conditions this way. Instead of hitting ENTER, press CTRL+SHIFT+ENTER when entering the formula. Use * between criteria to replicate AND, or + for OR. Example: =SUM(IF(($A$1:$A$23="apple")*($B$1:$B$23="green");$C$1:$C$23;0)) tallies values for green apples in C1:C23.
Your sample data includes three columns with potential ifs so my guess is that you're going to need array formulas at some point.
Excel already has a built-in function for exactly this use: AVERAGEIF().
=AVERAGEIF(C:C,1,D:D)
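For readers who want to check what AVERAGEIF computes, the following Python sketch mirrors it under simple assumptions: two equal-length lists stand in for columns C and D, and the values in them are made up for illustration. The helper name average_if is hypothetical.

```python
def average_if(criteria_values, criterion, values):
    """Average the entries of `values` whose paired criteria value matches."""
    matched = [v for c, v in zip(criteria_values, values) if c == criterion]
    # Return None when nothing matches (Excel shows #DIV/0! in that case).
    return sum(matched) / len(matched) if matched else None


# Hypothetical stand-ins for the "Row" column and the "F635 mean" column.
rows = [1, 1, 2, 2, 2, 3]
f635_mean = [10.0, 20.0, 5.0, 7.0, 9.0, 4.0]

# One average per distinct Row value, i.e. Row 1, then Row 2, then Row 3.
averages = {r: average_if(rows, r, f635_mean) for r in sorted(set(rows))}
print(averages)  # {1: 15.0, 2: 7.0, 3: 4.0}
```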
Given a list of days such as:
01-giu-16
01-giu-16
01-giu-16
31-mag-16
31-mag-16
31-mag-16
31-mag-16
30-mag-16
I am looking for an Excel formula that counts the number of unique days in the list (in this example, 3).
Moreover, I need the count only for the dates that have a specific ID in the next column (for example, 1565).
Without any additional criteria, you can achieve the uniqueness count by using
=SUMPRODUCT(1/COUNTIF(A1:A8,A1:A8)), assuming your data are in the range A1:A8.
To evaluate subject to additional criteria (suppose they are in column B), use
{=SUM(--(FREQUENCY(IF(B1:B8=1565,MATCH(A1:A8,A1:A8,0)),ROW(A1:A8)-ROW(A1)+1)>0))}
This is an array formula: use Ctrl + Shift + Return once you're done editing (and don't type the curly braces yourself). Personally though I think this exceeds the reasonable threshold for complexity: I'd be inclined to adopt the first approach on a column that represents an intermediate transformation of your input data.
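As a cross-check for the two formulas above, this short Python sketch counts distinct dates with and without the ID condition. The date list is the one from the question; the ID column is not shown there, so the ids list below is an invented assumption.

```python
# Dates copied from the question (Italian month abbreviations kept as-is).
dates = ["01-giu-16"] * 3 + ["31-mag-16"] * 4 + ["30-mag-16"]

# Hypothetical ID column, one entry per date row.
ids = [1565, 1565, 2000, 1565, 2000, 2000, 1565, 2000]

# Equivalent of =SUMPRODUCT(1/COUNTIF(A1:A8,A1:A8)): number of distinct dates.
unique_all = len(set(dates))

# Equivalent of the FREQUENCY/MATCH array formula: distinct dates for one ID.
unique_for_id = len({d for d, i in zip(dates, ids) if i == 1565})

print(unique_all, unique_for_id)  # 3 2
```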
Let's assume your data is in column A and has a header row, so the first data value is actually in A2. Place this formula in B2 and copy it down beside your list. It will generate a list of the unique values from column A. Once you have the list, you simply need a function to count the size of it.
=IFERROR(INDEX($A$2:$A$9,MATCH(0,INDEX(COUNTIF($B$1:B1,$A$2:$A$9),0,0),0)),"")
In C2 you can use the following formula to get the number of unique values:
=COUNTA(B2:B9)-COUNTIF(B2:B9,"")
In D2 you can use the following formula to get the count of each unique value from your original list. Copy it down as far as you need to go.
=IF(B2="","",COUNTIF($A$2:$A$9,B2))
I'm working on data from a population of people with allergies. Each person has a unique ExceptionID, and each allergen has a unique AllergenID (451 in total).
I have a data table with 2 columns (ExceptionID and AllergenID), where each person's allergies are listed row by row. This means that the ExceptionID column has repeated values for people with multiple allergies, and the AllergenID column has repeated values for the different people who have that allergy.
I am trying to count how many times each pair of allergies is present in this population (e.g. Allergen#107 & Allergen#108, Allergen#107 & Allergen#109,etc). To keep it simple I've created a matrix of 451 rows X 451 columns, representing every pair (twice actually because A/B and B/A are equivalent).
I somehow need to use the row name (AllergenID) to look up the ExceptionIDs in my data table, and count the cases where they match the ExceptionIDs for the column name (also an AllergenID). I have no problem using VLOOKUP or INDEX/MATCH, but I'm struggling with the correct combination of a lookup and a SUMPRODUCT or COUNTIF formula.
Any help is greatly appreciated!
Mike
PS I'm using Excel 2016 if that changes anything.
-=UPDATE=-
So the methods suggested by Dirk and MacroMarc both worked, though I couldn't apply the latter to my full data set (17,000+ rows) because it was taking a long time.
I've since decided to turn this into a VBA macro because we now want to see the counts of triplets instead of pairs.
With only the 2 columns you start with, it is nearly impossible: you would need to check whether each ExceptionID has 2 specific, different AllergenIDs. It is better to use a helper table with ExceptionID as rows and AllergenID as columns (or the other way round, whichever you prefer). The helper table needs a formula like:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
This can then be auto-filled. (The ranges are from my example; you need to change them to fit your data.)
With this helper matrix you can easily fill your bigger matrix like this:
=COUNTIFS(E:E,1,INDEX($E:$G,,MATCH($I2,$E$1:$G$1,0)),1)
Again, you can auto-fill this formula, but you need to adapt it to your data.
Because the columns use the same ID2 (your AllergenID), there is no need to look them up: E:E shifts automatically as you auto-fill.
The most important part of the formulas is the $ signs: do not change them, or the auto-fill will no longer work.
Picture of my self-made example (formulas are from the upper left cell in each table):
If you still have any questions, just ask :)
It can be done straight from your original set-up with array formulas:
Please note that array formulas MUST be entered with Ctrl-Shift-Enter, before copying across and down:
In the example pic, I have NAMED the data ranges $A$2:$A$21 as 'People' and $B$2:$B$21 as 'Allergens' to make it a nicer set-up. You can see in the formula bar how that looks as a formula. However you could use the standard references like this in your first matrix cell:
EDIT: silly me, the N function is not needed to turn the booleans into 1's and 0's, since multiplying booleans does the trick. The formula below works:
=SUM(IF(MATCH($A$2:$A$21,$A$2:$A$21,0)=ROW($A$2:$A$21)-1, NOT(ISERROR(MATCH($A$2:$A$21&$E2,$A$2:$A$21&$B$2:$B$21,0)))*NOT(ISERROR(MATCH($A$2:$A$21&F$1, $A$2:$A$21&$B$2:$B$21,0))), 0))
Then copy from F2 across and down. It can be perhaps improved in technique with sumproduct or whatever, but it's just a rough example of the technique....
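Since the update mentions moving to a macro and counting triplets as well as pairs, here is a hedged Python sketch of the underlying counting logic (not VBA, and not the macro the asker wrote). It groups allergens per person and counts every combination of a given size. The function name co_occurrence and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence(records, size=2):
    """Count how many people have each combination of `size` allergens."""
    # Collect the set of allergens for each person (ExceptionID).
    by_person = defaultdict(set)
    for exception_id, allergen_id in records:
        by_person[exception_id].add(allergen_id)

    counts = Counter()
    for allergens in by_person.values():
        # Sorting makes (107, 108) and (108, 107) the same key.
        counts.update(combinations(sorted(allergens), size))
    return counts


# Hypothetical (ExceptionID, AllergenID) rows.
records = [(1, 107), (1, 108), (1, 109),
           (2, 107), (2, 108),
           (3, 108), (3, 109)]

pairs = co_occurrence(records)
triplets = co_occurrence(records, size=3)
print(pairs[(107, 108)], pairs[(107, 109)], pairs[(108, 109)])  # 2 1 2
print(triplets[(107, 108, 109)])  # 1
```

Each person is visited once, so the work grows with the number of allergens per person rather than with the 451 x 451 matrix size, which is one reason a procedural approach may scale better than the array formula on 17,000+ rows.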
I have a 2-D array: dates on a horizontal axis and identification numbers on a vertical axis.
I want the sums conditioned on a particular date and ID, and I want to know how to do this using SUMIFS.
For some reason, it seems like I cannot since the array is 2-D while the criteria ranges are 1-D. Can anyone give me any advice on other formulas I can use?
In other words, I would like to add the values that satisfy the ID and date I select; there is one or more data point that satisfies the conditions. This is why the SUMIF function is relevant.
With this data you will not be able to use a SUMIF formula. Here's a formula you can use:
=SUM(IF($B$2:$B$6=C9,IF($F$1:$K$1=B9,$F$2:$K$6)))
Change the addresses where appropriate, and be sure to enter it by pressing CTRL + SHIFT + ENTER. You can also use the formula below to avoid pressing CTRL + SHIFT + ENTER:
=SUMPRODUCT(($B$2:$B$6=C9)*($F$1:$K$1=B9)*$F$2:$K$6)
Assuming that you're looking for an intersection of an ID and a Date, you can use the following:
=INDIRECT(ADDRESS(MATCH([ID Number],A:A,0),MATCH([Date],1:1,0)))
INDIRECT allows you to type in an address as plain text and returns the value
ADDRESS turns the numbers for rows and columns into a regular address
MATCH finds where in a row or column a given value is located.
I just wanted to add that the array version of the 2D summation in the answer above
=SUM(IF($B$2:$B$6=C9,IF($F$1:$K$1=B9,$F$2:$K$6)))
will work better if your data table $F$2:$K$6 has blanks (or other non-numeric values), because it will sum only the values that match the criteria specified by $B$2:$B$6=C9 and $F$1:$K$1=B9, and ignore all others.
Generally, you probably will not have blanks or other non-numeric values in your data table but I just wanted to throw this out there in case it helps someone. It certainly helped me, and I had fun playing with both 2D summation examples above. :)
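The two-dimensional conditional sum discussed in this thread can be sketched in Python as follows. This is only an illustration of the logic: IDs label the rows, dates label the columns, and a cell is added when both labels match. Like the SUM(IF(...)) version, it skips blanks and other non-numeric cells. The name sum_2d and all sample data are hypothetical.

```python
def sum_2d(ids, dates, table, want_id, want_date):
    """Sum the cells whose row ID and column date both match."""
    return sum(
        cell
        for row_id, row in zip(ids, table)
        if row_id == want_id
        for date, cell in zip(dates, row)
        # Ignore blanks (None) and text, as SUM(IF(...)) would.
        if date == want_date and isinstance(cell, (int, float))
    )


# Hypothetical layout: one ID per row, one date per column.
ids = ["A", "B", "A"]
dates = ["2020-01-01", "2020-01-02", "2020-01-01"]
table = [
    [1, 2, 3],
    [4, 5, 6],
    [7, None, 9],
]

print(sum_2d(ids, dates, table, "A", "2020-01-01"))  # 20
print(sum_2d(ids, dates, table, "A", "2020-01-02"))  # 2
```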
The top table is the input, and the bottom table is a preview of the required output.
For each ID I need to find the earliest datetime. I also need the information from the other columns (please see the image below).
My current solution is:
In cell E2: =A2
In cell E3, dragged down: =IF(E2<>A3,IF(E1=A3,"",A3),"")
In cell F2, dragged down and confirmed with Ctrl+Shift+Enter: =IF(E2<>"",MIN(IF($A$2:$A$14=E2,$C$2:$C$14)),"")
One more option without any intermediate calculations:
E2 (ID): select the whole range from E2 down to the last row where IDs are located (row 14 in the sample, so select E2:E14), enter =IFERROR(INDEX($A$2:$A$14,SMALL(IF(MATCH($A$2:$A$14,$A$2:$A$14,0)=ROW(INDIRECT("1:"&ROWS($A$2:$A$14))),MATCH($A$2:$A$14,$A$2:$A$14,0),""),ROW(INDIRECT("1:"&ROWS($A$2:$A$14))))),"") and press CTRL+SHIFT+ENTER instead of the usual ENTER - this will define a multicell ARRAY formula and will result in curly {} brackets around it (but do NOT type them manually!).
F2 (ID2): =IF(E2="","",SUMPRODUCT(--(E2=$A$2:$A$14),--(G2=$C$2:$C$14),$B$2:$B$14)) - normal formula.
G2 (Min Date): =IF(E2="","",MIN(IF(E2=$A$2:$A$14,$C$2:$C$14,2^100))) and press CTRL+SHIFT+ENTER instead of usual ENTER - this will define an ARRAY formula and will result in curly {} brackets around it (but do NOT type them manually!).
H2 (InCh): =IF(E2="","",INDEX($D$2:$D$14,SUMPRODUCT(--(E2=$A$2:$A$14),--(F2=$B$2:$B$14),--(G2=$C$2:$C$14),ROW(INDIRECT("1:"&ROWS($D$2:$D$14)))))) - normal formula.
Remarks:
To make the solution more compact and easy to read, define named range for ID column, and then reference other data columns using OFFSET.
ID2 values may not be unique - as they are on the sample for IDs 1...3.
Resulting set for Min Date should be formatted the same way as source Date row.
The key formula of the solution is the multicell monster, which returns unique IDs without empty rows (as the OP requested).
Sample file: https://www.dropbox.com/s/d2098updfh8djnf/MinDateIDs.xlsx
This is quite a challenge... I think I have found an approach that works. For the sake of clarity, I used a few helper columns. Also, I did not use any named ranges but stuck with the column-row indications. You might want to change that.
It looks like this:
and zooming in to the relevant columns:
Column F contains an array formula to filter out duplicates. An approach is explained here. The formula I used in F2 is
=INDEX($A$2:$A$14, MATCH(MIN(IF(COUNTIF($F$1:F1,$A$2:$A$14)=0, 1, MAX((COUNTIF($A$2:$A$14, "<"&$A$2:$A$14)+1)*2))*(COUNTIF($A$2:$A$14, "<"&$A$2:$A$14)+1)), COUNTIF($A$2:$A$14, "<"&$A$2:$A$14)+1, 0))
Use Ctrl-Shift-Enter to confirm as array formula. Drag this down or copy into column F. Then columns G and H contain the starting and ending indices of the duplicate ID values. This answer helped, please upvote it :-). The two formulas used are:
=MATCH(2,1/FREQUENCY($F2,$A$2:$A$14))
in G2, and
=FREQUENCY($A$2:$A$14,$F2)
in H2. Again, drag them down to get the full column filled. Next, column I is for clarification only -- and for sanity checking. It contains the desired minimum date from each sub-array. Column J substitutes that formula into a MATCH to find the actual index of the desired date.
=MIN(OFFSET($C$2:$C$14,$G2-1,0,1+$H2-$G2,1))
in I2 and
=$G2-1+MATCH(2,1/FREQUENCY(MIN(OFFSET($C$2:$C$14,$G2-1,0,1+$H2-$G2,1)), OFFSET($C$2:$C$14,$G2-1,0,1+$H2-$G2,1)))
in J2. Finally, columns L, M and N index into the original set of data via
=INDEX(B$2:B$14,$J2)
in L2, which you can drag horizontally and then vertically.
When you are done, you can hide the helper columns, or fold everything into big formulas. Good luck with that... There might be an easier way to achieve this, but I did not find it.
If you want the value from column D in G then assuming that column C values are unique you could just use a VLOOKUP, i.e. in G2 copied down
=VLOOKUP(F2,C$2:D$14,2,0)
Per your picture, they're all in the same sheet. Just sort by ID, then Date (ascending). As you work your way down the ID column, each time the ID changes, you know you've found the row with the minimum Date for that specific ID. Create an extra column to signify where ID changes occur, and filter for those rows (hide the column if you so desire).
And... voila.
I know this thread is old, but there is a much shorter and easier way!
How about using a pivot table with Minimum as the field setting, and then using =GETPIVOTDATA() to get the information back?
That seems a lot simpler than these formulas!
Actually, I just realized I've been overthinking this... Excel keeps the top item and removes all that follow when removing duplicates.
So if you are going to create an extra working table anyway, why not just copy the range/columns you want to keep, then use the basic sort.
Sort first by ID, then by the column you want as the second filter. Be sure the sorts are in the order you want (e.g. newest to oldest, oldest to newest, A to Z, Largest to smallest, etc).
Once the data is sorted, remove duplicates based on ID. You are left with all of your columns of data, filtered by newest/oldest/largest/smallest per individual.
This worked for my table with 30,000+ records, filtered down to 1500 unique individuals with most recent (plus associated amount), and with a second filter, the largest (plus associated date) for each person.
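The "earliest row per ID" result that the formula-based, sort-based, and pivot-table answers all aim for can be expressed compactly in Python. The sketch below is an illustration only: rows are tuples of (ID, ID2, datetime, InCh), following the column names used in one of the answers, and the function name earliest_per_id and the sample values are hypothetical.

```python
from datetime import datetime


def earliest_per_id(rows):
    """Keep, for each ID, the full row with the earliest datetime."""
    best = {}
    for row in rows:
        key, when = row[0], row[2]
        # Keep the first row seen for an ID, or replace it with an earlier one.
        if key not in best or when < best[key][2]:
            best[key] = row
    # Dicts preserve insertion order, so IDs come out in first-seen order.
    return list(best.values())


# Hypothetical input: (ID, ID2, datetime, InCh).
rows = [
    (1, 10, datetime(2015, 1, 3, 9, 0), "a"),
    (1, 11, datetime(2015, 1, 1, 8, 0), "b"),
    (2, 10, datetime(2015, 2, 1, 12, 0), "c"),
    (2, 12, datetime(2015, 2, 1, 11, 0), "d"),
    (3, 13, datetime(2015, 3, 5, 7, 0), "e"),
]

result = earliest_per_id(rows)
print([(r[0], r[1], r[3]) for r in result])
# [(1, 11, 'b'), (2, 12, 'd'), (3, 13, 'e')]
```

This is the same idea as sorting by ID and then by date and removing duplicates on ID, except that it needs a single pass and leaves the input order untouched.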