Select a subset of a table for further processing - excel

In Microsoft Excel, how can I compute a range (portion of a column), based on the values in another column of the same table, returning the result in an Array form for further processing by other functions?
In SQL, what I mean is "SELECT field1 FROM table WHERE field2=value".
The selected results will be fed (twice) to FREQUENCY(), to compute the number of distinct entries in "field1". That is: given an existing table like this:
Box Date
1 07/01/12
13 07/01/12
13 07/01/12
27 07/18/12
13 07/18/12
55 07/18/12
I want to produce a resulting table like this:
Boxes Date
2 07/01/12
3 07/18/12
Note that "13" is only counted once in the first date ("distinct"), but it's still counted again in the second date.
I already have an expression that does the right thing for the whole of the table,
=SUM(N(FREQUENCY(Box,Box)>0))
where "Box" is a named range of the first table, consisting of the whole Box column. (Using the same range/array/list as the data and the bins for FREQUENCY is a stupefyingly subtle trick actually contained in the Excel help but -- alas! -- by no means adequately explained.)
But I want (several) subsets, one for each date. I want to expand my "SUM(N(FREQUENCY…" expression to act only on the rows of the first table whose Date column matches the Date column of the row being computed. That is, again resorting to SQL,
SELECT COUNT(DISTINCT t1.Box), t2.Date
FROM `t1` JOIN `t2` USING (Date)
GROUP BY t2.Date
I can even build a pivot table of the interesting values (which gets me counts in its cells), then use a parallel, date-indexed column of
=COUNTIF(…)
to reduce each row of counts down to a single count of uniques for that date. But this requires me to update the pivot table to notice new data in the base table, and then to drag-expand the column of answers to include the new date (or suffer ugly value error markers). So something more automatic, less fussily manual, would be sweet.

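I guess this was not available when you asked, but Excel 2013 has Distinct Count as an option in a PivotTable (it is offered under Value Field Settings when the source data has been added to the Data Model).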
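For readers who want to check the expected numbers outside Excel, here is a minimal Python sketch of the grouping the question describes: a distinct count of Box per Date. The rows are the question's sample table; the function name `distinct_boxes_per_date` is hypothetical and does not correspond to any Excel feature.

```python
from collections import defaultdict

# Sample rows from the question: (Box, Date)
rows = [
    (1, "07/01/12"),
    (13, "07/01/12"),
    (13, "07/01/12"),
    (27, "07/18/12"),
    (13, "07/18/12"),
    (55, "07/18/12"),
]


def distinct_boxes_per_date(rows):
    """Return {date: number of distinct boxes seen on that date}."""
    boxes_by_date = defaultdict(set)
    for box, date in rows:
        boxes_by_date[date].add(box)  # a set keeps each box once per date
    return {date: len(boxes) for date, boxes in boxes_by_date.items()}


result = distinct_boxes_per_date(rows)
print(result)  # {'07/01/12': 2, '07/18/12': 3}
```

Box 13 is counted once for 07/01/12 and once more for 07/18/12, which matches the desired output table.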


Filter rows containing date ranges to show only those where a certain date is in the range

I have this table:
Valid From,Valid To,Label
1 Jan 2021,7 Jul 2021,A
1 Feb 2021,7 Jul 2021,B
1 Mar 2021,7 Jul 2021,C
I have a "parameter" cell which contains a single date, i.e. 3 Feb 2021. This is what I consider the "test date". If the test date is in the "valid" range, the row is valid.
Expected output:
Valid From,Valid To,Label
1 Jan 2021,7 Jul 2021,A
1 Feb 2021,7 Jul 2021,B
Row C is missing, since it's not valid, yet.
I found a lot of examples of how to filter tables by a date range (when you have a single date column, the autofilter does all I need), but my case is the opposite: each row has a date range in which it's valid, and I want to create a read-only view which answers the question "If today was SOME_DATE, which rows would be valid?"
Ideally, I want the raw data in one sheet and the view plus the "today" cell in another sheet.
What I've managed so far is a view that uses a couple of helper columns:
E F G
Visible?,Has Content?,Is In Range?
with formulas for Visible?:
=AND(F2,G2)
(TRUE if the other two are true)
Has Content?:
=COUNTA(SourceTable!A2:C2)>0
(TRUE if the source table has any data in any cell for the current row)
and lastly for "Is In Range?":
=AND($C$2>=SourceTable!A2, $C$2<SourceTable!B2)
For each data cell, the formula is:
=IF($E2,SourceTable!A2,"")
(empty unless the cell in the column "Visible?" is TRUE).
This works but has a few drawbacks:
When more rows are appended to the source table, the view isn't updated. I can work around that by adding 1000 more rows to the view but it feels like a waste of space.
It gets worse when rows are inserted in the middle because Excel then updates the references in my view. The new rows won't show up and I'll suddenly have gaps (i.e when I insert row 10 in the source table, the view will use A9 and then A11, A10 will be missing). Workaround: Use absolute references everywhere.
Lastly, I have those ugly helper columns. Workaround: Hide those columns.
I can get something working but I was wondering if there is a better way?
I'm on Excel 365, Version 2108.
Well this is at least a first approximation to an answer:
=FILTER(SourceTable!A:C,(ROW(SourceTable!A:C)=1)+(C2>=SourceTable!A:A)*(C2<=SourceTable!B:B))
If you can have either the start date or the finish date (or even both) as blank, indicating an open-ended date range, it will need modifying. The downside of this is that you need to manually format columns E and F as dates.
The formula is very similar to the examples here. The FILTER function lets you set up a filtering array with the same number of rows as your data array, where rows in the filtering array that evaluate to true will cause the matching row of the data to be included in the output. In an Excel sheet, anything that is not zero is considered to be true. In array formulas, you can't use AND and OR because they evaluate across the whole array and not row by row, so you have to use * and + instead.
Actually, hard-coding row 1 is a little crude; this version isn't vulnerable to the insertion of lines above the data in SourceTable:
=FILTER(SourceTable!A:C,(ROW(SourceTable!A:C)=ROW(SourceTable!A1))+(C2>=SourceTable!A:A)*(C2<=SourceTable!B:B))
So I have chosen to use
ROW(SourceTable!A:C)=ROW(SourceTable!A1)
to include the first (header) row of the data, where row(A:C)=row(A1)=1: this only evaluates to TRUE for the first row which I am assuming initially is in the first row of the SourceTable sheet. The reason for using ROW(SourceTable!A1) rather than hard-coding 1 is that if the user inserted a row above the rows containing the data, A1 would automatically update to A2 and the formula would still work.
Then I add to this (OR) the same expression that you used,
(C2>=SourceTable!A:A)*(C2<=SourceTable!B:B)
but with the AND replaced by *.
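The row-by-row logic of that filtering array can be sketched in Python. This is only an illustration of the mask arithmetic (multiplication as AND, addition as OR), not of how Excel evaluates FILTER internally; the `sheet` data mirrors the question's table and `filter_rows` is a hypothetical name.

```python
from datetime import date

# Hypothetical copy of the source sheet, header row included.
sheet = [
    ("Valid From", "Valid To", "Label"),
    (date(2021, 1, 1), date(2021, 7, 7), "A"),
    (date(2021, 2, 1), date(2021, 7, 7), "B"),
    (date(2021, 3, 1), date(2021, 7, 7), "C"),
]
test_date = date(2021, 2, 3)


def filter_rows(sheet, test_date):
    """Keep the header row plus every row whose range contains test_date."""
    kept = []
    for index, row in enumerate(sheet):
        is_header = int(index == 0)  # plays the role of ROW(...)=ROW(A1)
        if is_header:
            # Python cannot compare a date with header text, so the
            # range test is skipped for the header row.
            in_range = 0
        else:
            valid_from, valid_to, _label = row
            # Multiplying two 0/1 values acts as AND.
            in_range = int(test_date >= valid_from) * int(test_date <= valid_to)
        # Adding two 0/1 values acts as OR: any non-zero sum keeps the row.
        if is_header + in_range:
            kept.append(row)
    return kept


view = filter_rows(sheet, test_date)
print([row[2] for row in view])  # ['Label', 'A', 'B']
```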
I have used full-column references to avoid issues with deletions and insertions, but this may slow down your sheet a little. If there are no gaps in your label column, it would be possible to use an INDEX/COUNTA combination to restrict the size of the arrays to the number of rows actually used.
=LET(rows,COUNTA(SourceTable!C:C),
data,SourceTable!A2:INDEX(SourceTable!C:C,rows),
date1,SourceTable!A2:INDEX(SourceTable!A:A,rows),
date2,SourceTable!B2:INDEX(SourceTable!B:B,rows),
FILTER(data,(ROW(data)=ROW(SourceTable!A1))+(C2>=date1)*(C2<=date2)))
Note
This formula is indeed much faster than the original, 0.003 vs 0.2 seconds.

Calculated Fields in excel pivot

I have a column ('CSAT') in a sheet that has the numbers 1 and 0 in each cell. '1' represents 'Satisfied' and '0' represents 'Dissatisfied'. I want to make a pivot from this sheet and have a new calculated field in it ('CSAT %') that will give me the score by dividing the (Total 'Satisfied') count by (Total 'Dissatisfied' + Total 'Satisfied') and multiplying by 100.
I tried with COUNTIF, but I don't think we can use this formula in a pivot.
Calculated Fields and Items in PivotTables are tricky. The main tripping point is understanding that Calculated Fields and Items operate on the totals, not on the individual values in the underlying data.
For example, if you created a new Field that was equal to Field1 * Field2 and data is being summarized by SUM, Excel doesn't multiply all of the respective values in each field and then sum the results. It first sums the fields for each category and then multiplies those results. What it's really doing is SUM(Field1) * SUM(Field2) for each category.
You can use some worksheet functions in the calculated fields, but you have to remember you're still operating on the totals. So if you created a new Field that was equal to Count(Field1) * Count(Field2), you're (almost) always just going to get an answer of 1. This is because the calculation is actually doing Count(SUM(Field1)) * Count(SUM(Field2)) for each category. The sum of each field is a single number, so the calculation is just doing 1*1 for each category.
So for this reason, you can't use aggregating functions like SUMIF or COUNTIF, which need to look at individual elements. Since you need to look at individual elements, you actually can't use a Calculated Field for your solution at all.
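The difference between operating on individual values and operating on totals is easy to see with a small numeric sketch. The two lists below are made-up values for a single category, not data from the question.

```python
# Made-up values of two fields for one pivot category.
field1 = [2, 3, 4]
field2 = [10, 20, 30]

# What one might expect: multiply row by row, then sum.
row_by_row = sum(a * b for a, b in zip(field1, field2))

# What a calculated field does when summarizing by SUM:
# total each field first, then multiply the totals.
on_totals = sum(field1) * sum(field2)

print(row_by_row)  # 200
print(on_totals)   # 540
```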
What you can do is use a Calculated Item!
The main catch here is you can't use any field in more than 1 location when calculated items are involved. Excel just throws an error message saying you're not allowed.
So if you have a category column as well as the CSAT column, you need to create another dummy column full of 1's to operate on.
You can then set up pivot table as follows:
Category field to Rows.
Dummy field to Data area, summarized by Sum
CSAT field to Columns
Click on the CSAT column headers in the pivot table and choose: PivotTable Tools > Fields, Items, & Sets > Calculated Item
Set Name for your new Item to CSAT%
Enter the formula: ='1'/('0'+'1')
On the CSAT field, hide items 1 and 0, so only the CSAT% item is visible
Result:
A couple of notes:
When entering fields and items in calculated fields and items, do so by placing the cursor where you want in the Formula then double clicking on the field/item name from the lists below. This will add brackets and quotes as required in the correct format.
Note that the formula doesn't need SUM around the item names, because calculated fields/items always work on the total of values. They are totalled according to how the data is summarized in the pivot table.
The dummy column was added with all values of 1 so that summing these values gives you the count, from which the percentage can be calculated using the formula specified.
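As a cross-check of what the calculated item '1'/('0'+'1') should produce, here is a minimal Python sketch. The category names and records are hypothetical; each record stands for one source row whose dummy value is 1.

```python
from collections import defaultdict

# Hypothetical source rows: (Category, CSAT)
records = [
    ("North", 1),
    ("North", 0),
    ("North", 1),
    ("North", 1),
    ("South", 0),
    ("South", 1),
]


def csat_share(records):
    """Return {category: satisfied / (dissatisfied + satisfied)}."""
    counts = defaultdict(lambda: {0: 0, 1: 0})
    for category, csat in records:
        counts[category][csat] += 1  # summing the dummy 1's gives a count
    return {
        category: n[1] / (n[0] + n[1])
        for category, n in counts.items()
    }


shares = csat_share(records)
print(shares)  # {'North': 0.75, 'South': 0.5}
```

The result is a fraction; in the pivot table it can be shown as a percentage through number formatting.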
Answer without using calculated fields:
Assuming you have categories in the row fields, you can put CSAT as a column field as well as a data field then choose to summarize values by Count and show values as a percentage of row totals:
After putting CSAT in column and data fields, right click on the data and select Summarize Values By > More Options...
First choose to Summarize Values By Count:
Then click Show Values As tab and select % of Row Total:
You'll then have percentage of 1's under the CSAT=1 column:

Two-level sorting in Excel with one as Number and second as date

So I have a sheet with almost 50 columns and I have to sort its rows according to only two columns, say, ID and DATE, where ID is a number.
What I want is for the data to be sorted first by ID (ascending) and then by DATE (descending, newer date first).
The problem is that, whatever I try, Excel sorts the DATE column in ascending order, i.e. the earlier date comes first rather than last.
Can anyone suggest more ideas?
Solution 1: Check value type
Check the type of the date column. It has to be a number to be sorted correctly.
The safest version-independent check:
Select an unused column in your table (in the following, assuming that your date column is 'B' and the data rows start in row 2)
put the formula "=TYPE(B2)" in that cell.
If the result is 1, your Date value is a number (at least in that row)
If the result is 2, your Date value is a string that might behave unexpectedly when you sort.
Solution 2: Make sure the values are numbers
(Caution: Before starting, make sure that you have backed up your data.)
Put the number 1 into an empty cell somewhere on your sheet
Copy the cell (important: the cell itself, not only the formula)
Select all cells in your date column
Right-click and select 'Paste Special'
Select the options 'Values' and 'Multiply'
If the 'Multiply' option is not available immediately within the 'Paste Special' submenu when you right-click (this depends on the Excel version), select the 'Paste Special...' entry in that submenu to get the complete Paste Special dialog.
Hit 'OK'
If (some of) your date values are turned into 5-digit figures, they were indeed strings. Don't panic! Change the number format of the entire date column to the date format of your liking.
Generally:
In the excel sort dialog, you can always select the sorting order. If the data are OK, the result is also.
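The two-level sort, and the reason text dates misbehave, can be illustrated with a short Python sketch. The records are hypothetical, and Python's string ordering is only an approximation of how Excel orders text, but the effect on these sample values is the same.

```python
from datetime import date

# Hypothetical rows: (ID, DATE)
records = [
    (2, date(2021, 1, 5)),
    (1, date(2021, 3, 1)),
    (1, date(2021, 6, 9)),
    (2, date(2021, 2, 7)),
]

# ID ascending, then DATE descending (negating the ordinal reverses the order).
by_id_then_newest = sorted(records, key=lambda r: (r[0], -r[1].toordinal()))
for row in by_id_then_newest:
    print(row)

# Dates stored as text compare character by character, so a "descending"
# sort puts December 2011 ahead of July 2012.
text_dates = ["07/18/12", "12/01/11"]
text_sorted = sorted(text_dates, reverse=True)
print(text_sorted)  # ['12/01/11', '07/18/12']
```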

the result of calculated new column is not right

There is a table of 12 columns. I would like to add an extra column, where each entry stores the average value of the corresponding row across those 12 columns. I use the feature "Calculated new column" to fulfill this task. After getting the result, I noticed that the average value was returned as zero when one of the 12 columns has zero value on that specific row. For other rows, the calculation is just OK if none of the entries in those 12 columns is zero. I attached the screenshot of resulting table and the calculation procedure in the data table properties for your reference. Would you like to let me know the possible reason?
average value was returned as zero when one of the 12 columns has zero
Actually you don't have zero values, but null values, which is different! Use SN() function to manage null values (need to use for all the columns)!
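The effect of substituting nulls before averaging can be sketched in Python. This assumes, as the answer implies, that a null in any operand makes the whole expression null; `sn` below is a hypothetical stand-in that returns a fallback when its first argument is null, and `row_average` is likewise an illustrative name.

```python
def sn(value, fallback):
    """Hypothetical stand-in: return fallback when value is null (None)."""
    return fallback if value is None else value


def row_average(values):
    """Average a row; assumed to yield null if any operand is null."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


row = [3, None, 3]
print(row_average(row))                       # None
print(row_average([sn(v, 0) for v in row]))   # 2.0
```

Note that replacing a null with 0 changes the average (the null now counts as a zero observation), so whether that is the right substitution depends on what the empty cells mean.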

Excel: If Cell in Column = text value of X, then display text (in the same row, but different column) on another sheet

This is a confusing request.
I have an excel tab with a lot of data, for now I'll focus on 3 points of that data.
Team
Quarter
Task Name
In one tab I have a long list of this data displaying all the tasks for all the teams and what Quarter they will be on.
I WANT to load another tab, and take that data (from the original tab) and insert it into a non-list format. So I would have Quarters 1,2,3,4 as columns going across the screen, and Team Groups going down. I want each "task" that is labeled as Q1 to know to list in the Q1 section of that Team's "Block"
So something like this: "If Column A=TeamA,AND Quarter=Q1, then insert Task Name ... here."
Basically, if the formula = true, I want to print a list of those items within that team section of the excel document.
I'd like to be able to add/move things around at the data level, and have things automatically shift in the Display tab. I honestly have no idea where to start.
If there is never a possibility that there could be more than 1 task for a given team and quarter, then you can use a formula solution.
Given a data setup like this (in a sheet named 'Sheet1'):
And expected results like this (in a different sheet):
The formula in cell B2 and copied over and down is:
=IFERROR(INDEX(Sheet1!$C$2:$C$7,MATCH(1,INDEX((Sheet1!$A$2:$A$7=$A2)*(Sheet1!$B$2:$B$7=B$1),),0)),"")
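What that INDEX/MATCH combination computes can be sketched in Python: multiply the two comparisons to get a 0/1 flag per row, return the task from the first row where the flag is 1, and fall back to an empty string. The task data below is hypothetical, since the original screenshots are not available, and `first_task` is an illustrative name.

```python
# Hypothetical source rows: (Team, Quarter, Task Name)
tasks = [
    ("TeamA", "Q1", "Design"),
    ("TeamA", "Q3", "Launch"),
    ("TeamB", "Q1", "Research"),
    ("TeamB", "Q2", "Prototype"),
]


def first_task(tasks, team, quarter):
    """Return the first task matching both criteria, or "" if none."""
    for row_team, row_quarter, task_name in tasks:
        # Product of two comparisons is 1 only when both are true,
        # like (A2:A7=$A2)*(B2:B7=B$1) in the formula.
        if (row_team == team) * (row_quarter == quarter):
            return task_name
    return ""  # plays the role of IFERROR(..., "")


print(first_task(tasks, "TeamB", "Q2"))  # Prototype
print(repr(first_task(tasks, "TeamA", "Q2")))  # ''
```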
I came across this situation. When I have to insert the values into a table from an Excel sheet, I need all the information for a product in one column instead of spread across multiple rows. In Excel my data looks like:
ProductID----OrderID
9353510---- 1212259
9650934---- 1381676
9572474---- 1381677
9632365---- 1374217
9353182---- 1212260
9353182---- 1219361
9353182---- 1212815
9353513---- 1130308
9353320---- 1130288
9360957---- 1187479
9353077---- 1104558
9353077---- 1130926
9353124---- 1300853
I wanted single row for each product in shape of
(ProductID,'OrdersIDn1,OrderIDn2,.....')
As a quick solution, I added a third column, Column C, to number the sales of each product:
=IF(A2<>A1,1,IF(A2=A1,C1+1,1))
and fourth Column D as a placeholder to concatenate with previous row value of same product:
=IF(A2=A1,D1&","&TEXT(B2,"########"),TEXT(B2,"########"))
Then Column E is the final column I required to hide/blank out duplicate row values and keep only the correct one:
=IF(A2<>A3,"("&A2&",'"&D2&"'),","")
Final Output required is only from Column E
ProductID  Order Id  Sno  PlaceHolder              Required Column
9353510    1212259   1    1212259                  (9353510,'1212259'),
9650934    1381676   1    1381676                  (9650934,'1381676'),
9572474    1381677   1    1381677                  (9572474,'1381677'),
9632365    1374217   1    1374217                  (9632365,'1374217'),
9353182    1212260   1    1212260
9353182    1219361   2    1212260,1219361
9353182    1212815   3    1212260,1219361,1212815  (9353182,'1212260,1219361,1212815'),
9353513    1130308   1    1130308                  (9353513,'1130308'),
9353320    1130288   1    1130288                  (9353320,'1130288'),
9360957    1187479   1    1187479                  (9360957,'1187479'),
9353077    1104558   1    1104558
9353077    1130926   2    1104558,1130926          (9353077,'1104558,1130926')
You will notice that the final values appear only on the row with the maximum Sno for each product, which is what I need to avoid duplication.
In your case, Product could be Team, Order could be Quarter, and the output could be
(Team,Q1,Q2,....),
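The helper-column approach above amounts to grouping consecutive rows by ProductID and joining their order IDs. A minimal Python sketch, using a few rows from the sample data (the function name `to_tuples` is hypothetical):

```python
from itertools import groupby

# A few (ProductID, OrderID) rows from the sample, already ordered by product.
rows = [
    (9353510, 1212259),
    (9353182, 1212260),
    (9353182, 1219361),
    (9353182, 1212815),
    (9353077, 1104558),
    (9353077, 1130926),
]


def to_tuples(rows):
    """Build one "(ProductID,'id1,id2,...')" string per run of equal products."""
    output = []
    # groupby only merges adjacent rows, like the A2=A1 comparisons do.
    for product_id, group in groupby(rows, key=lambda r: r[0]):
        order_ids = ",".join(str(order_id) for _, order_id in group)
        output.append(f"({product_id},'{order_ids}')")
    return output


for line in to_tuples(rows):
    print(line)
# (9353510,'1212259')
# (9353182,'1212260,1219361,1212815')
# (9353077,'1104558,1130926')
```

As with the worksheet formulas, this relies on rows for the same product being adjacent.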
Based on my understanding of your summary above, you want to put non-numerical data into a grid of teams and quarters.
The offset worksheet function will work well for this in conjunction with the match or vlookup functions. I have often done this task by doing the following steps.
In the data table, concatenate the Team and Quarter columns so that you have a unique lookup value in the leftmost column of the table (Note: you can eventually hide this column for ease of reading).
Note: You will want to name the input range for best formula management. Ideally use an Excel Table (2007 or greater) or create a dynamically named range with the offset and CountA functions working together (http://tinyurl.com/yfhfsal)
First, VLOOKUP arguments are VLOOKUP(Lookup_Value,Table_Array,Col_Index_num,[Range Lookup]) See http://tinyurl.com/22t64x7
In the first cell of your output area you would have a VLOOKUP formula that would look like this
=Vlookup(TeamName&Quarter,Input_List,Column#_Where_Tasks_Are,False)
The lookup value should reference the cells where you have the team names and quarter names listed down the side and across the top. The input list is from the sheet where you have the data stored. The number three represents the column number the tasks are listed in within your source data, and the False tells the function to use only an exact match in your output.
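The idea of a concatenated Team-and-Quarter key with an exact-match lookup can be sketched in Python with a dictionary. The source rows and the name `vlookup_exact` are hypothetical; the sketch keeps the first row for a repeated key, since an exact-match VLOOKUP returns the first match it finds.

```python
# Hypothetical source rows: (Team, Quarter, Task Name)
source = [
    ("TeamA", "Q1", "Design"),
    ("TeamA", "Q2", "Build"),
    ("TeamB", "Q1", "Research"),
]

# Helper key column: Team & Quarter, as in TeamName&Quarter.
lookup_table = {}
for team, quarter, task in source:
    lookup_table.setdefault(team + quarter, task)  # keep the first match only


def vlookup_exact(key):
    """Exact-match lookup; "#N/A" stands in for Excel's not-found error."""
    return lookup_table.get(key, "#N/A")


print(vlookup_exact("TeamA" + "Q2"))  # Build
print(vlookup_exact("TeamB" + "Q4"))  # #N/A
```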
