Excel list of distinct values - excel

I've got a table with a customer email in column A, and the product they bought in column B - there's around 60k rows.
Data
------------------------------------------------------
Email | Product | Time |
------------------------------------------------------
nayena#gmail.com | P1 | 27/02/2020 18:09:41 |
yenaye#hotmail.com | P2 | 28/02/2020 17:09:32 |
nayena#gmail.com | P1 | 29/02/2020 14:05:46 |
yenaye#hotmail.com | P1 | 29/02/2020 13:02:04 |
yenaye#hotmail.com | P2 | 29/02/2020 20:05:21 |
I'm trying to make two new columns:
One should be how many distinct products the customer has bought.
The other should be a list of those distinct products (preferably ordered by date, which is in another column).
Desired Result
---------------------------------------------------------------------
Email | Product | Time | Orders | n |
---------------------------------------------------------------------
nayena#gmail.com | P1 | 27/02/2020 18:09:41 | P1 | 1
yenaye#hotmail.com | P2 | 28/02/2020 17:09:32 | P1|P2 | 2
nayena#gmail.com | P1 | 29/02/2020 14:05:46 | P1 | 1
yenaye#hotmail.com | P1 | 29/02/2020 13:02:04 | P1|P2 | 2
yenaye#hotmail.com | P2 | 29/02/2020 20:05:21 | P1|P2 | 2
I've tried something like
=FILTER(B:B,A:A=A2)
but it's spilling the values into the cells below, and it doesn't return unique values.
This is a fairly standard operation in Python, but I'd like to know how to do it in Excel in the simplest way possible. I imagine it should be fairly straightforward.
I've tried something like
=AGGREGATE(3,0,FILTER(B:B,A:A=A2))
but I'm not clear on how FILTER is passed to AGGREGATE

Can't quite get what you show for your desired output, if you want the orders sorted by date.
One of your customers has P2 as both the earliest and latest, so no matter which way you sort, it will come out P2|P1
The Orders formula below sorts by Date. (No need for sorting in the n formula)
Orders: =TEXTJOIN("|",TRUE,UNIQUE(INDEX(FILTER(SORT($A$2:$C$6,3,1 ),A2 = INDEX(SORT($A$2:$C$6,3,1 ),0,1)),0,2)))
n: =COUNTA(UNIQUE(INDEX(FILTER($A$2:$C$6,A2=$A$2:$A$6),0,2)))
To replicate your Desired Result, you'd have to sort by Product:
Orders: =TEXTJOIN("|",TRUE,UNIQUE(INDEX(FILTER(SORT($A$2:$C$6,2,1 ),A2 = INDEX(SORT($A$2:$C$6,2,1 ),0,1)),0,2)))
Note:
You can use whole column references, but it increases the calculation time significantly
A Power Query solution is also easily doable, but depends on the specifics of exactly how you want the results presented
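Since the question mentions Python, here is a minimal sketch for cross-checking what the Orders and n formulas should return on the sample data. It is not part of the Excel solution; the names ROWS and distinct_products are made up for illustration, and ordering by first purchase time is an assumption that mirrors the SORT by column 3 above.

```python
from datetime import datetime

# Sample rows from the question: (email, product, time as dd/mm/yyyy hh:mm:ss).
ROWS = [
    ("nayena#gmail.com", "P1", "27/02/2020 18:09:41"),
    ("yenaye#hotmail.com", "P2", "28/02/2020 17:09:32"),
    ("nayena#gmail.com", "P1", "29/02/2020 14:05:46"),
    ("yenaye#hotmail.com", "P1", "29/02/2020 13:02:04"),
    ("yenaye#hotmail.com", "P2", "29/02/2020 20:05:21"),
]


def distinct_products(rows):
    """Map each email to (products joined by '|' in first-purchase order, distinct count)."""
    parsed = [(e, p, datetime.strptime(t, "%d/%m/%Y %H:%M:%S")) for e, p, t in rows]
    parsed.sort(key=lambda r: r[2])  # ascending by time, like SORT(range, 3, 1)
    seen = {}
    for email, product, _ in parsed:
        products = seen.setdefault(email, [])
        if product not in products:  # keep only the first occurrence, like UNIQUE
            products.append(product)
    return {email: ("|".join(ps), len(ps)) for email, ps in seen.items()}


RESULT = distinct_products(ROWS)
```

For yenaye#hotmail.com this gives P2|P1 with a count of 2, which agrees with the point above that a date sort cannot produce P1|P2 for this data.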

Related

EXCEL: SUMIFS criterion applied to a INDEX MATCH search equals a value

I've spent pretty much all day trying to figure this out. I've read so many threads on here and on various other sites. This is what I'm trying to do:
I've got the total sales output. It's large and the number of items on it varies depending on the time frame it's looked at. There is a major lack in the system where I cannot get the figures by region. That information is not stored in the system. The records only store the customer's name, the product information, number of units, price, and purchase date. I want to get the total number of each item sold by region so that I can compare item popularity across regions.
There are only about 50 customers, so it is feasible for me to create a separate sheet assigning a region to the customers.
So, I have three sheets:
Sheet 1: Sales
+-----------------------------------------------------+
|Customer Name | Product | Amount | Price | Date |
-------------------------------------------------------
| Joe's Fish | RT-01 | 7 | 5.45 | 2020/5/20 |
-------------------------------------------------------
| Joe's Fish | CB-23 | 17 | 0.55 | 2020/5/20 |
-------------------------------------------------------
| Mack's Bugs | RT-01 | 4 | 4.45 | 2020/4/20 |
-------------------------------------------------------
| Joe's Fish | VX-28 | 1 | 1.20 | 2020/5/13 |
-------------------------------------------------------
| Karen's \/ | RT-01 | 9 | 3.45 | 2020/3/20 |
+-----------------------------------------------------+
Sheet 2: Regions
+----------------------+
| Customer | Region |
------------------------
| Joe's Fish | NA |
------------------------
| Mack's Bugs | NA |
------------------------
| Karen's \/ | EU |
+----------------------+
And my results are going in Sheet 3:
+----------------------+
| | NA | EU |
------------------------
| RT-01 | 11 | 9 |
+----------------------+
So looking at the data I made up for this question, I want to compare the number of RT-01's sold in North America to those sold in Europe. I can do it if I add an INDEX MATCH column to the end of the sales sheet, but I would have to do that every time I update the sales information.
Is there some way to do a SUMIFS like:
SUMIFS(Sheet1!$D:$D,Sheet1!$A:$A,INDEX(Sheet2!$B:$B,MATCH(Sheet1!#Current A#,Sheet2!$A:$A))=Sheet3!$B2,Sheet1!$B:$B,Sheet3!$A3)
?
I think it's difficult to do it with a SUMIFS because the columns you're matching have to be ranges, but you can certainly do it with a SUMPRODUCT and COUNTIFS:
=SUMPRODUCT(Sheet1!$C$2:$C$10*(Sheet1!$B$2:$B$10=$A2)*COUNTIFS(Sheet2!$A$2:$A$5,Sheet1!$A$2:$A$10,Sheet2!$B$2:$B$5,B$1))
I don't recommend using full-column references because it could be slow.
BTW I was assuming that there were no duplicates in Sheet2 for a particular combination of customer and region - if there were, you could use
=SUMPRODUCT(Sheet1!$C$2:$C$10*(Sheet1!$B$2:$B$10=$A2)*
(COUNTIFS(Sheet2!$A$2:$A$5,Sheet1!$A$2:$A$10,Sheet2!$B$2:$B$5,B$1)>0))
EDIT
It is worth using a dynamic version of the formula, though it is not elegant:
=SUM(Sheet1!$C$2:INDEX(Sheet1!$C:$C,MATCH(2,1/(Sheet1!$C:$C<>"")))*(Sheet1!$B$2:INDEX(Sheet1!$B:$B,MATCH(2,1/(Sheet1!$B:$B<>"")))=$A2)*
(COUNTIFS(Sheet2!$A$2:INDEX(Sheet2!$A:$A,MATCH(2,1/(Sheet2!$A:$A<>""))),Sheet1!$A$2:INDEX(Sheet1!$A:$A,MATCH(2,1/(Sheet1!$A:$A<>""))),Sheet2!$B$2:INDEX(Sheet2!$B:$B,MATCH(2,1/(Sheet2!$B:$B<>""))),B$1)>0))
As you would need to make the match in memory, I don't think it's feasible in Excel; you'll have to use a VBA dictionary.
On the other hand, if the number of columns in your sales sheet is fixed, you can just format it as a table and add your INDEX MATCH in column F.
When updating the sales data, delete all rows from row 3 onward and paste in the updated values. Excel will automatically apply the INDEX MATCH to all rows.
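To check the Sheet 3 figures independently of any formula, the lookup-then-sum logic can be sketched in Python. This is only an illustration under the assumption that each customer has exactly one region; SALES, REGIONS and units_by_region are made-up names, and the data is the sample from the question.

```python
from collections import defaultdict

# (customer, product, amount) rows taken from Sheet 1 in the question.
SALES = [
    ("Joe's Fish", "RT-01", 7),
    ("Joe's Fish", "CB-23", 17),
    ("Mack's Bugs", "RT-01", 4),
    ("Joe's Fish", "VX-28", 1),
    ("Karen's \\/", "RT-01", 9),
]

# Customer -> region mapping taken from Sheet 2.
REGIONS = {"Joe's Fish": "NA", "Mack's Bugs": "NA", "Karen's \\/": "EU"}


def units_by_region(sales, regions):
    """Sum the amount sold per (product, region), skipping customers with no region."""
    totals = defaultdict(int)
    for customer, product, amount in sales:
        region = regions.get(customer)
        if region is not None:
            totals[(product, region)] += amount
    return dict(totals)


TOTALS = units_by_region(SALES, REGIONS)
```

With the sample data, TOTALS holds 11 for ("RT-01", "NA") and 9 for ("RT-01", "EU"), matching the Sheet 3 table in the question.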

Creating an MDX measure in Excel to count members of one dimension based on another dimension

I am trying to create an MDX measure in Excel (in OLAP Tools) that will count how many members there are for every other item in another dimension. As I don't know the exact syntax and notation for MDX and OLAP cubes I will try to simply explain what I want to do:
I have a pivot table based on an OLAP Cube. I have a Machine Number field stored in one dimension, that is the "parent" and for every machine number there is a number of articles that were produced (in certain period of time). Those articles are represented by Order Numbers. Those numbers are stored in another dimension. I would like the measure to count how many order numbers there are for every machine number.
So the table looks like this:
+------------------+----------------+
| [Machine Number] | [Order Number] |
+------------------+----------------+
| Machine001 | |
| | 111111111 |
| | 222222222 |
| | 333333333 |
| Machine002 | |
| | 444444444 |
| | 555555555 |
| | 666666666 |
| | 777777777 |
+------------------+----------------+
and I would like the result to be:
+------------------+----------------+------------+
| [Machine Number] | [Order Number] | [Measure1] |
+------------------+----------------+------------+
| Machine001 | | 3 |
| | 111111111 | |
| | 222222222 | |
| | 333333333 | |
| Machine002 | | 4 |
| | 444444444 | |
| | 555555555 | |
| | 666666666 | |
| | 777777777 | |
+------------------+----------------+------------+
I've tried using the COUNT function with EXISTING as well, but it wouldn't work (always showing 1, or the same wrong number for every machine). I believe that I have to somehow connect those two dimensions together so the Order Number is dependent on the Machine Number, but lacking the knowledge about MDX and OLAP cubes I don't even know how to ask Google how to do that.
Thanks in advance for any tips and solutions.
Your problem basically is that you have two attributes in different dimensions. You want to retrieve the valid combinations of these attributes, and then count the number of values available in the second attribute for each value of the first attribute.
In an OLAP cube, a fact table (or a measure) defines the relation between attributes of different dimensions that are linked to that fact table. Take a look at the example below. (I have used the SSAS sample database Adventure Works.)
-- I am trying to find the promotions that were offered for each product category.
select
[Measures].[Internet Sales Amount]
on columns,
([Product].[Category].[Category],[Promotion].[Promotion].[Promotion])
on rows
from
[Adventure Works]
Result
The result is the cross-product of all the product categories and the promotions. Now let's make the cube return the valid combinations only.
select
[Measures].[Internet Sales Amount]
on columns,
nonempty(
([Product].[Category].[Category],[Promotion].[Promotion].[Promotion])
,[Measures].[Internet Sales Amount])
on rows
from
[Adventure Works]
Result
Now we have indicated that it needs to return only valid combinations. Note that we provided a measure that belongs to the fact connecting the two dimensions. Now let's count them:
with member
[Measures].[test]
as
count(
nonempty(([Product].[Category].currentmember,[Promotion].[Promotion].[Promotion]),[Measures].[Internet Sales Amount])
)
select
[Measures].[Test]
on columns,
[Product].[Category].[Category]
on rows
from
[Adventure Works]
Result
Alternate query
with member
[Measures].[test]
as
{nonempty(([Product].[Category].currentmember,[Promotion].[Promotion].[Promotion]),[Measures].[Internet Sales Amount]) }.count
select
[Measures].[Test]
on columns,
[Product].[Category].[Category]
on rows
from
[Adventure Works]
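For readers unfamiliar with MDX, the idea behind Count(NonEmpty(...)) can be sketched outside the cube in Python: keep only the combinations that have a fact value, then count the distinct members of the second attribute per member of the first. The fact rows and quantities below are hypothetical (only the machine and order numbers come from the question), and orders_per_machine is a made-up name.

```python
# Hypothetical fact rows: (machine number, order number, produced quantity).
# A quantity of None stands for an empty measure cell.
FACT_ROWS = [
    ("Machine001", "111111111", 10),
    ("Machine001", "222222222", 5),
    ("Machine001", "222222222", 7),     # the same order twice still counts once
    ("Machine001", "333333333", 2),
    ("Machine002", "444444444", 1),
    ("Machine002", "555555555", 4),
    ("Machine002", "666666666", 8),
    ("Machine002", "777777777", 3),
    ("Machine002", "111111111", None),  # empty combination, dropped as NonEmpty would
]


def orders_per_machine(fact_rows):
    """Count distinct order numbers with a non-empty measure for each machine."""
    orders = {}
    for machine, order, quantity in fact_rows:
        if quantity is not None:
            orders.setdefault(machine, set()).add(order)
    return {machine: len(found) for machine, found in orders.items()}


COUNTS = orders_per_machine(FACT_ROWS)
```

This yields 3 for Machine001 and 4 for Machine002, the values asked for in the question's [Measure1] column.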

Using IF AND to calculate based on one or more criteria

Within a resource planner, my data has a row for each employee, and columns detailing the team they work for. Another column details the available days they will work in the year. The teams are also displayed along a row at the top, see below :
A | B | C | D | E | F | G |
1 Employee | Team 1 | Team 2 | Days | Finance | Risk | IT |
2 Employee 1 | Finance | | 170 | | | |
3 Employee 2 | Risk | Finance | 170 | | | |
4 Employee 3 | Finance | | 170 | | | |
5 Employee 4 | IT | Risk | 170 | | | |
6 Employee 5 | IT | Finance | 170 | | | |
I want to use columns E:G as a supply calculator per team. Therefore, the formula in cell E2 would be "=IF(B2=E1,D2,0)" and copied along the row, returning the 170 days under Finance and 0 under the rest.
The issue lies where an employee divides their time between two different teams. As you can see, some employees can work for 2 different teams (Employee 2 works for both Finance and Risk, for example). The formula in E3 would therefore need to be some kind of IF AND, where if a value is present in the Team 2 column (C), the value in the Days column (D) would be divided by two and split across the relevant team columns.
I've tried a few options (IF AND, nested IFs, etc.) but can't seem to get the syntax correct. Any help greatly appreciated.
=IF(ISNUMBER(MATCH(E$1,$B2:$C2,0)),$D2/COUNTA($B2:$C2),0)
You actually want OR and COUNTA:
=IF(OR($B2=E$1,$C2=E$1),$D2/COUNTA($B2:$C2),0)
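Both formulas apply the same rule: if the column's team appears among the employee's teams, return the days divided by the number of teams listed, otherwise 0. A small Python sketch of that rule, useful for checking the expected numbers (TEAMS and supply_row are made-up names; the equal split between teams is the assumption stated in the question):

```python
# Team columns E:G from the question's header row.
TEAMS = ["Finance", "Risk", "IT"]


def supply_row(employee_teams, days, all_teams=TEAMS):
    """Split an employee's days equally across the teams they belong to."""
    if not employee_teams:  # no team listed: nothing to allocate
        return [0 for _ in all_teams]
    share = days / len(employee_teams)  # plays the role of $D2/COUNTA($B2:$C2)
    return [share if team in employee_teams else 0 for team in all_teams]


ROW_EMPLOYEE_1 = supply_row(["Finance"], 170)
ROW_EMPLOYEE_2 = supply_row(["Risk", "Finance"], 170)
```

Employee 1 gets 170 under Finance and 0 elsewhere; Employee 2 gets 85 under both Finance and Risk.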

VLOOKUP to merge two tables into one

I have 2 price lists from 2 different companies, and many of the item numbers appear in both. Is there a formula to merge the price lists into one? Below is an example of what I have.
A-Pricelist
Item | Product | Price
382101 | Truck | 130$
212012 | car | 80$
B-Pricelist
Item | Product | Price
111011 | Airplane | 500$
382101 | truck | 50$
Expected result
Item | Product | A Price | B Price
382101 | Truck | 130$ | 50$
212012 | car | 80$ | -
111011 | Airplane | - | 500$
I have seen this done with VLOOKUP, but it is just not working for me. Thanks.
VLOOKUP will work fine. I would think about how to control the list of unique item numbers, for example using a dropdown.
But here is an example, updated to deal with missing prices:
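The example itself is not preserved in this copy. As a stand-in, here is a minimal Python sketch of the merge logic that a pair of lookups (one per price list) would implement: build the list of unique item numbers, then look each one up in both lists and fall back to "-" when it is missing. A_PRICES, B_PRICES and merge_price_lists are made-up names; this is an illustration of the logic, not the answer's original worksheet.

```python
# Item -> (product, price), taken from the two sample price lists in the question.
A_PRICES = {"382101": ("Truck", "130$"), "212012": ("car", "80$")}
B_PRICES = {"111011": ("Airplane", "500$"), "382101": ("truck", "50$")}


def merge_price_lists(a, b, missing="-"):
    """Return (item, product, A price, B price) rows for every item in either list."""
    # Unique item numbers: everything in A first, then items that appear only in B.
    items = list(a) + [item for item in b if item not in a]
    rows = []
    for item in items:
        product = a[item][0] if item in a else b[item][0]
        a_price = a[item][1] if item in a else missing  # like IFERROR(VLOOKUP(...), "-")
        b_price = b[item][1] if item in b else missing
        rows.append((item, product, a_price, b_price))
    return rows


MERGED = merge_price_lists(A_PRICES, B_PRICES)
```

With the sample lists, item 382101 comes out with an A price of 130$ and a B price of 50$, while 212012 and 111011 each show "-" for the list they are missing from.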

Show top 3 from list based on difference

I have 2 lists;
"yesterday" and "today".
As rows I have a list of companies, and the data shown is customer satisfaction going from 0-10. I want to show the top 3 companies that have the best difference between "yesterday" and "today".
How would you approach this??
Expected output looking for top 1:
Yesterday - Today
Company A: 5 10
Company B: 7 8
Company C: 8 6
Top 1: Company A (Since they moved the most(5 positive points))
Assuming your data is like this:
#########Sheet1<YESTERDAY>########
| A | B |
1|Companies| Customer satisfaction|
2|Company1 | 6
3|Company2 | 3
4|Company3 | 4
5|Company4 | 1
6|Company5 | 9
###########Sheet2<TODAY>##########
| A | B | C | D |
1|Companies| Customer satisfaction|Absolute changes | RANK |
2|Company1 | 1 | | |
3|Company2 | 7 | | |
4|Company3 | 7 | | |
5|Company4 | 4 | | |
6|Company5 | 8 | | |
Put this formula into cell C2 to get the absolute change:
=ABS(VLOOKUP(A2,YESTERDAY!$A$2:$B$6,2,FALSE)-B2)
Put this formula into cell D2 to get the rank:
=RANK(C2,$C$2:$C$6,0)
So ranks 1, 2 and 3 in the RANK column are the largest changes.
I assume the best difference means the highest difference. Run a loop: take each company from yesterday along with its customer satisfaction value, search for the same company in today in an inner loop, compute the difference of the two values, and save it in an array. After that, sort the array and display the top 3.
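The loop-and-sort approach can be sketched in Python as follows. The data is the sample from the sheet layout above; top_changes and its absolute flag are made-up for illustration. Note the assumption being exposed: ABS ranks a large drop as highly as a large gain, whereas the question's example ("5 positive points") suggests ranking by signed improvement, so the sketch supports both.

```python
# Customer satisfaction per company, from the YESTERDAY and TODAY sample sheets.
YESTERDAY = {"Company1": 6, "Company2": 3, "Company3": 4, "Company4": 1, "Company5": 9}
TODAY = {"Company1": 1, "Company2": 7, "Company3": 7, "Company4": 4, "Company5": 8}


def top_changes(yesterday, today, n=3, absolute=True):
    """Return the n companies with the largest change between the two lists."""
    changes = []
    for company, old_score in yesterday.items():
        if company in today:  # a dict lookup replaces the inner search loop
            diff = today[company] - old_score
            changes.append((company, abs(diff) if absolute else diff))
    # Python's sort is stable, so tied companies keep their original order.
    changes.sort(key=lambda change: change[1], reverse=True)
    return changes[:n]


TOP_ABSOLUTE = top_changes(YESTERDAY, TODAY)
TOP_IMPROVED = top_changes(YESTERDAY, TODAY, absolute=False)
```

With absolute changes the top 3 are Company1, Company2 and Company3; with signed changes Company1 (which dropped by 5) falls out and the top 3 are Company2, Company3 and Company4.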
