I have about 500k records in an Excel sheet. I am tasked with identifying sellers who have multiple purchase IDs and buyer IDs. For example, seller_id 12525 has three different purchase_id values (8569, 8591, 8594) and three different buyer_id values (C160511, C160512, C160513).
What is the correct syntax in Excel 2013 to easily identify the records I am interested in? I searched the web for conditional formatting approaches.
location Loc_Id Purchase_id buyer_id Seller_Id Date
CA 49456 8569 C160511 12525 5/3/2016
CA 49456 8569 C160511 12525 5/3/2016
CA 49456 8591 C160512 12525 5/3/2016
CA 49456 8591 C160512 12525 5/3/2016
CA 49456 8594 C160513 12525 5/3/2016
CA 49456 8594 C160513 12525 5/3/2016
TX 37140 8620 C170166 5621 1/24/2017
TX 37140 8621 C170167 5621 1/24/2017
The solution is a bit lengthy but easy. To test it better, I have included an extra seller ID, 5623, which does not have multiple buyer and purchase IDs. Only the seller IDs that have multiple buyer and purchase IDs are highlighted.
Image of my solution:-
Note:- All the formulas or cell names are according to my solution as shown in the image
Now the solution. I will break it in different parts:-
Step 1:- Add a column, say Purchase_Id_cnt, enter this formula, and drag it down to the last row:
=IF(SUMPRODUCT(($A$2:$A2=A2)*($C$2:$C2=C2))>1,0,1)
Step 2:- Add a column, say Buyer_Id_cnt, enter this formula, and drag it down to the last row:
=IF(SUMPRODUCT(($B$2:$B2=B2)*($C$2:$C2=C2))>1,0,1)
Step 3:- Add a pivot table in the same sheet (a different sheet also works; I used cell F1 in the same sheet).
Step 4:- Add another column, say Header (you can give it a better name), next to the pivot table. Enter this formula in its first data row (cell I3 in my case) and drag it down to the last row:
=IF(AND(G3>1,H3>1),1,0)
Step 5:- Now select C2 to C11. Then go to Conditional Formatting, select 'New Rule', and then select 'Use a formula to determine which cells to format'.
Step 6:- Enter this formula under 'Format values where this formula is true':
=IFERROR(VLOOKUP(C2,$F$3:$I$5,4,0),0)=1
Step 7:- Don't press 'OK' yet. Click 'Format', go to the 'Fill' tab, and choose any color you want (I chose yellow). Then press 'OK'.
Done!!!
Since you mention SQL Server, I'll answer using that. Honestly, your database is a much better place to be doing this. With Excel you would have two problems:
1. Identify these sellers.
2. Apply conditional formatting based on 1.
And problem 1 is going to be heavy processing for 500k records in Excel.
On to SQL Server:
SELECT Seller_ID,
CASE WHEN COUNT(DISTINCT Purchase_ID) > 1 THEN 'X' ELSE NULL END AS multiple_purchases,
CASE WHEN COUNT(DISTINCT Buyer_ID) > 1 THEN 'X' ELSE NULL END AS multiple_buyers
FROM your_table_name
GROUP BY Seller_ID
-- HAVING must follow GROUP BY and cannot reference SELECT aliases in SQL Server
HAVING COUNT(DISTINCT Purchase_ID) > 1 OR COUNT(DISTINCT Buyer_ID) > 1;
So here we are just aggregating the records by Seller_ID and then using a CASE expression and the aggregate function COUNT(DISTINCT <field>) to get the number of distinct (unique) purchase_id and buyer_id values. The HAVING clause checks whether either of those tests produced a hit and, if neither did, drops the seller from the result set.
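For readers who want to check the grouping logic outside the database, here is a minimal Python sketch of the same idea: collect the distinct purchase and buyer IDs per seller and keep sellers where either count exceeds 1. The function name `sellers_with_multiples` and the tuple layout are assumptions made for illustration; the rows are the sample from the question.

```python
from collections import defaultdict

# Sample rows from the question, reduced to (purchase_id, buyer_id, seller_id).
rows = [
    ("8569", "C160511", "12525"),
    ("8569", "C160511", "12525"),
    ("8591", "C160512", "12525"),
    ("8591", "C160512", "12525"),
    ("8594", "C160513", "12525"),
    ("8594", "C160513", "12525"),
    ("8620", "C170166", "5621"),
    ("8621", "C170167", "5621"),
]


def sellers_with_multiples(rows):
    """Map seller_id -> (distinct purchases, distinct buyers) where either count > 1."""
    purchases = defaultdict(set)
    buyers = defaultdict(set)
    for purchase_id, buyer_id, seller_id in rows:
        # Sets discard duplicates, which mirrors COUNT(DISTINCT ...).
        purchases[seller_id].add(purchase_id)
        buyers[seller_id].add(buyer_id)
    return {
        seller: (len(purchases[seller]), len(buyers[seller]))
        for seller in purchases
        if len(purchases[seller]) > 1 or len(buyers[seller]) > 1
    }


result = sellers_with_multiples(rows)
print(result)
```

Changing `or` to `and` in the filter would keep only sellers that have both multiple purchases and multiple buyers, which is closer to how the question is worded.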
Use the following formula in Conditional Formatting:
=COUNTIFS($B:$B,$B1,$C:$C,"<>" & $C1)>0
In the example below, the IDs of people are case sensitive (if case is ignored, some of them are the same, for example David's and Mike's, or Lilly's and John's).
When I put them into a pivot table, it displays the same ID for different people, i.e. David's = Mike's, Lilly's = John's.
Is there a way to have the pivot table display the actual (case-sensitive) IDs?
ID Name
Txze David
TxZe Mike
TwgQ Lucy
3RqM Lilly
3RQm John
TvrE Kate
So, had a quick go with index() and match() as suggested in my comment:
=INDEX(A$2:A$7,MATCH(D2,$B$2:$B$7,0))
D2 and D4 contain the names being looked up, and the results are in F2 and F4, which show the different IDs. Note that 0 is used in the MATCH function for an exact match.
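As a rough illustration of why looking the ID up by name sidesteps the problem, here is a small Python sketch. It returns the ID exactly as stored for a given name, and it also shows how many IDs collapse together when case is ignored. The helper name `id_for` is an assumption for illustration; the data is the sample from the question. Note that Python string comparison is case sensitive, so this is an analogy for the lookup, not a model of how Excel compares text.

```python
# (ID, Name) pairs from the question; some IDs differ only by letter case.
records = [
    ("Txze", "David"),
    ("TxZe", "Mike"),
    ("TwgQ", "Lucy"),
    ("3RqM", "Lilly"),
    ("3RQm", "John"),
    ("TvrE", "Kate"),
]


def id_for(name, records):
    """Return the stored ID of the first record with this name, or None if absent."""
    for record_id, record_name in records:
        if record_name == name:
            return record_id
    return None


david_id = id_for("David", records)
mike_id = id_for("Mike", records)

# Counting distinct IDs with and without case shows which ones would be merged.
distinct_case_sensitive = len({record_id for record_id, _ in records})
distinct_case_insensitive = len({record_id.lower() for record_id, _ in records})

print(david_id, mike_id, distinct_case_sensitive, distinct_case_insensitive)
```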
I am trying to sum the bonuses for specific activities my employees do. I have a table with the standard value of each bonus (which also depends on the employee's level), similar to this:
Position | Bonus name | Value
Operator level C | Bonus A | $5
Operator level C | Bonus B | $7
Operator level B | Bonus A | $4
Operator level A | Bonus A | $7
HR level C | Bonus A | $7.50
HR level A | Bonus B | $6.50
HR level A | Bonus C | $6.50
Then I have another table showing which activities my employees performed during the week (where "1" means the activity was done and "0" means it was not):
Employee ID | Position | Bonus name | Performed or not
1234 | Operator level C | Bonus A | 1
1234 | Operator level C | Bonus A | 0
1220 | HR level A | Bonus B | 1
1220 | HR level A | Bonus C | 1
1278 | Operator level B | Bonus A | 1
My final desired output is going to be:
Employee ID | Amount (all bonus calculation)
1234 | $5
1220 | $13
1278 | $4
I also have another table with the employee name, ID and level (like ex: Operator level C).
That is it, guys. Hopefully you can help me find an easy way to get this final desired output.
Thanks a lot!
First I'm assuming your first table spans from A1:C8, so I'm writing this all with that in mind, but naming these tables will likely save you trouble in the future as the formulas will adjust with the table. You can do this using CTRL + T
In the second table you have shown, I would make an additional column that uses an Index + Match formula.
Depending on your version of Excel, this might be an array formula that needs to be confirmed with CTRL + Shift + Enter (365 can ignore this note):
=INDEX($A$1:$C$8,MATCH(1,($A$1:$A$8=[Position Cell])*($B$1:$B$8=[Bonus Name Cell]),0),3)*[Performed Cell]
For Bonus Name Cell, Position Cell, and Performed Cell simply reference that row's respective column.
Now the final steps. Highlight this table and, under Insert in the ribbon, insert a PivotTable.
Then drag EmployeeID into the bottom left box, and whatever you named this new column into the bottom right box to get the values. Format as necessary.
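The same two-step calculation (look up the bonus value by position and bonus name, multiply by the performed flag, then sum per employee) can be sketched in Python to sanity-check the expected totals. The names `bonus_values`, `activities` and `total_bonus` are assumptions for illustration; the data is taken from the two tables in the question.

```python
from collections import defaultdict

# Standard bonus values keyed by (position, bonus name), from the first table.
bonus_values = {
    ("Operator level C", "Bonus A"): 5.00,
    ("Operator level C", "Bonus B"): 7.00,
    ("Operator level B", "Bonus A"): 4.00,
    ("Operator level A", "Bonus A"): 7.00,
    ("HR level C", "Bonus A"): 7.50,
    ("HR level A", "Bonus B"): 6.50,
    ("HR level A", "Bonus C"): 6.50,
}

# (employee ID, position, bonus name, performed flag), from the second table.
activities = [
    (1234, "Operator level C", "Bonus A", 1),
    (1234, "Operator level C", "Bonus A", 0),
    (1220, "HR level A", "Bonus B", 1),
    (1220, "HR level A", "Bonus C", 1),
    (1278, "Operator level B", "Bonus A", 1),
]


def total_bonus(activities, bonus_values):
    """Sum bonus value * performed flag per employee (the helper column plus the pivot)."""
    totals = defaultdict(float)
    for employee_id, position, bonus_name, performed in activities:
        # The two-column key plays the role of the MATCH on position and bonus name.
        totals[employee_id] += bonus_values[(position, bonus_name)] * performed
    return dict(totals)


totals = total_bonus(activities, bonus_values)
print(totals)
```

A missing (position, bonus name) pair raises a `KeyError` here, which is roughly the counterpart of the #N/A the INDEX/MATCH formula would return.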
Working with 2 separate data sets (with duplicates)
The dataset is uniquely identified by an ID.
There may not be an entry for the timestamp I require.
Datasets are quite large, and due to duplicates, can't use vlookup.
Samples:
Table 1:
Device Name|Time Bracket| On/Off?
ID1 |06:20:00 |
ID2 |06:20:00 |
ID3 |06:30:00 |
Table 2:
Device Name |Timestamp |On/Off?
ID1 |06:20:00 |On
ID2 |06:50:00 |Off
ID3 |07:20:00 |Off
What I want to achieve:
I want an if statement to check if:
1) device ID matches AND
2) timestamp matches
If so, return the value of On/Off from Table 2.
If not, then I want it to return the value of the cell above it IF it's the same device, otherwise just put "absent" into the cell.
I thought I could do this with some IF statements like so:
=if(HOUR([#[Time Bracket]]) = HOUR(Table13[#[Timestamp Rounded (GMT)]]) and
minute([#[Time Bracket]]) = minute(Table13[#[Timestamp Rounded (GMT)]]) and
[#[Device Name]]=Table13[#[Device Name]], Table13[#[On/Off?]],
IF([#[Device Name]]=Table13[#[Device Name]], INDIRECT("B" and Rows()-1), "absent"))
(I put some newlines in there for readability)
However, this doesn't seem to resolve at all... what am I doing wrong?
Is this even the correct way of achieving this?
I've also tried something similar with a VLookUp, but that failed horribly.
Thanks all!
To avoid array formulas or merged strings (which can still produce wrong results in the end, though not in your case), I suggest using COUNTIFS, since you have only a very small number of possible outcomes (just On or Off).
For the first table (starting at A1, so the formula is at C2):
=IFERROR(CHOOSE(
OR(COUNTIFS(Table13[Device Name],[#[Device Name]],Table13[Timestamp],[#[Time Bracket]],Table13[On/Off?],"On"))+
OR(COUNTIFS(Table13[Device Name],[#[Device Name]],Table13[Timestamp],[#[Time Bracket]],Table13[On/Off?],"Off"))*2
,"On","Off","Error"),IF(A1=[#[Device Name]],C1,"Absent"))
This will also show "Error" if both an "On" match and an "Off" match are found. To skip that check and increase the speed, you could also use:
=IF(COUNTIFS(Table13[Device Name],[#[Device Name]],Table13[Timestamp],[#[Time Bracket]],Table13[On/Off?],"On"),"On",
IF(COUNTIFS(Table13[Device Name],[#[Device Name]],Table13[Timestamp],[#[Time Bracket]],Table13[On/Off?],"Off"),"Off",
IF(A1=[#[Device Name]],C1,"Absent")))
For both formulas, "Device Name" is in column A, "Time Bracket" in column B and "On/Off?" in column C, and the table starts at row 1. If that is not the case for you, change A1 and C1 so that they match your layout.
(I also inserted line breaks for better reading.)
Picture to show the layout:
I picked the second formula to show how it works... also, this formula should not be able to return 0's... I'm confused
Couple of good suggestions, however using the helper column as suggested in the topic by Scott Craner above worked.
Created a helper column of concat'd device ID and timestamp for both tables, then did a simple VlookUp.
Another lesson learned: Think outside of the box, and go with simple solutions, rather than try + be too clever like I was doing... :)
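The lookup-with-fallback logic from the question can be sketched in Python using a composite (device, timestamp) key, which is the same idea as the concatenated helper column. This is a minimal sketch under a few assumptions: timestamps are compared as plain strings, a duplicate key in table 2 simply keeps the last value seen, and the names `table1`, `table2` and `fill_status` are made up for illustration.

```python
# Table 2 from the question, keyed by (device name, timestamp).
table2 = {
    ("ID1", "06:20:00"): "On",
    ("ID2", "06:50:00"): "Off",
    ("ID3", "07:20:00"): "Off",
}

# Table 1 from the question: (device name, time bracket) rows to fill in.
table1 = [
    ("ID1", "06:20:00"),
    ("ID2", "06:20:00"),
    ("ID3", "06:30:00"),
]


def fill_status(table1, table2):
    """Return one status per table 1 row: exact match, else carry down, else "Absent"."""
    results = []
    prev_device, prev_status = None, None
    for device, bracket in table1:
        if (device, bracket) in table2:
            status = table2[(device, bracket)]   # device and timestamp both match
        elif device == prev_device:
            status = prev_status                 # same device as the row above
        else:
            status = "Absent"
        results.append(status)
        prev_device, prev_status = device, status
    return results


statuses = fill_status(table1, table2)
print(statuses)
```

With the sample data only the first row has an exact match; the other two rows start a new device without a match, so they fall through to "Absent".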
I have 3 columns with the following name:
name1, name2 and value.
I want to obtain two tables in another sheet, subject to two compulsory conditions (the min and max are calculated using name1 and name2):
The name of the first table is taken from the name2 column, and this table has two partitions.
The first partition, named max, calculates the max for 30_-20, 40_-20, 50_-20, 30_22, 40_22, 50_22, 30_60, 40_60, 50_60, and the second partition, named min, calculates the min for 30_-20, 40_-20, 50_-20, 30_22, 40_22, 50_22, 30_60, 40_60, 50_60.
What I want to say can be viewed in the following picture.
I need this for my job, and I don't know anything about macros. I think it will be necessary to learn macros.
Given a layout on a single worksheet similar to your sample image, the standard formulas for G3, I3, M3 and O3 are:
=MAX(INDEX($C$2:$C$999*($A$2:$A$999=H3)*($B$2:$B$999=H$1), , ))
=MIN(INDEX($C$2:$C$999+(($A$2:$A$999<>H3)+($B$2:$B$999<>H$1))*1E+99, , ))
=MAX(INDEX($C$2:$C$999*($A$2:$A$999=N3)*($B$2:$B$999=N$1), , ))
=MIN(INDEX($C$2:$C$999+(($A$2:$A$999<>N3)+($B$2:$B$999<>N$1))*1E+99, , ))
Fill down as necessary. It is usually easier to reference a cell containing a value (e.g. 30_-20) than repeatedly hardcoding the value into a variety of similar formulas. I've used H3:H5 and N3:N5 for the column A values.
How it Works:
See MINIF, MAXIF and MODEIF with Standard Formulas.
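As a rough illustration of what the formulas compute, here is a Python sketch of a conditional max and min over rows that match both a name1 and a name2 value. The formulas mask out non-matching rows (multiplying by 0 for MAX, adding 1E+99 for MIN); the sketch filters them out instead. The sample rows, the name2 labels "T1" and "T2", and the helper names `max_if` and `min_if` are all hypothetical, since the question's data is only shown in an image.

```python
# Hypothetical (name1, name2, value) rows; the real data is not in the thread.
rows = [
    ("30_-20", "T1", 1.5),
    ("30_-20", "T1", -2.0),
    ("30_-20", "T2", 7.0),
    ("40_-20", "T1", -3.0),
    ("40_-20", "T1", -1.0),
]


def matching_values(rows, name1, name2):
    """Values of the rows where both name columns match."""
    return [value for a, b, value in rows if a == name1 and b == name2]


def max_if(rows, name1, name2):
    """Largest matching value, or None when nothing matches."""
    values = matching_values(rows, name1, name2)
    return max(values) if values else None


def min_if(rows, name1, name2):
    """Smallest matching value, or None when nothing matches."""
    values = matching_values(rows, name1, name2)
    return min(values) if values else None


print(max_if(rows, "30_-20", "T1"), min_if(rows, "30_-20", "T1"))
print(max_if(rows, "40_-20", "T1"))
```

One difference worth knowing: because the MAX formula turns non-matching rows into 0, it would report 0 for a group whose values are all negative (such as the hypothetical "40_-20"/"T1" group here), whereas filtering reports the true maximum.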
I have two columns of data. I want to find which cities are in the top 10 in both columns (format IF in top10_1 AND top10_2), and this is the formula I am trying to use:
=AND(LARGE($B$2:$B$330,10),(LARGE($C$2:$C$330,10))
Any suggestions on what I can do to make it work? I included some sample data if it helps.
CityState Climate Housing
Stamford, CT 2.959 -4.234
Norwalk, CT 2.959 -4.118
San Diego, CA 2.955 -4.160
Honolulu, HI 2.949 -4.146
San Jose, CA 2.946 -4.205
LARGE returns the nth highest value in a set, so your condition would be something like:
=AND($B2 >= LARGE($B$2:$B$330,10),$C2 >= LARGE($C$2:$C$330,10))
Note the relative row reference in each condition so that the conditional formatting applies to each row individually.
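The per-row test can be sketched in Python as follows: compute the k-th largest value of each column once, then keep the rows that reach both thresholds. The helper names `kth_largest` and `top_k_in_both` are assumptions for illustration, and the example uses k = 2 only because the question's sample has five rows; with the full 329-row range k would be 10.

```python
# (CityState, Climate, Housing) rows from the question's sample data.
cities = [
    ("Stamford, CT", 2.959, -4.234),
    ("Norwalk, CT", 2.959, -4.118),
    ("San Diego, CA", 2.955, -4.160),
    ("Honolulu, HI", 2.949, -4.146),
    ("San Jose, CA", 2.946, -4.205),
]


def kth_largest(values, k):
    """k-th largest value counting duplicates, like LARGE(range, k)."""
    return sorted(values, reverse=True)[k - 1]


def top_k_in_both(cities, k):
    """Names of the rows that are at or above the k-th largest value in both columns."""
    climate_cut = kth_largest([climate for _, climate, _ in cities], k)
    housing_cut = kth_largest([housing for _, _, housing in cities], k)
    return [
        name
        for name, climate, housing in cities
        if climate >= climate_cut and housing >= housing_cut
    ]


winners = top_k_in_both(cities, 2)
print(winners)
```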