I have 3 columns in an Excel file and need to convert rows into columns by reference number - excel

So I have columns like this,
REF_NO  LCY_AMOUNT  TAG
001     200         NEGO
001     300         EXCH
001     350         POST
001     400         CONF
002     300         NEGO
002     400         EXCH
002     450         POST
002     500         CONF
What I need is to turn the TAG values into columns, like this:
REF_NO  NEGO  EXCH  POST  CONF
001     200   300   350   400
002     300   400   450   500

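I went with SUMIFS(). Here the source data is in A3:C10, the tag headers are in row 14, and the reference numbers start in A15:
=SUMIFS($B$3:$B$10,$A$3:$A$10,$A15,$C$3:$C$10,B$14)
To produce the reference numbers in A15 and A16, you can use UNIQUE().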
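For readers who want to check the numbers outside Excel, the same reshape can be sketched in plain Python. This is only an illustration of what the SUMIFS grid computes, not part of the original answer; the names `rows`, `pivot_sum`, `tags` and `ref_nos` are made up for the example.

```python
from collections import defaultdict

# Sample rows from the question: (REF_NO, LCY_AMOUNT, TAG)
rows = [
    ("001", 200, "NEGO"), ("001", 300, "EXCH"),
    ("001", 350, "POST"), ("001", 400, "CONF"),
    ("002", 300, "NEGO"), ("002", 400, "EXCH"),
    ("002", 450, "POST"), ("002", 500, "CONF"),
]


def pivot_sum(rows):
    """Sum LCY_AMOUNT per (REF_NO, TAG) pair, like one SUMIFS cell per pair."""
    totals = defaultdict(int)
    for ref_no, amount, tag in rows:
        totals[(ref_no, tag)] += amount
    return totals


totals = pivot_sum(rows)
tags = ["NEGO", "EXCH", "POST", "CONF"]   # column headers (row 14 in the formula)
ref_nos = sorted({r[0] for r in rows})    # row labels, the UNIQUE() step

for ref_no in ref_nos:
    # A missing pair yields 0, which is also what SUMIFS returns
    print(ref_no, [totals[(ref_no, tag)] for tag in tags])
```

Each printed row corresponds to one line of the desired output table.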

Related

Calculate quantity using an unrelated field in DAX

ID
001
002
REQ
ID ITEM QUANT
001 chips 20
002 chips 100
SCHEDULE
1 001 cleaning
1 002 normal
2 001 normal
2 002 remodel
3 001 normal
3 002 remodel
4 001 remodel
4 002 cleaning
item = corn chips
id_store   1         2        3        4
001 phase  cleaning  normal   normal   remodel
001 quant  0         20       20       5
002 phase  normal    remodel  remodel  cleaning
002 quant  100       5        5        0
I want to calculate a quant given a store phase. If the store is cleaning, the quant is 0; if it is remodeling, the quant is 5; otherwise it is the quant from the requirements (REQ) table.
Normally I would do this with a SWITCH statement in DAX, but the phase data is not in my table. Please assist.
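It turns out a simple SWITCH statement that looks at the different tables works just fine. Note that VALUES(SCHEDULE[PHASE]) can only be compared with a text value when exactly one phase is in the filter context.
Num Items :=
VAR T = SUM ( REQ[Quant] )
RETURN
    SWITCH (
        TRUE (),
        VALUES ( SCHEDULE[PHASE] ) = "cleaning", 0,
        -- the sample SCHEDULE data spells this phase "remodel"
        VALUES ( SCHEDULE[PHASE] ) = "remodel", 5,
        T
    )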
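To sanity-check the measure against the expected table in the question, the same branching can be sketched in Python. This is an illustration only, not DAX; the dictionary `req`, the list `schedule` and the function `num_items` are hypothetical names, and the first SCHEDULE column is assumed to be the store id.

```python
# REQ table from the question: ID -> QUANT
req = {"001": 20, "002": 100}

# SCHEDULE rows from the question: (store id, ID, phase)
schedule = [
    (1, "001", "cleaning"), (1, "002", "normal"),
    (2, "001", "normal"),   (2, "002", "remodel"),
    (3, "001", "normal"),   (3, "002", "remodel"),
    (4, "001", "remodel"),  (4, "002", "cleaning"),
]


def num_items(item_id, phase):
    """Mirror the SWITCH(TRUE(), ...) branches of the measure."""
    if phase == "cleaning":
        return 0
    if phase == "remodel":
        return 5
    return req[item_id]  # fall back to the required quantity


result = {(store, item_id): num_items(item_id, phase)
          for store, item_id, phase in schedule}

for item_id in ("001", "002"):
    print(item_id, [result[(store, item_id)] for store in (1, 2, 3, 4)])
```

The two printed rows line up with the "quant" rows of the expected output.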

Pivot a column so repeated values/records are placed in 1 cell

I have the following
Input:
import pandas as pd
samples = [('001', 'RENAL', 'CHROMOPHOBE', 'KICH'),
           ('002', 'OVARIAN', 'HIGH_GRADE_SEROUS_CARCINOMA', 'LGSOC'),
           ('003', 'OVARIAN', 'OTHER', 'NaN'),
           ('001', 'COLORECTAL', 'ADENOCARCINOMA', 'KICH')]
labels = ['id', 'disease_type', 'disease_sub_type', 'study_abbreviation']
df = pd.DataFrame.from_records(samples, columns=labels)
df
   id   disease_type  disease_sub_type             study_abbreviation
0  001  RENAL         CHROMOPHOBE                  KICH
1  002  OVARIAN       HIGH_GRADE_SEROUS_CARCINOMA  LGSOC
2  003  OVARIAN       OTHER                        NaN
3  001  COLORECTAL    ADENOCARCINOMA               KICH
I want to be able to compress the repeated id, say 001 in this case so that I can have the disease_type and disease_sub_type, study_abbreviation merged into 1 cell each (nested).
Output
   id   disease_type      disease_sub_type             study_abbreviation
0  001  RENAL,COLORECTAL  CHROMOPHOBE,ADENOCARCINOMA   KICH, KICH
1  002  OVARIAN           HIGH_GRADE_SEROUS_CARCINOMA  LGSOC
2  003  OVARIAN           OTHER                        NaN
This is not for anything but admin work hence the stupid ask but would help greatly when I need to merge on other datasets, thanks again.
You could group by your 'id' column and use ','.join as the aggregation:
df.groupby('id', as_index=False).agg(','.join)
   id   disease_type      disease_sub_type             study_abbreviation
0  001  RENAL,COLORECTAL  CHROMOPHOBE,ADENOCARCINOMA   KICH,KICH
1  002  OVARIAN           HIGH_GRADE_SEROUS_CARCINOMA  LGSOC
2  003  OVARIAN           OTHER                        NaN
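One caveat: ','.join only accepts strings, and in the sample the missing value is the text 'NaN'. If the real data holds genuine missing values, a variant along these lines may be needed. It is a sketch under that assumption; `join_values` and `out` are hypothetical names, and `np.nan` replaces the 'NaN' string of the sample.

```python
import numpy as np
import pandas as pd

samples = [('001', 'RENAL', 'CHROMOPHOBE', 'KICH'),
           ('002', 'OVARIAN', 'HIGH_GRADE_SEROUS_CARCINOMA', 'LGSOC'),
           ('003', 'OVARIAN', 'OTHER', np.nan),  # assumption: a real missing value
           ('001', 'COLORECTAL', 'ADENOCARCINOMA', 'KICH')]
labels = ['id', 'disease_type', 'disease_sub_type', 'study_abbreviation']
df = pd.DataFrame.from_records(samples, columns=labels)


def join_values(series):
    """Join the non-missing values of one group into a single cell."""
    return ','.join(series.dropna().astype(str))


out = df.groupby('id', as_index=False).agg(join_values)
print(out)
```

With this variant a group whose values are all missing ends up as an empty string rather than raising an error.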

Distribute quantity through min max in Excel

I initially tried to do this directly in SQL Server, but it doesn't seem possible with a query, so I want to calculate this "Distribute" column in Excel. The details of the question are below. I would appreciate it if someone could help here.
I have the following columns in Excel and want to calculate the values in the "Distribute" column.
Item  Qty   Customer  Rank  Min  Max  Distribute
001   1500  0101      1     250  600  ????
001   1500  0104      2     0    500  ????
001   1500  0103      3     100  300  ????
001   1500  0105      4     200  300  ????
002   2000  0104      1     200  600  ????
002   2000  0105      2     150  700  ????
002   2000  0101      3     100  200  ????
002   2000  0103      4     100  500  ????
002   2000  0102      5     50   200  ????
003   800   0103      1     100  500  ????
003   800   0102      2     50   200  ????
003   800   0101      2     50   100  ????
003   800   0104      3     50   80   ????
There are multiple items (Item), and each item has a fixed quantity available (Qty).
Each item is distributed among different customers (Customer) based on their rank (Rank). Ranks are assigned per item, and the data is already sorted by Rank within every item. Several customers of the same item can share a rank.
From the total quantity (Qty) of each item, every customer must get the minimum quantity given in the Min column, irrespective of rank.
The remaining quantity of every item must then be distributed in rank order, making sure that no customer exceeds the maximum quantity given in the Max column.
It is OK if the total quantity of an item is not used up after every customer has received its maximum.
What I am after is a result like this:
Item  Qty   Customer  Rank  Min  Max  Distribute
001   1500  0101      1     250  600  600
001   1500  0104      2     0    500  500
001   1500  0103      3     100  300  200
001   1500  0105      4     200  300  200
002   2000  0104      1     200  600  600
002   2000  0105      2     150  700  700
002   2000  0101      3     100  200  200
002   2000  0103      4     100  500  450
002   2000  0102      5     50   200  50
003   800   0103      1     100  500  500
003   800   0102      2     50   200  200
003   800   0101      2     50   100  50
003   800   0104      3     50   80   50
I would be grateful if you could provide a formula or another solution here. Thanks for your help.
FORMULA BASED SOLUTION
Here is a possible formula-based solution that uses several helper columns. It assumes the table is already properly sorted (by item in any order, then by rank from smaller to greater) and will stay that way:
(Columns A to K; the headers are in row 1 and the data starts in row 2.)
Item  Qty   Customer  Rank  Min  Max  [Cumulative] Qty - Min                Basic                                    [Cumulative] Remain                                                 Extra              Distribute
1     1500  101       1     250  600  =MAX(0,IF(A1<>A2,B2-E2,G1-E2))        =IF(A2<>A1,MIN(B2,E2),MIN(G1,E2))        =IF(A1<>A2,AGGREGATE(15,6,G:G/(A:A=A2),1),MAX(0,I1-(F1-E1)))        =MIN(I2,F2-E2)     =H2+J2
1     1500  104       2     0    500  =MAX(0,IF(A2<>A3,B3-E3,G2-E3))        =IF(A3<>A2,MIN(B3,E3),MIN(G2,E3))        =IF(A2<>A3,AGGREGATE(15,6,G:G/(A:A=A3),1),MAX(0,I2-(F2-E2)))        =MIN(I3,F3-E3)     =H3+J3
1     1500  103       3     100  300  =MAX(0,IF(A3<>A4,B4-E4,G3-E4))        =IF(A4<>A3,MIN(B4,E4),MIN(G3,E4))        =IF(A3<>A4,AGGREGATE(15,6,G:G/(A:A=A4),1),MAX(0,I3-(F3-E3)))        =MIN(I4,F4-E4)     =H4+J4
1     1500  105       4     200  300  =MAX(0,IF(A4<>A5,B5-E5,G4-E5))        =IF(A5<>A4,MIN(B5,E5),MIN(G4,E5))        =IF(A4<>A5,AGGREGATE(15,6,G:G/(A:A=A5),1),MAX(0,I4-(F4-E4)))        =MIN(I5,F5-E5)     =H5+J5
2     2000  104       1     200  600  =MAX(0,IF(A5<>A6,B6-E6,G5-E6))        =IF(A6<>A5,MIN(B6,E6),MIN(G5,E6))        =IF(A5<>A6,AGGREGATE(15,6,G:G/(A:A=A6),1),MAX(0,I5-(F5-E5)))        =MIN(I6,F6-E6)     =H6+J6
2     2000  105       2     150  700  =MAX(0,IF(A6<>A7,B7-E7,G6-E7))        =IF(A7<>A6,MIN(B7,E7),MIN(G6,E7))        =IF(A6<>A7,AGGREGATE(15,6,G:G/(A:A=A7),1),MAX(0,I6-(F6-E6)))        =MIN(I7,F7-E7)     =H7+J7
2     2000  101       3     100  200  =MAX(0,IF(A7<>A8,B8-E8,G7-E8))        =IF(A8<>A7,MIN(B8,E8),MIN(G7,E8))        =IF(A7<>A8,AGGREGATE(15,6,G:G/(A:A=A8),1),MAX(0,I7-(F7-E7)))        =MIN(I8,F8-E8)     =H8+J8
2     2000  103       4     100  500  =MAX(0,IF(A8<>A9,B9-E9,G8-E9))        =IF(A9<>A8,MIN(B9,E9),MIN(G8,E9))        =IF(A8<>A9,AGGREGATE(15,6,G:G/(A:A=A9),1),MAX(0,I8-(F8-E8)))        =MIN(I9,F9-E9)     =H9+J9
2     2000  102       5     50   200  =MAX(0,IF(A9<>A10,B10-E10,G9-E10))    =IF(A10<>A9,MIN(B10,E10),MIN(G9,E10))    =IF(A9<>A10,AGGREGATE(15,6,G:G/(A:A=A10),1),MAX(0,I9-(F9-E9)))      =MIN(I10,F10-E10)  =H10+J10
3     800   103       1     100  500  =MAX(0,IF(A10<>A11,B11-E11,G10-E11))  =IF(A11<>A10,MIN(B11,E11),MIN(G10,E11))  =IF(A10<>A11,AGGREGATE(15,6,G:G/(A:A=A11),1),MAX(0,I10-(F10-E10)))  =MIN(I11,F11-E11)  =H11+J11
3     800   102       2     50   200  =MAX(0,IF(A11<>A12,B12-E12,G11-E12))  =IF(A12<>A11,MIN(B12,E12),MIN(G11,E12))  =IF(A11<>A12,AGGREGATE(15,6,G:G/(A:A=A12),1),MAX(0,I11-(F11-E11)))  =MIN(I12,F12-E12)  =H12+J12
3     800   101       2     50   100  =MAX(0,IF(A12<>A13,B13-E13,G12-E13))  =IF(A13<>A12,MIN(B13,E13),MIN(G12,E13))  =IF(A12<>A13,AGGREGATE(15,6,G:G/(A:A=A13),1),MAX(0,I12-(F12-E12)))  =MIN(I13,F13-E13)  =H13+J13
3     800   104       3     50   80   =MAX(0,IF(A13<>A14,B14-E14,G13-E14))  =IF(A14<>A13,MIN(B14,E14),MIN(G13,E14))  =IF(A13<>A14,AGGREGATE(15,6,G:G/(A:A=A14),1),MAX(0,I13-(F13-E13)))  =MIN(I14,F14-E14)  =H14+J14
VBA SOLUTION
Here is a possible VBA solution that assumes the table is already properly sorted (by item in any order, then by rank from smaller to greater) and will stay that way:
Sub SubDistribution()
    Dim RngData As Range
    Dim RngItem As Range
    Dim RngQty As Range
    Dim RngMin As Range
    Dim RngMax As Range
    Dim RngDistribute As Range
    Dim VarArray() As Variant
    Dim LngRow As Long
    Dim LngCounter As Long
    Dim DblQuantity As Double
    Dim BlnMoreRows As Boolean
    ' First data cell of each column (headers are in row 1)
    Set RngItem = Range("A2")
    Set RngQty = Range("B2")
    Set RngMin = Range("E2")
    Set RngMax = Range("F2")
    Set RngDistribute = Range("G2")
    ' Whole data block, used only for its row count
    Set RngData = Range(RngItem, RngItem.End(xlToRight).End(xlDown))
    ReDim VarArray(1 To RngData.Rows.Count)
    For LngRow = 1 To RngData.Rows.Count
        ' The previous row closed an item (or this is the first row): load this item's quantity
        If Not BlnMoreRows Then DblQuantity = RngQty.Offset(LngRow - 1).Value
        ' True while the next row still belongs to the same item
        BlnMoreRows = (RngItem.Offset(LngRow).Value = RngItem.Offset(LngRow - 1).Value)
        ' Pass 1: hand out the minimum, capped by what is left
        VarArray(LngRow) = Excel.WorksheetFunction.Min(DblQuantity, RngMin.Offset(LngRow - 1).Value)
        DblQuantity = Excel.WorksheetFunction.Max(0, DblQuantity - RngMin.Offset(LngRow - 1).Value)
        If BlnMoreRows Then
            LngCounter = LngCounter + 1
        Else
            ' Pass 2 (last row of the item): top up from the best rank down, never above Max
            For LngCounter = LngCounter To 0 Step -1
                VarArray(LngRow - LngCounter) = VarArray(LngRow - LngCounter) + Excel.WorksheetFunction.Min(DblQuantity, RngMax.Offset(LngRow - 1 - LngCounter).Value - RngMin.Offset(LngRow - 1 - LngCounter).Value)
                DblQuantity = Excel.WorksheetFunction.Max(0, DblQuantity - (RngMax.Offset(LngRow - 1 - LngCounter).Value - RngMin.Offset(LngRow - 1 - LngCounter).Value))
            Next
            LngCounter = 0
        End If
    Next
    ' Write all results to the Distribute column in one go
    RngDistribute.Resize(UBound(VarArray)).Value = Excel.WorksheetFunction.Transpose(VarArray)
End Sub
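The distribution rules from the question can also be written as a short two-pass routine, which makes it easy to check either solution against the expected table. The following Python sketch is an illustration under the same sorting assumption (rows grouped by item, then ordered by rank); the names `rows` and `distribute` are hypothetical, and ties in rank are resolved by row order, as in the expected output.

```python
from itertools import groupby

# Sample rows from the question: (Item, Qty, Customer, Rank, Min, Max)
rows = [
    ("001", 1500, "0101", 1, 250, 600), ("001", 1500, "0104", 2, 0, 500),
    ("001", 1500, "0103", 3, 100, 300), ("001", 1500, "0105", 4, 200, 300),
    ("002", 2000, "0104", 1, 200, 600), ("002", 2000, "0105", 2, 150, 700),
    ("002", 2000, "0101", 3, 100, 200), ("002", 2000, "0103", 4, 100, 500),
    ("002", 2000, "0102", 5, 50, 200),
    ("003", 800, "0103", 1, 100, 500), ("003", 800, "0102", 2, 50, 200),
    ("003", 800, "0101", 2, 50, 100), ("003", 800, "0104", 3, 50, 80),
]


def distribute(rows):
    """Return the Distribute value for every row, in row order."""
    result = []
    for _, group in groupby(rows, key=lambda r: r[0]):  # consecutive rows per item
        group = list(group)
        remaining = group[0][1]  # Qty available for this item
        shares = []
        # Pass 1: every customer gets its Min (as far as the quantity allows)
        for *_, lo, _hi in group:
            give = min(remaining, lo)
            shares.append(give)
            remaining -= give
        # Pass 2: top up in rank order, never exceeding Max
        for i, (*_, hi) in enumerate(group):
            extra = min(remaining, max(0, hi - shares[i]))
            shares[i] += extra
            remaining -= extra
        result.extend(shares)
    return result


print(distribute(rows))
```

Any quantity left in `remaining` after the second pass is simply not handed out, which the question explicitly allows.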

How to calculate data changes over time using Python

For the following dataframe, I need to calculate the change in 'uid_count' for each set of date, location_id, uid and include the set in the results.
# Sample DataFrame
import pandas as pd
df = pd.DataFrame({'date': ['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02'],
                   'location_id': [1001, 2001, 3001, 1001, 2001, 3001],
                   'uid': ['001', '003', '002', '001', '004', '002'],
                   'uid_count': [1, 2, 3, 2, 2, 4]})
   date        location_id  uid  uid_count
0  2021-01-01  1001         001  1
1  2021-01-01  2001         003  2
2  2021-01-01  3001         002  3
3  2021-01-02  1001         001  2
4  2021-01-02  2001         004  2
5  2021-01-02  3001         002  4
My desired results would look like:
# Desired Results
date        location_id  uid
2021-01-01  1001         001    0
            2001         003    0
            3001         002    0
2021-01-02  1001         001    1
            2001         004    0
            3001         002    1
I thought I could do this via groupby by using the following, but the desired calculation isn't made:
# Current code:
df.groupby(['date','location_id','uid'],sort=False).apply(lambda x: (x['uid_count'].values[-1] - x['uid_count'].values[0]))
# Current results:
date        location_id  uid
2021-01-01  1001         001    0
            2001         003    0
            3001         002    0
2021-01-02  1001         001    0
            2001         004    0
            3001         002    0
How can I get the desired results?
The following code works with the test dataframe; I'm not certain how it behaves on a larger dataframe.
.transform() is used to calculate the difference between consecutive occurrences of 'uid_count' for each (location_id, uid) pair, keeping the same index as df.
The issue with .groupby(['date','location_id','uid']) is that each group contains only a single value, so the difference within a group is always 0.
Remove 'uid_count' at the end with .drop(columns='uid_count'), if desired.
import pandas as pd
# sort the dataframe
df = df.sort_values(['date', 'location_id', 'uid'])
# groupby and transform based on the difference in uid_count
uid_count_diff = df.groupby(['location_id', 'uid']).uid_count.transform(lambda x: x.diff()).fillna(0).astype(int)
# create a column in df
df['uid_count_diff'] = uid_count_diff
# set the index
df = df.set_index(['date', 'location_id', 'uid'])
# result
                              uid_count  uid_count_diff
date        location_id  uid
2021-01-01  1001         001  1          0
            2001         003  2          0
            3001         002  3          0
2021-01-02  1001         001  2          1
            2001         004  2          0
            3001         002  4          1
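As a possible simplification, the grouped column has its own .diff() method, so the lambda passed to .transform() is not strictly needed. The sketch below rebuilds the sample frame so it runs on its own; `result` is a name introduced here, and the approach carries the same caveat about larger data as the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-01', '2021-01-01',
             '2021-01-02', '2021-01-02', '2021-01-02'],
    'location_id': [1001, 2001, 3001, 1001, 2001, 3001],
    'uid': ['001', '003', '002', '001', '004', '002'],
    'uid_count': [1, 2, 3, 2, 2, 4],
})

# Sort so that rows of the same (location_id, uid) pair are in date order
df = df.sort_values(['date', 'location_id', 'uid'])

# Difference to the previous row of the same pair; the first row of a pair has no
# predecessor, so its NaN is replaced by 0
df['uid_count_diff'] = (
    df.groupby(['location_id', 'uid'])['uid_count'].diff().fillna(0).astype(int)
)

result = df.set_index(['date', 'location_id', 'uid'])['uid_count_diff']
print(result)
```

The printed series has the same index layout as the desired results in the question.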

Sorting to find maximum values

Here is a sample data table:
ID Number  Test Type  Score
001        A          81
001        A          75
001        A          93
001        B          62
001        B          87
001        B          82
002        A          91
002        A          83
002        B          94
002        B          97
What I want, in Excel, is to return the maximum score of each test type for each ID number, so it would look like this:
ID Number  Test Type  Score
001        A          93
001        B          87
002        A          91
002        B          97
Is that possible?
You can use MAXIFS(). On your second table, if you have the ID number and Test Types, in the C column you can do:
=MAXIFS(Sheet1!$C$2:$C$1000,Sheet1!$A$2:$A$1000,$A2,Sheet1!$B$2:$B$1000,$B2)
Where Sheet1 is your main table.
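To make explicit what each MAXIFS cell computes, here is the same "maximum per ID and test type" lookup sketched in plain Python. It is only an illustration, not an Excel feature; the names `scores` and `best` are made up for the example.

```python
# Sample rows from the question: (ID Number, Test Type, Score)
scores = [
    ("001", "A", 81), ("001", "A", 75), ("001", "A", 93),
    ("001", "B", 62), ("001", "B", 87), ("001", "B", 82),
    ("002", "A", 91), ("002", "A", 83),
    ("002", "B", 94), ("002", "B", 97),
]

# Keep the highest score seen so far for every (ID Number, Test Type) pair
best = {}
for id_number, test_type, score in scores:
    key = (id_number, test_type)
    best[key] = max(score, best.get(key, score))

for (id_number, test_type), score in sorted(best.items()):
    print(id_number, test_type, score)
```

Each key of `best` plays the role of one row in the second table, and its value is what the MAXIFS formula in column C returns for that row.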

Resources