Sorting to find maximum values

Sorting to find maximum values - excel

Here is a sample data table:
ID Number Test Type Score
001 A 81
001 A 75
001 A 93
001 B 62
001 B 87
001 B 82
002 A 91
002 A 83
002 B 94
002 B 97
What I want, in Excel, is to return the maximum score of each test type for each ID number, so the result would look like this:
ID Number Test Type Score
001 A 93
001 B 87
002 A 91
002 B 97
Is that possible?

You can use MAXIFS() (available in Excel 2019 and Microsoft 365). If your second table already lists each ID Number and Test Type pair in columns A and B, enter this in column C and fill it down:
=MAXIFS(Sheet1!$C$2:$C$1000,Sheet1!$A$2:$A$1000,$A2,Sheet1!$B$2:$B$1000,$B2)
Here Sheet1 is the sheet that holds your main table.
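For readers who would rather do the same lookup outside Excel, here is a minimal pandas sketch of the idea behind the MAXIFS() formula: group on the two key columns and take the maximum score per group. The DataFrame name scores and the variable max_scores are illustrative assumptions; the data is the sample table from the question.

```python
import pandas as pd

# Sample data copied from the question; IDs are kept as strings
# so the leading zeros are preserved.
scores = pd.DataFrame({
    "ID Number": ["001"] * 6 + ["002"] * 4,
    "Test Type": ["A", "A", "A", "B", "B", "B", "A", "A", "B", "B"],
    "Score": [81, 75, 93, 62, 87, 82, 91, 83, 94, 97],
})

# One row per (ID Number, Test Type) pair, holding the highest score of that pair.
max_scores = scores.groupby(["ID Number", "Test Type"], as_index=False)["Score"].max()
print(max_scores)
```

Unlike the formula, this builds the list of ID/test-type pairs itself, so no second table has to be prepared first.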

Related

Calculate quantity using an unrelated field in dax

ID
001
002
REQ
ID ITEM QUANT
001 chips 20
002 chips 100
SCHEDULE
1 001 cleaning
1 002 normal
2 001 normal
2 002 remodel
3 001 normal
3 002 remodel
4 001 remodel
4 002 cleaning
item = corn chips
id_store   1          2         3         4
001 phase  cleaning   normal    normal    remodel
001 quant  0          20        20        5
002 phase  normal     remodel   remodel   cleaning
002 quant  100        5         5         0
I want to calculate a quant given a store phase. If the store is cleaning, the quant is 0; if it is remodeling, the quant is 5; otherwise it is the quant from the requirements (REQ) table.
Normally I would do this with a SWITCH statement in DAX, but the phase data is not in my table. Please assist.

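It turns out a simple SWITCH statement that looks at a different table works just fine, as long as the filter context contains a single SCHEDULE[PHASE] value (with several values, comparing VALUES() to a text raises an error).
Num Items :=
VAR T = SUM ( REQ[Quant] )
RETURN
    SWITCH (
        TRUE (),
        VALUES ( SCHEDULE[PHASE] ) = "cleaning", 0,
        VALUES ( SCHEDULE[PHASE] ) = "remodel", 5, -- the sample data spells the phase "remodel"
        T
    )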
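To check the rule against the sample data outside the data model, the same switch logic can be sketched in pandas. This is only an illustration, not the DAX measure: the column names PERIOD, ID and PHASE are assumptions (the question shows SCHEDULE without a header row), and NUM_ITEMS is a hypothetical output column.

```python
import pandas as pd

req = pd.DataFrame({"ID": ["001", "002"], "ITEM": ["chips", "chips"], "QUANT": [20, 100]})

# Assumed column names: the question lists SCHEDULE rows without headers.
schedule = pd.DataFrame({
    "PERIOD": [1, 1, 2, 2, 3, 3, 4, 4],
    "ID": ["001", "002"] * 4,
    "PHASE": ["cleaning", "normal", "normal", "remodel",
              "normal", "remodel", "remodel", "cleaning"],
})

# Bring the required quantity onto every schedule row.
merged = schedule.merge(req, on="ID", how="left")

# Start from the required quantity, then override it for the special phases.
merged["NUM_ITEMS"] = (
    merged["QUANT"]
    .mask(merged["PHASE"] == "remodel", 5)
    .mask(merged["PHASE"] == "cleaning", 0)
)

# Lay the result out like the desired table: one row per store, one column per period.
table = merged.pivot(index="ID", columns="PERIOD", values="NUM_ITEMS")
print(table)
```

The pivoted result should line up with the quant rows of the desired output in the question.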

Pivot a column so repeated values/records are placed in 1 cell

I have the following
Input:
samples = [('001', 'RENAL', 'CHROMOPHOBE', 'KICH'),
('002', 'OVARIAN', 'HIGH_GRADE_SEROUS_CARCINOMA', 'LGSOC'),
('003', 'OVARIAN', 'OTHER', 'NaN'),
('001', 'COLORECTAL', 'ADENOCARCINOMA', 'KICH')]
labels = ['id', 'disease_type', 'disease_sub_type', 'study_abbreviation']
df = pd.DataFrame.from_records(samples, columns=labels)
df
id disease_type disease_sub_type study_abbreviation
0 001 RENAL CHROMOPHOBE KICH
1 002 OVARIAN HIGH_GRADE_SEROUS_CARCINOMA LGSOC
2 003 OVARIAN OTHER NaN
3 001 COLORECTAL ADENOCARCINOMA KICH
I want to be able to compress the repeated id, say 001 in this case so that I can have the disease_type and disease_sub_type, study_abbreviation merged into 1 cell each (nested).
Output
id disease_type disease_sub_type study_abbreviation
0 001 RENAL,COLORECTAL CHROMOPHOBE,ADENOCARCINOMA KICH, KICH
1 002 OVARIAN HIGH_GRADE_SEROUS_CARCINOMA LGSOC
2 003 OVARIAN OTHER NaN
This is not for anything but admin work hence the stupid ask but would help greatly when I need to merge on other datasets, thanks again.

You could group by your 'id' column and use ','.join as the aggregation, which concatenates the strings of each group:
df.groupby('id',as_index=False).agg(','.join)
id disease_type disease_sub_type study_abbreviation
0 001 RENAL,COLORECTAL CHROMOPHOBE,ADENOCARCINOMA KICH,KICH
1 002 OVARIAN HIGH_GRADE_SEROUS_CARCINOMA LGSOC
2 003 OVARIAN OTHER NaN
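The aggregation above relies on every cell being a string (the sample stores the text 'NaN', not a real missing value). A hedged, self-contained variant that also tolerates real missing values might look like this; join_present is a hypothetical helper name, and replacing 'NaN' with None in the sample is an assumption made for the demonstration.

```python
import pandas as pd

samples = [
    ("001", "RENAL", "CHROMOPHOBE", "KICH"),
    ("002", "OVARIAN", "HIGH_GRADE_SEROUS_CARCINOMA", "LGSOC"),
    ("003", "OVARIAN", "OTHER", None),  # assumption: a real missing value, not the string 'NaN'
    ("001", "COLORECTAL", "ADENOCARCINOMA", "KICH"),
]
labels = ["id", "disease_type", "disease_sub_type", "study_abbreviation"]
df = pd.DataFrame.from_records(samples, columns=labels)


def join_present(values):
    """Join the non-missing values of one group with commas (hypothetical helper)."""
    return ",".join(values.dropna().astype(str))


# One row per id; every other column becomes a comma-separated string.
merged = df.groupby("id", as_index=False).agg(join_present)
print(merged)
```

With this helper a group that contains only missing values ends up as an empty string rather than raising a TypeError from str.join.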

Sort rows in a table in excel

I have a large table with lots of "ORDERS" identified by number. Each order has multiple numbered "STEPS" (the number of steps is not necessarily the same for every order), and each step has a "STATUS" that is one of two values: "In process" or "completed".
I want to create a column that tracks each order's steps and, if they are all completed, marks the order as "FINISHED" in all of its rows.
I tried an array formula but couldn't come up with anything that worked.
Example of the desired outcome (the first row and the first column are Excel's own column letters and row numbers):
     A              B             C             D
1    ORDER number   STEP number   STEP Status   ORDER STATUS
2    179            001           completed     FINISHED
3    179            002           completed     FINISHED
4    179            003           completed     FINISHED
5    179            004           completed     FINISHED
6    192            001           In process
7    192            002           completed
8    192            003           completed
9    192            004           In process
10   192            005           In process
11   202            001           completed     FINISHED
12   202            002           completed     FINISHED
13   202            003           completed     FINISHED
14   202            004           completed     FINISHED
15   202            005           completed     FINISHED
16   202            006           completed     FINISHED

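In D2, the formula should be =IF(COUNTIFS(A:A, A2, C:C, "completed") = COUNTIF(A:A, A2), "FINISHED", ""). It marks an order as finished when the number of its rows with status "completed" equals its total number of rows. Then copy that formula down column D.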
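The same "all steps completed" check can be sketched in pandas for anyone processing the table outside Excel. This is an illustrative equivalent rather than part of the answer: the column names order, step, status and order_status are assumptions, and the data is the example table from the question.

```python
import pandas as pd

orders = pd.DataFrame({
    "order": [179] * 4 + [192] * 5 + [202] * 6,
    "step": ["001", "002", "003", "004",
             "001", "002", "003", "004", "005",
             "001", "002", "003", "004", "005", "006"],
    "status": ["completed"] * 4
              + ["In process", "completed", "completed", "In process", "In process"]
              + ["completed"] * 6,
})

# True on every row of an order only when all of that order's steps are completed.
all_done = (orders["status"] == "completed").groupby(orders["order"]).transform("all")

# Mirror the Excel formula: "FINISHED" for finished orders, empty text otherwise.
orders["order_status"] = all_done.map({True: "FINISHED", False: ""})
print(orders)
```

Here transform("all") broadcasts the per-order result back to every row, which plays the role of copying the formula down column D.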

How to calculate data changes over time using Python

For the following dataframe, I need to calculate the change in 'uid_count' for each set of date, location_id, uid, and include the set in the results.
# Sample DataFrame
import pandas as pd
df = pd.DataFrame({'date': ['2021-01-01', '2021-01-01','2021-01-01','2021-01-02', '2021-01-02','2021-01-02'],
'location_id':[1001,2001,3001, 1001,2001,3001],
'uid': ['001', '003', '002','001', '004','002'],
'uid_count':[1, 2,3 ,2, 2, 4]})
date location_id uid uid_count
0 2021-01-01 1001 001 1
1 2021-01-01 2001 003 2
2 2021-01-01 3001 002 3
3 2021-01-02 1001 001 2
4 2021-01-02 2001 004 2
5 2021-01-02 3001 002 4
My desired results would look like:
# Desired Results
date location_id uid
2021-01-01 1001 001 0
2001 003 0
3001 002 0
2021-01-02 1001 001 1
2001 004 0
3001 002 1
I thought I could do this via groupby by using the following, but the desired calculation isn't made:
# Current code:
df.groupby(['date','location_id','uid'],sort=False).apply(lambda x: (x['uid_count'].values[-1] - x['uid_count'].values[0]))
# Current results:
date location_id uid
2021-01-01 1001 001 0
2001 003 0
3001 002 0
2021-01-02 1001 001 0
2001 004 0
3001 002 0
How can I get the desired results?

The following code works with the test dataframe; I have not verified it against a larger one.
The issue with .groupby(['date','location_id','uid']) is that each group contains only a single row, so the last value minus the first is always 0.
Grouping by ['location_id', 'uid'] and calling .diff() calculates the difference between consecutive occurrences of 'uid_count' within each group, and the result keeps the same index as df.
Remove 'uid_count' at the end with .drop(columns='uid_count'), if desired.
import pandas as pd
# sort the dataframe so the rows of each group are in date order
df = df.sort_values(['date', 'location_id', 'uid'])
# difference in uid_count between consecutive rows of each (location_id, uid) group;
# the first row of a group has no predecessor, so its NaN is replaced with 0
uid_count_diff = df.groupby(['location_id', 'uid'])['uid_count'].diff().fillna(0).astype(int)
# create a column in df
df['uid_count_diff'] = uid_count_diff
# set the index
df = df.set_index(['date', 'location_id', 'uid'])
# result
uid_count uid_count_diff
date location_id uid
2021-01-01 1001 001 1 0
2001 003 2 0
3001 002 3 0
2021-01-02 1001 001 2 1
2001 004 2 0
3001 002 4 1

How to convert oracle number type to string with format?

I want to convert number type to string with format as:
number -> string
1 -> 001
2 -> 002
12 -> 012
340 -> 340

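You can use either the TO_CHAR() function (preferable in this situation) or the LPAD() function to achieve the desired result. TO_CHAR() with the '000' mask reserves a leading position for the sign, so the result carries a leading space unless you use 'FM000' instead: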
with t1(col) as(
  select 1 from dual union all
  select 2 from dual union all
  select 12 from dual union all
  select 340 from dual
)
select to_char(col, '000') as num_1
     , lpad(to_char(col), 3, '0') as num_2
  from t1
;
NUM_1 NUM_2
----- ------------
001 001
002 002
012 012
340 340
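If the padding has to happen in application code instead of in the query, the same two approaches (a format mask and left padding) have direct counterparts in Python. This is a small sketch of that idea, not Oracle behaviour; the variable names are illustrative.

```python
numbers = [1, 2, 12, 340]

# Format-mask style, comparable in spirit to TO_CHAR(col, 'FM000').
padded_format = [f"{n:03d}" for n in numbers]

# Left-padding style, comparable in spirit to LPAD(TO_CHAR(col), 3, '0').
padded_zfill = [str(n).zfill(3) for n in numbers]

print(padded_format)
print(padded_zfill)
```

One difference to keep in mind: neither Python form truncates, so a number with more than three digits keeps all of its digits, whereas LPAD(..., 3, '0') cuts the string to three characters.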
