How does one use an assignment expression in a dictionary comprehension? - python-3.x

Suppose I have the below data frame:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[100, 90, 80, 70, 36, 45],
     [101, 78, 65, 88, 55, 78],
     [92, 77, 42, 79, 43, 32],
     [103, 98, 76, 54, 45, 65]],
    index=pd.date_range(start='2022-01-01', periods=4),
)
df.columns = pd.MultiIndex.from_tuples(
    (("mkf", "Open"),
     ("mkf", "Close"),
     ("tdf", "Open"),
     ("tdf", "Close"),
     ("ghi", "Open"),
     ("ghi", "Close"))
)
And then I execute the following dictionary comprehension:
{c:df[c].assign(r=np.log(df[(c, 'Close')]).diff()) for c in df.columns.levels[0]}
{'ghi': Open Close r
2022-01-01 36 45 NaN
2022-01-02 55 78 0.550046
2022-01-03 43 32 -0.890973
2022-01-04 45 65 0.708651,
'mkf': Open Close r
2022-01-01 100 90 NaN
2022-01-02 101 78 -0.143101
2022-01-03 92 77 -0.012903
2022-01-04 103 98 0.241162,
'tdf': Open Close r
2022-01-01 80 70 NaN
2022-01-02 65 88 0.228842
2022-01-03 42 79 -0.107889
2022-01-04 76 54 -0.380464}
How would one produce the same result with an assignment expression (i.e., the := symbol)?
https://www.digitalocean.com/community/tutorials/how-to-use-assignment-expressions-in-python
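One possibility (a sketch, assuming the dataframe built above) is to bind the sub-frame to a name with := in the value expression, so the 'Close' column can be looked up on it instead of indexing df a second time with (c, 'Close'):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[100, 90, 80, 70, 36, 45],
     [101, 78, 65, 88, 55, 78],
     [92, 77, 42, 79, 43, 32],
     [103, 98, 76, 54, 45, 65]],
    index=pd.date_range(start='2022-01-01', periods=4),
)
df.columns = pd.MultiIndex.from_tuples(
    (("mkf", "Open"), ("mkf", "Close"),
     ("tdf", "Open"), ("tdf", "Close"),
     ("ghi", "Open"), ("ghi", "Close"))
)

# (sub := df[c]) names the sub-frame for reuse within the same expression
result = {c: (sub := df[c]).assign(r=np.log(sub['Close']).diff())
          for c in df.columns.levels[0]}
```

Note that a name bound with := inside a comprehension leaks into the enclosing scope (here, sub keeps the last sub-frame after the loop), which is one reason this pattern is sometimes avoided.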

Related

How can I create a nested dictionary that performs a computation and inserts a column?

Suppose I have the following dataframe with multi-index columns. Note that this dataframe is composed of three similarly structured sub-dataframes.
df = pd.DataFrame(
    [[100, 90, 80, 70, 36, 45],
     [101, 78, 65, 88, 55, 78],
     [92, 77, 42, 79, 43, 32],
     [103, 98, 76, 54, 45, 65]],
    index=pd.date_range(start='2022-01-01', periods=4),
)
df.columns = pd.MultiIndex.from_tuples(
    (("mkf", "Open"),
     ("mkf", "Close"),
     ("tdf", "Open"),
     ("tdf", "Close"),
     ("ghi", "Open"),
     ("ghi", "Close"))
)
And then I apply a dictionary comprehension to it.
{c:df[c] for c in df.columns.levels[0]}
{'ghi': Open Close
2022-01-01 36 45
2022-01-02 55 78
2022-01-03 43 32
2022-01-04 45 65,
'mkf': Open Close
2022-01-01 100 90
2022-01-02 101 78
2022-01-03 92 77
2022-01-04 103 98,
'tdf': Open Close
2022-01-01 80 70
2022-01-02 65 88
2022-01-03 42 79
2022-01-04 76 54}
How can I modify the dictionary comprehension {c:df[c] for c in df.columns.levels[0]}
to be a nested dictionary comprehension that applies the below function to each component of the dictionary?
def log_return(df):
    df['r'] = np.log(df['Close']).diff()
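One approach (a sketch, assuming the dataframe built above): since log_return mutates its argument in place and returns None, it can be wrapped so it returns the frame, and then used directly in the comprehension:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[100, 90, 80, 70, 36, 45],
     [101, 78, 65, 88, 55, 78],
     [92, 77, 42, 79, 43, 32],
     [103, 98, 76, 54, 45, 65]],
    index=pd.date_range(start='2022-01-01', periods=4),
)
df.columns = pd.MultiIndex.from_tuples(
    (("mkf", "Open"), ("mkf", "Close"),
     ("tdf", "Open"), ("tdf", "Close"),
     ("ghi", "Open"), ("ghi", "Close"))
)

def log_return(d):
    # work on a copy so the original df is untouched, and return it
    # (the original function returns None, which a comprehension can't use)
    d = d.copy()
    d['r'] = np.log(d['Close']).diff()
    return d

result = {c: log_return(df[c]) for c in df.columns.levels[0]}
```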

Calculate weighted average results for multiple columns based on another dataframe in Pandas

Let's say we have a students' score data df1 and credit data df2 as follows:
df1:
stu_id major Python English C++
0 U202010521 computer 56 81 82
1 U202010522 management 92 56 64
2 U202010523 management 95 88 81
3 U202010524 BigData&AI 79 53 74
4 U202010525 computer 53 71 -1
5 U202010526 computer 78 96 53
6 U202010527 BigData&AI 69 63 74
7 U202010528 BigData&AI 86 57 82
8 U202010529 BigData&AI 81 100 85
9 U202010530 BigData&AI 79 67 80
df2:
class credit
0 Python 2
1 English 4
2 C++ 3
I need to calculate weighted average for each students' scores.
df2['credit_ratio'] = df2['credit']/9
Out:
class credit credit_ratio
0 Python 2 0.222222
1 English 4 0.444444
2 C++ 3 0.333333
i.e., for U202010521, his/her weighted score will be 56*0.22 + 81*0.44 + 82*0.33 = 75.02 (using the rounded ratios). I need to calculate each student's weighted_score as a new column. How could I do that in Pandas?
Try with set_index + mul then sum on axis=1:
df1['weighted_score'] = (
    df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
)
df1:
stu_id major Python English C++ weighted_score
0 U202010521 computer 56 81 82 75.777778
1 U202010522 management 92 56 64 66.666667
2 U202010523 management 95 88 81 87.222222
3 U202010524 BigData&AI 79 53 74 65.777778
4 U202010525 computer 53 71 -1 43.000000
5 U202010526 computer 78 96 53 77.666667
6 U202010527 BigData&AI 69 63 74 68.000000
7 U202010528 BigData&AI 86 57 82 71.777778
8 U202010529 BigData&AI 81 100 85 90.777778
9 U202010530 BigData&AI 79 67 80 74.000000
Explanation:
By setting the index of df2 to class, multiplication will now align correctly with the columns of df1:
df2.set_index('class')['credit_ratio']
class
Python 0.222222
English 0.444444
C++ 0.333333
Name: credit_ratio, dtype: float64
Select the specific columns from df1 using the values from df2:
df1[df2['class']]
Python English C++
0 56 81 82
1 92 56 64
2 95 88 81
3 79 53 74
4 53 71 -1
5 78 96 53
6 69 63 74
7 86 57 82
8 81 100 85
9 79 67 80
Multiply to apply the weights:
df1[df2['class']].mul(df2.set_index('class')['credit_ratio'])
Python English C++
0 12.444444 36.000000 27.333333
1 20.444444 24.888889 21.333333
2 21.111111 39.111111 27.000000
3 17.555556 23.555556 24.666667
4 11.777778 31.555556 -0.333333
5 17.333333 42.666667 17.666667
6 15.333333 28.000000 24.666667
7 19.111111 25.333333 27.333333
8 18.000000 44.444444 28.333333
9 17.555556 29.777778 26.666667
Then sum across rows to get the total value.
df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
0 75.777778
1 66.666667
2 87.222222
3 65.777778
4 43.000000
5 77.666667
6 68.000000
7 71.777778
8 90.777778
9 74.000000
dtype: float64
I can do it in several steps; the complete workflow is below:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
"""stu_id major Python English C++
U202010521 computer 56 81 82
U202010522 management 92 56 64
U202010523 management 95 88 81
U202010524 BigData&AI 79 53 74
U202010525 computer 53 71 -1
U202010526 computer 78 96 53
U202010527 BigData&AI 69 63 74
U202010528 BigData&AI 86 57 82
U202010529 BigData&AI 81 100 85
U202010530 BigData&AI 79 67 80"""), sep=r"\s+")
df2 = pd.read_csv(StringIO(
"""class credit
Python 2
English 4
C++ 3"""), sep=r"\s+")
df2['credit_ratio'] = df2['credit']/9
df3 = df.melt(id_vars=["stu_id", "major"])
df3["credit_ratio"] = df3["variable"].map(df2[["class", "credit_ratio"]].set_index("class").to_dict()["credit_ratio"])
df3["G"] = df3["value"] * df3["credit_ratio"]
df3.groupby("stu_id")["G"].sum()
stu_id
U202010521 75.777778
U202010522 66.666667
U202010523 87.222222
U202010524 65.777778
U202010525 43.000000
U202010526 77.666667
U202010527 68.000000
U202010528 71.777778
U202010529 90.777778
U202010530 74.000000
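For completeness, the same weighted sum can also be written as a single matrix product with DataFrame.dot, which aligns the selected score columns against the weights' index. A minimal sketch on a two-student subset of the data above:

```python
import pandas as pd

df1 = pd.DataFrame({
    'stu_id': ['U202010521', 'U202010522'],
    'major': ['computer', 'management'],
    'Python': [56, 92],
    'English': [81, 56],
    'C++': [82, 64],
})
df2 = pd.DataFrame({'class': ['Python', 'English', 'C++'], 'credit': [2, 4, 3]})

# normalize credits to ratios, indexed by class name
weights = df2.set_index('class')['credit'] / df2['credit'].sum()
# .dot aligns df1's selected columns with weights' index
df1['weighted_score'] = df1[weights.index].dot(weights)
```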

Adding empty row base on two columns in Pandas DataFrame

I have a dataframe of following structure
x y z
93 122 787.185547
93 123 847.964905
93 124 908.932190
93 125 1054.865845
93 126 1109.340576
x and y are coordinates, and I know their ranges. For example:
x_range = np.arange(90, 130)
y_range = np.arange(100, 130)
z is the measurement data.
Now I want to insert the missing points with NaN values in z, so it looks like:
x y z
90 100 NaN
90 101 NaN
...........................
93 121 NaN
93 122 787.185547
93 123 847.964905
93 124 908.932190
...........................
129 128 NaN
129 129 NaN
It can be done with a simple but clumsy for loop, but is there a cleaner way to do this?
I recommend using itertools.product followed by merge:
import itertools
df = pd.DataFrame(itertools.product(x_range, y_range), columns=['x', 'y']).merge(df, how='left')
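An equivalent alternative (a sketch on a small sample of the data above) builds the full grid with MultiIndex.from_product and reindexes against it, which skips the merge entirely:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [93, 93], 'y': [122, 123],
                   'z': [787.185547, 847.964905]})
x_range = np.arange(90, 130)
y_range = np.arange(100, 130)

# every (x, y) pair in the grid; pairs absent from df get NaN in z
full = pd.MultiIndex.from_product([x_range, y_range], names=['x', 'y'])
out = df.set_index(['x', 'y']).reindex(full).reset_index()
```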

Fill NaN values from its Previous Value pandas

I have the below data from an Excel sheet, and I want every NaN to be filled from the value just above it, even when there are one or more consecutive NaNs. I tried the ffill() method, but it doesn't seem to solve the problem: it appears to take the very first value before the NaNs in the column and propagate that to all of them.
Could someone help, please?
My dataframe:
import pandas as pd
df = pd.read_excel("Example-sheat.xlsx",sheet_name='Sheet1')
#df = df.fillna(method='ffill')
#df = df['AuthenticZTed domaTT controller'].ffill()
print(df)
My DataFrame output:
AuthenticZTed domaTT controller source KTvice naHR
0 ZTPGRKMIK1DC200.example.com TTv1614
1 TT1NDZ45DC202.example.com TTv1459
2 TT1NDZ45DC202.example.com TTv1495
3 NaN TTv1670
4 TT1NDZ45DC202.example.com TTv1048
5 TN1CQI02DC200.example.com TTv1001
6 DU2RDCRDC1DC204.example.com TTva082
7 NaN xxgb-gen
8 ZTPGRKMIK1DC200.example.com TTva038
9 DU2RDCRDC1DC204.example.com TTv0071
10 NaN ttv0032
11 KT1MUC02DUDC201.example.com TTv0073
12 NaN TTv0679
13 TN1SZZ67DC200.example.com TTv1180
14 TT1NDZ45DC202.example.com TTv1181
15 TT1BLR01APDC200.example.com TTv0859
16 TN1SZZ67DC200.example.com xxg2089
17 NaN TTv1846
18 ZTPGRKMIK1DC200.example.com TTvtp064
19 PR1CPQ01DC200.example.com TTv0950
20 PR1CPQ01DC200.example.com TTc7005
21 NaN TTv0678
22 KT1MUC02DUDC201.example.com TTv257032798
23 PR1CPQ01DC200.example.com xxg2016
24 NaN TTv0313
25 TT1BLR01APDC200.example.com TTc4901
26 NaN TTv0710
27 DU2RDCRDC1DC204.example.com xxg3008
28 NaN TTv1080
29 PR1CPQ01DC200.example.com xxg2022
30 NaN xxg2057
31 NaN TTv1522
32 TN1SZZ67DC200.example.com TTv258998881
33 PR1CPQ01DC200.example.com TTv259064418
34 ZTPGRKMIK1DC200.example.com TTv259129955
35 TT1BLR01APDC200.example.com xxg2034
36 NaN TTv259326564
37 TNHSZPBCD2DC200.example.com TTv259129952
38 KT1MUC02DUDC201.example.com TTv259195489
39 ZTPGRKMIK1DC200.example.com TTv0683
40 ZTPGRKMIK1DC200.example.com TTv0885
41 TT1BLR01APDC200.example.com dbexh
42 NaN TTvtp065
43 TN1PEK01APDC200.example.com TTvtp057
44 ZTPGRKMIK1DC200.example.com TTvtp007
45 NaN TTvtp063
46 TT1BLR01APDC200.example.com TTvtp032
47 KTphbgsa11dc201.example.com TTvtp046
48 NaN TTvtp062
49 PR1CPQ01DC200.example.com TTv0235
50 NaN TTv0485
51 TT1NDZ45DC202.example.com TTv0236
52 NaN TTv0486
53 PR1CPQ01DC200.example.com TTv0237
54 NaN TTv0487
55 TT1BLR01APDC200.example.com TTv0516
56 TN1CQI02DC200.example.com TTv1285
57 TN1PEK01APDC200.example.com TTv0440
58 NaN liv9007
59 HR1GDL28DC200.example.com TTv0445
60 NaN tuv006
61 FTGFTPTP34DC203.example.com TTv0477
62 NaN tuv002
63 TN1CQI02DC200.example.com TTv0534
64 TN1SZZ67DC200.example.com TTv0639
65 NaN TTv0825
66 NaN TTv1856
67 TT1BLR01APDC200.example.com TTva101
68 TN1SZZ67DC200.example.com TTv1306
69 KTphbgsa11dc201.example.com TTv1072
70 NaN webx02
71 KT1MUC02DUDC201.example.com TTv1310
72 PR1CPQ01DC200.example.com TTv1151
73 TN1CQI02DC200.example.com TTv1165
74 NaN tuv90
75 TN1SZZ67DC200.example.com TTv1065
76 KTphbgsa11dc201.example.com TTv1737
77 NaN ramn01
78 HR1GDL28DC200.example.com ramn02
79 NaN ptb001
80 HR1GDL28DC200.example.com ptn002
81 NaN ptn003
82 TN1SZZ67DC200.example.com TTs0057
83 PR1CPQ01DC200.example.com TTs0058
84 NaN TTs0058-duplicZTe
85 PR1CPQ01DC200.example.com xxg2080
86 KTphbgsa11dc204.example.com xxg2081
87 TN1PEK01APDC200.example.com xxg2082
88 NaN xxg3002
89 TN1SZZ67DC200.example.com xxg2084
90 NaN xxg3005
91 ZTPGRKMIK1DC200.example.com xxg2086
92 NaN xxg3007
93 KT1MUC02DUDC201.example.com xxg2098
94 NaN xxg3014
95 TN1PEK01APDC200.example.com xxg2026
96 NaN xxg2094
97 TN1PEK01APDC200.example.com livtp005
98 KT1MUC02DUDC201.example.com xxg2059
99 ZTPGRKMIK1DC200.example.com acc9102
100 NaN xxg2111
101 TN1CQI02DC200.example.com xxgtp009
Desired Output:
AuthenticZTed domaTT controller source KTvice naHR
0 ZTPGRKMIK1DC200.example.com TTv1614
1 TT1NDZ45DC202.example.com TTv1459
2 TT1NDZ45DC202.example.com TTv1495
3 TT1NDZ45DC202.example.com TTv1670 <---
4 TT1NDZ45DC202.example.com TTv1048
5 TN1CQI02DC200.example.com TTv1001
6 DU2RDCRDC1DC204.example.com TTva082
7 DU2RDCRDC1DC204.example.com xxgb-gen <---
1. You are already close to your solution; using shift() with ffill() should work:
df = df.apply(lambda x: x.fillna(df['AuthenticZTed domaTT controller']).shift()).ffill()
2. As Quang suggested in the comments, this also works:
df['AuthenticZTed domaTT controller'] = df['AuthenticZTed domaTT controller'].ffill()
3. Or you can also try the following:
df = df.fillna({var: df['AuthenticZTed domaTT controller'].shift() for var in df}).ffill()
4. Alternatively, if you have multiple columns, you can define a cols variable and loop through it (note: looping with for cols in df.columns would clobber the cols list; use a separate loop variable):
cols = ['AuthenticZTed domaTT controller', 'source KTvice naHR']
for col in cols:
    df[col] = df[col].ffill()
print(df)
OR
df.loc[:,cols] = df.loc[:,cols].ffill()
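For what it's worth, ffill does fill each NaN from the nearest preceding non-NaN value, not from the first value of the column; a minimal standalone check:

```python
import pandas as pd

s = pd.Series(['a', 'b', None, 'c', None, None])
# each NaN takes the closest non-NaN value above it
print(s.ffill().tolist())  # ['a', 'b', 'b', 'c', 'c', 'c']
```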

Generating all the combinations of 7 columns in a dataframe and add the corresponding rows to generate new columns

I have a dataframe that looks similar to below:
Wave A B C
340 77 70 15
341 80 73 15
342 83 76 16
343 86 78 17
I want to generate columns that will have all the possible combinations of the existing columns. I showed 3 cols here but in my actual data, I have 7 columns and therefore 127 total combinations. The desired output is as follows:
Wave A B C AB AC BC ... ABC
340 77 70 15 147 92 ...
341 80 73 15 153 95 ...
342 83 76 16 159 99 ...
I implemented a quite inefficient version where the user inputs the combinations (AB, AC, etc.) and a new column is created with the sum of the rows. This seems almost impossible to accomplish for 127 combinations, especially with descriptive column names.
Create a list of all combinations with chain + combinations from itertools, then sum the appropriate columns:
from itertools import combinations, chain
cols = [*df.iloc[:, 1:]]
l = list(chain.from_iterable(combinations(cols, n + 2) for n in range(len(cols))))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]
for items in l:
    df[''.join(items)] = df.loc[:, items].sum(axis=1)
Wave A B C AB AC BC ABC
0 340 77 70 15 147 92 85 162
1 341 80 73 15 153 95 88 168
2 342 83 76 16 159 99 92 175
3 343 86 78 17 164 103 95 181
You need to get all the combinations first, then build a mapping dict (or Series) from each column to its combined name:
l = df.columns[1:].tolist()
l1 = [list(map(list, itertools.combinations(l, i))) for i in range(len(l) + 1)]
d = [dict.fromkeys(y, ''.join(y)) for x in l1 for y in x]
maps = pd.Series(d).apply(pd.Series).stack()
df.set_index('Wave', inplace=True)
# reindex orders the columns of df to match the map keys
df = df.reindex(columns=maps.index.get_level_values(1))
# assign the combined names back; the order is the same, so this is safe
df.columns = maps.tolist()
df.sum(level=0, axis=1)  # in pandas >= 2.0, use df.T.groupby(level=0).sum().T
Out[303]:
A B C AB AC BC ABC
Wave
340 77 70 15 147 92 85 162
341 80 73 15 153 95 88 168
342 83 76 16 159 99 92 175
343 86 78 17 164 103 95 181
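The loop in the first answer can also be condensed into a single dict comprehension fed to assign (a self-contained sketch on the sample frame):

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'Wave': [340, 341, 342, 343],
                   'A': [77, 80, 83, 86],
                   'B': [70, 73, 76, 78],
                   'C': [15, 15, 16, 17]})
value_cols = df.columns[1:]
# every combination of 2..N columns, keyed by the concatenated names;
# for 7 value columns this yields the 120 multi-column sums
sums = {''.join(combo): df[list(combo)].sum(axis=1)
        for r in range(2, len(value_cols) + 1)
        for combo in combinations(value_cols, r)}
df = df.assign(**sums)
```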