So, I have a pandas data frame:
df =
a b c
a1 b1 c1
a2 b2 c1
a2 b3 c2
a2 b4 c2
I want to rename a2 to a1, then group by a and c and sum the corresponding values of b:
df =
a b c
a1 b1+b2 c1
a1 b3+b4 c2
So, with numbers, something like this:
df =
a value c
a1 10 c1
a2 20 c1
a2 50 c2
a2 60 c2
df =
a value c
a1 30 c1
a1 110 c2
How can I do this?
What about
>>> res = df.replace({"a": {"a2": "a1"}}).groupby(["a", "c"], as_index=False).sum()
>>> res
a c value
0 a1 c1 30
1 a1 c2 110
This first replaces "a2" with "a1" in column a only, then groups by a and c and sums.
To get the original column order back, we can reindex:
>>> res.reindex(df.columns, axis=1)
a value c
0 a1 30 c1
1 a1 110 c2
Try this (selecting value keeps the original a column out of the sum):
df.groupby([df['a'].replace({'a2':'a1'}),'c'])['value'].sum().reset_index()
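For anyone who wants to run both answers end to end, here is a minimal sketch. It assumes the numeric sample from the question (columns a, value, c); the names res1 and res2 are only for illustration.

```python
import pandas as pd

# Sample frame from the question (numeric variant).
df = pd.DataFrame({
    "a": ["a1", "a2", "a2", "a2"],
    "value": [10, 20, 50, 60],
    "c": ["c1", "c1", "c2", "c2"],
})

# Approach 1: replace inside column "a" only, then group and sum.
res1 = df.replace({"a": {"a2": "a1"}}).groupby(["a", "c"], as_index=False).sum()

# Approach 2: group by a replaced copy of column "a".
# The unmodified "a" column is still part of df, so "value" is selected
# explicitly to keep that column out of the aggregation.
res2 = (
    df.groupby([df["a"].replace({"a2": "a1"}), "c"])["value"]
      .sum()
      .reset_index()
)

# Restore the original column order if it matters.
res1 = res1.reindex(df.columns, axis=1)
print(res1)
print(res2)
```

Both results should contain the rows (a1, c1, 30) and (a1, c2, 110); only the column order differs.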
I have this dataframe:
A C1 C2
a1 c1 c3
a2 c2 c4
Columns C1 and C2 have the same type.
I want to get this:
A C
a1 c1
a1 c3
a2 c2
a2 c4
How can I do this?
UPD:
From the answers I got this:
df_final = df.set_index('A').stack().droplevel(1).rename('C').reset_index()
Out[604]:
A C
0 a1 c1
1 a1 c3
2 a2 c2
3 a2 c4
But what should I do if I want to split a frame like this?
A B C1 C2 C3 C4
a1 b1 c1 c2 c3 c4
a2 b2 c5 c6 c7 c8
and get this:
A B C1 C2
a1 b1 c1 c2
a1 b1 c3 c4
a2 b2 c5 c6
a2 b2 c7 c8
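Edit 2: If you have an even number of Cx columns, you can use numpy to keep it simple:
import numpy as np
import pandas as pd
cols = ['C1','C2','C3','C4']
# integer division: repeat() needs an integer count
df1 = df.loc[df.index.repeat(len(cols) // 2), ['A','B']].reset_index(drop=True)
# reshape the Cx values into rows of 2 and attach them to the repeated A/B rows
df_final = df1.join(pd.DataFrame(df[cols].to_numpy().reshape(-1,2), columns=['C1','C2']))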
Out[698]:
A B C1 C2
0 a1 b1 c1 c2
1 a1 b1 c3 c4
2 a2 b2 c5 c6
3 a2 b2 c7 c8
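The same reshape idea can be wrapped in a small helper. This is only a sketch: the function name split_wide and its parameters are made up for illustration, and it assumes the frame has a unique index and that the number of value columns is a multiple of the group width.

```python
import numpy as np
import pandas as pd

def split_wide(df, id_cols, value_cols, width, new_names):
    """Hypothetical helper: fold value_cols into rows of `width` columns each."""
    if len(value_cols) % width != 0:
        raise ValueError("number of value columns must be a multiple of width")
    reps = len(value_cols) // width
    # Repeat each id row once per group of `width` value columns.
    ids = df.loc[df.index.repeat(reps), id_cols].reset_index(drop=True)
    # Row-major reshape keeps the groups of one source row next to each other.
    values = pd.DataFrame(
        df[value_cols].to_numpy().reshape(-1, width), columns=new_names
    )
    return ids.join(values)

df = pd.DataFrame({
    "A": ["a1", "a2"], "B": ["b1", "b2"],
    "C1": ["c1", "c5"], "C2": ["c2", "c6"],
    "C3": ["c3", "c7"], "C4": ["c4", "c8"],
})
df_final = split_wide(df, ["A", "B"], ["C1", "C2", "C3", "C4"], 2, ["C1", "C2"])
print(df_final)
```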
Edit for updated sample:
To split multiple Cx columns into groups of 2, you can use wide_to_long. Before doing so, the column names need to be pre-processed into a format that wide_to_long accepts:
df1 = df.set_index(['A','B'])
stub_cols = (np.arange(df1.columns.size) % 2).astype(str)
suff_cols = (np.arange(df1.columns.size) // 2).astype(str)
d = dict(zip(stub_cols, ['C1', 'C2']))
df1.columns = pd.Series(stub_cols) + '_' + suff_cols
df_final = pd.wide_to_long(df1.reset_index(),
i=['A','B'],
j='num',
stubnames=['0','1'],
sep='_').droplevel(-1).rename(d, axis=1).reset_index()
Out[680]:
A B C1 C2
0 a1 b1 c1 c2
1 a1 b1 c3 c4
2 a2 b2 c5 c6
3 a2 b2 c7 c8
Give this a try
df_final = df.set_index('A').stack().droplevel(1).rename('C').reset_index()
Out[604]:
A C
0 a1 c1
1 a1 c3
2 a2 c2
3 a2 c4
print(
pd.concat([df.A, df[['C1', 'C2']].apply(list, axis=1)], axis=1).explode(0).rename(columns={0:'C'})
)
Prints:
A C
0 a1 c1
0 a1 c3
1 a2 c2
1 a2 c4
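Another option, not taken from the answers above, is melt. The sketch below assumes the two-row sample frame from the question; the stable sort keeps the C1 value ahead of the C2 value for each A.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a1", "a2"], "C1": ["c1", "c2"], "C2": ["c3", "c4"]})

out = (
    df.melt(id_vars="A", value_name="C")   # one row per (A, original column)
      .sort_values("A", kind="stable")     # regroup rows by A, keep C1 before C2
      [["A", "C"]]                         # drop the helper "variable" column
      .reset_index(drop=True)
)
print(out)
```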
I have file paths in column A:
A1 C:\Users\fe\Desktop\01 Tur\2015\Kauk\Telu\Frame Report.pdf
A2 C:\Users\fe\Desktop\01 Tur\Deliveries\10 Toim\Alh\2005\Moot\CMC.doc
A3 C:\Users\fe\Desktop\01 Tur\Equip\Set\M-R\Kir\G3\sen.xls
etc.
I would like to split each path into its components, for example for A1:
A2 "Users" | A3 "fe" | A4 "Desktop" | A5 "01 Tur" | A6 "2015" | A7 "Kauk" | A8 "Telu" | A9 "Frame Report.pdf"
I have tried to play with
=IF(ISERROR(FIND("\";A1;FIND("\";A1;1)+2));A1;LEFT(A1;FIND("\";A1;FIND("\";A1;1)+2)))
but it does not extend well to many components. Is there a better solution that can be copied across cells for this case?
With data in A1, in B1 enter:
=TRIM(MID(SUBSTITUTE($A1,"\",REPT(" ",999)),COLUMNS($A:A)*999-998,999))
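and copy across. (If your locale uses ; as the argument separator, as in the question's formula, replace the commas with semicolons.)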
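To see why the formula works, here is a rough Python sketch of the same trick: every backslash is replaced by a run of 999 spaces, so the n-th component ends up inside the n-th window of 999 characters, and trimming that window leaves just the component. The function name nth_component is made up for illustration, and the sketch assumes the path is much shorter than 999 characters.

```python
def nth_component(path, n, width=999):
    """Mimic TRIM(MID(SUBSTITUTE(path, "\\", REPT(" ", width)), n*width-(width-1), width))."""
    padded = path.replace("\\", " " * width)
    # Excel's MID is 1-based, Python slicing is 0-based.
    start = n * width - (width - 1) - 1
    return padded[start:start + width].strip()

path = r"C:\Users\fe\Desktop\01 Tur\2015\Kauk\Telu\Frame Report.pdf"
parts = [nth_component(path, n) for n in range(1, 10)]
print(parts)

# Outside Excel, a plain split gives the same components.
print(path.split("\\"))
```

Windows past the last component come back as empty strings, which matches the empty cells you get when the formula is copied further to the right.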
Let's say I have the following data :
Trade Data :
TradeId,CptyID,Exposure
T1 , C3, 100
T2 , C2, 50
T3 , C6, 200
Business Hierarchy Data :
CptyID,L1-Acronym,L2-Acronym,L3-Acronym
C3, H1, H2, H3
C2, H4, H5, H2
C6, H4, H5, H6
ID Mapping :
Acronym,CptyID,Identifier
H1 , C1, B1
H2 , C2, B2
H3 , C3, B3
H4 , C4, B4
H5 , C5, B5
H6 , C6, B6
i.e. the hierarchies look like this (one column per trade):
Level  Acronym(Identifier)
L1     H1(B1)  H4(B4)  H4(B4)
L2     H2(B2)  H5(B5)  H5(B5)
L3     H3(B3)  H2(B2)  H6(B6)
Trade  T1      T2      T3
I would like to get the exposure by identifiers (B1, B2, B3, B4, B5, B6) where Exp(B1) = Exp(T1), Exp(B2) = Exp(T1)+Exp(T2)...
Joining them together doesn't work. It would give me 3 facts :
TradeID, CptyID, Exposure, L1-Acronym, L2-Acronym, L3-Acronym, Identifier
T1 , C3 , 100, H1, H2, H3, B3
T2 , C2 , 50, H4, H5, H2, B2
T3 , C6 , 200, H4, H5, H6, B6
and gives me the wrong results, as I would only get the exposures for the identifiers at Level 3:
Identifier,ResultInLive,ExpectedResult
B1 , Null, 100 (Null because I have no facts associated directly to B1)
B2 , 50, 150
B3 , 100, 100
B4 , Null, 250
B5 , Null, 250
B6 , 200, 200
Another difficulty is that those dimensions can have a lot of members (>300K).
Kind regards,
Christophe
Thanks for your answer !
Each level of my Business Hierarchy data is an "entity" that has an identifier.
For instance, let's consider only trade T1, which has an exposure of 100. I have a hierarchy of 3 levels:
the first level is H1, which has the identifier B1
the second level is H2, which has the identifier B2
the third and lowest level is H3, which has the identifier B3
What we are trying to achieve is an identifier dimension with members B1, B2, B3... carrying the right exposure.
Hence, in this case:
B3 would have an exposure of 100 coming from T1 => Exposure(B3) = Exposure(T1)
B2, which is B3's parent, would also have an exposure of 100 coming from T1 => Exposure(B2) = Exposure(T1)
B1, which is B2's parent, would also have an exposure of 100 coming from T1 => Exposure(B1) = Exposure(T1)
Joining on the CptyID doesn't give us the expected result, as the underlying fact would be:
TradeID, CptyID, Exposure, L1-Acronym, L2-Acronym, L3-Acronym, Identifier
T1 , C3 , 100, H1, H2, H3, B3
Therefore, in ActivePivot Live, we would see :
Identifier,ResultIn AP Live,ExpectedResult
B1 , Null, 100 (Null because there are no facts associated directly to B1)
B2 , Null, 100 (Null because there are no facts associated directly to B2)
B3 , 100, 100 (given by the trade fact)
In the first post, I also wanted to illustrate the fact that the same identifier can be in 2 different hierarchies.
For instance :
L1     H1(B1)  H4(B4)  H4(B4)
L2     H2(B2)  H5(B5)  H5(B5)
L3     H3(B3)  H2(B2)  H6(B6)
Trade  T1      T2      T3
we can see that B2 is present at L2 of the first hierarchy and at L3 of the second hierarchy.
Therefore, we would expect to have Exposure(B2) = Exposure (T1) + Exposure (T2) = 150.
Kind regards
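To make the expected numbers concrete, here is a small pandas sketch of the roll-up described above. It is not an ActivePivot solution, only an illustration of the arithmetic under the assumption that each trade should contribute its exposure once to every acronym on its counterparty's hierarchy path; the frame and variable names are made up.

```python
import pandas as pd

trades = pd.DataFrame({
    "TradeId": ["T1", "T2", "T3"],
    "CptyID": ["C3", "C2", "C6"],
    "Exposure": [100, 50, 200],
})
hierarchy = pd.DataFrame({
    "CptyID": ["C3", "C2", "C6"],
    "L1-Acronym": ["H1", "H4", "H4"],
    "L2-Acronym": ["H2", "H5", "H5"],
    "L3-Acronym": ["H3", "H2", "H6"],
})
id_map = pd.DataFrame({
    "Acronym": ["H1", "H2", "H3", "H4", "H5", "H6"],
    "Identifier": ["B1", "B2", "B3", "B4", "B5", "B6"],
})

# Unpivot the hierarchy: one row per (counterparty, level, acronym).
links = hierarchy.melt(id_vars="CptyID", var_name="Level", value_name="Acronym")

# Each trade is duplicated once per level of its path, then mapped to identifiers.
facts = trades.merge(links, on="CptyID").merge(id_map, on="Acronym")

exposure = facts.groupby("Identifier")["Exposure"].sum()
print(exposure)
```

With the sample data this reproduces the ExpectedResult column (B1 100, B2 150, B3 100, B4 250, B5 250, B6 200). Note that the fact rows are multiplied by the hierarchy depth, which may matter with more than 300K members.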
I'm looking for a way to visualize a Viterbi path in LaTeX or maybe Graphviz, much like in this example:
It doesn't have to be dots, but it could also be actual values between the lines. Much like a table with lines between cells.
I tried searching for ways to do this, but most likely I'm not using the right keywords.
Here's one way to achieve this using graphviz with invisible edges:
graph {
splines=false;
nodesep=0.5;
ranksep=0.5;
node[shape=point, height=0.08];
{ rank=same; a1 -- b1 -- c1 -- d1 -- e1;}
{ rank=same; a2 -- b2; b2 -- c2[style=invis]; c2 -- d2; d2 -- e2[style=invis];}
{ rank=same; a3 -- b3[style=invis]; b3 -- c3; c3 -- d3[style=invis]; d3 -- e3;}
edge[style=invis];
a1 -- a2 -- a3;
b1 -- b2 -- b3;
c1 -- c2 -- c3;
d1 -- d2 -- d3;
e1 -- e2 -- e3;
edge[style=solid, constraint=false];
a2 -- b3 -- c2 -- d3 -- e2;
}
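If the grid is larger, writing the DOT by hand gets tedious. Below is a rough Python sketch that generates a similar graph as text. The function name viterbi_dot and the node naming scheme are made up for illustration, and unlike the hand-written graph above it makes all grid edges invisible and draws only the path.

```python
def viterbi_dot(n_states, path):
    """Return DOT source for an n_states x len(path) grid with `path` drawn on it.

    path[t] is the 0-based state (row) chosen at time step t (column).
    """
    n_steps = len(path)

    def node(state, step):
        return f"s{state}_t{step}"

    lines = [
        "graph {",
        "  splines=false;",
        "  nodesep=0.5;",
        "  ranksep=0.5;",
        "  node[shape=point, height=0.08];",
        "  edge[style=invis];",  # layout-only edges from here on
    ]
    # One rank per state: invisible edges keep the columns in order.
    for s in range(n_states):
        row = " -- ".join(node(s, t) for t in range(n_steps))
        lines.append(f"  {{ rank=same; {row}; }}")
    # Invisible vertical chains stack the rows on top of each other.
    for t in range(n_steps):
        col = " -- ".join(node(s, t) for s in range(n_states))
        lines.append(f"  {col};")
    # The path itself: visible, and not allowed to influence the layout.
    lines.append("  edge[style=solid, constraint=false];")
    chain = " -- ".join(node(s, t) for t, s in enumerate(path))
    lines.append(f"  {chain};")
    lines.append("}")
    return "\n".join(lines)

# Same zig-zag as a2 -- b3 -- c2 -- d3 -- e2 in the graph above.
dot_source = viterbi_dot(3, [1, 2, 1, 2, 1])
print(dot_source)
```

The printed text can be saved to a file and rendered with the dot command-line tool.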