Use pandas dataframe column values to pivot other columns - python-3.x

I have the following dataframe which I want to reshape:
dir hour board_sign pass
1 5 d 294
1 5 u 342
1 6 d 1368
1 6 u 1268
1 7 d 3880
1 7 u 3817
What I want to do is use the values from "board_sign" as new columns that will hold the values from the "pass" column, so that the dataframe looks like this:
dir hour d u
1 5 294 342
1 6 1368 1268
1 7 3880 3817
I have already tried several functions such as melt, pivot, stack and unstack, but it seems none of them gives the wanted result. I also tried pivot_table, but the resulting MultiIndex makes it difficult to iterate over.
It seems like an easy operation, but I just can't get it right.
Is there any other function I can use for this?
Thanks.

Use pivot_table:
df = df.pivot_table(index=['dir', 'hour'], columns='board_sign', values='pass').reset_index()
df.columns.name = None  # clear the leftover 'board_sign' columns name
df
dir hour d u
0 1 5 294 342
1 1 6 1368 1268
2 1 7 3880 3817
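Since each dir/hour/board_sign combination appears exactly once, no aggregation is actually needed, so plain pivot is an equivalent sketch (a list-like index requires pandas >= 1.1):
out = (df.pivot(index=['dir', 'hour'], columns='board_sign', values='pass')
         .reset_index()
         .rename_axis(columns=None))  # also clears the leftover 'board_sign' columns name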

Related

How to split dataframe by column value condition, pandas

I want to split a dataframe into different lists based on a column value condition.
Here is a dataframe example.
df = pd.DataFrame({'flag_1': [1,2,3,1,2,500,498,495,1,1,1,1,1,500,440,430,2,3,4,4],
                   'dd':     [1,1,1,7,7,7,8,8,8,1,1,1,7,7,7,8,8,8,5,7]})
The desired output, df_out:
df_out = pd.DataFrame({'flag_1': [500,498,495,500,440,430], 'dd': [7,8,8,7,7,8]})
Try this:
grp = (df['flag_1'] < 400).cumsum()  # a new group starts at every low "noise" row;
                                     # the 400 cutoff is inferred from the desired df_out
df_out = pd.concat([g.iloc[1:] for _, g in df.groupby(grp) if len(g) > 1],
                   ignore_index=True)  # keep only the high rows that follow a noise row
Output:
flag_1 dd
0 500 7
1 498 8
2 495 8
3 500 7
4 440 7
5 430 8
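If the goal is simply to keep the large readings rather than split into groups, a plain boolean mask gives the same df_out; the 400 cutoff is an assumption inferred from the desired output, where flagged values are all 400+ and noise values are single digits:
df_out = df[df['flag_1'] >= 400].reset_index(drop=True)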

loops application in dataframe to find output

I have the following data:
data = {'A': [1,2,3,4,5], 'B': [10,20,233,29,2], 'C': [10,20,3040,230,238], ...}
and
df = pd.DataFrame(data)
In this manner I have 20 columns with 5 numerical entries in each column.
I want a new column whose values follow this logic:
0    A[0]*B[0] + A[0]*C[0] + A[0]*D[0] ...
1    A[1]*B[1] + A[1]*C[1] + A[1]*D[1] ...
2    A[2]*B[2] + A[2]*C[2] + A[2]*D[2] ...
I tried the following, but I cannot manually write out 20 columns, so I wanted to know how to apply a loop to get the desired output:
lst = []
for i in range(5):
    j = df.A[i]*df.B[i] + df.A[i]*df.C[i] + ...
    lst.append(j)
A potential solution is the following. I am only using the example you posted, but it works fine for more columns. Your data is df:
A B C
0 1 10 10
1 2 20 20
2 3 233 3040
3 4 29 230
4 5 2 238
You can create a new column, D, by first subsetting your dataframe:
add = df.loc[:, df.columns != 'A']
and then take the sum over all products of those columns with column A (this works because A*B + A*C + ... = A*(B + C + ...)):
df['D'] = df['A']*add.sum(axis=1)
which returns
A B C D
0 1 10 10 20
1 2 20 20 80
2 3 233 3040 9819
3 4 29 230 1036
4 5 2 238 1200
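Since the question explicitly asks for a loop, an equivalent explicit version is sketched below, under the same assumption that every column other than A is multiplied by A and summed:
df['D'] = 0
for col in df.columns.drop(['A', 'D']):  # all columns except A and the result column
    df['D'] = df['D'] + df['A'] * df[col]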

How can I sort 3 columns and assign it to one python pandas

I have a dataframe:
df = {'A': [1,1,1], 'B': [2012,3014,3343], 'C': [12,13,45], 'D': [111,222,444]}
but I need to join the last 3 columns in consecutive order and line them up against the first column, something like this:
df2 = {'A': [1,1,1,2,2,2], 'Fusion3': [2012,12,111,3014,13,222]}
I have tried with .melt, but I am struggling with some ideas and would be grateful for your comments.
From the desired output, I'm making the assumption that the initial dataframe should have 1, 2, 3 in the A column rather than 1, 1, 1.
import pandas as pd
df = pd.DataFrame({'A':[1,2,3], 'B':[2012,3014,3343], 'C':[12,13,45], 'D':[111,222,444]})
df = df.set_index('A')
df = df.stack().droplevel(1)
will give you this series:
A
1 2012
1 12
1 111
2 3014
2 13
2 222
3 3343
3 45
3 444
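If you want the result back as a two-column frame named as in the desired df2, a small follow-up should do it (Fusion3 taken from the question):
df2 = df.rename('Fusion3').reset_index()  # columns: A, Fusion3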
Check melt:
out = df.melt('A').drop(columns='variable')
Out[15]:
A value
0 1 2012
1 2 3014
2 3 3343
3 1 12
4 2 13
5 3 45
6 1 111
7 2 222
8 3 444
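Note that melt enumerates column by column, so the row order differs from the interleaved order in the desired df2; a stable sort on A should restore it:
out = out.sort_values('A', kind='mergesort').reset_index(drop=True)  # mergesort is stable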

New dataframe row by row from a different format dataframe

I'm wondering if something like this could be achieved with Python.
I currently have the following dataframe (df1):
A B C D E F
1.1.1 amba 131 1 50 4
2.2.2 erto 50 7 131 8
3.3.3 gema 131 2 50 5
And I would like to get this output in a new dataframe (df2):
ID User 131 50
1.1.1 amba 1 4
2.2.2 erto 8 7
3.3.3 gema 2 5
Bear in mind that df1 has an undetermined number of rows, and df2 should have the same number of rows as df1. The first and second columns do not change and stay the same. Columns C and E in df1 store attribute IDs while columns D and F store the attributes' values. For example, in df1, 131=1 and 50=4 in the first row. Also, attribute IDs are not always in the same column: an attribute ID could be placed in column C or column E.
I am thinking of creating df2 using a loop and analyzing rows with a lambda, but I am currently having issues making anything work. Any ideas?
I have understood every part of the code and I am now adding columns, but I am wondering if this could be done with a loop or something similar. This is how the code looks after adding 4 extra columns:
import pandas as pd
import io
df1 = pd.read_csv(io.StringIO(""" A B C D E F G H I J
1.1.1 amba 131 1 50 4 40 3 150 5
2.2.2 erto 50 7 40 8 150 8 131 2
3.3.3 gema 131 2 150 5 40 1 50 3"""), sep=r"\s+")
df2 = (pd.concat([df1.drop(columns=["C","D","E","F","G","H"]).rename(columns={"I":"key","J":"val"}),
df1.drop(columns=["C","D","E","F","I","J"]).rename(columns={"G":"key","H":"val"}),
df1.drop(columns=["C","D","G","H","I","J"]).rename(columns={"E":"key","F":"val"}),
df1.drop(columns=["E","F","G","H","I","J"]).rename(columns={"C":"key","D":"val"}),
])
.rename(columns={"A":"ID","B":"User"})
.set_index(["ID","User","key"])
.unstack(2)
.reset_index()
)
# flatten the columns..
df2.columns = [c[1] if c[0]=="val" else c[0] for c in df2.columns.to_flat_index()]
print(df2)
And this is the output:
ID User 40 50 131 150
0 1.1.1 amba 3 4 1 5
1 2.2.2 erto 8 7 2 8
2 3.3.3 gema 1 3 2 5
So yes, everything is working fine, but I would like to find a way to do this with a loop instead of having tons of lines (I have about 70 columns per row). Thank you very much for the help.
I have just one extra question and then I will have everything working fine. In my actual table some rows have 60 columns and others just 30 or so. This means I have tons of NaN in the rows with fewer columns, so I am getting an error when I try to unstack. I have read about pivot_table, drop_duplicates, etc., but I'm not sure how to make some of those options work with this code. Thanks!
Logically you have a mix of keys being part of the row and part of the columns. Construct a df with concat() so that the whole key is part of the row. Then it's a simple case of using unstack() to get what you want:
import io
import pandas as pd

df1 = pd.read_csv(io.StringIO(""" A B C D E F
1.1.1 amba 131 1 50 4
2.2.2 erto 50 7 131 8
3.3.3 gema 131 2 50 5"""), sep=r"\s+")
df2 = (pd.concat([df1.drop(columns=["C","D"]).rename(columns={"E":"key","F":"val"}),
df1.drop(columns=["E","F"]).rename(columns={"C":"key","D":"val"}),
])
.rename(columns={"A":"ID","B":"User"})
.set_index(["ID","User","key"])
.unstack(2)
.reset_index()
)
# flatten the columns..
df2.columns = [c[1] if c[0]=="val" else c[0] for c in df2.columns.to_flat_index()]
df2
Output:
ID User 50 131
1.1.1 amba 4 1
2.2.2 erto 7 8
3.3.3 gema 5 2
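To avoid writing one drop/rename per pair (the follow-up mentions about 70 columns), the per-pair frames can be built in a loop; swapping unstack() for pivot_table also sidesteps the error from rows with missing pairs. A sketch, assuming the columns strictly alternate key/value after the two ID columns:
import io
import pandas as pd

df1 = pd.read_csv(io.StringIO(""" A B C D E F G H I J
1.1.1 amba 131 1 50 4 40 3 150 5
2.2.2 erto 50 7 40 8 150 8 131 2
3.3.3 gema 131 2 150 5 40 1 50 3"""), sep=r"\s+")

id_cols = ["A", "B"]                   # fixed identifier columns
pair_cols = df1.columns.drop(id_cols)  # alternating key/value columns
frames = [
    df1[id_cols + [k, v]].set_axis(["ID", "User", "key", "val"], axis=1)
    for k, v in zip(pair_cols[::2], pair_cols[1::2])
]
df2 = (pd.concat(frames)
         .dropna(subset=["key"])       # rows with fewer pairs leave NaN keys
         .pivot_table(index=["ID", "User"], columns="key", values="val", aggfunc="first")
         .reset_index()
         .rename_axis(columns=None))
print(df2)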

Selective multiplication of a pandas dataframe

I have a pandas DataFrame and a Series of the form:
df = pd.DataFrame({'Key':[2345,2542,5436,2468,7463],
'Segment':[0] * 5,
'Values':[2,4,6,6,4]})
print (df)
Key Segment Values
0 2345 0 2
1 2542 0 4
2 5436 0 6
3 2468 0 6
4 7463 0 4
s = pd.Series([5436, 2345])
print (s)
0 5436
1 2345
dtype: int64
In the original df, I want to multiply the third column (Values) by 7, except for the keys that are present in the series. So my final df should look like this:
Key Segment Values
0 2345 0 2
1 2542 0 28
2 5436 0 6
3 2468 0 42
4 7463 0 28
What should be the best way to achieve this in Python 3.x?
Use DataFrame.loc with Series.isin to filter the Values column, inverting the condition to test for non-membership, and multiply by a scalar:
df.loc[~df['Key'].isin(s), 'Values'] *= 7
print (df)
Key Segment Values
0 2345 0 2
1 2542 0 28
2 5436 0 6
3 2468 0 42
4 7463 0 28
Another method could be using numpy.where():
import numpy as np

df['Values'] *= np.where(~df['Key'].isin(s), 7, 1)
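A per-row factor is a third equivalent (a sketch, reusing the same s):
factor = df['Key'].isin(s).map({True: 1, False: 7})  # keep excluded keys as-is
df['Values'] *= factor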
