line feed inside row in column with pandas - python-3.x

Is there a way in pandas to separate the data inside one cell of a column into multiple rows? Each row holds several values: I grouped by col1, and the result is a df like this:
col1 Col2
0 1 abc,def,ghi
1 2 xyz,asd
and desired output would be:
Col1 Col2
0 1 abc
def
ghi
1 2 xyz
asd
thanks

Use str.split and explode:
print (df.assign(Col2=df["Col2"].str.split(","))
.explode("Col2"))
col1 Col2
0 1 abc
0 1 def
0 1 ghi
1 2 xyz
1 2 asd
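For anyone who wants to run the answer end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question, and the final `reset_index(drop=True)` is an optional extra that is not part of the original answer. `DataFrame.explode` requires pandas 0.25 or later.

```python
import pandas as pd

# Sample frame rebuilt from the question (the result of the earlier groupby)
df = pd.DataFrame({"col1": [1, 2], "Col2": ["abc,def,ghi", "xyz,asd"]})

# Split each string into a list, then give every list element its own row
out = df.assign(Col2=df["Col2"].str.split(",")).explode("Col2")

# Optional: replace the repeated index (0, 0, 0, 1, 1) with a fresh 0..n-1 index
out = out.reset_index(drop=True)
print(out)
```

Without the `reset_index` call the exploded rows keep the index of the row they came from, which is why the answer's output shows 0, 0, 0, 1, 1.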

Related

Pandas: Create different dataframes from an unique multiIndex dataframe

I would like to know how to pass from a multiindex dataframe like this:
A B
col1 col2 col1 col2
1 2 12 21
3 1 2 0
To two separated dfs. df_A:
col1 col2
1 2
3 1
df_B:
col1 col2
12 21
2 0
Thank you for the help
I think it is better to use DataFrame.xs here to select by the first level:
print (df.xs('A', axis=1, level=0))
col1 col2
0 1 2
1 3 1
What you ask for is not recommended, but it is possible to create the DataFrames by groups:
for i, g in df.groupby(level=0, axis=1):
    globals()['df_' + str(i)] = g.droplevel(level=0, axis=1)
print (df_A)
col1 col2
0 1 2
1 3 1
A better option is to create a dictionary of DataFrames:
d = {i: g.droplevel(level=0, axis=1) for i, g in df.groupby(level=0, axis=1)}
print (d['A'])
col1 col2
0 1 2
1 3 1
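Recent pandas releases may emit a deprecation warning for `groupby(..., axis=1)`, so here is a hedged alternative that builds the same kind of dictionary with `DataFrame.xs` only. The sample frame is reconstructed from the question and the variable name `frames` is just an illustration.

```python
import pandas as pd

# Rebuild the two-level column layout from the question
columns = pd.MultiIndex.from_product([["A", "B"], ["col1", "col2"]])
df = pd.DataFrame([[1, 2, 12, 21], [3, 1, 2, 0]], columns=columns)

# One DataFrame per first-level label; xs drops the selected level by default
first_level = df.columns.get_level_values(0).unique()
frames = {key: df.xs(key, axis=1, level=0) for key in first_level}

print(frames["A"])
print(frames["B"])
```

A dictionary keeps the pieces addressable by name without writing into `globals()`.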

Pandas dataframe drop rows which store certain number of zeros in it

Hello, I have a dataframe with [13171 rows x 511 columns], and I want to remove the rows that contain a certain number of zeros.
For example:
col0 col1 col2 col3 col4 col5
ID1 0 2 0 2 0
ID2 1 1 2 10 1
ID3 0 1 3 4 0
ID4 0 0 1 0 3
ID5 0 0 0 0 1
The ID5 row contains 4 zeros, so I want to drop that row. In the same way, my large dataframe has rows with more than 100-300 zeros.
I tried the code below:
df=df[(df == 0).sum(1) >= 4]
For a small dataset like the example above the code works, but for [13171 rows x 511 columns] it does not work (df=df[(df == 0).sum(1) >= 15]). Can anyone suggest how I can get the proper result?
output
col0 col1 col2 col3 col4 col5
ID1 0 2 0 2 0
ID2 1 1 2 10 1
ID3 0 1 3 4 0
ID4 0 0 1 0 3
This will work:
drop_indexs = []
for i in range(len(df)):
    # 4 is the minimum number of zeros a row must have to be dropped
    if (df.iloc[i, :] == 0).sum() >= 4:
        drop_indexs.append(df.index[i])  # drop() expects index labels, not positions
updated_df = df.drop(drop_indexs)
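The same result can be had without a Python loop by using a boolean mask. This is a minimal sketch on the question's sample data; the names `threshold`, `zeros_per_row` and `kept` are illustrative. Note that the mask has to select the rows to keep, so the comparison is `<`; a mask written with `>=` keeps exactly the rows that should be dropped.

```python
import pandas as pd

# Sample data from the question, with the IDs as the index
df = pd.DataFrame(
    [[0, 2, 0, 2, 0], [1, 1, 2, 10, 1], [0, 1, 3, 4, 0], [0, 0, 1, 0, 3], [0, 0, 0, 0, 1]],
    index=["ID1", "ID2", "ID3", "ID4", "ID5"],
    columns=["col1", "col2", "col3", "col4", "col5"],
)

threshold = 4  # rows with this many zeros or more are dropped

# Count zeros per row, then keep only rows below the threshold
zeros_per_row = (df == 0).sum(axis=1)
kept = df[zeros_per_row < threshold]
print(kept)
```

If the large frame holds its numbers as strings (an assumption, the question does not say), `df == 0` will not match anything, and converting the columns to a numeric dtype first would be needed.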

groupby column in pandas

I am trying to group by a column's values in pandas, but I'm not getting the result I need.
Example:
Col1 Col2 Col3
A 1 2
B 5 6
A 3 4
C 7 8
A 11 12
B 9 10
-----
result needed grouping by Col1
Col1 Col2 Col3
A 1,3,11 2,4,12
B 5,9 6,10
C 7 8
but I am getting this output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000025BEB4D6E50>
I can get this result in Excel Power Query with the Group By function and count all rows, but I can't get the same with Python and pandas. Any help?
Try this
(
df
.groupby('Col1')
.agg(lambda x: ','.join(x.astype(str)))
.reset_index()
)
it outputs
Col1 Col2 Col3
0 A 1,3,11 2,4,12
1 B 5,9 6,10
2 C 7 8
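As a runnable variant of the answer above, the sketch below rebuilds the question's data and also shows `agg(list)`, which keeps the grouped values as Python lists instead of comma-joined strings. The variable names `joined` and `as_lists` are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": ["A", "B", "A", "C", "A", "B"],
    "Col2": [1, 5, 3, 7, 11, 9],
    "Col3": [2, 6, 4, 8, 12, 10],
})

# Comma-joined strings, as in the answer above
joined = df.groupby("Col1").agg(lambda s: ",".join(s.astype(str))).reset_index()

# Alternative: keep the original values in a list per group
as_lists = df.groupby("Col1").agg(list).reset_index()

print(joined)
print(as_lists)
```

The list form is easier to work with if the values are needed as numbers again later; the string form matches the requested display.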
Very good. I created a solution for grouping between 0 and 0:
df[df['A'] != 0].groupby((df['A'] == 0).cumsum()).sum()
It groups the values of the column that lie between one 0 and the next, and sums them.
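To make the "between 0 and 0" idea concrete, here is a small sketch. The column `A` and its values are hypothetical, since this comment does not show its data. Each 0 starts a new block, the cumulative sum of the `== 0` mask labels the blocks, and the non-zero values are summed per block.

```python
import pandas as pd

# Hypothetical data: zeros act as separators between runs of values
df = pd.DataFrame({"A": [0, 1, 2, 0, 3, 4, 5, 0, 6]})

# Every zero increments the block label: 1, 1, 1, 2, 2, 2, 2, 3, 3
block_id = (df["A"] == 0).cumsum()

# Drop the separator rows, then sum the remaining values within each block
non_zero = df["A"] != 0
sums = df.loc[non_zero, "A"].groupby(block_id[non_zero]).sum()
print(sums)
```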

Add rows to dataframe using the string values from column

I want to add rows to a dataframe based on a column's values in each row, so that a string value of (1:2:3) is split up and one row is added for each piece, as described in the example below:
I have this kind of data:
Col1 | Col2
1 | 1:2:3
2 | 4:5
I want to transform it to look like this:
Col1 | Col2
1 | 1
1 | 2
1 | 3
2 | 4
2 | 5
I know that this can be done using nested for loops, but I'm sure there's a better way to do it.
Use split and explode:
df=df.assign(Col2=df.Col2.str.split(':')).explode('Col2')
Out[161]:
Col1 Col2
0 1 1
0 1 2
0 1 3
1 2 4
1 2 5
df = pd.DataFrame({'Col1':[1,2],'Col2':['1:2:3','4:5']})
Split the values in Col2 so they are lists and explode.
>>> df['Col2'] = df.apply(lambda x: x['Col2'].split(':'), axis = 1)
>>> df.explode('Col2')
Col1 Col2
0 1 1
0 1 2
0 1 3
1 2 4
1 2 5
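One detail both answers leave open: after `explode`, the pieces in Col2 are still strings. If integers are wanted, a cast can be chained on. This is a sketch under that assumption; the `astype` and `reset_index` steps are additions, not part of the answers above.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2], "Col2": ["1:2:3", "4:5"]})

out = (
    df.assign(Col2=df["Col2"].str.split(":"))  # strings -> lists of strings
      .explode("Col2")                         # one row per list element
      .astype({"Col2": int})                   # "1" -> 1, and so on
      .reset_index(drop=True)                  # optional: fresh 0..n-1 index
)
print(out)
```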

Prevent column name from disappearing after using replace on dataframe

So I have a real dataframe that somewhat follows the next structure:
d = {'col1':['1_ABC','2_DEF','3 GHI']}
df = pd.DataFrame(data=d)
Basically, some entries have the " _ ", others have " ".
My goal is to split that first number into a new column and keep the rest. For this, I thought I'd first replace the '_' by ' ' to normalize everything, and then simply split by ' ' to get the new column.
#Replace the '_' for ' '
new_df['Name'] = df['Name'].str.replace('_',' ')
My problem is that my new_df has now lost its column name:
0 1 ABC
1 2 DEF
Any way to prevent this from happening?
Thanks!
The str.replace function returns a Series, so there is no column name, only a Series name.
s = df['col1'].str.replace('_',' ')
print (s)
0 1 ABC
1 2 DEF
2 3 GHI
Name: col1, dtype: object
print (type(s))
<class 'pandas.core.series.Series'>
print (s.name)
col1
If you need a new column, assign it to the same DataFrame as df['Name']:
df['Name'] = df['col1'].str.replace('_',' ')
print (df)
col1 Name
0 1_ABC 1 ABC
1 2_DEF 2 DEF
2 3 GHI 3 GHI
Or overwrite values of original column:
df['col1'] = df['col1'].str.replace('_',' ')
print (df)
col1
0 1 ABC
1 2 DEF
2 3 GHI
If you need a new one-column DataFrame, use Series.to_frame to convert the Series to a DataFrame:
df2 = df['col1'].str.replace('_',' ').to_frame()
print (df2)
col1
0 1 ABC
1 2 DEF
2 3 GHI
It is also possible to define a new column name:
df1 = df['col1'].str.replace('_',' ').to_frame('New')
print (df1)
New
0 1 ABC
1 2 DEF
2 3 GHI
As @anky_91 commented, if you need two new columns, add str.split:
df1 = df['col1'].str.replace('_',' ').str.split(expand=True)
df1.columns = ['A','B']
print (df1)
A B
0 1 ABC
1 2 DEF
2 3 GHI
If you need to add the columns to the existing DataFrame:
df[['A','B']] = df['col1'].str.replace('_',' ').str.split(expand=True)
print (df)
col1 A B
0 1_ABC 1 ABC
1 2_DEF 2 DEF
2 3 GHI 3 GHI
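As an alternative to replacing and then splitting, the number and the rest can be pulled out in one step with `Series.str.extract` and named groups. This is a sketch, not the approach from the answer above; the group names `A` and `B` mirror the column names used there, and the pattern assumes every entry is digits, then one `_` or space, then the remainder.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1_ABC", "2_DEF", "3 GHI"]})

# Named groups become the column names of the extracted DataFrame
parts = df["col1"].str.extract(r"^(?P<A>\d+)[_ ](?P<B>.*)$")

# Attach the new columns to the original frame; col1 keeps its name
df = df.join(parts)
print(df)
```

The extracted values in `A` are strings; rows that do not match the pattern would come back as NaN in both new columns.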
