How to flatten multiple dictionary objects inside a list in pandas column? - python-3.x

I have a df with a column that looks like this:
col1
[{'value':'2019-02-02'},{'value':'test1'},{'value':'test2'},{'value':'test3'},{'value':'test4'},{'value':'test5'},{'value':'test6'}]
How do I flatten this column so it looks like this:
col1 col2
value 2019-02-02
value test1
value test2
value test3
value test4
value test5
value test6
I am trying this:
df['Col1'] = [x[0]['value'] for x in df['Col1']]
This only picks the first value from each nested list of dictionaries, making the df look like this:
col1
2019-02-02
2019-02-02
2019-02-02
2019-02-02

Use tolist, then concat each sublist wrapped in pd.DataFrame(sublist):
pd.concat([pd.DataFrame(sublist) for sublist in df.col1.tolist()]).stack().reset_index(level=1)
Out[22]:
level_1 0
0 value 2019-02-02
1 value test1
2 value test2
3 value test3
4 value test4
5 value test5
6 value test6
0 value 2019-02-02
1 value test1
2 value test2
3 value test3
4 value test4
5 value test5
6 value test6
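A self-contained sketch of the concat/stack approach above, using a small made-up two-row frame (the data is reconstructed from the question):

```python
import pandas as pd

# Hypothetical input: each row of col1 holds a list of single-key dicts
df = pd.DataFrame({"col1": [
    [{"value": "2019-02-02"}, {"value": "test1"}, {"value": "test2"}],
    [{"value": "2019-02-02"}, {"value": "test1"}, {"value": "test2"}],
]})

# Build one small DataFrame per list, concatenate them, then stack so
# the dict key ('value') becomes a column after reset_index
out = (pd.concat([pd.DataFrame(sub) for sub in df.col1.tolist()])
         .stack()
         .reset_index(level=1))
print(out)
```

The stacked level keeps the dict key as the `level_1` column, matching the output shown above.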

Related

line feed inside row in column with pandas

Is there any way in pandas to separate data inside a row in a column? Each row holds multiple values: I group by col1, and the result is a df like this:
col1 Col2
0 1 abc,def,ghi
1 2 xyz,asd
and desired output would be:
Col1 Col2
0 1 abc
def
ghi
1 2 xyz
asd
thanks
Use str.split and explode:
print (df.assign(Col2=df["Col2"].str.split(","))
.explode("Col2"))
col1 Col2
0 1 abc
0 1 def
0 1 ghi
1 2 xyz
1 2 asd

Grouping corresponding Rows based on One column

I have an Excel Sheet Dataframe with no fixed number of rows and columns.
eg.
Col1 Col2 Col3
A 1 -
A - 2
B 3 -
B - 4
C 5 -
I would like to group rows that share the same Col1 value, like the following:
Col1 Col2 Col3
A 1 2
B 3 4
C 5 -
I am using pandas GroupBy, but I am not getting what I want.
Try using groupby (`pd.np` is removed in recent pandas, so import numpy directly):
import numpy as np
print(df.replace('-', np.nan).groupby('Col1', as_index=False).first().fillna('-'))
Output:
Col1 Col2 Col3
0 A 1 2
1 B 3 4
2 C 5 -
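A runnable sketch of that answer, with the frame reconstructed from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": ["A", "A", "B", "B", "C"],
                   "Col2": [1, "-", 3, "-", 5],
                   "Col3": ["-", 2, "-", 4, "-"]})

# Treat '-' as missing, take the first non-null value per group,
# then restore the '-' placeholder where a group had no value at all
out = (df.replace("-", np.nan)
         .groupby("Col1", as_index=False).first()
         .fillna("-"))
print(out)
```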

How to update NaN values in a dataframe with a dictionary

I'm trying to replace NaN values in one column when the adjacent column starts with a string.
starts_with = {'65S':'test1','95S':'test2','20S':"test3"}
Here is the DF
15 65String NaN
16 95String NaN
17 20String NaN
I would like it to look like
15 65String test1
16 95String test2
17 20String test3
Use findall to locate the dict key in each string, then map back to the dict's value (assign with brackets so a new column is actually created):
df['c3'] = df.c2.str.findall('|'.join(starts_with.keys())).str[0].map(starts_with)
df
Out[110]:
c1 c2 c3
0 15 65String test1
1 16 95String test2
2 17 20String test3
Use Series.str.extract with ^ to match the start of the string, then Series.map with the dictionary:
starts_with ={'65S':'test1','95S':'test2','20S':"test3"}
pat = '|'.join(r"^{}".format(x) for x in starts_with)
df['C'] = df['B'].str.extract('(' + pat + ')', expand=False).map(starts_with)
print (df)
A B C
0 15 65String test1
1 16 95String test2
2 17 20String test3
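The extract-based answer, end to end, with the frame reconstructed from the question:

```python
import pandas as pd

starts_with = {"65S": "test1", "95S": "test2", "20S": "test3"}
df = pd.DataFrame({"A": [15, 16, 17],
                   "B": ["65String", "95String", "20String"]})

# Anchor each dict key at the start of the string, extract whichever
# key matched, then translate it through the dictionary
pat = "|".join(r"^{}".format(k) for k in starts_with)
df["C"] = df["B"].str.extract("(" + pat + ")", expand=False).map(starts_with)
print(df)
```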

How to rename values in a column from list?

I have a df that looks like this:
col1 col2
value test1
value test2
value test3
value test4
value test5
I want to rename col1 from a list in a repeating fashion like so:
lst = ['new1','new2','new3','new4','new5']
col1 col2
new1 test1
new2 test2
new3 test3
new4 test4
new5 test5
I need the list to repeat for all the rows in col1.
I tried this:
df = df.set_index('col1')
df = df.rename(index={'value':['new1','new2','new3','new4','new5']})
but this passes the entire list into each row of col1 like so:
col1 col2
['new1','new2','new3','new4','new5'] test1
['new1','new2','new3','new4','new5'] test2
['new1','new2','new3','new4','new5'] test3
['new1','new2','new3','new4','new5'] test4
['new1','new2','new3','new4','new5'] test5
assign
This only works for OP's example, where the length of lst matches the length of df:
df.assign(col1=lst)
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
More generically
This is more generic. If you aren't on Python 3.6+ (which has f-strings), you can use str.format instead:
df.assign(col1=[f'new{i+1}' for i in range(len(df))])
# df.assign(col1=[*map('new{}'.format, range(1, len(df) + 1))])
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
With itertools
If you want to just repeat the list you've got, I'd use itertools.islice and itertools.cycle:
from itertools import cycle, islice
df.assign(col1=[*islice(cycle(lst), len(df))])
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
One way with numpy.put (Series.put was removed in recent pandas, so target the underlying array):
lst = ['new1','new2']
np.put(df['col1'].values, np.arange(len(df)), lst)
df
Out[37]:
col1 col2
0 new1 test1
1 new2 test2
2 new1 test3
3 new2 test4
4 new1 test5
Another option
n=len(df)
df['col1']=(lst*((n//len(lst))+1))[:n]
df
Out[48]:
col1 col2
0 new1 test1
1 new2 test2
2 new1 test3
3 new2 test4
4 new1 test5
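Any of the repeating variants can be checked with a small frame; here is the itertools version end to end (column names assumed from the question):

```python
from itertools import cycle, islice

import pandas as pd

df = pd.DataFrame({"col1": ["value"] * 5,
                   "col2": ["test1", "test2", "test3", "test4", "test5"]})
lst = ["new1", "new2"]

# cycle repeats lst forever; islice cuts it off at exactly len(df) items
df = df.assign(col1=list(islice(cycle(lst), len(df))))
print(df)
```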

How to prevent df.pivot from inserting 'None' and blank rows during pivot?

I want to pivot a df that looks like this:
columns values
col1 test1
col2 test2
col3 test3
col4 test4
col1 test5
col2 test6
col3 test7
col4 test8
I am trying this:
df['index'] = df.index
df = df.pivot(index='index', columns='columns', values='values')
which results in a df that looks like this (roughly):
col1 col2 col3 col4
None None test1 None
test5 None None None
How do I pivot the df to look like this?:
col1 col2 col3 col4
test1 test2 test3 test4
test5 test6 test7 test8
I am creating an artificial index column because I don't have another column to use as an index; I only have two columns in the dataframe.
Use cumcount to create a new key, then pivot (recent pandas requires keyword arguments here):
df.assign(key=df.groupby('columns').cumcount()).pivot(index='key', columns='columns', values='values')
Out[54]:
columns col1 col2 col3 col4
key
0 test1 test2 test3 test4
1 test5 test6 test7 test8
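A self-contained sketch of the cumcount trick, with the two-column frame rebuilt from the question:

```python
import pandas as pd

df = pd.DataFrame({"columns": ["col1", "col2", "col3", "col4"] * 2,
                   "values": ["test1", "test2", "test3", "test4",
                              "test5", "test6", "test7", "test8"]})

# cumcount numbers each repeat of a column label 0, 1, 2, ... so every
# occurrence gets its own row in the pivot instead of colliding
out = (df.assign(key=df.groupby("columns").cumcount())
         .pivot(index="key", columns="columns", values="values"))
print(out)
```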
