I have a dataframe such as:
col1 col2 col3 ID
A 23 AZ ER1 ID1
B 12 ZE EZ1 ID2
C 13 RE RE1 ID3
I parsed the ID column to retrieve some information for each ID. Here is the result of the code:
for i in dataframe['ID']:
    name = function(i, ranks=True)
    print(name)
{'species': 'rabbit', 'genus': 'unis', 'subfamily': 'logomorphidae', 'family': 'lego', 'no rank': 'info, nothing', 'superkingdom': 'eucoryote'}
{'species': 'dog', 'genus': 'Rana', 'subfamily': 'Alphair', 'family': 'doggidae', 'no rank': 'dsDNA , no stage', 'superkingdom': 'eucaryote'}
{'species': 'duck', 'subfamily': 'duckinae', 'family': 'duckidae'}
...
As you can see, the function returns a dictionary. For IDs 1 and 2 I get 6 pieces of information (species, genus, subfamily, family, no rank, superkingdom);
for ID 3 I only get 3.
Instead of just printing the dictionary contents, I would like to add them directly to the dataframe and get:
col1 col2 col3 ID species genus subfamily family no rank superkingdom
A 23 AZ ER1 ID1 rabbit unis logomorphidae lego info, nothing, eucaryote
B 12 ZE EZ1 ID2 dog Rana Alphair doggidae dsDNA , no stage eucaryote
C 13 RE RE1 ID3 duck None duckinae duckidae None None
Do you have an idea of how to do this with pandas?
Thanks for your help.
Store your output in a dict of dicts, making it easy to create a DataFrame and join it back.
d = {}
for i in dataframe['ID']:
    d[i] = taxid.lineage_name(i, ranks=True)
dataframe.merge(pd.DataFrame.from_dict(d, orient='index'), left_on='ID', right_index=True)
Output:
col1 col2 col3 ID species genus subfamily family no rank superkingdom
A 23 AZ ER1 ID1 rabbit unis logomorphidae lego info, nothing eucoryote
B 12 ZE EZ1 ID2 dog Rana Alphair doggidae dsDNA , no stage eucaryote
C 13 RE RE1 ID3 duck NaN duckinae duckidae NaN NaN
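For anyone who wants to try the approach without the real lookup, here is a minimal, self-contained sketch. The `lookup` function and the `FAKE_LINEAGES` table are hypothetical stand-ins for the actual lineage call, and the frame is a trimmed version of the one in the question. It also uses `how="left"`, which is an addition on my side so that rows whose ID yields no lineage are kept.

```python
import pandas as pd

# Hypothetical stand-in data for the real lineage lookup.
FAKE_LINEAGES = {
    "ID1": {"species": "rabbit", "genus": "unis", "family": "lego"},
    "ID2": {"species": "dog", "genus": "Rana", "family": "doggidae"},
    "ID3": {"species": "duck", "family": "duckidae"},
}


def lookup(identifier):
    """Hypothetical replacement for the real function; returns a dict of ranks."""
    return FAKE_LINEAGES[identifier]


dataframe = pd.DataFrame(
    {"col1": [23, 12, 13], "ID": ["ID1", "ID2", "ID3"]},
    index=["A", "B", "C"],
)

# One dict per unique ID; ranks missing for an ID become NaN in the new frame.
lineages = {i: lookup(i) for i in dataframe["ID"].unique()}
ranks = pd.DataFrame.from_dict(lineages, orient="index")

# A left merge keeps every original row, even if an ID had no lineage.
result = dataframe.merge(ranks, how="left", left_on="ID", right_index=True)
print(result)
```

Looping over the unique IDs avoids calling the lookup twice when the same ID appears in several rows.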
How can I achieve this in pandas? My current approach is to take out each column as a new DataFrame and then do an insert in SQL, but with 10 columns I cannot make 10 DataFrames, so I want to know how to do this dynamically.
I have a data set with the following data.
Input I have
Id col1 col2 col3
1 Ab BC CD
2 har Adi tony
Output I want
Id col1
1 AB
1 BC
1 CD
2 har
2 ADI
2 Tony
melt does work, you just need a few extra steps for the exact output.
Assuming "Id" is a column (if not, reset_index).
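# recent pandas versions reject a value_name that matches an existing column,
# so melt into 'value' first and rename afterwards
(df.melt(id_vars='Id', value_name='value')
   .sort_values(by='Id')
   .drop('variable', axis=1)
   .rename(columns={'value': 'col1'})
)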
Output:
Id col1
0 1 Ab
2 1 BC
4 1 CD
1 2 har
3 2 Adi
5 2 tony
Used input:
df = pd.DataFrame({'Id': [1, 2],
                   'col1': ['Ab', 'har'],
                   'col2': ['BC', 'Adi'],
                   'col3': ['CD', 'tony']})
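If a clean 0..n index and a guaranteed column order within each Id matter, a variant along the following lines may help. This is only a sketch: the stable sort and the final `reset_index` are my additions, and the name `long_df` is arbitrary. The melt uses the default `value` column and renames it afterwards, since I believe newer pandas versions refuse a `value_name` that matches an existing column.

```python
import pandas as pd

df = pd.DataFrame({"Id": [1, 2],
                   "col1": ["Ab", "har"],
                   "col2": ["BC", "Adi"],
                   "col3": ["CD", "tony"]})

long_df = (
    df.melt(id_vars="Id")                     # columns: Id, variable, value
      .sort_values(by="Id", kind="stable")    # stable sort keeps col1, col2, col3 order per Id
      .drop(columns="variable")
      .rename(columns={"value": "col1"})
      .reset_index(drop=True)                 # replace the 0, 2, 4, 1, 3, 5 index
)
print(long_df)
```

Because the id column is the only fixed part, the same chain works for 3 or 10 value columns without listing them.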
Is there any way in pandas to separate multiple values stored in a single cell of a column? After grouping by col1, I end up with a df like this:
col1 Col2
0 1 abc,def,ghi
1 2 xyz,asd
and desired output would be:
Col1 Col2
0 1 abc
def
ghi
1 2 xyz
asd
thanks
Use str.split and explode:
print(df.assign(Col2=df["Col2"].str.split(","))
        .explode("Col2"))
col1 Col2
0 1 abc
0 1 def
0 1 ghi
1 2 xyz
1 2 asd
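A small variation, in case the cells contain spaces after the commas or a fresh index is wanted. The sample frame is made up to match the question (with one stray space added on purpose), and the name `exploded` is arbitrary. `ignore_index=True` needs a reasonably recent pandas (1.1 or later, as far as I know).

```python
import pandas as pd

# Sample data modelled on the question; note the space in "xyz, asd".
df = pd.DataFrame({"col1": [1, 2], "Col2": ["abc,def,ghi", "xyz, asd"]})

exploded = (
    df.assign(Col2=df["Col2"].str.split(","))   # each cell becomes a list
      .explode("Col2", ignore_index=True)       # one row per list element, index 0..n-1
)
# Remove whitespace left over from "a, b" style separators.
exploded["Col2"] = exploded["Col2"].str.strip()
print(exploded)
```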
I have a dataframe like this.
ID Name id2 name2
101 A 1 d_a
103 B 2 d_b
101 A 3 d_c
103 B 4 d_d
and I want the output df to look like this.
ID Name id2 name2
101 A [{'id2':1},{'id2':3}] [{'name2':'d_a'},{'name2':'d_c'}]
103 B [{'id2':2},{'id2':4}] [{'name2':'d_b'},{'name2':'d_d'}]
Use list comprehension with DataFrame.to_dict:
df1 = pd.DataFrame([[df[[x]].to_dict('records') for x in df]], columns=df.columns)
print (df1)
col1 \
0 [{'col1': 1}, {'col1': 2}, {'col1': 3}]
col2
0 [{'col2': 'def'}, {'col2': 'bb'}, {'col2': 'ra'}]
EDIT: Use GroupBy.agg with a lambda function:
cols = ['id2','name2']
df2 = df.groupby(['ID','Name'])[cols].agg(lambda x: x.to_frame().to_dict('records')).reset_index()
print (df2)
ID Name id2 name2
0 101 A [{'id2': 1}, {'id2': 3}] [{'name2': 'd_a'}, {'name2': 'd_c'}]
1 103 B [{'id2': 2}, {'id2': 4}] [{'name2': 'd_b'}, {'name2': 'd_d'}]
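Here is a self-contained version of the grouped variant using the data from the question, so it can be run as-is. It spells the orient out as `'records'`, because the short `'r'` alias is, to my knowledge, no longer accepted by newer pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [101, 103, 101, 103],
    "Name": ["A", "B", "A", "B"],
    "id2": [1, 2, 3, 4],
    "name2": ["d_a", "d_b", "d_c", "d_d"],
})

cols = ["id2", "name2"]
df2 = (
    df.groupby(["ID", "Name"])[cols]
      # For each group and column, build a list of single-key dicts.
      .agg(lambda s: s.to_frame().to_dict("records"))
      .reset_index()
)
print(df2)
```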
I have a DataFrame as below:
Col1 Col2 Col3 Col4
1 111 a Test
2 111 b Test
3 111 c Test
4 222 d Prod
5 333 e Prod
6 333 f Prod
7 444 g Test
8 555 h Prod
9 555 i Prod
Expected output :
Column 1 Column 2 Relationship Count
Col2 Col3 One-to-One 2
Col2 Col3 One-to-Many 3
Explanation :
I need to identify the relationship between Col2 and Col3 and count how often each kind of relationship occurs.
For example, 111 (Col2) is repeated 3 times and has 3 different values a, b, c in Col3.
This means Col2 and Col3 have a one-to-many relationship - count_1: 1
222 (Col2) is not repeated and has only one value, d, in Col3.
This means Col2 and Col3 have a one-to-one relationship - count_2: 1
333 (Col2) is repeated twice and has 2 different values e, f in Col3.
This means Col2 and Col3 have a one-to-many relationship - count_1: 1+1 (increment this count for every one-to-many relationship)
Similarly, for the other values increment the respective counter and display the final result as the expected DataFrame.
If you only need to check the relationship between col2 and col3, you can do:
(
    df.groupby(by='Col2').Col3
    .apply(lambda x: 'One-to-One' if len(x)==1 else 'One-to-Many')
    .to_frame('Relationship')
    .groupby('Relationship').Relationship
    .count().to_frame('Count').reset_index()
    .assign(**{'Column 1':'Col2', 'Column 2':'Col3'})
    .reindex(columns=['Column 1', 'Column 2', 'Relationship', 'Count'])
)
Output:
Column 1 Column 2 Relationship Count
0 Col2 Col3 One-to-Many 3
1 Col2 Col3 One-to-One 2
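Note that `len(x)` counts rows per Col2 value, so a Col2 value repeated with the same Col3 value would still be labelled One-to-Many. If the intent is to count distinct Col3 values, as the question's wording suggests, a variant based on `nunique` could look like the sketch below. The variable names are my own, and on the sample data it gives the same counts as the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Col2": [111, 111, 111, 222, 333, 333, 444, 555, 555],
    "Col3": list("abcdefghi"),
})

# Number of distinct Col3 values for each Col2 value.
distinct = df.groupby("Col2")["Col3"].nunique()
labels = distinct.map(lambda n: "One-to-One" if n == 1 else "One-to-Many")

summary = (
    labels.value_counts()
          .rename_axis("Relationship")
          .reset_index(name="Count")
)
# Prepend the two descriptive columns from the expected output.
summary.insert(0, "Column 2", "Col3")
summary.insert(0, "Column 1", "Col2")
print(summary)
```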
I have a Pandas dataframe column which has data in rows such as below:
col1
abc
ab23
2345
fgh67#
8980
I need to create 2 more columns col 2 and col 3 as such below:
col2 col3
abc 2345
ab23 8980
fgh67#
I have used str.isnumeric(), but that's not helping me with a DataFrame column. Can someone kindly help?
Use str.isnumeric, or to_numeric followed by a check for non-NaN values, to build a boolean mask, then filter with boolean indexing:
m = df['col1'].str.isnumeric()
#alternative
#m = pd.to_numeric(df['col1'], errors='coerce').notnull()
df = pd.concat([df.loc[~m, 'col1'].reset_index(drop=True),
                df.loc[m, 'col1'].reset_index(drop=True)], axis=1, keys=('col2','col3'))
print (df)
col2 col3
0 abc 2345
1 ab23 8980
2 fgh67# NaN
If you want to add the new columns to the existing DataFrame, aligned by index:
df['col2'] = df.loc[~m, 'col1']
df['col3'] = df.loc[m, 'col1']
print (df)
col1 col2 col3
0 abc abc NaN
1 ab23 ab23 NaN
2 2345 NaN 2345
3 fgh67# fgh67# NaN
4 8980 NaN 8980
Or without alignment:
df['col2'] = df.loc[~m, 'col1'].reset_index(drop=True)
df['col3'] = df.loc[m, 'col1'].reset_index(drop=True)
print (df)
col1 col2 col3
0 abc abc 2345
1 ab23 ab23 8980
2 2345 fgh67# NaN
3 fgh67# NaN NaN
4 8980 NaN NaN
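The two mask options mentioned above are not fully interchangeable, which matters once the column holds negatives or decimals. A short sketch on made-up values (not the data from the question) shows the difference:

```python
import pandas as pd

# Made-up sample including a negative number and a decimal.
s = pd.Series(["abc", "2345", "-5", "3.14", "fgh67#"])

# True only when every character in the string is a numeric character.
by_isnumeric = s.str.isnumeric()

# True whenever the string can be parsed as a number at all.
by_to_numeric = pd.to_numeric(s, errors="coerce").notna()

print(pd.DataFrame({"value": s,
                    "isnumeric": by_isnumeric,
                    "to_numeric": by_to_numeric}))
```

So `str.isnumeric` fits columns of plain digit strings, while the `to_numeric` mask is the safer choice when signs or decimal points can occur.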