How to prevent df.pivot from inserting 'None' and blank rows during pivot? - python-3.x

I want to pivot a df that looks like this:
columns values
col1 test1
col2 test2
col3 test3
col4 test4
col1 test5
col2 test6
col3 test7
col4 test8
I am trying this:
df['index'] = df.index
df = df.pivot(index='index', columns='columns', values='values')
which results in a df that looks like this (roughly):
col1 col2 col3 col4
None None test1 None
test5 None None None
How do I pivot the df to look like this?:
col1 col2 col3 col4
test1 test2 test3 test4
test5 test6 test7 test8
I am creating an artificial index column because I don't have another column to use as an index. I only have 2 columns in the dataframe.

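Use cumcount to create a new key that numbers each repeat of a column label, then pivot on that key (recent pandas versions require keyword arguments for pivot):
df.assign(key=df.groupby('columns').cumcount()).pivot(index='key', columns='columns', values='values')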
Out[54]:
columns col1 col2 col3 col4
key
0 test1 test2 test3 test4
1 test5 test6 test7 test8
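The original attempt used the unique row index as the pivot index, so every input row became its own output row with a single populated cell. A minimal, self-contained sketch of the cumcount approach, assuming the two-column frame from the question (the variable names key and wide are only illustrative):

```python
import pandas as pd

# Sample data shaped like the question's two-column frame.
df = pd.DataFrame({
    "columns": ["col1", "col2", "col3", "col4"] * 2,
    "values": [f"test{i}" for i in range(1, 9)],
})

# cumcount numbers each occurrence of a label within its group:
# the first col1 gets 0, the second col1 gets 1, and so on.
key = df.groupby("columns").cumcount()

# Rows that share a key end up on the same output row.
wide = df.assign(key=key).pivot(index="key", columns="columns", values="values")
print(wide)
```

Because the key restarts at 0 for every label, this relies on each label appearing once per logical record and in the same order.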

Related

Identify the relationship between two columns and its respective value count in pandas

I have a Data frame as below :
Col1 Col2 Col3 Col4
1 111 a Test
2 111 b Test
3 111 c Test
4 222 d Prod
5 333 e Prod
6 333 f Prod
7 444 g Test
8 555 h Prod
9 555 i Prod
Expected output :
Column 1 Column 2 Relationship Count
Col2 Col3 One-to-One 2
Col2 Col3 One-to-Many 3
Explanation :
I need to identify the relationship between Col2 and Col3, along with the count of each relationship type.
For example, 111 (Col2) is repeated 3 times and has 3 different corresponding values a, b, c in Col3.
This means Col2 and Col3 have a one-to-many relationship - count_1 : 1
222 (Col2) is not repeated and has only one corresponding value, d, in Col3.
This means Col2 and Col3 have a one-to-one relationship - count_2 : 1
333 (Col2) is repeated twice and has 2 different corresponding values e, f in Col3.
This means Col2 and Col3 have a one-to-many relationship - count_1 : 1+1 (increment this count for every one-to-many relationship)
Similarly, for the other column values, increment the respective counter and display the final results as the expected dataframe.
If you only need to check the relationship between col2 and col3, you can do:
(
df.groupby(by='Col2').Col3
.apply(lambda x: 'One-to-One' if len(x)==1 else 'One-to-Many')
.to_frame('Relationship')
.groupby('Relationship').Relationship
.count().to_frame('Count').reset_index()
.assign(**{'Column 1':'Col2', 'Column 2':'Col3'})
.reindex(columns=['Column 1', 'Column 2', 'Relationship', 'Count'])
)
Output:
Column 1 Column 2 Relationship Count
0 Col2 Col3 One-to-Many 3
1 Col2 Col3 One-to-One 2
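The answer above classifies each Col2 value by how many rows it has. A possible variant, shown here only as a sketch, counts distinct Col3 values instead, so a Col2 value repeated with the same Col3 value would still count as one-to-one. That behaviour is an assumption about what is wanted; on the sample data both approaches give the same counts.

```python
import pandas as pd

# Col2 and Col3 from the question's sample data.
df = pd.DataFrame({
    "Col2": [111, 111, 111, 222, 333, 333, 444, 555, 555],
    "Col3": list("abcdefghi"),
})

# Number of distinct Col3 values seen for each Col2 value.
distinct = df.groupby("Col2")["Col3"].nunique()

# Label each Col2 value, then count how many values carry each label.
labels = distinct.map(lambda n: "One-to-One" if n == 1 else "One-to-Many")
counts = labels.value_counts()
print(counts)
```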

Grouping corresponding Rows based on One column

I have an Excel Sheet Dataframe with no fixed number of rows and columns.
eg.
Col1 Col2 Col3
A 1 -
A - 2
B 3 -
B - 4
C 5 -
I would like to group the rows that share the same value in Col1, like the following:
Col1 Col2 Col3
A 1 2
B 3 4
C 5 -
I am using pandas GroupBy, but not getting what I wanted.
Try using groupby (pd.np has been removed from pandas, so import numpy directly):
import numpy as np
print(df.replace('-', np.nan).groupby('Col1', as_index=False).first().fillna('-'))
Output:
Col1 Col2 Col3
0 A 1 2
1 B 3 4
2 C 5 -
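A self-contained sketch of the same idea, assuming the cells were read as strings with '-' as the placeholder for a missing value (the name merged is illustrative). groupby(...).first() picks the first non-missing value in each column per group, which is why the placeholder is turned into NaN first.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with "-" marking an empty cell.
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "B", "C"],
    "Col2": ["1", "-", "3", "-", "5"],
    "Col3": ["-", "2", "-", "4", "-"],
})

merged = (
    df.replace("-", np.nan)             # make the placeholder a real missing value
      .groupby("Col1", as_index=False)
      .first()                          # first non-missing value per column and group
      .fillna("-")                      # restore the placeholder where nothing was found
)
print(merged)
```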

How to rename values in a column from list?

I have a df that looks like this:
col1 col2
value test1
value test2
value test3
value test4
value test5
I want to rename col1 from a list in a repeating fashion like so:
lst = ['new1','new2','new3','new4','new5']
col1 col2
new1 test1
new2 test2
new3 test3
new4 test4
new5 test5
I need the list to repeat for all the rows in col1.
I tried this:
df = df.set_index('col1')
df = df.rename(index={'value':['new1','new2','new3','new4','new5']})
but this passes the entire list into each row of col1 like so:
col1 col2
['new1','new2','new3','new4','new5'] test1
['new1','new2','new3','new4','new5'] test2
['new1','new2','new3','new4','new5'] test3
['new1','new2','new3','new4','new5'] test4
['new1','new2','new3','new4','new5'] test5
assign
This only works when lst has the same length as the dataframe df, as in the OP's example
df.assign(col1=lst)
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
More generically
This is more generic. If you are on a Python version older than 3.6 and so don't have f-strings, you can use str.format instead
df.assign(col1=[f'new{i+1}' for i in range(len(df))])
# df.assign(col1=[*map('new{}'.format, range(1, len(df) + 1))])
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
With itertools
If you want to just repeat the list you've got, I'd use itertools islice and cycle
from itertools import cycle, islice
df.assign(col1=[*islice(cycle(lst), len(df))])
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
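One way with numpy.put (np.put needs a NumPy array, so write into a copy of the column's values and assign it back; lst is repeated as needed)
lst = ['new1','new2']
arr = df['col1'].to_numpy(copy=True)
np.put(arr, np.arange(len(df)), lst)
df['col1'] = arr
df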
Out[37]:
col1 col2
0 new1 test1
1 new2 test2
2 new1 test3
3 new2 test4
4 new1 test5
Another option
n=len(df)
df['col1']=(lst*((n//len(lst))+1))[:n]
df
Out[48]:
col1 col2
0 new1 test1
1 new2 test2
2 new1 test3
3 new2 test4
4 new1 test5
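As a quick check, here is a small sketch showing that the itertools approach and the list-multiplication approach produce the same repeated labels when lst is shorter than the dataframe. The names via_cycle, via_slice and out are illustrative only.

```python
from itertools import cycle, islice

import pandas as pd

df = pd.DataFrame({
    "col1": ["value"] * 5,
    "col2": [f"test{i}" for i in range(1, 6)],
})
lst = ["new1", "new2"]
n = len(df)

# Repeat lst endlessly and take exactly n items.
via_cycle = list(islice(cycle(lst), n))

# Repeat lst enough times to cover n items, then trim to n.
via_slice = (lst * (n // len(lst) + 1))[:n]

out = df.assign(col1=via_cycle)
print(out)
```

Both build a plain list of length len(df), so either result can be passed to assign or set directly as a column.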

Remove duplicates from multiple cells of a column separated by "|"

I want to remove duplicates from multiple cells of the column 5 with delimiter "|". The data I have looks like this:
Col1 Col2 Col3 Col4 Col5
1048563 93750984 5 0.499503476 HTR7|HTR7|HTR7
1048564 93751210 5 0.499503476 ABHD3|ABHD3|ABHD3|ABHD3|ABHD3|ABHD3
1048566 93751298 5 0.499503476 ADCYAP1|ADCYAP1|ADCYAP1|ADCYAP1
And I want the result to be:
Col1 Col2 Col3 Col4 Col5
1048563 93750984 5 0.499503476 HTR7
1048564 93751210 5 0.499503476 ABHD3
1048566 93751298 5 0.499503476 ADCYAP1
The number of rows and columns varies. The length of the text in column 5 is not always the same.
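One possible approach, sketched under the assumption that the data has been loaded into a pandas DataFrame (the question does not say which tool is in use): split each cell on "|", drop repeats while keeping the first-seen order, and join the pieces back. The helper name dedupe is hypothetical, and only two of the question's columns are reproduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [1048563, 1048564, 1048566],
    "Col5": [
        "HTR7|HTR7|HTR7",
        "ABHD3|ABHD3|ABHD3|ABHD3|ABHD3|ABHD3",
        "ADCYAP1|ADCYAP1|ADCYAP1|ADCYAP1",
    ],
})

def dedupe(cell: str, sep: str = "|") -> str:
    """Drop repeated items from a delimited string, keeping first-seen order."""
    # dict.fromkeys keeps one entry per distinct item, in insertion order.
    return sep.join(dict.fromkeys(cell.split(sep)))

df["Col5"] = df["Col5"].map(dedupe)
print(df)
```

A cell holding several distinct names keeps all of them, each listed once.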

How to flip string using linux in specific columns

I have few columns as shown below:
col1 col2 col3 col4 a/t t/g g/t f/g
col3 col2 col4 col5 t/a g/t f/g g/t
I would need to flip the values in columns after 4, and the sample output is shown below:
col1 col2 col3 col4 t/a g/t t/g g/f
col3 col2 col4 col5 a/t t/g g/f t/g
I tried using the -rev option in bash but it prints the whole row in the inverted direction (mirror image). Is there an alternate solution for this just to flip the strings as shown in the output? Thanks in advance.
You don't say what the first 4 columns may contain, so I assume this would be enough:
sed 's/\(\w\)\/\(\w\)/\2\/\1/g' <yourfile>
like:
$ cat test
col1 col2 col3 col4 t/a g/t t/g g/f
col3 col2 col4 col5 a/t t/g g/f t/g
$ sed 's/\(\w\)\/\(\w\)/\2\/\1/g' test
col1 col2 col3 col4 a/t t/g g/t f/g
col3 col2 col4 col5 t/a g/t f/g g/t
If you want to save the result to a file, redirect the sed output:
$ sed 's/\(\w\)\/\(\w\)/\2\/\1/g' test > newfile
perl -lane 'print join " ", @F[0..3], map { scalar reverse $_ } @F[4..$#F]'
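For comparison, the same field-wise reversal can be sketched in Python, assuming whitespace-separated fields and that everything after the fourth field should be reversed. The function name flip_fields and its keep parameter are hypothetical.

```python
def flip_fields(line: str, keep: int = 4) -> str:
    """Reverse each whitespace-separated field after the first `keep` fields."""
    fields = line.split()
    flipped = [field[::-1] for field in fields[keep:]]
    return " ".join(fields[:keep] + flipped)

sample = "col1 col2 col3 col4 a/t t/g g/t f/g"
print(flip_fields(sample))
```

Unlike the sed pattern, this leaves the first four fields untouched even if they happen to contain a slash.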
