How can I duplicate a row and append it directly after the duplicated row using pandas?

I've been trying to figure out this problem for a couple of hours now and seem to reach a dead end every time. A small example of what I want to do is shown below.
Normal Series
a
b
c
d
Duplicated Series
a
a
b
b
c
c
d
d

Try with loc and df.index.repeat:
>>> df.loc[df.index.repeat(2)]
Normal Series
0 a
0 a
1 b
1 b
2 c
2 c
3 d
3 d
>>>
Or with reset_index:
>>> df.loc[df.index.repeat(2)].reset_index(drop=True)
Normal Series
0 a
1 a
2 b
3 b
4 c
5 c
6 d
7 d
>>>
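The question does not show how df is built, so here is a minimal self-contained sketch of the same approach. The frame below is an assumed stand-in for the asker's data, and the approach assumes the original index is unique (with duplicate labels, .loc would return extra rows).

```python
import pandas as pd

# Assumed stand-in for the asker's data: one column, default 0..3 index.
df = pd.DataFrame({"Normal Series": ["a", "b", "c", "d"]})

# Index.repeat(2) yields the labels 0, 0, 1, 1, 2, 2, 3, 3;
# .loc then selects each row once per label, so copies stay adjacent.
doubled = df.loc[df.index.repeat(2)].reset_index(drop=True)
print(doubled)
```

Dropping the reset_index call keeps the repeated labels, which can be handy for tracing each copy back to its source row.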

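You can also concatenate the series with itself and sort the result. Note that sort_values orders by value, so this keeps the original order only if the series is already sorted.
import pandas as pd
sample = pd.Series(['a','b','c','d'])
# stack two copies, then sort so equal values end up next to each other
output = pd.concat([sample,sample]).sort_values().reset_index(drop=True)
output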
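For data that is not already in sorted order, Series.repeat is an alternative that involves no sort at all. This is a sketch with a deliberately unsorted, made-up series, not something taken from the answers above.

```python
import pandas as pd

# Made-up, deliberately unsorted input.
sample = pd.Series(["b", "a", "d", "c"])

# Series.repeat(2) emits every element twice, in place.
repeated = sample.repeat(2).reset_index(drop=True)

# For comparison: the concat-and-sort approach reorders the values.
sorted_version = pd.concat([sample, sample]).sort_values().reset_index(drop=True)

print(repeated.tolist())
print(sorted_version.tolist())
```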

Related

Do I use a loop, df.melt or df.explode to achieve a flattened dataframe?

Can anyone help with some code that will achieve the following transformation? I have tried variations of df.melt, df.explode, and also a loop, but I only get errors. I think it might need nesting, but I don't have the experience to do that.
index A B C D
0 X d 4 2
1 Y b 5 2
Where column D represents frequency of column C.
desired output is:
index A B C
0 X d 4
1 X d 4
2 Y b 5
3 Y b 5

If you want to repeat rows, why not use index.repeat?
import pandas as pd
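# Recreate the sample dataframe (note: D is 3 for the first row here, not 2 as in the question)
df = pd.DataFrame({"A":["X","Y"],"B":["d","b"],"C":[4,5],"D":[3,2]}, columns=list("ABCD"))
# Repeat each index label D times, then drop the helper column.
# drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts.
df = df.reindex(df.index.repeat(df["D"])).drop(columns="D").reset_index(drop=True)
print(df)
Sample output: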
A B C
0 X d 4
1 X d 4
2 X d 4
3 Y b 5
4 Y b 5
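As a cross-check, the same idea can be run against the exact frame from the question, where D is 2 for both rows. This is a sketch that uses .loc instead of reindex; with a unique index the two should behave the same here.

```python
import pandas as pd

# The frame from the question: D says how many times each row should appear.
df = pd.DataFrame({"A": ["X", "Y"], "B": ["d", "b"], "C": [4, 5], "D": [2, 2]})

flat = (
    df.loc[df.index.repeat(df["D"])]  # repeat each label D times
      .drop(columns="D")              # the frequency column is no longer needed
      .reset_index(drop=True)         # renumber rows 0..n-1
)
print(flat)
```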

Easily generate edge list from specific structure using pandas

This is a question about how to do things properly with pandas (I use version 1.0).
Let's say I have a DataFrame of missions, each of which contains an origin and one or more destinations:
mid from to
0 0 A [C]
1 1 A [B, C]
2 2 B [B]
3 3 C [D, E, F]
E.g., for the mission with mid=1, people will travel from A to B, then from B to C, and finally from C back to A. Note that I have no control over the data model of the input DataFrame.
I would like to compute metrics on each travel of the mission. The expected output would be exactly:
tid mid from to
0 0 0 A C
1 1 0 C A
2 2 1 A B
3 3 1 B C
4 4 1 C A
5 5 2 B B
6 6 2 B B
7 7 3 C D
8 8 3 D E
9 9 3 E F
10 10 3 F C
I have found a way to achieve my goal. Please find the MCVE below:
import pandas as pd
# Input:
df = pd.DataFrame(
[["A", ["C"]],
["A", ["B", "C"]],
["B", ["B"]],
["C", ["D", "E", "F"]]],
columns = ["from", "to"]
).reset_index().rename(columns={'index': 'mid'})
# Create chain:
# [x['from']] wraps the origin in a list; list(x['from']) would split a multi-character name into letters
df['chain'] = df.apply(lambda x: [x['from']] + x['to'] + [x['from']], axis=1)
# Explode chain:
df = df.explode('chain')
# Shift to create travel:
df['end'] = df.groupby("mid")["chain"].shift(-1)
# Remove extra row, clean, reindex and rename:
df = df.dropna(subset=['end']).reset_index(drop=True).reset_index().rename(columns={'index': 'tid'})
df = df.drop(['from', 'to'], axis=1).rename(columns={'chain': 'from', 'end': 'to'})
My question is: is there a better or easier way to do this with pandas? By better I mean not necessarily more performant (though it can be, of course), but more readable and intuitive.

Your operation is basically explode and concat:
# turn series of lists in to single series
tmp = df[['mid','to']].explode('to')
# new `from` is concatenation of `from` and the list
df1 = pd.concat((df[['mid','from']],
tmp.rename(columns={'to':'from'})
)
).sort_index()
# new `to` is concatenation of the list and `from`
df2 = pd.concat((tmp,
df[['mid','from']].rename(columns={'from':'to'})
)
).sort_index()
df1['to'] = df2['to']
Output:
mid from to
0 0 A C
0 0 C A
1 1 A B
1 1 B C
1 1 C A
2 2 B B
2 2 B B
3 3 C D
3 3 D E
3 3 E F
3 3 F C

If you don't mind reconstructing the entire DataFrame, you can clean it up a bit: use np.roll to get the pairs of consecutive stops, then assign mid based on the number of trips (the length of each sublist in l):
import pandas as pd
import numpy as np
from itertools import chain
l = [[fr]+to for fr,to in zip(df['from'], df['to'])]
df1 = (pd.DataFrame(data=chain.from_iterable([zip(sl, np.roll(sl, -1)) for sl in l]),
columns=['from', 'to'])
.assign(mid=np.repeat(df['mid'].to_numpy(), [*map(len, l)])))
from to mid
0 A C 0
1 C A 0
2 A B 1
3 B C 1
4 C A 1
5 B B 2
6 B B 2
7 C D 3
8 D E 3
9 E F 3
10 F C 3
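Since the asker wanted something readable rather than fast, a plain loop is another option. This is a sketch under the assumption that every mission returns to its origin, as in the expected output; the names rows, route and edges are illustrative only.

```python
import pandas as pd

# Input from the question, with `mid` written out explicitly.
df = pd.DataFrame({
    "mid": [0, 1, 2, 3],
    "from": ["A", "A", "B", "C"],
    "to": [["C"], ["B", "C"], ["B"], ["D", "E", "F"]],
})

rows = []
for mid, origin, stops in zip(df["mid"], df["from"], df["to"]):
    route = [origin, *stops, origin]  # closed loop: origin -> stops -> origin
    # Pair every stop with its successor to get one row per leg.
    for start, end in zip(route, route[1:]):
        rows.append({"mid": mid, "from": start, "to": end})

# Name the default index `tid` and turn it into a regular column.
edges = pd.DataFrame(rows).rename_axis("tid").reset_index()
print(edges)
```

The loop does per-row Python work, so it will likely be slower than the vectorized answers on large inputs.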

How to create pandas matrix from one column

I'm trying to turn one column into a two-column matrix; I think this is the right terminology. It's really a 2D matrix, I think? I haven't found a lot on this topic, which is why I am coming here.
This is what my starting dataframe looks like:
df:
[1]
A
B
C
This is what I am trying to end up with:
df2:
[1] [2]
A B
A C
B C
B A
C A
C B

You can try itertools.permutations:
import pandas as pd
from itertools import permutations
df = pd.DataFrame({1:['A','B','C']})
# every ordered pair of distinct values from column 1
df_out = pd.DataFrame.from_records(list(permutations(df[1], 2)), columns=[1, 2])
print(df_out)
Output:
1 2
0 A B
1 A C
2 B A
3 B C
4 C A
5 C B

Get column names from pandas DataFrame in format dtype:object

I have a question similar to the one in the link below. Instead of returning the column names as a list, I want them as a Series, i.e. in the format that ends with dtype: object.
For example,
A
B
C
D
Name:x,dtype:object
I am using an Excel file in xlsx format.
Link: Get list from pandas DataFrame column headers

I think you first need read_excel to build df, and then either the Series constructor or Index.to_series to turn the column names into a Series:
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5]})
print (df)
A B C D
0 1 4 7 1
1 2 5 8 3
2 3 6 9 5
s = pd.Series(df.columns.values, name='x')
print (s)
0 A
1 B
2 C
3 D
Name: x, dtype: object
s1 = df.columns.to_series().rename('x')
print (s1)
A A
B B
C C
D D
Name: x, dtype: object

Transpose data in Excel

My table currently looks like this:
1 a b c d e
2 a
3 b d g h
4 a c
5 d e j
My desired format is this:
1 a
1 b
1 c
1 d
1 e
2 a
3 b
3 d
3 g
3 h
4 a
4 c
5 d
5 e
5 j
Is there a way to make this modification in Microsoft Excel? I have attempted this in MS Access, but there is a column limit (225) which I exceed. In addition, I have attempted to use the TRANSPOSE function in Excel, but this only switches rows to columns. Please provide suggestions on how this transformation might be achieved. Thanks!
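No answer for doing this inside Excel is recorded here. If loading the sheet into pandas is an option, the reshaping could be sketched with melt as below. The column names id and v1 to v5 are made up for illustration, and the frame is typed in by hand rather than read from a file.

```python
import pandas as pd

# Hand-typed copy of the table; None marks the empty cells.
wide = pd.DataFrame(
    [[1, "a", "b", "c", "d", "e"],
     [2, "a", None, None, None, None],
     [3, "b", "d", "g", "h", None],
     [4, "a", "c", None, None, None],
     [5, "d", "e", "j", None, None]],
    columns=["id", "v1", "v2", "v3", "v4", "v5"],  # hypothetical names
)

long = (
    wide.melt(id_vars="id", var_name="position", value_name="value")
        .dropna(subset=["value"])          # discard the empty cells
        .sort_values("id", kind="stable")  # group by id, keep left-to-right order
        .drop(columns="position")
        .reset_index(drop=True)
)
print(long)
```

The stable sort matters: melt emits the cells column by column, and a stable sort on id preserves that left-to-right order within each id.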
