Appending Rows From the Same Dataframe [duplicate] - python-3.x

Assuming the following DataFrame:
key.0 key.1 key.2 topic
1 abc def ghi 8
2 xab xcd xef 9
How can I combine the values of all the key.* columns into a single column 'key', that's associated with the topic value corresponding to the key.* columns? This is the result I want:
topic key
1 8 abc
2 8 def
3 8 ghi
4 9 xab
5 9 xcd
6 9 xef
Note that the number of key.N columns varies, depending on some external N.

You can melt your dataframe:
>>> keys = [c for c in df if c.startswith('key.')]
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')
topic variable key
0 8 key.0 abc
1 9 key.0 xab
2 8 key.1 def
3 9 key.1 xcd
4 8 key.2 ghi
5 9 key.2 xef
It also gives you the source of the key.
From v0.20, melt is also available as a method of pd.DataFrame (pandas 2.0 and later require the axis to be passed by keyword, hence drop(columns=...)):
>>> df.melt('topic', value_name='key').drop(columns='variable')
topic key
0 8 abc
1 9 xab
2 8 def
3 9 xcd
4 8 ghi
5 9 xef
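A minimal, self-contained sketch of the melt approach, assuming the sample frame from the question. The variable name long_df is only illustrative, and the stable sort on topic is an extra step (not part of the answer above) that restores the row order shown in the question.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Pick up however many key.N columns exist
keys = [c for c in df.columns if c.startswith("key.")]

long_df = (
    df.melt(id_vars="topic", value_vars=keys, value_name="key")
    .drop(columns="variable")           # drop the source column name (key.0, key.1, ...)
    .sort_values("topic", kind="stable")  # group rows by topic, keeping key.0 < key.1 < key.2
    .reset_index(drop=True)
)
print(long_df)
```

Without the sort, melt returns all key.0 rows first, then all key.1 rows, and so on, as in the output above.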

After trying various ways, I find the following is more or less intuitive, provided stack's magic is understood:
# keep topic as index, stack other columns 'against' it
stacked = df.set_index('topic').stack()
# set the name of the new series created
df = stacked.reset_index(name='key')
# drop the 'source' level (key.*)
df.drop('level_1', axis=1, inplace=True)
The resulting dataframe is as required:
topic key
0 8 abc
1 8 def
2 8 ghi
3 9 xab
4 9 xcd
5 9 xef
You may want to print intermediary results to understand the process in full. If you don't mind having more columns than needed, the key steps are set_index('topic'), stack() and reset_index(name='key').
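The same steps as one runnable sketch, assuming the sample frame from the question. Instead of dropping the 'level_1' column by name afterwards, this variant discards the unnamed inner index level before the final reset_index; that is a small deviation from the code above, chosen so the example does not depend on the auto-generated column name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    }
)

# Keep topic as the index and stack the key.* columns into one Series
stacked = df.set_index("topic").stack()

# Discard the inner level (the key.* labels), then turn topic back into a column
result = stacked.reset_index(level=1, drop=True).reset_index(name="key")
print(result)
```

Because stack works row by row, the rows come out already grouped by topic.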

OK, because one of the current answers is marked as a duplicate of this question, I will answer here.
Using wide_to_long (sep='.' is needed here because the columns are named key.0, key.1, ...):
pd.wide_to_long(df, ['key'], 'topic', 'age', sep='.').reset_index().drop(columns='age')
Out[123]:
topic key
0 8 abc
1 9 xab
2 8 def
3 9 xcd
4 8 ghi
5 9 xef
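A self-contained sketch of the wide_to_long variant, assuming the sample frame from the question. The suffix column name n is an arbitrary placeholder, and the explicit sort is an added step so the row order does not depend on how wide_to_long arranges its result.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    }
)

long_df = (
    # stub "key" + separator "." + numeric suffix -> one "key" column
    pd.wide_to_long(df, stubnames="key", i="topic", j="n", sep=".")
    .reset_index()
    .sort_values(["topic", "n"])  # deterministic order: by topic, then by suffix
    .drop(columns="n")
    .reset_index(drop=True)
)
print(long_df)
```

Note that wide_to_long requires the i column (topic here) to identify each row uniquely.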

Related

Iterating through a data frame and grouping values in a range

I have a python data frame of weekly data like this :
Week Val
1 11
2 11
3 11
4 11
5 9
6 9
7 9
8 9
I would like to create an output table like this:
Week 1 Week 2 Val
1 4 11
5 8 9
Apologies, I am quite new to Python and its iterative tools. I am not sure how to solve this problem.
I tried to match each row against a neighbouring row, but I do not know how to go further:
df['Match'] = df['Val'].eq(df['Val'].shift(-1))
You want to group by the consecutive blocks of Val. Mark the rows where Val differs from the previous row, then take the cumsum of those marks to get a block label:
blocks = df['Val'].ne(df['Val'].shift(1)).cumsum()
(df.groupby(blocks, as_index=False)
.agg(Week1=('Week','min'), Week2=('Week','max'), Val=('Val', 'first'))
)
Or you can chain:
(df.groupby(df['Val'].ne(df['Val'].shift(1)).cumsum(), as_index=False)
.agg(Week1=('Week','min'), Week2=('Week','max'),Val=('Val', 'first'))
)
Output:
Week1 Week2 Val
0 1 4 11
1 5 8 9
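A runnable sketch of the block-labelling idea, assuming the sample data from the question. Here the label Series is renamed to block (a hypothetical name) and used as an ordinary group index that is dropped afterwards, so it cannot clash with the Val output column.

```python
import pandas as pd

df = pd.DataFrame({"Week": range(1, 9), "Val": [11] * 4 + [9] * 4})

# True wherever Val differs from the previous row; cumsum turns that into block ids
blocks = df["Val"].ne(df["Val"].shift()).cumsum().rename("block")

out = (
    df.groupby(blocks)
    .agg(Week1=("Week", "min"), Week2=("Week", "max"), Val=("Val", "first"))
    .reset_index(drop=True)  # the block id itself is not needed in the result
)
print(out)
```

Unlike a plain groupby on Val, this keeps two separate runs of the same value apart, because each run gets its own block id.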

Pandas: Getting new dataframe from existing dataframe from list of substring present in column name

Hello, I have a dataframe called df and a list of substrings that may appear in its column names. The main problem I am facing is that some of the substrings are not present in the dataframe.
ls = ["SRR123", "SRR154", "SRR655", "SRR224","SRR661"]
data = {'SRR123_em1': [1,2,3], 'SRR123_em2': [4,5,6], 'SRR661_em1': [7,8,9], 'SRR661_em2': [6,7,8],'SRR453_em2': [10,11,12]}
df = pd.DataFrame(data)
Desired output:
SRR123_em1 SRR123_em2 SRR661_em1 SRR661_em2
1 4 7 6
2 5 8 7
3 6 9 8
Please, can anyone suggest how I can obtain this output?
Filter the columns with str.contains:
sub_df=df.loc[:,df.columns.str.contains('|'.join(ls))].copy()
Out[295]:
SRR123_em1 SRR123_em2 SRR661_em1 SRR661_em2
0 1 4 7 6
1 2 5 8 7
2 3 6 9 8
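A self-contained sketch of the column filter, assuming the list and data from the question. The re.escape call is an addition not present in the answer above: str.contains treats the joined string as a regular expression, so escaping guards against substrings that happen to contain regex metacharacters.

```python
import re

import pandas as pd

ls = ["SRR123", "SRR154", "SRR655", "SRR224", "SRR661"]
data = {
    "SRR123_em1": [1, 2, 3],
    "SRR123_em2": [4, 5, 6],
    "SRR661_em1": [7, 8, 9],
    "SRR661_em2": [6, 7, 8],
    "SRR453_em2": [10, 11, 12],
}
df = pd.DataFrame(data)

# Build one alternation pattern: SRR123|SRR154|...
pattern = "|".join(re.escape(s) for s in ls)

# Boolean mask over the column labels; substrings with no match are simply ignored
sub_df = df.loc[:, df.columns.str.contains(pattern)].copy()
print(sub_df)
```

Substrings that match no column (SRR154, SRR655, SRR224) do not raise an error; they just contribute nothing to the mask.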

Append Dataframes of different dimensions

I have multiple dataframes with a different number of rows and columns respectively.
example:
df1:
a b c d
0 1 5 6
8 9 8 7
and df2:
g h
9 8
4 5
6 7
I have to append both the dataframes without a change in their dimensions.
The desired output should be one dataframe Result_df as:
a b c d
0 1 5 6
8 9 8 7
g h
9 8
4 5
6 7
Can anyone please help me append the dataframes without changing their structure?
Thank you

How do I compare a dataframe column with another dataframe and create a column

I have two dataframes df1 and df2. Here is a small sample
Days
4
6
9
1
4
My df2 is
Day1 Day2 Alphabets
2 5 abc
4 7 bcd
8 10 ghi
10 12 abc
I want to add a new column Alphabets to df1, taken from df2, for rows where the value of Days in df1 lies between Day1 and Day2. Something like:
if df1['Days'] in between df2['Day1'] and df2['Day2']:
df1['Alphabets']=df2['Alphabets']
Result is:
Days Alphabets
4 abc
6 bcd
9 ghi
etc.
I tried a for loop, but it takes a long time to run. Is there a more elegant way to do this?
Thanks in advance
I would use NumPy broadcasting:
s1=df2.Day1.values
s2=df2.Day2.values
s=df1.Days.values[:,None]
df1['V']=((s-s1>0)&(s-s2<0)).dot(df2.Alphabets)
df1
Out[277]:
Days V
0 4 abc
1 6 bcd
2 9 ghi
3 1
4 4 abc
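A self-contained sketch of the broadcasting idea, assuming the sample frames from the question and the same strict inequalities as the answer above. It swaps the dot-product trick for an explicit lookup of the first matching interval; that is a different way to read the mask, chosen here only because it is easier to follow step by step. The names lo, hi, mask and first are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Days": [4, 6, 9, 1, 4]})
df2 = pd.DataFrame(
    {
        "Day1": [2, 4, 8, 10],
        "Day2": [5, 7, 10, 12],
        "Alphabets": ["abc", "bcd", "ghi", "abc"],
    }
)

days = df1["Days"].to_numpy()[:, None]  # shape (5, 1): one row per df1 entry
lo = df2["Day1"].to_numpy()             # shape (4,): one entry per interval
hi = df2["Day2"].to_numpy()

# Broadcasting compares every day against every interval -> shape (5, 4)
mask = (days > lo) & (days < hi)

# Index of the first matching interval per row (0 when nothing matches)
first = mask.argmax(axis=1)
labels = df2["Alphabets"].to_numpy(dtype=object)[first]

# Rows without any matching interval get an empty string
df1["Alphabets"] = np.where(mask.any(axis=1), labels, "")
print(df1)
```

When intervals overlap, this version keeps only the first match, whereas the dot-product version concatenates the labels of all matching intervals. The mask has len(df1) × len(df2) entries, so memory use grows quickly for large inputs.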

How to select last 5 rows of each unique records in pandas

Using Python 3, I am trying to get, for each unique value in the column 'Name', the last 5 records from the column 'Number'. How exactly can this be done in Python?
My df looks like this:
Name Number
a 5
a 6
b 7
b 8
a 9
a 10
b 11
b 12
a 9
b 8
I saw some examples (like this one: Get sum of last 5 rows for each unique id) in SQL, but that is time consuming and I would like to learn how to do it in Python.
My expected output df would be like this:
Name 1 2 3 4 5
a 5 6 9 10 9
b 7 8 11 12 8
I think you need something like this:
df_out = df.groupby('Name').tail(5)
df_out.set_index(['Name', df_out.groupby('Name').cumcount() +1])['Number'].unstack()
Output:
1 2 3 4 5
Name
a 5 6 9 10 9
b 7 8 11 12 8
Looks like you need a pivot after groupby.cumcount():
df1=df.groupby('Name').tail(5)
final=(df1.assign(k=df1.groupby('Name').cumcount()+1)
.pivot(index='Name', columns='k', values='Number')
.reset_index().rename_axis(None, axis=1))
print(final)
Name 1 2 3 4 5
0 a 5 6 9 10 9
1 b 7 8 11 12 8
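A runnable sketch of the tail-then-reshape approach, assuming the sample data from the question. The names last5, pos and wide are illustrative. In this sample each name has exactly five rows, so tail(5) keeps everything; with longer groups it would keep only the last five per name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": list("aabbaabbab"),
        "Number": [5, 6, 7, 8, 9, 10, 11, 12, 9, 8],
    }
)

# Last five rows per Name, in their original order
last5 = df.groupby("Name").tail(5)

# Position of each row within its group: 1, 2, ..., 5
pos = last5.groupby("Name").cumcount() + 1

# One row per Name, one column per position
wide = last5.set_index(["Name", pos])["Number"].unstack()
print(wide)
```

If a group has fewer than five rows, the missing positions come out as NaN after unstack.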
