Combine dataframes within a list to form a single dataframe using pandas in python [duplicate] - python-3.x

This question already has answers here:
How to merge two dataframes side-by-side?
(6 answers)
Closed 2 years ago.
Say I have a list df_list containing 3 single-column pandas DataFrames, as below:
>>> df_list
[   A
 0  1
 1  2
 2  3,    B
 0  4
 1  5
 2  6,    C
 0  7
 1  8
 2  9]
I would like to merge them to become a single dataframe dat as below:
>>> dat
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
One way I can get it done is to create a blank dataframe and concatenate each of them in a for loop.
dat = pd.DataFrame([])
for i in range(0, len(df_list)):
    dat = pd.concat([dat, df_list[i]], axis=1)
Is there a more efficient way to achieve this without using iteration? Thanks in advance.

Use concat with the list of DataFrames:
dat = pd.concat(df_list, axis=1)
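A minimal, self-contained sketch of the accepted answer, with the column names and values taken from the question:

```python
import pandas as pd

# Three single-column DataFrames, as in the question.
df_list = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [4, 5, 6]}),
    pd.DataFrame({'C': [7, 8, 9]}),
]

# A single concat call joins them side by side on the shared index,
# avoiding the repeated-concat loop entirely.
dat = pd.concat(df_list, axis=1)
print(dat)
#    A  B  C
# 0  1  4  7
# 1  2  5  8
# 2  3  6  9
```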

Related

Compare two dataframes and export unmatched data using pandas or other packages?

I have two dataframes, and one is a subset of the other (picture below). I am not sure whether pandas can compare the two dataframes, filter out the data that is not in the subset, and export it as a dataframe. Or is there a package that does this kind of task?
The subset dataframe was generated by RandomUnderSampler, but RandomUnderSampler does not have a function that exports the unselected data. Any comments are welcome.
Use drop_duplicates with the keep=False parameter:
Example:
>>> df1
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
>>> df2
   A  B
0  0  1
1  2  3
2  6  7
>>> pd.concat([df1, df2]).drop_duplicates(keep=False)
   A  B
2  4  5
4  8  9
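The example above as a runnable sketch. Note the caveat: this trick only isolates the unmatched rows when df2's rows really are a subset of df1 and neither frame contains internal duplicate rows, since any row appearing twice in the concatenation is dropped:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0, 2, 4, 6, 8], 'B': [1, 3, 5, 7, 9]})
df2 = pd.DataFrame({'A': [0, 2, 6], 'B': [1, 3, 7]})

# Rows present in both frames appear twice in the concatenation,
# so keep=False removes both copies, leaving only the unmatched rows.
unmatched = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(unmatched)
#    A  B
# 2  4  5
# 4  8  9
```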

Pandas: Getting new dataframe from existing dataframe from list of substring present in column name

Hello, I have a dataframe called df and a list of substrings that appear in its column names. The main problem I am facing is that some of the substrings are not present in the dataframe.
ls = ["SRR123", "SRR154", "SRR655", "SRR224","SRR661"]
data = {'SRR123_em1': [1,2,3], 'SRR123_em2': [4,5,6], 'SRR661_em1': [7,8,9], 'SRR661_em2': [6,7,8],'SRR453_em2': [10,11,12]}
df = pd.DataFrame(data)
Output:
   SRR123_em1  SRR123_em2  SRR661_em1  SRR661_em2
0           1           4           7           6
1           2           5           8           7
2           3           6           9           8
Please can anyone suggest how I can obtain this output?
Filter with str.contains:
sub_df = df.loc[:, df.columns.str.contains('|'.join(ls))].copy()
   SRR123_em1  SRR123_em2  SRR661_em1  SRR661_em2
0           1           4           7           6
1           2           5           8           7
2           3           6           9           8
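The full answer as a self-contained sketch, using the list and data from the question. SRR453_em2 is dropped because no substring in ls matches it, and substrings in ls with no matching column (e.g. SRR154) are simply ignored:

```python
import pandas as pd

ls = ["SRR123", "SRR154", "SRR655", "SRR224", "SRR661"]
data = {'SRR123_em1': [1, 2, 3], 'SRR123_em2': [4, 5, 6],
        'SRR661_em1': [7, 8, 9], 'SRR661_em2': [6, 7, 8],
        'SRR453_em2': [10, 11, 12]}
df = pd.DataFrame(data)

# '|'.join(ls) builds a regex of alternatives ("SRR123|SRR154|...");
# only columns whose name contains one of the substrings are kept.
sub_df = df.loc[:, df.columns.str.contains('|'.join(ls))].copy()
print(sub_df.columns.tolist())
# ['SRR123_em1', 'SRR123_em2', 'SRR661_em1', 'SRR661_em2']
```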

How to access the items in list from Dataframe against each index item? [duplicate]

This question already has answers here:
Pandas expand rows from list data available in column
(3 answers)
Closed 3 years ago.
Consider the table (Dataframe) below.
I need each item in the list placed against its index, as given below. What are the possible ways of doing this in python?
Feel free to tweak the question if it better matches the context.
You can do this using the pandas library with the explode method. Here is how your code would look -
import pandas as pd

data = [["A", [1, 2, 3, 4]], ["B", [9, 6, 4]]]
df = pd.DataFrame(data, columns=['Index', 'Lists'])
print(df)

df = df.explode('Lists').reset_index(drop=True)
print(df)
Your output would be -
  Index         Lists
0     A  [1, 2, 3, 4]
1     B     [9, 6, 4]

  Index Lists
0     A     1
1     A     2
2     A     3
3     A     4
4     B     9
5     B     6
6     B     4

How to select multiple rows at a time in pandas?

When I have a DataFrame object with an unknown number of rows, I want to select 5 rows at a time.
For instance, if df has 11 rows, it will be selected 3 times: 5 + 5 + 1. If it has 4 rows, only one selection will be made.
How can I write the code using pandas?
Use groupby with a little arithmetic. This should be clean.
chunks = [g for _, g in df.groupby(df.index // 5)]
Depending on how you want your output structured, you may change g to g.values.tolist() (if you want a list instead).
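A runnable sketch of the groupby approach. One assumption worth making explicit: `df.index // 5` only yields consecutive chunks when the frame has a default RangeIndex; call `reset_index(drop=True)` first otherwise:

```python
import pandas as pd

df = pd.DataFrame({'A': range(11)})  # 11 rows -> chunks of 5, 5, 1

# df.index // 5 maps rows 0-4 to group 0, rows 5-9 to group 1,
# and row 10 to group 2; groupby then yields one chunk per group.
chunks = [g for _, g in df.groupby(df.index // 5)]
print([len(c) for c in chunks])
# [5, 5, 1]
```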
numpy.split
np.split(df, np.arange(5, len(df), 5))
Demo
df = pd.DataFrame(dict(A=range(11)))
print(*np.split(df, np.arange(5, len(df), 5)), sep='\n\n')
   A
0  0
1  1
2  2
3  3
4  4

   A
5  5
6  6
7  7
8  8
9  9

     A
10  10
Create a loop and use the index for slicing the DataFrame:
for i in range(0, len(df), 5):
    data = df.iloc[i:i + 5]

Creating a sub-index in pandas dataframe [duplicate]

This question already has answers here:
Add a sequential counter column on groups to a pandas dataframe
(4 answers)
Closed 1 year ago.
Okay, this is tricky. I have a pandas dataframe of machine log data. The data has an index, but the dataframe contains various jobs, and I want to give each of those jobs an index of its own so I can compare them with each other. In other words, I want another column with a counter that begins at zero, runs to the end of the job, and then resets to zero for the next job. Or do I have to do this line by line?
I think you need set_index with cumcount to count within groups:
df = df.set_index(df.groupby('Job Columns').cumcount(), append=True)
Sample:
import numpy as np
import pandas as pd

np.random.seed(456)
df = pd.DataFrame({'Jobs': np.random.choice(['a', 'b', 'c'], size=10)})

# solution with sorting
df1 = df.sort_values('Jobs').reset_index(drop=True)
df1 = df1.set_index(df1.groupby('Jobs').cumcount(), append=True)
print(df1)
    Jobs
0 0    a
1 1    a
2 2    a
3 0    b
4 1    b
5 2    b
6 3    b
7 0    c
8 1    c
9 2    c
# solution with no sorting
df2 = df.set_index(df.groupby('Jobs').cumcount(), append=True)
print(df2)
    Jobs
0 0    b
1 1    b
2 0    c
3 0    a
4 1    c
5 2    c
6 1    a
7 2    b
8 2    a
9 3    b
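Since the question asked for "another column" rather than an index level, here is a column-based variant of the same cumcount idea (the column name sub_index is my own choice, and the small Jobs data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Jobs': ['a', 'a', 'b', 'b', 'b', 'a']})

# cumcount numbers the rows within each job starting at 0, so the
# counter resets whenever a new job's rows are counted; storing it
# as a column gives each job its own sub-index alongside the data.
df['sub_index'] = df.groupby('Jobs').cumcount()
print(df)
#   Jobs  sub_index
# 0    a          0
# 1    a          1
# 2    b          0
# 3    b          1
# 4    b          2
# 5    a          2
```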