Un-merge a dataframe based on a column [duplicate] - python-3.x

This question already has answers here:
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Split (explode) pandas dataframe string entry to separate rows
(27 answers)
Closed 2 years ago.
I am reading a CSV file into a pandas dataframe, and the data inside the dataframe looks like this:
item seq_no db_xml
0 28799179 5 ['<my_xml>....</my_xml>']
1 28839888 1 ['<my_xml>....</my_xml>']
2 28840113 75 ['<my_xml>....</my_xml>']
3 28852466 20,22 ['<my_xml1>....</my_xml1>', '<my_xml2>....</my_xml2>']
I need to convert the above dataframe so that each seq_no for the same item, together with its db_xml, is on its own row, i.e. un-merge the comma-separated seq_no values of the same item into subsequent rows:
item seq_no db_xml
0 28799179 5 ['<my_xml>....</my_xml>']
1 28839888 1 ['<my_xml>....</my_xml>']
2 28840113 75 ['<my_xml>....</my_xml>']
3 28852466 20 ['<my_xml1>....</my_xml1>']
4 28852466 22 ['<my_xml2>....</my_xml2>']
How can I achieve this in pandas so that seq_no is also split into separate rows?
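Since the question is closed as a duplicate, here is a minimal sketch of the explode-based approach from the linked answers, using hypothetical sample values shaped like the question's data (exploding multiple columns together requires pandas >= 1.3):

```python
import pandas as pd

# Hypothetical sample mirroring the question's structure
df = pd.DataFrame({
    'item': [28799179, 28852466],
    'seq_no': ['5', '20,22'],
    'db_xml': [['<my_xml>....</my_xml>'],
               ['<my_xml1>....</my_xml1>', '<my_xml2>....</my_xml2>']],
})

# Split the comma-separated seq_no into lists, then explode both
# columns together so each (seq_no, db_xml) pair gets its own row
df['seq_no'] = df['seq_no'].str.split(',')
out = df.explode(['seq_no', 'db_xml'], ignore_index=True)
```

On pandas versions before 1.3, each column would have to be exploded separately after a `set_index` on the non-list columns.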


How to select first row after each 3 rows pandas [duplicate]

This question already has answers here:
Pandas every nth row
(7 answers)
Closed 1 year ago.
I have one dataframe; I want to take the first row of every 3 rows and save it as a new dataframe.
Here is the input data:
df=pd.DataFrame({'x':[1,2,5,6,7,8,9,9,6]})
output:
df_out=pd.DataFrame({'x':[1,6,9]})
Use DataFrame.iloc with slicing:
print(df.iloc[::3])
x
0 1
3 6
6 9
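To match the desired df_out exactly, the slice can be combined with reset_index; a small runnable sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 5, 6, 7, 8, 9, 9, 6]})

# Take every third row starting from the first, rebuilding the index from 0
df_out = df.iloc[::3].reset_index(drop=True)
```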

python dataframe - grouping rows [duplicate]

This question already has answers here:
Pandas: groupby column A and make lists of tuples from other columns?
(4 answers)
Closed 3 years ago.
I'm new to python so the question might not be so clear.
I have this dataset in a pandas dataframe that looks something like this:
Id item
0 A0029V93 B0239WN
1 A0029V93 B0302SS
2 A02948s8 B0029ST
...
and the result I want is
Id item
0 A0029V93 (B0239WN,B0302SS)
1 A02948s8 (B0029ST, ...)
2 ... ...
...
No duplicate Id values, with all the items in the data paired with their Id.
It doesn't necessarily have to look exactly like this,
as long as I can get the Id, [item] data.
df.groupby('Id')['item'].apply(list)
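A runnable sketch of that one-liner on sample data copied from the question; reset_index turns the grouped Series back into a dataframe:

```python
import pandas as pd

# Ids and items copied from the question's sample
df = pd.DataFrame({
    'Id':   ['A0029V93', 'A0029V93', 'A02948s8'],
    'item': ['B0239WN', 'B0302SS', 'B0029ST'],
})

# One row per Id, with its items collected into a list
grouped = df.groupby('Id')['item'].apply(list).reset_index()
```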

How to extract the entire column from a df based on a string of the column name? [duplicate]

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 3 years ago.
I have 2 dfs:
Sample of df1: s12
BacksGas_Flow_sccm ContextID StepID Time_Elapsed
46.6796875 7289972 12 25.443
46.6796875 7289972 12 26.443
Sample of df2: step12
ContextID BacksGas_Flow_sccm StepID Time_Elapsed
7289973 46.6796875 12 26.388
7289973 46.6796875 12 27.388
Since BacksGas_Flow_sccm is in a different position in each df, I would like to know how I can extract the column using df.columns.str.contains('Flow').
I tried doing:
s12.columns[s12.columns.str.contains('Flow')]
but it just gives the following output:
Index(['BacksGas_Flow_sccm'], dtype='object')
I would like the entire column to be extracted. How can this be done?
You are close. Use DataFrame.loc with : to select all rows, together with the columns filtered by the condition:
s12.loc[:, s12.columns.str.contains('Flow')]
Another idea is to select by column names:
cols = s12.columns[s12.columns.str.contains('Flow')]
s12[cols]
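A self-contained sketch with hypothetical values shaped like the s12 sample, showing that the boolean mask selects the whole column's data, not just its name:

```python
import pandas as pd

# Hypothetical rows shaped like the s12 sample in the question
s12 = pd.DataFrame({
    'BacksGas_Flow_sccm': [46.6796875, 46.6796875],
    'ContextID': [7289972, 7289972],
    'StepID': [12, 12],
    'Time_Elapsed': [25.443, 26.443],
})

# A boolean mask over the column names selects the full matching column
flow = s12.loc[:, s12.columns.str.contains('Flow')]
```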

NA values on Dataframe [duplicate]

This question already has answers here:
How to drop rows of Pandas DataFrame whose value in a certain column is NaN
(15 answers)
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
(17 answers)
Closed 4 years ago.
So I'm working with Python and dataframes.
I have a list of movies from which I create a subset, and I'm trying to plot it, or just figure out the mean for that matter, but there is a lot of missing information and the frame just has "NA" instead of data.
I use the np library to ignore those values, but I still get an error saying the input type is not supported by isfinite.
data = pd.read_csv(r"C:\Users\Bubbles\Documents\CS241\week 13\movies.csv")
Action = data[(data.Action == 1)]
Action = Action[np.isfinite(Action.budget)]
print(Action.budget.mean())
The budget column contains only "NA" and integers as possible values.
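Since the question is closed as a duplicate, here is a sketch of the usual fix from the linked answers: np.isfinite fails because the column has object dtype (the string "NA" mixed with numbers), so coerce it to numeric first. The sample values below are hypothetical:

```python
import pandas as pd

# Hypothetical budget column mixing the string "NA" with integers
Action = pd.DataFrame({'budget': ['NA', 100, 200, 'NA', 300]})

# np.isfinite rejects object dtype; coerce the "NA" strings to NaN first
Action['budget'] = pd.to_numeric(Action['budget'], errors='coerce')
mean_budget = Action['budget'].mean()  # NaN entries are skipped by default
```

Passing na_values="NA" to read_csv would achieve the same thing at load time.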

Indexing Pandas Dataframe [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 4 years ago.
I have 2 pandas dataframes with names and scores.
The first dataframe is in the form:
df_score_1
A B C D
A 0 1 2 0
B 1 0 0 2
C 2 0 0 3
D 0 2 3 0
where
df_score_1.index
Index(['A', 'B', 'C', 'D'],dtype='object')
The second dataframe is from a text file with three columns, which does not contain zeros, only positive (non-zero) scores:
df_score_2
A B 1
A C 1
A D 2
B C 5
B D 1
The goal is to transform df_score_2 into the form of df_score_1 using pandas commands. The original form comes from a networkx call, nx.to_pandas_dataframe(G).
I've tried multi-indexing and the index doesn't display the form I would like. Is there an option when reading in a text file or a function to transform the dataframe after?
Are you trying to merge the dataframes, or do you just want them to have the same index? If you need the same index, use this:
# wrap the list in pd.Index so it is treated as labels, not column names
l = df1.index.tolist()
df2.set_index(pd.Index(l), inplace=True)
crosstab and reindex are the best solutions I've found so far:
df = pd.crosstab(df[0], df[1], df[2], aggfunc=sum)
idx = df.columns.union(df.index)
df = df.reindex(index=idx, columns=idx)
The output is an adjacency matrix, but with NaN values where the table has no entry rather than a mirrored (symmetric) matrix.
I think you need:
df_score_2.set_index(df_score_1.index,inplace=True)
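Putting the crosstab answer together with a mirroring step, here is a sketch on the question's edge list that fills the empty half from the transpose and zeroes the remaining NaNs:

```python
import pandas as pd

# df_score_2-style edge list (values copied from the question)
df = pd.DataFrame([['A', 'B', 1], ['A', 'C', 1], ['A', 'D', 2],
                   ['B', 'C', 5], ['B', 'D', 1]])

# Pivot the edge list into a matrix, then align rows and columns
m = pd.crosstab(df[0], df[1], df[2], aggfunc='sum')
idx = m.columns.union(m.index)
m = m.reindex(index=idx, columns=idx)

# Fill the missing half from the transpose, then zero the remaining NaNs
m = m.fillna(m.T).fillna(0)
```

This yields a symmetric matrix with zeros on the diagonal and for absent pairs, matching the df_score_1 layout.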
