How to select the first row of every 3 rows in pandas [duplicate] - python-3.x

This question already has answers here:
Pandas every nth row
(7 answers)
Closed 1 year ago.
I have a DataFrame, and I want to take the first row of every group of 3 rows and save the result as a new DataFrame.
Here is the input data:
df=pd.DataFrame({'x':[1,2,5,6,7,8,9,9,6]})
Desired output:
df_out=pd.DataFrame({'x':[1,6,9]})

Use DataFrame.iloc with slicing:
print(df.iloc[::3])
   x
0  1
3  6
6  9
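The slice keeps the original row labels (0, 3, 6). Below is a minimal sketch of getting a result that matches df_out exactly, plus an alternative that labels each row with its block number. The groupby variant is an illustration of a different technique, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 5, 6, 7, 8, 9, 9, 6]})

# Positional step slice: rows 0, 3, 6, ...; reset_index renumbers them 0, 1, 2 like df_out.
df_out = df.iloc[::3].reset_index(drop=True)

# Alternative: block number for each row is 0,0,0,1,1,1,2,2,2; take the first row per block.
# Note that groupby(...).first() returns the first non-null value per column by default,
# so it can differ from iloc[::3] when the data contains NaN.
df_out2 = df.groupby(np.arange(len(df)) // 3).first()

print(df_out)
```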

Related

Un-merge a dataframe based on a column [duplicate]

This question already has answers here:
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Split (explode) pandas dataframe string entry to separate rows
(27 answers)
Closed 2 years ago.
I am reading a CSV file into a pandas DataFrame, and the data looks like this:
   item      seq_no  db_xml
0  28799179  5       ['<my_xml>....</my_xml>']
1  28839888  1       ['<my_xml>....</my_xml>']
2  28840113  75      ['<my_xml>....</my_xml>']
3  28852466  20,22   ['<my_xml1>....</my_xml1>', '<my_xml2>....</my_xml2>']
I need to convert the DataFrame above into the one below: each seq_no of an item, together with its matching db_xml entry, should go on its own row, so the seq_no values of the same item are unmerged into consecutive rows.
   item      seq_no  db_xml
0  28799179  5       ['<my_xml>....</my_xml>']
1  28839888  1       ['<my_xml>....</my_xml>']
2  28840113  75      ['<my_xml>....</my_xml>']
3  28852466  20      ['<my_xml1>....</my_xml1>']
4  28852466  22      ['<my_xml2>....</my_xml2>']
How can I do this in pandas so that seq_no is also split into separate rows?
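A minimal sketch of one approach, under stated assumptions: the sample data is reconstructed from the tables above; db_xml is assumed to arrive from read_csv as the string form of a list, so it is parsed with ast.literal_eval; and each row is assumed to have as many seq_no values as db_xml elements. Both columns are then exploded together, which needs pandas 1.3 or later for a list of columns.

```python
import ast
import pandas as pd

# Hypothetical reconstruction of the question's data as read_csv would return it.
df = pd.DataFrame({
    'item': [28799179, 28839888, 28840113, 28852466],
    'seq_no': ['5', '1', '75', '20,22'],
    'db_xml': [
        "['<my_xml>....</my_xml>']",
        "['<my_xml>....</my_xml>']",
        "['<my_xml>....</my_xml>']",
        "['<my_xml1>....</my_xml1>', '<my_xml2>....</my_xml2>']",
    ],
})

# Turn both columns into lists of equal length per row.
df['seq_no'] = df['seq_no'].astype(str).str.split(',')
df['db_xml'] = df['db_xml'].apply(ast.literal_eval)

# Explode both list columns in lockstep and renumber the rows 0..n-1.
out = df.explode(['seq_no', 'db_xml'], ignore_index=True)
out['seq_no'] = pd.to_numeric(out['seq_no'])

# Re-wrap each XML string in a one-element list to match the desired output.
out['db_xml'] = out['db_xml'].map(lambda x: [x])
print(out)
```

If plain strings are fine in db_xml, the last step can be dropped.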

dropna() not working for axis = 1 with the given threshold [duplicate]

This question already has answers here:
thresh in dropna for DataFrame in pandas in python
(3 answers)
Closed 2 years ago.
For the given dataset, I performed a dropna on axis=1 with thresh=2:
df.dropna(thresh=2,axis=1)
The output was
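This does not seem correct. I expect the columns with index 1 and 2 to be dropped, since both have 2 or more NaN occurrences.
The code works perfectly fine with axis=0.
Try using df.dropna(thresh=6, axis=1) for the same DataFrame. thresh is the minimum number of non-NA values a column must have to be kept; it is not the number of NaNs that triggers a drop.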
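A small sketch of the thresh semantics, using hypothetical data since the original dataset is not shown. To drop every column with at least max_nan missing values, thresh has to be expressed as a count of non-NA values; counting NaNs directly with a boolean mask gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'a' has no NaN, 'b' has 2, 'c' has 3.
df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [1, np.nan, np.nan, 4],
    'c': [np.nan, np.nan, np.nan, 4],
})

# thresh=2 keeps any column with at least 2 non-NA values, so 'b' survives.
loose = df.dropna(axis=1, thresh=2)

max_nan = 2  # drop columns with this many NaNs or more
# Keep a column only if it has more than len(df) - max_nan non-NA values.
kept = df.dropna(axis=1, thresh=len(df) - max_nan + 1)

# Equivalent mask that counts NaNs per column directly.
kept2 = df.loc[:, df.isna().sum() < max_nan]

print(loose.columns.tolist(), kept.columns.tolist())
```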

python dataframe - grouping rows [duplicate]

This question already has answers here:
Pandas: groupby column A and make lists of tuples from other columns?
(4 answers)
Closed 3 years ago.
I'm new to Python, so the question might not be entirely clear.
I have a pandas DataFrame that looks something like this:
   Id        item
0  A0029V93  B0239WN
1  A0029V93  B0302SS
2  A02948s8  B0029ST
...
and the result I want is:
   Id        item
0  A0029V93  (B0239WN, B0302SS)
1  A02948s8  (B0029ST, ...)
2  ...       ...
...
That is, no duplicate Ids, with all of each Id's items paired with it.
It doesn't have to look exactly like this, as long as I can get Id, [item] pairs.
df.groupby('Id')['item'].apply(list)
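A short sketch of the answer applied to the sample rows from the question. reset_index turns the grouped Series back into a DataFrame with Id and item columns, and mapping tuple over the lists gives the parenthesised form shown in the question.

```python
import pandas as pd

# Sample rows taken from the question.
df = pd.DataFrame({
    'Id': ['A0029V93', 'A0029V93', 'A02948s8'],
    'item': ['B0239WN', 'B0302SS', 'B0029ST'],
})

# One row per Id (groupby sorts the keys); items kept in order of appearance.
out = df.groupby('Id')['item'].apply(list).reset_index()

# Same result with tuples instead of lists.
out_tuples = out.assign(item=out['item'].map(tuple))

print(out)
```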

How to extract the entire column from a df based on a string of the column name? [duplicate]

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 3 years ago.
I have two DataFrames:
Sample of df1: s12
BacksGas_Flow_sccm  ContextID  StepID  Time_Elapsed
46.6796875          7289972    12      25.443
46.6796875          7289972    12      26.443
Sample of df2: step12
ContextID  BacksGas_Flow_sccm  StepID  Time_Elapsed
7289973    46.6796875          12      26.388
7289973    46.6796875          12      27.388
Since BacksGas_Flow_sccm is in a different position in each DataFrame, I would like to extract the column using df.columns.str.contains('Flow').
I tried doing:
s12.columns[s12.columns.str.contains('Flow')]
but it just gives the following output:
Index(['BacksGas_Flow_sccm'], dtype='object')
I would like the entire column to be extracted. How can this be done?
You are close: use DataFrame.loc with : to select all rows, and filter the columns with the boolean mask:
s12.loc[:, s12.columns.str.contains('Flow')]
Another option is to select by column names:
cols = s12.columns[s12.columns.str.contains('Flow')]
s12[cols]
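A sketch applying the same mask to both sample DataFrames from the question, to show that the column's position does not matter. DataFrame.filter(like='Flow') is a built-in shortcut for the same substring match, and squeeze turns a single matching column into a Series; the helper name flow_columns is hypothetical.

```python
import pandas as pd

# Sample rows from the question; the Flow column sits in a different position in each.
s12 = pd.DataFrame({
    'BacksGas_Flow_sccm': [46.6796875, 46.6796875],
    'ContextID': [7289972, 7289972],
    'StepID': [12, 12],
    'Time_Elapsed': [25.443, 26.443],
})
step12 = pd.DataFrame({
    'ContextID': [7289973, 7289973],
    'BacksGas_Flow_sccm': [46.6796875, 46.6796875],
    'StepID': [12, 12],
    'Time_Elapsed': [26.388, 27.388],
})

def flow_columns(df):
    # Hypothetical helper: all rows, only the columns whose name contains 'Flow'.
    return df.loc[:, df.columns.str.contains('Flow')]

a = flow_columns(s12)
b = flow_columns(step12)

# Built-in equivalent for a plain substring match on column labels.
c = s12.filter(like='Flow')

# With exactly one matching column, squeeze returns it as a Series;
# with zero or several matches it stays a DataFrame.
flow_series = a.squeeze(axis=1)
```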

NA values in a DataFrame [duplicate]

This question already has answers here:
How to drop rows of Pandas DataFrame whose value in a certain column is NaN
(15 answers)
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
(17 answers)
Closed 4 years ago.
I'm working with Python and DataFrames.
I have a list of movies from which I create a subset, and I'm trying to plot it, or just compute the mean, but a lot of information is missing and the frame just has "NA" instead of data.
I use NumPy to ignore those values, but I still get an error saying the input type is not supported by isfinite:
import numpy as np
import pandas as pd
data = pd.read_csv(r"C:\Users\Bubbles\Documents\CS241\week 13\movies.csv")
Action = data[(data.Action == 1)]
Action = Action[np.isfinite(Action.budget)]
print(Action.budget.mean())
The budget column contains only "NA" and integers as possible values.
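A minimal sketch of the approaches from the linked duplicates, using a hypothetical in-memory stand-in for movies.csv (the column names title, Action and budget are assumptions based on the question). pd.to_numeric with errors='coerce' turns any non-numeric entry into NaN, Series.mean skips NaN by default, and dropna(subset=...) removes rows with a missing budget, for example before plotting.

```python
import io
import pandas as pd

# Hypothetical stand-in for movies.csv.
csv_text = """title,Action,budget
A,1,100
B,1,NA
C,0,50
D,1,300
"""
data = pd.read_csv(io.StringIO(csv_text))
action = data[data['Action'] == 1].copy()

# Coerce anything that is not a number (e.g. "NA", "N/A", blanks) to NaN.
action['budget'] = pd.to_numeric(action['budget'], errors='coerce')

# mean() uses skipna=True by default, so missing budgets are simply ignored.
mean_budget = action['budget'].mean()

# Alternatively, drop the rows that have no budget at all.
with_budget = action.dropna(subset=['budget'])

print(mean_budget)
```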
