python dataframe - grouping rows [duplicate] - python-3.x

This question already has answers here:
Pandas: groupby column A and make lists of tuples from other columns?
(4 answers)
Closed 3 years ago.
I'm new to Python, so the question might not be very clear.
I have a dataset in a pandas DataFrame that goes something like this.
Id item
0 A0029V93 B0239WN
1 A0029V93 B0302SS
2 A02948s8 B0029ST
...
and the result I want is
Id item
0 A0029V93 (B0239WN,B0302SS)
1 A02948s8 (B0029ST, ...)
2 ... ...
...
No duplicate Ids, with all the items in the data paired with their Id.
It doesn't necessarily have to look exactly like this,
as long as I can get the Id, [item] data.

df.groupby('Id')['item'].apply(list)
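A minimal runnable sketch with the sample data above, using reset_index to turn the grouped result back into a two-column DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Id': ['A0029V93', 'A0029V93', 'A02948s8'],
    'item': ['B0239WN', 'B0302SS', 'B0029ST'],
})

# collect every item that shares an Id into one list, then turn the
# grouped Series back into a regular DataFrame
result = df.groupby('Id')['item'].apply(list).reset_index()
print(result)
#          Id                item
# 0  A0029V93  [B0239WN, B0302SS]
# 1  A02948s8           [B0029ST]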

Sum values in a list in python [duplicate]

This question already has answers here:
Python - sum values in dictionary
(5 answers)
Closed 22 days ago.
I'm having a problem processing lists. I have 2 lists:
shop1 = [{'status': '1', 'price': '1200'}, {'status': '1', 'price': '13000'}]
shop2 = [{'status': '2', 'price': '3000'}, {'status': '2', 'price': '4000'}]
How can I return the sum of all the prices in shop1 and shop2?
To handle a dynamic number of input shops you can use itertools with the zip_longest method, so you don't need to worry about the lists having different lengths.
import itertools
shop_items = itertools.zip_longest(shop1, shop2)
# prices are strings, so convert with int() before summing
total = sum(sum(int(item["price"]) for item in items if item) for items in shop_items)
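A minimal runnable check with the sample lists above:

import itertools

shop1 = [{'status': '1', 'price': '1200'}, {'status': '1', 'price': '13000'}]
shop2 = [{'status': '2', 'price': '3000'}, {'status': '2', 'price': '4000'}]

# zip_longest pads the shorter list with None, which the `if item` filter skips
shop_items = itertools.zip_longest(shop1, shop2)
total = sum(int(item['price']) for items in shop_items for item in items if item)
print(total)  # 21200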

How to select first row after each 3 rows pandas [duplicate]

This question already has answers here:
Pandas every nth row
(7 answers)
Closed 1 year ago.
I have one dataframe. I want to get the first row of every 3 rows in the dataframe and save the result as a new dataframe.
here is input data
df=pd.DataFrame({'x':[1,2,5,6,7,8,9,9,6]})
output:
df_out=pd.DataFrame({'x':[1,6,9]})
Use DataFrame.iloc with slicing:
print (df.iloc[::3])
x
0 1
3 6
6 9
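To save the result as a new dataframe with a fresh index, a minimal sketch:

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 5, 6, 7, 8, 9, 9, 6]})

# every 3rd row starting from the first, with the index renumbered
df_out = df.iloc[::3].reset_index(drop=True)
print(df_out)
#    x
# 0  1
# 1  6
# 2  9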

Un-merge a dataframe based on a column [duplicate]

This question already has answers here:
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Split (explode) pandas dataframe string entry to separate rows
(27 answers)
Closed 2 years ago.
I am reading a csv file into a pandas dataframe and the data inside dataframe is as below:
item seq_no db_xml
0 28799179 5 ['<my_xml>....</my_xml>']
1 28839888 1 ['<my_xml>....</my_xml>']
2 28840113 75 ['<my_xml>....</my_xml>']
3 28852466 20,22 ['<my_xml1>....</my_xml1>', '<my_xml2>....</my_xml2>']
I need to convert the above dataframe as below, i.e. each seq_no for the same item, together with its db_xml, should be in a different row. I need to un-merge the seq_no values of the same item into subsequent rows.
item seq_no db_xml
0 28799179 5 ['<my_xml>....</my_xml>']
1 28839888 1 ['<my_xml>....</my_xml>']
2 28840113 75 ['<my_xml>....</my_xml>']
3 28852466 20 ['<my_xml1>....</my_xml1>']
4 28852466 22 ['<my_xml2>....</my_xml2>']
Please let me know how to achieve this in pandas, so that seq_no is also split into separate rows.
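No answer is shown here, but a minimal sketch of one common approach under these assumptions: seq_no is a comma-separated string, db_xml already holds a list of matching length, and pandas is at least 1.3 (needed to explode two columns at once).

import pandas as pd

df = pd.DataFrame({
    'item': [28799179, 28839888, 28840113, 28852466],
    'seq_no': ['5', '1', '75', '20,22'],
    'db_xml': [['<my_xml>....</my_xml>'],
               ['<my_xml>....</my_xml>'],
               ['<my_xml>....</my_xml>'],
               ['<my_xml1>....</my_xml1>', '<my_xml2>....</my_xml2>']],
})

# split seq_no so each row holds lists of equal length in both columns
df['seq_no'] = df['seq_no'].str.split(',')

# explode both columns together so each seq_no/db_xml pair gets its own row
out = df.explode(['seq_no', 'db_xml']).reset_index(drop=True)
print(out)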

How to extract the entire column from a df based on a string of the column name? [duplicate]

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 3 years ago.
I have 2 dfs:
Sample of df1: s12
BacksGas_Flow_sccm ContextID StepID Time_Elapsed
46.6796875 7289972 12 25.443
46.6796875 7289972 12 26.443
Sample of df2: step12
ContextID BacksGas_Flow_sccm StepID Time_Elapsed
7289973 46.6796875 12 26.388
7289973 46.6796875 12 27.388
Since BacksGas_Flow_sccm is in a different position in each of the dfs, I would like to know how I can extract the column using df.columns.str.contains('Flow')
I tried doing:
s12.columns[s12.columns.str.contains('Flow')]
but it just gives the following output:
Index(['BacksGas_Flow_sccm'], dtype='object')
I would like the entire column to be extracted. How can this be done?
You are close. Use DataFrame.loc with : to get all rows and the columns filtered by the condition:
s12.loc[:, s12.columns.str.contains('Flow')]
Another idea is to select by column names:
cols = s12.columns[s12.columns.str.contains('Flow')]
s12[cols]
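A minimal sketch with the s12 sample above, showing that the selection returns the whole column (as a one-column DataFrame), not just its name:

import pandas as pd

s12 = pd.DataFrame({
    'BacksGas_Flow_sccm': [46.6796875, 46.6796875],
    'ContextID': [7289972, 7289972],
    'StepID': [12, 12],
    'Time_Elapsed': [25.443, 26.443],
})

# boolean mask over the column names, then select the matching columns
flow = s12.loc[:, s12.columns.str.contains('Flow')]
print(flow)  # one-column DataFrame holding BacksGas_Flow_sccm for every row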

NA values on Dataframe [duplicate]

This question already has answers here:
How to drop rows of Pandas DataFrame whose value in a certain column is NaN
(15 answers)
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
(17 answers)
Closed 4 years ago.
I'm working with Python and data frames.
I have a list of movies from which I create a subset, and I'm trying to plot it, or just figure out the mean for that matter, but there is a lot of missing information and the frame just has an "NA" instead of data.
I use the np library to ignore those values, but I still get an error saying the input type is not supported by isfinite.
data = pd.read_csv("C:\\Users\Bubbles\Documents\CS241\week 13\movies.csv")
Action = data[(data.Action == 1)]
Action = Action[np.isfinite(Action.budget)]
print(Action.budget.mean())
The budget column would just contain "NA" and integers as possible values.
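No answer is shown here, but a minimal sketch of one common fix, assuming the budget column is read as strings because of the "NA" entries (the file path below is a placeholder): coerce the column to numeric so those entries become NaN, which .mean() skips by default.

import pandas as pd

# placeholder path; point this at the real movies.csv
data = pd.read_csv("movies.csv")
Action = data[data.Action == 1].copy()

# "NA" strings make budget an object column, which np.isfinite rejects;
# coercing to numeric turns those entries into NaN instead
Action['budget'] = pd.to_numeric(Action['budget'], errors='coerce')

print(Action['budget'].mean())  # NaN values are ignored by default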
