NA values on Dataframe [duplicate] - python-3.x

This question already has answers here:
How to drop rows of Pandas DataFrame whose value in a certain column is NaN
(15 answers)
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
(17 answers)
Closed 4 years ago.
I'm working with Python and data frames. I have a list of movies from which I create a subset, and I'm trying to plot it, or just figure out the mean, but there is a lot of missing information and the frame just has "NA" instead of data.
I use NumPy to ignore those values, but I still get an error saying the input type is not supported by isfinite:
data = pd.read_csv(r"C:\Users\Bubbles\Documents\CS241\week 13\movies.csv")
Action = data[(data.Action == 1)]
Action = Action[np.isfinite(Action.budget)]
print(Action.budget.mean())
The budget column contains the string "NA" and integers as possible values.
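A likely fix, sketched below: np.isfinite rejects the object-dtype column that the "NA" strings produce, so coerce the column to numeric first (any leftover "NA" becomes NaN) and let Series.mean skip the NaNs, which it does by default. The file path and column names are taken from the question.
import numpy as np
import pandas as pd
data = pd.read_csv(r"C:\Users\Bubbles\Documents\CS241\week 13\movies.csv")
Action = data[data.Action == 1].copy()
# force budget to numeric: "NA" strings become NaN
Action["budget"] = pd.to_numeric(Action["budget"], errors="coerce")
# mean() skips NaN by default, so no isfinite filter is needed
print(Action.budget.mean())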

Related

How to select first row after each 3 rows pandas [duplicate]

This question already has answers here:
Pandas every nth row
(7 answers)
Closed 1 year ago.
I have a dataframe and want to take the first row of every 3 rows and save the result as a new dataframe.
Here is the input data:
df=pd.DataFrame({'x':[1,2,5,6,7,8,9,9,6]})
output:
df_out=pd.DataFrame({'x':[1,6,9]})
Use DataFrame.iloc with slicing:
print (df.iloc[::3])
x
0 1
3 6
6 9
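The ::3 slice keeps every third row starting at position 0; resetting the index afterwards reproduces df_out exactly. A minimal sketch:
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 5, 6, 7, 8, 9, 9, 6]})
# positions 0, 3 and 6 hold the values 1, 6 and 9
df_out = df.iloc[::3].reset_index(drop=True)
print(df_out)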

Using apply function to convert values in columns [duplicate]

This question already has answers here:
Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe
(6 answers)
Closed 2 years ago.
I am trying to convert the values of a column in a dataframe. The column is named size, has object dtype, and holds values such as 11.1K or 51.6M, i.e. ending in K or M. I want to write an apply function that converts 11.1K to 11.1 and 51.6M to 516000. Any help?
I am writing this in Python 3.
There are lots of ways to do this; a simple one is pd.eval combined with replace:
df = pd.DataFrame({'A' : ['56.1M', '11.1K']})
print(df)
A
0 56.1M
1 11.1K
df['B'] = df['A'].replace({'M' : '*10000', 'K' : '*1'},regex=True).map(pd.eval)
print(df)
A B
0 56.1M 561000.0
1 11.1K 11.1
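Since the question explicitly asks for an apply function, here is a minimal sketch with a hypothetical parse_size helper. Note the multipliers follow the asker's stated mapping (K keeps the number as-is, M multiplies by 10000), not the usual meanings (K = 1e3, M = 1e6):
def parse_size(s):
    # asker's mapping: 'K' -> x1, 'M' -> x10000
    if s.endswith('K'):
        return float(s[:-1])
    if s.endswith('M'):
        return float(s[:-1]) * 10000
    return float(s)

df['B'] = df['A'].apply(parse_size)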

dropna() not working for axis = 1 with the given threshold [duplicate]

This question already has answers here:
thresh in dropna for DataFrame in pandas in python
(3 answers)
Closed 2 years ago.
For the given dataset (posted as an image) I performed dropna on axis=1 with a threshold of 2:
df.dropna(thresh=2, axis=1)
The output (also an image) does not look correct: I expected it to drop the columns at index 1 and 2, since both columns have 2 or more NaN occurrences.
The code works perfectly fine with axis=0
thresh is the minimum number of non-NA values a row or column needs in order to be kept, not the maximum number of NaNs allowed, so thresh=2 on axis=1 keeps every column that has at least 2 non-NA values. Try df.dropna(thresh=6, axis=1) for the same dataframe.
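A small self-contained demonstration of the thresh semantics, using hypothetical data with 4 rows:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [1, np.nan, np.nan, 4],
                   'c': [np.nan, np.nan, np.nan, 4]})
# 'a' has 4 non-NA values, 'b' has 2, 'c' has 1
print(df.dropna(thresh=2, axis=1))  # keeps 'a' and 'b'
print(df.dropna(thresh=3, axis=1))  # keeps only 'a'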

python dataframe - grouping rows [duplicate]

This question already has answers here:
Pandas: groupby column A and make lists of tuples from other columns?
(4 answers)
Closed 3 years ago.
I'm new to Python, so the question might not be clear.
I have a pandas dataframe that looks like this:
Id item
0 A0029V93 B0239WN
1 A0029V93 B0302SS
2 A02948s8 B0029ST
...
and the result I want is
Id item
0 A0029V93 (B0239WN,B0302SS)
1 A02948s8 (B0029ST, ...)
2 ... ...
...
No duplicate Id, and all the items in the data paired with their Id. It doesn't necessarily have to look exactly like this, as long as I can get the Id, [items] pairs.
df.groupby('Id')['item'].apply(list)
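A minimal, self-contained run of that answer, with column names taken from the question; chaining reset_index turns the grouped result back into a regular two-column dataframe:
import pandas as pd
df = pd.DataFrame({'Id':   ['A0029V93', 'A0029V93', 'A02948s8'],
                   'item': ['B0239WN', 'B0302SS', 'B0029ST']})
out = df.groupby('Id')['item'].apply(list).reset_index()
print(out)
#          Id                item
# 0  A0029V93  [B0239WN, B0302SS]
# 1  A02948s8           [B0029ST]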

How to group by certain columns and invert the group by in Python [duplicate]

This question already has answers here:
Pandas long to wide reshape, by two variables
(6 answers)
Closed 5 years ago.
I have a DataFrame in which the same ID appears with multiple Custom Fields. I found this question but it's not quite what I am looking for. Code to create the desired starter data frame is below:
import numpy as np
import pandas as pd

df = pd.DataFrame()
df['ID'] = [np.random.randint(1, 2000) for x in range(0, 1000)]
# repeat the frame 10 times (DataFrame.append was removed in pandas 2.0)
new = pd.concat([df] * 10, ignore_index=True)
new = new.sort_values('ID').reset_index(drop=True)
new['Custom Field'] = [np.random.randint(1, 20) for x in new['ID']]
new['Value'] = [np.random.randint(0, 10000000) for x in new['ID']]
new = new.groupby(['ID', 'Custom Field']).first().reset_index()
new = new.sort_values(['ID', 'Custom Field']).reset_index(drop=True)
new.head()
Essentially, the picture in the original post shows what I am looking for: the values in the Custom Field column transposed into separate columns. Every ID can have up to 20 values in the Custom Field column, and I need each of those custom field values (1-20) in its own column; if a certain ID does not have a value, the cell should be blank. I am trying to be as specific as possible but it's hard to explain. Let me know if I need to edit the question to provide more detail.
Use pivot with add_prefix, i.e.
new.pivot(index='ID', columns='Custom Field', values='Value').add_prefix('CF')
Custom Field        CF1        CF2        CF3        CF7        CF8  \
ID
1             NaN  5643962.0  6959658.0  4310939.0  5796051.0
2       1121049.0  6044077.0        NaN        NaN        NaN

Custom Field        CF9       CF12       CF13       CF15       CF16       CF19
ID
1       1198701.0        NaN  2925189.0  8438978.0  1730570.0  3481493.0
2       4483108.0  3327149.0        NaN  2700632.0        NaN  3249005.0
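If you literally want blank cells rather than NaN for the missing custom fields, as the question mentions, one possible tweak is to chain fillna:
wide = new.pivot(index='ID', columns='Custom Field', values='Value').add_prefix('CF')
wide = wide.fillna('')  # blank cells instead of NaN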
