How to remove rows from a datascience table in Python

I have a table with 4 columns filled with integers. Some of the rows have the value "null"; since there are more than 1000 records with this "null" value, how can I delete these rows all at once? I tried the delete method, but it requires the index of the row, and there are over 1000 rows. Is there a faster way to do it?
Thanks

Use the dropna() function.

To remove a row with the datascience package:
name_of_your_table.remove(row_index)  # pass the number of the row in the brackets

Use the function:
name_of_your_table.dropna()
It will drop all rows containing "null" (NaN) values.

# df is the original dataframe.
# The '~' operator negates the null mask and re-assigns the remaining (non-null) rows to df.
df = df[~df['Column'].isnull()]

dataframe_name.isnull()        # check whether there are any missing values in your table
dataframe_name.isnull().sum()  # get the total number of missing values per column
dataframe_name.dropna()        # drop (delete) the rows with missing values
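Putting the three calls together, a minimal runnable sketch (the frame and column names here are made up for illustration; note that dropna() only recognises real NaN values, so if your table literally contains the string "null" you would first convert it, e.g. with df.replace("null", np.nan)):
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, 4], 'b': [np.nan, 6, 7, 8]})
print(df.isnull().sum())  # missing values per column: a -> 1, b -> 1
df = df.dropna()          # drop every row that contains a missing value
print(len(df))            # 2 rows remain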

Related

How do I drop complete rows (including all values in them) that contain a certain value in my Pandas dataframe?

I'm trying to write a Python script that finds unique values (names) and reports the frequency of their occurrence, making use of the Pandas library. There's a total of around 90 unique names, which I've anonymised in the head of the dataframe pasted below.
,1,2,3,4,5
0,monday09-01-2022,tuesday10-01-2022,wednesday11-01-2022,thursday12-01-2022,friday13-01-2022
1,Anonymous 1,Anonymous 1,Anonymous 1,Anonymous 1,
2,Anonymous 2,Anonymous 4,Anonymous 5,Anonymous 5,Anonymous 5
3,Anonymous 3,Anonymous 3,,Anonymous 6,Anonymous 3
4,,,,,
I'm trying to drop any row (the full row) that contains a match for the regular expression "^monday.*", meaning the word "monday" followed by any number of other characters. I want to drop/deselect every cell/value within that row.
To achieve this goal, I've tried using the line of code below (and many other approaches I found on SO).
df = df[df[1].str.contains("^monday.*", case=True, regex=True) == False]
To clarify, I'm searching the values of column "1" for matches of "^monday.*" and then deselecting the rows (and all values in those rows) that match. I've successfully removed "monday09-01-2022", "tuesday10-01-2022", etc., but I'm also losing random names that are not in the matching rows.
Any help would be very much appreciated! Thank you!
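One likely culprit, offered here as an assumption: for missing cells, str.contains returns NaN rather than False, and NaN == False evaluates to False, so rows with an empty column-1 cell get dropped along with the "monday" row. A minimal sketch of a fix, passing na=False so missing cells count as non-matches:
# Keep rows whose column-1 cell does NOT match "^monday";
# na=False makes str.contains return False (not NaN) for missing cells, so those rows are kept.
df = df[~df[1].str.contains("^monday", case=True, regex=True, na=False)]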

Is there any Python DataFrame function with which I can iterate over rows of certain columns?

I want to solve this kind of problem in Python:
tran_df['bad_debt'] = tran_df.frame_apply(lambda x: 1 if (x['second_mortgage'] != 0 and x['home_equity'] != 0) else x['debt'])
I want to be able to create a new column and iterate over the rows of specific columns by index.
In Excel it's really easy; I did:
if(AND(col_name1<>0,col_name2<>0),1,col_name5)
Any help will be very much appreciated.
To iterate over rows only for certain columns:
for rowIndex, row in df[['col1', 'col2']].iterrows():  # iterate over the rows of the selected columns
    print(rowIndex, row['col1'], row['col2'])
To create a new column:
df['new'] = 0 # Initialise as 0
As a rule, iterating over rows in pandas is slow and unnecessary. Use the np.where function from NumPy to select the right value for each row:
import numpy as np

tran_df['bad_debt'] = np.where(
    (tran_df['second_mortgage'] != 0) & (tran_df['home_equity'] != 0),
    1,
    tran_df['debt'])
First create a new column with an initial value, then use .loc to locate the rows that match a certain condition and assign the new value:
tran_df['bad_debt'] = tran_df['debt']
tran_df.loc[(tran_df['second_mortgage'] != 0) & (tran_df['home_equity'] != 0), 'bad_debt'] = 1
Or
tran_df['bad_debt'] = 1
tran_df.loc[(tran_df['second_mortgage'] == 0) | (tran_df['home_equity'] == 0), 'bad_debt'] = tran_df['debt']
Remember to put round brackets around each condition when combining them with the bitwise operators (& and |).

Remove rows from a data frame for which a column equals one of the values in a vector

I have a data frame with two columns, x and y.
Now I want to remove all rows where column x equals either 1 or 3.
How can I do that?
Setting rm <- c(1, 3)
and then df <- df[!df$x == rm, ] does not work:
df <- data.frame(x = c(1, 2, 3, 4, 4, 4, 4, 2, 2, 3, 3), y = 1:11)
rm <- c(1, 3)
df <- df[!df$x == rm, ]
Found an answer, so just in case anybody checks this question later on:
df <- df[!df$x %in% rm, ]
(== recycles rm along the column, comparing alternate rows against 1 and 3, whereas %in% tests membership in the whole vector.)
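For anyone reading this from pandas rather than R, the analogous membership filter is isin (a sketch added for comparison, not part of the original question):
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 4, 4, 4, 2, 2, 3, 3], 'y': range(1, 12)})
df = df[~df['x'].isin([1, 3])]  # keep rows whose x is neither 1 nor 3
print(df['x'].tolist())         # [2, 4, 4, 4, 4, 2, 2]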

Update specific number of rows based on condition

If I want to update only a specific number of records based on a filter in a Pandas data frame, what should I do?
In this case I am filtering for all 'Tickets' values equal to 10, and I want to increment the first 5 of them by one. Here's my attempt:
df.loc[df['Tickets'] == 10, 'Tickets'].iloc[:5] += 1
If I remove .iloc[:5] this call works fine, but not like this.
Thanks!
Chaining .loc and .iloc returns a copy, so the assignment can fail silently; you can use df.update instead:
df.update(df.loc[df['Tickets'] == 10, ['Tickets']].iloc[:5] + 1)
Here I think you are updating a copy of the dataframe. You may instead select the row positions with NumPy and write through .loc:
import numpy as np

df.loc[np.where(df['Tickets'] == 10)[0][:5], 'Tickets'] += 1
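A small demo of the positional approach on toy data (my own example; it assumes a default RangeIndex, since np.where returns positions while .loc looks up labels, and the two only coincide when the index is 0, 1, 2, ...):
import numpy as np
import pandas as pd

df = pd.DataFrame({'Tickets': [10, 10, 5, 10, 10, 10, 10]})
df.loc[np.where(df['Tickets'] == 10)[0][:5], 'Tickets'] += 1
print(df['Tickets'].tolist())  # [11, 11, 5, 11, 11, 11, 10]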

Choose exact value in DataFrame

I'm looking through the UCI Adult dataframe (https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data). I want to output and count all the rows where the native country is 'Germany'. The following code:
df[df['native-country']=="Germany"]
tells me that all the rows are False. Is there any other way to count the number of rows and/or print them out? Dummy variables might not be an option, since there are more than 20 different countries in the dataframe.
I think you have a leading blank in the country field.
Try
df[df['native-country']==" Germany"]
Or
df[df['native-country'].str.contains("Germany")]
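If a stray leading space is indeed the problem (you can check with df['native-country'].unique()), another option is to normalise the column once and then compare normally, for example:
df['native-country'] = df['native-country'].str.strip()  # remove surrounding whitespace once
print(len(df[df['native-country'] == "Germany"]))        # count of rows with native country Germany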
Your command df[df['native-country']=="Germany"] should already print only the rows that match the condition. If you're seeing rows of False values, you might actually be executing df['native-country']=="Germany", which returns a boolean mask of True and False values.
To count the occurrences of each unique value in the native-country column, try:
df['native-country'].value_counts()
