Update specific number of rows based on condition - python-3.x

If I want to update only a specific number of records based on a filter in a Pandas data frame, what should I do?
In this case I am filtering for all rows where the 'Tickets' series equals 10, and I want to increment the first 5 of them by one. Here's my attempt:
df.loc[df['Tickets'] == 10, 'Tickets'].iloc[:5] += 1
If I remove .iloc[:5], this call works fine, but not like this.
Thanks!

Chaining .loc and .iloc like this assigns to a temporary copy, so the original DataFrame is never modified (pandas flags this pattern with SettingWithCopyWarning). You can use df.update instead:
df.update(df.loc[df['Tickets'] == 10, ['Tickets']].iloc[:5]+1)
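For example, on a hypothetical DataFrame (the values below are made up for illustration), update() writes the incremented sub-frame back by index alignment, so only the five selected rows change:
import pandas as pd

# Hypothetical data: six of the eight rows have Tickets == 10
df = pd.DataFrame({'Tickets': [10, 5, 10, 10, 7, 10, 10, 10]})

df.update(df.loc[df['Tickets'] == 10, ['Tickets']].iloc[:5] + 1)
print(df)  # the first five 10s are now 11; the last 10 is untouched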

Here I think you are updating a copy of the dataframe. You may instead grab the positions of the first five matches with np.where and assign through a single .loc call:
df.loc[np.where(df['Tickets'] == 10)[0][:5], 'Tickets'] += 1
Note that np.where returns positional indices, so this assumes a default RangeIndex; with a custom index, use df.index[np.where(df['Tickets'] == 10)[0][:5]] instead.
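Again on made-up data, a minimal runnable sketch of this version:
import numpy as np
import pandas as pd

# Hypothetical data: six of the eight rows have Tickets == 10
df = pd.DataFrame({'Tickets': [10, 5, 10, 10, 7, 10, 10, 10]})

# Positions of the first five matches, then a single .loc assignment
df.loc[np.where(df['Tickets'] == 10)[0][:5], 'Tickets'] += 1
print(df['Tickets'].tolist())  # [11, 5, 11, 11, 7, 11, 11, 10]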

Related

Pandas: get first datetime-in and last datetime-out in one row

First of all, thanks in advance; there are always answers here, so we learn a lot from the experts. I'm a noob using "pandas" (it's super handy for what I've tried and achieved so far).
I have this data, handed to me like this (I don't have access to the origin), 20k rows or more sometimes. The 'in' and 'out' columns may have one or more entries per date, so after an 'in' the next entry could be an 'out' or another 'in', which leaves me a blank cell; that's the problem (see first image).
I want to filter the first datetime-in into one column and the last datetime-out into another, both in one row (see second image); the data comes in a csv file. I am currently doing this work manually with LibreOffice Calc (yep).
So far I have tried locating and relocating, merging, grouping... nothing works for me, so I feel frustrated. Would you please lend me a hand? Here is a minimal sample of the file.
By the way, English is not my language. Thanks so much!
First:
out_column = df["out"].tolist()
This gives you all the out dates as a list; we will need that later.
in_column = df["in"].tolist() # "in" is a Python keyword, so I suggest renaming that column
I treat NaT like NaN (null) in this case.
Now we have to find which rows to keep, which we do by going through the in column and keeping only the first row and the rows that follow a NaN:
filtered_df = []
tracker = False
for index, element in enumerate(in_column):
    if index == 0 or tracker is True:
        # keep the first row and any row that follows a missing 'in'
        filtered_df.append(True)
        tracker = False
        continue
    if pd.isna(element):  # NaT counts as missing here
        tracker = True
    filtered_df.append(False)
Then you filter your df by this Boolean List:
df = df[filtered_df]
Now you fix up your out column by removing the null values:
out_column = [d for d in out_column if pd.notna(d)]  # drop the NaT/NaN entries
Last but not least, you overwrite your old out column with the new one (this assumes the number of kept rows equals the number of non-null out values):
df["out"] = out_column

How can I replace a particular column in a data frame based on a condition (categorical variables)?

I need to replace the salary status with 1 or 0, respectively, depending on whether the salary is greater than 50,000 or less than or equal to 50,000 in a df.
The DataFrame shape: 30162 x 13
I have tried this:
data2['SalStat']=data2['SalStat'].map({"less than or equal to 50,000":0,"greater than 50,000":1})
I also tried data2['SalStat'] and loc without any success.
How can I do the same?
I think your solution is nice.
If you want to match only by substring, e.g. by 'greater', use Series.str.contains to build a boolean mask and convert it to 0/1:
data2['SalStat']=data2['SalStat'].str.contains('greater').astype(int)
Or (note that Series.view is deprecated in recent pandas versions, so astype is the safer choice):
data2['SalStat']=data2['SalStat'].str.contains('greater').view('i1')
Try this:
def status(d):
    return 0 if d == 'less than or equal to 50,000' else 1

data2['SalStat'] = list(map(status, data2['SalStat']))
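As a quick illustration, here is a self-contained sketch with made-up values (the two SalStat strings are taken from the question). The map approach works only when the cell values match the dictionary keys exactly, which is why the substring match is a useful fallback:
import pandas as pd

# Hypothetical sample column with the two category strings from the question
data2 = pd.DataFrame({'SalStat': ['greater than 50,000',
                                  'less than or equal to 50,000',
                                  'greater than 50,000']})

# Exact-match mapping (requires the strings to match the keys exactly)
mapped = data2['SalStat'].map({'less than or equal to 50,000': 0,
                               'greater than 50,000': 1})
# Substring matching (more forgiving of small variations)
contained = data2['SalStat'].str.contains('greater').astype(int)

print(mapped.tolist())     # [1, 0, 1]
print(contained.tolist())  # [1, 0, 1]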

Iterate in column for specific value and insert 1 if found or 0 if not found in new column python

I have a DataFrame as shown in the attached image. My columns of interest are fgr and fgr1. As you can see, they both contain values corresponding to years.
I want to iterate over the two columns, and for any value present I want a 1 in the matching year column, otherwise 0.
For example, in fgr the first value is 2028, so in the first row the column 2028 will have the value 1 and all other year columns will have the value 0.
I tried using lookup but did not succeed, so any pointers would be really helpful.
Example dataframe
Data:
Data file in Excel
This will do the job. You could use for loops as well, but I think this approach will be faster.
df["Matched"] = df["fgr"].isin(df["fgr1"])*1
Basically you check whether the values of one column are present in another column, which gives you True or False. You then multiply by 1 to get 1 and 0 instead of True and False.
From this answer
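A tiny hypothetical example of that pattern:
import pandas as pd

# Made-up years: flag rows of fgr whose value also appears somewhere in fgr1
df = pd.DataFrame({'fgr':  [2028, 2030, 2031],
                   'fgr1': [2030, 2029, 2028]})

df['Matched'] = df['fgr'].isin(df['fgr1']) * 1
print(df['Matched'].tolist())  # [1, 1, 0]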
Not the most efficient, but it should work for your case (time-consuming on a large dataset):
# Reshape so every year column becomes a row; 'variable' holds the year label
s = df.reset_index().melt(['index','fgr','fgr1'])
# Compare each year label with the first four characters of fgr and fgr1
s['value'] = s.variable.eq(s.fgr.str[:4]).astype(int)
s['value2'] = s.variable.eq(s.fgr1.str[:4]).astype(int)
# 1 if the year matches either column, else 0
s['final'] = np.where(s['value']+s['value2'] > 0,1,0)
# Pivot back to the original wide layout, one 0/1 column per year
yourdf = s.pivot_table(index=['index','fgr','fgr1'],columns = 'variable',values='final',aggfunc='first').reset_index(level=[1,2])
yourdf
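For concreteness, here is a hypothetical input this snippet expects (fgr and fgr1 as year strings, plus one pre-existing column per year) and the output it produces:
import numpy as np
import pandas as pd

# Hypothetical wide frame with one column per candidate year
df = pd.DataFrame({'fgr': ['2028', '2030'], 'fgr1': ['2029', '2028'],
                   '2028': 0, '2029': 0, '2030': 0})

# Running the snippet above on this frame yields:
# variable   fgr  fgr1  2028  2029  2030
# index
# 0         2028  2029     1     1     0
# 1         2030  2028     1     0     1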

Remove rows from data frame for which column equals one of following vectors

I have a data frame with 2 columns x&y.
Now I want to remove all rows where column x is equal to either 1 or 3.
How can I do that?
Setting rm <- c(1, 3)
and then df <- df[!df$x == rm, ] does not work:
df <- data.frame(x = c(1, 2, 3, 4, 4, 4, 4, 2, 2, 3, 3), y = 1:11)
rm <- c(1, 3)
df <- df[!df$x == rm, ]
Found an answer, so just in case anybody checks this question later on: == recycles rm along the column and compares element by element, whereas %in% tests set membership, which is what is wanted here (as a side note, rm is also the name of R's built-in function for deleting objects, so a different variable name is safer):
df <- df[!df$x %in% rm, ]
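Since the rest of this page is pandas, a rough Python equivalent of the %in% fix (column and variable names carried over from the R example) would be:
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 4, 4, 4, 2, 2, 3, 3],
                   'y': range(1, 12)})

# isin() is the membership test that %in% performs in R
df = df[~df['x'].isin([1, 3])]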

How to remove rows from a datascience table in python

I have a table with 4 columns filled with integers. Some of the rows have the value "null"; as there are more than 1000 records with this "null" value, how can I delete these rows all at once? I tried the delete method, but it requires the index of each row and there are over 1000 of them. Is there a faster way to do it?
Thanks
Use the dropna() function.
To remove a row with the datascience package's Table:
name_of_your_table.remove(row_index)  # pass the number(s) of the row(s) to remove
If your table is a pandas DataFrame, use the function:
name_of_your_table.dropna()
It will drop all the rows containing "null" (NaN) values.
# df is the original dataframe
# The ~ operator inverts the null mask, re-assigning only the non-null rows to df
df = df[~df['Column'].isnull()]
use dataframe_name.isnull() # to check whether there are any missing values in your table
use dataframe_name.isnull().sum() # to get the total number of missing values per column
use dataframe_name.dropna() # to drop the rows with missing values
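A minimal pandas sketch tying these together (the data is made up; also note that if your "null" values are literal strings rather than real NaN, you may need df.replace('null', np.nan) first):
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column
df = pd.DataFrame({'a': [1, 2, np.nan, 4], 'b': [5, np.nan, 7, 8]})

print(df.isnull().sum())  # number of missing values per column: a 1, b 1
df = df.dropna()          # keep only the fully populated rows
print(len(df))            # 2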
