pandas iterate rows and then break until condition - excel

I have a column that's unorganized, like this:
Name
Jack
James
Riddick
Random value
Another random value
What I'm trying to do is get only the names from this column, but I'm struggling to find a way to differentiate real names from random values. Fortunately the names are all together, and the random values are all together as well. The only approach I can think of is to iterate over the rows until I reach 'Random value' and then break off.
I've tried using lambdas for this, but with no success, as I don't think there's a way to break out of one. And I'm not sure how a comprehension could work in this case.
Here's the example I've been trying to play with:
df['Name'] = df['Name'].map(lambda x: True if x != 'Random value' else break)
But the above doesn't work. Any suggestions on what could work based on what I'm trying to achieve? Thanks.

Find index of row containing 'Random value':
index_split = df[df.Name == 'Random value'].index.values[0]
Save your random values for later use if you want:
random_values = df.iloc[index_split + 1:].values
Remove random values from the Names column:
df = df[0:index_split]
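Putting the three steps together on a toy frame (a small sketch; it assumes, as in the question, that a single 'Random value' row marks where the names end):
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'James', 'Riddick',
                            'Random value', 'Another random value']})

index_split = df[df.Name == 'Random value'].index.values[0]
random_values = df.iloc[index_split + 1:]['Name'].tolist()  # ['Another random value']
df = df[0:index_split]                                      # only the names remain
print(df['Name'].tolist())  # ['Jack', 'James', 'Riddick']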

Related

How do I drop complete rows (including all values in it) that contain a certain value in my Pandas dataframe?

I'm trying to write a python script that finds unique values (names) and reports the frequency of their occurrence, making use of Pandas library. There's a total of around 90 unique names, which I've anonymised in the head of the dataframe pasted below.
,1,2,3,4,5
0,monday09-01-2022,tuesday10-01-2022,wednesday11-01-2022,thursday12-01-2022,friday13-01-2022
1,Anonymous 1,Anonymous 1,Anonymous 1,Anonymous 1,
2,Anonymous 2,Anonymous 4,Anonymous 5,Anonymous 5,Anonymous 5
3,Anonymous 3,Anonymous 3,,Anonymous 6,Anonymous 3
4,,,,,
I'm trying to drop any row (the full row) that matches the regex "^monday.*", meaning the word "monday" followed by any number of other characters. I want to drop/deselect every cell/value within that row.
To achieve this goal, I've tried using the line of code below (and many other approaches I found on SO).
df = df[df[1].str.contains("^monday.*", case = True, regex=True) == False]
To clarify, I'm trying to search the values of column "1" for matches of "^monday.*" and then deselect the matching rows and all values in those rows. I've successfully removed "monday09-01-2022" and "tuesday10-01-2022" etc., but I'm also losing random names that are not in the matching rows.
Any help would be very much appreciated! Thank you!
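One thing worth checking (a hedged guess, not a confirmed diagnosis): str.contains returns NaN for empty cells, and NaN == False evaluates to False, so rows with a missing value in column "1" get dropped along with the "monday" rows. Passing na=False and negating the mask with ~ would keep them, assuming df and the column labels from your example:
mask = df[1].str.contains("^monday", case=True, regex=True, na=False)
df = df[~mask]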

Is there any python-Dataframe function in which I can iterate over rows of certain columns?

I want to solve this kind of problem in Python:
tran_df['bad_debt']=train_df.frame_apply(lambda x: 1 if (x['second_mortgage']!=0 and x['home_equity']!=0) else x['debt'])
I want to be able to create a new column by iterating over the rows, using only specific columns.
In Excel it's really easy; I did:
if(AND(col_name1<>0,col_name2<>0),1,col_name5)
Any help would be much appreciated.
To iterate over rows only for certain columns:
for rowIndex, row in df[['col1','col2']].iterrows(): #iterate over rows
To create a new column:
df['new'] = 0 # Initialise as 0
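Putting those two pieces together for the question's columns (a rough row-by-row sketch that mirrors the Excel formula; the column names are taken from the question, and the vectorized answers below are preferable for speed):
tran_df['bad_debt'] = 0  # initialise
for rowIndex, row in tran_df[['second_mortgage', 'home_equity', 'debt']].iterrows():
    if row['second_mortgage'] != 0 and row['home_equity'] != 0:
        tran_df.loc[rowIndex, 'bad_debt'] = 1
    else:
        tran_df.loc[rowIndex, 'bad_debt'] = row['debt']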
As a rule, iterating over rows in pandas is the wrong approach. Use NumPy's np.where function to pick the right value for each row:
import numpy as np

tran_df['bad_debt'] = np.where(
    (tran_df['second_mortgage'] != 0) & (tran_df['home_equity'] != 0),
    1, tran_df['debt'])
First create a new column with an initial value, then use .loc to locate the rows that match a certain condition and assign the new value:
tran_df['bad_debt']=tran_df['debt']
tran_df.loc[(tran_df['second_mortgage']!=0)&(tran_df['home_equity']!=0),'bad_debt']=1
Or
tran_df['bad_debt']=1
tran_df.loc[(tran_df['second_mortgage']==0)|(tran_df['home_equity']==0),'bad_debt']=tran_df['debt']
Remember to wrap each condition in round brackets when combining them with the bitwise operators (& and |).
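A quick check on a made-up frame (the numbers are invented purely to show the result; both approaches above give the same answer):
import numpy as np
import pandas as pd

tran_df = pd.DataFrame({'second_mortgage': [0, 10000, 5000],
                        'home_equity':     [0, 20000, 0],
                        'debt':            [100, 200, 300]})

tran_df['bad_debt'] = np.where(
    (tran_df['second_mortgage'] != 0) & (tran_df['home_equity'] != 0),
    1, tran_df['debt'])
print(tran_df['bad_debt'].tolist())  # [100, 1, 300]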

Pandas: get first datetime-in and last datetime-out in one row

First of all, thanks in advance; there are always answers here, so we learn a lot from the experts. I'm a noob with pandas (it's super handy for what I've tried and achieved so far).
I have this data, handed to me as is (I don't have access to the origin), sometimes 20k rows or more. The 'in' and 'out' columns may hold one or more entries per date, so after an 'in' the next entry could be an 'out' or another 'in', which leaves me a blank cell; that's the problem (see first image).
I want to keep the first datetime-in in one column and the last datetime-out in another, with both in the same row (see second image); the data comes in a CSV file. At the moment I do this particular work manually with LibreOffice Calc (yep).
So far I have tried locating and relocating, merging, grouping... nothing works for me, so I feel frustrated. Would you please lend me a hand? Here is a minimal sample of the file.
By the way, English is not my first language. Thanks so much!
First:
out_column = df["out"].tolist()
This gives you all the 'out' dates as a list; we will need them later.
in_column = df["in"].tolist()  # "in" is a Python keyword, so I suggest renaming that column
I treat NaT as NaN (null) in this case.
Now we have to find which rows to keep, which we do by going through the 'in' column and keeping only the first row and the rows that come right after a NaN:
filtered_df = []
tracker = False
for index, element in enumerate(in_column):
    if index == 0 or tracker is True:
        filtered_df.append(True)
        tracker = False
        continue
    if pd.isna(element):  # NaT/NaN marks the gap before the next 'in'
        tracker = True
    filtered_df.append(False)
Then you filter your df by this Boolean List:
df = df[filtered_df]
Now you fix up your out column by removing the null values (NaT):
out_column = [value for value in out_column if pd.notna(value)]
Last but not least you overwrite your old out column with the new one:
df["out"] = out_column

How to find String from multiple columns

I am trying to find a string from the first column in the other columns of the dataset. The dataset contains a name in each column. Below is my dataset.
I am trying the following code but am unable to find anything.
P_Name = new_data['P_Name_1']
for i in P_Name:
    new_data['new1'] = (new_data.iloc[:,1:].values == i).any(0)
new_data
If any similar name is found (even just a first name or last name), it should appear in the new column.
First of all, be aware that every iteration of your loop will overwrite the column new_data['new1'].
The .any() function returns True if any element along the given axis is True (or 1), so what your code does is assign Boolean values to the column new_data['new1'].
I believe it would be easier for people to help if you can specify your problem more explicitly, for example, what's your desired output and what should the loop do?
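In the meantime, here is a rough sketch of one possible reading: flag a row when the name in P_Name_1 (or any part of it, e.g. first or last name) appears in one of the other columns of that row. The matching rule is an assumption on my part:
def name_found(row):
    name = str(row['P_Name_1'])
    parts = [name] + name.split()
    others = row.drop('P_Name_1').dropna().astype(str)
    # True if the full name or any of its parts appears in another cell of this row
    return any(part in cell for cell in others for part in parts)

new_data['new1'] = new_data.apply(name_found, axis=1)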

About lists in python

I have an Excel file with a column whose values are spread over multiple rows in this format: 25/02/2016. I want to save all these rows of dates in a list, with each row as a separate value. How do I do this? So far this is my code:
import openpyxl
wb = openpyxl.load_workbook ('LOTERIAREAL.xlsx')
sheet = wb.get_active_sheet()
rowsnum = sheet.get_highest_row()
wholeNum = []
for n in range(1, rowsnum):
    wholeNum = sheet.cell(row=n, column=1).value
print (wholeNum[0])
When I use the print statement, instead of printing the value of the first row, which should be the first item in the list (e.g. 25/02/2016), it prints the first character of the row, which is the number 2. Apparently it is slicing through the date. I want the first row and the subsequent rows saved as separate items in the list. What am I doing wrong? Thanks in advance.
wholeNum = sheet.cell(row=n, column=1).value assigns the value of the cell to the variable wholeNum, so you're never adding anything to the initial empty list and just overwriting the value each time. When you call wholeNum[0] at the end, wholeNum is the last string that was read, and you're getting its first character.
You probably want wholeNum.append(sheet.cell(row=n, column=1).value) to accumulate a list.
wholeNum = ...
This is an assignment. It makes the name wholeNum refer to whatever object the expression to the right of the = operator evaluates to.
for ...:
    wholeNum = ...
Performing assignment in a loop is frequently not useful. The name wholeNum will refer to whatever value was assigned to it in the last iteration of the loop. The other iterations have no discernible effect.
To append values to a list, use the .append() method.
for ...:
    wholeNum.append( ... )
print( wholeNum )
print( wholeNum[0] )
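For reference, the same idea with append and the current openpyxl API (a small sketch; wb.active and iter_rows replace the deprecated get_active_sheet / get_highest_row calls in newer openpyxl versions):
import openpyxl

wb = openpyxl.load_workbook('LOTERIAREAL.xlsx')
sheet = wb.active

wholeNum = []
for (value,) in sheet.iter_rows(min_col=1, max_col=1, values_only=True):
    wholeNum.append(value)  # one list item per row in column 1

print(wholeNum[0])  # e.g. 25/02/2016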
