Remove rows from data frame for which column equals one of following vectors - subset

I have a data frame with two columns, x and y.
I want to remove all rows where column x equals either 1 or 3.
How can I do that?
Setting rm <- c(1,3) and then subsetting with df <- df[!df$x==rm,] does not work:
df<-data.frame(x=c(1,2,3,4,4,4,4,2,2,3,3),y=c(1:11))
rm<-c(1,3)
df<-df[!df$x==rm,]

Found an answer, so in case anybody checks this question later on: == compares element by element and recycles rm along x, so it does not test membership. %in% does:
df<-df[ ! df$x %in% rm, ]

Related

How to create a list and filter out row from another dataframe?

I know this question has been asked before, but none of the solutions appear to work for me and they all give the same result. I am looking for insight into what I am doing wrong.
T18_x2 and Tryp18_50 are large dataframes with different data (except for 2 columns). Specifically, each dataframe contains a column named 'Gene' that holds strings of the same style (e.g. HSP90A_HUMAN). I would like to make a list from the 'Gene' column in T18_x2 and use it to keep the rows in Tryp18_50 that have the same string in their 'Gene' column.
My issue is that the output is simply an empty dataframe. I think the problem is the list (y2), because each entry in it is a duplicated copy of the string in the column. I am not sure why this is happening either.
[Image: List]
Any help would be greatly appreciated.
input:
y2 =T18_x2['Gene'].astype(str).values.tolist()
T18 = Tryp18_50[Tryp18_50['Gene'].isin(y2)]
T18
output:
[Image: Output]
I have also tried:
T18=Tryp18_50[pd.notna(Tryp18_50['Gene']) & Tryp18_50['Gene'].astype(str).str.contains('|'.join(y2))]
with the output:
[Image: 2nd Output]
My mistake, I had two "Gene" columns in the first dataframe.
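For anyone hitting the same symptom: when a column label is duplicated, selecting it by name returns a DataFrame rather than a Series, so the list built from it holds one sub-list per row instead of plain strings. The following is only a minimal sketch of that effect and of one possible fix. The tiny frames, the "score" and "intensity" columns, and the choice to keep the first "Gene" column are assumptions made for illustration, not the asker's real data.

```python
import pandas as pd

# Hypothetical miniature of T18_x2 with an accidental second "Gene" column
T18_x2 = pd.DataFrame(
    [["HSP90A_HUMAN", 1.0, "HSP90A_HUMAN"], ["ALBU_HUMAN", 2.0, "ALBU_HUMAN"]],
    columns=["Gene", "score", "Gene"],
)
# Hypothetical miniature of Tryp18_50
Tryp18_50 = pd.DataFrame(
    {"Gene": ["HSP90A_HUMAN", "TRYP_PIG", "ALBU_HUMAN"], "intensity": [10, 20, 30]}
)

# With a duplicated label, ["Gene"] selects both columns, so tolist() is nested
nested = T18_x2["Gene"].astype(str).values.tolist()

# Keep only the first occurrence of each column label, then build a flat list
deduped = T18_x2.loc[:, ~T18_x2.columns.duplicated()]
y2 = deduped["Gene"].astype(str).tolist()

# Filter the second frame with the flat list of strings
T18 = Tryp18_50[Tryp18_50["Gene"].isin(y2)]
print(T18)
```

Checking T18_x2.columns for repeated names before building the list is a quick way to spot this.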

Pandas: get first datetime-in and last datetime-out in one row

First of all, thanks in advance; there are always answers here, so we learn a lot from the experts. I'm a newcomer to pandas (it has been super handy for what I have tried and achieved so far).
I have this data, handed to me like this (I don't have access to the source), sometimes 20k rows or more. The 'in' and 'out' columns may have one or more entries per date, so after an 'in' the next entry could be either an 'out' or another 'in', which leaves blank cells. That is the problem (see first image).
I want to keep the first datetime-in in one column and the last datetime-out in another, both in the same row (see second image). The data comes in a CSV file. I am currently doing this work manually in LibreOffice Calc (yep).
So far I have tried locating and relocating, merging, grouping... nothing works for me, so I feel frustrated. Would you please lend me a hand? Here is a minimal sample of the file.
By the way, English is not my first language. Thanks so much!
First:
out_column = df["out"].tolist()
This gives you all the out dates as a list; we will need that later.
in_column = df["in"].tolist() # "in" is a Python keyword, so the variable needs a different name
I treat NaT as NaN (null) in this case, so the checks below use pd.isna (with pandas imported as pd).
Now we have to find which rows to keep, which we do by going through the in column and keeping only the first row and the rows that come right after a NaN:
filtered_df = []
tracker = False
for index, element in enumerate(in_column):
    if index == 0 or tracker:
        filtered_df.append(True)
        tracker = False
        continue
    if pd.isna(element):
        tracker = True
    filtered_df.append(False)
Then you filter your df by this Boolean list:
df = df[filtered_df]
Now you fix up your out column by removing the null values:
out_column = [value for value in out_column if not pd.isna(value)]
Last but not least, you overwrite your old out column with the new one (this only works if the number of remaining out values equals the number of rows kept):
df["out"] = out_column
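If the goal is one row per calendar day, a groupby may be a shorter route than the loop. This is only a sketch under assumptions: the sample rows are invented, every row is assumed to carry exactly one of the two timestamps, and the names day, first_in and last_out are hypothetical. The real file may need a different grouping key.

```python
import pandas as pd

# Invented sample: each row has either an "in" or an "out" timestamp
df = pd.DataFrame({
    "in": pd.to_datetime(["2021-01-04 08:00", None, "2021-01-04 13:00",
                          None, "2021-01-05 09:00", None]),
    "out": pd.to_datetime([None, "2021-01-04 12:00", None,
                           "2021-01-04 17:00", None, "2021-01-05 18:00"]),
})

# Use whichever timestamp is present to work out the calendar day of the row
day = df["in"].fillna(df["out"]).dt.date.rename("date")

# Earliest "in" and latest "out" per day; missing values (NaT) are skipped
result = df.groupby(day).agg(first_in=("in", "min"), last_out=("out", "max"))
print(result)
```

Calling result.reset_index() afterwards turns the date back into an ordinary column, which is convenient before writing the result to CSV.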

Iterate in column for specific value and insert 1 if found or 0 if not found in new column python

I have a DataFrame as shown in the attached image. My columns of interest are fgr and fgr1. As you can see, they both contain values corresponding to years.
I want to iterate over the two columns and, for each year column, put 1 if the value is present in that row and 0 otherwise.
For example, in fgr the first value is 2028. So, the first row in column 2028 will have a value 1 and all other columns have value 0.
I tried using lookup but I did not succeed. So, any pointers will be really helpful.
Example dataframe
Data:
Data file in Excel
This will do the job. You can use for loops as well, but I think this approach will be faster.
df["Matched"] = df["fgr"].isin(df["fgr1"])*1
Basically you check whether values from one column appear in the other column, which gives you True or False. You then multiply by 1 to get 1 and 0 instead of True and False.
From this answer
Not the most efficient, but it should work for your case (it can be slow on a large dataset):
import numpy as np
s = df.reset_index().melt(['index','fgr','fgr1'])
s['value'] = s.variable.eq(s.fgr.str[:4]).astype(int)
s['value2'] = s.variable.eq(s.fgr1.str[:4]).astype(int)
s['final'] = np.where(s['value']+s['value2'] > 0,1,0)
yourdf = s.pivot_table(index=['index','fgr','fgr1'],columns = 'variable',values='final',aggfunc='first').reset_index(level=[1,2])
yourdf
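For comparison, here is a more direct sketch of the per-year indicator columns described in the question. It assumes fgr and fgr1 hold plain integer years and builds the year columns from the values found in the data. The sample values are invented, and if the real columns hold strings they would need converting first.

```python
import pandas as pd

# Invented sample with integer years in both columns
df = pd.DataFrame({"fgr": [2028, 2030, 2029], "fgr1": [2029, 2030, 2031]})

# Every year that appears in either column gets its own indicator column
years = sorted(set(df["fgr"]) | set(df["fgr1"]))
for year in years:
    # 1 if the row has this year in fgr or in fgr1, otherwise 0
    df[year] = (df["fgr"].eq(year) | df["fgr1"].eq(year)).astype(int)

print(df)
```

The comparison is done row by row, so a year only counts when it appears in that same row.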

Excel sum based on matrix condition and multiple criteria

Following from the example here I'm trying to add additional conditions to a sum formula. I've represented an example below:
The output I'm looking for, for example for Jan 2017, is:
2017
1
UP A 1
UP B 6
UP C 6
DOWN A 1
DOWN B 8
DOWN C 7
I tried with the following formula:
=MMULT(--($B$17:$C$17="X"),MATCH(1,($A23=$C$2:$C$14)*(C$21=$A$2:$A$14)*(C$22=$B$2:$B$14)*($E$2:$E$14=$D$2:$D$14),0))
but I get an N/A value.
Does anyone know if it is possible to do this?
In your first example the number of columns in array1 and the number of rows in array2 were equal (five). Here you have two columns and 13 rows. MMULT needs those two counts to match, so the mismatch is part (all) of the reason why you are having an issue.
Also, your MATCH function is returning a Boolean, not an array.
I have a way to do this using a matrix condition and multiple criteria, but I had to change the problem up a bit; see the photo for an example:
{=MMULT(--(D18:P18="x"),E$2:E$14*(--(A$2:A$14=$C$21)*--(B$2:B$14=$C$22)*--(C$2:C$14=A24)))}
https://i.stack.imgur.com/FEvgR.png
You can create a formula to fill the second matrix with X's see below
=IF(OR(INDIRECT("D"&VALUE(D20))=$A$18,INDIRECT("D"&VALUE(D20))=$B$18),"X","")
https://i.stack.imgur.com/4rS4L.png
That said, I don't think this is particularly efficient: you are treating one of the matrices as all 1s, so you are basically just adding an extra criterion (a Boolean) with added complexity. But you asked for this specifically, and I believe I have delivered that, LOL.
Just add two SUMIFS together.
=SUMIFS($E$2:$E$14, $A$2:$A$14, C$21, $B$2:$B$14, C$22, $C$2:$C$14, $A23, $D$2:$D$14, IF(INDEX($B$17:$C$19, MATCH($B23, $A$17:$A$19, 0), 1)="x", $B$16))+
SUMIFS($E$2:$E$14, $A$2:$A$14, C$21, $B$2:$B$14, C$22, $C$2:$C$14, $A23, $D$2:$D$14, IF(INDEX($B$17:$C$19, MATCH($B23, $A$17:$A$19, 0), 2)="x", $C$16))

Excel -- Filter the first row from a list of values; for multiple values?

All,
I am currently faced with an issue where I need to fetch the first instance of a value in a column, but I have multiple values. No two rows will be the same EXCEPT for the first column.
Example:
A 1 !
A 2 #
B 3 #
B 4 $
C 5 %
C 6 ^
D 7 &
D 8 *
After filter:
A 1 !
B 3 #
C 5 %
D 7 &
Would anyone have a way to go about this? Thanks in advance.
Edit: Jeeped literally pointed something out that I have been doing for a long time, but didn't even think would work in this instance.
To solve this item, utilize the "Remove Duplicates" on the column in question (Column 1), but make sure you expand the selection. However, uncheck all the columns, and recheck only Column 1 for the criteria.
Thanks.
I am assuming you are trying to extract unique values from each column. If so, Excel has a built-in feature called Advanced Filter, which will do exactly that.
This tutorial will familiarize you with the feature:
http://www.excel-easy.com/examples/advanced-filter.html
Hope that helps
