I have a pandas dataframe with a name column as below
name
Dr. Maso Guilani
Paul Dupey
Mrs. Sarah Kant
Cathay Pane
Canine Paul
I want to remove prefixes like "Dr." and "Mrs." from the "name" column.
I tried the following:
df['name']=df.name.replace({"Mrs.": ""},regex=True).replace({"Dr.": ""},regex=True)
But I want to generalize this, as I am not sure how many prefixes like "Dr." and "Mrs." occur
in this huge dataset. Basically, I want to remove every prefix that ends with a dot. Thanks.
Expected output:
name
Maso Guilani
Paul Dupey
Sarah Kant
Cathay Pane
Canine Paul
With your shown samples, please try the following, which uses the pandas str.replace function. The regex lazily matches everything from the start of the value up to the first dot followed by one or more whitespace characters, and replaces it with an empty string. Pass regex=True explicitly, because pandas 2.0 and later treat the pattern as a literal string by default.
df['name'].str.replace(r'^.*?\.\s+', '', regex=True)
Output will be as follows.
Maso Guilani
Paul Dupey
Sarah Kant
Cathay Pane
Canine Paul
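A minimal sketch of how the lazy pattern behaves, assuming the sample names from the question plus one extra row ("John F. Kennedy") that is hypothetical and not part of the original data. It also shows a stricter pattern, which is an assumption rather than part of the answer, that only strips leading "Word." tokens:

```python
import pandas as pd

df = pd.DataFrame({"name": [
    "Dr. Maso Guilani", "Paul Dupey", "Mrs. Sarah Kant",
    "Cathay Pane", "Canine Paul",
    "John F. Kennedy",  # hypothetical row: the dot is not part of a prefix
]})

# Pattern from the answer: strips everything up to the first dot + whitespace.
lazy = df["name"].str.replace(r"^.*?\.\s+", "", regex=True)

# Stricter variant (assumption): strip only one or more leading "Word." tokens.
strict = df["name"].str.replace(r"^(?:\w+\.\s+)+", "", regex=True)

print(lazy.tolist())
print(strict.tolist())
```

The lazy pattern also eats "John F." because it stops at the first dot anywhere in the string, so the stricter variant may be safer when middle initials can occur.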
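One way of doing this is via the split() and apply() methods. Here n=1 splits on the first dot only, and strip() removes the space left behind after the dot:
df['name']=df['name'].str.split('.', n=1).apply(lambda x: x[1].strip() if len(x)>1 else x[0])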
Output of df:
0 Maso Guilani
1 Paul Dupey
2 Sarah Kant
3 Cathay Pane
4 Canine Paul
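For reference, a self-contained sketch of the split-and-apply approach on the question's sample data. The helper name drop_prefix is hypothetical, and n is passed as a keyword because recent pandas versions no longer accept it positionally:

```python
import pandas as pd

df = pd.DataFrame({"name": [
    "Dr. Maso Guilani", "Paul Dupey", "Mrs. Sarah Kant",
    "Cathay Pane", "Canine Paul",
]})

def drop_prefix(parts):
    # parts is the list produced by splitting the name on its first dot.
    # Two items mean a prefix was present; keep the remainder without
    # the leading space. One item means there was no dot at all.
    return parts[1].strip() if len(parts) > 1 else parts[0]

df["name"] = df["name"].str.split(".", n=1).apply(drop_prefix)
print(df["name"].tolist())
```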
Related
I have a column with entries that follow a similar format as the example below:
ABC - Adam Smith (T) (ABCadasmi)
ABC - John Carter (V) (ABCjohcar)
I'm looking to extract the "ABCadasmi" and "ABCjohcar" strings from these entries. Is there an Excel formula that can do this?
I am trying to print strings in the following format:
1). Tim         Brazil
2). Johnny      Argentina
3). Sara        Ukraine
However, they always end up printing like this:
1). Tim Brazil
2). Johnny Argentina
3). Sara Ukraine
What can I do to fix it so that the columns are aligned as in the first example?
I tried the following, with no success:
print('{0}). {1} {2:>11}'.format(i, name, country))
Any tips would be appreciated. Thanks.
(moving comment to answer)
You are formatting the wrong column.
Try this code:
print('{0}). {1:<11} {2}'.format(i, name, country)) # left justify second column
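A short sketch of the fix on the three rows from the question. The list name people is made up for illustration, and the second variant, which derives the width from the longest name instead of hard-coding 11, is an assumption rather than part of the answer:

```python
people = [("Tim", "Brazil"), ("Johnny", "Argentina"), ("Sara", "Ukraine")]

# Fixed width of 11, as in the answer: pad the name, not the country.
fixed = ['{0}). {1:<11} {2}'.format(i, name, country)
         for i, (name, country) in enumerate(people, start=1)]

# Assumption: compute the padding width from the longest name.
width = max(len(name) for name, _ in people)
fitted = ['{0}). {1:<{w}} {2}'.format(i, name, country, w=width)
          for i, (name, country) in enumerate(people, start=1)]

for line in fixed + fitted:
    print(line)
```

In both variants every country starts in the same column, because the padding is applied to the field that varies in length.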
I have an array of people with their scores in another column. I need to find the top 3 people with the highest scores and print their names.
Example:
Maria 1
Thomas 4
John 3
Jack 2
Ray 2
Laura 4
Kate 3
Result should be:
Thomas
Laura
John
What I get:
Thomas
Thomas
John
What I get:
Thomas
John
num
I have tried using LARGE, MATCH, MIN and MAX, but nothing works.
My first failed attempt:
=INDEX($A$2:$A$8; MATCH(LARGE(($B$2:$B$8);{1;2;3}); $B$2:$B$8;0))
My second failed attempt:
{=INDEX($A$2:$A$14;SMALL(IF($B$2:$B$14=MAX($B$2:$B$14);ROW($B$2:$B$14)-1);ROW(B4)-1))}
Put this in the second row of the column you want:
=INDEX(A:A,AGGREGATE(15,7,ROW($B$1:$B$7)/((COUNTIF($D$1:D1,$A$1:$A$7)=0)*($B$1:$B$7=LARGE(B:B,ROW(1:1)))),1))
Then drag it down three rows.
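The logic behind that formula can be sketched in Python, purely as an illustration and not as a replacement for the worksheet formula. For each rank it takes the k-th largest score (the LARGE part) and picks the first row with that score whose name has not been listed yet (the COUNTIF part). The function name top_names is hypothetical:

```python
names = ["Maria", "Thomas", "John", "Jack", "Ray", "Laura", "Kate"]
scores = [1, 4, 3, 2, 2, 4, 3]

def top_names(names, scores, k=3):
    # ranked[r] plays the role of LARGE(scores, r + 1).
    ranked = sorted(scores, reverse=True)
    picked = []
    for r in range(k):
        target = ranked[r]
        # First row with the target score whose name is not already picked,
        # which is what stops a tied name from being repeated.
        for name, score in zip(names, scores):
            if score == target and name not in picked:
                picked.append(name)
                break
    return picked

print(top_names(names, scores))
```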
Let's say I've got two tables with two columns each. In both cases, the first column holds a name and the second column holds a string of characters with a similar pattern. It looks like this:
Table 1
Peter xxxxx01
John xxxxx01
Bill xxxxx01
William xxxxx01
Table 2
Richard xxxxx02
John xxxxx02
Bill xxxxx02
Arthur xxxxx02
Now, I'd like to compare these two tables, find the names that appear in both, and display the data stored in their second columns, just like this:
(Peter excluded)
John xxxxx01 xxxxx02
Bill xxxxx01 xxxxx02
(William, Arthur excluded)
I am familiar with pivot tables; however, they don't allow this.
I've also tried messing with INDEX/MATCH formulas, but without much success.
Any advice?
You can use the VLOOKUP function for this.
If your "Table 1" is in B3:C6 and your "Table 2" is in F3:G6, then you can use the following formula in D3:D6 to look up the values in Table 2:
Cell D3: =IFERROR(VLOOKUP(B3,$F$3:$G$6,2,FALSE),"")
This first looks up the name from Table 1 (cell B3) in Table 2 (F3:G6) and returns the second column of Table 2 if it finds the name. If it doesn't find the name, VLOOKUP returns an error, so we wrap it in an IFERROR function and replace any error with an empty string, which looks a bit friendlier. This results in the following table:
     A    B         C          D          E    F         G
1
2         Table 1              Result          Table 2
3         Peter     xxxxxx01                   Richard   xxxxxx02
4         John      xxxxxx01   xxxxxx02        John      xxxxxx02
5         Bill      xxxxxx01   xxxxxx02        Bill      xxxxxx02
6         William   xxxxxx01                   Arthur    xxxxxx02
You can then filter on the (Non-Blanks) in column D to only get the results you're interested in.
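The same lookup-then-filter idea, sketched in Python for readers who want to check the logic outside Excel. The dictionaries table1 and table2 are stand-ins for the two ranges, and dict.get with a default of "" mirrors wrapping VLOOKUP in IFERROR:

```python
table1 = {"Peter": "xxxxx01", "John": "xxxxx01",
          "Bill": "xxxxx01", "William": "xxxxx01"}
table2 = {"Richard": "xxxxx02", "John": "xxxxx02",
          "Bill": "xxxxx02", "Arthur": "xxxxx02"}

# One result row per name in table 1; "" when the name is missing from table 2.
result = [(name, code1, table2.get(name, "")) for name, code1 in table1.items()]

# Equivalent of filtering column D on non-blanks.
matches = [row for row in result if row[2] != ""]

for row in matches:
    print(*row)
```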
I have 3 columns
a b c
jon ben 2
ben jon 2
roy jack 1
jack roy 1
I'm trying to retrieve all unique pairs regardless of order, e.g. "ben and jon" is the same as "jon and ben", so each pair should only appear once. Expected output:
a b c
jon ben 2
roy jack 1
Any ideas for a function that could do this? The order of the output does not matter. I've tried concatenating and then removing duplicates, but obviously this only considers the string order.
I've created a fourth column by joining all three columns together with =a1&","&b1&","&c1 and used Excel's built-in Remove Duplicates function. This doesn't work because the order of the strings is different.
In your fourth column, use the formula
=if(A1<B1,A1&","&B1&","&C1,B1&","&A1&","&C1)
This joins A and B in alphabetical order, so both orderings of a pair produce the same string, and you can then remove duplicates as you did before.
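The order-independent key can be sketched in Python as follows; this is an illustration of the idea, not an Excel feature. The variable names are made up, and note that Python compares strings case-sensitively, whereas Excel's < on text ignores case:

```python
rows = [("jon", "ben", 2), ("ben", "jon", 2), ("roy", "jack", 1), ("jack", "roy", 1)]

seen = set()
unique = []
for a, b, c in rows:
    # Put the two names in alphabetical order so both orderings share one key.
    key = (min(a, b), max(a, b), c)
    if key not in seen:
        seen.add(key)
        unique.append((a, b, c))  # keep the first occurrence as written

print(unique)
```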