Regular Expressions Python replacing couple names - python-3.x

I would like to find and replace expressions like "John and Jane Doe" with "John Doe and Jane Doe"
for a sample expression
regextest = 'Heather Robinson, Jane and John Smith, Kiwan and Nichols Brady John, Jimmy Nichols, Melanie Carbone, and Nancy Brown'
I can find the expression and replace it with a fixed string but I am not able to replace it with a modification of the original text.
re.sub(r'[a-zA-Z]+\s*and\s*[a-zA-Z]+.[^,]*',"kittens" ,regextest)
Output: 'Heather Robinson, kittens, kittens, Jimmy Nichols, Melanie Carbone, and Nancy Brown'
I think instead of a string ("kittens"), we can pass a function that can make that change but I am unable to write that function. I am getting errors below.
def re_couple_name_and(m):
    return f'*{m.group(0).split()[0]+m.group(0).split()[-1:]+ m.group(0).split()[1:]}'
re.sub(r'[a-zA-Z]+\s*and\s*[a-zA-Z]+.[^,]*',re_couple_name_and ,regextest)

IIUC, one way using capture groups:
def re_couple_name_and(m):
    family_name = m.group(3).split(" ",1)[1]
    return "%s %s" % (m.group(1), family_name) + m.group(2) + m.group(3)
re.sub(r'([a-zA-Z]+)(\s*and\s*)([a-zA-Z]+.[^,]*)',re_couple_name_and ,regextest)
Output:
'Heather Robinson, Jane Smith and John Smith, Kiwan Brady John and Nichols Brady John, Jimmy Nichols, Melanie Carbone, and Nancy Brown'

You can use the regex below to capture the parts to be rearranged, and use re.sub() to construct the new string.
(\w+)( +and +)(\w+)( +[^,]*)
Example
import re
text="Heather Robinson, Jane and John Smith, Kiwan and Nichols Brady John, Jimmy Nichols, Melanie Carbone, and Nancy Brown"
print(re.sub(r"(\w+)( +and +)(\w+)( +[^,]*)",r"\1\4\2\3\4",text))
Output
Heather Robinson, Jane Smith and John Smith, Kiwan Brady John and Nichols Brady John, Jimmy Nichols, Melanie Carbone, and Nancy Brown
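To make the `\1\4\2\3\4` replacement easier to follow, here is a minimal sketch of the same pattern written with named groups. The group names (`first`, `joiner`, `second`, `family`) and the function name `expand_couples` are illustrative labels chosen here, not part of the answer above.

```python
import re

# Same four groups as the numbered pattern, with hypothetical names.
COUPLE = re.compile(
    r"(?P<first>\w+)(?P<joiner> +and +)(?P<second>\w+)(?P<family> +[^,]*)"
)

def expand_couples(text):
    # Repeat the shared family name right after the first given name.
    return COUPLE.sub(
        r"\g<first>\g<family>\g<joiner>\g<second>\g<family>", text
    )

text = ("Heather Robinson, Jane and John Smith, Kiwan and Nichols Brady John, "
        "Jimmy Nichols, Melanie Carbone, and Nancy Brown")
print(expand_couples(text))
```

Because the pattern requires a word directly before " and ", the trailing ", and Nancy Brown" is left untouched.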


VBA for merging range of cells with reference to a cell

I want to merge ranges of cells based on a unique cell value, and I need VBA code to do it.
Sample Data
Name | Phone No. | Car Company | Car Model | Car Expiry | DL No.
Charlie Andrew | 98765 | | | | CA123
Charlie Andrew | 98765 | Mercedes | D201 | |
Charlie Andrew | | | D201 | Jun-50 |
Charlie Andrew | 98765 | Volkswagon | | | CA123
Charlie Andrew | | Volkswagon | POLO | |
Charlie Andrew | | | POLO | MAR-25 |
Charlie Andrew | 98765 | | | Jun-50 |
Charlie Andrew | 12345 | BMW | 520D | |
Charlie Andrew | | | 520D | MAY-40 | CA456
Stephen Logan | 556644 | GM MOTORS | | |
Stephen Logan | | GM MOTORS | 2255H | |
Stephen Logan | | | 2255H | APR-30 |
Stephen Logan | 556644 | | | | SL987
Desired Result
Name | Phone No. | Car Company | Car Model | Car Expiry | DL No.
Charlie Andrew | 987654 | Mercedes; Volkswagon | D201; POLO | Jun-50; mar-25 | CA123
Charlie Andrew | 12345 | BMW | 520D | MAY-40 | CA456
Stephen Logan | 556644 | GM MOTORS | 2255H | APRIL-30 | SL987
Please note that DL No. should not be merged, as it is a unique value.
Thanks in advance.
I tried various VBA macros but didn't get the desired result.

Reading from a file how to append strings to a list until a specific marker in python

I have an input text file:
This file contains information of students who are joining picnic:
### bob alice rhea john
mary alex roma
peter &##&
I would like to start appending the student names to a list when the ### marker is found, and stop appending when the &##& marker is found. The output (using Python) should be:
bob alice rhea john mary alex roma peter
You can use this method:
txt = """This file contains information of students who are joining picnic:
### bob alice rhea john
mary alex roma
peter &##&"""
or, reading from a file:
txt = open("./file.txt").read()
ls = txt.split("###")[1].split("&##&")[0].split()
print(ls)
This prints:
['bob', 'alice', 'rhea', 'john', 'mary', 'alex', 'roma', 'peter']
Using re.findall with re.sub:
import re
inp = """This file contains information of students who are joining picnic:
### bob alice rhea john
mary alex roma
peter &##&"""
output = re.sub(r'\s+', ' ', re.findall(r'###\s+(.*?)\s+&##&', inp, flags=re.DOTALL)[0])
print(output) # bob alice rhea john mary alex roma peter
If you want a list, then use:
output = re.split(r'\s+', re.findall(r'###\s+(.*?)\s+&##&', inp, flags=re.DOTALL)[0])
print(output)
This prints:
['bob', 'alice', 'rhea', 'john', 'mary', 'alex', 'roma', 'peter']
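If the file is large, or you would rather not read it into memory in one go, a line-by-line variant is another option. This is only a sketch: it assumes the markers appear as separate whitespace-delimited tokens, as in the sample, and the function name `collect_between` and its parameters are made up for illustration.

```python
import io

def collect_between(lines, start="###", stop="&##&"):
    """Collect whitespace-separated words between the start and stop markers."""
    names = []
    collecting = False
    for line in lines:
        for word in line.split():
            if word == start:
                collecting = True
            elif word == stop:
                return names
            elif collecting:
                names.append(word)
    return names  # stop marker never seen: return what was collected

# io.StringIO stands in for an open file here.
sample = io.StringIO(
    "This file contains information of students who are joining picnic:\n"
    "### bob alice rhea john\n"
    "mary alex roma\n"
    "peter &##&\n"
)
names = collect_between(sample)
print(" ".join(names))
```

With a real file, the same function can be called as `collect_between(f)` inside a `with open(...) as f:` block, since iterating over a file object yields its lines.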

Selecting the first and the last word from a 3 word long string in PLSQL

For example I have names like these:
John Lucas Smith
Kevin Thomas Bacon
I need to do it with regexp_substr, replace, or something similar.
What I want to get is:
John Smith
Kevin Bacon
Thank you!
Something like this?
with test (col) as
  (select 'John Lucas Smith' from dual union
   select 'Kevin Thomas Bacon' from dual union
   select 'Little Foot' from dual
  )
select regexp_substr(col, '^\w+') ||' '||
       regexp_substr(col, '\w+$') first_and_last
from test;

FIRST_AND_LAST
-------------------------------------
John Smith
Kevin Bacon
Little Foot
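To experiment with the two anchored patterns outside the database, the same idea can be sketched in Python. This is an illustration of how `^\w+` and `\w+$` pick the first and last word, not Oracle code; the function name `first_and_last` is made up for the example.

```python
import re

def first_and_last(name):
    # ^\w+ matches the leading word, \w+$ the trailing word.
    first = re.search(r"^\w+", name)
    last = re.search(r"\w+$", name)
    if first is None or last is None:
        return name  # no word at one end: leave the input unchanged
    return f"{first.group()} {last.group()}"

for name in ["John Lucas Smith", "Kevin Thomas Bacon", "Little Foot"]:
    print(first_and_last(name))
```

Note that in this sketch a single-word input matches both patterns, so the word is returned twice.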

How to turn every n lines into one row of columns using awk?

I have a single column text file looking like this:
John
Doe
Male
1984
Marie
Parker
Female
1989
And I would like to convert it to look like this:
John Doe Male 1984
Marie Parker Female 1989
I've tried using awk and modulo but I cannot manage to find a working solution.
$ pr -4at file
John Doe Male 1984
Marie Parker Female 1989
or your format
$ pr -4ats' ' file
John Doe Male 1984
Marie Parker Female 1989
of course with awk
$ awk 'ORS=NR%4?FS:RS' file
John Doe Male 1984
Marie Parker Female 1989
with paste
$ paste -d' ' - - - - < file
John Doe Male 1984
Marie Parker Female 1989
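For comparison, the same regrouping can be sketched in Python by slicing the list of lines into chunks of four. The function name `rows_of` and its `n` parameter are illustrative, and the sketch assumes the input has already been read into a list of lines.

```python
def rows_of(lines, n=4):
    """Join every n consecutive lines into one space-separated row."""
    fields = [line.rstrip("\n") for line in lines]
    return [" ".join(fields[i:i + n]) for i in range(0, len(fields), n)]

data = ["John", "Doe", "Male", "1984", "Marie", "Parker", "Female", "1989"]
for row in rows_of(data):
    print(row)
```

If the number of lines is not a multiple of `n`, the last row is simply shorter.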

use sed to return only last line that contains specific string

All help would be appreciated as I have tried much googling and have drawn a blank :)
I am new to sed and do not know what the command I need might be.
I have a file containing numerous lines eg
John Smith Aweqwewq321
Mike Smith A2345613213
Jim Smith Ad432143432
Jane Smith A432434324
John Smith Bweqwewq321
Mike Smith B2345613213
Jim Smith Bd432143432
Jane Smith B432434324
John Smith Cweqwewq321
Mike Smith C2345613213
Jim Smith Cd432143432
Jane Smith C432434324
The file is plain text and is not formatted (ie not csv or anything like that )
I want to search for a list of specific strings, eg. John Smith, Mike Smith, Jim Smith and return only the last line entry in the file for each string found (all other lines found would be deleted).
(I do not necessarily need every unique entry, ie Jane Smith may or may not be needed)
It is important that the original order of the lines found is maintained in the output.
So the result would be :
John Smith Cweqwewq321
Mike Smith C2345613213
Jim Smith Cd432143432
I am new to sed and do not know what this command might be.
There are about 100 specific search strings.
Thank you :)
Assuming sample.txt contains the data you provided:
$ cat sample.txt
John Smith Aweqwewq321
Mike Smith A2345613213
Jim Smith Ad432143432
Jane Smith A432434324
John Smith Bweqwewq321
Mike Smith B2345613213
Jim Smith Bd432143432
Jane Smith B432434324
John Smith Cweqwewq321
Mike Smith C2345613213
Jim Smith Cd432143432
Jane Smith C432434324
For this sample data, the following script works just fine:
$ cut -f1,2 -d' ' sample.txt | sort | uniq | while read s; do tac sample.txt | grep -m1 -n -e "$s" ; done | sort -n -r -t':' | cut -f2 -d':'
John Smith Cweqwewq321
Mike Smith C2345613213
Jim Smith Cd432143432
Jane Smith C432434324
Here is the breakdown of the script:
First, generate all the unique strings (first name and last name in this case).
Next, find the last occurrence of each of these strings. To do this, reverse the file with tac and take the first match, printing the line number along with the output.
Finally, sort the results in reverse line-number order, which restores the original file order, and then remove the line numbers (we don't need them).
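The same result can also be sketched in a single pass in Python. This is a different technique from the shell pipeline: it remembers the last line seen for each name in a dict and then sorts by line number. Like the pipeline, it assumes the key is the first two whitespace-separated fields; the function name `last_lines` and the `wanted` parameter are illustrative.

```python
def last_lines(lines, wanted=None):
    """Return the last line for each name, in original file order."""
    last = {}  # name -> (line number, line)
    for number, line in enumerate(lines):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name = " ".join(line.split()[:2])
        if wanted is None or name in wanted:
            last[name] = (number, line)  # later lines overwrite earlier ones
    # Sorting by line number restores the order in which the kept lines appear.
    return [line for _, line in sorted(last.values())]

sample = """John Smith Aweqwewq321
Mike Smith A2345613213
Jim Smith Ad432143432
Jane Smith A432434324
John Smith Bweqwewq321
Mike Smith B2345613213
Jim Smith Bd432143432
Jane Smith B432434324
John Smith Cweqwewq321
Mike Smith C2345613213
Jim Smith Cd432143432
Jane Smith C432434324""".splitlines()

result = last_lines(sample, wanted={"John Smith", "Mike Smith", "Jim Smith"})
print("\n".join(result))
```

Passing `wanted=None` keeps the last line for every name found, which corresponds to what the pipeline above does.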
You didn't specify the format of the given list, so I assume it is comma-separated, as you wrote in the question: e.g. John Smith, Mike Smith, Jim Smith.
From your description, you want the whole line for each string found, not only columns 1 and 2.
Based on these two points, I have:
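awk -v list="John Smith, Mike Smith, Jim Smith" 'BEGIN{n=split(list,p,", *")}
{for(i=1;i<=n;i++){
    if($0~p[i]){
        a[p[i]]=$0    # keep the most recent line for this name
        break
    }
}
}
# print in list order; the order of for(x in a) is unspecified
END{for(i=1;i<=n;i++)if(p[i] in a)print a[p[i]]}' file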
You can fill the list with your strings, separated by commas. With your test data, this outputs:
John Smith Cweqwewq321
Mike Smith C2345613213
Jim Smith Cd432143432
Reverse the file first, e.g. something like this:
$ sed -n '/Mike/{p;q}' <(tac input.txt)
Mike Smith C2345613213
sed -n -e 's/.*/&³/
H
$ {x
s/\n/²/g
t a
:a
s/²\([a-zA-Z]* [a-zA-Z]* \)[^³]*³\(\(.*\)²\1\)/\2/
t a
s/²//g;s/³$//;s/³/\
/g
p
}' YourFile
This works with any name (names cannot contain ² or ³). For hyphenated names, change the pattern to [-a-zA-Z].
Note that Jane Smith also appears at least once in your sample.
For a specific list of names, filter with grep -f first; it's faster and easier to maintain, since the code doesn't need to change.
