Tuples in regex - regular expressions - python-3.x

I have a string that I need to run a regex on.
Specifically, I need to use re.findall() and a single regular expression to extract the three names and email addresses from the string below, producing a list of three tuples like so: [('Mary Boe', 'md90#uw.com'), ('Cheri Moe Drake', 'cmd39#gmail.gbl'), ('R.L. Fitzgeri', 'fit.rl#hotmail.ing')]
Here is the string:
string = """Name: Mary Boe, Email: md90#uw.com\n
Name: Cheri Moe Drake, Email: cmd39#gmail.gbl\n
Name: R.L. Fitzgeri, Email: fit.rl#hotmail.ing"""
So far I have used the following to get ['R.L. Fitzgeri']
with
re.findall('\S\S\w\S\s\w\S\w\w\w\w\S\w',string)
And I have been able to get fit.rl#hotmail.ing
with
re.findall('\w\\w\\w\\S\w\w\S\w\w\w\w\S\w\w\S\w\w\w',string)
I have been able to get Cheri Moe Drake with
re.findall('\w\w\w\w\w\s\w\w\w\s\w\w\w\w\w',string)
But I have struggled to condense these into a single pattern, and I have also struggled to get everything to come out, as I said, like:
[('Mary Boe', 'md90#uw.com'), ('Cheri Moe Drake', 'cmd39#gmail.gbl'), ('R.L. Fitzgeri', 'fit.rl#hotmail.ing')]

Here is a way to do the job:
import re
string = """Name: Jane Doe, Email: jd12#uw.com\n
Name: Sally Sue Draper, Email: ssd59#gmail.edu\n
Name: J.D. Salinger, Email: sal.jd#hotmail.org"""
pattern = r'Name: (.+?), Email: (.+)'
result = re.findall(pattern, string)
print(result)
Output:
[('Jane Doe', 'jd12#uw.com'), ('Sally Sue Draper', 'ssd59#gmail.edu'), ('J.D. Salinger', 'sal.jd#hotmail.org')]
Regex explanation:
Name: # literally
(.+?) # group 1, 1 or more any character but newline, not greedy
, Email: # literally
(.+) # group 2, 1 or more any character but newline
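As a variation on the pattern explained above, here is a minimal sketch that uses narrower character classes instead of `.`. It assumes names never contain a comma and emails never contain whitespace; the name `strict_pattern` is only illustrative.

```python
import re

# Sample text in the same "Name: ..., Email: ..." layout used above.
text = """Name: Jane Doe, Email: jd12#uw.com
Name: Sally Sue Draper, Email: ssd59#gmail.edu
Name: J.D. Salinger, Email: sal.jd#hotmail.org"""

# [^,]+ : one or more characters that are not a comma (the name)
# \S+   : one or more non-whitespace characters (the email)
strict_pattern = r'Name: ([^,]+), Email: (\S+)'

# With two capture groups, findall returns a list of 2-tuples.
contacts = re.findall(strict_pattern, text)
print(contacts)
```

Because the pattern has two groups, re.findall() returns one tuple per match rather than the whole matched text.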

If you always have the same format, it may make more sense to avoid regular expressions in this scenario and take a different approach:
string = """Name: Jane Doe, Email: jd12#uw.com\n
Name: Sally Sue Draper, Email: ssd59#gmail.edu\n
Name: J.D. Salinger, Email: sal.jd#hotmail.org"""
people = [person for person in string.split('\n') if person]
people_list = []
for person in people:
    name = ''
    for char in person[6:]:
        if char == ',':
            break
        else:
            name += char
    email = ''
    for char in person[::-1]:
        if char == ' ':
            break
        else:
            email += char
    email = email[::-1]
    person_tuple = (name, email)
    people_list.append(person_tuple)
This will give you a list of tuples if you print people_list:
[('Jane Doe', 'jd12#uw.com'), ('Sally Sue Draper', 'ssd59#gmail.edu'), ('J.D. Salinger', 'sal.jd#hotmail.org')]
This assumes that all your lines start with Name:, which is why the loop builds a name by concatenating all characters after that up to the first comma it finds.
For the email, it does the same thing in reverse: it takes characters starting from the end of the line until it finds a space, which marks where the email begins. It then reverses the collected characters to restore the email's original order.
To build the list of contacts, the loop will format name and email into a tuple that will be appended to people_list until there are no more contacts to add.
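The same no-regex idea can probably be written more compactly with str.partition. This is only a sketch, assuming every line has exactly the form "Name: X, Email: Y"; the helper name `parse_contact` is hypothetical.

```python
def parse_contact(line):
    """Return (name, email) from a line like 'Name: X, Email: Y'."""
    # Split once on the fixed separator between the two fields.
    name_part, _, email = line.partition(', Email: ')
    # Drop the leading 'Name: ' label.
    name = name_part[len('Name: '):]
    return (name, email.strip())


text = """Name: Jane Doe, Email: jd12#uw.com
Name: Sally Sue Draper, Email: ssd59#gmail.edu
Name: J.D. Salinger, Email: sal.jd#hotmail.org"""

# Skip blank lines, then parse each remaining line.
people_list = [parse_contact(line) for line in text.splitlines() if line.strip()]
print(people_list)
```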
If you insist on using regular expressions, then a good use for those could be if you want to validate email addresses and not add a contact to your list if the email does not correspond to the format of your choosing (or leave it blank instead). The regex pattern could look like this:
import re
email = 'example#email.com'
pattern = r'[a-z]+[a-z0-9]*[\w._-]*#[a-z]+\.[a-z]{1,3}$'
if re.match(pattern, email):
    print(email)  # do something with email here
Note that in this case, the regex uses quantifiers like + and * instead of repeating \w or \S once per character. That lets one pattern match values of any length, which is one of the keys to building a more robust regex.
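To illustrate the validation idea, here is a small sketch that keeps only contacts whose email matches the pattern above. The list `candidates` and its last entry are made up for the example.

```python
import re

# Same pattern as above, compiled once for reuse.
email_pattern = re.compile(r'[a-z]+[a-z0-9]*[\w._-]*#[a-z]+\.[a-z]{1,3}$')

candidates = [
    ('Jane Doe', 'jd12#uw.com'),
    ('J.D. Salinger', 'sal.jd#hotmail.org'),
    ('No Address', 'not-an-email'),  # hypothetical bad entry
]

# Keep a contact only if its email matches from the start of the string.
valid_contacts = [(name, email) for name, email in candidates
                  if email_pattern.match(email)]
print(valid_contacts)
```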

Related

How do I make re.finditer only return each line once

I am searching a text file that is a "phoneBook" for an assignment and am using regex finditer, but if a name has the letter a in it twice it prints that line twice which is what I am trying to avoid. Also is there a way to have it ignore case?
def searchPhonebook(s): #This will search the phonebook(s) for the inputed data that is assigned to d
    print()
    d=input("Please enter the Name, Character, Phone Number, or a number: ") #Variable d which is the inputted data
    print()
    import re
    pattern = re.compile(d)
    for line in open("phone.txt"):
        for match in re.finditer(pattern,line):
            print(line)
So when I search 'a' it returns
Jack Hammer,277-4829
Jack Hammer,277-4829
Mike Rafone,345-3453
Earl Lee Riser,701-304-8293
So I would like it to return each one once, and also find capitalization of 'a', like Abby
Don't use finditer(). Just test whether the line matches the pattern:
for line in open("phone.txt"):
    if re.search(pattern, line):
        print(line)
Actually, I'm not sure why you're using re at all. Do your users really enter regular expression patterns? If they're just entering a plain string, use if d in line:
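For the case-insensitivity part of the question, one option is the re.IGNORECASE flag. The sketch below searches a list of lines instead of phone.txt so that it is self-contained; the function name `search_lines` and the "Abby Stone" entry are hypothetical, and re.escape is used on the assumption that users type plain text rather than regex syntax.

```python
import re

phonebook_lines = [
    "Jack Hammer,277-4829",
    "Mike Rafone,345-3453",
    "Earl Lee Riser,701-304-8293",
    "Abby Stone,555-0100",  # hypothetical entry, added to show the uppercase 'A' match
]


def search_lines(query, lines):
    """Return each line containing query (case-insensitive), at most once."""
    # re.escape treats the user's input as literal text, not as a pattern.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    # search() is called once per line, so a line can never be reported twice.
    return [line for line in lines if pattern.search(line)]


matches = search_lines("a", phonebook_lines)
for line in matches:
    print(line)
```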

Find duplicate letters in two different strings

My program asks the user to input a first name and then a last name. How can I make it find the letters that appear in both the first and last name? If there are none, it should print "no duplicate value in First and Last name".
Here is my code, but I need the output to be "the duplicate characters... is/are ['a', 'b', 'c']". Also, when there is no duplicate, my code prints "No duplicate value in First name and Last name" multiple times, but I need it printed only once.
for fl in firstName:
    if fl in lastName:
        print("The duplicate character in your First name and Last name is/are: ", tuple(fl))
    else:
        print("No duplicate value in First name and Last name")
One way to do so:
first_name = 'john'
last_name = 'Doe'
for letter in first_name:
    if letter in last_name:
        print(letter)
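To print a single message with all shared letters, or a single "no duplicate" message, a set intersection is one possible approach. This sketch assumes the comparison should ignore case; the helper names `shared_letters` and `report` are illustrative only.

```python
def shared_letters(first_name, last_name):
    """Return a sorted list of letters that occur in both names."""
    # Lowercasing both names makes the comparison case-insensitive (an assumption).
    common = set(first_name.lower()) & set(last_name.lower())
    return sorted(common)


def report(first_name, last_name):
    """Build exactly one message, whether or not letters are shared."""
    common = shared_letters(first_name, last_name)
    if common:
        return f"The duplicate character in your First name and Last name is/are: {common}"
    return "No duplicate value in First name and Last name"


print(report('john', 'Doe'))
print(report('abc', 'xyz'))
```

Because the message is built after the whole comparison is done, it is printed once instead of once per letter.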

How to switch order of last name and first name and remove comma

Might be a simple question but I am very new to Python.
If I have a given "Last Name, First Name", how would I switch the order to "First Name Last Name"?
For example, if I have:
"Doe, John"
How would I make it
"John Doe"?
I tried using .split(), but that converts it into a List and I want it to be a string.
EDIT: The suggested question gives a List, however my example is just a string
This is the code that I came up with:
def change_name(name: str) -> str:
    x = name.split(',')
    x.reverse()
    x.join()
but that just gives me an error that the list doesn't have the attribute join
Also, I did str(x).join(' ') but that just gives me the List ['John', 'Doe']
EDIT: I seem to have gotten closer to what I want but it's still not perfect.
Now I have :
def change_name(name: str) -> str:
    x = name.split(',')
    x.reverse()
    separator = ','
    return separator.join(x)
which gives me
' John,Doe'
Now the problem is I need to take out the space in front of the string and the comma.
Your mistake is to use a comma to join the First Name and Last Name while you should have used a space. Try this:
>>> name = "Doe, John"
>>> x = name.split(', ')
>>> x.reverse()
>>> separator = ' '
>>> separator.join(x)
'John Doe'
The operations can also be chained in a single line, such as
>>> ' '.join(reversed(name.split(', ')))
'John Doe'
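If the spacing around the comma is not guaranteed, a variant that strips each part may be more forgiving. This is a sketch only; returning the input unchanged when no comma is present is an assumption, not something the question specifies.

```python
def change_name(name: str) -> str:
    """Turn 'Last, First' into 'First Last', tolerating extra spaces."""
    # partition splits at the first comma only.
    last, comma, first = name.partition(',')
    if not comma:
        # No comma found: return the name as-is (assumed behavior).
        return name.strip()
    return f"{first.strip()} {last.strip()}"


print(change_name("Doe, John"))
```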

Place a piece of a split on another line in Python?

I can't find a good example of the following in my textbook:
name = (input("Enter name(First Last:"))
last = name.split()
From here, I want to input the last name into another string.
How can I accomplish this task?
full_name = input("Enter name (First Last):")
first_name, last_name = full_name.split()
print(first_name)
print(last_name)
split() returns exactly two strings here only because full_name contains a single space between the first and last name. With more or fewer words, the two-variable unpacking raises a ValueError.
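When the input may contain middle names, one possible workaround is rsplit with maxsplit=1, which treats only the last word as the last name. The function name `split_name` and the choice to return an empty last name for single-word input are assumptions made for this sketch.

```python
def split_name(full_name):
    """Split a full name into (first_and_middle, last)."""
    # rsplit(maxsplit=1) splits once, starting from the right.
    parts = full_name.strip().rsplit(maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    # Zero or one word: no separate last name (assumed behavior).
    return full_name.strip(), ""


first_name, last_name = split_name("Earl Lee Riser")
print(first_name)
print(last_name)
```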
After looking a little harder, I figured out how to insert a variable into the middle of a string.
Here is the answer I found for removing the digits from a string before inserting it into another string:
<var0> = ''.join([i for i in <var1> if not i.isdigit()])
var0 = the string minus the numbers
var1 = the initial string to be changed
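A concrete instance of the template above, as a minimal sketch. The values of `raw` and `message` are invented for illustration, and an f-string is just one way to place the result inside another string.

```python
raw = "abc123def"  # plays the role of var1

# Keep every character that is not a digit (plays the role of var0).
letters_only = ''.join(ch for ch in raw if not ch.isdigit())

# Insert the cleaned value into the middle of another string.
message = f"Cleaned value: {letters_only}!"
print(message)
```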

Read specific area or string in a text file

I have a text file I have written user data to. Username, E-Mail and Password.
This is what the user file looks like for now:
[<< LOGIN >>]
Username: admin
Password: 12345678
E-Mail: hue#hue.hue
[<< LOGIN END >>]
Now for the question.
How can I tell Python to read only the password? For now that may be easy, since we know what the password is and what its length is. But how am I supposed to read the password later, when I encrypt it and get 30+ characters of gibberish?
The line will contain the word Password, so just split once and take the second element:
In [20]: from simplecrypt import encrypt, decrypt
In [21]: ciph = encrypt('password', "12345678")
In [22]: line = "Password: " + ciph
In [23]: line
Out[23]: 'Password: sc\x00\x01\x0cP\xa1\xee\'$"\xc1\x85\xe0\x04\xd2wg5\x98\xbf\xb4\xd0\xacr\xd3\\\xbc\x9e\x00\xf1\x9d\xbe\xdb\xaa\xe6\x863Om\xcf\x0fc\xdeX\xfa\xa5\x18&\xd7\xcbh\x9db\xc9\xbeZ\xf6\xb7\xd3$\xcd\xa5\xeb\xc8\xa9\x9a\xfa\x85Z\xc5\xb3%~\xbc\xdf'
In [24]: line.split(None,1)[1]
Out[24]: 'sc\x00\x01\x0cP\xa1\xee\'$"\xc1\x85\xe0\x04\xd2wg5\x98\xbf\xb4\xd0\xacr\xd3\\\xbc\x9e\x00\xf1\x9d\xbe\xdb\xaa\xe6\x863Om\xcf\x0fc\xdeX\xfa\xa5\x18&\xd7\xcbh\x9db\xc9\xbeZ\xf6\xb7\xd3$\xcd\xa5\xeb\xc8\xa9\x9a\xfa\x85Z\xc5\xb3%~\xbc\xdf'
In [25]: decrypt("password",line.split(None,1)[1])
Out[25]: '12345678'
In [26]: "12345678" == decrypt("password",line.split(None,1)[1])
Out[26]: True
When you iterate over the file, simply use if line.startswith("Password")...
with open(your_file) as f:
    for line in f:
        if line.startswith("Password"):
            password = line.rstrip().split(None,1)[1]
            # do your check
You could also use a dict with "password" as a key, pickle it, and then just do a lookup.
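The dict-and-pickle idea mentioned above might look something like the following. This is a sketch under assumptions: the nested layout of `users`, the file name users.pickle, and the temporary directory are all invented for the example. Note that pickle should only be used to load data you trust.

```python
import os
import pickle
import tempfile

# Hypothetical layout: one dict per username, with "password" as a key.
users = {"admin": {"password": "12345678", "email": "hue#hue.hue"}}

# Write to a temporary directory so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "users.pickle")
with open(path, "wb") as f:
    pickle.dump(users, f)

# Later: load the dict back and look the password up by key.
with open(path, "rb") as f:
    loaded = pickle.load(f)

password = loaded["admin"]["password"]
print(password)
```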
How can I tell python to specifically read the password only?
data.txt:
[<< LOGIN >>]
Username: admin
Password: 12345678
E-Mail: hue#hue.hue
[<< LOGIN END >>]
[<< LOGIN >>]
Username: admin
Password: XxyYo345320945!##!$##!##$%^%^^##$%!##$#!#41211
E-Mail: hue#hue.hue
[<< LOGIN END >>]
...
import re
pattern = r"""
    Password #Match the word 'Password', followed by...
    \s* #whitespace(\s), 0 or more times(*), followed by...
    : #a colon
    \s* #whitespace, 0 or more times...
    (.*) #any character(.), 0 or more times(*). The parentheses 'capture' this part of the match.
"""
regex = re.compile(pattern, re.X) #When you use a pattern over and over for matching, it's more efficient to 'compile' the pattern.
with open('data.txt') as f: #the file is closed automatically when the block ends
    for line in f:
        match_obj = regex.match(line)
        if match_obj: #then the pattern matched the line
            password = match_obj.group(1) #group(1) is what matched the 'first' set of parentheses in the pattern
            print(password)
--output:--
12345678
XxyYo345320945!##!$##!##$%^%^^##$%!##$#!#41211
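If every field of each login block is needed and not just the password, the file could also be parsed into one dict per block. The sketch below works on an in-memory string instead of data.txt, and the function name `parse_logins` is hypothetical; it assumes the block markers appear exactly as shown above.

```python
sample = """[<< LOGIN >>]
Username: admin
Password: 12345678
E-Mail: hue#hue.hue
[<< LOGIN END >>]
"""


def parse_logins(text):
    """Return a list of dicts, one per [<< LOGIN >>] block."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line == "[<< LOGIN >>]":
            current = {}                      # start a new record
        elif line == "[<< LOGIN END >>]":
            if current is not None:
                records.append(current)       # finish the record
            current = None
        elif current is not None and ":" in line:
            # Split at the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return records


logins = parse_logins(sample)
print(logins[0]["Password"])
```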
A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression (or if a given regular expression matches a particular string, which comes down to the same thing).
Regular expressions can be concatenated to form new regular expressions; if A and B are both regular expressions, then AB is also a regular expression. In general, if a string p matches A and another string q matches B, the string pq will match AB. This holds unless A or B contain low precedence operations; boundary conditions between A and B; or have numbered group references. Thus, complex expressions can easily be constructed from simpler primitive expressions like the ones described here. For details of the theory and implementation of regular expressions, consult the Friedl book referenced above, or almost any textbook about compiler construction.
A brief explanation of the format of regular expressions follows. For further information and a gentler presentation, consult the Regular Expression HOWTO.
https://docs.python.org/3/library/re.html#module-re
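The concatenation rule quoted above, and its caveat about low-precedence operations, can be checked with a short experiment. The variable names are illustrative; `|` is the low-precedence operator used here.

```python
import re

digits = r'\d+'      # plays the role of A
letters = r'[a-z]+'  # plays the role of B

# '42' matches A and 'abc' matches B, so '42abc' matches AB.
simple = re.fullmatch(digits + letters, '42abc')

# Caveat: with alternation, plain concatenation changes the meaning.
alt = r'a|b'
naive = re.fullmatch(alt + 'c', 'ac')               # pattern is 'a|bc'
grouped = re.fullmatch('(?:' + alt + ')c', 'ac')    # pattern is '(?:a|b)c'

print(simple is not None, naive is not None, grouped is not None)
```

Wrapping the alternation in a non-capturing group restores the expected behavior, because the `|` then applies only inside the group.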
