Can I remove a name prefix without contaminating the name data? - python-3.x

I have tried to remove the prefix from a name. I am using the re.sub method, but some of the names contain characters that also appear in the prefix.
Data example:
-MisterClarkKent
-Mrs.Carol
-missjanedoemiss
I tried re.sub(r'(^\w{2,5}\ ?)', r'', name) to remove the prefix at a fixed position, but it doesn't work because I have more than 10 prefixes and each prefix has a different length.
import re
name = 'mrjasontoddmr'
filter_name = re.sub(r'mr', r'', name)
print(filter_name)
# The result of filter_name is jasontodd, but what I want is jasontoddmr
I expect the output "jasontoddmr".

You can limit re.sub() to a single replacement with the count argument and ignore case with flags=re.IGNORECASE.
import re
names = ['MisterClarkKent','Mrs.Carol','missjanedoemiss', 'mrjasontoddmr']
filter_names = [re.sub(r'mrs?\.?|mister\s?|miss\s?', r'', name, count=1, flags=re.IGNORECASE) for name in names]
print(filter_names)
# ['ClarkKent', 'Carol', 'janedoemiss', 'jasontoddmr']
The ? makes the preceding character optional, so in mrs?\.? both the s and the . are optional, and the pattern matches mr, mr., mrs and mrs. alike.
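The pattern above is not anchored, so count=1 removes the first match wherever it occurs; a name with no title that contains "miss" or "mr" in the middle would still be altered. A minimal sketch that anchors the titles with ^ so only a leading title is stripped (strip_title is a hypothetical helper, and the title list is an assumption taken from the examples; extend the alternation with your other prefixes):

```python
import re

# ^ anchors the match at the start of the name, so a title-like substring
# later in the name is never touched. The alternatives do not overlap.
TITLE_PATTERN = re.compile(r"^(?:mister|miss|mrs?)\.?\s?", re.IGNORECASE)

def strip_title(name):
    # Remove at most one leading title; the rest of the name is kept as-is.
    return TITLE_PATTERN.sub("", name, count=1)

names = ['MisterClarkKent', 'Mrs.Carol', 'missjanedoemiss', 'mrjasontoddmr', 'Jimmy']
print([strip_title(n) for n in names])
# ['ClarkKent', 'Carol', 'janedoemiss', 'jasontoddmr', 'Jimmy']
```

A name that itself begins with a title string (for example Missy) will still be clipped; without a separator between title and name, a regex alone cannot tell the two apart.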

Related

Extract value from event path - lambda function

The Lambda I am working on is triggered through API Gateway.
I want to extract a specific value from the path in the URL.
Sample URL : {id}/contacts
or
{id-0}/{id}/contacts
To extract the path variable I am using event.pathParameters,
which gives me the value, but I only need to extract {id} from the path.
I am using the following code to split the path param and extract the {id}, but this is not a feasible option:
arr = path.split("/");
id = arr[arr.length-2];
Is there a better way to extract {id}? The id is always the last path segment right before the API name (in this case, contacts).
This regex captures the segment between the last two occurrences of /, or everything before the / if there is only one:
([^\/]+)\/[^\/]+$
https://regex101.com/r/tZNhrk/1
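The regex can be applied from Python (the language of the other answer) as sketched below; id_before_last_segment is a hypothetical helper name. Inside a character class, / needs no escaping in Python, so ([^/]+)/[^/]+$ is equivalent to the pattern above:

```python
import re

# Group 1 is the segment just before the last "/"; the final segment
# (the API name) must be present and reach the end of the string.
SEGMENT_BEFORE_LAST = re.compile(r"([^/]+)/[^/]+$")

def id_before_last_segment(path):
    m = SEGMENT_BEFORE_LAST.search(path)
    return m.group(1) if m else None

print(id_before_last_segment("{id-0}/{id}/contacts"))  # {id}
print(id_before_last_segment("/123/contacts"))         # 123
```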
Would you please try the following:
import re
path = '{id-0}/{id}/contacts'  # example path
api_name = 'contacts'  # API name
# re.escape() guards against regex metacharacters in the API name
m = re.search(r'[^/]+(?=/%s)' % re.escape(api_name), path)
if m:
    path_id = m.group()
The regex [^/]+(?=/%s) matches a run of non-slash characters that is followed by a slash and the specified api_name. If the regex matches, m.group() is assigned to path_id.
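If the API Gateway resource declares the segment as a path parameter, parsing the path may not be needed at all: with Lambda proxy integration, event["pathParameters"] is a dictionary keyed by parameter name (and can be None when there are none). A minimal sketch, assuming a resource defined as /{id}/contacts; the handler name and response body are illustrative only:

```python
import json

def lambda_handler(event, context):
    # Assumption: resource path /{id}/contacts with proxy integration,
    # so the value arrives under the key "id".
    path_params = event.get("pathParameters") or {}
    path_id = path_params.get("id")
    if path_id is None:
        return {"statusCode": 400, "body": json.dumps({"error": "missing id"})}
    return {"statusCode": 200, "body": json.dumps({"id": path_id})}

# Local call with a hand-made event (not a real API Gateway payload).
print(lambda_handler({"pathParameters": {"id": "42"}}, None))
# {'statusCode': 200, 'body': '{"id": "42"}'}
```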

How to get demangled function name using regex

I have a list of mangled function names like _Z6__comp7StudentS_ and _Z4SortiSt6vectorI7StudentSaIS0_EE. I read the wiki and found out that they follow a defined structure: _Z marks a mangled symbol, followed by a number and then a function name of that length.
So I wanted to retrieve that function name using a regex. The closest I got is _Z(?:\d)(?<function_name>[a-z_A-Z]){\1}. But referring to \1 won't work because it's a string, right? Is there a single regex pattern that solves this?
You can use two capture groups and slice group 2 to the length captured in group 1:
import re
pattern = r"_Z(\d+)([a-z_A-Z]+)"
s = "_Z4SortiSt6vectorI7StudentSaIS0_EE"
m = re.search(pattern, s)
if m:
    print(m.group(2)[:int(m.group(1))])
Output
Sort
Using _Z6__comp7StudentS_ will return __comp
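Because [a-z_A-Z]+ stops at the first digit, the answer above would cut a name such as func2 short. A sketch that instead slices the original string starting right after the length digits (m.end(1)), so the length prefix alone decides where the name ends. simple_mangled_name is a hypothetical helper, and it only handles unscoped names of the form _Z<length><name>; nested names such as _ZN3foo3barEv use a different layout:

```python
import re

def simple_mangled_name(symbol):
    """Return the identifier after _Z<length>, or None if it doesn't fit."""
    m = re.match(r"_Z(\d+)", symbol)
    if not m:
        return None  # not of the simple _Z<length><name> form
    length = int(m.group(1))
    start = m.end(1)  # index just after the length digits
    name = symbol[start:start + length]
    # Reject symbols that are shorter than the declared length.
    return name if len(name) == length else None

print(simple_mangled_name("_Z4SortiSt6vectorI7StudentSaIS0_EE"))  # Sort
print(simple_mangled_name("_Z5func2v"))                           # func2
```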

How to find a line which contains a string without any suffix and prefix in a string?

I tried to find a solution on different platforms but couldn't, so I am asking here.
I am reading a file and want the line that contains a specific string (user input). The problem is that my code picks up every line that contains the string. For example:
Here the user input is: "Mon_ErrEntryEspSqPlaus"
Output line:
/begin MEASUREMENT Icsp_Dem_Deb_LfEve_Mon_ErrEntryEspSqPlaus
Here the string in the output line has a prefix attached to it, which is not intended.
Instead, I want to read only the line below:
941 "Mon_ErrEntryEspSqPlaus"
In that line the user input string has no prefix or suffix.
Here is the code:
import re
def a2l_reader(parameter):
    count = 0
    count_1 = 0
    with open("TPT.a2l", errors='replace') as myfile:
        for num, line in enumerate(myfile, 1):
            if parameter in line:
                if re.match(r'sample', line):
                    count += 1
                else:
                    count_1 += 1
    print(count)
    print(count_1)
The question is: how do I find the specific line that contains the string with no prefix or suffix? I need the number associated with that string.
Thanks in advance
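Instead of
if parameter in line:
you can do
if parameter == line.strip():
and it will only proceed if there is an exact match. The first form (the one in your code) matches whenever your input appears as a substring of the line. Each line read from a file keeps its trailing newline, which is why .strip() is needed; note that this only matches lines consisting of nothing but the input string.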
If the line contains other text as well (such as the 941 and the quotes in your example), you can split it on whitespace, strip the quotes from each token, and then check membership with in:
if parameter in [token.strip('"') for token in line.split()]:
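Another option is a regular expression with word boundaries (\b), which rejects the input when it is glued to other letters, digits or underscores. A minimal sketch; find_exact_lines is a hypothetical helper, the sample lines are copied from the question, and it assumes the user input starts and ends with a word character, as Mon_ErrEntryEspSqPlaus does:

```python
import re

def find_exact_lines(lines, parameter):
    """Return (line_number, stripped_line) for each line that contains
    `parameter` as a whole word, not glued to other word characters."""
    # \b needs a non-word character (or the start/end of the line) on each
    # side. '_' counts as a word character, so in
    # 'Icsp_Dem_Deb_LfEve_Mon_ErrEntryEspSqPlaus' there is no boundary.
    pattern = re.compile(r"\b" + re.escape(parameter) + r"\b")
    return [(num, line.strip())
            for num, line in enumerate(lines, 1)
            if pattern.search(line)]

sample = [
    '/begin MEASUREMENT Icsp_Dem_Deb_LfEve_Mon_ErrEntryEspSqPlaus\n',
    '941 "Mon_ErrEntryEspSqPlaus"\n',
]
hits = find_exact_lines(sample, "Mon_ErrEntryEspSqPlaus")
print(hits)  # [(2, '941 "Mon_ErrEntryEspSqPlaus"')]
# Assumption: the number you need is the first token on the matched line.
print(int(hits[0][1].split()[0]))  # 941
```

With the real file, the open file object can be passed in place of sample, e.g. find_exact_lines(myfile, parameter) inside the with open("TPT.a2l", errors='replace') block.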

How can I split a full name into first name and last name in Python?

I'm a novice in Python programming and I'm trying to split a full name into first name and last name. Can someone help me with this? My example file is:
Sarah Simpson
I expect output like this: Sarah,Simpson
You can use the split() function like so:
fullname=" Sarah Simpson"
fullname.split()
which will give you: ['Sarah', 'Simpson']
Building on that, you can do:
first=fullname.split()[0]
last=fullname.split()[-1]
print(first + ',' + last)
which would give you Sarah,Simpson with no spaces
This comes in handy: nameparser 1.0.6 - https://pypi.org/project/nameparser/
>>> from nameparser import HumanName
>>> name = "Sarah Simpson"
>>> name = HumanName(name)
>>> name.last
'Simpson'
>>> name.first
'Sarah'
>>> name.last+', '+name.first
'Simpson, Sarah'
You can try the .split() method, which returns a list of strings after splitting on a separator. With no argument it splits on runs of whitespace, such as the space character here.
First remove leading and trailing spaces with .strip(), then unpack the two parts (this assumes the name has exactly two words):
first_name, last_name=fullname.strip().split()
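Unpacking into exactly two names raises ValueError for inputs such as "Mary Ann Smith". One option, assuming everything before the final word should count as the first name, is rsplit() with maxsplit=1; split_full_name is a hypothetical helper, and surnames made of several words (e.g. "van der Berg") still need something like nameparser:

```python
def split_full_name(fullname):
    """Split on the last run of whitespace: 'Mary Ann Smith' -> ('Mary Ann', 'Smith')."""
    parts = fullname.strip().rsplit(maxsplit=1)
    if len(parts) < 2:
        # Zero or one word: there is no separate last name.
        return (parts[0] if parts else ""), ""
    first, last = parts
    return first, last

print(split_full_name(" Sarah Simpson"))  # ('Sarah', 'Simpson')
print(split_full_name("Mary Ann Smith"))  # ('Mary Ann', 'Smith')
```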
Strings in Python are immutable, so create a new string to get the desired output.
You can use the split() method of the str class.
name = "Sarah Simpson"
name.split()
By default split() splits on whitespace; it also accepts a separator as a parameter. It returns a list:
["Sarah", "Simpson"]
Then just concatenate the strings. For more information, see https://docs.python.org/3.7/library/stdtypes.html?highlight=split#str.split
Output = "Sarah", "Simpson"
name = "Thomas Winter"
LastName = name.split()[1]
(Note the parentheses on the call to split.)
split() creates a list where each element is from your original string, delimited by whitespace. You can now grab the second element using name.split()[1] or the last element using name.split()[-1]
split() is the obvious function to use; it can be called with one parameter or with none:
fullname="Sarah Simpson"
ls=fullname.split()
ls=fullname.split(" ") #this will split by specified space
Optional extra:
If you want the split name shown as a single string delimited by a comma, you can use join() or replace():
print(",".join(ls))  # outputs Sarah,Simpson
print(fullname.replace(" ", ","))  # also outputs Sarah,Simpson
Input: Sarah Simpson (assume it is a string).
Then, to output Sarah, Simpson, do the following:
name_surname = "Sarah Simpson".split(" ")
to_output = name_surname[0] + ", " + name_surname[-1]
print(to_output)
The split method is called on a string and splits it on the separator passed as its argument. It returns a list of the pieces (here, words) between the separators.
In your case, the string is "Sarah Simpson", so when you call split with the argument " " (a single space), the output will be ["Sarah", "Simpson"].
Now, to combine the names or to access either of them, you can write the name of the list followed by square brackets containing the index of the desired word. For example, name_surname[0] returns "Sarah", since its index in the list is 0.
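Since the question mentions an example file, here is a sketch that applies the first-word/last-word idea from the answers above to every line. io.StringIO stands in for the hypothetical names.txt file, names_to_csv_lines is a hypothetical helper, and it assumes each non-blank line holds at least two words (a one-word line would come out as the same word twice):

```python
import io

def names_to_csv_lines(lines):
    """Turn 'First Last' lines into 'First,Last' strings, skipping blank lines."""
    result = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip empty lines
        # First word and last word, as in the answers above.
        result.append(parts[0] + "," + parts[-1])
    return result

# With a real file: with open("names.txt") as f: names_to_csv_lines(f)
sample_file = io.StringIO("Sarah Simpson\n\nThomas Winter\n")
for row in names_to_csv_lines(sample_file):
    print(row)
# Sarah,Simpson
# Thomas,Winter
```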

Searching for strings in a 'dictionary' file with multiple wildcard values

I am trying to create a function which will take 2 parameters. A word with wildcards in it like "*arn*val" and a file name containing a dictionary. It returns a list of all words that match the word like ["carnival"].
My code works fine for anything with only one "*" in it, however any more and I'm stumped as to how to do it.
Just searching for the wildcard string in the file was returning nothing.
Here is my code:
dictionary_file = open(dictionary_filename, 'r')
dictionary = dictionary_file.read()
dictionary_file.close()
dictionary = dictionary.split()
alphabet = ["a","b","c","d","e","f","g","h","i",
            "j","k","l","m","n","o","p","q","r",
            "s","t","u","v","w","x","y","z"]
new_list = []
for letter in alphabet:
    if wildcard.replace("*", letter) in dictionary:
        new_list += [wildcard.replace("*", letter)]
return new_list
The parameters: the first is the wildcard string (wildcard), and the second is the dictionary file name (dictionary_filename).
Most answers on this site were about Regex, which I have no knowledge of.
Your particular error is that .replace() replaces all occurrences, e.g. "*arn*val" -> "carncval" or "iarnival": the same letter is used for every *, but you want different letters here. You could use a second nested loop over the alphabet (or use itertools.product() to generate all possible letter pairs) to fix it, but a simpler way is to use regular expressions:
import re
# each `*` corresponds to an ascii lowercase letter
pattern = re.escape(wildcard).replace("\\*", "[a-z]")
# known_words: the list of words read from the dictionary file (dictionary in the question)
matches = list(filter(re.compile(pattern+"$").match, known_words))
Note: it doesn't support escaping * in the wildcard.
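For comparison, the itertools.product() route mentioned above can be sketched without regular expressions: it builds every combination of one letter per * and checks each candidate against a set of dictionary words. match_wildcard and the sample words are hypothetical; the number of candidates is 26 to the power of the number of *, so this is only practical for a few wildcards:

```python
import itertools
import string

def match_wildcard(wildcard, words):
    """Return words matching `wildcard`, where each '*' stands for exactly
    one lowercase ASCII letter (the same rule as the regex answer)."""
    word_set = set(words)  # fast membership tests
    pieces = wildcard.split("*")
    stars = len(pieces) - 1
    found = []
    # One independent letter per '*', unlike str.replace().
    for letters in itertools.product(string.ascii_lowercase, repeat=stars):
        candidate = pieces[0]
        for letter, piece in zip(letters, pieces[1:]):
            candidate += letter + piece
        if candidate in word_set:
            found.append(candidate)
    return found

words = ["carnival", "carnage", "festival"]
print(match_wildcard("*arn*val", words))  # ['carnival']
```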
If the input wildcards are shell-style file patterns (where * matches any run of characters, not exactly one letter), you could use the fnmatch module to filter the words:
import fnmatch
matches = fnmatch.filter(known_words, wildcard)
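The two approaches can give different results for the same wildcard, because fnmatch's * matches any run of characters (including none) while the regex answer maps each * to exactly one letter. A small comparison on made-up words; re.fullmatch is used here in place of the "$" anchor, which behaves the same for words without trailing newlines:

```python
import fnmatch
import re

words = ["carnival", "barnyardval", "carnival2"]  # made-up sample words
wildcard = "*arn*val"

# fnmatch: '*' matches any run of characters, including an empty one.
shell_matches = fnmatch.filter(words, wildcard)

# Regex answer: each '*' becomes exactly one lowercase letter.
pattern = re.escape(wildcard).replace("\\*", "[a-z]")
regex = re.compile(pattern)
regex_matches = [w for w in words if regex.fullmatch(w)]

print(shell_matches)  # ['carnival', 'barnyardval']
print(regex_matches)  # ['carnival']
```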
