Finding Close String Matches - valuing sub string word matches higher - python-3.x

I'm trying to find close string matches (context: searching for a Discord user from user input).
At the moment I'm trying out difflib. It works OK, but it sometimes returns odd results. For example, if someone's name contains a word, searching for that word may return something that seems far off instead of that name.
I think that's just because of how get_close_matches works. Could you suggest some other libraries to try? (I'm not sure how to quantify what I'm after, but I think I want a search that gives a higher score to names containing a word similar to the search term.)
import difflib
user_names = []
for member in server.members:
    if member.name is not None:
        user_names.append(member.name)
    if member.nick is not None:
        user_names.append(member.nick)
user_name = difflib.get_close_matches(user_msg, user_names, n=1, cutoff=0.2)

I've used https://github.com/seatgeek/fuzzywuzzy for this in the past which provides a few options out of the box from single words to tokenizing and sorting larger strings.
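If you would rather stay with the standard library, the "score names containing a similar word higher" idea can be approximated with difflib alone. The following is only a minimal sketch, not fuzzywuzzy's algorithm: `word_aware_score` and `best_match` are hypothetical helper names, and the 0.6 cutoff is an arbitrary assumption. It compares the query against the whole name and against each whitespace-separated word, and keeps the best ratio.

```python
import difflib

def word_aware_score(query, name):
    """Return a 0..1 score: the best ratio against the whole name or any single word in it."""
    query = query.casefold()
    lowered = name.casefold()
    # Candidates are the full name plus each of its words (hypothetical scoring rule).
    candidates = [lowered] + lowered.split()
    return max(difflib.SequenceMatcher(None, query, c).ratio() for c in candidates)

def best_match(query, names, cutoff=0.6):
    """Return the highest-scoring name, or None if nothing reaches the cutoff."""
    scored = [(word_aware_score(query, n), n) for n in names]
    score, name = max(scored, default=(0.0, None))
    return name if score >= cutoff else None

user_names = ["Captain Hook", "Hooligan", "Peter"]
print(best_match("hook", user_names))
```

With this scoring, a name that contains the search term as a whole word gets a ratio of 1.0 for that word, so it outranks names that merely share a few letters with the query.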

Related

How to change an input() to lower case?

I'm new to Python and trying my best to learn. At the moment I'm following along with a YouTube tutorial, but I got stuck on this piece of code, where I'm trying to change user input to lowercase and compare it to a list to see whether the item is available. Every time I run the code I get Not Available. Here is the code:
stock = ['Keyboard', 'Mouse', 'Headphones', 'Monitor']
productName = input('Which product would you like to look up:').lower()
if productName in stock:
    print('Available')
else:
    print('Not Available')
Change your stock array to be all lowercase, like so:
stock = ['keyboard', 'mouse', 'headphones', 'monitor']
Because you always convert the user input to lowercase while the stock items in the list are capitalized, they will never match in your if statement. String comparison in Python is case sensitive (as it is in nearly every programming language).
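If you would rather keep the capitalized names for display, an alternative is to normalize both sides at comparison time. This is a minimal sketch under that assumption; `is_available` is a hypothetical helper name, and it uses `str.casefold()`, which is a slightly more aggressive form of `lower()` intended for caseless matching.

```python
stock = ['Keyboard', 'Mouse', 'Headphones', 'Monitor']

def is_available(product_name, items):
    """Case-insensitive membership test that leaves the original list untouched."""
    wanted = product_name.strip().casefold()
    return any(item.casefold() == wanted for item in items)

# In the real program the argument would come from input(...).
print('Available' if is_available('keyboard', stock) else 'Not Available')
```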

Is there a way to only list a certain format of text from a list?

I am quite new to Python.
I want to keep only the items with a certain format from a bigger list. For example:
What's in the list:
/ABC/EF213
/ABC/EF
/ABC/12AC4
/ABC/212
However, the only ones I want listed are those with the format /###/#####; the rest should be discarded.
You could use a generator expression or a for loop to check each element of the list to see if it matches a pattern. One way of doing this would be to check if the item matches a regex pattern.
As an example:
import re
original_list = ["Item I don't want", "/ABC/EF213", "/ABC/EF", "/ABC/12AC4", "/ABC/212", "123/456", "another useless item", "/ABC/EF"]
filtered_list = [item for item in original_list if re.fullmatch(r"/\w+/\w+", item) is not None]
print(filtered_list)
outputs
['/ABC/EF213', '/ABC/EF', '/ABC/12AC4', '/ABC/212', '/ABC/EF']
If you need help making regex patterns, there are many great websites, such as regexr, which can help you.
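Note that `\w+` accepts a run of any length, so the pattern above keeps "/ABC/EF" and "/ABC/212" as well. If the format really is a slash, three characters, a slash, and five characters, the lengths can be pinned with `{3}` and `{5}`. A minimal sketch, assuming each `#` in the question stands for one letter, digit, or underscore:

```python
import re

# Assumption: every '#' in /###/##### is a single word character.
PATTERN = re.compile(r"/\w{3}/\w{5}")

original_list = ["/ABC/EF213", "/ABC/EF", "/ABC/12AC4", "/ABC/212", "123/456"]
filtered_list = [item for item in original_list if PATTERN.fullmatch(item)]
print(filtered_list)  # ['/ABC/EF213', '/ABC/12AC4']
```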
Every string can be indexed like a list without any conversion. If the only format you want to check is /###/##### then you can simply write if conditions like these:
for text in your_list:
    if len(text) == 10 and text[0] == "/" and text[4] == "/":  # and so on for the remaining positions
        print(text)
Of course, this would require a lot of conditions and would get long quickly. So I would recommend a simpler scan. We could do this by, for example, splitting the texts, which would look something like this:
for text in your_list:
    checkstring = text.split("/")
Now you have your text split into parts, and you can simply check the lengths of these new parts with the len() function.
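The split-based check could be completed along these lines. This is a sketch only; `has_format` is a hypothetical helper name, and the lengths 3 and 5 are read off the /###/##### format in the question. Splitting "/ABC/EF213" on "/" gives an empty first part (the text before the leading slash) followed by the two segments.

```python
def has_format(text, first_len=3, second_len=5):
    """Check for '/' + first_len characters + '/' + second_len characters."""
    parts = text.split("/")
    return (
        len(parts) == 3
        and parts[0] == ""  # nothing before the leading slash
        and len(parts[1]) == first_len
        and len(parts[2]) == second_len
    )

your_list = ["/ABC/EF213", "/ABC/EF", "/ABC/12AC4", "/ABC/212"]
print([text for text in your_list if has_format(text)])
```

Unlike the regex approach, this does not restrict which characters appear in each part, only how many there are.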

How can I transform the string of characters back into words?

I've been trying to learn Python for the past two months or so, but I'm really only now getting my hands dirty with it, so I thank you in advance for your patience and insight.
I was working on a project where I was cleaning the names in a dataset. That means filtering out the names of the apps that have foreign characters (that is to say, ord(character) > 127).
However, it turns out that this approach removed too many legitimate apps since the emojis in those were coming back as out of that range.
The workaround is to allow up to one foreign character. So it's pretty straightforward for that part; I can simply scan the characters of the names in each list. The part I'm having trouble with is telling Python where in the loop to add a name to the "cleaned" list (the final version of app names having <=1 such character). (The requirements are actually different in my project, but I'm trying to keep it as simple as possible in this example.)
To simplify the problem a bit, I was working on a dummy list. I have included that for you.
Where do I add the code so that, after the final iteration over each name, the name is appended to the list called cleanedNameList only if it has <=1 foreign character?
When I've tried appending a 'clean' name to the list before (a name that had <=1 foreign characters in it), it also sometimes added the ones with more than three foreign characters. I think this is due in part to me not knowing where to put the exception counter.
nameList = ['うErick', 'とうきhine', 'Charliと']
cleanedNameList = []
exceptions = 0
for name in nameList:
    print('New name', name, 'being evaluated!')
    exceptions = 0
    for char in name:
        print(char, 'being evaluated')
        ascii_value = ord(char)
        if ascii_value < 127:
            continue
        elif ascii_value > 127:
            exceptions+=1
            print(exceptions, 'exception(s) added for', name)
    #where would I add append.cleanedNamesList(name) ?
So, TL;DR: how do I scan a list of names, and once done scanning the list, add those names to a new list only IF they have <=1 foreign character.
def canAllow(s):
    return sum((1 for char in s if ord(char)>127), 0) <= 1
cleanList = [name for name in nameList if canAllow(name)]
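To answer the "where does the append go" part directly: the check belongs after the inner loop over characters has finished, but still inside the outer loop over names, with the counter reset at the start of each name. A minimal sketch of that placement, using snake_case variants of the question's names:

```python
name_list = ['うErick', 'とうきhine', 'Charliと']
cleaned_name_list = []

for name in name_list:
    exceptions = 0  # reset for every name
    for char in name:
        if ord(char) > 127:
            exceptions += 1
    # The inner loop is done here, so the count for this name is final.
    if exceptions <= 1:
        cleaned_name_list.append(name)

print(cleaned_name_list)  # ['うErick', 'Charliと']
```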

Entering a word and then searching through an array of words to find the word

I am trying to create a program which checks to see if words entered (when run) are in an array. I would like to use a loop for this.
I have created a list of words and tried a for loop however the code is proving to be erroneous.
def Mylist():
    Mylist= [Toyota,BMW,Pontiac,Cadillac,Ford,Opel]
Search=input("Enter a word")
Mylist[1]="Toyota"
for loop in range (1,6):
    if Mylist[loop]==Search:
        print("found")
        break
I have repeated line 4 for the other car manufacturers.
TypeError: 'function' object does not support item assignment
First, here are some recommendations to start:
Indentation in Python is IMPORTANT, so be careful to have the right indentation. You must take special care when posting code here on SO so that your code does not look like complete gibberish.
You should read up on naming conventions. TL;DR: we use snake_case for naming functions and variables.
If you are not using an IDE (such as PyCharm), or something similarly intelligent that shows information about functions, you should always check the documentation (it is beautiful).
Check out the difference between "Toyota" and Toyota. The first one has quotes, so it is a string (i.e. a chain of characters), a built-in type like integers and booleans; the second is a name that has to be evaluated, so it must be defined beforehand, as variables, functions and classes are.
Search the docs to see whether there is a built-in function that already does the job you want.
Check out return values in functions. A function call evaluates to None when the function does not explicitly return a value.
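The last point can be seen in a small sketch. Both function names here are hypothetical and only serve to contrast a function that prints with one that returns:

```python
def search_without_return(word, brands):
    # Prints a message, but never returns anything explicitly.
    if word in brands:
        print("found")

def search_with_return(word, brands):
    # Hands the result back to the caller.
    return word in brands

brands = ["Toyota", "BMW"]
print(search_without_return("BMW", brands))  # prints "found", then None
print(search_with_return("BMW", brands))     # prints True
```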
Now to your question. As some people pointed out, there is the in keyword that does exactly what you want.
CAR_BRANDS = ["Toyota", "BMW", "Pontiac", "Cadillac", "Ford", "Opel"]
def check_car():
    word = input("Enter a word: ")
    if word in CAR_BRANDS:
        print("found")
        return True
    print("not found")
    return False
If you don't care about the print you can just do return word in CAR_BRANDS
If you actually want to challenge yourself to write the logic, you were right in choosing a for-loop to iterate over the list.
Indexing in Python starts from 0, so range(1, 6) does not give you all the indexes of your list: you are missing index 0. Also, we don't like magic numbers; instead of hard-coding the length of your list of car brands, compute it!
for i in range(len(CAR_BRANDS)):
    if CAR_BRANDS[i] == word:
        print("found")
But even better you can directly iterate over the items in your list, no need to do the range, which will give you something like:
CAR_BRANDS = ["Toyota", "BMW", "Pontiac", "Cadillac", "Ford", "Opel"]
def check_car():
    word = input("Enter a word: ")
    for brand in CAR_BRANDS:
        if brand == word:
            print("found")
            return True
    print("not found")
    return False
If you have any more questions, do not hesitate! Happy coding.

Python whois like function

Okay so I have a file called 'whois.txt' which contains
["96363612", "#a2743, coil, charge"]
["12101258", "#a0272, climate, vault"]
["83157521", "sith"]
["33907120", "#a1321, missile, wired"]
["55553768", "#a2722, legal, illegal"]
["22686400", "#a5619, mindless, #a5637, bank"]
["97436430", "jedi, #a5770, charge, lantern, #a9491, legal"]
["91645905", "sith"]
["89514799", "lantern, #a2563, #a2693"]
["19658307", "Umbrechu"]
["56112504", "#a0473, lantern, kryptonian"]
["12195491", "riyoken"]
["53281943", "#a5135, gateway, jedi"]
["76515035", "#a4023, gateway, wired"]
["79444876", "#a2716, loyalty"]
What I'm doing here is storing each line as JSON, with the first number as an ID and the accounts associated with that ID joined by ', '. Using Python, I am using this code to try to get all the accounts that are associated:
import json
def getWhois(self):
    # Each line of whois.txt is a JSON list: [id, "name1, name2, ..."].
    x = []
    with open('whois.txt', 'r') as f:
        for line in f:
            rid, names = json.loads(line.strip())
            x.append([rid, names])
    return x
def recvWhois(self, user):
    returned = self.getWhois()
    x = []
    for data in returned:
        rid, names = data[0], data[1]
        # Substring test against the whole comma-separated string.
        if user in names:
            x.append(names)
    matches = list(set(', '.join(x).split(', ')))
    return matches
That returns the matches for the user you are searching for, but I also want to search for the users in those matches. I have done this, but it feels like I would have to repeat the search an unbounded number of times on each new set of matches. For example, self.recvWhois('missile') returns ['missile', 'wired', '#a1321']; I would then search all of those accounts to link more, and so on, x times depending on how many matches are linked to the previously matched accounts. If any of you have a solution to my problem, it would be very appreciated.
First, I would suggest maintaining an index for searching. You could use a search engine, but a Python dictionary can also serve as a poor man's search engine. The idea is to have an inverted index in which each username points to the records it belongs to. To find all linked accounts, you can write a memoized recursive function, which cuts off the otherwise infinite recursive paths. If you have a large number of records, you can also limit the recursion to a predefined maximum depth.
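One way to sketch the inverted-index idea is shown below. It is not the only possible implementation: instead of memoized recursion it uses an iterative breadth-first traversal with a `seen` set, which serves the same purpose of never revisiting a name. The function names `build_index` and `linked_accounts` are hypothetical, the sample lines are a few records from the question, and names are matched exactly rather than by substring as in the original `recvWhois`.

```python
import json
from collections import defaultdict, deque

# A few records in the same one-JSON-list-per-line format as whois.txt.
LINES = [
    '["33907120", "#a1321, missile, wired"]',
    '["76515035", "#a4023, gateway, wired"]',
    '["53281943", "#a5135, gateway, jedi"]',
    '["83157521", "sith"]',
]

def build_index(lines):
    """Return (records, index): rid -> set of names, and name -> set of rids."""
    records = {}
    index = defaultdict(set)
    for line in lines:
        rid, names = json.loads(line)
        records[rid] = set(names.split(", "))
        for name in records[rid]:
            index[name].add(rid)
    return records, index

def linked_accounts(user, records, index):
    """Collect every name reachable from user through shared records."""
    seen = {user}
    queue = deque([user])
    while queue:
        current = queue.popleft()
        for rid in index.get(current, ()):
            for name in records[rid]:
                if name not in seen:  # each name is expanded only once
                    seen.add(name)
                    queue.append(name)
    return seen

records, index = build_index(LINES)
print(sorted(linked_accounts("missile", records, index)))
```

Because each name enters the queue at most once, the search terminates even when accounts link back to each other.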
It is really hard to tell what you are trying to do, but I think you are making it too complicated. Your data structure lends itself to a dictionary. Why not load it using rid as the key and names as the values?
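Loading the file into a dictionary could look roughly like this. It is a sketch under the assumption that every line is a two-element JSON list as shown in the question; `load_whois` is a hypothetical name, and the lines are passed in as a list here so the example runs without the file (an open file object can be iterated the same way).

```python
import json

SAMPLE_LINES = [
    '["33907120", "#a1321, missile, wired"]',
    '["83157521", "sith"]',
]

def load_whois(lines):
    """Map each rid to the list of account names on its line."""
    whois = {}
    for line in lines:
        rid, names = json.loads(line.strip())
        whois[rid] = names.split(", ")
    return whois

whois = load_whois(SAMPLE_LINES)
print(whois["33907120"])  # ['#a1321', 'missile', 'wired']
```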
