Forming a target string using minimal substrings of words from a word list - python-3.x

I have a text file with first names but there are new names added every year.
I need a program in Python that takes parts of names from the text file and finds some combination of substrings of these names that can be concatenated to create a string that matches a user's input.
The program should do this using the fewest possible available names from the text file.
For example, if the text file contains this:
Joppe
Fien
Katrijn
Sven
Kobe
The program asks for a name that isn't already in the text file. For example:
Please fill in a name: Katrien
Then it should print this:
Katri => Katrijn
ien => Fien
Not like this--it builds the name correctly, but there is a better solution that uses fewer words:
K => Kobe
a => Katrijn
tr => Katrijn
ien => Fien
If the text file contains this:
Joppe
Fien
Makatrijn
Sven
Kobe
It could also print this:
Katr => Makatrijn
ien => Fien
I tried this but with no result:
name_input = input('Fill in a name: ')
with open('namen.txt', 'r') as file:
    for name in file.readlines():
        for letter_name_input in name_input:
            for letter in name:
                if letter == letter_name_input:
                    print(letter)

You can use a recursive generator function that takes a target name and a set of names as input. For each prefix of the target name, from the longest to the shortest, it finds the names that contain that prefix. For each matching name, it recursively finds the names that would form the rest of the target name (with the prefix removed) from the remaining names (with the matching name removed), and yields each of the returned combinations with the current prefix and name prepended as a tuple:
def form_name(target, names):
    if target:
        for i in range(len(target), 0, -1):
            prefix = target[:i]
            matching_names = [name for name in names if prefix.lower() in name.lower()]
            if matching_names:
                for name in matching_names:
                    for fragments in form_name(target[i:], names - {name}):
                        yield [(prefix, name), *fragments]
    else:
        yield []
so that you can use the min function with len as the key function to obtain the combination with the fewest names:
from io import StringIO
file = StringIO('''Joppe
Fien
Katrijn
Sven
Kobe''')
for fragment, name in min(form_name('Katrien', set(file.read().split())), key=len):
    print(fragment, '=>', name)
outputs:
Katri => Katrijn
en => Fien
Demo: https://repl.it/repls/IllustriousTrustingIntegrationtesting
Note that both Fien and Sven in your example input would match the en fragment and make for valid answers with the fewest names, so the min function would arbitrarily return one of them (which is fine per your requirement). Also note that you shouldn't expect the fragments of the target name to overlap, so instead of ien the second fragment should be en after the first fragment Katri is removed from the target name Katrien.
If you're interested in seeing all the valid answers, you can calculate the minimum length of all the combinations first and then output all the combinations with the minimum length:
file.seek(0)  # rewind, in case an earlier file.read() already consumed the stream
combinations = list(form_name('Katrien', set(file.read().split())))
min_len = min(map(len, combinations))
for combination in combinations:
    if len(combination) == min_len:
        for fragment, name in combination:
            print(fragment, '=>', name)
        print()
This outputs:
Katri => Katrijn
en => Sven
Katri => Katrijn
en => Fien
Katr => Katrijn
ien => Fien

Assuming you'd want to stop searching as soon as you find a shortest answer, here's my solution:
First you need a function to break the word into all possible parts starting from the biggest possible set:
def breakWord(word, n):
    result = []  # renamed from "list" to avoid shadowing the built-in
    for k in range(len(word)):
        subword = word[k:]
        out = [subword[i:i+n] for i in range(0, len(subword), n)]
        if k > 0:
            out.append(word[:k])
        result.append(out)
    return result
Notice that if you use:
breakWord(yourWord, len(yourWord)-1)
It will break the word into all possible sets of two parts.
Then a function to check if a given string is in the list of names:
def isInNames(word):
    # name_list is assumed to be a list holding the names read from the text file
    for name in name_list:
        if word in name:
            return True
    return False
Finally, iterate over all the possible combinations of parts:
def findWordCombination(word):
    resultList = []
    resultSize = 50  # Something large to ensure it gets changed
    for i in range(len(word) - 1, 0, -1):  # Goes from the largest part size to the smallest
        testSet = breakWord(word, i)
        for parts in testSet:
            isValid = True  # assumes True at first
            for part in parts:
                if not isInNames(part):
                    isValid = False
            # Once all parts of the set are checked, we know
            # whether the set is valid, i.e. a valid combination.
            if isValid and len(parts) < resultSize:
                resultSize = len(parts)
                resultList = parts
    return resultList
This returns the first valid set it finds with the fewest parts of your search query. You can tweak it to also store the names from the list that yielded each part of the resulting set.
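The tweak mentioned above could look roughly like the sketch below. The helper name names_for_parts is hypothetical, and unlike isInNames it matches case-insensitively, which is an assumption made so that a part such as "katr" would still match "Makatrijn".

```python
def names_for_parts(parts, name_list):
    """Pair each part with the first name that contains it.

    The match is case-insensitive (an assumption); the name is None
    when no name in name_list contains the part.
    """
    result = []
    for part in parts:
        match = next(
            (name for name in name_list if part.lower() in name.lower()),
            None,
        )
        result.append((part, match))
    return result


# Hypothetical data: the names from the question's example file.
name_list = ['Joppe', 'Fien', 'Katrijn', 'Sven', 'Kobe']
for part, name in names_for_parts(['Katri', 'en'], name_list):
    print(part, '=>', name)
```

Because it returns only the first containing name, a part such as "en" is reported for Fien even though Sven would do as well.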

Yet another approach (I already upvoted @blhsing's recursive solution; it is very elegant and I love it):
import itertools as it
from collections import defaultdict
def get_all_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j+1] for i in range(length) for j in range(i, length)]
names = ['Joppe', 'Fien', 'Katrijn', 'Sven', 'Kobe']
d = defaultdict(list)  # each key is a substring of any of the names and the value is the list of names that contain it
for name in names:
    for subname in get_all_substrings(name):
        d[subname].append(name)
input_name = 'Katrien'
input_subs = get_all_substrings(input_name)
sub_combs = [it.combinations(input_subs, n) for n in range(1, len(input_name))]
whole_combs = [el for co in sub_combs for el in co if ''.join(el) == input_name]  # those combs that can form the input name
saved = [wc for wc in whole_combs if all((c in d for c in wc))]  # those whole combinations that actually appear
shortest_comb = min(saved, key=len)
shortest_sub_and_name = [(s, d[s]) for s in shortest_comb]
for s, ns in shortest_sub_and_name:
    print(f"{s} => {ns}")
produces
Katr => ['Katrijn']
ien => ['Fien']
Note: as you can see, the output shows all the names that can contribute to each specific substring

you could try:
import difflib
name = input('Please fill in a name: ')
with open('namen.txt', 'r') as file:
    file_data = file.readlines()
# either you are looking for
print([i for i in file_data if difflib.SequenceMatcher(a=i, b=name).ratio() >= 0.5])
# or you are looking for
print(difflib.get_close_matches(name, file_data, len(file_data), 0.5))
['Katrijn\n', 'Fien\n']
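The trailing \n in that output appears because readlines() keeps the line endings. A minimal sketch that strips them first is shown below; the function name close_names and the in-memory lines list are assumptions used so the example runs without a file. Note that this ranks whole names by similarity and does not return fragments.

```python
import difflib


def close_names(target, lines, cutoff=0.5):
    """Return the names in lines whose similarity to target is at least cutoff."""
    # Strip line endings and skip blank lines before comparing.
    names = [line.strip() for line in lines if line.strip()]
    return difflib.get_close_matches(target, names, n=len(names), cutoff=cutoff)


# Hypothetical stand-in for file.readlines() on the question's example file.
lines = ['Joppe\n', 'Fien\n', 'Katrijn\n', 'Sven\n', 'Kobe\n']
matches = close_names('Katrien', lines)
print(matches)
```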

Related

How to split strings from .txt file into a list, sorted from A-Z without duplicates?

For instance, the .txt file includes 2 lines, separated by commas:
John, George, Tom
Mark, James, Tom,
Output should be:
[George, James, John, Mark, Tom]
The following will create the list and store each item as a string.
def test(path):
    filename = path
    with open(filename) as f:
        f = f.read()
    f_list = f.split('\n')
    for i in f_list:
        if i == '':
            f_list.remove(i)
    res1 = []
    for i in f_list:
        res1.append(i.split(', '))
    res2 = []
    for i in res1:
        res2 += i
    res3 = [i.strip(',') for i in res2]
    for i in res3:
        if res3.count(i) != 1:
            res3.remove(i)
    res3.sort()
    return res3
print(test('location/of/file.txt'))
Output:
['George', 'James', 'John', 'Mark', 'Tom']
Your file opening is fine, although the 'r' is redundant since that's the default. You claim it's not, but it is. Read the documentation.
You have not described what task is so I have no idea what's going on there. I will assume that it is correct.
Rather than populating a list and doing a membership test on every iteration - which is O(n^2) in time - can you think of a different data structure that guarantees uniqueness? Google will be your friend here. Once you discover this data structure, you will not have to perform membership checks at all. You seem to be struggling with this concept; the answer is a set.
The input data format is not rigorously defined. Separators may be commas or commas with trailing spaces, and may appear (or not) at the end of the line. Consider making an appropriate regular expression and using its splitting feature to split individual lines, though normal splitting and stripping may be easier to start.
In the following example code, I've:
ignored task since you've said that that's fine;
separated actual parsing of file content from parsing of in-memory content to demonstrate the function without a file;
used a set comprehension to store unique results of all split lines; and
passed a generator expression that drops empty strings to sorted.
from io import StringIO
from typing import TextIO, List
def parse(f: TextIO) -> List[str]:
    words = {
        word.strip()
        for line in f
        for word in line.split(',')
    }
    return sorted(
        word for word in words if word != ''
    )
def parse_file(filename: str) -> List[str]:
    with open(filename) as f:
        return parse(f)
def test():
    f = StringIO('John, George , Tom\nMark, James, Tom, ')
    words = parse(f)
    assert words == [
        'George', 'James', 'John', 'Mark', 'Tom',
    ]
    f = StringIO(' Han Solo, Boba Fet \n')
    words = parse(f)
    assert words == [
        'Boba Fet', 'Han Solo',
    ]
if __name__ == '__main__':
    test()
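The regular-expression splitting suggested above could be sketched as follows. The separator pattern (a comma with optional whitespace on either side) and the helper name split_line are assumptions about the loosely defined input format, not something the question specifies.

```python
import re

# Assumed separator: a comma with any amount of whitespace around it.
SEPARATOR = re.compile(r'\s*,\s*')


def split_line(line):
    """Return the non-empty fields of one comma-separated line."""
    return [field for field in SEPARATOR.split(line.strip()) if field]


print(split_line('Mark, James, Tom,\n'))
print(split_line(' Han Solo, Boba Fet \n'))
```

The trailing comma produces an empty final field, which the list comprehension drops.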
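I came up with a very simple solution, in case anyone needs it:
def unique_sorted(x):  # hypothetical name: the original snippet omitted the def line; x is an open file
    lines = x.read().split()
    lines.sort()
    new_list = []
    for word in lines:
        if word not in new_list:
            new_list.append(word)
    return new_list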
with open("text.txt", "r") as fl:
    list_ = set()
    for line in fl.readlines():
        line = line.strip("\n")
        line = line.split(",")
        [list_.add(_) for _ in line if _ != '']
print(list_)
I think that you missed a comma after Jim in the first line.
You can avoid the loop by using the split method:
content = file.read()
my_list = content.split(",")
To remove the duplicates from your list, you can convert it to a set:
my_list = list(set(my_list))
Then you can sort it using sorted, so the final code is:
with open("file.txt", "r") as file:
    content = file.read()
my_list = content.replace("\n", "").replace(" ", "").split(",")
result = sorted(set(my_list))
You can also pass a key to sorted to change the ordering.
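As a small illustration of the key argument: by default, uppercase letters sort before lowercase ones, and key=str.lower gives a case-insensitive order instead. The sample names below are made up for the example.

```python
# Hypothetical mixed-case names.
names = ['tom', 'George', 'john', 'Mark', 'James']

default_order = sorted(names)                    # uppercase letters sort first
case_insensitive = sorted(names, key=str.lower)  # compare lowercased copies

print(default_order)
print(case_insensitive)
```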

How to print many variables' names and their values

I have a big chunk of JSON. I assign the values I need to more than 10 variables. Now I want to print every one of them as variable_name = value using print. How can I accomplish this?
The expected output is as follows:
variable_name_1 = car
variable_name_2 = house
variable_name_3 = dog
Updated my code example
import json
leagues = open("../forecast/endpoints/leagues.txt", "r")
leagues_json = json.load(leagues)
data_json = leagues_json["api"]["leagues"]
for item in data_json:
    league_id = item["league_id"]
    league_name = item["name"]
    coverage_standings = item["coverage"]["standings"]
    coverage_fixtures_events = item["coverage"]["fixtures"]["events"]
    coverage_fixtures_lineups = item["coverage"]["fixtures"]["lineups"]
    coverage_fixtures_statistics = item["coverage"]["fixtures"]["statistics"]
    coverage_fixtures_players_statistics = item["coverage"]["fixtures"]["players_statistics"]
    coverage_players = item["coverage"]["players"]
    coverage_topScorers = item["coverage"]["topScorers"]
    coverage_predictions = item["coverage"]["predictions"]
    coverage_odds = item["coverage"]["odds"]
    print("leagueName:", league_name,
          "coverageStandings:", coverage_standings,
          "coverage_fixtures_events:", coverage_fixtures_events,
          "coverage_fixtures_lineups:", coverage_fixtures_lineups,
          "coverage_fixtures_statistics:", coverage_fixtures_statistics,
          "coverage_fixtures_players_statistics:", coverage_fixtures_players_statistics,
          "coverage_players:", coverage_players,
          "coverage_topScorers:", coverage_topScorers,
          "coverage_predictions:", coverage_predictions,
          "coverage_odds:", coverage_odds)
Since you have the JSON data loaded as Python objects, you should be able to use regular loops to deal with at least some of this.
It looks like you're adding underscores to indicate nesting levels in the JSON object, so that's what I'll do here:
import json
leagues = open("../forecast/endpoints/leagues.txt", "r")
leagues_json = json.load(leagues)
data_json = leagues_json["api"]["leagues"]
def print_nested_dict(data, *, sep='.', context=''):
    """Print a dict, prefixing all values with their keys,
    and joining nested keys with 'sep'.
    """
    for key, value in data.items():
        if context:
            key = context + sep + key
        if isinstance(value, dict):
            print_nested_dict(value, sep=sep, context=key)
        else:
            print(key, ': ', value, sep='')
for item in data_json:  # the question iterates data_json as a list of dicts
    print_nested_dict(item, sep='_')
If there is other data in data_json that you do not want to print, the easiest solution might be to add a variable listing the names you want, then add a condition to the loop so it only prints those names.
def print_nested_dict(data, *, sep='.', context='', only_print_keys=None):
    ...
    for key, value in data.items():
        if only_print_keys is not None and key not in only_print_keys:
            continue  # skip ignored elements
        ...
That should work fine unless there is a very large amount of data you're not printing.
If you really need to store the values in variables for some other reason, you could assign to global variables if you don't mind polluting the global namespace.
def print_nested_dict(...):
    ...
        else:
            name = key  # key already includes its context prefix
            print(name, ': ', value, sep='')
            globals()[name] = value
    ...
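If polluting the global namespace is a concern, one alternative (a different technique from the globals() approach above, not part of the original answer) is to collect the values into a single flat dict. The function name flatten_dict and the sample item are hypothetical.

```python
def flatten_dict(data, sep='_', context=''):
    """Return a flat dict whose keys join the nested keys with sep."""
    flat = {}
    for key, value in data.items():
        name = context + sep + key if context else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the joined key as context.
            flat.update(flatten_dict(value, sep=sep, context=name))
        else:
            flat[name] = value
    return flat


# Hypothetical item shaped like the data in the question.
item = {
    'name': 'Premier League',
    'coverage': {'standings': True, 'fixtures': {'events': False}},
}
values = flatten_dict(item)
for name, value in values.items():
    print(name, '=', value)
```

The values can then be looked up by name, for example values['coverage_standings'], without creating any global variables.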

Split the file and further split specific index

I have a big configuration file that needs to be split per hierarchy based on specific syntax; that part is done. On a specific string match I pull out the matching items, and each of those needs to be split further on the regex "\s{8}exit", which is not working:
import re
def config_readFile(filename):
    F = open(filename, "r")
    s = F.read()
    F.close()
    return re.split("#\-{50}", s)
    return s
C = config_readFile('bng_config_input.txt')
print('printing C:', C)
print(C.__len__())
vAAA_FIXED = []
for i in C:
    if "CRS-1-SUB1" in i:
        vAAA_FIXED.append(i)
# print (vAAA_FIXED)
print(vAAA_FIXED.__len__())
print(vAAA_FIXED)
vAAA_FIXED = vAAA_FIXED.split(" ")
print(vAAA_FIXED.__len__())
Need to get new list from the original list
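The call vAAA_FIXED.split(" ") fails because vAAA_FIXED is a list, and lists have no split method; only the strings inside it can be split. A minimal sketch of one way to get the new list is to apply re.split to each matching string. The function name split_sections and the sample fragments are hypothetical, and stripping the pieces and dropping empty ones is an assumption about what is wanted.

```python
import re


def split_sections(chunks, marker, pattern=r"\s{8}exit"):
    """Split every chunk that contains marker on pattern; return one flat list."""
    pieces = []
    for chunk in chunks:
        if marker in chunk:
            # Strip surrounding whitespace and drop empty pieces (an assumption).
            pieces.extend(
                piece.strip()
                for piece in re.split(pattern, chunk)
                if piece.strip()
            )
    return pieces


# Hypothetical fragments, not taken from the real configuration file.
chunks = [
    "CRS-1-SUB1 a\n        exit\nb\n        exit",
    "other section",
]
sections = split_sections(chunks, "CRS-1-SUB1")
print(sections)
```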

Extract characters within certain symbols

I have extracted text from an HTML file, and have the whole thing in a string.
I am looking for a method to loop through the string, and extract only values that are within square brackets and put strings in a list.
I have looked in to several questions, among them this one: Extract character before and after "/"
But I am having a hard time modifying it. Can someone help?
Solved!
Thank you for all your input. I will definitely look more into regex. I managed to do what I wanted in a pretty manual way (it may not be beautiful):
# remove all HTML code and append to string
html_string = ''
for i in html_file:
    html_string += str(html2text.html2text(i))
# set this boolean if the current character is either [ or ]
add = False
clean_string = ''
# extract only values within [ or ], based on add = T/F
for i in html_string:
    if i == '[':
        add = True
    if i == ']':
        add = False
        clean_string += str(i)
    if add == True:
        clean_string += str(i)
# split string into list without square brackets
clean_string_list = clean_string.split('][')
The HTML file that I want as pure text (and later as a dataframe) is my personal Facebook data that I have downloaded.
Try this regex: given a string, it puts all the text inside [ ] into a list.
import re
print(re.findall(r'\[(\w+)\]', 'spam[eggs][hello]'))  # ['eggs', 'hello']
Also this is a great reference for building your own regex.
https://regex101.com
EDIT: If you have nested square brackets here is a function that will handle that case.
import re
test = 'spam[eg[nested]gs][hello]'
def square_bracket_text(test_text, found):
    """Find text enclosed in square brackets within a string"""
    matches = re.findall(r'\[(\w+)\]', test_text)
    if matches:
        found.extend(matches)
        for word in found:
            test_text = test_text.replace('[' + word + ']', '')
        square_bracket_text(test_text, found)
    return found
match = []
print(square_bracket_text(test, match))  # ['nested', 'hello', 'eggs']
hope it helps!
You can also use re.finditer() for this; see the example below.
Suppose we have word characters inside the brackets, so the regular expression is \[\w+\].
If you wish, check it at https://rextester.com/XEMOU85362.
import re
s = "<h1>Hello [Programmer], you are [Excellent]</h1>"
g = re.finditer(r"\[\w+\]", s)  # raw string, so the backslashes reach the regex engine unchanged
l = list()  # or, l = []
for m in g:
    text = m.group(0)
    l.append(text[1:-1])
print(l)  # ['Programmer', 'Excellent']
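Note that \w+ only matches letters, digits and underscores, so bracketed text containing spaces or punctuation is skipped. A sketch of a more permissive pattern, assuming you want everything between the innermost pair of brackets, is shown below; the function name bracket_contents is hypothetical.

```python
import re


def bracket_contents(text):
    """Return the text inside each innermost [ ... ] pair, spaces included."""
    # [^\[\]]+ matches one or more characters that are not square brackets.
    return re.findall(r'\[([^\[\]]+)\]', text)


print(bracket_contents('spam[two words][hello-world] [x]'))
```

With nested brackets, this returns only the innermost text; the outer level still needs an approach like the recursive function above.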

Trying to make a phone book using python

I am trying to make a phone book using these instructions
Write a program that creates 2 lists: one of names and one of phone numbers. Give these variables appropriate names (for example names and numbers). Using a for loop, have the user enter 3 names and 3 numbers of people for the phone book. Next: display the entries from the phone book, name and then number. Use a for loop. Next, ask the user to enter a name. Store their input in a variable. Use a search to see if the name is entered in the name list. If the name is in the name list, print the number. If not, have the program respond, “Name not found.”
Your output should look like:
Name Number
sally 11
bob 22
carl 33  
Number you are looking for is: 11
All I want to know is how to make a simple list out of user-entered data, so that I can do this exercise.
Pseudocode is
#LOOP THREE TIMES
#    names = GET INPUT name
#    numbers = GET INPUT number
#END LOOP
#LOOP THREE TIMES
#    PRINT (name) in names, (number) in numbers
#END LOOP
#searchName = GET INPUT "Enter a name for Search"
#IF searchName IN names THEN
#    PRINT matching number
#    LOOP names
#        IF searchName == name THEN
#            foundIndex = name(index)
#            searchPhoneNumber = phoneNumber[foundIndex]
#        END IF
#    END LOOP
#    PRINT searchPhoneNumber
#ELSE
#    PRINT "Name Not Found"
#END IF
Use this:
names = []
phone_numbers = []
num = 3
for i in range(num):
    name = input("Name: ")
    phone_number = input("Phone Number: ")  # to convert to int => int(input("Phone Number: "))
    names.append(name)
    phone_numbers.append(phone_number)
print("\nName\t\t\tPhone Number\n")
for i in range(num):
    print("{}\t\t\t{}".format(names[i], phone_numbers[i]))
search_term = input("\nEnter search term: ")
print("Search result:")
if search_term in names:
    index = names.index(search_term)
    phone_number = phone_numbers[index]
    print("Name: {}, Phone Number: {}".format(search_term, phone_number))
else:
    print("Name Not Found")
To add a name or number to the appropriate list, use the append method, i.e.
numberlist.append(number_that_was_input)
or
namelist.append(name_that_was_input)
and as @cricket007 so eloquently states, we do like to see that you at least try to do things for yourself.
To receive input from the user, use the input() function.
Example:
name = input('type in name')
print(name)
#Outputs the name you typed.
To add that value to a list, use append.
Example:
my_list = [] #Initialize list first.
my_list.append(name) # this will add the contents of variable name to your list.
# my_list now looks like this: ["user817205"]
Since you have to do this 3 times, it's smart to use a for loop. You can repeat a block of code 3 times like this:
for _ in range(3):
    # type the code you want to repeat 3 times here!
    pass  # placeholder so that the loop body is not empty
P.S.: Remember that you only need to initialize your list once, so keep my_list = [] outside the for loop.
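Putting the display and search steps together, a minimal sketch could look like the following. It uses fixed sample data instead of input() so it runs unattended, and the helper name lookup is hypothetical; zip pairs each name with the number at the same index.

```python
def lookup(names, numbers, search_name):
    """Return the number stored at the same index as search_name, or None."""
    if search_name in names:
        return numbers[names.index(search_name)]
    return None


# Sample data standing in for the three entries the user would type.
names = ['sally', 'bob', 'carl']
numbers = ['11', '22', '33']

print('Name', 'Number', sep='\t')
for name, number in zip(names, numbers):
    print(name, number, sep='\t')

result = lookup(names, numbers, 'sally')
if result is None:
    print('Name not found.')
else:
    print('Number you are looking for is:', result)
```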
