I am trying to search for similar words in Python given a wildcard pattern. For example, in a text file similar to a dictionary, I could search for r?v?r? and the correct output would be words such as 'rover', 'raver', 'river'.
This is the code I have so far, but it only works when I type in the full word and not the wildcard form.
import re

name = input("Enter the name of the words file:\n")
pattern = input("Enter a search pattern:\n")
textfile = open(name, 'r')
filetext = textfile.read()
textfile.close()
match = re.findall(pattern, filetext)
if match is True:
    print(match)
else:
    print("Sorry, matches for ", pattern, "could not be found")
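Use dots for the blanks. In a regular expression, a dot matches any single character.
import re

name = input("Enter the name of the words file:\n")
pattern = input("Enter a search pattern:\n")
with open(name, 'r') as textfile:
    filetext = textfile.read()
# each '.' stands in for one '?' of the wildcard pattern
match = re.findall('r.v.r', filetext)
if match:
    print(match)
else:
    print("Sorry, matches for", pattern, "could not be found")
Also, re.findall returns a list (empty when nothing matches), and a list is never the object True, so "if match is True" always takes the else branch. Instead you want to do
if match: or if len(match) > 0:
whichever one suits your code.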
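To make the dot substitution automatic, the user's wildcard can be translated into a regular expression before searching. The following is only a sketch under some assumptions: the helper names wildcard_to_regex and find_wildcard_matches are made up for illustration, '?' is taken to mean "exactly one word character", '*' to mean "any run of word characters", and matches are limited to whole words with \b.

```python
import re

def wildcard_to_regex(pattern):
    # Hypothetical helper: translate a '?'/'*' wildcard into a regex.
    parts = []
    for ch in pattern:
        if ch == '?':
            parts.append(r'\w')        # exactly one word character
        elif ch == '*':
            parts.append(r'\w*')       # zero or more word characters
        else:
            parts.append(re.escape(ch))  # keep every other character literal
    # \b on both sides restricts matches to whole words
    return r'\b' + ''.join(parts) + r'\b'

def find_wildcard_matches(pattern, text):
    # Return every whole word in text that fits the wildcard pattern.
    return re.findall(wildcard_to_regex(pattern), text, flags=re.IGNORECASE)

sample = "rover raver river revert driver"
print(find_wildcard_matches("r?v?r", sample))
```

Because of the word boundaries, "revert" and "driver" are not reported even though they contain a five-letter stretch that fits the pattern.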
How do I write Python code that checks whether the user's input matches both a list of words and the contents of a file?
For example:
list = ["banana", "apple"]
file = open("file_path", "r")
search_word = input("Search the word you want to search: ")
if search_word in file.read() and search_word in list:
    print("Search word is in the list and in the file")
else:
    print("Search word does not match")
If the search word is not in the list, there is no need to search the file's contents for a match, since the first condition has already failed. Test that condition first and skip the file search when it is false.
You could do a simple string test to see whether the word is found in the file contents.
if search_word in data:
    print("match")
However, words that contain the search word will also match (e.g. pineapple would match with apple).
You can use a regular expression to check if the word is contained anywhere in the file. The \b metacharacter matches at the beginning or end of a word so for example apple won't match the word pineapple. The (?i) flag does a case-insensitive search so the word apple matches Apple, etc.
You can try something like this.
import re
list = ["banana", "apple"]
search_word = input("Search the word you want to search: ")
if search_word not in list:
    # if not in list then don't need to search the file
    found = False
else:
    with open("file_path", "r") as file:
        data = file.read()
    found = re.search(fr'(?i)\b{search_word}\b', data)
# now print results if there was a match or not
if found:
    print("Search word is in the list and in the file")
else:
    print("Search word does not match")
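The difference between the plain substring test and the \b version can be seen on a couple of in-memory strings. This is a small sketch, not part of the answer above: contains_word is a hypothetical helper name, and re.escape is added on the assumption that the search word should always be treated as literal text.

```python
import re

def contains_word(word, text):
    # \b anchors the match at word boundaries; re.escape treats the
    # user's input as literal text rather than as regex syntax.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print("apple" in "I like pineapple")               # substring test: True
print(contains_word("apple", "I like pineapple"))  # whole-word test: False
print(contains_word("apple", "An Apple a day"))    # ignores case: True
```

Passing flags=re.IGNORECASE has the same effect here as the inline (?i) flag.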
I am trying to read a file and replace all occurrences of a string in it. Most occurrences are lowercase, but one is capitalized. How do I process the file so that all occurrences are removed regardless of capitalization?
For reference this is the text I would like to edit and the word I would like to replace is "morning/Morning".
Text below:
"Good morning! / I was going to say you good morning / Good Afternoon Morning is when the sun comes up / I will call you in the morning"
See code below:
filename = input("Enter the filename: ")
stringToRemove = input("Enter the string to be removed: ")
infile = open(filename, 'r')
fileString = infile.read()
fileString = fileString.replace(stringToRemove, '')
infile.close()
outfile = open(filename, 'w')
outfile.write(fileString)
outfile.close()
print("Done")
You could use re.sub in case insensitive mode:
import re

filename = input("Enter the filename: ")
stringToRemove = input("Enter the string to be removed: ")
with open(filename, 'r') as infile:
    fileString = infile.read()
# re.escape keeps any regex metacharacters in the input literal
fileString = re.sub(r'\s*' + re.escape(stringToRemove) + r'\s*', ' ', fileString, flags=re.IGNORECASE).strip()
The output from your sample string here would be:
Good ! / I was going to say you good / Good Afternoon is when the sun comes up / I will call you in the
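The snippet above stops after the substitution, so for completeness here is a sketch of the whole round trip, including writing the result back as the question's code does. The function names remove_word and remove_word_from_file are invented for this example, and overwriting the file in place is an assumption carried over from the question.

```python
import re

def remove_word(text, word):
    # Replace each occurrence (any capitalization) together with its
    # surrounding whitespace by a single space, then trim the ends.
    pattern = r'\s*' + re.escape(word) + r'\s*'
    return re.sub(pattern, ' ', text, flags=re.IGNORECASE).strip()

def remove_word_from_file(filename, word):
    # Read the whole file, clean it, and overwrite it with the result.
    with open(filename, 'r') as infile:
        cleaned = remove_word(infile.read(), word)
    with open(filename, 'w') as outfile:
        outfile.write(cleaned)

sample = ("Good morning! / I was going to say you good morning / "
          "Good Afternoon Morning is when the sun comes up / "
          "I will call you in the morning")
print(remove_word(sample, "morning"))
```

Run on the sample text, this prints the same line as the output quoted above.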
I want to examine the hex content of a file.
with open("C:/python_tria/HEX/sample/test.zip", "rb+") as f:
    stri = str(f.read())
    sta = stri.find('this is where to start')
    end = stri.find('this is where to end')
My plan is to extract the part between 'sta' and 'end'.
How can I do that?
You could try using re.findall on the file text to find what you are looking for:
import re

with open("C:/python_tria/HEX/sample/test.zip", "rb+") as f:
    stri = str(f.read())
matches = re.findall(r'this is where to start.*?this is where to end', stri, flags=re.DOTALL)
print(matches[0])  # print the first match
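One caveat with the approach above: in Python 3, str() applied to bytes returns the printable representation (text of the form b'...' with escape sequences), not the raw data. As an alternative sketch, the search can be done directly on the bytes with a bytes pattern. The helper name extract_between and the START/END markers below are placeholders for illustration, not values from the question.

```python
import re

def extract_between(data, start, end):
    # Search the raw bytes for the shortest span between the two markers.
    # re.escape also accepts bytes, so the markers are matched literally.
    pattern = re.escape(start) + b'(.*?)' + re.escape(end)
    match = re.search(pattern, data, flags=re.DOTALL)
    return match.group(1) if match else None

# In-memory stand-in for the contents of a file opened with "rb"
data = b'junkSTART\x00\x01\xffENDmore'
payload = extract_between(data, b'START', b'END')
print(payload)        # the bytes between the markers
print(payload.hex())  # the same bytes as a hex string
```

The capture group returns only the part between the markers; bytes.hex() then gives a hex view of it.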
This code outputs the matching string once for every time it is in the file that is being searched (so I end up with a huge list if the string is there repeatedly). I only want to know if the strings from my list match, not how many times they match. I do want to know which strings match, so a True/False solution does not work. But I only want them listed once, each, if they match. I do not really understand what the pattern = '|'.join(keywords) part is doing - I got that from someone else's code to get my file to file matching working, but don't know if I need it. Your help would be much appreciated.
# declares the files used
filenames = ['//Katie/Users/kitka/Documents/appreport.txt', '//Dallin/Users/dallin/Documents/appreport.txt',
             '//Aidan/Users/aidan/Documents/appreport.txt']
# parses each file
for filename in filenames:
    # imports the necessary libraries
    import os, time, re, smtplib
    from stat import *  # ST_SIZE etc
    # finds the time the file was last modified and error checks
    try:
        st = os.stat(filename)
    except IOError:
        print("failed to get information about", filename)
    else:
        # creates a list of words to search for
        keywords = ['LoL', 'javaw']
        pattern = '|'.join(keywords)
        # searches the file for the strings in the list, sorts them and returns results
        results = []
        with open(filename, 'r') as f:
            for line in f:
                matches = re.findall(pattern, line)
                if matches:
                    results.append((line, len(matches)))
        results = sorted(results)
        # appends results to the archive file
        with open("GameReport.txt", "a") as f:
            for line in results:
                f.write(filename + '\n')
                f.write(time.asctime(time.localtime(st[ST_MTIME])) + '\n')
                f.write(str(line) + '\n')
Untested, but this should work. Note that this only keeps track of which words were found, not which words were found in which files. I couldn't figure out whether or not that's what you wanted.
import fileinput
filenames = [...]
keywords = ['LoL', 'javaw']
# a set is like a list but with no duplicates, so even if a keyword
# is found multiple times, it will only appear once in the set
found = set()
# iterate over the lines of all the files
for line in fileinput.input(files=filenames):
    for keyword in keywords:
        if keyword in line:
            found.add(keyword)
print(found)
EDIT
If you want to keep track of which keywords are present in which files, then I'd suggest keeping a set of (filename, keyword) tuples:
filenames = [...]
keywords = ['LoL', 'javaw']
found = set()
for filename in filenames:
    with open(filename, 'rt') as f:
        for line in f:
            for keyword in keywords:
                if keyword in line:
                    found.add((filename, keyword))
for filename, keyword in found:
    print('Found the word "{}" in the file "{}"'.format(keyword, filename))
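On the part of the question about pattern = '|'.join(keywords): it builds the single regular expression 'LoL|javaw', where | means "either alternative", so one re.findall call looks for all keywords at once. If you prefer to keep the regex, collecting the hits in a set gives the same "list each keyword once" result. This is a sketch on made-up sample lines, with re.escape added on the assumption that keywords should be matched literally.

```python
import re

keywords = ['LoL', 'javaw']
# Join the (escaped) keywords with '|' so the regex matches any of them
pattern = '|'.join(re.escape(keyword) for keyword in keywords)
print(pattern)

# Hypothetical report lines standing in for the contents of a file
lines = [
    "12:00 LoL started",
    "12:05 javaw started",
    "12:30 LoL still running, LoL using CPU",
    "13:00 notepad started",
]

found = set()
for line in lines:
    # findall returns every keyword hit on the line; the set drops repeats
    found.update(re.findall(pattern, line))

print(sorted(found))
```

Each keyword appears once in the result no matter how many lines mention it.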
def autocorrect(word):
    Break_Word = sorted(word)
    Sorted_Word = ''.join(Break_Word)
    return Sorted_Word
user_input = ""
while (user_input == ""):
    user_input = input("key in word you wish to enter: ")
user_word = autocorrect(user_input).replace(' ', '')
with open('big.txt') as myFile:
    for word in myFile:
        NewWord = str(word.replace(' ', ''))
        Break_Word2 = sorted(NewWord.lower())
        Sorted_Word2 = ''.join(Break_Word2)
        if (Sorted_Word2 == user_word):
            print("The word",user_input,"exist in the dictionary")
Basically, I have a dictionary of correctly spelled words in "big.txt". If the user's input matches a word in the dictionary, I print out a line.
I compare the two strings after sorting their characters.
However, I am not able to execute the line
if (Sorted_Word2 == user_word):
    print("The word",user_input,"exist in the dictionary")
When I hard-code other strings, like
if ("a" == "a"):
    print("The word",user_input,"exist in the dictionary")
it works. What is wrong with my code? How can I compare two strings from the file?
You wrote:
However I am not able to execute the line
if (Sorted_Word2 == user_word):
    print("The word",user_input,"exist in the dictionary")
What does this mean? Does it throw an exception? If so, post it, because I can run a version of your program and the results are as expected.
def autocorrect(word):
    Break_Word = sorted(word)
    Sorted_Word = ''.join(Break_Word)
    return Sorted_Word
user_input = ""
#while (user_input == ""):
# raw_input is Python 2; on Python 3 use input() instead
user_input = raw_input("key in word you wish to enter: ").lower()
user_word = autocorrect(user_input).replace(' ', '')
print ("user word '{0}'".format(user_word))
for word in ["mike", "matt", "bob", "philanderer"]:
    NewWord = str(word.replace(' ', ''))
    Break_Word2 = sorted(NewWord.lower())
    Sorted_Word2 = ''.join(Break_Word2)
    if (Sorted_Word2 == user_word):
        print("The word",user_input,"exist in the dictionary")
key in word you wish to enter: druge
user word 'degru'
The word druge doesn't exist in the dictionary
key in word you wish to enter: Mike
user word 'eikm'
('The word','mike', 'exist in the dictionary')
Moreover, I don't know what all this "autocorrect" stuff is doing. All you appear to need to do is search a list of words for an instance of your search word. Sorting the characters inside the search word achieves nothing.
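One possible explanation, which the thread does not confirm, is that the test above reads its words from a list while the question reads them from a file. Each line read from a file keeps its trailing newline, so the sorted characters of "mike\n" never equal those of "mike". The question's code also lowercases the file words but not the user's input. The sketch below assumes one word per line in the file and uses io.StringIO as a stand-in for big.txt; sorted_letters and find_matches are hypothetical names.

```python
import io

def sorted_letters(word):
    # Strip the newline, drop spaces, lowercase, then sort the characters
    return ''.join(sorted(word.strip().replace(' ', '').lower()))

def find_matches(user_input, lines):
    # Return the file words whose sorted letters equal those of user_input
    key = sorted_letters(user_input)
    return [line.strip() for line in lines if sorted_letters(line) == key]

# The newline is a character too, so it changes the sorted result
print(''.join(sorted("mike\n")) == ''.join(sorted("mike")))

# In-memory stand-in for a word file with one word per line
word_file = io.StringIO("mike\nmatt\nbob\nphilanderer\n")
print(find_matches("Mike", word_file))
```

With the newline stripped and both sides lowercased, the comparison succeeds for "Mike".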