I'm trying to write a Python script that finds files in a directory whose contents contain two keywords. I posted a question before about a basic issue with a much simpler version of this code, but I wasn't sure whether I needed to post this separately, since I am now looking at a different issue.
import glob
import os
import sqlite3
import re
conn = sqlite3.connect( "C:\\Users\\Jeff\\Documents\\GitHub\\YGOPro Salvation Server\\YGOPro-Support-System\\http\\ygopro\\databases\\0-en-OCGTCG.cdb" )
curs = conn.cursor()
#Define string constants
trig = "EFFECT_TYPE_TRIGGER"
summon = "SUMMON_SUCCESS"
flip = "EFFECT_TYPE_FLIP"
flip2 = "EVENT_FLIP"
pos = "EVENT_CHANGE_POS"
spelltrap = "EFFECT_TYPE_ACTIVATE"
banish = "EVENT_REMOVE"
grave = "EVENT_TO_GRAVE"
os.chdir( "C:\\Users\\Jeff\\Documents\\GitHub\\YGOPro Salvation Server\\Salvation-Scripts-TCG" )
for files in glob.glob( "*.lua" ) :
    f = open( files, 'r', encoding = "iso-8859-1" )
    for line in f :
        files = re.sub('[c.luatilityold]', '', files)
        #Use database to print names corresponding to each file ID for verification purpose
        result = curs.execute("SELECT id, name FROM texts WHERE ID=?", (files,))
        x = result.fetchone()
        #Check for files that have both 'trig' and 'banish' values in contents
        if trig and banish in line :
            if x is not None :
                print ( x )
        #Check for files that have both 'trig' and 'grave' values in contents
        elif trig and grave in line :
            if x is not None :
                print ( x )
        #Check for files that have both 'trig' and 'summon' values in contents
        elif trig and summon in line :
            if x is not None :
                print ( x )
        #Check for files that have 'flip' value in contents
        elif flip in line :
            if x is not None :
                print ( x )
        #Check for files that have both 'trig' and 'flip2' values in contents
        elif trig and flip2 in line :
            if x is not None :
                print ( x )
        #Check for files that have both 'trig' and 'pos' values in contents
        elif trig and pos in line :
            if x is not None :
                print ( x )
        #Ignore other files
        else :
            pass
The issue that I'm having is that the if-cases aren't working properly. The trig variable gets ignored while the code is running, and it therefore only looks at the second key. I have tried using wording such as if trig in line and banish in line, but the problem is that it will only look for files that have these two keys together on the same line of a file's contents. I need this to be able to find a file that has the two keys anywhere in the file. Is there a better way to search for the two keys in one go like how I was trying to do, or is there a different approach that I need to take?
Since this code relies on databases specific to your machine, I cannot test whether my changes work to create your expected outputs.
I am not really sure what you are trying to do here. If you want to search for the keywords in the whole file, read in the entire file at once instead of line by line.
import glob
import os
import sqlite3
import re
conn = sqlite3.connect( "C:\\Users\\Jeff\\Documents\\GitHub\\YGOPro Salvation Server\\YGOPro-Support-System\\http\\ygopro\\databases\\0-en-OCGTCG.cdb" )
curs = conn.cursor()
#Define string constants
trig = "EFFECT_TYPE_TRIGGER"
summon = "SUMMON_SUCCESS"
flip = "EFFECT_TYPE_FLIP"
flip2 = "EVENT_FLIP"
pos = "EVENT_CHANGE_POS"
spelltrap = "EFFECT_TYPE_ACTIVATE"
banish = "EVENT_REMOVE"
grave = "EVENT_TO_GRAVE"
os.chdir( "C:\\Users\\Jeff\\Documents\\GitHub\\YGOPro Salvation Server\\Salvation-Scripts-TCG" )
for filename in glob.glob( "*.lua" ) :
    with open(filename, 'r', encoding = "iso-8859-1") as content_file:
        content = content_file.read()
    query_file = re.sub('[c.luatilityold]', '', filename)
    #Use database to print names corresponding to each file ID for verification purpose
    result = curs.execute("SELECT id, name FROM texts WHERE ID=?", (query_file ,))
    x = result.fetchone()
    #Check for files that have both 'trig' and 'banish' values in contents
    if trig in content and banish in content:
        if x is not None :
            print ( x )
    #Check for files that have both 'trig' and 'grave' values in contents
    elif trig in content and grave in content:
        if x is not None :
            print ( x )
    ...
You definitely want to use the syntax
if x in content and y in content
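Otherwise, Python parses if trig and banish in content as if trig and (banish in content). Because trig is a non-empty string, it is always truthy, so the test reduces to banish in content alone and trig is never actually searched for (as indicated in a comment on your last post).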
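To make the difference concrete, here is a minimal sketch. The contents string is a made-up stand-in for one .lua file (an assumption for illustration only); the two keyword constants are the ones from the question.

```python
# Hypothetical file contents, used only for illustration.
content = "local e1 = Effect.CreateEffect(c)\ne1:SetCode(EVENT_REMOVE)\n"

trig = "EFFECT_TYPE_TRIGGER"
banish = "EVENT_REMOVE"

# Parsed as: trig and (banish in content). trig is a non-empty string,
# so the result depends only on the second test.
loose = bool(trig and banish in content)

# Each keyword is tested against the contents separately.
strict = trig in content and banish in content

# The same strict test, written so it scales to any number of keywords.
strict_all = all(key in content for key in (trig, banish))

print(loose, strict, strict_all)  # True False False
```

The all(...) form is only a convenience; it behaves the same as chaining the in tests with and.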
I assumed in your code that the reuse of the variable files in
files = re.sub('[c.luatilityold]', '', files)
#Use database to print names corresponding to each file ID for verification purpose
result = curs.execute("SELECT id, name FROM texts WHERE ID=?", (files,))
x = result.fetchone()
is incidental. Changing the value stored in files should not affect anything else in this instance, but if you try to get the file path of the current file, you will not be getting what you expected. As such, I would also recommend using a different variable for this operation like I did.
Related
For school, I need to create a spell checker, using python. I decided to do it using a GUI created with tkinter. I need to be able to input a text (.txt) file that will be checked, and a dictionary file, also a text file. The program needs to open both files, check the check file against the dictionary file, and then display any words that are misspelled.
Here's my code:
import tkinter as tk
from tkinter.filedialog import askopenfilename
def checkFile():
    # get the sequence of words from a file
    text = open(file_ent.get())
    dictDoc = open(dict_ent.get())
    for ch in '!"#$%&()*+,-./:;<=>?#[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()
    # make a dictionary of the word counts
    wordDict = {}
    for w in words:
        wordDict[w] = wordDict.get(w,0) + 1
    for k in dictDict:
        dictDoc.pop(k, None)
    misspell_lbl["text"] = dictDoc
# Set-up the window
window = tk.Tk()
window.title("Temperature Converter")
window.resizable(width=False, height=False)
# Setup Layout
frame_a = tk.Frame(master=window)
file_lbl = tk.Label(master=frame_a, text="File Name")
space_lbl = tk.Label(master=frame_a, width = 6)
dict_lbl =tk.Label(master=frame_a, text="Dictionary File")
file_lbl.pack(side=tk.LEFT)
space_lbl.pack(side=tk.LEFT)
dict_lbl.pack(side=tk.LEFT)
frame_b = tk.Frame(master=window)
file_ent = tk.Entry(master=frame_b, width=20)
dict_ent = tk.Entry(master=frame_b, width=20)
file_ent.pack(side=tk.LEFT)
dict_ent.pack(side=tk.LEFT)
check_btn = tk.Button(master=window, text="Spellcheck", command=checkFile)
frame_c = tk.Frame(master=window)
message_lbl = tk.Label(master=frame_c, text="Misspelled Words:")
misspell_lbl = tk.Label(master=frame_c, text="")
message_lbl.pack()
misspell_lbl.pack()
frame_a.pack()
frame_b.pack()
check_btn.pack()
frame_c.pack()
# Run the application
window.mainloop()
I want the file to check against the dictionary and display the misspelled words in the misspell_lbl.
The test files I'm using to make it work, and to submit with the assignment are here:
check file
dictionary file
I preloaded the files to the site that I'm submitting this on, so it should just be a matter of entering the file name and extension, not the entire path.
I'm pretty sure the problem is with my function that reads and checks the file. I've been beating my head against a wall trying to solve this, and I'm stuck. Any help would be greatly appreciated.
Thanks.
The first problem is with how you try to read the files. open(...) will return a _io.TextIOWrapper object, not a string and this is what causes your error. To get the text from the file, you need to use .read(), like this:
def checkFile():
    # get the sequence of words from a file
    with open(file_ent.get()) as f:
        text = f.read()
    with open(dict_ent.get()) as f:
        dictDoc = f.read().splitlines()
The with open(...) as f part gives you a file object called f, and automatically closes the file when the block ends. This is a more concise version of
f = open(...)
text = f.read()
f.close()
f.read() will get the text from the file. For the dictionary I also added .splitlines() to turn the newline separated text into a list.
I couldn't really see where you'd tried to check for misspelled words, but you can do it with a list comprehension.
misspelled = [x for x in words if x not in dictDoc]
This gets every word which is not in the dictionary file and adds it to a list called misspelled. Altogether, the checkFile function now looks like this, and works as expected:
def checkFile():
    # get the sequence of words from a file
    with open(file_ent.get()) as f:
        text = f.read()
    with open(dict_ent.get()) as f:
        dictDoc = f.read().splitlines()
    for ch in '!"#$%&()*+,-./:;<=>?#[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()
    # make a dictionary of the word counts
    wordDict = {}
    for w in words:
        wordDict[w] = wordDict.get(w,0) + 1
    misspelled = [x for x in words if x not in dictDoc]
    misspell_lbl["text"] = misspelled
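If the dictionary file is large, the checking step can be sketched separately from the GUI with a set, which makes each membership test fast. This is only a sketch under some assumptions: find_misspelled is a hypothetical helper name, the sample text and word list are invented, and comparing in lower case is my own choice rather than something the assignment states.

```python
import string

def find_misspelled(text, dictionary_words):
    """Return the words in text that are not in dictionary_words, in order."""
    # A set gives constant-time membership tests; a list is scanned each time.
    known = {word.lower() for word in dictionary_words}
    # Replace every punctuation character with a space before splitting.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.translate(table).split()
    return [word for word in words if word.lower() not in known]

# Hypothetical sample data standing in for the two uploaded files.
sample_text = "The quick brwn fox, jumps over the lazzy dog."
sample_dictionary = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
print(find_misspelled(sample_text, sample_dictionary))  # ['brwn', 'lazzy']
```

Inside checkFile, the result could be joined into one string (for example with ", ".join(...)) before assigning it to misspell_lbl["text"]. Note that string.punctuation includes the apostrophe, so a word like "don't" would be split in two.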
I am learning some coding, and I am stuck with an error I can't explain. Basically I want to read out a .csv file with birth statistics from the US to figure out the most popular name in the time recorded.
My code looks like this:
# 0:Id, 1: Name, 2: Year, 3: Gender, 4: State, 5: Count
names = {} # initialise dict names
maximum = 0 # store for maximum
l = []
with open("Filepath", "r") as file:
    for line in file:
        l = line.strip().split(",")
        try:
            name = l[1]
            if name in names:
                names[name] = int(names[name]) + int(l(5))
            else:
                names[name] = int(l(5))
        except:
            continue
print(names)
max(names)
def max(values):
    for i in values:
        if names[i] > maximum:
            names[i] = maximum
        else:
            continue
    return(maximum)
print(maximum)
It seems like the dictionary does not take any values at all since the print command does not return anything. Where did I go wrong (incidentally, the filepath is correct, it takes a while to get the result since the .csv is quite big. So my assumption is that I somehow made a mistake writing into the dictionary, but I was staring at the code for a while now and I don't see it!)
A few suggestions to improve your code:
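names = {} # initialise dict names
with open("Filepath", "r") as file:
    for line in file:
        l = line.strip().split(",")
        try:
            name = l[1]
            # get() returns 0 the first time a name is seen
            names[name] = names.get(name, 0) + int(l[5])
        except (IndexError, ValueError):
            continue # skip the header row and malformed lines
# build (count, name) pairs so that sorting orders by count
maximum = [(v,k) for k,v in names.items()]
maximum.sort(reverse=True)
print(maximum[0])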
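You will want to look into Python dictionaries and learn about get. It lets you build your names dictionary in fewer lines of code (more Pythonic).
Also, your call to max(names) comes before your own max is defined, so that line runs the built-in max rather than your function; the function you wrote with def is never actually called. Note too that l(5) calls the list instead of indexing it. That raises a TypeError, which the bare except silently swallows, so nothing is ever added to the dictionary.
I propose the shortened code above. Ask if you have questions!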
Figured it out.
I think there were a few flow issues: I called a function before defining it... is that an issue or is python okay with that?
Also I think I used max as a name for a variable, but there is a built-in function with the same name, that might cause an issue I guess?! Same with value
This is my final code:
names = {} # initialise dict names
l = []
def maxval(val):
    maxname = max(val.items(), key=lambda x : x[1])
    return maxname
with open("filepath", "r") as file:
    for line in file:
        l = line.strip().split(",")
        name = l[1]
        try:
            names[name] = names.get(name, 0) + int(l[5])
        except:
            continue
#print(str(l))
#print(names)
print(maxval(names))
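For comparison, the same tally can be sketched with the standard library's csv module and collections.Counter. This is an alternative under assumptions, not a correction of the code above: the sample rows are invented, and the column layout is taken from the comment in the question (Id, Name, Year, Gender, State, Count).

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the column layout described in the question:
# Id, Name, Year, Gender, State, Count
sample = io.StringIO(
    "Id,Name,Year,Gender,State,Count\n"
    "1,Mary,1910,F,AK,14\n"
    "2,Annie,1910,F,AK,12\n"
    "3,Mary,1911,F,AK,9\n"
)

totals = Counter()
reader = csv.reader(sample)
next(reader)  # skip the header row instead of relying on try/except
for row in reader:
    totals[row[1]] += int(row[5])

# most_common(1) returns a list holding the single highest (name, total) pair.
print(totals.most_common(1)[0])  # ('Mary', 23)
```

With a real file, the io.StringIO object would be replaced by open("filepath", newline="") inside a with block.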
I am developing a program which works with a ; separated csv.
When I try to execute the following code
def accomodate(fil, targets):
    l = fil
    io = []
    ret = []
    for e in range(len(l)):
        io.append(l[e].split(";"))
    for e in io:
        ter = []
        for theta in range(len(e)):
            if targets.count(theta) > 0:
                ter.append(e[theta])
        ret.append(ter)
    return ret
Here, 'fil' holds the rows read from the csv file and 'targets' is a list of the column indices to keep. When the split is applied to the csv rows, it raises the following error: "'l' name is not defined", although as far as I can see the variable 'l' has already been defined.
Does anyone know why this happens? Thanks in advance.
edit
As many of you have requested, here is an example.
It is an example csv, not a piece of the original one, and it has already been read into a list.
k = ["Cookies;Brioche;Pudding;Pie","Dog;Cat;Bird;Fish","Boat;Car;Plane;Skate"]
accomodate(k, [1,2]) = [[Brioche, Pudding], [Cat, Bird], [Car, Plane]]
You should copy the content of fil list:
l = fil.copy()
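Separately from the NameError, the column selection itself can be sketched more compactly with a nested list comprehension. This is an illustrative rewrite, not a diagnosis of the error; select_columns is a hypothetical name, and the input is the example list from the question.

```python
def select_columns(rows, targets):
    """Split each ';'-separated row and keep only the fields at the target indices."""
    wanted = set(targets)
    return [
        [field for index, field in enumerate(row.split(";")) if index in wanted]
        for row in rows
    ]

k = ["Cookies;Brioche;Pudding;Pie", "Dog;Cat;Bird;Fish", "Boat;Car;Plane;Skate"]
print(select_columns(k, [1, 2]))
# [['Brioche', 'Pudding'], ['Cat', 'Bird'], ['Car', 'Plane']]
```

Like the original loop, this keeps the fields in file order, regardless of the order of the indices in targets.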
I'm new to programming. I need to index three separate txt files and then run a search based on user input. When I print, it gives me the entire path name; I would like to print only the txt file name.
I've tried using os.list in the function.
import os
import time
import string
import os.path
import sys
word_occurrences= {}
def index_text_file (txt_filename,ind_filename, delimiter_chars=",.;:!?"):
    try:
        txt_fil = open(txt_filename, "r")
        fileString = txt_fil.read()
        for word in fileString.split():
            if word in word_occurrences:
                word_occurrences[word] += 1
            else:#
                word_occurrences [word] = 1
        word_keys = word_occurrences.keys()
        print ("{} unique words found in".format(len(word_keys)),txt_filename)
        word_keys = word_occurrences.keys()
        sorted(word_keys)
    except IOError as ioe: #if the file can't be opened
        sys.stderr.write ("Caught IOError:"+ repr(ioe) + "/n")
        sys.exit (1)
index_text_file("/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt","/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.idx")
SyntaxError: invalid syntax
(base) 8c85908188d1:CODE z007881$ python3 indexed.py
9395 unique words found in /Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt
I would like it to say: 9395 unique words found in book3.txt
One way to do it would be to split the path on the directory separator / and pick the last element:
file_name = txt_filename.split("/")[-1]
# ...
# Then:
print("{} unique words found in".format(len(word_keys)), file_name)
# I would prefer using an f-string, unless your Python version is too old:
print(f"{len(word_keys)} unique words found in {file_name}")
I strongly advise renaming txt_filename to something less misleading, such as txt_filepath, since it does not contain a file name but a whole path (including, but not limited to, the file name).
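As an alternative to splitting on "/" by hand, the standard library can extract the last path component. A minimal sketch, using the path from the question purely as a string (no file is opened); the variable name txt_filepath follows the renaming suggested above.

```python
import os.path
from pathlib import PurePosixPath

# The path from the question, used here only as a string; no file is opened.
txt_filepath = "/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt"

# os.path.basename returns everything after the final path separator.
file_name = os.path.basename(txt_filepath)

# PurePosixPath exposes the same component as .name, plus .stem without the suffix.
path = PurePosixPath(txt_filepath)

print(file_name)             # book3.txt
print(path.name, path.stem)  # book3.txt book3
```

os.path.basename uses the separator rules of the platform it runs on, which makes it a safer default than hard-coding "/".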
My application lets the user export its results. It exports text files named Exp_Text_1, Exp_Text_2, and so on. If a file with the same name already exists on the Desktop, I want the numbering to continue upwards from that number. For example, if a file named Exp_Text_3 is already on the Desktop, I want the new file to be named Exp_Text_4.
This is my code:
if len(str(self.Output_Box.get("1.0", "end"))) == 1:
    self.User_Line_Text.set("Nothing to export!")
else:
    import os.path
    self.txt_file_num = self.txt_file_num + 1
    file_name = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt" + "_" + str(self.txt_file_num) + ".txt")
    file = open(file_name, "a")
    file.write(self.Output_Box.get("1.0", "end"))
    file.close()
    self.User_Line_Text.set("A text file has been exported to Desktop!")
You likely want os.path.exists:
>>> import os
>>> help(os.path.exists)
Help on function exists in module genericpath:
exists(path)
Test whether a path exists. Returns False for broken symbolic links
A very basic example would be to create a file name with a formatting mark where the number is inserted, so the same string can be reused for multiple checks:
import os
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
#the "{}" is a formatting mark so we can do name_to_format.format(num)
num = 1
while os.path.exists(name_to_format.format(num)):
    num+=1
new_file_name = name_to_format.format(num)
This would check each filename, starting with Exp_Txt_1.txt, then Exp_Txt_2.txt, etc., until it finds one that does not exist.
However the format mark may cause a problem if curly brackets {} are part of the rest of the path, so it may be preferable to do something like this:
import os
def get_file_name(num):
    return os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_" + str(num) + ".txt")
num = 1
while os.path.exists(get_file_name(num)):
    num+=1
new_file_name = get_file_name(num)
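The same loop can be tried out safely in a temporary directory rather than on the real Desktop. This is a sketch under assumptions: next_free_path and its prefix and suffix parameters are hypothetical names introduced for illustration, and the three pre-existing files are created only for the demonstration.

```python
import os
import tempfile

def next_free_path(directory, prefix="Exp_Txt_", suffix=".txt"):
    """Return the first prefix<N>suffix path in directory that does not exist yet."""
    num = 1
    while os.path.exists(os.path.join(directory, f"{prefix}{num}{suffix}")):
        num += 1
    return os.path.join(directory, f"{prefix}{num}{suffix}")

# Demonstration in a throwaway directory instead of the real Desktop.
with tempfile.TemporaryDirectory() as demo_dir:
    for existing in (1, 2, 3):
        open(os.path.join(demo_dir, f"Exp_Txt_{existing}.txt"), "w").close()
    result = os.path.basename(next_free_path(demo_dir))

print(result)  # Exp_Txt_4.txt
```

Note that this returns the first gap in the numbering: if only Exp_Txt_3.txt existed, it would return Exp_Txt_1.txt. Continuing from the highest existing number would instead require listing the directory and parsing the numbers.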
EDIT: answering the question of why we don't need the get_file_name function in the first example.
First off, if you are unfamiliar with str.format, you may want to look at Python doc - common string operations and/or this simple example:
text = "Hello {}, my name is {}."
x = text.format("Kotropoulos","Tadhg")
print(x)
print(text)
The path string is figured out with this line:
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
But it has {} in place of the desired number, since we don't know what the number should be at this point. So if the path was, for example:
name_to_format = "/Users/Tadhg/Desktop/Exp_Txt_{}.txt"
then we can insert a number with:
print(name_to_format.format(1))
print(name_to_format.format(2))
This does not change name_to_format: str objects are immutable, so .format returns a new string without modifying name_to_format. However, we would run into a problem if our path was something like one of these:
name_to_format = "/Users/Bob{Cat}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Bobcat{}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Smiley{:/Desktop/Exp_Txt_{}.txt"
Here the formatting mark we want to use is no longer the only set of curly brackets in the string, so we can get a variety of errors:
KeyError: 'Cat'
IndexError: tuple index out of range
ValueError: unmatched '{' in format spec
So you only want to rely on str.format when you know it is safe to use. Hope this helps, have fun coding!
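If you do want to keep str.format for a path that may contain curly brackets, one option is to escape them by doubling, since str.format emits {{ and }} as literal braces. A minimal sketch, assuming a hypothetical helper named make_template and joining with "/" only to keep the example platform-independent:

```python
def make_template(directory, pattern="Exp_Txt_{}.txt"):
    """Build a str.format template whose only live field is the one in pattern."""
    # Doubling each brace makes str.format emit it literally.
    safe_directory = directory.replace("{", "{{").replace("}", "}}")
    return safe_directory + "/" + pattern

template = make_template("/Users/Bob{Cat}/Desktop")
print(template.format(4))  # /Users/Bob{Cat}/Desktop/Exp_Txt_4.txt

# An unmatched brace in the directory is handled the same way.
print(make_template("/Users/Smiley{:/Desktop").format(4))
```

Only the directory part is escaped; the {} in the pattern stays a live field, so format still inserts the number there.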