How can I open a random text file in Python? - python-3.x

I am trying to make a Python program that randomly selects a text file to open and outputs the contents of the randomly selected text file.
When I try running the code, I get this error:
Traceback (most recent call last):
File "//fileserva/home$/K59046/Documents/project/project.py", line 8, in
o = open(text, "r")
TypeError: expected str, bytes or os.PathLike object, not tuple
This is the code I have written:
import os
import random
os.chdir('N:\Documents\project\doodoo')
a = os.getcwd()
print("current dir is",a)
file = random.randint(1, 4)
text = (file,".txt")
o = open(text, "r")
print (o.read())
Can somebody tell me what I am doing wrong?

As your error message says, your text variable is a tuple, not a string. You can use f-strings or string concatenation to solve this:
# string concatenation
text = str(file) + ".txt"
# f-strings
text = f"{file}.txt"
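Both fixes build the file name from a number, which assumes the files are literally named 1.txt to 4.txt. If that does not hold, a minimal sketch like the following picks from the .txt files that actually exist in the folder instead. The function name read_random_text_file is hypothetical, and it assumes all candidate files sit directly in one directory:

```python
import random
from pathlib import Path


def read_random_text_file(directory):
    """Pick one .txt file in directory at random; return (path, contents)."""
    # Collect the files that exist instead of assuming they are named 1.txt ... 4.txt
    candidates = sorted(Path(directory).glob("*.txt"))
    if not candidates:
        raise FileNotFoundError(f"no .txt files found in {directory}")
    # random.choice picks one element of the list uniformly at random
    chosen = random.choice(candidates)
    # 'with' closes the file automatically once it has been read
    with chosen.open("r") as f:
        return chosen, f.read()
```

For example, read_random_text_file(r'N:\Documents\project\doodoo') would return the chosen path and its text. The raw string (r'...') keeps the backslashes in the Windows path from being treated as escape sequences.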

Your variable text is not what you expect. You currently create a tuple that could look like this: (2, ".txt"). If you want a string like "2.txt", you need to concatenate the two parts:
text = str(file) + ".txt"

Related

Problem with passing XML element from Oracle database to ElementTree (Python xml parser)

I want to get an XML file out of a database, manipulate it with ElementTree and then insert it into another database. Getting the file works just fine; I can print it in its entirety. However, whenever I try to pass it through the parser, it returns the error "no element found".
Here is the code:
import cx_Oracle
import xml.etree.ElementTree as et
cx_Oracle.init_oracle_client(lib_dir=r"here and there")
try:
    dsn_tns_source = cx_Oracle.makedsn('censored for obvious reasons')
    con_source = cx_Oracle.connect(cx_Oracle.makedsn('same here'))
except cx_Oracle.DatabaseError as err:
    print("Connection DB error:", err)
try:
    cur_source = con_source.cursor()
    source_select = cur_source.execute("working SELECT")
    print(source_select)
    for row in source_select:
        x = row[(len(row) - 1)] # This is the XML
        print("source_row: ", x)
        tree = et.parse(x)
        root = tree.getroot()
        print(root)
        print(et.tostring(root, encoding='utf-8').decode('utf-8'))
    for col in cur_source.description:
        print("source_col: ", col)
Apparently I am not passing "x" correctly; however, the entire XML should be held in that variable at the point of calling it. Most tutorials only show how to load local files, so I thought simply using the variable would be sufficient.
The error message is the following:
Traceback (most recent call last):
File "Z:\basler_benchmark\main.py", line 24, in <module>
tree = et.parse(x)
File "C:\Python\lib\xml\etree\ElementTree.py", line 1229, in parse
tree.parse(source, parser)
File "C:\Python\lib\xml\etree\ElementTree.py", line 580, in parse
self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
.parse() specifically parses files: it expects a file name or a file object, not the XML text itself.
If x is a string, use .fromstring():
root = et.fromstring(x)
If x is something else, it must be turned into a string first. For cx_Oracle.LOB objects, calling .read() should do the trick:
root = et.fromstring(x.read())
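To see the difference without a database, here is a small sketch. FakeLob is a hypothetical stand-in for a cx_Oracle LOB that only provides read(); with a real LOB, x.read() plays the same role:

```python
import io
import xml.etree.ElementTree as et


class FakeLob:
    """Hypothetical stand-in for a cx_Oracle LOB: it only provides read()."""

    def __init__(self, data):
        self._data = data

    def read(self):
        return self._data


lob = FakeLob("<root><name_of_tag>first</name_of_tag><name_of_tag>second</name_of_tag></root>")

# fromstring() takes the XML text itself and returns the root Element
root = et.fromstring(lob.read())

# parse() takes a file name or file object, so wrap the text in StringIO to use it
tree = et.parse(io.StringIO(lob.read()))

print(root.tag, tree.getroot().tag)
```

Both routes end up with the same root element; fromstring() is simply the shorter one when you already hold the XML as a string.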
The error was:
a) what I had in x was not an XML file but a LOB, and I had to load it with
et.fromstring(x.read())
as Tomalak answered/commented, and
b) I didn't realize that I had to locate the elements in the tree (here by iterating with root.iter()) before I could use .text to get what's inside the field/tag.
So the solution looks like this:
for row in source_select:
    x = row[(len(row) - 1)]
    tree = et.ElementTree(et.fromstring(x.read()))
    root = tree.getroot()
    for aref in root.iter('name_of_tag'):
        print(aref.text)
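Iterating with root.iter() is one way to reach nested elements; find() and findall() with the ".//" path prefix are another. A small sketch on a made-up document (the record/header/body structure is hypothetical) showing which calls see which elements:

```python
import xml.etree.ElementTree as et

xml_text = """<record>
  <header><name_of_tag>alpha</name_of_tag></header>
  <body><name_of_tag>beta</name_of_tag></body>
</record>"""

root = et.fromstring(xml_text)

# iter() walks the whole subtree, so it finds matching tags at any depth
all_values = [el.text for el in root.iter("name_of_tag")]

# find() with ".//" searches all descendants and returns the first match
first_value = root.find(".//name_of_tag").text

# A bare tag name in findall() only matches direct children of root (none here)
direct_children = root.findall("name_of_tag")

print(all_values, first_value, direct_children)
```

Whichever call you use, .text is read from the element it returns, not from the tree or the root.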

Creating a python spellchecker using tkinter

For school, I need to create a spell checker in Python. I decided to do it using a GUI created with tkinter. I need to be able to input a text (.txt) file that will be checked, and a dictionary file, also a text file. The program needs to open both files, check the check file against the dictionary file, and then display any words that are misspelled.
Here's my code:
import tkinter as tk
from tkinter.filedialog import askopenfilename
def checkFile():
    # get the sequence of words from a file
    text = open(file_ent.get())
    dictDoc = open(dict_ent.get())
    for ch in '!"#$%&()*+,-./:;<=>?#[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()
    # make a dictionary of the word counts
    wordDict = {}
    for w in words:
        wordDict[w] = wordDict.get(w,0) + 1
    for k in dictDict:
        dictDoc.pop(k, None)
    misspell_lbl["text"] = dictDoc
# Set-up the window
window = tk.Tk()
window.title("Temperature Converter")
window.resizable(width=False, height=False)
# Setup Layout
frame_a = tk.Frame(master=window)
file_lbl = tk.Label(master=frame_a, text="File Name")
space_lbl = tk.Label(master=frame_a, width = 6)
dict_lbl =tk.Label(master=frame_a, text="Dictionary File")
file_lbl.pack(side=tk.LEFT)
space_lbl.pack(side=tk.LEFT)
dict_lbl.pack(side=tk.LEFT)
frame_b = tk.Frame(master=window)
file_ent = tk.Entry(master=frame_b, width=20)
dict_ent = tk.Entry(master=frame_b, width=20)
file_ent.pack(side=tk.LEFT)
dict_ent.pack(side=tk.LEFT)
check_btn = tk.Button(master=window, text="Spellcheck", command=checkFile)
frame_c = tk.Frame(master=window)
message_lbl = tk.Label(master=frame_c, text="Misspelled Words:")
misspell_lbl = tk.Label(master=frame_c, text="")
message_lbl.pack()
misspell_lbl.pack()
frame_a.pack()
frame_b.pack()
check_btn.pack()
frame_c.pack()
# Run the application
window.mainloop()
I want the file to check against the dictionary and display the misspelled words in the misspell_lbl.
The test files I'm using to make it work, and to submit with the assignment are here:
check file
dictionary file
I preloaded the files to the site that I'm submitting this on, so it should just be a matter of entering the file name and extension, not the entire path.
I'm pretty sure the problem is with my function to read and check the file; I've been beating my head against a wall trying to solve this, and I'm stuck. Any help would be greatly appreciated.
Thanks.
The first problem is with how you try to read the files. open(...) returns an _io.TextIOWrapper object, not a string, and this is what causes your error: a file object has no .replace() method. To get the text from the file, you need to use .read(), like this:
def checkFile():
    # get the sequence of words from a file
    with open(file_ent.get()) as f:
        text = f.read()
    with open(dict_ent.get()) as f:
        dictDoc = f.read().splitlines()
The with open(...) as f part gives you a file object called f and automatically closes the file when the block ends, even if an error occurs. It is a more concise version of
f = open(...)
text = f.read()
f.close()
f.read() will get the text from the file. For the dictionary I also added .splitlines() to turn the newline separated text into a list.
I couldn't really see where you'd tried to check for misspelled words, but you can do it with a list comprehension.
misspelled = [x for x in words if x not in dictDoc]
This gets every word which is not in the dictionary file and adds it to a list called misspelled. Altogether, the checkFile function now looks like this, and works as expected:
def checkFile():
    # get the sequence of words from a file
    with open(file_ent.get()) as f:
        text = f.read()
    with open(dict_ent.get()) as f:
        dictDoc = f.read().splitlines()
    for ch in '!"#$%&()*+,-./:;<=>?#[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()
    # make a dictionary of the word counts
    wordDict = {}
    for w in words:
        wordDict[w] = wordDict.get(w,0) + 1
    misspelled = [x for x in words if x not in dictDoc]
    misspell_lbl["text"] = misspelled
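The check itself can also live in a separate function that checkFile calls, which makes it easy to test without the GUI. A minimal sketch, assuming one dictionary word per line and a case-insensitive comparison (both assumptions, not stated in the assignment); the name find_misspelled is hypothetical. It uses string.punctuation, which, unlike the hand-typed string above, includes '@' and the apostrophe, and it stores the dictionary in a set so each lookup is fast:

```python
import string


def find_misspelled(text, dictionary_text):
    """Return words of text (first-occurrence order, no repeats) missing from the dictionary."""
    # One word per line in the dictionary; a set makes each lookup fast
    known = {w.strip().lower() for w in dictionary_text.splitlines() if w.strip()}
    # Map every ASCII punctuation character to a space, then split on whitespace
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.translate(table).split()
    misspelled = []
    for word in words:
        # Compare case-insensitively (an assumption) and report each word once
        if word.lower() not in known and word not in misspelled:
            misspelled.append(word)
    return misspelled
```

Inside checkFile, with text and dict_text holding the contents of the two files, misspell_lbl["text"] = "\n".join(find_misspelled(text, dict_text)) would show one word per line. Because the apostrophe counts as punctuation here, a word like "don't" is split into "don" and "t".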

Extract words from a text file and print them on separate lines

Sample input, in the text file being parsed:
.txt = ["'blah.txt'", "'blah1.txt'", "'blah2.txt'" ]
Expected output, in another text file, out_path.txt:
blah.txt
blah1.txt
blah2.txt
This is the code I tried; it just appends "[]". I also tried a Perl one-liner replacing double and single quotes.
read_out_fh = open('out_path.txt',"r")
for line in read_out_fh:
    for word in line.split():
        curr_line = re.findall(r'"(\[^"]*)"', '\n')
        print(curr_line)
This happens because when you read a file, each line comes back as a string, not as a list, even if it is formatted like a list. Also, re.findall() in your code is applied to the literal string '\n' instead of to line, so it never finds anything, which is why you get []. So I wrote something that first transforms the string into a list, removing the "" and '' quotes as you mentioned, and then writes each item to a new file, example.txt.
Note: change the file names to match your files.
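# Open the input once and the output once, so earlier lines are not overwritten
with open('file.txt', "r") as read_out_fh, open("example.txt", "w") as output:
    for line in read_out_fh:
        # Drop the newline, the surrounding brackets and all quote characters
        line = line.strip().strip("[] ").replace('"', '').replace("'", '')
        for word in line.split(","):
            word = word.strip()
            if word:
                output.write(word + '\n')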
example.txt (output file):
blah.txt
blah1.txt
blah2.txt
The code below works for the example you gave in the question:
# Content of textfile.txt:
asdasdasd=["'blah.txt'", "'blah1.txt'", "'blah2.txt'"]asdasdasd
# Code:
import re
# 'with' closes both files automatically, even if an error occurs
with open('textfile.txt', "r") as read_in_fh, open('out_path.txt', "w") as write_out_fh:
    for line in read_in_fh:
        # Text between [ and ] on this line; empty list if the line has none
        find_list = re.findall(r'\[(".*?"*)\]', line)
        for match in find_list:
            for element in match.split(","):
                element_formatted = element.replace('"', '').replace("'", "").strip()
                write_out_fh.write(element_formatted + "\n")
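Because the input line looks like a Python list literal, another option (a swapped-in technique, not what either answer uses) is to let ast.literal_eval parse it. A minimal sketch, assuming each line holds at most one [...] list of quoted strings; the helper name names_from_line is hypothetical:

```python
import ast


def names_from_line(line):
    """Return the names in the [...] list literal on a line (hypothetical helper)."""
    start = line.find("[")
    end = line.rfind("]")
    if start == -1 or end < start:
        return []  # no list on this line
    # literal_eval safely turns the Python-style list text into a real list of str
    items = ast.literal_eval(line[start:end + 1])
    # Each item still carries its inner single quotes, e.g. "'blah.txt'"
    return [item.strip("'") for item in items]


line = '.txt = ["\'blah.txt\'", "\'blah1.txt\'", "\'blah2.txt\'" ]'
print("\n".join(names_from_line(line)))
```

Unlike eval(), literal_eval only accepts literals (strings, numbers, lists and so on), so it cannot run arbitrary code from the file; a malformed list raises an exception instead.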

How can I copy all PDF pages in a TXT file in python?

I have written the following script, in order to extract the text of a PDF file into plain text and save it into a TXT file:
import PyPDF2
def pdfToTxt(pdfFile):
    pdfFileObject = open(pdfFile, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObject)
    numberOfPages = pdfReader.numPages
    tempFile = open(r"temp.txt","a")
    for p in range(numberOfPages):
        pagesObject = pdfReader.getPage(p)
        text = pagesObject.extractText()
        tempFile.writelines(text)
    tempFile.close()
pdfToTxt("PdfFile.pdf")
The code works fine for the first 15 pages, which are successfully written in temp.txt file, but after the 15th page I get the following error:
Traceback (most recent call last):
File "PdfToTextExtractor.py", line 35, in <module>
pdfToTxt("PdfFile.pdf")
File "PdfToTextExtractor.py", line 30, in pdfToTxt
tempFile.writelines(text)
File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufb01' in position 0: character maps to <undefined>
It seems that the character '\ufb01' is the problem.
In case you have any idea how to overcome this issue, please let me know.
One way to overcome this issue is to replace the character with another one (say, a space) before you write it to the file. ('\ufb01' is the "fi" ligature, so "fi" is an alternative replacement that keeps the words intact.)
To do that, add the following line inside the for loop:
text = text.replace('\ufb01', " ")
The function should then look like this:
def pdfToTxt(pdfFile):
    pdfFileObject = open(pdfFile, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObject)
    numberOfPages = pdfReader.numPages
    tempFile = open(r"temp.txt","a")
    for p in range(numberOfPages):
        pagesObject = pdfReader.getPage(p)
        text = pagesObject.extractText()
        text = text.replace('\ufb01', " ")
        tempFile.writelines(text)
    tempFile.close()
When opening your tempFile, set the encoding like so:
tempFile = open(r"temp.txt","a", encoding='utf-8')
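The issue is in the way you open the file. Without an explicit encoding, open() uses the platform's default encoding (cp1252 here, as the traceback shows), which cannot encode '\ufb01'. So replace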
tempFile = open(r"temp.txt","a")
With the same open + extra param:
tempFile = open(r"temp.txt","a", encoding="utf-8")
Additionally, I suggest using a context manager for any file operation; it ensures that the file is closed correctly even if an unexpected exception is raised:
with open(r"temp.txt", "a", encoding="utf-8") as tempFile:
    ...
If you do so, you can also remove the tempFile.close() call after the for loop.
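Putting the two suggestions together, here is a minimal sketch that assumes the per-page strings have already been extracted: page_texts stands in for the extractText() results, so the example runs without PyPDF2 or a PDF. It opens the output with encoding="utf-8" inside a with block and, as a swapped-in alternative to replacing '\ufb01' with a space, applies Unicode NFKC normalization, which turns the "fi" ligature back into the letters f and i. The helper name write_page_texts is hypothetical:

```python
import unicodedata


def write_page_texts(page_texts, out_path):
    """Append already-extracted page texts to out_path (hypothetical helper)."""
    # An explicit encoding avoids falling back to the platform default (e.g. cp1252)
    with open(out_path, "a", encoding="utf-8") as temp_file:
        for text in page_texts:
            # NFKC turns compatibility ligatures such as '\ufb01' into plain 'fi'
            temp_file.write(unicodedata.normalize("NFKC", text))


# Why the original call failed: cp1252 has no mapping for U+FB01
try:
    "\ufb01".encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode it:", exc.reason)
```

NFKC also rewrites other compatibility characters (for example full-width letters), so if the text must be preserved exactly, the encoding="utf-8" change alone is enough.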

To find a particular string from multiples text files in the same directory

I am searching for a string, for example "error", in multiple text files. The text files are all in the same directory. When the string is found, the program must print the line that contains it.
So far, I have only been successful at searching for and printing out the string from one text file.
In the code below, I tried to create a list of the file names in the directory; the list is called logz, but the code printed out nothing. It only worked when the 'logz' in the open() call (line 10) was replaced with the name of a single TXT file.
The desired output should be something like this:
Line 0: asdasda error wefrewfawvewvaw
Line 3: awvawvawvaw error afvavavav
Line 6: e ERROR DSCVSVWASEFVEWVWEVW
Here is my code:
import re
import sys
import os
logz = [fn for fn in os.listdir(r'my text file directory') if fn.endswith('.txt')]
err_occur = [] # The list where we will store results.
pattern = re.compile(" error ", re.IGNORECASE)
try: # Try to:
with open ('logz', 'rt') as in_file: # open file for reading text.
for linenum, line in enumerate(in_file):
if pattern.search(line) != None:
err_occur.append((linenum, line.rstrip('/n')))
print("Line ", linenum, ": ", line, sep='')
You can use the following program as an example for writing yours. Replace the '.' passed to pathlib.Path() with the path to your text file directory. The check 'error' in text.lower() can be modified to search for words other than 'error' as well. You will need Python 3.6 or later to use f-strings (the print call).
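import pathlib
def main():
    for path in pathlib.Path('.').iterdir():
        if path.suffix.lower() == '.txt':
            with path.open() as file:
                for line, text in enumerate(file):
                    # lower() makes the match case-insensitive ("error", "ERROR", ...)
                    if 'error' in text.lower():
                        # rstrip() drops the newline already at the end of text
                        print(f'Line {line}: {text.rstrip()}')
if __name__ == '__main__':
    main()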
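Applied to the code from the question, the fix is to loop over the logz list and open each file by its full path, rather than opening the literal string 'logz'. A minimal sketch, assuming the files sit directly in one directory; the function name find_in_logs is hypothetical, and the \b word-boundary pattern is a swapped-in choice that, unlike " error ", also matches "error" at the start or end of a line or next to punctuation:

```python
import os
import re


def find_in_logs(directory, word="error"):
    """Return (file name, line number, line) for each line containing word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    logz = [fn for fn in os.listdir(directory) if fn.endswith('.txt')]
    err_occur = []
    for fn in sorted(logz):
        # Open each listed file by its full path, not the literal string 'logz'
        with open(os.path.join(directory, fn), 'rt') as in_file:
            for linenum, line in enumerate(in_file):
                if pattern.search(line):
                    # '\n' (not '/n') is the newline character to strip
                    err_occur.append((fn, linenum, line.rstrip('\n')))
    return err_occur
```

For example, for fn, linenum, line in find_in_logs(r'my text file directory'): print(f"{fn} Line {linenum}: {line}") prints each match with the file it came from.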
