I have a text file that contains a story. I want to find the word "like", get the word that follows it, and call a function to find synonyms for that word. Here is my code:
file = 'File1.txt'
with open(file, 'r') as open_file:
    read_file = open_file.readlines()
output_lines = []
for line in read_file:
    words = line.split()
    for u, word in enumerate(words):
        if 'like' == word:
            next_word = words[u + 1]
            find_synonymous(next_word )
    output_lines.append(' '.join(words))
    with open(file, 'w') as open_file:
        open_file.write(' '.join(words))
I think my only problem is in the text itself: when I write a single sentence that includes the word "like" (for example 'I like movies'), it works. But when I have a file that contains a lot of sentences and run the code, it deletes all the text. Does anyone know where the problem could be?
You have a couple of problems. find_synonymous(next_word ) doesn't replace the word in the list, so at best you will get the original text back. You call open(file, 'w') inside the for loop, so the file is overwritten for each line and ends up holding only the last line, which is empty if the text ends with a blank line. next_word = words[u + 1] will raise an IndexError if like happens to be the last word on the line, and you don't handle the case where the thing that is liked continues on the next line.
In this example, I track an "is_liked" state. If a word is in the like state, it is converted. That way you can handle sentences that are split across lines and don't have to worry about index errors. The list is written to the file outside the loop.
file = 'File1.txt'
with open(file, 'r') as open_file:
    read_file = open_file.readlines()
output_lines = []
is_liked = False
for line in read_file:
    words = line.split()
    for u, word in enumerate(words):
        if is_liked:
            words[u] = find_synonymous(word)
            is_liked = False
        else:
            is_liked = 'like' == word
    output_lines.append(' '.join(words) + '\n')
with open(file, 'w') as open_file:
    open_file.writelines(output_lines)
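The carry-over between lines is easier to see on an in-memory list of lines. The following is a minimal sketch, not the asker's real setup: replace_after_like is a name invented here, and find_synonymous is replaced with a stand-in that only upper-cases the word, since the real function is not shown in the question.

```python
def find_synonymous(word):
    # Stand-in for the asker's function (not shown in the question):
    # it only upper-cases the word so the replacement is visible.
    return word.upper()


def replace_after_like(lines):
    """Return new lines with the word following each 'like' replaced."""
    output_lines = []
    is_liked = False  # True when the previous word was 'like'
    for line in lines:
        words = line.split()
        for u, word in enumerate(words):
            if is_liked:
                words[u] = find_synonymous(word)
                is_liked = False
            else:
                is_liked = word == 'like'
        output_lines.append(' '.join(words) + '\n')
    return output_lines


# The second line ends with 'like', so the state carries into the third.
sample = ["I like movies\n", "and I also like\n", "long walks\n"]
result = replace_after_like(sample)
print(''.join(result), end='')
```

Because the flag lives outside the line loop, the word after a line-ending 'like' is still replaced, and no index beyond the end of a line is ever read.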
Related
I have searched through a text file to find the most frequent word. I get the correct result, but I get it twice.
I have been through the code line by line but don't understand why I am getting the result twice.
from collections import Counter
# open file
with open('file.txt', 'r') as f:
    text = f.read()
for char in ',\n':
    text = text.replace(char, ' ')
    list = text.split()
    freq = Counter(list).most_common(1)
    for x in freq:
        print(x)
Result -
('RED', 5044)
('RED', 5044)
I have two for loops, but each performs a separate task. In case the two 'for' loops are the cause, I have tried removing -
for char in ',\n':
and using the replace() function as below -
text = text.replace('\n', ' ')
without success.
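The question does not say so, but one likely cause is that the counting and printing lines are indented under the for char in ',\n': loop, so they run once per character in that string, that is, twice. Under that assumption, a minimal sketch with the counting moved out of the loop could look like this. It uses an in-memory string instead of file.txt, and most_common_word is a made-up name:

```python
from collections import Counter


def most_common_word(text):
    # This loop only cleans the text: one pass per separator character.
    for char in ',\n':
        text = text.replace(char, ' ')
    # Count once, after the loop has finished.
    words = text.split()
    return Counter(words).most_common(1)


sample = "RED,BLUE\nRED,GREEN\nRED"
for item in most_common_word(sample):
    print(item)
```

Naming the list words rather than list also avoids hiding the built-in list type.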
I'm trying to create 2 functions.
readfiles(file_path), which reads the file specified by file_path and returns a list of strings containing each line in the file.
writefiles(lines, file_path), which writes the content of the list lines, line by line, to the file specified by file_path.
When used one after another, the output file should be an exact copy of the input file (including the formatting).
This is what I have so far.
file_path = ("/myfolder/text.txt", "r")
def readfiles(file_path):
    with open file_path as f:
        for line in f:
            return line
            lst = list[]
            lst = line
            lst.append(line)
    return lst
read_file(file_path)
lines = lst []
def writefiles(lines, file_path):
    with open ("file_path", "w") as f:
    for line in lst:
        f.write(line)
        f.write("\n")
I can get it to kind of work when I use this for read
with open("/myfolder/text.txt", "r") as f:
    for line in f:
        print(line, end='')
and this for write
with open ("/myfolder/text.txt", "w") as f:
    for line in f:
        f.write(line)
        f.write("\n")
But when I try to put them into functions it all messes up.
I'm not sure why. I know it's a simple question, but it's just not clicking for me. I've read the documentation on it, but I'm not following it fully and am at my wits' end. What's wrong with my functions?
I get varying errors from
lst = list[]
^
SyntaxError: invalid syntax
to
lst or list is not callable
Also I know there are similar questions but the ones I found don't seem to define a function.
The problems with your code are explained as comments:
file_path = ("/myfolder/text.txt", "r") # this is a tuple of 2 elements; it should be file_path = "/myfolder/text.txt"
def readfiles(file_path):
    with open file_path as f: # "open" is a function; using it without parentheses is a syntax error
                              # use open this way: open(file_path, "r")
        for line in f:
            return line # it will return the first line and exit the function
            lst = list[] # "lst = []" is how you define a list in Python; also, you want to define it outside the loop
            lst = line # you are replacing the list lst with the string in line
            lst.append(line) # would throw an error because lst is a string now and doesn't have the append method
    return lst
read_file(file_path) # should be lines = readfiles(file_path)
lines = lst [] # invalid syntax; not needed once lines is assigned on the line above
def writefiles(lines, file_path):
    with open ("file_path", "w") as f: # "file_path" in quotes is a literal file name; use the variable file_path
    for line in lst: # this line should have 1 more tabulation; also loop over lines, not lst
        f.write(line) # this line should have 1 more tabulation
        f.write("\n") # this line should have 1 more tabulation
Here's how the code should look:
def readfiles(file_path):
    lst = []
    with open(file_path) as f:
        for line in f:
            lst.append(line.strip("\n"))
    return lst
def writefiles(lines, file_path):
    with open(file_path, "w") as f:
        for line in lines:
            f.write(line + "\n")
file_path = "/myfolder/text.txt"
filepathout = "myfolder/text2.txt"
lines = readfiles(file_path)
writefiles(lines, filepathout)
A more Pythonic way to do it:
# readlines is a method of file objects that returns all lines as a list
with open(file_path) as f:
    lines = f.readlines()
# strip the trailing newline from each line
lines = [line.strip("\n") for line in lines]
# join converts the list to a single string, putting a \n between the list elements
with open(filepathout, "w") as f:
    f.write("\n".join(lines))
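The question asks for an output file that is an exact copy of the input. The versions above strip the newline and put one back, which can change the end of the file if the input does or does not finish with a newline. A minimal sketch that leaves the line endings on the strings instead, assuming nothing has to be done to the lines in between (the temporary file names are hypothetical and only make the example runnable):

```python
import os
import tempfile


def readfiles(file_path):
    # newline="" turns off newline translation, so "\r\n" stays "\r\n".
    with open(file_path, newline="") as f:
        return f.readlines()  # each string keeps its own line ending


def writefiles(lines, file_path):
    with open(file_path, "w", newline="") as f:
        f.writelines(lines)  # writes the strings exactly as they are


# Round trip through a temporary directory and compare the raw bytes.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "text.txt")
    dst = os.path.join(tmp, "text2.txt")
    with open(src, "w", newline="") as f:
        f.write("first line\n\n  indented line\nno newline at end")
    writefiles(readfiles(src), dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        identical = a.read() == b.read()

print(identical)
```

The trade-off is that every string in the list still ends with its newline, so any processing between the two calls has to account for that.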
Key points:
- the function stops as soon as it reaches a return statement
- be careful where you define your variables,
e.g. "lst" defined inside a for loop is redefined on each iteration
Defining variables:
- for a list: list_var = []
- for a tuple: tup_var = (1, 2)
- for an int: int_var = 3
- for a dictionary: dict_var = {}
- for a string: string_var = "test"
A couple of learning points here that will help.
In your reading function, you are kind of close. However, you cannot put the return statement in the loop: as soon as the function reaches a return for the first time, it ends. Also, if you are going to make a container to hold the things read, you need to create it before you start your loop. Lastly, don't name anything list. It is the name of a built-in type, and reusing it hides the built-in. If you want to make a new list, just do something like: results = list() or results = []
So in pseudocode, you should:
Make a list to hold results
Open the file as you are now
Make a loop to loop through lines
append to the results list
return the results (outside the loop)
Your writefiles is very close. You should be looping through the lines variable that is a parameter of your function. Right now you are referencing lst which is not a parameter of your function.
Good luck!
while True:
    try:
        line = input("paste:")
    except EOFError:
        break
f = open("notam_new.txt", "w+")
f.write(line)
f.close()
This code returns only the last line of the multi-line input after Ctrl+D.
I tried also:
notam = input("paste new notam: ")
f = open("notam_new.txt", "w+")
f.write(notam)
f.close()
but with that I get only the first row.
Any ideas?
You're setting line in a loop, so on every iteration you're just overwriting said line with the next one. You need to accumulate your lines in a list (created before the while True) so you can keep track of all of them, and then write them to the file in a loop. You also need to add a newline after each one, as input() strips it.
lines = []
while True:
    try:
        lines.append(input("paste:"))
    except EOFError:
        break
with open("notam_new.txt", "w+") as f:
    for line in lines:
        f.write(line)
        f.write('\n')
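If the prompt is not needed, another option (a sketch, not part of the answer above) is to read everything up to end-of-file in one call with sys.stdin.read(), which keeps the newlines, so nothing has to be added back. The helper name save_pasted_text and its stream parameter are made up for this example; the parameter only exists so the function can be tried without typing input.

```python
import io
import os
import sys
import tempfile


def save_pasted_text(path, stream=None):
    """Read all text from stream until EOF and write it to path."""
    if stream is None:
        stream = sys.stdin  # interactive use: finish with Ctrl+D
    text = stream.read()  # newlines in the input are preserved
    with open(path, "w") as f:
        f.write(text)
    return text


# Demo with an in-memory stream standing in for the pasted input.
fake_input = io.StringIO("line one\nline two\nline three\n")
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "notam_new.txt")
    save_pasted_text(out_path, fake_input)
    with open(out_path) as f:
        saved = f.read()

print(saved, end="")
```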
I have a folder containing some other folders, each of which contains a lot of text files, about 32214 files in total. I want to print the 5 words before and after a specific word, and my code should read all of these files. The code below works, but it takes about 8 hours to read all of the files and extract the sentences. How can I change the code so that it reads and prints the sentences in just a few minutes? (The language is Persian.)
.
.
.
def extact_sentence ():
    f= open ("پاکت", "w", encoding = "utf-8")
    y = "پاکت"
    text= normal_text(folder_path) # the first function to normalize the files
    for i in text:
        for line in i:
            split_line = line.split()
            if y in split_line:
                index = split_line.index(y)
                d = (' '.join(split_line[max(0,index-5):min(index+6,len(split_line))]))
                f.write(d + "\n")
    f.close()
Use os.walk to access all the files. Then use a rolling window over each file, and check the middle word of each window:
import os
from itertools import islice
def getRollingWindow(seq, window_size):
    seq = iter(seq)
    win = list(islice(seq, window_size))
    if len(win) < window_size:
        return  # fewer words than one full window
    yield win
    for e in seq:
        win[:-1] = win[1:]
        win[-1] = e
        yield win
def extractSentences(rootDir, searchWord):
    with open("پاکت", "w", encoding="utf-8") as outfile:
        for root, _dirs, fnames in os.walk(rootDir):
            for fname in fnames:
                print("Looking in", os.path.join(root, fname))
                # encoding="utf-8" is an assumption, matching the output file in the question
                with open(os.path.join(root, fname), encoding="utf-8") as infile:
                    words = (word for line in infile for word in line.split())
                    for window in getRollingWindow(words, 11):
                        if window[5] != searchWord: continue
                        outfile.write(' '.join(window) + '\n')
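To see what the rolling window produces, here is a small sketch on an in-memory word list. It uses collections.deque with maxlen rather than slice assignment, which is a swapped-in technique and not the code from the answer; rolling_windows and contexts are names invented for the example.

```python
from collections import deque
from itertools import islice


def rolling_windows(words, size):
    """Yield each run of `size` consecutive words as a tuple."""
    it = iter(words)
    window = deque(islice(it, size), maxlen=size)
    if len(window) < size:
        return  # not enough words for a single full window
    yield tuple(window)
    for word in it:
        window.append(word)  # the oldest word drops off the left
        yield tuple(window)


def contexts(words, search_word, before_after):
    """Return the windows whose middle word equals search_word."""
    size = 2 * before_after + 1
    return [' '.join(w) for w in rolling_windows(words, size)
            if w[before_after] == search_word]


text = "a b c target d e f target g"
print(contexts(text.split(), "target", 2))
```

Only the first target is reported here, because fewer than two words follow the second one. The same limitation applies to an 11-word window with the match at position 5: a search word within five words of the start or end of a file never lands in the middle of a full window.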
So I have this messy code where I want to get every word from frankenstein.txt, sort the words alphabetically, eliminate one- and two-letter words, and write them into a new file.
def Dictionary():
    d = []
    count = 0
    bad_char = '~!##$%^&*()_+{}|:"<>?\`1234567890-=[]\;\',./ '
    replace = ' '*len(bad_char)
    table = str.maketrans(bad_char, replace)
    infile = open('frankenstein.txt', 'r')
    for line in infile:
        line = line.translate(table)
        for word in line.split():
            if len(word) > 2:
                d.append(word)
                count += 1
    infile.close()
    file = open('dictionary.txt', 'w')
    file.write(str(set(d)))
    file.close()
Dictionary()
How can I simplify it and make it more readable? Also, how can I make the words appear vertically, one per line, in the new file (at the moment it writes them as a horizontal list)? For example:
abbey
abhorred
about
etc....
A few improvements below:
from string import digits, punctuation
def create_dictionary():
    words = set()
    bad_char = digits + punctuation + '...' # may need more characters
    replace = ' ' * len(bad_char)
    table = str.maketrans(bad_char, replace)
    with open('frankenstein.txt') as infile:
        for line in infile:
            line = line.strip().translate(table)
            for word in line.split():
                if len(word) > 2:
                    words.add(word)
    with open('dictionary.txt', 'w') as outfile:
        # writelines does not add line endings, so append '\n' to each word
        outfile.writelines(word + '\n' for word in sorted(words))
A few notes:
follow the style guide (PEP 8), e.g. lowercase function names;
string contains constants you can use to provide the "bad characters";
you never used count (which was just len(d) anyway);
use the with context manager for file handling; and
using a set from the start prevents duplicates, but they aren't ordered (hence sorted).
Using the re module:
import re
words = set()
with open('frankenstein.txt') as infile:
    for line in infile:
        # a set has update(), not extend(); '+' avoids splitting on empty matches
        words.update(x for x in re.split(r'[^A-Za-z]+', line) if len(x) > 2)
with open('dictionary.txt', 'w') as outfile:
    # writelines does not add line endings, so append '\n' to each word
    outfile.writelines(word + '\n' for word in sorted(words))
In r'[^A-Za-z]+' in re.split, replace 'A-Za-z' with the characters that you want to include in dictionary.txt.
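A minimal in-memory sketch of the same idea, using re.findall to pick out runs of letters instead of splitting on everything else. The function name unique_words and the lower-casing step are assumptions added here (the expected output in the question is lower-case); drop .lower() if case should be kept.

```python
import re


def unique_words(text, min_length=3):
    """Return the sorted unique words of at least min_length letters."""
    found = re.findall(r'[A-Za-z]+', text.lower())
    return sorted({w for w in found if len(w) >= min_length})


sample = "About the abbey: I abhorred it, about 2 days ago!"
words = unique_words(sample)
print('\n'.join(words))  # one word per line
```

Joining with '\n' (or writing word + '\n' for each word) is what puts the words on separate lines in the output file.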