random.choice displays an error - python-3.x

I am new to Stack Overflow and I was wondering if anyone could help me with the following question. If you know of a similar question that has already been answered, please point me towards it. Thanks :)
This is my code for a function load_words() that builds a list of 6-letter words from the file "words.txt". I want random.choice() to pick a random word from the list and save it into word. However, I get the error below.
import random

def load_words(filename, length):
    file = open(filename, "r")
    words = []
    for line in file:
        word = line.strip()
        if len(word) == length:
            words.append(word)
    return words

word = random.choice(words)
print(word)
The error I get is:
Traceback (most recent call last):
File "C:\Users\mssuk\Desktop\University\Software Engineering\Assignment\assignment 1 - word guessing game\compute_score.py", line 14, in <module>
word = random.choice(words)
NameError: name 'words' is not defined

The words list only exists inside load_words(), so referring to it at module level raises the NameError. Keep the return statement indented inside the function, then call the function and use its return value:
import random

def load_words(filename, length):
    file = open(filename, "r")
    words = []
    # Assuming there is only one word per line
    for line in file:
        word = line.strip().lower()
        if len(word) == length:
            words.append(word)
    file.close()
    return words

# replace your_filename and your_length with your own values
word = random.choice(load_words(your_filename, your_length))
print(word)
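A small variant of the same idea, shown here only as a sketch: a with statement closes the file automatically even if something goes wrong, and the filename and length below are just the values from the question ("words.txt" and 6).

import random

def load_words(filename, length):
    """Return every word of the given length, assuming one word per line."""
    words = []
    with open(filename, "r") as file:  # the file is closed automatically on exit
        for line in file:
            word = line.strip().lower()
            if len(word) == length:
                words.append(word)
    return words

word = random.choice(load_words("words.txt", 6))
print(word)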

Related

read and write from and to file using functions

I'm trying to create 2 functions.
readfiles(file_path), which reads the file specified by file_path and returns a list of strings containing each line in the file.
writefiles(lines, file_path), which writes the content of the list lines, line by line, to the file specified by file_path.
When used one after the other, the output file should be an exact copy of the input file (including the formatting).
This is what I have so far.
file_path = ("/myfolder/text.txt", "r")

def readfiles(file_path):
    with open file_path as f:
        for line in f:
            return line
        lst = list[]
        lst = line
        lst.append(line)
        return lst

read_file(file_path)
lines = lst []

def writefiles(lines, file_path):
    with open ("file_path", "w") as f:
    for line in lst:
        f.write(line)
        f.write("\n")
I can get it to kind of work when I use this for read
with open("/myfolder/text.txt", "r") as f:
for line in f:
print(line, end='')
and this for write
with open ("/myfolder/text.txt", "w") as f:
for line in f:
f.write(line)
f.write("\n")
But when I try to put them into functions it all messes up.
I'm not sure why; I know it's a simple question, but it's just not clicking for me. I've read the documentation on it, but I'm not following it fully and am at my wit's end. What's wrong with my functions?
I get varying errors from
lst = list[]
^
SyntaxError: invalid syntax
to
lst or list is not callable
Also I know there are similar questions but the ones I found don't seem to define a function.
The problems with your code are explained in the comments below:
file_path = ("/myfolder/text.txt", "r")  # this is a tuple of 2 elements; it should be file_path = "/myfolder/text.txt"

def readfiles(file_path):
    with open file_path as f:  # "open" is a function and will probably throw an error if you use it without parentheses
                               # use open this way: open(file_path, "r")
        for line in f:
            return line        # it will return the first line and exit the function
        lst = list[]           # "lst = []" is how you define a list in python; also you want to define it outside the loop
        lst = line             # you are replacing the list lst with the string in line
        lst.append(line)       # will throw an error because lst is a string now and doesn't have the append method
        return lst

read_file(file_path)  # should be lines = read_file(file_path)
lines = lst []        # lines is an empty list

def writefiles(lines, file_path):
    with open ("file_path", "w") as f:
    for line in lst:      # this line should have 1 more tabulation
        f.write(line)     # this line should have 1 more tabulation
        f.write("\n")     # this line should have 1 more tabulation
Here's how the code should look:
def readfiles(file_path):
    lst = []
    with open(file_path) as f:
        for line in f:
            lst.append(line.strip("\n"))
    return lst

def writefiles(lines, file_path):
    with open(file_path, "w") as f:
        for line in lines:
            f.write(line + "\n")

file_path = "/myfolder/text.txt"
filepathout = "/myfolder/text2.txt"

lines = readfiles(file_path)
writefiles(lines, filepathout)
A more pythonic way to do it
# readlines is a built-in method of file objects in python
with open(file_path) as f:
    lines = f.readlines()

# stripping line returns
lines = [line.strip("\n") for line in lines]

# join will convert the list to a string by adding a \n between the list elements
with open(filepathout, "w") as f:
    f.write("\n".join(lines))
key points:
- the function stops after reaching the return statement
- be careful where you define your variables, e.g. "lst" defined inside a for loop will get redefined on each iteration
defining variables:
- for a list: list_var = []
- for a tuple: tup_var = (1, 2)
- for an int: int_var = 3
- for a dictionary: dict_var = {}
- for a string: string_var = "test"
A couple of learning points here that will help.
In your reading function, you are kinda close. However, you cannot put the return statement in the loop: as soon as the function hits a return for the first time, it ends. Also, if you are going to make a container to hold the list of things read, you need to create it before you start your loop. Lastly, don't name anything list; it shadows the built-in type. If you want to make a new list, just do something like results = list() or results = []
So in pseudocode, you should:
Make a list to hold results
Open the file as you are now
Make a loop to loop through lines
append to the results list
return the results (outside the loop)
Your writefiles is very close. You should be looping through the lines variable that is a parameter of your function; right now you are referencing lst, which is not a parameter of your function. A short sketch of both functions follows below.
Good luck!
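Following the pseudocode above, a minimal sketch might look like this (the function names are the ones from the question; the container is created before the loop and returned after it):

def readfiles(file_path):
    results = []                          # make a list to hold the results
    with open(file_path) as f:
        for line in f:                    # loop through the lines
            results.append(line.rstrip("\n"))
    return results                        # return outside the loop

def writefiles(lines, file_path):
    with open(file_path, "w") as f:
        for line in lines:                # iterate over the parameter, not lst
            f.write(line + "\n")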

How to delete or skip a list of lines in a text file and print the remaining lines in a new text file?

I am very new to Python. I am trying to create a script that writes the lines of a text file to a new text file, excluding a given list of lines. Is the error IndexError: list index out of range due to the .pop() function?
with open(file_path) as f:
    lines = []
    lines = open(f, 'r').readlines()

# delete the following lines from the textfile
skip_line = [14, 27, 39, 56, 78]

while skip_line:
    pop = skip_line.pop(0)
    print(pop)
    print(lines[pop])
    lines.remove(lines[pop])

with open('duplicates_removed.txt', 'w') as savefile:
    savefile.writelines(lines)
    savefile.close()
I expect that the lines found in lines[pop] will be removed from lines.
Actual result:
IndexError: list index out of range
Yes, the pop/remove pattern is the problem: every lines.remove(...) call shortens lines, so the later hard-coded indices (e.g. 78) can end up past the end of the shrunken list, and they no longer refer to the lines you originally meant to skip. It is simpler not to delete anything and just skip those line numbers while copying:
skip_lines = {14, 27, 39, 56, 78}

with open(filepath) as infile:
    with open("duplicates_removed.txt", "w") as outfile:
        for index, line in enumerate(infile):
            if index not in skip_lines:
                outfile.write(line)
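If you do want to delete from the list in place instead, one way (a sketch; the input path here is a placeholder) is to delete the indices from highest to lowest, so that earlier positions are not shifted by each deletion:

file_path = "input.txt"                 # placeholder path
skip_line = [14, 27, 39, 56, 78]

with open(file_path) as f:
    lines = f.readlines()

# delete from the highest index down so the remaining indices stay valid
for index in sorted(skip_line, reverse=True):
    if index < len(lines):
        del lines[index]

with open("duplicates_removed.txt", "w") as savefile:
    savefile.writelines(lines)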

Unable to save the file correctly

I have a text file that contains a story, and I want to find the word "like", get the next word after it, and call a function to find synonyms for that word. Here is my code:
file = 'File1.txt'
with open(file, 'r') as open_file:
    read_file = open_file.readlines()

output_lines = []
for line in read_file:
    words = line.split()
    for u, word in enumerate(words):
        if 'like' == word:
            next_word = words[u + 1]
            find_synonymous(next_word)
    output_lines.append(' '.join(words))
    with open(file, 'w') as open_file:
        open_file.write(' '.join(words))
I think my only problem is in the text itself, because when I write one sentence including the word "like" it works (for example, 'I like movies'). But when I have a file containing a lot of sentences and run the code, it deletes all the text. Does anyone know where the problem could be?
You have a couple of problems. find_synonymous(next_word) doesn't replace the word in the list, so at best you will get the original text back. You do open(file, 'w') inside the for loop, so the file is overwritten for each line. next_word = words[u + 1] will raise an IndexError if "like" happens to be the last word on the line, and you don't handle the case where the thing that is liked continues on the next line.
In this example, I track an "is_liked" state. If a word is in the like state, it is converted. That way you can handle sentences that are split across lines and don't have to worry about index errors. The list is written to the file outside the loop.
file = 'File1.txt'
with open(file, 'r') as open_file:
    read_file = open_file.readlines()

output_lines = []
is_liked = False
for line in read_file:
    words = line.split()
    for u, word in enumerate(words):
        if is_liked:
            words[u] = find_synonymous(word)
            is_liked = False
        else:
            is_liked = 'like' == word
    output_lines.append(' '.join(words) + '\n')

with open(file, 'w') as open_file:
    open_file.writelines(output_lines)

how to calculate the number of unique words just in a part of a file

I have a file in Persian (a Persian sentence, a "tab", then a Persian word, again a "tab" and then an English word). I have to calculate the number of unique words just in Persian sentences and not the Persian and English words after the tabs. Here's the code:
from hazm import *

file = "F.txt"

def WordsProbs(file):
    words = set()
    with open(file, encoding="utf-8") as f1:
        normalizer = Normalizer()
        for line in f1:
            tmp = line.strip().split("\t")
            words.update(set(normalizer.normalize(tmp[0].split())))
    print(len(words), "unique words")
    print(words)
To access just the sentences, I have to split each line by "\t", and to access each word of the sentence I have to split tmp[0]. The problem is that when I run the code, the error below occurs; it's because of the split after tmp[0]. But if I omit this split after tmp[0], it just counts the letters, not unique words. How can I fix it? (Is there another way to write this code to calculate unique words?)
The error:
Traceback (most recent call last):
File "C:\Users\yasini\Desktop\16.py", line 15, in
WordsProbs (file)
File "C:\Users\yasini\Desktop\16.py", line 10, in WordsProbs
words.update(set(normalizer.normalize(tmp[0].split())))
File "C:\Python34\lib\site-packages\hazm\Normalizer.py", line 46, in normalize
text = self.character_refinement(text)
File "C:\Python34\lib\site-packages\hazm\Normalizer.py", line 65, in character_refinement
text = text.translate(self.translations)
AttributeError: 'list' object has no attribute 'translate'
sample file:
https://www.dropbox.com/s/r88hglemg7aot0w/F.txt?dl=0
The problem is that hazm.Normalizer.normalize takes a space-separated string as an argument, NOT a list. You can see an example here under the "Usage" heading.
Move the .split() outside the call to normalize, i.e. normalize the sentence first and then split the normalized string into words, so that
words.update(set(normalizer.normalize(tmp[0].split())))
becomes
words.update(set(normalizer.normalize(tmp[0]).split()))
and you should be good to go.
I found it myself.
from hazm import *

file = "F.txt"

def WordsProbs(file):
    words = []
    mergelist = []
    with open(file, encoding="utf-8") as f1:
        normalizer = Normalizer()
        for line in f1:
            line = normalizer.normalize(line)
            tmp = line.strip().split("\t")
            words = tmp[0].split()
            # print(len(words), "unique words")
            # print(words)
            for i in words:
                mergelist.append(i)
    uniq = set(mergelist)
    uniqueWords = len(uniq)

Creating a dictionary to count the number of occurrences of Sequence IDs

I'm trying to write a function to count the number of times each sequence ID occurs in this file (it's a sample BLAST file).
The picture above shows the input file I'm dealing with.
def count_seq(input):
    dic1 = {}
    count = 0
    for line in input:
        if line.startswith('#'):
            continue
        if line.find('hits found'):
            line = line.split('\t')
            if line[1] in dic1:
                dic1[line] += 1
            else:
                dic1[line] = 1
    return dic1
Above is my code, which when called just returns empty brackets {}.
So I'm trying to count how many times each of the sequence IDs (the second element of the last 13 lines) occurs, e.g. FO203510.1 occurs 4 times.
Any help would be appreciated immensely, thanks!
Maybe this is what you're after:
def count_seq(input_file):
    dic1 = {}
    with open(input_file, "r") as f:
        for line in f:
            line = line.strip()
            if not line.startswith('#'):
                line = line.split()
                seq_id = line[1]
                if seq_id not in dic1:
                    dic1[seq_id] = 1
                else:
                    dic1[seq_id] += 1
    return dic1

print(count_seq("blast_file"))
This is a fitting case for collections.defaultdict. Let f be the file object. Assuming the sequences are in the second column, it's only a few lines of code as shown.
from collections import defaultdict

d = defaultdict(int)

seqs = (line.split()[1] for line in f if not line.strip().startswith("#"))
for seq in seqs:
    d[seq] += 1
See if it works!
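collections.Counter covers the same counting pattern in one step; a small sketch (assuming, as above, that the IDs sit in the second whitespace-separated column and that "blast_file" stands in for the real filename):

from collections import Counter

with open("blast_file") as f:
    counts = Counter(
        line.split()[1]
        for line in f
        if line.strip() and not line.startswith("#")
    )

print(counts.most_common())  # e.g. [('FO203510.1', 4), ...]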
