Python calling class variable in for loop - python-3.x

I am new to Python and am confused by the for loop usage below. Can anyone please help me understand how the class is used in this for loop?
import sys
def checkline():
    glb.linecount += 1
    w = glb.l.split()
    glb.wordcount += len(w)
class glb:
    linecount = 0
    wordcount = 0
    l = []
f = open('Untitled9.ipynb','r')
for glb.l in f.readlines(): #what glb.l exactly does?
    checkline()
print(glb.linecount, glb.wordcount)

This program counts the lines and words in a file. Specifically, the for loop assigns each line of the file to glb.l in turn, so checkline() can split that line and count its words.
Let me pseudocode it for you.
Open the file `Untitled9.ipynb` for reading. // f
For each line in the file:
    Store the line. // the loop assigns the line to glb.l, replacing the previous value
    Add one to the line count. // checkline
    Add the number of whitespace-separated words in the line to the word count. // len() of the result of split() on glb.l
Print the line count and the word count.
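To make the role of glb.l concrete, here is a minimal sketch of the same idea. It relies on the fact that the target of a for loop can be any assignment target, including a class attribute. The names Glb, check_line and the in-memory lines list are hypothetical stand-ins for the class, function and file in the question.

```python
class Glb:
    """Hypothetical stand-in for the glb class in the question."""
    line = ""
    linecount = 0
    wordcount = 0

def check_line():
    # Reads the line that the for loop most recently stored on the class.
    Glb.linecount += 1
    Glb.wordcount += len(Glb.line.split())

lines = ["one two three\n", "four five\n"]  # stands in for f.readlines()

# Each iteration performs the assignment `Glb.line = <next item>`.
for Glb.line in lines:
    check_line()

print(Glb.linecount, Glb.wordcount)
```

After the loop finishes, Glb.line still holds the last line, just as an ordinary loop variable would.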

Related

find a sequence in string

Hi, I've got a problem set in CS50 and I'm having difficulties, as this is my first week in Python. I would appreciate it if you didn't write out a full answer but pointed me to the right functions or methods to use.
We've been given a long string sequence in a .txt file: one line, no whitespace. I have to find the longest consecutive run of a given DNA substring.
example txt:
GGAGGCCAAAGTCTTGTGATATCGGGCAACTCCCCGGGAGGAACACAGGCCCACCGAAAACAGCTTGAAATGGGAAACGTTCCCGATCTACGCCGGGCCAGAGG
The original text is around 5000 characters, but it looks like the example above. My task is to find the longest consecutive run of the string 'AGATC'.
Let's say the first consecutive run repeats 23 times, and as I keep reading I find another run that repeats 34 times; I have to store the bigger number.
My problem is not reading the string. I can read it, count the total number of repetitions and so on, but finding the longest repetition hasn't made sense in any way I've tried. I thought C was hard, but I could write this code so easily in C, since there are so many ways to manipulate strings there. At least in C there are ways to read a given number of characters at a time, but as far as I can see Python reads everything at once and there is no control over the read. It doesn't seem like you can do much with Python, at least at my current level of knowledge :/ Python probably has one-line solutions for this; please don't judge, this is my 3rd day and 4th program in Python.
What functions or methods should I look at to analyze a string in this way? I've watched videos about a similar problem, but for a run of a single character, not a string. I also bought Python Crash Course to learn about string manipulation but couldn't find anything related to this case. I also checked the Python documentation, but it's obviously too complicated for day 3 in Python.
Could anyone help me, please? TIA
Here is my not-working and not-making-sense code:
import csv
import sys
#check the arguments count
if len(sys.argv) != 3:
    print("Usage: python dna.py data.csv sequence.txt")
    sys.exit(1)
#create a dictionary to store str results
SEQ = {
    "AGATC": 0,
    "AATG": 0,
    "TATC": 0
}
counter = 0 #keeps the length of the sequence
seq = 0 #keeps the longest sequence
DNA = '' ## keeps the key of SEQ, "AGATC" etc.
#find the longest consecutive sequence of DNA
def findSEQ(file, DNA): #get the sequences text file and the string of the key as parameters
    for DNA in (DNA, file):
        if file[i:i + len(DNA)] == DNA: #if find a match
            counter += 1 #count up the sequence
        else:
            if counter > seq: #if it's not a sequence the next thing it reads
                seq = counter
                counter = 0
    return seq
seq = 0
#open sequence file and read
with open(sys.argv[2],'r') as file:
    reader = csv.reader(file)
    #find the longest sequence of AGATC
    findSEQ("AGATC", file)
    #update the seq dictionary
    SEQ["AGATC"] = seq
    #find the longest sequence of AATG
    findSEQ(file, "AATG")
    #update the seq dictionary
    SEQ["AATG"] = seq
    #find the longest sequence of TATC
    findSEQ(file, "TATC")
    #update the seq dictionary
    SEQ["TATC"] = seq
#open and read database
with open(sys.argv[1], "r") as file:
    reader = csv.reader(file)
    #skip the first row
    next(reader)
    #compare the seq dictionary results with database
    for row in reader:
        seq1, seq2, seq3 = row[1], row[2], row[3]
        #if found any match print the name
        if SEQ[seq1] == row[1] and SEQ[seq2] == row[2] and SEQ[seq3] == row[3]:
            print(row[0])
        #otherwise print not found
        else:
            print("Not found any match.")
To elaborate on my comment, please find the following example:
import re
text = 'GGAGGCCAAGATCAAGTCTTGTGATATCGGGCAACTCCCCGGGAAGATCAGATCAGATCGGAACACAGGCCCACCGAAAACAGCTTGAAGATCAATGGGAAACGTTCCCGATCTACGCCGGGCCAGAGG'
sequence = 'AGATC'
pattern = f'(?:{sequence})+'
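findings = sorted(re.findall(pattern, text), key=len)
longest_sequence = len(findings[-1]) // len(sequence)  # integer division gives a whole repeat count
print(f'longest sequence: {longest_sequence}')
This program uses regex (regular expressions) to find runs of the pattern you're looking for. It then sorts the findings by length in ascending order, so the longest run is the last item in the list. Dividing its length by the length of the sequence gives the number of consecutive repeats. If the sequence does not occur at all, findings is empty and findings[-1] raises an IndexError.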
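The same regex idea can be wrapped in a small function that also copes with a sequence that never occurs. This is only a sketch: longest_run is a hypothetical name, re.escape is added as a precaution in case the sequence ever contains regex metacharacters, and max(..., default=0) replaces the sort.

```python
import re

def longest_run(text, sequence):
    """Return the largest number of back-to-back repeats of sequence in text."""
    # (?:...)+ matches one or more consecutive copies of the sequence.
    pattern = f"(?:{re.escape(sequence)})+"
    runs = re.findall(pattern, text)
    # Each run's length divided by the sequence length is its repeat count;
    # default=0 covers the case where the sequence is not found at all.
    return max((len(run) // len(sequence) for run in runs), default=0)

print(longest_run("TTAGATCAGATCGGAGATCAGATCAGATCCC", "AGATC"))
print(longest_run("GGGGCCCC", "AGATC"))
```

Returning 0 when nothing matches keeps the result usable as a count without a separate emptiness check.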

How can I print the line index of a specific word in a text file?

I was trying to find a way to print the biggest word from a txt file, its size and its line index. I managed to get the first two done but can't quite figure out how to print the line index. Can anyone help me?
def BiggestWord():
    list_words = []
    with open('song.txt', 'r') as infile:
        lines = infile.read().split()
        for i in lines:
            words = i.split()
            list_words.append(max(words, key=len))
        biggest_word = str(max(list_words, key=len))
        print biggest_word
        print len(biggest_words)
        FindWord(biggest_word)
def FindWord(biggest_word):
You don't need a second loop over the list of each line's longest word. Every extra loop adds running time and complexity, so it's better to avoid unnecessary ones when possible.
One option is to use Python's built-in enumerate function to get an index for each line in the list of lines, and, instead of appending each line's longest word to a list, compare it with the longest word found so far.
def get_largest_word():
    # Setting initial variable values
    current_max_word = ''
    current_max_word_length = 0
    current_max_word_line = None
    with open('song.txt', 'r') as infile:
        lines = infile.read().splitlines()
    for line_index, line in enumerate(lines):
        words = line.split()
        if not words:
            continue  # skip blank lines: max() raises ValueError on an empty list
        max_word_in_line = max(words, key=len)
        max_word_in_line_length = len(max_word_in_line)
        if max_word_in_line_length > current_max_word_length:
            # updating the largest word value with a new maximum word
            current_max_word = max_word_in_line
            current_max_word_length = max_word_in_line_length
            current_max_word_line = line_index + 1  # line number starting from 1
    print(current_max_word)
    print(current_max_word_length)
    print(current_max_word_line)
    return current_max_word, current_max_word_length, current_max_word_line
P.S.: This function doesn't decide what to do when several words share the maximum length; as written, the first one found wins. Adjust the code if you need different behaviour.
P.P.S.: This example is written for Python 3, so adapt the snippet if you need it to run on Python 2.7.
With the limited info I have to work with, this is the best solution I could think of. Assuming that the lines are separated by a newline character such as '\n', you could do:
def FindWord(largest_word):
    with open('song.txt', 'r') as infile:
        lines = infile.read().splitlines()
    linecounter = 1
    for i in lines:
        if largest_word in i:  # test the current line, not the whole list of lines
            return linecounter
        linecounter += 1
You can use enumerate in your for loop to get the current line index, and sorted with a lambda to get the longest word:
def longest_word_from_file(filename):
    list_words = []
    with open(filename, 'r') as input_file:
        for index, line in enumerate(input_file):
            words = line.split()
            if words:  # skip blank lines, where max() would raise ValueError
                list_words.append((max(words, key=len), index))
    sorted_words = sorted(list_words, key=lambda x: -len(x[0]))
    longest_word, line_index = sorted_words[0]
    return longest_word, line_index
Are you aware that there can be:
many 'largest' words with the same length
several lines contain word(s) with the biggest length
Here is the code that finds ONE largest word and returns a LIST of numbers of lines that contain the word:
# build a dictionary:
# line_num: largest_word_in_this_line
# line_num: largest_word_in_this_line
# etc...
# !!! actually, a line can contain several largest words
list_words = {}
with open('song.txt', 'r') as infile:
    for i, line in enumerate(infile.read().splitlines()):
        # default='' keeps max() from raising ValueError on a blank line
        list_words[i] = max(line.split(), key=len, default='')
# get the largest word from values of the dictionary
# !!! there can be several different 'largest' words with the same length
largest_word = max(list_words.values(), key=len)
# get a list of numbers of lines (keys of the dictionary) that contain the largest word
lines = list(filter(lambda key: list_words[key] == largest_word, list_words))
print(lines)
If you want to get all lines that have words with the same biggest length you need to modify the last two lines in my code this way:
lines = list(filter(lambda key: len(list_words[key]) == len(largest_word), list_words))
print(lines)
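The tie cases raised in this answer can also be handled in a single pass. The following is only a sketch under stated assumptions: longest_words is a hypothetical helper, it takes any iterable of strings (an open file would work too), and it reports line numbers starting from 1.

```python
def longest_words(lines):
    """Return (length, [(line_number, word), ...]) for every longest word."""
    best_length = 0
    hits = []
    for line_number, line in enumerate(lines, start=1):
        for word in line.split():
            if len(word) > best_length:
                # A strictly longer word replaces everything found so far.
                best_length = len(word)
                hits = [(line_number, word)]
            elif len(word) == best_length:
                # A tie is recorded alongside the earlier finds.
                hits.append((line_number, word))
    return best_length, hits

sample = ["la la la", "", "hello world", "short words"]
print(longest_words(sample))
```

Because the inner loop simply does nothing for a blank line, no special handling for empty lines is needed here.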

Reading in a file of one-word lines in python

Just curious if there's a cleaner way to do this. I have a list of words in a file, one word per line.
I want to read them in and pass each word to a function.
I've currently got this:
f = open(fileName,"r");
lines = f.readlines();
count = 0
for i in lines:
    count += 1
    print("--{}--".format(i.rstrip()))
    if count > 100:
        return
Is there a way to read them in faster without using rstrip on each line?
with open(fileName) as f:
    lines = (line for _, line in zip(range(100), f.readlines()))
    for line in lines:
        print('--{}--'.format(line.rstrip()))
This is how I would do it. Note the context manager (the with/as statement) and the generator expression, which gives us only the first 100 lines.
Similar to Patrick's answer:
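with open(filename, "r") as f:
    for i, line in enumerate(f):
        if i >= 100:
            break
        print("--{}--".format(line[:-1]))
If you don't want to call .rstrip() and you know the length of the line terminator, you can slice it off with [:-1]. Note that this also removes the last character of a final line that has no trailing newline.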
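Another option, sketched here as an assumption rather than as either answerer's method, is itertools.islice, which stops after a fixed number of lines without reading the whole file. The helper name first_words is hypothetical, and io.StringIO stands in for an open file so the example is self-contained.

```python
import io
from itertools import islice

def first_words(lines, limit=100):
    """Yield up to `limit` lines from any iterable, without the trailing newline."""
    for line in islice(lines, limit):
        # rstrip("\n") removes only the newline and is safe on a last line
        # that has no terminator.
        yield line.rstrip("\n")

# With a real file you would write `with open(file_name) as f:` and pass `f`.
fake_file = io.StringIO("apple\nbanana\ncherry\n")
for word in first_words(fake_file, limit=2):
    print("--{}--".format(word))
```

Iterating over the file object directly reads lines lazily, so only the lines that are actually needed are pulled from the file.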

Python: having trouble with for loop saving only the last object multiple times

I've written the following code to load from a newline-separated text file. It stores apple objects made up of colour, then size, then kind (each on its own line). The weird thing is that the load function works, but every object it returns is identical to the last object loaded (although the list has the correct number of objects for the number of lines in the text file). The print near the append shows the correct data being read for each object, though...
I'm not sure what is wrong or how to fix it.
def loadInApplesTheOtherWay(filename):
    tempList = []
    #make a tempList and load apples from file into it
    with open(filename,"r") as file:
        #file goes, colour \n size \n kind \n repeat...
        lines = file.read().splitlines()
        count = 1
        newApple = apple()
        for line in lines:
            if count % 3 == 1:
                newApple.colour = line
            if count % 3 == 2:
                newApple.size = line
            if count % 3 == 0:
                newApple.kind = line
                tempList.append(newApple)
                print(newApple)
            count +=1
    return tempList
newApple is just a reference to a single object.
>>> list(map(id, tempList))
The line above will show that every apple in the list has the same id. Since the last modification of newApple happens at the end of the file, every entry in tempList looks like the last apple.
To make them differ, you can deep-copy the object before appending it, e.g. tempList.append(copy.deepcopy(newApple)); see https://docs.python.org/3/library/copy.html for more details.
Or you can create the object on the fly; you don't have to allocate newApple before the for loop.
def loadInApplesTheOtherWay(filename):
    tempList = []
    #make a tempList and load apples from file into it
    with open(filename,"r") as file:
        #file goes, colour \n size \n kind \n repeat...
        lines = file.read().splitlines()
        count = 1
        for line in lines:
            if count % 3 == 1:
                colour = line
            if count % 3 == 2:
                size = line
            if count % 3 == 0:
                kind = line
                #assumes the Apple constructor accepts colour, size and kind
                newApple = Apple(colour, size, kind)
                tempList.append(newApple)
                print(newApple)
            count +=1
    return tempList
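You need to move newApple = apple() inside of the for loop, so that a new object is created for each apple (once every three lines, for example in the count % 3 == 1 branch) rather than once for the whole file.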
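A minimal sketch of that fix follows. The Apple class here is a hypothetical stand-in for the asker's class, and the lines come from a list rather than a file. The key point is that a fresh object is created at the start of each three-line record, so the list no longer holds several references to one object.

```python
class Apple:
    """Hypothetical stand-in for the asker's apple class."""
    def __init__(self):
        self.colour = None
        self.size = None
        self.kind = None

    def __repr__(self):
        return "Apple({!r}, {!r}, {!r})".format(self.colour, self.size, self.kind)

def load_apples(lines):
    apples = []
    new_apple = None
    for count, line in enumerate(lines, start=1):
        if count % 3 == 1:
            new_apple = Apple()  # fresh object at the start of each record
            new_apple.colour = line
        elif count % 3 == 2:
            new_apple.size = line
        else:
            new_apple.kind = line
            apples.append(new_apple)
    return apples

lines = ["red", "small", "gala", "green", "large", "granny smith"]
apples = load_apples(lines)
print(apples)
print(len(set(map(id, apples))))  # number of distinct objects in the list
```

Creating the object on every line instead would not work, because colour, size and kind would each end up on a different object.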

Python Spell Checker Linear Search

I'm learning Python and one of the labs requires me to import a list of words to serve as a dictionary, then compare that list of words to some text that is also imported. This isn't for a class; I'm just learning this on my own, or I'd ask the teacher. I've been hung up on how to convert that imported text to uppercase before making the comparison.
Here is the URL to the lab: http://programarcadegames.com/index.php?chapter=lab_spell_check
I've looked at the posts/answers below and some youtube videos and I still can't figure out how to do this. Any help would be appreciated.
Convert a Python list with strings all to lowercase or uppercase
How to convert upper case letters to lower case
Here is the code I have so far:
# Chapter 16 Lab 11
import re
# This function takes in a line of text and returns
# a list of words in the line.
def split_line(line):
    return re.findall('[A-Za-z]+(?:\'[A-Za-z]+)?',line)
dfile = open("dictionary.txt")
dictfile = []
for line in dfile:
    line = line.strip()
    dictfile.append(line)
dfile.close()
print ("--- Linear Search ---")
afile = open("AliceInWonderLand200.txt")
for line in afile:
    words = []
    line = split_line(line)
    words.append(line)
    for word in words:
        lineNumber = 0
        lineNumber += 1
        if word != (dictfile):
            print ("Line ",(lineNumber)," possible misspelled word: ",(word))
afile.close()
Like the lab says, you use .upper():
dictfile = []
for line in dfile:
    line = line.strip()
    dictfile.append(line.upper()) # <- here.
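Uppercasing the dictionary only helps if each word from the text is uppercased as well before it is compared. The sketch below puts the pieces together with a linear search over in-memory lists. The helper names linear_search and find_misspellings and the sample data are hypothetical, and the regex is the one from the question.

```python
import re

def split_line(line):
    """Return the words in a line (same regex as in the question)."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", line)

def linear_search(words, target):
    """Return True if target is in words, checking one item at a time."""
    for candidate in words:
        if candidate == target:
            return True
    return False

def find_misspellings(text_lines, dictionary_words):
    # Uppercase the dictionary once, then uppercase each word before searching.
    upper_dictionary = [w.strip().upper() for w in dictionary_words]
    problems = []
    for line_number, line in enumerate(text_lines, start=1):
        for word in split_line(line):
            if not linear_search(upper_dictionary, word.upper()):
                problems.append((line_number, word))
    return problems

dictionary_words = ["alice", "was", "very", "tired"]
text_lines = ["Alice was very tyred", "Alice was tired"]
for line_number, word in find_misspellings(text_lines, dictionary_words):
    print("Line", line_number, "possible misspelled word:", word)
```

Using enumerate for the line number avoids resetting a counter inside the inner loop, and comparing each word against individual dictionary entries avoids comparing a word with the whole list.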
