Largest number from a file without using array implementation - python-3.x

Basically, the assignment asks for the program to open a file, find the largest number, and count the amount of numbers in the file. Our instructor has told us not to use array implementation and I'm not sure if my code counts as using it. I don't know how to convert it without using array implementations.
def main():
    infile = open('numbers.dat', 'r')
    numbers = []
    for line in infile:
        numbers.append(int(line))
    infile.close()
    largest = max(numbers)
    print('The largest number in the file is: ',largest)
    count = len(numbers)
    print('The amount of numbers in the file is: ', count)
main()

Yes, I think your code counts as using it. Your instructor probably doesn't want you to store all the numbers - luckily you don't have to.
Since this is an assessed exercise I won't write your code for you, but I'll explain the basic approach. You need to just keep two integers to do this - one number is the count of the numbers you've seen in the file so far, the other one is the largest number so far. For each number you read, you store the max() of the largest so far and the number you've just seen, and you simply add one to the counter.
One gotcha - if you start the largest number at zero, then you'll get an incorrect result if all the numbers in the file are negative. You don't specify whether negative numbers are allowed, but it's potentially valid. To avoid this, initialise the value with None to start with, and then just set it to the first number that you see in the file if the value is None.
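For readers checking their own attempt afterwards, the approach described above can be sketched roughly as follows. The function name largest_and_count and the choice to accept any iterable of lines (rather than a file name) are assumptions made for illustration, and skipping blank lines is also an assumption that the question does not specify.

```python
def largest_and_count(lines):
    """Return (largest, count) for an iterable of lines holding one integer each."""
    largest = None  # None means "no number seen yet", so negative numbers work too
    count = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # assumption: blank lines are ignored
        value = int(text)
        if largest is None or value > largest:
            largest = value
        count += 1
    return largest, count


# Usage with in-memory lines; an open file object can be passed the same way.
print(largest_and_count(["-7\n", "-3\n", "-12\n"]))  # (-3, 3)
```

Only two values are kept while reading, so memory use does not grow with the size of the file.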

Strictly speaking you're using a list, not an array, but the point stands. It is inefficient: your program needs memory proportional to the file size, when it only needs enough to hold the largest number and the count of lines.
One could simply call max and len on the elements of the file, like this:
def main():
    with open('numbers.dat', 'r') as infile:
        largest = max(map(int, infile))
    print('The largest number in the file is: ',largest)
    with open('numbers.dat', 'r') as infile:
        count = sum(1 for line in infile)
    print('The amount of numbers in the file is: ', count)
main()
However, both variants are obviously suboptimal since you'd need to read the file twice. Instead, you can modify your for loop, like this:
def main():
    largest = float('-inf')
    count = 0
    with open('numbers.dat', 'r') as infile:
        for line in infile:
            v = int(line)
            # Left as homework: Update largest and count
    print('The largest number in the file is: ',largest)
    print('The amount of numbers in the file is: ', count)

Related

Generating a random string with matched brackets

I need to generate a random string of a certain length – say ten characters, for the sake of argument – composed of the characters a, b, c, (, ), with the rule that parentheses must be matched.
So for example aaaaaaaaaa, abba()abba and ((((())))) are valid strings, but )aaaabbbb( is not.
What algorithm would generate a random string, uniformly sampled from the set of all strings consistent with those rules? (And run faster than 'keep generating strings without regard to the balancing rule, discard the ones that fail it', which could end up generating very many invalid strings before finding a valid one.)
A string consisting only of balanced parentheses (for any arbitrary pair of characters representing an open and a close) is called a "Dyck string", and the number of such strings with p pairs of parentheses is the pth Catalan number, which can be computed as C(2p, p) / (p + 1), where C(a, b) is the binomial coefficient "a choose b" (a formula which would be much easier to make readable if only SO allowed MathJax). If you want to also allow k other non-parenthetic characters in a string of total length 2n, you need to consider, for each number p ≤ n of pairs of balanced parentheses, the number of different combinations of the non-parentheses characters (k^(2n-2p)) and the number of ways you can interpolate 2n-2p characters in a string of total length 2n (C(2n, 2p)). If you multiply these two counts by the pth Catalan number and sum the products over each possible value of p, you'll get the count of the total universe of possibilities, and you can then choose a random number in that range and select whichever of the individual p counts corresponds. Then you can select a random placement of random non-parentheses characters.
Finally, you need to get a uniformly distributed Dyck string; a simple procedure is to decompose the Dyck string into its shortest balanced prefix and the remainder (i.e. (A)B, where A and B are balanced subsequences). Select a random length for (A), weighting each length by the number of (A)B strings it allows so that the result stays uniform, then recursively generate a random A and a random B.
Precomputing the tables of counts (or memoising the function which generates them) will produce a speedup if you expect to generate a lot of random strings.
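As a rough illustration of the counting step described above, here is a minimal sketch. The names catalan and count_balanced are made up for this example, and the function takes the total string length directly (rather than 2n) so that odd lengths also work. It only counts the valid strings; it does not generate one.

```python
from math import comb


def catalan(p):
    """Number of Dyck strings with p pairs of parentheses."""
    return comb(2 * p, p) // (p + 1)


def count_balanced(length, k):
    """Count strings of the given length over k plain characters plus '(' and ')'
    in which the parentheses are balanced."""
    total = 0
    for p in range(length // 2 + 1):
        others = length - 2 * p  # positions holding non-parenthesis characters
        # choose where the parentheses go, fill the rest, arrange the parentheses
        total += comb(length, 2 * p) * k ** others * catalan(p)
    return total


print(count_balanced(2, 3))  # 10: nine two-letter strings plus "()"
```

Drawing a uniform integer below this total and walking the per-p terms until it fits is one way to pick the number of parenthesis pairs with the right probability.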
Use dynamic programming to generate a data structure that knows how many there are for each choice, recursively. Then use that data structure to find a random choice.
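I seem to be the only person who uses the technique. And I always write it from scratch. But here is working code that hopefully explains it. Building the data structure takes time O(length_of_string^2 * (length_of_alphabet + 2)) and similar memory, since there is one node per (position, paren depth) pair; drawing a string from it afterwards takes O(length_of_string * (length_of_alphabet + 2)).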
import random
class DPPath:
    def __init__(self):
        self.count = 0
        self.next = None
    def add_option(self, transition, tail):
        if self.next is None:
            self.next = {}
        self.next[transition] = tail
        self.count += tail.count
    def random(self):
        if 0 == self.count:
            return None
        else:
            # randrange stays exact even when count exceeds float precision.
            return self.find(random.randrange(self.count))
    def find(self, pos):
        result = self._find(pos)
        return "".join(reversed(result))
    def _find(self, pos):
        if self.next is None:
            return []
        for transition, tail in self.next.items():
            if pos < tail.count:
                result = tail._find(pos)
                result.append(transition)
                return result
            else:
                pos -= tail.count
        raise IndexError("find out of range")
def balanced_dp(n, alphabet):
    # Record that there is 1 empty string with balanced parens.
    base_dp = DPPath()
    base_dp.count = 1
    dps = [base_dp]
    for _ in range(n):
        # We are working backwards towards the start.
        prev_dps = [DPPath()]
        for i in range(len(dps)):
            # prev_dps needs to be bigger in case of closed paren.
            prev_dps.append(DPPath())
            # If there are closed parens, we can open one.
            if 0 < i:
                prev_dps[i-1].add_option('(', dps[i])
            # alphabet chars don't change paren balance.
            for char in alphabet:
                prev_dps[i].add_option(char, dps[i])
            # Add a closed paren.
            prev_dps[i+1].add_option(")", dps[i])
        # And we are done with this string position.
        dps = prev_dps
    # Return the one that wound up balanced.
    return dps[0]
# And a quick demo of several random strings.
for _ in range(10):
    print(balanced_dp(10, "abc").random())

Sliding Window and Recognizing Specific Characters in a List

Instructions: Write a script that will calculate the %GC of a dna string
based on a sliding window of adjustable size. So say the length of
the window is L = 10 bases, then you will move the window along
the dna strand from position 0 to the end (careful, not too far...)
and 'extract' the bases into a substring and analyze GC content.
Put the numbers in a list. The dna string may be very large so you
will want to read the string in from an infile, and print the results
to a comma-delimited outfile that can be ported into Excel to plot.
For the final data analysis, use a window of L = 100 and analyze the two genomes in files:
Bacillus_amyloliquefaciens_genome.txt
Deinococcus_radiodurans_R1_chromosome_1.txt
But first, to get your script functioning, use the following trainer data set. Let window L=4. Example input and output follow:
INPUT:
AACGGTT
OUTPUT:
0,0.50
1,0.75
2,0.75
3,0.50
My code:
dna = ['AACGGTT']
def slidingWindow(dna,winSize,step):
    """Returns a generator that will iterate through
    the defined chunks of input sequence. Input sequence
    must be iterable."""
    # Verify the inputs
    #try: it = iter(dna)
    # except TypeError:
    #raise Exception("**ERROR** sequence must be iterable.")
    if not ((type(winSize) == type(0)) and (type(step) == type(0))):
        raise Exception("**ERROR** type(winSize) and type(step) must be int.")
    if step > winSize:
        raise Exception("**ERROR** step must not be larger than winSize.")
    if winSize > len(dna):
        raise Exception("**ERROR** winSize must not be larger than sequence length.")
    # Pre-compute number of chunks to emit
    numOfwins = ((len(dna)-winSize)/step)+1
    # Do the work
    for i in range(0,numOfwins*step,step):
        yield dna[i:i+winSize]
chunks = slidingWindow(dna,len(dna),step)
for y in chunks:
    total = 1
    search = dna[y]
    percentage = (total/len(dna))
    if search == "C":
        total = total+1
        print ("#", y,percentage)
    elif search == "G":
        total = total+1
        print ("#", y,percentage)
    else:
        print ("#", y, "0.0")
"""
MAIN
calling the functions from here
"""
# YOUR WORK HERE
#print ("#", z,percentage)
When approaching a complex problem, it is helpful to divide it into simpler sub-problems. Here, you have at least two separate concepts: a window of bases, and statistics on such a window. Why don't you tackle them one at a time?
Here is a simple generator that produces chunks of the desired size:
def get_chunks(dna, window_size=4, stride=1):
    for i in range(0, len(dna) - window_size + 1, stride):
        chunk = dna[i:i + window_size]
        assert len(chunk) == window_size
        yield chunk
for chunk in get_chunks('AACGGTT'):
    print(chunk)
It displays this output:
AACG
ACGG
CGGT
GGTT
Now, with that in hand, could you write a simple function that accepts a four-character string and produces an appropriate statistical summary of it? [Please post it as a separate answer to your question. Yes, it might sound odd at first, but StackOverflow does encourage you to post answers to your questions, so you can share what you have learned.]
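One possible shape for that summary step, offered only as a sketch: the names gc_fraction and gc_rows are invented for this example, and the "position,fraction" row format with two decimals is assumed from the trainer output in the question. The windowing is repeated inside gc_rows so the snippet runs on its own.

```python
def gc_fraction(chunk):
    """Fraction of bases in chunk that are G or C (case-insensitive)."""
    chunk = chunk.upper()
    return (chunk.count("G") + chunk.count("C")) / len(chunk)


def gc_rows(dna, window_size=4, stride=1):
    """Yield one 'position,fraction' row per window, ready for a CSV file."""
    for i in range(0, len(dna) - window_size + 1, stride):
        yield f"{i},{gc_fraction(dna[i:i + window_size]):.2f}"


for row in gc_rows("AACGGTT"):
    print(row)
```

For the trainer input this prints 0,0.50, 1,0.75, 2,0.75 and 3,0.50, matching the expected output. Writing the rows to an outfile instead of printing them is a small change.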

Incorporate higher order functions to calculate the average from a list of numbers using a text file

I need to write a program that computes and prints the average of numbers from a text file. I need to make use of two higher-order functions to simplify the design.
The text file (integers.txt) I will use has these numbers:
5
4
3
2
1
This is the code I currently have:
# I open up the file.
file = open("integers.txt", 'r')
file = file.read()
# I turn it into a list using the split method
file = file.split()
# I turn it into an integer using the map function.
file = map(int, file)
# I then use a for loop to get the total of all numbers in that list
# I then get the average
sum = 0
for numbers in file:
    sum = numbers + sum
print(sum/len(file))
How can I use another high order function in this program? Please help. I am still a beginner.
I figured out the answer!
import functools
# open your file and read its contents
with open("integers.txt", 'r') as infile:
    text = infile.read()
# put numbers into a list
tokens = text.split()
# convert list into integers
numbers = list(map(int, tokens))
# use reduce with a lambda to sum the numbers, then divide by the count to get the average.
print(functools.reduce(lambda x, y: x + y, numbers, 0) / len(numbers))
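The same idea can be packaged as a small function that is easy to test without a file. This is only a sketch: the name average and the use of operator.add in place of a lambda are choices made for this example, not part of the original answer.

```python
import functools
import operator


def average(tokens):
    """Average of numeric strings, using two higher-order functions."""
    numbers = list(map(int, tokens))                      # map: str -> int
    total = functools.reduce(operator.add, numbers, 0)    # reduce: running sum
    return total / len(numbers)


print(average("5 4 3 2 1".split()))  # 3.0
```

Summing first and dividing once at the end avoids accumulating a rounding error on every element.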

Does max() not recognise 2 digit numbers?

I have code which reads all files in a directory and appends the highest value from each file into a list. The problem is that it does not recognise the number 10, but recognises all numbers 0-9. Each file contains the last three scores of each person, from 1-10. However, if the person scores 10, the program does not read that as the highest value, and chooses the second highest from the file and appends that to the list instead. The code works fine if none of their scores are 10. The code should then sort the list according to each person's highest score, which does work, but because the wrong score was appended to the list, it sorts incorrectly as well.
For example:
[3, 6, 8], highest score is 8, no problem
[6, 10, 9], highest score is 9, why?
The relevant section of code is below. P.s. I have imported all modules and declared all variables at the start (just not visible here), so that is not the problem. Thanks for all help
scores = []
for file in os.listdir(path):
    file = os.path.join(path, file)
    if os.path.isfile(file):
        with open(file, 'r') as txt_file:
            scores.append(max(str(n.strip()) for n in txt_file))
results = list(zip(files, scores))
results.sort(key=operator.itemgetter(1), reverse=True)
student_list = [x + ": " + y for x, y in results]
The problem is specifically with this line:
scores.append(max(str(n.strip()) for n in txt_file))
You are grabbing the max str value, and strings compare the same way all other sequences do: compare the first elements, and if they are the same compare the next, and so on. So when you do:
max("10","9")
it first compares "1" against "9" and sees that "9" is greater, so that is the string returned. You need to convert them to ints for them to compare as ints:
scores.append(max(int(n.strip()) for n in txt_file))
#                 ^ right here
Note that student_list then needs str(y) in place of y, since the scores are now ints. Also, since you are opening every single file in a directory, this would fail if any of the files contain anything other than valid numbers on every line, so you probably want a try/except. I cannot give you an example without knowing how files is defined, because scores and files need to be the same length.
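Leaving aside how files is built, the per-file part can be sketched on its own. The helper name highest_score and the decision to silently skip lines that are not integers are assumptions for illustration; raising an error or logging the bad line may suit the real program better.

```python
def highest_score(lines):
    """Return the largest integer found in lines, or None if there is none."""
    best = None
    for line in lines:
        try:
            value = int(line.strip())
        except ValueError:
            continue  # assumption: skip lines that are not valid integers
        if best is None or value > best:
            best = value
    return best


print(max("10", "9"))                         # 9  (compared as strings)
print(highest_score(["6\n", "10\n", "9\n"]))  # 10 (compared as integers)
```

Returning None for a file with no valid scores lets the caller decide whether to skip that file or record a placeholder, which matters if scores and files must stay the same length.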

Count even numbers in a file using Python

Write a function that takes as argument, a filename to read, returns the number of even numbers present in the file.
I have tried and tried; please, someone help. It does not return the number of even numbers.
def counteven(l):
    infile = open('even.txt', 'r')
    num = infile.read()
    for i in infile:
        if (i %2!=0):
            return i
    infile.close()
assertEqual(counteven('even.txt'),2)
@Ergwun already pointed out the problems in your code. Here's another solution:
def counteven(integers):
    return sum(1 for n in integers if n % 2 == 0)
with open('even.txt') as f:
    numbers = (int(line) for line in f)
    print(counteven(numbers))
You do not say what the format of the file is. Based on your attempt, I'm assuming that your file contains just a single integer on each line.
Here are some of the problems with your function:
You are passing an argument to the function called l, but not using it. You should be using it as the name of the file to open, instead of hard coding 'even.txt'.
You are reading the entire file into a variable called num and then do not even use that variable. Having read in the entire file, there is nothing left to iterate over in your for loop.
Your for loop iterates over the lines of the file as strings. You need to convert the line to an integer before testing if it's divisible by two.
Inside the for loop, you are going to return the first even number found, rather than counting all the even numbers. You need to create a count variable before the loop, and increment in the loop every time an even number is found, then return the count after the loop has completed.
If you fix those problems, your function should look something like this:
def counteven(filename):
    countOfEvenNumbers = 0
    infile = open(filename, 'r')
    for line in infile:
        number = int(line)
        if (number %2 == 0):
            countOfEvenNumbers+= 1
    infile.close()
    return countOfEvenNumbers
...
UPDATE (to address your comment):
assertEqual is a method of the TestCase class provided by the unittest module.
If you are writing a unit test, then assertEqual should be called in a test case in a class derived from TestCase.
If you simply want to make an assertion outside of a unit test you can write:
assert counteven('even.txt') == 2, ' Number of even numbers must be 2'
