String to integer and formatting code in Python - string

I need to read from a file the number of times the letter v is mentioned. I know for a fact that if 'v' is in a line, it will be the first thing on that line. The way I have it set up, it counts line by line and writes one result per line, but I want just one sentence stating the number of times 'v' is mentioned in the whole file.
f = open("triangle.txt", 'r') #opens the given name file to read input
fw = open("convert.txt",'w') #opens the given name file to write in
for line in f:
    data = line.strip().split(" ")
    vertices=0
    vertices =(str(data.count('v')))
    fw.write("Number of vertices = " + vertices +'\n')
f.close()
fw.close()
I tried
vertices += int((str(data.count('v'))))
yet that keeps giving me an error message that I can't convert string to integer.
Any suggestions greatly appreciated.

Firstly, if you want one sentence that states the number of times 'v' is mentioned, simply move this line
fw.write("Number of vertices = " + vertices +'\n')
out of the for loop. Secondly,
data.count('v')
already returns an int, so you don't have to convert it to a string first and then back to an integer. Here's the amended code:
f = open("triangle.txt", 'r') #opens the given name file to read input
fw = open("convert.txt",'w') #opens the given name file to write in
vertices=0
for line in f:
    data = line.strip().split(" ")
    vertices += ((data.count('v')))
fw.write("Number of vertices = " + str(vertices) +'\n')
f.close()
fw.close()
Also, your code only counts 'v' when it appears as an individual word in the line. To count the total number of times the character 'v' occurs, use what @bad_keypoints suggested.

If you simply want to know the number of times v was mentioned in the file, why don't you simply do this:
with open('file.dat', 'r+') as f:
    v_count = f.read().count('v')
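To make the difference between the two answers concrete, here is a minimal sketch. The helper names count_tokens and count_chars and the sample text are made up for illustration, and the helpers work on a string rather than a file so the example runs without triangle.txt. It also uses split() instead of split(" "), which is an assumption about how the lines are delimited.

```python
def count_tokens(text, token="v"):
    """Count whitespace-separated words that are exactly `token`."""
    return sum(line.split().count(token) for line in text.splitlines())


def count_chars(text, char="v"):
    """Count every occurrence of `char`, including inside longer words."""
    return text.count(char)


# Hypothetical sample in the style of a vertex list
sample = "v 0 0 0\nv 1 0 0\nvn 0 0 1\n"
print("Number of vertices =", count_tokens(sample))
print("Number of 'v' characters =", count_chars(sample))
```

The token count ignores the 'v' inside "vn", while the character count includes it, so which one is right depends on what the file actually contains.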

Related

How do I count all occurrences of a phrase in a text file using regular expressions?

I am reading in multiple files from a directory and attempting to find how many times a specific phrase (in this instance "at least") occurs in each file (not just whether it occurs, but how many times it occurs in each text file). My code is as follows:
import glob
import os
path = 'D:/Test'
k = 0
for filename in glob.glob(os.path.join(path, '*.txt')):
    if filename.endswith('.txt'):
        f = open(filename)
        data = f.read()
        data.split()
        data.lower()
        S = re.findall(r' at least ', data, re.MULTILINE)
        count = []
        if S == True:
            for S in data:
                count.append(data.count(S))
                k= k + 1
            print("'{}' match".format(filename), count)
        else:
            print("'{}' no match".format(filename))
print("Total number of matches", k)
At this moment I get no matches at all. I can count whether or not there is an occurrence of the phrase but am not sure why I can't get a count of all occurrences in each text file.
Any help would be appreciated.
regards
You can get rid of the regex entirely; the count method of string objects is enough, and much of the other code can be simplified as well.
You're also not changing data to lower case: data.lower() returns a new lower-cased string and leaves data untouched. Note how I use data = data.lower() to actually change the variable.
Try this code:
import glob
import os
path = r'c:\script\lab\Tests'  # raw string so the backslashes are taken literally
k = 0
substring = ' at least '
for filename in glob.glob(os.path.join(path, '*.txt')):
    if filename.endswith('.txt'):
        f = open(filename)
        data = f.read()
        f.close()
        data = data.lower()
        S = data.count(substring)
        if S:
            k = k + S  # add this file's matches to the running total
            print("'{}' match".format(filename), S)
        else:
            print("'{}' no match".format(filename))
print("Total number of matches", k)
If anything is unclear feel free to ask!
You make multiple mistakes in your code. data.split() and data.lower() have no effect at all, since they both do not modify data but return a modified version. You don't assign the return value to anything, so it is lost.
Also, you should always close a resource (e.g. a file) when you don't need it anymore.
Also, you store every string found by re.findall in a list S, which you don't use for anything afterwards. Looping over it would be pointless anyway, because it just contains the string you are looking for, repeated once per match. You can simply take the list that re.findall returns and compute its length. This gives you the number of times the phrase occurs in the text. Then you increase your counter variable k by that amount and move on to the next file. You can still have your print statements by simply printing the temporary num_found variable.
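import re
import glob
import os
path = 'D:/Test'
k = 0
for filename in glob.glob(os.path.join(path, '*.txt')):
    if filename.endswith('.txt'):
        f = open(filename)
        text = f.read()
        f.close()
        # findall returns a list of every match; its length is the count
        num_found = len(re.findall(r' at least ', text, re.MULTILINE))
        k += num_found
        print("'{}' matches:".format(filename), num_found)
print("Total number of matches", k)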
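If the phrase can also appear at the start of a line or next to punctuation, matching on surrounding spaces will miss it. A possible variation, sketched here under the assumption that word boundaries and case-insensitive matching are what you want (the helper name count_phrase and the sample text are made up for illustration):

```python
import re


def count_phrase(text, phrase):
    """Return how many times `phrase` occurs in `text`, ignoring case."""
    # re.escape keeps any special characters in the phrase literal;
    # \b makes sure we match whole words only.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


# Hypothetical sample; the last "at\nleast" is split across a line break
sample = "At least once. We need at least two, at\nleast in theory."
print(count_phrase(sample, "at least"))
```

Note that a phrase broken across a line break is not counted, because the pattern expects a literal space between the two words.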

Sorting sentences of text file by users input

My code sorts the sentences of a file by their length and saves them to a new file.
How can I alter my code so that, if the user inputs a number at program start, the lines are filtered based on that input?
Example: if the user inputs 50, the program sorts only the sentences that are longer than 50; if the user inputs all, the program sorts all lines as normal.
My code:
file = open("testing_for_tools.txt", "r")
lines_ = file.readlines()
#print(lines_)
user_input = input("enter")
if user_input is int:
    lines = sorted(lines_, key=len)
else:
    lines = sorted(lines_, key=len)
# lines.sort()
file_out = open('testing_for_tools_sorted.txt', 'w')
file_out.write(''.join(lines)) # Write a sequence of strings to a file
file_out.close()
file.close()
print(lines)
input always returns a string. If you want an integer or similar, you need to parse it explicitly; you will never get an integer straight out of input.
is is not a type-testing operator in Python, it is an identity operator: it checks whether the left and right operands are the same object, and that's it.
filter is what you're looking for here, or a list comprehension: if the user provided an input and that input is a valid integer, you want to filter the lines to only those above the specified length. This is a separate step from sorting.
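The first two points can be illustrated with a small sketch. The helper name parse_limit is made up for this example, and the string raw stands in for whatever input() would return:

```python
def parse_limit(user_input):
    """Return the input as an int, or None if it is not a valid integer."""
    try:
        return int(user_input.strip())
    except ValueError:
        return None


raw = "50"                      # stands in for what input() would return
print(raw is int)               # identity check against the type object
print(isinstance(raw, int))     # type check: the value is still a str
print(parse_limit(raw))         # parsed explicitly
print(parse_limit("all"))       # not a number, so no limit
```

Both checks on raw come out false: the first because a string is never the same object as the int type, the second because input gives you a str until you convert it.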
That aside:
You should use with to manage files unless there are specific reasons that you can't.
Files have a writelines method, which should be more efficient than joining the lines into one string and writing that.
Never open files in text mode without providing an encoding. Otherwise Python asks the system for its default encoding, and it's easy for that system to be misconfigured or oddly configured, leading to garbage input.
with open("testing_for_tools.txt", "r", encoding='utf-8') as f:
    lines_ = f.readlines()
#print(lines_)
user_input = input("enter")
if user_input:
    try:
        limit = int(user_input.strip())
    except ValueError:
        pass  # not a number (e.g. "all"): keep every line
    else:
        # keep only the lines longer than the requested length
        lines_ = (l for l in lines_ if len(l) > limit)
lines = sorted(lines_, key=len)
with open('testing_for_tools_sorted.txt', 'w', encoding='utf-8') as f:
    f.writelines(lines)
print(lines)
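Keeping the filter and the sort in one small function makes the two steps easy to test without touching any files. This is only a sketch: the name filter_and_sort and the sample lines are made up, and it assumes "longer than" means strictly greater.

```python
def filter_and_sort(lines, limit=None):
    """Sort lines by length, keeping only those longer than `limit` if given."""
    if limit is not None:
        lines = [line for line in lines if len(line) > limit]
    return sorted(lines, key=len)


# Hypothetical sample lines, as readlines() would return them
sample = ["a long sentence here\n", "short\n", "medium one\n"]
print(filter_and_sort(sample))            # no limit: sort everything
print(filter_and_sort(sample, limit=8))   # only lines longer than 8
```

Keep in mind that lines from readlines() still carry their trailing newline, so it counts towards len(line).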
@Black Snow
I don't have anything else to add if it's working as expected.
This is a rather long answer:
idx_to_sort = [True if len(i)>int(user_input) else False for i in lines_]
idx_to_sort
lines_to_sort = []
for i in range(len(idx_to_sort)):
    if idx_to_sort[i]:
        lines_to_sort.append(lines_[i])
lines_to_sort
lines = sorted(lines_to_sort, key=len)
lines
counter=0
for i in range(len(idx_to_sort)):
    if idx_to_sort[i]:
        lines_[i] = lines[counter]
        counter += 1
lines_
The output would be different but not what you expected.

Nested for loop in python doesn't iterate fully

I have the following code to replace just one value on the 26th line of a 150-line data file. The problem is with the nested for loop. From the second iteration onwards, the inner loop isn't executing. Instead the loop skips to the last line.
n= int(input("Enter the Number of values: "))
arr = []
print ('Enter the Fz values (in Newton): ')
for _ in range(n):
    x=(input())
    arr.append(x)
print ('Array: ', arr)
os.chdir ("D:\Work\python")
f = open("datanew.INP", "r+")
f1 = open("data.INP", "w+")
for i in range(len(arr)):
    str2 = arr[i]
    for j,line in enumerate(f):
        if j == 25 :
            new = line.split()
            linenew = line.replace(new[1],str2)
            print (linenew)
            f1.write(linenew)
        else:
            f1.write(line)
    print(arr[i], 'is replaced')
f.close()
f1.close()
The issue is that your code is looping over a file. On the first pass through, the end of file is reached. After that, there is no data left in the file to read, so the next loop has nothing to iterate over.
Instead, you might try reading over the whole file and storing the data in a list, then looping over that list. (Alternatively, you could eliminate the loops and access the 26th item directly.)
Here is some simple code to read from one file, replace the 26th line, and write to another file:
f = open("data.INP", "r") # Note that for simple reading you don't need the '+' added to the 'r'
the_data = f.readlines()
f.close()
the_data[25] = 'new data\n' # Remember that Python uses 0 indexing, thus the 25
f1 = open("updated_data.out", "w") # Assuming you want to write a new file, leave off the '+' here: 'w+' opens the file for both writing and reading, which isn't needed
for l in the_data:
    f1.write(l)
f1.close()
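The exhaustion problem is easy to reproduce without any real files. In this sketch io.StringIO stands in for the opened file, and replace_field is a made-up helper (not part of the original code) that replaces one whitespace-separated field on a chosen line. It rebuilds the line with single spaces, which is an assumption about the file format.

```python
import io

# io.StringIO stands in for an open file so the sketch runs without data.INP
f = io.StringIO("line 1\nline 2\nline 3\n")

first_pass = [line for line in f]    # consumes the whole "file"
second_pass = [line for line in f]   # nothing left to read
f.seek(0)                            # rewind to the start
third_pass = [line for line in f]

print(len(first_pass), len(second_pass), len(third_pass))


def replace_field(lines, line_index, field_index, new_value):
    """Return a copy of `lines` with one whitespace-separated field replaced."""
    updated = list(lines)
    fields = updated[line_index].split()
    fields[field_index] = new_value
    # Rebuild the line with single spaces (original spacing is not preserved)
    updated[line_index] = " ".join(fields) + "\n"
    return updated
```

With the lines held in a list, calling replace_field(the_data, 25, 1, value) once per entered value works for every iteration, because a list can be read as many times as you like.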

Append string based on condition python

I just want to append strings based on a condition. For example, strings starting with http should not be appended, but every other string that has a length of 40 should be.
words = []
store1 = []
disregard = ["http","gen"]
for all in glob.glob(r'MYDIR'):
    with open(all, "r",encoding="utf-16") as f:
        text = f.read()
        lines = text.split("\n")
        for each in lines:
            words += each.split()
        for each in words:
            if len(each) == 40 and each not in disregard:
                store1.append(each)
Update:
if disregard[0] not in each:
works, but how can I compare against all the contents of my list? Using disregard on its own doesn't work.
Here is my input text file :
http://1234ashajkhdajkhdajkhdjkaaaaaaad1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
genp://1234ashajkhdajkhdajkhdjkaaaaaaad1
a\a
The only thing that will append will be "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
I think the answers should depend on the number of words you want to disregard.
It's important to define what word means. If the word ends with spaces, should they all be stripped?
One solution could be to create a regular expression from all your words and use that to match the line.
import glob
import re
disregard = ["http","gen"]
pattern = "|".join([re.escape(w) for w in disregard])
for all in glob.glob(r'MYDIR/*'):
    with open(all, "r", encoding="utf-16") as f:
        matched_words = []
        for line in f:
            line = line.rstrip("\n")
            if len(line) == 40 and not re.match(pattern, line):
                matched_words.append(line)
    print(matched_words)
The basic structure looks OK; where it breaks is in the conditionals. You say you want to check whether each line starts with the supplied strings, but then you split each line and check whether the resulting words are in the list. Use .startswith() instead. This also means there doesn't have to be a space after "http" for that string to be caught.
Also, either the conditional testing should be placed after the loop that builds the words list, or else the words list should be reset at the start of each loop so you're not re-testing words you've already checked.
# adjusted some variable names for clarity
import glob
words = []
output = []
disregard = ["http","gen"]
for fname in glob.glob(r'MYDIR'):
    with open(fname, "r", encoding="utf-16") as f:
        text = f.read()
    lines = text.split("\n")
    for line in lines:
        words += line.split()
# test the words once, after all files have been read
for word in words:
    if len(word) == 40 and not any([word.startswith(dis) for dis in disregard]):
        output.append(word)
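As a side note, str.startswith also accepts a tuple of prefixes, so the any(...) expression can be shortened. A minimal sketch using the sample input from the question (the helper name keep_word is made up for illustration):

```python
def keep_word(word, disregard, length=40):
    """True if `word` has the required length and none of the given prefixes."""
    # startswith accepts a tuple and returns True if any prefix matches
    return len(word) == length and not word.startswith(tuple(disregard))


disregard = ["http", "gen"]
sample = [
    "http://1234ashajkhdajkhdajkhdjkaaaaaaad1",
    "a" * 40,
    "genp://1234ashajkhdajkhdajkhdjkaaaaaaad1",
    "a\\a",
]
kept = [word for word in sample if keep_word(word, disregard)]
print(kept)
```

Only the string of forty a's survives: the first and third entries start with a disregarded prefix, and the last one is too short.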

Program won't print complete words just single letters

I'm using Python and am trying to make a small game for a college assignment. I am trying to print a randomly selected word from a few external text files (with harder words in each one) and have it displayed for 2 seconds, the word then disappears and the user has to spell it. At the moment my program just displays a random letter from the text file and no whole words.
print ("""Welcome to the Spelling Game
What difficulty do you want to play?
Easy, Medium or Hard?""")
strDifficulty = input().upper
if strDifficulty is ("EASY"):
    with open ('EASY.txt', 'r') as f:
        (chosen) = f.readlines()
if strDifficulty is ("MEDIUM"):
    with open ('MEDIUM.txt', 'r') as f:
        (chosen) = f.readlines()
if strDifficulty is ("HARD"):
    with open ('HARD.txt', 'r') as f:
        (chosen) = f.readlines()
import random
x = ('chosen')
print (random.choice (x))
There are multiple issues with your code that explain why it prints a single character:
strDifficulty = input().upper does not uppercase the input from the command line. It reads what you type, which is a string (str in Python), and assigns the method upper of that string to strDifficulty without calling it. What you're likely looking for is strDifficulty = input().upper() (the extra parentheses call the method upper, returning an uppercased version of what is read from standard input).
x = ('chosen') assigns the string 'chosen' to x, not the value of the variable chosen. You might have meant x = chosen, assigning the value of chosen to x.
print (random.choice(x)) isn't far off, but it chooses a random element from x. As x is always the string 'chosen', you'll get one of its letters. You could simply remove the line x = ('chosen') and call print(random.choice(chosen)).
There's plenty more to be said about your piece of code, but let's start here :)
I have made some modifications to your code.
print ("""Welcome to the Spelling Game
What difficulty do you want to play?
Easy, Medium or Hard?""")
strDifficulty = input().upper()
if strDifficulty=="EASY":
    with open ('EASY.txt', 'r') as f:
        chosen = f.readlines()
if strDifficulty=="MEDIUM":
    with open ('MEDIUM.txt', 'r') as f:
        chosen = f.readlines()
if strDifficulty=="HARD":
    with open ('HARD.txt', 'r') as f:
        chosen = f.readlines()
import random
print (random.choice (chosen))
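Two loose ends remain in the code above: the words from readlines() still end in a newline, and an unrecognised difficulty leaves chosen undefined. One possible way to handle both is sketched here. The names WORD_FILES, file_for and pick_word are made up for illustration, and the word list is passed in directly so the example runs without the text files.

```python
import random

# Hypothetical mapping from difficulty to word-list file, mirroring the question
WORD_FILES = {"EASY": "EASY.txt", "MEDIUM": "MEDIUM.txt", "HARD": "HARD.txt"}


def file_for(difficulty):
    """Return the word-list filename for a difficulty, or None if unknown."""
    return WORD_FILES.get(difficulty.strip().upper())


def pick_word(lines, rng=random):
    """Pick one non-empty word from readlines() output, without the newline."""
    words = [line.strip() for line in lines if line.strip()]
    return rng.choice(words)


print(file_for("easy"))
print(pick_word(["cat\n", "house\n", "\n", "garden\n"]))
```

In the game itself you would check that file_for(...) did not return None before opening the file, and pass the result of f.readlines() to pick_word.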

Resources