How do I find multiple strings in a text file? - python-3.x

I need every occurrence of a string in a text file to be found and capitalized. I have worked out how to find the string, but finding multiple occurrences is my issue. If you can help me print where the given string appears throughout the file, that would be great. Thanks.
import os
import subprocess
i = 1
string1 = 'biscuit eater'
# opens the text file
# if this is the path where my file resides, f will become an absolute path to it
f = os.path.expanduser("/users/acarroll55277/documents/Notes/new_myfile.txt")
# opened with a plain open(), so the file is not closed automatically (a with block would do that)
txtfile = open(f, 'r')
# print(txtfile.read()) to print the text document in terminal
# this sets variables flag and index to 0
flag = 0
index = 0
# looks through the file line by line
for line in txtfile:
    index += 1
    # checking if the string is in the line or not
    if string1 in line:
        flag = 1
        break
# checking condition for string found or not
if flag == 0:
    print('string ' + string1 + ' not found')
else:
    print('string ' + string1 + ' found in line ' + str(index))

I believe your approach would work, but it is very verbose and not very Pythonic. Try this out:
import os

string1 = 'biscuit eater'
path = os.path.expanduser("/users/acarroll55277/documents/Notes/new_myfile.txt")
with open(path, 'r') as fptr:
    lines = fptr.readlines()
# Collect the 1-based number of every line that contains the string
matches = [i for i, line in enumerate(lines, start=1) if string1 in line]
# Build the capitalized text; write it back to the file if you want to keep it
capitalized = ''.join(lines).replace(string1, string1.title())
if not matches:
    print(f"string {string1} not found")
for i in matches:
    print(f"string {string1} found in line {i}")
This prints a message for every line of the file that contains your string. In addition, the file is handled safely: the with statement closes it automatically when the block exits.

You can use the str.replace method. In the line where you find the string, write line = line.replace(string1, string1.upper(), 1). The final 1 limits the replacement to the first occurrence of the string in that line. Note that replace returns a new string, so the result has to be assigned.
Either that, or read the whole text file into one string and call replace on it. That saves you the trouble of finding each occurrence manually. In that case, you can write
txtfile = open(f, 'r')
content = txtfile.read()
content = content.replace(string1, string1.upper())
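Putting the two ideas together, a minimal sketch might look like the following. The function name find_and_capitalize and the sample text are hypothetical, chosen only for illustration; the function works on a string, so you would pass it the result of txtfile.read().

```python
def find_and_capitalize(content, target):
    """Return (line_numbers, new_content) for every line containing target.

    Hypothetical helper: line numbers are 1-based, and every occurrence
    of target is replaced by its upper-case form.
    """
    line_numbers = [
        number
        for number, line in enumerate(content.splitlines(), start=1)
        if target in line
    ]
    return line_numbers, content.replace(target, target.upper())


# Made-up sample text standing in for the file contents.
sample = "a biscuit eater\nnothing here\nbiscuit eater again\n"
found, updated = find_and_capitalize(sample, "biscuit eater")
for number in found:
    print(f"string found in line {number}")
```

To keep the change, the updated string would still need to be written back to the file.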

Related

Using a function to print the characters from a file?

So I have a text file, and I need to define a function to open the file, read through it, and then return and print the number of characters within the file.
So far I've got:
def num_chars_in_file(file):
    path = 'planets.txt'
    file_handle = open(path)
    for text in file_handle:
        file = file_handle.readlines()
        print(file)
print(f"\nProblem 1: {num_chars_in_file()}")
# I'm not sure where to go from here.
You could create a count variable to store the cumulative total of characters as you iterate over each line, something like this:
def num_chars_in_file():
    path = 'planets.txt'
    file_handle = open(path)
    count = 0
    for text in file_handle:
        count += len(text.rstrip())
    file_handle.close() # Make sure to close the file if you're not using with
    return count
print(f"\nProblem 1: {num_chars_in_file()}")
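For comparison, here is a sketch of the same count using a with block, which closes the file without an explicit close(). Taking the path as a parameter and the name count_chars are assumptions, not part of the original answer. Like the code above, it ignores trailing whitespace on each line, including the newline.

```python
import os
import tempfile


def count_chars(path):
    """Count characters in a file, ignoring trailing whitespace per line."""
    with open(path) as file_handle:
        return sum(len(line.rstrip()) for line in file_handle)


# Demonstration on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Mars\nVenus  \n")
    demo_path = tmp.name

total = count_chars(demo_path)
os.remove(demo_path)
print(total)
```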
with open('my_words.txt') as infile:
    lines=0
    words=0
    characters=0
    for line in infile:
        wordslist=line.split()
        lines=lines+1
        words=words+len(wordslist)
        characters += sum(len(word) for word in wordslist)
print(lines)
print(words)
print(characters)
Try this to print the number of lines, words, and characters in the file. Note that characters here counts only the characters inside words, so whitespace is excluded.
Refer to this similar question for more details.

Splitting text file in Python - delimiter issue

I am trying to split a file on the delimiter "}. but the delimiter is not found, and as a result I get only one new file with the same content as the original. The code is:
with open('okladki_200_01') as fp:
    contents = fp.read()
i = 1
for entry in contents.split('"}.'):
    f= open("okladka_%s" % i,"w+")
    f.write(entry)
    f.close()
    i += 1
Can you help, please?
EDIT:
The content of the file is like:
{"base64Image":"/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB\nAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEB\nAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAusFMADASIA\nAhEBAxEB/8QAHwAAAgIBBQEBAAAAAAAAAAAAAgQAAwUBBgcICQoL/8QAaRAAAQEFBAcDBwgHBQYD\nAwEZAwIBBBESEwAhIiMFFDEyM0FDUVNhBiRCY3GBkQcIFTRSc6GxRGKDk8HR8FRyo+HxCRYlZLPD\ndILTFzWEkp [...] 3aIiVoL1pmNQxjWr27\nPBnhatT94NfdwDzDBz9aSP/Z\n","elementHashcode":-1794239528,"imageOrientation":6,"type":"BOOK"}
{"base64Image":"/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB\nAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB
And I think I just found the problem... The HxD viewer displays the 0x0A ASCII character as a dot, but it is a newline. So I should look for '"}\n'.
Move the contents.split(...) call into its own variable. Since the character after "} is a newline rather than a dot, split on '"}\n':
lines = contents.split('"}\n')
for entry in lines:
    ...
Code:
with open('textfile') as fp:
    contents = fp.read()
i = 1
lines = contents.split('"}\n')
for entry in lines:
    f= open("textfile_%s" % i,"w+")
    f.write(entry)
    f.close()
    i += 1
Do you actually need to check for brackets? Your input file seems to be formatted with one record per line, so the delimiter could simply be \n and we can use readlines().
Here is a possible solution:
with open('okladki_200_01') as fp:
    lines = fp.readlines() # this is a list of strings.
i = 1
for line in lines:
    entry = line.lstrip("{").rstrip("}\n") # some clean-up.
    f = open("okladka_%s" %i ,"w+")
    f.write(entry)
    f.close()
    i += 1
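Since each line in the sample looks like a complete JSON object, another option is to parse the lines rather than strip braces by hand. This is only a sketch under that assumption; the function name split_records and the shortened in-memory sample are hypothetical, and the real base64Image values are far longer.

```python
import json


def split_records(contents):
    """Parse one JSON object per non-empty line and return them as dicts."""
    return [json.loads(line) for line in contents.splitlines() if line.strip()]


# Two shortened records standing in for the real file contents.
# The doubled backslash keeps a literal \n escape inside the JSON string.
sample = (
    '{"base64Image":"/9j/4AAQ\\nSkZJ","type":"BOOK"}\n'
    '{"base64Image":"/9j/4BBQ","type":"BOOK"}\n'
)
records = split_records(sample)
for i, record in enumerate(records, start=1):
    print(f"okladka_{i}", record["type"], len(record["base64Image"]))
```

Each parsed record could then be written to its own file, for example with json.dump.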

Python 3.6.1: Code does not execute after a for loop

I've been learning Python and I wanted to write a script to count the number of characters in a text and calculate their relative frequencies. But first, I wanted to know the length of the file. My intention is that, while the script goes from line to line counting all the characters, it would print the current line and the total number of lines, so I could know how much it is going to take.
I executed a simple for loop to count the number of lines, and then another for loop to count the characters and put them in a dictionary. However, when I run the script with the first for loop, it stops early. It doesn't even go into the second for loop as far as I know. If I remove this loop, the rest of the code goes on fine. What is causing this?
Excuse my code. It's rudimentary, but I'm proud of it.
My code:
import string
fname = input ('Enter a file name: ')
try:
    fhand = open(fname)
except:
    print ('Cannot open file.')
    quit()
#Problematic bit. If this part is present, the script ends abruptly.
#filelength = 0
#for lines in fhand:
#    filelength = filelength + 1
counts = dict()
currentline = 1
for line in fhand:
    if len(line) == 0: continue
    line = line.translate(str.maketrans('','',string.punctuation))
    line = line.translate(str.maketrans('','',string.digits))
    line = line.translate(str.maketrans('','',string.whitespace))
    line = line.translate(str.maketrans('','',""" '"’‘“” """))
    line = line.lower()
    index = 0
    while index < len(line):
        if line[index] not in counts:
            counts[line[index]] = 1
        else:
            counts[line[index]] += 1
        index += 1
    print('Currently at line: ', currentline, 'of', filelength)
    currentline += 1
listtosort = list()
totalcount = 0
for (char, number) in list(counts.items()):
    listtosort.append((number,char))
    totalcount = totalcount + number
listtosort.sort(reverse=True)
for (number, char) in listtosort:
    frequency = number/totalcount*100
    print ('Character: %s, count: %d, Frequency: %g' % (char, number, frequency))
It looks fine the way you are doing it. However, to reproduce your problem I downloaded a Gutenberg book and saved it as a text file, and it turned out to be a Unicode issue. There are two ways to resolve it: open the file in binary mode, or specify the encoding. As it's text, I'd go with the utf-8 option.
I'd also suggest structuring the code differently. Below is a basic structure that makes sure the file is closed after it has been opened.
filename = "GutenbergBook.txt"
try:
    #fhand = open(filename, 'rb')
    #open read only and utf-8 encoding
    fhand = open(filename, 'r', encoding = 'utf-8')
except IOError:
    print("couldn't find the file")
else:
    try:
        for line in fhand:
            #put your code here
            print(line)
    except:
        print("Error reading the file")
    finally:
        fhand.close()
For the OP, this is a specific case. However, for visitors: if the code after your for statement does not execute, it is not a Python built-in issue. The most likely cause is exception handling in a parent caller.
If your iteration is inside a function that is called from within a try/except block in the caller, then any error raised during the loop skips the rest of the function and is caught by the caller's except block.
This issue can be hard to find, especially when you are dealing with an intricate architecture.
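Separately from the encoding and exception-handling points above, it may be worth checking one more thing about the commented-out counting loop in the question. A file object is an iterator, so once a for loop has read it to the end, a second for loop over the same object has nothing left to read unless the position is reset with seek(0). The sketch below illustrates this with an in-memory io.StringIO standing in for the real file; the sample text is made up.

```python
import io

# io.StringIO behaves like an open text file for iteration purposes.
fhand = io.StringIO("first line\nsecond line\nthird line\n")

# First pass: count the lines. This consumes the file iterator.
filelength = sum(1 for _ in fhand)

# A second pass without rewinding sees no lines at all.
lines_without_rewind = [line for line in fhand]

# Rewind to the start, then iterate again.
fhand.seek(0)
lines_after_rewind = [line for line in fhand]

print(filelength, len(lines_without_rewind), len(lines_after_rewind))
```

With a real file the same seek(0) call applies; reopening the file before the second loop is another option.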

Parse Text with Python

I have data like the example below in a text file. What I would like to do is search through the text file and return everything between "SpecialStuff" and the next ";", as I've done in the example output. I'm pretty new to Python, so any tips are greatly appreciated. Would something like .split() work?
Example Data:
stuff:
1
1
1
23
];
otherstuff:
do something
23
4
1
];
SpecialStuff
select
numbers
,othernumbers
words
;
MoreOtherStuff
randomstuff
##123
Example Output:
select
numbers
,othernumbers
words
You can try this:
file = open("filename.txt", "r") # This opens the original file
output = open("result.txt", "w") # This opens a new file to write to
seenSpecialStuff = 0 # This will keep track of whether or not the 'SpecialStuff' line has been seen.
for line in file:
    if ";" in line:
        seenSpecialStuff = 0 # Set tracker to 0 if it sees a semicolon.
    if seenSpecialStuff == 1:
        output.write(line) # Write the line if tracker is active
    if "SpecialStuff" in line:
        seenSpecialStuff = 1 # Set tracker to 1 when SpecialStuff is seen
file.close()
output.close() # Close both files so the output is flushed to disk
This returns a file named result.txt that contains:
select
numbers
,othernumbers
words
This code can be improved! Since this is likely a homework assignment, you'll probably want to do more research about how to make this more efficient. Hopefully it can be a useful starting ground for you!
Cheers!
EDIT
If you wanted the code to specifically read the line "SpecialStuff" (instead of lines containing "SpecialStuff"), you could easily change the "if" statements to make them more specific:
file = open("my.txt", "r")
output = open("result.txt", "w")
seenSpecialStuff = 0
for line in file:
    if line.replace("\n", "") == ";":
        seenSpecialStuff = 0
    if seenSpecialStuff == 1:
        output.write(line)
    if line.replace("\n", "") == "SpecialStuff":
        seenSpecialStuff = 1
file.close()
output.close()
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile: # open the input and output files
    wanted = False # do we want the current line in the output?
    for line in infile:
        if line.strip() == "SpecialStuff": # marks the beginning of a wanted block
            wanted = True
            continue
        if line.strip() == ";" and wanted: # marks the end of a wanted block
            wanted = False
            continue
        if wanted: outfile.write(line)
Don't use str.split() for that - str.find() is more than enough:
parsed = None
with open("example.dat", "r") as f:
    data = f.read() # load the file into memory for convenience
start_index = data.find("SpecialStuff") # find the beginning of your block
if start_index != -1:
    end_index = data.find(";", start_index) # find the end of the block
    if end_index != -1:
        parsed = data[start_index + 12:end_index] # grab everything in between (12 == len("SpecialStuff"))
if parsed is None:
    print("`SpecialStuff` Block not found")
else:
    print(parsed)
Keep in mind that this will capture everything between those two markers, including newlines and other whitespace. You can additionally call parsed.strip() to remove leading and trailing whitespace if you don't want it.
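The same str.find() idea can be wrapped in a small function so that the marker length is computed rather than hard-coded. This is a sketch only; the name extract_block and the shortened sample string are hypothetical.

```python
def extract_block(data, start_marker, end_marker):
    """Return the text between start_marker and the next end_marker, or None."""
    start_index = data.find(start_marker)
    if start_index == -1:
        return None
    # Skip past the marker itself instead of hard-coding its length.
    content_start = start_index + len(start_marker)
    end_index = data.find(end_marker, content_start)
    if end_index == -1:
        return None
    return data[content_start:end_index].strip()


# Shortened version of the example data from the question.
sample = (
    "stuff:\n1\n];\nSpecialStuff\nselect\nnumbers\n"
    ",othernumbers\nwords\n;\nMoreOtherStuff\n"
)
block = extract_block(sample, "SpecialStuff", ";")
print(block)
```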

python3 opening files and reading lines

Can you explain what is going on in this code? I don't seem to understand how you can open the file and read it line by line in a for loop, instead of reading all of the sentences at the same time. Thanks.
Let's say I have these sentences in a document file:
cat:dog:mice
cat1:dog1:mice1
cat2:dog2:mice2
cat3:dog3:mice3
Here is the code:
from sys import argv
filename = input("Please enter the name of a file: ")
f = open(filename,'r')
d1ct = dict()
print("Number of times each animal visited each station:")
print("Animal Id Station 1 Station 2")
for line in f:
    if '\n' == line[-1]:
        line = line[:-1]
    (AnimalId, Timestamp, StationId,) = line.split(':')
    key = (AnimalId,StationId,)
    if key not in d1ct:
        d1ct[key] = 0
    d1ct[key] += 1
The magic is at:
for line in f:
    if '\n' == line[-1]:
        line = line[:-1]
Python file objects are special in that they can be iterated over in a for loop. On each iteration, the loop retrieves the next line of the file. Because each line includes its final character, which could be a newline, it's often useful to check for it and remove it.
As Moshe wrote, open file objects can be iterated over. However, they are not of the file type in Python 3.x (as they were in Python 2.x). If the file object is opened in text mode, the unit of iteration is one text line, including the \n.
You can use line = line.rstrip() to remove the \n plus any trailing whitespace.
If you want to read the content of the file at once (into a multiline string), you can use content = f.read().
There is a minor bug in the code: an open file should always be closed. That means calling f.close() after the for loop. Alternatively, you can wrap the open call in the newer with construct, which closes the file for you. I suggest getting used to the latter approach.
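As an illustration of the with approach, here is a sketch of the counting loop rewritten around it. The function name count_visits, the use of collections.Counter, and the sample lines are assumptions made for the example; io.StringIO stands in for a real file so the snippet runs without one.

```python
import io
from collections import Counter


def count_visits(lines):
    """Count (AnimalId, StationId) pairs from 'animal:timestamp:station' lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip()  # drop the trailing newline and whitespace
        if not line:
            continue  # skip blank lines
        animal_id, _timestamp, station_id = line.split(":")
        counts[(animal_id, station_id)] += 1
    return counts


# With a file on disk this would be `with open(filename) as f:`,
# and the file would be closed on leaving the block.
with io.StringIO("cat:dog:mice\ncat1:dog1:mice1\ncat:dog9:mice\n") as f:
    visits = count_visits(f)

print(visits[("cat", "mice")])
```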
