printing sentence from a word search - python-3.x

As an exercise, I've copied and saved Edgar Rice Burroughs's Tarzan novel into a text file (named tarzan.txt). In the code below, I search it for "row" and print the corresponding lines.
Is it difficult to modify this code so that it searches for the word "row", rather than instances of these letters appearing inside another word, AND so that it prints the sentence that contains this word rather than simply the line it appears in? Thanks.
PS - in the code below, I couldn't get lines 3, 5, and 6 to indent properly, despite the 4 space suggestion
a="tarzan.txt"
with open(a) as f_obj:
    contents=f_obj.readlines()
    for line in contents:
        if "row" in line:
            print(line)

import re
a="tarzan.txt"
with open(a) as f_obj:
    contents=f_obj.readlines()
    for index, line in enumerate(contents):
        if re.search(r'\brow\b', line):  # match 'row' only as a whole word
            print(index)  # print the (0-based) index of the line
Here \b matches a word boundary.
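The second half of the question (printing the whole sentence rather than the line) could be approached roughly as follows. This is only a sketch: the function name sentences_containing and the sample text are made up for illustration, and the sentence splitting is deliberately naive (it assumes a sentence ends at '.', '!' or '?' followed by whitespace, so abbreviations such as "Mr." would be split incorrectly).

```python
import re

def sentences_containing(text, word):
    """Return the sentences in text that contain word as a whole word."""
    # Collapse line breaks so sentences that span several lines are rejoined.
    flat = " ".join(text.split())
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [s for s in sentences if pattern.search(s)]

# Hypothetical sample text; for the real file, pass f_obj.read() instead.
sample = "He sat in the front row\nof the boat. The arrow missed. Tomorrow we row!"
for sentence in sentences_containing(sample, "row"):
    print(sentence)
```

The middle sentence is skipped because "arrow" only contains the letters, not the word. For real prose, a dedicated sentence tokenizer would probably be more reliable than this regex.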

Related

What is the problem with len and the output

I'm a beginner working through exercises in Python, and I have a problem with this one:
Book Titles
You have been asked to make a special book categorization program, which assigns each book a special code based on its title.
The code is equal to the first letter of the book, followed by the number of characters in the title.
For example, for the book "Harry Potter", the code would be: H12, as it contains 12 characters (including the space).
You are provided a books.txt file, which includes the book titles, each one written on a separate line.
Read the title one by one and output the code for each book on a separate line.
For example, if the books.txt file contains:
Some book
Another book
Your program should output:
S9
A12
Recall the readlines() method, which returns a list containing the lines of the file.
Also, remember that all lines, except the last one, contain a \n at the end, which should not be included in the character count.
I understand what I should do, but my output is not in the expected form (S9, A12).
This is my code:
file = open("/usercode/files/books.txt", "r")
for i in file.readlines():
    print(i[0])
    print(len(i))
file.close()
my output is:
H
13
T
17
P
20
G
18
Expected Output
H12
T16
P19
G18
You missed the part of the instructions where it says "remember that all lines, except the last one, contain a \n at the end, which should not be included in the character count."
I'd suggest stripping off the newline, e.g. print(len(i.strip('\n'))).
To get them all on the same line, just combine the prints, and use an empty sep:
for i in file:
    i = i.strip('\n')
    print(i[0], len(i), sep='')
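For reference, here is a self-contained sketch of the same idea that can be run without the exercise's books.txt. The helper name book_codes is made up for illustration, and io.StringIO stands in for the real file; the sample titles are the ones from the exercise text.

```python
import io

def book_codes(lines):
    """Yield the first letter plus the title length for each non-empty title."""
    for line in lines:
        title = line.rstrip("\n")  # drop only the trailing newline
        if title:
            yield f"{title[0]}{len(title)}"

# Stand-in for open("/usercode/files/books.txt"); the last line has no "\n".
fake_file = io.StringIO("Some book\nAnother book")
for code in book_codes(fake_file):
    print(code)
```

Using rstrip("\n") rather than strip() removes only the newline, so any spaces that belong to the title would still be counted.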

Splitting File contents by regex

As the title stated, I need to split a file by regex in Python.
The layout of the .txt file is as follows:
[text1]
file contents I need
[some text2]
more file contents I need
[more text 3]
last bit of file contents I need
I originally tried splitting the files like so:
re.split('\[[A-Za-z]+\]\n', data)
The problem with doing it this way was that it wouldn't match the blocks that had spaces in the text within the brackets.
I then tried using a wildcard: re.split('\[(.*?)\]\n', data)
The problem I ran into with this was that it would split the file contents as well. What's the best way to get the following result:
['file contents I need','more file contents I need','last bit of file contents I need']?
Thanks in advance.
Instead of using re.split, you could use a capturing group with re.findall which will return the group 1 values.
In the group, match all the lines that do not start with the [.....] pattern
^\[[^][]*]\r?\n\s*(.*(?:\r?\n(?!\[[^][]*]).*)*)
In parts:
^  Start of the line
\[[^][]*]  Match [, then any characters other than [ or ], then ]
\r?\n\s*  Match a newline and optional whitespace chars
(  Capture group 1
.*  Match any char except a newline 0+ times
(?:  Non-capture group
\r?\n(?!\[[^][]*]).*  Match the line if it does not start with the [...] pattern, using a negative lookahead (?!
)*  Close the group and repeat it 0+ times to get all the lines
)  Close group 1
Example code
import re
regex = r"^\[[^][]*]\r?\n\s*(.*(?:\r?\n(?!\[[^][]*]).*)*)"
data = ("[text1]\n\n"
        "file contents I need\n\n"
        "[some text2]\n\n"
        "more file contents I need\n\n"
        "[more text 3]\n\n"
        "last bit of file contents I need\n"
        "last bit of file contents I need")
matches = re.findall(regex, data, re.MULTILINE)
print(matches)
Output
['file contents I need\n', 'more file contents I need\n', 'last bit of file contents I need\nlast bit of file contents I need']
Given:
txt='''\
[text1]
file contents I need
[some text2]
more file contents I need
multi line at that
[more text 3]
last bit of file contents I need'''
(Which could be from a file...)
You can do:
>>> [e.strip() for e in re.findall(r'(?<=\])([\s\S]*?)(?=\[|\s*\Z)', txt)]
['file contents I need', 'more file contents I need\nmulti line at that', 'last bit of file contents I need']
You can also use re.finditer to locate each block of interest:
with open(ur_file) as f:
    for i, block in enumerate(re.finditer(r'^\s*\[[^]]*\]([\s\S]*?)(?=^\s*\[[^]]*\]|\Z)', f.read(), flags=re.M)):
        print(i, block.group(1))
Each block's leading and trailing whitespace can be dealt with as desired...
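As a side note on the re.split attempts in the question: re.split also returns the text matched by any capturing group in the pattern, so a pattern with (.*?) puts the bracketed text into the result. A sketch of a split-based variant without a capturing group is shown below. This is a different technique from the findall and finditer answers above; the helper name split_sections is made up, and it assumes every header sits alone on its own line.

```python
import re

def split_sections(data):
    """Split on [header] lines and return the stripped, non-empty blocks."""
    # No capturing group, so the headers themselves are not returned.
    parts = re.split(r"^\[[^\]\n]*\]\r?\n", data, flags=re.MULTILINE)
    return [part.strip() for part in parts if part.strip()]

data = ("[text1]\n"
        "file contents I need\n"
        "[some text2]\n"
        "more file contents I need\n"
        "[more text 3]\n"
        "last bit of file contents I need")
print(split_sections(data))
```

The filter on part.strip() drops the empty string that re.split produces before the first header.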

Read some specific lines from a big file in python

I want to read some specific lines from a large text file where line numbers are in a list, for example:
list_Of_line =[3991, 3992, ...]. I want to check whether there is the string "this city" in line number 3991, 3992,... or not. I want to directly access those lines. How can I do this in python?
Text_File is like below
Line_No
......................
3990 It is a big city.
3991 I live in this city.
3992 I love this city.
.......................
There is no way to "directly access" a specific line of a file by its number, since lines can start at any position and be of any length. The only way to know where each line starts is therefore to read through the file and locate each newline character.
Understanding that, you can read through the file once, calling the tell method of the file object after each line to build a list of the file positions at which lines end, so that you can then "directly access" any line number you want with the seek method of the file object to read the specific line:
list_of_lines = [3991, 3992]
with open('file.txt') as file:
    # Use readline() here: in text mode, tell() raises OSError while the file is being iterated with next()
    positions = [0, *(file.tell() for line in iter(file.readline, ''))]
    for line_number in list_of_lines:
        file.seek(positions[line_number - 1])  # jump to the start of the wanted line
        if 'this city' in file.readline():
            print(f"'this city' found in line #{line_number}")
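If the file only needs to be checked once, building the position index may not be worth it. A simpler single-pass sketch is shown below. It is a different technique from the tell/seek approach above: the helper name lines_containing is made up, io.StringIO stands in for the real file, and the sample line numbers are small for illustration.

```python
import io

def lines_containing(file_obj, line_numbers, needle):
    """Return the requested 1-based line numbers whose line contains needle."""
    wanted = set(line_numbers)
    last = max(wanted, default=0)
    found = []
    for number, line in enumerate(file_obj, start=1):
        if number > last:
            break  # nothing left to check, stop reading early
        if number in wanted and needle in line:
            found.append(number)
    return found

# Stand-in for open('file.txt').
sample = io.StringIO(
    "It is a big city.\nI live in this city.\nI love this city.\nBye.\n"
)
print(lines_containing(sample, [2, 3, 4], "this city"))
```

The position index pays off when the same large file is queried many times; for a one-off check, a single pass like this reads no more of the file than the index-building step does.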

using title() creates a extra line in python

Purpose: write code to capitalize the first letter of each word in a file.
The steps include opening the file in read mode and calling title() on each line.
When the output is printed, there is an extra blank line between each line of the file.
For example:
if the content is
one two three four
five six seven eight
output is:
One Two Three Four

Five Six Seven Eight
I'm not sure why the blank lines show up there.
I used strip() together with title() to get rid of them, but I would like to know why they appear.
inputfile = input("Enter the file name:")
openedfile = open(inputfile, 'r')
for line in openedfile:
    capitalized=line.title()
    print(capitalized)
The code above prints the output with an added blank line after each line.
I solved it using the code below:
inputfile = input("Enter the file name:")
openedfile = open(inputfile, 'r')
for line in openedfile:
    capitalized=line.title().strip()
    print(capitalized)
I expected the capitalized words to print without blank lines by using just title(), not title().strip().
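A likely explanation: each line read from a file still ends with its "\n", title() leaves that newline in place, and print() then adds a newline of its own, so two newlines are written per line. The sketch below illustrates this; io.StringIO stands in for the real file and the helper name title_lines is made up.

```python
import io

def title_lines(lines):
    """Title-case each line after removing its trailing newline."""
    return [line.rstrip("\n").title() for line in lines]

# Stand-in for the opened file.
fake_file = io.StringIO("one two three four\nfive six seven eight\n")

first_line = fake_file.readline()
print(repr(first_line))  # the newline read from the file is still attached

fake_file.seek(0)
for text in title_lines(fake_file):
    print(text)  # print() adds exactly one newline per line
```

An alternative is to keep the newline from the file and suppress the one from print with print(line.title(), end="").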

Reading text files and calculate the mean length of every 3rd word

How do I open a text file (containing 5 lines) and write a program that calculates the mean length of the third word of each line, over all lines in the file? (A word is defined as a group of characters surrounded by spaces and/or a line ending.)
I suggest reading up on reading and writing files in Python, since what you are asking is a pretty basic question and I believe there are many resources out there. Just search :]
But not to leave you empty-handed...
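# mean_word.py
with open('file.txt') as data_file:
    # Split each line into a list of words; split() with no argument also drops the trailing newline
    word_lists = [line.split() for line in data_file]
word_count = sum(len(line) for line in word_lists)  # total number of words
total_chars = sum(len(word) for line in word_lists for word in line)  # total characters in all words
mean_word_len = total_chars / word_count  # mean length over all words, not only the third word of each line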
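The snippet above averages over every word. A sketch restricted to the third word of each line, as the question asks, might look like the following. The function name mean_third_word_length and the sample text are made up, and it assumes that lines with fewer than three words are simply skipped and that tabs count as separators too (split() splits on any whitespace).

```python
import io

def mean_third_word_length(lines):
    """Mean length of the third word, over the lines that have one."""
    lengths = []
    for line in lines:
        words = line.split()  # splits on whitespace, drops the newline
        if len(words) >= 3:
            lengths.append(len(words[2]))
    # Return 0.0 when no line has a third word, to avoid dividing by zero.
    return sum(lengths) / len(lengths) if lengths else 0.0

# Stand-in for open('file.txt'); the second line has no third word.
sample = io.StringIO("the quick brown fox\nno third\na bb cccc dd\n")
print(mean_third_word_length(sample))
```

Here the third words are "brown" (5 characters) and "cccc" (4 characters), so the mean is 4.5.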
