Reading in a file of one-word lines in python - string

Just curious if there's a cleaner way to do this. I have a list of words in a file, one word per line.
I want to read them in and pass each word to a function.
I've currently got this:
f = open(fileName, "r")
lines = f.readlines()
count = 0
for i in lines:
    count += 1
    print("--{}--".format(i.rstrip()))
    if count > 100:
        return
Is there a way to read them in faster without using rstrip on each line?

with open(fileName) as f:
    lines = (line for _, line in zip(range(100), f.readlines()))
    for line in lines:
        print('--{}--'.format(line.rstrip()))
This is how I would do it. Note the context manager (the with/as statement), and the generator comprehension giving us only the first 100 lines.
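A small variation of my own (not part of the answer above): if you want to avoid reading the whole file up front, itertools.islice can take the first 100 lines lazily. This assumes fileName is defined as in the question.

from itertools import islice

with open(fileName) as f:
    # islice stops after 100 lines without reading the rest of the file into memory
    for line in islice(f, 100):
        print('--{}--'.format(line.rstrip()))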

Similar to Patrick's answer:
with open(filename, "r") as f:
    for i, line in enumerate(f):
        if i >= 100:
            break
        print("--{}--".format(line[:-1]))
If you don't want to call .rstrip() and you know the length of the line terminator, you can use [:-1].
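One caveat I'd add (not part of the original answer): if the last line has no trailing newline, [:-1] drops a real character, so rstrip('\n') is safer when you're not sure.

line = "no trailing newline here"
print(line[:-1])           # 'no trailing newline her' -- the final character is lost
print(line.rstrip('\n'))   # 'no trailing newline here' -- unchanged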

Related

How can I print the line index of a specific word in a text file?

I was trying to find a way to print the biggest word from a txt file, its size, and its line index. I managed to get the first two done but can't quite figure out how to print the line index. Can anyone help me?
def BiggestWord():
    list_words = []
    with open('song.txt', 'r') as infile:
        lines = infile.read().splitlines()
    for i in lines:
        words = i.split()
        list_words.append(max(words, key=len))
    biggest_word = str(max(list_words, key=len))
    print biggest_word
    print len(biggest_word)
    FindWord(biggest_word)

def FindWord(biggest_word):
You don't need to do another loop through your list of largest words from each line. Every for-loop increases function time and complexity, and it's better to avoid unnecessary ones when possible.
As one of the options, you can use Python's built-in function enumerate to get an index for each line from the list of lines, and instead of adding each line maximum to the list, you can compare it to the current max word.
def get_largest_word():
    # Setting initial variable values
    current_max_word = ''
    current_max_word_length = 0
    current_max_word_line = None

    with open('song.txt', 'r') as infile:
        lines = infile.read().splitlines()

    for line_index, line in enumerate(lines):
        words = line.split()
        max_word_in_line = max(words, key=len)
        max_word_in_line_length = len(max_word_in_line)
        if max_word_in_line_length > current_max_word_length:
            # updating the largest word value with a new maximum word
            current_max_word = max_word_in_line
            current_max_word_length = max_word_in_line_length
            current_max_word_line = line_index + 1  # line number starting from 1

    print(current_max_word)
    print(current_max_word_length)
    print(current_max_word_line)
    return current_max_word, current_max_word_length, current_max_word_line
P.S.: This function doesn't decide what to do when several lines' longest words have the same length, i.e., which of them should be chosen as the absolute max. You would need to adjust the code accordingly.
P.P.S.: This example is in Python 3, so change the snippet to work in Python 2.7 if needed.
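If you do want to keep ties, one option (a sketch of my own, not part of the answer above) is to collect every (word, line number) pair whose length equals the current maximum:

def get_largest_words(fname='song.txt'):
    # Collect every (word, line_number) pair that shares the maximum length.
    best = []        # list of (word, line_number) tuples
    best_len = 0
    with open(fname) as infile:
        for line_number, line in enumerate(infile, start=1):
            for word in line.split():
                if len(word) > best_len:
                    best = [(word, line_number)]
                    best_len = len(word)
                elif len(word) == best_len:
                    best.append((word, line_number))
    return best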
With the limited amount of info I'm working with, this is the best solution I could think of. Assuming that each line is separated by a newline, such as '\n', you could do:
def FindWord(largest_word):
    with open('song.txt', 'r') as infile:
        lines = infile.read().splitlines()
    linecounter = 1
    for i in lines:
        if largest_word in i:
            return linecounter
        linecounter += 1
You can use enumerate in your for loop to get the current line, and sorted with a lambda to get the longest word:
def longest_word_from_file(filename):
    list_words = []
    with open(filename, 'r') as input_file:
        for index, line in enumerate(input_file):
            words = line.split()
            list_words.append((max(words, key=len), index))
    sorted_words = sorted(list_words, key=lambda x: -len(x[0]))
    longest_word, line_index = sorted_words[0]
    return longest_word, line_index
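Just as a note (my variation, not part of the answer above): since only the longest pair is needed, the sort can be replaced by a single max call. This sketch assumes the file has at least one non-blank line.

def longest_word_from_file(filename):
    with open(filename, 'r') as input_file:
        # max over (longest-word-in-line, line_index) pairs, keyed by the word's length
        return max(
            ((max(line.split(), key=len), index)
             for index, line in enumerate(input_file)
             if line.split()),
            key=lambda pair: len(pair[0]),
        )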
Are you aware that there can be:
many 'largest' words with the same length
several lines that contain word(s) with the biggest length
Here is the code that finds ONE largest word and returns a LIST of numbers of lines that contain the word:
# built a dictionary:
# line_num: largest_word_in_this_line
# line_num: largest_word_in_this_line
# etc...
# !!! actually, a line can contain several largest words
list_words = {}
with open('song.txt', 'r') as infile:
    for i, line in enumerate(infile.read().splitlines()):
        list_words[i] = max(line.split(), key=len)

# get the largest word from values of the dictionary
# !!! there can be several different 'largest' words with the same length
largest_word = max(list_words.values(), key=len)

# get a list of numbers of lines (keys of the dictionary) that contain the largest word
lines = list(filter(lambda key: list_words[key] == largest_word, list_words))
print(lines)
If you want to get all lines that have words with the same biggest length you need to modify the last two lines in my code this way:
lines = list(filter(lambda key: len(list_words[key]) == len(largest_word), list_words))
print(lines)
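For completeness (my addition, not part of the answer): the dictionary keys are zero-based, so to report one-based line numbers together with the word itself:

print(largest_word)
print([line_number + 1 for line_number in lines])  # convert zero-based keys to 1-based line numbers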

How to process each word in a python program

I want to write a program that reads every word from every line of a text file.
I tried using a nested loop, but the second loop reads each letter instead of each word. Can someone explain this? According to me, it should read the individual words instead of letters.
fh = open("romeo.txt")
d = dict()
c = 0
for i in fh:
    for j in i:
        d[c] = j
        c += 1
print(d)
for i in d:
    print(d.get('moon', None))
The output is shown in Picture 1.
I made code that does what I want, but is there a shorter way to do it?
fh = open("romeo.txt")
d = dict()
c = 0
for i in fh:
    i = i.rstrip()
    print("by the first loop ######################", i)
    k = i.split()
    for j in k:
        print("by the second loop ##################", j)
        d[c] = j
        c += 1
print(d)
The output which I want is given in Picture 2.
Also, can I use the split() function here to do it?
How can I use it? It seems to get only the last line of the file as a list, and I want all the words in a list or dictionary.
Thank you
for i in fh:
This line iterates through each line of text in the file
for j in i:
Since i is a string, this line iterates through each letter in each line. Instead of doing it this way, split() the line over whitespace and then iterate through the resulting list:
for line in fh:
    for word in line.split():
        # do stuff
Anyway, since you wanted a short way to do it, here's a neat one-liner:
To make a list of each word in the file:
[word for line in open("romeo.txt") for word in line.split()]
To make a dict (list is better since your keys are integer indices anyway):
{c: i for c, i in enumerate([word for line in open("romeo.txt") for word in line.split()])}
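A slightly shorter equivalent (my variation, producing the same dict) passes the word list to dict via enumerate directly:

words = [word for line in open("romeo.txt") for word in line.split()]
d = dict(enumerate(words))   # {0: 'first_word', 1: 'second_word', ...}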

Append string based on condition python

I just want to append strings based on my condition. For example, all strings starting with http won't be appended, but all other strings in each line that have a length of 40 will be appended.
words = []
store1 = []
disregard = ["http", "gen"]
for all in glob.glob(r'MYDIR'):
    with open(all, "r", encoding="utf-16") as f:
        text = f.read()
        lines = text.split("\n")
        for each in lines:
            words += each.split()
        for each in words:
            if len(each) == 40 and each not in disregard:
                store1.append(each)
Update:
if disregard[0] not in each:
works, but how can I compare it against all the entries in my list? Using disregard alone doesn't work.
Here is my input text file :
http://1234ashajkhdajkhdajkhdjkaaaaaaad1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
genp://1234ashajkhdajkhdajkhdjkaaaaaaad1
a\a
The only string that should be appended is "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".
I think the answers should depend on the number of words you want to disregard.
It's important to define what "word" means here. If a word ends with spaces, should they all be stripped?
One solution could be to create a regular expression from all your words and use that to match the line.
import glob
import re

disregard = ["http", "gen"]
pattern = "|".join([re.escape(w) for w in disregard])

for all in glob.glob(r'MYDIR/*'):
    with open(all, "r", encoding="utf-16") as f:
        matched_words = []
        for line in f:
            line = line.rstrip("\n")
            if len(line) == 40 and not re.match(pattern, line):
                matched_words.append(line)
        print(matched_words)
The basic structure looks OK; it seems the place where it's breaking is the incorrect conditionals. You say you want to check whether each line starts with the supplied strings, but then you split each line and check for the existence of those exact strings. Use .startswith() instead. This also means there doesn't have to be a space after "http" for that string to be caught.
Also, either the conditional testing should be placed after the loop that builds the words list, or else the words list should be reset at the start of each loop so you're not re-testing words you've already checked.
# adjusted some variable names for clarity
import glob

words = []
output = []
disregard = ["http", "gen"]

for fname in glob.glob(r'MYDIR'):
    with open(fname, "r", encoding="utf-16") as f:
        text = f.read()
        lines = text.split("\n")
        for line in lines:
            words += line.split()

for word in words:
    if len(word) == 40 and not any([word.startswith(dis) for dis in disregard]):
        output.append(word)
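As a small follow-up (not part of the answer above), str.startswith also accepts a tuple of prefixes, which shortens the condition:

prefixes = tuple(disregard)   # ("http", "gen")
for word in words:
    if len(word) == 40 and not word.startswith(prefixes):
        output.append(word)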

Python calling class variable in forloop

I am new to Python, and I am confused by the for loop usage below. Can anyone please help me understand the class usage in this for loop?
import sys

def checkline():
    glb.linecount += 1
    w = glb.l.split()
    glb.wordcount += len(w)

class glb:
    linecount = 0
    wordcount = 0
    l = []

f = open('Untitled9.ipynb', 'r')
for glb.l in f.readlines():  # what glb.l exactly does?
    checkline()
print(glb.linecount, glb.wordcount)
This entire program counts the lines and words in a file. Specifically, glb.l becomes each line in the file, so you can iterate over it and count the words in each one.
Let me pseudo-code it for you.
Open the file `Untitled9.ipynb` for reading.  // f
For each line in the file:  // the loop calling checkline()
    Store the line.  // you're assigning the line to glb.l, which checkline() then splits to count the words
    Add one to the line count.
    For each word produced by split(), add one to the word count.  // counting the results of split() on glb.l
Print the line count and the word count.
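For comparison, here is a sketch of the same counting logic without the glb class, passing each line into the function instead of binding it to a class attribute (my rewrite, not the original poster's code):

def check_line(line, counts):
    # a plain dict takes the place of the class attributes
    counts['lines'] += 1
    counts['words'] += len(line.split())

counts = {'lines': 0, 'words': 0}
with open('Untitled9.ipynb', 'r') as f:
    for line in f:               # plays the role of `for glb.l in f.readlines()`
        check_line(line, counts)
print(counts['lines'], counts['words'])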

Python program for json files

I want to search for a particular keyword in a .json file and print the 10 lines above and below the line in which the keyword is found.
Note - the keyword might be present more than once in the file.
So far I have made this:
import sys
from collections import deque
from itertools import chain, islice

with open('loggy.json', 'r') as f:
    last_lines = deque(maxlen=5)
    for ln, line in enumerate(f):
        if "out_of_memory" in line:
            print(ln)
            sys.stdout.writelines(chain(last_lines, [line], islice(f, 5)))
            last_lines.append(line)
            print("Next Error")
print("No More Errors")
The problem with this is that the keyword-containing line gets printed as many times as the keyword has been found, and it only prints 5 lines below it, whereas I want it to print five lines above it as well.
If the json file was misused to store a really large amount of information, then
processing it on the fly may be better. In that case, keep the history lines,
say, in a list that is shortened if it grows above a given limit.
Then use a counter that indicates how many lines must be displayed after
observing a problem:
#!python3

def print_around_pattern(pattern, fname, numlines=10):
    """Prints the lines with the pattern from the fname text file.

    The pattern is a string, numlines is the number of lines printed before
    and after the line with the pattern (with default value 10).
    """
    history = []
    cnt = 0
    with open(fname, encoding='utf8') as fin:
        for n, line in enumerate(fin):
            history.append(line)                 # append the line
            history = history[-numlines - 1:]    # keep only the tail, including the last line
            if pattern in line:
                # Print the separator and the history lines including the pattern line.
                print('\n{!r} at the line {} ----------------------------'.format(
                    pattern, n + 1))
                for lineno, h in enumerate(history, start=n + 2 - len(history)):
                    print('{:03d}: {}'.format(lineno, h), end='')
                cnt = numlines   # set the counter for the next lines
            elif cnt > 0:
                # The counter indicates we want to see this line.
                print('{:03d}: {}'.format(n + 1, line), end='')
                cnt -= 1         # decrement the counter


if __name__ == '__main__':
    print_around_pattern('out_of_memory', 'loggy.json')
    ##print_around_pattern('out_of_memory', 'loggy.json', 3)  # three lines before and after
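Alternatively (a sketch of my own, not part of the answer above), the original deque-based attempt can be kept, raising maxlen to 10 and using islice for the lines after the match. Note that islice consumes the following lines, so any match inside that window is skipped.

from collections import deque
from itertools import islice

def print_context(pattern, fname, numlines=10):
    last_lines = deque(maxlen=numlines)   # rolling window of lines before the match
    with open(fname, encoding='utf8') as f:
        for line in f:
            if pattern in line:
                # previous lines, the matching line, then up to numlines following lines
                for out in [*last_lines, line, *islice(f, numlines)]:
                    print(out, end='')
                print("Next Error")
            last_lines.append(line)

print_context('out_of_memory', 'loggy.json')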
