Python program for json files - python-3.x

I want to search for a particular keyword in a .json file and print the 10 lines above and below each line that contains it.
Note: the keyword might be present more than once in the file.
So far I have written this:
with open('loggy.json', 'r') as f:
    last_lines = deque(maxlen=5)
    for ln, line in enumerate(f):
        if "out_of_memory" in line:
            print(ln)
            sys.stdout.writelines(chain(last_lines, [line], islice(f, 5)))
            last_lines.append(line)
            print("Next Error")
    print("No More Errors")
The problem with this is that the number of times it prints the line containing the keyword equals the number of times the keyword has been found so far.
It also prints only the 5 lines below the match, whereas I want it to print the lines above it as well.

If the JSON file is (mis)used to store a really large amount of information,
processing it on the fly may be better. In that case, keep the history lines
in a list that is truncated whenever it grows beyond a given limit.
Then use a counter that indicates how many lines must still be displayed
after a match:
#!python3
def print_around_pattern(pattern, fname, numlines=10):
    """Prints the lines with the pattern from the fname text file.

    The pattern is a string, numlines is the number of lines printed before
    and after the line with the pattern (with default value 10).
    """
    history = []
    cnt = 0
    with open(fname, encoding='utf8') as fin:
        for n, line in enumerate(fin):
            history.append(line)             # append the line
            history = history[-numlines-1:]  # keep only the tail, including last line
            if pattern in line:
                # Print the separator and the history lines including the pattern line.
                print('\n{!r} at the line {} ----------------------------'.format(
                    pattern, n+1))
                first = n + 2 - len(history)  # 1-based number of the oldest history line
                for k, h in enumerate(history, first):
                    print('{:03d}: {}'.format(k, h), end='')
                cnt = numlines               # set the counter for the next lines
            elif cnt > 0:
                # The counter indicates we want to see this line.
                print('{:03d}: {}'.format(n+1, line), end='')
                cnt -= 1                     # decrement the counter

if __name__ == '__main__':
    print_around_pattern('out_of_memory', 'loggy.json')
    ##print_around_pattern('out_of_memory', 'loggy.json', 3)  # three lines before and after
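When two matches are closer together than numlines, the approach above prints some of the history lines a second time. A possible variation, sketched here under the assumption that each line should appear at most once, keeps only not-yet-printed lines in a collections.deque. The name lines_with_context is made up for this illustration, and it takes any iterable of lines (an open file works) instead of a file name:

```python
from collections import deque


def lines_with_context(lines, pattern, numlines=10):
    """Yield (line_number, line) for every line within numlines of a match.

    Each line is yielded at most once, even when matches are close together.
    """
    history = deque(maxlen=numlines)  # recent lines that were not yielded yet
    remaining = 0                     # lines still to yield after the last match
    for number, line in enumerate(lines, start=1):
        if pattern in line:
            yield from history        # up to numlines lines before the match
            history.clear()
            yield number, line
            remaining = numlines
        elif remaining > 0:
            yield number, line
            remaining -= 1
        else:
            history.append((number, line))


if __name__ == '__main__':
    sample = ['line {}\n'.format(i) for i in range(1, 21)]
    sample[9] = 'out_of_memory\n'
    for number, line in lines_with_context(sample, 'out_of_memory', numlines=2):
        print('{:03d}: {}'.format(number, line), end='')
```

With a real file you would pass the open file object, e.g. lines_with_context(fin, 'out_of_memory').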

Related

How can I print the line index of a specific word in a text file?

I was trying to find a way to print the biggest word from a txt file, its size and its line index. I managed to get the first two done but can't quite figure out how to print the line index. Can anyone help me?
def BiggestWord():
    list_words = []
    with open('song.txt', 'r') as infile:
        lines = infile.read().split()
        for i in lines:
            words = i.split()
            list_words.append(max(words, key=len))
    biggest_word = str(max(list_words, key=len))
    print biggest_word
    print len(biggest_word)
    FindWord(biggest_word)

def FindWord(biggest_word):
You don't need a second loop over the list of longest words from each line. Every extra loop adds running time and complexity, so it's better to avoid unnecessary ones where possible.
As one option, you can use Python's built-in function enumerate to get an index for each line in the list of lines, and instead of adding each line's longest word to a list, you can compare it with the current longest word.
def get_largest_word():
    # Setting initial variable values
    current_max_word = ''
    current_max_word_length = 0
    current_max_word_line = None
    with open('song.txt', 'r') as infile:
        lines = infile.read().splitlines()
    for line_index, line in enumerate(lines):
        words = line.split()
        if not words:
            continue  # skip empty lines, where max() would raise ValueError
        max_word_in_line = max(words, key=len)
        max_word_in_line_length = len(max_word_in_line)
        if max_word_in_line_length > current_max_word_length:
            # updating the largest word value with a new maximum word
            current_max_word = max_word_in_line
            current_max_word_length = max_word_in_line_length
            current_max_word_line = line_index + 1  # line number starting from 1
    print(current_max_word)
    print(current_max_word_length)
    print(current_max_word_line)
    return current_max_word, current_max_word_length, current_max_word_line
P.S.: This function doesn't decide what to do when several words share the maximum length, or which of them should be chosen as the overall maximum; as written, the first one found wins. You would need to adjust the code accordingly.
P.P.S.: This example is in Python 3, so change the snippet to work in Python 2.7 if needed.
With a limited amount of info I'm working with, this is the best solution I could think of. Assuming that each line is separated by a new line, such as '\n', you could do:
def FindWord(largest_word):
    with open('song.txt', 'r') as infile:
        lines = infile.read().splitlines()
    linecounter = 1
    for i in lines:
        if largest_word in i:  # test the current line, not the whole list
            return linecounter
        linecounter += 1
You can use enumerate in your for to get the current line and sorted with a lambda to get the longest word:
def longest_word_from_file(filename):
    list_words = []
    with open(filename, 'r') as input_file:
        for index, line in enumerate(input_file):
            words = line.split()
            if words:  # max() raises ValueError on an empty line
                list_words.append((max(words, key=len), index))
    sorted_words = sorted(list_words, key=lambda x: -len(x[0]))
    longest_word, line_index = sorted_words[0]
    return longest_word, line_index
Are you aware that there can be:
many 'largest' words with the same length
several lines that contain word(s) with the biggest length
Here is code that finds ONE largest word and returns a LIST of the numbers of the lines that contain that word:
# build a dictionary:
# line_num: largest_word_in_this_line
# line_num: largest_word_in_this_line
# etc...
# !!! actually, a line can contain several largest words
list_words = {}
with open('song.txt', 'r') as infile:
    for i, line in enumerate(infile.read().splitlines()):
        words = line.split()
        if words:  # skip empty lines, where max() would raise ValueError
            list_words[i] = max(words, key=len)
# get the largest word from values of the dictionary
# !!! there can be several different 'largest' words with the same length
largest_word = max(list_words.values(), key=len)
# get a list of numbers of lines (keys of the dictionary) that contain the largest word
lines = list(filter(lambda key: list_words[key] == largest_word, list_words))
print(lines)
If you want to get all lines that have words with the same biggest length you need to modify the last two lines in my code this way:
lines = list(filter(lambda key: len(list_words[key]) == len(largest_word), list_words))
print(lines)
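If ties matter, one pass over the file can collect every word of maximal length together with its line number. This is only a sketch: the function name longest_words is made up here, line numbers start at 1, and a "word" is assumed to be anything separated by whitespace, as in the snippets above.

```python
def longest_words(lines):
    """Return (length, [(line_number, word), ...]) for all words of maximal length."""
    best_length = 0
    found = []
    for line_number, line in enumerate(lines, start=1):
        for word in line.split():
            if len(word) > best_length:
                # a strictly longer word replaces everything collected so far
                best_length = len(word)
                found = [(line_number, word)]
            elif len(word) == best_length:
                # a tie: remember this occurrence as well
                found.append((line_number, word))
    return best_length, found


if __name__ == '__main__':
    print(longest_words(['a bb ccc\n', '\n', 'ddd e\n']))
```

An open file can be passed directly, e.g. longest_words(infile) inside a with open('song.txt') as infile: block. Empty lines need no special handling because line.split() simply returns an empty list for them.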

How to skip N central lines when reading file?

I have an input file.txt like this:
3
2
A
4
7
B
1
9
5
2
0
I'm trying to read the file and
when A is found, print the line that is 2 lines below
when B is found, print the line that is 4 lines below
My current code and current output are like below:
with open('file.txt') as f:
    for line in f:
        if 'A' in line:  ### Skip 2 lines!
            f.readline()  ### Skipping one line
            line = f.readline()  ### Locate on the line I want
            print(line)
        if 'B' in line:  ## Skip 4 lines
            f.readline()  ### Skipping one line
            f.readline()  ### Skipping two lines
            f.readline()  ### Skipping three lines
            line = f.readline()  ### Locate on the line I want
            print(line)
'4\n'
7
'1\n'
'9\n'
'5\n'
2
>>>
This prints the values I want, but it also prints 4\n, 1\n, ... Besides that, I need to write several f.readline() calls, which is not practical.
Is there a better way to do this?
My expected output is like this:
7
2
Here is much simpler code for you:
lines = open("file.txt", "r").read().splitlines()
#print(str(lines))
for i in range(len(lines)):
    if 'A' in lines[i]:
        print(lines[i+2])  # show 2 lines down
    elif 'B' in lines[i]:
        print(lines[i+4])  # show 4 lines down
This reads the entire file as an array in which each element is one line of the file. Then it just goes through the array and directly changes the index by 2 (for A) and 4 (for B) whenever it finds the line it is looking for.
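The same index arithmetic can be driven by a small table of markers and offsets, so adding a third marker needs no new branch. The sketch below is an illustration only: the name lines_below_markers and the offsets dictionary are not from the question, and targets that would fall past the end of the file are silently skipped.

```python
def lines_below_markers(lines, offsets):
    """For each line containing a marker, collect the line `offset` lines below it."""
    lines = [line.rstrip('\n') for line in lines]
    result = []
    for i, line in enumerate(lines):
        for marker, offset in offsets.items():
            target = i + offset
            # skip targets beyond the last line instead of raising IndexError
            if marker in line and target < len(lines):
                result.append(lines[target])
    return result


if __name__ == '__main__':
    data = ['3', '2', 'A', '4', '7', 'B', '1', '9', '5', '2', '0']
    for value in lines_below_markers(data, {'A': 2, 'B': 4}):
        print(value)
```

For the sample file from the question this prints 7 and then 2.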
If you don't like the repeated readline calls, wrap them in a function so the rest of the code stays clean:
def skip_ahead(it, elems):
    assert elems >= 1, "can only skip positive integer number of elements"
    for i in range(elems):
        value = next(it)
    return value

with open('file.txt') as f:
    for line in f:
        if 'A' in line:
            line = skip_ahead(f, 2)
            print(line)
        if 'B' in line:
            line = skip_ahead(f, 4)
            print(line)
As for the extra output: when the code you provided is run as a script, only the print statements cause output, so there are no extra lines like '1\n'. Echoing such values is a feature of interactive contexts such as the IPython shell, which display the value of an expression that is used as a statement. Here f.readline() is alone on its own line, so its return value is shown. To suppress this, you can usually just write _ = <expr>.

Print a line if it matches AND if a condition is met in the N lines before and N lines after

What is the most sensible way of doing this? Once I have my match, I want to know what's happening in the 100 lines before and in the 100 lines after. Here is an example of the loop:
with open(pile, "r") as pileup:
    for i, line in enumerate(pileup):
        fields = line.split('\t')
        if fields[0] == v.CHROM and (v.start -1) == fields[1]:
            print(str(v))  # printing the query string; I hope the variable i then has the value of the match line number
            for line in range(i-101, i+101):
                if fields[2] >= 4:
                    print (line)  # here I want to print the line meeting the condition
I know using enumerate should allow to have a line number. But my code doesn't seem to work, it runs forever.
Thanks for any tips
This seems like a good time to use a deque.
A deque can be used as a fixed-length list. That way you'll never have more than 100 elements in it.
from collections import deque
cache = deque(maxlen=100)
full_list = []
with open(pile, "r") as pileup:
    for line in pileup:
        fields = line.split('\t')
        cache.append(fields)
        if fields[0] == v.CHROM and (v.start -1) == fields[1]:
            break
    # once you reach your condition
    # you can grab the next 100 lines as well
    full_list = list(cache)
    for i, line in enumerate(pileup):
        if i < 100:
            full_list.append(line.split('\t'))
        else:
            break
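The same idea can be packaged as a helper that returns the lines before, at, and after the first match, using itertools.islice to take the following lines. Treat this as a sketch: the name window_around_match and the is_match predicate are invented for the example, and the predicate stands in for the comparison against v.CHROM and v.start from the question.

```python
from collections import deque
from itertools import islice


def window_around_match(lines, is_match, n=100):
    """Return (before, match, after) for the first matching line, or None."""
    before = deque(maxlen=n)  # holds at most the n most recent non-matching lines
    iterator = iter(lines)
    for line in iterator:
        if is_match(line):
            after = list(islice(iterator, n))  # consumes at most n further lines
            return list(before), line, after
        before.append(line)
    return None


if __name__ == '__main__':
    rows = ['chr1\t{}\t{}\n'.format(pos, pos % 7) for pos in range(1, 11)]
    found = window_around_match(rows, lambda row: row.split('\t')[1] == '5', n=2)
    if found is not None:
        before, match, after = found
        # split() returns strings, so the third column must be converted before comparing
        print([row for row in before + after if int(row.split('\t')[2]) >= 4])
```

Note that the depth filter converts the field with int(); comparing the raw string from split('\t') with the number 4 raises a TypeError in Python 3.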

Sorting sentences of text file by users input

My code sorts the sentences of a file by their length and saves them to a new file.
How can I alter my code so that, if the user inputs a number at program start, the lines are filtered based on that input?
Example: if the user inputs 50, the program sorts only the sentences that are longer than 50; if the user inputs all, the program sorts all lines as normal.
My code:
file = open("testing_for_tools.txt", "r")
lines_ = file.readlines()
#print(lines_)
user_input = input("enter")
if user_input is int:
    lines = sorted(lines_, key=len)
else:
    lines = sorted(lines_, key=len)
# lines.sort()
file_out = open('testing_for_tools_sorted.txt', 'w')
file_out.write(''.join(lines))  # Write a sequence of strings to a file
file_out.close()
file.close()
print(lines)
input always returns a string. If you want an integer or similar, you need to parse it explicitly; you will never get an integer out of input.
is is not a type-testing primitive in Python, it's an identity primitive. It checks whether the left and right operands are the same object, and that's it.
filter is what you're looking for here, or a list comprehension: if the user provided an input and that input is a valid integer, you want to filter the lines to only those above the specified length. This is a separate step from sorting.
That aside,
you should use with to manage files unless there is a specific reason you can't
files have a writelines method, which should be more efficient than writing joined lines
never open files in text mode without providing an encoding; otherwise Python asks the system for an encoding, and it's easy for that system to be misconfigured or oddly configured, leading to garbage input
with open("testing_for_tools.txt", "r", encoding='utf-8') as f:
    lines_ = f.readlines()
#print(lines_)
user_input = input("enter")
if user_input:
    try:
        limit = int(user_input.strip())
    except ValueError:
        pass
    else:
        lines_ = (l for l in lines_ if len(l) > limit)
lines = sorted(lines_, key=len)
with open('testing_for_tools_sorted.txt', 'w', encoding='utf-8') as f:
    f.writelines(lines)
print(lines)
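To make the "filter first, then sort" split easier to test, the logic can also live in a function that takes the lines and the raw user input. This is a sketch with a made-up name, filter_and_sort, and one assumption that differs from the code above: the length is measured without the trailing newline.

```python
def filter_and_sort(lines, user_input):
    """Sort lines by length; if user_input is an integer, first keep only longer lines."""
    try:
        limit = int(user_input.strip())
    except ValueError:
        limit = None  # e.g. "all" or empty input: no filtering
    if limit is not None:
        # assumption: the newline does not count towards the sentence length
        lines = [line for line in lines if len(line.rstrip('\n')) > limit]
    return sorted(lines, key=len)


if __name__ == '__main__':
    sample = ['ccc\n', 'a\n', 'bb\n']
    print(filter_and_sort(sample, 'all'))
    print(filter_and_sort(sample, '1'))
```

Anything that int() cannot parse, including all, falls back to sorting every line.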
@Black Snow
I don't have anything else to answer if it's working as expected.
This is a rather long answer:
idx_to_sort = [len(i) > int(user_input) for i in lines_]
idx_to_sort
lines_to_sort = []
for i in range(len(idx_to_sort)):
    if idx_to_sort[i]:
        lines_to_sort.append(lines_[i])
lines_to_sort
lines = sorted(lines_to_sort, key=len)
lines
counter = 0
for i in range(len(idx_to_sort)):
    if idx_to_sort[i]:
        lines_[i] = lines[counter]
        counter += 1
lines_
The output would be different but not what you expected.

Reading in a file of one-word lines in python

Just curious if there's a cleaner way to do this. I have a list of words in a file, one word per line.
I want to read them in and pass each word to a function.
I've currently got this:
f = open(fileName,"r");
lines = f.readlines();
count = 0
for i in lines:
    count += 1
    print("--{}--".format(i.rstrip()))
    if count > 100:
        return
Is there a way to read them in faster without using rstrip on each line?
with open(fileName) as f:
    lines = (line for _, line in zip(range(100), f.readlines()))
    for line in lines:
        print('--{}--'.format(line.rstrip()))
This is how I would do it. Note the context manager (the with statement) and the generator expression, which gives us only the first 100 lines.
Similar to Patrick's answer:
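with open(filename, "r") as f:
    for i, line in enumerate(f):
        if i >= 100:
            break
        print("--{}--".format(line[:-1]))
If you don't want to call .strip() and you know the length of the line terminator, you can use [:-1]. Note that this also cuts off the last character of a final line that has no trailing newline.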
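Another option, sketched here with a made-up helper name first_words, is to let itertools.islice stop after 100 lines so the rest of the file is never read, and to strip only the newline with rstrip('\n'), which is safe even when the last line has no terminator:

```python
from itertools import islice


def first_words(lines, limit=100):
    """Return up to `limit` lines with the trailing newline removed."""
    return [line.rstrip('\n') for line in islice(lines, limit)]


if __name__ == '__main__':
    for word in first_words(['alpha\n', 'beta\n', 'gamma'], limit=2):
        print('--{}--'.format(word))
```

With a real file this becomes first_words(f) inside a with open(fileName) as f: block.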
