Hoping you can help.
I have a file something like the one below. There are lots of lines of text associated with each entry, and each entry is separated by ***********
I have written some code that loops through each line, checks some criteria and then writes the output to a csv. However, I don't know how to do that for the whole section, rather than per line.
I kind of want WHILE line <> ***** loop through the lines. But I need to do that for each section in the document.
Would anyone be able to help please?
My attempt:
splitlines doesn't seem to work:
import csv
from itertools import islice
output = "Desktop/data.csv"
f = open("Desktop/mpe.txt", "r")
lines = f.readlines().splitlines('*************************************************')
print(lines)
for line in lines:
    if 'SEND_HTTP' in line:
        date = line[:10]
        if 'FAILURE' in line:
            status = 'Failure'
        else:
            status = 'Success'
    if 'HTTPMessageResponse' in line:
        response = line
        with open(output, "a") as fp:
            wr = csv.writer(fp, dialect='excel')
            wr.writerow([date, status, response])
The file:
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
*************************************************
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
*************************************************
You can first separate the entries with the str.split method:
f = open("Desktop/mpe.txt", "r")
sections = f.read().split("*************************************************\n")

for section in sections:
    for line in section.split("\n"):
        # your code here
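If it helps, here is a minimal sketch of how your per-line checks could slot into that section loop, writing one row per section. The field names and markers (date, status, response, SEND_HTTP, FAILURE, HTTPMessageResponse) are taken from your attempt, and the 49-asterisk separator is an assumption about your file, so treat this as a sketch rather than a drop-in solution:

import csv

SEPARATOR = "*" * 49  # assumption: the separator line is a run of 49 asterisks

with open("Desktop/mpe.txt") as f:
    sections = f.read().split(SEPARATOR)

with open("Desktop/data.csv", "a", newline="") as fp:
    wr = csv.writer(fp, dialect="excel")
    for section in sections:
        date = status = response = None
        for line in section.splitlines():
            if "SEND_HTTP" in line:
                date = line[:10]
                status = "Failure" if "FAILURE" in line else "Success"
            if "HTTPMessageResponse" in line:
                response = line.strip()
        if date is not None:  # write one row per section that matched
            wr.writerow([date, status, response])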
This will loop through your example file, splitting out each 'section' delimited by the run of asterisk (*) characters (the code below splits on 49 of them):
fileHandle = open(r"Desktop/mpe.txt", "r")
splitItems = fileHandle.read().split("*" * 49)

for index, item in enumerate(splitItems):
    if item == "":
        continue
    print("[{}] {}".format(index, item))
You can remove the print statement and do what you need with the results. However, this form of parsing is brittle: if the file's separator isn't exactly that run of asterisks, the split will break.
The if check skips any entries that are empty, which you will get if your example is accurate to the real data.
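If the exact number of asterisks might vary, one option (not from the answers above, just a sketch) is to split on any long run of asterisks with a regular expression:

import re

with open("Desktop/mpe.txt") as f:
    # split on any run of 10 or more asterisks, however long the separator line is
    sections = re.split(r"\*{10,}\n?", f.read())

for section in sections:
    if not section.strip():
        continue
    # process the section here
    print(section)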
I would suggest creating a function get_sections which returns a generator yielding one section at a time. This way you don't have to load the whole file into memory.
def get_sections():
    with open("Desktop/mpe.txt") as f:
        section = []
        for line in f:
            if "***********" not in line:
                section.append(line)
            else:
                yield section
                section = []
        if section:  # yield the last section if the file doesn't end with a separator
            yield section

for section in get_sections():
    print("new section")
    for line in section:
        print(line)
        ## do your processing here
Want to replace "|" with ";" and remove empty lines in a txt file, then save it as csv.
My code so far:
The replacement works, but it does not remove empty lines.
And it saves the same line twice in the csv.
f1 = open("txtfile.txt", 'r+')
f2 = open("csvfile.csv", 'w')
for line in f1:
    f2.write(line.replace('|', ';'))
    if line.strip():
        f2.write(line)
        print(line)
f1.close()
f2.close()
In your code, f2.write(line.replace('|', ';')) converts line by replacing the | with ; and writes it to the csv file without checking emptiness. So you are getting empty lines in the csv file. Then, inside the if condition, f2.write(line) writes the original line once more. That is why you are getting the same line (well, almost) twice.
Instead of writing the modified line to the file straight away, save it back into line first:
for line in f1:
    line = line.replace('|', ';')
    if line.strip():
        f2.write(line)
Here we first modify the line to change | to ; and overwrite line with the modified content. Then we check for emptiness and write to the csv file. This way each line is written once and empty lines are skipped.
Or, more compactly:
for line in f1:
    if line.strip():                      # Check emptiness first
        f2.write(line.replace('|', ';'))  # then directly write the modified line
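As a side note, if the goal is a proper CSV file rather than a plain text copy with ; separators, the csv module can handle quoting for you. This is just a sketch, assuming the input columns are separated by |:

import csv

with open("txtfile.txt") as f1, open("csvfile.csv", "w", newline="") as f2:
    writer = csv.writer(f2, delimiter=";")
    for line in f1:
        if line.strip():  # skip empty lines
            writer.writerow(line.rstrip("\n").split("|"))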
I have a text file in the following format:
this is some text __label__a
this is another line __label__a __label__b
this is third line __label__x
this is fourth line __label__a __label__x __label__z
and another list of labels
list_labels = ['__label__x','__label__y','__label__z']
Each line could contain multiple labels from the list.
What is the best way to replace labels from the list in each line with "__label__no"?
example:
this is third line __label__no
this is fourth line __label__a __label__no
There are a lot more lines in the text file (and many more labels), and I was wondering what the fastest way to achieve this would be.
This probably isn't the "fastest way" to do it, but depending on the length of your text file, this may work:
list_labels = ['__label__x', '__label__y', '__label__z']

with open('text.txt', 'r') as f:
    fcontents = f.readlines()

fcontents = [l.strip() for l in fcontents]

def remove_duplicates(l):
    # keep the first occurrence of each word, preserving order
    temp = []
    for x in l:
        if x not in temp:
            temp.append(x)
    return temp

for line in fcontents:
    for ll in list_labels:
        if ll in line:
            l = line.replace(ll, '__label__no')
            line = ' '.join(remove_duplicates(l.split()))
    print(line)
output:
this is some text __label__a
this is another line __label__a __label__b
this is third line __label__no
this is fourth line __label__a __label__no
The duplicate-removal idea is borrowed from this question: How can I remove duplicate words in a string with Python?
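For what it's worth, here is a rough sketch of a token-based alternative that may scale better when there are many labels, since membership tests against a set are O(1). The file name and labels are the ones from the question; collapsing only consecutive repeated __label__no tokens is an assumption based on the expected output:

label_set = {'__label__x', '__label__y', '__label__z'}

with open('text.txt') as f:
    for line in f:
        # replace every matching label token with __label__no
        tokens = ['__label__no' if t in label_set else t for t in line.split()]
        # collapse consecutive duplicates so several matched labels become one __label__no
        deduped = [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]
        print(' '.join(deduped))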
Long-time listener, first-time caller; I'm quite new to this so please be kind.
I have a large text document and I would like to strip out the headers and footers. I would like to trigger the start and stop of reading lines with specific strings in the text.
filename = 'Bigtextdoc.txt'
startlookup = 'Foo'
endlookup = 'Bar'

with open(filename, 'r') as infile:
    for startnum, line in enumerate(infile, 1):
        if startlookup in line:
            data = infile.readlines()
    for endnum, line in enumerate(infile, 1):
        if endlookup in line:
            break

print(data)
Like this I can read the lines after the header containing 'Foo', and if I move the data = line after the if endlookup line, it will only read the lines in the footer starting at 'Bar'.
What I don't know is how to start at Foo and stop at Bar.
For readability I'll extract the logic into a function like:
def lookup_between_tags(lines, starttag, endtag):
    should_yield = False
    for line in lines:
        if starttag in line:
            should_yield = True
        elif endtag in line:
            should_yield = False
        if should_yield:
            yield line
Using the fact that an opened file is iterable, it can be used like:
with open('Bigtextdoc.txt') as bigtextdoc:
    for line in lookup_between_tags(bigtextdoc, 'Foo', 'Bar'):
        print(line)
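Note that, as written, the generator also yields the line containing the start tag itself (should_yield is already True when that line reaches the yield check), while the end-tag line is excluded. If you want to drop the 'Foo' line as well, a small tweak (just a sketch) is to continue after flipping the flag:

def lookup_between_tags(lines, starttag, endtag):
    should_yield = False
    for line in lines:
        if starttag in line:
            should_yield = True
            continue  # skip the line containing the start tag itself
        elif endtag in line:
            should_yield = False
        if should_yield:
            yield line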
So basically I have a list in a file and I only want to print the lines containing an A.
Here is a small part of the list:
E5341,21/09/2015,C102,440,E,0
E5342,21/09/2015,C103,290,A,290
E5343,21/09/2015,C104,730,N,0
E5344,22/09/2015,C105,180,A,180
E5345,22/09/2015,C106,815,A,400
So I only want to print the lines containing A.
Sorry, I'm still new at Python.
I gave it a try using one "print" to print the whole line but ended up failing; guess I will always suck at Python.
You just have to:
open file
read lines
for each line, split at ","
for each line, if the 5th part of the split string is equal to "A", print the line
Code:
filepath = 'file.txt'

with open(filepath, 'r') as f:
    lines = f.readlines()
    for line in lines:
        if line.split(',')[4] == "A":
            print(line)
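One small follow-up: readlines() keeps the trailing newline on each line, so print will add a second newline and you may see blank lines between results, and a short or malformed row would raise an IndexError on [4]. A guarded variant (just a sketch) could look like:

with open('file.txt') as f:
    for line in f:
        fields = line.rstrip('\n').split(',')
        if len(fields) > 4 and fields[4] == "A":  # guard against short rows
            print(line.rstrip('\n'))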
I need to read in a file, strip the lines of the file, split the values on each line, and finally write out to a new file. Essentially, when I split the lines all the values will be strings, and once they have been split each line will be its own list. The code I have written is still just copying the text and pasting it into the new file without stripping or splitting the values!
with open(data_file) as data:
    next(data)
    for line in data:
        line.rstrip
        line.split
        output.write(line)
logging.info("Successfully added lines")
with open(data_file) as data:
    next(data)  # Are you sure you want this? It essentially throws away the first
                # line of the data file
    for line in data:
        line = line.strip()
        parts = line.split()
        output.write(" ".join(parts) + "\n")  # write() needs a string, not a list
logging.info("Successfully added lines")
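If the end goal is a delimited output file (for example comma-separated), a sketch using the csv module might be clearer. The data_file and output paths here are hypothetical, and whitespace-delimited input is an assumption carried over from the snippets above:

import csv
import logging

data_file = "input.txt"  # hypothetical input path

with open(data_file) as data, open("output.csv", "w", newline="") as out:
    next(data)  # skip the header line, if that is really intended
    writer = csv.writer(out)
    for line in data:
        values = line.strip().split()  # strip the newline, then split on whitespace
        writer.writerow(values)
logging.info("Successfully added lines")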