Replace text and empty lines in txt file - python-3.x

I want to replace "|" with ";" and remove empty lines in a txt file, then save the result as csv.
My code so far is below.
The replacement works, but the empty lines are not removed,
and each line is saved twice in the csv.
f1 = open("txtfile.txt", 'r+')
f2 = open("csvfile.csv", 'w')
for line in f1:
    f2.write(line.replace('|', ';'))
    if line.strip():
        f2.write(line)
    print(line)
f1.close()
f2.close()

In your code, f2.write(line.replace('|', ';')) replaces each | with ; and writes the result to the csv file without checking whether the line is empty, so the empty lines end up in the csv file. Then, inside the if condition, f2.write(line) writes the original line once more. That is why you get (almost) the same line twice.
Instead of writing the modified line to the file straight away, assign it back to line, like this:
for line in f1:
    line = line.replace('|', ';')
    if line.strip():
        f2.write(line)
Here we first replace | with ; and overwrite line with the modified content. Then we check for emptiness and write to the csv file. So each line is written once and empty lines are skipped. Alternatively, check for emptiness first and write the modified line directly:
for line in f1:
    if line.strip():  # Check emptiness first
        f2.write(line.replace('|', ';'))  # then directly write modified line
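Putting the pieces together, a minimal sketch of the whole conversion could look like the following. The function name convert_pipe_file and its parameters are made up for illustration; the with statement is used so both files are closed even if an error occurs.

```python
def convert_pipe_file(src_path, dst_path):
    """Copy src_path to dst_path, replacing '|' with ';' and skipping blank lines.

    Hypothetical helper: the name and parameters are not from the question.
    """
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            # strip() returns '' for lines that hold only whitespace,
            # and an empty string is falsy, so those lines are skipped.
            if line.strip():
                dst.write(line.replace("|", ";"))
```

It would be called as convert_pipe_file("txtfile.txt", "csvfile.csv") with the file names from the question.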

Related

Scan through large text file using contents of another text file

Hello, I am very new to coding. I am writing a small Python script but I am stuck. The goal is to compare the contents of log.txt with the contents of LargeFile.txt and store every line of log.txt that does not match any line of LargeFile.txt in outfile.txt. With the code below, though, I only get the first line of log.txt repeated over and over in outfile.txt.
logfile = open('log1.txt', 'r')  # This file is 8KB
keywordlist = open('LargeFile.txt', 'r')  # This file is 1,4GB
outfile = open('outfile.txt', 'w')
loglines = [n for n in logfile]
keywords = [n for n in keywordlist]
for line in loglines:
    for word in keywords:
        if line not in word:
            outfile.write(line)
outfile.close()
So conceptually you're trying to check whether any line of your 1+ GB file occurs in your 8 KB file.
This means one of the files needs to be loaded into RAM, and the smaller file is the natural choice. The other file can be read sequentially and does not need to be loaded in full.
We need
a list of lines from the smaller file
an index of those lines for quick look-ups (we'll use a dict for this)
a loop that runs through the large file and checks each line against the index, making note of every matching line it finds
a loop that outputs the original lines and uses the index to determine whether they are unique or not.
The sample below prints the complete output to the console. Write it to a file as needed.
with open('log1.txt', 'r') as f:
    log_lines = list(f)
index = {line: [] for line in log_lines}
with open('LargeFile.txt', 'r') as f:
    for line_num, line in enumerate(f, 1):
        if line in index:
            index[line].append(line_num)
for line in log_lines:
    if len(index[line]) == 0:
        print(f'{line} -> unique')
    else:
        print(f'{line} -> found {len(index[line])}x')
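If only the unmatched lines are needed in a file, as the question asks, the same idea can be sketched with a set instead of a dict. This assumes that "matching" means a whole line of the log file is identical to a whole line of the large file (including the line ending); the function names are invented for this example.

```python
def unmatched_lines(log_path, large_path):
    """Return the lines of the small file that never occur in the large file."""
    with open(log_path, "r") as f:
        log_lines = list(f)          # small file: fits in memory
    remaining = set(log_lines)       # lines not seen in the large file yet
    with open(large_path, "r") as f:
        for line in f:               # large file: read one line at a time
            remaining.discard(line)
            if not remaining:        # everything matched, stop early
                break
    # Keep the original order of the log file.
    return [line for line in log_lines if line in remaining]


def write_unmatched(log_path, large_path, out_path):
    """Write the unmatched lines to out_path."""
    with open(out_path, "w") as out:
        out.writelines(unmatched_lines(log_path, large_path))
```

With the file names from the question this would be write_unmatched('log1.txt', 'LargeFile.txt', 'outfile.txt').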

How to process each 'block' of text separately

Hoping you can help.
I have a file something like the one below. There are lots of lines of text associated with each entry, and the entries are separated by ***********
I have written some code that loops through each line, checks some criteria and then writes the output to a csv. However, I don't know how to do that for a whole section rather than per line.
I kind of want a WHILE line <> ***** loop through the lines, but I need to do that for each section in the document.
Would anyone be able to help please?
My attempt:
Splitting the lines doesn't seem to work
import csv
from itertools import islice
output = "Desktop/data.csv"
f = open("Desktop/mpe.txt", "r")
lines = f.readlines().splitlines('*************************************************')
print(lines)
for line in lines:
    if 'SEND_HTTP' in line:
        date = line[:10]
        if 'FAILURE' in line:
            status = 'Failure'
        else:
            status = 'Success'
    if 'HTTPMessageResponse' in line:
        response = line
    with open(output, "a") as fp:
        wr = csv.writer(fp, dialect='excel')
        wr.writerow([date, status, response])
The file:
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
*************************************************
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
line of text
*************************************************
You can first separate the entries with the str.split method:
f = open("Desktop/mpe.txt", "r")
sections = f.read().split("*************************************************\n")
for section in sections:
    for line in section.split("\n"):
        pass  # your code here
This will loop through your example file, splitting each 'section' as denoted by 50 asterisk (*) characters
fileHandle = open(r"Desktop/mpe.txt", "r")
splitItems = fileHandle.read().split("*"*49)
for index, item in enumerate(splitItems):
    if(item == ""):
        continue
    print("[{}] {}".format(index, item))
You can remove the print statement and do what you need with the results. However, this form of parsing is not great as if the file doesn't have exactly 50 asterisks, this will break.
The if check skips any entries that are empty, which you will get if your example is accurate to the real data.
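One way to make the split less sensitive to the exact number of asterisks is a regular expression that treats any line starting with a long run of asterisks as a separator. This is only a sketch: split_sections and the min_stars threshold of 10 are assumptions, not something from the question.

```python
import re


def split_sections(text, min_stars=10):
    """Split text on lines made of at least min_stars asterisks.

    Hypothetical helper; min_stars=10 is an arbitrary threshold.
    """
    # ^ with re.MULTILINE anchors at the start of every line;
    # the optional newline is consumed so sections do not start with one.
    pattern = r"^\*{%d,}[ \t]*\n?" % min_stars
    parts = re.split(pattern, text, flags=re.MULTILINE)
    # Drop empty pieces, e.g. the one after a trailing separator.
    return [part for part in parts if part.strip()]
```

Each returned section is a single string; section.splitlines() gives its individual lines.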
I would suggest creating a function get_sections which returns a generator yielding one section at a time. This way you don't have to load the whole file into memory.
def get_sections():
    with open("Desktop/mpe.txt") as f:
        section = []
        for line in f:
            if "***********" not in line:
                section.append(line)
            else:
                yield section
                section = []
        if section:  # last section, in case the file does not end with a separator
            yield section
for section in get_sections():
    print("new section")
    for line in section:
        print(line)
        ## do your processing here
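To tie this back to the question, here is a rough sketch that produces one csv row per section instead of one per line. The criteria (SEND_HTTP, FAILURE, HTTPMessageResponse) are copied from the question's attempt; how they should combine across a section is a guess, and the names iter_sections, summarize and write_summary are invented for this example.

```python
import csv

SEPARATOR = "***********"  # assumed: any line containing this ends a section


def iter_sections(lines):
    """Yield each section as a list of lines without trailing newlines."""
    section = []
    for line in lines:
        if SEPARATOR in line:
            if section:
                yield section
            section = []
        else:
            section.append(line.rstrip("\n"))
    if section:  # file did not end with a separator
        yield section


def summarize(section):
    """Build one [date, status, response] row from all lines of a section."""
    date, status, response = "", "", ""
    for line in section:
        if "SEND_HTTP" in line:
            date = line[:10]
            status = "Failure" if "FAILURE" in line else "Success"
        if "HTTPMessageResponse" in line:
            response = line
    return [date, status, response]


def write_summary(in_path, out_path):
    """Write one csv row per section of in_path."""
    with open(in_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, dialect="excel")
        for section in iter_sections(src):
            writer.writerow(summarize(section))
```

Opening the output once, outside the loop, avoids reopening the csv file for every row as the original attempt does.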

How to print a file containing a list

So basically I have a list in a file and I only want to print the lines containing an A.
Here is a small part of the list:
E5341,21/09/2015,C102,440,E,0
E5342,21/09/2015,C103,290,A,290
E5343,21/09/2015,C104,730,N,0
E5344,22/09/2015,C105,180,A,180
E5345,22/09/2015,C106,815,A,400
So I only want to print the lines containing A.
Sorry, I'm still new at Python.
I gave it a try using one "print" to print the whole line but ended up failing. Guess I will always suck at Python.
You just have to:
open file
read lines
for each line, split at ","
for each line, if the 5th part of the split string is equal to "A", print the line
Code:
filepath = 'file.txt'
with open(filepath, 'r') as f:
    lines = f.readlines()
for line in lines:
    if line.split(',')[4] == "A":
        print(line)
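The same filter can also be sketched with the standard csv module, which does the splitting and tolerates short or blank lines. The helper name rows_with_status and its defaults are made up here; column 4 is the fifth field, as in the steps above.

```python
import csv


def rows_with_status(lines, status="A", column=4):
    """Yield the parsed rows whose field at index `column` equals `status`.

    Hypothetical helper; `lines` can be an open file or any iterable of strings.
    """
    for row in csv.reader(lines):
        # Skip rows that are too short (e.g. blank lines) instead of failing.
        if len(row) > column and row[column].strip() == status:
            yield row
```

For example, with open('file.txt', newline='') as f, each row from rows_with_status(f) can be printed with print(','.join(row)).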

How to read in a file and strip the lines, then split the values?

I need to read in a file, strip the lines of the file, split the values on each line and finally write them out to a new file. Essentially, when I split the lines, all the values will be strings, and once they have been split each line will be its own list. The code I have written is still just copying the text and pasting it into the new file without stripping or splitting the values!
with open(data_file) as data:
    next(data)
    for line in data:
        line.rstrip
        line.split
        output.write(line)
logging.info("Successfully added lines")
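with open(data_file) as data:
    next(data)  # Are you sure you want this? It essentially throws away the first line
                # of the data file
    for line in data:
        line = line.strip()    # call the method and keep the result
        fields = line.split()  # list of strings, one per value
        # write() needs a string, so join the fields again;
        # the ',' separator is an assumption about the desired output
        output.write(','.join(fields) + '\n')
logging.info("Successfully added lines")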
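The reason the original attempt copied the text unchanged is that line.rstrip and line.split without parentheses only look the methods up, and even when called, string methods return a new value rather than changing the string in place. A small sketch of this, with parse_line as a made-up helper name:

```python
def parse_line(line):
    """Return the whitespace-separated fields of one line as a list of strings."""
    # str methods return new objects; the original string is never modified.
    stripped = line.strip()
    return stripped.split()


raw = "  10 20 30  \n"
raw.strip      # attribute lookup only: nothing is called, raw is unchanged
raw.strip()    # called, but the result is thrown away, raw is still unchanged
fields = parse_line(raw)
print(fields)  # prints ['10', '20', '30']
```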

Using python to remove nearly identical lines in txt file, with the exception of first and last lines

Here is a snippet from a text file I am working on.
http://pastebin.com/4Uba5i4P
I would like to use python to detect those big repeating "~ Move" lines (Which are not identical except for the "~ Move" part.), and remove all but the first and last of those lines.
How would I start to go about this?
You could read the file line by line like this:
`## Open the file in read-only mode
f = open('myTextFile.txt')
## Read the first line
line = f.readline()
## If the file is not empty, keep reading one line at a time
## until the end of the file
while line:
    print(line)
    line = f.readline()
f.close()`
With this you could then edit this sample to test each line with a string search like this:
`if line.find("~Move") == -1:
    break
else:
    line = line[5:-1]`
Though this assumes that the ~Move is all at the beginning of the line. Hope this helps, if not leave a comment and I'll try and help.
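For the actual goal of keeping only the first and last line of each repeating block, one possible sketch groups consecutive lines by whether they contain the marker. It assumes the marker text is "~ Move" as written in the question and that a "block" means consecutive lines containing it; collapse_runs is a made-up name.

```python
from itertools import groupby

MARKER = "~ Move"  # assumed marker text, taken from the question's wording


def collapse_runs(lines, marker=MARKER):
    """Keep only the first and last line of every run of marker lines."""
    result = []
    # groupby bundles consecutive lines that share the same key (True/False).
    for has_marker, run in groupby(lines, key=lambda line: marker in line):
        run = list(run)
        if has_marker and len(run) > 2:
            result.extend([run[0], run[-1]])
        else:
            result.extend(run)
    return result
```

Reading the file with list(open(...)) or f.readlines(), passing the lines through collapse_runs and writing the result with writelines would then give the cleaned file.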
