Adding a second phrase to search for + the 10 following lines - search

with open('file.txt', 'r') as searchfile:
    for line in searchfile:
        if 'searchphrase' in line:
            print(line)
Hi guys,
I have 1200 files to search through. For each of these files I need to:
Copy the first line of the file into one new text document, followed by the lines described below.
After pulling the first line, search the rest of the file for my "search phrase", then copy the line containing that phrase along with the 10 following lines. Close the file and move on to the next.
All the files are located inside one master folder, with uniform names.
E.g.:
file 1
file 2
file 3
file 4
file 5 and so on...
I've been trying for days but cannot seem to get it. This could save me 14 days of work.
Any help at all would be really appreciated.

Found a solution!
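# Raw strings keep "\f" in the Windows path from being read as a form-feed escape.
with open(r"C:\file.txt") as infile, open(r"C:\outfile.txt", "w") as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start-Phrase":
            copy = True
        elif line.strip() == "End-Phrase":
            copy = False
        elif copy:
            # Only lines strictly between the two marker lines are written.
            outfile.write(line)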
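The solution above copies lines between two markers, which is not quite the task described in the question. A minimal sketch of the original task (first line of each file, then every line containing the search phrase plus the 10 lines after it) might look like the following. The function names `extract_from_file` and `extract_from_folder` are hypothetical, and the sketch assumes that all input files sit directly in one folder and that a repeated phrase inside the 10 following lines does not need to start a new block.

```python
from itertools import islice
from pathlib import Path


def extract_from_file(path, phrase, following=10):
    """Return the first line of `path`, then each line containing `phrase`
    together with up to `following` lines after it."""
    collected = []
    with open(path, "r") as infile:
        first_line = next(infile, None)
        if first_line is None:  # empty file: nothing to copy
            return collected
        collected.append(first_line)
        for line in infile:
            if phrase in line:
                collected.append(line)
                # islice pulls the next lines from the same file iterator,
                # so the outer loop resumes after them.
                collected.extend(islice(infile, following))
    return collected


def extract_from_folder(folder, phrase, out_path, following=10):
    """Write the extract of every file in `folder` to `out_path`."""
    out_path = Path(out_path)
    with open(out_path, "w") as outfile:
        for path in sorted(Path(folder).iterdir()):
            # Skip sub-folders and the output file itself.
            if path.is_file() and path.resolve() != out_path.resolve():
                outfile.writelines(extract_from_file(path, phrase, following))
```

Something like `extract_from_folder("master_folder", "searchphrase", "combined.txt")` would then process all files in one pass; both path names here are placeholders.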

Related

Scan through large text file using contents of another text file

Hello, I am very new to coding. I am writing a small Python script but I am stuck. The goal is to compare the contents of log.txt to the contents of LargeFile.txt and store every line of log.txt that does not match any line of LargeFile.txt in outfile.txt. With the code below, I only get the first line of log.txt repeated over and over in outfile.txt.
logfile = open('log1.txt', 'r')  # This file is 8 KB
keywordlist = open('LargeFile.txt', 'r')  # This file is 1.4 GB
outfile = open('outfile.txt', 'w')
loglines = [n for n in logfile]
keywords = [n for n in keywordlist]
for line in loglines:
    for word in keywords:
        if line not in word:
            outfile.write(line)
outfile.close()
So conceptually you're trying to check whether any line of your 1+ GB file occurs in your 8 KB file.
This means one of the files needs to be loaded into RAM, and the smaller file is the natural choice. The other file can be read sequentially and does not need to be loaded in full.
We need:
- a list of lines from the smaller file
- an index of those lines for quick look-ups (we'll use a dict for this)
- a loop that runs through the large file and checks each line against the index, making note of every matching line it finds
- a loop that outputs the original lines and uses the index to determine whether they are unique or not.
The sample below prints the complete output to the console. Write it to a file as needed.
with open('log1.txt', 'r') as f:
    log_lines = list(f)
# Map each log line to the line numbers where it occurs in the large file.
index = {line: [] for line in log_lines}
with open('LargeFile.txt', 'r') as f:
    for line_num, line in enumerate(f, 1):
        if line in index:
            index[line].append(line_num)
for line in log_lines:
    # rstrip() drops the trailing newline so each result stays on one line.
    if len(index[line]) == 0:
        print(f'{line.rstrip()} -> unique')
    else:
        print(f'{line.rstrip()} -> found {len(index[line])}x')
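If only the unmatched lines are needed in an output file, as the question asks, a set can stand in for the dict. The following is a sketch rather than the answer's own method: the function name `write_unmatched` is hypothetical, and it assumes lines are compared exactly, including the trailing newline, so a final line without a newline may fail to match.

```python
def write_unmatched(log_path, large_path, out_path):
    """Write every line of log_path that never occurs in large_path."""
    with open(log_path, "r") as f:
        log_lines = list(f)
    remaining = set(log_lines)  # log lines not yet seen in the large file
    with open(large_path, "r") as f:
        for line in f:  # the large file is read one line at a time
            remaining.discard(line)
            if not remaining:  # every log line matched; stop reading early
                break
    with open(out_path, "w") as out:
        # Preserve the original order of the log file.
        out.writelines(line for line in log_lines if line in remaining)
```

Only the small file is held in memory, and the early `break` can save a full pass over the large file when all log lines turn up quickly.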

How can I delete every second line in a very big text file?

I have a very big text file and I want to delete every second line. How can I do it in an effective way?
I have written a code like this:
_file = open("merged_DGM.txt", "r")
text = _file.readlines()
for i, j in enumerate(text):
    if i % 2 == 0:
        del text[i]
_file.close()
_file = open("half_DGM.txt", "w")
for i in text:
    _file.write(i)
_file.close()
It works for small text files, but for big files it loads the whole text into the variable. After 10 minutes it still had not finished.
Any suggestions would be appreciated.
The file object returned by open inherits from io.IOBase and can be iterated. By iterating directly over the file you avoid loading the whole file into memory at once.
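# Multiple context managers are separated by a comma, not "and".
with open("merged_DGM.txt", "r") as in_file, open("half_DGM.txt", "w") as out_file:
    for index, line in enumerate(in_file):
        if index % 2:  # odd index: keep the 2nd, 4th, 6th, ... line
            out_file.write(line)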
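The same streaming idea can also be written with `itertools.islice`, which selects every second line lazily. This is an alternative sketch, not part of the answer above; the function name `keep_every_second_line` is hypothetical.

```python
from itertools import islice


def keep_every_second_line(in_path, out_path):
    """Copy the 2nd, 4th, 6th, ... line of in_path to out_path."""
    with open(in_path, "r") as in_file, open(out_path, "w") as out_file:
        # start=1, step=2 skips the 1st, 3rd, 5th, ... lines without
        # ever holding more than one line in memory.
        out_file.writelines(islice(in_file, 1, None, 2))
```

Changing the start argument from 1 to 0 would keep the odd-numbered lines instead.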

Replacing a float number in txt file

First, I would like to say that I am a newbie in Python.
I will try to explain my problem as best I can.
The main aim of the code is to read, modify and copy a txt file.
To do that, I would like to split the problem into three steps:
1 - Copy the first N lines into a new txt file (CopyFile), exactly as they are in the original file (OrigFile).
2 - Access a specific line where I want to replace a float number with another one, and append this line to CopyFile.
3 - Copy the rest of OrigFile, from the line in point 2 to the end of the file.
So far I have been able to do step 1 with the following code:
with open("OrigFile.txt") as myfile:
    head = [next(myfile) for x in range(10)]  # read first 10 lines of txt file
copy = open("CopyFile.txt", "w")  # create a txt file named CopyFile.txt
copy.write("".join(head))  # convert list into str
copy.close()  # close txt file (the parentheses are needed to actually call close)
For the second step, my idea is to access the line I am interested in directly and find the float number I would like to change. Code:
import linecache
import re
line11 = linecache.getline("OrigFile.txt", 11)  # read line 11 directly
FltNmb = re.findall(r"\d+\.\d+", line11)  # regular expression to identify float numbers
My problem comes when I need to replace FltNmb with a new number, because the replacement has to happen inside line11. How could I achieve that?
Open both files and write each line sequentially while incrementing a line counter.
On line 11, replace the float number; all other lines are written without modification:
import re

with open("CopyFile.txt", "w") as newfile:
    with open("OrigFile.txt") as myfile:
        linecounter = 1
        for line in myfile:
            if linecounter == 11:
                # "^" anchors the match, so only a float at the very start of the line is replaced
                newline = re.sub(r"^(\d+\.\d+)", "<new number>", line)
                linecounter += 1
                newfile.write(newline)
            else:
                newfile.write(line)
                linecounter += 1
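If the float is not at the start of the line, or the line holds several floats, a small helper can replace one specific occurrence. This is a hedged sketch: the name `replace_float` and the `occurrence` parameter are hypothetical, and it reuses the question's simple pattern, which does not cover signs or exponents.

```python
import re

# Same simple pattern as in the question: digits, a dot, digits.
FLOAT_RE = re.compile(r"\d+\.\d+")


def replace_float(line, new_value, occurrence=0):
    """Replace the `occurrence`-th float (0-based) in `line` with `new_value`."""
    matches = list(FLOAT_RE.finditer(line))
    if occurrence >= len(matches):
        return line  # no such float on this line: leave it unchanged
    match = matches[occurrence]
    return line[:match.start()] + str(new_value) + line[match.end():]
```

Inside a line-by-line copy loop, the target line could then be written with something like `newfile.write(replace_float(line, 3.14))`, where 3.14 stands in for whatever new number is wanted.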

How do I compare two files and see if content is identical?

I am currently writing code that asks the user whether they want to copy the contents of one file into another file or compare the contents of two files to see if they are identical. The copying part works, but the comparing part does not. I get an error saying:
line2=output_file.readlines()
io.UnsupportedOperation: not readable
This is my current code at the moment:
userinput=int(input('Press 1 to copy files, 2 to compare files, anything else to stop')) #prompt user for comparing or copying
while userinput==1 or userinput==2:
    if userinput==1:
        with open(input('Enter file you want copied:')) as input_file:
            with open(input('Enter file you want contents copied to:'), 'w') as output_file:
                for line in input_file: #contents of first file copied to second file
                    output_file.write(line)
        userinput=int(input('Press 1 to copy files, 2 to compare files, anything else to stop'))
    elif userinput==2:
        with open(input('Enter file you want to check')) as input_file:
            with open(input('Enter second file you want to check:'), 'w') as output_file:
                line1=input_file.readlines() #reads each line of the text
                line2=output_file.readlines()
                if line1==line2: #checks if text is identical to each other
                    print('files are identical')
                    userinput=int(input('Press 1 to copy files, 2 to compare files, anything else to stop'))
                elif line1 != line2:
                    print('This is where the file deviates')
                    print(line1)
                    print(line2)
                    userinput=int(input('Press 1 to copy files, 2 to compare files, anything else to stop'))
What can I do to fix this?
You are trying to read the file, but you opened it as writable with the argument 'w':
with open(input('Enter second file you want to check:'), 'w') as output_file:
Either remove the argument or replace 'w' with 'r':
with open(input('Enter second file you want to check:')) as output_file:
OR
with open(input('Enter second file you want to check:'), 'r') as output_file:
You opened the output file output_file for writing ('w'). You cannot read from it. Change 'w' to 'r'.
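Once both files are opened for reading, the comparison itself can be done line by line instead of printing both complete line lists. The following is a minimal sketch with a hypothetical helper name, `first_difference`; it assumes both files are text files that can be read with the default encoding.

```python
from itertools import zip_longest


def first_difference(path_a, path_b):
    """Return (line_number, line_a, line_b) for the first differing line,
    or None if the files are identical. A missing line shows up as None."""
    with open(path_a, "r") as file_a, open(path_b, "r") as file_b:
        # zip_longest keeps going when one file is longer than the other.
        pairs = zip_longest(file_a, file_b)
        for number, (line_a, line_b) in enumerate(pairs, 1):
            if line_a != line_b:
                return number, line_a, line_b
    return None
```

A result of `None` means the files are identical; otherwise the returned tuple points at the exact line where they deviate.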

How do I copy part of a file to a new file?

In Python I'd like to open a text file as file, and copy only part of the file to a new file. For example, I want to copy only part of the file, say between the line EXAMPLE\n and line END\n. So I want to delete everything before line EXAMPLE\n and everything after line END\n. How can I do that?
I can read the file using the following code, but how do I delete the
with open(r'filepath\myfile.txt', 'r') as f:
    file = f.readlines()
<delete unwanted lines in file>
with open(r'filepath\newfile.txt', 'r') as f:
    f.writelines(file)
Create a new array and only add the lines you want to that array:
new_lines = []
found_example = False
found_end = False
for line in file:
    if line == "EXAMPLE\n": found_example = True
    if line == "END\n": found_end = True
    if found_example != found_end: new_lines.append(line)
file = new_lines
Now just write file to your file and you are done. Note that in your example you didn't open the file in write mode, so it would look more like this:
with open(r'filepath\newfile.txt', 'w+') as f:
    f.writelines(file)
Read each line and notice whether it contains EXAMPLE or END. In the former case, set a flag to start outputting lines; in the latter, set the same flag to stop.
process = False
with open('myfile.txt') as f, open('newfile.txt', 'w') as g:
    for line in f:
        if line == 'EXAMPLE\n':
            process = True
        elif line == 'END\n':
            process = False
        else:
            pass
        if process:
            line = line.strip()
            print(line, file=g)
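Both answers above also copy the EXAMPLE line itself. If only the lines strictly between the two markers are wanted, one possible sketch uses `itertools.dropwhile` and `itertools.takewhile`. The function name `copy_between` is hypothetical, and dropping both marker lines is an assumption about the desired output.

```python
from itertools import dropwhile, takewhile


def copy_between(in_path, out_path, start="EXAMPLE\n", end="END\n"):
    """Copy the lines strictly between the first `start` line and the
    next `end` line from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        # Skip everything before the start marker.
        after_start = dropwhile(lambda line: line != start, src)
        next(after_start, None)  # discard the start marker itself
        # Write lines until the end marker (or the end of the file).
        dst.writelines(takewhile(lambda line: line != end, after_start))
```

If the start marker never appears, the output file is simply left empty.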

Resources