How can I delete every second line in a very big text file? - python-3.x

I have a very big text file and I want to delete every second line. How can I do this efficiently?
I have written code like this:
_file = open("merged_DGM.txt", "r")
text = _file.readlines()
for i, j in enumerate(text):
    if i % 2 == 0:
        del text[i]
_file.close()

_file = open("half_DGM.txt", "w")
for i in text:
    _file.write(i)
_file.close()
It works for small text files, but for big files it loads the whole text into a variable and, even after 10 minutes, it had not finished.
Any suggestions would be appreciated.

The file object returned by open inherits from io.IOBase and can be iterated. By iterating directly over the file you avoid loading the whole file into memory at once.
with open("merged_DGM.txt", "r") as in_file and open("half_DGM.txt", "w") as out_file:
for index, line in enumerate(in_file):
if index % 2:
out_file.write(line)
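If you'd rather not test the index yourself, the same streaming idea can be written with itertools.islice; a minimal sketch under the same file names, assuming you want to keep the 2nd, 4th, 6th, ... lines as above:

from itertools import islice

# Stream the file and keep only every second line (the 2nd, 4th, 6th, ...),
# never holding more than one line in memory at a time.
with open("merged_DGM.txt", "r") as in_file, open("half_DGM.txt", "w") as out_file:
    out_file.writelines(islice(in_file, 1, None, 2))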

Related

Scan through large text file using contents of another text file

Hello, I am very new to coding and I am writing a small Python script, but I am stuck. The goal is to compare the contents of log.txt to the contents of LargeFile.txt, and to store every line of log.txt that does not match any line of LargeFile.txt in outfile.txt. With the code below I only get the first line of log.txt repeating itself in outfile.txt:
logfile = open('log1.txt', 'r')          # This file is 8 KB
keywordlist = open('LargeFile.txt', 'r') # This file is 1.4 GB
outfile = open('outfile.txt', 'w')

loglines = [n for n in logfile]
keywords = [n for n in keywordlist]

for line in loglines:
    for word in keywords:
        if line not in word:
            outfile.write(line)
outfile.close()
So conceptually you're trying to check whether any line of your 1+ GB file occurs in your 8 KB file.
This means one of the files needs to be loaded into RAM, and the smaller file is the natural choice. The other file can be read sequentially and does not need to be loaded in full.
We need:
- a list of lines from the smaller file,
- an index of those lines for quick look-ups (we'll use a dict for this),
- a loop that runs through the large file and checks each line against the index, making note of every matching line it finds,
- a loop that outputs the original lines and uses the index to determine whether they are unique or not.
The sample below prints the complete output to the console. Write it to a file as needed.
with open('log1.txt', 'r') as f:
    log_lines = list(f)

index = {line: [] for line in log_lines}

with open('LargeFile.txt', 'r') as f:
    for line_num, line in enumerate(f, 1):
        if line in index:
            index[line].append(line_num)

for line in log_lines:
    if len(index[line]) == 0:
        print(f'{line} -> unique')
    else:
        print(f'{line} -> found {len(index[line])}x')
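The question asks for the non-matching lines in outfile.txt rather than console output; reusing log_lines and index from the code above, a minimal sketch of that final write could look like this:

# Write only the log lines that were never found in LargeFile.txt.
with open('outfile.txt', 'w') as out:
    for line in log_lines:
        if not index[line]:          # empty list -> no match recorded
            out.write(line)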

Updating values in an external file only works if I restart the shell window

Hi there and thank you in advance for your response! I'm very new to Python, so please keep that in mind as you read through this, thanks!
I've been working on some code for a very basic game in Python (just for practice). I've written a function that opens another file, selects a variable from it and adjusts that variable by an amount, or, if it's a string, changes it into another string. The function looks like this:
def ovr(file, target, change):
    with open(file, "r+") as open_file:
        opened = open_file.readlines()
        open_file.close()
    with open(file, "w+") as open_file:
        position = []
        for appended_list, element in enumerate(opened):
            if target in element:
                position.append(appended_list)
        if type(change) == int:
            opened[position[0]] = (str(target)) + (" = ") + (str(change)) + (str("\n"))
            open_file.writelines(opened)
            open_file.close()
        else:
            opened[position[0]] = (str(target)) + (" = ") + ("'") + (str(change)) + ("'") + (str("\n"))
            open_file.writelines(opened)
            open_file.close()
for loop in range(5):
    ovr(file = "test.py", target = "gold", change = gold + 1)
At the end I have a basic loop that should rewrite my file 5 times, each time increasing the amount of gold by 1. If I write this ovr() function outside of the loop and just run the program over and over, it works just fine, increasing the number in the external file by 1 each time.
Edit: I should mention that, as it stands, if I run this loop the value of gold increases by 1. If I close the shell and rerun the loop, it increases by 1 again, becoming 2. If I change the loop to run any number of times, it only ever increases the value of gold by 1.
Edit 2: I found a truly horrific way of fixing this issue; if anyone has a better way, for the love of god please let me know, code below.
for loop in range(3):
    ovr(file = "test.py", target = "gold", change = test.gold + 1)
    reload(test)
    sleep(1)
    print(test.gold)
The sleep part is there because rewriting the file takes longer than running the full loop.
You can go for a workaround and write your new information into a file called file1.
That way you can keep your working loop as it is, outside the file you are writing to. After running your loop, you can change the content of your file with the following steps.
This way you don't need to rewrite your loop and can still change your file's content.
First step:
with open('file.txt', 'r') as input_file, open('file1.txt', 'w') as output_file:
    for line in input_file:
        output_file.write(line)
Second step:
with open('file1.txt', 'r') as input_file, open('file.txt', 'w') as output_file:
    for line in input_file:
        if line.strip() == '(text' + (string of old value of variable) + 'text)':
            output_file.write('text' + (string of new value of variable) + '\n')
        else:
            output_file.write(line)
Then you have updated your text file.
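For concreteness, here is a minimal sketch of that second step with made-up names: it assumes the variable is stored as a line like gold = 5 and that the old and new values are already known; adapt the file names and the line format to your own.

old_value = 5    # hypothetical current value of the variable
new_value = 6    # hypothetical updated value

with open('file1.txt', 'r') as input_file, open('file.txt', 'w') as output_file:
    for line in input_file:
        if line.strip() == 'gold = ' + str(old_value):            # the line to replace
            output_file.write('gold = ' + str(new_value) + '\n')  # write the updated line
        else:
            output_file.write(line)                               # copy everything else unchanged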

Replacing a float number in a txt file

Firstly, I would like to say that I am a newbie in Python.
I will try to explain my problem as best I can.
The main aim of the code is to be able to read, modify and copy a txt file.
In order to do that I would like to split the problem up into three different steps.
1 - Copy the first N lines into a new txt file (CopyFile), exactly as they are in the original file (OrigFile).
2 - Access a specific line where I want to change a float number for another one. I want to append this line to CopyFile.
3 - Copy the rest of OrigFile, from the line in point 2 to the end of the file.
At the moment I have been able to do step 1 with the following code:
with open("OrigFile.txt") as myfile:
head = [next(myfile) for x iin range(10)] #read first 10 lines of txt file
copy = open("CopyFile.txt", "w") #create a txt file named CopyFile.txt
copy.write("".join(head)) #convert list into str
copy.close #close txt file
For the second step, my idea is to access the line I am interested in directly and recognize the float number I would like to change. Code:
line11 = linecache.getline("OrigFile.txt", 11) #opening and accessing directly to line 11
FltNmb = re.findall("\d+\.\d+", line11) #regular expressions to identify float numbers
My problem comes when I need to replace FltNmb with a new number, taking into consideration that I need to insert it back into line11. How could I achieve that?
Open both files and write each line sequentially while incrementing a line counter.
Add a condition for line 11 to replace the float number; the rest of the lines are written without modification:
with open("CopyFile.txt", "w") as newfile:
with open("OrigFile.txt") as myfile:
linecounter = 1
for line in myfile:
if linecounter == 11:
newline = re.sub("^(\d+\.\d+)", "<new number>", line)
linecounter += 1
outfile.write(newline)
else:
newfile.write(line)
linecounter += 1
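As a small usage check, assuming the replacement value is 3.14, the substitution on its own behaves like this (the raw-string pattern avoids escape warnings in newer Python versions):

import re

line = "1.23 pressure at inlet"                 # hypothetical example input line
newline = re.sub(r"^(\d+\.\d+)", "3.14", line)  # replace the leading float
print(newline)                                  # -> "3.14 pressure at inlet"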

Python 3.x outputting a text file with names of files that contain a list of words

I have approximately 160,000 text files in a directory. My first objective is to create a list of files that contain at least one item from a list of about 50 keywords. My current code is:
import os
ngwrds = [list of words]
for filename in os.listdir(os.getcwd()):
    with open(filename, 'r') as searchfile:
        for line in searchfile:
            if any(x in line for x in ngwrds):
                with open("keyword.txt", 'a') as out:
                    out.write(filename + '\n')
This works, but it writes duplicate filenames. Ideally I would like the loop to stop once it hits the first keyword, write the file name to 'keyword.txt', and move on to the next file in the directory. Any thoughts on how to do this?
A more in-depth answer to @strubbly's comment: you would simply add a break in the second for loop.
with open(filename, 'r') as searchfile:
    for line in searchfile:
        if any(x in line for x in ngwrds):
            with open("keyword.txt", 'a') as out:
                out.write(filename + '\n')
            break
What does the break do? From the Python 3 docs:
The break statement, like in C, breaks out of the smallest enclosing for or while loop.
For more information on break, see the control flow documentation: https://docs.python.org/3/tutorial/controlflow.html
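As an alternative design, you could also open keyword.txt once, outside the directory loop, so it is not reopened for every match; a sketch with placeholder keywords (it mirrors the original's assumption that everything in the directory is a readable text file):

import os

ngwrds = ["keyword1", "keyword2"]            # placeholder keywords; substitute your own list

with open("keyword.txt", "w") as out:        # open the output file once
    for filename in os.listdir(os.getcwd()):
        with open(filename, "r") as searchfile:
            for line in searchfile:
                if any(x in line for x in ngwrds):
                    out.write(filename + "\n")   # at most one entry per matching file
                    break                        # stop scanning this file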

python3 opening files and reading lines

Can you explain what is going on in this code? I don't seem to understand how you can open the file and read it line by line in a for loop instead of reading all of the lines at once. Thanks.
Let's say I have these sentences in a document file:
cat:dog:mice
cat1:dog1:mice1
cat2:dog2:mice2
cat3:dog3:mice3
Here is the code:
from sys import argv
filename = input("Please enter the name of a file: ")
f = open(filename, 'r')
d1ct = dict()
print("Number of times each animal visited each station:")
print("Animal Id Station 1 Station 2")
for line in f:
    if '\n' == line[-1]:
        line = line[:-1]
    (AnimalId, Timestamp, StationId,) = line.split(':')
    key = (AnimalId, StationId,)
    if key not in d1ct:
        d1ct[key] = 0
    d1ct[key] += 1
The magic is at:
for line in f:
    if '\n' == line[-1]:
        line = line[:-1]
Python file objects are special in that they can be iterated over in a for loop. On each iteration the loop retrieves the next line of the file. Because each line includes its last character, which may be a newline, it is often useful to check for it and remove it.
As Moshe wrote, open file objects can be iterated. Only, they are not of the file type in Python 3.x (as they were in Python 2.x). If the file object is opened in text mode, then the unit of iteration is one text line including the \n.
You can use line = line.rstrip() to remove the \n plus any trailing whitespace.
If you want to read the content of the file at once (into a multiline string), you can use content = f.read().
There is a minor bug in the code: the open file should always be closed, i.e. call f.close() after the for loop. Alternatively, you can wrap the open in the newer with construct, which closes the file for you -- I suggest getting used to the latter approach.
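Following that suggestion, the counting loop from the question could be rewritten with a with statement and rstrip(); a sketch reusing filename from the question:

# Same counting logic, but the file is closed automatically and the
# trailing newline (plus any trailing whitespace) is stripped from each line.
d1ct = dict()
with open(filename, 'r') as f:
    for line in f:
        AnimalId, Timestamp, StationId = line.rstrip().split(':')
        key = (AnimalId, StationId)
        d1ct[key] = d1ct.get(key, 0) + 1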
