How to replace the contents of two TXT file in python? - python-3.x

I have two .txt files and I want to swap their contents: file1 should end up with file2's text, and file2 with file1's.
Here is my code:
f1 = open("20-file-1.txt", "r+")
f2 = open("20-file-2.txt", "r+")
f1_all_lines_list = f1.readlines()
f2_all_lines_list = f2.readlines()
f1.truncate(0)
f2.truncate(0)
f1.write(''.join(f2_all_lines_list))
f2.write(''.join(f1_all_lines_list))
f1.close()
f2.close()
Everything mostly works, but each time I run the code some padding appears before the first line, and after several runs both txt files grow in size and my IDE gets stuck.
Here are the txt files before running the code:
Here are the txt files after the first run:
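The likely culprit is that truncate(0) does not move the file position: after readlines() the position sits at the end of the file, so the write() that follows starts at that old offset and the gap before it is padded, which is where the growing files and the leading garbage come from. A minimal sketch of a fix, wrapped in a small helper function so the file names are parameters, is to seek back to the start before truncating:

```python
def swap_files(path_a, path_b):
    """Swap the contents of two text files in place."""
    with open(path_a, "r+") as fa, open(path_b, "r+") as fb:
        text_a = fa.read()
        text_b = fb.read()
        for f in (fa, fb):
            f.seek(0)      # rewind: after read() the position is at EOF
            f.truncate(0)  # now the next write really starts at offset 0
        fa.write(text_b)
        fb.write(text_a)
```

With the question's files this would be called as `swap_files("20-file-1.txt", "20-file-2.txt")`.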

Related

Scan through large text file using contents of another text file

Hello, I am very new to coding and I am writing a small Python script, but I am stuck. The goal is to compare the contents of log.txt to the contents of LargeFile.txt, and to store in outfile.txt every line of log.txt that does not match any line of LargeFile.txt. With the code below I only get the first line of log.txt repeating itself in outfile.txt.
logfile = open('log1.txt', 'r')           # This file is 8 KB
keywordlist = open('LargeFile.txt', 'r')  # This file is 1.4 GB
outfile = open('outfile.txt', 'w')
loglines = [n for n in logfile]
keywords = [n for n in keywordlist]
for line in loglines:
    for word in keywords:
        if line not in word:
            outfile.write(line)
outfile.close()
So conceptually you're trying to check whether any line of your 1+ GB file occurs in your 8 KB file.
This means one of the files needs to be loaded into RAM, and the smaller file is the natural choice. The other file can be read sequentially and does not need to be loaded in full.
We need:
- a list of lines from the smaller file
- an index of those lines for quick look-ups (we'll use a dict for this)
- a loop that runs through the large file and checks each line against the index, making note of every matching line it finds
- a loop that outputs the original lines and uses the index to determine whether they are unique or not
The sample below prints the complete output to the console. Write it to a file as needed.
with open('log1.txt', 'r') as f:
    log_lines = list(f)

index = {line: [] for line in log_lines}

with open('LargeFile.txt', 'r') as f:
    for line_num, line in enumerate(f, 1):
        if line in index:
            index[line].append(line_num)

for line in log_lines:
    if len(index[line]) == 0:
        print(f'{line} -> unique')
    else:
        print(f'{line} -> found {len(index[line])}x')
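If the unique lines should go to a file rather than the console, the same idea can be packaged as a small function. This sketch uses the file names from the question, and a plain set instead of the dict index, which is enough when only uniqueness matters and the match count does not; note that the output order is not preserved:

```python
def unique_log_lines(log_path, large_path, out_path):
    # Load the small file into a set for O(1) membership tests.
    with open(log_path) as f:
        log_lines = set(f)
    seen = set()
    # Stream the large file line by line; never load it fully into RAM.
    with open(large_path) as f:
        for line in f:
            if line in log_lines:
                seen.add(line)
    # Write every log line that never appeared in the large file.
    with open(out_path, 'w') as out:
        for line in log_lines - seen:
            out.write(line)
```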

extract words from a text file and print each on the next line

sample input
in parsing a text file .txt = ["'blah.txt'", "'blah1.txt'", "'blah2.txt'" ]
the expected output in another text file out_path.txt
blah.txt
blah1.txt
blah2.txt
Here is the code I tried; it just prints "[]". I also tried a Perl one-liner to replace the double and single quotes.
import re

read_out_fh = open('out_path.txt', "r")
for line in read_out_fh:
    for word in line.split():
        curr_line = re.findall(r'"(\[^"]*)"', '\n')
        print(curr_line)
This happens because when you read a file, the content is taken as a string, not as a list, even if it keeps the formatting of a list; that is why re.findall gives you []. In `for line in read_in_fh:` you are iterating over the string, which is why you do not get the desired output. So I first wrote something to transform the string into a list, and while doing that I also eliminated the "" and '' as you mentioned, then wrote the result into a new file, example.txt.
Note: change the file name according to your files
read_out_fh = open('file.txt', "r")
with open("example.txt", "w") as output:
    for line in read_out_fh:
        # strip the brackets and newline, drop the quotes, split on ", "
        line = line.strip("[]\n").replace('"', '').replace("'", '').split(", ")
        for word in line:
            output.write(word + '\n')
example.txt(outputfile)
blah.txt
blah1.txt
blah2.txt
The code below works for the example you gave in the question:
# Content of textfile.txt:
asdasdasd=["'blah.txt'", "'blah1.txt'", "'blah2.txt'"]asdasdasd
# Code:
import re
read_in_fh = open('textfile.txt',"r")
write_out_fh = open('out_path.txt', "w")
for line in read_in_fh:
    find_list = re.findall(r'\[(".*?"*)\]', line)
    for element in find_list[0].split(","):
        element_formatted = element.replace('"', '').replace("'", "").strip()
        write_out_fh.write(element_formatted + "\n")
write_out_fh.close()
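Since the bracketed part of each line is a valid Python list literal, another option is to let ast.literal_eval parse it instead of stripping quotes by hand. A sketch (`extract_filenames` is a hypothetical helper name, not from the question):

```python
import ast
import re

def extract_filenames(line):
    """Pull the [...] list literal out of a line and return clean filenames."""
    match = re.search(r'\[.*?\]', line)
    if not match:
        return []
    # literal_eval safely parses the list of strings without executing code
    items = ast.literal_eval(match.group(0))
    # each item still carries inner quotes, e.g. "'blah.txt'" -> strip them
    return [item.strip("'\"") for item in items]
```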

How to create csv file for each line in a text file?

I have a text file price.txt that contains the following rows:
open
high
low
close
I need to create a separate csv file for each row in the text file and name the csv files price1.csv, price2.csv, and so on.
I tried the following code
with open('price.txt') as infile, open('outfile.csv', 'w') as outfile:
    for line in infile:
        outfile.write(line.replace(' ', ','))
I am getting only one csv file that has the following rows
open
high
low
close
How can I create a csv file for each row?
Here is the code to produce a different file (price1.csv, price2.csv, etc.) for every line, with the whitespace-to-comma substitution from your example, commented:
### start a counter for the filename number
i = 0
with open('price.txt') as infile:
    ### loop over the rows of the input file
    for line in infile:
        ### add 1 to the counter
        i += 1
        ### create the output filename for the row
        newfile_name = "price" + str(i) + ".csv"
        ### write the modified row to the new file
        with open(newfile_name, 'w') as outfile:
            outfile.write(line.replace(' ', ','))
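The manual counter can also be replaced with enumerate. A sketch, wrapped in a hypothetical helper function so the input path and filename prefix are parameters:

```python
def split_to_csv(path, prefix='price'):
    """Write each line of `path` to its own numbered .csv file."""
    with open(path) as infile:
        # enumerate(..., start=1) replaces the manual counter
        for i, line in enumerate(infile, start=1):
            with open(f'{prefix}{i}.csv', 'w') as outfile:
                outfile.write(line.replace(' ', ','))
```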

I want to create a corpus in python from multiple text files

I want to do text analytics on some text data. The issue is that so far I have worked with a CSV file or just one file, but here I have multiple text files. My approach is to combine them all into one file and then use nltk for text preprocessing and further steps.
I tried to download the gutenberg package from nltk, and I am not getting any error in the code, but I am not able to see the content of the 1st text file in the 1st cell, the 2nd text file in the 2nd cell, and so on. Kindly help.
import nltk

filenames = [
"246.txt",
"276.txt",
"286.txt",
"344.txt",
"372.txt",
"383.txt",
"388.txt",
"392.txt",
"556.txt",
"665.txt"
]
with open("result.csv", "w") as f:
    for filename in filenames:
        f.write(nltk.corpus.gutenberg.raw(filename))
Expected result: one csv file with the contents of these 10 text files listed in 10 different rows.
import nltk

filenames = [
"246.txt",
"276.txt",
"286.txt",
"344.txt",
"372.txt",
"383.txt",
"388.txt",
"392.txt",
"556.txt",
"665.txt"
]
with open("result.csv", "w") as f:
    for index, filename in enumerate(filenames):
        f.write(nltk.corpus.gutenberg.raw(filename))
        # Append a comma to the file content when
        # filename is not the last file in the list.
        if index != (len(filenames) - 1):
            f.write(",")
Output:
this,is,a,sentence,spread,over,multiple,files,and,the end
Code and .txt files available at https://github.com/michaelhochleitner/stackoverflow.com-questions-57081411 .
Using Python 2.7.15+ and nltk 3.4.4 . I had to move the .txt files to /home/mh/nltk_data/corpora/gutenberg .
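One caveat with joining the raw texts with bare commas: if any file's content itself contains a comma or a newline, the resulting CSV will not parse back into the intended fields. The csv module handles that quoting. Below is a sketch where `read_text` is a hypothetical parameter standing in for `nltk.corpus.gutenberg.raw`, so the function can run without the corpus; it writes one properly quoted row per input file:

```python
import csv

def files_to_csv(filenames, out_path, read_text):
    """Write one CSV row per input file.

    csv.writer quotes commas and newlines inside the file contents,
    so each file's text survives as a single field.
    """
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        for name in filenames:
            writer.writerow([read_text(name)])  # one file -> one row
```

With the original setup this would be called as `files_to_csv(filenames, "result.csv", nltk.corpus.gutenberg.raw)`.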

Replacing a float number in txt file

Firstly, I would like to say that I am a newbie in Python.
I will try to explain my problem as best as I can.
The main aim of the code is to be able to read, modify and copy a txt file.
In order to do that I would like to split the problem up in three different steps.
1 - Copy the first N lines into a new txt file (CopyFile), exactly as they are in the original file (OrigFile)
2 - Access to a specific line where I want to change a float number for other. I want to append this line to CopyFile.
3 - Copy the rest of the OrigFile from line in point 2 to the end of the file.
At the moment I have been able to do step 1 with next code:
with open("OrigFile.txt") as myfile:
    head = [next(myfile) for x in range(10)]  # read first 10 lines of txt file

copy = open("CopyFile.txt", "w")  # create a txt file named CopyFile.txt
copy.write("".join(head))         # convert list into str
copy.close()                      # close txt file
For the second step, my idea is to access directly to the txt line I am interested in and recognize the float number I would like to change. Code:
import linecache
import re

line11 = linecache.getline("OrigFile.txt", 11)  # access line 11 directly
FltNmb = re.findall(r"\d+\.\d+", line11)        # regular expression to identify float numbers
My problem comes when I need to change FltNmb for a new one, taking into consideration that I need to specify it inside the line11. How could I achieve that?
Open both files and write each line sequentially while incrementing a line counter.
On line 11, replace the float number; the rest of the lines are written without modification:
import re

with open("CopyFile.txt", "w") as newfile:
    with open("OrigFile.txt") as myfile:
        linecounter = 1
        for line in myfile:
            if linecounter == 11:
                newline = re.sub(r"^(\d+\.\d+)", "<new number>", line)
                newfile.write(newline)
            else:
                newfile.write(line)
            linecounter += 1
