Read multiple text files, search a few strings, replace, and write in Python - python-3.x

I have tens of text files in my local directory, named something like test1, test2, test3, and so on. I would like to read all these files, search for a few strings in each file, replace them with other strings, and finally save the results back into my directory as newtest1, newtest2, newtest3, and so on.
For instance, if there was a single file, I would have done following:
# Read the file
with open('H:\\Yugeen\\TestFiles\\test1.txt', 'r') as file:
    filedata = file.read()

# Replace the target string
filedata = filedata.replace('32-83 Days', '32-60 Days')

# Write the file out again
with open('H:\\Yugeen\\TestFiles\\newtest1.txt', 'w') as file:
    file.write(filedata)
Is there any way that I can achieve this in python?

If you use Python 3 you can use os.scandir from the os library.
Python 3 docs: os.scandir
With that you can get the directory entries.
with os.scandir('H:\\Yugeen\\TestFiles') as it:
Then loop over these entries; your code could look something like this.
Notice that the hard-coded path in your code is replaced by the entry object's path, and the result is written to a new file prefixed with "new" so the original files are kept, as the question asks.
import os

directory = 'H:\\Yugeen\\TestFiles'
# Get the directory entries
with os.scandir(directory) as it:
    # Iterate over directory entries
    for entry in it:
        # If the entry is not a file, continue to the next iteration.
        # This check is not needed if you are 100% sure the directory only contains files.
        if not entry.is_file():
            continue
        # Read the file
        with open(entry.path, 'r') as file:
            filedata = file.read()
        # Replace the target string
        filedata = filedata.replace('32-83 Days', '32-60 Days')
        # Write the result to a new file prefixed with "new" (e.g. test1.txt -> newtest1.txt)
        with open(os.path.join(directory, 'new' + entry.name), 'w') as file:
            file.write(filedata)
If you use Python 2 you can use os.listdir (this also works in Python 3).
Python 2 docs: os.listdir
In this case the code structure is the same, but you also need to build the full path to each file yourself, since listdir only returns the filename.
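For illustration, a sketch of the same search-and-replace loop using os.listdir; the directory path comes from the question, and os.path.join builds the full path that listdir does not return:
import os

directory = 'H:\\Yugeen\\TestFiles'
for name in os.listdir(directory):
    full_path = os.path.join(directory, name)
    # skip anything that is not a regular file
    if not os.path.isfile(full_path):
        continue
    with open(full_path, 'r') as file:
        filedata = file.read()
    filedata = filedata.replace('32-83 Days', '32-60 Days')
    # write the result to a new file prefixed with "new"
    with open(os.path.join(directory, 'new' + name), 'w') as file:
        file.write(filedata)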

Related

Merge multiple files into a single file; each new file should start on a new line in the output file

I've written a script to merge multiple files into a single file and create a list from that.
Requirement: file1 + file2 = file3, like below.
file1:
37717531209
201128307083
211669759863
496338947094
file2:
348353447295
278262427715
901601149752
333676465561
my output file (not the output I expect):
37717531209
201128307083
211669759863
496338947094348353447295
278262427715
901601149752
333676465561
my expected output file:
37717531209
201128307083
211669759863
496338947094
348353447295
278262427715
901601149752
333676465561
My code is:
with open(outputfile, 'wb') as outfile:
    for filename in glob.glob('*.accts'):
        if filename == outputfile:
            # don't want to copy the output into the output
            continue
        with open(filename, 'rb') as readfile:
            shutil.copyfileobj(readfile, outfile)
# accounts = list(outfile)

with open('accounts.txt') as f:
    acc = list(f)

accounts = []
for element in acc:
    accounts.append(element.strip())
I want each new file's content to start on a new line, not continue on the same line as the previous file's last line.
You don't appear to be doing anything that would require writing the contents of the input files into an output file. At the end, after creating the output file, you reopen it and process the account numbers it contains. You could simply do this, which requires no output file at all:
import glob

accounts = []
for filename in glob.glob('*.accts'):
    with open(filename) as readfile:
        accounts.extend(line.strip() for line in readfile)
# return accounts  (if this code lives inside a function)
Furthermore, you could probably do away with downloading the *.accts files
by using download_fileobj() to download the files into a file-like object (e.g. an io.BytesIO object) and processing them from there.
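For illustration only, a minimal sketch of that idea, assuming the files live in an S3 bucket and are fetched with boto3; the bucket name and key list below are hypothetical placeholders, not from the question:
import io

import boto3

s3 = boto3.client('s3')
accounts = []
# 'my-bucket' and the keys are placeholders for wherever the .accts files live
for key in ['batch1.accts', 'batch2.accts']:
    buffer = io.BytesIO()
    s3.download_fileobj('my-bucket', key, buffer)
    buffer.seek(0)
    # decode the bytes and collect the stripped account numbers
    accounts.extend(line.strip() for line in buffer.read().decode('utf-8').splitlines())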

What is the appropriate way to take in files that have a filename with a timestamp in it?

What is the appropriate way to take in files whose filenames contain timestamps and read them properly?
One approach I'm considering is to collect these filenames into a single text file and read them all at once.
For example, filenames such as
1573449076_1570501819_file1.txt
1573449076_1570501819_file2.txt
1573449076_1570501819_file3.txt
Go into a file named filenames.txt
Then something like
with open('/Documents/filenames.txt', 'r') as f:
    for item in f:
        if item.is_file():
            file_stat = os.stat(item)
            item = item.replace('\n', '')
            print("Fetching {}".format(convert_times(file_stat)))
My question is how to properly read the names in the text file, given that they have timestamps in the actual names. Once I figure that out, I can convert them.
If you just want to get the timestamps from the file names, assuming that they all use the same naming convention, you can do so like this:
import glob
import os
from datetime import datetime

# Grab all .txt files in the specified directory
files = glob.glob("<path_to_dir>/*.txt")

for file in files:
    file = os.path.basename(file)
    # Check that it contains an underscore
    if '_' not in file:
        continue

    # Split the file name using the underscore as the delimiter
    stamps = file.split('_')

    # Convert the epoch to a legible string
    start = datetime.fromtimestamp(int(stamps[0])).strftime("%c")
    end = datetime.fromtimestamp(int(stamps[1])).strftime("%c")

    # Consume the data
    print(f"{start} - {end}")
    ...
You'll want to add some error checking and handling; for instance, if the first or second index in the stamps array isn't a parsable int, this will fail.
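For example, a minimal sketch of such a check; parse_stamps is a hypothetical helper name, and the underscore convention mirrors the snippet above:
from datetime import datetime

def parse_stamps(filename):
    """Return (start, end) strings, or None if the name doesn't follow the convention."""
    stamps = filename.split('_')
    if len(stamps) < 2:
        return None
    try:
        start = datetime.fromtimestamp(int(stamps[0])).strftime("%c")
        end = datetime.fromtimestamp(int(stamps[1])).strftime("%c")
    except (ValueError, OverflowError, OSError):
        # the first or second part is not a parsable epoch timestamp
        return None
    return start, end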

How to read a file as .dat and write it as .txt

So I'm making a thing that reads data from a .dat file and saves it as a list, then takes that list and writes it to a .txt file (basically a .dat to .txt converter). However, whenever I run it, it creates the file, and it is a .txt file, but it contains the .dat data. After troubleshooting, the variable that gets written out is normal, legible text, not weird .dat data...
Here is my code (pls don't roast, I'm very new, I know it sucks and has lots of mistakes, just leave me be xD):
# import dependencies
import sys
import pickle
import time

# define constants and get file path
data = []
index = 0
path = input("Absolute file path:\n")

# checks if the last character is a space (common in copy+pasting) and removes it if there is one
if path.endswith(' '):
    path = path[:-1]

# load the .dat file into a list named bits
bits = pickle.load(open(path, "rb"))
with open(path, 'rb') as fp:
    bits = pickle.load(fp)

# convert the data from bits into a new list called data
while index < len(bits):
    print("Decoding....\n")
    storage = bits[index]
    print("Decoding....\n")
    str(storage)
    print("Decoding....\n")
    data.append(storage)
    print("Decoding....\n")
    index += 1
    print("Decoding....\n")
    time.sleep(0.1)

# removes the .dat extension from the file path
split = path[:-4]

# creates the new txt file with _convert.txt added to the end
with open(f"{split}_convert.txt", "wb") as fp:
    pickle.dump(data, fp)

# tells the user where the file has been created
close_file = str(split) + "_convert.txt"
print(f"\nA decoded txt file has been created. Run this command to open it: cd {close_file}\n\n")
Quick review: I'm setting a variable named data which contains all of the data from the .dat file, then I want to save that variable to a .txt file, but whenever I save it, the .txt file has the contents of the .dat file, even though print(data) shows me the data as normal, legible text. Thanks for any help.
with open(f"{split}_convert.txt", "wb") as fp:
pickle.dump(data, fp)
When you open the file in wb mode and call pickle.dump, you are writing pickled binary data again, not text. To write plain text to the .txt file, use
with open(f"{split}_convert.txt", "w") as fp:
fp.write(data)
Since data is a list, you can't write it straight away either. You'll need to write each item using a loop:
with open(f"{split}_convert.txt", "w") as fp:
for line in data:
fp.write(line)
For more details on file writing, check this article as well: https://www.tutorialspoint.com/python3/python_files_io.htm
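Putting the pieces together, a minimal end-to-end sketch of the converter, assuming (as in the question) that the .dat file contains a single pickled list:
import pickle

path = input("Absolute file path:\n").strip()

# load the pickled list from the .dat file
with open(path, "rb") as fp:
    bits = pickle.load(fp)

# write each item as a line of plain text
out_path = path[:-4] + "_convert.txt"
with open(out_path, "w") as fp:
    for item in bits:
        fp.write(f"{item}\n")

print(f"\nA decoded txt file has been created at {out_path}\n")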

Pass a file with filepaths to Python in Ubuntu terminal to analyze each file?

I have a text file with file paths:
path1
path2
path3
...
path100000000
I have a Python script app.py that should run on each file (path1, path2, ...).
What is the best way to do this?
Should I just take the list file as an argument, and then:
with open(input_file, "r") as f:
lines = f.readlines()
for line in lines:
main_function(line)
Yes that should work, except readlines() doesn't remove newline characters.
with open(input_file, "r") as f:
lines = f.readlines()
for line in lines:
main_function(line.strip())
**Note: the above code assumes input_file resolves from the current working directory (e.g. the file sits next to the Python script).
You are using a context manager, so keep the processing code inside the with block.
So, according to your comment:
If you want to pass the filename and read the file contents inside main_function, then the above code will work.
If you want to read each file first and then pass its contents, then you will have to modify the above code to read the content before passing it to the function:
with open(input_file, "r") as f:
lines = f.readlines()
for line in lines:
main_function(open(line.strip(), "r").read())
**Note: the above code reads each whole file as a single string (text).
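As for taking the list file as a command-line argument, a minimal sketch using sys.argv; main_function here is just a placeholder for whatever app.py already does with each file, and paths.txt is an example name:
import sys

def main_function(path):
    # placeholder for whatever app.py does with each file
    print("processing", path)

if __name__ == "__main__":
    input_file = sys.argv[1]  # e.g. python app.py paths.txt
    with open(input_file, "r") as f:
        for line in f:
            path = line.strip()
            if path:  # skip blank lines
                main_function(path)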

Python - Spyder 3 - Open a list of .csv files and remove all double quotes in every file

I've read everything I can find and tried about 20 examples from SO and Google, and nothing seems to work.
This should be very simple, but I cannot get it to work. I just want to point to a folder, and replace every double quote in every file in the folder. That is it. (And I don't know Python well at all, hence my issues.) I have no doubt that some of the scripts I've tried to retask must work, but my lack of Python skill is getting in the way. This is as close as I've gotten, and I get errors. If I don't get errors it seems to do nothing. Thanks.
import glob
import csv

mypath = glob.glob('\\C:\\csv\\*.csv')
for fname in mypath:
    with open(mypath, "r") as infile, open("output.csv", "w") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            writer.writerow(item.replace("""", "") for item in row)
You don't need to use csv-specific file opening and writing; I think that makes it more complex. How about this instead:
import os

mypath = r'\path\to\folder'
for file in os.listdir(mypath):  # This will loop through every file in the folder
    if '.csv' in file:  # Check if it's a csv file
        fpath = os.path.join(mypath, file)
        fpath_out = fpath + '_output'  # Create an output file with a similar name to the input file
        with open(fpath) as infile:
            lines = infile.readlines()  # Read all lines
        with open(fpath_out, 'w') as outfile:
            for line in lines:  # One line at a time
                outfile.write(line.replace('"', ''))  # Remove each " and write the line
Let me know if this works, and respond with any error messages you may have.
I found the solution to this based on the original answer provided by u/Jeff. The quotes were actually smart quotes (u'\u201d'), to be exact, not straight quotes. That is why I could get nothing to work. That is a great way to spend like two days; now if you'll excuse me, I have to go jump off the roof. But for posterity, here is what I used that worked. (Note that there is the left-curving smart quote as well: u'\u201c'.)
import os

mypath = 'C:\\csv\\'
myoutputpath = 'C:\\csv\\output\\'
for file in os.listdir(mypath):  # This will loop through every file in the folder
    if '.csv' in file:  # Check if it's a csv file
        fpath = os.path.join(mypath, file)
        fpath_out = os.path.join(myoutputpath, file)  # Create an output file with the same name in the output folder
        with open(fpath) as infile:
            lines = infile.readlines()  # Read all lines
        with open(fpath_out, 'w') as outfile:
            for line in lines:  # One line at a time
                # Remove both the right (u'\u201d') and left (u'\u201c') smart quotes and write the line
                outfile.write(line.replace(u'\u201d', '').replace(u'\u201c', ''))
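If anyone hits the same issue, a small sketch to check which quote-like characters a file actually contains before deciding what to replace; the path below is just an example:
from collections import Counter

# count occurrences of straight and smart quote characters in one file
quote_chars = {'"', '\u201c', '\u201d', '\u2018', '\u2019'}
with open('C:\\csv\\example.csv', encoding='utf-8') as f:
    counts = Counter(ch for ch in f.read() if ch in quote_chars)

for ch, n in counts.items():
    print(f"U+{ord(ch):04X} {ch!r}: {n}")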
