Python - Spyder 3 - Open a list of .csv files and remove all double quotes in every file - python-3.x

I've read everything I can find and tried about 20 examples from SO and Google, and nothing seems to work.
This should be very simple, but I cannot get it to work. I just want to point to a folder and remove every double quote from every file in that folder. That is it. (I don't know Python well at all, hence my issues.) I have no doubt that some of the scripts I've tried to repurpose would work, but my lack of Python skill is getting in the way. This is as close as I've gotten, and it raises errors; when it doesn't raise errors, it seems to do nothing. Thanks.
import glob
import csv
mypath = glob.glob('\\C:\\csv\\*.csv')
for fname in mypath:
    with open(mypath, "r") as infile, open("output.csv", "w") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            writer.writerow(item.replace("""", "") for item in row)

You don't need csv-specific reading and writing; I think that makes things more complex. How about this instead:
import os
mypath = r'\path\to\folder'
for file in os.listdir(mypath):  # Loop through every file in the folder
    if file.endswith('.csv'):  # Check that it's a csv file
        fpath = os.path.join(mypath, file)
        fpath_out = fpath + '_output'  # Output file with a name similar to the input file
        with open(fpath) as infile:
            lines = infile.readlines()  # Read all lines
        with open(fpath_out, 'w') as outfile:
            for line in lines:  # One line at a time
                outfile.write(line.replace('"', ''))  # Remove each " and write the line
Let me know if this works, and respond with any error messages you may have.

I found the solution based on the original answer by u/Jeff. The quotes were actually smart quotes (u'\u201d', to be exact), not straight quotes, which is why nothing I tried worked. That is a great way to spend about two days; now if you'll excuse me, I have to go jump off the roof. For posterity, here is what I used that worked. (Note that there is also a left curly smart quote, u'\u201c'.)
import os
mypath = 'C:\\csv\\'
myoutputpath = 'C:\\csv\\output\\'  # This folder must already exist
for file in os.listdir(mypath):  # Loop through every file in the folder
    if file.endswith('.csv'):  # Check that it's a csv file
        fpath = os.path.join(mypath, file)
        fpath_out = os.path.join(myoutputpath, file)  # Same file name, in the output folder
        with open(fpath) as infile:
            lines = infile.readlines()  # Read all lines
        with open(fpath_out, 'w') as outfile:
            for line in lines:  # One line at a time
                # Remove the left and right smart quotes and write the line
                outfile.write(line.replace(u'\u201c', '').replace(u'\u201d', ''))
# No infile.close()/outfile.close() needed: the with blocks already close the files.
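A sketch that handles both cases from this thread at once: it removes the straight quote and both smart quotes from every .csv file in a folder and writes the results to a separate output folder (created if missing). The function name `strip_quotes_in_folder` and the default `encoding="utf-8"` are assumptions; files saved by Excel on Windows are often cp1252, where the smart quotes are bytes 0x93/0x94, so `encoding="cp1252"` may be needed there. `newline=""` keeps the original line endings.

```python
import os

# Straight double quote plus the left/right "smart" (curly) double quotes.
QUOTE_CHARS = '"\u201c\u201d'


def strip_quotes_in_folder(src_dir, dst_dir, encoding="utf-8"):
    """Copy every .csv file from src_dir to dst_dir with all double quotes removed.

    Returns the list of processed file names. dst_dir must differ from src_dir.
    """
    os.makedirs(dst_dir, exist_ok=True)  # create the output folder if needed
    table = str.maketrans("", "", QUOTE_CHARS)  # maps each quote character to None
    processed = []
    for name in os.listdir(src_dir):
        if not name.lower().endswith(".csv"):
            continue  # skip anything that is not a .csv file (including sub-folders)
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        # newline="" on both sides leaves \r\n or \n line endings untouched
        with open(src, encoding=encoding, newline="") as infile, \
                open(dst, "w", encoding=encoding, newline="") as outfile:
            for line in infile:
                outfile.write(line.translate(table))
        processed.append(name)
    return processed
```

For example, `strip_quotes_in_folder('C:\\csv', 'C:\\csv\\output', encoding='cp1252')`. Using `str.translate` removes all three characters in one pass instead of chaining several `replace` calls.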

Related

How to read many files have a specific format in python

I am a little confused about how to read all lines in many files whose names range from "datalog.txt.98" to "datalog.txt.120".
This is my code:
import json
file = "datalog.txt."
i = 97
for line in file:
    i+=1
    f = open (line + str (i),'r')
    for row in f:
        print (row)
Here, you will find an example of one line in one of those files:
I really need your help.
I suggest using a loop to open the files, since their names differ only in the number at the end.
To better understand this project, I would recommend researching the following topics:
for loops
string manipulation
opening a file and reading its content
list manipulation
string parsing
This is one of my favourite beginner guides.
To generate the integers at the end of the file names, I would look into Python for loops over a range().
I think this is what you are trying to do:
# create a list to store the content of all your files
files_content = []
# the prefix is a string
filename_prefix = "datalog.txt."
# loop from 98 to 120 (range() excludes its stop value, hence 121)
for i in range(98, 121):
    # build the filename from the prefix and
    # the integer i, which must be converted to a string
    filename = filename_prefix + str(i)
    # open the file and read all its lines into a variable
    with open(filename) as f:
        content = f.readlines()
    # append the file content to the files_content list
    files_content.append(content)
To remove surrounding whitespace (including each line's trailing newline), add the following line inside the loop, just before the append:
    content = [x.strip() for x in content]
    files_content.append(content)
Here's an example of printing out files_content:
for file in files_content:
    print(file)
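A sketch of the same idea packaged as a function, collecting the stripped lines of datalog.txt.98 through datalog.txt.120 into a dict keyed by the file's number. The name `read_datalogs`, its parameters, and the choice to silently skip numbers whose file does not exist are assumptions, not part of the answer above.

```python
import os


def read_datalogs(directory, prefix="datalog.txt.", first=98, last=120):
    """Return {number: [stripped lines]} for prefix+first ... prefix+last (inclusive)."""
    contents = {}
    for i in range(first, last + 1):  # range() excludes its stop value, hence last + 1
        path = os.path.join(directory, prefix + str(i))
        if not os.path.exists(path):
            continue  # tolerate gaps in the numbering (an assumption)
        with open(path) as f:
            contents[i] = [line.strip() for line in f]
    return contents
```

For example, `read_datalogs('.')[98]` gives the lines of datalog.txt.98 in the current directory.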

Running a function on multiple files simultaneously with python

I have a function that manipulates text files, given a directory and a file name as input.
The function is defined as follows:
def nav2xy(target_directory, target_file):
    after_rows = f'MOD {target_file}_alines.txt'
    after_columns = f'MOD {target_file}_acolumns.txt'
    # this segment removes the top lines (8 in this case) to work with only the actual data
    infile = open(f'{target_directory}/{target_file}', 'r').readlines()
    with open(after_rows, 'w') as outfile:
        for index, line in enumerate(infile):
            if index >= 8:
                outfile.write(line)
    # this segment keeps only the needed columns, in this case the coordinates for GMT use
    with open(after_rows) as In, open(after_columns, "w") as Out:
        for line in In:
            values = line.split()
            Out.write(f"{values[4]} {values[5]}\n")
I am looking for a way to run this code once on all files in the chosen directory (targeted by name, or simply all of them).
Should I change the function to use only the file name?
I tried running the function this way, to no avail:
for i in os.listdir('Geoseas_related_files'):
    nav2xy('target_directory', i)
Passing the real directory name instead, as nav2xy('Geoseas_related_files', str(i)), works, although I still get this error:
(base) ms-iMac:python gan$ python3 coordinates_fromtxt.py
Traceback (most recent call last):
File "coordinates_fromtxt.py", line 7, in <module>
nav2xy('Geoseas_related_files', str(i))
File "/Users/gadraifman/research/python/GAD_MSC/Nav.py", line 19, in nav2xy
Out.write(f"{values[4]} {values[5]}\n")
IndexError: list index out of range
Any help or advice would be greatly appreciated.
From what I gather from "Iterating through directories with Python", the best way to loop through directories is to use glob.
I made some extensive other modifications to your code to simplify it and remove the middle step of saving lines to a file just to read them again. If this step is mandatory, then feel free to add it back.
import os, glob

def nav2xy(target_file):
    # New file name: just append a suffix.
    # "target_file" contains the path, i.e. root_dir + the current file name
    after_columns = f'{target_file}_acolumns.txt'
    with open(target_file, 'r') as infile, open(after_columns, "w") as outfile:
        content = infile.readlines()
        # Skip the first 8 lines here
        for line in content[8:]:
            # No need to write the lines to a file just to read them again;
            # process them directly
            values = line.split()
            if len(values) < 6:
                continue  # skip blank or short lines, which raise the IndexError
            outfile.write(f"{values[4]} {values[5]}\n")

# I guess this is the dir you want to loop through.
# An absolute path such as c:\path\to\files may be better.
root_dir = 'Geoseas_related_files'
# glob.glob builds the whole list first, so the new output files are not picked up
for file_or_dir in glob.glob(os.path.join(root_dir, "*")):
    # Skip directories, if there are any.
    if os.path.isfile(file_or_dir):
        nav2xy(file_or_dir)
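The IndexError in the question most likely comes from lines with fewer than six whitespace-separated fields (a trailing blank line is a common culprit). A sketch that makes this visible: `nav2xy` here returns how many lines it skipped, and `process_directory` ignores output files from a previous run so re-running the script does not process its own results. The names `process_directory` and `OUTPUT_SUFFIX`, and the return values, are assumptions for illustration. Files are processed one after another, not simultaneously.

```python
import glob
import os

OUTPUT_SUFFIX = "_acolumns.txt"  # same naming scheme as the answer above


def nav2xy(target_file, skip=8):
    """Write columns 5 and 6 of target_file (after `skip` header lines) to a new file.

    Returns (output_path, number_of_skipped_short_lines).
    """
    out_path = target_file + OUTPUT_SUFFIX
    skipped = 0
    with open(target_file) as infile, open(out_path, "w") as outfile:
        for line in infile.readlines()[skip:]:
            values = line.split()
            if len(values) < 6:
                skipped += 1  # blank/short line: indexing it would raise IndexError
                continue
            outfile.write(f"{values[4]} {values[5]}\n")
    return out_path, skipped


def process_directory(root_dir):
    """Run nav2xy on every regular file in root_dir, ignoring earlier outputs."""
    results = {}
    # sorted(glob.glob(...)) is evaluated once, before any output file is created
    for path in sorted(glob.glob(os.path.join(root_dir, "*"))):
        if os.path.isfile(path) and not path.endswith(OUTPUT_SUFFIX):
            results[path] = nav2xy(path)
    return results
```

Printing `process_directory('Geoseas_related_files')` shows, per input file, where the coordinates went and how many lines were ignored.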

How do I replace the 4th item in a list that is in a file that starts with a particular string?

I need to search for a name in a file and, in the line starting with that name, replace the fourth item of the comma-separated list. I have begun trying to program this with the following code, but I have not gotten it to work.
with open("SampleFile.txt", "r") as f:
    newline=[]
    for word in f.line():
        newline.append(word.replace(str(String1), str(String2)))
with open("SampleFile.txt", "w") as f:
    for line in newline :
        f.writelines(line)
#this piece of code replaced every occurrence of String1 with String2
f = open("SampleFile.txt", "r")
for line in f:
    if line.startswith(Name):
        if line.contains(String1):
            newline = line.replace(str(String1), str(String2))
#this came up with a syntax error
It would help people answer your question if you gave some dummy data. I suggest backing up your data: either save the edited data to a new file, or copy the old file to a backup folder before working on it (for example with "from shutil import copyfile" and then "copyfile(src, dst)"). Otherwise, a single mistake could ruin your data with no easy way to restore it.
You shouldn't do the replacement with "newline = line.replace(str(String1), str(String2))"! Consider "strong" as your search term and a line like "Armstrong,Paul,strong,44": replacing "strong" with "weak" would give "Armweak,Paul,weak,44".
I hope the following code helps you:
filename = "SampleFile.txt"
filename_new = filename.replace(".", "_new.")
search_term = "Smith"
with open(filename) as src, open(filename_new, 'w') as dst:
    for line in src:
        if line.startswith(search_term):
            items = line.split(",")
            items[4-1] = items[4-1].replace("old", "new")  # 4th item has index 3
            line = ",".join(items)
        dst.write(line)
If you work with a csv file, you should have a look at the csv module.
P.S. My test files contain the following data (the file names are only labels, not part of the files):

SampleFile.txt:
Adams,George,m,old,34
Adams,Tracy,f,old,32
Smith,John,m,old,53
Man,Emily,w,old,44

SampleFile_new.txt:
Adams,George,m,old,34
Adams,Tracy,f,old,32
Smith,John,m,new,53
Man,Emily,w,old,44
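Following the suggestion to use the csv module, a sketch that sets the 4th field of every row whose first field is exactly the given name. Matching the whole first field (rather than `startswith`) means "Smith" will not also hit a "Smithson" row. The function name `replace_field`, its parameters, and the `lineterminator="\n"` choice (the csv writer defaults to "\r\n") are assumptions.

```python
import csv


def replace_field(src_path, dst_path, name, new_value, field_index=3):
    """Copy src_path to dst_path, setting column field_index (0-based, so 3 is the
    4th item) to new_value on rows whose first column equals name.

    Returns the number of rows changed.
    """
    changed = 0
    # newline="" is what the csv module expects for files it reads and writes
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator="\n")
        for row in reader:
            if row and row[0] == name and len(row) > field_index:
                row[field_index] = new_value
                changed += 1
            writer.writerow(row)  # every row is copied, changed or not
    return changed
```

With the sample data, `replace_field("SampleFile.txt", "SampleFile_new.txt", "Smith", "new")` produces the SampleFile_new.txt listing and returns 1. Quoted fields containing commas are also handled correctly, which a plain `split(",")` cannot do.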

Replacing and writing a file python [duplicate]

I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.
What is the best way to do this, within the following code?
f = open(file)
for line in f:
    if 'foo' in line:
        newline = line.replace('foo', 'bar')
        # how do I write this newline back to the file?
The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:
import fileinput
for line in fileinput.input("test.txt", inplace=True):
    print('{} {}'.format(fileinput.filelineno(), line), end='')  # for Python 3
    # print "%d: %s" % (fileinput.filelineno(), line),  # for Python 2
What happens here is:
The original file is moved to a backup file
The standard output is redirected to the original file within the loop
Thus any print statements write back into the original file
fileinput has more bells and whistles. For example, it can automatically operate on all files named in sys.argv[1:], without you having to iterate over them explicitly. Starting with Python 3.2, it also provides a convenient context manager for use in a with statement.
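A sketch of the context-manager form mentioned above, also passing `backup=".bak"` so the original is kept next to the edited file instead of being deleted when processing finishes. The function name `add_line_numbers` is hypothetical; `fileinput.input(files=..., inplace=True, backup=...)` and `FileInput.filelineno()` are standard-library API.

```python
import fileinput


def add_line_numbers(paths, backup=".bak"):
    """Prefix each line of every file in `paths` with its line number, in place.

    With backup=".bak", each original survives as <name>.bak.
    """
    with fileinput.input(files=paths, inplace=True, backup=backup) as f:
        for line in f:
            # While inplace=True, print() writes into the current file, not the console
            print(f"{f.filelineno()}: {line}", end="")
```

Passing `backup=""` (the default) gives the behaviour described above, where the temporary backup is removed once the file has been rewritten.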
While fileinput is great for throwaway scripts, I would be wary of using it in real code because admittedly it's not very readable or familiar. In real (production) code it's worthwhile to spend just a few more lines of code to make the process explicit and thus make the code readable.
There are two options:
The file is not overly large, and you can just read it wholly into memory. Then close the file, reopen it in writing mode and write the modified contents back.
The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line, writing back into the original file. Note that this requires twice the storage.
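The first option can be sketched in a few lines. The function name `replace_in_memory`, the UTF-8 default, and returning the number of replacements are assumptions for illustration.

```python
def replace_in_memory(path, old, new, encoding="utf-8"):
    """Option 1: read the whole file, close it, then reopen in 'w' mode and write back.

    Returns the number of occurrences of `old` that were replaced.
    """
    with open(path, encoding=encoding) as f:
        text = f.read()  # the file is closed when this block ends
    with open(path, "w", encoding=encoding) as f:  # 'w' truncates the file first
        f.write(text.replace(old, new))
    return text.count(old)
```

The weak point is the moment between truncating and finishing the write: if the process dies then, the file is lost. The temporary-file approach of the second option avoids that.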
I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:
from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove

def replace(file_path, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh,'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    #Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    #Remove original file
    remove(file_path)
    #Move new file
    move(abs_path, file_path)
Here's another example that was tested. Note that it matches plain substrings, not regular expressions:
import fileinput
import sys

def replaceAll(file, searchExp, replaceExp):
    for line in fileinput.input(file, inplace=True):
        if searchExp in line:
            line = line.replace(searchExp, replaceExp)
        sys.stdout.write(line)
Example use:
replaceAll("/fooBar.txt", "Hello World!", "Goodbye World.")
This should work (in-place editing):
import fileinput
# Processes a list of files (files is a list of paths) and
# redirects STDOUT to the file currently being processed
for line in fileinput.input(files, inplace=True):
    print(line.replace("foo", "bar"), end='')
Based on the answer by Thomas Watnedal.
However, this does not exactly answer the line-by-line part of the original question, although the function can still be adapted to replace line by line.
This implementation replaces the file contents without using temporary files; as a consequence, file permissions remain unchanged.
Also, using re.sub instead of str.replace allows regex replacement, not just plain-text replacement.
Reading the file as a single string instead of line by line allows multiline matches and replacements.
import re

def replace(file, pattern, subst):
    # Read the contents of the file as a single string
    with open(file, 'r') as file_handle:
        file_string = file_handle.read()
    # Use the re module to allow (multiline) regex replacement
    file_string = re.sub(pattern, subst, file_string)
    # Write the contents back; mode 'w' truncates the file
    with open(file, 'w') as file_handle:
        file_handle.write(file_string)
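To show what the single-string read buys, here is a sketch of the same whole-file approach that accepts regex flags and reports how many substitutions `re.subn` made. The name `regex_replace_file` and its parameters are assumptions. With `re.DOTALL` a pattern can span several lines; with `re.MULTILINE`, `^` and `$` anchor at every line.

```python
import re


def regex_replace_file(path, pattern, repl, flags=0, encoding="utf-8"):
    """Regex replacement over the whole file; returns the number of substitutions."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    text, n = re.subn(pattern, repl, text, flags=flags)
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
    return n
```

For example, `regex_replace_file(p, r"BEGIN\n.*?\nEND\n", "", flags=re.DOTALL)` deletes a BEGIN...END block spanning several lines, which a line-by-line loop cannot match.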
As lassevk suggests, write out the new file as you go, here is some example code:
fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
    fout.write(line.replace('foo', 'bar'))
fin.close()
fout.close()
If you want a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you're a fan of regexes. Because of re.escape, the text is matched literally, while flags such as re.IGNORECASE still apply:
import re

def replace(filePath, text, subs, flags=0):
    with open(filePath, "r+") as file:
        fileContents = file.read()
        textPattern = re.compile(re.escape(text), flags)
        fileContents = textPattern.sub(subs, fileContents)
        file.seek(0)
        file.truncate()
        file.write(fileContents)
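One subtlety with the function above: `re.escape` protects the search text, but the replacement string is still parsed as a regex template, so a backslash in `subs` (for instance a Windows path) is interpreted as an escape. A sketch of a variant that passes a function to `sub`, whose return value is inserted verbatim; the name `replace_literal` is hypothetical.

```python
import re


def replace_literal(path, text, subs, flags=0):
    """Replace every occurrence of `text` (taken literally) with `subs` (also literal)."""
    with open(path, "r+") as f:
        contents = f.read()
        pattern = re.compile(re.escape(text), flags)
        # A callable replacement is used as-is; backslashes in subs are not escapes
        contents = pattern.sub(lambda match: subs, contents)
        f.seek(0)
        f.truncate()
        f.write(contents)
```

For example, `replace_literal(p, "old (v1.0)", r"C:\new", flags=re.IGNORECASE)` writes `C:\new` exactly, whereas passing the same string as a template would turn `\n` into a newline.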
A more Pythonic way would be to use context managers, as in the code below:
from tempfile import mkstemp
from shutil import move
from os import fdopen, remove

def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    # Wrap mkstemp's open descriptor so it gets closed instead of leaking
    with fdopen(fh, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
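The temporary-file answers above create the temp file in the system temp directory, so `move` may have to copy across filesystems, and the original is deleted before the new file is in place. A sketch of an alternative that creates the temp file next to the original and swaps it in with `os.replace`, which the Python docs describe as atomic on POSIX when both paths are on the same filesystem; it also keeps the permission bits via `shutil.copymode`. The name `replace_atomic` and the UTF-8 default are assumptions.

```python
import os
import shutil
import tempfile


def replace_atomic(path, old, new, encoding="utf-8"):
    """Stream `path` line by line into a temp file in the same directory,
    then swap it over the original with os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # same dir, hence same filesystem
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp, \
                open(path, encoding=encoding) as src:
            for line in src:
                tmp.write(line.replace(old, new))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # overwrite the original in one step
    except BaseException:
        os.remove(tmp_path)  # do not leave a stray temp file behind on failure
        raise
```

Only one line is held in memory at a time, and at every moment either the old or the new file exists under the original name.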
fileinput is quite straightforward, as mentioned in previous answers:
import fileinput

def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as file:
        for line in file:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')
Explanation:
fileinput can accept multiple files, but I prefer to close each file as soon as it has been processed, so I pass a single file_path to the with statement.
print does not write anything to the console when inplace=True, because STDOUT is redirected to the original file.
end='' in the print call avoids extra blank lines, since each line already ends with a newline.
You can use it as follows:
file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')
Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.
Expanding on Kiran's answer, which I agree is more succinct and Pythonic, this version adds codecs to support reading and writing UTF-8:
import codecs
from tempfile import mkstemp
from shutil import move
from os import close, remove

def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    close(fh)  # codecs.open reopens the temp file by name, so close mkstemp's descriptor
    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
Using hamishmcn's answer as a template, I was able to search for lines in a file that match my regex and replace the matches with an empty string.
import re
p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]')  # pattern, compiled once
fin = open("in.txt", 'r')  # in file
fout = open("out.txt", 'w')  # out file
for line in fin:
    newline = p.sub('', line)  # replace matching strings with an empty string
    print(newline, end='')
    fout.write(newline)
fin.close()
fout.close()
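The same pattern wrapped so it can be tried on in-memory text before touching real files. The name `strip_negative_fields` is hypothetical; taking already-open file objects means `io.StringIO` works as well as `open(...)`. Reading the pattern: it removes a minus sign followed by digits, with an optional decimal part, and a trailing comma.

```python
import io
import re

# Pattern from the answer above: '-' + digits (optionally '.' + digits) + ','
NEGATIVE_FIELD = re.compile(r'[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]')


def strip_negative_fields(fin, fout):
    """Copy lines from fin to fout with every NEGATIVE_FIELD match removed."""
    for line in fin:
        fout.write(NEGATIVE_FIELD.sub('', line))


# Quick in-memory check
out = io.StringIO()
strip_negative_fields(io.StringIO("a,-1.5,b,-2,c\n"), out)
print(out.getvalue(), end='')  # prints: a,b,c
```

A negative number at the end of a line, with no comma after it, is left untouched.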
If you dedent the close() calls so they run after the loop rather than inside it, the function searches and replaces across every line of the file.
See below for an example.
from tempfile import mkstemp
from shutil import move
from os import close, remove

def replace(file, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    print(fh, abs_path)
    new_file = open(abs_path, 'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    #close temp file (and mkstemp's descriptor)
    new_file.close()
    close(fh)
    old_file.close()
    #Remove original file
    remove(file)
    #Move new file
    move(abs_path, file)

python3 opening files and reading lines

Can you explain what is going on in this code? I don't understand how the for loop reads the file line by line instead of reading all of the sentences at once. Thanks.
Let's say I have these sentences in a document file:
cat:dog:mice
cat1:dog1:mice1
cat2:dog2:mice2
cat3:dog3:mice3
Here is the code:
from sys import argv
filename = input("Please enter the name of a file: ")
f = open(filename,'r')
d1ct = dict()
print("Number of times each animal visited each station:")
print("Animal Id Station 1 Station 2")
for line in f:
    if '\n' == line[-1]:
        line = line[:-1]
    (AnimalId, Timestamp, StationId,) = line.split(':')
    key = (AnimalId,StationId,)
    if key not in d1ct:
        d1ct[key] = 0
    d1ct[key] += 1
The magic is at:
for line in f:
    if '\n' == line[-1]:
        line = line[:-1]
Python file objects are special in that they can be iterated over in a for loop: each iteration retrieves the next line of the file. Each line keeps its trailing newline character (the last line may lack one), so it's often useful to check for it and remove it.
As Moshe wrote, open file objects can be iterated over. However, in Python 3.x they are not of the file type (as they were in Python 2.x); a file opened in text mode is an io.TextIOWrapper. In text mode, the unit of iteration is one line of text, including the \n.
You can use line = line.rstrip() to remove the \n plus any trailing whitespace.
If you want to read the content of the file at once (into a multiline string), you can use content = f.read().
There is a minor bug in the code: an open file should always be closed, which means calling f.close() after the for loop. Alternatively, wrap the open call in a with statement, which closes the file for you. I suggest getting used to the latter approach.
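Putting these suggestions together, a sketch of the counting loop that uses `with` (so the file is always closed), iterates line by line, and strips the newline with `rstrip`. The name `count_visits`, the use of `collections.Counter` in place of the manual dict, and skipping blank lines are assumptions, not the original code.

```python
from collections import Counter


def count_visits(path):
    """Count (animal_id, station_id) pairs in a file of 'animal:timestamp:station' lines."""
    counts = Counter()
    with open(path) as f:  # the with block closes the file, even if an error occurs
        for line in f:     # iterating a text file yields one line at a time
            line = line.rstrip("\n")
            if not line:
                continue   # ignore blank lines
            animal_id, _timestamp, station_id = line.split(":")
            counts[(animal_id, station_id)] += 1
    return counts
```

`Counter` returns 0 for missing keys, so the `if key not in d1ct` check from the question is no longer needed.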
