I have a CSV file with two columns: the one on the left holds an old string, and the one directly to its right holds the new one. I have a heap of .xml files that contain the old strings, which I need to replace/update with the new ones.
The script is supposed to open each .xml file one at a time and replace every old string from the CSV file with its new counterpart. I have tried using a replace function to replace instances of the old string, column[0], with the new string, column[1]. However, I must be missing something, as this seems to do nothing. If I set the first argument of the replace function to an actual string in quotation marks, the replacement works; however, if both arguments are variables, it doesn't.
Does anyone know what I am doing wrong?
import os
import csv

with open('csv.csv') as csv:
    lines = csv.readline()
    column = lines.split(',')

fileNames = [f for f in os.listdir('.') if f.endswith('.xml')]

for f in fileNames:
    x = open(f).read()
    x = x.replace(column[0], column[1])
    print(x)
Example of CSV file:
oldstring1,newstring1
oldstring2,newstring2
Example of .xml file:
Word words words oldstring1 words words words oldstring2
What I want in the new .xml files:
Word words words newstring1 words words words newstring2
The problem here is that you are treating the CSV file as a normal text file and not looping over all of its lines.
You need to read the file using a csv reader.
The following code will work for your task:
import os
import csv

# read all the old/new pairs up front; a csv.reader is an iterator,
# so it would be exhausted after the first .xml file otherwise
with open('csv.csv') as csvfile:
    rows = list(csv.reader(csvfile))

fileNames = [f for f in os.listdir('.') if f.endswith('.xml')]

for f in fileNames:
    x = open(f).read()
    for row in rows:
        x = x.replace(row[0], row[1])
    print(x)
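Since the question asks for the .xml files to be updated rather than just printed, a minimal variation that writes the result back could look like this (note it overwrites each file in place, so keep copies if you need the originals):

import os
import csv

with open('csv.csv') as csvfile:
    rows = list(csv.reader(csvfile))

for f in [name for name in os.listdir('.') if name.endswith('.xml')]:
    with open(f) as xml_in:
        x = xml_in.read()
    for old, new in rows:
        x = x.replace(old, new)
    with open(f, 'w') as xml_out:  # overwrite the original file
        xml_out.write(x)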
It looks like this is better done using sed. However, if we want to use Python, it seems to me that what you want to do is best achieved by:
- reading all the obsolete/replacement pairs and storing them in a list of lists,
- looping over the .xml files specified on the command line, using the handy fileinput module, telling it that we want to operate in place and keep the backup files around,
- applying all the replacements to every line of each .xml file,
- putting the modified line back in the original file (with a simple print, thanks to fileinput's magic), using end='' because we don't strip each line (to preserve any trailing white space) and it already ends with a newline.
import fileinput
import sys

old_new = [line.strip().split(',') for line in open('csv.csv')]

for line in fileinput.input(sys.argv[1:], inplace=True, backup='.bak'):
    for old, new in old_new:
        line = line.replace(old, new)
    print(line, end='')
If you save the code in replace.py, you can execute it like this:
$ python3 replace.py *.xml subdir/*.xml another_one/a_single.xml
I have been trying all day.
# successfully writes the data from line 17 and the following lines
# to a new (temp) file named and saved in the os
import os
import glob

files = glob.glob('/Users/path/Documents/test/*.txt')

for myspec in files:
    temp_filename = 'foo.temp.txt'
    with open(myspec) as f:
        for n in range(17):
            f.readline()
        with open(temp_filename, 'w') as w:
            w.writelines(f)
    # delete the original file and rename the temp file
    # so it replaces the original file
    os.remove(myspec)
    os.rename(temp_filename, myspec)

print("done")
The above works and it works well! I love it. I am very happy.
But this below does NOT work (same files; I am preprocessing them):
# trying unsuccessfully to remove the last line, which is line
# 2048 in all files, and save again like above
import os
import glob

files = glob.glob('/Users/path/Documents/test/*.txt')

for myspec in files:
    temp_filename = 'foo.temp.txt'
    with open(myspec) as f:
        for n in range(-1):
            f.readline()
        with open(temp_filename, 'w') as w:
            w.writelines(f)
    # delete the original file and rename the temp file
    # so it replaces the original file
    os.remove(myspec)
    os.rename(temp_filename, myspec)

print("done")
This does not work. It doesn't give an error, it prints done, but it does not change the file. I have tried range(-1), all the way up to range(-7), thinking maybe there were blank lines at the end I could not see. This is the only difference between the two blocks of code. If anyone could help that would be great.
To summarize, I permanently got rid of the headers, and now I still have a 1-line footer I cannot get rid of permanently.
Thank you so much for any help. I need to write permanently edited files, because I have a ton of code that wants 2- or 3-column files without all the header/footer junk, and the junk and file types vary widely. If I lose the junk permanently, ASCII can guess the file types correctly. I really do not want to rewrite that code right now; it is very complicated, involves uncertainty, and took me months to get working correctly. I don't read the files until I'm inside a function, and there are many files displayed in multiple drop-downs. I have been at this all day and have tried other methods, but I'd like to make THIS method, the one above, work: pop off the last line and write the result back to a permanent file. It doesn't like the -1. Right now it is just one specific line (specifically line 2048 after the header is removed), so just removing line 2048 would be fine too. It's the last line of the files, which are a batch of TSV files that are CCD readouts. Thanks in advance!
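For what it's worth, range(-1) is an empty range, so the readline loop never runs and the temp file ends up as an unchanged copy. A minimal sketch of one way to drop the last line instead, assuming the files are small enough to read into memory (at ~2048 lines they are), could be:

import os
import glob

files = glob.glob('/Users/path/Documents/test/*.txt')

for myspec in files:
    temp_filename = 'foo.temp.txt'
    with open(myspec) as f:
        lines = f.readlines()  # whole file; ~2048 lines is small
    with open(temp_filename, 'w') as w:
        w.writelines(lines[:-1])  # everything except the last line
    os.remove(myspec)
    os.rename(temp_filename, myspec)

print("done")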
I need to find a pattern in a text file, which isn't big.
Therefore loading the entire file into RAM isn't a concern for me - as advised here:
I tried to do it in two ways:
import re

with open(inputFile, 'r') as file:
    for line in file.readlines():
        for date in dateList:
            if re.search(r'{} \d* 1'.format(date), line):
                ...  # handle the match
OR
with open(inputFile, 'r') as file:
    contents = file.read()
    for date in dateList:
        if re.search(r'{} \d* 1'.format(date), contents):
            ...  # handle the match
The second one proved to be much faster.
Is there an explanation for this, other than the fact that I am using one less loop with the second approach?
As pointed out in the comments, the two snippets are not equivalent: the second one only looks for the first match in the whole file. Besides this, the first is also more expensive, because the (relatively expensive) format call over all the dates is repeated for each line. Storing the regexps and precompiling them should help a lot. Even better, you can generate a single regexp that matches all the dates at once, using something like:
import re

regexp = r'({}) \d* 1'.format('|'.join('{}'.format(date) for date in dateList))

with open(inputFile, 'r') as file:
    contents = file.read()
    # Search for the first matching date existing in dateList
    if re.search(regexp, contents):
        ...  # at least one date matched
Note that you can use findall if you want all of them.
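For instance, here is a small self-contained sketch of the precompiled, combined pattern; the dates and the file name are made up for illustration, and re.escape guards against regex metacharacters inside the dates:

import re

dateList = ['2021-01-01', '2021-06-15']  # hypothetical dates
pattern = re.compile(r'({}) \d* 1'.format('|'.join(map(re.escape, dateList))))

with open('log.txt') as f:  # hypothetical input file
    contents = f.read()

# findall returns the captured date of every match
print(pattern.findall(contents))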
I have a process where a CSV file can be downloaded, edited then uploaded again. On the download, the CSV file is in the correct format, with no wrapping double quotes
1, someval, someval2
When I open the CSV in a spreadsheet, edit and save, it adds double quotes around the strings
1, "someEditVal", "someval2"
I figured this was just the action of the spreadsheet (in this case, OpenOffice). I want my upload script to remove the wrapping double quotes. I cannot remove all quotes, just in case the body contains them, and I also don't want to just check the first and last characters for double quotes.
I'm almost sure that the csv library in Python would know how to handle this, but I'm not sure how to use it...
EDIT
When I use the values within a dictionary, they turn out as follows
{'header':'"value"'}
Thanks
For your example, the following works:
import csv

# note: Python 2 file modes ("wb"/"rb"); see the Python 3 variant below
writer = csv.writer(open("out.csv", "wb"), quoting=csv.QUOTE_NONE)
reader = csv.reader(open("in.csv", "rb"), skipinitialspace=True)
writer.writerows(reader)
You might need to play with the dialect options of the CSV reader and writer -- see the documentation of the csv module.
Thanks to everyone who was trying to help me, but I figured it out. When specifying the reader, you can define the quotechar:
csv.reader(upload_file, delimiter=',', quotechar='"')
This handles the wrapping quotes of strings.
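For the dictionary values mentioned in the edit, the same options work with csv.DictReader; a small sketch (the file name and contents here are hypothetical):

import csv

with open('upload.csv', newline='') as upload_file:
    reader = csv.DictReader(upload_file, delimiter=',', quotechar='"')
    for row in reader:
        print(row)  # values come back without the wrapping quotes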
For Python 3:
import csv
writer = csv.writer(open("query_result.csv", "wt"), quoting=csv.QUOTE_NONE, escapechar='\\')
reader = csv.reader(open("out.txt", "rt"), skipinitialspace=True)
writer.writerows(reader)
The original answer gives this error under Python 3. Also see this SO question for details: csv.Error: iterator should return strings, not bytes
Traceback (most recent call last):
File "remove_quotes.py", line 11, in
writer.writerows(reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
This is the first time I've asked a question on stackoverflow so let me know if I'm doing something wrong.
I'm trying to rename a file with the os library. I want the file name to include some non-ascii characters that are in a string that I've generated. Here's the code:
for subdir, dirs, files in os.walk(startDir):
    for file in files:
        # some code to generate the newFileName string
        os.rename(os.path.join(subdir, file), os.path.join(subdir, newFileName))
Here's an example of what the newFileName string would be: "te©st©.txt"
However, when the file saves, it adds an extra character: "te©st©.txt"
From other reading I've done, it sounds like UTF-8 actually encodes certain characters as two bytes, or something like that, and that's where the  is coming from. If I print the string right before calling os.rename, it prints to the terminal the way I would expect it to. So I'm guessing it must be something with the way that os.rename interacts with the filesystem.
I am using Windows.
Perhaps you can try using unicode all the way?
path = u'99 bottles of \N{greek small letter beta}eer on the wall.txt'
f = open(path, 'w')
f.write('Hello, World!\n')
f.close()
import glob
print(glob.glob(path)) # ['99 bottles of βeer on the wall.txt']
import os
print(os.path.getsize(path)) # 15
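To connect this to the rename problem, a minimal sketch along the same lines (the file names are made up; the point is that both arguments to os.rename are unicode strings):

import os

old_path = u'test.txt'
new_path = u'te\N{COPYRIGHT SIGN}st\N{COPYRIGHT SIGN}.txt'  # i.e. u'te©st©.txt'

open(old_path, 'w').close()  # create a file to rename
os.rename(old_path, new_path)
print(os.listdir(u'.'))  # a unicode argument makes listdir return unicode names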
I have successfully downloaded my data from a given URL, and to store it in a CSV file I used the following code:
fx = open(destination_url, "w")  # write data into a file
for line in lines:  # loop through the string
    fx.write(line + "\n")
fx.close()  # close the file object
return
What happened is that the data is stored, but not on separate lines. As one can see in the snapshot, the data is not separated into different lines when I use the '\n'.
Every separate line of data that I wanted seems to be separated by a '\r' (marked in yellow) within the same cell of the CSV file.
I know I am missing something here, but can I get some pointers on rearranging each line that ends with a '\r' onto a separate line?
I hope I have made myself clear.
Thanks
~V
There is a method called writelines:
https://www.tutorialspoint.com/python/file_writelines.htm
There is an example at the given link; try that first, as in principle it should work. If it does not, we need to know the format of the data (what is inside each element), so print it out during each iteration.
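If the stray '\r' endings are the culprit, a minimal sketch that normalizes them before writing could look like this (the lines list and the output name are assumptions based on the question):

lines = ['a,1\r', 'b,2\r', 'c,3\r']  # hypothetical data: entries end with '\r'

# newline='' stops Python from translating line endings on write
with open('out.csv', 'w', newline='') as fx:
    for line in lines:
        # strip any '\r'/'\n' already present, then add a clean newline
        fx.write(line.rstrip('\r\n') + '\n')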