How to write into a CSV file with Python - python-3.x

Background:
I have a CSV file (csv_dump) with data from a MySQL table. I want to copy the lines that meet certain conditions (row[1] == condition_1 and row[2] == condition_2) into a temporary CSV file (csv_temp).
Code Snippet:
f_reader = open(csv_dump, 'r')
f_writer = open(csv_temp, 'w')
temp_file = csv.writer(f_writer)
lines_in_csv = csv.reader(f_reader, delimiter=',', skipinitialspace=False)
for row in lines_in_csv:
    if row[1] == condition_1 and row[2] == condition_2:
        temp_file.writerow(row)
f_reader.close()
f_writer.close()
Question:
How can I copy each line that is read "as is" into the temp file with Python 3?

test.csv
data1,data2,data3
120,80,200
140,50,210
170,100,250
150,70,300
180,120,280
The code goes here
import csv
with open("test.csv", 'r') as incsvfile:
    input_csv = csv.reader(incsvfile, delimiter=',', skipinitialspace=False)
    with open('tempfile.csv', 'w', newline='') as outcsvfile:
        spamwriter = csv.writer(outcsvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
        first_row = next(input_csv)
        spamwriter.writerow(first_row)
        for row in input_csv:
            if int(row[1]) != 80 and int(row[2]) != 300:
                spamwriter.writerow(row)
output tempfile.csv
data1,data2,data3
140,50,210
170,100,250
180,120,280
If your file has no header row, remove these two lines:
first_row = next(input_csv)
spamwriter.writerow(first_row)

The following Python script seems to do the job. That said, you should probably be using a MySQL query to do this work directly, instead of re-processing an intermediate CSV file. But I guess there must be some good reason for wanting to do that?
mycsv.csv:
aa,1,2,5
bb,2,3,5
cc,ddd,3,3
hh,,3,1
as,hfd,3,3
readwrite.py:
import csv
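# Python 3: open CSV files in text mode with newline='' (not 'rb'/'wb' as in Python 2)
with open('mycsv.csv', 'r', newline='') as infile:
    with open('out.csv', 'w', newline='') as outfile:
        inreader = csv.reader(infile, delimiter=',', quotechar='"')
        outwriter = csv.writer(outfile)
        for row in inreader:
            # keep rows whose third and fourth fields are equal
            if row[2] == row[3]:
                outwriter.writerow(row)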
out.csv:
cc,ddd,3,3
as,hfd,3,3
With a little more work, you could change the 'csv.writer' to make use of the same delimiters and escape/quote characters as the 'csv.reader'. It's not exactly the same as writing out the raw line from the file, but I think it will be practically as fast since the lines in question have already clearly been parsed without error if we have been able to check the value of specific fields.
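If the lines really must be copied character for character, one option is to keep the raw text of each line and use csv only to parse the fields for the test. This is a minimal sketch, assuming one record per line (no quoted fields that contain newlines). The function name copy_matching_lines and its parameters are made up for illustration and are not from the original post:

```python
import csv


def copy_matching_lines(src_path, dst_path, condition_1, condition_2):
    """Copy lines whose 2nd and 3rd fields match, without re-quoting them."""
    copied = 0
    # newline='' leaves the original line endings untouched on read and write.
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        for line in src:
            # Parse only this one line to inspect its fields.
            row = next(csv.reader([line]), [])
            if len(row) > 2 and row[1] == condition_1 and row[2] == condition_2:
                dst.write(line)  # the raw text, exactly as it was read
                copied += 1
    return copied
```

Because the raw line is written rather than the parsed row, any quoting or spacing in the source file is carried over unchanged.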

Related

Compare 2 CSV files (encoded = "utf8") keeping data format

I have 2 stock lists (New and Old). How can I compare them to see which items have been added and which have been removed (I am happy to write them to 2 different files, added and removed)?
So far I have tried comparing them row by row.
import csv
new = "new.csv"
old = "old.csv"
add_file = "add.csv"
remove_file = "remove.csv"
with open(new, encoding="utf8") as new_read, open(old, encoding="utf8") as old_read:
    new_reader = csv.DictReader(new_read)
    old_reader = csv.DictReader(old_read)
    for new_row in new_reader:
        for old_row in old_reader:
            if old_row["STOCK CODE"] == new_row["STOCK CODE"]:
                print("found")
This works for 1 item. If I add an else: branch, it just keeps printing until the item is found, so it is not an accurate way of comparing the files.
I have about 5k rows.
There must be a better way to write the differences to the 2 different files while keeping the same data structure?
N.B. I have tried this link: Python : Compare two csv files and print out differences
2 minor issues:
1. the data structure is not kept
2. there is no reference to the change of location
You could just read the data into memory and then compare.
I used sets for the codes in this example for faster lookup.
import csv
def get_csv_data(file_name):
    data = []
    codes = set()
    with open(file_name, encoding="utf8") as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            data.append(row)
            codes.add(row['STOCK CODE'])
    return data, codes
def write_csv(file_name, data, codes):
    with open(file_name, 'w', encoding="utf8", newline='') as csv_file:
        headers = list(data[0].keys())
        writer = csv.DictWriter(csv_file, fieldnames=headers)
        writer.writeheader()
        for row in data:
            if row['STOCK CODE'] not in codes:
                writer.writerow(row)
new_data, new_codes = get_csv_data('new.csv')
old_data, old_codes = get_csv_data('old.csv')
write_csv('add.csv', new_data, old_codes)
write_csv('remove.csv', old_data, new_codes)
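For reference, the nested-loop attempt in the question matches at most one item because a csv.DictReader is a one-shot iterator: the inner loop consumes old_reader during the first pass of the outer loop, so later passes have nothing left to compare against. Reading each file into memory first avoids this. A small sketch of the behaviour, using in-memory sample data that is made up for illustration:

```python
import csv
import io

# Made-up sample data standing in for old.csv.
old_csv = io.StringIO("STOCK CODE,NAME\nA1,Widget\nB2,Gadget\n")
old_reader = csv.DictReader(old_csv)

first_pass = [row["STOCK CODE"] for row in old_reader]
second_pass = [row["STOCK CODE"] for row in old_reader]  # reader is already exhausted

print(first_pass)   # ['A1', 'B2']
print(second_pass)  # []
```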

Skip lines with strange characters when I read a file

I am trying to read some data files '.txt' and some of them contain strange random characters and even extra columns in random rows, like in the following example, where the second row is an example of a right row:
CTD 10/07/30 05:17:14.41 CTD 24.7813, 0.15752, 1.168, 0.7954, 1497.¸ 23.4848, 0.63042, 1.047, 3.5468, 1496.542
CTD 10/07/30 05:17:14.47 CTD 23.4846, 0.62156, 1.063, 3.4935, 1496.482
I read the description of np.loadtxt and I have not found a solution for my problem. Is there a systematic way to skip rows like these?
The code that I use to read the files is:
import numpy as np
from io import StringIO
#Function to read a datafile
def Read(filename):
    #Change delimiters for spaces
    s = open(filename).read().replace(':',' ')
    s = s.replace(',',' ')
    s = s.replace('/',' ')
    #Take the columns that we need
    data=np.loadtxt(StringIO(s),usecols=(4,5,6,8,9,10,11,12))
    return data
This works without the csv module, unlike the other answer: it reads the file line by line and keeps only the lines that are entirely ASCII.
data = []
def isascii(s):
    return len(s) == len(s.encode())
with open("test.txt", "r") as fil:
    for line in fil:
        res = map(isascii, line)
        if all(res):
            data.append(line)
print(data)
You could use the csv module to read the file one line at a time and apply your desired filter.
import csv
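def isascii(s):
    # pure ASCII text encodes to exactly one byte per character
    return len(s) == len(s.encode())
with open('file.csv') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        # expected_length: the number of columns a valid row should have (set this for your data)
        if len(row) == expected_length and all(isascii(x) for x in row):
            pass  # write row onto numpy array
I got the ASCII check from this thread: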
How to check if a string in Python is in ASCII?
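Both checks (ASCII only, and the expected number of columns) can also be applied to the delimiter-normalised text before it reaches np.loadtxt. A rough sketch follows. The helper name clean_lines is hypothetical, and the default of 13 fields is an assumption inferred from the usecols indices in the question (the highest index used is 12):

```python
def clean_lines(lines, expected_fields=13):
    """Keep only lines that are pure ASCII and have the expected field count."""
    kept = []
    for line in lines:
        # Same delimiter normalisation as the Read() function in the question.
        normalised = line.replace(':', ' ').replace(',', ' ').replace('/', ' ')
        # str.isascii() requires Python 3.7 or later.
        if normalised.isascii() and len(normalised.split()) == expected_fields:
            kept.append(normalised.strip())
    return kept


# The two sample rows from the question; \u00b8 is the stray character in row one.
sample = [
    "CTD 10/07/30 05:17:14.41 CTD 24.7813, 0.15752, 1.168, 0.7954, 1497.\u00b8 "
    "23.4848, 0.63042, 1.047, 3.5468, 1496.542",
    "CTD 10/07/30 05:17:14.47 CTD 23.4846, 0.62156, 1.063, 3.4935, 1496.482",
]
good = clean_lines(sample)
print(len(good))  # 1
```

The surviving lines can then be joined with "\n" and handed to np.loadtxt through StringIO, as in the original function. If the file contains bytes that cannot be decoded at all, opening it with errors='replace' is one way to avoid a decode error; the replacement character is non-ASCII, so those lines are still filtered out.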

Extract numbers and text from csv file with Python3.X

I am trying to extract data from a csv file with python 3.6.
The data contains both numbers and text (URL addresses):
file_name = [-0.47, 39.63, http://example.com]
On multiple forums I found this kind of code:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines,)
But this works for numbers only, the url addresses are read as NaN.
If I add dtype:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines, dtype=None)
The url addresses are read correctly, but they got a "b" at the beginning of the address, such as:
b'http://example.com'
How can I remove that? How can I just have the simple string of text?
I also found this option:
file = open(file_path, "r")
csvReader = csv.reader(file)
for row in csvReader:
    variable = row[i]
    coordList.append(variable)
but it seems it has some issues with python3.
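A note on the b prefix: it marks a bytes object, and calling .decode('utf-8') on it gives a plain str. Recent NumPy versions also accept an encoding argument to genfromtxt, which may avoid the bytes output altogether. The csv approach does work in Python 3 as long as the file is opened in text mode. The following is only a sketch under assumptions: the function name read_coords_and_urls is hypothetical, and the layout (two numeric columns followed by a URL) is taken from the single sample row above:

```python
import csv
import io


def read_coords_and_urls(csv_file, skiplines=0):
    """Return a list of (float, float, str) tuples from an open text file."""
    reader = csv.reader(csv_file, skipinitialspace=True)
    for _ in range(skiplines):
        next(reader, None)  # skip header lines
    records = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        records.append((float(row[0]), float(row[1]), row[2]))
    return records


# Made-up sample mirroring the row shown in the question.
sample = io.StringIO("x,y,url\n-0.47, 39.63, http://example.com\n")
print(read_coords_and_urls(sample, skiplines=1))
# [(-0.47, 39.63, 'http://example.com')]
```

With a real file, pass the object returned by open(file_path, newline='') instead of the StringIO stand-in.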

Python CSV not writing data to file

I am new to writing CSV files with Python and have been reading lots of different posts on the topic, but I have run into a wall with this and could use a little help.
import csv
#headers from the read.csv file that I want to parse and write to the new file.
headers = ['header1', 'header5', 'header6', 'header7']
#open the write.csv file to write the data to
with open("write.csv", 'wb') as csvWriter:
    writer = csv.writer(csvWriter)
#open the main data file that I want to parse data out of and write to write.csv
with open('reading.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',' )
    csvList = list(readCSV)
    #finds where the position of the data I want to pull out and write to write.csv
    itemCode = csvList[0].index(headers[0])
    vendorName = csvList[0].index(headers[1])
    supplierID = csvList[0].index(headers[2])
    supplierItemCode = csvList[0].index(headers[3])
    for row in readCSV:
        writer.writerow([row[itemCode], row[vendorName], row[supplierID], row[supplierItemCode]])
csvWriter.close()
---UPDATE---
I made the changes suggested and tried commenting out the following part of the code & changing 'wb' to 'w' and the program worked. However, I don't understand why, and how do I set this up so that I can list the header I want to pull out?
csvList = list(readCSV)
itemCode = csvList[0].index(headers[0])
vendorName = csvList[0].index(headers[1])
supplierID = csvList[0].index(headers[2])
supplierItemCode = csvList[0].index(headers[3])
Here is my updated code:
headers = ['header1', 'header5', 'header6', 'header7']
#open the write.csv file to write the data to
with open("write.csv", 'wb') as csvWriter, open('reading.csv') as csvfile:
    writer = csv.writer(csvWriter)
    readCSV = csv.reader(csvfile, delimiter=',' )
    """csvList = list(readCSV)
    #finds where the position of the data I want to pull out and write to write.csv
    itemCode = csvList[0].index(headers[0])
    vendorName = csvList[0].index(headers[1])
    supplierID = csvList[0].index(headers[2])
    supplierItemCode = csvList[0].index(headers[3])"""
    for row in readCSV:
        writer.writerow([row[0], row[27], row[28], row[29]])
It looks like you want to write a subset of columns to a new file. This problem is simpler with DictReader/DictWriter. Note the correct use of open when using Python 3.x. Your attempt was using the Python 2.x way.
import csv
# headers you want in the order you want
headers = ['header1','header5','header6','header7']
with open('write.csv','w',newline='') as csvWriter,open('read.csv',newline='') as csvfile:
    writer = csv.DictWriter(csvWriter,fieldnames=headers,extrasaction='ignore')
    readCSV = csv.DictReader(csvfile)
    writer.writeheader()
    for row in readCSV:
        writer.writerow(row)
Test data:
header1,header2,header3,header4,header5,header6,header7
1,2,3,4,5,6,7
11,22,33,44,55,66,77
Output:
header1,header5,header6,header7
1,5,6,7
11,55,66,77
If you want to use both files within the same block, you should do something like this:
with open("write.csv", 'w', newline='') as csvWriter, open('reading.csv', newline='') as csvfile:
    writer = csv.writer(csvWriter)
    readCSV = csv.reader(csvfile, delimiter=',')
    csvList = list(readCSV)
    #finds where the position of the data I want to pull out and write to write.csv
    itemCode = csvList[0].index(headers[0])
    vendorName = csvList[0].index(headers[1])
    supplierID = csvList[0].index(headers[2])
    supplierItemCode = csvList[0].index(headers[3])
    # list(readCSV) has already consumed the reader, so iterate over the stored rows
    for row in csvList:
        writer.writerow([row[itemCode], row[vendorName], row[supplierID], row[supplierItemCode]])
# no explicit close() is needed: the with block closes both files
The with open() as csvWriter: construct closes the file as soon as you exit the block, so by the time you reach writer.writerow the file is already closed.
You need to put all of the writing inside the with open block.
with open("write.csv", 'w', newline='') as csvWriter:
    ....
    #Do all writing within this block
    ....
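On the "I don't understand why" part of the update: list(readCSV) reads every row out of the reader, so a later for row in readCSV has nothing left to loop over, which is why commenting those lines out made rows appear. One way to keep the header lookup is to take only the first row with next() and then continue iterating. A minimal sketch, with a hypothetical helper name copy_columns and in-memory files standing in for reading.csv and write.csv:

```python
import csv
import io


def copy_columns(infile, outfile, wanted):
    """Write only the columns named in `wanted`, in that order."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader)  # consumes just the first row
    indexes = [header.index(name) for name in wanted]
    writer.writerow(wanted)
    for row in reader:  # continues from the second row
        writer.writerow([row[i] for i in indexes])


# In-memory stand-ins for reading.csv and write.csv (made-up data).
source = io.StringIO("header1,header2,header5\n1,2,5\n11,22,55\n")
target = io.StringIO()
copy_columns(source, target, ["header1", "header5"])
print(target.getvalue())
```

With real files, open both in one with statement in text mode with newline='' and pass the file objects in.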

combining results from two different cursors and then writing to a csv file in python 3

I am new to Python and I am working on a script that generates a CSV report from database data when given an ID as input. It works fine with one cursor. Now I have two different databases and I want to generate a single report that combines the results of both cursors. How do I combine the results from each cursor horizontally? Is that possible in Python 3? Please give me some suggestions. Here is the code I am working on, which uses one cursor:
cur = conn.cursor()
cur.execute("Select * from FailureAnalysisResults where LotName = ? and TestResultID = ?", (lot_name, testResultID))
with open(csvfile, 'w', newline='') as fout:
    writer = csv.writer(fout, delimiter=',', quotechar=' ', quoting=csv.QUOTE_MINIMAL)
    writer.writerow([i[0] for i in cur.description]) # heading row
    writer.writerows(cur.fetchall())
I want to do the above for another database and combine the results of both the cursors before writing it to the csv file. I tried checking out arrays but I am stuck and need some suggestions. Thank you.
Well, I was able to achieve this with the following code snippet:
list1 = list(cur1)
list2 = list(cur2)
list3 = list(zip(list1, list2))
with open(csvfile, 'w', newline='') as fout:
    writer = csv.writer(fout, delimiter=',', quotechar=' ', quoting=csv.QUOTE_MINIMAL)
    writer.writerow([i[0] for i in cur1.description] + [j[0] for j in cur2.description]) # heading row
    writer.writerows(list3)
The above code concatenates the results, but the generated .csv file has formatting issues: it also contains the characters '(', ',' and ')'.
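The parentheses most likely appear because zip(list1, list2) yields pairs of rows, so each CSV field ends up being a whole row rendered as text. Flattening each pair into a single row before writing should avoid that. A sketch with made-up rows and column names standing in for the two cursors:

```python
import csv
import io

# Made-up rows standing in for list(cur1) and list(cur2).
rows1 = [(1, "lotA"), (2, "lotB")]
rows2 = [("pass", 0.5), ("fail", 0.9)]

# zip() pairs the rows; adding the tuples flattens each pair into one row.
combined = [tuple(a) + tuple(b) for a, b in zip(rows1, rows2)]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["id", "lot", "result", "score"])  # hypothetical column names
writer.writerows(combined)
print(out.getvalue())
```

Note that zip() stops at the shorter of the two lists. If the cursors can return different numbers of rows, itertools.zip_longest with a suitable fill value is an alternative.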
