Incorrect formatting while reading csv - python-3.x

CSV format (3 columns):
id_numb formatted_id Comment_Txt
1 Z007 sample text says good morning.
Code to read:
with open("file.csv", 'r', newline='') as csvfile:
    file_reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in file_reader:
        print(row)
Expected output:
['id_numb', 'formatted_id', 'Comment_Txt']
['1', 'Z007', 'sample','text' ,'says','good','morning.']
My output:
['1,Z007,sample', 'text' ,'says','good','morning.']
The first 3 tokens are automatically joined. I am not able to understand the mistake. Any suggestions will be helpful.

import csv
from functools import reduce
with open("file.csv", 'r', newline='') as csvfile:
    file_reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in file_reader:
        print(reduce(lambda x, y: x+y, [i.split(' ') for i in row]))
output:
['id_numb', 'formatted_id', 'Comment_Txt']
['1', 'Z007', 'sample', 'text', 'says', 'good', 'morning.']
Is this the expected output?

You could try using
with open("file.csv", 'r', newline='') as csvfile:
    file_reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in file_reader:
        print(row)
since your first row seems to be of the form
1,Z007,sample text says good morning
and using ' ' as a delimiter basically splits anything separated by a space into two different columns.
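To see the difference side by side, here is a minimal sketch that parses the same text with both delimiters. It assumes the file is actually comma-separated (as the output in the question suggests); the in-memory sample string and the parse helper are made up for illustration.

```python
import csv
import io

# Hypothetical in-memory stand-in for file.csv, assuming it is comma-separated.
sample = "id_numb,formatted_id,Comment_Txt\n1,Z007,sample text says good morning.\n"


def parse(text, delimiter):
    """Parse CSV text with the given delimiter and return the rows as a list."""
    buffer = io.StringIO(text, newline="")
    return list(csv.reader(buffer, delimiter=delimiter, quotechar="|"))


rows_by_comma = parse(sample, ",")
rows_by_space = parse(sample, " ")

print(rows_by_comma[1])  # three fields, the comment stays in one piece
print(rows_by_space[1])  # the commas stay glued inside the first token
```

With the space delimiter the commas are treated as ordinary characters, which is why the first three values appear joined.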

Related

When editing data in a CSV how do you exclude/only include certain columns?

I've got a CSV of client details for a bank project in Python 3. I've managed to create a function in which you can edit the client details, but I want to exclude the last 2 columns from editing and can't figure out how.
Example of CSV data:
first_name,last_name,title,pronouns,dob,occupation,account_balance,overdraft_limit
Garner,Coupman,Ms,Male,14/04/2022,General Manager,2200.76,2.28
Jens,Eldrid,Honorable,Male,13/11/2021,Research Associate,967.64,79.15
Edit function:
if choice == "4":
    editClient = int(input("Please enter the index number of the client you wish to edit: "))
    print("Please enter the details for each of the following: ")
    for i in range(len(existing_clients[0])):
        newDetails = input("Enter new data for " + str(existing_clients[0][i]) + ": ")
        existing_clients[editClient][i] = newDetails
    changes = input("Are you sure you'd like to make these changes? Enter Yes or No")
    if changes == ("Yes"):
        # Newline fixed the spacing issue I was having
        with open("mock_data.csv", "w+", newline="") as file:
            reader = csv.writer(file)
            for i in range(len(existing_clients)):
                reader.writerow(existing_clients[i])
    if changes == ("No"):
        exit()
I've tried changing
for i in range(len(existing_clients[0])):
to
for i in range(len(existing_clients[0:6])):
and I thought this worked until I tried editing a client later than row 6.
I've also messed around a lot with
newDetails = input("Enter new data for " + str(existing_clients[0][i]) + ": ")
to no avail.
Edit the row with slicing and exclude the last two columns:
with open("mock_data.csv", "w", newline="") as file:
    writer = csv.writer(file)
    for client in existing_clients:
        writer.writerow(client[:-2])  # exclude last two columns
Working example with data:
input.csv
first_name,last_name,title,pronouns,dob,occupation,account_balance,overdraft_limit
Garner,Coupman,Ms,Male,14/04/2022,General Manager,2200.76,2.28
Jens,Eldrid,Honorable,Male,13/11/2021,Research Associate,967.64,79.15
test.py
import csv
with open('input.csv', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for line in data:
        writer.writerow(line[:-2])
output.csv
first_name,last_name,title,pronouns,dob,occupation
Garner,Coupman,Ms,Male,14/04/2022,General Manager
Jens,Eldrid,Honorable,Male,13/11/2021,Research Associate
To select specific columns, you could concatenate different slices:
writer.writerow(line[:2] + line[5:6]) # column indexes 0, 1, and 5
Or use DictReader/DictWriter:
import csv
with open('input.csv', newline='') as f:
    reader = csv.DictReader(f)
    data = list(reader)
with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['last_name', 'occupation'], extrasaction='ignore')
    writer.writeheader()
    for line in data:
        writer.writerow(line)
output.csv
last_name,occupation
Coupman,General Manager
Eldrid,Research Associate
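If the goal is to keep the last two columns in the file but skip them while prompting, one possible approach is to slice the header row rather than the list of rows. Note that existing_clients[0:6] in the question slices rows, so len() of it is the number of rows, which only happens to equal 6 when there are at least six rows. The sketch below is an assumption about what the question wants; the edit_client helper and its protected and prompt parameters are hypothetical names, with prompt added only so the example runs without keyboard input.

```python
def edit_client(clients, index, protected=2, prompt=input):
    """Prompt for every column of clients[index] except the last `protected` ones."""
    header = clients[0]
    editable = len(header) - protected  # number of columns the user may change
    for i in range(editable):
        clients[index][i] = prompt("Enter new data for " + header[i] + ": ")
    return clients[index]


# Sample data copied from the question's CSV.
existing_clients = [
    ["first_name", "last_name", "title", "pronouns", "dob",
     "occupation", "account_balance", "overdraft_limit"],
    ["Garner", "Coupman", "Ms", "Male", "14/04/2022",
     "General Manager", "2200.76", "2.28"],
]

# Canned answers stand in for input() so the sketch is non-interactive.
answers = iter(["Ann", "Lee", "Dr", "Female", "01/01/1990", "Engineer"])
edited = edit_client(existing_clients, 1, prompt=lambda _message: next(answers))
print(edited)
```

In the real program, calling edit_client(existing_clients, editClient) with the default prompt would ask the user for the first six fields and leave account_balance and overdraft_limit untouched.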

How to read a column and write it as a row in python?

I am trying to read a csv and then transpose one column into a row.
I tried following a tutorial for reading a csv and then one for writing, but the data doesn't stay saved to the list when I try to write the row.
import csv
f = open('bond-dist-rep.csv')
csv_f = csv.reader(f)
bondlength = []
with open("bond-dist-rep.csv") as f:
    for row in csv_f:
        bondlength.append(row[1])
print (bondlength)
print (len(bondlength))
with open('joined.csv', 'w', newline='') as csvfile:
    csv_a = csv.writer (csvfile, delimiter=',',quotechar='"',
                        quoting=csv.QUOTE_ALL)
    csv_a.writerow(['bondlength'])
with open('joined.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row)
        print(row[0])
f.close()
The problem is that you keep only one value from each line and then write just the literal string 'bondlength' to the new file instead of the collected values.
To transpose the rows you have read, you can use the zip function.
I also removed the first open call, which is redundant because the with statement already opens the file.
Here is the final code:
import csv
bondlength = []
with open("bond-dist-rep.csv", newline='') as csv_f:
    read_csv = csv.reader(csv_f)
    for row in read_csv:
        bondlength.append(row)
# delete the header if you have one
bondlength.pop(0)
# newline='' avoids extra blank lines in the output on Windows
with open('joined.csv', 'w', newline='') as csvfile:
    csv_a = csv.writer(csvfile, delimiter=',')
    # zip(*bondlength) turns each column into one output row
    for transpose_row in zip(*bondlength):
        csv_a.writerow(transpose_row)
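If only one column should end up as a row, as the question describes, the same zip idea can be narrowed down. This is a small sketch on made-up in-memory data: the column names and values are assumptions, and only the position of the bond length column (index 1) comes from the question's code.

```python
import csv
import io

# Hypothetical contents standing in for bond-dist-rep.csv.
source = "frame,bondlength\n1,1.09\n2,1.10\n3,1.08\n"

rows = list(csv.reader(io.StringIO(source, newline="")))
header, body = rows[0], rows[1:]

# zip(*body) yields one tuple per column; index 1 is the bond length column.
columns = list(zip(*body))
bond_row = list(columns[1])

# Write that single column out as one CSV row (to memory here, not to a file).
out = io.StringIO(newline="")
csv.writer(out).writerow(bond_row)
print(out.getvalue())
```

Writing to a real file works the same way: replace the StringIO object with open('joined.csv', 'w', newline='').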

How to bind multi-word headers to their values from a CSV as a dictionary

I'm scraping data from an HTML table and want to bind multi-word headers together as dictionary keys for the correct values.
How should I bind the correct headers?
How should I remove unwanted headers at the end?
with open(filename, 'w') as f:
    for span in all_spans:
        num_page_items = len(all_spans)
        f.write(span.text)
driver.close()
with open(filename) as csvfile:  # Reading csv file
    csvreader = csv.DictReader(csvfile, delimiter=' ')
    mydict = {}
    for col in csvreader:
        print(col)
I expect the output to look something like -
Symbol{'AIZ', ('Shares': 1520), ('Purchase Price': 106.31), ('Market Price': 111.59), ('Total Value': 169,616.80), ('Gain/Loss': 8,025.60), ('Gain/Loss %': 0.050%)}
If I use csv.DictReader, I get -
OrderedDict([('Symbol', 'AIZ'), ('Shares', '1520'), ('Purchase', '$106.31'), ('Price', '$8,025.60'), ('Market', '$169,616.80'), ('Total', '0.050%'), ('Value', 'Buy'), ('Gain/Loss', 'Sell'), ('%', None), ('Actions', None)])
And if I use csv.reader, I get -
['AIZ', '1520', '$106.31', '$111.59', '$169,616.80', '$8,025.60', '0.050%', 'Buy', '|', 'Sell']
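One possible way to get the multi-word keys is to skip the header line in the file and pair each value list from csv.reader with an explicit list of header names. The sketch below is not from the original thread: the bind_row helper is a hypothetical name, the header names are taken from the expected output above, and it assumes that no value contains a space, since a space delimiter would otherwise split it.

```python
# Header names copied from the expected output in the question.
HEADERS = [
    "Symbol", "Shares", "Purchase Price", "Market Price",
    "Total Value", "Gain/Loss", "Gain/Loss %",
]


def bind_row(values, headers=HEADERS):
    """Pair header names with values; anything past the last header is dropped."""
    return dict(zip(headers, values[:len(headers)]))


# The row exactly as csv.reader returned it in the question.
row = ['AIZ', '1520', '$106.31', '$111.59', '$169,616.80',
       '$8,025.60', '0.050%', 'Buy', '|', 'Sell']

record = bind_row(row)
print(record)
```

Slicing to the number of headers is what removes the unwanted trailing 'Buy', '|' and 'Sell' cells.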

Python 3.6 CSV Date Formatting d/m/yyyy to ddmmyyyy

I have a date column in a CSV file which I am trying to format from dd/mm/yyyy to ddmmyyyy. Some of the days and months are single digit which leave them as dmyyyy. When I run a print statement all of the rows output correctly.
import csv
with open(r'input file path', 'r') as csvfile:
    with open(r'outputfilepath', 'w') as output:
        w = csv.writer(output)
        r = csv.reader(csvfile)
        for row in r:
            #this takes care of incomplete rows at the end
            if len(row[6])>1:
                dt = row[6].split("/")
                n = 0
                for n in range(len(dt)):
                    if len(dt[n])<2:
                        dt[n] = '0'+dt[n]
                    else:
                        dt[n]
                row[6] = dt[0]+dt[1]+dt[2]
                print(row)
            else:
                break
Print Output:
['a', '', 'Tom', 'Smith', 'J ', '', '12201956']
['b', '', 'Rick ', 'JOHNSON ', ' ', '', '08121922']
['c', '', 'Morty', 'Harvey', ' ', '', '06031940']
When I change the print to write rows:
import csv
with open(r'input file path', 'r') as csvfile:
    with open(r'outputfilepath', 'w') as output:
        w = csv.writer(output)
        r = csv.reader(csvfile)
        for row in r:
            #this takes care of incomplete rows at the end
            if len(row[6])>1:
                dt = row[6].split("/")
                n = 0
                for n in range(len(dt)):
                    if len(dt[n])<2:
                        dt[n] = '0'+dt[n]
                    else:
                        dt[n]
                row[6] = dt[0]+dt[1]+dt[2]
                w.writerows(row)
            else:
                break
I get the output below. I've tried moving the writerows function around with no luck. Looking at the CSV module documentation it should delimit on the commas. I'm relatively new to python.
To fix your problem, change w.writerows(row) to w.writerow(row). The difference between the singular and the plural is that the plural version expects a collection of rows to write, so it treats each item in the row you gave it as a separate row.
Also add newline='' to your open calls, because the csv module interacts poorly with universal newline mode on Windows. (It tries to write '\r\n'. Universal newline translates that to '\r\r\n'.)
Finally, use datetime to fix your dates.
import csv
from datetime import datetime
with open(inpath, 'r', newline='') as fin:
    with open(outpath, 'w', newline='') as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        for row in reader:
            row[6] = datetime.strptime(row[6], '%m/%d/%Y').strftime('%m%d%Y')
            writer.writerow(row)
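To make the writerow/writerows difference visible without touching any files, here is a small in-memory sketch. The sample row is adapted from the print output in the question, with the date put back into a slash-separated form; that original form is an assumption.

```python
import csv
import io
from datetime import datetime

# Row adapted from the question; the slash-separated date is an assumed original value.
row = ['a', '', 'Tom', 'Smith', 'J ', '', '12/20/1956']
row[6] = datetime.strptime(row[6], '%m/%d/%Y').strftime('%m%d%Y')

# writerow: the list is one row, each string is one field.
good = io.StringIO(newline='')
csv.writer(good).writerow(row)

# writerows: each string is taken as a row, so each character becomes a field.
bad = io.StringIO(newline='')
csv.writer(bad).writerows(row)

# strptime accepts single-digit months and days; strftime pads them with zeros.
padded = datetime.strptime('6/3/1940', '%m/%d/%Y').strftime('%m%d%Y')

print(good.getvalue())
print(bad.getvalue())
print(padded)
```

The second buffer contains lines such as T,o,m, which matches the one-character-per-column output described in the question.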

How to get rid of empty strings from csv file's row using Python

I am writing code which takes rows from a CSV file and transfers them into lists of integers. However, if I leave some blank entries in the row, I get a "list index out of range" error. Here is the code:
import csv
with open('Test.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    rows = [[int(row[0]), int(row[1]),int(row[2]),int(row[3])] for row in reader]
for row in rows:
    print(row)
I looked up some similar questions on this website and the best idea for the solution I got was:
rows = [[int(row[0]), int(row[1]),int(row[2]),int(row[3])] for row in reader if len(row)>1]
However, it resulted in the same error.
Thanks in advance!
The problem is that if you don't have an int or it is empty the cast will fail.
The below example inserts a zero '0' in case the value is not an int or is empty. Replace it by what you want.
You can optimize the code but this should work:
Edit: Shorter version
import csv
def RepresentsInt(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
l = []
with open('test.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        l.append([int(r) if RepresentsInt(r) else 0 for r in row])
for row in l:
    print(row)
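A "list index out of range" error comes from rows with fewer than four fields (including completely blank lines), while an empty cell such as '' raises ValueError in int(). The following sketch tries to cover both cases; the to_int and parse_rows helpers, the width of 4, the default of 0 and the sample data are all assumptions for illustration.

```python
import csv
import io


def to_int(value, default=0):
    """Convert a cell to int, falling back to `default` for blank or non-numeric cells."""
    try:
        return int(value)
    except ValueError:
        return default


def parse_rows(lines, width=4, default=0):
    """Read CSV lines into lists of exactly `width` ints, skipping blank lines."""
    parsed = []
    for row in csv.reader(lines):
        if not row:
            continue  # csv.reader returns [] for a completely blank line
        # Pad short rows with empty cells so every row has `width` entries.
        padded = row[:width] + [""] * (width - len(row))
        parsed.append([to_int(cell, default) for cell in padded])
    return parsed


# Hypothetical file contents: a full row, a blank cell, a blank line, a short row.
sample = io.StringIO("1,2,3,4\n5,,7,8\n\n9,10\n", newline="")
rows = parse_rows(sample)
for row in rows:
    print(row)
```

Padding to a fixed width keeps every output list the same length, which is usually easier to work with than rows of varying size.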
