Python 3.6 CSV Date Formatting d/m/yyyy to ddmmyyyy - python-3.x

I have a date column in a CSV file that I am trying to reformat from dd/mm/yyyy to ddmmyyyy. Some of the days and months are single digits, which leaves them as dmyyyy. When I run a print statement, all of the rows output correctly.
import csv
with open(r'input file path', 'r') as csvfile:
    with open(r'outputfilepath', 'w') as output:
        w = csv.writer(output)
        r = csv.reader(csvfile)
        for row in r:
            # this takes care of incomplete rows at the end
            if len(row[6]) > 1:
                dt = row[6].split("/")
                for n in range(len(dt)):
                    if len(dt[n]) < 2:
                        dt[n] = '0' + dt[n]
                row[6] = dt[0] + dt[1] + dt[2]
                print(row)
            else:
                break
Print Output:
['a', '', 'Tom', 'Smith', 'J ', '', '12201956']
['b', '', 'Rick ', 'JOHNSON ', ' ', '', '08121922']
['c', '', 'Morty', 'Harvey', ' ', '', '06031940']
When I change the print to write rows:
import csv
with open(r'input file path', 'r') as csvfile:
    with open(r'outputfilepath', 'w') as output:
        w = csv.writer(output)
        r = csv.reader(csvfile)
        for row in r:
            # this takes care of incomplete rows at the end
            if len(row[6]) > 1:
                dt = row[6].split("/")
                for n in range(len(dt)):
                    if len(dt[n]) < 2:
                        dt[n] = '0' + dt[n]
                row[6] = dt[0] + dt[1] + dt[2]
                w.writerows(row)
            else:
                break
I get the output below. I've tried moving the writerows call around with no luck. Looking at the csv module documentation, it should delimit on the commas. I'm relatively new to Python.

To fix your problem, change w.writerows(row) to w.writerow(row). The difference between the singular and the plural is that the plural version thinks it's getting a collection of rows to write, so it treats each item in the row you gave it as a single row.
Also, pass newline='' to your open calls, because the csv module interacts poorly with universal newline mode on Windows. (It tries to write '\r\n', and universal newline translation turns that into '\r\r\n'.)
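A quick way to see the difference between the two methods (a minimal sketch using io.StringIO in place of a real file):

```python
import csv
import io

row = ['1', 'Tom', '12201956']

# writerow: one call writes one CSV line
buf = io.StringIO()
csv.writer(buf).writerow(row)
print(repr(buf.getvalue()))   # '1,Tom,12201956\r\n'

# writerows: expects a collection of rows, so each string in `row`
# is treated as its own row and iterated character by character
buf2 = io.StringIO()
csv.writer(buf2).writerows(row)
print(repr(buf2.getvalue()))  # '1\r\nT,o,m\r\n1,2,2,0,1,9,5,6\r\n'
```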
Finally, use datetime to fix your dates.
import csv
from datetime import datetime
with open(inpath, 'r', newline='') as fin:
    with open(outpath, 'w', newline='') as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        for row in reader:
            row[6] = datetime.strptime(row[6], '%m/%d/%Y').strftime('%m%d%Y')
            writer.writerow(row)
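If you'd rather keep the split-and-pad approach from the question, str.zfill does the zero-padding in one line (a sketch; pad_date is a hypothetical helper name):

```python
def pad_date(d):
    """'6/3/1940' -> '06031940': zero-pad each part of a d/m/yyyy string."""
    return ''.join(part.zfill(2) for part in d.split('/'))

print(pad_date('6/3/1940'))    # 06031940
print(pad_date('12/20/1956'))  # 12201956
```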

Related

When editing data in a CSV how do you exclude/only include certain columns?

I've got a CSV of client details for a bank project in Python 3. I've managed to create a function in which you can edit the client details, but I want to exclude the last 2 columns and can't figure out how.
Example of CSV data:
first_name,last_name,title,pronouns,dob,occupation,account_balance,overdraft_limit
Garner,Coupman,Ms,Male,14/04/2022,General Manager,2200.76,2.28
Jens,Eldrid,Honorable,Male,13/11/2021,Research Associate,967.64,79.15
Edit function:
if choice == "4":
    editClient = int(input("Please enter the index number of the client you wish to edit: "))
    print("Please enter the details for each of the following: ")
    for i in range(len(existing_clients[0])):
        newDetails = input("Enter new data for " + str(existing_clients[0][i]) + ": ")
        existing_clients[editClient][i] = newDetails
    changes = input("Are you sure you'd like to make these changes? Enter Yes or No")
    if changes == ("Yes"):
        # newline="" fixed the spacing issue I was having
        with open("mock_data.csv", "w+", newline="") as file:
            writer = csv.writer(file)
            for i in range(len(existing_clients)):
                writer.writerow(existing_clients[i])
    if changes == ("No"):
        exit()
I've tried changing
for i in range(len(existing_clients[0])):
to
for i in range(len(existing_clients[0:6])):
and I thought this worked until I tried editing a client beyond row 6.
I've also messed around a lot with
newDetails = input("Enter new data for " + str(existing_clients[0][i]) + ": ")
to no avail.
Edit the row with slicing to exclude the last two columns:
with open("mock_data.csv", "w", newline="") as file:
    writer = csv.writer(file)
    for client in existing_clients:
        writer.writerow(client[:-2])  # exclude last two columns
Working example with data:
input.csv
first_name,last_name,title,pronouns,dob,occupation,account_balance,overdraft_limit
Garner,Coupman,Ms,Male,14/04/2022,General Manager,2200.76,2.28
Jens,Eldrid,Honorable,Male,13/11/2021,Research Associate,967.64,79.15
test.py
import csv
with open('input.csv', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for line in data:
        writer.writerow(line[:-2])
output.csv
first_name,last_name,title,pronouns,dob,occupation
Garner,Coupman,Ms,Male,14/04/2022,General Manager
Jens,Eldrid,Honorable,Male,13/11/2021,Research Associate
To select specific columns, you could concatenate different slices:
writer.writerow(line[:2] + line[5:6]) # column indexes 0, 1, and 5
Or use DictReader/DictWriter:
import csv
with open('input.csv', newline='') as f:
    reader = csv.DictReader(f)
    data = list(reader)
with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['last_name', 'occupation'], extrasaction='ignore')
    writer.writeheader()
    for line in data:
        writer.writerow(line)
output.csv
last_name,occupation
Coupman,General Manager
Eldrid,Research Associate

Incorrect formatting while reading csv

CSV format (3 columns):
id_numb formatted_id Comment_Txt
1 Z007 sample text says good morning.
Code to read:
with open("file.csv", 'r', newline='') as csvfile:
    file_reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in file_reader:
        print(row)
Expected OP:
['id_numb', 'formatted_id', 'Comment_Txt']
['1', 'Z007', 'sample', 'text', 'says', 'good', 'morning.']
My OP:
['1,Z007,sample', 'text' ,'says','good','morning.']
The first 3 tokens are automatically joined. I am not able to understand the mistake. Any suggestions would be helpful.
import csv
from functools import reduce
with open("file.csv", 'r', newline='') as csvfile:
    file_reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in file_reader:
        print(reduce(lambda x, y: x + y, [i.split(' ') for i in row]))
output:
['id_numb', 'formatted_id', 'Comment_Txt']
['1', 'Z007', 'sample', 'text', 'says', 'good', 'morning.']
Is it Expected OP?
You could try using
with open("file.csv", 'r', newline='') as csvfile:
    file_reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in file_reader:
        print(row)
since your first row seems to be of the form
1,Z007,sample text says good morning
and using ' ' as the delimiter splits anything separated by a space into separate columns.
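The effect of the delimiter is easy to reproduce (a sketch using io.StringIO to stand in for the file):

```python
import csv
import io

line = '1,Z007,sample text says good morning.\n'

# delimiter=' ' splits on spaces, so the comma-joined fields stay glued together
print(next(csv.reader(io.StringIO(line), delimiter=' ')))
# ['1,Z007,sample', 'text', 'says', 'good', 'morning.']

# delimiter=',' splits on commas, keeping the free-text comment as one field
print(next(csv.reader(io.StringIO(line), delimiter=',')))
# ['1', 'Z007', 'sample text says good morning.']
```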

Using Python to delete rows in a csv file that contain certain chars

I have a csv file that I'm trying to clean up. I am looking at the first column and deleting any rows whose first column contains anything other than letters (I'm working on cleaning up rows where the first column has a ^ or . for now). It seems all my attempts either do nothing or nuke the whole csv file.
Interestingly enough, I have code that can identify the problem rows and it seems to work fine
def FindProblemRows():
    with open('Data.csv') as csvDataFile:
        ProblemRows = []
        data = [row for row in csv.reader(csvDataFile)]
        length = len(data)
        for i in range(0, length):
            if data[i][0].find('^') != -1 or data[i][0].find('.') != -1:
                ProblemRows.append(i)
        return ProblemRows
Below I have my latest three failed attempts. Where am I going wrong and what should I change? Which of these comes closest?
'''
def Clean():
    with open("Data.csv", "w", newline='') as f:
        data = list(csv.reader(f))
        writer = csv.writer(f)
        Problems = FindProblemRows()
        data = list(csv.reader(f))
        length = len(data)
        for row in data:
            for i in Problems:
                for j in range(0, length):
                    if row[j] == i:
                        writer.writerow(row)
                        Problems.remove(i)

def Clean():
    Problems = FindProblemRows()
    with open('Data.csv') as csvDataFile:
        csvReader = csv.reader(csvDataFile)
        data = [row for row in csv.reader(csvDataFile)]
        length = len(data)
        width = len(data[0])
    with open("Data.csv", "r") as csvFile:
        csvReader = csv.reader(csvFile)
        with open("CleansedData.csv", "w") as csvResult:
            csvWrite = csv.writer(csvResult)
            for i in Problems:
                for j in range(0, length):
                    if data[j] == i:
                        del data[j]
            for j in range(0, length):
                csvWrite.writerow(data[j])
'''
def Clean():
    with open("Data.csv", 'r') as infile, open("CleansedData.csv", 'w') as outfile:
        data = [row for row in infile]
        for row in infile:
            for column in row:
                if "^" not in data[row][0]:
                    if "." not in data[row][0]:
                        outfile.write(data[row])
Update
Now I have:
def Clean():
    df = pd.read_csv('Data.csv')
    df = df['^' not in df.Symbol]
    df = df['.' not in df.Symbol]
but I get KeyError: True
Shouldn't that work?
You should check whether the column Symbol contains any of the characters of interest. The str.contains method takes a regular expression:
bad_rows = df.Symbol.str.contains('[.^]')
df_clean = df[~bad_rows]
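For example, with a made-up Symbol column (the data here is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'Symbol': ['AAPL', 'BRK.B', 'ABC^D', 'MSFT']})

# Inside a character class, both '.' and '^' are literal characters,
# so this flags any symbol containing a dot or a caret
bad_rows = df.Symbol.str.contains('[.^]')
df_clean = df[~bad_rows]
print(df_clean.Symbol.tolist())  # ['AAPL', 'MSFT']
```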

Get python to add serial nos to each entry as it is run

I am new to programming, and probably there is an answer to my question somewhere, like here, the closest I found after searching for days. Most of the info deals with existing CSVs or hardcoded data. I am trying to make the program create data every time it runs and work on that, so I'm a little stumped here.
The Problem:
I can't seem to get Python to attach serial nos to each entry when I run the program I am making to log my study blocks. It has various fields; the following are two of them:
Date Time
12-03-2018 11:30
Following is the code snippet:
d = ''
while d == '':
    d = input('Date:')
    try:
        valid_date = dt.strptime(d, '%Y-%m-%d')
    except ValueError:
        d = ''
        print('Please input date in YYYY-MM-DD format.')

t = ''
while t == '':
    t = input('Time:')
    try:
        valid_time = dt.strptime(t, '%H:%M')
    except ValueError:
        d = ''
        print('Please input time in HH:MM format.')

header = csv.DictWriter(outfile, fieldnames=['UID', 'Date', 'Time', 'Topic', 'Objective', 'Why', 'Summary'], delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
header.writeheader()
log_input = csv.writer(outfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
log_input.writerow([d, t, topic, objective, why, summary])
outfile.close()

df = pd.read_csv('E:\Coursera\HSU\python\pom_blocks_log.csv')
df = pd.read_csv('E:\pom_blocks_log.csv')
df = df.reset_index()
df.columns[0] = 'UID'
df['UID'] = df.index
print(df)
I get the following error when i run the program with the df block:
TypeError: Index does not support mutable operations
I'm new to Python and don't really know how to work with data structures, so I am building small programs to learn. Any help is highly appreciated, and apologies if this is a duplicate; please point me in the right direction.
So, I figured it out. Following is the process I followed:
I save the CSV file using the csv module.
I load the CSV file in pandas as a dataframe.
What this does is allow me to append user entries to the CSV every time the program is run, and then load it as a dataframe and use pandas to manipulate the data accordingly. Then I added a generator to clean the lines of the delimiter character ',' so that the file could be loaded as a dataframe even for string columns where ',' is accepted as valid input. Maybe this is a roundabout approach, but it works.
Following is the code:
import csv
from csv import reader
from datetime import datetime
import pandas as pd
import numpy as np
with open(r'E:\Coursera\HSU\08_programming\trLog_df.csv', 'a', encoding='utf-8') as csvfile:
    # Date
    d = ''
    while d == '':
        d = input('Date: ')
        try:
            valid_date = datetime.strptime(d, '%Y-%m-%d')
        except ValueError:
            d = ''
            print("Incorrect data format, should be YYYY-MM-DD")
    # Time
    t = ''
    while t == '':
        t = input('Time: ')
        try:
            valid_date = datetime.strptime(t, '%H:%M')
        except ValueError:
            t = ''
            print("Incorrect data format, should be HH:MM")
    log_input = csv.writer(csvfile, delimiter=',',
                           quotechar='|', quoting=csv.QUOTE_MINIMAL)
    log_input.writerow([d, t])

# Function to clean lines of the delimiter ','
def merge_last(file_name, merge_after_col=7, skip_lines=0):
    with open(file_name, 'r') as fp:
        for i, line in enumerate(fp):
            if i < 2:
                continue
            spl = line.strip().split(',')
            yield (*spl[:merge_after_col], ','.join(spl[merge_after_col:2]))

# Generator to clean the lines
gen = merge_last(r'E:\Coursera\HSU\08_programming\trLog_df.csv', 1)
# get the column names
header = next(gen)
# create the data frame
df = pd.DataFrame(gen, columns=header)
df.head()
print(df)
If anybody has a better solution, it would be enlightening to know how to do it with efficiency and elegance.
Thank you for reading.
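On the original TypeError: a pandas Index is immutable, so df.columns[0] = 'UID' can never work. If the goal is just a serial-number column, df.insert is a simpler route (a sketch with made-up data):

```python
import pandas as pd

df = pd.DataFrame({'Date': ['12-03-2018', '13-03-2018'],
                   'Time': ['11:30', '09:15']})

# Insert a UID column at position 0 instead of mutating df.columns
df.insert(0, 'UID', range(1, len(df) + 1))
print(df.columns.tolist())  # ['UID', 'Date', 'Time']
```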

How to get rid of empty strings from csv file's row using Python

I am writing code which takes rows from a CSV file and transfers them into lists of integers. However, if I leave some blank entries in a row, I get a "list index out of range" error. Here is the code:
import csv
with open('Test.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    rows = [[int(row[0]), int(row[1]), int(row[2]), int(row[3])] for row in reader]
    for row in rows:
        print(row)
I looked up some similar questions on this website and the best idea for the solution I got was:
rows = [[int(row[0]), int(row[1]), int(row[2]), int(row[3])] for row in reader if len(row) > 1]
However, it resulted in the same error.
Thanks in advance!
The problem is that if a value is not an int or is empty, the cast will fail.
The example below inserts a zero (0) when the value is not an int or is empty. Replace that with whatever you want.
You can optimize the code but this should work:
Edit: Shorter version
import csv

def RepresentsInt(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

l = []
with open('test.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        l.append([int(r) if RepresentsInt(r) else 0 for r in row])
for row in l:
    print(row)
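If dropping the blank entries (rather than replacing them with 0) is acceptable, a filtering comprehension works too; this sketch uses io.StringIO to stand in for the file:

```python
import csv
import io

data = '1,2,,4\n5,,7,8\n'

# Skip empty fields before casting, so int() never sees ''
rows = [[int(v) for v in row if v.strip()]
        for row in csv.reader(io.StringIO(data))]
print(rows)  # [[1, 2, 4], [5, 7, 8]]
```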
