Truncating cells in csv file - python-3.x

I'm currently trying to create a piece of software that finds and truncates cells containing more than a set number of characters in .csv files.
Here's where I'm at:
import csv

with open('test.csv', 'r', newline='', encoding="UTF-8") as csv_file, \
     open('output.csv', 'x', newline='', encoding="UTF-8") as output_file:
    dialect = csv.Sniffer().sniff(csv_file.read(2048))
    dialect.escapechar = '\\'
    csv_file.seek(0)
    writer = csv.writer(output_file, dialect)
    for row in csv.reader(csv_file, dialect):
        copy = row
        for col in copy:
            # truncate the cell to the desired length
            col = col[:253] + (col[:253] and '..')
        writer.writerow(copy)
The problem here is that the new file is created, but its cells are not truncated.
Thanks for your consideration.

The problem is that you rebind the name col. The old value in the list is not changed, so it is the old value that is still in the list. The best fix is to rebuild the list, which is most easily done with a list comprehension:
copy = [col[:253] + (col[:253] and '..') for col in copy]
Giving your new value the same name as the loop variable does not help: rebinding col does not replace the value held in the list copy.
That's also why you don't need copy = row. You can just use row.
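Putting the fix together, here is a minimal, self-contained sketch (sample data is written inline, the Sniffer step is left out, and 'w' mode is used instead of 'x' so it can be re-run):

```python
import csv

# hypothetical sample input so the sketch is self-contained
with open('test.csv', 'w', newline='', encoding='UTF-8') as f:
    csv.writer(f).writerow(['short', 'x' * 300])

with open('test.csv', 'r', newline='', encoding='UTF-8') as csv_file, \
     open('output.csv', 'w', newline='', encoding='UTF-8') as output_file:
    writer = csv.writer(output_file)
    for row in csv.reader(csv_file):
        # rebuild the row so the truncated values replace the originals
        writer.writerow([col[:253] + (col[:253] and '..') for col in row])
```

Note that `col[:253] and '..'` appends the marker to every non-empty cell, not only truncated ones, which is the behaviour of the original snippet.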

Related

Appending data from multiple excel files into a single excel file without overwriting using python pandas

My current code is below.
I have a specific range of cells (from a specific sheet) that I am pulling out of multiple (~30) excel files. I am trying to pull this information out of all these files and compile it into a single new file, appending to that file each time. I'm going to manually clean up the destination file for the time being, as I will improve this script going forward.
What I currently have works fine for a single sheet, but I overwrite my destination every time I add a new file to the read-in list.
I've tried adding mode='a' and a couple of different ways to concat at the end of my function.
import pandas as pd

def excel_loader(fname, sheet_name, new_file):
    xls = pd.ExcelFile(fname)
    df1 = pd.read_excel(xls, sheet_name, nrows=20)
    print(df1[1:15])
    writer = pd.ExcelWriter(new_file)
    df1.insert(51, 'Original File', fname)
    df1.to_excel(new_file)

names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
for name in names:
    excel_loader(name, 'specific_sheet_name', destination)
Thanks for any help in advance; I can't seem to find an answer to this exact situation on here. Cheers.
Ideally you want to loop through the files and read the data into a list, then concatenate the individual dataframes, then write the new dataframe. This assumes the data being pulled is the same size/shape and the sheet name is the same. If the sheet name changes per file, look into the zip() function to pair each filename with its sheet name.
This should get you started:
names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
sheet_name = 'specific_sheet_name'

# read all files first
df_hold_list = []
for name in names:
    xls = pd.ExcelFile(name)
    df = pd.read_excel(xls, sheet_name, nrows=20)
    df_hold_list.append(df)

# concatenate dfs: axis=0 stacks them vertically (appends rows),
# axis=1 would place them side by side
df1 = pd.concat(df_hold_list, axis=0)

# write the combined frame to the new file
df1.to_excel(destination, index=False)
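If the sheet names differ per file, the zip() idea above pairs them up. A self-contained sketch (the file and sheet names are made up, and the source files are generated inline for illustration):

```python
import pandas as pd

# hypothetical source files, created here so the sketch is self-contained
pd.DataFrame({'a': [1, 2]}).to_excel('sheet1.xlsx', sheet_name='first_sheet', index=False)
pd.DataFrame({'a': [3, 4]}).to_excel('sheet2.xlsx', sheet_name='second_sheet', index=False)

names = ['sheet1.xlsx', 'sheet2.xlsx']
sheets = ['first_sheet', 'second_sheet']   # one sheet name per file

# zip() pairs each file with its own sheet name
frames = [pd.read_excel(f, sheet_name=s, nrows=20) for f, s in zip(names, sheets)]
pd.concat(frames, axis=0).to_excel('destination.xlsx', index=False)
```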

How can I create an excel file with multiple sheets that stores content of a text file using python

I need to create an excel file in which each sheet contains the contents of one text file in my directory. For example, if I have two text files then I'll have two sheets, and each sheet contains the content of one text file.
I've managed to create the excel file, but I could only fill it with the contents of the last text file in my directory; however, I need to read all my text files and save them into excel.
This is my code so far:
import os
import glob
import xlsxwriter

file_name = 'WriteExcel.xlsx'
path = 'C:/Users/khouloud.ayari/Desktop/khouloud/python/Readfiles'
txtCounter = len(glob.glob1(path, "*.txt"))
for filename in glob.glob(os.path.join(path, '*.txt')):
    f = open(filename, 'r')
    content = f.read()
    print(len(content))
    workbook = xlsxwriter.Workbook(file_name)
    ws = workbook.add_worksheet("sheet" + str(i))
    ws.set_column(0, 1, 30)
    ws.set_column(1, 2, 25)
    parametres = (
        ['file', content],
    )
    # Start from the first cell. Rows and
    # columns are zero indexed.
    row = 0
    col = 0
    # Iterate over the data and write it out row by row.
    for name, parametres in (parametres):
        ws.write(row, col, name)
        ws.write(row, col + 1, parametres)
        row += 1
    workbook.close()
example:
if I have two text files, the content of the first is 'hello' and the content of the second is 'world'. In this case I need to create two worksheets: the first needs to store 'hello' and the second needs to store 'world'.
But both of my worksheets contain 'world'.
I recommend using pandas. It in turn uses xlsxwriter to write data (whole tables) to excel files, but makes it much easier - literally a couple of lines of code.
import pandas as pd

df_1 = pd.DataFrame({'data': ['Hello']})
sn_1 = 'hello'
df_2 = pd.DataFrame({'data': ['World']})
sn_2 = 'world'

filename_excel = '1.xlsx'
with pd.ExcelWriter(filename_excel) as writer:
    for df, sheet_name in zip([df_1, df_2], [sn_1, sn_2]):
        df.to_excel(writer, index=False, header=False, sheet_name=sheet_name)
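The same pattern scales to the original task of one sheet per text file. A self-contained sketch (the two sample .txt files are invented and written inline; adjust the glob path to your own directory):

```python
import glob
import pandas as pd

# two hypothetical sample text files so the sketch is self-contained
for name, text in [('a.txt', 'hello'), ('b.txt', 'world')]:
    with open(name, 'w') as f:
        f.write(text)

# one worksheet per text file, named sheet0, sheet1, ...
with pd.ExcelWriter('WriteExcel.xlsx') as writer:
    for i, filename in enumerate(sorted(glob.glob('*.txt'))):
        with open(filename) as f:
            content = f.read()
        pd.DataFrame([['file', content]]).to_excel(
            writer, index=False, header=False, sheet_name='sheet' + str(i))
```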

Using regex to find and delete data

Need to search through data and delete customer Social Security Numbers.
with open('customerdata.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        data.append(row)

for row in customerdata.csv:
    results = re.search(r'\d{3}-\d{2}-\d{4}', row)
    re.replace(results, "", row)
    print(results)
New to scripting and not sure what it is I need to do to fix this.
This is not a job for a regex.
You are using a csv.DictReader, which is awesome. This means you have access to the column names in your csv file. What you should do is make a note of the column that contains the SSN, then write out the row without it. Something like this (not tested):
with open('customerdata.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        del row['SSN']
        print(row)
If you need to keep the data but blank it out, then something like:
with open('customerdata.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        row['SSN'] = ''
        print(row)
Hopefully you can take things from here; for example, rather than printing, you might want to use a csv dict writer. Depends on your use case. Though, do stick with csv operations and definitely avoid regexes here. Your data is in csv format. Think about the data as rows and columns, not as individual strings to be regexed upon. :)
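For example, a csv.DictWriter version that writes the cleaned rows to a new file (the input file contents and the column names here are invented for illustration):

```python
import csv

# hypothetical input so the sketch is self-contained
with open('customerdata.csv', 'w', newline='') as f:
    f.write('name,SSN\nAlice,123-45-6789\n')

with open('customerdata.csv', newline='') as infile, \
     open('cleaned.csv', 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    # keep every column except SSN
    fields = [c for c in reader.fieldnames if c != 'SSN']
    writer = csv.DictWriter(outfile, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        del row['SSN']
        writer.writerow(row)
```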
I'm not seeing a replace function for re in the Python 3.6.5 docs.
I believe the function you would want to use is re.sub:
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged.
This means that all you need in your second for loop is:
for row in customerdata.csv:
    results = re.sub(r'\d{3}-\d{2}-\d{4}', '', row)
    print(results)

Writerows gives me a blank csv file

I made a list called list_of_rows, and in a loop appended list_of_cells to it.
I then attempt to make a csv file by creating one and assigning a variable called writer, but it gives me a blank file.
outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerows(list_of_rows)
It's supposed to give me a list of rows with cells, and I don't know what's wrong.
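The likely culprits are opening the file in binary mode ('wb', which breaks csv.writer in Python 3) and never closing it, so buffered rows are lost. A sketch of the corrected write, with made-up data standing in for list_of_rows:

```python
import csv

# hypothetical scraped data standing in for list_of_rows
list_of_rows = [['name', 'cell'], ['inmate1', 'A1'], ['inmate2', 'B2']]

# text mode with newline='' is what csv expects in Python 3;
# the with block closes the file and flushes buffered rows
with open('inmates.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(list_of_rows)
```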

Is it possible to create a new column for each iteration in XlsxWriter

I want to write data into Excel columns using XlsxWriter. One 'set' of data gets written for each iteration. Each set should be written in a separate column. How do I do this?
I have tried playing around with the col value as follows:
At [1] I define i = 0 outside the loop and later increment it by 1 and set col = i. When this is done the output is blank. To me this is the most logical solution and I don't know why it won't work.
At [2] i is defined inside the loop. When this happens one column gets written.
At [3] I define col the standard way. This works as expected: one column gets written.
My code:
import xlsxwriter

txt_file = open('folder/txt_file', 'r')
lines = dofile.readlines()

# [1] Define i outside the loop. When this is used output is blank.
i = 0
for line in lines:
    if condition_a is met:
        # parse text file to find a string. reg_name = string_1.
    elif condition_b:
        # parse text file for a second string. esto_name = string_2.
    elif condition_c:
        # parse text file for a group of strings.
        # use .split() to append these strings to a list.
        # reg_vars = list of strings.

        # [2] Define i inside the loop. When this is used one column gets
        # written. Relevant for [1] & [2].
        i += 1  # Increment for each loop
        row = 1
        col = i  # Increments by one for each iteration, changing the column.

        # [3] Define col = 1. When this is used one column also gets written.
        col = 1

        # Open Excel
        book = xlsxwriter.Workbook('folder/variable_list.xlsx')
        sheet = book.add_worksheet()

        # Write reg_name
        sheet.write(row, col, reg_name)
        row += 1
        # Write esto_name
        sheet.write(row, col, esto_name)
        row += 1
        # Write variables
        for variable in reg_vars:
            row += 1
            sheet.write(row, col, variable)
        book.close()
You can use the XlsxWriter write_column() method to write a list of data as a column.
However, in this particular case the issue seems to be that you are creating a new, duplicate file via xlsxwriter.Workbook() each time you go through the condition_c part of the loop. Therefore only the last value of col is used, and the entire file is overwritten on the next pass through the loop.
You should probably move the creation of the xlsx file outside the loop. Probably to the same place you open() the text file.
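A sketch of that layout (the parsed values are invented): the workbook is created once, outside the loop, and write_column() drops each set into its own column:

```python
import xlsxwriter

# hypothetical parsed data standing in for reg_name, esto_name and reg_vars
data_sets = [
    ('reg1', 'esto1', ['a', 'b']),
    ('reg2', 'esto2', ['c', 'd']),
]

# create the workbook once, outside the loop
book = xlsxwriter.Workbook('variable_list.xlsx')
sheet = book.add_worksheet()

for col, (reg_name, esto_name, reg_vars) in enumerate(data_sets):
    # write_column() writes the whole set downwards in its own column
    sheet.write_column(1, col, [reg_name, esto_name] + reg_vars)

book.close()
```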