I am converting multiple CSV files in the same directory to XLSX files.
The CSV files are tab-delimited.
I ran the program and it generated the XLSX files. However, the data in the XLSX files is not split into columns at the tabs.
Please look at my code and tell me what is wrong. In line 10, I have specified my delimiter as tab, but the resulting XLSX file is not separated.
import os
import glob
import csv
import openpyxl

for csvfile in glob.glob(os.path.join(r'(my directory)', '*.csv')):
    wb = openpyxl.Workbook()
    ws = wb.active
    with open(csvfile, 'r') as f:
        reader = csv.reader(f, delimiter='\t')
        for r, row in enumerate(reader, start=1):
            for c, val in enumerate(row, start=1):
                ws.cell(row=r, column=c).value = val
    wb.save(csvfile + '.xlsx')
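The loop above already passes delimiter='\t' to csv.reader, so one thing worth ruling out is that the input files are not really tab-separated (a .csv extension says nothing about the separator). A minimal sketch of that check, using csv.Sniffer from the standard library; the helper name detect_delimiter and the candidate list are assumptions made for illustration:

```python
import csv


def detect_delimiter(path, candidates=",;\t|", sample_size=4096):
    """Guess which of the candidate characters separates the fields in a file."""
    with open(path, "r", newline="") as f:
        sample = f.read(sample_size)
    # Sniffer raises csv.Error when it cannot settle on a delimiter
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter
```

If this returns ',' or ';' for one of the files, the tab delimiter in the conversion loop would put each whole line into a single cell.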
Related
The script below converts all xlsx files placed inside the folder to CSV files with the same name.
import os
import glob
import pandas as pd

# set working directory
os.chdir("C:/Users/piyush.upadhyay/Piyush/Tasks/Task-61(Script to convert excel to csv)/Files")
all_files = glob.glob('*.xlsx')
print(all_files)
li = []
for filename in all_files:
    try:
        print(filename)
        input('Going to read xlsx to csv')
        outFileName = filename.split('.')[0] + '.csv'
        data_xls = pd.read_excel(filename, engine='openpyxl')
        print(data_xls)
        input('Going to convert xlsx to csv')
        data_xls.to_csv(outFileName, header=True, index=None)
        input('Converted')
    except Exception as e:
        print("Error Logged..")
        print(e)
input('Enter to exit')
The code returns a warning while reading the xlsx file, which says:
Issue: pd.read_excel returns an empty DataFrame, as shown when the data_xls variable is printed. As soon as the same file is saved again manually with the same extension (xlsx), the code successfully converts all Excel files inside the folder into CSV files.
The issue is the same as the one the link describes.
I tried combining multiple sheets from multiple Excel files into a single Excel file using pandas, but in the final sheet the row labels are the Excel file names and each sheet appears as a column name. The result is messy.
How do I get it into a proper format? Here is the code:
import pandas as pd
import os
from openpyxl.workbook import Workbook

os.chdir("C:/Users/w8/PycharmProjects/decorators_exaample/excel_files")
path = "C:/Users/w8/PycharmProjects/decorators_exaample/excel_files"
files = os.listdir(path)
AllFiles = pd.DataFrame()
for f in files:
    info = pd.read_excel(f, sheet_name=None)
    AllFiles = AllFiles.append(info, ignore_index=True)
writer = pd.ExcelWriter("Final.xlsx")
AllFiles.to_excel(writer)
writer.save()
The final Excel file looks like this (screenshot not included):
You don't actually need the os and Workbook parts. Dropping them cleans up your code and makes errors easier to find. I assume that path is the path to the folder where all the Excel files are stored:
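import pandas as pd
import glob

# Raw string: otherwise "\U" and "\n" in the path are read as escape sequences
path = r"C:\Users\w8\PycharmProjects\decorators_exaample\excel_files"
# Match the workbooks inside the folder rather than the folder itself
file_list = glob.glob(path + r"\*.xlsx")
# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once
frames = []
for f in file_list:
    info = pd.read_excel(f)
    frames.append(info)
df = pd.concat(frames)
df.to_excel(path + r"\new_filename.xlsx")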
It should be as easy as that.
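One caveat: pd.read_excel(f) without sheet_name reads only the first sheet, while the question is about workbooks with several sheets. With sheet_name=None, pandas returns a dict that maps each sheet name to its own DataFrame, and appending that dict directly is probably what produced the messy layout. A minimal sketch of stacking such a dict into ordinary rows; the helper stack_sheets and the extra source_file and sheet columns are assumptions added for illustration, not part of the original answer:

```python
import pandas as pd


def stack_sheets(sheets, source):
    """Stack a {sheet name: DataFrame} dict into one frame, tagging every row
    with the workbook and sheet it came from."""
    tagged = [
        frame.assign(source_file=source, sheet=name)
        for name, frame in sheets.items()
    ]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(tagged, ignore_index=True)
```

Inside the loop this would be called as stack_sheets(pd.read_excel(f, sheet_name=None), f), with the per-file results collected in a list and passed to pd.concat once at the end. It assumes the sheets share the same columns.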
I am a beginner at Python. I am writing a script to:
Read all csv files in a folder
Drop duplicate rows within a .csv file by reading one csv file at a time
Write to *_new.csv file
The code:
import csv
import os
import pandas as pd

path = "/Users/<mylocaldir>/Documents/Data/"
file_list = os.listdir(path)
for file in file_list:
    fullpath = os.path.join(path, file)
    data = pd.read_csv(fullpath)
    newdata = data.drop_duplicates()
    newfile = fullpath.replace(".csv", "_new.csv")
    newdata.to_csv("newfile", index=True, header=True)
When I run the script, no error is displayed, but *_new.csv is not created.
Any help to resolve this issue?
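One detail in the script above is worth a look: the first argument to to_csv is the quoted string "newfile", not the variable newfile, so the output goes to a file literally named newfile in the current working directory and is overwritten on every pass through the loop. A small sketch of the difference; the path /tmp/Data/name.csv and the helper new_name are made up for illustration:

```python
def new_name(fullpath):
    """Build the *_new.csv path the script intends to write to."""
    return fullpath.replace(".csv", "_new.csv")


newfile = new_name("/tmp/Data/name.csv")
print("newfile")  # quoted: prints the literal text newfile
print(newfile)    # unquoted: prints /tmp/Data/name_new.csv
```

Passing newfile without the quotes should make the *_new.csv files appear next to the originals.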
I don't know pandas but you don't need it. You could try something like this:
import os

file_list = os.listdir()
# loop through the list
for filename in file_list:
    # don't process any non csv file
    if not filename.endswith('.csv'):
        continue
    # lines will be a temporary holding spot to check
    # for duplicates
    lines = []
    new_file = filename.replace('.csv', '_new.csv')
    # open 2 files - csv file and new csv file to write
    with open(filename, 'r') as fr, open(new_file, 'w') as fw:
        # read line from csv
        for line in fr:
            # if that line is not in temporary list called lines,
            # add it there and write to file
            # if that line is found in temporary list called lines,
            # don't do anything
            if line not in lines:
                lines.append(line)
                fw.write(line)
print('Done')
Result
Original file
cat name.csv
id,name
1,john
1,john
2,matt
1,john
New file
cat name_new.csv
id,name
1,john
2,matt
Another original file
cat pay.csv
id,pay
1,100
2,300
1,100
4,400
4,400
2,300
4,400
Its new file
id,pay
1,100
2,300
4,400
Update
The following script works with a slight modification to read from Src folder and write to Dest folder :
import csv
import os
import pandas as pd

path = "/Users/<localdir>/Documents/Data/Src"
newPath = "/Users/<localdir>/Documents/Data/Dest"
file_list = os.listdir(path)
for file in file_list:
    fullpath = os.path.join(path, file)
    data = pd.read_csv(fullpath)
    newdata = data.drop_duplicates()
    newfile = file.replace(".csv", "_new.csv")
    # skip files that have already been written to Dest
    if not os.path.isfile(os.path.join(newPath, newfile)):
        newdata.to_csv(os.path.join(newPath, newfile), index=False, header=True)
I also added a check to see whether a file already exists in the Dest folder.
I would be keen to learn whether there is a better way to write this script.
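One possible variation, offered as a sketch rather than the definitive way: pathlib can replace the os.path calls, and a set of already-seen rows keeps the duplicate check fast on large files. The function name dedupe_folder is an assumption. Unlike pandas, this version compares the raw text of each row and treats the header like any other row:

```python
import csv
from pathlib import Path


def dedupe_folder(src, dest):
    """Copy each CSV in src to dest as *_new.csv, keeping the first copy of every row."""
    src, dest = Path(src), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for in_path in sorted(src.glob("*.csv")):
        out_path = dest / f"{in_path.stem}_new.csv"
        if out_path.exists():
            continue  # same guard as the script above: never overwrite a result
        seen = set()
        with in_path.open(newline="") as fin, out_path.open("w", newline="") as fout:
            writer = csv.writer(fout)
            for row in csv.reader(fin):
                key = tuple(row)  # lists are not hashable, tuples are
                if key not in seen:
                    seen.add(key)
                    writer.writerow(row)
        written.append(out_path)
    return written
```

Calling dedupe_folder(path, newPath) with the two folders from the script above returns the list of files it created; a second run returns an empty list because the outputs already exist.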
So I'm new to Python(3) and need to create a loop that will go through almost 200 CSV files and convert them each into a pipe-delimited .txt file (with the same name).
I have code that does this for 1 file and works perfectly:
import csv

with open('C:/Path/InputFile.csv', 'r') as fin, \
        open('C:/Path/OutputFile.txt', 'w') as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
Thanks in advance.
You would modify your code to:
for in_name, out_name in zip(list_of_in_names, list_of_out_names):
    with open(in_name, 'r') as fin, open(out_name, 'w') as fout:
        ...
where list_of_in_names and list_of_out_names are the names of the files you want to read from and write to, respectively.
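The answer leaves open how the two lists are built. One way to do it, sketched here under the assumption that every .csv file in a single folder should be converted (the helper build_name_lists is a made-up name):

```python
import glob
import os


def build_name_lists(folder):
    """Return two parallel lists: every .csv in folder and its .txt counterpart."""
    list_of_in_names = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # splitext drops only the final extension, so "a.b.csv" becomes "a.b.txt"
    list_of_out_names = [
        os.path.splitext(name)[0] + ".txt" for name in list_of_in_names
    ]
    return list_of_in_names, list_of_out_names
```

The two return values can be passed straight to zip in the loop above.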
Edit: To address the issue from the comments you can use the pathlib library:
from pathlib import Path

for in_path in Path('C:/Path').glob('*.csv'):
    out_path = in_path.with_suffix('.txt')
    with in_path.open('r') as fin, out_path.open('w') as fout:
        ...
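Putting the pathlib loop together with the single-file code from the question might look like the sketch below. The function name convert_folder is an assumption, as is the premise that every CSV file has a header row (csv.DictReader needs one). Opening both files with newline='' follows the csv module's documented recommendation and avoids blank lines in the output on Windows:

```python
import csv
from pathlib import Path


def convert_folder(folder):
    """Write a pipe-delimited .txt next to every .csv file in folder."""
    converted = []
    for in_path in sorted(Path(folder).glob("*.csv")):
        out_path = in_path.with_suffix(".txt")
        with in_path.open("r", newline="") as fin, \
                out_path.open("w", newline="") as fout:
            reader = csv.DictReader(fin)
            writer = csv.DictWriter(fout, reader.fieldnames, delimiter="|")
            writer.writeheader()
            writer.writerows(reader)
        converted.append(out_path)
    return converted
```

convert_folder('C:/Path') would then handle all of the roughly 200 files in one call and return the paths it wrote.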
I'm trying to convert a list of Excel files to CSVs before loading them into a pandas dataframe, but I'm unsure how I can convert them in my script. csvkit and xlsx2csv seem to work for doing it from the command line, but when I try to start a subprocess like so
for filename in sorted_files:
    file = subprocess.Popen("in2csv filename", stdout=subprocess.PIPE)
    print file.stdout
    dataframe = pd.read_csv(file)
I'm getting the error
IOError: Expected file path name or file-like object, got type
schema must not be null when format is "fixed"
Is it possible to get the output from the subprocess and pipe that to a dataframe? Any help greatly appreciated!
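Two things stand out in the snippet above: "in2csv filename" passes the literal word filename to the tool rather than the variable's value, and pd.read_csv receives the Popen object instead of the text it produced. A sketch of capturing a command's CSV output in Python 3 follows; the helper csv_rows_from_command is a made-up name, and it parses with the standard csv module so that it runs without pandas:

```python
import csv
import io
import subprocess


def csv_rows_from_command(cmd):
    """Run cmd (a list of arguments) and parse its standard output as CSV rows."""
    # check=True raises CalledProcessError if the command exits with an error
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return list(csv.reader(io.StringIO(result.stdout)))
```

With csvkit installed, the call would be csv_rows_from_command(["in2csv", filename]). To get a DataFrame instead, the same captured text can be wrapped as pd.read_csv(io.StringIO(result.stdout)).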
Although it has been a long time since the question was asked, I had the same issue, and this is how I implemented it inside a Python script:
I could only run Xlsx2csv with the sheetid parameter, so get_sheet_details is used to get the sheet names and ids.
csvfrmxlsx creates a CSV file for each sheet in a csv folder under the workbook's parent directory.
import pandas as pd
from pathlib import Path


def get_sheet_details(filename):
    import xmltodict
    import shutil
    import zipfile
    sheets = []
    # Make a temporary directory with the file name
    directory_to_extract_to = filename.with_suffix('')
    directory_to_extract_to.mkdir(parents=True, exist_ok=True)
    # Extract the xlsx file as it is just a zip file
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall(directory_to_extract_to)
    # Open the workbook.xml which is very light and only has meta data, get sheets from it
    path_to_workbook = directory_to_extract_to / 'xl' / 'workbook.xml'
    with open(path_to_workbook, 'r') as f:
        xml = f.read()
    # force_list keeps 'sheet' a list even when the workbook has a single sheet
    dictionary = xmltodict.parse(xml, force_list=('sheet',))
    for sheet in dictionary['workbook']['sheets']['sheet']:
        # xmltodict prefixes XML attributes with '@' by default
        sheet_details = {
            'id': sheet['@sheetId'],  # can be sheetId for some versions
            'name': sheet['@name']  # can be name
        }
        sheets.append(sheet_details)
    # Delete the extracted files directory
    shutil.rmtree(directory_to_extract_to)
    return sheets
def csvfrmxlsx(xlsxfl, df):  # create csv files in csv folder on parent directory
    from xlsx2csv import Xlsx2csv
    (xlsxfl.parent / 'csv').mkdir(parents=True, exist_ok=True)
    for index, row in df.iterrows():
        shnum = row['id']
        shnph = xlsxfl.parent / 'csv' / Path(row['name'] + '.csv')  # path for converted csv file
        Xlsx2csv(str(xlsxfl), outputencoding="utf-8").convert(str(shnph), sheetid=int(shnum))
    return
pthfnc = 'c:/xlsx/'
wrkfl = 'my.xlsx'
xls_file = Path(pthfnc + wrkfl)
sheetsdic = get_sheet_details(xls_file) # dictionary with sheet names and ids without opening xlsx file
df = pd.DataFrame.from_dict(sheetsdic)
csvfrmxlsx(xls_file, df) # df with sheets to be converted
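As a possible simplification, the sheet names and ids can also be read from xl/workbook.xml without unpacking the archive to disk or installing xmltodict. This is only a sketch with the standard library; list_sheets is a made-up name, and the namespace below is the one used by ordinary (non-strict) xlsx files, which is an assumption about the input:

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the elements in xl/workbook.xml for regular xlsx files
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_sheets(xlsx_path):
    """Return [{'id': ..., 'name': ...}] for every sheet, read straight from the zip."""
    with zipfile.ZipFile(xlsx_path) as zf:
        with zf.open("xl/workbook.xml") as f:
            root = ET.parse(f).getroot()
    return [
        {"id": sheet.get("sheetId"), "name": sheet.get("name")}
        for sheet in root.findall("main:sheets/main:sheet", NS)
    ]
```

The result has the same shape as the output of get_sheet_details, so it can be passed to pd.DataFrame.from_dict in the same way.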