Combine multiple csv files into a single xls workbook Python 3 - excel

We are in the middle of a transition at work from Python 2.7 to Python 3.5. It's a company-wide change, and most of our current scripts were written in 2.7 with no additional libraries. I've taken advantage of the Anaconda distro we are using and have already changed most of our scripts over, either with the 2to3 module or by rewriting them completely. I am stuck on one piece of code, though, which I did not write, and the original author is no longer here. He also did not supply comments, so I can only guess at how the whole script works. About 95% of the script works correctly until the end: after it creates 7 csv files with different parsed information, a custom function combines the csv files into an xls workbook, with each csv as a new tab.
import csv
import xlwt
import glob
import openpyxl
from openpyxl import Workbook
Parsefiles = glob.glob(directory + '/' + "Parsed*.csv")
def xlsmaker():
    for f in Parsefiles:
        (path, name) = os.path.split(f)
        (short_name, extension) = os.path.splitext(name)
        ws = wb.add_sheet(short_name)
        xreader = csv.reader(open(f, 'rb'))
        newdata = [line for line in xreader]
        for rowx, row in enumerate(newdata):
            for colx, value in enumerate(row):
                if value.isdigit():
                    ws.write(rowx, colx, value)
xlsmaker()
for f in Parsefiles:
    os.remove(f)
wb.save(directory + '/' + "Finished" + '' + oshort + '' + timestr + ".xls")
This was written all in python 2.7 and still works correctly if I run it in python 2.7. The issue is that it throws an error when running in python 3.5.
  File "parsetool.py", line 521, in <module>
    xlsmaker()
  File "parsetool.py", line 511, in xlsmaker
    ws = wb.add_sheet(short_name)
  File "c:\pythonscripts\workbook.py", line 168, in add_sheet
    raise TypeError("The parameter you have given is not of the type '%s'" % self._worksheet_class.__name__)
TypeError: The parameter you have given is not of the type "Worksheet"
Any ideas about what should be done to fix the above error? I've tried multiple rewrites, but I get similar errors or new errors. I'm considering just figuring out a whole new method to create the xls, possibly with pandas instead.

Not sure why it errs. It is worth the effort to rewrite the code and use pandas instead. Pandas can read each csv file into a separate dataframe and save each dataframe as a separate sheet in an xls(x) file. This can be done with pandas' ExcelWriter, e.g.:
import pandas as pd
writer = pd.ExcelWriter('yourfile.xlsx', engine='xlsxwriter')
df = pd.read_csv('originalfile.csv')
df.to_excel(writer, sheet_name='sheetname')
writer.save()
Since you have multiple csv files, you would probably want to read all csv files and store them as a df in a dict. Then write each df to Excel with a new sheet name.
Multi-csv Example:
import pandas as pd
import sys
import os
writer = pd.ExcelWriter('default.xlsx') # Arbitrary output name
for csvfilename in sys.argv[1:]:
    df = pd.read_csv(csvfilename)
    df.to_excel(writer,sheet_name=os.path.splitext(csvfilename)[0])
writer.save()
(Note that it may be necessary to pip install openpyxl to resolve errors about a missing xlsxwriter import.)
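The dict-of-dataframes idea mentioned above can be sketched roughly as follows. This is only a minimal sketch, not the answerer's code: the helper names load_frames and write_frames are made up for illustration, sheet names are assumed to come from the file names without extension, and an Excel engine such as openpyxl or xlsxwriter is assumed to be installed.

```python
from pathlib import Path

import pandas as pd


def load_frames(csv_paths):
    """Read each CSV into a DataFrame, keyed by the file name without extension."""
    return {Path(p).stem: pd.read_csv(p) for p in csv_paths}


def write_frames(frames, out_path):
    """Write every DataFrame in the dict to its own sheet of one workbook."""
    # The context manager saves and closes the workbook on exit.
    with pd.ExcelWriter(out_path) as writer:
        for sheet_name, df in frames.items():
            # Excel limits sheet names to 31 characters.
            df.to_excel(writer, sheet_name=sheet_name[:31], index=False)
```

For the files in the question this might be called as write_frames(load_frames(glob.glob("Parsed*.csv")), "combined.xlsx"), where the output name is arbitrary.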

You can use the code below to read multiple .csv files into one big .xlsx Excel file.
I also added code for replacing ',' with '.' (or vice versa), for improved compatibility on Windows environments depending on your locale settings.
import pandas as pd
import sys
import os
import glob
from pathlib import Path
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
writer = pd.ExcelWriter('fc15.xlsx') # Arbitrary output name
for csvfilename in all_filenames:
    txt = Path(csvfilename).read_text()
    txt = txt.replace(',', '.')
    text_file = open(csvfilename, "w")
    text_file.write(txt)
    text_file.close()
    print("Loading "+ csvfilename)
    df= pd.read_csv(csvfilename,sep=';', encoding='utf-8')
    df.to_excel(writer,sheet_name=os.path.splitext(csvfilename)[0])
    print("done")
writer.save()
print("task completed")
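As a possible alternative to rewriting the csv files on disk, pandas.read_csv has a decimal parameter that tells the parser which character marks the decimal point. A small sketch, assuming semicolon-separated files with decimal commas as in the code above; the sample data is invented for illustration.

```python
import io

import pandas as pd

# Invented sample: semicolon-separated values with decimal commas.
sample = "name;price\nwidget;1,5\ngadget;22,75\n"

# sep=';' splits the columns, decimal=',' parses "1,5" as the float 1.5.
df = pd.read_csv(io.StringIO(sample), sep=";", decimal=",")

print(df["price"].sum())
```

This leaves the source files untouched, and commas inside text columns are not altered.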

Here's a slight extension to the accepted answer. Pandas 1.5 complains about the call to writer.save(), which is deprecated there and was removed in pandas 2.0. The fix is to use the writer as a context manager.
import sys
from pathlib import Path
import pandas as pd
with pd.ExcelWriter("default.xlsx") as writer:
    for csvfilename in sys.argv[1:]:
        p = Path(csvfilename)
        sheet_name = p.stem[:31]
        df = pd.read_csv(p)
        df.to_excel(writer, sheet_name=sheet_name)
This version also trims the sheet name down to fit in Excel's maximum sheet name length, which is 31 characters.
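Besides the 31-character limit, Excel also rejects sheet names containing any of the characters : \ / ? * [ ], and sheet names must be unique within a workbook. Here is a rough sketch of a helper that covers these cases. The name safe_sheet_name, the underscore replacement, and the numbering scheme are assumptions for illustration and not part of pandas.

```python
import re

# Characters Excel does not allow in sheet names.
_FORBIDDEN = re.compile(r"[:\\/?*\[\]]")
MAX_LEN = 31


def safe_sheet_name(name, used):
    """Return a sheet name that is legal in Excel and not already in `used`."""
    base = _FORBIDDEN.sub("_", name)[:MAX_LEN] or "Sheet"
    candidate = base
    counter = 1
    # Excel compares sheet names case-insensitively.
    while candidate.lower() in used:
        suffix = f"_{counter}"
        candidate = base[: MAX_LEN - len(suffix)] + suffix
        counter += 1
    used.add(candidate.lower())
    return candidate
```

In the loop above one would create used = set() before the loop and pass sheet_name=safe_sheet_name(p.stem, used) to to_excel.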

If your csv files contain Chinese text in gbk encoding, you can use the following code (gb18030 is a superset of gbk):
import pandas as pd
import glob
import datetime
from pathlib import Path
now = datetime.datetime.now()
extension = "csv"
all_filenames = [i for i in glob.glob(f"*.{extension}")]
with pd.ExcelWriter(f"{now:%Y%m%d}.xlsx") as writer:
    for csvfilename in all_filenames:
        print("Loading " + csvfilename)
        df = pd.read_csv(csvfilename, encoding="gb18030")
        df.to_excel(writer, index=False, sheet_name=Path(csvfilename).stem)
        print("done")
print("task completed")

Related

Python CSV merge issue

I am new to Python and am currently trying to merge CSV files using Python 3.7.
import pandas as pd
import os
newdir = 'C:\\xxxx\\xxxx\\xxxx\\xxxx'
list = os.listdir(newdir)
writer = pd.ExcelWriter('test.xlsx')
for i in range(0,len(list)):
    data = pd.read_csv(list[i],encoding="gbk", index_col=0)
    data.to_excel(writer, sheet_name=list[i])
writer.save()
I get the following result:
FileNotFoundError: [Errno 2] File b'a.csv' does not exist: b'a.csv'
The problem is that the csv files are not merged into one xlsx file. Please let me know the solution.
os.listdir only returns the filenames. You'll need to prepend the folder name to the filename.
import pandas as pd
import os
newdir = 'C:\\xxxx\\xxxx\\xxxx\\xxxx'
names = os.listdir(newdir)
writer = pd.ExcelWriter('test.xlsx')
for name in names:
    path = os.path.join(newdir, name)
    data = pd.read_csv(path, encoding="gbk", index_col=0)
    data.to_excel(writer, sheet_name=name)
writer.save()
Note that I did not bother to check the rest of your code.
Oh, and please avoid using the names of builtins (such as list) for your variables.
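The point about os.listdir can be seen in a small self-contained sketch. The helper name csv_paths is hypothetical. It joins each name with the folder and also skips entries that are not .csv files, which the code above does not do.

```python
import os


def csv_paths(folder):
    """Return full paths of the .csv files in `folder`, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        # os.listdir yields bare names such as 'a.csv', not full paths.
        if name.lower().endswith(".csv"):
            paths.append(os.path.join(folder, name))
    return paths
```

Each returned path can be passed straight to pd.read_csv regardless of the current working directory.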

Import dataset from url and convert text to csv in python3

I am pretty new to Python (using Python 3) and am using pandas to import a dataset.
I need to import a dataset from this url - https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt
and convert it to a csv file, but I am getting some special characters in the converted csv -> ��
I am downloading the txt file and converting it to csv. Is this the right approach?
Also, the converted csv puts the entire text into one column.
from urllib.request import urlretrieve
import pandas as pd
from pandas import DataFrame
url = 'https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt'
urlretrieve(url, 'index.txt')
df = pd.read_csv('index.txt', sep='/t', engine='python', lineterminator='\r\n')
csv_file = df.to_csv('index.csv', sep='\t', index=False, header=True)
print(csv_file)
After a successful import, I also have to extract X as all columns except the first column, and Y as the first column.
I'd appreciate any help.
from urllib.request import urlretrieve
import pandas as pd
url = 'https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt'
urlretrieve(url, 'index.txt')
df = pd.read_csv('index.txt', sep='\t',encoding='utf-16')
Y = df[['REMISS']]
X = df.drop(['REMISS'],axis=1)
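The code above presumably works because the downloaded file is UTF-16 encoded. The two odd characters at the start of the converted csv are consistent with a byte-order mark (BOM) being read with the wrong encoding. Below is a minimal sketch of guessing the encoding from the BOM with the standard library only. The helper sniff_bom_encoding is hypothetical, and files without a BOM simply fall back to a default.

```python
import codecs

# Checked in this order because the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(path, default="utf-8"):
    """Guess a text encoding from the file's byte-order mark, if it has one."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return default
```

With pandas this could be used as pd.read_csv(path, sep='\t', encoding=sniff_bom_encoding(path)).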

How to Read multiple files in Python for Pandas separate dataframes

I am trying to read 6 files into 7 different data frames but I am unable to figure out how I should do that. The file names can be completely random; that is, I know the files, but they are not named like data1.csv, data2.csv.
I tried using something like this:
import sys
import os
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
f1='Norway.csv'
f='Canada.csv'
f='Chile.csv'
Norway = pd.read_csv(Norway.csv)
Canada = pd.read_csv(Canada.csv)
Chile = pd.read_csv(Chile.csv )
I need to read multiple files into different dataframes. It works fine when I do it with one file, like
file='Norway.csv'
Norway = pd.read_csv(file)
And I am getting this error:
NameError: name 'norway' is not defined
You can read all the .csv files into one single dataframe.
# all_files is assumed to be a list of csv paths, e.g. from glob.glob
dfs = []
for file_ in all_files:
    df = pd.read_csv(file_,index_col=None, header=0)
    dfs.append(df)
# concatenate all dfs into one
big_df = pd.concat(dfs, ignore_index=True)
and then split the large dataframe into multiple dataframes (in your case 7). For example:
import numpy as np
num_chunks = 3
df1,df2,df3 = np.array_split(big_df,num_chunks)
Hope this helps.
After googling for a while looking for an answer, I decided to combine answers from different questions into a solution to this question. This solution will not work for all possible cases. You have to tweak it to meet all your cases.
check out the solution to this question
# import libraries
import pandas as pd
import numpy as np
import glob
import os
# Declare a function for extracting a string between two characters
def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""
path = '/path/to/folder/containing/your/data/sets' # use your path
all_files = glob.glob(path + "/*.csv")
list_of_dfs = [pd.read_csv(filename, encoding = "ISO-8859-1") for filename in all_files]
list_of_filenames = [find_between(filename, 'sets/', '.csv') for filename in all_files] # sets is the last word in your path
# Create a dictionary with table names as the keys and data frames as the values
dfnames_and_dfvalues = dict(zip(list_of_filenames, list_of_dfs))
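A shorter variant of the same idea can be sketched with pathlib: Path.stem gives the file name without its extension, so no custom string-slicing helper is needed. The function name read_folder is made up for illustration, and the default encoding simply mirrors the code above.

```python
from pathlib import Path

import pandas as pd


def read_folder(folder, encoding="ISO-8859-1"):
    """Return {file stem: DataFrame} for every .csv file directly in `folder`."""
    return {
        path.stem: pd.read_csv(path, encoding=encoding)
        for path in sorted(Path(folder).glob("*.csv"))
    }
```

After frames = read_folder(path), the individual tables are available as frames["Norway"], frames["Canada"], and so on, without one variable per file.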

Error message when writing dataframe to excel

I am loading a pkl file into a dataframe and want to save it to Excel using ExcelWriter from pandas. Loading the pkl file into a DF works fine, but writing the frame to Excel throws the following error:
ValueError("Cannot convert {0!r} to Excel".format(value)
I do not know what is wrong.
I have anaconda with python 3.7 installed on one computer where the code works fine and runs without problems. However, on a different computer (with python 3.7 and freshly installed pandas and pickle), it fails... any help is appreciated!
The pkl file contains sorted academic literature, so nothing exciting.
import pickle
import pandas as pd
from pandas import ExcelWriter
def open_pickle():
    savename = 'neuro_10_neuron[TIAB]_19-02-19'
    try:
        with open(savename + '.pkl', 'rb') as f:
            holder = pickle.load(f)
    except FileNotFoundError:
        print('Cannot find it!')
    framed = pd.DataFrame.from_dict(holder)
    writer = ExcelWriter(savename + '.xlsx')
    framed.to_excel(writer)
    writer.save()
open_pickle()
Thanks in advance!
Below you find a picture of the entire error message. Maybe that points someone into a direction that might help me...
Well, for me, installing the xlsxwriter module and modifying the code to use it solved the problem. The code now looks like this:
import pickle
import pandas as pd
from pandas import ExcelWriter
def open_pickle():
    savename = 'neuro_10_neuron[TIAB] AND 2018[PDAT]_01-03-19'
    try:
        with open(savename + '.pkl', 'rb') as f:
            holder = pickle.load(f)
    except FileNotFoundError:
        print('Cannot find it!')
    framed = pd.DataFrame.from_dict(holder)
    writer = ExcelWriter(savename + '.xlsx', engine='xlsxwriter')
    framed.to_excel(writer)
    writer.save()
    return(framed)
a= open_pickle()
Hope that helps, if anyone ever comes across something like this.
Cheers

read excel file with words and write it to csv file in Python

I have an Excel file with a string stored in each cell:
rtypl srtyn OCVXZ srtyn
KPLNV KLNWZ bdfgh KLNWZ
xcvwh mvwhd WQKXM mvwhd
GYTR xvnm YTZN YTZN
ngws jklp PLNM jklp
I want to read the Excel file and write it to a csv file, as you can see below:
import pandas as pd
import csv
df = pd.read_excel(file, encoding='utf-16')
words= open("words.csv",'wb')
wr = csv.writer(words, dialect='excel')
for item in df:
    wr.writerow(item)
But it writes each line as separate letters and not as a string:
r,t,y,p,l
I am limited to writing the file as csv, as I am going to use the result in a library that has lots of facilities for csv files. Any advice on how I can read all the rows as strings in the cells is appreciated.
You can try the easiest solution:
# -*- coding: utf-8 -*-
import pandas as pd
df = pd.read_excel(file, encoding='utf-16')
df.to_csv('words.csv', encoding='utf-16')
Adding to zipa's answer: if the Excel file has multiple sheets, you can also try
import pandas as pd
df = pd.read_excel(file, 'Sheet1')
df.to_csv('words.csv')
Reference:
http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/
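For what it's worth, the letter-by-letter output described in the question is what csv.writer produces when writerow receives a plain string instead of a list: the string is treated as a sequence of one-character fields. (Iterating over a DataFrame directly also yields its column labels rather than its rows.) A small standard-library sketch using one of the words from the question:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, dialect="excel")

writer.writerow("rtypl")    # a string is iterated character by character
writer.writerow(["rtypl"])  # a one-element list is written as a single field

print(buffer.getvalue())
```

The first call writes r,t,y,p,l and the second writes rtypl, which is why letting pandas write the file with df.to_csv sidesteps the problem.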

Resources