For Loop - Reading all Excel tabs into pandas DataFrames

I have an .xlsx workbook and I would like to write a function or loop that creates a pandas DataFrame for each tab in the workbook. For example, say I have an Excel workbook called book.xlsx with tabs named sheet1 through sheet6. How can I read in the Excel file and create six DataFrames (sheet1 - sheet6) from a function or loop?

To load the file:
import pandas as pd

path = '../files_to_load/my_file.xlsx'
print(path)
excel_file = pd.ExcelFile(path)
print('File loaded ✔')
To get a specific sheet:
# Get a specific sheet
raw_data = excel_file.parse('sheet1')
Here is an example of the loop:
You will have all of your sheets stored in a list; each sheet will be a DataFrame.
import pandas as pd
path = 'my_path/my_file.xlsx'
excel_file = pd.ExcelFile(path)
sheets = []
for sheet in excel_file.sheet_names:
    data = excel_file.parse(sheet)
    sheets.append(data)

You can also set the sheet_name argument to None - this returns an ordered dictionary of the sheets, each stored as a DataFrame.
dataframes = pd.read_excel(file_name, sheet_name=None)
>>> type(dataframes)
<class 'collections.OrderedDict'>
>>> type(dataframes['first']) # `first` is the name of a sheet
<class 'pandas.core.frame.DataFrame'>
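If you want to get back to the question's six DataFrames, you can pull them out of that dictionary by sheet name. A minimal sketch, assuming a workbook named book.xlsx with sheets sheet1 - sheet6 as in the question:
import pandas as pd

# keys are sheet names, values are DataFrames
dataframes = pd.read_excel('book.xlsx', sheet_name=None)

# access one sheet by name
sheet1_df = dataframes['sheet1']

# or loop over all sheets
for name, df in dataframes.items():
    print(name, df.shape)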

Appending data from multiple excel files into a single excel file without overwriting using python pandas

My current code is below.
I have a specific range of cells (from a specific sheet) that I am pulling out of multiple (~30) Excel files. I am trying to pull this information out of all these files and compile it into a single new file, appending to that file each time. I'm going to manually clean up the destination file for the time being, as I will improve this script going forward.
What I currently have works fine for a single sheet, but I overwrite my destination every time I add a new file to the read-in list.
I've tried adding mode='a' and a couple of different ways to concat at the end of my function.
import pandas as pd

def excel_loader(fname, sheet_name, new_file):
    xls = pd.ExcelFile(fname)
    df1 = pd.read_excel(xls, sheet_name, nrows=20)
    print(df1[1:15])
    writer = pd.ExcelWriter(new_file)
    df1.insert(51, 'Original File', fname)
    df1.to_excel(new_file)

names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'

for name in names:
    excel_loader(name, 'specific_sheet_name', destination)
Thanks for any help in advance; I can't seem to find an answer to this exact situation on here. Cheers.
Ideally you want to loop through the files and read the data into a list, then concatenate the individual dataframes, then write the new dataframe. This assumes the data being pulled is the same size/shape and the sheet name is the same. If the sheet name changes from file to file, look into the zip() function to pair each filename with its sheet name (see the sketch after the code below).
This should get you started:
import pandas as pd

names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
sheet_name = 'specific_sheet_name'  # the sheet to pull from each file

# read all files first
df_hold_list = []
for name in names:
    xls = pd.ExcelFile(name)
    df = pd.read_excel(xls, sheet_name, nrows=20)
    df_hold_list.append(df)

# concatenate dfs
df1 = pd.concat(df_hold_list, axis=1)  # axis is 1 or 0 depending on how you want to concatenate (horizontal vs vertical)

# write the combined dataframe to the new file
df1.to_excel(destination)
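As mentioned above, if the sheet name differs per file, one option is to zip the filenames with their sheet names. A hedged sketch; the per-file sheet names here are made up:
import pandas as pd

names = ['sheet1.xlsx', 'sheet2.xlsx']
sheet_names = ['sheet_in_file1', 'sheet_in_file2']  # hypothetical per-file sheet names
destination = 'destination.xlsx'

df_hold_list = []
for fname, sname in zip(names, sheet_names):
    df = pd.read_excel(fname, sheet_name=sname, nrows=20)
    df_hold_list.append(df)

pd.concat(df_hold_list).to_excel(destination)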

Updating excel sheet with Pandas without overwriting the file

I am trying to update an Excel sheet with Python. I read a specific cell and update it accordingly, but pandas overwrites the entire workbook, so I lose the other sheets as well as the formatting. Can anyone tell me how I can avoid this?
Record = pd.read_excel("Myfile.xlsx", sheet_name='Sheet1', index_col=False)
Record.loc[1, 'WORDS'] = int(self.New_Word_box.get())
Record.loc[1, 'STATUS'] = self.Stat.get()
Record.to_excel("Myfile.xlsx", sheet_name='Student_Data', index=False)
My code is above; as you can see, I only want to update a few cells, but it overwrites the entire Excel file. I tried to search for an answer but couldn't find anything specific.
Appreciate your help.
Update: Added more clarifications
Steps:
1) Read the sheet which needs changes into a dataframe and make the changes in that dataframe.
2) Now the changes are reflected in the dataframe but not in the sheet. Use the following function with the dataframe from step 1 and the name of the sheet to be modified. Use the truncate_sheet param to completely replace the sheet of concern.
The function call would be like so:
append_df_to_excel(filename, df, sheet_name, startrow=0, truncate_sheet=True)
from openpyxl import load_workbook
import pandas as pd


def append_df_to_excel(filename, df, sheet_name="Sheet1", startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: "/path/to/file.xlsx")
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: "Sheet1")
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be a dictionary]
    Returns: None
    """
    # ignore [engine] parameter if it was passed
    if "engine" in to_excel_kwargs:
        to_excel_kwargs.pop("engine")

    writer = pd.ExcelWriter(filename, engine="openpyxl")

    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError

    if "index" not in to_excel_kwargs:
        to_excel_kwargs["index"] = False

    try:
        # try to open an existing workbook
        if "header" not in to_excel_kwargs:
            to_excel_kwargs["header"] = True
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
            to_excel_kwargs["header"] = False

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        to_excel_kwargs["header"] = True
        if startrow is None:
            startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()
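Putting the two steps together for the Myfile.xlsx example from the question, a hedged usage sketch (assuming the sheet being updated is 'Sheet1'; the cell values written here are placeholders for the widget calls in the original code):
import pandas as pd

# Step 1: read the sheet that needs changes and modify the dataframe
record = pd.read_excel("Myfile.xlsx", sheet_name='Sheet1', index_col=False)
record.loc[1, 'WORDS'] = 42        # placeholder for int(self.New_Word_box.get())
record.loc[1, 'STATUS'] = 'done'   # placeholder for self.Stat.get()

# Step 2: replace only that sheet, keeping the other sheets in the workbook
append_df_to_excel("Myfile.xlsx", record, sheet_name='Sheet1',
                   startrow=0, truncate_sheet=True)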
Note that the openpyxl engine can't be swapped out here for xlsxwriter (as asked in the comments), since xlsxwriter cannot open an existing workbook; refer to reference 2.
References:
1) https://stackoverflow.com/a/38075046/6741053
2) xlsxwriter: is there a way to open an existing worksheet in my workbook?

How can I create an excel file with multiple sheets that stores content of a text file using python

I need to create an Excel file where each sheet contains the contents of a text file in my directory; for example, if I have two text files then I'll have two sheets, and each sheet contains the content of its text file.
I've managed to create the Excel file, but I could only fill it with the contents of the last text file in my directory; however, I need to read all my text files and save each of them into Excel.
This is my code so far:
import os
import glob
import xlsxwriter

file_name = 'WriteExcel.xlsx'
path = 'C:/Users/khouloud.ayari/Desktop/khouloud/python/Readfiles'
txtCounter = len(glob.glob1(path, "*.txt"))

for filename in glob.glob(os.path.join(path, '*.txt')):
    f = open(filename, 'r')
    content = f.read()
    print(len(content))

    workbook = xlsxwriter.Workbook(file_name)
    ws = workbook.add_worksheet("sheet" + str(i))

    ws.set_column(0, 1, 30)
    ws.set_column(1, 2, 25)

    parametres = (
        ['file', content],
    )

    # Start from the first cell. Rows and
    # columns are zero indexed.
    row = 0
    col = 0

    # Iterate over the data and write it out row by row.
    for name, parametres in (parametres):
        ws.write(row, col, name)
        ws.write(row, col + 1, parametres)
        row += 1

workbook.close()
Example:
If I have two text files, the content of the first file is 'hello' and the content of the second is 'world'. In this case I need to create two worksheets: the first worksheet should store 'hello' and the second should store 'world'.
But both of my worksheets contain 'world'.
I recommend using pandas. It in turn uses xlsxwriter to write data (whole tables) to Excel files, but makes it much easier - literally a couple of lines of code.
import pandas as pd
df_1 = pd.DataFrame({'data': ['Hello']})
sn_1 = 'hello'
df_2 = pd.DataFrame({'data': ['World']})
sn_2 = 'world'
filename_excel = '1.xlsx'
with pd.ExcelWriter(filename_excel) as writer:
    for df, sheet_name in zip([df_1, df_2], [sn_1, sn_2]):
        df.to_excel(writer, index=False, header=False, sheet_name=sheet_name)
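To tie this back to the original goal (one sheet per text file in a directory), here is a hedged sketch combining the question's glob logic with the pandas writer; the path, output filename, and sheet-naming scheme are assumptions:
import os
import glob
import pandas as pd

path = 'C:/Users/khouloud.ayari/Desktop/khouloud/python/Readfiles'
filename_excel = 'WriteExcel.xlsx'

with pd.ExcelWriter(filename_excel) as writer:
    for txt_file in glob.glob(os.path.join(path, '*.txt')):
        with open(txt_file, 'r') as f:
            content = f.read()
        # one single-cell DataFrame per text file, one sheet per file
        df = pd.DataFrame({'file': [content]})
        # Excel sheet names are limited to 31 characters
        sheet_name = os.path.splitext(os.path.basename(txt_file))[0][:31]
        df.to_excel(writer, index=False, header=False, sheet_name=sheet_name)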

Passing XLSX sheet to csvDictReader

Is there a way that I can pass an Excel sheet to csv DictReader without needing to create a new csv file from the sheet?
I want to be able to access the data contained in the Excel sheet in the same way one can access data with csv.DictReader.
Per @Clade's suggestion, I used pandas.read_excel:
import csv
import easygui as eg
import pandas as pd
sheets = ['UC Apps', 'IOS Devices']
for sheet in sheets:
    xlsx_data = pd.read_excel(eg.fileopenbox(), sheet, index_col=None)
    csv_as_string = xlsx_data.to_csv(index=False)
    reader = csv.DictReader(csv_as_string.splitlines())
    for row in reader:
        print(row)
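If you don't want the easygui file picker, the same pattern works with a plain path; the filename below is a made-up example:
import csv
import pandas as pd

xlsx_data = pd.read_excel('devices.xlsx', sheet_name='UC Apps', index_col=None)  # hypothetical file
csv_as_string = xlsx_data.to_csv(index=False)
for row in csv.DictReader(csv_as_string.splitlines()):
    print(row)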

Python Merge Multiple Excel sheets to form a summary sheet

I need to merge data from multiple sheets of an Excel workbook to form a new summary sheet using Python. I am using pandas to read the Excel sheets and create the new summary sheet. After concatenation the table formatting is lost, i.e. the header and borders.
Is there a way to read from the source sheets with their formatting and write that to the final sheet?
If the first is not possible, how can I format the data after concatenation?
Python Code to concatenate:
import pandas as pd
df = []
xlsFile = "some path excel"
sheetNames = ['Sheet1', 'Sheet2','Sheet3']
for nms in sheetNames:
    data = pd.read_excel(xlsFile, sheet_name=nms, header=None, skiprows=1)
    df.append(data)
final = "some other path excel "
df = pd.concat(df)
df.to_excel(final, index=False, header=None)
(Screenshots in the original post: Sheet 1, Sheet 2, and Sheet 3 input data, and the summary sheet output.)
You can try the following code:
df = pd.concat(pd.read_excel('some path excel.xlsx', sheet_name=None), ignore_index=True)
If you set sheet_name=None you can read all the sheets in the workbook at one time.
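To also write the result to the summary file from the question, a minimal sketch (the paths are the question's own placeholders):
import pandas as pd

df = pd.concat(pd.read_excel('some path excel.xlsx', sheet_name=None),
               ignore_index=True)
df.to_excel('some other path excel.xlsx', index=False)
As the question notes, this carries over the data but not the source formatting; the xlrd suggestion below is about reading that formatting information.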
I suggest the xlrd library
(https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966
and https://github.com/python-excel/xlrd).
It is a good library for this.
from xlrd import open_workbook
path = '/Users/.../Desktop/Workbook1.xls'
wb = open_workbook(path, formatting_info=True)
sheet = wb.sheet_by_name("Sheet1")
cell = sheet.cell(0, 0) # The first cell
print("cell.xf_index is", cell.xf_index)
fmt = wb.xf_list[cell.xf_index]
print("type(fmt) is", type(fmt))
print("Dumped Info:")
fmt.dump()
see also:
Using XLRD module and Python to determine cell font style (italics or not)
and How to read excel cell and retain or detect its format in Python (the above code is taken from that answer).
