Case insensitive search of an Excel file using Pandas read_excel - python-3.x

I need to read sheets with a certain name from an Excel file. Unfortunately, the sheet names are sometimes capitalized inconsistently, e.g. "Test Sheet" vs "Test sheet". I need a case-insensitive way of getting these sheets.
import pandas as pd
excel_file = pd.ExcelFile("file_name.xlsx")
sheet_needed = pd.read_excel(excel_file, sheet_name="Test Sheet") # <- This needs to be case insensitive

pandas doesn't seem to have a built-in way to do a case-insensitive sheet lookup. However, you can get the sheet names as a list, and pd.read_excel accepts an integer index for sheet_name, so I came up with this to solve the problem:
import pandas as pd
excel_file = pd.ExcelFile("file_name.xlsx")
sheet_to_find = "Test Sheet"
# Get all the sheetnames as a list
sheet_names = excel_file.sheet_names
# Format the list of sheet names
sheet_names = [name.lower() for name in sheet_names]
# Get the index that matches our sheet to find
index = sheet_names.index(sheet_to_find.lower())
# Feed this index into pandas
sheet_needed = pd.read_excel(excel_file, sheet_name=index)
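A variation on the approach above, as a minimal sketch: instead of passing a positional index, resolve the requested name to the sheet's actual spelling and pass that string on. The helper name find_sheet_name is hypothetical, and using casefold() with strip() rather than lower() is an assumption about how forgiving the match should be.

```python
def find_sheet_name(sheet_names, wanted):
    """Return the entry of sheet_names that matches wanted, ignoring case and outer whitespace."""
    key = wanted.strip().casefold()
    matches = [name for name in sheet_names if name.strip().casefold() == key]
    if not matches:
        raise KeyError(f"No sheet matching {wanted!r}; available: {list(sheet_names)}")
    if len(matches) > 1:
        raise ValueError(f"Ambiguous sheet name {wanted!r}: {matches}")
    return matches[0]


# Stand-in for excel_file.sheet_names (invented for illustration)
example_names = ["Summary", "Test sheet", "Notes"]
resolved = find_sheet_name(example_names, "Test Sheet")
print(resolved)
```

With a real workbook the call would presumably look like pd.read_excel(excel_file, sheet_name=find_sheet_name(excel_file.sheet_names, "Test Sheet")). Raising on a missing or ambiguous name gives a clearer message than the bare ValueError from list.index().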

I don't know how to make that request case insensitive, but you could try normalizing the sheet names in the file with openpyxl, something like this:
import openpyxl
filename = 'file_name.xlsx'
wb = openpyxl.load_workbook(filename)
for ws in wb.worksheets:
    ws.title = ws.title.title()
filename = 'new_'+filename
wb.save(filename)
wb.close()
Each old title is replaced with its title-cased version. You could also use the lower() or upper() method of str instead.
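One caveat worth checking before renaming sheets this way: str.title() capitalizes the letter after any non-letter, which can mangle names containing apostrophes or digits. The sketch below compares it with string.capwords from the standard library, which only capitalizes after whitespace. The sample names are invented for illustration.

```python
import string

names = ["client's data", "2nd quarter", "test sheet"]

# Map each name to (str.title() result, string.capwords() result)
results = {name: (name.title(), string.capwords(name)) for name in names}

for name, (titled, capworded) in results.items():
    print(f"{name!r}: title() -> {titled!r}, capwords() -> {capworded!r}")
# "client's data" becomes "Client'S Data" with title() but "Client's Data" with capwords()
# "2nd quarter" becomes "2Nd Quarter" with title() but "2nd Quarter" with capwords()
```

Note that capwords() also collapses runs of whitespace and strips the ends, so it is not a drop-in replacement if sheet names rely on exact spacing.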

Related

Python3 - Openpyxl - For loop to search through Column - Gather information Based on position of first cell location

UPDATE!
My goal is to modify an existing Workbook ( example - master_v2.xlsm ) and produce a new workbook (Newclient4) based on the updates made to master_v2.
I'm using a single sheet within master_v2 to collect all the data which will be determining what the new workbook will be.
Currently using multiple if statements to find the value of the cells in this "repository" sheet. Based on specific cells, I'm creating and adding values to copies of an existing sheet called "PANDAS".
My goal right now is to create a dict based on two columns, then loop through the keys so that every time I get a hit on a cell, I gather values from specific keys.
My current code is listed below:
from openpyxl import load_workbook
# Start by opening the spreadsheet and selecting the main sheet
workbook = load_workbook(filename="master_v2.xlsm", read_only=False, keep_vba=True)
DATASOURCE = workbook['repository']
DATASOURCE["A:H"]
cell100 = DATASOURCE["F6"].value
CREATION = cell100
cell101 = DATASOURCE["F135"].value
CREATION2 = cell101
cell107 = DATASOURCE["F780"].value
CREATION7 = cell107
if CREATION.isnumeric():
    source = workbook['PANDAS']
    target = workbook.copy_worksheet(source)
    ss_sheet = target
    ss_sheet.title = DATASOURCE['H4'].value[0:12]+' PANDAS'
if CREATION2.isnumeric():
    source = workbook['PANDAS']
    target = workbook.copy_worksheet(source)
    ss_sheet = target
    ss_sheet.title = DATASOURCE['H133'].value[0:12]+' PANDAS'
# Note: CREATION3 is never assigned in this snippet (only CREATION, CREATION2 and CREATION7 are)
if CREATION3.isnumeric():
    source = workbook['PANDAS']
    target = workbook.copy_worksheet(source)
    ss_sheet = target
    ss_sheet.title = DATASOURCE['H262'].value[0:12]+' PANDAS'
else:
    print("no")
workbook.save(filename="NewClient4.xlsm")
Instead of the many if statements, I was hoping to loop through the column as explained above.
Once I find my value, I gather data and copy it over to a copy of a sheet, which is then filled out from other cells. Each time the loop completes, I want to repeat on the next match of the string, but I've only gotten this far and it's not quite working.
Does anyone have a way to get this working?
(I'm trying to replace the many one-to-one mappings and if statements.)
for i in range(1,3000):
    if DATASOURCE.cell(row=i,column=5).value == "Customer:":
        source = workbook['Design details']
        target = workbook.copy_worksheet(source)
        ss_sheet = target
        ss_sheet.title = DATASOURCE['H4'].value[0:12]+' Design details'
    else:
        print("no")
Thank you in advance.
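One possible way to replace the one-to-one mappings, as a rough sketch: scan the marker column once, record the row number of every hit, and then drive the copy step from that list instead of from hard-coded cell addresses. The function name find_marker_rows and the sample values are hypothetical, and the input is a plain list standing in for column E of the "repository" sheet.

```python
def find_marker_rows(column_values, marker="Customer:"):
    """Return the 1-based row numbers whose cell value equals marker."""
    hits = []
    for row_number, value in enumerate(column_values, start=1):
        # Cells may hold None or numbers, so only compare stripped strings.
        if isinstance(value, str) and value.strip() == marker:
            hits.append(row_number)
    return hits


# Stand-in for the values of column E (invented for illustration)
column_e = [None, "Customer:", "x", None, "Customer:", 42]
rows = find_marker_rows(column_e)
print(rows)  # [2, 5]
```

With openpyxl, the list could be built with [row[0] for row in DATASOURCE.iter_rows(min_col=5, max_col=5, values_only=True)]. Each returned row number can then be used to address cells relative to the hit, for example DATASOURCE.cell(row=r, column=8).value, assuming the name sits in column H of the same row. In the loop as posted, the title is always read from H4, so every copy is named from the same cell.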

Openpyxl created excel file with table causes file that requires recovery error

I have been testing adding a table to a worksheet using openpyxl, but I get the error below when I try to open it. The file opens, but the formatting isn't correct. After hitting recover, excel reports that there was an issue with the table xml. Is there a workaround/fix for this?
The code I'm using:
import openpyxl
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo
xl_file_name = "new_test.xlsx"
wb = Workbook()
ws = wb.worksheets[0]
ws.title = "Table_Sheet"
headers = ["header1","header2","header3"]
for col in range(1,len(headers)+1):
    for row in range(1,5):
        if row == 1:
            ws.cell(row,col).value = headers[col-1]
        else:
            ws.cell(row,col).value = str(row)
tbl = Table(displayName="Tbl1",ref="A1:C4")
style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False, showLastColumn=False, showRowStripes=True, showColumnStripes=True)
tbl.tableStyleInfo = style
ws.add_table(tbl)
wb.save("new_test.xlsx")
Your name for the table is causing the problem. Run the same code with displayName="Tbl" or displayName="Tbl_1" instead, and you'll see it works fine. I'm not 100% sure, but I think the cause of the issue is that the name you give conflicts with the formatting for a possible cell reference of TBL1.
For me the following worked:
Change the Workbook as you wish (only Data no formatting)
Save the Workbook (If you would try to open it here it will display the error message)
Close the Workbook
Open the Workbook again (I think here Excel fixes the issue automatically)
Insert necessary formatting commands
Save the workbook
Close the Workbook
Or, as code:
import openpyxl
workbook = openpyxl.load_workbook(Source_Path)
##your code appending and deleting values - which I think sometimes causes the errors
workbook.save(Destination_Path)
workbook.close()
#Now open it again
workbook = openpyxl.load_workbook(Destination_Path)
#Your Code to format
workbook.save(Destination_Path)
workbook.close()
Now you should be able to open the Excel file without an error.
I've had the same error message.
I was creating tables with numbers at the start of the name, so I changed that code to add t_ at the beginning, so
table_name = "112MHZ_data"
became
table_name = "t_112MHZ_data"
And that solved it for me.
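The two naming problems reported above (a name that reads like a cell reference, and a name that starts with a digit) can be caught before the table is created. The check below is a conservative sketch, not openpyxl's own validation: the function name is_safe_table_name is hypothetical, and the rules are an approximation based on these answers and on Excel's documented naming rules as I understand them.

```python
import re

# Start with a letter or underscore; then letters, digits, underscores or periods
_ALLOWED_SHAPE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")
# One to three letters followed by digits looks like an A1-style reference (e.g. TBL1)
_A1_REFERENCE = re.compile(r"^[A-Za-z]{1,3}\d+$")
# R<digits>C<digits> looks like an R1C1-style reference
_R1C1_REFERENCE = re.compile(r"^[Rr]\d+[Cc]\d+$")


def is_safe_table_name(name):
    """Return True if name avoids the patterns reported to break Excel tables."""
    if not _ALLOWED_SHAPE.match(name):
        return False
    if _A1_REFERENCE.match(name) or _R1C1_REFERENCE.match(name):
        return False
    return True


for candidate in ["Tbl1", "Tbl", "Tbl_1", "112MHZ_data", "t_112MHZ_data"]:
    print(candidate, is_safe_table_name(candidate))
```

The A1 check is deliberately broader than Excel's real column range, so it may reject a few names Excel would accept. That seemed a reasonable trade-off for a pre-flight check.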

Cannot find a way to replace last row inside excel files

I am fighting with an Excel file in which I simply want to delete the last row.
I am using XlsxWriter and have tried several approaches, but nothing is working. I must be doing something wrong (maybe I need to take a break).
I tried
worksheet.write_blank(row, col, None)
but I found out that XlsxWriter cannot replace an old row with a new one, so using write_blank() on an existing row won't work.
Could you please help me? I am looping through several XLSX files, opening each one and replacing the last row with a blank.
Many thanks!
So, I found a way to achieve this on my own.
Basically, I wasn't able to do this with the XlsxWriter library, so I loop through my Excel files and open them with openpyxl.
import glob
import openpyxl
from openpyxl import Workbook
## look for all excel files needed
filepath = r"C:\Users\name\Desktop\folder\folder\folder"
## recursive=True lets ** match nested subfolders
xlsxfiles = glob.glob(filepath + r"\**\*.xlsx", recursive=True)
## for each excel file open the workbook and spreadsheet
for file in xlsxfiles:
    wb = openpyxl.load_workbook(file)
    ws = wb.active
    ## for each excel file, count the maximum number of rows and store the value in last_row variable
    last_row = ws.max_row
    print("MAX NUMBER OF ROWS: ", last_row)
    ## clear the value in column A of the last row
    ws.cell(last_row, 1).value = None
    ## save each excel file
    wb.save(file)
My need was quite specific, but I think this can easily be modified for other purposes.
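The snippet above clears only the cell in column A of the last row. If the whole row should go, openpyxl worksheets also provide delete_rows. A minimal in-memory sketch follows. The helper name drop_last_row and the sample data are invented for illustration.

```python
from openpyxl import Workbook


def drop_last_row(ws):
    """Delete the last populated row of a worksheet, across all columns."""
    ws.delete_rows(ws.max_row)


# Build a small workbook in memory instead of loading one from disk
wb = Workbook()
ws = wb.active
ws.append(["name", "value"])
ws.append(["a", 1])
ws.append(["TOTAL", 1])  # the trailing row we want to remove

drop_last_row(ws)
print(ws.max_row)
```

In the file loop, the same call would replace the ws.cell(last_row, 1).value = None line, followed by wb.save(file) as before.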

how to update a portion of existing excel sheet with filtered dataframe?

I have an Excel workbook with several sheets. I need to read a portion of one of the sheets, get a filtered dataframe, and write a single value from that filtered dataframe to a specific cell in the same sheet. What is the best way to accomplish this, ideally without opening the Excel workbook? I need to run this on Linux, so I can't use xlwings. I don't want to rewrite the entire sheet, just a selected cell/offset inside it. I tried the following to write to the existing sheet, but it doesn't seem to work for me (no update occurs at the desired cell):
with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('test.xlsx')
    df_filtered.to_excel(writer, 'Sheet_Name', columns=['CS'], startrow=638, startcol=96)
Any tips would be helpful. Thanks.
If you're just writing a single cell the below should suffice.
import pandas as pd
import openpyxl
df = pd.DataFrame(data=[1,2,3], columns=['col'])
filtered_dataframe = df[df.col == 1].values[0][0]
filename = 'test.xlsx'
wb = openpyxl.load_workbook(filename)
wb['Sheet1'].cell(column=1, row=2, value=filtered_dataframe)
wb.save(filename)
I believe your issue was that you never called the save method of the writer.
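When comparing the two approaches, it may help to check which cell the offsets in the question actually point at. to_excel treats startrow and startcol as zero-based offsets of the top-left corner of the written block, and by default it also writes a header row and an index column, whereas openpyxl's cell() uses 1-based row and column numbers. The helpers below are a hypothetical sketch for translating between the two.

```python
def column_letter(col_number):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while col_number > 0:
        col_number, remainder = divmod(col_number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def to_a1(startrow, startcol):
    """Convert zero-based (startrow, startcol) offsets to an A1-style address."""
    return f"{column_letter(startcol + 1)}{startrow + 1}"


# Top-left cell addressed by startrow=638, startcol=96
print(to_a1(638, 96))  # CS639
# First data value when the header row and index column are also written
print(to_a1(638 + 1, 96 + 1))  # CT640
```

So with the default header and index settings, the value itself would presumably land one row down and one column to the right of CS639 unless header=False and index=False are passed.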

How to import lots of data into matlab from a spreadsheet?

I have an excel spreadsheet with lots of data that I want to import into matlab.
filename = 'for_matlab.xlsx';
sheet = (13*2)+ 1;
xlRange = 'A1:G6';
all_data = {'one_a', 'one_b', 'two_a', 'two_b', 'three_a', 'three_b', 'four_a', 'four_b', 'five_a', 'five_b', 'six_a', 'six_b', 'seven_a', 'seven_b', 'eight_a', 'eight_b', 'nine_a', 'nine_b', 'ten_a', 'ten_b', 'eleven_a', 'eleven_b', 'twelve_a', 'twelve_b', 'thirteen_a', 'thirteen_b', 'fourteen_a'};
%read data from excel spreadsheet
for i=1:sheet,
all_data{i} = xlsread(filename, sheet, xlRange);
end
Each element of the 'all_data' vector has a corresponding matrix in separate excel sheet. The code above imports the last matrix only into all of the variables. Could somebody tell me how to get it so I can import these matrices into individual matlab variables (without calling the xlsread function 28 times)?
You define a loop over i but then pass sheet to xlsread, so it reads from the same sheet on every iteration (the value of the variable sheet never changes). Also, I'm not sure whether you intend to keep the initial contents of all_data; as written, there's no point in defining it that way, since each element is simply overwritten.
There are two ways of specifying the sheet using xlsread.
1) Using a number. If you intended this then:
all_data{i} = xlsread(filename, i, xlRange);
2) Using the name of the sheet. If you intended this and the contents of all_data are the names of sheets, then:
data{i} = xlsread(filename, all_data{i}, xlRange); %avoiding overwriting

Resources