Cannot find a way to replace last row inside excel files - excel

I am fighting with an excel file in which I would simple delete the last row.
I am using XLSXWRITER, and I tried several ways, but nothing is working. I am doing something wrong (maybe I have to take a break).
I tried
worksheet.write_blank(row, col, None)
but I found out that xlsxwriter cannot replace an old row with a new one. So if I use write_blank() to write on on an existing row, it won't work.
Could you please help me? I am looping through several XLSX file, open them and replace the last row with a blank.
Many thanks!

So, I found a way to achieve this step on my own.
Basically I wasn't able to do this with XLSXWRITER library, so I loop through my excel files opening them with OPENPYXL.
import openpyxl
from openpyxl import Workbook
## look for all excel files needed
filepath = r"C:\Users\name\Desktop\folder\folder\folder"
xlsxfiles = glob.glob(filepath + r"\**\*.xlsx")
## for each excel file open the workbook and spreadsheet
for file in xlsxfiles:
wb = openpyxl.load_workbook(file)
ws = wb.active
## for each excel file, count the maximum number of rows and store the value in last_row variable
last_row = ws.max_row
print("MAX NUMER OF ROW: ", last_row)
## replace the last row with None value
ws.cell(last_row, 1).value = None
## save each excel file
wb.save(file)
My need was quite specific but I think it can be easily modify to different purposes.

Related

Openpyxl created excel file with table causes file that requires recovery error

I have been testing adding a table to a worksheet using openpyxl, but I get the error below when I try to open it. The file opens, but the formatting isn't correct. After hitting recover, excel reports that there was an issue with the table xml. Is there a workaround/fix for this?
The code I'm using:
import openpyxl
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo
xl_file_name = "new_test.xlsx"
wb = Workbook()
ws = wb.worksheets[0]
ws.title = "Table_Sheet"
headers = ["header1","header2","header3"]
for col in range(1,len(headers)+1):
for row in range(1,5):
if row == 1:
ws.cell(row,col).value = headers[col-1]
else:
ws.cell(row,col).value = str(row)
tbl = Table(displayName="Tbl1",ref="A1:C4")
style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False, showLastColumn=False, showRowStripes=True, showColumnStripes=True)
tbl.tableStyleInfo = style
ws.add_table(tbl)
wb.save("new_test.xlsx")
Your name for the table is causing the problem. Run the same code with displayName="Tbl" or displayName="Tbl_1" instead, and you'll see it works fine. I'm not 100% sure, but I think the cause of the issue is that the name you give conflicts with the formatting for a possible cell reference of TBL1.
For me the following worked:
Change the Workbook as you wish (only Data no formatting)
Save the Workbook (If you would try to open it here it will display the error message)
Close the Workbook
Open the Workbook again (I think here Excel fixes the issue automatically)
Insert necessary formatting commands
Save the workbook
Close the Workbook
Or, as code:
import openpyxl
workbook = openpyxl.load_workbook(Source_Path)
##your code appending and deleting values - which I think sometimes causes the errors
workbook.save(Destination_Path)
workbook.close
#Now open it again
workbook = openpyxl.load_workbook(Destination_Path)
#Your Code to format
workbook.save(Destination_Path)
workbook.close
Now you should be able to open the Excel file without an error.
I've had the same error message.
I was creating tables with numbers at the start of the name, so I changed that code to add t_ at the beginning, so
table_name = "112MHZ_data"
became
table_name = "t_112MHZ_data"
And that solved it for me.

how to update a portion of existing excel sheet with filtered dataframe?

I have an excel workbook with several sheets. I need to read a portion from one of the sheets, get a filtered dataframe and write a single value from that filtered dataframe to a specific cell in the same sheet. What is the best way to accomplish this, ideally without opening the excel workbook? I need to run this on linux, so can't use xlwings. I don't want to write the entire sheet, but just a selected cell/offset inside it. I tried the following to write to the existing sheet, but doesn't seem to work for me (no update occurs at the desired cell):
with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
writer.book = load_workbook('test.xlsx')
df_filtered.to_excel(writer, 'Sheet_Name', columns=['CS'], startrow=638, startcol=96)
Any tips would be helpful. Thanks.
If you're just writing a single cell the below should suffice.
import pandas as pd
import openpyxl
df = pd.DataFrame(data=[1,2,3], columns=['col'])
filtered_dataframe = df[df.col == 1].values[0][0]
filename = 'test.xlsx'
wb = openpyxl.load_workbook(filename)
wb['Sheet1'].cell(column=1, row=2, value=filtered_dataframe)
wb.save(filename)
I believe your issue was that you never called the save method of the writer.

Copy data from excel to excel with openpyxl

Couldn't find answers I can understand. So I've decided to ask.
I'm learning Python. And now I'm trying to solve a problem with collecting data from active spreadsheet in one excel file and paste it to another excel file. The first file contains table and a few cells with information to the right of it. I'm trying to fully copy the spreadsheet data.
import openpyxl, os
from openpyxl.cell import get_column_letter
os.chdir('D:\\Python')
wb = openpyxl.load_workbook('auto.xlsx')
ws = wb.active
# Create new workbook and chose active worksheet.
wbNew = openpyxl.Workbook()
wsNew = wbNew.active
# Loop through all cells (rows, then columns) in the first file
# and fill in the second one.
for allObj in ws['A1':get_column_letter(ws.max_column) + str(ws.max_row)]:
for cellObj in allObj:
for allNewObj in wsNew:
for newCellObj in allNewObj:
wsNew[cellObj.coordinate].value = cellObj.value
wbNew.save('example.xlsx')
Finally it works.

Working with Excel sheets in MATLAB

I need to import some Excel files in MATLAB and work on them. My problem is that each Excel file has 15 sheets and I don't know how to "number" each sheet so that I can make a loop or something similar (because I need to find the average on a certain column on each sheet).
I have already tried importing the data and building a loop but MATLAB registers the sheets as chars.
Use xlsinfo to get the sheet names, then use xlsread in a loop.
[status,sheets,xlFormat] = xlsfinfo(filename);
for sheetindex=1:numel(sheets)
[num,txt,raw]=xlsread(filename,sheets{sheetindex});
data{sheetindex}=num; %keep for example the numeric data to process it later outside the loop.
end
I 've just remembered that i posted this question almost 2 years ago, and since I figured it out, I thought that posting the answer could prove useful to someone in the future.
So to recap; I needed to import a single column from 4 excel files, with each file containing 15 worksheets. The columns were of variable lengths. I figured out two ways to do this. The first one is by using the xlsread function with the following syntax.
for count_p = 1:2
a = sprintf('control_group_%d.xls',count_p);
[status,sheets,xlFormat] = xlsfinfo(a);
for sheetindex=1:numel(sheets)
[num,txt,raw]=xlsread(a,sheets{sheetindex},'','basic');
data{sheetindex}=num;
FifthCol{count_p,sheetindex} = (data{sheetindex}(:,5));
end
end
for count_p = 3:4
a = sprintf('exercise_group_%d.xls',(count_p-2));
[status,sheets,xlFormat] = xlsfinfo(a);
for sheetindex=1:numel(sheets)
[num,txt,raw]=xlsread(a,sheets{sheetindex},'','basic');
data{sheetindex}=num;
FifthCol{count_p,sheetindex} = (data{sheetindex}(:,5));
end
end
The files where obviously named control_group_1, control_group_2 etc. I used the 'basic' input in xlsread, because I only needed the raw data from the files, and it proved to be much faster than using the full functionality of the function.
The second way to import the data, and the one that i ended up using, is building your own activeX server and running a single excelapplication on it. Xlsread "opens" and "closes" an activeX server each time it's called so it's rather time consuming (using the 'basic' input does not though). The code i used is the following.
Folder=cd(pwd); %getting the working directory
d = dir('*.xls'); %finding the xls files
N_File=numel(d); % Number of files
hexcel = actxserver ('Excel.Application'); %starting the activeX server
%and running an Excel
%Application on it
hexcel.DisplayAlerts = true;
for index = 1:N_File %Looping through the workbooks(xls files)
Wrkbk = hexcel.Workbooks.Open(fullfile(pwd, d(index).name)); %VBA
%functions
WorkName = Wrkbk.Name; %getting the workbook name %&commands
display(WorkName)
Sheets=Wrkbk.Sheets; %sheets handle
ShCo(index)=Wrkbk.Sheets.Count; %counting them for use in the next loop
for j = 1:ShCo(index) %looping through each sheet
itemm = hexcel.Sheets.Item(sprintf('sheet%d',j)); %VBA commands
itemm.Activate;
robj = itemm.Columns.End(4); %getting the column i needed
numrows = robj.row; %counting to the end of the column
dat_range = ['E1:E' num2str(numrows)]; %data range
rngObj = hexcel.Range(dat_range);
xldat{index, j} = cell2mat(rngObj.Value); %getting the data in a cell
end;
end
%invoke(hexcel);
Quit(hexcel);
delete(hexcel);

How to import lots of data into matlab from a spreadsheet?

I have an excel spreadsheet with lots of data that I want to import into matlab.
filename = 'for_matlab.xlsx';
sheet = (13*2)+ 1;
xlRange = 'A1:G6';
all_data = {'one_a', 'one_b', 'two_a', 'two_b', 'three_a', 'three_b', 'four_a', 'four_b', 'five_a', 'five_b', 'six_a', 'six_b', 'seven_a', 'seven_b', 'eight_a', 'eight_b', 'nine_a', 'nine_b', 'ten_a', 'ten_b', 'eleven_a', 'eleven_b', 'twelve_a', 'twelve_b', 'thirteen_a', 'thirteen_b', 'fourteen_a'};
%read data from excel spreadsheet
for i=1:sheet,
all_data{i} = xlsread(filename, sheet, xlRange);
end
Each element of the 'all_data' vector has a corresponding matrix in separate excel sheet. The code above imports the last matrix only into all of the variables. Could somebody tell me how to get it so I can import these matrices into individual matlab variables (without calling the xlsread function 28 times)?
You define a loop using i but then put sheet in the actual xlsread call, which will just make it read repeatedly from the same sheet (the value of the variable sheet is not changing). Also not sure whether you intend to somehow save the contents of all_data, as written there's no point in defining it that way as it will just be overwritten.
There are two ways of specifying the sheet using xlsread.
1) Using a number. If you intended this then:
all_data{i} = xlsread(filename, i, xlRange);
2) Using the name of the sheet. If you intended this and the contents of all_data are the names of sheets, then:
data{i} = xlsread(filename, all_data{i}, xlRange); %avoiding overwriting

Resources