Openpyxl - Setting Active Sheet (page.append not working as expected) - python-3.x

I've tried implementing multiple solutions from existing posts, like this one, to no avail:
openpyxl Set Active Sheet
I have a workbook with n number of sheets that I want to iteratively step through, and apply header information to. As far as I can tell, using wb.active = i is setting the active worksheet, but when I follow up with page.append(header), I end up with the header appended n times, ONLY to the index 0 sheet. This is essentially the same q as the link above, but the solution doesn't seem to work.
What am I missing here? I wonder if I need to specify an index for page.append(), but that doesn't seem to be a valid argument for that func.
CODE
header = ['Time [sec]', 'Altitude [km]', 'Velocity [km/s]']
for i in range(len(wb.sheetnames)):
    wb.active = i
    print(wb.active)
    page.append(header)
wb.save(path)
CONSOLE (verifies that the wb.active function is working, but the sheets specified aren't being appended)
<Worksheet "ORB1">
<Worksheet "ORB2">
<Worksheet "ORB3">
<Worksheet "ORB4">
<Worksheet "ORB5">
Here is another version which produces the same result (5x headers applied only to the first sheet).
header = ['Time [sec]', 'Altitude [km]', 'Velocity [km/s]']
for i, s in enumerate(wb.sheetnames):
    page.append(header)
wb.save(path)

This one is SOLVED but I want to keep the q up because the solution is... weird.
Earlier in the code I was assigning page = wb.active, and then later using page.append(header).
The issue with that has to do with how the active sheet is set and read.
wb.active is a property, not a method: you set it with wb.active = sheet_index rather than calling wb.active(sheet_index), and reading it returns whichever worksheet is active at that moment.
Because page = wb.active ran before the loop, page kept pointing at the sheet that was active back then (index 0). Changing wb.active afterwards does not update page.
TLDR: This does not work if page was assigned before the active sheet was changed...
page = wb.active
page.append(header)
Inside the loop, after setting wb.active, you must use...
wb.active.append(header)
The property syntax looks odd next to an ordinary method call, and I suspect I'm not the only person to have had this issue.
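For what it's worth, the loop can also skip wb.active entirely and walk the worksheet objects directly. A minimal sketch, assuming wb is an openpyxl Workbook; wb.worksheets and ws.append are openpyxl calls, while the helper name add_header_to_all_sheets is made up for illustration:

```python
def add_header_to_all_sheets(wb, header):
    """Append `header` as a new row to every worksheet in `wb`.

    `wb` is assumed to be an openpyxl Workbook. Only `wb.worksheets`
    and `ws.append` are used, so the active sheet is never touched.
    """
    for ws in wb.worksheets:
        ws.append(header)
    return wb
```

With a real file this would be wb = openpyxl.load_workbook(path), then add_header_to_all_sheets(wb, header), then wb.save(path). Note that append writes below the last used row, so on sheets that already hold data the header lands at the bottom rather than the top.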

Related

Set workbook view with openpyxl?

Is it possible to set the Workbook View to "Page Layout" with openpyxl? Looking on Stack Overflow and in the openpyxl docs, I can't seem to find it. Is it possible?
Yes, it is possible with this code:
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
#Value must be one of {'pageBreakPreview', 'pageLayout', 'normal'}
ws.sheet_view.view = "pageLayout"
How did I find out?
To my surprise I also could not find any tutorials or anything in the docs on this topic.
I did a little digging and if you type into the terminal:
print(ws.sheet_view)
A series of parameters will pop out, including the one we are looking for (the view attribute):
Parameters:
windowProtection=None, showFormulas=None, showGridLines=None, showRowColHeaders=None, showZeros=None, rightToLeft=None, tabSelected=None, showRuler=None, showOutlineSymbols=None, defaultGridColor=None, showWhiteSpace=None, view=None, topLeftCell=None, colorId=None, zoomScale=None, zoomScaleNormal=None, zoomScaleSheetLayoutView=None, zoomScalePageLayoutView=None, zoomToFit=None, workbookViewId=0, pane=None, selection=[<openpyxl.worksheet.views.Selection object>
You can then set this attribute, as in the code at the top of this answer, to one of the three predefined values; any other value raises ValueError: Value must be one of {'pageBreakPreview', 'pageLayout', 'normal'}.

Openpyxl returns wrong hyperlink address after delete_rows()

Problem: I have a program that scrapes Twitter and returns the results in an excel file. Part of each entry is a column containing a hyperlink to the Tweet and image included in the Tweet if applicable. Entries and hyperlinks work fine except when I run the following code to remove duplicate posts:
#Remove duplicate posts.
values = []
i = 2
while i <= sheet.max_row:
    if sheet.cell(row=i,column=3).value in values:
        sheet.delete_rows(i,1)
    else:
        values.append(sheet.cell(row=i,column=3).value)
        i+=1
After running the duplicate removal snippet the hyperlinks point to what I assume is the offset of deleted entries. Here is the code for creating a Twitter entry:
sheet.cell(row=row, column=8).hyperlink = "https://twitter.com/"+str(tweet.user.screen_name)+"/status/"+str(tweet.id)
sheet.cell(row=row, column=8).style = "Hyperlink"
Expected Results: Should be able to remove duplicate entries and keep the hyperlink pointed to the correct address.
For whatever reason, the hyperlinks point to the correct addresses when I change the code to this:
sheet.cell(row=row, column=8).value = "https://twitter.com/"+str(tweet.user.screen_name)+"/status/"+str(tweet.id)
sheet.cell(row=row, column=8).style = "Hyperlink"
The catch is that the cell then requires a rapid double click to work as a hyperlink in the Excel sheet, versus a single click when the link is inserted using .hyperlink.
So fixed but not fixed.
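One way to sidestep the problem, sketched under the assumption that the scraped entries are available as a list before anything is written to the sheet: drop the duplicates in plain Python first, so delete_rows is never called and .hyperlink is only ever set on rows that stay where they are. The helper name drop_duplicate_posts and the dict keys text and url are hypothetical stand-ins for the real tweet fields.

```python
def drop_duplicate_posts(entries, key="text"):
    """Return entries with later duplicates (by `key`) removed, order preserved."""
    seen = set()
    unique = []
    for entry in entries:
        value = entry[key]
        if value in seen:
            continue  # duplicate post: skip it instead of deleting a row later
        seen.add(value)
        unique.append(entry)
    return unique


# Hypothetical sample data standing in for scraped tweets.
entries = [
    {"text": "hello", "url": "https://twitter.com/a/status/1"},
    {"text": "world", "url": "https://twitter.com/b/status/2"},
    {"text": "hello", "url": "https://twitter.com/c/status/3"},
]
unique_entries = drop_duplicate_posts(entries)
print([e["url"] for e in unique_entries])
```

Each remaining entry can then be written with sheet.cell(row=row, column=8).hyperlink = entry["url"], as in the original code.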

Paste Special Transpose Syntax in Matlab using ActXServer

I am working on code in Matlab that will open an Excel spreadsheet, copy a certain range, and paste it into a new sheet, transposing my range in the process. I am completely stuck on the PasteSpecial method and cannot figure out how to make it transpose my data. I've tried everything I could think of: tried VBA-like syntax (Transpose=True), tried (Transpose, 1), tried ([],[],[],1), tried obj.Transpose (with all kinds of variations in the brackets)... and all kinds of other stuff to no avail. Please help me if anyone has done this before. Below is my simplified code in case it's needed. Thank you in advance!
Excel = actxGetRunningServer('excel.application');
set(Excel, 'Visible', 1);
Workbooks = Excel.Workbooks;
Workbook = Excel.Workbooks.Open('C:\Users\...test.xlsx');
curr_sheet = get(Workbook,'ActiveSheet');
rngObj = curr_sheet.Range('A1:C3');
rngObj.Copy;
Sheets = Excel.ActiveWorkBook.Sheets;
new_sheet = Sheets.Add;
new_sheet.PasteSpecial; %This is where I am stuck!
The documentation for PasteSpecial has four input arguments to indicate the parameters of the paste operation. As you can see, the fourth option indicates whether to transpose the data or not.
new_sheet.PasteSpecial(NaN, NaN, NaN, true);

Can pandas implicitly determine header based on value, not row?

I work with people who use Excel and continuously add or subtract rows unbeknownst to me. I have to scrape a document for data, and the row where the header is found changes based on moods.
My challenge is to handle these oscillating currents by detecting where the header is.
I first organized my scrape using xlrd and a number of conditional statements using the values in the workbook.
My initial attempt works and is long (so I will not publish it) but involves bringing in the entire sheet, and not slices:
from xlrd import open_workbook
book = open_workbook(fName)
sheet = book.sheet_by_name(sht)
return book,sheet
However, it is big and I would prefer to get a more targeted selection. The header values never change, nor does the position of the data relative to the header row.
Do you know of a way to implicitly get the header based on a found value in the sheet using either pandas.ExcelFile or pandas.read_excel?
Here is my attempt with pandas.ExcelFile:
import pandas as pd
xlsx = pd.ExcelFile(fName)
dataFrame = pd.read_excel(xlsx, sht,
                          parse_cols=21, merge_cells=noMerge,
                          header=header)
return dataFrame
I cannot get the code to work unless I give the call the correct header value, which is exactly what I'm hoping to avoid.
This previous question seems to present a similar problem without addressing the concern of finding the headers implicitly.
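Run the same search against the sheet behind the ExcelFile object, then pass the result to read_excel:
xlsx = pd.ExcelFile(fName)
sheet = xlsx.book.sheet_by_name(sht)  # xlsx.book is the underlying xlrd Book when the xlrd engine is used
# apply the same algorithm you wrote against xlrd here
# ... results in having header_row = something, 0 based
dataFrame = pd.read_excel(xlsx, sht,
                          parse_cols=21, merge_cells=noMerge,  # parse_cols is called usecols in newer pandas
                          skiprows=header_row)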
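A sketch of the header search itself, assuming at least one header label is known in advance (here "Time" is a hypothetical label, and find_header_row is a made-up helper name). It works on any list of rows, so it does not depend on how the sheet was read.

```python
def find_header_row(rows, known_label):
    """Return the 0-based index of the first row containing `known_label`.

    Raises ValueError if no row contains the label.
    """
    for index, row in enumerate(rows):
        if known_label in row:
            return index
    raise ValueError(f"no row contains {known_label!r}")


# Hypothetical sheet contents: two junk rows above the real header.
rows = [
    ["Report generated by someone", None, None],
    [None, None, None],
    ["Time", "Altitude", "Velocity"],
    [0.0, 400.0, 7.6],
]
header_row = find_header_row(rows, "Time")
print(header_row)  # 2
```

With pandas, the rows could come from raw = pd.read_excel(xlsx, sht, header=None) followed by raw.values.tolist(), and the result should then be usable as pd.read_excel(xlsx, sht, header=header_row).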

Working with Excel sheets in MATLAB

I need to import some Excel files in MATLAB and work on them. My problem is that each Excel file has 15 sheets and I don't know how to "number" each sheet so that I can make a loop or something similar (because I need to find the average on a certain column on each sheet).
I have already tried importing the data and building a loop but MATLAB registers the sheets as chars.
Use xlsfinfo to get the sheet names, then use xlsread in a loop.
[status,sheets,xlFormat] = xlsfinfo(filename);
for sheetindex=1:numel(sheets)
    [num,txt,raw]=xlsread(filename,sheets{sheetindex});
    data{sheetindex}=num; %keep for example the numeric data to process it later outside the loop.
end
I've just remembered that I posted this question almost 2 years ago, and since I figured it out, I thought that posting the answer could prove useful to someone in the future.
So to recap: I needed to import a single column from 4 Excel files, with each file containing 15 worksheets. The columns were of variable lengths. I figured out two ways to do this. The first one is by using the xlsread function with the following syntax.
for count_p = 1:2
    a = sprintf('control_group_%d.xls',count_p);
    [status,sheets,xlFormat] = xlsfinfo(a);
    for sheetindex=1:numel(sheets)
        [num,txt,raw]=xlsread(a,sheets{sheetindex},'','basic');
        data{sheetindex}=num;
        FifthCol{count_p,sheetindex} = (data{sheetindex}(:,5));
    end
end
for count_p = 3:4
    a = sprintf('exercise_group_%d.xls',(count_p-2));
    [status,sheets,xlFormat] = xlsfinfo(a);
    for sheetindex=1:numel(sheets)
        [num,txt,raw]=xlsread(a,sheets{sheetindex},'','basic');
        data{sheetindex}=num;
        FifthCol{count_p,sheetindex} = (data{sheetindex}(:,5));
    end
end
The files were obviously named control_group_1, control_group_2 etc. I used the 'basic' input in xlsread because I only needed the raw data from the files, and it proved to be much faster than using the full functionality of the function.
The second way to import the data, and the one that I ended up using, is to start your own ActiveX server and run a single Excel application on it. xlsread opens and closes an ActiveX server each time it's called, so it's rather time consuming (using the 'basic' input avoids this, though). The code I used is the following.
Folder = cd(pwd); %getting the working directory
d = dir('*.xls'); %finding the xls files
N_File = numel(d); %number of files
hexcel = actxserver('Excel.Application'); %starting the ActiveX server and running an Excel application on it
hexcel.DisplayAlerts = true;
for index = 1:N_File %looping through the workbooks (xls files)
    Wrkbk = hexcel.Workbooks.Open(fullfile(pwd, d(index).name)); %VBA functions & commands
    WorkName = Wrkbk.Name; %getting the workbook name
    display(WorkName)
    Sheets = Wrkbk.Sheets; %sheets handle
    ShCo(index) = Wrkbk.Sheets.Count; %counting them for use in the next loop
    for j = 1:ShCo(index) %looping through each sheet
        itemm = hexcel.Sheets.Item(sprintf('sheet%d',j)); %VBA commands
        itemm.Activate;
        robj = itemm.Columns.End(4); %getting the column I needed
        numrows = robj.row; %counting to the end of the column
        dat_range = ['E1:E' num2str(numrows)]; %data range
        rngObj = hexcel.Range(dat_range);
        xldat{index, j} = cell2mat(rngObj.Value); %getting the data in a cell
    end
end
%invoke(hexcel);
Quit(hexcel);
delete(hexcel);
