Modifying Excel document using python and openpyxl - python-3.x

I am trying to clear a sheet in an Excel document using the openpyxl library, but the document in question has links to other documents. Once the program has run and I open the Excel document, I am notified that the file is damaged; more specifically, the damage relates to the removal of the links that were present in the original Excel document.
The code I use is shown below:
import os
import openpyxl
import pandas as pd

def import_data(source_location, xplan_export, workbook):
    os.chdir(source_location)
    read_csv = pd.read_csv(xplan_export, encoding='latin-1')
    # print(read_csv)
    vworkbook = openpyxl.load_workbook(workbook, read_only=False)
    sheet = vworkbook['Xplan_Export']
    # clear Xplan Export sheet in workbook of interest
    for row in sheet.iter_rows():
        for cell in row:
            cell.value = None
    vworkbook.save(workbook)
I would like to work around this so that the original links remain in the file saved by the function above.

If there are links in the Excel file, the cell style will be set to Hyperlink. You can see more details on how to set this here. So you need to remove the formatting as well as the data. I was able to remove both the data (the URL text) and the hyperlink by adding one additional line right below cell.value = None, inside the loop:
cell.style = 'Normal'
This removes all the formatting and worked correctly for me. I hope this is what you are looking for.
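A minimal sketch of the approach described in this answer, assuming openpyxl is installed. The helper name clear_sheet is hypothetical, the demonstration uses an in-memory workbook rather than the asker's file, and it assumes the sheet contains no merged cells.

```python
from openpyxl import Workbook


def clear_sheet(sheet):
    """Blank every cell in the sheet and reset it to the built-in 'Normal' style."""
    for row in sheet.iter_rows():
        for cell in row:
            cell.value = None
            cell.style = 'Normal'  # resets the cell formatting to the Normal named style


# Small in-memory demonstration (no file is read or written).
demo_workbook = Workbook()
demo_sheet = demo_workbook.active
demo_sheet.title = 'Xplan_Export'
demo_sheet['A1'] = 'http://example.com'
demo_sheet['B2'] = 42
clear_sheet(demo_sheet)
```

With a file on disk, the same helper would be called on a sheet obtained from openpyxl.load_workbook(...) before saving. load_workbook also has a keep_links parameter (True by default) that controls whether links to external workbooks are preserved.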

Related

Import a specific sheet in an Excel file into Matlab

How can I import a specific sheet in an Excel file into Matlab (as an array or table)?
Apparently xlsread is not recommended in the official documentation. However, the recommended alternative, readtable, does not seem to allow you to specify a sheet name (or perhaps I missed it?).
Using test = xlsread('myfile.xlsx', 'my sheet name') seems to work fine in my case, except it skips column headers. Is there a way to keep headers?
Using test = readtable('myfile.xlsx') keeps the headers but just automatically imports the first sheet.
On Windows, Matlab R2018a.
By default, readtable reads the first sheet. You can specify sheet number/name as well to read your desired sheet.
test = readtable('myfile.xlsx','Sheet','my sheet name');
Please read the documentation for more details.

Adding formulas to excel spreadsheet using python

I am attempting to insert formulas into an excel spreadsheet using python.
Examples of the formulas are:
=VLOOKUP(B3|"Settlement Info"!$B$2:$R$2199|17|FALSE)
=SUMIFS("Payment and Fees"!$I$2:$I$6445|"Payment and Fees"!$B$2:$B$6445|Summary!$B3)
=+E3-F3-G3-I3
=IF(AND(I3>0|I3-N3>=-0.1|I3-N3<=0.1)|"Yes"|"No")
I tried using xlsxwriter, but when I open the spreadsheet in Excel, it repairs the file by removing the "unreadable" content, and those cells show as 0. I've seen the comment that the recalculation should be done when the sheet is reopened after using xlsxwriter, but that does not look like it is being done (https://xlsxwriter.readthedocs.io/working_with_formulas.html).
Is there some way to get these formulas into excel without them being removed by excel on opening?
Thanks for any pointers.
I simplified this down as far as possible:
When I run the below code and then attempt to open the excel spreadsheet I get an error saying "We found a problem with some content in ...Do you want us to try to recover..If you trust the source select yes"
If I select yes then I get an error " Removed Records: Formula from /xl/worksheets/sheet1.xml part"
And if I continue the sheet opens and there is a 0 in the field.
from xlsxwriter.workbook import Workbook
workbook = Workbook('test.xlsx')
worksheet = workbook.add_worksheet('Summary')
worksheet.write_formula('A2', '=VLOOKUP(B3,"Settlement Info"!$B$2:$R$2199,17,FALSE)')
workbook.close()
If I look at the information at https://xlsxwriter.readthedocs.io/working_with_formulas.html there is the information:
XlsxWriter doesn’t calculate the result of a formula and instead stores the value 0 as the formula result. It then sets a global flag in the XLSX file to say that all formulas and functions should be recalculated when the file is opened.
This is the method recommended in the Excel documentation and in general it works fine with spreadsheet applications. However, applications that don’t have a facility to calculate formulas will only display the 0 results. Examples of such applications are Excel Viewer, PDF Converters, and some mobile device applications.
I may be misunderstanding this, as I believe the formula should be left in the sheet.
You can use XlsxWriter to create any formula that Excel can handle. However, you need to be careful with the formatting of the formula to make sure that it matches the US version of Excel (which is the default format that formulas are stored in).
So your formulas should probably work as expected if you use a comma instead of a pipe:
=VLOOKUP(B3,"Settlement Info"!$B$2:$R$2199,17,FALSE)
=SUMIFS("Payment and Fees"!$I$2:$I$6445,"Payment and Fees"!$B$2:$B$6445,Summary!$B3)
=IF(AND(I3>0,I3-N3>=-0.1,I3-N3<=0.1),"Yes","No")
This one should work without modification:
=+E3-F3-G3-I3
See this section of the XlsxWriter docs on Working with Formulas.
Update in relation to the updated question:
The formula still has an error. You need to use single quotes instead of double quotes. You also need to add another worksheet for the formula to refer to. Like this:
from xlsxwriter.workbook import Workbook
workbook = Workbook('test.xlsx')
worksheet = workbook.add_worksheet('Summary')
worksheet.write_formula('A2', "=VLOOKUP(B3,'Settlement Info'!$B$2:$R$2199,17,FALSE)")
workbook.add_worksheet('Settlement Info')
workbook.close()
There is still an #N/A error in Excel, but that is due to not having data in the VLOOKUP range.
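The two fixes discussed in this answer (commas instead of pipes, single quotes around sheet names) can be sketched as a small string helper. This is only an illustration: the function name to_us_formula is hypothetical, and it would also replace a pipe that appears inside a string literal, so it is not a general formula parser.

```python
import re


def to_us_formula(formula):
    """Convert pipe-separated arguments and double-quoted sheet names to US Excel syntax."""
    # "Sheet Name"! -> 'Sheet Name'!  (only quoted text directly followed by '!')
    formula = re.sub(r'"([^"]+)"!', r"'\1'!", formula)
    # Pipes used as argument separators -> commas.
    return formula.replace('|', ',')


print(to_us_formula('=VLOOKUP(B3|"Settlement Info"!$B$2:$R$2199|17|FALSE)'))
# prints =VLOOKUP(B3,'Settlement Info'!$B$2:$R$2199,17,FALSE)
```

The returned string could then be passed to worksheet.write_formula, as in the corrected example above.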

Is there a python/openpyxl function to read an excel containing links to other workbooks, keeping the right format in the cell value?

By reading the last answer to the question:
How to keep style format unchanged after writing data using OpenPyXL package in Python?
I see there is an XML file which contains metadata from Excel, including external links to other Excel workbooks. Thanks for such a good explanation.
After using load_workbook, when I get the value of a cell containing a formula with a reference to another workbook, the referenced workbook's file name is replaced by a sort of index ([2] / [3] in the example that follows):
=H8+E5+G7+'s1'!$E$5+[2]Sheet1!$D$1+[3]Sheet1!$D$5
I need to obtain that formula with the real workbook name (all the referenced Excel workbooks are in the same folder as the one I am handling), something like this:
=H8+E5+G7+'s1'!$E$5+'s2.xlsm'Sheet1!$D$1+'s1.xlsm'Sheet1!$D$5
I have seen the keep_links option, but I can't get those links to be part of the cell's formula.
Is there a way to keep the real cell value with references to other workbooks when reading the cell in the origin workbook?
Thanks so much in advance.
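One possible direction, sketched under assumptions: if the file names behind the [2] / [3] indices are already known (for example, read separately from the workbook's external link metadata), the indices can be substituted back into the formula text. The function name expand_external_refs and the 1-based list of names are hypothetical, and the output uses Excel's [workbook]Sheet!cell bracket syntax rather than the quoted form shown in the question.

```python
import re


def expand_external_refs(formula, workbook_names):
    """Replace [n] indices in a formula with [name], using a 1-based list of names."""
    def substitute(match):
        index = int(match.group(1))
        if 1 <= index <= len(workbook_names):
            return '[' + workbook_names[index - 1] + ']'
        return match.group(0)  # leave unknown indices untouched

    return re.sub(r'\[(\d+)\]', substitute, formula)


# Hypothetical mapping: index 1 -> other.xlsm, 2 -> s2.xlsm, 3 -> s1.xlsm
names = ['other.xlsm', 's2.xlsm', 's1.xlsm']
print(expand_external_refs("=H8+E5+G7+'s1'!$E$5+[2]Sheet1!$D$1+[3]Sheet1!$D$5", names))
# prints =H8+E5+G7+'s1'!$E$5+[s2.xlsm]Sheet1!$D$1+[s1.xlsm]Sheet1!$D$5
```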

Downloading File with importrange function failing - think it's a bug

I've been saving Google Sheets to Excel without any problems for a while. These sheets have always saved and opened successfully in Excel with the importrange function. Recently, however, they have stopped saving correctly.
Each cell used to just have the static value (e.g., 40). There used to be an IFERROR in the first cell of the header row, but now it appears in every single cell.
For example, each cell now has something like this:
=IFERROR(__xludf.DUMMYFUNCTION(importrange(blahblah)),"40")
DUMMYFUNCTION throws an error and "40" is returned as the result, but "40" is a string, not an integer, which breaks all my formulas.
I also know this isn't an Excel issue because OpenOffice is doing the same thing with the file.
I'm pretty sure this would be a bug because why would it be working for months and then suddenly stop working?
What should I do?
I'm thinking it's a bug too.
Workarounds
On Excel
Copy and paste as values only the ranges with IFERROR(__xludf.DUMMYFUNCTION(..., then use Excel's UI tools to convert numbers shown as text to numbers.
Selectively remove quotes on the IFERROR second argument of the cells causing problems
Remove =IFERROR(__xludf.DUMMYFUNCTION(),"value") except value (we could use Excel's built-in FIND & REPLACE for this)
On Google Sheets
Use Copy > Paste as values only on the range areas having formulas with non-compatible functions like IMPORTRANGE, QUERY, FILTER, etc.
If you only need the values, download it as CSV instead of XLSX
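The "remove everything except the value" workaround can also be sketched in code. The helper below is a hypothetical illustration that works on the formula text only: it assumes the formula has exactly the IFERROR(__xludf.DUMMYFUNCTION(...),fallback) shape shown in the question and that the fallback itself does not contain "),".

```python
import re

# Matches =IFERROR(__xludf.DUMMYFUNCTION(<anything>),<fallback>)
_DUMMY = re.compile(r'^=IFERROR\(__xludf\.DUMMYFUNCTION\(.*\),\s*(.*)\)$', re.DOTALL)


def fallback_value(formula):
    """Return the IFERROR fallback as int/float/str, or None if the pattern is absent."""
    match = _DUMMY.match(formula)
    if match is None:
        return None
    raw = match.group(1).strip()
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        raw = raw[1:-1].replace('""', '"')  # unquote an Excel string literal
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            pass
    return raw


print(fallback_value('=IFERROR(__xludf.DUMMYFUNCTION(importrange(blahblah)),"40")'))
# prints 40
```

Converting the fallback to a number here addresses the string-versus-integer problem described in the question; text fallbacks are returned unchanged.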
IMPORTANT
In order to help to prioritize this issue, send feedback to Google. To do this open a Google Sheets spreadsheet, click on Help > Report a problem, then fill the feedback form and submit it.
Related stuff
I posted 5 small articles about this in Spanish. You could find them listed on https://www.rubenrivera.mx/p/descargar-hcg-excel.html.
We accidentally created a workaround for this bug with a different sheet that was just set up like this.
This works when you IMPORTRANGE into another Google Sheet. We are doing it into a Google Sheet with a single worksheet - haven't tried it with multiple.
It's going to sound a little nuts but it works for us.
In the first cell of your import range put a hyperlink in the original document you are importing from. This is in the first cell of the import range. We linked it to a worksheet in the original document. It has worked and failed with an external link. With an external link it worked when I linked it to an internal link, then changed it. But when I deleted the cell and just straight linked it to an external URL it didn't work.
Then @timbo was right - put data validation in. This can be in part of the document that isn't being imported into the second sheet. I put it in the first line of the import range but outside what I was importing. It might have to be the first line. I just put a date in one cell, then in the next cell data > data validation > then choose that one date as the data range.
For aesthetics I have hidden the first row in one Google Sheet I am importing into. In another I made the first cell link the title of the sheet and put the data validation outside the import range. Both of these work.
Let me know if this works for you.
Until this bug is fixed, a workaround is to put a data validation (Data > Data Validation) on the imported data (Any kind of data validation will do).

Macro to Split Excel Data into Existing Tabs

Thanks for looking at this problem. I hope I can get some help, as I am not very experienced with VBA syntax in Excel.
Background:
I will be receiving a large (thousands of lines) CSV file that will contain data entries of various lengths. Each line will begin with a code (e.g., 01, 02, ..., 50) and have a series of data entries following it based on that code.
So, for example
01,data,data,data
01,data,data,data
02,data,data,data,data
etc...
I need to import all of this data into an existing excel workbook that already has separate tabs and headers created to correspond with the data type.
What I believe needs to be done, is to import the csv to a new, blank sheet, then run a vba program to check the data code, and move the line to the corresponding tab. I would also like to preserve the formatting on the destination sheet.
Ultimately, what I think I need is a VBA program to read the code cell, and move the line to an existing tab based on that code, and loop through the whole column.
Most of the existing solutions I have found involve the creation of new tabs, but I wish to parse the raw data into existing tabs with headers and formatting. I am aware this may require me to manually type in the code and destination tab names in the program's logic - That will not be an issue as long as I have a base to start with!
Thanks again for your help, and let me know if I can provide any more information.
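The routing step described above (read the code in the first column, then send the rest of the line to the matching destination) can be sketched as follows. This is a Python illustration of the logic only, not the VBA macro asked for: the function name group_rows_by_code and the sample data are hypothetical, and writing each group into its existing tab is not shown.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_code(csv_text):
    """Group CSV rows by the code in their first column, keeping the original order."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip blank lines
            groups[row[0]].append(row[1:])
    return dict(groups)


sample = "01,a,b,c\n01,d,e,f\n02,g,h,i,j\n"
for code, rows in group_rows_by_code(sample).items():
    print(code, rows)
```

Each key would then be looked up in a manually maintained code-to-tab mapping, and the grouped rows appended below the existing headers of that tab.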
