This question is related to my previous one.
Can you explain or provide a link to an explanation of how Excel VBA code password protection actually works in versions prior to 2007? What is the difference in Excel 2007 and previous versions in terms of password protection?
Also does Excel's password protection actually encrypt the code? How does Excel execute the code if it is encrypted?
Lastly, how does password removal software for Excel work?
VBA security is widely considered to be pretty poor. The VBA code isn't compiled, and the source is available in the Excel file. The password protection is pretty easy to circumvent.
As I understand it, Office 2003 and earlier save the VBA code as part of the binary format of the workbook (or document / presentation). When you fire up the VBA IDE, it simply checks whether the VBA code has been "protected" or not. This doesn't mean it's encrypted, just unavailable for viewing. The theory is that this stops your users from meddling with your code, but a hard-core coder would be able to get around the password.
So Excel doesn't need to decrypt any code; it just needs to stop people from viewing it.
Office 2007 does encrypt macros (don't ask me how or what algorithm). This is necessary presumably because XLSM files (or any Office 2007 file) are just zip files with a different extension. Anyone can get into those files and poke around.
As for your last question, how password removal works on older Office formats: I'm not entirely sure. Different vendors may approach the problem in different ways, but I suspect the most common approach is a brute-force attack on the passwords until a match is found.
The Excel VBProject object has a Protection property which returns different enumeration values depending on the protection status of the project (vbext_pp_locked if it is protected, for example). If you were to keep trying passwords programmatically until Protection no longer returned vbext_pp_locked, you would have found your password.
Phil is correct: the password prevents you from looking at the modules; they are not encrypted themselves. I know that in Excel 2007 a file is essentially a zipped collection of XML and other files, but I don't know the details of how encryption is handled. For earlier versions (Excel 2, 3, 4, 5, 95, 97, 2000, XP, and 2003), there is OpenOffice.org's comprehensive Documentation of the Microsoft Excel File Format:
The Excel file format is named BIFF (Binary Interchange File Format). It is used to store all types of documents: worksheet documents, workbook documents, and workspace documents. There are different versions of this file format, depending on the version of Excel that has written the file, and depending on the document type.
A workbook document with several sheets (BIFF5-BIFF8) is usually stored using the compound document file format (also known as “OLE2 storage file format” or “Microsoft Office compatible storage file format”). It contains several streams for different types of data. A complete documentation of the format of compound document files can be found here.
The Workbook Protection Block occurs just after the DEFINEDNAME block (i.e. Named Ranges) in most BIFF streams, although BIFF8 is a major departure from that pattern. In BIFF5-BIFF8, the Workbook Protection Block has the following structure:
WINDOWPROTECT Window settings: 1 = protected
PROTECT Workbook contents: 1 = protected
PASSWORD Hash value of the password; 0 = no password
PROT4REV Shared workbook: 1 = protected
PROT4REVPASS Hash value of the shared password; 0 = no password
The password block stores a 16-bit hash value, calculated from the worksheet or workbook protection password.
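The 16-bit hash mentioned above can be sketched in Python. This is a minimal sketch of the algorithm as I understand it from the OpenOffice.org documentation, assuming an ASCII password; the function name is my own, and it covers only the worksheet/workbook protection hash, not the VBA project password.

```python
def excel_protection_hash(password: str) -> int:
    """Return the 16-bit protection hash for an ASCII password (sketch)."""
    result = 0
    for position, char in enumerate(password, start=1):
        value = ord(char) & 0x7FFF
        shift = position % 15
        # Rotate the 15-bit value left by the 1-based character position.
        rotated = ((value << shift) | (value >> (15 - shift))) & 0x7FFF
        result ^= rotated
    # Mix in the password length and a fixed constant.
    return result ^ len(password) ^ 0xCE4B


print(f"{excel_protection_hash('secret'):04X}")  # prints DAA7
```

Because the result is only 16 bits wide, many different passwords produce the same hash, which is one reason this kind of protection is so easy to get around.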
Someone wrote working VBA code that changes the VBA protection password to "macro" for all Excel files, including .xlsm (2007+ versions). You can see how it works by browsing the code.
Here's the author's blog: http://lbeliarl.blogspot.com/2014/03/excel-removing-password-from-vba.html
Here's the file that does the work: https://docs.google.com/file/d/0B6sFi5sSqEKbLUIwUTVhY3lWZE0/edit
Pasted from a previous post from his blog:
For Excel 2007/2010 (.xlsm) files, do the following steps:
1. Create a new .xlsm file.
2. In the VBA part, set a simple password (for instance 'macro').
3. Save the file and exit.
4. Change the file extension to '.zip' and open it with any archiver program.
5. Find the file 'vbaProject.bin' (in the 'xl' folder).
6. Extract it from the archive.
7. Open the file you just extracted with a hex editor.
8. Find and copy the value of the DPB parameter (the value in quotation marks), for example:
DPB="282A84CBA1CBA1345FCCB154E20721DE77F7D2378D0EAC90427A22021A46E9CE6F17188A". (This value was generated for the password 'macro'. You can use this DPB value to skip steps 1-8.)
9. Do steps 4-7 for the file with the unknown password (the file you want to unlock).
10. Replace the DPB value in this file with the value you copied in step 8. If the copied value is shorter than the one in the protected file, pad the missing characters with 0 (zero). If it is longer, that is not a problem (paste it as is).
11. Save the 'vbaProject.bin' file and exit the hex editor.
12. Replace the existing 'vbaProject.bin' file in the archive with the modified one.
13. Change the extension from '.zip' back to '.xlsm'.
Now open the Excel file whose VBA code you need to see. The password for the VBA code will simply be 'macro' (as in the example shown here).
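The read-only part of this procedure (opening the archive, locating 'vbaProject.bin', and finding the DPB value) can be sketched in Python instead of using an archiver and a hex editor. This is a rough sketch, not the blog author's code: the function name and the regular expression are my own, and it assumes the DPB entry appears as contiguous plain text inside 'vbaProject.bin', which is what the hex-editor approach also relies on.

```python
import re
import zipfile
from typing import Optional

# Matches DPB="<hex digits>" in the raw bytes of vbaProject.bin.
DPB_PATTERN = re.compile(rb'DPB="([0-9A-Fa-f]+)"')


def read_dpb(xlsm) -> Optional[str]:
    """Return the DPB value, or None if there is no VBA project or DPB entry.

    `xlsm` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsm) as archive:
        try:
            project = archive.read("xl/vbaProject.bin")
        except KeyError:
            return None  # the workbook has no VBA project part
    match = DPB_PATTERN.search(project)
    return match.group(1).decode("ascii") if match else None
```

Calling `read_dpb("workbook.xlsm")` (a hypothetical file name) returns the hex string you would otherwise copy by hand. The sketch deliberately does not modify the file.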
Related
TL;DR: Excel Workbook generated by Docx4J always says corrupted but I can't determine what Excel doesn't like about the underlying XML, let alone how to fix it.
My use case is as follows:
I am trying to produce an excel workbook with charts and graphs automatically on a regular basis. Only the raw data will change but everything else will dynamically update as the raw data is changed.
So I built an Excel workbook which has a number of charts and graphs generated from a sheet of raw data. I am using it as a template. All values of the raw data are numeric. The intent was to use Docx4J to read this 'template' and populate the raw data sheet, then save it as a new file, whereupon opening it will trigger the recalculation and the charts and graphs will update. Since I am new to Docx4j, I decided to take baby steps by first seeing if I could open and read the contents of the cells, which I could. So far so good. I could also change the values of the cells, but I could only verify this programmatically by writing out to the console the location and value before a change, then the location and value after the change (e.g. A1=45 followed by A1=55).
My problem starts when I try to open the resulting file. It generates, looks to be about the right size but Excel claims it is corrupted. It does try to recover what it can, but ultimately fails and the workbook won't even open. For troubleshooting, I opened up the generated xlsx and confirmed all the various XML files that make up an xlsx file were present and readable so I am concluding either something is missing or some part of the XML coming out the other side is not what Excel wants. Further troubleshooting involved creating an empty workbook (no data, 1 sheet) as my 'template', opening it and then saving it back to the file system with a different name and simply trying to see if I could open it in Excel but no dice. This has me ruling out anything to do with my attempts to write or add data to the sheet.
Relevant Environment Information:
'template' workbook is being generated on a Windows 10 64bit machine
My docx4j code is executing on a Debian 10 Linux machine running OpenJDK 11.0.4
My version of Excel both to create the 'template' and open the copy is Excel for Office365
I am running Docx4J v11.1.3 but I also tried with v8.1.5 (in both cases I had to use the Reference Implementation of JAXB to get around a marshalling error when trying to save)
I did see another post on Stackoverflow here about an issue related to fonts in Linux environments so I made sure to install the MS TT Corefonts but it didn't help my problem.
I ran the entire unzipped directory through BeyondCompare and there are some differences, but I don't know which are just artifacts of the two different OSes or which differences actually matter. Mostly they are:
small differences in file size
boolean values showing as "1", "yes", or "true" but not the same way for both files
namespaces and attributes in one file but not the other
Sheet1 from my blank workbook, before and after
All ideas are welcome.
Please try the just-released docx4j 8.1.6, which fixes handling of xlsx files created by recent releases of Excel. This was https://github.com/plutext/docx4j/issues/389
I am trying to have a single working excel document rather than multiple documents. The problem is that every document requires 3 signatures. If I merge the documents, each sheet would require 3 signatures.
The signatures would need to only affect the sheet they are on, rather than the whole document. For example, if Sheet1 is signed, it cannot be edited, but Sheet2 can still be edited. I do not mind using VBA if that is the best way to do this. I cannot even find a reference to anyone else trying this before... Thank you!
This does not work because signing Excel files is a built-in Excel feature that Microsoft designed to sign a complete file. Therefore it is not possible to sign only a single worksheet.
After a file is signed it is closed for editing:
For reference: Add or remove a digital signature in Office files.
I have a formula-heavy workbook which will be used by staff members who will paste in potentially confidential client information. The workbook then comes up with graphics and data to summarize the inputted information.
In an ideal world, the people using the program would be able to save specific sheets as a PDF and close the program without saving changes to the worksheet, but Excel always demands to save the template first and only THEN allows PDFs. This is obviously a problem: if a person uses it, pastes in a client's information, and saves the result as a PDF, then whoever next opens the Excel workbook will see that previous client's information.
So I need either a way to tell Excel not to require saving before PDF conversion, or another option that achieves the same result.
I should also note that the workbook has to allow users to paste in information, so I don't imagine Read Only will help :/
I have also tried using a Macro-Enabled Template, which still has the same problem.
I imagine there's a ridiculously simple solution to this.
Thanks in advance
I need to modify Excel workbooks from within VBA itself, where the user will point out which workbook to modify, after which a copy of the workbook will be modified and saved under a new name. This can be either an already open workbook, or a closed one on disk, and I need to support all workbook types from 2000 onward (2000-2016, binary, add-ins, templates, etc.). I need to modify the workbook's content as well as any custom UI (ribbon xml) in it. This all has to be done from random Excel installations not under my control.
The problem I'm facing is dealing with password protected workbooks - encrypted ones, i.e. a password for opening. My code needs to be able to handle those, and ideally apply any used password to the saved, updated copy as well.
The code flow is as follows:
1. let the user select the workbook to modify (via a form)
2. if the workbook is open:
2.1. .SaveCopyAs it to a temp folder
2.2. point to the saved copy for further processing
3. open the user-selected file in a 2nd instance of Excel (invisible)
4. update the opened temp copy's content
5. .SaveAs the temp copy to a temp folder without a password and close it
6. update the closed temp copy's custom UI
7. re-open the updated temp copy and .SaveAs it again with any password that was on the original workbook
With step 2.1 above, the .SaveCopyAs will save the open workbook and apply any password to the copy as well, which results in step 3 asking for the password in all cases. I cannot remove the pass in step 2.1 by using a .SaveAs, since that will result in the open workbook not being open anymore in the end. It would also only be a half-measure since it doesn't stop the same thing happening with closed files.
In this case, when Excel asks the user for the password (in Excel 2010 at least), the password prompt only shows the filename of the file with an edit box for the pass, all with a popped-up empty 2nd Excel window beneath it open, which is an ugly sight to behold. And it doesn't allow me to capture the entered password as well for step 7.
The best I can do I think is to detect when a closed workbook on disk is encrypted, and show my own password prompt instead before trying to open it. But how to do this? These are the options I can come up with;
When I use Workbooks.Open(Filename:=...), then Excel shows the password prompt, which I like to evade by asking for any pass beforehand myself.
When I use Workbooks.Open(Filename:=..., Password:="notthepassword"), at least Excel won't show a password prompt anymore, workbooks without a password open fine, and those with a pass now generate error 1004. However, I can't act on that to infer that a password is needed, since 1004 is Excel's catch-all error number, and I cannot inspect the Err.Description either for "wrong password" or such, since I do not know the Excel GUI language running on the client. And then there's also Check whether Excel file is Password protected ; when encrypted files have workbook structure protection on as well, apparently Excel won't open them anymore this way - I tested this and it does work with my 2010 Excel, but it's not very encouraging to hear.
Ignoring the ugliness of Excel asking for the pass, reading any Workbook.PasswordXxx property afterwards doesn't reveal anything; they always return the same values in all cases (with or without a password on the workbook).
For OOXML files (.xlsm / .xlsx etc.) I can inspect the file's content beforehand for the presence of the two entries "EncryptionInfo" and "EncryptedPackage", indicating the file is encrypted, but what about 2000-2003 (.xls) files? Microsoft's documentation on the BIFF file structure used in those files says to check for a FilePass record in the workbook stream; while I know I could implement that logic (there's e.g. the now unsupported Koogra project), I'd rather just not :) (side question: since when did Microsoft publish these details without first making you sign NDAs and jump through their legal hoops?!)
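The OOXML check described above can be sketched in Python. One hedged note: as far as I know, an encrypted OOXML workbook is not a plain zip any more; it is wrapped in an OLE2 compound file whose streams include "EncryptionInfo" and "EncryptedPackage". The sketch below is a crude byte-scan heuristic rather than a real compound-file parser, and the function name and return labels are my own.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # compound document signature
ZIP_MAGIC = b"PK\x03\x04"                       # local file header of a zip

# Compound-file directory entries store their names as UTF-16LE.
ENCRYPTED_PACKAGE = "EncryptedPackage".encode("utf-16-le")


def classify_workbook(data: bytes) -> str:
    """Guess the container type from the raw bytes of a workbook file."""
    if data.startswith(ZIP_MAGIC):
        return "ooxml-plain"        # ordinary zip package, not encrypted
    if data.startswith(OLE2_MAGIC):
        if ENCRYPTED_PACKAGE in data:
            return "ooxml-encrypted"
        return "ole2"               # e.g. legacy .xls; FilePass check still needed
    return "unknown"
```

For a file on disk you would pass `pathlib.Path(name).read_bytes()`. A result of "ole2" says nothing about whether a legacy .xls is encrypted; that still requires looking for the FilePass record.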
Does anyone have any insight as to how I can make code step 7. above work, short of just living with Excel's prompt and adding a key logger to my app? :)
Is there any way to programmatically determine if an .xls contains macros, without actually opening it in Excel?
Also are there any methods to examine which certificate (including timestamp cert) these macros are signed with? Again without using Excel.
I'm wondering in particular if there are any strings that always show up in the raw data of an Excel file when macros are present.
Yes, you can open the .xls file as a compound document file and check whether it contains a VBA folder and streams containing VBA code.
Sample code is available in this CodeProject article:
Another OLE Doc Viewer but with editing facility
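For a quick check without a full compound-file parser, a byte scan can serve as a first approximation. The sketch below assumes that Excel keeps its VBA project in a storage named "_VBA_PROJECT_CUR" and that directory entry names are stored as UTF-16LE; the function name is my own. It is a heuristic only, since the same byte sequence could in principle also occur in cell data.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # compound document signature

# Assumed name of the storage that holds the VBA project in .xls files.
VBA_STORAGE = "_VBA_PROJECT_CUR".encode("utf-16-le")


def xls_probably_has_macros(data: bytes) -> bool:
    """Heuristic: True if the bytes look like an .xls with a VBA storage."""
    return data.startswith(OLE2_MAGIC) and VBA_STORAGE in data
```

A proper implementation would walk the compound file's directory and confirm that the storage actually contains module streams.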
The certificate information is stored in the DocumentSummaryInformation stream. If you want to read out the information from there you should dig into the file format specifications available from Microsoft:
[MS-OSHARED]: Office Common Data Types and Objects Structure Specification
[MS-OFFCRYPTO]: Office Document Cryptography Structure Specification
An xls file containing a macro should contain a string looking something like
Keyboard Shortcut:
Don't know if this is a surefire solution though