We have to process inbound client Excel files. One client highlights changed data in yellow. I have been asked to find the highlighted yellow cells, and import only the rows that have changes.
I am using Alteryx, but that is largely immaterial; the problem is not how to code this, but how to find the relevant highlighted rows and columns in the Excel internal file structure.
I know that an Excel workbook is a zipped set of XML files. I can copy the .xlsx file to .zip, expand it, and process the individual XML worksheets; for example, I can load ".\xl\worksheets\sheet1.xml". In there, I can see the cell locations and some XML attributes. If the cell has a style, it seems to get an s="" attribute.
I assume this relates to a style setting. I can open the style sheet, but I cannot relate the value in the s attribute to entries in the style sheet file. For example, I have a value of s="219" for a highlighted cell, but I cannot find 219 styles in the style file, so the s value does not appear to be a simple index into that list.
Does anyone know how Excel files work 'under the hood', so that I can relate the worksheet cell data to the style data to find highlighted cells?
I am not looking for "have you tried an Excel macro?" or "use Python?" or other suggestions/guesses. Please, with great respect to your time and expertise, only answer if you know about Excel's internal file structure and you know the answer to this particular, admittedly very niche, query.
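For reference, in the SpreadsheetML format the s attribute on a cell is a zero-based index into the <cellXfs> list in xl/styles.xml (not into <cellStyles> or <cellStyleXfs>, which are usually much shorter). Each <xf> entry there carries a fillId, which is in turn a zero-based index into the <fills> list. The sketch below only illustrates that chain of lookups; it is written in Python purely as notation, and the function names are hypothetical. It assumes the highlight is a solid fill stored with an explicit rgb value of FFFFFF00; theme or indexed colours, and highlights produced by conditional formatting, would not be found this way.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by both styles.xml and sheetN.xml.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def yellow_style_indexes(styles_xml, rgb="FFFFFF00"):
    """Return the cellXfs positions whose fill has the given ARGB foreground colour."""
    root = ET.fromstring(styles_xml)

    # Step 1: find which entries of <fills> are the yellow fill.
    yellow_fill_ids = set()
    for i, fill in enumerate(root.findall("m:fills/m:fill", NS)):
        fg = fill.find("m:patternFill/m:fgColor", NS)
        if fg is not None and fg.get("rgb", "").upper() == rgb:
            yellow_fill_ids.add(i)

    # Step 2: find which entries of <cellXfs> point at one of those fills.
    xfs = root.findall("m:cellXfs/m:xf", NS)
    return {i for i, xf in enumerate(xfs)
            if int(xf.get("fillId", "0")) in yellow_fill_ids}


def highlighted_cells(sheet_xml, style_indexes):
    """Return cell references (e.g. 'A1') whose s attribute is in style_indexes."""
    root = ET.fromstring(sheet_xml)
    return [c.get("r")
            for c in root.iterfind("m:sheetData/m:row/m:c", NS)
            if int(c.get("s", "0")) in style_indexes]  # a missing s means style 0
```

The two XML inputs can be read straight from the workbook without renaming it, for example with zipfile.ZipFile(path).read("xl/styles.xml") and the matching xl/worksheets/sheet1.xml entry. The row number of each returned reference then identifies the rows to import.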
By reading the last answer to the question:
How to keep style format unchanged after writing data using OpenPyXL package in Python?
I see there is an XML file which contains metadata from Excel, including external links to other Excel workbooks. Thanks for such a good explanation.
After using load_workbook, when I get the value of a cell containing a formula with a reference to another workbook, the referenced workbook's file name is replaced by a sort of index ([2] / [3] in the example which follows):
=H8+E5+G7+'s1'!$E$5+[2]Sheet1!$D$1+[3]Sheet1!$D$5
I need to obtain that formula with the real workbook name (all the referenced Excel workbooks are in the same folder as the one I am handling), something like this:
=H8+E5+G7+'s1'!$E$5+'s2.xlsm'Sheet1!$D$1+'s1.xlsm'Sheet1!$D$5
I have seen the keep_links option, but I have not managed to get those links to be part of the formula of the cell.
Is there a way to keep the real cell value, with references to other workbooks, when reading the cell in the original workbook?
Thanks so much in advance.
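One way to picture the substitution being asked for: the [n] in the stored formula is a 1-based position in the workbook's list of external links, so given that list the file names can be put back with a plain text replacement. The sketch below shows only that replacement step. The function name and the link_targets argument are hypothetical; how the list is obtained is an assumption (in openpyxl it appears to be kept on the private wb._external_links attribute, each entry exposing file_link.Target, but that is not public API and should be checked against the installed version). It writes the name in Excel's [file]Sheet!cell form.

```python
import re


def resolve_external_refs(formula, link_targets):
    """Replace 1-based [n] workbook indexes in a formula with [file name].

    link_targets is the ordered list of external workbook names;
    where it comes from is left to the caller (assumption).
    """
    def swap(match):
        index = int(match.group(1))
        if 1 <= index <= len(link_targets):
            return "[" + link_targets[index - 1] + "]"
        return match.group(0)  # leave indexes we cannot resolve untouched

    # Only purely numeric bracket contents are treated as link indexes.
    return re.sub(r"\[(\d+)\]", swap, formula)
```

A formula that uses structured table references with numeric column names could be matched by mistake, so this is best applied only to formulas known to contain external links.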
I have one Excel file (called Databas.xlsx) in which I have multiple sheets with financial statements from different companies. The structures of these sheets are identical. I have another Excel file that I wish to use to analyse these financial statements using charts, so I must get data from the different sheets into my analysis file. Doing this from one sheet is no problem, as I can simply create a chart and mark the data I need from this sheet, so that the chart data range would be something like this:
=[Databas.xlsx]Kopparbergs!$C$3:$K$3
where "Kopparbergs" is the sheet name in Databas.xlsx. The problem I am facing is that I want to be able to change the sheet name that is put into this formula by writing the name in a cell (because that would enable me to change multiple charts at once). So just to clarify, in the formula written above, I want to be able to change the word "Kopparbergs" by writing text in a cell. If that is not possible, how would I accomplish this? That is, how do you create a chart that can change its content depending on a text in a cell that corresponds to a sheet?
So rather than using INDIRECT directly, I think you need to use two named ranges for referencing when using a chart.
This previous answer looks like a good guide to implement (not sure about etiquette of just copy & pasting previous answers so I'll just provide the link):
Dynamic chart range using INDIRECT: That function is not valid (despite range highlighted)
Slightly odd question, but couldn't find anything on it: I have an .xlsb file in which data is displayed by groups, which have to be selected via a drop down menu, so that only part of the entire data set is displayed at any given time.
I'd like to get the underlying data for all groups, but for some reason the sheet from which it is derived does not exist. That is, I have a Sheet1 which displays the data, and the cells that hold the data have a formula that says =Sheet2!A1, but there is no Sheet2, and no sheets are hidden.
What could be going on here? Is this a special .xlsb feature that I don't know about?
Thanks for looking at this problem. I hope I can get some help, as I am not very experienced with VBA syntax in Excel.
Background:
I will be receiving a large (thousands of lines) CSV file that will contain data entries of various lengths. Each line will begin with a code (e.g. 01, 02, ..., 50) and have a series of data entries following it based on that code.
So, for example
01,data,data,data
01,data,data,data
02,data,data,data,data
etc...
I need to import all of this data into an existing Excel workbook that already has separate tabs and headers created to correspond with the data type.
What I believe needs to be done is to import the CSV to a new, blank sheet, then run a VBA program to check the data code and move the line to the corresponding tab. I would also like to preserve the formatting on the destination sheet.
Ultimately, what I think I need is a VBA program to read the code cell, and move the line to an existing tab based on that code, and loop through the whole column.
Most of the existing solutions I have found involve the creation of new tabs, but I wish to parse the raw data into existing tabs with headers and formatting. I am aware this may require me to manually type the codes and destination tab names into the program's logic; that will not be an issue as long as I have a base to start with!
Thanks again for your help, and let me know if I can provide any more information.
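The routing logic described above (read the code in the first field, look up the destination tab, collect the rest of the line for it) can be sketched independently of the workbook. The example below is in Python rather than VBA, purely to show the shape of the loop; the function name, the tab names in the mapping and the "UNMATCHED" bucket are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def route_rows(csv_text, code_to_tab):
    """Group CSV lines by destination tab, keyed on the code in the first field."""
    routed = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        tab = code_to_tab.get(row[0].strip())
        if tab is None:
            # Unknown code: keep the whole line so nothing is silently lost.
            routed["UNMATCHED"].append(row)
        else:
            # Known code: drop the code itself and keep the data fields.
            routed[tab].append(row[1:])
    return dict(routed)
```

Each collected list could then be appended below the existing headers of its tab, for instance with openpyxl's ws.append(row) on a workbook opened with load_workbook, which adds rows to an existing sheet instead of creating a new one.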
I have a directory with many Excel files containing numeric data. In each file the data is arranged in the same manner (the same column names, etc.). I would like to build an interactive chart which will display the data according to the chosen file name.
For example, the file name will be chosen from a validation list, in the manner of a drop-down menu.
The question is how to specify the data range in the chart, such that it will change according to the name of the file that I choose.
I work with Excel 2010 and don't have much experience with VBA programming :(
Thanks a lot,
Sasha
Well, a simple solution (there may be more elegant ways but that's a first hint):
Copy the data from your files into the sheets of your workbook. You can create references or automate the copies through macros (just STFW).
Create your validation list (for instance on Sheet1, cell A1). Let's assume this list contains: DataSource1, DataSource2, DataSource3.
Create a named range for every other sheet you have, using the same names as the ones in your list (DataSource1, DataSource2, DataSource3).
In the chart's source of values, use this formula: =INDIRECT(Sheet1!$A$1)
Excel will then resolve the source to the named range whose name is in that cell.
You could probably find solutions with VBA too, depending on your needs.
Regards,