I am attempting to use SpreadsheetML to generate an Excel report server side and download the report via the browser. I have everything working and am getting the files I need downloaded. However, I run into a problem when I attempt to merge cells in one of the sheets I am creating. I have found two different syntaxes online and tried them both without success. I save the files as .xml files and they open fine and show the expected data, but the cells are not merged.
The first syntax uses the "mergeAcross" attribute on the Cell element and is supposed to merge the number of cells specified into the current cell. The second syntax uses the mergeCells element. I have pasted the actual XML below for both attempts. If I can figure out what the XML should be then I can easily create it programmatically.
Version 1
<?xml version='1.0'?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name='Distribution List Overview'>
<ss:Table>
<ss:Row>
<ss:Cell mergeAcross="2"><ss:Data ss:Type="String">First Cell Entry</ss:Data></ss:Cell>
<ss:Cell><ss:Data ss:Type="String">Third Cell</ss:Data></ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>
Version 2
<?xml version='1.0'?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name='Distribution List Overview'>
<ss:Table>
<ss:Row>
<ss:Cell><ss:Data ss:Type="String">First Cell Entry</ss:Data></ss:Cell>
<ss:Cell><ss:Data ss:Type="String">Third Cell</ss:Data></ss:Cell>
</ss:Row>
</ss:Table>
<mergeCells count="2">
<mergeCell ref="A1:B1"/>
</mergeCells>
</ss:Worksheet>
</ss:Workbook>
Both approaches above fail to create a merged cell. I am expecting to get "First Cell Entry" filling cells A1 and B1 with "Third Cell" in cell C1. Instead I get "First Cell Entry" in cell A1 and "Third Cell" in cell B1. Any help or suggestions would be greatly appreciated. This is the only remaining item I need to get some legacy code working and I do not want to convert the entire report over to OpenXML SDK code.
Try replacing mergeAcross with ss:MergeAcross. In other words, use:
<ss:Cell ss:MergeAcross="2"><ss:Data ss:Type="String">First Cell Entry</ss:Data></ss:Cell>
Also note that to get an "A1:B1" merged cell you need to set the ss:MergeAcross value to 1, not 2: the value is the number of additional cells merged into the current one.
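Since the question mentions generating the XML programmatically, here is a minimal sketch of building the corrected row with Python's standard xml.etree.ElementTree. The helper names q and build_workbook are hypothetical, and the choice of Python is an assumption (the original server-side language is not stated); the point is only that the attribute must be the namespaced ss:MergeAcross.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
ET.register_namespace("ss", SS)  # serialize with the ss: prefix


def q(name):
    """Hypothetical helper: qualify a tag or attribute name with the ss namespace."""
    return f"{{{SS}}}{name}"


def build_workbook():
    """Build a one-row workbook where A1:B1 is merged and C1 holds 'Third Cell'."""
    wb = ET.Element(q("Workbook"))
    ws = ET.SubElement(wb, q("Worksheet"), {q("Name"): "Distribution List Overview"})
    table = ET.SubElement(ws, q("Table"))
    row = ET.SubElement(table, q("Row"))

    # MergeAcross="1" merges one additional cell into this one (A1:B1).
    merged = ET.SubElement(row, q("Cell"), {q("MergeAcross"): "1"})
    ET.SubElement(merged, q("Data"), {q("Type"): "String"}).text = "First Cell Entry"

    third = ET.SubElement(row, q("Cell"))
    ET.SubElement(third, q("Data"), {q("Type"): "String"}).text = "Third Cell"

    return "<?xml version='1.0'?>\n" + ET.tostring(wb, encoding="unicode")


xml_text = build_workbook()
```

Writing xml_text to a .xml file should give the same structure as Version 1 above, with the attribute name and value corrected.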
Mario's answer remains the correct answer to this question.
However, since there is decent traffic hitting this question I thought I would add some sample code for a few additional operations that I had to dig up when working on this report.
Here is an example of how to add styling information to a file. Add a block like the one below into your file immediately after the workbook element
<?xml version='1.0'?><ss:Workbook xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>
<ss:Styles>
<ss:Style ss:ID='1'>
<ss:Font ss:Bold='1'/>
<ss:Alignment ss:Horizontal='Center'/>
</ss:Style>
</ss:Styles>
<ss:Worksheet ss:Name='Distribution List Overview'>
This defines a style that sets the font weight to bold and horizontally centers the text in each cell. You seem to be able to add an arbitrary number of style parameters to the block and should be able to specify pretty much anything supported by Excel. You'll have to do some research to find out what the correct element names are, but they seem to closely parallel Excel commands, so guessing is not as hard as it sounds.
You can add this to a Cell or Row by appending the style descriptor to the target element as shown below
<ss:Cell ss:StyleID='1'>
To specify vertical alignment and wrapped text, declare another style block in between the 'ss:Styles' and '/ss:Styles' elements and give it a unique identifier
<ss:Style ss:ID='3'>
<ss:Alignment ss:Vertical='Bottom' ss:WrapText='1'/>
</ss:Style>
Borders can be created using the following style structure
<ss:Style ss:ID='4'>
<ss:Font ss:Bold='1'/>
<ss:Borders>
<ss:Border ss:Position='Bottom' ss:LineStyle='Continuous' ss:Weight='1'/>
<ss:Border ss:Position='Left' ss:LineStyle='Continuous' ss:Weight='1'/>
<ss:Border ss:Position='Right' ss:LineStyle='Continuous' ss:Weight='1'/>
<ss:Border ss:Position='Top' ss:LineStyle='Continuous' ss:Weight='1'/>
</ss:Borders>
</ss:Style>
When assigned to a Cell this will set the text font to bold and apply borders on all sides with a normal weight. Adjust the Weight parameter to make the borders bolder.
Finally, I spent quite a while investigating how to add multiple records to a single cell. As far as I can tell you cannot nest Tables inside a cell, so I had to figure out how to encode an alt-enter to cause line feeds inside the cell. This requires a custom style as well as some special text to be inserted inline. You need to enable line wrap as I showed above and then use the '&#10;' string to separate your lines. The block below will display two names in the specified cell on two lines using the example style specified above
<ss:Cell ss:StyleID='3'>
<ss:Data ss:Type='String'>Jane Doe&#10;Janet Doe</ss:Data>
</ss:Cell>
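If the cell text is assembled in code, the line-feed entity can be produced while escaping the values. A small sketch in Python, assuming the wrap-text style with ID '3' from above; multiline_cell is a hypothetical helper name, and escape comes from the standard xml.sax.saxutils module.

```python
from xml.sax.saxutils import escape


def multiline_cell(lines, style_id="3"):
    """Return a Cell element whose Data shows each entry on its own line."""
    # Escape &, < and > first, then turn real newlines into the &#10; entity.
    text = escape("\n".join(lines), {"\n": "&#10;"})
    return (
        f"<ss:Cell ss:StyleID='{style_id}'>"
        f"<ss:Data ss:Type='String'>{text}</ss:Data>"
        "</ss:Cell>"
    )


cell_xml = multiline_cell(["Jane Doe", "Janet Doe"])
```

The referenced style still needs ss:WrapText='1', otherwise the line feeds may not be displayed.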
I have not managed to figure out a way to apply multiple styles to a single element so I have had to create several styles, some with very minor differences, and assign each cell the specific style ID it requires to format everything correctly. For example, I had to create a style with normal text and normal weight borders, one with bold text and normal borders and one with bold text and bold borders since I could not figure out a way to apply the font weight and the border weight separately.
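One way to live with the one-style-per-cell limitation is to generate every needed combination up front and look the ID up when writing each cell. The following Python sketch illustrates that idea under stated assumptions: the helper names border_xml and build_styles, the generated IDs s1..s4, and the use of Weight 2 for a bolder border are all hypothetical choices, not something taken from the report itself.

```python
from itertools import product


def border_xml(weight):
    """Borders block with the same line weight on all four sides."""
    sides = ("Bottom", "Left", "Right", "Top")
    borders = "".join(
        f"<ss:Border ss:Position='{side}' ss:LineStyle='Continuous' ss:Weight='{weight}'/>"
        for side in sides
    )
    return f"<ss:Borders>{borders}</ss:Borders>"


def build_styles(bold_options=(False, True), border_weights=(1, 2)):
    """Return ({(bold, weight): style_id}, styles_xml) covering every combination."""
    ids = {}
    blocks = []
    for n, (bold, weight) in enumerate(product(bold_options, border_weights), start=1):
        style_id = f"s{n}"
        ids[(bold, weight)] = style_id
        font = "<ss:Font ss:Bold='1'/>" if bold else ""
        blocks.append(f"<ss:Style ss:ID='{style_id}'>{font}{border_xml(weight)}</ss:Style>")
    return ids, "<ss:Styles>" + "".join(blocks) + "</ss:Styles>"


style_ids, styles_xml = build_styles()
# A bold cell with the heavier border then uses style_ids[(True, 2)] as its ss:StyleID.
```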
Related
I recently wanted to create an excel table to note all my electric components that I have so I can easily find the right component without searching every time for it.
The problem is, especially with capacitors, they come with a wide range of values generally for me between 220uF and 10pF, and I want to create a custom format to display the values properly in excel, for example if I put in a cell 0.00022 it shows 220uF or maybe 0.22mF (but 220uF is better) and not 2.2E-04 or any other format.
I tried the custom tool but I don't know how to add the micros, nanos and picos.
You can add conditions to the formatting, like:
[<0.000001] 0.00%%%%%% "pF";[<0.001] 0.00%%% "uF";#
For 0.00022 this will show 220.00%%% uF in the cell (each percent sign multiplies the displayed value by 100, so three of them scale farads to microfarads; more details here).
Drawback: the percent signs are shown, and the Ctrl+J trick described on the link does not really work for me (and I personally find it an ugly solution).
If I were you I'd add a new column called "Formatted" where I multiply the values with formulas. Like
=IF(A1<0.000001,A1*1000000000000 & "pF",IF(A1<0.001,A1*1000000 & "uF",A1 & "F"))
It's easier then to search in both columns (one by the formatted text, like all "uF", the other by the real Farad value). You could also use the original column for sorting.
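If the "Formatted" column is produced outside Excel (for example when preparing a CSV of the component list), the same scaling can be sketched in Python. This is only an illustration: format_capacitance is a hypothetical helper, and the rule of picking the largest unit in which the value is at least 1 is an assumption based on the stated preference for 220uF over 0.22mF.

```python
def format_capacitance(farads):
    """Format a capacitance in farads with the largest unit that keeps the number >= 1."""
    units = [(1.0, "F"), (1e-3, "mF"), (1e-6, "uF"), (1e-9, "nF"), (1e-12, "pF")]
    if farads == 0:
        return "0F"
    for scale, unit in units:
        if abs(farads) >= scale:
            # :g trims trailing zeros and hides float noise beyond 6 significant digits
            return f"{farads / scale:g}{unit}"
    # Anything smaller than 1 pF is still expressed in pF
    return f"{farads / 1e-12:g}pF"


print(format_capacitance(0.00022))  # 220uF
```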
Good morning,
New here so, please in advance excuse me if not very clear.
Here is my problem:
I have created large dependent drop-down lists through both the INDIRECT function and the VLOOKUP function (interesting but painful exercise).
The results are drawn from lists. Those list are formatted with color codes.
For instance my list SEC_PRIV is made of 5 lines:
- Creation (in blue)
- Renforcement des capacites (in green)
- etc...
When I select one of those, the subsequent activities are also color coded in the same colors in the source lists.
I would like, when I select those, that the results appearing in my selection cell, keep the color of the source lines.
Here are the tables to clarify:
- in the "source list" table, you can see that "CREATION" is in blue;
- in the dropdown result table, "CREATION" appears in black.
How do I get the "CREATION" in the dropdown result table to keep the color of the source (blue) CREATION, even if I were to change the color of the source later?
Thank you in advance for your help before I have no hair left at all!
Flox
I have an Excel sheet with a very wide table on it. For developer friendliness I'd like to use a certain style of column header naming (much like proper Hungarian notation), where I suffix each header name with "column type" tags. This allows me to easily spot where e.g. apples and oranges are compared. There are also pivot table reports based on this table.
An example to illustrate this: say you have 2 monetary columns, column A being expressed in another currency than column B. The model should thus never combine them without first applying appropriate exchange rates. To spot this I name these columns e.g. Earned - Cur1 and Saved - Cur2. Any calculation like =[#[Earned - Cur1]] + [#[Saved - Cur2]] is illegal, but due to the tags this can be picked up easily in an audit. I have several such tag groups in use already, and they already prevented some errors creeping in.
However...
The file also needs to be distributed to lots of not-so-savvy end users, and they need to fill in this table and refer to some of the outcome columns. Most intermediate columns we already hide, but the column names are now far from being user-friendly (like: fill out Actual - NK/Q1/EC/%, please?).
And this needs to run in Excel 2010.
What are my options?
Option 1
Add an extra row above the table, putting human readable names in there, and just hide the table header row. This works, but now the users can't sort and filter the table anymore, so that's a no-go.
Option 2
Augment option 1 by prepending a newline to each column name, and make the table header row 1 character high. The header cells would still be there to drive sorting and filtering and the users have human readable names in the row above. The actual header cells would appear like 'empty' buttons. Could work, but then the complex formulas become unreadable due to all the newlines from the column names all over the place.
Option 3
Add a macro that swaps the headers in the table with alternative headers in another row above the table. The macro should be run just before sending out the file to the users, and run again when they return the files filled in. I happily coded this option into the file, and it works wonderfully! But then I realized this (and thus option 2 as well) breaks all the derived pivot tables, since Excel links the data by the names used in the table - update the name, and that section of the pivot will be dropped...
I'd really like the option of having our development-oriented column names in there when we ourselves work with the file, but being able to switch out the headers when needed. And of course without rebuilding all the pivots after each such switch.
An opening here would be that pivots seem to only drop the columns once they're refreshed. I could use this to update the header names, then do some magic on the pivots to remap their fields, and only then refresh them, but it seems there's no way from within VBA to accomplish that (PivotField.SourceName is read only).
Hopefully someone can think of an alternative, or am I SOL? I'm totally open to other workarounds.
Workaround 1
Insert null-terminating characters in the header names such that they do show normally in the formulas, but do not show in the table header row. If only it were that simple though... Turns out Excel throws up from a =Char(0)&"abc", and things like =Char(8)&"abc" (tab anyone?) give Unicode replacement characters when pasted into a header cell... (?)
Workaround 2
A last resort seems to be to unzip the excel file, and plough through the xml data to update everything in one go there, then rezip the file. But this code also needs to be executed by less skilled users, and I see too many ifs and buts to make me feel safe using this setup.
Workaround 3
For now I just use a variation on option 2; I have some VBA that 'empties' the header cells instead of prepending a newline to them. By 'emptying' I mean setting the font size to 1, subscript, non-bold, and then making the font color identical to the background color, followed by setting its row height to the default 14.5. The cryptic names do leak out however; column header cell drop down arrows for sorting & filtering show the cryptic name, as do the pivot field settings and of course the formula bar when you just click such a cell. But I guess it's the best I can do?
And then again I'm probably just perfectionizing this thing faaar too much :) But from this point on it's about the challenge!
Make sure you Tick the Box "Add this data to the DataModel" when creating your pivot(s)
AFAIK when your Pivots are connected to the Datamodel instead of directly to the Range/Table you can change your column-names in the Table and your Pivot will stay fine. You could even use other names in your Pivot.
I have two datasets, they are stored separately but they are related, they describe the same phenomenon, from different perspectives, in different ways.
The encoding is not really consequential, here they are rendered as Excel/LibreOffice but I can also get them as CSV.
One "sheet", Sheet I, looks like this:
and Sheet II:
Using the field submission # as the unifier I want to create a single sheet which will associate the related blue fields to the corresponding pink field.
For example, the final result should look like this:
Here is a link to those toy examples.
On sheet1 cell h2 insert:
=VLOOKUP($B2,Sheet2!$A:$F,COLUMN(h2)-6,0)
The $ signs fix the lookup table and the lookup column, so they stay put when the formula is copied.
You can drag the formula to other cells with the black plus sign that appears when you point at the bottom-right corner of the cell.
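Since the question notes that the data can also be obtained as CSV, the same lookup can be sketched outside the spreadsheet. This is a minimal Python illustration with made-up column names and values (only "submission #" comes from the question); join_on_key is a hypothetical helper that, like VLOOKUP with an exact match, takes the first matching row from the second sheet.

```python
import csv
import io


def join_on_key(left_rows, right_rows, key):
    """Append the columns of the first matching right row to every left row."""
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[key], row)  # keep the first match, as VLOOKUP does

    joined = []
    for row in left_rows:
        merged = dict(row)
        for column, value in lookup.get(row[key], {}).items():
            if column != key:
                merged[column] = value
        joined.append(merged)
    return joined


# Toy data standing in for the two sheets (hypothetical columns and values).
sheet1 = "submission #,title\n101,Alpha\n102,Beta\n"
sheet2 = "submission #,score\n102,7\n101,9\n"

result = join_on_key(
    list(csv.DictReader(io.StringIO(sheet1))),
    list(csv.DictReader(io.StringIO(sheet2))),
    "submission #",
)
```

Rows of the first sheet without a match in the second are kept unchanged, which differs from VLOOKUP showing #N/A.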
I hope someone can help me come up with an algorithm.
I'm still very new to Apache POI and I was assigned to come up with an algorithm to read a template (Excel) and extract the headers/column names from the data itself.
The following must be taken into account:
There can be multiple headers/column names in just one sheet of an Excel file.
Headers can be horizontal AND/OR vertical in nature. This means that there could be a mixture of vertical and horizontal headers in one sheet.
Headers don't necessarily have to be at the very first row of the file. There could be introductions or banner images there.
The system must allow ANY kind of Excel format, so there is no control over the formatting of the cells, the naming convention, etc.
Some headers are alphanumeric in nature, which means it also contains numbers.
Some cells are merged to make room for a specific header.
Any ideas and suggestions are very much welcome. Just let me know if you have further clarifications.
(I know nothing about Apache, but some about Excel Interop working)
If the sheets to be detected are yours, I'd recommend NAMING those header cells. To name a cell in Excel, there's a field at the top left of the screen where the cell coordinates normally appear (like "A1" or "B2" and so on). Type a name in that place, and you will be able to identify that cell via code by its name ('Worksheet.Range("Name")' is how you get those cells via code).
To manage names, go to "Insert - Names" or "Formulas - Name manager", depending on what version of excel.
(Personally, I never work with sheets via code without naming headers, then I use "Offset" to get the data cells corresponding to those headers - This allows me to freely edit the sheet later without breaking the code)
If the sheets aren't yours, then, you'll need to find out the extents of the data. (Last row and last column)
Then check for the first line that contains all columns filled, none of them blank. That's a probable horizontal header.
Also check for the first column that contains all lines filled. That's a probable vertical header.
You could, as well, search for completely blank lines and/or columns to find headers that are AFTER some data, in case of sheets containing multiple horizontal headers, or vertical.
You could use some formatting properties (Range.Interior or Range.Font for examples) of those cells to identify if they are headers (usually headers have different format, color, borders and so on).
If you're sure there's no numeric header, I mean, all headers contains text, check for the type of data in the cells. If all are strings, header probability increases.
Even so, that's a tricky thing to do: if sheets don't follow some pattern, once in a while one of them can deceive your code and bring false results. I'd recommend, if allowed, adding a human verification step to confirm the results after the process is done.
The solution to this problem involves taking away two of these freedoms. Such constraints applied will make this a tractable problem. Most of such freedoms come from overcautious thinking.
The freedoms are given as quotes below:-
Headers can be horizontal AND/OR vertical in nature. This means that there could be a mixture of vertical and horizontal headers in one sheet.
Typically, vertical headers are not used in Excel files where there is a need to programmatically detect headers, as the primary, most common and sometimes the only reason for such detection is to upload/transform the tabular data.
Funny things happen when vertical headers are introduced:
They become Labels of Forms. This implies that such forms are used for data entry rather than storage. The data from such forms is stored in horizontal/columnar headers and rowwise/vertical records of data . Thus obviating the need for Upload/Transformation of the data entry sheet.
Excel is designed to have only horizontal headers. Vertical Headers cease to have autofilter support.
Even when Vertical Headers are present, a top horizontal header row can still be introduced to mark the headers themselves as descriptions / categories.
Staying true to the core need for autodetection of headers, once our requirement states that headers can be placed only in a horizontal alignment, the solution becomes slightly more tractable, but not fully so.
Some cells are merged to make room for a specific header.
Merging cells is poison and anathema to the entire reason for transformation/upload of data. This is a pill I steadfastly have refused to take in my entire career with Excel & SQL jugglery. You may kindly merge all that you want to for all I care, however thee shall not pass into my beloved SQL Server.
For the aforementioned reasons of prejudice and ill-will towards all mergers and mergees alike, I'd respectfully suggest that you too take this course.
Solution
Staying true to the above requirements after taking away the 2 freedoms, the pseudo algorithm (solution) is to
Take a sample of, say, c x r Excel cells, for example 200 x 201 rows and columns
Count the non-empty cells (those whose contents have a non-zero length) in each row, using an inbuilt formula like COUNTA, and maintain the per-row counts in a data structure.
The type of data, i.e. Number, Date or String, should also be maintained in the above data structure, which should be capable of expressing the following:
Row# 22 contains
30 non-empty cells of which
28 are alphanumeric,
1 is a Date and
1 is a Number.
The First specific row that contains the maximum number of such non empty cells with the maximum number of strings should very likely be the header row.
Converting all of the above to a specific algorithm in any given language should be a deliciously occupying task for any young developer in their prime.
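As one possible starting point, the pseudo algorithm above can be sketched in Python. It is not Apache POI code: it assumes the sampled rows are already available as a list of lists of Python values (strings, numbers, dates, or None for empty cells), and the names classify, profile_rows and likely_header_row are hypothetical. Row numbers are 0-based.

```python
from datetime import date


def classify(value):
    """Return 'date', 'number' or 'string' for a filled cell, None for an empty one."""
    if value is None or (isinstance(value, str) and not value.strip()):
        return None
    if isinstance(value, date):  # also covers datetime, a subclass of date
        return "date"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def profile_rows(grid, max_rows=200):
    """Per-row counts of non-empty cells, broken down by type, for the sampled rows."""
    profiles = []
    for index, row in enumerate(grid[:max_rows]):
        counts = {"string": 0, "number": 0, "date": 0}
        for value in row:
            kind = classify(value)
            if kind:
                counts[kind] += 1
        profiles.append({"row": index, "non_empty": sum(counts.values()), **counts})
    return profiles


def likely_header_row(grid):
    """First row with the most non-empty cells, ties broken by the most strings."""
    profiles = profile_rows(grid)
    if not profiles:
        return None
    # max() returns the first maximal element, i.e. the first such row
    best = max(profiles, key=lambda p: (p["non_empty"], p["string"]))
    return best["row"] if best["non_empty"] else None


sample = [
    ["Quarterly report"],
    [],
    ["Name", "Qty", "Shipped"],
    ["Bolt", 4, date(2020, 1, 1)],
    ["Nut", 10, date(2020, 1, 2)],
]
header = likely_header_row(sample)  # 2
```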