Can someone help me understand Shared Strings in MS Excel? I tried to understand them from some blogs but could not get the complete idea. Everyone explains how to access Shared Strings using Open XML and where they are stored (sharedStrings.xml). Accessing them through the API is fine. But:
How do I create Shared Strings in Excel? (Creating them manually in Excel 2010, not using the API.)
What is the exact need for Shared Strings?
In which cases should I use Shared Strings?
I tried the following:
http://www.sadev.co.za/content/reading-and-writing-excel-2007-or-excel-2010-c-part-iii-shared-strings
http://msdn.microsoft.com/en-us/library/gg278314.aspx
Shared strings are basically a space-saving mechanism. As for your questions:
A1. You can't manually create shared strings using the Excel user interface. That's because Excel by default always stores any text as a shared string.
A2. As mentioned it's a space saving mechanism. Excel 2007/2010/2013 uses the Open XML format, which is basically a bunch of XML files zipped together. It might also be for ease of referencing. You just have to refer to an index, just like you refer to an index of an array of strings. (But XML is inherently verbose, so I suspect it's for space saving purposes).
Let's say you have the text "This is a very long string" in cell A1 of sheet "FirstSheet". Let's say you also have the same text in cell B7 of sheet "SecondSheet". Excel stores "This is a very long string" in the shared strings table as one entry, say index 5. In "FirstSheet" cell A1, the Open XML SDK class Cell will contain just "5" as the CellValue. In "SecondSheet" cell B7, the SDK class Cell will also contain "5".
Basically, the CellValue only holds the index to the shared string table. This is how you save space. The assumption is that text is duplicated within the worksheet as well as across different worksheets.
A3. Go for shared strings if you understand how to make it work. If not, just set the actual text in the Cell class for CellValue (Cell.DataType as CellValues.String instead of CellValues.SharedString).
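To make the index lookup concrete, here is a minimal Python sketch using only the standard library. The XML fragments are hand-written stand-ins for xl/sharedStrings.xml and two worksheet parts (real files carry more elements and attributes), the table is shortened so the long string sits at index 1 rather than 5, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical, simplified stand-in for xl/sharedStrings.xml.
SHARED_STRINGS_XML = """<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <si><t>Name</t></si>
  <si><t>This is a very long string</t></si>
</sst>"""

# Hypothetical worksheet fragments: t="s" marks a shared-string cell,
# and <v> then holds an index into the table, not the text itself.
FIRST_SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData><row r="1"><c r="A1" t="s"><v>1</v></c><c r="B1"><v>42</v></c></row></sheetData>
</worksheet>"""

SECOND_SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData><row r="7"><c r="B7" t="s"><v>1</v></c></row></sheetData>
</worksheet>"""


def load_shared_strings(xml_text):
    """Return the shared string table as a list (plain-text entries only)."""
    root = ET.fromstring(xml_text)
    return [si.findtext("m:t", default="", namespaces=NS)
            for si in root.findall("m:si", NS)]


def read_cells(sheet_xml, shared):
    """Map cell references to values, resolving shared-string indexes."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for c in root.iterfind(".//m:c", NS):
        raw = c.findtext("m:v", default="", namespaces=NS)
        # Only cells typed "s" are indexes; others keep their raw value.
        cells[c.get("r")] = shared[int(raw)] if c.get("t") == "s" else raw
    return cells


shared = load_shared_strings(SHARED_STRINGS_XML)
first_cells = read_cells(FIRST_SHEET_XML, shared)
second_cells = read_cells(SECOND_SHEET_XML, shared)
print(first_cells, second_cells)
```

Both sheets store only the index, so the long text appears once in the package no matter how many cells use it.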
Related
I am using Excel to compare error messages. My error message looks like this:
You have changed the values.
Do you want to continue?
I entered this value in Excel using Alt+Enter, but when UFT reads the value, the carriage return is not preserved.
How can I include a carriage return in Excel so that it is visible when reading the values from UFT?
First, try to create a string formula, i.e. instead of entering the line break in the string, create a formula like
="<value>"
where <value> is the string value that you originally wanted in your cell, with the line feed contained within the quotation marks.
This might solve your issue.
But:
This is just the tip of the iceberg of known issues with UFT's data table API. Here is an incomplete list of additional issues (some of which, but not all, are fixed or at least improved upon in 15+):
Date values are not properly handled, especially if you are using a non-US locale and try to consume values auto-formatted by Excel as dates.
Strange things happen if you have an .xls file. Use an .xlsx file instead. (This used to be the other way around.) Note it is not only the extension I am referring to, but also the format (Excel-95 vs. more modern format).
Many formulas are unsupported.
Formatting behavior is different from what Excel would do/show.
CRs and LFs are handled differently from what Excel does.
The built-in table editor is quite a silo of bugs and antiergonomic.
Cell values are limited in length; at the same time, formulas have different length limits. I.e. a string in a cell is limited to a certain maximum number of characters, but a formula returning a string does not have that (but maybe a higher) length limit.
Because of this (and more), we auto-convert all Excel sheets on the fly before we use them in UFT whenever they have been updated. To do this, we use Excel Interop (i.e. Excel's COM automation interface) to spawn an Excel instance, create a converted version that has all formulas and formatting resolved to plain string formulas, and use the converted sheets with UFT's DataTable.ImportSheet feature. This means we unfortunately need Excel on all execution machines.
So my recommendation would be to stay away from the data table editor in UFT. Use Excel, and make sure all your edits come through to UFT in a meaningful way. If they don't, consider a converter that creates a DataTable-compatible copy of your sheet.
Yes, I know this is suboptimal, but that's what it has come down to after years and years of struggling with the DataTable API and UFT's "superb" built-in data table editor.
I am trying to import/export a cell in/to Access. This cell is where my coworkers can input their comments (Cell B29).
Here is the code I wrote for exporting the data:
rs.Fields("CustomNotes") = Sheets("Main").Range("CUSTOMNOTES")
When I save the data into database, the contents in the cell were successfully saved into the database - in a column with long text.
However, when I read the data from the database, the cell is empty and doesn't show anything. Here is the code I wrote for importing the data:
Sheets("Main").Range("CUSTOMNOTES") = rsl![CustomNotes]
When I debug, rsl![CustomNotes] shows "object required".
Can someone please help here? Do I need to add a declaration, or is my variable type wrong?
Every time I have seen a problem like this, it was related to the field being a Long Text/Memo field and changing the field type to Short Text has fixed it. Excel has problems with Long Text in some cases.
The main difference between a Short Text and a Long Text field is simply the number of characters each can hold (Short Text is limited to 255). Both can store alphabetic and numeric characters.
Other limitations for Short Text are that it cannot hold Rich Text information, whereas Long Text can. If you are using Short Text and need special formatting done, you will need to do that in code instead after you retrieve the data.
I have a workbook wkb1 with a cell containing a data validation list based on a column array (of names, let's say) in another workbook, wkb2. Next to this column are many other columns with data (let's say ages, birthdays etc.) corresponding to these names. In wkb1 I get these data with =OFFSET([wkb2]Sheet1!A3, MATCH(...), colsindex) formulas in various cells.
Now, imagine I have a ton of workbooks of the same kind as wkb2, with only different data (different names, ages, birthdays). What I would like to do is the following: in some cell in a sheet of wkb1 I would input the path to one of the wkb2's, and then in all cells (in wkb1) where I have the aforementioned formulas, plus the cell containing the validation list, I would like wkb2 to be the one pointed to by the path.
I don't believe it is possible to use INDIRECT on a string that includes the path to a workbook.
Using Mac Excel 2016, the following works directly in a cell:
{='/Users/xxxx/Desktop/[Book1.xlsx]Sheet1'!$A$1:$B$2}
but
{=INDIRECT("'/Users/xxxx/Desktop/[Book1.xlsx]Sheet1'!$A$1:$B$2")}
gives #REF!
Without the path, both work:
{='[Book1.xlsx]Sheet1'!$A$1:$B$2}
{=INDIRECT("'[Book1.xlsx]Sheet1'!$A$1:$B$2")}
(although INDIRECT is volatile, while the first version is not)
There was a suggestion on the web somewhere, that defining a name as follows would work:
In some cell, $H$6 say, put the path string (for example, '/Users/xxxx/Desktop/[Book1.xlsx]Sheet1'!$A$1:$B$2)
Using Insert > Name > Define Name, create a name, say ExternalRange, with the following value:
=evaluate(Sheet1!$H$6)
In the workbook, =ExternalRange is supposed to give you the external workbook values.
For me, this only worked if the string in $H$6 did not contain the path. In other words, exactly the same behaviour as INDIRECT. (Note that putting a string into a cell which begins with ' requires an extra ' at the beginning. I don't think this is the root of the problem though, as I observe the same behaviour typing the string in directly, which avoids the need for the extra ')
For your problem, I would try a different approach in any case. Both OFFSET and INDIRECT are volatile, and recalculate every time the worksheet recalculates, rather than only when their inputs change. This has the potential to create a big performance penalty.
To create the Validation that is populated from a separate worksheet, I would use Names which are defined with INDEX and MATCH. I give an example at the end.
To be able to dynamically change the source workbook that contains the validation information, the approach I would take would depend on:
(1) Do you need to use the values from numerous wkb2's at the same time?
or
(2) Do you need to be able to dynamically switch between various wkb2's, but only use the data from a single wkb2 at any one time?
Also, is the set of wkb2's known beforehand? Or could the user type in an arbitrary path to a wkb2?
Depending on the answers to the above, I would either create a bunch of links (via Names) to the complete set of possible wkb2's (assuming the set size is not too large) and then dynamically change which Name my validation points to (possibly using CHOOSE, again via a Name), or I would change the source of a single link as required. I only know how to change the source of a link using the interface (Data > Edit Links... > Change Source) or via VBA (Workbook.ChangeLink Method). Note that the Edit Links... menu item only appears if you have links.
If I had to take this approach, I would opt to put the path to the source in a cell, provide a button which links to VBA and makes the change, but I guess it would be possible to have a custom VBA function that takes the path and changes the link. Personally, I think using the custom VBA function would be ugly. Even using the button linked to VBA isn't great, so I would choose to predefine all the possible links if at all possible. (which I would probably do with VBA, but this would be VBA that is used once and not part of the final spreadsheet.)
Here is the way I would set up the validation to avoid using OFFSET or INDIRECT, and also to make it easier to change the validation data in the future.
(1) In wkb2, I have the following data in cells $A$1 to $C$5:
Name   Gender  Age
Peter  Male    23
Sally  Female  21
Roger  Male    34
Abad   Male    27
(2) In wkb2, define the following Names using Insert > Name > Define Name, (or with VBA if you are doing it a lot):
Header1 =Sheet1!$A$1:$C$1 (The Sheet1! makes it a local name)
Data1 =Sheet1!$A$2:$C$5 (The Sheet1! makes it a local name)
I prefer to use local names that are defined on a particular worksheet only, which makes it easier if the sheet is duplicated in the same workbook. They do require you to reference the name with the sheetname though (so Sheet1!Data1 rather than just Data1). Global names will work in this example too, if you prefer them.
(3) In wkb1, define the following name:
NameValidation =INDEX([wkb2.xlsx]Sheet1!Data1,,MATCH("Name",[wkb2.xlsx]Sheet1!Header1,0))
NameValidation now refers to the "Name" column in Data1 of wkb2. You can resize Data1, insert new columns, change the order of the columns, etc., and NameValidation will still point to the "Name" column. Alternatively, you could also define your names in wkb2 to be columns only. So a name for "Names", a different name for "Gender", etc. This would be a good approach if the data lengths are different, or the data was not positioned next to each other on wkb2.
(4) In wkb1, use Data > Validation > Settings: Allow List and choose the source as =NameValidation (Note that I don't think it is possible to put the formula =INDEX([wkb2.xlsx]Sheet1!Data1,,MATCH("Name",[wkb2.xlsx]Sheet1!Header1,0)) directly into the validation. I believe you need to go via a name such as I did with NameValidation)
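For anyone unsure why the INDEX/MATCH name survives column reordering, here is a small Python sketch of the same lookup logic. It is only an analogy: match and index_column are made-up helpers that mimic MATCH(...,0) and INDEX(table,,col), and unlike Excel's MATCH this version is case-sensitive.

```python
# Sample data mirroring the Header1 and Data1 ranges from step (1).
header = ["Name", "Gender", "Age"]
data = [
    ["Peter", "Male", 23],
    ["Sally", "Female", 21],
    ["Roger", "Male", 34],
    ["Abad", "Male", 27],
]


def match(value, row):
    """Rough analogue of MATCH(value, row, 0): 1-based exact-match position."""
    return row.index(value) + 1


def index_column(table, col):
    """Rough analogue of INDEX(table,,col): the entire column at 1-based col."""
    return [r[col - 1] for r in table]


def column_by_header(table, header_row, name):
    """Find the column by its header text, then return that whole column."""
    return index_column(table, match(name, header_row))


names = column_by_header(data, header, "Name")

# Reorder the columns: the lookup still finds "Name" because it searches
# the header row instead of relying on a fixed column position.
header_moved = ["Age", "Name", "Gender"]
data_moved = [[age, name, gender] for name, gender, age in data]
names_moved = column_by_header(data_moved, header_moved, "Name")
print(names, names_moved)
```

A fixed column offset, as in the OFFSET approach, would return the ages after the reordering.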
I am attempting to read an Excel 2007 file (xlsx) from outside of Excel and I am finding an inconsistency that I cannot explain.
If you enter the value of 19.99 into a cell and then look at the underlying Xml document it is actually stored as 19.989999999999998. This is not the only value that does this, but it is a reasonable example. No formatting is applied in the sheet. In my example I just open a new Workbook, type in 19.99 in A1 and save the file.
I have attempted to open this simple example in both open office and Google docs and it shows 19.99 when the document is loaded.
My question is, how do I determine when to transform this value from 19.989999999999998 into 19.99 for use in other systems?
The difference between the 19.99 you entered and the 19.989999999999998 stored is floating-point representation error. Excel holds numbers internally as binary floating-point values, and most decimal fractions, 19.99 included, have no exact binary representation, so the value written to the xlsx file is the nearest representable number printed at full precision.
Even if you haven't explicitly assigned a format to the cells, Excel applies the default "General" number format, which rounds the value for display (Excel keeps at most 15 significant digits) and switches to scientific notation if needed. If you look at the number formatting for that cell (whether using the MS Excel front end or by examining the xlsx file), you should find that it is actually set to the default.
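One way to handle this when reading the file yourself, sketched in Python: parse the stored text as a double and format it back with 15 significant digits, which is Excel's precision limit. The helper name to_display is an assumption for illustration, and whether 15 digits is the right cut-off for your target system is a judgment call.

```python
def to_display(stored_text, digits=15):
    """Parse the raw cell text as a double and render it with a limited
    number of significant digits, dropping the binary round-off tail."""
    return format(float(stored_text), f".{digits}g")


stored = "19.989999999999998"   # text as found in the sheet XML
value = float(stored)

# The stored text and the literal 19.99 parse to the very same double,
# so nothing was lost; only the printed form differs.
same_double = (value == 19.99)

shown = to_display(stored)
print(same_double, shown, repr(value))
```

If the other system accepts doubles, passing the parsed value through unchanged is also fine, since it is bit-for-bit the number the user typed.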
This question is long winded because I have been updating the question over a very long time trying to get SSIS to properly export Excel data. I managed to solve this issue, although not correctly. Aside from someone providing a correct answer, the solution listed in this question is not terrible.
The only answer I found was to create a single row named range wide enough for my columns. In the named range put sample data and hide it. SSIS appends the data and reads metadata from the single row (that is close enough for it to drop stuff in it). The data takes the format of the hidden single row. This allows headers, etc.
WOW what a pain in the butt. It will take over 450 days of exports to recover the time lost. However, I still love SSIS and will continue to use it because it is still way better than Filemaker LOL. My next attempt will be doing the same thing in the report server.
Original question notes:
If you are in the SQL Server Integration Services designer and want to export data to an Excel file starting on something other than the first line, let's say the fourth line, how do you specify this?
I tried going into the Excel Destination of the Data Flow, changed the AccessMode to OpenRowSet from Variable, then set the variable to "YPlatters$A4:I20000". This fails, saying it cannot find the sheet. The sheet is called YPlatters.
I thought you could specify (Sheet$)(Starting Cell):(Ending Cell)?
Update
Apparently in Excel you can select a set of cells and name them with the name box. This allows you to select the name instead of the sheet without the $ dollar sign. Oddly enough, whatever the range you specify, it appends the data to the next row after the range. Oddly, as you add data, it increases the named selection's row count.
Another odd thing is that the data takes the format of the last line of the range specified. My header rows are bold. If I specify a range that ends with the header row, the data appends to the row below and makes all the entries bold. If you specify one row lower, it puts a blank line between the header row and the data, but the data is not bold.
Another update
No matter what I try, SSIS samples the "first row" of the file and sets the metadata according to what it finds. However, if you have sample data that has a value of zero but is formatted as the first row, it treats that column as text and inserts numeric values with a single quote in front ('123.34). I also tried headers that do not reflect the data types of the columns. I tried changing the metadata of the Excel destination, but it always changes it back when I run the project, then fails saying it will truncate data. If I tell it to ignore errors, it imports everything except that column.
Several days of several hours a piece later...
Another update
I tried every combination. A mostly working example is to create the named range starting with the column headers. Format your column headers as you want the data to look as the data takes on this format. In my example, these exist from A4 to E4, which is my defined range. SSIS appends to the row after the defined range, so defining A4 to E68 appends the rows starting at A69. You define the Connection as having the first row contains the field names. It takes on the metadata of the header row, oddly, not the second row, and it guesses at the data type, not the formatted data type of the column, i.e., headers are text, so all my metadata is text. If your headers are bold, so is all of your data.
I even tried making a sample data row without success... I don't think anyone actually uses Excel with the default MS SSIS export.
If you could define the "insert range" (A5 to E5) with no header row and format those columns (currency, not bold, etc.) without it skipping a row in Excel, this would be very helpful. From what I gather, no one uses SSIS to export Excel without a third-party connection manager.
Any ideas on how to set this up properly so that data is formatted correctly, i.e., the metadata read from Excel is proper to the real data, and formatting inherits from the first row of data, not the headers in Excel?
One last update (July 17, 2009)
I got this to work very well. One thing I added to Excel was the IMEX=1 in the Excel connection string: "Excel 8.0;HDR=Yes;IMEX=1". This forces Excel (I think) to look at all rows to see what kind of data is in it. Generally, this does not drop information, say for instance if you have a zip code then about 9 rows down you have a zip+4, Excel without this blanks that field entirely without error. With IMEX=1, it recognizes that Zip is actually a character field instead of numeric.
And of course, one more update (August 27, 2009)
The IMEX=1 will succeed importing data with missing contents in the first 8 rows, but it will fail exporting data where no data exists. So, have it on your import connection string, but not your export Excel connection string.
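To keep the two settings from getting mixed up, the extended-properties part of the connection string can be generated rather than hand-edited. This is a small Python sketch; the helper name and parameters are made up, and only the "Excel 8.0;HDR=Yes;IMEX=1" text itself comes from the notes above.

```python
def excel_extended_properties(version="Excel 8.0", header=True, imex=False):
    """Build the extended-properties text for an Excel connection.
    IMEX=1 is appended only when requested (import connections)."""
    parts = [version, "HDR=Yes" if header else "HDR=No"]
    if imex:
        parts.append("IMEX=1")
    return ";".join(parts)


# Import connection: IMEX=1 so mixed columns such as zip / zip+4 survive.
import_props = excel_extended_properties(imex=True)

# Export connection: no IMEX, since it fails when no data exists yet.
export_props = excel_extended_properties(imex=False)

print(import_props)
print(export_props)
```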
I have to say, after so much fiddling, it works pretty well.
P.S. If you are using an x64 version, make sure you call DTExec from C:\Program Files\Microsoft SQL Server\90\DTS.x86\Binn. It will load the 32-bit Excel driver and work fine.
Would it be easier to create the Excel Workbook in a script task, then just pick it up later in the flow?
The engine part of SSIS is good but the integration with Excel is awful
"Using SSIS in conjunction with Excel is like having hot tar funnelled up your iHole in a road cone"
Dr. Zim, I believe you were the one who originally brought up this question. I totally feel your pain. I love SSIS overall, but I absolutely hate the limited tools that come standard for Excel. All I want to do is bold the heading (Row 1) record in Excel and not bold the following records. I have not found a great way to do that; granted, I am approaching this with no script tasks or custom extensions, but you would think something this simple would be a standard option. Looks like I may be forced to research and program up something fancy for a task that should be so fundamental. I've already spent a ridiculous amount of time on this myself. Does anyone know if you can use Excel XML with Excel versions 2000/XP/2003? Thanks.
This is an old thread, but what about using a flat file connection and writing the data out as a formatted HTML document? Set the MIME type in the page header to "application/excel". When you send the document as an attachment and the recipient opens it, it will open a browser session but should pop Excel up over the top of it, with the data formatted according to the style (CSS) specified in the page.
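A rough Python sketch of the formatted-HTML idea, just to show the shape of the output: a table with a bold header row and non-bold data rows styled through CSS. The function name, the sample row and the style rules are illustrative assumptions; how the file is delivered, and with which MIME type, is left out.

```python
from html import escape


def rows_to_html(headers, rows):
    """Render headers and rows as one HTML table with inline CSS, so the
    header row is bold and the data rows are not."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return (
        "<html><head><style>"
        "th { font-weight: bold; } td { font-weight: normal; }"
        "</style></head><body><table>"
        f"<tr>{head}</tr>{body}"
        "</table></body></html>"
    )


# Hypothetical sample data; escape() keeps markup characters in the
# values from breaking the table.
doc = rows_to_html(["Name", "Price"], [["Platter <large>", 19.99]])
print(doc)
```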
Can you have SSIS write the data to an Excel sheet starting at A1, then create another sheet, formatted as you like, that refers to the other sheet at A1, but displays it as A4? That is, on the "pretty" sheet, A4 would refer to A1 on the SSIS sheet.
This would allow SSIS to do what it's good for (manipulate table-based data), but allow the Excel to be formatted or manipulated however you'd like.
When Excel is the destination in SSIS, or the target export type in SSRS, you do not have much control over formatting or over how the final file looks. I once wrote a custom Excel rendering engine for SSRS, as my client was very strict about the format of the final Excel report. I used "Excel XML" to get the job done inside my custom renderer. Maybe you can use XML output and convert it to Excel XML using XSLT.
I understand you would rather not use a script component so perhaps you could create your own custom task using the code that a script contains so that others can use this in the future. Check here for an example.
If this seems feasible the solution I used was CarlosAg Excel Xml Writer Library. With this you can create code which is similar to using the Interop library but produces excel in xml format. This avoids using the Interop object which can sometimes lead to excel processes hanging around.
Instead of the roundabout exercise of writing data to particular cells, formatting them and styling them, which is a very tedious effort given the support SSIS has for Excel, we could go the "template" way.
Assume we need to write data into a given cell with all the custom formatting that's done on it. Have all the formatting in a sheet, say "SheetActual", whereas the cells that will hold the data actually contain lookups/references/formulas that refer to the original data that SSIS exports to a hidden sheet, say "SheetMasterHidden", of the same Excel connection. This "SheetMasterHidden" essentially holds the master data in the default format in which SSIS writes data to Excel. This way you need not worry about formatting the data at runtime.
Formatting the Excel file is one-time work "IF" the formatting doesn't change very often. If the format changes and is decided at runtime, this solution may not work very well.
The answer is in the question. Over time, it became a progress status. However, there is SSRS that will create Excel files if you create TABLE presentations. It works pretty well too.