I have an application that uses the delivered MS Office IFilters to extract text content from Excel files.
I have an issue with .xlsx files and concatenated strings: the IFilter extracts the text, but not the concatenated strings.
The .xls IFilter does return concatenated strings; the .xlsx one does not. (I know they are different file formats and that .xlsx is essentially a zip file with the data stored as XML.)
An example is:
A1=ABC, H2=123, G3=XYZ, D1=Concatenate(A1, H2, G3)
The .xls IFilter returns the concatenated string ("ABC123XYZ"), the same as it appears visually in the file; the .xlsx IFilter does not return the concatenated value.
If the cells are adjacent, it may appear that .xlsx is returning the concatenated values, but it is not; only the individual cell values are returned.
I have tried unzipping the .xlsx and parsing the .xml files, but again I do not get the concatenated string.
I'm really after suggestions on how best to handle this. Ultimately I need to be able to extract the concatenated strings from .xlsx files.
Is my only option to convert the file to xls before extracting the text? Is there an easy way to do this dynamically with no real performance hit and without actually saving the file? Would I be better off 'extracting' the text using Microsoft.Office.Interop.Excel and somehow copying and pasting into a listview? Seems like either would be a huge performance hit.
Any help and advice is gratefully received!
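One thing that may be worth checking when parsing the XML directly: in a workbook last saved by Excel, a formula cell in the sheet XML normally holds both the formula (`<f>`) and its cached result (`<v>`), and string results are marked `t="str"`. The sketch below reads those cached values. It is only an illustration under that assumption: the function name `formula_results` and the tiny in-memory archive are made up for the example, and a file written by a tool that never calculated the formulas may have no `<v>` element to read.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

def formula_results(xlsx_bytes, sheet_path="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: cached_value} for every formula cell in one sheet."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        root = ET.fromstring(zf.read(sheet_path))
    results = {}
    for cell in root.iterfind(".//m:c", NS):
        formula = cell.find("m:f", NS)
        value = cell.find("m:v", NS)
        # Only formula cells that carry a cached result are reported
        if formula is not None and value is not None:
            results[cell.get("r")] = value.text
    return results

# Hypothetical stand-in for a real workbook: only the sheet part is included
SHEET_XML = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheetData><row r="1">'
    '<c r="A1" t="s"><v>0</v></c>'
    '<c r="D1" t="str"><f>CONCATENATE(A1,H2,G3)</f><v>ABC123XYZ</v></c>'
    '</row></sheetData></worksheet>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/worksheets/sheet1.xml", SHEET_XML)

print(formula_results(buf.getvalue()))  # {'D1': 'ABC123XYZ'}
```

Ordinary text cells (`t="s"`) are skipped here because their `<v>` is only an index into xl/sharedStrings.xml, which would need a separate lookup.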
Related
I create .CSV files by building the content like this:
s= "column1, column2, column3 \r\n"
s+= "R2column1, R2column2, R2column3 \r\n"
saveas("file.csv", s);
I now need to include a way to resize the columns when the file is viewed in Excel.
I've read that CSV cannot do this, so what is the next simplest Excel file format that can? And what would the new syntax look like?
CSV files are simple text files that contain plain data.
You can open these files in Excel, and they will be displayed in the spreadsheet view for convenience, with each field (delimited by the separator, in your case the comma) in its own cell.
Although you can change the width of an Excel column, this is purely a visual style in Excel and can only be saved in an Excel file.
The solution would therefore be to convert your CSV files to Excel (*.xlsx) files.
Depending on the language you use you can probably directly create Excel files, without the need for conversion. There are libraries available for most programming languages for exactly that purpose.
If you want to create Excel files with Android, this might help: How to create an excel file in android?
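In practice a spreadsheet library is the usual route, but to show what is involved, here is a dependency-free sketch that writes a minimal single-sheet .xlsx with explicit column widths using only the Python standard library. The names `sheet_xml`, `write_xlsx`, and `col_letter` are invented for this example, the sheet is limited to 26 columns, and every cell is written as inline text; treat it as an illustration of where the width lives (the `<cols>` element), not as a complete writer.

```python
import zipfile
from xml.sax.saxutils import escape

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
CT = "application/vnd.openxmlformats-officedocument.spreadsheetml"
XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'

# Fixed package parts for a workbook with a single sheet
PARTS = {
    "[Content_Types].xml": (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        f'<Override PartName="/xl/workbook.xml" ContentType="{CT}.sheet.main+xml"/>'
        f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{CT}.worksheet+xml"/>'
        '</Types>'
    ),
    "_rels/.rels": (
        f'<Relationships xmlns="{PKG_REL}">'
        f'<Relationship Id="rId1" Type="{DOC_REL}/officeDocument" Target="xl/workbook.xml"/>'
        '</Relationships>'
    ),
    "xl/workbook.xml": (
        f'<workbook xmlns="{MAIN_NS}" xmlns:r="{DOC_REL}">'
        '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets></workbook>'
    ),
    "xl/_rels/workbook.xml.rels": (
        f'<Relationships xmlns="{PKG_REL}">'
        f'<Relationship Id="rId1" Type="{DOC_REL}/worksheet" Target="worksheets/sheet1.xml"/>'
        '</Relationships>'
    ),
}

def col_letter(index):
    """0-based column index to a letter; this sketch only handles A..Z."""
    return chr(ord("A") + index)

def sheet_xml(rows, widths):
    """Build the worksheet part; widths are in Excel's character-based units."""
    cols = "".join(
        f'<col min="{i}" max="{i}" width="{w}" customWidth="1"/>'
        for i, w in enumerate(widths, start=1)
    )
    cols_xml = f"<cols>{cols}</cols>" if cols else ""  # <cols> must not be empty
    body = []
    for r, row in enumerate(rows, start=1):
        cells = "".join(
            f'<c r="{col_letter(c)}{r}" t="inlineStr"><is><t>{escape(str(v))}</t></is></c>'
            for c, v in enumerate(row)
        )
        body.append(f'<row r="{r}">{cells}</row>')
    # <cols> has to come before <sheetData> in the worksheet element
    return (f'<worksheet xmlns="{MAIN_NS}">{cols_xml}'
            f'<sheetData>{"".join(body)}</sheetData></worksheet>')

def write_xlsx(target, rows, widths):
    """Write the package to a path or file-like object."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, body in PARTS.items():
            zf.writestr(name, XML_DECL + body)
        zf.writestr("xl/worksheets/sheet1.xml", XML_DECL + sheet_xml(rows, widths))
```

With the data from the question, a call such as `write_xlsx("file.xlsx", [["column1", "column2", "column3"], ["R2column1", "R2column2", "R2column3"]], widths=[12, 20, 30])` would produce a file whose three columns open at those widths.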
I am trying to process a database, which is in .sav format, in Excel.
I have converted the .sav to CSV online through http://pspp.benpfaff.org/ but I get CSV text, not a file.
How can I import this into Excel so that it is read as columns? Otherwise I just have the text separated by commas.
Thank you!
I have found the easiest method after the online conversion is to save the resulting text as a .txt file. From there, open Excel-->File-->Import.
If your data has headers, select that option, but the main thing is to set the delimiter to comma and then click Finish. It should retain the data structure originally found in your SPSS file.
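If you would rather script the step than paste by hand, the same idea can be sketched in Python: take the text the converter returned, check that it splits into columns on commas, and save it to a file that Excel can open. The sample text and the function names `csv_text_to_rows` and `save_for_excel` are invented for this example.

```python
import csv
import io

# Hypothetical sample standing in for the text the online converter returns
csv_text = 'id,age,city\n1,34,Leeds\n2,51,"York, UK"\n'

def csv_text_to_rows(text):
    """Split comma-delimited text into rows of fields, honouring quotes."""
    return list(csv.reader(io.StringIO(text)))

def save_for_excel(text, path):
    """Write the text to a file; the utf-8-sig BOM helps Excel detect UTF-8."""
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)

rows = csv_text_to_rows(csv_text)
print(rows[0])  # ['id', 'age', 'city']
print(rows[2])  # ['2', '51', 'York, UK']
```

Calling `save_for_excel(csv_text, "data.csv")` would then write the file; note that the quoted field containing a comma stays in one column, which is the main thing a naive split on "," gets wrong.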
I've got a script written in VBScript that generates a .xls document. In the code the data is separated by vbTab characters. It opens normally in Excel, but in Apple Numbers all of the data ends up in one really wide column, with the values separated by tabs that look like several spaces. It looks fine, but I need the data to be in different columns so that it can be sorted. Any ideas?
According to Apple, Numbers can import the following formats:
Numbers ’08 or later
Microsoft Excel - Office Open XML (.xlsx) and Office 97 or later (.xls)
Comma Separated Values (CSV)
Tab-delimited text files
It's likely that the issue is that your "xls" file isn't actually Excel formatted. Try changing the file extension of the output file to .txt and opening it in Numbers.
Depending on your data, I would recommend you just output csv instead.
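To check whether a ".xls" file really is an Excel file, you can look at its first bytes: binary .xls files start with the OLE2 compound-file signature, and .xlsx files start with the zip signature. The sketch below does that and, for plain tab-delimited text, converts it to CSV. The function names and the sample data are made up for the example.

```python
import csv
import io

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls container
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx is a zip archive

def sniff_spreadsheet(data: bytes) -> str:
    """Guess the real format of a file from its leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    return "text"

def tabs_to_csv(text: str) -> str:
    """Re-write tab-delimited text as comma-delimited, quoting where needed."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\r\n")
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        writer.writerow(row)
    return out.getvalue()

# Hypothetical output of a script that joins values with tabs
sample = b"Name\tQty\r\nWidget, large\t3\r\n"
print(sniff_spreadsheet(sample))  # text
print(tabs_to_csv(sample.decode("ascii")))
```

Going through the `csv` module rather than replacing tabs with commas means that a value which itself contains a comma gets quoted and stays in one column.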
I am exporting data from a database to a file that Excel can both read and save. So far I have tried:
(CSV) I generate CSV in the default format (according to RFC 4180, comma delimiter). As expected, Excel reads all the data and places it in one cell.
(CSV with semicolon delimiter) Excel reads this one fine, but after I change a value and press save (CTRL+S), Excel saves it as an unreadable file: no delimiters, no string separators. So I tried saving it as (CSV format with SEMICOLON delimiter); the saved file looks OK, but opening it in Excel shows an error message: INCORRECT FORMAT - no cell found. Really?!
(.xls generated in PHP) This takes too much RAM (about 2 GB), so it can't be used.
Do you know of a good format that Excel can easily open and easily save?
Thanks a lot!
This question is off-topic, but IMHO Excel 2002/2003 XML Format would be the best choice in your circumstances.
The reason for this is that the data in this format is typed - so you will not see numbers misinterpreted as dates, or phone numbers with leading zeros stripped. I am not aware of the kind of problems you describe, so I cannot say for sure how those will be affected.
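As a rough illustration of that format, the sketch below writes a workbook row by row, marking each cell as either `String` or `Number`, so a phone number passed as a string keeps its leading zeros. The function names and sample rows are invented for the example. Because each row is written as soon as it is produced, memory use does not grow with the row count, which may help with the RAM problem, though that depends on how your export code is structured.

```python
import io
from xml.sax.saxutils import escape

HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?mso-application progid="Excel.Sheet"?>\n'
    '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"\n'
    ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">\n'
    ' <Worksheet ss:Name="Export">\n'
    '  <Table>\n'
)
FOOTER = '  </Table>\n </Worksheet>\n</Workbook>\n'

def cell_xml(value):
    """Render one cell, typed as Number for int/float and String otherwise."""
    # bool is a subclass of int, so keep it out of the Number branch
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        kind = "Number"
    else:
        kind = "String"
    return f'<Cell><Data ss:Type="{kind}">{escape(str(value))}</Data></Cell>'

def write_workbook(out, rows):
    """Stream rows to a text file object without building the document in memory."""
    out.write(HEADER)
    for row in rows:
        out.write("   <Row>" + "".join(cell_xml(v) for v in row) + "</Row>\n")
    out.write(FOOTER)

# Hypothetical rows; in a real export these would come from a database cursor
rows = [("name", "phone", "amount"), ("Ann", "00420123456", 12.5)]
buf = io.StringIO()
write_workbook(buf, rows)
print(buf.getvalue())
```

For a real export, `out` would be a file opened with `encoding="utf-8"` and saved with an .xml extension.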
We have used the CONCATENATE function in Excel to build a string of text. We are trying to combine around 10 fields with a total of 300 characters. The function works in Excel, and once we paste the values to remove the formula, the correct text strings are created.
However, the problem arises when we try to use this information as a CSV or tab-delimited file to import into our webstore: the concatenated text is not recognised. When we inspect the format of the cell, the characters are displayed as a bunch of ############ rather than the text, and I believe this is the reason why it is not allowing us to import the file. Small text strings work, but long strings do not. We have also tried using the OpenOffice Calc spreadsheet to concatenate, but this has the same problem. We have saved the file as UTF-8.
You shouldn't have a cell to inspect if you are saving as a csv. You should have a text file to examine. You also don't need to paste the formulas as values, as none of the formulas will remain after you save as a text file.
After you get the Excel file ready with the concatenate formulas, save the sheet as either a tab-delimited .txt or comma-delimited .csv.
Open it up in Notepad to verify that the values are still what you expect them to be rather than the ##### characters.
At that point, your import should work.
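If the file is too large to check by eye, the same verification could be scripted. This is only a sketch: the function name `suspicious_fields` and the sample text are made up, and it assumes a tab-delimited export.

```python
import csv
import io

def suspicious_fields(text, delimiter="\t"):
    """Return (row, column, value) for fields made up only of '#' characters."""
    hits = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for r, row in enumerate(reader, start=1):
        for c, value in enumerate(row, start=1):
            if value and set(value) == {"#"}:
                hits.append((r, c, value))
    return hits

# Hypothetical export: one long description and one field that went wrong
exported = "sku\tdescription\nA1\t" + "long text " * 30 + "\nA2\t########\n"
print(suspicious_fields(exported))  # [(3, 2, '########')]
```

An empty result means no field in the saved text consists solely of # characters, so the long strings made it into the file.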