import text file containing line breaks into excel - excel

I have a plain text file looking like this:
"some
text
containing
line
breaks"
I'm trying to talk Excel 2004 (Mac, v.11.5) into opening this file correctly. I'd expect to see only one cell (A1) containing all of the above (without the quotes)...
But alas, I can't make it happen, because Excel seems to insist on using the CRs as row delimiters, even if I set the text qualifier to double quote. I was sort of hoping that Excel would understand that those line breaks are part of the value, since they are embedded in double quotes, which should qualify them as part of the value. So my Excel sheet has 5 rows, which is not what I want.
I also tried this AppleScript to no avail:
tell application "Microsoft Excel"
activate
open text file filename ¬
"Users:maximiliantyrtania:Desktop:linebreaks" data type delimited ¬
text qualifier text qualifier double quote ¬
field info {{1, text format}} ¬
origin Macintosh with tab
end tell
If I could tell Excel to use a row delimiter other than CR (or LF), well, I'd be a happy camper, but Excel seems to allow changing only the field delimiter, not the row delimiter.
Any pointers?
Thanks,
Max
Excel's open

Looks like I just found the solution myself. I need to save the initial file as ".csv". Excel honors the line breaks properly with CSV files. Opening those via AppleScript works as well.
Thanks again to those who responded.
Max
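As a rough illustration of the .csv approach, here is a minimal Python sketch. Python is not part of the original question, and the function name rows_to_csv is hypothetical. It relies on the standard csv module quoting any field that contains a line break, which is the form Excel is reported above to honor when it opens a .csv file.

```python
import csv
import io


def rows_to_csv(rows):
    """Return CSV text; fields containing line breaks are wrapped in double quotes."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue()


# One row, one cell, with embedded line feeds (the example from the question).
text = rows_to_csv([["some\ntext\ncontaining\nline\nbreaks"]])
```

Writing this string to a file with a .csv extension and opening it (not importing it) should then give a single cell, going by the answer above.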

The other option is to create a macro to handle the opening. Open the file for input, and then read the text into the worksheet, parsing as you need, using a Range object.

If your file has columns separated by list separators (commas, or semicolons for some non-English region settings), rename it to .csv and open it in Excel.
If your file has columns separated by TABs, rename it to .tab and open it in Excel.
Importing (instead of opening) a csv or tab file does not seem to understand line feeds in between text delimiters. :-(

Is it just one file? If so, don't import it. Just copy and paste the content of your text file into the first cell (hit F2, then paste).
If you absolutely must script this, Excel actually uses only one of those two chars (CR, LF) as the row delimiter, but I'm not sure which. Try first stripping out the LFs with an external utility (leave the CRs) and then import it. If that doesn't work, strip out the CRs (leave the LFs) and then import it.
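The "external utility" could be as small as the following Python sketch, which is only an assumption about how one might do the stripping; the helper name strip_char is hypothetical. It works on bytes so that no newline translation happens behind the scenes.

```python
def strip_char(data: bytes, char: bytes) -> bytes:
    """Remove every occurrence of one line-ending byte, leaving the other intact."""
    return data.replace(char, b"")


sample = b'"some\r\ntext"\r\n'
without_lf = strip_char(sample, b"\n")  # only CRs remain
without_cr = strip_char(sample, b"\r")  # only LFs remain
```

Read the file with open(path, "rb"), write the result with open(path, "wb"), and try each variant in turn as the answer suggests.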

Related

Opening CSV file

I am generating CSV files. My first row contains the column names, and it looks like
User ID;First Name;Last Name;Email;...
But if I change User ID to ID, MS Office cannot open this CSV and shows me the error
Cannot read record(number of record)
But this file opens correctly in Notepad++. I am using Excel 2013. Any ideas what is wrong?
You can solve the problem by inserting the following simple text at the beginning (the first line) of your .csv file:
sep=;
This line will not be visible when the file is opened in Excel. It explicitly tells Excel that the delimiter is ;, so values are split into separate cells. You will also be able to use ID as the title of a column. Unfortunately, I cannot answer why Excel does not like it when you use this title at the beginning of the file.
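If the CSV is generated by a script, the hint line can be written before the header. The following Python sketch assumes the file is produced with the standard csv module; the function name csv_with_sep_hint is hypothetical. A likely explanation for the original error, not stated in the answer, is that Excel treats a file whose first two characters are an uppercase ID as a SYLK file; with the sep= line first, the file no longer starts with ID.

```python
import csv
import io


def csv_with_sep_hint(header, rows, delimiter=";"):
    """Return CSV text whose first line is the Excel-specific 'sep=' hint."""
    buf = io.StringIO(newline="")
    buf.write("sep=%s\r\n" % delimiter)  # read by Excel, not part of the data
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\r\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


text = csv_with_sep_hint(["ID", "Email"], [["1", "ann@example.com"]])
```

Note that the sep= line is an Excel convention; other CSV readers will see it as an ordinary first record.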

Carriage Return in Notepad

I have between 1-2 thousand notepad files that I need to add a new line to. I have an excel macro that can automatically find and replace text in notepad files, which I can use to add in the text I need. The excel macro has one cell where the user types the text to be found, and another where the user types the text that will replace that text. The problem is, I need to replace one line with two, and putting in a linebreak in the 'replace with' cell in excel (using alt-enter) does not put the text on a new line in notepad.
Interestingly, when I open the notepad file in Word, it does show up on a new line, with a carriage return between the two lines, but is still on the same line in notepad. Is there any way that I can use the excel macro to add the carriage return to show up in notepad?
Alt+Enter will only put a line feed into the string.
Notepad does not understand the "UNIX" style of line ending, but more advanced programs do.
If you replace the line feed with a full DOS newline, you should find your problem goes away:
NewString = Replace(OldString, vbLf, vbCrLf)
vbLf is the VBA constant for the line feed.
vbCrLf is the VBA constant for the DOS newline.
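The same normalization, sketched in Python for anyone doing the batch edit outside Excel (the function name to_crlf is hypothetical). One difference from the one-line Replace above: this version first collapses any existing CR+LF pairs, so text that already has DOS newlines does not end up with doubled carriage returns.

```python
def to_crlf(text: str) -> str:
    """Convert bare line feeds to CR+LF without doubling existing CR+LF pairs."""
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


fixed = to_crlf("first line\nsecond line\r\nthird line")
```

When writing the result to disk, open the file with newline="" so Python does not translate the line endings a second time.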

Import txt file with line breaks into Excel

While working on an export to Excel I discovered the following problem.
If you create a table where one cell has a line break and you save the document as a txt file, it will look like this:
"firstLine<LF>secondLine"<TAB>"secondColoumn"
When I open this file in Excel the line break is gone and the first row has only one cell with the value firstLine
Do you know if it is somehow possible to keep the line breaks?
EDIT: Applies to Excel 2010. Don't know if other versions behave differently.
EDIT2: Steps to reproduce:
Open blank excel sheet
Enter text (first column with line break, second column not important)
Save as Unicode Text (txt) // the other txt formats don't work either
Close Excel file
File->Open
No changes in the upcoming dialog.
The Excel file now has 2 rows, which is wrong.
I was finally able to solve the problem! yay :D
CSV:
The German Excel needs a semicolon as a separator. Comma doesn't work.
Note: This is only true when the file is encoded as UTF-8 with a BOM at the beginning of the file. If it's ASCII encoded, comma does work as a delimiter.
TXT:
The encoding has to be UTF-16LE. It also needs to be tab delimited.
Important:
The files will still be displayed incorrectly if you open them with the "File->Open" dialog and "import" them. Dragging them into Excel or opening them with a double click works.
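The two file variants described above could be generated along these lines. This is a Python sketch under stated assumptions: the function names are hypothetical, and prepending the UTF-16LE byte order mark is my assumption (the answer only says "UTF-16LE"), based on it being what Excel's own "Unicode Text" export writes.

```python
import codecs
import csv
import io


def semicolon_csv_utf8_bom(rows):
    """CSV variant: semicolon-separated, UTF-8 with a BOM at the start."""
    buf = io.StringIO(newline="")
    csv.writer(buf, delimiter=";", lineterminator="\r\n").writerows(rows)
    return codecs.BOM_UTF8 + buf.getvalue().encode("utf-8")


def tab_txt_utf16le(rows):
    """TXT variant: tab-delimited, every field quoted, UTF-16LE (BOM is an assumption)."""
    buf = io.StringIO(newline="")
    csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_ALL,
               lineterminator="\r\n").writerows(rows)
    return codecs.BOM_UTF16_LE + buf.getvalue().encode("utf-16-le")


rows = [["firstLine\nsecondLine", "secondColumn"]]
csv_bytes = semicolon_csv_utf8_bom(rows)
txt_bytes = tab_txt_utf16le(rows)
```

Both functions return bytes, so write them with open(path, "wb"). As noted above, the result still has to be opened by double click or drag and drop rather than through the import dialog.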
This isn't a problem in the sense that it is expected behaviour; it is inherent when you save text as Unicode or as Text (tab delimited).
If you save the file as Unicode and then either
Open it in Notepad
Import it in Excel
you will see that the cells with line breaks are surrounded by ""
The example below shows two line breaks
A1 has an entry separated using Alt+Enter
B1 has an entry using the formula CHAR(10)
The picture also shows what Notepad sees on a saved Unicode version
Suggested Workaround 1- Manual Method
In Excel, choose Edit>Replace
Click in the Find What box
Hold the Alt key, and (on the number keypad), type 0010
Replace with a double pipe delimiter
Save as Unicode
Then reverse the process when needed to reinsert the linebreaks
This can be done easily in VBA
Suggested Workaround 2 - VBA alternative
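Const strDelim = "||"
' Replace line breaks with the pipe delimiter before saving as Unicode
Sub LBtoPIPE()
    ' Literal line feeds entered with Alt+Enter
    ActiveSheet.UsedRange.Replace Chr(10), strDelim, xlPart
    ' The text CHAR(10) inside formulas
    ActiveSheet.UsedRange.Replace "CHAR(10)", strDelim, xlPart
End Sub
' Reverse the process to reinsert the line breaks
Sub PIPEtoLB()
    ActiveSheet.UsedRange.Replace strDelim, Chr(10), xlPart
    ActiveSheet.UsedRange.Replace strDelim, "CHAR(10)", xlPart
End Sub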

Importing txt files to excel makes linebreaks disappear

I am trying to import a text file into Excel (2007). The file was exported from a C# text box and it contains line breaks. However, when I import it (with the Text Import Wizard that comes with Excel), the line breaks disappear completely. I would prefer not to have to write VBA code and place it in an Excel file to run, but instead fix this with a neat method in C#, before it turns the text box data into a txt file. Is this possible in any way?
I figured this out. If you put quotes around text, any embedded line feeds (ASCII 010) will be imported into Excel as embedded line feeds. In other words, these line feeds will not cause the text to split across Excel rows.
Try it. Create two files in Notepad.exe. In the first terminate the first line by pressing Alt-0010:
Test line 1 terminated with alt-0010
Test line 2
In the second, begin lines with " and terminate with ". For the first line insert an Alt-0010 just before the ":
"Test line 1 terminated with alt-0010 prior to the quote"
"Test line 2"
Now import both into Excel and see the difference.
See IETF RFC 4180 for more information
In Excel, a line break within a cell is encoded as ASCII code 10 (i.e. \n) (determined through the handy use of the macro recorder and inspection of the generated VBA). I think the 'disappearance' of new lines is probably a result of your C# emitting \r\n, so you might try doing a global replacement of "\r" with "" in your C# code before outputting it.
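The two suggestions above (quote the text, and leave only line feeds inside it) can be combined. The sketch below is in Python rather than the asker's C#, purely to show the string handling; the function name quote_field is hypothetical. Doubling embedded quotes follows RFC 4180.

```python
def quote_field(value: str) -> str:
    """Quote one field: line breaks become bare LF, embedded quotes are doubled."""
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    return '"' + value.replace('"', '""') + '"'


field = quote_field('line 1\r\nsaid "hi"')
```

The same three replacements translate directly to String.Replace calls in C#.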
Thanks for the help! I tried starting the text with \" and finishing with \" but when I go to Excel, I get each line in a separate cell, and my hopeful plan was to get all the text in one single Excel cell.

Excel CSV - Number cell format

I produce a report as a CSV file.
When I try to open the file in Excel, it makes an assumption about the data type based on the contents of the cell, and reformats it accordingly.
For example, if the CSV file contains
...,005,...
Then Excel shows it as 5.
Is there a way to override this and display 005?
I would prefer to do something to the file itself, so that the user could just double-click on the CSV file to open it.
I use Excel 2003.
There isn’t an easy way to control the formatting Excel applies when opening a .csv file. However listed below are three approaches that might help.
My preference is the first option.
Option 1 – Change the data in the file
You could change the data in the .csv file as follows ...,="005",...
This will be displayed in Excel as ...,005,...
Excel will have kept the data as a formula, but copying the column and using Paste Special > Values will get rid of the formula while retaining the formatting.
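When the report is generated by code, Option 1 amounts to wrapping the affected values before writing. A Python sketch, with hypothetical function names; note that the csv module will additionally quote the wrapped value (because it contains double quotes), giving "=""005""" on disk. My understanding is that Excel unquotes the field first and so still sees the formula ="005", but treat that as an assumption to verify with your Excel version.

```python
import csv
import io


def as_text_formula(value: str) -> str:
    """Wrap a value as the Excel formula ="value" so it is kept as text."""
    return '="' + value.replace('"', '""') + '"'


def write_row(fields):
    """Return one CSV record; the csv module handles the outer quoting."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\r\n").writerow(fields)
    return buf.getvalue()


line = write_row(["a", as_text_formula("005"), "b"])
```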
Option 2 – Format the data
If it is simply a format issue and all the data in that column is three digits long, open the data in Excel and format the column containing the data with the custom format 000
Option 3 – Change the file extension to .dif (Data interchange format)
Change the file extension and use the file import wizard to control the formats.
Files with a .dif extension are automatically opened by Excel when double clicked on.
Step by step:
Change the file extension from .csv to .dif
Double click on the file to open it in Excel.
The 'File Import Wizard' will be launched.
Set the 'File type' to 'Delimited' and click on the 'Next' button.
Under Delimiters, tick 'Comma' and click on the 'Next' button.
Click on each column of your data that is displayed and select a 'Column data format'. The column with the value '005' should be formatted as 'Text'.
Click on the finish button, the file will be opened by Excel with the formats that you have specified.
Don't use CSV, use SYLK.
http://en.wikipedia.org/wiki/SYmbolic_LinK_(SYLK)
It gives much more control over formatting, and Excel won't try to guess the type of a field by examining the contents. It looks a bit complicated, but you can get away with using a very small subset.
This works for Microsoft Office 2010, Excel Version 14
I misread the OP's preference "to do something to the file itself." I'm still keeping this for those who want a solution to format the import directly
Open a blank (new) file (File -> New from workbook)
Open the Import Wizard (Data -> From Text)
Select your .csv file and Import
In the dialogue box, choose 'Delimited', and click Next.
Choose your delimiters (uncheck everything but 'comma'), choose your Text qualifiers (likely {None}), click Next
In the Data preview field select the column you want to be text. It should highlight.
In the Column data format field, select 'Text'.
Click Finish.
You can simply format your range as Text.
Also here is a nice article on the number formats and how you can program them.
Actually I discovered that, at least starting with Office 2003, you can save an Excel spreadsheet as an XML file.
Thus, I can produce an XML file and when I double-click on it, it'll be opened in Excel.
It provides the same level of control as SYLK, but XML syntax is more intuitive.
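For reference, a very small file in that XML format (the "XML Spreadsheet 2003" format) can be produced as in the Python sketch below. The function name spreadsheet_xml is hypothetical, and only a tiny subset of the format is shown: one worksheet whose cells are all typed as String, which is what keeps 005 from being turned into a number.

```python
from xml.sax.saxutils import escape

HEADER = (
    '<?xml version="1.0"?>\n'
    '<?mso-application progid="Excel.Sheet"?>\n'
    '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"\n'
    '          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">\n'
    ' <Worksheet ss:Name="Sheet1">\n'
    '  <Table>\n'
)
FOOTER = '  </Table>\n </Worksheet>\n</Workbook>\n'


def spreadsheet_xml(rows):
    """Build a one-sheet workbook in which every cell is explicitly text."""
    parts = [HEADER]
    for row in rows:
        parts.append('   <Row>\n')
        for value in row:
            parts.append('    <Cell><Data ss:Type="String">%s</Data></Cell>\n'
                         % escape(value))
        parts.append('   </Row>\n')
    parts.append(FOOTER)
    return "".join(parts)


doc = spreadsheet_xml([["005", "a & b"]])
```

Saving doc with an .xml extension should let it open in Excel on double click, as described above; numeric cells would use ss:Type="Number" instead.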
Adding a non-breaking space in the cell could help.
For instance:
"firstvalue";"secondvalue";"005 ";"othervalue"
It forces Excel to treat it as a text and the space is not visible.
On Windows you can add a non-breaking space by typing Alt+0160.
See here for more info: http://en.wikipedia.org/wiki/Non-breaking_space
Tried on Excel 2010.
Hope this can help people who are still searching for a proper solution to this problem.
I had this issue when exporting CSV data from C# code, and resolved this by prepending the leading zero data with the tab character \t, so the data was interpreted as text rather than numeric in Excel (yet unlike prepending other characters, it wouldn't be seen).
I did like the ="001" approach, but this wouldn't allow exported CSV data to be re-imported again to my C# application without removing all this formatting from the import CSV file (instead I'll just trim the import data).
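The export/re-import round trip described here could look like the following. This is a Python sketch of the idea rather than the original C# code; the function names and the rule for deciding which values to protect (all digits with a leading zero) are assumptions.

```python
def protect_leading_zeros(value: str) -> str:
    """Prefix a tab to digit strings with a leading zero so Excel keeps them as text."""
    if len(value) > 1 and value.isdigit() and value.startswith("0"):
        return "\t" + value
    return value


def unprotect(value: str) -> str:
    """Undo the export-side prefix when reading the CSV back in."""
    return value.lstrip("\t")


exported = [protect_leading_zeros(v) for v in ["005", "5", "abc"]]
restored = [unprotect(v) for v in exported]
```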
I believe when you import the file you can select the Column Type. Make it Text instead of Number. I don't have a copy in front of me at the moment to check though.
Load csv into oleDB and force all inferred datatypes to string
I asked the same question and then answered it with code.
Basically, when the CSV file is loaded, the OLE DB driver makes assumptions; you can tell it what assumptions to make.
My code forces all datatypes to string, though it's very easy to change the schema.
For my purposes I used an XSLT to get it the way I wanted, but I am parsing a wide variety of files.
I know this is an old question, but I have a solution that isn't listed here.
When you produce the CSV, add a space after the comma but before your value, e.g. , 005,.
This worked to prevent auto date formatting in Excel 2007, anyway.
The Text Import Wizard method does NOT work when the CSV file being imported has line breaks within a cell. This method handles this scenario (at least with tab delimited data):
Create new Excel file
Ctrl+A to select all cells
In Number Format combobox, select Text
Open tab delimited file in text editor
Select all, copy and paste into Excel
Just add ' before the number in the CSV doc.
This has been driving me crazy all day (since indeed you can't control the Excel column types before opening the CSV file), and this worked for me, using VB.NET and Excel Interop:
'Convert .csv file to .txt file.
FileName = ConvertToText(FileName)
Dim ColumnTypes(,) As Integer = New Integer(,) {{1, xlTextFormat}, _
                                                {2, xlTextFormat}, _
                                                {3, xlGeneralFormat}, _
                                                {4, xlGeneralFormat}, _
                                                {5, xlGeneralFormat}, _
                                                {6, xlGeneralFormat}}
'We are using OpenText() in order to specify the column types.
mxlApp.Workbooks.OpenText(FileName, , , Excel.XlTextParsingType.xlDelimited, , , True, , True, , , , ColumnTypes)
mxlWorkBook = mxlApp.ActiveWorkbook
mxlWorkSheet = CType(mxlApp.ActiveSheet, Excel.Worksheet)
Private Function ConvertToText(ByVal FileName As String) As String
    'Convert the .csv file to a .txt file.
    'If the file is a text file, we can specify the column types.
    'Otherwise, the Codes are first converted to numbers, which loses trailing zeros.
    Try
        Dim MyReader As New StreamReader(FileName)
        Dim NewFileName As String = FileName.Replace(".CSV", ".TXT")
        Dim MyWriter As New StreamWriter(NewFileName, False)
        Dim strLine As String
        Do While Not MyReader.EndOfStream
            strLine = MyReader.ReadLine
            MyWriter.WriteLine(strLine)
        Loop
        MyReader.Close()
        MyReader.Dispose()
        MyWriter.Close()
        MyWriter.Dispose()
        Return NewFileName
    Catch ex As Exception
        MsgBox(ex.Message)
        Return ""
    End Try
End Function
When opening a CSV, you get the text import wizard. At the last step of the wizard, you should be able to import the specific column as text, thereby retaining the '00' prefix. After that you can then format the cell any way that you want.
I tried with Excel 2007 and it appeared to work.
Well, excel never pops up the wizard for CSV files. If you rename it to .txt, you'll see the wizard when you do a File>Open in Excel the next time.
Put a single quote before the field. Excel will treat it as text, even if it looks like a number.
...,'005,...
EDIT: This is wrong. The apostrophe trick only works when entering data directly into Excel. When you use it in a CSV file, the apostrophe appears in the field, which you don't want.
http://support.microsoft.com/kb/214233
