is a CSV with equal sign valid? - excel

I just got a CSV input file to be processed, which has an equal-sign before the first delimiting quote, and wondered if this is valid and has any purpose. Example (simplified):
"2"
"3"
="4"
After reading some postings like this one I experimented with a CSV like this:
"2"
"3"
="A1+A2"
and:
"2"
"3"
"=A1+A2"
It seems that both Excel and LibreOffice silently ignore the equal-sign before the quote, and nicely treat the equal-sign after the quote as the flag for a formula. However, I could not find any documentation about this.
(For Excel, this CSV needs to be saved with the .txt extension, and opened with control-O)
I am inclined to treat an equal-sign before the opening quote as an error that is easy to handle when reading the file, but I am still wondering whether there is more to say about this.

This is used by Excel to avoid the loss of leading zeros.
For example, if you have a field in your csv file like this: 0123456, Excel will treat it as a number and lose the leading zero.
Saving it as ="0123456" solves this problem.
Using "0123456" won't help either, because quotes are not there to indicate a text field, but to escape possible delimiters inside fields.
Just like having sep=; on the first line to make Excel use the right separator, the ="" is also 'non-standard', or better: Excel-specific, because there is no real standard for csv files.
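As a rough illustration of the ="..." convention described above, here is a minimal Python sketch that wraps a value so Excel should keep it as text. The helper names excel_text_literal and to_csv_line are hypothetical, and the naive join assumes that no field contains a comma or a line break.

```python
def excel_text_literal(value: str) -> str:
    """Wrap value as ="..." (the Excel-specific text-formula form)."""
    # Embedded double quotes are doubled, as in an Excel string literal.
    return '="' + value.replace('"', '""') + '"'


def to_csv_line(fields):
    """Join already-prepared fields (assumes no commas or line breaks inside)."""
    return ",".join(fields)


line = to_csv_line(["id", excel_text_literal("0123456")])
print(line)  # id,="0123456"
```

Other CSV consumers will see the literal characters ="0123456", so this form is only worth using when Excel is the intended reader.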

Excel isn't ignoring = in ="4" or ="A1 + A2", it is treating it as a constant formula.
If you open the csv file that looks like:
"2"
"3"
="4"
="A1+A2"
"=A1+A2"
in Excel the result looks like:
Note how A3 holds the formula ="4" rather than just the number 4.

There is no official standard for CSV. As it says at Comma-separated values,
An official standard for the CSV file format does not exist, but RFC 4180 provides a de facto standard for many aspects of it.
Looking at the RFC 4180, a field is either escaped or non-escaped. The escaped field has a BNF defined like this:
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
Since the equals sign is not a part of the escaped characters, it may be like the "Free Parking" in Monopoly: The rules say nothing regarding it, but the de facto standard is to place $500 under it.
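Because the leading = falls outside the RFC 4180 grammar, a reader has to decide for itself what to do with it. The following is a minimal sketch, assuming Python's csv module and a hypothetical helper strip_excel_wrapper, of how a program could normalise such fields on input. It only covers wrapped values without embedded delimiters.

```python
import csv
import io
import re

# Matches a whole field of the form ="..." (Excel-specific wrapper).
EXCEL_TEXT = re.compile(r'^="(.*)"$', re.DOTALL)


def strip_excel_wrapper(field: str) -> str:
    """Return the inner text of ="...", or the field unchanged."""
    match = EXCEL_TEXT.match(field)
    if match is None:
        return field
    # Undo the doubling of embedded quotes.
    return match.group(1).replace('""', '"')


data = '"2"\n"3"\n="4"\n'
rows = [
    [strip_excel_wrapper(field) for field in row]
    for row in csv.reader(io.StringIO(data))
]
print(rows)  # [['2'], ['3'], ['4']]
```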

If you have a .csv file with the following contents:
"2"
"3"
="4"
and open it in Excel, you will see:
As you see, Excel discards the double quotes on the first two items and converts the third item into a formula.
That is how Excel functions.
If you want to get the exact text into Excel (retaining the double quotes) you could use a macro.

Related

how to deal with trailing blanks after importing from excel to sas

I can find several topics and solutions on importing from excel to SAS and how to deal with variable names containing blanks or spaces.
However, in my situation, some of the variable values contain spaces at the end, and after importing I can see the trailing blanks, but compress does not remove them.
I'm thinking they're some other type of character. I've tried some modifiers on the compress function, but cannot seem to make it recognize these spaces.
Because I'm often creating different excel files, I would prefer not having to remove the blanks manually. Is there an option to the proc import step I should add, or is there a modifier I can provide to the compress function to solve this?
I'm using the following basic code to import:
proc import out = METADATA
datafile = "&mdata\meta_data.xlsx"
DBMS = Excel replace;
SHEET = "Sheet1";
GETNAMES = YES;
run;
EDIT (after implementing instructions from comments):
I don't really know what my component of SAS is called - I started working with SAS recently.
I'm using some kind of editor, with a VIEWTABLE window. When looking at my dataset this way, I can select (as in highlight) the variable values. One of my values has a trailing whitespace - I can highlight a finite space beyond the string, which I can't for the other variables. And I know the space is there because I have put it there in excel as well.
The length of my variable is 8, and setting the format to $HEX128 shows:
DOSE 444F534520202020
DOSE2 444F534532A02020
DOSE2 contains the blank space so it's actually 'DOSE2 ' in excel and in the VIEWTABLE.
When converting from string to hex I believe '2' is converted to 32.
That means the whitespace is converted to 'A0' instead of '20'.
Just as a reference for other people searching on these keywords or this topic:
After importing from excel where your values contain spaces, you might end up with a special kind of whitespace: these are non-breaking spaces.
You can find out by setting the format to $HEX128.: a non-breaking space shows up as A0 instead of the 20 used for a regular space.
If you want to remove these, you can use var = compress(var, 'A0'x);
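The hex dump above can be reproduced outside SAS. This is only an illustrative Python sketch, assuming a single-byte Latin-1 encoding in which the non-breaking space is the byte A0. The variable names are made up for the example.

```python
# "DOSE2" followed by a non-breaking space (U+00A0) and two padding blanks,
# mirroring the 8-character value shown in the hex dump.
value = "DOSE2\u00a0  "

# In Latin-1 the non-breaking space encodes to the single byte A0.
hex_dump = value.encode("latin-1").hex().upper()
print(hex_dump)  # 444F534532A02020

# Remove the non-breaking space explicitly, then trim ordinary blanks.
cleaned = value.replace("\u00a0", "").rstrip(" ")
print(repr(cleaned))  # 'DOSE2'
```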

#Name error in excel sheet

I have a csv file which is filled automatically by a Java program. One line contains the following text when I open the file in Notepad++:
-LRB- from the PMI Practice Standard for Work Breakdown Structures , Oct 2000 -RRB- '',"no","f1_FRAG:1.0","f2_specialChar:1.0","f3:15.0","f4:7.0","f5:0.0","f6:2.0","f7:0.0","f8:3.7612001156935624","f9:7.0","f10:1.0","f11:1.0","f12:0.0","f13:0.0","f14:0.0,"f15_ROOT:1.0","f16_specialChar:1.0","f17_NOTHING:1.0","f18_IN:1.0""
But when I open it in excel sheet, there are two problems:
1) When I click on the cell, I see a #Name error, and any click on the page causes an error. I can't even close the Excel window normally. I also sometimes see something like =A228 or =B223 when I click on the cell. It seems to be read as a formula, but it actually isn't one.
2) The row is not shown completely. I can't see this part when I open the file using office excel:
",f15_ROOT:1.0","f16_specialChar:1.0","f17_NOTHING:1.0","f18_IN:1.0"".
Any help is appreciated.
Since the row starts with a - (minus sign), Excel is expecting a formula.
Manually, you could either:
add an ' (apostrophe) at the beginning of the line (which tells Excel that the cell contains text), or
Format the cell as text : Right-click the cell → Format Cells → Number tab → Text
Ideally, to prevent this issue in the future, the Java program which generates the .CSV file should be changed to enclose text fields with " double quotation marks.
Oddly, that is the only field in your example that isn't surrounded by double quotes.
"-LRB- from the PMI Practice Standard for Work Breakdown Structures , Oct 2000 -RRB- ''","no","f1_FRAG:1.0","f2_specialChar:1.0","f3:15.0","f4:7.0","f5:0.0","f6:2.0","f7:0.0","f8:3.7612001156935624","f9:7.0","f10:1.0","f11:1.0","f12:0.0","f13:0.0","f14:0.0,"f15_ROOT:1.0","f16_specialChar:1.0","f17_NOTHING:1.0","f18_IN:1.0""
At a minimum, double quotes should be used around any fields that begin with a symbol or contain a comma (like above).
1997,Ford,E350,"Super, luxurious truck"
The double quotes will be recognized and removed by most apps that open CSVs.
Any field may be quoted (that is, enclosed within double-quote characters). Some fields must be quoted, as specified in following rules.
"1997","Ford","E350"
Fields with embedded commas or double-quote characters must be quoted.
1997,Ford,E350,"Super, luxurious truck"
Each of the embedded double-quote characters must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super, ""luxurious"" truck"
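The quoting rules quoted above are what most CSV libraries apply automatically. As a small sketch, assuming Python's standard csv module (not the Java program from the question), the default writer quotes only the fields that need it, and csv.QUOTE_ALL quotes every field:

```python
import csv
import io

# Default (minimal) quoting: only fields with commas or quotes get quoted,
# and embedded double quotes are doubled.
minimal = io.StringIO()
csv.writer(minimal).writerow(["1997", "Ford", "E350", 'Super, "luxurious" truck'])
minimal_line = minimal.getvalue().rstrip("\r\n")
print(minimal_line)  # 1997,Ford,E350,"Super, ""luxurious"" truck"

# QUOTE_ALL: every field is enclosed in double quotes.
quoted = io.StringIO()
csv.writer(quoted, quoting=csv.QUOTE_ALL).writerow(["1997", "Ford", "E350"])
quoted_line = quoted.getvalue().rstrip("\r\n")
print(quoted_line)  # "1997","Ford","E350"

# Reading the line back recovers the original field values.
parsed = next(csv.reader(io.StringIO(minimal_line)))
```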
More about Comma Separated Value files:
Wikipedia: CSV Files - Basic Rules
RFC 2046 Standard
RFC4180 Standard
Surprisingly, I can't find any reference document from Microsoft that mentions starting text cells with an apostrophe. (I guess it's a secret, so if anyone asks, you didn't hear it from me.) :-)
The reason you are getting the #NAME error specifically is because Excel figures you're trying to enter a formula (because of the minus sign) but it doesn't recognize the Name of the function ("LRB")

How to replace wildcharacter in CSV

I have the following string in CSV files:
Part Number WP1166496 (AP6005317) replaces 1166496, 1156976.
Expected output:
Part Number WP1166496 replaces 1166496, 1156976.
I want to replace (AP6005317) with a blank.
There are many rows with different values, so how can I replace every bracketed string like this with a blank?
I don't know how to achieve this in Microsoft Excel.
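If you open the Find and Replace feature, you will most probably see an option to use regular expressions.
Enable the regular expressions option and replace \(.*\) with a single space. This will solve your problem. Note that .* is greedy, so a cell with more than one bracketed group would lose everything between the first ( and the last ).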
Note : This is tested and verified in LibreOffice Calc.
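The same replacement can be sketched in Python for anyone processing the CSV outside a spreadsheet. The pattern below is a variation on the one in the answer, not the identical expression: it uses [^)]* instead of .* so that each bracketed group is matched separately, and it also swallows the whitespace in front of the bracket so the result matches the expected output exactly.

```python
import re

# Optional whitespace, then "(", anything except ")", then ")".
PARENTHESISED = re.compile(r"\s*\([^)]*\)")

text = "Part Number WP1166496 (AP6005317) replaces 1166496, 1156976."
result = PARENTHESISED.sub("", text)
print(result)  # Part Number WP1166496 replaces 1166496, 1156976.
```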

Converting xlsx into csv

I'm wondering if there is a way to convert an .xlsx file into .csv while preserving everything in its entirety.
I have a column that for some rows has values like 0738794E5 and when I convert it through "save as", the value turns to 7.39E+10. I understand that some values which have an "E" will be turned to the latter format but this conversion is no use to me since that "E" doesn't stand for exponentiation.
Is there a setting to preserve the values the way they are i.e. text/string?
One option is to create an additional (or replacement) column that has the target values either enclosed in double quotes or prepended by an alpha character.
The quotes or alpha character will guarantee that the problem values come in as text. When the csv file is opened, the quotes or alpha will still be there, so you would need to use a string operation (MID or RIGHT, probably) to recover the original string values.
My dilemma wasn't real and only appeared to be so.
When I convert the .xlsx into .csv and open the .csv, it shows the improperly-converted values.
However, when I run my application, read from the csv, and output what's been read, I get the values contained within the .xlsx just like I wanted.
I'm not sure how/why this is the way it is but it works now.
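One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the CSV file itself still holds the original text and only Excel's display reinterprets it as a number. A short Python sketch with made-up sample data shows the difference between reading the raw field and converting it to a number:

```python
import csv
import io

# Hypothetical CSV content holding the code exactly as typed.
csv_text = "part_code\n0738794E5\n"

rows = list(csv.reader(io.StringIO(csv_text)))
raw = rows[1][0]
print(raw)  # 0738794E5  (csv.reader returns plain strings)

# Only an explicit numeric conversion turns the "E" into an exponent.
as_number = float(raw)
print(f"{as_number:.2E}")  # 7.39E+10
```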

How to prevent Excel from handling strings containing a colon as formulas

I am generating csv files, and some cells have the format nn:nnnn , i.e. digits separated by a colon. It's not a time format nor a date format, it's just text cells and I don't want them to be re-formatted at all.
I've added some logic to my code in order to identify if it looks like a legal time format or a date, and if so, I wrap that string like this ="nn:nnnn". But I'm not interested in adding those characters to all the cells.
It almost solved my problem, but there are still some cases such as 07:1155 that MS Excel insists on translating to 1.09375. Other cells such as 68:0062 remain intact. Is there a way to recognize which strings are going to be calculated or translated?
Is there any workaround such as any set-up in MS Excel to tell it not to perform any translation on these kind of text?
Instead of just opening your CSV in Excel, you might try doing (Menus) Data/Get External Data/From Text. Or if you're using VBA, that would be something like:
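With ActiveSheet.QueryTables.Add(Connection:="TEXT;H:\csvtest.csv", _
        Destination:=Range("$A$3"))
    .Name = "csvtest_1"
    .FieldNames = True
    .TextFileParseType = xlDelimited
    .TextFileCommaDelimiter = True
    ' Import the first column as text; extend the array for further columns
    .TextFileColumnDataTypes = Array(xlTextFormat)
    ' ... any other settings you need
    .Refresh BackgroundQuery:=False
End With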
You may need to specify Text as the incoming column format.
I've got the following answer from Mr. JP Ronse at the Microsoft Community forum
Try to precede a string like 07:1155 with a single quote.
A single quote prevents Excel from interpreting the value.
For some reason Excel interprets a string like 07:1155 as a time and translates it to the value.
Excel sees 07:1155 as 7 hours and 1155 minutes, translated to values:
07:00 => 0.291666666666667
1155 minutes => (1155/60)/24 => 0.802083333333333
The sum is 1.09375
It looks as though there is no translation on values like n:00nn or like n:0nnnnn
Checking on the 2 n's after the colon (not 00) could be a workaround.
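The arithmetic above can be checked with a few lines of Python. This sketch only reproduces the calculation described in the answer (hours plus minutes as a fraction of a day). The function name excel_time_serial is hypothetical, and it does not predict which strings Excel will actually convert.

```python
def excel_time_serial(hours: int, minutes: int) -> float:
    """Fraction of a day for hours:minutes, allowing minutes above 59."""
    return (hours * 60 + minutes) / (24 * 60)


serial = excel_time_serial(7, 1155)
print(serial)  # 1.09375
```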
