Converting xlsx into csv - excel

I'm wondering if there is a way to convert an .xlsx file into .csv while preserving everything in its entirety.
I have a column that for some rows has values like 0738794E5 and when I convert it through "save as", the value turns to 7.39E+10. I understand that some values which have an "E" will be turned to the latter format but this conversion is no use to me since that "E" doesn't stand for exponentiation.
Is there a setting to preserve the values the way they are i.e. text/string?

One option is to create an additional (or replacement) column that has the target values either enclosed in double quotes or prepended by an alpha character.
The quotes or alpha character will guarantee that the problem values come in as text. When the csv file is opened, the quotes or alpha will still be there, so you would need to use a string operation (MID or RIGHT, probably) to recover the original string values.

My dilemma wasn't real and only appeared to be so.
When I convert the .xlsx into .csv and open the .csv, it shows the improperly-converted values.
However, when I run my application, read from the csv, and output what's been read, I get the values contained within the .xlsx just like I wanted.
I'm not sure how/why this is the way it is but it works now.
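A likely explanation is that the .csv file itself still holds the original text, and it is Excel that reinterprets the value when it opens the file. The sketch below illustrates the difference between reading a field as text and parsing it as a number. The sample content and the name read_rows are assumptions made up for illustration, not part of the original application.

```python
import csv
import io

# Hypothetical CSV content, as it might be written when the column
# holds text values in the workbook.
SAMPLE_CSV = "id,code\n1,0738794E5\n2,0012345\n"


def read_rows(text):
    """Return each CSV row as a list of strings; nothing is converted to a number."""
    return list(csv.reader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
as_text = rows[1][1]        # the field exactly as stored in the file
as_number = float(as_text)  # what a numeric parse makes of the same characters

print(as_text)
print(as_number)
```

A reader that keeps fields as strings sees 0738794E5 unchanged, while a numeric parse treats the E as an exponent, which is what the spreadsheet view appears to do.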

Related

Decimal number in string-data-type with large amount of decimals always interpreted as large integer (regional decimal separator issue)

Background: I'm receiving data for my Excel application from an API in JSON format. The numerical values arrive as strings, since everything sent in JSON is text, and VBA interprets them as such. I'm located in Denmark, so my version of Excel uses a different decimal separator than the native one (my Danish version uses , as the separator rather than .).
Case:
This is causing quite a bit of trouble, as Excel interprets the . in the string as a thousands separator when converting the string to a number.
Searching for answers I've found that the best solution, normally, is to convert the string to double when using VBA, utilizing CDbl(string to convert to number).
This usually is the case, but in my case I'm receiving a number with a lot of decimals such as: "9.300000190734863".
When doing a CDbl("9.300000190734863") this results in a very large integer: 9,30000019073486E+15
Also, I don't think utilizing a replace() approach is feasible in my case as I might also have data that uses both decimal- and thousand separators at the same time, making my results prone to replacement errors.
However, inserting the string value directly into a cell within Excel converts the number correctly to 9,30000019073486 in my case.
Question: Can it be right that there's no way to mimic, or tap into, this functionality that Excel obviously is using when inserting the string into a cell?
I've searched for quite some time now, and I haven't found any solution other than the obvious: inserting the value into a cell. The problem here is that it's giving me some performance overhead which I would rather avoid.
You can swap the positions of the periods and commas in your input prior to casting as a double, in three steps:
Replace commas with 'X' (or some other value that won't appear in your data)
Replace periods with commas
Replace 'X' with periods
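The three steps above can be sketched as follows. This is written in Python rather than VBA purely for illustration; the function name swap_separators and the guard against the placeholder already being present are assumptions added here, not part of the answer.

```python
def swap_separators(text, placeholder="X"):
    """Swap periods and commas in a numeric string using a temporary placeholder."""
    if placeholder in text:
        # The placeholder must not already occur in the data,
        # otherwise the final step would corrupt the value.
        raise ValueError("placeholder already present in input")
    step1 = text.replace(",", placeholder)  # commas -> placeholder
    step2 = step1.replace(".", ",")         # periods -> commas
    return step2.replace(placeholder, ".")  # placeholder -> periods


print(swap_separators("9.300000190734863"))
print(swap_separators("1,234.56"))
```

Because the swap is symmetric, a string that uses both separators keeps them distinct, which addresses the concern about a plain single replace.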

Importing data with invalid characters in numeric columns

DATA STRUCTURE: I have a data set that can be read as an Excel or CSV file. It has the following variable types: dates, time, numeric variables, and what should be numeric variables that incorrectly have characters attached to a number - e.g. -0.011* and 0.023954029324) (the parenthesis at the end is in the cell) - due to an error in the program that wrote the file. There are also blank lines between every record, and it is not realistic to delete all of these as I have hundreds of files to manage.
DATA ISSUE: We've determined that some values are correct up to the character (i.e. -0.011 is correct as long as the asterisk is removed), while other values, such as 0.023954029324), are incorrect altogether and should be made missing. Please don't comment on this issue as it is out of my control and at this point all I can do is manage the data until the error is fixed and character values stop being written into the files.
PROBLEM WITH SAS:
1) If I use PROC IMPORT with an Excel file, SAS uses the first eight lines (20 for a CSV file) to determine if a variable is numeric or character. If the asterisk or parenthesis doesn't occur within the first 20 lines, SAS says the variable is numeric, then makes any later cells w/ character values missing. This is NOT okay in the case of the asterisks, because I want to maintain the numeric portion of the value and remove the asterisk in a later data step. Importing Excel files with PROC IMPORT does not allow the GUESSINGROWS option (as it does w/ CSV files, see below). Edit: Also, the MIXED=YES option does NOT work (see comments below - still need to change number of rows SAS uses, which, to me, means this option does...what?).
2) If I use PROC IMPORT with a CSV file, I can specify GUESSINGROWS=32767 and I get really excited because it then determines the variables with the asterisks are character and maintains the asterisks. However, it very strangely no longer determines the variables with parentheses as character (as it would have done when importing an Excel file as long as the parenthesis was in the first 20 lines), but instead removes the character and additionally rounds the value to the nearest whole number (0.1435980234 becomes 0, 1.82149023843 becomes 2, etc.). This is way too coarse of rounding - I need to maintain the decimal places. And, on top of that, the parentheses are now gone so I can't make the appropriate cells missing. I do not know if there is a way to make SAS not round and/or maintain the parentheses. To me, this is inconsistent behavior - why is the asterisk but not a parenthesis considered a character in this case? Also, when I read in the Excel file w/ PROC IMPORT (as described in (1)), it can cope w/ the parentheses (if they appear in the first 20 lines) - another inconsistency.
3) If I use INFILE, well - I get an error w/ every variable I try to read in - this procedure is way too sensitive and unstable for how varying the data are (and I have to code a work-around for the blank data lines).
ULTIMATE GOAL (note this code will be run automatically within a macro, if that matters):
1) Read date variable as a date
2) Read time variable as time
3) Be able to identify a variable w/ characters present in any cell of that variable (even after 20 lines) as a character variable and maintain the values in the cells (i.e. don't round/delete character). This can be by a priori telling SAS to let a certain set of variables be character (I will change them to numeric after I get rid of characters/make cells missing), or by SAS identifying variables w/ characters on its own.
SAS actually by default uses the first 8 rows. That is defined in a registry setting, TYPEGUESSROWS - which is normally stored in HKLM\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows\ (or insert-your-office-version-there). Change that value to FFFF (hex)/65535 (decimal) or some other large number, or zero to search the maximum number of rows (a bit over 16000 - exact number is hard to find).
For a CSV file, you can write a data step import to control the formats of each variable. The easiest way to see this is to run the PROC IMPORT, then check your log; the log will contain the complete code used to read in the file in a data step. Then just modify the informats as needed. You say you have too much trouble with Infile method, so perhaps this won't work for you, but typically you can work around any inconsistencies - and if your files are THAT inconsistent it sounds like you'll be doing a ton of manual work anyway. This gives you the options to read in date/time variables correctly as well.
You can also run PROC IMPORT on the CSV, write the log out to a file, then read THAT in and generate new import code on your own - or even build it off a PROC CONTENTS of the generated dataset, making known modifications.
Not sure what you're asking about with date/time as you don't mention issues with it in the first part of your question.
One additional option you have is to clean out the characters before it's read in (from the CSV). This is pretty simple, if it's truly just numerics and commas (and decimals and negative signs):
data mydata;
infile myfile /*options*/;
input; *reads the next line into _infile_;
length infileline $32767; *or your longest reasonable line;
infileline = compress(_infile_,',.-','kd'); *keep only digits, commas, periods, and minus signs;
run;
data _null_;
set mydata;
file myfile /*options*/ /*or a new file if you prefer */;
put @1 infileline $32767.; *or your longest reasonable line;
run;
Then read that new file in using proc import. I am splitting it into two datasteps so you can see it, but you could combine them into one for ease of running - look up "updating a file in place" in SAS documentation. You could also accomplish this cleaning using OS specific tools; on Unix for example a short awk script could easily remove the misbehaving characters.
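As a rough illustration of the same cleaning idea outside SAS, here is a minimal Python sketch that keeps only digits, commas, periods, and minus signs, and also skips the blank lines between records. The names clean_line and clean_text are hypothetical, and the sample input is made up from the values quoted in the question.

```python
import re

# Matches anything that is not a digit, comma, period, or minus sign.
_UNWANTED = re.compile(r"[^0-9,.\-]")


def clean_line(line):
    """Strip every character except digits, commas, periods, and minus signs."""
    return _UNWANTED.sub("", line)


def clean_text(text):
    """Clean each line and drop lines that are blank in the input."""
    return [clean_line(line) for line in text.splitlines() if line.strip()]


sample = "-0.011*,0.5\n\n0.023954029324),1.82149023843\n"
for cleaned in clean_text(sample):
    print(cleaned)
```

Note that this filter removes the closing parenthesis as well as the asterisk, so values that should be set to missing can no longer be told apart afterwards, and fields containing other characters (dates with slashes, times with colons) would be altered. It only fits files that really are just numerics and commas, as the answer says.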

How can I prevent leading zeroes from being stripped from columns in my CSV?

I need to upload a CSV file, but my text data 080108 keeps converting to a number 80108. What do I do?
Use Quoted CSV
Use quotes in your CSV so that the column is treated as a string instead of an integer. For example:
"Foo","080108","Bar"
I have seen this often when viewing the CSV using Microsoft Excel. Excel will strip leading zeros in all number fields. If the output is not coming from Excel, then check the output in Notepad. If it is coming from Excel, you need to add a single quote mark ' to the beginning of each zip code; then Excel will retain the leading 0.
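For files generated by a script, quoting every field can be sketched with Python's standard csv module as below. The helper name write_quoted is an assumption for illustration. Whether the quotes make the receiving system treat the column as text depends on that system; Excel, for instance, may still convert a quoted numeric field when it opens the file.

```python
import csv
import io


def write_quoted(rows):
    """Serialize rows to CSV text with every field wrapped in double quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


text = write_quoted([["Foo", "080108", "Bar"]])
print(text, end="")

# Reading the text back with the csv module yields strings,
# so the leading zero is still there.
print(list(csv.reader(io.StringIO(text))))
```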

Comparing Strings in Excel Returns Unexpected False

I have two columns of mostly identical strings in excel (including identical case), one is pasted from a CSV file and one is from an XLS file.
If I run EXACT, or just =, or =if(A1=B1,true,false) I always get a negative (false) value. Is this an issue with formats? What can I do to achieve the expected result?
Did you try the Trim() function to filter out extra spaces on the left or right?
Importing from CSV sometimes creates formatting problems, for example extra spaces or other characters!
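To illustrate why two strings that look identical can compare as unequal, here is a small Python sketch with made-up values. The function name normalize is hypothetical. It also handles the non-breaking space (character 160), which Excel's TRIM does not remove and which would need SUBSTITUTE or similar in a worksheet.

```python
def normalize(value):
    """Replace non-breaking spaces with ordinary spaces, then trim both ends."""
    return value.replace("\u00a0", " ").strip()


from_csv = "ABC-123 "       # trailing ordinary space
from_xls = "ABC-123"
with_nbsp = "ABC-123\u00a0"  # trailing non-breaking space

# repr() makes the invisible characters visible for diagnosis.
print(repr(from_csv), repr(from_xls), repr(with_nbsp))

print(from_csv == from_xls)
print(normalize(from_csv) == normalize(from_xls))
print(normalize(with_nbsp) == normalize(from_xls))
```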

Prevent comma-separated list of numbers being interpreted as single large value

33266500,332665100,332665200,332665300 was the original value, cell should look like this: 33266500,332665100,332665200,332665300 but what I see as the cell value in excel is 3.32665E+34
So the question is: I want to convert it back into the original string. I found the format function on Google and used it like this:
format(3.32665E+34,"standard")
giving it as 332,6650,033,266,510,000,000,000
How can I parse it or get back the original string? I believe format is a function in VBA.
Excel has a 15 digit precision limit. If the numbers are already shown like this when you access the file, there is no way to get the number back - you have already lost some digits. VBA code and formulas will not help you.
If this is not the case, you can add a single quote ' mark before the number to store it as text. This will ensure Excel does not try to treat it as a number and thus lose precision.
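The loss of digits can be illustrated with a short Python sketch. Python floats are IEEE 754 doubles, which is an assumption about how closely this mirrors Excel: Excel additionally limits numbers to 15 significant digits, so the exact digits it shows may differ. The function name survives_float_round_trip is hypothetical.

```python
def survives_float_round_trip(digits):
    """Return True if the integer written in `digits` is unchanged after passing through a float."""
    return int(float(digits)) == int(digits)


# The four comma-separated values from the question, read as one long number
# with the commas taken as thousands separators.
original = "33266500,332665100,332665200,332665300".replace(",", "")

print(format(float(original), ".5E"))
print(survives_float_round_trip(original))
print(survives_float_round_trip("080108"))
```

Once the value has been stored as a floating-point number, the trailing digits are gone and no formatting call can bring them back, which is why the value has to be kept as text from the start.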
If you want the value kept exactly, store the data as a string, not as a number. The data type you are using simply doesn't have the ability to do what you are asking it to do.
If you're starting with an Excel file that has already been created then you've already lost the information: Excel has tried to understand what it was given and its best guess has turned out to be wrong. All you can do (if you can't get the source data) is go back to the creator of the Excel file and tell them what's wrong.
If you're starting with, say, a text file that you're importing, then the news is much better:
If you're importing manually using the Text Import Wizard, then at "Step 3 of 3" you need to set "Column Data Format" for the problem field to "Text".
If you're using a macro, you'll need to specify a value for the TextFileColumnDataTypes property that does the same thing. The easiest way to get it right is to use the Macro Recorder.
If you want the four values in the string to be separate cells, then again, look at the Text Import Wizard settings: in Step 1 of 3 you need to set "Delimited" data type (usually the default) and in Step 2 make sure that "Comma" is checked.
The value needs to be entered into the cell as a string. You need to make whatever it is that inserts the value precede the value with a '.

Resources