Formatting columns in exporting SAS table to Excel spreadsheet

I am automating the exporting of a data set from SAS to Excel using ODS ExcelXP:
Obs  PURCHASE_APR  annual_fee  Minimum_Cash_Advance
  1  23.45%        NONE        $10
  2  23.45%        NONE        $10
  3  23.45%        NONE        $10
  4  18.45%        NONE        $10
  5  18.45%        NONE        $10
  6  18.45%        NONE        $10
  7  23.45%        NONE        $10
  8  23.45%        NONE        $10
  9  23.45%        NONE        $10
 10  23.45%                    $0
 11  23.45%                    $0
 12  23.45%                    $0
In SAS, the columns are formatted as text and I want all of the columns to be imported as text into Excel. I've used the following code to create the file using PROC REPORT:
ods tagsets.ExcelXP path="H:/path" file="file.xls" style=myStyle
options(frozen_headers='yes' WrapText='no'
embedded_titles='yes' suppress_bylines='yes'
sheet_interval='none' sheet_label=' '
sheet_name='Solicited'
width_points='1' width_fudge='1'
absolute_column_width='100' autofit_height='yes'
zoom='100');
title1;
proc report data=testing2 nowd;
column purchase_APR annual_fee minimum_cash_advance;
define purchase_APR / display style(column)={tagattr='format:#'} 'PURCHASE_APR';
define annual_fee / display style(column)={tagattr='format:#'} 'ANNUAL_FEE';
define minimum_cash_advance / display style(column)={tagattr='format:#'} 'MINIMUM_CASH_ADVANCE';
run;
ods tagsets.ExcelXP close;
However, when opening up the Excel file, the Text fields have been somehow changed from 23.45%, $0, and $10 (text) to 0.2345, 0, and 10 (text) respectively.
How can I get the output in Excel to be just like the data set in SAS?
I have tried using specific formats to get them to look the same (e.g., tagattr='format:0.00%'), but the output in Excel is then numeric, not text.

The proper way would be to modify how the template processes numbers. You can do that pretty easily in this case. You could even just comment out a line and one block of code, but here's the really proper answer.
Open the template in a text editor. We're going to add a couple of parameters, and implement them.
First, add the options to the $valid_options array. There are a bunch of lines like these, add these two more (Around line 635 or so):
set $valid_options["TEXTPERCENT"] "This value forces percentages to be displayed as text";
set $valid_options["TEXTCURRENCY"] "This value forces currency amounts to be displayed as text";
That text can be whatever you want, this is one interpretation. Now, around line 700 there are some lines setting the defaults, add these two:
set $option_defaults["TEXTCURRENCY"] 'no';
set $option_defaults["TEXTPERCENT"] 'no';
Much further down (around line 1670) is the section that defines $punctuation. We change it so that "%" and the currency symbol are left out of the list when those options are set:
set $punctuation $thousands_separator " ";
set $punctuation $punctuation "%" /if ^$textpct;
set $punctuation $punctuation $currency_sym /if ^$textcurr;
(Basically, the pattern is set $variable value /if condition; we start $punctuation with $thousands_separator and a space, and then append the other characters only when the corresponding option is "no".)
Now around line 2100, in the "Yes/no on/off options... " section, we evaluate each option's value. (The $punctuation code above uses these values, but that's okay; it actually runs later.)
set $option_key 'TEXTPERCENT';
trigger do_yes_no;
eval $textpct $answer;
set $option_key 'TEXTCURRENCY';
trigger do_yes_no;
eval $textcurr $answer;
Finally, we implement things. Down around line 7400 is event value_type; which is where the % $ get removed and the numbers get adjusted to be 'real numbers' even if they shouldn't be. This is annoying. So we tell it not to.
do /if ^$textpct;
do /if $convert_percentages;
eval $tmp inputn($value, $test_format)/100;
else;
eval $tmp inputn($value, $test_format);
done;
/*putlog "Percent value:" $tmp;*/
set $value $tmp;
done;
We wrap the percent conversion code with do /if ^$textpct; and done, which tells it to skip doing the inputn (which will kill our percents). If we were cheating and not doing this the proper way, we could comment out this line:
set $value compress($value, $punctuation);
But since we fixed the $punctuation variable to contain (or not contain!) the right stuff already, this isn't an issue.
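To see why taking "%" and "$" out of $punctuation is enough, here is a toy Python model of the behaviour described above. It is not the tagset's code: the function name classify, its parameters, and the default thousands separator are made up for illustration, and the real value_type event does more than this.

```python
def classify(value, text_percent=False, text_currency=False,
             convert_percentages=True, currency_sym="$", thousands_sep=","):
    """Toy model: return ("Number", float) or ("String", original text)."""
    # Build the list of characters to strip, as in the $punctuation step.
    punctuation = thousands_sep + " "
    if not text_percent:
        punctuation += "%"
    if not text_currency:
        punctuation += currency_sym

    stripped = "".join(ch for ch in value if ch not in punctuation)
    try:
        number = float(stripped)
    except ValueError:
        # Whatever is left is not a number, so the cell stays text.
        return ("String", value)

    is_percent = value.strip().endswith("%")
    if is_percent and not text_percent and convert_percentages:
        number /= 100
    return ("Number", number)


if __name__ == "__main__":
    for raw in ["23.45%", "$10", "NONE"]:
        print(raw, classify(raw), classify(raw, text_percent=True, text_currency=True))
```

With both options off, "23.45%" and "$10" lose their symbols and parse as numbers; with both options on, the symbols survive, the parse fails, and the values stay strings.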
Now this will work! We just modify the tagset call:
*First include your tagset, which I put in c:\temp\ but you can put wherever and call whatever you like;
%include "c:\temp\excel_tpl_nocompress.txt";
ods tagsets.ExcelXP path="c:\temp\" file="testfile.xml"
options(frozen_headers='yes' WrapText='no'
embedded_titles='yes' suppress_bylines='yes'
sheet_interval='none' sheet_label=' '
sheet_name='Solicited' convert_percentages="no"
width_points='1' width_fudge='1'
absolute_column_width='100' autofit_height='yes'
textcurrency='yes' textpercent='yes'
zoom='100');
title1;
*Then add in the textpercent and textcurrency lines, and it should work as is.;
And now you're off to the races.
<Row ss:AutoFitHeight="1">
<Cell ss:StyleID="data__l1" ss:Index="1"> <Data ss:Type="String">23.45%</Data> </Cell>
<Cell ss:StyleID="data__c1" ss:Index="2"> <Data ss:Type="String" /> </Cell>
<Cell ss:StyleID="data__l1" ss:Index="3"> <Data ss:Type="String">$0</Data> </Cell>
</Row>
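If you would rather check the generated file without opening Excel, a small script can list the type of every cell. This is only a sketch: the helper name cell_types and the sample workbook string are made up here, and it assumes the file uses the usual SpreadsheetML 2003 namespace (urn:schemas-microsoft-com:office:spreadsheet).

```python
import xml.etree.ElementTree as ET

# Namespace assumed for SpreadsheetML 2003 workbooks.
SS = "urn:schemas-microsoft-com:office:spreadsheet"


def cell_types(xml_text):
    """Return a list of (ss:Type, text) pairs for every Data element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for data in root.iter(f"{{{SS}}}Data"):
        pairs.append((data.get(f"{{{SS}}}Type"), data.text or ""))
    return pairs


# Hypothetical minimal workbook wrapping the row shown above.
SAMPLE = f"""<Workbook xmlns="{SS}" xmlns:ss="{SS}">
<Worksheet ss:Name="Solicited"><Table>
<Row ss:AutoFitHeight="1">
<Cell ss:StyleID="data__l1" ss:Index="1"> <Data ss:Type="String">23.45%</Data> </Cell>
<Cell ss:StyleID="data__c1" ss:Index="2"> <Data ss:Type="String" /> </Cell>
<Cell ss:StyleID="data__l1" ss:Index="3"> <Data ss:Type="String">$0</Data> </Cell>
</Row>
</Table></Worksheet>
</Workbook>"""

if __name__ == "__main__":
    for kind, text in cell_types(SAMPLE):
        print(kind, repr(text))
```

Any cell that still reports Number instead of String was converted by the tagset.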

I found the answer I needed on the ExcelXP Options documentation page. I know it is a 'hack', but I changed the default options in tagsets.ExcelXP to currency_symbol = "|" and decimal_separator = "|". This fools SAS into looking for a pipe instead of a dollar sign in currencies and a pipe instead of a period in percentages, so when it comes across $0 or 23.45% it treats them as pure text.

Related

Writing from Powershell to Excel: How to set the cell format for the value?

I am reading values from different Excel files, and composing a new one containing information from all the others. While doing that, Excel seems to automatically change '.' to a comma ','. How do I prevent that?
I am using Powershell ISE on Win10 and Office365. I tried reading and writing 'value2' and 'text' and writing those. I tried casting the value2 to string when I write it. This did not work. The variables in Powershell hold the correct values as strings. The moment I save the new Excel file, the correct format is gone.
Example: Value is "123.456". I can read it, the Powershell variable shows "123.456". I write it to Excel and open the Excel afterwards, it reads:
123,456 and interprets it as number instead of a text.
How I read the value
[...]
$tmp += ($worksheet.cells.item($intRow,$col).value2)
How I write the value (I tried "value", and "text" for both)
[...]
elseif($value -eq 6){
$sheet.Cells.item($intRow,$columncounter).value2 = ($tmp[$value]).ToString()
}
[...]
This is how I open the excel file for writing:
$objExcel=New-Object -ComObject Excel.Application
$objExcel.Visible=$false
$resultbook = $objExcel.Workbooks.Add()
$sheet = $resultbook.ActiveSheet
$sheet.Name = "Data"
This is how I save the excel file
$resultbook.SaveAs($name)
$resultbook.close()
Expected: Input == Output, example: 1234.5678 --> 1234.5678
Actual Result: Input != Output, example 1234.5678 --> 1234,5678
It works fine for all other strings, texts, numbers except those containing dots.
I presume there must be a way to specify the cell format in the target file, however I did not find any documentation on that.

Removing unwanted data from text file

I have a large text file exported from an application that has three unwanted zeros in each row. The text file needs to be imported into another application and the zeros cause a problem.
Basically the unwanted three zeros per row need to be deleted. These zeros are always in the same location (same number of characters when counting from the left), but the location is somewhere in the middle. I have tried various things like importing the file into excel, removing the zeroes and then exporting as text file, but always have formatting problems with the exported text file.
Can someone suggest a solution or point me in the right direction?
something like this ? (quickly done)
Sub replaceInTx()
Dim inFile As String, outFile As String
Dim curLine As String
inFile = "x:\Documents\test.txt"
outFile = inFile & ".new.txt"
Open inFile For Input As #1
Open outFile For Output As #2
Do Until EOF(1)
Line Input #1, curLine
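' Replace() with a start argument returns only the text from that position on,
' so the first 5 characters have to be put back in front of the result.
Print #2, Left$(curLine, 5) & Replace(curLine, "000", "", 6, 1, vbTextCompare)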
Loop
Close #1
Close #2
End Sub
Alternatively, you can do that with any text editor that allows block selection (I like Notepad2, tiny, fast and portable)
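Since the zeros always sit at the same character position, the same idea can be sketched in Python by slicing rather than searching. The function names, the paths, and the position are placeholders of mine, not anything from the question; the check against "000" is only a safety net so that lines that do not match are written unchanged.

```python
def drop_span(line, start, length=3):
    """Remove `length` characters beginning at 0-based index `start`."""
    return line[:start] + line[start + length:]


def clean_file(in_path, out_path, start, length=3, expected="000"):
    """Copy in_path to out_path, cutting `expected` out of each line at `start`."""
    # newline="" keeps the original line endings untouched.
    with open(in_path, encoding="utf-8", newline="") as src, \
         open(out_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            if line[start:start + length] == expected:
                line = drop_span(line, start, length)
            dst.write(line)


if __name__ == "__main__":
    # Hypothetical example line with the unwanted zeros at index 5.
    print(drop_span("ABCDE000FGH", 5))
```

Note that start is counted from 0 here, whereas the VBA code counts from 1.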
I see you use excel a lot.
When you import the text file into excel do you use the import function and do you push the data into separate cells?
if the cell is numeric you could do the following:
=LEFT(TEXT(G5,"#"),LEN(TEXT(G5,"#"))-3)
if the cell is text:
=LEFT(G5,LEN(G5)-3)
G5 would be the cell the data row/field is in.
curLine = Left(curLine, 104)
This will take the first 104 characters

How to sort by year

I have the following data in a variable myVAR (origin is an array, itemdel is TAB):
1949-1958 Jaggi, Ernst (1917-2004)
1897-1939 Laur, Ernst Ferdinand (1871-1964)
1939-1949 Howald, Oskar (1897-1972)
I want to sort them by the first year so that I get:
1897-1939 Laur, Ernst Ferdinand (1871-1964)
1939-1949 Howald, Oskar (1897-1972)
1949-1958 Jaggi, Ernst (1917-2004)
BUT I always end up with the following, no matter what I try:
Howald, Oskar (1897-1972)
Jaggi, Ernst (1917-2004)
Laur, Ernst Ferdinand (1871-1964)
1897-1939
1939-1949
1949-1958
I tried various methods and itemdel and everything but this is my sort code right now:
set the itemdel to numtochar(45) -- this is "-" / also tried TAB and so on
sort lines of myVAR ascending by item 1 of each
Can you spot the mistake?
I just figured it out: there was a rogue LF at the end of the first year range. It was put into the array initially and then read back from there, which mixed up the order when I sorted myVAR.
This works now:
set the itemdel to numtochar(45)
sort lines of tArraySortedVariable ascending
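For comparison, the same clean-up and sort can be sketched in Python. This is only an illustration of the idea (strip the stray line feed first, then sort on the leading year); the function names are mine, and collapsing all whitespace also turns the TAB separator into a single space.

```python
import re


def first_year(record):
    """Return the leading 4-digit year, or a large value if there is none."""
    match = re.match(r"\d{4}", record)
    return int(match.group()) if match else 9999


def sort_by_first_year(records):
    # Collapse stray line feeds, tabs and repeated spaces inside each record.
    cleaned = [" ".join(r.split()) for r in records if r.strip()]
    return sorted(cleaned, key=first_year)


if __name__ == "__main__":
    data = [
        "1949-1958\n\tJaggi, Ernst (1917-2004)",
        "1897-1939\n\tLaur, Ernst Ferdinand (1871-1964)",
        "1939-1949\n\tHowald, Oskar (1897-1972)",
    ]
    for row in sort_by_first_year(data):
        print(row)
```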

ODS EXCEL.TAGSET title statement

I would like to print some text before the result of a PROC REPORT. The ODS destination is tagsets.ExcelXP. Currently I do it with the title statement, but that is limited to 10 titles (title1, title2, ...) and I need more than 10 lines of text in the output. How can I do this? I have SAS 9.2.
EDIT:
Here is a code example:
ods tagsets.excelxp STYLE=sasdocprinter file=_WEBOUT
options(embedded_titles='yes' embedded_footnotes='yes');
title1 'title text row1';
title2 'title text row2';
...
title10 "title text &macro_var.";
footnote1 'footnote text';
proc report data=lib.a;
...
run;
Given you are using PROC REPORT, the easiest way around this may be to have PROC REPORT handle the lines of text. In PROC REPORT, you have the option of doing compute before _PAGE_, which will execute prior to each time a page is begun - suspiciously like a title.
proc report nowd data=sashelp.class;
columns sex name age height;
define sex/group;
define name/display;
define age /display;
define height/display;
compute before _PAGE_;
line "Title Row 11";
line "Title Row 12";
endcomp;
run;
Depending on your output destination, there may be a row between the title and the PROC REPORT lines. In some destinations you can remove it with options if it is undesirable, or alternatively you can move ALL of your titles into line statements like this.

parsing specific data from .txt file to excel or something else

I have extracted data from one source to a .txt file. The source is some sort of address book, and I used a macro recorder for the extraction. Now I have several files which are formatted exactly as follows (example with 4 contacts):
Abbrucharbeiten
ATR Armbruster
Werkstr. 28
78727 Oberndorf
Tel. 0175 7441784
Fax 07423 6280
Abbrucharbeiten
Jensen & Sohn, Karl
Schallenberg 6A
25587 Münsterdorf
Tel. 04821 82538
Fax 04821 83381
Abbrucharbeiten
Kiwitt, R.
Auf der Heide 54
48282 Emsdetten
Tel. 02572 88559
Tel. 0172 7624359
Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau
Josef Grabmeier GmbH
Reitgesing 1
85560 Ebersberg
Tel. 08092 24701-0
Fax 08092 24701-24
The 1st row is always the field (line of business).
The 2nd row is always the name of the company/firm.
The 3rd row is always the street address.
The 4th row is always the zip code and place.
The 5th row and the following rows (sometimes two, sometimes more) are either Tel. or Fax.
I want to format it so it would be something like excel sheet like:
Branche: Name: Address: Place: contact1: contact2:
1st row 2nd row 3rd row 4th row 5th row 6th row.....
Now the main problem is that I have over 500.000 contacts, and the last fields are not always the same in number. I don't want to do it manually, please help me...
This is neither Python nor Visual Basic, but it shouldn't be very difficult to translate to those languages. This is Perl.
perl -lne '
## Print header. Both the header and the data are separated with pipes.
## Contacts (contact1, contact2, etc.) are not included because at this
## point I cannot know how many there will be. It could be done, but the
## script would be far more complex.
BEGIN {
push @header, q|Branche:|, q|Name:|, q|Address:|, q|Place:|;
printf qq|%s\n|, join q{|}, @header;
}
## Save information for each contact. At least six lines. Beyond that, only
## if the line begins with the string "Tel" or "Fax".
if ( $line < 6 || m/\A(?:tel|fax)/i ) {
push @contact_info, $_;
++$line;
## Do not skip printing the last contact.
next unless eof;
}
## Print the contact info, reset the data structures and repeat the process
## for the next one.
printf qq|%s\n|, join q{|}, @contact_info;
$line = 0;
undef @contact_info;
push @contact_info, $_;
++$line;
' infile
It's a one-liner (I know it doesn't look like one, but you can strip the comments and newlines to get it), so run it directly from your shell. It yields:
Branche:|Name:|Address:|Place:
Abbrucharbeiten|ATR Armbruster|Werkstr. 28|78727 Oberndorf |Tel. 0175 7441784|Fax 07423 6280
Abbrucharbeiten|Jensen & Sohn, Karl|Schallenberg 6A|25587 Münsterdorf|Tel. 04821 82538|Fax 04821 83381
Abbrucharbeiten|Kiwitt, R.|Auf der Heide 54|48282 Emsdetten|Tel. 02572 88559|Tel. 0172 7624359
Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau|Josef Grabmeier GmbH|Reitgesing 1|85560 Ebersberg|Tel. 08092 24701-0|Fax 08092 24701-24
Keep in mind that I didn't print the full header and that the fields are separated with pipes. Importing that into Excel should not be a problem.
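Since the question mentions Python, here is a rough sketch of the same grouping in that language. It is not a translation of the one-liner: instead of always taking six lines, it starts a new record whenever at least four lines have been collected and the next line does not begin with Tel or Fax, and it pads the contact columns so the header can be complete. The names parse_contacts and to_pipe_text and the contactN column labels are my own choices.

```python
import csv
import io
import re

# A contact line begins with "Tel" or "Fax" as a whole word, any case.
CONTACT = re.compile(r"^(tel|fax)\b", re.IGNORECASE)


def parse_contacts(lines):
    """Group raw lines into records: 4 fixed fields plus Tel/Fax lines."""
    records, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # A non-contact line after the 4 fixed fields starts a new record.
        if len(current) >= 4 and not CONTACT.match(line):
            records.append(current)
            current = []
        current.append(line)
    if current:
        records.append(current)
    return records


def to_pipe_text(records, delimiter="|"):
    """Render records as delimited text with a full, padded header."""
    width = max((len(r) for r in records), default=4)
    header = ["Branche", "Name", "Address", "Place"]
    header += [f"contact{i}" for i in range(1, width - 3)]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for rec in records:
        writer.writerow(rec + [""] * (width - len(rec)))
    return out.getvalue()


if __name__ == "__main__":
    sample = """Abbrucharbeiten
ATR Armbruster
Werkstr. 28
78727 Oberndorf
Tel. 0175 7441784
Fax 07423 6280
Abbrucharbeiten
Kiwitt, R.
Auf der Heide 54
48282 Emsdetten
Tel. 02572 88559
Tel. 0172 7624359""".splitlines()
    print(to_pipe_text(parse_contacts(sample)), end="")
```

For 500.000 contacts you would pass an open file object to parse_contacts instead of a list, since it only iterates over the lines once.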
