SAS sgplot step color gradient - colors

I want to generate a "step" plot (CDF) and I'm trying to change the line colors using the dattrmap option, but the colors are not changing. Below is my code:
%MACRO ATRRMAP(fich=,var=);
proc freq data=&fich noprint;
tables &var/nocum nopercent norow nocol out=freq&var;
format _all_;
where &var^=.;
run;
data test;
set freq&var end=eof;
call symputx("mvCAT"||strip(_N_),&var);
if eof then call symputx("NB",_N_);
run;
data myattrmap;
length id $20 value 3 linecolor $10 pattern 3 fillcolor $20;
%do i=1 %to &NB;
id='myid';
value = &&mvCAT&i;
linecolor=cats("grey",put(&i*5,hex2.));
%if &i=1 or &i=5 or &i=9 %then %do;
pattern = 1;
%end;%else %if &i=2 or &i=6 or &i=10 %then %do;
pattern = 15;
%end;%else %if &i=3 or &i=7 or &i=11 %then %do;
pattern = 2;
%end;%else %if &i=4 or &i=8 or &i=12 %then %do;
pattern = 8;
%end;%else %do;
pattern = 41;
%end;
fillcolor=cats("grey",put(&i*5,hex2.));
output;
%end;
run;
%MEND ATRRMAP;
The generated data look like the following:
id value pattern fillcolor
myid -6 1 CXbdc3c7
myid -5 2 CXbdc3c7
myid -4 8 CXbdc3c7
Then, I used the sgplot:
PROC SGPLOT DATA=cumul sganno=annotation NOBORDER dattrmap=myattrmap;
STEP X=variable Y=percent/GROUP=newgroup attrid=myid;
YAXIS LABEL="Cumulative percentage of patients" VALUES=(0 TO 100 BY
10);
XAXIS LABEL=" " VALUES=(-4 to 4 by 0.5) ;
KEYLEGEND /TITLE=" " LOCATION=INSIDE POSITION=BOTTOMRIGHT ACROSS=1
DOWN=3 NOBORDER;
RUN;
The data myfile used with sgplot looks like the following:
variable percent newgroup
-3.66 2.70 -6
-3.41 5.40 -6
-3.26 8.11 -6
-3.28 5.8 -5
-2.97 13.51 -5
I would like to have a grey gradient. But first, I would simply like to choose the line colors on my plot with dattrmap. I tried both fillcolor and linecolor, but neither worked. When I change the colors directly in the SGPLOT step with the datacontrastcolors option of styleattrs, it works. Can anyone see what I am missing?

The attribute map is applied through the GROUP= variable: that is the variable that controls the color, shape and patterns. You're grouping by NEWGROUP, so the VALUE column of the attribute map has to hold NEWGROUP's values. You could create a proxy variable to do this, if that's what you want. Without more details of what you need, I'm not sure how we can help you find a workaround, but this does explain why it's not working at the moment.
From documentation:
The values of the VALUE variable are valid data group values. These values are case sensitive. The data group is assigned in the plot statement with the GROUP= option.
Assuming you do want the lines colored by NEWGROUP, here's how you can modify your code. Note that I've simplified your code drastically. There were also issues with how you're specifying the colors; I ignored those for now and am leaving them for you to fix, so the color values are currently hard-coded in the macro. I would also recommend replacing the if _n_ chain with the MOD() function, since the pattern assignments seem to repeat in a cycle. It may not work, but it's worth considering.
*create fake data;
data myfile;
input variable percent newgroup $;
cards;
-3.66 2.70 group1
-3.41 5.40 group1
-3.26 8.11 group1
-3.28 5.8 group2
-2.97 13.51 group2
;;;;
run;
*macro to create attribute map;
%MACRO ATRRMAP(fich=,var=);
proc freq data=&fich noprint;
tables &var/nocum nopercent norow nocol out=freq&var (drop=percent);
format _all_;
where not missing(&var);
run;
data myattrmap;
length id $20 value $20 linecolor $10 pattern 3 fillcolor $20;
set freq&var.;
id='myid';
value = &var.;
if _n_ =1 then
linecolor = 'CXbdbdbd';
else if _n_=2 then
linecolor = 'CX636363';
*linecolor=cats("grey",put(_n_*5,hex2.));
if _n_ in (1, 5, 9) then
pattern = 1;
else if _n_ in (2, 6, 10) then
pattern = 15;
else if _n_ in (3, 7, 11) then
pattern = 2;
else if _n_ in ( 4, 8, 12) then
pattern=8;
else pattern = 14;
fillcolor=cats("grey",put(_n_*5,hex2.));
output;
run;
%MEND ATRRMAP;
*create attribute map for newgroup;
%ATRRMAP(fich=myfile, var=newgroup);
*plot graph;
PROC SGPLOT DATA=myfile NOBORDER dattrmap=myattrmap;
STEP X=variable Y=percent/GROUP=newgroup attrid=myid;
YAXIS LABEL="Cumulative percentage of patients" VALUES=(0 TO 100 BY
10);
XAXIS LABEL=" " VALUES=(-4 to 4 by 0.5);
KEYLEGEND /TITLE=" " LOCATION=INSIDE POSITION=BOTTOMRIGHT ACROSS=1
DOWN=3 NOBORDER;
RUN;
Methods & rules for colour scheme names are found here.
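A rough sketch of the two suggestions above (a grey gradient and MOD() for the patterns), not a tested drop-in. It assumes the FREQNEWGROUP data set produced by the PROC FREQ step in the macro, picks an arbitrary grey range of 64 to 192, and uses LINEPATTERN with pattern names, which as far as I know is the attribute-map column SGPLOT reads for line style (the code above uses PATTERN with numbers). The pattern names chosen here are illustrative and do not reproduce the numeric codes above.

```sas
/* Sketch only: discrete attribute map with a grey ramp and cycling patterns. */
/* Assumes FREQNEWGROUP has one row per value of NEWGROUP.                    */
data myattrmap;
  length id $20 value $20 linecolor $8 linepattern $12;
  set freqnewgroup(keep=newgroup) nobs=ngroups;
  id    = 'myid';
  value = newgroup;   /* must match the GROUP= values, case-sensitively */

  /* grey level runs from 64 (dark) to 192 (light) across the groups */
  if ngroups > 1 then grey = round(64 + (_n_ - 1) * 128 / (ngroups - 1));
  else grey = 128;
  /* CXrrggbb with equal red, green and blue components gives a grey */
  linecolor = cats('CX', put(grey, hex2.), put(grey, hex2.), put(grey, hex2.));

  /* MOD() cycles through four patterns instead of listing row numbers */
  select (mod(_n_ - 1, 4));
    when (0) linepattern = 'Solid';
    when (1) linepattern = 'ShortDash';
    when (2) linepattern = 'Dash';
    otherwise linepattern = 'Dot';
  end;
  drop grey;
run;
```

Unlike the IF chain, this keeps cycling past the twelfth group rather than falling back to a single default pattern.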

Conventions for metadata attributes in netCDF for compound data types

NetCDF allows (at least in its version 4 format, based on HDF5) the creation of compound data types (very similar to a C struct). Each component has a label, a type and a position in the compound type. For example, for a data set of statistics, we could use the compound type defined by [('min', 'float'), ('max', 'float'), ('avg', 'float'), ('std', 'float')], which has as its second component a float labeled max.
Now, netCDF also allows for adding metadata. These typically follow conventions, such as the NetCDF Climate and Forecast (CF) Metadata Conventions. This is useful so that other users of the generated netCDF file can easily understand the metadata.
But I have not found conventions specifically dealing with compound data types, e.g., to give metadata specifically for one component of the compound data.
Are there such conventions?
If not or also, what is being used in practice?
If this is not used, what do you advise and why? (I was thinking of using multi-line attributes, so separated by \n, with a component-specific label to start each line, such as [avg] or #avg.)
To stay within the CF conventions you could create a separate variable for each member of the compound type and use the ancillary_variables attribute to indicate that they are related:
netcdf test {
dimensions:
time = 3 ;
lat = 36 ;
lon = 36 ;
variables:
double time(time) ;
time:long_name = "Time" ;
time:standard_name = "time" ;
time:units = "Days since 1970-01-01 00:00" ;
time:calendar = "standard" ;
float ctp(time, lat, lon) ;
ctp:_FillValue = -999.f ;
ctp:long_name = "Cloud Top Pressure" ;
ctp:standard_name = "air_pressure_at_cloud_top" ;
ctp:units = "Pa" ;
ctp:cell_methods = "time: mean" ;
ctp:ancillary_variables = "ctp_std ctp_min ctp_max" ;
float ctp_std(time, lat, lon) ;
ctp_std:_FillValue = -999.f ;
ctp_std:long_name = "Cloud Top Pressure Standard Deviation" ;
ctp_std:units = "Pa" ;
ctp_std:cell_methods = "time: standard_deviation" ;
float ctp_min(time, lat, lon) ;
ctp_min:_FillValue = -999.f ;
ctp_min:long_name = "Cloud Top Pressure Minimum" ;
ctp_min:units = "Pa" ;
ctp_min:cell_methods = "time: minimum" ;
float ctp_max(time, lat, lon) ;
ctp_max:_FillValue = -999.f ;
ctp_max:long_name = "Cloud Top Pressure Maximum" ;
ctp_max:units = "Pa" ;
ctp_max:cell_methods = "time: maximum" ;
}
You could then add metadata as usual via the variables' attributes. For example, the cell_methods attribute could be used to describe the applied statistics.
If you want to stick to the compound datatype, there is a ticket about vector quantities which might be related (although it is quite old): https://cf-trac.llnl.gov/trac/ticket/79

SAS Index on Array

I am trying to search for a keyword in a description field (descr) and, if it is there, flag that record as a match (which keyword it matches on is not important). I am having an issue where the do loop goes through all entries of the array. I am not sure if this is because my do loop is incorrect or because my index call is incorrect.
data JE.KeywordMatchTemp1;
set JE.JEMasterTemp;
if _n_ = 1 then do;
do i = 1 by 1 until (eof);
set JE.KeyWords end=eof;
array keywords[100] $30 _temporary_;
keywords[i] = Key_Words;
end;
end;
match = 0;
do i = 1 to 100 until(match=1);
if index(descr, keywords[i]) then match = 1;
end;
drop i;
run;
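Add another condition to your DO loop to have it terminate when any match is found. You might also want to remember how many entries are in the array, so the loop does not run over unused slots. Also make sure to use the INDEX() function properly: trim the keyword first, otherwise the trailing blanks stored in the array element are part of the search string.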
data JE.KeywordMatchTemp1;
if _n_ = 1 then do;
do i = 1 by 1 until (eof);
set JE.KeyWords end=eof;
array keywords[100] $30 _temporary_;
keywords[i] = Key_Words;
end;
last_i = i ;
retain last_i ;
end;
set JE.JEMasterTemp;
match = 0;
do i = 1 to last_i while (match=0) ;
if index(descr, trim(keywords[i]) ) then match = 1;
end;
drop i last_i;
run;
You have two problems; both of which would be easy to see in a small compact example (suggestion: put an example like this in your question in the future).
data partials;
input keyword $;
datalines;
home
auto
car
life
whole
renter
;;;;
run;
data master;
input #1 description $50.;
datalines;
Mutual Fund
State Farm Automobile Insurance
Checking Account
Life Insurance with Geico
Renter's Insurance
;;;;
run;
data want;
set master;
array keywords[100] $ _temporary_;
if _n_=1 then do;
do _i = 1 by 1 until (eof);
set partials end=eof;
keywords[_i] = keyword;
end;
end;
match=0;
do _m = 1 to dim(keywords) while (match=0 and keywords[_m] ne ' ');
if find(lowcase(description),lowcase(keywords[_m]),1,'t') then match=1;
end;
run;
Two things to look at here. First, notice the addition to the while. This guarantees we never try to match " " (which will always match if you have any spaces in your strings). The second is the t option in find (I note you have to add the 1 for start position, as for some reason the alternate version doesn't work at least for me) which trims spaces from both arguments. Otherwise it looks for "auto " instead of "auto".
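To see the trailing-blank issue in isolation, here is a minimal sketch; the variable names kw, descr, padded and trimmed are made up for illustration. A $30 keyword holding "auto" is stored padded with blanks, so INDEX() searches for the padded value unless the keyword is trimmed.

```sas
/* Sketch: why an untrimmed keyword fails to match. */
data _null_;
  length kw $30;
  kw    = 'auto';                              /* stored as 'auto' plus 26 blanks */
  descr = 'State Farm automobile insurance';
  padded  = index(descr, kw);                  /* padded search string: not found, 0 */
  trimmed = index(descr, trim(kw));            /* searches for 'auto': found, > 0    */
  put padded= trimmed=;
run;
```

INDEX() is case-sensitive, which is why the answer above also lowercases both arguments before calling FIND().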

SAS simplify the contents of a variable

In SAS, I've a variable V containing the following value
V=1996199619961996200120012001
I'd like to create these 2 variables
V1=19962001 (= different modalities)
V2=43 (= the first modality appears 4 times and the second one appears 3 times)
Any idea ?
Thanks for your help.
Luc
For your first question (if I understand the pattern correctly), you could extract the first four characters and the last four characters:
a = substr(variable, 1,4);
b = substrn(variable,max(1,length(variable)-3),4);
You could then concatenate the two.
c = cats(a,b);
For the second, the COUNT function can be used to count occurrences of a string within a string:
http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p02vuhb5ijuirbn1p7azkyianjd8.htm
Hope this helps :)
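As a small sketch of the COUNT suggestion, applied to the value from the question (the names n1996 and n2001 are just for illustration):

```sas
/* Sketch: count occurrences of each modality in the string from the question. */
data _null_;
  v = '1996199619961996200120012001';
  n1996 = count(v, '1996');   /* occurrences of the first modality  */
  n2001 = count(v, '2001');   /* occurrences of the second modality */
  put n1996= n2001=;          /* writes n1996=4 n2001=3 to the log  */
run;
```

One caveat: COUNT searches the whole string and does not respect the four-character boundaries, so with other data a modality could be matched where it straddles two neighbouring values.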
* Make it a bit more general;
%let modeLength = 4;
%let maxOccur = 100; ** in the input **;
%let maxModes = 10; ** in the output **;
* Where does a certain occurrence start?;
%macro occurStart(occurNo);
&modeLength.*&occurNo.-%eval(&modeLength.-1)
%mend;
* Read the input;
data simplified ;
infile datalines truncover;
input v $%eval(&modeLength.*&maxOccur.).;
* Declare output and work variables;
format what $&modeLength..
v1 $%eval(&modeLength.*&maxModes.).
v2 $&maxModes..;
array w {&maxModes.} $ &modeLength.; ** what (character, same length as a mode) **;
array c {&maxModes.}; ** count **;
* Discover unique modes and count them;
countW = 0;
do vNo = 1 to length(v)/&modeLength.;
what = substr(v, %occurStart(vNo), &modeLength.);
do wNo = 1 to countW;
if what eq w(wNo) then do;
c(wNo) = c(wNo) + 1;
goto foundIt;
end;
end;
countW = countW + 1;
w(countW) = what;
c(countW) = 1;
foundIt:
end;
* Report results in v1 and v2;
do wNo = 1 to countW;
substr(v1, %occurStart(wNo), &modeLength.) = w(wNo);
substr(v2, wNo, 1) = put(c(wNo),1.);
put _N_= v1= v2=;
end;
keep v1 v2;
* The data I tested with;
datalines;
1996199619961996200120012001
197019801990
20011996199619961996200120012001
;
run;

Proc Reg Make Prediction With New Observations

I am trying to create a prediction interval based on a linear model in SAS. My SAS code is
proc reg data=datain.aswells alpha=0.01;
model arsenic = latitude longitude depth_ft / clb;
run;
I wish to make a 95% prediction interval with latitude=23.75467, longitude=90.66169, and depth_ft=25. This data point does not exist in the data set, but it is in the range of values used to compute the model. Is there an easy way of accomplishing this in SAS? Shouldn't there be a way to compute this prediction interval in SAS easily?
The easiest thing to do is to add it to your input data set with a missing value for ARSENIC. Then use the OUTPUT statement to output the prediction intervals.
Here is an example:
data test;
do i=1 to 100;
x1 = rannor(123);
x2 = rannor(123)*2 + 1;
y = 1*x1 + 2*x2 + 4*rannor(123);
output;
end;
run;
data test;
set test end=last;
output;
if last then do;
y = .;
x1 = 1.5;
x2 = -1;
output;
end;
run;
proc reg data=test alpha=.01;
model y = x1 x2;
output out=test_out(where=(y=.)) p=predicted ucl=UCL_Pred lcl=LCL_Pred;
run;
quit;
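The WHERE= option on the output data set filters the result down to just the row with the missing value to be predicted. You can remove it to get predicted values and prediction intervals for every observation. Note that ALPHA=.01 produces 99% limits; use ALPHA=0.05 for a 95% interval.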
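Applied to the data set and values from the question, the same idea might look like the sketch below. The data set names new_point, aswells_plus and pred are hypothetical, and ALPHA=0.05 is used because the question asks for a 95% interval.

```sas
/* Sketch: append the new point with a missing response, then score it. */
data new_point;
  latitude  = 23.75467;
  longitude = 90.66169;
  depth_ft  = 25;
  arsenic   = .;              /* missing response: not used in the fit */
run;

data aswells_plus;
  set datain.aswells new_point;
run;

proc reg data=aswells_plus alpha=0.05;
  model arsenic = latitude longitude depth_ft / clb;
  /* LCL=/UCL= are the limits for an individual prediction */
  output out=pred(where=(arsenic=.)) p=predicted lcl=lcl_pred ucl=ucl_pred;
run;
quit;
```

If the original data already contain rows with a missing ARSENIC value, those rows will appear in pred as well, so a separate flag variable may be a safer filter.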

Weight factors by eigenvalues from PROC FACTOR in SAS?

I am trying to weight the factors from PROC FACTOR by their eigenvalues, but am having some difficulty. I have a solution, but it seems to me that there should be a more direct way to do this.
** Get factors and eigenvalues;
ods output Eigenvalues=MyEigenVals;
proc factor data=MyData method=principal out=MyData;
var X1 X2 X3 X4 X5 X6;
run;
ods output close;
** Transpose the eigenvalues;
proc transpose data=MyEigenVals out=MyEigenVals(drop=_NAME_) prefix=eigenval;
id Number;
var Eigenvalue;
run;
** Merge the data and fill down the eigenvalues;
data MyData;
merge MyData MyEigenVals;
retain E1 E2 E3 E4 E5 E6;
if _n_=1 then do;
E1 = eigenval1;
E2 = eigenval2;
E3 = eigenval3;
E4 = eigenval4;
E5 = eigenval5;
E6 = eigenval6;
end;
** weight each factor by its eigenvalue;
factor1 = factor1 * E1;
factor2 = factor2 * E2;
factor3 = factor3 * E3;
factor4 = factor4 * E4;
factor5 = factor5 * E5;
factor6 = factor6 * E6;
run;
As you can see this does not seem to be a very direct way of accomplishing my task. Can anyone here help me fix this up nicely? Is it even possible?
You definitely could combine it more efficiently; at minimum, you can simplify the last datastep.
data mydata;
if _n_=1 then set MyEigenVals;
set mydata;
array factor[6];
array Eigenval[6];
do _i = 1 to dim(factor);
factor[_i] = factor[_i]*eigenval[_i];
end;
run;
Variables read with SET are retained automatically, so the eigenvalues loaded on the first iteration stay available for every row.
Secondly, you may be able to skip the multiplication depending on how you're using the results. You might be able to use a weight statement to use the eigenvalues as weights - depending on what procedures you're using to later analyze the data. I don't know if that buys you much, but it could save you from modifying the original value which might be preferable.