Analysis by columns in SAS - statistics

I need help analyzing how the target variable ("cus12", which is default_cus12 in the code) depends on the independent variables. My data has 1000 columns, and the independent variables fall into 4 groups: app_x, act_x, agr_x, ags_x. I probably need to write a loop that examines the dependence of "cus12" on the variables from each group, but I am having trouble writing such a loop and putting the appropriate PROC inside it.
%let src=C:\Users\chlop\Desktop\ProjektSAS;
%let out=C:\Users\chlop\Desktop\ProjektSAS\wyj;
libname src "&src" compress=yes;
libname out "&out" compress=yes;
data out.test;
set src.train;
drop app_char_job_code app_char_marital_status app_char_city app_char_home_status app_char_cars act: agr: ags: default_cus3 default_cus6 default_cus9;
where period<'200405';
run;
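The steps that follow read a data set called names, which the post never creates. Its columns (name, varnum) match the output data set of PROC CONTENTS, so a minimal sketch of the missing step might look like this. That it was built from out.test rather than src.train is an assumption.

```sas
/* Assumption: names is the PROC CONTENTS listing of out.test */
proc contents data=out.test
              out=names(keep=name varnum type)
              noprint;
run;
```

Each row of names then describes one column of out.test, with varnum giving its position in the data set.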
proc sort data=names;
by varnum;
run;
%let target = default_cus12; * response variable for the model;
proc sql;
select name
from names
where name ne "app_char_job_code" and name ne "app_char_marital_status" and name ne "app_char_city" and name ne "app_char_home_status" and name ne "app_char_cars" and name ne "cid" and name ne "period" and name ne "&target";
quit;
proc sql;
select name into :zmienne separated by ' '
from names
where name ne "app_char_job_code" and name ne "app_char_marital_status" and name ne "app_char_city" and name ne "app_char_home_status" and name ne "app_char_cars" and name ne "cid" and name ne "period" and name ne "&target";
quit;
%put &zmienne;
%macro a(lista);
%do i=1 %to 3;
%put %scan(&lista, &i, %str( )); /* the delimiter goes last */
%end;
%mend;
%a(&zmienne);
%macro gini(in, target, vars);
data gini_tab;
length zmienna $ 50. gini 8.;
delete;
run;
/* count the variables actually passed in, not the rows of names */
%let n = %sysfunc(countw(&vars, %str( )));
%do i=1 %to &n;
%let zm = %scan(&vars, &i, %str( ));
/* the delimiter is the last argument, each pass picks the next variable */
ods output Association=gini;
proc logistic data=&in;
model &target = &zm;
run;
ods output close;
proc sql;
select nValue2 into :g_coef
from gini
where Label2 contains "Somers";
quit;
%put Gini: &zm = &g_coef;
data tmp;
length zmienna $ 50. gini 8.;
zmienna = "&zm";
gini = &g_coef;
run;
proc append base=gini_tab data=tmp force;
run;
%end;
proc sort data=gini_tab;
by descending gini;
run;
%mend;
%gini(src.train, default_cus12, &zmienne);

Related

How do I import a text file without column names into SAS format in SAS Viya?

I have a text file, without column headers, uploaded to Workbench in SAS Viya. I need to convert the file into SAS format and assign the column names myself; I have them in a Word document. I get this code when I use the automated import function in SAS Viya, but I don't know how to assign column names. Any help would be great!
proc sql;
%if %sysfunc(exist(WORK.NEWDATA)) %then %do;
drop table WORK.NEWDATA;
%end;
%if %sysfunc(exist(WORK.NEWDATA,VIEW)) %then %do;
drop view WORK.NEWDATA;
%end;
quit;
FILENAME LOC DISK '/workspace/workbench/myorg/data/home/olddatatoimport.txt';
PROC IMPORT DATAFILE=LOC
DBMS=DLM
OUT=WORK.NEWDATA;
GETNAMES=YES;
RUN;
A RENAME statement can be used to change the default column names assigned by PROC IMPORT when GETNAMES=NO.
Example:
Presume the column names in the Word document were copied and pasted into SAS datalines.
filename onlydata temp;
* create some sample data;
data _null_;
file onlydata dlm=',';
set sashelp.class(obs=3);
put name age sex height weight;
run;
* log the sample data just to get a look-see;
data _null_;
infile onlydata;
input;
put _infile_;
run;
* IMPORT the data only datafile, no names to get!;
* default column names are VAR1 to VAR<n>;
proc import file=onlydata replace out=onlydata dbms=csv;
GETNAMES=NO;
run;
* presume the column names in the Word document are in the correct order
* (matching the data file) and were copy & pasted into the datalines;
data column_names(label="From WORD document");
length name $32;
input name;
datalines;
name
age
sex
height
weight
run;
* Construct the oldname=newname source code pairs for a RENAME statement;
data _null_;
set column_names end=last_column;
length pairs $32000;
retain pairs;
pairs = catx(' ', pairs, catx('=', cats('var',_n_), name));
if last_column then
call symput ('pairs', trim(pairs));
run;
* use PROC DATASETS to modify the header portion of the data set;
* will not rewrite the entire data set;
proc datasets nolist lib=work;
modify onlydata;
rename &pairs;
run;
quit;
%let syslast = onlydata;
[Screenshots: log listing of the sample data; the data set with default PROC IMPORT column names; the data set after the RENAME statement is applied via PROC DATASETS]

How do you in SAS return the list of all columns in all tables in a library that contain a target value?

I am trying to map out fields I see in an application to columns in the source database using SAS EG.
If I search for 'SomeString' or someNumericValue in Library = SomeLibrary,
I want the code to output a table that lists each tableName and ColumnName containing the value searched for.
In pseudocode:
Proc SQL: Select * columns C from all tables in Library L that contain the value or string = 'SomeValue'
A table of the data set names to scan can be created with PROC CONTENTS or, as in the code below, from the Members ODS output of PROC DATASETS. A scanning macro, say %scanner, can be written and invoked for each data set via call execute. The results of the scan, the data set name and the column containing the target, can be appended to an 'all results' table.
Example:
For simplicity it is presumed no data set has more than 10K variables of the target value type -- the code issues a warning if the scanning will be clipped.
Note: Example of string target would be ..., target="Jane", ...
%macro scanner (libname=, memname=, target=20500, flagMax = 10000);
%local type;
%if %qsysfunc(dequote(&target)) = %superq(target) %then
%let type = _numeric_;
%else
%let type = _character_;
data hits(keep=__libname __memname __varname);
array __flag (&flagMax) _temporary_;
set &libname..&memname;
array __candidates &type;
if dim(__candidates) = 0 then stop;
do __index = 1 to min (dim(__candidates), &flagMax);
if not __flag(__index) then
if __candidates(__index) = &target then do;
length __libname $8;
length __memname __varname $32;
__libname = "&libname";
__memname = "&memname";
__varname = vname(__candidates(__index));
__flag(__index) = 1;
OUTPUT;
end;
end;
if _n_ = 1 then
if dim(__candidates) > &flagMax then put "WARNING: &memname has more than &flagMax variables - scanning will be clipped. Increase flagMax=.";
run;
proc append base=hasTarget data=hits(rename=(__libname=libname __memname=memname __varname=varname));
run;
%mend;
proc sql;
create table hasTarget (libname char(8), memname char(32), varname char(32));
quit;
%let libname = SASHELP;
ods noresults;
ods output members=datasets;
proc datasets library=&libname memtype=data;
run;
quit;
ods results;
data _null_;
set datasets(keep=name memtype);
where memtype = 'DATA';
call execute (cats('%nrstr(%scanner(libname=' || "&LIBNAME., " || "memname=", name, '))'));
run;
Nice question; I had wanted to develop this code for myself as well. You can try the following code to find the table names in a library and the exact variable names that hold the required value.
Modified Code
libname temp "Y:\temp\t";
data temp.aa;
a=0;
b=0;
test="String";
run;
data temp.bb;
a=1;
c=0;
d=1;
run;
data temp.cc;
a=0;
b=1;
e=1;
run;
proc sql;
create table info
as
select memname as table, name as column from dictionary.columns
where upcase(type)="NUM" /*upcase(type)="CHAR"*/
and libname='TEMP'
order by memname;
quit;
options merror mprint nosymbolgen nomlogic;
data info1;
length coltab $1000.;
set info;
newtab=catx("_","TEMPT",_n_);
condition=column||"=1"; /*Set Desired value here*/
tab=("'"||table||"' as tab_name");
var=("'"||column||"' as var_name");
coltab="create table "||newtab||" as Select "||column||","||tab||","||var||" from temp."||table|| " where "||condition||";";
run;
proc sql noprint;
select count(*) into: nobs from info1;
quit;
%macro process;
%do i=1 %to &nobs;
Data _null_;
Set info1(firstobs=&i obs=&i);
call symput('query',coltab);
run;
proc sql noprint;
&Query;
quit;
%end;
%mend;
%process;
proc sql noprint;
select distinct memname into :gt separated by " " from dictionary.columns where memname like '%TEMPT%';
quit;
%macro split(var);
%let var_c=%sysfunc(countw(&var));
%do i=1 %to &var_c;
%let var_t=%sysfunc(scan(&var,&i));
proc sql noprint;
select count(*) into :cnt from &var_t;
quit;
%if &cnt=0 %then
%do;
proc datasets lib=work nolist;
delete &var_t;
quit;
run;
%end;
%end;
%mend split;
%split(&gt);
proc sql noprint;
select distinct memname into :gt0 separated by " " from dictionary.columns where memname like '%TEMPT%';
quit;
data all;
set &gt0;
keep tab_name var_name;
run;
proc sort data=all; by tab_name; run;
data final;
length vars $100.;
set all;
by tab_name;
retain vars '';
if first.tab_name then vars=var_name;
else vars=catx(",",vars,var_name);
if last.tab_name;
drop var_name;
run;
proc print data=final; run;
Great challenge! I can get you some of the way there - the mp_searchdata macro of the SASjs macro core library will query all tables in a library (source database) for a string or numeric value. It returns all columns, but will filter for only matching records.
To execute:
/* import library */
filename mc url "https://raw.githubusercontent.com/sasjs/core/main/all.sas";
%inc mc;
/* run macro */
%mp_searchdata(lib=yourlib, string=SomeString)

SAS EG Creating multiple worksheets by ODS

There are two summary tables and one bar chart in my SAS EG project. Can I create an output xls file with multiple worksheets containing the summary tables and the bar chart? I know that ODS tagsets.ExcelXP is not suitable here. Maybe I should use another ODS destination?
I tried this code, but instead of the bar chart I get a blank page:
ods excel file="/sas/user_data/flags/multiple5.xls"
style=pearl
options(
sheet_interval="none"
sheet_name="Sheet1"
);
PROC TABULATE
DATA=SASHELP.APPLIANC
;
VAR units_2;
CLASS units_7 / ORDER=UNFORMATTED MISSING;
TABLE
units_7 *(units_2 * Sum={LABEL="Sum"} )
all = 'Total' *(units_2 * Sum={LABEL="Sum"} ) ;
;
RUN;
ods excel options(sheet_interval='none' sheet_name='Sheet2');
PROC TABULATE
DATA=SASHELP.AARFM
;
VAR lineno;
CLASS key / ORDER=UNFORMATTED MISSING;
TABLE
/* COLUMN Statement */
key *(lineno * Sum={LABEL="Sum"} )
all = 'Total' *(lineno * Sum={LABEL="Sum"} ) ;
;
RUN;
ods excel options(sheet_interval='none' sheet_name='Sheet3');
ods graphics / height=400 width=800 noborder;
PROC GCHART DATA=SASHELP.ADSMSG
;
VBAR
MSGID
/
CLIPREF
FRAME TYPE=FREQ
COUTLINE=BLACK
RAXIS=AXIS1
MAXIS=AXIS2
;
RUN;
ods excel close;
https://communities.sas.com/t5/ODS-and-Base-Reporting/ODS-excel-amp-multiple-sheets/m-p/261953/highlight/true#M15551
ods excel file="C:\elever.xlsx";
ods excel options(sheet_name="SkoleElever" sheet_interval="none");
proc print data=sashelp.class;
run;
proc print data=sashelp.class;
run;
/* Add dummy table */
ods excel options(sheet_interval="table");
ods exclude all;
data _null_;
file print;
put _all_;
run;
ods select all;
ods excel options(sheet_interval="none");
proc tabulate data=sashelp.class;
class age sex;
table age, sex;
run;
proc print data=sashelp.class;
where age=12;
run;
ods EXCEL close;
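The code above starts a new sheet by emitting a throwaway table. A hedged alternative sketch follows, assuming SAS 9.4M5 or later, where sheet_interval="now" is available, and assuming the chart can be drawn with PROC SGPLOT rather than PROC GCHART. The ODS GRAPHICS statement used in the question applies to ODS Graphics procedures such as SGPLOT, not to SAS/GRAPH procedures such as GCHART. The file name and sheet names are placeholders.

```sas
/* Placeholder path: point this at a writable location */
ods excel file="multiple_sheets.xlsx"
    options(sheet_interval="none" sheet_name="Tables");

/* Both tables land on the first sheet */
proc print data=sashelp.class(obs=5);
run;

proc tabulate data=sashelp.class;
    class age sex;
    table age, sex;
run;

/* Start a new sheet immediately (assumes SAS 9.4M5 or later) */
ods excel options(sheet_interval="now" sheet_name="Chart");
ods graphics / height=400px width=800px noborder;

/* Frequency bar chart drawn through ODS Graphics */
proc sgplot data=sashelp.class;
    vbar age;
run;

ods excel close;
```

On older releases the dummy-table approach shown above remains the way to force the sheet break.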

SAS Code that works like Excel's "VLOOKUP" function

I'm looking for SAS code that works just like the "VLOOKUP" function in Excel.
I have two tables:
table_1 has an ID column and some other columns, with 10 rows. table_2 has two columns, ID and Definition, with 50 rows. I want to define a new variable "Definition" in table_1 and look up its values from table_2 by ID.
I haven't really tried anything other than merge, but merge keeps all the extra 40 rows from table_2 and that's not what I want.
Thanks, SE
The simplest way is to use the KEEP= data set option in your MERGE statement.
data result;
merge table_1 (in=a) table_2 (in=b keep=id definition);
by id;
if a;
run;
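As a self-contained illustration of the merge above, here is a minimal sketch with made-up sample data standing in for table_1 and table_2 (the amount column and all values are hypothetical). It includes the sorts that a MERGE with a BY statement requires.

```sas
/* Hypothetical sample data, deliberately unsorted */
data table_1;
    input id amount;
    datalines;
3 30
1 10
2 20
;
run;

data table_2;
    input id definition $;
    datalines;
4 delta
1 alpha
3 gamma
2 beta
;
run;

/* MERGE with BY needs both inputs sorted by the key */
proc sort data=table_1; by id; run;
proc sort data=table_2; by id; run;

data result;
    merge table_1 (in=a) table_2 (keep=id definition);
    by id;
    if a; /* keep only ids present in table_1, so id 4 is dropped */
run;

proc print data=result noobs;
run;
```

The subsetting IF on the in= flag is what makes this behave like a lookup: rows that exist only in table_2 never reach the output.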
An alternative that means you don't have to sort your datasets is to use proc sql.
proc sql;
create table result as
select a.*,
b.definition
from table_1 a
left join table_2 b on a.id = b.id;
quit;
Finally, there is the hash table option if table_2 is small:
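data result;
if 0 then set table_2(keep=id definition); * defines id and definition in the PDV without reading data;
if _n_ = 1 then do;
declare hash b(dataset:'table_2');
b.definekey('id');
b.definedata('definition');
b.definedone();
end;
set table_1;
if b.find() ne 0 then call missing(definition); * no match, so clear the retained value;
run;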
Here is one very useful (and often very fast) method specifically for 1:1 matching, which is what VLOOKUP does. You create a Format or Informat with the match-variable and the lookup-result, and put or input the match-variable in the master table.
data class_income;
set sashelp.class(keep=name);
income = ceil(12*ranuni(7));
run;
data for_format;
set class_income end=eof;
retain fmtname 'INCOMEI';
start=name;
label=income;
type='i'; *i=informat numeric, j=informat character, n=format numeric, c=format character;
output;
if eof then do;
hlo='o'; *hlo contains some flags, o means OTHER for nonmatching records;
start=' ';
label=.;
output;
end;
run;
proc format cntlin=for_format;
quit;
data class;
set sashelp.class;
income = input(name,INCOMEI.);
run;

SAS: Macro variable and string. Correct TableName

This is part of a macro:
%let mvTableName = "MyTable";
proc append base = &mvTableName data = TEMP_TABLE;
run;
But I can't find the table in WORK.
After that I checked whether the table gets created:
data &mvTableName;
run;
And I see in the log: Dataset MyTable ...
But when I change the string to %let mvTableName=MyTable;
I see this in the log: Dataset WORK.MyTable ..
How can this be explained?
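If you are going to use mvTableName as the value of a BASE= or DATA= option, don't include the double quotes. SAS treats a quoted name as a physical file name rather than a libref.member name, so the data set is not created in WORK.
Assuming MyTable and Temp_table are SAS data sets in the WORK library, this should work: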
%Let mvTableName=MyTable;
Proc Append base=&mvTableName data=temp_table;
run;
Also,
Data &mvTableName;
Run;
Creates a data set with no variables, so the table named in mvTableName would be overwritten by it.
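If the quoted value cannot be changed at its source, one option is to strip the quotes before use. A minimal sketch, assuming the value arrives as "MyTable"; mvClean is a hypothetical helper name, and temp_table is created here only so the example runs on its own.

```sas
/* Value as given in the question, quotes included */
%let mvTableName = "MyTable";

/* mvClean is a hypothetical name holding the unquoted value */
%let mvClean = %sysfunc(dequote(&mvTableName));
%put NOTE: mvTableName=&mvTableName mvClean=&mvClean;

/* Stand-in for the TEMP_TABLE of the question */
data temp_table;
    x = 1;
run;

/* BASE= now resolves to a one-level name, that is WORK.MyTable */
proc append base=&mvClean data=temp_table;
run;

/* EXIST returns 1 when the data set is found */
%put NOTE: exist flag for WORK.&mvClean is %sysfunc(exist(work.&mvClean));
```

Because PROC APPEND creates the base data set when it does not yet exist, the final %put should report that WORK.MyTable is present.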
