I have a question about PROC NLIN in SAS.
I'm running the procedure for 10000 simulations. Many of them do not converge and give me wrong results.
I would like to add a binary variable to my output table that flags that this iteration did not converge.
Does anyone know how to do that?
Many thanks,
Perry
You need to use ODS to pull the ConvergenceStatus output from PROC NLIN. Add it to your procedure code like this:
PROC NLIN data = ...;
...;
ods output ConvergenceStatus = conv;
RUN;
That gives you a data set with two variables:
Status (0 means convergence, otherwise 1, 2, or 3 are described here: https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_nlin_sect031.htm).
Reason (description of the convergence issue).
So attach the results of that data set to each simulation round, and create a binary indicator for whether status > 0, and you should be all set.
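As a rough sketch of that last step, assuming the simulations are stacked in one data set and processed with a BY statement: the names sims, sim, x, y, a and b and the model itself are placeholders, not taken from the question. With BY processing, the ConvergenceStatus table carries the BY variable, so it can be merged back onto the estimates.

```sas
/* Hypothetical input: sims, with one BY group per simulation (variable sim). */
ods output ConvergenceStatus=conv ParameterEstimates=est;
proc nlin data=sims;
  by sim;
  parms a=1 b=0.1;            /* placeholder starting values */
  model y = a*exp(b*x);       /* placeholder model */
run;

/* Attach the status to every estimate row and build the binary flag. */
data est_flagged;
  merge est conv(keep=sim Status);
  by sim;
  not_converged = (Status > 0);   /* 1 = did not converge, 0 = converged */
run;
```

Both ODS tables come out in BY-group order, so no extra sort should be needed before the merge.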
I'm trying to do something like this:
get_num_of_ones = "('1'):rep(%d)"
print(get_num_of_ones:format(24))
But I get the following result: ('1'):rep(24), not the digit 1 repeated 24 times.
How can I do it this way so that I get 111...11 (the digit 1 repeated 24 times)?
The simplest, most straightforward, efficient and readable way to achieve what you want is simply to pass your number directly to string.rep; there is no need to format source code here:
get_num_of_ones = ('1'):rep(24)
print(get_num_of_ones)
If there is the artificial constraint that it needs to be achieved by formatting a source code template, you need to use load/loadstring:
get_num_of_ones = "return ('1'):rep(%d)" -- you need the "return" to make it a valid chunk, and to be able to get the result out
local source = get_num_of_ones:format(24) -- formatted source code
local chunk = assert(load(source)) -- load the source code; gives us a "chunk" (function) if there was no syntax error
local retval = chunk()
print(retval)
In SAS, I need a PROC TABULATE where labels are repeated so that it's easier to find them in Excel using INDEX-MATCH. Here is an example with sashelp.cars.
The first PROC TABULATE has the advantage of repeating the labels, which is needed for the INDEX-MATCH. Its flaw, however, is that SAS only gives the non-missing values.
data cars;
set sashelp.cars;
run;
proc sort data=cars;
by make;
run;
This doesn't give all labels. I would like a table with 3 continents by column (Europe, Asia, USA) and every car type (Sedan, SUV, Wagon, Sports...).
PROC TABULATE DATA = cars;
option missing=0;
by make;
CLASS make type Type Origin / mlf MISSING ;
TABLE (
(type*make)
), (Origin='') / printmiss nocellmerge ; RUN;
So, in order to have all 3 continents by column, and every type of car (Sedan, SUV, Wagon, Sports...), I use CLASSDATA, as suggested:
Data level;
set cars;
keep make type Type Origin;
Run;
PROC TABULATE DATA = cars MISSING classdata=level;
option missing=0;
by make;
CLASS make type Type Origin / mlf MISSING ;
TABLE (
(make*type)
), (Origin='') / printmiss nocellmerge ;
RUN;
But this gives a humongous table, and non repeating labels. Is there a midway solution with :
all the columns (3 continents) like in the last table
only the concerned MAKEs, that is the first 6 rows for Acura
repeated labels like in the first PROC TABULATE
Thank you very much,
I advise not exporting the listing of PROC TABULATE to Excel
PROC TABULATE does not repeat values in the first column for each value in the second, because the output is meant for human reading. This is not the tool you need to write data to Excel for further lookup.
I advise not using MATCH but SUMIFS
MATCH is a great function in Excel, but it is not a good choice for your application, because
it gives an error when it does not find what you look for, and that is why you need all labels in your output
it only supports one criterion, so you need at least 3 of them
it returns a position, so you still need an INDEX function.
Therefore, I advise writing a simple create table
PROC sql;
create table TO_EXPORT as
select REGION, MACTIV, DATE, count(*) as cnt
from data
group by REGION, MACTIV, DATE;
quit;
proc export data = TO_EXPORT outfile="&myFolder\&myWorkbook..xlsx" dbms=xlsx replace;
RUN;
You will have your data in Excel in a more data-oriented format.
To retrieve the data, I advise the following type of Excel formula:
=sumifs($D:$D,$A:$A,"13-*",$B:$B,$C:$C,"apr2020")
It sums the counts of all rows whose columns to the left match the criteria you are looking for.
Because at most one row will meet these criteria, it actually just looks up a count you are looking for.
If that count does not exist, it will just return zero.
Disclaimer:
I did not test this code, so if it does not work, leave a comment and I will.
I need help modifying this code from SAS (http://support.sas.com/kb/33/078.html) to be:
Not case sensitive (therefore not overlooking SMITH versus Smith versus smith, I tried "upcase" but it won't work)
Include a counter (so that I can control for either knowing the first time a value appears and if needed, how many times the value appears)
Allow for a partial search (this code only allows for exact match to be searched which means I am missing many possible variables that the value could be defined under)
Thanks! :)
From your comment:
data _null_;
set &librf..&&ds&i;
%do j=1 %to &numvars;
if INDEX(upcase(&&var&j),"&string") >0 then
/*modified this part to satisfy the first and third things that I wanted*/
put "String &string found in dataset &librf..&&ds&i for variable &&var&j"
;
%end;
run;
So just add code to increment the counter. Do you want to count observations or occurrences? That is, if the same observation has multiple hits, does it count as one or multiple?
Counting each hit is easier:
data _null_;
set &librf..&&ds&i;
%do j=1 %to &numvars;
if INDEX(upcase(&&var&j),"&string") >0 then do;
_count+1;
put "String &string found in dataset &librf..&&ds&i for variable &&var&j" _count=;
end;
%end;
run;
Here is how you might count each observation.
data _null_;
set &librf..&&ds&i;
%do j=1 %to &numvars;
if INDEX(upcase(&&var&j),"&string") >0 then do;
_hit=1;
put "String &string found in dataset &librf..&&ds&i for variable &&var&j";
end;
%end;
if _hit then do;
_count+1;
put "Number of observations so far=" _count ;
end;
run;
Assuming you are running the code in the sample, I would change the comparison expression.
I would make it a macro parameter. You can use FIND with the W or C modifiers, a regular expression, etc.
exp=%str(find(&&var&j,'-target-','IT')),
Then use %unquote(&exp) to replace the part underlined in red.
I am having trouble figuring out how to extract specific text within a string. My dataset has been pulled from de-identified electronic health records, and contains a list of every medication that our patients have been prescribed. I am, however, only concerned with a specific list of medications, which I have in another table. Within each cell is the name of the medication, dose, and form (Tabs, Caps, etc.) [See image]. Much of this information is not important for my analysis though, and I only need to extract the medication names that match my list. It might also be useful to extract the first word from each string, as it is (in most cases) the name of the medication.
I have examined a number of different methods of pulling substrings, but haven't quite found something that meets my needs. Any help would be greatly appreciated.
Thanks.
Data DRUGS;
infile datalines flowover;
length drug1-drug69 $20;
array drug[69];
input (drug1-drug69)($);
datalines;
AMITRIPTYLINE
AMOXAPINE
BUPROPION
CITALOPRAM
CLOMIPRAMINE
DESIPRAMINE
DOXEPIN
ESCITALOPRAM
FLUOXETINE
FLUVOXAMINE
IMIPRAMINE
ISOCARBOXAZID
MAPROTILINE
MIRTAZAPINE
NEFAZODONE
NORTRIPTYLINE
PAROXETINE
PHENELZINE
PROTRIPTYLINE
SERTRALINE
TRANYLCYPROMINE
TRAZODONE
TRIMIPRAMINE
VENLAFAXINE
AMITRIP
ELEVIL
ENDEP
LEVATE
ADISEN
AMOLIFE
AMOXAN
AMOXAPINE
DEFANYL
OXAMINE
OXCAP
WELLBUTRIN
BUPROBAN
APLENZIN
BUDEPRION
ZYBAN
CELEXA
ANAFRANIL
NORPRAMIN
SILENOR
PRUDOXIN
ZONALON
LEXAPRO
PROZAC
SARAFEM
LUVOX
TOFRANIL
TOFRANIL-PM
MARPLAN
LUDIOMIL
REMERON
REMERONSOLTAB
PAMELOR
PAXIL
PEXEVA
BRISDELLE
NARDIL
VIVACTIL
ZOLOFT
PARNATE
OLEPTRO
SURMONTIL
EFFEXOR
DESVENLAFAXINE
PRISTIQ
;;;;
run;
Data DM4_;
if _n_=1 then set DRUGS;
array drug[69];
set DM4;
do _i = 1 to countw(Description,' ().,');
_med = scan(Description,_i,' ().,');
_whichmed = whichc(_med, of drug[*]);
if _whichmed > 0 then leave;
end;
run;
Data DM_Meds (drop = drug1-drug69 _i _med _whichmed);
Set DM4_;
IF _whichmed > 0 then anti = _med;
else anti = ' ';
run;
This is a fairly common problem with a bunch of possible solutions depending on your needs.
The simplest answer is to create an array, assuming you have a smallish number of medicines. This isn't necessarily the fastest solution, but it would work fairly well and is simple to construct. Just get your drug list into a dataset, transpose it to horizontal (one row with lots of meds), then load it up this way. You iterate over the words in the name of the medicine and see if any of them are in the medicine list - if they are, then bingo, you have your drug! In real use of course drop the drug: variables afterwards.
This works a bit better than the inverse (searching each drug to see if it's in the medicine name) since usually there are more words in the drug list than in the medicine name. The hash solution might be faster, if you're comfortable with hashes (load the drug list into a hash table then use find() to do the same as what whichc is doing here).
data have;
input #1 medname $50.;
datalines;
PROVIGIL OR
ENSURE HIGH PROTEIN OR LIQD
BENADRYL 25 MG OR CAPS
ECOTRIN LOW STRENGTH 81 MG OR TBEC
SPIRONOLACTONE 25 MG PO TABS
NORVASC 5 MG OR TABS
FLUOXETINE HCL 25MG
IBUPROFEN 200MG
NEFAZODONE TABS OR CAPS 20MG
PAXIL (PAROXETINE HCL) 25MG
;;;;
run;
data drugs;
infile datalines flowover;
length drug1-drug19 $20;
array drug[19];
input (drug1-drug19) ($);
datalines;
AMITRIPTYLINE
AMOXAPINE
BUPROPION
CITALOPRAM
CLOMIPRAMINE
DESIPRAMINE
DOXEPIN
ESCITALOPRAM
FLUOXETINE
FLUVOXAMINE
IMIPRAMINE
ISOCARBOXAZID
MAPROTILINE
MIRTAZAPINE
NEFAZODONE
NORTRIPTYLINE
PAROXETINE
PHENELZINE
PROTRIPTYLINE
;;;;
run;
data want;
if _n_ = 1 then set drugs;
array drug[19];
set have;
do _i = 1 to countw(medname,' ().,');
_medword = scan(medname,_i,' ().,');
_whichmed = whichc(_medword, of drug[*]);
if _whichmed > 0 then leave;
end;
run;
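The hash alternative mentioned above could look roughly like the sketch below. It assumes the drug list is kept in long format (one row per drug) in a data set called druglist with a variable drug; both names, and the three sample rows, are made up for illustration. The data set have is the one defined earlier in this answer.

```sas
/* Assumed long-format drug list (hypothetical name and sample rows). */
data druglist;
  length drug $20;
  input drug $;
  datalines;
FLUOXETINE
NEFAZODONE
PAROXETINE
;
run;

data want_hash;
  length drug matched $20;
  if _n_ = 1 then do;
    /* Load the drug list into a hash table once, keyed on drug. */
    declare hash h(dataset: 'druglist');
    h.defineKey('drug');
    h.defineDone();
  end;
  set have;
  /* Test each word of the medicine name against the hash keys. */
  do _i = 1 to countw(medname, ' ().,');
    drug = scan(medname, _i, ' ().,');
    if h.find() = 0 then do;   /* a return code of 0 means the key was found */
      matched = drug;
      leave;
    end;
  end;
  drop _i drug;
run;
```

The loop is the same as in the array version; only the lookup changes from whichc to a hash find, which avoids carrying the drug1-drugN variables on every row.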
This should be an easy task for PROC SQL.
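Let's say you have patient information in table A and drug names in table B (long format, not the wide format you gave). The code below joins A to B wherever the description in A contains a drug name from B and writes the result to table C. Because it is a LEFT JOIN, rows of A without a match are kept with a missing drug; use an INNER JOIN instead if you only want the matching rows.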
PROC SQL;
CREATE TABLE C AS SELECT DISTINCT *
FROM A LEFT JOIN B
ON UPCASE(A.description) CONTAINS UPCASE(B.drug);
QUIT;
I have this SAS sample code:
data BEFORE;
input v1 v2;
datalines;
1 2
;
data AFTER;
put 'Before IF: ' _ALL_;
if _N_ = 1 then set BEFORE;
put 'After IF : ' _ALL_;
run;
The output is:
BEFORE: v1=. v2=. _ERROR_=0 _N_=1
AFTER : v1=1 v2=2 _ERROR_=0 _N_=1
BEFORE: v1=1 v2=2 _ERROR_=0 _N_=2
AFTER : v1=1 v2=2 _ERROR_=0 _N_=2
And the output file contains:
Obs v1 v2
1 1 2
2 1 2
I know that the SET will import and RETAIN the BEFORE dataset's variables, but why does BEFORE's record get duplicated?
I ran your sample code, and you omitted a crucial piece of information: This message was in the SAS log: "NOTE: DATA STEP stopped due to looping.". Googling on that message led me to a SAS paper describing the error. It suggested not using an IF statement before the SET statement, but to use the OBS= data set option to restrict the number of observations read.
So you would change the line:
if _N_ = 1 then set BEFORE;
to:
set BEFORE(obs=1);
When I ran your code with this change, the "Before IF:" line still printed twice, and I'm not sure why that is so. But the looping NOTE did not occur, so I believe that is the solution.
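A possible explanation for the line printing twice, shown as a small sketch that reuses the data sets from the question (the PUT labels are changed only for readability): a data step ends when SET tries to read past the last available observation, and that attempt only happens in the second iteration, after the first PUT has already run.

```sas
data BEFORE;
  input v1 v2;
  datalines;
1 2
;

data AFTER;
  put 'Before SET: ' _ALL_;   /* runs at the top of every iteration */
  set BEFORE(obs=1);          /* iteration 2: nothing left to read, the step ends here */
  put 'After SET : ' _ALL_;   /* reached only in iteration 1 */
run;
```

Under that reading, the second iteration never reaches the implicit OUTPUT, so AFTER ends up with a single observation.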
SET is an executable statement: unless it is executed, it neither resets variables nor loads the next observation's data when the data step runs. (It does set up or alter the PDV when the data step is compiled, though.) Because of the IF condition, it is executed only once.
The implicit OUTPUT statement at the bottom writes one observation per iteration. SAS, which monitors whether a data step is looping infinitely, stops the data step after the second iteration and generates the note.