Proc Reg Make Prediction With New Observations - statistics

I am trying to create a prediction interval based on a linear model in SAS. My SAS code is
proc reg data=datain.aswells alpha=0.01;
model arsenic = latitude longitude depth_ft / clb;
run;
I wish to make a 95% prediction interval at latitude=23.75467, longitude=90.66169, and depth_ft=25. This data point does not exist in the data set, but it lies within the range of values used to fit the model. Is there an easy way to compute this prediction interval in SAS?

The easiest thing to do is to add it to your input data set with a missing value for ARSENIC. Then use the OUTPUT statement to output the prediction intervals.
Here is an example:
data test;
do i=1 to 100;
x1 = rannor(123);
x2 = rannor(123)*2 + 1;
y = 1*x1 + 2*x2 + 4*rannor(123);
output;
end;
run;
data test;
set test end=last;
output;
if last then do;
y = .;
x1 = 1.5;
x2 = -1;
output;
end;
run;
proc reg data=test alpha=.01;
model y = x1 x2;
output out=test_out(where=(y=.)) p=predicted ucl=UCL_Pred lcl=LCL_Pred;
run;
quit;
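The WHERE= option on the output data set keeps only the observation with the missing response, i.e. the point to be predicted. Remove it to get predicted values and prediction intervals for every observation. Note that ALPHA=.01 produces 99% limits; use ALPHA=.05 (the default) for a 95% interval.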
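For reference, the limits that PROC REG writes through LCL= and UCL= correspond to the standard prediction interval for a single new observation. A sketch of the formula, assuming the usual linear-model notation, none of which appears in the original post: X is the design matrix including the intercept column, x_0 is the new point, n is the number of observations used in the fit, p is the number of estimated coefficients, and s^2 is the mean squared error.

```latex
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\; n-p}\; s \,\sqrt{1 + x_0^{\top} (X^{\top} X)^{-1} x_0},
\qquad s^2 = \frac{\mathrm{SSE}}{n-p}
```

Dropping the leading 1 under the square root gives the narrower confidence interval for the mean response, which is what LCLM= and UCLM= report.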


Verilog code for Clarke and Park transformations

I want to write Verilog code for the Clarke and Park transformations as part of an FOC (field-oriented control) implementation. I am new to Verilog and do not understand how to write code for equations like these, which involve cos and sin functions and real numbers. Can someone please give me a start? The Verilog code I tried to write is below.
`timescale 1ns/1ps
module clarke_park(iR_i,iY_i,iB_i,theta,iD_o,iQ_o);
output real iD_o;
output real iQ_o;
input real iR_i;
input real iY_i;
input real iB_i;
real k = 0.66;
output real ialpha;
output real ibeta;
output real iY_r;//real part
output real iY_c;//complex part
output real iB_r;
output real iB_c;
output real ibeta_r;
output real ibeta_c;
function sin(input real theta);
function cos(input real theta);
iY_r = -1*(iY_i)*(0.5);
iY_c = (iY_i)*(0.866);
iB_r = -1*(iB_i)*(0.5);
iB_c = -1*(iB_i)*(0.866);
ialpha = k*iR;
ibeta_r = k*(0.866)*(iY_r-iB_r);
ibeta_c = k*(0.866)*(iY_c-iB_c);
real a1 = sin(theta);
real a2 = cos(theta);
iD_r = (a1*(ialpha)) + ((sin(theta))*(ibeta_r));
iD_c = a2*(ibeta_c);
iQ_r = - (1*a2*(ialpha)) + (a1*(ibeta_r));
iQ_c = a1*(ibeta_c);
endfunction
assign iD_o = {iD_r,iD_c};
assign iQ_o = {iQ_r,iQ_c};
endmodule
I would start with something like this:
module clarke_park(
output real iD_o,
output real iQ_o,
input real iR_i,
input real iY_i,
input real iB_i,
output real ialpha,
output real ibeta,
output real iY_r,//real part
output real iY_c,//complex part
output real iB_r,
output real iB_c,
output real ibeta_r,
output real ibeta_c
);
localparam k = 0.66;
I am not sure what you are trying to do with the functions, but something like the following should work.
Note that you have not defined theta: it is in your port list but is never declared as an input or as a real.
real a1;
real a2;
real iD_r, iD_c, iQ_r, iQ_c; // intermediate results, not declared in the original code
always @* begin
iY_r = -1*(iY_i)*(0.5);
iY_c = (iY_i)*(0.866);
iB_r = -1*(iB_i)*(0.5);
iB_c = -1*(iB_i)*(0.866);
ialpha = k*iR_i; // the port is named iR_i, not iR
ibeta_r = k*(0.866)*(iY_r-iB_r);
ibeta_c = k*(0.866)*(iY_c-iB_c);
a1 = $sin(theta); // theta must still be declared, e.g. as "input real theta"
a2 = $cos(theta);
iD_r = (a1*(ialpha)) + (a1*(ibeta_r));
iD_c = a2*(ibeta_c);
iQ_r = - (1*a2*(ialpha)) + (a1*(ibeta_r));
iQ_c = a1*(ibeta_c);
end
$cos and $sin are described in section 20.8 of IEEE 1800-2012 (the SystemVerilog standard).
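Before committing the arithmetic to HDL, it can help to check it against a small software reference model. The Python sketch below uses one common textbook convention (the amplitude-invariant Clarke transform followed by a Park rotation). It is not the asker's formulation, which splits terms into real and complex parts, and the function names are made up for illustration.

```python
import math

def clarke(i_a, i_b, i_c):
    """Amplitude-invariant Clarke transform: three phase currents -> (alpha, beta)."""
    i_alpha = (2.0 / 3.0) * (i_a - 0.5 * i_b - 0.5 * i_c)
    i_beta = (i_b - i_c) / math.sqrt(3.0)
    return i_alpha, i_beta

def park(i_alpha, i_beta, theta):
    """Park transform: rotate (alpha, beta) into the d/q frame at angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    i_d = i_alpha * c + i_beta * s
    i_q = -i_alpha * s + i_beta * c
    return i_d, i_q

def clarke_park(i_a, i_b, i_c, theta):
    """Convenience wrapper: phase currents plus angle -> (d, q)."""
    return park(*clarke(i_a, i_b, i_c), theta)
```

With this convention, a balanced three-phase set aligned with theta maps to a constant d component and a zero q component, which makes a convenient test vector for the hardware version.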

Sine wave generation using phase

I'm generating a sine wave using the following method -
sampling_rate = 22050;
theta = 0;
for (i = 0; i < N; i++)
{
theta = phase * 2 * PI;
signal[i] = amplitude * sin(theta);
phase = phase + frequency/sampling_rate;
}
When I generate a signal with a frequency of 8000 Hz, there is distortion in the output. Frequencies below this (e.g. 6000 Hz) are generated correctly. The 8000 Hz signal is generated correctly if I place a check on the phase like so -
if (phase > 1)
{
float temp = phase - 1;
phase = temp;
}
I think it has something to do with the sine function in Xcode, probably a range of values it can accept? The same code with and without the phase wrapping has no difference in Matlab. Can someone explain what's happening here?
I believe the calculation should be (2.0 * PI) * Frequency / Samplerate.
This gives the phase increment per sample in radians. Accumulate these increments and feed the running total into the sin function to compute each sample.
Technically, your first statement is incorrect as worded. FS/2 is the Nyquist frequency; you can generate frequencies above it, but they will alias.
There are different ways to manage phase wrapping.
My understanding is that the accumulated radian value is a 'linear' representation of the phase that keeps growing, whereas the phase itself repeats every 2 pi. So you may not have a wrap issue if you manage phase by accumulating radians.
Happy to be corrected by more knowledgeable folks.
I'm not certain, but I believe the problem might be:
theta = phase * 2 * PI;
I think Xcode will change the result to an integer. You might want to try:
theta = phase * 2.0 * PI;
instead, and make sure your PI variable is a double.
All of which makes this off-topic for DSP.SE. :-)
@cixelsyd has the correct formula. Here is code that creates a set of samples at a given frequency for a given sample rate:
incr_theta := (2.0 * math.Pi * given_freq) / samples_per_second
phase := -1.74 // initial phase offset, typically 0; note that it is a constant
theta := 0.0
for curr_sample := 0; curr_sample < number_of_samples; curr_sample++ {
source_buffer[curr_sample] = math.Sin(theta + phase)
theta += incr_theta
}
For efficiency, it is best to compute the theta increment outside the loop. Notice that phase is a constant, as it only provides an initial offset.
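The wrapped-phase approach from the question can be sketched in Python as below. One plausible reason the wrap helps in the original code (an assumption, not confirmed in the thread) is that a single-precision phase loses resolution as it grows, so keeping it in [0, 1) preserves accuracy. Python floats are double precision, so this sketch only illustrates the technique; the function and parameter names are hypothetical.

```python
import math

def sine_samples(frequency, sample_rate, count, amplitude=1.0):
    """Generate `count` sine samples using a phase accumulator wrapped to [0, 1)."""
    step = frequency / sample_rate   # phase advance per sample, in cycles
    phase = 0.0                      # kept in [0, 1) so it never grows large
    samples = []
    for _ in range(count):
        samples.append(amplitude * math.sin(2.0 * math.pi * phase))
        phase += step
        phase -= math.floor(phase)   # wrap back into [0, 1)
    return samples
```

For example, `sine_samples(8000, 22050, 1000)` produces the 8000 Hz case from the question. Because the increment is computed once and the phase stays bounded, the accuracy of each sample does not depend on how long the generator has been running.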

SAS simplify the contents of a variable

In SAS, I've a variable V containing the following value
V=1996199619961996200120012001
I'd like to create these 2 variables:
V1=19962001 (= different modalities)
V2=42 (= the first modality appears 4 times and the second one appears 2 times)
Any ideas?
Thanks for your help.
Luc
For your first question (if I understand the pattern correctly), you could extract the first four characters and the last four characters:
a = substr(variable, 1, 4);
b = substrn(variable, max(1, length(variable)-3), 4);
You could then concatenate the two:
c = cats(a, b);
For the second, the COUNT function can be used to count occurrences of a string within a string:
http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p02vuhb5ijuirbn1p7azkyianjd8.htm
Hope this helps :)
* Make it a bit more general;
%let modeLength = 4;
%let maxOccur = 100; ** in the input **;
%let maxModes = 10; ** in the output **;
* Where does a certain occurrence start?;
%macro occurStart(occurNo);
&modeLength.*&occurNo.-%eval(&modeLength.-1)
%mend;
* Read the input;
data simplified ;
infile datalines truncover;
input v $%eval(&modeLength.*&maxOccur.).;
* Declare output and work variables;
format what $&modeLength..
v1 $%eval(&modeLength.*&maxModes.).
v2 $&maxModes..;
array w {&maxModes.} $ &modeLength.; ** what (character, so modes are stored as text) **;
array c {&maxModes.}; ** count **;
* Discover unique modes and count them;
countW = 0;
do vNo = 1 to length(v)/&modeLength.;
what = substr(v, %occurStart(vNo), &modeLength.);
do wNo = 1 to countW;
if what eq w(wNo) then do;
c(wNo) = c(wNo) + 1;
goto foundIt;
end;
end;
countW = countW + 1;
w(countW) = what;
c(countW) = 1;
foundIt:
end;
* Report results in v1 and v2;
do wNo = 1 to countW;
substr(v1, %occurStart(wNo), &modeLength.) = w(wNo);
substr(v2, wNo, 1) = put(c(wNo),1.);
put _N_= v1= v2=;
end;
keep v1 v2;
* The data I tested with;
datalines;
1996199619961996200120012001
197019801990
20011996199619961996200120012001
;
run;
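The same logic (split the value into fixed-width chunks, keep the distinct chunks in order of first appearance, and count each one) can be sketched in Python as a quick cross-check. The function name and the mode_length parameter are hypothetical. Like the SAS version, it assumes each count fits in a single digit.

```python
def simplify(value, mode_length=4):
    """Return (distinct modes concatenated, their counts concatenated)."""
    counts = {}  # dicts preserve insertion order in Python 3.7+
    for start in range(0, len(value), mode_length):
        mode = value[start:start + mode_length]
        counts[mode] = counts.get(mode, 0) + 1
    v1 = "".join(counts)                            # keys, in first-seen order
    v2 = "".join(str(n) for n in counts.values())   # matching counts
    return v1, v2
```

Note that for the sample value exactly as written in the question, 2001 appears three times, so this sketch returns "43" rather than "42" for the counts.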

Send subset of data to SAS DS2 thread

I have a dataset with 5 groups and I want to use the DS2 procedure in SAS to concurrently compute group means.
Simulated dataset:
data sim;
call streaminit(7);
do group = 1 to 5;
do pt = 1 to 500;
x = rand('ERLANG', group);
output;
end;
end;
run;
How I envision it working is that each of 5 threads receives a subset of the data corresponding to a particular group. The mean of x is calculated on each subset like so:
proc ds2;
thread t / overwrite=yes;
dcl double n sum mean;
method init();
n = 0;
sum = 0;
mean = .;
end;
method run();
set sim; /* Or perhaps a subsetted dataset */
sum + x;
n + 1;
end;
method term();
mean = sum / n;
output;
end;
endthread;
...
quit;
The problem is, if you call a thread that processes a dataset like below, rows are sent to the 5 threads all willy-nilly (i.e. irrespective of groups).
data test / overwrite=yes;
dcl thread t t_instance;
method run();
set from t_instance threads=5;
end;
enddata;
How can I tell SAS to subset the data by group and pass each subset to its own thread?
I believe you have to add a BY statement inside the run() method, and then add code to handle the BY group (i.e., if you want output at last.group, add code to do so and to clear the totals). DS2 is supposed to be smart and use one thread per BY group (or at least process an entire BY group within a single thread). I'm not sure you will see a great improvement if you're reading from disk (since the threading advantage is probably smaller than the disk read time), but who knows.
The only changes below are in run(), and adding a proc means to check myself.
data sim;
call streaminit(7);
do group = 1 to 5;
do pt = 1 to 500;
x = rand('ERLANG', group);
output;
end;
end;
run;
proc ds2;
thread t / overwrite=yes;
dcl double n sum mean ;
method init();
n = 0;
sum = 0;
mean = .;
end;
method run();
set sim;
by group;
sum + x;
n + 1;
if last.group then do;
mean = sum / n;
output;
n=0;
sum=0;
end;
end;
method term();
end;
endthread;
run;
data test / overwrite=yes;
dcl thread t t_instance;
method run();
set from t_instance threads=5;
end;
enddata;
run;
quit;
proc means data=sim;
class group;
var x;
run;
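As a language-neutral illustration of the idea (partition the rows by group first, then hand each whole group to a worker), here is a minimal Python sketch. It is not DS2 and says nothing about how DS2 schedules BY groups; the function name and the (group, x) row layout are assumptions for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def group_means(rows, max_workers=5):
    """rows: iterable of (group, x) pairs. Returns {group: mean of x}."""
    # Partition first, so that every worker receives one complete group.
    by_group = defaultdict(list)
    for group, x in rows:
        by_group[group].append(x)

    def mean(values):
        return sum(values) / len(values)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        means = pool.map(mean, by_group.values())  # results come back in input order
        return dict(zip(by_group.keys(), means))
```

In CPython the threads here will not speed up pure arithmetic; the sketch only shows the partition-then-dispatch structure that the BY statement is meant to provide.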

Weight factors by eigenvalues from PROC FACTOR in SAS?

I am trying to weight the factors from PROC FACTOR by their eigenvalues, but am having some difficulty. I have a solution, but it seems to me that there should be a more direct way to do this.
** Get factors and eigenvalues;
ods output Eigenvalues=MyEigenVals;
proc factor data=MyData method=principal out=MyData;
var X1 X2 X3 X4 X5 X6;
run;
ods output close;
** Transpose the eigenvalues;
proc transpose data=MyEigenVals out=MyEigenVals(drop=_NAME_) prefix=eigenval;
id Number;
var Eigenvalue;
run;
** Merge the data and fill down the eigenvalues;
data MyData;
merge MyData MyEigenVals;
retain E1 E2 E3 E4 E5 E6;
if _n_=1 then do;
E1 = eigenval1;
E2 = eigenval2;
E3 = eigenval3;
E4 = eigenval4;
E5 = eigenval5;
E6 = eigenval6;
end;
** weight each factor by its eigenvalue;
factor1 = factor1 * E1;
factor2 = factor2 * E2;
factor3 = factor3 * E3;
factor4 = factor4 * E4;
factor5 = factor5 * E5;
factor6 = factor6 * E6;
run;
As you can see this does not seem to be a very direct way of accomplishing my task. Can anyone here help me fix this up nicely? Is it even possible?
You definitely could combine it more efficiently; at minimum, you can simplify the last DATA step.
data mydata;
if _n_=1 then set MyEigenVals;
set mydata;
array factor[6];
array Eigenval[6];
do _i = 1 to dim(factor);
factor[_i] = factor[_i]*eigenval[_i];
end;
run;
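Variables read with SET are automatically retained, so no RETAIN statement is needed: the eigenvalues read on the first iteration remain available for every row of mydata.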
Secondly, you may be able to skip the multiplication depending on how you're using the results. You might be able to use a weight statement to use the eigenvalues as weights - depending on what procedures you're using to later analyze the data. I don't know if that buys you much, but it could save you from modifying the original value which might be preferable.
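The weighting step itself is just a column-wise scaling of the factor scores. A minimal Python sketch, assuming the scores are held as a list of rows and the eigenvalues as a list in the same factor order (both layouts and the function name are hypothetical):

```python
def weight_scores(scores, eigenvalues):
    """Multiply each factor score by the eigenvalue of its factor.

    scores: list of rows, one factor score per column.
    eigenvalues: one eigenvalue per column, in the same order.
    """
    if any(len(row) != len(eigenvalues) for row in scores):
        raise ValueError("each row needs exactly one score per eigenvalue")
    return [[s * e for s, e in zip(row, eigenvalues)] for row in scores]
```

Returning a new list rather than modifying the input keeps the unweighted scores available, which matches the point above about not overwriting the original values.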
