Send subset of data to SAS DS2 thread - multithreading

I have a dataset with 5 groups and I want to use the DS2 procedure in SAS to concurrently compute group means.
Simulated dataset:
data sim;
call streaminit(7);
do group = 1 to 5;
do pt = 1 to 500;
x = rand('ERLANG', group);
output;
end;
end;
run;
How I envision it working is that each of 5 threads receives a subset of the data corresponding to a particular group. The mean of x is calculated on each subset like so:
proc ds2;
thread t / overwrite=yes;
dcl double n sum mean;
method init();
n = 0;
sum = 0;
mean = .;
end;
method run();
set sim; /* Or perhaps a subsetted dataset */
sum + x;
n + 1;
end;
method term();
mean = sum / n;
output;
end;
endthread;
...
quit;
The problem is, if you call a thread that processes a dataset like below, rows are sent to the 5 threads all willy-nilly (i.e. irrespective of groups).
data test / overwrite=yes;
dcl thread t t_instance;
method run();
set from t_instance threads=5;
end;
enddata;
How can I tell SAS to subset the data by group and pass each subset to its own thread?

I believe you have to add a BY statement inside the run() method, and then add some code to deal with the BY group (i.e., if you want output at last.group, add code to do so and then clear the totals). DS2 is supposed to be smart and use one thread per BY group (or, at least, process an entire BY group within a single thread). I'm not sure you will see a great improvement if you're reading from disk, since the gain from threading is probably smaller than the disk read time, but it is worth testing.
The changes below are in run() (term() is now empty), plus a PROC MEANS step added to check the results.
data sim;
call streaminit(7);
do group = 1 to 5;
do pt = 1 to 500;
x = rand('ERLANG', group);
output;
end;
end;
run;
proc ds2;
thread t / overwrite=yes;
dcl double n sum mean ;
method init();
n = 0;
sum = 0;
mean = .;
end;
method run();
set sim;
by group;
sum + x;
n + 1;
if last.group then do;
mean = sum / n;
output;
n=0;
sum=0;
end;
end;
method term();
end;
endthread;
run;
data test / overwrite=yes;
dcl thread t t_instance;
method run();
set from t_instance threads=5;
end;
enddata;
run;
quit;
proc means data=sim;
class group;
var x;
run;

Display query in ABB HMI using software panel builder 600

I am using an ABB HMI and programming it in Panel Builder 600. I use meters to display angles, with the scale set from -100 to +100. Displaying the angles works, but the angle changes very frequently and the needle of the meter gets out of control. For example, the angle is 5 degrees, then it suddenly increases to 10 degrees and then drops to 3 degrees again, all within a very short span of time, and the needle jumps around. What should I do to resolve this? I am using an ABB PLC and writing my code in CODESYS in the CFC language. Thanks in advance for any helpful replies.
Decreasing Sampling Rate
VAR
plcValue: INT; // this value changes a lot
hmiValue: INT := plcValue; // this value is sent to the HMI to be displayed
sampleRate: TIME := T#2S; // hmiValue will change every 2 seconds
timer: TON; // the timer
END_VAR
timer(IN := TRUE, PT := sampleRate);
IF (timer.Q) THEN
hmiValue := plcValue;
timer(IN := FALSE, PT := sampleRate); // reset
END_IF
Moving Average
VAR CONSTANT
SIZE: INT := 100; // the number of values to average
END_VAR
VAR
plcValue: INT; // this value changes a lot
hmiValue: INT := plcValue; // this value is sent to the HMI to be displayed
movingAverage: ARRAY [0..SIZE] OF INT; // last SIZE number of values of plcValue
maIndex: INT := 0;
maFilled: BOOL;
sum: REAL;
i: INT;
END_VAR
movingAverage[maIndex] := plcValue;
sum := 0;
IF (maFilled) THEN
FOR i := 0 TO SIZE DO
sum := sum + movingAverage[i];
END_FOR
hmiValue := REAL_TO_INT(sum / (SIZE + 1)); // the array [0..SIZE] holds SIZE + 1 values
ELSE
FOR i := 0 TO maIndex DO
sum := sum + movingAverage[i];
END_FOR
hmiValue := REAL_TO_INT(sum / (maIndex + 1));
END_IF
IF (maIndex = SIZE) THEN
maIndex := 0;
maFilled := TRUE;
ELSE
maIndex := maIndex + 1;
END_IF
Comparison
Running this test code, which cycles plcValue through 5, 10 and 3:
IF (plcValue = 5) THEN
plcValue := 10;
ELSIF (plcValue = 10) THEN
plcValue := 3;
ELSE
plcValue := 5;
END_IF
With the reduced sampling rate, hmiValue still jumps every 2 seconds (or whatever sampleRate is set to), while the moving average settles at 6. That usually makes the moving average the preferred option of the two, even though it needs a little more code and is slower to execute (which shouldn't matter unless you are computing thousands of averages every cycle). You can also change the average size: the bigger it is, the smoother the value, but also the slower it reacts to change, so try not to make it too big.
You can use some of the blocks from the OSCAT library (a free third-party library; you need to download it if you want to use it). I know you work in CFC and perhaps you are not familiar with ST, but ST is the clearest way to show how to solve your task.
FADE
This block slowly changes the output from one value to another.
PROGRAM PLC_PRG
VAR
iValue: INT(-100..100); (* Value input *)
iGauge: INT(-100..100); (* Smoothed Value for HMI *)
fbFade: FADE; (* fade block *)
END_VAR
(* Play with TF parameter to achieve desired smoothness *)
fbFade(IN1 := INT_TO_REAL(iValue), IN2 := INT_TO_REAL(iGauge), F := FALSE, TF := T#500MS);
iGauge := REAL_TO_INT(fbFade.Y);
END_PROGRAM
FILTER_I
This block averages value for a given time interval. FILTER_I is a filter of the first degree for 16-bit INT data.
PROGRAM PLC_PRG
VAR
iValue: INT(-100..100); (* Value input *)
iGauge: INT(-100..100); (* Smoothed Value for HMI *)
fbFilter: FILTER_I; (* filter block *)
END_VAR
(* Play with T parameter to achieve desired smoothness *)
fbFilter(X := iValue, T := T#500MS, Y => iGauge);
END_PROGRAM
FILTER_MAV_W
Another filter, like the example @Guiorgy gave, is based not on time but on the number of values stored. This is called an MA (moving average).
PROGRAM PLC_PRG
VAR
iValue: INT(-100..100); (* Value input *)
iGauge: INT(-100..100); (* Smoothed Value for HMI *)
fbFilter: FILTER_MAV_W; (* filter block *)
END_VAR
(* Play with N parameter to achieve desired smoothness *)
fbFilter(X := INT_TO_WORD(iValue), N := INT#32);
iGauge := WORD_TO_INT(fbFilter.Y);
END_PROGRAM

Point in polygon hit test algorithm

I need to test if a point hits a polygon with holes and isles. I'd like to understand how I'm supposed to do this. That's not documented and I can't find any explanation or examples.
What I do is count +1 for every outer polygon hit and -1 for every inner polygon hit. The resulting sum is:
> 0: hit;
<= 0: miss (outside or in a hole).
The HitData class separates the paths by orientation (outer vs. inner) up front, so the orientation does not have to be recomputed on every test. With Clipper.PointInPolygon() applied to every path, the sum is easy to compute.
But there are two major drawbacks:
I have to apply Clipper.PointInPolygon() to EVERY path;
I can't leverage the hierarchy of PolyTree.
Can someone who has hands-on experience with Clipper (@angus-johnson?) clear up this confusion?
Again, my question is: how am I supposed to implement this? Am I re-inventing the wheel, while there's an actual solution readily available in the Clipper Library?
Side note: PolyTree still requires testing EVERY path to determine which PolyNode the point is in. There's no Clipper.PointInPolyTree() method and thus, AFAIK, PolyTree doesn't help.
The structure that separates outer and inner polygons:
public class HitData
{
public List<List<IntPoint>> Outer, Inner;
public HitData(List<List<IntPoint>> paths)
{
Outer = new List<List<IntPoint>>();
Inner = new List<List<IntPoint>>();
foreach (List<IntPoint> path in paths)
{
if (Clipper.Orientation(path))
{
Outer.Add(path);
} else {
Inner.Add(path);
}
}
}
}
And this is the algorithm that tests a point:
public static bool IsHit(HitData data, IntPoint point)
{
int hits;
hits = 0;
foreach (List<IntPoint> path in data.Outer)
{
if (Clipper.PointInPolygon(point, path) != 0)
{
hits++;
}
}
foreach (List<IntPoint> path in data.Inner)
{
if (Clipper.PointInPolygon(point, path) != 0)
{
hits--;
}
}
return hits > 0;
}
"Can someone who has hands-on experience with Clipper (@angus-johnson?) clear up this confusion?"
It's not clear to me what your confusion is. As you've correctly observed, the Clipper library does not provide a function to determine whether a point is inside multiple paths.
Edit (13 Sept 2019):
OK, I've now created a PointInPaths function (in Delphi Pascal) that determines whether a point is inside multiple paths. Note that this function accommodates the different polygon filling rules.
function CrossProduct(const pt1, pt2, pt3: TPointD): double;
var
x1,x2,y1,y2: double;
begin
x1 := pt2.X - pt1.X;
y1 := pt2.Y - pt1.Y;
x2 := pt3.X - pt2.X;
y2 := pt3.Y - pt2.Y;
result := (x1 * y2 - y1 * x2);
end;
function PointInPathsWindingCount(const pt: TPointD;
const paths: TArrayOfArrayOfPointD): integer;
var
i,j, len: integer;
p: TArrayOfPointD;
prevPt: TPointD;
isAbove: Boolean;
crossProd: double;
begin
//nb: returns MaxInt ((2^31)-1) when pt is on a line
Result := 0;
for i := 0 to High(paths) do
begin
j := 0;
p := paths[i];
len := Length(p);
if len < 3 then Continue;
prevPt := p[len-1];
while (j < len) and (p[j].Y = prevPt.Y) do inc(j);
if j = len then continue;
isAbove := (prevPt.Y < pt.Y);
while (j < len) do
begin
if isAbove then
begin
while (j < len) and (p[j].Y < pt.Y) do inc(j);
if j = len then break
else if j > 0 then prevPt := p[j -1];
crossProd := CrossProduct(prevPt, p[j], pt);
if crossProd = 0 then
begin
result := MaxInt;
Exit;
end
else if crossProd < 0 then dec(Result);
end else
begin
while (j < len) and (p[j].Y > pt.Y) do inc(j);
if j = len then break
else if j > 0 then prevPt := p[j -1];
crossProd := CrossProduct(prevPt, p[j], pt);
if crossProd = 0 then
begin
result := MaxInt;
Exit;
end
else if crossProd > 0 then inc(Result);
end;
inc(j);
isAbove := not isAbove;
end;
end;
end;
function PointInPaths(const pt: TPointD;
const paths: TArrayOfArrayOfPointD; fillRule: TFillRule): Boolean;
var
wc: integer;
begin
wc := PointInPathsWindingCount(pt, paths);
case fillRule of
frEvenOdd: result := Odd(wc);
frNonZero: result := (wc <> 0);
end;
end;
With regard to leveraging the PolyTree structure:
The top nodes in PolyTree are outer nodes that together contain every (nested) polygon. So you'll only need to perform PointInPolygon on these top nodes until a positive result is found. Then repeat PointInPolygon on that node's nested paths (if any), looking for a positive match there. Obviously, when an outer node fails the PointInPolygon test, its nested nodes (polygons) will fail too. Outer nodes increment the winding count and inner holes decrement it.
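The traversal described above could be sketched as follows. This is a Go sketch, not Clipper code: Node is a hypothetical stand-in for a PolyNode, inPolygon is a plain even-odd ray-casting test rather than Clipper.PointInPolygon, and the early returns assume (as in a PolyTree produced by a clipping operation) that sibling polygons do not overlap. Points lying exactly on an edge are not treated specially.

```go
package main

import "fmt"

// Point is a 2D coordinate.
type Point struct{ X, Y float64 }

// Node is a hypothetical stand-in for a PolyNode: an outer contour whose
// children are holes, or a hole whose children are islands (outers again).
type Node struct {
	Contour  []Point
	Children []*Node
}

// inPolygon is an even-odd ray-casting test (an assumption of this sketch,
// used in place of Clipper.PointInPolygon).
func inPolygon(pt Point, poly []Point) bool {
	inside := false
	for i, j := 0, len(poly)-1; i < len(poly); j, i = i, i+1 {
		a, b := poly[i], poly[j]
		// Does edge a-b straddle the horizontal line through pt,
		// and does it cross that line to the right of pt?
		if (a.Y > pt.Y) != (b.Y > pt.Y) &&
			pt.X < (b.X-a.X)*(pt.Y-a.Y)/(b.Y-a.Y)+a.X {
			inside = !inside
		}
	}
	return inside
}

// hitOuters reports whether pt lies in the filled area of any outer node.
func hitOuters(outers []*Node, pt Point) bool {
	for _, outer := range outers {
		if !inPolygon(pt, outer.Contour) {
			continue // nested nodes cannot contain pt either
		}
		for _, hole := range outer.Children {
			if inPolygon(pt, hole.Contour) {
				// Inside a hole: only an island nested in it can still be hit.
				return hitOuters(hole.Children, pt)
			}
		}
		return true // inside the outer and in none of its holes
	}
	return false
}

// square builds an axis-aligned square contour from (lo, lo) to (hi, hi).
func square(lo, hi float64) []Point {
	return []Point{{lo, lo}, {hi, lo}, {hi, hi}, {lo, hi}}
}

func main() {
	island := &Node{Contour: square(4, 6)}
	hole := &Node{Contour: square(2, 8), Children: []*Node{island}}
	outer := &Node{Contour: square(0, 10), Children: []*Node{hole}}
	for _, pt := range []Point{{1, 1}, {3, 3}, {5, 5}, {11, 11}} {
		fmt.Println(pt, hitOuters([]*Node{outer}, pt))
	}
}
```

Compared with testing every path, this only descends into the one branch of the tree that contains the point, so most nested paths are never tested.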

SAS Index on Array

I am trying to search for a keyword in a description field (descr) and, if it is there, flag that record as a match (which keyword it matches on is not important). I am having an issue where the do loop is going through all entries of the array. I am not sure if this is because my do loop is incorrect or because my index command is incorrect.
data JE.KeywordMatchTemp1;
set JE.JEMasterTemp;
if _n_ = 1 then do;
do i = 1 by 1 until (eof);
set JE.KeyWords end=eof;
array keywords[100] $30 _temporary_;
keywords[i] = Key_Words;
end;
end;
match = 0;
do i = 1 to 100 until(match=1);
if index(descr, keywords[i]) then match = 1;
end;
drop i;
run;
Add another condition to your DO loop to have it terminate when any match is found. You might want to also remember how many entries are in the array. Also make sure to use INDEX() function properly.
data JE.KeywordMatchTemp1;
if _n_ = 1 then do;
do i = 1 by 1 until (eof);
set JE.KeyWords end=eof;
array keywords[100] $30 _temporary_;
keywords[i] = Key_Words;
end;
last_i = i ;
retain last_i ;
end;
set JE.JEMasterTemp;
match = 0;
do i = 1 to last_i while (match=0) ;
if index(descr, trim(keywords[i]) ) then match = 1;
end;
drop i last_i;
run;
You have two problems, both of which would be easy to see in a small, compact example (suggestion: put an example like this in your question in the future).
data partials;
input keyword $;
datalines;
home
auto
car
life
whole
renter
;;;;
run;
data master;
input #1 description $50.;
datalines;
Mutual Fund
State Farm Automobile Insurance
Checking Account
Life Insurance with Geico
Renter's Insurance
;;;;
run;
data want;
set master;
array keywords[100] $ _temporary_;
if _n_=1 then do;
do _i = 1 by 1 until (eof);
set partials end=eof;
keywords[_i] = keyword;
end;
end;
match=0;
do _m = 1 to dim(keywords) while (match=0 and keywords[_m] ne ' ');
if find(lowcase(description),lowcase(keywords[_m]),1,'t') then match=1;
end;
run;
Two things to look at here. First, notice the addition to the while condition. This guarantees we never try to match " " (which will always match if you have any spaces in your strings). The second is the t modifier in find, which trims trailing blanks from both arguments (note that I had to add the 1 for the start position; for some reason the version without it doesn't work, at least for me). Otherwise it looks for "auto " instead of "auto".

SAS simplify the contents of a variable

In SAS, I have a variable V containing the following value:
V=1996199619961996200120012001
I'd like to create these 2 variables:
V1=19962001 (= different modalities)
V2=43 (= the first modality appears 4 times and the second one appears 3 times)
Any ideas?
Thanks for your help.
Luc
For your first question (if I understand the pattern correctly), you could extract the first four characters and the last four characters:
a = substr(variable, 1,4);
b = substrn(variable,max(1,length(variable)-3),4);
You could then concatenate the two.
c = cats(a,b);
For the second, the COUNT function can be used to count occurrences of a string within a string:
http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p02vuhb5ijuirbn1p7azkyianjd8.htm
Hope this helps :)
* Make it a bit more general;
%let modeLength = 4;
%let maxOccur = 100; ** in the input **;
%let maxModes = 10; ** in the output **;
* Where does a certain occurrence start?;
%macro occurStart(occurNo);
&modeLength.*&occurNo.-%eval(&modeLength.-1)
%mend;
* Read the input;
data simplified ;
infile datalines truncover;
input v $%eval(&modeLength.*&maxOccur.).;
* Declare output and work variables;
format what $&modeLength..
v1 $%eval(&modeLength.*&maxModes.).
v2 $&maxModes..;
array w {&maxModes.}; ** what **;
array c {&maxModes.}; ** count **;
* Discover unique modes and count them;
countW = 0;
do vNo = 1 to length(v)/&modeLength.;
what = substr(v, %occurStart(vNo), &modeLength.);
do wNo = 1 to countW;
if what eq w(wNo) then do;
c(wNo) = c(wNo) + 1;
goto foundIt;
end;
end;
countW = countW + 1;
w(countW) = what;
c(countW) = 1;
foundIt:
end;
* Report results in v1 and v2;
do wNo = 1 to countW;
substr(v1, %occurStart(wNo), &modeLength.) = w(wNo);
substr(v2, wNo, 1) = put(c(wNo),1.);
put _N_= v1= v2=;
end;
keep v1 v2;
* The data I tested with;
datalines;
1996199619961996200120012001
197019801990
20011996199619961996200120012001
;
run;

Go thread deadlock error - what is the correct way to use go routines?

I am writing a program that calculates a Riemann sum based on user input. The program will split the function into 1000 rectangles (yes I know I haven't gotten that math in there yet) and sum them up and return the answer. I am using go routines to compute the 1000 rectangles but am getting an
fatal error: all goroutines are asleep - deadlock!
What is the correct way to handle multiple go routines? I have been looking around and haven't seen an example that resembles my case? I'm new and want to adhere to standards. Here is my code (it is runnable if you'd like to see what a typical use case of this is - however it does break)
package main
import "fmt"
import "time"
//Data type to hold 'part' of function; ie. "4x^2"
type Pair struct {
coef, exp int
}
//Calculates the y-value of a 'part' of the function and writes this to the channel
func calc(c *chan float32, p Pair, x float32) {
val := x
//Raise our x value to the power, contained in 'p'
for i := 1; i < p.exp; i++ {
val = val * val
}
//Read existing answer from channel
ans := <-*c
//Write new value to the channel
*c <- float32(ans + (val * float32(p.coef)))
}
var c chan float32 //Channel
var m map[string]Pair //Map to hold function 'parts'
func main() {
c = make(chan float32, 1001) //Buffered at 1001
m = make(map[string]Pair)
var counter int
var temp_coef, temp_exp int
var check string
var up_bound, low_bound float32
var delta float32
counter = 1
check = "default"
//Loop through as long as we have no more function 'parts'
for check != "n" {
fmt.Print("Enter the coefficient for term ", counter, ": ")
fmt.Scanln(&temp_coef)
fmt.Print("Enter the exponent for term ", counter, ": ")
fmt.Scanln(&temp_exp)
fmt.Print("Do you have more terms to enter (y or n): ")
fmt.Scanln(&check)
fmt.Println("")
//Put data into our map
m[string(counter)] = Pair{temp_coef, temp_exp}
counter++
}
fmt.Print("Enter the lower bound: ")
fmt.Scanln(&low_bound)
fmt.Print("Enter the upper bound: ")
fmt.Scanln(&up_bound)
//Calculate the delta; ie. our x delta for the riemann sum
delta = (float32(up_bound) - float32(low_bound)) / float32(1000)
//Make our go routines here to add
for i := low_bound; i < up_bound; i = i + delta {
//'counter' is indicative of the number of function 'parts' we have
for j := 1; j < counter; j++ {
//Go routines made here
go calc(&c, m[string(j)], i)
}
}
//Wait for the go routines to finish
time.Sleep(5000 * time.Millisecond)
//Read the result?
ans := <-c
fmt.Print("Answer: ", ans)
}
It deadlocks because both calc() and main() read from the channel before anyone gets to write to it.
So every (non-main) goroutine ends up blocking at:
ans := <-*c
waiting for some other goroutine to put a value into the channel. Therefore none of them reaches the next line, where they would actually write to the channel. And the main() goroutine blocks at:
ans := <-c
Everyone is waiting = deadlock.
Using buffered channels
Your solution should have the calc() function only write to the channel, while main() reads from it in a for-range loop, summing up the values coming from the goroutines.
You will also need a way for main() to know when no more values will arrive, perhaps by using a sync.WaitGroup (maybe not the best fit, since main isn't supposed to just wait but rather to sum things up) or an ordinary counter.
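A minimal sketch of that channel-based approach, under a few assumptions: Term, evalTerm and riemannSum are hypothetical names (not from the question's code), exponentiation uses math.Pow instead of the question's loop, and left-endpoint rectangles are used. Workers only send; a helper goroutine closes the channel once the WaitGroup reports that all workers are done, which is what lets the for-range loop terminate.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// Term is one part of a polynomial, e.g. 4x^2 (hypothetical name).
type Term struct {
	Coef, Exp int
}

// evalTerm returns Coef * x^Exp.
func evalTerm(t Term, x float64) float64 {
	return float64(t.Coef) * math.Pow(x, float64(t.Exp))
}

// riemannSum approximates the integral of the polynomial over [lo, hi]
// with n left-endpoint rectangles, one goroutine per rectangle.
func riemannSum(terms []Term, lo, hi float64, n int) float64 {
	delta := (hi - lo) / float64(n)
	results := make(chan float64, n)
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(x float64) {
			defer wg.Done()
			y := 0.0
			for _, t := range terms {
				y += evalTerm(t, x)
			}
			results <- y * delta // workers only ever write
		}(lo + float64(i)*delta)
	}

	// Close the channel once every worker has finished, so the
	// range loop below knows that no more values will arrive.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0.0
	for area := range results { // only this goroutine reads
		total += area
	}
	return total
}

func main() {
	// Integral of x^2 over [0, 3]; the exact value is 9.
	fmt.Println(riemannSum([]Term{{Coef: 1, Exp: 2}}, 0, 3, 1000))
}
```

Because the summing happens in the reading loop, main is not merely waiting on the WaitGroup; the WaitGroup is only used to decide when to close the channel.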
Using shared memory
Sometimes a channel is not necessarily what you need. A shared value that you update with the sync/atomic package (note that atomic add doesn't work on floats) or guard with a sync.Mutex works fine too.
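The mutex variant could look roughly like this. It is only a sketch: safeSum and sumConcurrently are hypothetical names, and a float64 total is guarded by a sync.Mutex because sync/atomic has no floating-point add.

```go
package main

import (
	"fmt"
	"sync"
)

// safeSum is a float64 accumulator guarded by a mutex (hypothetical helper).
type safeSum struct {
	mu    sync.Mutex
	total float64
}

// add adds v to the total while holding the lock.
func (s *safeSum) add(v float64) {
	s.mu.Lock()
	s.total += v
	s.mu.Unlock()
}

// sumConcurrently adds each value from its own goroutine and returns
// the total once all goroutines have finished.
func sumConcurrently(values []float64) float64 {
	var s safeSum
	var wg sync.WaitGroup
	for _, v := range values {
		wg.Add(1)
		go func(v float64) {
			defer wg.Done()
			s.add(v)
		}(v)
	}
	wg.Wait() // after this, no goroutine touches s any more
	return s.total
}

func main() {
	fmt.Println(sumConcurrently([]float64{1, 2, 3, 4}))
}
```

Here a WaitGroup fits naturally, since the caller really does just wait for the workers before reading the shared total.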
