Conditionally retaining variables in SAS

I have this SAS sample code:
data BEFORE;
input v1 v2;
datalines;
1 2
;
data AFTER;
put 'Before IF: ' _ALL_;
if _N_ = 1 then set BEFORE;
put 'After IF : ' _ALL_;
run;
The log shows:
Before IF: v1=. v2=. _ERROR_=0 _N_=1
After IF : v1=1 v2=2 _ERROR_=0 _N_=1
Before IF: v1=1 v2=2 _ERROR_=0 _N_=2
After IF : v1=1 v2=2 _ERROR_=0 _N_=2
And the output data set AFTER contains:
Obs v1 v2
1 1 2
2 1 2
I know that the SET statement brings in BEFORE's variables and RETAINs them, but why does BEFORE's record get duplicated?

I ran your sample code, and you omitted a crucial piece of information: This message was in the SAS log: "NOTE: DATA STEP stopped due to looping.". Googling on that message led me to a SAS paper describing the error. It suggested not using an IF statement before the SET statement, but to use the OBS= data set option to restrict the number of observations read.
So you would change the line:
if _N_ = 1 then set BEFORE;
to:
set BEFORE(obs=1);
When I ran your code with this change, the "Before IF:" line still printed twice, and I'm not sure why that is so. But the looping NOTE did not occur, so I believe that is the solution.

SET is an executable statement: unless it actually executes during a DATA step iteration, it neither resets variables nor loads the next observation's data. (At compile time, though, it does set up the PDV with BEFORE's variables.) Because of the IF condition, it executes only once, on the first iteration, and the values it loaded simply stay in the PDV.
The implicit OUTPUT at the bottom of the step writes one observation per iteration, so the second iteration writes those same values again. Normally the step would end when SET tries to read past the last observation, but here SET never executes again. SAS, watching for a DATA step that loops without reading any input, stops the step after the second iteration and writes the note.
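To make these rules concrete, here is a toy Python model of the DATA step's implicit loop. It is not SAS: the function name, the dict standing in for the PDV, and the log tuples are assumptions for illustration, and the looping check is a simplified version of what SAS does. It encodes the rules described above: values loaded by SET stay in the PDV, the implicit OUTPUT writes one observation per iteration, and the step ends either when an executing SET finds no more observations or when an iteration reads no input.

```python
MISSING = None  # stands in for SAS's missing value "."


def run_data_step(before_rows, read_only_first):
    """Toy model (not SAS) of a DATA step whose body is: PUT; [IF _N_=1 THEN] SET; PUT.

    read_only_first=True  models  'if _N_ = 1 then set BEFORE;'
    read_only_first=False models  'set BEFORE(obs=1);' when before_rows has one row.
    Returns (output observations, log entries).
    """
    pdv = {"v1": MISSING, "v2": MISSING}  # SET variables are retained between iterations
    output, log = [], []
    next_row = 0
    n = 0
    while True:
        n += 1  # _N_
        log.append(("Before IF", n, dict(pdv)))
        read_input = False
        if not read_only_first or n == 1:  # does the SET statement execute?
            if next_row >= len(before_rows):  # SET past the last observation ends the step
                log.append(("end of input", n))
                break
            pdv.update(before_rows[next_row])
            next_row += 1
            read_input = True
        log.append(("After IF", n, dict(pdv)))
        output.append(dict(pdv))  # implicit OUTPUT at the bottom of the step
        if not read_input:  # simplified looping check: this iteration read nothing
            log.append(("stopped due to looping", n))
            break
    return output, log


obs, log = run_data_step([{"v1": 1, "v2": 2}], read_only_first=True)
print(obs)  # [{'v1': 1, 'v2': 2}, {'v1': 1, 'v2': 2}]
```

Running the same model with read_only_first=False (the set BEFORE(obs=1) variant) shows why "Before IF:" still prints twice there: the second iteration starts and runs the first PUT, and only then does SET hit the end of the data and end the step, before any second observation is written.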

Related

Python Warning Panda Dataframe "Simple Issue!" - "A value is trying to be set on a copy of a slice from a DataFrame"

first post / total Python novice so be patient with my slow understanding!
I have a dataframe containing a list of transactions by order of transaction date.
I've appended a new column called ["DB/CR"] that is populated with 'Debit' if the ["Amount"] field contains "-", and with 'Credit' otherwise.
Since the transactions are in date order, I've added another new column called ["Top x"], which I want to populate with an incrementing number (starting at 1), counted separately for debits and credits.
To do this, I created a simple loop with an 'if' / 'elif' statement (I could probably use else, as it's binary) that goes through the dataframe from row 0 to the last row and, depending on whether the row is 1) "Debit" or 2) "Credit", increments a separate counter for each: 'i' for debits and 'ii' for credits.
The code works as expected in terms of the 'Top x' output; however, I always receive the warning "A value is trying to be set on a copy of a slice from a DataFrame".
I'm trying to get my script to run without any warnings, and I've been trying to understand what I'm doing wrong, but I can't see how the explanation applies to my use case.
I'd appreciate it if someone could shed light on how the code needs to be refactored to avoid this warning.
Code (the df source data is an imported csv):
#top x debits/credits
i = 0
ii = 0
for ind in df.index:
    if df["DB/CR"][ind] == "Debit":
        i = i+1
        df["Top x"][ind] = i
    elif df["DB/CR"][ind] == "Credit":
        ii = ii+1
        df["Top x"][ind] = ii
Interpreter output:
df["Top x"][ind] = i
G:\Finances Backup\venv\Statementsv.03.py:173: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["Top x"][ind] = ii
Many thanks :)
You should use .loc with the row label first and the column label second, e.g. df.loc[ind, "Top x"] = i
Use iterrows() to iterate over the DataFrame. However, updating a DataFrame while iterating over it is not recommended. From the iterrows() documentation:
You should never modify something you are iterating over. This is not
guaranteed to work in all cases. Depending on the data types, the
iterator returns a copy and not a view, and writing to it will have no
effect.
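The warning comes from chained assignment: df["Top x"][ind] = i first selects the column and then assigns into that intermediate object, which pandas cannot guarantee is a view of df. Here is a sketch of two ways around it, using made-up sample amounts in place of the imported CSV. Option 1 keeps the original loop but writes through a single .loc[row, column] call; option 2 replaces the loop with groupby(...).cumcount(), which numbers the rows within each group in their existing order, starting at 0. If df was itself created by filtering another DataFrame, calling .copy() on it first avoids the same warning on later assignments.

```python
import pandas as pd

# Made-up sample data standing in for the imported CSV
df = pd.DataFrame({"Amount": ["-12.50", "100.00", "-3.20", "-8.00", "45.00"]})
df["DB/CR"] = df["Amount"].str.contains("-", regex=False).map({True: "Debit", False: "Credit"})

# Option 1: keep the loop, but read and write via .loc[row_label, column_label]
df["Top x (loop)"] = 0
i = ii = 0
for ind in df.index:
    if df.loc[ind, "DB/CR"] == "Debit":
        i += 1
        df.loc[ind, "Top x (loop)"] = i
    elif df.loc[ind, "DB/CR"] == "Credit":
        ii += 1
        df.loc[ind, "Top x (loop)"] = ii

# Option 2: no loop; number rows separately within each DB/CR group
df["Top x"] = df.groupby("DB/CR").cumcount() + 1

print(df)
```

Both columns come out as 1, 1, 2, 3, 2 for this sample; the groupby version avoids per-row indexing entirely, which also matters once the dataframe grows.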

Is it possible to skip loading a row using the kiba-etl gem?

Is there a way I can skip loading certain rows if I deem the row invalid using the kiba-etl gem?
For example, if there is a validation that must pass before I load a row into the system, or if errors occur and I still need to push the rest of the data into the system while logging the problem.
Author of Kiba here! To remove a row from the pipeline, simply return nil at the end of a transform:
transform do |row|
  row_valid = some_custom_operation
  row_valid ? row : nil
end
You could also "write down" the offending rows and report on them later using a post_process block like this (this assumes a moderate to low number of bogus rows, since their IDs are all kept in memory):
@bogus_row_ids = []
transform do |row|
  # SNIP
  if row_valid(row)
    row
  else
    @bogus_row_ids << row[:id]
    nil # remove from pipeline
  end
end
post_process do
  # do something with @bogus_row_ids, send an email, write a file etc
end
Let me know if this properly answers your question, or if you need a more refined answer.
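Kiba itself is Ruby; the following is only a Python analogue of the same pattern, not Kiba's API. The names run_pipeline, is_valid, and the "id" key are hypothetical. A transform that returns None drops the row, the IDs of dropped rows are collected along the way, and the caller can report them at the end, the way the post_process block above does.

```python
def run_pipeline(rows, is_valid):
    """Python analogue (hypothetical names) of a Kiba transform that drops invalid rows."""
    bogus_row_ids = []

    def transform(row):
        if is_valid(row):
            return row
        bogus_row_ids.append(row["id"])  # "write down" the offending row
        return None  # like returning nil in Kiba: the row leaves the pipeline

    loaded = []
    for row in rows:
        result = transform(row)
        if result is not None:
            loaded.append(result)  # stands in for the destination
    return loaded, bogus_row_ids  # the caller does the "post_process" reporting


rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": -1}, {"id": 3, "qty": 7}]
loaded, bogus = run_pipeline(rows, lambda r: r["qty"] >= 0)
print(loaded, bogus)
```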
I'm dumb. I realized you can just catch your errors within the transformation/loading process and return nil.

Sybase cursor declared FOR READ ONLY loops infinitely

I have a simple cursor over 260 records. Inside the loop I not only print but also update some tables. Here is the code, with explanatory comments:
/*Declare cursor for read only*/
DECLARE crsr_one CURSOR FOR
    SELECT a,b,c,d
    FROM table_name
    WHERE a>=100
    for read only
OPEN crsr_one /*open cursor*/
DECLARE @a,@b,@c,@d /*declare loop variables*/
while (1=1) /*start while loop*/
BEGIN
    FETCH crsr_one into @a,@b,@c,@d /*fetch into variable */
    IF (@@sqlstatus = 2) /*Break if no more records*/
        break
    /*some other code with select and update table*/
    print "%1! %2! %3! %4!", @a,@b,@c,@d /*Print variables*/
END
The problem is:
The while loop runs forever and keeps returning the same data.
Any idea why, and how to correct it?
Your code looks good (except that the syntax for the DECLARE is invalid).
If the loop doesn't break on @@sqlstatus = 2, then the obvious question is: what value does it have? It can also be 1, indicating an error. To find out, print the value.
To be fully correct, you should therefore also test for @@sqlstatus = 1. The easiest way to do this is to test for @@sqlstatus != 0, which covers both 1 and 2.
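The termination logic can be sketched language-neutrally. In this Python model, fetch is a hypothetical callable returning (status, row), standing in for FETCH ... INTO followed by reading @@sqlstatus (0 = row fetched, 1 = error, 2 = no more rows). A loop that only checks for 2 would spin forever if every FETCH kept failing with 1; checking for anything other than 0 stops on both.

```python
# Meaning of Sybase @@sqlstatus after a FETCH
FETCH_OK, FETCH_ERROR, FETCH_DONE = 0, 1, 2


def drain_cursor(fetch):
    """Call fetch() until it reports anything other than success.

    fetch is a hypothetical stand-in for FETCH ... INTO plus reading @@sqlstatus.
    Returns the rows fetched and the status that ended the loop.
    """
    rows = []
    while True:
        status, row = fetch()
        if status != FETCH_OK:  # covers both error (1) and end of data (2)
            return rows, status
        rows.append(row)


def fake_fetch_from(results):
    """Hypothetical helper: replay a fixed sequence of (status, row) pairs."""
    it = iter(results)
    return lambda: next(it)


rows, status = drain_cursor(fake_fetch_from([(0, "r1"), (0, "r2"), (1, None)]))
print(rows, status)
```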

Lua: Parsing and Manipulating Input with Loops - Looking for Guidance

I am currently attempting to parse data that is sent serially from an outside source. Here is an example:
DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_
This data can come in many different lengths, but the first few pieces are always the same. Each "piece" originally arrives followed by CRLF, so I replaced each CRLF with "|" using string.gsub(input,"\r\n","|"); that is why my input looks the way it does.
The part I would like to parse is:
4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_
The "4" tells me that there will be four lines in total to create this file. I'm using it to set the number of passes in the loop.
The 7x5 is the font height.
The 1 is the xpos.
The 25 is the ypos.
The variable data (174-24 in this case) is the text at these parameters.
As you can see, this pattern should repeat throughout the input string. The "4" can actually be any number greater than 0, and each unit of it corresponds to one set of four values to capture.
Here is what I have so far. Please excuse the loop variable, start variable, and print commands. I'm using Linux to run this function to try to troubleshoot.
function loop_input(input)
  var = tonumber(string.match(val, "DATA|0|(%d*).*"))
  loop = string.match(val, "DATA|0|")
  start = string.match(val, loop.."(%d*)|.*")
  for obj = 1, var do
    for i = 1, 4 do
      if i == 1 then
        i = "font" -- want the first group to be set to font
      elseif i == 2 then
        i = "xpos" -- want the second group to be set to xpos
      elseif i == 3 then
        i = "ypos" -- want the third group to be set to ypos
      else
        i = "txt" -- want the fourth group to be set to text
      end
      obj = font..xpos..ypos..txt
      --print (i)
    end
    objects = objects..obj -- concatenate newly created obj variables with each pass
  end
end
val = "DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_"
print(loop_input(val))
Ideally, I want a loop that, depending on the value of var, captures the values between the pipe delimiters so that I can then use them freely. When troubleshooting with parentheses around my four variables (as above), I get the full list of four variables four times in a row. I'm having difficulty actually stepping through the input string and pulling the values out as the loop moves along it. I thought that using pipes to separate the values from one another would help. Am I wrong? If it doesn't matter and I can keep the \r\n instead of each "|", I am definitely all for that.
I've searched around and found some threads that I thought would help, but I'm not sure whether using tables or splitting the input would be advisable. For example:
Setting a variable in a for loop (with temporary variable) Lua
How do I make a dynamic variable name in Lua?
Most efficient way to parse a file in Lua
I'm fairly new to programming and trying to teach myself, so please excuse my beginner question. I have both the "Lua Reference Manual" and "Programming in Lua" books in paperback, which is what I've tried to model my function(s) on, but I'm having trouble making the connection.
I thank you all for any input or guidance you can offer!
Cheers.
Try this:
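val = "DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_"
val = val .. "|" -- add a trailing pipe so every field ends with "|"
data = val:match("DATA|0|%d+|(.*)$") -- drop the DATA|0|<count>| header
for fh,xpos,ypos,text in data:gmatch("(.-)|(.-)|(.-)|(.-)|") do -- four fields per match
  print(fh,xpos,ypos,text)
end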
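For comparison, here is the same group-of-four idea as a Python sketch. parse_records is a hypothetical name, and unlike the Lua answer it uses the count field to decide how many records to read. It also treats CRLF exactly like "|", which suggests the pipes are not required: replacing \r\n with a single separator is all the parsing needs.

```python
def parse_records(message):
    """Parse 'DATA|0|<count>|font|xpos|ypos|text|...' into (font, xpos, ypos, text) tuples.

    CRLF separators are treated the same as '|', so raw CRLF input works too.
    """
    fields = message.replace("\r\n", "|").strip("|").split("|")
    if fields[:2] != ["DATA", "0"]:
        raise ValueError("message does not start with DATA|0|")
    count = int(fields[2])  # number of four-field records that follow
    body = fields[3:]
    if len(body) < 4 * count:
        raise ValueError(f"expected {4 * count} fields after the count, got {len(body)}")
    records = []
    for k in range(count):
        font, xpos, ypos, text = body[4 * k : 4 * k + 4]  # one group of four per record
        records.append((font, int(xpos), int(ypos), text))
    return records


val = "DATA|0|4|7x5|1|25|174-24|7x5|1|17|TERW|7x5|1|9|08MN|7x5|1|1|_"
for font, xpos, ypos, text in parse_records(val):
    print(font, xpos, ypos, text)
```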

Recording non-convergence in SAS NLIN

I have a question about SAS-proc nlin.
I'm running the procedure for 10000 simulations. Many of them do not converge and give me wrong results.
I would like to add a binary variable to my output table that flags the iterations that did not converge.
Does anyone know how to do that?
Many thanks,
Perry
You need to use ODS to pull the ConvergenceStatus output from PROC NLIN. Add it to your procedure code like this:
PROC NLIN data = ...;
...;
ods output ConvergenceStatus = conv;
RUN;
That gives you a data set with two variables:
Status (0 means convergence, otherwise 1, 2, or 3 are described here: https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_nlin_sect031.htm).
Reason (description of the convergence issue).
So attach the results of that data set to each simulation round, and create a binary indicator for whether status > 0, and you should be all set.
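The final step is a lookup by simulation number. Below is a Python sketch of it, assuming each result row and each ConvergenceStatus row carry a simulation identifier (called sim here, a hypothetical name; if PROC NLIN is run with a BY statement over the simulations, the BY variable is included in the ODS output data set and plays this role). The field name not_converged is also hypothetical.

```python
def add_convergence_flag(results, conv):
    """Attach Status and a 0/1 not_converged flag to each simulation's results.

    results: list of dicts, each with a hypothetical 'sim' key
    conv:    list of dicts with 'sim' and 'Status', as exported from the
             ConvergenceStatus table (Status 0 = converged; 1, 2, 3 = problems)
    """
    status_by_sim = {row["sim"]: row["Status"] for row in conv}
    flagged = []
    for row in results:
        status = status_by_sim[row["sim"]]
        flagged.append({**row, "Status": status, "not_converged": int(status > 0)})
    return flagged


results = [{"sim": 1, "theta": 0.52}, {"sim": 2, "theta": 9.87}]
conv = [{"sim": 1, "Status": 0}, {"sim": 2, "Status": 3}]
print(add_convergence_flag(results, conv))
```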
