As you know, there are at least three ways to loop procedures in SPSS:
DO REPEAT-END REPEAT
LOOP-END LOOP
a !DO-!DOEND structure inside a DEFINE-!ENDDEFINE macro definition.
Now I'm asking about the second option. This kind of loop is simpler to use, yet its functionality is significantly more limited than what is available inside the !DO-!DOEND structure.
Does anybody know which commands work with LOOP-END LOOP and which don't? I tried to find a complete list but haven't found anything.
Konrad,
As others have mentioned, it is good to consult the SPSS Statistics documentation. In this case, the SPSS Statistics Command Syntax Reference Guide is the one I'd choose.
https://www.ibm.com/support/knowledgecenter/SSLVMB_26.0.0/statistics_reference_project_ddita/spss/base/syn_commands_and_program_states_procedures.html
Briefly, the looping constructs you've mentioned all operate on transformation commands -- commands that alter the data but do not require a read of the data. Transformations (e.g. COMPUTE, RECODE, SELECT IF, etc.) stack up in memory and take effect when an EXECUTE or a procedure command causes a data read.
If what you are looking to do is iterate procedure commands (e.g. FREQUENCIES, REGRESSION, GGRAPH, etc.), that can be accomplished through the DEFINE-!ENDDEFINE command language, or better still through a Python script.
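For example, with the Python programmability integration installed, you can submit a procedure once per variable from inside a syntax job (a minimal sketch; the variable names here are made up, so substitute your own):

BEGIN PROGRAM.
import spss

# Hypothetical variable names -- replace with variables from your dataset.
for var in ['age', 'income', 'score']:
    # spss.Submit runs a block of command syntax, so each pass of this
    # Python loop executes one FREQUENCIES procedure.
    spss.Submit("FREQUENCIES VARIABLES=%s." % var)
END PROGRAM.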
I hope this helps.
I'm new to OpenMDAO and started off with the newest version (version 2.3.1 at the time of this post).
I'm working on the setup to a fairly complicated aero-structural optimization using several external codes, specifically NASTRAN and several executables (compiled C++) that post process NASTRAN results.
Ideally I would like to break these down into multiple components to generate my model, run NASTRAN, post process the results, and then extract my objective and constraints from text files. All of my existing interfaces are through text file inputs and outputs. According to the GitHub page, the file variable feature that existed in an old version (v1.7.4) has not yet been implemented in version 2.
https://github.com/OpenMDAO/OpenMDAO
Is there a good workaround for this until the feature is added?
So far the best solution I've come up with is to group everything into one large component that maps input variables to final outputs by running everything internally, instead of using multiple smaller components that break up the process.
Thanks!
File variables themselves are no longer implemented in OpenMDAO. They caused a lot of headaches and didn't fundamentally offer useful functionality, because they required serializing the whole file into memory and passing it around as string buffers. The whole process was duplicative and inefficient, since the files ultimately got written to and read from disk far more times than necessary.
In your case, since you're setting up an aerostructural problem, you really wouldn't want to use them anyway. You will want access to either analytic or at least semi-analytic total derivatives for efficient execution. What that means is that the boundary of each component must be composed of only floating point variables or arrays of floating point variables.
What you want to do is wrap your analysis tools using ExternalCodeImplicitComp, which tells OpenMDAO that the underlying analysis is actually implicit. Then, even if you use finite differences to compute the partial derivatives, you only need to FD across the residual evaluation. For NASTRAN this might be a bit tricky to set up, since I don't know if it directly exposes the residual evaluation, but if you can get to the stiffness matrix then you should be able to compute it. You'll be rewarded for your efforts with greatly improved efficiency and accuracy.
Inside each wrapper, you can use the built-in file wrapping tools to read through the files that were written and pull out the numerical values, which you then push into the outputs vector. For NASTRAN you might consider using pyNASTRAN instead of the file wrapping tools, to save yourself some work.
If you can't expose the residual evaluation, then you can use ExternalCodeComp instead and treat the analysis as if it were explicit. This will make your FD more costly and less accurate, but for linear analyses you should be OK (still not ideal, but better than nothing).
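As a rough sketch of the explicit variant (the file names, variable names, and executable here are all hypothetical, and the exact class and option names may differ slightly between 2.x releases):

import numpy as np
from openmdao.api import ExternalCodeComp

class PostProcessComp(ExternalCodeComp):
    """Toy wrapper: write inputs to a text file, run an external
    executable, then parse a text file to recover the outputs."""

    def setup(self):
        # Only floats/arrays cross the component boundary.
        self.add_input('thickness', val=np.ones(10))
        self.add_output('max_stress', val=0.0)

        self.options['command'] = ['post_process.exe', 'input.txt', 'output.txt']
        self.options['external_input_files'] = ['input.txt']
        self.options['external_output_files'] = ['output.txt']

        # FD across the whole analysis, since we are treating it as explicit.
        self.declare_partials(of='*', wrt='*', method='fd')

    def compute(self, inputs, outputs):
        # Write the input file the external code expects.
        with open('input.txt', 'w') as f:
            f.write('\n'.join(str(t) for t in inputs['thickness']))

        # Run the external command (handled by the parent class).
        super(PostProcessComp, self).compute(inputs, outputs)

        # Pull the numerical result back out of the output file.
        with open('output.txt') as f:
            outputs['max_stress'] = float(f.read().strip())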
The key idea here is that you're not asking OpenMDAO to pass around file objects. You are wrapping each component with only numerical data at its boundaries. This has the advantage of allowing OpenMDAO's automatic derivative features to work (even if you use FD to compute the partial derivatives). It also has a secondary advantage: if you (hopefully) graduate to in-memory wrappers for your codes, you won't have to update your models. Only the component's internal code will change.
I've been reading the Theano documentation on scan and find myself confused by two seemingly contradictory statements.
On http://deeplearning.net/software/theano/tutorial/loop.html#scan, one of the advantages of scan is listed as:
Slightly faster than using a for loop in Python with a compiled Theano function.
But, on http://deeplearning.net/software/theano/library/scan.html#lib-scan, in a section on optimizing the use of scan, it says:
Scan makes it possible to define simple and compact graphs that can do the same work as much larger and more complicated graphs. However, it comes with a significant overhead. As such, **when performance is the objective, a good rule of thumb is to perform as much of the computation as possible outside of Scan**. This may have the effect of increasing memory usage but can also reduce the overhead introduced by using Scan.
My reading of 'performance' here is as a synonym for speed. So I'm left confused as to when/if scan will lead to shorter runtimes, once compilation has been completed.
If your expression intrinsically needs a for-loop, then you sometimes have two options:
Build the expression using a python for loop
Build the expression using scan
Option 1 only works if you know the length of your for-loop in advance. It can happen that the length of your for-loop depends on a symbolic variable that is not available at script-writing time. In that case you need to use scan. Oftentimes, though, you can formulate the problem either way (see the absence of scan in TensorFlow).
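For instance, here is a toy cumulative sum built both ways (a minimal sketch):

import theano
import theano.tensor as T

x = T.vector('x')

# Option 1: a Python for loop, with the length fixed at script-writing time.
# The loop runs while *building* the graph, unrolling it n_fixed times.
n_fixed = 5
total = x[0]
for i in range(1, n_fixed):
    total = total + x[i]
f_loop = theano.function([x], total)

# Option 2: scan, which handles a symbolic (runtime-dependent) length.
partial_sums, _ = theano.scan(fn=lambda x_i, acc: acc + x_i,
                              sequences=x,
                              outputs_info=T.zeros_like(x[0]))
f_scan = theano.function([x], partial_sums[-1])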
As for time performance, there have been a number of results showing that which one is faster really depends on the problem.
I recently got into Stata coming from a procedural/OO/functional background, and am having trouble understanding the basic elements of the language.
For example, I discovered that there is a syntax command which "allows programs to interpret the arguments the user types according to a grammar, such as standard Stata syntax". I infer this is the reason why some commands require a list of variables given as arguments to be separated by whitespace while others require a comma-separated list. But the idea of a program defining its own syntax instead of the (parameter) syntax being enforced seems plain weird.
Another quite interesting construct is the syntax for macro definition and expansion (`macro') and the apparent absence of local variables as known in other languages.
Is there something like a "Stata for Java developers" document explaining the basic concepts of the language to people with my background?
PS: Apologies if this question seems unclear. Unfortunately, I can't formulate more concrete/clear questions at this point :(
I'm not exactly sure what you are looking for... but here are a few related points. Stata is kind of like writing a Unix shell script or a Windows batch file. Each line executes a command, and the first word is the command name. By convention, most commands have the following structure:
command [varlist] [=exp] [if expression] [in range] [weight] [using filename] [, options]
Brackets [.] mean it's optional (or unavailable, depending on the command). Some commands can be prefixed (such as by:, xi:, or svy:). The syntax of commands by Stata Corp and experienced users is pretty consistent. But, because Stata users also write commands, you occasionally see things that are wacky.
When Stata users write commands, they are saved in .ado files (not .do) and are defined using the program command. (See help program and the "Ado files" section of the manual.) Writing a command is akin to writing a function in other languages (e.g., MATLAB).
The syntax command is used to help you write your own command. When you execute a command, everything following the command's name (command above) is passed to the program in the local macro `0'. The syntax command parses this local macro, so that you can reference `varlist' or `if' and so on. In theory, you could parse `0' yourself, but the syntax command makes it much easier for you and your users (as long as you are following the conventional syntax). I put an example at the bottom.
I don't know exactly what you mean by "apparent absence of local variables as known in other languages." Macros store a single string or a single number in memory. Here's a comment I wrote about Stata's local/global macros. They are indeed a unique feature of Stata's programming language. As their names imply, "local" macros are only available within a specific program (command) or .do file, while "global" macros are available throughout a Stata session.
I found that, once I got used to macros in Stata, I started to miss them in other languages. They are pretty handy. In addition to (local/global) macros and the main data set, you can also store "things" in memory with the scalar and matrix commands (and one or two other obscure things).
I hope that helps. Here's a list of resources that might help.
Example:
program define myprogram
    syntax varlist [if], [hello(string) yes]
    macro list _0 _varlist _if _hello _yes
    summarize `varlist' `if'
    display "Here's the string in my hello option: `hello'"
    if !missing("`yes'") di "Yes is on"
    else di "Yes is off"
end

sysuse auto.dta
myprogram rep78 headroom if price > 5000 , hello("world") yes
A few books offer an "X for Y users" approach, but generally between statistical software packages. Regarding your question, I would recommend using instinct first.
I started reading (programming and markup) code about ten years ago, and even though I cannot code in a large number of languages, I can read a few languages rather easily. I found Stata easy because most of its core commands are straightforward, with recurrent optional statements like over, if, or replace (to take a deliberately diverse set of statements) that are easy to understand and then apply.
When I teach Stata, I always have problems getting students to use the help pages as much as I do (and I love the fact that they can be accessed so easily, just like in R). I explain the paradox by the fact that I can read the syntax indications straightaway. Syntax is very well covered by the previous reply to your question.
The extra mile consists of opening the [R], [U] and especially [P] handbooks that come with Stata in the utilities folder. There is a wealth of detail there, which will interest both programmers and statisticians in training. This is where I learnt to use macros and loops, beyond the obvious logic of commands like local/global and foreach/while (if I understand the term correctly, Stata is Turing-complete).
Stata is sometimes a bit of a pain when it comes to using single/double quotes in macro loops, but it's pretty straightforward otherwise. Have fun!
Maybe you have come across the following situation. You're working, you run one script after another, and then you suddenly realize you've changed the value of a variable you are interested in. Apart from making a backup of the workspace, is there no other way to protect the variables?
Is there a way to select individual variables in the workspace that you're going to protect?
Apart from seeing the command history register, is there a history register of the different values that have been given to one particular variable?
Running scripts in sequence is a recipe for disaster. If possible, try turning those scripts into functions. This will naturally do away with the problems of overwriting variables you are running into, since variables inside functions are local to those functions whereas variables in scripts are local to the workspace -- and thus easily accessed/overwritten by separate scripts (often unintentionally, especially if you use variable names like "result").
I also agree that writing functions can be helpful in this situation. If however you are manipulating very large data sets then you need to be careful to write your code in a form which doesn't make multiple copies of variables within your functions or you may run into memory shortage problems.
No, there is no workspace history. I would say, if you run into that problem that you described, you should consider changing your programming style.
I would suggest you:
put enough code and information in your script that you can start from an empty workspace and still fulfill the task. For that reason I always put clear all at the start of my main file.
If it's getting too complex, consider calling functions. If you need values that are generated by another script or function, rewrite that script to become a function and call it in your main file, or save the variables. Loading variables is absolutely okay. But running scripts in sequence leads to disaster, as mentioned by marciovm.
I need to document the software I'm currently working on. The software consists of several programming languages and scripts, which got me thinking: if a new developer comes along and needs to fix something, they might know Java but maybe not bash scripting. It would be nice if there was a program which would help them understand what
for f in "$@" ; do
means. I was thinking of something that creates a static HTML page with the code plus syntax highlighting, and if you hover over something (like the "for"), it would display a pop-up with an explanation:
for starts a loop which iterates over all values that follow in. In the loop, you can access each value via the variable $f. The loop body is between do and done
Does something like that already exist?
[EDIT] This is just an example. You'd get other help texts for f, in, "$@", ; and do, i.e. each and every element of the line should be explained. Unknown elements (like command names) should link to Google. So you can understand what it does even if you're missing some detail.
[EDIT2] I'm aware that you can't write a program which understands what another program does. What I'm looking for is a simple tool which will do "extended syntax highlighting" in the sense that it will color an expression and give a short explanation what it means (plus maybe a link to some in-depth reference).
This is meant for someone who knows how to program but maybe hasn't seen some obscure construct before. Say
echo "Error" 1>&2
Every bash programmer knows what this means but a Java developer might be puzzled by the 1>&2 despite the fact that they can guess that echo == System.out.println. A simple "Redirects stdout to stderr" will clear things up and give that instant "AHA!" which allows them to stay in their current train of thought.
A tool like this could be built using ANTLR, i.e. parse the code into an abstract syntax tree using an ANTLR grammar for that language, and write an HTML generator which produces the annotated code.
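As a toy illustration of the general idea (this skips real parsing entirely and just uses a hand-written lookup table of bash tokens; the token list and explanations are made up for the example):

import html

# Hypothetical, hand-written explanations for a few bash tokens.
EXPLANATIONS = {
    'for':  'Starts a loop that iterates over the values after "in".',
    'in':   'Separates the loop variable from the list of values.',
    'do':   'Marks the start of the loop body.',
    'done': 'Marks the end of the loop body.',
    '"$@"': 'Expands to all arguments passed to the script.',
}

def annotate(line):
    """Wrap each known token in a <span> whose tooltip explains it."""
    out = []
    for token in line.split():
        text = html.escape(token)
        if token in EXPLANATIONS:
            tip = html.escape(EXPLANATIONS[token], quote=True)
            out.append('<span title="%s"><b>%s</b></span>' % (tip, text))
        else:
            out.append(text)
    return ' '.join(out)

print('<pre>%s</pre>' % annotate('for f in "$@" ; do'))

A real tool would need a proper grammar per language (hence ANTLR), but even this naive version produces hoverable HTML for the known keywords.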
It sounds like a useful tool to have for language learning, or exploring source code of projects you're not maintaining -- but is it appropriate for documentation?
Why is it important to help the programmers of other languages understand the code at this level of implementation detail? Anyone maintaining the implementation at this level will obviously have to know the language and will probably have an IDE to do most of this.
That said, I'd definitely consider a tool like this as a learning aid.
IMO it would be simpler and more effective to just collect links to good language-specific references and tutorials on a Wiki page.
For all mainstream languages, such sources exist and are maintained regularly. If you try to create your own reference, you need to maintain it too. Fair enough, bash syntax is not going to change very often, but other languages do develop faster, so it is going to be a burden.
If you think about it, it's not that useful to have a tool that explains the syntax. Developers could just google for keywords instead of browsing a website similar to http://www.codeweblog.com/source/.
I believe that good comments will be far more useful, plus there are tools to extract documentation from the comments (for example, HappyDoc does that for Python).
It is a very tricky thing. First of all, it can be proven that a program that will "understand" any arbitrary program doesn't exist. However, you can still use existing documentation. Maybe tools like Doxygen can help you. You would need to document your code through comments, and the documentation will be generated from them.
A language cannot be explained only through its syntax. The runtime environment plays a great part, together with the underlying philosophy of the language and its libraries.
Moreover, syntax is not that complex for most common languages (given that code has been written with maintainability in mind).
Continuing with the bash example: you cannot deeply understand bash if you know nothing about processes and job control, environment variables, a long list of Unix commands (tr, sort, cut, paste, sed, awk, find, ...), and many other features that don't appear in the syntax.
If the tool produced

for starts a loop which iterates over all values that follow in. In the loop, you can access each value via the variable $f. The loop body is between do and done

it would be pretty worthless. This is exactly the kind of comment that trainee (human) programmers are told never to write.