julia: efficient and tidy way to avoid many function arguments - struct

I have been writing stochastic PDE simulations in Julia, and as my problems have become more complicated, the number of independent parameters has increased. So what starts with,
myfun(N,M,dt,dx,a,b)
eventually becomes
myfun(N,M,dt,dx,a,b,c,d,e,f,g,h)
and it results in (1) messy code, (2) increased chance of error due to misplaced function arguments, (3) inability to generalise for use in other functions.
(3) is important, because I have made simple parallelisation of my code to evaluate many different runs of the PDEs. So I would like to convert my functions into a form:
myfun(args)
where args contains all the relevant arguments. The problem I am finding with Julia, is that creating a struct containing all my relevant parameters as attributes slows things down considerably. I think this is due to the continual accessing of the struct attributes. As a simple (ODE) working example,
function example_fun(N,dt,a,b)
V = zeros(N+1)
U = 0
z = randn(N+1)
for i=2:N+1
V[i] = V[i-1]*(1-dt)+U*dt
U = U*(1-dt/a)+b*sqrt(2*dt/a)*z[i]
end
return V
end
If I try to rewrite this as,
function example_fun2(args)
V = zeros(args.N+1)
U = 0
z = randn(args.N+1)
for i=2:args.N+1
V[i] = V[i-1]*(1-args.dt)+U*args.dt
U = U*(1-args.dt/args.a)+args.b*sqrt(2*args.dt/args.a)*z[i]
end
return V
end
Then while the function call looks elegant, it is cumbersome to rework accessing every attribute from the class and also this continual accessing of attributes slows the simulation down. What is a better solution? Is there a way to simply 'unpack' the attributes of a struct so they do not have to be continually accessed? And if so, how would this be generalised?
edit:
I am defining the struct I use as follows:
struct Args
N::Int64
dt::Float64
a::Float64
b::Float64
end
edit2: I have realised that structs with Array{} attributes can give rise to a performance difference if you do not specify the dimensions of the array in the struct definition. For example, if c is a one-dimensional array of parameters,
struct Args_1
N::Int64
c::Array{Float64}
end
will give far worse performance in f(args) than f(N,c). However, if we specify that c is a one-dimensional array in the struct definition,
struct Args_1
N::Int64
c::Array{Float64,1}
end
then the performance penalty disappears. This issue and the type instability shown in my function definitions seem to account for the performance difference I encountered when using a struct as the function argument.

In your code there is a type instability, related to U which is initialized as an 0 (integer), but if you replace it with 0. (floating point number), the type-instability disapears.
For the original versions (with "U=0"), function example_fun takes 801.933 ns (for the parameters 10,0.1,2.,3.) and example_fun2 925.323 ns (for similar values).
In the type-stable version (U=0.), both take 273 ns (+/5 ns). Thus this a substantial speed-up and there is no more a penalty of combining the arguments in the type args.
Here is the complete function:
function example_fun2(args)
V = zeros(args.N+1)
U = 0.
z = randn(args.N+1)
for i=2:args.N+1
V[i] = V[i-1]*(1-args.dt)+U*args.dt
U = U*(1-args.dt/args.a)+args.b*sqrt(2*args.dt/args.a)*z[i]
end
return V
end

Maybe you did not declare the types of the parameters of the type declaration of args?
Consider this small example:
struct argstype
N
dt
end
myfun(args) = args.N * args.dt
myfun is not type-stable can the type of the return type cannot be inferred:
#code_warntype myfun(argstype(10,0.1))
Variables:
#self# <optimized out>
args::argstype
Body:
begin
return ((Core.getfield)(args::argstype, :N)::Any * (Core.getfield)(args::argstype, :dt)::Any)::Any
end::Any
However, if you declare the types, then code becomes type-stable:
struct argstype2
N::Int
dt::Float64
end
#code_warntype myfun(argstype2(10,0.1))
Variables:
#self# <optimized out>
args::argstype2
Body:
begin
return (Base.mul_float)((Base.sitofp)(Float64, (Core.getfield)(args::argstype2, :N)::Int64)::Float64, (Core.getfield)(args::argstype2, :dt)::Float64)::Float64
end::Float64
You see that the inferred return type of Float64.
With parametric types (https://docs.julialang.org/en/v0.6.3/manual/types/#Parametric-Types-1), your code still remains generic and type-stable at the same time:
struct argstype3{T1,T2}
N::T1
dt::T2
end
#code_warntype myfun(argstype3(10,0.1))
Variables:
#self# <optimized out>
args::argstype3{Int64,Float64}
Body:
begin
return (Base.mul_float)((Base.sitofp)(Float64, (Core.getfield)(args::argstype3{Int64,Float64}, :N)::Int64)::Float64, (Core.getfield)(args::argstype3{Int64,Float64}, :dt)::Float64)::Float64
end::Float64

Related

Variable change in a function - Python 3

So I got the following code:
def foo(a, b, c):
try:
a = [0]
b[0] = 1
c[0] = 2
w[0] = 3
except:
pass
return z
x, y, z = [None], [None], [None]
w = z[:]
foo(x,y,z)
print(x,y,z,w)
The last line of the code print(x,y,z,w) prints [None] [1] [2] [3], however
I don't quite get it. Why are x,y,z are being changed from within the funciton? and if w changes - and it points to z, why doesnt z change accordingly?
In Python objects are passed by reference to functions.
This line makes a copy of z
w = z[:]
so changes to z don't affect w and vice versa. In the line
a = [0]
you change the reference to point to a new object, so you don't mutate x (which is what a was initially bound to). In the following lines
b[0] = 1
c[0] = 2
you mutate the objects that you got references to (y and z in global scope), so the objects in the outside scope change. In the line
w[0] = 3
you mutate the global object w since the name w is not a parameter of the function, nor is it bound in the body of the function.
What everyone else says is correct, but I want to add my way of thinking that may be helpful if you have experience with a language like C or C++.
Every variable in Python is a pointer (well, the technical term is a "reference", but I find that more difficult to visualize than "pointer"). You know how in C/C++ you can get a function to output multiple values by passing in pointers? Your code is doing essentially the same thing.
Of course, you may be wondering, if that is the case, why don't you see the same thing happening to ints, strs or whatnot? The reason is that those things are immutable, which means you cannot directly change the value of an int or a str at all. When you "change an integer", like i = 1, you are really changing the variable, pointing it to a different int object. Similarly, s += 'abc' creates a new str object with the value s + 'abc', then assigns it to s. (This is why s += 'abc' can be inefficient when s is long, compared to appending to a list!)
Notice that when you do a = [0], you are changing a in the second way --- changing the pointer instead of the object pointed to. This is why this line doesn't modify x.
Finally, as the others has said, w = z[:] makes a copy. This might be a little confusing, because for some other objects (like numpy arrays), this syntax makes a view instead of a copy, which means that it acts like the same object when it comes to changing elements. Remember that [] is just an operator, and every type of object can choose to give it a different semantical meaning. Just like % is mod for ints, and formatting for strs --- you sometimes just need to get familiar with the peculiarities of different types.

Problem in pushing a value to an element of struct in Julia

Say I have a struct:
mutable struct DataHolder
data1::Vector{Float64}
data2::Vector{Float64}
function DataHolder()
emp = Float64[]
new(emp, emp)
end
end
d = DataHolder()
When I try to push a value to only one element of struct d by doing:
push!(d.data1, 1.0)
the value is pushed not only d.data1 but also d.data2. Indeed, the REPL says
julia> d
DataHolder([1.0], [1.0])
How can I push a value to only one element of the struct??
Your problem is not in push!, but rather in the inner constructor of DataHolder. Specifically:
emp = Float64[]
new(emp, emp)
This code pattern means that the fields of the new DataHolder both point toward the same array (in memory). So if you mutate one of them (say, via push!), you also mutate the other.
You could instead replace those two lines with:
new(Float64[], Float64[])
to get the behaviour you want.
More generally, although it is not forbidden, your use of the inner constructor is a bit odd. Typically, inner constructors should have a method signature corresponding exactly to the fields of your struct, and the inner constructor itself is typically only used to provide a set of universal tests that any new DataHolder should undergo.
Personally I would re-write your code as follows:
mutable struct DataHolder
data1::Vector{Float64}
data2::Vector{Float64}
function DataHolder(data1::Vector{Float64}, data2::Vector{Float64})
#Any tests on data1 and data2 go here
new(data1, data2)
end
end
DataHolder() = DataHolder(Float64[], Float64[])
If you don't need to do any universal tests on DataHolder, then delete the inner constructor entirely.
Final food for thought: Does DataHolder really need to be mutable? If you only want to be able to mutate the arrays in data1 and data2, then DataHolder itself does not need to be mutable, because those arrays are already mutable. You only need DataHolder to be mutable if you plan to completely re-allocate the values in those fields, e.g. an operation of the form dh.data1 = [2.0].
UPDATE AFTER COMMENT:
Personally I don't see anything wrong with DataHolder() = DataHolder(Float64[], ..., Float64[]). It's just one line of code and you never have to think about it again. Alternatively, you could do:
DataHolder() = DataHolder([ Float64[] for n = 1:10 ]...)
which just splats an a vector of empty vectors into the constructor arguments.
#ColinTBowers has answered your question. Here is a very simple and a more general implementation:
struct DataHolder # add mutable if you really need it
data1::Vector{Float64}
data2::Vector{Float64}
end
DataHolder() = DataHolder([], [])
Perhaps you'd like to allow other types than Float64 (because why wouldn't you?!):
struct DataHolder{T}
data1::Vector{T}
data2::Vector{T}
end
DataHolder{T}() where {T} = DataHolder{T}([], [])
DataHolder() = DataHolder{Float64}() # now `Float64` is the default type.
Now you can do this:
julia> DataHolder{Rational}()
DataHold{Rational}(Rational[], Rational[])
julia> DataHolder()
DataHold{Float64}(Float64[], Float64[])

Fortran function that returns scalar OR array depending on input

I'm trying to crate a function in Fortran (95) that that will have as input a string (test) and a character (class). The function will compare each character of test with the character class and return a logical that is .true. if they are of the same class1 and .false. otherwise.
The function (and the program to run it) is defined below:
!====== WRAPPER MODULE ======!
module that_has_function
implicit none
public
contains
!====== THE ACTUAL FUNCTION ======!
function isa(test ,class )
implicit none
logical, allocatable, dimension(:) :: isa
character*(*) :: test
character :: class
integer :: lt
character(len=:), allocatable :: both
integer, allocatable, dimension(:) :: intcls
integer :: i
lt = len_trim(test)
allocate(isa(lt))
allocate(intcls(lt+1))
allocate(character(len=lt+1) :: both)
isa = .false.
both = class//trim(test)
do i = 1,lt+1
select case (both(i:i))
case ('A':'Z'); intcls(i) = 1! uppercase alphabetic
case ('a':'a'); intcls(i) = 2! lowercase alphabetic
case ('0':'9'); intcls(i) = 3! numeral
case default; intcls(i) = 99! checks if they are equal
end select
end do
isa = intcls(1).eq.intcls(2:)
return
end function isa
end module that_has_function
!====== CALLER PROGRAM ======!
program that_uses_module
use that_has_function
implicit none
integer :: i
i = 65
! Reducing the result of "isa" to a scalar with "all" works:
! V-V
do while (all(isa(achar(i),'A')))
print*, achar(i)
i = i + 1
end do
! Without the reduction it doesn''t:
!do while (isa(achar(i),'A'))
! print*, achar(i)
! i = i + 1
!end do
end program that_uses_module
I would like to use this function in do while loops, for example, as it is showed in the code above.
The problem is that, for example, when I use two scalars (rank 0) as input the function still returns the result as an array (rank 1), so to make it work as the condition of a do while loop I have to reduce the result to a scalar with all, for example.
My question is: can I make the function conditionally return a scalar? If not, then is it possible to make the function work with vector and scalar inputs and return, respectively, vector and scalar outputs?
1. What I call class here is, for example, uppercase or lowercase letters, or numbers, etc. ↩
You can not make the function conditionally return a scalar or a vector.
But you guessed right, there is a solution. You will use a generic function.
You write 2 functions, one that takes scalar and return scalar isas, the 2nd one takes vector and return vector isav.
From outside of the module you will be able to call them with the same name: isa. You only need to write its interface at the beginning of the module:
module that_has_function
implicit none
public
interface isa
module procedure isas, isav
end interface isa
contains
...
When isa is called, the compiler will know which one to use thanks to the type of the arguments.
The rank of a function result cannot be conditional on the flow of execution. This includes selection by evaluating an expression.
If reduction of a scalar result is too much, then you'll probably be horrified to see what can be done instead. I think, for instance, of derived types and defined operations.
However, I'd consider it bad design in general for the function reference to be unclear in its rank. My answer, then, is: no you can't, but that's fine because you don't really want to.
Regarding the example of minval, a few things.1 As noted in the comment, minval may take a dim argument. So
integer :: X(5,4) = ...
print *, MINVAL(X) ! Result a scalar
print *, MINVAL(X,dim=1) ! Result a rank-1 array
is in keeping with the desire of the question.
However, the rank of the function result is still "known" at the time of referencing the function. Simply having a dim argument means that the result is an array of rank one less than the input array rather than a scalar. The rank of the result doesn't depend on the value of the dim argument.
As noted in the other answer, you can have similar functionality with a generic interface. Again, the resolved specific function (whichever is chosen) will have a result of known rank at the time of reference.
1 The comment was actually about minloc but minval seems more fitting to the topic.

How arguments are passed into a cppFunction

I'm puzzled as to how arguments are passed into a cppFunction when we use Rcpp. In particular, I wonder if someone can explain the result of the following code.
library(Rcpp)
cppFunction("void test(double &x, NumericVector y) {
x = 2016;
y[0] = 2016;
}")
a = 1L
b = 1L
c = 1
d = 1
test(a,b)
test(c,d)
cat(a,b,c,d) #this prints "1 1 1 2016"
As stated before in other areas, Rcpp establishes convenient classes around R's SEXP objects.
For the first parameter, the double type does not have a default SEXP object. This is because within R, there is no such thing as a scalar. Thus, new memory is allocate making the & reference incompatible. Hence, the variable scope for the modification is limited to the function and there is never an update to the result. As a result, for both test cases, you will see 1.
For the second case, there is a mismatch between object classes. Within the first call object supplied is of type integer due to the L appended on the end, which conflicts with the C++ function expected type of numeric. The issue is resolved once the L is dropped as the object is instantiated as a numeric. Therefore, an intermediary memory location does not need to be created that is of the correct type to receive the value. Hence, the modification in the second case is able to propagate back to R.
e.g.
a = 1L
class(a)
# "integer"
a = 1
class(a)
# "numeric"

What (are there any) languages with only pass-by-reference?

I was wondering. Are there languages that use only pass-by-reference as their eval strategy?
I don't know what an "eval strategy" is, but Perl subroutine calls are pass-by-reference only.
sub change {
$_[0] = 10;
}
$x = 5;
change($x);
print $x; # prints "10"
change(0); # raises "Modification of a read-only value attempted" error
VB (pre .net), VBA & VBS default to ByRef although it can be overriden when calling/defining the sub or function.
FORTRAN does; well, preceding such concepts as pass-by-reference, one should probably say that it uses pass-by-address; a FORTRAN function like:
INTEGER FUNCTION MULTIPLY_TWO_INTS(A, B)
INTEGER A, B
MULTIPLY_BY_TWO_INTS = A * B
RETURN
will have a C-style prototype of:
extern int MULTIPLY_TWO_INTS(int *A, int *B);
and you could call it via something like:
int result, a = 1, b = 100;
result = MULTIPLY_TWO_INTS(&a, &b);
Another example are languages that do not know function arguments as such but use stacks. An example would be Forth and its derivatives, where a function can change the variable space (stack) in whichever way it wants, modifying existing elements as well as adding/removing elements. "prototype comments" in Forth usually look something like
(argument list -- return value list)
and that means the function takes/processes a certain, not necessarily constant, number of arguments and returns, again, not necessarily a constant, number of elements. I.e. you can have a function that takes a number N as argument and returns N elements - preallocating an array, if you so like.
How about Brainfuck?

Resources