I'm puzzled as to how arguments are passed into a cppFunction when we use Rcpp. In particular, I wonder if someone can explain the result of the following code.
library(Rcpp)
cppFunction("void test(double &x, NumericVector y) {
x = 2016;
y[0] = 2016;
}")
a = 1L
b = 1L
c = 1
d = 1
test(a,b)
test(c,d)
cat(a,b,c,d) #this prints "1 1 1 2016"
As stated before in other areas, Rcpp establishes convenient classes around R's SEXP objects.
For the first parameter, the double type does not have a default SEXP object. This is because within R, there is no such thing as a scalar. Thus, new memory is allocate making the & reference incompatible. Hence, the variable scope for the modification is limited to the function and there is never an update to the result. As a result, for both test cases, you will see 1.
For the second case, there is a mismatch between object classes. Within the first call object supplied is of type integer due to the L appended on the end, which conflicts with the C++ function expected type of numeric. The issue is resolved once the L is dropped as the object is instantiated as a numeric. Therefore, an intermediary memory location does not need to be created that is of the correct type to receive the value. Hence, the modification in the second case is able to propagate back to R.
e.g.
a = 1L
class(a)
# "integer"
a = 1
class(a)
# "numeric"
Related
So I got the following code:
def foo(a, b, c):
try:
a = [0]
b[0] = 1
c[0] = 2
w[0] = 3
except:
pass
return z
x, y, z = [None], [None], [None]
w = z[:]
foo(x,y,z)
print(x,y,z,w)
The last line of the code print(x,y,z,w) prints [None] [1] [2] [3], however
I don't quite get it. Why are x,y,z are being changed from within the funciton? and if w changes - and it points to z, why doesnt z change accordingly?
In Python objects are passed by reference to functions.
This line makes a copy of z
w = z[:]
so changes to z don't affect w and vice versa. In the line
a = [0]
you change the reference to point to a new object, so you don't mutate x (which is what a was initially bound to). In the following lines
b[0] = 1
c[0] = 2
you mutate the objects that you got references to (y and z in global scope), so the objects in the outside scope change. In the line
w[0] = 3
you mutate the global object w since the name w is not a parameter of the function, nor is it bound in the body of the function.
What everyone else says is correct, but I want to add my way of thinking that may be helpful if you have experience with a language like C or C++.
Every variable in Python is a pointer (well, the technical term is a "reference", but I find that more difficult to visualize than "pointer"). You know how in C/C++ you can get a function to output multiple values by passing in pointers? Your code is doing essentially the same thing.
Of course, you may be wondering, if that is the case, why don't you see the same thing happening to ints, strs or whatnot? The reason is that those things are immutable, which means you cannot directly change the value of an int or a str at all. When you "change an integer", like i = 1, you are really changing the variable, pointing it to a different int object. Similarly, s += 'abc' creates a new str object with the value s + 'abc', then assigns it to s. (This is why s += 'abc' can be inefficient when s is long, compared to appending to a list!)
Notice that when you do a = [0], you are changing a in the second way --- changing the pointer instead of the object pointed to. This is why this line doesn't modify x.
Finally, as the others has said, w = z[:] makes a copy. This might be a little confusing, because for some other objects (like numpy arrays), this syntax makes a view instead of a copy, which means that it acts like the same object when it comes to changing elements. Remember that [] is just an operator, and every type of object can choose to give it a different semantical meaning. Just like % is mod for ints, and formatting for strs --- you sometimes just need to get familiar with the peculiarities of different types.
I have been writing stochastic PDE simulations in Julia, and as my problems have become more complicated, the number of independent parameters has increased. So what starts with,
myfun(N,M,dt,dx,a,b)
eventually becomes
myfun(N,M,dt,dx,a,b,c,d,e,f,g,h)
and it results in (1) messy code, (2) increased chance of error due to misplaced function arguments, (3) inability to generalise for use in other functions.
(3) is important, because I have made simple parallelisation of my code to evaluate many different runs of the PDEs. So I would like to convert my functions into a form:
myfun(args)
where args contains all the relevant arguments. The problem I am finding with Julia, is that creating a struct containing all my relevant parameters as attributes slows things down considerably. I think this is due to the continual accessing of the struct attributes. As a simple (ODE) working example,
function example_fun(N,dt,a,b)
V = zeros(N+1)
U = 0
z = randn(N+1)
for i=2:N+1
V[i] = V[i-1]*(1-dt)+U*dt
U = U*(1-dt/a)+b*sqrt(2*dt/a)*z[i]
end
return V
end
If I try to rewrite this as,
function example_fun2(args)
V = zeros(args.N+1)
U = 0
z = randn(args.N+1)
for i=2:args.N+1
V[i] = V[i-1]*(1-args.dt)+U*args.dt
U = U*(1-args.dt/args.a)+args.b*sqrt(2*args.dt/args.a)*z[i]
end
return V
end
Then while the function call looks elegant, it is cumbersome to rework accessing every attribute from the class and also this continual accessing of attributes slows the simulation down. What is a better solution? Is there a way to simply 'unpack' the attributes of a struct so they do not have to be continually accessed? And if so, how would this be generalised?
edit:
I am defining the struct I use as follows:
struct Args
N::Int64
dt::Float64
a::Float64
b::Float64
end
edit2: I have realised that structs with Array{} attributes can give rise to a performance difference if you do not specify the dimensions of the array in the struct definition. For example, if c is a one-dimensional array of parameters,
struct Args_1
N::Int64
c::Array{Float64}
end
will give far worse performance in f(args) than f(N,c). However, if we specify that c is a one-dimensional array in the struct definition,
struct Args_1
N::Int64
c::Array{Float64,1}
end
then the performance penalty disappears. This issue and the type instability shown in my function definitions seem to account for the performance difference I encountered when using a struct as the function argument.
In your code there is a type instability, related to U which is initialized as an 0 (integer), but if you replace it with 0. (floating point number), the type-instability disapears.
For the original versions (with "U=0"), function example_fun takes 801.933 ns (for the parameters 10,0.1,2.,3.) and example_fun2 925.323 ns (for similar values).
In the type-stable version (U=0.), both take 273 ns (+/5 ns). Thus this a substantial speed-up and there is no more a penalty of combining the arguments in the type args.
Here is the complete function:
function example_fun2(args)
V = zeros(args.N+1)
U = 0.
z = randn(args.N+1)
for i=2:args.N+1
V[i] = V[i-1]*(1-args.dt)+U*args.dt
U = U*(1-args.dt/args.a)+args.b*sqrt(2*args.dt/args.a)*z[i]
end
return V
end
Maybe you did not declare the types of the parameters of the type declaration of args?
Consider this small example:
struct argstype
N
dt
end
myfun(args) = args.N * args.dt
myfun is not type-stable can the type of the return type cannot be inferred:
#code_warntype myfun(argstype(10,0.1))
Variables:
#self# <optimized out>
args::argstype
Body:
begin
return ((Core.getfield)(args::argstype, :N)::Any * (Core.getfield)(args::argstype, :dt)::Any)::Any
end::Any
However, if you declare the types, then code becomes type-stable:
struct argstype2
N::Int
dt::Float64
end
#code_warntype myfun(argstype2(10,0.1))
Variables:
#self# <optimized out>
args::argstype2
Body:
begin
return (Base.mul_float)((Base.sitofp)(Float64, (Core.getfield)(args::argstype2, :N)::Int64)::Float64, (Core.getfield)(args::argstype2, :dt)::Float64)::Float64
end::Float64
You see that the inferred return type of Float64.
With parametric types (https://docs.julialang.org/en/v0.6.3/manual/types/#Parametric-Types-1), your code still remains generic and type-stable at the same time:
struct argstype3{T1,T2}
N::T1
dt::T2
end
#code_warntype myfun(argstype3(10,0.1))
Variables:
#self# <optimized out>
args::argstype3{Int64,Float64}
Body:
begin
return (Base.mul_float)((Base.sitofp)(Float64, (Core.getfield)(args::argstype3{Int64,Float64}, :N)::Int64)::Float64, (Core.getfield)(args::argstype3{Int64,Float64}, :dt)::Float64)::Float64
end::Float64
I'm trying to crate a function in Fortran (95) that that will have as input a string (test) and a character (class). The function will compare each character of test with the character class and return a logical that is .true. if they are of the same class1 and .false. otherwise.
The function (and the program to run it) is defined below:
!====== WRAPPER MODULE ======!
module that_has_function
implicit none
public
contains
!====== THE ACTUAL FUNCTION ======!
function isa(test ,class )
implicit none
logical, allocatable, dimension(:) :: isa
character*(*) :: test
character :: class
integer :: lt
character(len=:), allocatable :: both
integer, allocatable, dimension(:) :: intcls
integer :: i
lt = len_trim(test)
allocate(isa(lt))
allocate(intcls(lt+1))
allocate(character(len=lt+1) :: both)
isa = .false.
both = class//trim(test)
do i = 1,lt+1
select case (both(i:i))
case ('A':'Z'); intcls(i) = 1! uppercase alphabetic
case ('a':'a'); intcls(i) = 2! lowercase alphabetic
case ('0':'9'); intcls(i) = 3! numeral
case default; intcls(i) = 99! checks if they are equal
end select
end do
isa = intcls(1).eq.intcls(2:)
return
end function isa
end module that_has_function
!====== CALLER PROGRAM ======!
program that_uses_module
use that_has_function
implicit none
integer :: i
i = 65
! Reducing the result of "isa" to a scalar with "all" works:
! V-V
do while (all(isa(achar(i),'A')))
print*, achar(i)
i = i + 1
end do
! Without the reduction it doesn''t:
!do while (isa(achar(i),'A'))
! print*, achar(i)
! i = i + 1
!end do
end program that_uses_module
I would like to use this function in do while loops, for example, as it is showed in the code above.
The problem is that, for example, when I use two scalars (rank 0) as input the function still returns the result as an array (rank 1), so to make it work as the condition of a do while loop I have to reduce the result to a scalar with all, for example.
My question is: can I make the function conditionally return a scalar? If not, then is it possible to make the function work with vector and scalar inputs and return, respectively, vector and scalar outputs?
1. What I call class here is, for example, uppercase or lowercase letters, or numbers, etc. ↩
You can not make the function conditionally return a scalar or a vector.
But you guessed right, there is a solution. You will use a generic function.
You write 2 functions, one that takes scalar and return scalar isas, the 2nd one takes vector and return vector isav.
From outside of the module you will be able to call them with the same name: isa. You only need to write its interface at the beginning of the module:
module that_has_function
implicit none
public
interface isa
module procedure isas, isav
end interface isa
contains
...
When isa is called, the compiler will know which one to use thanks to the type of the arguments.
The rank of a function result cannot be conditional on the flow of execution. This includes selection by evaluating an expression.
If reduction of a scalar result is too much, then you'll probably be horrified to see what can be done instead. I think, for instance, of derived types and defined operations.
However, I'd consider it bad design in general for the function reference to be unclear in its rank. My answer, then, is: no you can't, but that's fine because you don't really want to.
Regarding the example of minval, a few things.1 As noted in the comment, minval may take a dim argument. So
integer :: X(5,4) = ...
print *, MINVAL(X) ! Result a scalar
print *, MINVAL(X,dim=1) ! Result a rank-1 array
is in keeping with the desire of the question.
However, the rank of the function result is still "known" at the time of referencing the function. Simply having a dim argument means that the result is an array of rank one less than the input array rather than a scalar. The rank of the result doesn't depend on the value of the dim argument.
As noted in the other answer, you can have similar functionality with a generic interface. Again, the resolved specific function (whichever is chosen) will have a result of known rank at the time of reference.
1 The comment was actually about minloc but minval seems more fitting to the topic.
Suppose I have an algol-like language, with static types and the following piece of code:
a := b + c * d;
where a is a float, b an integer, c a double and d a long. Then, the language will convert d to long to operate with c, and b to double to operate with c*d result. So, after that, the double result of b+c*d will be converted to float to assign the result to a. But, when it happens?, I mean, do all the conversions happens in runtime or compile time?
And if I have:
int x; //READ FROM USER KEYBOARD.
if (x > 5) {
a:= b + c * d;
}
else {
a := b + c;
}
The above code has conditionals. If the compiler converts this at compile time, some portion may never run. Is this correct?
You cannot do a conversion at compile-time any more than you can do an addition at compile time (unless the compiler can determine the value of the variable, perhaps because it is actually constant).
The compiler can (and does) emit a program with instructions which add and multiply the value of variables. It also emits instructions which convert the type of a stored value into a different type prior to computation, if that is necessary.
Languages which do not have variable types fixed at compile-time do have to perform checks at runtime and conditionally convert values to different types. But I don't believe that is the case with any of the languages included in the general category of "Algol-like".
I have two statements:
String aStr = new String("ABC");
String bStr = "ABC";
I read in book that in first statement JVM creates two bjects and one reference variable, whereas second statement creates one reference variable and one object.
How is that? When I say new String("ABC") then It's pretty clear that object is created.
Now my question is that for "ABC" value to we do create another object?
Please clarify a bit more here.
Thank you
You will end up with two Strings.
1) the literal "ABC", used to construct aStr and assigned to bStr. The compiler makes sure that this is the same single instance.
2) a newly constructed String aStr (because you forced it to be new'ed, which is really pretty much non-sensical)
Using a string literal will only create a single object for the lifetime of the JVM - or possibly the classloader. (I can't remember the exact details, but it's almost never important.)
That means it's hard to say that the second statement in your code sample really "creates" an object - a certain object has to be present, but if you run the same code in a loop 100 times, it won't create any more objects... whereas the first statement would. (It would require that the object referred to by the "ABC" literal is present and create a new instance on each iteration, by virtue of calling the constructor.)
In particular, if you have:
Object x = "ABC";
Object y = "ABC";
then it's guaranteed (by the language specification) than x and y will refer to the same object. This extends to other constant expressions equal to the same string too:
private static final String A = "a";
Object z = A + "BC"; // x, y and z are still the same reference...
The only time I ever use the String(String) constructor is if I've got a string which may well be backed by a rather larger character array which I don't otherwise need:
String x = readSomeVeryLargeString();
String y = x.substring(5, 10);
String z = new String(y); // Copies the contents
Now if the strings that y and x refer to are eligible for collection but the string that z refers to isn't (e.g. it's passed on to other methods etc) then we don't end up holding all of the original long string in memory, which we would otherwise.