Preferred style using public/private statements in Fortran? - scope

For example:
module m_a
   private
   integer :: x, y
   public :: x
end module m_a

module m_b
   public
   integer :: x, y
   private :: y
end module m_b
Obviously the variables x and y have equivalent scope in modules m_a and m_b. My question is: which style is preferred from the point of view of programming style?
In most tutorials of Fortran 90/95, the style of module m_a is adopted. However, for a large project containing a complicated hierarchy of hundreds of modules, I've noticed significantly longer compilation times with style m_a than with style m_b.
I have not found a similar topic discussed. Maybe I've misused public/private statements in the module hierarchy? Any suggestions?

Style m_a is preferred, where the default is made private, and items are explicitly declared public. With this approach the programmer can readily identify which items are exported (public) by the module. With the other approach this information is difficult to figure out. Assisting the programmer in understanding the module is more important than compilation time.
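As an illustration, a module written in this preferred style might look like the following (a minimal sketch with invented names, not taken from the question):
module m_geometry
   implicit none
   private                       ! nothing is exported by default

   ! The public list below is the complete interface of the module.
   public :: pi, area_circle

   real, parameter :: pi = 3.1415927
   real :: scratch               ! remains private automatically

contains

   function area_circle(r) result(a)
      real, intent(in) :: r
      real :: a
      a = pi * r**2
   end function area_circle

end module m_geometry
A reader only needs to look at the public statement to see the module's exported interface; everything else is an implementation detail.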

Related

Scope, association, and information hiding in Fortran 90

1
This question is concerned with calling and called subroutines in Fortran 90. I am quite confused about the rules for host/use/argument association; I have trouble understanding the scoping logic that results from these rules. Perhaps the simplest way to expose my problem is to explain what I would like to achieve and why.
I would like to meet two design requirements:
(i) A calling subroutine only allows the called subroutine to access its entities that are passed as arguments, no other.
(ii) A called subroutine does not allow a calling subroutine to access any of the entities it locally defines.
If a picture helps, I can provide one. I wish I could think of the calling and the called subroutines as two rooms connected by a channel they use to pass or return arguments. I would like this argument association to be the only means by which two subroutines can have any influence on each other. I believe code that meets these requirements will be more robust to side effects. If I am mistaken in this idea, I would be grateful to have it explained why. If there is a strong reason why these requirements should not be desired, I would also be happy to know.
2
Certainly, Fortran 90 offers the possibility to use modules and the ‘only’ option. For example, one can do the following:
module my_mod_a
contains
   subroutine my_sub_a
      use my_mod_b, only: my_sub_b
      …
      call my_sub_b(arg_list_b)
      …
   end subroutine my_sub_a
end module my_mod_a
!………
module my_mod_b
contains
   subroutine my_sub_b(arg_list_b’)
      ! do stuff with arg_list_b’
   end subroutine my_sub_b
   …
end module my_mod_b
!………
Sure enough, my_sub_a will at most be allowed to access those entities of my_mod_b for which my_sub_b is a scoping unit. But will it be able to access entities of my_sub_b other than the argument list it is passing? In particular, will my_sub_a be able to access entities that are local to my_sub_b? Conversely, does use association allow my_sub_b to access entities of my_sub_a other than those passed as actual arguments?
3
Is the following ‘buffer module’ construction sufficient in order to meet the requirements of #1?
module my_mod_a
contains
   subroutine my_sub_a
      use my_mod_b_shell, only: my_sub_b_shell
      …
      call my_sub_b_shell(arg_list_b)
      …
   end subroutine my_sub_a
end module my_mod_a
!………
module my_mod_b_shell
contains
   subroutine my_sub_b_shell(arg_list_b’)
      ! passes arguments, does not do anything else
      use my_mod_b, only: my_sub_b
      call my_sub_b(arg_list_b’)
   end subroutine my_sub_b_shell
end module my_mod_b_shell
!………
module my_mod_b
contains
   subroutine my_sub_b(arg_list_b’)
      ! do stuff with arg_list_b’
   end subroutine my_sub_b
   …
end module my_mod_b
!………
4
Is there any simpler construction to achieve the goals of #1?
5
Following the suggestions proposed by Ross and Vladimir F,
one possibility could be:
(i’) to have a one-to-one correspondence between modules and subroutines,
(ii’) to declare local variables in the module instead of the subroutine; one is then able to tag local variables as ‘private’.
Just to be sure I have got the point right, here is a trivial program that illustrates (i’) and (ii’):
program main
   use sub_a_module
   implicit none
   double precision :: x
   x = 0.0d+0
   write(*,*) 'x ante:', x
   call sub_a(x)
   write(*,*) 'x post:', x
end program main
!-------------------------------------------------
module sub_a_module
   double precision, private :: u0
   double precision, private :: v0
contains
!.-.-.-.-.-.-.-.-
   subroutine sub_a(x)
      use sub_b_module
      implicit none
      double precision :: x
      u0 = 1.0d+0
      v0 = 2.0d+0
      call sub_b(v0)
      x = x + u0 + v0
   end subroutine sub_a
!.-.-.-.-.-.-.-.-
end module sub_a_module
!-------------------------------------------------
module sub_b_module
   double precision, private :: w0
contains
!.-.-.-.-.-.-.-.-
   subroutine sub_b(v)
      implicit none
      double precision :: v
      w0 = 1.0d-1
      v = v + w0
   end subroutine sub_b
!.-.-.-.-.-.-.-.-
end module sub_b_module
In this example, the only entity of sub_a that sub_b can access is v0 (argument association); u0 will remain hidden from sub_b. Conversely, the 'private' tag guarantees that the variable w0 remains out of the scope of sub_a even if sub_a USEs sub_b_module. Is that right?
@Ross: Thank you for pointing out a previous post where association is inherited. However, my impression is that it only addresses half of my problem; the construction discussed in that post illustrates how one can prevent a caller program unit from accessing entities of a called program unit that should remain hidden ('use only' and/or 'private' options), but I am unable to assert with certainty that entities of the caller program unit that are not argument-associated will remain inaccessible to the called program unit.
You can always create a small module that contains only the subroutine and nothing else, and use it from a larger module that collects these and which is then actually used for calling the subroutine.
Fortran 2015 brings more control over host association using IMPORT. I am not sure whether that also affects module procedures, but it might. But you are asking about the ancient Fortran 90, so you are probably not interested in this (compilers don't implement it yet anyway), but I will leave it here for the future:
If one import statement in a scoping unit is an import,only statement, they must all be, and only the entities listed become accessible by host association.
If an import,none statement appears in a scoping unit, no entities are accessible by host association and it must be the only import statement in the scoping unit. ...
(From: Reid (2017) The new features of Fortran 2015)
Yes, your example is more or less correct, although there are many possible variations. You can always add the only clause to document why you use the module and which symbols are imported.
Important: I suggest you do not put implicit none inside the contained procedures but just once in the module. You definitely do want the module variables to be covered by implicit none! And use some indentation, so that the structure is actually visible when looking at a block of code.
If you use only one source file, you definitely have to put the modules and the program in a different order, with each module defined before the unit that uses it:
module sub_b_module
   implicit none
   double precision, private :: w0
contains
   subroutine sub_b(v)
      double precision :: v
      w0 = 1.0d-1
      v = v + w0
   end subroutine sub_b
end module sub_b_module

module sub_a_module
   implicit none
   double precision, private :: u0
   double precision, private :: v0
contains
   subroutine sub_a(x)
      use sub_b_module, only: sub_b
      double precision :: x
      u0 = 1.0d+0
      v0 = 2.0d+0
      call sub_b(v0)
      x = x + u0 + v0
   end subroutine sub_a
end module sub_a_module

program main
   use sub_a_module, only: sub_a
   implicit none
   double precision :: x
   x = 0.0d+0
   write(*,*) 'x ante:', x
   call sub_a(x)
   write(*,*) 'x post:', x
end program main
If you are really concerned about data access, you can make modules mod_sub_a, mod_sub_b, mod_sub_c, each with only one public subroutine. Then write module subroutines that use those, and let all the other code access them only through these module subroutines.
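One possible reading of that layout, as a minimal sketch (the module, subroutine and variable names are invented for illustration):
module mod_sub_a
   implicit none
   private
   public :: sub_a               ! the only public entity of this module
contains
   subroutine sub_a(x)
      double precision, intent(inout) :: x
      x = x + 1.0d+0
   end subroutine sub_a
end module mod_sub_a

! A collecting module; the rest of the code uses only this one.
module subs
   use mod_sub_a, only: sub_a
   implicit none
   private
   public :: sub_a               ! re-export the use-associated subroutine
end module subs
Code that says use subs, only: sub_a can call the subroutine but sees nothing else, and sub_a itself has nothing in its host module to reach besides what it declares.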
To be clear - from IanH's answer I can see there may have been some misunderstanding here - I certainly do NOT recommend a one-to-one correspondence of modules and subroutines. I regard it as a rather extreme thing to do to ensure your main point:
A calling subroutine only allows the called subroutine to access its
entities that are passed as arguments, no other.
So I showed you how to make sure that your subroutine does not have access to anything outside - by placing it into a module which does not contain anything more to have access to. I also showed you the future way with import that can be used to disable access to other entities defined in the host module.
I just ignored your second point
A called subroutine does not allow a calling subroutine to access any
of the entities it locally defines.
because that is fulfilled automatically, unless the calling subroutine is internal and contained within the called subroutine.
Local variables of a subroutine are only accessible from within the scope of that subroutine (and its internal procedures) - that's why they are called "local variables". You can achieve your design goals in part 1 of your question without any particular effort.
Variables that are not local variables (for example, module variables, variables in a common block) may be accessible or share information across different scopes - that's typically the reason those other sorts of variable exist in the language. If you don't want that sort of information sharing, then don't use those other sorts of variable!
The suggestions in part 5 of the edited question are going the wrong way...

System of equations using metaprogramming

I am trying to create a function that computes the residuals of a system of equations using metaprogramming.
This is what I have tried so far (toy example):
function syst!(x::Vector, ou::Vector)
    for i in 1:length(x)
        eval(parse("ou[$i] = x[$i]^2 + x[$i]"))
    end
    return ou
end
However, when I try to call the function, Julia says that the variable x is not defined. But if I include a println(parse("ou[$i] = x[$i]^2 + x[$i]")) I get the code that would be "typed" in the body of the function (sorry if I'm not using the correct technical CS terms, I come from the "scientific culture").
Anyway, it seems that the parsed x lives in another scope. How can I bring that parsed x into the scope of the function so that it represents the x from the arguments of syst!?
Bonus: I have a system of 700 equations that are amenable to being "typed" using metaprogramming. What's the best way/technique to create a function that computes the residuals of the system? Was I on the right track?
Stefan's comment is right; in this specific example there is no need for metaprogramming. However, if you wanted to generate many lines similar to ou[i] = x[i]^2 + x[i] but different in complicated ways, you could generate them with a macro. See http://docs.julialang.org/en/release-0.4/manual/metaprogramming/. Macros expand to generated code "in place" as if you had typed it yourself, so variables can refer to the surrounding scope.
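For example, here is a minimal sketch of that approach (written in current Julia syntax, whereas the question predates Julia 1.0; the macro name @fill_residuals is invented for illustration):
macro fill_residuals(ou, x, n)
    # Build the assignment expressions at expansion time; esc() makes them
    # refer to the ou and x that exist where the macro is used.
    exprs = [:($(esc(ou))[$i] = $(esc(x))[$i]^2 + $(esc(x))[$i]) for i in 1:n]
    return Expr(:block, exprs...)
end

function syst!(x::Vector, ou::Vector)
    @fill_residuals(ou, x, 3)   # expands in place to ou[1] = x[1]^2 + x[1], etc.
    return ou
end

syst!([1.0, 2.0, 3.0], zeros(3))   # returns [2.0, 6.0, 12.0]
Because the expansion happens where syst! is defined, the generated assignments see the function's own x and ou, which is exactly what the eval/parse version could not do.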

Is there a fast way of going from a symbol to a function call in Julia? [duplicate]

This question already has an answer here:
Julia: invoke a function by a given string
(1 answer)
Closed 6 years ago.
I know that you can call functions using their name as follows
f = x -> println(x)
y = :f
eval(:($y("hi")))
but this is slow since it uses eval. Is it possible to do this in a different way? I know it's easy to go the other direction by just doing symbol(f).
What are you trying to accomplish? Needing to eval a symbol sounds like a solution in search of a problem. In particular, you can just pass around the original function, thereby avoiding issues with needing to track the scope of f (or, since f is just an ordinary variable in your example, the possibility that it would get reassigned), and with fewer characters to type:
f = x -> println(x)
g = f
g("hi")
I know it's easy to go the other direction by just doing symbol(f).
This is misleading, since it's not actually going to give you back f (that transform would be non-unique). Instead, it gives you the string representation of the function (which might happen to be f, sometimes). It is simply equivalent to calling Symbol(string(f)), since that combination is common enough to be useful for other purposes.
Actually I have found use for the above scenario. I am working on a simple form compiler allowing for the convenient definition of variational problems as encountered in e.g. finite element analysis.
I am relying on the Julia parser to do an initial analysis of the syntax. The equations entered are valid Julia syntax, but will trigger errors on execution because some of the symbols or methods are not available at the point of the problem definition.
So what I do is roughly this:
I have a type that can hold my problem description:
type Cmd f; a; b; end
I have defined a macro so that I have access to the problem description's AST. I traverse this expression and create a Cmd object from its elements (this is not completely unlike the strategy behind the #mat macro in MATLAB.jl):
macro m(xp)
    c = Cmd(xp.args[1], xp.args[3], xp.args[2])
    :($c)
end
At a later step, I run the Cmd. Evaluation of the symbols happens only at this stage (yes, I need to be careful of the evaluation context):
function run(c::Cmd)
    xp = Expr(:call, c.f, c.a, c.b)
    eval(xp)
end
Usage example:
c = @m a^b
...
a, b = 2, 3
run(c)
which returns 9. So in short, the question is relevant in at least some meta-programming scenarios. In my case I have to admit I couldn't care less about performance as all of this is mere preprocessing and syntactic sugar.

Why does NPM's policy of duplicated dependencies work?

When I use NPM to manage a package depending on foo and bar, both of which depend on corelib, by default NPM will install corelib twice (once for foo, and once for bar). They might even be different versions.
Now, let's suppose that corelib defined some data structure (e.g. a URL object) which is passed between foo, bar and the main application. What I would expect is that if there was ever a backwards-incompatible change to this object (e.g. one of the field names changed), and foo depended on corelib-1.0 and bar depended on corelib-2.0, I'd be a very sad panda: bar's version of corelib-2.0 might see a data structure created by the old version of corelib-1.0 and things would not work very well.
I was really surprised to discover that this situation basically never happens (I trawled Google, Stack Overflow, etc., looking for examples of people whose applications had stopped working, but who could have fixed it by running dedupe). So my question is, why is this the case? Is it because node.js libraries never define data structures that are shared outside of the package? Is it because node.js developers never break backwards compatibility of their data structures? I'd really like to know!
this situation basically never happens
Yes, my experience is indeed that that is not a problem in the Node/JS ecosystem. And I think it is, in part, thanks to the robustness principle.
Below is my view on why and how.
Primitives, the early days
I think the first and foremost reason is that the language provides a common basis for primitive types (Number, String, Bool, Null, Undefined) and some basic compound types (Object, Array, RegExp, etc...).
So if I receive a String from one of the libs' APIs I use, and pass it to another, it cannot go wrong because there is just a single String type.
This is what used to happen, and still happens to some extent to this day: Library authors try to rely on the built-ins as much as possible and only diverge when there is sufficient reason to, and with sufficient care and thought.
Not so in Haskell. Before I started using stack, I ran into the following situation quite a few times with Text and ByteString:
Couldn't match type ‘T.Text’ with ‘Text’
NB: ‘T.Text’ is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.1’
    ‘Text’ is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.0’
Expected type: String -> Text
  Actual type: String -> T.Text
This is quite frustrating, because in the above example only the patch version is different. The two data types may only be different nominally, and the ADT definition and the underlying memory representation may be completely identical.
As an example, it could have been a minor bugfix to the intersperse function that warranted the release of 1.2.2.1. Which is completely irrelevant to me if all I care about, in this hypothetical example, is concatenating some Texts and comparing their lengths.
Compound types, objects
Sometimes there is sufficient reason to diverge in JS from the built in data types: Take Promises as an example. It's such a useful abstraction over async computations compared to callbacks that many APIs started using them. What now? How come we don't run into many incompatibilities when different versions of these {then(), fail(), ...} objects are being passed up, down and around the dependency tree?
I think it's thanks to the robustness principle.
Be conservative in what you send, be liberal in what you accept.
So if I am authoring a JS library which I know returns promises and takes promises as part of its API, I'll be very careful how I interact with the received objects. E.g. I won't be calling fancy .success(), .finally(), ['catch']() methods on it, since I want to be as compatible as possible with different users, with different implementations of Promises. So, very conservatively, I may just use .then(done, fail), and nothing more. At this point, it doesn't matter if the user uses the promises that my lib returns, or Bluebirds' or even if they hand-write their own, so long as those adhere to the most basic Promise 'laws' -- the most basic API contracts.
Can this still lead to breakage at runtime? Yes, it can. If even the most basic API contract is not fulfilled, you may get an exception saying "Uncaught TypeError: promise.then is not a function". I think the trick here is that library authors are explicit about what their API needs: e.g. a .then method on the supplied object. And then it's up to whoever is building on top of that API to make damn sure that that method is available on the object they pass in.
I'd like to also point out here that this is also the case for Haskell, isn't it? Should I be so foolish as to write an instance for a typeclass that still type-checks without following its laws, I'll get runtime errors, won't I?
Where do we go from here?
Having thought through all this just now, I think we might be able to have the benefits of the robustness principle even in Haskell, with much less (or even no?) risk of runtime exceptions/errors compared to JavaScript: we just need the type system to be granular enough so it can distinguish what we want to do with the data we manipulate, and determine whether that is still safe or not. E.g. the hypothetical Text example above is, I would wager, still safe. And the compiler should only complain if I'm trying to use intersperse, and ask me to qualify it, e.g. with T.intersperse, so it can be sure which one I want to use.
How do we do this in practice? Do we need extra support, e.g. language extension flags from GHC? We might not.
Just recently I found bookkeeper, which is a compile-time type-checked anonymous records implementation.
Please note: The following is conjecture on my part, I haven't taken much time to try and experiment with Bookkeeper. But I intend to in my Haskell projects to see if what I write about below could really be achieved with an approach such as this.
With Bookkeeper I could define an API like so:
emptyBook & #then =: id & #fail =: const
  :: Bookkeeper.Internal.Book'
       '["fail" 'Data.Type.Map.:-> (a -> b -> a),
         "then" 'Data.Type.Map.:-> (a1 -> a1)]
This works because functions are also first-class values. And whichever API takes this Book as an argument can be very specific about what it demands from it: namely the #then function, and that it has to match a certain type signature. It cares not for any other function that may or may not be present, with whatever signature. All of this is checked at compile time.
Prelude Bookkeeper> let f o = (o ?: #foo) "a" "b" in f $ emptyBook & #foo =: (++)
"ab"
Conclusion
Maybe Bookkeeper or something similar will turn out to be useful in my experiments. Maybe Backpack will rush to the rescue with its common interface definitions. Or some other solution comes along. But either way, I hope we can move towards being able to take advantage of the robustness principle. And that Haskell's dependency management can also "just work" most of the time and fail with type errors only when it is truly warranted.
Does the above make sense? Anything unclear? Does it answer your question? I'd be curious to hear.
Further possibly relevant discussion may be found in this /r/haskell reddit thread, where this topic came up not long ago; I thought to post this answer in both places.
If I understand correctly, the supposed problem might be:
Module A
exports = require("c") //v0.1
Module B
console.log(require("a"))
console.log(require("c")) //v0.2
Module C
V0.1
exports = "hello";
V0.2
exports = "world";
By copying C v0.2 into node_modules and C v0.1 into node_modules/a/node_modules and creating dummy package.json files, I think I created the case you're talking about.
Will B have 2 different conflicting versions of C_data?
Short answer: it does. So Node does not handle conflicting versions.
The reason you don't see it on the internet is, as gustavohenke explained, that Node naturally does not encourage you to pollute the global scope or to pass structures from module to module along the chain.
In other words, it's not often that you'll see a module export another module's structure.
I don't have first-hand experience with this kind of situation in a large JS program, but I would guess that it has to do with the OO style of bundling data together with the functions that act on that data into a single object. Effectively the "ABI" of an object is to pull public methods by name out of a dictionary, and then invoke them by passing the object as the first argument. (Or perhaps the dictionary contains closures that are already partially applied to the object itself; it doesn't really matter.)
In Haskell we do encapsulation at a module level. For example, take a module that defines a type T and a bunch of functions, and exports the type constructor T (but not its definition) and some of the functions. The normal way to use such a module (and the only way that the type system will permit) is to use one exported function create to create a value of type T, and another exported function consume to consume the value of type T: consume (create a b c) x y z.
If I had two different versions of the module with different definitions of T and I was able to use the create from version 1 together with the consume from version 2 then I'd likely get a crash or wrong answer. Note that this is possible even if the public API and externally observable behavior of the two versions is identical; perhaps version 2 has a different representation of T that allows for a more efficient implementation of consume. Of course, GHC's type system stops you from doing this, but there are no such safeguards in a dynamic language.
You can translate this style of programming directly into a language like JavaScript or Python:
import M
result = M.consume(M.create(a, b, c), x, y, z)
and it would have exactly the same kind of problem that you are talking about.
However, it's far more common to use the OO style:
import M
result = M.create(a, b, c).consume(x, y, z)
Note that only create is imported from the module. consume is in a sense imported from the object we got back from create. In your foo/bar/corelib example, let's say that foo (which depends on corelib-1.0) calls create and passes the result to bar (which depends on corelib-2.0) which will call consume on it. Actually, while foo needs a dependency on corelib to call create, bar does not need a dependency on corelib to call consume at all. It's only using the base language notions to invoke consume (what we could spell getattr in Python). In this situation, bar will end up invoking the version of consume from corelib-1.0 regardless of what version of corelib bar "depends on".
Of course for this to work the public API of corelib must not have changed too much between corelib-1.0 and corelib-2.0. If bar wants to use a method fancyconsume which is new in corelib-2.0 then it won't be present on an object created by corelib-1.0. Still, this situation is much better than we had in original Haskell version, where even changes that do not affect the public API at all can cause breakage. And perhaps bar depends on corelib-2.0 features for the objects it creates and consumes itself, but only uses the API of corelib-1.0 to consume objects it receives externally.
To achieve something similar in Haskell, you could use this translation. Rather than directly using the underlying implementation
data TImpl = TImpl ... -- private
create_ :: A -> B -> C -> TImpl
consume_ :: TImpl -> X -> Y -> Z -> R
...
we wrap up the consumer interface with an existential in an API package corelib-api:
module TInterface where

data T = forall a. T { impl :: a,
                       _consume :: a -> X -> Y -> Z -> R,
                       ... } -- Or use a type class if preferred.

consume :: T -> X -> Y -> Z -> R
consume t = (_consume t) (impl t)
and then the implementation in a separate package corelib:
module T where
import TInterface

data TImpl = TImpl ... -- private
create_ :: A -> B -> C -> TImpl
consume_ :: TImpl -> X -> Y -> Z -> R
...

create :: A -> B -> C -> T
create a b c = T { impl = create_ a b c,
                   _consume = consume_ }
Now foo uses corelib-1.0 to call create, but bar only needs corelib-api to call consume. The type T lives in corelib-api, so if the public API version does not change, then foo and bar can interoperate even if bar is linked against a different version of corelib.
(I know Backpack has a lot to say about this kind of thing; I'm offering this translation as a way to explain what is happening in the OO programs, not as a style one should seriously adopt.)
Here is a question that mostly answers the same thing: https://stackoverflow.com/a/15948590/2083599
Node.js modules don't pollute the global scope, so when they're required, they'll be private to the module that required them - and this is a great functionality.
When 2 or more packages require different versions of the same lib, NPM will install them for each package, so no conflicts will ever happen.
When they don't, NPM will install only once that lib.
On the other hand, Bower, which is a package manager for the browser, installs only flat dependencies because the libs go into the global scope, so you can't install jquery 1.x.x and 2.x.x. They would both export the same jQuery and $ vars.
About the backwards compatibility problems:
All developers do break backwards compatibility at least once! The only difference between Node developers and developers on other platforms is that we have been taught to always use semver.
Considering that most packages out there have not reached v2.0.0 yet, I believe that they have kept the same API in the switch from v0.x.x to v1.0.0.

Achieving the right abstractions with Haskell's type system

I'm having trouble using Haskell's type system elegantly. I'm sure my problem is a common one, but I don't know how to describe it except in terms specific to my program.
The concepts I'm trying to represent are:
datapoints, each of which takes one of several forms, e.g. (id, number of cases, number of controls), (id, number of cases, population)
sets of datapoints and aggregate information: (set of id's, total cases, total controls), with functions for adding / removing points (so for each variety of point, there's a corresponding variety of set)
I could have a class of point types and define each variety of point as its own type. Alternatively, I could have one point type and a different data constructor for each variety. Similarly for the sets of points.
I have at least one concern with each approach:
With type classes: Avoiding function name collision will be annoying. For example, both types of points could use a function to extract "number of cases", but the type class can't require this function because some other point type might not have cases.
Without type classes: I'd rather not export the data constructors from, say, the Point module (providing other, safer functions to create a new value). Without the data constructors, I won't be able to determine of which variety a given Point value is.
What design might help minimize these (and other) problems?
To expand a bit on sclv's answer, there is an extended family of closely-related concepts that amount to providing some means of deconstructing a value: Catamorphisms, which are generalized folds; Church-encoding, which represents data by its operations, and is often equivalent to partially applying a catamorphism to the value it deconstructs; CPS transforms, where a Church encoding resembles a reified pattern match that takes separate continuations for each case; representing data as a collection of operations that use it, usually known as object-oriented programming; and so on.
In your case, what you seem to want is an abstract type, i.e. one that doesn't export its internal representation, but not a completely sealed one, i.e. one that leaves the representation open to functions in the module that defines it. This is the same pattern followed by things like Data.Map.Map. You probably don't want to go the type class route, since it sounds like you need to work with a variety of data points, rather than with an arbitrary choice of a single type of data point.
Most likely, some combination of "smart constructors" to create values, and a variety of deconstruction functions (as described above) exported from the module is the best starting point. Going from there, I expect most of the remaining details should have an obvious approach to take next.
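As a rough illustration of that starting point (the type, constructor, and function names here are invented, loosely following the datapoint description in the question):
module Point
  ( Point            -- the type is exported, its constructors are not
  , mkCaseControl    -- "smart constructors"
  , mkCasePopulation
  , foldPoint        -- deconstruction without exposing the constructors
  ) where

data Point
  = CaseControl    Int Int Int   -- id, cases, controls
  | CasePopulation Int Int Int   -- id, cases, population

-- Smart constructors can validate their inputs before building a value.
mkCaseControl :: Int -> Int -> Int -> Maybe Point
mkCaseControl i cs ct
  | cs >= 0 && ct >= 0 = Just (CaseControl i cs ct)
  | otherwise          = Nothing

mkCasePopulation :: Int -> Int -> Int -> Maybe Point
mkCasePopulation i cs p
  | cs >= 0 && p >= 0 = Just (CasePopulation i cs p)
  | otherwise         = Nothing

-- A catamorphism-style eliminator: callers say what to do with each variety.
foldPoint :: (Int -> Int -> Int -> r)   -- used for case/control points
          -> (Int -> Int -> Int -> r)   -- used for case/population points
          -> Point -> r
foldPoint cc cp pt = case pt of
  CaseControl    i cs ct -> cc i cs ct
  CasePopulation i cs p  -> cp i cs p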
With the latter solution (no type classes), you can export a catamorphism on the type rather than the constructors.
data MyData = PointData Double Double | ControlData Double Double Double | SomeOtherData String Double

foldMyData pf cf sf d = case d of
  (PointData x y)     -> pf x y
  (ControlData x y z) -> cf x y z
  (SomeOtherData s x) -> sf s x
That way you have a way to pull your data apart into whatever you want (including just ignoring the values and passing functions that return what type of constructor you used) without providing a general way to construct your data.
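For instance, assuming the MyData definition above, a caller could use the fold like this (hypothetical helpers, just to show the idea):
-- Pull out a representative Double from whichever variant we were given.
firstNumber :: MyData -> Double
firstNumber = foldMyData (\x _ -> x) (\x _ _ -> x) (\_ x -> x)

-- Ignore the payload entirely and just report which constructor was used.
variantName :: MyData -> String
variantName = foldMyData (\_ _ -> "PointData")
                         (\_ _ _ -> "ControlData")
                         (\_ _ -> "SomeOtherData")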
I find the type-classes-based approach better as long as you are not going to mix different data points in a single data structure.
The name collision problem you mentioned can be solved by creating a separate type class for each distinct field, like this:
class WithCases p where
  cases :: p -> NumberOfCases
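Hypothetical instances for two point types (the concrete type names and the NumberOfCases alias are assumptions for illustration) might then look like:
type NumberOfCases = Int

data CaseControlPoint    = CaseControlPoint    Int NumberOfCases Int  -- id, cases, controls
data CasePopulationPoint = CasePopulationPoint Int NumberOfCases Int  -- id, cases, population

instance WithCases CaseControlPoint where
  cases (CaseControlPoint _ n _) = n

instance WithCases CasePopulationPoint where
  cases (CasePopulationPoint _ n _) = n
A point type that has no notion of "cases" simply never gets a WithCases instance, so nothing forces a meaningless implementation and the accessor names never collide.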

Resources