Something that comes up a fair amount when dealing with heterogeneous data is the need to partially change simple objects that hold data. For instance, you might want to add, drop, or rename a property, or concatenate two objects. This is easy enough in dynamic languages, but I'm wondering whether any statically typed languages offer clever solutions.
To make this concrete: are there any languages that enable, perhaps through some sort of static mixin syntax, something like this (C#)?
var hello = new { Hello = "Hello" };
var world = new { World = "World" };
var helloWorld = hello + world;
Console.WriteLine(helloWorld.ToString());
//outputs {Hello = Hello, World = World}
This certainly seems possible, since no runtime information is used. Are there static languages that have this capability?
Added:
A limited version of what I'm considering is F#'s copy-and-update record expression:
let myRecord3 = { myRecord2 with Y = 100; Z = 2 }
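Scala's case classes offer a comparable copy-and-update operation, for reference (a small sketch; the record type is made up):

case class MyRecord(x: Int, y: Int, z: Int)

val myRecord2 = MyRecord(x = 1, y = 2, z = 3)
// Analogous to F#'s `with`: unmentioned fields are carried over unchanged.
val myRecord3 = myRecord2.copy(y = 100, z = 2)
// myRecord3 == MyRecord(1, 100, 2)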
What you're describing is known in programming language research as record concatenation. There has been some work on static type systems for record concatenation, mostly in the context of automatic type inference a la Haskell or ML. To the best of my knowledge it has not yet made an impact on any mainstream programming languages.
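To see what is missing, here is what you are forced to write by hand in a typical static language: the combined record type and the merge must be declared explicitly for every pair of shapes (a Scala sketch; all names are made up):

case class Hello(hello: String)
case class World(world: String)
case class HelloWorld(hello: String, world: String)

// A generic, type-checked `+` over arbitrary records is exactly what
// record concatenation research aims to provide; without it, each
// combination is spelled out manually:
def concat(h: Hello, w: World): HelloWorld = HelloWorld(h.hello, w.world)

println(concat(Hello("Hello"), World("World")))
// prints: HelloWorld(Hello,World)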
I'm new to functional languages and I was wondering why we can't pass a parameter by reference.
I found answers saying that
you are not supposed to change the state of objects once they have been created
but I didn't quite get the idea.
It's not so much that you can't pass references, it's that with referential transparency there isn't a programmer-visible difference between references and values, because you aren't allowed to change what references point to. This makes it actually safer and more prevalent in pure functional programming to pass shared references around everywhere. From a semantic point of view, they may as well be values.
I think you have misunderstood the concept. Both Scheme and C/C++ are pass by value languages and most values are addresses (references).
Purely functional languages can have references, and those are passed by value. What they don't allow is rebinding variables in the same scope (mutating bindings) or updating the object a reference points to. All operations return a fresh new object.
As an example I can give you Java's strings. Java is not purely functional, but its strings are: if you convert a string to uppercase, you get a new string object in return, and the original one has not been altered.
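A quick illustration on the JVM (Scala syntax, but the strings are Java's):

val original = "hello"
val upper = original.toUpperCase // returns a brand-new String
println(upper)    // HELLO
println(original) // hello: the original has not been altered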
Most languages I know of are pass by value. Pass by name is alien to me.
Because if you pass params by reference you could change something in the parameter, which could introduce a side effect. Consider this:
function pay(person, cost) {
person.wallet -= cost;
}
function money(person) {
return person.wallet;
}
let joe = { name: "Joe", wallet: 300 };
console.log(money(joe)); // 300
pay(joe, 20);
console.log(money(joe)); // 280
The two money(joe) are taking the same input (the object joe) and giving different output (300, 280). This is in contradiction to the definition of a pure functional language (all functions must return the same output when given the same input).
If the program was made this way, there is no problem:
function pay(person, cost) {
return Object.freeze({ ...person, wallet: person.wallet - cost });
}
function money(person) {
return person.wallet;
}
let joe = Object.freeze({ name: "Joe", wallet: 300 });
console.log(money(joe)); // 300
let joe_with_less_money = pay(joe, 20);
console.log(money(joe)); // still 300
console.log(money(joe_with_less_money)); // 280
Here we have to fake pass-by-value by freezing objects (which makes them immutable) since JavaScript can pass parameters only one way (pass by sharing), but the idea is the same.
(This assumes the sense of "pass-by-reference" that applies to languages like C++, where passing a reference affects mutability, not the implementation detail in modern languages, where references are typically passed under the hood but immutability is ensured by other means.)
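In a language where data is immutable by default, there is nothing to freeze and the same fix falls out naturally. A sketch in Scala, with names mirroring the JavaScript above:

case class Person(name: String, wallet: Int)

// pay cannot mutate its argument; all it can do is return an updated copy.
def pay(person: Person, cost: Int): Person =
  person.copy(wallet = person.wallet - cost)

def money(person: Person): Int = person.wallet

val joe = Person("Joe", 300)
val joeWithLessMoney = pay(joe, 20)
println(money(joe))              // 300: same input, same output, always
println(money(joeWithLessMoney)) // 280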
In one of his videos (concerning Scala's lazy evaluation, namely the lazy keyword), Martin Odersky shows the following implementation of the cons operation used to construct a Stream:
def cons[T](hd: T, tl: => Stream[T]) = new Stream[T] {
def head = hd
lazy val tail = tl
...
}
So the tail operation is written concisely using the language's lazy evaluation feature.
But in reality (in Scala 2.11.7), the implementation of tail is a bit less elegant:
@volatile private[this] var tlVal: Stream[A] = _
@volatile private[this] var tlGen = tl _
def tailDefined: Boolean = tlGen eq null
override def tail: Stream[A] = {
if (!tailDefined)
synchronized {
if (!tailDefined) {
tlVal = tlGen()
tlGen = null
}
}
tlVal
}
Double-checked locking and two volatile fields: that's roughly how you would implement a thread-safe lazy computation in Java.
So the questions are:
Doesn't Scala's lazy keyword provide an 'evaluated at most once' guarantee in the multi-threaded case?
Is the pattern used in the real tail implementation an idiomatic way to do thread-safe lazy evaluation in Scala?
Doesn't Scala's lazy keyword provide an 'evaluated at most once'
guarantee in the multi-threaded case?
Yes, it does, as others have stated.
Is the pattern used in the real tail implementation an idiomatic way to do
thread-safe lazy evaluation in Scala?
Edit:
I think I have the actual answer as to why not lazy val. Stream has public-facing API methods such as hasDefiniteSize, inherited from TraversableOnce. In order to know whether a Stream has a finite size or not, we need a way of checking without materializing the underlying Stream tail. Since lazy val doesn't actually expose the underlying flag, we can't do that.
This is backed by SI-1220
To strengthen this point, @Jasper-M points out that the new LazyList API in strawman (the Scala 2.13 collection makeover) no longer has this issue, since the entire collection hierarchy has been reworked and such concerns no longer apply.
Performance-related concerns
I would say "it depends" on which angle you're looking at this problem. From a LOB point of view, I'd say definitely go with lazy val for conciseness and clarity of implementation. But, if you look at it from the point of view of a Scala collections library author, things start to look differently. Think of it this way, you're creating a library which will be potentially be used by many people and ran on many machines across the world. This means that you should be thinking of the memory overhead of each structure, especially if you're creating such an essential data structure yourself.
I say this because when you use lazy val, by design you generate an additional Boolean field which flags whether the value has been initialized, and I am assuming this is what the library authors were aiming to avoid. The size of a Boolean on the JVM is of course VM-dependent, but even a byte is something to consider, especially when people are generating large Streams of data. Again, this is definitely not something I would usually consider and is definitely a micro-optimization towards memory usage.
The reason I think performance is one of the key points here is SI-7266, which fixes a memory leak in Stream. Note how important it is to track the bytecode to make sure no extra values are retained inside the generated class.
The difference in the implementation is that the check for whether tail has been initialized is implemented as a method which tests the generator:
def tailDefined: Boolean = tlGen eq null
rather than as a dedicated flag field on the class.
Scala lazy values are evaluated only once in multi-threaded cases. This is because the evaluation of lazy members is actually wrapped in a synchronized block in the generated code.
Let's take a look at this simple class,
class LazyTest {
lazy val x = 5
}
Now, let's compile this with scalac,
scalac -Xprint:all LazyTest.scala
This will result in,
package <empty> {
class LazyTest extends Object {
final <synthetic> lazy private[this] var x: Int = _;
@volatile private[this] var bitmap$0: Boolean = _;
private def x$lzycompute(): Int = {
LazyTest.this.synchronized(if (LazyTest.this.bitmap$0.unary_!())
{
LazyTest.this.x = (5: Int);
LazyTest.this.bitmap$0 = true
});
LazyTest.this.x
};
<stable> <accessor> lazy def x(): Int = if (LazyTest.this.bitmap$0.unary_!())
LazyTest.this.x$lzycompute()
else
LazyTest.this.x;
def <init>(): LazyTest = {
LazyTest.super.<init>();
()
}
}
}
You should be able to see that the lazy evaluation is thread-safe. And you will also see some similarity to that "less elegant" implementation in Scala 2.11.7.
You can also experiment with tests similar to the following,
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
case class A(i: Int) {
lazy val j = {
println("calculating j")
i + 1
}
}
def checkLazyInMultiThread(): Unit = {
val a = A(6)
val futuresList = Range(1, 20).toList.map(i => Future{
println(s"Future $i :: ${a.j}")
})
Future.sequence(futuresList).onComplete(_ => println("completed"))
}
checkLazyInMultiThread()
Now, the implementation in the standard library avoids using lazy because the authors are able to provide a more efficient solution than this generic lazy translation.
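To make the trade-off concrete, here is a minimal hand-rolled lazy cell using the same trick as Stream's tail: the thunk reference doubles as the "initialized" flag, so no extra Boolean field is generated. This is a sketch, not the library's actual code:

final class LazyCell[A](thunk: () => A) {
  @volatile private[this] var gen: () => A = thunk // null once evaluated
  @volatile private[this] var value: A = _

  def isDefined: Boolean = gen eq null

  // Double-checked locking, mirroring Stream.tail in 2.11.7.
  def get: A = {
    if (!isDefined) synchronized {
      if (!isDefined) {
        value = gen()
        gen = null // drop the thunk so captured references can be GC'd
      }
    }
    value
  }
}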
You are correct, lazy vals use locking precisely to guard against double evaluation when accessed at the same time by two threads. Future developments, furthermore, will give the same guarantees without locking.
What is idiomatic, in my humble opinion, is a highly debatable subject when it comes to a language that, by design, allows for a wide range of different idioms to be adopted. In general, application code tends to be considered idiomatic when it leans towards pure functional programming, as that gives a series of advantages in terms of ease of testing and reasoning that are worth giving up only in the face of serious concerns. One such concern is performance, which is why the current implementation of the Scala Collection API, while exposing in most cases a functional interface, makes heavy use (internally and in restricted scopes) of vars, while loops, and established patterns from imperative programming (like the one you highlighted in your question).
The following scenario shows an abstraction that seems to me to be impossible to implement declaratively.
Suppose that I want to create a Symbol object which allows you to create objects with strings that can be compared, like Symbol.for() in JavaScript. A simple implementation in JS might look like this:
function MySymbol(text){//Comparable symbol object class
this.text = text;
this.equals = function(other){//Method to compare to other MySymbol
return this.text == other.text;
}
}
I could easily write this in a declarative language like Haskell:
data MySymbol = MySymbol String
makeSymbol :: String -> MySymbol
makeSymbol s = MySymbol s
compareSymbol :: MySymbol -> MySymbol -> Bool
compareSymbol (MySymbol s1) (MySymbol s2) = s1 == s2
However, maybe in the future I want to improve efficiency by using a global registry without changing the interface to the MySymbol objects. (The user of my class doesn't need to know that I've changed it to use a registry)
For example, this is easily done in JavaScript:
function MySymbol(text){
if (MySymbol.registry.has(text)){//check if symbol already in registry
this.id = MySymbol.registry.get(text);//get id
} else {
this.id = MySymbol.nextId++;
MySymbol.registry.set(text, this.id);//Add new symbol with nextId
}
this.equals = function(other){//To compare, simply compare ids
return this.id == other.id;
}
}
//Setup initial empty registry
MySymbol.registry = new Map();//A map from strings to numbers
MySymbol.nextId = 0;
However, it is impossible to create a mutable global registry in Haskell. (I can create a registry, but not without changing the interface to my functions.)
Specifically, these three possible Haskell solutions all have problems:
Force the user to pass a registry argument or equivalent, making the interface implementation dependent
Use some fancy Monad stuff like Haskell's Control.Monad.Random, which would require either foreseeing the optimization from the start or changing the interface (and is basically just adding the concept of state into your program and therefore breaks referential transparency etc.)
Have a slow implementation which might not be practical in a given application
None of these solutions allow me to sufficiently abstract away implementation from my Haskell interface.
So, my question is: Is there a way to implement this optimization to a Symbol object in Haskell (or any declarative language) without causing one of the three problems listed above,
and are there any other situations where an imperative language can express an abstraction (for example an optimization like above) that a declarative language can't?
The intern package shows how. As discussed by @luqui, it uses unsafePerformIO at a few key moments, and is careful to hide the identifiers produced during interning.
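For intuition about why hiding the registry is sound, here is the moral equivalent written in an impure language (a Scala sketch; all names are made up). Interning looks pure from the outside because equal strings always receive equal ids, which is what lets the mutable state stay hidden:

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

final case class MySymbol(id: Int)

object MySymbol {
  // Hidden, thread-safe global state; callers can never observe it directly.
  private val registry = new ConcurrentHashMap[String, Int]()
  private val nextId = new AtomicInteger(0)

  // Looks like a pure function String => MySymbol to every caller.
  def forText(text: String): MySymbol =
    MySymbol(registry.computeIfAbsent(text, _ => nextId.getAndIncrement()))
}

// MySymbol.forText("a") == MySymbol.forText("a") is always true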
In some dynamic languages I have seen this kind of syntax:
myValue = if (this.IsValidObject)
{
UpdateGraph();
UpdateCount();
this.Name;
}
else
{
Debug.Log (Exceptions.UninitializedObject);
3;
}
Basically, the last expression in a branch becomes the value of the whole conditional, so it can be assigned to a variable; this is not limited to method returns, though those work the same way.
What's the name of this feature?
Can this also be achieved in statically typed languages such as C#? I know C# has the ternary operator, but I mean using if statements and switch statements as shown above.
It is called "conditional-branches-are-expressions" or "death to the statement/expression divide".
See Conditional If Expressions:
Many languages support if expressions, which are similar to if statements, but return a value as a result. Thus, they are true expressions (which evaluate to a value), not statements (which just perform an action).
That is, if (expr) { ... } is an expression (it could be either an expression or a statement depending upon context) in the language grammar, just as ?: is an expression in languages like C, C# or Java.
This form is common in functional programming languages (which eschew side-effects). However, it is not "functional programming" per se, and it exists in other languages that accept/allow a "functional-like syntax" while still utilizing heavy side-effects and other paradigms (e.g. Ruby).
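As a concrete sketch in one such language (Scala, where every if is an expression; the names here are made up):

def describe(n: Int): String = {
  // Each branch's last expression becomes the value of the whole `if`,
  // so the result can be bound directly to a val.
  val parity =
    if (n % 2 == 0) {
      println("took the even branch")
      "even"
    } else {
      println("took the odd branch")
      "odd"
    }
  s"$n is $parity"
}

println(describe(6)) // prints "took the even branch", then "6 is even"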
Some languages like Perl allow this behavior to be simulated. That is, $x = eval { if (true) { "hello world!" } else { "goodbye" } }; print $x will display "hello world!" because the eval expression evaluates to the last value evaluated inside even though the if grammar production itself is not an expression. ($x = if ... is a syntax error in Perl).
Happy coding.
To answer your other question:
Can this also be achieved in statically typed languages such as C#?
Is it a thing the language supports? No. Can it be achieved? Kind of.
C#, like C++, Java, and all that ilk, has expressions and statements. Statements, like if-then and switch-case, don't return values and therefore can't be used as expressions. Also, as a slight aside, your example assigns either a string or an integer to myValue, which C# can't do because it is statically typed. You'd either have to use object myValue and then accept the casting and boxing costs, use var myValue (which is still statically typed, just inferred), or some other bizarre cleverness.
Anyway, so if if-then is a statement, how do you do that in C#? You'd have to build a method to accomplish the goal of if-then-else. You could use a static method as an extension to bools, to model the Smalltalk way of doing it:
public static class BoolExtensions // extension methods must live in a static class
{
    public static T IfTrue<T>(this bool value, Func<T> doThen, Func<T> doElse)
    {
        if (value)
            return doThen();
        else
            return doElse();
    }
}
To use this, you'd do something like
var myVal = (6 < 7).IfTrue(() => "Less than", () => "Greater than");
Disclaimer: I tested none of that, so it may not quite work due to typos, but I think the principle is correct.
The new IfTrue() function checks the boolean it is attached to and executes one of two delegates passed into it. They must have the same return type, and neither accepts arguments (use closures, so it won't matter).
Now, should you do that? No, almost certainly not. It's not the proper C# way of doing things, so it's confusing, and it's much less efficient than using an if-then. You're trading off something like one IL instruction for a complex mess of classes and method calls that .NET will build behind the scenes to support it.
It is a ternary conditional.
In C you can use, for example:
printf("Debug? %s\n", debug?"yes":"no");
Edited:
A compound statement can be evaluated as an expression in GNU C (a GCC extension known as a "statement expression"). The last statement inside should be an expression, and the whole compound statement is wrapped in braces and parentheses.
For example:
#include <stdio.h>
int main(void)
{
int a=0, b=1;
a=({
printf("testing compound statement\n");
if(b==a)
printf("equals\n");
b+1;
});
printf("a=%d\n", a);
return 0;
}
So the name of the feature you are using is a "statement expression": assigning a compound statement to a (local) variable. Now I think this helps you a little bit more. For more, please visit this source:
http://www.chemie.fu-berlin.de/chemnet/use/info/gcc/gcc_8.html
Take care,
Beco.
PS. This example makes more sense in the context of your question:
a=({
int c;
if(b==a)
c=b+1;
else
c=a-1;
c;
});
In addition to returning the value of the last expression in a branch, it's likely (depending on the language) that myValue is being assigned an anonymous function, or, in Smalltalk / Ruby terms, a code block:
A block of code (an anonymous function) can be expressed as a literal value (which is an object, since all values are objects.)
In this case, since myValue is actually pointing to a function that gets invoked only when myValue is used, the language probably implements them as closures, which are originally a feature of functional languages.
Closures are first-class functions with free variables, and they exist in C#. However, the implicit return does not occur; in C# they're simply anonymous delegates! Consider:
Func<Object> myValue = delegate()
{
if (this.IsValidObject)
{
UpdateGraph();
UpdateCount();
return this.Name;
}
else
{
Debug.Log (Exceptions.UninitializedObject);
return 3;
}
};
This can also be done in C# using lambda expressions:
Func<Object> myValue = () =>
{
if (this.IsValidObject) { ... }
else { ... }
};
I realize your question is asking about the implicit return value, but I am trying to illustrate that there is more than just "conditional branches are expressions" going on here.
Can this also be achieved in statically
typed languages?
Sure, the types of the involved expressions can be statically and strictly checked. There seems to be nothing dependent on dynamic typing in the "if-as-expression" approach.
For example, Haskell, a statically and strongly typed language with a rich type system:
$ ghci
Prelude> let x = if True then "a" else "b" in x
"a"
(The example expression could be simpler; I just wanted to mirror the assignment from your question. To demonstrate the feature alone:
Prelude> if True then "a" else "b"
"a"
)
What languages provide the use of object literals? (Or in what languages could you easily emulate them?) Can you give a code example?
Starting with the obvious JavaScript snippet:
var someObj = {
someProperty: 123,
someFunction: function() {
alert('hello!');
}
};
Check out C#'s anonymous types:
var Customer = new
{
Company = "AgileApps",
Website = "http://www.agileapps.co.uk",
Name = "Big Al",
Entered = DateTime.Now
};
If you replace "object" with "term", then Prolog does this naturally (in fact, there's no other way to construct an object). Here's an example featuring binary trees:
% find a node in List with a nil left child and call its rightmost grandchild X
member(node(nil,node(_,X)), List).
Lisp and Scheme also have some pretty advanced features in this area, particularly quoting and quasiquoting:
;; construct right-leaning binary tree with x as the rightmost grandchild
`(nil . (nil . ,x))
Practically all functional programming languages have copied this in some form.
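For comparison with the JavaScript snippet in the question, Scala can come close to an object literal with an anonymous class; member access goes through the inferred structural type, which is reflective, hence the language import (a small sketch):

import scala.language.reflectiveCalls

val someObj = new {
  val someProperty = 123
  def someFunction(): Unit = println("hello!")
}

println(someObj.someProperty) // 123
someObj.someFunction()        // hello!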