Using Z3 with parallelization from SBV

I'd like to use Z3 via SBV using multiple cores. Based on this answer I should be able to do that just by passing parallel.enable=true to the z3 executable on the command line. Since I am using SBV, I need to go through SBV's interface to various SMTLib solvers, so here's what I tried:
foo = runSMTWith z3par $ do
    ...
  where
    z3par = z3
      { SBV.solver = (SBV.solver z3)
          { SBV.options = \cfg -> SBV.options (SBV.solver z3) cfg ++ ["parallel.enable=true"]
          }
      }
However, I am not seeing any signs of Z3 running with parallelization enabled:
CPU usage doesn't go above one core
No speedup compared to running without this flag
How do I enable Z3 parallelization, when going via SBV?

What you're doing is essentially how it is done from SBV. You might want to increase verbosity of z3 and output the diagnostics to a file to inspect later. Something like:
import Data.SBV
import Data.SBV.Control
foo :: IO (Word64, Word64)
foo = runSMTWith z3{solver = par} $ do
    x <- sWord64 "x"
    y <- sWord64 "y"
    setOption $ DiagnosticOutputChannel "diagnostic_output"
    constrain $ x * y .== 13
    constrain $ x .> 1
    constrain $ y .> 1
    query $ do ensureSat
               (,) <$> getValue x <*> getValue y
  where par    = (solver z3) {options = \cfg -> options (solver z3) cfg ++ extras}
        extras = [ "parallel.enable=true"
                 , "-v:3"
                 ]
Here, we're not only setting z3's parallel-mode, but we're also telling it to increase verbosity and put all the diagnostics in a file. (Side note: There are many other settings in the parallel section of z3 config. You can see what they are by issuing z3 -pd in your command line and looking at the output. You can set any other parameters from there by adding them to the extras variable above.)
When I run the above, I get:
*Main> foo
(6379316565415788539,3774100875216427415)
But I also get a file named diagnostic_output created in the current directory, which contains the following lines, amongst others:
(tactic.parallel :progress 0% :open 1)
(tactic.parallel :split-cube 0)
(parallel.tactic simplify-1)
(tactic.parallel :progress 100.00% :status sat :open 0)
So z3 is indeed in the parallel mode and things are happening. Of course, what exactly it does is more or less a black-box, and it's impossible to interpret the above output without inspecting z3 internals. (I don't think either the meaning of these stats or the strategies for the parallel solver are that well documented. If you find good documentation on the details, please do report!)
Update
As of this commit, you can now simply say:
runSMTWith z3{extraArgs = ["parallel.enable=true"]} $ do ...
simplifying the programming a bit further.
Solver agnostic concurrency directly from SBV
Note that SBV also has combinators for running things concurrently directly from Haskell. See the functions:
satConcurrentWithAny
satConcurrentWithAll
proveConcurrentWithAny
proveConcurrentWithAll
These functions are solver agnostic, you can use them with any solver of your choosing. Of course, they require you to restructure your problem and do a manual decomposition to take advantage of the multiple-cores in your computer and stitch the solutions together yourself. But they also give you full control over how you want to structure your expensive search.

Trying to understand Julia syntax in linear regression code (GLM package)

Total Julia noob here (with basic knowledge of Python). I am trying to do linear regression and things I read suggest the GLM package. Here is some sample code I found here:
using DataFrames, GLM
y = 1:10
df = DataFrame(y = y, x1 = y.^2, x2 = y.^3)
sm = GLM.lm( @formula(y ~ x1 + x2), df )
coef(sm)
Can someone explain the syntax here? What does @formula mean? Docs here say @foo means a macro, which I guess is basically just a function, but where do I find the function/macro formula? Just looking at the use here though, I would have thought it is maybe passing y ~ x1 + x2 (whatever that is) as the formula argument to lm? (similar to keyword arguments = in python?)
Next, what is ~ here? General docs say ~ means negation but I'm not seeing how that makes sense here.
Is there a place in the GLM docs where all of this is explained? I'm not seeing that. Only seeing a few examples but not a full breakdown of each function and all of its arguments.

You have stumbled upon the @formula language that is defined in the StatsModels.jl package and implemented in many statistics/econometrics related packages across the Julia ecosystem.
As you say, @formula is a macro, which transforms the expression given to it (here y ~ x1 + x2) into some other Julia expression. If you want to find out what happens when a macro gets called in Julia - which I admit can often look like magic to new (and sometimes experienced!) users - the @macroexpand macro can help you. In this case:
julia> @macroexpand @formula(y ~ x1 + x2)
:(StatsModels.Term(:y) ~ StatsModels.Term(:x1) + StatsModels.Term(:x2))
The result above is the expression constructed by the @formula macro. We see that the variables in our formula macro are transformed into StatsModels.Term objects. If we were to use StatsModels directly, we could construct this ourselves by doing:
julia> Term(:y) ~ Term(:x1) + Term(:x2)
FormulaTerm
Response:
y(unknown)
Predictors:
x1(unknown)
x2(unknown)
julia> (Term(:y) ~ Term(:x1) + Term(:x2)) == @formula(y ~ x1 + x2)
true
Now what is going on with ~, which as you say can be used for negation in Julia? What has happened here is that StatsModels has defined methods for ~, which in Julia is an infix operator: essentially a function that can be written in between its arguments rather than having to be called with its arguments in brackets:
julia> (Term(:y) ~ Term(:x)) == ~(Term(:y), Term(:x))
true
So writing y::Term ~ x::Term is the same as calling ~(y::Term, x::Term), and this method for calling ~ with terms on the left and right hand side is defined by StatsModels (see method no. 6 below):
julia> methods(~)
# 6 methods for generic function "~":
[1] ~(x::BigInt) in Base.GMP at gmp.jl:542
[2] ~(::Missing) in Base at missing.jl:100
[3] ~(x::Bool) in Base at bool.jl:39
[4] ~(x::Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}) in Base at int.jl:254
[5] ~(n::Integer) in Base at int.jl:138
[6] ~(lhs::Union{AbstractTerm, Tuple{Vararg{AbstractTerm,N}} where N}, rhs::Union{AbstractTerm, Tuple{Vararg{AbstractTerm,N}} where N}) in StatsModels at /home/nils/.julia/packages/StatsModels/pMxlJ/src/terms.jl:397
Note that you also find the general negation meaning here (method 3 above, which defines the behaviour for calling ~ on a boolean argument and is in Base Julia).
I agree that the GLM.jl docs maybe aren't the most comprehensive in the world, but one of the reasons for that is that the whole machinery behind @formula actually isn't a GLM.jl thing - so do check out the StatsModels docs linked above which are quite good I think.

unused variable(s) warning in runjags model

I am running JAGS models through the R package runjags. I just updated to JAGS 4.0.0 from JAGS 3.4, and have noticed some unexpected behavior that seems to be related to the update.
First, when I run a model, I now get a warning message WARNING: Unused variable(s) in data table: followed by a list of data objects that are referenced in the model and provided as data. It doesn't seem to affect the results (but it is very puzzling). I have, however, noticed a few times while playing around with this that for some variables the posteriors were virtually identical to the priors (indicating that no updating occurred). I can't seem to recreate the update failure right now, but below is a reproducible code example illustrating the odd warning message. The code example on the run.jags help page also produces the same warning.
Second, I thought I'd check to see if the same message pops up if I use the R package R2jags instead of runjags, but R2jags won't load because apparently rjags (one of the dependencies) is not compatible with JAGS 4.0 (it's looking for JAGS 3.X). Also, in the runjags function run.jags, the argument method="rjags" doesn't seem to work anymore, but method="parallel" does work.
I'm using runjags_2.0.1-4 and R 3.2.2.
So my questions are:
1) Is rjags really incompatible with JAGS 4.0? The motivation to go to 4.0 was to use vectors as indices (see https://martynplummer.wordpress.com/2015/08/16/whats-new-in-jags-4-0-0-part-34-r-style-features/).
2) What is up with the unused variable(s) warning, and should I be concerned about it?
Thanks,
Glenn
Code:
#--- GENERATE DATA ------------------------
rm(list=ls())
# Number of sites and observations per site
N <- 200
nobs <- 3
# generate covariates and standardize (where appropriate)
set.seed(123)
forest <- rnorm(N)
# relationship between occupancy and covariates
b0 <- 0.5
b.for <- 0.5
psi <- plogis(b0 + b.for*forest)
# draw occupancy for each site
z <- rbinom(n=N, size=1,prob=psi)
# specify detection probability
p <- 0.5
pz <- p*z
# generate the observations
Y <- rbinom(n=N, size=nobs,prob=pz)
#---- BUGS model ------------------------
model1 <- "model {
for (i in 1:N){
logit(eta[i]) <- b0 + b.for*forest[i]
z[i] ~ dbern(eta[i])
pz[i] <- z[i]*p
y[i] ~ dbin(pz[i],nobs)
} #i
b0.0 ~ dunif(0,1)
b0 <- log(b0.0/(1-b0.0))
b.for ~ dnorm(0,0.01)
p ~ dunif(0,1)
}"
occ.data1 <-list(y=Y,N=N,nobs=nobs,forest=forest)
inits1 <- function(){list(b0.0=runif(1),b.for=rnorm(1),p=runif(1),z=as.numeric(Y>0))}
parameters1 <- c("b0","b.for","p")
#---- RUN MODEL ------------------------
library(runjags)
ni <- 2000
nt <- 1
nb <- 1000
nc <- 3
ad <- 100
out <- run.jags(model=model1,data=occ.data1,monitor=parameters1,n.chains=nc,inits=inits1,burnin=nb,
sample=ni,adapt=ad,thin=nt,modules=c("glm","dic"),method="parallel")

To answer your questions:
1) rjags and JAGS use linked (non-interchangeable) versions, and CRAN systems are still using JAGS_3.4.0, so the version of rjags on CRAN matches that. This will be updated soon, and in the meantime you can grab the correct version of rjags from the sourceforge page as @jbaums notes.
2) This is a helpful message from JAGS/rjags telling you that you have specified something as data that the model isn't using. Remember that variable names are case sensitive i.e.
library('runjags')
model <- "model {
m ~ dunif(-1000,1000)
#data# M
#inits# m
#monitor# m
}"
M <- 0
m <- list(-10, 10)
results <- run.jags(model, method="interruptible", n.chains=2)
results <- run.jags(model, method="rjags", n.chains=2)
... gives you a warning because M does not match m. Also note that the warning looks a bit different from the two function calls - in the first it comes half-way down the JAGS output and in the second it comes as a warning in R after the function is completed.
As for 'should I be concerned' - yes if you think these variables should be in your model. If you can't find the problem try posting the code you are using - it got cut off from your original post.
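If the list of "unused" names is long, a rough cross-check can help separate genuine mismatches (such as M versus m) from names that really do appear in the model. The sketch below is in Python rather than R, and unused_data_names is a hypothetical helper, not part of runjags or JAGS. It only does a case-sensitive text match on the model string, so treat it as a heuristic:

```python
import re

def unused_data_names(model_text, data_names):
    """Return the supplied data names that never appear in the model code."""
    # Drop everything after '#' on each line so comments do not count as uses.
    code = "\n".join(line.split("#", 1)[0] for line in model_text.splitlines())
    # Assumption: a name is a letter followed by letters, digits, '.' or '_'.
    used = set(re.findall(r"[A-Za-z][A-Za-z0-9._]*", code))
    return sorted(name for name in data_names if name not in used)

model = """model {
m ~ dunif(-1000,1000)
#data# M
}"""
print(unused_data_names(model, ["M"]))
```

A name reported here is one the model text never mentions outside comments. A name that JAGS warns about but this check does not flag is worth a closer look.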
Matt

JAGS: multivariate normal with both, unobservable and observable variables?

I have a "simple" problem with JAGS that drives me crazy. In essence, consider the following example that works:
x2[i] ~ dnorm(mu[i,1], tau1);
u[i] ~ dnorm(mu[i,2], tau2);
Here, x2 is an observable variable (that is, data), while u is a latent variable. In the example, both are drawn independently from two distinct normal distributions.
However, I want them to be (possibly) dependent, that is, to be drawn from one multivariate normal distribution. So I would like to do:
c(x2[i], u[i]) ~ dmnorm(mu[i,1:2], Omega[1:2,1:2]);
Unfortunately, this doesn't work because this syntax is not correct. I have tried many different syntaxes, but none of them works. E.g.,
y[i,1] <- x2[i];
y[i,2] <- u[i];
y[i,1:2] ~ dmnorm(mu[i,1:2], Omega[1:2,1:2]);
leads to the error Node y[1,1:2] overlaps previously defined nodes, which is obvious.
So what can I do? Please, help me, I'm getting mad...
UPDATE: I figured out that I can at least do the following:
(in R:)
p <- 1/(1+exp(-x2));
t <- rep(10000, length(x2));
s <- rbinom(length(x2), t, p);
(in JAGS:)
nul[i,1] <- 0;
nul[i,2] <- 0;
e[i,1:2] ~ dmnorm(nul[i,1:2], Omega[1:2,1:2]);
u[i] <- mu[i,2] + e[i,2];
x2g[i] <- mu[i,1] + e[i,1];
pg[i] <- 1/(1+exp(-x2g[i]));
s[i] ~ dbin(pg[i], t[i]);
This works (a bit), but of course loses efficiency since an observable variable (x2) is treated as if it was only indirectly observable (through s).

You are defining y twice. Once in:
y[i,1] <- x2[i];
y[i,2] <- u[i];
And once in
y[i,1:2] ~ dmnorm(mu[i,1:2], Omega[1:2,1:2]);
You might be able to get away with:
x2[i] <- y[i,1];
Or you might be able to simply write out the regression (after all it is bivariate, so not that difficult).
You also might get a faster response on the JAGS mailing list (which Martyn Plummer regularly monitors).
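For reference, "writing out the regression" relies on the standard factorisation of a bivariate normal into a marginal and a conditional. The symbols below are assumptions introduced for illustration: \Sigma is the covariance matrix (the inverse of the precision matrix Omega), with standard deviations \sigma_1, \sigma_2 and correlation \rho.

```latex
\Sigma = \Omega^{-1} =
\begin{pmatrix}
\sigma_1^2 & \rho\,\sigma_1\sigma_2 \\
\rho\,\sigma_1\sigma_2 & \sigma_2^2
\end{pmatrix},
\qquad
\begin{aligned}
x2_i &\sim \mathcal{N}\!\left(\mu_{i1},\; \sigma_1^2\right), \\
u_i \mid x2_i &\sim \mathcal{N}\!\left(\mu_{i2} + \rho\,\frac{\sigma_2}{\sigma_1}\,(x2_i - \mu_{i1}),\; \sigma_2^2\,(1-\rho^2)\right).
\end{aligned}
```

Each line is then an ordinary univariate normal, so the observed x2 and the latent u can stay separate nodes. Since dnorm in JAGS is parameterised by precision, the two precisions would be 1/\sigma_1^2 and 1/(\sigma_2^2 (1-\rho^2)).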

You can use the data block as follows:
data{
for(i in 1:length(x2)) {
y[i,1] <- x2[i]
y[i,2] <- u[i]
}
}
model{
for(i in 1:length(x2)) {
y[i,1:2] ~ dmnorm(mu[i,], Tau)
}
# ... definition of mu, Tau, and their prior distribution
}
However, be sure there are no missing values in x2 or u, since a multivariate node cannot be partly observed.
Regards! :)

Running into memory issues with Data.Sequence on a manageably sized dataset

TL;DR: I'm working on a piece of code which generates a (long) array of numbers. I'm able to generate this array, convert it to a List and then calculate the maximum (using a strict left fold). BUT, I run into memory issues when I try to convert the list to a Sequence prior to calculating the maximum. This is quite counter-intuitive to me.
My question: Why is this happening and what is the correct approach for converting the data to a Sequence structure?
Background:
I'm working on a problem which I've chosen to tackle in three steps (below).
*Note: I'm intentionally keeping the problem statement vague so this post doesn't serve as a hint.
Anyways, my proposed approach:
(i) First, generate a long list of integers, namely, the number of factors for each integer from 1 to 100 million (NOT the factors themselves, just the number of factors)
(ii) second, convert this list into a Sequence.
(iii) lastly, use an efficient sliding window maximum algorithm to calc my answer (this step requires dequeue operations, hence the need for a Sequence)
(Again, the specifics of the problem aren't that relevant as I'm just curious as to why I'm running into this particular issue in the first place.)
What I've done so far?
Step 1 was fairly straightforward - see output below (full code is included at the bottom). I just bruteforce a sieve using an Unboxed Array and the accumArray function, nothing fancy. Note: I've used this same algorithm to solve a number of other such problems so I'm reasonably confident that it's giving the right answer.
For the purposes of showing execution time / memory-usage stats, I've (admittedly arbitrarily) chosen to calculate the maximum element in the resulting array - the idea is simply to use a function which forces construction of all elements of the Array, thereby ensuring that we see meaningful stats for exec time / memory-usage.
The following main function...
main = print $ maximum' $ elems (sieve (10^8))
...results in the following (i.e., it says that the number below 100 million with the most divisors has a total of 768 divisors):
Linking maxdivSO ...
768
33.73s user 70.80s system 99% cpu 1:44.62 total
344,214,504,640 bytes allocated in the heap
58,471,497,320 bytes copied during GC
200,062,352 bytes maximum residency (298 sample(s))
3,413,824 bytes maximum slop
386 MB total memory in use (0 MB lost due to fragmentation)
%GC time 24.7% (30.5% elapsed)
The problem
It seems like we can accomplish the first step without breaking a sweat since I've allocated a total of 5gb to my VirtualBox and the above code uses <400mb (as reference, I've seen programs execute successfully and report using 3gb+ of total memory). In other words, it seems like we've accomplished Step 1 with plenty of headroom.
So I'm a bit surprised as to why the following version of the main function fails. We attempt to perform the same calculation of the maximum but after converting the list of integers to a Sequence. The following code...
main = print $ maximum' $ fromList $ elems (sieve (10^8))
...results in the following:
Linking maxdivSO ...
maxdivSO: out of memory (requested 2097152 bytes)
39.48s user 76.35s system 99% cpu 1:56.03 total
My question: Why does the algorithm (as currently written) run out of memory if we try to convert the list to a Sequence? And how might I go about successfully converting this list into a Sequence?
(I'm not one to stubbornly stick to brute-force for these types of problems - but I have a strong suspicion that this particular issue is due to my not being able to reason well about evaluation.)
The code itself:
{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.Word (Word32, Word16)
import Data.Foldable (Foldable, foldl')
import Data.Array.Unboxed (UArray, accumArray, elems)
import Data.Sequence (fromList)
main :: IO ()
main = print $ maximum' $ elems (sieve (10^8)) -- <-- works
--main = print $ maximum' $ fromList $ elems (sieve (10^8)) -- <-- doesn't work
maximum' :: (Foldable t, Ord a, Num a) => t a -> a
maximum' = foldl' (\acc x -> if x > acc then x else acc) 0
sieve :: Int -> UArray Word32 Word16
sieve u' = accumArray (+) 2 (1,u) ( (1,-1) : factors )
  where
    u = fromIntegral u'
    cap = floor $ sqrt (fromIntegral u) :: Word32
    factors = [ (i*d,j) | d <- [2..cap]
                        , i <- [2..(u `quot` d)]
                        , d <= i, let j = if i == d then 1 else 2
              ]

I think the reason for this is that getting the first element of a sequence requires the full sequence to be constructed in memory (since the internal representation of the sequence is a tree). In the list case elems yields the elements lazily.
Rather than turning the full array into a sequence, why not make the sequence only as long as your sliding window?
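To make the "sequence only as long as the window" idea concrete, here is a minimal sketch of a sliding-window maximum that streams over its input and keeps at most k candidates in a deque. It is written in Python for brevity, with collections.deque standing in for Data.Sequence. The function name and the window size k are assumptions, since the question does not state them:

```python
from collections import deque

def sliding_window_max(values, k):
    """Yield the maximum of each length-k window over an iterable."""
    window = deque()  # (index, value) pairs, values strictly decreasing
    for i, v in enumerate(values):
        # Drop candidates that can never be a maximum again.
        while window and window[-1][1] <= v:
            window.pop()
        window.append((i, v))
        # Drop the front candidate once it has slid out of the window.
        if window[0][0] <= i - k:
            window.popleft()
        if i >= k - 1:
            yield window[0][1]

print(list(sliding_window_max([1, 3, -1, -3, 5, 3, 6, 7], 3)))
```

The deque never holds more than k entries, so memory use depends on the window size rather than on the length of the input.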

Do you find you still need variables you can change, and if so why?

One of the arguments I've heard against functional languages is that single assignment coding is too hard, or at least significantly harder than "normal" programming.
But looking through my code, I realized that I really don't have many (any?) use patterns that can't be written just as well using single assignment form if you're writing in a reasonably modern language.
So what are the use cases for variables that vary within a single invocation of their scope? Bearing in mind that loop indexes, parameters, and other scope bound values that vary between invocations aren't multiple assignments in this case (unless you have to change them in the body for some reason), and assuming that you are writing in something far enough above the assembly language level, where you can write things like
values.sum
or (in case sum isn't provided)
function collection.sum --> inject(zero, function (v,t) --> t+v )
and
x = if a > b then a else b
or
n = case s
/^\d*$/ : s.to_int
'' : 0
'*' : a.length
'?' : a.length.random
else fail "I don't know how many you want"
when you need to, and have list comprehensions, map/collect, and so forth available.
Do you find that you still want/need mutable variables in such an environment, and if so, what for?
To clarify, I'm not asking for a recitation of the objections to SSA form, but rather concrete examples where those objections would apply. I'm looking for bits of code that are clear and concise with mutable variables and couldn't be written so without them.
My favorite examples so far (and the best objection I expect to them):
Paul Johnson's Fisher-Yates algorithm answer, which is pretty strong when you include the big-O constraints. But then, as catulahoops points out, the big-O issue isn't tied to the SSA question, but rather to having mutable data types, and with that set aside the algorithm can be written rather clearly in SSA:
shuffle(Lst) ->
array:to_list(shuffle(array:from_list(Lst), erlang:length(Lst) - 1)).
shuffle(Array, 0) -> Array;
shuffle(Array, N) ->
K = random:uniform(N) - 1,
Ek = array:get(K, Array),
En = array:get(N, Array),
shuffle(array:set(K, En, array:set(N, Ek, Array)), N-1).
jpalecek's area of a polygon example:
def area(figure : List[Point]) : Float = {
  if(figure.empty) return 0
  val first = figure(0)
  var last = figure(0)
  var ret = 0f
  for (pt <- figure) {
    ret+=crossprod(last - first, pt - first)
    last = pt
  }
  ret
}
which might still be written something like:
def area(figure : List[Point]) : Float = {
  if figure.length < 3
    0
  else
    val a = figure(0)
    val b = figure(1)
    val c = figure(2)
    if figure.length == 3
      magnitude(crossproduct(b-a,c-a))
    else
      foldLeft((0,a,b))(figure.rest) {
        ((t,a,b),c) => (t+area([a,b,c]),a,c)
      }
}
Or, since some people object to the density of this formulation, it could be recast:
def area([]) = 0.0 # An empty figure has no area
def area([_]) = 0.0 # ...nor does a point
def area([_,_]) = 0.0 # ...or a line segment
def area([a,b,c]) = # The area of a triangle can be found directly
magnitude(crossproduct(b-a,c-a))
def area(figure) = # For larger figures, reduce to triangles and sum
as_triangles(figure).collect(area).sum
def as_triangles([]) = [] # No triangles without at least three points
def as_triangles([_]) = []
def as_triangles([_,_]) = []
def as_triangles([a,b,c | rest]) = [[a,b,c] | as_triangles([a,c | rest])]
Princess's point about the difficulty of implementing O(1) queues with immutable structures is interesting (and may well provide the basis for a compelling example) but as stated it's fundamentally about the mutability of the data structure, and not directly about the multiple assignment issue.
I'm intrigued by the Sieve of Eratosthenes answer, but unconvinced. The proper big-O, pull as many primes as you'd like generator given in the paper he cited does not look easy to implement correctly with or without SSA.
Well, thanks everyone for trying. As most of the answers turned out to be either 1) based on mutable data structures, not on single-assignment, or 2) to the extent they were about single assignment form, easily countered by practitioners skilled in the art, I'm going to strike the line from my talk and / or restructure (maybe have it in backup as a discussion topic in the unlikely event I run out of words before I run out of time).
Thanks again.

The hardest problem I've come across is shuffling a list. The Fisher-Yates algorithm (also sometimes known as the Knuth algorithm) involves iterating through the list swapping each item with a random other item. The algorithm is O(n), well known and long-since proven correct (an important property in some applications). But it requires mutable arrays.
That isn't to say you can't do shuffling in a functional program. Oleg Kiselyov has written about this. But if I understand him correctly, functional shuffling is O(n log n) because it works by building a binary tree.
Of course, if I needed to write the Fisher-Yates algorithm in Haskell I'd just put it in the ST monad, which lets you wrap up an algorithm involving mutable arrays inside a nice pure function, like this:
-- | Implementation of the random swap algorithm for shuffling. Reads a list
-- into a mutable ST array, shuffles it in place, and reads out the result
-- as a list.
module Data.Shuffle (shuffle) where
import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.STRef
import System.Random
-- | Shuffle a value based on a random seed.
shuffle :: (RandomGen g) => g -> [a] -> [a]
shuffle _ [] = []
shuffle g xs =
    runST $ do
      sg <- newSTRef g
      let n = length xs
      v <- newListArray (1, n) xs
      mapM_ (shuffle1 sg v) [1..n]
      getElems v
-- Internal function to swap element i with a random element at or above it.
shuffle1 :: (RandomGen g) => STRef s g -> STArray s Int a -> Int -> ST s ()
shuffle1 sg v i = do
    (_, n) <- getBounds v
    r <- getRnd sg $ randomR (i, n)
    when (r /= i) $ do
      vi <- readArray v i
      vr <- readArray v r
      writeArray v i vr
      writeArray v r vi
-- Internal function for using random numbers
getRnd :: (RandomGen g) => STRef s g -> (g -> (a, g)) -> ST s a
getRnd sg f = do
    g1 <- readSTRef sg
    let (v, g2) = f g1
    writeSTRef sg g2
    return v
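The same "mutate locally, stay pure from the outside" idea can be sketched outside Haskell as well. The Python version below is an illustration only: shuffled and its rng parameter are assumed names, the private copy plays the role of the ST array, and the caller's list is never modified. The random generator object is still stateful, which is one place where it differs from the explicit generator threading above.

```python
import random

def shuffled(items, rng):
    """Return a shuffled copy of items; the input is left untouched."""
    result = list(items)  # private mutable copy, never shared with the caller
    for i in range(len(result) - 1):
        j = rng.randrange(i, len(result))  # random index at or above i
        result[i], result[j] = result[j], result[i]
    return result

original = [1, 2, 3, 4, 5]
print(shuffled(original, random.Random(0)))
print(original)  # unchanged
```

All mutation is confined to a value that does not escape until the function returns, which is the property the ST monad enforces through its types.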

If you want to make the academic argument, then of course it's not technically necessary to assign a variable more than once. The proof is that all code can be represented in SSA (Static Single Assignment) form. Indeed, that's the most useful form for many kinds of static and dynamic analysis.
At the same time, there are reasons we don't all write code in SSA form to begin with:
It usually takes more statements (or more lines of code) to write code this way. Brevity has value.
It's almost always less efficient. Yes I know you're talking about higher languages -- a fair scoping -- but even in the world of Java and C#, far away from assembly, speed matters. There are few applications where speed is irrelevant.
It's not as easy to understand. Although SSA is "simpler" in a mathematical sense, it's more abstract from common sense, which is what matters in real-world programming. If you have to be really smart to grok it, then it has no place in programming at large.
Even in your examples above, it's easy to poke holes. Take your case statement. What if there's an administrative option that determines whether '*' is allowed, and a separate one for whether '?' is allowed? Also, zero is not allowed for the integer case, unless the user has a system permission that allows it.
This is a more real-world example with branches and conditions. Could you write this as a single "statement?" If so, is your "statement" really different from many separate statements? If not, how many temporary write-only variables do you need? And is that situation significantly better than just having a single variable?

I've never identified such a case. And while you can always just invent new names, as in conversion to SSA form, I actually find it's easy and natural for each value to have its own name. A language like Haskell gives me a lot of choices about which values to name, and two different places to put name bindings (let and where). I find the single-assignment form quite natural and not at all difficult.
I do occasionally miss being able to have pointers to mutable objects on the heap. But these things have no names, so it's not the same objection. (And I also find that when I use mutable objects on the heap, I tend to write more bugs!)

I think you'll find the most productive languages allow you to mix functional and imperative styles, such as OCaml and F#.
In most cases, I can write code which is simply a long line of "map x to y, reduce y to z". In 95% of cases, functional programming simplifies my code, but there is one area where immutability shows its teeth:
The wide disparity between the ease of implementing an immutable stack and an immutable queue.
Stacks are easy and mesh well with persistence; queues are ridiculous.
The most common implementations of immutable queues use one or more internal stacks and stack rotations. The upside is that these queues run in O(1) most of the time, but some operations will run in O(n). If you're relying on persistence in your application, then it's possible in principle that every operation runs in O(n). These queues are no good when you need realtime (or at least consistent) performance.
Chris Okasaki provides an implementation of immutable queues in his book; they use laziness to achieve O(1) for all operations. It's a very clever, reasonably concise implementation of a realtime queue -- but it requires deep understanding of its underlying implementation details, and it's still an order of magnitude more complex than an immutable stack.
In contrast, I can write a stack and queue using mutable linked lists which run in constant time for all operations, and the resulting code would be very straightforward.
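For readers who have not seen the two-stack construction, here is a minimal sketch of it in Python. The names (EMPTY_QUEUE, enqueue, dequeue) are made up for illustration, and this is the simple amortized version, not Okasaki's realtime queue. Stacks are linked (head, tail) pairs and queues are never modified in place:

```python
# A persistent stack is a chain of (head, tail) pairs; None is the empty stack.
EMPTY_QUEUE = (None, None)  # (front stack, back stack)

def _reverse(stack):
    """Return a reversed copy of a linked stack in O(n)."""
    result = None
    while stack is not None:
        head, stack = stack
        result = (head, result)
    return result

def enqueue(queue, item):
    front, back = queue
    return (front, (item, back))  # O(1): push onto the back stack

def dequeue(queue):
    """Return (item, remaining queue); the argument itself is unchanged."""
    front, back = queue
    if front is None:
        front, back = _reverse(back), None  # the occasional O(n) rotation
    if front is None:
        raise IndexError("dequeue from empty queue")
    item, rest = front
    return item, (rest, back)

q3 = enqueue(enqueue(enqueue(EMPTY_QUEUE, 1), 2), 3)
first, rest = dequeue(q3)
print(first, dequeue(rest)[0])  # items come out in insertion order
```

Note that q3 is still intact after the call, so calling dequeue(q3) again repeats the O(n) rotation. That is the persistence problem described above.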
Regarding the area of a polygon, it's easy to convert it to functional form. Let's assume we have a Vector module like this:
module Vector =
    type point =
        { x : float; y : float}
        with
            static member ( + ) ((p1 : point), (p2 : point)) =
                { x = p1.x + p2.x;
                  y = p1.y + p2.y;}
            static member ( * ) ((p : point), (scalar : float)) =
                { x = p.x * scalar;
                  y = p.y * scalar;}
            static member ( - ) ((p1 : point), (p2 : point)) =
                { x = p1.x - p2.x;
                  y = p1.y - p2.y;}
    let empty = { x = 0.; y = 0.;}
    let to_tuple2 (p : point) = (p.x, p.y)
    let from_tuple2 (x, y) = { x = x; y = y;}
    let crossproduct (p1 : point) (p2 : point) =
        { x = p1.x * p2.y; y = -p1.y * p2.x }
We can define our area function using a little bit of tuple magic:
let area (figure : point list) =
    figure
    |> Seq.map to_tuple2
    |> Seq.fold
        (fun (sum, (a, b)) (c, d) -> (sum + a*d - b*c, (c, d) ) )
        (0., to_tuple2 (List.hd figure))
    |> fun (sum, _) -> abs(sum) / 2.0
Or we can use the cross product instead
let area2 (figure : point list) =
    figure
    |> Seq.fold
        (fun (acc, prev) cur -> (acc + (crossproduct prev cur), cur))
        (empty, List.hd figure)
    |> fun (acc, _) -> abs(acc.x + acc.y) / 2.0
I don't find either function unreadable.
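For comparison, the same fold-based idea can be sketched in Python with functools.reduce and no reassigned variables. polygon_area is an assumed name, vertices are plain (x, y) tuples in a list, and this variant closes the polygon explicitly by pairing the last vertex with the first, so the input does not need to repeat its first point:

```python
from functools import reduce

def polygon_area(points):
    """Shoelace formula over a list of (x, y) vertices, written as a fold."""
    if len(points) < 3:
        return 0.0
    # Pair every vertex with its successor, wrapping around at the end.
    edges = zip(points, points[1:] + points[:1])
    twice_area = reduce(
        lambda total, edge: total + edge[0][0] * edge[1][1] - edge[0][1] * edge[1][0],
        edges,
        0.0,
    )
    return abs(twice_area) / 2.0

print(polygon_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # unit square
```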

That shuffle algorithm is trivial to implement using single assignment, in fact it's exactly the same as the imperative solution with the iteration rewritten to tail recursion. (Erlang because I can write it more quickly than Haskell.)
shuffle(Lst) ->
array:to_list(shuffle(array:from_list(Lst), erlang:length(Lst) - 1)).
shuffle(Array, 0) -> Array;
shuffle(Array, N) ->
K = random:uniform(N) - 1,
Ek = array:get(K, Array),
En = array:get(N, Array),
shuffle(array:set(K, En, array:set(N, Ek, Array)), N-1).
If the efficiency of those array operations is a concern, then that's a question about mutable data structures and has nothing to do with single assignment.
You won't get an answer to this question because no examples exist. It is only a question of familiarity with this style.

In response to Jason --
function forbidden_input?(s)
(s = '?' and not administration.qmark_ok) ||
(s = '*' and not administration.stat_ok) ||
(s = '0' and not 'root node visible' in system.permissions_for(current_user))
n = if forbidden_input?(s)
fail "'" + s + "' is not allowed."
else
case s
/^\d*$/ : s.to_int
'' : 0
'*' : a.length
'?' : a.length.random
else fail "I don't know how many you want"

I would miss assignments in a non-purely functional language. Mostly because they hinder the usefulness of loops. Examples (Scala):
def quant[A](x : List[A], q : A) = {
var tmp : A=0
for (el <- x) { tmp+= el; if(tmp > q) return el; }
// throw exception here, there is no prefix of the list with sum > q
}
This should compute the quantile of a list; note the accumulator tmp, which is assigned to multiple times.
A similar example would be:
def area(figure : List[Point]) : Float = {
  if(figure.empty) return 0
  val first = figure(0)
  var last = figure(0)
  var ret = 0f
  for (pt <- figure) {
    ret+=crossprod(last - first, pt - first)
    last = pt
  }
  ret
}
Note mostly the last variable.
These examples could be rewritten using fold on a tuple to avoid multiple assignments, but that would really not help the readability.
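As a point of comparison for the first example, here is one way the running-sum search might look without an explicitly reassigned accumulator. It is a sketch in Python rather than Scala, using itertools.accumulate for the prefix sums. The name first_exceeding and the choice of ValueError are assumptions:

```python
from itertools import accumulate

def first_exceeding(values, q):
    """Return the first element at which the running sum exceeds q."""
    running = accumulate(values)  # lazy prefix sums
    hits = (v for v, total in zip(values, running) if total > q)
    try:
        return next(hits)
    except StopIteration:
        raise ValueError("no prefix of the list sums to more than q") from None

print(first_exceeding([1, 2, 3, 4], 5))  # running sums are 1, 3, 6
```

Whether this reads better than the loop is exactly the judgment call being debated here; it does stop as soon as the threshold is crossed, like the early return in the loop version.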

Local (method) variables certainly never have to be assigned to twice. But even in functional programming re-assigning a variable is allowed. It's changing (part of) the value that's not allowed. And as dsimcha already answered, for very large structures (perhaps at the root of an application) it doesn't seem feasible to me to replace the entire structure. Think about it. The state of an application is all contained ultimately by the entrypoint method of your application. If absolutely no state can change without being replaced, you would have to restart your application with every keystroke. :(

If you have a function that builds a lazy list/tree then reduces it again, a functional compiler may be able to optimize it using deforestation.
If it's tricky, it might not. Then you're sort of out of luck, performance & memory wise, unless you can iterate and use a mutable variable.

Thanks to the Church-Turing Thesis, we know that anything that can be written in one Turing-complete language can be written in any other Turing-complete language. So, when you get right down to it, there's nothing you can do in Lisp that you couldn't do in C#, if you tried hard enough, or vice versa. (More to the point, either one is going to get compiled down to x86 machine language in most cases anyway.)
So, the answer to your question is: there are no such cases. All there are are cases that are easier for humans to comprehend in one paradigm/language or another-- and the ease of comprehension here is tied to training and experience.

Perhaps the main issue here is the style of looping in a language. In languages where we use recursion, any values changing over the course of a loop are re-bound when the function is called again. Languages using iterators in blocks (e.g., Smalltalk's and Ruby's inject method) tend to be similar, though many people in Ruby would still use each and a mutable variable over inject.
When you code loops using while and for, on the other hand, you don't have the easy re-binding of variables that comes automatically when you can pass in several parameters to your chunk of code that does one iteration of the loop, so immutable variables would be rather more inconvenient.
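The contrast between the two looping styles can be sketched as follows, in Python with hypothetical function names. In the recursive version every "iteration" is a fresh call whose parameters are bound once; in the while version the same two names are reassigned in place. Note that Python does not optimize tail calls, so the recursive form only illustrates the binding style and would hit the recursion limit on long inputs.

```python
def running_max_recursive(values, best=None, index=0):
    """Each step is a new call; `best` and `index` are re-bound as parameters."""
    if index == len(values):
        return best
    current = values[index]
    new_best = current if best is None or current > best else best
    return running_max_recursive(values, new_best, index + 1)

def running_max_loop(values):
    """Same logic with a while loop; `best` and `index` must be mutable."""
    best = None
    index = 0
    while index < len(values):
        if best is None or values[index] > best:
            best = values[index]
        index += 1
    return best
```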
Working in Haskell is a really good way to investigate the necessity of mutable variables, since the default is immutable but mutable ones are available (as IORefs, MVars, and so on). I've been recently, er, "investigating" in this way myself, and have come to the following conclusions.
In the vast majority of cases, mutable variables are not necessary, and I'm happy living without them.
For inter-thread communication, mutable variables are essential, for fairly obvious reasons. (This is specific to Haskell; runtime systems that use message passing at the lowest level don't need them, of course.) However, this use is rare enough that having to use functions to read and write them (readIORef fooRef, writeIORef fooRef val, etc.) is not much of a burden.
I have used mutable variables within a single thread, because it seemed to make certain things easier, but later regretted it as I realized that it became very hard to reason about what was happening to the value stored there. (Several different functions were manipulating that value.) This was a bit of an eye-opener; in typical frog-in-the-pot-of-warming-water style, I'd not realized how easy Haskell had made it for me to reason about the use of values until I ran into an example of how I used to use them.
So these days I've come down fairly firmly on the side of immutable variables.
Since previous answers to this question have confused these things, I feel compelled to point out here quite forcefully that this issue is orthogonal to both purity and functional programming. I feel that Ruby, for example, would benefit from having single-assignment local variables, though possibly a few other changes to the language, such as adding tail-call optimization, would be necessary to make this truly convenient.

What about when you need to make small changes to large data structures? You don't really want to copy a whole array or large class every time you would modify a few elements.

I hadn't really thought about this much until you pointed it out.
Actually, I subconsciously try to avoid multiple assignments.
Here's an example of what I'm talking about, in Python:
start = self.offset % n
if start:
    start = n - start
It's written this way to avoid an unnecessary extra modulo or subtraction. This is used with bignum-style long ints, so it's a worthwhile optimization. The thing about it, though, is that it really is a single assignment.
I wouldn't miss multiple assignment at all.
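For comparison, here is a small sketch of the snippet above next to a form that binds the name exactly once. It assumes n is positive and relies on Python's floor-style modulo, where the result takes the sign of the divisor. The function names are invented for illustration, and no claim is made about which form is faster on big integers.

```python
def start_two_step(offset, n):
    """The form from the answer: assign, then conditionally reassign."""
    start = offset % n
    if start:
        start = n - start
    return start

def start_single(offset, n):
    """Single binding: for n > 0, (-offset) % n gives the same value."""
    return (-offset) % n
```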

I know you asked for code that did show the benefits of mutable variables. And I wish I could provide it. But as pointed out before - there is no problem that can't be expressed in both fashions. And especially since you pointed out that jpalecek's area of a polygon example could be written with a folding algorithm (which is IMHO way messier and takes the problem to a different level of complexity) - well, it made me wonder why you are coming down on mutability so hard. So I'll try to make the argument for common ground and a coexistence of immutable and mutable data.
In my opinion this question misses the point a bit. I know that we programmers are prone to liking things clean and simple, but we sometimes miss that a mixture is possible as well. And that's probably why in the discussion about immutability there is seldom somebody taking the middle ground. I just wonder why, because let's face it - immutability is a great tool for abstracting all kinds of problems. But sometimes it is a huge pain in the ass. Sometimes it simply is too constraining. And that alone makes me stop and think - do we really want to lose mutability? Is it really either-or? Isn't there some common ground we can arrive at? When does immutability help me achieve my goals faster, and when does mutability? Which solution is easier to read and maintain? (Which for me is the biggest question.)
A lot of these questions are influenced by a programmer's taste and by what they are used to program in. So I'll focus on one of the aspects that is usually the center of most pro-immutability arguments - Parallelism:
Often parallelism is thrown into the argument surrounding immutability. I've worked on problem sets that required 100+ CPUs to solve in a reasonable time. And it has taught me one very important thing: most of the time, parallelizing the manipulation of a single graph of data is not the most efficient way to parallelize. It sure can benefit greatly, but imbalance is a real problem in that problem space. So usually working on multiple mutable graphs in parallel and exchanging information with immutable messages is way more efficient. Which means, when I know that the graph is isolated, that I have not revealed it to the outside world, I would like to perform my operations on it in the most concise manner I can think of. And that usually involves mutating the data. But after these operations on the data I want to open the data up to the whole world - and that's the point where I usually get a bit nervous if the data is mutable. Because other parts of the program could corrupt the data, the state becomes invalid, ... because after opening up to the world the data often does get into the world of parallelism.
So real world parallel programs usually have areas where data graphs are used in definitive single thread operations - because they simply are not known to the outside - and areas where they could be involved in multi-threaded operations (hopefully just supplying data not being manipulated). During those multi-threaded parts we never want them to change - it simply is better to work on outdated data than on inconsistent data. Which can be guaranteed by the notion of immutability.
That made me come to a simple conclusion: the real problem for me is that none of the programming languages I am familiar with allows me to say: "After this point this whole data structure shall be immutable" and "give me a mutable copy of this immutable data structure here, please verify that only I can see the mutable copy". Right now I have to guarantee it myself by flipping a readonly bit or something similar. If we could have compiler support for it, not only would it guarantee for me that I did not do anything stupid after flipping said bit, but it could actually help the compiler do various optimizations that it couldn't do before. Plus - the language would still be attractive for programmers that are more familiar with the imperative programming model.
So to sum up. IMHO programs usually have a good reason to use both immutable and mutable representations of data graphs. I would argue that data should be immutable by default and the compiler should enforce that - but we should have the notion of private mutable representations, because there naturally are areas where multi-threading will never reach - and readability and maintainability could benefit from a more imperative structuring.
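The "build privately with mutation, then publish as immutable" workflow described above can be approximated at runtime today. The following Python sketch is one such approximation: build_graph and thaw are hypothetical names, and the guarantee comes from convention plus runtime errors, not from the compiler support this answer is asking for.

```python
from types import MappingProxyType

def build_graph(edges):
    """Mutate freely while the graph is private, then publish a read-only view."""
    graph = {}  # private and mutable: nobody else holds a reference yet
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    # "Freeze": tuples for the adjacency lists and a read-only mapping view.
    # The underlying dict is created here and never escapes this function.
    return MappingProxyType({node: tuple(nbrs) for node, nbrs in graph.items()})

def thaw(frozen):
    """Return a private, fully independent mutable copy of a frozen graph."""
    return {node: list(nbrs) for node, nbrs in frozen.items()}
```

A caller that tries frozen["x"] = () gets a TypeError, while thaw(frozen) hands back a copy that can be mutated without affecting any other reader.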
