My scikit-learn LogisticRegression model, which uses the lbfgs solver, is stopping early, as shown in the logs below. The data is standardized.
(...)
At iterate13150 f= 4.05397D+03 |proj g|= 2.41194D+04
At iterate13200 f= 4.05213D+03 |proj g|= 1.36863D+04
.venv/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of f AND g EVALUATIONS EXCEEDS LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[Parallel(n_jobs=-1)]: Done 1 out of 1 | elapsed: 5.5s finished
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
62 13240 15001 1 0 0 4.800D+04 4.051D+03
F = 4051.0211050375365
sklearn uses the scipy implementation of the lbfgs solver. The function scipy/optimize/_lbfgsb_py.py:_minimize_lbfgsb has the following early-stop conditions:
    if n_iterations >= maxiter:
        task[:] = 'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT'
    elif sf.nfev > maxfun:
        task[:] = ('STOP: TOTAL NO. of f AND g EVALUATIONS '
                   'EXCEEDS LIMIT')
I am indeed hitting the sf.nfev > maxfun limit. Unfortunately, sklearn fixes the value of maxfun to 15_000 when it instantiates the scipy solver (`sklearn/linear_model/_logistic.py:442`).
When I hotfix the sklearn package to set maxfun to 100_000, the solver converges. But this is not a real solution, since I do not want to carry around a custom sklearn distribution that differs by a single constant.
Any ideas on how to set the maxfun parameter in another way?
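One possible workaround, sketched under assumptions rather than taken from sklearn's API: call scipy.optimize.minimize directly, where maxfun is an L-BFGS-B option (its scipy default is 15000). The helper name fit_logreg_lbfgs, the synthetic data, and the choice to leave the intercept unpenalised are illustrative, and the result is not guaranteed to match LogisticRegression digit for digit.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def fit_logreg_lbfgs(X, y, C=1.0, max_iter=100_000, max_fun=100_000, tol=1e-4):
    """Binary L2-penalised logistic regression via scipy's L-BFGS-B.

    Hypothetical helper: y must hold 0/1 labels; the intercept is unpenalised.
    """
    n_features = X.shape[1]
    signs = 2.0 * y - 1.0  # map labels {0, 1} to {-1, +1}

    def loss_and_grad(w):
        coef, intercept = w[:-1], w[-1]
        margins = signs * (X @ coef + intercept)
        # sum of log(1 + exp(-margin)), evaluated in a numerically stable way
        loss = np.logaddexp(0.0, -margins).sum() + 0.5 * (coef @ coef) / C
        factor = -signs * expit(-margins)  # derivative of the loss w.r.t. each score
        grad = np.append(X.T @ factor + coef / C, factor.sum())
        return loss, grad

    return minimize(
        loss_and_grad,
        np.zeros(n_features + 1),
        method="L-BFGS-B",
        jac=True,
        # maxfun is the limit that triggers "f AND g EVALUATIONS EXCEEDS LIMIT"
        options={"maxiter": max_iter, "maxfun": max_fun, "gtol": tol},
    )


# Synthetic, made-up data just to exercise the helper.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = (X @ true_coef + 0.3 + rng.logistic(size=200) > 0).astype(float)

res = fit_logreg_lbfgs(X, y)
coef, intercept = res.x[:-1], res.x[-1]
print(res.message, res.nfev, coef, intercept)
```

If the model has to stay inside a scikit-learn pipeline, trying one of the other solvers mentioned in the warning text may be simpler than wrapping scipy by hand.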
I am a bit confused about the Poisson distribution. I am fitting a Poisson-type distribution and need to extract its mean and the error on the mean. As we know, the Poisson distribution is
P(k; mu) = mu^k * exp(-mu) / k!
In ROOT (a C/C++-based analysis framework), I defined this function as follows:
function = ( [0] * Power( [1] / [2] , x/[2] ) * exp (-[1]/[2]) ) / Gamma(x/[2] + 1)
Where : [0] = Normalizing parameter
[1] / [2] -> mean (mu)
x / [2] -> x
Gamma( x / [2] + 1 ) = factorial (x / [2])
So, in principle, the mean of the Poisson distribution is mu = [1]/[2], and the error will be the standard deviation, which is the square root of the mean.
But if I use this value, my mean comes out around 10, and hence the error is ~3.
However, the mean of the distribution is around 2 (as we can see), so I am confused, because the value of parameter [1] comes out around 2 or 3. So should I use parameter [1] as the mean value, or something else?
Please suggest what I should use and why.
My full code is below:
TH1F *hClusterSize = new TH1F("hClusterSize","Cluster size for GE1/1", 10,0.,10.);
tmpTree->Draw("g1ycl.ngeoch>>hClusterSize","g1ycl@.GetEntries()==1 && g1xcl@.GetEntries()==1");
hClusterSize->GetXaxis()->SetTitle("Cluster Size");
hClusterSize->GetYaxis()->SetTitle("#Entries");
TF1 *f1 = new TF1("f1","[0]*TMath::Power(([1]/[2]),(x/[2]))*(TMath::Exp(-([1]/[2])))/TMath::Gamma((x/[2])+1)", 0, 10);
f1->SetParameters(hClusterSize->GetMaximum(), hClusterSize->GetMean(), 1);
hClusterSize->Fit("f1"); // Use option "R" = fit between "xmin" and "xmax" of the "f1"
On the ROOT command line, fitting a Poisson distribution can be done like this:
TF1* func = new TF1("mypoi","[0]*TMath::Poisson(x,[1])",0,20)
func->SetParameter(0,5000) // set starting values
func->SetParameter(1,2.) // set starting values
func->SetParName(0,"Normalisation")
func->SetParName(1,"#mu")
TH1F* hist = new TH1F("hist","hist",20,-0.5,19.5)
for (int i = 0 ; i < 5000 ; i++) { hist->Fill(gRandom->Poisson(3.5)); }
hist->Draw()
hist->Fit(func)
Note that the bin centers are shifted with respect to your initial post, such that the bin
center for 0 counts is at 0 and not at 0.5 (and the same for all the other bins).
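As a rough numerical check of how the parameters in the question's scaled function relate to the mean, here is a small sketch. It assumes that x/[2] takes the integer values 0, 1, 2, ..., so that x lives on a lattice with spacing [2]; the parameter values 2.5 and 0.8 are made up for illustration.

```python
import math


def scaled_poisson(x, norm, p1, p2):
    """Fit function from the question:
    [0] * ([1]/[2])^(x/[2]) * exp(-[1]/[2]) / Gamma(x/[2] + 1)."""
    mu = p1 / p2
    k = x / p2
    # work in log space so large k does not overflow
    return norm * math.exp(k * math.log(mu) - mu - math.lgamma(k + 1.0))


def mean_and_variance(p1, p2, k_max=200):
    """Mean and variance of x = [2] * k over the lattice k = 0, 1, ..., k_max - 1."""
    xs = [p2 * k for k in range(k_max)]
    weights = [scaled_poisson(x, 1.0, p1, p2) for x in xs]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total
    return mean, var


mean, var = mean_and_variance(p1=2.5, p2=0.8)
print(mean, var)
```

Under that assumption the mean in units of x comes out equal to [1] (not [1]/[2]) and the variance equal to [1]*[2], which would be consistent with parameter [1] landing near the visible mean of the histogram.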
I have to solve a large number of simultaneous equations (~1000s) at every time step for a general mean curvature flow problem. The problem is defined over closed manifolds, so the boundary condition is periodic.
I am currently using the successive over-relaxation algorithm to solve this, but it is very slow. I tried dgbtrf -> dgbtrs (without the periodicity condition), and it is considerably faster.
The coefficient matrix looks like this:
     ⎛ c₁    d₁    e₁    0     .     a₁    b₁   ⎞   ^
     ⎢ b₂    c₂    d₂    e₂    0     .     a₂   ⎥   |
     ⎢ a₃    b₃    c₃    d₃    .     .     0    ⎥   |
A ←  ⎢ 0     a₄    b₄    c₄    .     .     .    ⎥ ~1000
     ⎢ .     0     .     .     .     .     eₙ₋₂ ⎥   |
     ⎢ eₙ₋₁  0     .     .     .     .     dₙ₋₁ ⎥   |
     ⎝ dₙ    eₙ    0     .     aₙ    bₙ    cₙ   ⎠   v
I need to solve pentadiagonal systems that are not symmetric and not known to be positive definite.
Is there a way to solve cyclic/periodic banded systems in LAPACK?
Or do I have to use general solvers, such as dgetrs?
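I am not aware of a dedicated LAPACK routine for periodic banded systems. One workaround that keeps the banded factorization is to treat the corner entries as a low-rank correction and apply the Woodbury identity. The sketch below assumes the banded part on its own is nonsingular and n >= 6; it uses scipy.linalg.solve_banded (which wraps the LAPACK banded solver) in place of explicit dgbtrf/dgbtrs calls, and the function name and the dense input matrix are illustrative shortcuts.

```python
import numpy as np
from scipy.linalg import solve_banded


def solve_cyclic_pentadiagonal(A, rhs):
    """Solve A x = rhs for a pentadiagonal matrix with periodic corner entries.

    A is passed dense only to keep the sketch short (assumes n >= 6).
    """
    n = A.shape[0]
    # Banded part B in LAPACK band storage: ab[2 + i - j, j] = A[i, j].
    ab = np.zeros((5, n))
    for k in range(-2, 3):  # k = j - i is the diagonal offset
        diag = np.diagonal(A, k)
        if k >= 0:
            ab[2 - k, k:] = diag
        else:
            ab[2 - k, :n + k] = diag

    # Corner part: A = B + U W, where U selects the four rows that wrap around.
    rows = [0, 1, n - 2, n - 1]
    W = A[rows, :].astype(float)
    for r, i in enumerate(rows):
        W[r, max(0, i - 2):min(n, i + 3)] = 0.0  # drop entries already in B
    U = np.zeros((n, 4))
    U[rows, np.arange(4)] = 1.0

    # Woodbury: x = y - Z (I + W Z)^{-1} W y, with y = B^{-1} rhs and Z = B^{-1} U.
    y = solve_banded((2, 2), ab, rhs)
    Z = solve_banded((2, 2), ab, U)
    small = np.eye(4) + W @ Z
    return y - Z @ np.linalg.solve(small, W @ y)


# Made-up, diagonally dominant test matrix with the periodic structure.
rng = np.random.default_rng(1)
n = 12
A = np.zeros((n, n))
for i in range(n):
    for k in (-2, -1, 1, 2):
        A[i, (i + k) % n] = rng.uniform(-1.0, 1.0)
    A[i, i] = 5.0  # diagonal dominance keeps A and its banded part nonsingular
rhs = rng.normal(size=n)

x = solve_cyclic_pentadiagonal(A, rhs)
print(np.max(np.abs(A @ x - rhs)))
```

The extra cost over the non-periodic solve is four additional banded back-substitutions and one 4x4 dense solve per time step.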
I am struggling with a math problem set question and just wanting some pointers:
I have a stationary time series (an MA(m) process) that satisfies the equation below and has the variance given below:
x_t = (u_t + u_{t-1} + u_{t-2} + ... + u_{t-m}) / (m+1)
with
Var(x) = 4.0
How do I work out rho(h), the autocorrelation function of this process?
- It is given as rho(h) = (m+1-h)/(m+1) for 1 <= h <= m, but I don't know how to get to that point.
- I do know:
-- sigma(h)/sigma(0) = ACF(h), where sigma(h) is the autocovariance at lag h
-- sigma(h) = variance * [sum over i of alpha(i)*alpha(i+h)] if h <= q, and 0 elsewhere
-- when h = 0, ACF(h) = 1 to start, and it then decays to 0
any hints on where to start here?
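A hint, assuming u_t is white noise with a common variance (the post does not say so explicitly): every weight alpha(i) equals 1/(m+1), so the sum of alpha(i)*alpha(i+h) just counts the m+1-h overlapping terms, each contributing 1/(m+1)^2. Dividing by the lag-0 value (m+1)/(m+1)^2 leaves (m+1-h)/(m+1), and the noise variance cancels. A small exact-arithmetic sketch of that count, with m = 4 chosen arbitrarily:

```python
from fractions import Fraction


def ma_autocorrelation(weights, h):
    """rho(h) = sum_i a_i * a_{i+h} / sum_i a_i^2 for x_t = sum_i a_i * u_{t-i},
    assuming u_t is white noise (its variance cancels in the ratio)."""
    h = abs(h)
    if h >= len(weights):
        return Fraction(0)  # no overlapping noise terms beyond lag m
    num = sum(weights[i] * weights[i + h] for i in range(len(weights) - h))
    den = sum(a * a for a in weights)
    return num / den


m = 4
weights = [Fraction(1, m + 1)] * (m + 1)  # equal weights 1/(m+1) on u_t ... u_{t-m}
acf = [ma_autocorrelation(weights, h) for h in range(m + 3)]
print(acf)
```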
I am running JAGS models through the R package runjags. I just updated to JAGS 4.0.0 from JAGS 3.4, and have noticed some unexpected behavior that seems to be related to the update.
First, when I run a model, I now get a warning message WARNING: Unused variable(s) in data table: followed by a list of data objects that are referenced in the model and provided as data. It doesn't seem to affect the results (but it is very puzzling). I have, however, noticed a few times while playing around with this that for some variables the posteriors were virtually identical to the priors (indicating that no updating occurred). I can't seem to recreate the update failure right now, but below is a reproducible code example illustrating the odd warning message. The code example on the run.jags help page also produces the same warning.
Second, I thought I'd check whether the same message pops up if I use the R package R2jags instead of runjags, but R2jags won't load because apparently rjags (one of the dependencies) is not compatible with JAGS 4.0 (it's looking for JAGS 3.X). Also, in the runjags function run.jags, the argument method="rjags" doesn't seem to work anymore, but method="parallel" does work.
I'm using runjags_2.0.1-4 and R 3.2.2.
So my questions are:
1) Is rjags really incompatible with JAGS 4.0? The motivation to go to 4.0 was to use vectors as indices (see https://martynplummer.wordpress.com/2015/08/16/whats-new-in-jags-4-0-0-part-34-r-style-features/).
2) What is up with the unused variable(s) warning, and should I be concerned about it?
Thanks,
Glenn
Code:
#--- GENERATE DATA ------------------------
rm(list=ls())
# Number of sites and observations per site
N <- 200
nobs <- 3
# generate covariates and standardize (where appropriate)
set.seed(123)
forest <- rnorm(N)
# relationship between occupancy and covariates
b0 <- 0.5
b.for <- 0.5
psi <- plogis(b0 + b.for*forest)
# draw occupancy for each site
z <- rbinom(n=N, size=1,prob=psi)
# specify detection probability
p <- 0.5
pz <- p*z
# generate the observations
Y <- rbinom(n=N, size=nobs,prob=pz)
#---- BUGS model ------------------------
model1 <- "model {
  for (i in 1:N){
    logit(eta[i]) <- b0 + b.for*forest[i]
    z[i] ~ dbern(eta[i])
    pz[i] <- z[i]*p
    y[i] ~ dbin(pz[i],nobs)
  } #i
  b0.0 ~ dunif(0,1)
  b0 <- log(b0.0/(1-b0.0))
  b.for ~ dnorm(0,0.01)
  p ~ dunif(0,1)
}"
occ.data1 <-list(y=Y,N=N,nobs=nobs,forest=forest)
inits1 <- function(){list(b0.0=runif(1),b.for=rnorm(1),p=runif(1),z=as.numeric(Y>0))}
parameters1 <- c("b0","b.for","p")
#---- RUN MODEL ------------------------
library(runjags)
ni <- 2000
nt <- 1
nb <- 1000
nc <- 3
ad <- 100
out <- run.jags(model=model1,data=occ.data1,monitor=parameters1,n.chains=nc,inits=inits1,burnin=nb,
sample=ni,adapt=ad,thin=nt,modules=c("glm","dic"),method="parallel")
To answer your questions:
1) rjags and JAGS use linked (non-interchangeable) versions, and CRAN systems are still using JAGS_3.4.0, so the version of rjags on CRAN matches that. This will be updated soon, and in the meantime you can grab the correct version of rjags from the sourceforge page, as @jbaums notes.
2) This is a helpful message from JAGS/rjags telling you that you have specified something as data that the model isn't using. Remember that variable names are case sensitive, e.g.:
library('runjags')
model <- "model {
m ~ dunif(-1000,1000)
#data# M
#inits# m
#monitor# m
}"
M <- 0
m <- list(-10, 10)
results <- run.jags(model, method="interruptible", n.chains=2)
results <- run.jags(model, method="rjags", n.chains=2)
... gives you a warning because M does not match m. Also note that the warning looks a bit different from the two function calls - in the first it comes half-way down the JAGS output and in the second it comes as a warning in R after the function is completed.
As for 'should I be concerned' - yes if you think these variables should be in your model. If you can't find the problem try posting the code you are using - it got cut off from your original post.
Matt