I would like to test the significance of my random-effects term after multiple imputation with mice. I have two nested models which I try to compare with pool.compare, but this produces an error.
fm1 <- with(dti.mice1, glmer(Treatment ~ (1|Hospital) + Age))
fm2 <- with(dti.mice1, glm(Treatment ~ Age))
pool.compare(fm1, fm2)
Error: Model 'fit1' not larger than 'fit0'
The problem seems to be that fm1 and fm2 are not recognized as nested models (with fm1 being the larger model).
From the mice documentation:
fit1
An object of class 'mira', produced by with.mids().
fit0
An object of class 'mira', produced by with.mids(). The model in fit0 is a nested model of fit1.
So the complete/larger model has to be fit1, which is the case in your example.
It might be that you can't use glm and glmer together. What happens if you use glm for both models? Does that work?
I'm using spatstat to run some mppm models and would like to be able to calculate standard errors for the predictions, as in predict.ppm. I could of course use predict.ppm on each point process individually, but I'm wondering if that is invalid for any reason, or if there is a better way of doing so?
This is not yet implemented as an option in predict.mppm. (It is on our long list of things to do. I will move it closer to the top of the list.)
However, it is available by applying predict.ppm to each element of subfits(model), where model was the original fitted model of class mppm. Something like:
m <- mppm(......)
fits <- subfits(m)
Y <- lapply(fits, predict, se=TRUE)
Just to clarify, fits[[i]] is a point process model, of class ppm, for the data in row i of the data hyperframe, implied by the big model m. The parameter estimates and variance estimates in fits[[i]] are based on information from the entire hyperframe. This is not the same as fitting a separate model of class ppm to the data in each row of the hyperframe and calculating predictions and standard errors for those fits.
Apologies if this is a stupid question, but I have a dataset with two classes which I wish to attempt to classify using a U-Net.
When creating the label matrices, do I need to explicitly define the null / base class (everything which isn't a class) or will Keras calculate this automatically?
For example, if I have a set of images where I'd like to classify the regions where there is a dog or where there is a cat, do I need to create a third label matrix which labels everything that is not a dog or a cat (and thus have three classes)?
Furthermore, the null class dominates the images I'm wishing to segment; if I were to use class_weight, it seems to only accept a dictionary as input, whereas I could swear that before I could specify a list and that would suffice.
If I treat my problem as a two-class problem, I'm assuming I need to specify the weight of the null class too, i.e. class_weight = [nullweight, dogweight, catweight].
Thank you
edit: Attached example
Is the above image a two-class or a three-class problem?
You must specify the other (background) class, since the network needs to differentiate between the dog, the cat, and the background.
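For instance, a minimal numpy sketch of building a three-class label matrix, assuming integer masks where 0 = background, 1 = dog, 2 = cat (the mask values and shapes here are hypothetical, and to_categorical is the Keras utility):
import numpy as np
from keras.utils import to_categorical

# mask: (H, W) integer array with 0 = background, 1 = dog, 2 = cat
mask = np.zeros((128, 128), dtype=np.int32)
mask[10:40, 10:40] = 1   # a dog region
mask[60:90, 60:90] = 2   # a cat region

# one-hot label matrix of shape (H, W, 3); the background gets its own
# explicit channel, just like the dog and cat classes
labels = to_categorical(mask, num_classes=3)
print(labels.shape)  # (128, 128, 3)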
As for the class_weight parameter, the discussion is a little more complicated: you cannot assign class weights the way you would in a simple classification problem.
Indeed, in many problems the background constitutes a big part of the image so you need to be careful when approaching such an imbalanced problem.
You need to look into the sample_weight parameter instead of class_weight; you can have a look at these threads:
https://datascience.stackexchange.com/questions/31129/sample-importance-training-weights-in-keras
https://github.com/keras-team/keras/issues/3653
Weighting samples in multiclass image segmentation using keras
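As a rough illustration of the idea from those threads, here is a sketch (names like pixel_weights are hypothetical) that turns integer masks into a per-pixel weight map using inverse class frequency; the resulting weights would then be passed to Keras through its sample-weight mechanism as discussed in the links:
import numpy as np

def pixel_weights(masks, n_classes):
    # masks: (N, H, W) integer array of class ids (0 = background).
    # Returns an array of the same shape holding one weight per pixel,
    # inversely proportional to how frequent each class is.
    counts = np.bincount(masks.ravel(), minlength=n_classes)
    # rare classes get large weights; guard against empty classes
    class_w = counts.sum() / np.maximum(counts, 1) / n_classes
    return class_w[masks]

masks = np.random.randint(0, 3, size=(4, 128, 128))  # toy data
w = pixel_weights(masks, n_classes=3)                # shape (4, 128, 128)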
I saw both transformer and estimator were mentioned in the sklearn documentation.
Is there any difference between these two terms?
The basic difference is that a:
Transformer transforms the input data (X) in some way.
Estimator predicts a new value (or values) (y) by using the input data (X).
Both the Transformer and Estimator should have a fit() method which can be used to train them (they learn some characteristics of the data). The signature is:
fit(X, y)
fit() returns the estimator itself and stores the learnt characteristics of the data inside the object.
Here X represents the samples (feature vectors) and y is the target vector (which may have single or multiple values per corresponding sample in X). Note that y can be optional in some transformers where it's not needed, but it's mandatory for most estimators (supervised estimators). Look at StandardScaler, for example: it needs the initial data X for finding the mean and std of the data (it learns the characteristics of X; y is not needed).
Each Transformer should have a transform(X) method which, like fit(), takes the input X, and returns a new transformed version of X (which generally has the same number of samples but may or may not have the same features).
On the other hand, an Estimator should have a predict(X) method which outputs the predicted value of y from the given X.
There are some classes in scikit-learn which implement both transform() and predict(), like KMeans; in that case, carefully reading the documentation should resolve your doubts.
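To make the distinction concrete, a small sketch (with toy data) using StandardScaler as the transformer and LogisticRegression as the supervised estimator:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

scaler = StandardScaler().fit(X)   # learns mean and std; y not needed
X_scaled = scaler.transform(X)     # returns a new version of X

clf = LogisticRegression().fit(X_scaled, y)  # needs both X and y
print(clf.predict(X_scaled))       # outputs predicted y values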
A Transformer is a type of Estimator that implements a transform method.
Let me support that statement with examples I have come across in the sklearn implementation.
Class sklearn.preprocessing.FunctionTransformer:
This inherits from two other classes, TransformerMixin and BaseEstimator.
Class sklearn.preprocessing.PowerTransformer:
This also inherits from TransformerMixin and BaseEstimator.
From what I understand, Estimators just take data, do some processing, and store data based on the logic implemented in their fit method.
Note: Estimators aren't used to predict values directly. They don't even have a predict method in them.
Before I give more explanation to the above statement, let me tell you about Mixin Classes.
Mixin class: these are classes that implement the mix-in design pattern; Wikipedia has a very good explanation of it. To summarise, these are classes with methods that can be used in many different classes: you write the methods in one class and just inherit it in many different classes (a form of composition).
In sklearn there are many mixin classes. To name a few: ClassifierMixin, RegressorMixin, TransformerMixin.
Here, TransformerMixin is the class that's inherited by every Transformer used in sklearn. The TransformerMixin class has only one method that is reusable in every transformer, and that is fit_transform.
All transformers inherit two classes: BaseEstimator (which provides get_params and set_params) and TransformerMixin (which provides fit_transform). In addition, each transformer has its own fit and transform methods based on its functionality.
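As an illustration, a minimal custom transformer (the class MeanCenterer here is a hypothetical example) that inherits both classes and gets fit_transform for free from TransformerMixin:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MeanCenterer(BaseEstimator, TransformerMixin):
    # Subtracts the column means learnt during fit.

    def fit(self, X, y=None):
        self.mean_ = np.asarray(X).mean(axis=0)  # learnt characteristic
        return self                              # fit returns self

    def transform(self, X):
        return np.asarray(X) - self.mean_

X = np.array([[1.0, 2.0], [3.0, 4.0]])
# fit_transform comes from TransformerMixin: it calls fit, then transform
print(MeanCenterer().fit_transform(X))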
I guess that answers your question. Now, let me come back to the statement I made regarding Estimators and prediction.
Every model class has its own predict method that does prediction.
Consider LinearRegression, KNeighborsClassifier, or any other model class: they all declare a predict method. That is what is used for prediction, not the Estimator base class.
The sklearn usage is perhaps a little unintuitive, but "estimator" doesn't mean anything very specific: basically everything is an estimator.
From the sklearn glossary:
estimator:
An object which manages the estimation and decoding of a model...
Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator.
transformer:
An estimator supporting transform and/or fit_transform...
As in @VivekKumar's answer, I think there's a tendency to use the word estimator for what sklearn instead calls a "predictor":
An estimator supporting predict and/or fit_predict. This encompasses classifier, regressor, outlier detector and clusterer...
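For what it's worth, you can see these roles by duck typing; a minimal sketch (KMeans happens to be both a transformer and a predictor, StandardScaler a transformer only):
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

for est in (KMeans(), StandardScaler()):
    print(type(est).__name__,
          "transformer:", hasattr(est, "transform"),
          "predictor:", hasattr(est, "predict"))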
I would like to use libsvm for a keypoint detection algorithm. Each keypoint has 36 features, but each sample of an Object has a different number of keypoints...
my input array would look like:
Object 1: (K1_F1,...K1_F36,K2_F1,...K2_F36, ... , K12_F1,...K12_F36)
Object 1: (K1_F1,...K1_F36,K2_F1,...K2_F36, ... , K15_F1,...K15_F36)
Object 2: (K1_F1,...K1_F36,K2_F1,...K2_F36, ... , K16_F1,...K16_F36)
Object 2: (K1_F1,...K1_F36,K2_F1,...K2_F36, ... , K9_F1,...K9_F36)
Is it even possible to train with a varying number of keypoints?
In short: no, it is not possible. An SVM requires a constant-shape data representation. There are two ways of approaching such a problem:
Create some conversion into a constant-size representation; among the most common are clustering methods, bag-of-words representations, and other compression-based approaches (sketched below).
Find a suitable kernel function which, for two sets of keypoints, returns a valid scalar product value in some space, and feed it to the SVM.
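A rough sketch of the first option using scikit-learn rather than raw libsvm, with toy data matching your shapes (the helper bow_vector and all sizes are hypothetical): cluster all keypoint descriptors into a fixed vocabulary, represent each object as a histogram over that vocabulary, and train the SVM on the resulting fixed-length vectors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# toy data: each object has a variable number of 36-feature keypoints
objects = [np.random.rand(np.random.randint(9, 17), 36) for _ in range(20)]
labels = np.random.randint(0, 2, size=20)

# 1. build a visual vocabulary from all keypoints pooled together
kmeans = KMeans(n_clusters=32, random_state=0).fit(np.vstack(objects))

# 2. encode each object as a fixed-length histogram of cluster ids
def bow_vector(keypoints):
    ids = kmeans.predict(keypoints)
    return np.bincount(ids, minlength=32) / len(ids)

X = np.array([bow_vector(o) for o in objects])

# 3. train an SVM on the constant-shape representation
clf = SVC().fit(X, labels)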
I'm trying to fit some models in scikit-learn using GridSearchCV, and I would like to use the "one standard error" rule to select the best model, i.e. selecting the most parsimonious model from the subset of models whose score is within one standard error of the best score. Is there a way to do this?
You can compute the standard error of the mean of the validation scores using:
from scipy.stats import sem
Then access the grid_scores_ attribute of the fitted GridSearchCV object. This attribute has changed in the master branch of scikit-learn, so please use an interactive shell to introspect its structure.
As for selecting the most parsimonious model: the parameters of the models do not always have a degrees-of-freedom interpretation. The meaning of the parameters is often model specific, and there is no high-level metadata to interpret their "parsimony". You would have to encode your interpretation on a case-by-case basis for each model class.
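As a sketch against a recent scikit-learn (which exposes cv_results_ rather than grid_scores_), assuming parsimony is encoded case by case, here as "smallest C of an SVM":
import numpy as np
from scipy.stats import sem
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(SVC(), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5).fit(X, y)

means = grid.cv_results_["mean_test_score"]
# standard error of the best model's score across its CV folds
folds = [grid.cv_results_["split%d_test_score" % i][grid.best_index_]
         for i in range(5)]
threshold = means.max() - sem(folds)

# among candidates within one SE of the best, take the smallest C
candidates = np.where(means >= threshold)[0]
best_simple = min(candidates, key=lambda i: grid.cv_results_["params"][i]["C"])
print(grid.cv_results_["params"][best_simple])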