I have a generalized linear mixed model in SAS based on past data, but I don't know how to export it as a pure mathematical formula that could be used to generate probabilities for varying inputs.
For example, I want to be able to input values for the different parameters and get a probability for that instance. Is there an easy way to export the mathematical model from SAS?
I guess you're using PROC GLIMMIX for your model. You can use the ODS output tables to get the coefficients of your model into a dataset that you can then apply internally using PROC SCORE, or externally (export to Excel?).
ods output coef=coef;
proc glimmix data=HessianFly;
class block entry;
model y/n = block entry / e;
run;
ODS output tables are available depending on which options you have selected for your model.
Have a look at this link - all available options for ODS outputs.
Hope it helps.
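Once the coefficients are exported, applying the model elsewhere is just the linear predictor plus the inverse link function. A minimal Python sketch, assuming a binomial model with the default logit link; the coefficient names and values below are made up for illustration, not taken from a real fit:

```python
import numpy as np

# hypothetical coefficients as exported from the ODS output table
# (intercept plus one dummy-coded term per CLASS level)
coefs = {"Intercept": -1.2, "block_1": 0.4, "entry_3": 0.7}

def predict_probability(inputs):
    """Apply the fitted linear predictor and the inverse logit link.
    For a binomial model with the logit link, p = 1 / (1 + exp(-eta))
    where eta = intercept + sum(coef * x)."""
    eta = coefs["Intercept"] + sum(coefs[k] * v for k, v in inputs.items())
    return 1.0 / (1.0 + np.exp(-eta))

# e.g. an observation in block 1 with entry 3:
print(predict_probability({"block_1": 1, "entry_3": 1}))  # ≈ 0.475
```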
After training my dataset which has a number of categorical data using fastai's tabular model, I wish to read out the entity embedding and use it to map to my original data values.
I can see the embedding weights. The number of inputs doesn't seem to match anything, but maybe it is based on the unique categorical values in train_ds.
To get that map, I would like to get the self.categories dictionary from the Categorify transform class. Is there any way to get that from the data variable obtained by calling TabularList.from_df?
Or maybe someone can tell me a better way to get this map. I know the input df into TabularList.from_df() is not it, because the number of rows is wrong, most likely because df is split into train and valid subsets. But there is no easy way to obtain the train part of the TabularList to check just the train part.
It's strange I can't find any code example that shows this. Doesn't anyone else care to map the entity embedding value back to its original categorical value?
I found it.
It is in data.train_ds.inner_df.
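Once you have recovered the ordered class list for a column, building the value-to-vector map is straightforward. A minimal sketch; the class list and weight matrix below are made up stand-ins for what you would pull out of the trained model (fastai reserves index 0 for the #na# missing/unknown placeholder):

```python
import numpy as np

# hypothetical ordered class list for one categorical column;
# index 0 is the #na# (missing/unknown) placeholder
classes = ["#na#", "red", "green", "blue"]

# hypothetical embedding weight matrix: one row per class, 3 dimensions
emb_weights = np.random.default_rng(0).normal(size=(len(classes), 3))

# map each original categorical value back to its learned embedding row
emb_map = {c: emb_weights[i] for i, c in enumerate(classes)}
print(emb_map["red"].shape)  # (3,)
```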
I am running an ordinal logistic regression. My problem is that SAS won't let me specify which value of the dependent categorical variable should be the reference.
My code looks like:
proc surveylogistic data=mydata;
weight mywgt;
strata mystrata;
domain mydomain;
class depvar (ref="myref") indvar1 (ref="myref1") indvar2 (ref="myref2") /param=ref;
model depvar (order=internal)=indvar1 indvar2;
title 'my model';
run;
In the class statement I specify that I want "myref" to be the reference for the dependent var, which means that when I look at the parameter estimates for the intercepts, the value "myref" should be omitted. When I look at the response profile, SAS correctly orders the categories of my dependent var, but no matter what I put in the class or model statement, I keep getting the highest value as the reference for my dependent var.
Does anyone know how I can specify my reference for my dependent var? It occurred to me I could change the order so that the category I want as the reference would have the highest value, but then it wouldn't be ordered correctly so an ordinal logistic regression would be inappropriate.
Thanks
Use the event= option to specify the reference in the dependent variable.
model depvar(event='myref')=indvar1 indvar2;
I discovered that ordinal logistic regressions don't have a reference group for the dependent variable. Only multinomial logistic regressions do, which is why I couldn't do it.
I have a dataset containing 8 Parameters (4 Continuous 4 Categorical) and I am trying to eliminate features as per RFEC class in Scikit.
This is the code I am using (imports added; note that the old StratifiedKFold(y, 2) call is from the deprecated sklearn.cross_validation API):
from sklearn.svm import SVC
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

svc = SVC(kernel="linear")
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(2),
              scoring='accuracy')
rfecv.fit(X, y)
As I also have categorical data, I converted it to dummy variables using dmatrices (Patsy).
I want to try different classification models on the data after feature selection, along with SVC, to improve the model.
I ran RFE after transforming the data, and I think I am doing it wrong.
Do we run the RFECV before transforming the Categorical data or after?
I can't find any clear indication in any document.
It depends on whether you want to select given values of the categorical variable or the whole variable.
You are currently selecting single settings (aka levels) of the categorical variable.
To select whole variables, you would probably need to do a bit of hackery, defining your own estimator based on SVC.
You could do make_pipeline(OneHotEncoder(categorical_features), SVC()), but then you would need to set the coef_ of the pipeline to something that reflects the input shape.
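To make the "select whole variables" idea concrete, here is a rough sketch of manual backward elimination over whole variables. This is not RFECV itself, just the same recursive idea applied per variable; the data, the column-to-variable grouping, and the weight-aggregation rule are all made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# toy expanded design matrix: 3 "original" variables, each occupying two
# columns (e.g. two dummy levels); groups[j] = original variable of column j
X = rng.normal(size=(120, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only variable 0 is informative
groups = np.array([0, 0, 1, 1, 2, 2])

def eliminate_whole_variables(X, y, groups, keep=2):
    """Backward elimination that drops one whole variable per round,
    ranking variables by the summed squared weights of their columns
    in a linear SVC (a rough group analogue of RFE's |coef| ranking)."""
    active = sorted(set(groups))
    while len(active) > keep:
        cols = np.isin(groups, active)
        svc = SVC(kernel="linear").fit(X[:, cols], y)
        w = svc.coef_[0] ** 2
        sub = groups[cols]
        # aggregate weight mass per remaining variable, drop the weakest
        weight_mass = {g: w[sub == g].sum() for g in active}
        active.remove(min(weight_mass, key=weight_mass.get))
    return active

print(eliminate_whole_variables(X, y, groups, keep=2))
```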
I'm using RapidMiner for the first time. I have a dataset (in .xlsx format) on which I want to run the neural network algorithm. I am getting this error:
The operator NeuralNet does not have sufficient capabilities for the given data set; polynomial attributes not supported
Any help with this, please?
Thanks in advance!
Per the Neural Net operator's Help file...
...This operator cannot handle polynominal attributes.
Your given input file has several binominal and polynominal attributes. Therefore, if you wish to use the out of the box Neural Net operator, you need to convert your nominal data to numerical data. One way of doing this within RapidMiner is with the Nominal to Numerical operator.
Always be cognizant of the type of data/attribute you are manipulating: (1) text, (2) numeric, and (3) nominal.
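For comparison, the same nominal-to-numerical conversion can be done outside RapidMiner as well. A minimal Python sketch with pandas (the column names and values are made up for illustration):

```python
import pandas as pd

# toy frame with one nominal and one numeric column
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.5, 3.0]})

# one-hot encode the nominal column, the rough equivalent of
# RapidMiner's Nominal to Numerical operator in dummy-coding mode
encoded = pd.get_dummies(df, columns=["color"])
print(encoded.columns.tolist())  # ['size', 'color_blue', 'color_red']
```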
Given a sample dataset with 1000 samples of data, suppose I would like to preprocess the data in order to obtain 10000 rows of data, so each original row of data leads to 10 new samples. In addition, when training my model I would like to be able to perform cross validation as well.
The scoring function I have uses the original data to compute the score so I would like cross validation scoring to work on the original data as well rather than the generated one. Since I am feeding the generated data to the trainer (I am using a RandomForestClassifier), I cannot rely on cross-validation to correctly split the data according to the original samples.
What I thought about doing:
Create a custom feature extractor to extract features to feed to the classifier.
Add the feature extractor to a pipeline and feed it to, say, GridSearchCV, for example.
Implement a custom scorer which operates on the original data to score the model given a set of selected parameters.
Is there a better method for what I am trying to accomplish?
I am asking this in connection to a competition going on right now on Kaggle
Maybe you can use stratified cross-validation (e.g. Stratified K-Fold or Stratified Shuffle Split) on the expanded samples, using the original sample index as the stratification info, in combination with a custom score function that ignores the non-original samples in the model evaluation.
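A related splitter scikit-learn provides for exactly this "keep all copies of an original sample together" constraint is GroupKFold (a different technique than the stratified splitters suggested above, named here explicitly). A sketch with synthetic data, where each original row is expanded into 10 jittered copies:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_orig = 100

# synthetic "original" data: 100 rows, one informative feature
X_orig = rng.normal(size=(n_orig, 5))
y_orig = (X_orig[:, 0] > 0).astype(int)

# expand each original row into 10 jittered copies, remembering its origin
X = np.repeat(X_orig, 10, axis=0) + rng.normal(scale=0.1, size=(n_orig * 10, 5))
y = np.repeat(y_orig, 10)
origin = np.repeat(np.arange(n_orig), 10)

# GroupKFold keeps every copy of an original sample in the same fold,
# so generated rows never leak their source across the train/test split
clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, groups=origin, cv=GroupKFold(n_splits=5))
print(scores)
```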