VIFGC_ovedom function in GenABEL - nlm

I am using non-additive GWAS models implemented in GenABEL. The VIFGC function for the dominant and recessive non-additive GWAS models works fine, but for overdominance inheritance with the same data the VIFGC_ovedom function gives an error:
Error: Error in nlm(GC_VIF_nlm, c(F = F1, K = K1)) : missing value in parameter
Also, how do I calculate the phenotypic variance explained by the significant markers in these non-additive models?
Thanks,

Related

ValueError: Found array with dim 3. Estimator expected <= 2 during RandomUndersampling

For one of my datasets, I have a data imbalance problem: the minority class has very few samples compared to the majority class, so I want to balance the data by undersampling the majority class. When I try to use RandomUnderSampler from the imblearn package on a 3D array, I get an error:
ValueError: Found array with dim 3. Estimator expected <= 2.
The features in the data are in 3D format:
train['X'].shape
(276216, 101, 4)
The input labels
train['y'].shape
(276216, 1)
I try to randomly undersample the data by running this:
from imblearn.under_sampling import RandomUnderSampler
undersample = RandomUnderSampler(sampling_strategy='majority')
X_train_under, y_train_under = undersample.fit(train['X'], train['y'])
I get the above error. Any help would be appreciated.
The function expects 2D arrays to be passed as arguments. Reshape your data and you'll be fine. Also, you will have to call fit_resample rather than fit, as per the docs.
X = train['X'].reshape(train['X'].shape[0], -1)
X_train_under, y_train_under = undersample.fit_resample(X, train['y'])
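If the downstream model still needs the 3D layout, you can reshape back after resampling. Here is a minimal sketch of the full round trip, assuming train['X'] and train['y'] are NumPy arrays shaped as in the question:
from imblearn.under_sampling import RandomUnderSampler

# Flatten the (samples, 101, 4) features to 2D so the sampler accepts them.
X = train['X'].reshape(train['X'].shape[0], -1)   # (276216, 404)
y = train['y'].ravel()                            # (276216,)

undersample = RandomUnderSampler(sampling_strategy='majority')
X_under, y_under = undersample.fit_resample(X, y)

# Restore the original 3D layout for the downstream model.
X_under = X_under.reshape(-1, train['X'].shape[1], train['X'].shape[2])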

Dictionary of Constraints Python SciPy.Optimize

I'm working on creating a dictionary of constraints for a large SCED power problem for minimization. However, I'm getting a ValueError saying an unknown type was passed, despite only using optimize.LinearConstraint at present. When I change to NonlinearConstraint (shown below), I get an error indicating that 'NonlinearConstraint' object has no attribute 'A'.
I have a feeling it's due to recursive elements, as even using a single constraint as I've defined them returns the same error.
Any idea how I can create the recursive linear constraints?
EDIT:
I've been told to copy the code and provide a bit more context. "gen_supply_seg" is a three-dimensional array that has different constraints at different points in time:
def con2a():
    for t in range(len(LOAD)):
        for g in range(len(GEN)):
            nlc2a = optimize.NonlinearConstraint(gen_supply_seg[t,g,1], lb=0, ub=P2Max[g])
    return nlc2a

def con2b():
    for t in range(len(LOAD)):
        for g in range(len(GEN)):
            nlc2b = optimize.NonlinearConstraint(gen_supply_seg[t,g,2], lb=0, ub=P3Max[g])
    return nlc2b

def con2c():
    for t in range(len(LOAD)):
        for g in range(len(GEN)):
            nlc2c = optimize.NonlinearConstraint(gen_supply_seg[t,g,3], lb=0, ub=P4Max[g])
    return nlc2c

con2a = con2a()
con2b = con2b()
con2c = con2c()
These constraints are then collected into a tuple as shown:
cons = (con2a,
con2b,
con2c)
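One common pattern, sketched below, is to collect one constraint object per (t, g) pair in a list instead of returning only the last one from inside the loop, and to express per-variable bounds as LinearConstraint objects on a flattened decision vector. This is only an illustration under stated assumptions, not the asker's SCED model: the dimensions T, G, S, the P2Max values, and the segment_constraints helper are all hypothetical.
import numpy as np
from scipy import optimize

# Hypothetical problem sizes and limits, for illustration only.
T, G, S = 3, 2, 4                     # time periods, generators, supply segments
P2Max = np.array([100.0, 80.0])       # hypothetical per-generator limit for segment 1

def segment_constraints(seg, limits):
    """Build one LinearConstraint per (t, g) pair for the given segment,
    assuming the decision vector x is gen_supply_seg flattened to length T*G*S."""
    cons = []
    n = T * G * S
    for t in range(T):
        for g in range(G):
            A = np.zeros((1, n))
            A[0, (t * G + g) * S + seg] = 1.0   # selects x[t, g, seg]
            cons.append(optimize.LinearConstraint(A, lb=0.0, ub=limits[g]))
    return cons   # a list, so no constraint is lost in the loops

cons = segment_constraints(seg=1, limits=P2Max)
# A list of LinearConstraint objects can be passed directly to
# scipy.optimize.minimize(fun, x0, method="trust-constr", constraints=cons).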

ValueError: shapes (5,14) and (16,) not aligned: 14 (dim 1) != 16 (dim 0)

I am working on the housing dataset, and when trying to fit the linear regression model I get the error mentioned in the title. The complete code is below.
I am not sure where the code is going wrong. I pasted the code as-is from the reference book.
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing_prepared, housing_labels)
some_data = housing.iloc[:5]
some_labels = housing_labels.iloc[:5]
some_data_prepared = full_pipeline.transform(some_data)
print("Predictions:\t", lin_reg.predict(some_data_prepared))
ERROR: ValueError: shapes (5,14) and (16,) not aligned: 14 (dim 1) != 16 (dim 0)
What am I doing wrong here?
Explanation
Hi, I guess you are reading and following the Hands-On Machine Learning with Scikit-Learn and TensorFlow book. The same problem occurred to me.
In the following part of the code you select the first 5 instances from the data set. One of the attributes in the data set, ocean_proximity, is an object, and for the linear regression model to be able to operate on it, it must be converted to numbers, which in the book is done with one-hot encoding.
One-hot encoding works by analyzing all the categories that can be assigned to the attribute, in this case 5 ('<1H OCEAN', 'INLAND', 'NEAR OCEAN', 'NEAR BAY', 'ISLAND'), and then creating a vector of that length for each instance, zeroing every element except the one corresponding to that instance's category, which is set to 1. For example:
If ocean_proximity equals '<1H OCEAN' the conversion would be [1, 0, 0, 0, 0]
In this piece of code you select the first five instances of the data set, but this does not guarantee that all the categories of ocean_proximity will appear. It could happen that only 3 of them appear, or just 1. Therefore, if you apply one-hot encoding to those five selected rows and only 3 categories appear (for example just 'INLAND', 'ISLAND' and 'NEAR BAY'), the vectors created by the one-hot encoding will be of length 3.
some_data = housing.iloc[:5]
some_labels = housing_labels.iloc[:5]
some_data_prepared = full_pipeline.transform(some_data)
The error is just telling you that, since the one-hot conversion of some_data created vectors shorter than 5, the total number of columns in some_data_prepared is 14, which is less than the number of columns in housing_prepared (16), making the model unable to predict the prices.
If you transform both some_data_prepared and housing_prepared into dataframes and then call .head() you will see the problem.
some_data_prepared.head()
housing_prepared.head()
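A small, self-contained illustration of the mismatch (toy data, not the book's full_pipeline): an encoder fitted on a subset that contains only 3 of the 5 categories produces 3 columns instead of 5.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

full = pd.DataFrame({"ocean_proximity": ["<1H OCEAN", "INLAND", "NEAR OCEAN",
                                         "NEAR BAY", "ISLAND", "INLAND"]})
subset = full.iloc[:3]                               # only 3 distinct categories appear

enc_full = OneHotEncoder().fit(full)                 # learns all 5 categories
enc_subset = OneHotEncoder().fit(subset)             # learns only 3 categories

print(enc_full.transform(subset).toarray().shape)    # (3, 5) -> matches the training width
print(enc_subset.transform(subset).toarray().shape)  # (3, 3) -> too narrow to predict with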
Solution
To solve the problem you must create the missing columns in some_data_prepared by creating a zeroed numpy array of shape [5, x] (5 being the number of rows and x the number of missing columns) and concatenating it to some_data_prepared so that it matches the shape of the housing_prepared data set.
import numpy as np

some_data = housing.iloc[:5]
some_labels = housing_labels.iloc[:5]
some_data_prepared = full_pipeline.fit_transform(some_data)
# Pad with zero columns so some_data_prepared matches the width of housing_prepared.
dummy_array = np.zeros((5,1))
some_data_prepared = np.c_[some_data_prepared, dummy_array]
predictions = lin_reg.predict(some_data_prepared)
print("Predictions: ", predictions)
print("Labels: ", some_labels.values)
Missing category values (ocean proximity in this case) in some_data compared to housing_prepared is the issue.
housing_prepared.shape gives (16512, 16), but some_data_prepared.shape gives (5,14), so add zeros for the missing columns:
dummy_array = np.zeros((5,2))
some_data_prepared = np.c_[some_data_prepared,dummy_array]
The 2 in np.zeros is the difference in the number of columns (16 − 14).
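The padding width can also be computed instead of hard-coded. A small sketch, assuming housing_prepared and some_data_prepared are the dense arrays shown above:
import numpy as np

# Number of columns some_data_prepared is missing relative to the training matrix.
n_missing = housing_prepared.shape[1] - some_data_prepared.shape[1]   # 16 - 14 = 2
pad = np.zeros((some_data_prepared.shape[0], n_missing))
some_data_prepared = np.c_[some_data_prepared, pad]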
I encountered the same issue on this piece of code at first. After exploring the issues of the handson-ml repository, I think I have understood the subtlety that causes the error here.
My guess is that (as in my case) closing the notebook caused what was in memory (the trained model in particular) to be lost. In my case, I could get the result and avoid the error by rerunning the notebook from the beginning.
From a theoretical viewpoint, you should never call fit() or fit_transform() on data which is not training data (e.g. on some_data). Here, running fit_transform(some_data) and then stacking the dummy array onto some_data_prepared works, but it forces the pipeline to be fitted again on some_data rather than on housing_prepared, which is not what you want.

Getting an error while executing perplexity function to evaluate the LDA model

I am trying to evaluate a topic model (LDA). I get an error while executing the perplexity function: Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘perplexity’ for signature ‘"LDA_Gibbs", "numeric"’. Could someone please help solve this?
As you haven't provided any example of your code, it's difficult to know what your exact issue is. However, I found this question when I was facing the same error, so I will describe the problem I faced and its solution here in the hope that it may help someone else.
In the topicmodels package, when the model is fitted with the Gibbs sampler, the perplexity() function requires newdata to be supplied in document-term matrix format. If you give it something else, you get this error. Going by your error message, you were probably passing it something numeric instead of a dtm.
Here is a working example, using the newsgroups data from the lda package converted to the dtm format:
library(topicmodels)
# load the required data from lda package
data("newsgroup.train.documents", "newsgroup.test.documents", "newsgroup.vocab", package="lda")
# create document-term matrix using newsgroups training data
dtm <- ldaformat2dtm(documents = newsgroup.train.documents, vocab = newsgroup.vocab)
# fit LDA model using Gibbs sampler
fit <- LDA(x = dtm, k = 20, method="Gibbs")
# create document-term matrix using newsgroups test data
testdtm <- ldaformat2dtm(documents = newsgroup.test.documents, vocab = newsgroup.vocab)
# calculate perplexity
perplexity(fit, newdata = testdtm)

How to omit empty Model errors in tune.svm()?

I'm trying to use the tune.svm function, and since I don't really know which parameters will produce a good model (the training data will be picked by a user), I need to cover a wide range of values. Currently I get this behavior:
tune(svm, value ~ . , data= data_l, ranges=list(cost = 10^(0:5), epsilon = 10^(-1:0)))
Parameter tuning of ‘svm’:
- sampling method: 10-fold cross validation
- best parameters:
cost epsilon
100 0.1
- best performance: 277.5491
and
tune(svm, value ~ . , data= data_l, ranges=list(cost = 10^(0:5), epsilon = 10^(-1:1)))
Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty!
(The only difference is the highest value of epsilon.)
I know that svm won't work with epsilon = 10, but my intuition for this tuning function was that it could handle parameter values that do not produce a model.
Why wouldn't it simply pick the models that could be generated?
Is there an "easy" way to avoid this error behavior? (I tried tryCatch(tune()) and a lot of other things I found, but I guess I would have to dig deep into the tune/svm/predict code, which doesn't sound "easy" anymore.)
My understanding of the "Model is empty!" error is that it indicates a singular training matrix was fed into the SVM. See Salvy's answer to this related post and Oldrich Kruza's related message in the R mailing list.
