I have a multi class classification problem. I am using random forest classifier. My boss has asked if it is possible to also view our problem with regression. I understand that for a classification task, it is of course better to use a classifier, but is it possible to implement a regression model.
My data is as such:
I have a dataset consisting of software requirements, these are rated as either 1, 2, 3, 4 or 5.
I am then creating a feature matrix to use for training the model to make predictions on the class, with 10 features such as: num_words, num_sentences, num_syllables, weak_words, flesh_idx etc
My model works quite well with 93% accuracy.
Is there a way I can view this problem using regression? Such that the model would make predictions such as 1.5 for example, where the prediction doesn't fall into the class 1 or 2 but somewhere in the middle? Or maybe 2.2, 3.3 etc as opposed to 1, 2, 3, 4, or 5?
I guess the reason is just to see if we can see the software requirement scores in a continuous way.
try Softmax regression (or multinomial logistic regression) with mxnet or with tensorflow
The way you can use regression in classification problems is with Logistic Regressions. You can use this individually to classify 1 vs not 1, 2 vs not 2, and so on for each classification (don't do this), or use Softmax that in simple words, weights each class and returns a probability for each given class, then you just pick the one with the max probability and that will be your predicted class. There are a lot of neural networks that use softmax when working with mutli-class classification.
Here is a great article from scikit-learn's documentation:
https://scikit-learn.org/stable/modules/neural_networks_supervised.html
Related
I'm using Windows 10 machine.
Libraries: Keras with Tensorflow 2.0
Embeddings:Glove(100 dimensions)
I am trying to implement an LSTM architecture for multi-label text classification.
My problem is that no matter how much fine-tuning I do, the results are really bad.
I am not experienced in DL practical implementations that's why I ask for your advice.
Below I will state basic information about my dataset and my model so far.
I can't embed images since I am a new member so they appear as links.
Dataset form+Embedings form+train-test-split form
Dataset's labels distribution
My Implementation of LSTM
Model's Summary
Model's Accuracy plot
Model's Loss plot
As you can see my dataset is really small (~6.000 examples) and maybe that's one reason why I cannot achieve better results. Still, I chose it because it's unbiased.
I'd like to know if there is any fundamental mistake in my code regarding the dimensions, shape, activation functions, and loss functions for multi-label text classification?
What would you recommend to achieve better results on my model? Also any general advice regarding optimizing, methods,# of nodes, layers, dropouts, etc is very welcome.
Model's best val accuracy that I achieved so far is ~0.54 and even if I tried to raise it, it seems stuck there.
There are many ways to get this wrong but the most common mistake is to get your model overfit the training data.
I suspect that 0.54 accuracy means that your model selects the most common label (offensive) for almost all cases.
So, consider one of these simple solutions:
Create balanced training data: like 400 samples from each class.
or sample balanced batches for training (exactly the same number of labels on each training batch)
In addition to tracking accuracy and loss, look at precision-recall-f1 or even better try plotting area under curve, maybe different classes need different thresholds of activation. (If you are using Sigmoid on last layer maybe one class could perform better with 0.2 activations and another class with 0.7)
first try simple model. embedding 1 layer LSTM than classify
how to tokenize text , is vocab size enough ?
try dice loss
I have a set of sentences and their scores, I would like to train a marking system that could predict the score for a given sentence, such one example is like this:
(X =Tomorrow is a good day, Y = 0.9)
I would like to use LSTM to build such a marking system, and also consider the sequential relationship between each word in the sentence, so the training example shown above is transformed as following:
(x1=Tomorrow, y1=is) (x2=is, y2=a) (x3=a, y3=good) (x4=day, y4=0.9)
When training this LSTM, I would like the first three time steps using a softmax classifier, and the final step using a MSE. It is obvious that the loss function used in this LSTM is composed of two different loss functions. In this case, it seems the Keras does not provide the way to address my problem directly. In addition, I am not sure whether my method to build the marking system is correct or not.
Keras support multiple loss functions as well:
model = Model(inputs=inputs,
outputs=[lang_model, sent_model])
model.compile(optimizer='sgd',
loss=['categorical_crossentropy', 'mse'],
metrics=['accuracy'], loss_weights=[1., 1.])
Based on your explanation, I think you need a model that first, predict a token based on previous tokens, in NLP domain it usually called Language model, and then compute a score which I assume it is a sentiment (it is applicable to other domain).
To do so, you can train your language model with LSTM and pick the last output of LSTM for your ranking task. To this end, you need to define two loss function: categorical_crossentropy for the language model and MSE for the ranking task.
This tutorial would be helpful: https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
I was trying to find the best features that dominate for the output of my regression model, Following is my code.
seed = 7
np.random.seed(seed)
estimators = []
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=3,
batch_size=20)))
pipeline = Pipeline(estimators)
rfe = RFE(estimator= pipeline, n_features_to_select=5)
fit = rfe.fit(X_set, Y_set)
But I get the following runtime error when running.
RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes
How to overcome this issue and select best features for my model? If not, Can I use algorithms like LogisticRegression() provided and supported by RFE in Scikit to achieve the task of finding best features for my dataset?
I assume your Keras model is some kind of a neural network. And with NN in general it is kind of hard to see which input features are relevant and which are not. The reason for this is that each input feature has multiple coefficients that are linked to it - each corresponding to one node of the first hidden layer. Adding additional hidden layers makes it even more complicated to determine how big of an impact the input feature has on the final prediction.
On the other hand, for linear models it is very straightforward since each feature x_i has a corresponding weight/coefficient w_i and its magnitude directly determines how big of an impact it has in prediction (assuming that features are scaled of course).
The RFE estimator (Recursive feature elimination) assumes that your prediction model has an attribute coef_ (linear models) or feature_importances_(tree models) that has the length of input features and that it represents their relevance (in absolute terms).
My suggestion:
Feature selection: (Option a) Run the RFE on any linear / tree model to reduce the number of features to some desired number n_features_to_select. (Option b) Use regularized linear models like lasso / elastic net that enforce sparsity. The problem here is that you cannot directly set the actual number of selected features. (Option c) Use any other feature selection technique from here.
Neural Network: Use only features from (1) for your neural network.
Suggestion:
Perform the RFE algorithm on a sklearn-based algorithm to observe feature importance. Finally, you use the most importantly observed features to train your algorithm based on Keras.
To your question: Standardization is not required for logistic regression
I am doing transfer-learning/retraining using Tensorflow Inception V3 model. I have 6 labels. A given image can be one single type only, i.e, no multiple class detection is needed. I have three queries:
Which activation function is best for my case? Presently retrain.py file provided by tensorflow uses softmax? What are other methods available? (like sigmoid etc)
Which Optimiser function I should use? (GradientDescent, Adam.. etc)
I want to identify out-of-scope images, i.e. if users inputs a random image, my algorithm should say that it does not belong to the described classes. Presently with 6 classes, it gives one class as a sure output but I do not want that. What are possible solutions for this?
Also, what are the other parameters that we may tweak in tensorflow. My baseline accuracy is 94% and I am looking for something close to 99%.
Since you're doing single label classification, softmax is the best loss function for this, as it maps your final layer logit values to a probability distribution. Sigmoid is used when it's multilabel classification.
It's always better to use a momentum based optimizer compared to vanilla gradient descent. There's a bunch of such modified optimizers like Adam or RMSProp. Experiment with them to see what works best. Adam is probably going to give you the best performance.
You can add an extra label no_class, so your task will now be a 6+1 label classification. You can feed in some random images with no_class as the label. However the distribution of your random images must match the test image distribution, else it won't generalise.
I have 3 predictive models and I am evaluating there performance with a confusion matrix.
I am getting the same results for the confusion matrix for each of the 3 models.
I expect that the different models would perform differently and produce different confusion matrices. I am new to predictive modelling, so I suspect I am making a "Rooky mistake" . The full script I am using is sitting in a Jupyter notebook on GiThub here
A screenshot of the code for the 3 models is below
Can some one point out what is going wrong?
Cheers
Mike
As mentioned: make predictions on the test data. But keep in mind that your targets are skewed! So use StratifiedKFolds or something like this.
Also I guess that your data is a bit corrupted. While all models show the same result it may be a big mistake underneath.
Few questions/advises:
1. Did you scale your data?
2. Did you use one-hot-encoding?
2. Use don't Decision Trees but Forests/XGBoost. Easy to overfit with DT.
3. Don't use >2 hidden layers in NN because it's easy to overfit too. Use 2 firstly. And your architecture (30, 30, 30) with 2 target classes seems weird.
4. And if you wish to use >2 hidden layers - go to Keras or TF. You'll find there many features that can help you to not overfit.
That is simply because you are using the same Training data to make predictions. Since your models are already trained on the same data that you are making the predictions on, they will return the same results (and ultimately the same confusion matrix). You need to split your dataset into training and test sets. Then train your classifier on training set and make predictions on test set.
You can use train_test_split in Sklearn to split your dataset into training or test set.