How to preprocess data for training a character level RNN - keras

I am trying to train a RNN model which classifies origin of names. The data looks like attached image. I understand I have to first map the labels as an integer. I am using the following code to do that:
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
origin_label = encoder.fit_transform(origins)
I am having troubles figuring out what the next steps would be. I am using Keras to build this model. Thank you very much for your help.
Data Format

Related

Does K-Fold iteratively train a model

If you run cross-val_score() or cross_validate() on a dataset, is the estimator trained using all the folds at the end of the run?
I read somewhere that cross-val_score takes a copy of the estimator. Whereas I thought this was how you train a model using k-fold.
Or, at the end of the cross_validate() or cross_val_score() you have a single estimator and then use that for predict()
Is my thinking correct?
You can refer to sklearn-document here.
If you do 3-Fold cross validation,
the sklearn will split your dataset to 3 parts. (For example, the 1st part contains 1st-3rd rows, 2nd part contains 4th-6th rows, and so on)
sklearn iterate to train new model 3 times with different training set and validation set
In the first round, it combine 1st and 2nd part together and use it as training set and test the model with 3rd part.
In the second round, it combine 1st and 3rd part together and use it as training set and test the model with 2nd part.
and so on.
So, after using cross-validate, you will get three models. If you want the model objects of each round, you can add parameter return_estimato=True. The result which is the dictionary will have another key named estimator containing the list of estimator of each training.
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer
from sklearn.metrics import confusion_matrix
from sklearn.svm import LinearSVC
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
cv_results = cross_validate(lasso, X, y, cv=3, return_estimator=True)
print(sorted(cv_results.keys()))
#Output: ['estimator', 'fit_time', 'score_time', 'test_score']
cv_results['estimator']
#Output: [Lasso(), Lasso(), Lasso()]
However, in practice, the cross validation method is used only for testing the model. After you found the good model and parameter setting that give you the high cross-validation score. It will be better if you fit the model with the whole training set again and test the model with the testing set.

How am I able to visualize incorrect predictions from a binary image classifier and printing out the classification report

I have been trying to follow Francois example of a binary image classifier of cats and dogs. I have attempted to follow his example in another similar set in kaggle (https://www.kaggle.com/playlist/men-women-classification) and I want to achieve the following
Visualise the predictions that are wrong
Come out with the classification report
I already have a model with around 85% accuracy on the validation set but I want to know roughly what kind of images my model is getting wrong as well as coming up with a classification report with sklearn.metric's classification report.
However I do not know how does the image generator works and have a big problem trying to know how to pair the predictions with the labels of the test images.
from sklearn.metrics import classification_report
new_test_datagen = test_datagen.flow_from_directory(
directory = test_dir,
target_size=(150,150),
batch_size=1,
class_mode='binary',
seed = 42,
)
train_image = new_train_generator.next()
plt.imshow(train_image[0].reshape(150,150,-1))
print(train_image[1])
#I want to output images but I am not sure if this is the most efficient way of doing it
predictions = model.predict(test_generator)
predictions.shape
#The predictions is a numpy array of length 476 but I do not know what are the 'correct' labels found in my test set to validate it against this output.
model.evaluate(test_generator)
# [0.3109202980995178, 0.8886554837226868]

Pass text features along with image data to a CNN model

I am trying to approach a multi label image classification problem,for which i have image data but i also have some other features like gender etc, but the issue is that i will get this information during testing, in other words during testing only the image information will be provided.
My question is how can i use these extra features to help my image model which is a convolution Neural Network even though i wont have this info during testing?
Any advice will be helpful.Thanks in advance.
This is a really open ended question. I can give you some general guidelines on how this can work.
keras model API supports multiple inputs as well as merge layers. For example you can have something like this:
from keras.layers import Input
from keras.models import Model
image = Input(...)
text = Input(...)
... # apply layers onto image and text
from keras.layers.merge import Concatenate
combined = Concatenate()([image, text])
... # apply layers onto combined
model = Model([image, text], [combined])
This way you can have a model that takes multiple inputs that can make use of all of your data sources. keras has tools to combine your different inputs to produce one output. The part where this becomes open ended is the architecture.
Right now you should probably pass image through a CNN, and then merge the output with text. You have to tweak the exact specifications, such as how you handle each input, your merge method, and how you handle the combined output.
A good example of merge being used is here, where a GAN is given latent noise in the form of an image but also a label to determine what kind of image it should generate. Both the discriminator and the generator make use of the multiply merge layer to combine their inputs.

Machine Learning liner Regression - Sklearn

I'm new to the Machine learning domain and in Learn Regression i have some doubt
1:While practicing the sklearn learn regression model prediction method getting the below error.
Code:
sklearn.linear_model.LinearRegression.predict(25)
Error:
"ValueError: Expected 2D array, got scalar array instead: array=25. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
Do i need to pass a 2-D array? Checked on sklearn documentation page any haven't found any thing for version update.
**Running my code on Kaggle
https://www.kaggle.com/aman9d/bikesharingdemand-upx/
2: Is index of dataset going to effect model's score (weights)?
First of all you should put your code as you use:
# import, instantiate, fit
from sklearn.linear_model import LinearRegression
linreg = LinearRegression()
linreg.fit(X, y)
# use the predict method
linreg.predict(25)
Because what you post in the question is not properly executable, predict method is not static for the class LinearRegression.
When you fit a model, the first step is recognize which kind of data will be the input, in your case will be similar to X, that means that if you pass something with different shape of X to the model it will raise an error.
In your example X seems to be a pd.DataFrame() instance with only 1 column, this should be replaceable with an array of 2 dimension representing the number of examples by the number of features, so if you try:
linreg.predict([[25]])
should work.
For example if you were trying a regression with more than 1 feature aka column, let's say temp and humidity, your input would look like this:
linreg.predict([[25, 56]])
I hope this will help you and always keep in mind which is the shape of your data.
Documentation: LinearRegression fit
X : array-like or sparse matrix, shape (n_samples, n_features)

How to use class weights for hot encoded outputs with Keras

I am new to AI and I am using Keras and Tensorflow to train CNNs. My dataset is heavily unbalanced and I want to use class weights to counter that.
After a small search in the internet I found out that I can use scikit learn's class weight() and sample weight() to get the class weights and sample weights respectively and it can be passed to model.fit() in Keras. But I am unsure how to implement it programmatically for hot encoded outputs.
Can someone provide sample code explaining how to implement classweights for hot encoded outputs with Keras?
Thanks in advance 😁

Resources