How to make RandomForestClassifier faster? - python-3.x

I am trying to implement the bag-of-words model from the Kaggle tutorial on a Twitter sentiment dataset with around 1M rows. I have already cleaned the data, but in the last step, when I fit my feature vectors and sentiment labels with a Random Forest classifier, it takes a very long time. Here is my code:
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators = 100,verbose=3)
forest = forest.fit( train_data_features, train["Sentiment"] )
train_data_features is a 1048575x5000 sparse matrix. I tried converting it to a dense array, but that raises a memory error.
What am I doing wrong? Can someone suggest a resource or another way to make this faster? I am an absolute novice in machine learning and don't have much programming background, so any guidance is appreciated.
Thanks in advance.

Actually the solution is pretty straightforward: get a strong machine and run it in parallel. By default RandomForestClassifier uses a single thread, but since it is an ensemble of completely independent models, you can train each of these 100 trees in parallel. Just set
forest = RandomForestClassifier(n_estimators = 100,verbose=3,n_jobs=-1)
to use all of your cores. You can also limit max_depth, which will speed things up (in the end you will probably need this anyway, since RF can overfit badly without any limit on depth).
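For example, a minimal sketch combining both suggestions (the max_depth value here is just an illustrative starting point, not a tuned setting):
from sklearn.ensemble import RandomForestClassifier

# All cores in parallel, with a depth cap to keep individual trees cheap.
forest = RandomForestClassifier(n_estimators=100, max_depth=20,
                                n_jobs=-1, verbose=3)
forest = forest.fit(train_data_features, train["Sentiment"])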

Related

How to select the best fit machine learning algorithm

When I run machine learning algorithms repeatedly, the accuracy keeps changing. In that case, how do I select the best-fit algorithm for that particular data set?
You should definitely provide more details. It's impossible to suggest anything without knowing the domain, the model architecture, and the hyperparameters.
I guess you are concerned about the changing accuracy of the model. You should set seeds for the randomized components so that the accuracy doesn't change between training runs and you can reproduce your results:
import random
import numpy
numpy.random.seed(1)
random.seed(1)
tf.random.set_random_seed(1)  # if using TensorFlow 1.x (import tensorflow as tf)
Let's assume the question is about the same training data set X, and that every time we run the model we compute accuracy by comparing the predicted responses against the dependent values (Y) of our test data.
If the accuracy keeps changing every time we run the model, the issue is likely sampling bias: the way the data is divided into training and test sets is left to chance.
When you use the train_test_split function, set the random_state parameter wisely to keep the test data representative of the overall population. For example:
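A minimal sketch, assuming your features are in X and your labels in y (these names are placeholders; drop stratify for a regression target):
from sklearn.model_selection import train_test_split

# Fixing random_state makes the split reproducible across runs, and
# stratify keeps the class proportions of y in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)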

Which classifier is good for a scheduling problem with sklearn?

With sklearn, I am trying to model a pickup-and-dropoff vehicle routing problem. If you can recommend a classifier, it would be appreciated. For simplicity, there is one vehicle and there are 5 customers. The training data has 20 features and 10 outputs.
The features are the x-y coordinates of the 5 customers; each customer has a pickup and a dropoff location.
c1p_x, c1p_y,c2p_x, c2p_y,c3p_x, c3p_y,c4p_x, c4p_y,c5p_x, c5p_y,
c1d_x, c1d_y,c2d_x, c2d_y,c3d_x, c3d_y,c4d_x, c4d_y,c5d_x, c5d_y,
c1p_x, c1p_y: customer 1 pickup x-y coordinates.
c1d_x, c1d_y: customer 1 dropoff x-y coordinates.
For example,
123,106,332,418,106,477,178,363,381,349,54,214,297,34,5,122,3,441,455,322
The outputs are the optimal sequence of visits. For example, 5,10,2,7,1,6,4,9,3,8 means:
Customer 5 (pkup) => 10 (drop) => 2 (pkup) => 7 (drop) ... => 8 (drop)
Note each pickup will be immediately followed by dropoff.
Here is the code I tried:
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier

train = pd.read_csv('ML_DARP_train.txt', header=None, sep=',')
print(train.head())
x = train[list(range(0, 20))]    # 20 coordinate features (columns 0-19)
y = train[list(range(20, 30))]   # 10 outputs: the visit sequence (columns 20-29)
classifier = MLPClassifier(solver='lbfgs', alpha=1e-5,
                           hidden_layer_sizes=(15,), random_state=1)
classifier.fit(x, y)
print(classifier.score(x, y))
test = pd.read_csv('ML_DARP_test.txt', header=None, sep=',')
test = test[list(range(0, 20))]
print(classifier.predict(test))
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

train = pd.read_csv('ML_DARP_train.txt', header=None, sep=',')
print(train.head())
x = train[list(range(0, 20))]    # 20 coordinate features
y = train[list(range(20, 30))]   # 10 outputs: the visit sequence
print(y)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
classifier = MultiOutputClassifier(forest, n_jobs=-1)
classifier.fit(x, y)
print(classifier.score(x, y))
test = pd.read_csv('ML_DARP_test.txt', header=None, sep=',')
test = test[list(range(0, 20))]
print(classifier.predict(test))
Here is the training data:
123,106,332,418,106,477,178,363,381,349,54,214,297,34,5,122,3,441,455,322,5,10,2,7,1,6,4,9,3,8
154,129,466,95,135,191,243,13,289,227,300,40,171,286,219,403,232,113,378,428,5,10,2,7,1,6,4,9,3,8
215,182,163,321,259,500,434,304,355,276,77,414,93,83,42,292,101,459,488,237,5,10,4,9,3,8,2,7,1,6
277,220,313,29,304,229,500,454,263,154,339,255,484,351,287,87,330,147,411,343,1,6,3,8,2,7,4,9,5,10
308,258,464,223,349,460,64,120,188,62,100,96,374,118,16,368,73,352,365,480,2,7,1,6,5,10,3,8,4,9
369,296,97,385,363,174,161,317,128,472,346,423,217,338,246,163,349,87,335,132,2,7,4,9,1,6,5,10,3,8
400,318,263,94,471,467,321,45,146,475,107,264,139,136,53,36,155,370,382,380,3,8,2,7,4,9,5,10,1,6
477,387,461,350,62,244,417,242,102,399,401,137,76,451,330,364,431,90,368,47,3,8,1,6,4,9,2,7,5,10
38,441,95,12,45,412,452,361,496,276,162,479,420,155,12,112,128,263,290,138,4,9,1,6,3,8,2,7,5,10
69,447,245,205,106,157,79,89,467,216,393,289,311,422,273,440,435,30,291,323,2,7,4,9,3,8,1,6,5,10
115,0,427,430,214,451,207,302,439,172,185,178,232,220,64,282,210,266,292,22,2,7,5,10,1,6,3,8,4,9
192,53,92,123,259,180,273,468,363,81,447,19,122,488,310,77,454,471,246,159,3,8,1,6,5,10,2,7,4,9
223,91,227,317,304,411,385,180,319,5,208,361,498,239,54,389,245,222,231,328,5,10,2,7,4,9,1,6,3,8
269,113,424,57,396,188,12,378,322,493,470,218,435,52,331,231,20,474,263,59,4,9,5,10,1,6,2,7,3,8
315,151,42,204,410,387,78,43,215,355,215,28,278,273,44,11,264,178,170,149,5,10,2,7,3,8,4,9,1,6
393,236,239,444,487,148,191,240,202,326,23,417,200,71,321,338,39,414,203,365,4,9,2,7,3,8,1,6,5,10
454,274,390,153,62,410,303,453,173,266,286,259,106,354,96,165,331,165,203,48,4,9,2,7,3,8,5,10,1,6
15,327,86,378,154,187,447,181,160,237,62,131,27,152,389,23,137,448,220,264,3,8,4,9,2,7,1,6,5,10
61,365,237,71,184,417,43,379,131,178,324,474,403,388,133,334,413,167,205,417,5,10,3,8,1,6,4,9,2,7
123,418,403,280,261,178,124,59,56,86,101,331,309,170,394,145,172,404,175,70,3,8,2,7,5,10,4,9,1,6
169,441,36,458,275,378,190,194,466,465,332,141,167,406,108,426,385,76,97,176,3,8,2,7,1,6,5,10,4,9
215,494,249,213,398,186,365,470,500,483,124,29,120,236,431,315,238,391,161,439,3,8,5,10,4,9,1,6,2,7
246,0,337,345,396,370,399,88,377,329,355,324,449,441,113,48,420,32,68,28,2,7,3,8,1,6,5,10,4,9
339,100,49,84,489,131,496,286,317,253,163,213,370,238,390,375,195,268,37,181,5,10,2,7,3,8,4,9,1,6
385,122,215,294,80,424,139,29,320,240,425,70,292,36,181,233,17,50,70,413,2,7,4,9,1,6,5,10,3,8
416,144,366,2,125,154,236,211,291,180,170,396,182,304,427,44,277,286,86,112,5,10,3,8,2,7,4,9,1,6
477,198,15,180,170,384,348,424,231,105,448,253,41,39,171,356,68,22,56,265,1,6,4,9,2,7,5,10,3,8
38,251,181,390,231,130,414,58,171,13,209,95,448,322,417,151,281,211,10,387,2,7,5,10,1,6,3,8,4,9
69,258,347,98,276,360,495,239,111,439,455,437,354,89,161,447,40,447,480,55,3,8,1,6,4,9,5,10,2,7
131,327,28,323,384,153,169,500,130,426,248,309,275,388,469,321,379,229,27,286,1,6,5,10,2,7,4,9,3,8
161,334,116,454,351,305,172,102,492,272,463,88,87,76,120,37,60,387,419,361,1,6,5,10,3,8,4,9,2,7
238,403,313,194,428,82,300,331,479,227,255,462,9,375,412,396,367,154,420,45,2,7,1,6,4,9,5,10,3,8
285,456,10,419,35,375,460,74,498,230,47,350,447,189,219,254,189,437,484,308,2,7,5,10,4,9,1,6,3,8
346,494,144,112,95,120,71,287,469,170,294,176,322,441,480,81,480,188,469,476,3,8,2,7,4,9,5,10,1,6
393,47,310,322,156,367,168,453,378,63,71,33,227,223,240,393,208,377,423,113,5,10,1,6,3,8,4,9,2,7
470,100,22,77,264,159,312,197,380,34,364,406,180,52,47,266,30,159,440,329,4,9,2,7,5,10,1,6,3,8
15,138,157,239,294,358,362,332,289,429,109,248,23,273,246,14,243,348,393,466,5,10,4,9,3,8,2,7,1,6
61,176,323,449,355,119,490,59,261,384,371,89,430,39,22,357,49,99,394,134,2,7,4,9,1,6,3,8,5,10
92,198,427,110,353,303,39,210,201,293,117,400,274,260,220,121,278,304,348,272,1,6,4,9,5,10,3,8,2,7
185,267,154,367,476,111,183,439,172,249,410,288,226,89,27,496,84,70,365,488,5,10,4,9,1,6,3,8,2,7
231,321,305,59,36,357,264,119,112,157,187,145,117,357,288,291,345,291,319,124,4,9,1,6,5,10,2,7,3,8
262,327,440,238,66,71,345,270,36,66,417,456,477,92,17,86,72,480,273,262,1,6,2,7,4,9,3,8,5,10
323,381,89,431,96,287,411,436,477,476,179,298,367,359,231,367,317,200,242,415,1,6,3,8,2,7,5,10,4,9
369,418,286,171,219,94,85,195,10,494,456,170,304,173,54,256,154,499,321,192,1,6,2,7,3,8,4,9,5,10
416,456,437,365,249,325,166,377,452,403,217,12,179,425,299,51,415,218,275,330,3,8,2,7,4,9,5,10,1,6
477,9,86,42,294,55,248,58,360,296,479,354,53,176,28,347,158,423,214,452,3,8,2,7,4,9,1,6,5,10
7,31,237,251,355,301,360,255,332,236,240,179,460,459,289,158,434,158,214,151,2,7,3,8,4,9,5,10,1,6
84,84,418,476,447,78,473,468,319,207,17,52,381,241,65,0,225,426,231,335,5,10,4,9,2,7,3,8,1,6
146,154,83,169,477,277,22,102,180,37,295,410,256,493,279,265,438,83,107,395,4,9,5,10,2,7,1,6,3,8
177,176,234,363,21,7,134,299,183,24,40,236,131,244,24,76,213,334,155,125,4,9,2,7,1,6,3,8,5,10
238,214,416,87,144,332,309,74,201,27,318,108,68,58,347,466,66,148,202,372,1,6,2,7,4,9,5,10,3,8
316,298,128,344,236,108,438,287,173,468,126,498,5,372,154,340,357,400,188,40,4,9,1,6,5,10,3,8,2,7
346,305,215,475,219,276,441,391,50,330,341,292,334,76,337,57,38,41,126,162,3,8,1,6,4,9,5,10,2,7
408,358,397,199,296,37,68,103,37,285,118,133,239,359,97,384,330,309,127,346,3,8,5,10,1,6,4,9,2,7
439,381,47,377,357,284,180,316,494,225,380,491,114,110,358,211,120,44,112,30,1,6,3,8,5,10,2,7,4,9
15,450,228,117,449,60,309,28,465,165,156,348,51,425,150,69,412,311,97,199,3,8,4,9,5,10,2,7,1,6
77,2,363,295,463,260,358,163,358,43,434,205,411,160,348,318,124,469,35,305,5,10,1,6,3,8,2,7,4,9
123,40,59,19,23,21,487,392,345,500,211,62,317,428,124,160,416,236,36,4,5,10,3,8,1,6,2,7,4,9
169,62,194,197,83,267,83,72,317,455,441,389,207,194,385,472,175,472,37,189,2,7,1,6,5,10,4,9,3,8
231,131,376,438,160,28,164,254,241,348,234,261,129,493,145,298,435,176,492,326,3,8,5,10,2,7,4,9,1,6
277,154,25,115,221,274,276,452,213,304,480,87,3,244,406,109,210,428,493,10,3,8,2,7,4,9,5,10,1,6
323,191,176,309,266,489,358,132,153,213,241,429,394,11,135,405,470,147,463,179,5,10,1,6,3,8,2,7,4,9
354,214,311,487,296,203,423,283,46,90,488,255,253,247,349,185,198,336,401,300,3,8,1,6,4,9,5,10,2,7
447,298,7,211,372,481,66,11,64,93,296,143,175,45,141,43,4,103,449,31,3,8,4,9,2,7,5,10,1,6
493,336,157,420,449,242,178,224,35,33,41,470,65,312,417,370,296,370,434,200,5,10,3,8,1,6,2,7,4,9
7,343,323,129,40,19,291,421,477,458,287,311,488,110,193,197,71,90,404,369,2,7,5,10,4,9,3,8,1,6
69,396,474,307,54,218,372,102,432,382,64,168,346,346,407,477,331,326,389,21,1,6,3,8,5,10,4,9,2,7
146,465,155,31,115,465,469,284,357,291,342,25,252,113,167,304,90,30,343,158,3,8,1,6,5,10,4,9,2,7
192,2,305,240,191,226,96,11,375,278,103,368,158,396,444,146,397,313,375,390,3,8,1,6,4,9,5,10,2,7
238,24,471,450,268,488,209,209,315,202,365,209,64,178,220,474,156,33,360,58,3,8,2,7,4,9,1,6,5,10
285,78,105,111,282,202,274,359,224,80,126,50,424,415,434,238,385,222,298,179,5,10,3,8,1,6,2,7,4,9
346,116,286,336,358,464,387,71,195,35,388,393,330,197,210,80,176,474,299,364,1,6,3,8,5,10,2,7,4,9
424,185,484,92,466,256,46,316,229,38,196,281,283,26,17,454,499,272,347,110,3,8,4,9,2,7,5,10,1,6
470,223,133,270,11,471,111,482,138,432,442,122,157,278,231,218,242,476,301,248,3,8,2,7,4,9,5,10,1,6
15,261,284,464,56,201,208,163,94,357,203,465,32,29,477,29,1,196,271,401,4,9,1,6,5,10,2,7,3,8
46,267,418,141,70,400,258,313,2,250,434,275,392,265,190,310,230,401,209,6,4,9,3,8,5,10,1,6,2,7
107,336,130,381,193,224,417,72,5,237,242,178,345,79,12,183,68,183,257,253,2,7,4,9,1,6,3,8,5,10
154,358,234,43,176,392,452,176,415,114,473,474,188,284,195,417,250,341,179,359,5,10,3,8,1,6,4,9,2,7
200,412,400,252,252,153,79,405,370,54,250,331,94,66,472,259,56,92,165,27,1,6,4,9,5,10,3,8,2,7
277,465,81,462,298,383,129,38,295,448,26,188,485,334,201,54,269,281,134,196,3,8,4,9,1,6,2,7,5,10
339,18,262,186,405,176,304,314,313,451,304,60,406,131,8,429,122,94,182,428,4,9,1,6,2,7,3,8,5,10
370,56,429,411,498,453,432,26,269,375,65,403,328,430,300,271,398,331,152,80,1,6,5,10,4,9,2,7,3,8
431,94,78,88,11,152,466,161,162,252,327,244,187,166,499,19,110,3,74,186,5,10,3,8,1,6,4,9,2,7
462,116,228,282,71,414,94,390,180,239,72,70,93,449,274,362,417,286,122,433,2,7,4,9,1,6,5,10,3,8
38,185,410,6,164,190,237,102,136,179,366,459,500,231,66,220,208,22,107,101,5,10,4,9,2,7,3,8,1,6
100,238,75,215,225,421,319,284,92,104,142,300,390,499,311,15,468,258,77,238,3,8,5,10,1,6,2,7,4,9
131,245,179,362,192,73,337,387,454,451,358,95,233,203,478,249,149,400,485,329,1,6,3,8,5,10,4,9,2,7
177,298,392,118,331,397,27,178,3,469,150,484,186,32,317,154,18,229,47,91,3,8,2,7,1,6,5,10,4,9
254,352,57,311,392,143,108,344,460,409,428,341,76,299,61,450,262,450,48,275,3,8,4,9,1,6,5,10,2,7
285,390,191,489,406,358,174,9,369,302,173,167,436,35,291,229,6,138,2,413,5,10,2,7,3,8,1,6,4,9
362,443,373,229,482,119,318,253,387,289,451,23,357,334,67,71,329,437,34,143,4,9,1,6,3,8,2,7,5,10
408,481,54,439,105,428,462,466,358,245,227,397,279,131,375,446,119,188,35,328,5,10,3,8,1,6,4,9,2,7
470,34,204,131,134,142,42,147,283,138,489,238,154,383,104,241,364,393,475,450,5,10,1,6,2,7,3,8,4,9
15,71,355,325,164,357,92,282,176,15,250,64,28,134,318,5,76,65,413,71,3,8,5,10,2,7,4,9,1,6
61,94,4,18,225,102,204,495,178,488,12,406,435,417,78,317,368,333,429,286,1,6,5,10,3,8,2,7,4,9
123,163,202,259,348,411,364,254,181,475,289,279,357,215,402,206,205,115,462,487,2,7,4,9,1,6,5,10,3,8
185,201,336,437,362,125,429,405,90,352,50,120,231,452,115,487,434,320,400,107,1,6,5,10,3,8,2,7,4,9
231,238,17,145,407,356,41,117,77,323,312,463,121,218,360,298,225,70,432,323,1,6,4,9,2,7,5,10,3,8
262,276,167,355,484,101,138,298,1,216,73,320,27,0,120,108,485,291,370,461,1,6,4,9,3,8,2,7,5,10
292,283,302,32,44,347,203,449,442,141,304,130,403,252,366,420,213,480,356,129,4,9,3,8,1,6,2,7,5,10
Sklearn is built for generic algorithms; TSP/VRP are too specific for it. Are you open to trying more specific libraries than sklearn?
Recent advances in Reinforcement Learning seem to address TSP and VRP problems in a way that challenges the traditional combinatorial optimization approach.
To start with, you can look at this tutorial.
A recent paper shows a method for VRP. They also shared their code on Github.
A more recent paper claims to have a shorter training period.
Generally speaking, the architecture proposed in these papers looks at the VRP job as a whole and improves on a greedy approach in two ways: the training phase goes back and forth to include future rewards, and the solution architecture includes (at least) two neural networks, an encoder and a decoder, where the encoder goes through the entire input before the decoder starts producing the output.
To summarize: if you want a quick and robust solution, you can use existing open libraries such as Jsprit. If you have time for research, the resources for training a NN, and can accept the risk of failing, pursue Reinforcement Learning.
Based on your comments, using ML purely to generate a starting point for a traditional MIP/constraint/heuristic solver is a better idea than using ML to solve the whole thing, but I believe it to still be a bad idea. In my opinion, you will find it very hard to get a useful initial solution using ML. In a few lines of code you could probably put together a heuristic to greedily grow routes for a search starting point; getting ML to do something of even roughly equivalent quality would be a lot more work, and maybe not even possible.
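For illustration, a sketch of such a greedy heuristic for the data layout in the question (one vehicle, five customers, each pickup immediately followed by its dropoff); starting from a depot at (0, 0) is my assumption, since the data doesn't specify a start location:
import math

def greedy_route(row, start=(0, 0)):
    # row holds 20 values: pickups (c1p_x, c1p_y, ..., c5p_x, c5p_y)
    # followed by dropoffs (c1d_x, c1d_y, ..., c5d_x, c5d_y).
    pickups = [(row[2 * i], row[2 * i + 1]) for i in range(5)]
    dropoffs = [(row[10 + 2 * i], row[10 + 2 * i + 1]) for i in range(5)]
    pos, remaining, route = start, set(range(5)), []
    while remaining:
        # Go to the customer whose pickup is closest to the current position,
        # then straight to that customer's dropoff.
        c = min(remaining, key=lambda i: math.dist(pos, pickups[i]))
        route += [c + 1, c + 6]  # 1-5 = pickups, 6-10 = dropoffs
        pos = dropoffs[c]
        remaining.remove(c)
    return route

# First training row's 20 feature values:
print(greedy_route([123, 106, 332, 418, 106, 477, 178, 363, 381, 349,
                    54, 214, 297, 34, 5, 122, 3, 441, 455, 322]))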
If you really wanted to try this (and I emphasize again that it's a bad idea), the choice of features is likely much more important than the choice of classifier. At the moment you're asking the classifier to learn both (a) Pythagoras and then (b) what makes a good route. It has to learn Pythagoras because you're passing in the coordinates directly. ML works best when the features are engineered to make the learning task easier. Passing in a normalised distance matrix instead of the raw coordinates might be more successful, because then the classifier doesn't have to learn Pythagoras. However, you then have n^2 scaling in the number of features, which would likely cause overfitting and the problems associated with that...
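A minimal sketch of that kind of feature engineering, reusing the x DataFrame from the question's code (normalising by the maximum pairwise distance is my choice; any consistent scaling would do):
import numpy as np

def distance_features(row):
    # Reshape the 20 raw values into 10 stops (5 pickups then 5 dropoffs)
    # and use the normalised pairwise distance matrix as the feature vector.
    stops = np.asarray(row, dtype=float).reshape(10, 2)
    diffs = stops[:, None, :] - stops[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    dist /= dist.max() if dist.max() > 0 else 1.0
    return dist.ravel()  # 10 x 10 = 100 features per sample

X_dist = np.array([distance_features(r) for r in x.values])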
Alternatively, you could grow the route from empty, using ML to decide the next stop to add at each step: the classifier chooses the first stop, then you classify again to choose the second stop, then again to choose the third, and so on. This would be simpler too, though the ML will primarily just learn 'what is the closest stop to the last one'. I have known some companies to use this kind of 'ML chooses the next stop or job' approach when they're scheduling/dispatching jobs one at a time for food takeaway/on-demand deliveries - i.e. problems similar to Uber Eats delivering hot food from restaurants. That is a bit different from your case, as it's a dynamic/realtime route optimisation problem, but it shows some companies really are using ML in vehicle route optimisation. In my opinion it's still a bad approach though - e.g. we did a study in this video https://www.youtube.com/watch?v=EMhnXAH5dvM where we look at the effect of this kind of one-at-a-time scheduling/dispatching (which you can use ML for) versus proper route optimisation, and one-at-a-time scheduling/dispatching comes off significantly worse.

XGBoost faster than random forest?

I am doing the Kaggle in-class challenge on Boston housing prices and learnt that XGBoost is faster than RandomForest, but when I implemented it, it was slower. I want to ask: when is XGBoost faster, and when is RandomForest? I am new to machine learning and need your help. Thanks in advance.
Mainly, the parameters you choose have a strong impact on the speed of your algorithm (e.g. learning rate, depth of the trees, number of features, etc.). There is a trade-off between accuracy and speed, so I suggest you list the parameters you've chosen for each model and see how to change them to get faster training with reasonable accuracy.
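As a rough way to compare them on your own data, you could time both with matching settings. This is only a sketch: X_train and y_train are placeholders, and the parameter values are illustrative, not recommendations:
import time
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

models = {
    "RandomForest": RandomForestRegressor(n_estimators=100, max_depth=6, n_jobs=-1),
    "XGBoost": XGBRegressor(n_estimators=100, max_depth=6, learning_rate=0.1,
                            tree_method="hist", n_jobs=-1),
}
for name, model in models.items():
    start = time.time()
    model.fit(X_train, y_train)  # placeholder training data
    print(name, "fit in", round(time.time() - start, 2), "s")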

Sklearn overfitting

I have a data set containing 1000 points, each with 2 inputs and 1 output. It has been split 80% for training and 20% for testing. I am training a sklearn support vector regressor on it. I get 100% accuracy on the training set, but the results on the test set are not good. I think it may be overfitting. Can you please suggest something to solve the problem?
You may be right: if your model scores very high on the training data, but it does poorly on the test data, it is usually a symptom of overfitting. You need to retrain your model under a different situation. I assume you are using train_test_split provided in sklearn, or a similar mechanism which guarantees that your split is fair and random. So, you will need to tweak the hyperparameters of SVR and create several models and see which one does best on your test data.
If you look at the SVR documentation, you will see that it can be initialized with several input parameters, each of which could be set to a number of different values. For simplicity, let's assume you are only dealing with two parameters that you want to tweak, 'kernel' and 'C', while keeping the third parameter 'degree' set to 4. You are considering 'rbf' and 'linear' for kernel, and 0.1, 1, 10 for C. A simple solution is this:
from sklearn.svm import SVR

for kernel in ('rbf', 'linear'):
    for c in (0.1, 1, 10):
        svr = SVR(kernel=kernel, C=c, degree=4)
        svr.fit(train_features, train_target)
        score = svr.score(test_features, test_target)
        print(kernel, c, score)
This way, you can generate 6 models and see which parameters lead to the best score, which will be the best model to choose, given these parameters.
A simpler way is to let sklearn do most of this work for you, using GridSearchCV (or RandomizedSearchCV):
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

parameters = {'kernel': ('linear', 'rbf'), 'C': (0.1, 1, 10)}
clf = GridSearchCV(SVR(degree=4), parameters)
clf.fit(train_features, train_target)
print(clf.best_score_)
print(clf.best_params_)
model = clf.best_estimator_  # This is your model
I am working on a little tool to simplify using sklearn for small projects, and make it a matter of configuring a yaml file, and letting the tool do all the work for you. It is available on my github account. You might want to take a look and see if it helps.
Finally, your data may not be linear. In that case you may want to try using something like PolynomialFeatures to generate new nonlinear features based on the existing ones and see if it improves your model quality.
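For instance, a minimal sketch of that idea, reusing the variable names from the snippets above (the degree and SVR settings are illustrative only):
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR

# Expand the two inputs with squared and interaction terms, then fit the SVR.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      SVR(kernel="rbf", C=1.0))
model.fit(train_features, train_target)
print(model.score(test_features, test_target))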
Try fitting your data using sklearn's K-Fold cross-validation on the training split. This gives you a fairer evaluation and a better model, though at a computational cost, which shouldn't really matter for a small dataset where the priority is accuracy.
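A minimal sketch, assuming X and y hold the full 1000-point dataset (the names are placeholders):
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR

# 5-fold cross-validation: every point is used for validation exactly once,
# so the reported score is less sensitive to one particular train/test split.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(SVR(kernel="rbf", C=1.0), X, y, cv=cv)
print(scores.mean(), scores.std())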
A few hints:
Since you have only two inputs, it would be great if you plot your data. Try either a scatter with alpha = 0.3 or a heatmap.
Try GridSearchCV, as mentioned by #shahins.
In particular, try different values for the C parameter. As mentioned in the docs, if you have a lot of noisy observations you should decrease it: a smaller C corresponds to stronger regularization.
If it's taking too long, you can also try RandomizedSearchCV
As a side note to #shahins' answer (I am not allowed to add comments), the two implementations are not equivalent. GridSearchCV is better since it performs cross-validation on the training set to tune the hyperparameters. Do not use the test set for tuning hyperparameters!
Don't forget to scale your data
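One way to combine the last two hints in a single sketch, reusing the variable names from the snippets above (the parameter grid is illustrative only):
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Putting the scaler and the SVR in one pipeline means the scaler is fit only
# on the training folds during the grid search, so no test information leaks in.
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])
grid = GridSearchCV(pipe, {"svr__C": (0.1, 1, 10)}, cv=5)
grid.fit(train_features, train_target)
print(grid.best_params_, grid.score(test_features, test_target))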

RandomForestClassifier vs ExtraTreesClassifier in scikit learn

Can anyone explain the difference between RandomForestClassifier and ExtraTreesClassifier in scikit-learn? I've spent a good bit of time reading the paper:
P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006
It seems these are the differences for ET:
1) When choosing variables at a split, samples are drawn from the entire training set instead of a bootstrap sample of the training set.
2) Splits are chosen completely at random from the range of values in the sample at each split.
The result of these two things is many more "leaves".
Yes both conclusions are correct, although the Random Forest implementation in scikit-learn makes it possible to enable or disable the bootstrap resampling.
In practice, RFs are often more compact than ETs. ETs are generally cheaper to train from a computational point of view but can grow much bigger. ETs can sometimes generalize better than RFs, but it's hard to guess when that is the case without trying both first (and tuning n_estimators, max_features and min_samples_split by cross-validated grid search).
The ExtraTrees classifier always tests random splits over a fraction of the features (in contrast to RandomForest, which tests all possible splits over a fraction of the features).
The main difference between random forests and extra trees (also called extremely randomized trees) lies in the fact that, instead of computing the locally optimal feature/split combination (as the random forest does), a random value is selected for the split for each feature under consideration (in the extra trees). Here is a good resource to learn about their differences in more detail: Random forest vs extra tree.
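For a hands-on comparison, here is a self-contained sketch on synthetic data (the dataset is just a stand-in; on your own data the ranking may differ):
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
for Model in (RandomForestClassifier, ExtraTreesClassifier):
    # Identical settings for both; only the split strategy (and, by default,
    # the use of bootstrap resampling) differs between the two ensembles.
    clf = Model(n_estimators=100, max_features="sqrt", random_state=0, n_jobs=-1)
    print(Model.__name__, cross_val_score(clf, X, y, cv=5).mean())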
