I am getting "ValueError: The estimator Sequential should be a classifier". How can I solve it? - keras

I am using VotingClassifier to ensemble 31 pre-trained Keras models. When I try to do the voting with VotingClassifier, I get this error: ValueError: The estimator Sequential should be a classifier. The code is shown below:
estimators = [("EfficientNetB0_model",EfficientNetB0_model),("EfficientNetB1_model",EfficientNetB1_model),("DenseNet121_model",DenseNet121_model),
("DenseNet169_model",DenseNet169_model),("DenseNet201_model",DenseNet201_model),("EfficientNetB2_model",EfficientNetB2_model),
("EfficientNetB3_model",EfficientNetB3_model),("EfficientNetB4_model",EfficientNetB4_model),("EfficientNetB5_model",EfficientNetB5_model),
("EfficientNetB6_model",EfficientNetB6_model),("EfficientNetB7_model",EfficientNetB7_model),("EfficientNetV2B0_model",EfficientNetV2B0_model),
("EfficientNetV2B1_model",EfficientNetV2B1_model),("EfficientNetV2B2_model",EfficientNetV2B2_model),("EfficientNetV2B3_model",EfficientNetV2B3_model),
("EfficientNetV2L_model",EfficientNetV2L_model),("EfficientNetV2M_model",EfficientNetV2M_model),("EfficientNetV2S_model",EfficientNetV2S_model),
("InceptionResNetV2_model",InceptionResNetV2_model),("InceptionV3_model",InceptionV3_model),
("ResNet50_model",ResNet50_model),("ResNet50V2_model",ResNet50V2_model),("ResNet101_model",ResNet101_model),
("ResNet101V2_model",ResNet101V2_model),("ResNet152_model",ResNet152_model),("ResNet152V2_model",ResNet152V2_model),
("VGG16_model",VGG16_model),("VGG19_model",VGG19_model),("Xception_model",Xception_model),
("MobileNet_model",MobileNet_model),("MobileNetV2_model",MobileNetV2_model)]
weights = [0.2, 0.3, 0.0, 0.1, 0.0,
0.3, 0.2, 0.1, 0.0, 0.3,
0.1, 0.3, 0.3, 0.1, 0.0,
0.1, 0.2, 0.1, 0.1, 0.1,
0.4, 0.0, 0.2, 0.1, 0.4,
0.0, 0.0, 0.1, 0.1, 0.0, 0.0
]
ensemble = VotingClassifier(estimators, weights=weights, voting='soft')
ensemble._estimator_type = "classifier"
ensemble = ensemble.fit(X_train, y_train)
print(ensemble.predict(X_test))
Could you help me? I could not find any solution for this. Thank you.
Also, are there any other ways to do voting?
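For background: scikit-learn's VotingClassifier expects each estimator to be a scikit-learn-style classifier, and a raw Keras Sequential model is not one, so setting _estimator_type on the ensemble does not help (fit would still try to clone and refit each model through the scikit-learn API). One route is to wrap each network in a scikit-learn-compatible wrapper such as scikeras.wrappers.KerasClassifier, but since these models are already trained, a manual soft vote is simpler. A minimal sketch, assuming every model outputs softmax probabilities over the same class ordering, reusing the estimators and weights lists above:
import numpy as np

def soft_vote(models, weights, X):
    # Weighted average of each model's predicted probabilities, then argmax.
    probs = np.array([model.predict(X) for model in models])  # (n_models, n_samples, n_classes)
    w = np.asarray(weights).reshape(-1, 1, 1)
    avg = (w * probs).sum(axis=0) / w.sum()
    return avg.argmax(axis=1)

models = [model for _, model in estimators]
y_pred = soft_vote(models, weights, X_test)
print(y_pred)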

Related

How to do grid search for Topic Modeling in PySpark along with evaluation metrics?

I am doing topic-modeling hyperparameter tuning in Spark, with evaluation using log perplexity and log likelihood.
My code:
lda = LDA(k=num_topics, maxIter=100, featuresCol=features_col)
# addGrid takes the Param object (e.g. lda.optimizer), not the setter method
paramGrid = ParamGridBuilder() \
    .addGrid(lda.optimizer, ['em', 'online']) \
    .addGrid(lda.learningDecay, [0.5, 0.6, 0.7, 0.8, 0.9, 0.97]) \
    .addGrid(lda.learningOffset, [100, 200, 500, 1000, 2000]) \
    .addGrid(lda.subsamplingRate, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]) \
    .addGrid(lda.docConcentration, [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9]]) \
    .addGrid(lda.topicConcentration, [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]) \
    .addGrid(lda.maxIter, [10, 50, 100]) \
    .build()
# the chained setters must form one expression (line continuations were missing)
cv = CrossValidator() \
    .setEstimator(lda) \
    .setEvaluator(evaluator) \
    .setEstimatorParamMaps(paramGrid) \
    .setNumFolds(5)
cv_model = cv.fit(df_featured)
# cv.fit returns a CrossValidatorModel; the fitted LDAModel is its bestModel
ll = cv_model.bestModel.logLikelihood(df_featured)
lp = cv_model.bestModel.logPerplexity(df_featured)
The problem is that there is no built-in Evaluator for LDA in Spark, so evaluator above is undefined.
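Since Spark ships no LDA Evaluator, one workaround (a sketch, not from the original thread) is to skip CrossValidator and search the grid manually, scoring each candidate by log perplexity on a held-out split:
# Manual grid search over the param maps built above; lower perplexity is better.
train, valid = df_featured.randomSplit([0.8, 0.2], seed=42)
best_lp, best_model = float('inf'), None
for params in paramGrid:
    model = lda.copy(params).fit(train)
    lp = model.logPerplexity(valid)
    if lp < best_lp:
        best_lp, best_model = lp, model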

Python GEKKO: Value of parameter changes while solving the model

I face the following problem with GEKKO: some parameters (.Param) change their values (while others do not) when the model is solved, and I cannot determine why.
Background: I am currently trying to translate code from EViews (see gennaro.zezza.it) to Python. I use GEKKO to simulate a system consisting of 11 equations (for now). I want to use parameters (instead of constants, which seem to work perfectly fine) because I need to change their values ('exogenously') over time, and thus need an array.
Example: In the following example, an 'economic system' reacts to new government expenditures. I particularly have problems with m.alpha1 and m.alpha2: if they are introduced as .Param, their values change to 1.0 (instead of 0.6 and 0.4) when the model is solved. How can I stop GEKKO from doing this? (Again, I want to be able to change, e.g., alpha1 to 0.7 after time x, so lower and upper bounds won't help here.)
Thanks for your help!!
Code:
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
# Initialize model
m = GEKKO(remote=False)
tstart = 1945
tend = 2000
tdur = tend-tstart+1
m.time = np.linspace(0, tend-tstart, tdur)
# Model parameters
m.t = m.Param(value=m.time)
# Exogenous parameters
alpha1_ex = 0.6
alpha2_ex = 0.4
theta_ex = 0.2
w_ex = 1
# -as .Const
m.alpha1 = m.Const(value=alpha1_ex, name='Propensity to consume out of income')
m.alpha2 = m.Const(value=alpha2_ex, name='Propensity to consume out of wealth')
#m.theta = m.Const(value=theta_ex, name='Tax rate')
#m.w = m.Const(value=w_ex, name='Wage rate')
# -as .Param: issues with alpha1 & alpha2
#m.alpha1 = m.Param(value=np.full(tdur,alpha1_ex), name='Propensity to consume out of income')
#m.alpha2 = m.Param(value=np.full(tdur,alpha2_ex), name='Propensity to consume out of wealth')
m.theta = m.Param(value=np.full(tdur,theta_ex), name='Tax rate')
m.w = m.Param(value=np.ones(tdur), name='Wage rate')
# no issues with g_d
m.g_d = m.Param(value=np.zeros(tdur), name='Government goods, demand')
m.g_d[1:] = 20
# Endogenous variables
m.c_d = m.Var(value=0, name='Consumption goods demand by households')
m.c_s = m.Var(value=0, name='Consumption goods supply')
m.g_s = m.Var(value=0, name='Government goods, supply')
m.h_h = m.Var(value=0, name='Cash money held by households')
m.h_s = m.Var(value=0, name='Cash money supplied by government')
m.n_d = m.Var(value=0, name='Demand for labor')
m.n_s = m.Var(value=0, name='Supply for labor')
m.t_d = m.Var(value=0, name='Taxes, "demand"')
m.t_s = m.Var(value=0, name='Taxes, "supply"')
m.y = m.Var(value=0, name='Income (=GDP)')
m.yd = m.Var(value=0, name='Disposable income of households')
# Lag variables
m.h_h_lag = m.Var(value=0, name='Cash money held by households (t-1)')
m.delay(m.h_h,m.h_h_lag,1) # m.h_h_lag = m.h_h(t-1)
m.h_s_lag = m.Var(value=0, name='Cash money supplied by government (t-1)')
m.delay(m.h_s,m.h_s_lag,1)
# Equations
m.Equation(m.c_s == m.c_d)
m.Equation(m.g_s == m.g_d)
m.Equation(m.t_s == m.t_d)
m.Equation(m.n_s == m.n_d)
m.Equation(m.yd == m.w*m.n_s - m.t_s)
m.Equation(m.t_d == m.theta*m.w*m.n_s)
m.Equation(m.c_d == m.alpha1*m.yd + m.alpha2*m.h_h_lag)
m.Equation(m.h_s == m.h_s_lag + m.g_d - m.t_d)
m.Equation(m.h_h == m.h_h_lag + m.yd - m.c_d)
m.Equation(m.y == m.c_s + m.g_s)
m.Equation(m.n_d == m.y/m.w)
# Solve
m.options.IMODE = 4
m.solve(disp=False)
print("Alpha1 = ", m.alpha1.value)
print("Alpha2 = ", m.alpha2.value)
print("Theta = ", m.theta.value)
print("w = ", m.w.value)
# Plot results
fig, axes = plt.subplots(2, 2, sharex=True, figsize=(8, 7))
fig.canvas.manager.set_window_title('Figures Chapter 3')
fig.suptitle('SIM Model - basic')
x_major_ticks = np.arange(0,tdur,5)
axes[0,0].plot(m.time, m.g_d.value, '-', color='black', linewidth=1)
axes[0,0].legend([m.g_d.name],loc=4,fontsize=7)
axes[0,0].grid()
axes[0,0].set_xticks(x_major_ticks)
axes[1,0].plot(m.time, m.y.value, '-', color='red', linewidth=1)
axes[1,0].legend([m.y.name],loc=4,fontsize=7)
axes[1,0].grid()
axes[1,0].set_xlabel('Time (years)')
axes[1,0].set_xticks(x_major_ticks)
axes[0,1].plot(m.time, m.c_d.value, '-', color='blue', linewidth=0.75)
axes[0,1].plot(m.time, m.yd.value, '-', color='green', linewidth=0.75)
axes[0,1].legend([m.c_d.name,m.yd.name],loc=4,fontsize=7)
axes[0,1].grid()
axes[0,1].set_xticks(x_major_ticks)
ln1 = axes[1,1].plot(m.time, m.h_h.value, '-', color='purple', linewidth=0.75)
axes[1,1].tick_params(axis='y', labelcolor='purple')
ax2 = axes[1,1].twinx()
ln2 = ax2.plot(m.time, [a_i - b_i for a_i, b_i in zip(m.h_h, m.h_h_lag)], '-', color='orange', linewidth=0.75)
ax2.tick_params(axis='y', labelcolor='orange')
lns = ln1+ln2
axes[1,1].legend(lns,[m.h_h.name,'Household savings'],loc=4,fontsize=7)
axes[1,1].grid()
axes[1,1].set_xticks(x_major_ticks)
axes[1,1].set_xlabel('Time (years)')
plt.show()
Output #1: with m.alpha1 and m.alpha2 as .Const
Alpha1 = 0.6
Alpha2 = 0.4
Theta = [0.2, 0.2, 0.2, ..., 0.2] (56 values, all 0.2)
w = [1.0, 1.0, 1.0, ..., 1.0] (56 values, all 1.0)
Output #2: with m.alpha1 as .Param
Alpha1 = [1.0, 1.0, 1.0, ..., 1.0] (56 values, all 1.0)
Alpha2 = 0.4
Theta = [0.2, 0.2, 0.2, ..., 0.2] (56 values, all 0.2)
w = [1.0, 1.0, 1.0, ..., 1.0] (56 values, all 1.0)
The problem is that the variable name, name='Propensity to consume out of income', is over 25 characters long.
m.alpha1 = m.Param(value=np.full(tdur,alpha1_ex), name='Propensity to consume out of income')
m.alpha2 = m.Param(value=np.full(tdur,alpha2_ex), name='Propensity to consume out of wealth')
The model file is produced correctly (gk_model0.apm) but the data file (gk_model0.csv) header is truncated to 25 characters. The files are accessible with m.open_folder(). The bug is in this line of gk_write_files.py where numbers are output as strings of length 25.
np.savetxt(os.path.join(self._path,file_name), csv_data.T, delimiter=",", fmt='%1.25s')
I've added this as a bug report with tracking on GitHub. One work-around is to use shorter variable names or leave off the variable names.
m.alpha1 = m.Param(value=np.full(tdur,alpha1_ex)) # Propensity to consume out of income
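With that workaround, a short sketch of the time-varying parameter the question is after (the switch at index 30 is an arbitrary illustration) would replace the .Const definitions above:
alpha1_vals = np.full(tdur, alpha1_ex)  # propensity to consume out of income
alpha1_vals[30:] = 0.7                  # exogenous change after 30 periods
m.alpha1 = m.Param(value=alpha1_vals)   # short/omitted name avoids the CSV truncation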

How to use fit_transform with an array?

Example of array content:
[
[4.9, 3.0, 1.4, 0.2, 0.0, 2.0],
[4.7, 3.2, 1.3, 0.2, 0.0, 2.0],
[4.6, 3.1, 1.5, 0.2, 0.0, 2.0],
...
]
model = TSNE(learning_rate=100)
transformed = model.fit_transform(data)
I'm trying to apply tSNE to a float array, but I get an error. What should I change?
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (149,) + inhomogeneous part.
The error means NumPy could not build a rectangular 2-D array from your data: after the first dimension (149 rows), at least one row has a different length than the others. Make sure every row has the same number of columns, then try this example:
from sklearn.manifold import TSNE
import numpy as np
X = np.array([[4.9, 3.0, 1.4, 0.2, 0.0, 2.0], [4.7, 3.2, 1.3, 0.2, 0.0, 2.0]])
# perplexity must be smaller than the number of samples, so lower it for this 2-row example
model = TSNE(learning_rate=100, perplexity=1)
transformed = model.fit_transform(X)
print(transformed)
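Since the traceback points at an inhomogeneous shape after the first dimension, a quick check to locate the offending rows (a suggested addition, assuming data is the list from the question):
# Rows whose length differs from the first row are what trigger the error.
bad_rows = [i for i, row in enumerate(data) if len(row) != len(data[0])]
print(bad_rows)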

I want to create a simple list [0, 0.05, 0.10, 0.15, ..., 1.00] using a while loop in Python

I want to create a while loop in Python that outputs the list [0.00, 0.05, 0.10, 0.15, ..., 1.00].
I tried the following:
alpha = 0
alphalist = list()
while alpha <= 1:
    alphalist.append(alpha)
    alpha += 0.05
print(alphalist)
I got the output as [0, 0.05, 0.1, 0.15000000000000002, 0.2, 0.25, 0.3, 0.35, 0.39999999999999997, 0.44999999999999996, 0.49999999999999994, 0.5499999999999999, 0.6, 0.65, 0.7000000000000001, 0.7500000000000001, 0.8000000000000002, 0.8500000000000002, 0.9000000000000002, 0.9500000000000003]
But what I want is this: [0.00, 0.05, 0.10, 0.15, ..., 1.00]
This is the result of floating-point error. 0.05 isn't really the rational number 1/20 to begin with, so any arithmetic involving it may differ from what you expect.
Dividing two integers, rather than starting with a floating-point value, helps mitigate the problem.
>>> [x/100 for x in range(0, 101, 5)]
[0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0]
Some numbers can't be represented exactly in the binary floating-point system computers use; you're just seeing an example of that.
If you want to keep using the while loop this way, I would add another line inside the loop:
alpha = round(alpha, 2)
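Putting that together, the loop becomes (rounding also keeps the termination test exact, so 1.0 is included):
alpha = 0
alphalist = list()
while alpha <= 1:
    alphalist.append(alpha)
    alpha += 0.05
    alpha = round(alpha, 2)  # undo the accumulated drift each step
print(alphalist)  # [0, 0.05, 0.1, ..., 0.95, 1.0]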

How to efficiently deal with nested data in PySpark?

I ran into a situation where collect_list in Spark is not efficient when the collected item is itself a list.
Basically, I am trying to calculate the mean of a nested list (the size of each list is guaranteed to be the same). When the data set grows to, say, 10M rows, it can produce out-of-memory errors. Originally, I thought it had something to do with the udf (which calculates the mean), but actually I found that the aggregation part (collect_list of lists) is the real problem.
What I am doing now is dividing the 10M rows into multiple blocks (by 'user'), aggregating each block individually, and then unioning them at the end. Are there better suggestions for efficiently dealing with nested data? (One alternative is sketched after the example output below.)
A toy example:
from pyspark.sql import functions as f

data = [('user1', 'place1', ['place1', 'place2', 'place3'], [0.0, 0.5, 0.4], [0.0, 0.4, 0.3]),
        ('user1', 'place2', ['place1', 'place2', 'place3'], [0.7, 0.0, 0.4], [0.6, 0.0, 0.3]),
        ('user2', 'place1', ['place1', 'place2', 'place3'], [0.0, 0.4, 0.3], [0.0, 0.3, 0.4]),
        ('user2', 'place3', ['place1', 'place2', 'place3'], [0.1, 0.2, 0.0], [0.3, 0.1, 0.0]),
        ('user3', 'place2', ['place1', 'place2', 'place3'], [0.3, 0.0, 0.4], [0.2, 0.0, 0.4]),
        ]
data_df = sparkApp.sparkSession.createDataFrame(data, ['user', 'place', 'places', 'data1', 'data2'])
data_agg = data_df.groupBy('user') \
    .agg(f.collect_list('place').alias('place_list'),
         f.first('places').alias('places'),
         f.collect_list('data1').alias('data1'),
         f.collect_list('data2').alias('data2'),  # was collect_list('data1'), which duplicated data1 into data2
         )
import numpy as np
from pyspark.sql.types import ArrayType, DoubleType

def average_values(sim_vectors):
    # Element-wise mean across the collected vectors (all the same length)
    if len(sim_vectors) == 1:
        return sim_vectors[0]
    mat = np.array(sim_vectors)
    mean_vector = np.mean(mat, axis=0)
    return np.round(mean_vector, 3).tolist()

avg_vectors_udf = f.udf(average_values, ArrayType(DoubleType()))
data_agg_ave = data_agg.withColumn('data1', avg_vectors_udf('data1')) \
    .withColumn('data2', avg_vectors_udf('data2'))
The result would be:
+-----+----------------+--------------------+-----------------+-----------------+
| user|      place_list|              places|            data1|            data2|
+-----+----------------+--------------------+-----------------+-----------------+
|user1|[place1, place2]|[place1, place2, ...|[0.35, 0.25, 0.4]|  [0.3, 0.2, 0.3]|
|user3|        [place2]|[place1, place2, ...|  [0.3, 0.0, 0.4]|  [0.2, 0.0, 0.4]|
|user2|[place1, place3]|[place1, place2, ...|[0.05, 0.3, 0.15]| [0.15, 0.2, 0.2]|
+-----+----------------+--------------------+-----------------+-----------------+
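A sketch of one alternative (the helper name mean_of_vectors is mine, and it assumes Spark >= 2.4): instead of collecting a list of lists per user, explode each vector with its position via posexplode and average per (user, position), so no executor has to materialize a user's full list-of-lists in memory:
from pyspark.sql import functions as f

def mean_of_vectors(df, vec_col):
    # Explode the vector with its index, average per (user, index),
    # then reassemble the averaged vector in index order.
    exploded = df.select('user', f.posexplode(vec_col).alias('pos', 'val'))
    return (exploded
            .groupBy('user', 'pos')
            .agg(f.avg('val').alias('val'))
            .groupBy('user')
            .agg(f.sort_array(f.collect_list(f.struct('pos', 'val')))
                 .getField('val').alias(vec_col)))

result = mean_of_vectors(data_df, 'data1').join(mean_of_vectors(data_df, 'data2'), on='user')
This keeps everything in native Spark SQL (no Python udf) and replaces the list-of-lists aggregation with scalar aggregation.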
