AttributeError: 'str' object has no attribute 'parameters' due to new version of sklearn - scikit-learn

I am doing topic modeling using sklearn. While trying to get the log-likelihood from Grid Search output, I am getting the below error:
AttributeError: 'str' object has no attribute 'parameters'
I think I understand the issue: 'parameters' was used in an older version, and I am using the new version (0.22) of sklearn, which gives the error. I also searched for the term used in the new version but couldn't find it. Below is the code:
# Get Log Likelyhoods from Grid Search Output
n_components = [10, 15, 20, 25, 30]
log_likelyhoods_5 = [round(gscore.mean_validation_score) for gscore in model.cv_results_ if gscore.parameters['learning_decay']==0.5]
log_likelyhoods_7 = [round(gscore.mean_validation_score) for gscore in model.cv_results_ if gscore.parameters['learning_decay']==0.7]
log_likelyhoods_9 = [round(gscore.mean_validation_score) for gscore in model.cv_results_ if gscore.parameters['learning_decay']==0.9]
# Show graph
plt.figure(figsize=(12, 8))
plt.plot(n_components, log_likelyhoods_5, label='0.5')
plt.plot(n_components, log_likelyhoods_7, label='0.7')
plt.plot(n_components, log_likelyhoods_9, label='0.9')
plt.title("Choosing Optimal LDA Model")
plt.xlabel("Num Topics")
plt.ylabel("Log Likelyhood Scores")
plt.legend(title='Learning decay', loc='best')
plt.show()
Thanks in advance!

There is a key 'params' that stores a list of parameter-settings dicts for all the parameter candidates. You can see this in the GridSearchCV doc in the sklearn documentation.
In your code, gscore is just a string key of cv_results_ (iterating over a dict yields its keys).
cv_results_ is a dictionary with string keys like 'params', 'split0_test_score', etc. (see the doc), whose values are lists or arrays.
So you need to make the following change to your code:
log_likelyhoods_5 = [round(model.cv_results_['mean_test_score'][index]) for index, gscore in enumerate(model.cv_results_['params']) if gscore['learning_decay']==0.5]
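For completeness, the same pattern extends to all three learning_decay values. This is a minimal sketch, assuming model is the fitted GridSearchCV object from the question; scores_for is just a helper introduced here for illustration:
# Assumes `model` is the fitted GridSearchCV object from the question.
cv_results = model.cv_results_

def scores_for(decay):
    # Mean test score of every parameter candidate with the given learning_decay.
    return [round(cv_results['mean_test_score'][i])
            for i, params in enumerate(cv_results['params'])
            if params['learning_decay'] == decay]

log_likelyhoods_5 = scores_for(0.5)
log_likelyhoods_7 = scores_for(0.7)
log_likelyhoods_9 = scores_for(0.9)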

Related

ColumnTransformer object has no attribute shape error

My data file (CSV) contains categorical and non-categorical variables. To perform Cox proportional hazards (CPH) analysis I applied OneHotEncoder to two categorical variables (study_category and patient_category). I am getting the following error on the line where I try to fit the CPH model. I am passing three parameters to the cph.fit() method: the dataframe, the duration column (), and the event column (). I googled the error but could not find anything useful. I am using CPH for the first time, so any help fixing the issue will be appreciated.
Error:
AttributeError: 'ColumnTransformer' object has no attribute 'shape'
My python code:
def meth():
    dataset = pd.read_csv("C:/Users/XYZ/CTR_Project/CPH.csv")
    dataset = dataset.loc[:, ['study_Category', 'patient_Category', 'Diff_time', 'Events']]
    X = dataset.loc[:, ['study_Category', 'patient_Category', 'Diff_time', 'Events']]
    colm_transf = make_column_transformer(
        (OneHotEncoder(), ['study_Category', 'patient_Category']),
        remainder='passthrough')
    colm_transf.fit_transform(X)
    cph = CoxPHFitter()
    cph.fit(colm_transf, duration_col='Diff_time', event_col='Events')
    cph.print_summary()
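The traceback points at cph.fit() receiving the ColumnTransformer object itself rather than the transformed data; lifelines' CoxPHFitter.fit() expects a DataFrame. A minimal sketch of one possible fix, using pd.get_dummies in place of the ColumnTransformer (an assumption, not the original approach) so the result stays a DataFrame and 'Diff_time'/'Events' keep their column names:
import pandas as pd
from lifelines import CoxPHFitter

def meth():
    dataset = pd.read_csv("C:/Users/XYZ/CTR_Project/CPH.csv")
    X = dataset.loc[:, ['study_Category', 'patient_Category', 'Diff_time', 'Events']]
    # One-hot encode the categorical columns while keeping a DataFrame,
    # so the duration and event columns keep their names for cph.fit().
    X_encoded = pd.get_dummies(X, columns=['study_Category', 'patient_Category'], dtype=float)
    cph = CoxPHFitter()
    cph.fit(X_encoded, duration_col='Diff_time', event_col='Events')
    cph.print_summary()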

AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'

I am trying to split a dataframe into equal samples and apply a function to calculate a value for each sample; if any sample's value is greater than 0.3, I want to save the filename in the result dataframe.
df=pd.DataFrame({'Value':[-0.016,-0.006,0.003,-0.011,-0.036,-0.031,-0.014,-0.006,-0.01 ,-0.009,0.004,0.001,-0.012,-0.021,-0.008,0.001,-0.011,-0.01,-0.006,0.002,0.004],'Nmae':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]})
x=pd.DataFrame([x.values.sqrt(np.mean(df2['Value']**2)) for x in np.array_split(df2, (len(df2)/10))])
I am getting this error:
AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'
If someone has another effective way to do this task, that would help too.
This is a working version of your code:
res= [np.sqrt(np.mean((x.Value**2))) for x in np.array_split(df, (len(df)/10))]
An alternative way of approaching this with pandas: define a new column 'Split_variable' and use it to apply your calculation:
df.groupby('Split_variable')['Value'].apply(lambda x: np.sqrt(np.mean((x**2))))
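The 'Split_variable' column is not shown above; a minimal sketch of how it could be built so that every 10 consecutive rows form one group (the column name and group size are illustrative assumptions):
import numpy as np
import pandas as pd

# Label every block of 10 consecutive rows with the same group id.
df['Split_variable'] = np.arange(len(df)) // 10
rms = df.groupby('Split_variable')['Value'].apply(lambda x: np.sqrt(np.mean(x**2)))
print(rms)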

AttributeError: 'numpy.ndarray' object has no attribute 'rolling'

When I am trying to do MA or rolling average with log transformed data I get this error. Where am I going wrong?
This one, with the original data, worked fine:
# Rolling statistics
rolmean = data.rolling(window=120).mean()
rolSTD = data.rolling(window=120).std()
With the log-transformed data:
MA = X.rolling(window=120).mean()
MSTD = X.rolling(window=120).std()
AttributeError: 'numpy.ndarray' object has no attribute 'rolling'
You have to convert the numpy array to a pandas DataFrame to use the pandas rolling method.
The change could be something like this:
dataframe = pd.DataFrame(data)
rolmean = dataframe.rolling(120).mean()
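The same conversion applies to the log-transformed data; a small sketch, with a stand-in array in place of the X from the question:
import numpy as np
import pandas as pd

# Illustrative stand-in for the log-transformed numpy array from the question.
X = np.log(np.random.rand(500) + 1)

# Wrap the array in a Series before calling .rolling().
X_series = pd.Series(X)
MA = X_series.rolling(window=120).mean()
MSTD = X_series.rolling(window=120).std()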
Try this instead:
numpy.roll(your_array, shift, axis=None)
There is no rolling attribute in numpy, so you should use the syntax above.
Hope this helps.

Find Max Value in a field of a shapefile

I have a shapefile (mich_co.shp) in which I try to find the county with the maximum population. My idea is to use the max() function, but it's not possible. Here is my code so far:
from osgeo import ogr
import os
shapefile = "C:/Users/root/Python/mich_co.shp"
driver = ogr.GetDriverByName("ESRI Shapefile")
dataSource = driver.Open(shapefile, 0)
layer = dataSource.GetLayer()
for feature in layer:
    print(feature.GetField("pop"))
layer.ResetReading()
The code above, however, only prints all the values of the "pop" field, like this:
10635.0
9541.0
112039.0
29234.0
23406.0
15477.0
8683.0
58990.0
106935.0
17465.0
156067.0
43868.0
135099.0
I tried:
print(max(feature.GetField("pop")))
but it returns TypeError: 'float' object is not iterable. For this, I've also tried:
for feature in range(layer):
and it returns TypeError: 'Layer' object cannot be interpreted as an integer.
Any help or hints would be much appreciated.
Thank you!
max() needs an iterable, such as a list. Try to build a list:
pops = [ feature.GetField("pop") for feature in layer ]
print(max(pops))
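To get the county itself, not just the maximum value, the populations can be paired with a name field. A minimal sketch building on the question's layer; the "NAME" field is only a guess at how the county name is stored in this shapefile:
# Pair each feature's population with its (assumed) name field,
# then take the pair with the largest population.
records = [(feature.GetField("pop"), feature.GetField("NAME")) for feature in layer]
layer.ResetReading()
max_pop, max_county = max(records)
print(max_county, max_pop)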

LdaModel - random_state parameter not recognized - gensim

I'm using gensim's LdaModel, which, according to the documentation, has the parameter random_state. However, I'm getting an error that says:
TypeError: __init__() got an unexpected keyword argument 'random_state'
Without the random_state parameter, the function works as expected. So, the workflow looks like this, for those who want to know what else is happening...
from gensim import corpora, models
import numpy as np
# pseudo code of text pre-processing all on "comments" variable
# stop words
# remove punctuation (optional)
# keep alpha only
# stemming
# get bigrams and integrate with corpus (gensim makes this very easy)
dictionary = corpora.Dictionary(comments)
corpus = [dictionary.doc2bow(comm) for comm in comments]
tfidf = models.TfidfModel(corpus) # change weights
corp_tfidf = tfidf[corpus] # apply them to corpus
# set random seed
random_seed = 135
state = np.random.RandomState(random_seed)
# train model
num_topics = 3
lda_mod = models.LdaModel(corp_tfidf,            # corpus
                          num_topics=num_topics, # number of topics we want back
                          id2word=dictionary,    # our id-word map
                          passes=10,             # how many passes to take over the data
                          random_state=state)    # reproduce the results
Which results in the error message above...
TypeError: __init__() got an unexpected keyword argument 'random_state'
I'd like to be able to recreate my results, if possible.
According to this, the random_state parameter was added in the latest version (0.13.2). You can update your gensim installation with pip install gensim --upgrade. You might need to update scipy first, because it caused me problems.
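Once gensim is upgraded, the call from the question should work as written. As a small sketch (assuming the same corp_tfidf and dictionary as above), gensim's LdaModel accepts either an int seed or a numpy RandomState:
# Either form should reproduce results on a recent gensim version.
lda_mod = models.LdaModel(corp_tfidf, num_topics=3, id2word=dictionary,
                          passes=10, random_state=135)
# or, equivalently:
lda_mod = models.LdaModel(corp_tfidf, num_topics=3, id2word=dictionary,
                          passes=10, random_state=np.random.RandomState(135))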
