How to run all modules in a folder? - python-3.x

/machine_learning
    dtree.py
    lr.py
    nb.py
    svm.py
/main.py
Each Python file contains one machine-learning model class. In main.py I import machine_learning as ml, so I can instantiate each model like
model = ml.py_name.model_name()
Is there a way to build a list containing instances of all the model classes, like
[ml.svm.svm_ml(), ml.nb.naivebayes(), ml.lr.logisticregression(), ml.dtree.decisiontree()]
I tried
import pkgutil

ml_list = [name for _, name, _ in pkgutil.iter_modules(['machine_learning'])]
print(ml_list)
# ['dtree', 'lr', 'nb', 'svm']

Import all the models you need, e.g. from sklearn.neighbors import KNeighborsClassifier.
Create a list: models = []
Append each model you want to the list: models.append(KNeighborsClassifier(n_neighbors=3))
Split your data into train and test sets, then use a for loop to fit each model:
for model in models:
    model.fit(X, y)
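To build the list from the question automatically instead of writing it out by hand, the pkgutil attempt can be completed with importlib. A minimal sketch, assuming machine_learning is a package (it has an __init__.py) and that every model class takes no constructor arguments:

import importlib
import inspect
import pkgutil

import machine_learning as ml

models = []
for _, mod_name, _ in pkgutil.iter_modules(ml.__path__):
    module = importlib.import_module(f"machine_learning.{mod_name}")
    # collect every class defined in the module itself, skipping imported names
    for _, cls in inspect.getmembers(module, inspect.isclass):
        if cls.__module__ == module.__name__:
            models.append(cls())  # assumes a no-argument constructor

print(models)  # instances of decisiontree, logisticregression, naivebayes, svm_ml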

Related

AttributeError: 'CountVectorizer' object has no attribute '_load_specials'

I am saving my pretrained Doc2Vec model (and a fitted CountVectorizer) using the code below:
model.train(labeled_data, total_examples=model.corpus_count, epochs=model.epochs)
print("Model Training Done")
# Saving the created model
model.save(project_name + '_doc2vec_vectorizer.npz')
vectorizer = CountVectorizer()
vectorizer.fit(df[0])
vec_file = project_name + '_doc2vec_vectorizer.npz'
pickle.dump(vectorizer, open(vec_file, 'wb'))
vdb = db['vectorizers']
Then, in another function, I load it using:
loaded_vectorizer = pickle.load(open(vectorizer, 'rb'))
and I get the error CountVectorizer has no attribute _load_specials on the line below, i.e. model2:
model2 = gensim.models.doc2vec.Doc2Vec.load(vectorizer)
I am using Gensim version 3.8.3, since I rely on the LabeledSentence class.
The .load() method on a Gensim model class should only be used with files that were saved by an object of exactly that same class, using the Gensim .save() method.
Your code shows you trying to use Doc2Vec.load() with the vectorizer object itself (not a file path to the previously-saved model), so the error is to be expected.
If you actually want to pickle-save & then pickle-load the vectorizer object, be sure to:
- use a different file path than you did for the model, or you'll overwrite the model file!
- use pickle methods (not Gensim methods) to re-load anything that was pickle-saved
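A minimal sketch of the corrected flow, reusing the question's model, vectorizer, project_name and imports, with two distinct file paths:

# distinct paths so the Gensim model and the pickled vectorizer don't collide
model_file = project_name + '_doc2vec_model'
vec_file = project_name + '_count_vectorizer.pkl'

model.save(model_file)                        # saved with Gensim .save() ...
pickle.dump(vectorizer, open(vec_file, 'wb'))

# later, in another function:
model2 = gensim.models.doc2vec.Doc2Vec.load(model_file)  # ... so load with Gensim .load()
loaded_vectorizer = pickle.load(open(vec_file, 'rb'))    # pickled, so load with pickle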

Using ray with custom environment created with gym.make()

I would like to run the following code, but with a custom environment instead of CartPole:
import ray
import ray.rllib.agents.dqn.apex as apex
from ray.tune.logger import pretty_print

def train_cartpole() -> None:
    ray.init()
    config = apex.APEX_DEFAULT_CONFIG.copy()
    config["num_gpus"] = 0
    config["num_workers"] = 3
    trainer = apex.ApexTrainer(config=config, env="CartPole-v0")
    for _ in range(1000):
        # Perform one iteration of training the policy with Apex-DQN
        result = trainer.train()
        print(pretty_print(result))

train_cartpole()
My environment is defined as a gym.Env subclass, and I want to create it using gym.make, then apply my own wrapper and gym's FlattenObservation to it. I have found ways of providing the environment as a class or as a string, but that does not work for me, because I do not know how to apply the wrappers afterwards.
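One way to do this (a sketch, untested against the asker's setup; MyEnv-v0 and MyWrapper are hypothetical stand-ins for the custom environment and wrapper) is to register an environment creator function with RLlib and apply the wrappers inside it, then pass the registered name as env:

import gym
from gym.wrappers import FlattenObservation
from ray.tune.registry import register_env

def env_creator(env_config):
    env = gym.make("MyEnv-v0")   # hypothetical custom gym id
    env = MyWrapper(env)         # hypothetical custom wrapper
    return FlattenObservation(env)

register_env("my_wrapped_env", env_creator)
# then: trainer = apex.ApexTrainer(config=config, env="my_wrapped_env")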

Understanding TensorFlow's "Recommending movies: retrieval" / usage of : in a Python class / usage of : in a Python function

I was reading and trying to work through this TensorFlow documentation:
https://www.tensorflow.org/recommenders/examples/basic_retrieval?hl=sl
It includes an implementation of the MovielensModel class; here is a snippet of that code:
class MovielensModel(tfrs.Model):

    def __init__(self, user_model, movie_model):
        super().__init__()
        self.movie_model: tf.keras.Model = movie_model
        self.user_model: tf.keras.Model = user_model
        self.task: tf.keras.layers.Layer = task

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        # We pick out the user features and pass them into the user model.
        user_embeddings = self.user_model(features["user_id"])
        # And pick out the movie features and pass them into the movie model,
        # getting embeddings back.
        positive_movie_embeddings = self.movie_model(features["movie_title"])
        # The task computes the loss and the metrics.
        return self.task(user_embeddings, positive_movie_embeddings)
One usage here is not clear to me, and I could not find much help in any online documentation: the line self.movie_model: tf.keras.Model = movie_model. It looks like some first-class-object use of a function, but how does it work? When I tried to replicate it with d: c = 3, it worked fine: d gets the value 3, while c is reported as undefined.
It's a type annotation; see https://docs.python.org/3/library/typing.html. Here self.movie_model is declared to be an instance of tf.keras.Model. Since Python is a dynamically typed language, this is very useful and helpful, especially in function and method signatures, where you can annotate the types of input parameters and the type of the return value.
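A short illustration of the same syntax outside the tutorial (all names here are made up for the example):

from typing import Dict

# annotated assignment: count is bound to 3; the int annotation is only metadata
count: int = 3

# in the question's d: c = 3, c occupies the annotation slot (where int sits here),
# so d is bound to 3 while c itself is never assigned
d: int = 3

def scale(values: Dict[str, float], factor: float = 2.0) -> Dict[str, float]:
    # parameter and return annotations document the signature; Python records
    # them but does not enforce them at runtime
    return {k: v * factor for k, v in values.items()}

print(scale({"a": 1.0}))  # {'a': 2.0}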

Restarting an optimisation with Pymoo

I'm trying to restart an optimisation in pymoo.
I have a problem defined as:
class myOptProblem(Problem):
    """my body goes here"""

algorithm = NSGA2(pop_size=24)
problem = myOptProblem(opt_obj=dp_ptr,
                       nvars=7,
                       nobj=4,
                       nconstr=0,
                       lb=0.3 * np.ones(7),
                       ub=0.7 * np.ones(7),
                       parallelization=('threads', cpu_count(),))
res = minimize(problem,
               algorithm,
               ('n_gen', 100),
               seed=1,
               verbose=True)
During the optimisation I write the design vectors and results to .csv files. An example of design_vectors.csv is:
5.000000000000000000e+00, 4.079711567060104183e-01, 6.583544872784267143e-01, 4.712364759485179189e-01, 6.859360188593541796e-01, 5.653765991273791425e-01, 5.486782880836487131e-01, 5.275405748345924906e-01,
7.000000000000000000e+00, 5.211287914743063521e-01, 6.368123569438421949e-01, 3.496693260479644128e-01, 4.116734716044557763e-01, 5.343037085833151068e-01, 6.878382993278697732e-01, 5.244120877022839800e-01,
9.000000000000000000e+00, 5.425317846613321171e-01, 5.275405748345924906e-01, 4.269449637288642574e-01, 6.954464617649794844e-01, 5.318980876983187001e-01, 4.520564690494201510e-01, 5.203792876471586837e-01,
1.100000000000000000e+01, 4.579502451694219545e-01, 6.853050113762846340e-01, 3.695822666721857441e-01, 3.505318077758549089e-01, 3.540316632186925050e-01, 5.022648662707586142e-01, 3.086099221096791911e-01,
3.000000000000000000e+00, 4.121775968257620493e-01, 6.157117313805953174e-01, 3.412904026310568106e-01, 4.791574104703620329e-01, 6.634382012372381787e-01, 4.174456593494717538e-01, 4.151101354345394512e-01,
The results.csv is:
5.000000000000000000e+00, 1.000000000000000000e+05, 1.000000000000000000e+05, 1.000000000000000000e+05, 1.000000000000000000e+05,
7.000000000000000000e+00, 1.041682833582066703e+00, 3.481167125962069189e-03, -5.235115318709097909e-02, 4.634480813876099177e-03,
9.000000000000000000e+00, 1.067730307802263967e+00, 2.194702810002167534e-02, -3.195892023664552717e-01, 1.841232582360878426e-03,
1.100000000000000000e+01, 8.986880344052742275e-01, 2.969022150977750681e-03, -4.346692726475211849e-02, 4.995468429444801205e-03,
3.000000000000000000e+00, 9.638770499257821589e-01, 1.859596479928402393e-02, -2.723230073142696162e-01, 1.600910928983005632e-03,
The first column is the index of the design vector; because I thread asynchronously, I specify the indices explicitly.
I see that it should be possible to restart the optimisation via the sampling parameter of pymoo.algorithms.nsga2.NSGA2, but I couldn't find a working example, and the documentation for both population and individuals is also not clear. So how can I restart a simulation with the previous results?
Yes, you can initialize the algorithm object with a population instead of doing it randomly.
I have written a small tutorial for a biased initialization:
https://pymoo.org/customization/initialization.html
Because in your case the data already exist, in a CSV file or in memory, you might want to create a dummy problem (I have called it Constant in my example) to set the attributes in the Population object (in the population, X, F, G, CV and feasible need to be set). Another way would be to set the attributes directly...
The biased initialization with a dummy problem is shown below. If you already use pymoo to store the csv files, you can also just np.save the Population object directly and load it; then all the intermediate steps are unnecessary.
I am planning to improve the checkpoint implementation in the future, so if you have more feedback and use cases that are not possible yet, please let me know.
import numpy as np
from pymoo.algorithms.nsga2 import NSGA2
from pymoo.algorithms.so_genetic_algorithm import GA
from pymoo.factory import get_problem, G1, Problem
from pymoo.model.evaluator import Evaluator
from pymoo.model.population import Population
from pymoo.optimize import minimize

class YourProblem(Problem):

    def __init__(self, n_var=10):
        super().__init__(n_var=n_var, n_obj=1, n_constr=0, xl=-0, xu=1, type_var=np.double)

    def _evaluate(self, x, out, *args, **kwargs):
        out["F"] = np.sum(np.square(x - 0.5), axis=1)

problem = YourProblem()

# create initial data and set it on the population object - for you this comes from your file
N = 300
X = np.random.random((N, problem.n_var))
F = np.random.random((N, problem.n_obj))
G = np.random.random((N, problem.n_constr))

class Constant(YourProblem):

    def _evaluate(self, x, out, *args, **kwargs):
        out["F"] = F
        out["G"] = G

pop = Population().new("X", X)
Evaluator().eval(Constant(), pop)

algorithm = GA(pop_size=100, sampling=pop)

minimize(problem,
         algorithm,
         ('n_gen', 10),
         seed=1,
         verbose=True)
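To seed the population from the question's files instead of random data, the arrays could be loaded first. A sketch, assuming the file names design_vectors.csv and results.csv and the layout shown above (index in column 0, then 7 variables / 4 objectives):

import numpy as np

# skip the index column; usecols also sidesteps the trailing empty field
X = np.loadtxt("design_vectors.csv", delimiter=",", usecols=range(1, 8))
F = np.loadtxt("results.csv", delimiter=",", usecols=range(1, 5))

pop = Population().new("X", X)
# then evaluate a Constant-style dummy problem that writes F back, as above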

Importing a module causes an error, but separating it into two modules and then importing those doesn't. Why?

I'm trying to find out for myself how I could work around the problem I recently asked about here, and I came across a potential solution, but I honestly don't understand why it works and why the other approach doesn't.
For context, the model requires the variables a and b to be defined before it can be loaded successfully in a module; otherwise, it throws an error: NameError: name 'a' is not defined.
Starting off with model.py:
import pickle
from tensorflow import keras

# loads and returns the variables needed by model
def load_model_vars():
    return pickle.load(open('./file.pkl', 'rb'))

# loads and returns the model
def load_model():
    return keras.models.load_model('./model.h5')
Now, to minimally reproduce and identify the problem, I created a new module, foo.py:
from model import load_model_vars, load_model
# goal here is to supposedly expose only the model to other modules
a, b = load_model_vars()
globals()['model'] = load_model()
I then created another module to import foo.py into; let's name it bar.py:
import foo
# just checks if the model is defined
foo.model.summary()
This, for some reason, throws the previously mentioned NameError. Why? The variables are defined, and everything was executed in order (load the variables first, then the model). Even if I change a, b to globals()['a'], globals()['b'], change import foo to from foo import * or from foo import a, b, or try combinations of any of these, it always ends in this error.
But when I introduce another module, say, baz.py, that contains these two lines:
from model import load_model_vars
a, b = load_model_vars()
Then import it to bar.py:
from baz import a, b
import foo
# just checks if the model is defined
foo.model.summary()
With foo.py unchanged, or with a, b = load_model_vars() commented out:
from model import load_model_vars, load_model
# goal here is to supposedly expose only the model to other modules
# a, b = load_model_vars()
globals()['model'] = load_model()
It successfully loads the freaking model! Why? What sorcery is going on underneath the import statement? What actually happens under the hood?
