sklearn subset fitted pipeline - reuse for transform - scikit-learn

I have constructed a pipeline with several steps which takes some time to fit. For debugging I would like to be able to inspect subsets of that pipeline (e.g. {pipe step 1-3}.transform(X)).
I know that I can use Pipeline(pipe.steps[:3]) to extract a subset and construct a new pipeline from it. Unfortunately, I then have to refit that new pipeline before calling transform on it.
Is there a way to avoid the refit?

You can access sub-parts of a Pipeline object by indexing it like a normal list, e.g. pipe[:3]. This returns a new, as yet unfitted Pipeline instance whose components are, interestingly, the already fitted estimators of the original pipeline.
Because the new Pipeline instance itself is not marked as fitted, a check with scikit-learn's check_is_fitted function would raise an error on it. Nevertheless, you can call pipe[:3].transform(X), which will still work as long as you have fit the whole pipeline before.
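As a minimal sketch (the estimators and data below are placeholders, not from the question):

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = make_classification(random_state=0)
pipe = make_pipeline(StandardScaler(), MinMaxScaler(),
                     PCA(n_components=2), LogisticRegression())
pipe.fit(X, y)

# pipe[:3] is a new Pipeline object that is not itself marked as fitted,
# but it wraps the already-fitted estimators, so no refit is needed.
X_trans = pipe[:3].transform(X)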

Related

What are the steps within `make_pipeline()`?

The documentation doesn't explicitly state which steps we "save" when using make_pipeline() instead of constructing the pipeline the normal way. I was wondering what those steps are, just for my understanding.
I can guess that one of the steps is applying a scaling to all columns in the dataset, like StandardScaler().
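For what it's worth, here is a small sketch of the difference (the estimators are chosen arbitrarily for illustration): make_pipeline() takes the estimators directly and names each step after the lowercased class of its estimator, whereas Pipeline requires explicit (name, estimator) pairs.

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Explicit construction: you choose the step names yourself.
pipe_explicit = Pipeline([("scaler", StandardScaler()),
                          ("clf", LogisticRegression())])

# make_pipeline() builds an equivalent pipeline with auto-generated names.
pipe_auto = make_pipeline(StandardScaler(), LogisticRegression())
print(list(pipe_auto.named_steps))  # ['standardscaler', 'logisticregression']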

How to organize one step after another in Azure Machine Learning Pipelines?

I have defined an Azure Machine Learning Pipeline with three steps:
e2e_steps = [etl_model_step, train_model_step, evaluate_model_step]
e2e_pipeline = Pipeline(workspace=ws, steps=e2e_steps)
The idea is to run the Pipeline in the given sequence:
etl_model_step
train_model_step
evaluate_model_step
However, my experiment is failing because it tries to execute evaluate_model_step before train_model_step.
How do I enforce the sequence of execution?
azureml.pipeline.core.StepSequence lets you do exactly that.
A StepSequence can be used to easily run steps in a specific order, without needing to specify data dependencies through the use of PipelineData.
See the docs to read more.
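A minimal sketch using the objects from the question (ws and the three steps are assumed to be defined as above):

from azureml.pipeline.core import Pipeline, StepSequence

step_sequence = StepSequence(steps=[etl_model_step, train_model_step, evaluate_model_step])
e2e_pipeline = Pipeline(workspace=ws, steps=step_sequence)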
However, the preferred way to get steps to run in order is to stitch them together via PipelineData or OutputFileDatasetConfig. In your example, does train_model_step depend on outputs from the etl step? If so, consider making that data dependency the thing that sequences the steps, as sketched below. For more info, see this tutorial.
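Here is a hedged sketch of that approach (the script names and compute_target are placeholders): passing the etl step's PipelineData output as an input to the train step gives the SDK an explicit data dependency, which implies the execution order.

from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep

# Intermediate data produced by the ETL step and consumed by training.
etl_output = PipelineData("etl_output", datastore=ws.get_default_datastore())

etl_model_step = PythonScriptStep(script_name="etl.py",
                                  outputs=[etl_output],
                                  compute_target=compute_target)
train_model_step = PythonScriptStep(script_name="train.py",
                                    inputs=[etl_output],
                                    compute_target=compute_target)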

How to change the input of a decision operation in the decision center

I am using the Decision Center business console, which allows me to create new action rules or decision tables; I can also add new variables to the variable sets. But it seems that I cannot modify the INPUT/OUTPUT of a decision operation. Since I can create a new decision operation, it seems reasonable that I should be able to add INPUT/OUTPUT variables to it... but I can't figure it out! Please help.
Open the rule project map in Eclipse and then add a decision operation. There you will find the Signature area, where you can set the INPUT/OUTPUT parameters.

Brightway2 - Get LCA scores of immediate exchanges

I'm having some problems with the post-processing analysis of my LCA results from brightway2. After running an LCA calculation, if, for example, I call top_activities(), I get a list of activities and their associated scores; however, none of those activities/scores are the ones associated directly with my functional unit (they appear to be some exchanges of my exchanges...).
How can I get the LCA scores of the exchanges (both technosphere and biosphere) I defined when constructing my Functional Unit?
Thanks!
I've found the best way to get aggregated results for your foreground model in brightway is using the bw2analyzer.traverse_tagged_databases() function rather than top_activities(). Details in the docs are here.
It's designed to calculate the upstream impacts of the elements of your foreground model and then aggregate those impacts based on a tag it finds in each activity. For example, if you add 'tag':'use phase' or 'tag':'processing' to your activities, you can aggregate impact results by life cycle stage.
BUT you can change the default label it looks for: instead of tag you can tell it to look for name, which gives you the aggregated upstream impact of each activity in your foreground model. It returns a dictionary with the names of your tags as keys and impacts as values. It also returns a graph of your foreground system, which you can use to create some cool tree/bullseye charts - see the docs for the format.
Here's the function you need:
from bw2analyzer import traverse_tagged_databases
results, graph = traverse_tagged_databases(functional_unit, method, label='name')
Here are a couple of examples of the kinds of visualisations you can make using the data traverse_tagged_databases gives you:
Waterfall chart example from the results dictionary
Bullseye chart example from the tagged graph
It is pretty easy to traverse the supply chain manually, and everyone wants to do this a slightly different way, so it isn't built into Brightway yet. Here is a simple example:
from brightway2 import *

# Pick a random activity as the functional unit and score it once
func_unit = Database("ecoinvent 3.4 cutoff").random()
lca = LCA({func_unit: 1}, methods.random())
lca.lci()
lca.lcia()
print(func_unit)

# Re-run the LCIA for each technosphere exchange of the functional unit
# to get that exchange's individual contribution to the total score
for exc in func_unit.technosphere():
    lca.redo_lcia({exc.input: exc['amount']})
    print(exc.input, exc['amount'], lca.score)

Aggregate multiple perf profiles

I'm working on a managed runtime. A code change went in which caused a regression in the rate at which the JIT compiler processes compilations. (That is, the act of compiling is slower; the generated code is unaffected.) This was observed using our standard benchmark.
I'm trying to nail down the mechanics underlying this regression. I have been looking at pairs of profiles created from single runs of the benchmark. For each pair, the first profile comes from a build without the change, and the second from a build that is identical to the first, modulo the regression-causing change.
I'm finding that there aren't enough samples in a single run's profile to make useful determinations. I would like to collect multiple profiles for both the before and after builds (generally k of each) and merge them to get a smoother view of what's going on.
Is there a way to do this?
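One way to sidestep merging profile files entirely, sketched in Python (the benchmark command, file names, and k are placeholders): run the benchmark k times inside a single perf record session so the samples accumulate into one profile per build, then compare the two aggregated profiles with perf diff.

import subprocess

K = 10  # number of benchmark runs to aggregate per profile (placeholder)

def record(output, benchmark_cmd, k=K):
    # One perf session wrapping k runs: all samples land in `output`.
    loop = f"for i in $(seq 1 {k}); do {benchmark_cmd}; done"
    subprocess.run(["perf", "record", "-o", output, "--", "bash", "-c", loop],
                   check=True)

record("before.data", "./run_benchmark")  # build without the change
record("after.data", "./run_benchmark")   # build identical except for the change

# perf diff reports per-symbol deltas between the two profiles.
subprocess.run(["perf", "diff", "before.data", "after.data"], check=True)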
