Reusing common step definitions between feature files in behave python - python-3.x

I have some checks that need to be included in multiple feature files, and I don't want to duplicate the step definitions across other step definition files.
For example:
from behave import when

@when(u'parquet files exist in "{container}" container in the data lake')
def step_imp(context, container):
    parquet_files_array = []
    for parquet_file in context.list_of_files:
        parquet_files_array.append(parquet_file.name)
    check_parquet_files_are_present_in_the_container_area_data_lake(parquet_files_array)
I have to use this check in other step definition files too.
I have created a common_steps.py file and put all the common steps there. How can I reuse them without duplicating them across multiple feature files?

Did you try importing them?
# in <step definitions>.py
from behave import when
import common_steps

@when(u'parquet files exist in "{container}" container in the data lake')
def step_imp(*args, **kwargs):
    common_steps.step_imp(*args, **kwargs)

# in common_steps.py
def step_imp(context, container):
    # implementation (a plain function, no @when decorator here)
    ...

When common_steps.py is imported, we don't have to define the step again in the respective step definition file: when we execute the feature file, the step definition is picked up from common_steps automatically, because behave's step registry is shared by all feature files.
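A minimal sketch of the delegation pattern, assuming the shared logic lives in common_steps.py as plain functions (no @given/@when/@then in that module). The helper name collect_file_names is hypothetical; the step file shown in the comments needs behave installed, while the helper itself has no behave dependency. Keeping decorators out of the shared module matters because behave loads every .py file in the steps directory, and registering the same step text twice can make behave raise an AmbiguousStep error.

```python
# common_steps.py (hypothetical shared module)
# Plain helpers only: no @given/@when/@then here, so importing this module
# from several step files never registers the same step text twice.


def collect_file_names(files):
    """Return the .name attribute of every file object, in order."""
    return [f.name for f in files]


# Usage from any step definition file (requires behave):
#
#     from behave import when
#     import common_steps
#
#     @when(u'parquet files exist in "{container}" container in the data lake')
#     def step_imp(context, container):
#         names = common_steps.collect_file_names(context.list_of_files)
#         check_parquet_files_are_present_in_the_container_area_data_lake(names)
```

The existing check function from the question stays where it is; only the repeated name-collecting loop moves into the shared module.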


How to import custom functions on my experiment script for Azure ML?

I can successfully submit an experiment to processing on a remote compute target on Azure ML.
In my notebook, for submitting the experiment, I have:
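from azureml.core import Experiment
from azureml.train.estimator import Estimator
from azureml.widgets import RunDetails

# estimator (ws and data are assumed to be the Workspace and Dataset defined earlier)
estimator = Estimator(
    source_directory='scripts',
    entry_script='exp01.py',
    compute_target='pc2',
    conda_packages=['scikit-learn'],
    inputs=[data.as_named_input('my_dataset')],
)
# Submit
exp = Experiment(workspace=ws, name='my_exp')
# Run the experiment based on the estimator
run = exp.submit(config=estimator)
RunDetails(run).show()
run.wait_for_completion(show_output=True)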
However, to keep things clean, I want to define my general-use functions in an auxiliary script that the experiment script imports.
In my experiment script exp01.py, I wanted:
from azureml.core import Run
import custom_functions as custom
# azure experiment start
run = Run.get_context()
# the data from azure datasets/datastorage
df = run.input_datasets['my_dataset'].to_pandas_dataframe()
# prepare data
df_transformed = custom.prepare_data(df)
# split data
X_train, X_test, y_train, y_test = custom.split_data(df_transformed)
# run my models.....
model_name = 'RF'
model = custom.model_x(model_name, a_lot_of_args)
# log the results
run.log(model_name, results)
# azure finish
run.complete()
The thing is: Azure won't let me import custom_functions.py.
How are you doing it?
TL;DR: any files you put inside the source_directory (in your case, scripts) will be available to the Estimator.
To make this happen, simply create a file called custom_functions.py in the scripts folder that contains your prepare_data(), split_data() and model_x() functions.
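A local sketch of why this works: when Python runs a script, it puts the script's own directory at the front of sys.path, so a sibling module such as custom_functions.py is importable regardless of the working directory. This assumes the remote run executes exp01.py from its uploaded copy of the scripts folder, as described above; the toy prepare_data body is hypothetical.

```python
import pathlib
import subprocess
import sys
import tempfile


def run_entry_script():
    """Build a throwaway scripts/ folder and run exp01.py from outside it."""
    with tempfile.TemporaryDirectory() as tmp:
        scripts = pathlib.Path(tmp) / "scripts"
        scripts.mkdir()
        # Hypothetical helper module living next to the entry script.
        (scripts / "custom_functions.py").write_text(
            "def prepare_data(df):\n    return df * 2\n"
        )
        (scripts / "exp01.py").write_text(
            "import custom_functions as custom\nprint(custom.prepare_data(21))\n"
        )
        # cwd is the parent folder, so the import succeeds because the
        # script's own directory (scripts/) is on sys.path.
        result = subprocess.run(
            [sys.executable, str(scripts / "exp01.py")],
            cwd=tmp,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_entry_script())  # prints 42
```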
I also recommend that you include only exactly what you need in the source_directory folder and make a distinct folder for each Estimator, because:
the entire folder's contents will be uploaded when you use a remote compute_target, and
when you start using ML Pipelines (which are awesome), the allow_reuse parameter of PythonScriptStep will check whether any files in the source_directory have changed when determining whether the step needs to run again.
Lastly, when you want to share general utility functions across PythonScriptSteps or Estimators without copying and pasting code, that's when you might want to consider creating a custom Python package.

Creating SequenceTaggingDataset from list, not file

I would like to create a SequenceTaggingDataset from two lists that I have created dynamically inside my code - train_sentences and train_tags. I would want to write something like this:
train_data = SequenceTaggingDataset(examples=(zip(train_sentences, train_tags)))
However, the constructor must receive a path. And not only that - it looks from the code as though, even if I were to provide the examples, it will override those, and initialize examples to be an empty list.
For various reasons, I do not want to save the lists I created in a file from which the SequenceTaggingDataset could read. Is there any way around this, save defining my own custom class?
You will need to modify the source code for it (https://pytorch.org/text/_modules/torchtext/datasets/sequence_tagging.html#SequenceTaggingDataset). You can make a local copy and import it as your own module.
path is used in __init__. The important part is that it reads lines from the file and splits each one, using the given separator, into a list named columns. This columns list is then fed, together with fields, into another class method to construct the examples list. Please read the example provided here to understand fields (note that UDPOS is used there to create a SequenceTaggingDataset).
What you need is columns, which you don't need to read from a file because you already have all the components. You can feed it in directly by simplifying the class's __init__:
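# torchtext.data in older releases (torchtext.legacy.data in some later ones)
from torchtext import data

class SequenceTaggingDataset(data.Dataset):
    def __init__(self, columns, fields, encoding="utf-8", separator="\t", **kwargs):
        examples = []
        examples.append(data.Example.fromlist(columns, fields))
        super(SequenceTaggingDataset, self).__init__(examples, fields, **kwargs)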
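columns is a nested list of lists: [[word, ...], [UD_TAG, ...], [PTB_TAG, ...]], one inner list per column of the file. Note that this simplified __init__ builds a single Example from columns, whereas the original builds one Example per blank-line-separated sentence of the file. For your two columns, you would pass the following to the modified class: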
train = SequenceTaggingDataset([train_sentences, train_tags], fields=...)
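If train_sentences holds many sentences (each a list of words) and train_tags the matching tag lists, a sketch like the following builds one columns list per sentence, mirroring what the file-based code does per block. The helper name sentence_columns is hypothetical, and the torchtext calls in the comments assume the legacy Example/Dataset API; check them against your installed version.

```python
def sentence_columns(sentences, tags):
    """Return one [[word, ...], [tag, ...]] list per sentence, the same shape
    SequenceTaggingDataset builds from each sentence block of a file."""
    if len(sentences) != len(tags):
        raise ValueError("expected one tag list per sentence")
    all_columns = []
    for words, word_tags in zip(sentences, tags):
        if len(words) != len(word_tags):
            raise ValueError(
                f"{len(words)} words but {len(word_tags)} tags in {words!r}"
            )
        all_columns.append([list(words), list(word_tags)])
    return all_columns


# With the legacy torchtext API (assumed, not run here):
#
#     fields = [("text", TEXT), ("tags", TAGS)]   # your Field objects
#     examples = [data.Example.fromlist(cols, fields)
#                 for cols in sentence_columns(train_sentences, train_tags)]
#     train_data = data.Dataset(examples, fields)
```

Building the examples outside the class this way means the plain data.Dataset constructor can be used directly, without a modified copy of SequenceTaggingDataset.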

Inheritance and Pandas

I am trying to create a file writer based on Pandas' ExcelWriter. I proceeded as I usually do with classes in Python (3) with inheritance:
import pandas as pd

class Writer(pd.ExcelWriter):
    def __init__(self, fname, engine='openpyxl'):
        pd.ExcelWriter.__init__(self, fname, engine=engine)
        self.newvar = 0
However, when I try to use it, I cannot access newvar:
test = Writer('test.xlsx')
test.newvar
returns:
AttributeError: '_XlsxWriter' object has no attribute 'nmax'
And when I check the type of test, it returns:
pandas.io.excel._XlsxWriter
I don't understand what I am missing since I used this kind of inheritance in many other cases. Any idea would be appreciated!
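This is because pandas.ExcelWriter.__new__ returns an instance of a different class than the one you called (ExcelWriter itself is an abstract base class whose metaclass is abc.ABCMeta). The class is chosen based on the extension of the file path and the engine passed to the call, as you observed when you checked the type of the new instance. __new__ only sees the arguments of Writer('test.xlsx'), not the engine='openpyxl' default of your __init__, which is presumably why the default xlsx writer, _XlsxWriter, was chosen. Because the returned object is not an instance of Writer, Python never calls Writer.__init__, so newvar is never set. You can think of ExcelWriter as a kind of proxy for the specific writers for each format and engine (though it also defines the API that such a writer must provide).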
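A toy model of that mechanism, not pandas's actual code: the class names and engine table are hypothetical, and pandas's own __new__ differs between versions. It shows that when __new__ returns an object that is not an instance of the class being called, Python skips __init__ entirely, and that registering the subclass under an engine name and requesting that engine makes __new__ pick it.

```python
class BaseWriter:
    """Stand-in for pandas.ExcelWriter: __new__ dispatches on engine/extension."""

    _writers = {}  # engine name -> implementation class
    _default_engine = {"xlsx": "xlsxwriter"}  # extension -> engine

    def __new__(cls, path, engine=None):
        if engine is None:
            engine = cls._default_engine[path.rsplit(".", 1)[-1]]
        return object.__new__(cls._writers[engine])

    def __init__(self, path, engine=None):
        self.path = path


class XlsxImpl(BaseWriter):
    pass


BaseWriter._writers["xlsxwriter"] = XlsxImpl


class MyWriter(BaseWriter):
    def __init__(self, path, engine="myengine"):
        super().__init__(path, engine=engine)
        self.newvar = 0


w = MyWriter("test.xlsx")  # __new__ sees engine=None and picks XlsxImpl
print(type(w).__name__, hasattr(w, "newvar"))  # XlsxImpl False

BaseWriter._writers["myengine"] = MyWriter  # "register" the subclass
m = MyWriter("test.xlsx", engine="myengine")
print(type(m).__name__, m.newvar)  # MyWriter 0
```

Note that w has no path attribute either: no __init__ at all ran for it, which matches the asker's AttributeError.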
In order to make your writer available (for the given engine), you need to register it.
But before you can do that, you need to make your class compatible by following the instructions you'll find via help(pandas.ExcelWriter). For the sake of completeness, I quote them here:
# Defining an ExcelWriter implementation (see abstract methods for more...)
# - Mandatory
#   - ``write_cells(self, cells, sheet_name=None, startrow=0, startcol=0)``
#     --> called to write additional DataFrames to disk
#   - ``supported_extensions`` (tuple of supported extensions), used to
#     check that engine supports the given extension.
#   - ``engine`` - string that gives the engine name. Necessary to
#     instantiate class directly and bypass ``ExcelWriterMeta`` engine
#     lookup.
#   - ``save(self)`` --> called to save file to disk
# - Mostly mandatory (i.e. should at least exist)
#   - book, cur_sheet, path
# - Optional:
#   - ``__init__(self, path, engine=None, **kwargs)`` --> always called
#     with path as first argument.
So with that in mind we can extend your class:
import pandas as pd

class Writer(pd.ExcelWriter):
    engine = 'openpyxl'
    supported_extensions = ('xlsx',)

    def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0):
        # Implement something useful here.
        pass

    def save(self):
        # Implement something useful here.
        pass

    def __init__(self, fname, engine='openpyxl', **kwargs):
        # super() already binds self, so it must not be passed again
        super().__init__(fname, engine=engine, **kwargs)
        self.newvar = 0
Now you can use pd.io.excel.register_writer(Writer) to register the writer. But you need to make sure the engine which you've specified matches your version of openpyxl. You can check the process of how a specific writer is chosen here; the writers which are currently registered for each version can be checked via print(pd.io.excel._writers).
As a side note: you can also subclass one of the already available specific writers, for example to reuse its write_cells and save methods (however, you'll need to register your writer in that case as well):
_Openpyxl1Writer
_Openpyxl20Writer
_Openpyxl22Writer
_XlwtWriter
_XlsxWriter

Automatically creating separate instances of one class (Python - Excel/CSV)

My goal is, given an Excel or CSV file, to automatically create instances of one object. After some research, I saw that similar questions have been asked. But in most of the cases, the author only wanted to put instances into a list to print information on them (like: Python creating class instances in a loop).
What I need is not only to create separate instances of a class, but also to be able to call on these distinct instances later in the code. Also, the main point is that my file is dynamic: the one I put just below is just a toy example, and my goal is to be able to automatically process bigger and more complex "models".
Let's have an example. Given the following file:
I would like to create different instances of the following object to store the information given in the file:
class Element:
    name = ""
    Property1 = []
    Property2 = []
    def add_name(self, name):
        self.name = name
    def add_pos_reg(self, p1):
        self.Property1.append(p1)
    def add_neg_reg(self, p2):
        self.Property2.append(p2)
I thought of using the classic way of instantiating an object in a loop and then storing the instances in a list:
ListeElement = []
for i in range(2, max_row):
    e = Element("get the property from the file")  ## I already have a custom function to get the properties from the file into the instance. ##
    ListeElement.append(e)
But I think that this way does not create distinct instances, and I am also not sure that I will be able to call on specific instances stored in the list later in my code.
I am sorry if this is a redundant question, I usually find what I want to do using the search function on this website, but I am getting stuck there.
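A sketch of one way to get distinct, addressable instances, assuming a CSV layout of name, property1, property2 per row (the actual file in the question isn't shown, so this layout is hypothetical). Two changes matter: the lists are created in __init__, because a class-level Property1 = [] is a single list shared by every instance, and the instances are kept in a dict keyed by name so that a specific one can be looked up later.

```python
import csv
import io


class Element:
    def __init__(self, name):
        self.name = name
        # Per-instance lists: a class-level `Property1 = []` would be shared.
        self.Property1 = []
        self.Property2 = []

    def add_pos_reg(self, p1):
        self.Property1.append(p1)

    def add_neg_reg(self, p2):
        self.Property2.append(p2)


def build_elements(rows):
    """Create (or reuse) one Element per name; return them in a dict by name."""
    elements = {}
    for name, p1, p2 in rows:
        if name not in elements:
            elements[name] = Element(name)
        elements[name].add_pos_reg(p1)
        elements[name].add_neg_reg(p2)
    return elements


# Hypothetical file contents; open("model.csv", newline="") works the same way.
SAMPLE = "name,property1,property2\nA,B,C\nB,A,C\nA,C,B\n"
reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
elements = build_elements(reader)

print(elements["A"].Property1)  # ['B', 'C']
print(elements["B"].Property1)  # ['A']
```

A list would also hold distinct instances once the lists move into __init__; the dict simply makes "the element called A" a direct lookup instead of a search.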

Writing a custom builder that executes external command and python function

I'm looking to write a custom SCons Builder that:
Executes an external command to produce foo.temp
Then executes a python function to manipulate foo.temp and produce the final output file
I've referred to the two following sections, but I'm not sure the correct way to "glue" them together.
18.1. Writing Builders That Execute External Commands
18.4. Builders That Execute Python Functions
I know that Command accepts a list of actions to take. But how do I properly handle that intermediate file? Ideally the intermediate file would be invisible to the user -- the entire Builder would appear to operate atomically.
Here's what I've come up with, which seems to be working. However, the .bin file isn't being deleted automatically.
from SCons.Errors import StopError
from SCons.Script import Builder, Delete
from SCons.Util import is_List

_objcopy_builder = Builder(
    action='objcopy -O binary $SOURCE $TARGET',
    suffix='.bin',
    single_source=1
)

def _add_header(target, source, env):
    source = str(source[0])
    target = str(target[0])
    with open(source, 'rb') as src:
        with open(target, 'wb') as tgt:
            tgt.write(b'MODULE\x00\x00')
            tgt.write(src.read())
    return 0

_addheader_builder = Builder(
    action=_add_header,
    single_source=1
)

def Elf2Mod(env, target, source, *args, **kw):
    def check_one(x, what):
        if not is_List(x):
            x = [x]
        if len(x) != 1:
            raise StopError('Only one {0} allowed'.format(what))
        return x
    target = check_one(target, 'target')
    source = check_one(source, 'source')
    # objcopy a binary file
    binfile = _objcopy_builder(env, source=source, **kw)
    # write the module header
    _addheader_builder(env, target=target, source=binfile, **kw)
    # delete the intermediate binary file
    # TODO: Not working
    Delete(binfile)
    return target

def generate(env):
    """Add Builders and construction variables to the Environment."""
    env.AddMethod(Elf2Mod, 'Elf2Mod')
    print('Added Elf2Mod to env {0}'.format(env))

def exists(env):
    return True
This can indeed be done with the Command builder, by specifying a list of actions, as follows:
Command('foo.temp', 'foo.in',
['your_external_action',
your_python_function])
Notice that foo.in is the source, and you should name it accordingly. But if foo.temp is internal, as you mention, then this probably isn't the best approach.
Another way, which I feel is much more flexible, would be to use a custom Builder with a Generator and/or an Emitter.
The Generator is a Python function that returns the actions to run, which in your case would be the external command followed by the Python function.
An Emitter gives you fine-grained control over the sources and targets. I once used a Builder with an Emitter (and a Generator) to do C++ and Java code generation from Thrift IDL input files. I had to read and process the Thrift input file to know exactly which files would be generated (these are the actual targets), and the Emitter is the best (or only) way to do something like this. If your particular use case isn't so complicated, you can skip the Emitter and just list your sources and targets in the call to the builder. But if you want foo.temp to be transparent to the end user, then you'll need an Emitter.
When using a custom Builder with a Generator and an Emitter, SCons calls the Emitter every time to calculate the sources and dependencies, so that it knows whether anything needs rebuilding. The actions returned by the Generator are only executed if one of the targets is considered out of date with respect to its sources.
There are numerous examples showing how to use a Generator and Emitter in a custom Builder, so I won't list the code here, but let me know if you need help with the syntax, etc.
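A different technique from the Generator/Emitter route, sketched here as an alternative: run both steps inside a single Python-function action, so the intermediate binary lives in a temporary file that SCons never tracks as a node and that is always removed afterwards. The function names and the .mod suffix are hypothetical, and the SCons wiring is shown only in comments.

```python
import os
import subprocess
import tempfile

MODULE_HEADER = b"MODULE\x00\x00"


def write_module(bin_path, mod_path):
    """Write MODULE_HEADER followed by the raw bytes of bin_path."""
    with open(bin_path, "rb") as src, open(mod_path, "wb") as tgt:
        tgt.write(MODULE_HEADER)
        tgt.write(src.read())


def elf2mod_action(target, source, env):
    """SCons Python-function action: ELF -> temporary .bin -> final module."""
    fd, tmp_bin = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        subprocess.check_call(
            ["objcopy", "-O", "binary", str(source[0]), tmp_bin]
        )
        write_module(tmp_bin, str(target[0]))
    finally:
        os.remove(tmp_bin)  # the intermediate file never outlives the action
    return 0  # 0 tells SCons the action succeeded


# Hypothetical wiring inside the tool's generate(env):
#
#     from SCons.Script import Builder
#     env.Append(BUILDERS={'Elf2Mod': Builder(action=elf2mod_action,
#                                             suffix='.mod',
#                                             single_source=1)})
```

The cost of this design is that SCons sees one step instead of two, so the .bin can no longer be built or inspected as a target on its own.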
