Django-viewflow - keeping models separate from process flow

This question is triggered by designing models in django-viewflow.
While keeping the models separate from the viewflow process, I get the error below:
File "/usr/local/lib/python3.6/site-packages/django/db/models/fields/related.py", line 625, in resolve_related_fields
    raise ValueError('Related model %r cannot be resolved' % self.remote_field.model)
ValueError: Related model 'mymodel.MyModel' cannot be resolved
Here are my model and the viewflow process model class:
class MyModel(models.Model):
    field1 = models.IntegerField(default=None)
    field2 = models.IntegerField(default=None)
    field3 = models.CharField(null=True, max_length=60, default=None)
    approved = models.BooleanField(default=False)
    approved_at = models.DateTimeField(null=True)

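The viewflow process model class is not reproduced here; as a rough sketch (the field name, null=True, and the viewflow 1.x import are assumptions), it would look something like this:
# Sketch only: field names and app layout are assumptions.
from django.db import models
from viewflow.models import Process


class MyProcess(Process):
    # The cross-app reference that Django cannot resolve when this app's
    # migration runs before the one that creates MyModel.
    mymodel = models.ForeignKey('mymodel.MyModel', null=True, on_delete=models.CASCADE)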
Creating a separate migration script (appname_initial_01.py) containing the details of 'mymodel', and referring to it in the dependencies list of the migration script (appname_initial_02.py) containing 'MyProcess', resolved the issue.
dependencies = [
    ('appname', 'appname_initial_01'),
]
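For context, this is roughly where that dependencies list lives; a minimal sketch of appname_initial_02.py (the operations are only indicated, not spelled out):
# appname_initial_02.py: the migration that creates 'MyProcess'
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        # Guarantees the migration that creates 'mymodel' runs first,
        # so the ForeignKey to it can be resolved.
        ('appname', 'appname_initial_01'),
    ]

    operations = [
        # migrations.CreateModel(name='MyProcess', fields=[...]),
    ]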

Related

How to create and read createOrReplaceGlobalTempView when using static clusters

In my deployment.yaml file I have defined a static cluster as such:
custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "11.2.x-scala2.12"

  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 1
      node_type_id: "Standard_DS3_v2"
I use this for all of my tasks. In one of the tasks, I save a DataFrame using:
transactions.createOrReplaceGlobalTempView("transactions")
And in another task (which depends on the previous task), I try to read the temporary view like this:
global_temp_db = session.conf.get("spark.sql.globalTempDatabase")
# Load wallet features
transactions = session.sql(f"""SELECT *
    FROM {global_temp_db}.transactions""")
But I get the error:
AnalysisException: Table or view not found: global_temp.transactions; line 2 pos 43;
'Project [*]
+- 'UnresolvedRelation [global_temp, transactions], [], false
Both tasks run within the same SparkSession, so why can it not find my global temp view?
Unfortunately this won't work unless you're using a cluster-reuse feature (otherwise you get a new cluster for each task, so you won't be able to cross-reference this view).
A more pythonic approach would be to add the code that initializes the view in every task, e.g. if you're using the pre-defined Task class:
class TaskWithPreInitializedView(Task):
    def _add_transactions_view(self):
        transactions = ...  # some code to define the view
        transactions.createOrReplaceGlobalTempView(...)

    def launch(self):
        self._add_transactions_view()


class RealTask(TaskWithPreInitializedView):
    def launch(self):
        super().launch()
        ...  # your code
Since view creation is a cheap operation that doesn't take much time, this is quite an efficient approach.

How to add multiple fields' reference to "unique_together" error message

I have a model with multiple fields being checked for uniqueness:
class AudQuestionList(BaseTimeStampModel):
    aud_ques_list_id = models.AutoField(primary_key=True,...
    aud_ques_list_num = models.CharField(max_length=26,...
    aud_ques_list_doc_type = models.ForeignKey(DocType,...
    short_text = models.CharField(max_length=55,...
    aud_scope_standards = models.ForeignKey(ScopeStandard, ...
    aud_freqency = models.ForeignKey(AuditFrequency, ...
    aud_process = models.ForeignKey(AuditProcesses, ...

    class Meta:
        unique_together = [['aud_scope_standards', 'aud_freqency', 'aud_process',],]
My model form is as described below:
class CreateAudQuestionListForm(forms.ModelForm):
    class Meta:
        model = AudQuestionList
        fields = ('aud_ques_list_doc_type', 'aud_scope_standards', 'aud_freqency', 'aud_process', 'short_text', ...

    def validate_unique(self):
        try:
            self.instance.validate_unique()
        except ValidationError:
            self._update_errors({'aud_scope_standards': _('Record exists for the combination of key values.')})
The scenario works perfectly well, except that the field names (labels) themselves are missing from the message.
Is there a way to add the field names to the message above, say something like:
Record exists for the combination of key fields + %(field_labels)s.
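One possible direction, as a sketch only (the UNIQUE_FIELDS tuple and the exact message wording are assumptions, not taken from the original post): look the labels up from the form's own fields and interpolate them into the error.
from django import forms
from django.core.exceptions import ValidationError
from django.utils.translation import gettext as _


class CreateAudQuestionListForm(forms.ModelForm):
    class Meta:
        model = AudQuestionList
        fields = ('aud_ques_list_doc_type', 'aud_scope_standards',
                  'aud_freqency', 'aud_process', 'short_text')

    # The fields that make up the unique_together constraint
    # (assumption: kept in sync with the model's Meta.unique_together).
    UNIQUE_FIELDS = ('aud_scope_standards', 'aud_freqency', 'aud_process')

    def validate_unique(self):
        try:
            self.instance.validate_unique()
        except ValidationError:
            # Build a human-readable, comma-separated list of field labels.
            labels = ", ".join(
                str(self.fields[name].label or name) for name in self.UNIQUE_FIELDS
            )
            self.add_error(
                'aud_scope_standards',
                _('Record exists for the combination of key fields: %(field_labels)s.')
                % {'field_labels': labels},
            )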

How to create pivot tables with model_bakery

I want to create a baker recipe that creates one object plus objects in a pivot table, and links everything together correctly.
An incident is created when cells of a technology (2g, 3g, 4g, 5g) are down, but not all technologies are impacted. So we have a pivot table to manage this.
My goal is to avoid duplicated code and manage this with a single baker.make_recipe("incidents.tests.incident_with_multiple_impacts") call that creates an incident and 4 different impacts (the technologies are created in fixtures).
# incident/models/incident.py
class Incident(models.Model):
    impacted_technologies = models.ManyToManyField(
        Technology, through="IncidentIncidentImpactedTechnologies"
    )

# network/models/technology.py
class Technology(models.Model):
    alias = models.CharField(max_length=2, unique=True, blank=False)  # ie: "2G", "3G"...
    name = models.CharField(max_length=4, unique=True, blank=False)

# incident/models/incident_technologies.py
class IncidentIncidentImpactedTechnologies(models.Model):
    incident = models.ForeignKey(
        to="incident.Incident",
        on_delete=models.CASCADE,
        related_name="technology_impacts",
    )
    technology = models.ForeignKey(
        to="network.Technology",
        on_delete=models.CASCADE,
        related_name="impacting_incidents",
    )
During my research, I wrote the code below (I want to keep the first recipes so I can reuse them later):
from model_bakery.recipe import Recipe, foreign_key, related


def get_technology(key=None):
    if key:
        return Technology.objects.get_by_alias(key)  # to get it with "2G"
    return random.choice(Technology.objects.all())


incident = Recipe(
    Incident,
    # few other fields
)

impacted_techno = Recipe(
    IncidentIncidentImpactedTechnologies,
    incident=foreign_key(incident),
    technology=get_technology,  # Default, take a random technology
    # Fields describing a NO_IMPACT on this techno
)

impacted_techno_cell_down = impacted_techno.extend(
    # Fields describing a CELL_DOWN impact on this techno
)

impacted_techno_degraded = impacted_techno.extend(
    # Fields describing a DEGRADED impact on this techno
)

impacted_techno_down = impacted_techno.extend(
    # Fields describing a full TECHNO_DOWN on this techno
)

# The code below is not working, because on Incident, impacted_technologies is an M2M
# field pointing at Technology, not at IncidentIncidentImpactedTechnologies.
incident_with_multiple_impacted_technos = incident.extend(
    impacted_technologies=related(
        impacted_techno.extend(technology=lambda: get_technology("2G")),
        impacted_techno_degraded.extend(technology=lambda: get_technology("3G")),
        impacted_techno_down.extend(technology=lambda: get_technology("4G")),
        impacted_techno_cell_down.extend(technology=lambda: get_technology("5G")),
    )
)
So how can I make a recipe that creates the whole object tree?
Incident # Only one
| \
| [IncidentIncidentImpactedTechnology] * 4
| /
[Technology] * 4 # already created
avoiding doing this:
@staticmethod
def _get_technology(key):
    return Technology.objects.get_by_natural_key(key)

def _create_incident_with_multiple_impacted_technos(self):
    incident = baker.make_recipe("incident.tests.incident")
    baker.make_recipe(
        "incident.tests.impacted_techno",
        incident=incident,
        technology=self._get_technology("2G"),
    )
    baker.make_recipe(
        "incident.tests.impacted_techno_cell_down",
        incident=incident,
        technology=self._get_technology("3G"),
    )
    baker.make_recipe(
        "incident.tests.impacted_techno_degraded",
        incident=incident,
        technology=self._get_technology("4G"),
    )
    baker.make_recipe(
        "incident.tests.impacted_techno_down",
        incident=incident,
        technology=self._get_technology("5G"),
    )
    return incident

def test_my_test(self):
    incident = self._create_incident_with_multiple_impacted_technos()
    # test stuff
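One direction worth trying, as an untested sketch (it assumes model_bakery's related() accepts recipes for a reverse foreign key, which is how the question's own code already uses it): target the reverse relation declared on the through model (related_name="technology_impacts") instead of the impacted_technologies M2M field, so the recipe builds the through rows directly.
# Sketch only: swaps the M2M field for the reverse FK from the through model.
incident_with_multiple_impacted_technos = incident.extend(
    # Reverse FK from IncidentIncidentImpactedTechnologies to Incident, so each
    # related recipe creates a through row pointing at this incident.
    technology_impacts=related(
        impacted_techno.extend(technology=lambda: get_technology("2G")),
        impacted_techno_degraded.extend(technology=lambda: get_technology("3G")),
        impacted_techno_down.extend(technology=lambda: get_technology("4G")),
        impacted_techno_cell_down.extend(technology=lambda: get_technology("5G")),
    )
)

# Then a single call would build the whole tree:
incident = baker.make_recipe("incident.tests.incident_with_multiple_impacted_technos")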

How do we do Batch Inferencing on Azure ML Service with Parameterized Dataset/DataPath input?

The ParallelRunStep Documentation suggests the following:
A named input Dataset (DatasetConsumptionConfig class)
path_on_datastore = iris_data.path('iris/')
input_iris_ds = Dataset.Tabular.from_delimited_files(path=path_on_datastore, validate=False)
named_iris_ds = input_iris_ds.as_named_input(iris_ds_name)
Which is just passed as an Input:
distributed_csv_iris_step = ParallelRunStep(
    name='example-iris',
    inputs=[named_iris_ds],
    output=output_folder,
    parallel_run_config=parallel_run_config,
    arguments=['--model_name', 'iris-prs'],
    allow_reuse=False
)
The Documentation to submit Dataset Inputs as Parameters suggests the following:
The Input is a DatasetConsumptionConfig class element
tabular_dataset = Dataset.Tabular.from_delimited_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
tabular_pipeline_param = PipelineParameter(name="tabular_ds_param", default_value=tabular_dataset)
tabular_ds_consumption = DatasetConsumptionConfig("tabular_dataset", tabular_pipeline_param)
Which is passed in arguments as well as in inputs:
train_step = PythonScriptStep(
    name="train_step",
    script_name="train_with_dataset.py",
    arguments=["--param2", tabular_ds_consumption],
    inputs=[tabular_ds_consumption],
    compute_target=compute_target,
    source_directory=source_directory)
When submitting with a new parameter, we create a new Dataset instance:
iris_tabular_ds = Dataset.Tabular.from_delimited_files('some_link')
And submit it like this:
pipeline_run_with_params = experiment.submit(pipeline, pipeline_parameters={'tabular_ds_param': iris_tabular_ds})
However, how do we combine this: How do we pass a Dataset Input as a Parameter to the ParallelRunStep?
If we create a DatasetConsumptionConfig class element like so:
tabular_dataset = Dataset.Tabular.from_delimited_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
tabular_pipeline_param = PipelineParameter(name="tabular_ds_param", default_value=tabular_dataset)
tabular_ds_consumption = DatasetConsumptionConfig("tabular_dataset", tabular_pipeline_param)
And pass it as an argument in the ParallelRunStep, it will throw an error.
References:
Notebook with Dataset Input Parameter
ParallelRunStep Notebook
AML ParallelRunStep GA is a managed solution to scale up and out large ML workloads, including batch inference, training and large data processing. Please check out the documents below for details.
• Overview doc: run batch inference using ParallelRunStep
• Sample notebooks
• AI Show: How to do Batch Inference using AML ParallelRunStep
• Blog: Batch Inference in Azure Machine Learning
For the inputs we create Dataset class instances:
tabular_ds1 = Dataset.Tabular.from_delimited_files('some_link')
tabular_ds2 = Dataset.Tabular.from_delimited_files('some_link')
ParallelRunStep produces an output file, so we use the PipelineData class to create a folder which will store this output:
from azureml.pipeline.core import Pipeline, PipelineData
output_dir = PipelineData(name="inferences", datastore=def_data_store)
The ParallelRunStep depends on the ParallelRunConfig class to include details about the environment, entry script, output file name and other necessary definitions:
from azureml.pipeline.core import PipelineParameter
from azureml.pipeline.steps import ParallelRunStep, ParallelRunConfig

parallel_run_config = ParallelRunConfig(
    source_directory=scripts_folder,
    entry_script=script_file,
    mini_batch_size=PipelineParameter(name="batch_size_param", default_value="5"),
    error_threshold=10,
    output_action="append_row",
    append_row_file_name="mnist_outputs.txt",
    environment=batch_env,
    compute_target=compute_target,
    process_count_per_node=PipelineParameter(name="process_count_param", default_value=2),
    node_count=2
)
The input to ParallelRunStep is created using the following code:
tabular_pipeline_param = PipelineParameter(name="tabular_ds_param", default_value=tabular_ds1)
tabular_ds_consumption = DatasetConsumptionConfig("tabular_dataset", tabular_pipeline_param)
The PipelineParameter helps us run the pipeline for different datasets.
ParallelRunStep consumes this as an input:
parallelrun_step = ParallelRunStep(
    name="some-name",
    parallel_run_config=parallel_run_config,
    inputs=[tabular_ds_consumption],
    output=output_dir,
    allow_reuse=False
)
To consume with another dataset:
pipeline_run_2 = experiment.submit(pipeline,
    pipeline_parameters={"tabular_ds_param": tabular_ds2}
)
There is currently an error: DatasetConsumptionConfig and PipelineParameter cannot be reused.

es.normalize_entity error variable not found in entity

I am using the featuretools documentation to learn EntitySet and am currently getting the error KeyError: 'Variable: device not found in entity' for the following piece of code:
import featuretools as ft
data = ft.demo.load_mock_customer()
customers_df = data["customers"]
customers_df
sessions_df = data["sessions"]
sessions_df.sample(5)
transactions_df = data["transactions"]
transactions_df.sample(10)
products_df = data["products"]
products_df
### Creating an entity set
es = ft.EntitySet(id="transactions")
### Adding entities
es = es.entity_from_dataframe(entity_id="transactions", dataframe=transactions_df, index="transaction_id", time_index="transaction_time", variable_types={"product_id": ft.variable_types.Categorical})
es
es["transactions"].variables
es = es.entity_from_dataframe(entity_id="products",dataframe=products_df,index="product_id")
es
### Adding new relationship
new_relationship = ft.Relationship(es["products"]["product_id"],
es["transactions"]["product_id"])
es = es.add_relationship(new_relationship)
es
### Creating entity from existing table
es = es.normalize_entity(base_entity_id="transactions",
                         new_entity_id="sessions",
                         index="session_id",
                         additional_variables=["device", "customer_id", "zip_code"])
This is as per the URL - https://docs.featuretools.com/loading_data/using_entitysets.html
From the API for es.normalize_entity it appears that the function would create a new entity 'sessions' with 'session_id' as the index plus the remaining 3 variables; however, the error is:
C:\Users\s_belvi\AppData\Local\Continuum\Anaconda2\lib\site-packages\featuretools\entityset\entity.pyc in _get_variable(self, variable_id)
250 return v
251
--> 252 raise KeyError("Variable: %s not found in entity" % (variable_id))
253
254 @property
KeyError: 'Variable: device not found in entity'
Do we need to create the entity "sessions" separately before using es.normalize_entity? It looks like something has gone wrong syntactically in the flow, some minor mistake.
The error here arises from device not being a column in your transactions_df. The "transactions" table referenced in that page of the documentation has more columns than demo.load_mock_customer in its dictionary form. You can find the rest of the columns using the return_single_table argument. Here's a full working example of normalize_entity which is only slightly modified from the code that you tried:
import featuretools as ft

data = ft.demo.load_mock_customer(return_single_table=True)

es = ft.EntitySet(id="Mock Customer")

es = es.entity_from_dataframe(entity_id="transactions",
                              dataframe=data,
                              index="transaction_id",
                              time_index="transaction_time",
                              variable_types={"product_id": ft.variable_types.Categorical})

es = es.normalize_entity(base_entity_id="transactions",
                         new_entity_id="sessions",
                         index="session_id",
                         additional_variables=["device", "customer_id", "zip_code"])
This will return an EntitySet with two Entities and one Relationship:
Entityset: Mock Customer
  Entities:
    transactions [Rows: 500, Columns: 8]
    sessions [Rows: 35, Columns: 5]
  Relationships:
    transactions.session_id -> sessions.session_id
