I'm creating a Scenario Outline similar to the following one (it is a simplified version but gives a good indication of my problem):
Given I have a valid operator such as 'MyOperatorName'
When I provide a valid phone number for the operator
And I provide an '<amount>' that is of the following '<type>'
And I send a request
Then the following validation message will be displayed: 'The Format of Amount is not valid'
And the following Status Code will be received: 'AmountFormatIsInvalid'
Examples:
| type | description | amount |
| Negative | An amount that is negative | -1.0 |
| Zero | An amount that is equal to zero | 0 |
| ......... | .......... | .... |
The Examples table provides the test data that I need but I would add another Examples table with just the names of the operators (instead of MyOperatorName) in order to replicate the tests for different operators
Examples:
| operator |
| op_numb_1 |
| op_numb_2 |
| op_numb_3 |
in order to avoid repeating the same scenario outline three times; I know that this is not possible but I'm wondering what is the best approach to avoid using three different scenario outlines inside the feature that are pretty the same apart from the operator name.
I know that I can reuse the same step definitions but I'm trying to understand if there is a best practice to prevent cluttering the feature with scenarios that are too much similar.
Glad you know this isn't possible...
So what options are there?
Seems like there are 5:
a: Make a table with every option (the cross product)
Examples:
| type | description | amount | operator |
| Negative | An amount that is negative | -1.0 | op_numb_1 |
| Zero | An amount that is equal to zero | 0 | op_numb_1 |
| Negative | An amount that is negative | -1.0 | op_numb_2 |
| Zero | An amount that is equal to zero | 0 | op_numb_2 |
| ......... | .......... | .... | ... |
b. Repeat the scenario for each operator, with a table of input rows
- but you said you didn't want to do this.
c. Repeat the scenario for each input row, with a table of operators
- I like this option, because each rule is a separate test. If you really, really want to ensure that every different implementation of your "operator" strategy passes and fails in the same validation scenarios, then why not write each validation scenario as a single Scenario Outline: e.g.
Scenario Outline: Operators should fail on Negative inputs
Given I have a valid operator such as 'MyOperatorName'
When I provide a valid phone number for the operator
And I send a request with the amount "-1.0"
Then the following validation message will be displayed: 'The Format of Amount is not valid'
And the following Status Code will be received: 'AmountFormatIsInvalid'
Scenario Outline: Operators should fail on Zero inputs
...etc...
d. Rethink how you are using Specflow - if you only need KEY examples to illustrate your features (as described by Specification by Example by Gojko Adzic), then you are overdoing it by checking every combination. If however you are using specflow to automate your full suite of integration tests then your scenarios could be appropriate... but you might want to think about e.
e. Write integration / unit tests based on the idea that your "operator" validation logic is applied only in one place. If the validation is the same on each operator, why not test it once, and then have all the operators inherit from or include in their composition the same validator class?
Related
I am currently setting up a data lake trying to follow the principles of Delta Lake (landing in bronze, cleaning and merging into silver, and then, if needed, presenting the final view in gold) and have a question about what should be stored in Silver.
For example, if the data in bronze comes in from a REST API and is stored in the JSON form it comes in in this format:
id (Int)
name (String)
fields (Array of Strings)
An example looks like:
{
'id':12345,
'name':'Test',
'fields':['Hello','this','is','a','test']
}
In the end I want to present this as two tables. One would be the base table and look like:
TABLE 1
| id | name |
| -------- | -------------- |
| 12345 | Test |
And another would look like:
TABLE 2
| id | field_value |
| -------- | -------------- |
| 12345 | Hello |
| 12345 | this |
| 12345 | is |
| 12345 | a |
| 12345 | test |
My question is, should I pre-process the data in Spark and store the data in silver in separate folders like this:
-- root
---table 1
----file1.parquet
----etc.parquet
---table 2
----file1.parquet
----etc.parquet
Or store it all in silver under one folder and then derive those two tables using TSQL and functions like OPENJSON later?
Thank you for your help or insight!
I do not think there is a real answer to your questions, but here is a stab - based on your explicit example and this reference https://k21academy.com/microsoft-azure/data-engineer/delta-lake/
My question is, should I pre-process the data in Spark and store the
data in silver in separate folders like this: ...
Yes, I would as JSON takes more time to process. I use JSON for RAW on a current project if it comes in that format and in the REFined Area we store arrays if needed, as opposed to JSON structs. But this is because we use a data Hub approach based on Martin Fowler's Distributed Data Mesh. We have a BUSiness Area where we model the data according to a semantic model.
But for every expert there is an equal and opposite expert. Some would say do it on the fly, like SAP Hana ETL on-the-fly.
For analysis of datasets given to Data Scientist for analysis, or ad hoc analysis, the 2nd approach is fine. The data would be in the Bronze zone. That said gdpr aspects would, could mean refine them to the Silver zone with gdpr aspects removed.
In short, depends on your use case.
My question looks similar to this
, but it should be different because my question relates to the combination of entities and context.
Let me show an example; here duplicated synonyms are:
| Entity | Value | Synonyms |
|-------------+-------------+--------------|
| whether | whether | fine, cloudy |
| granularity | granularity | fine, coarse |
And I have same training phrase on different intents with different input context as follows:
Intent-a:
input context: A
training phrase: is it <fine>?
synonym <fine> is for #whether
Intent-b:
input context: B
training phrase: is it <fine>?
synonym <fine> is for #granularity
When a user says "is it fine?" under the context 'A', fine is identified to #whether? Or, input context is not considered on intent-detection for user input sentence so that is it fifty-fifty which intent is detected?
Context is considered when identifying the intents. Therefore if you have properly defined the contexts for the above scenario no ambiguity is created.
I want to create an excel table that will help me when estimating implementation times for tasks that I am given. To do so, I derived 4 categories in which I individually rate the task from 1 to 10.
Those are: Complexity of system (simple scripts or entire business systems), State of requirements (well defined or very soft), Knowledge about system (how much I know about the system and the code base) and Plan for implementation (do I know what to do or don't I have any plan what to do or where to start).
After rating each task in these categories, I want to have a resulting factor of how expensive and how long the task will likely take, as a very rough estimate that I can tell my bosses.
What I thought about doing
I thought to create a function where I define the inputs and then get the result in form of a number, see:
| a | b | c | d | Result |
| 1 | 1 | 1 | 1 | 160 |
| 5 | 5 | 5 | 5 | 80 |
| 10 | 10 | 10 | 10 | 2 |
And I want to create a function that, when given a, b, c, d will produce the results above for the extreme cases (max, min, avg) and of course any values (float) in between.
How can I go about doing this? I imagine this is some form of polynomial problem, but how can I actually create the function that creates these results?
I have tasks like this often, so it would be cool to have a sort of pattern to follow whenever I need to create such functions for any amount of parameters and results needed.
I tried using wolfram alphas interpolate polynomial command for this, but the result is just a mess of extremely large fractions...
How can I create this function properly with reasonable results?
While writing this edit, I realize this may be better suited over at programmers.SE - If no one answers here, I will move the question there.
You don't have enough data as it is. The simplest formula which takes into account all your four explanatory variables would be linear:
x0 + x1*a + x2*b + x3*c + x4*d
If you formulate a set of equations for this, you have three equations but five unknowns, which means that you don't have a unique solution. On the other hand, the data points which you did provide are proof of the fact that the relation between scores and time is not exactly linear. So you might have to look at some family of functions which is even more complex, and therefore has even more parameters to tune. While it would be easy to tune parameters to match the input, that choice would be pretty arbitrary, and therefore without predictive power.
So while your system of four distinct scores might be useful in the long run, I'd not use that at the moment. I'd suggest you collect some more data points, see how long a given task actually did take you, and only use that fine-grained a model once you have enough data points to fit all of its parameters.
In the meantime, aggregate all four numbers into a single number. E.g. by taking their average. Then decide on a formula to choose. E.g. a quadratic one:
182 - 22.9*a + 0.49*a*a
That's a fair fit for your requirements, and not too complex or messy. But the choice of function, i.e. a polynomial one, is still pretty arbitrary. So revisit that choice once you have more data. Note that this polynomial is almost the one Wolfram Alpha found for your data:
1642/9 - 344/15*a + 22/45*a*a
I only converted these rational numbers to decimal notation, which I truncated pretty early on since all of this is very rough in any case.
On the whole, this question appears more suited to CrossValidated than to Programmers SE, in my opinion. But don't bother them unless you have sufficient data to actually fit a model.
I've been asked to document a piece of code using UML diagrams. The code models a situation like the following: a driver can be assigned to one or more routes. Each route has an upstream and a downstream direction. For each route the driver can drive in the upstream direction and/or in the downstream direction.
A simplified pseudo-code is for the Driver class is the following:
class Driver:
HashMap<Route, Direction> upstream;
HashMap<Route, Direction> downstream;
HashMap<Route, Direction> assignedTo;
where the assignedTo map is actually a property returning a hashmap composed of the routes where the driver is assigned to both the upstream and downstream directions (think of it as a view on the other two hashmaps)
So far I've come up the the following UML representation.
----------- ---------
| CLASS | (assignedTo) | CLASS |
| DRIVER |----------------------------| ROUTE |
----------- * | * ---------
|
-------------
| CLASS |
| DIRECTION |
-------------
^ ^
| |
------------ --------------
| CLASS | | CLASS |
| UPSTREAM | | DOWNSTREAM |
------------ --------------
However, I'm a little puzzled by the fact that in the UML I;m using inheritance while the code uses no inheritance. What do you think?
I've changed a little but this is another sample of mine. I am not sure if I understand the shown pseudo-code correctly, but the case when assigned to both directions, it could be a problem. In my personal opinion, my sample diagram would be easier for implementation too.
Regarding the inheritance, the answer would be different what this UML is for.. to represent how to implement or to explain the concepts. If the latter, there would be no problem using inheritance.
Hopefully I can explain my issue clearly enough for others to understand, here we go, imagine I have the two following hypothetical scenarios:
Scenario: Filter sweets by king size and nut content
Given I am on the "Sweet/List" Page
When I filter sweets by
| Field | Value |
| Filter.KingSize | True |
| Filter.ContainsNuts | False |
Then I should see :
| Value |
| Yorkie King Size |
| Mars King Size |
Scenario: Filter sweets by make
Given I am on the "Sweet/List" Page
When I filter sweets by
| Field | Value |
| Filter.Make | Haribo |
Then I should see :
| Value |
| Starmix |
These scenarios are useful because I can add as many When rows of Field/Value and Then Value entries as I like without changing the associated compiled test steps. However copy/pasting scenarios for different filter tests will become repetitive and take up alot of code - something I would like to avoid. Ideally I would like to create a scenario outline and keep the dynamic nature I have with the tests above, however when I try to do that I run into a problem defining the example table I cant add new rows as I see fit because that would be a new test instance, currently I have this:
Scenario Outline: Filter Sweets
Given I am on the <page> Page
When I filter chocolates by
| Field | Value |
| <filter> | <value> |
Then I should see :
| Output |
| <output> |
Examples:
| page | filter | value | output |
| Sweet/List | Filter.Make | Haribo | Starmix |
So I have the problem of being able to dynamically add rows to my filter and expected data when using a scenario outline, is anyone aware of a way around this? Should I be approaching this from a different angle?
A workaround could be something like :
Then I should see :
| Output |
| <x> |
| <y> |
| <z> |
Examples:
| x | y | z |
But thats not very dynamic.... hoping for a better solution? :)
I don't think what you're asking for is possible with SpecFlow, Gherkin, and out-of-the-box Cucumber. I can't speak for the authors, but I bet it purposely is not meant to be used this way because it goes against the overall "flow" of writing and implementing these specs. Among many things, the specs are meant to be readable to non-programmers, to give the programmer a guide to implement code that matches the specs, for integration testing, and to give a certian amount of flexibility when refactoring.
I think this is one of the situations where the pain you're feeling is a sign that there's a problem, but it may not be the one you think. You said:
"However copy/pasting scenarios for different filter tests will become repetitive and take up alot of code - something I would like to avoid. "
First, I'd disagree that explaining yourself in writing is "repetitive," at least any more than it's repetitive to use specific words like "the, apple, car, etc." over and over again. The issue is: Are these words properly explaining what you're doing? If they are, and explaining your situation requires you to write out multiple scenarios, then that's just what it requires. Communication requires words, and sometimes the same ones.
In fact, what you call "repetitive" is one of the benefits of using Gherkin and a tool like Cucumber or SpecFlow. If you're able to use that sentence over and over and over and over, it means you're not having to write the test code over and over and over and over.
Second, are you sure you're writing a spec for the right thing? I ask only because if the number of scenarios gets out-of-hand, to the point where you have so many that a human can't follow what you write, it's possible that your spec isn't targeted at the right thing.
A possible example of this could be how you're testing the filtering and the pagination in this scenario. Yes, you want your specs to cover full features and your site will have pagination on the same page as your filtering, but at what cost? It takes experience and practice to know when giving up on the supposed "ideal" of no-mocking, full-integration tests will yield better results.
Third, don't think that specs are meant to be perfect coverage for every possible scenario. The scenarios are basically snapshots of state, which means that there are some features that could cover an infinitely-large set of scenarios, which is impossible. So what do you do? Write features that tell the story as best you can. Even let the story drive the development. However, details that don't translate to your specs or other cases are best left to straight-up TDD, done in addition to the specs.
In your example, it seems that you basically are telling a story about a site that lets a user create a dynamic search against sweets and candy. They enter one of a large set of possible search criteria, click a button, and get results. Just stick to that, writing only enough specs to fulfill the story. If you're not satisfied with your coverage, clean it up with more specs or unit tests.
Anyway, that's just my thoughts, hope it helps.
Technically, I think you could try calling steps from within a step definition:
Calling Steps from Step Definitions
For example I think you could rewrite the
Then I should see :
| Output |
| <output> |
To be a custom step like
I should have output that contains <output>
Where output is a comma separated list of expected values. In the custom step you could break the comma separated list into an array and iterate over it calling
Then "I should see #{iterated_value}"
You could use a similar technique to pass in lists of filters and filter values. Your example row for the king size test might look like
| page | filter | value | output |
| Sweet/List | Filter.KingSize, Filter.ContainsNuts | True, False | Yorkie King Size, Mars King Size |
Or maybe
| page | filter-value-pairs | output |
| Sweet/List | Filter.KingSize:True, Filter.ContainsNuts:False | Yorkie King Size, Mars King Size |
That being said, you should perhaps take Darren's words to heart. I'm not really sure that this method would help the ultimate goal of having scenarios that are readable by non-developers.