SpecFlow/Cucumber/Gherkin - Using tables in a scenario outline

Hopefully I can explain my issue clearly enough for others to understand. Imagine I have the following two hypothetical scenarios:
Scenario: Filter sweets by king size and nut content
Given I am on the "Sweet/List" Page
When I filter sweets by
| Field | Value |
| Filter.KingSize | True |
| Filter.ContainsNuts | False |
Then I should see :
| Value |
| Yorkie King Size |
| Mars King Size |
Scenario: Filter sweets by make
Given I am on the "Sweet/List" Page
When I filter sweets by
| Field | Value |
| Filter.Make | Haribo |
Then I should see :
| Value |
| Starmix |
These scenarios are useful because I can add as many When rows of Field/Value and Then Value entries as I like without changing the associated compiled test steps. However, copy/pasting scenarios for different filter tests will become repetitive and take up a lot of code, which is something I would like to avoid. Ideally I would like to create a scenario outline and keep the dynamic nature I have with the tests above. When I try to do that, though, I run into a problem defining the Examples table: I can't add new rows as I see fit, because each new row would be a new test instance. Currently I have this:
Scenario Outline: Filter Sweets
Given I am on the <page> Page
When I filter sweets by
| Field | Value |
| <filter> | <value> |
Then I should see :
| Output |
| <output> |
Examples:
| page | filter | value | output |
| Sweet/List | Filter.Make | Haribo | Starmix |
So my problem is that I cannot dynamically add rows to my filter and expected data when using a scenario outline. Is anyone aware of a way around this? Should I be approaching this from a different angle?
A workaround could be something like :
Then I should see :
| Output |
| <x> |
| <y> |
| <z> |
Examples:
| x | y | z |
But that's not very dynamic. I'm hoping for a better solution. :)

I don't think what you're asking for is possible with SpecFlow, Gherkin, and out-of-the-box Cucumber. I can't speak for the authors, but I bet it is purposely not meant to be used this way because it goes against the overall "flow" of writing and implementing these specs. Among many things, the specs are meant to be readable to non-programmers, to give the programmer a guide to implement code that matches the specs, to serve as integration tests, and to give a certain amount of flexibility when refactoring.
I think this is one of the situations where the pain you're feeling is a sign that there's a problem, but it may not be the one you think. You said:
"However copy/pasting scenarios for different filter tests will become repetitive and take up a lot of code - something I would like to avoid."
First, I'd disagree that explaining yourself in writing is "repetitive," at least any more than it's repetitive to use specific words like "the, apple, car, etc." over and over again. The issue is: Are these words properly explaining what you're doing? If they are, and explaining your situation requires you to write out multiple scenarios, then that's just what it requires. Communication requires words, and sometimes the same ones.
In fact, what you call "repetitive" is one of the benefits of using Gherkin and a tool like Cucumber or SpecFlow. If you're able to use that sentence over and over and over and over, it means you're not having to write the test code over and over and over and over.
Second, are you sure you're writing a spec for the right thing? I ask only because if the number of scenarios gets out-of-hand, to the point where you have so many that a human can't follow what you write, it's possible that your spec isn't targeted at the right thing.
A possible example of this could be how you're testing the filtering and the pagination in this scenario. Yes, you want your specs to cover full features and your site will have pagination on the same page as your filtering, but at what cost? It takes experience and practice to know when giving up on the supposed "ideal" of no-mocking, full-integration tests will yield better results.
Third, don't think that specs are meant to be perfect coverage for every possible scenario. The scenarios are basically snapshots of state, which means that there are some features that could cover an infinitely-large set of scenarios, which is impossible. So what do you do? Write features that tell the story as best you can. Even let the story drive the development. However, details that don't translate to your specs or other cases are best left to straight-up TDD, done in addition to the specs.
In your example, it seems that you basically are telling a story about a site that lets a user create a dynamic search against sweets and candy. They enter one of a large set of possible search criteria, click a button, and get results. Just stick to that, writing only enough specs to fulfill the story. If you're not satisfied with your coverage, clean it up with more specs or unit tests.
Anyway, those are just my thoughts. I hope they help.

Technically, I think you could try calling steps from within a step definition (see "Calling Steps from Step Definitions").
For example I think you could rewrite the
Then I should see :
| Output |
| <output> |
To be a custom step like
I should have output that contains <output>
Where output is a comma-separated list of expected values. In the custom step you could split the comma-separated list into an array and iterate over it, calling
Then "I should see #{iterated_value}"
You could use a similar technique to pass in lists of filters and filter values. Your example row for the king size test might look like
| page | filter | value | output |
| Sweet/List | Filter.KingSize, Filter.ContainsNuts | True, False | Yorkie King Size, Mars King Size |
Or maybe
| page | filter-value-pairs | output |
| Sweet/List | Filter.KingSize:True, Filter.ContainsNuts:False | Yorkie King Size, Mars King Size |
That being said, you should perhaps take Darren's words to heart. I'm not really sure that this method would help the ultimate goal of having scenarios that are readable by non-developers.
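The splitting described in this answer can be sketched in a few lines. This is only an illustration in plain Python, not SpecFlow or Cucumber API: the helper names `parse_list` and `parse_pairs` are made up for the example, and the cell format (`Field:Value` pairs separated by commas) is the one suggested in the last Examples row above.

```python
def parse_list(cell):
    """Split a comma-separated Examples cell into trimmed, non-empty values."""
    return [item.strip() for item in cell.split(",") if item.strip()]


def parse_pairs(cell):
    """Turn 'Field:Value, Field:Value' into a list of (field, value) tuples."""
    pairs = []
    for item in parse_list(cell):
        # partition splits on the first ':' only, so values may contain colons
        field, _, value = item.partition(":")
        pairs.append((field.strip(), value.strip()))
    return pairs


# Cells taken from the example row in the answer above
filters = parse_pairs("Filter.KingSize:True, Filter.ContainsNuts:False")
expected = parse_list("Yorkie King Size, Mars King Size")

for field, value in filters:
    print(f"filter {field} = {value}")
for name in expected:
    print(f"expect to see {name}")
```

A custom step would do this parsing once and then loop over the results, which keeps the Examples table to one row per test case at the price of denser, less readable cells.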

Related

Do you store data in the Delta Lake Silver layer in a normalized format or do you derive it?

I am currently setting up a data lake trying to follow the principles of Delta Lake (landing in bronze, cleaning and merging into silver, and then, if needed, presenting the final view in gold) and have a question about what should be stored in Silver.
For example, suppose the data in bronze comes from a REST API and is stored as the JSON it arrives in, with this format:
id (Int)
name (String)
fields (Array of Strings)
An example looks like:
{
"id": 12345,
"name": "Test",
"fields": ["Hello", "this", "is", "a", "test"]
}
In the end I want to present this as two tables. One would be the base table and look like:
TABLE 1
| id | name |
| -------- | -------------- |
| 12345 | Test |
And another would look like:
TABLE 2
| id | field_value |
| -------- | -------------- |
| 12345 | Hello |
| 12345 | this |
| 12345 | is |
| 12345 | a |
| 12345 | test |
My question is, should I pre-process the data in Spark and store the data in silver in separate folders like this:
-- root
---table 1
----file1.parquet
----etc.parquet
---table 2
----file1.parquet
----etc.parquet
Or store it all in silver under one folder and then derive those two tables using TSQL and functions like OPENJSON later?
Thank you for your help or insight!
I do not think there is a real answer to your questions, but here is a stab - based on your explicit example and this reference https://k21academy.com/microsoft-azure/data-engineer/delta-lake/
My question is, should I pre-process the data in Spark and store the
data in silver in separate folders like this: ...
Yes, I would, as JSON takes more time to process. On a current project I keep JSON in the RAW area if the data arrives in that format, and in the REFined area we store arrays where needed, as opposed to JSON structs. But this is because we use a data hub approach based on the Distributed Data Mesh article published on Martin Fowler's site. We also have a BUSiness area where we model the data according to a semantic model.
But for every expert there is an equal and opposite expert. Some would say do it on the fly, like SAP Hana ETL on-the-fly.
For datasets handed to data scientists, or for ad hoc analysis, the second approach is fine, and the data would stay in the Bronze zone. That said, GDPR requirements could mean you have to refine the data into the Silver zone with the GDPR-sensitive parts removed.
In short, it depends on your use case.
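To make the first option concrete, here is a minimal sketch of the split from one bronze record into the two silver tables from the question. It uses plain Python dictionaries rather than Spark so it runs anywhere, and the function name `split_record` is hypothetical. In Spark the second table would typically come from exploding the `fields` array (for example with `pyspark.sql.functions.explode`) before writing each result to its own folder.

```python
import json


def split_record(record):
    """Split one bronze record into a base row and one row per array element."""
    base_row = {"id": record["id"], "name": record["name"]}
    field_rows = [
        {"id": record["id"], "field_value": value}
        for value in record.get("fields", [])
    ]
    return base_row, field_rows


# The example record from the question, as valid JSON
raw = '{"id": 12345, "name": "Test", "fields": ["Hello", "this", "is", "a", "test"]}'
base_row, field_rows = split_record(json.loads(raw))

print(base_row)
for row in field_rows:
    print(row)
```

Doing this once when writing to silver means downstream queries read two flat tables and never have to parse JSON.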

Create (mathematical) function from set of predefined values

I want to create an Excel table that will help me estimate implementation times for the tasks I am given. To do so, I derived 4 categories in which I individually rate each task from 1 to 10.
Those are: Complexity of system (simple scripts or entire business systems), State of requirements (well defined or very soft), Knowledge about system (how much I know about the system and the code base) and Plan for implementation (do I know what to do or don't I have any plan what to do or where to start).
After rating each task in these categories, I want to have a resulting factor of how expensive and how long the task will likely take, as a very rough estimate that I can tell my bosses.
What I thought about doing
I thought to create a function where I define the inputs and then get the result in form of a number, see:
| a | b | c | d | Result |
| 1 | 1 | 1 | 1 | 160 |
| 5 | 5 | 5 | 5 | 80 |
| 10 | 10 | 10 | 10 | 2 |
And I want to create a function that, when given a, b, c, d will produce the results above for the extreme cases (max, min, avg) and of course any values (float) in between.
How can I go about doing this? I imagine this is some form of polynomial problem, but how can I actually create the function that creates these results?
I have tasks like this often, so it would be cool to have a sort of pattern to follow whenever I need to create such functions for any amount of parameters and results needed.
I tried using Wolfram Alpha's interpolating polynomial command for this, but the result is just a mess of extremely large fractions...
How can I create this function properly with reasonable results?
While writing this edit, I realize this may be better suited over at programmers.SE - If no one answers here, I will move the question there.
You don't have enough data as it is. The simplest formula which takes into account all your four explanatory variables would be linear:
x0 + x1*a + x2*b + x3*c + x4*d
If you formulate a set of equations for this, you have three equations but five unknowns, which means that you don't have a unique solution. On the other hand, the data points which you did provide are proof of the fact that the relation between scores and time is not exactly linear. So you might have to look at some family of functions which is even more complex, and therefore has even more parameters to tune. While it would be easy to tune parameters to match the input, that choice would be pretty arbitrary, and therefore without predictive power.
So while your system of four distinct scores might be useful in the long run, I'd not use that at the moment. I'd suggest you collect some more data points, see how long a given task actually did take you, and only use that fine-grained a model once you have enough data points to fit all of its parameters.
In the meantime, aggregate all four numbers into a single number, e.g. by taking their average. Then pick a formula in that single number, e.g. a quadratic one (here a stands for the aggregated score):
182 - 22.9*a + 0.49*a*a
That's a fair fit for your requirements, and not too complex or messy. But the choice of function, i.e. a polynomial one, is still pretty arbitrary. So revisit that choice once you have more data. Note that this polynomial is almost the one Wolfram Alpha found for your data:
1642/9 - 344/15*a + 22/45*a*a
I only converted these rational numbers to decimal notation, which I truncated pretty early on since all of this is very rough in any case.
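As a quick check of the numbers above, the following sketch interpolates the three data points from the question with exact fractions and compares the result with the quadratic quoted from Wolfram Alpha and with its rounded form. The function names are made up for the example, and it assumes, as suggested above, that the four scores have already been averaged into a single value a.

```python
from fractions import Fraction

# (aggregated score, result) pairs from the question's table
points = [(1, 160), (5, 80), (10, 2)]


def lagrange(points, x):
    """Evaluate the Lagrange interpolating polynomial through points at x."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / Fraction(xi - xj)
        total += term
    return total


def exact_quadratic(a):
    """The polynomial quoted above: 1642/9 - 344/15*a + 22/45*a*a."""
    a = Fraction(a)
    return Fraction(1642, 9) - Fraction(344, 15) * a + Fraction(22, 45) * a * a


def rounded_quadratic(a):
    """The truncated decimal version: 182 - 22.9*a + 0.49*a*a."""
    return 182 - 22.9 * a + 0.49 * a * a


for a, result in points:
    print(a, result, exact_quadratic(a), round(rounded_quadratic(a), 2))
```

The exact polynomial passes through all three points, while the rounded one is off by less than one unit at each of them, which is well within the precision of a rough estimate.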
On the whole, this question appears more suited to CrossValidated than to Programmers SE, in my opinion. But don't bother them unless you have sufficient data to actually fit a model.

Multi dimensional Scenario Outlines in Specflow

I'm creating a Scenario Outline similar to the following one (it is a simplified version but gives a good indication of my problem):
Given I have a valid operator such as 'MyOperatorName'
When I provide a valid phone number for the operator
And I provide an '<amount>' that is of the following '<type>'
And I send a request
Then the following validation message will be displayed: 'The Format of Amount is not valid'
And the following Status Code will be received: 'AmountFormatIsInvalid'
Examples:
| type | description | amount |
| Negative | An amount that is negative | -1.0 |
| Zero | An amount that is equal to zero | 0 |
| ......... | .......... | .... |
The Examples table provides the test data that I need, but I would like to add another Examples table with just the names of the operators (instead of MyOperatorName) in order to replicate the tests for different operators:
Examples:
| operator |
| op_numb_1 |
| op_numb_2 |
| op_numb_3 |
I want this in order to avoid repeating the same scenario outline three times. I know that this is not possible, but I'm wondering what the best approach is to avoid having three scenario outlines in the feature that are pretty much the same apart from the operator name.
I know that I can reuse the same step definitions, but I'm trying to understand whether there is a best practice to prevent cluttering the feature with scenarios that are too similar.
Glad you know this isn't possible...
So what options are there?
Seems like there are 5:
a. Make a table with every option (the cross product)
Examples:
| type | description | amount | operator |
| Negative | An amount that is negative | -1.0 | op_numb_1 |
| Zero | An amount that is equal to zero | 0 | op_numb_1 |
| Negative | An amount that is negative | -1.0 | op_numb_2 |
| Zero | An amount that is equal to zero | 0 | op_numb_2 |
| ......... | .......... | .... | ... |
b. Repeat the scenario for each operator, with a table of input rows
- but you said you didn't want to do this.
c. Repeat the scenario for each input row, with a table of operators
- I like this option, because each rule is a separate test. If you really, really want to ensure that every different implementation of your "operator" strategy passes and fails in the same validation scenarios, then why not write each validation scenario as a single Scenario Outline: e.g.
Scenario Outline: Operators should fail on Negative inputs
Given I have a valid operator such as '<operator>'
When I provide a valid phone number for the operator
And I send a request with the amount "-1.0"
Then the following validation message will be displayed: 'The Format of Amount is not valid'
And the following Status Code will be received: 'AmountFormatIsInvalid'
Examples:
| operator |
| op_numb_1 |
| op_numb_2 |
| op_numb_3 |
Scenario Outline: Operators should fail on Zero inputs
...etc...
d. Rethink how you are using SpecFlow - if you only need KEY examples to illustrate your features (as described in Specification by Example by Gojko Adzic), then you are overdoing it by checking every combination. If, however, you are using SpecFlow to automate your full suite of integration tests, then your scenarios could be appropriate... but you might want to think about e.
e. Write integration / unit tests based on the idea that your "operator" validation logic is applied only in one place. If the validation is the same on each operator, why not test it once, and then have all the operators inherit from or include in their composition the same validator class?
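If you go with option a, the cross-product table does not have to be typed by hand. The sketch below generates it from the two lists in the question. It is a standalone Python helper, not a SpecFlow feature, and the function names are hypothetical. The generated text would still need to be pasted into the feature file.

```python
from itertools import product

# Rows from the question's first Examples table
cases = [
    ("Negative", "An amount that is negative", "-1.0"),
    ("Zero", "An amount that is equal to zero", "0"),
]
# Operators from the question's second Examples table
operators = ["op_numb_1", "op_numb_2", "op_numb_3"]


def cross_product_rows(operators, cases):
    """Pair every operator with every case, with the operator as last column."""
    return [case + (operator,) for operator, case in product(operators, cases)]


def format_examples(header, rows):
    """Render the rows as a Gherkin Examples table."""
    lines = ["Examples:", "| " + " | ".join(header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


rows = cross_product_rows(operators, cases)
table = format_examples(("type", "description", "amount", "operator"), rows)
print(table)
```

The row count is the product of the two list lengths, so the table grows quickly, which is one more reason to consider options d and e first.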

DDD: Help me further understand Value Objects and Entities

There are several questions on this, and reading them isn't helping me. In Eric Evans' DDD book, he uses the example of an address being a value type in certain situations. For a mail order company, the address is a value type because it doesn't really matter whether the address is shared or who else lives there, only that the package arrives at the address.
This makes sense to me until I start thinking about how this would be designed. Given the diagram on page 99, he has it like this:
+------------+
|Customer |
+------------+
|customerId |
|name |
|street |
|city |
|state |
+------------+
This changes to:
+------------+
|Customer | (entity)
+------------+
|customerId |
|name |
|address |
+------------+
+------------+
|Address | (value object)
+------------+
|street |
|city |
|state |
+------------+
If these were tables, Address would have its own Id in order to have a relationship with the customer, turning it into an entity.
Is the idea that in a relational database these would stay in the same table, such as in the first example, and that you'd use features of the ORM to abstract address as a value object (such as NHibernate's component features)?
I realize that a couple of pages later he talks about denormalization, I'm just trying to make sure I'm understanding the concept correctly.
When Eric Evans talks about "entities have identity, Value Objects do not", he's not talking about an ID column in the database - he's talking about identity as a concept.
VOs have no conceptual identity. That doesn't mean that they shouldn't have persistence identity. Don't let persistence implementation cloud your understanding of Entities vs VOs.
You can create a separate table for Address or keep it in the same table as Customer.
"Is the idea that in a relational database these would stay in the same table, such as in the first example, and that you'd use features of the ORM to abstract address as a value object (such as nHibernate's component features)?"
Yes, generally, that is the idea.
Alternatively (if your ORM doesn't support Value Objects directly), you can let the VO tables have an ID, but hide that within your domain model.
I personally don't give a damn about having an ID on value objects as long as they override equality comparison properly (because value objects differ by their value, not their identity).
Mapping value objects to the database is a technical concern. Sometimes you just need to sacrifice the purity of the domain model a bit (e.g. marking properties virtual so the ORM can work underneath), or make your infrastructure smarter, for instance by using NHibernate components.
Yes, generally Address would stay in the same table. Address would be mapped something like this:
+-----------------+
|Customer |
+-----------------+
|customerId |
|name |
|address_street |
|address_city |
|address_state |
+-----------------+
If Address were an entity, then it would be in a separate table, as you said. If two Customers linked to the same Address entity, then changing an attribute of that Address would affect both Customers. With a VO implementation, however, a change would only affect the one Customer whose address was changed.
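The distinction these answers draw can be sketched in code. The example below is a rough Python illustration, not taken from Evans' book or from any ORM: `Address` is an immutable value object compared by its attributes, while `Customer` is an entity compared only by `customer_id`. The class layout and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Address:
    """Value object: immutable, equal when all attributes are equal."""
    street: str
    city: str
    state: str


@dataclass(eq=False)
class Customer:
    """Entity: identity is the customer_id, whatever the other attributes say."""
    customer_id: int
    name: str
    address: Address

    def __eq__(self, other):
        return isinstance(other, Customer) and self.customer_id == other.customer_id

    def __hash__(self):
        return hash(self.customer_id)


home = Address("1 Main St", "Springfield", "IL")
alice = Customer(1, "Alice", home)
bob = Customer(2, "Bob", home)

# "Changing" a value object means replacing it with a new value,
# so Bob's address is untouched when Alice moves.
alice.address = replace(alice.address, street="2 Oak Ave")

print(alice.address)
print(bob.address)
```

Because `Address` carries no identity of its own, whether it is persisted as extra columns on the Customer table or in a separate table with a hidden ID is purely a mapping decision.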

Shotgun vs Sequential Input Layout

I would like to determine which of the two layouts below is the better layout. I would like usability to be the main concern. Which one is better (in terms of usability) and why is it better?
Shotgun
Use as much of the horizontal screen width as possible without causing horizontal scrolling to occur. Obvious benefit is that vertical scrolling will be minimized/eliminated and screen real estate is maximized.
Sequential
One input per line. Downside is that there could be significantly more scrolling than the Shotgun layout.
              Shotgun                                    Sequential
|----------------------------------|      |-----------------------------------|
|                                  |      |                                   |
|  Input1: ______  Input2: ______  |      |  Input1: ______                   |
|                                  |  vs  |                                   |
|  Input3: ______  Input4: ______  |      |  Input2: ______                   |
|                                  |      |                                   |
|----------------------------------|      |  Input3: ______                   |
                                          |                                   |
                                          |  Input4: ______                   |
                                          |                                   |
                                          |-----------------------------------|
The sequential has better usability.
In both layouts the user discerns lines. In the Shotgun case each line is about two things, which requires extra mental processing to understand. In the Sequential case each line is about a single concept, which is simpler.
Having more than one concept on a line not only divides attention but also takes additional brain power to identify possible relations between the concepts, to analyze whether the inputs are meant to be related until the analysis subroutine says "no".
As a general rule, dense interfaces with a high ratio of elements per unit of screen area are more tiring and slower to use than "white space" interfaces. Elements include any UI entity, be it an active input element, a passive textual comment or a graphical element.
I would agree with New in town, with the exception of times when the fields make more sense to be beside each other. Such as when you are entering first and last names for a number of people, such as:
First Name: __________ Last Name: __________
First Name: __________ Last Name: __________
First Name: __________ Last Name: __________
If these were laid out sequentially, it would be much harder to understand and group the fields together (in your head).

Resources