How to handle too long Gherkin scenario lines? - cucumber

I have some scenarios with too many parameters and most of the parameters causes the variation of scenarios. Therefore, I need to include parameter details in scenario name to give insight about the scenario. However, this causes too long Scenario lines.
For Example:
Scenario: Create list for Today's unique stuff of 'X' item with multiple string attribute values and 'distinct count' aggregation
Given I create a 'Create List' request and name as 'New List'
When I add 'X' item to 'Create List' request
And I add item attribute to current list query on list preview request
| attribute | operator | values |
| id | EXMATCH | id1,id2 |
And I add list aggregation to current list query on 'Create List' request
| aggField | aggType |
| stuff | DISTINCT_COUNT |
And I send request to 'Create List' request date as 'TODAY'
Then 'success' parameter in response should be true
And received list name should be equal to created list name
And received list queries in 'Create List' response should be equal to created list queries
Another Scenario:
Scenario: Create list for Today's unique stuff of 'X' item with multiple integer attribute values and 'sum' aggregation
Or:
Scenario: Create list for Today's unique stuff of 'X' item with multiple integer attribute values, 'sum' aggregation and <some other parameter related conditions which causes too long scenario name>
This can go on and on according to the number of different parameters who effects the scenario.
I have a feeling like there must be best practices writing clearer and shorter scenario names. Are there any?
How should I handle these long scenario names? Or, can I find easer/shorter way of express the content of scenario?

Cucumber allows you to use natural language (as opposed to a programming language) to write your scenarios. You can use all the tools of natural language to simplify your scenarios.
The two most powerful tools to simplify are
abstraction
naming
These tools work hand in hand. With abstraction you take something with alot of details and abstract it into something simpler that removes the details. You use naming to give this new thing a name. If your name is good you can know talk about your complex thing using your new simple term and no longer have to talk about the details.
To make you scenarios simpler you need to abstract, remove the details, and give things good names. The way to do this is to read your scenario and differentiate between WHAT you are doing and HOW you are doing it. Then have your scenarios focus on only saying WHAT they are doing and not saying anything about HOW they are doing it.
One additional tool when thinking about WHAT something is doing is to also think about WHY someone is doing the thing. So lets have a look at your scenario and formulate a few questions.
We do this all the time in computing. Every time we write a method/function we are abstracting and naming. We do this even more often in real life. When you order a coffee you don't say
"I'd like a double expresso in a warmed cup with 3oz of milk wet
foamed at 60C poured with a swan pattern"
You say
"I'd like a flat white"
And of course a double expresso is just another abstraction for a set of instructions that talks about water temperature, number of grammes of coffee, grind settings (extra fine), pressure of water etc. etc.
By using abstraction and naming we can talk eloquently about coffee with all its complexity without mentioning any of the details.
So what is the 'flat white' or 'double expresso' for your scenario?
Your scenario seems to be about creating some sort of a list.
WHAT sort of list is this?
WHY are you creating it?
WHAT will people use this list for?
WHY will people find this list useful?
Once you have asked and answered these questions you can start thinking about how to simplify. When you answer these questions, use the answers to
name your feature
describe your feature
write a preamble for your feature (the bit between Feature and the first Scenario)
write you Scenario titles
You shouldn't start writing a scenario until you have all of this done, and have a Feature that tells you WHAT your scenarios are going to be about and WHY its important for you to do these things.
You also talk about the parameters you are adding causing a variation in the scenarios. So for each parameter you are adding you should be asking
WHAT sort of variation does this parameter cause?
WHY is this variation important? Is it important enough to have its own scenario?
Again think about sets of parameters creating named things like a mocha, cortado or latte.
When you have answered these questions you can remove the parameters from your scenarios. Each set of parameters that creates a variation. For each variation you can remove the parameters by abstracting and giving a name to the variation
If you apply this approach and answer these questions then you transform your scenarios into something much simpler

Related

CQRS Read Model Projections: How complex is too complex a data transformation

I want to sanity check myself on a view projection, in regards to if an intermediary concept can purely exist in the read model while providing a bridge between commands.
Let me use a contrived example to explain.
We place an order which raises an OrderPlaced event. The workflow then involves generating a picking slip, which is used to prepare a shipment.
A picking slip can be generated from an order (or group of orders) without any additional information being supplied from any external source or user. Is it acceptable then that the picking slip can be represented purely as a read model?
So:
PlaceOrderCommand -> OrderPlacedEvent
OrderPlacedEvent -> PickingSlipView
The warehouse manager can then view a picking slip, select the lines they would like to ship, and then perform a PrepareShipment command. A ShipmentPrepared event will then update the original order, and remove the relevant lines from the PickingSlipView.
I know it's a toy example, but I have a conceptually similar use case where a colleague believes the PickingSlip should be a domain entity/aggregate in its own right, as it's conceptually different to order. So you have PlaceOrder, GeneratePickingSlip, and PrepareShipment commands.
The GeneratePickingSlip command however simply takes an order number (identifier), transforms the order data into a picking slip entity, and persists the entity. You can't modify or remove a picking slip or perform any action on it, apart from using it to prepare a shipment.
This feels like introducing unnecessary overhead on the write model, for what is ultimately just a transformation of existing information to enable another command.
So (and without delving deeply into the problem space of warehouses and shipping)...
Is what I'm proposing a legitimate use case for a read model?
Acting as an intermediary between two commands, via transformation of some data into a different view. Or, as my colleague proposes, should every concept be represented in the write model in all cases?
I feel my approach is simpler, and avoiding unneeded complexity, but I'm new to CQRS and so perhaps missing something.
Edit - Alternative Example
Providing another example to explore:
We have a book of record for categories, where each record is information about products and their location. The book of record is populated by an external system, and contains SKU numbers, mapped to available locations:
Book of Record (Electronics)
SKU# Location1 Location2 Location3 ... Location 10
XXXX Introduce Remove Introduce ... N/A
YYYY N/A Introduce Introduce ... Remove
Each book of record is an entity, and each line is a value object.
The book of record is used to generate different Tasks (which are grouped in a TaskPlan to be assigned to a person). The plan may only cover a subset of locations.
There are different types of Tasks: One TaskPlan is for the individual who is on a location to add or remove stock from shelves. Call this an AllocateStock task. Another type of Task exists for a regional supervisor managing multiple locations, to check that shelving is properly following store guidelines, say CheckDisplay task. For allocating stock, we are interested in both introduced and removed SKUs. For checking the displays, we're only interested in newly Introduced SKUs, etc.
We are exploring two options:
Option 1
The person creating the tasks has a View (read model) that allows them to select Book of Records. Say they select Electronics and Fashion. They then select one or more locations. They could then submit a command like:
GenerateCheckDisplayTasks(TaskPlanId, List<BookOfRecordId>, List<Locations>)
The commands would then orchestrate going through the records, filtering out locations we don't need, processing only the 'Introduced' items, and creating the corresponding CheckDisplayTasks for each SKU in the TaskPlan.
Option 2
The other option is to shift the filtering to the read model before generating the tasks.
When a book of record is added a view model for each type of task is maintained. The data might be transposed, and would only include relevant info. ie. the CheckDisplayScopeView might project the book of record to:
Category SKU Location
Electronics (BookOfRecordId) XXXX Location1
Electronics (BookOfRecordId) XXXX Location3
Electronics (BookOfRecordId) YYYY Location2
Electronics (BookOfRecordId) YYYY Location3
Fashion (BookOfRecordId) ... ... etc
When generating tasks, the view enables the user to select the category and locations they want to generate the tasks for. Perhaps they select the Electronics category and Location 1 and 3.
The command is now:
GenerateCheckDisplayTasks(TaskPlanId, List<BookOfRecordId, SKU, Location>)
Where the command now no longer is responsible for the logic needed to filter out the locations, the Removed and N/A items, etc.
So the command for the first option just submits the ID of the entity that is being converted to tasks, along with the filter options, and does all the work internally, likely utilizing domain services.
The second option offloads the filtering aspect to the view model, and now the command submits values that will generate the tasks.
Note: In terms of the guidance that Aggregates shouldn't appear out of thin air, the Task Plan aggregate will create the Tasks.
I'm trying to determine if option 2 is pushing too much responsibility onto the read model, or whether this filtering behavior is more applicable there.
Sorry, I attempted to use the PickingSlip example as I thought it would be a more recognizable problem space, but realize now that there are connotations that go along with the concept that may have muddied the waters.
The answer to your question, in my opinion, very much depends on how you design your domain, not how you implement CQRS. The way you present it, it seems that all these operations and aggregates are in the same Bounded Context but at first glance, I would think that there are 3 (naming is difficult!):
Order Management or Sales, where orders are placed
Warehouse Operations, where goods are packaged to be shipped
Shipments, where packages are put in trucks and leave
When an Order is Placed in Order Management, Warehouse reacts and starts the Packaging workflow. At this point, Warehouse should have all the data required to perform its logic, without needing the Order anymore.
The warehouse manager can then view a picking slip, select the lines they would like to ship, and then perform a PrepareShipment command.
To me, this clearly indicates the need for an aggregate that will ensure the invariants are respected. You cannot select items not present in the picking slip, you cannot select more items than the quantities specified, you cannot select items that have already been packaged in a previous package and so on.
A ShipmentPrepared event will then update the original order, and remove the relevant lines from the PickingSlipView.
I don't understand why you would modify the original order. Also, removing lines from a view is not a safe operation per se. You want to guarantee that concurrency doesn't cause a single item to be placed in multiple packages, for example. You guarantee that using an aggregate that contains all the items, generates the packaging instructions, and marks the items of each package safely and transactionally.
Acting as an intermediary between two commands
Aggregates execute the commands, they are not in between.
Viewing it from another angle, an indication that you need that aggregate is that the PrepareShippingCommand needs to create an aggregate (Shipping), and according to Udi Dahan, you should not create aggregate roots (out of thin air). Instead, other aggregate roots create them. So, it seems fair to say that there needs to be some aggregate, which ensures that the policies to create shippings are applied.
As a final note, domain design is difficult and you need to know the domain very well, so it is very likely that my proposed solution is not correct, but I hope the considerations I made on each step are helpful to you to come up with the right solution.
UPDATE after question update
I read a couple of times the updated question and updated several times my answer, but ended up every time with answers very specific to your example again and I'm most likely missing a lot of details to actually be helpful (I'd be happy to discuss it on another channel though). Therefore, I want to go back to the first sentence of your question to add an important comment that I missed:
an intermediary concept can purely exist in the read model, while providing a bridge between commands.
In my opinion, read models are disposable. They are not a single source of truth. They are a representation of the data to easily fulfil the current query needs. When these query needs change, old read models are deleted and new ones are created based on the data from the write models.
So, only based on this, I would recommend to not prepare a read model to facilitate your commands operations.
I think that your solution is here:
When a book of record is added a view model for each type of task is maintained. The data might be transposed, and would only include relevant info.
If I understand it correctly, what you should do here is not create view model, but create an Aggregate (or multiple). Then this aggregate can receive the commands, apply the business rules and mutate the state. So, instead of having a domain service reading data from "clever" read models and putting it all together, you have an aggregate which encapsulates the data it needs and the business logic.
I hope it makes sense. It's a broad topic and we could talk about it for hours probably.

Internal Search optimization for relevance

My team is using Solr and I have a question regarding it.
There are some search terms which doesn't gives relevant results or results which should have been displayed. For example:
Searching for Macy's without the apostrophe like "Macys" doesnt give back any result for Macy's.
Searching for JPMorgan vs JP Morgan gives different result
Searching for IBM doesn't show results which contains its full name i.e International business machine.
How can we improve and optimize such cases so that it gets applied to all, even to the one we didn't catch apart from these 3 above?
Any suggestions?
All these issues are related to how you process the incoming text for those fields. You'll have to create a filter chain for the field - and possibly use multiple fields for different use cases and prioritize those using qf - that processes the input values to do what you want.
Your first case can be solved by using a PatternReplaceFilter to remove any apostrophes - depending on your use case and tokenizer you might want to use the CharFilter version, as it processes the text before it's split into multiple tokens.
Your second case is a straight forward synonym filter or a WordDelimiterFilter, where you expand JPMorgan to "JP Morgan", or use the WordDelimiterFilter to expand case changes into separate tokens. That'll also allow you to search for JP and get JPMorgan related entries. These might have different effects on score, use debugQuery=true to see exactly how each term in your query contributes to the score.
The third case is in general the same as the second case. You'll have to create a decent synonym word list for the terms used, and this is usually something you build as you get feedback from your users, from existing dictionaries and from domain knowledge. There's also the option of preprocessing text using NLP, or in this case, something as primitive as indexing the initials of any capitalized words after each other could help.

Where to abstract Cucumber step data properly?

I need to write a class for enforcing rules about items which may or may not be added to the same container in a warehouse, and I'd like to translate the requirements in to Cucumber before implementing it.
Each item has several attributes, such as "Item Family" (eg: electronics, book), "Item Status" (eg: main stock, faulty stock), and "Batch" (eg: 1050, 1051).
I can think of several strategies for writing a Cucumber test for this, and I'd like to know which is the recommended one:
Firstly, you could enumerate all of the attributes per product:
Given I have a tote containing:
| sku | client | family | status | batch | weight |
| 100000 | Foo | garment | main | 1234 | 10 |
When I add the item:
| sku | client | family | status | batch | weight |
| 200000 | Bar | garment | main | 1234 | 10 |
Then I should be told there is a Client conflict
Secondly, you could have a basic product hard-coded, and try specifying the minimum differing attributes from it:
Given I have a tote containing an item that's client "Foo"
When I add an item that's client "Bar"
Then I should be told there is a Client conflict
This assumes the step definitions hold the basic attributes, and override them when attributes are mentioned in the steps.
Finally, you could go a further step of abstraction:
Given I have a tote containing an item
And I add an item with a different client
Then I should be told there's a client conflict
Any guidance on the correct approach here?
The answer from The Cucumber Book would be whichever is most readable to the non-technical members of your team. Sit down with the QA lead and project manager, and ask them the same question. I had a similar problem, and started with something like your first suggestion. Then I decided it was too detailed and jumped to #3. Then I sat down with the project manager and found out that when I was creating the data I did not need any detail, but when we changed the data (in our case updating line item values on an invoice), we wanted to see what those values were in the steps.
Chapter 6 from The Cucumber Book "When Cucumbers Go Bad" was really helpful in directing to the right level of detail. I really think you should give it a read, especially the part about coming up with an Ubiquitous Language. I think that will help you decide on the right level of detail for your organization.
If you are tempted to use the first test, my question to you would be, "How often are you going to change those values?" If the answer is "not very" or "never", then you should consider whether they are adding to or detracting from the readability of the test.
P.S. I'm still reading The Cucumber Book, but so far it has been extremely helpful, for example pointing me towards FactoryGirl as socjopata suggested.
First option mentioned is the one that's most flexible and reusable. With the first approach you can cover basically any case you may need, but there are some cons, that you'll read about below.
The 2nd and 3rd options are easier to read, which is also an important factor while writing tests. Furthermore, the seem to focus on what is actually tested, i.e the key difference which "Foo" and "Bar" seem to make in that scenario/feature. And that's also preferred while writing tests.
Generally, imho writing Cucumber tests is like placing yourself between a rock and a hard place. I noticed that developers tend to reuse and over reuse cucumber steps creating hard to understand and maintain scenarios.
The second approach requires more work in defining steps, butscenarios are cleaner and easier to read... BUT it requires more time to write a scenario as well as it produces a large steps definition base, which can be hard to maintain.
If you really want Cucumber for bdd then I would most prolly lean to 2nd option. Just make sure you'll use FactoryGirl or something similar under the hood, to create generic object and overwrite only what you need at a time.
I hope that you find this useful.

Freebase: Format search result to list all properties of object of unknown type(s)

I'm trying to write a MQL query to format a search result in freebase (the "output" parameter in the search API). I essentially want to find the (simple) values of all the properties of a given search result (without knowing anything about the types of the result a priori). By "simple", I mean only the default properties if the values are complex objects.
E.g., if I search for "Yo La Tengo" and this takes me to the result for "/en/yo_la_tengo", I want to be able to get the group's members (I just need names, not instruments or dates started), albums (again, just names), films contributed to (again, just names), etc.
Is there a simple way to do this with a search output query, given that I know nothing about the types? I imagine there's some sort of reflection magic I can use, and I've tried mucking about with "/type/reflect", but I'm not getting anywhere. I'm brand-new to MQL (though I have extensive SQL experience), so this is a little daunting. Any ideas?
Edit: So to clarify, I think the problem I'm seeing is due to mediator types like "performance" (an actor in a film) or "marriage". E.g., with a query about Yo La Tengo, I can see most (all?) information that I'm interested in, but a similar query about [The Muppet Movie]( freebase.com/api/service/search?limit=1&mql_output=%5B%7B%22%2Ftype%2Freflect%2Fany_reverse%22%3A%5B%7B%7D%5D%2C%22%2Ftype%2Freflect%2Fany_master%22%3A%5B%7B%7D%5D%2C%22%2Ftype%2Freflect%2Fany_value%22%3A%5B%7B%7D%5D%7D%5D&query=The%20Muppet%20Movie -- sorry, SO thinks I'm a spammer so I can't make this a link), I don't see Frank Oz reference at all (probably because his performance is referenced instead). Is there a generic way for me to "follow" mediator types to get all their properties? E.g., is there a single output MQL that would allow me to get the actor in a performance (when linked form a film search result) and give the the spouse in a marriage (when linked from a person)?
Querying not only every property, but then following those properties another ply deep in the graph for all search results is going to be an incredibly expensive operation. What is the use case for this? Do you really have a UI where the user can see and effectively absorb all this information? To answer the question directly though, it's not possible to unpack mediator types automatically using mql_output on the search API.
I'd suggest combining a basic set of information on the search query with a deeper set of information on a topic that the user has expressed interest (e.g. by hovering over). This UI experience would be similar to that of Freebase Suggest.
In the years since the question was originally asked there have been some additional useful things added such as the "notable" pseudo-property which lets you see what the topic is notable for.
Of course everyone also needs to be moving to the new API, so the queries would be:
https://www.googleapis.com/freebase/v1/search?query=%22the%20muppet%20movie%22&limit=1&indent=true
https://www.googleapis.com/freebase/v1/topic/en/the_muppet_movie
AFAIK there is no way to do this in outright MQL, but you can:
Get all the properties of an object or type of object, then
Programmatically construct another MQL query to get those objects you want to know more about.
Look at this example:
[{
"type|=": [
"/film/actor",
"/tv/tv_actor",
"/celebrities/celebrity"
],
"*": [{}]
}]​
It grabs all the properties of all objects that have the type actor, tv_actor, or celebrity. When you run it, you'll see all the possible "follow" points you can explore.
This is not exactly what you want, but it should get you closer.

Patterns for the overlap of two objects

I'm sure this has already been asked and answered so I apologize in advance for that but I'm not figuring out the correct keywords to search for. Searching for "Pattern" hits way too many Q & A's to be useful.
I'm working on a regression testing app. I'm displaying a form on the screen and according to which user is logged in to the app some of the fields should be read-only. So I can abstract a field object and I can abstract a user object but what pattern should I be looking at to describe the intersection of these two concepts? In other words how should I describe that for Field 1 and User A, the field should be read-only? It seems like read-only (or not) should be a property of the Field class but as I said, it depends on which user is looking at the form. I've considered a simple two-dimensional array (e. g. ReadOnly[Field,User] = True) but I want to make sure I've picked the most effective structure to represent this.
Are there any software design patterns regarding this kind of data structure? Am I overcomplicating things--would a two-dimensional array be the best way to go here? As I said if this has been asked and answered, I do apologize. I did search here and didn't find anything and a Google search failed to turn up anything either.
Table driven designs can be effective.
Steve Maguire had few nice examples in Writing Solid Code .
They are also a great way to capture tests, see fit .
In your case something like:
Field1ReadonlyRules = {
'user class 1' : True,
'user class 2' : False
}
field1.readOnly = Field1ReadonlyRules[ someUser.userClass ]
As an aside you probably want to model both users and user classes/roles/groups instead of combining them.
A user typically captures who (authentication) while groups/roles capture what (permissions, capabilities)
At first blush it sounds more like you have two different types of users and they have different access levels. This could be solved by inheritance (PowerUser, User) or by containing a security object or token that sets the level for the user.
If you don't like inheritance as a rule, you could use a State pattern on the application, Decorate the user objects (Shudder) or possibly add strategy patterns for differing security levels. But I think it's a little early yet, I don't normally apply patterns until I have a firm idea of how the item will grown and be maintained.

Resources