Can I reuse a cucumber/gherkin Example block?

I've got two different scenarios that use the same example block. I need to run the example block for two different times of the day and I'm looking for a succinct way to do this (without copy+pasting my example block).
I'm replacing the yyyymmdd with an actual date in my stepdef.
I'd like to reuse my Example block because in real life it's a MUCH longer list.
Scenario Outline: File arrives in the morning
  Given a file <Filename> arrives in the morning
  When our app runs
  Then the file should be moved to <NewFilename>
  And the date should be today

  Examples:
    | Filename | NewFilename       |
    | FileA    | NewFileA_yyyymmdd |
    | FileB    | NewFileB_yyyymmdd |

Scenario Outline: File arrives in the evening
  Given a file <Filename> arrives in the evening
  When our app runs
  Then the file should be moved to <NewFilename>
  And the date should be tomorrow

  Examples:
    | Filename | NewFilename       |
    | FileA    | NewFileA_yyyymmdd |
    | FileB    | NewFileB_yyyymmdd |
I'm implementing this in java, though I don't know if that's a relevant detail.

No, this is not supported in the Gherkin syntax. I don't often advise copy-and-paste, but this is one case where it is warranted due to a missing feature of the language.
Generally this should not be a big deal, as the example list should be small. If you really need a large number of examples, then recreating this test in code only (Java, Python, C#, etc.) might be the best idea. Most unit test libraries offer some form of data-driven tests that can provide a DRYer, more maintainable solution than Gherkin.
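For example, with JUnit 5 you could keep a single shared example list and feed it to both tests via @MethodSource. Here is a minimal sketch; rename(file, morning) is a hypothetical stand-in for your real renaming logic:

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.stream.Stream;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.MethodSource;

class FileRenamingTest {

    // The one shared "example block": both tests draw from this list.
    static Stream<String> files() {
        return Stream.of("FileA", "FileB");
    }

    // Hypothetical stand-in for the real logic: morning files are stamped
    // with today's date, evening files with tomorrow's.
    static String rename(String file, boolean morning) {
        LocalDate date = morning ? LocalDate.now() : LocalDate.now().plusDays(1);
        return "New" + file + "_" + date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    @ParameterizedTest
    @MethodSource("files")
    void morningFilesGetTodaysDate(String file) {
        String expected = "New" + file + "_"
                + LocalDate.now().format(DateTimeFormatter.BASIC_ISO_DATE);
        assertEquals(expected, rename(file, true));
    }

    @ParameterizedTest
    @MethodSource("files")
    void eveningFilesGetTomorrowsDate(String file) {
        String expected = "New" + file + "_"
                + LocalDate.now().plusDays(1).format(DateTimeFormatter.BASIC_ISO_DATE);
        assertEquals(expected, rename(file, false));
    }
}

Because the example list lives in one method, growing it to its real-life length means editing a single place.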

This is something that's better tested at a lower level. What you are testing here is your file renaming algorithm. You could write a unit test to do this, which would:
run much, much faster (100, 1,000 or even 10,000 times faster is perfectly realistic)
be much more expressive
deal with edge cases better
Once you have that done, I would write a single scenario that deals with the whole end-to-end process, and just ensures that the file is moved and renamed, e.g.
Given a file has arrived
When our app runs
Then the file should be moved
And it should be renamed
And the new name should contain the current date
Cukes are expensive to create and particularly to run, so you need to get lots of functionality exercised for each one. When you use outlines and create identical scenarios you are just wasting loads of runtime and adding complexity for little benefit.

Related

Best way to search through a very big dataset?

I have text files that contain about 12 GB worth of tweets and need to search through this dataset for keywords. What is the best way to go about doing this?
I'm familiar with Java, Python and R. I don't think my computer can handle the files if, for example, I write some sort of script that goes through each text file in Python.
"Oh, Python, or any other language, can most-certainly do it." Might take a few seconds, but the job will get done. I suggest that the best approach to your problem is: "straight ahead." Write scripts that process the files one line at a time.
Although "12 gigabytes" sounds enormous to us, to any modern-day machine it's really not that big at all.
Build hashes (associative arrays) in memory as needed. Generally avoid database operations (other than "SQLite" database files, maybe ...), but if you happen to find yourself needing "indexed file storage," SQLite is a terrific tool.
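As a sketch of that straight-ahead, one-line-at-a-time approach in Java (the file name and keywords here are made up for illustration):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TweetScan {
    public static void main(String[] args) throws IOException {
        List<String> keywords = List.of("election", "weather");  // hypothetical keywords
        Map<String, Integer> counts = new HashMap<>();           // the in-memory hash

        // One pass, one line in memory at a time: 12 GB is fine this way.
        try (BufferedReader in = Files.newBufferedReader(Path.of("tweets.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String lower = line.toLowerCase();
                for (String kw : keywords) {
                    if (lower.contains(kw)) {
                        counts.merge(kw, 1, Integer::sum);
                    }
                }
            }
        }
        counts.forEach((kw, n) -> System.out.println(kw + ": " + n));
    }
}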
. . . with one very important caveat: "when using SQLite, use transactions, even when reading." By default, SQLite will physically commit every write and physically verify every read, unless you are in a transaction. Then, and only then, will it "lazy read/write," as you might have expected it to do all the time. (And then, "that sucker's f-a-s-t...!")
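If you drive SQLite from Java, that caveat translates to turning off JDBC's auto-commit, since auto-commit is exactly the commit-every-statement behaviour described above. A minimal sketch, assuming the Xerial sqlite-jdbc driver is on the classpath and using a made-up hits table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class SqliteBatch {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:tweets.db")) {
            conn.setAutoCommit(false);  // one transaction instead of one commit per statement
            try (Statement ddl = conn.createStatement()) {
                ddl.execute("CREATE TABLE IF NOT EXISTS hits (keyword TEXT, line_no INTEGER)");
            }
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO hits (keyword, line_no) VALUES (?, ?)")) {
                for (int i = 0; i < 100_000; i++) {  // stand-in for the real scan loop
                    ins.setString(1, "example");
                    ins.setInt(2, i);
                    ins.executeUpdate();
                }
            }
            conn.commit();  // a single physical commit at the end
        }
    }
}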
If you want to be exact, then you need to look at every file at least once, so if your computer can't take that load, then say goodbye to exactness.
Another approach would be to use approximation algorithms, which are faster than the exact ones but come at the expense of losing accuracy.
That should get you started, and I will stop my answer here, since the topic is just too broad to continue from here.

Maximum/Optimal number of Cucumber test scenarios in one file

I have a Cucumber feature file with over 66 scenarios! The title of the feature file does represent what the scenarios are all about.
But 66 (200 steps) feels like quite a large number. Does this suggest that my feature title is too broad?
What is the maximum number of scenarios one should have in a single feature file (from a best practice point of view)?
Thanks in advance :)
Although I don't know your system and feature file, I can surely say that there is a misunderstanding of scenarios and their purpose.
The purpose of scenarios is to bring a clarification for the feature by examples. Usually, people tend to write scenarios to cover all use cases. If you do scenarios that way, the feature loses the ability to be human-readable.
Keep in mind that acceptance tests are expensive to write and expensive to change. Write the minimum number of scenarios. If there is a scenario that doesn't bring any additional value for the understanding of the feature, then that scenario shouldn't be there. Move all use cases into a lower level of testing: unit tests.
In most cases, a feature has a number of scenarios in the single digits, or in the tens if it's a complex feature.
Edit: If the number of scenarios got close to 10, I would rather split the feature file into more files, each describing a deeper part of the feature.
Yes, 200 is an unusually large number of scenarios for a single file. It is likely to be hard to find a particular scenario in the file or to keep it organized. (Multiple smaller files are easier to organize; a directory of files is easier for people to understand and maintain than a long file with comments or worse yet some uncommented ordering scheme.) It will also take a long time to run the file, which will make development difficult.
More importantly, 200 scenarios for a single feature might mean that the feature is extremely complex or that it is very broad. In either case it can probably be broken up into multiple smaller feature files. It also might mean that there are too many scenarios. There might be a scenario for every value of some variable (it might be sufficient to write a single scenario and not worry about different values) or a scenario for every detail of every feature (it might be better to write unit tests, which are smaller and more focused and faster, for details).
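For example, one broad account.feature with dozens of scenarios might become something like this (hypothetical names, for illustration):

features/
  account-registration.feature
  account-login.feature
  account-recovery.feature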
But, as with any software metric about the size of a piece of code, there might be a typical size, but every problem is different. Your feature might really be that complex. We can't say without understanding the domain and seeing the feature file.

Time virtualisation on Linux

I'm attempting to test an application which has a heavy dependency on the time of day. I would like to have the ability to execute the program as if it was running in normal time (not accelerated) but on arbitrary date/time periods.
My first thought was to abstract the time retrieval function calls with my own library calls which would allow me to alter the behaviour for testing but I wondered whether it would be possible without adding conditional logic to my code base or building a test variant of the binary.
What I'm really looking for is some kind of localised time domain, is this possible with a container (like Docker) or using LD_PRELOAD to intercept the calls?
I also saw a patch that enabled time to be disconnected from the system time using unshare(CLONE_TIME), but it doesn't look like this got in.
It seems like a problem that must have been solved numerous times before; anyone willing to share their solution(s)?
Thanks
AJ
Whilst alternative solutions and tricks are great, I think you're severely overcomplicating a simple problem. It's completely common and acceptable to include certain command-line switches in a program for testing/evaluation purposes. I would simply include a command line switch like this that accepts an ISO timestamp:
./myprogram --debug-override-time=2014-01-01T12:34:56Z
Then at startup, if the switch is set, compute the offset from the current system time, and make a local apptime() function which corrects the output of the regular system time by that offset, and call that everywhere in your code instead.
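If your application happens to run on the JVM, the same idea maps directly onto java.time's Clock. A minimal sketch (the flag name is taken from the example above; everything else is an assumption):

import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

public class App {
    // All code asks this clock for the time instead of calling Instant.now().
    private static Clock appClock = Clock.systemUTC();

    static Instant apptime() {
        return Instant.now(appClock);
    }

    public static void main(String[] args) {
        for (String arg : args) {
            if (arg.startsWith("--debug-override-time=")) {
                Instant fake = Instant.parse(arg.substring("--debug-override-time=".length()));
                // An offset clock still ticks in real time, but starts at the fake instant.
                appClock = Clock.offset(Clock.systemUTC(), Duration.between(Instant.now(), fake));
            }
        }
        System.out.println("App time is " + apptime());
    }
}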
The big advantage of this is that anyone can reproduce your testing results without a big read-up on custom Linux tricks, including an external testing team or a future co-developer who's good at coding but not at runtime tricks. When (unit) testing, it's a major advantage to be able to just call your code with a simple switch and test the results for equality against a sample set.
You don't even have to document it; lots of production tools in enterprise-grade products have hidden command-line switches for this kind of behaviour that the 'general public' need not know about.
There are several ways to query the time on Linux. Read time(7); I know at least time(2), gettimeofday(2), clock_gettime(2).
So you could use LD_PRELOAD tricks to redefine each of these to, for example, subtract a fixed number of seconds from the seconds part (not the microsecond or nanosecond part), given e.g. by some environment variable. See this example as a starting point.

How to organize my spec files?

I'm using mocha to test my node.js application.
I notice that my spec files are getting bigger and bigger over time. Are there any patterns for organizing the test files (e.g. one spec file per test)? Or are there other frameworks on top of mocha to help me structure the tests? Or do you prefer other test frameworks for that reason?
Large test/spec files tend to mean the code under test might be doing too much. This is not always the case though; often your test code will outweigh the code under test, but if you are finding the files hard to manage, this might be a sign.
I tend to group tests based on functionality. Imagine if we have example.js, I would expect example.tests.js to begin with.
Rather than one spec called ExampleSpec, I tend to have many specs/tests based around different contexts. For example I might have EmptyExample, ErrorExample, and DefaultExample, which have different preconditions. If these become too large, you either have missing abstractions or should then think about splitting the files out. So you could end up with a directory structure such as:
specs/
  Example/
    EmptyExample.js
    ErrorExample.js
    DefaultExample.js
To begin with though, one test/spec file per production file should be the starting point. Only separate if need be.

Writing easily modified code

What are some ways in which I can write code that is easily modified?
The one I have learned from experience is that I almost always need to write one to throw away. That way I have developed a sense of the domain knowledge and program structure required before coding the actual application.
The general guidelines are of course:
High cohesion, low coupling
Don't repeat yourself
Recognize design patterns and implement them
Don't recognize design patterns where they don't exist or aren't necessary
Use a coding standard and stick to it
Comment everything that should be commented; when in doubt, comment
Use unit tests
Write comments and tests before implementation; that way you know exactly what you want to do
And when it goes wrong: refactor, refactor, refactor. With good tests you can be sure nothing breaks.
And oh yeah, read this: http://www.pragprog.com/the-pragmatic-programmer
Everything (I think) above and more is in it.
I think your emphasis on modifiability is more important than readability. It is not hard to make something easy to read, but the real test of how well it is understood comes when someone else (or you) has to modify it in response to changing requirements.
What I try to do is assume that modifications will be necessary, and if it is not really clear how to do them, leave explicit directions in the code for how to do them.
I assume that I may have to do some educating of the reader of the code to get him or her to know how to modify the code properly. This requires energy on my part, and it requires energy on the part of the person reading the code.
So while I admire the idea of literate programming, that can be easily read and understood, sometimes it is more like math, where the only way to do it is for the reader to buckle down, pay close attention, re-read it a few times, and make sure they understand.
Readability helps a lot: If you do something non-obvious, or you are taking a shortcut, comment. Comments are places where you can go back and refactor if you have time later. Use sensible names for everything, makes it easier to understand what is going on.
Continuous revision will let you move from that first draft to a better one without throwing away (too much) work. Any time you rewrite from scratch you may lose lessons learned. As you code, use refactoring tools to eliminate code representing areas of exploration that are no longer needed, and to make obvious things that were obscure. The first one reduces the amount that you need to maintain; the second reduces the effort per square foot. (Sqft makes about as much sense as lines of code, really.)
Modularize appropriately and enforce encapsulation and separation of logic between your modules. You don't want too many dependencies on any one part of the code or that part becomes inherently harder to understand.
Consider using tried-and-true methods over cutting-edge ones. You give up some functionality for predictability.
Finally, if this is code that people will be using before and after modification, you need(ed) to have an appropriate API insulating your code from theirs. Having a strong API lets you change things behind the scenes without needing to alert all your consumers. I think there's a decent article on Coding Horror about this.
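As a tiny sketch of that insulation (the names here are invented): consumers compile against the interface alone, so the implementation behind it can be swapped without touching their code:

// Consumers program against this interface only.
public interface UserStore {
    String lookupEmail(String userId);
}

// Today's implementation; it can be replaced later by a database- or
// cache-backed one without any consumer changing a line of code.
class InMemoryUserStore implements UserStore {
    private final java.util.Map<String, String> emails = new java.util.HashMap<>();

    public void add(String userId, String email) {
        emails.put(userId, email);
    }

    @Override
    public String lookupEmail(String userId) {
        return emails.get(userId);
    }
}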
Hang Your Code Out to D.R.Y.
I learned this early when assigned the task of changing the appearance of a web-interface. The code was in C, which I hated, and was compiled to a CGI executable. And, worse, it was built on a library that was abandoned—no updates, no support, and too many man-hours put into its use to change it. On top of the framework was a disorderly web of code, consisting of various form and element builders, custom string implementations, and various other arcane things (for a non-C programmer to commit suicide with).
For each change I made there were several, sometimes many, exceptions in the output HTML. Each one of these exceptions required a small change or improvement in the form builder; thanks to the language there was no inheritance, only functions and structs, and instead of putting the hours in, the team had written these exceptions again and again.
In my inexperience I was forced to change the output of each exception, rather than consolidate the changes in an improved form builder. But, trawling through 15,000 lines of code for several hours after ineffective changes would induce code-burn, and a fogginess that took a night's sleep to cure.
Always run your code through the DRY-er.
The easiest way to modify code is NOT to write code. If you are unsure, write pseudocode, not just for the algorithm but for how your code should be structured.
Designing while writing code never works... for me :-)
Here is my current experience: I'm working (in Java) with a kind of database schema that changes often (fields added/removed, data types modified). My strategy is to parse this schema and generate the code with Apache Velocity. The generated BaseClass is never modified by the programmer. Instead, a MyClass extends BaseClass is created, and the logical components of this class (e.g. toString()!) are implemented using the getters and setters of the superclass.
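The generated/hand-written split looks roughly like this (class and field names are invented for illustration):

// BaseClass.java: regenerated from the schema on every change; never edited by hand.
public class BaseClass {
    private String name;
    private int version;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getVersion() { return version; }
    public void setVersion(int version) { this.version = version; }
}

// MyClass.java: hand-written logic, safe from regeneration, built on the getters/setters.
public class MyClass extends BaseClass {
    @Override
    public String toString() {
        return getName() + " (v" + getVersion() + ")";
    }
}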
