Can I use BDD to test low-abstraction-level code? - cucumber

I checked several (real-world) BDD examples, but all I have found are e2e tests using Selenium. I was wondering, is it possible to write unit tests with BDD? If so, what should such a unit test look like in Gherkin? I have a hard time imagining what to write into the feature and scenario descriptions, and how to use them to generate documentation, for example for the Java collections framework.
edit
I have found an example here: http://jonkruger.com/blog/2010/12/13/using-cucumber-for-unit-tests-why-not/comment-page-1/
features:
Feature: Checkout

  Scenario Outline: Checking out individual items
    Given that I have not checked anything out
    When I check out <item>
    Then the total price should be the <unit price> of that item

    Examples:
      | item | unit price |
      | "A"  | 50         |
      | "B"  | 30         |
      | "C"  | 20         |
      | "D"  | 15         |

  Scenario Outline: Checking out multiple items
    Given that I have not checked anything out
    When I check out <multiple items>
    Then the total price should be the <expected total price> of those items

    Examples:
      | multiple items | expected total price | notes                |
      | "AAA"          | 130                  | 3 for 130            |
      | "BB"           | 45                   | 2 for 45             |
      | "CCC"          | 60                   |                      |
      | "DDD"          | 45                   |                      |
      | "BBB"          | 75                   | (2 for 45) + 30      |
      | "BABBAA"       | 205                  | order doesn't matter |
      | ""             | 0                    |                      |

  Scenario Outline: Rounding money
    When rounding "<amount>" to the nearest penny
    Then it should round it using midpoint rounding to "<rounded amount>"

    Examples:
      | amount | rounded amount |
      | 1      | 1              |
      | 1.225  | 1.23           |
      | 1.2251 | 1.23           |
      | 1.2249 | 1.22           |
      | 1.22   | 1.22           |
step definitions (Ruby):
require 'spec_helper'

describe "Given that I have not checked anything out" do
  before :each do
    @check_out = CheckOut.new
  end

  [["A", 50], ["B", 30], ["C", 20], ["D", 15]].each do |item, unit_price|
    describe "When I check out an individual item" do
      it "The total price should be the unit price of that item" do
        @check_out.scan(item)
        @check_out.total.should == unit_price
      end
    end
  end

  [["AAA", 130],    # 3 for 130
   ["BB", 45],      # 2 for 45
   ["CCC", 60],
   ["DDD", 45],
   ["BBB", 75],     # (2 for 45) + 30
   ["BABBAA", 205], # order doesn't matter
   ["", 0]].each do |items, expected_total_price|
    describe "When I check out multiple items" do
      it "The total price should be the expected total price of those items" do
        individual_items = items.split(//)
        individual_items.each { |item| @check_out.scan(item) }
        @check_out.total.should == expected_total_price
      end
    end
  end
end

class RoundingTester
  include Rounding
end

[[1, 1],
 [1.225, 1.23],
 [1.2251, 1.23],
 [1.2249, 1.22],
 [1.22, 1.22]].each do |amount, rounded_amount|
  describe "When rounding an amount of money to the nearest penny" do
    it "Should round the amount using midpoint rounding" do
      RoundingTester.new.round_money(amount).should == rounded_amount
    end
  end
end
I don't know a way of generating documentation based on this. It is not hopeless, though; for example, it is easy to map the Feature: Checkout to the CheckOut class, and maybe something similar can be done at the method level. Another possible solution is to write helpers specific to this task.

A key idea here is understanding the difference between describing behaviour and testing. In this context, describing behaviour is:
more abstract
easy to read by a wider audience
more focused on what you are doing and why you are doing it
less focused on 'how' you are doing something
less exhaustive; we use examples, we don't cover everything
Testing tends to be:
precise
detailed
exhaustive
technical
When you use a BDD tool, e.g. Cucumber, to write unit tests, you tend to end up with tests that are:
verbose
full of technical detail which only a few people can appreciate
very expensive to run
difficult to maintain
So we have different tools and different techniques for different sorts of 'testing'. You get the most bang for your buck by using the right tool for the right job.
Whilst the idea of using one tool for all your testing seems very appealing, in the end it's about as sensible as using one tool to fix your car - try pumping up your tyres with a hammer!

BDD describes systems as a black box. If any words in there relate to the implementation, it's no longer BDD. Inf3rno posted an example with the correct abstraction.
I always ask myself, if the user interface was gone, would I be able to keep the same feature files? If the use cases were to be carried out over voice commands, would the steps still make sense?
Another way to think about it is: step statements should be facts about a system, not instructions on how to manually test it.
good step definition
Given An account "FOO" with username <username> and password <password>
bad step definition (it only applies to the UI)
Given I am at the login page
And I enter <username> as the username
And I enter <password> as the password
Full example
Given An account "FOO" with username <username> and password <password>
When Creating an account with username <username> and password BAR
Then An error "Username already in use" is returned
Note that I could implement this last example against the user interface or against the API, but I could also implement it over voice commands ;)
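To illustrate, here is how these steps might be wired to an API rather than a UI. This is a minimal sketch using Python's behave framework rather than Cucumber/Ruby, and the in-memory account store is a made-up stand-in for the real system:

from behave import given, when, then

@given('An account "FOO" with username {username} and password {password}')
def step_account_exists(context, username, password):
    # hypothetical in-memory store standing in for the real user database
    context.accounts = {username: password}

@when('Creating an account with username {username} and password {password}')
def step_create_account(context, username, password):
    if username in context.accounts:
        context.error = "Username already in use"
    else:
        context.accounts[username] = password
        context.error = None

@then('An error "{message}" is returned')
def step_error_returned(context, message):
    assert context.error == message

Because the steps talk about accounts and errors rather than pages and buttons, the same feature file keeps working if the in-memory store is swapped for a real API client.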

Related

Pandas: Sliding window, summing up 14-day data

I do wonder how it is possible to make sliding windows in Pandas.
I have a dataframe with three columns.
Country | Number | DayOfTheYear
===================================
No | 50 | 0
No | 20 | 1
No | 37 | 2
I would love to see 14 day chunks for every country and day combination.
The country thing can be ignored for the moment, since I can filter those manually in some way. But imagine there is only one country: is there a smart way to get some sort of summed-up sliding window, resulting in something like the following?
Country | Sum | DatesOftheYear
===================================
No | 504 | 0-13
No | 207 | 1-14
No | 337 | 2-15
I would also accept it if the windows were disjoint, being only 0-13, 14-27, etc.
But I just cannot get there with Pandas. I know an old SQL solution, but does anybody have a nice idea for Pandas?
If you want a rolling window over your dataframe, you can simply use the .rolling function of pandas: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html
In your case: df["Number"].rolling(14).sum()
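For completeness, a minimal runnable sketch; the dataframe values here are made up to stand in for the question's data:

import pandas as pd

# toy data standing in for the question's dataframe
df = pd.DataFrame({
    "Country": ["No"] * 20,
    "Number": range(20),
    "DayOfTheYear": range(20),
})

# overlapping 14-day windows, computed per country
df["Sum"] = df.groupby("Country")["Number"].transform(lambda s: s.rolling(14).sum())

# disjoint 14-day chunks (0-13, 14-27, ...) instead
chunks = df.groupby(["Country", df["DayOfTheYear"] // 14])["Number"].sum()

print(df.tail())
print(chunks)

The rolling sum is NaN until a full 14-day window is available; pass min_periods=1 to .rolling if partial windows are acceptable.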

How to create an external table from a CSV file with commas in a quoted field in Greenplum?

I'm trying to create an external table from a CSV like this:
CREATE EXTERNAL TABLE hctest.ex_nkp
(
a text,
b text,
c text,
d text,
e text,
f text,
g text,
h text
)
LOCATION ('gpfdist://192.168.56.111:10000/performnkp.csv')
FORMAT 'CSV' (DELIMITER ',' HEADER);
The CSV is delimited by comma (,) and looks like this:
"Subject Username","Form Title","Form Start Date","Form End Date","Competency Name","Competency Description","Core Competency","Competency Official Rating"
"90008765","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","1. Uncompromising Integrity","<p>High ethical standards, low tolerance of unethical conduct.</p>","Yes","3"
"90008766","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","2. Team Synergy","<p>Passionately work together, ensuring completeness, to achieve common goals.</p>","Yes","3"
"90008767","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","3. Simplicity","<p>We do our utmost to deliver the easy to use solutions, exceeding customers&#39","","
"90008768","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","4. Exceptional Performance","<p>Highest level of performance, with a heart for people.</p>","Yes","3"
And I get the following error:
ERROR: extra data after last expected column (seg0 slice1 192.168.56.111:6000 pid=14121)
DETAIL: External table ex_nkp, line 5 of file gpfdist://192.168.56.111:10000/performnkp.csv
How can I resolve this?
It looks like your CSV is malformed in line 4. Notice that at the end of line 4, there is a single quote, and Greenplum is interpreting that as a CSV field with a line break. By adding the missing quote on line 4, I am able to read the file in Greenplum.
"Subject Username","Form Title","Form Start Date","Form End Date","Competency Name","Competency Description","Core Competency","Competency Official Rating"
"90008765","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","1. Uncompromising Integrity","<p>High ethical standards, low tolerance of unethical conduct.</p>","Yes","3"
"90008766","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","2. Team Synergy","<p>Passionately work together, ensuring completeness, to achieve common goals.</p>","Yes","3"
"90008767","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","3. Simplicity","<p>We do our utmost to deliver the easy to use solutions, exceeding customers&#39","","
"90008768","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","4. Exceptional Performance","<p>Highest level of performance, with a heart for people.</p>","Yes","3"
Resulting query:
fguerrero=# select * from ex_nkp ;
NOTICE: HEADER means that each one of the data files has a header row
a | b | c | d | e | f | g | h
----------+-------------------------------------------------------+------------+------------+-----------------------------+------------------------------------------------------------------------------------+-----+---
90008765 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 1. Uncompromising Integrity | <p>High ethical standards, low tolerance of unethical conduct.</p> | Yes | 3
90008766 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 2. Team Synergy | <p>Passionately work together, ensuring completeness, to achieve common goals.</p> | Yes | 3
90008767 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 3. Simplicity | <p>We do our utmost to deliver the easy to use solutions, exceeding customers&#39 | |
90008768 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 4. Exceptional Performance | <p>Highest level of performance, with a heart for people.</p> | Yes | 3
(4 rows)
Let me know if this helps.
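As an aside, a quick way to locate this kind of malformed line in a large file is to flag every line with an odd number of double quotes; a small Python sketch (the file name is an assumption):

# print line numbers whose double-quote count is odd (likely an unterminated field)
with open("performnkp.csv", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        if line.count('"') % 2 != 0:
            print(lineno, line.rstrip())

This is only a heuristic: a properly escaped quote ("") keeps the count even, so only unbalanced quoting gets flagged.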
You can specify "LOG ERRORS SEGMENT REJECT LIMIT 10" in your external table definition. This way, the segment will skip the rows with errors.
Then you can come back and trace the details with "select * from gp_read_error_log('external_table_name');"
From the example, it looks like you have an extra comma in a field. Try to specify QUOTE '"' after HEADER.

Getting multiple readings from .txt into Excel

I'm not sure if this is the correct place to ask this, but basically I have a .txt file containing values that came from 2 separate sensors.
Example of some data:
{"t":3838202,"s":0,"n":"x1","v":-1052}
{"t":3838203,"s":0,"n":"y1","v":44}
{"t":3838204,"s":0,"n":"z1","v":-84}
{"t":3838435,"s":0,"n":"x1","v":-1052}
{"t":3838436,"s":0,"n":"y1","v":36}
{"t":3838437,"s":0,"n":"z1","v":-80}
{"t":3838670,"s":0,"n":"x1","v":-1056}
{"t":3838671,"s":0,"n":"y1","v":52}
{"t":3838672,"s":0,"n":"z1","v":-88}
{"t":3838902,"s":0,"n":"x1","v":-1052}
{"t":3838903,"s":0,"n":"y1","v":48}
{"t":3838904,"s":0,"n":"z1","v":-80}
{"t":3839136,"s":0,"n":"x1","v":-1056}
{"t":3839137,"s":0,"n":"y1","v":40}
{"t":3839138,"s":0,"n":"z1","v":-80}
x2:-944
y2:108
z2:-380
{"t":3839841,"s":0,"n":"x1","v":-1052}
{"t":3839842,"s":0,"n":"y1","v":44}
{"t":3839843,"s":0,"n":"z1","v":-80}
x2:-948
y2:100
z2:-380
{"t":3840541,"s":0,"n":"x1","v":-1052}
{"t":3840542,"s":0,"n":"y1","v":40}
{"t":3840543,"s":0,"n":"z1","v":-84}
{"t":3840774,"s":0,"n":"x1","v":-1052}
{"t":3840775,"s":0,"n":"y1","v":40}
{"t":3840776,"s":0,"n":"z1","v":-84}
x2:-948
y2:108
z2:-368
I'm trying to get the data into Excel so that for each "chunk" of data in the x1/y1/z1 section, I take the last set of recorded data, discard the rest, and "pair" it with the next set of x2/y2/z2 data. I don't think I'm explaining it very well, but I basically want to take that text file and get this in Excel:
+---------+-------+----+-----+------+-----+------+
| t | x1 | y1 | z1 | x2 | y2 | z2 |
+---------+-------+----+-----+------+-----+------+
| 3839138 | -1056 | 40 | -80 | -944 | 100 | -380 |
| 3839843 | -1052 | 44 | -80 | -948 | 100 | -380 |
| 3840776 | -1052 | 40 | -84 | -948 | 108 | -368 |
+---------+-------+----+-----+------+-----+------+
I'm really stuck as to where I should even start.
I think like a programmer, so I would approach this problem in steps. If you are not a programmer, this might not be so helpful to you, and I am sorry for that.
First, define the data: how does each line of data get read and understood?
Second, write a parsing utility: a piece of code which interprets the data as it is read in and stores it in the form you want for your output.
Third, import the data into Excel.
Based on the limited data you provided, I am not sure how you determine the x1, y1, z1, x2, y2, z2 values for each t, but I assume the values enclosed in curly braces have something to do with it, based on the s, n, and v values I see in there. So, first of all, you need to clearly determine how to read the data. Take it one line at a time, and determine how you would build your output table from each line. I assume you would treat the lines enclosed in curly braces differently from the lines with standalone x/y/z values, for example.
I hope this points you in the right direction.
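To make those steps concrete, here is a minimal Python sketch of such a parsing utility. It keeps the latest x1/y1/z1 readings, emits one row whenever a z2 line completes a sensor-2 block, and writes a CSV that Excel can open directly (the file names are assumptions):

import csv
import json

rows = []
current = {}  # latest x1/y1/z1 values plus the timestamp of the most recent JSON line

with open("readings.txt") as f:
    for line in f:
        line = line.strip()
        if line.startswith("{"):
            rec = json.loads(line)
            current[rec["n"]] = rec["v"]  # e.g. current["x1"] = -1052
            current["t"] = rec["t"]       # t ends up being the last sensor-1 timestamp
        elif ":" in line:
            name, value = line.split(":")
            current[name] = int(value)    # e.g. current["x2"] = -944
            if name == "z2":
                # a z2 line closes a sensor-2 block: pair it with the last x1/y1/z1 set
                rows.append([current.get(k) for k in ("t", "x1", "y1", "z1", "x2", "y2", "z2")])

with open("readings.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["t", "x1", "y1", "z1", "x2", "y2", "z2"])
    writer.writerows(rows)

Run against the sample above, this produces one row per x2/y2/z2 block, matching the shape of the desired table.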

Add "invisible" decimal places to end of number?

I am printing a "Table" to the console. I will be using this same table structure for several different variables. However, as you can see from the output below, the lines don't all align.
One way to resolve it would be to increase the number of decimal places (e.g. 6.730000 for the standard deviation), which would push the line into place.
However, I do not want this many decimal places.
Is it possible to add extra 0s to the end of a number, and make these invisible?
I am planning on using this table structure for several variables, and the length of the mean, stddev, and median will likely never be more than 6 characters.
EDIT - I would really like to ensure that each value which appears in the table is 6 characters long, and if it is not, add additional "invisible" zeros.
Input
# Create and structure Table to store descriptive statistics for each variable.
subtitle = "| Mean | Stddev | Median |"
structure = '| {:0.2f} | {:0.2f} | {:0.2f} |'
lines = '=' * len(subtitle)
# Print table.
print(lines)
print(subtitle)
print(lines)
print(structure.format(mean, std, median))
print(lines)
Output:
======================================
| Mean | Stddev | Median |
======================================
| 181.26 | 6.73 | 180.34 |
======================================
Didn't really figure this out, but found a workaround.
I just did the following:
"| {:^6} | {:^6} | {:^6} | {:^6} | {:^6} |"
This keeps the width between the | separators consistent.
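Using the question's sample values, a minimal demonstration; note that a format spec such as {:^6.2f} combines centring, a minimum width of 6, and two decimal places in one go:

subtitle = "|  Mean  | Stddev | Median |"  # header cells padded to the same width
structure = "| {:^6.2f} | {:^6.2f} | {:^6.2f} |"
lines = "=" * len(subtitle)

print(lines)
print(subtitle)
print(lines)
print(structure.format(181.26, 6.73, 180.34))
print(lines)

Every value is centred in a 6-character field, so 6.73 prints as ' 6.73 ' and the | separators line up regardless of the number's length.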

How to dynamically create a cumulative overall total based on a non-cumulative categorical column in Excel

Slightly wordy title but here goes
I have a grid in Excel which includes 3 columns (media spend, marginal revenue returns & media channel invested in), and I want to create the column below called desired cumulative spend.
The reason the grid is structured the way it is is that it represents an optimised spend laydown, ordered by how much of each media channel's budget should be invested in until the marginal returns diminish such that it should be substituted for another media channel.
It is possible that this substitution can then be reversed back to the original channel if the new channel has a sharply diminishing curve, such that all marginal benefit associated to the new channel diminishes and the total spend level still means it is mathematically sensible to switch back to the original curve (maybe it has a lower base level but reduces less sharply). It is also possible that at the point in which the marginal benefit associated to the new channel diminishes, the best next step is to invest in a third channel.
The desired new spend column has two elements to it:
1. It is a simple accumulation of spend from row to row when the media channel is constant from row to row.
2. It is a slightly more tricky accumulation of spend when the media channel changes - then it needs to be able to reference back to the last spend level associated to the channel which has been substituted in. For row 4, the logic I am struggling with would need to be the running total from row 3, plus the new spend level associated to row 4, minus the spend level the last time this channel was used (row 2).
| row | spend | mar return | media | desired cumulative spend                         |
|-----|-------|------------|-------|--------------------------------------------------|
| 1   | £580  | 128        | chan1 | 580                                              |
| 2   | £620  | 121        | chan1 | 580+(620-580)                                    |
| 3   | £900  | 115.8      | chan2 | 580+(620-580)+900                                |
| 4   | £660  | 115.1      | chan1 | 580+(620-580)+900+(660-620)                      |
| 5   | £920  | 114        | chan2 | 580+(620-580)+900+(660-620)+(920-900)            |
| 6   | £940  | 112        | chan2 | 580+(620-580)+900+(660-620)+(920-900)+(940-920)  |
If my comment is the correct suggestion, then something like this should do it (£580 is at A2, so the first output is D2):
D2 =A2
D3 =D2+A3-IF(COUNTIF($C$2:C2,C3),INDEX(A:A,MAX(IF($C$2:C2=C3,ROW($A$2:A2)))))
D3 contains an array formula and must be confirmed with Ctrl+Shift+Enter.
Now you can simply copy down from D3.
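To make the intended logic easy to verify, here is the same running total written out as a short Python sketch, using the numbers from the table above:

# spend and channel per row, taken from the question's table
rows = [(580, "chan1"), (620, "chan1"), (900, "chan2"),
        (660, "chan1"), (920, "chan2"), (940, "chan2")]

last_spend = {}  # last spend level seen for each channel
total = 0
for spend, channel in rows:
    # add only the increment over this channel's previous spend level
    total += spend - last_spend.get(channel, 0)
    last_spend[channel] = spend
    print(total)  # prints 580, 620, 1520, 1560, 1580, 1600

This mirrors the formula: COUNTIF checks whether the channel has appeared before, and the INDEX/MAX(IF(...)) part looks up that channel's most recent spend level.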
