What do you think of the Managed Contract Tools library

What do you think of the Managed Contract Tools library - c#-4.0

I recently saw this video
http://channel9.msdn.com/pdc2008/TL51/ about the managed Contract tools library which certainly looks very interesting. Sadly it seems they won't include this into the language itself which would have been more elegant as in Spec#. It would be nice to have is as both options in C#4.0 in fact as the Contracts adds a lot of noise to the business code.
Have anybody here used it and have some real world feedback? Could you also add contracts to class Properties and even variables? Something like
decimal Percentage (min 0, max 1)
string NotNullString (not null, regex("??"))
would maybe been nice.

I'm trying it, but I think that the library is too young to be used seriously on big projects, at least with the static checks enabled: compiling is very slow and the specific warning are not very clear to read.
Runtime checkings can be used without problems, because they seems to be implemented as Debug.Assert. At least you document methods.
In order to add contracts to properties i would add constraints to the set property, but in this specific case I think it would be better writing a class that could actually encapsulate the requirements, allowing the construction of good objects only. Anyway:
private decimal _Percentage;
decimal Percentage
{
get{ return _Percentage;}
set
{
CodeContract.RequiresAlways(value <= 1);
CodeContract.RequiresAlways(value >= 0);
_Percentage = value;
}
}
p.s: It seems to me that the trend in C# goes in the dynamic typing direction, instead of going to the strict and strong typing coding methods. I think that DbC works better with strong typing, at least because it let you add more requirements to types and to functions.

It seems the Contract Library could be a nice fit for Linq2Sql. Define fields and constraint in your sql database and contracts could be generated for you. This could be a nice introduction to Contracts.

Related

Should Spark transformation be broken to functions

It looks like all the Spark examples found in web are built in as single long function (usually in main)
But it is often the case that it makes sense to break the long call into functions like:
Improve readability
Share code between solution paths
A typical signature would look like this (this is Java code but similar signature would appear in all languages)
private static Dataset<Row> myFiltering(Dataset<Row> data) {
return data.filter(functions.col("firstName").isNotNull()).filter(functions.col("lastName").isNotNull());
}
The problem here is that there is no safety for the content of the Row, as there is no enforcement on the fields, and calling the function becomes not only a matter of matching the singnature but also the content of Row. Which obviously may (and does in my case) cause errors.
What is the best practice you enforce in large scale development environments? do you leave the code as one long function? do you suffer every time you change field names?

Yes, you should split your method into smaller methods. Be aware, that many small functions also are not much readable ;)
My rules:
Split some chain of transformation if we can name it - if we have name, it means that this is some kind of subfunction
If there are many functions of the same domain - extract new class or even package.
KISS: don't split if your function call chain is short and also describes some subfunction, even if some lines may be extracted - it is not as much readable to read many many custom functions
Any larger - not in 1-3 lines - filters, maps I suggest to extract to method and use method reference
Marked as Community Wiki, because this is only my point of view and it's OT for StackOverflow. It someone else has any other suggestions, please share it :)

Unused object properties bad?

I have a object list with some properties that dont need .I always include the ones I know I won't use anyway because "maybe I need them later" excuse. Is there a clear answer if this is bad or not? I imagine it is a tiny waste of performance, but does it really matter?

It depends.
I often find that it is just as easy to add properties in later when I know for sure that I need them, and how I will use them.
Depending on what your compiler/interpreter does with the unused properties then you might get (very negligible) performance issues[citation for C#].
If you are just hacking together some code, then you are probably not going to hurt anything. On the other hand, if you are working with a team or even on a public API then having unused variables could be very harmful. Imagine I have some interface like this:
interface IMyInterface {
void DoStuff(); // implemented
bool WarningAsError {get; set;} // just speculative - no implementation yet
}
Anybody using this interface will probably assume that they can set the WarningAsError flag and your interface will treat warnings as errors. This could waste huge amounts of time wondering why the warnings are not treated as errors.

Building custom Parse Trees in ANTLR v4

Question: is there a (more straightforward) way of building custom parse trees at parse time in ANTLR v4?
I guess that one could traverse and rewrite the automatically built tree but I was wondering if we can still do manual tree building (or tweaking, for that matter) at parse time (similar to ANTLR v3 and ealier). The idea is that, depending on how one writes his/her grammar, we get a lot of useless nodes in the ANTLR-built tree and while I get it that you can override only the listener methods that interest you, one still has to check and skip useless token types, etc.

No, our experience with ANTLR 3 is the manual AST feature inevitably resulted in code which was more difficult to maintain and understand, leading to a high rate of regression bugs for developers making any change to a grammar. Tokens are no longer omitted from the tree since it's difficult to tell which terminals will be needed by future releases of an application, and you don't want to have to change/verify all of your code operating on the parse tree if a terminal which was previously unused is now required by a new component or feature.

You can override org.antlr.v4.runtime.Parser.addContextToParseTree() to get some control over what nodes get created. Not sure this is quite what you mean by custom.
#parser::members {
#Override
protected void addContextToParseTree() {
// code is a rule enabled by semantic predicate 'full'
// that matches generic lines of code.
if(!full && _ctx instanceof CodeContext){
return;
}
// otherwise add the node to the tree..
super.addContextToParseTree();
}
}

How to increase reusability between SpecFlow/Gherkin steps?

I think I thoroughly understand the concepts and ideas behind SpecFlow, but even after reading the Secret Ninja Cucumber Scrolls, The Cucumber Book, and going through the various forums I'm still unsure about the path to reusability.
Our scenarios already comply to various guidelines
Self explanatory
Must have a understandable purpose (what makes it different from the other scenarios)
Are unique
Represent vertical functional slices
Uses Ubiquitous Language
Written from the stakeholder perspective
About business functionality, not about software design
Grouped by Epics
ARE NOT TEST SCRIPTS
Let somebody else read them to see if the scenario is correct
Doesn't refer to UI elements
Represent key examples
Non-technical
Precise and testable
As repeatable as possible
'Given' represent state, not actions
'When' represent actions
'Then' should represent a visible change, not some internal event
Our steps have to comply to the following guidelines (some are specific to SpecFlow):
Uses Ubiquitous Language
Doesn't refer to UI elements
Should not be combined
Should be reusable and global over all features
Should not be linked to a specific feature
Grouped by entities, entity groups or domain concepts
Don't create steps to reuse logic in a step definitions file
Think thoroughly in what Steps file a step belongs
Don't reuse steps between phases
Literal strings in steps must be avoided, but if required use single quotes
Never apply multiple [Given], [When] or [Then] attributes to the step
method
Order the steps according to the phase they represent
If it is not important for the scenario, it is very important not to mention it
But we still end up with lots of variations of the same steps, even if we use regex placeholders. Especially the rule that if something is not important, you shouldn't mention it results in those variations. And yes, internally these steps do a lot of reusing, but not in the scenario.
Consider for example the following scenario:
Feature: Signing where both persons are physically available
#Smoke
Scenario: Show remaining time to sign based on previous signature
Given a draft proposal
And the first signature has been set
When I try to set the second signature
Then the remaining time to sign should be shown
#Smoke
Scenario: Re-signing of the first proposal
Given a signature that has not been set within the configured time
And the first signature has just been re-signed
When I try to set the second signature
Then the remaining time should start over
Would it be better to combine the two 'given' steps into one and loose some reusability?
Some other examples:
Feature: Conditionally show signatures to be signed
#Smoke
Scenario: Show the correct signature for a proposal with a night shift
Given I have a proposal for the day shift
When I change it to the night shift
Then I should only be able to sign for the night shift
#Smoke
Scenario: Show additional signature when extending the shift
Given I have a suspended proposal for the night shift
When I extend the period to the day shift
Then I should confirm extening the period over the shift
Am I missing a fundamental concept here?

This is not an answer, but some hints:
you can put multiple Given/When/Then attributes on the same method. If the parameters are the same and the difference is only in phrasing, this can be useful
in many project we use driver/page object pattern, so the step definitions are usually quite short (2-3 lines), so we bother less about the number of them
I like your scenarios, I would not change them. On the other hand try to focus on the readability and not the reusability. If your language is consistent, the reusability will come.
For increasing the reusability especially when there are a lot of "variations" of the entity you are talking about, you can consider using the step argument transformations. Here is an example:
you need a class to represent a permit in the tests with decorations:
class PermitDescription{
bool suspended;
bool draft;
}
create converter methods:
[StepArgumentTransformation("permit")]
public PermitDescription CreateSimple(){
return new PermitDescription();
}
[StepArgumentTransformation("draft permit")]
public PermitDescription CreateDraft(){
return new PermitDescription() { draft = true; }
}
[StepArgumentTransformation("suspended permit")]
public PermitDescription CreateSuspended(){
return new PermitDescription() { suspended = true; }
}
you can have now more flexible step definitions that require permits:
[Given(#"I have a (.*) for the day shift")]
public void Something(PermitDescription p)
{ ... }
that matches to:
Given I have a permit for the day shift
Given I have a draft permit for the day shift
Given I have a suspended permit for the day shift
of course this is tool that can be also abused, but in some cases it can help.

Adding onto the answer from #gaspar-nagy
It follows the pattern of class design in C programming. Anywhere a common group of classes share common properties/methods, those properties/methods can be refactored into a base class.
What it looks like in our SpecFlow tests is that common browser operations are in the base classes:
Login()
Logout()
NavigateToUrl(string url)
UserHasPermission(string permission)
WaitForElementToAppearById(string id)
WaitForElementToAppearByClass(string class)
And each of those methods could have 1 or more Given/When/Then attributes like #gasper-nagy stated.
Another technique which proves invaluable is to share variables between .features and their respective C# step files is to use the ScenarioContext.
For example, whenever Login() is called to initiate our browser based tests, we do this:
ScenarioContext.Current.Set<IWebDriver>(driver, "driver")
Then anywhere else that needs the driver, can get it by:
var driver = ScenarioContext.Current.Get<IWebDriver>("driver")
This makes steps re-usable, such as user input tests for validation you may decide to pass the element being validated around like this:
ScenarioContext.Current.Set<IWebElement>(element, "validation-element")

Meaning of Leaky Abstraction?

What does the term "Leaky Abstraction" mean? (Please explain with examples. I often have a hard time grokking a mere theory.)

Here's a meatspace example:
Automobiles have abstractions for drivers. In its purest form, there's a steering wheel, accelerator and brake. This abstraction hides a lot of detail about what's under the hood: engine, cams, timing belt, spark plugs, radiator, etc.
The neat thing about this abstraction is that we can replace parts of the implementation with improved parts without retraining the user. Let's say we replace the distributor cap with electronic ignition, and we replace the fixed cam with a variable cam. These changes improve performance but the user still steers with the wheel and uses the pedals to start and stop.
It's actually quite remarkable... a 16 year old or an 80 year old can operate this complicated piece of machinery without really knowing much about how it works inside!
But there are leaks. The transmission is a small leak. In an automatic transmission you can feel the car lose power for a moment as it switches gears, whereas in CVT you feel smooth torque all the way up.
There are bigger leaks, too. If you rev the engine too fast, you may do damage to it. If the engine block is too cold, the car may not start or it may have poor performance. And if you crank the radio, headlights, and AC all at the same time, you'll see your gas mileage go down.

It simply means that your abstraction exposes some of the implementation details, or that you need to be aware of the implementation details when using the abstraction. The term is attributed to Joel Spolsky, circa 2002. See the wikipedia article for more information.
A classic example are network libraries that allow you to treat remote files as local. The developer using this abstraction must be aware that network problems may cause this to fail in ways that local files do not. You then need to develop code to handle specifically errors outside the abstraction that the network library provides.

Wikipedia has a pretty good definition for this
A leaky abstraction refers to any implemented abstraction, intended to reduce (or hide) complexity, where the underlying details are not completely hidden
Or in other words for software it's when you can observe implementation details of a feature via limitations or side effects in the program.
A quick example would be C# / VB.Net closures and their inability to capture ref / out parameters. The reason they cannot be captured is due to an implementation detail of how the lifting process occurs. This is not to say though that there is a better way of doing this.

Here's an example familiar to .NET developers: ASP.NET's Page class attempts to hide the details of HTTP operations, particularly the management of form data, so that developers don't have to deal with posted values (because it automatically maps form values to server controls).
But if you wander beyond the most basic usage scenarios the Page abstraction begins to leak and it becomes hard to work with pages unless you understand the class' implementation details.
One common example is dynamically adding controls to a page - the value of dynamically-added controls won't be mapped for you unless you add them at just the right time: before the underlying engine maps the incoming form values to the appropriate controls. When you have to learn that, the abstraction has leaked.

Well, in a way it is a purely theoretical thing, though not unimportant.
We use abstractions to make things easier to comprehend. I may operate on a string class in some language to hide the fact that I'm dealing with an ordered set of characters that are individual items. I deal with an ordered set of characters to hide the fact that I'm dealing with numbers. I deal with numbers to hide the fact that I'm dealing with 1s and 0s.
A leaky abstraction is one that doesn't hide the details its meant to hide. If call string.Length on a 5-character string in Java or .NET I could get any answer from 5 to 10, because of implementation details where what those languages call characters are really UTF-16 data-points which can represent either 1 or .5 of a character. The abstraction has leaked. Not leaking it though means that finding the length would either require more storage space (to store the real length) or change from being O(1) to O(n) (to work out what the real length is). If I care about the real answer (often you don't really) you need to work on the knowledge of what is really going on.
More debatable cases happen with cases like where a method or property lets you get in at the inner workings, whether they are abstraction leaks, or well-defined ways to move to a lower level of abstraction, can sometimes be a matter people disagree on.

I'll continue in the vein of giving examples by using RPC.
In the ideal world of RPC, a remote procedure call should look like a local procedure call (or so the story goes). It should be completely transparent to the programmer such that when they call SomeObject.someFunction() they have no idea if SomeObject (or just someFunction for that matter) are locally stored and executed or remotely stored and executed. The theory goes that this makes programming simpler.
The reality is different because there's a HUGE difference between making a local function call (even if you're using the world's slowest interpreted language) and:
calling through a proxy object
serializing your parameters
making a network connection (if not already established)
transmitting the data to the remote proxy
having the remote proxy restore the data and call the remote function on your behalf
serializing the return value(s)
transmitting the return values to the local proxy
reassembling the serialized data
returning the response from the remote function
In time alone that's about three orders (or more!) of magnitude difference. Those three+ orders of magnitude are going to make a huge difference in performance that will make your abstraction of a procedure call leak rather obviously the first time you mistakenly treat an RPC as a real function call. Further a real function call, barring serious problems in your code, will have very few failure points outside of implementation bugs. An RPC call has all of the following possible problems that will get slathered on as failure cases over and above what you'd expect from a regular local call:
you might not be able to instantiate your local proxy
you might not be able to instantiate your remote proxy
the proxies may not be able to connect
the parameters you send may not make it intact or at all
the return value the remote sends may not make it intact or at all
So now your RPC call which is "just like a local function call" has a whole buttload of extra failure conditions you don't have to contend with when doing local function calls. The abstraction has leaked again, even harder.
In the end RPC is a bad abstraction because it leaks like a sieve at every level -- when successful and when failing both.

What is abstraction?
Abstraction is a way of simplifying the world.
It means you don't have to worry about what is actually happening under the hood.
Example: Flying a 737/747 is "abstracted" away
Planes are complicated systems: it involves: jet engines, oxygen systems, electrical systems, landing gear systems etc.
...but the pilot doesn't have to worry about it... all of that is "abstracted away". The only thing a pilot needs to focus on is yoke (i.e. steering wheel of the plane).
He pushes the yoke left to go left, and right to go right etc.
....that is in an ideal world. In reality, flying a plane is much more complicated. Because many details ARE NOT "abstracted away".
Leaky Abstractions in 737 Example
Pilots in reality have to worry about a LOT of things: wind speed, thrust, angles of attack, fuel, altitude, weather problems, angles of descent. Computers can help the pilot in these tasks, but not everything is automated / simplified......not everything is "abstracted away".
e.g. If the pilot pulls up too hard on the column - the plane will obey, but then the plane might stall, and that's really bad.
In other words, it is not enough for the pilot to simply control the steering wheel without knowing anything else.........nooooo.......the pilot must know about the underlying risks and limitations of the plane before the pilot flies one.......the pilot must know how the plane works, and how the plane flies; the pilot must know implementation details..... that pulling up too hard will lead to a stall, or that landing too steeply will destroy the plane etc.
Those things are not abstracted away. A lot of things are abstracted, but not everything. The abstraction is "leaky".
Leaky Abstractions in Code
......it's the same thing in your code. If you don't know the underlying implementation details, then you're gonna have problems.
ORMs abstract a lot of the hassle in dealing with database queries, but if you've ever done something like:
User.all.each do |user|
puts user.name # let's print each user's name
end
Then you will realise that's a nice way to kill your app. You need to know that calling User.allwith 25 million users is going to spike your memory usage, and is going to cause problems. You need to know some underlying details. The abstraction is leaky.

An example in the django ORM many-to-many example:
Notice in the Sample API Usage that you need to .save() the base Article object a1 before you can add Publication objects to the many-to-many attribute. And notice that updating the many-to-many attribute saves to the underlying database immediately, whereas updating a singular attribute is not reflected in the db until the .save() is called.
The abstraction is that we are working with an object graph, where single-value attributes and mult-value attributes are just attributes. But the implementation as a relational database backed data store leaks... as the integrity system of the RDBS appears through the thin veneer of an object interface.

The fact that at some point, which will guided by your scale and execution, you will be needed to get familiar with the implementation details of your abstraction framework in order to understand why it behave that way it behave.
For example, consider this SQL query:
SELECT id, first_name, last_name, age, subject FROM student_details;
And its alternative:
SELECT * FROM student_details;
Now, they do look like a logically equivalent solutions, but the performance of the first one is better due the individual column names specification.
It's a trivial example but eventually it comes back to Joel Spolsky quote:
All non-trivial abstractions, to some degree, are leaky.
At some point, when you will reach a certain scale in your operation, you will want to optimize the way your DB (SQL) works. To do it, you will need to know the way relational databases works. It was abstracted to you in the beginning, but it's leaky. You need to learn it at some point.

Assume, we have the following code in a library:
Object[] fetchDeviceColorAndModel(String serialNumberOfDevice)
{
//fetch Device Color and Device Model from DB.
//create new Object[] and set 0th field with color and 1st field with model value.
}
When the consumer calls the API, they get an Object[]. The consumer has to understand that the first field of the object array has color value and second field is the model value. Here the abstraction has leaked from library to the consumer code.
One of the solutions is to return an object which encapsulates Model and Color of the Device. The consumer can call that object to get the model and color value.
DeviceColorAndModel fetchDeviceColorAndModel(String serialNumberOfTheDevice)
{
//fetch Device Color and Device Model from DB.
return new DeviceColorAndModel(color, model);
}

Leaky abstraction is all about encapsulating state. very simple example of leaky abstraction:
$currentTime = new DateTime();
$bankAccount1->setLastRefresh($currentTime);
$bankAccount2->setLastRefresh($currentTime);
$currentTime->setTimestamp($aTimestamp);
class BankAccount {
// ...
public function setLastRefresh(DateTimeImmutable $lastRefresh)
{
$this->lastRefresh = $lastRefresh;
} }
and the right way(not leaky abstraction):
class BankAccount
{
// ...
public function setLastRefresh(DateTime $lastRefresh)
{
$this->lastRefresh = clone $lastRefresh;
}
}
more description here.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string