Question: is there a (more straightforward) way of building custom parse trees at parse time in ANTLR v4?
I guess one could traverse and rewrite the automatically built tree, but I was wondering if we can still do manual tree building (or tweaking, for that matter) at parse time, similar to ANTLR v3 and earlier. The idea is that, depending on how one writes the grammar, we get a lot of useless nodes in the ANTLR-built tree, and while I get that you can override only the listener methods that interest you, one still has to check and skip useless token types, etc.
No, our experience with ANTLR 3 is that the manual AST feature inevitably resulted in code which was more difficult to maintain and understand, leading to a high rate of regression bugs for developers making any change to a grammar. Tokens are no longer omitted from the tree since it's difficult to tell which terminals will be needed by future releases of an application, and you don't want to have to change/verify all of your code operating on the parse tree if a terminal which was previously unused is now required by a new component or feature.
You can override org.antlr.v4.runtime.Parser.addContextToParseTree() to get some control over what nodes get created. Not sure this is quite what you mean by custom.
@parser::members {
    @Override
    protected void addContextToParseTree() {
        // 'code' is a rule enabled by semantic predicate 'full'
        // that matches generic lines of code.
        if (!full && _ctx instanceof CodeContext) {
            return;
        }
        // otherwise add the node to the tree
        super.addContextToParseTree();
    }
}
Related
I'm porting (or actually rewriting from scratch) a SystemVerilog grammar from ANTLR 2.7.7 to ANTLR 4.7.
SystemVerilog has plenty of directives inherited from Verilog. They can show up pretty much anywhere in the source code, so they cannot be handled by the parser. Some are interpreted by the lexer and never reach further (controlling source encryption), some are for the preprocessor (macros, conditional compilation, etc.), but there are some that live beyond that phase. These need two-way communication between the parser and the token source that handles them directly. The parser asks for the state of these directives when it encounters a construct that can be influenced by them (when a visitor is used for actions, the parser needs to ask for that information so it can be remembered as part of the context). On the other hand, the token source needs to ask the parser for the current scope while processing these directives, as some cannot be used inside any unit.
Consider the following example:
`timescale 1ns/1ps
module m;
`resetall
initial $printtimescale; //Time scale of (m) is 1ns / 1ps
endmodule
In old ANTLR 2 I know exactly when and how many tokens will be pulled from the source before the action that handles the communication between source and parser is executed. That knowledge means there is no need for fancy synchronization: the token source can interpret directives as they arrive and the parser can ask for directive state when it needs to, because as long as no syntactic predicate is used, token creation and consumption happen in step.
As far as I know, ANTLR 4 can pull as many tokens as it wants before executing any action. If the parser pulls too many tokens while processing the module header, it will cause the token source to "execute" the resetall directive, thus nullifying the previous use of timescale.
If there is no other option, the most likely workaround is for the token source to remember the tokens related to the directives it encounters and pack them, together with the related actions, into some functor. Once it needs to emit an actual token to the parser and there is at least one functor not yet emitted, it would create a subclass of the regular token that remembers the functor on top of the regular token data. This way the parser can do whatever it wants while predicting the parsing path and pulling new tokens from the token source, since no directive-related action will have been executed yet. The last bit is to redefine the standard consume routine: it has to ask the token whether it is of the "carrying a directive action" kind and, if so, execute that action. This way directive actions (and therefore the state of the directives) are synchronized with token consumption, not with token creation.
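A minimal Java sketch of that idea, for illustration only (DirectiveAction and DirectiveToken are made-up names, and I'm assuming the consume() override would live in @parser::members of the generated parser; this is untested):

import org.antlr.v4.runtime.CommonToken;
import org.antlr.v4.runtime.Token;

// Hypothetical functor carrying a deferred directive action.
interface DirectiveAction {
    void execute();
}

// Token subclass that carries the pending functor on top of the regular token data.
class DirectiveToken extends CommonToken {
    private final DirectiveAction action;

    DirectiveToken(Token base, DirectiveAction action) {
        super(base);
        this.action = action;
    }

    void runDirectiveAction() {
        action.execute();
    }
}

// Inside @parser::members { ... } of the grammar:
@Override
public Token consume() {
    Token t = getCurrentToken();
    if (t instanceof DirectiveToken) {
        // execute the directive only when the token is actually consumed,
        // not when the lookahead machinery created it
        ((DirectiveToken) t).runDirectiveAction();
    }
    return super.consume();
}

Since consume() is only called for tokens the parser actually matches, the directive state would follow the parse rather than the lookahead.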
The problem is I cannot test the above workaround at the moment, because everything up to the point of final token emission is shared between the old and the new parser. Only the very last bit creating the actual tokens is different. This means that to test the solution I'd need to apply it to the old parser as well.
Finally, the questions:
is the workaround necessary, or are there reliable techniques to control lookahead in ANTLR 4, at least for selected rules?
would the workaround work? I'm mostly concerned about two elements of the solution: is redefining consume enough, or are there other routines that can mess with the token stream that would need to be touched as well? Also, if the parser internally replaces tokens in the stream, e.g. by recreating them with some other token class, my special functor data would be lost in that case (that's probably just my paranoia speaking).
I feel like this is a very common scenario, yet I haven't found a solution that meets my needs. I want to create something that works like the code below. (Unfortunately set_variable_only_if_file_was_changed doesn't exist.)
file { 'file1':
  path   => "fo/file1",
  ensure => present,
  source => "puppet:///modules/bar/file1",
  set_variable_only_if_file_was_changed => $file_change = true
}

if $file_change {
  file { 'new file1':
    ...
  }
  exec { 'do something1':
    ...
  }
  package { 'my package':
    ...
  }
}
I have seen how an exec can be suppressed using refreshonly => true, but with files it seems harder. I would like to avoid putting code into a .sh file and executing it through an exec.
Puppet does not have the feature you describe. Resources that respond to refresh events do so in a manner appropriate to their type; there is no mechanism to define custom refresh behavior for existing resources. There is certainly no way to direct which resources will be included in a node's catalog based on the results of applying other resources -- as your pseudocode seems to suggest -- because the catalog is completely built before anything in it is applied.
Furthermore, that is not nearly as constraining as you might suppose if you're coming to Puppet with a substantial background in scripting, as many people do. People who think in terms of scripting need to make a mental adjustment to Puppet, because Puppet's DSL is definitely not a scripting language. Rather, it is a language for describing the construction of a machine state description (a catalog). It is important to grasp that such catalogs describe details of the machine's target state, not the steps required to take it from its current state to some other state. (Even Execs, a possible exception, are most effective when used in that way.) It is very rare to be unable to determine a machine's intended state from its identity and initial state. It is uncommon even for it to be difficult to do so.
How you might achieve the specific configuration you want depends on the details. Often, the answer lies in user-defined facts. For example, instead of capturing whether a given file was changed during catalog application, you provide a fact that you can interrogate during catalog building to evaluate whether that file will be changed, and adjust the catalog details appropriately. Sometimes you don't need such tests in the first place, because Puppet takes action on a given resource only when changes to it are needed.
When you receive arguments in string format from the UI inside your controller, do you pass the strings to the application service (or to the command) directly?
Or do you create value objects from the strings inside the controller?
new Command(new SomeId("id"), Weight.create("80 kg"), new Date())
or
new Command("id", "80 kg", new Date())
new Command("id", "80", "kg", new Date())
Maybe it is not important, but it bothers me.
The question is: should we couple the controller to the domain's value objects?
Imagine you don't have the web between your application layer and the presentation layer (like an Android activity or Swing); would you push the use of value objects into the UI?
Another thing: do you serialize/deserialize value objects into/from strings like this?
Weight weight = Weight.create("80 kg");
weight.getValue().equals(80.0);
weight.getUnit().equals(Unit.KILOGRAMS);
weight.toString().equals("80 kg");
In the case of passing strings into commands, I would rather pass "80 kg" instead of "80" and "kg".
Sorry if the question is not relevant or funny.
Thank you.
UPDATE
I came across this post while I was searching for information about a totally different topic: Value Objects in CQRS - where to use
They seem to prefer primitives or DTOs, and keep VOs inside the domain.
I've also taken a look at V. Vernon's book (Implementing DDD), and it talks about exactly that (-_-) in chapter 14 (p. 522).
I've noticed he's using commands without any DTOs.
someCommand.setId("id");
someCommand.setWeightValue("80");
someCommand.setWeightUnit("kg");
someCommand.setOtherWeight("80 kg");
someCommand.setDate("17/03/2015 17:28:35");
someCommand.setUserName("...");
someCommand.setUserAttribute("...");
someCommand.setUserOtherAttributePartA("...");
someCommand.setUserOtherAttributePartB("...");
It is the command object that would be mapped by the controller. Value object initialization would appear in the command handler method, and the value objects would throw in case of a bad value (self-validation at initialization).
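For illustration, a rough sketch of what I have in mind (RegisterWeightCommand, RegisterWeightHandler and the cut-down SomeId are made-up names; Weight would follow the same pattern):

// Cut-down value object that self-validates at initialization.
final class SomeId {
    private final String value;

    SomeId(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("id must not be blank");
        }
        this.value = value;
    }
}

// The command carries only strings, mapped by the controller.
final class RegisterWeightCommand {
    private String id;
    private String weightValue;
    private String weightUnit;

    public void setId(String id) { this.id = id; }
    public void setWeightValue(String weightValue) { this.weightValue = weightValue; }
    public void setWeightUnit(String weightUnit) { this.weightUnit = weightUnit; }

    public String getId() { return id; }
    public String getWeightValue() { return weightValue; }
    public String getWeightUnit() { return weightUnit; }
}

final class RegisterWeightHandler {
    public void handle(RegisterWeightCommand command) {
        // Value objects are created here; a bad value throws before anything is persisted.
        SomeId id = new SomeId(command.getId());
        // ... same for Weight (from weightValue + weightUnit), the date, etc. ...
        // ... then load the aggregate, apply the change, persist ...
    }
}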
I think I'm starting to be less bothered, but some other opinions would be welcomed.
As an introduction, this is highly opinionated and I'm sure everyone has different ideas on how it should work. But my endeavor here is to outline a strategy with some good reasons behind it so you can make your own evaluation.
Pass Strings or Parse?
My personal preference here is to parse everything in the Controller and send down the results to the Service. There are two main phases to this approach, each of which can spit back error conditions:
1. Attempt to Parse
When a bunch of strings come in from the UI, I think it makes sense to attempt to interpret them immediately. For easy targets like ints and bools, these conversions are trivial and model binders for many web frameworks handle them automatically.
For more complex objects like custom classes, it still makes sense to handle it in this location so that all parsing occurs in the same location. If you're in a framework which provides model binding, much of this parsing is probably done automatically; if not - or you're assembling a more complex object to be sent to a service - you can do it manually in the Controller.
Failure Condition
When parsing fails ("hello" is entered in an int field or 7 is entered for a bool) it's pretty easy to send feedback to the user before you even have to call the service.
2. Validate and Commit
Even though parsing has succeeded, there's still the necessity to validate that the entry is legitimate and then commit it. I prefer to handle validation in the service level immediately prior to committing. This leaves the Controller responsible for parsing and makes it very clear in the code that validation is occurring for every piece of data that gets committed.
In doing this, we can eliminate an ancillary responsibility from the Service layer. There's no need to make it parse objects - its single purpose is to commit information.
Failure Condition
When validation fails (someone enters an address on the moon, or enters a date of birth 300 years in the past), the failure should be reported back up to the caller (Controller, in this case). While the user probably makes no distinction between failure to parse and failure to validate, it's an important difference for the software.
Push Value Objects to UI?
I would accept parsed objects as far up the stack as possible, every time. If you can have someone else's framework handle that bit of transformation, why not do it? Additionally, the closer to the UI that the objects can live, the easier it is to give good, quick feedback to the user about what they're doing.
A Note on Coupling
Overall, pushing objects up the stack does result in greater coupling. However, writing software for a particular domain does involve being tightly coupled to that domain, whatever it is. If a few more components are tightly coupled to some concepts that are ubiquitous throughout the domain - or at least to the API touchpoints of the service being called - I don't see any real reduction in architectural integrity or flexibility occurring.
Parse One Big String or Components?
In general, it tends to be easiest to just pass the entire string into the Parse() method to get sorted through. Take your example of "80 kg":
"80 kg" and "120 lbs" may both be valid weight inputs
If you're passing in strings to a Parse() method, it's probably doing some fairly heavy lifting anyway. Expecting it to split a string based on a space is not overbearing.
It's far easier to call Weight.create(inputString) than it is to split inputString by " ", then call Weight.create(split[0], split[1]).
It's easier to maintain a single-string-input Parse() function as well. If some new requirement comes in that the Weight class has to support pounds and ounces, a new valid input may be "120 lbs 6 oz". If you're splitting up the input, you now need four arguments. Whereas if it's entirely encapsulated within the Parse() logic, there's no burden to outside consumers. This makes the code more extensible and flexible.
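To make that concrete, here is a rough sketch of what a single-string factory could look like (the Weight class and Unit enum below are assumptions based on the question's examples, not an existing API):

enum Unit { KILOGRAMS, POUNDS }

final class Weight {
    private final double value;
    private final Unit unit;

    private Weight(double value, Unit unit) {
        this.value = value;
        this.unit = unit;
    }

    // All parsing knowledge lives here; callers hand over the raw input as-is.
    static Weight create(String input) {
        String[] parts = input.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected '<value> <unit>', got: " + input);
        }
        double value = Double.parseDouble(parts[0]);
        Unit unit;
        switch (parts[1].toLowerCase()) {
            case "kg":  unit = Unit.KILOGRAMS; break;
            case "lbs": unit = Unit.POUNDS;    break;
            default: throw new IllegalArgumentException("unknown unit: " + parts[1]);
        }
        if (value <= 0) {
            throw new IllegalArgumentException("weight must be positive: " + value);
        }
        return new Weight(value, unit);
    }

    double getValue() { return value; }
    Unit getUnit()    { return unit; }
}

If "120 lbs 6 oz" has to be supported later, only create() changes; callers keep passing the whole string.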
The difference between a DTO and a VO is that a DTO has no behavior; it's a simple container designed to pass data around from component to component. Besides, you rarely need to compare two DTOs, and they are generally transient.
A Value Object can have behavior. Two VOs are compared by value rather than by reference, which means, for instance, that two Address value objects with the same data but different object instances will be equal. This is useful because they are generally persisted in one form or another and there are more occasions to compare them.
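As a minimal illustration of that value-based comparison (the Address fields here are just an assumption):

import java.util.Objects;

final class Address {
    private final String street;
    private final String city;

    Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    // Equality is defined by the data, not by object identity.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Address)) return false;
        Address other = (Address) o;
        return street.equals(other.street) && city.equals(other.city);
    }

    @Override
    public int hashCode() {
        return Objects.hash(street, city);
    }
}

With this, new Address("1 Main St", "Springfield").equals(new Address("1 Main St", "Springfield")) returns true even though the two instances are distinct objects.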
It turns out that in a DDD application, VOs will be declared and used in your Domain layer more often than not, since they belong to the domain's Ubiquitous Language and because of separation of concerns. They can sometimes be manipulated in the Application layer but typically won't be sent between the UI layer and the Application layer. We use DTOs for that instead.
Of course, this is debatable and depends a lot on the layers you choose to build your application out of. There might be cases when crunching your layered architecture down to 2 layers will be beneficial, and when using business objects directly in the UI won't be that bad.
I think I thoroughly understand the concepts and ideas behind SpecFlow, but even after reading the Secret Ninja Cucumber Scrolls, The Cucumber Book, and going through the various forums I'm still unsure about the path to reusability.
Our scenarios already comply with various guidelines:
Self explanatory
Must have an understandable purpose (what makes it different from the other scenarios)
Are unique
Represent vertical functional slices
Uses Ubiquitous Language
Written from the stakeholder perspective
About business functionality, not about software design
Grouped by Epics
ARE NOT TEST SCRIPTS
Let somebody else read them to see if the scenario is correct
Doesn't refer to UI elements
Represent key examples
Non-technical
Precise and testable
As repeatable as possible
'Given' steps represent state, not actions
'When' steps represent actions
'Then' should represent a visible change, not some internal event
Our steps have to comply with the following guidelines (some are specific to SpecFlow):
Uses Ubiquitous Language
Doesn't refer to UI elements
Should not be combined
Should be reusable and global over all features
Should not be linked to a specific feature
Grouped by entities, entity groups or domain concepts
Don't create steps to reuse logic in a step definitions file
Think thoroughly about which steps file a step belongs in
Don't reuse steps between phases
Literal strings in steps must be avoided, but if required use single quotes
Never apply multiple [Given], [When] or [Then] attributes to the step method
Order the steps according to the phase they represent
If it is not important for the scenario, it is very important not to mention it
But we still end up with lots of variations of the same steps, even if we use regex placeholders. The rule that you shouldn't mention anything that isn't important especially results in those variations. And yes, internally these steps do a lot of reusing, but not at the scenario level.
Consider for example the following scenario:
Feature: Signing where both persons are physically available

@Smoke
Scenario: Show remaining time to sign based on previous signature
    Given a draft proposal
    And the first signature has been set
    When I try to set the second signature
    Then the remaining time to sign should be shown

@Smoke
Scenario: Re-signing of the first proposal
    Given a signature that has not been set within the configured time
    And the first signature has just been re-signed
    When I try to set the second signature
    Then the remaining time should start over
Would it be better to combine the two 'Given' steps into one and lose some reusability?
Some other examples:
Feature: Conditionally show signatures to be signed

@Smoke
Scenario: Show the correct signature for a proposal with a night shift
    Given I have a proposal for the day shift
    When I change it to the night shift
    Then I should only be able to sign for the night shift

@Smoke
Scenario: Show additional signature when extending the shift
    Given I have a suspended proposal for the night shift
    When I extend the period to the day shift
    Then I should confirm extending the period over the shift
Am I missing a fundamental concept here?
This is not an answer, but some hints:
you can put multiple Given/When/Then attributes on the same method. If the parameters are the same and the difference is only in phrasing, this can be useful
in many projects we use the driver/page object pattern, so the step definitions are usually quite short (2-3 lines) and we worry less about the number of them
I like your scenarios and would not change them. On the other hand, try to focus on readability rather than reusability. If your language is consistent, the reusability will come.
To increase reusability, especially when there are a lot of "variations" of the entity you are talking about, you can consider using step argument transformations. Here is an example:
you need a class to represent a permit in the tests with decorations:
class PermitDescription {
    bool suspended;
    bool draft;
}
create converter methods:
[StepArgumentTransformation("permit")]
public PermitDescription CreateSimple() {
    return new PermitDescription();
}

[StepArgumentTransformation("draft permit")]
public PermitDescription CreateDraft() {
    return new PermitDescription { draft = true };
}

[StepArgumentTransformation("suspended permit")]
public PermitDescription CreateSuspended() {
    return new PermitDescription { suspended = true };
}
you can now have more flexible step definitions that require permits:
[Given(@"I have a (.*) for the day shift")]
public void Something(PermitDescription p)
{ ... }
that matches to:
Given I have a permit for the day shift
Given I have a draft permit for the day shift
Given I have a suspended permit for the day shift
of course this is a tool that can also be abused, but in some cases it can help.
Adding onto the answer from @gaspar-nagy:
It follows the pattern of class design in C# programming. Anywhere a group of classes shares common properties/methods, those properties/methods can be refactored into a base class.
What it looks like in our SpecFlow tests is that common browser operations are in the base classes:
Login()
Logout()
NavigateToUrl(string url)
UserHasPermission(string permission)
WaitForElementToAppearById(string id)
WaitForElementToAppearByClass(string class)
And each of those methods could have one or more Given/When/Then attributes, like @gaspar-nagy stated.
Another technique which proves invaluable for sharing variables between .feature files and their respective C# step files is to use the ScenarioContext.
For example, whenever Login() is called to initiate our browser-based tests, we do this:
ScenarioContext.Current.Set<IWebDriver>(driver, "driver")
Then anywhere else that needs the driver, can get it by:
var driver = ScenarioContext.Current.Get<IWebDriver>("driver")
This makes steps re-usable. For example, for user input validation tests, you may decide to pass the element being validated around like this:
ScenarioContext.Current.Set<IWebElement>(element, "validation-element")
I recently saw this video http://channel9.msdn.com/pdc2008/TL51/ about the managed Contracts library, which certainly looks very interesting. Sadly it seems they won't include it in the language itself, which would have been more elegant, as in Spec#. It would be nice to have both options in C# 4.0, in fact, as the contracts add a lot of noise to the business code.
Has anybody here used it and got some real-world feedback? Can you also add contracts to class properties and even variables? Something like
decimal Percentage (min 0, max 1)
string NotNullString (not null, regex("??"))
would maybe be nice.
I'm trying it, but I think the library is too young to be used seriously on big projects, at least with the static checks enabled: compiling is very slow and the specific warnings are not very clear to read.
Runtime checks can be used without problems, because they seem to be implemented like Debug.Assert. At least you document your methods.
In order to add contracts to properties, I would add constraints to the property setter, but in this specific case I think it would be better to write a class that actually encapsulates the requirements, allowing the construction of good objects only. Anyway:
private decimal _Percentage;

decimal Percentage
{
    get { return _Percentage; }
    set
    {
        CodeContract.RequiresAlways(value <= 1);
        CodeContract.RequiresAlways(value >= 0);
        _Percentage = value;
    }
}
P.S.: It seems to me that the trend in C# goes in the dynamic typing direction, instead of towards strict and strong typing. I think that DbC works better with strong typing, at least because it lets you add more requirements to types and functions.
It seems the Contracts library could be a nice fit for Linq2Sql. Define fields and constraints in your SQL database and contracts could be generated for you. This could be a nice introduction to Contracts.