Controlling lookahead depth or synchronizing token source with token consumption - antlr4

I'm porting (actually rewriting from scratch) a SystemVerilog grammar from ANTLR 2.7.7 to ANTLR 4.7.
SystemVerilog has plenty of directives inherited from Verilog. They can show up pretty much anywhere in the source code, so they cannot be handled by the parser.
Some are interpreted by the lexer and never travel further (controlling source encryption), some are for the preprocessor (macros, conditional compilation, etc.), but some live beyond that phase. These need two-way communication between the parser and the token source that handles them. The parser asks for the state of these directives when it encounters a construct that can be influenced by them (when a visitor is used for actions, the parser needs to ask for that information so it can be remembered as part of the context). On the other hand, the token source needs to ask the parser for the current scope while processing these directives, as some cannot be used inside any unit.
Consider the following example:
`timescale 1ns/1ps
module m;
`resetall
initial $printtimescale; //Time scale of (m) is 1ns / 1ps
endmodule
In the old ANTLR2 grammar I know exactly when and how many tokens will be pulled from the source before the action that handles communication between source and parser is executed. That knowledge means there is no need for fancy synchronization: the token source can interpret directives as they arrive, and the parser can ask for directive state whenever it needs to, because as long as no syntactic predicate is used, token creation and consumption happen in step.
As far as I know, ANTLR4 can pull as many tokens as it wants before executing any action. If the parser pulls too many tokens while processing the module header, it will cause the token source to "execute" the resetall directive, thus nullifying the previous use of timescale.
If there is no other option, the likely workaround is for the token source to remember the tokens related to the directives it encounters and pack them, along with the related actions, into some functor. Once it needs to emit an actual token to the parser and there is at least one functor not yet emitted, it would create a subclass of the regular token that remembers the functor on top of the regular token data. This way the parser can do whatever it wants while predicting the parse path and pulling new tokens from the token source: no directive-related action will be executed yet. The last bit is to redefine the standard consume routine. It will have to ask the token whether it is of the "carrying a directive action" kind and, if so, execute that action. This way directive actions (and therefore the state of directives) are synchronized with token consumption, not with token creation.
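A rough sketch of what I have in mind, using the ANTLR4 Java runtime (DirectiveAction, DirectiveToken and DirectiveAwareParser are placeholder names, and the token-source side that actually attaches the deferred actions is omitted):

import org.antlr.v4.runtime.CommonToken;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.TokenStream;

// Deferred directive action, captured by the token source when it sees a directive.
interface DirectiveAction {
    void execute();
}

// Regular token that additionally carries a not-yet-executed directive action.
class DirectiveToken extends CommonToken {
    private final DirectiveAction action;

    DirectiveToken(Token oldToken, DirectiveAction action) {
        super(oldToken);
        this.action = action;
    }

    void runDirectiveAction() {
        action.execute();
    }
}

// Parser subclass that runs deferred directive actions at consumption time,
// so lookahead alone never changes directive state.
abstract class DirectiveAwareParser extends Parser {
    DirectiveAwareParser(TokenStream input) {
        super(input);
    }

    @Override
    public Token consume() {
        Token t = super.consume();          // normal ANTLR consumption
        if (t instanceof DirectiveToken) {  // directive state is updated only now
            ((DirectiveToken) t).runDirectiveAction();
        }
        return t;
    }
}

The generated parser could then be made to extend this class via the grammar's superClass option.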
The problem is that I cannot test the above workaround at the moment, because everything up to the point of final token emission is shared between the old and the new parser. Only the very last bit creating actual tokens is different. This means that to test the solution I'd need to apply it to the old parser as well.
Finally, the questions:
1. Is the workaround necessary, or are there reliable techniques to control lookahead in ANTLR4, at least for selected rules?
2. Would the workaround work? I'm mostly concerned about two elements of that solution: is redefining consume enough, or are there other routines that can mess with the token stream and would also need to be touched? And if the parser internally replaces tokens in the stream, e.g. by recreating them with some other token class, my special functor data would be lost in such a case (that's probably just my paranoia speaking).

Related

REST API: what is the proper way of URI naming using nouns and not verbs?

I have a login and I need to do something like this:
1. POST .../authentication/login - will ask for a phone and a password.
2. POST .../authentication/verifyToken - will ask for an auth token.
3. POST .../authentication/forgotPassword - will ask for a phone and a password.
But as I read, this structure is not good since it contains verbs rather than nouns.
I have tried to make something like this:
1. POST .../sessions/new - will create a new token, based on correct phone and password credentials.
2. GET .../sessions/:token - will verify whether the token is valid or expired.
3. GET .../sessions/forgot - will send an SMS with a new password or a new temporary password-reset code.
This first method is not REST, but it's totally clear.
You can read the URL and understand exactly what it is going to do. You do not need any kind of explanation.
However, more and more articles say that verbs in URIs are not RESTful and thus not a good practice.
What is the proper way to handle this?
REST doesn't care what spelling you use for your resource identifiers.
But as I read, this structure is not good since it contains verbs rather than nouns.
The structure is fine.
You can read the URL and understand exactly what it is going to do.
Right -- REST does not care if you can read the URL and understand what it is going to do.
URL/URI are identifiers, in the same way that variable names in your program are identifiers. Your compiler doesn't particularly care whether your variable is named accountBalance, purpleMonkeyDishwasher, or x. The spellings described in our coding standards are for human readers, and the same is true for hackable urls.
However, more and more articles say that verbs in URIs are not RESTful and thus not a good practice.
REST is an architectural style
As an architectural style for network-based applications, its definition is presented in the dissertation incrementally, as an accumulation of design constraints that derive from nine pre-existing architectural styles and five additional constraints unique to the Web
Fielding describes the constraints in his thesis; any who claim that verbs in identifiers "are not RESTful" should be able to point to a specific constraint that they violate.
I have yet to see a compelling argument that any of the REST constraints are violated by identifier spellings.
First, I would not use the term session. It implies server-side state, which is problematic for the stateless communication that REST requires.
So your problem could be solved by modeling resources, probably like:
GET ./authentication/token
to get a token if valid credentials are provided in the request headers.
GET ./authentication/password
to get a new temporary password if an e-mail address is provided in the request headers.
You can also use POST in order to transport values in the request body.
Be aware that the service should answer with an HTTP 204 if the result is sent by SMS.
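For illustration only (I'm assuming a Spring MVC stack here; TokenService and the header names are made up), the token resource could look like this:

import java.util.Optional;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/authentication")
public class AuthenticationController {

    // Hypothetical application service that checks credentials and issues tokens.
    public interface TokenService {
        Optional<String> issueToken(String phone, String password);
    }

    private final TokenService tokenService;

    public AuthenticationController(TokenService tokenService) {
        this.tokenService = tokenService;
    }

    // GET /authentication/token: the token is modelled as a resource;
    // the credentials travel in request headers, as suggested above.
    @GetMapping("/token")
    public ResponseEntity<String> getToken(@RequestHeader("X-Phone") String phone,
                                           @RequestHeader("X-Password") String password) {
        return tokenService.issueToken(phone, password)
                .map(ResponseEntity::ok)                     // 200 with the token
                .orElse(ResponseEntity.status(401).build()); // invalid credentials
    }
}

The password resource would follow the same pattern, answering 204 No Content when the new password goes out by SMS.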

How to implement Commands and Events for complex form using Event Sourcing?

I would like to implement CQRS and ES using the Axon framework.
I've got a pretty complex HTML form which represents a recruitment process with six steps.
ES would be helpful to generate historical statistics for selected dates and to track changes in the form.
Admin can always perform several operations:
assign person responsible for each step
provide notes for each step
accept or reject candidate on every step
turn on/off SMS or email notifications
assign tags
A form update (the difference only) is sent from the UI application to the backend.
Assuming I want to make changes only to the server-side application, the question is what should be a Command and what should be an Event. I consider three options:
1. The form patch is a Command which generates a Form Update Event.
Drawback of this solution: each event handler needs to check whether the changes in the form refer to that handler, e.g. whether an email about rejection should be sent.
2. The form patch is a Command which generates several Events, e.g. Interviewer Assigned, Notifications Turned Off, Rejected On Technical Interview.
Drawback of this solution: some events could be generated while others are not, because of broken constraints, e.g. Notifications Turned Off will succeed but Interviewer Assigned will fail due to assigning an unauthorized user. Maybe I should check all constraints before generating the events?
3. The form patch is converted to several Commands, e.g. Assign Interviewer, Turn Off Notifications, and each command generates an event, e.g. Interviewer Assigned, Notifications Turned Off.
Drawback of this solution: some commands can fail, e.g. Assign Interviewer can fail due to assigning an unauthorized user. This would end up in an inconsistent state, because some events would be stored in the repository and some would not. Maybe I should check all constraints before generating the commands?
The question I would call your attention to: are you creating an authority for the information you store, or are you just tracking information from the outside world?
Udi Dahan wrote Race Conditions Don't Exist, raising this interesting point:
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
If you have an unauthorized user in your system, is it really critical to the business that they be authorized before they are assigned responsibility for a particular step? Can the system really tell that the "fault" is that the responsibility was assigned to the wrong user, rather than that the user is wrongly not authorized?
Greg Young talks about exception reports in warehouse systems, noting that the responsibility of the model in that case is not to prevent data changes, but to report when a data change has produced an inconsistent state.
What's the cost to the business if you update the data anyway?
If the semantics of the message is that a Decision Has Been Made, or that Something In The Real World Has Changed, then your model shouldn't be trying to block that information from being recorded.
FormUpdated isn't a particularly satisfactory event, for the reason you mention; you have to do a bunch of extra work to cast it in domain specific terms. Given a choice, you'd prefer to do that once. It's reasonable to think in terms of translating events from domain agnostic forms to domain specific forms as you go along.
HttpRequestReceived ->
FormSubmitted ->
InterviewerAssigned
where the intermediate representations are short lived.
I can see one big drawback of the first option. One of the biggest advantages of CQRS/ES with Axon is scalability. We can add new features without worrying about regression bugs. Adding a new feature is the result of defining new commands, events and handlers for both of them; none of them should interfere with the ones already existing in the system.
FormUpdate as a command requires adding extra logic in one of the handlers. Adding a new attribute to the patch, and in consequence to the command, will cause changes in the current logic. Scalability is no longer an advantage in that case.
VoiceOfUnreason is giving a very good explanation what you should think about when starting with such a system, so definitely take a look at his answer.
The only thing I'd like to add, is that I'd suggest you take the third option.
With the examples you gave, the more generic commands/events don't tell that much about what's happening in your domain. The more granular events explain far better what exactly has happened, as the event message's name already points it out.
Pulling Axon Framework into the loop, I can also add a couple of pointers.
From a command message perspective, it's safe to just pick a route and not overthink it too much. The framework quite easily allows you to adjust the command structure later on. In Axon Framework trainings it is typically suggested to let a command message take the form of the specific action you're performing. So assigning a person to a step would typically be an AssignPersonToStepCommand, as that is the exact action you'd like the system to perform.
For events it's typically a bit nastier to decide later on whether you want fine-grained or generic events. This follows from doing Event Sourcing: since the events are your source of truth, you'll be required to deal with every form of event you've got in your system.
Due to this I'd argue that the weight of your decision should lie with how fine-grained your events become. To loop back to your question: in the example you give, I'd say option 3 would fit best.
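As a rough sketch of what option 3's fine-grained messages could look like (the class and field names are made up; in Axon the command would additionally carry annotations such as @TargetAggregateIdentifier, and the aggregate would apply the event from a @CommandHandler method):

import java.time.Instant;
import java.util.Objects;

// One command per action the admin performs on the recruitment process.
final class AssignInterviewerCommand {
    final String processId;     // identifies the recruitment-process aggregate
    final String stepId;        // which of the six steps
    final String interviewerId; // who becomes responsible for that step

    AssignInterviewerCommand(String processId, String stepId, String interviewerId) {
        this.processId = Objects.requireNonNull(processId);
        this.stepId = Objects.requireNonNull(stepId);
        this.interviewerId = Objects.requireNonNull(interviewerId);
    }
}

// The corresponding fine-grained event, stored in the event store.
final class InterviewerAssigned {
    final String processId;
    final String stepId;
    final String interviewerId;
    final Instant assignedAt;

    InterviewerAssigned(String processId, String stepId, String interviewerId, Instant assignedAt) {
        this.processId = processId;
        this.stepId = stepId;
        this.interviewerId = interviewerId;
        this.assignedAt = assignedAt;
    }
}

A TurnOffNotificationsCommand / NotificationsTurnedOff pair would follow the same pattern, so each handler deals with exactly one kind of change.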

What is the best practice to handle invalid or illegal combinations of inputs in a Verilog module?

When I am writing a module for a bigger project that takes several control inputs, what is the best practice/standard to handle combinations of inputs that are invalid or illegal?
For example, I have a Queue that has three control signals - Enqueue, Dequeue and Delete. Assume that only one of the operations can be performed in a cycle. Therefore, only one input signal should be asserted at a time. Now, what is the proper way to handle the case when some parent module drives two control signals at a time?
In my own project I can handle it any way I wish and will take care to avoid it. But at a company-wide level, someone may mess up later and use it improperly. What is the practice to prevent this problem? In other words, I am looking for something analogous to try-catch/exceptions in Verilog.
This is a classic example of where an assertion is useful. We don't tend to put error-checking logic in our chips (unless we are designing something safety-critical); instead we use assertions. A property is a potential fact about your design (eg "only one input signal is asserted at a time"). An assertion is a statement that a property should be true.
You can check assertions either using a formal tool or by simulating. In your case, the latter makes sense. So, you would implement a suitable check (the assertion) and then would run all your simulations and make sure that the assertion never fails.
So, how to implement the assertion? How to code it? You could
i) Switch to SystemVerilog. SystemVerilog has an assert statement that is useful for basic assertions and there is a part of SystemVerilog called SystemVerilog Assertions (SVA), which is much more powerful. Verilog is merely a subset of SystemVerilog, but nevertheless switching is still clearly easier said than done - you might need a change in company policy or to buy more expensive licences or some training...
ii) Write assertions in some other language (eg SVA or PSL), but code in Verilog. Again, easier said than done - again you might need a change in company policy or to buy more expensive licences or some training...
iii) Use OVL. This is a free, downloadable library of modules that implement basic (and not so basic) assertions. There is a version written in Verilog, so no company policy change or licences required, but you'd have to invest a bit of time learning how to use them.
iv) Write assertions in Verilog. You could hide them inside generate statements (or ifdefs if you must), to keep them away from the synthesiser, eg:
generate if (ASSERTIONS_ENABLED)
  begin : ASSERT_ONLY_ONE_OF_Enqueue_Dequeue_Delete
    always @(posedge clock) // it is nearly always better to check assertions synchronously
      if (Enqueue + Dequeue + Delete > 2'b1)
        $display("ASSERTION FAIL : ASSERT_ONLY_ONE_OF_Enqueue_Dequeue_Delete");
  end
endgenerate
(The links to SVA and PSL are to my company's website. But these were first in the Google results, anyway.)

DDD: Using Value Objects inside controllers?

When you receive arguments in string format from the UI inside your controller, do you pass the strings to the application service (or to the command) directly?
Or do you create value objects from the strings inside the controller?
new Command(new SomeId("id"), Weight.create("80 kg"), new Date())
or
new Command("id", "80 kg", new Date())
new Command("id", "80", "kg", new Date())
Maybe it is not important, but it bothers me.
The question is: should we couple value objects from the domain to (inside) the controller?
Imagine you don't have the web between your application layer and the presentation layer (like an Android activity or Swing); would you push the use of value objects into the UI?
Another thing: do you serialize/deserialize value objects into/from strings like this?
Weight weight = Weight.create("80 kg");
weight.getValue().equals(80.0);
weight.getUnit().equals(Unit.KILOGRAMS);
weight.toString().equals("80 kg");
In the case of passing strings into commands, I would rather pass "80 kg" instead of "80" and "kg".
Sorry if the question is not relevant or funny.
Thank you.
UPDATE
I came across this post while I was searching for information about a totally different topic: Value Objects in CQRS - where to use
They seem to prefer primitives or DTOs, and keep VOs inside the domain.
I've also taken a look at V. Vernon's book (Implementing DDD), and it talks about exactly that (-_-) in chapter 14 (p. 522).
I've noticed he's using commands without any DTOs.
someCommand.setId("id");
someCommand.setWeightValue("80");
someCommand.setWeightUnit("kg");
someCommand.setOtherWeight("80 kg");
someCommand.setDate("17/03/2015 17:28:35");
someCommand.setUserName("...");
someCommand.setUserAttribute("...");
someCommand.setUserOtherAttributePartA("...");
someCommand.setUserOtherAttributePartB("...");
It is the command object that would be mapped by the controller. Value object initialization would appear in the command handler method, and the value objects would throw something in case of a bad value (self-validation at initialization).
I think I'm starting to be less bothered, but some other opinions would be welcomed.
As an introduction, this is highly opinionated and I'm sure everyone has different ideas on how it should work. But my endeavor here is to outline a strategy with some good reasons behind it so you can make your own evaluation.
Pass Strings or Parse?
My personal preference here is to parse everything in the Controller and send down the results to the Service. There are two main phases to this approach, each of which can spit back error conditions:
1. Attempt to Parse
When a bunch of strings come in from the UI, I think it makes sense to attempt to interpret them immediately. For easy targets like ints and bools, these conversions are trivial and model binders for many web frameworks handle them automatically.
For more complex objects like custom classes, it still makes sense to handle it in this location so that all parsing occurs in the same location. If you're in a framework which provides model binding, much of this parsing is probably done automatically; if not - or you're assembling a more complex object to be sent to a service - you can do it manually in the Controller.
Failure Condition
When parsing fails ("hello" is entered in an int field or 7 is entered for a bool) it's pretty easy to send feedback to the user before you even have to call the service.
2. Validate and Commit
Even though parsing has succeeded, there's still the necessity to validate that the entry is legitimate and then commit it. I prefer to handle validation in the service level immediately prior to committing. This leaves the Controller responsible for parsing and makes it very clear in the code that validation is occurring for every piece of data that gets committed.
In doing this, we can eliminate an ancillary responsibility from the Service layer. There's no need to make it parse objects - its single purpose is to commit information.
Failure Condition
When validation fails (someone enters an address on the moon, or enters a date of birth 300 years in the past), the failure should be reported back up to the caller (Controller, in this case). While the user probably makes no distinction between failure to parse and failure to validate, it's an important difference for the software.
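A bare-bones sketch of the two phases (every name here is invented; the point is only where parsing happens, where validation happens, and how each kind of failure gets back to the user):

import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// "Controller": parses raw UI strings and reports parse failures immediately.
class PersonController {
    private final PersonService service = new PersonService();

    String register(String name, String birthDateText) {
        LocalDate birthDate;
        try {
            birthDate = LocalDate.parse(birthDateText);  // phase 1: attempt to parse
        } catch (DateTimeParseException e) {
            return "Please enter a valid date";          // parse failure -> user feedback
        }
        try {
            service.commit(name, birthDate);             // phase 2: validate and commit
            return "OK";
        } catch (IllegalArgumentException e) {
            return e.getMessage();                       // validation failure -> user feedback
        }
    }
}

// "Service": receives parsed input and validates it just before committing.
class PersonService {
    void commit(String name, LocalDate birthDate) {
        if (birthDate.isBefore(LocalDate.now().minusYears(300))) {
            throw new IllegalArgumentException("Date of birth is unrealistically far in the past");
        }
        // ... persist ...
    }
}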
Push Value Objects to UI?
I would accept parsed objects as far up the stack as possible, every time. If you can have someone else's framework handle that bit of transformation, why not do it? Additionally, the closer to the UI that the objects can live, the easier it is to give good, quick feedback to the user about what they're doing.
A Note on Coupling
Overall, pushing objects up the stack does result in greater coupling. However, writing software for a particular domain does involve being tightly coupled to that domain, whatever it is. If a few more components are tightly coupled to some concepts that are ubiquitous throughout the domain - or at least to the API touchpoints of the service being called - I don't see any real reduction in architectural integrity or flexibility occurring.
Parse One Big String or Components?
In general, it tends to be easiest to just pass the entire string into the Parse() method to get sorted through. Take your example of "80 kg":
"80 kg" and "120 lbs" may both be valid weight inputs
If you're passing in strings to a Parse() method, it's probably doing some fairly heavy lifting anyway. Expecting it to split a string based on a space is not overbearing.
It's far easier to call Weight.create(inputString) than it is to split inputString by " ", then call Weight.create(split[0], split[1]).
It's easier to maintain a single-string-input Parse() function as well. If some new requirement comes in that the Weight class has to support pounds and ounces, a new valid input may be "120 lbs 6 oz". If you're splitting up the input, you now need four arguments. Whereas if it's entirely encapsulated within the Parse() logic, there's no burden to outside consumers. This makes the code more extensible and flexible.
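For illustration, a single-string factory along those lines could look like this (the Unit enum, the accepted unit spellings and the exception choices are my assumptions, not from the question):

import java.util.Locale;

public final class Weight {

    public enum Unit { KILOGRAMS, POUNDS }

    private final double value;
    private final Unit unit;

    private Weight(double value, Unit unit) {
        this.value = value;
        this.unit = unit;
    }

    // Single-string factory: accepts "80 kg" or "120 lbs" and does all the splitting
    // and validation itself, so callers never juggle separate value/unit strings.
    public static Weight create(String input) {
        String[] parts = input.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<value> <unit>', got: " + input);
        }
        double value = Double.parseDouble(parts[0]); // throws NumberFormatException on bad numbers
        Unit unit;
        switch (parts[1].toLowerCase(Locale.ROOT)) {
            case "kg":  unit = Unit.KILOGRAMS; break;
            case "lbs": unit = Unit.POUNDS;    break;
            default: throw new IllegalArgumentException("Unknown unit: " + parts[1]);
        }
        return new Weight(value, unit);
    }

    public double getValue() { return value; }
    public Unit getUnit()    { return unit; }
}

If "120 lbs 6 oz" later becomes a valid input, only the body of create changes; callers keep passing one string.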
The difference between a DTO and a VO is that a DTO has no behavior; it's a simple container designed to pass data around from component to component. Besides, you rarely need to compare two DTOs, and they are generally transient.
A Value Object can have behavior. Two VOs are compared by value rather than reference, which means, for instance, that two Address value objects with the same data but which are different object instances will be equal. This is useful because they are generally persisted in one form or another and there are more occasions to compare them.
It turns out that in a DDD application, VOs will be declared and used in your Domain layer more often than not, since they belong to the domain's Ubiquitous Language and because of separation of concerns. They can sometimes be manipulated in the Application layer but typically won't be sent between the UI layer and the Application layer. We use DTOs for that instead.
Of course, this is debatable and depends a lot on the layers you choose to build your application out of. There might be cases when crunching your layered architecture down to 2 layers will be beneficial, and when using business objects directly in the UI won't be that bad.

Can TDD be a valid alternative to overkill data validation?

Consider these two data validation scenarios:
Check everything everywhere
Make sure that every method that takes one or more arguments actually checks them to ensure that they're syntactically valid.
Pros
Very fine check granularity.
If the code being written is for some kind of library, we make sure to limit the damage that can be done if the developers using it fail to provide valid data.
Cons
It's costly to always perform checks that most of the time shouldn't be needed.
It's still possible to forget to add a check every now and then.
More code is being written and hence in need of maintenance.
Make use of TDD goodness
Validate data only when it enters your code from the external world.
To make sure that internal data is always syntactically correct, create tests that check every method that returns a value, so that if valid data enters, valid data exits.
The pros and the cons are practically switched with the ones from the former approach.
As of now I'm using the first approach, but since I'm employing test driven development I thought that maybe I could go with the second one.
The advantages are clear, still, I wonder if it's as secure as the first method.
It sounds like the first method is contract driven, and one aspect of that is that you also need to verify that what you return from any public interface meets the contract.
But I think that both approaches are valid, though very different.
TDD only partially deals with the public interface, in that it should check that every input is properly validated. Unfortunately, unless you have all your validation in separate functions, it becomes very difficult to adequately test that a function of 3 or 4 parameters handles invalid values properly. The number of tests you have to write is quite high in either approach.
If you are using a library, then every function that can be called directly from the outside (outside being outside the library) will need to check that every input is valid, and that invalid input is handled as per the contract, either by returning null or throwing an exception. It must also be in agreement with the documentation.
Once you have verified that, there is no reason to force the verification on private functions, as those can only be called from within the library, and by then you should only be dealing with valid data.
Lots of tests will be needed, regardless, unfortunately. All these tests do is to ensure that you don't have any surprise problems, but that should generally help justify the cost of writing and maintaining them.
As to your question, if your tests are really well written, and you ensure that all validity checks are done completely, then it should be as secure, but the risk is that if you believe it is secure and you have poorly written tests then it will actually be worse than no tests, as there is an assumption that your tests are well-written.
I would use both methods until you know your tests are well written; then just go with TDD.
My opinion is that in the first scenario, two of your Cons outweigh everything else:
It's costly to always perform checks that most of the time shouldn't be needed.
More code is being written and hence in need of maintenance.
Also, technically TDD has no bearing on this question, because it is not a testing technique. More later...
To mitigate the Cons I would strongly advocate (as I think you say) splitting the code into an outside and an inside: The outside is where all the validation occurs. Hopefully this is but a thin wrapper around the inside, to prevent GIGO. Once inside, data never needs to be validated again.
As for TDD, I would strongly advocate (as you are now doing) employing it to develop your code, with the added benefit of leaving a trail of tests that become a regression test suite. Now you will naturally develop your outside code to perform robust validation, with the promise of easily adding any checks that you might initially forget. Your inside code can be developed assuming it will only handle valid data, but TDD will still give you the confidence that it will function to spec.
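For illustration, a minimal sketch of that outside/inside split (the names are made up; the inside method trusts its caller and gets its confidence from the TDD test suite rather than from runtime checks):

import java.util.List;
import java.util.Objects;

public class ReportGenerator {

    // Outside: a thin validating wrapper, the only place raw input is checked (prevents GIGO).
    public double averageScore(List<Double> scores) {
        Objects.requireNonNull(scores, "scores must not be null");
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        if (scores.stream().anyMatch(s -> s == null || s < 0.0 || s > 100.0)) {
            throw new IllegalArgumentException("scores must be between 0 and 100");
        }
        return averageOfValidated(scores);
    }

    // Inside: assumes valid data; its correctness is covered by the tests written while doing TDD.
    private double averageOfValidated(List<Double> scores) {
        return scores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}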
I'm saying that I would go with the second approach, as I've described, independently of whether I'm developing with TDD, or not (but TDD is always my first choice).
The advantages are clear, still, I wonder if it's as secure as the first method.
This completely depends on how well you test it.
This could be just as secure, if the following two criteria are met:
Every publicly exposed means of adding data to the system is validated completely
Every internal method that translates data is completely and adequately tested
However, I question that this would be easier or that it would require less code. The amount of code required to check every public entry point is going to be very similar to the amount of code required to validate each method. You're going to need more checks in the entry points, since they'll have to check things that might otherwise be checked internally.
For the second method, you need two good sets of tests. You must not only check that
if valid data enters, valid data exits.
You must also check that if invalid data enters, an exception is thrown. I suppose you still have to validate data and kick out if you have invalid data. This is really the only way if you don't want pesky ArgumentNullExceptions or other cryptic errors in your production application. However, TDD can really toughen up the quality of all that checking (especially with Fuzz Testing).
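For concreteness, the two kinds of tests might look like this with JUnit 5 (parseAge is a made-up entry point standing in for your validation code):

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class AgeParserTest {

    // Hypothetical entry point under test: parses and validates an age string.
    static int parseAge(String text) {
        int age = Integer.parseInt(text.trim());  // may throw NumberFormatException
        if (age < 0 || age > 150) {
            throw new IllegalArgumentException("age out of range: " + age);
        }
        return age;
    }

    @Test
    void validDataInValidDataOut() {
        assertEquals(42, parseAge(" 42 "));
    }

    @Test
    void invalidDataIsRejectedWithAnException() {
        assertThrows(NumberFormatException.class, () -> parseAge("hello"));
        assertThrows(IllegalArgumentException.class, () -> parseAge("300"));
    }
}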
One item is missing from your list of Pros and Cons, and it is important enough to make unit testing a much safer method than maniacal parameter checking.
You just have to consider the When and the Where.
For unit testing the when and the where are:
when: at design time
where: in a dedicated source file outside of the application code
For overkill data checking they are:
when: at runtime
where: entangled in the application source code, typically using asserts.
That is the point: code covered by unit testing detects errors at design time, when you run the tests. If you are the paranoid and schizophrenic kind of tester (the best kind), you write tests designed to break whatever can be broken, checking each data boundary and perverse input. You also use code coverage tools to ensure every branch of every alternative is tested. You have no limit: tests live in their own files and do not clutter the application. It doesn't matter if you get ten times as many test lines as actual application code; there is no runtime penalty and no readability penalty.
On the other hand, integrated overkill checking detects errors at runtime. In the worst case it will detect errors on the user's system, where you can do nothing about it (if you ever even hear of the error happening). Also, even if you are the paranoid kind, you will have to limit your checking. Assertions just can't be 90 percent of the application code. They raise readability and maintenance issues, and often a heavy performance penalty. Where will you stop, then: only checking parameters for external input? Checking every possible or impossible input of inner functions? Checking every loop invariant? Also testing behavior when out-of-flow data (globals, system files, etc.) is changed? You must also be conscious that assertion code can itself contain bugs. What if the formula of an assertion performs a division? You must ensure it will not lead to a DIVIDE-BY-ZERO error or such.
Another problem is that in many cases you just don't know what can be done when an assertion fails. If you are at a real entry point you can return something understandable to your user or the lib user... when you are checking inner functions
