Using TokenStreamRewriter to insert tokens after lexing but before parsing - antlr4

Using ANTLR 4.9.2 for C++.
Depending on the first tokens, I might need to insert some tokens before parsing. My approach (simplified):
antlr4::ANTLRInputStream antlrIs(properlyEscaped);
Lexer lexer(&antlrIs);
antlr4::CommonTokenStream tokens(&lexer);
antlr4::TokenStreamRewriter tokenStreamRewriter(&tokens);
if (!(tokens.LA(1) == Lexer::MY_SPECIAL_TOKEN))
{
    tokenStreamRewriter.insertBefore(tokens.LT(1), string("begin"));
}
Parser parser(&tokens);
Parser::FileContext* fileContext = parser.file();
Stepping with the debugger I see that the token is actually inserted. But the new token I insert seems to be ignored by parser.file().
How can I insert tokens so that parser.file() uses them?

TokenStreamRewriter just builds up a set of instructions for how the input stream should be changed. It doesn’t actually change the token stream itself.
Once you have executed all of your modification calls, you'll need to call .getText() (or .getText(String programName)) to get a String that has all of your changes incorporated. Then you can use that as the input to your Lexer to get a token stream containing your modifications.
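For illustration, here is a rough sketch of that round trip using the Java runtime (the C++ runtime offers equivalent calls); MyLexer, MyParser and MY_SPECIAL_TOKEN are placeholders for your generated classes, and the org.antlr.v4.runtime imports are omitted:
CharStream input = CharStreams.fromString(source);
MyLexer lexer = new MyLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TokenStreamRewriter rewriter = new TokenStreamRewriter(tokens);
if (tokens.LA(1) != MyLexer.MY_SPECIAL_TOKEN) {
    // This only records an instruction; the token stream itself is unchanged.
    rewriter.insertBefore(tokens.LT(1), "begin ");
}
// getText() materializes the recorded changes as a new String ...
String rewritten = rewriter.getText();
// ... which is lexed and parsed again, so the parser actually sees "begin".
MyParser parser = new MyParser(new CommonTokenStream(new MyLexer(CharStreams.fromString(rewritten))));
MyParser.FileContext fileContext = parser.file();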

Related

How to get SAP CloudSdk BatchRequest not to ignore filter parameter on Batch Query?

We are currently struggling with Batch Query,
which seems to ignore the filter expressions on the S4 side because of wrong URL encoding.
/sap/opu/odata/sap/ZP2M_A_CONTRACT_SEARCH_HDR_CDS/ZP2M_A_CONTRACT_SEARCH_HDR?$filter=PurchaseContractID eq %274600002020%27&$select=*&$format=json
When executing the query using FluentHelperRead.execute(HttpClient),
the returned list of entities contains the expected result with exactly one entity.
When executing the query as a Batch Query, the following request is logged in the console:
GET ZP2M_A_CONTRACT_SEARCH_HDR?%24filter%3DPurchaseContractID+eq+%25274600002020%2527%26%24select%3D*%26%24format%3Djson HTTP/1.1
The collected list from all batch result parts contains all entities.
It seems that the query URL is encoded in the wrong way
and that S4 ignores the filter expressions when they are encoded like this,
e.g. $filter is encoded to %24filter, which is ignored by S4.
This seems to be a bug in the BatchRequestImpl.getRequest(ODataQueryImpl) method,
where URL encoding is applied a second time to already encoded URL parts.
if (systemQuery.indexOf("$format=json&$count=true") != -1)
{
    systemQuery = systemQuery.substring(0, systemQuery.indexOf("$format=json&$count=true") - 1);
    keysUrl.append("/$count");
}
systemQuery = URLEncoder.encode(systemQuery, "UTF-8"); // this line encodes the query a 2nd time
keysUrl.append("?");
The code line systemQuery = URLEncoder.encode(systemQuery, "UTF-8"); located in
  BatchRequestImpl(1.38.0) - line 295
  BatchRequestImpl(1.42.2) - line 307
encodes the systemQuery string again (including the already encoded parts of FilterExpression as well).
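The effect is easy to reproduce in isolation; encoding a query whose filter value is already percent-encoded produces exactly the mangled string seen in the logged batch request (a standalone sketch, not Cloud SDK code):
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    public static void main(String[] args) {
        // The filter value is already percent-encoded; encoding the whole query
        // again turns $ into %24 and % into %25.
        String alreadyEncoded = "$filter=PurchaseContractID eq %274600002020%27&$select=*&$format=json";
        String encodedAgain = URLEncoder.encode(alreadyEncoded, StandardCharsets.UTF_8);
        System.out.println(encodedAgain);
        // %24filter%3DPurchaseContractID+eq+%25274600002020%2527%26%24select%3D*%26%24format%3Djson
    }
}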
When undoing the changes of this code line in the debugger and replacing the spaces by %20 or '+', the Batch Query looks like this:
GET ZP2M_A_CONTRACT_SEARCH_HDR?$filter=PurchaseContractID%20eq%20%274600002020%27&$select=*&$format=json HTTP/1.1
GET ZP2M_A_CONTRACT_SEARCH_HDR?$filter=PurchaseContractID+eq+%274600002020%27&$select=*&$format=json HTTP/1.1
and it returns the expected result (exactly 1 entity).
This wrong encoding appears when using these library versions:
sdk-bom: 3.16.1
connectivity: 1.38.0
This issue appears in the newest SDK versions as well:
sdk-bom: 3.21.0
connectivity: 1.39.0
This issue also appears with the connectivity JAR in its newest version:
sdk-bom: 3.21.0
connectivity: 1.40.2
Debugging together with an ABAP/S4 colleague showed
that S4 only applies filter expressions if the keyword $filter is found in the request;
%24filter%3D is ignored (which is why we get all entities when running the Batch Query).
My suggestion to solve it would be:
// decode query first (to decode the filter expression)
systemQuery = URLDecoder.decode(systemQuery, "UTF-8");
// encode query
systemQuery = org.apache.commons.httpclient.util.URIUtil.encodeQuery(systemQuery, "UTF-8");
My code, how I am calling the batchRequest:
FluentHelperRead<?, MyEntity, ?> queryApi = myService.getAll... // with adding some filter expression
BatchRequestBuilder batchRequestBuilder = BatchRequestBuilder.withService(MyService.DEFAULT_SERVICE_PATH);
ODataQuery query = queryApi.toQuery();
batchRequestBuilder.addQueryRequest(query);
HttpClient httpClient = HttpClientAccessor
.getHttpClient(DefaultErpHttpDestinationAccessor.get());
BatchRequest request = batchRequestBuilder.build();
BatchResult result = request.execute(httpClient);
// ... evaluate response
I think this is a general issue in the Cloud SDK.
Would it be possible to get this fixed in the next Cloud SDK release?
Can you share your code for the Batch request? Do you use BatchRequestImpl directly?
The thing is, the SAP Cloud SDK relies on some dependencies, one of which provides BatchRequestImpl, and if it's called directly the bug is on the dependency side. I have already asked them to investigate this double-encoding issue. Unfortunately, we can't directly influence how fast it is resolved, and sometimes it takes longer than we'd like.
The good news is that we're working on replacing this dependency with our own implementation to solve exactly this kind of problem. Batch support is work in progress and should be available in beta around the end of next month for OData V4, and hopefully around the same time for OData V2 (it's not a hard commitment and depends on other priorities).
From here we have to wait for whatever happens first:
The bug is fixed on the dependency side
Our internal OData client implementation is ready, together with batch support
I hope this helps and explains the current solution path. If you share a bit about your deadlines and the potential impact, we'll be happy to take that into account.
This has been fixed within the dependency and as of version 3.25.0 the SAP Cloud SDK includes the fix.

Generate a consistent sha256 hash from an object in Node

I have an object that I'd like to hash with SHA-256 in Node. The contents of the object are simple JavaScript types. For example's sake, let's say:
var payload = {
"id": "3bab3f00-7d55-11e7-9b0a-4c32759242a5",
"foo": "a message",
"version": 7,
};
I create a hash like this:
const crypto = require('crypto');
var hash = crypto.createHash('sha256');
hash.update( ... ).digest('hex');
The question is, what to pass to update? The documentation for crypto says you can pass a <string> | <Buffer> | <TypedArray> | <DataView>, which seems to suggest an object is not a good thing to pass.
I can't use toString() because that prints "[object Object]". I could use JSON.stringify; however, I have read elsewhere that the output from stringify is not guaranteed to be deterministic for the same input.
Are there any other options? I do not want to download a package from NPM.
The right term is "canonical" and the action is called "canonicalization" (I'm assuming EN-US here); you can find a stringify that produces canonical output here.
Beware that you must also make sure that the output has the right character set (UTF-8 should be preferred) and line endings. Spurious data must not be present; e.g. a byte order mark or a NUL string terminator is enough to change the hash value.
After that you can pass it in as a string, I suppose.
You can of course use any canonical encoding. Note that XML has defined XML-DSig, which includes canonicalization during signature generation and signing; this means that verification will even succeed if the XML code is altered, as long as the structure and contents stay the same (whitespace / indentation will not matter).
I'd still recommend regression testing between implementations and even across version updates of the libraries.
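The same idea sketched in Java rather than Node (a deliberately naive canonicalization that sorts keys and skips escaping; real code should use a proper canonical JSON library):
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

public class CanonicalHash {
    public static void main(String[] args) throws Exception {
        // TreeMap keeps keys sorted, so the serialized form does not depend on insertion order.
        Map<String, Object> payload = new TreeMap<>();
        payload.put("id", "3bab3f00-7d55-11e7-9b0a-4c32759242a5");
        payload.put("foo", "a message");
        payload.put("version", 7);

        // Naive canonical serialization: sorted keys, no whitespace, no escaping.
        StringBuilder canonical = new StringBuilder("{");
        for (Map.Entry<String, Object> e : payload.entrySet()) {
            if (canonical.length() > 1) canonical.append(',');
            canonical.append('"').append(e.getKey()).append("\":");
            canonical.append(e.getValue() instanceof String ? "\"" + e.getValue() + "\"" : e.getValue());
        }
        canonical.append('}');

        // Hash the UTF-8 bytes of the canonical string.
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        System.out.println(hex);
    }
}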

Orchard CMS token JSON

I would like to format a JSON payload in a workflow activity. I use the new {Text.JavaScriptEncode} to enclose my properties in {}. I must be doing something wrong, because the tokens are not evaluated anymore. So if I use
{Text.JavaScriptEncode}{
"Courriel":{FormSubmission.Field:Courriel}
{Text.JavaScriptEncode}}
It ends with the following value:
{
"Courriel":{FormSubmission.Field:Courriel}
}
So the {FormSubmission.Field:Courriel} is not evaluated. If I don't specify {Text.JavaScriptEncode} before the first {, nothing is rendered (empty string).
I'm using Orchard 1.10.1.0
You might need to turn on the Tokenizer's HashMode.
I haven't tested your token, but I'm pretty sure the tokenizer tries to evaluate
this as a token and fails:
{"Courriel":{FormSubmission.Field:Courriel}
With HashMode enabled, your code will look like this:
#{Text.JavaScriptEncode}{
"Courriel":#{FormSubmission.Field:Courriel}
#{Text.JavaScriptEncode}}

What is the best way to safely read user input?

Let's consider a REST endpoint which receives a JSON object. One of the JSON fields is a String, so I want to validate that no malicious text is received.
@ValidateRequest
public interface RestService {
    @POST
    @Consumes(APPLICATION_JSON)
    @Path("endpoint")
    void postData (@Valid @NotNull Data data);
}

public class Data {
    @ValidString
    private String s;
    // get,set methods
}
I'm using the bean validation framework via @ValidString to delegate the validation to the ESAPI library.
@Override
public boolean isValid (String value, ConstraintValidatorContext context) {
    return ESAPI.validator().isValidInput(
            "String validation",
            value,
            this.constraint.type(),
            this.constraint.maxLength(),
            this.constraint.nullable(),
            true);
}
This method canonicalizes the value (i.e. removes any encoding) and then validates it against a regular expression provided in the ESAPI config. The regex is not that important to the question, but it mostly whitelists 'safe' characters.
All good so far. However, on a few occasions I need to accept 'less safe' characters like %, ", <, >, etc., because the incoming text is from an end user's free-text input field.
Is there a known pattern for this kind of String sanitization? What kind of text can cause server-side problems if SQL queries are considered safe (e.g. using bind variables)? What if the user wants to store <script>alert("Hello")</script> as his description, which at some point will be sent back to the client? Do I store that in the DB? Is that a client-side concern?
When dealing with text coming from the user, best practice is to whitelist only known character sets, as you stated. But that is not the whole solution, since there are times when that will not work; again, as you pointed out, sometimes "dangerous" characters are part of the valid character set.
When this happens you need to be very vigilant in how you handle the data. What I, as well as the commenters, recommend is to keep the data from the user in its original state as long as possible. Dealing with the data securely then means using the proper functions for the target domain/output.
SQL
When putting free-format strings into a SQL database, best practice is to use prepared statements (in Java this is the PreparedStatement object) or an ORM that automatically parameterizes the data; see the sketch below.
To read more on SQL injection attacks and other forms of injection attacks (XML, LDAP, etc.), I recommend OWASP Top 10 - A1 Injection
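For example, a minimal JDBC sketch of the parameterized approach (the DataSource wiring and the users table are placeholders, not part of the question):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class UserTextRepository {
    private final DataSource dataSource;

    public UserTextRepository(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void findByDescription(String userInput) throws Exception {
        // The user-supplied value is bound as a parameter, never concatenated
        // into the SQL text, so it cannot change the structure of the statement.
        String sql = "SELECT id, description FROM users WHERE description = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, userInput); // bind variable, not string concatenation
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // ... use rs.getString("description") as needed
                }
            }
        }
    }
}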
XSS
You also mentioned what to do when outputting this data to the client. In this case you want to make sure you HTML-encode the output for the proper context, aka contextual output encoding. ESAPI has an Encoder class/interface for this (a short sketch follows the examples below). The important thing to note is which context (HTML body, HTML attribute, JavaScript, URL, etc.) the data will be output into. Each context is encoded differently.
Take for example the input: <script>alert('Hello World');<script>
Sample Encoding Outputs:
HTML: &lt;script&gt;alert(&#x27;Hello World&#x27;);&lt;script&gt;
JavaScript: \u003cscript\u003ealert(\u0027Hello World\u0027);\u003cscript\u003e
URL: %3Cscript%3Ealert%28%27Hello%20World%27%29%3B%3Cscript%3E
Form URL: %3Cscript%3Ealert%28%27Hello+World%27%29%3B%3Cscript%3E
CSS: \00003Cscript\00003Ealert\000028\000027Hello\000020World\000027\000029\00003B\00003Cscript\00003E
XML: &lt;script&gt;alert(&apos;Hello World&apos;);&lt;script&gt;
For more reading on XSS look at OWASP Top 10 - A3 Cross-Site Scripting (XSS)
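For completeness, a rough sketch of what calling the ESAPI Encoder looks like (method names from the ESAPI 2.x Encoder interface; userText is a placeholder for the stored, unmodified input):
import org.owasp.esapi.ESAPI;
import org.owasp.esapi.Encoder;

public class OutputEncodingExample {
    public static void main(String[] args) throws Exception {
        Encoder enc = ESAPI.encoder();
        String userText = "<script>alert('Hello World');<script>";
        // Pick the encoder that matches the context the value is emitted into.
        String forHtmlBody = enc.encodeForHTML(userText);           // HTML body
        String forAttribute = enc.encodeForHTMLAttribute(userText); // HTML attribute value
        String forJavaScript = enc.encodeForJavaScript(userText);   // JavaScript string
        String forUrl = enc.encodeForURL(userText);                 // URL component
        String forCss = enc.encodeForCSS(userText);                 // CSS value
        System.out.println(forHtmlBody);
    }
}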

ANTLR4: Getting start and end index for each rule: $stop behaves strange

I need to get the start and end index of each rule. I.e., the start index is the character position of the first character of the first token belonging to the rule and the end index is the last character position of the last token belonging to the rule. With these numbers I can crop the result of a rule out of the input file precisely.
The straightforward way of doing this should be to use the $start and $stop tokens, i.e., $start.getStartIndex() and $stop.getStopIndex(). However, I have found that the $stop token is often null, even when used in the @after action.
According to The Definitive ANTLR 4 Reference, the $stop token is defined as: "The last nonhidden channel token to be matched by the rule. When referring to the current rule, this attribute is available only to the after and finally actions." This sounds as if such a token should exist (at least for any rule that matches at least one token). Thus, it is quite strange that this token is null in many cases, even for rules that have a simple token - not a subrule - as their last element. How can the stop token be null in this case?
Right now, I am using a workaround by just asking the input about its current token, moving one token back, and using that token as the stop token. However, this seems hacky:
@after {
    int start = $start.getStartIndex();
    int stop = _input.get(_input.index()-1).getStopIndex();
    // do something with start and stop
}
The cleaner solution (if $stop were not null) would look like this:
@after {
    int start = $start.getStartIndex();
    int stop = $stop.getStopIndex();
}
The stop token is set in the finally block of the generated code, after any user-defined @finally{} action has executed. The @after{} code is executed in the try block, which also runs before the stop token is set.
The stop property only works for qualified references. For example, you could do the following:
foo : bar {assert $bar.stop != null};
Also, note that ANTLR 4 is designed to encourage moving action code from embedded actions into listener and/or visitor interfaces that operate on the parse tree after parsing is complete. When used in this manner, the stop tokens will be set for all contexts in the tree. In nearly all cases, the use of an @after or @finally block is a code smell in ANTLR 4 that you should avoid.
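A small sketch of that listener approach (MyParser and MyParserBaseListener stand in for the classes ANTLR generates from your grammar):
public class BoundsListener extends MyParserBaseListener {
    @Override
    public void exitFile(MyParser.FileContext ctx) {
        // Once parsing is complete, both tokens are populated on every context.
        int start = ctx.getStart().getStartIndex();
        int stop = ctx.getStop().getStopIndex();
        // e.g. crop the original input: input.substring(start, stop + 1)
    }
}
After parsing, walk the tree with ParseTreeWalker.DEFAULT.walk(new BoundsListener(), parser.file()).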
