How to ignore a token and reset lexer lookahead index - antlr4

A couple of weeks ago I started to work with ANTLR4. I would like to know if there is a way to ignore a matched token by the lexer and reset the index position of the CharStream to the start position of the ignored token so that it can be matched again by other rule (for example other rule within a different mode).
Many thanks in advance.

This sounds quite a bit like the less lexer command which was proposed but never fully defined:
#212: A new lexer command: Less
For now, I would override the nextToken command in your lexer.
#Override
public Token nextToken() {
while (true) {
int mark = input.mark();
try {
int startIndex = input.index();
Token token = super.nextToken();
if (token != null && token.getType() == MY_SPECIAL_TYPE) {
pushMode(MY_SPECIAL_MODE);
input.seek(startIndex);
continue;
}
return token;
}
finally {
input.release(mark);
}
}
}

Related

Spell checker in android

I want to create an application in which it checks if the word typed by user is correct or not using Google Dictionary ?
i have used the below link . But the problem with the given example is that it suggests the different words. I don't want suggestion, instead i want to only check that word entered is correct or not.
I haven't worked on it yet. But you can probably modify it as:
When you get the suggestions, instead of appending them to StringBuilder, and showing that StringBuilder to MainView, just compare all suggestions with your input string of edittext.
If it matches, then the spell is correct, else the spell is incorrect.
Code snippet:
#Override
public void onGetSuggestions(final SuggestionsInfo[] arg0) {
isSpellCorrect = false;
final StringBuilder sb = new StringBuilder();
for (int i = 0; i < arg0.length; ++i) {
// Returned suggestions are contained in SuggestionsInfo
final int len = arg0[i].getSuggestionsCount();
if(editText1.getText().toString().equalsIgnoreCase(arg0[i].getSuggestionAt(j))
{
isSpellCorrect = true;
break;
}
}
}
Hope this helps.

Get the most possible token types according to line and column number in ANTLR4

I would like to get a list of most possible list of tokens for a given location in the text (line and column number) to determine what has to be populated for auto code completion. Can this be easily achieved using ANTLR 4 API.
I want to get the possible list of tokens for a given location because the user might be writing/editing somewhere in the middle of the text which still guarantees the possible list of tokens.
Please give me some guidelines because I was unable to find an online resource on this topic.
One way to get tokens by line number is to create a ParseTreeListener for your grammar, use it to walk a given ParseTree and index TerminalNodes by line number. I don't know C#, but here is how I've done it in Java. Logic should be similar.
public class MyLineIndexer extends MyGrammarParserBaseListener {
protected MultiMap<Integer, TerminalNode> filelineTokenIndex;
#Override
public void visitTerminal(#NotNull TerminalNode node) {
// map every token to its file line for searching later...
if ( node.getSymbol() != null ) {
List<TerminalNode> tokens;
Integer line = node.getSymbol().getLine();
if (!filelineTokenIndex.containsKey(line)) {
tokens = new ArrayList<>();
filelineTokenIndex.put(line, tokens);
} else {
tokens = filelineTokenIndex.get(line);
}
tokens.add(node);
}
super.visitTerminal(node);
}
}
then walk the parse tree the usual way...
ParseTree parseTree = ... ; // parse it how you want to
MyLineIndexer indexer = new MyLineIndexer();
ParseTreeWalker walker = new ParseTreeWalker();
walker.walk(indexer, parseTree);
Getting the token at a line and range is now reasonably straight forward and efficient assuming you have a reasonable number of tokens on a line. For example you can add another method to the Listener like this:
public TerminalNode findTerminalNodeAtCaret(int caretPos, int caretLine) {
if (caretPos <= 0) return null;
if (this.filelineTokenIndex.containsKey(caretLine)) {
List<TerminalNode> nodes = filelineTokenIndex.get(caretLine);
if (nodes.size() == 0) return null;
int tokenEndPos, tokenStartPos;
for (TerminalNode n : nodes) {
if (n.getSymbol() != null) {
tokenEndPos = n.getSymbol().getCharPositionInLine() + n.getText().length();
tokenStartPos = n.getSymbol().getCharPositionInLine();
// If the caret is within this token, return this token
if (caretPos >= tokenStartPos && caretPos <= tokenEndPos) {
return n;
}
}
}
}
return null;
}
You will also need to ensure your parser allows for 'loose' parsing. While a language construct is being typed, it is likely not to be valid. Your Parser rules should allow for this.

Menu driven palindrome tester getting stack Overflow error

I'm doing this for a class, and I'm really not terribly awesome at this as I'm over a decade out of practice. I'm trying to write a program that displays a menu so the user can choose between methods of determining if it's a palindrome.
It then needs to redisplay the menu once it has completed the test. I'm getting a stack overflow error in the isPalindrome method since I combined the 2 classes into one class, which I thought would fix another problem I was having with the output! Any thoughts or directions I can take?
import java.util.Scanner;
public class PalHelper
{
public String pal;
public void MenuList()
{
System.out.println("How would you like to check your phrase?");
System.out.println("1. Check the first letter like it's the last letter - Leave no phrase unturned!");
System.out.println("2. I Prefer my Palindromes have the gentle treatment");
System.out.println("3. We're done here");
System.out.print("Selection: ");
}
public PalHelper()
{
Scanner decision = new Scanner(System.in);
MenuList();
switch (decision.nextInt())
{
//to access the character by character method of determination
case 1:
System.out.print("Enter Phrase to Test, the Hard Way:");
Scanner keyboard1 = new Scanner(System.in); //declares scanner variable for user entry
String UserInput1 = keyboard1.next();//Phrase variable
Boolean test1 = isPalindrome(UserInput1);
if (test1 == true){
System.out.println(UserInput1+" is a palindrome. That doesn't make you smart.");
}
else {
System.out.println(UserInput1+" is a not palindrome. Why don't you think a little harder and try again.");
}
System.out.println("..\n..\n..\n");
keyboard1.close();
new MenuList();
break;
//to access the string buffer method of determination
case 2:
System.out.print("Thank you for choosing the gentle way, please enter your phrase:");
Scanner keyboard2 = new Scanner(System.in); //declares scanner variable for user entry
String UserInput2 = keyboard2.next();
Boolean test2 = isPalindrome2(UserInput2);
if (test2 == true){
System.out.println(UserInput2+" is a palindrome. Congratulations! You are so wonderful!");
}
else {
System.out.println(UserInput2+" is a not palindrome. It's ok, I'm sure you'll get it next time.");
}
System.out.println("..\n..\n..\n");
keyboard2.close();
new MenuList();
break;
//exit menu
case 3:
System.out.println ( "Too bad – I hid a boot!" );
break;
//response to input other than 1,2,3
default:
System.out.println ( "No sir! Away! A papaya war is on." );
System.out.println("..\n..\n..\n");
new MenuList();
break;
}// close switch
}//close pal helper
public void Palindrome(String UserInput) {
}
public boolean isPalindrome(String UserInput) {
pal = UserInput.toUpperCase();
if (pal.length() <= 1) {//one character, automatically a palindrome
return true;
}
char start = pal.charAt(0);
char end = pal.charAt(pal.length()-1);
if (Character.isLetter(start) &&
Character.isLetter(end)) {//check if first and last characters match
if (start != end) {
return false; //if the beginning & ending characters are not the same it's not a palindrome
}
else {
Palindrome subpal = new Palindrome(pal.substring(1,pal.length()-1));
return subpal.isPalindrome(); //check middle dropping start and end letters
}
}
else if (!Character.isLetter(start)) {
Palindrome subpal = new Palindrome(pal.substring(1));
return subpal.isPalindrome(pal); //check if first letter is a letter, drop if not
}
else {
Palindrome subpal = new Palindrome(pal.substring(0,pal.length()-1));
return subpal.isPalindrome(pal); //check if first letter is a letter, drop if not
}
}//close isPalindrome
public boolean isPalindrome2(String UserInput){
pal = UserInput.toUpperCase();
pal = pal.replaceAll("\\W", "");//gets rid of space and punctuation
StringBuffer check = new StringBuffer(pal);//reverses pal string and creates new stringbuffer for check
check.reverse();
if (check.toString().equals(pal)){//checks for equality between pal and it's reverse
return true;
}
else {
return false;
}
}//close isPalindrome2
public static void main (String[]args)
{
new PalHelper();
}//close main
}//close class
Aside from my comment above, there is a problem that I noticed: in some cases, you construct a new Palindrome instance with the substring, but pass the full string while recursing to the isPalindrome() method. This causes the recursion to never terminate (which also makes your code hard to follow).
Your example is missing something, the following line has missing closing brackets:
Palindrome subpal = new Palindrome(pal.substring(1,pal.length()-1);
Your comments dont seem to match the code:
return subpal.isPalindrome(pal); //check if first letter is a letter, drop if not
Try adding Output to the Method isPalindrome() or debug it. You are probably not calling it with the right Strings and end up looping over the same Strings over and over.
public boolean isPalindrome(String UserInput) {
System.out.println(UserInput);
...
Edit: If your code really is exaclty like you posted then vhallac is right, you call isPalindrome() with the full String.

LL_EXACT_AMBIG_DETECTION - Interpetation

When using PredictionMode::LL_EXACT_AMBIG_DETECTION I get the following error messages:
line 186:7 reportAttemptingFullContext d=30, input='ON REPORT HEAD
How am I to interpret the d attribute. Does it reference a rule in my grammar and how can I find out which?
According to the code:
#Override
public void reportAttemptingFullContext(#NotNull Parser recognizer,
#NotNull DFA dfa,
int startIndex, int stopIndex,
#NotNull ATNConfigSet configs)
{
recognizer.notifyErrorListeners("reportAttemptingFullContext d=" +
dfa.decision + ", input='" +
recognizer.getTokenStream().getText(Interval.of(startIndex, stopIndex)) + "'");
}
the attribute d is a decision in DFA. But I have not found out how the use the information back to the rule in the grammar.
Thank for your help.
Kind regards,
Wolfgang Hämmer
The following helper methods can convert a decision number to a rule name. You can create your own error listener implementation based on DiagnosticErrorListener and use these methods to include the name of the rule in each message.
If a rule has more than one decision, then you can pass the -atn flag to ANTLR when you generate code for your grammar. Once you have the name of the rule, look at the graph produced by ruleName.dot (where ruleName is the rule), and you'll see a node in the graph labeled d=decisionNumber (where decisionNumber is the number you're currently seeing). That will point you to the exact location where the problem is occurring.
Keep in mind that rule and decision numbers change when you change your grammar, so when you open ruleName.dot you'll want to verify the actual decision number each you regenerate the code for your grammar.
public static int getDecisionRule(Recognizer<?, ?> recognizer, int decision) {
if (recognizer == null || decision < 0) {
return -1;
}
if (decision >= recognizer.getATN().decisionToState.size()) {
return -1;
}
return recognizer.getATN().decisionToState.get(decision).ruleIndex;
}
public static String getRuleDisplayName(Recognizer<?, ?> recognizer, int ruleIndex) {
if (recognizer == null || ruleIndex < 0) {
return Integer.toString(ruleIndex);
}
String[] ruleNames = recognizer.getRuleNames();
if (ruleIndex < 0 || ruleIndex >= ruleNames.length) {
return Integer.toString(ruleIndex);
}
return ruleNames[ruleIndex];
}

How to fix "Path Manipulation Vulnerability" in some Java Code?

The below simple java code getting Fortify Path Manipulation error. Please help me to resolve this. I am struggling from long time.
public class Test {
public static void main(String[] args) {
File file=new File(args[0]);
}
}
Try to normalize the URL before using it
https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#normalize()
Path path = Paths.get("/foo/../bar/../baz").normalize();
or use normalize from org.apache.commons.io.FilenameUtils
https://commons.apache.org/proper/commons-io/javadocs/api-1.4/org/apache/commons/io/FilenameUtils.html#normalize(java.lang.String)
Stirng path = FilenameUtils.normalize("/foo/../bar/../baz");
For both the result will be \baz
Looking at the OWASP page for Path Manipulation, it says
An attacker can specify a path used in an operation on the filesystem
You are opening a file as defined by a user-given input. Your code is almost a perfect example of the vulnerability! Either
Don't use the above code (don't let the user specify the input file as an argument)
Let the user choose from a list of files that you supply (an array of files with an integer choice)
Don't let the user supply the filename at all, remove the configurability
Accept the vulnerability but protect against it by checking the filename (although this is the worst thing to do - someone may get round it anyway).
Or re-think your application's design.
Fortify will flag the code even if the path/file doesn't come from user input like a property file. The best way to handle these is to canonicalize the path first, then validate it against a white list of allowed paths.
Bad:
public class Test {
public static void main(String[] args) {
File file=new File(args[0]);
}
}
Good:
public class Test {
public static void main(String[] args) {
File file=new File(args[0]);
if (!isInSecureDir(file)) {
throw new IllegalArgumentException();
}
String canonicalPath = file.getCanonicalPath();
if (!canonicalPath.equals("/img/java/file1.txt") &&
!canonicalPath.equals("/img/java/file2.txt")) {
// Invalid file; handle error
}
FileInputStream fis = new FileInputStream(f);
}
Source: https://www.securecoding.cert.org/confluence/display/java/FIO16-J.+Canonicalize+path+names+before+validating+them
Only allow alnum and a period in input. That means you filter out the control chars, "..", "/", "\" which would make your files vulnerable. For example, one should not be able to enter /path/password.txt.
Once done, rescan and then run Fortify AWB.
I have a solution to the Fortify Path Manipulation issues.
What it is complaining about is that if you take data from an external source, then an attacker can use that source to manipulate your path. Thus, enabling the attacker do delete files or otherwise compromise your system.
The suggested remedy to this problem is to use a whitelist of trusted directories as valid inputs; and, reject everything else.
This solution is not always viable in a production environment. So, I suggest an alternative solution. Parse the input for a whitelist of acceptable characters. Reject from the input, any character you don't want in the path. It could be either removed or replaced.
Below is an example. This does pass the Fortify review. It is important to remember here to return the literal and not the char being checked. Fortify keeps track of the parts that came from the original input. If you use any of the original input, you may still get the error.
public class CleanPath {
public static String cleanString(String aString) {
if (aString == null) return null;
String cleanString = "";
for (int i = 0; i < aString.length(); ++i) {
cleanString += cleanChar(aString.charAt(i));
}
return cleanString;
}
private static char cleanChar(char aChar) {
// 0 - 9
for (int i = 48; i < 58; ++i) {
if (aChar == i) return (char) i;
}
// 'A' - 'Z'
for (int i = 65; i < 91; ++i) {
if (aChar == i) return (char) i;
}
// 'a' - 'z'
for (int i = 97; i < 123; ++i) {
if (aChar == i) return (char) i;
}
// other valid characters
switch (aChar) {
case '/':
return '/';
case '.':
return '.';
case '-':
return '-';
case '_':
return '_';
case ' ':
return ' ';
}
return '%';
}
}
Assuming you're running Fortify against a web application, during your triage of Fortify vulnerabilities that would likely get marked as "Not an issue". Reasoning being A) obviously this is test code and B) unless you have multiple personality disorder you're not going to be doing a path manipulation exploit against your self when you run that test app.
If very common to see little test utilities committed to a repository which produces this style of false positive.
As for your compilation errors, that generally comes down to classpath issues.
We have code like below which was raising Path Manipulation high category issue in fortify .
String.join(delimeter,string1,string2,string2,string4);
Our program is to deal with AWS S3 bucket so, we changed as below and it worked .
com.amazonaws.util.StringUtils.join(delimeter,string1,string2,string2,string4);
Using the Tika library FilenameUtils.normalize solves the fortify issue.
import org.apache.tika.io.FilenameUtils;
public class Test {
public static void main(String[] args) {
String filePath = FilenameUtils.normalize(args[0]); //This line solve issue.
File file=new File(filePath);
}
}
Try this for replacing FileInputStream. You will need to close your project and open again to accurately see whether changes worked.
File to byte[] in Java
Use Normalize() function in C# and it resolved the fortify vulnerability in next scan.
string s = #:c:\temp\scan.log".Normalize();
Use regex to validate the file path and file name
fileName = args[0];
final String regularExpression = "([\\w\\:\\\\w ./-]+\\w+(\\.)?\\w+)";
Pattern pattern = Pattern.compile(regularExpression);
boolean isMatched = pattern.matcher(fileName).matches();

Resources