For reference, here is my code: http://hpaste.org/86949
I am trying to parse the following expression: if (a[1].b[2].c.d[999].e[1+1].f > 3) { }. The method playing up is varExpr, which parses the variable member chains.
Context
In the language I am parsing, a dot specifies access to a member variable. Since a member variable can itself be another object, chains can be produced, i.e. a.b.c, or essentially (a.b).c. (The dots here denote member access, not function composition.)
Implementation
The logic is like this:
First, before <- many vocc collects all the instances of varname . and their optional array expressions, leaving only a single identifier remaining
Then, this <- vtrm collects that remaining identifier plus its array expression -- the only term not followed by a dot
Issues
I am having two issues:
Firstly, the first term [for a reason that I cannot determine] always seems to require that it be wrapped in brackets for the parser to accept it, i.e. (a[1]).b[2].c... -- subsequent terms do not require this.
Secondly, the many vocc won't stop parsing. It always expects another identifier and another dot and I am unable to terminate the expression to catch the last vtrm.
I am looking for hints or solutions that will help me solve my problem(s)/headaches. Thanks.
When varExpr runs, it checks whether the next bit of input is matched by vocc or vtrm.
varExpr = do before <- many vocc -- Zero or more occurrences
             this <- vtrm
             return undefined
The problem is that any input matched by vtrm is also matched by the first step of vocc. When varExpr runs, it runs vocc, which runs vobj, which runs vtrm.
vocc = vobj <* symbol "."
vobj = choice [try vtrm, try $ parens vtrm]
Parsing of many vocc ends when vocc fails without consuming input. This happens when both vtrm and parens vtrm fail. However, after many vocc ends, the next parser to run is vtrm—and this parser is sure to fail!
You want vocc to fail without consuming input if it doesn't find a "." in the input. For that, you need to use try.
vocc = try $ vobj <* symbol "."
Alternatively, if vobj and vtrm really should be the same syntax, you can define varExpr as vobj `sepBy1` symbol ".".
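For concreteness, a minimal self-contained sketch of that sepBy1 formulation might look like the following; the emptyDef-based lexer and the integer-only index are assumptions on my part, since the real definitions are only in the linked paste:
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Assumed token parsers standing in for the ones in the linked paste
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

identifier :: Parser String
identifier = Tok.identifier lexer

symbol :: String -> Parser String
symbol = Tok.symbol lexer

brackets :: Parser a -> Parser a
brackets = Tok.brackets lexer

-- One chain element: an identifier with an optional index, simplified
-- here to an integer literal instead of a full expression
vtrm :: Parser (String, Maybe Integer)
vtrm = (,) <$> identifier <*> optionMaybe (brackets (Tok.integer lexer))

-- The whole chain: elements separated by dots, e.g. a[1].b[2].c
varExpr :: Parser [(String, Maybe Integer)]
varExpr = vtrm `sepBy1` symbol "."
Because symbol "." fails without consuming input when the next token is not a dot, the chain ends cleanly at the last identifier without any extra try.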
Related
I'm attempting to parse permutations of flags. The behavior I want is "one or more flags in any order, without repetition". I'm using the following packages:
megaparsec
parser-combinators
The code I have is outputting what I want, but is too lenient on inputs. I don't understand why it's accepting multiples of the same flags. What am I doing wrong here?
pFlags :: Parser [Flag]
pFlags = runPermutation $ f <$>
  toPermutation (optional (GroupFlag <$ char '\'')) <*>
  toPermutation (optional (LeftJustifyFlag <$ char '-'))
  where f a b = catMaybes [a, b]
Examples:
"'-" = [GroupFlag, LeftJustifyFlag] -- CORRECT
"-'" = [LeftJustifyFlag, GroupFlag] -- CORRECT
"''''-" = [GroupFlag, LeftJustifyFlag] -- INCORRECT, should fail if there's more than one of the same flag.
Instead of toPermutation with optional, I believe you need to use toPermutationWithDefault, something like this (untested):
toPermutationWithDefault Nothing (Just GroupFlag <$ char '\'')
The reasoning is described in the paper “Parsing Permutation Phrases” (PDF) in §4, “adding optional elements” (emph. added):
Consider, for example […] all permutations of a, b and c. Suppose b can be empty and we want to recognise ac. This can be done in three different ways since the empty b can be recognised before a, after a or after c. Fortunately, it is irrelevant for the result of a parse where exactly the empty b is derived, since order is not important. This allows us to use a strategy similar to the one proposed by Cameron: parse nonempty constituents as they are seen and allow the parser to stop if all remaining elements are optional. When the parser stops the default values are returned for all optional elements that have not been recognised.
To implement this strategy we need to be able to determine whether a parser can derive the empty string and split it into its default value and its non-empty part, i.e. a parser that behaves the same except that it does not recognise the empty string.
That is, the permutation parser needs to know which elements can succeed without consuming input, otherwise it will be too eager to commit to a branch. I don’t know why this would lead to accepting multiples of an element, though; perhaps you’re also missing an eof?
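To make that concrete, here is an untested sketch of the toPermutationWithDefault version, with eof added at the call site; the Parser and Flag definitions below are assumptions, since they aren't shown in the question:
import Control.Applicative.Permutations (runPermutation, toPermutationWithDefault)
import Data.Maybe (catMaybes)
import Data.Void (Void)
import Text.Megaparsec (Parsec, eof, parseTest)
import Text.Megaparsec.Char (char)

-- Assumed types; the question's own Parser and Flag may differ
type Parser = Parsec Void String

data Flag = GroupFlag | LeftJustifyFlag
  deriving (Show, Eq)

-- Each element carries a default (Nothing) instead of being wrapped in
-- optional, so the permutation parser knows it may be skipped entirely
pFlags :: Parser [Flag]
pFlags = runPermutation $ f
  <$> toPermutationWithDefault Nothing (Just GroupFlag       <$ char '\'')
  <*> toPermutationWithDefault Nothing (Just LeftJustifyFlag <$ char '-')
  where f a b = catMaybes [a, b]

main :: IO ()
main = do
  parseTest (pFlags <* eof) "'-"     -- [GroupFlag,LeftJustifyFlag]
  parseTest (pFlags <* eof) "''''-"  -- parse error: the extra quotes are left over and eof rejects them
Note that with defaults the empty string now parses as []; if "one or more flags" is a hard requirement, you would still need to reject a null result separately.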
I am writing the following simple routine:
program scratch
  character*4 :: word
  word = 'hell'
  print *, concat(word)
end program scratch

function concat(x)
  character*(*) x
  concat = x // 'plus stuff'
end function concat
The program should be taking the string 'hell' and concatenating to it the string 'plus stuff'. I would like the function to be able to take in any length string (I am planning to use the word 'heaven' as well) and concatenate to it the string 'plus stuff'.
Currently, when I run this on Visual Studio 2012 I get the following error:
Error 1 error #6303: The assignment operation or the binary
expression operation is invalid for the data types of the two
operands. D:\aboufira\Desktop\TEMP\Visual
Studio\test\logicalfunction\scratch.f90 9
This error is for the following line:
concat = x // 'plus stuff'
It is not apparent to me why the two operands are not compatible. I have set them both to be strings. Why will they not concatenate?
High Performance Mark's comment tells you about why the compiler complains: implicit typing.
The result of the function concat is implicitly typed because you haven't declared its type otherwise. Although x // 'plus stuff' is the correct way to concatenate character variables, you're attempting to assign that new character object to an (implicitly) real function result.
Which leads to the question: "just how do I declare the function result to be a character?". Answer: much as you would any other character variable:
character(len=length) concat
[note that I use character(len=...) rather than character*.... I'll come on to exactly why later, but I'll also point out that the form character*4 is obsolete according to current Fortran, and may eventually be deleted entirely.]
The tricky part is: what is the length it should be declared as?
When declaring the length of a character function result which we don't know ahead of time, there are two¹ approaches:
an automatic character object;
a deferred length character object.
In the case of this function, we know that the length of the result is 10 longer than the input. We can declare
character(len=LEN(x)+10) concat
To do this we cannot use the form character*(LEN(x)+10).
In a more general case, deferred length:
character(len=:), allocatable :: concat ! Deferred length, will be defined on allocation
where later
concat = x//'plus stuff' ! Using automatic allocation on intrinsic assignment
Using these forms adds the requirement that the function concat has an explicit interface in the main program. You'll find much about that in other questions and resources. Providing an explicit interface will also remove the problem that, in the main program, concat also implicitly has a real result.
To stress:
program
  implicit none
  character(len=[something]) concat
  print *, concat('hell')
end program
will not work for concat having result of the "length unknown at compile time" forms. Ideally the function will be an internal one, or one accessed from a module.
¹ There is a third: assumed length function result. Anyone who wants to know about this could read this separate question. Everyone else should pretend this doesn't exist. Just like the writers of the Fortran standard.
In a small DSL, I'm parsing macro definitions, similarly to #define C pre-processor directives (here a simplistic example):
_def mymacro(a,b) = a + b / a
When the following call is encountered by the parser
c = mymacro(pow(10,2),3)
it is expanded to
c = pow(10,2) + 3 / pow(10,2)
My current approach is:
wrap the parser in a State monad
when parsing macro definitions, store them in the state, with their body unparsed (parse it as a string)
when parsing a macro call, find the definition in the state, replace the arguments in the body text, replace the call with this body and resume the parsing.
Some code from the last step:
macrocallStmt
  = do -- capture starting position and content of old input before macro call
       oldInput <- getInput
       oldPos   <- getPosition
       -- parse the call
       ret  <- identifier
       symbolCS "="
       i    <- identifier
       args <- parens $ commaSep anyExprStr
       -- expand the macro call
       us <- get
       let inlinedCall = replaceMacroArgs i args ret us
       -- set up new input with macro call expanded
       remainder <- getInput
       let newInput = T.append inlinedCall (T.cons '\n' remainder)
       setPosition oldPos
       setInput newInput
       -- update the expanded input script
       modify (updateExpandedInput oldInput newInput)
anyExprStr = fmap praShow expression <|> fmap praShow algexpr
This approach does the job decently. However, it has a number of drawbacks.
Parsing multiple times
Any valid DSL expression can be an argument of the macro call. Therefore, even though I only need their textual representation (to be replaced in the macro body), I need to parse them and then convert them again to string - simply looking for the next comma wouldn't work. Then the complete and customised macro will be parsed. So in practice, macro arguments get parsed twice (and also show-ed, which has its cost). Moreover, each call requires a new parsing of the (almost same) body. The reason to keep the body unparsed in memory is to allow maximum flexibility: in the body, even DSL keywords could be constructed out of the macro arguments.
Error handling
Because the expanded body is inserted in front of the unconsumed input (replacing the call), the initial and final input can be quite different. In the event of a parse error, the position where the error occurred in the expanded input is available. However, when processing the error, I only have the original, not expanded, input. So the error position won't match.
That is why, in the code snippet above, I use the state to save the expanded input, so that it is available when the parser exits with an error.
This works well, but I noticed that it becomes quite costly, with new Text arrays (the input stream is Text) being allocated for the whole stream at every expansion. Perhaps keeping the expanded input in the state as String, rather than Text, would be cheaper in this case, i.e. when a middle part needs to be replaced?
The reasons for this question are:
I would appreciate suggestions / comments on the two issues described above
Can anyone suggest a better approach altogether?
I'm having a problem parsing the lat and long coordinates from TinyGPS++ to a double or a String. The code that I'm using is:
String latt = ((gps.location.lat(),6));
String lngg = ((gps.location.lng(),6));
Serial.println(latt);
Serial.println(lngg);
The output that I'm getting is:
0.06
Does somebody know what I'm doing wrong? Does it have something to do with rounding, like a Math.Round-style function in Arduino?
Thanks!
There are two problems:
1. This does not compile:
String latt = ((gps.location.lat(),6));
The error I get is
Wouter.ino:4: warning: left-hand operand of comma has no effect
Wouter:4: error: invalid conversion from 'int' to 'const char*'
Wouter:4: error: initializing argument 1 of 'String::String(const char*)'
There is nothing in the definition of the String class that would allow this statement. I was unable to reproduce printing values of 0.06 (in your question) or 0.006 (in a later comment). Please edit your post to have the exact code that compiles, runs and prints those values.
2. You are unintentionally using the comma operator.
There are two places a comma can be used: to separate arguments to a function call, and to separate multiple expressions which evaluate to the last expression.
You're not calling a function here, so it is the latter use. What does that mean? Here's an example:
int x = (1+y, 2*y, 3+(int)sin(y), 4);
The variable x will be assigned the value of the last expression, 4. There are very few reasons that anyone would actually use the comma operator in this way. It is much more understandable to write:
int x;
1+y; // Just a calculation, result never used
2*y; // Just a calculation, result never used
3 + (int) sin(y); // Just a calculation, result never used
x = 4; // A (trivial) calculation, result stored in 'x'
The compiler will usually optimize out the first 3 statements and only generate code for the last one¹. I usually see the comma operator in #define macros that are trying to avoid multiple statements.
For your code, the compiler sees this
((gps.location.lat(),6))
And evaluates it as a call to gps.location.lat(), which returns a double value. The compiler throws this value away, and even warns you that it "has no effect."
Next, it sees a 6, which is the actual value of this expression. The parentheses get popped, leaving the 6 value to be assigned to the left-hand side of the statement, String latt =.
If you look at the declaration of String, it does not define how to take an int like 6 and either construct a new String, or assign it 6. The compiler sees that String can be constructed from const char *, so it tells you that it can't convert a numeric 6 to a const char *.
Unlike a compiler, I think I can understand what you intended:
double latt = gps.location.lat();
double lngg = gps.location.lng();
Serial.println( latt, 6 );
Serial.println( lngg, 6 );
The 6 is intended as an argument to Serial.println. And those arguments are correctly separated by a comma.
As a further bonus, it does not use the String class, which will undoubtedly cause headaches later. Really, don't use String. Instead, hold on to numeric values, like ints and floats, and convert them to text at the last possible moment (e.g., with println).
I have often wished for a compiler that would do what I mean, not what I say. :D
¹ Depending on y's type, evaluating the expression 2*y may have side effects that cannot be optimized away. The streaming operator << is a good example of a mathematical operator (left shift) with side effects that cannot be optimized away.
And in your code, calling gps.location.lat() may have modified something internal to the gps or location classes, so the compiler may not have optimized the function call away.
In all cases, the result of the call is not assigned because only the last expression value (the 6) is used for assignment.
I have recently started learning Haskell and have been trying my hand at Parsec. However, for the past couple of days I have been stuck with a problem that I have been unable to find the solution to. So what I am trying to do is write a parser that can parse a string like this:
<"apple", "pear", "pineapple", "orange">
The code that I wrote to do that is:
collection :: Parser [String]
collection = (char '<') *> (string `sepBy` char ',') <* (char '>')
string :: Parser String
string = char '"' *> (many (noneOf ['\"', '\r', '\n', '"'])) <* char '"'
This works fine for me as it is able to parse the string that I have defined above. Nevertheless, I would now like to enforce the rule that every element in this collection must be unique and that is where I am having trouble. One of the first results I found when searching on the internet was this one, which suggest the usage of the nub function. Although the problem stated in that question is not the same, it would in theory solve my problem. But what I don't understand is how I can apply this function within a Parser. I have tried adding the nub function to several parts of the code above without any success. Later I also tried doing it the following way:
collection :: Parser [String]
collection = do
  char '<'
  value <- string `sepBy` char ','
  char '>'
  return nub value
But this does not work as the type does not match what nub is expecting, which I believe is one of the problems I am struggling with. I am also not entirely sure whether nub is the right way to go. My fear is that I am going in the wrong direction and that I won't be able to solve my problem like this. Is there perhaps something I am missing? Any advice or help anyone could provide would be greatly appreciated.
The Parsec Parser type is an instance of MonadPlus, which means that we can always fail (i.e. cause a parse error) whenever we want. A handy function for this is guard:
guard :: MonadPlus m => Bool -> m ()
This function takes a boolean. If it's true, it returns () and the whole computation (a parse in this case) does not fail. If it's false, the whole thing fails.
So, as long as you don't care about efficiency, here's a reasonable approach: parse the whole list, check for whether all the elements are unique and fail if they aren't.
To do this, the first thing we have to do is write a predicate that checks if every element of a list is unique. nub does not quite do the right thing: it returns a list with all the duplicates taken out. But if we don't care much about performance, we can use it to check:
allUnique ls = length (nub ls) == length ls
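For example, in GHCi:
λ> allUnique ["apple","pear","pineapple"]
True
λ> allUnique ["apple","pear","apple"]
False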
With this predicate in hand, we can write a function unique that wraps any parser that produces a list and ensures that list is unique:
unique parser = do res <- parser
                   guard (allUnique res)
                   return res
Again, if guard is given True, it doesn't affect the rest of the parse. But if it's given False, it will cause an error.
Here's how we could use it:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\">"
Right ["apple","pear","pineapple","orange"]
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):unknown parse error
This does what you want. However, there's a problem: there is no error message supplied. That's not very user friendly! Happily, we can fix this using <?>. This is an operator provided by Parsec that lets us set the error message of a parser.
unique parser = do res <- parser
                   guard (allUnique res) <?> "unique elements"
                   return res
Ahhh, much better:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):
expecting unique elements
All this works but, again, it's worth noting that it isn't efficient. It parses the whole list before realizing elements aren't unique, and nub takes quadratic time. However, this works and it's probably more than good enough for parsing small to medium-sized files: ie most things written by hand rather than autogenerated.
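If the quadratic nub ever does become a bottleneck, a set-based check is a near drop-in replacement; the only change is an Ord constraint on the elements:
import qualified Data.Set as Set

-- Duplicates collapse when building the set, so the sizes differ
-- exactly when the list has a repeated element
allUnique :: Ord a => [a] -> Bool
allUnique ls = Set.size (Set.fromList ls) == length ls
Everything else above stays the same; unique simply calls the faster predicate.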