Every char action for column count in Ragel - lexer

What is the preferred way to implement a column counter in a Ragel finite state machine? If it makes any difference, my main machine is a scanner as defined in chapter 6.3 of the Ragel manual. I think I probably just need to execute an action for every character consumed (i.e. incrementing a counter), but if there's a better way to do it, I'd love to know.

You can keep track of the position of newlines in the input and get the column using the current position whenever you need to, using something like column = p - lineStart + 1, where lineStart is the position just after the previous newline (or the beginning of the file if you're on the first line).
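As a rough illustration of that bookkeeping (written in Python rather than as Ragel actions, and with the hypothetical names `positions`, `line` and `line_start`), the sketch below updates the line start only when a newline is seen and derives the column from the current index on demand. In a Ragel machine the same update would presumably live in an action attached to the newline pattern.

```python
def positions(data):
    """Return the 1-based (line, column) of every character in data."""
    line = 1
    line_start = 0  # index just after the previous newline (0 on the first line)
    out = []
    for p, ch in enumerate(data):
        # The column is computed from p only when needed; nothing is
        # incremented per character.
        out.append((line, p - line_start + 1))
        if ch == "\n":
            line += 1
            line_start = p + 1
    return out


if __name__ == "__main__":
    for pos, ch in zip(positions("ab\ncde\nf"), "ab\ncde\nf"):
        print(pos, repr(ch))
```

The only per-character cost is the newline test, which a scanner usually performs anyway.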

Related

How to deal with CHARACTER variables, longer than the allowed maximum of 2000?

I'm working with Progress-4GL, appBuilder and procedure editor, release 11.6.
I've just found a CHARACTER type global variable (DEF VAR global_variable AS CHAR NO-UNDO.), containing up to 12901 characters. The variable is only used for passing information within the application, the information will never be stored as one tuple within a table.
The information in that variable seems to be handled well: the content is correct.
Yet, as this URL mentions, the maximum length of a character variable in Progress is 2000 characters, and this makes me worry: I'm afraid that one day another limit may be crossed and from that moment on we'll need to rethink the whole idea, and I'd like to be prepared for that day.
Therefore, does anybody know the "next" length limit of a character variable in Progress?
That reference you mention points to SQL limitations.
In the ABL, a CHARACTER variable can hold roughly 32K:
DEFINE VARIABLE c AS CHARACTER NO-UNDO.
ASSIGN c = FILL ("*", 31000) .
MESSAGE LENGTH (c)
VIEW-AS ALERT-BOX INFORMATION BUTTONS OK.
Beyond that you have to use LONGCHAR, with its limitations:
slightly slower
cannot be indexed in temp-tables or database tables.
CHARACTER variables are always stored in the CPINTERNAL codepage. LONGCHARs can use a different codepage through the FIX-CODEPAGE statement.

Is there a way to calculate every possible order of operations for one expression in Python?

Let's say that I have a = '1+2*5/3'. There's a specific order in which my machine will evaluate this statement (with eval(a)).
I would like to know if there's a line of code (or a function, or just an elegant way to get the job done) that would calculate:
(1+2)*5/3
1+(2*5)/3
1+2*(5/3)
(1+2*5)/3
1+(2*5/3)
(1+2)*(5/3)
1+2*5/3
In this example, I used an operation with 4 factors, so I could just code one function for each possibility, but I need to do the same thing with 6 factors, and that would take far too much time and effort, since the number of different operation orders grows exponentially.
It would also be great if it returned everything in a dictionary of the form {operation:result} with the parentheses included; if not, I'll find my way around it.
Edit: as requested, the main goal is to make a program that finds the solution to the game "le compte est bon" by brute force. The rules can be found here: https://en.wikipedia.org/wiki/Des_chiffres_et_des_lettres#Le_compte_est_bon_.28.22the_total_is_right.22.29
This is going to be very hard. I recommend you follow these steps:
Create a list to check if the formula has already been calculated
Randomize the order (such as +-*/) and randomly place numbers
Check if number one's result is a valid formula. If not, try number 1 again
Randomize the order (such as opening and closing parentheses and ^)
Check if the sentence above is a valid formula. If not, try number 3 again.
Check the formula against the list to see if it has already been calculated. If it has, we don't use it and go back to number two. But...
If it is not in the list then we can use it.
Those are the basic steps for common known math symbols, but what about square root?
Another way to do this is by making Python move the symbols over like you did with the parentheses, but for EVERYTHING (numbers and symbols (+-/*)).
EDIT:
This was before the original question was changed.
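As a deterministic alternative to the randomized steps above, the groupings of a fixed sequence of numbers and operators can be enumerated recursively: pick the operator that is applied last, then recurse on the left and right parts. The sketch below is only one possible approach, not the answerer's method; the names `groupings` and `OPS` are made up for illustration, and it uses `Fraction` so that divisions stay exact. It emits fully parenthesized expressions, so a 4-number expression produces 5 groupings rather than the 7 partially parenthesized spellings listed in the question.

```python
from fractions import Fraction
import operator

# Hypothetical operator table for this sketch.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def groupings(numbers, ops):
    """Yield (text, value) for every full parenthesization of the expression.

    numbers has one more element than ops; the order of both is kept fixed.
    """
    if len(numbers) == 1:
        yield str(numbers[0]), Fraction(numbers[0])
        return
    for i, op in enumerate(ops):  # ops[i] is the operator applied last
        for ltext, lval in groupings(numbers[:i + 1], ops[:i]):
            for rtext, rval in groupings(numbers[i + 1:], ops[i + 1:]):
                if op == "/" and rval == 0:
                    continue  # skip divisions by zero
                yield f"({ltext}{op}{rtext})", OPS[op](lval, rval)


if __name__ == "__main__":
    results = dict(groupings([1, 2, 5, 3], ["+", "*", "/"]))
    for text, value in results.items():
        print(text, "=", value)
```

For the brute-force game solver, this would presumably be combined with `itertools.permutations` over the numbers and `itertools.product` over the operators.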

The most effective string replace algorithm?

We know that most code editors implement string search with the Boyer-Moore algorithm. How do they implement string replace? Any ideas?
I'm guessing that nowadays most text editors use either a single block of memory to hold the entire file, or an array of lines or blocks of larger size, each of which points to its own block of memory. (In the past there have been more interesting techniques employed. One way is to have all text to the left or above the cursor position "pressed against" the left end of a fixed-size buffer, and all text to the right or below "pressed against" the right end, with a gap of free space in the middle. Then the common operations of inserting or deleting characters can take place in constant time! Moving the cursor k positions to the right entails sliding k bytes from the left end of the right segment to the right end of the left segment, i.e. moving the cursor is now a linear time operation!)
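The left/right "pressed against the ends" layout described in parentheses is commonly called a gap buffer. A minimal sketch in Python follows, assuming a fixed capacity as in the description; the class name `GapBuffer` and its method names are invented for this illustration.

```python
class GapBuffer:
    """Text left of the cursor sits at the start of buf, the rest at the end."""

    def __init__(self, text, capacity):
        if capacity < len(text):
            raise ValueError("capacity too small")
        self.buf = [None] * capacity
        self.buf[capacity - len(text):] = list(text)  # cursor starts at 0
        self.gap_start = 0                     # first free slot = cursor
        self.gap_end = capacity - len(text)    # first slot of the right part

    def insert(self, ch):
        """Insert one character at the cursor in constant time."""
        if self.gap_start == self.gap_end:
            raise MemoryError("buffer full")   # fixed-size buffer in this sketch
        self.buf[self.gap_start] = ch
        self.gap_start += 1

    def delete_before_cursor(self):
        """Delete the character left of the cursor in constant time."""
        if self.gap_start > 0:
            self.gap_start -= 1

    def move_right(self, k):
        """Move the cursor k positions right: O(k) characters are slid over."""
        for _ in range(min(k, len(self.buf) - self.gap_end)):
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def move_left(self, k):
        """Move the cursor k positions left: O(k) characters are slid over."""
        for _ in range(min(k, self.gap_start)):
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


if __name__ == "__main__":
    gb = GapBuffer("hello world", 32)
    gb.move_right(5)
    gb.insert(",")
    print(gb.text())  # hello, world
```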
Assuming that the text is stored in an "ordinary" way (i.e. not the left-right cursor-dependent buffer pair described above), there aren't too many ways to optimise replace operations, especially if the replacement text is longer than the search text -- in this case, there is no escaping the fact that the rest of the line/block/file must be shunted forward in memory for each replacement. The best you can do there is to avoid multiple O(n) copy operations when one will do -- i.e. don't delete the search string, then insert the replacement string one character at a time, shunting the rest of the line/block/document forward one character at a time, because the latter step will cost O(n^2) time. Instead, shunt the rest of the document text far enough forward to make room for the replacement string in one O(n) step.
If the replacement string is shorter than the search text, you can scan forward with two pointers from and to, always copying from one to the other. As replacements are made, to will start to lag behind from. This is safe because to <= from always holds, so you will never write over something you have to read later.
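A sketch of this two-pointer scan on a `bytearray` is shown below. Because `from` is a reserved word in Python, the pointers are called `src` and `dst` here; the function name `replace_in_place` is likewise just an illustrative choice. Matching uses a plain slice comparison rather than Boyer-Moore, to keep the copying logic in focus.

```python
def replace_in_place(buf, search, repl):
    """Replace every occurrence of search in buf, without a second buffer.

    Only valid when repl is not longer than search, so dst <= src always holds.
    """
    if not search or len(repl) > len(search):
        raise ValueError("need a non-empty search and len(repl) <= len(search)")
    src = dst = 0
    n, m = len(buf), len(search)
    while src < n:
        if buf[src:src + m] == search:
            # Write the replacement behind the read position.
            buf[dst:dst + len(repl)] = repl
            dst += len(repl)
            src += m
        else:
            buf[dst] = buf[src]
            dst += 1
            src += 1
    del buf[dst:]  # drop the leftover tail


if __name__ == "__main__":
    data = bytearray(b"a--b--c")
    replace_in_place(data, b"--", b"-")
    print(data.decode())  # a-b-c
```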
Actually, if the replacement string is longer than the search string, and no proper suffix of the search string is also a prefix of the search string (so that matches cannot overlap), then you can safely scan backwards from the end in one O(n) pass. The suffix/prefix requirement is necessary to avoid situations like the following, which would produce different behaviour depending on the scan direction:
Search and replace "abcabc" with "xyz" in document text "abcabcabc":
S&R using forward algo gives: xyzabc
S&R using backward algo gives: abcxyz
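The direction dependence in this example can be reproduced in a few lines of Python. The sketch below simulates a backward scan by reversing all three strings, running the built-in left-to-right `str.replace`, and reversing the result; the helper names are made up for illustration.

```python
def replace_forward(text, search, repl):
    """Left-to-right, non-overlapping replacement (Python's built-in behaviour)."""
    return text.replace(search, repl)


def replace_backward(text, search, repl):
    """Right-to-left replacement, simulated by reversing every string."""
    return text[::-1].replace(search[::-1], repl[::-1])[::-1]


if __name__ == "__main__":
    print(replace_forward("abcabcabc", "abcabc", "xyz"))   # xyzabc
    print(replace_backward("abcabcabc", "abcabc", "xyz"))  # abcxyz
```

When the search string cannot overlap with itself, both functions return the same result.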

Algorithm for string replacement based on conditional char replacement

Usage case: I'm writing a domain specific language (DSL) for a regex-like but way more powerful Lispy string processing system focused on conditional replacements (like simulation of language evolution for conlangers/linguists) rather than matching as regexes do. As usual I wrote down the specs before actually writing down the code.
However, due to a somewhat stupid but hard to fix mistake, I ended up with a system only capable of doing stuff one char at a time. Thus, a rewrite rule might be (in pseudocode) change 'a' to 'e' when last char is 's' and next char is 'd'. Chars can also be deleted: delete 'a' when ....
Since the interpreter for the DSL is a bit spaghetti-ish (not in the sense of unstructured, but in the sense that 1. I haven't figured out OO for my implementation lang Chicken Scheme 2. No IDE, so must remember 20+ variable names and use emacs) I don't want to touch it, but rather "unsugar" string replacements to conditional char replacements.
The trivial example: change "ab" to "cd" unconditionally rewrites to change 'a' to 'c' when followed by 'b'; change 'b' to 'd' when preceded by 'a'. However, when there are conditions, things become very ugly very quickly. Is there some easy recursive way to do the rewriting, or is this nearly impossible in the rewriting phase, so that I should probably fix my DSL interpreter? (Note: my DSL has ways to get the n-th letter before and after the current char.)
The problem is that since we are going through the data character-at-a-time, when a condition is applied to a multi-character string, that condition has to be expressed in different ways for every position. For instance "abc" followed by "x" combines in a straightforward way into the condition for a, b and c, but has to change shape. The x is actually three positions away from a, but only two from b. This is bad because it causes a proliferation of conditions, which all get wastefully evaluated.
I'd solve this by adding the concept of frames into the interpreter. A frame is established at the current character position, and then holds that position somehow, allowing frame-relative addressing of the characters.
I can think of a few ways of introducing this position fixing. One would be to introduce variable binding into the interpreter. It could support a pair of instructions bind symbol and unbind n, where we would be using a gensym for the symbol.
When generating the code for an operation on a string like "abc", we would generate an instruction like bind #:g0025, which would fix the position of the a, and then the compiler will analyze the conditions applied to the string, and re-phrase them in terms that are relative to #:g0025. After the processing of "abc", we would emit unbind 1 to drop the most recently bound variable.
We could also bind variables to the Boolean values of conditions.
As an example with the named frames, suppose we have
Replace "abc" with "ijk" when preceded by "x" and followed by "yz".
This goes to something like:
bind #:frame
bind #:cond0 to when #:frame[-1] is "x" and #:frame[3] is "y" and #:frame[4] is "z"
replace "a" with "i" when #:cond0 ; these insns move the char position
replace "b" with "j" when #:cond0
replace "c" with "k" when #:cond0
unbind 2
So the difficulty has been translated to one of compiling the condition into frame-relative addressing. The #:frame[3] is derived from the length of the "abc" pattern, which is available to the translator all at once. That is information not available in the target language, which doesn't have "abc" all at once.
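To make the translation step concrete, here is a small Python sketch of a translator that emits the instruction listing above from a string-level rule. Everything about it is hypothetical: the function name `compile_rule`, the restriction to replacements of the same length as the pattern, and the textual instruction format, which simply mirrors the pseudo-instructions used in this answer rather than any real DSL.

```python
def compile_rule(pattern, replacement, before="", after=""):
    """Translate a string rule into per-character, frame-relative instructions."""
    if len(pattern) != len(replacement):
        raise ValueError("this sketch only handles equal-length replacements")
    # Context before the pattern sits at negative offsets from the frame,
    # context after it starts at offset len(pattern).
    tests = [f'#:frame[{i - len(before)}] is "{ch}"'
             for i, ch in enumerate(before)]
    tests += [f'#:frame[{len(pattern) + i}] is "{ch}"'
              for i, ch in enumerate(after)]
    code = ["bind #:frame"]
    bound = 1
    suffix = ""
    if tests:
        code.append("bind #:cond0 to when " + " and ".join(tests))
        bound += 1
        suffix = " when #:cond0"
    for old, new in zip(pattern, replacement):
        code.append(f'replace "{old}" with "{new}"{suffix}')
    code.append(f"unbind {bound}")
    return code


if __name__ == "__main__":
    print("\n".join(compile_rule("abc", "ijk", before="x", after="yz")))
```

The offsets are computed once, from the full pattern length, which is exactly the information the character-at-a-time target language lacks.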
The system almost certainly needs some way to try different matches at the same location. If there is no "abc" at the current position, another rule which replaces "foo" with something has to be tried at the same position. Perhaps when the conditions fail, the instruction doesn't advance the character position. So in our example above, that would work: all instructions share the same condition, so in the case of a match the position moves by three positions, otherwise it doesn't. Still, in spite of that, there may be a requirement to have multiple edits with different conditions at the same spot. The scope of my answer isn't to design the whole thing, though.

Extracting information in a string

I would like to parse strings with an arbitrary number of parameters, such as P1+05 or P2-01, all put together like P1+05P2-02. I can get that data from strings with a rather large (too much to post here) IF tree and a variable keeping track of the position within the string. When reaching a key letter (like P) it knows how many characters to read and proceeds accordingly, nothing special. In this example, say I have two players in a game and I want to give +05 and -01 health to players 1 and 2, respectively (hence the +-; I want them to be somewhat readable).
It works, but I feel this could be done better. I am using Lua to parse the strings, so maybe there is some built-in function within Lua to ease that process? Or maybe some general hints or references for better approaches?
Here is some code:
for w in string.gmatch("P1+05P2-02", "%u[^%u]+") do
  print(w)
end
It assumes that each "word" begins with an uppercase letter and its parameters contain no uppercase letters.
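For comparison, the same splitting idea can be taken one step further by capturing the fields of each "word" directly. The sketch below does this with Python's `re` module rather than Lua patterns; the function name `parse_commands` and the assumed word layout (one uppercase key letter, an unsigned number, then a signed number) are illustrative guesses based on the examples in the question.

```python
import re

# Assumed layout: key letter, player number, signed amount (e.g. "P1+05").
WORD = re.compile(r"([A-Z])(\d+)([+-]\d+)")


def parse_commands(s):
    """Return a list of (key, number, delta) tuples found in s."""
    return [(key, int(num), int(delta)) for key, num, delta in WORD.findall(s)]


if __name__ == "__main__":
    print(parse_commands("P1+05P2-02"))  # [('P', 1, 5), ('P', 2, -2)]
```

Lua's `string.gmatch` also supports captures, so a similar pattern with parenthesized groups should work there too.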
