Ragel FSM for parsing SQL-like statements - lexer

I'm having a bit of a problem with Ragel, mostly due to still trying to grasp how the whole thing works.
I'm trying to make a simple parser for a language similar to SQL (but less flexible), where you have functions (all uppercase), identifiers (all lowercase) and where you could nest functions within functions.
Here's what I have so far:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
typedef struct Parser {
int current_line;
int nesting;
/* Ragel FSM */
int cs;
const char *ts;
const char *te;
int act;
} Parser;
%%{
machine gql;
access parser->;
Function = [A-Z][A-Z_]+ ;
Identifier = [a-z][a-z_]+ ;
Integer = [0-9]+ ;
Parameter = ( Identifier | Integer )+ ;
WhiteSpace = [ \t\r\n] ;
action function_call {
parser->nesting++;
printf("FUNCTION CALL\n");
}
action function_finish {
parser->nesting--;
printf("FUNCTION FINISH!\n");
}
action function_add_identifier {
printf("FUNCTION ADD IDENTIFIER\n");
}
FunctionCall =
Function #function_call WhiteSpace* "("
Parameter %function_add_identifier
( WhiteSpace* ',' WhiteSpace* Parameter %function_add_identifier )* WhiteSpace*
%function_finish ')' ;
main := FunctionCall ;
}%%
%% write data;
void Parser_Init(Parser *parser) {
parser->current_line = 1;
parser->nesting = 0;
%% write init;
}
void Parser_Execute(Parser *parser, const char *buffer, size_t len) {
if(len == 0) return;
const char *p, *pe, *eof;
p = buffer;
pe = buffer+len;
eof = pe;
%% write exec;
}
int main(int argc, char *argv[]) {
Parser *parser = malloc(sizeof(Parser));
Parser_Init(parser);
printf("Parsing:\n%s\n\n\n", argv[1]);
Parser_Execute(parser, argv[1], sizeof(argv[1]));
printf("Parsed %d lines\n", parser->current_line);
return 0;
}
It is calling the function_call action once per character, not picking up the Parameters, and I can't think how to make functions work inside functions.
Any tips on what I'm doing wrong here?

The standard approach is to write a lexer (in Ragel or GNU Flex) that only tokenizes your language input. The tokens are then consumed by a parser (not written in Ragel) that can handle recursive structures such as nested functions, for example one produced by a parser generator like GNU Bison.
Note that Ragel does include, as an advanced feature, directives to manage a stack, which lets you parse recursive structures. With that, however, you leave the domain of regular languages that Ragel specifications otherwise work in. So you could write a parser that handles nested functions completely in Ragel, but a properly layered architecture (first layer: lexer, second layer: parser, ...) simplifies the task: the parts are easier to debug, test and maintain.
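To make the layering concrete, here is a minimal sketch in C++. It is not generated by Ragel or Bison: the hand-written tokenize() only stands in for a generated scanner, and a small recursive descent Parser stands in for a generated parser. All names (Tok, Token, tokenize, Parser, max_nesting) are hypothetical, and the token rules are simplified from the question (for instance, a one-letter function name is accepted here).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Tok { Function, Identifier, Integer, LParen, RParen, Comma, End };

struct Token {
    Tok kind;
    std::string text;
};

// Layer 1: lexer. Only splits the input into tokens; knows nothing about nesting.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == '(') { out.push_back({Tok::LParen, "("}); ++i; continue; }
        if (c == ')') { out.push_back({Tok::RParen, ")"}); ++i; continue; }
        if (c == ',') { out.push_back({Tok::Comma, ","}); ++i; continue; }
        const std::size_t start = i;
        Tok kind;
        if (std::isupper(c)) {          // FUNCTION names: upper case and '_'
            kind = Tok::Function;
            while (i < src.size() &&
                   (std::isupper(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
        } else if (std::islower(c)) {   // identifiers: lower case and '_'
            kind = Tok::Identifier;
            while (i < src.size() &&
                   (std::islower(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
        } else if (std::isdigit(c)) {   // integers
            kind = Tok::Integer;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
        } else {
            throw std::runtime_error("unexpected character");
        }
        out.push_back({kind, src.substr(start, i - start)});
    }
    out.push_back({Tok::End, ""});
    return out;
}

// Layer 2: recursive descent parser over the token stream.
//   call := Function '(' arg (',' arg)* ')'
//   arg  := call | Identifier | Integer
class Parser {
public:
    explicit Parser(std::vector<Token> toks) : toks_(std::move(toks)), pos_(0) {}

    // Parses one complete call and returns its maximum nesting depth.
    int parse() {
        const int depth = call();
        if (peek().kind != Tok::End) throw std::runtime_error("trailing input");
        return depth;
    }

private:
    const Token& peek() const { return toks_[pos_]; }

    void expect(Tok kind) {
        if (peek().kind != kind)
            throw std::runtime_error("unexpected token '" + peek().text + "'");
        ++pos_;
    }

    int call() {
        expect(Tok::Function);
        expect(Tok::LParen);
        int depth = arg();
        while (peek().kind == Tok::Comma) {
            ++pos_;
            depth = std::max(depth, arg());
        }
        expect(Tok::RParen);
        return depth + 1;
    }

    int arg() {
        if (peek().kind == Tok::Function) return call();  // recursion handles nesting
        if (peek().kind == Tok::Identifier || peek().kind == Tok::Integer) {
            ++pos_;
            return 0;
        }
        throw std::runtime_error("expected a parameter");
    }

    std::vector<Token> toks_;
    std::size_t pos_;
};

// Convenience wrapper: lex, then parse.
int max_nesting(const std::string& src) {
    return Parser(tokenize(src)).parse();
}
```

Calling max_nesting("SUM(a, MAX(b, 3))") runs both layers; the recursion in arg() is the part a purely regular Ragel machine cannot express without its stack directives.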

C++ vector<thread*> push_back(): can't figure out incantation to create unnamed thread variables

Creating a named thread is working well for me:
void inserter( int iTimes ) {
for ( int i = 0; i < iTimes; i++ )
DoOne();
}
int main( int nArg, const char* apszArg[] ) {
std::thread t1( inserter, 100 );
:
:
But I can't figure out how to do it when creating the threads without a name. This produces an error saying that the constructor cannot be resolved. I'm also wondering, once that is working, whether the vector's element type is right, or whether instead of thread* I need to specify template arguments, and if so how to do that for 1) the function and 2) the parameter list.
using namespace std;
vector<thread*> apthread;
for ( int i = 0; i < nThreads; i++ )
apthread.push_back( new thread( inserter, i ) );

The only thing missing from your example code to make it compile is the std:: qualification: https://godbolt.org/z/3gX_h2
#include <thread>
#include <vector>
void DoOne(){}
void DoMany( int iTimes ) {
for ( int i = 0; i < iTimes; i++ )
DoOne();
}
int main(){
std::vector<std::thread*> apthread;
const auto nThreads=10;
for ( int i = 0; i < nThreads; i++ )
apthread.push_back( new std::thread( DoMany, i ) );
// join all the threads
for(auto& t: apthread){
t->join();
delete t; // each thread object was allocated with new
}
}
However, you should avoid plain new, and there is no need to allocate std::thread dynamically anyway: it is already a handle, so you can push_back the thread object itself into the vector:
#include <thread>
#include <vector>
void DoOne(){}
void DoMany( int iTimes ) {
for ( int i = 0; i < iTimes; i++ )
DoOne();
}
int main(){
std::vector<std::thread> apthread;
const auto nThreads=10;
for ( int i = 0; i < nThreads; i++ )
apthread.push_back(std::thread( DoMany, i ) );
// join all the threads
for(auto& t: apthread){
t.join();
}
}
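As a variation on the version above, emplace_back can construct each std::thread directly inside the vector from the function and its arguments. The following is a sketch only: the atomic counter g_calls and the wrapper run_workers are hypothetical additions so that the effect of the threads can be observed.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real work: counts how often DoOne() runs.
std::atomic<int> g_calls{0};

void DoOne() { ++g_calls; }

void DoMany(int iTimes) {
    for (int i = 0; i < iTimes; i++)
        DoOne();
}

// Starts nThreads workers (worker i calls DoOne() i times), joins them all,
// and returns the total number of DoOne() calls.
int run_workers(int nThreads) {
    g_calls = 0;
    std::vector<std::thread> threads;
    threads.reserve(nThreads);
    for (int i = 0; i < nThreads; i++)
        threads.emplace_back(DoMany, i);  // forwards (DoMany, i) to std::thread's constructor
    for (auto& t : threads)
        t.join();
    return g_calls.load();
}
```

Because std::thread is movable but not copyable, both push_back(std::thread(...)) and emplace_back(...) work; the vector never needs to copy a thread.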

The problem isn't due to the vector of threads.
Instead, the problem is that the thread function is called inserter(), which is also the name of a C++ standard library function template, std::inserter.
The vector version, in addition to adding the vector, added using namespace std;, which made the name inserter ambiguous, so the compiler (g++ 7.2.1) could not work out which one was meant when resolving the std::thread constructor.
Changing the function name to practically anything else (not defined by the standard libraries) enables compilation. Likewise, removing the using namespace std; and instead explicitly prepending all library symbols with std:: works.
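A small sketch of the second fix, assuming the function keeps the name inserter: without a using-directive, the unqualified name refers only to the global function, even though the header that declares std::inserter is included. The counter g_inserted and the wrapper run_inserters are hypothetical, added only to make the result observable.

```cpp
#include <atomic>
#include <iterator>  // declares std::inserter
#include <thread>
#include <vector>

std::atomic<int> g_inserted{0};

// Shares its name with std::inserter. There is no "using namespace std;",
// so the unqualified name below can only mean this function.
void inserter(int iTimes) {
    for (int i = 0; i < iTimes; i++)
        ++g_inserted;
}

// Starts nThreads threads running inserter(i) and returns the total count.
int run_inserters(int nThreads) {
    g_inserted = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < nThreads; i++)
        threads.push_back(std::thread(inserter, i));
    for (auto& t : threads)
        t.join();
    return g_inserted.load();
}
```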

Does Arduino support the struct hack or similar solution in lieu of flexible array elements?

I coded an Arduino project for my son and learned about C in the process. All works fine but after dividing up the code into ten files and grouping the variables into structs in each file I'm not able to solve one wish for clarity. We need to empirically determine the best size of an array for storing and averaging port reads so this is what I want:
struct Alarms {
// Configurable parameters
const unsigned int number_of_reads = 24;
// State variables
int reads[number_of_reads]; // Error: invalid use of non-static data member 'Alarms::num_of_reads'
};
It’s simple but doesn't work. I tried flexible array members until I found that that feature is not supported in C++. Arduino compiles with C++. I tried many examples of the 'struct hack' but they all returned errors like this one:
struct Alarms {
// Configurable parameters
int number_of_reads = 24;
// State variables
int reads[];
} ar;
void setup_alarm() {
ar.reads = malloc(sizeof(int) * ar.number_of_reads); // Error: incompatible types in assignment of 'void*' to 'int [0]'
}
That looked promising but I suspect my ignorance is glowing brightly. Most struct hack examples call for declaring the struct and later initializing the struct variables. I’m hoping to not duplicate the struct.
I considered splitting the struct but that would be error prone and, well, another compile error:
struct Alarms2 {
int reads[ar.num_of_reads]; // Error: array bound is not an integer constant before ']' token
} ar2;
An alternative is to size the array and get the size later but it needs an explanation:
struct Alarms {
// Configurable parameters
int reads[ 24 ]; // Put number of reads to average between brackets
// State variables
int number_of_reads;
};
void setup_alarm() {
ar.number_of_reads = sizeof(ar.reads) / sizeof(ar.reads[0]); // this works
}
Is there a way to make the struct hack, or some similar solution, work in Arduino to achieve something like the first example?

The size of the struct must be known at compilation time. A non-static const member can have a different value in each instance of the structure, so it is not a compile-time constant; that is why you get the invalid use of non-static data member 'Alarms::num_of_reads' error when you use it as the array size. The best way to solve this is to have init_alarm and destroy_alarm functions, like so ...
#include <stdio.h>
#include <stdlib.h>
#define DEFAULT_NUM_OF_READS (24)
struct alarm {
// Configurable parameters
const int number_of_reads;
// State variables
int *reads;
};
void init_alarm(struct alarm *alarm)
{
alarm->reads = (int *) malloc(alarm->number_of_reads * sizeof(int));
}
void destroy_alarm(struct alarm *alarm)
{
free(alarm->reads);
}
int main(int argc, char **argv)
{
// When we create our struct, set number_of_reads to default
struct alarm alarm = {.number_of_reads = DEFAULT_NUM_OF_READS, .reads = NULL};
init_alarm(&alarm);
alarm.reads[0] = 13;
alarm.reads[23] = 100;
printf("alarm.reads[0] = %d, alarm.reads[23] = %d\n", alarm.reads[0], alarm.reads[23]);
destroy_alarm(&alarm);
return 0;
}
Note: In order to use designated initializers to initialize a structure, you must compile as C99 or later, like so ...
gcc --std=c99 test.c -o test
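Since the question notes that Arduino sketches are compiled as C++, another option is to make the size a static constant member: it is shared by all instances, so the compiler can use it as an array bound and no malloc is needed. The following is a sketch under that assumption; the next field and the helpers add_read and average_reads are hypothetical and only illustrate how the buffer might be used.

```cpp
struct Alarms {
    // Configurable parameter: static, so it is a compile-time constant
    // and may be used as an array bound.
    static const unsigned int number_of_reads = 24;
    // State variables
    int reads[number_of_reads];
    unsigned int next;  // hypothetical: index of the slot to overwrite next
};

// Hypothetical helper: store a port read, overwriting the oldest one.
void add_read(Alarms &a, int value) {
    a.reads[a.next] = value;
    a.next = (a.next + 1) % Alarms::number_of_reads;
}

// Hypothetical helper: integer average over the whole buffer.
long average_reads(const Alarms &a) {
    long sum = 0;
    for (unsigned int i = 0; i < Alarms::number_of_reads; i++)
        sum += a.reads[i];
    return sum / static_cast<long>(Alarms::number_of_reads);
}
```

Changing the single number 24 resizes the array and every loop that depends on it, which keeps the configurable value in one place as the first example in the question intended.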

Is there a Lua string replace() function for faster replacements than gsub()?

I see a list of Lua string functions and I see the .gsub(), for global search and replace: http://www.gammon.com.au/scripts/doc.php?general=lua_string
All Lua string functions:
static const luaL_Reg strlib[] = {
{"byte", str_byte},
{"char", str_char},
{"dump", str_dump},
{"find", str_find},
{"format", str_format},
{"gfind", gfind_nodef},
{"gmatch", gmatch},
{"gsub", str_gsub},
{"len", str_len},
{"lower", str_lower},
{"match", str_match},
{"rep", str_rep},
{"reverse", str_reverse},
{"sub", str_sub},
{"upper", str_upper},
{NULL, NULL}
};
Why is there no simple, fast, literal (non-regex) string replace function?
Is .gsub() so efficient that there is no benefit?
I found this written in 2006 but it does not seem like it's included: http://lua-users.org/wiki/StringReplace

This is likely because gsub is capable of doing exactly what a replace function would do, and Lua's design goals include that of a small, generally uncomplicated standard library. There's no need for a redundancy like this to be baked right into the language.
As an outside example, the Ruby programming language provides both String#gsub and String#replace in its standard library. Ruby is a much, much larger language out of the box because of decisions like this.
However, Lua prides itself on being a very easy language to extend. The link you've shown shows how to bake the function into the standard library when compiling Lua as a whole. You could also piece it together to create a module.
Quickly patching together the parts we need results in (note we need the lmemfind function from lstrlib.c):
#include <lua.h>
#include <lauxlib.h>
#include <string.h>
static const char *lmemfind
(const char *s1, size_t l1, const char *s2, size_t l2) {
if (l2 == 0)
return s1; /* empty strings are everywhere */
else if (l2 > l1)
return NULL; /* avoids a negative 'l1' */
const char *init; /* to search for a '*s2' inside 's1' */
l2--; /* 1st char will be checked by 'memchr' */
l1 = l1-l2; /* 's2' cannot be found after that */
while (l1 > 0 && (init = (const char *) memchr(s1, *s2, l1)) != NULL) {
init++; /* 1st char is already checked */
if (memcmp(init, s2+1, l2) == 0)
return init-1;
else { /* correct 'l1' and 's1' to try again */
l1 -= init-s1;
s1 = init;
}
}
return NULL; /* not found */
}
static int str_replace(lua_State *L) {
size_t l1, l2, l3;
const char *src = luaL_checklstring(L, 1, &l1);
const char *p = luaL_checklstring(L, 2, &l2);
const char *p2 = luaL_checklstring(L, 3, &l3);
const char *s2;
int n = 0;
int init = 0;
luaL_Buffer b;
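luaL_argcheck(L, l2 > 0, 2, "pattern must not be empty"); /* an empty pattern would never advance 'init' */
luaL_buffinit(L, &b);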
while (1) {
s2 = lmemfind(src+init, l1-init, p, l2);
if (s2) {
luaL_addlstring(&b, src+init, s2-(src+init));
luaL_addlstring(&b, p2, l3);
init = init + (s2-(src+init)) + l2;
n++;
} else {
luaL_addlstring(&b, src+init, l1-init);
break;
}
}
luaL_pushresult(&b);
lua_pushnumber(L, (lua_Number) n); /* number of substitutions */
return 2;
}
int luaopen_strrep (lua_State *L) {
lua_pushcfunction(L, str_replace);
return 1;
}
We can compile this into a shared object with the proper linkage (cc -shared, cc -bundle, etc...), and load it into Lua like any other module with require.
local replace = require 'strrep'
print(replace('hello world', 'hello', 'yellow')) -- yellow world, 1.0
This answer is a formalized reconstruction of the comments above.
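The replacement loop in str_replace does not depend on Lua itself, so it can be checked in isolation. Below is a sketch of the same literal (pattern-free) replacement in C++, with std::string::find playing the role of lmemfind. The name replace_literal is hypothetical, and returning the input unchanged for an empty pattern is an assumption made here to avoid a loop that never advances.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Replaces every literal occurrence of 'pat' in 'src' with 'repl'.
// Returns the new string and the number of substitutions.
std::pair<std::string, int> replace_literal(const std::string& src,
                                            const std::string& pat,
                                            const std::string& repl) {
    if (pat.empty()) return {src, 0};  // assumption: nothing to replace
    std::string out;
    int n = 0;
    std::size_t init = 0;  // start of the not-yet-copied part of 'src'
    for (;;) {
        const std::size_t hit = src.find(pat, init);
        if (hit == std::string::npos) {
            out.append(src, init, std::string::npos);  // copy the tail
            break;
        }
        out.append(src, init, hit - init);  // text before the match
        out += repl;
        init = hit + pat.size();            // continue after the match
        ++n;
    }
    return {out, n};
}
```

Characters such as '.' or '%' have no special meaning here, which is the difference from a gsub call with an unescaped pattern.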

writing my first exploit in linux

How can I modify the source code of func() so that the address to which the program returns after executing func() is changed in such a way that the instruction printf("first print\n") is skipped? Use the pointer *ret defined in func() to modify the return address appropriately.
Here is the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void func(char *str)
{
char buffer[24];
int *ret;
strcpy(buffer,str);
}
int main(int argc, char **argv)
{
if (argc < 2)
{
printf("One argument needed.\n");
exit(0);
}
int x;
x = 0;
func(argv[1]);
x = 1;
printf("first print\n");printf("second print\n");
}

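As sherrellbc noted, exploits are usually written without modifying the program's source code. But if you want, inserting these two lines into func() may do, assuming a 32-bit x86 build where the saved return address sits on the stack directly below the str argument and the compiler does not rearrange the frame: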
ret = (int *)&str; // point behind saved return address
ret[-1] += 12; // or however many code bytes are to be skipped

split a string using find_if

I found the following code in the book "Accelerated C++" (Chapter 6.1.1), but I can't compile it. The problem is with the find_if lines. I have the necessary includes (vector, string, algorithm, cctype). Any idea?
Thanks, Jabba
bool space(char c) {
return isspace(c);
}
bool not_space(char c) {
return !isspace(c);
}
vector<string> split_v3(const string& str)
{
typedef string::const_iterator iter;
vector<string> ret;
iter i, j;
i = str.begin();
while (i != str.end())
{
// ignore leading blanks
i = find_if(i, str.end(), not_space);
// find end of next word
j = find_if(i, str.end(), space);
// copy the characters in [i, j)
if (i != str.end()) {
ret.push_back(string(i, j));
}
i = j;
}
return ret;
}

Writing this in a more STL-like manner (note that ptr_fun was removed in C++17 and not1 in C++20, so this needs an older standard mode):
#include <algorithm>
#include <cctype>
#include <functional>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
using namespace std;
template<class P, class T>
void split(const string &str, P pred, T output) {
for (string::const_iterator i, j = str.begin(), str_end = str.end();
(i = find_if(j, str_end, not1(pred))) != str_end;)
*output++ = string(i, j = find_if(i, str_end, pred));
}
int main() {
string input;
while (getline(cin, input)) { // read whole lines; cin >> input would already split on whitespace
vector<string> words;
split(input, ptr_fun(::isspace), inserter(words, words.begin()));
copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}
return 0;
}

There is no problem in the code you posted. There is a very obvious problem with the real code you linked to: is_space and space are member functions, and they cannot be called without an instance of Split2. This requirement doesn't make sense, though, so at least you should make those functions static.
(Actually it doesn't make much sense for split_v3 to be a member function either. What does having a class called Split2 achieve over having just a free function - possibly in a namespace?)
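A sketch of the free-function layout suggested above, with the predicates and the splitter placed in a namespace instead of a class. The namespace name text and the function names are hypothetical; the loop itself follows the split_v3 code from the question. The cast to unsigned char is added because passing a negative char to isspace is undefined.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

namespace text {

// Plain functions, so they can be passed to find_if without any object.
inline bool is_space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

inline bool not_space(char c) {
    return !is_space(c);
}

std::vector<std::string> split(const std::string& str) {
    typedef std::string::const_iterator iter;
    std::vector<std::string> ret;
    iter i = str.begin();
    while (i != str.end()) {
        // ignore leading blanks
        i = std::find_if(i, str.end(), not_space);
        // find end of next word
        iter j = std::find_if(i, str.end(), is_space);
        // copy the characters in [i, j)
        if (i != str.end())
            ret.push_back(std::string(i, j));
        i = j;
    }
    return ret;
}

}  // namespace text
```

A caller would write text::split(line) and get the words back as a vector, with no Split2 instance involved.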

As requested:
class Split2 {
public:
void foo();
private:
struct space { bool operator() (char c) const { return isspace(static_cast<unsigned char>(c)) != 0; } };
struct not_space {
Split2::space space;
bool operator() (char c) const { return !space(c); }
};
};
Use them with std::find_if(it, it2, space()) or std::find_if(it, it2, not_space()).
Notice that not_space holds a default-constructed space as a member variable. Constructing a space inside every call to not_space::operator() instead might be unwise, although the compiler could probably take care of that. If the syntax for overloading operator() confuses you and you would like to know more about using structs as predicates, have a look at operator overloading and some guidelines to the STL.

Off hand, I would say it should probably be
i = str.find_if( ...
j = str.find_if( ...
