Optional capture of balanced brackets in Lua - string

Let's say I have lines of the form:
int[4] height
char c
char[50] userName
char[50+foo("bar")] userSchool
As you see, the bracketed expression is optional.
Can I parse these strings using Lua's string.match() ?
The following pattern works for lines that contain brackets:
line = "int[4] height"
print(line:match('^(%w+)(%b[])%s+(%w+)$'))
But is there a pattern that can handle also the optional brackets? The following does not work:
line = "char c"
print(line:match('^(%w+)(%b[]?)%s+(%w+)$'))
Can the pattern be written in another way to solve this?

Unlike regular expressions, ? in Lua pattern matches a single character.
You can use the or operator to do the job like this:
line:match('^(%w+)(%b[])%s+(%w+)$') or line:match('^(%w+)%s+(%w+)$')
A little problem with it is that Lua only keeps the first result in an expression. It depends on your needs, use an if statement or you can give the entire string the first capture like this
print(line:match('^((%w+)(%b[])%s+(%w+))$') or line:match('^((%w+)%s+(%w+))$'))

LPeg may be more appropriate for your case, especially if you plan to expand your grammar.
local re = require're'
local p = re.compile( [[
prog <- stmt* -> set
stmt <- S { type } S { name }
type <- name bexp ?
bexp <- '[' ([^][] / bexp)* ']'
name <- %w+
S <- %s*
]], {set = function(...)
local t, args = {}, {...}
for i=1, #args, 2 do t[args[i+1]] = args[i] end
return t
end})
local s = [[
int[4] height
char c
char[50] userName
char[50+foo("bar")] userSchool
]]
for k, v in pairs(p:match(s)) do print(k .. ' = ' .. v) end
--[[
c = char
userSchool = char[50+foo("bar")]
height = int[4]
userName = char[50]
--]]

Related

How convert first char to lowerCase

Try to play with string and I have string like: "Hello.Word" or "stackOver.Flow"
and i what first char convert to lower case: "hello.word" and "stackOver.flow"
For snakeCase it easy we need only change UpperCase to lower and add '_'
but in camelCase (with firs char in lower case) i dont know how to do this
open System
let convertToSnakeCase (value:string) =
String [|
Char.ToLower value.[0]
for ch in value.[1..] do
if Char.IsUpper ch then '_'
Char.ToLower ch |]
Who can help?
module Identifier =
open System
let changeCase (str : string) =
if String.IsNullOrEmpty(str) then str
else
let isUpper = Char.IsUpper
let n = str.Length
let builder = new System.Text.StringBuilder()
let append (s:string) = builder.Append(s) |> ignore
let rec loop i j =
let k =
if i = n (isUpper str.[i] && (not (isUpper str.[i - 1])
((i + 1) <> n && not (isUpper str.[i + 1]))))
then
if j = 0 then
append (str.Substring(j, i - j).ToLower())
elif (i - j) > 2 then
append (str.Substring(j, 1))
append (str.Substring(j + 1, i - j - 1).ToLower())
else
append (str.Substring(j, i - j))
i
else
j
if i = n then builder.ToString()
else loop (i + 1) k
loop 1 0
type System.String with
member x.ToCamelCase() = changeCase x
printfn "%s" ("StackOver.Flow".ToCamelCase()) //stackOver.Flow
//need stackOver.flow
I suspect there are much more elegant and concise solutions, I sense you are learning functional programming, so I think its best to do stuff like this with recursive function rather than use some magic library function. I notice in your question you ARE using a recusive function, but also an index into an array, lists and recursive function work much more easily than arrays, so if you use recursion the solution is usually simpler if its a list.
I'd also avoid using a string builder, assuming you are learning fp, string builders are imperative, and whilst they obviously work, they wont help you get your head around using immutable data.
The key then is to use the pattern match to match the scenario that you want to use to trigger the upper/lower case logic, as it depends on 2 consecutive characters.
I THINK you want this to happen for the 1st char, and after a '.'?
(I've inserted a '.' as the 1st char to allow the recursive function to just process the '.' scenario, rather than making a special case).
let convertToCamelCase (value : string) =
let rec convertListToCamelCase (value : char list) =
match value with
| [] -> []
| '.' :: second :: rest ->
'.' :: convertListToCamelCase (Char.ToLower second :: rest)
| c :: rest ->
c :: convertListToCamelCase rest
// put a '.' on the front to simplify the logic (and take it off after)
let convertAsList = convertListToCamelCase ('.' :: (value.ToCharArray() |> Array.toList))
String ((convertAsList |> List.toArray).[1..])
The piece to worry about is the recusive piece, the rest of it is just flipping an array to a list and back again.

How to separate a string by Capital Letter?

I currently have to a code in ABAP which contains a String that has multiple words that start with Capital letters/Uppercase and there is no space in-between.
I have to separate it into an internal table like this:
INPUT :
NameAgeAddress
OUTPUT :
Name
Age
Address
Here is the shortest code I could find, which uses a regular expression combined with SPLIT:
SPLIT replace( val = 'NameAgeAddress' regex = `(?!^.)\u` with = ` $0` occ = 0 )
AT ` `
INTO TABLE itab.
So, replace converts 'NameAgeAddress' into 'Name Age Address' and SPLIT puts the 3 words into an internal table.
Details:
(?!^.) to say the next character to find (\u) should not be the first character
\u being any upper case letter
$0 to replace the found string ($0) by itself preceded with a space character
occ = 0 to replace all occurrences
Unfortunately, the SPLIT statement in ABAP does not allow a regex as separator expression. Therefore, we have to use progressive matching, which is a bit awkward in ABAP:
report zz_test_split_capital.
parameters: p_input type string default 'NameAgeAddress' lower case.
data: output type stringtab,
off type i,
moff type i,
mlen type i.
while off < strlen( p_input ).
find regex '[A-Z][^A-Z]*'
in section offset off of p_input
match offset moff match length mlen.
if sy-subrc eq 0.
append substring( val = p_input off = moff len = mlen ) to output.
off = moff + mlen.
else.
exit.
endif.
endwhile.
cl_demo_output=>display_data( output ).
Just for comparison, the following statement would do the job in Perl:
my $input = "NameAgeAddress";
my #output = split /(?=[A-Z])/, $input;
# gives #output = ('Name','Age','Address')
It is easy with using regular expressions. The solution could look like this.
REPORT ZZZ.
DATA: g_string TYPE string VALUE `NameAgeAddress`.
DATA(gcl_regex) = NEW cl_abap_regex( pattern = `[A-Z]{1}[a-z]+` ).
DATA(gcl_matcher) = gcl_regex->create_matcher( text = g_string ).
WHILE gcl_matcher->find_next( ).
DATA(g_match_result) = gcl_matcher->get_match( ).
WRITE / g_string+g_match_result-offset(g_match_result-length).
ENDWHILE.
For when regular expressions are just overkill and plain old ABAP will do:
DATA(str) = 'NameAgeAddress'.
IF str CA sy-abcde.
DATA(off) = 0.
DO.
data(tailstart) = off + 1.
IF str+tailstart CA sy-abcde.
DATA(len) = sy-fdpos + 1.
WRITE: / str+off(len).
add len to off.
ELSE.
EXIT.
ENDIF.
ENDDO.
write / str+off.
ENDIF.
If you do not want to use or cannot use Regex, here another solution:
DATA: lf_input TYPE string VALUE 'NameAgeAddress',
lf_offset TYPE i,
lf_current_letter TYPE char1,
lf_letter_in_capital TYPE char1,
lf_word TYPE string,
lt_word LIKE TABLE OF lf_word.
DO strlen( lf_input ) TIMES.
lf_offset = sy-index - 1.
lf_current_letter = lf_input+lf_offset(1).
lf_letter_in_capital = to_upper( lf_current_letter ).
IF lf_current_letter = lf_letter_in_capital.
APPEND INITIAL LINE TO lt_word ASSIGNING FIELD-SYMBOL(<ls_word>).
ENDIF.
IF <ls_word> IS ASSIGNED. "if input string does not start with capital letter
<ls_word> = <ls_word> && lf_current_letter.
ENDIF.
ENDDO.

Find substring of string w/o knowing the length of string

I have a string x: x = "{abc}{def}{ghi}"
And I need to print the string between second { and second }, in this case def. How can I do this without knowing the length of the string? For example, the string x could also be {abcde}{fghij}{klmno}"
This is where pattern matching is useful:
local x = "{abc}{def}{ghi}"
local result = x:match(".-{.-}.-{(.-)}")
print(result)
.- matches zero or more characters, non-greedy. The whole pattern .-{.-}.-{(.-)} captures what's between the second { and the second }.
Try also x:match(".-}{(.-)}"), which is simpler.
I would go about it in a different manner:
local i, x, result = 1, "{abc}{def}{ghi}"
for w in x:gmatch '{(.-)}' do
if i == 2 then
result = w
break
else
i = i + 1
end
end
print( result )

Insert quoted and unquoted parts of string in table

I've been working on this part of a saycommand system which is supposed to separate parts of a string and put them in a table which is sent to a function, which is queried at the beginning of the string. This would look like, for example, !save 1 or !teleport 0 1, or !tell 5 "a private message".
I would like this string to turn into a table:
[[1 2 word 2 9 'more words' 1 "and more" "1 2 34"]]
(Every non-quoted part of the string gets its own key, and the quoted parts get grouped into a key)
1 = 1
2 = 2
3 = word
4 = 2
5 = 9
6 = more words
7 = 1
8 = and more
9 = 1 2 34
I've tried doing this with Lua pattern, but I'm stuck trying to find out how to capture both quoted and unquoted pieces of the string. I've tried a lot of things, but nothing helped.
My current pattern attempts look like this:
a, d = '1 2 word 2 9 "more words" 1 "and more" "1 2 34"" ', {}
-- previous attempts
--[[
This one captures quotes
a:gsub('(["\'])(.-)%1', function(a, b) table.insert(d, b) end)
This one captures some values and butchered quotes,
which might have to do with spaces in the string
a:gsub('(["%s])(.-)%1', function(a, b) table.insert(d, b) end)
This one captures every value, but doesn't take care of quotes
a:gsub('(%w+)', function(a) table.insert(d, a) end)
This one tries making %s inside of quotes into underscores to
ignore them there, but it doesn't work
a = a:gsub('([%w"\']+)', '%1_')
a:gsub('(["\'_])(.-)%1', function(a, b) table.insert(d, b) end)
a:gsub('([%w_]+)', function(a) table.insert(d, a) end)
This one was a wild attempt at cracking it, but no success
a:gsub('["\']([^"\']-)["\'%s]', function(a) table.insert(d, a) end)
--]]
-- This one adds spaces, which would later be trimmed off, to test
-- whether it helped with the butchered strings, but it doesn't
a = a:gsub('(%w)(%s)(%w)', '%1%2%2%3')
a:gsub('(["\'%s])(.-)%1', function(a, b) table.insert(d, b) end)
for k, v in pairs(d) do
print(k..' = '..v)
end
This would not be needed for simple commands, but a more complex one like !tell 1 2 3 4 5 "a private message sent to five people" does need it, first to check if it's sent to multiple people and next to find out what the message is.
Further down the line I want to add commands like !give 1 2 3 "component:material_iron:weapontype" "food:calories", which is supposed to add two items to three different people, would benefit greatly from such a system.
If this is impossible in Lua pattern, I'll try doing it with for loops and such, but I really feel like I'm missing something obvious. Am I overthinking this?
You cannot process quoted strings with Lua patterns. You need to parse the string explicitly, as in the code below.
function split(s)
local t={}
local n=0
local b,e=0,0
while true do
b,e=s:find("%s*",e+1)
b=e+1
if b>#s then break end
n=n+1
if s:sub(b,b)=="'" then
b,e=s:find(".-'",b+1)
t[n]=s:sub(b,e-1)
elseif s:sub(b,b)=='"' then
b,e=s:find('.-"',b+1)
t[n]=s:sub(b,e-1)
else
b,e=s:find("%S+",b)
t[n]=s:sub(b,e)
end
end
return t
end
s=[[1 2 word 2 9 'more words' 1 "and more" "1 2 34"]]
print(s)
t=split(s)
for k,v in ipairs(t) do
print(k,v)
end
Lua string patterns and regex for that matter generally aren't well suited when you need to do parsing that requires varying nesting levels or token count balancing like parenthesis ( ). But there is another tool available to Lua that's powerful enough to deal with that requirement: LPeg.
The LPeg syntax is a bit archaic and takes some getting use to so I'll use the lpeg re module instead to make it easier to digest. Keep in mind that anything you can do in one form of the syntax you can also express in the other form as well.
I'll start by defining the grammar for parsing your format description:
local re = require 're'
local cmdgrammar =
[[
saycmd <- '!' cmd extra
cmd <- %a%w+
extra <- (singlequote / doublequote / unquote / .)*
unquote <- %w+
singlequote <- "'" (unquote / %s)* "'"
doublequote <- '"' (unquote / %s)* '"'
]]
Next, compile the grammar and use it to match some of your test examples:
local cmd_parser = re.compile(cmdgrammar)
local saytest =
{
[[!save 1 2 word 2 9 'more words' 1 "and more" "1 2 34"]],
[[!tell 5 "a private message"]],
[[!teleport 0 1]],
[[!say 'another private message' 42 "foo bar" baz]],
}
There are currently no captures in the grammar so re.match returns the last character position in the string it was able to match up to + 1. That means a successful parse will return the full character count of the string + 1 and therefore is a valid instance of your grammar.
for _, test in ipairs(saytest) do
assert(cmd_parser:match(test) == #test + 1)
end
Now comes the interesting part. Once you have the grammar working as desired you can now add captures that automatically extracts the results you want into a lua table with relatively little effort. Here's the final grammar spec + table captures:
local cmdgrammar =
[[
saycmd <- '!' {| {:cmd: cmd :} {:extra: extra :} |}
cmd <- %a%w+
extra <- {| (singlequote / doublequote / { unquote } / .)* |}
unquote <- %w+
singlequote <- "'" { (unquote / %s)* } "'"
doublequote <- '"' { (unquote / %s)* } '"'
]]
Running the tests again and dumping the re.match results:
for i, test in ipairs(saytest) do
print(i .. ':')
dump(cmd_parser:match(test))
end
You should get output similar to:
lua say.lua
1:
{
extra = {
"1",
"2",
"word",
"2",
"9",
"more words",
"1",
"and more",
"1 2 34"
},
cmd = "save"
}
2:
{
extra = {
"5",
"a private message"
},
cmd = "tell"
}
3:
{
extra = {
"0",
"1"
},
cmd = "teleport"
}
4:
{
extra = {
"another private message",
"42",
"foo bar",
"baz"
},
cmd = "say"
}

How can I extract the middle part of a string in FSharp?

I want to extract the middle part of a string using FSharp if it is quoted, similar like this:
let middle =
match original with
| "\"" + mid + "\"" -> mid
| all -> all
But it doesn't work because of the infix operator + in pattern expression. How can I extract this?
I don't think there is any direct support for this, but you can certainly write an active pattern. Active patterns allow you to implement your own code that will run as part of the pattern matching and you can extract & return some part of the value.
The following is a pattern that takes two parameters (prefix and postfix string) and succeeds if the given input starts/ends with the specified strings. The pattern is not complete (can fail), so we'll use the |Name|_| syntax and it will need to return option value:
let (|Middle|_|) prefix postfix (input:string) =
// Check if the string starts with 'prefix', ends with 'postfix' and
// is longer than the two (meaning that it contains some middle part)
if input.StartsWith(prefix) && input.EndsWith(postfix) &&
input.Length >= (prefix.Length + postfix.Length) then
// Strip the prefix/postfix and return 'Some' to indicate success
let len = input.Length - prefix.Length - postfix.Length
Some(input.Substring(prefix.Length, len))
else None // Return 'None' - string doesn't match the pattern
Now we can use Middle in pattern matching (e.g. when using match):
match "[aaa]" with
| Middle "[" "]" mid -> mid
| all -> all
Parameterized active patterns to the rescue!
let (|HasPrefixSuffix|_|) (pre:string, suf:string) (s:string) =
if s.StartsWith(pre) then
let rest = s.Substring(pre.Length)
if rest.EndsWith(suf) then
Some(rest.Substring(0, rest.Length - suf.Length))
else
None
else
None
let Test s =
match s with
| HasPrefixSuffix("\"","\"") inside ->
printfn "quoted, inside is: %s" inside
| _ -> printfn "not quoted: %s" s
Test "\"Wow!\""
Test "boring"
… or just use plain old regular expression
let Middle input =
let capture = Regex.Match(input, "\"([^\"]+)\"")
match capture.Groups.Count with
| 2 -> capture.Groups.[1].Value
| _ -> input
Patterns have a limited grammar - you can't just use any expression. In this case, I'd just use an if/then/else:
let middle (s:string) =
if s.[0] = '"' && s.[s.Length - 1] = '"' && s.Length >= 2 then
s.Substring(1,s.Length - 2)
else s
If grabbing the middle of a string with statically known beginnings and endings is something that you'll do a lot, then you can always use an active pattern as Tomas suggests.
Not sure how efficient this is:
let GetQuote (s:String) (q:char) =
s
|> Seq.skip ((s |> Seq.findIndex (fun c -> c = q))+1)
|> Seq.takeWhile (fun c-> c <> q)
|> Seq.fold(fun acc c -> String.Format("{0}{1}", acc, c)) ""
Or there's this with Substring in place of the fold:
let GetQuote2 (s:String) (q:char) =
let isQuote = (fun c -> c = q)
let a = (s |> Seq.findIndex isQuote)+1
let b = ((s |> Seq.take(a) |> Seq.findIndex isQuote)-1)
s.Substring(a,b);
These will get the first instance of the quoted text anywhere in the string e.g "Hello [World]" -> "World"

Resources