Insert quoted and unquoted parts of string in table - string

I've been working on part of a say-command system that is supposed to split a string into parts and put them in a table, which is then passed to the function named at the beginning of the string. Commands would look like, for example, !save 1, !teleport 0 1, or !tell 5 "a private message".
I would like this string to turn into a table:
[[1 2 word 2 9 'more words' 1 "and more" "1 2 34"]]
(Every non-quoted part of the string gets its own key, and the quoted parts get grouped into a key)
1 = 1
2 = 2
3 = word
4 = 2
5 = 9
6 = more words
7 = 1
8 = and more
9 = 1 2 34
I've tried doing this with Lua pattern, but I'm stuck trying to find out how to capture both quoted and unquoted pieces of the string. I've tried a lot of things, but nothing helped.
My current pattern attempts look like this:
a, d = '1 2 word 2 9 "more words" 1 "and more" "1 2 34"" ', {}
-- previous attempts
--[[
This one captures quotes
a:gsub('(["\'])(.-)%1', function(a, b) table.insert(d, b) end)
This one captures some values and butchered quotes,
which might have to do with spaces in the string
a:gsub('(["%s])(.-)%1', function(a, b) table.insert(d, b) end)
This one captures every value, but doesn't take care of quotes
a:gsub('(%w+)', function(a) table.insert(d, a) end)
This one tries making %s inside of quotes into underscores to
ignore them there, but it doesn't work
a = a:gsub('([%w"\']+)', '%1_')
a:gsub('(["\'_])(.-)%1', function(a, b) table.insert(d, b) end)
a:gsub('([%w_]+)', function(a) table.insert(d, a) end)
This one was a wild attempt at cracking it, but no success
a:gsub('["\']([^"\']-)["\'%s]', function(a) table.insert(d, a) end)
--]]
-- This one adds spaces, which would later be trimmed off, to test
-- whether it helped with the butchered strings, but it doesn't
a = a:gsub('(%w)(%s)(%w)', '%1%2%2%3')
a:gsub('(["\'%s])(.-)%1', function(a, b) table.insert(d, b) end)
for k, v in pairs(d) do
print(k..' = '..v)
end
This would not be needed for simple commands, but a more complex one like !tell 1 2 3 4 5 "a private message sent to five people" does need it, first to check whether it's sent to multiple people and then to find out what the message is.
Further down the line I want to add commands like !give 1 2 3 "component:material_iron:weapontype" "food:calories", which is supposed to give two items to three different people. Commands like that would benefit greatly from such a system.
If this is impossible with Lua patterns, I'll try doing it with for loops and such, but I really feel like I'm missing something obvious. Am I overthinking this?

You cannot process quoted strings with a single Lua pattern. You need to parse the string explicitly, one token at a time, as in the code below.
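function split(s)
  local t = {}
  local n = 0
  local b, e = 0, 0
  while true do
    -- skip any whitespace before the next token
    b, e = s:find("%s*", e + 1)
    b = e + 1
    if b > #s then break end
    n = n + 1
    if s:sub(b, b) == "'" then
      -- single-quoted token: take everything up to the closing quote
      -- (raises an error if the quote is never closed)
      b, e = s:find(".-'", b + 1)
      t[n] = s:sub(b, e - 1)
    elseif s:sub(b, b) == '"' then
      -- double-quoted token: same idea
      b, e = s:find('.-"', b + 1)
      t[n] = s:sub(b, e - 1)
    else
      -- bare token: take everything up to the next whitespace
      b, e = s:find("%S+", b)
      t[n] = s:sub(b, e)
    end
  end
  return t
end
s = [[1 2 word 2 9 'more words' 1 "and more" "1 2 34"]]
print(s)
t = split(s)
for k, v in ipairs(t) do
  print(k, v)
end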
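For readers who want to try the same scanning idea outside Lua, here is a minimal sketch in Python. It is not the answerer's code: the function name split_quoted is made up for illustration, and treating an unterminated quote as running to the end of the string is my assumption rather than something the Lua version does.

```python
def split_quoted(s):
    """Split s on whitespace, keeping quoted runs together (quotes removed)."""
    tokens = []
    i, n = 0, len(s)
    while i < n:
        if s[i].isspace():
            # skip separators between tokens
            i += 1
            continue
        if s[i] in "'\"":
            # quoted token: read up to the matching closing quote
            end = s.find(s[i], i + 1)
            if end == -1:
                # assumption: an unterminated quote runs to the end of the string
                end = n
            tokens.append(s[i + 1:end])
            i = end + 1
        else:
            # bare token: read up to the next whitespace
            end = i
            while end < n and not s[end].isspace():
                end += 1
            tokens.append(s[i:end])
            i = end
    return tokens


print(split_quoted("""1 2 word 2 9 'more words' 1 "and more" "1 2 34" """))
```

The loop makes the same three-way decision as the Lua function: single or double quote, otherwise a bare word.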

Lua string patterns, and regular expressions for that matter, generally aren't well suited to parsing that requires varying nesting levels or balanced token counts, such as matching parentheses ( ). But there is another tool available to Lua that's powerful enough to deal with that requirement: LPeg.
The LPeg syntax is a bit archaic and takes some getting used to, so I'll use the lpeg re module instead to make it easier to digest. Keep in mind that anything you can do in one form of the syntax you can also express in the other.
I'll start by defining the grammar for parsing your format description:
local re = require 're'
local cmdgrammar =
[[
saycmd <- '!' cmd extra
cmd <- %a%w+
extra <- (singlequote / doublequote / unquote / .)*
unquote <- %w+
singlequote <- "'" (unquote / %s)* "'"
doublequote <- '"' (unquote / %s)* '"'
]]
Next, compile the grammar and use it to match some of your test examples:
local cmd_parser = re.compile(cmdgrammar)
local saytest =
{
[[!save 1 2 word 2 9 'more words' 1 "and more" "1 2 34"]],
[[!tell 5 "a private message"]],
[[!teleport 0 1]],
[[!say 'another private message' 42 "foo bar" baz]],
}
There are currently no captures in the grammar, so re.match returns the position just past the last character it was able to match. A successful parse of the whole string therefore returns the string's length + 1, which means the string is a valid instance of your grammar.
for _, test in ipairs(saytest) do
  assert(cmd_parser:match(test) == #test + 1)
end
Now comes the interesting part. Once you have the grammar working as desired, you can add captures that automatically extract the results you want into a Lua table with relatively little effort. Here's the final grammar spec + table captures:
local cmdgrammar =
[[
saycmd <- '!' {| {:cmd: cmd :} {:extra: extra :} |}
cmd <- %a%w+
extra <- {| (singlequote / doublequote / { unquote } / .)* |}
unquote <- %w+
singlequote <- "'" { (unquote / %s)* } "'"
doublequote <- '"' { (unquote / %s)* } '"'
]]
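Running the tests again and dumping the re.match results (after recompiling cmd_parser from the new grammar; dump stands for any table pretty-printer and is not defined here):
for i, test in ipairs(saytest) do
  print(i .. ':')
  dump(cmd_parser:match(test))
end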
You should get output similar to:
lua say.lua
1:
{
extra = {
"1",
"2",
"word",
"2",
"9",
"more words",
"1",
"and more",
"1 2 34"
},
cmd = "save"
}
2:
{
extra = {
"5",
"a private message"
},
cmd = "tell"
}
3:
{
extra = {
"0",
"1"
},
cmd = "teleport"
}
4:
{
extra = {
"another private message",
"42",
"foo bar",
"baz"
},
cmd = "say"
}
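If LPeg is not available, the same result shape (a cmd field plus a list of extra arguments) can be approximated with an ordinary regex engine. The sketch below is in Python and is only an approximation of the grammar above: parse_command and TOKEN are names I made up, the command name uses \w (which also allows underscores), and bare tokens are any run of non-whitespace rather than %w+.

```python
import re

# one alternative per token kind: 'single quoted', "double quoted", bare word
TOKEN = re.compile(r"""'([^']*)'|"([^"]*)"|(\S+)""")


def parse_command(line):
    """Return {'cmd': ..., 'extra': [...]}, or None if line is not a !command."""
    m = re.match(r"!([A-Za-z]\w*)", line)
    if m is None:
        return None
    rest = line[m.end():]
    # exactly one of the three groups is non-empty for each match
    extra = [sq or dq or bare for sq, dq, bare in TOKEN.findall(rest)]
    return {"cmd": m.group(1), "extra": extra}


print(parse_command('!tell 5 "a private message"'))
```

Unlike the grammar, this version never skips characters silently, so stray punctuation ends up inside a bare token.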

Related

Pattern Matching BASIC programming Language and Universe Database

I need to identify following patterns in string.
- "2N':'2N':'2N"
- "2N'-'2N'-'2N"
- "2N'/'2N'/'2N"
- "2N'/'2N'-'2N"
AND SO ON.....
Basically, in plain language, I want this pattern:
2 NUMBERS [: / -] 2 NUMBERS [: / -] 2 NUMBERS
Is there any way to write one pattern that covers all the possible scenarios? Otherwise I have to write 9 patterns and match each of them against the string. And that is not even the real scenario in my code: there I have to match four 2-digit numbers separated by [: / -], for which I would have to write 27 patterns in total. I have used the scenario with three 2-digit numbers to keep the explanation simple.
Please help. Thank you.
Maybe you could try something like (Pick R83 style)
OK = X MATCH "2N1X2N1X2N" AND X[3,1]=X[6,1] AND INDEX(":/-",X[3,1],1) > 0
Where variable X is some input string like: 12-34-56
Should set variable OK to 1 if validation passes, else 0 for any invalid format.
This seems to get all your required validation into a single statement. I have assumed that the non-numeric characters have to be the same. If this is not true, the check could be changed to something like:
OK = X MATCH "2N1X2N1X2N" AND INDEX(":/-",X[3,1],1) > 0 AND INDEX(":/-",X[6,1],1) > 0
OK, I guess the requirement of surrounding characters was not obvious to me. Still, it does not make it much harder. You just need to 'parse' the string looking for the first (I assume) such pattern (if any) in the input string. This can be done in a couple of lines of code. Here is a (rather untested) R83 style test program:
PROMPT ":"
LOOP
   LOOP
      CRT 'Enter test string':
      INPUT S
   WHILE S # "" AND LEN(S) < 8 DO
      CRT "Invalid input! Hit RETURN to exit, or enter a string with >= 8 chars!"
   REPEAT
UNTIL S = "" DO
   *
   * Look for 1st occurrence of pattern in string..
   CARDNUM = ""
   FOR I = 1 TO LEN(S)-7 WHILE CARDNUM = ""
      IF S[I,8] MATCH "2N1X2N1X2N" THEN
         IF INDEX(":/-",S[I+2,1],1) > 0 AND INDEX(":/-",S[I+5,1],1) > 0 THEN
            CARDNUM = S[I,8] ;* Found it!
         END ELSE I = I + 8
      END
   NEXT I
   *
   CRT CARDNUM
REPEAT
There are only 7 or 8 lines here that actually look for the card number pattern in the source/test string.
Not quite perfect, but how about 2N1X2N1X2N? This gets you 2 numbers followed by 1 of any character, followed by 2 numbers, etc.
This might help:
BIG.STRING ="HELLO TILDE ~ CARD 12:34:56 IS IN THIS STRING"
TEMP.STRING = BIG.STRING
CONVERT "~:/-" TO "*~~~" IN TEMP.STRING
IF TEMP.STRING MATCHES '0X2N"~"2N"~"2N0X' THEN
   FIRST.TILDE.POSN = INDEX(TEMP.STRING,"~",1)
   CARD.STRING = BIG.STRING[FIRST.TILDE.POSN-2,8]
   PRINT CARD.STRING
END
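For comparison, in a language with character classes and backreferences, the whole family of patterns collapses into one expression. The Python sketch below is an illustration only, not Pick BASIC; the names SAME_SEP, ANY_SEP and first_match are made up. It shows both variants discussed above: separators that must be identical, and separators that may differ.

```python
import re

# \1 forces the second separator to equal the first one
SAME_SEP = re.compile(r"\d{2}([:/-])\d{2}\1\d{2}")
# each separator may independently be ':', '/' or '-'
ANY_SEP = re.compile(r"\d{2}[:/-]\d{2}[:/-]\d{2}")


def first_match(pattern, text):
    """Return the first substring of text matching pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_match(ANY_SEP, "HELLO TILDE ~ CARD 12:34:56 IS IN THIS STRING"))
print(first_match(SAME_SEP, "12/34-56"))
```

Using search finds the pattern anywhere in a longer string; fullmatch would validate a string that must consist of the pattern alone.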

How do I get from string "3+10" to strings "3" "+" "10"?

I'm making a graphing calculator in Unity and I have input with strings like "3+10" and I want to split it to "3","+" and "10".
I can figure out a way to deal with them once I've got them to this form, but I really need a way to split the string to the left and right of key characters such as plus, times, exponent, etc.
I'm doing this in Unity, but a way to do this in any language should help.
C#
The following code will do what you asked for (and nothing more).
// requires: using System.Text.RegularExpressions;
string input = "3+10-5";
string pattern = @"([-+^*\/])";
string[] substrings = Regex.Split(input, pattern);
// results in substrings = {"3", "+", "10", "-", "5"}
By using Regex.Split instead of String.Split you are able to retrieve the math operators as well. This is done by putting the math operators in a capture group ( ). If you're not familiar with regular expressions you should google the basics.
The code above will stubbornly use the math operators to split your string. If the string doesn't make sense, the method doesn't care and may produce unexpected results. For example, "5//10-" will result in {"5", "/", "", "/", "10", "-", ""}. Note that empty strings are added between adjacent operators and after a trailing operator.
You can use more complex regular expressions to check if your string is a valid mathematical expression before you try to split it. For example, ^\d+(?:\.\d+)?(?:[-+*^\/]\d+(?:\.\d+)?)*$ would check if your string consists of a decimal number followed by zero or more combinations of an operator and another decimal number.
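The validate-then-split idea can be sketched as follows. This is Python rather than C#, purely as an illustration; the names NUMBER, VALID, OPERATOR and tokenize are hypothetical, and the set of operators is assumed to be the same five as in the pattern above.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"
# a number, then zero or more (operator, number) pairs
VALID = re.compile(rf"{NUMBER}(?:[-+*/^]{NUMBER})*")
# the capture group makes split() keep the operators in the result
OPERATOR = re.compile(r"([-+*/^])")


def tokenize(expr):
    """Split expr into numbers and operators, rejecting malformed input."""
    if VALID.fullmatch(expr) is None:
        raise ValueError(f"not a simple arithmetic expression: {expr!r}")
    return OPERATOR.split(expr)


print(tokenize("3+10-5"))
```

Without the validation step, splitting "5//10-" would yield empty strings between the adjacent operators and after the trailing one; with it, the input is rejected up front.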
Here is the C# way -- which I mention because you are using Unity.
words = phrase.Split(default(string[]),StringSplitOptions.RemoveEmptyEntries);
https://msdn.microsoft.com/en-us/library/tabh47cf%28v=vs.110%29.aspx
Here is Java code for splitting a String by math operators
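// requires: import java.util.ArrayList; import java.util.List;
static String[] splitByOperators(String input) {
    // a list avoids guessing the array size up front
    List<String> output = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (char c : input.toCharArray()) {
        if (c == '+' || c == '-' || c == '*' || c == '/') {
            output.add(current.toString());
            output.add(String.valueOf(c));
            current.setLength(0);
        } else {
            current.append(c);
        }
    }
    output.add(current.toString());
    return output.toArray(new String[0]);
}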
Using Python regular expressions:
>>> import re
>>> match = re.search(r'(\d+)([-+*/^])(\d+)', "3+10")
>>> match.group(1)
'3'
>>> match.group(2)
'+'
>>> match.group(3)
'10'
The reason for using regular expressions is for greater flexibility in handling a variety of simple arithmetic expressions.
R: EDITED
Take your input vector as x<-c("3+10", "4/12" , "8-3" ,"12*1","1+2-3*4/8").
We can use the following string split based on regex:
> strsplit(x,split="(?<=\\d)(?=[-+*/])|(?<=[-+*/])(?=\\d)",perl=T)
[[1]]
[1] "3" "+" "10"
[[2]]
[1] "4" "/" "12"
[[3]]
[1] "8" "-" "3"
[[4]]
[1] "12" "*" "1"
[[5]]
[1] "1" "+" "2" "-" "3" "*" "4" "/" "8"
How it works:
Split the string when one of two things is found:
- A digit followed by an arithmetic operator. (?<=\\d) finds a position immediately preceded by a digit, while (?=[-+*/]) finds a position immediately followed by an arithmetic operator, i.e. +, *, -, or /. (The hyphen is listed first in the class so that it is read as a literal - and not as a range.) The position in both cases is the empty string "" between a digit and an operator, and the string is split at such a point.
- An arithmetic operator followed by a digit. This is just the reverse of the above.
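The same zero-width split can be tried in Python, where re.split has accepted patterns that match the empty string since version 3.7. This is a sketch for comparison only; BOUNDARY and split_at_boundaries are names I made up.

```python
import re

# a position between a digit and an operator, in either order
BOUNDARY = re.compile(r"(?<=\d)(?=[-+*/])|(?<=[-+*/])(?=\d)")


def split_at_boundaries(expr):
    """Split expr at every digit/operator boundary (needs Python 3.7+)."""
    return BOUNDARY.split(expr)


for expr in ["3+10", "4/12", "8-3", "12*1", "1+2-3*4/8"]:
    print(split_at_boundaries(expr))
```

Because the pattern consumes no characters, nothing is lost in the split and no capture group is needed to keep the operators.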

Python Join String to Produce Combinations For All Words in String

If my string is this: 'this is a string', how can I produce all possible combinations by joining each word with its neighboring word?
What this output would look like:
this is a string
thisis a string
thisisa string
thisisastring
thisis astring
this isa string
this isastring
this is astring
What I have tried:
s = 'this is a string'.split()
for i, l in enumerate(s):
    ''.join(s[0:i])+' '.join(s[i:])
This produces:
'this is a string'
'thisis a string'
'thisisa string'
'thisisastring'
I realize I need to change the s[0:i] part because it's statically anchored at 0, but I don't know how to move on to the next word while still including this in the output.
A simpler (and 3x faster than the accepted answer) way to use itertools product:
import itertools
s = 'this is a string'
s2 = s.replace('%', '%%').replace(' ', '%s')
for i in itertools.product((' ', ''), repeat=s.count(' ')):
    print(s2 % i)
You can also use itertools.product():
import itertools
s = 'this is a string'
words = s.split()
for t in itertools.product(range(len('01')), repeat=len(words)-1):
    print(''.join([words[i]+t[i]*' ' for i in range(len(t))])+words[-1])
Well, it took me a little longer than I expected... this is actually trickier than I thought :)
The main idea:
The number of spaces when you split the string is the length of the split array - 1. In our example there are 3 spaces:
'this is a string'
^ ^ ^
We'll take a binary representation of all the options to have/not have either one of the spaces, so in our case it'll be:
000
001
010
011
100
...
and for each option we'll generate the sentence respectively, where 111 represents all 3 spaces: 'this is a string' and 000 represents no-space at all: 'thisisastring'
def binaries(n):
    # all n-digit binary strings, from '00...0' to '11...1'
    res = []
    for x in range(2 ** n):
        tmp = bin(x)
        res.append(tmp.replace('0b', '').zfill(n))
    return res
def generate(arr, bins):
    res = []
    for bits in bins:
        tmp = arr[0]
        i = 1
        for digit in bits:
            # '1' keeps the space before the next word, '0' drops it
            if digit == '1':
                tmp = tmp + " " + arr[i]
            else:
                tmp = tmp + arr[i]
            i += 1
        res.append(tmp)
    return res
def combinations(string):
    s = string.split(' ')
    bins = binaries(len(s) - 1)
    res = generate(s, bins)
    return res
print(combinations('this is a string'))
# ['thisisastring', 'thisisa string', 'thisis astring', 'thisis a string', 'this isastring', 'this isa string', 'this is astring', 'this is a string']
UPDATE:
I now see that Amadan thought of the same idea - kudos for being quicker than me to think about! Great minds think alike ;)
The easiest is to do it recursively.
Terminating condition: Schrödinger join of a single element list is that word.
Recurring condition: say that L is the Schrödinger join of all the words but the first. Then the Schrödinger join of the list consists of all elements from L with the first word directly prepended, and all elements from L with the first word prepended with an intervening space.
(Assuming you are missing thisis astring by accident. If it is deliberate, I am sure I have no idea what the question is :P )
Another, non-recursive way you can do it is to enumerate all numbers from 0 to 2^(number of words - 1) - 1, then use the binary representation of each number as a selector whether or not a space needs to be present. So, for example, the abovementioned thisis astring corresponds to 0b010, for "nospace, space, nospace".
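The recursive description above can be sketched as follows. The function name space_variants is made up for illustration, and returning an empty list for an empty word list is my assumption; the answer only describes the single-word and multi-word cases.

```python
def space_variants(words):
    """All ways to join words, with or without a space at each gap."""
    if not words:
        # assumption: no words means no variants
        return []
    if len(words) == 1:
        # terminating condition: a single word joins to itself
        return [words[0]]
    # L = every variant of all the words but the first
    rest = space_variants(words[1:])
    first = words[0]
    glued = [first + tail for tail in rest]
    spaced = [first + " " + tail for tail in rest]
    return glued + spaced


for variant in space_variants("this is a string".split()):
    print(variant)
```

With n words there are n - 1 gaps, so the result always has 2^(n-1) entries, which matches the binary-counting view in the last paragraph.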

Automatic acronyms of strings in R

Long strings in plots aren't always attractive. What's the shortest way of making an acronym in R? E.g., "Hello world" to "HW", and preferably to have unique acronyms.
There's function abbreviate, but it just removes some letters from the phrase, instead of taking first letters of each word.
An easy way would be to use a combination of strsplit, substr, and make.unique.
Here's an example function that can be written:
makeInitials <- function(charVec) {
  make.unique(vapply(strsplit(toupper(charVec), " "),
                     function(x) paste(substr(x, 1, 1), collapse = ""),
                     vector("character", 1L)))
}
Test it out:
X <- c("Hello World", "Home Work", "holidays with children", "Hello Europe")
makeInitials(X)
# [1] "HW" "HW.1" "HWC" "HE"
That said, I do think that abbreviate should suffice, if you use some of its arguments:
abbreviate(X, minlength=1)
# Hello World Home Work holidays with children Hello Europe
# "HlW" "HmW" "hwc" "HE"
Using regex you can do the following. The regex pattern ((?<=\\s).|^.) matches any character preceded by whitespace, or the first character of the string. Then we just paste the resulting vectors using the collapse argument to get an acronym based on first letters. And as Ananda suggested, if you want unique results, pass the result through make.unique.
X <- c("Hello World", "Home Work", "holidays with children")
sapply(regmatches(X, gregexpr(pattern = "((?<=\\s).|^.)", text = X, perl = T)), paste, collapse = ".")
## [1] "H.W" "H.W" "h.w.c"
# If you want to make unique
make.unique(sapply(regmatches(X, gregexpr(pattern = "((?<=\\s).|^.)", text = X, perl = T)), paste, collapse = "."))
## [1] "H.W" "H.W.1" "h.w.c"
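For readers outside R, the split / first-letter / make-unique pipeline might look like this in Python. It is a rough sketch: make_initials is a hypothetical name, and the ".1", ".2" suffixing only imitates the simple case of make.unique shown above, not all of its collision handling.

```python
def make_initials(phrases):
    """Upper-case first-letter acronyms, numbering repeats as X.1, X.2, ..."""
    seen = {}
    result = []
    for phrase in phrases:
        acronym = "".join(word[0] for word in phrase.upper().split())
        count = seen.get(acronym, 0)
        seen[acronym] = count + 1
        # the first occurrence stays bare, later ones get a numeric suffix
        result.append(acronym if count == 0 else f"{acronym}.{count}")
    return result


print(make_initials(["Hello World", "Home Work", "holidays with children", "Hello Europe"]))
```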

Split a string using string.gmatch() in Lua

There are some discussions here, and utility functions, for splitting strings, but I need an ad-hoc one-liner for a very simple task.
I have the following string:
local s = "one;two;;four"
And I want to split it on ";". I want, eventually, to get { "one", "two", "", "four" } in return.
So I tried to do:
local s = "one;two;;four"
local words = {}
for w in s:gmatch("([^;]*)") do table.insert(words, w) end
But the result (the words table) is { "one", "", "two", "", "", "four", "" }. That's certainly not what I want.
Now, as I remarked, there are some discussions here on splitting strings, but they have "lengthy" functions in them and I need something succinct. I need this code for a program where I show the merit of Lua, and if I add a lengthy function to do something so trivial it would go against me.
local s = "one;two;;four"
local words = {}
for w in (s .. ";"):gmatch("([^;]*);") do
  table.insert(words, w)
end
By adding one extra ; at the end, the string becomes "one;two;;four;", and every field you want to capture can be matched with the pattern "([^;]*);": the longest possible run of non-; characters (the match is greedy), followed by a ;.
Test:
for n, w in ipairs(words) do
  print(n .. ": " .. w)
end
Output:
1: one
2: two
3:
4: four
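The trailing-separator trick is not specific to Lua. As a cross-check, here is the same idea sketched in Python, where the result can be compared against the built-in str.split. The name split_keep_empty is made up, and a single-character separator is assumed.

```python
import re


def split_keep_empty(s, sep=";"):
    """Split s on a one-character sep, keeping empty fields."""
    escaped = re.escape(sep)
    # a (possibly empty) run of non-separator characters, then the separator
    pattern = "([^" + escaped + "]*)" + escaped
    # the appended separator lets the last field match like all the others
    return re.findall(pattern, s + sep)


print(split_keep_empty("one;two;;four"))
```

Requiring the separator after every field is what prevents the extra empty matches that the bare "([^;]*)" pattern produced in the question.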
Just changing * to + works.
local s = "one;two;;four"
local words = {}
for w in s:gmatch("([^;]+)") do
  table.insert(words, w)
  print(w)
end
The magic character * matches 0 or more occurrences, so when the pattern reaches a ';' it matches an empty string there, because zero characters from [^;] is still a valid match.
Sorry for my carelessness: words[3] should be an empty string, which this version drops. However, when I run the original code in the Lua 5.4 interpreter, everything works.
function split(str, sep)
  local array = {}
  -- one or more characters that are not the separator
  local reg = string.format("([^%s]+)", sep)
  for mem in string.gmatch(str, reg) do
    table.insert(array, mem)
  end
  return array
end
local s = "one;two;;four"
local array = split(s, ";")
for n, w in ipairs(array) do
  print(n .. ": " .. w)
end
result:
1: one
2: two
3: four

Resources