Lua string.match problem?

How can I match the following strings with one expression?
local a = "[a 1.001523] <1.7 | [...]> < a123 > < ? 0 ?>";
local b = "[b 2.68] <..>";
local c = "[b 2.68] <>";
local d = "[b 2.68] <> < > < ?>";
local name, netTime, argument1, argument2, argumentX = string:match(?);
-- (string is a or b or c or d)
The problem is that the strings can have a varying number of arguments ("<...>"), and the arguments can contain digits, letters, special characters or spaces.
I'm new to Lua and I have to learn string matching, but I cannot learn this in a few hours. I ask YOU, because I need the result tomorrow and I really would appreciate your help!
cheers :)

Lua patterns are very limited: there is no alternation, and you cannot make a group optional. With a single pattern, every argument would therefore have to be matched by the same expression, and the number of arguments would have to be fixed. Check this tutorial, it doesn't take long to get used to Lua patterns.
You might still be able to parse those strings using multiple patterns. ^%[(%a+)%s(%d+%.%d+)%]%s is the best you can do to get the first part, assuming the name can consist of multiple upper- and lower-case letters. To match the arguments, run multiple patterns on parts of the input, like <%s*> or <(%w+)>, to check each argument individually.
Alternatively, get a regex library or a parser, which would be much more useful here.

Lua patterns are indeed limited, but you can work around that if you can make some assumptions. For example, if there are no >'s inside the arguments, you can just loop over all matching pairs of <>:
local a = "[a 1.001523] <1.7 | [...]> < a123 > < ? 0 ?>"
local b = "[b 2.68] <..>"
local c = "[b 2.68] <>"
local d = "[b 2.68] <> < > < ?>"
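local unpack = table.unpack or unpack -- Lua 5.2+ moved unpack into the table library
function parse(str)
    -- the empty capture () returns the position just after the closing ]
    local name, nettime, lastPos = str:match('%[(%a+)%s(%d+%.%d+)%]()')
    if not name then return nil end -- the [name time] header did not match
    local arguments = {}
    -- start looking for arguments only after the initial part in [ ]
    for argument in str:sub(lastPos+1):gmatch('(%b<>)') do
        argument = argument:sub(2,-2) -- strip <>
        -- do whatever you need with the argument. Here we'll just put it in a table
        arguments[#arguments+1] = argument
    end
    return name, nettime, unpack(arguments)
end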
For more complicated things you'll be better off using something like LPeg, as kapep said.


Python ord() and chr()

I have:
txt = input('What is your sentence? ')
list = [0]*128
for x in txt:
    list[ord(x)] += 1
for x in list:
    if x >= 1:
        print(chr(list.index(x)) * x)
As per my understanding this should just output every letter in a sentence like:
))
111
3333
etc.
For the string "aB)a2a2a2)" the output is correct:
))
222
B
aaaa
For the string "aB)a2a2a2" the output is wrong:
)
222
)
aaaa
I feel like all my bases are covered but I'm not sure what's wrong with this code.
When you do list.index(x), you're searching the list for the first index that value appears. That's not actually what you want though, you want the specific index of the value you just read, even if the same value occurs somewhere else earlier in the list too.
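To make that concrete, here is a minimal sketch that rebuilds the counts for the failing input from the question and shows what index returns. The variable name first_index_of_one is only for illustration.

```python
# Rebuild the per-character counts for the input that printed the wrong result.
counts = [0] * 128
for character in "aB)a2a2a2":
    counts[ord(character)] += 1

# ')' (code 41) and 'B' (code 66) both occur exactly once.
print(counts[ord(")")], counts[ord("B")])

# index() returns the FIRST position holding the value 1, so both
# characters are reported as position 41, which is ')'.
first_index_of_one = counts.index(1)
print(first_index_of_one, chr(first_index_of_one))
```

This is why 'B' was printed as ')' in the second example, while the first example happened to work because ')' and 'B' had different counts.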
The best way to get indexes alongside values from a sequence is with enumerate:
for i, x in enumerate(list):
    if x >= 1:
        print(chr(i) * x)
That should get you the output you want, but there are several other things that would make your code easier to read and understand. First of all, using list as a variable name is a very bad idea, as that will shadow the builtin list type's name in your namespace. That makes it very confusing for anyone reading your code, and you even confuse yourself if you want to use the normal list for some purpose and don't remember you've already used it for a variable of your own.
The other issue is also about variable names, but it's a bit more subtle. Your two loops both use a loop variable named x, but the meaning of the value is different each time. The first loop is over the characters in the input string, while the latter loop is over the counts of each character. Using meaningful variables would make things a lot clearer.
Here's a combination of all my suggested fixes together:
text = input('What is your sentence? ')
counts = [0]*128
for character in text:
    counts[ord(character)] += 1
for index, count in enumerate(counts):
    if count >= 1:
        print(chr(index) * count)
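As an optional variation that is not part of the fix above, the same grouping can be sketched with collections.Counter from the standard library. The function name grouped_characters is made up for this example. One assumed benefit is that it also copes with characters whose code is 128 or higher, where the fixed-size list would raise an IndexError.

```python
from collections import Counter


def grouped_characters(text):
    """Return one string per distinct character, ordered by code point."""
    counts = Counter(text)
    return [character * counts[character] for character in sorted(counts)]


for line in grouped_characters("aB)a2a2a2"):
    print(line)
```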

How to dissect and parse a string in lua?

I am trying to make command arguments in Roblox. For example, /kill playername. The problem is I don't know how to parse the playername from the string /kill playername. This code is in something like this:
game:GetService("Players").PlayerAdded:Connect(function(Player)
    Player.Chatted:Connect(function(Message)
        if string.sub(Message, 1, 6) == "/kill " then
            --this means the string starts with /kill and is expecting an argument.
            --How can I parse this argument from the string
        end
    end)
end)
Edit: I want to add /setdata <Playername> <DataToChange eg. money> <Value>
Example command:
/setdata MyRobloxUsername Money 10000
I am trying to use something like this to do so
local Command, Playername, DataToChange, Value = string.match(???)
I just need to get the values from the string into variables. I can figure out how to change the data using the variables myself. Just how to get the values from the string. How can I do what I am describing?
I unaccepted the answer because I need further help. Once I get this help I will re-accept it. My next request is similar, but with 3 arguments instead of 1. I need help as string:Match() is very counterintuitive to me.
Use string.match:
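local Message = " /kill playername "
-- %S+ captures the command; the lazy (.-) captures the argument without trailing spaces
local command, arg = Message:match("^%s*/(%S+)%s+(.-)%s*$")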
If you want this to be more flexible to more commands in the future, I suggest you take both lhf's and BotOfWar's suggestions and combine them.
local function executeCommandInMessage(message)
    -- do a quick pattern match on the message to see if it is formatted as a command
    -- all we care about is the command, any arguments are optional.
    local command, arguments = string.match(message, "^/(%w+)%s*(.*)$")
    if command ~= nil then
        -- we've found a command, parse the arguments into groups of non-space characters
        -- then store each word in the parts array
        local parts = {}
        for w in arguments:gmatch("%S+") do
            table.insert(parts, w)
        end
        -- handle each command individually
        if command == "kill" then
            local player = parts[1]
            print(string.format("Killing %s", player))
        elseif command == "setdata" then
            local player = parts[1]
            local value = parts[2]
            local amount = parts[3]
            print(string.format("Setting %s on %s to %s", value, player, amount))
        -- add any further commands to the list..
        -- elseif command == "" then
        end
    end
end
-- listen for any message submitted by players
game:GetService("Players").PlayerAdded:Connect(function(Player)
    Player.Chatted:Connect(function(msg)
        -- check for any commands
        executeCommandInMessage(msg)
    end)
end)
In the future, if you need a better pattern to parse the message, I suggest you take a look at how to do Lua pattern matching. Patterns are pretty easy to read once you know what to look for.
I suggest splitting the string with the string.split method to get the segments, then check if the first value is what you want.
game:GetService("Players").PlayerAdded:Connect(function(Player)
    Player.Chatted:Connect(function(Message)
        local segments = Message:split(" ")
        if((#segments >= 1) and (segments[1] == "/kill")) then
            -- The rest of the arguments can be accessed like this:
            local args = {unpack(segments, 2)} -- Gets every argument after the first value,
                                               -- which is the command.
        end
    end)
end)

How to validate this string when we don't have the `|` operator in Lua?

I have strings of the form:
cake!apple!
apple!
cake!juice!apple!cake!
juice!cake!
In other words, these strings are composed of the three sub-strings "cake!", "apple!" and "juice!".
I need to validate these strings. The way to do this with a regular expression is thus:
/^(apple!|juice!|cake!)*$/
But Lua's patterns don't have the | operator, so it seemingly can't be done this way.
How can I validate my strings in Lua?
(I don't care about the contents of the strings: I only care about whether they conform (validate) or not.)
I know how to write the code to do this, but I can't think of a short way to do it. I'm looking for a short solution. I wonder if there's an elegant solution that I'm not aware of. Any ideas?
if str:gsub("%w+!", {["apple!"]="", ["juice!"]="", ["cake!"]=""}) == "" then
    --do something
end
This solution uses a table as the second parameter to string.gsub. Every match of %w+! is looked up in the table: the three real words are replaced with an empty string, while any other match has no entry and is left in place. If the string is empty after all the replacements, the match succeeds.
Using a helper table variable can make it clearer:
local t = {["apple!"]="", ["juice!"]="", ["cake!"]=""}
if str:gsub("%w+!", t) == "" then
    --do something
end
If there is a character that will never be in your string (for instance, the character "\1", ASCII 1, is unlikely in a normal string), you can use this:
local str = "cake!juice!apple!cake!"
if str:gsub("apple!","\1"):gsub("juice!","\1"):gsub("cake!","\1"):gsub("\1","") == "" then
    --do something
end
Every match of one of the words is replaced with "\1", and finally every "\1" is replaced with an empty string, so a valid string ends up empty.
It has flaws (sometimes it's impossible to find a character that is never in the string), but I think it works in many situations.
The following seems to work for (the included) quick tests.
local strs = {
    "cake!apple!",
    "bad",
    "apple!",
    "apple!bad",
    " apple!bad",
    "cake!juice!apple!cake!",
    "cake!juice! apple!cake!",
    "cake!juice!badapple!cake!",
    "juice!cake!",
    "badjuice!cake!",
}
local legalwords = {
    ["cake!"] = true,
    ["apple!"] = true,
    ["juice!"] = true,
}
local function str_valid(str)
    local newpos = 1
    for pos, m in str:gmatch("()([^!]+!)") do
        -- reject unknown words and any gap (such as a stray "!") between matches
        if pos ~= newpos or not legalwords[m] then
            return nil
        end
        newpos = pos + m:len()
    end
    -- the last match must end exactly at the end of the string
    if newpos ~= (str:len() + 1) then
        return nil
    end
    return true
end
for _, str in ipairs(strs) do
    if str_valid(str) then
        print("Match: "..str)
    else
        print("Did not match: "..str)
    end
end
Just to provide another answer, you can do this easily with lpeg's re module:
local re = require 're'
local testdata =
{
    "cake!apple!",
    "apple!",
    "cake!juice!apple!cake!",
    "cake!juice!badbeef!apple!cake!",
    "juice!cake!",
    "badfood",
}
for _, each in ipairs(testdata) do
    print(re.match(each, "('cake!' / 'apple!' / 'juice!')*") == #each + 1)
end
This outputs:
true
true
true
false
true
false
This looks almost like your regex pattern above minus the ^ $ of course since lpeg matching is always anchored.
Lua patterns are not a replacement for regular expressions, and cannot represent this sort of pattern. In this case, you just need to repeatedly make sure the front of the string matches one of your words and then pop it off, but you probably already knew that.
Something like:
local words = {cake=1,apple=2,juice=3}
local totals = {}
local matches = 0
local invalid = 0
string.gsub("cake!","(%a+)!",
    function(word)
        local index = words[word]
        if index then
            matches = matches + 1
            -- totals starts out empty, so treat a missing entry as 0
            totals[index] = (totals[index] or 0) + 1
        else
            invalid = invalid + 1
        end
    end
)
if matches > 0 and invalid == 0 then
    -- Do stuff
end
This will pass each word to the supplied function where you can validate each one.
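The "match the front of the string, then pop it off" idea from the start of this answer can also be sketched directly. The version below is in Python rather than Lua and is only an illustration. The names WORDS and is_valid are invented for it, and it treats the empty string as valid, as the regex in the question does.

```python
WORDS = ("cake!", "apple!", "juice!")


def is_valid(text):
    """Return True if text is a concatenation of zero or more WORDS."""
    position = 0
    while position < len(text):
        for word in WORDS:
            # Does one of the legal words start at the current position?
            if text.startswith(word, position):
                position += len(word)  # "pop" the word off the front
                break
        else:
            # No legal word starts here, so the string is invalid.
            return False
    return True


for sample in ("cake!apple!", "cake!juice!badapple!cake!", "juice!cake!", "bad"):
    print(sample, is_valid(sample))
```

Because the three words do not share prefixes, taking the first word that fits is enough; with overlapping words a backtracking search would be needed.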
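I don't know if this will help with your problem, but be careful with using "or" inside string.find(). Look:
str="juice!"
print(string.find(str, "cake!" or "teste"))
Lua evaluates "cake!" or "teste" before the call, and the result is simply "cake!". So this searches only for "cake!" and prints nil here; it does not give you alternation.
Best regards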

Lua - How to find a substring with 1 or 2 characters discrepancy

Say I have a string
local a = "Hello universe"
I find the substring "universe" by
a:find("universe")
Now, suppose the string is
local a = "un#verse"
The string to be searched is universe; but the substring differs by a single character.
So obviously Lua ignores it.
How do I make the function find the string even if there is a discrepancy by a single character?
If you know where the character would be, use . instead of that character: a:find("un.verse")
However, it looks like you're looking for a fuzzy string search. That is outside the scope of the Lua string library. You may want to start with this article: http://ntz-develop.blogspot.com/2011/03/fuzzy-string-search.html
As for Lua fuzzy search implementations: I haven't used any, but googling "lua fuzzy search" gives a few results. Some are based on this paper: http://web.archive.org/web/20070518080535/http://www.heise.de/ct/english/97/04/386/
Try https://github.com/ajsher/luafuzzy.
It sounds like you want something along the lines of TRE:
TRE is a lightweight, robust, and efficient POSIX compliant regexp matching library with some exciting features such as approximate (fuzzy) matching.
Approximate pattern matching allows matches to be approximate, that is, allows the matches to be close to the searched pattern under some measure of closeness. TRE uses the edit-distance measure (also known as the Levenshtein distance) where characters can be inserted, deleted, or substituted in the searched text in order to get an exact match. Each insertion, deletion, or substitution adds the distance, or cost, of the match. TRE can report the matches which have a cost lower than some given threshold value. TRE can also be used to search for matches with the lowest cost.
A Lua binding for it is available as part of lrexlib.
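To illustrate the edit-distance measure mentioned in the quote, here is a minimal sketch of the classic dynamic-programming computation of Levenshtein distance. It is written in Python for brevity and is not how TRE is implemented. The function name levenshtein is just a label for this example.

```python
def levenshtein(a, b):
    """Number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


print(levenshtein("universe", "un#verse"))
```

A fuzzy search would then accept any candidate substring whose distance to the search term is below a chosen threshold, for example 1 or 2 as in the question title.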
If you are really looking for a single character difference and do not care about performance, here is a simple approach that should work:
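local a = "Hello un#verse"
local myfind = function(s,p)
    -- replace the n-th character of the pattern with '.', which matches any character
    local withdot = function(n)
        return p:sub(1,n-1) .. '.' .. p:sub(n+1)
    end
    local first,last
    for i=1,#p do -- try the wildcard at every position of the pattern
        first,last = s:find(withdot(i))
        if first then return first,last end
    end
end
print(myfind(a,"universe"))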
A simple roll your own approach (based on the assumption that the pattern keeps the same length):
function hammingdistance(a,b)
    local ta={a:byte(1,-1)}
    local tb={b:byte(1,-1)}
    local res = 0
    for k=1,#a do
        if ta[k]~=tb[k] then
            res=res+1
        end
    end
    print(a,b,res) -- debugging/demonstration print
    return res
end
function fuz(s,pat)
    local best_match=10000
    local best_location
    for k=1,#s-#pat+1 do
        local cur_diff=hammingdistance(s:sub(k,k+#pat-1),pat)
        if cur_diff < best_match then
            best_location = k
            best_match = cur_diff
        end
    end
    local start,ending = math.max(1,best_location),math.min(best_location+#pat-1,#s)
    return start,ending,s:sub(start,ending)
end
s=[[Hello, Universe! UnIvErSe]]
print(fuz(s,'universe'))
Disclaimer: not recommended, just for fun:
If you want a better syntax (and you don't mind messing with standard type's metatables) you could use this:
getmetatable('').__sub=hammingdistance
a='Hello'
b='hello'
print(a-b)
But note that a-b does not equal b-a this way.

How to parse a string (by a "new" markup) with R?

I want to use R to do string parsing that (I think) is like a simplistic HTML parsing.
For example, let's say we have the following two variables:
Seq <- "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
Str <- ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."
Say that I want to parse "Seq" according to "Str", using the legend here:
Seq: GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA
Str: >>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<.
     |     |  |              | |               |     |               ||     |
     +-----+  +--------------+ +---------------+     +---------------++-----+
        |          Stem 1           Stem 2                Stem 3         |
        |                                                                |
        +----------------------------------------------------------------+
                                      Stem 0
Assume that we always have 4 stems (0 to 3), but that the number of letters before and after each of them can vary.
The output should be something like the following list structure:
list(
"Stem 0 opening" = "GCCTCGA",
"before Stem 1" = "TA",
"Stem 1" = list(opening = "GCTC",
inside = "AGTTGGGA",
closing = "GAGC"
),
"between Stem 1 and 2" = "G",
"Stem 2" = list(opening = "TACGA",
inside = "CTGAAGA",
closing = "TCGTA"
),
"between Stem 2 and 3" = "AGGtC",
"Stem 3" = list(opening = "ACCAG",
inside = "TTCGATC",
closing = "CTGGT"
),
"After Stem 3" = "",
"Stem 0 closing" = "TCGGGGC"
)
I don't have any experience with programming a parser, and would like advice on what strategy to use when programming something like this (and any recommended R commands to use).
What I was thinking of is to first get rid of the "Stem 0", then go through the inner string with a recursive function (let's call it "seperate.stem") that each time will split the string into:
1. before stem
2. opening stem
3. inside stem
4. closing stem
5. after stem
Where the "after stem" will then be recursively entered into the same function ("seperate.stem")
The thing is that I am not sure how to try and do this coding without using a loop.
Any advice is most welcome.
Update: someone sent me a bunch of questions; here they are.
Q: Does each sequence have the same number of ">>>>" for the opening sequence as it does for "<<<<" on the ending sequence?
A: Yes
Q: Does the parsing always start with a partial stem 0 as your example shows?
A: No. Sometimes it will start with a few "."
Q: Is there a way of making sure you have the right sequences when you start?
A: I am not sure I understand what you mean.
Q: Is there a chance of error in the middle of the string that you have to restart from?
A: Sadly, yes. In which case, I'll need to ignore one of the inner stems...
Q: How long are these strings that you want to parse?
A: Each string has between 60 to 150 characters (and I have tens of thousands of them...)
Q: Is each one a self contained sequence like you show in your example, or do they go on for thousands of characters?
A: each sequence is self contained.
Q: Is there always at least one '.' between stems?
A: No.
Q: A full set of rules as to how the parsing should be done would be useful.
A: I agree. But since I don't have even a basic idea on how to start coding this, I thought first to have some help on the beginning and try to tweak with the other cases that will come up before turning back for help.
Q: Do you have the BNF syntax for parsing?
A: No. Your e-mail is the first time I came across it (http://en.wikipedia.org/wiki/Backus–Naur_Form).
You can simplify the task by using run length encoding.
First, convert Str to be a vector of individual characters, then call rle.
split_Str <- strsplit(Str, "")[[1]]
rle_Str <- rle(split_Str)
Run Length Encoding
lengths: int [1:14] 7 2 4 8 4 1 5 7 5 5 ...
values : chr [1:14] ">" "." ">" "." "<" "." ">" "." "<" "." ">" "." "<" "."
Now you just need to parse rle_Str$values, which is perhaps simpler. For instance, an inner stem will always look like ">" "." "<".
I think the main thing that you need to think about is the structure of the data. Does a "." always have to come between ">" and "<", or is it optional? Can you have a "." at the start? Do you need to be able to generalise to stems within stems within stems, or even more complex structures?
Once you have this solved, constructing your list output should be straightforward.
Also, don't worry about using loops, they are in the language because they are useful. Get the thing working first, then worry about speed optimisations (if you really have to) afterwards.
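For readers who want to try the run-length idea outside R, here is a rough sketch in Python, using itertools.groupby as a stand-in for rle. The helper names run_length_encode and split_by_runs are invented for this example, and it only cuts Seq into pieces; it does not yet build the nested stem list.

```python
from itertools import groupby

SEQ = "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
STR = ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."


def run_length_encode(text):
    """Return (character, run length) pairs, similar to R's rle()."""
    return [(char, len(list(group))) for char, group in groupby(text)]


def split_by_runs(sequence, runs):
    """Cut sequence into pieces whose lengths follow the given runs."""
    pieces, start = [], 0
    for char, length in runs:
        pieces.append((char, sequence[start:start + length]))
        start += length
    return pieces


runs = run_length_encode(STR)
for symbol, piece in split_by_runs(SEQ, runs):
    print(symbol, piece)
```

Note that adjacent runs of the same symbol are merged (the closing of Stem 3 and the closing of Stem 0 come out as one run of "<"), so a real parser still has to split such a run by matching it against the lengths of the corresponding opening runs.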
