I want to convert a string to a table, with the text split into characters. Each character must be a separate element of the table, for example:
a="text"
--converting string (a) to table (b)
--show table (b)
b={'t','e','x','t'}
You could use the string.gsub function:
t={}
str="text"
str:gsub(".",function(c) table.insert(t,c) end)
Just index each character and put it at the same position in the table.
local str = "text"
local t = {}
for i = 1, #str do
t[i] = str:sub(i, i)
end
The builtin string library treats Lua strings as byte arrays.
An alternative that works on multibyte (Unicode) characters is the
unicode library that
originated in the Selene project.
Its main selling point is that it can be used as a drop-in replacement
for the string library, making most string operations “magically”
Unicode-capable.
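To make the byte-array point concrete, here is a small sketch using only the standard string library. The sample string and the character pattern are my own assumptions, not part of the original answer; the pattern assumes valid UTF-8 input, does no validation, and ignores embedded NUL bytes for brevity.

```lua
-- "na\195\175ve" is "naïve" written with explicit UTF-8 bytes (ï = 195 175).
local str = "na\195\175ve"

-- Byte-wise split: the two bytes of ï end up in separate entries.
local bytes = {}
for i = 1, #str do
  bytes[i] = str:sub(i, i)
end

-- Character-wise split: one lead byte followed by any continuation bytes.
-- Assumption: the input is valid UTF-8; this pattern does not validate it.
local utf8_char = "[\1-\127\194-\244][\128-\191]*"
local chars = {}
for c in str:gmatch(utf8_char) do
  chars[#chars + 1] = c
end

print(#bytes, #chars)  -- 6 bytes, 5 characters
```

The byte-wise table has six entries because ï occupies two bytes, while the pattern-based table has five.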
If you prefer not to add third party dependencies your task can easily
be implemented using LPeg.
Here is an example splitter:
local lpeg = require "lpeg"
local C, Ct, R = lpeg.C, lpeg.Ct, lpeg.R
local lpegmatch = lpeg.match
local split_utf8 do
local utf8_x = R"\128\191"
local utf8_1 = R"\000\127"
local utf8_2 = R"\194\223" * utf8_x
local utf8_3 = R"\224\239" * utf8_x * utf8_x
local utf8_4 = R"\240\244" * utf8_x * utf8_x * utf8_x
local utf8 = utf8_1 + utf8_2 + utf8_3 + utf8_4
local split = Ct (C (utf8)^0) * -1
split_utf8 = function (str)
str = str and tostring (str)
if not str then return end
return lpegmatch (split, str)
end
end
This snippet defines the function split_utf8() that creates a table
of UTF-8 characters (as Lua strings), but returns nil if the string
is not a valid UTF-8 sequence.
You can run this test code:
tests = {
en = [[Lua (/ˈluːə/ LOO-ə, from Portuguese: lua [ˈlu.(w)ɐ] meaning moon; ]]
.. [[explicitly not "LUA"[1]) is a lightweight multi-paradigm programming ]]
.. [[language designed as a scripting language with "extensible ]]
.. [[semantics" as a primary goal.]],
ru = [[Lua ([лу́а], порт. «луна») — интерпретируемый язык программирования, ]]
.. [[разработанный подразделением Tecgraf Католического университета ]]
.. [[Рио-де-Жанейро.]],
gr = [[Η Lua είναι μια ελαφρή προστακτική γλώσσα προγραμματισμού, που ]]
.. [[σχεδιάστηκε σαν γλώσσα σεναρίων με κύριο σκοπό τη δυνατότητα ]]
.. [[επέκτασης της σημασιολογίας της.]],
XX = ">\255< invalid"
}
-------------------------------------------------------------------------------
local limit = 14
for lang, str in next, tests do
io.write "\n"
io.write (string.format ("<%s %3d> ->", lang, #str))
local chars = split_utf8 (str)
if not chars then
io.write " INVALID!"
else
io.write (string.format (" <%3d>", #chars))
for i = 1, #chars > limit and limit or #chars do
io.write (string.format (" %q", chars [i]))
end
end
end
io.write "\n"
Btw., building a table with LPeg is significantly faster than calling
table.insert() repeatedly.
Here are stats for splitting the whole of Gogol’s Dead Souls (in
Russian, 1023814 bytes raw, 571395 UTF-8 characters) on my machine:
library      method              time in ms
string       table.insert()      380
string       t [#t + 1] = c      310
string       gmatch & for loop   280
slnunicode   table.insert()      220
slnunicode   t [#t + 1] = c      200
slnunicode   gmatch & for loop   170
lpeg         Ct (C (...))         70
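The benchmark code itself is not shown, so the following is only a guess at what the three string-library rows compare: the same byte-wise split written with each appending idiom. The variable names are illustrative.

```lua
local str = "text"

-- 1. table.insert(): one function call per character.
local t1 = {}
for i = 1, #str do
  table.insert(t1, str:sub(i, i))
end

-- 2. t[#t + 1] = c: appends via the length operator.
local t2 = {}
for i = 1, #str do
  t2[#t2 + 1] = str:sub(i, i)
end

-- 3. gmatch in a for loop, with an explicit counter.
local t3, n = {}, 0
for c in str:gmatch(".") do
  n = n + 1
  t3[n] = c
end

print(table.concat(t1, ","), table.concat(t2, ","), table.concat(t3, ","))
```

All three build the same table; only the cost of each append differs.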
You can use the code below to achieve this easily.
t = {}
str = "text"
for i=1, string.len(str) do
t[i]= (string.sub(str,i,i))
end
for k , v in pairs(t) do
print(k,v)
end
-- 1 t
-- 2 e
-- 3 x
-- 4 t
Using string.sub
string.sub(s, i [, j])
Return a substring of the string passed. The substring starts at i. If the third argument j is not given, the substring will end at the end of the string. If the third argument is given, the substring ends at and includes j.
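A few calls illustrate the description above. The sample string is arbitrary; the negative-index call relies on the standard behaviour that negative positions count from the end of the string.

```lua
local s = "text"

print(s:sub(2))      -- ext  (from position 2 to the end)
print(s:sub(2, 3))   -- ex   (positions 2 through 3, inclusive)
print(s:sub(3, 3))   -- x    (a single character)
print(s:sub(-2))     -- xt   (negative indices count from the end)
```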
Related
I'm looking for the most efficient way to split a Lua string into a table.
I found two possible ways using gmatch or gsub and tried to make them as fast as possible.
function string:split1(sep)
local sep = sep or ","
local result = {}
local i = 1
for c in (self..sep):gmatch("(.-)"..sep) do
result[i] = c
i = i + 1
end
return result
end
function string:split2(sep)
local sep = sep or ","
local result = {}
local pattern = string.format("([^%s]+)", sep)
local i = 1
self:gsub(pattern, function (c) result[i] = c i = i + 1 end)
return result
end
The second option takes ~50% longer than the first.
What is the right way and why?
Added: I added a third function with the same pattern.
It shows the best result.
function string:split3(sep)
local sep = sep or ","
local result = {}
local i = 1
for c in self:gmatch(string.format("([^%s]+)", sep)) do
result[i] = c
i = i + 1
end
return result
end
"(.-)"..sep treats sep as a sequence: it splits only where the whole separator occurs.
"([^" .. sep .. "]+)" treats sep as a set of single characters: it splits on each character of the sequence.
string.format("([^%s]+)", sep) is faster than "([^" .. sep .. "]+)".
The string.format("(.-)%s", sep) shows almost the same time as "(.-)"..sep.
result[i]=c i=i+1 is faster than result[#result+1]=c and table.insert(result,c)
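The difference between the two pattern styles shows up as soon as the separator is longer than one character. Below is a minimal sketch with standalone functions (split_seq and split_set are hypothetical names for the split1 and split3 logic), assuming the separator contains no pattern magic characters.

```lua
-- Splits on the whole separator sequence (the "(.-)"..sep style).
local function split_seq(s, sep)
  local result, i = {}, 1
  for c in (s .. sep):gmatch("(.-)" .. sep) do
    result[i] = c
    i = i + 1
  end
  return result
end

-- Splits on any single character of sep (the "([^%s]+)" style).
local function split_set(s, sep)
  local result, i = {}, 1
  for c in s:gmatch(string.format("([^%s]+)", sep)) do
    result[i] = c
    i = i + 1
  end
  return result
end

-- Assumption: sep contains no pattern magic characters.
print(table.concat(split_seq("a; b;c", "; "), "|"))  -- a|b;c
print(table.concat(split_set("a; b;c", "; "), "|"))  -- a|b|c
```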
Code for test:
local init = os.clock()
local initialString = [[1,2,3,"afasdaca",4,"acaac"]]
local temTable = {}
for i = 1, 1000 do
table.insert(temTable, initialString)
end
local dataString = table.concat(temTable,",")
print("Creating data: ".. (os.clock() - init))
init = os.clock()
local data1 = {}
for i = 1, 1000 do
data1 = dataString:split1(",")
end
print("split1: ".. (os.clock() - init))
init = os.clock()
local data2 = {}
for i = 1, 1000 do
data2 = dataString:split2(",")
end
print("split2: ".. (os.clock() - init))
init = os.clock()
local data3 = {}
for i = 1, 1000 do
data3 = dataString:split3(",")
end
print("split3: ".. (os.clock() - init))
Times:
Creating data: 0.000229
split1: 1.189397
split2: 1.647402
split3: 1.011056
The gmatch version is preferred. gsub is intended for "global substitution" - string replacement - rather than iterating over matches; accordingly it presumably has to do more work.
The comparison isn't quite fair though as your patterns differ: For gmatch you use "(.-)"..sep and for gsub you use "([^" .. sep .. "]+)". Why don't you use the same pattern for both? In newer Lua versions you could even use the frontier pattern.
The different patterns also lead to different behavior: The gmatch-based func will return empty matches whereas the others won't. Note that the "([^" .. sep .. "]+)" pattern allows you to omit the parentheses.
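The behavioural difference can be seen on an input with an empty field. This is just an illustration with a hard-coded comma separator; the second loop also omits the parentheses, since gmatch yields the whole match when the pattern has no captures.

```lua
local s = "a,,b"

-- "(.-)," keeps the empty field between the two commas.
local with_empty = {}
for c in (s .. ","):gmatch("(.-),") do
  with_empty[#with_empty + 1] = c
end

-- "[^,]+" needs at least one character, so the empty field is dropped.
-- No parentheses: without captures, gmatch yields the whole match.
local without_empty = {}
for c in s:gmatch("[^,]+") do
  without_empty[#without_empty + 1] = c
end

print(#with_empty, #without_empty)  -- 3 fields versus 2 fields
```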
I am trying to separate the string data of an HTTP message in Wireshark using Lua, but I am having no success finding the end of the string. This is what I currently have:
HTTP_protocol = Proto("ourHTTP", "HTTPProtocol")
first =ProtoField.string("HTTP_protocol.first", "first", base.ASCII)
second =ProtoField.string("HTTP_protocol.second", "second", base.ASCII)
HTTP_protocol.fields = {first}
function HTTP_protocol.dissector(buffer, pinfo, tree)
length = buffer:len()
if length ==0 then return end
pinfo.cols.protocol = HTTP_protocol.name
local subtree = tree:add(HTTP_protocol, buffer(), "HTTPProtocol data ")
local string_length
for i = 0, length - 1, 1 do
if (buffer(i,1):uint() == '\r') then
string_length = i - 0
break
end
end
subtree:add(first, buffer(0,string_length))
end
porttable = DissectorTable.get("tcp.port")
porttable:add(80, HTTP_protocol)
I have tried searching for '\r', '\0' and '\n', but no matter what, all the strings still come through as one. Is there something I am doing wrong?
You can use 0x0D instead. That's the ASCII code for \r. So it will end up as
if (buffer(i,1):uint() == 0x0D) then
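The reason the original comparison never succeeds is that Lua never considers a number equal to a string. The sketch below shows this in plain Lua, outside Wireshark: a regular string stands in for the packet buffer, and the sample data and the first_line helper are hypothetical.

```lua
-- A number never equals a string in Lua, so this comparison is always false.
print(13 == "\r")                  -- false
print(string.byte("\r") == 0x0D)   -- true

-- Plain-string stand-in for the packet buffer (hypothetical sample data).
local data = "GET / HTTP/1.1\r\nHost: example.com\r\n"

-- Scan byte by byte for the first carriage return, as the dissector loop does.
local function first_line(s)
  for i = 1, #s do
    if s:byte(i) == 0x0D then
      return s:sub(1, i - 1)
    end
  end
  return s
end

print(first_line(data))  -- GET / HTTP/1.1
```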
So I have the following code to split a string between whitespaces:
text = "I am 'the text'"
for string in text:gmatch("%S+") do
print(string)
end
The result:
I
am
'the
text'
But I need to do this:
I
am
the text --[[yep, without the quotes]]
How can I do this?
Edit: just to complement the question, the idea is to pass parameters from one program to another. Here is the pull request that I am working on, currently in review: https://github.com/mpv-player/mpv/pull/1619
There may be ways to do this with clever parsing, but an alternative way may be to keep track of a simple state and merge fragments based on detection of quoted fragments. Something like this may work:
local text = [[I "am" 'the text' and "some more text with '" and "escaped \" text"]]
local spat, epat, buf, quoted = [=[^(['"])]=], [=[(['"])$]=]
for str in text:gmatch("%S+") do
local squoted = str:match(spat)
local equoted = str:match(epat)
local escaped = str:match([=[(\*)['"]$]=])
if squoted and not quoted and not equoted then
buf, quoted = str, squoted
elseif buf and equoted == quoted and #escaped % 2 == 0 then
str, buf, quoted = buf .. ' ' .. str, nil, nil
elseif buf then
buf = buf .. ' ' .. str
end
if not buf then print((str:gsub(spat,""):gsub(epat,""))) end
end
if buf then print("Missing matching quote for "..buf) end
This will print:
I
am
the text
and
some more text with '
and
escaped \" text
Updated to handle mixed and escaped quotes. Updated to remove quotes. Updated to handle quoted words.
Try this:
text = [[I am 'the text' and '' here is "another text in quotes" and this is the end]]
local e = 0
while true do
local b = e+1
b = text:find("%S",b)
if b==nil then break end
if text:sub(b,b)=="'" then
e = text:find("'",b+1)
b = b+1
elseif text:sub(b,b)=='"' then
e = text:find('"',b+1)
b = b+1
else
e = text:find("%s",b+1)
end
if e==nil then e=#text+1 end
print("["..text:sub(b,e-1).."]")
end
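If the tokens are needed in a table rather than printed, the same scan can be wrapped in a function. This is a sketch of that idea, not part of the original answer; split_quoted is a hypothetical name, and an unterminated quote simply runs to the end of the text.

```lua
-- Hypothetical wrapper name; the scanning logic follows the loop above.
local function split_quoted(text)
  local out, e = {}, 0
  while true do
    local b = text:find("%S", e + 1)       -- start of the next token
    if b == nil then break end
    local q = text:sub(b, b)
    if q == "'" or q == '"' then
      e = text:find(q, b + 1, true)        -- matching closing quote
      b = b + 1                            -- skip the opening quote
    else
      e = text:find("%s", b + 1)           -- next whitespace
    end
    if e == nil then e = #text + 1 end     -- token runs to the end
    out[#out + 1] = text:sub(b, e - 1)
  end
  return out
end

local parts = split_quoted([[I am 'the text']])
print(table.concat(parts, "|"))  -- I|am|the text
```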
Lua patterns aren't powerful enough to handle this task properly. Here is an LPeg solution adapted from the Lua Lexer. It handles both single and double quotes.
local lpeg = require 'lpeg'
local P, S, C, Cc, Ct = lpeg.P, lpeg.S, lpeg.C, lpeg.Cc, lpeg.Ct
local function token(id, patt) return Ct(Cc(id) * C(patt)) end
local singleq = P "'" * ((1 - S "'\r\n\f\\") + (P '\\' * 1)) ^ 0 * "'"
local doubleq = P '"' * ((1 - S '"\r\n\f\\') + (P '\\' * 1)) ^ 0 * '"'
local white = token('whitespace', S('\r\n\f\t ')^1)
local word = token('word', (1 - S("' \r\n\f\t\""))^1)
local string = token('string', singleq + doubleq)
local tokens = Ct((string + white + word) ^ 0)
input = [["This is a string" 'another string' these are words]]
for _, tok in ipairs(lpeg.match(tokens, input)) do
if tok[1] ~= "whitespace" then
if tok[1] == "string" then
print(tok[2]:sub(2,-2)) -- cut off quotes
else
print(tok[2])
end
end
end
Output:
This is a string
another string
these
are
words
I want to be able to use a lastIndexOf method for strings in my Lua (Luvit) project. Unfortunately there's no such method built in and I'm a bit stuck now.
In Javascript it looks like:
'my.string.here.'.lastIndexOf('.') // returns 14
function findLast(haystack, needle)
local i=haystack:match(".*"..needle.."()")
if i==nil then return nil else return i-1 end
end
s='my.string.here.'
print(findLast(s,"%."))
print(findLast(s,"e"))
Note that to find . you need to escape it.
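When the needle comes from user input, the escaping can be automated. The sketch below is one possible approach: escape_pattern is a hypothetical helper that prefixes every punctuation character with %, which Lua patterns treat as a literal match for any non-alphanumeric character.

```lua
-- Prefix every punctuation character with % so it is matched literally.
local function escape_pattern(s)
  return (s:gsub("%p", "%%%0"))
end

local function findLast(haystack, needle)
  local i = haystack:match(".*" .. needle .. "()")
  if i == nil then return nil else return i - 1 end
end

local s = "a.b.c"
print(findLast(s, "."))                  -- 5: "." matches any character
print(findLast(s, escape_pattern(".")))  -- 4: the last literal dot
```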
If you have performance concerns, then this might be a bit faster if you're using Luvit which uses LuaJIT.
local find = string.find
local function lastIndexOf(haystack, needle)
local i, j
local k = 0
repeat
i = j
j, k = find(haystack, needle, k + 1, true)
until j == nil
return i
end
local s = 'my.string.here.'
print(lastIndexOf(s, '.')) -- This will be 15.
Keep in mind that Lua strings begin at 1 instead of 0 as in JavaScript.
Here’s a solution using
LPeg’s position capture.
local lpeg = require "lpeg"
local Cp, P = lpeg.Cp, lpeg.P
local lpegmatch = lpeg.match
local cache = { }
local find_last = function (str, substr)
if not (str and substr)
or str == "" or substr == ""
then
return nil
end
local pat = cache [substr]
if not pat then
local p_substr = P (substr)
local last = Cp() * p_substr * Cp() * (1 - p_substr)^0 * -1
pat = (1 - last)^0 * last
cache [substr] = pat
end
return lpegmatch (pat, str)
end
find_last() finds the last occurrence of substr in the string
str, where substr can be a string of any length.
The first return value is the position of the first character of
substr in str, the second return value is the position of the
first character following substr (i.e. it equals the length of the
match plus the first return value).
Usage:
local tests = {
A = [[fooA]], --> 4, 5
[""] = [[foo]], --> nil
FOO = [[]], --> nil
K = [[foo]], --> nil
X = [[X foo X bar X baz]], --> 13, 14
XX = [[foo XX X XY bar XX baz X]], --> 17, 19
Y = [[YYYYYYYYYYYYYYYYYY]], --> 18, 19
ZZZ = [[ZZZZZZZZZZZZZZZZZZ]], --> 14, 17
--- Accepts patterns as well!
[P"X" * lpeg.R"09"^1] = [[fooX42barXxbazX]], --> 4, 7
}
for substr, str in next, tests do
print (">>", substr, str, "->", find_last (str, substr))
end
To search for the last instance of string needle in haystack:
function findLast(haystack, needle)
--The third argument (plain = true) makes find treat the needle as literal text
local found = haystack:reverse():find(needle:reverse(), nil, true)
if found then
return haystack:len() - needle:len() - found + 2
else
return found
end
end
print(findLast("my.string.here.", ".")) -- 15, because Lua strings are 1-indexed
print(findLast("my.string.here.", "here")) -- 11
print(findLast("my.string.here.", "there")) -- nil
Keep the plain argument set to true: reversing a pattern generally does not give an equivalent pattern (for example, "%." reversed is ".%", which is malformed), so this approach only suits literal needles.
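The index arithmetic in the return statement is easier to follow when worked through on one of the examples. This sketch only replays the computation for the "here" case; nothing in it goes beyond the function above.

```lua
local haystack, needle = "my.string.here.", "here"

-- Position of the reversed needle inside the reversed haystack.
local found = haystack:reverse():find(needle:reverse(), nil, true)
print(found)  -- 2

-- In the reversed string the match covers found .. found + #needle - 1.
-- Mirroring the match's last position back gives the start in the original:
--   #haystack - (found + #needle - 1) + 1 = #haystack - #needle - found + 2
local start = #haystack - #needle - found + 2
print(start)  -- 11
print(haystack:sub(start, start + #needle - 1))  -- here
```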
Can be optimized but simple and does the work.
function lastIndexOf(haystack, needle)
local last_index = 0
while haystack:sub(last_index+1, haystack:len()):find(needle) ~= nil do
last_index = last_index + haystack:sub(last_index+1, haystack:len()):find(needle)
end
return last_index
end
local s = 'my.string.here.'
print(lastIndexOf(s, '%.')) -- 15
I have string
'TEST1, TEST2, TEST3'
I want to have
'TEST1,TEST2,TEST3'
Is there a function in PowerBuilder like replace, substr, or something similar?
One way is to use the database since you probably have an active connection.
string ls_stringwithspaces = "String String String String"
string ls_stringwithnospace = ""
string ls_sql = "SELECT replace('" + ls_stringwithspaces + "', ' ', '')"
DECLARE db DYNAMIC CURSOR FOR SQLSA;
PREPARE SQLSA FROM :ls_sql USING SQLCA;
OPEN DYNAMIC db;
IF SQLCA.SQLCode <> 0 THEN
// error handling
END IF
FETCH db INTO :ls_stringwithnospace;
CLOSE db;
MessageBox("", ls_stringwithnospace)
Sure there is (you could easily have found it in the help), but it is not very helpful.
Its prototype is Replace ( string1, start, n, string2 ), so you need to know the position of the string to replace before calling it.
There is a common wrapper for this that consists of looping on pos() / replace() until there is nothing left to replace. The following is the source code of a global function:
global type replaceall from function_object
end type
forward prototypes
global function string replaceall (string as_source, string as_pattern, string as_replace)
end prototypes
global function string replaceall (string as_source, string as_pattern, string as_replace);//replace all occurrences of as_pattern in as_source by as_replace
string ls_target
long i, j
ls_target=""
i = 1
j = 1
do
i = pos( as_source, as_pattern, j )
if i>0 then
ls_target += mid( as_source, j, i - j )
ls_target += as_replace
j = i + len( as_pattern )
else
ls_target += mid( as_source, j )
end if
loop while i>0
return ls_target
end function
Beware that string functions (searching and concatenating) in PB are not that efficient. An alternative solution could be to use the FastReplaceall() global function provided by the PbniRegex extension, a compiled C++ plugin for PB Classic versions 9 to 12.
I do it like this:
long space, ll_a
FOR ll_a = 1 to len(ls_string)
space = pos(ls_string, " ")
IF space > 0 THEN
ls_string= Replace(ls_string, space, 1, "")
END IF
NEXT