Removing special characters from a string in Lua


I'm trying to clean up a column of data containing postal codes before processing the values. The data contains all kinds of crazy formatting or input like the following and is a CHAR datatype:
12345
12.345
1234-5678
12345 6789
123456789
12345-6789
.
[blank]
I would like to remove all of the special characters and have tried the following code, but my script fails after many iterations of the logic. By "fails" I mean this: if sOriginalZip = '.', the value gets past my empty-string check and nil check as if it were not empty, even though I have already replaced all punctuation, control characters and whitespace. So my output looks like this:
" 2 sZip5 = "
Code:
nNull = nil
sZip5 = string.gsub(sOriginalZip,"%p","")
sZip5 = string.gsub(sZip5,"%c","")
sZip5 = string.gsub(sZip5,"%s","")
print("sZip5 = " .. sZip5)
if sZip5 ~= sBlank or tonumber(sZip5) ~= nNull then
    print(" 2 sZip5 = " .. sZip5)
else
    print("3 sZip5 = " .. sZip5)
end

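There are several ways to do this; the following should work:
sZip5 = string.gsub(sOriginalZip, '.', function(d) return tonumber(d) and d or '' end)
The pattern '.' matches every character, and the callback keeps only those that tonumber accepts. sZip5 therefore ends up as a string of digits, or as an empty string if the input contained none (string.gsub always returns a string, never nil).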
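For comparison, here is a minimal sketch of the same clean-up in Python rather than Lua. The function name clean_zip is made up for illustration, and returning None when no digits remain is an assumption about how the caller wants to detect empty values.

```python
import re


def clean_zip(raw):
    """Keep only ASCII digits; return None if nothing is left.

    clean_zip is a hypothetical helper, not part of the original thread.
    """
    digits = re.sub(r"[^0-9]", "", raw)
    # An empty string is falsy, so "." and "" both collapse to None.
    return digits or None


for value in ["12345", "12.345", "12345-6789", ".", ""]:
    print(repr(value), "->", repr(clean_zip(value)))
```

Collapsing "no digits" to a single sentinel means the caller needs only one check instead of separate empty-string and nil comparisons.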

Thanks! I ended up using a combination of csarr and Egor's suggestions to get this:
sZip5 = string.gsub(sOriginalZip,"%W",function(d)return tonumber(d) and d or "" end)
Looks like it is evaluating correctly. Thanks again!

Related

Skipping spaces in Groovy

I'm trying to write a conditional statement where I can skip a specific space then start reading all the characters after it.
I was thinking to use substring but that wouldn't help because substring will only work if I know the exact number of characters I want to skip but in this case, I want to skip a specific space to read characters afterward.
For example:
String text = "ABC DEF W YZ" //number of characters before the spaces are unknown
String test = "A"
if ( test == "A") {
return text (/*escape the first two space and return anything after that*/)
}

You can split your string on " " with tokenize, remove the first N elements from the returned list (where N is the number of spaces you want to skip) and join what's left with " ". Note that tokenize treats consecutive spaces as a single separator.
Supposing your N is 2:
String text = "ABC DEF W YZ" //number of characters before the spaces are unknown
String test = "A"
if ( test == "A") {
    return text.tokenize(" ").drop(2).join(" ")
}
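The same "skip N spaces, keep the rest" idea can be sketched in Python. This is only an illustration in a different language from the answer above; text_after_spaces is a hypothetical name, and returning an empty string when there are fewer than n spaces is an assumption.

```python
def text_after_spaces(text, n):
    """Return everything after the n-th space, or "" if there are fewer than n.

    text_after_spaces is a hypothetical helper used for illustration only.
    """
    # Split at most n times, so the remainder stays intact in the last piece.
    parts = text.split(" ", n)
    return parts[n] if len(parts) > n else ""


print(text_after_spaces("ABC DEF W YZ", 2))
```

Unlike tokenize, split(" ", n) does not merge runs of spaces, so each space counts individually.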

Finding the "difference" between two string texts (Lua example)

I'm trying to find the difference in text between two string values in Lua, and I'm just not quite sure how to do this effectively. I'm not very experienced in working with string patterns, and I'm sure that's my downfall on this one. Here's an example:
-- Original text
local text1 = "hello there"
-- Changed text
local text2 = "hello.there"
-- Finding the alteration of original text with some "pattern"
print(text2:match("pattern"))
In the example above, I'd want to output the text ".", since that's the difference between the two texts. Same goes for cases where the difference could be sensitive to a string pattern, like this:
local text1 = "hello there"
local text2 = "hello()there"
print(text2:match("pattern"))
In this example, I'd want to print "(" since at that point the new string is no longer consistent with the old one.
If anyone has any insight on this, I'd really appreciate it. Sorry I couldn't give more to work with code-wise, I'm just not sure where to begin.

Just iterate over the strings and find the first position where they don't match.
function StringDifference(str1,str2)
    for i = 1,#str1 do --Loop over the characters of str1
        if str1:sub(i,i) ~= str2:sub(i,i) then --If that character is not equal to its counterpart
            return i --Return that index
        end
    end
    return #str1+1 --Fallback: str1 is a prefix of str2, so return the index just past the end of str1
end
print(StringDifference("hello there", "hello.there")) --> 6

local function get_inserted_text(old, new)
    local prv = {}
    for o = 0, #old do
        prv[o] = ""
    end
    for n = 1, #new do
        local nxt = {[0] = new:sub(1, n)}
        local nn = new:sub(n, n)
        for o = 1, #old do
            local result
            if nn == old:sub(o, o) then
                result = prv[o-1]
            else
                result = prv[o]..nn
                if #nxt[o-1] <= #result then
                    result = nxt[o-1]
                end
            end
            nxt[o] = result
        end
        prv = nxt
    end
    return prv[#old]
end
Usage:
print(get_inserted_text("hello there", "hello.there")) --> .
print(get_inserted_text("hello there", "hello()there")) --> ()
print(get_inserted_text("hello there", "hello htere")) --> h
print(get_inserted_text("hello there", "heLlloU theAre")) --> LUA
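A rough Python counterpart can lean on the standard library's difflib instead of a hand-written dynamic program. This is a sketch of a swapped-in technique, not a port of the Lua function above: SequenceMatcher is not guaranteed to find a minimal edit, so results on harder inputs may differ. The name inserted_text is hypothetical.

```python
import difflib


def inserted_text(old, new):
    """Collect the parts of `new` that difflib reports as inserted or replaced.

    inserted_text is a hypothetical helper; it approximates the Lua answer
    above and may differ from it where several alignments are possible.
    """
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    pieces = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            pieces.append(new[j1:j2])
    return "".join(pieces)


print(inserted_text("hello there", "hello.there"))
print(inserted_text("hello there", "hello()there"))
```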

Getting the largest and smallest word at a string

When I run this code the output is (" ", " "), but it should be ("I", "love"), and there are no errors. What should I do to fix it?
sen="I love dogs"
function Longest_word(sen)
x=" "
maxw=" "
minw=" "
minl=1
maxl=length(sen)
p=0
for i=1:length(sen)
if(sen[i]!=" ")
x=[x[1]...,sen[i]...]
else
p=length(x)
if p<min1
minl=p
minw=x
end
if p>maxl
maxl=p
maxw=x
end
x=" "
end
end
return minw,maxw
end

As @David mentioned, another and maybe better solution uses the split function:
function longest_word(sentence)
    sp=split(sentence)
    len=map(length,sp)
    # indmin/indmax were renamed argmin/argmax in Julia 0.7
    return (sp[indmin(len)],sp[indmax(len)])
end
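The split-based approach carries over directly to other languages. As a hedged illustration, here is the same idea in Python; shortest_and_longest is a made-up name, and returning (None, None) for a sentence with no words is an assumption.

```python
def shortest_and_longest(sentence):
    """Return (shortest word, longest word) of a whitespace-separated sentence.

    shortest_and_longest is a hypothetical helper for illustration.
    On ties, min and max both return the first word with that length.
    """
    words = sentence.split()
    if not words:
        return None, None
    return min(words, key=len), max(words, key=len)


print(shortest_and_longest("I love dogs"))
```

Because the words are split up front, the last word is handled like any other, which avoids the end-of-string problem of a character-by-character loop.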

The idea of your code is good, but there are a few mistakes.
You can see what's going wrong by debugging a bit. The easiest way to do this is with @show, which prints out the value of variables. When code doesn't work like you expect, this is the first thing to do -- just ask it what it's doing by printing everything out!
E.g. if you put
if(sen[i]!=" ")
    x=[x[1]...,sen[i]...]
    @show x
and run the function with
Longest_word("I love dogs")
you will see that it is not doing what you want it to do, which (I believe) is add the ith letter to the string x.
Note that the ith letter accessed like sen[i] is a character not a string.
You can try converting it to a string with
string(sen[i])
but this gives a Unicode string, not an ASCII string, in recent versions of Julia.
In fact, it would be better not to iterate over the string using
for i in 1:length(sen)
but iterate over the characters in the string (which will also work if the string is Unicode):
for c in sen
Then you can initialise the string x as
x = UTF8String("")
and update it with
x = string(x, c)
Try out some of these possibilities and see if they help.
Also, you have maxl and minl defined wrong initially -- they should be the other way round. Also, the names of the variables are not very helpful for understanding what should happen. And the strings should be initialised to empty strings, "", not a string with a space, " ".
@daycaster is correct that there seems to be a min1 that should be minl.
However, in fact there is an easier way to solve the problem, using the split function, which divides a string into words.
Let us know if you still have a problem.
Here is a working version following your idea:
function longest_word(sentence)
    x = UTF8String("")
    maxw = ""
    minw = ""
    maxl = 0 # counterintuitive! start the "wrong" way round
    minl = length(sentence)
    for i in 1:length(sentence) # or: for c in sentence
        if sentence[i] != ' ' # or: if c != ' '
            x = string(x, sentence[i]) # or: x = string(x, c)
        else
            p = length(x)
            if p < minl
                minl = p
                minw = x
            end
            if p > maxl
                maxl = p
                maxw = x
            end
            x = ""
        end
    end
    return minw, maxw
end
Note that this function does not work if the longest word is at the end of the string. How could you modify it for this case?

Python Join String to Produce Combinations For All Words in String

If my string is this: 'this is a string', how can I produce all possible combinations by joining each word with its neighboring word?
What this output would look like:
this is a string
thisis a string
thisisa string
thisisastring
thisis astring
this isa string
this isastring
this is astring
What I have tried:
s = 'this is a string'.split()
for i, l in enumerate(s):
    ''.join(s[0:i])+' '.join(s[i:])
This produces:
'this is a string'
'thisis a string'
'thisisa string'
'thisisastring'
I realize I need to change the s[0:i] part because it's statically anchored at 0, but I don't know how to move on to the next word while still including the earlier words in the output.

A simpler way (and 3x faster than the accepted answer) using itertools.product:
import itertools
s = 'this is a string'
# Escape literal % signs, then turn every space into a %s placeholder
s2 = s.replace('%', '%%').replace(' ', '%s')
for i in itertools.product((' ', ''), repeat=s.count(' ')):
    print(s2 % i)

You can also use itertools.product():
import itertools
s = 'this is a string'
words = s.split()
# Each tuple t holds one 0/1 flag per gap: 1 keeps the space, 0 drops it
for t in itertools.product(range(2), repeat=len(words)-1):
    print(''.join([words[i]+t[i]*' ' for i in range(len(t))])+words[-1])

Well, it took me a little longer than I expected... this is actually trickier than I thought :)
The main idea:
The number of spaces is the length of the split list minus 1. In our example there are 3 spaces:
'this is a string'
     ^  ^ ^
We'll take a binary representation of all the options to have/not have each of the spaces, so in our case it'll be:
000
001
010
011
100
101
...
and for each option we'll generate the corresponding sentence, where 111 represents all 3 spaces: 'this is a string' and 000 represents no spaces at all: 'thisisastring'
def binaries(n):
    # All n-digit binary strings, from '00...0' to '11...1' (2 ** n of them)
    res = []
    for x in range(2 ** n):
        tmp = bin(x)
        res.append(tmp.replace('0b', '').zfill(n))
    return res
def generate(arr, bins):
    res = []
    for bits in bins:
        tmp = arr[0]
        i = 1
        for digit in bits:
            # '1' keeps the space before the next word, '0' drops it
            if digit == '1':
                tmp = tmp + " " + arr[i]
            else:
                tmp = tmp + arr[i]
            i += 1
        res.append(tmp)
    return res
def combinations(string):
    s = string.split(' ')
    bins = binaries(len(s) - 1)
    res = generate(s, bins)
    return res
print(combinations('this is a string'))
# ['thisisastring', 'thisisa string', 'thisis astring', 'thisis a string', 'this isastring', 'this isa string', 'this is astring', 'this is a string']
UPDATE:
I now see that Amadan thought of the same idea - kudos for being quicker than me to think about! Great minds think alike ;)

The easiest is to do it recursively.
Terminating condition: Schrödinger join of a single element list is that word.
Recurring condition: say that L is the Schrödinger join of all the words but the first. Then the Schrödinger join of the list consists of all elements from L with the first word directly prepended, and all elements from L with the first word prepended with an intervening space.
(Assuming you are missing thisis astring by accident. If it is deliberate, I am sure I have no idea what the question is :P )
Another, non-recursive way you can do it is to enumerate all numbers from 0 to 2^(number of words - 1) - 1, then use the binary representation of each number as a selector whether or not a space needs to be present. So, for example, the abovementioned thisis astring corresponds to 0b010, for "nospace, space, nospace".
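The recursive description above can be sketched as follows. The function name schrodinger_join simply borrows the answer's own term, and returning an empty list for empty input is an assumption the answer does not cover.

```python
def schrodinger_join(words):
    """Return every way of joining `words` in order, with or without spaces.

    A sketch of the recursive description above; the name is illustrative.
    """
    if not words:
        return []  # assumption: nothing to join yields no results
    if len(words) == 1:
        # Terminating condition: a single word joins to itself.
        return [words[0]]
    rest = schrodinger_join(words[1:])
    first = words[0]
    # Prepend the first word directly, then with an intervening space.
    return [first + r for r in rest] + [first + " " + r for r in rest]


for line in schrodinger_join("this is a string".split()):
    print(line)
```

For n words this yields 2^(n-1) results, the same count as the binary-number approach.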

Insert quoted and unquoted parts of string in table

I've been working on this part of a saycommand system which is supposed to separate parts of a string and put them in a table which is sent to a function, which is queried at the beginning of the string. This would look like, for example, !save 1 or !teleport 0 1, or !tell 5 "a private message".
I would like this string to turn into a table:
[[1 2 word 2 9 'more words' 1 "and more" "1 2 34"]]
(Every non-quoted part of the string gets its own key, and the quoted parts get grouped into a key)
1 = 1
2 = 2
3 = word
4 = 2
5 = 9
6 = more words
7 = 1
8 = and more
9 = 1 2 34
I've tried doing this with Lua pattern, but I'm stuck trying to find out how to capture both quoted and unquoted pieces of the string. I've tried a lot of things, but nothing helped.
My current pattern attempts look like this:
a, d = '1 2 word 2 9 "more words" 1 "and more" "1 2 34"" ', {}
-- previous attempts
--[[
This one captures quotes
a:gsub('(["\'])(.-)%1', function(a, b) table.insert(d, b) end)
This one captures some values and butchered quotes,
which might have to do with spaces in the string
a:gsub('(["%s])(.-)%1', function(a, b) table.insert(d, b) end)
This one captures every value, but doesn't take care of quotes
a:gsub('(%w+)', function(a) table.insert(d, a) end)
This one tries making %s inside of quotes into underscores to
ignore them there, but it doesn't work
a = a:gsub('([%w"\']+)', '%1_')
a:gsub('(["\'_])(.-)%1', function(a, b) table.insert(d, b) end)
a:gsub('([%w_]+)', function(a) table.insert(d, a) end)
This one was a wild attempt at cracking it, but no success
a:gsub('["\']([^"\']-)["\'%s]', function(a) table.insert(d, a) end)
--]]
-- This one adds spaces, which would later be trimmed off, to test
-- whether it helped with the butchered strings, but it doesn't
a = a:gsub('(%w)(%s)(%w)', '%1%2%2%3')
a:gsub('(["\'%s])(.-)%1', function(a, b) table.insert(d, b) end)
for k, v in pairs(d) do
print(k..' = '..v)
end
This would not be needed for simple commands, but a more complex one like !tell 1 2 3 4 5 "a private message sent to five people" does need it, first to check if it's sent to multiple people and next to find out what the message is.
Further down the line I want to add commands like !give 1 2 3 "component:material_iron:weapontype" "food:calories", which is supposed to add two items to three different people, would benefit greatly from such a system.
If this is impossible in Lua pattern, I'll try doing it with for loops and such, but I really feel like I'm missing something obvious. Am I overthinking this?

You cannot process quoted strings with Lua patterns. You need to parse the string explicitly, as in the code below.
function split(s)
    local t={}
    local n=0
    local b,e=0,0
    while true do
        b,e=s:find("%s*",e+1)
        b=e+1
        if b>#s then break end
        n=n+1
        if s:sub(b,b)=="'" then
            b,e=s:find(".-'",b+1)
            t[n]=s:sub(b,e-1)
        elseif s:sub(b,b)=='"' then
            b,e=s:find('.-"',b+1)
            t[n]=s:sub(b,e-1)
        else
            b,e=s:find("%S+",b)
            t[n]=s:sub(b,e)
        end
    end
    return t
end
s=[[1 2 word 2 9 'more words' 1 "and more" "1 2 34"]]
print(s)
t=split(s)
for k,v in ipairs(t) do
    print(k,v)
end
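For readers working outside Lua: Python's standard library already ships a quote-aware splitter, shlex.split, which handles this input. The sketch below is a comparison only, not a replacement for the Lua parser above. Note that shlex follows shell-like quoting rules and raises ValueError on an unterminated quote. The name parse_args is hypothetical.

```python
import shlex


def parse_args(line):
    """Split a command line into tokens, keeping quoted parts together.

    parse_args is a hypothetical wrapper around the standard shlex.split.
    """
    return shlex.split(line)


tokens = parse_args("""1 2 word 2 9 'more words' 1 "and more" "1 2 34" """)
for index, token in enumerate(tokens, start=1):
    print(index, "=", token)
```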

Lua string patterns, and regexes for that matter, generally aren't well suited to parsing that requires varying nesting levels or token count balancing, such as parentheses ( ). But there is another tool available to Lua that's powerful enough to deal with that requirement: LPeg.
The LPeg syntax is a bit archaic and takes some getting used to, so I'll use the LPeg re module instead to make it easier to digest. Keep in mind that anything you can do in one form of the syntax you can also express in the other.
I'll start by defining the grammar for parsing your format description:
local re = require 're'
local cmdgrammar =
[[
    saycmd <- '!' cmd extra
    cmd <- %a%w+
    extra <- (singlequote / doublequote / unquote / .)*
    unquote <- %w+
    singlequote <- "'" (unquote / %s)* "'"
    doublequote <- '"' (unquote / %s)* '"'
]]
Next, compile the grammar and use it to match some of your test examples:
local cmd_parser = re.compile(cmdgrammar)
local saytest =
{
    [[!save 1 2 word 2 9 'more words' 1 "and more" "1 2 34"]],
    [[!tell 5 "a private message"]],
    [[!teleport 0 1]],
    [[!say 'another private message' 42 "foo bar" baz]],
}
There are currently no captures in the grammar so re.match returns the last character position in the string it was able to match up to + 1. That means a successful parse will return the full character count of the string + 1 and therefore is a valid instance of your grammar.
for _, test in ipairs(saytest) do
    assert(cmd_parser:match(test) == #test + 1)
end
Now comes the interesting part. Once you have the grammar working as desired, you can add captures that automatically extract the results you want into a Lua table with relatively little effort. Here's the final grammar spec + table captures:
local cmdgrammar =
[[
    saycmd <- '!' {| {:cmd: cmd :} {:extra: extra :} |}
    cmd <- %a%w+
    extra <- {| (singlequote / doublequote / { unquote } / .)* |}
    unquote <- %w+
    singlequote <- "'" { (unquote / %s)* } "'"
    doublequote <- '"' { (unquote / %s)* } '"'
]]
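Recompile the grammar with re.compile, then run the tests again and dump the match results (dump stands for any table pretty-printer; it is not part of LPeg):
for i, test in ipairs(saytest) do
    print(i .. ':')
    dump(cmd_parser:match(test))
end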
You should get output similar to:
lua say.lua
1:
{
  extra = {
    "1",
    "2",
    "word",
    "2",
    "9",
    "more words",
    "1",
    "and more",
    "1 2 34"
  },
  cmd = "save"
}
2:
{
  extra = {
    "5",
    "a private message"
  },
  cmd = "tell"
}
3:
{
  extra = {
    "0",
    "1"
  },
  cmd = "teleport"
}
4:
{
  extra = {
    "another private message",
    "42",
    "foo bar",
    "baz"
  },
  cmd = "say"
}
