Is it possible to substitute characters according to a list in Lua, like tr in Perl? For example, I would like to replace A with B and B with A (e.g. AABBCC becomes BBAACC).
In Perl, the solution would be $str =~ tr/AB/BA/. Is there a native way of doing this in Lua? If not, I think the best solution would be to iterate through the entire string, since separate substitutions would need a special symbol to distinguish characters that were already substituted from those that weren't.
Edit: my goal was to calculate the reverse complement of a DNA string, as described here.
string.gsub can take a table as the third argument. The table is queried for each match, using the first capture as the key, and the associated value is used as the replacement string. If the value is nil, the match is not changed.
So you can build a helper table like this:
local s = "AABBCC"
local t = {A = "B", B = "A"}
local result = string.gsub(s, "[AB]", t)
print(result)
or this same one-liner:
print((string.gsub("AABBCC", "[AB]", {A = "B", B = "A"})))
Output:
BBAACC
For a single-character pattern like "[AB]", "." works as well, because any character not found in the table is left unchanged (though I doubt it is more efficient). More complicated cases need a more specific pattern.
Here is an example from Programming in Lua: this function substitutes the value of the global variable varname for every occurrence of $varname in a string:
function expand (s)
  return (string.gsub(s, "$(%w+)", _G))
end
The code below will replace each character with a desired mapping (or leave alone if no mapping exists). You could modify the second parameter to string.gsub in tr to be more specific if you know the exact range of characters.
s = "AABBCC"
mappings = {["A"]="B",["B"]="A"}
function tr(s,mappings)
  return string.gsub(s,
    "(.)",
    function(m)
      -- print("found",m,"replace with",mappings[m],mappings[m] or m)
      if mappings[m] == nil then return m else return mappings[m] end
    end
  )
end
print(tr(s,mappings))
Output (with the print line inside the callback uncommented):
henry#henry-pc:~/Desktop$ lua replace.lua
found A replace with B B
found A replace with B B
found B replace with A A
found B replace with A A
found C replace with nil C
found C replace with nil C
BBAACC 6
How can I iterate over a string in Python (get each character from the string, one at a time, each time through a loop)?
As Johannes pointed out,
for c in "string":
    # do something with c
    pass
You can iterate over pretty much anything in Python using the for loop construct.
For example, open("file.txt") returns a file object (and opens the file); iterating over it yields the lines of that file:
with open(filename) as f:
    for line in f:
        # do something with line
        pass
If that seems like magic, well it kinda is, but the idea behind it is really simple.
There's a simple iterator protocol that can be applied to any kind of object to make the for loop work on it.
Simply implement an iterator that defines a next() method (spelled __next__() in Python 3), and implement an __iter__ method on a class to make it iterable. (The __iter__ method should, of course, return an iterator object, that is, an object that defines next().)
See official documentation
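The protocol described above can be sketched as a small class. This is only a minimal illustration, assuming Python 3 (where the method is spelled __next__()); the class name Countdown is hypothetical and exists purely for this example.

```python
class Countdown:
    """Hypothetical iterable that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Raising StopIteration tells the for loop to stop.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for n in Countdown(3):
    print(n)
```

The for loop calls __iter__ once and then calls __next__ repeatedly until StopIteration is raised, which is all the "magic" amounts to.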
If you need access to the index as you iterate through the string, use enumerate():
>>> for i, c in enumerate('test'):
...     print i, c
...
0 t
1 e
2 s
3 t
Even easier:
for c in "test":
    print c
Just to make a more comprehensive answer, the C way of iterating over a string can apply in Python, if you really wanna force a square peg into a round hole.
i = 0
while i < len(str):
    print str[i]
    i += 1
But then again, why do that when strings are inherently iterable?
for i in str:
    print i
You can also do the job with an index-based for loop:
# suppose you have a variable called name
name = "Mr.Suryaa"
for index in range(len(name)):
    print(name[index])  # just like C and C++
The output (each character is printed on its own line) is
M r . S u r y a a
However, range() only produces the sequence of indexes, and a string is itself a sequence, so you can iterate over name directly:
for e in name:
    print(e)
This produces the same result, reads better, and works with any iterable such as a list, tuple, or dictionary (iterating over a dictionary yields its keys).
We have used two built-in functions (BIFs in the Python community):
1) range() - creates the indexes
Example
for i in range(5):
produces 0, 1, 2, 3, 4
2) len() - returns the length of the given string
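To make the comparison concrete, here is a small sketch (assuming Python 3) showing that the index-based loop and direct iteration visit the same characters, and that the same for loop works on other iterables. The names by_index and direct are only illustrative.

```python
name = "Mr.Suryaa"

# Index-based loop, in the style of C.
by_index = [name[i] for i in range(len(name))]

# Direct iteration over the string.
direct = [ch for ch in name]

# Both approaches visit the same characters in the same order.
print(by_index == direct)

# The same for loop works on other iterables, such as a tuple.
for item in ("a", "b"):
    print(item)

# Iterating over a dictionary yields its keys.
for key in {"x": 1, "y": 2}:
    print(key)
```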
If you would like to use a more functional approach to iterating over a string (perhaps to transform it somehow), you can split the string into characters, apply a function to each one, then join the resulting list of characters back into a string.
A string is a sequence of characters, so map will iterate over the string (its second argument), applying the function (its first argument) to each character.
For example, here I use a simple lambda approach since all I want to do is a trivial modification to the character: here, to increment each character value:
>>> ''.join(map(lambda x: chr(ord(x)+1), "HAL"))
'IBM'
or more generally:
>>> ''.join(map(my_function, my_string))
where my_function takes a char value and returns a char value.
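As a sketch of the general form, assuming Python 3, the pattern can be wrapped in a small helper. The names transform and shift_letter are hypothetical and only serve this example.

```python
def shift_letter(ch):
    """Hypothetical helper: advance a character by one code point."""
    return chr(ord(ch) + 1)


def transform(text, func):
    """Apply func to every character of text and join the results."""
    return "".join(map(func, text))


print(transform("HAL", shift_letter))
print(transform("abc", str.upper))
```

Any function that takes one character and returns a string can be passed in, including existing methods such as str.upper.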
Several answers here use range. In Python 2, xrange is generally better because it produces values lazily instead of building a fully instantiated list, which matters when memory is tight or the lengths vary widely. In Python 3, range is already lazy and xrange no longer exists.
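For readers on Python 3, a minimal sketch of the same idea: range there produces its values on demand, so even a very large range can be created cheaply and used to index a string. The variable names are only illustrative.

```python
# A huge range is cheap to create because no list is built.
big = range(10**9)
print(len(big))
print(big[-1])

# Index-based iteration over a string with range.
text = "Hello World!"
for i in range(len(text)):
    print(i, text[i])
```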
You can also do the following:
txt = "Hello World!"
print (*txt, sep='\n')
This does not use an explicit loop; print takes care of the iteration internally.
* unpacks the string into separate arguments, one per character, and passes them to print
sep='\n' ensures that each character is printed on its own line
The output will be:
H
e
l
l
o
W
o
r
l
d
!
If you do need a loop statement, then as others have mentioned, you can use a for loop like this:
for x in txt: print (x)
If you ever run into a situation where you need to get the next character using __next__(), remember to create a string iterator and iterate over that rather than over the original string (a str has no __next__() method).
In this example, when I find a [ character I keep consuming the following characters until I find ], so I need to use __next__();
a plain for loop over the string wouldn't help here.
myString = "'string' 4 '['RP0', 'LC0']' '[3, 4]' '[3, '4']'"
processedInput = ""
word_iterator = myString.__iter__()
for idx, char in enumerate(word_iterator):
    if char == "'":
        continue
    processedInput+=char
    if char == '[':
        next_char=word_iterator.__next__()
        while(next_char != "]"):
            processedInput+=next_char
            next_char=word_iterator.__next__()
        else:
            processedInput+=next_char
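The same technique can be sketched with the built-in iter() and a nested for loop over the shared iterator, which avoids calling __next__() directly. This is a variation under stated assumptions rather than the code above: the function name strip_quotes_outside_brackets is hypothetical, and the sketch assumes brackets are not nested.

```python
def strip_quotes_outside_brackets(text):
    """Hypothetical helper: drop single quotes, but copy [...] spans verbatim."""
    chars = iter(text)
    out = []
    for ch in chars:
        if ch == "'":
            continue
        out.append(ch)
        if ch == "[":
            # Pull characters from the same iterator until the closing bracket.
            for inner in chars:
                out.append(inner)
                if inner == "]":
                    break
    return "".join(out)


print(strip_quotes_outside_brackets("'a' ['b', 'c'] 'd'"))
```

Because both loops draw from the same iterator, the outer loop resumes right after the closing bracket. An unterminated [ simply ends the iteration instead of raising StopIteration.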
I have a problem splitting a string into two parts on a special character.
For example:
12345#data
or
1234567#data
The first part has 5-7 characters and is separated by "#" from the second part, which holds other data (letters, numbers, it doesn't matter what).
I need to store the two parts on either side of # in two variables:
x = 12345
y = data
without "#" character.
I was looking for some Lua string function like splitOn("#") or substring until character, but I haven't found that.
Use string.match and captures.
Try this:
s = "12345#data"
a,b = s:match("(.+)#(.+)")
print(a,b)
See this documentation:
First of all, although Lua does not have a split function in its standard library, it does have string.gmatch, which can be used instead of a split function in many cases. Unlike a split function, string.gmatch takes a pattern to match the non-delimiter text, instead of the delimiters themselves.
It is easily achievable with the help of a negated character class with string.gmatch:
local example = "12345#data"
for i in string.gmatch(example, "[^#]+") do
  print(i)
end
See IDEONE demo
The [^#]+ pattern matches one or more characters other than #, so it effectively "splits" the string on a single-character delimiter.
I would like to have a procedure which makes a local copy b of an input character string a (whose length is not known in advance) in an allocatable array of characters. I have the following code:
program test_copystr
  character(len=6) :: str
  str = 'abc'
  call copystr(str)
contains
  subroutine copystr(a)
    character(len=*), intent(in) :: a
    !> Local variables
    integer :: i
    character, allocatable :: b(:)
    allocate(b(len_trim(a)))
    do i=1, len_trim(a)
      b(i) = a(i:i)
    end do
    print *, b
    b(1:len_trim(a)) = a(1:len_trim(a))
    print *, b
  end subroutine copystr
end program test_copystr
where I'm trying to assign a to b in two different ways. The result is
abc
aaa
I thought that both assignments should yield the same output. Can anyone explain the difference to me? (I am compiling this code with gfortran 5.2.0.)
As you know b is an array of characters while a is a scalar; when the subroutine is called it is a 6-character string. These are different things. The statement
b(1:len_trim(a)) = a(1:len_trim(a))
specifies the array section b(1:3) on the lhs, that is all 3 elements of b, and the substring a(1:3) on the rhs. Now, when assigning a string of length 3 to a single character, such as any element of b, Fortran assigns only the first character of the string.
In this case every element of b is set to the first character of a. It is as if the compiler generates the 3 statements
b(1) = 'abc'
b(2) = 'abc'
b(3) = 'abc'
to implement the array assignment. This is what Fortran's array syntax does with an array on the lhs and a scalar (expression) on the rhs: it broadcasts the scalar to each element of the array.
The first method you use, looping across the elements of b and the characters of a, is the regular way to make an array of characters equivalent to a string. But you could also try transfer; see my answer to this question: Removing whitespace in string
here I have a part of my awk code to parse a file but the output is not 100% what I want.
match($0,/root=[^,]*/){
n=split(substr($0,RSTART+5,RLENGTH-5),N,/:/)
My problem is that I cannot tell exactly what this piece of code is doing ...
Can someone tell me exactly what these two lines do?
EDIT:
I just want to know what the code does so I can fix it myself, so please do not ask something like: how the file you parse looks like? ..
match(s, r [, a])
Returns the position in s where the regular expression r occurs, or 0
if r is not present, and sets the values of RSTART and RLENGTH. Note
that the argument order is the same as for the ~ operator: str ~ re.
If array a is provided, a is cleared and then elements 1 through n are
filled with the portions of s that match the corresponding
parenthesized subexpression in r. The 0'th element of a contains the
portion of s matched by the entire regular expression r. Subscripts
a[n, "start"], and a[n, "length"] provide the starting index in the
string and length respectively, of each matching substring.
substr(s, i [, n])
Returns the at most n-character substring of s starting at i. If n is
omitted, the rest of s is used.
split(s, a [, r])
Splits the string s into the array a on the regular expression r, and
returns the number of fields. If r is omitted, FS is used instead. The
array a is cleared first. Splitting behaves identically to field
splitting, described above.
So when match finds something that matches /root=[^,]*/ in the line ($0) it will return that position (non-zero integers are truth-y for awk) and the action will execute.
The action then uses RSTART and RLENGTH as set by match to get the substring of the line that matched (minus root= because of the +5/-5) and then splits that into the array N on : and saves the number of fields split into n.
That could probably be changed to match($0, /root=([^,]*)/, N) as the pattern (the three-argument form is a gawk extension), and the action could then use N[1], the text captured by the parentheses, instead of the substr call if you wanted.
How do I remove the lines of a string that begin with another string in Lua? For instance, I want to remove every line of the string result that begins with the word <Table. This is the code I've written so far:
for line in result:gmatch"<Table [^\n]*" do line = "" end
string.gmatch is used to get all occurrences of a pattern. To replace a pattern, you need to use string.gsub.
Another problem is that your pattern <Table [^\n]* will match in any line containing the word <Table, not just in lines that begin with it.
Lua patterns don't support a beginning-of-line anchor (^ anchors only at the start of the subject string), but this almost works:
local str = result:gsub("\n<Table [^\n]*", "")
except that it misses a match on the first line. My solution is a second pass that handles the first line:
local str1 = result:gsub("\n<Table [^\n]*", "")
local str2 = str1:gsub("^<Table [^\n]*\n", "")
The LPEG library is perfect
for this kind of task.
Just write a function to create custom line strippers:
local mk_striplines
do
  local lpeg = require "lpeg"
  local P = lpeg.P
  local Cs = lpeg.Cs
  local lpegmatch = lpeg.match
  -- end of line: any of the common line terminators
  local eol = P"\n\r" + P"\r\n" + P"\n" + P"\r"
  local eof = P(-1)
  local linerest = (1 - eol)^1 * (eol + eof) + eol
  mk_striplines = function (pat)
    pat = P (pat)
    local matchline = pat * linerest
    local striplines = Cs (((matchline / "") + linerest)^1)
    return function (str)
      return lpegmatch (striplines, str)
    end
  end
end
Note that the argument to mk_striplines() may be a string or a
pattern.
Thus the result is very flexible:
mk_striplines (P"<Table" + P"</Table>") would create a stripper
that drops lines with two different patterns.
mk_striplines (P"x" * P"y"^0) drops each line starting with an
x followed by any number of y’s -- you get the idea.
Usage example:
local linestripper = mk_striplines "foo"
local test = [[
foo lorem ipsum
bar baz
buzz
foo bar
xyzzy
]]
print (linestripper (test))
The other answers provide good solutions to actually stripping lines from a string, but don't address why your code is failing to do that.
Reformatting for clarity, you wrote:
for line in result:gmatch"<Table [^\n]*" do
  line = ""
end
The first part is a reasonable way to iterate over result and extract all spans of text that begin with <Table and continue up to but not including the next newline character. The iterator returned by gmatch returns a copy of the matching text on each call, and the local variable line holds that copy for the body of the for loop.
Since the matching text is copied to line, changes made to line do not and cannot modify the actual text stored in result.
This is due to a more fundamental property of Lua strings. All strings in Lua are immutable. Once stored, they cannot be changed. Variables holding strings are actually holding a reference into an internal table of garbage-collected immutable strings, which permits only two operations: internalization of a new string, and deletion of an internalized string with no remaining references.
So any approach to editing the content of the string stored in result is going to require the creation of an entirely new string. Where string.gmatch provides an iteration over the content but cannot allow it to be changed, string.gsub provides for creation of a new string where all text matching a pattern has been replaced by something new. But even string.gsub is not changing the immutable source text; it is creating a new immutable string that is a copy of the old with substitutions made.
Using gsub could be as simple as this:
result = result:gsub("<Table [^\n]*", "")
but that will disclose other defects in the pattern itself. First, and most obviously, nothing requires that the pattern match at only the beginning of the line. Second, the pattern does not include the newline, so it will leave the line present but empty.
All of that can be refined by careful and clever use of the pattern library. But it doesn't change the fact that you are starting with XML text and are not handling it with XML aware tools. In that case, any approach based on pattern matching or even regular expressions is likely to end in tears.
result = result:gsub('%f[^\n%z]<Table [^\n]*', '')
The start of this pattern, '%f[^\n%z]', is a frontier pattern which will match any transition from either a newline or a zero character to another character, and for frontier patterns the position before the first character counts as a zero character. In other words, using that prefix allows the rest of the pattern to match at either the first line or any other start-of-line.
Reference: the Lua 5.3 manual, section 6.4.1 on string patterns