How to check if a string is alphanumeric? - string

I can use occursin function, but its haystack argument cannot be a regular expression, which means I have to pass the entire alphanumeric string to it. Is there a neat way of doing this in Julia?

I'm not sure your assumption about occursin is correct:
julia> occursin(r"[a-zA-z]", "ABC123")
true
julia> occursin(r"[a-zA-z]", "123")
false

but its haystack argument cannot be a regular expression, which means I have to pass the entire alphanumeric string to it.
If you mean its needle argument, it can be a Regex, for eg.:
julia> occursin(r"^[[:alnum:]]*$", "adf24asg24y")
true
julia> occursin(r"^[[:alnum:]]*$", "adf24asg2_4y")
false
This checks that the given haystack string is alphanumeric using Unicode-aware character class
[[:alnum:]] which you can think of as equivalent to [a-zA-Z\d], extended to non-English characters too. (As always with Unicode, a "perfect" solution involves more work and complication, but this takes you most of the way.)
If you do mean you want the haystack argument to be a Regex, it's not clear why you'd want that here, and also why "I have to pass the entire alphanumeric string to it" is a bad thing.

As has been noted, you can indeed use regexes with occursin, and it works well. But you can also roll your own version, quite simply:
isalphanumeric(c::AbstractChar) = isletter(c) || ('0' <= c <= '9')
isalphanumeric(str::AbstractString) = all(isalphanumeric, str)

Related

python Using variable in re.search source.error("bad escape %s" % escape, len(escape)) [duplicate]

I want to use input from a user as a regex pattern for a search over some text. It works, but how I can handle cases where user puts characters that have meaning in regex?
For example, the user wants to search for Word (s): regex engine will take the (s) as a group. I want it to treat it like a string "(s)" . I can run replace on user input and replace the ( with \( and the ) with \) but the problem is I will need to do replace for every possible regex symbol.
Do you know some better way ?
Use the re.escape() function for this:
4.2.3 re Module Contents
escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
A simplistic example, search any occurence of the provided string optionally followed by 's', and return the match object.
def simplistic_plural(word, text):
word_or_plural = re.escape(word) + 's?'
return re.match(word_or_plural, text)
You can use re.escape():
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
>>> import re
>>> re.escape('^a.*$')
'\\^a\\.\\*\\$'
If you are using a Python version < 3.7, this will escape non-alphanumerics that are not part of regular expression syntax as well.
If you are using a Python version < 3.7 but >= 3.3, this will escape non-alphanumerics that are not part of regular expression syntax, except for specifically underscore (_).
Unfortunately, re.escape() is not suited for the replacement string:
>>> re.sub('a', re.escape('_'), 'aa')
'\\_\\_'
A solution is to put the replacement in a lambda:
>>> re.sub('a', lambda _: '_', 'aa')
'__'
because the return value of the lambda is treated by re.sub() as a literal string.
Usually escaping the string that you feed into a regex is such that the regex considers those characters literally. Remember usually you type strings into your compuer and the computer insert the specific characters. When you see in your editor \n it's not really a new line until the parser decides it is. It's two characters. Once you pass it through python's print will display it and thus parse it as a new a line but in the text you see in the editor it's likely just the char for backslash followed by n. If you do \r"\n" then python will always interpret it as the raw thing you typed in (as far as I understand). To complicate things further there is another syntax/grammar going on with regexes. The regex parser will interpret the strings it's receives differently than python's print would. I believe this is why we are recommended to pass raw strings like r"(\n+) -- so that the regex receives what you actually typed. However, the regex will receive a parenthesis and won't match it as a literal parenthesis unless you tell it to explicitly using the regex's own syntax rules. For that you need r"(\fun \( x : nat \) :)" here the first parens won't be matched since it's a capture group due to lack of backslashes but the second one will be matched as literal parens.
Thus we usually do re.escape(regex) to escape things we want to be interpreted literally i.e. things that would be usually ignored by the regex paraser e.g. parens, spaces etc. will be escaped. e.g. code I have in my app:
# escapes non-alphanumeric to help match arbitrary literal string, I think the reason this is here is to help differentiate the things escaped from the regex we are inserting in the next line and the literal things we wanted escaped.
__ppt = re.escape(_ppt) # used for e.g. parenthesis ( are not interpreted as was to group this but literally
e.g. see these strings:
_ppt
Out[4]: '(let H : forall x : bool, negb (negb x) = x := fun x : bool =>HEREinHERE)'
__ppt
Out[5]: '\\(let\\ H\\ :\\ forall\\ x\\ :\\ bool,\\ negb\\ \\(negb\\ x\\)\\ =\\ x\\ :=\\ fun\\ x\\ :\\ bool\\ =>HEREinHERE\\)'
print(rf'{_ppt=}')
_ppt='(let H : forall x : bool, negb (negb x) = x := fun x : bool =>HEREinHERE)'
print(rf'{__ppt=}')
__ppt='\\(let\\ H\\ :\\ forall\\ x\\ :\\ bool,\\ negb\\ \\(negb\\ x\\)\\ =\\ x\\ :=\\ fun\\ x\\ :\\ bool\\ =>HEREinHERE\\)'
the double backslashes I believe are there so that the regex receives a literal backslash.
btw, I am surprised it printed double backslashes instead of a single one. If anyone can comment on that it would be appreciated. I'm also curious how to match literal backslashes now in the regex. I assume it's 4 backslashes but I honestly expected only 2 would have been needed due to the raw string r construct.

Remove part of string (regular expressions)

I am a beginner in programming. I have a string for example "test:1" and "test:2". And I want to remove ":1" and ":2" (including :). How can I do it using regular expression?
Hi andrew it's pretty easy. Think of a string as if it is an array of chars (letters) cause it actually IS. If the part of the string you want to delete is allways at the end of the string and allways the same length it goes like this:
var exampleString = 'test:1';
exampleString.length -= 2;
Thats it you just deleted the last two values(letters) of the string(charArray)
If you cant be shure it's allways at the end or the amount of chars to delete you'd to use the version of szymon
There are at least a few ways to do it with Groovy. If you want to stick to regular expression, you can apply expression ^([^:]+) (which means all characters from the beginning of the string until reaching :) to a StringGroovyMethods.find(regexp) method, e.g.
def str = "test:1".find(/^([^:]+)/)
assert str == 'test'
Alternatively you can use good old String.split(String delimiter) method:
def str = "test:1".split(':')[0]
assert str == 'test'

Lua -- match strings including non-letter classes

I'm trying to find exact matches of strings in Lua including, special characters. I want the example below to return that it is an exact match, but because of the - character it returns nil
index = string.find("test-string", "test-string")
returns nil
index = string.find("test-string", "test-")
returns 1
index = string.find("test-string", "test")
also returns 1
How can I get it to do full matching?
- is a pattern operator in a Lua string pattern, so when you say test-string, you're telling find() to match the string test as few times as possible. So what happens is it looks at test-string, sees test in there, and since - isn't an actual minus sign in this case, it's really looking for teststring.
Do as Mike has said and escape it with the % character.
I found this helpful for better understanding patterns.
You can also ask for a plain substring match that ignores magic characters:
string.find("test-string", "test-string",1,true)
you need to escape special characters in the pattern with the % character.
so in this case you are looking for
local index = string.find('test-string', 'test%-string')

Lua plain searching with string.gsub?

With Lua's string.find function, there is an optional fourth argument you can pass to enable plain searching. From the Lua wiki:
The pattern argument also allows more complex searches. See the
PatternsTutorial for more information. We can turn off the pattern
matching feature by using the optional fourth argument plain. plain
takes a boolean value and must be preceeded by index. E.g.,
= string.find("Hello Lua user", "%su") -- find a space character followed by "u"
10 11
= string.find("Hello Lua user", "%su", 1, true) -- turn on plain searches, now not found
nil
Basically, I was wondering how I can accomplish the same plain searching using Lua's string.gsub function.
I expected there to be something in the standard library for this, but there isn't. The solution, then, is to escape the special characters in the pattern so they don't perform their usual functions.
Here's the general idea:
obtain the pattern string
replace any special characters with % followed by it (for example, % becomes %%, [ becomes %[
use this as your search pattern for replacing the text
Here is a simple library function for text replacement:
function string.replace(text, old, new)
local b,e = text:find(old,1,true)
if b==nil then
return text
else
return text:sub(1,b-1) .. new .. text:sub(e+1)
end
end
This function can be called as newtext = text:replace(old,new).
Note that this only replaces the first occurrence of old in text.
Use this function to escape all magic characters (and only those) in your search string.
function escape_magic(s)
return (s:gsub('[%^%$%(%)%%%.%[%]%*%+%-%?]','%%%1'))
end

Modifying a character in a string in Lua

Is there any way to replace a character at position N in a string in Lua.
This is what I've come up with so far:
function replace_char(pos, str, r)
return str:sub(pos, pos - 1) .. r .. str:sub(pos + 1, str:len())
end
str = replace_char(2, "aaaaaa", "X")
print(str)
I can't use gsub either as that would replace every capture, not just the capture at position N.
Strings in Lua are immutable. That means, that any solution that replaces text in a string must end up constructing a new string with the desired content. For the specific case of replacing a single character with some other content, you will need to split the original string into a prefix part and a postfix part, and concatenate them back together around the new content.
This variation on your code:
function replace_char(pos, str, r)
return str:sub(1, pos-1) .. r .. str:sub(pos+1)
end
is the most direct translation to straightforward Lua. It is probably fast enough for most purposes. I've fixed the bug that the prefix should be the first pos-1 chars, and taken advantage of the fact that if the last argument to string.sub is missing it is assumed to be -1 which is equivalent to the end of the string.
But do note that it creates a number of temporary strings that will hang around in the string store until garbage collection eats them. The temporaries for the prefix and postfix can't be avoided in any solution. But this also has to create a temporary for the first .. operator to be consumed by the second.
It is possible that one of two alternate approaches could be faster. The first is the solution offered by PaĆ­lo Ebermann, but with one small tweak:
function replace_char2(pos, str, r)
return ("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1))
end
This uses string.format to do the assembly of the result in the hopes that it can guess the final buffer size without needing extra temporary objects.
But do beware that string.format is likely to have issues with any \0 characters in any string that it passes through its %s format. Specifically, since it is implemented in terms of standard C's sprintf() function, it would be reasonable to expect it to terminate the substituted string at the first occurrence of \0. (Noted by user Delusional Logic in a comment.)
A third alternative that comes to mind is this:
function replace_char3(pos, str, r)
return table.concat{str:sub(1,pos-1), r, str:sub(pos+1)}
end
table.concat efficiently concatenates a list of strings into a final result. It has an optional second argument which is text to insert between the strings, which defaults to "" which suits our purpose here.
My guess is that unless your strings are huge and you do this substitution frequently, you won't see any practical performance differences between these methods. However, I've been surprised before, so profile your application to verify there is a bottleneck, and benchmark potential solutions carefully.
You should use pos inside your function instead of literal 1 and 3, but apart from this it looks good. Since Lua strings are immutable you can't really do much better than this.
Maybe
"%s%s%s":format(str:sub(1,pos-1), r, str:sub(pos+1, str:len())
is more efficient than the .. operator, but I doubt it - if it turns out to be a bottleneck, measure it (and then decide to implement this replacement function in C).
With luajit, you can use the FFI library to cast the string to a list of unsigned charts:
local ffi = require 'ffi'
txt = 'test'
ptr = ffi.cast('uint8_t*', txt)
ptr[1] = string.byte('o')

Resources