My code is right but Lua replace (:gsub) isn't working

Hello, I am trying to replace a specific piece of text with "" and my code isn't working. I just don't know why it isn't working.
b = 'Just testing.<script>window.location.replace("http://google.com");</script>'
print(b)
b = b:gsub('<script>window.location.replace("http://google.com");</script>', "")
print(b)
out 1: Just testing.window.location.replace("http://google.com");
out 2: Just testing.window.location.replace("http://google.com");
I tried b = string.gsub(b, '<script>window.location.replace("http://google.com");</script>', "") too, but it didn't work either.
I am working in FiveM

You need to escape ( and ): in a Lua pattern they are recognized as special (magic) characters. You can escape them using %.
b = 'Just testing.<script>window.location.replace("http://google.com");</script>'
print(b)
b = b:gsub('<script>window.location.replace%("http://google.com"%);</script>', "")
print(b)
For more information on Lua patterns you can look at these resources:
Understanding Lua Patterns
20.1 – Pattern-Matching Functions
20.2 – Patterns

Related

Remove all spaces between Chinese characters while keeping necessary spaces for English in Python regex

Let's say my dataframe has a column mixed with English and Chinese words or characters. I would like to remove all the whitespace between Chinese characters, and otherwise, if the words are English, keep only one space between them:
I have found a solution for removing extra spaces between English words from here:
import re
import pandas as pd
s = pd.Series(['V e r y calm', 'Keen and a n a l y t i c a l',
               'R a s h and careless', 'Always joyful', '你 好', '黑 石 公 司', 'FAN STUD1O', 'beauty face 店 铺'])
Code:
regex = re.compile('(?<![a-zA-Z]{2})(?<=[a-zA-Z]{1}) +(?=[a-zA-Z] |.$)')
s.str.replace(regex, '')
Out:
Out[87]:
0 Very calm
1 Keen and analytical
2 Rash and careless
3 Always joyful
4 你 好
5 黑 石 公 司
dtype: object
But as you can see, it works for English but didn't remove the spaces between Chinese characters. How could I get the expected result as follows:
Out[87]:
0 Very calm
1 Keen and analytical
2 Rash and careless
3 Always joyful
4 你好
5 黑石公司
dtype: object
Reference: Remove all spaces between Chinese words with regex
You could use the Chinese (well, CJK) Unicode property \p{script=Han} or \p{Han}.
However, this only works if the regex engine supports UTS#18 Unicode regular expressions. The default Python re module does not but you can use the alternative (much improved) regex engine:
import regex as re
rex = r"(?<![a-zA-Z]{2})(?<=[a-zA-Z]{1})[ ]+(?=[a-zA-Z] |.$)|(?<=\p{Han}) +"
test_str = ("V e r y calm\n"
"Keen and a n a l y t i c a l\n"
"R a s h and careless\n"
"Always joyful\n"
"你 好\n"
"黑 石 公 司")
result = re.sub(rex, "", test_str, 0, re.MULTILINE | re.UNICODE)
Results in
Very calm
Keen and analytical
Rash and careless
Always joyful
你好
黑石公司
Online Demo (the demo is using PCRE for demonstration purposes only)
This regex should get you what you want. See the full code snippet at the bottom.
regex = re.compile(
    "((?<![a-zA-Z]{2})(?<=[a-zA-Z]{1})\s+(?=[a-zA-Z]\s|.$)|(?<=[\u4e00-\u9fff]{1})\s+)",
    re.UNICODE,
)
I made the following edits to your regex above:
Right now, the regex basically matches all spaces that appear after a single-letter word and before another single-letter word.
I added a part at the end of the regex that selects all spaces after a Chinese character (I used the Unicode range [\u4e00-\u9fff], which would cover Japanese and Korean as well).
I changed the spaces in the regex to the whitespace character class \s so we could catch other input like tabs.
I also added the re.UNICODE flag so that \s would cover unicode spaces as well.
import re
import pandas as pd

s = pd.Series(
    [
        "V e r y calm",
        "Keen and a n a l y t i c a l",
        "R a s h and careless",
        "Always joyful",
        "你 好",
        "黑 石 公 司",
    ]
)
regex = re.compile(
    "((?<![a-zA-Z]{2})(?<=[a-zA-Z]{1})\s+(?=[a-zA-Z]\s|.$)|(?<=[\u4e00-\u9fff]{1})\s+)",
    re.UNICODE,
)
s.str.replace(regex, "")
Output:
0 Very calm
1 Keen and analytical
2 Rash and careless
3 Always joyful
4 你好
5 黑石公司
dtype: object
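As a hedged side note (this depends on your pandas version): recent pandas releases treat the pattern passed to Series.str.replace as a literal string by default, so a compiled regex like the one above needs regex=True:
s.str.replace(regex, "", regex=True)  # regex=True is required for a regex pattern on newer pandas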
Use word boundaries \b in lookarounds:
(?<=\b\w\b) +(?=\b\w\b)
This matches spaces between solitary (bounded by word boundaries) "word characters", which includes Chinese characters.
Before Python 3 (and in Java, for example), \w only matches English letters, so you would need to add the Unicode flag (?u) to the front of the regex.
import re

s = ['V e r y calm', 'Keen and a n a l y t i c a l',
     'R a s h and careless', 'Always joyful', '你 好', '黑 石 公 司']
regex = r'(?<=\b\w\b) +(?=\b\w\b)'
res = [re.sub(regex, '', line) for line in s]
print(res)
Output:
['Very calm', 'Keen and analytical', 'Rash and careless', 'Always joyful', '你好', '黑石公司']
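Since the original question works with a pandas Series, here is a minimal sketch (assuming pandas is available) of applying the same word-boundary pattern with Series.str.replace, passing regex=True so the pattern is treated as a regular expression:
import pandas as pd

s = pd.Series(['V e r y calm', 'Keen and a n a l y t i c a l',
               'R a s h and careless', 'Always joyful', '你 好', '黑 石 公 司'])

# \w matches Chinese characters in Python 3, so the same pattern works on the Series.
print(s.str.replace(r'(?<=\b\w\b) +(?=\b\w\b)', '', regex=True))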

What pattern (or loops and patterns) can find all possible characters in a word and keep the sequence of the found ones in Lua

I want to capture each found character in any word
str = "ABC"
str:match("(A?)(B?)(C?)")
As you can see, I don't know which characters str contains, but I want to see them if they are found.
but what if
str = "yyyyAyyyyByyyyCyyyy"
print( str:match(?) ) -- like "(A?).-(B?).-(C?)"
--> A, B, C
str = "yyCyyAyyyyByyyyyyyy"
print( str:match(?) )
--> A, B
str = "yyCyyyyyyyByyyyyCyyy"
print( str:match(?) )
--> B, C
The word can be in any variation, just any. How do I do that with a pattern, or with a loop if patterns are useless? I am dumb, just give me working code =)

How to tell Python 3 not to meddle with newlines?

I have the following code in Python 3:
st = "a" + '\r\n\r\n\r\n' + "b"
print( st )
The output is the following:
I do not want Python to add a 'CR' for me - I need to be in control. Is there anything I can do about it?
The built-in function repr() will return the string without applying the newline formatting.
https://docs.python.org/3/library/functions.html#repr
>>> st = "a" + '\r\n\r\n\r\n' + "b"
>>> print(repr(st))
'a\r\n\r\n\r\nb'
Alternatively, you can use a raw string, as demonstrated below.
https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
>>> print("a" + r'\r\n\r\n' + "b")
a\r\n\r\nb
Except for the last one, all of those characters are there just because you are telling Python to write them (\r for CR and \n for LF). If you are referring to the line ending that print appends, you can control it with the end argument:
print(st, end="\n")
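For example, a minimal sketch (assuming the unwanted character is the newline print appends by default): pass end="" to suppress it, or write the string directly.
import sys

st = "a" + '\r\n\r\n\r\n' + "b"
print(st, end="")     # print adds end="\n" by default; end="" suppresses it
sys.stdout.write(st)  # or write the string verbatim, with nothing added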

How would you solve the letter changer in Julia?

I found this challenge:
Using your language, have the function LetterChanges(str) take the str parameter being passed and modify it using the following algorithm. Replace every letter in the string with the letter following it in the alphabet (ie. c becomes d, z becomes a). Then capitalize every vowel in this new string (a, e, i, o, u) and finally return this modified string.
I am new to Julia, and I was challenging myself with this exercise. I found it very hard in Julia and could not find a solution.
I tried to solve it in the way shown below, but I got an error: the x value is not defined.
How would you solve this?
function LetterChanges(stringis::AbstractString)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    vohels = "aeiou"
    for Char(x) in split(stringis, "")
        if x == 'z'
            x = 'a'
        elseif x in vohels
            uppercase(x)
        else
            Int(x)+1
            Char(x)
            println(x)
        end
    end
end
Thank you
As a side note:
The proposed solution works properly. However, if you need high performance (which you probably do not, given the source of your problem), it is more efficient to use a string builder:
function LetterChanges2(str::AbstractString)
    v = Set("aeiou")
    #sprint(sizehint=sizeof(str)) do io # use on Julia 0.7 - new keyword argument
    sprint() do io # use on Julia 0.6.2
        for c in str
            c = c == 'z' ? 'a' : c+1 # we assume that we got only letters from 'a':'z'
            print(io, c in v ? uppercase(c) : c)
        end
    end
end
it is over 10x faster than the above.
EDIT: for Julia 0.7 this is a bit faster:
function LetterChanges2(str::AbstractString)
    v = BitSet(collect(Int,"aeiouy"))
    sprint(sizehint=sizeof(str)) do io # use on Julia 0.7 - new keyword argument
        for c in str
            c = c == 'z' ? 'a' : c+1 # we assume that we got only letters from 'a':'z'
            write(io, Int(c) in v ? uppercase(c) : c)
        end
    end
end
There is a logic error. It says "Replace every letter in the string with the letter following it in the alphabet. Then capitalize every vowel in this new string". Your code checks if it is a vowel and then either capitalizes it or replaces it. That's different behavior. You have to replace first and then check whether it is a vowel.
You are replacing 'a' by 'Z'. You should be replacing 'z' by 'a'
The function split(stringis, "") returns an array of strings. You can't store these strings in Char(x). You have to store them in x, and then you can transform these strings to chars with c = x[1].
After transforming a char you have to store it in the variable: c = uppercase(c)
You don't need to transform a char into int. You can add a number to a char: c = c + 1
You have to store the new characters in a string and return them.
function LetterChanges(stringis::AbstractString)
    vowels = "aeiou"
    str = ""
    for x in split(stringis, "")
        c = x[1]                    # transform the one-character string to a char
        c = c == 'z' ? 'a' : c + 1  # first replace the letter with the next one
        if c in vowels              # then check whether the new letter is a vowel
            c = uppercase(c)
        end
        str = "$str$c"
    end
    return str
end
Here's another version that is a bit faster than #BogumilKaminski's answer on version 0.6, but that might be different on 0.7. On the other hand, it might be a little less intimidating than the do-block magic ;)
function changeletters(str::String)
    vowels = "aeiouy"
    carr = Vector{Char}(length(str))
    i = 0
    for c in str
        newchar = c == 'z' ? 'a' : c + 1
        carr[i+=1] = newchar in vowels ? uppercase(newchar) : newchar
    end
    return String(carr)
end
At the risk of being accused of cheating, this is a dictionary-based approach:
function change_letters(s::String)::String
    k = collect('a':'z')
    v = vcat(collect('b':'z'), 'A')
    d = Dict{Char, Char}(zip(k, v))
    for c in Set("eiou")
        d[c - 1] = uppercase(d[c - 1])
    end
    b = IOBuffer()
    for c in s
        print(b, d[c])
    end
    return String(take!(b))
end
It seems to compare well in speed terms with the other Julia 0.6 methods for long strings (e.g. 100,000 characters). There's a bit of unnecessary overhead in constructing the dictionary which is noticeable on small strings, but I'm far too lazy to type out the 'a'=>'b' construction long-hand!

Automatic acronyms of strings in R

Long strings in plots aren't always attractive. What's the shortest way of making an acronym in R? E.g., "Hello world" to "HW", and preferably to have unique acronyms.
There's the function abbreviate, but it just removes some letters from the phrase instead of taking the first letter of each word.
An easy way would be to use a combination of strsplit, substr, and make.unique.
Here's an example function that can be written:
makeInitials <- function(charVec) {
  make.unique(vapply(strsplit(toupper(charVec), " "),
                     function(x) paste(substr(x, 1, 1), collapse = ""),
                     vector("character", 1L)))
}
Test it out:
X <- c("Hello World", "Home Work", "holidays with children", "Hello Europe")
makeInitials(X)
# [1] "HW" "HW.1" "HWC" "HE"
That said, I do think that abbreviate should suffice, if you use some of its arguments:
abbreviate(X, minlength=1)
# Hello World Home Work holidays with children Hello Europe
# "HlW" "HmW" "hwc" "HE"
Using regex you can do the following. The regex pattern ((?<=\\s).|^.) matches any character preceded by a space, or the first character of the string. Then we just paste the resulting vectors using the collapse argument to get a first-letter-based acronym. And as Ananda suggested, if you want to make them unique, pass the result through make.unique.
X <- c("Hello World", "Home Work", "holidays with children")
sapply(regmatches(X, gregexpr(pattern = "((?<=\\s).|^.)", text = X, perl = T)), paste, collapse = ".")
## [1] "H.W" "H.W" "h.w.c"
# If you want to make unique
make.unique(sapply(regmatches(X, gregexpr(pattern = "((?<=\\s).|^.)", text = X, perl = T)), paste, collapse = "."))
## [1] "H.W" "H.W.1" "h.w.c"
