Is there a way to strip icon (emoji) characters from a string automatically?
input: This is String 💋👌✅this is string✅✍️string✍️✔️
desired output: This is String this is stringstring
str.replace('💋👌✅', '') is not an option because the icon characters differ from string to string and are not known in advance.
Try this:
import re
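# Symbols and dingbats, pictographs and emoticons, transport symbols,
# plus U+FE0F, the invisible variation selector that follows many emoji.
RE_EMOJI = re.compile('[\U00002600-\U000027BF\U0001F300-\U0001F64F\U0001F680-\U0001F6FF\uFE0F]')

def strip_emoji(text):
    return RE_EMOJI.sub('', text)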
print(strip_emoji('This is String 💋👌✅this is string✅✍️string✍️✔️'))
Consider using the re module in Python to replace characters that you don't want. Something like:
import re
re.sub(r'[^a-zA-Z\s]', '', my_string)  # keeps only ASCII letters and whitespace
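A range-based pattern has to be extended whenever the emoji of interest fall outside the listed blocks. As an alternative sketch (not taken from the answers above), characters can be filtered by Unicode category with the standard unicodedata module. The name strip_symbols and the choice of category are assumptions for illustration: 'So' (Symbol, other) covers the characters in the example input, but it also removes non-emoji symbols such as ©.

```python
import unicodedata

# Invisible characters that often accompany emoji:
# U+FE0F (variation selector-16) and U+200D (zero width joiner).
_EMOJI_GLUE = {"\ufe0f", "\u200d"}

def strip_symbols(text):
    """Drop characters in category 'So' plus the emoji glue characters."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "So" and ch not in _EMOJI_GLUE
    )

# The example input, written with escapes so the hidden U+FE0F is visible.
sample = (
    "This is String \U0001F48B\U0001F44C\u2705this is string"
    "\u2705\u270D\uFE0Fstring\u270D\uFE0F\u2714\uFE0F"
)
print(strip_symbols(sample))
```

Unlike the letters-only pattern, this keeps digits, punctuation, and non-ASCII letters.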
So I have this string:
'm(1,2),m(4,3)'
How can I split it to get a list that contains only two elements:
['m(1,2)', 'm(4,3)']
I can't use str.split(',') because it splits on every comma and gives me something like this:
['m(1', '2)', 'm(4', '3)']
Can you help me?
You can try regex:
import re
regex = re.compile(r"m\([0-9],[0-9]\)")  # raw string, so \( and \) reach the regex engine unchanged
s = "m(1,2),m(4,3)"
regex.findall(s)
should yield:
['m(1,2)', 'm(4,3)']
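If the goal is to split rather than to match, a lookahead can restrict the split to the commas that separate items. The sketch below assumes every item starts with "m("; the second call is a findall variant that also accepts numbers with more than one digit, and its input string is made up for illustration.

```python
import re

s = "m(1,2),m(4,3)"

# Split only on commas that are immediately followed by "m(".
parts = re.split(r",(?=m\()", s)
print(parts)

# findall variant that accepts numbers with more than one digit.
items = re.findall(r"m\(\d+,\d+\)", "m(10,2),m(4,33)")
print(items)
```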
I have a string and need to extract only the digits from it.
url = "www.mylocalurl.com/edit/1987"
Now from that string, I need to get 1987 only.
I have been trying this approach,
id = [int(i) for i in url.split() if i.isdigit()]
But I am only getting an empty list, [].
You can use a regex to get the digits alone as a list.
import re
url = "www.mylocalurl.com/edit/1987"
digit = re.findall(r'\d+', url)
output:
['1987']
Replace all non-digits with blank (effectively "deleting" them):
import re
num = re.sub(r'\D', '', url)
You get an empty list because, by default, the .split() method splits on whitespace. The URL contains no whitespace, so the result is a single element, the whole URL, and that element fails the isdigit() test. What you can do instead is capture the digits with a regex. For example:
import re
url = "www.mylocalurl.com/edit/1987"
regex = r'(\d+)'
numbers = re.search(regex, url)
captured = numbers.groups()[0]
If you do not know what regular expressions are, the code says the following: search through the url using the pattern r'(\d+)', which captures a run of one or more digits. captured then holds the first captured group, which is the string '1987'.
If you don't want to use a regex, you can still use .split(), but this time with / as the separator, for example url.split('/'), and take the last element of the result.
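A minimal sketch of the split-based approach, assuming the number is always the last path segment. The variable names last_segment and record_id are illustrative and not part of the question.

```python
url = "www.mylocalurl.com/edit/1987"

# rsplit with maxsplit=1 splits once, from the right, at the last "/".
last_segment = url.rsplit("/", 1)[-1]

# Convert only if the segment consists purely of digits; otherwise keep None.
record_id = int(last_segment) if last_segment.isdigit() else None
print(record_id)
```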
I am working with a text in which several words are hyphenated across line breaks.
A typical string looks like this:
"this good pe-
riod has"
I tried:
my_string.replace('-'+"\r","")
However, it is not working.
I would like to get
"this good period has"
Have you tried this?
import re
text = """this good pe-
riod has"""
print(re.sub(r"-\s+", '', text))
# this good period has
After you match -, you should match the newline \n :
my_string = """this good pe-
riod has"""
print(my_string.replace("-\n",""))
# this good period has
It depends how your string ends, you could also use my_string.replace('-\r\n', '') or an optional carriage return using re.sub and -(?:\r?\n|\r)
If there has to be a word character before and after, instead of removing all the hyphens at the end of the line, you could use lookarounds:
(?<=\w)-\r?\n(?=\w)
For example
import re
regex = r"(?<=\w)-\r?\n(?=\w)"
my_string = """this good pe-
riod has"""
print(re.sub(regex, "", my_string))
Output
this good period has
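The variants above can be combined into one helper that copes with all three line-ending styles. This is a sketch under the assumption that a hyphen between two word characters at a line end always marks a split word; genuinely hyphenated compounds that happen to break at the hyphen would be joined too. The names LINE_END_HYPHEN and join_hyphenated are made up for illustration.

```python
import re

# A hyphen between word characters, followed by \r\n, \n, or \r.
LINE_END_HYPHEN = re.compile(r"(?<=\w)-(?:\r?\n|\r)(?=\w)")

def join_hyphenated(text):
    """Remove end-of-line hyphens together with the line break."""
    return LINE_END_HYPHEN.sub("", text)

for newline in ("\n", "\r\n", "\r"):
    print(join_hyphenated("this good pe-" + newline + "riod has"))
```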
I recently came across Python f-strings. Suppose there is a long string like this:
string = 'L4qrGFHK8eDA9Vy05gNH7inxfaVpPZH3i9pRdWScalA0pIGHKGqUEXejplKiYaCNizbiKH72LoQaoz1pQH9caDfAX5xtfQAZEri7QGkvxMAJWXsjPEPLQhTtTvvhDR1tMM9zX8Dd0l15bBW1Q3VQReOsbP5AQmMOK9GV0WYPZ015sg7tg8JKOs7hFJfD8bdpUQgWbGrxSdS95PnBTf4P2nTWWLrWzo3DQNyXHs29R6MZ92qqfPLGL8SSQNchWyo4V9NoHdAHDo5TdPf6VmNQaQAl9HKLVawTTg379plHr81YYEoojzstCSPh3jAy9W4dmjTLrBUxzA9tK5UlHKMGx7IYieNGfXBKTaCegdJOUubZPajkp0KY8OcpHxlaVFVdIPi58n6VH7evAomB'
I want to print this string like this:
L4qrGFHK8eDA9Vy05gNH7inxfaVpPZH3i9pRdWSc
alA0pIGHKGqUEXejplKiYaCNizbiKH72LoQaoz1p
QH9caDfAX5xtfQAZEri7QGkvxMAJWXsjPEPLQhTt
TvvhDR1tMM9zX8Dd0l15bBW1Q3VQReOsbP5AQmMO
K9GV0WYPZ015sg7tg8JKOs7hFJfD8bdpUQgWbGrx
SdS95PnBTf4P2nTWWLrWzo3DQNyXHs29R6MZ92qq
fPLGL8SSQNchWyo4V9NoHdAHDo5TdPf6VmNQaQAl
9HKLVawTTg379plHr81YYEoojzstCSPh3jAy9W4d
mjTLrBUxzA9tK5UlHKMGx7IYieNGfXBKTaCegdJO
UubZPajkp0KY8OcpHxlaVFVdIPi58n6VH7evAomB
And I want to use f-strings. I have tried left alignment like this:
print(f"{string}:<{40}")
But it is not working.
This is what you use textwrap for:
import textwrap
string = 'L4qrGFHK8eDA9Vy05gNH7inxfaVpPZH3i9pRdWScalA0pIGHKGqUEXejplKiYaCNizbiKH72LoQaoz1pQH9caDfAX5xtfQAZEri7QGkvxMAJWXsjPEPLQhTtTvvhDR1tMM9zX8Dd0l15bBW1Q3VQReOsbP5AQmMOK9GV0WYPZ015sg7tg8JKOs7hFJfD8bdpUQgWbGrxSdS95PnBTf4P2nTWWLrWzo3DQNyXHs29R6MZ92qqfPLGL8SSQNchWyo4V9NoHdAHDo5TdPf6VmNQaQAl9HKLVawTTg379plHr81YYEoojzstCSPh3jAy9W4dmjTLrBUxzA9tK5UlHKMGx7IYieNGfXBKTaCegdJOUubZPajkp0KY8OcpHxlaVFVdIPi58n6VH7evAomB'
# print("{string}:<40")
print(textwrap.fill(string, 40))
L4qrGFHK8eDA9Vy05gNH7inxfaVpPZH3i9pRdWSc
alA0pIGHKGqUEXejplKiYaCNizbiKH72LoQaoz1p
QH9caDfAX5xtfQAZEri7QGkvxMAJWXsjPEPLQhTt
TvvhDR1tMM9zX8Dd0l15bBW1Q3VQReOsbP5AQmMO
K9GV0WYPZ015sg7tg8JKOs7hFJfD8bdpUQgWbGrx
SdS95PnBTf4P2nTWWLrWzo3DQNyXHs29R6MZ92qq
fPLGL8SSQNchWyo4V9NoHdAHDo5TdPf6VmNQaQAl
9HKLVawTTg379plHr81YYEoojzstCSPh3jAy9W4d
mjTLrBUxzA9tK5UlHKMGx7IYieNGfXBKTaCegdJO
UubZPajkp0KY8OcpHxlaVFVdIPi58n6VH7evAomB
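As for the f-string attempt: a format spec such as <40 only pads a value to a minimum width and never inserts line breaks, and in the attempt it sits outside the braces, so it is printed literally. If f-strings are still wanted, the wrapping has to come from slicing. The sketch below uses a shortened stand-in string, which is hypothetical data rather than the string from the question.

```python
width = 40
# Shortened stand-in for the long string in the question (hypothetical data).
string = "abcdefghij" * 9  # 90 characters

# Slice the string into consecutive chunks of at most `width` characters.
chunks = [string[i:i + width] for i in range(0, len(string), width)]

for chunk in chunks:
    # The f-string only interpolates the chunk; the wrapping comes from the slicing.
    print(f"{chunk}")
```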
I have this string in var1:
var1 = '$a=1%7Cscroll%20on%20%22Page%3A%20Generator-Sets-Construction%3Fid%3Dci%26s%3DY2l8Tj00Mjk0NzQ4MDY5KzQyOTQ5NjM4OTY%3D%22%7C-%7Cscroll%7C1443616500011%7C1443616500586%7C3774$fId=16440287_806$rId=RID_-62268720$rpId=1762047089$domR=1443616443684$time=1443616500588'
How can I convert the contents of the string into readable text, i.e. text that is not URL-encoded?
After some research, this is the code I tried, but it leaves the URL-encoded sequences such as %20 in place.
import html
print(html.unescape('$a=1%7Cscroll%20on%20%22Page%3A%20Generator-Sets- Construction%3Fid%3Dci%26s%3DY2l8Tj00Mjk0NzQ4MDY5KzQyOTQ5NjM4OTY%3D%22%7C-%7Cscroll%7C1443616500011%7C1443616500586%7C3774$fId=16440287_806$rId=RID_-62268720$rpId=1762047089$domR=1443616443684$time=1443616500588'))
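Any help is appreciated, including a pointer to an existing module that does this.
What you need is URL unquoting of the parameter string, not HTML unescaping. html.unescape only converts HTML entities such as &amp;, so percent-encoded sequences pass through unchanged. The following should work: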
import urllib.parse
print(urllib.parse.unquote('$a=1%7Cscroll%20on%20%22Page%3A%20Generator-Sets- Construction%3Fid%3Dci%26s%3DY2l8Tj00Mjk0NzQ4MDY5KzQyOTQ5NjM4OTY%3D%22%7C-%7Cscroll%7C1443616500011%7C1443616500586%7C3774$fId=16440287_806$rId=RID_-62268720$rpId=1762047089$domR=1443616443684$time=1443616500588'))
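If the individual fields are needed rather than one decoded string, it may be safer to split first and decode afterwards. The sketch below assumes the "$key=value" layout seen in var1 and uses a shortened, hypothetical input; the names raw and fields are illustrative.

```python
from urllib.parse import unquote

# Shortened, hypothetical input in the same "$key=value" layout as var1.
raw = "$a=1%7Cscroll%20on%20%22Page%22$fId=16440287_806"

fields = {}
for part in raw.split("$"):
    if not part:
        continue  # skip the empty piece before the leading "$"
    key, _, value = part.partition("=")
    # Decode after splitting so a decoded "$" or "=" is never mistaken for a separator.
    fields[key] = unquote(value)

print(fields)
```

Decoding last matters for the original string, where %3D decodes to "=" inside a value.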