I have a string, and I need to extract only the digits from it.
url = "www.mylocalurl.com/edit/1987"
Now from that string, I need to get 1987 only.
I have been trying this approach,
id = [int(i) for i in url.split() if i.isdigit()]
But I only get an empty list, [].
You can use regex and get the digit alone in the list.
import re
url = "www.mylocalurl.com/edit/1987"
digit = re.findall(r'\d+', url)
output:
['1987']
Replace all non-digits with blank (effectively "deleting" them):
import re
num = re.sub(r'\D', '', url)
You aren't getting anything because, by default, .split() splits the string on whitespace. Your URL contains no whitespace, so the result is a single item (the whole URL), and .isdigit() rejects it because it is not made up only of digits. What you can do instead is use a regex capture group. For example:
import re
url = "www.mylocalurl.com/edit/1987"
regex = r'(\d+)'
numbers = re.search(regex, url)
captured = numbers.groups()[0]
If you do not know what regular expressions are, the code is basically saying: using the regex r'(\d+)', which means "capture one or more consecutive digits", search through the url. captured then holds the first captured group, which is '1987'.
If you don't want to use a regex, you can still use .split(), but this time pass / as the separator, for example url.split('/').
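A minimal sketch of the split-based route. It assumes the ID is always the last path segment, which the question only shows for one URL; the name record_id is just illustrative.

```python
url = "www.mylocalurl.com/edit/1987"

# Split on "/" and take the last segment of the path.
last_segment = url.split("/")[-1]

# Convert only if the segment consists purely of digits.
record_id = int(last_segment) if last_segment.isdigit() else None
print(record_id)
```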
Hi I am trying to create a regular expression with the rules:
the portion before '.com' or '.edu' can have at most 10 letters
if this portion does not exist, then it should only return 'com'
For example,
'stack.com' is valid
'stackoverflow.com' is not valid as it has more than 10 letters before .com
'.com' is not valid while 'com' is valid
Here is what I have so far:
regex = r'^([A-Za-z]{,10}\.)?(com|edu)'
re.match(regex, 'com')
I am trying to group the portion before (com|edu) together, so that if it does not exist, then the . will also not be there.
Given the condition ".com is not valid while com is valid", I think your expression is the right one and you just have to do some processing afterwards:
import re
full_string = """stack.com
stackoverflow.com
.com
foo.edu
com
bar
sitcom
sit.com"""
regex = r'^([A-Za-z]{,10}\.)?(com|edu)$'
for base, domain in re.findall(regex, full_string, re.MULTILINE):
    if base not in (".", ""):
        print(base.strip("."))
    else:  # nothing before com/edu
        print(domain)
Edit: if you want to completely exclude .com (and not change it to com) you can still go with:
regex = r'^(?:[A-Za-z]{1,10}\.)?(?:com|edu)$'
print(re.findall(regex, full_string, re.MULTILINE))
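If the goal is to validate one candidate at a time rather than scan a multi-line string, something like the following may work. It is only a sketch: the helper name is_valid is hypothetical, and re.fullmatch is used in place of the ^...$ anchors.

```python
import re

# Optional group: 1 to 10 letters followed by a dot, then com or edu.
PATTERN = re.compile(r'(?:[A-Za-z]{1,10}\.)?(?:com|edu)')

def is_valid(name):
    """Return True if the whole string satisfies the pattern."""
    return PATTERN.fullmatch(name) is not None

for candidate in ["stack.com", "stackoverflow.com", ".com", "com", "sitcom"]:
    print(candidate, is_valid(candidate))
```

With {1,10} in the optional group, a bare '.com' cannot match, while 'com' on its own still does.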
I am having a string as follows:
A5697[2:10] = {ravi, rageev, raghav, smith};
I want the content after "A5697[2:10] =". So, my output should be:
{ravi, rageev, raghav, smith};
This is my code:
print(re.search(r'(?<=A\d+\[.*\] =\s).*', line).group())
But, this is giving error:
sre_constants.error: look-behind requires fixed-width pattern
Can anyone help to solve this issue? I would prefer to use regex.
You can try re.sub, as below. Since you have given only one data point, I am assuming all the other data points follow a similar pattern.
import re
text = "A5697[2:10] = {ravi, rageev, raghav, smith}"
re.sub(r'(A\d+\[\d+:\d+\]\s+=\s+)(.+)', r'\2', text)
returns,
'{ravi, rageev, raghav, smith}'
re.sub replaces the entire match with the contents of the second capturing group, which captures everything after '= '.
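Another way to sidestep the fixed-width look-behind restriction is to match the prefix normally and capture the rest. This is a sketch that assumes every line has the same "A<digits>[...] = ..." shape as the single example in the question.

```python
import re

line = "A5697[2:10] = {ravi, rageev, raghav, smith};"

# Match the variable-width prefix outright and capture what follows it.
match = re.search(r'A\d+\[[^\]]*\]\s*=\s*(.*)', line)

# re.search returns None when nothing matches, so guard before using it.
content = match.group(1) if match else None
print(content)
```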
Simply replace the bits you don't want:
print(re.sub(r'A\d[^=]*= *', '', line))
See demo here: https://rextester.com/NSG17655
I need to format a url string for using with urllib:
the url string I want to get are as:
http://localhost:8086/service/records/names?name=A,B,C,D,E,F,G
if I use
namelist = ['A','B','C','D','E','F','G']
url = 'http://localhost:8086/service/records/names?name={namelist}'.format(namelist=namelist)
then I get:
http://localhost:8086/service/records/names?name=['A','B','C','D','E','F','G']
So how should I format a URL string from a list of strings without the "[]'" characters?
Join the list into a string with...
'[insert separator here]'.join(namelist)
so in your case
','.join(namelist)
this produces 'A,B,C,D,E,F,G'.
Then you can use your initial method with .format()
Your first option is what the other answers suggest: to create your comma-separated list yourself, like so:
import urllib.parse
query_string = 'name=' + ','.join(namelist)
url = 'http://localhost:8086/service/records/names?{query_string}'.format(query_string=query_string)
# url == 'http://localhost:8086/service/records/names?name=A,B,C,D,E,F,G'
This fits what you asked for, however it has some limitations: first, if one of your names has a comma, it will not be correctly escaped.
Second, the commas in your list, and other characters in namelist won't be properly encoded for the URL.
Your second option, a more robust version of the previous one, is to encode your list, like so:
import urllib.parse
query_params = {'name': ','.join(namelist)}
query_string = urllib.parse.urlencode(query_params)
url = 'http://localhost:8086/service/records/names?{query_string}'.format(query_string=query_string)
# url == 'http://localhost:8086/service/records/names?name=A%2CB%2CC%2CD%2CE%2CF%2CG'
This will properly escape the characters for URL usage, but you are still left with the manual assembling and parsing of the query string.
There is a third option, which I would suggest: use the standard way of passing a list in the query string, which is to repeat the key.
import urllib.parse
query_params = {'name': namelist}
query_string = urllib.parse.urlencode(query_params, doseq=True)
url = 'http://localhost:8086/service/records/names?{query_string}'.format(query_string=query_string)
# url == 'http://localhost:8086/service/records/names?name=A&name=B&name=C&name=D&name=E&name=F&name=G'
This last option is a bit more verbose but more robust, as the URL parser will return a list that you don't need to parse yourself.
Additionally, if there is a comma in one of your names, it will be automatically escaped.
Check out the difference between the three options:
>>> urllib.parse.parse_qs('name=A,B,C,D,E,F,G')
{'name': ['A,B,C,D,E,F,G']}
>>> urllib.parse.parse_qs('name=A%2CB%2CC%2CD%2CE%2CF%2CG')
{'name': ['A,B,C,D,E,F,G']}
>>> urllib.parse.parse_qs('name=A&name=B&name=C&name=D&name=E&name=F&name=G')
{'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G']}
Last one will be easier to work with!
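To see the comma handling concretely, here is a small round-trip sketch. The names list is made-up data, chosen so that one entry contains a comma.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical data: the first name contains a comma and a space.
names = ['Smith, J', 'B']

# doseq=True repeats the key once per list item and escapes each value.
query_string = urlencode({'name': names}, doseq=True)
print(query_string)

# Parsing the query string gives back the original list unchanged.
parsed = parse_qs(query_string)
print(parsed['name'])
```

With the comma-joined approach, the same data could not be split back into two names reliably.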
.format(namelist=",".join(namelist))
will work, using "," between list entries
In addition to the other answers: you can also use the nice F-string feature of Python 3. It's a lot prettier than .format() imo, and reminds you of other programming languages that allow variable interpolation.
namelist = ['A','B','C','D','E','F','G']
url = f"http://localhost:8086/service/records/names?name={','.join(namelist)}"
print(url)
# http://localhost:8086/service/records/names?name=A,B,C,D,E,F,G
Are there any alternatives that are similar to .replace() but that allow you to pass more than one old substring to be replaced?
I have a function to which I pass video titles so that specific characters can be removed (because the API I'm passing the videos to has bugs that don't allow certain characters):
def videoNameExists(vidName):
    vidName = vidName.encode("utf-8")
    bugFixVidName = vidName.replace(":", "")
    search_url ='https://api.brightcove.com/services/library?command=search_videos&video_fields=name&page_number=0&get_item_count=true&token=kwSt2FKpMowoIdoOAvKj&any=%22{}%22'.format(bugFixVidName)
Right now, it's eliminating ":" from any video titles with vidName.replace(":", "") but I would also like to replace "|" when that occurs in the name string stored in the vidName variable. Is there an alternative to .replace() that would allow me to replace more than one substring at a time?
>>> s = "a:b|c"
>>> s.translate(None, ":|")
'abc'
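The two-argument translate(None, ...) call above is the Python 2 str signature. On Python 3, str.translate takes a single table, so a rough equivalent would be the following sketch (the names table and cleaned are illustrative):

```python
s = "a:b|c"

# The third argument to str.maketrans lists characters to delete.
table = str.maketrans("", "", ":|")

cleaned = s.translate(table)
print(cleaned)
```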
You may use re.sub
import re
re.sub(r'[:|]', "", vidName)