python regular expression for URL just "www.example.com" - python-3.x

I need a Python regex that matches "www.example.com" (without quotes), where "example" can be any string.
The match must not include any other text before "www" or after ".com".

You can use a dedicated function from the standard library urllib.parse.urlparse:
>>> from urllib.parse import urlparse
>>> parts = urlparse('http://www.example.org')
>>> parts
ParseResult(scheme='http', netloc='www.example.org', path='', params='', query='', fragment='')
>>> parts.netloc
'www.example.org'
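One caveat, shown as a small sketch: urlparse only fills netloc when the input has a scheme or starts with "//"; a bare "www.example.org" ends up in path instead. The helper name extract_host and the "//"-prefixing workaround are assumptions for illustration, not part of urllib.

```python
from urllib.parse import urlparse


def extract_host(url: str) -> str:
    """Return the host part of url, tolerating a missing scheme (hypothetical helper)."""
    # Without a scheme or a leading "//", urlparse treats the whole string as a path.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    # .hostname drops any port and lowercases the host; it is None when netloc is empty.
    return urlparse(url).hostname or ""


print(repr(urlparse("www.example.org").netloc))           # '' (the text went into .path)
print(extract_host("www.example.org"))                    # www.example.org
print(extract_host("https://WWW.Example.org:8080/x?q=1"))  # www.example.org
```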
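Or, to pull such hosts out of free text, you can use this regex (the character class excludes whitespace, so two hosts in one sentence are not merged into a single match):
>>> import re
>>> regexp = re.compile(r'(www\.[^\s:/]+\.com)')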
>>> urls = regexp.findall('Hello https://www.mywebsite.com/index.py?q=search bonjour...')
>>> urls
['www.mywebsite.com']
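If the goal is instead to check that a whole string is exactly of the form www.<something>.com, with nothing before "www" or after ".com", re.fullmatch anchors the pattern at both ends. This is a sketch; allowing any characters except whitespace, "/" and ":" in the middle part is an assumption about what counts as a valid host.

```python
import re

# The middle part may contain dots (e.g. www.sub.example.com) but no whitespace, "/" or ":".
HOST_RE = re.compile(r"www\.[^\s/:]+\.com")


def is_www_com(s: str) -> bool:
    """True only if the entire string looks like www.<name>.com."""
    return HOST_RE.fullmatch(s) is not None


for candidate in ["www.example.com", "www.my-site.com", "http://www.example.com",
                  "www.example.com/", "www.example.org"]:
    print(candidate, is_www_com(candidate))
```

Only the first two candidates print True; the others have extra text around the host or a different ending.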

Related

converting bytes to string gives b' prefix

I'm trying to port old Python 2 code to Python 3, and I'm running into a problem with strings vs. bytes.
In the old code, this line was executed:
'0x' + binascii.hexlify(bytes_reg1)
In Python 2, binascii.hexlify(bytes_reg1) returned a string, but in Python 3 it returns bytes, so it cannot be concatenated to "0x":
TypeError: can only concatenate str (not "bytes") to str
I tried converting it to string:
'0x' + str(binascii.hexlify(bytes_reg1))
But what I get as a result is:
"0xb'23'"
And it should be:
"0x23"
How can I convert the bytes to just 23 instead of b'23' so when concatenating '0x' I get the correct string?
Can you try this and let me know whether it works for you:
'0x' + binascii.hexlify(bytes_reg1).decode("utf-8")
# or
'0x' + str(binascii.hexlify(bytes_reg1), encoding="utf-8")
Note: if you can provide a sample of bytes_reg1, it will be easier to give a precise solution.
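A minimal sketch of the decode approach: hexlify returns bytes, so decode them (hex digits are plain ASCII) before concatenating. On Python 3.5+, bytes.hex() returns a str directly. The sample value b"\x23" is assumed, since the question does not show bytes_reg1.

```python
import binascii

bytes_reg1 = b"\x23"  # assumed sample; the question does not show the real value

# hexlify() returns bytes (b'23'); .decode('ascii') turns them into the str '23'.
via_hexlify = "0x" + binascii.hexlify(bytes_reg1).decode("ascii")

# bytes.hex() (Python 3.5+) returns a str directly, so no decode is needed.
via_hex_method = "0x" + bytes_reg1.hex()

print(via_hexlify)     # 0x23
print(via_hex_method)  # 0x23
```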
Decoding is the way forward, as @Satya says.
You can access the hex string in another way:
>>> import binascii
>>> import struct
>>>
>>> some_bytes = struct.pack(">H", 12345)
>>>
>>> h = binascii.hexlify(some_bytes)
>>> print(h)
b'3039'
>>>
>>> a = h.decode('ascii')
>>> print(a)
3039
>>>
>>> as_hex = hex(int(a, 16))
>>> print(as_hex)
0x3039
>>>
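One difference worth knowing: going through int() and hex() drops leading zero bytes, while the decoded hexlify string keeps them. A small sketch with an assumed two-byte value whose first byte is zero:

```python
import binascii
import struct

some_bytes = struct.pack(">H", 0x23)  # two bytes: 0x00, 0x23

a = binascii.hexlify(some_bytes).decode("ascii")
print("0x" + a)         # 0x0023 (width preserved)
print(hex(int(a, 16)))  # 0x23 (leading zeros dropped)
```

Use whichever matches what the old Python 2 code produced; the plain '0x' + decoded-hexlify form is the direct equivalent.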

How to replace random elements of a list with a unique symbol?

I am new to Python programming. I have two lists: the first contains stopwords and the second contains the text document. I want to replace the stopwords in the text document with "/". Could anyone help?
I tried the replace function, but it gave an error:
text = "This is an example showing off word filtration"
stop = set(stopwords.words("english"))
text = nltk.word_tokenize(document)
for word in stop:
    text = text.replace(stop, "/")
print(text)
It should output
"/ / / example showing / word filtration"
How about a list comprehension:
>>> from nltk.corpus import stopwords
>>> from nltk.tokenize import word_tokenize
>>> stop_words = set(stopwords.words('english'))
>>> text = "This is an example showing off word filtration"
>>> text_tokens = word_tokenize(text)
>>> replaced_text_words = ["/" if word.lower() in stop_words else word for word in text_tokens]
>>> replaced_text_words
['/', '/', '/', 'example', 'showing', '/', 'word', 'filtration']
>>> replaced_sentence = " ".join(replaced_text_words)
>>> replaced_sentence
'/ / / example showing / word filtration'
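If you just want to try the idea without downloading the NLTK corpora, the same comprehension works with str.split() and a small stopword set. STOP_WORDS below is a hypothetical stand-in for stopwords.words('english'), and split() is cruder than word_tokenize (punctuation stays attached to words).

```python
# Hypothetical stand-in for set(stopwords.words('english'))
STOP_WORDS = {"this", "is", "an", "off", "the", "a"}


def mask_stopwords(text: str, stop_words: set) -> str:
    """Replace every whitespace-separated stopword with '/' (case-insensitive)."""
    return " ".join("/" if word.lower() in stop_words else word for word in text.split())


print(mask_stopwords("This is an example showing off word filtration", STOP_WORDS))
# / / / example showing / word filtration
```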
How about using a regex pattern?
Your code could then look like this:
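import re
from nltk.corpus import stopwords
text = "This is an example showing off word filtration"
text = text.lower()
# Escape each stopword and try longer ones first, so e.g. "don't" is matched before "don"
words = sorted(stopwords.words('english'), key=len, reverse=True)
pattern = re.compile(r'\b(' + r'|'.join(map(re.escape, words)) + r')\b\s*')
text = pattern.sub('/ ', text)
print(text)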
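Lowercasing the whole text also lowercases the words you keep ("Example" would become "example"). A variation, sketched with a hypothetical small stopword list, compiles the pattern with re.IGNORECASE instead, so the kept words retain their original case:

```python
import re

# Hypothetical stand-in for stopwords.words('english')
STOP_WORDS = ["this", "is", "an", "off", "don", "don't"]

# Escape regex metacharacters and put longer words first so "don't" beats its prefix "don".
alternation = "|".join(map(re.escape, sorted(STOP_WORDS, key=len, reverse=True)))
pattern = re.compile(r"\b(?:" + alternation + r")\b\s*", flags=re.IGNORECASE)

print(pattern.sub("/ ", "This is an Example showing off word filtration"))
# / / / Example showing / word filtration
print(pattern.sub("/ ", "Don't stop"))
# / stop
```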
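You should pass word, not stop, to replace (this works on the original string, before tokenizing; note that str.replace also matches inside words, e.g. the "is" in "This"):
for word in stop:
    text = text.replace(word, "/")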
You can also try this on the token list:
' '.join([item if item.lower() not in stop else "/" for item in text])

converting a string into parameter for GET request in python

I am calling a GET API to retrieve some data. For the GET call I need to convert my keyword
keyword = "mahinder singh dhoni"
into
caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni
I am new to Python and don't know the Pythonic way. Currently I am doing this:
caption_heading = "caption%3A"
caption_tail = "%2Ccaption%3A"
keyword = "mahinder singh dhoni"
x = keyword.split(" ")
new_caption_keyword = []
new_caption_keyword.append(caption_heading)
for data in x:
    new_caption_keyword.append(data)
    new_caption_keyword.append(caption_tail)
search_query = ''.join(new_caption_keyword)
search_query = search_query[:-13]
print("new transformed keyword", search_query)
Is there a better way to do this? This feels like hard-coding.
Thanks
It's best to first split the original string into a list:
>>> keyword = "mahinder singh dhoni"
>>> keyword.split()
['mahinder', 'singh', 'dhoni']
Before URL-encoding, your target string looks like caption:...,caption:...,caption:..., which can be built with a join and an f-string:
>>> # on Python < 3.6, use 'caption:{}'.format(part)
>>> [f'caption:{part}' for part in keyword.split()]
['caption:mahinder', 'caption:singh', 'caption:dhoni']
>>> ','.join([f'caption:{part}' for part in keyword.split()])
'caption:mahinder,caption:singh,caption:dhoni'
Finally, URL-encode it with urllib.parse.quote:
>>> import urllib.parse
>>> urllib.parse.quote(','.join([f'caption:{part}' for part in keyword.split()]))
'caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni'
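If you are building the whole query string rather than just the value, urllib.parse.urlencode does the joining and the encoding in one go. The parameter name "q" below is an assumption; use whatever name your API expects.

```python
from urllib.parse import urlencode


def build_caption_query(keyword: str, param: str = "q") -> str:
    """Build 'param=caption%3A...%2Ccaption%3A...' from a space-separated keyword.

    `param` is a hypothetical query-parameter name.
    """
    value = ",".join(f"caption:{part}" for part in keyword.split())
    # urlencode uses quote_plus by default, which encodes ':' as %3A and ',' as %2C.
    return urlencode({param: value})


print(build_caption_query("mahinder singh dhoni"))
# q=caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni
```

If you send the request with a library such as requests, pass the unencoded value (for example params={"q": value}) and let the library encode it; passing an already-encoded string would encode the % signs a second time.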
Alternatively, instead of splitting, you can replace each space with "%2Ccaption%3A" and start your string with "caption%3A".
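A sketch of that replace-based variant. It assumes the words are separated by single spaces and contain no characters that themselves need percent-encoding, since nothing here escapes them:

```python
keyword = "mahinder singh dhoni"

# Prefix the first term, then turn each separating space into an already-encoded ",caption:".
search_query = "caption%3A" + keyword.replace(" ", "%2Ccaption%3A")

print(search_query)
# caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni
```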
For Python 2.x:
>>> from urllib import quote
>>> keyword = "mahinder singh dhoni"
>>> quote(','.join(['caption:%s' % i for i in keyword.split()]))
'caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni'
For Python 3.x:
>>> from urllib.parse import quote
>>> keyword = "mahinder singh dhoni"
>>> quote(','.join(['caption:%s' % i for i in keyword.split()]))
'caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni'

Detect subcommands in Argparse

The following parser should let me run subcommands:
% my_script acmd a_val
This is processed roughly like this in my_script.py (passing a list instead of the actual command line):
import argparse
parser = argparse.ArgumentParser(description='example')
subparsers = parser.add_subparsers()
acmd_parser = subparsers.add_parser('acmd')
acmd_parser.add_argument('a_arg')
bcmd_parser = subparsers.add_parser('bcmd')
bcmd_parser.add_argument('b_arg')
args = parser.parse_args(['acmd','a_val'])
print(args)
The result is this:
Namespace(a_arg='a_val')
How do I tell whether I ran acmd or bcmd? Do I just have to figure it out from the arguments?
Provide a dest parameter to the add_subparsers call, as documented in
https://docs.python.org/3/library/argparse.html#sub-commands
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers(dest='subparser_name')
>>> subparser1 = subparsers.add_parser('1')
>>> subparser1.add_argument('-x')
>>> subparser2 = subparsers.add_parser('2')
>>> subparser2.add_argument('y')
>>> parser.parse_args(['2', 'frobble'])
Namespace(subparser_name='2', y='frobble')
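Applied to the parser from the question, a sketch (the dest name "command" is arbitrary; required=True needs Python 3.7+):

```python
import argparse

parser = argparse.ArgumentParser(description="example")
# dest stores the chosen subcommand's name; required=True rejects a missing subcommand.
subparsers = parser.add_subparsers(dest="command", required=True)

acmd_parser = subparsers.add_parser("acmd")
acmd_parser.add_argument("a_arg")
bcmd_parser = subparsers.add_parser("bcmd")
bcmd_parser.add_argument("b_arg")

args = parser.parse_args(["acmd", "a_val"])
print(args)

# Branch on the stored subcommand name instead of guessing from the other arguments.
if args.command == "acmd":
    print("ran acmd with", args.a_arg)
elif args.command == "bcmd":
    print("ran bcmd with", args.b_arg)
```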
The same documentation also shows how to use set_defaults (e.g. set_defaults(func=...)) to attach a handler function to each subcommand.
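A minimal sketch of that set_defaults pattern, where each subparser stores its handler in args.func; the handler names run_acmd and run_bcmd are hypothetical:

```python
import argparse


def run_acmd(args):
    # Hypothetical handler for the "acmd" subcommand
    return f"acmd got {args.a_arg}"


def run_bcmd(args):
    # Hypothetical handler for the "bcmd" subcommand
    return f"bcmd got {args.b_arg}"


parser = argparse.ArgumentParser(description="example")
subparsers = parser.add_subparsers(dest="command", required=True)

acmd_parser = subparsers.add_parser("acmd")
acmd_parser.add_argument("a_arg")
acmd_parser.set_defaults(func=run_acmd)  # set on the Namespace only when acmd is chosen

bcmd_parser = subparsers.add_parser("bcmd")
bcmd_parser.add_argument("b_arg")
bcmd_parser.set_defaults(func=run_bcmd)

args = parser.parse_args(["bcmd", "b_val"])
print(args.func(args))  # bcmd got b_val
```

With this layout the main script never needs an if/elif chain: it just calls args.func(args).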

how to format the variable x into format(platecode)

import string
with open('platenon.txt', 'w') as f:
    for platecode in range(1000):
        x = ['A' + upper_char for upper_char in string.ascii_uppercase]
        f.write('KJA{0:03d}'.format(platecode))
To get a list of all two-letter combinations from 'AA' to 'ZZ':
import itertools
import string
list(''.join(pair) for pair in itertools.product(string.ascii_uppercase, repeat=2))
If I understand your question correctly, you want a list containing the strings 'AA' to 'AZ', i.e. ['AA', 'AB', 'AC', ..., 'AZ']:
import string
upper_chars = ['A' + upper_char for upper_char in string.ascii_uppercase]
To get a list of all strings from 'AA' to 'ZZ' in Python 3, you can use:
from string import ascii_uppercase
from itertools import product
[''.join(c) for c in product(ascii_uppercase, ascii_uppercase)]
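Returning to the original goal of writing plate codes, here is a sketch that combines each two-letter code with a zero-padded number. The "K" prefix, the AA to ZZ letter range and the 000 to 999 numbers are assumptions loosely based on the question's 'KJA{0:03d}' pattern; adjust them to the real plate format.

```python
import itertools
import string


def plate_codes(prefix="K", max_number=1000):
    """Yield codes like 'KAA000', 'KAA001', ..., 'KZZ999' (the format is an assumption)."""
    for letters in itertools.product(string.ascii_uppercase, repeat=2):
        for number in range(max_number):
            # {:03d} zero-pads the number to three digits, as in the question's format string
            yield "{}{}{:03d}".format(prefix, "".join(letters), number)


def write_plates(path="platenon.txt"):
    """Write one plate code per line to path."""
    with open(path, "w") as f:
        for code in plate_codes():
            f.write(code + "\n")


first = list(itertools.islice(plate_codes(), 3))
print(first)  # ['KAA000', 'KAA001', 'KAA002']
```

Calling write_plates() writes all 676,000 codes; the generator avoids building the whole list in memory first.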
