converting a string into parameter for GET request in python - python-3.x

I am calling a GET API to retrieve some data. For the GET call I need to convert my keyword from
keyword = "mahinder singh dhoni"
into
caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni
I am new to Python and don't know the Pythonic way. This is what I am doing:
caption_heading = "caption%3A"
caption_tail = "%2Ccaption%3A"
keyword = "mahinder singh dhoni"
x = keyword.split(" ")
new_caption_keyword = []
new_caption_keyword.append(caption_heading)
for data in x:
    new_caption_keyword.append(data)
    new_caption_keyword.append(caption_tail)
search_query = ''.join(new_caption_keyword)
search_query = search_query[:-13]
print("new transformed keyword", search_query)
Is there a better way to do this? I mean, this is kind of hard-coded.
Thanks

It is best to start by turning your original string into a list:
>>> keyword = "mahinder singh dhoni"
>>> keyword.split()
['mahinder', 'singh', 'dhoni']
Before encoding, your target string looks like caption:...,caption:...,caption:..., which can be built with a join and a format:
>>> # if you're on Python < 3.6, use 'caption:{}'.format(part)
>>> [f'caption:{part}' for part in keyword.split()]
['caption:mahinder', 'caption:singh', 'caption:dhoni']
>>> ','.join([f'caption:{part}' for part in keyword.split()])
'caption:mahinder,caption:singh,caption:dhoni'
Finally, URL-encode the result using urllib.parse.quote:
>>> import urllib.parse
>>> urllib.parse.quote(','.join([f'caption:{part}' for part in keyword.split()]))
'caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni'
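The three steps above can be wrapped in one small helper. This is only a sketch: the function name build_caption_query and the field parameter are made up for illustration and are not part of any library.

```python
from urllib.parse import quote


def build_caption_query(keyword, field="caption"):
    """Turn 'a b c' into the percent-encoded form of 'field:a,field:b,field:c'.

    Hypothetical helper: the name and the `field` parameter are assumptions.
    """
    # split() with no argument also collapses repeated whitespace
    parts = [f"{field}:{word}" for word in keyword.split()]
    # quote() percent-encodes ':' as %3A and ',' as %2C
    return quote(",".join(parts))


print(build_caption_query("mahinder singh dhoni"))
```

Because quote also escapes any reserved characters inside the words themselves, this stays correct for keywords that contain characters such as & or ?.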

Try it this way.
Instead of splitting, you can replace each space " " with "%2Ccaption%3A" and start your string with "caption%3A". Alternatively, build the string with split and join and let quote do the encoding:
For Python 2.x:
>>> from urllib import quote
>>> keyword = "mahinder singh dhoni"
>>> quote(','.join(['caption:%s'%i for i in keyword.split()]))
For Python 3.x:
>>> from urllib.parse import quote
>>> keyword = "mahinder singh dhoni"
>>> quote(','.join(['caption:%s'%i for i in keyword.split()]))
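The replace-based variant described in the prose above could look roughly like this. It is a sketch that assumes the words contain no characters that need escaping themselves, and the function name is hypothetical.

```python
def caption_query_by_replace(keyword):
    """Build the encoded query with str.replace instead of split/join/quote.

    Hypothetical helper; the words themselves are NOT percent-encoded here.
    """
    # Normalize whitespace first so double spaces do not produce empty terms
    normalized = " ".join(keyword.split())
    return "caption%3A" + normalized.replace(" ", "%2Ccaption%3A")


print(caption_query_by_replace("mahinder singh dhoni"))
```

This is shorter but keeps the encoded separators hard-coded, so the quote-based version is usually the safer choice.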

Related

Why substring cannot be found in the target string?

To understand the value of each variable, I modified a URL-replacement script from a Udacity class by moving the code out of a function and into top-level code. However, my code does not work, while the code in the function does. I would appreciate it if anyone could explain why. Please pay particular attention to the function "tokenize".
The code below is from the Udacity class (copyright belongs to Udacity).
# download necessary NLTK data
import nltk
nltk.download(['punkt', 'wordnet'])
# import statements
import re
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
url_regex = 'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
def load_data():
    df = pd.read_csv('corporate_messaging.csv', encoding='latin-1')
    df = df[(df["category:confidence"] == 1) & (df['category'] != 'Exclude')]
    X = df.text.values
    y = df.category.values
    return X, y
def tokenize(text):
    detected_urls = re.findall(url_regex, text) # here, "detected_urls" is a list for sure
    for url in detected_urls:
        text = text.replace(url, "urlplaceholder") # I do not understand why it can work while does not work in my code if I do not convert it to string
    tokens = word_tokenize(text)
    lemmatizer = WordNetLemmatizer()
    clean_tokens = []
    for tok in tokens:
        clean_tok = lemmatizer.lemmatize(tok).lower().strip()
        clean_tokens.append(clean_tok)
    return clean_tokens
X, y = load_data()
for message in X[:5]:
    tokens = tokenize(message)
    print(message)
    print(tokens, '\n')
Below is its output:
I want to understand the values of the variables in the function tokenize(). The following is my code.
X, y = load_data()
detected_urls = []
for message in X[:5]:
    detected_url = re.findall(url_regex, message)
    detected_urls.append(detected_url)
print("detected_urs: ",detected_urls) #output a list without problems
# replace each url in text string with placeholder
i = 0
for url in detected_urls:
    text = X[i].strip()
    i += 1
    print("LN1.url= ",url,"\ttext= ",text,"\n type(text)=",type(text))
    url = str(url).strip() #if I do not convert it to string, it is a list. It does not work in text.replace() below, but works in above function.
    if url in text:
        print("yes")
    else:
        print("no") #always show no
    text = text.replace(url, "urlplaceholder")
    print("\nLN2.url=",url,"\ttext= ",text,"\n type(text)=",type(text),"\n===============\n\n")
The output is shown below.
The outputs for "LN1" and "LN2" are the same. The "if" condition always outputs "no". I do not understand why this happens.
Any further help and advice would be highly appreciated.
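A likely explanation, sketched under assumptions: re.findall returns a list per message, so detected_urls in the rewritten code is a list of lists. Calling str() on an inner list produces its printed form, including brackets and quotes, and that text never occurs in the message. The sample message and the simplified regex below are stand-ins for illustration, not the Udacity data or pattern.

```python
import re

# Simplified stand-in pattern and a made-up message (both are assumptions)
simple_url_regex = r"https?://\S+"
message = "Read more at http://example.com/news today"

found = re.findall(simple_url_regex, message)  # a list of URL strings
as_text = str(found)                           # the list's printed form

print(found)               # the URLs themselves
print(as_text)             # same content wrapped in [' ... ']
print(as_text in message)  # the bracketed form is not part of the message

# Iterating over the inner list, as tokenize() does, yields plain strings
cleaned = message
for url in found:
    cleaned = cleaned.replace(url, "urlplaceholder")
print(cleaned)
```

In the original function the loop variable url is already a string taken from the list, which is why replace works there without any conversion.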

I am trying to get output like below using Python code... any suggestions?

list=["ABCPMCABCCMD","CMDABC"]
list2=["ABC","CMD"]
output:
[ABCABCCMD,CMDABC]
You can use the re module:
import re
list1 = ["ABCPMCABCCMD", "CMDABC", "ABCMD"]
list2 = ["ABC", "CMD"]
r = re.compile("|".join(re.escape(w) for w in list2))
out = ["".join(r.findall(word)) for word in list1]
print(out)
Prints:
['ABCABCCMD', 'CMDABC', 'ABC']

converting bytes to string gives b' prefix

I'm trying to convert some old Python 2 code to Python 3, and I'm facing a problem with strings vs. bytes.
In the old code, this line was executed:
'0x' + binascii.hexlify(bytes_reg1)
In python2 binascii.hexlify(bytes_reg1) was returning a string but in python3 it returns bytes, so it cannot be concatenated to "0x"
TypeError: can only concatenate str (not "bytes") to str
I tried converting it to string:
'0x' + str(binascii.hexlify(bytes_reg1))
But what I get as a result is:
"0xb'23'"
And it should be:
"0x23"
How can I convert the bytes to just 23 instead of b'23' so when concatenating '0x' I get the correct string?
Can you try this and let me know whether it works for you:
'0x' + binascii.hexlify(bytes_reg1).decode("utf-8")
# or
'0x' + str(binascii.hexlify(bytes_reg1), encoding="utf-8")
Note: if you can provide a sample of bytes_reg1, it will be easier to provide a solution.
Decode is the way forward, as @Satya says.
You can access the hex string in another way:
>>> import binascii
>>> import struct
>>>
>>> some_bytes = struct.pack(">H", 12345)
>>>
>>> h = binascii.hexlify(some_bytes)
>>> print(h)
b'3039'
>>>
>>> a = h.decode('ascii')
>>> print(a)
3039
>>>
>>> as_hex = hex(int(a, 16))
>>> print(as_hex)
0x3039
>>>
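For comparison, here is a short sketch of the decode approach next to the built-in bytes.hex() method (available since Python 3.5). The value of bytes_reg1 is a made-up sample, since the question does not show it.

```python
import binascii
import struct

bytes_reg1 = b"\x23"  # hypothetical sample value, not from the question

# Decode the bytes returned by hexlify into a str before concatenating
via_hexlify = "0x" + binascii.hexlify(bytes_reg1).decode("ascii")

# bytes.hex() returns a str directly
via_hex_method = "0x" + bytes_reg1.hex()

# hex(int(..., 16)) drops leading zero digits; plain concatenation keeps them
padded = struct.pack(">H", 0x23)          # two bytes: 0x00 0x23
keeps_zeros = "0x" + padded.hex()
drops_zeros = hex(int(padded.hex(), 16))

print(via_hexlify, via_hex_method, keeps_zeros, drops_zeros)
```

Which form is right depends on whether the leading zeros of the register value matter to the consumer of the string.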

python regular expression for URL just "www.example.com"

I need a Python regex for "www.example.com" (without quotes).
"example" can be any string.
I need it to match without any other text before "www" or after ".com".
You can use a dedicated function from the standard library, urllib.parse.urlparse:
>>> from urllib.parse import urlparse
>>> parts = urlparse('http://www.example.org')
>>> parts
ParseResult(scheme='http', netloc='www.example.org', path='', params='', query='', fragment='')
>>> parts.netloc
'www.example.org'
Or you can use this regexp for a text:
>>> import re
>>> regexp = re.compile(r'\s*(www\.[^:\/\n]+\.com)\s*')
>>> urls = regexp.findall('Hello https://www.mywebsite.com/index.py?q=search bonjour...')
>>> urls
['www.mywebsite.com']
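If the whole input must be exactly of the form www.<something>.com, re.fullmatch may be closer to what the question asks. This is a sketch: the character class chosen for the middle label (letters, digits, hyphens) is an assumption about what "any string" should allow.

```python
import re

# Assumed rule: the middle part is one label of letters, digits or hyphens
HOST_PATTERN = re.compile(r"www\.[A-Za-z0-9-]+\.com")


def is_plain_www_host(text):
    """Return True only if the entire string is www.<label>.com."""
    return HOST_PATTERN.fullmatch(text) is not None


print(is_plain_www_host("www.example.com"))
print(is_plain_www_host("http://www.example.com"))
print(is_plain_www_host("www.example.com/index.py"))
```

fullmatch anchors the pattern at both ends, so no explicit ^ or $ is needed.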

how to format the variable x into format(platecode)

import string
with open('platenon.txt', 'w') as f:
    for platecode in range(1000):
        x =['A' + upper_char for upper_char in string.ascii_uppercase]
        f.write('KJA{0:03d}'.format(platecode))
To get a list of all combinations of two letters from 'AA' to 'ZZ':
import string
import itertools
list(''.join(pair) for pair in itertools.product(string.ascii_uppercase, repeat=2))
If I understand your question correctly, you want a list that contains the strings 'AA' to 'AZ', i.e. ['AA', 'AB', 'AC', ..., 'AZ']?
import string
upper_chars = ['A' + upper_char for upper_char in string.ascii_uppercase]
To get a list with all strings from 'AA' to 'ZZ' you can use this in Python 3:
from string import ascii_uppercase
from itertools import product
[''.join(c) for c in product(ascii_uppercase, ascii_uppercase)]
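If the goal is to combine the letter pairs with the zero-padded number, one possible reading is sketched below. The plate layout (a fixed prefix, two letters, three digits) and the name plate_codes are assumptions, because the question does not state the intended format.

```python
from itertools import islice, product
from string import ascii_uppercase


def plate_codes(prefix="K", numbers=range(1000)):
    """Yield codes such as KAA000, KAA001, ... (assumed layout)."""
    for first, second in product(ascii_uppercase, repeat=2):
        for number in numbers:
            # {number:03d} zero-pads the number to three digits
            yield f"{prefix}{first}{second}{number:03d}"


first_three = list(islice(plate_codes(), 3))
print(first_three)
```

Using a generator avoids building all 676,000 codes in memory; each one can be written to the file as it is produced.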
