Sending URL inside another URL - python-3.x

So I'm building a Telegram bot with Python and I need to send a URL to the user. I'm using the Telegram sendMessage URL:
https://api.telegram.org/bot{bot_token}/sendMessage?chat_id={chat_id}&parse_mode=Markdown&text={message}
but the URL that I'm using:
https://www.amazon.es/RASPBERRY-Placa-Modelo-SDRAM-1822096/dp/B07TC2BK1X/ref=sr_1_3?__mk_es_ES=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=YJ6X8FN3V801&keywords=raspberry+pi+4&qid=1577853490&sprefix=raspberr%2Caps%2C195&sr=8-3
contains special characters such as &, which prevent the full URL from being sent. For this URL I only receive:
https://www.amazon.es/RASPBERRY-Placa-Modelo-SDRAM-1822096/dp/B07TC2BK1X/ref=sr13?mkesES=ÅMÅŽÕÑ
I tried replacing characters such as & with an escaped form, but Python turns them back into the "real" character, so I had to drop the idea.
In case you want to see what I tried, here is the code snippet:
url = url.replace('&', u"\x26")
So is there any way I could fix this?

Percent-encode the link with urllib.parse.quote():
import requests
import urllib.parse
link = "https://www.amazon.es/RASPBERRY-Placa-Modelo-SDRAM-1822096/dp/B07TC2BK1X/ref=sr_1_3?__mk_es_ES=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=YJ6X8FN3V801&keywords=raspberry+pi+4&qid=1577853490&sprefix=raspberr%2Caps%2C195&sr=8-3"
# quote() percent-encodes "&", "?", "=" and "%" so the whole link stays inside the text parameter
markdownMsg = "[Click me!](" + urllib.parse.quote(link) + ")"
url = "https://api.telegram.org/bot<TOKEN>/sendMessage?chat_id=<ID>&text=" + markdownMsg + "&parse_mode=Markdown"
response = requests.request("GET", url, headers={}, data={})
print(response.text.encode('utf8'))
This also works with &parse_mode=HTML:
htmlMsg = '<a href="' + urllib.parse.quote(link) + '">Click me!</a>'

Related

I want to download files with Python using wget (FTP), but an error occurred. Please help me download them

I want down load "*_ice.nc" files in ftp. so..
library
import wget
import math
import re
from urllib import request
Address and file list:
url = "ftp://ftp.hycom.org/datasets/GLBy0.08/expt_93.0/data/hindcasts/2021/" #url
html = request.urlopen(url) #open url
html_contents = str(html.read().decode("cp949"))
url_list = re.findall(r"(ftp)(.+)(_ice.nc)", html_contents)
Download loop:
for url in url_list: #loop
    url_full="".join(url) #tuple to string
    file_name=url_full.split("/")[-1]
    print('\nDownloading ' + file_name)
    wget.download(url_full) #down with wget
but an error message like this occurred:
(ValueError: unknown url type: 'ftp%20%20%20%20%20%20ftp%20%20%20%20%20%20382663848%20Jan%2002%20%202021%20hycom_GLBy0.08_930_2021010112_t000_ice.nc')
Could I get some help?
After URL-decoding,
ftp%20%20%20%20%20%20ftp%20%20%20%20%20%20382663848%20Jan%2002%20%202021%20hycom_GLBy0.08_930_2021010112_t000_ice.nc
is
ftp ftp 382663848 Jan 02 2021 hycom_GLBy0.08_930_2021010112_t000_ice.nc
which is clearly not a valid FTP address. You need to alter your code so that it produces
ftp://ftp.hycom.org/datasets/GLBy0.08/expt_93.0/data/hindcasts/2021/hycom_GLBy0.08_930_2021010112_t000_ice.nc
I suggest temporarily replacing wget.download(url_full) with print(url_full), changing the code until it prints the desired output, and then reverting to wget.download(url_full).
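A sketch of that change, assuming the listing returned by request.urlopen() has one file per line with the file name as the last field (as the decoded error message suggests). The helper name ice_file_urls is hypothetical; it keeps only the file name and joins it onto the directory URL.

```python
import re
from typing import List

BASE_URL = "ftp://ftp.hycom.org/datasets/GLBy0.08/expt_93.0/data/hindcasts/2021/"


def ice_file_urls(listing: str, base_url: str = BASE_URL) -> List[str]:
    """Return a full FTP URL for every *_ice.nc file named in a directory listing.

    Assumes each listing line ends with the file name (Unix `ls -l` style).
    """
    urls = []
    for line in listing.splitlines():
        # Capture only the last whitespace-free token, and only if it ends in _ice.nc
        match = re.search(r"(\S+_ice\.nc)\s*$", line)
        if match:
            urls.append(base_url.rstrip("/") + "/" + match.group(1))
    return urls
```

In the question's loop this would become for url_full in ice_file_urls(html_contents):, first with print(url_full) to check the output and then with wget.download(url_full).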

Validating if user input is already sanitised

I've been working on a mini project with Python 3 and tkinter recently that is used to sanitise URLs and IP addresses. I've hit a roadblock with my function that I cannot work out. What I am trying to achieve is:
Has a user entered a URL such as http://www.google.com or https://www.google.com and if so, sanitise as:
hxxp[:]//www[.]google[.]com or hxxps[:]//www[.]google[.]com
Has a user entered an IP address such as 192.168.1.1 or http://192.168.1.1 and sanitise as:
192[.]168[.]1[.]1 or hxxp[:]//192[.]168[.]1[.]1
Has a user entered already sanitised input? Is there unsanitised input along with it? If so, just sanitise the unsanitised input and print them to the results output Textbox.
I have included a screenshot of what is currently happening to my normal input, after input is sanitised and how I want to handle the above issues.
Also, is the .strip() in the outputTextbox.insert line redundant?
I appreciate any help and recommendations!
def printOut():
    outputTextbox.delete("1.0", "end")
    url = inputTextbox.get("1.0", "end-1c")
    if len(url) == 0:
        tk.messagebox.showerror("Error", "Please enter content to sanitise")
    if "hxxp" and "[:]" and "[.]" in url or "hxxps" and "[:]" and "[.]" in url:
        outputTextbox.insert("1.0", url, "\n".strip())
        pass
    elif "http" and ":" and "." in url:
        url = url.replace("http", "hxxp")
        url = url.replace(":", "[:]")
        url = url.replace(".", "[.]")
        outputTextbox.insert("1.0", url, "\n".strip())
    elif "https" and ":" and "." in url:
        url = url.replace("https", "hxxps")
        url = url.replace(":", "[:]")
        url = url.replace(".", "[.]")
        outputTextbox.insert("1.0", url, "\n".strip())
    elif "http" and ":" and range(0, 10) and "." in url or range(0, 10) and "." in url:
        url = url.replace("http", "hxxp")
        url = url.replace(".", "[.]")
        outputTextbox.insert("1.0", url, "\n".strip())
An expression like A and B and C in url is evaluated as A and B and (C in url), not as "A, B and C are all found in url". Since non-empty strings are truthy, it reduces to just C in url.
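A quick check of that precedence rule (the sample string is illustrative). Because in binds tighter than and, only the last operand is actually tested against url; all() expresses "every substring is present".

```python
url = "www[.]google[.]com"  # contains "[.]" but neither "hxxp" nor "[:]"

# Parsed as: "hxxp" and "[:]" and ("[.]" in url)
naive = "hxxp" and "[:]" and "[.]" in url

# What was intended: every substring must occur in url
intended = all(part in url for part in ("hxxp", "[:]", "[.]"))

print(naive, intended)  # True False
```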
You can use re (regex module) to achieve what you want:
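import re
from tkinter import messagebox
def printOut():
    outputTextbox.delete("1.0", "end")
    url = inputTextbox.get("1.0", "end-1c")
    if len(url) == 0:
        messagebox.showerror("Error", "Please enter content to sanitise")
        return
    result = url.replace("http", "hxxp")
    # Bracket every ":" and "." not already bracketed; the lookarounds are zero-width,
    # so consecutive separators (e.g. the last dot in 192.168.1.1) are replaced too
    result = re.sub(r"(?<!\[):(?!\])", "[:]", result)
    result = re.sub(r"(?<!\[)\.(?!\])", "[.]", result)
    outputTextbox.insert("end", result + "\n")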
There may be a better regex for that.
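The same substitutions can be pulled out of the tkinter callback into a pure function, which makes them easy to test. This sketch (the name defang is hypothetical) uses zero-width lookarounds rather than capture groups, so separators at the very start or end of the text and back-to-back dots such as in 1.1.1 are all bracketed, while text that is already sanitised is left unchanged.

```python
import re


def defang(text: str) -> str:
    """Sanitise URLs/IPs, leaving any already-bracketed separators untouched."""
    text = text.replace("http", "hxxp")  # also turns "https" into "hxxps"
    # (?<!\[) and (?!\]) skip separators that are already wrapped in brackets
    text = re.sub(r"(?<!\[):(?!\])", "[:]", text)
    text = re.sub(r"(?<!\[)\.(?!\])", "[.]", text)
    return text


mixed = "http://192.168.1.1 hxxps[:]//www[.]google[.]com"
print(defang(mixed))
# hxxp[:]//192[.]168[.]1[.]1 hxxps[:]//www[.]google[.]com
```

Because the function is idempotent, running it on mixed input only touches the parts that were not sanitised yet.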

How can I unshorten a URL in Python 3?

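import http.client
import urllib.parse
def unshorten_url(url):
    parsed = urllib.parse.urlparse(url)
    # Use TLS for https URLs; a plain HTTPConnection would talk to port 80
    conn_cls = http.client.HTTPSConnection if parsed.scheme == "https" else http.client.HTTPConnection
    h = conn_cls(parsed.netloc)
    resource = parsed.path or "/"
    if parsed.query != "":
        resource += "?" + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    location = response.getheader('Location')
    # Integer division: in Python 3, 301 / 100 == 3.01, so "/" never matched 3
    if response.status // 100 == 3 and location:
        # Location may be relative, so resolve it against the current URL
        return unshorten_url(urllib.parse.urljoin(url, location)) # changed to process chains of short urls
    else:
        return url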
unshorten_url("http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34")
Input:
http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34 (yes, the same URL is returned)
The unshortened output URL that I need: https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F00030d09-2b3a-4efd-87cc-c4ea39d27c34&conceptLanguage=en&full=true#&uri=http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34
As you can see, there are two URLs: the short URL, which is my input, and the full URL. To get the required output, I identified a pattern in a set of URLs of the same kind and wrote this code, which produces the required output.
my_url = "http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34"
a="https://ec.europa.eu/esco/portal/occupationuri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F"
b = my_url.split("/")[-1]
URL = a+ b+ "&conceptLanguage=en&full=true#&uri=" + my_url
The resulting URL variable holds the required full URL:
URL = "https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F00030d09-2b3a-4efd-87cc-c4ea39d27c34&conceptLanguage=en&full=true#&uri=http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34"

Python 3.6.3 - Send MozillaCookieJar File and read HTML source code

I'm very new to Python (I've only been learning it for about a day).
I need to send cookies (exported from my Google Chrome browser to a *.txt file) and, after logging in, be redirected to my account page, so that I can then read its HTML source and do what I need with it. After a lot of searching online, I already have this piece of code:
import os
import time
import urllib.request
import http.cookiejar
while 1:
    cj = http.cookiejar.MozillaCookieJar('cookies.txt')
    cj.load()
    print(len(cj)) # output: 9
    print(cj) # output: <MozillaCookieJar[<Cookie .../>, <Cookie .../>, ... , <Cookie .../>]>
    for cookie in cj:
        cookie.expires = time.time() + 14 * 24 * 3600
    cookieProcessor = urllib.request.HTTPCookieProcessor(cj)
    opener = urllib.request.build_opener(cookieProcessor)
    request = urllib.request.Request(url='https://.../')
    response = opener.open(request, timeout=100)
    s = str(response.read(), 'utf-8')
    print(s)
    if 'class' in s:
        os.startfile('test.mp3')
    time.sleep(5)
With this code I believe (I hope I'm not mistaken) that I'm sending the cookies correctly. My main question is: how can I wait for and capture the HTML source after the server redirects my login to my personal page? I can't simply repeat my Request with the same URL.
Thank you in advance.
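One way to approach this, sketched under assumptions: the opener returned by build_opener() already includes HTTPRedirectHandler, so opener.open() follows the server's redirect by itself, the cookie processor attaches the cookies on every hop, and the HTML you read is that of the final page; response.geturl() tells you which URL that was. The helper name fetch_with_cookies is hypothetical, and decoding with the charset from the Content-Type header (falling back to UTF-8) is an assumption about the site.

```python
import http.cookiejar
import urllib.request
from typing import Tuple


def fetch_with_cookies(url: str, cookie_file: str = "cookies.txt",
                       timeout: float = 100) -> Tuple[str, str]:
    """Open url with cookies from a Mozilla/Netscape cookies.txt file.

    Returns (final_url, html). fetch_with_cookies is a hypothetical helper name.
    """
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    # ignore_discard=True keeps session cookies (no expiry date), which load() otherwise skips
    jar.load(ignore_discard=True)
    # build_opener() adds HTTPRedirectHandler by default, so 3xx responses are
    # followed automatically and the cookie processor runs on every hop.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url, timeout=timeout) as response:
        final_url = response.geturl()  # the page the server redirected to
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return final_url, html
```

Comparing final_url with the URL you requested is a simple way to tell whether the login redirect actually happened.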

How to use urllib with username/password authentication in python 3?

Here is my problem with urllib in Python 3.
I wrote a piece of code that works well in Python 2.7 using urllib2. It goes to a page on the Internet (which requires authorization) and grabs the info from that page for me.
The real problem is that I can't get my code working in Python 3.4, because there is no urllib2 and urllib works differently; even after a few hours of googling and reading I got nothing. So if somebody can help me solve this, I'd really appreciate it.
Here is my code:
request = urllib2.Request('http://mysite/admin/index.cgi?index=127')
base64string = base64.encodestring('%s:%s' % ('login', 'password')).replace('\n', '')
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request)
resulttext = result.read()
Thanks to you guys, I finally figured out how it works.
Here is my code:
import base64
import urllib.request
request = urllib.request.Request('http://mysite/admin/index.cgi?index=127')
base64string = base64.b64encode(bytes('%s:%s' % ('login', 'password'),'ascii'))
request.add_header("Authorization", "Basic %s" % base64string.decode('utf-8'))
result = urllib.request.urlopen(request)
resulttext = result.read()
There is one more difference with urllib: in my case resulttext was of type bytes instead of str, so to work with the text inside it I had to decode it:
text = resulttext.decode(encoding='utf-8',errors='ignore')
What about urllib.request? It seems to have everything you need.
import base64
import urllib.request
request = urllib.request.Request('http://mysite/admin/index.cgi?index=127')
# Basic auth needs the base64 of "login:password", decoded back to str before formatting
base64string = base64.b64encode(bytes('%s:%s' % ('login', 'password'), 'ascii')).decode('ascii')
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib.request.urlopen(request)
resulttext = result.read()
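Formatting a bytes object directly into the header would send Basic b'...', so the base64 result has to be decoded to str first. A minimal helper that does this in one place (basic_auth_header is a hypothetical name); encoding the credentials as UTF-8 is an assumption, since the snippets above use ASCII.

```python
import base64


def basic_auth_header(login: str, password: str) -> str:
    """Return the value for an HTTP Basic "Authorization" header."""
    raw = f"{login}:{password}".encode("utf-8")  # assumption: server accepts UTF-8 credentials
    token = base64.b64encode(raw).decode("ascii")  # bytes -> str, so no b'...' in the header
    return f"Basic {token}"


print(basic_auth_header("login", "password"))  # Basic bG9naW46cGFzc3dvcmQ=
```

It would be used as request.add_header("Authorization", basic_auth_header("login", "password")).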
An alternative using an OpenerDirector that installs the auth header for all future urllib requests (login, password and API_URL are placeholders):
import base64
import urllib.request
login_pass = base64.b64encode(f'{login}:{password}'.encode()).decode()
opener = urllib.request.build_opener()
opener.addheaders = [('Authorization', f'Basic {login_pass}')]
urllib.request.install_opener(opener)
response = urllib.request.urlopen(API_URL)
print(response.read().decode())
A further example using HTTPBasicAuthHandler, although a bit more work is required if you need to send the credentials unconditionally:
import urllib.request
password_mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()
# is_authenticated=True sends the credentials up front instead of waiting for a 401 challenge
password_mgr.add_password(None, API_URL, login, password, is_authenticated=True)
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)
response = urllib.request.urlopen(API_URL)
print(response.read().decode())
