Validating if user input is already sanitised - python-3.x

I've been working on a mini project with Python 3 and tkinter that sanitises URLs and IP addresses. I've hit a roadblock with my function that I cannot work out. What I am trying to achieve is:
Has the user entered a URL such as http://www.google.com or https://www.google.com? If so, sanitise it as:
hxxp[:]//www[.]google[.]com or hxxps[:]//www[.]google[.]com
Has the user entered an IP address such as 192.168.1.1 or http://192.168.1.1? If so, sanitise it as:
192[.]168[.]1[.]1 or hxxp[:]//192[.]168[.]1[.]1
Has the user entered input that is already sanitised? Is there unsanitised input along with it? If so, sanitise only the unsanitised input and print everything to the results output Textbox.
I have included a screenshot of what currently happens to normal input, what it looks like after sanitising, and how I want to handle the cases above.
Also: is the .strip() in the outputTextbox.insert line redundant?
I appreciate any help and recommendations!
def printOut():
    outputTextbox.delete("1.0", "end")
    url = inputTextbox.get("1.0", "end-1c")
    if len(url) == 0:
        tk.messagebox.showerror("Error", "Please enter content to sanitise")
    if "hxxp" and "[:]" and "[.]" in url or "hxxps" and "[:]" and "[.]" in url:
        outputTextbox.insert("1.0", url, "\n".strip())
        pass
    elif "http" and ":" and "." in url:
        url = url.replace("http", "hxxp")
        url = url.replace(":", "[:]")
        url = url.replace(".", "[.]")
        outputTextbox.insert("1.0", url, "\n".strip())
    elif "https" and ":" and "." in url:
        url = url.replace("https", "hxxps")
        url = url.replace(":", "[:]")
        url = url.replace(".", "[.]")
        outputTextbox.insert("1.0", url, "\n".strip())
    elif "http" and ":" and range(0, 10) and "." in url or range(0, 10) and "." in url:
        url = url.replace("http", "hxxp")
        url = url.replace(".", "[.]")
        outputTextbox.insert("1.0", url, "\n".strip())

An expression like A and B and C in url actually evaluates as A and B and (C in url), not as a check that A, B, and C are all found in url. Since non-empty string literals are truthy, the whole condition collapses to just C in url.
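A quick sketch of this operator-precedence trap, using a made-up input:

```python
# Hypothetical input: it contains a "." but no "http" and no ":".
url = "foo.bar"

# Parsed as: "http" and ":" and ("." in url). The two string
# literals are non-empty, hence truthy, so the whole expression
# collapses to just ("." in url).
broken = "http" and ":" and "." in url
print(broken)  # True, even though "http" is not in url

# What was intended: test each substring separately.
correct = all(token in url for token in ("http", ":", "."))
print(correct)  # False
```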
You can use re (the regex module) to achieve what you want:
import re

def printOut():
    outputTextbox.delete("1.0", "end")
    url = inputTextbox.get("1.0", "end-1c")
    if len(url) == 0:
        messagebox.showerror("Error", "Please enter content to sanitise")
        return  # don't try to sanitise empty input
    result = url.replace("http", "hxxp")
    result = re.sub(r"([^\[]):([^\]])", r"\1[:]\2", result)
    result = re.sub(r"([^\[])\.([^\]])", r"\1[.]\2", result)
    outputTextbox.insert("end", result, "\n")
There may be a better regex for this.
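For instance, a variant using lookbehind/lookahead (a sketch, not the only option) leaves already-sanitised input untouched, and because lookarounds consume no neighbouring characters it also catches the closely spaced dots of an IP address and dots at the edges of the string:

```python
import re

def sanitise(text):
    # (?<!\[) skips a ":" or "." already preceded by "[",
    # (?!\]) skips one already followed by "]", so sanitised
    # input passes through unchanged.
    result = text.replace("http", "hxxp")
    result = re.sub(r"(?<!\[):(?!\])", "[:]", result)
    result = re.sub(r"(?<!\[)\.(?!\])", "[.]", result)
    return result

print(sanitise("http://www.google.com"))        # hxxp[:]//www[.]google[.]com
print(sanitise("192.168.1.1"))                  # 192[.]168[.]1[.]1
print(sanitise("hxxps[:]//www[.]google[.]com"))  # unchanged
```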

Related

How to convert script path into JSON?

I am scraping this website: https://www.epicery.com/c/promos?gclid=CjwKCAjw97P5BRBQEiwAGflV6bGzNEAz7MTIrgelBkTR277v3lhStP5tH0wgxuLj1ytlcQAAjb-cxBoCsVwQAvD_BwE
I am trying to retrieve some info from the script path, such as the description.
I get the script content with an XPath, apply a regex, and try to load it as JSON:
script_path = response.xpath('/html/body/script[1]').get()
j_list = re.findall(r'\[(.*)\}\]', script_path)
j = j_list[0].replace("'", "")
json_script = json.loads(j)
But I get the following error, which I cannot handle:
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 152446 (char 152445)
I'm not sure exactly what you want, but this works for me:
def parse(self, response):
    taxons_str = response.xpath('//script[contains(., "var taxons")]/text()').re_first(r'(?s)var taxons = (.+?)var shops')
    if taxons_str:
        taxons = json.loads(taxons_str)
        for product in taxons:
            process_your_product(product)
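The (?s) flag lets . match newlines, so the lazy (.+?) group can span several lines of the inline script. The same pattern can be tried outside Scrapy with the standard re module; the script body and JSON content below are made up for illustration, not taken from the real page:

```python
import re
import json

# A made-up <script> body shaped like the page's inline JS
# (assumption: it assigns "var taxons = [...]" before "var shops").
script_text = """
var taxons = [{"name": "Fruits", "id": 1}, {"name": "Dairy", "id": 2}]
var shops = []
"""

# Lazy ".+?" stops at the first "var shops"; (?s) lets "." cross newlines.
match = re.search(r"(?s)var taxons = (.+?)var shops", script_text)
taxons = json.loads(match.group(1))
print([t["name"] for t in taxons])  # ['Fruits', 'Dairy']
```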

scapy sniff with special characters

Hi, I have written the program below. It sniffs packets and I can see usernames, passwords, and URLs, but when I enter a password with a special character I get something like "%21". Can somebody please help?
#!/bin/python3
import scapy.all as scapy
from scapy.layers import http

def sniff(interface):
    scapy.sniff(iface=interface, store=False, prn=process_sniffed_packets)

def get_url(packet):
    if packet.haslayer(http.HTTPRequest):
        url = packet[http.HTTPRequest].Host + packet[http.HTTPRequest].Path
        return url

def get_login_info(packet):
    if packet.haslayer(http.HTTPRequest):
        if packet.haslayer(scapy.Raw):
            load = packet[scapy.Raw].load
            #load = str(load)
            keybword = ["usr", "uname", "username", "pwd", "pass", "password"]
            for eachword in keybword:
                if eachword.encode() in load:
                    return load

def process_sniffed_packets(packet):
    if packet.haslayer(http.HTTPRequest):
        url = get_url(packet)
        print("[+] HTTP Request>>" + str(url))
        login_info = get_login_info(packet)
        if login_info:
            print("\n\n[+] Possible username and password >>" + str(login_info) + "\n\n")

sniff("eth0")
root@kali:~/python_course_by_zaid# ./packet_sniffer.py
[+] HTTP Request>>b'testing-ground.scraping.pro/login?mode=login'
[+] Possible username and password >>b"b'usr=admin&pwd=123456%21%40
It is supposed to print 123456!@
The problem is that the password is URL-encoded (percent-encoded). Some characters, such as ! and @, cannot be put into a URL directly, so they are escaped as a % followed by their hex code: %21 is ! and %40 is @.
If you URL-decode these strings prior to printing them, you'll get the expected result. In Python 3, you can decode like so:
# script.py
import urllib.parse
result = urllib.parse.unquote("123456%21%40")
print(result)
Running it, we get:
$ python script.py
123456!@
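If the captured load is a whole form body rather than a single value, urllib.parse.parse_qs will both split the fields and decode the percent-escapes in one step. A sketch using the body from the question:

```python
from urllib.parse import parse_qs

# A captured form body like the one shown in the output above.
load = "usr=admin&pwd=123456%21%40"

# parse_qs returns a dict mapping each field name to a list of
# URL-decoded values.
fields = parse_qs(load)
print(fields["usr"][0])  # admin
print(fields["pwd"][0])  # 123456!@
```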

Sending URL inside another URL

So I'm building a Telegram bot with Python and I need to send the user a URL. I'm using the Telegram sendMessage URL:
https://api.telegram.org/bot{bot_token}/sendMessage?chat_id={chat_id}&parse_mode=Markdown&text={message}
but the URL that I'm using:
https://www.amazon.es/RASPBERRY-Placa-Modelo-SDRAM-1822096/dp/B07TC2BK1X/ref=sr_1_3?__mk_es_ES=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=YJ6X8FN3V801&keywords=raspberry+pi+4&qid=1577853490&sprefix=raspberr%2Caps%2C195&sr=8-3
has a special character like & that prevents the message from being sent with the full URL. In the case of this URL, I only receive this:
https://www.amazon.es/RASPBERRY-Placa-Modelo-SDRAM-1822096/dp/B07TC2BK1X/ref=sr13?mkesES=ÅMÅŽÕÑ
I tried replacing characters like & with escape sequences, but Python transforms them back into the real character, so I had to throw the idea out.
In case you want to check what I tried, here is the code snippet:
url = url.replace('&', u"\x26")
So is there any way I could fix this?
Encode the URL with urllib.parse.quote():
import requests
import urllib.parse
link = "https://www.amazon.es/RASPBERRY-Placa-Modelo-SDRAM-1822096/dp/B07TC2BK1X/ref=sr_1_3?__mk_es_ES=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=YJ6X8FN3V801&keywords=raspberry+pi+4&qid=1577853490&sprefix=raspberr%2Caps%2C195&sr=8-3"
markdownMsg = "[Click me!](" + urllib.parse.quote(link) + ")"
url = "https://api.telegram.org/bot<TOKEN>/sendMessage?chat_id=<ID>&text=" + markdownMsg + "&parse_mode=Markdown"
response = requests.request("GET", url, headers={}, data ={})
print(response.text.encode('utf8'))
This also works for &parse_mode=HTML:
htmlMsg = '<a href="' + urllib.parse.quote(link) + '">Click me!</a>'
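An alternative sketch using the standard library's urlencode to build the whole query string, so every special character in the message (including the &) is escaped before it is pasted into the outer URL. The token and chat id below are placeholders, and the Amazon link is shortened for readability:

```python
from urllib.parse import urlencode

# The Amazon link from the question, shortened here for readability.
link = "https://www.amazon.es/dp/B07TC2BK1X/ref=sr_1_3?keywords=raspberry+pi+4&sr=8-3"

# urlencode percent-encodes every value, so the "&" inside the
# Markdown link becomes %26 and no longer terminates the text=
# parameter. "<TOKEN>" and the chat id are placeholders.
query = urlencode({
    "chat_id": "12345",
    "parse_mode": "Markdown",
    "text": "[Click me!](" + link + ")",
})
url = "https://api.telegram.org/bot<TOKEN>/sendMessage?" + query
print(url)
```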

How can I unshorten a URL in python 3

import http.client
import urllib.parse

def unshorten_url(url):
    parsed = urllib.parse.urlparse(url)
    h = http.client.HTTPConnection(parsed.netloc)
    resource = parsed.path
    if parsed.query != "":
        resource += "?" + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    # use integer division: in Python 3, 301 / 100 is 3.01, not 3
    if response.status // 100 == 3 and response.getheader('Location'):
        return unshorten_url(response.getheader('Location'))  # follow chains of short URLs
    else:
        return url

unshorten_url("http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34")
The input will be:
http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34 (yes, the same URL is returned)
The output URL I need after unshortening:
https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F00030d09-2b3a-4efd-87cc-c4ea39d27c34&conceptLanguage=en&full=true#&uri=http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34
As you can see, there are two URLs: the short URL, which is my input, and the full URL, which is the required output. I identified a pattern from a set of URLs of the same kind and wrote this code to achieve the required output:
my_url = "http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34"
a = "https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F"
b = my_url.split("/")[-1]
URL = a + b + "&conceptLanguage=en&full=true#&uri=" + my_url
The required full URL is now in URL:
URL = "https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F00030d09-2b3a-4efd-87cc-c4ea39d27c34&conceptLanguage=en&full=true#&uri=http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34"
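The hardcoded percent-encoded prefix can also be produced with urllib.parse.quote, which may be less brittle if the short URL ever changes shape. A sketch of the same pattern:

```python
from urllib.parse import quote

my_url = "http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34"

# safe="" forces ":" and "/" to be percent-encoded too, reproducing
# the http%3A%2F%2F... prefix that was hardcoded above.
full_url = (
    "https://ec.europa.eu/esco/portal/occupation?uri="
    + quote(my_url, safe="")
    + "&conceptLanguage=en&full=true#&uri="
    + my_url
)
print(full_url)
```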

Merge a string to URL in python

I have to concatenate a hardcoded path (a str) to a URL to produce a result that is itself a valid URL.
url (which doesn't end with "/") + "/path/to/file/" = new_url
I tried concatenation with urljoin and also simple string concatenation, but the result is not a URL that can be reached (it's not that the URL address is invalid).
mirror_url = "http://amazonlinux.us-east-2.amazonaws.com/2/core/latest/x86_64/mirror.list"
response = requests.get(mirror_url)
contents_in_url = response.content
## returns the URL below, but response.content is bytes, which
## cannot be concatenated to the str path and requested as a valid URL.
'http://amazonlinux.us-east-2.amazonaws.com/2/core/2.0/x86_64/8cf736cd3252ada92b21e91b8c2a324d05b12ad6ca293a14a6ab7a82326aec43'
path_to_add_to_url = "/repodata/primary.sqlite.gz"
final_url = contents_in_url + path_to_add_to_url
Desired Result:
Without omitting any path to that file.
final_url = "http://amazonlinux.us-west-2.amazonaws.com/2/core/2.0/x86_64/8cf736cd3252ada92b21e91b8c2a324d05b12ad6ca293a14a6ab7a82326aec43/repodata/primary.sqlite.gz"
You need to get the contents of the first response from the response.text attribute (a str), not response.content (which is bytes):
import requests
mirror_url = "http://amazonlinux.us-east-2.amazonaws.com/2/core/latest/x86_64/mirror.list"
response = requests.get(mirror_url)
contents_in_url = response.text.strip()
path_to_add_to_url = "/repodata/primary.sqlite.gz"
response = requests.get(contents_in_url + path_to_add_to_url)
with open('primary.sqlite.gz', 'wb') as f_out:
f_out.write(response.content)
print('Downloading done.')
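As a side note on why urljoin gave an unreachable URL in the question: a relative part starting with "/" is treated as an absolute path and replaces everything after the host, so plain concatenation is the right tool here. A small sketch:

```python
from urllib.parse import urljoin

base = "http://amazonlinux.us-east-2.amazonaws.com/2/core/2.0/x86_64/8cf736cd3252ada92b21e91b8c2a324d05b12ad6ca293a14a6ab7a82326aec43"
path = "/repodata/primary.sqlite.gz"

# urljoin resolves a leading "/" as an absolute path, discarding
# the /2/core/... part of the mirror URL.
print(urljoin(base, path))
# http://amazonlinux.us-east-2.amazonaws.com/repodata/primary.sqlite.gz

# Plain concatenation keeps the full mirror path intact.
print(base + path)
```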
