HTTPConnection.request fails but urllib.request.urlopen works? - python-3.x

I've been stuck here for quite a while but can't find anything helpful.
I'm trying to connect to a website and get a JSON response in Python 3. The code looks like this:
conn = http.client.HTTPConnection('host.address')
params = "xx"+ xx + "xx" + ...
conn.request('GET', '/a/b/c', params)
resp = conn.getresponse()
This doesn't return the JSON file but the page at http://host.address/a/b/c, which is an error page.
However, while using the following code:
params = "xx"+ xx + "xx" + ...
resp = urllib.request.urlopen("http://host.address/a/b/c?"+params)
It correctly returns the JSON file.
Any idea what's wrong with the code?
Thanks

In HTTP, GET requests are not supposed to have a body. The third parameter to request() is actually the request body (see http://docs.python.org/py3k/library/http.client.html#http.client.HTTPConnection.request) - just build the query string into the URL, as in your second example.

Just to complement @sqrtsben's answer with an example:
import urllib.parse
import http.client
u = urllib.parse.urlparse("http://localhost:8080/index.php?utf8=✓")
conn = http.client.HTTPConnection(u.hostname, u.port)
if u.query == '':
    conn.request("GET", u.path)
else:
    conn.request("GET", u.path + '?' + u.query)
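A safer way to build the query string than hand-concatenating parameters is urllib.parse.urlencode, which also percent-escapes unsafe characters for you. A minimal sketch (the parameter names and values here are placeholders, not from the original question):

```python
import urllib.parse

# Placeholder query fields - substitute your real parameter names and values.
params = urllib.parse.urlencode({"key": "some value", "page": 2})
path = '/a/b/c?' + params
# "some value" is escaped to "some+value"; then pass path to conn.request('GET', path).
# The query string rides in the URL, not in the request body.
```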

Related

How to get a download link that requires checking checkboxes in an additional dialog box

I want to download the last publicly available file from https://sam.gov/data-services/Exclusions/Public%20V2?privacy=Public
When downloading manually, the real download links look like:
https://falextracts.s3.amazonaws.com/Exclusions/Public%20V2/SAM_Exclusions_Public_Extract_V2_22150.ZIP?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T143743Z&X-Amz-SignedHeaders=host&X-Amz-Expires=2699&X-Amz-Credential=AKIAY3LPYEEXWOQWHCIY%2F20220530%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=3eca59f75a4e1f6aa59fc810da8f391f1ebfd8ca5a804d56b79c3eb9c4d82e32
My function only gets the initial link, which refers to the real one:
import json
import requests
from operator import itemgetter
files_url = 'https://sam.gov/api/prod/fileextractservices/v1/api/listfiles?random=1653676394983&domain=Exclusions/Public%20V2&privacy=Public'
def get_file():
    response = requests.get(files_url, stream=True)
    links_resp = json.loads(response.text)
    links_dicts = [d for d in links_resp['_embedded']['customS3ObjectSummaryList'] if d['displayKey'].count('SAM_Exclus')]
    sorted_links = sorted(links_dicts, key=itemgetter('dateModified'), reverse=True)
    return sorted_links[0]['_links']['self']['href']
get_file()
Result:
'https://s3.amazonaws.com/falextracts/Exclusions/Public V2/SAM_Exclusions_Public_Extract_V2_22150.ZIP'
But following that link gives me Access denied.
So I would appreciate any hints on how to get the real download links.
I've edited your code as little as possible so you can follow the changes. The requests library can decode the JSON response itself.
Also, imports that are not at the top of the file hurt readability...
import requests as req
from operator import itemgetter
files_url = "https://sam.gov/api/prod/fileextractservices/v1/api/listfiles?random=1653676394983&domain=Exclusions/Public%20V2&privacy=Public"
down_url = "https://sam.gov/api/prod/fileextractservices/v1/api/download/Exclusions/Public%20V2/{}?privacy=Public"
def get_file():
    response = req.get(files_url, stream=True).json()
    links_dicts = response["_embedded"]["customS3ObjectSummaryList"]
    sorted_links = sorted(links_dicts, key=itemgetter('dateModified'), reverse=True)
    key = sorted_links[0]['displayKey']
    down = req.get(down_url.format(key))
    if down.status_code != 200:
        return False
    print(key)
    with open(key, 'wb') as f:
        f.write(down.content)
    return True
get_file()
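The sorting step is the key trick here: itemgetter('dateModified') orders the file entries newest-first, which is how the latest extract gets picked. A small self-contained sketch (the entries below are made up to mimic the listfiles response shape):

```python
from operator import itemgetter

# Made-up entries mimicking the listfiles response shape.
files = [
    {"displayKey": "SAM_Exclusions_V2_22140.ZIP", "dateModified": "2022-05-20"},
    {"displayKey": "SAM_Exclusions_V2_22150.ZIP", "dateModified": "2022-05-30"},
]
# reverse=True puts the most recently modified entry first.
newest = sorted(files, key=itemgetter("dateModified"), reverse=True)[0]
print(newest["displayKey"])
```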

Trouble getting Vimeo filename from API using PyVimeo

I have a script using Python and PyVimeo that calls the GET https://api.vimeo.com/videos/{video_id} endpoint so I can get the file name. When I try to run my app, I get an error: {'error': "The requested video couldn't be found."}. However, when I use this same video ID under the Try it out section (https://developer.vimeo.com/api/reference/videos#get_video), it works fine.
I assume there is something wrong with my code, but if I use the demo from the GitHub example (about_me = v.get('/me')), it works fine, and that needs authentication as well.
Is there something simple I am missing? Thank you so much.
import vimeo
v = vimeo.VimeoClient(
    token='VimeoToken',
    key='VimeoKey',
    secret='VimeoSecret'
)
class Vimeo:
    def get_vimeo_data(video_file):
        uri = 'https://api.vimeo.com/videos/{video_file}'
        # uri = 'https://api.vimeo.com/me/videos' - This response works
        response = v.get(uri)
        data = response.json()
        print(data)
Vimeo.get_vimeo_data(55555)
You forgot to add an f before your string, so it was never an f-string.
class Vimeo:
    def get_vimeo_data(video_file):
        # THIS f
        uri = f"https://api.vimeo.com/videos/{video_file}"
        # uri = 'https://api.vimeo.com/me/videos' - This response works
        response = v.get(uri)
        data = response.json()
        print(data)
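To see why that one-character fix matters: without the f prefix the braces are kept literally, so the API was asked for a video with the ID "{video_file}"; with it, the value is substituted:

```python
video_file = 55555
without_f = 'https://api.vimeo.com/videos/{video_file}'   # braces kept literally
with_f = f'https://api.vimeo.com/videos/{video_file}'     # value interpolated
print(without_f)  # https://api.vimeo.com/videos/{video_file}
print(with_f)     # https://api.vimeo.com/videos/55555
```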

Changed my code from Python 2 to 3 and trying to run it on Unix

I was able to run the code below with Python 2.7 on Windows. But after converting it from 2.7 to Python 3.6 and trying to run it on Unix, I get the following error:
'TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.'
I am very new to urllib. It would be a great help if you could tell me what the problem in my code is.
python 2.7, windows 10:
import json
import win32com.client as WC
def extractFromParFiles_SingleComponent(HCP_ID, ListOfTestnumbers, ListOfPartSequenceNumbers):
    InputObject = {"HCP_ID": HCP_ID, "ListOfTestnumbers": ListOfTestnumbers, "ListOfPartSequenceNumbers": ListOfPartSequenceNumbers, "KeepNullValues": True}
    return callWebservice(InputObject)
def callWebservice(InputObject):
    URL = ""
    h = WC.Dispatch('WinHTTP.WinHTTPRequest.5.1')
    h.SetAutoLogonPolicy(0)
    h.SetTimeouts(0, 0, 0, 0)
    h.Open('POST', URL, False)
    h.SetRequestHeader('Content-Type', 'application/json')
    JSON_In = json.dumps(InputObject)
    h.Send(JSON_In)
    JSON_Out = h.responseText
    try:
        OutputData = json.loads(JSON_Out)
    except:
        print JSON_Out
    if "d" in OutputData:
        return OutputData["d"]
    else:
        print "Length of JSON_In: %s\n" % len(JSON_In)
        print "JSON_In: %s\n" % JSON_In
        print "Length of JSON_Out: %s\n" % len(JSON_Out)
        print "JSON_Out: %s\n" % JSON_Out
        raise RuntimeError, "Error from Webservice"
Python 3.6, Unix:
import json
import urllib.request, urllib.error, urllib.parse
from ntlm3 import HTTPNtlmAuthHandler
def extractFromParFiles_SingleComponent(HCP_ID, ListOfTestnumbers, ListOfPartSequenceNumbers):
    InputObject = {"HCP_ID": HCP_ID, "ListOfTestnumbers": ListOfTestnumbers, "ListOfPartSequenceNumbers": ListOfPartSequenceNumbers, "KeepNullValues": True}
    return callWebservice(InputObject)
def callWebservice(InputObject):
    URL = ""
    User = ""
    Password = ""
    pass_item = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    pass_item.add_password(None, URL, User, Password)
    auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(pass_item)
    opener = urllib.request.build_opener(auth_NTLM)
    urllib.request.install_opener(opener)
    JSON_In = json.dumps(InputObject)
    req = urllib.request.Request(URL, JSON_In, {'Content-Type': 'application/json'})
    f = urllib.request.urlopen(req)
    JSON_Out = f.read()
    f.close()
    OutputData = json.loads(JSON_Out)
    return OutputData["d"]
urllib.request.Request's data parameter expects a bytes-like object, but you're passing in a str. Call encode() on JSON_In to convert it to bytes.
From the documentation (https://docs.python.org/3/library/urllib.request.html#urllib.request.Request):
data must be an object specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data. The supported object types include bytes, file-like objects, and iterables. If no Content-Length nor Transfer-Encoding header field has been provided, HTTPHandler will set these headers according to the type of data. Content-Length will be used to send bytes objects, while Transfer-Encoding: chunked as specified in RFC 7230, Section 3.3.1 will be used to send files and other iterables.
Your code should then look like:
req = urllib.request.Request(URL, JSON_In.encode(), {'Content-Type': 'application/json'})
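A quick sketch of the type difference: json.dumps returns a str, and .encode() turns it into the bytes object Request will actually send (the payload dict below is a trimmed stand-in for InputObject):

```python
import json

# Trimmed stand-in for the real InputObject.
payload = json.dumps({"HCP_ID": 1, "KeepNullValues": True})
assert isinstance(payload, str)     # what json.dumps gives you
body = payload.encode('utf-8')      # what urllib.request.Request needs
assert isinstance(body, bytes)
```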

Python regex: pulling from an API and converting to text format gives an error

I have a simple Python script to pull a malware feed from an open-source API and find only the IPs in the list.
The URL already contains the IPs, but when you capture it and save it to a local file you can see there are extra \r\n strings after each IP, probably because of newlines.
Can someone please guide me? I am new to Python - what am I doing wrong here?
import urllib.request
import urllib.parse
import re
url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
ip = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', resp)
malwareIPList = ip.read()
print (malwareIPlist)
error
line 223, in findall
return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
The issue is that you need to .read() the resp from urllib.request.urlopen
Consider:
import urllib.request
import urllib.parse
import re
url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
print(resp)
Prints:
<http.client.HTTPResponse object at 0x103a4ccf8>
What I think you are looking for is:
url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
ip = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', str(resp.read(), 'utf-8'))
print (ip)
Prints a bunch of IP addresses...
BTW, since the data is IP addresses delimited by \r\n, you don't actually need a regex. You can do:
>>> str(resp.read(), 'utf-8').splitlines()
['103.14.120.121', '103.19.89.55', '103.224.212.222', '103.24.13.91', ...]
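As a further variant, the regex can also run directly on the raw bytes without decoding first, using a bytes pattern. A small sketch (the data literal below stands in for resp.read(), since the feed itself may no longer be reachable):

```python
import re

# Stand-in for resp.read() - two IPs delimited by \r\n, as in the feed.
data = b"103.14.120.121\r\n103.19.89.55\r\n"
# rb'...' makes a bytes pattern, so the matches come back as bytes too.
ips = [m.decode() for m in re.findall(rb'\d{1,3}(?:\.\d{1,3}){3}', data)]
print(ips)  # ['103.14.120.121', '103.19.89.55']
```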

Python: Facebook Graph API - pagination request using facebook-sdk

I'm trying to query Facebook for different information, for example a friends list. It works fine, but of course it only gives a limited number of results. How do I access the next batch of results?
import facebook
import json
ACCESS_TOKEN = ''
def pp(o):
    with open('facebook.txt', 'a') as f:
        json.dump(o, f, indent=4)
g = facebook.GraphAPI(ACCESS_TOKEN)
pp(g.get_connections('me', 'friends'))
The result JSON does give me paging cursors (before and after values) - but where do I put them?
I'm exploring Facebook Graph API through the facepy library for Python (works on Python 3 too), but I think I can help.
TL-DR:
You need to append &after=YOUR_AFTER_CODE to the URL you've called (e.g.: https://graph.facebook.com/v2.8/YOUR_FB_ID/friends/?fields=id,name), giving you a link like this one: https://graph.facebook.com/v2.8/YOUR_FB_ID/friends/?fields=id,name&after=YOUR_AFTER_CODE, to which you should make a GET request.
You'll need requests in order to make a GET request to the Graph API using your user ID (I'm assuming you know how to find it programmatically) and a URL similar to the one I give you below (see the URL variable).
import facebook
import json
import requests
ACCESS_TOKEN = ''
YOUR_FB_ID=''
URL="https://graph.facebook.com/v2.8/{}/friends?access_token={}&fields=id,name&limit=50&after=".format(YOUR_FB_ID, ACCESS_TOKEN)
def pp(o):
    all_friends = []
    if 'data' in o:
        all_friends.extend(o['data'])
        # The paging info lives on the response itself, not on each friend.
        paging = o.get('paging', {})
        if 'next' in paging:
            resp = requests.get(paging['next'])
            all_friends.append(resp.json())
        elif 'after' in paging.get('cursors', {}):
            new_url = URL + paging['cursors']['after']
            resp = requests.get(new_url)
            all_friends.append(resp.json())
        else:
            print("Something went wrong")
    # Do whatever you want with all_friends...
    with open('facebook.txt', 'a') as f:
        json.dump(o, f, indent=4)
g = facebook.GraphAPI(ACCESS_TOKEN)
pp(g.get_connections('me', 'friends'))
Hope this helps!
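More generally, the 'next' link lets you loop until the data runs out. A minimal sketch of that loop (the get parameter is injectable - pass requests.get for real use - so the paging logic can be exercised without network access; the function name is mine, not from any library):

```python
def fetch_all(url, get):
    """Follow Graph-API-style paging['next'] links and collect every 'data' item.

    `get` is any callable that takes a URL and returns an object with .json(),
    e.g. requests.get; injecting it keeps the loop testable offline.
    """
    items = []
    while url:
        page = get(url).json()
        items.extend(page.get('data', []))
        url = page.get('paging', {}).get('next')  # absent on the last page, ending the loop
    return items
```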
