Requests with multipart/form-data and no file returns 400 - python-3.x

I want to convert the (working) curl command to Python:
$ curl -X POST --header 'Content-Type: multipart/form-data' --header 'Accept: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/1.7.0"' -F wikitext=%27%27%27Mahikari%27%27%27%20is%20a%20%5B%5BJapan%5D%5Dese%20%5B%5Bnew%20religious%20movement%5D%5D -F body_only=true 'https://en.wikipedia.org/api/rest_v1/transform/wikitext/to/html'
<p id="mwAQ">%27%27%27Mahikari%27%27%27%20is%20a%20%5B%5BJapan%5D%5Dese%20%5B%5Bnew%20religious%20movement%5D%5D</p>
Using Requests, I set the first element of each tuple I pass to files to None instead of a filename (apparently an overloaded feature of this parameter for sending plain form fields), but I still can't get this code to work. It returns a 400:
import urllib.parse

import requests

text = """
The '''Runyon classification''' of nontuberculous [[mycobacteria]] based on the rate of growth, production of yellow pigment and whether this pigment was
produced in the dark or only after exposure to light.
It was introduced by Ernest Runyon in 1959.
"""
multipart_data = {
    'wikitext': (None, urllib.parse.quote(text)),
    'body_only': (None, 'true'),
}
url = 'https://en.wikipedia.org/api/rest_v1/transform/wikitext/to/html'
headers = {'Content-Type': 'multipart/form-data', 'Accept': 'text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/1.7.0"'}
r = requests.post(url, files=multipart_data, headers=headers)
if r.status_code == 200:
    print(r.text)
Update
I tried another approach using requests_toolbelt.multipart.encoder.MultipartEncoder, and it still doesn't work:
import urllib.parse

import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder

text = """
The '''Runyon classification''' of nontuberculous [[mycobacteria]] based on the rate of growth, production of yellow pigment and whether this pigment was produced in the dark or only after exposure to light.
It was introduced by Ernest Runyon in 1959.
"""
multipart_data = MultipartEncoder(
    fields=(
        ('wikitext', urllib.parse.quote(text)),
        ('body_only', 'true'),
    )
)
url = 'https://en.wikipedia.org/api/rest_v1/transform/wikitext/to/html'
headers = {'Content-Type': 'multipart/form-data', 'Accept': 'text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/1.7.0"'}
r = requests.post(url, data=multipart_data, headers=headers)
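For reference, two things appear to break both attempts: hard-coding Content-Type: multipart/form-data drops the boundary parameter the server needs to parse the body (Requests generates it when you omit the header, and MultipartEncoder exposes it via its content_type attribute), and the wikitext should not be percent-encoded, since multipart encoding carries the payload verbatim (the curl output above shows the encoded text echoed back literally). A minimal sketch of the corrected request, untested against the live API:

import requests

text = """
The '''Runyon classification''' of nontuberculous [[mycobacteria]] based on the rate of growth, production of yellow pigment and whether this pigment was produced in the dark or only after exposure to light.
It was introduced by Ernest Runyon in 1959.
"""
url = 'https://en.wikipedia.org/api/rest_v1/transform/wikitext/to/html'
multipart_data = {
    'wikitext': (None, text),      # raw wikitext, no urllib.parse.quote()
    'body_only': (None, 'true'),
}
# Let requests generate the multipart Content-Type (with boundary) itself.
headers = {'Accept': 'text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/1.7.0"'}
r = requests.post(url, files=multipart_data, headers=headers)
print(r.status_code)
print(r.text)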

Related

Error 500: can anyone guess why I am getting this when sending the request from Robot Framework (Python), while it works using Postman?

Code:
Make request for getBookingOptions with valid user
    # [Arguments]    ${VALID_USER}    ${VALID_PASSWORD}    ${VALID_EMAIL}    ${MEETING_DATE}    ${MEETING_TIME}    ${MEETING_DURATION}    ${TIMEZONE_OFFSET}    ${ROOMS}    ${REQUIRED_INVITEES}
    ${body} =    Create Dictionary    userId=45646546456    password=fgdfgdfg    email=mohammednasir.ali#istrbc.com    meetingDate=2022-08-30    meetingTime=2022-08-30T16:00:00.000    meetingDuration=30    timeZoneOffset=-14400
    ${header} =    Create Dictionary    Content-Type=application/json
    ${body}    Evaluate    json.dumps(${body})    json
    ${response} =    Post Request    CEA    /getBookingOptions    json=${body}    headers=${header}
    log    ${RESPONSE_DATA}
    set test variable    ${response}    ${response}
    set global variable    ${RESPONSE_DATA}    ${response.json()}
Robot Framework generated:
POST Request using: uri=/getBookingOptions, params=None, files=None, allow_redirects=True, timeout=None
headers={'User-Agent': 'python-requests/2.23.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Type': 'application/json'}
data=None
json={"userId": "64554646", "password": "Disdcovdgery7", "email": "nasir#google.com", "meetingDate": "2022-08-30", "meetingTime": "2022-08-30T16:00:00.000", "meetingDuration": "30", "timeZoneOffset": "-14400"}
Output:
{'message': 'invalid request', 'status': 'failure'}
${response} = <Response [500]>
But the same kind of JSON works fine using Postman. I am unable to find the reason. Can anyone guess?
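No answer is recorded here, but two things stand out (both assumptions, since the failing API isn't public): the keyword dumps ${body} to a JSON string and then passes it as json=, so requests serializes it a second time and the server receives a quoted string rather than an object; and Create Dictionary makes every value a string, where the backend may expect numbers. A plain-requests sketch of the call without either problem, with a placeholder host:

import requests

# Placeholder base URL; the Robot session alias "CEA" hides the real host.
url = "https://example.invalid/getBookingOptions"

payload = {
    "userId": 45646546456,           # numeric, not the string "45646546456"
    "password": "fgdfgdfg",
    "email": "mohammednasir.ali#istrbc.com",
    "meetingDate": "2022-08-30",
    "meetingTime": "2022-08-30T16:00:00.000",
    "meetingDuration": 30,
    "timeZoneOffset": -14400,
}

# Pass the dict itself: json= serializes it exactly once and sets
# Content-Type: application/json.
response = requests.post(url, json=payload)
print(response.status_code, response.text)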

Python: call a REST API to get data from a URL

I've created a Bash script that gets data from an appliance's REST API: it logs in with a username and password, saves the session ID into a variable, and then uses that session ID to fetch the data in CSV format. This works fine.
I want to port the Bash code to Python 3, since I'm parsing the output with pandas.
Bash Code:
#!/bin/bash
sessionID=$(curl -k -H "accept: application/json" -H "content-type: application/json" -H "x-api-version: 120" -d '{"userName":"administrator","password":"adminpass"}' -X POST https://hpe.sysnergy.com/rest/login-sessions | jq -r ".sessionID")
curl -k -H 'accept: application/json' \
-H 'content-type: text/csv' \
-H 'x-api-version: 2' \
-H "auth: $sessionID" \
-X GET https://hpe.sysnergy.com/rest/resource-alerts
Python version of the code I tried:
#!/usr/bin/python3
import requests
import json
url = "https://hpe.sysnergy.com/rest/login-sessions"
data = {'username': 'administrator', 'password': 'adminpass'}
headers = {'Content-type': 'text/csv', 'Accept': 'application/json', 'x-api-version': 2}
r = requests.post(url, data=json.dumps(data), headers=headers)
print(r)
I am getting the error below:
Error:
requests.exceptions.InvalidHeader: Value for header {x-api-version: 2} must be of type str or bytes, not <class 'int'>
If I convert the int 2 to the string '2', it then gives another SSL error:
requests.exceptions.SSLError: HTTPSConnectionPool(host='hpe.synerg.com', port=443): Max retries exceeded with url: /rest/login-sessions (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:877)'),))
EDIT:
I have tried a slightly different approach to mirror the Bash code in Python, but now it returns a new error with a new response code.
import os

import requests

sessionID = os.getenv('sessionID')
headers = {
    'accept': 'application/json',
    'content-type': 'text/csv',
    'x-api-version': '2',
    'auth': f"{sessionID}",
}
data = '{"userName":"administrator","password":"adminpassword"}'
response = requests.post('https://hpe.synergy.com/rest/login-sessions', headers=headers, data=data, verify=False)
print(response)
Error:
/python3/lib64/python3.6/site-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'hpe.synergy.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
<Response [415]>
Please help or suggest a way to achieve the same function in Python.
You first need to make a POST request to get the sessionID, then you need to make a GET request to fetch the alerts. Also note that the headers differ slightly between the two requests; the 415 (Unsupported Media Type) above comes from sending the JSON login body with content-type: text/csv. Something like this should work:
import requests

session = requests.Session()

url = "https://hpe.sysnergy.com/rest/login-sessions"
credentials = {"userName": "administrator", "password": "adminpass"}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "x-api-version": "120",
}
response = session.post(url, headers=headers, json=credentials, verify=False)
session_id = response.json()["sessionID"]

url = "https://hpe.sysnergy.com/rest/resource-alerts"
headers = {
    "accept": "application/json",
    "content-type": "text/csv",
    "x-api-version": "2",
    "auth": session_id,
}
response = session.get(url, headers=headers, verify=False)
print(response)
#print(response.content)  # returns bytes
#print(response.text)     # returns string
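A side note on verify=False: it disables certificate validation entirely, which is why urllib3 emits the InsecureRequestWarning. If you can export the appliance's CA certificate, pointing requests at it is safer; the path below is a placeholder, not a real file:

# Same GET as above, but validating the appliance's self-signed chain
# against an exported CA bundle instead of skipping verification:
response = session.get(url, headers=headers, verify="/path/to/appliance-ca.pem")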

Process a raw HTTP request into Python Requests code

I'd like to pass a raw HTTP request like:
GET /foo/bar HTTP/1.1
Host: example.org
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; fr; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8
Accept: */*
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
X-Requested-With: XMLHttpRequest
Referer: http://example.org/test
Cookie: foo=bar; lorem=ipsum;
And generate the Python Requests code, such as:
import requests
burp0_url = "http://example.org:80/foo/bar"
burp0_cookies = {"foo": "bar", "lorem": "ipsum"}
burp0_headers = {"User-Agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; fr; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8", "Accept": "*/*", "Accept-Language": "fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3", "Accept-Encoding": "gzip,deflate", "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Keep-Alive": "115", "Connection": "keep-alive", "Content-Type": "application/x-www-form-urlencoded", "X-Requested-With": "XMLHttpRequest", "Referer": "http://example.org/test"}
requests.get(burp0_url, headers=burp0_headers, cookies=burp0_cookies)
Is there a library for that?
I could not find an existing library that does this conversion, but there is a Python library to convert curl commands to python requests code.
https://github.com/spulec/uncurl
e.g.
import uncurl
print(uncurl.parse('curl --header "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" --compressed --header "Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3" --header "Connection: keep-alive" --header "Content-Type: application/x-www-form-urlencoded" --cookie "foo=bar; lorem=ipsum;" --header "Keep-Alive: 115" --header "Referer: http://example.org/test" --user-agent "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; fr; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8" --header "X-Requested-With: XMLHttpRequest" https://example.org/foo/bar '))
I haven't found a Python library to transform raw HTTP into such a curl command. However, the h2c Perl program does it.
Like this:
$ cat basic
GET /index.html HTTP/2
Host: example.com
Authorization: Basic aGVsbG86eW91Zm9vbA==
Accept: */*
$ ./h2c < basic
curl --http2 --header User-Agent: --user "hello:youfool" https://example.com/index.html
You could either call it from your python script, use a Python-Perl bridge or try to port it.
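For the first option, shelling out from Python is straightforward; a minimal sketch, assuming the h2c script sits in the working directory and is executable:

import subprocess

raw_request = (
    "GET /index.html HTTP/2\r\n"
    "Host: example.com\r\n"
    "Accept: */*\r\n"
    "\r\n"
)

# h2c reads the raw request on stdin and prints a curl command on stdout.
result = subprocess.run(["./h2c"], input=raw_request,
                        capture_output=True, text=True)
print(result.stdout)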
Postman also allows you to convert raw HTTP requests directly to Python Requests code using its code snippet generator, although it seems this can only be done via the GUI. It's also not open source, so you can't access the code that does the transformation.
I needed something that can generate a request and couldn't find it, so I ended up writing it in a gist:
import requests

CRLF = '\r\n'
DEFAULT_HTTP_VERSION = 'HTTP/1.1'


class RequestParser(object):

    def __parse_request_line(self, request_line):
        # Request line looks like: "GET /path HTTP/1.1"
        request_parts = request_line.split(' ')
        self.method = request_parts[0]
        self.url = request_parts[1]
        self.protocol = request_parts[2] if len(request_parts) > 2 else DEFAULT_HTTP_VERSION

    def __init__(self, req_text):
        req_lines = req_text.split(CRLF)
        self.__parse_request_line(req_lines[0])
        ind = 1
        self.headers = dict()
        while ind < len(req_lines) and len(req_lines[ind]) > 0:
            colon_ind = req_lines[ind].find(':')
            header_key = req_lines[ind][:colon_ind]
            header_value = req_lines[ind][colon_ind + 1:].strip()
            self.headers[header_key] = header_value
            ind += 1
        ind += 1
        self.data = req_lines[ind:] if ind < len(req_lines) else []
        self.body = CRLF.join(self.data)

    def __str__(self):
        headers = CRLF.join(f'{key}: {self.headers[key]}' for key in self.headers)
        return f'{self.method} {self.url} {self.protocol}{CRLF}' \
               f'{headers}{CRLF}{CRLF}{self.body}'

    def to_request(self):
        # Pass the joined body string, not the raw line list, so requests
        # does not mistake it for form fields.
        req = requests.Request(method=self.method,
                               url=self.url,
                               headers=self.headers,
                               data=self.body)
        return req
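A quick usage sketch of the class above (my own example, not part of the gist). Note that the parser takes the URL verbatim from the request line, so an absolute URL is used here so the request can actually be sent:

raw = ('GET http://example.org/foo/bar HTTP/1.1\r\n'
       'Accept: */*\r\n'
       'Cookie: foo=bar\r\n'
       '\r\n')

parsed = RequestParser(raw)
print(parsed)  # round-trips the original request text

# Prepare and send it through a session:
with requests.Session() as s:
    response = s.send(parsed.to_request().prepare())
    print(response.status_code)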

convert curl to python requests (pisignage API)

I am trying to convert a curl command from pisignage into Python Requests. The curl command is:
curl -X POST "https://swagger.piathome.com/api/files" -H "accept:
application/json" -H "x-access-token: login_session_token" -H
"Content-Type: multipart/form-data" -F "Upload
file=#test.jpg;type=image/jpeg"
My code is:
import requests

files = {'Upload file': open('test.jpg', 'rb'), 'type': 'image/jpeg'}
headers = {'Content-type': 'multipart/form-data', 'accept': 'application/json', 'x-access-token': 'login_session_token'}
file_response = requests.post(
    'https://swagger.piathome.com/api/files',
    files=files,
    headers=headers,
)
print(file_response)
It returns 404. I tried uncurl, the code is:
import uncurl
u = uncurl.parse('curl -X POST "https://swagger.piathome.com/api/files" -H "accept: application/json" -H "x-access-token: login_session_token" -H "Content-Type: multipart/form-data" -F "Upload file=test.jpg;type=image/jpeg"')
print(u)
The output is an error (uncurl doesn't recognize curl's -F form flag):
error: unrecognized arguments: -F Upload file=test.jpg;type=image/jpeg
After a day of searching, it turns out the swagger documentation is incorrect. Use:
files = {
    'assets': open('test.jpg', 'rb'),
}
Try this:
import requests

headers = {
    'accept': 'application/json',
    'x-access-token': 'login_session_token',
    'Content-Type': 'multipart/form-data',
}
files = {
    'Upload file': (None, 'test.jpg;type'),
}
response = requests.post('https://swagger.piathome.com/api/files', headers=headers, files=files)
(There are also online tools that parse curl commands into Python Requests code.)
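Putting the two answers together, a sketch of what the API likely expects, assuming the 'assets' field name from the accepted answer and letting Requests build the multipart Content-Type itself (a hand-set one lacks the boundary):

import requests

headers = {
    'accept': 'application/json',
    'x-access-token': 'login_session_token',
    # No explicit Content-Type: requests adds multipart/form-data with
    # the correct boundary when files= is used.
}
files = {
    'assets': ('test.jpg', open('test.jpg', 'rb'), 'image/jpeg'),
}
response = requests.post('https://swagger.piathome.com/api/files',
                         headers=headers, files=files)
print(response.status_code, response.text)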

Understanding Bearer Authorization for web scraping using python 3.8 and requests

So I am looking to scrape the following site:
https://hyland.csod.com/ux/ats/careersite/4/home?c=hyland
What I am running into with the Python Requests library is that the request requires an Authorization header bearing a token of some kind. I can get this to work if I manually go to the page, copy the token, and run my program, but I am wondering how to avoid that step (after all, what is the point of a scraper if I still have to visit the site manually to retrieve the authorization token?).
I am new to authorization/bearer headers and am hoping someone can clarify how the browser obtains a token to retrieve this information and how I can simulate that. Here is my code:
import requests
import json
import datetime

today = datetime.datetime.today()

url = "https://hyland.csod.com/services/x/career-site/v1/search"
# actual site: https://hyland.csod.com/ux/ats/careersite/4/home?c=hyland
headers = {
    'authority': 'hyland.csod.com',
    'origin': 'https://hyland.csod.com',
    'authorization': 'Bearer eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCIsImNsaWQiOiI0bDhnbnFhbGk3NjgifQ.eyJzdWIiOi0xMDMsImF1ZCI6IjRxNTFzeG5oY25yazRhNXB1eXZ1eGh6eCIsImNvcnAiOiJoeWxhbmQiLCJjdWlkIjoxLCJ0emlkIjoxNCwibmJkIjoiMjAxOTEyMzEyMTE0MTU5MzQiLCJleHAiOiIyMDE5MTIzMTIyMTUxNTkzNCIsImlhdCI6IjIwMTkxMjMxMjExNDE1OTM0In0.PlNdWXtb1uNoMuGIhI093ZbheRN_DwENTlkNoVr0j7Zah6JHd5cukudVFnZEiQmgBZ_nlDU4C-9JO_2We380Vg',
    'content-type': 'application/json',
    'accept': 'application/json; q=1.0, text/*; q=0.8, */*; q=0.1',
    'x-requested-with': 'XMLHttpRequest',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36',
    'csod-accept-language': 'en-US',
    'referer': 'https://hyland.csod.com/ux/ats/careersite/4/home?c=hyland',
    'accept-encoding': 'gzip, deflate, br',
    'cookie': 'CYBERU_lastculture=en-US; ASP.NET_SessionId=4q51sxnhcnrk4a5puyvuxhzx; cscx=hyland^|-103^|1^|14^|KumB4VhzYXML22MnMxjtTB9SKgHiWW0tFg0HbHnOek4=; c-s=expires=1577909201~access=/clientimg/hyland/*^!/content/hyland/*~md5=78cd5252d2efff6eb77d2e6bf0ce3127',
}
data = ['{"careerSiteId":4,"pageNumber":1,"pageSize":25,"cultureId":1,"searchText":"","cultureName":"en-US","states":["oh"],"countryCodes":[],"cities":[],"placeID":"","radius":null,"postingsWithinDays":null,"customFieldCheckboxKeys":[],"customFieldDropdowns":[],"customFieldRadios":[]}',
        '{"careerSiteId":4,"pageNumber":2,"pageSize":25,"cultureId":1,"searchText":"","cultureName":"en-US","states":["oh"],"countryCodes":[],"cities":[],"placeID":"","radius":null,"postingsWithinDays":null,"customFieldCheckboxKeys":[],"customFieldDropdowns":[],"customFieldRadios":[]}']

def hyland(url, data):
    # for openings in data:
    dirty = requests.post(url, headers=headers, data=data).text
    if 'Unauthorized' in dirty:
        print(dirty)
        print("There was an error connecting. Check Info")
    # print(dirty)
    clean = json.loads(dirty)
    cleaner = json.dumps(clean, indent=4)
    print("Openings at Hyland Software in Westlake as of {}".format(today.strftime('%m-%d-%Y')))
    for i in range(0, 60):
        try:
            print(clean["data"]["requisitions"][i]["displayJobTitle"])
            print("")
            print("")
        except:
            print("{} Openings at Hyland".format(i))
            break

for datum in data:
    hyland(url, data=datum)
So basically my code sends a POST request to the URL above, along with the headers and the data needed to retrieve what I want. The scraper works for a short period, but if I leave and come back a few hours later it no longer works, due to authorization (at least that is what I have concluded).
Any help or clarification on how all this works would be greatly appreciated.
Your code has a few problems:
As you noted, you have to get the bearer token programmatically.
You have to send your requests using requests.Session() (as this webpage seems to pay attention to the cookies you send).
Optional: your headers had a lot of unnecessary entries that could be removed.
All in all, here below is the working code:
import requests
import json
import datetime

today = datetime.datetime.today()
session = requests.session()

url = "https://hyland.csod.com:443/ux/ats/careersite/4/home?c=hyland"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "DNT": "1",
    "Connection": "close",
    "Upgrade-Insecure-Requests": "1",
}
raw = session.get(url, headers=headers).text
token = raw[raw.index("token") + 8:]
token = token[:token.index("\"")]
bearer_token = f"Bearer {token}"

url = "https://hyland.csod.com/services/x/career-site/v1/search"
# actual site: https://hyland.csod.com/ux/ats/careersite/4/home?c=hyland
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0", "Authorization": bearer_token}
data = ['{"careerSiteId":4,"pageNumber":1,"pageSize":25,"cultureId":1,"searchText":"","cultureName":"en-US","states":["oh"],"countryCodes":[],"cities":[],"placeID":"","radius":null,"postingsWithinDays":null,"customFieldCheckboxKeys":[],"customFieldDropdowns":[],"customFieldRadios":[]}',
        '{"careerSiteId":4,"pageNumber":2,"pageSize":25,"cultureId":1,"searchText":"","cultureName":"en-US","states":["oh"],"countryCodes":[],"cities":[],"placeID":"","radius":null,"postingsWithinDays":null,"customFieldCheckboxKeys":[],"customFieldDropdowns":[],"customFieldRadios":[]}']

def hyland(url, data, session=session):
    # for openings in data:
    dirty = session.post(url, headers=headers, data=data).text
    if 'Unauthorized' in dirty:
        print(dirty)
        print("There was an error connecting. Check Info")
    # print(dirty)
    clean = json.loads(dirty)
    cleaner = json.dumps(clean, indent=4)
    print("Openings at Hyland Software in Westlake as of {}".format(today.strftime('%m-%d-%Y')))
    for i in range(0, 60):
        try:
            print(clean["data"]["requisitions"][i]["displayJobTitle"])
            print("")
            print("")
        except:
            print("{} Openings at Hyland".format(i))
            break

for datum in data:
    hyland(url, data=datum, session=session)
Hope this helps.
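One editorial footnote on the token extraction above: slicing at raw.index("token") + 8 depends on the exact JSON spacing in the page source. A regex is a little more tolerant (a sketch, assuming the page embeds the token as "token":"..."):

import re

match = re.search(r'"token"\s*:\s*"([^"]+)"', raw)
if match:
    bearer_token = f"Bearer {match.group(1)}"
else:
    raise RuntimeError("token not found in page source")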
