Requests vs Curl - python-3.x

I have an application running on AWS that makes a request to a page to pull meta tags using requests. I'm finding that page is allowing curl requests, but not allowing requests from the requests library.
Works:
curl https://www.seattletimes.com/nation-world/mount-st-helens-which-erupted-41-years-ago-starts-reopening-after-covid-closures/
Hangs Forever:
import requests
requests.get('https://www.seattletimes.com/nation-world/mount-st-helens-which-erupted-41-years-ago-starts-reopening-after-covid-closures/')
What is the difference between curl and requests here? Should I just spawn a curl process to make my requests?

Either of the user agents below does indeed work. One can also use the user_agent module (available on PyPI) to generate random, valid web user agents.
import requests
agent = (
"Mozilla/5.0 (X11; Linux x86_64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/85.0.4183.102 Safari/537.36"
)
# or can use
# agent = "curl/7.61.1"
url = ("https://www.seattletimes.com/nation-world/"
"mount-st-helens-which-erupted-41-years-ago-starts-reopening-after-covid-closures/")
r = requests.get(url, headers={'user-agent': agent})
Or, using the user_agent module:
import requests
from user_agent import generate_user_agent
agent = generate_user_agent()
url = ("https://www.seattletimes.com/nation-world/"
"mount-st-helens-which-erupted-41-years-ago-starts-reopening-after-covid-closures/")
r = requests.get(url, headers={'user-agent': agent})
To further explain: requests sets a default User-Agent header (of the form python-requests/2.25.1, as shown below), and The Seattle Times is blocking that user agent. However, with python-requests one can easily change the header parameters of the request, as shown above.
To illustrate the default parameters:
r = requests.get('https://google.com/')
print(r.request.headers)
>>> {'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
vs. the updated header parameter
agent = "curl/7.61.1"
r = requests.get('https://google.com/', headers={'user-agent': agent})
print(r.request.headers)
>>> {'user-agent': 'curl/7.61.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
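As an aside, if you make many requests it can be cleaner to set the header once on a requests.Session, which then applies to every request. And since requests has no default timeout, passing one explicitly avoids the original "hangs forever" symptom even when a site stalls the connection. A minimal sketch:
import requests

session = requests.Session()
# Every request made through this session now carries the custom user agent.
session.headers.update({'user-agent': 'curl/7.61.1'})

url = ("https://www.seattletimes.com/nation-world/"
       "mount-st-helens-which-erupted-41-years-ago-starts-reopening-after-covid-closures/")

# requests never times out by default; always pass a timeout so a stalled
# request raises an exception instead of hanging forever.
r = session.get(url, timeout=10)
print(r.status_code)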

Related

Steam API Store Sales

I would like to write a little script with Node.js that tracks interesting Steam promos.
First of all I would like to retrieve the list of games on sale.
I tried several things, without success:
A GET request on the store.steampowered.com page (works, but only shows the first 50 results, because the rest only appear when you scroll to the bottom of the page)
Using the API, but I would then need to retrieve the list of all games, and it would take too long to check whether each one is on sale
If anyone has a solution, I'm interested.
Thanks a lot
You can get the list of featured games by sending a GET request to https://store.steampowered.com/api/featuredcategories, though this may not give you all of the results you're looking for.
import requests
url = "https://store.steampowered.com/api/featuredcategories/?l=english"
res = requests.get(url)
print(res.json())
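From there you can pull the items in the "specials" category, which holds the current sale entries. The field names below ("discount_percent", "final_price", and so on) come from observed responses rather than official documentation, so verify them against the JSON you actually receive:
import requests

url = "https://store.steampowered.com/api/featuredcategories/?l=english"
data = requests.get(url, timeout=10).json()

# "specials" is assumed to hold the sale items; field names are unofficial.
for item in data.get("specials", {}).get("items", []):
    name = item.get("name")
    discount = item.get("discount_percent")
    final = item.get("final_price")  # prices appear to be in minor units (e.g. cents)
    print(f"{name}: -{discount}% -> {final}")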
You can also get all the games on sale by sending a GET request to https://steamdb.info/sales/ and doing some extensive HTML parsing. Note that SteamDB is not maintained by Valve at all.
Edit: The following script does the GET request.
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'DNT': '1',
    'Alt-Used': 'steamdb.info',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Cache-Control': 'max-age=0',
    'TE': 'Trailers',
}

response = requests.get('https://steamdb.info/sales/', headers=headers)
print(response)
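As for the "extensive HTML parsing", it depends entirely on SteamDB's markup, which changes over time; the selector below is a guess to illustrate the approach, not a tested recipe, so inspect the live page and adjust:
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')
# Hypothetical selector: sale rows have appeared as <tr> elements carrying
# a data-appid attribute. Verify against the current markup before relying on it.
for row in soup.select('tr[data-appid]'):
    appid = row.get('data-appid')
    link = row.find('a')
    name = link.get_text(strip=True) if link else None
    print(appid, name)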

Check if the path is dynamically generated in web scraping

I am scraping data from trip.com, a hotel listing website. After entering the details and clicking the search button, the search results are displayed in a new tab and generated dynamically. When I scroll down the page, more results are downloaded and displayed. As I understand it, to scrape the dynamically generated data I need the header information of the API that returns the JSON. The issue is that this site generates one of its request parameters dynamically, and seemingly in an encrypted format as well. This is my request URL:
Request URL: https://www.trip.com/restapi/soa2/16709/json/rateplan?testab=ec23b14de9ad450c7b74612efc288bfdd523314036afe19b5fe135f206284aab
and this is my request header:
:authority: www.trip.com
:method: POST
:path: /restapi/soa2/16709/json/rateplan?testab=ec23b14de9ad450c7b74612efc288bfdd523314036afe19b5fe135f206284aab
:scheme: https
accept: application/json
accept-encoding: gzip, deflate, br
accept-language: en-GB,en-US;q=0.9,en;q=0.8
cache-control: no-cache
content-length: 1697
content-type: application/json
cookie: ibulanguage=EN; cookiePricesDisplayed=USD; ibu_online_home_language_match={"isFromTWNotZh":false,"isFromIPRedirect":false,"isFromLastVisited":false,"isRedirect":false,"isShowSuggestion":false,"lastVisited":""}; _abtest_userid=55c19cf3-dcd6-4f4a-bfba-5965c52ac66c; _tp_search_latest_channel_name=hotels; _RF1=45.115.185.74; _RSG=BJ4Q9HdNV80BpEgEyf8ZZ9; _RDG=286d5feba1bdad2eee089fc228174f22ec; _RGUID=021f5e74-4968-44cb-98e3-229f0ea8eccb; ibulocale=en_us; g_state={"i_p":1600591022929,"i_l":3}; Union=AllianceID=1078337&SID=2036545&OUID=ctag.hash.d23ecf76442c&SourceID=&AppID=&OpenID=&Expires=1602581159329&createtime=1599989159; IBU_TRANCE_LOG_URL=/hotels/mumbai-hotel-detail-762871/grand-hyatt-mumbai/?checkIn=2020-09-14&checkOut=2020-09-15&cityId=724&adult=2&children=0&ages=&crn=1&travelpurpose=0&curr=USD&showtotalamt=0&hoteluniquekey=H4sIAAAAAAAAAOPaycjFK8Fk8B8GGIWYOBilFjNyfJl7U12Iy9DE0sTczNzQwMhgCrNFs44jAwgcaHDwBDMKWh0CeCYxSnKCeef3OAiC6AbVnQ5OrBxr_SRYZjB-P663gpFxIyNEY5LDDkamE4x-C5j-PnnDvIuJleM1uwTTISA9SVCC5RQTwyUmhltMDI-YGF4xMXxiYvgFVdHEzNDFzDCJGaJuFjPDImYGIRaQG6UUjMxTjI0NE00tzYzMTSwT00B0qplJYpKxUXKiuaW5ArdG16GPv1iNGKyYpRjdPBiD2Iwd3SyMXKJkuJg9_YIE4xpqS16d2m4vxRwa7KKoqyj_JSdM2iGJNTVPNyIi4x1LAWMXI5MA4yRGTo7m3U8-Mp5gTAYA1R43aDgBAAA(; librauuid=3lSNuDO18464CG5a; intl_ht1=h4%3D724_762871; hotel=762871; hotelhst=1164390341; _bfa=1.1599889636407.b231b.1.1599996200640.1600004365027.18.57; _bfs=1.1; _bfi=p1%3D10320668147%26p2%3D10320668147%26v1%3D57%26v2%3D56; IBU_TRANCE_LOG_P=22266407054
origin: https://www.trip.com
p: 22266407054
pid: 584e7499-4df6-45dd-8242-94cb5dec36c5
pragma: no-cache
referer: https://www.trip.com/hotels/mumbai-hotel-detail-762871/grand-hyatt-mumbai/?checkIn=2020-09-14&checkOut=2020-09-15&cityId=724&adult=2&children=0&ages=&crn=1&travelpurpose=0&curr=USD&showtotalamt=0&hoteluniquekey=H4sIAAAAAAAAAOPaycjFK8Fk8B8GGIWYOBilFjNyfJl7U12Iy9DE0sTczNzQwMhgCrNFs44jAwgcaHDwBDMKWh0CeCYxSnKCeef3OAiC6AbVnQ5OrBxr_SRYZjB-P663gpFxIyNEY5LDDkamE4x-C5j-PnnDvIuJleM1uwTTISA9SVCC5RQTwyUmhltMDI-YGF4xMXxiYvgFVdHEzNDFzDCJGaJuFjPDImYGIRaQG6UUjMxTjI0NE00tzYzMTSwT00B0qplJYpKxUXKiuaW5ArdG16GPv1iNGKyYpRjdPBiD2Iwd3SyMXKJkuJg9_YIE4xpqS16d2m4vxRwa7KKoqyj_JSdM2iGJNTVPNyIi4x1LAWMXI5MA4yRGTo7m3U8-Mp5gTAYA1R43aDgBAAA(
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36
Here the value of the testab parameter is generated dynamically when I scroll down on the site, but I am not able to understand how it is being generated. Is it generated by encrypting the rest of the request header info? FYI, I have all the request header info except the "path" value. If the value is generated by encryption, how do I proceed with scraping this? Also, I cannot use Selenium or any browser-based scraping here.
The testab value is being generated at random using the following JavaScript in the file https://ak-s.tripcdn.com/modules/ibu/ibu-hotel-online/smart/smart.353046e23a610af9fcf9.js
key: "gencb",
value: function gencb(r) {
    var o = function() {
        for (var e = "qwertyuiopasdfg$hjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM", t = "", n = 0; n < 10; n++)
            t += e.charAt(~~(Math.random() * e.length));
        return t
    }();
    return window[o] = function(e) {
        delete window[o];
        var t = e(),
            n = "?";
        r.realUrl && 0 < r.realUrl.indexOf("?") && (n = "&"),
        r.realUrl += n + "testab=" + encodeURIComponent(t)
    },
    o
}
It is then written to the server in a POST request to https://www.trip.com/restapi/soa2/16709/json/getHotelScript, but it is encrypted in the hotelUuidKey. Unless you are able to crack the encryption, you had better render the page using JavaScript.
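For illustration, the random part of gencb is easy to reproduce in Python. Note that this only recreates the random callback name (the variable o above); the testab value itself (t = e()) comes out of that encrypted server response:
import random

# Same alphabet as gencb; builds a random 10-character name for the
# one-shot window callback. This is NOT the testab value itself.
ALPHABET = "qwertyuiopasdfg$hjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM"

def random_callback_name(length=10):
    return "".join(random.choice(ALPHABET) for _ in range(length))

print(random_callback_name())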
You say you can't use Selenium or any browser-based solution, but have you looked at PyQt?
https://doc.qt.io/qt-5/qtwebengine-overview.html#qt-webengine-core-module
The Qt WebEngine core is based on the Chromium Project. Chromium provides its own network and painting engines and is developed tightly together with its dependent modules.
Note: Qt WebEngine is based on Chromium, but does not contain or use any services or add-ons that might be part of the Chrome browser that is built and delivered by Google.
import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineCore import QWebEngineUrlRequestInterceptor
from PyQt5.QtWebEngineWidgets import QWebEngineView, QWebEnginePage, QWebEngineProfile

class WebEngineUrlRequestInterceptor(QWebEngineUrlRequestInterceptor):
    def interceptRequest(self, info):
        if info.requestUrl().url().startswith('https://www.trip.com/restapi/soa2/16709/json/rateplan?testab='):
            print(info.requestUrl().url())
            # Do stuff
            sys.exit()

class MyWebEnginePage(QWebEnginePage):
    def acceptNavigationRequest(self, url, _type, isMainFrame):
        return QWebEnginePage.acceptNavigationRequest(self, url, _type, isMainFrame)

if __name__ == "__main__":
    app = QApplication(sys.argv)
    browser = QWebEngineView()
    interceptor = WebEngineUrlRequestInterceptor()
    profile = QWebEngineProfile()
    profile.setRequestInterceptor(interceptor)
    page = MyWebEnginePage(profile, browser)
    url = 'https://www.trip.com/hotels/mumbai-hotel-detail-762871/grand-hyatt-mumbai/?checkIn=2020-09-14&checkOut=2020-09-15&cityId=724&adult=2&children=0&ages=&crn=1&travelpurpose=0&curr=USD&showtotalamt=0&hoteluniquekey=H4sIAAAAAAAAAOPaycjFK8Fk8B8GGIWYOBilFjNyfJl7U12Iy9DE0sTczNzQwMhgCrNFs44jAwgcaHDwBDMKWh0CeCYxSnKCeef3OAiC6AbVnQ5OrBxr_SRYZjB-P663gpFxIyNEY5LDDkamE4x-C5j-PnnDvIuJleM1uwTTISA9SVCC5RQTwyUmhltMDI-YGF4xMXxiYvgFVdHEzNDFzDCJGaJuFjPDImYGIRaQG6UUjMxTjI0NE00tzYzMTSwT00B0qplJYpKxUXKiuaW5ArdG16GPv1iNGKyYpRjdPBiD2Iwd3SyMXKJkuJg9_YIE4xpqS16d2m4vxRwa7KKoqyj_JSdM2iGJNTVPNyIi4x1LAWMXI5MA4yRGTo7m3U8-Mp5gTAYA1R43aDgBAAA('
    page.setUrl(QUrl(url))
    browser.setPage(page)
    browser.show()
    sys.exit(app.exec_())
Adapted from https://stackoverflow.com/a/50786759/839338 authored by eyllanesc
Outputs the link (and a few warnings) e.g.
https://www.trip.com/restapi/soa2/16709/json/rateplan?testab=15feb5b1067d2e4e2b979fe97830d884c5e3a07*e145f7¼(5955400Z380ac6a6
Updated in response to comment
You just need to grab the cookies and make a request. Very quick and dirty code is below.
import requests
import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineCore import QWebEngineUrlRequestInterceptor
from PyQt5.QtWebEngineWidgets import QWebEngineView, QWebEnginePage, QWebEngineProfile
from PyQt5.QtNetwork import QNetworkCookie

class WebEngineUrlRequestInterceptor(QWebEngineUrlRequestInterceptor):
    def __init__(self, on_network_call):
        super().__init__()
        self.on_network_call = on_network_call

    def interceptRequest(self, info):
        if info.requestUrl().url().startswith('https://www.trip.com/restapi/soa2/16709/json/rateplan?testab='):
            self.on_network_call(info)
            sys.exit()

class MyWebEnginePage(QWebEnginePage):
    def acceptNavigationRequest(self, url, _type, isMainFrame):
        return QWebEnginePage.acceptNavigationRequest(self, url, _type, isMainFrame)

def on_network_call(info):
    print(info.requestUrl().url())
    headers = {
        'authority': 'www.trip.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'accept': 'application/json',
        'dnt': '1',
        'p': '99783168614',
        'pid': '256f8038-1c06-4173-99b5-880dc120042f',
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
        'content-type': 'application/json',
        'origin': 'https://www.trip.com',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'cors',
        'sec-fetch-dest': 'empty',
        'referer': 'https://www.trip.com/hotels/mumbai-hotel-detail-762871/grand-hyatt-mumbai/?checkIn=2020-09-14&checkOut=2020-09-15&cityId=724&adult=2&children=0&ages=&crn=1&travelpurpose=0&curr=USD&showtotalamt=0&hoteluniquekey=H4sIAAAAAAAAAOPaycjFK8Fk8B8GGIWYOBilFjNyfJl7U12Iy9DE0sTczNzQwMhgCrNFs44jAwgcaHDwBDMKWh0CeCYxSnKCeef3OAiC6AbVnQ5OrBxr_SRYZjB-P663gpFxIyNEY5LDDkamE4x-C5j-PnnDvIuJleM1uwTTISA9SVCC5RQTwyUmhltMDI-YGF4xMXxiYvgFVdHEzNDFzDCJGaJuFjPDImYGIRaQG6UUjMxTjI0NE00tzYzMTSwT00B0qplJYpKxUXKiuaW5ArdG16GPv1iNGKyYpRjdPBiD2Iwd3SyMXKJkuJg9_YIE4xpqS16d2m4vxRwa7KKoqyj_JSdM2iGJNTVPNyIi4x1LAWMXI5MA4yRGTo7m3U8-Mp5gTAYA1R43aDgBAAA(sec-fetch-dest:%20empty',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    }
    data = '{"checkIn":"2020-09-15","checkOut":"2020-09-16","priceType":"0","adult":2,"popularFacilityType":"","hotelUniqueKey":"H4sIAAAAAAAAAOPaycjFK8Fk8B8GGIWYOBilFjNyfJl7U12Iy9DE0sTczNzQwMhgCrNFs44jAwgcaHDwBDMKWh0CeCYxSnKCeef3OAiC6AbVnQ5OrBxr_SRYZjB-P663gpFxIyNEY5LDDkamE4x-C5j-PnnDvIuJleM1uwTTISA9SVCC5RQTwyUmhltMDI-YGF4xMXxiYvgFVdHEzNDFzDCJGaJuFjPDImYGIRaQG6UUjMxTjI0NE00tzYzMTSwT00B0qplJYpKxUXKiuaW5ArdG16GPv1iNGKyYpRjdPBiD2Iwd3SyMXKJkuJg9_YIE4xpqS16d2m4vxRwa7KKoqyj_JSdM2iGJNTVPNyIi4x1LAWMXI5MA4yRGTo7m3U8-Mp5gTAYA1R43aDgBAAA(sec-fetch-dest:%20empty","child":0,"roomNum":1,"masterHotelId":762871,"age":"","cityId":"724","hotel":"762871","versionControl":[{"key":"RoomCardVersionB","value":"T"}],"signInRoomKey":"","signInType":0,"filterCondition":null,"unAvailableRoomInfo":null,"minPriceRoomKey":"","Head":{"Locale":"en-XX","Currency":"USD","AID":"","SID":"","ClientID":"1600039009299.2v21ry","OUID":"","CAID":"","CSID":"","COUID":"","TimeZone":"1","PageID":"10320668147","HotelExtension":{"WebpSupport":true,"Qid":"","hasAidInUrl":false,"group":"TRIP","PID":"256f8038-1c06-4173-99b5-880dc120042f","hotelUuidKey":"S96K39i7Te47IA7idYlfYp6E3YLpemawnOWOYhgjs6wZFv0lEPYtNjoSwHSybpjsY1pKL4KazvlLjFYoTvU1YQByTZjUBvc9ed7YG9jHZy5Y1fekTv0NEghwGqWbsenZi8BwMYtY5OInLeo9YmDvFSeDrNbeUZjnkwDfY7bwzSEkY1dRSYX0INbWBYaqYonikdikSiXNj5Y5bjSQi4gYBkwPoJoGRcaYT7woY0ZR7fwa7W6XW4hR7BRqpJT4JMfy9SEcbRgaE4ZEaY4FyfQK11xomETtvc1KQtY3aWGBr90yBXET9vSOvhkyg1E9DJGYUaRkNwG3W9fW6QWf7iDOv5DWqbWFHvfSYHdvdtvOYaXjOcwLkvthjUYAqR9ZwqdjAHW53eZPROqWzSJ3PWPYPnRgqwmFW43jDSePDRBPWtcY3niTYHpRqLwUgWz6WPURD1RUZJ8bJ73ytTEFlWGmW6G","hotelUuid":"dhX4uhn0MdpHusaD"},"Frontend":{"vid":"1600039009299.2v21ry","sessionID":2,"pvid":6},"P":"99783168614","Device":"PC","Version":"0"}}'
    r = requests.post(info.requestUrl().url(), cookies=to_cookie_dict(), data=data, headers=headers)
    print(r.json())

def on_cookie_added(cookie):
    for c in cookies:
        if c.hasSameIdentifier(cookie):
            return
    cookies.append(QNetworkCookie(cookie))

def to_cookie_dict():
    cookie_dict = {}
    for c in cookies:
        cookie_dict[bytearray(c.name()).decode()] = bytearray(c.value()).decode()
    print(cookie_dict)
    return cookie_dict

if __name__ == "__main__":
    app = QApplication(sys.argv)
    browser = QWebEngineView()
    interceptor = WebEngineUrlRequestInterceptor(on_network_call)
    profile = QWebEngineProfile()
    cookie_store = profile.cookieStore()
    cookie_store.cookieAdded.connect(on_cookie_added)
    cookies = []
    profile.setRequestInterceptor(interceptor)
    page = MyWebEnginePage(profile, browser)
    url = 'https://www.trip.com/hotels/mumbai-hotel-detail-762871/grand-hyatt-mumbai/?checkIn=2020-09-14&checkOut=2020-09-15&cityId=724&adult=2&children=0&ages=&crn=1&travelpurpose=0&curr=USD&showtotalamt=0&hoteluniquekey=H4sIAAAAAAAAAOPaycjFK8Fk8B8GGIWYOBilFjNyfJl7U12Iy9DE0sTczNzQwMhgCrNFs44jAwgcaHDwBDMKWh0CeCYxSnKCeef3OAiC6AbVnQ5OrBxr_SRYZjB-P663gpFxIyNEY5LDDkamE4x-C5j-PnnDvIuJleM1uwTTISA9SVCC5RQTwyUmhltMDI-YGF4xMXxiYvgFVdHEzNDFzDCJGaJuFjPDImYGIRaQG6UUjMxTjI0NE00tzYzMTSwT00B0qplJYpKxUXKiuaW5ArdG16GPv1iNGKyYpRjdPBiD2Iwd3SyMXKJkuJg9_YIE4xpqS16d2m4vxRwa7KKoqyj_JSdM2iGJNTVPNyIi4x1LAWMXI5MA4yRGTo7m3U8-Mp5gTAYA1R43aDgBAAA('
    page.setUrl(QUrl(url))
    browser.setPage(page)
    browser.show()
    sys.exit(app.exec_())
Thanks to How to capture the response of a request intercepted by QWebEngineUrlRequestInterceptor? authored by eriel marimon and https://stackoverflow.com/a/48154459/839338 authored by eyllanesc

Discord.py Register request headers?

I am attempting to register a user in Discord using the following code:
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7',
    'Content-Type': 'application/json'
}

r = requests.post('https://canary.discordapp.com/api/v6/auth/register', headers=headers)
print(r)
The output I am getting is HTTP Error 400.
Question: what headers should I use for this to succeed?
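As a pointer rather than a full answer: HTTP 400 generally means the request itself is malformed, not that the headers are wrong, and this endpoint expects a JSON body describing the account to create. A hypothetical sketch of sending one with requests (the payload field names are illustrative assumptions, not Discord's documented schema, and automated account registration almost certainly violates Discord's Terms of Service):
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7',
}

# Hypothetical payload: field names here are assumptions for illustration.
payload = {
    'email': 'user@example.com',
    'username': 'example',
    'password': 'hunter2',
}

# Passing json= serializes the dict and sets Content-Type automatically.
r = requests.post('https://canary.discordapp.com/api/v6/auth/register',
                  headers=headers, json=payload)
print(r.status_code, r.text)  # the error body usually names what is missing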

Scraping links with BeautifulSoup from all pages in Amazon results in error

I'm trying to scrape product URLs from the Amazon Webshop, by going through every page.
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "DNT": "1",
    "Connection": "close",
    "Upgrade-Insecure-Requests": "1",
}

products = set()

for i in range(1, 21):
    url = 'https://www.amazon.fr/s?k=phone%2Bcase&page=' + str(i)
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    print(soup)  # prints the HTML content saying Error on Amazon's side
    links = soup.select('a.a-link-normal.a-text-normal')
    for tag in links:
        url_product = 'https://www.amazon.fr' + tag.attrs['href']
        products.add(url_product)
Instead of getting the content of the page, I get a "Sorry, something went wrong on our end" HTML Error Page. What is the reason behind this? How can I successfully bypass this error and scrape the products?
Be informed that Amazon does not allow automated access to its data. You can confirm this by checking the response via r.status_code, which can lead you to this error message:
To discuss automated access to Amazon data please contact api-services-support@amazon.com
Therefore you can use the Amazon API, or you can rotate proxies by passing a proxies mapping to the GET request, e.g. requests.get(url, headers=headers, proxies=proxies).
Here's a way to pass headers to Amazon without getting blocked, and it works:
import requests
from bs4 import BeautifulSoup

headers = {
    'Host': 'www.amazon.fr',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'TE': 'Trailers'
}

for page in range(1, 21):
    # Note the f-string: without the f prefix the {page} placeholders
    # would be sent to Amazon literally instead of being substituted.
    r = requests.get(
        f'https://www.amazon.fr/s?k=phone+case&page={page}&ref=sr_pg_{page}', headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    for link in soup.findAll('a', attrs={'class': 'a-link-normal a-text-normal'}):
        print(f"https://www.amazon.fr{link.get('href')}")
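Even with browser-like headers, Amazon throttles aggressively, so it is worth checking the status code and backing off between pages. A small sketch, reusing the headers dict from above:
import time
import requests

for page in range(1, 21):
    url = f'https://www.amazon.fr/s?k=phone+case&page={page}&ref=sr_pg_{page}'
    r = requests.get(url, headers=headers)
    if r.status_code != 200:
        # Blocked or throttled: wait and retry once instead of hammering.
        time.sleep(10)
        r = requests.get(url, headers=headers)
    # A polite delay between pages reduces the chance of being blocked.
    time.sleep(2)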

Docusign API: GET userID by e-mail address

I am creating an app in Python that uses the DocuSign API to delete user accounts. It appears I'll need the user's userId to accomplish this, so I need to make two calls: one to get the userId and one to delete the user.
The problem is that when I make a requests.get() for the user, I get every user account.
import sys
import requests

email = sys.argv[1]
account_id = "<account_id goes here>"
auth = 'Bearer <long token goes here>'

head = {
    'Accept': 'application/json',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8,fa;q=0.6,sv;q=0.4',
    'Cache-Control': 'no-cache',
    'Origin': 'https://apiexplorer.docusign.com',
    'Referer': 'https://apiexplorer.docusign.com/',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36',
    'Authorization': auth,
    'Content-Length': '100',
    'Content-Type': 'application/json'
}

url = 'https://demo.docusign.net/restapi/v2/accounts/{}/users'.format(account_id)
data = {"users": [{"email": email}]}
response = requests.get(url, headers=head, json=data)
print(response.text)
Why do I get a response.text with every user? And how can I just get a single user's information based on the e-mail address?
With the v2 API you can do this using the email_substring query parameter that allows you to search on specific users.
GET /v2/accounts/{accountId}/users?email_substring=youremail
https://developers.docusign.com/esign-rest-api/v2/reference/Users/Users/list
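Putting both calls together with requests might look like the sketch below, reusing account_id, head, and email from the question. The userId field and the delete call follow the v2 Users reference (Users:list and Users:delete), but verify both against your account before relying on this:
import requests

base = 'https://demo.docusign.net/restapi/v2/accounts/{}'.format(account_id)

# 1) Look up the user; email_substring filters the list server-side.
r = requests.get(base + '/users', headers=head,
                 params={'email_substring': email})
users = r.json().get('users', [])

if users:
    user_id = users[0]['userId']
    # 2) Delete (close) the user; Users:delete takes the userId in the body.
    r = requests.delete(base + '/users', headers=head,
                        json={'users': [{'userId': user_id}]})
    print(r.status_code, r.text)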
