How to mimic an XHR request in Ahrefs.com with Python? - python-3.x

I am trying to scrape data from a page where the data comes from an XHR request. The request is made when the user clicks a link. I've been trying to mimic the request with my scraper by using the hash in the link's 'onClick' attribute. I can get it to work for the first link, but I need to iterate over each of the links on the page. The page itself is behind a login, so I am reproducing below as much as I think is needed.
The bigger question is this: am I even trying the right thing? Should I use something completely different? Any direction is appreciated.
I am still quite new to this, so my thought is there is perhaps something obvious that I am missing.
Here is my code:
import requests
import json
headers = {
'accept':'application/json, text/javascript, */*; q=0.01',
'accept-encoding':'gzip, deflate, br',
'accept-language':'en-US,en;q=0.9',
#'content-length':'93',
'content-type':'application/x-www-form-urlencoded; charset=UTF-8',
'cookie':'ajs_anonymous_id=%2254791bca-cf67-4536-bd9e-231dcf9157ef%22; ajs_group_id=null; ajs_user_id=108300; _vis_opt_s=1%7C; _vwo_uuid_22=654C6CD9AE3842F79BD796977EB8A53F; _vis_opt_exp_22_combi=2; new_RT=1; _vwo_uuid_v2=81F71C91AD0F1AEA7D52FA84E4BCD177|9f489a6cfd5cc6a02bc483a2f51a4a34; intercom-id-dic5omcp=9bfc00e2-1f81-449f-ad47-a79360ad0b5e; _ga=GA1.2.1205864433.1504618215; intercom-lou-dic5omcp=1; PHPSESSID=ldbj3g6tnrq9k8qlekn1813fqc; ljs-lang=en; _gid=GA1.2.895488268.1516275764; intercom-session-dic5omcp=THQ1VW1zT3JpWGRVVitTM0hjbUdMRkRkOGRxamQ5Zkx4VUlrQVNiMkI1VSsxdFIwbTdTbGlrNitGakh4QlRmZC0tRkY3UE10WmxYWldaZVBKT2VGcm9SQT09--5f518888fe0a0a2d6f59fcbc441e827d45674ec0; XSRF-TOKEN=eyJpdiI6ImtOM1RIcG50aCtGY0lsZHdQcXVBcEE9PSIsInZhbHVlIjoiU1FNM2xsYmhYZXZGRTRlSmNaWmM3SXVZMjZpVzNyWkZRcVdqSm0rcHczd0xRSHJyQnpGbzVOZzlMWUtTaWJyeU5SMWlUUHZCcERZbEVWYUtuOUtaWlE9PSIsIm1hYyI6IjFlOTA1MWE0NjYzM2RiNGQyNTkyMmM2ZmI0MmM1MmE3Y2M2N2M1MTdjYTJiMGY1MTA2NTM1ZmFjZjUwOTRjMzQifQ%3D%3D; ahrefs_cookie=eyJpdiI6ImNPcHF4Sm9LclZTKzh4WEZ1aHlKSnc9PSIsInZhbHVlIjoiWHZcLytZTkFUK01XUE9SYWtkOCtlY1dcL1oyWDNlQURINmJ3M25cLzhWXC8yNFwvU3BQVTB2ZU50cG51QnRxeG9nQlFwMjVoa01taUxPYTFsSzVxSEZ3ZEZGUT09IiwibWFjIjoiNGFlNmUzMmQ3ZTQwYzgyM2JmN2U4NzU2NzgwYmY4ZGM0MjZhMDBhYWE1NjY3NGRhYmU1MDIyOTQzOWY4ZWExYiJ9',
'origin':'https://ahrefs.com',
'referer':'https://ahrefs.com/link-intersect/result/3/common:desc?linking[0]=http%3A%2F%2Fwww.grandchancellorhotels.com%2Fau%2Fmelbourne%2F&linking[1]=https%3A%2F%2Fwww.oakshotels.com%2Foaks-on-lonsdale&linking[2]=https%3A%2F%2Fcrossleyhotel.com.au%2F&linking[3]=https%3A%2F%2Fwww.rydges.com%2Faccommodation%2Fmelbourne-vic%2Fmelbourne-cbd%2F&linking[4]=http%3A%2F%2Fwww.thehotelwindsor.com.au%2F&linking[5]=http%3A%2F%2Fwww.spacehotel.com.au%2F&linking[6]=http%3A%2F%2Fwww.stamford.com.au%2Fspm%2F&linking[7]=https%3A%2F%2Fwww.citadines.com%2Fen%2Faustralia%2Fmelbourne%2Fcitadines-on-bourke-melbourne%2F&linking[8]=https%3A%2F%2Fmelbourne.grand.hyatt.com&linking[9]=http%3A%2F%2Fwww.somerset.com%2Faustralia%2Fmelbourne%2Fsomerset-on-elizabeth-melbourne%2F&linking-modes[0]=prefix&linking-modes[1]=prefix&linking-modes[2]=subdomains&linking-modes[3]=prefix&linking-modes[4]=subdomains&linking-modes[5]=subdomains&linking-modes[6]=prefix&linking-modes[7]=prefix&linking-modes[8]=prefix&linking-modes[9]=prefix&no-linking[0]=http%3A%2F%2Fwww.marriott.com%2Fhotels%2Ftravel%2Fmelmc-melbourne-marriott-hotel%2F&no-linking-modes[0]=prefix&is_union=1',
'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
'x-csrf-token':'kbIUUA1ygDlENDGohYxYbEhLPUVZZBIR2hWOcGoB',
'x-requested-with':'XMLHttpRequest'
}
data={
'for_domain':'cityaccommodations.com.au',
'domain_index':101,
'target_index':3,
'offset':0,
'count':0,
'total_count':4
}
r = requests.post('https://ahrefs.com/site-explorer/ajax/examples/backlinks-for-intersect-domain/d72718a2895587e32a5a869c3858df14', headers=headers, data=data)
response = r.text
python_object = json.loads(response)
print(python_object)
for item in python_object:
    print(item)
    for thing in item:
        print(". " + thing)
print(r.text)
It doesn't raise an error, but the JSON returned has an empty array for 'result', and I noticed 'cache_expired' is true. The JSON returned, if it works properly, would be:
{"examples_data":{"result":[{"url_from":"http:\/\/cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","ahrefs_rank":8,"domain_rating":45,"ahrefs_top":0,"ip_from":"182.50.132.56","links_internal":"14","links_external":"7","page_size":"5 kB","encoding":"iso-8859-1","title":"The Crossley Hotel Managed by Mecure - Accommodation - Melbourne City, Little Bourke St, Melbourne","language":"en","url_to":"https:\/\/crossleyhotel.com.au\/","first_seen":"27 Jun '17","last_visited":"16 Jan '18","prev_visited":"1 Dec '17","original":false,"redirect":0,"alt":"","anchor":"www.crossleyhotel.com.au","text_pre":"Website:","text_post":"","http_code":200,"url_from_first_seen":"2016-01-23T19:23:54Z","first_origin":"recrawl","last_origin":"recrawl","sitewide":false,"link_type":"Nofollow","nofollow":true,"number":1,"type":"","powered_by":[],"WidthAhrefsRank":0,"IsRedirectChain":false,"rows_limit_not_exceeded":true,"authorized_user":true,"free_user":false,"lite_user":false,"standard_user":false,"free_or_lite_user":false,"free_or_lite_or_standard_user":false,"position_limit_not_exceeded":true,"Visited":"16 Jan '18","TimeAgoSecond":true,"VisitedLess2Days":false,"VisitedMore2Days":true,"VisitedLess1Hour":false,"VisitedMore1Hour":false,"VisitedMore1Month":false,"VisitedDays":3,"VisitedHours":0,"VisitedMinutes":0,"new_ref_page":true,"curr_ref_page":false,"curr_ref_page_number":1,"empty_anchor":false,"show_anchor":true,"space_between_anchor_and_post":" 
","http_code_name":"ok","https_url_from":false,"prepared_url_from":"http:\/\/cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","https_url_to":true,"prepared_url_to":"https:\/\/crossleyhotel.com.au\/","url_from_Parts":true,"url_from_Part2":"cityaccommodations.com.au","url_from_Part3":"\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","url_to_Parts":true,"url_to_Part2":"crossleyhotel.com.au","url_to_Part3":"\/","isJsRendered":false,"StatusDropped":false,"StatusPageRedirected":false,"StatusPageNoIndex":false,"StatusPageNonCanonical":false,"StatusLinkRemoved":false,"StatusLinkBrokenRedirect":false,"DateDeleted":false,"backlink_status":"","isLost":true,"deleted_at":"","empty_domain_rank":false,"low_domain_rank":true,"middle_domain_rank":false,"height_domain_rank":false,"DomainRating":45,"ExampleUniqueId":"101_domain3_1","ExampleTRId":"tr_example_backlinks","Language":"English","SocialMediaURLHash":"63bb0d959c467f5828824e310bb87690","available_social_media":true,"not_available_social_media":false},{"url_from":"http:\/\/cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","ahrefs_rank":8,"domain_rating":45,"ahrefs_top":0,"ip_from":"182.50.132.56","links_internal":"14","links_external":"7","page_size":"5 kB","encoding":"iso-8859-1","title":"The Crossley Hotel Managed by Mecure - Accommodation - Melbourne City, Little Bourke St, Melbourne","language":"en","url_to":"https:\/\/www.crossleyhotel.com.au\/","first_seen":"21 Apr '17","last_visited":"16 Jan '18","prev_visited":"1 Dec 
'17","original":false,"redirect":0,"alt":"","anchor":"www.crossleyhotel.com.au","text_pre":"Website:","text_post":"","http_code":200,"url_from_first_seen":"2016-01-23T19:23:54Z","first_origin":"recrawl","last_origin":"recrawl","sitewide":false,"link_type":"Nofollow","nofollow":true,"number":2,"type":"","powered_by":[],"WidthAhrefsRank":0,"IsRedirectChain":false,"rows_limit_not_exceeded":true,"authorized_user":true,"free_user":false,"lite_user":false,"standard_user":false,"free_or_lite_user":false,"free_or_lite_or_standard_user":false,"position_limit_not_exceeded":true,"Visited":"16 Jan '18","TimeAgoSecond":true,"VisitedLess2Days":false,"VisitedMore2Days":true,"VisitedLess1Hour":false,"VisitedMore1Hour":false,"VisitedMore1Month":false,"VisitedDays":3,"VisitedHours":0,"VisitedMinutes":0,"new_ref_page":false,"curr_ref_page":true,"curr_ref_page_number":1,"empty_anchor":false,"show_anchor":true,"space_between_anchor_and_post":" ","http_code_name":"ok","https_url_from":false,"prepared_url_from":"http:\/\/cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","https_url_to":true,"prepared_url_to":"https:\/\/www.crossleyhotel.com.au\/","url_from_Parts":true,"url_from_Part2":"cityaccommodations.com.au","url_from_Part3":"\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","url_to_Parts":true,"url_to_Part2":"www.crossleyhotel.com.au","url_to_Part3":"\/","isJsRendered":false,"StatusDropped":false,"StatusPageRedirected":false,"StatusPageNoIndex":false,"StatusPageNonCanonical":false,"StatusLinkRemoved":false,"StatusLinkBrokenRedirect":false,"DateDeleted":false,"backlink_status":"","isLost":true,"deleted_at":"","empty_domain_rank":false,"low_domain_rank":true,"middle_domain_rank":false,"height_domain_rank":false,"DomainRating":45,"ExampleUniqueId":"101_domain3_2","ExampleTRId":"tr_example_backlinks","Language":"English","SocialMediaURLHash":"63bb0d959c467f5828824e310bb87690","available_soci
al_media":true,"not_available_social_media":false},{"url_from":"http:\/\/www.cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","ahrefs_rank":8,"domain_rating":45,"ahrefs_top":0,"ip_from":"182.50.132.56","links_internal":"14","links_external":"7","page_size":"5 kB","encoding":"iso-8859-1","title":"The Crossley Hotel Managed by Mecure - Accommodation - Melbourne City, Little Bourke St, Melbourne","language":"","url_to":"https:\/\/crossleyhotel.com.au\/","first_seen":"14 Jul '17","last_visited":"11 Dec '17","prev_visited":"9 Nov '17","original":false,"redirect":0,"alt":"","anchor":"www.crossleyhotel.com.au","text_pre":"Website:","text_post":"","http_code":200,"url_from_first_seen":"2013-08-10T21:38:52Z","first_origin":"recrawl","last_origin":"recrawl","sitewide":false,"link_type":"Nofollow","nofollow":true,"number":3,"type":"","powered_by":[],"WidthAhrefsRank":0,"IsRedirectChain":false,"rows_limit_not_exceeded":true,"authorized_user":true,"free_user":false,"lite_user":false,"standard_user":false,"free_or_lite_user":false,"free_or_lite_or_standard_user":false,"position_limit_not_exceeded":true,"Visited":"11 Dec '17","TimeAgoSecond":true,"VisitedLess2Days":false,"VisitedMore2Days":false,"VisitedLess1Hour":false,"VisitedMore1Hour":false,"VisitedMore1Month":true,"VisitedDays":0,"VisitedHours":0,"VisitedMinutes":0,"new_ref_page":true,"curr_ref_page":false,"curr_ref_page_number":3,"empty_anchor":false,"show_anchor":true,"space_between_anchor_and_post":" 
","http_code_name":"ok","https_url_from":false,"prepared_url_from":"http:\/\/www.cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","https_url_to":true,"prepared_url_to":"https:\/\/crossleyhotel.com.au\/","url_from_Parts":true,"url_from_Part2":"www.cityaccommodations.com.au","url_from_Part3":"\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","url_to_Parts":true,"url_to_Part2":"crossleyhotel.com.au","url_to_Part3":"\/","isJsRendered":false,"StatusDropped":false,"StatusPageRedirected":false,"StatusPageNoIndex":false,"StatusPageNonCanonical":false,"StatusLinkRemoved":false,"StatusLinkBrokenRedirect":false,"DateDeleted":false,"backlink_status":"","isLost":true,"deleted_at":"","empty_domain_rank":false,"low_domain_rank":true,"middle_domain_rank":false,"height_domain_rank":false,"DomainRating":45,"ExampleUniqueId":"101_domain3_3","ExampleTRId":"tr_example_backlinks","Language":false,"SocialMediaURLHash":"0a29309c991ccbb6c4cbc8884a5274e0","available_social_media":true,"not_available_social_media":false},{"url_from":"http:\/\/www.cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","ahrefs_rank":8,"domain_rating":45,"ahrefs_top":0,"ip_from":"182.50.132.56","links_internal":"14","links_external":"7","page_size":"5 kB","encoding":"iso-8859-1","title":"The Crossley Hotel Managed by Mecure - Accommodation - Melbourne City, Little Bourke St, Melbourne","language":"","url_to":"https:\/\/www.crossleyhotel.com.au\/","first_seen":"3 May '17","last_visited":"11 Dec '17","prev_visited":"9 Nov 
'17","original":false,"redirect":0,"alt":"","anchor":"www.crossleyhotel.com.au","text_pre":"Website:","text_post":"","http_code":200,"url_from_first_seen":"2013-08-10T21:38:52Z","first_origin":"recrawl","last_origin":"recrawl","sitewide":false,"link_type":"Nofollow","nofollow":true,"number":4,"type":"","powered_by":[],"WidthAhrefsRank":0,"IsRedirectChain":false,"rows_limit_not_exceeded":true,"authorized_user":true,"free_user":false,"lite_user":false,"standard_user":false,"free_or_lite_user":false,"free_or_lite_or_standard_user":false,"position_limit_not_exceeded":true,"Visited":"11 Dec '17","TimeAgoSecond":true,"VisitedLess2Days":false,"VisitedMore2Days":false,"VisitedLess1Hour":false,"VisitedMore1Hour":false,"VisitedMore1Month":true,"VisitedDays":0,"VisitedHours":0,"VisitedMinutes":0,"new_ref_page":false,"curr_ref_page":true,"curr_ref_page_number":3,"empty_anchor":false,"show_anchor":true,"space_between_anchor_and_post":" ","http_code_name":"ok","https_url_from":false,"prepared_url_from":"http:\/\/www.cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","https_url_to":true,"prepared_url_to":"https:\/\/www.crossleyhotel.com.au\/","url_from_Parts":true,"url_from_Part2":"www.cityaccommodations.com.au","url_from_Part3":"\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","url_to_Parts":true,"url_to_Part2":"www.crossleyhotel.com.au","url_to_Part3":"\/","isJsRendered":false,"StatusDropped":false,"StatusPageRedirected":false,"StatusPageNoIndex":false,"StatusPageNonCanonical":false,"StatusLinkRemoved":false,"StatusLinkBrokenRedirect":false,"DateDeleted":false,"backlink_status":"","isLost":true,"deleted_at":"","empty_domain_rank":false,"low_domain_rank":true,"middle_domain_rank":false,"height_domain_rank":false,"DomainRating":45,"ExampleUniqueId":"101_domain3_4","ExampleTRId":"tr_example_backlinks","Language":false,"SocialMediaURLHash":"0a29309c991ccbb6c4cbc8884a5274e0","available_
social_media":true,"not_available_social_media":false}],"Allow":true,"APIURLHash":"1e52c833a6961148516eadc7c536c81c","projectPath":"\/site-explorer","AnchorIndex":0,"DomainIndex":101,"SnippetIndex":0,"URLIndex":0,"AggregatorIndex":0,"TargetIndex":3,"Count":10,"ForAnchor":"","ForDomain":"cityaccommodations.com.au","ForSnippet":"","ForURL":"https:\/\/crossleyhotel.com.au\/","ForAggregator":"","Offset":0,"TotalCount":4,"CachedSecurityHash":"d72718a2895587e32a5a869c3858df14","DateFrom":"","DateTo":"","ExamplesDataType":"backlinks_for_intersect_domain","DataType":"backlinks_for_intersect_domain","MaxAhrefsRank":0,"AvailableDisavowLinks":false,"DashboardID":false,"DisavowInterface":false,"DefaultHistoryMode":"recent","HistoryMode":"recent","DataRowsLeft":5000000,"linkExtIntType":"","linkForExport":"https:\/\/ahrefs.com\/site-explorer\/ajax\/examples\/backlinks-for-intersect-domain\/d72718a2895587e32a5a869c3858df14?for_domain=cityaccommodations.com.au&domain_index=101&target_index=3","ReferringDomainsHistoryExamples":false,"ReferringDomainsNewExamples":false,"ExamplesModeAnchors":false,"ExamplesModeDomains":false,"ExamplesModeSnippets":false,"ExamplesModeLinkedAnchors":false,"ExamplesModeTopContent":false,"ExamplesModeIntersect":true,"SELinkBacklinks":"\/site-explorer\/backlinks\/v5\/external\/exact\/recent\/all\/all\/1\/ahrefs_rank_desc?target=https:\/\/crossleyhotel.com.au\/","social_media":"99e442558b5cc604a33f8818bff91601"},"examples_html":"<div id=\"data_for_examples_container_101_domain3\" class=\"bg-lightblue intable relative p-x-2 m-b-2\">\n <table class=\"table table-ahrefs intable bg-lightblue b-b-1px\" id=\"examples_table_101_domain3\">\n <tbody>\n <tr>\n\t\t\t\t\t\t<th class=\"width-2 p-l-0\">Referring page<\/th>\n\t\t\t\t\t\t<th class=\"width-01\" title=\"Domain rating\">DR<\/th>\n\t\t\t\t\t\t<th class=\"width-01\" title=\"URL rating\">UR<\/th>\n\t\t\t\t\t\t<th class=\"width-01\" title=\"External links\">Ext.<\/th>\n\t\t\t\t\t\t<th class=\"width-2 
p-l-0\">Anchor and backlink<\/th>\n\t\t\t\t\t\t<th class=\"width-001 text-nowrap text-xs-left p-r-0\">First seen<br> Last check<\/th>\n <\/tr>\n <\/tbody>\n <\/table>\n <\/div>\n\t\t \n<div class=\"clearfix\"><\/div>","examples_footer_html":"<div class=\"examples-footer\">\n\t\t\t<div class=\"p-x-2 view-more-group m-b-2\">\n\t\t\t\t\t\t\t\t<a class=\"view-more border-right list-inline-item\" href=\"javascript:void(0);\" title=\"Hide All\"\n\t\t\t\t onclick=\"if ($('#examples_backlinks_container_101_domain3').length > 0) { $('#get_backlinks_link_101_domain3').click(); }\">\n\t\t\t\t\tHide All\n\t\t\t\t<\/a>\n\t\t\t\t<a class=\"clickable all__link\" title=\"Export all data in CSV\" data-limit=\"4\" onclick=\"CheckExportTimeout({link: 'https:\/\/ahrefs.com\/site-explorer\/ajax\/examples\/backlinks-for-intersect-domain\/d72718a2895587e32a5a869c3858df14?for_domain=cityaccommodations.com.au&domain_index=101&target_index=3', hash: '1e52c833a6961148516eadc7c536c81c', need_rows: 4, rawExportLeft: 5000000, limit_obj: this});\">\n <span class=\"icon icon--export colored\"><\/span> Export\n<\/a>\n\t\t\t<\/div>\n\t\t<\/div>\n\t","cache_expired":false}
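The check described in the question (an empty 'result' array together with the 'cache_expired' flag) can be sketched as a small helper. This is only a sketch, assuming the response has the shape shown above: an 'examples_data' object holding a 'result' list, plus a top-level 'cache_expired' flag. The name summarize_response and the two trimmed sample strings are hypothetical, and what the server means by 'cache_expired' is not known here.

```python
import json


def summarize_response(raw_text):
    """Return (row_count, cache_expired) for a response shaped like the JSON above."""
    payload = json.loads(raw_text)
    # 'examples_data' -> 'result' holds the backlink rows when the request succeeds.
    rows = payload.get("examples_data", {}).get("result", [])
    return len(rows), bool(payload.get("cache_expired", False))


# Trimmed stand-ins for the two cases described in the question (not real responses).
empty = '{"examples_data": {"result": []}, "cache_expired": true}'
full = '{"examples_data": {"result": [{"number": 1}, {"number": 2}]}, "cache_expired": false}'

print(summarize_response(empty))
print(summarize_response(full))
```

Running a check like this on every response makes it easier to tell a rejected request from one that legitimately has no rows.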
Here is the link that, when clicked, makes the XHR request:
4
In dev tools, under Network, here is the content of the request:
General:
Request URL:https://ahrefs.com/site-explorer/ajax/examples/backlinks-for-intersect-domain/d72718a2895587e32a5a869c3858df14
Request Method:POST
Status Code:200
Remote Address:151.80.39.61:443
Referrer Policy:no-referrer-when-downgrade
Response Headers:
cache-control:private, must-revalidate
content-encoding:gzip
content-length:2419
content-type:application/json
date:Fri, 19 Jan 2018 19:13:57 GMT
expires:-1
pragma:no-cache
server:nginx/1.10.3
set-cookie:XSRF-TOKEN=eyJpdiI6IitNYmwxa1FTS29LdXEwd2xPWFpXTHc9PSIsInZhbHVlIjoicXBLSmxvMWJ1TCttdTBCV0R3ZWNZNkZBRzduVjcwaXpXbFdPY0JoaHRIXC9hQXRpNVBKYkc4RFZRWWtSUEM4T3NTclwvQTBWNFkxUXZSXC9NVlVQR0ZSTmc9PSIsIm1hYyI6IjNlNWE2NGU4NDViZDI4ZGNlNjZlNTVmYzFjYWMyOTllYTgyMThmZTc0OTY3YjAwYjhmZTdmOGI2MDE5MWMzNDQifQ%3D%3D; expires=Tue, 20-Mar-2018 19:13:57 GMT; Max-Age=5184000; path=/; domain=.ahrefs.com
set-cookie:ahrefs_cookie=eyJpdiI6IjJZZnI3bUcxNU9BUk51VnRYV2piaWc9PSIsInZhbHVlIjoidDZzV2VzMWloQ3BxeVI5eFQ3TFR0NVYybmVCdTdvVFBIMzJGNTMxMEFJclV1OFh5ZlBaeFRoY3d0cWxvYmExbnh1THAzTk9salZ0cXd0QTRadDZRWUE9PSIsIm1hYyI6IjQ5OTBiY2ExNDM1YjE1MzE4Nzc4MmNhZDRhNThhN2JmN2M2N2M2NWU1NWFlNGVhMThhOWY3NWM4NmEyZWFmZGMifQ%3D%3D; expires=Tue, 20-Mar-2018 19:13:57 GMT; Max-Age=5184000; path=/; domain=.ahrefs.com; HttpOnly
status:200
strict-transport-security:max-age=31536000
vary:Accept-Encoding
Request Headers:
:authority:ahrefs.com
:method:POST
:path:/site-explorer/ajax/examples/backlinks-for-intersect-domain/d72718a2895587e32a5a869c3858df14
:scheme:https
accept:application/json, text/javascript, */*; q=0.01
accept-encoding:gzip, deflate, br
accept-language:en-US,en;q=0.9
content-length:99
content-type:application/x-www-form-urlencoded; charset=UTF-8
cookie:ajs_anonymous_id=%2254791bca-cf67-4536-bd9e-231dcf9157ef%22; ajs_group_id=null; ajs_user_id=108300; _vis_opt_s=1%7C; _vwo_uuid_22=654C6CD9AE3842F79BD796977EB8A53F; _vis_opt_exp_22_combi=2; new_RT=1; _vwo_uuid_v2=81F71C91AD0F1AEA7D52FA84E4BCD177|9f489a6cfd5cc6a02bc483a2f51a4a34; intercom-id-dic5omcp=9bfc00e2-1f81-449f-ad47-a79360ad0b5e; _ga=GA1.2.1205864433.1504618215; intercom-lou-dic5omcp=1; PHPSESSID=ldbj3g6tnrq9k8qlekn1813fqc; ljs-lang=en; _gid=GA1.2.895488268.1516275764; intercom-session-dic5omcp=THQ1VW1zT3JpWGRVVitTM0hjbUdMRkRkOGRxamQ5Zkx4VUlrQVNiMkI1VSsxdFIwbTdTbGlrNitGakh4QlRmZC0tRkY3UE10WmxYWldaZVBKT2VGcm9SQT09--5f518888fe0a0a2d6f59fcbc441e827d45674ec0; XSRF-TOKEN=eyJpdiI6ImtOM1RIcG50aCtGY0lsZHdQcXVBcEE9PSIsInZhbHVlIjoiU1FNM2xsYmhYZXZGRTRlSmNaWmM3SXVZMjZpVzNyWkZRcVdqSm0rcHczd0xRSHJyQnpGbzVOZzlMWUtTaWJyeU5SMWlUUHZCcERZbEVWYUtuOUtaWlE9PSIsIm1hYyI6IjFlOTA1MWE0NjYzM2RiNGQyNTkyMmM2ZmI0MmM1MmE3Y2M2N2M1MTdjYTJiMGY1MTA2NTM1ZmFjZjUwOTRjMzQifQ%3D%3D; ahrefs_cookie=eyJpdiI6ImNPcHF4Sm9LclZTKzh4WEZ1aHlKSnc9PSIsInZhbHVlIjoiWHZcLytZTkFUK01XUE9SYWtkOCtlY1dcL1oyWDNlQURINmJ3M25cLzhWXC8yNFwvU3BQVTB2ZU50cG51QnRxeG9nQlFwMjVoa01taUxPYTFsSzVxSEZ3ZEZGUT09IiwibWFjIjoiNGFlNmUzMmQ3ZTQwYzgyM2JmN2U4NzU2NzgwYmY4ZGM0MjZhMDBhYWE1NjY3NGRhYmU1MDIyOTQzOWY4ZWExYiJ9
origin:https://ahrefs.com
referer:https://ahrefs.com/link-intersect/result/3/common:desc?linking[0]=http%3A%2F%2Fwww.grandchancellorhotels.com%2Fau%2Fmelbourne%2F&linking[1]=https%3A%2F%2Fwww.oakshotels.com%2Foaks-on-lonsdale&linking[2]=https%3A%2F%2Fcrossleyhotel.com.au%2F&linking[3]=https%3A%2F%2Fwww.rydges.com%2Faccommodation%2Fmelbourne-vic%2Fmelbourne-cbd%2F&linking[4]=http%3A%2F%2Fwww.thehotelwindsor.com.au%2F&linking[5]=http%3A%2F%2Fwww.spacehotel.com.au%2F&linking[6]=http%3A%2F%2Fwww.stamford.com.au%2Fspm%2F&linking[7]=https%3A%2F%2Fwww.citadines.com%2Fen%2Faustralia%2Fmelbourne%2Fcitadines-on-bourke-melbourne%2F&linking[8]=https%3A%2F%2Fmelbourne.grand.hyatt.com&linking[9]=http%3A%2F%2Fwww.somerset.com%2Faustralia%2Fmelbourne%2Fsomerset-on-elizabeth-melbourne%2F&linking-modes[0]=prefix&linking-modes[1]=prefix&linking-modes[2]=subdomains&linking-modes[3]=prefix&linking-modes[4]=subdomains&linking-modes[5]=subdomains&linking-modes[6]=prefix&linking-modes[7]=prefix&linking-modes[8]=prefix&linking-modes[9]=prefix&no-linking[0]=http%3A%2F%2Fwww.marriott.com%2Fhotels%2Ftravel%2Fmelmc-melbourne-marriott-hotel%2F&no-linking-modes[0]=prefix&is_union=1
user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36
x-csrf-token:kbIUUA1ygDlENDGohYxYbEhLPUVZZBIR2hWOcGoB
x-requested-with:XMLHttpRequest
Form Data:
for_domain:cityaccommodations.com.au
domain_index:101
target_index:3
offset:0
count:0
total_count:4
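One way to confirm that the scraper sends the same body as the browser is to encode the form data locally and compare its length with the content-length in the captured request headers (99). The sketch below uses the standard library's urlencode as a stand-in, on the assumption that requests encodes a flat dict of simple values the same way.

```python
from urllib.parse import urlencode

# Same fields and order as the Form Data section of the captured request.
form_data = {
    "for_domain": "cityaccommodations.com.au",
    "domain_index": 101,
    "target_index": 3,
    "offset": 0,
    "count": 0,
    "total_count": 4,
}

body = urlencode(form_data)
body_length = len(body.encode("utf-8"))

print(body)
print(body_length)  # compare with the content-length header in the capture
```

If the length matches the capture, the form body is probably not what differs between the browser and the script, which points to the cookies, the CSRF token, or the hash in the URL instead.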

Related

How to create a websocket connection to qxbroker in Python

How can I bypass the HTTP/1.1 403 Forbidden response when connecting to wss://ws2.qxbroker.com/socket.io/EIO=3&transport=websocket? I tried changing the User-Agent, using a proxy, and adding cookies, but none of it worked.
import websocket

class WebsocketClient(object):
    def __init__(self, api):
        websocket.enableTrace(True)
        Origin = 'Origin: https://qxbroker.com'
        Extensions = 'Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits'
        Host = 'Host: ws2.qxbroker.com'  # defined but not passed in the header list below
        Agent = 'User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 OPR/94.0.0.0'
        self.api = api
        # The on_* callbacks are methods of this class (not shown here)
        self.wss = websocket.WebSocketApp(
            'wss://ws2.qxbroker.com/socket.io/EIO=3&transport=websocket',
            on_message=self.on_message,
            on_error=self.on_error,
            on_close=self.on_close,
            on_open=self.on_open,
            header=[Origin, Extensions, Agent])
Here are the request and response headers. The site is protected by Cloudflare.
--- request header ---
GET /socket.io/?EIO=3&transport=websocket HTTP/1.1
Upgrade: websocket
Host: ws2.qxbroker.com
Sec-WebSocket-Key: 7DgEjWxUp8N8PVY7N7vyDw==
Sec-WebSocket-Version: 13
Connection: Upgrade
Origin: https://qxbroker.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
-----------------------
--- response header ---
HTTP/1.1 403 Forbidden
Date: Sat, 11 Feb 2023 23:33:11 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Permissions-Policy: accelerometer=(),autoplay=(),camera=(),clipboard-read=(),clipboard-write=(),fullscreen=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
Referrer-Policy: same-origin
X-Frame-Options: SAMEORIGIN
Cache-Control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Set-Cookie: __cf_bm=7TD4hk4.bntJRdP6w9K.AjXF5MsV9LERTJV00jL2Uww-1676158391-0-AZFOKw90ZYdyy4RxX1xJ4jZQMt74+3UkQDZpDrdXE8BxGJULfe8j0T8EZnpUNXr2W3YHd/FxRoO/bPhKA2Dc0E0=; path=/; expires=Sun, 12-Feb-23 00:03:11 GMT; domain=.qxbroker.com; HttpOnly; Secure; SameSite=None
Server-Timing: cf-q-config;dur=6.9999950937927e-06
Server: cloudflare
CF-RAY: 7980e3583b6a0785-MRS
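One detail worth checking: the URL in the code has no "?" before EIO=3, while the traced request line does. The sketch below shows how the two forms are parsed. The second URL is an assumption, rebuilt from the Host header and the GET line in the trace. This is not claimed to be the cause of the 403, since the trace already uses the "?" form.

```python
from urllib.parse import urlsplit

# URL as written in the WebsocketClient code.
url_in_code = "wss://ws2.qxbroker.com/socket.io/EIO=3&transport=websocket"
# Assumed URL behind the traced request (Host header + GET line).
url_in_trace = "wss://ws2.qxbroker.com/socket.io/?EIO=3&transport=websocket"

for label, url in (("code", url_in_code), ("trace", url_in_trace)):
    parts = urlsplit(url)
    # Without the "?", everything after /socket.io/ is part of the path.
    print(label, "path:", parts.path, "query:", parts.query)
```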

HTTP get request Access Denied

I am trying to understand why I get Access Denied when attempting to download the index.html from www.gamestop.com. I have figured out how to get around it. https://www.gamestop.com/on/demandware.static/Sites-gamestop-us-Site/-/default/v1592871955944/js/main.js. I was wondering if anyone understands why the basic URL (www.gamestop.com) is rejected.
Code:
import requests
import http.client as http_client
import logging
headers = {
'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-encoding':'gzip, deflate, br',
'accept-language':'en-US,en;q=0.9',
'cache-control':'max-age=0',
'connection':'keep-alive',
'dnt':'1',
'downlink':'10',
'ect':'4g',
'rtt':'50',
'sec-fetch-dest':'document',
'sec-fetch-mode':'navigate',
'sec-fetch-site':'none',
'sec-fetch-user':'?1',
'upgrade-insecure-requests':'1',
'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.410 3.97 Safari/537.36'
}
http_client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
r = requests.get('https://www.gamestop.com', headers=headers)
print(r.text)
print(r.status_code)
print(r.headers)
Output:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.gamestop.com:443
send: b'GET / HTTP/1.1\r\nHost: www.gamestop.com\r\nuser-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.410 3.97 Safari/537.36\r\naccept-encoding: gzip, deflate, br\r\naccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nconnection: keep-alive\r\naccept-language: en-US,en;q=0.9\r\ncache-control: max-age=0\r\ndnt: 1\r\ndownlink: 10\r\nect: 4g\r\nrtt: 50\r\nsec-fetch-dest: document\r\nsec-fetch-mode: navigate\r\nsec-fetch-site: none\r\nsec-fetch-user: ?1\r\nupgrade-insecure-requests: 1\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden\r\n'
header: Server: AkamaiGHost
header: Mime-Version: 1.0
header: Content-Type: text/html
header: Content-Length: 265
header: Expires: Fri, 26 Jun 2020 19:54:19 GMT
header: Date: Fri, 26 Jun 2020 19:54:19 GMT
header: Connection: close
header: Server-Timing: cdn-cache; desc=HIT
header: Server-Timing: cdn-cache; desc=HIT
DEBUG:urllib3.connectionpool:https://www.gamestop.com:443 "GET / HTTP/1.1" 403 265
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access "http://www.gamestop.com/" on this server.<P>
Reference #18.19e8d93f.1593201259.5c2b9d0
</BODY>
</HTML>
403
{'Server': 'AkamaiGHost', 'Mime-Version': '1.0', 'Content-Type': 'text/html', 'Content-Length': '265', 'Expires': 'Fri, 26 Jun 2020 19:54:19 GMT', 'Date': 'Fri, 26 Jun 2020 19:54:19 GMT', 'Connection': 'close', 'Server-Timing': 'cdn-cache; desc=HIT, edge; dur=1'}
This code is from another project of mine.
You can get around this by using the Python fake_useragent package.
Search online to learn more about the modules I used here.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from fake_useragent import UserAgent
ua = UserAgent()
userAgent = ua.random
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument(f'user-agent={userAgent}')
driver = webdriver.Chrome(
    executable_path=r'C:\Users\ASHIK\Desktop\chromedriver.exe', options=chrome_options)
driver.get("https://www.myntra.com/men?f=Categories%3ATshirts&p=1")
html_doc = driver.page_source
# Save the rendered page; the with block closes the file automatically
with open('myntra-ecom.html', 'w', encoding='utf-8') as hfile:
    hfile.write(html_doc)
print("Html file Downloaded...")
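The idea behind the answer, sending a different User-Agent on each run, can be sketched without Selenium. This is a simplified stand-in and not how fake_useragent works internally: the pool below is a hard-coded list of User-Agent strings that appear elsewhere on this page, and build_headers is a hypothetical helper name. Changing the User-Agent alone may not be enough to get past a CDN's bot protection.

```python
import random

# Small hard-coded pool; fake_useragent draws from a much larger data set.
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
]


def build_headers(rng=random):
    """Return a headers dict with a randomly chosen User-Agent."""
    return {"user-agent": rng.choice(USER_AGENTS)}


headers = build_headers()
print(headers["user-agent"])
```

The resulting dict can be passed as the headers argument of requests.get in the same way as the headers dict in the question.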

Python POST request to retrieve base64 encode File

I'm trying to send a POST request with Python to retrieve a specific file. Since the URL is on a server that requires authorized access, there is no use posting it here.
However, the form data contains a lengthy field called base64, and I can't figure out whether it is a form data value or a base64 encoding of the POST request.
Here are browser parameters
General:
Request URL: http://exampleapi.com/api/Document/Export
Request Method: POST
Status Code: 200 OK
Remote Address: XX.XXX.XXX.XX:XX
Referrer Policy: no-referrer-when-downgrade
Response Headers:
Access-Control-Allow-Origin: http://example.com
Cache-Control: no-cache
Content-Disposition: attachment; filename=location-downloads.xlsx
Content-Length: 7148
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Date: Tue, 23 Jul 2019 21:00:18 GMT
Expires: -1
Pragma: no-cache
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Request Headers :
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 10162
Content-Type: application/x-www-form-urlencoded
Cookie: abcConnection=!UA7tkC3iZCmVNGRUyRpDWARVBWk/lY6SZvgxLlaygsQKk+vuwA1NxvhwE9ph4i+3NZlKeepIfuHhUvyQjl68fhhrT9ueqMx/3mBKUDcT
DNT: 1
Host: exampleapi.com
Origin: http://example.com
Referer: http://example.com/
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36
Form Data:
fileName: location-downloads.xlsx
contentType: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
base64: UEsDBAoAAAAAAAh4904AAAAAAAAAAAAAAAAJAAAAZG9jUHJvcHMvUEsDBAoAAAAIAAh490(shortened for simplicity)
Here is what I tried
import json
import requests
import urllib3
from bs4 import BeautifulSoup

url='http://example.com'
urllib3.disable_warnings()
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "User-Agent": "Mozilla/5.0",
}
with requests.session() as s:
    r=s.get(url,headers={"User-Agent":"Mozilla/5.0"},verify=False)
    data=r.content
    soup=BeautifulSoup(data,'html.parser')
    form_data = {
        "fileName":"location-downloads.xlsx",
        "contentType":"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    }
    # Note: the body sent here is JSON text, while the Content-Type header above declares form encoding
    r2=s.post('http://exampleapi.com/api/Document/Export',data=json.dumps(form_data,ensure_ascii=True).encode('utf-8'),headers=headers,verify=False)
    print(r2.status_code)
Any idea how I should proceed? The POST here also returns status code 500.
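One way to find out what the base64 field holds is to decode its first few characters. The sketch below decodes the first 16 characters of the value shown in the Form Data and checks for the ZIP signature PK\x03\x04. An .xlsx file is a ZIP archive, so a match would suggest the field is an ordinary form value carrying the file content itself. That reading is an inference from the prefix only, since the full value is shortened in the question.

```python
import base64

# First 16 characters of the 'base64' form field shown above.
prefix = "UEsDBAoAAAAAAAh4"

decoded = base64.b64decode(prefix)
# ZIP archives (including .xlsx files) start with the bytes PK\x03\x04.
looks_like_zip = decoded.startswith(b"PK\x03\x04")

print(decoded[:4], looks_like_zip)
```

If that holds, passing all three fields (fileName, contentType, base64) as a plain dict to data=, without json.dumps, would match the Content-Type: application/x-www-form-urlencoded header in the capture. Whether that resolves the 500 is not confirmed here.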

Expressjs Route contains weird characters

What could possibly be the reason for an Express.js route to return the following data? I am expecting it to return JSON data. I am making an ajax call to the server (Express.js), which gives me the data below with weird characters. Is this data gzipped? I have set the headers and contentType as follows:
headers: {"Access-Control-Allow-Origin":"*"}
contentType: 'application/json; charset=utf-8'
�=O�0�b��K�)�%7�܈9���G��%NOU���O'6��k�~6��S.���,��/�wأ%6�K�)��e�
The HTTP response is as follows:
General:
Request URL: http://localhost/expressRoute.js
Request Method: GET
Status Code: 200 OK
Remote Address: [::1]:80
Referrer Policy: no-referrer-when-downgrade
Response Headers:
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Length: 29396
Content-Type: application/javascript
Date: Thu, 22 Nov 2018 00:50:36 GMT
ETag: "72d4-57b124e0c372e"
Keep-Alive: timeout=5, max=100
Last-Modified: Tue, 20 Nov 2018 05:57:12 GMT
Server: Apache/2.4.34 (Win32) OpenSSL/1.1.0i PHP/7.2.10
Request Headers:
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: no-cache
Connection: keep-alive
Host: localhost
Pragma: no-cache
Referer: http://localhost/index.html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
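To test the "is this gzipped?" guess directly, look at the first two bytes of the raw response body: gzip streams start with 0x1f 0x8b. The sketch below shows that check on locally generated sample data. looks_gzipped is a hypothetical helper name, and the sample is not the response from the question. Note that the response headers listed above contain no Content-Encoding header, so a browser would not decompress the body automatically even if it were gzip data.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def looks_gzipped(data: bytes) -> bool:
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


plain = b'{"ok": true}'
sample = gzip.compress(plain)

print(looks_gzipped(sample))
print(looks_gzipped(plain))
```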

Post request returns empty string (python)

I need to get data from a site. To get this data, the user has to enter a post code first. After exploring the source code, I found the following.
Response result (this is what I ultimately need):
0: {PostCodePK: 16666, PostCode: "7468", City: "MACQUARIE HEADS", State: "TAS", Country: "AUST",…}
1: {PostCodePK: 16667, PostCode: "7468", City: "STRAHAN", State: "TAS", Country: "AUST",…}
Request Data.
Request URL:http://www.lucasmill.com/Resources/ws-common.aspx
Request Method:POST
Status Code:200 OK
Remote Address:111.67.1.113:80
Response Headers
Cache-Control:private
Content-Length:321
Content-Type:application/json; charset=utf-8
Date:Tue, 21 Feb 2017 20:19:26 GMT
Expires:Tue, 21 Feb 2017 20:19:26 GMT
Server:Microsoft-IIS/8.5
Set-Cookie:dnn_IsMobile=False; path=/; HttpOnly
Request Headers
Accept:application/json, text/javascript, */*; q=0.01
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8,ru;q=0.6,uk;q=0.4
Connection:keep-alive
Content-Length:18
Content-Type:application/json; charset=UTF-8
Cookie:.ASPXANONYMOUS=ptTlH_jC0gEkAAAAZTU4MTA5NTItZmNlZS00MzRjLThmYTgtMWZkYWNkOTEwZmY00; dnn_IsMobile=False; language=en-AU; __utmt=1; __utma=97280258.254723646.1487697408.1487697408.1487708346.2; __utmb=97280258.1.10.1487708346; __utmc=97280258; __utmz=97280258.1487697408.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
DNN-Service:true
DNN-Service-Method:GetTown
DNT:1
Host:www.lucasmill.com
Origin:http://www.lucasmill.com
Referer:http://www.lucasmill.com/Sawmilling-Contractors
User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
X-Requested-With:XMLHttpRequest
Request Payload
{ 'param':'7468' }
Here is my python code.
import requests
r = requests.post('http://www.lucasmill.com/Resources/ws-common.aspx',data={ 'param':'7468' })
print(r.text)
But all I receive in response is an empty string.
Where am I going wrong?
You need to add a few extra headers that this particular application requires.
If you look at the request headers dump in the browser, you can see the following:
DNN-Service:true
DNN-Service-Method:GetTown
So, translated to Python, it will look like:
import json
import requests
headers = {
'DNN-Service': 'true',
'DNN-Service-Method': 'GetTown',
}
r = requests.post('http://www.lucasmill.com/Resources/ws-common.aspx',data=json.dumps({'param':7468}), headers=headers)
print(r.text)
Output:
[none][22:28:30] vlazarenko#alluminium (~/tests)$ python post.py
[{"PostCodePK":16666,"PostCode":"7468","City":"MACQUARIE HEADS","State":"TAS","Country":"AUST","Latitude":"-42.2149353","Longitude":"145.1951436","CanGeocode":true},{"PostCodePK":16667,"PostCode":"7468","City":"STRAHAN","State":"TAS","Country":"AUST","Latitude":"-42.1534771","Longitude":"145.3281242","CanGeocode":true}]
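Besides the extra headers, the answer's code differs from the question's in how the body is built: it passes json.dumps(...) to data= instead of a dict. The sketch below shows the two encodings side by side. It uses the standard library's urlencode as a stand-in, on the assumption that requests form-encodes a flat dict passed to data= in the same way. The captured request declares Content-Type: application/json, so the server presumably expects the JSON form.

```python
import json
from urllib.parse import urlencode

params = {"param": "7468"}

# Roughly what a dict passed to data= becomes: a form-encoded body.
form_body = urlencode(params)
# What json.dumps produces: a JSON document, as in the captured Request Payload.
json_body = json.dumps(params)

print(form_body)
print(json_body)
```

The answer passes 7468 as a number rather than a string. According to the output shown, the server accepted that as well.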
