How to mimic an XHR request in Ahrefs.com with Python? - python-3.x
I am trying to scrape data that a page loads through an XHR request. The request is made when the user clicks a link. I've been trying to mimic the request in my scraper by using the hash in the link's onClick attribute. I can get it to work for the first link, but I need to iterate over each of the links on the page. The page itself is behind a login, so below I reproduce as much of what I think is needed.
The bigger question is this: am I even trying the right thing? Should I use something completely different? Any direction is appreciated.
I am still quite new to this, so perhaps there is something obvious that I am missing.
Here is my code:
import requests
import json
headers = {
'accept':'application/json, text/javascript, */*; q=0.01',
'accept-encoding':'gzip, deflate, br',
'accept-language':'en-US,en;q=0.9',
#'content-length':'93',
'content-type':'application/x-www-form-urlencoded; charset=UTF-8',
'cookie':'ajs_anonymous_id=%2254791bca-cf67-4536-bd9e-231dcf9157ef%22; ajs_group_id=null; ajs_user_id=108300; _vis_opt_s=1%7C; _vwo_uuid_22=654C6CD9AE3842F79BD796977EB8A53F; _vis_opt_exp_22_combi=2; new_RT=1; _vwo_uuid_v2=81F71C91AD0F1AEA7D52FA84E4BCD177|9f489a6cfd5cc6a02bc483a2f51a4a34; intercom-id-dic5omcp=9bfc00e2-1f81-449f-ad47-a79360ad0b5e; _ga=GA1.2.1205864433.1504618215; intercom-lou-dic5omcp=1; PHPSESSID=ldbj3g6tnrq9k8qlekn1813fqc; ljs-lang=en; _gid=GA1.2.895488268.1516275764; intercom-session-dic5omcp=THQ1VW1zT3JpWGRVVitTM0hjbUdMRkRkOGRxamQ5Zkx4VUlrQVNiMkI1VSsxdFIwbTdTbGlrNitGakh4QlRmZC0tRkY3UE10WmxYWldaZVBKT2VGcm9SQT09--5f518888fe0a0a2d6f59fcbc441e827d45674ec0; XSRF-TOKEN=eyJpdiI6ImtOM1RIcG50aCtGY0lsZHdQcXVBcEE9PSIsInZhbHVlIjoiU1FNM2xsYmhYZXZGRTRlSmNaWmM3SXVZMjZpVzNyWkZRcVdqSm0rcHczd0xRSHJyQnpGbzVOZzlMWUtTaWJyeU5SMWlUUHZCcERZbEVWYUtuOUtaWlE9PSIsIm1hYyI6IjFlOTA1MWE0NjYzM2RiNGQyNTkyMmM2ZmI0MmM1MmE3Y2M2N2M1MTdjYTJiMGY1MTA2NTM1ZmFjZjUwOTRjMzQifQ%3D%3D; ahrefs_cookie=eyJpdiI6ImNPcHF4Sm9LclZTKzh4WEZ1aHlKSnc9PSIsInZhbHVlIjoiWHZcLytZTkFUK01XUE9SYWtkOCtlY1dcL1oyWDNlQURINmJ3M25cLzhWXC8yNFwvU3BQVTB2ZU50cG51QnRxeG9nQlFwMjVoa01taUxPYTFsSzVxSEZ3ZEZGUT09IiwibWFjIjoiNGFlNmUzMmQ3ZTQwYzgyM2JmN2U4NzU2NzgwYmY4ZGM0MjZhMDBhYWE1NjY3NGRhYmU1MDIyOTQzOWY4ZWExYiJ9',
'origin':'https://ahrefs.com',
'referer':'https://ahrefs.com/link-intersect/result/3/common:desc?linking[0]=http%3A%2F%2Fwww.grandchancellorhotels.com%2Fau%2Fmelbourne%2F&linking[1]=https%3A%2F%2Fwww.oakshotels.com%2Foaks-on-lonsdale&linking[2]=https%3A%2F%2Fcrossleyhotel.com.au%2F&linking[3]=https%3A%2F%2Fwww.rydges.com%2Faccommodation%2Fmelbourne-vic%2Fmelbourne-cbd%2F&linking[4]=http%3A%2F%2Fwww.thehotelwindsor.com.au%2F&linking[5]=http%3A%2F%2Fwww.spacehotel.com.au%2F&linking[6]=http%3A%2F%2Fwww.stamford.com.au%2Fspm%2F&linking[7]=https%3A%2F%2Fwww.citadines.com%2Fen%2Faustralia%2Fmelbourne%2Fcitadines-on-bourke-melbourne%2F&linking[8]=https%3A%2F%2Fmelbourne.grand.hyatt.com&linking[9]=http%3A%2F%2Fwww.somerset.com%2Faustralia%2Fmelbourne%2Fsomerset-on-elizabeth-melbourne%2F&linking-modes[0]=prefix&linking-modes[1]=prefix&linking-modes[2]=subdomains&linking-modes[3]=prefix&linking-modes[4]=subdomains&linking-modes[5]=subdomains&linking-modes[6]=prefix&linking-modes[7]=prefix&linking-modes[8]=prefix&linking-modes[9]=prefix&no-linking[0]=http%3A%2F%2Fwww.marriott.com%2Fhotels%2Ftravel%2Fmelmc-melbourne-marriott-hotel%2F&no-linking-modes[0]=prefix&is_union=1',
'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
'x-csrf-token':'kbIUUA1ygDlENDGohYxYbEhLPUVZZBIR2hWOcGoB',
'x-requested-with':'XMLHttpRequest'
}
data={
'for_domain':'cityaccommodations.com.au',
'domain_index':101,
'target_index':3,
'offset':0,
'count':0,
'total_count':4
}
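# POST the same form data the browser sends when the link is clicked
r = requests.post('https://ahrefs.com/site-explorer/ajax/examples/backlinks-for-intersect-domain/d72718a2895587e32a5a869c3858df14', headers=headers, data=data)
response = r.text
python_object = json.loads(response)  # same result as r.json()
print(python_object)
# python_object is a dict, so this loop visits its top-level keys
# (e.g. examples_data, cache_expired)
for item in python_object:
    print(item)
    # each key is a string, so this inner loop prints it one character at a time
    for thing in item:
        print(". " + thing)
print(r.text)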
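To iterate over every link instead of hardcoding one hash, the request can be wrapped in a loop that appends each link's hash to the endpoint and posts that link's form fields. A minimal sketch, assuming each link has already been parsed into a dict holding its onClick hash (under the hypothetical key "hash") plus the form fields; the sending function is passed in so the loop does not depend on any particular HTTP client:

```python
import json
from typing import Callable, Dict, Iterable, List

# Endpoint prefix taken from the question; the per-link hash is appended to it.
BASE_URL = "https://ahrefs.com/site-explorer/ajax/examples/backlinks-for-intersect-domain/"


def fetch_examples(links: Iterable[Dict[str, object]],
                   post: Callable[[str, Dict[str, object]], str]) -> List[dict]:
    """POST once per link and return each decoded JSON response.

    Each dict in `links` is assumed to carry the hash from that link's onClick
    attribute under the hypothetical key "hash", plus the form fields the
    browser sends for that link (for_domain, domain_index, target_index, ...).
    `post(url, form)` is any callable that sends the request and returns the
    response body as text.
    """
    payloads = []
    for link in links:
        form = dict(link)                  # copy, so the caller's dict is not modified
        security_hash = form.pop("hash")   # hypothetical key, see docstring
        body = post(BASE_URL + str(security_hash), form)
        payloads.append(json.loads(body))
    return payloads
```

With requests, `post` could be `lambda url, form: session.post(url, data=form).text`, where `session = requests.Session()` has the question's headers applied via `session.headers.update(headers)`; a Session also keeps any cookies the server sets between calls.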
It doesn't raise an error, but the returned JSON has an empty array for 'result', and I noticed 'cache_expired' is true. If the request worked properly, the JSON returned would be:
{"examples_data":{"result":[{"url_from":"http:\/\/cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","ahrefs_rank":8,"domain_rating":45,"ahrefs_top":0,"ip_from":"182.50.132.56","links_internal":"14","links_external":"7","page_size":"5 kB","encoding":"iso-8859-1","title":"The Crossley Hotel Managed by Mecure - Accommodation - Melbourne City, Little Bourke St, Melbourne","language":"en","url_to":"https:\/\/crossleyhotel.com.au\/","first_seen":"27 Jun '17","last_visited":"16 Jan '18","prev_visited":"1 Dec '17","original":false,"redirect":0,"alt":"","anchor":"www.crossleyhotel.com.au","text_pre":"Website:","text_post":"","http_code":200,"url_from_first_seen":"2016-01-23T19:23:54Z","first_origin":"recrawl","last_origin":"recrawl","sitewide":false,"link_type":"Nofollow","nofollow":true,"number":1,"type":"","powered_by":[],"WidthAhrefsRank":0,"IsRedirectChain":false,"rows_limit_not_exceeded":true,"authorized_user":true,"free_user":false,"lite_user":false,"standard_user":false,"free_or_lite_user":false,"free_or_lite_or_standard_user":false,"position_limit_not_exceeded":true,"Visited":"16 Jan '18","TimeAgoSecond":true,"VisitedLess2Days":false,"VisitedMore2Days":true,"VisitedLess1Hour":false,"VisitedMore1Hour":false,"VisitedMore1Month":false,"VisitedDays":3,"VisitedHours":0,"VisitedMinutes":0,"new_ref_page":true,"curr_ref_page":false,"curr_ref_page_number":1,"empty_anchor":false,"show_anchor":true,"space_between_anchor_and_post":" ","http_code_name":"ok","https_url_from":false,"prepared_url_from":"http:\/\/cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","https_url_to":true,"prepared_url_to":"https:\/\/crossleyhotel.com.au\/","url_from_Parts":true,"url_from_Part2":"cityaccommodations.com.au","url_from_Part3":"\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","url_to_Parts":true,"url_to_Part2":"crossleyhotel.com.au","url_to_Part3":"\/","isJsRendered":false,"StatusDropped":false,"StatusPageRedirected":false,"StatusPageNoIndex":false,"StatusPageNonCanonical":false,"StatusLinkRemoved":false,"StatusLinkBrokenRedirect":false,"DateDeleted":false,"backlink_status":"","isLost":true,"deleted_at":"","empty_domain_rank":false,"low_domain_rank":true,"middle_domain_rank":false,"height_domain_rank":false,"DomainRating":45,"ExampleUniqueId":"101_domain3_1","ExampleTRId":"tr_example_backlinks","Language":"English","SocialMediaURLHash":"63bb0d959c467f5828824e310bb87690","available_social_media":true,"not_available_social_media":false},{"url_from":"http:\/\/cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","ahrefs_rank":8,"domain_rating":45,"ahrefs_top":0,"ip_from":"182.50.132.56","links_internal":"14","links_external":"7","page_size":"5 kB","encoding":"iso-8859-1","title":"The Crossley Hotel Managed by Mecure - Accommodation - Melbourne City, Little Bourke St, Melbourne","language":"en","url_to":"https:\/\/www.crossleyhotel.com.au\/","first_seen":"21 Apr '17","last_visited":"16 Jan '18","prev_visited":"1 Dec '17","original":false,"redirect":0,"alt":"","anchor":"www.crossleyhotel.com.au","text_pre":"Website:","text_post":"","http_code":200,"url_from_first_seen":"2016-01-23T19:23:54Z","first_origin":"recrawl","last_origin":"recrawl","sitewide":false,"link_type":"Nofollow","nofollow":true,"number":2,"type":"","powered_by":[],"WidthAhrefsRank":0,"IsRedirectChain":false,"rows_limit_not_exceeded":true,"authorized_user":true,"free_user":false,"lite_user":false,"standard_user":false,"free_or_lite_user":false,"free_or_lite_or_standard_user":false,"position_limit_not_exceeded":true,"Visited":"16 Jan '18","TimeAgoSecond":true,"VisitedLess2Days":false,"VisitedMore2Days":true,"VisitedLess1Hour":false,"VisitedMore1Hour":false,"VisitedMore1Month":false,"VisitedDays":3,"VisitedHours":0,"VisitedMinutes":0,"new_ref_page":false,"curr_ref_page":true,"curr_ref_page_number":1,"empty_anchor":false,"show_anchor":true,"space_between_anchor_and_post":" ","http_code_name":"ok","https_url_from":false,"prepared_url_from":"http:\/\/cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","https_url_to":true,"prepared_url_to":"https:\/\/www.crossleyhotel.com.au\/","url_from_Parts":true,"url_from_Part2":"cityaccommodations.com.au","url_from_Part3":"\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","url_to_Parts":true,"url_to_Part2":"www.crossleyhotel.com.au","url_to_Part3":"\/","isJsRendered":false,"StatusDropped":false,"StatusPageRedirected":false,"StatusPageNoIndex":false,"StatusPageNonCanonical":false,"StatusLinkRemoved":false,"StatusLinkBrokenRedirect":false,"DateDeleted":false,"backlink_status":"","isLost":true,"deleted_at":"","empty_domain_rank":false,"low_domain_rank":true,"middle_domain_rank":false,"height_domain_rank":false,"DomainRating":45,"ExampleUniqueId":"101_domain3_2","ExampleTRId":"tr_example_backlinks","Language":"English","SocialMediaURLHash":"63bb0d959c467f5828824e310bb87690","available_social_media":true,"not_available_social_media":false},{"url_from":"http:\/\/www.cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","ahrefs_rank":8,"domain_rating":45,"ahrefs_top":0,"ip_from":"182.50.132.56","links_internal":"14","links_external":"7","page_size":"5 kB","encoding":"iso-8859-1","title":"The Crossley Hotel Managed by Mecure - Accommodation - Melbourne City, Little Bourke St, Melbourne","language":"","url_to":"https:\/\/crossleyhotel.com.au\/","first_seen":"14 Jul '17","last_visited":"11 Dec '17","prev_visited":"9 Nov '17","original":false,"redirect":0,"alt":"","anchor":"www.crossleyhotel.com.au","text_pre":"Website:","text_post":"","http_code":200,"url_from_first_seen":"2013-08-10T21:38:52Z","first_origin":"recrawl","last_origin":"recrawl","sitewide":false,"link_type":"Nofollow","nofollow":true,"number":3,"type":"","powered_by":[],"WidthAhrefsRank":0,"IsRedirectChain":false,"rows_limit_not_exceeded":true,"authorized_user":true,"free_user":false,"lite_user":false,"standard_user":false,"free_or_lite_user":false,"free_or_lite_or_standard_user":false,"position_limit_not_exceeded":true,"Visited":"11 Dec '17","TimeAgoSecond":true,"VisitedLess2Days":false,"VisitedMore2Days":false,"VisitedLess1Hour":false,"VisitedMore1Hour":false,"VisitedMore1Month":true,"VisitedDays":0,"VisitedHours":0,"VisitedMinutes":0,"new_ref_page":true,"curr_ref_page":false,"curr_ref_page_number":3,"empty_anchor":false,"show_anchor":true,"space_between_anchor_and_post":" ","http_code_name":"ok","https_url_from":false,"prepared_url_from":"http:\/\/www.cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","https_url_to":true,"prepared_url_to":"https:\/\/crossleyhotel.com.au\/","url_from_Parts":true,"url_from_Part2":"www.cityaccommodations.com.au","url_from_Part3":"\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","url_to_Parts":true,"url_to_Part2":"crossleyhotel.com.au","url_to_Part3":"\/","isJsRendered":false,"StatusDropped":false,"StatusPageRedirected":false,"StatusPageNoIndex":false,"StatusPageNonCanonical":false,"StatusLinkRemoved":false,"StatusLinkBrokenRedirect":false,"DateDeleted":false,"backlink_status":"","isLost":true,"deleted_at":"","empty_domain_rank":false,"low_domain_rank":true,"middle_domain_rank":false,"height_domain_rank":false,"DomainRating":45,"ExampleUniqueId":"101_domain3_3","ExampleTRId":"tr_example_backlinks","Language":false,"SocialMediaURLHash":"0a29309c991ccbb6c4cbc8884a5274e0","available_social_media":true,"not_available_social_media":false},{"url_from":"http:\/\/www.cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","ahrefs_rank":8,"domain_rating":45,"ahrefs_top":0,"ip_from":"182.50.132.56","links_internal":"14","links_external":"7","page_size":"5 kB","encoding":"iso-8859-1","title":"The Crossley Hotel Managed by Mecure - Accommodation - Melbourne City, Little Bourke St, Melbourne","language":"","url_to":"https:\/\/www.crossleyhotel.com.au\/","first_seen":"3 May '17","last_visited":"11 Dec '17","prev_visited":"9 Nov '17","original":false,"redirect":0,"alt":"","anchor":"www.crossleyhotel.com.au","text_pre":"Website:","text_post":"","http_code":200,"url_from_first_seen":"2013-08-10T21:38:52Z","first_origin":"recrawl","last_origin":"recrawl","sitewide":false,"link_type":"Nofollow","nofollow":true,"number":4,"type":"","powered_by":[],"WidthAhrefsRank":0,"IsRedirectChain":false,"rows_limit_not_exceeded":true,"authorized_user":true,"free_user":false,"lite_user":false,"standard_user":false,"free_or_lite_user":false,"free_or_lite_or_standard_user":false,"position_limit_not_exceeded":true,"Visited":"11 Dec '17","TimeAgoSecond":true,"VisitedLess2Days":false,"VisitedMore2Days":false,"VisitedLess1Hour":false,"VisitedMore1Hour":false,"VisitedMore1Month":true,"VisitedDays":0,"VisitedHours":0,"VisitedMinutes":0,"new_ref_page":false,"curr_ref_page":true,"curr_ref_page_number":3,"empty_anchor":false,"show_anchor":true,"space_between_anchor_and_post":" ","http_code_name":"ok","https_url_from":false,"prepared_url_from":"http:\/\/www.cityaccommodations.com.au\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","https_url_to":true,"prepared_url_to":"https:\/\/www.crossleyhotel.com.au\/","url_from_Parts":true,"url_from_Part2":"www.cityaccommodations.com.au","url_from_Part3":"\/Melbourne_accommodations\/The_Crossley_Hotel_Managed_by_Mecure\/Accommodation\/1378","url_to_Parts":true,"url_to_Part2":"www.crossleyhotel.com.au","url_to_Part3":"\/","isJsRendered":false,"StatusDropped":false,"StatusPageRedirected":false,"StatusPageNoIndex":false,"StatusPageNonCanonical":false,"StatusLinkRemoved":false,"StatusLinkBrokenRedirect":false,"DateDeleted":false,"backlink_status":"","isLost":true,"deleted_at":"","empty_domain_rank":false,"low_domain_rank":true,"middle_domain_rank":false,"height_domain_rank":false,"DomainRating":45,"ExampleUniqueId":"101_domain3_4","ExampleTRId":"tr_example_backlinks","Language":false,"SocialMediaURLHash":"0a29309c991ccbb6c4cbc8884a5274e0","available_social_media":true,"not_available_social_media":false}],"Allow":true,"APIURLHash":"1e52c833a6961148516eadc7c536c81c","projectPath":"\/site-explorer","AnchorIndex":0,"DomainIndex":101,"SnippetIndex":0,"URLIndex":0,"AggregatorIndex":0,"TargetIndex":3,"Count":10,"ForAnchor":"","ForDomain":"cityaccommodations.com.au","ForSnippet":"","ForURL":"https:\/\/crossleyhotel.com.au\/","ForAggregator":"","Offset":0,"TotalCount":4,"CachedSecurityHash":"d72718a2895587e32a5a869c3858df14","DateFrom":"","DateTo":"","ExamplesDataType":"backlinks_for_intersect_domain","DataType":"backlinks_for_intersect_domain","MaxAhrefsRank":0,"AvailableDisavowLinks":false,"DashboardID":false,"DisavowInterface":false,"DefaultHistoryMode":"recent","HistoryMode":"recent","DataRowsLeft":5000000,"linkExtIntType":"","linkForExport":"https:\/\/ahrefs.com\/site-explorer\/ajax\/examples\/backlinks-for-intersect-domain\/d72718a2895587e32a5a869c3858df14?for_domain=cityaccommodations.com.au&domain_index=101&target_index=3","ReferringDomainsHistoryExamples":false,"ReferringDomainsNewExamples":false,"ExamplesModeAnchors":false,"ExamplesModeDomains":false,"ExamplesModeSnippets":false,"ExamplesModeLinkedAnchors":false,"ExamplesModeTopContent":false,"ExamplesModeIntersect":true,"SELinkBacklinks":"\/site-explorer\/backlinks\/v5\/external\/exact\/recent\/all\/all\/1\/ahrefs_rank_desc?target=https:\/\/crossleyhotel.com.au\/","social_media":"99e442558b5cc604a33f8818bff91601"},"examples_html":"<div id=\"data_for_examples_container_101_domain3\" class=\"bg-lightblue intable relative p-x-2 m-b-2\">\n <table class=\"table table-ahrefs intable bg-lightblue b-b-1px\" id=\"examples_table_101_domain3\">\n <tbody>\n <tr>\n\t\t\t\t\t\t<th class=\"width-2 p-l-0\">Referring page<\/th>\n\t\t\t\t\t\t<th class=\"width-01\" title=\"Domain rating\">DR<\/th>\n\t\t\t\t\t\t<th class=\"width-01\" title=\"URL rating\">UR<\/th>\n\t\t\t\t\t\t<th class=\"width-01\" title=\"External links\">Ext.<\/th>\n\t\t\t\t\t\t<th class=\"width-2 p-l-0\">Anchor and backlink<\/th>\n\t\t\t\t\t\t<th class=\"width-001 text-nowrap text-xs-left p-r-0\">First seen<br> Last check<\/th>\n <\/tr>\n <\/tbody>\n <\/table>\n <\/div>\n\t\t \n<div class=\"clearfix\"><\/div>","examples_footer_html":"<div class=\"examples-footer\">\n\t\t\t<div class=\"p-x-2 view-more-group m-b-2\">\n\t\t\t\t\t\t\t\t<a class=\"view-more border-right list-inline-item\" href=\"javascript:void(0);\" title=\"Hide All\"\n\t\t\t\t onclick=\"if ($('#examples_backlinks_container_101_domain3').length > 0) { $('#get_backlinks_link_101_domain3').click(); }\">\n\t\t\t\t\tHide All\n\t\t\t\t<\/a>\n\t\t\t\t<a class=\"clickable all__link\" title=\"Export all data in CSV\" data-limit=\"4\" onclick=\"CheckExportTimeout({link: 'https:\/\/ahrefs.com\/site-explorer\/ajax\/examples\/backlinks-for-intersect-domain\/d72718a2895587e32a5a869c3858df14?for_domain=cityaccommodations.com.au&domain_index=101&target_index=3', hash: '1e52c833a6961148516eadc7c536c81c', need_rows: 4, rawExportLeft: 5000000, limit_obj: this});\">\n <span class=\"icon icon--export colored\"><\/span> Export\n<\/a>\n\t\t\t<\/div>\n\t\t<\/div>\n\t","cache_expired":false}
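The only difference the question reports between a failing and a working response is an empty result list alongside cache_expired, so when looping over links it helps to surface both instead of silently collecting nothing. A minimal sketch that relies only on the key names visible in the JSON above (examples_data, result, cache_expired, url_from, url_to, anchor); treating cache_expired as the failure signal is an assumption drawn from the two responses in the question:

```python
from typing import Dict, List, Tuple


def extract_backlinks(payload: dict) -> Tuple[bool, List[Dict[str, str]]]:
    """Return (cache_expired, rows) for one examples response.

    Only a few columns are kept from each row; missing keys become "".
    """
    expired = bool(payload.get("cache_expired", False))
    rows = payload.get("examples_data", {}).get("result", [])
    wanted = ("url_from", "url_to", "anchor")
    return expired, [{key: row.get(key, "") for key in wanted} for row in rows]
```

Checking the flag before using the rows makes the failure mode from the question explicit, for example by logging which link's hash came back expired.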
Here is the link that, when clicked, makes the XHR request:
4
In dev tools, under Network, here is the content of the request:
General:
Request URL:https://ahrefs.com/site-explorer/ajax/examples/backlinks-for-intersect-domain/d72718a2895587e32a5a869c3858df14
Request Method:POST
Status Code:200
Remote Address:151.80.39.61:443
Referrer Policy:no-referrer-when-downgrade
Response Headers:
cache-control:private, must-revalidate
content-encoding:gzip
content-length:2419
content-type:application/json
date:Fri, 19 Jan 2018 19:13:57 GMT
expires:-1
pragma:no-cache
server:nginx/1.10.3
set-cookie:XSRF-TOKEN=eyJpdiI6IitNYmwxa1FTS29LdXEwd2xPWFpXTHc9PSIsInZhbHVlIjoicXBLSmxvMWJ1TCttdTBCV0R3ZWNZNkZBRzduVjcwaXpXbFdPY0JoaHRIXC9hQXRpNVBKYkc4RFZRWWtSUEM4T3NTclwvQTBWNFkxUXZSXC9NVlVQR0ZSTmc9PSIsIm1hYyI6IjNlNWE2NGU4NDViZDI4ZGNlNjZlNTVmYzFjYWMyOTllYTgyMThmZTc0OTY3YjAwYjhmZTdmOGI2MDE5MWMzNDQifQ%3D%3D; expires=Tue, 20-Mar-2018 19:13:57 GMT; Max-Age=5184000; path=/; domain=.ahrefs.com
set-cookie:ahrefs_cookie=eyJpdiI6IjJZZnI3bUcxNU9BUk51VnRYV2piaWc9PSIsInZhbHVlIjoidDZzV2VzMWloQ3BxeVI5eFQ3TFR0NVYybmVCdTdvVFBIMzJGNTMxMEFJclV1OFh5ZlBaeFRoY3d0cWxvYmExbnh1THAzTk9salZ0cXd0QTRadDZRWUE9PSIsIm1hYyI6IjQ5OTBiY2ExNDM1YjE1MzE4Nzc4MmNhZDRhNThhN2JmN2M2N2M2NWU1NWFlNGVhMThhOWY3NWM4NmEyZWFmZGMifQ%3D%3D; expires=Tue, 20-Mar-2018 19:13:57 GMT; Max-Age=5184000; path=/; domain=.ahrefs.com; HttpOnly
status:200
strict-transport-security:max-age=31536000
vary:Accept-Encoding
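The response headers above include Set-Cookie lines that replace XSRF-TOKEN and ahrefs_cookie, so a cookie string pasted into the script once no longer matches what the browser holds after the first request. The sketch below parses one such header with the standard library to show which values change; the token value is a placeholder, not real data. In requests, a Session object stores updated cookies like these automatically. Whether Ahrefs actually rejects the stale values is an assumption the question does not confirm.

```python
from http.cookies import SimpleCookie

# One Set-Cookie value modelled on the response headers above, with the long
# token replaced by a placeholder (PLACEHOLDER_TOKEN is not a real value).
set_cookie = ("XSRF-TOKEN=PLACEHOLDER_TOKEN; expires=Tue, 20-Mar-2018 19:13:57 GMT; "
              "Max-Age=5184000; path=/; domain=.ahrefs.com")

jar = SimpleCookie()
jar.load(set_cookie)
morsel = jar["XSRF-TOKEN"]

print(morsel.value)       # PLACEHOLDER_TOKEN: the value the browser sends on its next request
print(morsel["domain"])   # .ahrefs.com
print(morsel["max-age"])  # 5184000 seconds, i.e. 60 days
```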
Request Headers:
:authority:ahrefs.com
:method:POST
:path:/site-explorer/ajax/examples/backlinks-for-intersect-domain/d72718a2895587e32a5a869c3858df14
:scheme:https
accept:application/json, text/javascript, */*; q=0.01
accept-encoding:gzip, deflate, br
accept-language:en-US,en;q=0.9
content-length:99
content-type:application/x-www-form-urlencoded; charset=UTF-8
cookie:ajs_anonymous_id=%2254791bca-cf67-4536-bd9e-231dcf9157ef%22; ajs_group_id=null; ajs_user_id=108300; _vis_opt_s=1%7C; _vwo_uuid_22=654C6CD9AE3842F79BD796977EB8A53F; _vis_opt_exp_22_combi=2; new_RT=1; _vwo_uuid_v2=81F71C91AD0F1AEA7D52FA84E4BCD177|9f489a6cfd5cc6a02bc483a2f51a4a34; intercom-id-dic5omcp=9bfc00e2-1f81-449f-ad47-a79360ad0b5e; _ga=GA1.2.1205864433.1504618215; intercom-lou-dic5omcp=1; PHPSESSID=ldbj3g6tnrq9k8qlekn1813fqc; ljs-lang=en; _gid=GA1.2.895488268.1516275764; intercom-session-dic5omcp=THQ1VW1zT3JpWGRVVitTM0hjbUdMRkRkOGRxamQ5Zkx4VUlrQVNiMkI1VSsxdFIwbTdTbGlrNitGakh4QlRmZC0tRkY3UE10WmxYWldaZVBKT2VGcm9SQT09--5f518888fe0a0a2d6f59fcbc441e827d45674ec0; XSRF-TOKEN=eyJpdiI6ImtOM1RIcG50aCtGY0lsZHdQcXVBcEE9PSIsInZhbHVlIjoiU1FNM2xsYmhYZXZGRTRlSmNaWmM3SXVZMjZpVzNyWkZRcVdqSm0rcHczd0xRSHJyQnpGbzVOZzlMWUtTaWJyeU5SMWlUUHZCcERZbEVWYUtuOUtaWlE9PSIsIm1hYyI6IjFlOTA1MWE0NjYzM2RiNGQyNTkyMmM2ZmI0MmM1MmE3Y2M2N2M1MTdjYTJiMGY1MTA2NTM1ZmFjZjUwOTRjMzQifQ%3D%3D; ahrefs_cookie=eyJpdiI6ImNPcHF4Sm9LclZTKzh4WEZ1aHlKSnc9PSIsInZhbHVlIjoiWHZcLytZTkFUK01XUE9SYWtkOCtlY1dcL1oyWDNlQURINmJ3M25cLzhWXC8yNFwvU3BQVTB2ZU50cG51QnRxeG9nQlFwMjVoa01taUxPYTFsSzVxSEZ3ZEZGUT09IiwibWFjIjoiNGFlNmUzMmQ3ZTQwYzgyM2JmN2U4NzU2NzgwYmY4ZGM0MjZhMDBhYWE1NjY3NGRhYmU1MDIyOTQzOWY4ZWExYiJ9
origin:https://ahrefs.com
referer:https://ahrefs.com/link-intersect/result/3/common:desc?linking[0]=http%3A%2F%2Fwww.grandchancellorhotels.com%2Fau%2Fmelbourne%2F&linking[1]=https%3A%2F%2Fwww.oakshotels.com%2Foaks-on-lonsdale&linking[2]=https%3A%2F%2Fcrossleyhotel.com.au%2F&linking[3]=https%3A%2F%2Fwww.rydges.com%2Faccommodation%2Fmelbourne-vic%2Fmelbourne-cbd%2F&linking[4]=http%3A%2F%2Fwww.thehotelwindsor.com.au%2F&linking[5]=http%3A%2F%2Fwww.spacehotel.com.au%2F&linking[6]=http%3A%2F%2Fwww.stamford.com.au%2Fspm%2F&linking[7]=https%3A%2F%2Fwww.citadines.com%2Fen%2Faustralia%2Fmelbourne%2Fcitadines-on-bourke-melbourne%2F&linking[8]=https%3A%2F%2Fmelbourne.grand.hyatt.com&linking[9]=http%3A%2F%2Fwww.somerset.com%2Faustralia%2Fmelbourne%2Fsomerset-on-elizabeth-melbourne%2F&linking-modes[0]=prefix&linking-modes[1]=prefix&linking-modes[2]=subdomains&linking-modes[3]=prefix&linking-modes[4]=subdomains&linking-modes[5]=subdomains&linking-modes[6]=prefix&linking-modes[7]=prefix&linking-modes[8]=prefix&linking-modes[9]=prefix&no-linking[0]=http%3A%2F%2Fwww.marriott.com%2Fhotels%2Ftravel%2Fmelmc-melbourne-marriott-hotel%2F&no-linking-modes[0]=prefix&is_union=1
user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36
x-csrf-token:kbIUUA1ygDlENDGohYxYbEhLPUVZZBIR2hWOcGoB
x-requested-with:XMLHttpRequest
Form Data:
for_domain:cityaccommodations.com.au
domain_index:101
target_index:3
offset:0
count:0
total_count:4
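requests encodes a dict passed as data= as application/x-www-form-urlencoded, the same content type the browser uses here, and sets Content-Length itself, so the commented-out 'content-length' header in the script is not needed. A quick standard-library check that the form fields above encode to the 99 bytes reported in the request headers:

```python
from urllib.parse import urlencode

# Same form fields, in the same order, as the "Form Data" section above.
form = {
    "for_domain": "cityaccommodations.com.au",
    "domain_index": 101,
    "target_index": 3,
    "offset": 0,
    "count": 0,
    "total_count": 4,
}

body = urlencode(form)
print(body)       # for_domain=cityaccommodations.com.au&domain_index=101&target_index=3&offset=0&count=0&total_count=4
print(len(body))  # 99, matching the content-length request header
```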
Related
How to create a websocket connection to qxbroker in Python
How do I get past "HTTP/1.1 403 Forbidden" when connecting to wss://ws2.qxbroker.com/socket.io/EIO=3&transport=websocket? I tried changing the user-agent, using a proxy, and adding cookies, but nothing worked.
import websocket
class WebsocketClient(object):
    def __init__(self, api):
        websocket.enableTrace(True)
        Origin = 'Origin: https://qxbroker.com'
        Extensions = 'Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits'
        Host = 'Host: ws2.qxbroker.com'
        Agent = 'User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 OPR/94.0.0.0'
        self.api = api
        self.wss=websocket.WebSocketApp(('wss://ws2.qxbroker.com/socket.io/EIO=3&transport=websocket'), on_message=(self.on_message), on_error=(self.on_error), on_close=(self.on_close), on_open=(self.on_open), header=[Origin,Extensions,Agent])
Request and response headers (the site is protected by Cloudflare):
--- request header ---
GET /socket.io/?EIO=3&transport=websocket HTTP/1.1
Upgrade: websocket
Host: ws2.qxbroker.com
Sec-WebSocket-Key: 7DgEjWxUp8N8PVY7N7vyDw==
Sec-WebSocket-Version: 13
Connection: Upgrade
Origin: https://qxbroker.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
-----------------------
--- response header ---
HTTP/1.1 403 Forbidden
Date: Sat, 11 Feb 2023 23:33:11 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Permissions-Policy: accelerometer=(),autoplay=(),camera=(),clipboard-read=(),clipboard-write=(),fullscreen=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
Referrer-Policy: same-origin
X-Frame-Options: SAMEORIGIN
Cache-Control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Set-Cookie: __cf_bm=7TD4hk4.bntJRdP6w9K.AjXF5MsV9LERTJV00jL2Uww-1676158391-0-AZFOKw90ZYdyy4RxX1xJ4jZQMt74+3UkQDZpDrdXE8BxGJULfe8j0T8EZnpUNXr2W3YHd/FxRoO/bPhKA2Dc0E0=; path=/; expires=Sun, 12-Feb-23 00:03:11 GMT; domain=.qxbroker.com; HttpOnly; Secure; SameSite=None
Server-Timing: cf-q-config;dur=6.9999950937927e-06
Server: cloudflare
CF-RAY: 7980e3583b6a0785-MRS
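One difference between the code and the captured request: the URL passed to WebSocketApp has no '?' before EIO=3, while the request line shows /socket.io/?EIO=3&transport=websocket. Without the '?', the Engine.IO parameters become part of the path rather than the query string, as this standard-library check shows. Whether that alone explains the 403 is not established; the response comes from Cloudflare, which may reject the client for other reasons.

```python
from urllib.parse import urlsplit

in_code = "wss://ws2.qxbroker.com/socket.io/EIO=3&transport=websocket"   # as in the class above
in_dump = "wss://ws2.qxbroker.com/socket.io/?EIO=3&transport=websocket"  # as in the request header

for url in (in_code, in_dump):
    parts = urlsplit(url)
    print(repr(parts.path), repr(parts.query))
# '/socket.io/EIO=3&transport=websocket' ''
# '/socket.io/' 'EIO=3&transport=websocket'
```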
HTTP GET request returns Access Denied
I'm trying to understand why I get "Access Denied" when attempting to download the index.html from www.gamestop.com. I have figured out how to get around it: https://www.gamestop.com/on/demandware.static/Sites-gamestop-us-Site/-/default/v1592871955944/js/main.js. I was wondering if anyone understands why the base URL (www.gamestop.com) is rejected.
Code:
import requests
import http.client as http_client
import logging
headers = {
    'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding':'gzip, deflate, br',
    'accept-language':'en-US,en;q=0.9',
    'cache-control':'max-age=0',
    'connection':'keep-alive',
    'dnt':'1',
    'downlink':'10',
    'ect':'4g',
    'rtt':'50',
    'sec-fetch-dest':'document',
    'sec-fetch-mode':'navigate',
    'sec-fetch-site':'none',
    'sec-fetch-user':'?1',
    'upgrade-insecure-requests':'1',
    'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.410 3.97 Safari/537.36'
}
# Log the raw request and response lines sent over the socket
http_client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
r = requests.get('https://www.gamestop.com', headers=headers)
print(r.text)
print(r.status_code)
print(r.headers)
Output:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.gamestop.com:443
send: b'GET / HTTP/1.1\r\nHost: www.gamestop.com\r\nuser-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.410 3.97 Safari/537.36\r\naccept-encoding: gzip, deflate, br\r\naccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nconnection: keep-alive\r\naccept-language: en-US,en;q=0.9\r\ncache-control: max-age=0\r\ndnt: 1\r\ndownlink: 10\r\nect: 4g\r\nrtt: 50\r\nsec-fetch-dest: document\r\nsec-fetch-mode: navigate\r\nsec-fetch-site: none\r\nsec-fetch-user: ?1\r\nupgrade-insecure-requests: 1\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden\r\n'
header: Server: AkamaiGHost
header: Mime-Version: 1.0
header: Content-Type: text/html
header: Content-Length: 265
header: Expires: Fri, 26 Jun 2020 19:54:19 GMT
header: Date: Fri, 26 Jun 2020 19:54:19 GMT
header: Connection: close
header: Server-Timing: cdn-cache; desc=HIT
header: Server-Timing: cdn-cache; desc=HIT
DEBUG:urllib3.connectionpool:https://www.gamestop.com:443 "GET / HTTP/1.1" 403 265
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access "http://www.gamestop.com/" on this server.<P>
Reference #18.19e8d93f.1593201259.5c2b9d0
</BODY>
</HTML>
403
{'Server': 'AkamaiGHost', 'Mime-Version': '1.0', 'Content-Type': 'text/html', 'Content-Length': '265', 'Expires': 'Fri, 26 Jun 2020 19:54:19 GMT', 'Date': 'Fri, 26 Jun 2020 19:54:19 GMT', 'Connection': 'close', 'Server-Timing': 'cdn-cache; desc=HIT, edge; dur=1'}
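The reply headers (Server: AkamaiGHost, Server-Timing: cdn-cache) indicate the 403 is served by Akamai's edge, not as a normal page. When scripting requests it can help to recognize that block page explicitly instead of parsing it as content. The heuristic below, which matches the status, the Server header, and the "Access Denied" title from the output above, is an assumption, not a documented Akamai signature.

```python
from typing import Mapping


def looks_like_edge_block(status: int, headers: Mapping[str, str], body: str) -> bool:
    """Heuristic: True for a 403 from an Akamai edge server with the stock
    'Access Denied' page, as in the output above."""
    server = headers.get("Server", "")
    return status == 403 and "AkamaiGHost" in server and "Access Denied" in body
```

With requests this would be called as `looks_like_edge_block(r.status_code, r.headers, r.text)`; `r.headers` is case-insensitive, so the "Server" lookup works regardless of how the server capitalizes it.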
This is code from another project of mine. By using the Python fake_useragent package you can get around this; search for the modules used here to learn more about them.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from fake_useragent import UserAgent
# Pick a random real-browser user-agent string
ua = UserAgent()
userAgent = ua.random
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument(f'user-agent={userAgent}')
# executable_path is the older (Selenium 3) argument; newer Selenium releases take service=Service(path) instead
driver = webdriver.Chrome(
    executable_path=r'C:\Users\ASHIK\Desktop\chromedriver.exe', options=chrome_options)
driver.get("https://www.myntra.com/men?f=Categories%3ATshirts&p=1")
html_doc = driver.page_source
driver.quit()
# page_source is a single string, so write() is enough; the with block closes the file
with open('myntra-ecom.html', 'w', encoding='utf-8') as hfile:
    hfile.write(html_doc)
print("Html file Downloaded...")
Python POST request to retrieve a base64-encoded file
I'm trying to send a POST request with Python to retrieve a specific file. Since the URL is behind a server that requires authorization, there's no point posting it here. However, the form data contains a long field called base64, and I can't figure out whether it's an ordinary form value or a base64 encoding of the POST request itself.
Here are the browser parameters:
General:
Request URL: http://exampleapi.com/api/Document/Export
Request Method: POST
Status Code: 200 OK
Remote Address: XX.XXX.XXX.XX:XX
Referrer Policy: no-referrer-when-downgrade
Response Headers:
Access-Control-Allow-Origin: http://example.com
Cache-Control: no-cache
Content-Disposition: attachment; filename=location-downloads.xlsx
Content-Length: 7148
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Date: Tue, 23 Jul 2019 21:00:18 GMT
Expires: -1
Pragma: no-cache
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Request Headers:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 10162
Content-Type: application/x-www-form-urlencoded
Cookie: abcConnection=!UA7tkC3iZCmVNGRUyRpDWARVBWk/lY6SZvgxLlaygsQKk+vuwA1NxvhwE9ph4i+3NZlKeepIfuHhUvyQjl68fhhrT9ueqMx/3mBKUDcT
DNT: 1
Host: exampleapi.com
Origin: http://example.com
Referer: http://example.com/
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36
Form Data:
fileName: location-downloads.xlsx
contentType: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
base64: UEsDBAoAAAAAAAh4904AAAAAAAAAAAAAAAAJAAAAZG9jUHJvcHMvUEsDBAoAAAAIAAh490 (shortened for simplicity)
Here is what I tried:
import json
import requests
import urllib3
from bs4 import BeautifulSoup
url='http://example.com'
urllib3.disable_warnings()
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "User-Agent": "Mozilla/5.0",
}
with requests.session() as s:
    r=s.get(url,headers={"User-Agent":"Mozilla/5.0"},verify=False)
    data=r.content
    soup=BeautifulSoup(data,'html.parser')
    form_data = {
        "fileName":"location-downloads.xlsx",
        "contentType":"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    }
    r2=s.post('http://exampleapi.com/api/Document/Export',data=json.dumps(form_data,ensure_ascii=True).encode('utf-8'),headers=headers,verify=False)
    print(r2.status_code)
Any idea how I should proceed? The status code for this POST is 500.
Express.js route returns weird characters
What could be the reason for an Express.js route to return the following data? I expect it to return JSON. I am making an AJAX call to the server (Express.js), which gives me the data below with weird characters. Is this data gzipped?
I have set the headers and contentType as follows:
headers: {"Access-Control-Allow-Origin":"*"}
contentType: 'application/json; charset=utf-8'
�=O�0�b��K�)�%7�܈9���G��%NOU���O'6��k�~6��S.���,��/�wأ%6�K�)��e�
The HTTP response is as follows:
General:
Request URL: http://localhost/expressRoute.js
Request Method: GET
Status Code: 200 OK
Remote Address: [::1]:80
Referrer Policy: no-referrer-when-downgrade
Response Headers:
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Length: 29396
Content-Type: application/javascript
Date: Thu, 22 Nov 2018 00:50:36 GMT
ETag: "72d4-57b124e0c372e"
Keep-Alive: timeout=5, max=100
Last-Modified: Tue, 20 Nov 2018 05:57:12 GMT
Server: Apache/2.4.34 (Win32) OpenSSL/1.1.0i PHP/7.2.10
Request Headers:
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: no-cache
Connection: keep-alive
Host: localhost
Pragma: no-cache
Referer: http://localhost/index.html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
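To answer "Is this data gzipped?" empirically, inspect the raw response bytes (not text that has already been rendered with replacement characters) for the gzip magic number 1f 8b. Note that the response headers above contain no Content-Encoding header, so a browser would not decompress such a body on its own. A small standard-library sketch:

```python
import gzip


def looks_gzipped(data: bytes) -> bool:
    """True if the bytes start with the gzip magic number 1f 8b."""
    return data[:2] == b"\x1f\x8b"


json_bytes = b'{"ok": true}'
print(looks_gzipped(json_bytes))                 # False
print(looks_gzipped(gzip.compress(json_bytes)))  # True

# If a raw body does turn out to be gzip, this recovers the original text.
print(gzip.decompress(gzip.compress(json_bytes)).decode("utf-8"))  # {"ok": true}
```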
POST request returns empty string (Python)
I need to get data from a site. To get this data, the user has to enter a postcode first. After exploring the source code, I found the following.
Response result (this is what I need in the end):
{PostCodePK: 16666, PostCode: "7468", City: "MACQUARIE HEADS", State: "TAS", Country: "AUST",…}
1 : {PostCodePK: 16667, PostCode: "7468", City: "STRAHAN", State: "TAS", Country: "AUST",…}
Request data:
Request URL:http://www.lucasmill.com/Resources/ws-common.aspx
Request Method:POST
Status Code:200 OK
Remote Address:111.67.1.113:80
Response Headers
Cache-Control:private
Content-Length:321
Content-Type:application/json; charset=utf-8
Date:Tue, 21 Feb 2017 20:19:26 GMT
Expires:Tue, 21 Feb 2017 20:19:26 GMT
Server:Microsoft-IIS/8.5
Set-Cookie:dnn_IsMobile=False; path=/; HttpOnly
Request Headers
Accept:application/json, text/javascript, */*; q=0.01
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8,ru;q=0.6,uk;q=0.4
Connection:keep-alive
Content-Length:18
Content-Type:application/json; charset=UTF-8
Cookie:.ASPXANONYMOUS=ptTlH_jC0gEkAAAAZTU4MTA5NTItZmNlZS00MzRjLThmYTgtMWZkYWNkOTEwZmY00; dnn_IsMobile=False; language=en-AU; __utmt=1; __utma=97280258.254723646.1487697408.1487697408.1487708346.2; __utmb=97280258.1.10.1487708346; __utmc=97280258; __utmz=97280258.1487697408.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
DNN-Service:true
DNN-Service-Method:GetTown
DNT:1
Host:www.lucasmill.com
Origin:http://www.lucasmill.com
Referer:http://www.lucasmill.com/Sawmilling-Contractors
User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
X-Requested-With:XMLHttpRequest
Request Payload
{ 'param':'7468' }
Here is my Python code:
import requests
r = requests.post('http://www.lucasmill.com/Resources/ws-common.aspx', data={
    'param': '7468'
})
print(r.text)
But all I receive in response is an empty string. Where am I going wrong?
You need to add a few extra headers that this particular application requires. If you look at the request headers dump in the browser, you can see the DNN-Service and DNN-Service-Method headers. Translated to Python, it looks like this:
import json
import requests
headers = {
    'DNN-Service': 'true',
    'DNN-Service-Method': 'GetTown',
}
# Send the payload as a JSON string, as the browser does, rather than as form data
r = requests.post('http://www.lucasmill.com/Resources/ws-common.aspx', data=json.dumps({'param': 7468}), headers=headers)
print(r.text)
Output:
[none][22:28:30] vlazarenko#alluminium (~/tests)$ python post.py
[{"PostCodePK":16666,"PostCode":"7468","City":"MACQUARIE HEADS","State":"TAS","Country":"AUST","Latitude":"-42.2149353","Longitude":"145.1951436","CanGeocode":true},{"PostCodePK":16667,"PostCode":"7468","City":"STRAHAN","State":"TAS","Country":"AUST","Latitude":"-42.1534771","Longitude":"145.3281242","CanGeocode":true}]
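Besides the headers, the other difference between the question's code and the answer's is the body format: requests.post(url, data=dict) form-encodes the dict, while the browser's Request Payload is a JSON document, which the answer reproduces with json.dumps. The sketch below shows both encodings and builds (without sending) the equivalent request with urllib.request, including the two DNN-* headers. It uses the string "7468" from the browser payload, whereas the answer sends the integer 7468. With requests, passing json=payload instead of data= also serializes the dict and sets Content-Type to application/json.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

payload = {"param": "7468"}

# What requests.post(url, data=payload) sends: a form-encoded body.
form_body = urlencode(payload)
print(form_body)  # param=7468

# What the browser sent (per the Request Payload dump): a JSON document.
json_body = json.dumps(payload)
print(json_body)  # {"param": "7468"}

# Build (but do not send) the equivalent request with the standard library.
req = Request(
    "http://www.lucasmill.com/Resources/ws-common.aspx",
    data=json_body.encode("utf-8"),
    headers={
        "Content-Type": "application/json; charset=UTF-8",
        "DNN-Service": "true",
        "DNN-Service-Method": "GetTown",
    },
    method="POST",
)
# urllib.request stores header names via str.capitalize(), hence the lookup key.
print(req.get_method(), req.get_header("Dnn-service-method"))  # POST GetTown
```

Sending it would then be `urllib.request.urlopen(req).read()`.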